INTRODUCTION
Introduction
Self-driving cars are software-based, so many of the mistakes that human drivers make can be corrected in software. They are a technological development in the automotive industry that provides drivers with both comfort and safety. A study by ASIRT (Association for Safe International Road Travel) found that, on average, 3,700 people die on the roads every day, and 20–50 million more people suffer nonfatal injuries, many of which leave them permanently disabled. Human error is frequently responsible for these accidents. To avoid such mistakes, self-driving cars have entered the market and are among the most in-demand vehicles.
Self-driving cars have been a dream of developers for many years, and many companies and individuals are contributing to their improvement day by day. We also decided to put some effort, according to our knowledge, into this global development project.
Artificial intelligence (AI) is becoming increasingly popular and affects many aspects of daily life. Computer vision (CV) is a branch of artificial intelligence that includes digital image acquisition, processing, analysis, and recognition. Deep learning is a discipline that studies algorithms and computer programs so that computers may learn and make predictions in a manner similar to humans. It is used in a variety of applications in science, engineering, and other fields of life, including object detection and classification. A good example is the Convolutional Neural Network (CNN), which learns to distinguish patterns in images by successively stacking layers on top of each other. CNN is now regarded as a standard model in many applications; it is a full image classifier and leverages machine learning technologies in the field of computer vision.
More and more algorithms and models have been introduced for the recognition problem, including the U-Net, CNN, and YOLO models, which are applied specifically to road lane recognition. We therefore chose the topic "Design and implementation of a self-driving car for following lane" for our final graduation project.
Objective
The research objective of the thesis is the design and implementation of a self-driving car that follows the lane, with the following functions:
Use Jetson Nano to process images and create signal communication through I2C and UART communication standards.
Use the Raspberry Pi V2 IMX219 8MP Camera to recognize traffic lanes and traffic signs.
Use an Arduino Nano to control the speed of the two rear wheels through L298N.
Use PCA9685 to control the steering angle of the two front wheels.
Monitor the process on the computer.
Build, design and execute hardware models in the most optimal way.
Provide model evaluation and model improvement.
The product is expected to be a vehicle model that moves along the lane and can recognize traffic signs.
Research Method
Analysis and evaluation of energy efficiency, processing speed, and performance on embedded systems of neural network models in road lane recognition.
Learn the neural network model's parameters, then build the network model to train the system to recognize road lanes.
Evaluate and analyze the functions of the system before selecting the hardware for the AI system.
Object and Scope of the study
We studied the research subjects to better understand how the issue was implemented, which made it simpler to solve the problems. The following are the subjects we investigated:
Nvidia Jetson Nano Developer Kit for AI Application Deployment Hardware:
The Jetson Nano is a small but powerful embedded computer that can run modern AI algorithms quickly. It can run multiple neural networks in parallel and process several high-resolution sensors simultaneously.
Camera Raspberry Pi V2 IMX219 8MP
Neural Network: YOLO (You Only Look Once) is a CNN network model used to detect and identify objects.
Due to a variety of objective (financial) and subjective (limited competence) reasons, the topic content is only used under the following scope:
The design of a self-driving car using the Jetson Nano is carried out only at the model level and has not yet been used in practice.
The self-driving car using the Jetson Nano in this project uses a Raspberry Pi camera that allows it to see the lane within a 120-degree field.
The topic mainly focuses on identifying and detecting road lanes and traffic signs under good lighting conditions.
The demo vehicle is operated indoors, away from direct sunlight.
The tracking software can only be used on computers.
Research contents
During the implementation of the graduation project with the topic "Design and implementation of a self-driving car for following lane", we addressed and accomplished the following contents:
Content 1: Analyze the challenges of the project.
Content 2: Learn about the technical specifications, design philosophy, and theoretical basis of the hardware components.
Content 3: Propose the model and summarize the overall system; design the block diagram and schematic diagram.
Content 4: Preprocessing data (cleaning data, generating object detection data).
Content 5: System configuration and design hardware.
Content 6: Test run, check, evaluate and adjust.
Outline
We tried to convey the information in the report logically so that readers can quickly understand the subject's expertise, methodology, and operation. The report is organized into the following five chapters:
Chapter 1: INTRODUCTION. Presents an overview of current research on AI applications in self-driving cars, along with the objectives, objects, and scope of the study.
Chapter 2: LITERATURE REVIEW. Presents background knowledge about AI technology, image segmentation, U-Net, the PID method, and other techniques used in the project.
Chapter 3: DESIGN AND IMPLEMENTATION. Presents system requirements, the block diagram and block functions, the schematic, the hardware design for the system, and the algorithmic flowcharts.
Chapter 4: EXPERIMENT RESULTS AND DISCUSSION. Presents the results of the hardware and software construction and evaluates their operation.
Chapter 5: CONCLUSIONS AND RECOMMENDATIONS. Presents conclusions for the final project, states the advantages and disadvantages of the topic and the errors we made during implementation, and gives directions for future development.
LITERATURE REVIEW
AI Technology
A Convolutional Neural Network (CNN) [1] is a type of artificial neural network that is widely used in deep learning for image and object recognition and classification. Deep learning recognizes objects in images by employing CNNs. CNNs are important in a variety of tasks and functions, such as image processing, computer vision tasks like localization and segmentation, video analysis, recognizing obstacles in self-driving cars, and speech recognition in natural language processing [2]. CNNs are very popular in deep learning because they play a significant role in these rapidly growing and emerging areas.
A CNN basically contains three types of layers used to extract features from an image: an input layer, hidden layers, and an output layer. The hidden layers consist of convolutional layers, pooling layers, and fully connected layers [2]. Stacking all of these layers forms the CNN architecture.
Figure 2.1: Architecture of CNN network
Input layer: since CNN is inspired by the ANN model, its input is an image that holds the pixel values.
Convolutional layer: determines the output of neurons connected to local regions of the input by computing the scalar product between their weights and the region connected to the input volume.
Pooling layer: downsamples the input along its spatial dimensions, significantly lowering the number of parameters in that activation.
Fully connected layer: carries out the ANN's normal functions and tries to produce class scores from the activations for classification. Additionally, it is proposed that ReLU be applied between these layers to enhance performance; the rectified linear unit (ReLU) applies an elementwise activation function, f(x) = max(0, x), to the output of the previous layer's activation. We then specifically analyze the convolutional layer and the fully connected layer.
YOLO (You Only Look Once) [3] is a CNN network model used to detect and identify objects. Its convolutional layers extract features from an image and output the coordinates and labels assigned to each bounding box. YOLO is the fastest algorithm among object recognition models, although not necessarily the most accurate. The main purpose of YOLO is to predict labels for objects in the classification and to determine the coordinates of the objects; therefore, YOLO can detect many objects with different labels in a very short time.
YOLO has released many versions so far, such as v1, v2, v3, v4, v5, and beyond. Each generation of YOLO has upgraded classification, optimized real-time label recognition, and extended the prediction limits for bounding boxes.
In the YOLO architecture, the base networks are convolutional networks that perform feature extraction. The extra layers in the back part of the network are used to detect objects on the base network's feature map.
The base network of YOLO is composed primarily of convolutional layers and fully connected layers [4]. YOLO architectures are also quite diverse and can be customized to accommodate a wide range of input shapes.
Figure 2.2: YOLO network architecture diagram.
The base network component of the Darknet architecture performs feature extraction. It produces a 7x7x1024 feature map, which is used as input for the extra layers that predict the label and bounding-box coordinates of the object.
Image segmentation [5] is a function that takes an image as input and produces a mask, or a matrix, whose elements indicate which object class or instance each pixel belongs to.
Image segmentation can benefit from several applicable heuristics based on high-level image properties such as edges and histograms. Those features serve as the foundation for traditional image segmentation algorithms that employ clustering techniques. Traditional techniques based on such heuristics can be fast and simple, but they often require significant fine-tuning to support specific use cases with manually designed heuristics, and they are not always precise enough for complicated pictures. Newer segmentation approaches use machine learning and deep learning to improve accuracy and flexibility.
In machine-learning-based image segmentation algorithms, model training is used to increase the program's capacity to recognize relevant characteristics. Deep neural networks are very useful for image segmentation.
There are several neural network architectures and implementations that are appropriate for image segmentation. They all share the same important components:
An encoder: a set of layers that extract image features by applying deeper and narrower filters as layers are added. The encoder may have been pre-trained on a comparable task (for example, image recognition), allowing it to use its previous experience to execute segmentation tasks.
A decoder: a sequence of layers that progressively converts the encoder's output into a segmentation mask with the same pixel resolution as the input image.
Skip connections: multiple long-range neural network connections that allow the model to recognize features at different scales, improving model accuracy. A minimal sketch of these three components is shown below.
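The sketch below illustrates these three components in Keras with one encoder-decoder pair and two skip connections. The layer counts, filter sizes, and the 256x256 input are illustrative assumptions only, not the exact segmentation network trained later in this project.

import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = tf.keras.Input(shape=(256, 256, 3))

# Encoder: progressively deeper and spatially smaller feature maps
e1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
p1 = layers.MaxPooling2D(2)(e1)
e2 = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
p2 = layers.MaxPooling2D(2)(e2)

b = layers.Conv2D(128, 3, padding="same", activation="relu")(p2)   # bottleneck

# Decoder: upsample back to the input resolution
d2 = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(b)
d2 = layers.Concatenate()([d2, e2])   # skip connection from the encoder
d1 = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(d2)
d1 = layers.Concatenate()([d1, e1])   # skip connection at full resolution

# One-channel mask: each pixel is the probability of belonging to the target class
mask = layers.Conv2D(1, 1, activation="sigmoid")(d1)

model = Model(inputs, mask)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])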
There are various ways to segment an image. Here are some of the main techniques:
Figure 2.3: Types of Image Segmentation
Semantic segmentation: the process of grouping pixels in a picture based on semantic categorization. Every pixel in this model belongs to a certain class, and the segmentation model makes no reference to any other context or information. This method frequently leads to a poorly defined problem statement, especially if several instances are grouped in the same class [6].
Instance segmentation: the process of categorizing pixels based on instances of an object (rather than object classes). Instance segmentation algorithms do not know what class each region belongs to; instead, they separate similar or overlapping regions based on object boundaries [7].
Panoptic segmentation: a newer sort of segmentation that is sometimes described as a hybrid of semantic and instance segmentation. It predicts the identity of each object in the picture, separating every instance of each object [7].
Panoptic segmentation is beneficial for many products that require large volumes of data to function properly.
PID Method
PID control [10], which stands for Proportional-Integral-Derivative feedback control, is one of the most extensively used controllers in industry. It works by regulating the output to bring the value of a process to a preset set point.
Figure 2.6: PID controller
Set Point: The set point is often a user-entered value, such as the set speed in cruise control.
Process Value: The value under control is the process value.
Error: The error value is used by the PID controller to determine how to change the output to bring the process value closer to the set point:
Error = SetPoint − ProcessValue (2.1)
A PID controller's output is the regulated value. The PID controller continuously monitors the error value and uses it to calculate the proportional, integral, and derivative terms; the controller then sums these three terms to produce the output.
The proportional term's objective is to produce a large instantaneous reaction in the output to bring the process value near the set point. As the error decreases, the output also decreases:
u_P(t) = K_P · e(t) (2.2)
e(t) = r(t) − y(t) (2.3)
where K_P is the proportional gain and e(t) is the feedback error between the reference signal r(t) and the output y(t).
The integral term is computed by multiplying the I-gain by the error, multiplying this by the controller's cycle time, and continually accumulating the result as the "total integral":
u_I(t) = (K_P / T_I) ∫₀ᵗ e(τ) dτ (2.4)
where e(t) is the error signal between the reference signal r(t) and the output y(t), K_P is the proportional gain, and T_I is the integral time constant.
The derivative term is calculated by multiplying the D-gain by the process value's ramp rate. The derivative's goal is to "predict" where the process value is going, preventing the controller from exceeding the set point if the ramp rate is too fast:
u_D(t) = K_D · de(t)/dt (2.5)
where K_D is the derivative control gain and e(t) = r(t) − y(t) is the feedback error signal between the reference signal r(t) and the output y(t).
In ideal form, a PID controller's output is the sum of these three terms.
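The three terms can be combined in a few lines of code. The following is a minimal sketch of a discrete PID controller in ideal form; the gains, set point, and cycle time used here are hypothetical and only illustrate how the error drives the output.

class PID:
    def __init__(self, kp, ki, kd, setpoint, dt):
        self.kp, self.ki, self.kd = kp, ki, kd   # P, I and D gains
        self.setpoint = setpoint
        self.dt = dt                             # controller cycle time in seconds
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, process_value):
        error = self.setpoint - process_value             # Error = SetPoint - ProcessValue
        self.integral += error * self.dt                  # accumulated "total integral"
        derivative = (error - self.prev_error) / self.dt  # ramp rate of the error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: steer toward the lane centre; 160 is a hypothetical target pixel column
pid = PID(kp=0.8, ki=0.05, kd=0.2, setpoint=160, dt=0.05)
steering = pid.update(process_value=145)

In this project the error would typically come from the lane-detection output, but the values above are purely illustrative.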
The overview of a self-driving car
A self-driving car [11] (sometimes called an autonomous car or driverless car) is a vehicle that uses a combination of sensors, cameras, radar, and artificial intelligence (AI) to travel between destinations without a human operator. To qualify as fully autonomous, a vehicle must be able to navigate without human intervention to a predetermined destination over roads that have not been adapted for its use.
AI technologies power self-driving car systems. Developers of self-driving cars use vast amounts of data from image recognition systems, along with machine learning and neural networks, to build systems that can drive autonomously.
The neural networks identify patterns in the data, which are fed to the machine learning algorithms. That data includes images from cameras on self-driving cars, from which the neural network learns to identify traffic lights, trees, curbs, pedestrians, street signs, and other parts of any given driving environment.
Other techniques used in the Project
PWM [12] is a regulation method that changes the output voltage by varying the width of a square pulse sequence, resulting in a change in the average voltage.
PWM is an efficient way of controlling the amount of power delivered to a load while minimizing wasted power.
Pulse width modulation is widely applied in control applications such as motor control, pulse generators, and voltage regulators. PWM helps control the speed of a motor and, at the same time, helps stabilize the engine speed.
The relationship between the Ton and Toff periods is called the duty cycle, which describes the percentage of time a digital signal is "on" over an interval or period. It is expressed as a percentage (%) and is determined by formula 2.7:
D (%) = Ton / (Ton + Toff) × 100 (2.7)
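As a small illustration of formula 2.7, the duty cycle can be computed directly from the on and off times; the millisecond values below are hypothetical.

def duty_cycle(t_on, t_off):
    # Formula 2.7: D (%) = Ton / (Ton + Toff) * 100
    return t_on / (t_on + t_off) * 100.0

print(duty_cycle(1.5, 0.5))   # 75.0, i.e. the signal is "on" for 75 % of each period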
The Inter-Integrated Circuit (I2C) protocol [13] is a synchronous serial communication protocol developed by Philips Semiconductors that allows several "peripheral" digital integrated circuits to communicate with microcontrollers or other peripheral devices. It has become a popular protocol for short-distance communication and is also known as the Two-Wire Interface (TWI).
I2C is a two-wire serial bus, but those two lines can address up to 1008 peripheral devices. I2C can also support a multi-controller system, which allows several controllers to communicate with all peripheral devices on the bus.
The I2C communication protocol uses only 2 bi-directional lines for data communication, called SDA and SCL:
Serial Data (SDA) – the line on which data is sent and received.
Serial Clock (SCL) – the line that carries the clock signal.
Both the SCL and SDA bus lines operate in open-drain mode: devices can only pull the lines low, while pull-up resistors return them to the high level, which avoids the case where the bus is driven high by one device and low by another at the same time, causing a short circuit. In addition, the SDA line may only change state while SCL is low.
In the I2C communication protocol, data is transmitted in packets; each byte on the bus consists of 9 bits (8 data bits followed by an ACK/NACK bit).
Figure 2.8: Transmitted in the form of packets of I2C
To generate a START condition, SDA is switched from high to low while SCL is kept high. To generate a STOP condition, SDA is switched from low to high while SCL is kept high.
The first frame following the start bit is the address frame. The master sends the address of the slave with which it wants to communicate to every slave connected to it. Each slave then compares the address sent by the master with its own address.
If the address matches, it sends a low-voltage ACK bit back to the master.
If the addresses do not match, the slave does nothing and the SDA line between those two devices remains high.
This bit indicates whether the master is sending data to or requesting data from the slave. A low Read/Write bit ('0') means the master is sending data to the slave, whereas a high Read/Write bit ('1') means the master is requesting data from the slave.
ACK/NACK is the abbreviation for Acknowledged/Not Acknowledged. Every address or data frame is followed by an ACK/NACK bit, which the receiver uses to confirm reception. If the address matches or the data frame is received successfully, the receiving device sets this bit to '0'; otherwise, it remains at the default value '1'.
The data frame is always 8 bits long and is sent with the most significant bit (MSB) first. Each data frame is immediately followed by an ACK/NACK bit to verify that the frame was received successfully (bit pulled to 0 on the SDA line). The ACK bit must be received by the master or slave before the next data frame can be sent.
After all data frames have been sent, the master can send a STOP condition to the slave to end the transmission.
To generate the STOP condition, the SCL line switches from low to high before the SDA line switches from low to high.
The master device sends a START condition by switching SDA and then SCL from high voltage to low voltage, respectively.
Next, the master sends the 7 or 10 address bits of the slave it wants to communicate with, together with the Read/Write bit. Each slave compares this address with its own physical address. If there is a match, the slave acknowledges by pulling SDA low and setting the ACK/NACK bit to '0'; if there is no match, the SDA line and the ACK/NACK bit both remain at the default '1'. The master then sends or receives a data frame: if the master sends to the slave, the Read/Write bit is set to '0'; otherwise, it is set to '1'. If a data frame has been successfully transmitted, the ACK/NACK bit is set to '0' to signal the master to continue.
After all data has been successfully sent to the slave, the master sends a STOP condition to notify the slave that the transmission has ended, by switching SCL and then SDA from low voltage to high voltage, respectively.
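As an illustration of such a transaction from the Jetson Nano (the master), the following minimal Python sketch uses the smbus2 package. The bus number, slave address, register, and data byte are assumptions for illustration only (0x40 is the PCA9685 default address), not values taken from this project's firmware.

from smbus2 import SMBus

I2C_BUS = 1          # I2C bus exposed on the Jetson Nano 40-pin header (assumed)
DEVICE_ADDR = 0x40   # 7-bit slave address (PCA9685 default)

with SMBus(I2C_BUS) as bus:
    # START + address frame + register + data frame + STOP; the driver checks the ACK bits
    bus.write_byte_data(DEVICE_ADDR, 0x06, 0x10)   # hypothetical register and value
    value = bus.read_byte_data(DEVICE_ADDR, 0x06)  # read the same register back
    print(hex(value))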
2.4.3 UART (Universal Asynchronous Receiver/Transmitter)
UART [14] is used for serial communication. Two wires are used: one for transmitting and one for receiving.
UART is an asynchronous protocol, so there is no clock line to regulate the data transfer rate; the user must set both devices to communicate at the same speed. This speed is called the baud rate and is expressed in bits per second (bps). Baud rates vary considerably, from 9600 baud to 115200 baud and beyond.
Data transmitted by UART is organized into packets. Each packet contains 1 start bit, 5 to 9 data bits (depending on the UART), an optional parity bit, and 1 or 2 stop bits.
Figure 2.10: Transmitted in the form of packets of UART
Start bit:
UART data lines are kept at a high voltage level when not transmitting data. To start a transmission, the transmitting UART pulls the line from high to low for one clock cycle. When the receiving UART detects the high-to-low transition, it starts reading the bits of the data frame at the frequency of the baud rate.
The data frame contains the data being transferred. It can be from 5 to 8 bits long if a parity bit is used; if no parity bit is used, the data frame can be 9 bits long. Data is sent with the least significant bit (LSB) first.
The parity bit is a way for the receiving UART to tell whether any data has changed during transmission, since bits can be altered in some cases. After the receiving UART reads the data frame, it counts the number of bits with a value of 1 and checks whether the total is even or odd. If the parity bit is 0 (even parity), the number of 1 bits in the data frame must be even; if the parity bit is 1 (odd parity), the number of 1 bits must be odd. When the parity bit matches the data, the UART knows that the transmission was error-free; otherwise, it knows that bits in the data frame have changed.
To signal the end of a data packet, the sending UART drives the data line from low voltage to high voltage for the duration of the one or two stop bits.
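The following minimal sketch shows this framing in practice, sending a command from the Jetson Nano to the Arduino Nano with pyserial. The port name, baud rate, and "angle,speed" message format are assumptions for illustration, not the project's actual protocol.

import serial

# 8 data bits, no parity and 1 stop bit are pyserial's defaults
ser = serial.Serial("/dev/ttyTHS1", baudrate=115200, timeout=1)

ser.write(b"90,50\n")        # hypothetical "angle,speed" command, newline-terminated
reply = ser.readline()       # returns b"" if nothing arrives before the timeout
print(reply)
ser.close()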
DESIGN AND IMPLEMENTATION
Requirements of the topic
The self-driving vehicle system supports the driver during transport when an unexpected incident occurs, minimizing the risk of an accident. Therefore, the system must satisfy certain technological specifications.
To detect and recognize objects, the AI system uses a camera and an AI algorithm, with all computations and algorithms running on the Jetson Nano platform because of its graphics processing capabilities.
In addition, the system must use as little power as possible to ensure that it can operate continuously for several hours. These technological requirements are relative, and there is no standard for such supporting equipment; in this study, we select parameters that are both relevant and appropriate.
Block diagram
We discuss the issue of selecting relative and suitable parameters in this design.
Figure 3.1 depicts the system's block diagram.
Figure 3.1: Block of Design a self-driving car for following lane system
The self-driving car for following lane system is made up of three main blocks. The first important block is the AI block. The control and central processing circuit of this block is the Jetson Nano 16 GB eMMC Developer Kit together with the 8MP Raspberry Pi V2 IMX219 camera, which can handle a large workload; in this topic it is used to detect and identify the moving lanes with the U-Net algorithm and the traffic signs with the YOLO algorithm. The second block is the DC motor control block: the Arduino Nano is responsible for controlling the L298N H-bridge motor driver and the motors, which control the vehicle speed. The third block is the servo motor control block, which steers the front wheels as required in different cases: the PCA9685 16-channel PWM driver is responsible for controlling the RC Servo MG996 motor.
The self-driving car for following lane system is divided into two main processing blocks: the AI block and the control block. Because the Jetson Nano handles heavy workloads, using the Jetson Nano's GPIO directly could be disruptive and reduce the overall system performance.
It is better to use the Arduino Nano and the PCA9685 for lighter tasks, such as processing data for the motors, which do not require much processing power, than to rely on the Jetson Nano for everything. In addition, the Arduino Nano and the PCA9685 use less power, which allows them to be supplied from the battery power block.
Design option
The AI block must handle a lot of work; therefore, it plays an important part in the self-driving car for following lane system. The AI block recognizes and detects the lane in the recorded video stream in real time. The following sections provide details on the hardware and algorithms used by the AI block. A detailed block diagram of the AI system is shown in Figure 3.2.
Figure 3.2: The detailed block of the AI system
Jetson Nano Developer Kit eMMC [15]
To achieve the processing capacity and speed requirements, we need a board with the processing capability to execute very complicated control algorithms. Furthermore, choosing an integrated microprocessor with a compact size is a viable choice for reducing the size and increasing the flexibility of the device. Today's embedded boards are available at a variety of costs, resulting in a wide range of processing capabilities.
For these reasons, developing the self-driving car for following lane system on an NVIDIA Jetson board is an optimal choice, and the Jetson Nano is a good fit for this topic.
Figure 3.3: NVIDIA Jetson Nano Developer Kit eMMC
The Jetson Nano Developer Kit [15] is an AI computer that strongly supports image processing tasks, such as classification and recognition, through its GPU.
The Jetson Nano model used in this project is the Jetson Nano 4GB developer kit. It includes a 64-bit quad-core ARM Cortex-A57 CPU, 4GB of RAM, and a video processor that can handle up to 4K at 30 fps for encoding and 4K at 60 fps for decoding, along with PCIe and USB 3.0 ports. The Jetson Nano offers 472 GFLOPS to run modern AI algorithms quickly, with a quad-core 64-bit ARM CPU, a 128-core integrated NVIDIA GPU, and 4GB of LPDDR4 memory. It can run multiple neural networks in parallel and process several high-resolution sensors simultaneously.
Camera Raspberry Pi V2 IMX219 8MP [16]
To fulfill the requirements of detecting and identifying road lanes as well as traffic signs, we need a camera with good imaging capabilities. The Raspberry Pi Camera Module V2, with its 8-megapixel Sony IMX219 sensor, is the right choice.
The Raspberry Pi Camera Module V2 is a leap forward in image quality, color fidelity, and low-light performance. It supports 1080p30, 720p60, and VGA90 video modes, as well as still photo capture, and it connects through a 15 cm cable to the CSI port.
Figure 3.4: Camera Raspberry Pi V2 IMX219 8MP
Table 3.1: Camera Raspberry Pi V2 IMX219 8MP Specifications
Sensor image area 3.68 x 2.76 mm (4.6 mm diagonal)
Depth of field Approx 10 cm to ∞
Horizontal Field of View (FoV) 62.2 degrees
Vertical Field of View (FoV) 48.8 degrees
The schematic of AI System
In this schematic, the Jetson Nano is the central processing unit: it receives images from the input block, the Raspberry Pi V2 IMX219 8MP camera, processes the signal, and then forwards it to the output block, which includes the Arduino Nano and the PCA9685, via the UART and I2C protocols, respectively.
Figure 3.5: The schematic of the AI system
The power block provides the central processor (Jetson Nano) with a voltage of 5V.
Figure 3.6 shows all the steps performed to recognize the road lanes and the traffic signs, which together form the processing path of the AI system.
Figure 3.6: The flowchart of recognized lanes
First, we load the model file (onnx or tflite). Then the camera is opened with GStreamer, a framework that allows us to create multimedia applications. When the camera is open, the data collection process begins. Each frame from the camera is cropped to fit the training model. Next, the data is processed, and the prediction is recognized and detected. Finally, the prediction is drawn on the image in real time to locate the movement. To describe the system's operation in more depth, it has been divided into several parts.
After researching some project related documents, we decided to create our own lane.
To generate rich data for training, we generate the lanes using two different materials for the dataset One includes a reflective tarp, while the other uses felt fabric Both materials are black.
Figure 3.7: The flowchart of generate the lane
First, to open the Jetson Nano's camera, we use the OpenCV 4.5.0 library and a GStreamer pipeline. Next, the system checks whether the camera is open using the "cv2.VideoCapture()" function. If the conditions are met, each frame is saved to storage with the "cv2.imwrite('lane_' + str(i) + '.jpg', frame)" function. Images are saved in jpg or png format, and the number of images is limited based on the index "i".
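A minimal sketch of this data-collection step is shown below. The pipeline follows the common nvarguscamerasrc pattern for a CSI camera on the Jetson Nano; the resolution, frame rate, and 500-image limit are assumptions for illustration.

import cv2

pipeline = (
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "
    "videoconvert ! video/x-raw, format=BGR ! appsink"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
if not cap.isOpened():
    raise RuntimeError("Cannot open the camera through the GStreamer pipeline")

i = 0
while i < 500:                                   # limit the number of images with index i
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite('lane_' + str(i) + '.jpg', frame)
    i += 1

cap.release()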
Labeling and classification for lane dataset
Figure 3.8: The process flow of Labeling and classification for lane dataset
We used images cut from the video for labeling. We use the LabelMe software to classify and label the data by semantic segmentation; each labeled image is saved as a JSON file. Next, we create a labels.txt file containing the class names, as shown below:
#%cd {src_segmentation}/{PATH_foldername}
%cd {PATH_2segmentation}/{PATH_foldername}
!rm labels.txt  # delete the file if it already exists
!echo '__ignore__' >> labels.txt
!echo '_background_' >> labels.txt
!echo 'lane' >> labels.txt
After creating the labels.txt file, we get the results shown in Figure 3.9:
Figure 3.9: Result the name of class
The purpose of creating a txt file is to define the "lane" class name for the entire dataset.
Labeling and classification for traffic signs dataset
We used images from our own video to ensure that the model could handle a variety of traffic signs. We use a dataset of 1200 images cropped from a video of traffic signs (including left, right, stop, and straight) to prepare for the labeling and training process. A label is attached to each image.
Figure 3.10: The process flow of Labeling and classification for traffic signs dataset
Then the images are resized according to YOLO's requirements, because the Roboflow software supports exporting the images and their label txt files in the Yolov8 format.
Figure 3.11: Format to export dataset
Figure 3.12 shows some examples from the dataset:
In the Yolov8 format, each labeled image has a corresponding txt file describing the objects we labeled in it. The txt file has the following format:
Each row will be an object.
Each row will have the following format: class x center y center width height.
The coordinates of the boxes are normalized to the range [0, 1] in the format x, y, w, h, as illustrated in the short example below.
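For illustration, one label line might look like the string in this small snippet (the numbers are made up); the code simply parses it back into its five fields.

# Hypothetical Yolov8 label line: class, x_center, y_center, width, height (all in [0, 1])
line = "0 0.512 0.430 0.120 0.215"
cls, xc, yc, w, h = line.split()
print(int(cls), float(xc), float(yc), float(w), float(h))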
Figure 3.13: Parameter of ratio Frame
Train Yolov8 on Google Colab
Object detection is a subfield of deep learning. As you may be aware, deep learning contains a wide range of methods for object detection, image processing, and so on.
In comparison to other models, YOLO has been the best for identifying objects in real time; in terms of accuracy and speed, YOLO outperforms the competition. As a result, we use Yolov8 to implement object detection.
We chose Google Colab as the environment in which to train the input data. Furthermore, the core algorithm, Yolov8, builds the weight file after learning from the input data.
Figure 3.14: The process flow of Train Yolv8 on Google Colab
To start using or training custom datasets with Yolov8, it is first necessary to clone and install the Yolov8 GitHub repository to the drive. Then we need the Yolov8 weight file used to initialize training, named "yolov8n.pt". Next, we have to modify the "custom_dataset.yaml" file in Yolov8 to reflect the classes that we defined when we generated the dataset, because the standard "custom_dataset.yaml" file contains 80 classes, not the classes that we defined for our dataset.
To begin the training process, we execute the following commands:

%cd /content/drive/MyDrive/Data_Label/YoloV8
from ultralytics import YOLO

# Load a model
model = YOLO("/content/yolov8n.pt")

# Use the model
model.train(data="/content/mydataset.yaml", epochs=100, imgsz=640, batch=32)
Batch size: the training data is divided into batches of the size chosen by the user; powers of two (2^n) are typically used, for example 16, 32, or 64.
Epoch: An epoch is one traversal of the training set's data.
Data: Images are contained in the train folder.
“Best.pt” file is the one that provides superior performance.
“Last.pt” file is the model after the most recent epoch, and it can be used to restart training.
To check the accuracy of the model, we execute the following commands:

from ultralytics import YOLO
model = YOLO("/content/train/weights/best.pt")
results = model("/content/train/images/signal_12600.jpg")
cv2_display(results)
This is a Keras Sequential API-based convolutional neural network (CNN) model.
Figure 3.15: Create CNN network
Input layer: Conv2D layer with 32 filters, a kernel size of (5, 5), 'relu' activation function, and an input shape of (30, 30, 3).
Conv2D layer with 64 filters, a kernel size of (3, 3), and 'relu' activation function.
MaxPool2D layer with a pool size of (2, 2) for down sampling.
Dropout layer with a dropout rate of 0.5 to prevent overfitting.
Conv2D layer with 64 filters, a kernel size of (3, 3), and 'relu' activation function.
MaxPool2D layer with a pool size of (2, 2) for down sampling.
Dropout layer with a dropout rate of 0.5.
Flatten layer: Converts the preceding layer's output to a 1D array.
A 256-unit dense layer with a 'relu' activation function.
Dropout layer with a dropout rate of 0.5.
Output layer: Dense layer with four units (one per traffic-sign class) and a 'softmax' activation function to calculate class probabilities; a sketch of the full model in Keras follows.
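Putting the layers listed above together, a minimal Keras sketch of this classifier could look as follows; the optimizer and loss settings are assumptions, since they are not specified above.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential([
    # Convolutional feature extractor
    Conv2D(32, (5, 5), activation='relu', input_shape=(30, 30, 3)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2)),
    Dropout(0.5),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2)),
    Dropout(0.5),
    # Classifier head
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(4, activation='softmax'),   # four sign classes: left, right, stop, straight
])

# Assumed training configuration, not taken from the project
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])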
EXPERIMENT RESULTS AND DISCUSSION
Hardware implementation
Figure 4.1 depicts a self-driving car deployment model for a lane-following system.
Figure 4.1: Traffic map of the self-driving car system
The system consists of three main components: a car model with the Jetson Nano central processor and the AI-system camera mounted on top of the vehicle to perform lane and traffic sign recognition; the control system that makes the vehicle move; and, finally, black lanes with white lines and traffic signs.
A model vehicle is the result of the system design The main responsibility of the vehicle is to move to the correct position on the road after receiving the control signal from the system.
Figure 4.2: The structure layers of the self-driving car system
The vehicle has four levels: the first level contains the L298N motor driver, the RC Servo MG996 motor, and the DC motors; the second level contains the power supply block for the central processing system; the third level includes the central processing block, the Jetson Nano, and the Raspberry Pi V2 camera, which is the "brain" of the entire system; and the fourth level contains the power supply for the motors, the PCA9685 16-channel PWM driver, and the Arduino Nano. The car is powered by a lithium battery that enables it to run for several hours.
Finally, white tape is applied to the vehicle's path so that the vehicle can recognize its range of movement and adjust to the correct route. Traffic signs are placed at marked locations along the route; at each of these locations, a sign is identified so that the vehicle can properly perform the corresponding function.
System Operation
While moving on the road, the AI system collects data using the camera and image processing techniques to identify lanes and signs, and then takes the necessary action to drive the car. Figure 4.3 shows the success of the AI recognition system in classification:
Figure 4.3: The recognition of lane and traffic signs by autonomous vehicles
Applying image processing techniques to determine the next moving position, the AI system transmits signals to the control system, including angle and speed data. The angle is calculated with the PID method, and the speed is obtained from a linear function. After receiving the control signal, the control system, which includes the servo control block and the DC motor control block, uses the angle and speed values, respectively, to execute the movement.
Figure 4.4: The recognition of lane by autonomous vehicles
If the AI system recognizes a sign in the frame while moving, it will transmit the processed signal to the control system to perform that function (left, right, stop, straight).
Figure 4.5: The recognition of traffic signs by autonomous vehicles
The AI system processes and sends the control system a signal containing the angle and speed values so that the vehicle can perform the corresponding function at the same time.
Implementation
We decided to generate a lane using the tools we have: we use the camera and let the car move on the created lane. This is the outcome of the successfully generated lane.
Figure 4.6: Result of generating lane process
Some images of lanes in the dataset:
Lanes are clearly distinguished on a black background by white tape.
To perform the data labeling process, we use the LabelMe software. To get started, we upload the folder containing the data, then use Create Polygons to draw lines around the lane and give it a class name. Figure 4.8 shows the result of this first step in the labeling process.
Figure 4.8: Result of labelling each lane image
The red area is the part of the lane that needs to be labeled and is given the class name "lane".
The label file is saved in JSON format. Because the JSON file only contains the point coordinates of each segment, it is necessary to convert the coordinates of a JSON file into an image.
Furthermore, instead of converting the point coordinates to an RGB image, we convert them to binary images of 0s and 1s, which speeds up the model training process while saving space.
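A minimal sketch of that conversion is shown below, rasterizing the "lane" polygons from one LabelMe JSON file into a 0/1 mask; the file names are hypothetical.

import json
import numpy as np
from PIL import Image, ImageDraw

with open('lane_0.json') as f:                      # hypothetical annotation file
    ann = json.load(f)

mask = Image.new('L', (ann['imageWidth'], ann['imageHeight']), 0)
draw = ImageDraw.Draw(mask)
for shape in ann['shapes']:
    if shape['label'] == 'lane':
        polygon = [tuple(p) for p in shape['points']]
        draw.polygon(polygon, fill=1)               # lane pixels = 1, background = 0

np.save('lane_0_mask.npy', np.array(mask, dtype=np.uint8))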
Figure 4.9: The segmentation of image
Figure 4.9 is the result of the transformation.
The traffic sign generation process
As with the lane generation process, we also create signs and place them on the lane. We use the camera, let the car move on the lane, and collect the results shown in Figure 4.10.
Figure 4.10: Result of the traffic sign generation process
Details of some traffic signs in the dataset
The collected images are clear, as shown in Figure 4.11.
To perform the data labeling process, we use Roboflow, a computer vision development platform that provides better methods for model training, preprocessing, and data collection. To begin, we upload the data to Roboflow. For each image, a rectangle is then drawn around the identified region and assigned a class name. Figure 4.12 shows the result of this first step in the labeling process.
A red rectangle is drawn around the sign, and the "left" class is assigned to it.
Figure 4.12: Result of labelling each traffic sign image
Similarly, the classes "right", "stop", and "straight" are marked with green, orange, and blue rectangles, respectively.
Figure 4.13: Labelling of other classes
4.4 Evaluation and comparison
4.4.1 Comparison between the first 2,500-image dataset and the extra 2,000-image dataset
After first training on the 2,500-image dataset for 100 epochs, we got the results shown in Figure 4.14:
Figure 4.14: Loss and Accuracy graph of the first 2,500-image dataset with 100 epochs
The values of Accuracy and Validation Accuracy slowly converge towards 1, in the range 0.92-0.98, indicating that the model is well trained; however, the gap between Loss and Validation Loss is rather large, even though both still approach zero, lying between 0.2 and 0.05.
Figure 4.15: Results of a test image from the first 2,500-image dataset with 100 epochs
After recognition, the returned result is noisy and deviates from the prediction.
Continuing training with the extra 2,000-image dataset gives even better results, as can be seen in Figure 4.16:
Figure 4.16: Loss and Accuracy graph after the extra 2,000-image dataset - 100 epochs
The accuracy and validation accuracy values now lie in the 0.97-0.985 range. Furthermore, the Loss and Validation Loss results are greatly improved: the gap between them is small, and both converge towards zero, ending between 0.06 and 0.
Figure 4.17: Results of a test image from the extra 2,000-image dataset - 100 epochs
Table 4.1: Results of training for Lane detection model
LANE DETECTION MODEL Model Epochs Dataset Accuracy Val_Accuracy Loss Val_Loss Model size Format
Here, we apply the U-Net model to train the lane detection model. From the results in Table 4.1, we divided training into two periods, which show a clear change in coefficients such as Accuracy and Loss.
For the first roughly 2,500 images, the accuracy and validation accuracy are already quite high, but the margin between loss and validation loss is still large. With the 2,000 images added later, we can see a clear improvement: accuracy rises from 0.963 to 0.979, validation loss falls from 0.076 to 0.049, and the gap between loss and validation loss narrows to between 0.041 and 0.049.
Table 4.2: Results of deploying for Lane detection model
Model Format CUDA CPU Model size
The best results were obtained after testing the trained models. Because we converted the file from ".h5" to ".onnx", the model size decreased from 1.4M to 0.422M, making the model lighter and more suitable for deployment and operation.
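One way to perform that ".h5" to ".onnx" conversion is with the tf2onnx package, as in the sketch below; the file names and opset are assumptions for illustration, not necessarily the exact settings used in this project.

import tensorflow as tf
import tf2onnx

model = tf.keras.models.load_model('lane_unet.h5')                 # hypothetical .h5 file
spec = (tf.TensorSpec(model.inputs[0].shape, tf.float32, name='input'),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path='lane_unet.onnx')           # exported .onnx model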
4.4.2 Evaluation of the Yolo model
Figure 4.18 shows the result after training:
Figure 4.18: Result of the recognized traffic sign by Yolo
Traffic sign recognition results using model Yolo.
Table 4.3: Results of training for Yolo versions
DETECTION OF THE TRAFFIC SIGN Model Precision Recall F1_score MAP@50 MAP@50-95
From Table 4.3, we see that Yolov8n gives high results and outperforms previous-generation Yolo models in metrics such as Precision, Recall, F1_score, and mAP. In contrast, Yolov4-tiny gives a lower result, but it still exceeds the safe threshold and remains above 90% of the total allowable index.
Table 4.4: Results of deploying for Yolo versions
DEPLOY DETECTION OF TRAFFIC SIGN Model Format CUDA CPU Model size
Engine 10-11fps X 7,1M Yolov7-tiny Onnx 3.5-4fps X 23,8M
During deployment, we used the ONNX Runtime source code to load the ".onnx" models of Yolov5 through Yolov8 onto the Jetson Nano, but the FPS obtained is still quite low for applying and developing it alongside the autonomous vehicle while it is moving.
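A minimal sketch of loading an exported ".onnx" detector with ONNX Runtime, requesting the CUDA provider and falling back to the CPU, is shown below; the model path and input size are assumptions for illustration.

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "yolov8n.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)   # one blank 640x640 frame
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])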
Besides, the TensorRT source code gives better results and can increase the detection performance of the Yolo versions: Yolov4-tiny obtained the highest fps and Yolov8 the lowest. Although Yolov4-tiny gives the lowest detection accuracy among the Yolo versions, the fps obtained is quite high and stable, from 24 to 25 fps.
The result of training with CNN model:
Figure 4.19: Loss and Accuracy Graph of training with CNN model
After training for 20 epochs, the accuracy and validation accuracy values lie in the 0.97-0.99 range. Furthermore, the Loss and Validation Loss results are greatly improved: the gap between them is small, and both converge towards zero, ending between 0.08 and 0.05.
Table 4.5: Results of training for the traffic sign classification model
CLASSIFICATION Model Epochs Dataset Accuracy Val_Accuracy Loss Val_Loss Model size Format
From Table 4.5, we see that the CNN model gives a significantly high result, up to 0.993. Besides, the Loss converges to 0.
4.5.1 Run a Yolov4 and Yolov5 model by converting it to a TensorRT engine and running it on the Jetson Nano
During inference, TensorRT-based apps can outperform CPU-only systems by up to 36 times, with a response time of less than 7 ms, and they can be optimized for specific targets. As a result, developers may improve neural network models trained in all major frameworks, including PyTorch, TensorFlow, ONNX, and MATLAB, for faster inference.
CONCLUSION AND FUTURE WORKS
Conclusion
Despite numerous challenges in studying, familiarizing ourselves with, and understanding the issue, we can see that the project is feasible and can be accomplished within the time and conditions allowed for project approval. The objectives were met by this method.
The report accomplished the following tasks:
- Learn and apply the control algorithm to navigate the steering angle and speed for the vehicle.
- Use the camera in the mapping process during movement, and navigate the vehicle by following the signs.
- Integrate an embedded computer into the vehicle to reduce its size and increase its flexibility.
- Apply the knowledge learned in the subject's construction process and hardware design.
In addition, there are jobs that the report has not completed well:
- Accuracy in sharp or sudden turns is not stable.
- FPS has not been improved when deploying from Yolov5 to Yolov8 on Jetson Nano.
Future works
The current research serves as the foundation and premise for effectively building and developing the topic at a higher level, improving the system in general and making it more stable in a variety of scenarios.
To improve the stability of the self-driving car's position, we can train the model further so that the output quality is higher, allowing us to compute a more precise position for the moving car.
[1] Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do & Kaori Togashi, "Convolutional neural networks: an overview and application in radiology," 22 June 2018.
[2] Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, J. Santamaría, Mohammed A. Fadhel, Muthana Al-Amidie & Laith Farhan, "Review of deep learning: concepts, CNN architectures, challenges, applications, future directions," 31 March 2021.
[3] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," 8 June 2015.
[4] Tausif Diwan, G. Anirudh & Jitendra V. Tembhurne, "Object detection using YOLO: challenges, architectural successors, datasets and applications," 8 August 2022.
[5] Yanming Guo, Yu Liu, Theodoros Georgiou & Michael S. Lew, "A review of semantic segmentation using deep neural networks," 24 November 2017.
[6] Fude Cao, Qinghai Bao, "A Survey On Image Semantic Segmentation Methods With Convolutional Neural Network," 20 November 2020.
[7] H. Bandyopadhyay, "An Introduction to Image Segmentation: Deep Learning vs. Traditional," [Online]. Available: https://www.v7labs.com/blog/image-segmentation-guide.
[8] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," 18 May 2015.
[9] Albert Aarón Cervera-Uribe, Paul Erick Méndez-Monroy, "U19-Net: a deep learning approach for obstacle detection in self-driving cars," 7 April 2022.
[10] L. Wang, "Basics of PID Control," in PID Control System Design and Automatic Tuning using MATLAB/Simulink, Wiley-IEEE Press, 2020, pp. 1-30.
[11] Christoph Lütge, Christoph Bartneck, "Autonomous Vehicles," in An Introduction to Ethics in Robotics and AI, January 2021, pp. 83-92.
[12] Safaa Alaa Eldeen Hamza, Dr. Amin Babiker A/Nabi Mustafa, "The Common Use of Pulse Width Modulation," Faculty of Engineering, Department of Electronic Engineering, MSc in Communication and Data Networks, Sudan-Khartoum.
[13] Texas Instruments, "A Basic Guide to I2C," [Online]. Available: https://www.ti.com/lit/an/sbaa565/sbaa565.pdf.
[14] Texas Instruments, "KeyStone Architecture Universal Asynchronous Receiver/Transmitter (UART) User Guide," [Online]. Available: https://www.ti.com/lit/ug/sprugp1/sprugp1.pdf.
[15] NVIDIA, "NVIDIA Deep Learning TensorRT Documentation," [Online]. Available: https://docs.nvidia.com/deeplearning/tensorrt/.
[16] Raspberry Pi, "Raspberry Pi Camera Module 2," [Online]. Available: https://www.raspberrypi.com/products/camera-module-v2/.
[17] Arduino, "Arduino Nano," [Online]. Available: https://store.arduino.cc/products/arduino-nano.
[18] STMicroelectronics, "L298N Motor Driver Module," [Online]. Available: https://components101.com/modules/l293n-motor-driver-module.
[19] Luigi Cocco and Muhammad Aziz, "Role of Bearings in New Generation Automotive Vehicles: Powertrain," in Advanced Applications of Hydrogen and Engineering Systems in the Automotive Industry, December 4th, 2020.
[20] Bill Earl, "Adafruit PCA9685 16-Channel Servo Driver," [Online]. Available: https://learn.adafruit.com/16-channel-pwm-servo-driver?view=all.
[21] "MG996R High Torque Metal Gear Dual Ball Bearing Servo," [Online]. Available: https://www.electronicoscaldas.com/datasheet/MG996R_Tower-Pro.pdf.
[22] Ronneberger, O., Fischer, P., & Brox, T., "U-Net: Convolutional Networks for Biomedical Image Segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2015.
[23] Dillon Reis, Jordan Kupec, Jacqueline Hong, Ahmad Daoudi, "Real-Time Flying Object Detection with YOLOv8," 17 May 2023.
[24] Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," 6 July 2022.
[25] Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," 23 April 2020.
[26] Sparsh Mittal, "A Survey on Optimized Implementation of Deep Learning Models on the NVIDIA Jetson Platform," 2019.
[27] Yuxiao Zhou, Kecheng Yang, "Exploring TensorRT to Improve Real-Time Inference for Deep Learning," IEEE, 2022.
[28] Kui Wang, "Velocity and Steering Control for Automated Personal Mobility Vehicle," 2021. [Online]. Available: https://www.diva-portal.org/smash/get/diva2:1670919/FULLTEXT01.pdf.
[29] Rafsanjani, I. P. D. Wibawa, and C. Ekaputri, "Speed and steering control system for self-driving car prototype," IOP, 2019.
[30] Mehran Pakdaman, M. Mehdi Sanaatiyan, Mahdi Rezaei Ghahroudi, "A line follower robot from design to implementation: Technical issues and problems," IEEE, March 2010.
Link GitHub: https://github.com/nghungthinh/Self_driving_car