INTRODUCTION
Introduction
Self-driving cars represent a significant advancement in automotive technology, enhancing both comfort and safety for drivers. According to the Association for Safe International Road Travel (ASIRT), an average of 3,700 fatalities occur on roads daily, and 20–50 million individuals suffer nonfatal injuries each year, often resulting in permanent disabilities, primarily due to human error. By minimizing these mistakes, self-driving cars have become among the most sought-after vehicles in today's market.
Self-driving cars have long been a vision for developers, with numerous companies and individuals actively contributing to their advancement every day. We, too, are committed to leveraging our expertise to support this global effort.
Artificial intelligence (AI) is increasingly influencing daily life, with computer vision (CV) playing a significant role in digital image acquisition, processing, analysis, and recognition. Deep learning focuses on algorithms that enable computers to learn and predict like humans, with applications across various fields such as science and engineering, particularly in object detection and classification. A prominent example is the Convolutional Neural Network (CNN), which identifies patterns in images by stacking layers, making it a widely used model in computer vision and machine learning applications.
Recent advancements in algorithms and models, such as U-Net, CNN, and YOLO, have significantly enhanced road lane recognition capabilities. Consequently, our final graduation project focuses on the "Design and Implementation of a Self-Driving Car for Lane Following."
Objective
The research objective of this thesis is the design and implementation of a self-driving car that follows the lane, with the following functions:
Use Jetson Nano to process images and create signal communication through I2C and UART communication standards.
Use the Raspberry Pi V2 IMX219 8MP Camera to recognize traffic lanes and traffic signs.
Use an Arduino Nano to control the speed of the two rear wheels through L298N.
Use PCA9685 to control the steering angle of the two front wheels.
Monitor the process on the computer.
Build, design and execute hardware models in the most optimal way.
Provide model evaluation and model improvement.
The product is expected to be a vehicle model that moves along the lane and can recognize traffic signs.
Research Method
Analysis and evaluation of energy efficiency, processing speed, and performance on embedded systems of neural network models in road lane recognition.
Learn the neural network model's parameters, then build the network model to train the system to recognize road lanes.
Evaluate and analyze the functions of the system before selecting the hardware for the AI system.
Object and Scope of the study
Our study focused on understanding how to implement the project, which facilitated problem-solving. We investigated several key subjects to gain deeper insight into the matter.
Nvidia Jetson Nano Developer Kit for AI Application Deployment Hardware:
A compact yet powerful embedded computer designed for rapid execution of modern AI algorithms. This device enables the simultaneous operation of multiple neural networks while efficiently processing high-resolution sensor data.
Camera Raspberry Pi V2 IMX219 8MP
Neural Network: YOLO (You Only Look Once) is a CNN network model used to detect and identify objects.
Due to a variety of objective (financial) and subjective (limited competence) reasons, the topic content is only used under the following scope:
The design of a self-driving car utilizing Jetson Nano is currently limited to a model-level concept and has yet to be implemented in real-world applications.
The self-driving car using Jetson Nano in this project uses a Raspberry Pi camera that allows you to see the lane within 120 degrees.
The topic mainly focuses on detecting and identifying road lanes and traffic signs under good lighting conditions.
The demo vehicle is operated indoors, away from direct sunlight.
The tracking software can only be used on computers.
Research contents
During the implementation of the graduation project with the topic "Design and Implementation of a Self-Driving Car for Following Lane", we worked on and accomplished the following contents:
Content 1: Analyze the challenges of the project.
Content 2: Learn about the technical specifications, guiding thought and theoretical basis of the components of the hardware.
Content 3: Propose the model and summarize the overall system; design the block diagram and schematic diagram.
Content 4: Preprocessing data (cleaning data, generating object detection data).
Content 5: System configuration and design hardware.
Content 6: Test run, check, evaluate and adjust.
Outline
The report is structured logically so that readers can easily grasp the subject matter, methodology, and operation of the system. It is divided into the following chapters.
Chapter 1: INTRODUCTION. Presents an overview of current research on AI applications in self-driving cars, together with the objectives, objects, and scope of the study.
Chapter 2: LITERATURE REVIEW. Presents background knowledge about AI technology, image segmentation, U-Net, the PID method, and other techniques used in the project.
Chapter 3: DESIGN AND IMPLEMENTATION. Presents system requirements, block diagrams and block functions, the schematic, the hardware design for the system, and the algorithmic flowcharts.
Chapter 4: EXPERIMENT RESULTS AND DISCUSSION. Presents the results of hardware and software construction and evaluates their operation.
Chapter 5: CONCLUSIONS AND RECOMMENDATION. Presents the conclusions of the final project, stating the advantages and disadvantages of the topic, the errors made during implementation, and directions for future development.
LITERATURE REVIEW
AI Technology
A Convolutional Neural Network (CNN) is a key artificial neural network used in deep learning for image and object recognition. CNNs are essential for various applications, including image processing, computer vision tasks such as localization and segmentation, video analysis, obstacle detection in self-driving cars, and speech recognition in natural language processing. Their popularity in deep learning stems from their critical role in advancing these rapidly evolving fields.
A Convolutional Neural Network (CNN) consists of three main layers: the input layer, hidden layers, and output layer, which work together to extract features from images. Additionally, CNNs incorporate convolutional layers, pooling layers, and fully connected layers, all of which are stacked to create the network's architecture.
Figure 2.1: Architecture of CNN network
Input layer: since CNN is inspired by the ANN model, its input is an image whose pixel values this layer holds.
The convolutional layer computes the output of neurons by calculating the scalar product between their weights and the corresponding regions of the input volume, effectively determining the output based on local connections within the input data.
Pooling layer: downscales the input along its spatial dimensions, significantly lowering the number of parameters in that activation.
The fully connected layer plays a crucial role, as in an Artificial Neural Network (ANN), by generating class scores from the activations for categorization. To improve performance, the Rectified Linear Unit (ReLU) activation function is applied between these layers; ReLU applies an elementwise activation such as max(0, x) to the output of the preceding layer. This report examines both the convolutional and fully connected layers in more detail.
YOLO (You Only Look Once) is a convolutional neural network (CNN) model designed for object detection and identification. It efficiently extracts features from images through its convolutional layers, providing coordinates and labels for each detected object in a frame. While YOLO is recognized as one of the fastest algorithms for object recognition, it may not always deliver the highest accuracy compared to other models.
YOLO's primary function is to swiftly predict object labels and their coordinates, enabling it to detect multiple objects with varying labels in real time. This efficiency makes YOLO a powerful tool for object detection and classification tasks.
YOLO has been released in many versions so far, such as v1, v2, v3, v4, v5, and beyond. Each generation of YOLO has upgraded classification, optimized real-time label recognition, and extended the prediction limits for frames.
The base network in the YOLO architecture uses convolutional layers for feature extraction. In the latter part of the model, extra layers are employed to identify objects based on the feature maps generated by the base network.
The YOLO base network features a combination of convolutional and fully connected layers, allowing for diverse architectures that can be tailored to various input shapes.
Figure 2.2: YOLO network architecture diagram.
The core component, the Darknet architecture, serves as a feature extractor, generating a 7x7x1024 feature map. This feature map is then used by the additional layers to predict both the object labels and their corresponding bounding box coordinates.
Image segmentation is a process that transforms image inputs into outputs in the form of a mask or matrix. This output highlights different components, indicating the object class or instance associated with each pixel.
Image segmentation can be performed using various heuristics and high-level image properties. These features form the basis for traditional segmentation algorithms that use clustering methods, including edge detection and histogram analysis.
Traditional image segmentation techniques are fast and straightforward but often need extensive fine-tuning with manually designed heuristics for specific applications, and they may lack the precision required for complex images. In contrast, modern segmentation approaches leverage machine learning and deep learning to enhance accuracy and adaptability.
Model training enhances the ability of machine learning algorithms for image segmentation to identify key features effectively The application of deep neural networks significantly improves the accuracy and efficiency of image segmentation processes.
There are several neural network architectures and implementations that are appropriate for image segmentation. They all share the same important components (a minimal sketch combining them is given after this list):
An encoder: a series of layers that progressively extract image features using increasingly specialized filters. Often pre-trained on a related task, such as image recognition, the encoder leverages its prior knowledge to improve segmentation performance.
A decoder: a sequence of layers that progressively convert the encoder's output into a segmentation mask with the same pixel resolution as the input image.
Skip connections: multiple long-range connections that let the model combine features at different scales to improve accuracy.
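As a concrete illustration of these three components, the following minimal Keras sketch builds a small U-Net-style encoder/decoder with skip connections; the layer sizes are illustrative only and are not the exact configuration trained later in this project.

from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(128, 128, 3)):
    inputs = layers.Input(shape=input_shape)
    # Encoder: extract features while shrinking the spatial resolution
    e1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D(2)(e1)
    e2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D(2)(e2)
    # Bottleneck
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)
    # Decoder: upsample back to the input resolution
    d2 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)
    d2 = layers.Concatenate()([d2, e2])          # skip connection
    d2 = layers.Conv2D(32, 3, padding="same", activation="relu")(d2)
    d1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(d2)
    d1 = layers.Concatenate()([d1, e1])          # skip connection
    d1 = layers.Conv2D(16, 3, padding="same", activation="relu")(d1)
    # One-channel mask: each pixel is the probability of belonging to the class
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(d1)
    return Model(inputs, outputs)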
There are various ways to segment an image. The main techniques include:
Figure 2.3: Types of image segmentation
Semantic segmentation:
The pixel grouping process relies on semantic categorization, where each pixel is assigned to a specific class without considering additional context or information. This approach can result in poorly defined problem statements, particularly when multiple instances of the same category appear in one image.
PID Method
PID control, or Proportional-Integral-Derivative feedback control, is a widely used controller in industrial applications. It functions by adjusting the output to keep a process variable at a desired set point.
Set point: the set point is often a user-entered number, such as the set speed in cruise control.
Process value: the value under control is the process value.
Error: The error value is used by the PID controller to determine how to change the output to bring the process value closer to the set point.
A PID controller produces a regulated output by continuously monitoring the error value It calculates the proportional, integral, and derivative components based on this error, and then sums these three values to generate the final output.
The primary goal of the proportional term is to produce a significant immediate response in the output, bringing the process value closer to the set point. As the error diminishes, the output correspondingly reduces.
u_P(t) = K_c e(t)   (2.3), where K_c is the proportional gain and e(t) is the feedback error.
The Integral is computed by multiplying the I-Gain by the error, then multiplying this by the controller's cycle time, and continually accumulating this result as the "total integral".
u_I(t) = (K_c / τ_I) ∫ e(τ) dτ   (2.4), where e(t) = r(t) − y(t) is the error signal between the reference signal r(t) and the output y(t), K_c is the proportional gain, and τ_I is the integral time constant.
The derivative is determined by multiplying the D-Gain with the ramp rate of the process value Its primary function is to forecast the future behavior of the process value and the integral, ensuring that the controller does not surpass the set point, especially when the ramp rate is rapid.
u_D(t) = K_c τ_D de(t)/dt   (2.5), where e(t) = r(t) − y(t) is the feedback error signal between the reference signal r(t) and the output y(t), and τ_D is the derivative control gain.
In ideal form, a PID controller's output is the sum of these three terms: u(t) = K_c e(t) + (K_c / τ_I) ∫ e(τ) dτ + K_c τ_D de(t)/dt.
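Putting the three terms together, a minimal discrete-time PID controller can be sketched as follows; the gains and sample time are placeholders, not the values tuned for the vehicle in this project.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, set_point, process_value):
        error = set_point - process_value            # e(t) = r(t) - y(t)
        self.integral += error * self.dt             # accumulate the integral term
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # The output is the sum of the proportional, integral and derivative terms
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: drive a process value toward a set point of 90
controller = PID(kp=0.8, ki=0.05, kd=0.1, dt=0.05)
output = controller.update(set_point=90.0, process_value=75.0)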
The overview of a self-driving car
A self-driving car, also known as an autonomous or driverless vehicle, uses a blend of sensors, cameras, radar, and artificial intelligence (AI) to transport passengers without the need for human control. For a vehicle to be considered fully autonomous, it must navigate independently to a specific destination on conventional roads that have not been specially adapted for its operation.
AI technologies drive self-driving car systems by leveraging extensive data from image recognition, machine learning, and neural networks, enabling the development of autonomous driving capabilities.
Neural networks identify patterns in data, which are then used by machine learning algorithms. This data includes images captured by the self-driving car's cameras, from which the neural network learns to recognize essential elements of the driving environment, such as traffic lights, trees, curbs, pedestrians, and street signs.
Other techniques used in the Project
PWM [12] is a method of regulation that changes the output voltage by varying the width of the square pulse sequence, resulting in a voltage change.
PWM is an efficient way for controlling the amount of power given to a load while minimizing wasted power.
PWM (Pulse Width Modulation) is widely used in various control applications, including motor control, pulse generators, and voltage regulators. This technique effectively regulates motor speed and enhances the stability of engine performance.
The duty cycle defines the relationship between the Ton and Toff periods, indicating the percentage of time a digital signal remains "on" within a specific interval. This measurement is expressed as a percentage (%) and can be calculated using the formula below.
Duty cycle = T_on / (T_on + T_off) × 100%   (2.7)
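As a quick worked example of formula (2.7), assuming an on-time of 1.5 ms and an off-time of 18.5 ms (a typical hobby-servo pulse, not a value taken from the report):

def duty_cycle(t_on, t_off):
    # Duty cycle (%) = T_on / (T_on + T_off) x 100
    return t_on / (t_on + t_off) * 100

print(duty_cycle(1.5, 18.5))   # 7.5 (%) for a 1.5 ms pulse in a 20 ms period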
The Inter-Integrated Circuit (I2C) [13] protocol is a synchronous serial communication protocol developed by Philips Semiconductors. It allows several peripheral digital integrated circuits to communicate with microcontrollers or other peripheral devices, and it has gained popularity for short-distance communication. This protocol is commonly referred to as the Two-Wire Interface (TWI).
I2C is a two-wire serial bus capable of connecting up to 1008 peripheral devices, and it supports multi-controller systems, enabling multiple controllers to communicate with all devices on the bus.
I2C Communication Protocol uses only 2 bi-directional lines for data communication called SDA and SCL:
Serial Data (SDA) – The line on which data is sent and received.
Serial Clock (SCL) – It carries the clock signal.
Both the SCL and SDA bus lines operate in open-drain mode, and the data line may only change state while the clock line is low. This design prevents potential short circuits that could occur if one device pulls the bus high while another pulls it low.
In the I2C communication protocol, the data is transmitted in the form of packets which consist of 9 bits.
Figure 2.8: Transmitted in the form of packets of I2C
To generate START, SDA is changed from high to low while keeping SCL high To generate STOP, SDA goes from low to high while keeping SCL high.
The initial frame after the start bit is the address frame, where the master transmits the address of the desired slave to all connected slaves Each slave then checks the master's sent address against its own to identify the intended recipient.
If the address matches, it sends a low-voltage ACK bit back to the master.
If the addresses do not match, the slave does nothing and the SDA line between those two devices remains high.
The Read/Write bit determines the direction of data transfer between the master and the slave. A low Read/Write bit signifies that the master is transmitting data to the slave (write), while a high Read/Write bit indicates that the master is requesting data from the slave (read).
The ACK (Acknowledged) and NACK (Not Acknowledged) bits follow each frame. After the comparison, the receiver pulls the ACK/NACK bit to '0' to acknowledge; if it does not acknowledge, the line remains at its default value of '1'.
A data frame is consistently 8 bits in length and transmitted with the most significant bit (MSB) first Following each data frame, an ACK/NACK bit is sent to confirm successful receipt of the frame, indicated by bit 0 on the SDA line It is essential for the master or slave to receive the ACK bit before proceeding with the transmission of the next data frame.
After all data frames have been sent, the master can send a Stop condition to the slave to pause the transmission.
To STOP, the voltage changes from low to high on the SCL line before switching the voltage from low to high on the SDA line.
Master device will send a Start pulse by switching SDA and SCL from high voltage to low voltage, respectively.
In I2C communication, the master device transmits 7 or 10 address bits along with a Read/Write bit to the slave. The slave compares the received address with its own; if they match, it acknowledges by pulling the SDA line low and setting the ACK/NACK bit to '0'. If there is no match, both the SDA and ACK/NACK bits default to '1'. When the master is transmitting data to the slave, the Read/Write bit is '0'; when it is receiving, the bit is set to '1'. Upon successful transmission of a data frame, the ACK/NACK bit is set to '0', indicating to the master that it may proceed with the next operation.
Once all data has been transmitted to the slave device, the master signals the end of the transmission by sending a STOP condition. This is accomplished by letting the SCL line and then the SDA line go from low voltage to high voltage.
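As an illustration of this master-to-slave flow, the sketch below uses the Python smbus2 package to write and read one byte at a hypothetical slave at address 0x40 (the PCA9685's default address); the register, data value, and bus number are assumptions and depend on the actual wiring.

from smbus2 import SMBus

SLAVE_ADDR = 0x40       # 7-bit slave address (PCA9685 default)
REGISTER = 0x00         # hypothetical register to write
VALUE = 0x01            # hypothetical data byte

with SMBus(1) as bus:   # bus 1 is assumed; the actual bus depends on the wiring
    # START + address/W + register + data + STOP, with ACKs handled in hardware
    bus.write_byte_data(SLAVE_ADDR, REGISTER, VALUE)
    # Read it back: the master writes the register, then reads with address/R
    readback = bus.read_byte_data(SLAVE_ADDR, REGISTER)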
2.4.3 UART (Universal Asynchronous Receiver/Transmitter)
UART [14] is used for serial communication. Two wires are established: one wire is used for transmitting and the other for receiving.
UART is an asynchronous communication protocol that does not use a clock line to control the data transfer rate. To ensure successful communication, both devices must be configured to operate at the same baud rate, measured in bits per second (bps). Baud rates can vary significantly, typically ranging from 9600 bps to 115200 bps and higher.
Data transmitted by UART is organized into packets. Each packet contains 1 start bit, 5 to 9 data bits (depending on the UART), an optional parity bit, and 1 or 2 stop bits.
Figure 2.10: UART data transmitted in the form of packets
Start bit:
UART data lines maintain a high voltage state when idle. To start a transmission, the transmitting UART pulls the line from high to low for one clock cycle. The receiving UART detects this high-to-low transition and begins reading the bits of the data frame at the baud-rate frequency.
The data frame can vary in length from 5 to 8 bits when a parity bit is used, and may extend to 9 bits if no parity bit is included. Data is transmitted with the least significant bit (LSB) first.
The parity bit serves as a mechanism for the receiving UART to detect any changes in data during transmission After receiving a data frame, the UART counts the number of bits set to 1 and determines if this count is even or odd If the parity bit is 0, the count of 1s must be even, while a parity bit of 1 indicates that the count should be odd A match between the parity bit and the data confirms an error-free transmission; otherwise, it signals that the bits in the data frame have been altered.
To signal the end of a data packet (the stop bit), the sending UART drives the data line from low voltage to high voltage for a duration of at least one to two bits.
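A minimal pyserial sketch of this frame exchange, assuming the Jetson Nano's UART appears as /dev/ttyTHS1 and the Arduino Nano is configured for the same 115200 baud (both the port name and the command string are assumptions):

import serial

# 8 data bits, no parity, 1 stop bit (8N1) is pyserial's default frame format
ser = serial.Serial(port="/dev/ttyTHS1", baudrate=115200, timeout=1)

ser.write(b"A090S060\n")    # hypothetical angle/speed command for the Arduino
reply = ser.readline()      # blocks until a newline or the 1 s timeout
ser.close()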
DESIGN AND IMPLEMENTATION
Requirements of the topic
The self-driving vehicle system enhances driver safety by providing crucial support during transportation, especially in unexpected situations, thereby reducing accident risks. To achieve this, the system must adhere to specific technological requirements.
The AI system utilizes a camera and advanced algorithms to detect and recognize objects, leveraging the powerful graphics processing capabilities of the Jetson Nano platform for all computations.
To ensure continuous operation for several hours, the system must minimize power consumption. The technological requirements are subjective, as there is no universal standard for supporting equipment; this study therefore focuses on selecting relevant and suitable factors to address these needs.
Block diagram
We discuss the issue of selecting relative and suitable parameters in this design. Figure 3.1 depicts the system's block diagram.
Figure 3.1: Block of Design a self-driving car for following lane system
The self-driving car's lane-following system consists of three main components: the AI block, the DC motor control block, and the servo motor control block. The AI block uses the Jetson Nano 16GB eMMC Developer Kit and an 8MP Raspberry Pi V2 camera to detect and identify the driving lane with the U-net algorithm and traffic signs with the YOLO algorithm. The DC motor control block, managed by an Arduino Nano, drives the L298N H-bridge motor driver, which regulates vehicle speed. Finally, the servo motor control block, built around the PCA9685 16-channel PWM driver, enables precise steering control for various driving scenarios.
The self-driving car's lane-following system is divided into two primary parts: the AI block and the control block. Because the Jetson Nano already manages the intensive tasks, using its GPIO for motor control can interfere with those operations and decrease overall system performance.
For lighter tasks, such as generating control signals for the low-power motors, using the Arduino Nano and PCA9685 is more efficient than depending solely on the Jetson Nano. Additionally, the Arduino Nano and PCA9685 consume less power, enabling them to operate effectively from a battery power source.
Design option
The AI block is crucial for the lane-following system, as it handles most of the processing workload. It recognizes and detects the input lane from a real-time video stream. The following sections describe the hardware and algorithms employed by the AI block; a detailed block diagram of the AI system is shown in Figure 3.2.
Figure 3.2: The detailed block of the AI system
Jetson Nano Developer Kit eMMC [15]
To meet the demands for processing capacity and speed, selecting a board with advanced processing capabilities is essential for executing complex control algorithms. Opting for a compact integrated microprocessor can effectively minimize device size while enhancing flexibility. Currently, embedded boards are offered at various price points, providing a diverse range of processing capabilities to suit different needs.
Developing lane-following self-driving applications on NVIDIA Jetson boards is therefore an ideal solution, and the Jetson Nano stands out as a suitable choice for this purpose.
Figure 3.3: NVIDIA Jetson Nano Developer Kit eMMC
The Jetson Nano Developer Kit [15] is an AI computer that strongly supports image processing tasks such as classification and recognition thanks to its GPU.
The Jetson Nano 4GB development kit features a 64-bit quad-core ARM Cortex-A57 CPU and 4GB of LPDDR4 RAM, making it well suited for AI applications. With a 128-core integrated NVIDIA GPU delivering 472 GFLOPS, it can execute modern AI algorithms quickly. The device supports video processing up to 4K at 30 fps for encoding and 60 fps for decoding, and includes PCIe and USB 3.0 ports for expanded connectivity. This capability allows the simultaneous operation of multiple neural networks and the processing of high-resolution sensors.
Camera Raspberry Pi V2 IMX219 8MP [16]
To effectively detect and identify road lanes and traffic signs, a high-quality camera with robust identification capabilities is essential. The Raspberry Pi Camera Module V2, featuring an 8-megapixel Sony IMX219 sensor, is an ideal solution for this purpose.
The Raspberry Pi Camera Module V2 significantly improves image quality, color fidelity, and low-light performance. It supports video at 1080p30, 720p60, and VGA90, along with a dedicated photo mode. The module connects via a 15 cm cable through the CSI port.
Figure 3.4: Camera Raspberry Pi V2 IMX219 8MP
Table 3.1: Camera Raspberry Pi V2 IMX219 8MP specifications
Sensor image area 3.68 x 2.76 mm (4.6 mm diagonal)
Depth of field Approx 10 cm to ∞
Horizontal Field of View (FoV) 62.2 degrees
Vertical Field of View (FoV) 48.8 degrees
The schematic of AI System
The Jetson Nano acts as the central processing unit, receiving images from the Raspberry Pi V2 IMX219 Camera (8MP), processing the signals, and transmitting them to the output block, which consists of the Arduino Nano and PCA9685, using UART and I2C protocols.
Figure 3.5: The schematic of the AI system
The power block will provide the central processor (Jetson Nano) with a voltage of 5V.
Figure 3.6 shows all the steps performed to recognize the lanes of the road and the traffic signs, which form the processing path of the AI system.
Figure 3.6: The flowchart of recognized lanes
To begin the process, we load the model file in ONNX or TFLite format, then open the camera using GStreamer, a versatile framework for multimedia applications. Once the camera is activated, data collection begins, and each camera frame is cropped to match the training model's input requirements. The data is then processed for recognition and prediction. In real time, the system draws the predicted movements on the image. For a clearer understanding of the system's functionality, it has been divided into several distinct components.
After researching some project related documents, we decided to create our own lane.
To create a comprehensive dataset for training, we utilize two distinct materials for lane generation: a reflective tarp and black felt fabric.
Figure 3.7: The flowchart of generate the lane
To activate the Jetson Nano's camera system, we use the OpenCV 4.5.0 library along with a GStreamer pipeline. The system verifies the camera's status using the "cv2.VideoCapture()" function. Once confirmed, each frame is stored using "cv2.imwrite('lane_' + str(i) + '.jpg', frame)", saving images in jpg or png format. The total number of images is limited by the index "i".
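A minimal sketch of this capture loop, assuming a typical nvarguscamerasrc GStreamer pipeline for the IMX219 camera (the exact pipeline string and image limit used in the project may differ):

import cv2

pipeline = (
    "nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
i, MAX_IMAGES = 0, 500                    # the index i limits the total number of images

while cap.isOpened() and i < MAX_IMAGES:
    ret, frame = cap.read()               # check the camera status and grab a frame
    if not ret:
        break
    cv2.imwrite('lane_' + str(i) + '.jpg', frame)   # save each frame to storage
    i += 1

cap.release()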
Labeling and classification for lane dataset
Figure 3.8: The process flow of Labeling and classification for lane dataset
We used images extracted from the video footage for labeling. The LabelMe software was employed to classify and annotate the data through semantic segmentation. After labeling, each image's annotations were saved in JSON format. Subsequently, a labels.txt file was generated, containing the names of the classes, as outlined below.
# %cd {src_segmentation}/{PATH_foldername}
%cd {PATH_2segmentation}/{PATH_foldername}
!rm labels.txt  # delete the old file if it exists
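The snippet above only removes any old labels.txt. A minimal sketch of the step that writes the new one is shown below; the "__ignore__" and "_background_" entries are the conventional placeholders expected by the LabelMe conversion tooling (an assumption), while "lane" is the single class defined for this dataset.

# Write the class list that the segmentation conversion script expects
class_names = ["__ignore__", "_background_", "lane"]

with open("labels.txt", "w") as f:
    for name in class_names:
        f.write(name + "\n")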
After creating the labels.txt file, we get the result shown in Figure 3.9:
Figure 3.9: Result the name of class
The purpose of creating a txt file is to define the "lane" class name for the entire dataset.
Labeling and classification for traffic signs dataset
To train the model to recognize the various traffic signs, we used images extracted from our video, forming a dataset of 1,200 cropped images. This collection includes the essential signs: left, right, stop, and straight. Each image is systematically labeled to facilitate the training process.
Figure 3.10: The process flow of Labeling and classification for traffic signs dataset
Then the image size is set according to YOLO's requirements, because the Roboflow software supports exporting the labels as txt files in the Yolov8 format.
Figure 3.11: Format to export dataset
Figure 3.12 shows some examples from the dataset:
With the Yolov8 format, each image containing labeled objects has a corresponding txt file. The txt file has the following format:
Each row is one object.
Each row has the following format: class x_center y_center width height.
The coordinates of the boxes are normalized to the range [0, 1] in the form x, y, w, h.
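For example, a hypothetical label file for an image containing one sign of class index 0 and one of class index 2 could look like the two lines below (the indices and coordinate values are illustrative, not taken from the actual dataset):

0 0.482 0.537 0.210 0.305
2 0.815 0.264 0.148 0.232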
Figure 3.13: Parameters of the box ratio in a frame
Train Yolov8 on Google Colab
Object detection is a subfield of deep learning. As you may be aware, deep learning contains a wide range of methods for object detection, image processing, and so on.
"Yolo" stands out among object detection models for its real-time identification capability, offering a strong balance of accuracy and speed. Consequently, we use "Yolov8" for object detection in this implementation.
We choose Google Colab as the environment in which to train the input data. Furthermore, the core algorithm, Yolov8, builds the weight file after learning from input data.
Figure 3.14: The process flow of training Yolov8 on Google Colab
To begin training custom datasets with Yolov8, first clone and install the Yolov8 repository from GitHub. Next, obtain the Yolov8 weight file, named "yolov8n.pt", which is required for training. After that, modify the "custom_dataset.yaml" file to match the classes defined during dataset creation, as the default file includes 80 classes that may not correspond to your dataset.
To begin the training process, execute the following commands:
%cd /content/drive/MyDrive/Data_Label/YoloV8
from ultralytics import YOLO
# Load a model
model = YOLO("/content/yolov8n.pt")
# Train on the custom dataset (epochs and image size follow the 100-epoch training in Chapter 4; the batch size is reconstructed)
model.train(data="/content/mydataset.yaml", epochs=100, imgsz=640, batch=2)
Batch size: the data is divided into batches based on the size chosen by the user, typically following the power-of-two technique (for example, 16, 32, 64).
Epoch: An epoch is one traversal of the training set's data.
Data: the images contained in the train folder.
The "best.pt" file is the model that provides the best performance.
The "last.pt" file is the model after the most recent epoch, and it can be used to resume training.
To verify the model's accuracy, execute the following commands:
from ultralytics import YOLO
model = YOLO("/content/train/weights/best.pt")
results = model("/content/train/images/signal_12600.jpg")
cv2_display(results)
This is a Keras Sequential API-based convolutional neural network (CNN) model.
Figure 3.15: Create CNN network
Input layer: Conv2D layer with 32 filters, kernel size of (5, 5), 'relu' activation function, and input shape of (30, 30, 3).
Conv2D layer with 64 filters, a kernel size of (3, 3), and 'relu' activation function.
MaxPool2D layer with a pool size of (2, 2) for down sampling.
Dropout layer with a dropout rate of 0.5 to prevent overfitting.
Conv2D layer with 64 filters, a kernel size of (3, 3), and 'relu' activation function.
MaxPool2D layer with a pool size of (2, 2) for down sampling.
Dropout layer with a dropout rate of 0.5.
Flatten layer:Converts the preceding layer's output to a 1D array.
A 256-unit dense layer with a 'relu' activation function.
Dropout layer with a dropout rate of 0.5.
Output layer: Dense layer with four units (one for each traffic-sign class) and a 'softmax' activation function to compute class probabilities.
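A minimal Keras Sequential sketch of the layer stack listed above; the compilation settings (optimizer and loss) are assumptions and are not taken from the report.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential([
    # Input block: 30x30 RGB crops of the detected traffic signs
    Conv2D(32, (5, 5), activation='relu', input_shape=(30, 30, 3)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2)),
    Dropout(0.5),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2)),
    Dropout(0.5),
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(4, activation='softmax'),   # left / right / stop / straight
])

# Compilation settings are an assumption, not stated in the report
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])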
EXPERIMENT RESULTS AND DISCUSSION
Hardware implementation
Figure 4.1 depicts a self-driving car deployment model for a lane-following system.
Figure 4.1: Traffic map of the self-driving car system
The system comprises three key components: a car model equipped with a Jetson Nano central processing unit, an AI camera for lane and traffic sign recognition, and a control system that drives the vehicle. It is designed to navigate by identifying black lanes with white lines and interpreting traffic signs.
A model vehicle is designed to accurately navigate to its designated position on the road upon receiving control signals from the system This system design is crucial for ensuring the vehicle's effective movement and operational efficiency.
Figure 4.2: The structure layers of the self-driving car system
The vehicle has four tiers: the first holds the L298N motor driver, the MG996 RC servo motor, and the DC motors; the second holds the power supply block for the central processing unit; the third holds the central processing block, the Jetson Nano and Raspberry Pi V2 camera, which serves as the system's "brain"; and the fourth holds the power supply for the motors together with the PCA9685 16-channel PWM driver and the Arduino Nano. The whole setup is powered by lithium batteries, allowing the car to operate for several hours.
White tape is applied along the vehicle's path so that it can identify its range of movement and navigate accurately. Traffic signs are positioned along the route, each clearly marked so the vehicle can execute its functions effectively.
System Operation
The AI system uses the camera and image processing techniques to gather data while navigating the road, enabling it to identify lanes and traffic signs effectively. This information allows the system to make informed driving decisions. As illustrated in Figure 4.3, the AI's recognition capabilities classify road elements with considerable success.
Figure 4.3: The recognition of lane and traffic signs by autonomous vehicles
Using image processing techniques, the AI system identifies the next movement position and transmits angle and speed data to the control system. The angle is calculated using the PID method, while the speed is derived from a linear function. Upon receiving these control signals, the servo and DC motor control blocks use the angle and speed values to execute the movement.
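A rough sketch of how the angle and speed values might be produced from the predicted lane mask is given below. The centre-line computation, gains, sign convention, and the linear speed law are assumptions; the 90-degree neutral angle, the 50–120 degree steering range, and the base speed of 60 follow the description given later in this chapter.

import numpy as np

KP, KD = 0.4, 0.1            # placeholder gains, not the values tuned for the vehicle
prev_error = 0.0

def steering_and_speed(lane_mask, frame_width, dt=1 / 20):
    # lane_mask: binary U-net output; returns (angle, speed) for the control blocks
    global prev_error
    xs = np.where(lane_mask > 0)[1]                  # x coordinates of lane pixels
    lane_center = xs.mean() if xs.size else frame_width / 2
    error = lane_center - frame_width / 2            # positive when the lane is to the right

    correction = KP * error + KD * (error - prev_error) / dt
    prev_error = error

    # 90 degrees drives straight; the servo is limited to 50-120 degrees
    angle = float(np.clip(90 + correction, 50, 120))  # sign convention is an assumption
    speed = 60 - 0.5 * abs(angle - 90)                # simple linear speed law around the base speed
    return angle, speed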
Figure 4.4: The recognition of lane by autonomous vehicles
If the AI system recognizes a sign in the frame while moving, it will transmit the processed signal to the control system to perform that function (left, right, stop, straight).
Figure 4.5: The recognition of traffic signs by autonomous vehicles
The AI block then processes and sends the control system a signal containing the angle and speed values so that the vehicle performs the corresponding function at the same time.
Implementation
We used the available tools to create a lane and employed the camera to track the movement of the car along this designated path. The outcome demonstrates the successful implementation of the lane generation process.
Figure 4.6: Result of generating lane process
Some images of lanes in the dataset:
Lanes are clearly distinguished on a black background by white tape.
To initiate the data labeling process, open the data folder in the LabelMe software. Next, use the Create Polygons feature to outline the lane and assign it a class name. The outcome of this first step of the labeling process is illustrated in Figure 4.8.
Figure 4.8: Result of labelling each lane image
The red area is the part of the lane that needs to be labeled and is given the class name "lane".
A label file is saved in JSON format, which only holds the point coordinates of each segment. Consequently, it is necessary to convert all the coordinates from the JSON file into an image.
Furthermore, instead of converting the point coordinates to an RGB image, we convert them to a binary image of 0s and 1s, which speeds up the model training process while saving storage space.
Figure 4.9: The segmentation of image
Figure 4.9 is the result of the transformation.
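A minimal sketch of this conversion, assuming the standard LabelMe JSON layout with "shapes", "imageHeight", and "imageWidth" keys (file names are placeholders):

import json
import numpy as np
import cv2

def json_to_mask(json_path):
    # Rasterize the LabelMe polygons into a binary (0/1) lane mask
    with open(json_path) as f:
        data = json.load(f)

    mask = np.zeros((data["imageHeight"], data["imageWidth"]), dtype=np.uint8)
    for shape in data["shapes"]:
        if shape["label"] == "lane":
            points = np.array(shape["points"], dtype=np.int32)
            cv2.fillPoly(mask, [points], 1)      # lane pixels = 1, background = 0
    return mask

mask = json_to_mask("lane_0.json")
cv2.imwrite("lane_0_mask.png", mask * 255)       # scaled up only for visual inspection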
The Sign traffic generation process
In addition to generating lanes, we also create and position signs along the lane. By using the camera, we track the car's movement on the lane and gather results, as illustrated in Figure 4.10.
Figure 4.10: Result of generating Sign traffic process
Details of some signs traffic in the dataset
The collected images are clear, as shown in Figure 4.11.
Roboflow is an effective computer vision development platform that supports data labeling, model training, preprocessing, and data collection. Start by uploading the images to Roboflow, then draw rectangles around the identified regions and assign class names. This first step of the labeling process is illustrated in Figure 4.12.
A red rectangle is drawn around the sign, and it is labeled with the "left" class.
Figure 4.12: Result of labelling each traffic sign image
Similarly, the "right", "stop", and "straight" classes are marked with green, orange, and blue rectangles respectively.
Evaluation and comparison
4.4.1 Comparison between the 2,500-image dataset and the extra 2,000-image dataset
After first training on the 2,500-image dataset for 100 epochs, we obtained the results shown in Figure 4.14:
Figure 4.14: Loss and accuracy graph of the first 2,500-image dataset with 100 epochs
The results indicate that the model is reasonably well trained: the Accuracy and Validation Accuracy values approach 1, remaining in the range of 0.92 to 0.98. However, there is a notable discrepancy between the Loss and Validation Loss; although both approach zero, they differ significantly, ranging from about 0.2 down to 0.05.
Figure 4.15: Results of test image from 2.500 dataset first with 100 epochs
After running recognition on a test image, the returned result is noisy and deviates from the expected prediction.
Continuing training with the additional 2,000 images gives even better results, as can be seen in Figure 4.16:
Figure 4.16: Loss and Accuracy Graph of extra 2000 dataset second - 100 epochs
The Accuracy and Validation Accuracy values remain consistently high, ranging from 0.97 to 0.985. Additionally, there has been significant improvement in both the Loss and Validation Loss metrics: the divergence between them is minimal, with both converging towards zero, falling between 0.06 and 0.
Figure 4.17: Results of test image from extra 2000 dataset second - 100 epochs
Table 4.1: Results of training for Lane detection model
LANE DETECTION MODEL Model Epochs Dataset Accuracy Val_Accuracy Loss Val_Loss Model size Format
In this study, we used the U-net model to train the lane detection system. The findings presented in Table 4.1 reveal a distinct shift in the performance metrics, including Accuracy and Loss, which we divide into two periods.
With the initial 2,500 images, the Loss and Validation Loss values remain relatively high, with a significant gap between them. However, after incorporating an additional 2,000 images, a noticeable improvement in Accuracy is observed, increasing from 0.963 to 0.979. Concurrently, Validation Loss decreases from 0.076 to 0.049, while the gap between Loss and Validation Loss narrows to the range of 0.041 to 0.049.
Table 4.2: Results of deploying for Lane detection model
Model Format CUDA CPU Model size
After testing the h5 models post-training, we achieved good results. By converting the model from h5 to ONNX format, we reduced its size from 1.4MB to 0.422MB, making it lighter and more efficient to deploy and run.
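The report does not state which tool performed this conversion; the sketch below uses the tf2onnx package as one possible option, with placeholder file names and opset.

import tensorflow as tf
import tf2onnx

# Convert the trained Keras .h5 lane model to ONNX (paths and opset are placeholders)
model = tf.keras.models.load_model("unet_lane.h5")
spec = (tf.TensorSpec(model.inputs[0].shape, tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="unet_lane.onnx")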
4.4.2 Evaluation of the Yolo model
Figure 4.18 shows the result after training:
Figure 4.18: Result of the recognized traffic sign by Yolo
Traffic sign recognition results using model Yolo.
Table 4.3: Results of training for Yolo versions
DETECTION OF THE TRAFFIC SIGN Model Precision Recall F1_score MAP@50 MAP@50-95
Table 4.3 shows that Yolov8n significantly outperforms the earlier Yolo models in key metrics, including Precision, Recall, F1_score, and mAP. Conversely, while Yolov4-tiny shows lower performance, it still surpasses the safe threshold, achieving over 90% of the total allowable index.
Table 4.4: Results of deploying for Yolo versions
DEPLOY DETECTION OF TRAFFIC SIGN Model Format CUDA CPU Model size
We used the ONNX Runtime source code to deploy the Yolov5 through Yolov8 models on the Jetson Nano. However, the frames-per-second (FPS) performance remains insufficient for effective application and development with the autonomous vehicle in motion.
The TensorRT source code improves detection performance across the Yolo versions, with Yolov4-tiny achieving the highest frame rate, stable at 24 to 25 fps, despite yielding the lowest detection results among the Yolo versions. Conversely, Yolov8 produced the lowest fps results.
The result of training with CNN model:
Figure 4.19: Loss and Accuracy Graph of training with CNN model
After completing 20 epochs of training, both Accuracy and Validation Accuracy remain high, ranging between 0.90 and 0.97. Additionally, there has been significant improvement in the Loss and Validation Loss metrics, which show minimal divergence and both trend towards zero.
CLASSIFICATION Model Epochs Dataset Accuracy Val_Accuracy Loss Val_Loss Model size Format
From Table 4.5, we see that the CNN model gives a significantly high result, up to 0.993. Besides, the Loss converges to 0.
FPS expansion and improvement
4.5.1 Run the Yolov4 and Yolov5 models by converting them to the TensorRT engine and then running them on the Jetson Nano.
During inference, TensorRT-based applications can outperform CPU-only systems by up to 36 times. With response times under 7 ms and the ability to optimize for specific targets, developers can accelerate neural network models from major frameworks such as PyTorch, TensorFlow, ONNX, and MATLAB for faster inference.
Running Yolo models on the Jetson Nano without optimization results in low frame rates. To enhance performance, we convert the Yolov5 model with the TensorRT engine; using TensorRT with the Yolo model significantly improves efficiency and frame rates on the Jetson Nano.
Before starting the conversion of the Yolo model to the TensorRT engine, we have to install several libraries: Python modules, PyCUDA, Seaborn, Torch, and Torchvision.
Starting by converting the Yolo models into the TensorRT engine:
Figure 4.20: The process of converting the Yolo models into the TensorRT engine
To initiate the process, we first convert the .pt file into a .wts file. Subsequently, we use this .wts file to build the engine file, which allows us to perform inference on images with the generated engine.
We use the Yolov5n model, the most compact version of the Yolov5 series, to enhance performance by reducing complexity and optimizing frames per second (fps). Larger Yolov5 models, with their extensive parameters, significantly slow down operation and complicate the conversion to a TensorRT engine. By opting for Yolov5n, we obtain a lighter model that maintains efficiency while minimizing resource demands.
Make a build directory within yolov5. Copy the generated .wts file into the build directory, then run the command: $ cmake
Figure 4.21: Result of run the command $ cmake
After the cmake process completes, we run the $ make command. The result is shown in Figure 4.22:
Figure 4.22: Result of run the command $ make
To build the engine file we use the command: $ ./yolov5_det -s yolov5n.wts yolov5n.engine n
Figure 4.23: Result of export to yolov5n.engine
Deploy engine file on Jetson Nano by TensorRT:
Previously, we used ONNX or TFLite files for traffic sign identification; however, these formats resulted in large file sizes and heavy models, which reduced the frame rate (fps). To enhance performance, we now use an engine file that optimizes the model, significantly reducing its weight and maximizing fps.
Figure 4.24: The flowchart of Deploy engine file on Jetson Nano by TensorRT
After converting the Yolo model into a TensorRT engine, we load the ".engine" file onto the Jetson Nano. The system then recognizes valid signs appearing in the frame, extracts the bounding box, and identifies the corresponding labeled class. Finally, it displays the detection results along with the achieved frames per second (fps) before concluding the process.
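The report does not list the deployment code itself, so the following is only a rough sketch of loading such an engine with the bindings-based TensorRT Python API (TensorRT 7/8 on JetPack) and PyCUDA; it assumes the engine has a single input at binding 0 and a single output at binding 1, and the raw output still has to be decoded into boxes and classes.

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit   # creates a CUDA context on the Jetson Nano

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine file produced by "yolov5_det -s"
with open("yolov5n.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding
host_mem, dev_mem, bindings = [], [], []
for i in range(engine.num_bindings):
    size = trt.volume(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    h = cuda.pagelocked_empty(size, dtype)
    d = cuda.mem_alloc(h.nbytes)
    host_mem.append(h)
    dev_mem.append(d)
    bindings.append(int(d))

def infer(image_chw):
    # image_chw: preprocessed float32 array matching the engine's input shape
    np.copyto(host_mem[0], image_chw.ravel())
    cuda.memcpy_htod(dev_mem[0], host_mem[0])
    context.execute_v2(bindings)                 # synchronous inference
    cuda.memcpy_dtoh(host_mem[1], dev_mem[1])
    return host_mem[1]                           # raw detections to be decoded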
4.5.2 Improve the detection accuracy of Yolo.
Researchers have proposed various methods to enhance the accuracy of deploying Yolo on the Jetson Nano. After reviewing these improvement strategies, we decided to use a classification stage to boost detection accuracy. Figure 4.25 illustrates the process of combining Yolo with classification.
Figure 4.25: The flowchart of combining Yolo with Classification
We use Yolo for object detection to obtain the bounding box coordinates. Next, we crop the image based on these coordinates and apply a classification model to the cropped image. The outcome of this detection process is the class of the object along with its accuracy.
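A minimal sketch of this detect-crop-classify step; the bounding-box values are assumed to come from the Yolo detector already running on the frame, the classifier is the 30x30 CNN described in Chapter 3, and the class-name order is an assumption.

import cv2
import numpy as np

CLASS_NAMES = ["left", "right", "stop", "straight"]   # order is an assumption

def classify_detection(frame, box, classifier):
    # box: (x1, y1, x2, y2) pixel coordinates returned by the Yolo detector
    x1, y1, x2, y2 = [int(v) for v in box]
    crop = frame[y1:y2, x1:x2]                         # cut the sign out of the frame
    crop = cv2.resize(crop, (30, 30)) / 255.0          # match the CNN's input shape
    probs = classifier.predict(crop[np.newaxis].astype(np.float32), verbose=0)[0]
    return CLASS_NAMES[int(np.argmax(probs))], float(np.max(probs))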
In comparison, the detection process that uses only Yolo is presented in Figure 4.26:
Figure 4.26: The flowchart of combining Yolo without Classification
In contrast to the approach of integrating Yolo with Classification, this method first draws the Bounding Box around the detected object, followed by the detection and display of the object's class result along with its accuracy.
Table 4.6: Results of deploying detection lane and traffic signs
DEPLOY DETECTION LANE AND TRAFFIC SIGNS
In evaluating the performance of the previous Yolo versions, we focus on deploying Yolov4 and Yolov5 in our project. Both the U-net and CNN models achieve high and stable frame rates (fps) thanks to their light weight on the GPU (CUDA) and CPU. However, running the three models simultaneously leads to a significant reduction in fps, dropping from 50 to 19 with Yolov4-tiny and from 50 to 9.5 with Yolov5n.
After analyzing the collected data, we selected three models for our application: Yolov4-tiny (or Yolov4 max), U-net, and CNN, as they demonstrated good performance with frame rates of 19-20 fps when supported by a stable power source.
After improving the model, we collected some good results when running the Yolo recognition combined with classification and segmentation to identify the steering angle, as shown in Figure 4.27:
Figure 4.27: Results of the combination of Yolo, classification, and segmentation
The current results show an impressive increase in frame rate, now ranging from 18 to 19 fps, a significant improvement from the previous average of just under 10 fps. The accuracy exceeds 0.9, and the class names are displayed correctly most of the time. The implemented PID method uses two key parameters, speed and steering angle, with base values of 60 and 90 respectively. The steering angle of the wheels is configured to vary between 50 and 120 degrees, with 90 degrees corresponding to straight-line driving; angles below 90 degrees steer left, while angles above 90 degrees steer right.
The deployed models operate in a stable environment, but the actual frame rate achieved is 16-17 fps, slightly lower than the expected 18-19 fps. This decrease is attributed to various factors, including camera shake, noise, glare, false recognition, and the complexity of the control program, which can cause performance to drop significantly.
CONCLUSION AND FUTURE WORKS
Conclusion
Despite various challenges in understanding the problem, the project proved feasible within the required timeframe and conditions for approval. The chosen method successfully met the established objectives.
The report accomplished the following tasks:
- Learn and apply the control algorithm to navigate the steering angle and speed for the vehicle.
- Using cameras in the mapping process during the movement, navigate the vehicle by following the signs.
- Integrate an embedded computer into the vehicle to reduce its size and increase its flexibility.
- Apply the knowledge learned in the subject's construction process and hardware design.
In addition, there are jobs that the report has not completed well:
- Accuracy in sharp or sudden turns is not stable.
- FPS has not been improved when deploying Yolov5 through Yolov8 on the Jetson Nano.
Future works
This research establishes a solid foundation for advancing the subject matter, enabling a more robust and adaptable system across diverse scenarios.
Enhancing the stability of self-driving car positioning involves increased training efforts, which lead to higher output quality This improvement enables the calculation of a more accurate location for the moving vehicle.
[1] Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do & Kaori Togashi,
"Convolutional neural networks: an overview and application in radiology," 22 June 2018.
[2] Laith Alzubaidi, Jinglan Zhang, Amjad J Humaidi, Ayad Al-Dujaili, Ye Duan,
Omran Al-Shamma, J Santamaría, Mohammed A Fadhel, Muthana Al-Amidie & Laith Farhan, "Review of deep learning: concepts, CNN architectures, challenges, applications, future directions," 31 March 2021.
[3] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, "You Only Look
Once: Unified, Real-Time Object Detection," 8 Jun 2015.
[4] Tausif Diwan, G Anirudh & Jitendra V Tembhurne, "Object detection using
YOLO: challenges, architectural successors, datasets and applications," 08 August 2022.
[5] Yanming Guo, Yu Liu, Theodoros Georgiou & Michael S Lew, "A review of semantic segmentation using deep neural networks," 24 November 2017.
[6] Fude Cao, Qinghai Bao, "A Survey On Image Semantic Segmentation Methods
With Convolutional Neural Network," 20 November 2020.
[7] H Bandyopadhyay, "An Introduction to Image Segmentation: Deep Learning vs.
Traditional," [Online] Available: https://www.v7labs.com/blog/image-segmentation-guide.
[8] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," 2015.
[9] Albert Aarón Cervera-Uribe, Paul Erick Méndez-Monroy, "U19-Net: a deep learning approach for obstacle detection in self-driving cars," 7 April 2022.
[10] L. Wang, "Basics of PID Control," in PID Control System Design and Automatic Tuning using MATLAB/Simulink, Wiley-IEEE Press, 2020, pp. 1-30.
[11] Christoph Lütge, Christoph Bartneck, "Autonomous Vehicles," in An Introduction to Ethics in Robotics and AI, January 2021, pp. 83-92.
[12] Safaa Alaa Eldeen Hamza, Dr Amin Babiker A/Nabi Mustafa, "The Common Use of Pulse Width Modulation," Faculty of Engineering, Department of Electronic
Engineering, MSc in Communication and Data Networks, Sudan-Khartoum.
[13] Texas Instruments, "A Basic Guide to I2C," [Online]. Available: https://www.ti.com/lit/an/sbaa565/sbaa565.pdf.
[14] Texas Instruments, "KeyStone Architecture Universal Asynchronous Receiver/Transmitter (UART) User Guide," [Online]. Available: https://www.ti.com/lit/ug/sprugp1/sprugp1.pdf.
[15] NVIDIA, "NVIDIA Deep Learning TensorRT Documentation," [Online].
Available: https://docs.nvidia.com/deeplearning/tensorrt/.
[16] Raspberry Pi, "Raspberry Pi Camera Module 2," [Online] Available: https://www.raspberrypi.com/products/camera-module-v2/.
[17] Arduino, "Arduino Nano," [Online] Available: https://store.arduino.cc/products/arduino-nano.
[18] STMicroelectronics, "L298N Motor Driver Module," [Online] Available: https://components101.com/modules/l293n-motor-driver-module.
[19] Luigi Cocco and Muhammad Aziz, "Role of Bearings in New Generation
Automotive Vehicles: Powertrain," in Advanced Applications of Hydrogen and Engineering Systems in the Automotive Industry, December 4th, 2020.
[20] Bill Earl, "Adafruit PCA9685 16-Channel Servo Driver," [Online] Available: https://learn.adafruit.com/16-channel-pwm-servo-driver?view=all.
[21] "MG996R High Torque Metal Gear Dual Ball Bearing Servo," [Online] Available: https://www.electronicoscaldas.com/datasheet/MG996R_Tower-Pro.pdf.
[22] Ronneberger, O., Fischer, P., & Brox, "U-Net: Convolutional Networks for
Biomedical Image Segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2015, pp.
[23] Dillon Reis, Jordan Kupec, Jacqueline Hong, Ahmad Daoudi, "Real-Time Flying
Object Detection with YOLOv8," 17 May 2023.
[24] Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao, "YOLOv7:
Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," 6Jul 2022.
[25] Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao, "YOLOv4:
Optimal Speed and Accuracy of Object Detection," 23 Apr 2020.
[26] Sparsh Mittal, "A Survey on Optimized Implementation of Deep Learning Models on the NVIDIA Jetson Platform," 2019.
[27] Yuxiao Zhou, Kecheng Yang, "Exploring TensorRT to Improve Real-Time
Inference for Deep Learning," IEEE, 2022.
[28] KUI WANG , "Velocity and Steering Control for Automated Personal Mobility
Vehicle," 2021 [Online] Available: https://www.diva-portal.org/smash/get/diva2:1670919/FULLTEXT01.pdf.
[29] Rafsanjani, I P D Wibawa , and C Ekaputri, "Speed and steering control system for self-driving car prototype," IOP, 2019.
[30] Mehran Pakdaman; M Mehdi Sanaatiyan; Mahdi Rezaei Ghahroudi, "A line follower robot from design to implementation: Technical issues and problems,"IEEE, March 2010.
Link GitHub:https://github.com/nghungthinh/Self_driving_car