OVERVIEW
Introduction
The concept of the autonomous car was first introduced over 50 years ago, but it seemed unrealistic and existed only in science-fiction movies. The idea was not made feasible until sufficient breakthroughs were achieved in processing units, which happened in the 1980s. Nowadays, self-driving cars are developing rapidly and becoming a promising technology with multiple benefits, including increased road safety, reduced accidents, and enhanced passenger comfort. Many wealthy nations, including the United States, Japan, France, Germany, and China, have long devoted significant resources to the study and development of autonomous cars and have achieved outstanding accomplishments. As a typical example, Tesla's self-driving car has gained significant popularity since its introduction, with a growing number of users around the world and numerous enthusiasts embracing the technology's potential to revolutionize the future of transportation
Thanks to recent developments in driving control algorithms, multi-sensor techniques, and especially deep learning, autonomous vehicle models now have many opportunities to improve their level of self-driving, such as identifying terrain conditions and analyzing the scene to find a suitable trajectory. Recent years have witnessed significant development in the fields of computer vision and Deep Learning. As a result, many research efforts have been conducted and published, sharing an enormous number of ideas that resolve many disadvantages of self-driving cars
Encouraged by these positive factors, in this thesis I managed to construct a prototype golf-car-based autonomous vehicle that achieves accurate real-time navigation in several basic scenarios through the combination of multiple sensors. Additionally, my proposed system implements a lateral control strategy without support from EPS (Electronic Power Steering)
Autonomous cars, also known as self-driving cars, are a rapidly developing technology that has the potential to revolutionize transportation globally. Numerous international research groups and companies are working on autonomous car technology; notable international companies in the field include Waymo, Tesla, Baidu, and Uber. Figure 1.1 illustrates the disclosed and estimated minimum total spend on self-driving car R&D of 12 companies
Figure 1.1 Estimated spending in Self-driving R&D (2019)
In more detail, in paper [1] the authors utilized a combination of lane detection, disparity maps, and an SVM (Support Vector Machine) to keep the lane and determine the distance between the car and other vehicles. In paper [2], a comprehensive review was conducted of various studies addressing current obstacles, advanced system structures, newly emerging techniques, and fundamental features such as localization, mapping, perception, planning, and human-machine interfaces. Multi-sensor fusion was introduced in [3], which presented a simple and robust algorithm allowing their car to operate in real time using CPUs only. [4] proposes a functional model of an autonomous vehicle that can navigate specific courses, such as straight, curved, or curvilinear paths, by incorporating image processing and neural networks into its system
In Vietnam, VinAI is a well-known company that invests heavily in the development of autonomous vehicles. In recent years, VinAI has successfully developed lane and 3D object detection features with high accuracy and compatibility with Vietnam's conditions, which are still a big problem for preceding solutions in the global market. The algorithm can synthesize and analyze data from cameras, radars, maps, and vehicle sensors to make optimal control decisions on speed and steering angle for the car. Figure 1.2 shows the autonomous car of VinAI
Figure 1.2 VinAI autonomous car
In the last three years, the topic of autonomous vehicles has attracted much interest from students through many academic competitions. Many RC-car models (as in Figure 1.3) have been selected for research in graduation theses as well as in scientific research. One of the earliest approaches at HCMUTE is [5], where the authors proposed a robust end-to-end strategy for steering angle estimation of an autonomous 1/10 RC car on their custom tracks. As another example, in paper [6] the author deployed a lightweight BiSeNet for detecting the drivable area and estimating the corresponding states with a multi-sensor fusion algorithm
Figure 1.3 The RC car model of HCMC UTE students
Because designing autonomous vehicles requires expertise across several scientific disciplines, and because students lack familiarity with the field as well as financial resources, the students' self-driving prototypes are still hindered by certain restrictions.
Objectives
The main objective of this project is to construct and develop an autonomous golf-car model that can operate well on the HCMUTE campus. Accordingly, the first objective is to design and construct a controller board that uses components compatible with my system. After that, I need to research a suitable perception algorithm optimized for low-cost devices. Furthermore, this creates a pipeline for future work on operating real-life self-driving cars on actual roads.
Limitations
• In my project, the system can operate on HCMUTE campus roads only
• The car is only capable of functioning in relatively simple environments and may struggle with scenarios involving crowded areas, sudden changes, and similar challenges
• Low-cost sensors such as the GPS and IMU are affected by significant levels of noise when used in outdoor environments
• The car's performance on wide roads is limited by the constraints imposed by the camera angle
• The model operates outdoors, so it can be affected by extreme weather factors like rain, lack of light, etc.
• Due to hardware limitations, this project gives priority to algorithms that are light and efficient. Therefore, some of the more precise methods have not been employed in this thesis.
Research contents
The researched and implemented contents are illustrated as follows:
• Content 1: Research related papers and documents to find a direction for solving the thesis problems
• Content 2: Fix the mechanical and electrical problems of my prototype model
• Content 3: Design and construct an appropriate control circuit board with the relevant electronic components, such as the STM32, sensors, receiver, power supply, etc.
• Content 4: Calculate the steering strategy for wheel control
• Content 5: Collect and label data for training and testing the model
• Content 6: Build a multi-task network with a deep learning approach
• Content 7: Collect data and build a campus road map from GPS data
• Content 8: Synchronize the multitask model and GPS navigation on the processor board
• Content 9: Collect and evaluate the experimental results
Thesis summary
The structure of this thesis is arranged as follows:
This chapter introduced the topic and the related works of the research, the objectives, the limitations, and the layout of this thesis.
LITERATURE REVIEW
Self-driving car technologies
The camera is already a crucial component of the car. Today, most vehicles come with a reverse camera and curbside monitoring software to aid with parking. Every vehicle equipped with a lane departure warning system has a forward-facing camera to look for painted lane markings on the road. The same holds for autonomous vehicles: modern cars almost universally include camera technology that aids with lane recognition. A 360-degree camera is frequently used in multi-function systems to provide a panoramic picture of the area surrounding the vehicle. Moreover, Advanced Driver Assist Systems (ADAS) have recently employed many AI algorithms to assist the driver and improve driving safety. As Figure 2.1 below shows, the modern camera integrates many intelligent functions for object detection and lane keeping
Figure 2.1 Modern driverless camera performance
However, cameras still have several serious disadvantages stemming from their similarity to human eyes: they can be affected by environmental factors such as excessively high or low light levels. This risk can cause serious accidents for the user and for others. For that reason, a strategy combining multiple sensors with the main camera was considered to reduce risk while operating the product
Global Positioning System (GPS) is a navigation system that synchronizes position, velocity, and time data for land, sea, and air travel utilizing satellites, a receiver, and algorithms. GPS systems (Figure 2.2) can provide users with information about the exact location of an object anywhere in the world. However, due to the stringent laws that countries have enacted to prevent unauthorized persons from accessing satellites, this system is one-way: the user can only receive the GPS signal. Using real-time geographic information gathered from several GPS satellites, a GPS-based autonomous vehicle navigation system calculates longitude, latitude, speed, and course to assist in vehicle navigation
An IMU, or Inertial Measurement Unit, is an electronic measuring device used to determine the velocity, orientation, and gravitational acceleration of various vehicles such as airplanes, rockets, and more. The IMU is a combination of sensors including accelerometers, gyroscopes, and magnetometers; Figure 2.3 illustrates this combination in an IMU sensor. IMUs are commonly used to control the motion of manned or autonomous vehicles
The IMU is responsible for reading the measured values from the sensors (accelerometers, gyroscopes, etc.) and then sending the data to the computer, which computes and determines the current position based on velocity and time. As a result, IMUs have become widely used in many devices and means of transportation (as in Figure 2.4)
Figure 2.4 IMU applications in mobile phone and in plane
However, the fundamental drawback of IMUs is that they tend to accumulate errors over time. Because changes are continuously updated based on previously calculated values, any error, no matter how small, accumulates over time. This leads to discrepancies between the computed values and the actual values of the system. Therefore, to maintain the accuracy of the navigation system, the proposed solution is to leverage conventional modules such as the GPS, gravity sensors, velocity sensors, electronic compasses, and more; the values returned from these modules are used to refine the IMU values.
Overview of Artificial Intelligence
Artificial intelligence (AI) possesses the potential to disrupt numerous industries and has become a prominent topic in today's business landscape. However, AI is often misunderstood. While it is recognized as a transformative force for innovative companies, there are also concerns about its perceived impact on human employment and the potential upheaval of established industries [7]. The term "artificial intelligence" refers to machines that employ decision-making or computational processes that mimic human cognition. In practical terms, this entails machines comprehending information from their surrounding environment, whether it be physical space, the state of a game, or relevant data from a database
Artificial Intelligence is a broad field encompassing various sub-fields, techniques, and algorithms. The primary objective of artificial intelligence is to develop machines that exhibit the same level of intelligence as human beings. In 1956, researchers gathered at Dartmouth with the explicit aim of programming computers to emulate human behavior, marking the inception of artificial intelligence. In further elucidating the goals of AI, researchers expanded their primary objective to encompass six main goals.
These primary goals, however, do not encompass the entirety of specific AI algorithms and techniques. The following are six major groups of algorithms and techniques within the field of artificial intelligence:
• Machine Learning is the branch of artificial intelligence that equips computers with the capability to learn without explicit programming
• Search and Optimization: These algorithms, such as Gradient Descent, are employed to iteratively discover local maximum or minimum points
• Constraint Satisfaction involves finding a solution to a set of constraints that impose conditions on variables, ensuring their satisfaction
• Logical Reasoning: An instance of logical reasoning in artificial intelligence is exemplified by expert computer systems that replicate human decision-making abilities
• Probabilistic Reasoning combines the capacity of probability theory to handle uncertainty with the deductive logic's ability to leverage the structure of formal arguments
• Control Theory is a formal approach used to design controllers with verifiable properties Typically, this entails employing a system of differential equations to describe physical systems such as robots or aircraft
Thus, as Figure 2.5 below illustrates, Deep Learning is a subset of Machine Learning, which is in turn a subset of Artificial Intelligence
Figure 2.5 Relation between Deep Learning, Machine Learning and AI
Machine Learning
Machine Learning, a subset of Artificial Intelligence, operates under the overarching goal of enhancing the intelligence of computers. Unlike traditional programming approaches, Machine Learning advocates providing data to computers and allowing them to learn autonomously. The concept of computers learning on their own was initially conceived by Arthur Samuel in 1959. A pivotal breakthrough that propelled Machine Learning to the forefront of Artificial Intelligence was the advent of the internet. With the vast amount of digital information being generated, stored, and made accessible for analysis, the concept of Big Data gained prominence. Machine Learning algorithms have proven to be highly effective in harnessing the potential of this Big Data landscape
Neural Networks play a crucial role in many highly effective machine learning algorithms. They have been instrumental in enabling computers to simulate human-like thinking and comprehension. In essence, a neural network replicates the functionality of the human brain. The brain consists of interconnected neurons communicating through synapses. This concept is abstracted in the form of a graph, where nodes represent neurons and weighted edges represent synapses. Figure 2.6 provides an illustration of the structure of a biological neuron
Figure 2.6 Structure of a biological neuron
The human brain operates through vast interconnected networks of neurons, enabling us to process information and create models of the world we inhabit. Within this network, electrical inputs are transmitted through neurons, resulting in the production of outputs. A neuron collects inputs through dendrites, and if the summed value exceeds a certain threshold, the neuron fires. This firing triggers an electrical impulse that travels through the neuron's axon to the boutons, which can be connected to thousands of other neurons through synapses. The intricate wiring of these synapses, with approximately 100 billion neurons and roughly 1,000 synaptic connections each, allows our brains to process information effectively
Artificial neural networks, or neuron models, are simplified representations of biological neurons. These models capture the fundamental functionality of biological neurons and are commonly referred to as "perceptrons". As depicted in Figure 2.7, a typical perceptron consists of multiple inputs, each with an individual weight. The weighted signals are summed and passed through an activation function, which converts the input into a more useful output. Various types of activation functions exist; a simple example is the step function, which outputs 1 if the input surpasses a predetermined threshold, and 0 otherwise
Figure 2.7 Structure of artificial neurons
In more detail, the elements in Figure 2.7 are:
• Neurons: A neural network is composed of interconnected neurons, with input and output neurons representing the network's inputs and outputs, respectively Input neurons have no preceding neurons but provide outputs, while output neurons lack successor neurons but receive inputs
• Connections and Weights: Neurons in a neural network are linked through connections, with each connection assigned a weight that influences the signal's strength being transferred
• Propagation Function: The propagation function calculates a neuron's input based on the outputs of its predecessor neurons
• Learning Rule: The learning rule is a mechanism that modifies the connection weights in a neural network It helps the network achieve desired outputs for given inputs during training
• Learning Types: Several learning algorithms can be used to train artificial neural networks, each with its own advantages and disadvantages:
- Supervised Learning: This type of learning involves providing both input and desired output pairs during training By comparing the network's predicted output to the desired output, the algorithm can calculate an error and adjust the weights accordingly
- Unsupervised Learning: In unsupervised learning, only inputs are provided to the neural network The network's task is to discover patterns within the given inputs without external guidance This learning paradigm finds applications in data mining and recommendation algorithms
- Reinforcement Learning: Reinforcement learning incorporates feedback in the form of rewards based on system performance rather than target outputs The goal is to maximize the system's cumulative reward through trial-and-error exploration.
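To make the perceptron of Figure 2.7 concrete, the following is a minimal sketch of the weighted-sum-plus-step-activation computation; the weights, bias, and AND-gate example are illustrative assumptions, not values from my system:

```python
import numpy as np

def perceptron(inputs, weights, bias, threshold=0.0):
    """A minimal perceptron: weighted sum of inputs followed by a step activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 if weighted_sum > threshold else 0

# Example: a perceptron that behaves like a logical AND gate
x = np.array([1, 1])
w = np.array([0.5, 0.5])
print(perceptron(x, w, bias=-0.7))  # 1, since 0.5 + 0.5 - 0.7 > 0
```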
Deep Learning
Deep Learning represents the forefront of machine capabilities, and it is crucial for developers and business leaders to grasp its essence and functionality. This distinctive algorithmic approach has outperformed all previous benchmarks in image, text, and voice classification. Deep learning is a specific subset of Machine Learning, which is itself a specific subset of Artificial Intelligence. Deep Learning creates intelligent machines by using a specific algorithm called a Neural Network. Neural networks are constructed based on the intricate architecture of the cerebral cortex. At its core, a perceptron serves as a mathematical representation of a biological neuron. Similar to the structure of the cerebral cortex, neural networks consist of multiple interconnected layers of perceptrons. The input values, which represent our underlying data, traverse these interconnected layers, commonly referred to as hidden layers, until they converge at the output layer. The output layer generates predictions, consisting of either a single node for numerical outputs or multiple nodes for multiclass classification problems
The hidden layers within a neural network perform transformative operations on the data, gradually discerning its relationship with the target variable. Each node in the network possesses a weight, which is multiplied by its corresponding input value. Through this iterative process across multiple layers, the neural network effectively manipulates the data, enabling it to extract meaningful insights
Figure 2.8 Performance comparison between Machine Learning and Deep Learning
Deep Learning represents a specific class of algorithms that exhibit remarkable proficiency in prediction tasks. For practical purposes, Deep Learning and Neural Networks are largely interchangeable terminologies. While Machine Learning has been employed for image and text classification for many years, it faced challenges in surpassing a baseline accuracy threshold. Deep Learning has emerged as a breakthrough that allows us to overcome these limitations and achieve significant advancements in areas that were previously inaccessible
The Convolutional Neural Network (CNN) is a fundamental concept in the field of deep learning. Unlike manually crafted features, a CNN has the ability to extract features in a deep and detailed manner. Specifically, CNN layers extract features that can reveal increasingly distinct patterns within the data as a whole. The primary mechanism of a CNN is to convolve the input data with kernels of predefined shapes. This convolution leads the network to detect edges and lower-level features in earlier layers and more complex features in deeper layers of the network. CNNs are used in combination with pooling layers, and they often have fully connected layers at the end. Forward propagation is performed in a similar manner to a standard neural network, and backpropagation is used to minimize the loss function during training of the CNN
Almost all CNN models exhibit a comparable architectural structure. To attain the desired functionality, the Convolutional Neural Network processes images through multiple layers, as illustrated in Figure 2.9
As Figure 2.9 shows, a vanilla CNN consists of a convolutional layer, an activation function (ReLU layer), a pooling layer, and a fully connected layer, with the corresponding functions below (a minimal implementation sketch follows this list):
• Convolutional layer: detects features from the input
• Activation layer: mathematical equations that govern how a neural network behaves
• Pooling layer: reduces the number of weights and controls overfitting
• Fully connected layer: a standard neural network used for classification from the flattened features
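The following is a minimal PyTorch sketch of this vanilla pipeline; the channel counts, kernel sizes, and 32x32 input are illustrative assumptions, not the architecture used later in this thesis:

```python
import torch
import torch.nn as nn

class VanillaCNN(nn.Module):
    """Convolution -> ReLU -> Pooling -> Fully connected, as described above."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer: feature extraction
            nn.ReLU(),                                   # activation layer: non-linearity
            nn.MaxPool2d(2),                             # pooling layer: downsample, fewer weights
        )
        self.classifier = nn.Linear(16 * 16 * 16, num_classes)  # assumes a 32x32 RGB input

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten feature maps for the fully connected layer
        return self.classifier(x)

model = VanillaCNN()
out = model(torch.randn(1, 3, 32, 32))  # a batch with one 32x32 RGB image
print(out.shape)                        # torch.Size([1, 10])
```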
In a CNN, the convolution layer is the primary layer responsible for extracting features from input images. It employs a series of filters, typically with predefined sizes like 3x3 or 5x5, to perform convolution operations. Convolution is a mathematical operation that merges two sets of information. The output of this layer is referred to as a feature map, which represents the detected features in the input image
As an example, in Figure 2.10 a convolution is applied to a 5x5 input image using a 3x3 convolution filter to produce the corresponding feature map (activation map). The input and feature map can be visualized intuitively in Figure 2.11
Figure 2.11 The result of a convolution operation
The mathematics of the convolution operation is demonstrated in equation (2.1), where $I$ is the input image and $K$ is the filter of size $h \times w$:

$$S(i,j) = (I * K)(i,j) = \sum_{m=1}^{h} \sum_{n=1}^{w} I(i+m-1,\, j+n-1)\, K(m,n) \quad (2.1)$$
The convolution operation is carried out by moving the filter across the input. At each position, we perform element-wise matrix multiplication and then calculate the sum of the resulting values; this sum generates one entry of the feature map. The receptive field is the blue area in Figure 2.10. It is important to note that we apply multiple convolutions to the input, each utilizing a different filter and producing a unique feature map. These feature maps are then stacked together to form the final output of the convolution layer. The concept of the CNN laid the foundation for other image applications using neural networks, such as image segmentation and several different image recognition tasks
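As a sketch of equation (2.1) and the sliding-window process just described, assuming stride 1 and no padding:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (stride 1, no padding), per equation (2.1)."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            receptive_field = image[i:i + h, j:j + w]       # window under the filter
            feature_map[i, j] = np.sum(receptive_field * kernel)  # element-wise multiply, then sum
    return feature_map

image = np.random.rand(5, 5)
kernel = np.ones((3, 3))
print(conv2d(image, kernel).shape)  # (3, 3), matching the 5x5 / 3x3 example of Figure 2.10
```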
Activation functions are mathematical formulas that regulate the behavior of a neural network. Each neuron in the network is associated with an activation function, which determines whether the neuron should be activated based on its significance in the model's prediction. These functions play a crucial role in filtering out noisy data and transforming linear networks into nonlinear ones. In the past, nonlinear functions like Sigmoid and tanh were used, but it turned out that the function giving the best results in terms of neural network training speed is ReLU, which removes linearity by setting values below 0 to 0. Figure 2.12 presents the curves of the Sigmoid, Tanh, and ReLU activation functions
Figure 2.12 Curves of the Sigmoid, Tanh, and ReLU
Stride determines the displacement of the convolution filter during each iteration. By default, the stride is set to 1, meaning the filter moves one unit at a time. However, larger strides can be used to reduce the overlap between receptive fields. This has the effect of shrinking the resulting feature map, since certain locations are skipped over. The feature map becomes smaller in Figure 2.13 when a stride of 2 is applied to the convolution operation
Figure 2.13 Convolution with stride of 2
Padding is a technique used to maintain the same dimensionality after a convolution calculation by surrounding the input matrix with zeros. Figure 2.14 shows the mentioned padding technique
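A short sketch of how stride and padding change the output size, using the general relation out = floor((in + 2*padding - kernel) / stride) + 1; the 7x7 input here is an illustrative assumption:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)

# Stride 2 shrinks the feature map: floor((7 - 3) / 2) + 1 = 3, as in Figure 2.13
print(nn.Conv2d(1, 1, kernel_size=3, stride=2)(x).shape)   # torch.Size([1, 1, 3, 3])

# Zero-padding of 1 preserves the 7x7 size: (7 + 2*1 - 3) / 1 + 1 = 7, as in Figure 2.14
print(nn.Conv2d(1, 1, kernel_size=3, padding=1)(x).shape)  # torch.Size([1, 1, 7, 7])
```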
There are two main types of pooling in neural networks: Max Pooling and Average Pooling. Max pooling selects the maximum value from the pixel regions covered by the pooling kernel, while Average pooling calculates the average of all the pixels within the kernel
Max Pooling is commonly used to extract specific features, such as borders, by capturing the highest values within the regions. On the other hand, Average Pooling tends to capture more general characteristics, providing a smoother representation of the input data. The choice between Max Pooling and Average Pooling depends on the specific requirements of the task and the nature of the data being processed. If sharp features or distinct patterns are important, Max Pooling may be more suitable; if a more generalized representation or softer characteristics are desired, Average Pooling can be a better choice
In the context of image segmentation networks, I prefer using Max Pooling because I prioritize capturing specific and distinctive features, such as boundaries, to aid in the segmentation task. The max pooling process is illustrated in the figure below
After the image has passed through multiple hidden layers in a neural network, it becomes necessary to aggregate the diverse, multi-level features extracted from these layers. This aggregation allows all the extracted values to contribute collectively to the final prediction. One common method used for this purpose is the Fully Connected Layer, also known as the connection layer. This layer is computationally efficient
To create a column vector, the two-dimensional image is flattened, converting it into a one-dimensional representation. During each training iteration, this flattened result is forwarded through the network towards the final layer, and the error is then backpropagated to update the weights.
Extended Kalman Filter
The Kalman filter [16] is commonly employed to merge low-level data. When the system can be represented as a linear model and the error can be assumed to follow Gaussian noise, the recursive Kalman filter produces optimal statistical estimates. The Extended Kalman Filter (EKF) [17] is an extension of the Kalman Filter that addresses nonlinearity in system dynamics and measurements. While the traditional Kalman Filter assumes linearity, the EKF accommodates nonlinear relationships by approximating them using a technique called linearization
Similar to the Kalman Filter, the EKF represents the system's state using a mean vector and a covariance matrix. However, the EKF employs the Jacobian or a Taylor-series approximation to linearize nonlinear functions, enabling them to be handled within the framework of the Kalman Filter
By employing this linearization technique, the Extended Kalman Filter extends the applicability of the Kalman Filter to nonlinear systems, providing a way to estimate the state even in the presence of nonlinearities.
PID controller
A PID controller [18], which stands for Proportional-Integral-Derivative controller, is a feedback mechanism commonly employed in industrial control systems and various other applications that demand continuous and adjustable control. The PID controller constantly computes an error value by comparing a desired setpoint with a measured process variable. It then applies a correction, taking into account the Proportional, Integral, and Derivative terms (represented as P, I, and D, respectively), in order to generate an appropriate control value that brings the system to the desired state. The closed-loop control of the PID controller is shown in Figure 2.24
The full equation of the PID algorithm is:

$$u(t) = K_p e(t) + K_i \int_{0}^{t} e(\tau)\, d\tau + K_d \frac{de(t)}{dt}$$

where:
• $K_p$ is the proportional gain, a tuning parameter
• $K_i$ is the integral gain, a tuning parameter
• $K_d$ is the derivative gain, a tuning parameter
• $e(t) = SP - PV(t)$ is the error between the Setpoint (SP) and the Process Variable (PV)
• $t$ is the time or the instantaneous time.
A minimal implementation sketch follows the list above.
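This is a minimal discrete-time sketch of the controller above; the gains and sampling period are hypothetical and would have to be tuned for the actual plant:

```python
class PIDController:
    """Discrete PID: u_k = Kp*e_k + Ki*sum(e_j)*dt + Kd*(e_k - e_{k-1})/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, process_variable):
        error = setpoint - process_variable            # e(t) = SP - PV(t)
        self.integral += error * self.dt               # accumulate the integral term
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical gains for illustration; in practice they are tuned for the plant at hand
pid = PIDController(kp=1.2, ki=0.05, kd=0.1, dt=0.02)
control = pid.update(setpoint=0.0, process_variable=0.35)
```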
PyTorch framework
PyTorch is a machine learning framework built on the Torch library, widely utilized for tasks like computer vision and natural language processing [19]. It operates under an open-source license, resulting in a large and active community. While Google's TensorFlow is a prominent ML/DL framework with a dedicated following, PyTorch has gained significant traction due to its dynamic-graph approach and flexible debugging capabilities [20]. The core data structure in PyTorch is called a tensor, which resembles a multi-dimensional array with elements of the same data type, similar to NumPy arrays
PyTorch encompasses various modules designed to facilitate the training process, as illustrated in Figure 2.25. These modules provide support to users, making training procedures more accessible across different hardware platforms
Figure 2.25 PyTorch Deep Learning framework
Table 2-3 lists commonly utilized modules across various phases such as dataset loading and model configuration:
• torch.nn: defines basic blocks such as layers in models
• torch.optim: defines optimizers for Deep Learning/Machine Learning
• torch.utils: commonly used for loading datasets with different loaders
• torch.autograd: PyTorch's automatic differentiation engine that powers neural network training
• torch.backward: computes the gradient of the current tensor
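The following is a tiny training step tying several of these modules together; the linear model and random data are placeholders for illustration only:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                      # torch.nn: a basic model building block
optimizer = optim.Adam(model.parameters())   # torch.optim: the optimizer
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))  # placeholder batch
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()    # torch.autograd computes the gradients
optimizer.step()   # update the weights
```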
TensorRT
TensorRT [21] is a specialized NVIDIA library developed to enhance inference speed and reduce latency on NVIDIA GPUs. It aims to boost inference performance by 2-4 times compared to real-time services and up to 30 times compared to CPU performance. TensorRT employs five optimization techniques to optimize and accelerate inference, as demonstrated in Figure 2.26
For more detail, each stage can be presented as follows:
• Precision calibration: During engine building, TensorRT converts parameters and activations from FP32 (32-bit floating point) precision to lower precisions like FP16 or INT8. This optimization reduces memory usage and improves inference speed, although it may slightly impact model accuracy. In real-time recognition scenarios, finding the right balance between accuracy and inference speed becomes crucial
• Layer and Tensor Fusion: TensorRT combines nodes vertically, horizontally, or both to fuse layers and tensors together during model optimization, minimizing GPU memory bandwidth usage and further enhancing inference speed
• Kernel Auto-Tuning: TensorRT selects the best kernels and parameters for the target GPU platform from various optimized implementations, tailoring execution to the specific hardware
• Dynamic Tensor Memory: TensorRT allocates memory for each tensor only as needed and encourages memory reuse, reducing memory footprint and resulting in more efficient memory utilization
• Multi-Stream Execution: TensorRT enables simultaneous processing of multiple input streams, allowing for concurrent execution and boosting overall inference performance.
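As a sketch of how a PyTorch model is typically handed to TensorRT, under the assumption of an ONNX-based workflow (the model, file names, and input shape here are placeholders, not my deployed network):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())  # stand-in for a trained network
model.eval()
dummy = torch.randn(1, 3, 480, 640)  # assumed camera-frame input shape
torch.onnx.export(model, dummy, "model.onnx", opset_version=11)

# The ONNX file can then be compiled into a TensorRT engine, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.trt --fp16
```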
Multi-Threading
In computer systems, the term "multithreading" [22] refers to the execution of a program that utilizes more than one thread. At a minimum, a program requires one thread to execute. In modern computers, which typically have multiple CPU cores, the threads in a program are scheduled and executed on these cores by the operating system's scheduler. Multithreading offers several advantages, including full utilization of CPU resources and an improved user experience. Multithreading is illustrated in Figure 2.27
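Below is a minimal sketch of multithreading in Python; the two sensor-reading tasks and their periods are hypothetical, but they show how threads let slow reads proceed concurrently:

```python
import threading
import time

def read_sensor(name, period):
    """Each sensor runs in its own thread so reads do not block one another."""
    for _ in range(3):
        print(f"{name}: new reading")
        time.sleep(period)  # stands in for a blocking sensor read

threads = [
    threading.Thread(target=read_sensor, args=("GPS", 0.5)),
    threading.Thread(target=read_sensor, args=("IMU", 0.1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both threads to finish
```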
ASSEMBLE THE HARDWARE
Overall system
This section outlines the roles of and interdependencies between the different hardware components and their respective functions. The whole system is illustrated by the block diagram in Figure 3.1
The functions of each component are described as follows:
• Sensors block: includes 4 main sensors: a GPS module, an IMU module, a camera, and an encoder. This block collects environmental information and returns it to the embedded computer and the laptop
• Processors: include the embedded computer, an Nvidia Jetson TX2 board, and my laptop. These processors receive data from external devices, perform processing operations, and transmit signals to the actuators
• Controller circuit board: includes one STM32 microprocessor as the central controller. It has two main functions: controlling the steering angle and the car's speed
• Actuator: this module fulfills the demands of the microcontrollers
• Controlling device: this module is used to select the driving mode between manual and automatic. Additionally, it serves as the steering control in manual mode.
DESIGN AND CALCULATION
Multi-task road lane detection
In this section, I introduce the proposed network architecture for multi-task learning in detail. I discuss how an efficient feed-forward network is implemented to accomplish the tasks of two different heads, Scene Parsing and Lane Detection, and how their fusion is handled. My perception architecture is presented in Figure 4.3; it includes a shared encoder backbone and two subsequent decoders solving the specific assignments. The path-planning procedure utilizes Ultra-Fast Lane Detection (UFLD) [24] along with the RANSAC [23] technique as the primary path estimator. The Scene Parsing head, using semantic segmentation, provides an analysis related to the surroundings
My network shares one encoder between the two branches, which can be called the backbone network. Usually, the backbone consists of the convolutional and pooling layers of a classification network. The backbone's goal is to extract rich abstract features from input images. My suggested network uses ResNet-18, which has the ability to resolve the vanishing-gradient problem during training, as its backbone. The architecture of ResNet-18 was introduced above; ResNet-18 has a limited number of layers, resulting in efficiency, quick processing, and restricted memory usage
In the lane detection head (Figure 4.4), lane-line detection (LD) is an indispensable part of keeping the autonomous car stable. The lane detection model was developed in [24] to reduce the computational burden and the no-visual-clue problem thanks to its structural loss and row-anchor-based selection. UFLD estimates the position of lanes on predefined row anchors instead of over all image pixels, as in the segmentation approach, optimizing inference time while maintaining accuracy
The formulation for the predicted corresponding lane may be inferred as:

$$T_{i,n} = \mathrm{softmax}\left(F_{i,n}(X)\right)$$

in which $T_{i,n}$ is the probability of selecting the $i$-th lane at the $n$-th row anchor from the classification model $F_{i,n}$ with the global feature $X$
In more detail, Qin et al. implemented a cross-entropy loss over $T_{i,n}$ and the one-hot label $Y_{i,n}$ to optimize the row-based selection method, with $K$ and $N$ the number of lane lines and the number of predefined row anchors, respectively. The formulation can be written as follows:

$$L_{cf} = \sum_{i=1}^{K} \sum_{n=1}^{N} \mathrm{CE}\left(T_{i,n},\, Y_{i,n}\right)$$
In [24] the researchers also proposed another loss function, called the structural loss. Its primary goal is to create a connection between all predetermined rows. The proposed structural loss includes two sub-functions, $L_{sim}$ and $L_{shape}$, where $L_{sim}$ is the similarity loss, $L_{shape}$ is the loss for the lane shape, and $\lambda$ is the loss coefficient used for determining whether the lane is straight or not:

$$L_{structural} = L_{sim} + \lambda L_{shape}$$
Furthermore, semantic segmentation is exploited as an auxiliary branch to learn context information from the different feature-map sizes of the backbone. The optimization function $L_{aux\_seg}$ for the auxiliary segmentation branch uses cross-entropy loss, as Qin et al. [24] recommended. This auxiliary segmentation task, which is only active during training, utilizes multi-scale features to model local features
Therefore, the lane-line detection model gains a considerable advantage over lane-line segmentation, for which the steering-angle calculation algorithms are far more complicated. The overall loss function for the UFLD method, with $\alpha$ and $\beta$ as loss factors, can be written as follows:

$$L_{lane} = L_{cf} + \alpha L_{structural} + \beta L_{aux\_seg}$$
When implementing a path estimator for determining the appropriate steering angle in an outdoor environment, it is important to have robust strategies in place that can maintain the middle point of the vehicle's path, particularly when facing noisy data. So, in this part, I utilize the Random Sample Consensus (RANSAC) algorithm to fit the center points for the mentioned mission. This model can be used as a straight-line estimator in 2D space by employing a voting mechanism to determine the best-fitting result given a dataset with both inliers and outliers. The outliers are removed, and the inliers are kept for computing the corresponding steering angle in the lane-keeping scenario
Input: data – a set of center points
Output: BestCenter – the best-fitted set of center points (or null if no good model is found)

k ← maximum number of iterations allowed in the algorithm
t ← threshold value to determine data points that fit well with the model
n ← best number of close data points found so far

Begin
    n = 0
    for iterations < k do
        randomIndex = randomly select an index in range from 0 to length(data) − 2
        consensus_set = [data[randomIndex], data[randomIndex + 1]]
        x = [data[randomIndex][0], data[randomIndex + 1][0]]
        y = [data[randomIndex][1], data[randomIndex + 1][1]]
        fit = fit_line_through_points(x, y)     /* Fit a line through the two sampled points */
        coeff = get_coefficient(fit)            /* Slope of the fitted line */
        intercept = get_intercept(fit)          /* y-intercept of the fitted line */
        for each point in data do
            distance = distance from point to the fitted line
            if distance < t then
                add point to consensus_set
            end if
        end for
        num_points = length of consensus_set
        if num_points > n then
            n = num_points
            BestCenter = consensus_set
        end if
        increment iterations
    end for
    return BestCenter
End
One of the key functions of driverless automobiles is Scene Parsing (SP), which is produced through the semantic segmentation technique. The fundamental function of this method is to classify pixels throughout the picture while analyzing the semantics of various areas of the image. After passing through the ResNet-18 layers, the extracted feature map is fed to a new version of the Atrous Spatial Pyramid Pooling (ASPP) module called Deeper Atrous Spatial Pyramid Pooling (DASPP), which was inspired by [25]. The ASPP module was built to enhance learning in a global or local context thanks to the different dilation rates of its receptive fields. Because of its effective performance, the decoder architecture presented in this thesis is inspired by the approach conducted in reference [26]. Additionally, to reduce the depth of the features for computational efficiency and to enhance richer geometrical information, I concatenate the low-level feature map from the ResNet backbone with the output of the ASPP module. Due to the large number of channels, a 1x1 convolution is utilized to maintain the width and height of the corresponding feature maps. The output segmentation is then created by applying three more 3x3 convolutions and an upsampling approach
In this work, I utilize cross-entropy loss as a function that helps minimize the disparity between the predicted and true segmentation maps. The equation can be clarified as below:

$$L_{scene} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} Q_{i,j} \log\left(P_{i,j}\right)$$

in which $C$ is the number of classes, $N$ the number of samples, $Q_{i,j}$ is the $j$-th entry of the one-hot encoded label from the dataset, and $P_{i,j}$ is the predicted probability for segmentation class $j$
In conclusion, the overall loss function for my architecture, a weighted sum of the two branches above, can be presented as follows:

$$L_{total} = \omega_{lane} L_{lane} + \omega_{scene} L_{scene}$$

where $\omega_{lane}$ and $\omega_{scene}$ are the corresponding total loss weights for the Lane detection head and the Scene parsing head in my proposed method. $\omega_{lane}$ and $\omega_{scene}$ can be adjusted to achieve an equilibrium state across all stages of the loss function
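The following is a short sketch of how this weighted sum is formed in PyTorch; the branch losses and weights here are placeholder values, not the tuned ones:

```python
import torch

loss_lane = torch.tensor(0.8, requires_grad=True)   # placeholder lane-head loss
loss_scene = torch.tensor(0.3, requires_grad=True)  # placeholder scene-head loss

w_lane, w_scene = 1.0, 0.5                          # illustrative weights, not tuned values
loss_total = w_lane * loss_lane + w_scene * loss_scene
loss_total.backward()                               # one backward pass trains both heads jointly
```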
Starting from the baseline of the scene parsing head, whose structure was described in the section above, I decided to sequentially integrate the lightweight self-attention mechanism Convolutional Block Attention Module (CBAM) [27] and the Cross Stage Partial Network (CSP) [28] block
Figure 4.6 Convolutional Block Attention Module with two submodules
Driving strategy
In this thesis, I utilize the method in paper [30] as my steering-wheel strategy. Most modern cars now come equipped with Electronic Power Steering Support (EPSS), which allows more flexibility in adjusting the steering angle using advanced algorithms. The additional momentum provided by EPSS enables the steering angle to adapt to significant changes. However, in my specific situation, I faced limitations in mechanical capability that prevented the steering system from meeting the required adjustments. Consequently, my system could not handle consecutive large adjustments by calculating the steering angle solely through image processing. To address this issue, I adopted a strategy that utilizes a DC servo and an absolute encoder to maintain the stability of my vehicle. The proposed method is implemented in the system depicted in Figure 4.12. My steering system incorporates a 1:11 gearbox, which assists the servo in achieving the necessary momentum. The gearbox is directly connected to the absolute encoder to measure the rotation. Additionally, the steering gearbox has a 1:12 ratio, enabling direct control of the steering angle; it serves the purpose of converting rotational force into horizontal control. The algorithm was created to adapt to the ongoing changes in the previous processing phase. Its objective is to enhance the seamless synchronization between two input signals: the absolute encoder and the steering angle obtained from the central processing device. The driving strategy is illustrated in the algorithm below
Figure 4.12 Detailed parameters of the proposed car model
Input: steering angle from image processing
Output: the pulse for the DC servo

α_s ← steering angle from image processing
β_e ← the recent angle of the encoder
μ ← the final pulse
ψ ← threshold for steering
M(δ_p, β_e): the function that estimates the recent angle from the difference between the previous and the recent encoder readings
Servo direction (0 is right-to-left rotation and 1 is the reverse)

Begin
    θ_r = M(δ_p, β_e)    /* Define the recent wheeling angle */
    if abs(α_s − θ_r) > ψ do:
        if (α_s − θ_r) > 0 do:
The primary concept behind Algorithm 3 is to divide the total rotation distance required by the servo and enable the car to respond to the environment with greater flexibility. Additionally, in various situations, we can adjust a parameter, denoted "k", within Algorithm 3 to accommodate variations in the surrounding objects, for instance for obstacle-avoidance purposes. To clarify, equation (4.10) demonstrates the ratio between the pulse μ computed by the proposed algorithm and the corresponding physical angle ω:
In Section 4.1, I first introduced the multitask network used for the lane keeping and scene parsing cases. From both cases, the middle point and the offset relative to the center of the car can be calculated
Figure 4.13 Calculate offset in Lane keeping scenario
In the lane-keeping scenario, after determining the point lists of the right lane and the middle lane, I calculate the average points between the two lists. To remove the outliers and keep the inliers in the list of average points, I apply the RANSAC algorithm mentioned in the refining-strategy section. After that, the middle point of the road can be calculated as follows, with $N$ the total number of inliers:

$$x_{middle} = \frac{1}{N} \sum_{i=1}^{N} x_i \quad (4.11)$$
So, the offset $x_{offset}$ can be calculated by the following:

$$x_{offset} = x_{car} - x_{middle} \quad (4.12)$$
Following Section 4.1.7, in case the car faces obstacles like cars or pedestrians, the multitask model switches to the segmentation-based controller, so the car can steer toward the drivable area. For estimating the drivable area, or the offset location that the vehicle should head to, the "7-point distance matrix" algorithm is applied. In detail, the complete set of coordinates (x, y) is obtained from the points where the road contour intersects seven lines on both the left and right sides. The initial line is positioned at the vertical midpoint of the frame, and each subsequent line is spaced apart by an angle of 10 degrees. Figure 4.14 depicts this method step by step
Figure 4.14 Average offset point from segmented image
After the corresponding intersections are found, the middle point used for the offset $x_{offset}$, the error of the vehicle, is calculated from frame to frame with the following equation:

$$x_{middle} = \frac{1}{7} \sum_{i=1}^{7} \frac{m_{i,1} + m_{i,2}}{2} \quad (4.13)$$

where $m_{i,j}$ is the (7x2) matrix containing the extracted coordinates of the intersection points from both the left and right sides of the road contour
Then, the $x_{offset}$ can be calculated as follows:

$$x_{offset} = x_{car} - x_{middle} \quad (4.14)$$
An algorithm is needed to calculate the steering angle from the corresponding offset. For that reason, a PID controller was applied to calculate the desired angle from the offset in each scenario. Proportional-Integral-Derivative (PID) control plays the most vital role during running time; it is classical but works well for this task
In more detail, Figure 4.15 describes the block diagram of the PID controller for the image-based angle in a feedback loop
Figure 4.15 Block diagram of PID controller for vehicle angle in continuous time
In continuous time, the overall control function is:

$$u(t) = K_p e(t) + K_i \int_{0}^{t} e(\tau)\, d\tau + K_d \frac{de(t)}{dt}$$

In discrete time, the equation follows below:

$$u_k = K_p e_k + K_i \sum_{j=0}^{k} e_j \Delta t + K_d \frac{e_k - e_{k-1}}{\Delta t}$$

where:
• $e$ is the error between the target input and the feedback value of the system
• $t$ is the time or instantaneous time
• $\tau$ is the variable of integration, taking values from time 0 to the present $t$.
Overall, the block diagram of the PID for the steering angle is presented in Figure 4.16 below. In detail, the input of my PID controller is the value of $X_{offset}$. In the lane-keeping case, the value of $X_{offset}$ is inferred as in Section 4.2.2.1. In the other case, with obstacles appearing in the frame, $X_{offset}$ is obtained as in Section 4.2.2.2
Figure 4.16 Block diagram of PID for steering angle
GPS processing
In this thesis, GPS sensors were integrated with the purpose of accurately determining the position of my ego car. In detail, the GPS sensors were used to build a UTE campus map. Additionally, an Extended Kalman Filter was utilized to enhance the real-time performance of the GPS modules
OpenStreetMap (OSM) [29] is a collaborative initiative that aims to develop a free, editable global geographic database. The geodata underpinning the maps is regarded as the project's principal product. The inception and expansion of OSM were spurred by restrictions on the usage or availability of map data throughout most of the world, as well as by the introduction of low-cost portable satellite navigation systems. OSM data may be utilized in a variety of ways, including the creation of print and electronic maps, the geocoding of addresses and place names, and route planning
In this experiment, the campus road map was constructed using three different low-cost GPS sensors: a GNSS LC86L, a Beitian BN-880, and a NEO-8. The road map extends from block C, passes block A, and stops at the parking lot. Figure 4.17 shows the corresponding campus road map
Figure 4.17 The campus road map
To convert from the global frame (Lat, Lon) to a simple plane (X, Y) for easier calculation, I adopt the equirectangular projection method
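A minimal sketch of the equirectangular projection, assuming coordinates in degrees and a spherical Earth; the sample coordinates are hypothetical:

```python
import math

EARTH_RADIUS = 6_371_000  # metres, mean Earth radius

def equirectangular(lat, lon, lat0, lon0):
    """Project (lat, lon) in degrees onto a local (x, y) plane in metres.

    (lat0, lon0) is the reference origin; the cos(lat0) factor corrects
    for meridians converging away from the equator.
    """
    x = EARTH_RADIUS * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS * math.radians(lat - lat0)
    return x, y

# Example: a point a short distance from a reference origin (coordinates hypothetical)
print(equirectangular(10.8507, 106.7720, 10.8506, 106.7719))
```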
To begin with, I combined data sourced from a triad of GPS sensors in order to enhance the overall reliability of the GPS signal in the whole system. In particular, I have one GPS (Quectel LC86L) integrated into my printed board, with the highest accuracy but unstable readings, and another two modules (GPS NEO-8 and GPS Beitian) which are less accurate but more stable
At this stage, a Linear Kalman Filter (LKF) and a weighting technique were employed to combine these three signals. As in a normal linear model, I fed the algorithm the extracted longitude and latitude readings. The algorithm then iterates over each GPS sensor and uses the Kalman filter to estimate the state of the system for each sensor. Inside the loop, it constructs a measurement vector $[latitude_i, longitude_i],\ i = 1, 2, 3$ containing the combined latitude and longitude data from the mentioned sensors. The observation matrix $H$ is simply a 2x4 matrix with ones in the diagonal elements and zeros elsewhere. The measurement noise covariance matrix is provided as input to the Kalman filter
After obtaining the refined results from the LKF, a weighted-averaging technique was applied in order to assign more importance to the more trusted sensor, multiplying its latitude and longitude estimates by a larger weight factor during the Kalman filter update phase. The reason for this step is that I wanted to leverage the mutual compensation of the three GPS sensors to improve the stability of the final GPS signal. The formula used to calculate the fused latitude and longitude estimates is as follows:

$$\left(\widehat{lat}, \widehat{lon}\right) = \sum_{i=1}^{3} w_i \left(lat_i, lon_i\right), \qquad \sum_{i=1}^{3} w_i = 1$$

where $(lat_i, lon_i)$ is the raw GPS data from the three GPS sensors and $w = (w_1, w_2, w_3)$ is a weight vector with the function of tuning how much each sensor contributes to the fused data
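This is a sketch of the weighted-averaging step under the formula above; the readings and weights are illustrative placeholders, not my calibrated values:

```python
import numpy as np

def fuse_gps(readings, weights):
    """Weighted average of per-sensor (lat, lon) estimates after Kalman filtering.

    `readings` is a (3, 2) array of [lat, lon] rows, one per sensor; the weights
    are normalized so the fused result stays a valid coordinate.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.average(readings, axis=0, weights=weights)

readings = np.array([[10.85070, 106.77200],   # LC86L (accurate but unstable)
                     [10.85068, 106.77195],   # NEO-8
                     [10.85072, 106.77203]])  # Beitian BN-880
print(fuse_gps(readings, weights=[0.5, 0.25, 0.25]))  # illustrative weights
```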
In this subsection, I first present the dynamic model for the autonomous car in a 2D plane in equation (4.19). The model takes into consideration the vehicle's velocity and turning capacity to depict the movement and location of the vehicle:

$$x_{t+1} = x_t + v_t \cos(\theta_t)\,\Delta t, \qquad y_{t+1} = y_t + v_t \sin(\theta_t)\,\Delta t \quad (4.19)$$

where the state vector $x_i$ collects the position $(x_t, y_t)$, the heading $\theta_t$, and the velocity $v_t$ of the vehicle
Then, the implementation of the Extended Kalman Filter (EKF) is described, to make the localization of the four-wheel self-driving car on my GPS map precise. The Kalman filter applies Bayesian estimation in order to estimate the states of systems that are subject to noise, measurement error, and other uncertainties. However, it cannot handle nonlinear systems and non-Gaussian noise; the Extended Kalman Filter was designed to resolve that problem. The system on which the Extended Kalman Filter is used evolves according to the process:

$$x_t = f(x_{t-1}, u_t) + w_t \quad (4.20)$$

where $x_t$ consists of the estimated states presented in equation (4.19), $u_t$ is the system control input, and $w_t$ is the process noise
Equation (4.21) models the measurement of parameters:

$$z_t = h(x_t) + v_t \quad (4.21)$$

in which $z_t$ is the measurement vector, whose shape depends on the number of measurements and variables, and $v_t$ is the measurement noise
In the navigation system of a self-driving car, signals exhibit nonlinear rather than linear behavior. As a result, to estimate nonlinear processes and measurements, the first order of the Taylor series is employed for both the process and measurement equations. The accuracy of the Taylor estimation is determined by its proximity to the working point, as depicted in Figure 4.18 and in equation (4.22)
$A$ and $W$ are the Jacobian matrices of the partial derivatives of $f$ with respect to $x$ and $w$, respectively; $H$ and $V$ are the Jacobian matrices of the partial derivatives of $h$ with respect to $x$ and $v$, respectively. There are two steps to the Kalman filter: prediction and correction. The standard Kalman prediction step is:

$$\hat{x}_t^- = f(\hat{x}_{t-1}, u_t), \qquad P_t^- = A_t P_{t-1} A_t^T + W_t Q_{t-1} W_t^T \quad (4.23)$$
$P_t$ is the covariance matrix linked to the prediction of the state-vector parameters, while $Q_t$ is the process noise covariance matrix. The measurement update formulas are as follows:

$$K_t = P_t^- H_t^T \left( H_t P_t^- H_t^T + V_t R_t V_t^T \right)^{-1}$$
$$\hat{x}_t = \hat{x}_t^- + K_t \left( z_t - h(\hat{x}_t^-) \right)$$
$$P_t = \left( I - K_t H_t \right) P_t^- \quad (4.24)$$
The flowchart for the real-time EKF is shown in Figure 4.19 below
Figure 4.19 Block diagram of real time EKF
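Below is a compact sketch of one EKF predict/correct cycle matching the equations above; the process and measurement models and their Jacobians are passed in as assumptions rather than fixed to my vehicle model:

```python
import numpy as np

def ekf_step(x, P, u, z, f, h, A, H, Q, R):
    """One predict/correct cycle of the EKF.

    f, h are the nonlinear process/measurement models; A, H are their
    Jacobians evaluated at the current estimate (the linearization step).
    """
    # Prediction, per equation (4.23)
    x_pred = f(x, u)
    P_pred = A @ P @ A.T + Q
    # Correction, per equation (4.24)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)  # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In practice the Jacobians A and H would be recomputed at every step around the latest estimate, which is what distinguishes the EKF from the linear filter.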
After applying the EKF to the GPS data, I obtained data with reasonable accuracy. However, because the enormous GPS noise cannot be removed absolutely, I utilize another method, called "circular position checking", to deal with the proposed problem. First of all, the map data has been collected and refined following the strategy in Section 4.3.2, so I can check the corresponding destination. The fundamental concept of this algorithm involves establishing a circular equation (4.25), with the car positioned at the center (the sensor is placed at the top center of the car):

$$\left(x_C - x_P\right)^2 + \left(y_C - y_P\right)^2 \le R^2 \quad (4.25)$$

in which $R$ is the radius of the given circle, $(x_C, y_C)$ is the coordinate of the center of the car, and $(x_P, y_P)$ is the coordinate of the waypoint
When the waypoint falls within the confines of the circle, I assign the anticipated state as the subsequent waypoint and determine the direction (instruction) associated with that waypoint. Figure 4.20 shows the circle around the car and the predefined path
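This is a small sketch of the circular position check of equation (4.25); the coordinates and radius are hypothetical:

```python
def reached_waypoint(car_xy, waypoint_xy, radius):
    """Circular position check, per equation (4.25): is the waypoint inside
    the circle of the given radius centred on the car?"""
    dx = car_xy[0] - waypoint_xy[0]
    dy = car_xy[1] - waypoint_xy[1]
    return dx * dx + dy * dy <= radius * radius

if reached_waypoint((12.0, 5.0), (12.5, 5.2), radius=1.0):
    print("Advance to the next waypoint and issue its instruction")
```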
EXPERIMENTS AND RESULTS
Experimental environment
In this thesis, my self-driving vehicle underwent testing on straightforward routes located within the UTE campus. Due to constraints in time and available hardware, the autonomous car's capabilities are currently limited to operating on medium-sized roads within the campus premises. The operating road comprises the whole road system except the road to the right of the central block. The experiment roads are shown in Figure 5.1
The datasets used for training and validation were collected on my university campus at various times and under various conditions. My datasets (Figure 5.2) contain 4980 images with corresponding annotations for each head for training, and 2000 images for testing. The size of each image in the datasets is 640x480 (a 4:3 aspect ratio)
In this work, I performed lane detection training using my self-labeled dataset of over 4980 images collected on the university campus
Besides, we designed a tool for self-labeling the images collected on campus. The tool was written using the OpenCV library [31]. The user interacts with this tool through the terminal. The tool and the labeled images are illustrated in Figure 5.3
Figure 5.3 The data labeling process The labeled images (a,c) and the collected images (b,d)
As for the semantic segmentation task, a custom dataset encompassing over 4980 road images, collected while driving the golf car around my campus, was used
Moreover, I also self-labeled more than 4980 images of my dataset collected on the HCMUTE campus. The labeling tool is called "Labelme" [32], which specializes in segmentation labels. In my case, I labeled three classes, people, car, and road, with everything else as background. The labeling process is shown in Figure 5.4
Figure 5.4 Labeling process using LabelMe tool
To test the localization algorithms and to define the circular radius, I drove the car around the university campus, then read and wrote the GPS data into a CSV file. In more detail, the GPS module set included three different low-cost GPS sensors: a GNSS LC86L, a Beitian BN-880, and a NEO-8. From that, I could plot the data and design the algorithms. The road map extends from block C, passes block A, and stops at the parking lot. The raw data collected on my campus is depicted in Figure 5.5
Training process
The suggested model was built on the PyTorch framework. All experiments were carried out on an Intel® Xeon® processor with a 2.3 GHz clock and the Ubuntu 18.04 operating system. I leveraged two Tesla T4 GPUs with 24 GB of memory for training the model
The multi-task model was trained with the following parameters in Table 5-1:
• Evaluation metric (lane-line head): Accuracy
• Evaluation metric (segment head): mIoU
Figure 5.6 depicts the training loss during the training process.
The accuracy of lane-line detection is shown in Figure 5.7 below:
Figure 5.7 Training accuracy of lane line detection graph
The accuracy of segmentation is evaluated by mIoU values. The training mIoU line graph is depicted in Figure 5.8:
Figure 5.8 Training mIoU segmentation graph
CONCLUSION AND FUTURE WORK
Conclusion
In conclusion, in this thesis I managed to research, design, and construct an autonomous golf car that can run on the HCMUTE school campus. The objective of my work was to operate an autonomous car within specific natural surroundings. The main functions of my self-driving car include "lane keeping", "scene parsing for obstacle detection", and "GPS navigation". The lane-line detection head is used for planning on a normal road. The scene parsing head acts as external guidance on where to make lane predictions for the road-line algorithm. In addition, I enhanced the scene parsing by incorporating CSP and CBAM blocks. Regarding the GPS system, the refined campus data map was built from a triad of GPS sensors with an LKF and a weighting method. Moreover, real-time localization was conducted by the EKF algorithm on the above campus road map. The inference model was run with TensorRT to optimize the inference time of the model. Finally, I designed the circuit board, which includes the STM32, GPS, and RX-601 modules, to operate several tasks such as controlling the steering and the car's speed. The experimental results and evaluations show that my car can operate well on the HCMUTE campus in some basic scenarios
However, due to the imprecise mechanical steering wheel and the lack of high-quality sensors, a robust driving strategy for handling intersections has not yet been achieved.
Future work
In the future, to push this project further and commercialize it, I suggest improving the following tasks:
• An EPS (Electronic Power Steering) system will be installed to reduce the complexity of the control algorithms
• Several methods will be researched to enhance the driving strategy at intersections
• A LiDAR sensor will be added to improve localization and obstacle avoidance
• More modern controllers such as MPC or NMPC will be investigated to improve the stability of the driving system
REFERENCES
[1] M. Fathy, N. Ashraf, O. Ismail, S. Fouad, L. Shaheen, and A. Hamdy, "Design and implementation of self-driving car," Procedia Comput. Sci., vol. 175, pp.
[2] E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, "A Survey of Autonomous Driving: Common Practices and Emerging Technologies," IEEE Access, vol. 8, pp. 58443–58469, 2020, doi: 10.1109/ACCESS.2020.2983149.
[3] K. Burnett et al., "Building a Winning Self-Driving Car in Six Months," Proc. IEEE Int. Conf. Robot. Autom., vol. 2019-May, pp. 9583–9589, Nov. 2018, doi:
[4] D. Kalathil, V. K. Mandal, A. Gune, K. Talele, P. Chimurkar, and M. Bansode, "Self Driving Car using Neural Networks," in 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), 2022, pp. 213–
[5] M. T. Duong, T. D. Do, and M. H. Le, "Navigating Self-Driving Vehicles Using Convolutional Neural Network," in Proceedings 2018 4th International Conference on Green Technology and Sustainable Development (GTSD), pp. 607–610, Dec. 2018, doi: 10.1109/GTSD.2018.8595533.
[6] H.-H.-N. Nguyen, D.-H. Pham, T.-L. Le, and M.-H. Le, "Lane Keeping and Navigation of a Self-driving RC Car Based on Image Semantic Segmentation and GPS Fusion," in 2022 6th International Conference on Green Technology and Sustainable Development (GTSD), 2022, pp. 601–606, doi:
[7] "What is Artificial Intelligence (AI)? | IBM." https://www.ibm.com/topics/artificial-intelligence (accessed Jun. 29, 2023).
[8] "A Beginner's Guide To Understanding Convolutional Neural Networks – Adit Deshpande – Engineering at Forward | UCLA CS '19." https://adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/ (accessed Jun. 29, 2023).
[9] "CNN architecture | Packt Hub." https://hub.packtpub.com/cnn-architecture/ (accessed Jun. 29, 2023).
[10] "Evaluating image segmentation models." https://www.jeremyjordan.me/evaluating-image-segmentation-models/
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Dec.
[12] S. Ruder, "An overview of gradient descent optimization algorithms." Accessed: Jul. 14, 2022. [Online]. Available: http://caffe.berkeleyvision.org/tutorial/solver.html
[13] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, Dec. 2015. Accessed: May 15, 2021. [Online]. Available: https://arxiv.org/abs/1412.6980v9
[14] N. Jegadeesh and S. Titman, "Momentum," SSRN Electronic Journal, Oct.
[15] A. Graves, "Generating Sequences With Recurrent Neural Networks," Aug. 2013. Accessed: Jun. 29, 2023. [Online]. Available: https://arxiv.org/abs/1308.0850v5
[16] "The Kalman Filter and Related Algorithms: A Literature Review." https://www.researchgate.net/publication/236897001_The_Kalman_Filter_and_Related_Algorithms_A_Literature_Review (accessed Jul. 14, 2022).
[17] "Kalman and Extended Kalman Filters: Concept, Derivation and Properties." https://www.researchgate.net/publication/2888846_Kalman_and_Extended_Kalman_Filters_Concept_Derivation_and_Properties (accessed Jun. 29, 2023).
[18] K. H. Ang, G. Chong, and Y. Li, "PID control system analysis, design, and technology," IEEE Transactions on Control Systems Technology, vol. 13, no.
[19] "Pytorch Tutorial | Deep Learning With Pytorch." https://www.analyticsvidhya.com/blog/2018/02/pytorch-tutorial/ (accessed Jun. 29, 2023).
[20] "What is Pytorch? - JournalDev." https://www.journaldev.com/35641/what-is-pytorch (accessed Jul. 14, 2022).
[21] "Using TensorRT for faster inference and reduced latency for deep learning models" (in Vietnamese). https://itzone.com.vn/vi/article/su-dung-tensorrt-de-suy-luan-nhanh-hon-va-giam-do-tre-cho-mo-hinh-dao-sau/ (accessed Jun. 29, 2023).
[22] "Multithreading Implementations." https://www.researchgate.net/publication/2734966_Multithreading_Implementations (accessed Jun. 29, 2023).
[23] R. Raguram, J. M. Frahm, and M. Pollefeys, "A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus," Lecture Notes in Computer Science, vol. 5303 LNCS, no. PART 2, pp. 500–513, 2008, doi: 10.1007/978-3-540-88688-4_37.
[24] Z. Qin, H. Wang, and X. Li, "Ultra Fast Structure-aware Deep Lane Detection," Lecture Notes in Computer Science, vol. 12369 LNCS, pp. 276–291, Apr. 2020, doi: 10.48550/arxiv.2004.11757.
[25] T. Emara, H. E. A. El Munim, and H. M. Abbas, "LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation," in 2019 Digital Image Computing: Techniques and Applications (DICTA), Dec. 2019, doi:
[26] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking Atrous Convolution for Semantic Image Segmentation," Jun. 2017. Accessed: Jun. 29, 2023. [Online]. Available: https://arxiv.org/abs/1706.05587v3
[27] S. Woo, J. Park, J. Y. Lee, and I. S. Kweon, "CBAM: Convolutional Block Attention Module," Lecture Notes in Computer Science, vol. 11211 LNCS, pp. 3–19, Jul. 2018, doi: 10.48550/arxiv.1807.06521.
[28] C.-Y. Wang, H.-Y. M. Liao, I.-H. Yeh, Y.-H. Wu, P.-Y. Chen, and J.-W. Hsieh, "CSPNet: A New Backbone that can Enhance Learning Capability of CNN," IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2020-June, pp. 1571–1580, Nov. 2019. Accessed: May 28, 2021. [Online]. Available: http://arxiv.org/abs/1911.11929
[29] "OpenStreetMap." https://www.openstreetmap.org/#map/10.85220/106.77206&layers=G (accessed Jul. 14, 2022).
[30] T. D. Phan, H. H. N. Nguyen, N. H. D. Le, T. S. Nguyen, M. T. Duong, and M. H. Le, "Steering angle estimation for self-driving car based on enhanced semantic segmentation," in Proceedings of 2021 International Conference on System Science and Engineering (ICSSE), pp. 32–37, Aug. 2021, doi:
[31] "Releases - OpenCV." https://opencv.org/releases/ (accessed Jul. 14, 2022).
[32] "tzutalin/labelImg: LabelImg is a graphical image annotation tool and label object bounding boxes in images." https://github.com/tzutalin/labelImg (accessed May 28, 2021).
[33] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation," Lecture Notes in Computer Science, vol. 11211 LNCS, pp. 833–851, Feb. 2018, doi: 10.1007/978-3-030-01234-2_49.
In this section, I introduce a number of additional initiatives pursued to advance the project. I also acknowledge a limitation in the project's current capabilities and describe the iterative process of experimentation and refinement employed to improve the wheel structure. This section serves to demonstrate the contributions made and the extensive effort undertaken. The extra works are as follows:
Utilizing the strengths of machine learning (ML), a coordinator tunes the weight vector W, which is modeled as a fully connected network with six input nodes, corresponding to the three longitude/latitude pairs obtained from the three GPS sensors, and three output nodes forming the weight vector. Thanks to the softmax activation function, the outputs always sum to 1; the model is then optimized with the mean squared error (MSE) loss against the target weights, as shown in Figure 1:
Figure 1 Structure of Fully connected network
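A minimal PyTorch sketch of this network under the stated configuration (six inputs, three softmax outputs, MSE loss) is given below; the hidden-layer size and optimizer settings are my own illustrative assumptions.

```python
# Sketch of the GPS weight-tuning network: 6 inputs (3 lon/lat pairs),
# a softmax layer producing 3 weights that sum to 1, trained with MSE.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(6, 16),   # 6 inputs: lon/lat from each of the 3 GPS sensors
    nn.ReLU(),
    nn.Linear(16, 3),   # 3 outputs: one weight per sensor
    nn.Softmax(dim=1),  # forces the weights to sum to 1
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(coords, target_weights):
    # coords: (batch, 6) sensor readings; target_weights: (batch, 3)
    pred = model(coords)
    loss = criterion(pred, target_weights)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```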
The parameters of the training process are listed below:
The loss recorded during the training process is depicted in Figure 2:
In this part, I researched a method to expand the camera's field of view, giving the autonomous car more visual information, since a single frontal mono camera lacks information to the left and right. One possible approach is a panoramic camera, but its image is heavily distorted and it is quite expensive. Therefore, I used an image stitching technique with two mono cameras to create a larger view, as shown in the corresponding figure.
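As a sketch of this idea, OpenCV's high-level Stitcher can combine the two frontal views into one wide image; the camera indices and the one-shot capture below are assumptions for illustration, not the final pipeline.

```python
# Minimal two-camera stitching sketch using OpenCV's high-level Stitcher.
import cv2

left = cv2.VideoCapture(0)    # left mono camera (index assumed)
right = cv2.VideoCapture(1)   # right mono camera (index assumed)
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)

ok_l, frame_l = left.read()
ok_r, frame_r = right.read()
if ok_l and ok_r:
    status, wide = stitcher.stitch([frame_l, frame_r])
    if status == cv2.Stitcher_OK:
        cv2.imwrite("stitched.jpg", wide)  # save the widened view
```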
In relation to this thesis, I have a publication at an international conference: IWIS 2023.