VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
NGUYEN NGOC TRUC
STUDY ON CAMERA-BASED REAL-TIME CAR SPEED MONITOR USING YOLOv5 MULTIPLE OBJECT DETECTION MODEL
Major: Vehicle Engineering Major code: 8520116
MASTER’S THESIS
THIS THESIS IS COMPLETED AT
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY – VNU-HCM

Supervisor: Trần Đăng Long, Ph.D.
Examiner 1: Trần Hữu Nhân, Ph.D.
Examiner 2: Nguyễn Văn Trạng, Ph.D.
This master's thesis was defended at Ho Chi Minh City University of Technology, VNU-HCM, on July 15th, 2023.
Master’s Thesis Committee:
1. Chairman: Assoc. Prof. Lê Tất Hiển, Ph.D.
2. Member: Võ Tấn Châu, Ph.D.
3. Secretary: Hồng Đức Thông, Ph.D.
4. Reviewer 1: Trần Hữu Nhân, Ph.D.
5. Reviewer 2: Nguyễn Văn Trạng, Ph.D.
Approval of the Chairman of the Master's Thesis Committee and the Dean of the Faculty of Transportation Engineering after the thesis has been corrected (if any).
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
THE TASK SHEET OF MASTER’S THESIS
Full name: Nguyễn Ngọc Trực
Student code: 2170108
Date of birth: 30/07/1996
Place of birth: Đắk Lắk
Major: Vehicle Engineering
Major code: 8520116
I. THESIS TOPIC: Study on camera-based real-time car speed monitor using YOLOv5 multiple object detection model
ĐỀ TÀI LUẬN VĂN: Nghiên cứu ứng dụng mô hình YOLOv5 nhận diện đa vật thể trong ảnh cho hệ thống giám sát tốc độ xe bằng camera theo thời gian thực
II. TASKS AND CONTENTS:
- Develop a traffic sign recognition system, specifically for speed limit signs, from images captured by cameras on the road.
- Employ the Jetson Nano embedded computer as the central processing unit to run the YOLOv5 model for detecting speed limit signs. Simultaneously, compare the sign recognition results with the vehicle's current speed accessed from the OBD-II system. Subsequently, the system provides a direct alert to the driver on the screen if the speed limit is exceeded.
III. TASK STARTING DATE: February 6th, 2023
IV. TASK ENDING DATE: June 12th, 2023
V. INSTRUCTOR: Trần Đăng Long, Ph.D.
Ho Chi Minh City, July 15th, 2023.

INSTRUCTOR
(Full name & Signature)

HEAD OF DEPARTMENT
(Full name & Signature)
DEAN - FACULTY OF TRANSPORTATION ENGINEERING
ACKNOWLEDGEMENT
I would like to express my heartfelt gratitude to my thesis advisor, Dr. Tran Dang Long, for his invaluable guidance, unwavering support, and continuous encouragement throughout the entire duration of this thesis. His expertise, insightful feedback, and constructive criticism have immensely contributed to the success of this research endeavor.
I am also deeply grateful to Ho Nam Hoa for his assistance and collaboration in helping me establish the OBD-II CAN communication. His technical knowledge, dedication, and willingness to share his expertise have been instrumental in overcoming challenges and achieving significant milestones in this project.
Furthermore, I extend my sincere appreciation to my friend, Bui Huu Nghia, for his valuable contribution in collecting the dataset. His commitment, attention to detail, and assistance in data acquisition have greatly enriched the quality of this research.
I would also like to acknowledge the support and encouragement received from my family and friends throughout this academic journey.
Lastly, I am grateful to all the individuals who have directly or indirectly contributed to the completion of this thesis. Their support, guidance, and encouragement have been indispensable in shaping my research and personal growth.

Ho Chi Minh City, July 15th, 2023
Researcher,
ABSTRACT
This study has two objectives. The first is the design of a real-time traffic sign detection system for automobiles, with a specific focus on speed limit signs, using the YOLOv5 model. The second is an assessment of the practical implementation of the traffic sign detection system by combining it with a speed warning system that can be integrated into vehicles.
This study includes several key tasks. Firstly, extensive research was conducted to identify real-time detection methods suitable for traffic signs. Subsequently, a comprehensive dataset of speed limit traffic signs was prepared, consisting of 3200 images. The next step involved training a model for the detection of these speed limit signs, achieving an mAP of 0.922 across 10 classes. The model was then implemented on a Jetson Nano embedded computer. In parallel, an ESP32 microcontroller was utilized to extract actual vehicle speed data from the OBD-II system. Lastly, the speed limit traffic sign detection system and the actual vehicle speed information were integrated to develop a speed warning system.
The experimental results demonstrate the efficiency of the proposed traffic sign detection system. The YOLOv5 model achieves a real-time detection speed of 4 frames per second (FPS) on the Jetson Nano computer. Moreover, integrating the speed limit sign detection system with real-time monitoring of the actual vehicle speed enables timely warnings to the driver in the event of exceeding the speed limit.
Additionally, the experimental results revealed limitations of the speed limit traffic sign detection system. One such limitation is its inability to detect the number of lanes on the road, which affects its accuracy in providing the precise speed limit, particularly in residential areas. Furthermore, there were instances where untrained traffic signs were mistakenly detected as speed limit signs. To address these issues, it is recommended to expand the training dataset to include a wider range of traffic signs, not limited to speed limit signs alone.
TÓM TẮT LUẬN VĂN THẠC SĨ
Nghiên cứu này nhằm đạt được hai mục tiêu chính. Thứ nhất, thiết kế một hệ thống nhận diện biển báo giao thông theo thời gian thực cho ô tô, tập trung đặc biệt vào biển báo giới hạn tốc độ, bằng cách sử dụng mô hình YOLOv5. Thứ hai, nghiên cứu bao gồm việc đánh giá khả năng ứng dụng thực tế của hệ thống nhận diện biển báo giao thông bằng cách tích hợp với một hệ thống cảnh báo tốc độ có thể sử dụng trên xe ô tô.

Nghiên cứu này bao gồm một số nhiệm vụ chính. Đầu tiên, nghiên cứu xác định được các phương pháp nhận diện biển báo giao thông theo thời gian thực. Sau đó, đã chuẩn bị tập dữ liệu về biển báo giới hạn tốc độ, bao gồm 3200 hình ảnh. Bước tiếp theo, huấn luyện một mô hình để nhận diện các biển hạn chế tốc độ này với kết quả mAP là 0.922 trên 10 loại biển báo giao thông. Sau đó, mô hình đã được triển khai trên máy tính nhúng Jetson Nano. Đồng thời, đã sử dụng vi điều khiển ESP32 để trích xuất dữ liệu tốc độ thực tế của xe từ hệ thống OBD-II. Cuối cùng, hệ thống nhận diện biển hạn chế tốc độ và thông tin tốc độ xe thực tế đã được tích hợp để phát triển thành một hệ thống cảnh báo tốc độ.

Kết quả thử nghiệm thể hiện tính ứng dụng của hệ thống nhận diện biển báo giao thông được trình bày trong nghiên cứu này. Mô hình YOLOv5 đạt được 4 khung hình/giây (FPS) trên máy tính Jetson Nano trong quá trình nhận diện theo thời gian thực. Hơn nữa, bằng cách tích hợp hệ thống nhận diện biển báo giới hạn tốc độ với việc giám sát tốc độ thực tế của xe, hệ thống cho phép cảnh báo kịp thời cho người lái trong trường hợp vượt quá giới hạn tốc độ.

Ngoài ra, kết quả thử nghiệm cũng cho thấy những hạn chế của hệ thống. Một trong những hạn chế đó là không thể nhận diện số làn đường trên đường, điều này ảnh hưởng đến độ chính xác của hệ thống trong việc xác định giới hạn tốc độ chính xác, đặc biệt là trong khu vực dân cư. Hơn nữa, có những trường hợp biển báo giao thông chưa được huấn luyện bị nhận diện nhầm là biển báo giới hạn tốc độ. Để khắc phục các vấn đề này, tập dữ liệu huấn luyện cần được đa dạng hơn cho các loại biển báo giao thông khác, không chỉ giới hạn ở biển báo giới hạn tốc độ.
THE COMMITMENT OF THE THESIS AUTHOR
I am Nguyen Ngoc Truc, a Master's student of the Department of Vehicle Engineering, Faculty of Transportation, class of 2021, at Ho Chi Minh City University of Technology. I guarantee that the information below is accurate:
(i) I conducted all of the work for this research study by myself.
(ii) This thesis uses genuine, reliable, and highly precise sources for its references and citations.
(iii) The information and findings of this study were produced by me, independently and honestly.
Ho Chi Minh City, July 15th, 2023
Researcher,
Contents
1 Introduction
1.1 Background
1.2 Literature Review
1.2.1 Speed Warning Systems
1.2.2 Traffic Sign Detection
1.2.3 Object Detectors
1.3 Research Objectives
1.4 Research Methodology
1.5 Research Contents
1.6 Scope of Research
1.7 Research Contributions
1.8 Research Outline
2 Fundamentals
2.1 Convolutional Neural Networks
2.1.1 Convolutional Layer
2.1.2 Pooling Layer
2.1.3 Fully Connected Layer
2.1.4 Activation Function
2.2 YOLOv5
2.2.1 Introduction to YOLO
2.2.2 YOLOv5 Architecture
2.3 Evaluation Metrics
2.3.1 Confusion Matrix
2.3.2 Intersection over Union
2.3.3 Precision and Recall
2.3.4 Mean Average Precision
2.3.5 F1 Score
2.4 Toolchain
2.4.1 Roboflow
2.4.2 Google Colaboratory
2.5 Conclusion
3 Design A Speed Limit Signs Detection Model
3.1 Prepare Dataset
3.1.1 Dataset Requirement
3.1.2 Dataset Classes
3.1.3 Dataset Collection
3.1.4 Data Annotation
3.1.5 Data Augmentation
3.1.6 Dataset Structure
3.2 Training Model
3.2.1 Install Dependencies
3.2.2 Download Dataset
3.2.3 Training Model Parameters
3.2.4 Training Results
4 Experimental Evaluations
4.1 Experimental Preparation
4.1.1 Hardware Circuit Diagram
4.1.2 Software Algorithm Flowchart
4.1.3 Speed Limit Caching Algorithm
4.1.4 Finite State Machine Based Speed Warning Algorithm
4.2 Experimental Apparatus
4.2.1 Jetson Nano
4.2.2 Camera Raspberry Pi V2
4.2.3 ESP32
4.2.4 CAN Transceiver
4.2.5 DC-DC Converter
4.2.6 OBD-II Adapter
4.3 Deploy on Jetson Nano
4.3.1 Build Model Engine
4.3.2 Run Model Engine
4.4 Experiment Conditions
5 Results and Discussions
5.1 System Setup
5.2 Speed Limit Detection
5.2.1 Results
5.2.2 Error Cases
5.3 Speed Warning Applications
6 Conclusions and Future Works
References
List of Figures

1.1 Types of ADAS
1.2 GSpeed, based on GPS and developed by iCar
1.3 Concept of Smart Road Signs communicating with vehicles
1.4 The comparison of YOLOv3 on performance
1.5 Research Contents and Workflows
2.1 An example of CNN architecture to classify handwritten digits
2.2 The Convolution Operation
2.3 An example of convolution with stride equal to 2
2.4 An example of padding in convolution
2.5 An example of max pooling and average pooling
2.6 An example of the fully connected layer's input multiplied by the weights matrix to receive the output vector
2.7 Plot of sigmoid activation function
2.8 Plot of tanh activation function
2.9 Plot of ReLU activation function
2.10 How YOLO works
2.11 Darknet-53 Architecture
2.12 (a) DenseNet and (b) Cross Stage Partial DenseNet
2.13 YOLOv5 Network Architecture
2.16 Define TP, FP based on IoU
2.17 The computer vision workflow on Roboflow
3.1 Dataset Preparation Workflows
3.2 Recorded Traffic Signs at Day and Night
3.3 Data annotating on Roboflow
3.4 Image before and after augmentation
3.5 Dataset Health Check before Augment
3.6 Export Dataset with Download Code
3.7 The YOLOv5s Model Training Architecture
3.8 Training Results over 100 Epochs
3.9 Confusion Matrix
3.10 Precision and Recall Curve
3.11 F1-Confidence Curve
4.1 Concept of Experimental System
4.2 Hardware Circuit Diagram
4.3 Software Algorithm Flowchart
4.4 Speed Limit Caching Algorithm
4.5 FSM based speed warning algorithm
4.6 Jetson Nano Developer Kit B01
4.7 Camera Raspberry Pi V2
4.8 Microcontroller ESP32
4.9 Module CAN Transceiver SN65HVD230
4.10 DC-DC Buck Converter
4.11 OBD-II Male Adapter
5.1 The system setup for experiment
5.2 The system implemented on vehicle
5.3 The system was tested on vehicle
5.5 Speed detection system being tested in nighttime environments
5.6 Speed detection system being tested in various environments
5.7 Speed detection system being tested in various environments
5.8 The width limit signs mistaken for speed limit 50 km/h
5.9 Speed limit 80 km/h mistaken for speed limit 60 km/h in a few frames
5.10 Warning in case speed exceeds the limit by 1-5 km/h
5.11 Warning in case speed exceeds the limit by more than 5 km/h
5.12 Warning in case speed falls below the minimum by 1-5 km/h
List of Tables
2.1 An example to calculate dimension of output activation map
3.1 Traffic sign classes
4.1 Jetson Nano GPIO
4.2 Technical specifications of Jetson Nano Developer Kit B01
4.3 Technical specifications of Raspberry Pi Camera Module V2
4.4 ESP32 GPIO
List of Abbreviations

ADAS Advanced Driver Assistance Systems
ACC Adaptive Cruise Control
LDW Lane Departure Warning
GPS Global Positioning System
AI Artificial Intelligence
CV Computer Vision
CNN Convolutional Neural Networks
YOLO You Only Look Once
OBD-II On-Board Diagnostics II
SWS Speed Warning Systems
ROI Regions of Interest
mAP Mean Average Precision
FPS Frames Per Second
SSD Single Shot MultiBox Detector
RPN Region Proposal Network
tanh Hyperbolic Tangent
ReLU Rectified Linear Unit
SPP Spatial Pyramid Pooling
TP True Positive
TN True Negative
FP False Positive
FN False Negative
IoU Intersection over Union
AP Average Precision
UART Universal Asynchronous Receiver-Transmitter
CSI Camera Serial Interface
Chapter 1
Introduction
1.1 Background
Figure 1.1: Types of ADAS [1]
In recent years, Advanced Driver Assistance Systems (ADAS) have emerged as a promising approach to enhance driving safety and reduce the number of accidents on the road. ADAS utilize various technologies, such as sensors, cameras, and communication systems, to provide drivers with advanced warning and assistance in critical driving situations.
One of the most common ADAS features is Adaptive Cruise Control (ACC), which helps drivers maintain a safe distance from the vehicle in front by automatically adjusting the speed of the vehicle. Another important ADAS feature is Lane Departure Warning (LDW), which alerts drivers when they are drifting out of their lane. In addition, ADAS can also assist drivers in parking, with features such as parking sensors and automatic parking systems. Blind spot detection systems can also provide drivers with visual or auditory warnings when there is a vehicle or obstacle in their blind spot.
Studies have shown that speed limit warning systems can be effective in reducing speeding behavior and improving road safety [2].
Recent advancements in Artificial Intelligence (AI) and Computer Vision (CV) have led to significant improvements in the accuracy and reliability of ADAS. Deep learning based approaches, such as Convolutional Neural Networks (CNN), have shown promising results in object detection and recognition tasks, which are important for ADAS.
Camera-based object detection has emerged as a promising technology for speed limit warning systems. The You Only Look Once (YOLO) object detection model is a state-of-the-art algorithm that has been shown to be effective in detecting and tracking objects in real time [3]. By using the YOLO model to detect and track speed limit signs on the road, a speed limit warning system can provide accurate and reliable information to the driver about the current speed limit.
1.2 Literature Review

1.2.1 Speed Warning Systems
Speeding is a major cause of road accidents and poses significant risks to both drivers and pedestrians. To address this issue, researchers and engineers have developed various speed warning systems for automobiles. These systems aim to alert drivers when they exceed the speed limit, thereby promoting safer driving behavior. In this literature review, we explore three commonly used methods for implementing SWS: GPS-based systems, systems that communicate with roadside infrastructure, and camera-based systems.
The SWS comprises two primary components. Firstly, it detects the speed limit corresponding to the specific road infrastructure. Secondly, it continuously monitors the actual speed of the vehicle. By comparing the detected speed limit with the actual vehicle speed, the system determines whether the driver is exceeding the speed limit. If a violation is detected, the system generates appropriate speed warning messages to alert the driver.
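The core of this comparison can be expressed as a simple decision rule. The sketch below is a minimal Python illustration, assuming warning thresholds that mirror the cases reported in Chapter 5 (exceeding the limit by 1-5 km/h versus more than 5 km/h); the deployed system realizes this logic as a finite state machine, described in Chapter 4.

```python
def speed_warning(actual_kmh: int, limit_kmh: int) -> str:
    """Compare the vehicle speed with the detected limit and pick a warning level.

    Thresholds mirror the warning cases reported in Chapter 5; the deployed
    system implements this comparison as a finite state machine (Chapter 4).
    """
    excess = actual_kmh - limit_kmh
    if excess > 5:
        return "WARNING: limit exceeded by more than 5 km/h"
    if excess >= 1:
        return "CAUTION: limit exceeded by 1-5 km/h"
    return "OK: within the speed limit"

# Example: driving at 58 km/h where a 50 km/h sign was detected.
print(speed_warning(58, 50))  # WARNING: limit exceeded by more than 5 km/h
```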
Some vendors offer GPS-based speed alerts integrated into their dash cameras. Another recently introduced software is GSpeed by iCar [8], which was launched in June 2023 and can be integrated into the car's monitor. GPS-based methods are widely used, but they require a substantial database. This approach also has certain drawbacks, such as the lack of real-time updates: in some instances, the speed limit may have changed, but the system still relies on outdated information from its database. Additionally, when two parallel routes exist, the system may struggle to accurately detect the correct road being traveled on.
Figure 1.2: GSpeed, based on GPS and developed by iCar [8]
However, this approach requires the installation and implementation of infrastructure on the road, which may not be feasible in the current traffic conditions in Vietnam.
Figure 1.3: Concept of Smart Road Signs communicating with vehicles [9]
Camera-based SWS utilize computer vision techniques to detect and recognize speed limit signs. In a study conducted by Chang et al. in 2015 [10], they developed a speed warning system for automobiles using computer vision techniques on a mobile device. Their approach involved extracting red color pixels to define Regions of Interest (ROI) and utilizing pre-defined template numbers for pattern matching. However, this method had limitations. One drawback was that traffic signs can vary in their fonts, requiring a diverse range of template numbers for accurate detection. Additionally, environmental conditions such as rain or nighttime can cause blurriness in the traffic signs, posing further challenges for detection.
Communication-based systems enable real-time information exchange between the vehicle and roadside infrastructure. Further research and advancements in these areas can contribute to the development of more robust and effective SWS, ultimately enhancing road safety and reducing the risks associated with speeding.
1.2.2 Traffic Sign Detection
Since the 2010s, there has been a growing trend in utilizing camera-based object detection systems for the purpose of traffic sign detection. This approach involves the application of deep learning algorithms, particularly CNN, which have shown remarkable capabilities in accurately detecting and recognizing various types of traffic signs [11, 12]. These systems can be trained to recognize a variety of traffic signs, including speed limit signs, and are able to work in a variety of lighting and weather conditions. In 2022, a comparative experiment was conducted on the German Traffic Sign Recognition Benchmark dataset [13] with 43 classes, specifically comparing the performance of two popular object detection algorithms: Faster Region-Based Convolutional Neural Network (R-CNN) [14] and YOLOv4 [15]. The results of this experiment revealed that Faster R-CNN achieved a Mean Average Precision (mAP) of 43.26% while operating at a speed of 6 Frames Per Second (FPS). On the other hand, YOLOv4 exhibited superior performance with an mAP of 59.88% at a significantly higher detection speed of 35 FPS. These findings highlight the suitability of YOLOv4 for real-time traffic sign detection, offering a combination of higher precision and faster detection speed.
1.2.3 Object Detectors
Object detection is a fundamental problem in computer vision, with many applications such as autonomous driving and intelligent transportation systems. The two main categories of object detection methods are one-stage and two-stage detectors. One-stage detectors such as YOLO [3] and Single Shot MultiBox Detector (SSD) [16] can detect objects in a single pass, while two-stage detectors such as Faster R-CNN [14] and Mask R-CNN [17] first propose object regions before detecting the object within those regions.
Faster R-CNN is a two-stage object detection method that first proposes object regions and then classifies objects within those regions. It uses a Region Proposal Network (RPN) to propose regions that might contain objects and then uses a second network to classify objects within those regions. Faster R-CNN has high accuracy but is slower than one-stage detectors such as YOLO and SSD [18].
Mask R-CNN extends Faster R-CNN by adding a branch to predict object masks in addition to object classes and bounding boxes. It achieves state-of-the-art accuracy in object detection and instance segmentation tasks, but it is computationally expensive and has a slow detection speed.
YOLO is a popular one-stage object detection method that uses a single neural network to predict bounding boxes and class probabilities. It divides the input image into a grid of cells and predicts the class probabilities and bounding boxes for each cell. YOLO has a fast detection speed and can achieve real-time performance on low-power devices [19].
SSD is another one-stage object detection method that predicts object classes and bounding boxes from feature maps of different resolutions. It uses convolutional filters of different sizes to detect objects at different scales. SSD is faster than Faster R-CNN.

Figure 1.4: The comparison of YOLOv3 on performance [20]
Figure 1.4 illustrates the performance comparison between YOLOv3 and other methods. It can be observed that YOLOv3 outperforms them in terms of detection speed, indicating its suitability for real-time object detection applications.
In summary, one-stage detectors such as YOLO and SSD are faster but have lower accuracy compared to two-stage detectors such as Faster R-CNN and Mask R-CNN. The choice of which method to use depends on the specific application requirements, such as speed and accuracy.
1.3 Research Objectives
This research pursues two main objectives. Firstly, it aims to design a real-time traffic sign detection system for automobiles, with a specific focus on speed limit signs, using the YOLOv5 model.
Secondly, the study aims to evaluate the practical application of the developed traffic sign detection system by integrating it with a speed warning system. This involves utilizing the trained speed limit sign detection model to monitor and compare the actual vehicle speed with the speed limit. By implementing the speed warning system, the study aims to provide timely warnings to the driver in the event of exceeding the speed limit, promoting safer driving practices.
Overall, this research seeks to contribute to the field of Computer Vision (CV) and traffic safety by designing and implementing an effective traffic sign detection system and demonstrating its practical application in a Speed Warning System (SWS).
1.4 Research Methodology
The research methodology employed in this study is empirical research, which involves gathering real-world data and conducting experiments to test the effectiveness and performance of the developed system. The research focuses on training a model and implementing the speed warning system on an embedded device, followed by testing and evaluation in various real-world scenarios.
1.5 Research Contents
Figure 1.5: Research Contents and Workflows
The research consists of four main sections: researching ADAS and object detection methods, preparing the training dataset, training the traffic sign detection model, and evaluating its performance through experimentation to validate its detection capabilities.
The first section focuses on researching ADAS and object detection methods. It involves studying the existing literature, analyzing different approaches, and understanding the principles behind ADAS and object detection technologies.
The second section of the research centers on preparing the training dataset for the traffic sign detection model. It involves collecting relevant data, such as images or videos of traffic signs, and annotating them with appropriate labels. The dataset needs to be diverse and representative to ensure effective training of the model.
The model training section involves training the traffic sign detection model using the prepared dataset. It includes selecting an appropriate model architecture, configuring the training parameters, and optimizing the model through the training process.
The final section of the research entails running the traffic sign detection model on an embedded device and integrating it with OBD-II communication to read the vehicle speed data. This allows for conducting experiments in real-world scenarios. The testing procedures are designed to evaluate the system's performance, including its detection accuracy, real-time capabilities, and overall effectiveness in providing speed warnings.
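For illustration, the vehicle speed read in this final section is exposed by the OBD-II standard as mode 01, PID 0x0D (a single data byte in km/h). The following minimal sketch shows that request/response exchange over CAN using the python-can library on a Linux SocketCAN interface; it is only an assumption-laden illustration of the protocol, since the thesis performs this step in firmware on an ESP32 with a CAN transceiver.

```python
from typing import Optional

import can  # python-can; assumes a Linux SocketCAN interface named "can0"

def read_vehicle_speed(bus: can.BusABC) -> Optional[int]:
    """Query OBD-II mode 01, PID 0x0D (vehicle speed) and return km/h."""
    # Functional request to all ECUs: [payload length, mode, PID, padding...]
    request = can.Message(
        arbitration_id=0x7DF,
        data=[0x02, 0x01, 0x0D, 0x00, 0x00, 0x00, 0x00, 0x00],
        is_extended_id=False,
    )
    bus.send(request)
    response = bus.recv(timeout=1.0)  # ECUs answer on IDs 0x7E8-0x7EF
    if response is not None and response.data[1] == 0x41 and response.data[2] == 0x0D:
        return response.data[3]  # a single byte, already in km/h
    return None

bus = can.interface.Bus(channel="can0", bustype="socketcan")
print(f"Vehicle speed: {read_vehicle_speed(bus)} km/h")
```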
1.6 Scope of Research
This study is conducted within a defined scope of research, set by the limitations described below.
The traffic sign detection covers 10 classes: speed limit 50 km/h, 60 km/h, 70 km/h, 80 km/h, 100 km/h, 120 km/h, start of residential area, end of residential area, and end of speed limit.
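In a YOLOv5 dataset configuration, these classes appear as an ordered list of label names. The snippet below is a hypothetical sketch of that list; the exact label strings used in the thesis dataset are not given in this section (the full enumeration is in Table 3.1).

```python
# Hypothetical class list for the speed limit sign dataset; the exact label
# strings are assumptions, since this section does not name them verbatim.
CLASS_NAMES = [
    "speed-limit-50", "speed-limit-60", "speed-limit-70", "speed-limit-80",
    "speed-limit-100", "speed-limit-120",
    "start-of-residential-area", "end-of-residential-area",
    "end-of-speed-limit",
]
```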
The system does not include the ability to recognize auxiliary signs attached to the main speed limit signs, and it is not able to set the priority between temporary signs and permanent signs.
1.7 Research Contributions
This study makes two significant contributions. Firstly, in terms of scientific contribution, it provides an evaluation and insights into the practical applications of the YOLOv5 object detection model specifically for traffic sign detection.
Secondly, in terms of practical significance, it extends the application of artificial intelligence to the field of ADAS, bringing advancements and potential benefits to the automotive industry.
1.8 Research Outline
Chapter 1 provides an introduction to the objectives and scope of the thesis and presents a comprehensive review of the relevant literature and previous studies related to traffic sign detection and speed warning systems.
Chapter 2 presents the fundamentals underlying this study, including convolutional neural networks, the YOLOv5 architecture, evaluation metrics, and the toolchain used.

Chapter 3 describes the data collection, pre-processing, model training, and evaluation of the model training results.
Chapter 4 presents the experimental setup used to evaluate the traffic sign detection model and the speed warning system applications.
Chapter 5 shows the experimental results, demonstrating both the detection capabilities and the instances of errors.

Chapter 6 concludes the thesis and proposes future works.
Chapter 2
Fundamentals
2.1 Convolutional Neural Networks
Convolutional Neural Networks (CNN) have emerged as a powerful and widely used deep learning algorithm in various domains, including computer vision. CNN have revolutionized the field of image recognition and analysis by demonstrating superior performance in tasks such as object detection, image classification, and semantic segmentation. This introduction aims to provide an explanation of CNN, their underlying principles, and their significance in deep learning.
Figure 2.1: An example of CNN architecture to classify handwritten digits [21]
2.1.1 Convolutional Layer
The convolutional layers play a crucial role in extracting features from the input data. During the convolution operation, a convolution kernel is applied to the input matrix of the layer. The kernel performs a dot product with the input matrix, typically using the Frobenius inner product. Convolutional layers perform convolutions on the input data and transmit the output to the subsequent layer.
Figure 2.2: The Convolution Operation [21]
For example, suppose an input image has dimensions of 6x6 (W = 6), and we apply a filter (or kernel) of size 3x3 (F = 3) to the input image with a stride of 1 (S = 1) and no padding (P = 0). The dimension of the output feature map (also known as the activation map) can be calculated using the following formula:

O = \frac{W - F + 2P}{S} + 1 \qquad (2.1)

As a result, it is possible to calculate the dimension of the output feature map by following Formula 2.1.
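Substituting the values above into Formula 2.1 gives O = (6 - 3 + 0)/1 + 1 = 4, i.e. a 4x4 activation map. A small Python helper makes the formula easy to check for other settings:

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output width/height of a convolution: O = (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

print(conv_output_size(6, 3))            # 4 -> a 6x6 input and a 3x3 kernel give a 4x4 map
print(conv_output_size(6, 3, p=1))       # 6 -> padding of 1 preserves the input size
print(conv_output_size(6, 3, p=1, s=2))  # 3 -> stride 2 roughly halves the resolution
```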
Stride, in the context of convolutional neural networks, refers to the step size, or the number of cells by which the filter/kernel is moved across the input or feature map during the convolution operation. Figure 2.3 shows an example with stride = 2.
Figure 2.3: An example of convolution with stride equal to 2 [21]
Padding refers to the addition of extra pixels around the border of the input or feature map before applying a convolutional operation. The purpose of padding is to preserve the spatial dimensions of the input while preventing information loss at the borders.

Figure 2.4: An example of padding in convolution [21]
2.1.2 Pooling Layer
A pooling layer is used to downsample the feature maps generated by the convolutional layers. The pooling layer reduces the spatial dimensions (width and height) of the input feature maps while retaining the most important information. The most common type of pooling is max pooling, where the maximum value within each pooling window is selected as the representative value for that region. Other types of pooling, such as average pooling, compute the average value within each pooling window.
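As a minimal NumPy sketch (the array values are made up for the example), the following applies 2x2 max pooling and average pooling with stride 2 to a small feature map:

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)

def pool2x2(a: np.ndarray, op) -> np.ndarray:
    """Apply a 2x2 pooling window with stride 2 using the given reduction op."""
    h, w = a.shape
    # Split into non-overlapping 2x2 blocks, then reduce each block.
    blocks = a.reshape(h // 2, 2, w // 2, 2)
    return op(blocks, axis=(1, 3))

print(pool2x2(x, np.max))   # [[6. 8.]  [3. 4.]]   -> max pooling
print(pool2x2(x, np.mean))  # [[3.75 5.25] [2. 2.]] -> average pooling
```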
2.1.3 Fully Connected Layer
A fully connected layer, also known as a dense layer or a fully connected neural network layer, is a type of layer in a neural network where each neuron is connected to every neuron in the previous layer. In other words, the outputs of all neurons in the previous layer serve as inputs to every neuron in the fully connected layer.
In a fully connected layer, each neuron performs a weighted sum of its inputs, followed by the application of an activation function. The weights and biases associated with each connection are learned during the training process of the neural network.
The purpose of fully connected layers is to enable the network to learn complex nonlinear relationships between the input data and the target output. These layers are often added at the end of the network, following a series of convolutional or pooling layers, to perform high-level feature extraction and classification.
Figure 2.6: An example of the fully connected layer's input multiplied by the weights matrix to receive the output vector [22]
In the example of Figure 2.6, the output vector of the fully connected layer is obtained by performing a dot product between the input vector and the weights matrix, followed by a non-linear transformation using an activation function. The resulting output vector has dimensions 1x4, representing four learned features or aspects.
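In code, a fully connected layer is a matrix product plus a bias followed by an activation. The NumPy sketch below reproduces the shapes of the Figure 2.6 example; the input length of 9 and the random weights are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 9))   # flattened input vector (length 9 assumed for illustration)
W = rng.normal(size=(9, 4))   # weight matrix learned during training
b = np.zeros((1, 4))          # bias vector learned during training

def relu(z):
    return np.maximum(0.0, z)

y = relu(x @ W + b)           # weighted sum followed by a non-linear activation
print(y.shape)                # (1, 4): the four learned features of Figure 2.6
```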
2.1.4 Activation Function
An activation function in a neural network is a mathematical function that introduces non-linearity to the network's output. It is applied to the weighted sum of the inputs in a neuron, determining whether the neuron should be activated (fire) or not.
The activation function adds non-linearity to the network, enabling it to learn complex patterns and make more accurate predictions. Without an activation function, the neural network would simply be a linear combination of the input values, which limits its representation and learning capabilities.
Commonly used activation functions include the sigmoid function, the hyperbolic tangent (tanh) function, and the Rectified Linear Unit (ReLU) function.
The sigmoid function maps the input to a value between 0 and 1:

f(x) = \frac{1}{1 + e^{-x}} \qquad (2.2)
Figure 2.7: Plot of sigmoid activation function
The tanh function maps the input to a value between -1 and 1:

f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (2.3)
Figure 2.8: Plot of tanh activation function [23]
The ReLU function sets negative inputs to zero and keeps positive inputs unchanged:

f(x) = \max(0, x) \qquad (2.4)
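The three activation functions of Equations 2.2-2.4 can be written directly in NumPy; a few sample values illustrate their output ranges:

```python
import numpy as np

def sigmoid(x):   # Equation 2.2: output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):      # Equation 2.3: output in (-1, 1)
    return np.tanh(x)

def relu(x):      # Equation 2.4: zero for negative inputs
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # approx. [0.119 0.5   0.881]
print(tanh(x))      # approx. [-0.964  0.     0.964]
print(relu(x))      # [0. 0. 2.]
```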
2.2 YOLOv5
2.2.1 Introduction to YOLO
The You Only Look Once (YOLO) model is a state-of-the-art object detection system made public by Joseph Redmon et al. in 2016 [3]. YOLO is widely used in computer vision tasks and has several advantages over traditional object detection methods, including its speed and accuracy. YOLO divides an image into a grid and predicts bounding boxes and class probabilities for each grid cell. This approach allows YOLO to make predictions for all objects in an image in a single forward pass, which results in real-time object detection. Due to these advantages, it has been applied in multiple use cases for the detection of traffic signs, individuals, parking meters, animals, etc.
The YOLO method uses the CNN deep learning algorithm to recognize objects in real time. As the name indicates, the technique only needs one forward propagation through a neural network in order to detect objects.
Figure 2.10: How YOLO works [3]
YOLO divides the input image into grids and performs detection in a single step (Figure 2.10). These grids are of the same S x S size. Each grid is utilized to find and locate any objects that might be present in that area. Bounding box coordinates, B, for any prospective objects are predicted for each grid, along with their object labels and a probability rating for their presence. These predictions are encoded as an S x S x (B * 5 + C) tensor, where C is the number of classes.
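As a concrete check of this encoding: with the classic YOLO setting S = 7 and B = 2 (each box carrying x, y, w, h, and a confidence score), and the C = 10 classes used later in this thesis, the prediction tensor is 7 x 7 x 20. A one-line helper confirms the arithmetic:

```python
def yolo_output_shape(s: int, b: int, c: int):
    """Shape of the YOLO prediction tensor: S x S x (B*5 + C).
    Each of the B boxes carries five values: x, y, w, h, and confidence."""
    return (s, s, b * 5 + c)

# Classic YOLO grid (S = 7, B = 2) with the 10 classes of this thesis.
print(yolo_output_shape(7, 2, 10))  # (7, 7, 20)
```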
2.2.2 YOLOv5 Architecture
YOLOv5 is a state-of-the-art object detection algorithm that has shown significant improvements over previous versions of YOLO. One of the key features of YOLOv5 is its improved architecture, which includes a number of innovative techniques, such as a focus mechanism that allows the model to focus on specific regions of the image that are most relevant to the object being detected. YOLOv5 also employs a novel anchor-based approach to object detection that improves its accuracy while reducing its computational requirements.
Another important aspect of YOLOv5 is its flexibility and scalability. The model is available in various sizes, ranging from the small YOLOv5-tiny model to the large YOLOv5x model, which can handle complex object detection tasks. Additionally, YOLOv5 supports a wide range of input image sizes, making it suitable for a variety of applications.
The normal YOLO network consists of three main parts:
• Backbone is composed of convolutional layers that extract high-level features from the input image.
• Neck helps to fuse the features from the backbone network and improve thedetection accuracy.
• Head produces the final detection results, such as bounding boxes and class probabilities.

Backbone
In YOLOv5, the backbone consists of CSPDarknet-53 (Cross Stage Partial network [24]), which is a deeper and wider version of the Darknet-53 used in YOLOv3 [20].
Figure 2.11: Darknet-53 Architecture [20]
CSPDarknet-53 is 53 layers deep and introduces a concept of cross-stage partial connections, where the input feature maps are split into two paths. One path goes through a convolutional block while the other path bypasses the block. The outputs of both paths are then concatenated, creating a fused representation. It leverages the cross-stage partial connections to improve the information flow and promote better feature learning. This architectural modification has been shown to bring performance gains over Darknet-53, resulting in improved object detection accuracy.
Figure 2.12: (a) DenseNet and (b) Cross Stage Partial DenseNet [24]
Table 2.1 provides an illustrative example.

Table 2.1: An example to calculate dimension of output activation map

Convolution Layer      Input size     Output size
P1 [64, 6, 2, 2]       640x640x3      320x320x64
P2 [128, 3, 2]         320x320x64     160x160x128
P3 [256, 3, 2]         160x160x128    80x80x256
P4 [512, 3, 2]         80x80x256      40x40x512
P5 [1024, 3, 2]        40x40x512      20x20x1024
One observation is that as the width and height of the output activation map decrease, the depth, or number of channels, increases.
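The sizes in Table 2.1 follow from the output-size formula of Section 2.1.1 applied stage by stage. The sketch below reproduces the spatial dimensions; the paddings (2 for the 6x6 stem, as listed in the table, and 1 for the 3x3 convolutions) are assumed from the usual YOLOv5 convention:

```python
def conv_out(w: int, k: int, s: int, p: int) -> int:
    """Spatial output size of a convolution (floor division, as in PyTorch)."""
    return (w - k + 2 * p) // s + 1

size = 640
# (kernel, stride, padding) per backbone stage, matching Table 2.1;
# the padding values are assumptions, not stated explicitly in the thesis.
for name, (k, s, p) in {"P1": (6, 2, 2), "P2": (3, 2, 1), "P3": (3, 2, 1),
                        "P4": (3, 2, 1), "P5": (3, 2, 1)}.items():
    size = conv_out(size, k, s, p)
    print(name, size)   # P1 320, P2 160, P3 80, P4 40, P5 20
```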
Neck
The neck network consists of two convolutional layers that reduce the spatial dimensions of the feature maps and merge them to produce a single feature map with higher resolution.
In the YOLOv5 object detection architecture, the "neck" refers to the set of layers that follow the backbone and precede the head. The purpose of the neck is to process the feature maps produced by the backbone and extract higher-level representations that are more suitable for the detection task.