HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
SCHOOL OF ELECTRICAL AND ELECTRONIC ENGINEERING

MASTER THESIS
Deep learning on edged devices in an
intelligent transport system

TRAN THAI SON
Electronic Technology

Instructor: Dr. Nguyen Nam Phong
School of Electrical and Electronic Engineering

Hanoi, October 2022
GRADUATION THESIS EVALUATION
(For the supervisor)
Evaluator's name:
Student's full name: Trần Thái Sơn    Student ID: 20212460M
Thesis title: Deep Learning on Edged Devices in an Intelligent Transport System
Select the appropriate score for each of the criteria below:
Very poor (1); Poor (2); Satisfactory (3); Good (4); Excellent (5)

Combination of theory and practice (20 points)
1. Clearly states the urgency and importance of the topic, the problems and hypotheses (including the objectives and relevance), as well as the scope of application of the thesis: 1 2 3 4 5
2. Incorporates the most recent research results (domestic/international): 1 2 3 4 5
3. Clearly and thoroughly describes the research/problem-solving methodology: 1 2 3 4 5
4. Provides simulation/experimental results and clearly presents the achieved results: 1 2 3 4 5

Ability to analyze and evaluate results (15 points)
5. A clear work plan, including objectives and implementation methods based on systematic theoretical research: 1 2 3 4 5
6. Results are presented logically and understandably, and all results are adequately analyzed and evaluated: 1 2 3 4 5
7. In the conclusion, the author clearly identifies the differences (if any) between the achieved results and the initial objectives, and provides arguments to propose feasible directions for future work: 1 2 3 4 5

Thesis writing skills (10 points)
8. The thesis follows the prescribed template with a logical and well-presented chapter structure (tables and figures are clear, titled, numbered, and explained or referenced; margins are consistent; there are spaces after periods and commas, etc.), with chapter introductions and chapter conclusions, a list of references, and properly formatted citations: 1 2 3 4 5
9. Excellent writing skills (standard sentence structure, scientific writing style, logical and well-grounded argumentation, appropriate vocabulary, etc.): 1 2 3 4 5

Scientific research achievements (5 points) (select one of the three cases)
10a. Has a scientific paper published or accepted for publication / won a third prize or higher at the School-level student scientific research awards / won a scientific award (international or domestic) of third prize or higher / has a registered invention or patent: 5
10b. Presented at the School-level council of the student scientific research conference but did not win a third prize or higher / won a consolation prize in other national or international competitions in the field (e.g., the TI contest): 2

Total score converted to the 10-point scale:

Other comments (on the student's attitude and work ethic):

Date: / / 20
Evaluator
(Signature and full name)
GRADUATION THESIS EVALUATION
(For the reviewer)
Evaluating lecturer:
Student's full name: Trần Thái Sơn    Student ID: 20212460M
Thesis title: Deep Learning on Edged Devices in an Intelligent Transport System
Select the appropriate score for each of the criteria below:
Very poor (1); Poor (2); Satisfactory (3); Good (4); Excellent (5)

Combination of theory and practice (20 points)
1. Clearly states the urgency and importance of the topic, the problems and hypotheses (including the objectives and relevance), as well as the scope of application of the thesis: 1 2 3 4 5
2. Incorporates the most recent research results (domestic/international): 1 2 3 4 5
3. Clearly and thoroughly describes the research/problem-solving methodology: 1 2 3 4 5
4. Provides simulation/experimental results and clearly presents the achieved results: 1 2 3 4 5

Ability to analyze and evaluate results (15 points)
5. A clear work plan, including objectives and implementation methods based on systematic theoretical research: 1 2 3 4 5
6. Results are presented logically and understandably, and all results are adequately analyzed and evaluated: 1 2 3 4 5
7. In the conclusion, the author clearly identifies the differences (if any) between the achieved results and the initial objectives, and provides arguments to propose feasible directions for future work: 1 2 3 4 5

Thesis writing skills (10 points)
8. The thesis follows the prescribed template with a logical and well-presented chapter structure (tables and figures are clear, titled, numbered, and explained or referenced; margins are consistent; there are spaces after periods and commas, etc.), with chapter introductions and chapter conclusions, a list of references, and properly formatted citations: 1 2 3 4 5
9. Excellent writing skills (standard sentence structure, scientific writing style, logical and well-grounded argumentation, appropriate vocabulary, etc.): 1 2 3 4 5

Scientific research achievements (5 points) (select one of the three cases)
10a. Has a scientific paper published or accepted for publication / won a third prize or higher at the School-level student scientific research awards / won a scientific award (international or domestic) of third prize or higher / has a registered invention or patent: 5
10b. Presented at the School-level council of the student scientific research conference but did not win a third prize or higher / won a consolation prize in other national or international competitions in the field (e.g., the TI contest): 2

Total score converted to the 10-point scale:

Other comments from the reviewer:

Date: / / 20
Evaluator
(Signature and full name)
Modern structures such as bridges, buildings, water dams, mineral mines, etc. have indeed played an important role in our daily life. They benefit us in many aspects and greatly contribute to our twenty-first-century society. Owners and maintenance managers of these capital-intensive assets expect to increase the durability of their structures. Therefore, the Structural Health Monitoring (SHM) system was born in response to such a need. With knowledge of a structure's defects, timely intervention shall reduce risks and maintenance costs. As a result, an SHM system shall help to increase the service life of structures. For example, the Canadian Highway Bridge code considers a service life of 75 years for newly constructed bridges. Another example is the New Champlain Bridge in Montreal, Canada, which is designed for 125 years in service. Moreover, municipalities and provincial and federal governments are becoming interested in the concept of the "Smart City". This vision aims at implementing advanced technologies to create value-added services for citizens and the administration of the city [?]. In this context, proper inspection and maintenance have become more important than ever.
Another crucial application of an SHM system is to improve the reliability of existing infrastructure. Existing infrastructures in North America are rapidly approaching the end of their design service life. According to the ASCE 2017 infrastructure report card, nearly 10% of bridges (about 56,000 bridges) in the United States have structural deficiencies, which makes them vulnerable. The conditions in Canada are not much better: there are about 75,000 highway bridges in Canada, and according to the National Research Council of Canada (NRCC 2015), almost one-third of these bridges have structural or functional deficiencies. The collapse of the Ponte Morandi in Genoa, Italy, in August 2018 clearly shows that existing infrastructure requires immediate attention.
An SHM system refers to an array of connected sensors, which collect and analyze data at every moment during the service life of the structure. The goal is to identify and quantify any damage or deterioration state that might occur over the service life [?]. Vibrating Wire Sensors (VWS) are among such sensors. In Viet Nam, due to our rapidly developing economy over the past decade, construction is seen everywhere. The tragic and catastrophic breakdown of the Can Tho bridge in Vinh Long province on 26 September 2007, which caused the death of 55 people, is evidence of how important structure monitoring is. Thus, in Viet Nam, the need for VWS and SHM systems in general is undeniable. According to surveys, most of the VWS sold on the market are GEOKON originals, while the readout devices compatible with these sensors are overpriced as well as hard to purchase. This study plays the part of introducing and developing a VWS Reader that is compatible with these VWS models, with the vision to develop it into a data-logger for IoT implementation. The product is designed to be affordable while still satisfying all standard industrial requirements.
ACKNOWLEDGMENTS
To complete this graduation thesis, I would like to express my heartfelt gratitude to Dr. Nguyen Nam Phong for his unwavering support and guidance during my project writing process.
I would love to give special thanks to the Hanoi University of Science and Technology lecturers who have enthusiastically passed on their experience over the years. The knowledge I gained during my studies at Hanoi University of Science and Technology is not only the cornerstone for completing the project but also a valuable asset that will enable me to confidently enter the labor market.
I am also grateful to Ms. Bui Van Anh and Mr. Nguyen Tien Hoa for creating this thesis LaTeX template.
Finally, I wish the teachers good health and continued achievement in their careers as educators.
DECLARATION
I am Tran Thai Son, student ID 20212460M, and my instructor is Dr. Nguyen Nam Phong. I assure that all of the content presented in this project is the result of my own research; the data stated in the project are completely genuine, reflecting the actual measurement results; all cited information complies with intellectual property regulations; and the references are clearly listed. I take full responsibility for the content written in this project.
Hanoi, 10 September 2022
Student
Tran Thai Son
TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION AND SYSTEM'S OVERVIEW
1.1 Introduction to ITS problems and system's overview
1.2 Violation detection flow on edge module
CHAPTER 2: OBJECT DETECTION AND TRACKING APPROACHES
2.1 Convolutional Neural Networks and YOLO
2.2 Evolution of YOLO and selection of YOLOv5 model
2.3 Implementation of YOLO detector
2.4 Theory of BYTE Track
2.5 Implementation of BYTE tracker
CHAPTER 3: VIOLATION DETECTION RESULTS AND DATA TRAINING
3.1 Violation and license plate detection approaches
3.2 Experiment results
3.3 Data preparation and training
3.3.1 Data collection
3.3.2 Exploratory data analysis
3.3.3 Data augmentation
3.3.4 Model training and evaluation
ABBREVIATIONS
OCR: Optical Character Recognition
LIST OF FIGURES
1.1 Overall system flow diagram
1.2 Configuration drawing tool for operators
1.3 Camera viewing access for mobile application
1.4 Image of Jetson Xavier
1.5 HIKVISION camera
1.6 Operators utilise OGUI for set-up configuration
1.7 Violation data are visualized to operators
1.8 Mobile application's access to violation data
1.9 Violation data sample in Cloud Firestore
1.10 Data structure of camera's configuration
1.11 Overview implementation of edge device
2.1 Bounding boxes with a car as object [1]
2.2 Bounding boxes illustration [1]
2.3 YOLO's core idea [2]
2.4 YOLO's loss function [1]
2.5 Speed comparison of YOLOv3 [3]
2.6 YOLOv5 model sizes, where FP16 stands for half floating-point precision, V100 is the inference time in milliseconds on the Nvidia V100 GPU, and mAP is based on the original COCO dataset [4]
2.7 YOLOv5 network architecture [2]
2.8 Comparison of vanilla PyTorch and ONNX-TensorRT workflows
2.9 TensorRT with its six optimization approaches [5]
2.10 TensorRT: precision calibration, layer and tensor fusion, kernel auto-tuning, and multi-stream execution [5]
2.11 TensorRT layer and tensor fusion optimization [5]
2.12 MOTA-IDF1-FPS comparisons of different trackers. The horizontal axis is FPS (running speed), the vertical axis is MOTA, and the radius of each circle is IDF1 [6]
2.13 Architectural model of one MOT method [6]
2.14 BYTE Track algorithm [6]
2.15 Simulation of the working principle of BYTE Track [6]
2.16 Tracking flow of BYTE Track
2.17 Recovering Appearance Feature Extraction method. For four-wheel vehicles, the features in the area of the red box are extracted, while two-wheel vehicles are featured in the green and blue boxes
3.1 Violation detection algorithm
3.2 Extracting bounding box centers from YOLO format
3.3 Traffic light recognition flow
3.4 Intersection algorithm filtering
3.5 The long, single-row license plate type in Vietnam
3.6 License plates with different colors indicate different categories of vehicles in Vietnam
3.7 Pipeline in license plate detection
3.8 Original images of vehicles
3.9 Result of the grayscale process in the license plate flow
3.10 Result of the bilateral filter in the license plate flow
3.11 Edge detection with the Canny edge method in the license plate flow
3.12 Finding contours process in the license plate flow
3.13 Extracted ROI in the license plate flow
3.14 Cropped short, two-line license plate as input of the character segmentation and separation process
3.15 Cropped long, one-line license plate
3.16 CRNN architecture in the PaddleOCR approach
3.17 License plate before detection (left) and after detection (right)
3.18 Determining the characters' positions on the license plate
3.19 Image of violation event, test run on recorded video at an intersection
3.20 Image of violation event, test run on streamed video at an intersection
3.21 Image of violation event with a violating motorcycle
3.22 Image of violation event with a violating car
3.23 Recognizing the license plate of a violating vehicle
3.24 Automated annotation using the YOLOv7 model
3.25 Result of the automated annotation tool
3.26 Flipping transformation result in our dataset
3.27 Filtering transformation approach result in our dataset
3.28 Scaling transformation approach result in our dataset
3.29 Translation transformation approach result in our dataset
3.30 Shear transformation by five degrees in our dataset
3.31 Shear transformation by minus five degrees in our dataset
3.32 Rotation transformation approach result in our dataset
3.33 MixUp augmentation approach in our dataset
3.34 Two common approaches to tuning hyper-parameters
3.35 Data in VOC 2007 format
3.36 Model training process on server
3.37 Training results on dataset
LIST OF TABLES
Table 1.1 Specifications of Jetson Xavier
Table 2.1 Maximum batch size for each network architecture and workflow
Table 2.2 Utilized frameworks with corresponding versions included in JetPack 4.6.4
Table 2.3 Other frameworks with corresponding versions
Table 2.4 Comparison of different data association methods on the MOT17 validation set [6]
Table 3.1 Range of components in HSV color space of signal lights
Table 3.2 Comparison of improved RAFE-BYTE Track with other approaches
Table 3.3 Comparison of data analysis metrics between our dataset and others
ABSTRACT
An intelligent transportation system (ITS) [7] plays an essential role in public transport management, security, and other issues. The ITS requires traffic flow detection and tracking as a vital component. Based on the real-time acquisition of urban road traffic flow information, an ITS provides intelligent guidance for relieving traffic jams and reducing environmental pollution. Traffic flow detection in an ITS usually adopts the cloud computing mode: the edge node transmits all the captured video to the cloud computing center. However, the growth of traffic monitoring has brought enormous challenges to the storage, communication, and processing of traditional transportation systems based on cloud computing. To address this issue, a traffic flow detection and tracking scheme based on deep learning on the edge node is presented in this thesis. Apart from the edge node, the proposed ITS also contains a data-storing server and a client-based user interface, which makes this ITS more practical and user-oriented than others.
The design of our edge module is implemented as five sub-processes, with the objective of identifying traffic violations. First, a vehicle detection algorithm based on the YOLO (You Only Look Once) model [2], trained with a large volume of traffic data, is put to use. The vehicle detection algorithm is integrated with an object tracking algorithm called BYTE Track [6]. The detection and tracking scheme is further improved with an enhanced feature extraction and association method. To fulfill the purpose of the ITS, a violation supervising module with an Intersection Filtering method takes tracked objects as inputs in order to detect and classify violations. Lastly, the violating vehicle's license plate is recognized and segmented. To guarantee the real-time operation of the ITS, the vehicle detection model is optimized on the edge device Jetson Nano platform [8] with TensorRT and ONNX conversion. After verifying the correctness and efficiency of my framework, the test results indicate that my ITS is able to execute the entire violation detection flow with an average processing speed of 28.7 FPS (frames per second) and an average accuracy of 86.53% on the edge device.
CHAPTER 1: INTRODUCTION AND SYSTEM'S OVERVIEW
1.1 Introduction to ITS problems and system’s overview
The purchase of vehicles in Vietnam, and worldwide generally, has significantly increased within the last decade due to the development of the economy and the improvement of people's living standards. According to statistics, there were 45 million motor vehicles and 4.7 million cars in Vietnam by 2022, which has caused huge problems such as traffic jams and environmental pollution. To address these problems, the intelligent transportation system (ITS) [7] is presented, which combines both IoT (Internet of Things) and AI (Artificial Intelligence) technologies [9]. In addition, the maturity of 5G technology [10] will make ITS feasible and faster-growing.
Traffic flow tracking [11] is one of the key technologies in an ITS. However, the traditional traffic flow tracking methods mainly include ultrasonic detection [12] or induction coil detection [13]. These methods have disadvantages such as low detection accuracy and short detection distance, and they are easily affected by the environment. Therefore, with the rapid development of deep learning and computer vision, many excellent object detection and target tracking algorithms [14] - [15] have been introduced over the past few years. These modern approaches present several advantages:
1. They cause no harm to the pavement, which leads to lower maintenance costs;
2. They require no additional devices and tools for testing;
3. They can be deployed with a small budget;
4. They obtain metadata through the traffic video monitoring system, a subsystem of the ITS;
5. They achieve much higher accuracy and precision than the traditional methods;
6. Their performance can be improved over time with more data and enhanced algorithms.
The traffic flow tracking algorithm [16] - [17] based on intelligent video is described as a three-step process. First, the vehicle object is detected from the video or image sequence. Then, the detected vehicle is tracked to establish the connection between different video frames. Finally, the vehicles passing the relevant lane within a certain period are detected and the detection result is output.
Nowadays, most ITS deploy a cloud-center mode [18] for traffic flow tracking algorithms. However, when it comes to applications, cloud computing modes have shown several drawbacks. First of all, uploading video files to the cloud and waiting for the processing result is impossible for real-time processing. Secondly, the network bandwidth is insufficient, especially for 4K and Ultra HD videos. Moreover, the energy consumption and maintenance work of a data center are excessive. Lastly, a cloud center leads to risks for data security and privacy.
consump-Moreover, in recent years, even some edge computing modules that propose fic flow tracking have been introduced However, considering all aspects, these edgecomputing modules have shown deficits For instance, S.Y.Nikouei, Y.Chen et al [19]propose a "Real-Time Human Detection as an Edge Service" implemented on Rasp-berry PI 3 edge device with openCV library Nevertheless, openCV is not capable offully utilizing GPUs, especially on Raspberry PI 3, which leads to non-real-time detec-tion Another work is Nashwan Adnan Othman et al [8], who introduce an "Embeddedcar counter system using Jetson Xavier" Despite adopting the system on an AI-orienteddevice like Jetson Xavier, the authors still do not fully make use of the NVIDIA de-velopment environment Also, the car counter system does not attach to a bigger IoTarchitecture but only uses a Telegram API to send notifications to smartphones Al-though Chi-Sung Ahn, Bong-Gyou Lee et al [20] suggest outstanding ideas to detectcar license plate area, the design only addresses a modest problem and is unable to bedeveloped into a completed system
traf-Based on the above shortcomings of the cloud computing mode and other edgecomputing systems, I introduced a user-oriented, technology-efficient system to addressthe problems of traffic surveillance and vehicle violation detection This ITS comeswith two Graphical User Interfaces, uniquely for on-site police officers and monitoringoperators at the operation center Moreover, I adopt an edge computing mode [21] in
my traffic flow tracking scheme on edge devices Any edge device will connect to itscorresponding camera, forming a completed edge module, to perform violations detec-tion and supervision All client-based and edge-based modules share a mutual Firebasecloud service for communicating and data storing Together, they establish a completedITS system, which differentiates itself from other ITS The overall system diagram isshown in Figure 1.1, including four subsystems:
1. Embedded AI camera: this module is responsible for the multiple-object detection and tracking flow. The violation detection takes the tracker's outputs as inputs and determines which tracked object is considered "violating", according to Vietnam's traffic laws. It also includes a license-plate-specific module, which detects the distinctive Vietnamese license plate characters of violating vehicles. This is the heart of our system.
2. Operation Graphical User Interface (OGUI): this module targets end-users whose roles are operators and administrators of the system. With OGUI, operators are able to monitor and modify the list of cameras and their statuses and to control the configuration of each Embedded AI camera, which the Embedded AI camera needs in order to detect violations. Furthermore, OGUI provides operators with statistical figures of violations in each city, which give users a deeper insight into the traffic law obedience of local citizens. When a new camera is added to the system, OGUI provides users with line drawing and visualization features. The drawing and visualization tools of OGUI are shown in Figure 1.2.
3. On-duty Officer User Interface (OOUI): this module targets end-users who are on-duty, on-site officers. OOUI provides users with camera and violation viewing features, as shown in Figure 1.3. However, the list of cameras and violations is restricted to each officer's account, meaning that an officer can only see the cameras and violations in his/her area.
4. Google Firebase real-time cloud-based storage and authentication: this module is the cloud-based backend server for our system. It contains two main services: authentication and data storing. While the "authentication" service verifies a user's identity and grants the user access to certain parts of our system, the "data storing" service takes care of secure data, namely the embedded camera configurations, violation details (date and time, location, type of violation, type of vehicle, ...), violation video footage, and the violating vehicle's license plate information.
The edge device chosen for the "Embedded AI camera" module is the Jetson Xavier [22]. The Jetson Xavier is a small yet powerful computer for embedded applications and AI IoT that delivers the power of modern AI. Researchers are able to get started fast with the comprehensive JetPack SDK and its accelerated libraries for deep learning, computer vision, graphics, multimedia, and more. The Jetson Xavier has the performance and capabilities needed to run modern AI workloads, giving researchers a fast and easy way to add advanced AI to a product. On the market, the Jetson Xavier (4GB RAM module) ranges from 200 to 300 US dollars, which is a reasonable price for an edge node in an ITS. The product image of the Jetson Xavier is shown in Figure 1.4.
Being a GPU-strong device, the Jetson Xavier is well suited for computer vision and deep learning tasks, since such tasks require the parallel computing power of a GPU. Some specifications are shown in Table 1.1.
The edge device connects to a Hikvision camera, shown in Figure 1.5. Some of the camera's features are listed below:
• 1/3" Progressive Scan CMOS
Figure 1.1 Overall system flow diagram
Figure 1.2 Configuration drawing tool for operators
Figure 1.3 Camera viewing access for mobile application
Figure 1.4 Image of Jetson Xavier
• IP67
Also, the camera's specifications are provided:
• Image Sensor: 1/3" Progressive Scan CMOS
• Min. Illumination: Color, 0.028 Lux @ (F2.0, AGC ON)
GPU: 128-core Maxwell
CPU: Quad-core ARM A57 @ 1.43 GHz
Memory: 4 GB 64-bit LPDDR4, 25.6 GB/s
Storage: microSD (not included)
Video Encode: 4K @ 30 | 4x 1080p @ 30 | 9x 720p @ 30 (H.264/H.265)
Video Decode: 4K @ 60 | 2x 4K @ 30 | 8x 1080p @ 30 (H.264/H.265)
Camera: 1x MIPI CSI-2 DPHY lanes
Connectivity: Gigabit Ethernet, M.2 Key E
Display: HDMI 2.0 and eDP 1.4
USB: 4x USB 3.0, USB 2.0 Micro-B
Others: GPIO, I2C, I2S, SPI, UART
Mechanical: 100 mm x 80 mm x 29 mm
Table 1.1 Specifications of Jetson Xavier

The middleware communication between the client modules, the edge devices, and the cloud server defines which data is transmitted and received. Data transmission and reception in our overall system are shown in Figure 1.1 and are explained below:
1. Middleware between the Cloud server and the Operation Graphical User Interface (OGUI): based on the camera-list data received from the server, OGUI can request an update for each camera by emitting requests that include camera set-up configurations such as border lines (no-cross solid line, crossable dashed line, red-light line, ...), turnable lanes, etc. These camera configurations are input by operators when they perform the set-up process for each camera, as shown in Figure 1.6. Operators can also interact with OGUI to create new camera resources with similar set-up configurations. Meanwhile, the Cloud server feeds OGUI with violation data; Figure 1.7 indicates how these violation data are visualized to operators in order to provide them with useful information.
2. Middleware between the Cloud server and the Embedded AI Camera: each embedded edge device receives the set-up configuration corresponding to its device ID. Following the previous step, the edge device applies the configuration changes to its multiple-object-tracking and violation detection flow. Based on the configuration assigned by the operator, the edge device executes its process and returns any detected violations to the Cloud server. A detected violation contains the following information: a five-second video of the violation scene, the type of violation formatted as a string, the license plate of the vehicle, the date and time when the violation occurred, and the class of the violating vehicle (car, motorcycle, bus, truck). This information is stored in the Cloud server and is served to operators upon requests from OGUI, as mentioned above.
Figure 1.5 HIKVISION camera
3. Middleware between the Cloud server and the On-duty Officer User Interface (OOUI): OOUI is a lite, mobile-based version of the Operation Graphical User Interface. To provide on-duty, on-site officers with convenient and easy access to violation data, OOUI must be a mobile platform. However, since this is only a lite client, its access is limited to read-only violation data. OOUI also provides limited features, such as access to assigned camera footage and viewing and visualizing account-specific violations, as shown in Figure 1.8. Due to its read-only rights, OOUI does not create, update, or delete any resources on the Cloud server.
Our Firebase cloud storage server contains two services that provide storing features:
1. Cloud Firestore, a flexible, scalable NoSQL cloud database, for storing violation and camera detail data. Cloud Firestore provides both storage and sync-able data for client-side and server-side development. It keeps our data in sync across client apps through realtime listeners and offers offline support for mobile and web, so responsive apps work regardless of network latency or Internet connectivity.
Figure 1.6 Operators utilise OGUI for set-up configuration
Figure 1.7 Violation data are visualized to operators
2. Cloud Storage, for user-generated content such as license plate photos and violation videos. It provides OOUI and OGUI with a programming interface to securely upload and download violation-related files, regardless of network quality.
Figure 1.8 Mobile application's access to violation data
Diving deeper into Cloud Firestore [23], the following key capabilities make it suitable for violation, camera detail, and configuration data:
• Flexibility: the data model supports flexible, hierarchical data structures. Data is stored in documents, organized into collections. Documents can contain complex nested objects in addition to sub-collections. This data structure fits our purpose well, since one violation sample can contain multiple complex data fields, including but not restricted to the date and time of the violation, the license plate content, the violation type, the violation location, and the type of violating vehicle, as shown in Figure 1.9. Flexibility helps our data model adapt when our users' requirements change.
• Expressive querying: queries are utilized to retrieve individual violations, specific violation documents, or all the documents in a collection that match the query parameters passed down from OGUI or OOUI. Queries can include multiple chained filters and combine filtering and sorting. They are also indexed by default, so query performance is proportional to the size of the result set (a query sketch is given after Figure 1.9).
• Realtime updates: Cloud Firestore uses data synchronization to update data on the OGUI and OOUI clients. Changes occur frequently in the violation collection, and thanks to the synchronization mechanism they are updated and visualized instantly in client bases like OOUI and OGUI. Moreover, Cloud Firestore is also designed to make simple, one-time fetch queries efficient.
• Offline support: on-site officers mainly carry out their duty on the street. Therefore, they should expect to lose connection frequently, especially in remote areas. Cloud Firestore caches data that our client is actively using, so clients can write, read, listen to, and even query data while the device is offline. When the officer enters a signal-covered area, the device comes back online and synchronizes any local changes back to Cloud Firestore.
• Designed to scale: Cloud Firestore brings along Google's most powerful infrastructure: automatic multi-region data replication, strong consistency guarantees, atomic batch operations, and real transaction support. When the system is deployed and spreads from one city to another, the number of edge camera devices will significantly increase. Powerful scalability ensures our back-end's ability to handle the toughest database workloads and the most massive numbers of edge cameras.
Figure 1.9 Violation data sample in Cloud Firestore
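To make the querying and realtime-update capabilities above concrete, the sketch below queries the "violation" collection and attaches a realtime listener using the google-cloud-firestore Python client. This is an illustrative sketch rather than the system's code: the field names follow the collection layout described later in this section, while the filter value and credentials are hypothetical.

```python
from google.cloud import firestore

# Uses the service-account credentials in GOOGLE_APPLICATION_CREDENTIALS.
db = firestore.Client()

# One-time fetch: violations of one type, most recent first. Note that a
# compound filter + sort like this requires a composite index in Firestore.
query = (
    db.collection("violation")
      .where("violation", "==", "crossing red light")  # hypothetical label
      .order_by("time", direction=firestore.Query.DESCENDING)
      .limit(20)
)
for doc in query.stream():
    record = doc.to_dict()
    print(doc.id, record.get("license plate"), record.get("time"))

# Realtime listener: invoked whenever matching documents change, which is
# how the OGUI/OOUI clients stay in sync without polling.
def on_violations(col_snapshot, changes, read_time):
    for change in changes:
        if change.type.name == "ADDED":
            print("new violation:", change.document.to_dict())

watch = db.collection("violation").on_snapshot(on_violations)
```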
While Cloud Firestore presents key features suitable for storing violations and camera details, Cloud Storage [24] is the best fit for file manipulation, for the following reasons:
• Robust operations: servers perform uploads and downloads regardless of network quality. Uploads and downloads are robust, meaning they restart where they stopped. This mechanism not only guarantees that the vehicle's license plate images and violation videos are uploaded and downloaded successfully, but also saves users time and bandwidth. Without robust operations, media data could be lost, causing inadequate violation information and, therefore, difficulty in the violation monitoring process.
• Strong security: integration with Firebase Authentication provides the system with simple and intuitive authentication, which is necessary for a confidentiality-oriented traffic monitoring system. A declarative security model is used to allow access based on filename, size, content type, and other metadata.
Following Cloud Firestore's NoSQL data model, our violation data is stored in documents that contain fields mapping to values. These documents are stored in collections, which are containers used to organize data and build queries. Our data structure is designed with many different data types: from simple strings and numbers like the license plate, vehicle class, and violation classification, to timestamp formats like the violation occurrence time, to complex nested objects like the camera configuration.
Collections are simply containers for documents. The system is designed with three collections: "account" for storing users' account information, "cameraIp" for camera details and configuration, and "violation" for vehicles' violation details. The "cameraIp" and "violation" collections are the main focus and, therefore, will be discussed further. A document is a lightweight record that contains fields, which map to values. In our system, each document is identified by a randomly and uniquely generated name, which separates each one from another. In the "violation" collection, the field structure contains: license plate, location (the camera's location), time (the date and time of the violation), video_name (the evidence of the vehicle's violation), and violation (the type of violation). On the other hand, the field structure of the "cameraIp" collection consists of: "Ipcamera" (the http/rtsp URL of the camera, including the method, DNS domain, port, and the user and password for camera authentication), "district" (the location of the camera), "namecam" (the user's display name of the camera), and other fields for the camera's configuration. Figure 1.10 indicates how the camera's configuration is stored.
Figure 1.10 Data structure of camera’s configuration
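As a compact illustration of this schema, the sketch below spells out one "violation" document and one "cameraIp" document as Python dictionaries. Only the field names come from the description above; all concrete values are invented placeholders.

```python
from datetime import datetime

# One document in the "violation" collection (placeholder values).
violation_doc = {
    "license plate": "29A-123.45",                  # recognized plate text
    "location": "an intersection name",             # camera's location
    "time": datetime(2022, 9, 10, 8, 15, 0),        # date-time of the violation
    "video_name": "violations/cam01/evt_0001.mp4",  # evidence video reference
    "violation": "crossing red light",              # type of violation
}

# One document in the "cameraIp" collection (placeholder values).
camera_doc = {
    "Ipcamera": "rtsp://user:pass@203.0.113.7:554/stream1",  # camera URL
    "district": "camera's district",                # location of the camera
    "namecam": "Camera 01",                         # user's display name
    # ...other fields hold the drawn configuration: border lines,
    # red-light line, turnable lanes, and so on.
}
```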
Cloud Firebase Firestore and Storage are implemented along the following path:
1. Integrate the Firebase interface: the programming interface is included in the client application, built with Flutter.
2. Create data resources: the three collections with their respective documents mentioned above are created. For media files such as license plate images and violation videos, a reference to the file path is declared in order to upload, download, or delete the media (see the sketch after this list). Data resources are updated, added, or deleted by the OGUI client or the edge devices, as explained earlier.
3. Fetch data resources: during the execution process of the clients, whenever there are requests, data is retrieved using queries or realtime listeners. It is then fetched as JSON data or a file stream for display to users.
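Step 2 above can be sketched with the firebase-admin Python SDK: a reference to a path inside the bucket is declared, and the media file is uploaded, downloaded, or deleted against it. The service-account path, bucket name, and file paths below are hypothetical.

```python
import firebase_admin
from firebase_admin import credentials, storage

# Initialize once with a service account and the project's default bucket.
cred = credentials.Certificate("serviceAccount.json")        # hypothetical
firebase_admin.initialize_app(cred, {"storageBucket": "its-demo.appspot.com"})

bucket = storage.bucket()

# Declare a reference to a path, then upload the violation clip to it.
blob = bucket.blob("violations/cam01/evt_0001.mp4")
blob.upload_from_filename("/tmp/evt_0001.mp4")

# Download works symmetrically; delete removes the object.
blob.download_to_filename("/tmp/evt_0001_copy.mp4")
# blob.delete()
```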
Both of our client-based applications are developed using the Flutter SDK and framework [25]. Flutter is a free and open-source mobile UI framework created by Google in May 2017 with the purpose of creating native mobile applications with only one codebase. Flutter consists of two important parts: an SDK and a framework. An SDK (Software Development Kit) is a collection of tools used to compile the codebase into native machine code. A framework (a UI library based on widgets) is a collection of reusable UI elements (buttons, text inputs, sliders, and so on) used to personalize the front end.
To develop with Flutter, a programming language called Dart is used. Flutter has some significant advantages, which make it suitable for application programming:
1. High productivity: by using the same codebase for iOS and Android, we can save both time and resources. Flutter's native widgets also minimize time spent on testing by ensuring there are little to no compatibility issues with different OS versions.
2. Easy to learn: Flutter allows us to build native mobile applications without needing to access OEM widgets or use a lot of code. This, in addition to Flutter's particularly appealing user interface, makes the mobile app creation process much simpler.
3. Impressive performance: it is difficult to notice the difference between a Flutter app and a native mobile app.
4. Cost-effective: building iOS and Android apps with the same codebase is building two apps for the price of one.
5. Available in different IDEs (Integrated Development Environments): we are free to choose between Android Studio and VS Code to edit Flutter code.
6. Great documentation and community: Flutter has many great resources to answer our questions, thanks to its ample documentation with easy-to-follow use cases. Flutter users also benefit from community hubs like Flutter Community and Flutter Awesome for exchanging ideas.
7. Quick compilation: we can change the code and see the results in real time. This is called Hot Reload. It only takes a short amount of time after saving for the application itself to update.
1.2 Violation detection flow on edge module
The edge module mentioned above contains a multiple-object tracking and license plate processing flow. In this flow, the module's input is the frame stream obtained from the camera at the observation points. The module's output is the violation information and the license plate of every monitored vehicle.
Figure 1.11 Overview implementation of edge device
In Figure 1.11, the input image sequences are consecutive frames transmitted from the camera attached to the corresponding edge device. This module aims to address the following main computer vision problems:
• The problem of vehicle identification and detection: the MOT flow returns the position of the bounding box containing the object, including the top-left corner, the bottom-right corner, and the label of the object. Specifically, in this problem the objects are vehicles: car, motorcycle, bus, and truck. A YOLOv5-based algorithm and a modified YOLOv5s model are put to use to address this problem, resulting in edge-device optimization and efficiency. Later, in section 2.3, the detection problem is discussed thoroughly.
• The problem of vehicle tracking and re-identification: each vehicle's previous and subsequent positions are considered for monitoring vehicle traffic entering and leaving an area, or simply crossing a line. The solution to this problem is BYTE Track [6], currently one of the famous state-of-the-art methods in the multiple object tracking field. Section 2.5 contains more detail about the tracking algorithm implementation.
• The problem of violation detection: from the results of vehicle detection and tracking, an Intersection Filtering algorithm is proposed to determine whether a vehicle violated the predefined traffic laws. This is further explained in section 3.1.
• The problem of license plate detection: a license plate detection module and number plate character recognition are applied to extract the features of a violating vehicle, as discussed in section 3.1.
The input video in Figure 1.11 is taken from the Hikvision camera as stated above; it is then split into many images, also called frames. These images are handled in real time by the YOLO vehicle detector. The vehicle detector outputs bounding boxes and confidence scores to the next steps. Bounding boxes conventionally have four parameters: the top-left and bottom-right coordinates. Then, tracking-by-detection using BYTE Track handles the object path tracing. All the steps above further serve the violation and license plate detection. By using the proposed Intersection Filtering algorithm, a violation is identified and classified. After the previous step, the system performs license plate recognition on the violating vehicle and outputs both the box that covers the plate and the text inside the plate. The general purpose of the system is to recognize the violation of a certain vehicle and to identify that vehicle by its appearance and license plate.
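The per-frame flow just described can be summarized in the Python-style sketch below. It is a schematic outline rather than the actual edge-module implementation: the detector, tracker, and the helper functions named here are hypothetical placeholders for the modules discussed in chapters 2 and 3.

```python
import cv2

def process_stream(rtsp_url, detector, tracker, config):
    """Schematic per-frame loop: detect -> track -> filter -> plate OCR."""
    cap = cv2.VideoCapture(rtsp_url)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # 1. YOLO detector: boxes as (x1, y1, x2, y2) with scores and classes.
        boxes, scores, classes = detector.detect(frame)
        # 2. BYTE Track: associate current detections with existing tracks.
        tracks = tracker.update(boxes, scores, classes)
        # 3. Intersection Filtering against the operator-drawn border lines.
        for track in tracks:
            violation = intersection_filter(track, config.lines)
            if violation is not None:
                # 4. License plate detection + OCR on the violating vehicle.
                plate_text = recognize_plate(frame, track.box)
                report_violation(track, violation, plate_text)
    cap.release()
```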
In the following sections 2.3 and 2.5, multiple optimization methods are proposed to further improve the performance of this edge module. With these approaches, the system achieves a throughput of 28.7 FPS for the entire flow shown in Figure 1.11.
The efficiency of our edge module reflects the efficiency of the traffic surveillance and vehicle violation detection tasks. Therefore, in this core module, I propose a traffic flow tracking scheme at the edge of the ITS that has the following novel ideas:
1. A vehicle detection algorithm based on the improved YOLOv5 object detection algorithm [2]. The algorithm detects individual vehicles at the single-frame level by capturing appearance features with a state-of-the-art object detection method. By applying NVIDIA environment optimization, its detection speed is greatly improved to meet real-time requirements.
2. A vehicle tracking algorithm based on the improved BYTE Track multiple-object tracking algorithm [6]. To achieve high accuracy, popular MOT methods like BYTE Track introduce a Re-ID module to integrate with the detection-based MOT method and train two DNNs simultaneously. However, this Re-ID module lays a burden on both computing costs and model training resources. To address this key issue, I designed an enhanced feature extraction method called RAFE, a simple but relatively accurate method, to substitute the cumbersome Re-ID component. Moreover, the tracking scheme is also upgraded with a proposed association method.
3. A real-time violation supervising algorithm [26] that combines a newly proposed Intersection Filtering method, for detecting vehicles passing through a line, with a violation classification flow. It realizes a high-accuracy, real-time traffic flow detection approach.
4. A license plate recognition algorithm based on PaddleOCR [27], which greatly increases the performance of license plate detection compared to other deep learning approaches. The algorithm detects the plate, then recognizes the characters [28] and the color inside.
5. An approach for edge device migration and deployment of the traffic flow detection task using the NVIDIA engine. The high-accuracy, real-time traffic detection task is completed on the Jetson Xavier edge device with a compressed and optimized structure.
However, when putting an ITS with popular, widely used pretrained detection models into practice in Vietnam, these models show shortcomings in detecting motorcycles or motorbikes. Because motorcycles in Vietnam usually refer to scooters or compact motorbikes, pretrained models often fail to detect these vehicles. Therefore, a dataset customized for Vietnamese traffic is proposed. To ensure the validity and competence of the data when applied to a pretrained model, the dataset is built through a proven step-by-step process, from random gathering and augmentation to exploration and analysis.
The remaining part of this thesis is organized as follows. In Chapter 2, I describe the background theory and implementation of the object detection and tracking methods. In Chapter 3, I describe the data collection, exploration, and analysis process, the Intersection Filtering algorithm, and the license plate recognition approach. Chapter 3 also includes some experimental results and real-life practice of the proposed system. Finally, I summarize the work done in this thesis, pointing out the shortcomings of this study and elaborating future research directions that can be further improved.
CHAPTER 2: OBJECT DETECTION AND TRACKING APPROACHES
For the vehicle detection module, I use YOLOv5 [2] as the main approach, while that of the tracking module is BYTE Track; both aim to bridge the gap between research and industrial communities. In this chapter, I first give a brief introduction to the YOLO algorithm and the deployment of the detection module on NVIDIA hardware. Then, the improved BYTE Track with the proposed RAFE method and data association is brought up.
2.1 Convolutional Neural Networks and YOLO
A Convolutional Neural Network (CNN) is a neural network used in image recognition and processing that is specifically designed to process pixel data. CNNs are powerful image-processing artificial intelligence (AI) models that use deep learning to perform both generative and descriptive tasks, often using machine vision that includes image and video recognition, along with recommendation systems and natural language processing (NLP).
process-A reader with prior background in computer vision and image processing may have
identified my description of a convolution above as a cross-correlation operation instead.
Using cross-correlation instead of convolution is actually by design Convolution
(de-noted by the * operator) over a two-dimensional input image I and two-dimensional kernel K is defined as:
of 3, stands for Red, Green and Blue channels, respectively Given this knowledge, we
can think of an image as big matrix and a kernel or convolutional matrix as a tiny matrix
that is used for blurring, sharpening, edge detection, and other processing functions
Essentially, this tiny kernel sits on top of the big image and slides from left-to-right and top-to-bottom, applying a mathematical operation (i.e., a convolution) at each (x,y)-
coordinate if the original image
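A minimal NumPy sketch of this sliding-window operation is given below; it implements the cross-correlation form (no kernel flip) that deep learning libraries actually compute, and flipping the kernel first recovers the true convolution defined above. This is an illustrative sketch, not code from the thesis.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products
    at each (i, j) position (valid padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 3x3 sharpening kernel applied to a random grayscale patch.
img = np.random.rand(8, 8)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
feature_map = cross_correlate2d(img, sharpen)        # shape (6, 6)
# True convolution = cross-correlation with the kernel flipped on both axes.
conv_map = cross_correlate2d(img, np.flip(sharpen))
```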
YOLO is an algorithm that uses neural networks to provide real-time object detection. This algorithm is popular because of its speed and accuracy. It has been used in various applications to detect traffic signals, people, parking meters, and animals. Object detection is a typical computer vision task in which you try to figure out what and where things are in a given image. Because of its excellent accuracy and ability to operate in real time, YOLO has become extremely popular. The algorithm "only looks once" at the picture, in the sense that it only needs to put the image or video through the neural network once in order to produce predictions. With YOLO, a single CNN predicts multiple bounding boxes and the class probabilities for those boxes at the same time. This simply implies that it detects objects and utilizes bounding boxes to show the position and type of each object.
To illustrate the YOLO theory, consider how YOLO generates bounding boxes. An input image is divided into S × S squares; the picture is then converted to gray-scale so that it may be processed by the computer as numbers, where the brightness of each pixel is represented by a number. A filter is essentially a small grid of numbers: we multiply the pixel values by the filter values and then combine all of the results. We look at one area of the image, apply the filter, then add the results to get one value of the convolved feature map. All that remains is to train the YOLO Convolutional Neural Network to learn to predict the final detection result based on these numbers. However, this is not the only way to detect bounding boxes.
YOLO differs from conventional object identification algorithms in that it takes the entire image and reframes the object recognition issue as a single regression problem, going from image pixels directly to bounding box coordinates and class probabilities. The next step is to utilize IoU and NMS to calculate the bounding boxes. IoU (Intersection over Union) and NMS (Non-Maximum Suppression) are two major post-processing procedures used by YOLO. The IoU measures how closely the machine's predicted bounding box fits the bounding box of the real item. Consider Figure 2.1:
The purple box represents the computer's interpretation of the automobile, while the red box is the car's real bounding box, as shown in Figure 2.2. IoU is determined by the overlap of the two boxes.
Often, numerous bounding boxes of the same car are detected by the object detector. This is where NMS comes into play: NMS guarantees that among all of the bounding boxes, the best one is found. Rather than concluding that the same object appears in many places in the image, NMS picks the box with the best likelihood of detecting that object.
Using both IoU and NMS, YOLO produces an exceptionally quick forecast of the numerous objects in a picture. This is why YOLO is far more popular than other computer vision techniques. A minor downside of YOLO is its limited capacity to recognize several objects that are either too close together or too tiny.
Figure 2.1 Bounding boxes with a car as object [1]
Figure 2.2 Bounding boxes illustration [1]
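Both post-processing steps are compact enough to state exactly: IoU is the intersection area over the union area of two boxes, and greedy NMS keeps the highest-scoring box while discarding any remaining box whose IoU with it exceeds a threshold. The sketch below is the standard textbook formulation, written out for illustration rather than taken from the thesis code.

```python
import numpy as np

def iou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2): top-left and bottom-right corners."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = list(np.argsort(scores)[::-1])  # indices, best score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```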
2.2 Evolution of YOLO and selection of YOLOv5 model
The core idea behind the YOLO algorithm is to take the entire picture as the input of the network and to directly regress the positions and categories of the bounding boxes in the output layer. The implementation of YOLOv1 follows these steps:
1. Divide the image into S × S grid cells. If the center of an object falls in a grid cell, that cell is responsible for predicting the object, as shown in Figure 2.3.
Figure 2.3 YOLO’s core idea [2]
2. Each grid cell needs to predict the location information and confidence information of B bounding boxes. One bounding box corresponds to four location values and one confidence value. The confidence represents both how confident the model is that the predicted box contains an object and how accurate the predicted box is:

\[ C = \Pr(\mathrm{Object}) \times \mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}} \]
3. In this confidence value, if there is an object in the grid cell, the first term is 1; otherwise it is 0. The second term is the IoU value between the predicted bounding box and the actual ground truth.
4. Each bounding box needs to predict (x, y, w, h) plus the confidence, a total of 5 values, and each grid cell also predicts category information, recorded as C categories. With S × S grid cells, each predicting B bounding boxes as well as C category probabilities, the output is a tensor of S × S × (5B + C).
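As a quick worked example of this output size, with the values used in the original YOLOv1 paper (S = 7, B = 2, and C = 20 Pascal VOC classes):

```python
S, B, C = 7, 2, 20                  # YOLOv1 values: grid size, boxes per cell, VOC classes
output_shape = (S, S, 5 * B + C)    # (7, 7, 30)
total_values = S * S * (5 * B + C)  # 7 * 7 * 30 = 1470 predicted values per image
```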
A grid cell predicts multiple boxes, and the hope is that each box predictor becomes specifically responsible for predicting one object. The specific method is to see which of the currently predicted boxes has the larger IoU with the ground truth box; whichever does is responsible for it. This approach is called specialization of the box predictors. YOLO also introduces the loss function shown in Figure 2.4:
Figure 2.4 YOLO’s loss function [1]
In this loss function, the classification error is penalized only when there is an object in the grid cell. Moreover, the coordinate error of a box is penalized only when that box predictor is responsible for a ground truth box, and which box predictor is responsible depends on which predicted box in that cell has the highest IoU with the ground truth box. YOLO uses leaky ReLU as the activation function, and the model is pre-trained on the ImageNet dataset.
Compared with the v1 version, YOLOv2 made improvements in three aspects: more accurate prediction, higher speed, and more recognized object classes, while continuing to maintain processing speed [29]. Recognizing more objects means expanding the detector to 9000 different object classes, which is why it is called YOLO9000.
One improvement strategy of YOLOv2 over YOLOv1 is batch normalization. It helps solve the problems of vanishing and exploding gradients in the back-propagation process and reduces the sensitivity to some hyper-parameters (such as the learning rate, the range of network parameter sizes, and the choice of activation function). Since each batch is normalized separately, a certain regularization effect is achieved, so that better convergence speed and convergence quality can be obtained.
YOLOv2 uses 224 × 224 images to pre-train the classification model, then uses 448 × 448 high-resolution samples to fine-tune the classification model (10 epochs), so that the network features gradually adapt to the 448 × 448 resolution. It then uses 448 × 448 detection samples for training, which alleviates the impact of the sudden resolution switch.
The model of YOLOv3 [3] is much more complicated than the previous models, and speed and accuracy can be traded off by changing the size of the model structure. The speed comparison is shown in Figure 2.5:
Figure 2.5 Speed comparison of YOLOv3 [3]
In short, prior detection systems reuse a classifier or locator to perform the detection task: they apply the model to multiple locations and scales of the image, and the areas with higher scores are regarded as detections. YOLOv3 introduces many improvements:
• Multi-scale prediction (introduction of FPN): each scale predicts 3 boxes, and the anchor design still uses clustering to obtain 9 cluster centers, which are divided equally among the 3 scales according to their sizes.
• A better base classification network (Darknet-53, similar to the residual structure introduced by ResNet).
• The classifier does not use Softmax; the classification loss uses binary cross-entropy loss.
cross-YOLOv5 is the most efficient version on edge device of the YOLO family of tection networks, introduced by Glenn Jocher in May 2020 using the Pytorch frame-work There are four versions of the YOLOv5 network model: YOLOv5s, YOLOv5m,YOLOv5l, and YOLOv5x
Figure 2.6 YOLOv5 model sizes, where FP16 stands for half floating-point precision, V100 is the inference time in milliseconds on the Nvidia V100 GPU, and mAP is based on the original COCO dataset [4]
floating-The YOLOv5 network has the smallest depth and feature map width in the YOLOv5series, and the next three are deepened and widened based on its basis The networkstructure of YOLOv5 consists of four parts: Input, Backbone, Neck and Prediction
Figure 2.7 YOLOv5 network architecture [2]
The Input part is the image preprocessing stage. Preprocessing includes data augmentation, adaptive image scaling, and anchor frame calculation. YOLOv5 uses the Mosaic data augmentation method to stitch four images into a new image by random layout, cropping, and scaling, which greatly enriches the detection data. The statistics of the four images can also be computed together in the batch normalization calculation, which speeds up training. YOLOv5 embeds the anchor frame calculation into training: it outputs the predicted frames based on the initial anchor frames and compares them with the ground truth to calculate the loss, thus continuously updating the anchor frame sizes and adaptively calculating the optimal anchor frame values.
The Backbone mainly consists of the Focus structure and the CSP structure. However, since version v6.0 of YOLOv5, the Focus module has been replaced with a 6 × 6 convolutional layer. The CSP structure enhances the learning ability of the model and speeds up network inference.
The Neck includes the FPN and PAN structures. YOLOv5 adds PAN to the FPN structure to make the combined structure better at fusing and extracting features at different levels. Version v6.1 also replaces SPP with SPPF, which is designed to convert arbitrarily sized feature maps into fixed-size feature vectors more quickly.
struc-The Prediction completes the output of the target detection results YOLOv5 forms
a new loss function CIOU based on the IOU loss function by considering the distanceinformation of the center point of the bounding frame, and IOU refers to the intersectionratio of the predicted frame and the real frame DIOU-nms is also used in this processinstead of the traditional NMS operation, with the aim of suppressing redundant framesbetter, thus further improving the detection accuracy of the algorithm
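For reference, the CIoU loss used here is commonly written as follows (a standard formulation quoted for completeness: b and b^{gt} denote the centers of the predicted and ground truth boxes, ρ is the Euclidean distance between them, c is the diagonal length of the smallest box enclosing both, and w, h are box width and height):

\[ \mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}(\mathbf{b}, \mathbf{b}^{gt})}{c^{2}} + \alpha v, \qquad v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}, \qquad \alpha = \frac{v}{(1 - \mathrm{IoU}) + v} \]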
Based on the model comparison shown in Figure 2.6, the YOLOv5s (small) model fits best, with the following parameters:
1. Input size: 640 × 640 pixels
2. Number of params: 7.2M
3. FLOPs@640: 16.5B
The selection of YOLOv5s is based on the fact that the object detection module needs to be compact (14 MB model size) and efficient, with high performance and low inference time (2.2 ms). The only trade-off is lower accuracy; however, an mAP of 36.8 is still acceptable compared to that of YOLOv5x (50.1).
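For orientation, a pretrained YOLOv5s of exactly this size class can be loaded and run in a few lines through the PyTorch Hub interface that the Ultralytics repository exposes. This is a generic usage sketch, not the thesis code; the image path is a placeholder.

```python
import torch

# Load the pretrained small variant (~7.2M parameters) from the Ultralytics repo.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.25  # confidence threshold for reported detections

# Run inference on one frame; the input is letterboxed to 640x640 internally.
results = model("traffic_frame.jpg")  # placeholder image path
detections = results.xyxy[0]          # rows of (x1, y1, x2, y2, conf, class)
print(results)                        # per-class detection summary
```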
2.3 Implementation of YOLO detector
Popular, rapidly developing deep neural networks often achieve very good accuracy and precision. However, their inference runtime can be extremely slow and sluggish, and they can require a significant amount of computing resources. Our embedded modules require real-time inference while utilizing a limited amount of hardware resources, due to the size, weight, power, and cost constraints of our embedded Jetson Xavier hardware. Despite such efforts and advances, a common general-purpose deep learning framework such as PyTorch [30] is not particularly optimized for the computing resources and time consumption of inference. To address this issue, NVIDIA published TensorRT [5], a high-performance deep learning inference engine for production deployments of deep learning models. TensorRT can improve inference throughput and efficiency, provide optimization for neural network models trained in PyTorch and TensorFlow, and deploy them to various embedded platforms.
In this section, the effectiveness of TensorRT in our embedded module is examined by comparing it to the vanilla (without TensorRT) PyTorch framework. In particular, there are two workflows to deploy our YOLO model on the Jetson Xavier: using the default PyTorch runtime, or integrating TensorRT engines with PyTorch-compatible models. These two workflows are demonstrated in Figure 2.8. The performance of the two integration workflows is evaluated under a variety of workloads. The performance bottleneck of inference using TensorRT is then identified, which leads to possibilities for fully maximizing deep learning performance in terms of GPU memory utilization.
Figure 2.8 Comparison of vanilla PyTorch and ONNX-TensorRT workflow
PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment. It is a Python package primarily used as a deep learning research platform that aims at providing maximum flexibility and speed. PyTorch supports operations on tensors (multi-dimensional arrays) on both CPU and GPU, which may accelerate computation by a significant amount. PyTorch provides a variety of tensor routines to fit different scientific computation needs, including mathematical operations and linear algebra.
The Open Neural Network Exchange (ONNX) is an open-source artificial intelligence ecosystem. It is an open standard established by research organizations for representing machine learning algorithms and tools, created to promote innovation and collaboration in the AI sector. ONNX targets interoperability and defines a common set of operators (the building blocks of machine learning and deep learning models) and a common file format, which make it possible for models to move between various frameworks, tools, compilers, and runtimes. ONNX supports multiple frameworks such as PyTorch, TensorFlow [31], Caffe2 [32], and Apache MXNet [33].
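To illustrate the first step of the ONNX-TensorRT workflow in Figure 2.8, the sketch below exports a YOLOv5s model to ONNX with torch.onnx.export; on the Jetson, a tool such as NVIDIA's trtexec can then build a TensorRT engine from the resulting .onnx file. This is a generic sketch under stated assumptions: the YOLOv5 repository also ships its own export.py for this purpose, and the file names here are placeholders.

```python
import torch

# Load the raw (non-AutoShape) model so the graph can be traced for export.
model = torch.hub.load("ultralytics/yolov5", "yolov5s",
                       pretrained=True, autoshape=False).eval()
dummy = torch.zeros(1, 3, 640, 640)  # (batch, channels, height, width)

torch.onnx.export(
    model,
    dummy,
    "yolov5s.onnx",        # placeholder output path
    opset_version=12,      # an opset with good TensorRT operator coverage
    input_names=["images"],
    output_names=["output"],
)

# On the Jetson, the engine build step is then a shell command, e.g.:
#   trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --fp16
```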
TensorRT is a Software Development Kit (SDK) for high-performance deep ing inference It is a part of NVIDIA CUDA It comes with a deep learning inference