Graduation thesis in Computer Engineering: Research and implementation of a system for detecting vehicles and tracking their speed on national highways


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

COMPUTER ENGINEERING DEPARTMENT

PHAN TRAN QUOC DAT

VÕ QUOC HUY

GRADUATION THESIS

RESEARCH AND IMPLEMENTATION OF DETECTING

AND TRACKING SYSTEM OF VEHICLE ON

NATIONAL WAYS

ENGINEER OF COMPUTER ENGINEERING

HO CHI MINH CITY, 2021


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

COMPUTER ENGINEERING DEPARTMENT

RESEARCH AND IMPLEMENTATION OF A SYSTEM FOR DETECTING VEHICLES

AND TRACKING THEIR SPEED ON NATIONAL HIGHWAYS

ENGINEER OF COMPUTER ENGINEERING

INSTRUCTOR

PhD LAM DUC KHAI

HO CHI MINH CITY, 2021


GRADUATION THESIS DEFENSE COMMITTEE

The graduation thesis defense committee was established under Decision No. 70/QD-DQCNTT dated January 27, 2021 of the Rector of the University of Information Technology.

We would like to express our gratitude to Ph.D. Lam Duc Khai and Ph.D. Nguyen Minh Son for their dedicated answers whenever we faced complicated questions, their useful advice whenever we ran into troublesome tasks, and their direction whenever we lost track of the situation.

We also wish to express our appreciation to all the instructors of the UIT Computer Engineering Department for every lesson that provided us with knowledge and nurtured us to become better people for society.

Finally, we would like to give big thanks to all the people who constantly assisted us in the process of carrying out the research, encouraged us in hard times, and supported us both financially and mentally. This research could not have been done without your enthusiastic help.

2.2 Problem and direction
Chapter 3 THEORY FOUNDATION
3.1 Review of object detection models
3.2 Detail of Yolo Model for object detection
3.2.1 Introduction
3.2.2 The reason choosing Yolo
3.2.3 How this work
3.2.4 The result
3.3 Dataset and Training
3.3.1 Dataset and why choosing
3.3.2 Raw image
3.3.3 Processing raw images
3.3.4 …
3.3.5 Evaluate the training result
3.4 Review of object tracking algorithms
3.4.1 Meanshift
3.4.2 Particle filter
3.4.3 Kalman
3.5 SORT algorithm
3.5.1 Introduction
3.5.2 Processing flow of the SORT
3.6 Speed measuring
Chapter 4 PROJECT IMPLEMENTATION AND RESULT EVALUATION
4.1 Project implementation
4.1.1 Hardware
4.1.2 …
4.2 Result evaluation
4.2.1 Evaluating model base on detecting distance
4.2.2 Evaluating model base on detecting accuracy
4.2.3 Evaluating model base on tracking vehicle
Chapter 5 CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future work
REFERENCES

LIST OF FIGURES

Figure 1-1: Traffic jam in Vietnam
Figure 1-2: Using internet of things in public transportation
Figure 1-3: Idea design
Figure 2-1: Intelligent transportation system
Figure 3-1: Image processing in R-CNN
Figure 3-2: Image processing in Fast R-CNN
Figure 3-3: Left: Region proposal network (RPN). Right: Samples detection of …
Figure 3-4: Image processing time of Faster R-CNN compared to previous models
Figure 3-5: SSD mechanism in training and detecting
Figure 3-6: YOLO workflow
Figure 3-7: YOLO performance
Figure 3-8: Darknet framework loads 106 layers for every command
Figure 3-9: Darknet-53
Figure 3-10: Image by Ayoosh Kathuria
Figure 3-11: Image by Valentyn Sichkar (a)
Figure 3-12: Image by Valentyn Sichkar (b)
Figure 3-13: Total bounding boxes of three different scales
Figure 3-14: Calculate bounding box by using the anchor
Figure 3-15: Equation for objectness score - Image by Valentyn Sichkar
Figure 3-16: Image of objects detected in Ho Chi Minh City
Figure 3-17: The result displayed on terminal
Figure 3-18: Types of vehicle used to train model
Figure 3-19: Instances label
Figure 3-20: Weather types
Figure 3-21: Annotation format
Figure 3-22: Interface of labeling
Figure 3-23: Services from Colab
Figure 3-24: Training procedure
Figure 3-25: Evaluation chart after training
Figure 3-26: Detail result after training
Figure 3-27: Meanshift illustration
Figure 3-28: Steps in the operation
Figure 3-29: Content of Kalman
Figure 3-30: Processing flow of the SORT
Figure 3-31: Estimate the speed of vehicles
Figure 4-1: Jetson Nano kit
Figure 4-2: OpenCV
Figure 4-3: Distance at sunny
Figure 4-4: Distance at cloudy
Figure 4-5: Distance at near night
Figure 4-6: Distance at rain
Figure 4-7: Distance at rainy night
Figure 4-8: Video result
Figure 4-9: Detect object partially obscured

LIST OF TABLES

Table 1: Testing results of many models in similar conditions
Table 2: Number of imported automobiles in November 2020
Table 3: Specification of hardware on Colab
Table 4: Compare the Tracking methods
Table 5: Performance of models on different hardware
Table 6: Jetson Nano kit detail
Table 7: Camera detail
Table 8: Distance at sunny
Table 9: Distance at cloudy
Table 10: Distance at near night
Table 11: Distance at rain
Table 12: Distance at rainy night
Table 13: Video Specifications
Table 14: Number of vehicles in practice
Table 15: Number of vehicles compare
Table 16: Compare the results

We carried out this research in the hope of providing a device that can help improve the quality of life of people traveling on roads in Vietnam.

Thanks to the development of deep learning and computer vision, the main functions of the device are detecting and tracking popular vehicles on national roads and, in addition, estimating their distance and speed at the same time.

The system works as follows:

- The system detects various kinds of vehicles using Yolo version 3. Input can be prerecorded videos or a stream from a camera attached to the hardware.
- The detected objects are tracked and their speed is estimated by applying tracking algorithms.

According to the expected result, the performance of the detecting process can reach about 80%, and the ability to calculate speed is added.

Chapter 1 CURRENT PROBLEM AND POTENTIAL SOLUTION

1.1 Problem statement

Our country is developing in many fields, most noticeably human development, science, and technology. Throughout this process, there is no other option but to be determined and deal with the challenges that arise along the way.

Figure 1-1: Traffic jam in Vietnam

A large-scale problem worth mentioning is national transportation. So far, with 128 national roads having a total length of 17,530 kilometers and a tremendous number of vehicles that increases day by day, one can easily imagine the hardship the Ministry of Transport faces in managing such complex traffic.

Fortunately, improvements in information technology have taken effect. With the application of deep learning, particularly detection and tracking methods, we can assist in controlling traffic more adequately. A good detecting and tracking system can not only support traffic management but also raise citizens' awareness of safety when participating in traffic.

1.2 The idea

From primitive forms like walking or riding bicycles on trails, transportation has moved to four-wheeled vehicles that appear regularly on asphalt roads, which are extended from time to time. Besides positive aspects such as the improvement of everyday quality of life, people have to confront troubles that come with that growth: more types of vehicles mean more rules need to be established to keep them all under control; people are frustrated when they get stuck in a traffic jam that can last a few hours after a tiring day of work; and some lack a sense of responsibility and safety for themselves and others.

Figure 1-3: Idea design

With the application of deep learning to detect and track vehicles, the amount of work can be reduced. With a system that analyzes the number of vehicles in a specific time frame, we can give citizens announcements about the situation on the road so they can avoid driving on congested roads. For the traffic police, having accurate information will put them in the right place to direct traffic, or, from the office, they can detect vehicles that break the rules. Beyond solving external issues, the project may also help fix internal problems. People who know about the system will have to behave if they do not want to be penalized; gradually, they will form civilized habits and care more for their own safety and that of others.

1.3 Methodology

For this project, we propose three main steps:

— The first is to understand the basic concepts of deep learning. A foundation is always an important element when approaching new knowledge. Our group gathered the necessary information from various reliable sources: learning websites on the internet, articles from previous research, experience from qualified people, and so on.

— The second is the implementation of the project. This is the main stage of the project, as it carries multiple tasks to be accomplished. Because the hardware plays a crucial role in deciding the performance of the system, a choice needs to be made considering both functionality and price. After surveying the marketplace, the NVIDIA Jetson Nano kit seems to be a reasonable pick because of its efficiency in both price and performance. After setting up the hardware with every requirement, the YOLO algorithm, a state-of-the-art object detector, was installed and a demo was run to see the result. The next step is collecting the dataset for our purpose. The dataset is considered one of the most basic factors in evaluating whether the model can be applied in reality. A richer dataset with precise annotation gives the model the ability to learn more deeply the features of the objects that we want to detect; in other words, the result will gain better accuracy for this project. The intention of this dataset is "from Vietnam, for Vietnam": most of the training data is collected in our nation so that the model can get used to the environment of this country.

— Finally, the last target is training and testing the model. Theory usually differs from reality, so testing can never be ignored. After all, this project is meant to be applicable in the real world. After testing functionality and packing vital components, we bring the system to a suitable environment for validation.


Chapter 2 RELATED WORK

2.1 Previous projects

2.1.1 Domestic

Notable domestic projects include:

— An ITS (intelligent transportation system) for Vietnam Expressway became publicly available in March 2017. This project is the fruit of cooperation among a consortium of Japanese companies led by Toshiba. The person in charge claimed that the system includes cutting-edge information processing technology for analyzing vehicles on the road, which reduces disruption along the network and inconvenience for its users.

Figure 2-1: Intelligent transportation system

— The Da Nang smart camera system has been carried out as a foundation for creating a smart city in the future. The system can run around the clock, detect two kinds of vehicles (car and motorbike), and report traffic violations. The scale of the project is at the national level: nearly 50 locations have smart cameras installed, and the total number of traffic cameras in the city is 143, with 125 surveillance cameras, 9 speed testing cameras, and 9 observation cameras.

2.1.2 Foreign

A Cascade of Boosted Generative and Discriminative Classifiers for Vehicle Detection, conducted by Pablo Negri, Xavier Clady, Shehzad Muhammad Hanif, and Lionel Prevost, presented a cascade of boosted classifiers for vehicle detection in road scene images. The project studied two main features: Haar-like features and HoG (histogram of oriented gradients) features, where Haar-like features are used to construct discriminative weak classifiers while HoG features are used to construct generative weak classifiers. The fusion detector combines the advantages of both Haar and HoG detectors and achieves a high correct detection rate of 94% and a low false alarm rate of 0.0003 per image. The result was evaluated on a 2.2 GHz processor and was not tested in practice.

Siemens Mobility has developed technology to assist traffic management tasks. One example is Sitraffic Sivicam, which is easy to attach to available infrastructure elements and is able to detect vehicles at intersections. Their recent project is a cooperation with Australian authorities to improve the quality of highway transportation. The main use of the newly introduced system is the ability to exchange information on traffic disruptions in real time so that signals can be sent back to be processed. This can help vehicles such as emergency vehicles or public transport move more efficiently.

2.2 Problem and direction

Most of the projects above have been tested and proven in use, as they are provided by brands that have history and experience. The use of deep learning in image processing has proven useful in surveillance and in capturing the situation on the streets, which helps in making good decisions to resolve congestion and traffic violations.

Although the systems mentioned can operate smoothly, they are still national-level projects. With a scale that large, the expense of equipping and maintaining them is extremely high, which reduces the coverage of the systems, not to mention the computing and network resources required.

This project proposes to build a system in which resource usage is reduced. Its main points are:

— Receiving the data from the camera.

— Detecting various typical kinds of vehicles on Vietnam national roads.

— Counting the number of vehicles.

— Estimating the speed of vehicles.

With the information extracted by the system, we hope this project can help resolve many issues of current traffic and help improve the satisfaction of citizens when traveling.


Chapter 3 THEORY FOUNDATION

3.1 Review of object detection models

Object detection is the task in which a model provides the location of an object in an image and draws a bounding box around that object. Some common model architectures for object detection are R-CNN, SSD, and YOLO, which are reviewed below.

3.1.1 R-CNN

R-CNN stands for regions with CNN features. The model takes its name from extracting proposal regions from the input image, warping each one to a size compatible with the convolutional neural network, and then computing the features for each proposal. After that, regions are classified by linear Support Vector Machines (SVMs).

Figure 3-1: Image processing in R-CNN

This model achieves a mean average precision of 53.7% on PASCAL VOC 2010.

As an early proposed model, it has disadvantages:

— Training is a multi-stage pipeline.

— Training is expensive in space and time, as VGG16 is used as the backbone.

— Object detection is slow, as a ConvNet forward pass is performed for each object proposal.

3.1.2 Fast R-CNN

Fast R-CNN is an improvement that fixes disadvantages of R-CNN. The network takes an entire image and a set of object proposals as input. The image goes through several convolutional and max pooling layers to produce a convolutional feature map. Each RoI (region of interest) is pooled into a fixed-size feature map and mapped to a feature vector by fully connected layers. The network then outputs two vectors per RoI: softmax probabilities and per-class bounding box regression offsets.

Some advantages of Fast R-CNN compared to R-CNN:

— Training is single-stage, using a multi-task loss.

— Training can update all network layers.

— No disk storage is needed for feature caching.

Figure 3-2: Image processing in Fast R-CNN

Faster R-CNN gives a quicker result by replacing selective search, which is used for both R-CNN and Fast R-CNN, with an RPN to identify the region proposals.

3.1.3 Faster R-CNN

Faster R-CNN comprises two modules: a deep convolutional network for proposing regions and a Fast R-CNN detector. A region proposal network (RPN) takes an image as input and outputs rectangular object proposals. Each rectangle has an objectness score.

[Figure 3-3 diagram labels: sliding window over the conv feature map, 256-d intermediate layer, cls layer (2k scores), reg layer (4k coordinates)]

Figure 3-3: Left: Region proposal network (RPN). Right: Samples

Figure 3-4: Image processing time of Faster R-CNN compared to previous models

3.1.4 SSD

SSD, standing for single shot multibox detector, is designed to use a one-stage deep neural network for real-time object detection. SSD improves running speed compared to two-stage detectors like Faster R-CNN by eliminating the proposal network.

SSD comprises two main parts: extraction of feature maps and convolutional filters to detect objects. During training, the model takes an image and ground truth boxes as inputs, then evaluates default boxes at each location in several feature maps with different scales (8x8, 4x4). The default boxes are chosen based on how well they match the ground truth boxes. For the chosen default boxes, the model then predicts both coordinates and confidences for all object classes.

Figure 3-5: SSD mechanism in training and detecting. Panels: (a) image with GT boxes, (b) 8 x 8 feature map, (c) 4 x 4 feature map

3.1.5 YOLO

YOLO stands for You Only Look Once. Different from the region-based models above, which use only parts of an image to detect objects, YOLO looks at the image as a whole to find out where objects are.

The input image is divided into an S x S grid, and each cell is responsible for predicting one class. Convolutional layers are used to extract features, which are then fed to fully connected layers to predict the output, including bounding box coordinates and class probabilities.

Due to the spatial constraints of the algorithm, YOLO struggles with small objects within the image.
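As a rough illustration of the grid division, the sketch below maps the center of an object to the grid cell that would be responsible for it. The grid size S = 7 and the 448 x 448 image are assumptions borrowed from common YOLO examples, not values stated in this thesis, and the function name is hypothetical.

```python
def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return (row, col) of the S x S grid cell that contains the point (cx, cy)."""
    if not (0 <= cx < img_w and 0 <= cy < img_h):
        raise ValueError("center must lie inside the image")
    col = int(cx * s / img_w)  # horizontal cell index
    row = int(cy * s / img_h)  # vertical cell index
    return row, col


# Example: an object centered at (224, 100) in a 448 x 448 image (assumed values).
print(responsible_cell(224, 100, 448, 448))
```

Only the cell returned here would be asked to predict that object, which is one reason several small objects falling in the same cell are hard to separate.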

Figure 3-6: YOLO workflow (panel labels: bounding boxes + confidence, class probability map)

Decision making in choosing an object detection model:

Table 1: Testing results of many models in similar conditions

… unreliable. YOLO based on VGG16 lowers FPS to 21, but its 66.4 mAP makes up for it. SSD300 outperforms both YOLO and Faster R-CNN with 74.3 mAP at 59 FPS on the same hardware. However, this comparison was made in 2016; as technology improves, the results will differ.

In 2018, Joseph Redmon and Ali Farhadi introduced YOLOv3 with many improvements. The architecture grew to 106 layers and detects at 3 different scales. These changes were significant and made YOLOv3 one of the best detectors for real-time tasks.

Figure 3-7: YOLO performance (methods compared: [B] SSD321, [C] DSSD321, [D] R-FCN, [E] SSD513, [F] DSSD513, [G] FPN FRCN, YOLOv3-320, YOLOv3-416)

3.2 Detail of Yolo Model for object detection

3.2.1 Introduction

Yolo, standing for You Only Look Once, is a state-of-the-art algorithm that is popular for its ability to detect objects and can be used in real-time applications.

3.2.2 The reason choosing Yolo

Yolo was released to the community a few years ago, with many versions released over time. This project makes use of Yolov3 for its incredible speed and reliable accuracy compared to other detection algorithms. It can also detect multiple objects, including each object's class and location, in a single image. Above all, the model allows a simple tradeoff between speed and accuracy just by changing the network size, without retraining.

3.2.3 How this work

3.2.3.1 Architecture

Yolo uses:

— 53 convolutional layers as the backbone (Darknet-53)

— 53 more layers added for detection

— 106 layers in total for Yolo version 3

Figure 3-8: Darknet framework loads 106 layers for every command

Figure 3-9: Darknet-53 (table columns: Type, Filters, Size, Output)

— n: the number of images

— w: width

— h: height

3.2.3.3 Channel

The values of w and h are the resolution of the network, and they can be changed to any number that is divisible by 32 without remainder. Increasing the resolution of the network improves accuracy in training and detecting.

The Darknet framework has an integrated resize function, so the user may feed images of any size to the network without trouble, as the images will be adjusted according to the network size (width x height).

3.2.3.4 Detection

As mentioned above, layers 82, 94, and 106 are where detection is conducted; at these layers, the input image has been downsampled by factors of 32, 16, and 8, respectively. These three numbers are called strides, which indicate how many times smaller the resolution at that specific layer is than the network input size. For example, if the input size is 416 x 416, then at layer 82 it is downsampled to 13 x 13, and similarly to 26 x 26 at layer 94 and 52 x 52 at layer 106.
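The stride arithmetic can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the multiple-of-32 check simply mirrors the constraint on the network size described earlier.

```python
STRIDES = (32, 16, 8)  # strides quoted in the text for layers 82, 94 and 106


def grid_sizes(width, height, strides=STRIDES):
    """Return the (columns, rows) of the feature map at each detection scale."""
    if width % 32 or height % 32:
        raise ValueError("network width and height must be multiples of 32")
    return [(width // s, height // s) for s in strides]


print(grid_sizes(416, 416))  # [(13, 13), (26, 26), (52, 52)]
```

A larger network size such as 608 x 608 gives finer grids at every scale, which is where the accuracy gain (and the extra computation) comes from.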

Figure 3-10: YOLO v3 network architecture (legend: residual block, detection layer, upsampling layer, further layers). Image by Ayoosh Kathuria

The reason for these conversions is to make detection more effective: the 13 x 13 resolution is responsible for detecting large objects, 26 x 26 is for medium objects, and 52 x 52 is for small objects.

Each cell in the feature map of a detection layer predicts B x (5 + C) attributes, where:

— B: the number of bounding boxes that each cell is responsible for detecting. In the paper, the authors of Yolov3 state that each cell predicts 3 boxes.

— C: the number of classes you want to detect.

— 5: the 5 attributes of each box: the coordinates of the center of the bounding box (tx, ty), the width and height of the bounding box (tw, th), and the objectness score (p0). The confidences of the C classes (p1, p2, ..., pC) make up the remaining C attributes.

For example, if we want Yolo to detect 5 classes, the formula becomes 3 x (5 + 5) = 30 attributes for each cell in the feature map of the detection layers.
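To make the 30-attribute layout concrete, the sketch below splits the flat vector of one cell into its 3 boxes. The ordering inside each box (tx, ty, tw, th, p0, then class confidences) is an assumption based on the usual Darknet layout, and the function name is hypothetical.

```python
def split_cell(vector, num_classes, boxes_per_cell=3):
    """Split one cell's flat attribute vector into a list of per-box dictionaries."""
    per_box = 5 + num_classes  # 4 box offsets + 1 objectness score + C class confidences
    if len(vector) != boxes_per_cell * per_box:
        raise ValueError("vector length must equal B * (5 + C)")
    boxes = []
    for b in range(boxes_per_cell):
        chunk = vector[b * per_box:(b + 1) * per_box]
        boxes.append({
            "tx": chunk[0], "ty": chunk[1],   # center offsets
            "tw": chunk[2], "th": chunk[3],   # size offsets
            "p0": chunk[4],                   # objectness score
            "class_conf": list(chunk[5:]),    # p1 .. pC
        })
    return boxes


# A dummy cell for 5 classes: 3 * (5 + 5) = 30 values.
cell = list(range(30))
print(len(split_cell(cell, num_classes=5)))  # 3
```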

Figure 3-11: Image by Valentyn Sichkar(a)


3.2.3.6 Anchors (prior)

Anchors (or priors) are predefined bounding boxes that take part in choosing which bounding box in each cell predicts the right object. They are calculated by K-means clustering. Through the whole process, a total of 9 anchor boxes are used to calculate the bounding boxes, 3 for each scale.
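A minimal sketch of how anchors could be obtained by K-means is shown below. It assumes the distance d = 1 - IoU between box shapes, as proposed in the YOLO papers; the thesis does not state which distance was used. The initialization scheme, function names, and sample data are all hypothetical.

```python
def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (width, height) pairs; the nearest centroid is the one with highest IoU."""
    # Deterministic start: take k boxes spread evenly over the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centroids = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                updated.append((w, h))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Made-up box sizes in pixels: three small, three medium, three large.
sample_boxes = [(10, 12), (12, 10), (11, 11),
                (50, 60), (55, 58), (60, 50),
                (200, 180), (190, 210), (210, 200)]
print(kmeans_anchors(sample_boxes, k=3))
```

For Yolov3 the same procedure would be run with k = 9 on the labeled boxes of the training set, and the resulting anchors split into three groups of 3 by size.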


Figure 3-12: Image by Valentyn Sichkar(b)

First, Yolo extracts the information in the kernel of each cell. Then it calculates and chooses the bounding box that has the highest probability for a specific class.

Repeating these steps for all the grid cells at all scales, Yolo version 3 calculates a total of 10647 bounding boxes by the end of the process (assuming the network input resolution is 416 x 416).
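The figure of 10647 follows from 3 boxes per cell over the three grids: 3 x (13 x 13 + 26 x 26 + 52 x 52) = 3 x 3549 = 10647. A short sketch of the same count, with a hypothetical function name:

```python
def total_boxes(net_size=416, strides=(32, 16, 8), boxes_per_cell=3):
    """Count the boxes predicted over all detection scales for a square input."""
    return sum(boxes_per_cell * (net_size // s) ** 2 for s in strides)


print(total_boxes())     # 10647
print(total_boxes(608))  # 22743
```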


3.2.3.7 Calculate bounding box

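To calculate a bounding box, Yolo evaluates its offset from the anchor through these formulas:

$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$

bx, by: center of the bounding box, where (cx, cy) is the top-left corner of the grid cell and σ is the sigmoid function

bw: width of the bounding box, where pw is the width of the anchor

bh: height of the bounding box, where ph is the height of the anchor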
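A minimal sketch of decoding one box from its anchor, following the equations of the YOLOv3 paper (sigmoid offsets for the center, exponential scaling of the anchor for the size). It assumes the cell position (cx, cy) is given in grid units and the anchor size (pw, ph) in pixels, so the center is multiplied by the stride to obtain pixels; the function names and example values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, which keeps the center offset inside its cell."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Turn raw outputs into (center x, center y, width, height) in pixels."""
    bx = (sigmoid(tx) + cx) * stride  # center x
    by = (sigmoid(ty) + cy) * stride  # center y
    bw = pw * math.exp(tw)            # anchor width scaled by exp(tw)
    bh = ph * math.exp(th)            # anchor height scaled by exp(th)
    return bx, by, bw, bh


# Cell (6, 6) of the 13 x 13 grid (stride 32) with an example 116 x 90 anchor.
print(decode_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, pw=116, ph=90, stride=32))
```

With all raw outputs at zero, the box sits at the middle of its cell and has exactly the anchor's size, which shows that the network only learns corrections to the anchor.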

Figure 3-14: Calculate bounding box by using the anchor

3.2.3.8 Objectness score

The objectness score is one of the attributes of a bounding box. It is used when calculating which class the bounding box relates to, and later that result is used to choose the anchor box. The value of the objectness score stands for the probability that the bounding box has an object inside.

We need to recognize the difference between the objectness score and the class confidences: the objectness score indicates the probability that the cell contains an object, while the confidences indicate which class the object in the cell belongs to.

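$$\text{objectness score} = P_{\text{object}} \times \mathrm{IoU}$$

- $P_{\text{object}}$: predicted probability
- IoU: intersection over union between predicted BB2 and ground truth BB1

Figure 3-15: Equation for objectness score - Image by Valentyn Sichkar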
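The overlap term between the predicted box and the ground truth box is commonly the intersection over union (IoU). The sketch below computes it for boxes given as corner coordinates and multiplies it by a predicted probability; the corner format and the function names are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def objectness(p_object, predicted_box, truth_box):
    """Predicted probability of an object weighted by the overlap with the ground truth."""
    return p_object * iou(predicted_box, truth_box)


print(objectness(0.9, (0, 0, 2, 2), (1, 1, 3, 3)))
```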

3.2.4 The result

Yolo version 3 with the Darknet framework outputs the bounding boxes it detects and their confidence scores. The reliability of detection depends on how well the training process is carried out.

Figure 3-16: Image of objects detected in Ho Chi Minh City

CM-day_6.jpg: Predicted in 31024.904000 milli-seconds.
(left_x: 167 top_y: 390 width: 117
(left_x: 223 top_y: 287 width: 83
(left_x: 241 top_y: 163 width: 39
(left_x: 380 top_y: 67 width: 14
(left_x: 392 top_y: 60 width: 14
(left_x: 394 top_y: 79 width: 22
(left_x: 411 top_y: 140 width: 38
(left_x: 423 top_y: 186 width: 52
(left_x: 426 top_y: 85 width: 23
(left_x: 426 top_y: 93 width: 25
(left_x: 477 top_y: width: 49
(left_x: 595 top_y: width:

Figure 3-17: The result displayed on terminal

3.3 Dataset and Training

3.3.1 Dataset and why choosing

Globally, one of the standards for evaluating the development level of a country is the automobile ownership rate. The variety of vehicles appearing on the road also reflects the growth of the car industry. The Ministry of Industry and Trade predicts that the car market will flourish by 2025, when our nation may reach 600,000 cars per year.

A report of VAMA (Vietnam Automobile Manufacturers' Association) stated that a total of 36,359 automobiles were sold in October 2020, an increase of 9% compared to the previous month and 22% compared to the same month of the previous year.

Although we have suffered from disadvantages such as the pandemic obstructing the economy, the number of automobiles imported, assembled, and sold was still significant.

Table 2: Number of imported automobiles in November 2020

Types                                      Instances
Automobiles less than 9 seats              8,441
Automobiles more than 9 seats              12
Automobiles specializing in transporting   2,585
Others                                     1,199
Total                                      12,237

With such evidence, we chose these types of vehicles: car, van, truck, lorry truck, and bus, as they are vehicles whose numbers will only grow in the future, especially in major cities of our nation like Ho Chi Minh City or the capital, Ha Noi.

3.3.2 Raw image

A good model relies on a good dataset, and Yolo is no exception. This project is conducted in the hope of providing a means to control traffic at today's growth rate, so the training data consists of popular vehicles running on Vietnam national roads.

Our dataset currently has a total of 19200 annotated instances in 2430 images that are used in this study. 5 classes are identified to feed into the model: car, van, truck, lorry truck, and bus. We chose those types of transportation as they are large in quantity and tend to increase in the future.

Title	Research and Implementation of Detecting and Tracking System of Vehicle on National Ways
Authors	Phan Tran Quoc Dat, Vo Quoc Huy
Supervisor	PhD. Lam Duc Khai
University	University of Information Technology
Major	Computer Engineering
Type	Graduation Thesis
Year of publication	2021
City	Ho Chi Minh City

Format
Pages	63
Size	28.4 MB

REFERENCES
[1] Rich feature hierarchies for accurate object detection and semantic segmentation by Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, UC Berkeley (2014 Oct 22)
[2] Fast R-CNN by Ross Girshick, Microsoft Research (2015 Sep 27)
[3] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun (2016 Jan 6)
[4] R-FCN: Object Detection via Region-based Fully Convolutional Networks by Jifeng Dai, Yi Li, Kaiming He, and Jian Sun (2016 Jun 21)
[5] R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms by Rohith Gandhi (2018 Jul 10)
[6] SSD: Single Shot MultiBox Detector by Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg (2016 Dec 29)
[7] YOLO9000: Better, Faster, Stronger by Joseph Redmon and Ali Farhadi (2016 Dec 25)
[8] YOLOv3: An Incremental Improvement by Joseph Redmon and Ali Farhadi (2018 Apr 8)
[9] An Evaluation of Deep Learning Methods for Small Object Detection by Nhat-Duy Nguyen, Tien Do, Thanh Duc Ngo, and Duy-Dinh Le (2020 Apr 27)
[10] SSD vs. YOLO for Detection of Outdoor Urban Advertising Panels under Multiple Variabilities by Angel Morera, Angel Sanchez, A. Belén Moreno, Angel
[11] A Comparison of Tracking Algorithm Performance For Objects in Wide Area Imagery by Rohit C. Philip, Sundaresh Ram, Xin Gao, and Jeffrey J. Rodriguez (2014 April 8)
[12] Application of SORT on Multi-Object Tracking and Segmentation by Franz Koeferl, Johannes Link, and Bjoern Eskofier (2019)
[13] Multiple Object Tracking: A Literature Review by Wenhan Luo, Junliang Xing, Anton Milan, Xiaoqin Zhang, Wei Liu, Xiaowei Zhao, Tae-Kyun Kim (2017 May 22)
[14] 3D Visual Tracking with Particle and Kalman Filters by Burak Bayramli (2010 June 29)