HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
SCHOOL OF ELECTRONIC AND ELECTRICAL ENGINEERING
TECHNICAL REPORT
Topic:
COMPUTER VISION TECHNIQUES AND THEIR APPLICATION FOR SOLVING VIETNAM'S TRANSPORTATION PROBLEMS
Instructor: Dr. Nguyễn Tiến Hòa
Student: Thân Đức Trí
Student ID: 20203891
Course: Technical writing & Presentation
Major: Smart embedded systems and IoT
HANOI, December 2022
CONTENTS
LIST OF ACRONYMS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1 INTRODUCTION
Chapter 2 BODY
2.1 The object detection model (Example: Ambulance)
2.2 The instance segmentation model
2.3 Analysis of use cases
Chapter 3 CONCLUSION
REFERENCES
LIST OF ACRONYMS
LIST OF FIGURES
Figure 1 Representation of Faster RCNN
Figure 2(a) Training loss vs number of iterations
Figure 2(b) Validation loss vs number of iterations
Figure 3 Recognition of ambulance in traffic congestion
Figure 4 Representation of Mask RCNN
Figure 5 Specified configurations
Figure 6 Identification of ambulance in traffic
LIST OF TABLES
Table 2 Tabular representation of accuracy
ABSTRACT
Computer vision technology has significantly impacted the field of intelligent transportation systems. Its applications range from traffic monitoring systems to self-driving cars and often involve basic or advanced image or video analytics. This report aims to present the use of object detection and instance segmentation for emergency vehicle detection, which is essential for any intelligent transportation system, including Vietnam's. Specifically, this detection can be integrated into autonomous vehicles and traffic signal controllers to prioritize emergency vehicles. The implemented architectures, Faster RCNN for object detection and Mask RCNN for instance segmentation, are evaluated in terms of accuracy and suitability for detecting emergency vehicles in chaotic traffic conditions. Additionally, the pros and cons of using object detection versus instance segmentation for emergency vehicle detection are compared.
CHAPTER 1 INTRODUCTION
Computer vision is a field of artificial intelligence that involves the development of algorithms and systems that can interpret visual data from the world around us. This includes the ability to recognize and classify objects, understand the relationships between objects, and interpret the context and meaning of the scene being observed.

One area where computer vision techniques have been applied is in solving traffic problems. For example, traffic cameras equipped with computer vision algorithms can be used to detect and classify vehicles, monitor traffic flow, and identify potential hazards or incidents. This information can be used to improve traffic management and safety, as well as to optimize the use of transportation infrastructure. Other applications of computer vision in traffic include the development of autonomous vehicles, which rely on computer vision to navigate roads and avoid collisions, and the analysis of traffic patterns to optimize the routing of vehicles.

Traffic in Vietnam is often congested, making it difficult for ambulances to quickly reach their destinations. This is especially problematic because there are many motorbikes and cars on the roads. In order to find a solution to this issue, my friends and I have been researching various scientific approaches, including the use of computer vision.
CHAPTER 2 BODY
The two computer vision techniques used for emergency vehicle recognition are object detection and instance segmentation. For these tasks, specially built convolutional neural networks (CNNs) [1], termed Faster RCNN and Mask RCNN, are used.

These CNNs were first trained, iteration after iteration, to distinguish aspects of emergency vehicles in photos. This was followed by extensive testing of the trained models for detection accuracy on an unseen dataset. The outcomes were classified as true positives, false positives, true negatives, and false negatives.
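As an illustrative aside, such classified outcomes can be turned into the usual summary metrics. The sketch below is not part of the original experiment; the counts passed in are placeholders, and only the standard formulas for accuracy, precision, and recall are taken as given.

# Minimal sketch: converting true/false positive and negative counts into
# accuracy, precision, and recall; the counts used here are placeholders.
def detection_metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

print(detection_metrics(tp=90, fp=5, tn=80, fn=10))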
2.1 THE OBJECT DETECTION MODEL (EXAMPLE: AMBULANCE)
Object detection involves identifying and locating a particular object in an image by creating a bounding box around it. This is typically accomplished using convolutional neural networks that have been pre-trained on large datasets for image classification, such as ResNet, Visual Geometry Group (VGG) Net, and Inception Net. These networks are modified to be fully convolutional so that they can handle inputs of various dimensions, and are then combined with object detection networks such as Faster RCNN, Single Shot Detectors, and Region-based Fully Convolutional Networks. An example of this is shown in Figure 1, which depicts a Faster RCNN [2] with a VGG Net base network. Transfer learning, which involves using pre-trained networks to minimize the number of computations and images needed for a custom dataset, is commonly employed in object detection.
Figure 1 Representation of Faster RCNN
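To make the transfer-learning step concrete, the following sketch shows how a VGG16 backbone pre-trained on ImageNet can be reused as a fully convolutional feature extractor in Keras. This is only an assumed outline, not the exact configuration used in this work, and the detection head is indicated only by a comment.

# Sketch: reusing a pre-trained VGG16 as the fully convolutional backbone
# of a detector such as Faster RCNN (illustrative only).
import tensorflow as tf

# include_top=False drops the dense classification layers, so the network
# stays fully convolutional and accepts inputs of varying spatial size.
backbone = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(None, None, 3))

# Freezing the pre-trained weights reduces the computations and the number
# of labelled images needed for the custom ambulance dataset.
backbone.trainable = False

# In Faster RCNN, a region proposal network and a box classification head
# would be attached on top of these feature maps.
feature_maps = backbone.output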
Object detection is the process of locating a specific object in an image by constructing a bounding box around it. The backbone for object detection is a traditional convolutional neural network built for image classification (ResNet, VGG Net, Inception Net, and so on). The transfer learning concept is used, meaning that these base networks are pre-trained on large pre-existing datasets to reduce the number of images required for the custom dataset. Care was also taken with respect to traffic congestion, disorderly movement, and non-homogeneity in the images, and the model took into account the shapes and sizes of the emergency vehicles.

The object detection model was developed on the TensorFlow [3] deep learning platform and trained for 10,200 iterations to reduce the training loss. The resulting training and validation loss values were 0.0086 and 0.0029, respectively (Figure 2).
Figure 2(a) Training loss vs number of iterations
Figure 2(b) Validation loss vs number of iterations
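Purely as a hedged illustration of how per-iteration loss values like those in Figure 2 can be recorded, the sketch below uses a generic TensorFlow training loop; the model, loss function, and data are placeholders rather than the detector actually trained in this report.

# Sketch: a generic TensorFlow training loop that records the loss at every
# iteration, producing a curve comparable in spirit to Figure 2(a).
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])      # placeholder model
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
loss_fn = tf.keras.losses.MeanSquaredError()                  # placeholder loss

# Placeholder data; the real experiment used labelled ambulance images.
xs = tf.random.normal((256, 8))
ys = tf.random.normal((256, 1))
dataset = tf.data.Dataset.from_tensor_slices((xs, ys)).batch(32).repeat()

loss_history = []
for step, (x, y) in enumerate(dataset.take(100)):             # 10,200 in the report
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    loss_history.append(float(loss))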
Results:

The object detection algorithm recognizes the ambulance even when it is amidst traffic congestion (Figure 3). The output is a bounding box with the detection confidence expressed as a percentage.
Figure 3 Recognition of ambulance in traffic congestion
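For illustration, output of the kind shown in Figure 3 can be rendered with a few OpenCV drawing calls; the file name, box coordinates, class label, and score below are invented placeholders, not values produced by the trained model.

# Sketch: overlaying a detected bounding box and its confidence score on a
# traffic image, giving output in the same style as Figure 3.
import cv2

image = cv2.imread("traffic_frame.jpg")            # placeholder input image
x1, y1, x2, y2, score = 120, 80, 360, 260, 0.93    # placeholder detection

cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(image, f"ambulance {score * 100:.0f}%", (x1, y1 - 8),
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("traffic_frame_detected.jpg", image)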
2.2 THE INSTANCE SEGMENTATION MODEL
Instance segmentation is a computer vision technique that accurately detects and outlines the boundaries of a particular object at the pixel level. It is often accomplished using a flexible framework called Mask Region-Based Convolutional Neural Network (Mask RCNN). Mask RCNN is a highly effective deep neural network that can identify objects in an image, video, or real-time feed by enclosing them in a bounding box while simultaneously creating a segmentation mask for each instance detected in the feed. It outperforms other models by performing object detection (classification and localization) and instance segmentation simultaneously.
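To make the pixel-level nature of the mask concrete, the sketch below blends a per-instance binary mask into an image with NumPy; the image and mask are synthetic placeholders standing in for a real Mask RCNN output.

# Sketch: applying a binary instance mask (H x W, values 0/1) to an image,
# which is what distinguishes instance segmentation from a bounding box alone.
import numpy as np

def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.5):
    # Blend a solid colour into the pixels where mask == 1.
    image = image.astype(np.float32)
    colored = np.zeros_like(image)
    colored[mask.astype(bool)] = color
    blended = np.where(mask[..., None].astype(bool),
                       (1 - alpha) * image + alpha * colored,
                       image)
    return blended.astype(np.uint8)

# Synthetic placeholders standing in for a frame and one predicted mask.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
mask = np.zeros((240, 320), dtype=np.uint8)
mask[60:180, 100:220] = 1
result = overlay_mask(frame, mask)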
The Mask RCNN network (as shown in Figure 4) has two main stages. The first stage determines the presence of an object in a specific region of the input image, known as the Region of Interest (RoI). The second stage predicts the class probability and, based on the results of the first stage, outputs an Intersection over Union (IoU) bounding box and a binary mask around the object. Both stages are connected to the backbone. The network has three components: the Feature Pyramid Network (FPN), the Region Proposal Network (RPN), and the backbone network architecture. The FPN is a top-down or bottom-up architecture that serves as a universal feature extractor; a bottom-up approach is used in this implementation. The RPN is a lightweight network that scans the FPN output bottom-up and proposes likely regions of the image where the object may be present. It then distinguishes different regions by fitting multiple bounding boxes according to certain IoU values, as sketched below. The backbone is a multi-layered neural network that generates feature maps of the input feed. In this case, ResNet50 is used, as it is not a very deep architecture and fine-tuning helps the model achieve higher accuracy with less training time.
Figure 4 Representation of Mask RCNN
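Since the RPN keeps or rejects proposals according to their IoU scores, a minimal sketch of the IoU computation between two axis-aligned boxes is given below; the (x1, y1, x2, y2) box format is an assumption made for illustration.

# Sketch: Intersection over Union between two boxes given as (x1, y1, x2, y2);
# region proposal networks use such scores to filter candidate boxes.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # approximately 0.143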