HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
SCHOOL OF ELECTRONIC AND ELECTRICAL ENGINEERING
TECHNICAL REPORT
Topic:
COMPUTER VISION TECHNIQUES AND THEIR APPLICATION FOR SOLVING VIETNAM'S TRANSPORTATION PROBLEMS
Instructor: Dr. Nguyễn Tiến Hòa
Student: Thân Đức Trí
Student ID: 20203891
Course: Technical writing & Presentation
Major: Smart embedded systems and IoT
HANOI, December 2022
CONTENTS
LIST OF ACRONYMS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1 INTRODUCTION
Chapter 2 BODY
2.1 The object detection model (Example: Ambulance)
2.2 The instance segmentation model
2.3 Analysis of use cases
Chapter 3 CONCLUSION
REFERENCES
LIST OF ACRONYMS
LIST OF FIGURES
Figure 1 Representation of Faster RCNN
Figure 2(a) Training loss vs number of iterations
Figure 2(b) Validation loss vs number of iterations
Figure 3 Recognition of ambulance in traffic congestion
Figure 4 Representation of Mask RCNN
Figure 5 Specified configurations
Figure 6 Identification of ambulance in traffic
LIST OF TABLES
Table 2 Tabular representation of accuracy
ABSTRACT
Computer vision technology has significantly impacted the field of intelligent transportation systems. Its applications range from traffic monitoring systems to self-driving cars and often involve basic or advanced image or video analytics. This report aims to present the use of object detection and instance segmentation for emergency vehicle detection, which is essential for any intelligent transportation system, including Vietnam's. Specifically, this detection can be integrated into autonomous vehicles and traffic signal controllers to prioritize emergency vehicles. The implemented architectures, Faster RCNN for object detection and Mask RCNN for instance segmentation, are evaluated in terms of accuracy and suitability for detecting emergency vehicles in chaotic traffic conditions. Additionally, the pros and cons of using object detection versus instance segmentation for emergency vehicle detection are compared.
CHAPTER 1 INTRODUCTION
Computer vision is a field of artificial intelligence that involves the development of algorithms and systems that can interpret visual data from the world around us. This includes the ability to recognize and classify objects, understand the relationships between objects, and interpret the context and meaning of the scene being observed.

One area where computer vision techniques have been applied is in solving traffic problems. For example, traffic cameras equipped with computer vision algorithms can be used to detect and classify vehicles, monitor traffic flow, and identify potential hazards or incidents. This information can be used to improve traffic management and safety, as well as to optimize the use of transportation infrastructure. Other applications of computer vision in traffic include the development of autonomous vehicles, which rely on computer vision to navigate roads and avoid collisions, and the analysis of traffic patterns to optimize the routing of vehicles.

Traffic in Vietnam is often congested, making it difficult for ambulances to quickly reach their destinations. This is especially problematic because there are many motorbikes and cars on the roads. In order to find a solution to this issue, my friends and I have been researching various scientific approaches, including the use of computer vision.
CHAPTER 2 BODY
The two computer vision techniques used for emergency vehicle recognition are object detection and instance segmentation. For these tasks, specially built convolutional neural networks (CNNs) [1], termed Faster RCNN and Mask RCNN, are used.

These CNNs were first trained, iteration after iteration, to distinguish aspects of emergency vehicles in photos. This was followed by extensive testing of the trained models for detection accuracy on an unseen dataset. The outcomes were classified as true positives, false positives, true negatives, and false negatives.
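As an illustrative aside, such classified outcomes can be turned into the usual summary metrics. The sketch below is not part of the original experiment; the counts passed in are placeholders, and only the standard formulas for accuracy, precision, and recall are taken as given.

# Minimal sketch: converting true/false positive and negative counts into
# accuracy, precision, and recall; the counts used here are placeholders.
def detection_metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

print(detection_metrics(tp=90, fp=5, tn=80, fn=10))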
2.1 THE OBJECT DETECTION MODEL (EXAMPLE: AMBULANCE)
Object detection involves identifying and locating a particular object in an image by creating a bounding box around it. This is typically accomplished using convolutional neural networks that have been pre-trained on large datasets for image classification, such as ResNet, Visual Geometry Group (VGG) Net, and Inception Net. These networks are modified to be fully convolutional so that they can handle inputs of various dimensions, and are then combined with object detection networks such as Faster RCNN, Single Shot Detectors, and Region-based Fully Convolutional Networks. An example of this is shown in Figure 1, which depicts a Faster RCNN [2] with a VGG Net base network. Transfer learning, which involves using pre-trained networks to minimize the number of computations and images needed for a custom dataset, is commonly employed in object detection.
Figure 1 Representation of Faster RCNN
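To make the transfer-learning step concrete, the following sketch shows how a VGG16 backbone pre-trained on ImageNet can be reused as a fully convolutional feature extractor in Keras. This is only an assumed outline, not the exact configuration used in this work, and the detection head is indicated only by a comment.

# Sketch: reusing a pre-trained VGG16 as the fully convolutional backbone
# of a detector such as Faster RCNN (illustrative only).
import tensorflow as tf

# include_top=False drops the dense classification layers, so the network
# stays fully convolutional and accepts inputs of varying spatial size.
backbone = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(None, None, 3))

# Freezing the pre-trained weights reduces the computations and the number
# of labelled images needed for the custom ambulance dataset.
backbone.trainable = False

# In Faster RCNN, a region proposal network and a box classification head
# would be attached on top of these feature maps.
feature_maps = backbone.output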
Object detection is the process of locating a specific object in an image by constructing a bounding box around it. The backbone for object detection is a traditional convolutional neural network built for image classification (ResNet, VGG Net, Inception Net, and so on). The transfer learning concept is used, meaning that these base networks are pre-trained on large pre-existing datasets to reduce the number of images required for the custom dataset. Care was also taken with respect to traffic congestion, disorderly movement, and non-homogeneity in the images, and the model took into account the shapes and sizes of the emergency vehicles.

The object detection model was developed on the TensorFlow [3] deep learning platform and trained for 10,200 iterations to reduce the training loss. The resulting training and validation loss values were 0.0086 and 0.0029, respectively (Figure 2).
Figure 2(a) Training loss vs number of iterations
Figure 2(b) Validation loss vs number of iterations
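Purely as a hedged illustration of how per-iteration loss values like those in Figure 2 can be recorded, the sketch below uses a generic TensorFlow training loop; the model, loss function, and data are placeholders rather than the detector actually trained in this report.

# Sketch: a generic TensorFlow training loop that records the loss at every
# iteration, producing a curve comparable in spirit to Figure 2(a).
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])      # placeholder model
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
loss_fn = tf.keras.losses.MeanSquaredError()                  # placeholder loss

# Placeholder data; the real experiment used labelled ambulance images.
xs = tf.random.normal((256, 8))
ys = tf.random.normal((256, 1))
dataset = tf.data.Dataset.from_tensor_slices((xs, ys)).batch(32).repeat()

loss_history = []
for step, (x, y) in enumerate(dataset.take(100)):             # 10,200 in the report
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    loss_history.append(float(loss))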
Results:

The object detection algorithm recognizes the ambulance even when it is amidst traffic congestion (Figure 3). The output is a bounding box with the detection confidence expressed as a percentage.
Figure 3 Recognition of ambulance in traffic congestion
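For illustration, output of the kind shown in Figure 3 can be rendered with a few OpenCV drawing calls; the file name, box coordinates, class label, and score below are invented placeholders, not values produced by the trained model.

# Sketch: overlaying a detected bounding box and its confidence score on a
# traffic image, giving output in the same style as Figure 3.
import cv2

image = cv2.imread("traffic_frame.jpg")            # placeholder input image
x1, y1, x2, y2, score = 120, 80, 360, 260, 0.93    # placeholder detection

cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(image, f"ambulance {score * 100:.0f}%", (x1, y1 - 8),
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("traffic_frame_detected.jpg", image)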
2.2 THE INSTANCE SEGMENTATION MODEL
Instance segmentation is a computer vision technique that accurately detects and outlines the boundaries of a particular object at the pixel level. It is often accomplished using a flexible framework called Mask Region-Based Convolutional Neural Network (Mask RCNN). Mask RCNN is a highly effective deep neural network that can identify objects in an image, video, or real-time feed by enclosing them in a bounding box while simultaneously creating a segmentation mask for each instance detected in the feed. It outperforms other models by performing object detection (classification and localization) and instance segmentation simultaneously.
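To make the pixel-level nature of the mask concrete, the sketch below blends a per-instance binary mask into an image with NumPy; the image and mask are synthetic placeholders standing in for a real Mask RCNN output.

# Sketch: applying a binary instance mask (H x W, values 0/1) to an image,
# which is what distinguishes instance segmentation from a bounding box alone.
import numpy as np

def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.5):
    # Blend a solid colour into the pixels where mask == 1.
    image = image.astype(np.float32)
    colored = np.zeros_like(image)
    colored[mask.astype(bool)] = color
    blended = np.where(mask[..., None].astype(bool),
                       (1 - alpha) * image + alpha * colored,
                       image)
    return blended.astype(np.uint8)

# Synthetic placeholders standing in for a frame and one predicted mask.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
mask = np.zeros((240, 320), dtype=np.uint8)
mask[60:180, 100:220] = 1
result = overlay_mask(frame, mask)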
The Mask RCNN network (as shown in Figure 4) has two main stages. The first stage determines the presence of an object in a specific region of the input image, known as the Region of Interest (RoI). The second stage predicts the class probability and, based on the results of the first stage, outputs an Intersection over Union (IoU) bounding box and a binary mask around the object. Both stages are connected to the backbone. The network has three components: the Feature Pyramid Network (FPN), the Region Proposal Network (RPN), and the backbone network architecture. The FPN is a top-down or bottom-up architecture that serves as a universal feature extractor; a bottom-up approach is used in this implementation. The RPN is a lightweight network that scans the FPN output bottom-up and proposes likely regions of the image where the object may be present. It then distinguishes different regions by fitting multiple bounding boxes according to certain IoU values, as sketched below. The backbone is a multi-layered neural network that generates feature maps of the input feed. In this case, ResNet50 is used, as it is not a very deep architecture and fine-tuning helps the model achieve higher accuracy with less training time.
Figure 4 Representation of Mask RCNN
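Since the RPN keeps or rejects proposals according to their IoU scores, a minimal sketch of the IoU computation between two axis-aligned boxes is given below; the (x1, y1, x2, y2) box format is an assumption made for illustration.

# Sketch: Intersection over Union between two boxes given as (x1, y1, x2, y2);
# region proposal networks use such scores to filter candidate boxes.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # approximately 0.143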