VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
ADVANCED PROGRAM IN INFORMATION SYSTEMS
NGUYEN MINH TU - DO THANH XUAN
AN ANNOTATION TOOL FOR MACHINE LEARNING
BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS
Ho Chi Minh City, 2021
NGUYEN MINH TU - 16521839
DO THANH XUAN - 16521479
AN ANNOTATION TOOL FOR MACHINE LEARNING
BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS
THESIS ADVISOR
DR DO TRONG HOP
Ho Chi Minh City, 2021
ASSESSMENT COMMITTEE
First of all, the authors want to say thank you to the University of Information Technology and all of the lecturers of the Information Systems department for teaching important knowledge, not only in university but also in real life, throughout the past four years, so that the authors were qualified to complete this thesis.
The authors want to send their best regards and extend their thanks to Mr. Do Trong Hop and Mr. Nguyen Thanh Binh for directly giving advice, corrections, help and support during the time the authors worked on the thesis. All of this encouragement, feedback and input was treasured motivation when the authors struggled with researching and implementing this project.
The authors also extend their thanks to Mr. Ngo Duc Thanh for giving input and advice so that the report and thesis could be completed.
Finally, the authors want to thank their families and friends for their encouragement throughout the research project. However, because the authors' knowledge and experience are still limited, mistakes and shortcomings are inevitable. Hence, the authors sincerely look forward to receiving helpful contributions, advice and feedback from the lecturers in order to complete this work and to build a foundation for implementing other projects in the future. Again, thank you for all your help.
Thank you sincerely.
UNIVERSITY OF INFORMATION TECHNOLOGY
Advanced Education Program
ADVANCED PROGRAM IN INFORMATION SYSTEMS
THESIS PROPOSAL
THESIS TITLE: OBJECT DETECTION
Advisor: Do Trong Hop
Duration: (From 14 Aug 2020 to 11 Jan 2021)
Students: Nguyen Minh Tu - 16521839
Do Thanh Xuan - 16521479
Creating datasets is a very time-consuming job. Thus, it requires the effective provision of tools and supporting algorithms.
2. Scope:
Main functionalities:
- Manage videos, images and training datasets
- Label observations on every single picture, video and frame
- Export different types of dataset from the labelled sources
3. Objectives:
- Learn how the dataset works and the structure of the common algorithm platforms
- Create an Object Detection dataset using Yolo and Tensorflow to manage and generate datasets, training sets and test sets.
4. Methodologies: 3 main steps
Programming languages: C#, Python
- Survey the main machine learning frameworks:
1. Investigate, train, test and run on the provided datasets (COCO, TINY)
2. Analyze the sample datasets
- Develop an application that supports managing datasets, labeling objects and exporting training datasets compatible with the analyzed platforms
- Use the exported results to verify data (images, videos)
Research timeline (plan of action: the planning and the assigned tasks for each student):
Task | Start Date | End Date | Assigned to
Meet advisor Mr. Binh to discuss and form an idea about the project | 01/08/2020 | | Tu, Xuan
Research the technology and frameworks for the work | 03/08/2020 | 09/08/2020 | Tu, Xuan
Survey the main machine learning frameworks: investigate, train, test and run on the provided datasets (COCO, TINY) | 10/08/2020 | 16/08/2020 | Tu
Analyse the sample datasets | 17/08/2020 | 15/09/2020 | Xuan
Develop an application that supports managing images and videos, labeling objects and exporting training datasets compatible with the analysed platforms | 17/08/2020 | 25/11/2020 | Tu
Use the exported results to verify data (images, videos) | 25/11/2020 | 02/12/2020 | Tu, Xuan
Testing and completing the application | 02/12/2020 | 09/12/2020 | Tu, Xuan
Document writing and reporting | 09/12/2020 | 15/12/2020 | Tu, Xuan
Approved by the advisor(s)
Signature(s) of advisor(s)
Ho Chi Minh City, 18/9/2020
Signature(s) of student(s)
TABLE OF CONTENTS
LIST OF ACRONYMS
ABSTRACT
OPENING
CHAPTER 1: PROJECT OVERVIEW
1.1 Problems and questions
1.1.1 Problem and statement
1.1.2 Annotation Tools Research
1.1.2.1 LabelImg
1.1.2.2 CVAT
1.1.3 Comments
1.2 Project's Scope
1.3 Report Layout
CHAPTER 2: THEORETICAL BASIS
2.1 Annotation Overview
2.1.1 Manual annotation
2.1.2 Auto Annotation
2.1.3 Annotation's Features
2.2 Object Detection Platforms and Algorithms
2.2.1 Yolo
2.2.1.1 Yolo's framework: Darknet
2.2.1.2 Yolo
2.2.1.3 YoloV4 requirements
2.2.1.4 YoloV4 dataset
2.2.1.5 Set up environment on Darknet/YoloV4
2.2.1.6 Pros and Cons
2.2.2 Tensorflow/SSD_resnet
2.2.2.1 Tensorflow
2.2.2.2 Tensorflow requirements
2.2.2.3 Tensorflow dataset
2.2.2.4 Set up environment on Tensorflow/SSD_resnet
2.2.2.5 Pros and Cons
CHAPTER 3: GROUNDTRUTH SYSTEM DESIGN AND ANALYSIS
3.1 Groundtruth Overview
3.2 Groundtruth Main Functions
3.2.1 Function Requirements
3.2.2 Non-Function Requirements
3.3 Use Case Diagrams
3.3.1 Use case diagram list
3.3.2 Use case diagram description
3.3.2.1 Manage dataset use case diagram
3.3.2.2 Manage data source use case diagram
3.3.2.3 Manage labels use case diagram
3.3.2.4 Export dataset use case diagram
3.4 Sequence diagrams
3.4.1 Add dataset sequence diagram
3.4.2 Delete dataset/data source sequence diagrams
3.4.3 Add data source sequence diagram
3.4.4 Delete data source sequence diagram
3.4.5 Edit data source sequence diagram
3.5 Database
CHAPTER 4: TESTING ON YOLO AND SSD_RESNET
4.1 Deploy Groundtruth For Generating Dataset For Yolo And Tensorflow
4.1.1 Overview
4.1.2 Yolo training and testing deployment
4.1.2.1 Training
4.1.2.2 Testing
4.1.3 SSD_Resnet Training And Testing Deployment
4.1.3.1 Training
4.1.3.2 Testing
4.1.3.3 Auto Image Annotation for Tensorflow
CHAPTER 5: CONCLUSION AND FUTURE DEVELOPMENT
LIST OF FIGURES
Figure 1-1: User Interface of LabelImg application
Figure 1-2: Graphic User Interface of CVAT
Figure 2-1: Example of Image Annotation
Figure 2-2: A Manual Annotation Tool For Machine Learning
Figure 2-3: Auto Image Annotation Process
Figure 2-4: Bounding Box Annotation Feature
Figure 2-5: Cuboid Box Annotation Feature
Figure 2-6: Line Annotation Feature
Figure 2-7: Darknet Framework Logo
Figure 2-8: Performance Of Yolov4 Compared With Other Object Detection
Figure 2-10: Yolo Dataset
Figure 2-11: Coordinate Text File Explanation
Figure 2-12: Folder Obj Of Yolo Dataset
Figure 2-13: File Obj.Data Of Yolo Dataset
Figure 2-14: Obj.Names Of Yolo Dataset
Figure 2-15: Obj.Names Of Yolo Dataset
Figure 2-16: Alexeyab's Darknet Github
Figure 2-17: Download And Install CUDA Toolkit
Figure 2-18: Download And Install OpenCV
Figure 2-19: Cmake Configuration With CUDA Toolkit
Figure 2-20: Configuration File And Pre-Trained Model
Figure 2-21: Config Cmake To Unzip The Environment From Alexeyab
Figure 2-22: Building Darknet/Yolo Environment
Figure 2-23: Darknet/Yolo Environment
Figure 2-24: Tensorflow Library Logo
Figure 2-25: VOC Dataset For Tensorflow
Figure 2-26: Tensorflow Metadata Files
Figure 2-27: Data From XML File
Figure 2-28: Indicators Files For Environment To Training
Figure 2-29: Data Sources Formatted With MD5
Figure 2-30: Labels From Tensorflow Dataset
Figure 2-31: Annotation File For Tensorflow
Figure 2-32: Data Stored In Annotated File
Figure 2-33: Label_Map.Pbtxt File
Figure 2-34: Data Source Directory Stored In Train.txt File
Figure 2-35: Data Source Directory Stored In Val.txt File
Figure 2-36: Checking GPU Support
Figure 2-37: Install Python
Figure 2-38: Installing Virtualenv
Figure 2-39: Import Tensorflow Library
Figure 3-1: Groundtruth User Interface
Figure 3-2: Main Use Case Diagram
Figure 3-3: Manage Dataset Use Case Diagram
Figure 3-4: Manage Datasource Use Case Diagram
Figure 3-5: Manage Labels Use Case Diagram
Figure 3-6: Export Dataset Use Case Diagram
Figure 3-7: Add Dataset Sequence Diagram
Figure 3-8: Delete Dataset/Data Source Sequence Diagram
Figure 3-9: Add Data Source Sequence Diagram
Figure 3-10: Delete Data Source Sequence Diagram
Figure 3-12: Groundtruth Database
Figure 3-13: Srcdb.Xml File
Figure 3-14: Setdb.Xml Files
Figure 3-15: Mediasource Folder
Figure 3-16: Groundtruth Folder
Figure 4-2: Set Up Dataset Into Environment
Figure 4-3: Input Commands For Training Yolo
Figure 4-4: Training Process Of Yolo
Figure 4-5: Indicators To Notice When Training
Figure 4-6: Training Result
Figure 4-7: Configuration For Yolov4-Obj.Cfg
Figure 4-8: Input Commands For Testing
Figure 4-9: Machine Detecting Object
Figure 4-10: YOLO Detection Result Number 1
Figure 4-11: YOLO Detection Result Number 2
Figure 4-12: Json File
Figure 4-13: Active Environment For Tensorflow And Create "Train.Record" And "Val.Record"
Figure 4-14: Complete Dataset Of Tensorflow
Figure 4-15: Configuration For Training
Figure 4-16: Input Command For Training For SSD_Resnet On Tensorflow
Figure 4-17: Training Process
Figure 4-18: Export Saved Model Environment Commands
Figure 4-19: Export Saved Model Complete
Figure 4-20: Command Line For Validating Detection On Tensorflow/SSD_Resnet
Figure 4-21: Tensorflow Detection Result Number 1
Figure 4-22: Tensorflow Detection Result Number 2
Figure 4-23: Result Of Validating Data
Figure 4-24: Data Source's Metadata As An Xml File
Figure appendix 1: YOLO Test Result For Standard Test
Figure appendix 2: SSD_resnet Standard Test Result
Figure appendix 3: YOLO Test Result For Contrast Boosted Test
Figure appendix 4: SSD_resnet Contrast Boosted Test Result
Figure appendix 5: YOLO Result Of Low-Resolution Test
Figure appendix 6: YOLO Low Resolution Test Result
LIST OF TABLES
Table 3-1: Use case diagram list
LIST OF ACRONYMS
CUDA: Compute Unified Device Architecture
GPU: Graphics processing unit
CC: Compute Capability
GCC: GNU Compiler Collection
MSVC: Microsoft Visual C++
cuDNN: CUDA Deep Neural Network library
CVAT: Computer Vision Annotation Tool
YOLO: You Only Look Once
ABSTRACT
The thesis project "An image annotation tool" focuses on users who need a tool that generates and manages datasets for training machine learning models in a fast, convenient and accurate way.
This project builds and develops an annotation tool that helps to create, generate and manage datasets, responding to the demand for basic functions such as annotating and labeling objects and managing the datasets and their data sources (pictures and videos) in a way that is convenient for the users.
After this brief description of the project, the authors planned to implement it as follows:
- Research the steps of how to detect an object.
- Research Darknet/YOLO.
- Research Tensorflow/SSD_resnet.
- Develop and build up the Groundtruth application for labeling, annotating and managing datasets and data sources.
- Set up environments for Darknet/YOLO and Tensorflow/SSD_resnet.
- Use datasets generated from the Groundtruth application for training and testing on both environments and algorithms.
OPENING
With the development of AI and machine learning all over the globe, and especially of object detection, the demand to access, exploit and use them in real life has become larger than ever, especially in Big-Tech companies.
Object detection applies not only to individual users and small-scale businesses but also to big corporations. For small-scale businesses, one of the hardest tasks is to create their own annotation tool. There are two popular kinds of annotation tools: one is powerful but not open-source, the other is free but lacks many functionalities.
Our software provides a free-of-charge annotation tool with better features than some current tools. Our targeted users are:
- Individual users
- Small-scale business companies
CHAPTER 1: PROJECT OVERVIEW
This chapter gives a general view of the current status of annotation tools and an overview of object detection algorithms in the machine learning and deep learning fields. It then describes the problems, surveys related research and provides a solution to the problems.
1.1 Problems and questions:
1.1.1 Problem and statement:
In this part, the authors state the problem a user faces when they want to use object detection to detect an object such as a car or a bike. Below are the steps the user would take:
1. Collect images of the object for detection.
2. Choose the algorithm the user wants to use for detection.
3. Study/investigate the algorithm: its framework, pre-trained models and dataset (studying how the algorithm works internally is optional).
4. Build the environment to run the detection algorithm.
5. Prepare the dataset for training the detection model.
6. Run detection on the trained model.
The first 4 steps and step 6 are easy to do, but step 5 is quite difficult: a normal user will usually get stuck or struggle when coming to step 5. The user will first search the internet for an annotation and labeling tool to create the training dataset. Let us surf the internet together with the user and draw conclusions about the pros and cons of the available annotation tools.
1.1.2 Annotation Tools Research:
1.1.2.1 LabelImg:
Figure 1-1: User Interface of LabelImg application [1]
Pros:
- Open source
- Friendly user interface
- Easy to install and use
- Supports 2 types of dataset profiles: VOC and YOLO
Cons:
- Missing a dataset management feature
- Supports only an old VOC version (the VOC2007 dataset profile)
- Cannot annotate and label videos
1.1.2.2 CVAT:
Figure 1-2: Graphic User Interface of CVAT [2]
Pros:
- One of the top annotation tools at this time
- Professional user interface and easy to use
- Many strong functions to support annotating and labeling, such as dataset management, annotation and labeling on videos, image editing and filtering, and video editing
- Supports almost every type of dataset profile
- Supports different features of annotation
Cons:
- Not open source
- One of the biggest cons is low accessibility: this annotation tool is not public for individuals, and you cannot rent or pay to use it. You must be authorized/given access to use this tool.
1.1.3 Comments:
After researching annotation tools on the internet, the authors and the user come to the following conclusions:
- A friendly and easy-to-use user interface is always the first criterion.
- Open-source annotation tools lack mandatory functions such as dataset management or video annotation and labeling.
- Some tools only support early versions of dataset profiles.
- Annotation tools that are fully capable of creating a training dataset, such as CVAT, have low accessibility as their biggest con for an individual user.
Understanding the pros and cons of the annotation tools found on the internet, the authors saw an opportunity to overcome those tools' disadvantages, so we decided to develop an annotation tool that meets the needs of the users: an annotation tool supporting object detection which is easy to use, open source, fully capable of creating and managing datasets, data sources and labels, supports different types of dataset profiles and supports the latest versions of those profiles.
1.2 Project’s Scope:
The authors focus only on the research topics below to build an application for this project:
- Research Darknet/YOLO
- Research Tensorflow/SSD_resnet
- Research and develop an Auto Annotate function for the Tensorflow environment
- Research and develop an open-source annotation tool to support creating and managing datasets, data sources and labels; it also supports annotation and labeling on videos, can export the latest versions of dataset profiles and has some basic filter functions for data sources
- Build environments for YOLO and Tensorflow
- Use datasets generated from our annotation tool as input to train models for each algorithm, then validate the trained models
1.3 Report Layout:
The thesis consists of 5 chapters with the content below:
Chapter 1: Project overview:
- Introduces and gives an overview of this thesis project. Contents are: the status quo and the problems we are facing, a short description of other annotation tools, the project scope and results, and the report layout.
Chapter 2: Theoretical basis:
- Defines annotation, the types of annotation and their features. Defines and describes the algorithms and technology used in this project: C#, Python, Darknet/YOLO, Tensorflow/SSD_resnet and auto image annotation.
Chapter 3: GroundTruth system design and analysis
- Introduces Groundtruth, its functions, system analysis, database analysis and user interface design.
Chapter 4: Testing on Darknet/YOLO and Tensorflow/SSD_resnet
- In this chapter, the authors deploy Groundtruth to export datasets for training and testing with both algorithms, YOLO and SSD_Resnet. After training and testing, the authors compare the performance between the Groundtruth and CVAT tools.
Chapter 5: Conclusions and future development
- Conclusions about the project, its pros and cons, and improvements and upgrades in the future.
CHAPTER 2: THEORETICAL BASIS
This chapter discusses the theoretical basis of the project, which includes the definition of annotation, the types of annotation and their features. The authors also explain the YOLO and SSD_resnet algorithms and guide how to set up the environments to train and test them.
2.1 Annotation Overview:
Annotated data sources are becoming an important part of machine learning, used to train computers to recognize/detect various types of objects on roads or in other places. Image annotation highlights and labels a particular object by outlining it using a special tool.
In machine learning and deep learning, image annotation is the process of labeling or classifying an image using text, annotation tools, or both, to show the data features you want your model to recognize on its own. When you annotate an image, you are adding metadata to a dataset [7].
Image annotation is a type of data labeling that is sometimes called tagging, transcribing, or processing. You can also annotate videos continuously, as a stream, or frame by frame.
Image annotation marks the features you want your machine learning system to recognize, and you can use the images to train your model using supervised learning. Once your model is deployed, you want it to be able to identify those features in images that have not been annotated and, as a result, make a decision or take some action [3].
Image annotation is most commonly used to detect objects and boundaries and to segment images for instance, meaning, or image understanding. For each of these uses, it takes a significant amount of data to train, validate and test a machine learning model to achieve the desired outcome.
- Simple image annotation may involve labeling an image with a phrase that describes the objects pictured in it. For example, you might annotate an image of a cat with the label "domestic house cat." This is also called image classification or tagging.
- Complex image annotation can be used to identify, count, or track multiple objects or areas in an image. For example, you might annotate the difference between breeds of cat: perhaps you are training a model to recognize the difference between a Maine Coon cat and a Siamese cat. Both are unique and can be labeled as such. The complexity of your annotation will vary based on the complexity of your project.
2.1.1 Manual annotation:
Manual annotation is one of the basic tasks in computer vision technology. Annotated images are needed to train machine learning algorithms to recognize the objects contained in visuals and to give computers the ability to 'see' almost like we humans do. But manual annotation has its weaknesses: it can be time-consuming and quite expensive, especially when the set of images that need annotation is extremely large. It is the human-powered task of adding labels to an image (annotating) to create training datasets for computer vision algorithms. AI and machine learning engineers usually predetermine these labels manually using special image annotation software or tools: they define regions in an image and create text-based descriptions for them.
Figure 2-2: A Manual Annotation Tool For Machine Learning
2.1.2 Auto Annotation:
Auto image annotation has become an integral part of AI development for creating training data for machine learning. It helps to make the objects in images recognizable to machines, and the more annotated images are available for training the machines, the higher the accuracy level of the predictions will be, allowing AI developers to build the right model.
Figure 2-3: Auto Image Annotation Process
In the race to supply such training data, companies are taking the automatic route, annotating the images by machines to obtain a high volume of data. AI-based image annotator tools and software have been developed for such needs, and they can produce a large amount of annotated images in less time, fulfilling the needs of machine learning.
2.1.3 Annotation's Features:
BOUNDING BOX ANNOTATION:
It is one of the most common and important image annotation techniques, mainly used to outline the object in the image. In this thesis, this is the annotation feature the authors aim to provide.
Figure 2-4: Bounding Box Annotation Feature
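To make the bounding box idea concrete, the short sketch below draws one labelled box on an image with OpenCV in Python. It is only an illustration: the file name, the "Cat" label and the pixel coordinates are made-up values, not output of the Groundtruth tool.

# Minimal sketch: draw one labelled bounding box on an image with OpenCV.
# "sample.jpg", the coordinates and the "Cat" label are hypothetical values.
import cv2

image = cv2.imread("sample.jpg")                               # image to annotate
x, y, w, h = 120, 80, 200, 150                                 # top-left corner, width, height (pixels)
cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)   # box outline
cv2.putText(image, "Cat", (x, y - 5),                          # class label above the box
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("sample_annotated.jpg", image)                     # save the annotated copy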
CUBOID ANNOTATION:
This is called 3D cuboid annotation; it involves a high-quality labeling and marking technique to highlight objects in a three-dimensional sketching format. It helps to calculate the depth or distance of various objects like gadgets, buildings and vehicles, and also of humans, in order to distinguish the volume and space of the object. 3D cuboid annotation is mainly used in the construction and building structure fields, and also for radiology imaging in the medical field.
LINE ANNOTATION:
Line annotation is used to draw lines on roads or streets to make them identifiable for training vehicle perception models to detect lanes. It is different from other types of annotation like bounding boxes and cuboid annotation. It is suitable for drawing attention to important areas like roads or streets, or for decorating or diagramming the process flow, to provide a clear view of the streets to a machine such as a self-driving car [8].
2.2 Object Detection Platforms and Algorithms:
2.2.1 Yolo:
2.2.1.1 Yolo’s framework: DARKNET
An open-source neural network framework in C
Darknet is an open-source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation. Darknet was first developed by Pjreddie, but after Pjreddie dropped the project, AlexeyAB continued its development. Darknet is a framework environment that supports the training and testing process of the YOLO algorithm. Besides supporting YOLO, Darknet also supports other networks and tasks such as RNNs, Tiny Darknet, CIFAR-10, ImageNet, etc.
2.2.1.2 Yolo:
YOLO is short for You Only Look Once. It is a real-time object recognition system that can recognize multiple objects in a single frame. YOLO recognizes objects more precisely and faster than other recognition systems. It can predict up to 9000 classes, even unseen ones. The real-time recognition system recognizes multiple objects in an image and also draws a bounding box around each object. It can be easily trained and deployed in a production system [4].
The first version of YOLO was created by Pjreddie; over time and across versions, YOLO improved its training/testing accuracy and shortened its training time. After version 3.0, Pjreddie decided to drop YOLO. At present, YOLO is at version 4.0, developed by AlexeyAB, and promises further improvements in performance and accuracy and less time spent on training. In this project, the authors decided to use YoloV4 for the experiments with the Groundtruth application.
(Figure: performance of YOLOv4 compared with other object detectors on the MS COCO Object Detection benchmark)
2.2.1.4 YoloV4 dataset:
YOLO uses the COCO dataset, which helps to improve training performance, gives high accuracy and reduces the time consumed by training.
(Table: "Performance on the COCO Dataset", listing mAP, FLOPS and FPS for SSD, DSSD, R-FCN, FPN FRCN, RetinaNet, YOLOv2 and YOLOv3 variants; for example, YOLOv3-320 reaches 51.5 mAP, YOLOv3-416 reaches 55.3 mAP and YOLOv3-608 reaches 57.9 mAP on COCO test-dev.)
Figure 2-10: Yolo Dataset
The YOLO dataset consists of the components shown below:
Obj folder:
This folder stores the data sources, whose names are converted to MD5 format, together with a text file containing the labeled class, x coordinate, y coordinate, width and height for each data source. The text file and the data source are given matching names, so users will not make mistakes when working with the dataset.
This is where the data sources (pictures and coordinate text files) are stored.
The coordinate file is a text file organized as below:
- Class number, following the order of the class list
- X coordinate: set at the center of the rectangular annotation field, referred to the Oxy coordinate axes
- Y coordinate: set at the center of the rectangular annotation field, referred to the Oxy coordinate axes
- Width: the width of the annotation field
- Height: the height of the annotation field
Figure 2-11: Coordinate Text File Explanation
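As a concrete illustration of the coordinate text file described above, the sketch below converts one pixel-space rectangle into the normalized class/x/y/width/height line that YOLO expects. The image size, box values and output file name are hypothetical examples, not values taken from the thesis dataset.

# Sketch: write one YOLO-format label line for a single annotated object.
img_w, img_h = 1920, 1080                      # image size in pixels (example)
left, top, box_w, box_h = 480, 270, 320, 240   # annotated rectangle in pixels (example)
class_id = 0                                   # index of the label in obj.names

# YOLO stores the box center and size, normalized to the 0..1 range.
x_center = (left + box_w / 2) / img_w
y_center = (top + box_h / 2) / img_h
width = box_w / img_w
height = box_h / img_h

with open("0a1b2c3d_1.txt", "w") as f:
    f.write(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")
# resulting line: 0 0.333333 0.361111 0.166667 0.222222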
Figure 2-12: Folder Obj of Yolo Dataset
- Obj.data file: this file contains the dataset's information.
The 'obj.data' file contains the number of classes labeled in a specific dataset. Example: if a dataset contains dog and cat, then the number of classes is set to 2; if there is only dog or cat, then classes is set to 1.
Besides this, 'obj.data' also contains the paths to the other configuration files that are set up as inputs to the environment, such as 'train.txt' and 'obj.names', and the path to the backup folder where the trained '.weights' files are stored.
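As an illustration, a minimal obj.data file for the dog-and-cat example above could look like the sketch below. The paths are placeholders that depend on where the dataset is unpacked, and the 'valid' entry is only needed when a separate validation list is used.

classes = 2
train = data/train.txt
valid = data/val.txt
names = data/obj.names
backup = backup/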
- Obj.names: the place where the label names are stored.
Figure 2-14: Obj.Names Of Yolo Dataset
- Train.txt: this file stores the data sources' directories once they are set up in the environment.
Figure 2-15: Obj.Names Of Yolo Dataset
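The sketch below shows one way such a dataset could be assembled in Python: each image name is hashed with MD5 (matching the renaming convention described above) and its path is appended to train.txt. The folder names are placeholders, not the paths used internally by Groundtruth.

# Sketch: copy images into data/obj under MD5-based names and list them in train.txt.
# "source_images" and the destination layout are hypothetical.
import hashlib, os, shutil

src_dir, dst_dir = "source_images", "data/obj"
os.makedirs(dst_dir, exist_ok=True)

lines = []
for name in sorted(os.listdir(src_dir)):
    ext = os.path.splitext(name)[1].lower()
    if ext not in (".jpg", ".png"):
        continue
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()   # MD5 of the original name
    new_name = f"{digest}_1{ext}"
    shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, new_name))
    lines.append(f"data/obj/{new_name}")

with open("data/train.txt", "w") as f:                       # one image path per line
    f.write("\n".join(lines) + "\n")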
2.2.1.5 Set up environment on Darknet/yoloV4:
This section is a guide on how to install and set up the environment for Darknet and YoloV4. These are the main steps to follow in order to get an environment that supports training and testing [4].
- Step 1: Go to AlexeyAB's GitHub and clone the source code to your local machine.
Figure 2-16: Alexeyab’s Darknet Github
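In practice, step 1 amounts to a single command against AlexeyAB's public repository:

git clone https://github.com/AlexeyAB/darknet.git
cd darknet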
- Step 2: Download and install the CUDA toolkit.
Note: please select the CUDA toolkit version that is compatible with the GPU you are using.
Note 2: please check that your GPU has a Compute Capability (CC) of at least 3.0, as required by Darknet/YOLO.
Figure 2-17: Download And Install CUDA Toolkit
- Step 3: Download and install OpenCV with the CMake library.
Note: please check the compatibility between your CUDA toolkit and OpenCV versions.
Figure 2-18: Download And Install OpenCV
Figure 2-19: Cmake Configuration With CUDA Toolkit
- Step 4: Download config file and pre-trained model
Figure 2-20: Configuration File And Pre-Trained Model
- Step 5: Configure CMake together with AlexeyAB's environment.
Figure 2-21: Config Cmake To Unzip The Environment From Alexeyab
- Step 6: Build the darknet solution in Visual Studio.
Figure 2-22: Building Darknet/Yolo Environment
- Step 7: After the build in Visual Studio finishes, you can use the Darknet/YOLO environment for training and testing object detection.
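Once the environment is built, the training and testing runs described later in Chapter 4 are started with commands of the following shape (the file names follow the dataset layout in section 2.2.1.4; the exact paths, and the pre-trained yolov4.conv.137 weights, depend on your own setup):

darknet.exe detector train data/obj.data cfg/yolov4-obj.cfg yolov4.conv.137
darknet.exe detector test data/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_final.weights sample.jpg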
2.2.1.6 Pros and Cons:
Pros:
- Fast; good for real-time processing.
- Predictions (object locations and classes) are made by one single network and can be trained end-to-end to improve accuracy.
- Processes frames at a rate of 45 fps (larger network) to 150 fps (smaller network), which is better than real time.
- The network is able to generalize the image better.
2.2.2 Tensorflow/SSD_resnet:
2.2.2.1 Tensorflow:
Tensorflow was developed by Google's Google Brain team to pursue the purpose of implementing research and applying machine learning and other fields, such as logistics and AI, in an effective way. Tensorflow was released on 9 November 2015 as a library for computing.
Tensorflow can be described by the definitions below:
- Tensor: defined as the data structure type used throughout the Tensorflow library. Inside this structure there are 3 basic properties: rank (level), dimensions (shape) and data type.
- In other words, Tensorflow is fundamentally not defined as an environment for AI/machine learning but as a computing library [5].
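As a small illustration of these three tensor properties (rank, dimensions and data type), the snippet below builds a constant tensor with TensorFlow 2 and prints them; the values are arbitrary.

# Sketch: inspect the rank, shape and dtype of a TensorFlow tensor.
import tensorflow as tf

t = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])   # a 2 x 3 tensor of 32-bit floats
print(tf.rank(t).numpy())            # 2 -> rank (number of dimensions)
print(t.shape)                        # (2, 3) -> dimensions
print(t.dtype)                        # <dtype: 'float32'> -> data type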