VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
ADVANCED PROGRAM IN INFORMATION SYSTEMS

NGUYEN TUAN ANH - 18520465

GRADUATION THESIS
BUILDING A MOBILE APPLICATION FOR DETECTING AND RECOGNIZING INFORMATION OF DRUGS

BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS

THESIS ADVISOR
Dr PHAN XUAN THIEN
ASSESSMENT COMMITTEE
The Assessment Committee is established under the Decision , date by Rector of the University of Information Technology
1 - Chairman
2 - Secretary
3 - Member
I would like to thank my family, above all my mother, the greatest woman I know, and my brothers and sisters for their love, support, wishes, and prayers. All my gratitude goes to my thesis advisor, Prof Dr Phan Xuan Thien of the Faculty of Information Systems at the University of Information Technology, who was there whenever I needed him, for his wise direction, full support, and encouragement.
Words cannot express my gratitude to my professor and the chair of my committee for his invaluable patience and feedback. I also could not have undertaken this journey without my defense committee, who generously provided knowledge and expertise. I would also like to thank all the instructors in the IS department, especially Associate Professor, PhD Nguyen Dinh Thuan. Thanks should also go to the librarians, research assistants, and study participants from the university, who impacted and inspired me.
Finally, I must express my very profound gratitude to my friends for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you from the bottom of my heart!
TABLE OF CONTENTS
Chapter 1 INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Aim of the Study
1.3.1 Study Objectives
1.4 Significance of the Study
1.5 Study Limitations
1.6 Overview of the Thesis
1.7 Project Schedule
Chapter 2 LITERATURE REVIEW
2.1 Design and Development of Mobile based Medication classification
Chapter 3 THEORETICAL FRAMEWORK
3.1 Android Studio
3.2 Java Programming Language
3.3 Mobile Application
3.4 Neural Networks
3.5 Convolutional Neural Networks (CNN)
3.9 Object Detection
3.10 TensorFlow Object Detection API
3.11 Libraries Used
Model Testing
Model Result
Chapter 5 SYSTEM IMPLEMENTATION
Near By Hospital places
Scan QR Barcode
Create Bill Page
Chapter 6 CONCLUSION AND FUTURE WORK
6.1 Conclusion
6.2 Future Work
REFERENCES
LIST OF FIGURES
Figure 1-1: Gantt chart of the study
Figure 3-1: Android Studio user interface
Figure 3-2: Mobile development framework (Kulathumani, 2015) [18]
Figure 3-3: Architecture of Neural Networks [19]
Figure 3-4: Architecture of Convolutional Neural Networks [23]
Figure 3-5: Diagram of Faster R-CNN [26]
Figure 3-6: Faster R-CNN Architecture of faster_rcnn_test.pt [19]
Figure 3-7: Anchor generation [29]
Figure 3-8: RPN data flow
Figure 3-9: RoI Pooling Layer
Figure 3-10: Examples of image segmentation and object detection: (a) Input image; (b) Semantic segmentation; (c) Object detection; (d) Instance segmentation
Figure 3-11: Mask R-CNN Architecture
Figure 3-12: Structure of RoIAlign
Figure 3-13: Process of the proposed method
Figure 3-14: Result of the pill area detection: (a) Detection result image; (b) Cropped image of instance segmentation. Outer rectangle is a bounding box and inner solid line indicates a detected pill area; (c) Cropped image of detection information consisting of the number of pills, detection scores, and bounding box positions
Figure 3-15: Process of data labeling and JavaScript Object Notation file creation: (a) Process of data labeling; (b) Structure of JavaScript Object Notation
Figure 3-16: Training process of pill detection using mask region-based convolutional neural network
Figure 3-17: Mechanism of Object Detection in TensorFlow
Figure 4-1: Dataset Image of Alaxan, Bactidol, Bioflu, Biogesic, DayZinc, Decolgen, Fish Oil, Kremil S, Medicol, and Neozep
Figure 4-4: Architecture of CNN Model for Drugs Classification
Figure 4-5: Setting Procedure of the Software
Figure 4-6: Model Training
Figure 4-7: Training Process
Figure 4-8: Graph Training and validation Accuracy/Loss
Figure 4-9: Uploading the model and result
Figure 5-1: Some main Function of this app
Figure 5-2: Mainmenu
Figure 5-3: View of Drug detail on the List of drug On database
Figure 1-5: Click negative button
Figure 1-6: View of open camera
Figure 1-7: View of Open VDI
Figure 5-5: Near By Hospital places
Figure 5-6: Scan QR barcode screen
Figure 5-7: Bill Page Screen
Figure 5-8: The Statistic Screen
LIST OF TABLES
Table 1: Project schedule
Table 2: Drug datasets, training and testing data of project
overall, and even put the patient's life in danger. We propose a solution to this issue that uses deep learning to identify pharmaceuticals in order to assist physicians and nurses in dispensing medications correctly. This work uses a baseline CNN deep learning model for drug identification to explore how the confusion of similar images by humans arises, through the cognitive counterpart of deep learning solutions, in the search for better image-based solutions to the drug identification problem. We introduce the fundamental ideas behind object recognition models in this study. To find the most effective pill recognition model, we trained each algorithm on a dataset of pill images, examined how well the CNN models performed, and applied the CNN model to the drug identification problem on the Android platform. Recent advances in technology make it possible to address these problems by designing and developing an application that runs on smartphones, which patients find easy to carry. The medication classification application could have a positive impact on patients' lives by helping them keep track of their daily pills, since remembering to take prescribed medications can be a matter of life and death. The performance of the model is evaluated by the correct recognition rate and investigated in many different cases.
Health is riches, according to a well-known proverb. For most people, having excellent health is one of the most important things because poor health can result in a very difficult life (Leonard, 2008). There are up to 10,000 different medications on the market right now, many of which are look-alike, sound-alike (LASA) medications, and new pharmaceuticals are constantly entering the market.
The US FDA has received over 95,000 reports of medication errors since 2000. Drug name confusion resulting from similar appearance or pronunciation accounts for about 25% of the errors [1]. The Malaysian Ministry of Health also received 5,003 reports of prescription errors in 2011, with LASA medications accounting for around 6% of the incidents.
In Vietnam, in April 2018, pregnant women were given the incorrect medication at the Health Center of the Tan Phuoc district. Specifically, the pharmacy accidentally gave patients Misoprostol 200mcg, which is used for abortion, when the doctor had prescribed Miproton 100mg for pregnancy maintenance [2].
Another instance of confusion occurred at the beginning of 2014, when a physician at Binh Chanh Hospital (HCMC) gave a patient Levetiracetam (an anti-epileptic drug) instead of Piracetam (a medication that enhances cell metabolism and supports central nervous system activity) because the two medications were believed to be similar [3].
Drug interactions are not only dangerous for the patient, and can even be fatal, but they are also inefficient. Medication errors are errors in the ordering or delivery of a drug, whether or not harm actually occurred. Some prescription errors can cause adverse drug events [4]. A medication error is any avoidable circumstance that could result in improper medication use or patient harm. The National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) has adopted the following working definition of a medication error: "any preventable event that may cause or lead to inappropriate medication use or patient harm, while the medication is in the control of the health care professional, patient, or consumer". Such events may be connected to professional practice, healthcare systems, and products, including prescribing, order communication, product labeling, packaging, and nomenclature, compounding, dispensing, and distribution [5].
Nevertheless, recent technological advancements have made it possible to address these problems in a variety of ways. One of them is purchasing a robot specifically designed to remind the doctor to dispense the right medication for the patient and to help the patient understand how to take the medicine. However, this solution appears to be ineffective and expensive (Riehemann et al., 2009) [6]. A mobile application looks more efficient because it eliminates the need to purchase a separate device and because the majority of people already use smartphones. The study uses Android, one of the most popular smartphone operating systems, which has been reported to perform well on smartphones (Nosrati, 2012) [7].
The Android operating system was created from the very beginning to enable developers to build compelling mobile applications that take full advantage of the capabilities of each device. Because of this, the proposed mobile application targets smartphones running Android, one of the most widely used mobile operating systems. Using a CNN model, the program essentially serves to remind doctors or users to take their medications properly and in the proper doses. Additionally, the proposed method helps to tell medications apart and displays some drug-specific information, such as the drug's name, action, and production date. The aim is to design, develop, and implement an Android-based application for drug classification using the Java programming language, a CNN model, and several Android APIs. The software is made to help users get the most out of their medication while minimizing the chance of forgetting a dose or taking doses at the wrong time.
1.2 Problem Statement
For the majority of people, health is one of the most important things because, without it, everything seems to go wrong. Recently, it has become more common for doctors to prescribe the incorrect medication and for people to use medications without being aware of their source. As the number of medications rises year after year, doctors with a limited knowledge base will inevitably become confused by color and shape in the absence of product packaging. Medication misuse is a very severe issue because it can affect a patient's general health, delay healing, and raise overall medical expenses.
The CNN model is used to classify drugs into categories, and several drug APIs are used to provide information, giving doctors and patients a reliable source of information to prevent unfortunate confusion. As a result, the design and development of an Android-based application that classifies drugs and provides drug information can help to limit the problems described above.
1.3 Aim of the Study
The goal of this project is to use the Java programming language and the Android Studio integrated development environment to design and build an Android-based application for classifying the medications that doctors prescribe to patients.
1.3.1 Study Objectives
• Recognize and classify drugs to help doctors and patients avoid drug confusion.
• Help users find the nearest hospital.
• Assist drug sellers in generating invoices when selecting drugs from the list.
• Produce a statistical chart of total daily revenue by invoice.
• Design a drug recognition and classification application for Android, one of the most widely used operating systems, with about 70% of mobile OS users (Android Statistics 2022) [8].
1.4 Significance of the Study
• The study shows how to use an existing model and the Android Studio integrated development environment to design, develop, and deploy a mobile application for Android-based drug identification and classification.
• The application is intended to help patients get the most out of their medication and avoid the risk of not taking it according to the doctor's prescription and at the dose prescribed by the specialist.
• It helps doctors keep track of the medication they dispense to patients against the prescriptions previously issued, and control the amount of drugs prescribed.
• The Android-based feature for sorting and displaying information is intended to be of great assistance to medical professionals and to patients dealing with a variety of issues, including forgetfulness, old age, cognitive disorders, unfavorable working conditions, Alzheimer's disease, dementia, emotional problems, stress, anxiety, and depression, as well as those with extremely busy work schedules or lifestyles.
• By using the application, doctors can easily issue invoices when patients request them and limit confusion when distributing medications.
• By enabling patients to tell apart the drugs they are taking and to identify the source of those drugs, the program increases their sense of security while taking them.
1.5 Study Limitations
This study is restricted to the design, development, and implementation of an Android-based mobile application for drug identification and categorization using Java programming and the Android Studio Integrated Development Environment (IDE) in conjunction with some Android libraries. The application is created with the following restrictions:
• Drug classification is entirely platform-dependent; the program only runs on Android phones such as those from OPPO, Vivo, and Samsung. It is not compatible with iOS.
• The application needs to be online in order to find the closest hospital. Without an internet connection, the application prevents the user from performing these actions.
• Although the CNN model can categorize pharmaceuticals with a fair amount of efficiency, there are still many drugs on the market today for which there is insufficient data to identify drug classes.
• When a lot of photos are used in the model, the application is sluggish and occasionally crashes.
1.6 Overview of the Thesis
The thesis consists of six chapters, each summarized briefly below.
• Chapter 1: This chapter introduces the design and development of an Android-based medication categorization system, the research problem, the aim and objectives of the research, the significance of the study, and its limitations.
• Chapter 2: This chapter reviews publications from various academic sources on the design, development, and deployment of mobile applications that identify and categorize pharmaceuticals for patients.
• Chapter 3: This chapter discusses the framework and related technologies employed in the design, creation, and implementation of an Android-based drug classification application.
• Chapter 4: This chapter covers the objectives of the application's development and the design of the Android-based drug classification and identification application.
• Chapter 5: This chapter describes the step-by-step implementation of the intended Android-based medication categorization and identification application.
• Chapter 6: This chapter concludes the design and implementation of the Android mobile application used to identify and categorize drugs. It also offers some suggestions on how to improve the app further.
Table 1: Project schedule
Work done                      Duration
Project Feasibility Studies    1 week
Design and Development         2 weeks
Program Testing                1 week
Implementation                 9 weeks
Project Write up
Write up corrections           2 weeks
[Figure 1-1: Gantt chart of the study. Tasks shown for the Fast Pick Drug Application, assigned to Nguyen Tuan Anh: Project Feasibility Studies; draft idea; draft concept; draft features; draft minimum viable product; draft core functions; integrate system module; perform initial testing; development finished; Program Testing; perform system testing; document issues found; correct issues found.]
Chapter 2 LITERATURE REVIEW
This study analyzed a range of academic literature on the conception, creation,
and deployment of mobile applications for patient drug identification and
classification
2.1 Design and Development of Mobile based Medication classification
The Computational Photography Project for Pill Identification (C3PI) was developed in response to the National Institutes of Health's Pill Image Recognition Challenge. Extracting picture data using a high-resolution smartphone camera and computer vision algorithms has been shown to be an effective method (Zeng, 2017) [9]. Content-based image retrieval (CBIR) now incorporates deep learning techniques to improve its ability to extract features (contents) from input photos in order to find and retrieve related images from a database (Bose, 2020) [10].
Deep models enable the extraction of both high-level and low-level characteristics, which is not possible with traditional CBIR (Bose, 2020) [10]. Deep learning has shown remarkable ability to recognize objects (Krizhevsky, 2014) and faces (Taigman, 2014) and to handle complex learning problems (LeCun, 2015). Deep learning has enhanced healthcare workflows as well, benefiting both patients and caregivers (Delgado, 2019). Convolutional Neural Networks (CNNs) are sophisticated methods for retrieving digital images. The CNN architecture is made up of stacked, interacting convolutional, pooling, and fully connected layers (Bose, 2020) [10].
MobileDeepPill is a multi-CNN architecture built on AlexNet, the network introduced by Krizhevsky (Zeng, 2017) [9]. The method measures shape, color, and gradients to compare consumer photos with reference photos. For picture identification, Wang et al. (Wang, 2010) [11] employ Canny edge detection and a classifier from the Google Inception Network. Separate GoogLeNet-based shape, color, and feature models identify the shape, color, and imprint of the pill, respectively. However, unlike the NIH dataset, the pill data was collected in a very controlled setting (Delgado, 2019) [12]. Other methods with varying degrees of accuracy have been developed for pill picture recognition. One C3PI technique combines the color property with a support vector machine (SVM) learning algorithm (Guo P. S., 2017) [13].
The method's overall color classification accuracy was 97.90%. Despite this, the technique's usefulness is constrained by factors including the lighting conditions, the camera resolution, and the contrast between the color of the pill and the background (Guo P. S., 2017) [13]. Distance Set, a local descriptor, was first developed by Grigorescu et al. (Grigorescu, 2003) [14]. The method looks at the set of distances between any point and its k nearest neighbors on the contour of the pill shape. The technique's drawbacks include distortion caused by noise, complex shapes, or irregular imprints (Eakins, 2000). The Two-Step Sampling Distance Set (TSDS) enhances the distance set technique by adding imprint and color features to the shape of the pill.
When the approach was applied to 12,500 photos, it achieved an accuracy of 93.64%. According to He & Zhang (2016), the deep Residual Network (ResNet) is one of the finest computer vision systems for object detection and face recognition. Even when training networks with thousands of layers, this deep learning technique can produce convincing results (He & Zhang, 2016) [15]. ResNet is a significantly deeper architecture and offers a clear advantage over AlexNet, the VGG network, and GoogLeNet, which include just 5, 19, and 22 convolutional layers, respectively. ResNet can be viewed as a group of smaller networks.
2.2 Summary
The related works above have the following flaws with respect to the study's topic area:
• The systems are often built to be platform-dependent, meaning they are compatible with either iOS or Android. It follows that iOS users cannot use the application if the system is built to run on Android, and vice versa.
• Some designs lack capabilities such as locating the closest hospital or drugstore. Users also find the applications inconvenient because invoices cannot be generated when choosing medicines from the list.
• Last but not least, some of the systems demonstrated require the purchase of specialized hardware, whereas other applications necessitate a significant amount of hardware processing power.
Chapter 3 THEORETICAL FRAMEWORK
expanded dramatically over the past several years along with the growth of mobile usage. As the world becomes more digital, many businesses seek remote Android developers for their development projects because it saves money and has numerous
Figure 3-1: Android Studio user interface
Built on IntelliJ IDEA, this IDE offers fast code completion and immediate feedback on the workflow. Android Studio's features include pushing code changes to a running app and an excellent code editor for efficient coding. By allowing developers to push code changes without completely restarting the app, Android Studio lets them incorporate changes quickly, which gives great flexibility for making minor modifications while the app is still running. The user-friendly code editor makes programming faster and also provides advanced refactoring, code completion, and code analysis. The emulator included with Android Studio launches the app more quickly than a physical device. It lets you test the app across a variety of devices, including phones, tablets, Android Wear, and Android TV, and can simulate a variety of hardware capabilities such as GPS, multi-touch input, and motion and acceleration sensors [17].
3.2 Java Programming Language
Because of its ease of use and effectiveness, Java was chosen as the programming language for this project. Another factor is that Java has long been an officially supported language for the creation of Android applications.
[Figure 3-2: Mobile development framework (Kulathumani, 2015) [18]. Diagram labels: Mobile App, IDE, Mobile App Development Frameworks.]
to the network that these neurons are forming. The concept of artificial neurons originated from the biological neurons found in the human nervous and sensory systems. An artificial neural network is divided into layers, just like the neural network found in the human body. In a biological neuron, the dendrites are the neuron's input terminals. The input is processed in the cell body, and the output travels along the axon and is passed to other neurons via the synapses and dendrites of those neurons. In the computational model, each input signal is multiplied by the weight of the input line it travels along. A mathematical function, called the activation function, then processes the weighted input signal. The processed signal is passed on to the neurons in the next layer for further processing. The weights of the links between neurons are what the model learns: throughout training, they are adjusted in an effort to reduce the error toward zero. In the human body, the signals carried by the dendrites are summed in the cell body, and if the sum exceeds a certain threshold, the axon initiates the transmission of a signal. A similar approach is used in the mathematical model, where the activation function determines the threshold behavior. The standard choice of activation function is the sigmoid function, which maps the summed value into a range between 0 and 1.
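The weighted sum and sigmoid activation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the bias term, and the numeric values are assumptions made for the example and do not come from the thesis implementation.

```python
import math


def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by the sigmoid activation.

    The bias term is an assumption added for completeness; the text
    above only describes the weighted sum.
    """
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical input signals and weights, chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, 0.1]
print(neuron_output(inputs, weights))
```

During training, the values in `weights` are the quantities that would be adjusted to reduce the error.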
A neural network is made up of artificial neurons that receive and analyze incoming data. The data passes through the input layer, the hidden layer, and the output layer. A neural network begins to function when input data is provided to it. The intended result is then produced by processing the data through its layers. A neural network generates results after learning from data. There are three types of learning that can occur within neural networks:
• Supervised learning: labeled data is used, so both inputs and outputs are provided to the algorithm. After being trained on how to evaluate the data, the network predicts the intended outcome.
• Unsupervised learning: the network learns without the aid of humans. There is no labeled data, and the output is decided based on patterns found in the input data.
• Reinforcement learning: the network adjusts its learning based on the feedback it receives.
3.5 Convolutional Neural Networks (CNN)
Convolutional neural networks (CNNs) classify images into groups, cluster them according to how similar they are, and perform object detection with the aid of artificial neural networks. A convolutional neural network treats the image data as a tensor, that is, a matrix of numbers with additional dimensions, and performs a kind of search over it [20].
In some cases the images are treated as volumes, that is, three-dimensional objects [21][22]. Numerous applications, such as object identification and facial recognition, are adopting CNNs. Image recognition is one of the most important non-trivial tasks. The three distinct layer types that make up a CNN are the convolutional layer, the subsampling layer, and the fully connected layer [21]. Since CNNs offer more benefits than other techniques, they are primarily used for image recognition.
Figure 3-4: Architecture of Convolutional Neural Networks [23].
The input layer, convolutional layer, down-sampling layer, fully connected layer, and output layer are the five main components of the CNN design, as depicted in Figure 3-4. Each component is explained below:
• Input layer: The raw input data can be fed directly into the input layer. The input layer receives an image as its pixel values.
• Convolutional layer: This layer is responsible for identifying features in the input data. Different convolutional kernels extract different aspects of the input data, and each convolutional layer has its own set of kernels. As the number of convolutional kernels in the layer rises, more features are extracted.
• Down-sampling layer: Also known as the pooling layer. It follows the convolutional layer, and its primary duty is to perform a second round of feature extraction. Under typical circumstances, a CNN architecture includes at least two convolutional layers and two down-sampling layers. The more layers the architecture has, the more likely it is that the features extracted from the input data support a clear classification.
• Fully connected layer: All of the feature maps are connected as input. The neurons within a layer are typically not connected to each other, but each neuron is connected to the neurons of the previous layer. In order to produce a probability for each case, this layer integrates and normalizes the abstracted features produced by the preceding convolutions.
• Output layer: The number of neurons in this layer is determined by the task. If classification is required, the number of neurons typically corresponds to the number of categories to be classified.
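The convolutional and down-sampling layers described above can be illustrated with a minimal pure-Python sketch. It assumes a single-channel image, one kernel, stride 1, no padding, and non-overlapping 2x2 max pooling; the image, the kernel values, and the function names are hypothetical and chosen only to make the feature extraction visible.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    Like most CNN libraries, this computes a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling, i.e. the down-sampling step."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 image containing a vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = convolve2d(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)         # 2x2 map after down-sampling
print(features)
print(pooled)
```

The feature map is non-zero only in the column where the edge sits, and pooling keeps that response while halving each spatial dimension.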
3.6 Faster R-CNN
A simple CNN algorithm can classify a single object in an image. Faster R-CNN extends the CNN with a Region Proposal Network (RPN) [26]. The Faster R-CNN algorithm is used because it can identify several objects in a single image. Faster R-CNN is built from two modules. The first module is a deep convolutional network, the RPN, that proposes regions, and the second module classifies the proposed regions. For a given image, the RPN outputs a set of rectangular object proposals, each with an objectness score. The proposals are defined relative to reference boxes called anchors.
An RPN predicts how likely each region is to contain an object rather than background. For this, a training dataset with named and labeled objects in the images is required. The proposed regions are reshaped to a fixed size using a pooling layer known as Region of Interest (RoI) pooling. The result is then used to classify the image content within each region and to predict offset values for the bounding boxes. The accuracy of the final model depends on how well the key regions are proposed. If the proposed regions match the objects well, the objects are very likely to be assigned to the correct classes [25].
Figure 3-5: Diagram of Faster R-CNN [26]
Based on the figure above, Faster R-CNN can be broken down into four primary sections:
• Conv layers. The feature map is extracted by a base network.
• Region Proposal Network (RPN). The RPN produces anchors and outputs region proposals.
• Region of interest (RoI) pooling. This layer converts each proposal's feature map to the target dimensions.
• Classifier. The final classes and bounding boxes are output.
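The RoI pooling step listed above, which converts proposals of different sizes into feature maps of one fixed size, can be sketched as follows. This is a simplified illustration under stated assumptions: a single-channel feature map, region coordinates already expressed in feature-map cells, and a hypothetical function name `roi_max_pool`. It is not the code used by any particular library.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of interest into a fixed out_h x out_w grid.

    roi = (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Split the region into bins; every bin covers at least one cell.
        r0 = y0 + (i * h) // out_h
        r1 = max(y0 + ((i + 1) * h) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = max(x0 + ((j + 1) * w) // out_w, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Hypothetical 4x6 feature map and two proposals of different shapes.
feature_map = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 1, 2, 3],
    [4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 2],
]
square_roi = (1, 0, 5, 4)  # 4 cells wide, 4 cells high
wide_roi = (0, 0, 6, 2)    # 6 cells wide, 2 cells high

print(roi_max_pool(feature_map, square_roi))
print(roi_max_pool(feature_map, wide_roi))
```

Both proposals produce a 2x2 output even though their shapes differ, which is what allows the classifier that follows to accept a fixed-size input.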
Figure 3-6: Faster R-CNN Architecture of faster_rcnn_test.pt [19]
Faster R-CNN, the object detection system used here, consists of two components. The first module is a deep fully convolutional network that proposes regions, and the second module is the Fast R-CNN detector [27], which uses the proposed regions. The system as a whole functions as a single object detection network (Figure 3-5). In the terminology of neural networks with "attention" mechanisms, a concept that has recently gained popularity, the RPN module tells the Fast R-CNN module where to look [28].
3.6.1 Region Proposal Networks (RPN)
The R-CNN and Fast R-CNN models use the Selective Search algorithm to generate region proposals. Each proposal is then passed to a pre-trained CNN. In [24], a network capable of producing region proposals, the Region Proposal Network (RPN), was proposed. It has the following benefits:
• Region proposals are now produced by a network that can be trained and adapted to the detection task.
• Because the network that creates the proposals can be trained end to end for the detection task, it generates better region proposals than more general techniques such as Selective Search and EdgeBoxes.
• The RPN processes the image with the same convolutional layers as the Fast R-CNN detection network. Unlike algorithms such as Selective Search, the RPN therefore needs little extra time to produce the proposals.
• Because they share the same convolutional layers, the RPN and the Fast R-CNN can be merged into a single network, so training is performed only once.
3.6.1.1 Anchor
As shown in the following figure, a rectangular sliding window of size n x n, where n = 3 for the VGG-16 net, is passed over the feature map of the last shared convolutional layer. K region proposals are generated for each window. Each proposal is parametrized relative to a reference box called an anchor box. An anchor box has two parameters:
• Aspect ratio
• Scale
There are typically three scales and three aspect ratios, for a total of K = 9 anchor boxes, although K may differ from 9. In other words, each sliding-window position yields K region proposals, each with a different scale or aspect ratio.
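The combination of scales and aspect ratios can be illustrated with a minimal sketch. The function name `make_anchors`, the base size of 16 pixels, the scales (8, 16, 32) and the ratios (0.5, 1, 2) are assumptions chosen for illustration; they are not taken from the implementation used in this thesis.

```python
import math

def make_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return K = len(scales) * len(ratios) anchor boxes centred at (0, 0).

    Each box is (x1, y1, x2, y2). A ratio is height / width, and every box
    of a given scale keeps the same area, (base_size * scale) ** 2.
    All default values are illustrative assumptions.
    """
    anchors = []
    for scale in scales:
        area = float(base_size * scale) ** 2
        for ratio in ratios:
            w = math.sqrt(area / ratio)  # width that preserves the area
            h = w * ratio                # height follows from the ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

anchors = make_anchors()
print(len(anchors))  # 9
```

With three scales and three ratios the sketch yields K = 9 boxes; passing different tuples gives a different K.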
[Figure: conv feature map, sliding window, intermediate layer, cls layer (2k scores), reg layer (4k coordinates), k anchor boxes]
Figure 3-7: Anchor generation [29]
Reference anchors (also known as anchor boxes) are used to make the object detector scale-invariant. Because the anchors exist at multiple scales, a single image at a single scale can be used, which avoids the need for multiple image scales or filter sizes. The multi-scale anchors are essential for sharing features between the RPN and the Fast R-CNN detection network. Each anchor is centered at the sliding window in question and is associated with a scale and an aspect ratio (Figure 3-7). By default, three scales and three aspect ratios are used, resulting in k = 9 anchors at each sliding position. For a convolutional feature map of size W x H (typically about 2,400 positions), there are W x H x k anchors in total.
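The anchor count W x H x k can be checked with a short sketch that places every base anchor at every feature-map position. The helper `tile_anchors`, the 60 x 40 feature map, and the nine base boxes are hypothetical values for illustration; the stride of 16 follows the 1/16 downscaling described in this chapter.

```python
def tile_anchors(feat_w, feat_h, base_anchors, stride=16):
    """Shift every base anchor to the centre of every feature-map cell.

    Returns a flat list of (x1, y1, x2, y2) boxes in image coordinates.
    """
    tiled = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride  # cell centre in image pixels
            cy = (row + 0.5) * stride
            for (x1, y1, x2, y2) in base_anchors:
                tiled.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return tiled

# Nine illustrative base anchors: three sizes, each in three shapes
# (square, wide, tall). The exact sizes are assumptions.
base = [(-s * fw / 2, -s * fh / 2, s * fw / 2, s * fh / 2)
        for s in (128, 256, 512)
        for (fw, fh) in ((1.0, 1.0), (1.4, 0.7), (0.7, 1.4))]

all_anchors = tile_anchors(feat_w=60, feat_h=40, base_anchors=base)
print(len(all_anchors))  # 60 * 40 * 9 = 21600
```

A 60 x 40 map has 2,400 positions, so k = 9 gives 21,600 anchors before any filtering.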
3.6.1.2 Proposal Layer
The figure above shows the RPN structure. Its input is a feature map from a convolutional layer, and it has two data flows: the upper flow classifies anchors as positive or negative, while the lower flow computes the bounding box regression offsets. The two flows are then combined in a Proposal layer, which generates and filters suitable proposals.
Faster R-CNN transforms an M*N image into an (M/16)*(N/16)*512 feature map. Let W = M/16 and H = N/16. In the upper data flow (the blue frame in Figure 3-8), the W*H*512 feature map is filtered by a 1*1*18 conv layer, which translates each 512-dimensional feature vector into a 2*9-dimensional vector (positive/negative scores for 9 anchors). The resulting W*H*18 feature map is then passed to a softmax classifier to obtain the probability that each anchor is positive or negative.
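The upper data flow can be sketched in plain Python: a 1*1 convolution is a linear map applied independently at each position, followed by a softmax over each anchor's pair of scores. The function names, the random weights, the omission of a bias term, and the channel ordering (two consecutive channels per anchor) are assumptions for illustration; 8 input channels stand in for the 512 of VGG-16 to keep the example fast.

```python
import math
import random

def softmax2(a, b):
    """Numerically stable softmax over two scores."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

def rpn_cls_probs(feature, weights, k=9):
    """Apply a 1*1 convolution (no bias) and a per-anchor softmax.

    feature: H x W x C nested lists; weights: C x (2*k) nested lists.
    Returns H x W x k pairs of probabilities that sum to 1.
    """
    out = []
    for row in feature:
        out_row = []
        for vec in row:  # vec holds the C channels of one position
            logits = [sum(v * w_c[j] for v, w_c in zip(vec, weights))
                      for j in range(2 * k)]
            out_row.append([softmax2(logits[2 * a], logits[2 * a + 1])
                            for a in range(k)])
        out.append(out_row)
    return out

random.seed(0)
H, W, C, K = 2, 3, 8, 9  # C would be 512 for VGG-16
feature = [[[random.uniform(-1, 1) for _ in range(C)]
            for _ in range(W)] for _ in range(H)]
weights = [[random.uniform(-1, 1) for _ in range(2 * K)] for _ in range(C)]

probs = rpn_cls_probs(feature, weights, k=K)
print(len(probs), len(probs[0]), len(probs[0][0]))  # 2 3 9
```

Each position ends up with 9 probability pairs, which corresponds to the W*H*18 output described above.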
After laying out a dense set of candidate anchors at the scale of the original image, the RPN uses the base network to categorize which anchors are positive (covering the ground truth) and which anchors are negative (outside the ground truth). In essence, it solves a binary classification problem.
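One common way to decide whether an anchor covers the ground truth is the intersection over union (IoU) between the anchor and the ground-truth boxes. The sketch below assumes the thresholds 0.7 (positive) and 0.3 (negative) used in the original Faster R-CNN paper; this thesis does not state which thresholds were used, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor 1 (positive), 0 (negative) or -1 (ignored).

    The thresholds are assumptions, not values reported in this thesis.
    """
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # neither clearly positive nor negative
    return labels

gt = [(10, 10, 110, 110)]
candidates = [(10, 10, 110, 110), (60, 10, 160, 110), (300, 300, 400, 400)]
print(label_anchors(candidates, gt))  # [1, -1, 0]
```

Anchors labeled -1 fall between the two thresholds and would simply be left out of the binary classification loss in this sketch.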
3.6.1.3 Region of Interest (RoI) Pooling
As shown in the figure below, the RoI Pooling layer receives proposals from the RPN and the feature map from the base network. The layer's primary function is to extract the parts of the feature map that are covered by the proposals. The issue is that the proposals are not fixed-size boxes, whereas the R-CNN classifier requires fixed-size feature maps in order to categorize them into a fixed number of classes.
Faster R-CNN uses RoI Pooling, which is derived from Spatial Pyramid Pooling, to address this problem. Assuming that the proposal size is M*N and the fixed feature map size is pooled_w*pooled_h, the procedure is straightforward:
• Transform the proposal from the real image space to the feature map space at 1/16 scale to get the RoI.
• Divide the RoI: split the feature map region that corresponds to the proposal into a pooled_w*pooled_h grid, rounding down the result if necessary.
• Run max pooling on every grid cell.
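The three steps above can be sketched for a single-channel feature map. The function `roi_pool`, the 2*2 output size, and the bin boundaries (start rounded down, end rounded up so that no bin is empty) are assumptions for illustration; the proposal is assumed to lie inside the image.

```python
import math

def roi_pool(feature, proposal, pooled_h=2, pooled_w=2, scale=1.0 / 16):
    """Max-pool the region of a 2-D feature map covered by one proposal.

    feature: H x W nested list (one channel, for brevity).
    proposal: (x1, y1, x2, y2) in image pixels, assumed inside the image.
    """
    # Step 1: map the proposal to feature-map space, rounding down.
    x1, y1, x2, y2 = (math.floor(c * scale) for c in proposal)
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)

    pooled = []
    for i in range(pooled_h):
        # Step 2: split the RoI into a pooled_h x pooled_w grid.
        r0 = y1 + (i * roi_h) // pooled_h
        r1 = y1 + math.ceil((i + 1) * roi_h / pooled_h)
        out_row = []
        for j in range(pooled_w):
            c0 = x1 + (j * roi_w) // pooled_w
            c1 = x1 + math.ceil((j + 1) * roi_w / pooled_w)
            # Step 3: take the maximum inside each grid cell.
            out_row.append(max(feature[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
        pooled.append(out_row)
    return pooled

# A 4 x 4 feature map with values 0..15, i.e. a 64 x 64 image at 1/16 scale.
feature = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_pool(feature, (0, 0, 63, 63)))  # [[5, 7], [13, 15]]
```

Whatever the proposal size, the output always has pooled_h rows and pooled_w columns, which is what allows the classifier to work with a fixed-size input.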
3.7 Mask R-CNN
Mask R-CNN, an extension of Faster R-CNN, is an instance segmentation technique that simultaneously predicts the bounding box, which gives the position of each object, and the mask of the object region [25,26]. Figure 3-10 illustrates image segmentation and object detection. The input image, shown in Figure 3-10a, contains four different types of pills and a background. The result of semantic segmentation is shown in Figure 3-10b.