
Graduation thesis: Building a mobile application for detecting and recognizing information of drugs


DOCUMENT INFORMATION

Basic information

Title: Building A Mobile Application For Detecting And Recognizing Information Of Drugs
Author: Nguyen Tuan Anh
Advisor: Dr. Phan Xuan Thien
University: University of Information Technology
Major: Information Systems
Type: Graduation Thesis
City: Ho Chi Minh City
Format:
Pages: 69
Size: 20.66 MB

Content


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

ADVANCED PROGRAM IN INFORMATION SYSTEMS

NGUYEN TUAN ANH

GRADUATION THESIS

BUILDING A MOBILE APPLICATION FOR DETECTING AND RECOGNIZING INFORMATION OF DRUGS

BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS

NGUYEN TUAN ANH - 18520465

THESIS ADVISOR

Dr. PHAN XUAN THIEN

ASSESSMENT COMMITTEE

The Assessment Committee is established under the Decision ......, date ......, by the Rector of the University of Information Technology.

1 - Chairman

2 - Secretary

3 - Member

I would like to thank my family, above all my mother, the greatest woman I know, and my brothers and sisters for their love, support, wishes, and prayers. All my gratitude goes to my thesis advisor, Prof. Dr. Phan Xuan Thien of the Faculty of Information Systems at the University of Information Technology. He was there whenever I needed him, with wise direction, full support, and encouragement.

Words cannot express my gratitude to my professor and the chair of my committee for his invaluable patience and feedback. I also could not have undertaken this journey without my defense committee, who generously provided knowledge and expertise. I would also like to thank all the instructors in the IS department, especially Associate Professor, PhD Nguyen Dinh Thuan. Thanks should also go to the librarians, research assistants, and study participants from the university, who impacted and inspired me.

Finally, I must express my very profound gratitude to my friends for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you from the bottom of my heart!

TABLE OF CONTENTS

Chapter 1 INTRODUCTION

1.1 Background
1.2 Problem Statement
1.3 Aim of the Study
1.3.1 Study Objectives
1.4 Significance of the Study
1.5 Study Limitations
1.6 Overview of the Thesis
1.7 Project Schedule

Chapter 2 LITERATURE REVIEW

2.1 Design and Development of Mobile based Medication classification

Chapter 3 THEORETICAL FRAMEWORK

3.1 Android Studio
3.2 Java Programming Language
3.3 Mobile Application
3.4 Neural Networks
3.5 Convolutional Neural Networks (CNN)
3.9 Object Detection
3.10 TensorFlow Object Detection API
3.11 Libraries Used

Model Testing
Model Result

Chapter 5 SYSTEM IMPLEMENTATION

Near By Hospital places
Scan QR Barcode
Create Bill Page

Chapter 6 CONCLUSION AND FUTURE WORK

6.1 Conclusion
6.2 Future Work

REFERENCES

LIST OF FIGURES

Figure 1-1: Gantt chart of the study
Figure 3-1: Android Studio user interface
Figure 3-2: Mobile development framework (Kulathumani, 2015) [18]
Figure 3-3: Architecture of Neural Networks [19]
Figure 3-4: Architecture of Convolutional Neural Networks [23]
Figure 3-5: Diagram of Faster R-CNN [26]
Figure 3-6: Faster R-CNN Architecture of faster_rcnn_test.pt [19]
Figure 3-7: Anchor generation [29]
Figure 3-8: RPN data flow
Figure 3-9: RoI Pooling Layer
Figure 3-10: Examples of image segmentation and object detection: (a) Input image; (b) Semantic segmentation; (c) Object detection; (d) Instance segmentation
Figure 3-11: Mask R-CNN Architecture
Figure 3-12: Structure of RoIAlign
Figure 3-13: Process of the proposed method
Figure 3-14: Result of the pill area detection: (a) Detection result image; (b) Cropped image of instance segmentation. Outer rectangle is a bounding box and inner solid line indicates a detected pill area; (c) Cropped image of detection information consisting of the number of pills, detection scores, and bounding box positions
Figure 3-15: Process of data labeling and JavaScript Object Notation file creation: (a) Process of data labeling; (b) Structure of JavaScript Object Notation
Figure 3-16: Training process of pill detection using mask region-based convolutional neural network
Figure 3-17: Mechanism of Object Detection in TensorFlow
Figure 4-1: Dataset Image of Alaxan, Bactidol, Bioflu, Biogesic, DayZinc, Decolgen, Fish Oil, Kremil S, Medicol, and Neozep
Figure 4-4: Architecture of CNN Model for Drugs Classification
Figure 4-5: Setting Procedure of the Software
Figure 4-6: Model Training
Figure 4-7: Training Process
Figure 4-8: Graph Training and validation Accuracy/Loss
Figure 4-9: Uploading the model and result
Figure 5-1: Some main Function of this app
Figure 5-2: Main menu
Figure 5-3: View of Drug detail on the List of drug on database
Figure 1-5: Click negative button
Figure 1-6: View of open camera
Figure 1-7: View of open ...
Figure 5-5: Near By Hospital places
Figure 5-6: Scan QR barcode screen
Figure 5-7: Bill Page screen
Figure 5-8: The Statistic Screen

LIST OF TABLES

Table 1: Project schedule

Table 2: Drug datasets, training and testing data of project

overall, and even put the patient's life in danger. We provide a solution to that issue that uses deep learning to identify pharmaceuticals in order to assist physicians and nurses in dispensing medications appropriately. This work uses a baseline CNN deep learning model for drug identification to explore how humans come to confuse similar images, the cognitive counterpart of deep learning solutions, in the search for better image-based solutions to the drug identification problem. We introduce the fundamental ideas behind object recognition models in this study. To find the most effective pill recognition model, we trained each algorithm on a dataset of pill images, examined how well the CNN models performed, and applied the CNN model to the drug identification problem on the Android platform. Recent advances in technology make it possible to address this type of problem by designing and developing an application that runs on smartphones, which patients can easily carry with them. The medication classification application could have a positive impact on patients' lives, as it helps them keep track of their daily pills; remembering to take prescribed medications can be a matter of life and death. The performance of the model is evaluated by the correct recognition rate and investigated in many different cases.

Chapter 1 INTRODUCTION

1.1 Background

Health is wealth, according to a well-known proverb. For most people, good health is one of the most important things, because poor health can make life very difficult (Leonard, 2008). There are up to 10,000 different medications on the market right now, many of which are look-alike, sound-alike (LASA) medications, and more pharmaceuticals are constantly entering the market.

The US FDA has received over 95,000 reports of medication errors since 2000. Drug name confusion resulting from similar appearance or pronunciation accounts for about 25% of these errors [1]. The Malaysian Ministry of Health also received 5,003 reports of prescription errors in 2011, with LASA medications accounting for around 6% of the incidents.

Most recently in Vietnam, in April 2018, pregnant women were given the incorrect medication at the Health Center of the Tan Phuoc district. Specifically, the pharmacy accidentally gave patients Misoprostol 200mcg, which is used for abortion, when the doctor had prescribed Miproton 100mg for pregnancy maintenance [2].

Another instance of confusion occurred at the beginning of 2014, when a physician at Binh Chanh Hospital (HCMC) gave a patient Levetiracetam (an anti-epileptic drug) instead of Piracetam (a medication that enhances cell metabolism and supports central nervous system activity) because the two medications were believed to be similar [3].

Drug interactions are not only dangerous for the patient, and can even be fatal, but they are also inefficient. Medication errors are errors in the ordering or delivery of a drug, whether harm actually occurred or only the potential for harm existed. Some prescription errors can cause adverse drug events [4]. A medication error is any avoidable circumstance that could result in improper medication use or patient harm. The National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) has adopted the following working definition of a medication error: "any preventable event that may cause or lead to inappropriate medication use or patient harm, while the medication is in the control of the health care professional, patient, or consumer". Such events may be related to professional practice, healthcare systems, and products, including prescribing, order communication, product labeling, packaging, and nomenclature, compounding, dispensing, and distribution [5].

Nevertheless, recent technological advancements have made it possible to address these kinds of problems in a variety of ways. One option is to purchase a robot that is specifically designed to remind the doctor to dispense the right medication and to help the patient understand how to take the medicine. However, this solution appears to be ineffective and expensive (Riehemann et al., 2009) [6]. Using a mobile application instead looks more efficient, because it eliminates the need to purchase a separate device and because the majority of people use smartphones. The study decided to target Android, one of the most popular smartphone operating systems. Android has also been reported to perform effectively on smartphones (Nosrati, 2012) [7].

The Android operating system was designed from the very beginning to enable developers to create compelling mobile applications that take the characteristics of each device fully into account. Because of this, the proposed mobile application targets smartphones running Android, one of the most widely used mobile operating systems. Using a CNN model, the program essentially serves to remind doctors and users to take medications properly and in the proper doses. Additionally, the proposed method helps differentiate medications and displays some drug-specific information, such as the drug's name, action, and production date. The goal is to design, develop, and implement an Android-based application for drug classification using the Java programming language, a CNN model, and several Android APIs. The software is made to help users get the most out of their medication while minimizing the chance of missing a dose or taking doses at the wrong time.

1.2 Problem Statement

For the majority of people, health is one of the most important things because, without it, everything seems to go wrong. Recently, it has become more common for doctors to prescribe the incorrect medication and for people to use medications without being aware of their source. As the number of medications rises year after year, doctors with a limited knowledge base will inevitably confuse drugs of similar color and shape when the product packaging is absent. Medication misuse is a very severe issue because it can impact a patient's general health, delay healing, and raise overall medical expenses.

The CNN model is used to classify drugs into various categories, and several drug APIs are used to provide additional information, giving doctors and patients a reliable source of information to prevent unfortunate confusion. As a result, designing and developing an Android-based application that classifies drugs and provides drug information can help to limit the problems described above.

1.3 Aim of the Study

The goal of this project is to use the Java programming language and the Android Studio integrated development environment to design and build an Android-based application for classifying the medications that doctors recommend to patients.

1.3.1 Study Objectives

• Recognize and classify drugs to help doctors and patients avoid drug confusion.

• Help users find the nearest hospital.

• Assist drug sellers in generating invoices when selecting drugs from the list.

• Produce a statistical chart of the day's total revenue by invoice.

• Design a drug recognition and classification application that supports Android, one of the most widely used operating systems, with about 70% of mobile OS users (Android Statistics 2022) [8].

1.4 Significance of the Study

• The study shows how to use an existing model and the Android Studio integrated development environment to design, develop, and deploy an Android-based mobile application for drug identification and classification.

• This application is intended to help patients get the most out of their medication and avoid the risk of not taking it according to the doctor's prescription and in the dose prescribed by the specialist.

• It helps doctors keep track of the medications they dispense to patients against the prescriptions previously issued, and control the amount of drugs prescribed.

• The Android-based feature for sorting and displaying information is intended to be of great assistance to medical professionals and to patients dealing with a variety of issues, including forgetfulness, busy schedules or lifestyles, old age, cognitive disorders, unfavorable working conditions, Alzheimer's disease, dementia, emotional problems, stress, anxiety, and depression.

• By using the application, doctors can easily issue invoices when patients request them and limit confusion when distributing medications.

• By enabling patients to differentiate between the drugs they are taking and to identify the source of those drugs, the program increases their sense of security while taking them.

1.5 Study Limitations

This study is restricted to the design, development, and implementation of an Android-based mobile application for drug identification and categorization using Java programming and the Android Studio Integrated Development Environment (IDE) in conjunction with some Android libraries. The application is created with the following restrictions:

• Drug classification is entirely platform-dependent; as a result, the program only functions on Android phones such as those from OPPO, Vivo, and Samsung. It is not compatible with iOS.

• The application needs to be online in order to find the closest hospital. When the device is not connected to the internet, the hosted application restricts the user from performing these actions.

• Although the CNN model can categorize pharmaceuticals with a fair amount of efficiency, there are still many drugs on the market today for which there is insufficient data to identify drug classes.

• When a lot of photos are used in the model, the application is sluggish and occasionally crashes.

1.6 Overview of the Thesis

The written study consists of six chapters, listed here with a brief summary of each one.

• Chapter 1: This chapter introduces the design and development of an Android-based medication classification system, the research problem, the research aim and objectives, the significance of the research, and the limitations of the research and the application.

• Chapter 2: This chapter reviews a variety of publications from academic sources on the design, development, and deployment of mobile applications that identify and categorize pharmaceuticals for patients.

• Chapter 3: This chapter discusses the framework and related technologies employed in the design, creation, and implementation of an Android-based drug classification application.

• Chapter 4: This chapter covers the objectives of the application's development as well as the design of an Android-based drug classification and identification application.

• Chapter 5: This chapter addresses the step-by-step implementation of the intended Android-based medication classification and identification application.

• Chapter 6: This chapter concludes the design and implementation of an Android mobile application for identifying and categorizing drugs. It also offers some suggestions on how to make the app even better.

1.7 Project Schedule

Table 1: Project schedule

Work done                      Duration
Project Feasibility Studies    1 week
Design and Development         2 weeks
Program Testing                1 week
Implementation                 9 weeks
Project Write up
Write up corrections           2 weeks

Figure 1-1: Gantt chart of the study (Fast Pick Drug Application). Tasks shown: Project Feasibility Studies; Draft idea; Draft concept; Draft features; Draft minimum viable product; Draft core functions; Integrate system module; Perform initial testing; Development finished; Program Testing; Perform system testing; Document issues found; Correct issues found. Tasks are assigned to Nguyen Tuan Anh.

Chapter 2 LITERATURE REVIEW

This study analyzed a range of academic literature on the conception, creation, and deployment of mobile applications for patient drug identification and classification.

2.1 Design and Development of Mobile based Medication classification

The Computational Photography Project for Pill Identification (C3PI) was developed in response to the National Institutes of Health's Pill Image Recognition Challenge. Extracting image data with a smartphone's high-resolution camera and computer vision algorithms has been shown to be an effective method (Zeng, 2017) [9]. Content-based image retrieval (CBIR) now incorporates deep learning techniques to improve its ability to extract features (contents) from input photos in order to find and retrieve related images from a database (Bose, 2020) [10].

Deep models enable the extraction of both high-level and low-level characteristics, which is not possible with traditional CBIR (Bose, 2020) [10]. Deep learning has shown a remarkable ability to recognize objects (Krizhevsky, 2014) and faces (Taigman, 2014) and to handle complex learning problems (LeCun, 2015). Deep learning has enhanced healthcare workflows as well, benefiting both patients and caregivers (Delgado, 2019). Convolutional Neural Networks (CNNs) are sophisticated methods for retrieving digital images. The CNN architecture is made up of stacked, interacting convolutional, pooling, and fully connected layers (Bose, 2020) [10].

Building on AlexNet, the network introduced by Krizhevsky et al., Zeng et al. created the multi-CNN architecture known as MobileDeepPill (Zeng, 2017) [9]. The method measures shape, color, and gradients to compare consumer and reference photos. For image identification, Wang et al. (Wang, 2010) [11] employ Canny edge detection and a classifier from the Google Inception Network. Separate GoogLeNet-based shape, color, and feature models were developed to identify the shape, color, and imprint of the pill, respectively. Unlike the NIH dataset, however, the pill data was collected in a very controlled setting (Delgado, 2019) [12]. Other methods with varying degrees of accuracy have been developed for pill image recognition. Two techniques applied to C3PI are the color property and a support vector machine (SVM) learning algorithm (Guo P. S., 2017) [13].

The method's overall color classification accuracy was 97.90%. Despite this, the technique's usefulness is constrained by factors including the lighting conditions, the camera resolution, and the contrast between the color of the pill and the background (Guo P. S., 2017) [13]. Distance Set, a local descriptor, was first developed by Grigorescu et al. (Grigorescu, 2003) [14]. The method looks at the set of distances between any point and its k nearest neighbors on the contour of the pill shape. The technique's drawbacks include distortion caused by noise, complex shapes, or irregular imprints (Eakins, 2000). The Two-Step Sampling Distance Set (TSDS) enhances the distance set technique by adding imprint and color features to the shape of the pill.

When TSDS was applied to 12,500 photos, it achieved an accuracy of 93.64%. According to He & Zhang (2016), the deep Residual Network (ResNet) is one of the best computer vision systems for object detection and face recognition. Even when training networks with thousands of layers, this deep learning technique can produce convincing results (He & Zhang, 2016) [15]. ResNet offers a significant comparative advantage over AlexNet, the VGG network, and GoogLeNet, which include just 5, 19, and 22 convolutional layers, respectively. ResNet is a significantly deeper learning technique, and it can be viewed as a group of smaller networks.
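The distance-set idea described earlier in this section can be illustrated with a minimal sketch. It is a simplified reading of the descriptor, not the published algorithm of Grigorescu et al.: the function name `distance_set`, the plain Euclidean distance, and the four-point contour are assumptions made only for illustration.

```python
import math


def distance_set(points, index, k):
    """Return the sorted distances from points[index] to its k nearest neighbours.

    Simplified illustration: the published descriptor may normalise or
    post-process these distances differently.
    """
    px, py = points[index]
    distances = sorted(
        math.hypot(px - qx, py - qy)
        for j, (qx, qy) in enumerate(points)
        if j != index  # a point is not its own neighbour
    )
    return distances[:k]


# Hypothetical contour: the four corners of a 3 x 4 rectangle.
contour = [(0, 0), (3, 0), (3, 4), (0, 4)]
descriptor = distance_set(contour, index=0, k=2)
print(descriptor)  # [3.0, 4.0]
```

Two shapes can then be compared by matching the distance sets of their contour points, which is why noise or irregular imprints that move contour points distort the descriptor.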

2.2 Summary

The related works discussed above have flaws that pertain to this study's topic area:

• The systems are often built to be platform-dependent, meaning they are compatible with either iOS or Android. It follows that iOS users cannot use the application if the system is built to run on Android, and vice versa.

• Only some designs include capabilities such as locating the closest hospital or drugstore, and users find the applications inconvenient because invoices cannot be generated when choosing medicines from the list.

• Last but not least, some of the systems demonstrated require the purchase of specialized hardware, whereas other applications need a significant amount of hardware processing power.

Chapter 3 THEORETICAL FRAMEWORK

expanded dramatically over the past several years along with the growth of mobile usage. As the world becomes more digital, many businesses seek remote Android developers for their development projects because it saves money and has numerous

Figure 3-1: Android Studio user interface

This IDE's IntelliJ IDEA foundation allows for fast code completion and immediate evaluation of the workflow. Android Studio's capabilities include code push for modifications and a capable code editor for efficient coding. By allowing developers to push code changes without completely restarting the app, Android Studio lets them incorporate changes quickly, which gives great flexibility for making minor modifications while the app is still running. Its user-friendly code editor enables one of Android Studio's main benefits, faster programming, and provides advanced refactoring, code completion, and code analysis. The emulator included with Android Studio helps launch the full app more quickly than an actual device. The emulator can simulate a variety of hardware capabilities such as GPS, multi-touch input, and motion and acceleration sensors, so the app can be tested across a variety of devices, including phones, tablets, Android Wear, and Android TV [17].

3.2 Java Programming Language

Java was chosen as the programming language for this project because of its ease of use and effectiveness. Another factor is that Java is an officially supported language for developing Android applications.

Figure 3-2: Mobile development framework (Kulathumani, 2015) [18]. Diagram labels: Mobile App; IDE; Mobile App Development Frameworks.

to the network that these neurons are forming. The concept of artificial neurons originated from the biological neurons found in the human nervous and sensory systems. An artificial neural network is divided into layers, just like the neural network found in the human body. In a biological neuron, the dendrites are the neuron's input terminals. The input is processed in the cell body, and the output is carried along the axon to other neurons via synapses and the dendrites of those neurons. In the computational model, each input signal is multiplied by the weight of the input line it travels along. A mathematical function, called the activation function, then processes the weighted input signal. The processed signal is sent on to the neurons in the next layer for further processing. The weight of the link between neurons in this model is understood to represent what has been learned. Throughout training, the weights of the model are adjusted in an effort to reduce the error toward zero. In the human body, the signals carried by the dendrites are added in the cell body, and if the sum exceeds a certain value, the axon initiates the transmission of messages. A similar approach is used in mathematical or numerical models, where the activation function determines the threshold behavior. A standard choice of activation function is the sigmoid function, which maps the summed input to a value between 0 and 1.
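The weighted sum followed by a sigmoid activation can be sketched in a few lines. This is a minimal illustration rather than the thesis's implementation; the function names and the example inputs, weights, and bias are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(inputs, weights, bias):
    """Weight each input, sum the results with the bias, apply the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical values: the weighted inputs cancel out, so z = 0.
print(neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0))  # 0.5
```

A strongly positive weighted sum pushes the output toward 1 and a strongly negative one toward 0, which mirrors the threshold behavior described for the biological neuron.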

A neural network is made up of artificial neurons that receive and analyze incoming data. The data passes through the input layer, the hidden layer, and the output layer. The network begins to function when input data is provided to it, and the intended result is produced by processing the data through its layers. A neural network generates results after learning from structured data. Three types of learning can occur within neural networks:

• Supervised learning: labeled data provides the algorithm with both inputs and expected outputs. After being trained on how to evaluate the data, the network predicts the intended outcome.

• Unsupervised learning: the network learns without the aid of humans. There is no labeled data; the output is decided based on patterns found in the input data.

• Reinforcement learning: the network adjusts its learning based on the feedback it receives.
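Supervised learning, the first of the three types above, can be illustrated with a single sigmoid neuron trained on labeled examples. This is a minimal sketch under stated assumptions: the data set (the logical OR function), the learning rate, the epoch count, and all function names are hypothetical and are not taken from the thesis.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, learning_rate=0.5, epochs=2000):
    """Adjust the weights after every labeled example to shrink the error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = predict(weights, bias, inputs) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Labeled data: each input pair comes with the output the network should learn.
labeled_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(labeled_data)
predictions = [round(predict(weights, bias, x)) for x, _ in labeled_data]
print(predictions)
```

After training, the rounded predictions reproduce the labels, which is the sense in which the network "anticipates the intended outcome" from labeled data.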

3.5 Convolutional Neural Networks (CNN)

Convolutional neural networks (CNNs) classify images into groups, cluster them according to how similar they are, and perform object detection with the aid of artificial neural networks. The convolutional neural network analyzes the image's data as a tensor, that is, a matrix of numbers with additional dimensions, and performs a kinematic search [20].

In some cases the images are treated as volumes, that is, as three-dimensional objects [21][22]. Numerous applications, such as object identification and facial recognition, make use of CNNs, and image recognition remains one of the leading non-trivial tasks. The three distinct layer types that make up a CNN are the convolutional layer, the subsampling layer, and the fully connected layer [21]. Since CNNs offer more benefits than other techniques, they are primarily used for image recognition.

Figure 3-4: Architecture of Convolutional Neural Networks [23].

The input layer, convolution layer, down-sampling layer, fully connected layer, and output layer are the five main components of the CNN design, as depicted in Figure 3-4. Each component is explained below:

• Input layer: The raw input data can be entered directly into the input layer. The input layer receives an image in the form of its pixel values.

• Convolutional layer: The layer responsible for identifying features in the input data. Different convolutional kernels extract different aspects of the input data, and each convolutional layer has its own convolutional kernels. As the number of convolutional kernels used in the layer rises, more features are extracted.

• Down-sampling layer: Also known as the pooling layer. Its primary duty is to perform a second round of feature extraction, and it follows the convolution layer. Under typical circumstances, the CNN architecture includes two down-sampling layers and at least two convolutional layers. The more levels the architecture has, the more likely it is that the attributes taken from the input data can support a clear classification.

• Fully connected layer: All of the feature maps are connected as input. The neurons within a layer are typically not connected to one another, but each neuron in the later layer is connected to the neurons in the earlier layer. To produce a probability for each possible outcome, this layer integrates and normalizes the abstracted features produced by the preceding convolutions.

• Output layer: The number of neurons in this layer is determined by the task at hand. If classification is required, the number of neurons typically corresponds to the number of categories to be classified.
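The layer sequence described above can be traced by hand on a tiny example. The sketch below is a plain-Python illustration, not the model used in this thesis: the 5x5 image, the edge-detecting kernel, and the fully connected weights are hypothetical values chosen so the arithmetic is easy to follow.

```python
import math


def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


def max_pool(feature_map, size=2):
    """Down-sample by keeping the largest value in each size x size block."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def softmax(scores):
    """Normalize class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Input layer: a hypothetical 5x5 grayscale image with a vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Convolutional layer: one kernel that responds to dark-to-bright edges.
kernel = [[-1, 1], [-1, 1]]
features = conv2d(image, kernel)      # 4x4 feature map
pooled = max_pool(features)           # 2x2 after down-sampling
flat = [v for row in pooled for v in row]
# Fully connected layer with made-up weights for two classes.
fc_weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
scores = [sum(w * x for w, x in zip(ws, flat)) for ws in fc_weights]
# Output layer: one probability per class.
probabilities = softmax(scores)
print(features[0], pooled, probabilities)
```

In a trained network the kernel and the fully connected weights are learned from data; here they are fixed by hand only to show how each layer transforms the output of the previous one.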

3.6 Faster R-CNN

A simple CNN algorithm can group or categorise a single element in an image. Faster R-CNN extends a CNN with a Region Proposal Network (RPN) [26]. The Faster R-CNN algorithm is used here because it helps identify several items in a single image. Faster R-CNN is built from two modules: the first is a deep convolutional network, the RPN, that proposes regions, and the second uses the proposed regions for classification. For a given image, the RPN outputs rectangular object positions, each with an object score. The reference boxes from which these proposals are generated are called anchors.

An RPN predicts how likely it is that a region contains an object rather than background. For this, a training dataset with named and labeled items in the image is required. The predicted regions are reshaped using a pooling layer known as Region of Interest (RoI) pooling. The result is then used to classify the image within the region and to predict the offset values for the bounding boxes. The accuracy of the final model depends on how well the key regions are proposed: if the proposed regions fit the objects well, the objects are highly likely to be assigned to the correct classes [25].

Figure 3-5: Diagram of Faster R-CNN [26]

Based on the above figure, Faster R-CNN can be broken down into four primary sections:

• Conv layers: A base network extracts the feature map.

• Region Proposal Network (RPN): The RPN generates anchors and outputs region proposals.

• Region of Interest (RoI) pooling: This layer converts each proposal's feature map to the target dimensions.

• Classifier: This stage outputs the final classes and bounding boxes.

Figure 3-6: Faster R-CNN architecture of faster_rcnn_test.pt [19]

Our object detection system, Faster R-CNN, consists of two modules. The first is a deep fully convolutional network that proposes regions, and the second is the Fast R-CNN detector [27], which uses the proposed regions. The system as a whole is a single, unified object detection network (Figure 3-5). Borrowing the recently popular terminology of neural networks with "attention" mechanisms [28], the RPN module tells the Fast R-CNN module where to look.

3.6.1 Region Proposal Networks (RPN)

The R-CNN and Fast R-CNN models use the Selective Search algorithm to generate region proposals. Each proposal is then passed to a pre-trained CNN. In [24], a network capable of producing region proposals, the Region Proposal Network (RPN), was introduced. It offers the following benefits:


• Region proposals are now generated by a network that can be trained and adapted to the detection task.

• Because the proposal network can be trained end to end for the specific detection task, it generates better region proposals than generic methods such as Selective Search and EdgeBoxes.

• The RPN processes the image using the same convolutional layers as the Fast R-CNN detection network. Unlike algorithms such as Selective Search, the RPN therefore needs little additional time to produce proposals.

• Because they share the same convolutional layers, the RPN and the Fast R-CNN can be merged into a single network, so training is performed only once.

3.6.1.1 Anchor

As the following figure shows, a rectangular sliding window of size n × n, where n = 3 for the VGG-16 net, is passed over the feature map of the final shared convolutional layer. For each window, k region proposals are generated. Each proposal is parametrized relative to a reference box called an anchor box. An anchor box has two parameters:

• Aspect ratio

• Scale

There are typically three scales and three aspect ratios, for a total of k = 9 anchor boxes, although k may differ from 9. In other words, each sliding-window position produces k region proposals, each with a different scale or aspect ratio.


[Figure: a sliding window over the conv feature map feeds an intermediate layer, followed by a cls layer (2k scores) and a reg layer (4k coordinates) for k anchor boxes]

Figure 3-7: Anchor generation [29]

Reference anchors (also known as anchor boxes) are used to make the object detector scale-invariant. Because the anchors exist at multiple scales, a single image at a single scale is sufficient, which avoids the need for multiple image scales or filter sizes. The multi-scale anchors are essential for sharing features between the RPN and the Fast R-CNN detection network. Each anchor is centered at the sliding window in question and is associated with a scale and an aspect ratio (Figure 3-7). By default, three scales and three aspect ratios are used, giving k = 9 anchors at each sliding position. For a convolutional feature map of size W × H (typically about 2,400 positions), there are W × H × k anchors in total.
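The anchor layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scales (128, 256, 512 pixels) and aspect ratios (1:2, 1:1, 2:1) are the defaults reported in the original Faster R-CNN paper, the stride of 16 matches the 1/16 feature-map scale used in this chapter, and the function names `base_anchors` and `grid_anchors` are hypothetical helpers, not part of the thesis implementation.

```python
import math

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchor (width, height) pairs.

    Each anchor has area scale**2 and height / width equal to the ratio.
    """
    shapes = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes

def grid_anchors(feat_w, feat_h, stride=16, shapes=None):
    """Center every base anchor on every feature-map position.

    Returns boxes as (x1, y1, x2, y2) in input-image pixels.
    """
    shapes = shapes if shapes is not None else base_anchors()
    boxes = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for width, height in shapes:
                boxes.append((cx - width / 2, cy - height / 2,
                              cx + width / 2, cy + height / 2))
    return boxes

# A 60 x 40 feature map has 2,400 positions, so W * H * k = 21,600 anchors.
anchors = grid_anchors(feat_w=60, feat_h=40)
print(len(anchors))
```

Anchors that extend beyond the image border are kept in this sketch; how they are handled during training is a separate design choice.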


3.6.1.2 Proposal Layer

The figure above shows the RPN structure. Its input is a feature map from a convolutional layer, and it has two data flows: the upper flow classifies anchors as positive or negative, while the lower flow computes the bounding box regression offsets. These flows are then combined in a Proposal layer, which generates and filters suitable proposals.
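The text does not spell out the format of the regression offsets. As a minimal sketch, the following assumes the parametrization used in the original Faster R-CNN paper, in which the center shift is normalized by the anchor size and the width and height are expressed as log ratios. The helper names `encode` and `decode` and the example box values are hypothetical.

```python
import math

def encode(anchor, box):
    """Offsets (tx, ty, tw, th) that map `anchor` onto `box`.

    Both boxes are (center_x, center_y, width, height).
    """
    ax, ay, aw, ah = anchor
    bx, by, bw, bh = box
    return ((bx - ax) / aw, (by - ay) / ah,
            math.log(bw / aw), math.log(bh / ah))

def decode(anchor, offsets):
    """Apply predicted offsets to an anchor to recover a box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = offsets
    return (ax + tx * aw, ay + ty * ah,
            aw * math.exp(tw), ah * math.exp(th))

# Illustrative values only: one anchor and one ground-truth box.
anchor = (100.0, 100.0, 128.0, 128.0)
ground_truth = (110.0, 90.0, 160.0, 96.0)
offsets = encode(anchor, ground_truth)   # regression target during training
recovered = decode(anchor, offsets)      # what the Proposal layer would apply
```

Under this assumption, the lower data flow learns to predict `offsets` for each anchor, and decoding them yields the proposal boxes.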

Faster R-CNN transforms an M × N image into an (M/16) × (N/16) × 512 feature map. Let W = M/16 and H = N/16. In the upper data flow (the blue frame in Figure 3-8), the W × H × 512 feature map is filtered by a 1 × 1 × 18 convolutional layer, which maps each 512-dimensional feature vector to a 2 × 9-dimensional vector (a positive and a negative score for each of the 9 anchors). The resulting W × H × 18 feature map is then passed to a softmax classifier to obtain the probability that each anchor is positive or negative.
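The shape arithmetic and the per-anchor softmax can be illustrated with plain Python. This is a sketch rather than the thesis implementation: the function names are hypothetical, and the assumption that the 18 scores are stored as interleaved (negative, positive) pairs is made only for illustration, since real implementations differ in channel layout.

```python
import math

def feature_map_shape(m, n, stride=16, channels=512):
    """Shape (W, H, C) of the feature map for an M x N image."""
    return m // stride, n // stride, channels

def softmax_pair(neg_score, pos_score):
    """Two-way softmax; returns (p_negative, p_positive)."""
    top = max(neg_score, pos_score)  # subtract the max for numerical stability
    e_neg = math.exp(neg_score - top)
    e_pos = math.exp(pos_score - top)
    total = e_neg + e_pos
    return e_neg / total, e_pos / total

def anchor_probabilities(scores_18):
    """Split one position's 18 scores into 9 (negative, positive) pairs.

    Assumes an interleaved layout: [neg_0, pos_0, neg_1, pos_1, ...].
    """
    if len(scores_18) != 18:
        raise ValueError("expected 2 scores for each of 9 anchors")
    return [softmax_pair(scores_18[2 * i], scores_18[2 * i + 1])
            for i in range(9)]

# An 800 x 608 image yields a 50 x 38 x 512 feature map.
print(feature_map_shape(800, 608))
# Equal scores give each anchor a 50/50 positive/negative probability.
print(anchor_probabilities([0.0] * 18)[0])
```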

After placing a dense set of candidate anchors at the scale of the original image, the RPN uses the features from the base network to decide which anchors are positive (covering the ground truth) and which are negative (outside the ground truth). In essence, it solves a binary classification problem.
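How "covering the ground truth" is measured is not stated here. A common criterion, and the one assumed in this sketch, is the intersection over union (IoU) between an anchor and the labeled boxes, with the thresholds 0.7 and 0.3 taken from the original Faster R-CNN paper. The function names are hypothetical, and the sketch omits the paper's additional rule that the highest-IoU anchor for each ground-truth box is also labeled positive.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored during training)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1  # neither clearly object nor clearly background

# Illustrative values: one labeled pill at (0, 0)-(100, 100).
gt = [(0, 0, 100, 100)]
labels = [label_anchor(a, gt) for a in [(0, 0, 100, 100),
                                        (0, 0, 100, 50),
                                        (200, 200, 300, 300)]]
```

Anchors labeled -1 fall between the two thresholds and would simply not contribute to the classification loss under this scheme.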

3.6.1.3 Region of Interest (RoI) Pooling

As shown in the figure below, the RoI Pooling layer receives proposals from the RPN and the feature map from the base network. The layer's primary function is to extract the parts of the feature map that are covered by the proposals. The difficulty is that the proposals are not fixed-size boxes, whereas the R-CNN classifier requires fixed-size feature maps in order to categorize them into a fixed number of classes.

Faster R-CNN addresses this problem with RoI Pooling, which is derived from Spatial Pyramid Pooling. Assuming that the proposal size is M × N and the fixed output feature map size is pooled_w × pooled_h, the procedure is straightforward:

• Map the proposal from the image space to the feature map space at 1/16 scale to obtain the RoI.

• Divide the RoI: split the feature map region that corresponds to the proposal into a pooled_w × pooled_h grid, rounding the cell boundaries down where necessary.

• Apply max pooling to every grid cell.
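The three steps above can be sketched for a single feature channel in plain Python. This is an illustrative sketch, not the thesis implementation: the function name `roi_max_pool` is hypothetical, the 2 × 2 output grid is chosen only to keep the example small, and the proposal is assumed to lie inside the image.

```python
def roi_max_pool(feature_map, proposal, pooled_h=2, pooled_w=2, stride=16):
    """Max-pool the feature-map region under `proposal` to a fixed grid.

    feature_map: 2-D list (rows x cols) for a single channel.
    proposal: (x1, y1, x2, y2) in input-image pixels.
    """
    # Step 1: map the proposal to feature-map coordinates (1/16 scale).
    x1, y1, x2, y2 = (int(v // stride) for v in proposal)
    rows, cols = len(feature_map), len(feature_map[0])
    x2 = min(max(x2, x1 + 1), cols)  # keep at least one cell inside the map
    y2 = min(max(y2, y1 + 1), rows)
    roi_h, roi_w = y2 - y1, x2 - x1

    pooled = []
    for i in range(pooled_h):
        # Step 2: split the RoI into bins, rounding the bin edges down.
        r0 = y1 + (i * roi_h) // pooled_h
        r1 = max(y1 + ((i + 1) * roi_h) // pooled_h, r0 + 1)
        row = []
        for j in range(pooled_w):
            c0 = x1 + (j * roi_w) // pooled_w
            c1 = max(x1 + ((j + 1) * roi_w) // pooled_w, c0 + 1)
            # Step 3: max pooling inside each bin.
            row.append(max(feature_map[r][c]
                           for r in range(r0, min(r1, rows))
                           for c in range(c0, min(c1, cols))))
        pooled.append(row)
    return pooled

# A 4 x 4 feature map with values 0..15 and a proposal covering 64 x 64 pixels.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fmap, (0, 0, 64, 64)))  # [[5, 7], [13, 15]]
```

Whatever the proposal size, the output always has pooled_h × pooled_w entries, which is what allows the later fully connected layers to accept it.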

3.7 Mask R-CNN

Mask R-CNN is an instance segmentation technique that simultaneously predicts the bounding box that locates each object and the mask of the object region [25,26]. It is an extension of Faster R-CNN. Figure 3-10 illustrates image segmentation and object detection. The input image, shown in Figure 3-10a, contains four different types of pills and a background. Figure 3-10b shows the result of semantic segmentation.

Date posted: 02/10/2024, 05:29