
Information Systems Graduation Thesis: Building a Smart Document Scanner Application on Mobile


DOCUMENT INFORMATION

Basic information

Title: Building a Smart Document Scanner Application on Mobile
Author: Dang Quang Hung
Supervisor: Dr. Nguyen Thanh Binh
University: University of Information Technology
Major: Information Systems
Type: Graduation Thesis
Year: 2022
City: Ho Chi Minh City
Format:
Pages: 102
Size: 51.87 MB


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

ADVANCED PROGRAM IN INFORMATION SYSTEMS

DANG QUANG HUNG - 18520790



ASSESSMENT COMMITTEE

The Assessment Committee is established under Decision No. …, dated …, by the Rector of the University of Information Technology.

… - Chairman.

… - Secretary.

… - Member.


First of all, I would like to acknowledge and give sincere thanks to my supervisor, Dr. Nguyen Thanh Binh, who spent his time instructing me during the process of writing this thesis. His guidance and advice led me to solutions that helped my product work efficiently. The final result is truly a treasure that helped me learn a great deal and made the last four years fly by. I also could not have undertaken this graduation thesis without my defense committee, who generously provided knowledge and expertise.

Additionally, I am grateful to my classmates and cohort members for giving me valued feedback and moral support. Thanks should also go to the librarians and teachers who impacted and inspired me.

Lastly, I would like to thank my family for their moral support. With strong words, they motivated me every time I felt exhausted. Your assistance made this thesis meaningful and memorable to me.


TABLE OF CONTENTS

Chapter 1 INTRODUCTION
1.1 Problem statement
1.2 Objectives
1.3 Object and scope of study

Chapter 2 BACKGROUND KNOWLEDGE
2.1 Android
2.1.1 …
2.1.2 …
2.1.3 Integrated development environment for Android
2.2 Machine learning library on Android
2.2.1 OpenCV
2.2.2 TensorFlow Lite
2.2.3 Google ML Kit
2.3 Image processing algorithms
2.3.1 Grayscale transformation
2.3.2 Image filtering
2.3.3 …
2.3.4 Edge detection
2.3.5 Perspective transformation
2.4 Optical Character Recognition (OCR)

Chapter 3 THE PROPOSED PROCESSING METHOD
3.1 General processing flow
3.2 Edge processing based approach
3.2.1 Image preprocessing
3.2.2 Canny edge detection
3.2.3 Text detection
3.2.4 OCR
3.3 Cloud processing based approach
3.4 Evaluation

Chapter 4 APPLICATION DESIGN AND IMPLEMENTATION
4.1 Analysis and design
4.4 Implementation and testing
4.4.1 Wireframes
4.4.2 User interface

Chapter 5 CONCLUSION

REFERENCES

LIST OF FIGURES

Figure 2.1 Reasons for choosing Android
Figure 2.2 Android architecture
Figure 2.3 Android Studio icon
Figure 2.4 Popular OpenCV modules
Figure 2.5 TensorFlow Lite model execution flow in Android applications
Figure 2.6 RGB (3-channel) image representation
Figure 2.7 RGB image visualization
Figure 2.8 Grayscale (1-channel) image representation
Figure 2.9 Grayscale transformation for each pixel in RGB image
Figure 2.10 Image filtering - convolution
Figure 2.11 Original image and image after applying Gaussian blur
Figure 2.12 Original image and image after applying thresholding
Figure 2.13 Result visualization of edge detection
Figure 2.14 Example of perspective transformation
Figure 2.15 Construction of a typical OCR system
Figure 3.1 General processing flow
Figure 3.2 Process flow of scanning CID card
Figure 3.3 Processing flow of edge processing based approach
Figure 3.4 Image preprocessing algorithms
Figure 3.5 The Gaussian function in one and two dimensions
Figure 3.6 Truncation thresholding
Figure 3.7 Threshold to zero
Figure 3.8 Non-maximum suppression visualization in coordination
Figure 3.9 Simple CID card
Figure 3.10 Processing flow of traditional text detection phase
Figure 3.11 …
Figure 3.12 Sample annotated CID card
Figure 3.13 EfficientDet architecture
Figure 3.14 Retraining model process
Figure 3.15 Text recognition process
Figure 3.16 Processing flow of the cloud processing based approach
Figure 3.17 VietOCR architecture
Figure 3.18 Experiment in daytime
Figure 3.19 Experiment in dark environment
Figure 4.1 System architecture
Figure 4.2 Use-case generalization
Figure 4.3 Wireframes
Figure 4.4 UI of main screen
Figure 4.5 UI of camera screen
Figure 4.6 UI of image cropping screen
Figure 4.7 UI of image reviewing screen
Figure 4.8 UI of image viewing screen
Figure 4.9 UI of text result screen

LIST OF TABLES

…
Table 4.13 Share document
Table 4.14 Scan text
Table 4.15 Adjust result
Table 4.16 Save text result

Our daily lives have become increasingly reliant on smartphones. The hardware explosion was driven by competition among a high number of smartphone makers fighting for market share. The development of algorithms has also increased the capabilities of smartphones and improved the user experience. People can now do a lot with a smart device, leading us to adopt a new viewpoint that emphasizes the use of smart gadgets to successfully complete daily tasks. In order to keep up with this trend, this thesis offers a novel solution: an application that combines the benefits of mobile devices with those of machine learning models for document scanning and information extraction. The application is primarily concerned with reading and retrieving data from Vietnam citizen identification (CID) cards. Information that has been gathered can be used as input for an information system.


Chapter 1 INTRODUCTION

1.1 Problem statement

The bank clerk would typically ask customers to provide their CID card when they visit to complete a procedure so that a profile can be created. It is clear that entering client data from the CID card requires a lot of time. Therefore, the CID scanner program was developed to scan the CID card, extract vital data such as the full name, date of birth, place of birth, and address, and then accurately and automatically enter that data into the customer profile. We can save a lot of the time and money that would have been spent on recruiting extra workers by using this program.

1.2 Objectives

The following aims are covered by the thesis:

- Identifying methods for document scanning on mobile devices.

- Determining the ideal solution.

- Creating the application's user interface and user experience.

- Implementing the designed user interfaces on Android smartphones.


1.3 Object and scope of study

The document scanner application's capability to perform scanning functions on a specific paper format is the major part of this thesis. We chose to employ the Vietnam CID card for our study for this reason. Vietnam CID cards are extensively used and play a significant role in regular life; thus, the valuable information contained in them encouraged us to put our solution into practice in order to extract that information.

The scope of this study contains some topics, including:

- Mobile application development

- General application development

- Computer vision technique (Object, Region of Interest Detection)

- Machine Learning — Optical Character Recognition (OCR)


Chapter 2 BACKGROUND KNOWLEDGE

… Overall, Android is a mobile application ecosystem [1].


2.1.2 Architecture

Android provides a rich development framework. You don't need to know all the components of this architecture, but it's helpful to know what's available in the system for your application [2]. The diagram below shows the main components of the Android platform.


In the illustration above:

Applications: This tier features applications and critical system elements like email, SMS, calendars, internet browsing, and contacts [2].

Java API Framework: All of Android's features, from UI components to resource management and lifecycle management, are accessible via application programming interfaces (APIs) in the Java API Framework. The specifics of how each API functions don't need to be known; you just need to figure out how to use them [2].

Libraries and the Android Runtime: Each application launches in a separate process with its own Android Runtime instance. The majority of the Java programming language's features are provided via a core set of runtime libraries that are part of Android. The Android operating system is constructed from native code, which involves the implementation of native libraries in C and C++. The Java API framework makes these native libraries approachable to applications [2].

Linux kernel: The Linux kernel is the foundation of the Android OS. Threading, low-level memory management, and other basic features are all handled by the kernel for the layers above it. Android can gain from Linux-based security features by using a Linux kernel, and device manufacturers can create hardware drivers for a well-known kernel.


2.1.3 Integrated development environment for Android

There are numerous IDEs that support Android programming; however, we found that Android Studio best fits this thesis's requirements. We built the document scanner application to run machine learning models and algorithms on mobile devices. With the help of the libraries described in the next section, we implemented a large number of algorithms.


Figure 2.3 Android Studio icon²

² Figure is taken from https://commons.wikimedia.org/wiki/File:Android_Studio_Trademark.svg


2.2 Machine learning library on Android

2.2.1 OpenCV

A comprehensive set of computer vision algorithms can be found in the open-source library known as OpenCV. Python, Java, C++, and a variety of other programming languages are recognized by OpenCV. It is compatible with a diverse range of operating systems, namely OS X, Android, iOS, Windows, and Linux. The most frequently used modules in OpenCV are depicted in the figure below. OpenCV has a modular structure that contains various shared or static libraries [3].


Figure 2.4 Popular OpenCV modules

We introduce only the OpenCV image processing module in order to condense the thesis's substance. Many of the algorithms in the image processing module take an image as input and return the desired result, which can then be used as input for other tasks.


2.2.2 Tensorflow Lite

TensorFlow Lite is a collection of technologies that makes it possible for developers to execute their models on mobile, embedded, and edge devices, enabling on-device machine learning [4].

With TensorFlow Lite, we can use TensorFlow machine learning (ML) models in Android apps. With options for hardware acceleration, the TensorFlow Lite system offers prebuilt and scalable execution environments for running models on Android rapidly and efficiently [4].

TensorFlow Lite employs TensorFlow models that have been made smaller, portable, and optimized for on-device machine learning. With TensorFlow Lite for Android, we may utilize pre-built models or create our own TensorFlow models and export them in TensorFlow Lite format [4].

An Android app running a TensorFlow Lite model receives input, analyzes it, and makes a prediction based on the model's logic. A TensorFlow Lite model requires its input in a specific data format called a tensor, and it needs a special runtime environment in which to function. When a model evaluates the data, a process called making an inference, the Android app receives the prediction results as new tensors and uses them to take action, such as displaying the result to the user or running further business logic [4].


Figure 2.5 TensorFlow Lite model execution flow in Android applications

At the functional design level, the Android app needs the following things to run a TensorFlow Lite model:

- A model execution environment with TensorFlow Lite.

- A model input handler to convert data into tensors.

- A model output handler to receive output result tensors from the model and interpret them as prediction results.


The on-device functionality of ML Kit's APIs enables real-time use cases, such as processing live camera video. Additionally, this implies that the functionality is usable offline [5].

In this thesis, we compare the performance of the text recognition algorithm in ML Kit with that of other libraries or algorithms in order to select the best one for use on Android mobile devices. The details of this comparison are presented in the next section of this report.

2.3 Image processing algorithms

Image conversion into the desired format is known as "image processing". This is an important part, alongside the use of the machine learning libraries. In OpenCV, there are a large number of algorithms for image processing. Plenty of well-known algorithms that are employed during practically every stage of image processing are discussed below.


2.3.1 Grayscale transformation

OpenCV stores image information such as rows, columns, and data in Mat objects. An image is usually in RGB format. Because a vibrantly colored image is a 3-channel image when adopting the RGB paradigm, it holds a large amount of data. The storage of RGB images is shown in the figure below.


Figure 2.6 RGB (3-channel) image representation


Figure 2.7 RGB image visualization



Figure 2.8 Grayscale (1-channel) image representation

When utilizing an integer representation, the values of a grayscale image are displayed on a scale of 0-255, where 0 is pure black and 255 is pure white, or on a scale of 0-1 if we use a floating-point representation.

Images in grayscale contain fewer channels than images in color, so they hold less data. A colored image can be transformed into a grayscale image to decrease its size and the algorithm's computation time.

Grayscale images are suitable for the preponderance of image processing algorithms and have a considerable advantage over RGB images in this regard, which is why grayscale transformation algorithms were developed. There are several grayscale algorithms, but they all follow the same three major steps:

For each pixel: Gray = (R + G + B) / 3, then Red = Gray, Green = Gray, Blue = Gray.

Figure 2.9 Grayscale transformation for each pixel in RGB image


2.3.2 Image filtering

One of the most crucial stages of the entire image processing phase is image filtering. Numerous algorithms exist, and they all adhere to a fundamental one: they employ a square matrix of numbers known as a kernel. The output, which comprises the filtered image, is generated after the kernel processes the entire image matrix [6].


Noise reduction is the most notable application of linear filtering. Noise reduction through blur operations is a widespread technique. When we need to blur an image, we normally use the Gaussian blur method, which makes use of the "kernel convolution" approach. The Gaussian function below (here in its two-dimensional form) is implemented to obtain the Gaussian kernel:

G(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))

Figure 2.11 Original image and image after applying Gaussian blur⁴

⁴ Figure is taken from https://en.wikipedia.org/wiki/Gaussian_blur


2.3.4 Edge detection

Edge detection is a method of image processing that locates the edges of objects in an image. It operates by looking for changes in brightness. In disciplines including image processing, computer vision, and machine vision, edge detection is employed for segmentation and extraction.

Figure 2.13 Result visualization of edge detection⁶

⁶ Figure is taken from https://www.mathworks.com/help/supportpkg/android/refdetect-boundaries-objects-video-matlab-function-block-android.html



The figure below is an example of how perspective transformation works.


2.4 Optical Character Recognition (OCR)

We need to understand the OCR terminology in order to create software that can retrieve text from an image of a document. Text recognition is another name for optical character recognition (OCR). An OCR application extracts and makes usable information from scanned documents, camera photos, and image-only PDF files. The plain text can then be accessed and edited, because the OCR software separates out the letters in the image, turns them into words, and then arranges the words into sentences. Moreover, it does away with the requirement for manual data entry.

OCR systems transform physically printed documents into machine-readable text using a combination of hardware and software. Text is scanned or read by hardware, such as an optical scanner or a specialized circuit board; the advanced processing is then usually handled by software.

OCR software can use artificial intelligence (AI) to implement more comprehensive intelligent character recognition (ICR) approaches, such as recognizing languages or handwriting styles. OCR is most frequently used to convert paper-based legal or historical documents into PDF files that can then be edited, formatted, and searched just like documents created with a word processor.


Figure 2.15 Construction of a typical OCR system


Chapter 3 THE PROPOSED PROCESSING METHOD

3.1 General processing flow

The processing flow is generally described in the figure below:


Figure 3.1 General processing flow


When using a smart scanner application to scan a document, the input image goes through several stages in order to provide a high-accuracy result. The graphic below depicts how a paper is scanned for features. The specifics of each stage will be illustrated in the sections afterwards.


Figure 3.2 Process flow of scanning CID card


Figure 3.2 shows the process for an image taken from the camera or from the gallery. The figure also demonstrates the result visualization based on the processing flow described above. In order to provide us with the best possible output picture to use as input for edge detection, the input image first undergoes the image preprocessing step. After that, we can quickly locate the four corners based on the edges the algorithm discovered. Therefore, the application is able to show an adjustable bounding box.

Users can drag and drop the corners manually. Additionally, users have the option of changing the image's color mode to one of binary, gray, or color. In the next step, the text detection model works to detect the core content text, as shown in Figure 3.2. The boxes covered with a blue border are defined as the core content in the image. The OCR module utilizes the cropped images with the core content as input in the next phase. Users will then see the information that was collected from those images.

We provide two approaches in this thesis: edge processing and cloud processing. We conducted an experiment and compared the two approaches. The specifics are covered in more depth in the next parts, where we design the application based on the approach that produces the best results.


3.2 Edge processing based approach

The edge processing based approach puts all execution modules, including all ML models and algorithms, in the mobile application. The processing flow of this approach is demonstrated in the figure below:


Figure 3.3 Processing flow of the edge processing based approach


3.2.1 Image preprocessing

The OCR system relies heavily on preprocessing. To achieve great efficiency, the input image must be well qualified. The output from this stage is continuously applied to the next ones. The work completed is shown in the diagram below.


… and image processing applications. This has various benefits, including helping to minimize the number of pixels in an image, which can speed up the calculations for subsequent stages during the analysis. Additionally, practically every OCR engine outputs a fixed-resolution picture. In essence, image resizing helps the system work properly.
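The resizing step described above can be sketched in a few lines of pure Python. This is an illustrative sketch only (the function name and the nearest-neighbor sampling scheme are our assumptions); on Android the application would more likely rely on an OpenCV resize routine:

```python
def resize_nearest(img, new_w, new_h):
    """Rescale a 2D image (a list of rows) with nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    return [
        [img[y * h // new_h][x * w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

# A 4x4 image downscaled to 2x2: fewer pixels means less work later.
img = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
print(resize_nearest(img, 2, 2))  # [[10, 20], [30, 40]]
```

Downscaling before edge detection and OCR trades a little spatial detail for a large reduction in per-stage computation.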


3.2.1.2 Grayscale transformation

We discovered in the previous part that an RGB image holds more information than a grayscale image. A grayscale picture also speeds up other algorithms' computations. Last but not least, several of the following algorithms require a grayscale picture as input, which is why the grayscale transformation process is used.

OpenCV is one of the greatest options for grayscale transformation in Android applications. It supports alpha channel addition and removal, channel order reversal, conversion to and from 16-bit RGB color (R5:G6:B5 or R5:G5:B5), and conversion to and from grayscale using:

RGB[A] to Gray: Y ← 0.299·R + 0.587·G + 0.114·B

and:

Gray to RGB[A]: R ← Y, G ← Y, B ← Y, A ← max(ChannelRange)
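The RGB[A]-to-Gray formula can be verified directly in pure Python. This is a sketch of the arithmetic only (rounding to the nearest integer for 8-bit output is our assumption); in the app itself OpenCV's color conversion would perform this per pixel:

```python
def rgb_to_gray(r, g, b):
    """Luminosity grayscale: Y = 0.299*R + 0.587*G + 0.114*B, rounded to 8 bits."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# White stays white, black stays black, and green contributes
# the most to perceived brightness, as the weights suggest.
print(rgb_to_gray(255, 255, 255))  # 255
print(rgb_to_gray(255, 0, 0), rgb_to_gray(0, 255, 0), rgb_to_gray(0, 0, 255))  # 76 150 29
```

The three weights sum to 1.0, so the full 0-255 range is preserved.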


The most practical filter is the Gaussian filter (although it is not the fastest). Each point of the output array is produced by convolving a Gaussian kernel with the corresponding neighborhood in the input array and summing the weighted values. The following Gaussian function (in one dimension) is used to produce the Gaussian kernel:

G(x) = (1 / √(2πσ²)) · e^(−x² / (2σ²))
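A normalized 2D Gaussian kernel can be built from this function in pure Python. This is an illustrative sketch (the kernel size, σ, and explicit normalization mirror what a Gaussian blur does conceptually, not OpenCV's exact implementation):

```python
import math

def gaussian_kernel(size, sigma):
    """Build a normalized 2D Gaussian kernel from G(x, y) = e^(-(x^2+y^2)/(2*sigma^2))."""
    half = size // 2
    k = [
        [math.exp(-(x * x + y * y) / (2 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]  # weights sum to 1

k = gaussian_kernel(3, 1.0)
# The center weight is the largest: blurring is a weighted average
# that favors each pixel's own value over its neighbors.
print(round(k[1][1], 3))  # 0.204
```

Because the weights sum to 1, convolving with this kernel preserves the overall brightness of the image while smoothing out noise.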


3.2.1.4 Thresholding

At this step, we separate out regions of an input image corresponding to the objects that we want to analyze. This separation is based on the variation in intensity between object pixels and background pixels. In this thesis, we use two types of thresholding: truncation and thresholding to zero [9].

We may utilize truncation to make the image consistently bright because the majority of texts are in bold colors. Truncation thresholding can be described as follows:

dst(x, y) = thresh if src(x, y) > thresh, and src(x, y) otherwise

The threshold is the maximum intensity value for the pixels: if src(x, y) is greater, its value is truncated. For a better understanding, look at the figure below:

Figure 3.6 Truncation thresholding⁷

Thresholding to zero can be described as follows:

dst(x, y) = src(x, y) if src(x, y) > thresh, and 0 otherwise

If src(x, y) is lower than the threshold, the new pixel value will be set to 0.

⁷ Figure is taken from https://docs.opencv.org/3.4/db/d8e/tutorial_threshold.html
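Both thresholding rules above reduce to one-line functions. The sketch below is illustrative, operating on a flat list of pixel intensities (the function names are our assumptions); in the application, OpenCV's threshold operation with the THRESH_TRUNC and THRESH_TOZERO modes would do the same work per pixel:

```python
def threshold_truncate(pixels, thresh):
    """Truncation: values above thresh are clamped to thresh."""
    return [min(p, thresh) for p in pixels]

def threshold_to_zero(pixels, thresh):
    """Threshold to zero: values at or below thresh become 0."""
    return [p if p > thresh else 0 for p in pixels]

row = [10, 120, 200, 255]
print(threshold_truncate(row, 150))  # [10, 120, 150, 150]
print(threshold_to_zero(row, 150))   # [0, 0, 200, 255]
```

Truncation caps the bright side, while threshold-to-zero suppresses the dark side; together they let the pipeline keep bold text strokes while discarding faint background.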


3.2.2 Canny edge detection

The Canny edge detector is an operator that uses a multi-stage method to detect edges in images. We must perform the following steps in order to implement this algorithm [10]:

- Noise cancellation: At this stage, we apply a Gaussian filter to the picture to decrease noise. The details of this stage are shown above.

- Finding the image's intensity gradient: We employ the Sobel operator to determine the derivative in the horizontal direction (Gx) and the vertical direction (Gy). The edge gradient and its direction may then be calculated as follows [10]:

Edge_Gradient(G) = √(Gx² + Gy²)

θ = arctan(Gy / Gx)
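The two formulas can be checked numerically in pure Python. This is an illustrative sketch; atan2 is used instead of a bare arctangent so the direction is correct in every quadrant (a small robustness assumption on top of the formula above):

```python
import math

def sobel_gradient(gx, gy):
    """Edge gradient magnitude and direction (in degrees) from the
    horizontal (Gx) and vertical (Gy) Sobel derivatives."""
    magnitude = math.hypot(gx, gy)            # sqrt(Gx^2 + Gy^2)
    angle = math.degrees(math.atan2(gy, gx))  # arctan(Gy / Gx)
    return magnitude, angle

# A 3-4-5 pair gives magnitude 5; a purely vertical intensity
# change gives a 90-degree gradient direction.
print(sobel_gradient(3.0, 4.0))
print(sobel_gradient(0.0, 4.0))
```

The magnitude feeds the edge map, and the quantized angle decides which two neighbors each pixel is compared against in the next step.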


- Non-maximum suppression (NMS): In this step, we run a 3x3 filter over the pixels of the gradient image. When applying the filter, we check whether the gradient magnitude of the center pixel is greater than that of the pixels around it. We keep that pixel if it is the maximum value; otherwise, we reduce its gradient strength to zero. We only evaluate the center pixel's two neighbors along the gradient direction [10].

Figure 3.8 Non-maximum suppression visualization in coordination

If θ = 0°, then point A is considered to be on the edge if its gradient magnitude is greater than the magnitudes at A3 and A7.

If θ = 45°, then point A is considered to be on the edge if its gradient magnitude is greater than the magnitudes at A4 and A8.

If θ = 90°, then point A is considered to be on the edge if its gradient magnitude is greater than the magnitudes at A1 and A5.

If θ = 135°, then point A is considered to be on the edge if its gradient magnitude is greater than the magnitudes at A2 and A6.
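The per-pixel decision reduces to a comparison against the two gradient-direction neighbors. This is an illustrative helper (the function name and the ≥ tie-breaking are our assumptions); the two neighbors would be chosen according to the quantized direction as listed above:

```python
def nms_keep(center, neighbor_a, neighbor_b):
    """Non-maximum suppression for a single pixel: keep the gradient magnitude
    only if it is the local maximum along the gradient direction, else zero it."""
    return center if center >= neighbor_a and center >= neighbor_b else 0

# For theta = 0 the two neighbors are the horizontal ones (A3 and A7).
print(nms_keep(200, 120, 90))  # 200 (local maximum, kept as an edge candidate)
print(nms_keep(120, 200, 90))  # 0   (a stronger neighbor exists, suppressed)
```

Applying this test at every pixel thins thick gradient ridges down to one-pixel-wide edges before hysteresis thresholding.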


Figure 3.9 Simple CID card

As we can see, the blue zones on the CID card are the real zones with valuable material, whereas the red rectangles are text zones with unnecessary elements. However, when we upload the whole image, it takes a long time for the OCR module to find and identify the text in all of these zones, and the text from the red areas must then be filtered out and removed. The processing flow to extract the text in the blue zones in a traditional text detection phase is depicted in the diagram below:



Figure 3.10 Processing flow of traditional text detection phase
