
Graduation thesis: Building an attendance system using face recognition with MobileFaceNet


DOCUMENT INFORMATION

Basic information

Title: Building an attendance system using face recognition with MobileFaceNet
Authors: Tran Nguyen Phong, Dat Deo Quoc
Supervisors: Nghia Le Hoai, M.Sc.; Duy Doan, Ph.D.
School: University of Information Technology
Major: Computer Engineering
Type: Graduation Thesis
Year: 2021
City: Ho Chi Minh City
Pages: 67
Size: 41.47 MB

Structure

  • 2.5. Algorithms Description
    • 2.5.1. Siamese neural network
    • 2.5.2. Outline of the algorithms
    • 2.5.4. Pretrain
  • 2.6. Embedded Computer and Hardware modules
    • 2.6.1. Embedded Computer NVIDIA Jetson TX2
    • 2.6.2. Embedded Computer NVIDIA Jetson Nano
    • 2.6.3. Camera C270 Logitech
    • 2.6.4. NVIDIA GPU
  • Chapter 3. IMPLEMENTATION
    • 3.1. Methods of implementation
      • 3.1.1. Implementing overview
      • 3.1.2. Setting up Embedded Computer
      • 3.1.3. Using C270 camera for streaming live video to screen monitor
      • 3.1.4. Preparing and training dataset
      • 3.1.5. Recognizing face through video stream
    • 3.2. Technologies and tools
      • 3.2.2. TensorFlow

Content

VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
FACULTY OF COMPUTER ENGINEERING

TRAN NGUYEN PHONG
DAT DEO QUOC

GRADUATION THESIS

THE RESEARCH OF APPLYIN

2.5. Algorithms Description

2.5.1. Siamese neural network

A Siamese neural network [19] (sometimes called a twin neural network) is an artificial neural network that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors. Often one of the output vectors is precomputed, forming a baseline against which the other output vector is compared. This is similar to comparing fingerprints, but can be described more technically as a distance function for locality-sensitive hashing.

Put simply, a Siamese network is an architecture that takes two photos as input and tells whether they belong to the same person. The architecture was popularized for face verification by "DeepFace: Closing the Gap to Human-Level Performance in Face Verification" (Taigman et al.).

The architecture of the Siamese network is built on a base network: a convolutional neural network with its output layer removed, which encodes an image into an embedding vector. The input of the Siamese network is any two pictures selected at random from the image data, and the output is two vectors representing the two input images. These two vectors are then fed into a loss function that measures the difference between them; usually the loss function is based on the L2 norm.
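The shared-weight idea described above can be sketched in a few lines. This is a minimal NumPy illustration (not the thesis code): the base network is stood in for by a single hypothetical linear layer whose weight matrix W is used by both branches, and the loss is the squared L2 norm of the embedding difference.

```python
import numpy as np

def embed(x, W):
    """Base-network stand-in: one shared linear layer + ReLU,
    mapping a flattened image to a 128-dim embedding."""
    return np.maximum(W @ x, 0.0)

def siamese_distance(x1, x2, W):
    """Both branches use the SAME weights W (the defining property
    of a Siamese network); the result is the squared L2 norm of
    the difference between the two embeddings."""
    e1, e2 = embed(x1, W), embed(x2, W)
    return np.sum((e1 - e2) ** 2)

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64)) * 0.1  # hypothetical shared weights
a = rng.standard_normal(64)               # "image" 1 (flattened)
b = rng.standard_normal(64)               # "image" 2 (flattened)

print(siamese_distance(a, b, W) >= 0.0)   # distances are non-negative
print(siamese_distance(a, a, W) == 0.0)   # identical inputs map to distance 0
```

Because the weights are shared, the distance is symmetric in its two image arguments, which is what makes the output usable as a similarity score.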

From the Siamese neural network model we obtain two encoding vectors x(1) and x(2), representing images 1 and 2 respectively; x(1) and x(2) have the same number of dimensions. The function f(x) acts like a fully connected layer in the neural network: it introduces nonlinearity and reduces the data to a small dimension, usually a vector of 128 numbers for pretrained models. Next, to build a face-recognition system we need to compare two pictures, say the first picture with the second. To do this, we feed the second picture into the same neural network with the same parameters and obtain a different vector of 128 numbers, which becomes our representation of the second picture. We say that the picture is encoded in this way, as shown in Figure 2-10.

Finally, if these encodings are a good representation of the two images, we can compute the distance d between x(1) and x(2). A common choice is a norm of the difference between the encodings of the two images.

First, the two neural networks share the same parameters. Then we train the network so that the encodings it computes yield a distance function

    d(x(1), x(2)) = ||f(x(1)) − f(x(2))||²  [20] (2.1)

that tells when two pictures are of the same person:
  • When x(1) and x(2) are the same person, d(x(1), x(2)) must be small.
  • When x(1) and x(2) are different people, d(x(1), x(2)) must be large.

The goal of learning is illustrated in Figure 2-11: the parameters of the neural network define an encoding f(x), and we learn parameters so that:
  • if x(i) and x(j) are the same person, ||f(x(i)) − f(x(j))||² is small;
  • if x(i) and x(j) are different persons, ||f(x(i)) − f(x(j))||² is large.

Figure 2-11: The goal of learning [20]

The FaceNet and MobileFaceNet algorithms are a form of Siamese network that represents images in an n-dimensional Euclidean space (usually n = 128), such that the smaller the distance between two embedding vectors, the greater the similarity between them.

Most pre-FaceNet face recognition algorithms represent the face with an embedding vector produced via a bottleneck layer that reduces the data dimension.
  • However, the limitation of these algorithms is that the embedding dimension is relatively large (usually ≥ 1000), which affects the speed of the algorithm; usually Principal Component Analysis (PCA) must be applied to reduce the data dimension and speed up computation.
  • The loss function measures only the distance between two pictures, so a single training input can learn either the similarity (if the pictures are of the same class) or the difference (if they are of different classes), but it cannot learn both from the same training example.
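The PCA step mentioned above can be sketched with plain NumPy. This is an illustrative example, not the thesis pipeline; the dimensions (200 embeddings of 1000 values, reduced to 128) are placeholders chosen to match the sizes discussed in the text.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto their top-k principal components
    using an SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # shape: (n_samples, k)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 1000))  # 200 embeddings, >= 1000 dims each
Z = pca_reduce(X, 128)                # reduced to 128 dims

print(Z.shape)  # (200, 128)
```

This keeps the directions of greatest variance, which is why pre-FaceNet systems used it to shrink large embeddings before comparison.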

FaceNet solved both of these problems with small tweaks that had a great effect:

  • The base network applies a convolutional neural network and reduces the data to only 128 dimensions, so inference and prediction are faster while accuracy is preserved.
  • It uses a triplet loss function that simultaneously learns the similarity between two pictures of the same group and distinguishes pictures of different groups, making it much more effective than previous methods.

In FaceNet, the convolutional neural network encodes each image as a vector of 128 numbers. These vectors then serve as input to the triplet loss function, which estimates the distance between the vectors.

To apply the triplet loss function, we need three photos, one of which is the anchor image. The remaining two images are selected so that one is negative (a different person from the anchor) and one is positive (the same person as the anchor), as illustrated in Figure 2-12.

Triplet loss function:

    L = max(||f(A) − f(P)||² − ||f(A) − f(N)||² + α, 0)

=> Light condition can affect the recognition result.
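The triplet loss can be written directly from its formula. The sketch below is an illustrative NumPy version (not the thesis code); the embeddings are toy 128-dim vectors, not real face data, and α = 0.2 is a commonly used margin, not a value taken from the thesis.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """L = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0):
    pull the anchor toward the positive and push it away from
    the negative by at least the margin alpha."""
    d_pos = np.sum((f_a - f_p) ** 2)  # anchor-positive squared distance
    d_neg = np.sum((f_a - f_n) ** 2)  # anchor-negative squared distance
    return max(d_pos - d_neg + alpha, 0.0)

# Toy 128-dim embeddings (illustrative only).
rng = np.random.default_rng(2)
anchor = rng.standard_normal(128)
positive = anchor + 0.01 * rng.standard_normal(128)  # same person: close
negative = rng.standard_normal(128)                   # other person: far

print(triplet_loss(anchor, positive, negative))  # 0.0: triplet already satisfied
```

Swapping the positive and negative embeddings makes the loss strictly positive, which is exactly the gradient signal that teaches the network to separate identities.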

— Case 3: 1 face, bright condition, at 0.5 m distance.
Result: 1 face detected, true identity.

— Case 4: 1 face, bright condition, at 2 m distance.
Result: 1 face detected, true identity.

— Case 5: 5 faces, normal light condition, at 0.5 m distance.
Result: 5 faces detected, all 5 bounding boxes are true identities.

— Case 6: 5 faces, fairly dark condition, at 0.5 m distance.
Result: 4 faces detected, only 3 of the bounding boxes return the true identity.

— Case 7: 5 faces, bright condition, at 0.5 m distance.
Result: 4 faces detected, all 4 are true identities. => Distance can affect the detection accuracy.

— Case 8: 5 faces, normal light condition, at 2 m distance.
Result: 5 faces detected, 1 bounding box returns a wrong identity.

— Case 9: 5 faces, dark condition, at 2 m distance.
Result: 5 faces detected, 1 bounding box returns a wrong identity.

The result summary is shown in Table 4.2.

Table 4.2: Result summary of performed tests with Jetson Nano kit

Test no. | Actual number of faces | Number of bounding boxes | Number of true identifications | Confidence (%) | Recognizing speed (ms)
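The per-test rates behind Table 4.2 can be derived directly from the counts recorded in the cases above. As a check, the small helper below computes them for Case 6 (5 actual faces, 4 bounding boxes, 3 true identities); the function names are ours, not the thesis's.

```python
def detection_rate(detected, actual):
    """Fraction of faces present that received a bounding box."""
    return detected / actual

def identification_accuracy(true_ids, detected):
    """Fraction of detected boxes assigned the correct identity."""
    return true_ids / detected

# Case 6 from the tests above: 5 faces, 4 detected, 3 true identities.
print(detection_rate(4, 5))           # 0.8
print(identification_accuracy(3, 4))  # 0.75
```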

Based on the test results, the system works well under normal light at distances below 2 meters. The highest accuracy reached 100%, and the error rate was 11%. As the light gets darker or brighter, the recognition results become less accurate; and the closer the distance at which recognition is performed, the more accurate the result.

Figure 4-6 shows one of the test cases with accurate results at the expected speed.

Figure 4-6: One result of a test case with the MobileFaceNet model

Chapter 5 CONCLUSION AND FUTURE WORK

Within the scope of the thesis, we have built a device that can identify human faces using the FaceNet and MobileFaceNet models in real time.

With the Ultra-Light face detection algorithm, the face detection system functioned with almost perfect accuracy, although some non-faces were erroneously "detected" as faces. This was acceptable because the false-positive rate was low.

As for the MobileFaceNet model, we ran it on the Jetson Nano kit; because of some bugs, we could not yet run it on the Jetson TX2 kit. Instead, we used the FaceNet algorithm for face recognition on the Jetson TX2 kit.

As for the FaceNet model with the MTCNN face detection algorithm, we ran it on the Jetson TX2 kit to compare with the MobileFaceNet model on the Jetson Nano kit. The results show that the face recognition system was not robust enough to achieve high recognition accuracy; the main reasons were the variance of noise in the images due to illumination and the limited number of input images. Nevertheless, some cases gave favorable results, with rates of over 90%. To improve the face recognition accuracy, more input images would be necessary, and good pre-processing of the images is very important for getting adequate results. Thus, the FaceNet and MobileFaceNet approach satisfied the requirements of this project with its speed, simplicity, and learning capability.

Our initial goal after recognizing faces was to export the data to an Excel file as attendance data. However, we did not have enough time to achieve this goal.

— Deploy the final phase: export data to Excel files.

— Optimize the face recognition algorithm or use other algorithms for better performance.

— Create user interfaces that are easy and pleasant to use.

— Add more face recognition functionalities for smart homes, such as unlocking safes, securing important files, etc.

[1] Yuan Yuan, Michael Sarahzen, "Smaller Is Better: Lightweight Face Detection".
[2] Sheng Chen, Yang Xiu, "MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices", 20 Apr 2018.
[3] Catarino Consulting, https://catarinoconsulting.com.au/articles/2019/6/13/facial-detection-vs-facial-recognition, accessed 30/12/2020.
[4] NVIDIA Developer, https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit, accessed 5/1/2021.
[5] Zhang, Kaipeng et al., "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks", IEEE Signal Processing Letters 23.10 (2016).
[6] Nguyễn Chiến Thắng, https://www.miai.vn/2019/09/11/face-recog-2-0-nhan-dien-khuon-mat-trong-video-bang-mtcnn-va-facenet/, accessed 1/1/2021.
[7] Wikipedia, https://en.wikipedia.org/wiki/Face, accessed 3/1/2021.
[8] ResearchGate, https://www.researchgate.net/figure/Identification-points-on-the-face-of-human_fig1_273442960, accessed 3/1/2021.
[9] Medium, https://medium.com/@kidargueta/detecting-emotion-in-faces-using-geometric-features-a9a7febe024f, accessed 3/1/2021.
[10] Wikipedia, https://en.wikipedia.org/wiki/Facial_recognition_system.
[11] Jun-Ying Gan, Gun-Feng Liu, "Fusion and recognition of face and iris feature based on wavelet feature and KFDA", School of Information, Wuyi University, 2009.
[12] Viblo, https://viblo.asia/p/xay-dung-ung-dung-realtime-tu-dong-nhan-dien-khuon-mat-Eb850j9W12G?fbclid=IwAR2HKZswdnlfLAKRcNIKk5WzmuaO2qbBR8Kj3hJDIXbCJk6N2hLECrg6U2I, accessed 3/1/2021.
[13] Machine Learning Mastery, https://machinelearningmastery.com/introduction-to-deep-learning-for-face-recognition, accessed 6/1/2021.
[14] Wikipedia, https://en.wikipedia.org/wiki/Face_detection, accessed 3/1/2021.
[15] Springer, https://link.springer.com/referenceworkentry/10.1007%2F978-0-
[16] ResearchGate, https://www.researchgate.net/figure/Face-alignment-example-with-the-supervised-descent-method-SDM-algorithm-6-
[17] Towards Data Science, https://towardsdatascience.com/face-detection-using-mtcnn-a-guide-for-face-extraction-with-a-focus-on-speed-c6d59f82d49.
[18] GeeksforGeeks, https://www.geeksforgeeks.org/facenet-using-facial-recognition-system/, accessed 3/1/2021.
[19] Wikipedia, https://en.wikipedia.org/wiki/Siamese_neural_network.
[20] Data Hacker, http://datahacker.rs/one-shot-learning-with-siamese-neural-network, accessed 19/1/2021.
[21] Pham Dinh Khanh, https://phamdinhkhanh.github.io/2020/03/12/faceNetAlgorithm.html#42-triple-loss, accessed 21/2/2021.
[22] NVIDIA Jetson TX2 kit, https://www.generationrobots.com/en/403401-orbitty-carrier-for-nvidia-jetson-tx2.html, accessed 1/1/2021.
[23] Logitech C270 camera, https://www.logitech.com/vi-vn/product/hd-webcam-e270, accessed 1/1/2021.
[24] Geeks3D, https://www.geeks3d.com/20120122/test-asic-quality-of-geforce-gpus/, accessed 7/1/2021.
[25] Vaibhav Jain, Dinesh Patel, "A GPU based implementation of Robust Face Detection System", Dept. of Computer Engineering, Institute of Engineering and Technology, India, 2016.

