Designing the smart locking door by using image processing

MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT
AUTOMATION AND CONTROL ENGINEERING

DESIGNING THE SMART LOCKING DOOR BY USING IMAGE PROCESSING

ADVISOR: DR. TRAN VU HOANG
STUDENTS: PHAM TUAN QUANG HUY, NGUYEN NGOC BAO

Ho Chi Minh City, August 2022

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY OF HIGH QUALITY TRAINING

GRADUATION PROJECT

DESIGNING THE SMART LOCKING DOOR BY USING IMAGE PROCESSING

Student names: Pham Tuan Quang Huy (18151011), Nguyen Ngoc Bao (18151003)
School year: 2018 - 2022
Major: Automation and Control Engineering
Instructor: Dr. Tran Vu Hoang

Ho Chi Minh City, August 2022

ACKNOWLEDGEMENT

First of all, we would like to show our appreciation to all the people who supported our team in completing this project. The success and final outcome of this project required a lot of guidance from the members of the laboratory, and we were extremely fortunate to have had their help throughout the project work. Thanks to the support and assistance of our seniors and advisors, our team had the confidence to overcome all the challenges and difficulties during the project.

We also want to show our appreciation to our advisor, Dr. Tran Vu Hoang, a lecturer of Automation and Control Engineering, who was always supportive and willing to instruct us when we faced challenges.

In addition, we would like to thank all the tutors and advisors of Ho Chi Minh City University of Technology and Education in general, and of the Department of Automation and Control Engineering in particular, who taught us both the fundamental and the specialized knowledge of our field, so that we had enough background and experience to apply to this project. Moreover, the tutors always gave us a hand whenever our team needed advice on the project.

Despite a careful research and design process, the project may still have certain limitations. We hope to receive feedback from the tutors so that we can further address these limitations; from there, we will have a stronger basis for putting the project into practice.

Finally, we would like to show our appreciation to the members of class 18151CLA, who supported our team and gave us a lot of advice.

ENGAGEMENT

All the achievements of this project are not copied from any documents or research papers. All the references of this project are listed in the reference section.

Members of the project
(Signatures)

CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1: OVERVIEW
  1.1 Introduction
  1.2 Objective of the research
  1.3 Limitations
  1.4 Research methods
  1.5 Structure of the project
Chapter 2: PRINCIPAL THEORIES
  2.1 Face detection model
    2.1.1 Pytorch library
    2.1.2 Ultra light fast generic face detector
      2.1.2.1 Object detection
      2.1.2.2 SSD - Single Shot MultiBox Detector [3]
      2.1.2.3 RFB - Receptive Field Block [5]
      2.1.2.4 Ultra light fast generic face detector architecture
    2.1.3 Tensorflow Lite
  2.2 Face recognition
    2.2.1 FaceNet [14]
      2.2.1.1 Triplet loss [14]
      2.2.1.2 ReLU - Rectified Linear Unit [25]
      2.2.1.3 Inception architecture [13]
    2.2.2 VGGFace2 training dataset [15]
  2.3 Dlib facial landmarks [25]
  2.4 Tkinter [27]
Chapter 3: SYSTEM DESIGN
  3.1 Design requirements
    3.1.1 System block diagram
    3.1.2 Block design on requirements
      3.1.2.1 Image receiving block and recognition block
      3.1.2.2 Data block and management block
  3.2 System design
    3.2.1 Hardware
      3.2.1.1 Choosing embedded hardware
      3.2.1.2 Choosing camera
      3.2.1.3 Choosing Arduino board
      3.2.1.4 Relay module
      3.2.1.5 Choosing LCD screen
      3.2.1.6 Hardware block and wiring diagram
    3.2.2 Methods survey
      3.2.2.1 Surveying face detection methods
      3.2.2.2 Surveying human facial feature extraction methods
    3.2.3 Operation process
      3.2.3.1 Face registration
      3.2.3.2 Face recognition
      3.2.3.3 Liveness detection
Chapter 4: EXPERIMENT RESULTS
  4.1 Environment and dataset
  4.2 Evaluation methods
  4.3 Performance of the system
  4.4 System operation and hardware result
  4.5 System validation
    4.5.1 Environment and dataset
Chapter 5: CONCLUSION
  5.1 Conclusion
  5.2 Improvement
REFERENCES
LIST OF FIGURES

Figure 2.1 Workflow of Pytorch
Figure 2.2 Architecture of SSD [3]
Figure 2.3 Construction of the RFB module by combining multiple branches with different kernels and dilated convolution layers [5]
Figure 2.4 The architecture of RFB and RFB-s. RFB-s is employed to mimic smaller pRFs in shallow human retinotopic maps, using more branches with smaller kernels [5]
Figure 2.5 The pipeline of RFB-Net300. The Conv4_3 feature map is tailed by RFB-s, which has smaller RFs, and an RFB module with stride 2 is produced by operating 2-stride multi-kernel conv-layers in the original RFB [5]
Figure 2.6 Architecture of Tensorflow Lite
Figure 2.7 Illustration of FaceNet recognition
Figure 2.8 Triplet loss
Figure 2.9 Regions of the embedding space of negatives
Figure 2.10 ReLU activation function [25]
Figure 2.11 Inception network architecture [13]
Figure 2.12 VGGFace2 pose and age statistics [15]
Figure 2.13 VGGFace2 template examples: (a) pose templates from three different viewpoints, (b) age templates for two subjects at young and mature ages
Figure 2.14 68 facial landmarks [25]
Figure 2.15 Fundamental structure of a Tkinter program
Figure 3.1 System block diagram
Figure 3.2 System process
Figure 3.3 Input image
Figure 3.4 Input image of the face detection model
Figure 3.5 Output of the Ultra light fast detector
Figure 3.6 Input of the face recognition model
Figure 3.7 Output of liveness detection
Figure 3.9 First window
Figure 3.10 Function window
Figure 3.11 Login window
Figure 3.12 Add data window
Figure 3.13 History window
Figure 3.14 Management window
Figure 3.15 Jetson Nano B01 Developer Kit
Figure 3.16 Connection ports of the Jetson Nano B01 Developer Kit
Figure 3.17 Pin diagram of the Jetson Nano B01
Figure 3.18 Pin diagram of the Raspberry Pi
Figure 3.19 Raspberry Pi
Figure 3.20 Camera Logitech C920
Figure 3.21 Camera Logitech C310
Figure 3.22 Camera Logitech C270
Figure 3.23 Arduino Uno R3
Figure 3.24 Pin diagram of the Arduino Uno R3
Figure 3.25 Arduino Mega 2560
Figure 3.26 Relay module
Figure 3.27 HDMI LCD inch
Figure 3.28 HDMI LCD 10.1 inch
Figure 3.29 Hardware block diagram
Figure 3.30 Wiring diagram
Figure 3.31 Hardware SolidWorks design
Figure 3.32 Performance of models without mask
Figure 3.33 Performance of models with mask
Figure 3.34 Face registration process
Figure 3.35 Face recognition process
Figure 3.36 Liveness detection process
Figure 4.1 Performance of the system
Figure 4.2 Face recognition result
Figure 4.3 Face recognition + liveness detection
Figure 4.4 Hardware design
Figure 4.5 Straight face result in good brightness condition
Figure 4.6 Result on one side of the face
Figure 4.7 Result with mask covering completely
Figure 4.8 Result with glasses
Figure 4.9 Result with mask covering incompletely
Figure 4.10 Performance in backlit and low brightness condition
Figure 4.11 Performance in backlit and low brightness condition with mask
Figure 4.12 Result at 2 m to 3 m distance
Figure 4.13 Performance of face recognition and liveness detection

Chapter 4: EXPERIMENT RESULTS

4.1 Environment and dataset

The researchers collected data through the registration function of the system, covering three different people with thirty images per person. The collected data captured the human face from every aspect. The system was validated in different environments (good brightness, low brightness, and at a distance of 2 m to 3 m) and in different cases (straight face, one side of the face, face with glasses, face with a full mask, and face with a half mask).

4.2 Evaluation methods

In this project, the researchers used FPS (frames per second) to validate the processing speed of the system. After getting a frame from the camera, the detector returns a list of rectangular boxes, where each value is the position of a corner of a bounding box. The FPS value is calculated by dividing one second by the period between two consecutive frames, as shown in formula (4.1):

    FPS = 1 / (New frame time - Previous frame time)    (4.1)

In addition, the researchers evaluated the accuracy of the system by counting the number of true identifications, also known as true accepted (TA), that the system recognizes out of a given set of identifications. The accuracy is the ratio of true identifications to the total number of identifications, as shown in formula (4.2):

    Accuracy = True accepted (TA) / Number of identifications    (4.2)
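As a concrete illustration of formulas (4.1) and (4.2), the following Python sketch computes both metrics over a sequence of test frames. The camera, recognize, and ground_truth_ids interfaces are hypothetical placeholders for this sketch, not the project's actual code.

import time

def measure_fps_and_accuracy(camera, recognize, ground_truth_ids):
    """Sketch of the evaluation in formulas (4.1) and (4.2).

    Assumed interfaces: camera.read() returns one frame, recognize(frame)
    returns a predicted identity, and ground_truth_ids is the list of
    expected identities (one per test frame).
    """
    true_accepted = 0
    fps_values = []
    previous_frame_time = time.time()

    for expected_id in ground_truth_ids:
        frame = camera.read()
        predicted_id = recognize(frame)

        new_frame_time = time.time()
        # Formula (4.1): FPS = 1 / (new frame time - previous frame time)
        fps_values.append(1.0 / (new_frame_time - previous_frame_time))
        previous_frame_time = new_frame_time

        # Formula (4.2) numerator: count a true accepted (TA) identification.
        if predicted_id == expected_id:
            true_accepted += 1

    # Formula (4.2): accuracy = TA / number of identifications.
    accuracy = true_accepted / len(ground_truth_ids)
    return fps_values, accuracy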
4.3 Performance of the system

After selecting suitable models for face detection, feature extraction, and liveness detection, we tested them on the hardware. The experiments were conducted on a Jetson Nano with the configuration shown in Table 4.1, and five people were added to the system with thirty images per person.

Table 4.1 Hardware configuration

  Device            Jetson Nano B01
  Operating system  Ubuntu 18.04.5
  Processor         Quad-core ARM A57 (1.43 GHz)
  Memory            4 GB 64-bit LPDDR4, 25.6 GB/s

This configuration is suitable for the experiment because it represents limited hardware without a dedicated graphics card.

Figure 4.1 Performance of the system: (a) face recognition, (b) face recognition + liveness detection

Table 4.2 Performance of the system

  Mode                                   Input    FPS    Accuracy  Identifications
  Face recognition                       640x480  12-14  92%       20
  Face recognition + liveness detection  640x480  10-13  90%       20

Results from Table 4.2:
- With face recognition + liveness detection, the system achieved 10-13 FPS on the Jetson Nano.
- With face recognition alone, the system achieved 12-14 FPS on the Jetson Nano.
- Tested on 20 identifications, the system achieved 90% accuracy with face recognition + liveness detection and 92% with face recognition alone.

4.4 System operation and hardware result

Operation result: Operating in real time, the system can recognize multiple faces at the same time, even in backlit conditions, as shown in Figure 4.2. Face recognition and liveness detection also work stably and accurately, as shown in Figure 4.3; a sketch of the recognition loop follows at the end of this section.

Figure 4.2 Face recognition result: (a) good brightness, (b) low brightness

Figure 4.3 Face recognition + liveness detection

Hardware result: The hardware was built to simulate a smart door lock model, so it needed to be optimized for cost as well as design, as shown in Figure 4.4.

Figure 4.4 Hardware design
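To make the recognition flow described above concrete, here is a minimal Python sketch of a detect-embed-match loop in the spirit of the pipeline from Chapter 3 (an Ultra light face detector followed by FaceNet-style embeddings). The detect_faces and embed callables, the known_embeddings store, and the 0.7 distance threshold are illustrative assumptions, not the project's actual code.

import numpy as np

DIST_THRESHOLD = 0.7  # assumed cutoff on embedding distance, not the project's tuned value

def recognize_frame(frame, detect_faces, embed, known_embeddings):
    """One pass of an assumed detect -> embed -> match loop.

    Assumed interfaces: detect_faces(frame) returns (x1, y1, x2, y2) face
    boxes, embed(face_crop) returns an L2-normalized embedding vector, and
    known_embeddings maps each registered name to its stored vector.
    """
    results = []
    if not known_embeddings:
        return results
    for (x1, y1, x2, y2) in detect_faces(frame):
        crop = frame[y1:y2, x1:x2]   # crop the detected face region
        vector = embed(crop)         # extract the face embedding
        # Find the nearest registered identity by Euclidean distance.
        name, dist = min(
            ((n, float(np.linalg.norm(vector - e))) for n, e in known_embeddings.items()),
            key=lambda item: item[1],
        )
        label = name if dist < DIST_THRESHOLD else "unknown"
        results.append(((x1, y1, x2, y2), label))
    return results

In the door lock model, a "known" result that also passes the liveness check would drive the relay through the Arduino board to release the lock.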
4.5 System validation

4.5.1 Environment and dataset

The model was evaluated in different environmental conditions, with data trained on four people with thirty images per person:
- Good brightness condition
- Low brightness and backlit condition
- At a distance of 2 m to 3 m

Face recognition:

Good brightness: Figure 4.5 was captured with the face in good brightness and facing straight ahead, which gives high accuracy (95.9%) for each identification. To make this clearer, we surveyed and collected the results in different conditions, as shown in Table 4.3.

Figure 4.5 Straight face result in good brightness condition

Table 4.3 Performance in good brightness condition

  Case            Identifications  True  Cannot detect  False  Accuracy
  Straight        20               20    0              0      100%
  On one side     20               20    0              0      100%
  With half mask  20               15    -              -      75%
  With full mask  20               5     10             -      25%
  With glasses    20               17    -              -      85%

From Table 4.3, the face recognition system was tested on 20 identifications per case and achieved 100% accuracy for straight faces and for faces turned to one side. Accuracy drops to 85% in the glasses case and to 75% in the half-mask case.

Figure 4.6 Result on one side of the face

Figure 4.7 Result with mask covering completely

Figure 4.8 Result with glasses

Figure 4.9 Result with mask covering incompletely

In the same environment, when the face is covered by accessories such as glasses (Figure 4.8), the system still recognizes it well, even when the face is turned away. For masks, the system recognizes faces well as long as the nose is not covered (Figure 4.9, Table 4.3); when a mask covers the face completely, recognition becomes unstable and faces are difficult to recognize (Figure 4.7).

Backlit and low brightness condition: The system works less stably in low brightness; the faces obtained from the camera have low detail and noise (Figure 4.10). However, the system still recognizes straight faces well.

Table 4.4 Performance in backlit and low brightness condition

  Case         Identifications  True  Cannot detect  False  Accuracy
  Straight     10               10    0              0      100%
  On one side  10               10    0              0      100%
  With mask    5                0     5              0      0%

From Table 4.4, the system was tested on 10 identifications and achieved 100% accuracy for straight faces and for faces turned to one side.

Figure 4.10 Performance in backlit and low brightness condition

Figure 4.11 Performance in backlit and low brightness condition with mask

Distance of 2 m to 3 m: At a distance of 2 m to 3 m, as shown in Figure 4.12, the face is small and its detail is reduced, which causes the system to operate incorrectly. For straight faces, the model still achieves high accuracy. The experimental results are shown in Table 4.5.

Table 4.5 Performance at 2 m to 3 m

  Case         Identifications  True  Cannot detect  False  Accuracy
  Straight     10               10    0              0      100%
  On one side  10               6     -              -      60%

From Table 4.5, the system was tested on 10 identifications and achieved 100% and 60% accuracy for straight faces and for faces turned to one side, respectively.

Figure 4.12 Result at 2 m to 3 m distance

Recognition and liveness detection: Face recognition combined with liveness detection, as shown in Figure 4.13, works well only in good brightness. In low brightness, liveness detection is not accurate, because the 68 dlib facial landmarks cannot locate the key parts of the human face, such as the nose, eyebrows, mouth, and eye corners.

Figure 4.13 Performance of face recognition and liveness detection: (a) good brightness, (b) low brightness

Table 4.6 Performance in good brightness and low brightness conditions

  Condition        Identifications  True  Cannot detect  False  Accuracy
  Good brightness  10               8     -              -      80%
  Low brightness   10               1     -              -      10%

From Table 4.6, the liveness detection system was tested on 10 identifications: it distinguishes live human faces from images with 80% accuracy in good brightness and 10% accuracy in low brightness.
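Since the liveness check relies on an eye ratio computed from dlib's 68 facial landmarks, a common way to implement such a check is the eye aspect ratio (EAR) blink test, sketched below in Python using dlib's standard 68-landmark predictor. The EAR_THRESHOLD value and the surrounding logic are illustrative assumptions rather than the project's exact parameters.

import dlib
from scipy.spatial import distance as dist

detector = dlib.get_frontal_face_detector()
# dlib's standard pretrained 68-landmark model; the file path is an assumption.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# In the 68-point scheme, points 36-41 outline the left eye and 42-47 the right eye.
LEFT_EYE = list(range(36, 42))
RIGHT_EYE = list(range(42, 48))
EAR_THRESHOLD = 0.2  # assumed value; below this the eye is treated as closed

def eye_aspect_ratio(pts):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); it drops sharply when the eye closes."""
    a = dist.euclidean((pts[1].x, pts[1].y), (pts[5].x, pts[5].y))
    b = dist.euclidean((pts[2].x, pts[2].y), (pts[4].x, pts[4].y))
    c = dist.euclidean((pts[0].x, pts[0].y), (pts[3].x, pts[3].y))
    return (a + b) / (2.0 * c)

def eyes_closed(gray_frame):
    """Return True if any detected face currently has closed eyes.

    A liveness check watches for an open -> closed -> open sequence over
    several frames; a static photo never produces that transition.
    """
    for face in detector(gray_frame, 0):
        shape = predictor(gray_frame, face)
        left = [shape.part(i) for i in LEFT_EYE]
        right = [shape.part(i) for i in RIGHT_EYE]
        ear = (eye_aspect_ratio(left) + eye_aspect_ratio(right)) / 2.0
        if ear < EAR_THRESHOLD:
            return True
    return False

This also explains the low-brightness failure in Table 4.6: when the landmark predictor cannot locate the eye corners reliably, the EAR value becomes noisy and the blink test fails.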
Chapter 5: CONCLUSION

5.1 Conclusion

In this project, the researchers proposed a deep learning system that recognizes human faces and performs liveness detection in real time. The achieved results show that the system meets the requirements. From the experiments, the researchers identified the strengths and weaknesses of the system, as summarized in Table 5.1.

Table 5.1 Strengths and weaknesses of the system

  Strengths:
  - Good performance on limited hardware
  - Stability and high accuracy in different environments
  - Simple, user-friendly interface

  Weaknesses:
  - Liveness detection has low accuracy in backlit and low brightness conditions
  - Liveness detection depends on the eye ratio of the user
  - The face recognition system cannot detect small faces well

5.2 Improvement

- The researchers plan to improve the system to detect small faces and to handle low brightness conditions.
- Liveness detection will be improved to achieve higher accuracy.

REFERENCES

[1] V. Shrimali, "PyTorch for Beginners: Basics," LearnOpenCV. [Online]. Available: https://www.learnopencv.com/pytorch-for-beginners-basics/
[2] J. Hui, "SSD object detection: Single Shot MultiBox Detector for real-time processing." [Online]. Available: https://medium.com/@jonathan_hui/ssd-object-detection-single-shot-multibox-detector-for-real-time-processing-9bd8deac0e06
[3] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, et al., "SSD: Single Shot MultiBox Detector," in Computer Vision – ECCV 2016, 14th European Conference, 2016, pp. 21-37.
[4] R. Girshick, "Fast R-CNN," in IEEE International Conference on Computer Vision (ICCV), 2015.
[5] S. Liu, D. Huang, Y. Wang, "Receptive Field Block Net for Accurate and Fast Object Detection," Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China.
[6] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in AAAI, 2017.
[7] K. He, X. Zhang, S. Ren, J. Sun, "Deep residual learning for image recognition," in CVPR, 2016.
[8] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv:1706.05587, 2017.
[9] K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.
[10] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., "ImageNet large scale visual recognition challenge," IJCV, 2015.
[11] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv:1704.04861, 2017.
[12] F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size," arXiv:1602.07360, 2016.
