VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING

BACHELOR THESIS

Towards Adversarial Attack against Embedded Face Recognition Systems

Major: Computer Engineering
Committee: Computer Engineering
Supervisors: Dr. Le Trong Nhan, Assoc. Prof. Quan Thanh Tho
Reviewer: Assoc. Prof. Tran Ngoc Thinh

—o0o—

Authors:
Nguyen Minh Dang - 1752170
Nguyen Tien Anh - 1752076
Tran Minh Hieu - 1752199

Ho Chi Minh City, July 2021

VIETNAM NATIONAL UNIVERSITY HCMC - HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
Faculty: Computer Science & Engineering - Department: Computer Science
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

GRADUATION THESIS ASSIGNMENT

Note: Students must attach this sheet to the first page of the thesis report.

FULL NAME: Nguyễn Minh Đăng - STUDENT ID: 1752170 - MAJOR: Computer Engineering - CLASS:
FULL NAME: Trần Minh Hiếu - STUDENT ID: 1752199 - MAJOR: Computer Engineering - CLASS:
FULL NAME: Nguyễn Tiến Anh - STUDENT ID: 1752076 - MAJOR: Computer Engineering - CLASS:

1. Thesis title: Towards Adversarial Attack against Embedded Face Recognition Systems

2. Tasks (required content and initial data):
✔ Investigate face authentication techniques
✔ Research and design the desired system based on the NVIDIA Jetson Nano Developer Kit
✔ Research and propose an approach that applies adversarial attack techniques to prevent attackers from fooling the system
✔ Implement a prototype and evaluate its performance

3. Date of assignment:
4. Date of completion:
5. Supervisors and supervised parts: 1) Lê Trọng Nhân  2) Quản Thành Thơ

The content and requirements of this thesis assignment have been approved by the Department.

Date ... month ... year ...
HEAD OF DEPARTMENT (signature and full name)
PRINCIPAL SUPERVISOR (signature and full name): PGS.TS Quản Thành Thơ

FOR THE FACULTY AND DEPARTMENT:
Preliminary reviewer: - Unit: - Defense date: - Final score: - Thesis archived at:

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY - FACULTY OF COMPUTER SCIENCE AND ENGINEERING
SOCIALIST REPUBLIC OF VIETNAM - Independence - Freedom - Happiness
Date ... month ... year ...

THESIS DEFENSE EVALUATION SHEET (for supervisors/reviewers)

Student: Nguyễn Minh Đăng - STUDENT ID: 1752170 - Major: Computer Engineering
Student: Trần Minh Hiếu - STUDENT ID: 1752199 - Major: Computer Engineering
Student: Nguyễn Tiến Anh - STUDENT ID: 1752076 - Major: Computer Engineering

Thesis title: Towards Adversarial Attack against Embedded Face Recognition Systems
Supervisor: PGS.TS (Assoc. Prof. Dr.) Quản Thành Thơ

1. Report overview: number of pages: - chapters: - tables: - figures: - references: - software: - artifacts (products):
2. Drawings overview: number of drawings: - A1: - A2: - other sizes: - hand-drawn: - computer-drawn:
3. Strengths of the thesis:
- The students addressed an emerging security problem in the area of face recognition.
- The solution proposed by the students includes the selection of a suitable hardware device and, especially, an AI approach for black-box adversarial attacks whose performance surpasses the current state-of-the-art results.
- To achieve this, the students conducted a very insightful literature review, gradually elaborated their suggested architecture, and successfully implemented their models with impressive performance.
- The work in this thesis has been published in two papers: one at a student scientific conference and, notably, one at a prestigious international conference whose proceedings are published by Springer. This illustrates the excellent quality of the students' work.
4. Shortcomings of the thesis:
5. Recommendation: Approved for defense / Requires additions before defense / Not approved for defense
6. Three questions the students must answer before the committee:
7. Overall assessment (in words: excellent, good, average): Score: 10/10

Signature (full name): PGS.TS Quản Thành Thơ
FACULTY OF COMPUTER SCIENCE AND ENGINEERING
August 8, 2021

THESIS DEFENSE EVALUATION SHEET (for the reviewer)

Student: Nguyen Minh Dang - STUDENT ID: 1752170 - Major: Computer Engineering
Student: Nguyen Tien Anh - STUDENT ID: 1752076 - Major: Computer Engineering
Student: Tran Minh Hieu - STUDENT ID: 1752199 - Major: Computer Engineering

Thesis title: Towards Adversarial Attack against Embedded Face Recognition Systems
Reviewer: Assoc. Prof. Dr.

1. Report overview: 83 pages; 35 figures; 10 tables; 104 references; 1 artifact (product): an adversarial attack system on the Jetson Nano.
2. Strengths of the thesis:
a. The students successfully proposed a new attack algorithm on face recognition systems that works reliably in the physical world without requiring any knowledge about the victim model.
b. Their methodology was evaluated on various model architectures and training losses. Compared with the baseline, the attack success rates of their system are far better.
c. They deployed a face recognition system on a Jetson Nano and showed that it works well.
d. One paper has been accepted at The 4th International Conference on Multimedia Analysis and Pattern Recognition (MAPR 2021).
3. Shortcomings of the thesis: The proposed methodology only applies to global physical attacks. The students should extend it to both global and local physical attacks.
4. Recommendation: Approved for defense.
5. Questions the students must answer before the committee:
a. Based on your proposed methodology, how can you help mitigate or avoid adversarial attacks against face recognition systems?
b. Most related works use PCs to deploy their systems; why is your system deployed on a Jetson Nano, which has low performance?
6. Overall assessment: Very Good. Score: 9.5/10

Signature (full name):

Declaration of Authenticity

We hereby declare that this thesis, titled "Towards Adversarial Attack against Embedded Face Recognition Systems", and the work presented in it are our own. We confirm that:

• This work was done wholly or mainly while in candidature for a degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where we have consulted the published work of others, this is always clearly attributed.
• Where we have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely our own work.
• We have acknowledged all main sources of help.
• Where the thesis is based on work done by ourselves jointly with others, we have made clear exactly what was done by others and what we have contributed ourselves.

Ho Chi Minh City, July 2021

Acknowledgement

Firstly, we would like to show our deepest gratitude to our supervisors, Professor Quan Thanh Tho and Dr. Le Trong Nhan, for their invaluable time, patience, and warm support. They have spent so much effort guiding us, and their insightful feedback has helped us realize the weaknesses in our work. Furthermore, their enthusiasm has been an encouragement that helped us move forward during the difficult stages of our research. Without their help, this thesis could not have become a reality.

Secondly, we want to thank all the lecturers for the knowledge and skills they have provided us over the past four years. We thank HCMC University of Technology and the Faculty of Computer Science and Engineering for creating such a wonderful incubating environment that has helped us grow as students as well as individuals.

Last but not least, we thank our beloved friends and family for their immense amount of love, support, and encouragement throughout the years. It has been an incredible journey; we wish you all good health and happiness in life.

Nguyen Minh Dang, Nguyen Tien Anh, Tran Minh Hieu

Abstract

Numerous studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples: malicious inputs that are carefully crafted to cause a model to misclassify. This phenomenon raises a serious concern, especially for deep learning-based security-critical systems such as face recognition.
However, most studies on the adversarial vulnerability of DNNs have only considered ideal scenarios (e.g., they assume the attackers have perfect information about the victim model, or that the attack is performed in the digital domain). As a result, these methods often transfer poorly (or not at all) to the real world and hamper future studies on defense mechanisms against real-world attacks. To address this issue, we propose a novel physically transferable attack on deep face recognition systems. Our method works in physical-world settings without requiring any knowledge about the victim model. Our extensive experiments on various model architectures and training losses show non-trivial results and give rise to some interesting observations that point to potential future research directions for improving the robustness of models against adversarial attacks.

Contents

List of Figures
List of Tables
List of Notations
1 Introduction
  1.1 Overview
  1.2 Thesis Scopes and Objectives
  1.3 Our contributions
2 Background Knowledge
  2.1 Deep Learning and Neural Networks
    2.1.1 Artificial Neural Networks
    2.1.2 Convolutional Neural Networks
  2.2 Optimization Techniques
  2.3 Face Recognition
  2.4 Adversarial Machine Learning
    2.4.1 Adversarial Examples
    2.4.2 Properties of Adversarial Examples
    2.4.3 A Taxonomy of Adversarial Attacks
    2.4.4 Generating Adversarial Examples
  2.5 Jetson Nano
    2.5.1 Developer kit and Hardware
    2.5.2 JetPack and libraries
3 Literature Review
  3.1 Black-box adversarial attacks
    3.1.1 Decision-based adversarial attacks: Reliable attacks against black-box machine learning models
    3.1.2 Efficient Decision-based Black-box Adversarial Attacks on Face Recognition
  3.2 Adversarial attacks in the physical world
    3.2.1 Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition
    3.2.2 AdvHat: Real-world adversarial attack on ArcFace Face ID system
4 Methodology
  4.1 Threat Model
  4.2 Baseline Method
  4.3 From Digital to Physical World Attack
  4.4 Enhancing the Transferability of Transfer-based Attacks
5 Experiments
  5.1 Experimental Settings
    5.1.1 Datasets
    5.1.2 Pre-trained Models
    5.1.3 Evaluation Metric
    5.1.4 Physical Evaluation
  5.2 Experimental Results
    5.2.1 Attack success rates in the physical world
    5.2.2 Performance comparisons between digital and physical world
    5.2.3 Sensitivity to epsilon and the number of ensemble models
    5.2.4 Extended experiments on local adversarial attacks
    5.2.5 Evaluation on NVIDIA Jetson Nano Embedded System
6 Conclusion and Future Works
Bibliography
Appendices
A FaceX-Zoo and LFW Dataset
  A.1 Preparation and dependencies

Bibliography

Moosavi-Dezfooli, S. M., Fawzi, A., & Frossard, P. (2016). DeepFool: A simple and accurate method to fool deep neural networks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2574–2582. https://doi.org/10.1109/CVPR.2016.282
Nvidia. (2020). Jetson Nano Developer Kit user guide. https://developer.nvidia.com/embedded/dlc/Jetson_Nano_Developer_Kit_User_Guide
Parkhi, O., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. 1, 41.1–41.12. https://doi.org/10.5244/C.29.41
Pi, R. (2021). Raspberry Pi Camera Module v2. https://www.raspberrypi.org/products/camera-module-v2/
Qian, N. (1999). On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1), 145–151. https://doi.org/10.1016/S0893-6080(98)00116-6
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer.
Ranjan, R., Castillo, C. D., & Chellappa, R. (2017). L2-constrained softmax loss for discriminative face verification.
Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., & Madry, A. (2018). Adversarially robust generalization requires more data. Proceedings of the 32nd International Conference on Neural Information Processing Systems, 5019–5031.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2015.7298682
Shao, R., Shi, Z., Yi, J., Chen, P.-Y., & Hsieh, C.-J. (2021). Robust text CAPTCHAs using adversarial examples.
Sharif, M., Bhagavatula, S., Bauer, L., & Reiter, M. K. (2016a). Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 1528–1540. https://doi.org/10.1145/2976749.2978392
Sharif, M., Bhagavatula, S., Bauer, L., & Reiter, M. K. (2016b). Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 1528–1540.
Sun, Y., Wang, X., & Tang, X. (2014a). Deep learning face representation from predicting 10,000 classes. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 1891–1898. https://doi.org/10.1109/CVPR.2014.244
Sun, Y., Wang, X., & Tang, X. (2014b). Deeply learned face representations are sparse, selective, and robust.
Sun, Y., Cheng, C., Zhang, Y., Zhang, C., Zheng, L., Wang, Z., & Wei, Y. (2020). Circle loss: A unified perspective of pair similarity optimization.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2016). Inception-v4, Inception-ResNet and the impact of residual connections on learning.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks.
Taigman, Y., Yang, M., Ranzato, M., & Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. 1701–1708. https://doi.org/10.1109/CVPR.2014.220
Tan, M., & Le, Q. V. (2020). EfficientNet: Rethinking model scaling for convolutional neural networks.
Tang, X., Du, D. K., He, Z., & Liu, J. (2018). PyramidBox: A context-assisted single shot face detector.
Tramèr, F., Papernot, N., Goodfellow, I., Boneh, D., & McDaniel, P. (2017). The space of transferable adversarial examples.
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., & Madry, A. (2019). Robustness may be at odds with accuracy.
Turner, A., Tsipras, D., & Madry, A. (2019). Label-consistent backdoor attacks.
van den Oord, A., Li, Y., & Vinyals, O. (2019). Representation learning with contrastive predictive coding.
Viola, P., & Jones, M. (2004). Robust real-time face detection.
Wang, C.-Y., Bochkovskiy, A., & Liao, H.-Y. M. (2020). Scaled-YOLOv4: Scaling cross stage partial network.
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., & Tang, X. (2017). Residual attention network for image classification.
Wang, F., Cheng, J., Liu, W., & Liu, H. (2018a). Additive margin softmax for face verification. IEEE Signal Processing Letters, 25(7), 926–930. https://doi.org/10.1109/lsp.2018.2822810
Wang, F., Cheng, J., Liu, W., & Liu, H. (2018b). Additive margin softmax for face verification. IEEE Signal Processing Letters, 25(7), 926–930. https://doi.org/10.1109/lsp.2018.2822810
Wang, F., Xiang, X., Cheng, J., & Yuille, A. L. (2017). NormFace. Proceedings of the 25th ACM International Conference on Multimedia. https://doi.org/10.1145/3123266.3123359
Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., Li, Z., & Liu, W. (2018). CosFace: Large margin cosine loss for deep face recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 5265–5274. https://doi.org/10.1109/CVPR.2018.00552
Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., & Xiao, B. (2020). Deep high-resolution representation learning for visual recognition.
Wang, J., Liu, Y., Hu, Y., Shi, H., & Mei, T. (2021). FaceX-Zoo: A PyTorch toolbox for face recognition. arXiv preprint arXiv:2101.04407.
Wang, T., Wu, D. J., Coates, A., & Ng, A. (2012). End-to-end text recognition with convolutional neural networks. Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), 3304–3308.
Wang, X., Zhang, S., Wang, S., Fu, T., Shi, H., & Mei, T. (2019). Mis-classified vector guided softmax loss for face recognition.
Wenchao Zhang, Shiguang Shan, Wen Gao, Xilin Chen, & Hongming Zhang. (2005). Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition. 1, 786–791. https://doi.org/10.1109/ICCV.2005.147
Wesemann, N. (2020). Face recognition for NVIDIA Jetson (Nano) using TensorRT. https://github.com/nwesem/mtcnn_facenet_cpp_tensorRT
Wu, Z., Lim, S.-N., Davis, L., & Goldstein, T. (2020). Making an invisibility cloak: Real world adversarial attacks on object detectors.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2020). XLNet: Generalized autoregressive pretraining for language understanding.
Yuan, Y., Chen, X., & Wang, J. (2020). Object-contextual representations for semantic segmentation.
Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks.
Zeng, D., Shi, H., Du, H., Wang, J., Lei, Z., & Mei, T. (2021). NPCFace: Negative-positive collaborative training for large-scale face recognition.
Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Lin, H., Zhang, Z., Sun, Y., He, T., Mueller, J., Manmatha, R., Li, M., & Smola, A. (2020). ResNeSt: Split-attention networks.
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499–1503. https://doi.org/10.1109/lsp.2016.2603342
Zhang, X., Zhao, R., Qiao, Y., Wang, X., & Li, H. (2019). AdaCos: Adaptively scaling cosine logits for effectively learning deep face representations.
Zhong, Y., & Deng, W. (2019). Adversarial learning with margin-based triplet embedding regularization.
Zhu, J., Xia, Y., Wu, L., He, D., Qin, T., Zhou, W., Li, H., & Liu, T.-Y. (2020). Incorporating BERT into neural machine translation.

Appendices

Appendix A: FaceX-Zoo and LFW Dataset

A.1 Preparation and dependencies

The open-source FaceX-Zoo toolbox is available at https://github.com/JDAI-CV/FaceX-Zoo, so it can be downloaded easily.

Figure A.1: FaceX-Zoo on GitHub

The repository requires some important dependencies:

• Python >= 3.7.1. In our implementation, we use version 3.8.8 and did not encounter any conflicts.
• PyTorch >= 1.1.0. In our implementation, we use version 1.8.1+cu111 to utilize GPU computation.
• Torchvision >= 0.3.0. In our implementation, we use version 0.9.1+cu111 to utilize GPU computation.

A quick way to confirm that the installed environment satisfies these requirements is sketched below.
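The following minimal sketch is our own illustration (it is not part of FaceX-Zoo); it only uses the standard sys module together with torch and torchvision, and prints the installed versions next to the minimums listed above:

```python
# Minimal environment sanity check for the dependencies listed above.
# Illustrative helper, not part of the FaceX-Zoo repository.
import sys

import torch
import torchvision

print(f"Python:      {sys.version.split()[0]}")        # expect >= 3.7.1
print(f"PyTorch:     {torch.__version__}")             # expect >= 1.1.0
print(f"Torchvision: {torchvision.__version__}")       # expect >= 0.3.0
print(f"CUDA available: {torch.cuda.is_available()}")  # True when +cuXXX wheels are installed
```

If the last line prints False on a machine with a GPU, the CPU-only wheels were most likely installed.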
A.2 Face cropping

We use the face cropper provided in FaceX-Zoo to crop all images to the size 112 x 112. The process is as follows:

• Download the original LFW database at http://vis-www.cs.umass.edu/lfw/lfw.tgz
• Inside the file FaceX-Zoo/test_protocol/lfw/face_cropper/crop_lfw_by_arcface.py, change the paths at the bottom according to your file locations:
  – facescrub_root is the path to your LFW dataset folder
  – facescrub_lms_file is the path to the file downloaded in the step above
  – target is the path to the output folder
• Run the above file to crop the images.

A.3 Pre-trained models

The pre-trained models are also available in FaceX-Zoo with various state-of-the-art model architectures. We use the models in Section 3.1 as the backbone-wise set and those in Section 3.2 as the head-wise set. All models are available at https://github.com/JDAI-CV/FaceX-Zoo/tree/main/training_mode

Figure A.2: Backbone-wise models in FaceX-Zoo

Figure A.3: Head-wise models in FaceX-Zoo

The properties of the backbone-wise and head-wise models are:

• With the backbone-wise set, the selected head is MV-Softmax.
• With the head-wise set, the selected backbone is MobileFaceNet.
• All models are trained on the MS-Celeb-1M-v1c dataset using the conventional training procedure in FaceX-Zoo. This dataset contains 72,778 identities and about 3.28M images.
• All images are aligned and resized to 3 x 112 x 112.

A.4 LFW Dataset

The dataset can be downloaded at http://vis-www.cs.umass.edu/lfw/lfw.tgz. There are many variants of the LFW dataset (aligned and cropped with different techniques), but we use the raw images and pre-process them with the technique in the FaceX-Zoo repository.

Appendix B: Deploying Face Recognition on NVIDIA Jetson Nano

B.1 Prerequisite and installation guide

We use an open-source implementation taken from (Wesemann, 2020) as our face recognition system. The repository is published under the GNU General Public License v3.0.

B.1.1 Hardware requirements

Two main items are essential for the deployed system to run smoothly:

1. A Jetson Nano module
2. A camera

The items listed below are our choices after considering our limited budget as well as the availability and accessibility of each item. Alternatives to the items we chose are also included; they will work just as well, and in some cases better.

B.1.1.1 The Jetson Nano module

We use the NVIDIA Jetson Nano Developer Kit as our embedded board. The kit includes a Jetson Nano module (P3448-0000) attached to a carrier board (P3449-0000). To be exact, we use the latest Jetson Nano Developer Kit, which comes with carrier board revision B01. We state this so that it is not mistaken for the original Jetson Nano Developer Kit with revision A02. The original one has only 2 GB of memory, which may not be enough to run the face recognition system. Other alternatives include the NVIDIA Jetson AGX Xavier module ("Jetson AGX Xavier Developer Kit", 2021).

B.1.1.2 The camera

Besides the embedded board, a camera is also needed to capture real-time streams. In our implementation, we use the Logitech C270 HD USB webcam ("Logitech C270 HD Webcam", 2021), which provides up to 720p video capture at a maximum of 30 frames per second (FPS). Other alternatives include CSI cameras. Although we used a USB webcam, we suggest using a CSI camera instead, since the Jetson Nano has a built-in camera connector that is compatible with IMX219 camera modules, including the Leopard Imaging LI-IMX219-MIPI-FF-NANO ("LI-IMX219-MIPI-FFNANO-H136", 2021) camera module and the Raspberry Pi Camera Module V2 (Pi, 2021).
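To illustrate the difference between the two camera options, the sketch below opens either a USB webcam or a CSI module with OpenCV. It is a minimal example of ours, not code from (Wesemann, 2020); the GStreamer string is the commonly used nvarguscamerasrc recipe for IMX219 sensors and assumes OpenCV was built with GStreamer support, as the JetPack build is.

```python
# Sketch: opening either camera type with OpenCV on the Jetson Nano.
# Adjust width/height/framerate in the pipeline to match your sensor mode.
import cv2

CSI_PIPELINE = (
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "
    "videoconvert ! video/x-raw, format=BGR ! appsink"
)

def open_camera(use_csi: bool = False) -> cv2.VideoCapture:
    """Return a capture handle for a CSI module or a USB webcam."""
    if use_csi:
        return cv2.VideoCapture(CSI_PIPELINE, cv2.CAP_GSTREAMER)
    return cv2.VideoCapture(0)  # first USB device, e.g. the Logitech C270

cap = open_camera(use_csi=False)
ok, frame = cap.read()
print("Got frame of shape", frame.shape) if ok else print("No frame.")
cap.release()
```

The USB path is the default here because it matches the Logitech C270 setup described above.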
B.1.2 Software dependencies

Software dependencies are the libraries and packages needed for the system to function properly. The dependencies for our system are listed in Table B.1.

Libraries/Packages   Version
JetPack              4.4.1 [L4T 32.4.4]
TensorRT             7.1.3
OpenCV               4.1.1
CUDA                 10.2
cuDNN                8.0
TensorFlow           1.14
CMake                3.21
OpenBLAS             0.3.15

Table B.1: Libraries/packages specifications

Please note that JetPack can be installed either from an SD card image or with the NVIDIA SDK Manager, following the instructions in (Nvidia, 2020). Once JetPack 4.4.1 is successfully installed, TensorRT, OpenCV, CUDA, and cuDNN are installed along with it at their specified versions. The rest can be installed as follows:

• TensorFlow should be installed following the instructions from NVIDIA: https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html
• CMake and OpenBLAS can be installed with the apt package manager by the following command: sudo apt install cmake libopenblas-dev

B.2 System descriptions

The deployed system is a simple face recognition system consisting of two main components: face detection and face representation.

B.2.1 Face Detection

The face detection component used in the system is MTCNN (Multi-task Cascaded Convolutional Networks), a face and landmark detector. First introduced in (K. Zhang et al., 2016), MTCNN is known for working well with hard examples such as partial occlusion and various poses and illuminations.

Figure B.1: Pipeline of the cascaded framework that includes three-stage multi-task deep convolutional networks

Figure B.1 visualizes how MTCNN works: it detects faces using a three-stage neural network detector consisting of the P-network, the R-network, and the O-network. First, the image is resized multiple times to create an image pyramid, i.e., copies of the image at varying scales. This is done so that faces of different sizes can be detected. The P-network (Proposal network) then performs the first detection, and the estimated bounding-box regression vectors are used to calibrate the candidates. After that, non-maximum suppression (NMS) is employed to merge highly overlapping boxes. At this stage there may be many false positives, but the network is designed to behave this way. The proposed regions (containing many false positives) are the input for the second network, the R-network (Refine network), which, as the name suggests, filters the detections (again with NMS) to obtain fairly precise bounding boxes. In the final stage, the O-network (Output network) performs the final refinement of the bounding boxes. In this way, not only are faces detected, but the bounding boxes and the five facial landmarks (left and right eyes, nose, and left and right mouth corners) are also obtained with high precision.

B.2.2 Face Representation

We employ FaceNet from (Schroff et al., 2015) as our face representation module. FaceNet is a state-of-the-art face recognition, verification, and clustering neural network. Its capability comes from its deep neural network and the triplet loss, which serves as the loss function in the training phase. Figure B.2 represents the model structure of FaceNet.

Figure B.2: FaceNet high-level model structure

FaceNet's deep neural network directly trains its output to be a 128-dimensional embedding. In our implementation, we use the Inception-ResNet-V1 from (Szegedy et al., 2016) as the deep neural network. This is because Inception-ResNet-V1 has far fewer parameters and about 520 million FLOPS (floating-point operations), significantly lower than the original implementation in the paper (around 140 million parameters and 1.6 billion FLOPS). Despite the lower number of parameters, Inception-ResNet-V1 achieves performance that is not statistically significantly different from the state-of-the-art architecture reported in (Schroff et al., 2015). Moreover, the small number of parameters and FLOPS means that such an architecture can be used in real-world applications, especially on mobile and embedded devices.

Figure B.3: Triplet loss intuition

The distinctive feature of FaceNet is the triplet loss, first introduced in Table 2.1 as a Euclidean-based loss function. The idea behind the triplet loss, illustrated in Figure B.3, is to minimize the distance between the anchor and the positive, both of which share the same identity, and to maximize the distance between the anchor and the negative, which has a different identity. Hence, after the training phase, faces of the same person have small distances and faces of different persons have larger distances. Accordingly, the learned features are not only separable but also discriminative. Therefore, FaceNet can generalize well to new identities without retraining every time a new identity is added.
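To make the intuition concrete: for an anchor a, a positive p, and a negative n, the triplet loss of (Schroff et al., 2015) is max(||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + alpha, 0), where f is the embedding network and alpha is the margin. The sketch below is our own PyTorch illustration of this formula, not the training code of the deployed system; the batch size and margin value are arbitrary.

```python
# Illustrative triplet-loss sketch (not the deployed system's training code).
import torch
import torch.nn.functional as F

margin = 0.2  # the margin alpha; the value here is illustrative

# Toy 128-d L2-normalized embeddings for a batch of 8 triplets:
# anchor/positive share an identity, negative has a different one.
anchor   = F.normalize(torch.randn(8, 128), dim=1)
positive = F.normalize(torch.randn(8, 128), dim=1)
negative = F.normalize(torch.randn(8, 128), dim=1)

# Explicit form: max(||a - p||^2 - ||a - n||^2 + margin, 0), averaged over the batch.
d_pos = (anchor - positive).pow(2).sum(dim=1)
d_neg = (anchor - negative).pow(2).sum(dim=1)
loss_manual = torch.clamp(d_pos - d_neg + margin, min=0).mean()

# PyTorch's built-in version; note it uses the non-squared L2 distance by
# default, so its value differs slightly from the squared form above.
loss_builtin = torch.nn.TripletMarginLoss(margin=margin, p=2)(anchor, positive, negative)

print(loss_manual.item(), loss_builtin.item())
```

Minimizing this loss is what pulls same-identity embeddings together and pushes different-identity embeddings at least a margin apart, which is exactly the behavior described above.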
B.3 Evaluation

We evaluate the system using metrics such as the time each component takes to process an image, the total system FPS (frames per second), and the system's accuracy.

• The system's FPS is measured by taking the number of frames and dividing it by the time it takes each component to process an image. In total, there are three components: face detection (which detects faces in a frame), face representation (which turns the detected faces into embeddings), and face matching (which takes the embeddings and compares them to those in the database to decide whether the input faces belong to known identities in the system).
• The system's accuracy is evaluated with 100 pairs of images. Half of the pairs are of the same identity, and the other half are of different identities. The accuracy is measured as the number of correct classifications divided by the total number of classifications made by the system.

Metrics                             Performance
Running time  Face Detection        50 ± 10 ms
              Face Representation   22 ± ms
FPS                                 15 ∼ 18 fps
Accuracy                            98 %

Table B.2: The deployed system evaluation

Appendix C: Published Paper