Fooling Deepfake detectors with fake personas using semantic adversarial examples

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MASTER THESIS

Fooling Deepfake detectors with fake personas using semantic adversarial examples

NGUYEN HONG NGOC
Ngoc.NH202706M@sis.hust.edu.vn
School of Information and Communication Technology

Supervisor: Assoc. Prof. Huynh Thi Thanh Binh (supervisor's signature)
Institution: School of Information and Communication Technology
Co-supervisor: Prof. Yew Soon Ong
Institution: Nanyang Technological University, Singapore

May 19, 2022

Graduation Thesis Assignment

Name: Nguyen Hong Ngoc
Phone: +84947265498
Email: Ngoc.NH202706M@sis.hust.edu.vn; ngocnguyen.nd97@gmail.com
Class: 20BKHDL-E
Affiliation: Hanoi University of Science and Technology

I, Nguyen Hong Ngoc, hereby warrant that the work and presentation in this thesis were performed by myself under the supervision of Assoc. Prof. Huynh Thi Thanh Binh and Prof. Yew Soon Ong. All the results presented in this thesis are truthful and are not copied from any other works. All references in this thesis, including images, tables, figures, and quotes, are clearly and fully documented in the bibliography. I take full responsibility for any copy that violates school regulations.

Student
(Signature and name)
Nguyen Hong Ngoc

Acknowledgement

This Master thesis would not have been possible without the support of many people. First of all, I would like to give my warmest thanks to my supervisor, Assoc. Prof. Huynh Thi Thanh Binh, who has given me a lot of motivation to complete this work. I also thank Prof. Yew Soon Ong, Dr. Alvin Chan, and especially Dr. Nguyen Thi My Binh for being wonderful mentors and for all their support; I could not have made it without your help and guidance. I would also like to thank my committee members for their thoughtful comments and suggestions on this thesis. A special thanks goes to my wife, Pham Diem Ngoc, and my family as a whole for their mental support during the writing process; you truly mean the world to me. Furthermore, without my friends Vinh Tong, Quang Thang, Minh Tam, Thanh Dat, and Trung Vu, I could hardly have melted away all the tension from my work; thanks for always accompanying me through ups and downs. Finally, this work was funded by Vingroup and supported by the Vingroup Innovation Foundation (VINIF) under project code VINIF.2020.ThS.BK.06. I enormously appreciate the financial support from Vingroup, which allowed me to stay focused on my research without worrying about financial burdens.

Abstract

Recent advances in deep generative modeling techniques such as Generative Adversarial Networks (GANs) can synthesize high-quality media content, including images, videos, and sounds. This content, collectively known as deepfake, can be very difficult to distinguish from real content due to its extremely realistic look and high resolution. The initial purpose of synthesizing media content was to provide more examples for training deep models, thus improving the performance and robustness of these models. Nowadays, however, deepfakes are also being abused for many cybercrimes such as fake personas, online fraud, misinformation, or producing media featuring people without their consent. Deepfake has become an emerging threat to human life in the age of social networks. To fight against and prevent these deepfake abuses, forensic systems with the ability to detect synthetic content have recently been extensively studied by the research community.
At the same time, anti-forensic deepfakes are being investigated to understand the gaps in these detection systems and pave the way for improvement. Within the scope of this Master thesis, I investigate the threat of anti-forensic fake personas created with semantic adversarial examples, where a fraudster builds a fake personal profile from multiple anti-forensic deepfakes portraying a single identity. To study this threat model comprehensively, three approaches that an attacker may use to conduct such attacks are considered, covering both white-box and black-box scenarios. A range of defense strategies is then proposed with the aim of improving the robustness of current forensic systems against such threats. Experiments show that while the attacks can bypass current detection, the proposed defense approaches, which consider the multi-image nature of a fake persona, can effectively mitigate this threat by lowering the attack success rate. The results of this thesis can help strengthen the defense in the fight against the many cybercrimes utilizing deepfakes.

Student
(Signature and name)
Nguyen Hong Ngoc

TABLE OF CONTENTS

CHAPTER 1. INTRODUCTION
  1.1 Deepfake
  1.2 Applications of deepfake
    1.2.1 Image editing
    1.2.2 Digital cinematic actors
    1.2.3 Generating training examples
  1.3 Deepfake abuses
    1.3.1 Disinformation
    1.3.2 Fake personas/identities
  1.4 Forensic and anti-forensic deepfake
  1.5 Research challenge: Anti-forensic deepfake personas
  1.6 Motivations
  1.7 Thesis methodology
  1.8 Contributions
  1.9 Thesis organization
CHAPTER 2. BACKGROUND
  2.1 Deepfake generators
    2.1.1 Autoencoder
    2.1.2 Generative Adversarial Networks
  2.2 Semantic modification for GAN
  2.3 Deepfake forensic systems
  2.4 Attacks on deepfake forensic systems
    2.4.1 Spatial transformations
    2.4.2 Pixel-level adversarial examples
    2.4.3 Semantic adversarial examples
CHAPTER 3. ANTI-FORENSIC FAKE PERSONA ATTACK
  3.1 Problem modeling
  3.2 White-box approaches
    3.2.1 Two-phase approach
    3.2.2 Semantic Aligned Gradient Descent approach
  3.3 Black-box approach
    3.3.1 Introduction to Evolutionary Algorithms
    3.3.2 Semantic Aligned Evolutionary Algorithm
CHAPTER 4. DEFENSES AGAINST ANTI-FORENSIC FAKE PERSONAS
  4.1 Defense against the Single-image Semantic Attack task
  4.2 Defenses against the anti-forensic fake persona attack
    4.2.1 Naive Pooling defense
    4.2.2 Feature Pooling defense
CHAPTER 5. EXPERIMENT RESULTS AND ANALYSIS
  5.1 Experiment setup
    5.1.1 General setup
    5.1.2 Hyper-parameter settings
  5.2 Single-image Semantic Attack task evaluation
    5.2.1 Baseline
    5.2.2 Two-phase white-box approach evaluation
    5.2.3 SA-GD white-box approach evaluation
    5.2.4 SA-EA black-box approach evaluation
    5.2.5 Comparison between the approaches for SiSA
    5.2.6 Visual quality evaluation
    5.2.7 Computational time evaluation
  5.3 Anti-forensic fake persona attack evaluation
  5.4 Discussions
    5.4.1 Visual quality trade-off between approaches
    5.4.2 Query-based defenses
    5.4.3 Ethical discussions
CHAPTER 6. CONCLUSION AND FUTURE WORK
  6.1 Contributions
  6.2 Limitations and future work

LIST OF FIGURES

1.1 Examples of deepfake images from the website thispersondoesnotexist.com. These images are generated by StyleGAN2 [2].
1.2 Barack Obama deepfake video created from a random source video.
1.3 The four types of face manipulation in deepfake.
1.4 Popular FaceApp filters, utilizing deepfake technology to edit images in various ways, such as: older self, cartoon style, adding facial hair, or swapping the gender.
1.5 CGI in the movie Rogue One to recreate young Princess Leia, later improved with deepfakes by fans.
1.6 Deepfake video of Donald Trump aired by Fox affiliate KCPQ.
1.7 With the rise of deepfake technology, any social account could be fake.
1.8 Andrew Walz was, according to his Twitter account and webpage, running for a congressional seat in Rhode Island. In reality, Mr. Walz does not exist and is the creation of a 17-year-old high-school student.
1.9 The original deepfake image is detected as 'fake' by the forensic system. However, after specially crafted imperceptible adversarial perturbations are added, the deepfake image, even though it looks the same, is detected as 'real'.
1.10 An attacker bypasses forensic systems with a seemingly legitimate fake persona profile, created by semantically modifying certain attributes of one source deepfake.
2.1 Architecture of an autoencoder, consisting of an encoder and a decoder.
2.2 Architecture of a Generative Adversarial Network, consisting of a generator and a discriminator [23].
2.3 The modeling of a simple GAN-based deepfake generator. The GAN generator takes a latent code z as input and outputs the deepfake image x.
2.4 Semantically modifying the attribute smile of a face image using the attribute vector Va = smile. The attribute vector is learned from the latent space using the method proposed in [24] (see the first code sketch after this list).
2.5 Spatial transformation adversarial attack on a CNN classifier. The classifier fails to classify these images after simple rotation and translation.
2.6 The creation of pixel-level adversarial examples, which uses gradient backpropagation to update the perturbations. The loss function here is the prediction score Fd(x) of the detector (sketched in code after this list).
2.7 The creation of semantic adversarial examples based on gradient backpropagation. Unlike pixel-level adversarial examples, the gradient is backpropagated to update a perturbation δ, which is added directly to the original latent code z.
3.1 Two-phase approach illustration. Phase 1: semantically modifying the original deepfake x along the target attributes to create x′ = G(z + αVA). Phase 2: adding a pixel-level adversarial perturbation σ to create the anti-forensic deepfake x′ + σ.
3.2 Gradient back-propagation step of the Semantic Aligned Gradient Descent approach, where a perturbation δ is added to the latent code z and updated by gradient descent. This step is similar to the semantic adversarial example attack.
3.3 Example of semantically aligning the perturbation δ into δ′, with the orthogonal threshold h⊥ and only one targeted attribute vector Va. In the case of two or more target attributes, the perturbation is projected onto the space spanned by the target attribute vectors (see the alignment sketch after this list).
3.4 An example of 1-point crossover in SA-EA. The first half of f and the second half of m are concatenated to create offspring c.
3.5 An example of average crossover in SA-EA. Offspring c is created by taking the average of f and m.
3.6 An example of random noise mutation in SA-EA. Chromosome c is mutated to chromosome c′ by adding noise uniformly sampled in the range ∆ (these operators are sketched in code after this list).
4.1 Retraining the deepfake detector with the addition of semantic attack images.
4.2 Illustration of the Naive Max-pooling defense, where the m images of a profile are fed into the detector D to get m corresponding prediction scores. The m prediction scores are then fed through a max-pooling layer to get the overall score of the profile.
4.3 Illustration of the Feature Max-pooling defense, where the m images of a profile are fed into the CNN layers of the detector and then into a max-pooling layer to get the profile feature vector. Lastly, the profile feature vector is fed into the fc layer to get the prediction (both defenses are sketched in code after this list).
5.1 Two-phase white-box ASR: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes).
5.2 The ASR of SA-GD white-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes).
5.3 The attack success rate of SA-EA black-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes).
5.4 The ASR of the SA-GD white-box, SA-EA black-box, and grid-search approaches given the same h⊥ (average value across target attributes).
5.5 FID_CelebA score (smaller is better) of each attack approach against the original and defense detectors. The red dashed line shows the FID_CelebA value of the StyleGAN images generated from the input latent codes.
5.6 Two-phase approach: samples of inputs and corresponding outputs with different target attributes (ϵ = 0.25). Inputs are predicted 'fake' while outputs are predicted 'real'.
5.7 SA-GD approach: samples of inputs and corresponding outputs with different values of the orthogonal threshold h⊥. Besides the target attribute age, other attributes such as smile, pose, hairstyle, and background are sometimes changed, more often and more intensely as the orthogonal threshold h⊥ increases.
5.8 The P-ASR of the two-phase approach: (a) against the Naive Max-pooling strategy with different ϵ; (b) Naive Max-pooling vs. Feature Max-pooling strategy where ϵ = 0.2 (m is the number of images in a profile).
5.9 Exaggerated examples of how a larger perturbation affects visual quality: the two-phase approach generates noisier images, while SA-GD/SA-EA output changes in non-target attributes.
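To make the latent-space edit of Figures 2.3 and 2.4 concrete, here is a minimal Python sketch. The toy generator G, its fixed random weights, and the attribute vector v_a are placeholders introduced purely for illustration; the thesis works with a pretrained StyleGAN2 generator and attribute vectors learned from the latent space as in [24].

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 512
W = rng.standard_normal((LATENT_DIM, 64 * 64))   # fixed weights of the toy generator

def G(z: np.ndarray) -> np.ndarray:
    """Toy stand-in for a GAN generator: latent code -> 64x64 'image' (Fig. 2.3)."""
    return np.tanh(z @ W).reshape(64, 64)

z = rng.standard_normal(LATENT_DIM)              # latent code of the source deepfake
v_a = rng.standard_normal(LATENT_DIM)            # placeholder attribute vector V_a
v_a /= np.linalg.norm(v_a)                       # (a real V_a is learned, not random)

x = G(z)                                         # original deepfake x = G(z)
alpha = 2.0                                      # edit strength along the attribute
x_edit = G(z + alpha * v_a)                      # edited image x' = G(z + alpha * V_a)
```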
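The pixel-level attack of Figure 2.6 can be sketched as a small projected-gradient loop, assuming white-box access to a differentiable detector. The tiny CNN below is a stand-in of my own, not the forensic model evaluated in the thesis, and epsilon and the step size are illustrative.

```python
import torch
import torch.nn as nn

detector = nn.Sequential(                      # stand-in F_d: image -> P('fake')
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1), nn.Sigmoid(),
)
for p in detector.parameters():                # we attack the input, not the model
    p.requires_grad_(False)

x = torch.rand(1, 3, 64, 64)                   # deepfake image, flagged 'fake'
sigma = torch.zeros_like(x, requires_grad=True)
epsilon, step = 0.05, 0.01                     # illustrative budget and step size

for _ in range(20):                            # simple PGD-style loop
    score = detector(x + sigma).sum()          # detector score F_d(x + sigma)
    score.backward()                           # gradient w.r.t. the perturbation
    with torch.no_grad():
        sigma -= step * sigma.grad.sign()      # push the score toward 'real'
        sigma.clamp_(-epsilon, epsilon)        # keep the change imperceptible
    sigma.grad = None

x_adv = (x + sigma).detach().clamp(0, 1)       # anti-forensic image x + sigma
```

In the two-phase approach of Figure 3.1, essentially this same loop is applied to the semantically edited image x′ = G(z + αVA) to produce the final x′ + σ.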
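Figures 2.7, 3.2, and 3.3 describe perturbing the latent code itself and then semantically aligning the perturbation. The alignment below is one plausible reading of the captions, not the thesis's exact rule: δ is split into a component inside the span of the target attribute vectors and an orthogonal remainder, and the remainder's norm is capped at the threshold h⊥. The attribute vector here is again a random placeholder.

```python
import numpy as np

def semantic_align(delta, attr_vectors, h_perp):
    """Project delta toward span(attr_vectors), keeping at most h_perp
    of the orthogonal (off-target) component by norm."""
    V = np.stack(attr_vectors, axis=1)        # columns span the target space
    Q, _ = np.linalg.qr(V)                    # orthonormal basis of that span
    parallel = Q @ (Q.T @ delta)              # component along target attributes
    orth = delta - parallel                   # off-target component
    orth_norm = np.linalg.norm(orth)
    if orth_norm > h_perp:                    # cap off-target drift at h_perp
        orth *= h_perp / orth_norm
    return parallel + orth                    # aligned perturbation delta'

rng = np.random.default_rng(1)
delta = rng.standard_normal(512)              # raw latent perturbation (from SGD)
v_age = rng.standard_normal(512)              # placeholder attribute vector
aligned = semantic_align(delta, [v_age], h_perp=0.5)
```

A larger h⊥ allows more off-target latent drift, which matches the caption of Figure 5.7: non-target attributes change more often and more intensely as h⊥ increases.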
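The SA-EA variation operators of Figures 3.4 to 3.6 translate directly into code. The sketch below shows only these operators on latent-code chromosomes, with an illustrative mutation range; fitness evaluation (the detector score), selection, and the semantic-alignment step are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

def one_point_crossover(f, m):
    """Figure 3.4: first half of f concatenated with second half of m."""
    cut = len(f) // 2
    return np.concatenate([f[:cut], m[cut:]])

def average_crossover(f, m):
    """Figure 3.5: offspring is the element-wise average of both parents."""
    return (f + m) / 2.0

def random_noise_mutation(c, delta_range=0.1):
    """Figure 3.6: add noise drawn uniformly from [-delta_range, delta_range]."""
    return c + rng.uniform(-delta_range, delta_range, size=c.shape)

f, m = rng.standard_normal(512), rng.standard_normal(512)
child = random_noise_mutation(average_crossover(f, m))
```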
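The two profile-level defenses of Figures 4.2 and 4.3 differ only in where the max-pooling over the m images of a persona happens: over the m prediction scores, or over the m feature vectors before the fc head. A sketch, with a stand-in backbone rather than the thesis's forensic CNN:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(                      # feature extractor of the detector
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
fc = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())   # classification head

profile = torch.rand(5, 3, 64, 64)        # m = 5 images of one fake persona

# Naive Max-pooling (Fig. 4.2): score every image, then take the maximum,
# so a single confidently-fake image flags the whole profile.
naive_score = fc(cnn(profile)).max()

# Feature Max-pooling (Fig. 4.3): max-pool the m feature vectors first,
# then classify the pooled profile feature once.
profile_feature = cnn(profile).max(dim=0).values    # element-wise max over images
feature_score = fc(profile_feature.unsqueeze(0))
```

Pooling at the feature level lets the head see evidence aggregated across the whole profile, which is how these defenses exploit the multi-image nature of a fake persona noted in the abstract.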
