Fooling Deepfake Detectors with Fake Personas Using Semantic Adversarial Examples
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MASTER THESIS

Fooling Deepfake detectors with fake personas using semantic adversarial examples

NGUYEN HONG NGOC
Ngoc.NH202706M@sis.hust.edu.vn
School of Information and Communication Technology

Supervisor: Assoc. Prof. Huynh Thi Thanh Binh
Institution: School of Information and Communication Technology
Co-supervisor: Prof. Yew Soon Ong
Institution: Nanyang Technological University, Singapore

May 19, 2022

Graduation Thesis Assignment

Name: Nguyen Hong Ngoc
Phone: +84947265498
Email: Ngoc.NH202706M@sis.hust.edu.vn; ngocnguyen.nd97@gmail.com
Class: 20BKHDL-E
Affiliation: Hanoi University of Science and Technology

I, Nguyen Hong Ngoc, hereby warrant that the work and presentation in this thesis were performed by myself under the supervision of Assoc. Prof. Huynh Thi Thanh Binh and Prof. Yew Soon Ong. All the results presented in this thesis are truthful and are not copied from any other works. All references in this thesis, including images, tables, figures, and quotes, are clearly and fully documented in the bibliography. I take full responsibility for any copied content that violates school regulations.

Student (signature and name): Nguyen Hong Ngoc

Acknowledgement

This Master thesis would not have been possible without the support of many people. First of all, I would like to acknowledge and give my warmest thanks to my supervisor, Assoc. Prof. Huynh Thi Thanh Binh, who has given me a lot of motivation to complete this work. I also thank Prof. Yew Soon Ong, Doctor Alvin Chan, and especially Doctor Nguyen Thi My Binh for being wonderful mentors and for all their support; I could not have made it without your help and guidance. I would also like to thank my committee members for their thoughtful comments and suggestions. I would also like to give special thanks to my wife, Pham Diem Ngoc, and my whole family for their moral support during my thesis writing process; you truly mean the world to me.
Furthermore, without my friends Vinh Tong, Quang Thang, Minh Tam, Thanh Dat, and Trung Vu, I could hardly have relieved the tension of my work. Thank you for always accompanying me through the ups and downs.

Finally, this work was funded by Vingroup and supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2020.ThS.BK.06. I enormously appreciate the financial support from Vingroup, which allowed me to stay focused on my research without worrying about financial burdens.

Abstract

Recent advances in deep generative modeling techniques such as Generative Adversarial Networks (GANs) make it possible to synthesize high-quality media content (including images, videos, and sounds). This content, collectively known as deepfakes, can be very difficult to distinguish from real content because of its realistic appearance and high resolution. The initial purpose of synthesizing media content was to provide more examples for training deep models, thereby improving the performance and robustness of these models. Nowadays, however, deepfakes are also abused for many cybercrimes such as fake personas, online fraud, misinformation, and producing media featuring people without their consent. Deepfakes have become an emerging threat to human life in the age of social networks.

To fight and prevent these abuses, forensic systems able to detect synthetic content have recently been extensively studied by the research community. At the same time, anti-forensic deepfakes are being investigated to understand the gaps in these detection systems and pave the way for improvement. In this Master thesis, I investigate the threat of anti-forensic fake personas built with semantic adversarial examples, where a fraudster creates a fake personal profile from multiple anti-forensic deepfakes portraying a single identity. To comprehensively study this threat model, three approaches that an attacker may use to conduct such attacks are
considered, encompassing both white- and black-box scenarios. A range of defense strategies is then proposed with the aim of improving the robustness of current forensic systems against such threats. Experiments show that while the attacks can bypass current detection, the proposed defense approaches that consider the multi-image nature of a fake persona can effectively mitigate this threat by lowering the attack success rate. The results of this thesis can help strengthen the defense in the fight against the many cybercrimes that utilize deepfakes.

Student (signature and name): Nguyen Hong Ngoc

TABLE OF CONTENTS

CHAPTER 1. INTRODUCTION
1.1 Deepfake
1.2 Applications of deepfake
1.2.1 Image editing
1.2.2 Digital cinematic actors
1.2.3 Generating training examples
1.3 Deepfake abuses
1.3.1 Disinformation
1.3.2 Fake personas/identities
1.4 Forensic and anti-forensic deepfake
1.5 Research challenge: Anti-forensic deepfake personas
1.6 Motivations
1.7 Thesis methodology
1.8 Contributions
1.9 Thesis organization

CHAPTER 2. BACKGROUND
2.1 Deepfake generators
2.1.1 Autoencoder
2.1.2 Generative Adversarial Networks
2.2 Semantic modification for GAN
2.3 Deepfake forensic systems
2.4 Attacks to deepfake forensic systems
2.4.1 Spatial transformations
2.4.2 Pixel-level adversarial examples
2.4.3 Semantic adversarial examples

CHAPTER 3. ANTI-FORENSIC FAKE PERSONA ATTACK
3.1 Problem modeling
3.2 White-box approaches
3.2.1 Two-phases approach
3.2.2 Semantic Aligned Gradient Descent approach
3.3 Black-box approach
3.3.1 Introduction to Evolutionary Algorithms
3.3.2 Semantic Aligned Evolutionary Algorithm

CHAPTER 4. DEFENSES AGAINST ANTI-FORENSIC FAKE PERSONAS
4.1 Defense against Single-image Semantic Attack task
4.2 Defenses against anti-forensic fake persona attack
4.2.1 Naive Pooling defense
4.2.2 Feature Pooling defense

CHAPTER 5. EXPERIMENT RESULTS AND ANALYSIS
5.1 Experiment setup
5.1.1 General setup
5.1.2
Hyper-parameters setting
5.2 Single-image Semantic Attack task evaluation
5.2.1 Baseline
5.2.2 Two-phases white-box approach evaluation
5.2.3 SA-GD white-box approach evaluation
5.2.4 SA-EA black-box approach evaluation
5.2.5 Comparison between the approaches for SiSA
5.2.6 Visual quality evaluation
5.2.7 Computational time evaluation
5.3 Anti-forensic fake persona attack evaluation
5.4 Discussions
5.4.1 Visual quality trade-off between approaches
5.4.2 Query-based defenses
5.4.3 Ethical discussions

CHAPTER 6. CONCLUSION AND FUTURE WORKS
6.1 Contributions
6.2 Limitations and future works

LIST OF FIGURES

1.1 Examples of deepfake images from the website thispersondoesnotexist.com. These images are generated by StyleGAN2 [2].
1.2 Barack Obama deepfake video created from a random source video.
1.3 The four types of face manipulation in deepfake.
1.4 Popular FaceApp filters, which use deepfake technology to edit images in various ways, such as an older self, cartoon style, added facial hair, or a swapped gender.
1.5 CGI in the movie Rogue One to recreate a young Princess Leia, later improved with deepfakes by fans.
1.6 Deepfake video of Donald Trump aired by Fox affiliate KCPQ.
1.7 With the rise of deepfake technology, any social account could be fake.
1.8 Andrew Walz was, according to his Twitter account and webpage, running for a congressional seat in Rhode Island. In reality, Mr. Walz does not exist and is the creation of a 17-year-old high-school student.
1.9 The original deepfake image is detected as ‘fake’ by the forensic system. However, after adding specially crafted imperceptible adversarial perturbations, the deepfake image, even though it looks the same, is detected as ‘real’.
1.10 An attacker bypasses forensic systems with a seemingly legitimate fake persona profile, created by semantically modifying certain attributes of one source deepfake.
2.1 Architecture of an autoencoder, which includes an encoder and a decoder.
2.2 Architecture of a Generative
Adversarial Network, which includes a generator and a discriminator [23].
2.3 The modeling of a simple GAN-based deepfake generator. The GAN generator takes a latent code z as input and outputs the deepfake image x.
2.4 Semantically modifying the attribute smile of a face image using the attribute vector Va = smile. The attribute vector is learned from the latent space using the method proposed in [24].
2.5 Spatial transformation adversarial attack on a CNN classifier. The classifier fails to classify these images after simple rotation and translation.
2.6 The creation of pixel-level adversarial examples, which uses gradient backpropagation to update the perturbations. The loss function here is the prediction score Fd(x) of the detector.
2.7 The creation of semantic adversarial examples based on gradient backpropagation. Unlike pixel-level adversarial examples, the gradient is backpropagated to update the perturbation δ, which is added directly to the original latent code z.
3.1 Two-phases approach illustration. Phase 1: semantically modifying the original deepfake x along the target attributes to create x′ = G(z + αVA). Phase 2: adding a pixel-level adversarial perturbation σ to create the anti-forensic deepfake x′ + σ.
3.2 Gradient back-propagation step of the Semantic Aligned Gradient Descent approach, where a perturbation δ is added to the latent code z and updated by gradient descent. This step is similar to the semantic adversarial example attack.
3.3 Example of semantically aligning the perturbation δ into δ′, with the orthogonal threshold h⊥ and only one target attribute vector Va. In the case of two or more target attributes, the perturbation is projected onto the space spanned by the target attribute vectors.
3.4 An example of 1-point crossover in SA-EA. The first half of f and the second half of m are concatenated to create offspring c.
3.5 An example of average crossover in SA-EA. Offspring c is created by taking the average of f and m.
3.6 An example of random noise mutation in SA-EA.
Chromosome c is mutated to chromosome c′ by adding noise uniformly sampled in the range ∆.
4.1 Retraining the deepfake detector with the addition of semantic attack images.
4.2 Illustration of the Naive Max-pooling defense, where the m images of the profile are fed into the detector D to get m corresponding prediction scores. The m prediction scores are then fed through a max-pooling layer to get the overall score of the profile.
4.3 Illustration of the Feature Max-pooling defense, where the m images of the profile are fed into the CNN layers of the detector and then into a max-pooling layer to get the profile feature vector. Lastly, the profile feature vector is fed into the FC layer to get the prediction.
5.1 Two-phases white-box ASR: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes).
5.2 The ASR of SA-GD white-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes).
5.3 The attack success rate of SA-EA black-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes).
5.4 The ASR of SA-GD white-box, SA-EA black-box, and grid-search approaches given the same h⊥ (average value across target attributes).
5.5 FIDCelebA score (smaller is better) of each attack approach against the original and defense detectors. The red dashed line shows the FIDCelebA value of the StyleGAN-generated images from the input latent codes.
5.6 Two-phases approach: samples of inputs and corresponding outputs with different target attributes (ϵ = 0.25). Inputs are predicted ‘fake’ while outputs are predicted ‘real’.
5.7 SA-GD approach: samples of inputs and corresponding outputs with different values of the orthogonal threshold h⊥. Besides the target attribute age, other attributes such as smile, pose,
hairstyle and background are sometimes changed, more often and more intensely as the orthogonal threshold h⊥ increases.
5.8 The P-ASR of the two-phases approach: (a) against the Naive Max-pooling strategy with different ϵ; (b) Naive Max-pooling vs. Feature Max-pooling strategy where ϵ = 0.2 (m is the number of images in a profile).
5.9 Exaggerated examples of how larger perturbations affect the visual quality: the two-phases approach generates noisier images, while SA-GD/SA-EA output changes in non-target attributes.

5.2.1 Baseline

To ensure the fairness of the experiments and the validity of the results, this part establishes some baselines. These baselines serve as evidence that both the original detector and the defense detector work properly before any attacks are made.

a, Evaluation of the baseline defense detector

In this experiment, the retrained defense detector is evaluated to test whether it still functions as a real/fake classifier. If the retrained detector detects adversarial examples well but can no longer perform real/fake classification, the defense is pointless.

                     Defense             Original
Test set             Acc.     AP         Acc.     AP
ProGAN               99.9%    100.0%     100%     100%
StyleGAN             80.8%    99.0%      77.5%    99.3%
BigGAN               60.3%    85.5%      59.5%    90.4%
CycleGAN             85.2%    96.8%      84.6%    97.9%
StarGAN              91.9%    98.0%      84.7%    97.5%
GauGAN               82.0%    97.5%      82.9%    98.8%
CRN                  88.9%    99.4%      97.8%    100.0%
IMLE                 97.7%    99.9%      98.8%    100.0%
SITD                 97.8%    99.8%      93.9%    99.6%
SAN                  50.0%    61.3%      50.0%    62.8%
Deepfake             50.3%    65.3%      50.4%    63.1%
StyleGAN2            77.2%    98.8%      72.4%    99.1%
Whichfaceisreal      93.3%    99.9%      75.2%    100.0%

Table 5.1: Comparison of the accuracy (Acc.) and the average precision (AP) between the defense and the original detector. Test sets are from [17] (no-crop evaluation).

The detector is evaluated with the same evaluation test as in the original paper [17]. Table 5.1 shows the result of the test, where the accuracy (Acc.)
and average precision (AP) results of the original detector are taken directly from [17]. The AP is quite similar between the two models, except for BigGAN, where the AP drops from 90.4% to 85.5%. For accuracy, after retraining there are cases where Acc drops (CRN, IMLE) and cases where Acc rises noticeably (StyleGAN, StarGAN, SITD, StyleGAN2, Whichfaceisreal). This can be explained as follows: the additional Dattack set used for retraining contains only face images, while the original training and test sets in [17] contain images of many different objects, which lowers the performance on some test sets. On the other hand, since the images of Dattack are all StyleGAN-generated, the Acc after retraining may rise when testing on a dataset generated with an architecture similar to StyleGAN (e.g., Whichfaceisreal [48] is generated from StyleGAN2).

b, Without-the-attack baseline

Without any attacks, the original detector classifies the 1,000 inputs with 99.6% accuracy (equivalent to a 0.4% failure rate), with an average prediction confidence of 98.7%. The defense detector retrained with StyleGAN correctly classifies all 1,000 inputs (100% accuracy), with an average prediction confidence of 99.5%. In summary, both detectors work as intended and classify these input deepfakes extremely well.

5.2.2 Two-phases white-box approach evaluation

The first experiment evaluates the two-phases white-box attack. Figure 5.1a shows the ASR of the two-phases attack against the original detector when using different target attributes. The two-phases attack has a very high ASR: even with the lowest value of ϵ tested, the ASR is still around 60%. The ASR reaches 90% when ϵ is 0.15 and 100% for larger values, showing that the detector is highly vulnerable to the two-phases attack. No significant difference is observed between the target attributes used in this test. This shows that the attack works stably regardless of the target
attribute selected.

[Figure 5.1: two plots of success rate Rsuccess versus PGD epsilon (0.10 to 0.35). Panel (a) legend, target attribute: pose, smile, age, gender. Panel (b) legend, target detector: original, defense.]

Figure 5.1: Two-phases white-box ASR: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes).

Figure 5.1b shows the decrease in ASR of the two-phases attack after performing adversarial retraining on the detector. Against the defense detector, the two-phases attack requires at least ϵ = 0.3 to reach 90% ASR. Using a higher value of ϵ creates a noisier image, which lowers its visual quality. However, the overall ASR is still substantial, considering that the attacker may attack many times until successfully bypassing the detector. From this experiment, we can conclude that adversarial retraining weakens the two-phases attack to a certain extent but does not completely mitigate the threat.

5.2.3 SA-GD white-box approach evaluation

In this experiment, the SA-GD white-box approach is evaluated. Figure 5.2a shows the ASR of SA-GD white-box attacks against the original detector when using different target attributes. Overall, the ASR of SA-GD is rather high (greater than 50% in most cases, and close to 100% when h⊥ is 3.5), even though it is not as high as that of the two-phases attack. Nonetheless, the ASR of SA-GD is still sufficient for attacking real-life systems. Notice that the ASR is higher for greater values of h⊥, which is expected given the larger search space. As before, no significant difference is observed between the target attributes used in this test.

Figure 5.2b illustrates the ASR of SA-GD against the detector after performing the adversarial retraining defense. There is also a noticeable decrease in the ASR of SA-GD across all the
values of h⊥. Similar to the two-phases approach considered above, attackers can still bypass defense systems with the SA-GD attack, as they can perform the attack repeatedly.

[Figure 5.2: two plots of success rate Rsuccess versus orthogonal threshold h⊥ (1.0 to 3.5). Panel (a) legend, target attribute: pose, smile, age, gender. Panel (b) legend, target detector: original, defense.]

Figure 5.2: The ASR of SA-GD white-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes).

5.2.4 SA-EA black-box approach evaluation

Similar to the above experiments, this experiment tests the SA-EA black-box approach against the original and the defense detectors. Figure 5.3a shows the ASR of SA-EA against the original detector with different target attributes. Figure 5.3b illustrates the ASR against the detector after performing the retraining defense. In general, the SA-EA black-box attack shows an outcome similar to that of the white-box approaches, except for the overall lower ASR. The ASR of the attack after retraining is fairly low (below 20% even with the highest value of the orthogonal threshold). In a real attack, the attacker may need many attempts to get a successful attack, and many successful attacks with different target attributes to generate enough images for a fake personal profile. Therefore, with the current ASR, we can safely say that it would be quite challenging to attack with the SA-EA black-box approach after the adversarial retraining defense.

[Figure 5.3: two plots of success rate Rsuccess versus orthogonal threshold h⊥ (1.0 to 3.5). Panel (a) legend, target attribute: pose, smile, age, gender. Panel (b) legend, target detector: original, defense.]

Figure 5.3: The attack success rate of SA-EA black-box: (a) against the original detector with different target attributes; (b) against
the original and defense detectors (average value across target attributes).

5.2.5 Comparison between the approaches for SiSA

The next experiment compares the proposed approaches for the SiSA task under the same constraint. Since the two-phases approach does not share the orthogonal threshold constraint h⊥ with SA-GD and SA-EA, it is not included in this experiment. To give a better perspective, two naive grid-search black-box approaches are also implemented as baselines:

• 1-dim grid-search: perform the grid search along the target attribute vector.
• Multi-dim grid-search: perform the grid search along the target attribute vector, with the addition of random orthogonal perturbations within a threshold h⊥ at each grid step.

Figure 5.4 shows the ASR of SA-GD white-box attacks, SA-EA black-box attacks, 1-dim grid-search, and Multi-dim grid-search against the original detector with different orthogonal thresholds. Both the SA-GD white-box and SA-EA black-box semantic attacks easily outperform the naive grid-search baselines. Between the two proposed approaches, the SA-GD white-box attack has a noticeably higher ASR than the SA-EA black-box attack. This is a common and expected outcome, considering that white-box attacks have more access to and more information about the detector.

[Figure 5.4: plot of success rate Rsuccess versus orthogonal threshold h⊥ (1.0 to 3.5). Legend: SA-GD White-box, SA-EA Black-box, 1-dim grid-search, Multi-dim grid-search.]

Figure 5.4: The ASR of SA-GD white-box, SA-EA black-box, and grid-search approaches given the same h⊥ (average value across target attributes).

5.2.6 Visual quality evaluation

In this experiment, the visual quality of the attack examples from the SiSA task is evaluated in terms of image quality and the correctness of semantic changes.

a, Adversarial images quality

The adversarial image quality is evaluated with the FIDCelebA score in this experiment. Figure 5.5
illustrates the average FIDCelebA score of the output adversarial images for each attack approach against the original and defense detectors. As a baseline, the FIDCelebA value of the images generated by StyleGAN from the 1,000 input latent codes is calculated and shown in Figure 5.5 as the red dashed line. Note that the FIDCelebA of the StyleGAN-generated images in this research is not equivalent to the FID reported in the StyleGAN paper [6], since the set here contains only 1,000 StyleGAN images. Because a single reference set of real CelebA-HQ images is used to calculate the FID, this value can serve as a baseline for evaluating the visual quality of adversarial images relative to the original StyleGAN-generated images.

We can observe in Figure 5.5 that the FIDCelebA score of all approaches is higher than the StyleGAN baseline, which suggests that the visual quality partially degrades after the attacks. The FIDCelebA against the defense detector is also higher than against the original detector. This indicates that the robustness of the detector against semantic attacks has improved, making it more challenging to find adversarial examples that fool the detector and yet have good visual quality. Among the three approaches, the two-phases attack achieves the best FIDCelebA, since it only adds imperceptible noise to the images. SA-GD and SA-EA, on the other hand, interfere with the generation process of StyleGAN and thus yield a worse FIDCelebA score. The FIDCelebA scores of SA-GD white-box and SA-EA black-box are also very close to each other, likely because both approaches perform edits in the latent space.

[Figure 5.5: plot of average FIDCelebA (axis range 20 to 60) for each attack approach (Two phases, SA-GD, SA-EA) against the original and defense detectors, with the base StyleGAN value as a reference line.]

Figure 5.5: FIDCelebA score (smaller is better) of each attack approach against the original and defense detectors. The red dashed line shows the FIDCelebA value of the StyleGAN-generated images from the input latent codes.

b, Semantic
changes correctness evaluation

Figure 5.6: Two-phases approach: samples of inputs and corresponding outputs with different target attributes (ϵ = 0.25). Inputs are predicted ‘fake’ while outputs are predicted ‘real’.

Figure 5.6 shows visual examples of inputs and corresponding outputs when performing the two-phases approach with different target attributes. In these examples, while in general only the target attribute is modified, there are slight changes in non-target semantic attributes such as hairstyle and skin. This highlights the challenge of completely disentangling semantic attributes in modern image generators, a consequence of a highly abstract and entangled latent space.

Figure 5.7: SA-GD approach: samples of inputs and corresponding outputs with different values of the orthogonal threshold h⊥. Besides the target attribute age, other attributes such as smile, pose, hairstyle and background are sometimes changed, more often and more intensely as the orthogonal threshold h⊥ increases.

Figure 5.7 shows examples of inputs and outputs when performing the SA-GD approach with the target attribute age and different values of h⊥. In most cases, changes are largely limited to the target semantic attribute, which is age. As the orthogonal threshold increases, other attributes such as smile, pose, hairstyle and background are increasingly altered as well. This shows a trade-off between attack success rate and off-target semantic changes. Depending on what is required from the SiSA attacks (e.g., the attacks must change only a specific attribute, or they just need to be as strong as possible), a suitable orthogonal threshold can be selected. In the experiments performed in this thesis, h⊥ = 2.0 is a well-balanced threshold, since it already achieves a success rate of around 75% with minimal off-target changes.

5.2.7 Computational time evaluation

The two-phases white-box attack takes, on
average, 7.2 seconds per attack while achieving a 90% success rate. The SA-GD approach takes on average 47.8 seconds per attack to achieve a similar success rate. The SA-EA black-box approach takes the longest time per attack (157.5 seconds on average) but achieves at most a 50% success rate. The computational time shows how much effort each approach needs to accomplish a successful attack. This result clearly shows the advantage of the two-phases approach over the SA-GD approach.

5.3 Anti-forensic fake persona attack evaluation

In the experiments above, I evaluated the ASR of the different SiSA approaches, each of which generates an individual attack sample. However, an anti-forensic fake persona attack requires performing the SiSA task repeatedly to acquire enough deepfakes to form the profile. In this section, the ASR of anti-forensic fake persona attacks is evaluated. To distinguish this success rate from the success rate of SiSA in the previous section, let P-ASR denote the fake persona attack success rate.

Here, the Naive Max-pooling and Feature Max-pooling strategies (introduced in Section 4.2) are applied to perform fake persona profile detection, as the defense detector alone does not support profile validation. Furthermore, the two-phases approach is used for this evaluation, since this attack has the highest ASR among all approaches.

[Figure 5.8: two plots of success rate Rsuccess. Panel (a): versus PGD epsilon (0.10 to 0.35), legend m=1, m=2, m=4, m=8. Panel (b): versus number of images m, legend Feature Max-pooling, Naive Max-pooling.]

Figure 5.8: The P-ASR of the two-phases approach: (a) against the Naive Max-pooling strategy with different ϵ; (b) Naive Max-pooling vs. Feature Max-pooling strategy where ϵ = 0.2 (m is the number of images in a profile).

Figure 5.8a shows the P-ASR of the two-phases approach against the defense detector with the Naive Max-pooling strategy applied, when changing the number of images in a profile
m. Note that the case where m = 1 is equivalent to the two-phases attack against the defense detector without Naive Max-pooling (tested in Section 5.2.2, Figure 5.1b). When the persona profile of interest contains more images, the Max-pooling defense tends to detect better. This indicates that if attackers want to put more deepfake images into the fake profile (to make the profile look more legitimate), they have to accept a lower chance of bypassing the forensic system. With the Naive Max-pooling strategy alone, the P-ASR does decrease, but not by much.

Next, the Feature Max-pooling strategy is tested. Figure 5.8b compares the Naive Max-pooling and Feature Max-pooling strategies as the value of m is varied. The Feature Max-pooling strategy detects much better than the Naive Max-pooling strategy for every value of m tested. The P-ASR drops by 20% to 25% with the Feature Max-pooling strategy applied. This highlights the importance of designing forensic methods that approach the threat in a more comprehensive manner for stronger performance.

5.4 Discussions

5.4.1 Visual quality trade-off between approaches

Against stronger and better-trained detectors, adversarial attacks usually need larger perturbations to achieve a high attack success rate, which lowers the visual quality of the attack examples. For different SiSA approaches, the visual quality may be affected in very different ways.

Figure 5.9: Exaggerated examples of how larger perturbations affect the visual quality: the two-phases approach generates noisier images, while SA-GD/SA-EA output changes in non-target attributes.

Figure 5.9 illustrates how the different approaches affect the visual quality in different ways when the detector is strengthened. Note that the perturbations in the figure are exaggerated to show possible outcomes when
attacking better-trained detectors. For the two-phases approach, more perturbation translates to visibly noisier images. The SA-GD/SA-EA approaches, on the other hand, can still generate relatively clean images; however, these images have more noticeable changes in off-target semantic attributes as the orthogonal threshold increases. This highlights the trade-off between visual noise and off-target semantic changes faced by the different approaches.

In summary, even though the two-phases approach has the highest performance among the three proposed approaches, it is not always the go-to choice. One may prefer the SA-GD approach for the cleaner look of its attack examples, or the SA-EA approach in a black-box attack scenario. Each approach has its own advantages over the others; depending on the requirements of the fake persona attack, the attacker may select the most suitable approach.

5.4.2 Query-based defenses

From the experiments in this thesis, most approaches for attacking deepfake forensic systems need to query the system multiple times before successfully bypassing it. Furthermore, the queries made by these attacks tend to converge to a single instance eventually. Based on this knowledge, certain defense techniques can be proposed, such as:

• Continuous Query Detecting: based on the work by Chen et al. [49]. This defense method detects that the system is under a black-box attack by recording the recent queries made to the detector and storing them in a buffer. When a new query arrives, its distance to the existing queries in the buffer is calculated. If too many queries in the buffer are close to the new query, an alert is raised that the forensic system may be under attack.
• Experience-based: this defense records the queries that are detected as ‘fake’ by the detector. When a new query arrives that is detected as ‘real’, the system checks whether the query is close to queries previously detected as ‘fake’. If it is, the system will mark
the query as ‘potentially dangerous’ and use a higher real/fake threshold to decide the output of the prediction.

With these query-based defenses, the robustness of forensic systems against fake persona attacks is expected to improve. However, due to limited resources, this thesis only discusses these techniques in theory and does not study them empirically.

5.4.3 Ethical discussions

While the attack approaches discussed in this thesis may raise concerns about malicious actors abusing them, I believe it is more beneficial overall that the findings here (especially the defenses) are shared with the research community. With this thesis, I hope to facilitate research on building more robust forensic systems and to contribute to the fight against deepfake abuse and cybercrime.

CHAPTER 6. CONCLUSION AND FUTURE WORKS

6.1 Contributions

In this Master thesis, I have comprehensively investigated a novel threat model in which attackers fabricate a convincing anti-forensic deepfake persona backed by specially crafted semantic adversarial examples. Three attack approaches, covering both white-box and black-box scenarios, are proposed to edit only targeted semantic attributes of a source deepfake image while simultaneously bypassing the detector. To counter this threat, different defense strategies are also proposed and discussed to enhance the detection of deepfake personas. In the experiments, current forensic systems are shown to be highly vulnerable to fake persona attacks, while the aforementioned defenses can mitigate the threat by aggregating predictions over the images of the fake profile. With the findings in this thesis, I hope to raise awareness of the threat of fake persona cybercrimes and facilitate research on more robust defenses.

The results achieved in this Master thesis have been submitted to the following international conference: Anti-Forensic
Deepfake Personas and How To Spot Them. Nguyen Hong Ngoc, Alvin Chan, Huynh Thi Thanh Binh, Yew Soon Ong. IJCNN, 2022 IEEE World Congress on Computational Intelligence (accepted for publication).

Publication during the Master program: “A family system based evolutionary algorithm for obstacles-evasion minimal exposure path problem in Internet of Things”, Nguyen Thi My Binh, Nguyen Hong Ngoc, Huynh Thi Thanh Binh, Nguyen Khanh Van, Shui Yu, Expert Systems With Applications, p. 116943, 2022, ISSN: 0957-4174 (Q1 journal, IF: 6.954). The evolutionary algorithm designed for the SA-EA approach in this thesis is partially inspired by the evolutionary algorithm in that work.

6.2 Limitations and future works

The experiments show that, although the defenses reduce the attack success rate to a certain extent, they do not completely neutralize the threat, because attackers may attack the forensic system repeatedly. This reveals a gap in current deepfake detection techniques, as well as a limitation in the fight against deepfake abuse and cybercrime. In future work, the research team will continue to investigate more robust deepfake detection techniques, and will consider combining several of them to make the forensic system stronger and better at detecting fake persona profiles.

REFERENCES

[1] “Deepfake,” Wikipedia, 2021, https://en.wikipedia.org/wiki/Deepfake
[2] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of stylegan,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8110–8119.
[3] “Watch jordan peele use AI to make barack obama deliver a psa about fake news,” The Verge, 2021, https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed (accessed Sep. 1, 2021).
[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial nets,”
in Advances in neural information processing systems, 2014, pp. 2672–2680.
[5] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.
[6] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2019, pp. 4401–4410.
[7] D. Güera and E. J. Delp, “Deepfake video detection using recurrent neural networks,” in 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), IEEE, 2018, pp. 1–6.
[8] Y. Li and S. Lyu, “Exposing deepfake videos by detecting face warping artifacts,” arXiv preprint arXiv:1811.00656, 2018.
[9] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman, “Synthesizing obama: Learning lip sync from audio,” ACM Transactions on Graphics (TOG), vol. 36, no. 4, pp. 1–13, 2017.
[10] T. Chen, A. Kumar, P. Nagarsheth, G. Sivaraman, and E. Khoury, “Generalization of audio deepfake detection,” in Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, 2020, pp. 132–137.
[11] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and J. Ortega-Garcia, “Deepfakes and beyond: A survey of face manipulation and fake detection,” arXiv preprint arXiv:2001.00179, 2020.
[12] N. Carlini and H. Farid, “Evading deepfake-image detectors with white- and black-box attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 658–659.
[13] S. Agarwal, H. Farid, Y. Gu, M. He, K. Nagano, and H. Li, “Protecting world leaders against deep fakes,” in CVPR Workshops, 2019, pp. 38–45.
[14] “A high school student created a fake 2020 candidate. Twitter verified it,” Edition CNN, 2021, https://edition.cnn.com/2020/02/28/tech/fake-twitter-candidate-2020/index.html
[15] Y. Li, M.-C. Chang, and S. Lyu, “In ictu oculi: Exposing AI created fake videos by detecting eye blinking,” in 2018 IEEE International
Workshop on Information Forensics and Security (WIFS), IEEE, 2018, pp. 1–7.
[16] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, “Faceforensics++: Learning to detect manipulated facial images,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1–11.
[17] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, “Cnn-generated images are surprisingly easy to spot for now,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, https://github.com/peterwang512/CNNDetection (accessed Nov. 1, 2020), vol. 7, 2020.
[18] J. Stehouwer, H. Dang, F. Liu, X. Liu, and A. Jain, “On the detection of digital face manipulation,” arXiv preprint arXiv:1910.01717, 2019.
[19] J. Frank, T. Eisenhofer, L. Schönherr, A. Fischer, D. Kolossa, and T. Holz, “Leveraging frequency analysis for deep fake image recognition,” arXiv preprint arXiv:2003.08685, 2020.
[20] D. Li, W. Wang, H. Fan, and J. Dong, “Exploring adversarial fake images on face manifold,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5789–5798.
[21] W. Xu, S. Keshmiri, and G. Wang, “Adversarially approximated autoencoder for image generation and manipulation,” IEEE Transactions on Multimedia, vol. 21, no. 9, pp. 2387–2396, 2019.
[22] A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al., “Conditional image generation with pixelcnn decoders,” Advances in neural information processing systems, vol. 29, 2016.
[23] J. Feng, X. Feng, J. Chen, et al., “Generative adversarial networks based on collaborative learning and attention mechanism for hyperspectral image classification,” Remote Sensing, vol. 12, no. 7, p. 1149, 2020.
[24] Y. Shen, J. Gu, X. Tang, and B. Zhou, “Interpreting the latent space of gans for semantic face editing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, https://github.com/genforce/interfacegan (accessed Nov. 1, 2020), 2020, pp. 9243–9252.
[25] S. McCloskey and M. Albright,
“Detecting gan-generated imagery using color cues,” arXiv preprint arXiv:1812.08247, 2018.
[26] N. Yu, L. Davis, and M. Fritz, “Attributing fake images to gans: Analyzing fingerprints in generated images,” arXiv preprint arXiv:1811.08180, vol. 2, 2018.
[27] R. Wang, L. Ma, F. Juefei-Xu, X. Xie, J. Wang, and Y. Liu, “Fakespotter: A simple baseline for spotting ai-synthesized fake faces,” arXiv preprint arXiv:1909.06122, 2019.
[28] J. C. Neves, R. Tolosana, R. Vera-Rodriguez, V. Lopes, and H. Proença, “Real or fake? spoofing state-of-the-art face synthesis detection systems,” arXiv preprint arXiv:1911.05351, 2019.
[29] F. Marra, C. Saltori, G. Boato, and L. Verdoliva, “Incremental learning for the detection and classification of gan-generated images,” in 2019 IEEE International Workshop on Information Forensics and Security (WIFS), IEEE, 2019, pp. 1–6.
[30] L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry, “Exploring the landscape of spatial robustness,” in International Conference on Machine Learning, 2019, pp. 1802–1811.
[31] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017, pp. 39–57.
[32] N. Papernot, F. Faghri, N. Carlini, I. Goodfellow, et al., “Technical report on the cleverhans v2.1.0 adversarial examples library,” arXiv preprint arXiv:1610.00768, 2018.
[33] F. Croce and M. Hein, “Minimally distorted adversarial examples with a fast adaptive boundary attack,” arXiv preprint arXiv:1907.02044, 2019.
[34] P. Neekhara, B. Dolhansky, J. Bitton, and C. C. Ferrer, “Adversarial threats to deepfake detection: A practical perspective,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 923–932.
[35] A. Joshi, A. Mukherjee, S. Sarkar, and C. Hegde, “Semantic adversarial attacks: Parametric transformations that fool deep classifiers,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 4773–4783.
[36] C.-H. Ho, B. Leung, E. Sandstrom, Y. Chang,
and N. Vasconcelos, “Catastrophic child’s play: Easy to perform, hard to defend adversarial attacks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9229–9237.
[37] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
[38] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan, “Theoretically principled trade-off between robustness and accuracy,” arXiv preprint arXiv:1901.08573, 2019.
[39] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
[40] T. Back, Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford university press, 1996.
[41] M. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W. Chang, “Generating natural language adversarial examples,” arXiv preprint arXiv:1804.07998, 2018.
[42] N. T. M. Binh, N. H. Ngoc, H. T. T. Binh, N. K. Van, and S. Yu, “A family system based evolutionary algorithm for obstacle-evasion minimal exposure path problem in internet of things,” Expert Systems with Applications, p. 116943, 2022, ISSN: 0957-4174.
[43] A. J. Umbarkar and P. D. Sheth, “Crossover operators in genetic algorithms: A review,” ICTACT journal on soft computing, vol. 6, no. 1, 2015.
[44] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[45] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition, IEEE, 2009, pp. 248–255.
[46] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” in Advances in neural information processing systems, 2017, pp.
6626–6637.
[47] Z. Liu, P. Luo, X. Wang, and X. Tang, Large-scale celebfaces attributes (celeba) dataset, http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html, 2016 (accessed November 1, 2020).
[48] J. West and C. Bergstrom, Which face is real, https://www.whichfaceisreal.com/, 2019 (accessed November 1, 2020).
[49] S. Chen, N. Carlini, and D. Wagner, “Stateful detection of black-box adversarial attacks,” in Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, 2020, pp. 30–39.