
Fooling deepfake detection systems with fake personas through semantic image modification


DOCUMENT INFORMATION

Basic information

Title: Fooling Deepfake Detectors With Fake Personas Using Semantic Adversarial Examples
Author: Nguyen Hong Ngoc
Supervisors: Assoc. Prof. Huynh Thi Thanh Binh, Prof. Yew Soon Ong
University: Hanoi University of Science and Technology
Major: Information and Communication Technology
Document type: Thesis
Year of publication: 2022
City: Hanoi

Format

Pages: 63
File size: 1.72 MB

Structure

  • CHAPTER 1. INTRODUCTION
    • 1.1 Deepfake
    • 1.2 Applications of deepfake
      • 1.2.1 Image editing
      • 1.2.2 Digital cinematic actors
      • 1.2.3 Generating training examples
    • 1.3 Deepfake abuses
      • 1.3.1 Disinformation
      • 1.3.2 Fake personas/identities
    • 1.4 Forensic and anti-forensic deepfake
    • 1.5 Research challenge: Anti-forensic deepfake personas
    • 1.6 Motivations
    • 1.7 Thesis methodology
    • 1.8 Contributions
    • 1.9 Thesis organization
  • CHAPTER 2. BACKGROUND
    • 2.1 Deepfake generators
      • 2.1.1 Autoencoder
      • 2.1.2 Generative Adversarial Networks
    • 2.2 Semantic modification for GAN
    • 2.3 Deepfake forensic systems
    • 2.4 Attacks to deepfake forensic systems
      • 2.4.1 Spatial transformations
      • 2.4.2 Pixel-level adversarial examples
      • 2.4.3 Semantic adversarial examples
  • CHAPTER 3. ANTI-FORENSIC FAKE PERSONA ATTACK
    • 3.1 Problem modeling
    • 3.2 White-box approaches
      • 3.2.1 Two-phases approach
      • 3.2.2 Semantic Aligned Gradient Descent approach
    • 3.3 Black-box approach
      • 3.3.1 Introduction to Evolutionary Algorithms
      • 3.3.2 Semantic Aligned Evolutionary Algorithm
  • CHAPTER 4. DEFENSES AGAINST ANTI-FORENSIC FAKE PERSONAS
    • 4.1 Defense against Single-image Semantic Attack task
    • 4.2 Defenses against anti-forensic fake persona attack
      • 4.2.1 Naive Pooling defense
      • 4.2.2 Feature Pooling defense
  • CHAPTER 5. EXPERIMENT RESULTS AND ANALYSIS
    • 5.1 Experiment setup
      • 5.1.1 General setup
      • 5.1.2 Hyper-parameters setting
    • 5.2 Single-image Semantic Attack task evaluation
      • 5.2.1 Baseline
      • 5.2.2 Two-phases white-box approach evaluation
      • 5.2.3 SA-GD white-box approach evaluation
      • 5.2.4 SA-EA black-box approach evaluation
      • 5.2.5 Comparison between the approaches for SiSA
      • 5.2.6 Visual quality evaluation
      • 5.2.7 Computational time evaluation
    • 5.3 Anti-forensic fake persona attack evaluation
    • 5.4 Discussions
      • 5.4.1 Visual quality trade-off between approaches
      • 5.4.2 Query-based defenses
      • 5.4.3 Ethical discussions
  • CHAPTER 6. CONCLUSION AND FUTURE WORKS
    • 6.1 Contributions
    • 6.2 Limitations and future works
  • LIST OF FIGURES
    • 1.2 Barack Obama deepfake video created from a random source video
    • 1.3 The four types of face manipulation in deepfake
    • 1.4 Popular Faceapp filters, utilizing deepfake technology to edit images in various ways such as: older self, cartoon style, adding facial hair or swapping the gender
    • 1.5 CGI in Rogue One movie to recreate young princess Leia, later improved with deepfakes by fans
    • 1.6 Deepfake video of Donald Trump aired by Fox affiliate KCPQ
    • 1.7 With the rise of deepfake technology, any social account could be fake
    • 1.8 Andrew Walz was, according to his Twitter account and webpage, running for a congressional seat in Rhode Island. In reality, Mr. Walz does not exist and is the creation of a 17-year-old high-school student
    • 1.9 Original deepfake image is detected 'fake' by the forensic system. However, after adding specially crafted imperceptible adversarial perturbations, the deepfake image, even though it looks the same, is detected 'real'
    • 1.10 Attacker bypasses forensic systems with a seemingly legitimate fake persona profile, created by semantically modifying certain attributes of one source deepfake
    • 2.1 Architecture of an autoencoder, including an encoder and a decoder
    • 2.2 Architecture of a Generative Adversarial Network, including a generator and a discriminator
    • 2.3 The modeling of a simple GAN-based deepfake generator. The GAN generator takes latent code z as input and outputs the deepfake image x
    • 2.4 Semantically modifying the attribute smile of a face image using the attribute vector V_smile, learned from the latent space with the method proposed in [24]
    • 2.5 Spatial transformation adversarial attack on a CNN classifier. The classifier fails to classify these images after simple rotation and translation
    • 2.6 The creation of pixel-level adversarial examples, which uses gradient back-propagation to update the perturbations
    • 2.7 The creation of semantic adversarial examples based on gradient back-propagation. Different from pixel-level adversarial examples, the gradient is back-propagated to update the perturbation δ, which is added directly to the original latent code z
    • 3.1 Two-phases approach illustration. Phase 1: semantically modifying the original deepfake along the target attributes. Phase 2: adding pixel-level adversarial perturbation σ to create the anti-forensic deepfake
    • 3.2 Gradient back-propagation step of Semantic Aligned Gradient Descent approach, where a perturbation δ is added to the latent code z and updated by gradient descent
    • 3.3 Example of semantic aligning the perturbation δ into δ′, with the orthogonal threshold h⊥ and one target attribute vector V_a
    • 3.4 An example of 1-point crossover in SA-EA. The first half of f and the second half of m are concatenated to create offspring c
    • 3.5 An example of average crossover in SA-EA. Offspring c is created by taking the average of f and m
    • 3.6 An example of random noise mutation in SA-EA. Chromosome c is mutated to chromosome c′ by adding a noise uniformly sampled in range ∆
    • 4.1 Retraining the deepfake detector with the addition of semantic attack images
    • 4.2 Illustration of Naive Max-pooling defense, where m images of the profile ...
    • 4.3 Illustration of Feature Max-pooling defense, where m images of the profile ...
    • 5.2 The ASR of SA-GD white-box: (a) against original detector with different ...
    • 5.3 The attack success rate of SA-EA black-box: (a) against original detector ...
    • 5.4 The ASR of SA-GD white-box, SA-EA black-box and grid-search approaches
    • 5.5 FID CelebA score (smaller is better) of each attack approach against the ...
    • 5.6 Two-phases approach: samples of inputs and corresponding outputs
    • 5.7 SA-GD approach: samples of inputs and corresponding outputs with ...
    • 5.8 The P-ASR of two-phases approach: (a) against Naive Max-pooling strategy with different ϵ, (b) Naive Max-pooling vs Feature Max-pooling strategies
    • 5.9 Exaggerated examples of how larger perturbation affects the visual quality: two-phases approach generates noisier images while SA-GD/SA-EA ...

Content

INTRODUCTION

Deepfake

Originating from a Reddit user who shared synthetic fake pornography videos featuring the faces of celebrities, the term "deepfakes" refers to high-quality media content generated by deep-learning generative techniques. Even though the term has only been popular since 2019, image manipulation techniques were developed as far back as the 19th century and were mostly applied to motion pictures. The technology improved steadily during the 20th century, and more quickly with the invention of digital video [1]. Deepfake technology has been developed by researchers at academic institutions beginning in the 1990s, and later by amateurs in online communities. Over the last few years, deepfake has drastically improved in generation quality due to advances in graphical computational power and deep learning techniques.

Figure 1.1: Examples of deepfake images from the website thispersondoesnotexist.com. These images are generated from StyleGAN2 [2].

Nowadays, with the power of artificial intelligence and deep learning, the quality of deepfake synthetic content has been enhanced to a remarkably realistic level. For instance, in Figure 1.1, these two seemingly normal facial photos of two normal people turn out to be deepfake images, taken from the website thispersondoesnotexist.com. True to its name, these two people do not exist, since the images are generated completely at random by a computer, or more specifically, by a deep generative architecture called StyleGAN2 [2]. Even if we examine these images carefully, it is nearly impossible to tell any difference between these deepfake images and real ones. Not to mention that the resolution of these deepfakes is also remarkably high, with razor-sharp image quality.

Deepfake gained a lot of attention in 2018 when Jordan Peele and BuzzFeed cooperated to synthesize a fake PSA video delivered by Barack Obama [3] using deepfake technology. From an arbitrary source video of a person giving a random speech, deepfake can swap the face and the voice of the person with the face and voice of Barack Obama while the content of the speech is unchanged (Figure 1.2). Even though the deepfake video was supposed to be for entertainment purposes, the realism of its visual and audio content made many wonder about the safety of the technology and the possibility of abusing deepfake for cybercrimes.

Figure 1.2: Barack Obama deepfake video created from a random source video.

Deepfake comes in many forms, from the most common form of images [4]–[6] to videos [7]–[9] and even audio deepfakes [9], [10]. The Barack Obama deepfake video mentioned above (Figure 1.2) is an example that combines all of these forms together. Among the subjects of deepfakes, the most widely studied is human facial deepfake, as it can be used for many applications. Within the field of human facial deepfake, there are four common types of face manipulation techniques (Figure 1.3) [11]:

• Entire face synthesis: Refers to the case where an entire facial image is generated/synthesized by computer techniques. The face image is synthesized from a random seed and usually belongs to a non-existent person.

• Face identity swap: Deepfakes where a target facial image of a person is swapped with a source facial image of another person. To be more specific, only the face identity is swapped while other content in the image is unchanged.

• Facial attributes manipulation: Manipulation of a target facial image to change certain attributes such as hairstyle, eyeglasses, or even age. For instance, this manipulation technique can semantically change a facial image to create an older look for the person.

• Facial expression manipulation: Manipulation of a target facial image to change the expression of the person, such as smiling, surprise, or anger.

Figure 1.3: The four types of face manipulation in deepfake.

Even though each type of face manipulation has its own application, in the scope of this thesis, I exclusively study face synthesis techniques [11], in which entire non-existent face images are generated.

Applications of deepfake

One of the most well-known applications of deepfake technology is image editing. Faceapp (https://www.faceapp.com/) is a famous software that allows image editing using deepfake. Faceapp provides dozens of different filters that can be applied to users' uploaded images to create various effects. These filters usually apply the aforementioned facial attributes manipulation deepfake, targeting different attributes that can semantically modify the image in the most realistic way. Figure 1.4 illustrates a few of the most popular filters in Faceapp, including:

• Older filter: creates an image of the user's older self from the input image, allowing users to see what they may look like in the future.

• Genderswap filter: swaps the gender of the person in the input image, allowing users to see what they look like in the opposite gender.

• Cartoon filter: creates a cartoon version of the input image.

• Add facial hair filter: adds facial hair to the input image.

Figure 1.4: Popular Faceapp filters, utilizing deepfake technology to edit images in various ways such as: older self, cartoon style, adding facial hair or swapping the gender.

Facial expression manipulation deepfake can also be used to edit images and videos. People may use expression manipulation to change the expression of a person in images or videos as they desire. Furthermore, face identity swap deepfake can be used for image editing, allowing users to insert their face identity into the images of others.

Compared to traditional image processing techniques (e.g. libraries such as OpenCV and software such as Photoshop), deepfake image editing has the advantage of being fully automatic: with a well-trained deepfake generative model, the input image is automatically transformed by passing it through the generator. In contrast, with traditional techniques, each input must be handled manually, which often takes a lot of time and effort. Moreover, deepfake can give the image a very natural look, which with traditional techniques depends heavily on the skills of the editor.

As mentioned above, one of the biggest applications of deepfakes is creating digital actors in the cinematography industry. Although image manipulation has long existed in the form of computer-generated imagery (CGI), more recent deepfake technology promises to generate even better quality in a much shorter time and with much less effort. Deepfake technology has already been used by fans to insert faces into existing films, such as the insertion of Harrison Ford's young face onto Han Solo's face in the movie Solo: A Star Wars Story, and similar techniques were used for the acting of Princess Leia in the movie Rogue One [1]. As in Figure 1.5, CGI was used to recreate young Princess Leia, based on facial expressions scanned from another actress using hundreds of motion sensors. With deepfake technology, instead of motion sensors to capture the facial expression, we only need reference videos, which can either be the original videos of the target (in this case, Princess Leia from the original movies) or the source video where we want to replace the face of the target. The quality of deepfake videos is getting better every day, while the cost in time and resources to synthesize deepfakes is much less than the cost of CGI.

Figure 1.5: CGI in Rogue One movie to recreate young princess Leia, later improved with deepfakes by fans.

One other important application of deepfake is to generate more examples for training deep neural networks. As many of us may know, the capability of artificial neural networks, regardless of size, is highly dependent on the data on which the networks are trained. If the data is too small or too biased, the performance of the networks in real life may be significantly affected. For instance, a face recognition system that is trained only on facial images of young people will perform poorly when recognizing the faces of older people. An animal image classifier that is trained only with images of black cats will likely not be able to correctly classify images of white cats. Since the dawn of Artificial Intelligence (AI), biased data has always been one of the biggest problems when training neural networks. The problem also takes a great deal of effort to solve, because the only way to make biased data unbiased is to collect even more data to improve the diversity of training samples. Collecting data is usually very time-consuming and often costs a fortune.

Deepfake comes in handy as a promising answer to the biased-data problem without costing too many resources. With deepfake, people can easily generate new examples to improve the diversity of the training dataset. For instance, in the above face recognition system example, we can use deepfake to generate older versions of the young people's images in the dataset and use those deepfakes to train the model. In cases where the deep learning system lacks training data, a deepfake generator can be used to generate more data for training. This solution saves a lot of time and money for the system designers compared to manually collecting real data.

Deepfake abuses

In contrast to its promising applications, deepfake is mostly being abused for illegal activities and cybercrimes. The two most dangerous crimes that can be committed with deepfakes are disinformation and fake personas/identities.

Deepfake's remarkable performance in generating photo-realistic content involving faces and humans has raised concerns about issues such as malicious use of fake media to spread misinformation [12] and fabricating content of people without their consent [13].

With deepfakes, one can also spread fake news and hoaxes targeting celebrities, backed up by convincingly high-quality images/videos. For instance, deepfakes originated from synthetic pornographic videos featuring the faces of celebrities, which can be used to blackmail or discredit these people without their consent. A report published in October 2019 by Dutch cyber-security startup Deeptrace estimated that 96% of all deepfakes online were pornographic [1].

Figure 1.6: Deepfake video of Donald Trump aired by Fox affiliate KCPQ.

Some people also use deepfake videos to misrepresent well-known politicians, targeting their rivals to achieve an advantage in politics. Several incidents have been recorded in the past: in January 2019, Fox affiliate KCPQ aired a deepfake video of Donald Trump during his Oval Office address, mocking his appearance and skin color [1] (Figure 1.6). In April 2020, the Belgian branch of Extinction Rebellion published a deepfake video of Belgian Prime Minister Sophie Wilmès on Facebook [1].

Deepfake is also being abused heavily to create fake personas/identities and impersonate other people. For instance, someone with access to the technology may open product/social accounts using the identities of others, or even of non-existent people, with the intention of committing cybercrimes such as scams and financial fraud. Criminals can easily pretend to be other people online and commit crimes without the consequence of being tracked (see Figures 1.7 and 1.8). With the support of deepfake, they can even generate photo-realistic ID card images to gain the trust of others, thus successfully scamming victims in online transactions.

Figure 1.7: With the rise of deepfake technology, any social account could be fake.

A famous example of an online deepfake fake persona is the case of the Twitter account Andrew Walz. According to this account, Andrew was a congressional candidate running for office in Rhode Island, who called himself a "Republican" with the tagline "Let's make changes in Washington together". Walz's Twitter account was complete with his picture and a prized blue check-mark, showing that he had been verified by Twitter as one of the accounts of congressional and gubernatorial candidates (Figure 1.8). Andrew Walz, however, was actually the creation of a 17-year-old high-school student. During his holiday break, this student created a website and Twitter account for this fictional candidate [14]. The Twitter profile picture was downloaded from the website thispersondoesnotexist.com.

These are just a few of the many abuses of deepfake, which are increasing in quantity and quality every day. Even though deepfake has many great applications, we need to be more aware and cautious of deepfake's potential threats.

Figure 1.8: Andrew Walz was, according to his Twitter account and webpage, running for a congressional seat in Rhode Island. In reality, Mr. Walz does not exist, and is the creation of a 17-year-old high-school student.

Forensic and anti-forensic deepfake

Since the advent of deepfake abuses and cybercrimes, a wide array of defenses has been proposed to mitigate this emerging threat. These defenses usually aim to counter deepfake by detecting and classifying deepfake content among real content, and are also known as deepfake forensic/detection systems. In recent years, deepfake forensic systems have been extensively studied and developed by the research community. Most forensic systems can be divided into two main groups:

• The first group of measures seeks to detect fake content based on high-level semantic features such as behavioral cues [13] like inconsistent blinking of the eyes [15]. These methods have the advantage of fast validation of new instances but are usually quickly outdated as deepfake technology improves over time. Today's deepfake content has developed to near-perfect quality and exceptionally natural looks, which makes these high-level features highly realistic and indistinguishable.

• The second group of defenses is based on low-level features underneath the image pixels, training a convolutional neural network (CNN) to classify images/videos as either fake or real [16]–[19]. These forensic detectors normally achieve state-of-the-art performance due to the ability of CNNs to automatically learn feature extraction.

On the opposite side of forensic systems, we have anti-forensic deepfake: deepfake examples that are specially crafted to bypass forensic systems, fooling these detectors into classifying synthetic content as real. These anti-forensic deepfakes, also called adversarial examples, are most commonly generated by using gradient back-propagation to add imperceptible adversarial perturbations to the pixels of the original deepfake [12], [20]. Figure 1.9 illustrates an adversarial example. Although to human eyes the deepfake image seems to remain unchanged after adding the perturbations, forensic systems are fooled and decide that the image is real. Many experiments have shown that recent deepfake forensic systems are extremely vulnerable to adversarial examples, revealing a big gap in current detection techniques [12].

Figure 1.9: Original deepfake image is detected 'fake' by the forensic system. However, after adding specially crafted imperceptible adversarial perturbations, the deepfake image, even though it looks the same, is detected 'real'.

The battle between forensic and anti-forensic deepfake has gone back and forth for years. While forensic systems improve over time thanks to advanced classification techniques and extensive training, anti-forensics is also getting more effective with new generative networks developed every day. Nonetheless, forensics (defenses) and anti-forensics (attacks) are two sides of the same problem. To make progress on one side, researchers must have knowledge of both sides. For instance, knowledge obtained from attack methods can be used to understand the weaknesses of the defenses, and thus to propose counter techniques that mitigate the attacks.

Understanding this relationship between forensics and anti-forensics, the fight against deepfake abuses normally involves two main groups of approaches. The first group explores and discovers different types of attacks on forensic detectors, since it is crucial to be aware of possible attacks and prepare the corresponding counter defenses against them. The second group proposes techniques that focus directly on improving the forensic systems, either to boost the performance of the deepfake detectors in general or simply to gain robustness against a certain type of attack. Either way, both groups of approaches are equally important and must be pursued simultaneously for the best efficiency.

Research challenge: Anti-forensic deepfake personas

As introduced in Section 1.3.2, the astonishing ability of deepfake technology to synthesize photo-realistic media has raised many questions about the risk of deepfake abuses and cybercrimes. Deepfake persona/identity attacks, among those crimes, are highly dangerous since they can cause tremendous loss to victims. To create such fake personas/identities on the internet, a fraudster/attacker has to generate a set of many deepfake media items (including images, voices, and videos) that satisfy the three following conditions:

(i) Quality and quantity - the quality of the deepfake media has to be realistic enough to fool a human. So does the quantity of the media, since a profile that contains only one image of the person would not be very convincing to others.

(ii) Identity consistency - the identity of the fake persona has to remain consistent across the deepfake media. For instance, the deepfake images have to be of the same person, preferably with different semantics (different poses, facial expressions, ages) and scenarios to make the profile seem more legitimate.

(iii) Anti-forensic - with the reputation of deepfake, many of today's social networks are integrating forensic systems to help detect and filter out malicious fake content. Therefore, the generated deepfake media used to form the fake persona has to bypass current forensic systems in order for the attack to succeed.

As a result, the fake persona profile has to be anti-forensic.

In the scope of this Master thesis, I perform an exclusive study of this anti-forensic deepfake persona/identity abuse. Keeping in mind that both forensic and anti-forensic approaches are equally important in the fight against deepfake abuses, the challenges of this research are:

1. to study and devise different attack methods that can be used to create anti-forensic deepfake persona profiles that satisfy all three aforementioned conditions;

2. to analyze the attack methods and propose corresponding defenses that counter the attacks, improving the robustness of deepfake forensics against them.

To the best of my knowledge, this thesis is among the first research works to study the threat of anti-forensic fake persona attacks. Although recent works [12], [20] show that it is possible to create high-quality anti-forensic deepfakes that bypass state-of-the-art forensic systems with adversarial examples, such attacks can only create separate anti-forensic fake images with no correlation. Hence, in the context of deepfake persona attacks, adversarial attacks alone do not satisfy the identity consistency and quantity conditions. For these reasons, I find this research topic novel, highly challenging, and perfectly suited to the requirements of my Master program.

Motivations

Today, social network media has become an irreplaceable part of our lives. This networked environment allows great connections between people, but at the same time provides ideal places for fraudsters to commit cybercrimes online. Many of these cybercrimes in social networks apply deepfakes to create fake identities, fake personas, and fake news, causing tremendous loss to millions of victims. Online transaction fraud can cause a loss of money/property depending on the size of the transactions. Fake news attacks, on the other hand, may be used to create a bad reputation for celebrities or politicians, indirectly causing enormous losses in many aspects to the victim.

Realizing this emerging threat of deepfake, this Master work is performed with the aim of providing a safer and more secure social network life, protecting users from becoming victims of the aforementioned cybercrimes. The proposed methods in this research can be used to strengthen current deepfake detectors, improving the robustness of forensic systems against fake persona attacks in particular and adversarial attacks in general. With the obtained results, I also hope to help the research community develop better defense techniques, bridging the gap in current deepfake detection.

Thesis methodology

In this thesis, to study the anti-forensic deepfake persona abuse threat, I investigate three different approaches that an attacker in real life may use to perform an attack that satisfies all three conditions presented in Section 1.5. Essentially, these approaches are designed to increase the classification error of the forensic systems through iterative edits on a source fake image. Simultaneously, to ensure identity consistency, the changes are constrained to alter only certain target semantic attributes of the image (e.g. identity-preserving attributes such as pose or facial expressions) with minimal changes to others. This process can be repeated on the same source image with different target attributes to create a set of diverse deepfake images that are consistent in identity, which is later used to form a fake persona profile (see Figure 1.10).

The proposed methods take into account both the white-box setting (based on gradient back-propagation) and the black-box setting (based on an evolutionary algorithm). To be more specific, the white-box scenario refers to the case where the attackers have full access to the forensic detectors, which is useful when testing the limits of the defense. In contrast, the black-box scenario assumes a more realistic case where the attackers have no information about the architecture and the gradients of the detectors. Experiments with the proposed fake persona attacks on a state-of-the-art forensic system show that the attacks can achieve a high success rate, revealing a gap in current detection techniques against deepfake.

Figure 1.10: Attacker bypasses forensic systems with a seemingly legitimate fake persona profile, created by semantically modifying certain attributes of one source deepfake.

As a means to defend against and counter this threat, two defense strategies are proposed to improve the robustness of forensic systems. The first defense strategy is based on adversarial retraining, where adversarial examples are used to enhance the training data and improve the detection accuracy against future attacks. The second strategy is tailored to the fake persona attack, treating the profile of interest as a set of images and taking into account the correlation between these images when making a decision. Through the experiments, the defense strategies are shown to be effective in reducing the attack success rate of such threats.

Contributions

The fake persona attack approaches investigated in this work enable forensic designers to identify the major weaknesses, if any, of current deepfake detector systems. From there, designers and managers can have a better idea of the limitations of the systems and implement suitable actions in response. Simultaneously, defense techniques are also proposed in this research to improve the performance of current forensic systems, boosting the robustness of these systems against attacks and deepfake abuses.

Toward the research community, the scientific contributions of this work are summarized as follows:

• This thesis investigates the possibility of fake persona attacks on the internet that satisfy the quality, quantity, identity consistency, and anti-forensic requirements (the ability to bypass forensic systems).

• To achieve such attacks, different approaches that alter only targeted semantic attributes while fooling forensic systems are proposed, including:

– Two approaches based on the white-box attack assumption, which are the Two-phases approach and the Semantic Aligned Gradient Descent (SA-GD) approach.

– The Semantic Aligned Evolutionary Algorithm (SA-EA) approach, which is inspired by Evolutionary Algorithms and based on the black-box attack assumption.

• To counter the attacks, three defense strategies are proposed, including the adversarial retraining defense, the Naive Pooling defense, and the Feature Pooling defense. These defenses are shown to improve the robustness of the forensic systems and mitigate the fake persona threat.

• Discussion of the performance of the approaches under different circumstances provides useful insights. While the Two-phases approach has the highest success rate, it may degrade the visual quality of the deepfake images in certain ways compared to the other approaches.

Thesis organization

The remainder of this thesis is organized as follows. Chapter 2 presents the fundamental background knowledge and a survey of scientific works related to the research challenges addressed in this thesis. Chapters 3 and 4 address the research challenge stated above by proposing different approaches for fake persona attacks and discussing defense strategies, respectively. Chapter 5 presents the experimental results, evaluation, and analysis of the proposed approaches. Finally, Chapter 6 concludes the thesis and discusses future work.

BACKGROUND

Deepfake generators

In early works, content such as images and written text was usually generated with autoencoders - a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoding is validated and refined by attempting to regenerate (or decode) the input from the encoding. A common autoencoder includes two main components:

• Encoder: a neural network which maps from the data space to an encoding space that normally has many fewer dimensions than the input data space. The encoder tries to learn an encoding (representation) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data.

• Decoder: a neural network which maps from the encoding space back to the data space. The decoder learns to reconstruct the data from an arbitrary encoding.

Figure 2.1 illustrates a simple autoencoder. To use an autoencoder to generate data (in this case, deepfake content), the autoencoder is first trained to reconstruct the exact input image. More specifically, each input sample is forwarded through the encoder and then through the decoder to get the corresponding output. The reconstruction loss (the difference between the input and the output) is calculated and back-propagation is performed to update the encoder and the decoder. After training, the autoencoder has learned the mapping from the data to the encoding and vice versa. At this point, we take the decoder part of the autoencoder as the deepfake generator. By feeding a random encoding sampled from the encoding space into the decoder, we obtain a deepfake image as the output. The more recent variational autoencoder, also known as a VAE, is an improved version of the autoencoder which offers better training quality.
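To make the training-then-generation procedure above concrete, here is a minimal PyTorch sketch; the layer sizes, image resolution, and optimizer settings are illustrative assumptions, not the configuration used in this thesis.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 64x64 RGB images flattened to 12288-dim vectors, 128-dim encoding.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 512), nn.ReLU(), nn.Linear(512, 128))
decoder = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 64 * 64 * 3), nn.Sigmoid())

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x):                            # x: a batch of real images, shape (B, 3, 64, 64)
    z = encoder(x)                            # map image -> encoding
    x_hat = decoder(z).view_as(x)             # map encoding -> reconstructed image
    loss = nn.functional.mse_loss(x_hat, x)   # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# After training, sample a random encoding and decode it to obtain a synthetic image.
z_random = torch.randn(1, 128)
fake_image = decoder(z_random).view(1, 3, 64, 64)
```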

Recent works [21], [22] applied VAEs to image generation. Although the VAE has a simple architecture and is easy to use, the generated deepfakes are somewhat restricted to the training data and the quality is not particularly high. Therefore, most autoencoders are used to learn representations of the data instead of generating new data.

Figure 2.1: Architecture of an autoencoder, including an encoder and a decoder.

Generative Adversarial Networks (GANs), proposed by Goodfellow in 2014 [4], are one of the most efficient generative architectures for synthesizing deepfake content. Compared to VAEs, GANs provide far superior image quality and the generated deepfakes can be much more diverse. Figure 2.2 illustrates the architecture of the original GAN model, which includes two main components:

Figure 2.2: Architecture of a Generative Adversarial Network, including a generator and a discriminator [23].

• Generator: a neural network that takes a random noise vector z (also known as the latent code) as input and outputs an image. The initial generator simply outputs a noisy image with no semantic content. The generator is sometimes compared to the decoder in the autoencoder due to their similarity.

• Discriminator: a neural network, usually a convolutional neural network (CNN), which acts as a classifier that classifies an input image into two classes: 'real' (the image originates from real data) and 'fake' (the image is generated by the generator).

The training process of a GAN requires a training dataset of real data, for example, real images collected online. During training, the generator and the discriminator are trained simultaneously with different loss functions. The discriminator is trained to better classify and discriminate generated samples from real ones. In contrast, the generator is trained to generate better samples that look more similar to the real ones, i.e. that are harder for the discriminator to classify correctly. The generator and the discriminator are trained and improved together over time until they reach convergence, similar to the minimax rule in game theory. The only downside of the GAN training process is that it takes a long time to complete, since a GAN trains two networks simultaneously.
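The alternating training just described can be sketched in a few lines of PyTorch; the generator G, the discriminator D (assumed to output one logit per image), the optimizers, and the latent dimension are hypothetical placeholders rather than the actual models used in this thesis.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def gan_train_step(G, D, opt_G, opt_D, real_images, z_dim=512):
    b = real_images.size(0)
    z = torch.randn(b, z_dim)                       # random latent codes

    # Discriminator step: classify real images as 1 ('real'), generated ones as 0 ('fake').
    fake_images = G(z).detach()
    d_loss = bce(D(real_images), torch.ones(b, 1)) + bce(D(fake_images), torch.zeros(b, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: fool the discriminator into predicting 'real' for generated images.
    g_loss = bce(D(G(z)), torch.ones(b, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```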

After training, the generator part is extracted and used as the final deepfake generator. We simply input a random latent code vector and the generator outputs a deepfake example. The semantic content of the deepfake depends heavily on the value of the input latent code. Recently proposed GANs such as ProGAN [5], StyleGAN [6] and StyleGAN2 [2] have shown highly realistic results with a spectacular level of detail. For example, Figure 1.1 in Section 1.1 shows deepfake facial images generated by StyleGAN2.

Figure 2.3: The modeling of a simple GAN-based deepfake generator. The GAN generator takes latent code z as input and outputs the deepfake image x.

Currently, most deepfake generators are based on GANs due to the high visual quality of the generated deepfake images. In this thesis, a GAN is mainly used as the deepfake generator model. Hereinafter, a GAN is notated and modeled as G such that x = G(z) : Z → X, where z ∈ Z is the input latent code vector and x ∈ X is the output synthesized image (Figure 2.3). Z is also called the latent space of the GAN.

Semantic modification for GAN

The anti-forensic fake persona attacks proposed in this thesis also involve the concept of semantic modification for GANs. Here, semantics refers to the content of the deepfake images, that which establishes the meaning of the image. For a facial image, semantic modifications can be modifications to certain facial attributes of the image, such as the expression, age, hair, eyes, or skin of the person. In a GAN model, certain semantic attributes of the synthetic images can be modified by properly editing the input latent codes. However, these attributes are usually entangled in the latent space, making controlled semantic modifications to images difficult to accomplish. In [24], the authors proposed a method to interpret the latent space of GANs, thus providing a way to arbitrarily edit certain attributes of face images such as age, gender, smile and pose. For each attribute, a corresponding attribute vector V_a ∈ Z is generated by learning from the latent space of GANs. Semantic modification is done by translating the latent code along the direction defined by a selected attribute vector V_a. Figure 2.4 illustrates an example of semantically modifying the attribute smile of an image by translating the latent code along the attribute vector V_smile.

It is worth noting that, in this thesis, the semantic modification for GANs is done with the pre-trained attribute vectors proposed in [24], covering several different attributes: age, gender, smile and pose.
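In code, such an edit is a single translation in the latent space; the sketch below assumes a pretrained generator G and a pretrained attribute direction v_attr (for example a 'smile' vector from [24]) are already available as tensors - both are hypothetical handles here.

```python
import torch

def modify_attribute(G, z, v_attr, alpha):
    """Translate latent code z along attribute direction v_attr with magnitude alpha."""
    z_edit = z + alpha * v_attr      # e.g. alpha > 0 adds more smile, alpha < 0 removes it
    return G(z_edit)                 # re-synthesize the edited image

# z: (1, 512) latent code; v_smile: (1, 512) attribute vector learned from the latent space
# smiling_face = modify_attribute(G, z, v_smile, alpha=2.0)
```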

Deepfake forensic systems

As introduced earlier in Section 1.4, deepfake forensics are techniques that allow distinguishing deepfake content from real content. At first, most forensic systems were based on high-level semantic features or on fingerprints that GANs produce in the generative process [8], [15], [25]–[27]. Currently, the state-of-the-art detectors [17]–[19], [28], [29] are based on image classifiers that work on low-level, pixel-related features, which are usually not visible to humans. These detectors are regarded as more versatile across different deepfake techniques and generators. Experiments have shown that these CNN-based detectors can generalize to effectively detect deepfakes from many previously unseen generators.

Here, forensic systems are modeled as binary neural classifiers which map a given input image x ∈ X into two classes: 'real' and 'fake'. For notation, hereinafter, a detector D is defined as in (2.1), where F_d(x) : X → [0, 1] is the predicted probability that the image x is synthesized and λ is the real/fake threshold, which is often set at a neutral value such as 0.5.

Figure 2.4: Semantically modifying the attribute smile of a face image using the attribute vector V_a = V_smile. The attribute vector is learned from the latent space using the method proposed in [24].

D(x) = 'real' if F_d(x) = activation(CNN(x)) < λ, and 'fake' otherwise.    (2.1)
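Equation (2.1) corresponds to a simple thresholding wrapper around a CNN backbone; the sketch below assumes the backbone returns a single logit and uses a sigmoid as the activation - both are illustrative assumptions of this sketch.

```python
import torch

def detect(cnn, x, threshold=0.5):
    """Return 'real' or 'fake' according to Eq. (2.1)."""
    score = torch.sigmoid(cnn(x))          # F_d(x) = activation(CNN(x)) in [0, 1]
    return "real" if score.item() < threshold else "fake"
```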

Attacks to deepfake forensic systems

Since state-of-the-art detectors are image classifiers, techniques used to manipulate the predictions of CNN classifiers can also be applied to attack deepfake forensic systems. Most attacks exploit the unrefined boundaries between classes in the loss landscape of the classifier. In this section, some of the most popular attacks on deepfake forensic systems in particular, and CNN classifiers in general, are introduced.

Spatial transformation is one of the easiest ways to attack a CNN classifier, yet it can be highly effective. Previous works have shown that CNN-based image classifiers can be easily fooled using simple spatial transformations of the original images such as translation and rotation [30]. Figure 2.5 illustrates the spatial transformation attack, where the classifier fails to classify the images after translation and rotation. We can see that the CNN classifier makes some ridiculous predictions, such as predicting the revolver to be a mousetrap. This attack can also be used against deepfake detectors; however, attacking binary classifiers is a bit harder since there are only two classes.

Figure 2.5: Spatial transformation adversarial attack on a CNN classifier. The classifier fails to classify these images after simple rotation and translation.

Adversarial examples [31]–[33] are a more popular choice for attacking deepfake forensic systems due to their high success rate and image quality retention. Here, adversarial examples refer to deepfake images that are adversarial to deepfake forensic detectors, meaning that these deepfakes are detected as real by the detectors.

The pixel-level adversarial example is the first and most common type of adversarial example, in which adversarial perturbations are added to the deepfake images in pixel space. These adversarial perturbations are visually imperceptible to human eyes and are crafted by increasing the classification loss of the classifier. More recently, [12], [34] proposed white- and black-box attacks based on pixel-level adversarial perturbations to fool deepfake detectors. Figure 2.6 illustrates the process of creating pixel-level adversarial examples, where the gradient is back-propagated to update the perturbations, with the loss function being the prediction score of the detector. The perturbations are also constrained such that their p-norm has to be lower than an amount ϵ. This limitation ensures that the perturbations are imperceptible to human eyes, thus retaining the quality of the image.

Figure 2.6: The creation of pixel-level adversarial examples, which uses gradient back-propagation to update the perturbations. The loss function here is the prediction score Fd(x) of the detector.
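A minimal sketch of such a pixel-level attack, assuming a differentiable detector F_d that returns the 'fake' probability as a scalar tensor, and using an L-infinity ball of radius eps as the p-norm constraint (all parameter values below are illustrative):

```python
import torch

def pixel_adversarial_attack(F_d, x, eps=0.03, lr=0.005, steps=100, threshold=0.5):
    """Craft an imperceptible perturbation sigma with ||sigma||_inf <= eps so that F_d(x + sigma) < threshold."""
    sigma = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        score = F_d(x + sigma)                   # prediction score = loss to minimize
        if score.item() < threshold:
            break                                 # already detected as 'real'
        score.backward()
        with torch.no_grad():
            sigma -= lr * sigma.grad.sign()       # gradient step
            sigma.clamp_(-eps, eps)               # projection onto the eps-ball (p = infinity)
        sigma.grad.zero_()
    return (x + sigma).detach()
```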

Several previous works [12], [20], [35] proposed semantic adversarial attacks on classifiers by altering the semantics of images through a generator's latent space. Instead of updating the perturbations in the pixel space of the image, back-propagation in the semantic adversarial example updates the latent code directly, hence changing the semantic meaning of the image. This attack exploits the fact that attributes uncommon in the training data may cause the forensic detector to fail. Several works also explore different ways to semantically modify deepfakes with the aim of bypassing deepfake detectors. For example, Ho et al. in [36] designed attacks based on both small and large image perturbations, resulting from camera shake and pose variation.

Figure 2.7 illustrates the creation of semantic adversarial examples, in which the perturbation (denoted δ) is added directly to the original latent code z. Gradient back-propagation is performed to update the perturbation δ, again with the loss function being the prediction score of the detector.

Figure 2.7: The creation of semantic adversarial examples based on gradient back-propagation. Different from pixel-level adversarial examples, the gradient is back-propagated to update the perturbation δ, which is added directly to the original latent code z.

These works, however, do not constrain the modifications to targeted attributes and are thus prone to altering the identity of a fake persona. Different from these works, this thesis studies a different scenario where attackers create a fake persona profile by generating multiple semantically different examples portraying one single identity. Current attacks cannot be directly applied here, so I design targeted semantic perturbations that aim to retain the identity of the image while still fooling the detector. Nonetheless, the defense approach of augmenting the training data is partially inspired by adversarial training [37], [38], where image classifiers are trained on adversarial examples to improve their adversarial robustness.

ANTI-FORENSIC FAKE PERSONA ATTACK

Problem modeling

Given a deepfake image x generated from latent code z by a GAN model G: x = G(z), and a set of k target attributes A = {a_i | i = 1..k} with corresponding attribute vectors {V_a_i | i = 1..k}. Note that the attribute vectors here are found by interpreting the latent code using the method proposed in [24], which was introduced in Section 2.2. Initially, x is assumed to be classified as 'fake' by a deepfake detector D: D(x) = fake. To perform the anti-forensic fake persona attack task, we are given a sub-task, which is to generate a single modified image x′ such that:

• The image x′ is anti-forensic, which means x′ is detected as 'real' by the detector D.

• Semantic preservation along the target attributes: the modification applied to x′ must be semantically limited to the target attributes only. Attributes other than the target ones are semantically consistent between x and x′. This condition helps retain the identity of the fake persona after the modification.

This sub-task is named the Single-image Semantic Attack (SiSA) task, which allows creating a single attack example that is anti-forensic and identity-consistent with the original deepfake. The SiSA task can then be repeated with different target attributes to generate a set of deepfake images that are semantically diverse while having a consistent identity. In the scenario of this research, the attacker can use this set of images as the profile for the fake persona, making the fake persona look extremely real to people as well as to forensic systems. Therefore, the SiSA task is crucial for the anti-forensic fake persona attack: once SiSA is done correctly, the main attack task is also completed.
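In symbols, and consistent with the notation of this chapter, one way to state the SiSA task is the following hedged formalization (the orthogonal threshold h⊥ anticipates Section 3.2.2, where it is introduced formally):

```latex
% Hedged formalization of the SiSA task (notation from Sections 2.3, 3.1 and 3.2)
\begin{aligned}
&\text{find } \delta \in \mathcal{Z} \text{ such that } x' = G(z + \delta) \text{ satisfies} \\
&\quad F_d(x') < \lambda && \text{(anti-forensic: } D(x') = \text{`real')} \\
&\quad \bigl\| \delta - \operatorname{proj}_{\operatorname{rowsp}(V_A)}(\delta) \bigr\|_p \le h_{\perp} && \text{(off-target semantic changes are limited)}
\end{aligned}
```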

In the next sections, the black-box and white-box approaches that an attacker may use to perform the SiSA task are proposed and discussed.

White-box approaches

In the white-box scenario of the semantic attack, the attackers are assumed to have full access to the deepfake detector, including the network architecture and gradients. The white-box scenario is often used to test forensic systems in the worst case. In this section, I denote by V_A the target attribute matrix, created by vertically stacking the k target attribute vectors to form the rows of V_A. In particular, V_A is a matrix of k rows, each row being one of the SiSA task's target attribute vectors {V_a_i | i = 1..k} (refer to Section 3.1 for the SiSA task).

The idea of the two-phases approach is fairly straightforward. In Section 2.2, semantic modification for GANs was introduced to semantically edit and change a certain attribute of the deepfake. Pixel-level adversarial examples were also introduced in Section 2.4.2 to attack forensic systems by adding imperceptible perturbations to the deepfake. Hence, the two-phases approach simply conducts a semantic modification followed by a pixel-level adversarial example attack to fool the detector. Figure 3.1 and Algorithm 1 illustrate this approach, which includes the following two phases:

• Phase 1: Semantically modify the original deepfake image x to x′ = G(z + αV_A), where α is a vector of k elements whose i-th element represents the magnitude of change on the attribute a_i. This operation is simply a translation of the original latent code z along the directions defined by the attribute vectors, with different magnitudes α. After the translation, the deepfake image is semantically modified along the target attributes. Refer to Section 2.2 for the meaning behind this operation.

• Phase 2: Generate a pixel-level adversarial perturbation σ using back-propagation to lower the loss function, which is the prediction score of the detector: L(x′ + σ) = F_d(x′ + σ). Here, I use the state-of-the-art adversarial example generation technique Projected Gradient Descent (PGD) [39] to perform the adversarial attack. PGD has two steps, a gradient step and a projection step, which are illustrated in line 7 and line 8 of Algorithm 1. PGD is repeated until the output prediction score is lower than the real/fake threshold and the image x′ + σ is detected as real.

Figure 3.1: Two-phases approach illustration. Phase 1: semantically modifying the original deepfake x along the target attributes to create x′ = G(z + αV_A). Phase 2: adding pixel-level adversarial perturbation σ to create the anti-forensic deepfake x′ + σ.

Algorithm 1 Two-Phases White-box Attack

Input: latent code z, target attributes matrix V_A, magnitude vector α
Parameter: learning rate η, PGD epsilon value ϵ
1: x′ ← G(z + αV_A) #phase 1: semantic modification
2: σ ← 0
3: for each gradient step do
4:   if F_d(x′ + σ) < λ then
5:     break #the image is already detected as 'real'
6:   end if
7:   σ ← σ − η · sign(∇_σ L(x′ + σ)) #gradient step of PGD
8:   σ ← σ · min(1.0, ϵ/∥σ∥_p) #projection step of PGD
9: end for
10: return x′ + σ

The two-phases white-box approach is quite simple, yet it can work very effectively and theoretically satisfies all the requirements of the SiSA task. It is based on two well-known, verified techniques, so the performance of this approach should be good enough for use in real-life attacks.
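A compact sketch of the whole attack, reusing the PGD loop from Section 2.4.2; the generator G, the detector score F_d, the attribute matrix V_A, and all hyper-parameter values below are assumed, hypothetical handles rather than the thesis's actual configuration.

```python
import torch

def two_phases_attack(G, F_d, z, V_A, alpha, eps=0.03, lr=0.005, steps=100, lam=0.5):
    # Phase 1: semantic modification - translate the latent code along the target attributes.
    x_mod = G(z + alpha @ V_A).detach()          # alpha: (1, k), V_A: (k, latent_dim)

    # Phase 2: PGD in pixel space - push the prediction score below the threshold lam.
    sigma = torch.zeros_like(x_mod, requires_grad=True)
    for _ in range(steps):
        score = F_d(x_mod + sigma)               # loss = prediction score F_d(x' + sigma)
        if score.item() < lam:
            break                                 # detected as 'real': stop early
        score.backward()
        with torch.no_grad():
            sigma -= lr * sigma.grad.sign()       # gradient step of PGD
            sigma.clamp_(-eps, eps)               # projection step (L-infinity ball of radius eps)
        sigma.grad.zero_()
    return (x_mod + sigma).detach()
```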

3.2.2 Semantic Aligned Gradient Descent approach

While having a high success rate, pixel-level adversarial examples are also known to reduce the visual quality of the deepfake image, especially when the detector is well trained. In the second white-box approach, the aim is to lower the prediction score while simultaneously modifying the semantic attributes of the image. The main idea of this approach is to add perturbations to the original latent code (similar to the semantic adversarial examples in Section 2.4.3), while at the same time constraining the changes in image space to alter only the target attributes. With this attack, the visual quality of the attack examples is expected to be retained.

In detail, a perturbation δ is added to the original latent code z in the latent space of the GAN and updated in each gradient step through the two following steps:

• Gradient back-propagation: performs gradient descent (GD) to update the perturbation δ with the objective of lowering the loss function L(z + δ), which is again the prediction score of the detector F_d(G(z + δ)). Different from the two-phases approach, here the gradient is back-propagated to the latent space instead of the pixel space. This step is similar to the semantic adversarial example introduced in Section 2.4.3. Figure 3.2 illustrates this gradient back-propagation operator.

• Semantic aligning (SA): aligns the perturbation δ to the directions of the target attribute vectors. The SA operator projects the current perturbation δ onto V_A's row space (rowsp(V_A)). Then, the perturbation's component that is orthogonal to rowsp(V_A) has its magnitude clipped to no more than the orthogonal threshold (h⊥), to limit off-target semantic changes. Figure 3.3 illustrates the semantic aligning of a perturbation δ into δ′ when there is only one target attribute.

Figure 3.2: Gradient back-propagation step of the Semantic Aligned Gradient Descent approach, where a perturbation δ is added to the latent code z and updated by gradient descent. This step is similar to the semantic adversarial example attack.

Figure 3.3: Example of semantic aligning the perturbation δ into δ′, with the orthogonal threshold h⊥ and only one attribute vector V_a targeted. In the case of two or more target attributes, the perturbation is projected onto the space spanned by the target attribute vectors.

The first step of the SA-GD approach, gradient back-propagation, is basically the semantic adversarial attack, which can lower the prediction score and make the attack example anti-forensic. However, this step does not fulfill the semantic constraint of the SiSA task, since the latent code is perturbed freely in the latent space, thus prompting changes that are not targeted. For this reason, the SA operator is introduced to control the GD's latent code perturbation, ensuring that semantic changes other than the targeted attributes are limited.

The SA step projects the perturbation onto the target attributes and clips the changes that are orthogonal to the target attribute vectors. Note that these orthogonal changes are changes that are not in the directions of the target attribute vectors, hence are not targeted by the SiSA task and need to be limited.

Algorithm 2 shows the pseudo-code for the Semantic Aligning (SA) operator, in which line 1 and line 2 perform the projection of the perturbation δ onto the target attribute vectors. The projection is done by projecting δ onto each of the vectors in the orthogonal set that spans rowsp(V_A). Algorithm 3 shows the pseudo-code for the SA-GD approach.

Algorithm 2 Semantic Aligning - SA(δ, V_A, h⊥)

Input: perturbation δ, target attributes matrix V_A, orthogonal threshold h⊥
1: Find an orthogonal set {u_1, ..., u_k} that spans rowsp(V_A)
2: δ_proj ← Σ_{j=1..k} ((δ · u_j)/(u_j · u_j)) u_j #project δ onto the orthogonal set
3: δ_ortho ← δ − δ_proj #calculate the orthogonal perturbation
4: if ∥δ_ortho∥_p > h⊥ then
5:   δ_ortho ← δ_ortho · h⊥/∥δ_ortho∥_p #clip the orthogonal perturbation
6: end if
7: δ′ ← δ_proj + δ_ortho
8: return δ′
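The same operator in NumPy, using a QR decomposition to obtain an orthonormal basis of rowsp(V_A) and the L2 norm for the clipping (both implementation choices are assumptions of this sketch):

```python
import numpy as np

def semantic_align(delta, V_A, h_orth):
    """Project delta onto rowsp(V_A) and clip the off-target component to norm h_orth."""
    # Orthonormal basis of the row space of V_A (k x d), via reduced QR on V_A^T.
    Q, _ = np.linalg.qr(V_A.T)                  # Q: (d, k), columns are orthonormal
    delta_proj = Q @ (Q.T @ delta)              # component inside the target-attribute subspace
    delta_orth = delta - delta_proj             # off-target (orthogonal) component
    norm = np.linalg.norm(delta_orth)
    if norm > h_orth:
        delta_orth *= h_orth / norm             # clip the orthogonal perturbation
    return delta_proj + delta_orth
```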

Algorithm 3 SA-GD White-box Attack

Input: latent code z, target attributes matrix V_A, orthogonal threshold h⊥
Parameter: learning rate η, radius of random restart ∆
1: δ ← random perturbation sampled within radius ∆ #random restart
2: for each gradient step do
3:   if F_d(G(z + δ)) < λ then
4:     break #the image is already detected as 'real'
5:   end if
6:   δ ← δ − η · sign(∇_δ L(z + δ)) #gradient back-propagation step
7:   δ ← SA(δ, V_A, h⊥) #semantic aligning step, Algorithm 2
8: end for
9: return G(z + δ)

Black-box approach

While GD-based methods are efficient, they require information about the detector's gradients, which is usually not available in real-life attacks. In this section, a more realistic black-box setting, where the attacker has no access to the model's gradients, is considered.

Assume that the attacker only knows the output prediction score F_d(G(z)) given by the detector. The problem then becomes a common multi-dimensional optimization problem, in which the task is to find z that minimizes F_d(G(z)) until it reaches a value smaller than the real/fake threshold λ (so that the detector outputs 'real' as the prediction).

Stochastic search methods [40]–[42] fit the problem extremely well since they do not require gradient information. The Evolutionary Algorithm (EA) is one of the most famous stochastic search algorithms and works well on many types of problems.

EA is a population-based algorithm which evolves a set of candidate solutions (or 'chromosomes') through generations. The search process of EA is guided by the knowledge obtained from the candidates of previous populations (this class of algorithms is also referred to as meta-heuristic algorithms).

The evolutionary step in an EA is the core of its search for new candidates, usually containing two types of operators: crossover and mutation. The crossover operator takes two chromosomes as input (often referred to as the parents) and outputs one to many newly created chromosomes (often referred to as the offspring). The offspring are expected to inherit some attributes of the parents. For the mutation operator, one chromosome is input and a slightly different chromosome is output, which usually is a neighbor of the input in the search space. After the evolution step, the fittest chromosomes of the population are kept for the next generation and the population continues to evolve. As generations pass, the population is expected to become better over time and eventually converge to an optimal solution, which is the final output of the EA.
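As a generic illustration of this evolve-then-select loop (not the thesis's SA-EA itself), the following Python skeleton uses the detector's prediction score as the fitness to be minimized; the evolve and init_population callables are placeholders standing in for problem-specific operators.

```python
def evolutionary_search(fitness, init_population, evolve, n_keep, n_generations):
    """Generic EA skeleton: evolve a population and keep the fittest candidates.

    fitness: callable scoring a candidate (lower is better here, e.g. the
             detector's prediction score F_d(G(z))).
    evolve:  callable producing offspring from the population via crossover/mutation.
    """
    population = list(init_population)
    for _ in range(n_generations):
        population += evolve(population)        # crossover + mutation produce offspring
        population.sort(key=fitness)            # lower score = fitter candidate
        population = population[:n_keep]        # survival of the fittest
    return population[0]                        # best candidate found
```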

For this thesis, a simple yet effective Semantic Aligned Evolutionary Algorithm (SA-EA) for black-box semantic attacks is designed. Algorithm 4 shows the pseudo-code for SA-EA. Similar to SA-GD, the changes applied to the latent code are limited within a certain orthogonal threshold by applying the SA operator (Algorithm 2) to each latent code candidate of SA-EA.

Algorithm 4 SA-EA Black-box Attack

Input: latent code z, target attributes matrix V_A, orthogonal threshold h⊥
Parameter: population size n, number of generations iters, crossover rate cr, mutation rate mr
1: for i = 1 to n do #initialize the population ℘
2:   ℘ ← ℘ ∪ [z + SA(random(δ), V_A, h⊥)] #semantic aligning, Algorithm 2
3: end for
4: for each generation do
5:   Evolve ℘ with crossover and mutation #evolution step, Algorithm 5
6:   for each latent code z* in ℘ do
7:     z* ← z + SA(z* − z, V_A, h⊥) #semantic aligning, Algorithm 2
8:   end for
9:   Sort ℘ in increasing order of Q(z) #Q(z): fitness, i.e. the prediction score F_d(G(z))
10:  Keep the n fittest latent codes in ℘
11: end for
12: return the fittest latent code in ℘

In line 5 of Algorithm 4, the evolutionary step is performed to evolve the population. There are many different choices for the evolutionary step. In this research, to keep it simple, two basic types of crossover and one random noise mutation are selected for the evolutionary step. To be more specific, the following operators are performed:

• 1-point crossover [43]: given the input parents f and m, generate a new chromosome by concatenating the first half of f to the second half of m. Figure 3.4 illustrates an example of the 1-point crossover.

Figure 3.4: An example of 1-point crossover in SA-EA. The first half of f and the second half of m are concatenated to create offspring c.

• Average crossover [43]: given the input parents f and m, generate a new chromosome by taking the average value of f and m. Figure 3.5 illustrates an example of the average crossover.

Figure 3.5: An example of average crossover in SA-EA. Offspring c is created by taking the average of f and m.

• Random noise mutation [43]: given the input chromosome c, generate a new chromosome by adding to c a noise uniformly sampled within the range ∆. Figure 3.6 illustrates an example of the random noise mutation.

The input chromosomes for these operators are chosen randomly, and the crossover rate cr and mutation rate mr (inputs of Algorithm 4) decide how often the operators are applied in one evolutionary step. Algorithm 5 shows the pseudo-code for the evolution step.

Figure 3.6: An example of random noise mutation in SA-EA. Chromosome c is mutated to chromosome c′ by adding a noise uniformly sampled in the range ∆.

Algorithm 5 Evolution step of SA-EA
Parameter: population size n, crossover rate cr, mutation rate mr
   ...
   c1 ← concat(f[0 : len/2], m[len/2 :])   #1-point crossover
   ...
18: return ℘

The reason for choosing these operators for the evolutionary step is their simplicity. All of the operators only use basic math functions between latent code vectors and thus execute very fast. Furthermore, these operators, even though simple, satisfy very well the guideline for designing EAs that the offspring chromosomes should inherit attributes from both of the parent chromosomes. From observations of the semantic attributes in the images in Figures 3.4 and 3.5, we can clearly see that many attributes such as hairstyle, hair color, and eye color are passed from the parent chromosomes to the offspring. This is highly important in SA-EA as it allows the algorithm to shift and limit the changes within the target attributes more correctly. For the random noise mutation in Figure 3.6, while certain attributes can be observed to change slightly (such as pose, hair color, hairstyle), overall, the final image appears largely similar to the original image. A small Python sketch of these operators is given below.
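The sketch below mirrors the three operators in plain NumPy. The function names, the 50/50 choice between the two crossover types, and the illustrative noise range are assumptions made for readability, not the exact settings of Algorithm 5.

import numpy as np

def one_point_crossover(f, m):
    """Concatenate the first half of f with the second half of m."""
    half = len(f) // 2
    return np.concatenate([f[:half], m[half:]])

def average_crossover(f, m):
    """Element-wise average of the two parent latent codes."""
    return (f + m) / 2.0

def random_noise_mutation(c, delta):
    """Add uniform noise in [-delta, delta] to the chromosome."""
    return c + np.random.uniform(-delta, delta, size=c.shape)

def evolution_step(population, n, cr, mr, rng=np.random):
    """One evolution step: create n offspring with the operators applied at
    rates cr/mr; selection (sorting on Q) is done outside this function."""
    offspring = []
    while len(offspring) < n:
        i, j = rng.choice(len(population), size=2, replace=False)
        f, m = population[i], population[j]
        if rng.rand() < cr:
            op = one_point_crossover if rng.rand() < 0.5 else average_crossover
            child = op(f, m)
        else:
            child = f.copy()
        if rng.rand() < mr:
            child = random_noise_mutation(child, delta=0.1)  # illustrative noise range
        offspring.append(child)
    return population + offspring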

In conclusion, in this chapter I have introduced the concept of the SiSA sub-task in order to solve the main anti-forensic fake persona task. Three different approaches covering both white-box and black-box scenarios have been proposed and described in detail: the Two-phases approach, SA-GD, and SA-EA. In the next chapter, defense techniques to counter these attack approaches are discussed.

DEFENSES AGAINST ANTI-FORENSIC FAKE PERSONAS

Defense against Single-image Semantic Attack task

As a means to improve forensic systems' performance against anti-forensic fake persona attacks, one obvious approach is to improve the robustness of the detector against the SiSA task. In all of the above-proposed attacks, the SiSA task is the core of the anti-forensic fake persona. Therefore, if the detector can detect the semantic adversarial examples from the SiSA task well, the risk of fake persona attacks is also reduced.

Figure 4.1: Retraining the deepfake detector with the addition of semantic attack images.

To do so, the adversarial retraining technique is applied. That is, the original training data for the targeted deepfake detector is augmented with semantic adversarial examples obtained from the attacks (Figure 4.1). The detector is then retrained with this new augmented data for a few epochs. The detector after retraining is expected to learn to detect the adversarial examples better. In fact, semantic adversarial attacks exploit uncommon attributes within the training data to fool the forensic systems. In this light, the adversarial retraining method can be interpreted as improving the diversity of the training data to better cover faces containing uncommon attributes.

Algorithm 6 Adversarial retraining defense
Input: original training data D_original; target detector F_d with weights θ
Parameter: learning rate for training α
Output: retrained detector F_d′ with weights θ′
1: for each training iteration do
3:    Compute detector cross-entropy loss L_d
      ...
6: Perform white-box attacks on F_d   #Algorithm 1, 3
7: Retrieve the adversarial images → D_attack
   ...
9: for each training iteration do   #retrain on the augmented data
11:   Compute detector cross-entropy loss L_d
      ...

Algorithm 6 illustrates the pseudo-code for the adversarial retraining defense. In detail, adversarial retraining against the anti-forensic fake persona attacks can be done via the following steps (a short code sketch is given after the list):

1. Perform the SiSA attack task on the target deepfake detector to retrieve a set of adversarial images, preferably with white-box attacks to test the worst-case scenario and to get better attack examples.

2. Combine the adversarial images obtained above with the original training data of the detector to form a new training set.

3. Finally, retrain the detector with the combined dataset for a few epochs.
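A minimal PyTorch sketch of these steps is given below, assuming the adversarial images have already been generated. The optimizer, labels, loader, and hyper-parameter defaults are generic placeholder choices; they do not reproduce the exact training setup of the detector in [17].

import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader

def adversarial_retraining(detector, original_dataset, attack_dataset,
                           epochs=5, lr=1e-4, batch_size=32, device="cuda"):
    """Sketch of the retraining defense: augment the original data with the
    semantic adversarial images (all labelled 'fake') and fine-tune the detector."""
    combined = ConcatDataset([original_dataset, attack_dataset])
    loader = DataLoader(combined, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(detector.parameters(), lr=lr)
    detector.train()
    for _ in range(epochs):
        for images, labels in loader:                 # labels: 1 = fake, 0 = real
            images, labels = images.to(device), labels.float().to(device)
            logits = detector(images).squeeze(-1)     # assumes one logit per image
            loss = F.binary_cross_entropy_with_logits(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return detector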

Defenses against anti-forensic fake persona attack

The adversarial retraining strategy simply improves the detector's robustness against each individual attack example separately. However, we are studying the problem of fake persona profiles, meaning that the input data to the forensic system is a profile of images (a list/series of images) instead of a single image. The task of the forensic system is to tell whether the profile is fake or legit. In this section, I aim to propose defense techniques that can be used to complete this task of classifying persona profiles. To do so, two approaches that are built upon the original deepfake detector are proposed below. Without loss of generality, we can assume that the persona profile of interest contains m images.

The first approach, called the Naive Max-pooling defense, is fairly simple and naive, true to its name. The idea is to simply aggregate the output prediction scores F_d(x) of the images in the profile to get an overall score, then use this score to validate the profile. The aggregation of the prediction scores can be done in many ways:

• Max-pooling: A straightforward strategy is to use a maximum function to combine the output prediction scores. We can also interpret this strategy as: if at least one image of a profile is predicted fake by the detector, then the whole profile is decided fake. This strategy can be used together with adversarial retraining to boost the performance of the forensic defense, filtering out more fake images. Figure 4.2 illustrates the Naive Max-pooling method.

Figure 4.2: Illustration of the Naive Max-pooling defense, where m images of the profile are fed into the detector D to get m corresponding prediction scores. Then, the m prediction scores are fed through a max-pooling layer to get the overall score of the profile.

• Average-pooling: Another option for the aggregation function is to use the average function. This approach can be useful when the detector wrongly detects real images as 'fake', in which case the profile of interest would also be detected as 'fake' if Max-pooling is used. Average-pooling, on the other hand, may reduce the rate of false positive detection, but trades this off for a higher rate of false negatives (here, positive is 'fake' and negative is 'real').

In the case of forensic systems, it is usually preferred to have a lower false negative rate (when the profile is fake but the detector detects it as real) than a lower false positive rate (when the profile is real but the detector detects it as fake). Therefore, in this thesis, the Naive Max-pooling defense is mainly considered and used for the experiments; a small sketch of this pooling step is given below.
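Assuming the detector outputs a single logit per image where a higher value means 'fake', the aggregation could be sketched as follows; the sigmoid, the threshold value, and the function names are placeholders.

import torch

def profile_score(detector, profile_images, pooling="max"):
    """Sketch of the Naive Pooling defense: score each image independently,
    then aggregate the per-image 'fake' scores into one profile score."""
    with torch.no_grad():
        scores = torch.stack([torch.sigmoid(detector(img.unsqueeze(0))).squeeze()
                              for img in profile_images])
    return scores.max() if pooling == "max" else scores.mean()

def is_profile_fake(detector, profile_images, threshold=0.5, pooling="max"):
    # The profile is flagged fake once the aggregated score exceeds the threshold
    return profile_score(detector, profile_images, pooling) > threshold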

The Naive Max-pooling strategy introduced above still infers each image individually and, thus, is unable to learn useful correlations between the image features within the same profile. To better present the idea of the Feature Pooling defense, let's separate the detector D into the two following partial modules:

• the convolution module 'cnn': includes the convolution layers in the detector network. This module is used to extract feature vectors of the input images that contain information about the input.

• the fully connected module 'fc': includes the fully connected layers at the end of the detector network. This module takes the feature vectors from the cnn module and performs the classification on these feature vectors to output the prediction.

The idea of Feature Pooling is to insert a pooling layer between the cnn module and the fc module. More specifically, first, the m images of the persona profile are fed through cnn separately to get m feature vectors. Note that the size of each feature vector is n_hidden, where n_hidden is the size of the output layer of the cnn module. Then, these vectors are fed through a pooling layer to get the combined feature vector of the persona profile, also of size n_hidden. Finally, we feed this vector through fc to get the final prediction.

Similar to the Naive Pooling defense, there are various choices for the pooling layer to add in. In the scope of this research, the simple Max-pooling layer is used as it makes the most sense. Figure 4.3 illustrates this Feature Max-pooling defense.

Figure 4.3: Illustration of the Feature Max-pooling defense, where m images of the profile are fed into the cnn layers of the detector and then into a max-pooling layer to get the profile feature vector. Lastly, the profile feature vector is fed into the fc layers to get the prediction.

Different from Naive Max-pooling, however, this strategy requires retraining of the fc module so that it can learn the correlation between images of a profile. To perform this retraining, we can initialize the weights of this new detector model with the original detector D's weights.
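A rough sketch of this architectural change is given below, assuming the cnn module already returns a flattened feature vector of size n_hidden per image; the class name and tensor shapes are illustrative.

import torch
import torch.nn as nn

class FeaturePoolingDetector(nn.Module):
    """Sketch of the Feature Max-pooling defense: run each profile image through
    the convolutional backbone, max-pool the m feature vectors element-wise,
    then classify the pooled profile feature with the fully connected head."""
    def __init__(self, cnn, fc):
        super().__init__()
        self.cnn = cnn   # convolution module of the original detector
        self.fc = fc     # fully connected module (to be fine-tuned on profiles)

    def forward(self, profile):                 # profile: (m, C, H, W)
        features = self.cnn(profile)            # (m, n_hidden), assumed flattened
        pooled, _ = features.max(dim=0)         # element-wise max over the m images
        return self.fc(pooled.unsqueeze(0))     # profile-level prediction

Only the fc head needs to be retrained on profile-level labels; the cnn weights can be copied from the original detector D as stated above.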

To conclude this chapter, three different defense strategies have been described to counter the fake persona attacks. The first strategy, adversarial retraining, aims to defend against the SiSA task and improve the overall robustness of the forensic systems. The Naive Pooling defense and Feature Pooling defense, on the other hand, take into account the correlation between images of a persona profile to decide whether the profile is fake. With these defenses, the forensic systems are expected to detect attack examples better, thus mitigating the threat of fake personas.

EXPERIMENT RESULTS AND ANALYSIS

Experiment setup

Environment: All of the experiments are run on the same machine with an NVidia Tesla K80 12GB GPU, an Intel Xeon E5 2695 v3 CPU, and 128GB of memory. The source code environment is Python 3.7; neural network models are implemented in PyTorch.

Target deepfake detector: This thesis studies the robustness of the deepfake detector proposed by [17] against our semantic attacks. This detector is based on ResNet-50 [44] pre-trained on ImageNet [45], then trained to classify real/fake images. The data used in this work contains a total of 720,000 training images and 4,000 validation images of 20 different kinds of objects (dog, cat, car, person, etc.), of which half are real images and half are synthesized by ProGAN [5]. The evaluation result shows that the deepfake detector can detect synthetic images not only from ProGAN but also from ten other unseen generators with high accuracy. The authors then improved the robustness of the detector against spatial blurring and JPEG compression by augmenting the training images. In this research, the model 'Blur+JPEG(0.5)' (trained with 50% of the data augmented) provided by the authors is used for semantic attacking. In all of the experiments, full-size images (1024x1024) are used instead of the 224x224 cropping offered by the authors, which generally gives better accuracy for the detector. This CNN detector is one of the most recent detectors, trained on an extremely large dataset and shown to have SOTA accuracy; therefore, it is selected for the evaluation in this thesis.

Semantic modification: The semantic face editing by [24] is applied. In this work, the authors proposed InterFaceGAN to explore how semantics are encoded in the latent space of GANs, then extended this to enable semantic face editing with any pre-trained GAN model. Different models are available, such as ProGAN [5] trained on the CelebA-HQ dataset, StyleGAN [6] trained on the CelebA-HQ dataset, and StyleGAN trained on the FF-HQ dataset. For each model, the authors provided different attribute vectors used for semantic face editing, such as gender, smile, age, pose, etc. Specifically, the semantic face editing for StyleGAN trained on CelebA-HQ [6] is used with four provided attribute vectors: age, smile, pose and gender. The reason for using StyleGAN rather than ProGAN is that StyleGAN is the more recent model and is considered better in quality compared to ProGAN.

Evaluation metrics: To assess the performance of the proposed attack approaches and defense strategies, I use three different evaluation metrics, including:

• Attack success rate (ASR): each approach for the SiSA task is evaluated by the success rate of the SiSA task. The inputs of the SiSA task are kept the same for all the approaches to ensure fairness in comparison. In detail, the input latent codes of the SiSA task are uniformly sampled in the latent space of StyleGAN with 1000 samples. The SiSA task is then performed with each of these 1000 latent codes as input, and the attack success rate (ASR) is the ratio of successes over the total of 1000 attacks.

• Average computational time: The proposed approaches are also evaluated by the average computational time taken for each successful attack of the SiSA task. To ensure fairness, the approaches are executed on the same machine, during the same session.

• Visual quality: To evaluate the visual quality of the output adversarial examples, the Fréchet Inception Distance (FID) metric [46] is used. FID is widely used to calculate the distance between two image datasets in terms of feature vector similarity, usually between a set of real images and a set of synthetic images, to assess the visual quality of the synthetic one. Rather than directly comparing images pixel by pixel (for example, as done by the l_2 norm), the FID compares the mean and covariance of two Gaussian distributions. By that, the FID between a Gaussian with mean and covariance (μ, C) and a Gaussian with mean and covariance (μ_w, C_w) is given by:

FID((μ, C), (μ_w, C_w)) = ∥μ − μ_w∥₂² + Tr(C + C_w − 2(C·C_w)^(1/2))    (5.1)

where, in Equation 5.1, Tr is the matrix trace operator. The FID metric is shown to be consistent with increasing disturbances and with human judgment [46], and has been a standard metric for assessing the quality of GANs since 2020. In this thesis, I calculate the FID scores between these images and real images from the state-of-the-art CelebA-HQ dataset [47] (denoted the FID_CelebA score); a small computation sketch is given after this list. A lower FID_CelebA score means the visual quality of the adversarial images is more similar to the quality of real images from CelebA-HQ.
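As an illustration of Equation 5.1, the snippet below computes the FID given feature statistics that are assumed to have already been extracted (in practice, from a pretrained Inception network); scipy.linalg.sqrtm is used for the matrix square root.

import numpy as np
from scipy import linalg

def fid(mu1, cov1, mu2, cov2):
    """FID between two Gaussians fitted to image features (Equation 5.1)."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(cov1 @ cov2)    # matrix square root of C * C_w
    if np.iscomplexobj(covmean):           # discard tiny imaginary parts from numerics
        covmean = covmean.real
    return diff @ diff + np.trace(cov1 + cov2 - 2 * covmean)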

Using different metrics in the experiments allows the proposed methods to be evaluated from different aspects and perspectives. Note that, apart from these three metrics, I also evaluate the proposed approaches by observing the attack examples and assessing the correctness of the semantic modifications. A semantic modification is said to be correct if the changes are bounded within the target attributes of SiSA only.

Two-phases white-box approach: the magnitude of change α is uniformly sampled in the range 1.0 to 2.0, the maximum number of iterations is 5, the learning rate η is 0.1, the norm p is 2, and the PGD epsilon ϵ is varied from 0.1 to 0.35 with a 0.05 step.

SA-GD white-box approach: the maximum number of iterations is 100, η is 0.1, the norm p is 2, and the orthogonal threshold h⊥ is varied from 1.0 to 3.5 with a 0.5 step.

SA-EA black-box approach: population size n = 80, number of generations is 15, crossover rate cr = 1.0 and mutation rate mr = 0.4. The orthogonal threshold h⊥ is also varied, the same as in SA-GD.

Target attributes setting: For testing purposes, in the ASR evaluation of the SiSA approaches, each attack targets only k = 1 attribute at a time. In theory, with more target attributes, the attack should work even better since the search space is larger. Therefore, in this experiment, I only consider the case where k = 1 to evaluate the performance of the attack in the most difficult attack scenario. Additionally, four different target attributes are tested, including age, smile, pose and gender, for a comprehensive study.

Adversarial retraining defense: adversarial retraining is performed on the target detector in [17]. In total, 20,000 adversarial examples generated from the proposed white-box attacks are added to the original dataset. The detector is then retrained for 5 epochs (training parameters are the same as in [17]). The retrained detector is also evaluated and shown to still be effective as a real/fake classifier, later in this chapter. In the rest of the thesis, the original detector is denoted 'original' and the retrained detector is denoted 'defense'.

Feature Max-pooling defense: Similarly, an additional dataset is generated containing 5,000 image profiles, each containing 4 images of the same identity. The Feature Max-pooling defense is retrained for at most 5 epochs. Note that, even though the defense is trained with profiles of 4 images, it can still validate profiles with an arbitrary number of images.

Single-image Semantic Attack task evaluation

In this section, the approaches for the SiSA task are evaluated. The three proposed approaches, namely the Two-phases approach, SA-GD, and SA-EA, are tested against the target deepfake detector. The counter-defense against SiSA, adversarial retraining (introduced in Section 4.1), is also considered and evaluated in this section.

To ensure the fairness of the experiments and make sure that the results obtained are scientifically sound, some baselines are stated in this part. These baselines act as proof that both the original detector and the defense detector work perfectly fine before any attacks are made.

a, Evaluation of the baseline defense detector

In this experiment, the retrained defense detector is evaluated to test whether the detector is still functional as a real/fake classifier. If the retrained detector can detect adversarial examples really well but is unable to perform real/fake classification, then this defense is pointless.

Table 5.1: Comparison of the accuracy (Acc.) and the average precision (AP) between the defense and the original detector; test sets are from [17] (no-crop evaluation).

The detector is evaluated with the same evaluation test as in the original paper [17]. Table 5.1 shows the result of the test, where the accuracy (Acc.) and average precision (AP) results of the original detector are taken directly from [17]. From observation, the AP is quite similar between the two models except for BigGAN, where the AP uncommonly drops. For accuracy, after retraining, there are cases where Acc. drops (CRN, IMLE) and also cases where Acc. rises remarkably (StyleGAN, StarGAN, SITD, StyleGAN2, Whichfaceisreal). Discussing this matter, the additional D_attack set for retraining contains only face images while the original train set and test set in [17] contain images of many different objects, thus causing the performance drop. On the other hand, since the images of D_attack are all StyleGAN-generated, the Acc. after retraining may rise when testing with a dataset using an architecture that is similar to StyleGAN (e.g., Whichfaceisreal [48] is generated from StyleGAN2).

b, Without-the-attack baseline

Without any attacks, the original detector can classify the 1,000 inputs with 99.6% accuracy (equivalent to a 0.4% failure rate), with an average prediction confidence of 98.7%. The defense detector retrained with StyleGAN can correctly classify all of the 1,000 inputs (1.0 accuracy) with an average prediction confidence of 99.5%. In summary, both detectors work as intended and can classify these input deepfakes extremely well.

5.2.2 Two-phases white-box approach evaluation

The first experiment evaluates the two-phases white-box attack. Figure 5.1a shows the ASR of the two-phases attack against the original detector when using different target attributes. From observation, the two-phases attack has a very high ASR; even with the lowest value of ϵ tested, the ASR is still around 60%. The ASR reaches 90% when ϵ is 0.15 and 100% for larger values, showing that the detector is highly vulnerable to the two-phases attacks. Between the different target attributes used in this test, no significant difference is observed. This shows that the attack works stably regardless of the target attribute selected.

Figure 5.1: Two-phases white-box ASR: (a) against the original detector with different target attributes; (b) against the original and defense detector (average value across target attributes).

Figure 5.1b shows the decrease in ASR of the two-phases attacks after performing adversarial retraining on the detector. Against the defense detector, the two-phases attack requires at least ϵ = 0.3 to reach 90% ASR. Using a higher value of ϵ will create a noisier image, which lowers its visual quality as a result. However, the gained robustness is still rather modest considering that the attacker may attack many times until successfully bypassing the detector. For this experiment, we can conclude that adversarial retraining does weaken the two-phases attacks to a certain extent, but does not completely mitigate the threat.

5.2.3 SA-GD white-box approach evaluation

In this experiment, the SA-GD white-box approach is evaluated. Figure 5.2a shows the ASR of SA-GD white-box attacks against the original detector when using different target attributes. Inspecting the figure, the ASR of SA-GD is overall rather high (greater than 50% in most cases, and close to 100% when h⊥ is 3.5), even though not as high as the two-phases attacks. Nonetheless, the ASR of SA-GD is still sufficient for attacking real-life systems. Notice that the ASR is higher when using greater values of h⊥, which is an expected result of a wider and larger search space. Between the different target attributes used in this test, the same outcome is observed: there is no significant difference.

Figure 5.2b illustrates the ASR of SA-GD against the detector after performing the adversarial retraining defense. There is also a noticeable decrease in the ASR of SA-GD across all the values of h⊥. Similar to the case of the two-phases approach considered above, attackers can still bypass defense systems with the SA-GD attack as they can perform the attack repeatedly.

Figure 5.2: The ASR of SA-GD white-box: (a) against the original detector with different target attributes; (b) against the original and defense detector (average value across target attributes).

5.2.4 SA-EA black-box approach evaluation

Similar to the above experiments, this experiment tests the SA-EA black-box approach against the original and the defense detector. Figure 5.3a shows the ASR of SA-EA against the original detector with different target attributes. Figure 5.3b illustrates the ASR against the detector after performing the retraining defense. In general, the SA-EA black-box attack shows a similar outcome to the white-box approaches, except for the overall lower ASR. The ASR of the attack after retraining is fairly low (below 20% even when using the highest value of the orthogonal threshold). In a real attack case, the attacker may require many attempts to get a successful attack, and many successful attacks with different target attributes to generate enough images for a fake persona profile. Therefore, with the current ASR, we can safely say that it would be quite challenging to attack with SA-EA black-box after the adversarial retraining defense.


Figure 5.3: The attack success rate of SA-EA black-box: (a) against the original detector with different target attributes; (b) against the original and defense detector (average value across target attributes).

5.2.5 Comparison between the approaches for SiSA

The next experiment compares the proposed approaches for the SiSA task given the same constraint. Since the two-phases approach does not share the same orthogonal threshold constraint h⊥ as SA-GD and SA-EA, this attack is not included in this experiment. To give a better perspective, two naive grid-search black-box approaches are also implemented as baselines, which are:

• 1-dim grid-search: perform the grid-searching along the target attribute vector.

• Multi-dim grid-search: perform the grid-searching along the target attribute vector, with the addition of random orthogonal perturbations within a threshold h⊥ in each grid step.

Figure 5.4 shows the ASR of SA-GD white-box attacks, SA-EA black-box attacks, 1-dim grid-search and Multi-dim grid-search against the original detector with different orthogonal thresholds. From observation of the results, we can see that both SA-GD white-box and SA-EA black-box semantic attacks easily outperform the naive grid-search baseline approaches. Between the proposed approaches, the SA-GD white-box attacks have a noticeably higher ASR compared to SA-EA black-box attacks. This is a common and expected outcome considering that the white-box attacks have further access to and more information about the detector.


Figure 5.4: The ASR of SA-GD white-box, SA-EA black-box and grid-search approaches given the same h⊥ (average value across target attributes).

In this experiment, the visual quality of the attack examples from the SiSA task, in terms of image quality and the correctness of the semantic changes, is evaluated.

a, Adversarial image quality:

The adversarial image quality is evaluated with the FID_CelebA score in this experiment.

Figure 5.5 illustrates the average FID_CelebA score of the output adversarial images when performing each attack approach against the original and defense detector. As a baseline, the FID_CelebA value of the images generated by StyleGAN from the 1,000 input latent codes is calculated and represented in Figure 5.5 as the red dashed line. Note that the FID_CelebA of the StyleGAN-generated images in this research is not equivalent to the FID reported in the StyleGAN paper [6], since the set here only contains 1,000 StyleGAN images. As we are using a single reference set of CelebA-HQ real images to calculate the FID, this value can be used as a baseline to evaluate the visual quality of the adversarial images relative to the original StyleGAN-generated images.

Anti-forensic fake persona attack evaluation

In the experiments above, I have evaluated the ASR of different SiSA approaches, which are performed to generate an individual attack sample. However, anti-forensic fake persona attacks require performing the SiSA task repeatedly to acquire a feasible amount of deepfakes to form the profile. In this section, the ASR of anti-forensic fake persona attacks is evaluated. To distinguish this success rate from the success rate of SiSA in the previous section, let P-ASR denote the fake persona attack success rate.

Here, the Naive Max-pooling and Feature Max-pooling strategies (introduced in Section 4.2) are applied to perform fake persona profile detection, as the defense detector alone does not support profile validation. Furthermore, the two-phases approach is used for this evaluation since this attack has the highest ASR among all.

Figure 5.8: The P-ASR of the two-phases approach: (a) against the Naive Max-pooling strategy with different ϵ; (b) Naive Max-pooling vs Feature Max-pooling strategy where ϵ = 0.2 (m is the number of images in a profile).

Figure 5.8a shows the P-ASR of the two-phases approach against the defense detector with the Naive Max-pooling strategy applied, when changing the number of images m in a profile. Note that the case where m = 1 is equivalent to the two-phases attack against the defense detector without Naive Max-pooling (which is tested in Section 5.2.2, Figure 5.1b). From observation, when the persona profile of interest contains more images, the Max-pooling defense tends to detect it better. This indicates that, if the attackers want to increase the number of deepfake images in the fake profile (to make the profile look more legit), they will have to trade this off against a lower chance of bypassing the forensic systems.

With the Naive Max-pooling strategy alone, the attack success rate does get lower, but not by much. Next, the Feature Max-pooling strategy is tested. Figure 5.8b shows the comparison between the Naive Max-pooling and Feature Max-pooling strategies when the value of m is varied. From the figure, we can notice that the Feature Max-pooling strategy tends to detect much better than the Naive Max-pooling strategy for every value of m tested. The P-ASR dropped by 20% to 25% with the Feature Max-pooling strategy applied. This highlights the importance of designing forensic methods that approach the threat in a more comprehensive manner for stronger performance.

Discussions

5.4.1 Visual quality trade-off between approaches

Against stronger and better-trained detectors, the adversarial attacks usually require generating larger perturbations to gain a higher attack success rate, thus lowering the visual quality of the attack examples. For different SiSA approaches, the visual quality may be affected in very different ways.

Figure 5.9: Exaggerated examples of how a larger perturbation affects the visual quality: the two-phases approach generates noisier images, while SA-GD/SA-EA output non-target attribute changes.

Figure 5.9 illustrates how the different approaches affect the visual quality in different ways when the detector gets strengthened. Note that the perturbations in the figure are exaggerated to show possible outcomes when attacking better-trained detectors. For the two-phases approach, a larger perturbation translates to visibly much noisier images.

On the other hand, the SA-GD/SA-EA approaches can still generate relatively clean images; however, these images have more noticeable changes in off-target semantic attributes with increasing orthogonal threshold values. This highlights the trade-offs in visual noise and off-target semantics faced by these different approaches.

In summary, even though the two-phases approach has the highest performance among the three proposed, it is not always the go-to choice. One may prefer the SA-GD approach for the cleaner visual look of the attack examples, or the SA-EA approach in the case of a black-box attack scenario. Each approach has its own advantages compared to the others; depending on the requirements of the fake persona attack, the attackers may select the most feasible approach.

From the experiments in this thesis, to perform the attacks against deepfake forensic systems, most approaches require querying the system multiple times until successfully bypassing it. Furthermore, the queries made by these attacks also tend to converge to an instance eventually. Based on this knowledge, certain defense techniques can be proposed, such as the following (a small sketch of the query-buffer idea is given after the list):

• Continuous Query Detecting: based on the work by Chen et al. [49]. This defense method alerts that the system is under black-box attack by recording the recent queries made to the detector and storing them in a buffer. When a new query arrives, its distance to the existing queries in the buffer is calculated. If too many queries in the buffer are close to the new query, we can alert that the forensic system may be under attack.

• Experience-based: this defense records the queries that are detected as 'fake' by the detector. When a new query arrives that is detected as 'real', the system checks whether the query is close to the queries that have previously been detected as 'fake'. If it is, the system marks the query as 'potentially dangerous' and uses a higher real/fake threshold to decide the output of the prediction.
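As a rough sketch of the Continuous Query Detecting idea (not the exact mechanism of [49]), a buffer-based check could look as follows; the buffer size, distance threshold, and neighbor limit are illustrative values only.

import numpy as np
from collections import deque

class QueryBufferDefense:
    """Keep recent queries in a buffer and raise an alert when too many
    stored queries lie near a newly arriving one."""
    def __init__(self, buffer_size=1000, distance_threshold=5.0, max_neighbors=50):
        self.buffer = deque(maxlen=buffer_size)
        self.distance_threshold = distance_threshold   # illustrative values
        self.max_neighbors = max_neighbors

    def check(self, query_features):
        # Count previous queries that are close to the incoming one
        neighbors = sum(
            np.linalg.norm(query_features - q) < self.distance_threshold
            for q in self.buffer
        )
        self.buffer.append(query_features)
        return neighbors > self.max_neighbors           # True -> likely under attack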

With these query-based defenses, the robustness of forensic systems against fake persona attacks is expected to improve. However, within the scope of this thesis, due to limited resources, I can only discuss these techniques in theory and do not study them empirically.

While the attack approaches discussed in this thesis may raise certain concerns about the risk of malicious actors abusing the attacks, I believe it is more beneficial as a whole that the findings here (especially the defenses) are shared with the research community. With this thesis, I hope to facilitate research in the building of more robust forensic systems and partially contribute to the fight against deepfake abuses and cybercrimes.

CONCLUSION AND FUTURE WORKS

Contributions

In the bulk of this Master's work, I have comprehensively investigated a novel threat model where attackers can fabricate a convincing anti-forensic deepfake persona backed by specially-crafted semantic adversarial examples. Three attack approaches considering both white- and black-box scenarios are proposed to edit only targeted semantic attributes of a source deepfake image while bypassing the detector at the same time. To counter this threat, different defense strategies are also proposed and discussed to help enhance the detection of deepfake personas. In the experiments, while current forensic systems are shown to be highly vulnerable to the fake persona attacks, the aforementioned defenses can mitigate the threat by aggregating predictions from the images of the fake profile. With the findings in this thesis work, I hope to raise awareness of the threat of fake persona cybercrimes and facilitate research in developing more robust defenses.

The results achieved in this Master's thesis have been submitted to the following international conference: Anti-Forensic Deepfake Personas and How To Spot Them – Nguyen Hong Ngoc, Alvin Chan, Huynh Thi Thanh Binh, Yew Soon Ong, IJCNN - 2022 IEEE World Congress on Computational Intelligence (accepted for publication).

Publication during the Master's program: "A family system based evolutionary algorithm for obstacles-evasion minimal exposure path problem in Internet of Things" - Nguyen Thi My Binh, Nguyen Hong Ngoc, Huynh Thi Thanh Binh, Nguyen Khanh Van, Shui Yu, Expert Systems With Applications, p. 116943, 2022, ISSN: 0957-4174 (Q1 journal, IF: 6.954). The evolutionary algorithm designed for the SA-EA approach in this thesis is partially inspired by the evolutionary algorithm in this work.

Limitations and future works

Through the experiments, even though the defenses can reduce the attack success rate to a certain extent, they still do not completely neutralize the threat, due to the fact that attackers may repeatedly attack the forensic system. This reveals a gap in current deepfake detection techniques, as well as a limitation in the fight against deepfake abuses and cybercrimes.

Understanding this limitation, in the future the research team will continue to research more robust deepfake detection techniques. We will consider combining some of these detection techniques to make the forensic system even stronger and detect fake persona profiles better.

[1] "Deepfake," Wikipedia, 2021, https://en.wikipedia.org/wiki/

[2] T Karras, S Laine, M Aittala, J Hellsten, J Lehtinen, and T Aila, “Analyzing and improving the image quality of stylegan,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp 8110–8119.

[3] "Watch jordan peele use ai to make barack obama deliver a psa about fake news," The Verge, 2021, https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed

[4] I Goodfellow, J Pouget-Abadie, M Mirza, et al., “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp 2672–2680.

[5] T Karras, T Aila, S Laine, and J Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.

[6] T Karras, S Laine, and T Aila, “A style-based generator architecture for gener-ative adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2019, pp 4401–4410.

[7] D Güera and E J Delp, "Deepfake video detection using recurrent neural networks," in 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), IEEE, 2018, pp 1–6.

[8] Y Li and S Lyu, "Exposing deepfake videos by detecting face warping artifacts," arXiv preprint arXiv:1811.00656, 2018.

[9] S Suwajanakorn, S M Seitz, and I Kemelmacher-Shlizerman, "Synthesizing obama: Learning lip sync from audio," ACM Transactions on Graphics (TOG), vol 36, no 4, pp 1–13, 2017.

[10] T Chen, A Kumar, P Nagarsheth, G Sivaraman, and E Khoury, "Generalization of audio deepfake detection," in Proc Odyssey 2020 The Speaker and Language Recognition Workshop, 2020, pp 132–137.

[11] R Tolosana, R Vera-Rodriguez, J Fierrez, A Morales, and J Ortega-Garcia, "Deepfakes and beyond: A survey of face manipulation and fake detection," arXiv preprint arXiv:2001.00179, 2020.

[12] N Carlini and H Farid, “Evading deepfake-image detectors with white-and black-box attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp 658–659.

[13] S Agarwal, H Farid, Y Gu, M He, K Nagano, and H Li, “Protecting world leaders against deep fakes.,” in CVPR Workshops, 2019, pp 38–45.

[14] "A high school student created a fake 2020 candidate. Twitter verified it," Edition CNN, 2021, https://edition.cnn.com/2020/02/28/tech/fake-twitter-candidate-2020/index.html.

[15] Y Li, M.-C Chang, and S Lyu, "In ictu oculi: Exposing ai created fake videos by detecting eye blinking," in 2018 IEEE International Workshop on Information Forensics and Security (WIFS), IEEE, 2018, pp 1–7.

[16] A Rossler, D Cozzolino, L Verdoliva, C Riess, J Thies, and M Nießner, "Faceforensics++: Learning to detect manipulated facial images," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp 1–11.

[17] S.-Y Wang, O Wang, R Zhang, A Owens, and A A Efros, "Cnn-generated images are surprisingly easy to spot for now," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, https://github.com/peterwang512/CNNDetection (accessed Nov 1, 2020), vol 7, 2020.

[18] J Stehouwer, H Dang, F Liu, X Liu, and A Jain, “On the detection of digital face manipulation,” arXiv preprint arXiv:1910.01717, 2019.

[19] J Frank, T Eisenhofer, L Schönherr, A Fischer, D Kolossa, and T Holz, "Leveraging frequency analysis for deep fake image recognition," arXiv preprint arXiv:2003.08685, 2020.

[20] D Li, W Wang, H Fan, and J Dong, "Exploring adversarial fake images on face manifold," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp 5789–5798.

[21] W Xu, S Keshmiri, and G Wang, “Adversarially approximated autoencoder for image generation and manipulation,” IEEE Transactions on Multimedia, vol 21, no 9, pp 2387–2396, 2019.

[22] A Van den Oord, N Kalchbrenner, L Espeholt, O Vinyals, A Graves, et al., "Conditional image generation with pixelcnn decoders," Advances in neural information processing systems, vol 29, 2016.

[23] J Feng, X Feng, J Chen, et al., "Generative adversarial networks based on collaborative learning and attention mechanism for hyperspectral image classification," Remote Sensing, vol 12, no 7, p 1149, 2020.

[24] Y Shen, J Gu, X Tang, and B Zhou, "Interpreting the latent space of gans for semantic face editing," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, https://github.com/genforce/interfacegan

[25] S McCloskey and M Albright, “Detecting gan-generated imagery using color cues,” arXiv preprint arXiv:1812.08247, 2018.

[26] N Yu, L Davis, and M Fritz, "Attributing fake images to gans: Analyzing fingerprints in generated images," arXiv preprint arXiv:1811.08180, vol 2, 2018.

[27] R Wang, L Ma, F Juefei-Xu, X Xie, J Wang, and Y Liu, "Fakespotter: A simple baseline for spotting ai-synthesized fake faces," arXiv preprint arXiv:1909.06122, 2019.

[28] J C Neves, R Tolosana, R Vera-Rodriguez, V Lopes, and H Proença, "Real or fake? spoofing state-of-the-art face synthesis detection systems," arXiv preprint arXiv:1911.05351, 2019.

[29] F Marra, C Saltori, G Boato, and L Verdoliva, “Incremental learning for the detection and classification of gan-generated images,” in 2019 IEEE International Workshop on Information Forensics and Security (WIFS), IEEE, 2019, pp 1–6.

[30] L Engstrom, B Tran, D Tsipras, L Schmidt, and A Madry, "Exploring the landscape of spatial robustness," in International Conference on Machine Learning, 2019, pp 1802–1811.

[31] N Carlini and D Wagner, “Towards evaluating the robustness of neural networks,” in 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017, pp 39–57.

[32] N Papernot, F Faghri, N Carlini, I Goodfellow, et al., "Technical report on the cleverhans v2.1.0 adversarial examples library," arXiv preprint arXiv:1610.00768, 2018.

[33] F Croce and M Hein, “Minimally distorted adversarial examples with a fast adap-tive boundary attack,” arXiv preprint arXiv:1907.02044, 2019.

[34] P Neekhara, B Dolhansky, J Bitton, and C C Ferrer, "Adversarial threats to deepfake detection: A practical perspective," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp 923–932.

[35] A Joshi, A Mukherjee, S Sarkar, and C Hegde, “Semantic adversarial attacks: Parametric transformations that fool deep classifiers,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp 4773–4783.

[36] C.-H Ho, B Leung, E Sandstrom, Y Chang, and N Vasconcelos, "Catastrophic child's play: Easy to perform, hard to defend adversarial attacks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp 9229–9237.

[37] I J Goodfellow, J Shlens, and C Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.

[38] H Zhang, Y Yu, J Jiao, E P Xing, L E Ghaoui, and M I Jordan, "Theoretically principled trade-off between robustness and accuracy," arXiv preprint arXiv:1901.08573, 2019.

[39] A Madry, A Makelov, L Schmidt, D Tsipras, and A Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv: 1706.06083, 2017.

