
Facial image forgery and detection: benchmark and application


Hanoi University of Science and Technology

Master Thesis

Facial image forgery and detection: benchmark and application

Pham Minh Tam
tam.pm202708M@sis.hust.edu.vn

School of Information and Communication Technology

Supervisor: Assoc. Prof. Huynh Quyet Thang
Supervisor's signature

Institution: School of Information and Communication Technology

Hanoi, 12/2021

Declaration of Authorship

I, Pham Minh Tam, declare that this thesis titled "Facial image forgery and detection: benchmark and application" and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

Student
Signature and Name

Acknowledgements

I would like to extend my thanks to the many people who so generously contributed to the work presented in this thesis. I would first like to thank my supervisor, Assoc. Prof. Huynh Quyet Thang, who gave me so many wonderful opportunities and so much valuable guidance throughout my studies. I would like to thank Dr. Henry Nguyen and Dr. Huynh Thanh Trung, who provided me with the tools I needed to choose the right direction and successfully complete my dissertation; their insightful feedback pushed me to sharpen my thinking and brought my work to a higher level. I would also like to thank my family, my girlfriend and my friends for their wise counsel and sympathetic ears; you are always there for me. Finally, I would like to thank the Vingroup Innovation Foundation (VINIF) for its financial support.

Abstract

In recent years, visual forgery has reached a level of sophistication at which humans can no longer identify fraud, which poses a significant threat to information security. A wide range of malicious applications have emerged, such as fake news, the defamation or blackmailing of celebrities, the impersonation of politicians in political warfare, and the spreading of rumours to attract views. As a result, a rich body of visual forensic techniques has been proposed in an attempt to stop this dangerous trend. In this thesis, I introduce two new models, named Efficient-Frequency and WADD, to improve results on the fake face image detection problem. I also present a benchmark that provides in-depth insights into visual forgery and visual forensics, using a comprehensive and empirical approach, and propose a novel end-to-end visual-forensic framework that can incorporate different modalities to efficiently classify real and forged content. More specifically, we develop an independent framework that integrates state-of-the-art counterfeit generators and detectors, and measure the performance of these techniques using various criteria. We also perform an exhaustive analysis of the benchmarking results to determine the characteristics of the methods, which serves as a comparative reference in this never-ending war between measures and countermeasures.

Student
Signature and Name

CONTENTS

Abstract
1 Introduction
1.1 Context
1.2 Research Problems
1.3 Contributions
1.4 Thesis Outline
1.5 Selected Publications
2 Preliminaries and literature survey
2.1 Image classification problem
2.2 Visual forgery techniques
2.2.1 Graphics-based techniques
2.2.2 Feature-based techniques
2.3 Visual forensics techniques
2.3.1 Computer vision techniques
2.3.2 Deep learning techniques
3 Proposed facial forgery detection models
3.1 Efficient-Frequency model
3.2 WADD (Wavelet Attention for deepfake detection) model
4 Proposed dual benchmarking framework for facial forgery and detection techniques
4.1 Framework
4.2 Datasets
4.2.1 Dual-benchmarking datasets (DBD)
4.2.2 External datasets
4.3 Measurements
4.4 Experimental Procedures
4.5 Reproducibility Environment
5 Experimental Results and Performance Analysis
5.1 Efficiency comparison
5.2 End-to-end comparison with existing datasets
5.3 Dual-benchmarking comparison
5.3.1 Forensic generalisation and forgery feature overlapping
5.3.2 Qualitative study of the forensic-forgery duel
5.3.3 Influence of contrast
5.3.4 Effects of brightness
5.3.5 Robustness against noise
5.3.6 Robustness against image resolution
5.3.7 Influence of missing information
5.3.8 Adaptivity to image compression
5.4 Performance guidelines
Conclusions
References

List of Tables

1.1 Comparison between existing benchmarks on facial forensics
2.1 Taxonomy of visual forgery techniques
4.1 Statistics of real datasets
5.1 Model size and detection speed
5.2 Train/validation/test statistics of existing datasets
5.3 Performance (Accuracy | Precision | Recall | F1 score) of visual forensic techniques on different datasets
5.4 Train/validation/test statistics of synthetic datasets
5.5 Performance of visual forensics techniques against visual forgery techniques
5.6 Performance guideline for visual forensics

List of Figures

2.1 A vanilla CNN
2.2 Convolution layer
2.3 Max and average pooling
2.4 Global average pooling
2.5 Flatten and fully connected layer
2.6 Activation functions
3.1 Overview of the Efficient-Frequency pipeline
3.2 EfficientNet architecture
3.3 WADD model
3.4 Wavelet pooling
3.5 Attention layer
4.1 Dual benchmarking framework
4.2 Size of facial forgery datasets
4.3 Images from the DBD dataset
5.1 Generalization ability of forensic techniques
5.2 Overlapping features of forgery techniques
5.3 Suspicious region of forged images
5.4 Effects of illumination factors
5.5 Robustness against noise
5.6 Robustness against image resolution
5.7 Influence of missing information
5.8 Adaptivity to image compression

Chapter 1
Introduction

1.1 Context

Forgery images are synthetic images created with computer tools; their content does not reflect the truth and misleads viewers about reality. The face is the most frequently faked component, because of its importance in determining a person's identity, and face spoofing is the most common method of falsifying the information in a photo. In fake face images, a person in an existing image or video is replaced with someone else's likeness. The terms "fake image" and "forged image" have emerged in recent years because many tools, such as Photoshop, now make it possible to change the content of images. Although these tools are powerful, users need a great deal of knowledge to use them; as a result, the number of fake images used to be low, and making a fake photo took considerable resources. Now, however, rather than images simply being altered by editing software such as Photoshop or videos being deceptively edited, there is a new breed of machine-made fakes, and they could eventually make it impossible for us to tell fact from fiction. With the development of deep learning techniques, there are now many methods that can generate "fake images" quickly, easily and in bulk.
"Deep fakes" are the most prominent form of what’s being called “synthetic media”: images, sound and video that appear to have been created through traditional means but that have, in fact, been constructed by complex software Deep fakes have been around for years and, even though their most common use to date has been transplanting the heads of celebrities onto the bodies of actors in pornographic videos, they have the potential to create convincing footage of any person doing anything, anywhere There are many app which use "Deep fake" technique to help people easily make forgery content such as Zao or FaceApp software 10 Figure 5.3: Suspicious region of forged images 5.3.3 Influence of contrast This experiment studied the effects of image contrast on the visual forensics techniques Figure 5.4a illustrates the result of these experiments for a contrast factor that varied from 0.5 to 2, as described in section 4.4 In general, all techniques suffered a reduction in accuracy when the contrast factor was at the extreme ends of this range, and performed the best when the contrast factor was Efficient-Frequency showed the greatest robustness to this factor for all forgery techniques, and could maintain an accuracy of higher than 0.9 when the contrast factor was as high as or as low as 0.5 Xception and Capsule have good result when constrast of image is changed Capsule showed a similar level of robustness to this factor as Xception, but it achieved better results on StarGAN, with an accuracy of 0.8, when the contrast factor was FDBD gave the poorest results when the contrast in the image was extreme WADD and Mesonet was unstable in terms of accuracy, due to their simple neural model architecture The GAN-fingerprint model could in some cases achieve 52 (a) Contrast (b) Brightness Figure 5.4: Effects of illumination factors better accuracy when the image contrast was increased This is because the higher contrast exposes more of the fingerprint of the forged image 5.3.4 Effects of brightness We then studied the effects of another property, the image brightness Figure 5.4b shows the results of an experiment in which the brightness factor of the visual content was varied from 0.5 to 53 Similarly to the experiment with contrast, each technique showed a reduction in accuracy when the brightness of the visual content changed significantly Efficient-Frequency showed the most technique to changes in this factor The accuracy was stable at around 0.9 for Deepfake, 3DMM,FaceSwap-3D and ReenactGAN, and at above 0.8 for FaceSwap-2D, MonkeyNet and StarGAN , when the brightness factor was increased to The Capsule and XceptionNet techniques showed the same level of stability as Efficient-Frequency, with an accuracy of above 0.7 for all forgery techniques In contrast, WADD and Mesonet was very susceptible to changes in brightness, as its model contains significantly fewer layers than EfficientFrequency, XceptionNet and Capsule Extreme changes in brightness also adversely affected the performance of GAN-fingerprint, and its accuracy was reduced by around 0.2 for all forgery techniques when the brightness factor changed to 0.5 or The traditional techniques, HPBD and Visual-Artifacts, were less susceptible to changes in the brightness of the visual content This is because these techniques depend strongly on engineered features such as landmarks and facial details, which are unaffected by the brightness 5.3.5 Robustness against noise We then explored the effects of noise on the performance of each forensics technique 
To simulate this condition, we added Gaussian noise to the images in the dual benchmarking dataset, as described in Section 4.4. Figure 5.5 depicts the experimental results. An unexpected finding was that most of the forensics techniques were strongly affected by this noise factor.

Figure 5.5: Robustness against noise

GAN-fingerprint demonstrated the greatest robustness to this noise factor, and its accuracy remained above 0.7 even when the noise level (standard deviation σ) reached 0.3. This can be explained by the fact that Gaussian noise does not affect the fingerprint generated by GAN-based forgery techniques, and its in-depth investigation at both the image and model level helps the model to mitigate the effects of the noise. Conversely, other deep learning techniques such as Efficient-Frequency, XceptionNet and Capsule, which showed strong potential in the previous tests, did not perform well in this experiment: their accuracy quickly fell to 0.5 when the standard deviation reached 0.1. On some datasets, such as 3DMM, ReenactGAN and StarGAN, WADD worked very well at low noise levels, with an accuracy above 80% with noise having σ
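As with the illumination experiments, the exact noise-injection procedure is defined in Section 4.4 and not reproduced in this excerpt. A minimal sketch of zero-mean additive Gaussian noise on an image normalised to [0, 1], using σ values in the 0.1–0.3 range cited above, might look as follows; the function and variable names, and the placeholder input array, are assumptions for illustration only.

```python
# Minimal sketch (assumption): additive Gaussian noise on an image whose pixel
# values are scaled to [0, 1], using the standard deviations cited above.
import numpy as np

def add_gaussian_noise(image, sigma, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma`.

    `image` is a float array in [0, 1]; the result is clipped back to [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

if __name__ == "__main__":
    img = np.random.rand(256, 256, 3)  # placeholder for a real face image
    for sigma in (0.1, 0.2, 0.3):
        noisy = add_gaussian_noise(img, sigma)
        print(sigma, float(np.abs(noisy - img).mean()))
```

Clipping back to [0, 1] keeps the perturbed image a valid detector input; whether the thesis pipeline clips or rescales is not stated in this excerpt.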
