2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

Modified CNN Model-Based Forgery Detection Applied to Multiple-Resolution Tampered Images

Thuong Le-Tien (thuongle@hcmut.edu.vn), Duy Ho-Van (hoduy1411@gmail.com), Nhu Pham-Ng-Quynh (nhu.pham112358@hcmut.edu.vn), Hanh Phan-Xuan* (phantyp@gmail.com), Tuan Nguyen-Thanh* (nttuan@hcmut.edu.vn)
Department of EEE, University of Technology, National University (VNU), HCM City, Vietnam

Abstract: A crucial problem in image forensics is how to detect tampered images circulating on public media platforms, where content is exposed to deliberate modification. Because image-editing programs are widely accessible, an image or video on Facebook, Instagram, Reddit, Twitter, etc. can easily be tampered with to falsify the information it carries. To meet the need for an efficient method of detecting fake images, we have developed modified CNN models combined with a super-resolution approach. In this paper, we present a CNN-based method for detecting tampered images in which the tampered areas have been increased in resolution; the proposed model can both detect such images and point out the areas that have been tampered with. Modified ResNet50 and mUNet models are used for classification and segmentation, respectively. With the developed models, an accuracy of at least 90% was obtained on the evaluation sets.

Index Terms: Image Forensics, Deep Learning, Image Forgery Segmentation, Image Forgery Detection and Localization, Multi-Resolution Tampered Images, ResNet50, mUNet, umUNet

978-1-6654-1001-4/21/$31.00 ©2021 IEEE

I. INTRODUCTION

One of the most common ways of forging an image is to embed a copy of part of the image into the image itself; in other words, forgery by copy-move. The process of "embedding" consists of three stages: copying an image segment, transforming it (in intensity or geometry, e.g. rotating, zooming in or out, and possibly increasing its resolution), and shifting it to a new location in the image. After shifting, the image can be blurred to hide the forged component. Another way to tamper with an image is splicing, where an external image is taken, edited, and pasted into another image. This kind of modification is more difficult to detect than copy-move. For copy-move images, we can search within the image itself for the source of the copied area. For spliced images, it is difficult to find any information in the image to use as a basis for comparison. Splicing therefore poses many technical challenges.

A. Traditional methods

Many papers have proposed algorithms to detect copy-move tampered images. Most of them are related to the general method developed by A. Popescu and H. Farid [1], and they differ from each other only at the feature extraction stage. This approach is divided into two groups: keypoint-based and block-based methods. The keypoint-based methods [2-4] rely on the well-known Scale-Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) techniques. The features of the keypoints are compared with each other to find similarities. The final tampered regions are reported when the matched keypoints agree on the same affine transform and the match passes a threshold level. In the block-based methods [5-7], tiles are generated by a sliding window. Features of the tiles are extracted by applying transformations such as Zernike moments in [5], while Hwei-Jen Lin et al. [6] use the radix sort algorithm to sort the block feature vectors, which are extracted with the formula proposed in that paper. The features are then compared, and two tiles are considered duplicated regions when their similarity is sufficiently large.

B. Deep Learning method

Following the trend in machine learning, many deep learning methods have been applied to image forgery detection. Among them is the method of Rao and Ni [8], which uses a convolutional neural network (CNN). Instead of being randomly initialized like the subsequent layers, the weights of the first layer are initialized with the 30 basic high-pass filters used in the SRM model [9] for image analysis; this is the highlight of the method. This initialization of the first layer increases accuracy, since the high-pass filters in [9] help to filter out unnecessary details in the image. The CNN consists of 10 layers and takes as input 128x128 image patches containing the boundary of the forged region in the fake image. The CNN is pre-trained, so it has accumulated experience from example images and can better distinguish fake from real images. This method achieves a high accuracy of 98.04% on the CASIA v1.0 standard dataset, 97.83% on the CASIA v2.0 dataset and 96.38% on the DVMM set.

II. METHODOLOGY AND EXPERIMENT

In this section, we discuss the problem of detecting forged images in which the tampered region has been increased in resolution, formulated as binary classification at the pixel level (i.e. binary segmentation). Before discussing the network architecture, we describe the effect of resolution on model performance. When an image contains an upscaled region, the tampered boundary has a rather large gradient, which makes it easy for the neural network to learn. In previous studies of this problem, segmentation networks were trained only on fake (positive) images. This leads to the following problems.
Problem 1: When neural networks are trained only on positive data, they are likely to fail at run time when they encounter negative samples (authentic images).

Problem 2: Adding negative images to the training set leads to more severe data imbalance.

To solve these two problems, a unified neural network is designed to perform image classification and segmentation simultaneously. In addition, to overcome the data imbalance when training the network with both positive and negative samples, we use a loss function that supports the training of the merged network. This loss function is taken from the study of Thuy Nguyen-Chinh et al. [10].

A. Network Architecture

ResNet50 is chosen for the classification task; its architecture is shown in Figure 1. ResNet50 is built from basic blocks called Residual Bottlenecks. By establishing connections between low-level and high-level features, the network not only learns from both kinds of features but also reduces the vanishing-gradient phenomenon, which earlier networks often suffered from when the number of layers (the depth of the network) was increased. To use ResNet50 for image forgery classification, we replace the last fully connected layer (1,000 outputs for ImageNet) with two fully connected layers with 2048 outputs and 2 outputs (binary classifier), respectively. The remaining network weights are inherited from ResNet50 trained on ImageNet. In addition, we initialize the weights of the added layers by pre-training those layers.

Fig. 1. The original ResNet50 architecture [11].

For the segmentation network, we designed an architecture based on UNet. UNet has a symmetric architecture; in other words, the number of convolutional layers in corresponding stages of the encoder and decoder is similar. UNet's encoder is designed based on VGG. Here, we replaced the encoder of UNet with ResNet50. ResNet has 5 stages, so the output stride of the encoder becomes 32. The symmetry of the original UNet architecture can be redundant. Therefore, we also made the decoder more compact to speed up computation without sacrificing much accuracy. The resulting segmentation network, called mUNet (modified UNet), is shown in Figure 2. Its backbone is ResNet50 with an output stride of 32, excluding the last fully connected layer. Each stage of the decoder has two components: the Decoder Block and the Residual Bottleneck. The Residual Bottleneck is taken from ResNet50, while the Decoder Block is designed specifically to reduce complexity.

Fig. 2. Architecture of the proposed mUNet (modified UNet) with 1 output branch.

We then merge the two models above into a single model called umUNet (unified modified UNet), depicted in Figure 3. Since the backbone of mUNet is ResNet50, which is also the classification network, we were able to add a classification branch immediately after the last layer of the encoder path. umUNet therefore has two output branches: a surface mask and a classifier.

Fig. 3. Architecture of the proposed umUNet (unified modified UNet) with 2 output branches.

B. The process summary

The simulation process for the three models follows four main steps: data preparation, training, evaluation and simulation.

Fig. 4. Block diagrams of the simulated models.

The four phases apply in the same way to all three models: ResNet50, mUNet and umUNet. First, the images in the datasets are converted to the same naming scheme and the same size, 512 x 512. Training is also quite similar across the models. The first step is image preprocessing: the input image is normalized, and data augmentation is then applied to help the network learn more efficiently. During training, we monitor the F1-score on the validation set until it no longer increases; at this point the network has converged and we stop training. Finally, the trained ResNet50, mUNet and umUNet models are saved, because they are reused for model evaluation. After several rounds of fine-tuning, we choose the best set of model parameters and run the model simulation. Figure 4 shows the entire implementation.

C. Image Dataset

The goal is to build a model capable of detecting fake images at many different resolutions. We use several popular datasets: CASIA1-2 [12], IEEE [13], Coverage [14] and Columbia [15]. CASIA2 and part of IEEE are used for training, while the rest of IEEE, CASIA1, Coverage and Columbia are used to evaluate model performance. All the datasets used in this paper are summarized in Table I.

TABLE I. SUMMARY OF THE IMAGE DATASETS USED IN THE SIMULATIONS.

Dataset    Authentic  Tampered  Total
IEEE       1,050      450       1,500
CasiaV2    7,491      5,123     12,614
CasiaV1    800        921       1,721
Columbia   183        180       363
Coverage   100        100       200

D. Training

The ResNet50 model is used for the image classification task. Before training, each image is preprocessed by normalizing its gray-level values to the range [0, 1]:

I_{norm,c} = I_c / 255    (1)

where I and I_{norm} are the original and normalized images, and the subscript c denotes the color channel in the RGB color system, c ∈ {R, G, B}. The dataset is unbalanced between positive and negative labels, with a total of more than 13,000 images. This number is not large, so we applied data augmentation (horizontal and vertical flips; transposition; elastic and optical transforms) from the augmentations library. As a result, a fairly balanced and sufficiently large dataset is obtained to train the model. The Cross-Entropy loss function and the Adam optimizer (learning rate 1[…]) are used to optimize the weights of the CNN. After 20 training epochs, the network converges and no longer improves. At this point we stop training and store the ResNet50 weights for use in the following models.

The mUNet model is used for the segmentation task. For data-balancing reasons, this model is restricted to tampered images as input. Inside tampered images, negative pixels still dominate; we addressed this by adding a balancing weight to the Cross-Entropy loss function:

L^{(surf)} = -\frac{1}{H W} \sum_{x=1, y=1}^{W, H} ( w_0 s_0 \log \hat{s}_0 + w_1 s_1 \log \hat{s}_1 )    (2)

where w = [w_0, w_1] is the surface balance weight, and s_i and \hat{s}_i denote the label and the predicted probability of class i at pixel (x, y). The weight is the ratio of the number of negative pixel labels to the number of positive pixel labels in the training set. We choose w = [10.0, 1.0]. To take advantage of ResNet50 trained on ImageNet, we used the ResNet50 weights to initialize the mUNet encoder path. The model is trained with the Adam optimizer (learning rate 1[…]) and the Cross-Entropy function. After 20 epochs, the network converges.

To train the umUNet model, we initialized the encoder path with the weights trained on ImageNet. We do not initialize umUNet from mUNet because in practice the classifier's role is very important and affects the segmentation result, so a poor classifier in mUNet would hurt the performance of umUNet. In the WGF loss function, we use the same balance weights as in the mUNet training case (w = [10.0, 1.0]). The training strategy still includes input size transformation, normalization and optimization, as for the models above.

E. Testing

In this section, the proposed models are evaluated on several standard datasets.
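The weighted Cross-Entropy loss of Eq. (2) and the Acc, F1 and mIoU scores reported in this section can be illustrated with a short sketch. The paper does not spell out its exact formulas, so the NumPy code below is only a minimal sketch under stated assumptions: the function names are hypothetical, the loss takes one-hot labels and per-pixel class probabilities, mIoU is taken as the mean IoU over the tampered and background classes, and an empty prediction on an empty label is scored as 1.0.

```python
import numpy as np


def weighted_pixel_cross_entropy(s, s_hat, w=(10.0, 1.0), eps=1e-12):
    """Class-weighted cross-entropy averaged over all pixels (sketch of Eq. 2).

    s     : one-hot labels, shape (H, W, 2)
    s_hat : predicted class probabilities, shape (H, W, 2)
    w     : balance weights [w0, w1]; which class is index 0 is an assumption
    """
    s = np.asarray(s, dtype=np.float64)
    # Clip probabilities so that log(0) cannot occur.
    s_hat = np.clip(np.asarray(s_hat, dtype=np.float64), eps, 1.0)
    w = np.asarray(w, dtype=np.float64)
    per_pixel = -(w * s * np.log(s_hat)).sum(axis=-1)
    return float(per_pixel.mean())


def segmentation_scores(pred_mask, true_mask):
    """Pixel-level Acc, F1 and mIoU for one binary mask (1 = tampered)."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    tp = int(np.sum(pred & true))
    tn = int(np.sum(~pred & ~true))
    fp = int(np.sum(pred & ~true))
    fn = int(np.sum(~pred & true))

    acc = (tp + tn) / pred.size
    # Assumed convention: empty prediction on an empty label counts as perfect.
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) > 0 else 1.0
    iou_fg = tp / (tp + fp + fn) if (tp + fp + fn) > 0 else 1.0
    iou_bg = tn / (tn + fp + fn) if (tn + fp + fn) > 0 else 1.0
    return {"acc": acc, "f1": f1, "miou": (iou_fg + iou_bg) / 2}


if __name__ == "__main__":
    pred = np.array([[1, 1], [0, 0]])
    true = np.array([[1, 0], [0, 0]])
    print(segmentation_scores(pred, true))
```

Under these assumptions, an authentic image with a correctly empty predicted mask contributes a perfect score, while any falsely marked pixel on an authentic image drives its F1 to zero. This is one plausible way in which evaluating on authentic images can lower the scores of a network trained only on tampered images.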
First, we ran experiments with independently trained segmentation and classification networks. We then repeated the experiments with the two tasks united in a single network. Finally, we compared the proposed model with classical methods as well as recent Deep Learning methods to assess the contribution of this research.

The classification results of ResNet50 are shown in Table II. Overall, the Acc and F1 scores are relatively low for a binary classification problem, which shows that forged-image classification is still difficult for ResNet50 to learn. The scores on the IEEE set are among the highest, which can be explained by the manipulations in the IEEE images following a similar distribution (part of IEEE is also used for training). Furthermore, the Casia1 dataset is similar to the Casia2 dataset, which is used to train the network. The data distribution in Columbia is quite simple, so its Acc and F1 are also comparatively high. Meanwhile, the data distribution of Coverage is shifted away from the training data distribution, which makes this set difficult for the model. There is another reason: it is difficult to judge whether an image is fake at the image level. In practice, forensic experts who assess a suspected photo analyze it by looking at small clues. Therefore, the classification network needs to be given more information in order to learn what constitutes forgery.

TABLE II. RESNET50 CLASSIFICATION RESULTS.

Metrics  IEEE    Casia1  Columbia  Coverage
Acc      0.5734  0.5234  0.5806    0.5050
F1       0.6433  0.5746  0.6308    0.5858

The performance of mUNet was then measured; the results are shown in Table III. The results of mUNet are quite high, with the best result on the Columbia dataset (F1 = 0.76), because the manipulation method in this dataset is quite simple. In addition, we evaluated mUNet on both authentic and tampered images, even though it was trained only on tampered images. The data in Table III show that the results on both negative and positive images are lower than those on positive images only. This agrees with the theoretical analysis above: mUNet can make false predictions on authentic images. However, there are some cases where authentic images are correctly predicted by mUNet (Figure 5).

TABLE III. mUNET SEGMENTATION RESULTS ON TAMPERED IMAGES AND ON BOTH TAMPERED AND AUTHENTIC IMAGES OF THE EVALUATION SET.

Category                Metrics  IEEE    Casia1  Columbia  Coverage
Tampered                Acc      0.8423  0.9026  0.8490    0.7661
Tampered                F1       0.1593  0.5651  0.7613    0.4228
Tampered                mIoU     0.4639  0.6483  0.7091    0.5067
Tampered and authentic  Acc      0.8477  0.9346  0.7776    0.7524
Tampered and authentic  F1       0.0940  0.5013  0.5088    0.2540
Tampered and authentic  mIoU     0.4485  0.6393  0.5475    0.4440

Fig. 5. Some cases of authentic images that mUNet predicts correctly. (a)(b)(c) The authentic images; (d)(e)(f) the model's predictions, in which no areas are marked as manipulated.

Fig. 6. Some cases of authentic images where mUNet predicts wrongly and umUNet predicts correctly thanks to the classification output. (a)(b)(c)(d) The input images; (e)(f)(g)(h) the surface predictions of mUNet; (i)(j)(k)(l) the surface predictions of umUNet. The regions wrongly segmented by mUNet are clean in the umUNet predictions.

Table IV shows the results of the two models in segmenting tampered regions on both tampered and authentic images. The performance of umUNet is clearly better than that of mUNet on the Casia1, Columbia and Coverage datasets; only on IEEE does mUNet remain marginally ahead, with F1 = 0.09. In general, the IEEE dataset is difficult to learn in the segmentation task, so the F1 of both models is very low. On Casia1, umUNet achieved F1 = 0.62, ahead of mUNet by about 0.12 points, so the model works quite well for this dataset in the segmentation task. On Columbia, umUNet reaches F1 = 0.71, leaving mUNet about 0.20 points behind. Finally, on Coverage, umUNet surpassed mUNet by 0.30 points. In general, umUNet performs quite well on the three datasets Casia1, Columbia and Coverage. More intuitively, Figure 6 shows that the classification output of umUNet is effective when combined with the segmentation output to correctly predict authentic images. In addition, in Figure 7, the regions predicted by umUNet are sharper and closer to the surface label than those of mUNet.

TABLE IV. SEGMENTATION RESULTS OF TWO MODELS, mUNET AND umUNET, ON BOTH TAMPERED AND AUTHENTIC IMAGES OF THE EVALUATION SET.

Model   Metrics  IEEE    Casia1  Columbia  Coverage
mUNet   Acc      0.8477  0.9346  0.7776    0.7524
mUNet   F1       0.0940  0.5013  0.5088    0.2540
mUNet   mIoU     0.4485  0.6393  0.5475    0.4440
umUNet  Acc      0.8950  0.9672  0.8970    0.9156
umUNet  F1       0.0928  0.6194  0.7092    0.5532
umUNet  mIoU     0.4725  0.7175  0.7210    0.6491

Fig. 7. Comparison of the predictions of the two models, mUNet and umUNet. (a)(b)(c)(d) The input images; (e)(f)(g)(h) surface labels; (i)(j)(k)(l) mUNet surface predictions; (m)(n)(o)(p) umUNet surface predictions. The regions segmented by umUNet are sharper than those of mUNet.

F. Comparison

In this section, the proposed methods are compared with previous studies, including classical methods and Deep Learning methods. Among the classical methods, four studies were taken into account: BLK [16], NOI1 [17], CFA1 [18] and NOI2 [19]. The Deep Learning studies include MFCN [20], EXIF [21], GSR [22] and CAS [10], all published from 2017 onwards. MFCN [20] is the first method to address tampered-image detection by semantic segmentation. Its authors used an FCN with two outputs, surface and edge, to increase the performance of the model. Next, EXIF [21] exploits the similarity of EXIF metadata between small patches to localize the forgery in the image. GSR [22] consists of three components: a GAN used to simulate forged data and so increase the amount of training data, a segmentation network (DeepLab), and a replacement layer that makes the model focus on learning forged edges. The most recent is the work of Thuy Nguyen-Chinh et al., CAS [10], which uses a UNet architecture to perform classification and segmentation simultaneously. This model performs well on all 5 benchmark datasets.

TABLE V. F1 INDEX OF THE PROPOSED METHOD AND OTHER METHODS ON THREE DATASETS (ONLY FORGED IMAGES ARE CONSIDERED).

Methods          Casia1  Columbia  Coverage
BLK [16], 2009   0.2312  0.5234    0.5050
NOI1 [17], 2009  0.2633  0.5740    0.6178
CFA1 [18], 2012  0.2073  0.4667    0.2335
NOI2 [19], 2014  0.2302  0.5318    0.2353
MFCN [20], 2017  0.5410  0.6117    -
EXIF [21], 2018  0.2040  0.8800    0.2760
GSR [22], 2018   0.5740  0.8290    0.4890
CAS [10], 2019   0.3066  0.7985    0.5080
mUNet            0.5651  0.7613    0.4228

The comparison is divided into two tasks. The first compares the F1 of the methods on tampered images only. The second compares the F1 of the methods on both authentic and tampered images. Table V lists the results of the first task. In general, the proposed method has a fairly high F1 but does not rank first. Specifically, on Casia1, GSR [22] ranks highest with F1 = 0.574, while the proposed model ranks second with a slightly smaller score (F1 = 0.565). On Columbia, EXIF [21] is the best method with F1 = 0.88, second place belongs to GSR [22] with F1 = 0.83, and the proposed model's score is only in fourth place.
Finally, on the Coverage dataset, the proposed model obtains F1 = 0.42, which by Table V places it behind NOI1 [17], CAS [10], BLK [16] and GSR [22].

Table VI shows the results of the second task. umUNet holds the highest position on the three evaluation datasets Casia1, Columbia and Coverage. In this task, since some Deep Learning methods provided neither results nor source code, among the Deep Learning methods we compared only with EXIF [21] and CAS [10]. On Casia1, umUNet ranks first with F1 = 0.62, and second place belongs to CAS [10] with F1 = 0.60. On Columbia, umUNet is again first with F1 = 0.71, with CAS [10] a very close second at F1 = 0.70. Finally, on Coverage, umUNet continues to lead, with CAS [10] in second place. In general, recent methods achieve high performance in this task.

TABLE VI. F1 INDEX OF THE PROPOSED METHOD AND OTHER METHODS ON THREE DATASETS (CONSIDERING BOTH TAMPERED AND AUTHENTIC IMAGES).

Methods          Casia1  Columbia  Coverage
BLK [16], 2009   0.0967  0.0413    0.1132
NOI1 [17], 2009  0.0666  0.0118    0.1163
CFA1 [18], 2012  0.0935  0.3011    0.1168
NOI2 [19], 2014  0.0362  0.0054    0.1176
EXIF [21], 2018  0.1092  0.6852    0.2672
CAS [10], 2019   0.5979  0.6990    0.5000
mUNet            0.5013  0.5088    0.2540
umUNet           0.6194  0.7092    0.5532

III. CONCLUSION AND FUTURE DEVELOPMENT

In this paper, we propose two models: ResNet50 for classification and mUNet for segmentation. ResNet50 is inherited from the original ResNet [11], while mUNet is based on UNet [23]. We then merged the two models into a proposed model named umUNet, whose backbone is ResNet50 and whose decoder is taken from mUNet. The results show that the proposed model is quite reasonable. Although it does not rank first in the segmentation task on tampered images only, in the segmentation task on both authentic and tampered images it surpasses all compared methods on three of the datasets (Casia1, Columbia and Coverage). In future work, we will consider an efficient approach to noisy tampered images with complex image resolutions.

ACKNOWLEDGEMENT

This study is partially supported by the University of Technology, HCMUT, National University of Ho Chi Minh City (VNU), Vietnam.

REFERENCES

[1] A. Popescu and H. Farid, "Exposing digital forgeries by detecting duplicated image regions," 6211 Sudikoff Lab, Computer Science Department, Dartmouth College, Hanover, NH 03755 USA, 2005.
[2] X. Pan and S. Lyu, "Region duplication detection using image feature matching," IEEE Transactions on Information Forensics and Security, vol. 5, no. 4, ISSN: 1556-6013, pp. 857-867, 2010.
[3] I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo and G. Serra, "A SIFT-based forensic method for copy-move attack detection and transformation recovery," IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, ISSN: 1556-6013, pp. 1099-1110, 2011.
[4] P. Kakar and N. Sudha, "Exposing postprocessed copy-paste forgeries through transform-invariant feature," IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, ISSN: 1556-6013, pp. 1018-1028, June 2012.
[5] S.-J. Ryu, M.-J. Lee and H.-K. Lee, "Detection of copy-rotate-move forgery using Zernike moments," Information Hiding Conference, Lecture Notes in Computer Science, vol. 6387, Springer, Heidelberg-Berlin, 2010, ISBN: 978-3-642-16434-7.
[6] H.-J. Lin, C.-W. Wang and Y.-T. Kao, "Fast copy-move forgery detection," WSEAS Transactions on Signal Processing, vol. 5, no. 5, ISSN: 0031-3203, pp. 188-197, 2009.
[7] V. Christlein, C. Riess, J. Jordan and E. Angelopoulou, "An evaluation of popular copy-move forgery detection approaches," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, ISSN: 1556-6013, pp. 1841-1854, 2012.
[8] Y. Rao and J. Ni, "A deep learning approach to detection of splicing and copy-move forgeries in images," IEEE International Workshop on Information Forensics and Security (WIFS), Abu Dhabi, United Arab Emirates, 2016.
[9] J. Fridrich and J. Kodovsky, "Rich models for steganalysis of digital images," IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 868-882, ISSN: 1556-6013, June 2012.
[10] Thuy Nguyen-Chinh and Thien Do-Tieu, "Deep learning techniques to detect fake images: combining image classification and segmentation," Final thesis of bachelor's graduation, Ho Chi Minh City University of Technology, 2019.
[11] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," http://arxiv.org/pdf/1512.03385.pdf, December 2015.
[12] J. Dong and W. Wang, "Casia tampering detection dataset," 2011.
[13] "The 1st IEEE IFS-TC Image Forensics Challenge," http://ifc.recod.ic.unicamp.br/fc.website/index.py, 2013.
[14] "Coverage - A novel database for copy-move forgery detection," IEEE International Conference on Image Processing (ICIP), Phoenix, USA, September 2016.
[15] Y. F. Hsu and S. F. Chang, "Detecting image splicing using geometry invariants and camera characteristics consistency," IEEE International Conference on Multimedia and Expo, Toronto, Canada, July 2006.
[16] W. Li, Y. Yuan and N. Yu, "Passive detection of doctored JPEG image via block artifact grid extraction," Signal Processing 89 (9), 1821-1829, 2009.
[17] B. Mahdian and S. Saic, "Using noise inconsistencies for blind image forensics," Image and Vision Computing 27 (10), 1497-1503, 2009.
[18] P. Ferrara, T. Bianchi, A. De Rosa and A. Piva, "Image forgery localization via fine-grained analysis of CFA artifacts," IEEE Transactions on Information Forensics and Security 7 (5), 1566-1577, 2012.
[19] S. Lyu, X. Pan and X. Zhang, "Exposing region splicing forgeries with blind local noise estimate," International Journal of Computer Vision 110 (2), 202-221, 2014.
[20] R. Salloum, Y. Ren and C.C.J. Kuo, "Image Splicing Localization Using A Multi-Task Fully Convolutional Network (MFCN)," https://arxiv.org/abs/1709.02016.pdf, 2017.
[21] R. Salloum, Y. Ren and C.C.J. Kuo, "Image Splicing Localization Using A Multi-Task Fully Convolutional Network (MFCN)," https://arxiv.org/abs/1709.02016.pdf, 2017.
[22] P. Zhou, B.-C. Chen, X. Han, M. Najibi, A. Shrivastava, S.N. Lim and L.S. Davis, "Generate, Segment and Replace: Towards Generic Manipulation Segmentation," http://arxiv.org/abs/1811.09729.pdf, 2018.
[23] O. Ronneberger, P. Fischer and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," http://arxiv.org/pdf/1505.04597.pdf, May 2015.