
An Improved UNets Architecture and Its Applications (Vietnamese title: Cải tiến kiến trúc U-Nets và các ứng dụng)




DOCUMENT INFORMATION

Structure

  • Abstract

  • Contents

  • Chapter 1

  • Chapter 2

  • Chapter 3

  • Chapter 4

  • Chapter 5

  • Bibliography

Content

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Master's Thesis in Computer Science
An Improved UNets Architecture and Its Applications
(Cải tiến kiến trúc U-Nets và các ứng dụng)
TRAN QUANG CHUNG
Chung.TQCB190214@sis.hust.edu.vn
Supervisor: Dr. Dinh Viet Sang
Department: Computer Science
Hanoi, April 2021

Declaration of Authorship and Topic Sentences

Personal Information
Full name: Tran Quang Chung
Mobile: 0965957672
Email: bktranquangchung@gmail.com
Class: 19BKHMT
Major: Computer Science

The Topic
An Improved UNets Architecture and Its Applications

The Results
  • Proposing a new architecture that improves on an existing one
  • Applying the architecture to several tasks
  • Evaluating it on many standard benchmark datasets

Declaration of Authorship
I hereby declare that my thesis titled "An Improved UNets Architecture and Its Applications" is my own work, carried out under the supervision of Dr. Dinh Viet Sang. All papers, sources, tables, etc. used in this thesis are thoroughly cited.

Supervisor Confirmation
Hanoi, April 2021
Supervisor
Dr. Dinh Viet Sang

Acknowledgements
Throughout the writing of this thesis, I have received a great deal of support from my teachers, my friends, and my colleagues. First, I would like to express my deep and sincere gratitude to all the teachers of the School of Information and Communication Technology, Hanoi University of Science and Technology, who equipped me with a great deal of important knowledge. Second, I would like to thank my supervisor, Dr. Dinh Viet Sang, whose expertise was invaluable in formulating the research questions and methodology for a newcomer like me. He taught me how to get started in research work and how to write a good scientific paper; his insightful feedback pushed me to a higher level and taught me how to solve problems. I would also like to thank the VAIS (Vietnam Artificial Intelligence Solutions) company for its generous support: VAIS provided the hardware resources, such as GPUs, servers, and hard disk drives, that I needed to finish this thesis. Finally, I am grateful to my parents for their love and wise counsel. I also express my thanks to my friends, who always support me in difficult situations.

Abstract
In recent years, deep learning technology has developed rapidly and has been applied to many problems in daily life. Deep learning methods surpass traditional ones on many challenging tasks such as image segmentation, object detection, and face recognition. However, deep learning still faces challenges, and there is room for improvement. In this thesis, we focus on two domains: face reconstruction and polyp segmentation. Face reconstruction is an important module for improving the performance of pose-invariant face recognition systems, whose accuracy suffers from problems such as varying poses, illumination, and expressions. Hence, we propose two variants of a newly developed generative model (ResCUNet, Attention ResCUNet) that can transform a profile face into a frontal face. The proposed models reconstruct the frontal face from the profile face, and the synthesized faces are natural, photorealistic, coherent, and identity-preserving. As a result, our proposal improves the performance of the face recognition system. For the polyp segmentation task, the challenges are small polyps, illumination, and limited data; we propose the Attention ResCUNeSt model to address them. We conducted many experiments on these tasks, and our proposals surpass many previous studies on standard benchmark datasets.
Keywords: Face Reconstruction, ResNet, Image Segmentation, Convolutional Neural Network, UNet, Attention, CUNet, UVGAN, UV Map
Author: Tran Quang Chung

Contents
1 Introduction
  1.1 Introduce some tasks in Computer Vision
    1.1.1 Face recognition
    1.1.2 Image segmentation
  1.2 Introduce the problem and Motivation
    1.2.1 Face Reconstruction
    1.2.2 Polyp Segmentation
  1.3 Contribution of the Master Thesis
  1.4 Outline of the Master Thesis
2 Theoretical Basis
  2.1 Convolutional Neural Networks
    2.1.1 Layers
    2.1.2 Spatial Convolution
    2.1.3 Spatial Pooling
    2.1.4 Backpropagation algorithm
    2.1.5 Gradient descent
    2.1.6 Dropout
    2.1.7 Transfer Learning
3 Literature Review
  3.1 Convolutional Neural Networks
    3.1.1 LeNet
    3.1.2 VGG
    3.1.3 ResNet
  3.2 Face Reconstruction
    3.2.1 3D Morphable Model
    3.2.2 3DDFA
    3.2.3 UV-GAN
    3.2.4 VGG
  3.3 Polyp Segmentation
    3.3.1 U-NET
    3.3.2 ResUNet++
    3.3.3 Attention UNet
    3.3.4 PraNet
4 Proposed Method
  4.1 Face Reconstruction
  4.2 Polyp Segmentation
    4.2.1 Overall Architecture
    4.2.2 Backbone: ResNet family
    4.2.3 Coupled U-Nets
    4.2.4 Loss function
5 Experiments and Results
  5.1 The dataset
    5.1.1 Multi-PIE
    5.1.2 Dataset Verification
    5.1.3 Polyp Segmentation
  5.2 Face Reconstruction
    5.2.1 Image Reconstruction
    5.2.2 Pose Invariance Face Recognition
    5.2.3 Attention map visualization
    5.2.4 Failed Cases
  5.3 Polyp Segmentation
    5.3.1 Data augmentation
    5.3.2 Evaluation metrics
    5.3.3 Ablation study
    5.3.4 Comparison to existing methods
References

List of Tables
5.1 Evaluation of different methods on the Multi-PIE dataset
5.2 Verification results on different poses on the Multi-PIE dataset
5.3 Verification accuracy (%) comparison on the LFW and CPLFW datasets
5.4 Verification accuracy (%) comparison on the CFP dataset
5.5 Performance metrics for model variants trained using Scenario 1, i.e., training on CVC-Colon and ETIS-Larib, testing on CVC-Clinic
5.6 Performance metrics for Mask-RCNN and Attention ResCUNeSt using Scenario 2, i.e., using CVC-Colon for training, CVC-Clinic for testing
5.7 Performance metrics for Mask-RCNN, Double UNet and Attention ResCUNeSt using Scenario 3, i.e., using CVC-ClinicDB for training, ETIS-Larib for testing
5.8 mDice and mIoU scores for models trained using Scenario 4 on the Kvasir-SEG and CVC-ClinicDB test sets
5.9 Performance metrics for UNet, MultiResUNet and Attention ResCUNeSt-101 using Scenario 5, i.e., 5-fold cross-validation on the CVC-Clinic dataset
5.10 Performance metrics for UNet, ResUNet++, PraNet and Attention ResCUNeSt-101 using Scenario 6, i.e., 5-fold cross-validation on the Kvasir-SEG dataset

List of Figures
1.1 Facial Recognition System
1.2 Four levels of image segmentation
2.1 A simple neural architecture with three layers (input, hidden, output); each neuron is connected by a directed arrow
2.2 a) A drawing of a brain neuron; b) its mathematical function
2.3 a) Sigmoid activation function; b) activation function
2.4 Convolution operation
2.5 Max-pooling operation
2.6 The backpropagation algorithm
2.7 Gradient descent algorithm
2.8 Dropout used as a regularization technique
3.1 The first deep learning architecture, LeNet
3.2 VGG-16 architecture used for the ILSVRC-2012 and ILSVRC-2013 competitions
3.3 A residual block
3.4 The UV-GAN framework consists of one generator (U-Net) and global and local discriminators
3.5 The UNET architecture has a contracting path and an expanding path
3.6 The ResUNet++ architecture
3.7 The Attention Gate (AG) receives two inputs: input features (x_l) and gating signals (g). First, the input features are up-sampled and added to the gating signals, then passed through two activation functions (ReLU, Sigmoid) to produce attention coefficient maps (α). Finally, the input features are scaled with the attention coefficients to give the output features (x̂_l)
3.8 Attention UNet architecture
3.9 The PraNet architecture includes three reverse attention modules attached at the last three high-level features
4.1 A pipeline for face synthesis: 3DDFA is used to obtain a 3D mesh and an incomplete UV map, then a new generative model is applied to recover the self-occluded regions. The completed UV map is attached to the fitted 3D mesh to generate faces of arbitrary poses
4.2 ResCUNet with coupled U-Nets enhanced by residual connections within each U-Net
4.3 Attention ResCUNet-A, an advanced version of the previous network. The generator of the proposed Attention ResCUNet consists of coupled U-Nets. Skip connections within each U-Net are enhanced with attention gates before concatenation. The contextual information from the first U-Net decoder is weighted-fused with attentive low-level feature maps of the second U-Net encoder before concatenation with the high-level coarse feature maps of the second U-Net decoder. An auxiliary loss is used to improve gradient flow during the training phase
4.4 Discriminators and identity-preserving module of the proposed Attention ResCUNet-GAN. The global discriminator is responsible for the global structure of entire UV maps; the local discriminator focuses on local facial details. The identity-preserving module keeps the identity information unchanged during the modification of the generator
4.5 Overview of the proposed Attention ResCUNeSt. Attention gates within each UNet are used to suppress irrelevant information in the encoder's feature maps. Skip connections across the two UNets are also utilized to boost information flow and promote feature reuse
4.6 Split attention in the k-th cardinal group with R splits
5.1 Camera labels and approximate positions inside the gathering room. There were 13 cameras placed at head height, separated at 15° intervals. Two additional cameras (08_1 and 19_1) were positioned above the subject, simulating a typical surveillance camera
5.2 Montage of all 15 cameras in the dataset, exhibited with frontal flash illumination. 13 of the 15 cameras were placed at head height, with two extra cameras mounted higher up to capture views typically encountered in surveillance
5.3 The creation of ground-truth complete UV maps. Three facial images with yaw angles of 0°, −30°, and 30° are fed to the 3DDFA model to create three incomplete UV maps, which are then merged by Poisson blending to generate the ground-truth complete UV map
5.4 Some samples of positive pairs from the CFP dataset
5.5 Results with frontal input images. Incomplete UV maps are generated using 3DDFA. The next columns are ground-truth UV maps, results of UV-GAN, results of normal ResCUNet-GAN, intermediate results of Attention ResCUNet-GAN (after the first U-Net), and final results of Attention ResCUNet-GAN (after the second U-Net), respectively. The rightmost block shows some synthetic images generated based on the final results of Attention ResCUNet-GAN
5.6 Results with profile input images. Incomplete UV maps are generated using 3DDFA. The next columns are ground-truth UV maps, results of UV-GAN, results of normal ResCUNet-GAN, intermediate results of Attention ResCUNet-GAN (after the first U-Net), and final results of Attention ResCUNet-GAN (after the second U-Net), respectively. The rightmost block shows some synthetic images generated based on the final results of Attention ResCUNet-GAN
5.7 Results with in-the-wild input images. Incomplete UV maps are generated using 3DDFA; the ground-truth UV maps are unavailable. The next columns are the results of UV-GAN, results of normal ResCUNet-GAN, intermediate results of Attention ResCUNet-GAN (after the first U-Net), and final results of Attention ResCUNet-GAN (after the second U-Net), respectively. The right block shows some synthetic images generated based on the final results of Attention ResCUNet-GAN
5.8 Synthetic images for frontal input images. The left block corresponds to the result of UV-GAN; the right block corresponds to the final result of Attention ResCUNet-GAN (after the second U-Net)
5.9 Synthetic images for profile input images. The left block corresponds to the result of UV-GAN; the right block corresponds to the final result of Attention ResCUNet-GAN (after the second U-Net)
5.10 Synthetic images for in-the-wild input images. The left block corresponds to the result of UV-GAN; the right block corresponds to the final result of Attention ResCUNet-GAN (after the second U-Net)
5.11 Attention map visualization. The first column contains UV maps generated by the 3DDFA network, the second column contains generated UV maps overlaid by attention masks, and the last column illustrates the attention coefficients only

Table 5.10: Performance metrics for UNet, ResUNet++, PraNet and Attention ResCUNeSt-101 using Scenario 6, i.e., 5-fold cross-validation on the Kvasir-SEG dataset

Method                   | mDice ↑     | mIoU ↑      | Recall ↑    | Precision ↑
UNet [43]                | 0.708±0.017 | 0.602±0.01  | 0.805±0.014 | 0.716±0.02
ResUNet++ [28]           | 0.780±0.01  | 0.681±0.008 | 0.834±0.01  | 0.799±0.01
PraNet [13]              | 0.883±0.02  | 0.822±0.02  | 0.897±0.02  | 0.906±0.01
Attention ResCUNeSt-101  | 0.912±0.01  | 0.860±0.011 | 0.923±0.009 | 0.927±0.014

Figure 5.15: (a) ROC curves and (b) PR curves for Attention ResCUNeSt-101, PraNet, ResUNet++ and UNet in Scenario 6, i.e., 5-fold cross-validation on the Kvasir-SEG dataset. All curves are averaged over five folds.
Figure 5.16: Qualitative result comparison of different models trained in Scenario 6, i.e., 5-fold cross-validation on the Kvasir-SEG dataset.
Figure 5.17: Some failed cases of our model on the Kvasir-SEG dataset.

Bibliography
[1] Mohammad Ali Armin, Hans De Visser, Girija Chetty, Cedric Dumas, David Conlan, Florian Grimpen, and Olivier Salvado. Visibility map: a new method in evaluation quality of optical colonoscopy. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 396–404. Springer, 2015.
[2] Jorge Bernal, F. Javier Sánchez, Gloria Fernández-Esparrach, Debora Gil, Cristina Rodríguez, and Fernando Vilariño. WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. Computerized Medical Imaging and Graphics, 43:99–111, 2015.
[3] Jorge Bernal, Javier Sánchez, and Fernando Vilarino. Towards automatic polyp detection with a polyp appearance model. Pattern Recognition, 45(9):3166–3182, 2012.
[4] Jorge Bernal, Nima Tajkbaksh, Francisco Javier Sánchez, Bogdan J. Matuszewski, Hao Chen, Lequan Yu, Quentin Angermann, Olivier Romain, Bjørn Rustad, Ilangko Balasingham, et al. Comparative validation of polyp detection methods in video colonoscopy: results from the MICCAI 2015 endoscopic vision challenge. IEEE Transactions on Medical Imaging, 36(6):1231–1249, 2017.
[5] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pages 187–194, 1999.
[6] James Booth, Epameinondas Antonakos, Stylianos Ploumpis, George Trigeorgis, Yannis Panagakis, and Stefanos Zafeiriou. 3D face morphable models "in-the-wild". In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5464–5473. IEEE, 2017.
[7] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.
[8] LI Chongxuan, Taufik Xu, Jun Zhu, and Bo Zhang. Triple generative adversarial nets. In Advances in Neural Information Processing Systems, pages 4088–4098, 2017.
[9] Douglas A. Corley, Christopher D. Jensen, Amy R. Marks, Wei K. Zhao, Jeffrey K. Lee, Chyke A. Doubeni, Ann G. Zauber, Jolanda de Boer, Bruce H. Fireman, Joanne E. Schottinger, et al. Adenoma detection rate and risk of colorectal cancer and death. New England Journal of Medicine, 370(14):1298–1306, 2014.
[10] Jiankang Deng, Shiyang Cheng, Niannan Xue, Yuxiang Zhou, and Stefanos Zafeiriou. UV-GAN: Adversarial facial UV map completion for pose-invariant face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7093–7102, 2018.
[11] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4690–4699, 2019.
[12] Qingyan Duan and Lei Zhang. Look more into occlusion: Realistic face frontalization and recognition with BoostGAN. IEEE Transactions on Neural Networks and Learning Systems, 2020.
[13] Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, and Ling Shao. PraNet: Parallel reverse attention network for polyp segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 263–273. Springer, 2020.
[14] Yuqi Fang, Cheng Chen, Yixuan Yuan, and Kai-yu Tong. Selective feature aggregation network with area-boundary constraints for polyp segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 302–310. Springer, 2019.
[15] Shanghua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, and Philip H.S. Torr. Res2Net: A new multi-scale backbone architecture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[16] Michael Gschwantler, Stephan Kriwanek, Erich Langner, Bernhard Göritzer, Christiane Schrutka-Kölbl, Eva Brownstone, Hans Feichtinger, and Werner Weiss. High-grade dysplasia and invasive carcinoma in colorectal adenomas: a multivariate analysis of the impact of adenoma and patient characteristics. European Journal of Gastroenterology & Hepatology, 14(2):183–188, 2002.
[17] Tal Hassner, Shai Harel, Eran Paz, and Roee Enbar. Effective face frontalization in unconstrained images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4295–4304, 2015.
[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[19] Ran He, Jie Cao, Lingxiao Song, Zhenan Sun, and Tieniu Tan. Adversarial cross-spectral face completion for NIR-VIS face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(5):1025–1037, 2019.
[20] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018.
[21] Rui Huang, Shu Zhang, Tianyu Li, and Ran He. Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis. In Proceedings of the IEEE International Conference on Computer Vision, pages 2439–2448, 2017.
[22] Nabil Ibtehaz and M. Sohel Rahman. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Networks, 121:74–87, 2020.
[23] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1125–1134, 2017.
[24] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. CVPR, 2017.
[25] Iyad A. Issa and Malak Noureddine. Colorectal cancer screening: An updated review of the available options. World Journal of Gastroenterology, 23(28):5086, 2017.
[26] Yuji Iwahori, Takayuki Shinohara, Akira Hattori, Robert J. Woodham, Shinji Fukui, Manas Kamal Bhuyan, and Kunio Kasugai. Automatic polyp detection in endoscope images using a Hessian filter. In MVA, pages 21–24, 2013.
[27] Debesh Jha, Michael A. Riegler, Dag Johansen, Pål Halvorsen, and Håvard D. Johansen. DoubleU-Net: A deep convolutional neural network for medical image segmentation. arXiv preprint arXiv:2006.04868, 2020.
[28] Debesh Jha, Pia H. Smedsrud, Michael A. Riegler, Dag Johansen, Thomas De Lange, Pål Halvorsen, and Håvard D. Johansen. ResUNet++: An advanced architecture for medical image segmentation. In 2019 IEEE International Symposium on Multimedia (ISM), pages 225–2255. IEEE, 2019.
[29] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
[30] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
[31] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[32] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4681–4690, 2017.
[33] Suck-Ho Lee, Il-Kwun Chung, Sun-Joo Kim, Jin-Oh Kim, Bong-Min Ko, Young Hwangbo, Won Ho Kim, Dong Hun Park, Sang Kil Lee, Cheol Hee Park, et al. An adequate level of training for technical competence in screening and diagnostic colonoscopy: a prospective multicenter evaluation of the learning curve. Gastrointestinal Endoscopy, 67(4):683–689, 2008.
[34] A.M. Leufkens, M.G.H. Van Oijen, F.P. Vleggaar, and P.D. Siersema. Factors influencing the miss rate of polyps in a back-to-back colonoscopy study. Endoscopy, 44(05):470–475, 2012.
[35] Xiang Li, Wenhai Wang, Xiaolin Hu, and Jian Yang. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 510–519, 2019.
[36] Fujun Luan, Sylvain Paris, Eli Shechtman, and Kavita Bala. Deep photo style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4990–4998, 2017.
[37] Alejandro Newell, Kaiyu Yang, and Jia Deng. Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision, pages 483–499. Springer, 2016.
[38] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y. Hammerla, Bernhard Kainz, et al. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.
[39] Xi Peng, Xiang Yu, Kihyuk Sohn, Dimitris N. Metaxas, and Manmohan Chandraker. Reconstruction-based disentanglement for pose-invariant face recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 1623–1632, 2017.
[40] Patrick Pérez, Michel Gangnet, and Andrew Blake. Poisson image editing. In ACM SIGGRAPH 2003 Papers, pages 313–318, 2003.
[41] Hemin Ali Qadir, Younghak Shin, Johannes Solhusvik, Jacob Bergsland, Lars Aabakken, and Ilangko Balasingham. Polyp detection and segmentation using Mask R-CNN: Does a deeper feature extractor CNN always perform better? In 2019 13th International Symposium on Medical Information and Communication Technology (ISMICT), pages 1–6. IEEE, 2019.
[42] Linda Rabeneck, Julianne Souchek, and Hashem B. El-Serag. Survival of colorectal cancer patients hospitalized in the Veterans Affairs health care system. The American Journal of Gastroenterology, 98(5):1186–1192, 2003.
[43] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[44] Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, and Ali Gholipour. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In International Workshop on Machine Learning in Medical Imaging, pages 379–387. Springer, 2017.
[45] Jo Schlemper, Ozan Oktay, Michiel Schaap, Mattias Heinrich, Bernhard Kainz, Ben Glocker, and Daniel Rueckert. Attention gated networks: Learning to leverage salient regions in medical images. Medical Image Analysis, 53:197–207, 2019.
[46] Sohil Shah, Pallabi Ghosh, Larry S. Davis, and Tom Goldstein. Stacked U-Nets: a no-frills approach to natural image segmentation. arXiv preprint arXiv:1804.10343, 2018.
[47] Younghak Shin, Hemin Ali Qadir, Lars Aabakken, Jacob Bergsland, and Ilangko Balasingham. Automatic colon polyp detection using region based deep CNN and post learning approaches. IEEE Access, 6:40950–40962, 2018.
[48] Juan Silva, Aymeric Histace, Olivier Romain, Xavier Dray, and Bertrand Granado. Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer. International Journal of Computer Assisted Radiology and Surgery, 9(2):283–293, 2014.
[49] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[50] Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946, 2019.
[51] Mingxing Tan, Ruoming Pang, and Quoc V. Le. EfficientDet: Scalable and efficient object detection. arXiv preprint arXiv:1911.09070, 2019.
[52] Zhiqiang Tang, Xi Peng, Shijie Geng, Yizhe Zhu, and Dimitris N. Metaxas. CU-Net: Coupled U-Nets. In 29th British Machine Vision Conference, BMVC 2018, 2019.
[53] Luan Tran, Xi Yin, and Xiaoming Liu. Disentangled representation learning GAN for pose-invariant face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1415–1424, 2017.
[54] Luan Tran, Xi Yin, and Xiaoming Liu. Representation learning by rotating your faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(12):3007–3021, 2018.
[55] Jeroen C. Van Rijn, Johannes B. Reitsma, Jaap Stoker, Patrick M. Bossuyt, Sander J. Van Deventer, and Evelien Dekker. Polyp miss rate determined by tandem colonoscopy: a systematic review. American Journal of Gastroenterology, 101(2):343–350, 2006.
[56] Qiang Wang, Huijie Fan, Gan Sun, Weihong Ren, and Yandong Tang. Recurrent generative adversarial network for face completion. IEEE Transactions on Multimedia, 2020.
[57] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1492–1500, 2017.
[58] Niannan Xue, Jiankang Deng, Shiyang Cheng, Yannis Panagakis, and Stefanos Zafeiriou. Side information for face completion: A robust PCA approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(10):2349–2364, 2019.
[59] Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson, and Minh N. Do. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539, 2(3), 2016.
[60] Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, and Manmohan Chandraker. Towards large-pose face frontalization in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3990–3999, 2017.
[61] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5505–5514, 2018.
[62] Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, et al. ResNeSt: Split-attention networks. arXiv preprint arXiv:2004.08955, 2020.
[63] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, Oct 2016.
[64] Zhengxin Zhang, Qingjie Liu, and Yunhong Wang. Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters, 15(5):749–753, 2018.
[65] Jian Zhao, Yu Cheng, Yan Xu, Lin Xiong, Jianshu Li, Fang Zhao, Karlekar Jayashree, Sugiri Pranata, Shengmei Shen, Junliang Xing, et al. Towards pose invariant face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2207–2216, 2018.
[66] Jian Zhao, Junliang Xing, Lin Xiong, Shuicheng Yan, and Jiashi Feng. Recognizing profile faces by imagining frontal view. International Journal of Computer Vision, 128(2):460–478, 2020.
[67] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Transactions on Medical Imaging, 39(6):1856–1867, 2019.
[68] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.
[69] Xiangyu Zhu, Xiaoming Liu, Zhen Lei, and Stan Z. Li. Face alignment in full pose range: A 3D total solution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(1):78–92, 2017.

Summary of the thesis
Topic: An Improved UNets Architecture and Its Applications
Author: Tran Quang Chung
Supervisor: Dr. Dinh Viet Sang
Keywords: Generative Adversarial Networks, Pose-invariant Face Recognition, Deep Learning, Colonoscopy, Polyp Segmentation, Convolutional Neural Network, Attention Gate

Reasons for choosing the topic: Recently, deep learning technology has developed rapidly in the image processing field. However, it still faces many challenges in domains such as face recognition, face reconstruction, and polyp detection. By studying the common UNET architecture, we have proposed variant models to improve its accuracy and performance across these domains.

The purpose of the thesis:
  • Face reconstruction: Pose-invariant face recognition refers to the problem of identifying or verifying a person by analyzing face images captured from different poses. This problem is still challenging due to the large variations in pose, illumination, and facial expression. A promising approach to dealing with pose variation is to complete the incomplete UV maps extracted from in-the-wild faces, then attach the completed UV map to a fitted 3D mesh, and finally generate 2D faces of arbitrary poses. Hence, we propose a novel generative model to improve UV map completion. The synthesized faces increase the pose variation for training deep face recognition models and reduce the pose discrepancy during the testing phase.
in the world because physicians can miss the polyp during colonoscopy By diagnosing and treating early the extremely serious disease, the patients can be saved with a high success rate However, in order to use automatic polyp detection and segmentation is still a difficult challenge due to some reasons such as absent standard data, low-quality images In this thesis, we proposes a novel architecture (Attention ResCUNeSt) that can significantly improve the network’s ability for polyp segmentation in colonoscopy images 65 The main content and the contribution In this thesis, we proposed two variant models (ResCUNet, Attention ResCUNet) for the face reconstruction task The proposed model has surpassed the previous work in two metrics SSIM and PSNR Also, the proposed model could re-construct the frontal face from the profile face, and the synthetic faces are natural, photorealistic, coherent, and identity-preserved In the verification task, we used the synthetic faces to augment the data for the training phase As a result, the baseline model (arcface) combining with this augmented strategy outperformed many recent studies on benchmark datasets Next, for the polyp segmentation task, we conducted an ablation study to find out the best configuration The proposed model Attention ResCUNeSt is compared with many previous studies such as Pranet, ResUNet++ And the proposal has been demonstrated its efficiency The author experimented on standard benchmark datasets for a fair comparison The main contribution is: The authors publicized two papers (MAPR-2020, SCI-Journal Q1) for face reconstruction task • We propose three novel architectures for two tasks (ResCUNet, Attention ResCUNet, Attention Res-CUNeSt): face reconstruction and polyp segmentation • Evaluate on the various dataset: For individual tasks (face reconstruction and polyp segmentation), we evaluate our proposal on many popular datasets to obtain the best performance The methodology Face reconstruction: Firstly, to 
obtain an incomplete UV map and a 3D mesh, we fit the 2D input image with the 3DDFA model. Next, we used a generative model to transform the incomplete UV map into a completed UV map. The author inherited the original UV-GAN model and attached a ResNet backbone and Attention Gates. More specifically, the ResNet backbone extracts semantic features, which are then boosted by the Attention Gates. Meanwhile, the features from the two U-Nets are fused by trainable scalar weights. To train the model, we used a loss function with four components: a reconstruction loss, global and local adversarial losses, and an identity loss.

Polyp segmentation: For this task, we proposed a novel architecture (Attention Res-CUNeSt) that significantly improves the network's ability to segment polyps in colonoscopy images. By using a powerful backbone (ResNeSt) and the Attention Gate module, the architecture can suppress unnecessary regions and focus on high-probability areas. In addition, we doubled this model to leverage deep semantic features, producing a new architecture called Attention CU-Net. The Tversky loss is used to obtain a balanced trade-off between precision and recall.

Conclusion

In this thesis, we proposed three variant U-Net architectures (ResCUNet, Attention ResCUNet, Attention Res-CUNeSt) to achieve the best performance on two tasks: face reconstruction and polyp segmentation. The author conducted many experiments comparing the proposed models with previous models on public datasets and standard metrics. As a result, we surpassed all of them in all experiments and demonstrated that the proposal is efficient for the above tasks.

SUMMARY OF THE MASTER'S THESIS

Topic: Improving the U-Nets architecture and its applications
Author: Tran Quang Chung
Supervisor: Dr. Dinh Viet Sang
Keywords: Generative Adversarial Networks, Pose-invariant Face Recognition, Deep Learning, Colonoscopy, Polyp Segmentation, Convolutional Neural Network, Attention Gate

Reason for choosing the topic: Recently, deep learning has developed rapidly in the field of image processing. However, several problems, such as face recognition systems, colon polyp segmentation, and face reconstruction, remain challenging. By studying the U-Net architecture, we propose variant models to improve the accuracy on the above problems.

Purpose, objects, and scope of the thesis:

• Face reconstruction: Pose-invariant face recognition is the problem of identifying a person by analyzing face images captured from arbitrary angles. It is a difficult problem in practice because it is affected by illumination, facial expression, and pose variation. A good approach to this problem is to rotate a profile face into a frontal face in 2D space, then combine it with a 3D face model to generate any desired pose. We therefore propose a generative model to improve the transformation from an incomplete face into a completed one. The generated images are then used to augment the data for training recognition models.

• Colon polyp segmentation: Colorectal cancer is among the most dangerous and common cancers in the world. Thanks to early diagnosis and treatment of this disease, patients can be saved with a high success rate. However, using an artificial-intelligence system to automatically detect and segment polyps is still challenging. In this thesis, we propose an architecture that improves the accuracy of polyp segmentation in endoscopic images.

Summary of the content and the author's contributions: In this thesis, we propose variants of the U-Net model (ResCUNet, Attention ResCUNet) for the face-reconstruction problem. The proposed method surpasses previous studies on the SSIM and PSNR metrics (metrics used to assess image quality). Moreover, the proposed method can reconstruct a frontal face from a non-frontal one, and the reconstructed faces look natural, photorealistic, coherent, and identity-preserving. Next, we use the generated images to augment the training data. As a result, the baseline model (ArcFace) combined with this augmentation strategy surpasses recent studies on standard benchmark datasets. For the colon polyp-segmentation problem, we conducted many experiments to find the optimal model. The proposed model (Attention ResCUNeSt) is compared with many previous studies such as PraNet and ResUNet++, and has demonstrated its effectiveness on standard test sets.

Contributions of the thesis: The authors published two papers (MAPR-2020, SCI Q1 journal) on research related to face reconstruction.

• We propose models for the face-reconstruction and colon polyp-segmentation tasks (ResCUNet, Attention ResCUNet, Attention Res-CUNeSt).

• Evaluation on many different datasets: for each task, we evaluate the proposed models on many popular datasets to demonstrate their capability.

Methodology:

• Face reconstruction: Firstly, to obtain the UV-map image (the incomplete face) and the 3D face model, we use the 3DDFA model with a 2D image as input. Next, we use a generative model to transform the incomplete face into a completed one. The author inherits the original UV-GAN work and adds modern modules such as the ResNet backbone and the Attention Gate. More specifically, the ResNet model is used to extract semantic features from the input image, which are then boosted by the Attention Gate. The features from the two U-Nets are fused by trainable weights. To train the model, we use a loss function with several components: a reconstruction loss, global and local losses, and an identity loss.

• Colon polyp segmentation: For this problem, we propose a model (Attention Res-CUNeSt) that improves the segmentation accuracy of the network. We use a powerful feature extractor, ResNeSt, together with the Attention Gate module so that the model knows which regions to focus on. The Tversky loss is used to achieve a balance between precision and recall.

Conclusion: In this thesis, we propose U-Net variants (ResCUNet, Attention ResCUNet, Attention Res-CUNeSt) to achieve the best performance on two problems (face reconstruction and colon polyp segmentation). We conducted many experiments comparing the proposed models with previous studies on standard benchmark datasets. As a result, our proposed models demonstrated their effectiveness and surpassed previous results on standard metrics.
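The Tversky loss mentioned in the polyp-segmentation methodology above balances false positives against false negatives through two weights, alpha and beta (with alpha = beta = 0.5 it reduces to the soft Dice loss). A minimal NumPy sketch for binary masks — the alpha = 0.3, beta = 0.7 defaults here are illustrative only, not the thesis's actual configuration:

```python
import numpy as np

def tversky_loss(y_true, y_pred, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky loss for binary segmentation masks.

    alpha weights false positives, beta weights false negatives;
    alpha = beta = 0.5 reduces to the (soft) Dice loss.
    """
    y_true = y_true.astype(np.float64).ravel()
    y_pred = y_pred.astype(np.float64).ravel()
    tp = np.sum(y_true * y_pred)             # true positives
    fp = np.sum((1.0 - y_true) * y_pred)     # false positives
    fn = np.sum(y_true * (1.0 - y_pred))     # false negatives
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky

# A perfect prediction drives the loss to (almost) zero.
mask = np.array([[0, 1], [1, 1]])
print(tversky_loss(mask, mask))
```

Raising beta above alpha penalizes missed polyp pixels (false negatives) more heavily, which is why the loss is described as trading precision for recall.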
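The Attention Gate module referenced throughout the summaries computes, from a decoder gating signal g and an encoder skip feature x, per-pixel coefficients in (0, 1) that rescale the skip connection, suppressing irrelevant regions. A simplified single-channel sketch in which the learned 1x1 convolutions of the real module are collapsed into scalar weights w_x, w_g, and psi (toy values, assumed for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, w_x=1.0, w_g=1.0, psi=1.0):
    """Additive attention gate over a single-channel feature map.

    x : encoder skip feature map, shape (H, W)
    g : decoder gating signal, same shape
    In the real module, w_x, w_g, and psi are learned 1x1 convolutions.
    """
    q = np.maximum(w_x * x + w_g * g, 0.0)   # ReLU(W_x x + W_g g)
    alpha = sigmoid(psi * q)                 # attention coefficients in (0, 1)
    return alpha * x                         # rescale the skip connection

x = np.array([[0.0, 2.0], [4.0, 0.0]])       # skip feature
g = np.array([[0.0, 2.0], [-9.0, 0.0]])      # gating signal
out = attention_gate(x, g)
```

Where the gating signal agrees with the skip feature (top-right pixel), the coefficient approaches 1 and the feature passes through; where it disagrees strongly (bottom-left pixel), the coefficient falls back to 0.5, attenuating the feature.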

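The PSNR metric used in the face-reconstruction evaluation above measures reconstruction fidelity in decibels, derived from the mean squared error against a reference image. A minimal sketch for 8-bit images:

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio, in dB, between two images."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((4, 4), 100, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 110            # a single corrupted pixel
print(round(psnr(ref, noisy), 2))  # -> 40.17
```

Higher PSNR means a smaller pixel-wise error; unlike SSIM, it does not account for perceived structural similarity, which is why the thesis reports both.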