
Develop an android application for removing unwanted objects



DOCUMENT INFORMATION

Title: Develop An Android Application For Removing Unwanted Objects
Authors: Nguyen Khac Tri, Bui Ngoc Dang Khoa, Doan Tuan Dat
Supervisors: Dr. Nguyen Ho Man Rang, Dr. Nguyen Tien Thinh
Institution: Vietnam National University Ho Chi Minh City
Major: Computer Science
Type: Thesis
Year: 2021
City: Ho Chi Minh City
Pages: 96
File size: 4.95 MB

Contents

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING

THESIS
Develop An Android Application For Removing Unwanted Objects

Council: Computer Science
Supervisor: Dr. Nguyen Ho Man Rang
Reviewer: Dr. Nguyen Tien Thinh
Students: Nguyen Khac Tri (1752567), Bui Ngoc Dang Khoa (1752290), Doan Tuan Dat (1752159)
Ho Chi Minh City, 7/2021

Thesis tasks (Faculty of Computer Science and Engineering, Computer Science major; topic "Development of an Android application for removing unwanted objects"):
- Conduct a deep literature review on the topic of object removal
- Propose a suitable algorithm which can work on an edge device
- Conduct experiments to evaluate the proposed approach
- Implement an Android application for removing objects which can be submitted to Google Play
Assigned: 01/03/2021. Completed: 23/07/2021.

Supervisor's evaluation (06/08/2021), students Doan Tuan Dat (1752159), Bui Ngoc Dang Khoa (1752290), Nguyen Khac Tri (1752567):
- Students had a good background in image processing and conducted a deep literature review on object removal and inpainting
- Students designed and implemented both a traditional approach and a deep-learning-based approach for object removal
- Students collected a small dataset, using different Android phones, for evaluating the proposed approaches
- Students also designed and implemented an Android mobile app for demonstration
- The GUI for the Android app is not user-friendly
- The processing time of the customized deep learning model is still relatively slow and needs to be improved
Grade: 9.0/10
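The thesis tasks above call for an algorithm that can run on an edge device, and defense question (a) in the reviewer's sheet asks about min-max quantization (Section 2.10.1 of the thesis). The following is only a hedged illustration of the generic asymmetric min-max quantization round trip, not the thesis's exact formulas (2.3) and (2.4), which are not reproduced here; all names and values are illustrative:

```python
import numpy as np

def quantize_minmax(x, num_bits=8):
    """Generic asymmetric min-max quantization: map [min(x), max(x)] onto integer levels."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    q = np.clip(np.round((x - x_min) / scale), qmin, qmax).astype(np.uint8)
    return q, scale, x_min

def dequantize(q, scale, x_min):
    """Recover an approximation of the original floating-point values."""
    return q.astype(np.float32) * scale + x_min

# Illustrative weight tensor; the round trip is lossy, but the error is
# bounded by half a quantization step.
weights = np.array([-1.5, -0.2, 0.0, 0.7, 1.5], dtype=np.float32)
q, scale, x_min = quantize_minmax(weights)
restored = dequantize(q, scale, x_min)
assert np.max(np.abs(restored - weights)) <= scale / 2 + 1e-6
```

Shrinking each weight from 32-bit float to 8-bit integer in this way is what makes GAN-sized models practical on phones, at the cost of the bounded reconstruction error shown by the assertion.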
Thesis defense evaluation sheet (for supervisor/reviewer), Ho Chi Minh City University of Technology, Faculty of Computer Science and Engineering, 10/08/2021.
Students: Nguyen Khac Tri (1752567), Bui Ngoc Dang Khoa (1752290), Doan Tuan Dat (1752159). Major: Computer Science.
Topic: Develop An Android Application for Removing Unwanted Objects.
Reviewer: Nguyen Tien Thinh.

Strengths of the thesis:
- The thesis was written quite well, with a minor number of typos.
- The students built a valuable Android application for removing unwanted objects from input photos.
- Modern deep-learning-based techniques such as GANs (Generative Adversarial Networks) were applied to solve the image inpainting problem.
- The students made an interesting comparison between the proposed model and other models (for example the Contextual Reconstruction, EdgeConnect, and Gated Convolution architectures), based on a user study on 20 unlabeled images.
- The students tested the proposed model's performance on different Android devices: OPPO F7, VSMART LIVE 4, and VSMART STAR. The processing time from when a user uploads a picture until the result is returned is acceptable, and it depends on the picture size.

Shortcomings of the thesis:
- The thesis lacks necessary discussion of, and explanations for, notation, mathematical formulas, figures, and final results. For example, it lacks detailed comments for the figures on pages 57, 58, and 59, and the discussion of those figures on page 56 is cursory.
- Quite a few details are unclear, such as the explanation of the notation in formula (3.5) on page 26. The students wrote that "D is the set of the 1-Lipschitz functions", yet in the explanation for formula (3.4) they wrote that "D is the discriminator". These are incompatible, although the two Ds indeed play the same role; the explanation for D in (3.5) should refer to the calligraphic D in (3.5).
- Regarding the user study, the students did not describe who the invited users were, and did not explain how the 20 images were chosen. It could be that more images on which the proposed model works well were chosen than images on which the other models work well, which would be unfair.
- Some parts, such as the future work section, are too short; more detailed information should be added.
- Font size and citations must be considered carefully: there are no citations for Figures 4.3 and 4.4 on page 39, Figure 5.6 on page 60, etc.

Recommendation: approved for defense.

Questions the students must answer before the council:
a. Explain formulas (2.3) and (2.4) about min-max quantization and logarithmic min-max quantization on page 18 of the thesis.
b. Describe the user test and explain the test result more carefully.
c. Comment on the application's performance compared to other image-inpainting applications on the current mobile application market.

Overall assessment: excellent. Grade: 9.2/10. Signed: Nguyen Tien Thinh.

Work Guarantees

We, Nguyen Khac Tri, Bui Ngoc Dang Khoa, and Doan Tuan Dat, declare that the work presented here was done by us and has not been submitted elsewhere for the requirements of a degree program. Any related work has been listed in the reference section below.

Acknowledgements

This project would not have been possible without the support of many people. We are profoundly grateful to Dr. Nguyen Ho Man Rang for his guidance and encouragement; without his constant support and inspiration, we would not have been able to stay persistent and focused on our thesis topic. Thanks to the Ho Chi Minh City University of Technology, we were able to gain a lot of useful knowledge. Finally, we offer our sincere, heartfelt gratitude to all the staff members of the CSE faculty who helped us directly or indirectly
during this work.

Abstract

Nowadays, as technology has advanced, the phone has become an indispensable device for everyone, not only because of its versatility but also because of its convenience. Besides using mobiles for calling, texting, playing games, and so on, working with photos is very common. When we take a picture, or reuse an image from another source, it may contain unwanted objects that we want to remove. For example, an image may carry a logo or text that, for some reason, needs to be removed. Old photos may have scratches that should be removed to make them look new. An even more common situation is a selfie or photograph in which strangers accidentally appear, which is quite annoying. We cannot simply erase those things from the picture, because erasing leaves a blank patch in that area; the region must also be refilled with its background texture so that the result looks natural. Our study therefore investigates an application that helps users work with photos. In this thesis, we implement an Android app that helps users remove redundant objects from photos. We focused mainly on researching the techniques that generate the best results. The results of this research are compared with current mobile applications on the Google Play Store.

Contents

List of Tables (p. vi)
List of Figures (p. vii)
1 Introduction: 1.1 Overview; 1.2 Problem statement; 1.3 Motivation; 1.4 Problems; 1.5 Goal and scope; 1.6 Research method; 1.7 Progress; 1.8 Thesis outline
2 Background
  2.1 Image in the human eye; 2.2 Image on an analog camera; 2.3 Image on a digital camera; 2.4 Digital image; 2.5 Image inpainting; 2.6 Spatial domain filtering; 2.7 Sub-pixel convolution
  2.8 Evaluation metrics: 2.8.1 L1 distance; 2.8.2 PSNR; 2.8.3 SSIM (p. 10); 2.8.4 FID (p. 10)
  2.9 Generative adversarial nets (p. 11): 2.9.1
Generative adversarial nets loss function (p. 13); 2.9.1.1 Minimax loss (p. 13); 2.9.1.2 Wasserstein loss (p. 13); 2.9.1.3 GAN hinge loss (p. 14)
  2.9.2 Problems in GAN models (p. 15): 2.9.2.1 Vanishing gradients (p. 15); 2.9.2.2 Mode collapse (p. 15); 2.9.2.3 Failure to converge (p. 16)
  2.10 Model compression and acceleration (p. 16): 2.10.1 Quantization (p. 17); 2.10.1.1 Typical quantization methods (p. 17); 2.10.1.2 Types of quantization (p. 18)
3 Related Work (p. 21)
  3.1 Generative image inpainting with contextual attention (p. 21): 3.1.1 Dilated convolution (p. 22); 3.1.2 Contextual attention (p. 23); 3.1.3 Coarse-to-fine network architecture (p. 23); 3.1.4 Globally and locally discriminators (p. 25); 3.1.5 Wasserstein GANs (p. 25)
  3.2 Free-form image inpainting with gated convolution (p. 26): 3.2.1 Gated convolution (p. 26); 3.2.2 Spectral-normalized Markovian discriminator (SN-PatchGAN) (p. 27)
  3.3 EdgeConnect: generative image inpainting with adversarial edge learning (p. 27): 3.3.1 Loss function (p. 28)
  3.4 CR-Fill: generative image inpainting with auxiliary contextual reconstruction (p. 29): 3.4.1 Contextual reconstruction loss (CR loss) (p. 30)
  3.5 Hi-Fill (p. 31): 3.5.1 Contextual residual aggregation (CRA) (p. 32); 3.5.2 Lightweight gated convolution (p. 33); 3.5.3 Objective function (p. 34)

5 Experimental Results (excerpt)

[...] STAR to estimate how much time users need to wait to get the resulting image. We ran the timing multiple times at multiple sizes to obtain run times that are as accurate as possible.

5.5.1.1 OPPO F7
- RAM: 4 GB
- CPU: octa-core Helio P60
- GPU: Adreno 512
- Hexagon support: no

5.5.1.2 VSMART LIVE
- RAM: 4 GB
- CPU: octa-core Snapdragon 675, 2.0 GHz
- GPU: none
- Hexagon support: no

5.5.1.3 VSMART STAR
- RAM: 4 GB
- CPU: MediaTek Helio P35
- GPU: PowerVR GE8320
- Hexagon support: no

5.5.2 Inference time

Table 5.6: Inference time of our model on different devices and sizes, mean (standard deviation), in seconds.

Device       | 256x256         | 512x512         | 2048x2048        | 3024x3024
OPPO F7      | 1.6353 (0.1209) | 5.2586 (0.6046) | 11.5883 (0.5037) | 13.1036 (0.4201)
VSMART LIVE  | 2.0331 (0.2311) | 5.9813 (0.2714) | 11.9326 (n/a)    | 13.8384 (n/a)
VSMART STAR  | 2.6863 (0.0952) | 7.7317 (0.1099) | 19.7176 (0.2418) | 22.1681 (0.2970)

6 Application

In practice, deploying models and algorithms to mobile environments can be very different and complex. Mobile hardware is limited and its architecture is unique, especially under local conditions, so producing the final result takes time. Within the scope of this thesis, we have successfully built an application that can be used like other mobile applications. Its main function is to remove an object that the user marks, and the app also presents detailed information about the technology inside.

6.1 Introduction

6.1.1 Target
The main target is to provide end users with an easy-to-use program. We created an Android application that lets users apply the proposed algorithm and model. We use the TensorFlow Lite library to run the model on edge devices, and the RenderScript support library to speed up the computational tasks of the traditional image-processing approach.

6.1.2 Language and background
For the mobile development, we used Java and the RenderScript library; the app supports Android 5.0 (API 21) up to the latest version. We used the nightly version of TensorFlow Lite and other third-party libraries for the general development. The deep learning model was trained on Google Colab with both PyTorch and TensorFlow; the trained weights were then loaded into TFLite models for the mobile version.

6.1.3 Functions

6.1.3.1 Tutorial
The tutorial section shows the user, step by step, how to use each function in the application. Each tutorial screen describes the steps to perform an action correctly, together with a GIF demonstrating it. A ViewPager component in Android is used to animate sliding between screens.

6.1.3.2 Information

6.1.3.3 Main function
The main function is used for drawing over and removing objects. Users can choose among different modes: the default copy-move mode, the deep learning mode, and finally the clone stamp.

6.1.3.4 Side functions
Some functions help the user have a better experience with the application:
- Undo: returns to the state before the removal was applied. The result may not satisfy the user, so a backward function is necessary.
- Save Image: saves the working image into the gallery, so the user can freely use it for personal purposes.

6.1.4 Interface
The user interface in Figure 6.1 shows the functions above: a detail view, a tutorial section, and a screen for drawing and removing.

Figure 6.1: Interfaces. (a) Detail view; (b) draw-and-remove view; (c) tutorial view.

6.1.5 Hardware requirements
- Operating system: Android
- Permissions required: read media files and write to storage
- RAM: at least 4 GB
- CPU: multi-core
- Version: API 21 or higher

6.1.6 APK details
The application has been uploaded to the Google Play Store. Users can download it here: https://play.google.com/store/apps/details?id=thesis.objectRemoval
- Last update: 21/7/2021
- Code version: 1.0
- Download size: 108 MB
- Android version: API 21 (Android 5.0) or higher
- OpenGL ES version: 0.0 or higher
- Countries: 176
- Supported devices: 14,713
- Permissions:
  - Pictures/media/files: read, modify, or delete the contents of your USB storage
  - Memory: read, modify, or delete the contents of your USB storage

6.2 Diagrams
The use case diagram in Figure 6.2, the activity diagram in Figure 6.3, and the sequence diagram in
Figure 6.4 show how the application works in general.

6.2.1 Use case diagram
Figure 6.2: Use Case Diagram

6.2.2 Activity diagram
Figure 6.3: Activity Diagram

6.2.3 Sequence diagram
Figure 6.4: Sequence Diagram (participants: User, :Gallery, :DrawClass, :RemoveClass; looped flow: request and open the picked image, redraw the image and mask, return the result, undo, finish)

7 Conclusion

Removing objects from images using image inpainting can still improve in the future, but because of the difficulty and complexity of images, most techniques of the past few years failed both at completely deleting an entire unwanted object from the image and at restoring the texture and structure components of the image itself. Only in recent years have methods been proposed for blind image inpainting, among the massive number of published works using different techniques such as sequential-based, CNN-based, or GAN-based approaches; that is also the direction we headed in this thesis.

We started by exploring many directions, but soon gave up on sequential-based (patch-based and diffusion-based) methods because they are non-learning techniques: they cannot generate novel content or fill larger holes, and the generated content is often inconsistent with the remaining regions of the image, just like common object-removal applications on mobiles.

We continued to research the two remaining paths, CNN-based and GAN-based. Both kinds of architecture seemed to work well at first, but networks that rely on CNN architecture alone suffer from blurry content, boundary artifacts, and unrealistic results, because convolution operators only capture local contextual information. To compensate, people usually build very deep neural architectures such as Partial Convolutions or Generative Multi-column Convolutional Neural Networks, but these have to trade off size against quality, so most CNN-based models are heavy, which is unsuitable for deployment on small, low-power devices like smartphones.

That left us with GAN-based models, so we spent much time researching these architectures, made comparisons and experiments between them, and chose CR-Fill for deployment because it outperforms the others in many aspects. We also successfully built an Android app that can use both the camera to take photos and the device's photo library to select them, and we were able to deploy the model and run inference on the phone. Our current solution still requires quite a lot of RAM while running, the model's quality still needs improvement to generate more plausible results, and the Android source code is not yet optimal; we will try to improve these in the next stage.

7.1 Summary

7.1.1 Conclusion
This thesis rebuilt and improved a model and method for removing objects from 2D images in a mobile application. The system is designed using knowledge from deep learning and image processing. The easy-to-use application serves the demand for picture and photography entertainment.

7.1.2 Evaluation of advantages
- The model can run on large sizes such as 4K resolution, which was impossible for the other existing models that we tried to convert to the mobile environment. This means the method can handle most images taken with the camera of a mid-range Android device.
- The model is suitable for a wide range of devices and does not use much RAM. A better CPU makes it run faster, but 2 GB of RAM is enough for the inference process.
- We designed an Android application with multiple functions and put it on the Google Play Store so users can easily access it.

7.2 Future works

7.2.1 Evaluation of disadvantages
- To get a good generative result, it requires a long
training process, perhaps a month, to reach a relatively good result; this takes time and expensive hardware.
- The current trending models were trained for numerous iterations with huge batch sizes. Our model was trained with a small batch size on limited Google Colab resources only, so it is hard to achieve a fully satisfactory result.
- The training process is demanding, and so is the inference process: a low-end device may not be supported, or may not be strong enough to speed up the removal process.

7.2.2 Future development
- We will extend the ability of the application beyond its current limitations: it will work not only on street and place scenes but also on special textures such as human faces, animals, or even cartoon art styles.
- We will make the model work better by training on larger datasets. We have made it run fast enough for the mobile environment, and we will also try to make it generate results closer to the current SOTA models.

Bibliography

[1] Andrew Aitken, Christian Ledig, Lucas Theis, Jose Caballero, Zehan Wang, and Wenzhe Shi. Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize. arXiv preprint arXiv:1707.02937, 2017.
[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN, 2017.
[3] Mengmeng Bai, Shuchen Li, Jianhua Fan, Chenchen Zhou, Li Zuo, Jaekeun Na, and MoonSik Jeong. Fast light-weight network for extreme image inpainting challenge. In European Conference on Computer Vision, pages 742-757. Springer, 2020.
[4] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Pearson, 2018.
[5] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[7] Satoshi Iizuka,
Edgar Simo-Serra, and Hiroshi Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG), 36(4):1-14, 2017.
[8] Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics, 36:1-14, July 2017.
[9] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks, 2018.
[10] Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Z. Qureshi, and Mehran Ebrahimi. EdgeConnect: Generative image inpainting with adversarial edge learning. CoRR, abs/1901.00212, 2019.
[11] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. CoRR, abs/1609.05158, 2016.
[12] Joel Shor. Generative adversarial networks.
[13] Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. CoRR, abs/1905.11946, 2019.
[14] Christopher Thomas. Deep learning image enhancement insights on loss function engineering.
[15] Kimhan Thung. A survey of image quality measures. IEEE Xplore, January 2010.
[16] Peiqi Wang, Dongsheng Wang, Yu Ji, Xinfeng Xie, Haoxuan Song, Xuxin Liu, Yongqiang Lyu, and Yuan Xie. QGAN: Quantized generative adversarial networks. Pages 2-3, January 2019.
[17] Zili Yi, Qiang Tang, Shekoofeh Azizi, Daesik Jang, and Zhan Xu. Contextual residual aggregation for ultra high-resolution image inpainting. CoRR, abs/2005.09704, 2020.
[18] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. Pages 2-3, April 2016.
[19] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. Free-form image inpainting with gated convolution. CoRR, abs/1806.03589, 2018.
[20] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. Generative image inpainting with contextual attention. CoRR,
abs/1801.07892, 2018.
[21] Yu Zeng, Zhe Lin, Huchuan Lu, and Vishal M. Patel. Image inpainting with contextual reconstruction loss. CoRR, abs/2011.12836, 2020.
[22] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[23] Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. Learning deep features for scene recognition using places database. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014.
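The abstract and conclusion both stress that simply erasing an object leaves a blank patch, so the generated content must be blended back into the untouched parts of the photo. As a closing illustration, here is a minimal NumPy sketch of that compositing step; it is a generic sketch, not the thesis's exact pipeline, and the array names are hypothetical:

```python
import numpy as np

def composite(original, inpainted, mask):
    """Keep original pixels outside the hole; use generated pixels inside it.

    mask is 1.0 inside the region to remove and 0.0 elsewhere."""
    mask = mask.astype(np.float32)
    return mask * inpainted + (1.0 - mask) * original

# Toy 2x2 grayscale example: remove only the top-left pixel.
original = np.array([[10.0, 20.0], [30.0, 40.0]], dtype=np.float32)
generated = np.full_like(original, 99.0)  # stand-in for the network output
mask = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=np.float32)
result = composite(original, generated, mask)
assert result[0, 0] == 99.0 and result[1, 1] == 40.0
```

In a real pipeline the same blend is applied per color channel, which guarantees that every pixel outside the user's mask is returned bit-for-bit unchanged.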

