
Unified Model of Detection and Recognition for Han Nom Characters




DOCUMENT INFORMATION

Basic information

Title: Unified Model of Detection and Recognition for Han Nom Characters
Author: Nguyễn Văn Lợi
Supervisor: PhD. Nguyễn Thị Oanh
University: Hanoi University of Science and Technology
Major: Information Systems
Document type: Master thesis
Year of publication: 2021
City: Hà Nội

Format

Pages: 107
Size: 3.8 MB

Content

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MASTER THESIS

Unified Model of Detection and Recognition for Han Nom Characters

NGUYỄN VĂN LỢI
loi.nv142732@sis.hust.edu.vn
Information Systems

Supervisor: PhD. Nguyễn Thị Oanh
School: SOICT

HÀ NỘI, 03/2021

Certification of master thesis revision (translated from the Vietnamese original)

SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Author of the thesis: Nguyễn Văn Lợi
Thesis title: Unified model of detection and recognition for Han Nom characters
Major: Information Systems
Student ID: CBC19016

The author, the scientific supervisor, and the thesis examination committee certify that the author has corrected and supplemented the thesis according to the minutes of the committee meeting on 24/04/2021, with the following content:
- An overall diagram of the system named in the thesis title was added.

Hà Nội, 03/05/2021
Supervisor: PhD. Nguyễn Thị Oanh
Author of the thesis: Nguyễn Văn Lợi
Committee chair: Assoc. Prof. PhD. Phạm Văn Hải

Acknowledgments (translated from the Vietnamese original)

With deep gratitude, I would like to send my sincere thanks to PhD. Nguyễn Thị Oanh, who has helped me greatly over the past time, from dedicated guidance on the theory and skills to the attitude I needed to complete this master thesis well. Throughout the work on the thesis, she was always enthusiastic, thoughtful, and fully supportive, helping me through the difficult stretches. At the same time, I would like to send my deepest thanks to my family and friends, who have been a solid support throughout this period and a great source of motivation for the success I have today. I sincerely thank you all!

Hà Nội, 03/05/2021
Nguyễn Văn Lợi

Thesis abstract (translated from the Vietnamese original)

In this master thesis, I present the problem I am interested in and set out to solve: detecting and recognizing sequences of Han Nom characters in digital images. Specifically, I present the historical context in which the problem arose and the reasons and motivations that led me to study it, and I then state and define the problem concretely. The thesis combines several common scientific research methods: analysis and synthesis, comparison, the use of quantitative data, and enumeration. These methods are applied alternately and consistently throughout the thesis. All four, for example, were used in the survey of related studies, where existing approaches were compared in order to arrive at a suitable proposal; analysis and synthesis and comparison also appear in the detailed description of the proposed solution and in the implementation, evaluation, and comparison of experimental results and future directions. In more detail, the model I propose is a neural network inspired by effective and popular character detection and recognition models such as CRNN, CRAFTS, FOTS, EAST, and TextBoxes++. However, the proposed model contains significant improvements intended to reduce hardware resource consumption, improve accuracy, fit the target objects, and lay the groundwork for building efficient data pipelines, connection modules, and architectures in the future. The result of this master thesis is a complete system that answers the problem posed at the beginning. It is practical and easy to apply in real life, helping to reduce the burden of manual work. For the same reason, I have also examined in detail the following directions for developing and extending the system in the future:
- building a translation component for the system
- extending the system to Vietnamese and other languages
- building a compact model for mobile devices
- building a service platform and applications on top of the system

Contents

Acronyms
1 Introduction
  1.1 Introduction
  1.2 Tasks
  1.3 Scope of the study
  1.4 Content overview
2 Theoretical Basis
  2.1 Artificial Neural Network
    2.1.1 Artificial Neuron
    2.1.2 Feedforward Neural Network
    2.1.3 Convolutional Neural Network
    2.1.4 Recurrent Neural Network
  2.2 Region of Interest pooling
    2.2.1 Conventional RoI pooling
    2.2.2 RoI Align
    2.2.3 Other popular RoI pooling techniques
  2.3 Detection and Segmentation
    2.3.1 Detection
    2.3.2 Segmentation
  2.4 Sampling and Interpolation
  2.5 Training and Inference
    2.5.1 Training
    2.5.2 Inference
3 Related Work
  3.1 Detection and text-spotting models
    3.1.1 CharNet (Convolutional Character Network)
    3.1.2 PMTD (Pyramid Mask Text Detector)
    3.1.3 OBD (Orderless Box Discretization Network)
    3.1.4 FOTS (Fast Oriented Text Spotting)
    3.1.5 ContourNet
    3.1.6 CRAFT and CRAFTS
    3.1.7 Comparison
  3.2 Recognition models
    3.2.1 CRNN
    3.2.2 RARE
4 Proposed Solutions and Improvements
  4.1 Proposed solutions
    4.1.1 Remarks
    4.1.2 Adaptive solution
  4.2 Unified Model for Arbitrary-shape Text Spotting
    4.2.1 Overview
    4.2.2 Detector
    4.2.3 Connector
    4.2.4 Recognizer
5 Implementation and Evaluation
  5.1 Experimental models
  5.2 Datasets
    5.2.1 ReCTS2019
    5.2.2 SynthText
    5.2.3 Chinese Synthetic String dataset
    5.2.4 Chinese Street View Text dataset
  5.3 Implementation
    5.3.1 Development environment
    5.3.2 Training strategy
  5.4 Experimental results
    5.4.1 Results of the detection models
    5.4.2 Results of the UMATS text-spotting model
6 Conclusions and Future work
  6.1 Conclusions
  6.2 Future work
List of Figures
List of Tables
Bibliography

Acronyms

AI    Artificial Intelligence
AMP   Automatic Mixed Precision
ANN   Artificial Neural Network
API   Application Programming Interface
BN    Batch Normalization
CCL   Connected Components Labeling
CNN   Convolutional Neural Network
CTC   Connectionist Temporal Classification
DL    Deep Learning
DNN   Deep Neural Network
DPM   Deformable Part-based Model
FCN   Fully Connected Neural Network
FNN   Feedforward Neural Network
FPN   Feature Pyramid Network
GRU   Gated Recurrent Unit
GT    Ground Truth
IoU   Intersection over Union
KE    Key Edge
LSTM  Long Short-Term Memory
MSE   Mean Square Error
MTL   Matching-Type Learning
NED   Normalized Edit Distance
NMS   Non-Maximum Suppression
OBD   Orderless Box Discretization
OCR   Optical Character Recognition
OHEM  Online Hard Example Mining
ReLU  Rectified Linear Unit
RNN   Recurrent Neural Network
RoI   Region of Interest
RPN   Region Proposal Network
RPP   Rescoring and Post-Processing
STN   Spatial Transformer Network
SVM   Support Vector Machine
TPS   Thin-Plate Spline

Chapter 1: Introduction

1.1 Introduction

Nowadays, billions of people on Earth use smartphones and other small handheld information devices in their everyday lives, and most of these devices have built-in cameras. In addition, the establishment of surveillance camera systems and the advent of social networks in the era of the fourth industrial revolution have led to an explosion in image resources. At the same time, machine learning and computer vision algorithms have made great progress year after year. As a result, high-performance, low-cost object detection and recognition systems are continually being introduced and widely applied in many areas of life.

One of the subjects attracting the attention of many researchers is characters. They are special objects and a means of exchanging information, and detecting and reading them effectively simplifies many real-life applications.
Some applications include locating and measuring the geographical position of an object by reading the characters related to it, which helps to locate the object, extract the necessary information about its position, and detect the location of dangerous objects. Another application is image classification: classifying objects based on the sequences of characters assigned to them. In addition, object-tracking applications based on detecting and identifying the number assigned to an object, such as number-plate detection or detection of labeled objects, are also of interest. Another potential application is reading and translating character sequences on documents, stelae, signs, and historical sites. Developing such a model and improving its efficiency is the key to building automatic reading systems in the future and the springboard for the rest of these applications to thrive.

In fact, most research on automatic character spotting and translation focuses on English, even though Chinese is the most widely used language in the world. Meanwhile, Nom, the official script of the Vietnamese people before the modern Vietnamese script was created and popularized, is fading over time. Moreover, many Vietnamese documents and historical structures use and contain Han Nom characters, so developing Han Nom spotting and translation applications brings great value.

In recent years, many models, tools, and systems have appeared that can detect, recognize, and even translate characters in images from one language to another. However, many problems remain to be solved, and there are many directions for further improving the efficiency and accuracy of such systems: some systems require a periodic fee, some have limited functionality, and others only work well under specific conditions. For these reasons, this thesis focuses on building a model to detect and recognize characters: hieroglyphs in general, and Han Nom characters in particular.

Figure 1.1: Problem definition: localizing regions of lines of characters and converting them into encoded strings of characters [1]

1.2 Tasks

In order to build the above model, we perform the following tasks:

- Firstly, we research and evaluate different scene text detection and recognition methods suitable for hieroglyphs.
- Secondly, we consider the pros and cons of these methods, finding strengths to promote or reuse and weaknesses to replace or eliminate.
- Then, we consider the problem being solved and combine it with the existing knowledge to propose a complete and reasonably effective solution.
- After that, we describe the proposed solution in detail.
- Then, we choose an appropriate development environment and the resources needed to implement and improve the solution.
- Lastly, we evaluate the solution against popular benchmarks, comment on the achieved results, and propose future improvements and development directions.

1.3 Scope of the study

Here are some of the bounds we place on the study of this topic:

- The solution is designed to detect and recognize sequences of characters rather than individual letters.

[...]

Table 5.4: Text spotting speed of different popular models. '*' denotes results based on multi-scale tests.

Method                  FPS     Method                  FPS     Method                  FPS
Deep TextSpotter [43]   –       TextNet* [5]            2.7     MaskTextSpotter* [15]   2.0
Qin et al. [44]         4.7     FOTS* [3]               7.5     Li et al. [45]          1.3
UMATS (for Latin)       8.55*   UMATS (for Chinese)     8.28*   CRAFTS [1]              5.4

We can see that our experimental models achieve competitive performance compared with other emerging models. In particular, our models are faster than CRAFTS thanks to the newly designed connection module, which efficiently connects the detection component to the recognition component without the rectification module, an additional complex neural network (a sketch of this pooling idea follows below). An important point to note is that the speed of a text-spotting model is affected by its detection performance: the more text lines the detector can locate, the longer the recognizer takes to decode them.
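The connection module's perspective character RoI pooling (described by figures 4.22 and 4.23 in Chapter 4) can be illustrated with a short sketch. The snippet below is a minimal, non-differentiable illustration using OpenCV, not the thesis implementation: the function names, patch size, and use of `cv2` are assumptions made for clarity. Each character quadrilateral produced by the detector is warped to a fixed-size square, and the squares are concatenated left to right, which is how even a vertical textline becomes a horizontal sequence for the recognizer.

```python
# Minimal sketch of perspective character RoI pooling; names and parameters
# here are illustrative assumptions, not the thesis code.
from typing import List

import cv2
import numpy as np


def pool_character(feature_img: np.ndarray, quad: np.ndarray,
                   size: int = 32) -> np.ndarray:
    """Warp one character quad (4x2 corners, clockwise from top-left) to size x size."""
    dst = np.array([[0, 0], [size - 1, 0], [size - 1, size - 1], [0, size - 1]],
                   dtype=np.float32)
    # 3x3 homography mapping the arbitrary quad onto an axis-aligned square
    matrix = cv2.getPerspectiveTransform(quad.astype(np.float32), dst)
    return cv2.warpPerspective(feature_img, matrix, (size, size))


def pool_textline(feature_img: np.ndarray, quads: List[np.ndarray],
                  size: int = 32) -> np.ndarray:
    """Pool every character of one textline and lay the patches out left to right.

    `quads` must already be sorted in reading order (the thesis derives this
    order from its predicted order/orientation maps). The output is a
    size x (size * num_chars) strip regardless of the line's original shape,
    so a vertical textline becomes an ordinary horizontal sequence.
    """
    patches = [pool_character(feature_img, q, size) for q in quads]
    return np.concatenate(patches, axis=1)
```

In the trained network, the same idea would have to be applied to feature maps with a differentiable sampler so that the recognition loss can back-propagate to the detector; the OpenCV calls above are only a stand-in for such a sampler.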
Some text-spotting visual results on the ReCTS2019 test set are shown in figures 5.9 and 5.10. Thanks to the connection module, the UMATS model can easily deal with arbitrary-shape text; figure 5.11 depicts some results on arbitrary-shape text. Our proposed model is therefore a good model for Han Nom textline detection and recognition in natural scene images. As shown in figures 5.12 and 5.13, distichs, horizontal lacquered boards, and other textlines in historical documents are easily spotted by our model. In particular, the model handles vertical textline recognition by pooling each character region and arranging the pooled regions in a horizontal line.

Figure 5.9: Qualitative results on the ReCTS2019 dataset
Figure 5.10: Qualitative results on the ReCTS2019 dataset
Figure 5.11: Visual results on arbitrary-shape text
Figure 5.12: Horizontal lacquered boards and historical sites
Figure 5.13: Distichs

Chapter 6: Conclusions and Future work

6.1 Conclusions

In this thesis, I present an end-to-end trainable, single-pipeline model that tightly couples the detection and recognition modules. The effective perspective character RoI pooling in the connection module not only helps rectify arbitrary-shape textlines but also lets the recognition loss from the recognizer back-propagate easily through the whole network. We no longer need a separate rectification module, which reduces the model's complexity while maintaining high performance. Additionally, the model is designed with modularization in mind, and the source code is written entirely by me, so the model should be easy to develop further. Moreover, to the best of our knowledge, this is the first model aimed at solving the Han Nom text detection and recognition problem, so it will hopefully be a good reference for other researchers in the future.

6.2 Future work

In the future, we plan to refine the model to improve both its accuracy and speed (the CTC-decoder and region-map points are illustrated with short sketches after this list):

- replace the backbone network with current state-of-the-art networks such as EfficientNet, FixEfficientNet-L2, etc.
- replace the CTC decoder with newly designed decoders
- replace the Gaussian distribution used for each character region with other distributions
- additionally train the model with other datasets
- support other languages such as Korean and Japanese
- take advantage of weakly-supervised training [1] and train the model with much more data
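Since one point above proposes replacing the CTC decoder, a minimal sketch of the baseline it would replace may help. The following greedy (best-path) CTC decoding sketch is illustrative only: the toy alphabet, the blank index, and the function name are assumptions, not the recognizer's actual configuration.

```python
# Minimal sketch of greedy (best-path) CTC decoding. `log_probs` is a (T, C)
# array of per-timestep class scores with the blank token at index 0.
import numpy as np

BLANK = 0


def ctc_greedy_decode(log_probs: np.ndarray, alphabet: str) -> str:
    """Collapse repeated symbols, then drop blanks: 'aa-ab-' -> 'aab'."""
    best_path = log_probs.argmax(axis=1)      # most likely class per timestep
    decoded = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != BLANK:      # skip repeats and blanks
            decoded.append(alphabet[idx - 1])  # index 0 is reserved for blank
        prev = idx
    return "".join(decoded)


# Tiny usage example with an assumed 3-character toy alphabet:
alphabet = "abc"
scores = np.log(np.array([
    [0.1, 0.8, 0.05, 0.05],   # 'a'
    [0.1, 0.8, 0.05, 0.05],   # 'a' again (collapsed as a repeat)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.05, 0.8, 0.05],   # 'b'
]))
print(ctc_greedy_decode(scores, alphabet))    # -> "ab"
```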
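Another point above refers to the Gaussian distribution used for each character region: in CRAFT-style detectors [31], and in the ground-truth region map generation of Chapter 4 (figure 4.13), an isotropic 2D Gaussian is warped onto each character box. Below is a minimal sketch of that rendering step, assuming illustrative kernel size and sigma values rather than the thesis's actual parameters.

```python
# Minimal sketch of building a GT region map by warping a canonical 2D
# Gaussian onto each character quad; kernel parameters are assumptions.
from typing import List, Tuple

import cv2
import numpy as np


def gaussian_kernel(size: int = 64, sigma: float = 16.0) -> np.ndarray:
    """Isotropic 2D Gaussian peaking at the kernel centre."""
    ax = np.arange(size, dtype=np.float32) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))


def render_region_map(shape: Tuple[int, int],
                      char_quads: List[np.ndarray]) -> np.ndarray:
    """Warp the canonical Gaussian onto every character quad (4x2 points)."""
    h, w = shape
    region_map = np.zeros((h, w), dtype=np.float32)
    kernel = gaussian_kernel()
    k = kernel.shape[0]
    src = np.array([[0, 0], [k - 1, 0], [k - 1, k - 1], [0, k - 1]],
                   dtype=np.float32)
    for quad in char_quads:
        # Homography from the canonical kernel square to the character quad
        matrix = cv2.getPerspectiveTransform(src, quad.astype(np.float32))
        warped = cv2.warpPerspective(kernel, matrix, (w, h))
        region_map = np.maximum(region_map, warped)  # keep the strongest score
    return region_map
```

Swapping the Gaussian for another distribution, as the future-work list suggests, would only change `gaussian_kernel` while leaving the warping step untouched.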
List of Figures

1.1 Problem definition
2.1 Artificial neuron structure
2.2 Popular activation functions
2.3 Feedforward neural network structure
2.4 The shape of several neural network volumes
2.5 Convolutional neural network architecture
2.6 Convolution operation
2.7 Convolution layer
2.8 Depth column
2.9 How zero-padding affects the spatial size of the output volume
2.10 Max pooling layer
2.11 Recurrent neural network architecture
2.12 Neuron structure of a recurrent neural network
2.13 RoI pooling
2.14 Example of a feature map
2.15 Example of a region proposal
2.16 2x2 pooling sections
2.17 Pooled feature map
2.18 The quantization of RoI pooling
2.19 RoI Align
2.20 How RoI Align calculates the output for each smaller region
2.21 Object detection
2.22 Shallow and deep learning
2.23 Segmentation types
2.24 Faster R-CNN and Mask R-CNN
2.25 Image interpolation
2.26 Linear interpolation
2.27 Quadratic interpolation
2.28 Interpolation example using resizing
2.29 Interpolation example using rotation
2.30 Bilinear interpolation
3.1 Text spotting architecture
3.2 Text spotting architecture
3.3 Text spotting architecture
3.4 The architecture of CharNet
3.5 The architecture of Hourglass
3.6 Different components of the Hourglass network
3.7 Iterative character detection
3.8 First limitation of Mask R-CNN based methods
3.9 Second limitation of Mask R-CNN based methods
3.10 Soft pyramid label
3.11 Overall architecture of PMTD
3.12 Generation of the soft pyramid label
3.13 Plane algorithm
3.14 The architecture of the OBD network
3.15 Comparison between OBD and other previous methods
3.16 Illustration of the OBD and MTL blocks
3.17 Illustration of different matching types
3.18 The architecture of FOTS
3.19 Illustration of RoIRotate
3.20 The pipeline of ContourNet
3.21 Adaptive RPN
3.22 The visualization of LOTM
3.23 Schematic overview of the CRAFTS pipeline
3.24 The backbone of CRAFTS
3.25 The architecture of CRNN
3.26 The receptive field
3.27 Structure of the SRN
4.1 The space between adjacent characters
4.2 Long Chinese sequence
4.3 Chinese vertical textline
4.4 The orientation of Latin characters
4.5 The orientation of Chinese characters
4.6 Pooling method comparison
4.7 Order map
4.8 The overview of the proposed UMATS architecture
4.9 The detailed architecture of the proposed UMATS
4.10 Visualization of region, link, and order maps
4.11 Visualization of orientation-angle-related maps
4.12 The architecture of the UMATS detector
4.13 The generation process of the GT region map
4.14 The generation process of the GT link map
4.15 The generation process of the GT orientation map
4.16 The ground-truth maps
4.17 Polygon generation for arbitrarily-shaped texts
4.18 The organized character bounding boxes generation process
4.19 The architectural overview of the connector
4.20 The pooled textline feature maps
4.21 The detailed architecture of the connector
4.22 How the perspective RoI pooling works
4.23 How to calculate the perspective transformation matrix
4.24 The overview of the recognizer
5.1 Example of some ReCTS2019 dataset images
5.2 Example of some SynthText dataset images
5.3 Example of some Chinese Synthetic String dataset images
5.4 Example of some Chinese Street View Text dataset images
5.5 Visual results for occluded, partly-captured, and blurred characters
5.6 Visual results for textlines with large gaps between characters
5.7 Visual results for vertical textlines
5.8 Visual results of textline detection on complex backgrounds
5.9 Qualitative results on the ReCTS2019 dataset
5.10 Qualitative results on the ReCTS2019 dataset
5.11 Visual results on arbitrary-shape text
5.12 Horizontal lacquered boards and historical sites
5.13 Distichs

List of Tables

2.1 Different types of gates of an RNN
2.2 GRU and LSTM
2.3 Variants of RNNs
3.1 Simplified ResNet50
3.2 Comparison between different models
4.1 The configuration of the feature extraction module
4.2 The configuration of the sequence modeling and prediction module
5.1 The results of several detection models on ReCTS2019 Task
5.2 Detection speed of different popular models
5.3 The results of several models on ReCTS2019 Task
5.4 Text spotting speed of different popular models

Bibliography

[1] Y. Baek, S. Shin, J. Baek, S. Park, J. Lee, D. Nam, and H. Lee, "Character region attention for text spotting," 2020.
[2] H. Li, P. Wang, and C. Shen, "Towards end-to-end text spotting with convolutional recurrent neural networks," CoRR, vol. abs/1707.03985, 2017. [Online]. Available: http://arxiv.org/abs/1707.03985
[3] X. Liu, D. Liang, S. Yan, D. Chen, Y. Qiao, and J. Yan, "FOTS: fast oriented text spotting with a unified network," CoRR, vol. abs/1801.01671, 2018. [Online]. Available: http://arxiv.org/abs/1801.01671
[4] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, "Spatial transformer networks," CoRR, vol. abs/1506.02025, 2015. [Online]. Available: http://arxiv.org/abs/1506.02025
[5] Y. Sun, C. Zhang, Z. Huang, J. Liu, J. Han, and E. Ding, "TextNet: Irregular text reading from images with an end-to-end trainable network," CoRR, vol. abs/1812.09900, 2018. [Online]. Available: http://arxiv.org/abs/1812.09900
[6] M. Liao, B. Shi, and X. Bai, "TextBoxes++: A single-shot oriented scene text detector," CoRR, vol. abs/1801.02765, 2018. [Online]. Available: http://arxiv.org/abs/1801.02765
[7] B. Shi, X. Bai, and S. J. Belongie, "Detecting oriented text in natural images by linking segments," CoRR, vol. abs/1703.06520, 2017. [Online]. Available: http://arxiv.org/abs/1703.06520
[8] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, "Detecting text in natural image with connectionist text proposal network," CoRR, vol. abs/1609.03605, 2016. [Online]. Available: http://arxiv.org/abs/1609.03605
[9] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, "EAST: an efficient and accurate scene text detector," CoRR, vol. abs/1704.03155, 2017. [Online]. Available: http://arxiv.org/abs/1704.03155
[10] J. Liu, X. Liu, J. Sheng, D. Liang, X. Li, and Q. Liu, "Pyramid mask text detector," CoRR, vol. abs/1903.11800, 2019. [Online]. Available: http://arxiv.org/abs/1903.11800
[11] Y. Liu, T. He, H. Chen, X. Wang, C. Luo, S. Zhang, C. Shen, and L. Jin, "Exploring the capacity of sequential-free box discretization network for omnidirectional scene text detection," CoRR, vol. abs/1912.09629, 2019. [Online]. Available: http://arxiv.org/abs/1912.09629
[12] B. Shi, X. Bai, and C. Yao, "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition," CoRR, vol. abs/1507.05717, 2015. [Online]. Available: http://arxiv.org/abs/1507.05717
[13] B. Shi, X. Wang, P. Lv, C. Yao, and X. Bai, "Robust scene text recognition with automatic rectification," CoRR, vol. abs/1603.03915, 2016. [Online]. Available: http://arxiv.org/abs/1603.03915
[14] L. Xing, Z. Tian, W. Huang, and M. R. Scott, "Convolutional character networks," CoRR, vol. abs/1910.07954, 2019. [Online]. Available: http://arxiv.org/abs/1910.07954
[15] M. Liao, P. Lyu, M. He, C. Yao, W. Wu, and X. Bai, "Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes," CoRR, vol. abs/1908.08207, 2019. [Online]. Available: http://arxiv.org/abs/1908.08207
[16] S. Ren, K. He, R. B. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," CoRR, vol. abs/1506.01497, 2015. [Online]. Available: http://arxiv.org/abs/1506.01497
[17] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg, "SSD: single shot multibox detector," CoRR, vol. abs/1512.02325, 2015. [Online]. Available: http://arxiv.org/abs/1512.02325
[18] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," CoRR, vol. abs/1506.02640, 2015. [Online]. Available: http://arxiv.org/abs/1506.02640
[19] D. Deng, H. Liu, X. Li, and D. Cai, "PixelLink: Detecting scene text via instance segmentation," CoRR, vol. abs/1801.01315, 2018. [Online]. Available: http://arxiv.org/abs/1801.01315
[20] S. Long, J. Ruan, W. Zhang, X. He, W. Wu, and C. Yao, "TextSnake: A flexible representation for detecting text of arbitrary shapes," CoRR, vol. abs/1807.01544, 2018. [Online]. Available: http://arxiv.org/abs/1807.01544
[21] Y. Xu, Y. Wang, W. Zhou, Y. Wang, Z. Yang, and X. Bai, "TextField: Learning a deep direction field for irregular scene text detection," CoRR, vol. abs/1812.01393, 2018. [Online]. Available: http://arxiv.org/abs/1812.01393
[22] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, "Mask R-CNN," CoRR, vol. abs/1703.06870, 2017. [Online]. Available: http://arxiv.org/abs/1703.06870
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[24] H. Law and J. Deng, "CornerNet: Detecting objects as paired keypoints," CoRR, vol. abs/1808.01244, 2018. [Online]. Available: http://arxiv.org/abs/1808.01244
[25] T. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, "Feature pyramid networks for object detection," CoRR, vol. abs/1612.03144, 2016. [Online]. Available: http://arxiv.org/abs/1612.03144
[26] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations, 2015.
[27] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, 1997. [Online]. Available: http://dblp.uni-trier.de/db/journals/tsp/tsp45.html#SchusterP97
[28] A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets," in ICML '06: Proceedings of the International Conference on Machine Learning, 2006.
[29] X. Zhu, H. Hu, S. Lin, and J. Dai, "Deformable ConvNets v2: More deformable, better results," CoRR, vol. abs/1811.11168, 2018. [Online]. Available: http://arxiv.org/abs/1811.11168
[30] Y. Wang, H. Xie, Z. Zha, M. Xing, Z. Fu, and Y. Zhang, "ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection," 2020.
[31] Y. Baek, B. Lee, D. Han, S. Yun, and H. Lee, "Character region awareness for text detection," CoRR, vol. abs/1904.01941, 2019. [Online]. Available: http://arxiv.org/abs/1904.01941
[32] F. Zhan and S. Lu, "ESIR: end-to-end scene text recognition via iterative image rectification," CoRR, vol. abs/1812.05824, 2018. [Online]. Available: http://arxiv.org/abs/1812.05824
[33] J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, "What is wrong with scene text recognition model comparisons? Dataset and model analysis," CoRR, vol. abs/1904.01906, 2019. [Online]. Available: http://arxiv.org/abs/1904.01906
[34] B. Shi, C. Yao, M. Liao, M. Yang, P. Xu, L. Cui, S. J. Belongie, S. Lu, and X. Bai, "ICDAR2017 competition on reading Chinese text in the wild (RCTW-17)," CoRR, vol. abs/1708.09585, 2017. [Online]. Available: http://arxiv.org/abs/1708.09585
[35] X. Chen, L. Jin, Y. Zhu, C. Luo, and T. Wang, "Text recognition in the wild: A survey," 2020.
[36] S. Long, X. He, and C. Yao, "Scene text detection and recognition: The deep learning era," CoRR, vol. abs/1811.04256, 2018. [Online]. Available: http://arxiv.org/abs/1811.04256
[37] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," CoRR, vol. abs/1505.04597, 2015. [Online]. Available: http://arxiv.org/abs/1505.04597
[38] A. Shrivastava, A. Gupta, and R. B. Girshick, "Training region-based object detectors with online hard example mining," CoRR, vol. abs/1604.03540, 2016. [Online]. Available: http://arxiv.org/abs/1604.03540
[39] M. Liao, Z. Zhu, B. Shi, G.-S. Xia, and X. Bai, "Rotation-sensitive regression for oriented scene text detection," 2018.
[40] Y. Jiang, X. Zhu, X. Wang, S. Yang, W. Li, H. Wang, P. Fu, and Z. Luo, "R2CNN: rotational region CNN for orientation robust scene text detection," CoRR, vol. abs/1706.09579, 2017. [Online]. Available: http://arxiv.org/abs/1706.09579
[41] H. Hu, C. Zhang, Y. Luo, Y. Wang, J. Han, and E. Ding, "WordSup: Exploiting word annotations for character based text detection," CoRR, vol. abs/1708.06720, 2017. [Online]. Available: http://arxiv.org/abs/1708.06720
[42] P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, and X. Li, "Single shot text detector with regional attention," CoRR, vol. abs/1709.00138, 2017. [Online]. Available: http://arxiv.org/abs/1709.00138
[43] M. Bušta, L. Neumann, and J. Matas, "Deep TextSpotter: An end-to-end trainable scene text localization and recognition framework," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2223–2231.
[44] S. Qin, A. Bissacco, M. Raptis, Y. Fujii, and Y. Xiao, "Towards unconstrained end-to-end text spotting," CoRR, vol. abs/1908.09231, 2019. [Online]. Available: http://arxiv.org/abs/1908.09231
[45] H. Li, P. Wang, and C. Shen, "Towards end-to-end text spotting in natural scenes," CoRR, vol. abs/1906.06013, 2019. [Online]. Available: http://arxiv.org/abs/1906.06013
