

3.11 Example of information extraction

Chapter 4

CONCLUSION

This thesis has presented a fairly comprehensive treatment of Optical Character Recognition (OCR). The author has introduced the main concepts of OCR in general, as well as each family of approaches in particular, including:

• Character-based OCR: recognizing one character at a time.

• Word-based OCR: recognizing one word at a time.

• Sentence-based OCR: recognizing a whole sentence (text line) at a time.

• 2D OCR: recognizing an entire form at once.

• End2End OCR: a combined approach that handles the whole pipeline end to end.

Furthermore, the author has evaluated in detail the techniques currently used for these problems and proposed improvements to both accuracy and speed, including:

• Generating synthetic Vietnamese data from a wide range of fonts and context sentences to supplement the real data.

• Using Mixed Precision Training to increase the model's processing speed.
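Mixed precision training stores activations and gradients in 16-bit floats (FP16) and keeps a full-precision master copy of the weights. This speeds up computation and saves memory. The catch is that very small gradients underflow to zero in FP16, so the loss is scaled up before the backward pass and the gradients are scaled back down before the optimizer step. The stdlib-only simulation below illustrates that mechanism. It is a sketch under stated assumptions, not the setup used in this thesis:

- FP16 rounding is emulated with Python's `struct` half-precision format `"e"`.
- The dynamic scale-update rule (halve on overflow, double after 2000 clean steps, start at 2^16) is an assumption. It mirrors the default policy of common dynamic loss scalers.
- The thesis does not state which framework or settings were used.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value.

    Values too large for FP16 become +/-inf, as they would on FP16 hardware.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def fp16_gradient(true_grad: float, loss_scale: float) -> float:
    """Simulate one gradient value under loss scaling.

    Scaling the loss by `loss_scale` scales every gradient by the same factor
    (chain rule), so the FP16 backward pass stores true_grad * loss_scale.
    The optimizer step then divides by the scale in full precision.
    """
    stored = to_fp16(true_grad * loss_scale)
    return stored / loss_scale


def update_loss_scale(scale, fp16_grads, good_steps, growth_interval=2000):
    """Assumed dynamic loss-scaling policy: returns (new_scale, new_good_steps, skip_step)."""
    if any(math.isinf(g) or math.isnan(g) for g in fp16_grads):
        # Overflow: halve the scale and skip this optimizer step.
        return scale * 0.5, 0, True
    good_steps += 1
    if good_steps >= growth_interval:
        # No overflow for a long stretch: try a larger scale.
        return scale * 2.0, 0, False
    return scale, good_steps, False


if __name__ == "__main__":
    g = 1e-8  # a tiny gradient, below FP16's smallest subnormal (about 6e-8)
    print(to_fp16(g))  # 0.0: without scaling the gradient is lost
    print(fp16_gradient(g, 2.0 ** 16))  # close to 1e-8, up to FP16 rounding
    print(update_loss_scale(2.0 ** 16, [math.inf], good_steps=10))  # (32768.0, 0, True)
```

The overflow check is what makes the scale dynamic. A scale that is too large produces `inf` gradients, so that step is skipped and the scale is reduced. A long run without overflow suggests the scale can safely grow again.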

In addition, for each algorithm presented, the author has assessed its capabilities, its strengths and weaknesses, and the results actually obtained when running it. In future work, the author plans to improve these results along the following directions:

• Applying Collaborative Mutual Learning (CML) [2] to speed up the model on low-end devices.

• Applying the self-supervised learning technique DINO [3] to improve the quality of the CNN model.
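As described in PP-OCRv2 [2], CML trains a larger teacher together with two lightweight students. Each student learns from three sources: the ground truth, the other student (deep mutual learning), and the teacher (distillation). A lightweight student can then be deployed on its own, which is what makes the idea relevant to low-end devices.

The sketch below shows only the structure of one student's loss. It makes several simplifying assumptions that do not come from the thesis or from [2]:

- It uses a classification-style softmax over a few classes. PP-OCRv2 applies CML to its text detector with losses on probability maps instead.
- The three terms are weighted equally.
- The temperature parameter and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cml_student_loss(student_logits, peer_logits, teacher_logits, label, temperature=1.0):
    """Simplified loss for one student in a teacher + two-student setup.

    gt_loss      : cross-entropy against the ground-truth label
    dml_loss     : KL from the peer student's prediction (mutual learning)
    distill_loss : KL from the teacher's prediction (distillation)
    In a real framework the peer and teacher outputs are detached, i.e. treated
    as constants when back-propagating this student's loss.
    """
    gt_loss = -math.log(softmax(student_logits)[label])
    p_student = softmax(student_logits, temperature)
    dml_loss = kl_divergence(softmax(peer_logits, temperature), p_student)
    distill_loss = kl_divergence(softmax(teacher_logits, temperature), p_student)
    return gt_loss + dml_loss + distill_loss


if __name__ == "__main__":
    teacher = [4.0, 0.5, -1.0]
    student_a = [1.5, 1.0, 0.0]
    student_b = [2.0, 0.2, 0.1]
    label = 0
    # Each student has its own loss, with the other student as its peer;
    # both losses are minimized jointly during training.
    print(cml_student_loss(student_a, student_b, teacher, label))
    print(cml_student_loss(student_b, student_a, teacher, label))
```

When the two students and the teacher agree, both KL terms vanish and each student's loss reduces to the ordinary cross-entropy. Disagreement between the models adds extra training signal on top of the labels.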

Because the time available for this research was limited, the thesis still has many shortcomings. The author welcomes constructive comments and feedback to help improve it further.

References

[1] LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W. & Jackel, L. D. "Backpropagation applied to handwritten zip code recognition". Neural Computation, 1(4):541–551, 1989.

[2] Yuning Du, Chenxia Li, Ruoyu Guo, Cheng Cui, Weiwei Liu, Jun Zhou, Bin Lu, Yehua Yang, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma, "PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System", 2021.

[3] Caron, Mathilde and Touvron, Hugo and Misra, Ishan and Jégou, Hervé and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand, "Emerging Properties in Self-Supervised Vision Transformers", Proceedings of the International Conference on Computer Vision (ICCV), 2021.

[4] Zbigniew Wojna, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, Julian Ibarz, "Attention-based Extraction of Structured Information from Street View Imagery", Computing Research Repository (CoRR), 2017.

[5] R. Smith, C. Gu, D.-S. Lee, H. Hu, R. Unnikrishnan, J. Ibarz, S. Arnoud, and S. Lin, “End-to-end interpretation of the French street name signs dataset,” in European Conference on Computer Vision, Springer, pp. 411–426, 2016.

[6] T. He, W. Huang, Y. Qiao, and J. Yao, “Text-attentional convolutional neural network for scene text detection,” IEEE Transactions on Image Processing, vol. 25, no. 6, pp. 2529–2541, 2016.

[7] C.-Y. Lee and S. Osindero, “Recursive recurrent nets with attention modeling for OCR in the wild,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2231–2239, 2016.

[8] B. Shi, X. Bai, and C. Yao, “An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

[9] B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai, “Robust scene text recognition with automatic rectification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4168–4176, 2016.

[10] T. Bluche, J. Louradour, and R. Messina, “Scan, attend and read: End-to-end handwritten paragraph recognition with MDLSTM attention,” arXiv preprint arXiv:1604.03286, 2016.

[11] T. Bluche, “Joint line segmentation and transcription for end-to-end handwritten paragraph recognition,” arXiv preprint arXiv:1604.08352, 2016.

[12] Christian Reisswig, Anoop R Katti, Marco Spinaci, Johannes Hohne, "Chargrid-OCR: End-to-end Trainable Optical Character Recognition for Printed Documents using Instance Segmentation", arXiv:1909.04469v4 [cs.CV], 2020.

[13] Asif Shahab, Faisal Shafait, Thomas Kieninger, and Andreas Dengel. "An open approach towards the benchmarking of table structure recognition systems". In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, pages 113–120. ACM, 2010.

[14] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. "Faster R-CNN: towards real-time object detection with region proposal networks". CoRR, abs/1506.01497, 2015.

[15] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. "SSD: single shot multibox detector". In ECCV 2016, pages 21–37, 2016.

[16] Vladimir Batagelj and Matjaz Zaversnik. "An O(m) algorithm for cores decomposition of networks". CoRR, cs.DS/0310049, 2003.

[17] D. A. Borges Oliveira and M. P. Viana, "Fast CNN-Based Document Layout Analysis," 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 1173–1180, 2017, doi: 10.1109/ICCVW.2017.142.

[18] Fisher Yu and Vladlen Koltun. "Multi-Scale Context Aggregation by Dilated Convolutions." International Conference on Learning Representations (ICLR), May 2016.

[19] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation". In MICCAI 2015, pages 234–241. Springer, 2015.

[20] Sergey Ioffe and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift". arXiv preprint arXiv:1502.03167, 2015.

[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. "ImageNet classification with deep convolutional neural networks". In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[22] Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, and Christoph Bregler. "Efficient object localization using convolutional networks". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 648–656, 2015.

[23] Xiang Zhang, Junbo Zhao, Yann LeCun, "Character-level Convolutional Networks for Text Classification", 2015.

[24] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y. & Yan, J. "FOTS: Fast Oriented Text Spotting with a Unified Network". CoRR, abs/1801.01671, 2018. http://arxiv.org/abs/1801.01671

[25] Bušta, M., Neumann, L. & Matas, J. "Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework". 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2223–2231, 2017.

[26] Li, H., Wang, P. & Shen, C. "Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks". CoRR, abs/1707.03985, 2017. http://arxiv.org/abs/1707.03985.

[27] Girshick, R. "Fast R-CNN". CoRR. abs/1504.08083, 2015. http://arxiv.org/abs/1504.08083.

[28] Bartz, C., Yang, H. & Meinel, C. "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition". CoRR, abs/1712.05404, 2017. http://arxiv.org/abs/1712.05404.

[29] Jaderberg, M., Simonyan, K., Zisserman, A. & Kavukcuoglu, K. "Spatial Transformer Networks". CoRR. abs/1506.02025, 2015. http://arxiv.org/abs/1506.02025

[30] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification". In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.

[31] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. "SegNet: A deep convolutional encoder-decoder architecture for image segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495, 2017.

[32] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. "Faster R-CNN: towards real-time object detection with region proposal networks". CoRR, abs/1506.01497, 2015.

[33] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate". In Proc. ICLR, 2015.

[34] Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. "Show, attend and tell: Neural image caption generation with visual attention." In Proc. ICML, 2015.

[35] Tsung-Hsien Wen, Milica Gašić, Nikola Mrkšić, Pei-Hao Su, David Vandyke, and Steve Young. "Semantically conditioned LSTM-based natural language generation for spoken dialogue systems." In Proc. EMNLP, 2015.

[36] S. M. Lucas. "ICDAR 2005 text locating competition results". In Document Analysis and Recognition, pages 80–84, 2005.

[37] A. Mishra, K. Alahari, and C. Jawahar. "Scene text recognition using higher order language priors". 2012.

[38] K. Wang, B. Babenko, and S. Belongie. "End-to-end scene text recognition". In Proc. ICCV, pages 1457–1464. IEEE, 2011.

[39] C. C. Tappert, C. Y. Suen, and T. Wakahara, “The state of the art in online handwriting recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 8, pp. 787–808, Aug. 1990. [Online]. Available: http://dx.doi.org/10.1109/34.57669

[40] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, 1998.

[41] Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo, "TextSR: Content-Aware Text Super-Resolution Guided by Recognition", 2019.

[42] Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai, "Real-time Scene Text Detection with Differentiable Binarization", 2020.

[43] Mingxing Tan, Quoc V. Le, "EfficientNetV2: Smaller Models and Faster Training", 2021.

[44] R. Smith, "An Overview of the Tesseract OCR Engine," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), pp. 629–633, 2007, doi: 10.1109/ICDAR.2007.4376991.

[45] Raymond Smith, Chunhui Gu, Dar-Shyang Lee, Huiyi Hu, Ranjith Unnikrishnan, Julian Ibarz, Sacha Arnoud, and Sophia Lin, "End-to-End Interpretation of the French Street Name Signs Dataset", 2017.

[46] Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush, "Image-to-Markup Generation with Coarse-to-Fine Attention", 2017.

[47] Ch’ng, Chee Kheng, and Chee Seng Chan. "Total-Text: A comprehensive dataset for scene text detection and recognition." 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017.

[48] Yuliang Liu, Lianwen Jin, et al. "Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection." Pattern Recognition, 2019.

[49] Gupta, A., Vedaldi, A., Zisserman, A.: "Synthetic data for text localisation in natural images". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324, 2016.

[50] Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: "Synthetic data and artificial neural networks for natural scene text recognition". In: Workshop on Deep Learning, NIPS, 2014.

Proceedings of IEEE Workshop on Neural Networks for Signal Processing", pp. 61-68, doi: 10.1109/NNSP.1994.366063, 1994.

[52] Ebin Zacharias, Martin Teuchler and Bénédicte Bernier, "Image Processing Based Scene-Text Detection and Recognition with Tesseract". 2020.

[53] Xiaoxue Chen, Lianwen Jin, Yuanzhi Zhu, Canjie Luo, and Tianwei Wang. "Text recognition in the wild: A survey", 2020.

[54] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets". In Proceedings of NIPS. 2672–2680. 2014.

[55] Baoguang Shi, Xiang Bai, Cong Yao. "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition", CoRR, 2015.

[56] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven. "PhotoOCR: Reading text in uncontrolled conditions". In ICCV, 2013.

[57] M. Jaderberg, A. Vedaldi, and A. Zisserman. "Deep features for text spotting". In ECCV, 2014.

[58] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, H. Miyao, J. Zhu, W. Ou, C. Wolf, J. Jolion, L. Todoran, M. Worring, and X. Lin. ICDAR 2003 robust reading competitions: entries, results, and future directions. IJDAR, 7(2-3):105–122, 2005.

[59] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. Almazán, and L. de las Heras. "ICDAR 2013 robust reading competition". In ICDAR, 2013.

[60] A. Mishra, K. Alahari, and C. V. Jawahar. "Scene text recognition using higher order language priors". In BMVC, 2012.

[61] K. Wang, B. Babenko, and S. Belongie. "End-to-end scene text recognition". In ICCV, 2011.

[62] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. "Reading text in the wild with convolutional neural networks". IJCV (Accepted), 2015.

[63] François Chollet. "Xception: Deep Learning with Depthwise Separable Convolutions", 2017.

[64] W. Bieniecki, S. Grabowski and W. Rozenberg, "Image Preprocessing for Improving OCR Accuracy," 2007 International Conference on Perspective Technologies and Methods in MEMS Design, pp. 75–80, 2007, doi: 10.1109/MEMSTECH.4283429.

[65] Suyoun Kim, Takaaki Hori, Shinji Watanabe. "Joint CTC-attention based end-to-end speech recognition using multi-task learning", 2017.

[66] T. Bluche and R. Messina. "Gated convolutional recurrent neural networks for multilingual handwriting recognition". In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 01, pages 646–651, 2017.

[67] J. Michael, R. Labahn, T. Grüning, and J. Zöllner. "Evaluating sequence-to-sequence models for handwritten text recognition". In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 1286–1293, 2019.

[68] Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, and Mauricio Villegas. "Pay attention to what you read: Non-recurrent handwritten text-line recognition", 2020.

[69] Maurits Bleeker and Maarten de Rijke. "Bidirectional scene text recognition with a single decoder". arXiv preprint arXiv:1912.03656, 2019.

[70] Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, and Rong Xiao. "MASTER: Multi-aspect non-local network for scene text recognition". CoRR, abs/1910.02562, 2019.

[71] B. Shi, M. Yang, X. Wang, P. Lyu, C. Yao, and X. Bai. "ASTER: An attentional scene text recognizer with flexible rectification". IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9):2035–2048, 2019.

[72] Fenfen Sheng, Zhineng Chen, and Bo Xu. "NRTR: A no-recurrence sequence-to-sequence model for scene text recognition". CoRR, abs/1806.00926, 2018.

[73] Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, and Mingxiang Cai. "Decoupled attention network for text recognition". In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 12216–12224. AAAI Press, 2020.

[74] Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, and R. Manmatha. "Scatter: Selective context attentional scene text recognizer". In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.

[75] Deli Yu, Xuan Li, Chengquan Zhang, Tao Liu, Junyu Han, Jingtuo Liu, and Errui Ding. "Towards accurate scene text recognition with semantic reasoning networks". In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12113–12122, 2020.

[76] Mlchain, https://github.com/Techainer/mlchain-python.

[77] http://yann.lecun.com/exdb/mnist/.

[78] https://moov.ai/en/blog/optical-character-recognition-ocr/.

[79] Optical Character Recognition, https://medium.com/sfu-cspmp/optical-character-recognition-948bfc4adfb3.

[80] VeryPDF, http://www.verypdf.com/app/papertools/user-guide.html.

[81] https://towardsdatascience.com/intuitively-understanding-connectionist-temporal-classification-3797e43a86c.

[82] https://en.wikipedia.org.

[83] https://en.wikipedia.org/wiki/PDF.

[84] https://github.com/NVlabs/ocrodeg.

[85] https://cs231n.github.io/convolutional-networks/

[87] Tesseract. https://github.com/tesseract-ocr/tesseract.

[88] "Overview - Scanned Receipts OCR and Information Extraction (SROIE)". https://rrc.cvc.uab.es/?ch=13&com=introduction

[89] https://deepai.org/machine-learning-glossary-and-terms/recurrent-neural-network.

Pooling layer (PL). Source: [85]

RNN architecture. Source: [85]