KẾT LUẬN VÀ HƯỚNG PHÁT TRIỂN

4.1. Kết luận

4.1.1 Kết quả đạt được

− Cái nhìn tổng quan về bài tốn phát hiện đối tượng, lịch sử phát triển của nó, cụ thể là phát hiện đối tượng trong ảnh hóa đơn.

− Thực nghiệm phương pháp Faster R-CNN, YOLOv3, YOLOF cho bước phát hiện đối tượng.

− Sử dụng được model pretrained TransformerOCR cho bước trích xuất thơng tin đã được phát hiện.

4.1.2 Hạn chế

− Còn nhiều vấn đề trong lĩnh vực trong xử lý ảnh và xử lý ngôn ngữ tự nhiên vẫn chưa rõ.

− Kết quả dự đốn cịn thấp.

4.2. Hướng phát triển

− Áp dụng các thuật toán mới cho phát hiện đối tượng và trích xuất thơng tin trong hóa đơn ra văn bản.

TÀI LIỆU THAM KHẢO

[1] Huang, Z., Chen, K., He, J., Bai, X., Karatzas, D., Lu, S., & Jawahar, C. V. (2019, September). Icdar2019 competition on scanned receipt ocr and information extraction. In 2019 International Conference on Document Analysis and Recognition (ICDAR) (pp. 1516-1520). IEEE.

[2] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real- time object detection with region proposal networks. arXiv preprint arXiv:1506.01497.

[3] Yu, W., Lu, N., Qi, X., Gong, P., & Xiao, R. (2020). Pick: processing key information extraction from documents using improved graph learning-convolutional networks. arXiv preprint arXiv:2004.07464.

[4] Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020, August). Layoutlm: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1192-1200).

[5] Patel, S., & Bhatt, D. (2020). Abstractive Information Extraction from Scanned Invoices (AIESI) using End-to-end Sequential Approach. arXiv preprint arXiv:2009.05728.

[6] Zou, Z., Shi, Z., Guo, Y., & Ye, J. (2019). Object detection in 20 years: A survey. arXiv preprint arXiv:1905.05055.

[7] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[8] Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., & Tran, D. (2018, July). Image transformer. In International Conference on Machine Learning (pp. 4055-4064). PMLR.

[9] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).

[10] Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.

[11] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).

[12] Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).

[13] Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.

[14] Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., & Sun, J. (2021). You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13039-13048).

[15] Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.

Hình minh họa loss function của YOLOv1

Ảnh minh họa kiến trúc mạng YOLOv3