Future work

− Continue expanding the dataset.

− Study and experiment with new methods to improve invoice information extraction results.

− Explore ways to improve the results obtained with the existing methods.

REFERENCES

[1] Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.

[2] Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., & Sun, J. (2021). You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13039-13048).

[3] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[4] Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).

[5] Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448).

[6] Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection. In Proceedings of the European conference on computer vision (ECCV) (pp. 784-799).

[7] He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

[8] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.

[9] Huang, Z., Chen, K., He, J., Bai, X., Karatzas, D., Lu, S., & Jawahar, C. V. (2019, September). Icdar2019 competition on scanned receipt ocr and information extraction. In 2019 International Conference on Document Analysis and Recognition (ICDAR) (pp. 1516-1520). IEEE.

[10] Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2117-2125).

[11] Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).

[12] Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014, September). Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[13] Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., & Tran, D. (2018, July). Image transformer. In International Conference on Machine Learning (pp. 4055-4064). PMLR.

[14] Park, S., Shin, S., Lee, B., Lee, J., Surh, J., Seo, M., & Lee, H. (2019, September). CORD: a consolidated receipt dataset for post-OCR parsing. In Workshop on Document Intelligence at NeurIPS 2019.

[15] Patel, S., & Bhatt, D. (2020). Abstractive Information Extraction from Scanned Invoices (AIESI) using End-to-end Sequential Approach. arXiv preprint arXiv:2009.05728.

[16] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).

[17] Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).

[18] Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.

[19] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28.

[20] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.

[21] Vu, X. S., Bui, Q. A., Nguyen, N. V., Nguyen, T. T. H., & Vu, T. (2021, August). Mc-ocr challenge: Mobile-captured image document recognition for vietnamese receipts. In 2021 RIVF International Conference on Computing and Communication Technologies (RIVF) (pp. 1-6). IEEE.

[22] Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020, August). Layoutlm: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1192-1200).

[23] Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.

[24] Yu, W., Lu, N., Qi, X., Gong, P., & Xiao, R. (2021, January). PICK: processing key information extraction from documents using improved graph learning-convolutional networks. In 2020 25th International Conference on Pattern Recognition (ICPR) (pp. 4363-4370). IEEE.

[25] Zou, Z., Shi, Z., Guo, Y., & Ye, J. (2019). Object detection in 20 years: A survey. arXiv preprint arXiv:1905.05055.

APPENDIX A – PAPER

Scientific paper accepted for publication at the 2022 IEEE 9th International Conference on Communications and Electronics (ICCE 2022).

Related work

Illustration of the YOLOv3 network architecture