− Continue expanding the dataset.
− Study and experiment with new methods to improve invoice information extraction results.
− Investigate ways to improve the results of the existing methods.
REFERENCES
[1] Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.
[2] Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., & Sun, J. (2021). You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13039-13048).
[3] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[4] Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).
[5] Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448).
[6] Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection. In Proceedings of the European conference on computer vision (ECCV) (pp. 784-799).
[7] He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
[8] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
[9] Huang, Z., Chen, K., He, J., Bai, X., Karatzas, D., Lu, S., & Jawahar, C. V. (2019, September). ICDAR2019 competition on scanned receipt OCR and information extraction. In 2019 International Conference on Document Analysis and Recognition (ICDAR) (pp. 1516-1520). IEEE.
[10] Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2117-2125).
[11] Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).
[12] Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014, September). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.
[13] Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., & Tran, D. (2018, July). Image transformer. In International Conference on Machine Learning (pp. 4055-4064). PMLR.
[14] Park, S., Shin, S., Lee, B., Lee, J., Surh, J., Seo, M., & Lee, H. (2019, September). CORD: a consolidated receipt dataset for post-OCR parsing. In Workshop on Document Intelligence at NeurIPS 2019.
[15] Patel, S., & Bhatt, D. (2020). Abstractive Information Extraction from Scanned Invoices (AIESI) using End-to-end Sequential Approach. arXiv preprint arXiv:2009.05728.
[16] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
[17] Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).
[18] Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
[19] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28.
[20] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
[21] Vu, X. S., Bui, Q. A., Nguyen, N. V., Nguyen, T. T. H., & Vu, T. (2021, August). MC-OCR challenge: Mobile-captured image document recognition for Vietnamese receipts. In 2021 RIVF International Conference on Computing and Communication Technologies (RIVF) (pp. 1-6). IEEE.
[22] Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020, August). LayoutLM: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1192-1200).
[23] Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
[24] Yu, W., Lu, N., Qi, X., Gong, P., & Xiao, R. (2021, January). PICK: processing key information extraction from documents using improved graph learning-convolutional networks. In 2020 25th International Conference on Pattern Recognition (ICPR) (pp. 4363-4370). IEEE.
[25] Zou, Z., Shi, Z., Guo, Y., & Ye, J. (2019). Object detection in 20 years: A survey. arXiv preprint arXiv:1905.05055.
APPENDIX A – PAPER
Scientific paper accepted for publication at the 2022 IEEE 9th International Conference on Communications and Electronics (ICCE 2022).