REFERENCES

[1] "Statista," [Online]. Available: https://www.statista.com/statistics/871513/worldwide-data-created/.

[2] R. S. Fei-Fei, "Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora," IEEE Conference on Computer Vision and Pattern Recognition, 2010.

[3] X. Y. L. L. M. W. L. a. Benjamin Z Yao, "I2t: Image parsing to text description," Proceedings of the IEEE, 2010.

[4] Mao and Junhua, "Explain images with multimodal recurrent neural networks," arXiv preprint, 2014.

[5] Plummer and Bryan, "Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models," Proceedings of the IEEE International Conference on Computer Vision, 2015.

[6] Lin and Tsung-Yi, "Microsoft COCO: Common objects in context," Computer Vision - ECCV 2014: 13th European Conference, 2014.

[7] He and Kaiming, "Deep residual learning for image recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[8] Oriol, Alexander and Samy, "Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

[9] S. E. and Y., "Self-critical sequence training for image captioning," IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[10] Donahue, Hendricks and Guadarrama, "Long-term recurrent convolutional networks for visual recognition and description," IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[11] Vinyals and Oriol, "Show and tell: A neural image caption generator," IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[12] Xu and Kelvin, "Show, attend and tell: Neural image caption generation with visual attention," International Conference on Machine Learning, 2015.

[13] Pedersoli, Lucas and Schmid, "Areas of attention for image captioning," International Conference on Computer Vision, 2017.

[14] Anderson, He and Buehler, "Bottom-up and top-down attention for image captioning and visual question answering," IEEE Conference on Computer Vision and Pattern Recognition, 2018.

[15] Lu, Yang and Batra, "Neural Baby Talk," IEEE Conference on Computer Vision and Pattern Recognition, 2018.

[16] Vaswani and Shazeer, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017.

[17] Sukhbaatar and Grave, "Augmenting Self-attention with Persistent Memory," arXiv preprint, 2019.

[18] Tax, MJ and Laskov, "Online SVM learning: from classification to data description and back," IEEE XIII Workshop on Neural Networks for Signal Processing, 2003.

[19] Crammer and Koby, "Online passive aggressive algorithms," 2006.

[20] Wang and Guozhang, "Building a replicated logging system with Apache Kafka," Proceedings of the VLDB Endowment, 2015.

[21] Carbone and Paris, "Apache Flink: Stream and batch processing in a single engine," The Bulletin of the Technical Committee on Data Engineering, 2015.

[22] Zhang, He and Patel, "Density-aware single image de-raining using a multi-stream dense network," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.

[23] Hoi, Wang and Zhao, "Libol: A library for online learning algorithms," JMLR, 2014.

[24] Santoro and Adam, "Meta-learning with memory-augmented neural networks," International Conference on Machine Learning, 2016.

[25] Parisi and German, "Continual lifelong learning with neural networks: A review," Neural Networks, 2019.

[26] Hoang Lam and Quan, "UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning," 2020.

[27] Wales and Sanger, "Wikipedia," Wikipedia, 15 Jan. 2001. [Online]. Available: https://wikipedia.org/.

[28] Sharma and Piyush, "Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning," Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018.

[29] Matthew and Ines, "spaCy: Industrial-strength Natural Language Processing in Python," 2020.

[30] Trung, "Vietnamese language model for spacy.io," 2019.

[31] G. Cloud, "Google Cloud Vision API," Google, 2023. [Online]. Available: https://cloud.google.com/vision/.

[32] G. Cloud, "Google Cloud Natural Language API," Google, [Online]. Available: https://cloud.google.com/natural-language/.

[33] Google, "Google Knowledge Graph API," [Online]. Available: https://developers.google.com/knowledge-graph/.

[34] Kasymova, "Dida," 2022. [Online]. Available: https://dida.do/blog/image-captioning-with-attention.

[35] M. Najman, "Image Captioning with Convolutional Neural Networks," 2016.

[36] Mukhlif, "Incorporating a Novel Dual Transfer Learning Approach for Medical Images," in Advanced Signal Processing and Human-Machine Interface for Healthcare Diagnostics and Bioengineering Applications, 2022.

[37] Zhang and Pengchuan, "VinVL: Revisiting visual representations in vision-language models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.

[38] Anderson and Peter, "Bottom-up and top-down attention for image captioning and visual question answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.

[39] Krishna and Ranjay, "Visual Genome: Connecting language and vision using crowdsourced dense image annotations," International Journal of Computer Vision, pp. 32-73, 2017.

[40] Deng and Jia, "ImageNet: A large-scale hierarchical image database," in IEEE Conference on Computer Vision and Pattern Recognition, 2009.

[41] Herdade and Simao, "Image captioning: Transforming objects into words," in Advances in Neural Information Processing Systems, 2019.

[42] Ren and He, "Faster R-CNN: Towards Real-Time Object Detection," in Advances in Neural Information Processing Systems, 2015.

[43] Cornia and Marcella, "Meshed-memory transformer for image captioning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.

[44] Papineni and Kishore, "BLEU: a Method for Automatic Evaluation of Machine Translation," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 2002.

Proposed Methods

Experiments and Evaluation of Results