
Master's Thesis in Computer Science: Visual Question Answering System


Code: 8.48.01.01

MASTER'S THESIS

The master's thesis evaluation committee consists of:

1. Dr. Nguyễn Đức Dũng - Chair
2. Dr. Nguyễn Tiến Thịnh - Secretary
3. Dr. Võ Thị Ngọc Châu - Reviewer
4. Assoc. Prof. Dr. Nguyễn Tuấn Đăng - Reviewer
5. Assoc. Prof. Dr. Huỳnh Trung Hiếu - Member

Confirmation by the Chair of the thesis evaluation committee and the Dean of the faculty managing the major, after the thesis has been revised (if applicable).

FACULTY OF COMPUTER SCIENCE AND ENGINEERING

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

MASTER'S THESIS ASSIGNMENT

Student name: Trần Công Hậu
Student ID: 1970121
Date of birth:
Place of birth: Long An
Major: Computer Science
Code: 8.48.01.01

I. THESIS TITLE: Hệ thống trả lời câu hỏi trực quan / Visual Question Answering System

II. TASKS AND CONTENT:

Answer questions about images in Vietnamese. Develop a question answering system based on existing models. Compare and evaluate the model on a Vietnamese dataset and an English dataset. Produce suitable answers to text questions written in Vietnamese.

III. ASSIGNMENT DATE: 22/02/2021

IV. COMPLETION DATE: 13/06/2021

V. SUPERVISOR: Assoc. Prof. Dr. Quản Thành Thơ

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Assoc. Prof. Dr. Quản Thành Thơ, who directly and wholeheartedly supervised and guided me throughout this thesis. He also consistently gave me invaluable advice on both technical knowledge and career direction. I thank him for the knowledge he has passed on to me.

I also sincerely thank all the lecturers of the faculty, who devoted their help so that I could complete this work.

Finally, I thank the senior students and classmates who studied alongside me, helped me complete this Master's thesis, and gave me feedback while I was carrying it out.

Ho Chi Minh City, day ... month ... year ...
Trần Công Hậu

THESIS SUMMARY

Visual question answering is a relatively new topic in the field of artificial intelligence. Given an input consisting of an image and a text question about that image, the system produces a meaningful answer that is relevant to the question. This means a visual question answering system needs image processing capabilities such as entity recognition, object detection, and activity recognition.

[...] the existing Vietnamese dataset in this thesis reached an overall accuracy of 64.77%. Through this thesis, I hope to make a small contribution to the research community working on the Vietnamese language.

ABSTRACT

Visual question answering is one of the relatively new topics in the field of artificial intelligence. Given an image and an image-related question as input, the system gives a meaningful and relevant answer to the question. A visual question answering system therefore needs image processing capabilities such as entity recognition, object detection, and activity recognition.

In this thesis, I aim to use such a model to answer text-based questions in Vietnamese. Evaluated on the existing Vietnamese dataset, the model in this thesis achieves an overall accuracy of 64.77%. Through this thesis, I wish to make a small contribution to the Vietnamese language research community.

DECLARATION

I declare that the work presented in this thesis was carried out by me, and that no part of this thesis has been submitted to obtain a degree at this university or any other. If this statement is found to be untrue, I accept full responsibility for my thesis.

Declarant

Trần Công Hậu

Chapter 2: RELATED WORK

2.1 Bottom-Up and Top-Down Attention architecture

LIST OF ABBREVIATIONS

R-CNN Region-based Convolutional Neural Networks

Figure 15: Multi-head attention [15]

Figure 16: Model for the problem

Figure: Model for processing the initial question and image

Figure 18: Encoder-Decoder model [4]

Figure: Image used as input

Chapter 1: INTRODUCTION

1.1 Overview

In everyday life, people can easily look at an image and answer almost any question about it, drawing on common knowledge and life experience. There are exceptions, however: visually impaired users, or intelligence analysts who want to actively gather visual information from an image, may be unable to come up with the answer themselves. Building a question answering system that addresses this problem is therefore a necessary task.

The purpose of this research is to study and build a visual question answering system (Visual Question Answering) [1] based on artificial intelligence, which takes an image and a question as input and returns an answer in Vietnamese.

The system should be able to answer completely different questions about the same image. For any image, it should locate and detect the object that the question refers to, and it needs some common-sense knowledge to answer. A VQA system also has to answer a question the way a human would, in the following respects:

- Learn visual and textual knowledge from the inputs (the image and the corresponding question).
- Combine the two data streams.
- Use this higher-level knowledge to generate the answer.

Figure 1: Functions of a VQA system

1.2 Challenges of the topic

In this thesis I aim to build a question answering system that works together with images. Carrying this out requires new knowledge across computer vision, image processing, and natural language processing. A further challenge is that the problem has to be solved in Vietnamese.

1.3 Research objectives of the topic

The objective of this thesis is to solve the problem of answering Vietnamese questions about a given image. Given a question and an image, the system returns an answer in Vietnamese whose content is relevant to both the question and the image.

1.4 Scope and subject of the research

In this research I integrated state-of-the-art results from applying deep learning to question answering, together with preprocessing of the raw inputs (question and image), so that the model can perform better. I carried out the following research tasks:

- Study visual question answering models and the existing datasets.
- Convert the language of the problem from English to Vietnamese.
- Develop a Vietnamese VQA system based on existing models.
- Evaluate the results obtained.

1.5 Research output

After completing this research project, I hope to obtain a system that can answer Vietnamese questions, given as input an image and a question related to that image.

Chapter 2: RELATED WORK

While researching this topic, I studied several works related to visual question answering. They include papers that achieved top results in the annual Visual Question Answering challenges.

2.1 Bottom-Up and Top-Down Attention architecture

Top-down and bottom-up attention mechanisms have been widely used in image captioning and visual question answering. In the paper "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" [2], the authors propose a mechanism that combines the two. The bottom-up attention mechanism proposes a set of salient image regions, each represented by a pooled feature vector. The authors implement bottom-up attention using Faster R-CNN [8]. The top-down attention mechanism uses task-specific context to predict a distribution of attention over the image regions. The attended feature vector is then computed as a weighted average of the image features over all regions.

To evaluate the model, the authors proceed in two steps. First, they use an image captioning model to quickly obtain information about the salient image regions. They then run experiments and evaluate the results.
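The weighted average at the heart of top-down attention can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the toy numbers, and the dot-product scoring rule are hypothetical (the paper in [2] learns its scoring function), and the region features would in practice come from a detector such as Faster R-CNN.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_attend(region_features, context):
    """Weight each bottom-up region by its relevance to the task context."""
    # Hypothetical scoring rule: a plain dot product between region and context.
    scores = [sum(f * c for f, c in zip(feat, context)) for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Attended feature = weighted average of the region features.
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, attended


# Three toy "regions" with 2-dimensional features (made-up numbers).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [2.0, 0.0]  # stands in for the question/task context
weights, attended = top_down_attend(regions, context)
print(weights, attended)
```

Regions that align with the context receive larger weights, so they dominate the attended vector, while the weights always sum to 1.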

This work took first place in the Visual Question Answering challenge, reaching an overall accuracy of 70.3% on the VQA v2.0 test-std set.

2.2 Pythia architecture

In the paper "Pythia v0.1: The winning entry to the VQA challenge 2018", the authors introduce a deep learning model for the Visual Question Answering problem. The model builds on the Bottom-Up and Top-Down Attention model [2] described above, with several additions intended to increase prediction accuracy.

The model, illustrated in the figure below, extracts features from the image and uses element-wise multiplication to combine the image features with the question features. The result is an attended tensor that carries the information linking the content of the question to the relevant objects in the image.
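As a minimal sketch of element-wise (Hadamard) fusion, assume both modalities have already been projected to vectors of the same length. The function name and the numbers are hypothetical and only illustrate the operation, not the actual Pythia implementation.

```python
def hadamard_fusion(image_vec, question_vec):
    """Combine two same-length feature vectors by element-wise multiplication."""
    if len(image_vec) != len(question_vec):
        raise ValueError("both modalities must be projected to the same size first")
    return [i * q for i, q in zip(image_vec, question_vec)]


# Made-up 4-dimensional features for one image and one question.
image_vec = [0.5, 2.0, 0.0, 1.0]
question_vec = [1.0, 0.5, 3.0, 2.0]
fused = hadamard_fusion(image_vec, question_vec)
print(fused)  # [0.5, 1.0, 0.0, 2.0]
```

A dimension stays active in the fused vector only when both the image and the question activate it, which is one reason multiplication is a common alternative to concatenation.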

Figure 2: Pythia architecture

The additions in the Pythia architecture include:

- Model Architecture: uses element-wise multiplication to combine the features from the text and image modalities.
- Learning Schedule: changes the learning rate during training.
- Fine-Tuning Bottom-Up Features: sets the learning rate for these features to a fixed multiple of the overall learning rate.
- Data Augmentation: adds more training data.
- Model Ensembling: selects models trained with different settings, using the up-down model pretrained on the VQA dataset.
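Model ensembling can be illustrated by averaging the answer probabilities of several models. Averaging is an assumption made here for illustration, since the text above does not say how the ensemble members are combined. The answer list and the probabilities are made up.

```python
def ensemble_average(predictions):
    """Average per-answer probabilities across several models."""
    n_models = len(predictions)
    n_answers = len(predictions[0])
    return [sum(p[a] for p in predictions) / n_models for a in range(n_answers)]


answers = ["yes", "no", "two"]  # hypothetical candidate answers
model_outputs = [
    [0.6, 0.3, 0.1],  # model trained with setting A
    [0.4, 0.5, 0.1],  # model trained with setting B
    [0.7, 0.2, 0.1],  # model trained with setting C
]
avg = ensemble_average(model_outputs)
best = answers[max(range(len(avg)), key=avg.__getitem__)]
print(best)
```

Here the second model alone would answer "no", but the averaged distribution still favours "yes", which shows how an ensemble can smooth out the errors of individual members.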

The authors report an accuracy of [...] when evaluating on the VQA v2 test-std set.

2.3 Modular Co-Attention Networks architecture

Building on the Transformer, the Modular Co-Attention Network (MCAN) model [4] achieved the best result in the Visual Question Answering challenge. In the paper, the authors propose a modular co-attention network (MCAN) made up of modular co-attention layers [...]

[...] regions that may be objects in the image and normalizes them into a uniform vector form; here the authors map each region to a vector of fixed dimension.

In the Modular Co-Attention block, the authors combine the features extracted from the image through the bottom-up attention mechanism with the given question, encoded through GloVe (Global Vectors for Word Representation) [9] and an LSTM [10], and produce the result by solving a classification sub-problem. The use of two mechanisms, Self-Attention and Guided-Attention, is what makes the model effective: linked together, they allow attention to be applied efficiently to both inputs, the question and the image.
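Both mechanisms can be sketched with the same scaled dot-product attention, differing only in where the keys and values come from. The sketch below is a simplification under stated assumptions: it omits the learned projections, multiple heads, feed-forward layers, residual connections, and layer normalization that a Transformer-style block has, and all names and numbers are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out


def self_attention(x):
    # Each element of x attends to every element of the same modality.
    return attention(x, x, x)


def guided_attention(x, y):
    # Elements of x (image regions) attend to elements of y (question words).
    return attention(x, y, y)


question = [[1.0, 0.0], [0.0, 1.0]]          # two toy word vectors
image = [[1.0, 1.0], [0.5, 0.0], [0.0, 2.0]]  # three toy region vectors

q_sa = self_attention(question)
img_sa = self_attention(image)
img_ga = guided_attention(img_sa, q_sa)  # question-guided image features
print(img_ga)
```

The output keeps one vector per image region, but each vector is now a mixture of question features, which is the sense in which the question guides attention over the image.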

The MCAN model, shown in the figure below, has three stages: processing the input question and image, applying Deep Co-Attention Learning to obtain the question and image features, and then fusing these features to produce the answer.

Figure 3: MCAN architecture [4]

The model achieves an accuracy of [...] on the VQA-v2 test-std set.

2.4 ImageBERT architecture

Inspired by Google's BERT architecture [11], well known in natural language processing, a research group at Microsoft proposed the ImageBERT model [12], which achieves impressive results on multimodal tasks. ImageBERT encodes both the image and the text in the feature-vector extraction layer. The encodings are then passed to multi-head self-attention blocks, which are trained with several unsupervised pre-training tasks, including:

- Masked Language Modeling (MLM): the same task as in the original BERT, which predicts the words that have been masked out.
- Masked Object Classification: a task developed as an extension of MLM.
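The masking step behind Masked Language Modeling can be sketched as follows. This is a simplified illustration, not ImageBERT's actual preprocessing: the function name, the `[MASK]` placeholder string, the masking probability, and the example question are assumptions, and the sketch only prepares inputs and targets without training a model.

```python
import random

MASK = "[MASK]"  # placeholder token, assumed for illustration


def mask_tokens(tokens, mask_prob=0.5, seed=0):
    """Randomly hide tokens and remember the originals as prediction targets."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # the model would be trained to recover this word
        else:
            masked.append(tok)
    return masked, targets


tokens = ["what", "color", "is", "the", "cat", "?"]
masked, targets = mask_tokens(tokens)
print(masked, targets)
```

Masked Object Classification follows the same pattern, except that the hidden items are detected objects and the target is the object's label rather than a word.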

2.5 Evaluation

The papers reviewed above were all proposed to solve the Visual Question Answering problem. They were published in different years, and accuracy has increased markedly from year to year. The table below summarizes the characteristics of the models and some of their results on the VQA-v2 dataset.

Bottom-up and top-down attention
Description: proposes a mechanism that combines bottom-up attention and top-down attention.
Characteristics:
- Uses Faster R-CNN to extract image features, and top-down attention to extract information from the question.
- Uses a caption model to quickly obtain information about the salient image regions.

Pythia
Characteristics:
- Fine-Tuning Bottom-Up Features: sets the learning rate for these features to a fixed multiple of the overall learning rate.
- Data Augmentation: adds more training data.
- Model Ensembling: selects models trained with different settings, using the up-down model pretrained on the VQA dataset.

MCAN
Description: combines the features extracted from the image through the bottom-up attention mechanism with the given question, encoded through GloVe, and produces the result by solving a classification sub-problem.
Results on VQA-v2: 75.23% and 75.26% on test-std and test-challenge.

ImageBERT
Description: a Transformer-based model that takes different modalities as input and models the relationships between them.
Characteristics: ImageBERT encodes both the image and the text in the feature-vector extraction layer, then passes them to multi-head self-attention blocks trained with the following tasks:
- Masked Language Modeling: predicts the words that have been masked out.
- Masked Object Classification: the model randomly masks a number of object tokens.
- Masked Region Feature [...]: [...] random object regions in the image.
- Image-Text Matching: links image regions with the words extracted from the text.

Table 1: Evaluation of the models

Chapter 3: BACKGROUND

3.1 Theoretical background

3.1.1 Artificial neural networks

An artificial neural network (ANN) [13] is an information processing model inspired by the way biological nervous systems work. It consists of a large number of neurons connected together to process information.

3.1.1.1 Perceptron model

The Perceptron is the simplest neural network model, with only an input layer and an output layer. It is also called a linear separator and is used to solve linear classification problems. The figure below shows a Perceptron that uses the sigmoid function. Take, for example, a perceptron of the form $\hat{y} = \sigma(w_0 + w_1 x_1 + w_2 x_2)$. This model works in two steps:

- Compute the linear sum: $z = 1 \cdot w_0 + w_1 x_1 + w_2 x_2$, where $w_0$ is called the bias.
- Compute the output by applying the activation function to this sum: $\hat{y} = \sigma(z)$.
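The two steps translate directly into code. The sketch below uses hypothetical weights and inputs chosen only so that the arithmetic is easy to follow by hand.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron(x1, x2, w0, w1, w2):
    # Step 1: linear sum, with w0 acting as the bias (its input is fixed to 1).
    z = 1 * w0 + w1 * x1 + w2 * x2
    # Step 2: pass the sum through the activation function.
    return sigmoid(z)


# Made-up values: z = -1.0 + 0.5 * 1.0 + 0.25 * 2.0 = 0.0
y_hat = perceptron(x1=1.0, x2=2.0, w0=-1.0, w1=0.5, w2=0.25)
print(y_hat)  # 0.5, since sigmoid(0) = 0.5
```

An output of 0.5 sits exactly on the decision boundary, which for this perceptron is the line $w_0 + w_1 x_1 + w_2 x_2 = 0$.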

Figure 4: Perceptron model

3.1.1.2 Multilayer Perceptron model

The Multilayer Perceptron is a more general model than the Perceptron and can solve problems that are not linearly separable. It is widely used in object classification and in discovering complex relationships in data, and it serves as the foundation for researching and inventing more complex deep learning architectures in computer vision and natural language processing.

A Multilayer Perceptron consists of the following components:

- An input layer.
- An output layer.
- One or more layers between these two, called hidden layers.

Figure 5: Multilayer Perceptron model

Each layer in this network can contain one or more units called nodes. Each node in a layer is connected to every node in the preceding layer (the input layer has no preceding layer).
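A forward pass through such a fully connected network can be sketched as below. The layer sizes, weights, and the choice of sigmoid in every layer are assumptions made for illustration, and no training step is shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: every node sees every node of the previous layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x, layers):
    """Feed the input through each (weights, biases) pair in turn."""
    activation = x
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation


# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.5], [1.0, 1.0], [0.0, 0.25]], [0.0, -1.0, 0.5]),  # hidden layer
    ([[1.0, -1.0, 0.5]], [0.0]),                                   # output layer
]
y = mlp_forward([1.0, 2.0], layers)
print(y)
```

Each row of a weight matrix holds the incoming connections of one node, so the number of rows sets the layer's width and the row length must match the width of the previous layer.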

Posted: 03/08/2024, 23:03
