
Master's Thesis in Computer Science: Visual Question Answering System


DOCUMENT INFORMATION

Basic information

Title Visual Question Answering System (Hệ thống trả lời câu hỏi trực quan)
Author Lê Nguyễn Bảo
Supervisors TS. Nguyễn Văn Ánh, PTS, PGS.TS. Nguyễn Thị Thanh Huyền, PGS.TS. Nguyễn Văn Dũng, PGS.TS. Lê Minh Tâm, PGS.TS. Nguyễn Văn Hoàng, TS. Lê Hồng Quân
University Ho Chi Minh City University of Technology (Trường Đại học Bách khoa Tp.HCM)
Major Computer Science
Type Master's thesis
Year of publication 2021
City Ho Chi Minh City
Pages 51
File size 0.91 MB


FACULTY OF COMPUTER SCIENCE AND ENGINEERING

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Assoc. Prof. Dr. Quản Thành Thơ, who directly supervised me and guided me wholeheartedly throughout this thesis. He also gave me invaluable advice on both technical knowledge and career direction. I thank him for the knowledge he has passed on to me. I also sincerely thank all the lecturers in the faculty for their dedicated help in completing this thesis. Finally, I thank my seniors and fellow students for all their help in completing this master's thesis and for their feedback while I was working on it.

Ho Chi Minh City

Trần Công Hậu

ABSTRACT

Visual question answering is a relatively new topic in the field of artificial intelligence. Given an image and a text question about that image, the system produces a meaningful answer that is relevant to the question. A visual question answering system therefore needs image processing capabilities such as entity recognition, object detection, and activity recognition.

In this thesis, I use such a model to answer text questions in Vietnamese. Evaluated on the existing Vietnamese dataset, the model achieves an overall accuracy of 64.77%. Through this thesis, I hope to make a small contribution to the Vietnamese language research community.

DECLARATION

I declare that the work presented in this thesis was carried out by me, and that no part of this thesis has been submitted for a degree at this or any other university. If this is not the case, I take full responsibility for my thesis.

Declarant

Trần Công Hậu

TABLE OF CONTENTS

MASTER'S THESIS ASSIGNMENT
ACKNOWLEDGMENTS
THESIS SUMMARY
ABSTRACT
DECLARATION
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES

Chapter 1 INTRODUCTION
1.1 Overview
1.2 Challenges of the topic
1.3 Research objectives
1.4 Scope and subject of the research
1.5 Research outputs

Chapter 2 RELATED WORK
2.1 Bottom-Up and Top-Down Attention architecture
2.2 Pythia architecture
2.3 Modular Co-Attention Networks architecture
2.4 ImageBERT architecture
2.5 Evaluation

Chapter 3 BACKGROUND
3.1 Theoretical foundations
3.1.1 Artificial neural networks
3.1.2 Activation functions
3.1.3 Loss functions
3.2 Background in natural language processing
3.2.1 Word Embedding
3.2.2 Continuous Bag-of-Words model
3.2.3 Skip-gram model
3.2.4 GloVe model
3.3 Long Short-Term Memory
3.3.1 Recurrent Neural Network
3.3.2 Long Short-Term Memory network
3.4 Region-based Convolutional Neural Networks architecture
3.4.1 R-CNN architecture
3.4.2 Fast R-CNN architecture
3.4.3 Faster R-CNN architecture
3.5 Modular Co-Attention
3.5.1 Self-Attention mechanism
3.5.2 Multi-Head Attention mechanism
3.5.3 Modular Co-Attention Layer

Chapter 4 RESEARCH METHODOLOGY
4.1 Model
4.1.1 Question and image processing
4.1.2 Deep Co-Attention learning
4.1.3 Output classification
4.2 Data collection method
4.2.1 VQA-v2 dataset
4.2.2 Vietnamese VQA-v2 dataset
4.3 Processing the Vietnamese dataset

Chapter 5 EXPERIMENTS AND EVALUATION OF RESULTS
5.2.2 Evaluation of yes/no questions
5.2.3 Evaluation of other question types
5.2.4 Overall evaluation

Chapter 6 CONCLUSION
6.1 Summary of the thesis
6.2 Proposed extensions

REFERENCES
BRIEF CURRICULUM VITAE

LIST OF ABBREVIATIONS

R-CNN Region-based Convolutional Neural Networks

LIST OF TABLES

Table 1: Evaluation of the models
Table 2: Summary of the Vietnamese VQA-v2 dataset
Table 3: Evaluation of counting questions
Table 4: Evaluation of yes/no questions
Table 5: Evaluation of other question types
Table 6: Comparison of results between English and Vietnamese

LIST OF FIGURES

Figure 1: Functions of a VQA system
Figure 2: Pythia architecture
Figure 3: MCAN architecture [4]
Figure 4: Perceptron model
Figure 5: Multilayer Perceptron model
Figure 6: Illustration of the CBOW model
Figure 7: Skip-gram model
Figure 8: RNN model
Figure 9: LSTM network model [10]
Figure 10: R-CNN architecture [6]
Figure 11: Fast R-CNN architecture [7]
Figure 12: Faster R-CNN architecture [8]
Figure 13: Example of self-attention
Figure 14: Self-Attention
Figure 15: Multi-head attention [15]
Figure 16: Model for the problem
Figure 17: Initial question and image processing model
Figure 18: Encoder-Decoder model [4]
Figure 19: Image used as input

Chapter 1 INTRODUCTION

1.1 Overview

In everyday life, people can easily look at an image and answer any question about it, drawing on common knowledge and life experience. There are exceptions, however: visually impaired users, or intelligence analysts, may want to actively gather visual information from an image and be unable to produce an answer themselves. Building a question answering system that solves this problem is therefore a necessary task.

The aim of this research is to study and build a Visual Question Answering (VQA) system [1] that applies artificial intelligence: it takes an image and a question as input and produces an answer in Vietnamese.

The system should be able to answer entirely different questions about the same image. For any image, it must locate and detect the object referred to in the question, and it needs some common knowledge to answer. A VQA system must also answer a question much as a human would, in the following respects:

- Learn visual and textual knowledge from the input (the image and the corresponding question)
- Combine the two data streams
- Use this higher-level knowledge to generate the answer

Figure 1: Functions of a VQA system

1.2 Challenges of the topic

In this thesis, I aim to build a question answering system that works with images. Doing so requires new knowledge in computer vision, image processing, and natural language processing. A further challenge is that the problem must be solved in Vietnamese.

1.3 Research objectives

The objective of this thesis is to solve the problem of answering Vietnamese questions about a given image. Given a question and an image, the system produces an answer in Vietnamese that is relevant to both.

1.4 Scope and subject of the research

In this research, I integrate state-of-the-art results from applying deep learning to question answering with preprocessing of the input question and image, so that the model can achieve better performance. I carried out the following research tasks:

- Study visual question answering models and the existing datasets

Chapter 2 RELATED WORK

During my research on this topic, I studied several works related to the visual question answering problem, including papers that achieved top results in the annual Visual Question Answering competitions.

2.1 Bottom-Up and Top-Down Attention architecture

Top-down and bottom-up attention mechanisms have been widely used in image captioning and visual question answering. In the paper "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" [2], the authors propose a mechanism that combines the two. The bottom-up attention mechanism proposes a set of salient image regions, each represented by a pooled feature vector. The authors implement bottom-up attention with Faster R-CNN [8]. The top-down attention mechanism uses task-specific context to predict an attention distribution over the image regions. The attended feature vector is then computed as a weighted average of the image features over all regions. To evaluate the model, the authors proceed in two steps: they first use an image captioning model to quickly obtain information about the salient image regions, and then run experiments and evaluate the results.

This work took first place in the Visual Question Answering challenge, with an overall accuracy of 70.3% on the VQA v2.0 test-std dataset.
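The weighted-average step of top-down attention can be illustrated with a minimal sketch. It assumes the relevance score of each region is already available (in the paper these come from a learned network conditioned on the task context); the feature values, scores, and function names below are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, region_scores):
    """Return attention weights and the weighted average of the region features."""
    weights = softmax(region_scores)
    dim = len(region_features[0])
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, attended

# Three hypothetical regions with 2-dimensional features
# (real region features are much larger vectors).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.5, 0.5]  # assumed relevance scores from the top-down branch

weights, attended = attend(features, scores)
print(weights)   # the first region receives the largest weight
print(attended)  # a single vector summarizing all regions
```

Because the weights sum to 1, the attended vector always lies within the range spanned by the region features, and regions with higher scores contribute more to it.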

2.2 Pythia architecture

In the paper "Pythia v0.1: The Winning Entry to the VQA Challenge 2018", the authors introduce a deep learning model for the Visual Question Answering problem. The model builds on the Bottom-Up and Top-Down Attention model [2] discussed above, with several additions intended to improve prediction accuracy.

As illustrated in the figure below, the model extracts features from the image and uses element-wise multiplication to combine the image and question features, producing an attended tensor that carries the information linking the question content to the relevant objects in the image.

Figure 2: Pythia architecture

The additions in the Pythia architecture include:

- Model Architecture: use element-wise multiplication to combine the features from the text and image modalities
- Learning Schedule: vary the learning rate during training
- Fine-Tuning Bottom-Up Features: set the learning rate for these features to a multiple of the overall learning rate
- Data Augmentation: add more training data
- Model Ensembling: select models trained with different settings, using an up-down model pretrained on the VQA dataset

The authors evaluated their results on the test-std split of VQA v2.
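Element-wise multiplication as a fusion step can be shown with a small sketch. It assumes the image and question features have already been projected to vectors of the same length; the vectors and the function name `elementwise_fuse` are hypothetical, not taken from the Pythia code.

```python
def elementwise_fuse(image_vec, question_vec):
    """Combine two same-length feature vectors by multiplying matching entries."""
    if len(image_vec) != len(question_vec):
        raise ValueError("both vectors must have the same length")
    return [i * q for i, q in zip(image_vec, question_vec)]

# Hypothetical 4-dimensional features after projection to a shared size.
image_vec = [0.5, 2.0, 0.0, 1.0]
question_vec = [2.0, 0.5, 3.0, 1.0]

fused = elementwise_fuse(image_vec, question_vec)
print(fused)  # [1.0, 1.0, 0.0, 1.0]
```

A dimension contributes to the fused vector only when both modalities activate it: the third entry is zero because the image feature is zero there, regardless of how strong the question feature is.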

2.3 Modular Co-Attention Networks architecture

Based on the Transformer model, the Modular Co-Attention Networks (MCAN) model [4] achieved the best result in the Visual Question Answering competition. In the paper, the authors propose a modular co-attention network (MCAN) built from modular co-attention layers.

... the regions that may be objects in the image, and normalizes them to a uniform vector form; here the authors map each region to a vector with a set number of dimensions.

In the Modular Co-Attention block, the authors combine the features extracted from the image through the bottom-up attention mechanism with the given question, encoded using GloVe (Global Vectors for Word Representation) [9] and an LSTM [10], and produce the result through a classification sub-task. The use of two mechanisms, Self-Attention and Guided-Attention, is what makes the model effective: linked together, they apply attention across both inputs, the question and the image.

As shown in the figure below, the MCAN model has three stages: it processes the input question and image, applies Deep Co-Attention Learning to extract question and image features, and then fuses the features and outputs the answer.

Figure 3: MCAN architecture [4]

The model's accuracy is reported on the test-std split of VQA-v2.
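The difference between Self-Attention and Guided-Attention can be sketched with plain scaled dot-product attention. This is a simplified illustration under stated assumptions, not the MCAN implementation: it omits the learned projections, multiple heads, feed-forward layers, residual connections, and layer normalization, and all feature values are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes the values by key similarity."""
    d = len(keys[0])
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output

def self_attention(x):
    """Queries, keys, and values all come from the same input."""
    return attention(x, x, x)

def guided_attention(x, y):
    """x attends to y: queries come from x, keys and values come from y."""
    return attention(x, y, y)

question = [[1.0, 0.0], [0.0, 1.0]]           # 2 hypothetical word features
image = [[1.0, 1.0], [0.5, 0.0], [0.0, 2.0]]  # 3 hypothetical region features

question_sa = self_attention(question)        # words attend to each other
image_ga = guided_attention(image, question)  # regions attend to the words
```

In both cases the output has one row per query, so `image_ga` keeps one vector per image region, but each of those vectors is now a mixture of question word features.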

2.4 ImageBERT architecture

Inspired by BERT [11], the well-known natural language processing architecture from Google, a research group at Microsoft proposed the ImageBERT model [12], which achieves impressive results on multi-modal tasks. ImageBERT encodes both the image and the text at the feature extraction layer. The encodings are then passed to multi-head self-attention blocks, which are trained with several unsupervised training tasks:

- Masked Language Modeling (MLM): as in the original BERT, the task is to predict the masked words
- Masked Object Classification: a task developed as an extension of MLM

... of earlier VQA models.

2.5 Evaluation

The papers reviewed above all address the Visual Question Answering problem. They were published in different years, and accuracy has increased considerably from year to year. The table below compares the characteristics of the models and some of their results on the VQA-v2 dataset.

Chapter 3 BACKGROUND

3.1 Theoretical foundations

3.1.1 Artificial neural networks

An artificial neural network (ANN) [13] is an information processing model inspired by the way biological nervous systems work. It consists of a large number of interconnected neurons that process information together.

3.1.1.1 Perceptron model

The Perceptron is the simplest neural network model, with only an input layer and an output layer. It is also called a linear separator and is used to solve linear classification problems. The figure below shows the Perceptron model.

The Multilayer Perceptron has a more general structure than the Perceptron and can solve problems that are not linearly separable. It is widely used in object classification and in discovering complex relationships in data, and it serves as the foundation for researching and inventing more complex deep learning architectures in computer vision and natural language processing.
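The contrast between a linear separator and a multilayer model can be illustrated with logic gates. The sketch below uses hand-chosen weights rather than trained ones, which is an assumption made for clarity: a single perceptron computes AND, which is linearly separable, while XOR, which is not, needs a hidden layer.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, otherwise 0."""
    return 1 if z > 0 else 0

def perceptron(x1, x2, w1, w2, bias):
    """Single perceptron with two inputs and a step activation."""
    return step(w1 * x1 + w2 * x2 + bias)

def and_gate(x1, x2):
    """AND is linearly separable, so one perceptron is enough."""
    return perceptron(x1, x2, 1.0, 1.0, -1.5)

def xor_mlp(x1, x2):
    """XOR needs a hidden layer: here it is computed as OR and not AND."""
    h_or = perceptron(x1, x2, 1.0, 1.0, -0.5)    # hidden unit 1
    h_and = perceptron(x1, x2, 1.0, 1.0, -1.5)   # hidden unit 2
    return perceptron(h_or, h_and, 1.0, -1.0, -0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [and_gate(a, b) for a, b in inputs]
xor_outputs = [xor_mlp(a, b) for a, b in inputs]
print(and_outputs)  # [0, 0, 0, 1]
print(xor_outputs)  # [0, 1, 1, 0]
```

No choice of `w1`, `w2`, and `bias` lets a single `perceptron` call reproduce the XOR outputs, because no straight line separates the two classes; the two hidden units make the problem separable for the output unit.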

The Multilayer Perceptron model consists of the following components:

... and applies the activation function.

3.1.2 Activation functions

Activation functions are nonlinear functions applied to the outputs of the units (nodes) in the hidden layers of a neural network. Those in use include:

- Sigmoid: a nonlinear function that takes real numbers as input and returns a value in the interval (0, 1), which some problems interpret as a probability. The sigmoid function is often used to predict the probability of a binary outcome.

Formula: $\mathrm{Softmax}(x_i) = \dfrac{\exp(x_i)}{\sum_{j} \exp(x_j)}$
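The sigmoid and softmax functions can be written directly from their definitions. The sketch below is a minimal standard-library version; the branch in `sigmoid` and the subtraction of the maximum in `softmax` are numerical-stability choices added here, not something the text prescribes.

```python
import math

def sigmoid(x):
    """Map a real number to the interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)

def softmax(xs):
    """Softmax(x_i) = exp(x_i) / sum_j exp(x_j), computed stably."""
    m = max(xs)  # shifting by the maximum does not change the result
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))              # 0.5
print(softmax([1.0, 2.0, 3.0]))  # three probabilities that sum to 1
```

Sigmoid produces one probability for a binary outcome, while softmax turns a whole vector of scores into a probability distribution over classes.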

3.1.3 Loss functions

The loss function, denoted L, is a core component of model evaluation. Specifically, a commonly encountered formula is ...

3.2 Background in natural language processing

3.2.1 Word Embedding

A word embedding is a vector space used to represent the semantic and contextual similarity of data. The input to current natural language processing problems usually consists of elements such as words and phrases. Because the lengths and frequencies of the words in a sentence are not uniform, computation becomes difficult, so a method is needed to convert all of these elements into a uniform form that a computer can process and that retains as much information as possible.

The simplest proposed method is to use one-hot vectors to bring words into a uniform form in the vector space.
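One-hot encoding can be sketched in a few lines. The sentence and the helper names `build_vocab` and `one_hot` below are hypothetical and serve only as an illustration.

```python
def build_vocab(tokens):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab

def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

tokens = "the cat sits on the chair".split()
vocab = build_vocab(tokens)  # 5 distinct words
print(one_hot("cat", vocab))  # [0, 1, 0, 0, 0]
```

Every word gets a vector of the same length, which solves the uniformity problem, but any two distinct words are equally far apart, so one-hot vectors carry no information about semantic similarity.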

3.2.2 Continuous Bag-of-Words model

The idea behind the CBOW model is to predict a target word from the context words surrounding it within a given window. Given a target word $w_t$ at position t in the sentence ...

... consisting of C context words, where V is the size of the vocabulary and N is the size of the hidden layer.
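A CBOW forward pass can be sketched using the symbols C, V, and N from the text. The sketch assumes the usual word2vec formulation, in which the hidden layer is the average of the C context word embeddings; the weight matrices `w_in` (V x N) and `w_out` (N x V) and their values are hypothetical.

```python
import math

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cbow_forward(context_ids, w_in, w_out):
    """Average the context embeddings, then score every vocabulary word."""
    c = len(context_ids)   # C: number of context words
    n = len(w_in[0])       # N: hidden layer size
    v = len(w_out[0])      # V: vocabulary size
    hidden = [sum(w_in[i][d] for i in context_ids) / c for d in range(n)]
    scores = [sum(hidden[d] * w_out[d][j] for d in range(n)) for j in range(v)]
    return hidden, softmax(scores)

# Hypothetical weights for V = 4 words and N = 2 hidden units.
w_in = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
w_out = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]

hidden, probs = cbow_forward([0, 2], w_in, w_out)  # C = 2 context words
print(hidden)  # average of the embeddings of words 0 and 2
print(probs)   # predicted distribution over the 4 vocabulary words
```

Training would adjust `w_in` and `w_out` so that the probability assigned to the true target word increases; after training, the rows of `w_in` serve as the word embeddings.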

3.2.3 Skip-gram model

Another commonly used model is skip-gram, which takes the target word as input and uses the context word as the expected output to train the neural network. Each training sample is therefore a pair (target word, context word).
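Generating the (target word, context word) training pairs can be sketched as follows. The window size and the example sentence are assumptions chosen for illustration.

```python
def skipgram_pairs(tokens, window):
    """Pair each target word with every word within `window` positions of it."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs(["i", "like", "cats"], window=1)
print(pairs)
# [('i', 'like'), ('like', 'i'), ('like', 'cats'), ('cats', 'like')]
```

A larger window produces more pairs per target word and captures broader topical context, while a smaller window emphasizes immediate neighbors.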

Posted: 03/08/2024, 23:03
