Although the thesis achieved its stated objectives, constraints on time and scope leave several limitations:
- The evaluation relied on only a small number of published datasets, so the experiments do not yet cover a diverse range of data domains.
- Word alignment quality was assessed only through the BLEU score of the resulting machine translation system. No experiments used word alignment metrics such as AER, precision, recall, or F-measure, so the thesis does not show directly how alignment quality changes after applying the subword segmentation methods and the improved alignment algorithm.
- Back-translation through a pivot language requires either a training corpus large enough to train the translation models or good pretrained models. This study used neural machine translation models for the English - German - English round trip, which increases the overall time needed to train the statistical machine translation model.
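The alignment metrics named in the limitations above can be illustrated with a minimal sketch. It assumes a gold-standard alignment annotated with "sure" and "possible" links, following the common definition of AER by Och and Ney. The function name `alignment_scores` and the toy data are hypothetical and are not taken from the thesis.

```python
def alignment_scores(predicted, sure, possible):
    """Compute precision, recall, F-measure and AER for one word alignment.

    Each argument is a collection of (source_index, target_index) links.
    Sure links are always treated as possible links as well.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s
    if not a or not s:
        raise ValueError("predicted and sure alignments must be non-empty")

    precision = len(a & p) / len(a)   # predicted links that are at least possible
    recall = len(a & s) / len(s)      # sure links that were recovered
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    aer = 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "aer": aer}


# Toy example (hypothetical data): three predicted links, two sure links,
# and one additional possible link.
gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 1)}
predicted = {(0, 0), (2, 1), (2, 2)}
print(alignment_scores(predicted, gold_sure, gold_possible))
```

Lower AER is better. Reporting these scores next to BLEU would separate changes in alignment quality from changes in end-to-end translation quality, provided a manually aligned test set is available.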
To address these limitations, the PhD candidate proposes the following directions for further research:
1. Study and apply the two proposed methods in combination to improve the quality of the statistical machine translation system, and continue to improve other components of the system, such as the language model.
2. The existing subword segmentation methods were designed for neural machine translation, and the thesis applied them to statistical machine translation. A subword segmentation method suited to the architecture and characteristics of statistical machine translation still needs to be studied and proposed.
3. Study and propose ways to apply the methods used in the thesis to neural machine translation, in order to build a good translation system for both directions, Vietnamese - English and English - Vietnamese.
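For readers unfamiliar with subword segmentation, the sketch below shows one widely used method, byte-pair encoding (BPE). This section does not say which subword method the thesis used, so BPE is an assumption made purely for illustration. The helper names `learn_bpe`, `merge_pair` and `segment`, and the toy word list, are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of word tokens."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment one word by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(corpus, num_merges=10)
print(segment("lowest", merges))
```

In a statistical system, segmentation of this kind would be applied to both sides of the parallel corpus before word alignment and undone after decoding. How well the resulting units suit phrase extraction is the open question raised in direction 2.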
LIST OF PUBLISHED SCIENTIFIC WORKS
[CT1] “Automatic Detection of Problematic Rules in Vietnamese Treebank”. RIVF-2015.
[CT2] “The JAIST-UET-MITI Machine Translation Systems for IWSLT 2015”. IWSLT-2015.
[CT3] “A method for augmenting statistical machine translation training data for the Vietnamese - English language pair using back-translation and adaptive selection” (in Vietnamese). Tạp chí Nghiên cứu Khoa học và Công nghệ quân sự, special issue, December 2020.
[CT4] “Improving the alignment model in Vietnamese - English statistical machine translation with subword segmentation” (in Vietnamese). Tạp chí Nghiên cứu Khoa học và Công nghệ quân sự, no. 74, August 2021.