Natural Language Processing, Regina Barzilay, ocw.mit.edu
6.864: Lecture 17 (November 10th, 2005)
Machine Translation Part III
CuuDuongThanCong.com https://fb.com/tailieudientucntt

Overview
• A Phrase-Based Model: (Koehn, Och and Marcu 2003)
• Syntax Based Model 1: (Wu 1995)
• Syntax Based Model 2: (Yamada and Knight 2001)
Methods that go beyond word-word alignments

A Phrase-Based Model (Koehn, Och and Marcu 2003)
• Intuition: IBM models have word-word translation
• Intuition: in IBM models each French word is aligned with only one English word
• A new type of model: align phrases in English with phrases in French

• An example from the Koehn and Knight tutorial:

Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada

Morgen   | fliege   | ich | nach Kanada | zur Konferenz
Tomorrow | will fly | I   | in Canada   | to the conference

Representation as Alignment “Matrix”

       Mary   did    not    slap   the    green  witch
Maria  •      -      -      -      -      -      -
no     -      •      •      -      -      -      -
daba   -      -      -      •      -      -      -
una    -      -      -      •      -      -      -
bof’   -      -      -      •      -      -      -
a      -      -      -      -      •      -      -
la     -      -      -      -      •      -      -
bruja  -      -      -      -      -      -      •
verde  -      -      -      -      -      •      -

(Note: “bof’” = “bofetada”)
(Another example from the Koehn and Knight tutorial)

The Issues Involved
• Finding alignment matrices for all English/French pairs in training corpora
• Coming up with a model that incorporates phrases
• Training the model
• Decoding with the model

Finding Alignment Matrices
• Step 1: train IBM model for P(f | e), and come up with the most likely alignment for each (e, f) pair
• Step 2: train IBM model for P(e | f) (!)
  and come up with the most likely alignment for each (e, f) pair
• We now have two alignments: take intersection of the two alignments as a starting point

Alignment from P(f | e) model:
[Alignment matrix: English words Mary, did, not, slap, the, green, witch (rows) against Spanish words Maria, no, daba, una, bof’, a, la, bruja, verde (columns); point positions lost]

Alignment from P(e | f) model:
[Alignment matrix over the same rows and columns; point positions lost]

Intersection of the two alignments:
[Alignment matrix over the same rows and columns; point positions lost]

The intersection of the two alignments was found to be a very reliable starting point

Heuristics for Growing Alignments
• Only explore alignment points in the union of the P(f | e) and P(e | f) alignments
• Add one alignment point at a time
• Only add alignment points which align a word that currently has no alignment
• At first, restrict ourselves to alignment points that are “neighbors” (adjacent or diagonal) of current alignment points
• Later, consider other alignment points
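
The intersection step and the growing heuristics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure from Koehn, Och and Marcu (2003): the function name `grow_alignment` is hypothetical, an alignment is represented as a set of (English index, Spanish index) pairs, candidate points are scanned in sorted order, and a point is taken to "align a word that currently has no alignment" when at least one of its two words is unaligned. The two directional alignments are toy data chosen for illustration; they are not the matrices from the slides.

```python
def grow_alignment(f_given_e, e_given_f):
    """Grow the intersection of two directional alignments toward their union.

    Both arguments are sets of (english_index, foreign_index) pairs.
    """
    alignment = set(f_given_e & e_given_f)  # intersection: the starting point
    union = f_given_e | e_given_f           # never explore outside the union

    def aligns_new_word(point):
        i, j = point
        e_free = all(i != a for a, _ in alignment)
        f_free = all(j != b for _, b in alignment)
        # Assumption: one unaligned word on either side is enough.
        return e_free or f_free

    def is_neighbor(point):
        i, j = point
        # Adjacent or diagonal: Chebyshev distance 1 from a current point.
        return any(max(abs(i - a), abs(j - b)) == 1 for a, b in alignment)

    # Phase 1: neighbors only. Phase 2: any remaining point in the union.
    for need_neighbor in (True, False):
        added = True
        while added:
            added = False
            for point in sorted(union - alignment):
                if aligns_new_word(point) and (not need_neighbor or is_neighbor(point)):
                    alignment.add(point)  # add one point, then rescan
                    added = True
                    break
    return alignment


english = ["Mary", "did", "not", "slap", "the", "green", "witch"]
spanish = ["Maria", "no", "daba", "una", "bof'", "a", "la", "bruja", "verde"]

# Hypothetical directional alignments (illustration only).
f_given_e = {(0, 0), (2, 1), (3, 2), (3, 3), (3, 4), (4, 6), (6, 7), (5, 8)}
e_given_f = {(0, 0), (1, 1), (2, 1), (3, 4), (4, 6), (5, 8), (6, 7)}

grown = grow_alignment(f_given_e, e_given_f)
for i, j in sorted(grown):
    print(english[i], "-", spanish[j])
```

With these toy inputs every point of the union ends up in the grown alignment, while "a" stays unaligned because neither directional alignment proposes a point for it. Rescanning after each addition keeps the "one point at a time" rule explicit, at the cost of some efficiency.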