
Natural Language Processing, Regina Barzilay, ocw.mit.edu


6.864: Lecture 17 (November 10th, 2005)
Machine Translation Part III

CuuDuongThanCong.com https://fb.com/tailieudientucntt

Overview
• A Phrase-Based Model: (Koehn, Och and Marcu 2003)
• Syntax Based Model 1: (Wu 1995)
• Syntax Based Model 2: (Yamada and Knight 2001)
Methods that go beyond word-word alignments

A Phrase-Based Model (Koehn, Och and Marcu 2003)
• Intuition: IBM models have word-word translation
• Intuition: in IBM models each French word is aligned with only one English word
• A new type of model: align phrases in English with phrases in French

• An example from Koehn and Knight tutorial:
  Morgen fliege ich nach Kanada zur Konferenz
  Tomorrow I will fly to the conference in Canada

  Morgen | fliege | ich | nach Kanada | zur Konferenz
  Tomorrow | will fly | I | in Canada | to the conference

Representation as Alignment "Matrix"

        Maria  no  daba  una  bof'  a  la  bruja  verde
Mary      •
did            •
not            •
slap                •     •    •
the                                 •  •
green                                               •
witch                                        •

(Note: "bof'" = "bofetada")
(Another example from the Koehn and Knight tutorial)

The Issues Involved
• Finding alignment matrices for all English/French pairs in training corpora
• Coming up with a model that incorporates phrases
• Training the model
• Decoding with the model

Finding Alignment Matrices
• Step 1: train IBM model for P(f | e), and come up with most likely alignment for each (e, f) pair
• Step 2: train IBM model for P(e | f)(!) and come up with most likely alignment for each (e, f) pair
• We now have two alignments: take intersection of the two alignments as a starting point

Alignment from P(f | e) model:
[Alignment matrix: Maria no daba una bof' a la bruja verde (columns) against Mary did not slap the green witch (rows)]

Alignment from P(e | f) model:
[Alignment matrix: Maria no daba una bof' a la bruja verde (columns) against Mary did not slap the green witch (rows)]

Intersection of the two alignments:
[Alignment matrix: Maria no daba una bof' a la bruja verde (columns) against Mary did not slap the green witch (rows)]

The intersection of the two alignments was found to be a very reliable starting point

Heuristics for Growing Alignments
• Only explore alignment in union of P(f | e) and P(e | f) alignments
• Add one alignment point at a time
• Only add alignment points which align a word that currently has no alignment
• At first, restrict ourselves to alignment points that are "neighbors" (adjacent or diagonal) of current alignment points
• Later, consider other alignment points
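
The growing heuristics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact procedure of Koehn, Och and Marcu: the function name `grow_alignment` is hypothetical, "aligns a word that currently has no alignment" is read as "at least one of the two words is unaligned", and the two directional alignments below are made up for the Maria/Mary sentence pair (they are not the matrices from the slides). Alignment points are (English index, Spanish index) pairs.

```python
from itertools import product

# Offsets of the eight "neighbor" cells: adjacent or diagonal.
NEIGHBOR_OFFSETS = [
    (de, df) for de, df in product((-1, 0, 1), repeat=2) if (de, df) != (0, 0)
]


def grow_alignment(f2e, e2f):
    """Grow the intersection of two directional alignments toward their union.

    Both arguments are sets of (english_index, foreign_index) points.
    Hypothetical helper written for illustration only.
    """
    union = f2e | e2f
    alignment = set(f2e & e2f)  # the intersection is the starting point

    def aligns_new_word(point):
        # Assumed reading of the slide: accept the point if the English word
        # or the foreign word is not yet covered by any alignment point.
        e, f = point
        e_aligned = any(pe == e for pe, _ in alignment)
        f_aligned = any(pf == f for _, pf in alignment)
        return not e_aligned or not f_aligned

    def is_neighbor(point):
        e, f = point
        return any((e + de, f + df) in alignment for de, df in NEIGHBOR_OFFSETS)

    # Pass 1 only admits neighbors of current points; pass 2 admits any point.
    for require_neighbor in (True, False):
        added = True
        while added:
            added = False
            for point in sorted(union - alignment):  # only explore the union
                if require_neighbor and not is_neighbor(point):
                    continue
                if aligns_new_word(point):
                    alignment.add(point)
                    added = True
                    break  # add one point at a time, then rescan
    return alignment


# English: Mary(0) did(1) not(2) slap(3) the(4) green(5) witch(6)
# Spanish: Maria(0) no(1) daba(2) una(3) bof'(4) a(5) la(6) bruja(7) verde(8)
# Made-up P(f | e) alignment: each Spanish word links to at most one English word.
f2e = {(0, 0), (1, 1), (3, 2), (3, 3), (3, 4), (4, 6), (6, 7), (5, 8)}
# Made-up P(e | f) alignment: each English word links to one Spanish word;
# (1, 7), "did"-"bruja", is a deliberately spurious link.
e2f = {(0, 0), (1, 7), (2, 1), (3, 4), (4, 6), (5, 8), (6, 7)}

result = grow_alignment(f2e, e2f)
```

In this toy run the spurious point (1, 7) is never added: it is not a neighbor of any current point in the first pass, and by the second pass both "did" and "bruja" are already aligned.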

Posted: 27/11/2022, 21:17