DOCUMENT INFORMATION

Pages: 50
Size: 1.89 MB

Content


Lecture 12: EM Algorithm
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS6501 Natural Language Processing

Three basic problems for HMMs
- Likelihood of the input: the forward algorithm
  (how likely is it that the sentence "I love cat" occurs?)
- Decoding (tagging) the input: the Viterbi algorithm
  (what are the POS tags of "I love cat"?)
- Estimation (learning): how do we learn the model?
  Find the best model parameters.
  - Case 1: supervised (tags are annotated): maximum likelihood estimation (MLE)
  - Case 2: unsupervised (only unannotated text): the forward-backward algorithm

EM algorithm
- POS induction: can we tag POS without annotated data?
- An old idea with good mathematical intuition
- Tutorial notes:
  ftp://ftp.icsi.berkeley.edu/pub/techreports/1997/tr-97-021.pdf
  http://people.csail.mit.edu/regina/6864/em_notes_mike.pdf

Hard EM (intuition)
- We don't know the hidden states (i.e., the POS tags).
- But if we knew the model, we could tag; and if we knew the tags, we could learn the model.

Recap: learning from labeled data
- If we know the hidden states (labels), we count how often we see each tag bigram t_{i-1} t_i and each word-tag pair (w_i, t_i), then normalize.
[Figure: ice cream sequences (scoop counts such as 2, 3, 1, 3) labeled with hidden weather states C (cold) and H (hot)]

Recap: tagging the input
- If we know the model, we can find the best tag sequence.

Hard EM (intuition)
- We don't know the hidden states (i.e., the POS tags). Let's guess!
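The count-and-normalize recipe from the "learning from labeled data" recap can be sketched in a few lines. This is a minimal illustration, not code from the lecture; the function name mle_estimate and the tiny data set are assumptions for the example:

```python
from collections import Counter, defaultdict

def mle_estimate(tagged_sentences):
    """Count tag-bigram and word-tag events, then normalize (MLE).

    tagged_sentences: list of [(word, tag), ...]; "<s>" marks the start state.
    Returns (transition, emission) as nested dicts of probabilities.
    """
    trans_counts = defaultdict(Counter)   # trans_counts[t_prev][t]
    emit_counts = defaultdict(Counter)    # emit_counts[t][w]
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans_counts[prev][tag] += 1  # how often t_{i-1} t_i occurs
            emit_counts[tag][word] += 1   # how often (w_i, t_i) occurs
            prev = tag
    normalize = lambda c: {k: v / sum(c.values()) for k, v in c.items()}
    transition = {t: normalize(c) for t, c in trans_counts.items()}
    emission = {t: normalize(c) for t, c in emit_counts.items()}
    return transition, emission

# Eisner-style ice cream example: observations are scoops eaten (1/2/3),
# hidden states are the weather, C(old) or H(ot).
data = [[("2", "C"), ("3", "H"), ("3", "H"), ("1", "C")]]
transition, emission = mle_estimate(data)
```

With this toy sequence, C emits "1" and "2" once each (probability 0.5 each) and H always emits "3", which is exactly the count-then-normalize behavior the slide describes.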
Then we have labels, so we can estimate the model (maximum likelihood estimation). Check whether the model is consistent with the labels we guessed; if not, go back to the guessing step.

Let's make a guess
            P(…|C)   P(…|H)   P(…|Start)
  P(1|…)      ?        ?         -
  P(2|…)      ?        ?         -
  P(3|…)      ?        ?         -
  P(C|…)     0.8      0.2       0.5
  P(H|…)     0.2      0.8       0.5
[Figure: an ice cream sequence (counts including 2 and 1) with all weather tags still unknown (?)]

These are obvious
[Figure: same table; some tags can now be filled in confidently as C or H]

Guess more
[Figure: same table; most of the remaining tags guessed as C or H]

Let's use expected counts instead!
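The guess / estimate / re-tag loop above can be sketched as hard EM on the ice cream example. This is an assumed illustration, not the lecture's code: the initial emission tilt, the smoothing constant, and the toy observation string are all made up for the sketch, while the 0.8/0.2/0.5 transition table comes from the slide. Transitions are held fixed and only emissions are re-estimated, to keep the example short:

```python
import math
from collections import Counter

STATES = ["C", "H"]  # hidden weather states in the ice cream example

def viterbi(obs, start, trans, emit):
    """Most likely state sequence under the current model (log-space)."""
    lp = lambda p: math.log(p) if p > 0 else float("-inf")
    V = [{s: lp(start[s]) + lp(emit[s].get(obs[0], 0)) for s in STATES}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: V[-1][r] + lp(trans[r][s]))
            col[s] = V[-1][best] + lp(trans[best][s]) + lp(emit[s].get(o, 0))
            ptr[s] = best
        V.append(col)
        back.append(ptr)
    tags = [max(STATES, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return tags[::-1]

def estimate(obs, tags, smoothing=0.1):
    """Estimation step: count and normalize emissions (add-0.1 smoothing)."""
    counts = {s: Counter() for s in STATES}
    for o, t in zip(obs, tags):
        counts[t][o] += 1
    vocab = sorted(set(obs))
    emit = {}
    for s in STATES:
        total = sum(counts[s].values()) + smoothing * len(vocab)
        emit[s] = {o: (counts[s][o] + smoothing) / total for o in vocab}
    return emit

# Initial guess: the transition/start table from the slide; near-uniform
# emissions with a tilt (hot days -> more scoops) to break the C/H symmetry.
start = {"C": 0.5, "H": 0.5}
trans = {"C": {"C": 0.8, "H": 0.2}, "H": {"C": 0.2, "H": 0.8}}
emit = {"C": {"1": 0.6, "2": 0.3, "3": 0.1},
        "H": {"1": 0.1, "2": 0.3, "3": 0.6}}

obs = list("2331123312233")            # made-up scoop counts
tags = None
for _ in range(20):                    # iterate until the tagging is stable
    new_tags = viterbi(obs, start, trans, emit)
    if new_tags == tags:
        break                          # model is consistent with the labels
    tags = new_tags                    # "Then we have labels..."
    emit = estimate(obs, tags)         # "...so we can estimate the model"
```

This is hard EM because each observation is committed to a single guessed tag; the "expected counts instead" line on the slide is the soft-EM refinement, where fractional (posterior) counts from the forward-backward algorithm replace these hard assignments.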

Posted: 27/11/2022, 21:14