Lecture 12: EM Algorithm
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS6501 Natural Language Processing

Three basic problems for HMMs

- Likelihood of the input: the forward algorithm.
  How likely is it that the sentence "I love cat" occurs?
- Decoding (tagging) the input: the Viterbi algorithm.
  What are the POS tags of "I love cat"?
- Estimation (learning): how do we learn the model?
  Find the best model parameters.
  - Case 1: supervised (tags are annotated) → maximum likelihood estimation (MLE)
  - Case 2: unsupervised (only unannotated text) → forward-backward algorithm

EM algorithm

- POS induction: can we tag POS without annotated data?
- An old idea, with good mathematical intuition.
- Tutorial papers:
  ftp://ftp.icsi.berkeley.edu/pub/techreports/1997/tr-97-021.pdf
  http://people.csail.mit.edu/regina/6864/em_notes_mike.pdf

Hard EM (intuition)

- We don't know the hidden states (i.e., the POS tags).
- But the two recaps below show that if we know the labels we can learn the model, and if we know the model we can tag.

Recap: Learning from labeled data

- If we know the hidden states (labels), we count how often we see each tag bigram (t_{i-1}, t_i) and each word-tag pair (w_i, t_i), then normalize:

  P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1}),    P(w_i | t_i) = C(t_i, w_i) / C(t_i)

[Figure: labeled ice-cream sequences; hidden weather states (C = cold, H = hot) over days with observed ice-cream counts 1-3.]

Recap: Tagging the input

- If we know the model, we can find the best tag sequence.

Hard EM (intuition)

- We don't know the hidden states (i.e., the POS tags), so:
  1. Let's guess!
  2. Then we have labels; we can estimate the model.
  3. Check whether the model is consistent with the labels we guessed; if not, update the guess and repeat.
- (A runnable sketch of this loop appears at the end of this section.)

Let's make a guess

- The transition probabilities are given; the emission probabilities and the tags are unknown:

             P(…|C)   P(…|H)   P(…|Start)
  P(1|…)       ?        ?         -
  P(2|…)       ?        ?         -
  P(3|…)       ?        ?         -
  P(C|…)      0.8      0.2       0.5
  P(H|…)      0.2      0.8       0.5

[Figure: a sequence of days with observed ice-cream counts; every tag still unknown: ? ? ? 2 ? ? 1 ? ? ?]

These are obvious

[Figure: the same sequence with the obvious tags filled in: ? H ? 2 ? H 1 C ? H ?]

Guess more

[Figure: the same sequence with most tags filled in: H H H 2 H H 1 C ? H H]

Hard EM (intuition), continued

- Step 2 above is maximum likelihood estimation from the guessed labels.
- Let's use expected counts instead! (See the forward-backward sketch below.)
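To make the guess/estimate/re-tag loop concrete, here is a minimal runnable Python sketch of hard EM on the ice-cream example. Everything in it (the function names mle_estimate, viterbi, hard_em, the smoothing constant, and the sample observation sequence) is illustrative, not from the lecture; the slides also hold the transition table fixed and treat only the emissions as unknown, whereas for brevity this sketch re-estimates both. A real implementation would work in log space to avoid underflow on long sequences.

```python
# Hard EM on the toy ice-cream HMM: hidden weather C/H, observed 1-3 ice creams.
import random
from collections import Counter

STATES = ["C", "H"]        # hidden weather: cold / hot
OBS_VALUES = [1, 2, 3]     # observed ice creams per day

def mle_estimate(obs, tags, smooth=0.1):
    """Count tag bigrams (t_{i-1}, t_i) and word-tag pairs (w_i, t_i),
    then normalize (MLE). Add-smoothing keeps zero probabilities out of Viterbi."""
    trans, emit = Counter(), Counter()
    for prev, s in zip(["Start"] + tags[:-1], tags):
        trans[(prev, s)] += 1
    for s, w in zip(tags, obs):
        emit[(s, w)] += 1
    def norm(counts, contexts, outcomes):
        return {(c, o): (counts[(c, o)] + smooth)
                / (sum(counts[(c, o2)] for o2 in outcomes) + smooth * len(outcomes))
                for c in contexts for o in outcomes}
    return (norm(trans, ["Start"] + STATES, STATES),
            norm(emit, STATES, OBS_VALUES))

def viterbi(obs, trans, emit):
    """Best tag sequence under the current model (the decoding recap)."""
    delta = {s: trans[("Start", s)] * emit[(s, obs[0])] for s in STATES}
    back = []
    for w in obs[1:]:
        prev, step = delta, {}
        delta = {}
        for s in STATES:
            p = max(STATES, key=lambda q: prev[q] * trans[(q, s)])
            step[s] = p
            delta[s] = prev[p] * trans[(p, s)] * emit[(s, w)]
        back.append(step)
    tags = [max(STATES, key=lambda s: delta[s])]
    for step in reversed(back):
        tags.append(step[tags[-1]])
    return tags[::-1]

def hard_em(obs, n_iters=50):
    tags = [random.choice(STATES) for _ in obs]    # step 1: guess labels
    for _ in range(n_iters):
        trans, emit = mle_estimate(obs, tags)      # step 2: MLE from the guess
        new_tags = viterbi(obs, trans, emit)       # step 3: re-tag with the model
        if new_tags == tags:                       # consistent -> done
            break
        tags = new_tags
    return tags

obs = [2, 3, 3, 2, 3, 2, 3, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1]
print(hard_em(obs))
```

Each iteration alternates the two recaps: mle_estimate is "count and normalize" from the guessed labels, viterbi is "find the best tag sequence given the model", and the loop stops when re-tagging no longer changes the labels, which is exactly the consistency check in step 3.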
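The closing point ("let's use expected counts instead!") replaces the hard Viterbi labels with posterior probabilities computed by the forward-backward algorithm. Below is a sketch of that step, continuing from the previous code (it reuses STATES and a (trans, emit) model); the function name is hypothetical.

```python
def expected_state_counts(obs, trans, emit):
    """Posterior P(state at i | obs) via forward-backward; these posteriors
    serve as fractional ("expected") counts in the MLE step."""
    n = len(obs)
    # Forward: alpha[i][s] = P(obs[0..i], state_i = s)
    alpha = [{s: trans[("Start", s)] * emit[(s, obs[0])] for s in STATES}]
    for w in obs[1:]:
        alpha.append({s: sum(alpha[-1][p] * trans[(p, s)] for p in STATES)
                         * emit[(s, w)] for s in STATES})
    # Backward: beta[i][s] = P(obs[i+1..] | state_i = s)
    beta = [dict.fromkeys(STATES, 1.0)]
    for w in reversed(obs[1:]):
        beta.insert(0, {s: sum(trans[(s, q)] * emit[(q, w)] * beta[0][q]
                               for q in STATES) for s in STATES})
    z = sum(alpha[-1][s] for s in STATES)  # likelihood of the whole sequence
    # gamma[i][s] = P(state_i = s | obs): the soft count for the pair (s, obs[i])
    return [{s: alpha[i][s] * beta[i][s] / z for s in STATES} for i in range(n)]
```

A full soft-EM (Baum-Welch) iteration would normalize these fractional counts (plus the analogous pairwise expected transition counts) exactly as in mle_estimate, instead of counting hard Viterbi labels.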