Lecture 9: Hidden Markov Model
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS6501 Natural Language Processing

This lecture
- Hidden Markov Model
- Different views of HMM
- HMM in supervised learning setting

Recap: Parts of Speech
- Traditional parts of speech
- ~ of them

Recap: Tagset
- Penn TreeBank tagset, 45 tags:
  - PRP$, WRB, WP$, VBG
- Penn POS annotations: The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
- Universal Tag set, 12 tags
  - NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, “.”, X

Recap: POS Tagging vs. Word clustering
- Words often have more than one POS: back
  - The back door = JJ
  - On my back = NN
  - Win the voters back = RB
  - Promised to back the bill = VB
- Syntax vs. Semantics (details later)
These examples are from Dekang Lin

Recap: POS tag sequences
- Some tag sequences are more likely to occur than others
- POS Ngram view: https://books.google.com/ngrams/graph?content=_ADJ_+_NOUN_%2C_ADV_+_NOUN_%2C+_ADV_+_VERB_
Existing methods often model POS tagging as a sequence tagging problem

Evaluation
- How many words in the unseen test data can be tagged correctly?
- Usually evaluated on Penn Treebank
- State of the art ~97%
- Trivial baseline (most likely tag) ~94%
- Human performance ~97%

Building a POS tagger
- Supervised learning
- Assume linguists have annotated several examples
Tag set: DT, JJ, NN, VBD…
POS Tagger
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

POS induction
- Unsupervised learning
- Assume we only have an unannotated corpus
Tag set: DT, JJ, NN, VBD…
POS Tagger
The grand jury commented on a number of other topics

TODAY: Hidden Markov Model
- We focus on the supervised learning setting
- What is the most likely sequence of tags for the given sequence of words w?
- We will talk about other ML models for this type of prediction task later

Table representation
Let 𝜆 = {𝐴, 𝐵, 𝜋} represent all parameters
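
The parameter set 𝜆 = {𝐴, 𝐵, 𝜋} can be read as three lookup tables. The slides as extracted here do not define the symbols, so the sketch below assumes the common HMM convention: 𝜋 holds initial tag probabilities, 𝐴 tag-to-tag transition probabilities, and 𝐵 tag-to-word emission probabilities. In the supervised setting these tables can be estimated by relative-frequency counting over annotated sentences; that estimator, the function names (`estimate_hmm`, `joint_probability`), the lowercasing of words, and the one-sentence toy corpus are illustrative assumptions, not the lecture's own code. No smoothing is applied, so unseen events get probability 0.

```python
from collections import Counter, defaultdict

# Toy annotated corpus: the Penn Treebank example sentence from the slides.
tagged_sentences = [
    [("The", "DT"), ("grand", "JJ"), ("jury", "NN"), ("commented", "VBD"),
     ("on", "IN"), ("a", "DT"), ("number", "NN"), ("of", "IN"),
     ("other", "JJ"), ("topics", "NNS"), (".", ".")],
]


def normalize(counts):
    """Turn a Counter of raw counts into a dict of relative frequencies."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def estimate_hmm(sentences):
    """Estimate (pi, A, B) by counting; assumed convention, no smoothing."""
    initial = Counter()                 # counts for pi: tag of the first word
    transitions = defaultdict(Counter)  # counts for A: previous tag -> tag
    emissions = defaultdict(Counter)    # counts for B: tag -> word
    for sentence in sentences:
        prev_tag = None
        for word, tag in sentence:
            if prev_tag is None:
                initial[tag] += 1
            else:
                transitions[prev_tag][tag] += 1
            emissions[tag][word.lower()] += 1
            prev_tag = tag
    pi = normalize(initial)
    A = {tag: normalize(c) for tag, c in transitions.items()}
    B = {tag: normalize(c) for tag, c in emissions.items()}
    return pi, A, B


def joint_probability(pi, A, B, sentence):
    """P(words, tags) under the HMM; 0.0 if any event was never observed."""
    prob = 1.0
    prev_tag = None
    for word, tag in sentence:
        if prev_tag is None:
            step = pi.get(tag, 0.0)
        else:
            step = A.get(prev_tag, {}).get(tag, 0.0)
        prob *= step * B.get(tag, {}).get(word.lower(), 0.0)
        prev_tag = tag
    return prob


pi, A, B = estimate_hmm(tagged_sentences)
print(A["DT"])  # transition row for DT
print(joint_probability(pi, A, B, tagged_sentences[0]))
```

Storing each table as a nested dictionary keeps the sketch close to the table view of the parameters: one row per tag, one entry per observed next tag or word. Finding the most likely tag sequence for a new sentence requires a search over tag sequences on top of these tables, which this sketch does not attempt.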