
Natural Language Processing, Kai-Wei Chang, www.cs.virginia.edu


Document information

Pages: 34
Size: 1.17 MB

Content


Lecture 9: Hidden Markov Model
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS6501 Natural Language Processing CuuDuongThanCong.com https://fb.com/tailieudientucntt

This lecture
- Hidden Markov Model
- Different views of HMM
- HMM in a supervised learning setting

Recap: Parts of Speech
- Traditional parts of speech
- ~ of them

Recap: Tagset
- Penn Treebank tagset, 45 tags
  - e.g., PRP$, WRB, WP$, VBG
- Penn POS annotations:
  The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
- Universal tagset, 12 tags
  - NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, ".", X

Recap: POS Tagging vs. Word Clustering
- Words often have more than one POS: back
  - The back door = JJ
  - On my back = NN
  - Win the voters back = RB
  - Promised to back the bill = VB
- Syntax vs. semantics (details later)
These examples are from Dekang Lin.

Recap: POS tag sequences
- Some tag sequences are more likely to occur than others
- POS Ngram view: https://books.google.com/ngrams/graph?content=_ADJ_+_NOUN_%2C_ADV_+_NOUN_%2C+_ADV_+_VERB_
Existing methods often model POS tagging as a sequence tagging problem.

Evaluation
- How many words in the unseen test data can be tagged correctly?
- Usually evaluated on the Penn Treebank
- State of the art: ~97%
- Trivial baseline (most likely tag): ~94%
- Human performance: ~97%

Building a POS tagger
- Supervised learning
- Assume linguists have annotated several examples
[Figure: a tag set (DT, JJ, NN, VBD, …) and annotated sentences such as "The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./." are used to build a POS tagger]

POS induction
- Unsupervised learning
- Assume we only have an unannotated corpus
[Figure: a tag set (DT, JJ, NN, VBD, …) and raw sentences such as "The grand jury commented on a number of other topics" are used to build a POS tagger]

TODAY: Hidden Markov Model
- We focus on the supervised learning setting
- What is the most likely sequence of tags for the given sequence of words w?
- We will talk about other ML models for this type of prediction task later

Table representation
Let λ = {A, B, π} represent all parameters.
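The evaluation slide asks how many test words are tagged correctly and compares against a trivial "most likely tag" baseline (~94%). Below is a minimal Python sketch of that baseline and of per-token accuracy. It assumes sentences are lists of (word, tag) pairs; the fallback for unseen words (the most frequent tag overall), the function names, and the toy data are our own assumptions, not taken from the slides.

```python
from collections import Counter, defaultdict

def train_most_likely_tag(tagged_sentences):
    """Return a tagger that gives each word the tag it received most often in training."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sent in tagged_sentences:
        for word, tag in sent:
            per_word[word][tag] += 1
            overall[tag] += 1
    # Assumption: unseen words get the most frequent tag in the whole training set
    fallback = overall.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lambda words: [lexicon.get(w, fallback) for w in words]

def token_accuracy(tagger, test_sentences):
    """Fraction of test tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sent in test_sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        correct += sum(p == g for p, g in zip(tagger(words), gold))
        total += len(gold)
    return correct / total

# Hypothetical toy data built around the ambiguous word "back"
train = [
    [("The", "DT"), ("back", "JJ"), ("door", "NN")],
    [("On", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("The", "DT"), ("back", "JJ"), ("seat", "NN")],
]
test = [
    [("On", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("The", "DT"), ("back", "JJ"), ("door", "NN")],
]
tagger = train_most_likely_tag(train)
# 5 of 6 tokens correct: "back" in "On my back" is tagged JJ, its most frequent tag
print(token_accuracy(tagger, test))
```

Because the baseline ignores context, every occurrence of an ambiguous word such as "back" gets the same tag; modeling tag sequences is what lets a tagger recover the other readings.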
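The lecture frames supervised HMM tagging as finding the most likely tag sequence for a word sequence w, with all parameters collected in λ = {A, B, π}. A minimal Python sketch under stated assumptions: A (tag-to-tag transitions), B (tag-to-word emissions) and π (initial tag probabilities) are estimated by add-k smoothed relative-frequency counts from word/TAG sentences, and decoding uses the Viterbi algorithm in log space. The smoothing constant k, the single unknown-word slot, the function names, and the toy sentences are our own assumptions, not the lecture's method.

```python
import math
from collections import Counter

def parse_tagged(line):
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for tok in line.split():
        word, _, tag = tok.rpartition("/")  # split on the LAST '/', so './.' -> ('.', '.')
        pairs.append((word, tag))
    return pairs

def train_hmm(sentences, k=0.1):
    """Estimate lambda = {A, B, pi} as log-probabilities with add-k smoothing (assumed)."""
    tags = sorted({t for s in sentences for _, t in s})
    vocab = {w for s in sentences for w, _ in s}
    start, trans, out, emit, tag_count = Counter(), Counter(), Counter(), Counter(), Counter()
    for s in sentences:
        start[s[0][1]] += 1
        for i, (w, t) in enumerate(s):
            emit[(t, w)] += 1
            tag_count[t] += 1
            if i > 0:
                trans[(s[i - 1][1], t)] += 1
                out[s[i - 1][1]] += 1  # number of transitions leaving the previous tag
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot reserved for unseen words
    pi = {t: math.log((start[t] + k) / (len(sentences) + k * n_tags)) for t in tags}
    A = {(p, t): math.log((trans[(p, t)] + k) / (out[p] + k * n_tags))
         for p in tags for t in tags}

    def log_b(t, w):
        # B as a function: unseen words fall back to the smoothed zero count
        return math.log((emit[(t, w)] + k) / (tag_count[t] + k * n_words))

    return tags, pi, A, log_b

def viterbi(words, tags, pi, A, log_b):
    """Most likely tag sequence for a non-empty list of words."""
    # delta[t] = best log-probability of any tag path ending in tag t so far
    delta = {t: pi[t] + log_b(t, words[0]) for t in tags}
    backptrs = []
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: delta[p] + A[(p, t)])
            new_delta[t] = delta[prev] + A[(prev, t)] + log_b(t, w)
            ptr[t] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final tag
    path = [max(tags, key=delta.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical toy training data: the slide's example plus two sentences using "back"
train = [parse_tagged(s) for s in [
    "The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.",
    "The/DT back/JJ door/NN ./.",
    "They/PRP promised/VBD to/TO back/VB the/DT bill/NN ./.",
]]
tags, pi, A, log_b = train_hmm(train)
print(viterbi("The back door .".split(), tags, pi, A, log_b))
# -> ['DT', 'JJ', 'NN', '.']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over a long sentence; k only controls how much probability mass unseen transitions and emissions receive.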

Posted: 27/11/2022, 21:14