Lecture 3: Language Model Smoothing
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS6501 Natural Language Processing

This lecture
- Zipf's law
- Dealing with unseen words/n-grams
  - Add-one smoothing
  - Linear smoothing
  - Good-Turing smoothing
  - Absolute discounting
  - Kneser-Ney smoothing

Recap: Bigram language model
Training corpus:
  <s> I am Sam </s>
  <s> I am legend </s>
  <s> Sam I am </s>
Let P(<s>) = 1. Then:
  P(I | <s>) = 2/3      P(am | I) = 1
  P(Sam | am) = 1/3     P(</s> | Sam) = 1/2
P(<s> I am Sam </s>) = 1 * 2/3 * 1 * 1/3 * 1/2 = 1/9

More examples: Berkeley Restaurant Project sentences
- can you tell me about any good cantonese restaurants close by
- mid priced thai food is what i'm looking for
- tell me about chez panisse
- can you give me a listing of the kinds of food that are available
- i'm looking for a good place to eat breakfast
- when is caffe venezia open during the day

Raw bigram counts
Out of 9222 sentences
[Table: raw bigram counts for selected words from the Berkeley Restaurant Project corpus]

Raw bigram probabilities
Normalize each count by the unigram count of its history word.
[Table: resulting bigram probabilities]

Zeros
Training set:
  ... denied the allegations
  ... denied the reports
  ... denied the claims
  ... denied the request
Test set:
  ... denied the offer
  ... denied the loan
P("offer" | denied the) = 0

Smoothing
This dark art is why NLP is taught in the engineering school.
There are more principled smoothing methods, too. We'll look next at log-linear models, which are a good and popular general technique. But the traditional methods are easy to implement, run fast, and will give you intuitions about what you want from a smoothing method.
Credit: the following slides are adapted from Jason Eisner's NLP course.

What is smoothing?
[Figure: estimates from samples of size 20, 200, 2000, and 2,000,000]

ML 101: bias-variance tradeoff
Different samples of size 20 vary considerably, though on average they give the correct bell curve!
[Figure: four different samples of size 20]
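To make the recap concrete, here is a minimal Python sketch (not from the course materials; the corpus handling, function names, and the choice of vocabulary for smoothing are illustrative assumptions). It computes maximum-likelihood bigram estimates on the toy "I am Sam" corpus and contrasts them with add-one (Laplace) smoothed estimates, which give an unseen bigram a small nonzero probability instead of the zero seen in the "denied the offer" example.

```python
# Minimal sketch (not from the slides): maximum-likelihood bigram
# estimates on the toy "I am Sam" corpus, plus add-one (Laplace)
# smoothing from the lecture outline. The treatment of <s>/</s> in
# the vocabulary is an illustrative choice, not the course's.
from collections import Counter

corpus = [
    "<s> I am Sam </s>",
    "<s> I am legend </s>",
    "<s> Sam I am </s>",
]

history_counts = Counter()   # c(h): how often each word occurs as a history
bigram_counts = Counter()    # c(h, w): how often w follows h
for sentence in corpus:
    tokens = sentence.split()
    history_counts.update(tokens[:-1])                  # </s> is never a history
    bigram_counts.update(zip(tokens[:-1], tokens[1:]))

# Words that can follow a history (everything except <s>); V is used by add-one.
vocab = {w for s in corpus for w in s.split() if w != "<s>"}
V = len(vocab)

def p_mle(word, history):
    """Maximum-likelihood estimate P(word | history) = c(h, w) / c(h)."""
    return bigram_counts[(history, word)] / history_counts[history]

def p_add_one(word, history):
    """Add-one (Laplace) smoothed estimate (c(h, w) + 1) / (c(h) + V)."""
    return (bigram_counts[(history, word)] + 1) / (history_counts[history] + V)

def sentence_prob(sentence, p=p_mle):
    """Probability of a sentence under the bigram model."""
    tokens = sentence.split()
    prob = 1.0
    for h, w in zip(tokens[:-1], tokens[1:]):
        prob *= p(w, h)
    return prob

print(p_mle("I", "<s>"))                    # 2/3
print(p_mle("am", "I"))                     # 1.0
print(p_mle("Sam", "am"))                   # 1/3
print(p_mle("</s>", "Sam"))                 # 1/2
print(sentence_prob("<s> I am Sam </s>"))   # 2/3 * 1 * 1/3 * 1/2 = 1/9
print(p_mle("legend", "Sam"))               # 0.0 -- the zeros problem
print(p_add_one("legend", "Sam"))           # small but nonzero
```

Add-one is the simplest of the methods listed in the outline; the alternatives that follow (Good-Turing, absolute discounting, Kneser-Ney) exist because redistributing a full count of 1 to every unseen bigram moves too much probability mass away from observed events.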