Lecture 17: Statistical Parsing with PCFG
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16

Reading list
• Look at Mike Collins' notes on PCFGs and lexicalized PCFGs:
  http://www.cs.columbia.edu/~mcollins/

Phrase structure (constituency) trees
• Can be modeled by context-free grammars.

CKY algorithm (a Python sketch appears at the end of these notes)

  for J := 1 to n
    add to [J-1, J] all categories for the Jth word
  for width := 2 to n
    for start := 0 to n - width            // this is I
      define end := start + width          // this is J
      for mid := start+1 to end-1          // find all I-to-J phrases
        for every rule X → Y Z in the grammar
          if Y in [start, mid] and Z in [mid, end]
            then add X to [start, end]

Weighted CKY: Viterbi algorithm
Assume the weights are log probabilities of rules, so every chart cell starts at -∞ and we maximize. (A Python sketch appears at the end of these notes.)

  initialize all entries of chart to -∞
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] := max(chart[X, i-1, i], weight(R))
  for width := 2 to n
    for start := 0 to n - width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] := max(chart[X, start, end],
                                      weight(R) + chart[Y, start, mid] + chart[Z, mid, end])
  return chart[ROOT, 0, n]

Slides are modified from Jason Eisner's NLP course.

Likelihood of a parse tree
• Why? Under a PCFG, the probability of a tree is the product of the probabilities of the rules used to build it; the next slides justify this with the chain rule plus independence assumptions.

Probabilistic trees
• Just like language models or HMMs for POS tagging, we make independence assumptions!
• Example parse of "time flies like an arrow":

  (S (NP time)
     (VP (VP flies)
         (PP (P like)
             (NP (Det an) (N arrow)))))

Chain rule: one word at a time
p(time flies like an arrow)
  = p(time) * p(flies | time) * p(like | time flies)
    * p(an | time flies like) * p(arrow | time flies like an)

Chain rule + independence assumptions (to get a trigram model)
Each word now depends on at most the previous two words:
p(time flies like an arrow)
  ≈ p(time) * p(flies | time) * p(like | time flies)
    * p(an | flies like) * p(arrow | like an)

Chain rule – written differently
p(time flies like an arrow)
  = p(time) * p(time flies | time) * p(time flies like | time flies)
    * p(time flies like an | time flies like)
    * p(time flies like an arrow | time flies like an)
Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)
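To make the CKY pseudocode above concrete, here is a minimal Python sketch of the recognizer. The grammar encoding (a dict of lexical categories per word and a dict mapping child pairs to parent categories) and the toy grammar are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

def cky_recognize(words, lexical, binary):
    """CKY recognition: chart[start, end] holds every category that can
    derive words[start:end]. 'lexical' maps a word to its categories;
    'binary' maps a child pair (Y, Z) to the parents X with rule X -> Y Z."""
    n = len(words)
    chart = defaultdict(set)
    # Width-1 spans: add all categories for each word.
    for j in range(1, n + 1):
        chart[j - 1, j] |= lexical.get(words[j - 1], set())
    # Wider spans: combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for (Y, Z), parents in binary.items():
                    if Y in chart[start, mid] and Z in chart[mid, end]:
                        chart[start, end] |= parents
    return chart

# Toy grammar for "time flies like an arrow" (illustrative only).
lexical = {"time": {"NP", "N"}, "flies": {"VP", "N"}, "like": {"P", "V"},
           "an": {"Det"}, "arrow": {"N"}}
binary = {("NP", "VP"): {"S"}, ("VP", "PP"): {"VP"},
          ("P", "NP"): {"PP"}, ("Det", "N"): {"NP"}}

words = "time flies like an arrow".split()
chart = cky_recognize(words, lexical, binary)
print("S" in chart[0, len(words)])  # True: an S covers the whole sentence
```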
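In the same spirit, a sketch of the weighted CKY (Viterbi) variant, keeping the slide's convention that rule weights are log probabilities and that larger is better. The grammar encoding and the probabilities below are illustrative assumptions.

```python
import math
from collections import defaultdict

def viterbi_cky(words, lexical, binary, root="S"):
    """Weighted CKY from the slide: chart[X, i, j] is the best (maximum)
    log probability of any X spanning words[i:j]. Weights are log
    probabilities, so cells start at -inf and we maximize."""
    n = len(words)
    chart = defaultdict(lambda: -math.inf)
    for i in range(1, n + 1):
        for X, logp in lexical.get(words[i - 1], {}).items():  # rules X -> word[i]
            chart[X, i - 1, i] = max(chart[X, i - 1, i], logp)
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for (X, Y, Z), logp in binary.items():  # rules X -> Y Z
                    score = logp + chart[Y, start, mid] + chart[Z, mid, end]
                    chart[X, start, end] = max(chart[X, start, end], score)
    return chart[root, 0, n]

# Toy PCFG with made-up probabilities (illustrative only).
lexical = {"time": {"NP": math.log(0.1)}, "flies": {"VP": math.log(0.2)},
           "like": {"P": math.log(0.9)}, "an": {"Det": math.log(0.8)},
           "arrow": {"N": math.log(0.5)}}
binary = {("S", "NP", "VP"): math.log(0.9), ("VP", "VP", "PP"): math.log(0.3),
          ("PP", "P", "NP"): math.log(0.6), ("NP", "Det", "N"): math.log(0.4)}
print(viterbi_cky("time flies like an arrow".split(), lexical, binary))
# Log probability of the best parse (−inf would mean no parse exists).
```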
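The "likelihood of a parse tree" slide states that a tree's probability is the product of the probabilities of its rules. A small sketch of that computation, using a hypothetical nested-tuple tree encoding and the same made-up probabilities as above:

```python
import math

def tree_log_prob(tree, rule_logprob):
    """Log probability of a parse tree under a PCFG: the sum of the log
    probabilities of every rule used in the tree. A tree is either
    (X, word) for a lexical rule or (X, left_subtree, right_subtree)."""
    if len(tree) == 2 and isinstance(tree[1], str):
        X, word = tree
        return rule_logprob[(X, word)]       # lexical rule X -> word
    X, left, right = tree
    return (rule_logprob[(X, left[0], right[0])]  # binary rule X -> Y Z
            + tree_log_prob(left, rule_logprob)
            + tree_log_prob(right, rule_logprob))

# The tree from the "probabilistic trees" slide.
tree = ("S", ("NP", "time"),
             ("VP", ("VP", "flies"),
                    ("PP", ("P", "like"),
                           ("NP", ("Det", "an"), ("N", "arrow")))))
rule_logprob = {("S", "NP", "VP"): math.log(0.9),
                ("VP", "VP", "PP"): math.log(0.3),
                ("PP", "P", "NP"): math.log(0.6),
                ("NP", "Det", "N"): math.log(0.4),
                ("NP", "time"): math.log(0.1), ("VP", "flies"): math.log(0.2),
                ("P", "like"): math.log(0.9), ("Det", "an"): math.log(0.8),
                ("N", "arrow"): math.log(0.5)}
print(math.exp(tree_log_prob(tree, rule_logprob)))  # product of rule probs
```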
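Finally, the chain-rule and trigram slides factor the sentence probability word by word. A sketch of the trigram factorization, where `logp(word, context)` is a hypothetical log-probability lookup, not an API from the lecture:

```python
import math

def sentence_logprob_trigram(words, logp):
    """Trigram factorization from the slides: each word is conditioned
    on at most the previous two words."""
    total = 0.0
    for i, w in enumerate(words):
        context = tuple(words[max(0, i - 2):i])  # previous two words at most
        total += logp(w, context)
    return total

# Toy model: every word gets probability 1/1000 regardless of context.
uniform = lambda w, ctx: math.log(1 / 1000)
print(sentence_logprob_trigram("time flies like an arrow".split(), uniform))
# 5 * log(1/1000), one factor per word as on the slide
```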