Natural Language Processing (6.864), Regina Barzilay, MIT OpenCourseWare
6.864: Lecture 21 (November 29th, 2005)
Global Linear Models: Part II

Overview
• Log-linear models for parameter estimation
• Global and local features
  – The perceptron revisited
  – Log-linear models revisited

Three Components of Global Linear Models
• Φ is a function that maps a structure (x, y) to a feature vector Φ(x, y) ∈ R^d
• GEN is a function that maps an input x to a set of candidates GEN(x)
• W is a parameter vector (also a member of R^d)
• Training data is used to set the value of W

Putting it all Together
• X is the set of sentences, Y is the set of possible outputs (e.g., trees)
• We need to learn a function F : X → Y
• GEN, Φ, and W define
    F(x) = argmax_{y ∈ GEN(x)} Φ(x, y) · W
  i.e., choose the highest-scoring candidate as the most plausible structure
• Given examples (x_i, y_i), how do we set W?

[Worked example: GEN maps the sentence "She announced a program to promote safety in trucks and vans" to a set of candidate parse trees; Φ maps each candidate tree to a feature vector, e.g. ⟨1, 1, 3, 5⟩, ⟨2, 0, 0, 5⟩, ⟨1, 0, 1, 5⟩, ⟨0, 0, 3, 0⟩, ⟨0, 1, 0, 5⟩, ⟨0, 0, 1, 5⟩; the inner products Φ · W score the candidates 13.6, 12.2, 12.1, 3.3, 11.1, 9.4; the argmax selects the highest-scoring tree (score 13.6).]

A Variant of the Perceptron Algorithm
Inputs: Training set (x_i, y_i) for
i = 1 … n
Initialization: W = 0
Define: F(x) = argmax_{y ∈ GEN(x)} Φ(x, y) · W
Algorithm: For t = 1 … T, i = 1 … n:
    z_i = F(x_i)
    If z_i ≠ y_i then W = W + Φ(x_i, y_i) − Φ(x_i, z_i)
Output: Parameters W

Overview
• Recap: global linear models
• Log-linear models for parameter estimation
• Global and local features
  – The perceptron revisited
  – Log-linear models revisited

Back to Maximum Likelihood Estimation [Johnson et al., 1999]
• We can use the parameters to define a probability for each parse:
    P(y | x, W) = e^{Φ(x,y) · W} / Σ_{y′ ∈ GEN(x)} e^{Φ(x,y′) · W}
• The log-likelihood is then
    L(W) = Σ_i log P(y_i | x_i, W)
• A first estimation method: take maximum-likelihood estimates, i.e.,
    W_ML = argmax_W L(W)

Adding Gaussian Priors [Johnson et al., 1999]
• A first estimation method: take maximum-likelihood estimates, i.e., W_ML = argmax_W L(W)
• Unfortunately, this is very likely to "overfit"; we could use feature-selection methods, as in boosting
• Another way of preventing overfitting: choose the parameters as
    W_MAP = argmax_W ( L(W) − C Σ_k W_k² )
  for some constant C
• Intuition: this adds a penalty for large parameter values

The Bayesian Justification for Gaussian Priors
• In Bayesian methods, we combine the likelihood P(data | W) with a prior over the parameters, P(W):
    P(W | data) = P(data | W) P(W) / ∫_W P(data | W) P(W) dW
• The MAP (Maximum A-Posteriori) estimates are
    W_MAP = argmax_W P(W | data)
          = argmax_W ( log P(data | W) + log P(W) )
  where the first term is the log-likelihood and the second is the prior
• Gaussian prior: P(W) ∝ e^{−C Σ_k W_k²}, so log P(W) = −C Σ_k W_k² + C₂
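The perceptron variant on the earlier slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: `gen`, `phi`, and the training data are hypothetical stand-ins that a real system would replace with a candidate generator (e.g., a parser producing trees) and a structural feature map.

```python
def train_perceptron(data, gen, phi, d, T=5):
    """Perceptron variant from the slides (illustrative sketch).

    data: list of (x, y) training pairs
    gen:  gen(x) returns the candidate set GEN(x)
    phi:  phi(x, y) returns a length-d feature vector (a list of floats)
    d:    feature dimension; T: number of passes over the data
    """
    W = [0.0] * d

    def F(x):
        # Decode: F(x) = argmax over y in GEN(x) of phi(x, y) . W
        return max(gen(x), key=lambda y: sum(w * f for w, f in zip(W, phi(x, y))))

    for t in range(T):
        for x, y in data:
            z = F(x)
            if z != y:
                # Additive update: W = W + phi(x, y) - phi(x, z),
                # promoting the gold structure's features and demoting
                # the features of the wrongly predicted one.
                fy, fz = phi(x, y), phi(x, z)
                W = [w + a - b for w, a, b in zip(W, fy, fz)]
    return W, F
```

Note that decoding reuses the same scoring function as training, so any improvement to Φ or GEN changes both consistently.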
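The log-linear probability P(y | x, W) and the penalized objective behind W_MAP can likewise be sketched numerically. This is an assumption-laden illustration (function names and the representation of candidates as pre-computed feature vectors are my own); the formulas match the slides, with a standard max-subtraction trick added for numerical stability.

```python
import math

def log_linear_probs(cands, W):
    """P(y | x, W) = exp(phi(x,y).W) / sum over y' in GEN(x) of exp(phi(x,y').W).

    cands: list of feature vectors phi(x, y), one per candidate in GEN(x).
    Returns the probability of each candidate.
    """
    scores = [sum(w * f for w, f in zip(W, phi)) for phi in cands]
    m = max(scores)                        # subtract the max before exponentiating
    exps = [math.exp(s - m) for s in scores]
    Z = sum(exps)                          # the normalizer (partition function)
    return [e / Z for e in exps]

def penalized_log_likelihood(examples, W, C):
    """L(W) - C * sum_k W_k^2, the objective maximized by W_MAP.

    examples: list of (cands, gold_index) pairs, where gold_index
    picks out the correct candidate y_i within GEN(x_i).
    """
    L = 0.0
    for cands, gold in examples:
        L += math.log(log_linear_probs(cands, W)[gold])
    return L - C * sum(w * w for w in W)
```

Setting C = 0 recovers the plain maximum-likelihood objective L(W); larger C penalizes large parameter values more heavily, which is exactly the Gaussian-prior intuition above.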