
Natural language processing, Regina Barzilay, ocw.mit.edu


DOCUMENT INFORMATION

Basic information

Format
Number of pages: 39
File size: 394.33 KB

Content


6.864: Lecture 21 (November 29th, 2005)
Global Linear Models: Part II

Overview
• Log-linear models for parameter estimation
• Global and local features
  – The perceptron revisited
  – Log-linear models revisited

Three Components of Global Linear Models
• Φ is a function that maps a structure (x, y) to a feature vector Φ(x, y) ∈ R^d
• GEN is a function that maps an input x to a set of candidates GEN(x)
• W is a parameter vector (also a member of R^d)
• Training data is used to set the value of W

Putting it all Together
• X is the set of sentences, Y is the set of possible outputs (e.g. trees)
• Need to learn a function F : X → Y
• GEN, Φ, W define

    F(x) = argmax_{y ∈ GEN(x)} Φ(x, y) · W

  Choose the highest-scoring candidate as the most plausible structure
• Given examples (x_i, y_i), how to set W?

[Figure: for the sentence "She announced a program to promote safety in trucks and vans", GEN(x) produces a set of candidate parse trees; Φ maps each candidate to a feature vector (⟨1, 1, 3, 5⟩, ⟨2, 0, 0, 5⟩, ⟨1, 0, 1, 5⟩, ⟨0, 0, 3, 0⟩, ⟨0, 1, 0, 5⟩, ⟨0, 0, 1, 5⟩); the inner products Φ · W are 13.6, 12.2, 12.1, 3.3, 11.1, 9.4; and argmax selects the highest-scoring tree.]

A Variant of the Perceptron Algorithm
Inputs: Training set (x_i, y_i) for i = 1 … n
Initialization: W = 0
Define: F(x) = argmax_{y ∈ GEN(x)} Φ(x, y) · W
Algorithm:
  For t = 1 … T, i = 1 … n:
    z_i = F(x_i)
    If z_i ≠ y_i then W = W + Φ(x_i, y_i) − Φ(x_i, z_i)
Output: Parameters W

Overview
• Recap: global linear models
• Log-linear models for parameter estimation
• Global and local features
  – The perceptron revisited
  – Log-linear models revisited
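As a concrete illustration of the perceptron variant above, here is a minimal Python sketch. The candidate generator gen, the feature map phi, and the training pairs are assumed placeholders supplied by the caller; they are not part of the original lecture.

```python
import numpy as np

def perceptron(train, gen, phi, d, T=10):
    """Perceptron variant for global linear models (sketch).

    train : list of (x, y) pairs, where y is the gold structure for x
    gen   : function mapping an input x to the candidate list GEN(x)
    phi   : function mapping (x, y) to a d-dimensional numpy feature vector
    d     : dimensionality of the feature vectors
    T     : number of passes over the training data
    """
    W = np.zeros(d)                          # Initialization: W = 0

    def F(x):
        # F(x) = argmax over y in GEN(x) of phi(x, y) . W
        candidates = gen(x)
        scores = [np.dot(phi(x, y), W) for y in candidates]
        return candidates[int(np.argmax(scores))]

    for t in range(T):                       # For t = 1 ... T
        for x_i, y_i in train:               #   For i = 1 ... n
            z_i = F(x_i)                     #   z_i = F(x_i)
            if z_i != y_i:                   #   If z_i != y_i, update W
                W = W + phi(x_i, y_i) - phi(x_i, z_i)
    return W                                 # Output: parameters W
```

Each update moves W toward the feature vector of the gold structure and away from that of the current top-scoring (incorrect) candidate, mirroring the update rule on the slide.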
Back to Maximum Likelihood Estimation [Johnson et al., 1999]
• We can use the parameters to define a probability for each parse:

    P(y | x, W) = e^{Φ(x,y) · W} / Σ_{y' ∈ GEN(x)} e^{Φ(x,y') · W}

• The log-likelihood is then

    L(W) = Σ_i log P(y_i | x_i, W)

• A first estimation method: take maximum likelihood estimates, i.e.,

    W_ML = argmax_W L(W)

Adding Gaussian Priors [Johnson et al., 1999]
• A first estimation method: take maximum likelihood estimates, i.e., W_ML = argmax_W L(W)
• Unfortunately, this is very likely to "overfit": one could use feature selection methods, as in boosting
• Another way of preventing overfitting: choose the parameters as

    W_MAP = argmax_W [ L(W) − C Σ_k W_k² ]

  for some constant C
• Intuition: this adds a penalty for large parameter values

The Bayesian Justification for Gaussian Priors
• In Bayesian methods, combine the likelihood P(data | W) with a prior over the parameters, P(W):

    P(W | data) = P(data | W) P(W) / ∫ P(data | W) P(W) dW

• The MAP (Maximum A-Posteriori) estimates are

    W_MAP = argmax_W P(W | data)
          = argmax_W [ log P(data | W) + log P(W) ]

  where the first term is the log-likelihood and the second is the (log) prior
• Gaussian prior: P(W) ∝ e^{−C Σ_k W_k²}, so that

    log P(W) = −C Σ_k W_k² + C_2
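To make the estimation criterion concrete, the sketch below (again with hypothetical gen/phi placeholders, not from the lecture) computes the regularized objective L(W) − C Σ_k W_k² for a global log-linear model together with its gradient, which is the standard "empirical features minus expected features" term plus the derivative −2CW of the Gaussian-prior penalty; the pair could be handed to any gradient-based optimizer.

```python
import numpy as np

def log_linear_objective(W, train, gen, phi, C):
    """Regularized log-likelihood L(W) - C * sum_k W_k^2 and its gradient (sketch).

    Uses P(y | x, W) = exp(phi(x, y) . W) / sum_{y' in GEN(x)} exp(phi(x, y') . W).
    """
    obj = -C * np.sum(W ** 2)                # Gaussian-prior penalty
    grad = -2.0 * C * W                      # gradient of the penalty

    for x, y in train:
        candidates = gen(x)
        feats = np.array([phi(x, y2) for y2 in candidates])  # one row per candidate
        scores = feats @ W
        m = scores.max()                     # stabilize the log-sum-exp
        logZ = m + np.log(np.sum(np.exp(scores - m)))
        probs = np.exp(scores - logZ)        # P(y' | x, W) for every y' in GEN(x)

        gold = phi(x, y)
        expected = probs @ feats             # expected feature vector under the model

        obj += gold @ W - logZ               # log P(y | x, W)
        grad += gold - expected              # empirical minus expected features

    return obj, grad
```

Maximizing this objective (for example, by running L-BFGS on its negation) gives W_MAP; setting C = 0 recovers the maximum-likelihood estimate W_ML.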
