6.864: Natural Language Processing (Regina Barzilay, MIT OpenCourseWare)
Lecture 6 (September 27th, 2005): The EM Algorithm, Part II
Hidden Markov Models

A hidden Markov model $(N, \Sigma, \Theta)$ consists of the following elements:

• $N$ is a positive integer specifying the number of states in the model. Without loss of generality, we will take the $N$'th state to be a special state, the final or stop state.

• $\Sigma$ is a set of output symbols, for example $\Sigma = \{a, b\}$.

• $\Theta$ is a vector of parameters. It contains three types of parameters:

– $\pi_j$ for $j = 1 \ldots N$ is the probability of choosing state $j$ as an initial state.

– $a_{j,k}$ for $j = 1 \ldots (N-1)$, $k = 1 \ldots N$, is the probability of transitioning from state $j$ to state $k$.

– $b_j(o)$ for $j = 1 \ldots (N-1)$ and $o \in \Sigma$, is the probability of emitting symbol $o$ from state $j$.

Thus it can be seen that $\Theta$ is a vector of $N + (N-1)N + (N-1)|\Sigma|$ parameters.

• Note that we have the following constraints:

– $\sum_{j=1}^{N} \pi_j = 1$

– for all $j$, $\sum_{k=1}^{N} a_{j,k} = 1$

– for all $j$, $\sum_{o \in \Sigma} b_j(o) = 1$

• An HMM specifies a probability for each possible $(x, y)$ pair, where $x$ is a sequence of symbols drawn from $\Sigma$, and $y$ is a sequence of states drawn from the integers $1 \ldots (N-1)$. The sequences $x$ and $y$ are restricted to have the same length.

• E.g., say we have an HMM with $N = 3$, $\Sigma = \{a, b\}$, and with some choice of the parameters $\Theta$. Take $x = \langle a, a, b, b \rangle$ and $y = \langle 1, 2, 2, 1 \rangle$. Then in this case,

$$P(x, y \mid \Theta) = \pi_1 \, a_{1,2} \, a_{2,2} \, a_{2,1} \, a_{1,3} \, b_1(a) \, b_2(a) \, b_2(b) \, b_1(b)$$

• In general, if we have the sequence $x = x_1, x_2, \ldots, x_n$ where each $x_j \in \Sigma$, and the sequence $y = y_1, y_2, \ldots, y_n$ where each $y_j \in \{1 \ldots (N-1)\}$, then

$$P(x, y \mid \Theta) = \pi_{y_1} \, a_{y_n, N} \prod_{j=2}^{n} a_{y_{j-1}, y_j} \prod_{j=1}^{n} b_{y_j}(x_j)$$
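To make the joint probability concrete, here is a minimal Python sketch that evaluates $P(x, y \mid \Theta)$ by direct multiplication for the $N = 3$, $\Sigma = \{a, b\}$ example above. The parameter values, and the names `joint_probability`, `pi`, `a`, and `b`, are illustrative assumptions, not from the lecture.

```python
# A minimal sketch (not from the lecture): all parameter values are made up.
N = 3
pi = [0.5, 0.5, 0.0]        # pi_j for j = 1..N; pi_3 = 0, so we never start in the stop state
a = [[0.2, 0.5, 0.3],       # a_{j,k} for j = 1..N-1, k = 1..N; each row sums to 1
     [0.4, 0.4, 0.2]]
b = [{'a': 0.7, 'b': 0.3},  # b_j(o) for j = 1..N-1, o in Sigma; each dict sums to 1
     {'a': 0.1, 'b': 0.9}]

def joint_probability(x, y):
    """P(x, y | Theta) = pi_{y_1} a_{y_n,N} prod_{j=2..n} a_{y_{j-1},y_j} prod_{j=1..n} b_{y_j}(x_j)."""
    p = pi[y[0] - 1] * a[y[-1] - 1][N - 1]   # initial state, and final transition to the stop state
    for j in range(1, len(y)):
        p *= a[y[j - 1] - 1][y[j] - 1]       # transition y_{j-1} -> y_j
    for o, s in zip(x, y):
        p *= b[s - 1][o]                     # emission b_{y_j}(x_j)
    return p

# The worked example from above: x = <a, a, b, b>, y = <1, 2, 2, 1>
print(joint_probability(['a', 'a', 'b', 'b'], [1, 2, 2, 1]))
```

Running the last line multiplies out exactly the nine factors $\pi_1 \, a_{1,2} \, a_{2,2} \, a_{2,1} \, a_{1,3} \, b_1(a) \, b_2(a) \, b_2(b) \, b_1(b)$ from the worked example.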
EM: the Basic Set-up

• We have some data points (a "sample") $x^1, x^2, \ldots, x^m$.

• For example, each $x^i$ might be a sentence such as "the dog slept": this will be the case in EM applied to hidden Markov models (HMMs) or probabilistic context-free grammars (PCFGs). (Note that in this case each $x^i$ is a sequence, which we will sometimes write $x^i_1, x^i_2, \ldots, x^i_{n_i}$ where $n_i$ is the length of the sequence.)

• Or, in the three coins example (see the lecture notes), each $x^i$ might be a sequence of three coin tosses, such as HHH, THT, or TTT.

• We have a parameter vector $\Theta$. For example, see the description of HMMs in the previous section. As another example, in a PCFG, $\Theta$ would contain the probability $P(\alpha \rightarrow \beta \mid \alpha)$ for every rule expansion $\alpha \rightarrow \beta$ in the context-free grammar within the PCFG.

• We have a model $P(x, y \mid \Theta)$: a function that for any $(x, y, \Theta)$ triple returns a probability, namely the probability of seeing $x$ and $y$ together given parameter settings $\Theta$.

• This model defines a joint distribution over $x$ and $y$, but note that we can also derive a marginal distribution over $x$ alone, defined as

$$P(x \mid \Theta) = \sum_y P(x, y \mid \Theta)$$

• Given the sample $x^1, x^2, \ldots, x^m$, we define the likelihood as

$$L'(\Theta) = \prod_{i=1}^{m} P(x^i \mid \Theta) = \prod_{i=1}^{m} \sum_y P(x^i, y \mid \Theta)$$

and we define the log-likelihood as

$$L(\Theta) = \log L'(\Theta) = \sum_{i=1}^{m} \log P(x^i \mid \Theta) = \sum_{i=1}^{m} \log \sum_y P(x^i, y \mid \Theta)$$

(Both the marginal and the log-likelihood are sketched in code at the end of this section.)

• The maximum-likelihood estimation problem is to find

$$\Theta_{ML} = \arg\max_{\Theta \in \Omega} L(\Theta)$$

where $\Omega$ is a parameter space specifying the set of allowable parameter settings. In the HMM example, $\Omega$ would enforce the restrictions $\sum_{j=1}^{N} \pi_j = 1$; for all $j = 1 \ldots (N-1)$, $\sum_{k=1}^{N} a_{j,k} = 1$; and for all $j = 1 \ldots (N-1)$, $\sum_{o \in \Sigma} b_j(o) = 1$.
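As a companion to the definitions above, here is a brute-force Python sketch of the marginal $P(x \mid \Theta)$ and the log-likelihood $L(\Theta)$; it reuses `joint_probability` and the toy parameters from the earlier sketch, and the function names are my own. Enumerating all $(N-1)^n$ state sequences is only feasible for tiny examples (in practice the forward algorithm computes the marginal in $O(nN^2)$ time), but the enumeration mirrors $P(x \mid \Theta) = \sum_y P(x, y \mid \Theta)$ exactly.

```python
import itertools
import math

def marginal_probability(x):
    """P(x | Theta) = sum over all state sequences y of P(x, y | Theta).

    Brute-force enumeration over the (N-1)^n sequences y in {1..N-1}^n;
    exponential in n, so only for illustration on tiny examples.
    """
    return sum(joint_probability(x, list(y))
               for y in itertools.product(range(1, N), repeat=len(x)))

def log_likelihood(samples):
    """L(Theta) = sum_{i=1..m} log P(x^i | Theta) for a sample x^1 .. x^m."""
    return sum(math.log(marginal_probability(x)) for x in samples)

# A toy two-element sample under the made-up parameters from the sketch above.
sample = [['a', 'a', 'b', 'b'], ['b', 'a']]
print(log_likelihood(sample))
```

Note that the summation over $y$ happens inside the logarithm, one term of $L(\Theta)$ per sample point; this coupling of a log with a sum over hidden variables is exactly what makes direct maximization of $L(\Theta)$ hard, and is the motivation for the EM algorithm.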