Lecture 7: Word Embeddings
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS 6501 Natural Language Processing

This lecture
- Learning word vectors (cont.)
- Representation learning in NLP

Recap: Latent Semantic Analysis
- Data representation
  - Encode single-relational data in a matrix
    - Co-occurrence (e.g., from a general corpus)
    - Synonyms (e.g., from a thesaurus)
- Factorization
  - Apply SVD to the matrix to find latent components
- Measuring degree of relation
  - Cosine of latent vectors

Recap: Mapping to Latent Space via SVD
- $C \approx U \Sigma V^\top$, where $C$ is $d \times n$, $U$ is $d \times k$, $\Sigma$ is $k \times k$, and $V^\top$ is $k \times n$
- SVD generalizes the original data
  - Uncovers relationships not explicit in the thesaurus
  - Term vectors are projected to the $k$-dimensional latent space
- Word similarity: cosine of two column vectors of $\Sigma V^\top$

Low rank approximation
- Frobenius norm: for an $m \times n$ matrix $C$,
  $\|C\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |c_{ij}|^2}$
- Rank of a matrix: how many vectors (rows or columns) in the matrix are linearly independent of each other

Low rank approximation
- Low rank approximation problem:
  $\min_{X} \|C - X\|_F \quad \text{s.t. } \operatorname{rank}(X) = k$
- If we can only use $k$ independent vectors to describe the points in the space, what are the best choices? Essentially, we minimize the "reconstruction loss" under a low rank constraint.

Low rank approximation
- Assume the rank of $C$ is $r$
- SVD: $C = U \Sigma V^\top$, with $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0)$ ($r$ non-zero singular values)
- Zero out the $r - k$ trailing values: $\Sigma' = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$
- $C' = U \Sigma' V^\top$ is the best rank-$k$ approximation:
  $C' = \arg\min_{X} \|C - X\|_F \quad \text{s.t. } \operatorname{rank}(X) = k$
  (see the first sketch after these notes)

Word2Vec
- LSA: a compact representation of the co-occurrence matrix
- Word2Vec: predict surrounding words (skip-gram)
  - Similar to using co-occurrence counts (Levy & Goldberg 2014; Pennington et al. 2014)
  - Easy to incorporate new words or sentences

Word2Vec
- Similar to a language model, but predicting the next word is not the goal
- Idea: words that are semantically similar often occur near each other in text
- Embeddings that are good at predicting neighboring words are also good at representing similarity

Example
- Assume the vocabulary set is $W$. We have one center word $c$ and one context word $o$.
- What is the conditional probability $p(o \mid c)$?
  $p(o \mid c) = \dfrac{\exp(u_o \cdot v_c)}{\sum_{w \in W} \exp(u_w \cdot v_c)}$
- What is the gradient of the log likelihood w.r.t. $v_c$?
  $\dfrac{\partial \log p(o \mid c)}{\partial v_c} = u_o - E_{w \sim p(w \mid c)}[u_w]$
  (see the second sketch after these notes)
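The following is a minimal NumPy sketch of the low-rank approximation slides above: it computes the SVD of a small matrix, zeroes out the trailing singular values to obtain the best rank-$k$ approximation $C' = U \Sigma' V^\top$, and measures similarity as the cosine of two column vectors of $\Sigma V^\top$. The toy matrix, the value of $k$, and the columns being compared are illustrative assumptions, not data from the lecture.

```python
# Truncated SVD as the best rank-k approximation (sketch; toy values assumed).
import numpy as np

# Toy term-context matrix: rows = terms, columns = contexts.
C = np.array([[2., 0., 1., 0.],
              [1., 3., 0., 1.],
              [0., 1., 4., 2.],
              [1., 0., 2., 3.]])

k = 2  # number of latent dimensions to keep

# Full SVD: C = U @ diag(s) @ Vt, singular values s in decreasing order.
U, s, Vt = np.linalg.svd(C, full_matrices=False)

# Zero out the r - k trailing singular values and rebuild the rank-k approximation.
s_k = np.copy(s)
s_k[k:] = 0.0
C_k = U @ np.diag(s_k) @ Vt

# ||C - C_k||_F is the smallest Frobenius error achievable by any rank-k matrix.
print("Frobenius reconstruction error:", np.linalg.norm(C - C_k, ord="fro"))

# LSA-style similarity: cosine between two column vectors of Sigma @ V^T.
latent = np.diag(s[:k]) @ Vt[:k, :]          # k x n matrix of latent context vectors
a, b = latent[:, 0], latent[:, 1]
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("cosine similarity of contexts 0 and 1:", cosine)
```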
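Below is a second minimal sketch, for the "Example" slide: the softmax probability $p(o \mid c)$ and the gradient of $\log p(o \mid c)$ with respect to $v_c$. The vocabulary size, embedding dimension, random vectors, and word indices are toy assumptions for illustration.

```python
# Skip-gram conditional probability and gradient (sketch; toy values assumed).
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 5                       # vocabulary size |W| and embedding dimension
u = rng.normal(size=(V, d))       # context ("output") vectors u_w, one per word
v = rng.normal(size=(V, d))       # center ("input") vectors v_c, one per word

c, o = 3, 6                       # indices of the center word c and context word o

# p(o | c) = exp(u_o . v_c) / sum_{w in W} exp(u_w . v_c)  (softmax over the vocabulary)
scores = u @ v[c]                 # u_w . v_c for every w in W
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print("p(o | c) =", probs[o])

# Gradient of log p(o | c) w.r.t. v_c:  u_o - E_{w ~ p(w|c)}[u_w]
grad_vc = u[o] - probs @ u        # observed context vector minus expected context vector
print("gradient w.r.t. v_c:", grad_vc)
```

The gradient has the usual "observed minus expected" form: the context vector of the word actually seen, minus the model's expected context vector under $p(w \mid c)$.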