
Natural Language Processing, Kai-Wei Chang, www.cs.virginia.edu


Document information
Pages: 48
File size: 6.07 MB


Lecture 6: Vector Space Model
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
6501 Natural Language Processing
CuuDuongThanCong.com https://fb.com/tailieudientucntt

This lecture
- How to represent a word, a sentence, or a document?
- How to infer the relationship among words?
- We focus on “semantics”: distributional semantics
- What is the meaning of “life”?

How to represent a word
- Naïve way: represent words as atomic symbols: student, talk, university
  - N-gram language model, logical analysis
- Represent a word as a “one-hot” vector [0 0 … 1 … 0], with one dimension per vocabulary word (egg, student, talk, university, happy, buy)
- How large is this vector?
  - PTB data: ~50k, Google 1T data: 13M
- v ⋅ u = ?

Issues?
- Dimensionality is large; vector is sparse
- No similarity:
    v_happy = [0 0 0 … 1 … 0]
    v_sad   = [0 0 … 1 … 0]
    v_milk  = [1 0 0 … 0]
    (each vector has its single 1 at a different index)
    v_happy ⋅ v_sad = v_happy ⋅ v_milk = 0
- Cannot represent new words
- Any idea?

Idea 1: Taxonomy (Word category)

What is “car”?
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
[Synset('car.n.01')]
>>> motorcar = wn.synset('car.n.01')
>>> motorcar.hypernyms()
[Synset('motor_vehicle.n.01')]
>>> paths = motorcar.hypernym_paths()
>>> [synset.name() for synset in paths[0]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03', 'container.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
>>> [synset.name() for synset in paths[1]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03', 'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']

Word similarity?
>>> right = wn.synset('right_whale.n.01')
>>> minke = wn.synset('minke_whale.n.01')
>>> orca = wn.synset('orca.n.01')
>>> tortoise = wn.synset('tortoise.n.01')
>>> novel = wn.synset('novel.n.01')
>>> right.lowest_common_hypernyms(minke)
[Synset('baleen_whale.n.01')]
>>> right.lowest_common_hypernyms(orca)
[Synset('whale.n.02')]
>>> right.lowest_common_hypernyms(tortoise)
[Synset('vertebrate.n.01')]
>>> right.lowest_common_hypernyms(novel)
[Synset('entity.n.01')]
Requires human labor

Taxonomy (Word category)
- Synonym, hypernym (Is-A), hyponym

Idea 2: Similarity = Clustering
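The lowest-common-hypernym queries in the WordNet session can be mimicked on a tiny hand-built taxonomy. The sketch below is not NLTK's implementation: the `PARENT` table and the helpers `path_to_root` and `lowest_common_hypernym` are hypothetical names, and the tree is a simplified single-parent fragment that only loosely follows the synset names in the session.

```python
# Toy single-parent taxonomy (hypothetical, hand-written for illustration;
# real WordNet is larger and allows several hypernyms per synset).
PARENT = {
    "right_whale": "baleen_whale",
    "minke_whale": "baleen_whale",
    "baleen_whale": "whale",
    "orca": "whale",
    "whale": "vertebrate",
    "tortoise": "vertebrate",
    "vertebrate": "entity",
    "novel": "entity",
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_hypernym(a, b):
    """Return the deepest node that is an ancestor of both a and b.

    A node counts as its own ancestor. Returns None if nothing is shared.
    """
    ancestors_of_a = set(path_to_root(a))
    # Walk upward from b, so the first shared node is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None


for other in ["minke_whale", "orca", "tortoise", "novel"]:
    print(other, "->", lowest_common_hypernym("right_whale", other))
```

The deeper the shared ancestor, the closer the two words are in the taxonomy. Every entry in the table had to be written by hand, which is the human-labor cost noted in the slides.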
Cluster n-gram model
- Can be generated from unlabeled corpora
- Based on statistics, e.g., mutual information
Implementation of the Brown hierarchical word clustering algorithm...

... representations
- Word meanings are vectors of “basic concepts”
- What are the “basic concepts”?
- How to assign weights?
- How to define the similarity/distance?
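To make the similarity question concrete, here is a minimal sketch contrasting one-hot vectors with weighted “basic concept” vectors. It uses cosine similarity, which is one common choice of measure rather than something the slides prescribe. The toy vocabulary, the three concept dimensions, and all weights are made-up assumptions for illustration, not values from the lecture.

```python
import math

VOCAB = ["happy", "sad", "milk"]  # hypothetical toy vocabulary


def one_hot(word):
    """Vector with a single 1 at the word's vocabulary index."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


# Made-up weights over three hypothetical concepts:
# (emotion, positivity, food).
CONCEPTS = {
    "happy": [0.9, 0.8, 0.0],
    "sad": [0.9, 0.1, 0.0],
    "milk": [0.0, 0.3, 0.9],
}

# One-hot vectors of distinct words are always orthogonal.
print(dot(one_hot("happy"), one_hot("sad")))
print(dot(one_hot("happy"), one_hot("milk")))

# Dense concept vectors give graded similarity instead.
print(cosine(CONCEPTS["happy"], CONCEPTS["sad"]))
print(cosine(CONCEPTS["happy"], CONCEPTS["milk"]))
```

With these assumed weights, "happy" comes out closer to "sad" than to "milk", whereas the one-hot dot products are zero for every pair of distinct words.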

Posted: 27/11/2022, 21:14