Natural Language Processing, Kai-Wei Chang, www.cs.virginia.edu

DOCUMENT INFORMATION

Basic information

Format
Number of pages: 21
File size: 1.01 MB

Content

Lecture 9: Part of Speech
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS6501 Natural Language Processing
CuuDuongThanCong.com https://fb.com/tailieudientucntt

This lecture
- Parts of speech (POS)
- POS tagsets

Parts of Speech
- Traditional parts of speech: ~ of them

POS examples
- N (noun): chair, bandwidth, pacing
- V (verb): study, debate, munch
- ADJ (adjective): purple, tall, ridiculous
- ADV (adverb): unfortunately, slowly
- P (preposition): of, by, to
- PRO (pronoun): I, me, mine
- DET (determiner): the, a, that, those

Parts of Speech
- A.k.a. parts-of-speech, lexical categories, word classes, morphological classes, lexical tags
- There is a lot of debate within linguistics about the number, nature, and universality of these categories

POS Tagging
- The process of assigning a part-of-speech tag to each word in a collection (e.g., a sentence)

WORD  the  koala  put  the  keys  on  the  table
tag   DET  N      V    DET  N     P   DET  N

Why is POS Tagging Useful?
- First step of a vast number of practical tasks
- Parsing
  - Need to know if a word is an N or V before you can parse
- Information extraction
  - Finding names, relations, etc.
- Speech synthesis/recognition
  - OBject / obJECT
  - OVERflow / overFLOW
  - DIScount / disCOUNT
  - CONtent / conTENT
  - The stressed syllable (in capitals) depends on the word's part of speech
- Machine translation

Open and Closed Classes
- Closed class: a small, fixed membership
  - Prepositions: of, in, by, …
  - Pronouns: I, you, she, mine, his, them, …
  - Usually function words (short, common words that play a role in grammar)
- Open class: new members can be created
  - English has 4: nouns, verbs, adjectives, adverbs
  - Many languages have these 4, but not all!

Open Class Words
- Nouns
  - Proper nouns (Boulder, Granby, Eli Manning)
  - Common nouns (the rest)
  - Count nouns and mass nouns
    - Count: have plurals and get counted: goat/goats, one goat, two goats
    - Mass: don't get counted (snow, salt, communism) (*two snows)
- Verbs
  - In English, verbs have morphological affixes (eat/eats/eaten)

Closed Class Words
Examples:
- prepositions: on, under, over, …
- particles: up, down, on, off, …
- determiners: a, an, the, …
- pronouns: she, who, I, …
- conjunctions: and, but, or, …
- auxiliary verbs: can, may, should, …
- numerals: one, two, three, third, …

Prepositions from CELEX
- CELEX: online dictionary
- Frequency counts are from the COBUILD 16-billion-word corpus

English Particles

Conjunctions
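The POS Tagging slide defines tagging as assigning one tag to each word, illustrated with the koala sentence. A minimal Python sketch of that input/output contract follows; `TOY_LEXICON`, `tag_sentence`, and `format_tagged` are hypothetical names, and the lexicon is hand-written to cover only this one sentence. It is not a real tagging method, only a way to make the "one tag per token" output concrete.

```python
# Toy illustration of the tagging contract: exactly one tag per input token.
# TOY_LEXICON is hypothetical and hand-written for this one sentence; real
# taggers learn from annotated data and use surrounding context.
TOY_LEXICON = {
    "the": "DET",
    "koala": "N",
    "put": "V",
    "keys": "N",
    "on": "P",
    "table": "N",
}


def tag_sentence(words, lexicon, unknown="UNK"):
    """Return (word, tag) pairs; words missing from the lexicon get `unknown`."""
    return [(word, lexicon.get(word.lower(), unknown)) for word in words]


def format_tagged(pairs):
    """Render pairs in the word/TAG style used by the Penn Treebank example."""
    return " ".join(f"{word}/{tag}" for word, tag in pairs)


if __name__ == "__main__":
    sentence = "the koala put the keys on the table".split()
    print(format_tagged(tag_sentence(sentence, TOY_LEXICON)))
    # the/DET koala/N put/V the/DET keys/N on/P the/DET table/N
```

Because this lexicon maps each word type to a single tag, it would have to pick one tag for an ambiguous word such as "back"; that limitation is the reason real taggers look at context.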
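For speech synthesis, the slide's word pairs show that the same spelling is stressed differently depending on its part of speech. The sketch below assumes a synthesizer that keeps a pronunciation lexicon keyed by (word, tag) and consults it after tagging; `STRESS_LEXICON` and `stressed_form` are hypothetical, the capitalized strings copy the slide's notation rather than a real phonetic transcription, and the tag assignments (noun vs. verb for the first three pairs, noun vs. adjective for "content") are assumptions.

```python
# Hypothetical pronunciation lexicon keyed by (word, POS tag). Capital letters
# mark the stressed syllable, copying the slide's notation; a real synthesizer
# would store phonemes and stress marks instead.
STRESS_LEXICON = {
    ("object", "N"): "OBject",
    ("object", "V"): "obJECT",
    ("overflow", "N"): "OVERflow",
    ("overflow", "V"): "overFLOW",
    ("discount", "N"): "DIScount",
    ("discount", "V"): "disCOUNT",
    ("content", "N"): "CONtent",
    ("content", "ADJ"): "conTENT",
}


def stressed_form(word, tag):
    """Return the stress pattern for `word` read with POS `tag`.

    Raises KeyError when the (word, tag) pair is not in the toy lexicon.
    """
    return STRESS_LEXICON[(word.lower(), tag)]


if __name__ == "__main__":
    # Same spelling, different stress, once the POS has been decided.
    print(stressed_form("object", "N"), stressed_form("object", "V"))  # OBject obJECT
```

Keying on (word, tag) rather than on the word alone is the point: without a tag, the lookup has no way to choose between the two entries.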
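The closed-class slides list small, fixed sets of function words. Because those sets are small, a plain membership test can already flag function words, as this sketch shows. `CLOSED_CLASS`, `closed_class_categories`, and `split_function_words` are hypothetical names, and the sets contain only the examples from the slides (which end in "…"), so they are deliberately incomplete. Note that "on" appears as both a preposition and a particle, so membership alone does not fix the tag.

```python
# Closed-class examples copied from the slides; the slides' lists end with "…",
# so these sets are incomplete by design.
CLOSED_CLASS = {
    "preposition": {"of", "in", "by", "on", "under", "over"},
    "particle": {"up", "down", "on", "off"},
    "determiner": {"a", "an", "the"},
    "pronoun": {"i", "you", "she", "mine", "his", "them", "who"},
    "conjunction": {"and", "but", "or"},
    "auxiliary": {"can", "may", "should"},
    "numeral": {"one", "two", "three", "third"},
}


def closed_class_categories(word):
    """Return every closed-class category containing `word` (may be several)."""
    w = word.lower()
    return sorted(cat for cat, members in CLOSED_CLASS.items() if w in members)


def split_function_words(words):
    """Partition tokens into (closed-class words, all other words)."""
    function, other = [], []
    for w in words:
        if closed_class_categories(w):
            function.append(w)
        else:
            other.append(w)
    return function, other


if __name__ == "__main__":
    print(closed_class_categories("on"))  # ['particle', 'preposition']
    print(split_function_words("the koala put the keys on the table".split()))
    # (['the', 'the', 'on', 'the'], ['koala', 'put', 'keys', 'table'])
```

Words that fall outside every set are only open-class candidates; with the full closed lists that inference becomes much more reliable than with this sample.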
Choosing a Tagset
- Could pick very coarse tagsets
  - N, V, Adj, Adv, Other
- More commonly used sets are finer-grained
  - E.g., the "Penn TreeBank tagset", 45 tags: PRP$, WRB, WP$, VBG
  - Brown corpus, 87 tags
- Prague Dependency Treebank (Czech)
  - 4452 tags
  - AAFP3 3N: (nejnezajímavějším) Adj Regular Feminine Plural….Superlative [Hajic 2006, VMC tutorial]

Penn TreeBank POS Tagset

Using the Penn Tagset
- The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

Universal Tagset
- ~12 different tags
  - NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, ".", X

POS Tagging vs. Word Clustering
- Words often have more than one POS: back
  - The back door = JJ
  - On my back = NN
  - Win the voters back = RB
  - Promised to back the bill = VB
These examples are from Dekang Lin.

How Hard is POS Tagging?

POS tag sequences
- Some tag sequences are more likely to occur than others
- POS Ngram view: https://books.google.com/ngrams/graph?content=_ADJ_+_NOUN_%2C_ADV_+_NOUN_%2C+_ADV_+_VERB_

Existing methods often model POS tagging as a sequence tagging problem.
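The "Using the Penn Tagset" example writes each token as word/TAG. A small sketch for reading that format back into pairs and counting tags; `parse_tagged` and `tag_counts` are hypothetical helpers. Splitting on the last "/" (via `str.rpartition`) is a design choice so that a word which itself contains a slash keeps it.

```python
from collections import Counter


def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs.

    Splits on the LAST '/', so a word containing '/' keeps it intact.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"expected word/TAG, got {token!r}")
        pairs.append((word, tag))
    return pairs


def tag_counts(pairs):
    """Count how often each tag occurs."""
    return Counter(tag for _word, tag in pairs)


if __name__ == "__main__":
    example = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN "
               "of/IN other/JJ topics/NNS ./.")
    pairs = parse_tagged(example)
    print(pairs[:3])  # [('The', 'DT'), ('grand', 'JJ'), ('jury', 'NN')]
    print(tag_counts(pairs)["IN"])  # 2
```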
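The coarse universal tagset can be obtained from Penn Treebank tags by a many-to-one mapping. The table below is an assumed, partial mapping modeled on the commonly used Penn-to-universal conversion; it should be checked against an official mapping before being relied on. `UNIVERSAL_GROUPS`, `PENN_TO_UNIVERSAL`, and `to_universal` are hypothetical names, and sending unmapped tags to X is also an assumption.

```python
# Assumed, partial Penn Treebank -> universal tag mapping, grouped by
# universal tag. Check it against an official mapping before relying on it.
UNIVERSAL_GROUPS = {
    "NOUN": ["NN", "NNS", "NNP", "NNPS"],
    "VERB": ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"],
    "ADJ": ["JJ", "JJR", "JJS"],
    "ADV": ["RB", "RBR", "RBS", "WRB"],
    "PRON": ["PRP", "PRP$", "WP", "WP$"],
    "DET": ["DT", "PDT", "WDT", "EX"],
    "ADP": ["IN"],
    "NUM": ["CD"],
    "CONJ": ["CC"],
    "PRT": ["RP", "POS", "TO"],
    ".": [".", ",", ":"],
    "X": ["FW", "LS", "SYM", "UH"],
}

# Invert the groups into a Penn tag -> universal tag lookup table.
PENN_TO_UNIVERSAL = {
    penn: universal
    for universal, penn_tags in UNIVERSAL_GROUPS.items()
    for penn in penn_tags
}


def to_universal(tagged_text, default="X"):
    """Rewrite 'word/PENN' tokens as 'word/UNIVERSAL'; unmapped tags become `default`."""
    out = []
    for token in tagged_text.split():
        word, _sep, penn = token.rpartition("/")
        out.append(f"{word}/{PENN_TO_UNIVERSAL.get(penn, default)}")
    return " ".join(out)


if __name__ == "__main__":
    print(to_universal("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
                       "number/NN of/IN other/JJ topics/NNS ./."))
    # The/DET grand/ADJ jury/NOUN commented/VERB on/ADP a/DET number/NOUN of/ADP other/ADJ topics/NOUN ./.
```

The mapping discards distinctions (for example NN vs. NNS both become NOUN), which is exactly the coarse-versus-fine trade-off discussed under "Choosing a Tagset".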
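The "back" examples show why tagging is done per token in context. One way to see this: collect, for each word type, the set of tags it receives (a "tag dictionary"). Any method that assigns a single label per word type, as a plain clustering of types or a lookup table would, cannot produce all four tags of "back". The helper names below are hypothetical, and only the "back" tokens are tagged because the slide gives tags only for them.

```python
from collections import defaultdict

# (context, word, tag) triples taken from the slide; only "back" is tagged.
BACK_EXAMPLES = [
    ("The back door", "back", "JJ"),
    ("On my back", "back", "NN"),
    ("Win the voters back", "back", "RB"),
    ("Promised to back the bill", "back", "VB"),
]


def build_tag_dictionary(occurrences):
    """Map each word type to the set of tags observed for its tokens."""
    tag_dict = defaultdict(set)
    for _context, word, tag in occurrences:
        tag_dict[word.lower()].add(tag)
    return dict(tag_dict)


def ambiguous_words(tag_dict):
    """Return the word types that were observed with more than one tag."""
    return sorted(w for w, tags in tag_dict.items() if len(tags) > 1)


if __name__ == "__main__":
    tag_dict = build_tag_dictionary(BACK_EXAMPLES)
    print(sorted(tag_dict["back"]))  # ['JJ', 'NN', 'RB', 'VB']
    print(ambiguous_words(tag_dict))  # ['back']
```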
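The last slide notes that some tag sequences are more likely than others and that tagging is usually modeled as sequence tagging. The lecture does not fix a model here; one common choice, assumed for this sketch, is a bigram transition model with maximum-likelihood estimates P(t_i | t_{i-1}) = count(t_{i-1}, t_i) / count(t_{i-1}). The function name `transition_probs` and the `<s>` start symbol are illustrative assumptions, and the only data used is the koala tag sequence, so the estimates are extreme.

```python
from collections import Counter


def transition_probs(tag_sequences, start="<s>"):
    """Maximum-likelihood estimates of P(t_i | t_{i-1}) from tag sequences.

    A start symbol is prepended to each sequence so the first tag also has a
    conditioning context. The denominator counts only positions of t_{i-1}
    that are followed by another tag, so each context's probabilities sum to 1.
    """
    bigrams = Counter()
    context = Counter()
    for tags in tag_sequences:
        prev = start
        for tag in tags:
            bigrams[(prev, tag)] += 1
            context[prev] += 1
            prev = tag
    return {(p, t): n / context[p] for (p, t), n in bigrams.items()}


if __name__ == "__main__":
    koala = ["DET", "N", "V", "DET", "N", "P", "DET", "N"]
    probs = transition_probs([koala])
    for (prev, tag), p in sorted(probs.items()):
        print(f"P({tag} | {prev}) = {p:.2f}")
```

With one sentence, DET is always followed by N (probability 1.0) while N splits evenly between V and P (0.5 each); real estimates need a tagged corpus and usually smoothing.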

Posted: 27/11/2022, 21:14