Lecture 9: Part of Speech
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS6501 Natural Language Processing

This lecture
- Parts of speech (POS)
- POS tagsets

Parts of Speech
- Traditional parts of speech
  - ~ of them

POS examples
- N (noun): chair, bandwidth, pacing
- V (verb): study, debate, munch
- ADJ (adjective): purple, tall, ridiculous
- ADV (adverb): unfortunately, slowly
- P (preposition): of, by, to
- PRO (pronoun): I, me, mine
- DET (determiner): the, a, that, those

Parts of Speech
- Also known as parts-of-speech, lexical categories, word classes, morphological classes, or lexical tags
- Linguists debate the number, nature, and universality of these categories

POS Tagging
- The process of assigning a part of speech to each word in a collection (e.g., a sentence):
  the/DET koala/N put/V the/DET keys/N on/P the/DET table/N

Why is POS Tagging Useful?
- First step in a vast number of practical tasks
- Parsing
  - Need to know whether a word is an N or a V before you can parse
- Information extraction
  - Finding names, relations, etc.
- Speech synthesis/recognition
  - Stress placement depends on the part of speech, e.g., OBject (noun) vs. obJECT (verb):
    OBject / obJECT, OVERflow / overFLOW, DIScount / disCOUNT, CONtent / conTENT
- Machine translation

Open and Closed Classes
- Closed class: a small, fixed membership
  - Prepositions: of, in, by, …
  - Pronouns: I, you, she, mine, his, them, …
  - Usually function words (short, common words that play a role in grammar)
- Open class: new members can be created
  - English has 4: nouns, verbs, adjectives, adverbs
  - Many languages have these 4, but not all!

Open Class Words
- Nouns
  - Proper nouns (Boulder, Granby, Eli Manning)
  - Common nouns (the rest)
  - Count nouns and mass nouns
    - Count: have plurals and get counted: goat/goats, one goat, two goats
    - Mass: don't get counted (snow, salt, communism) (*two snows)
- Verbs
  - In English, have morphological affixes (eat/eats/eaten)

Closed Class Words
Examples:
- prepositions: on, under, over, …
- particles: up, down, on, off, …
- determiners: a, an, the, …
- pronouns: she, who, I, …
- conjunctions: and, but, or, …
- auxiliary verbs: can, may, should, …
- numerals: one, two, three, third, …

Prepositions from CELEX
- CELEX: online dictionary
- Frequency counts are from the COBUILD 16-million-word corpus

English Particles

Conjunctions
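Because closed classes have a small, fixed membership, they can simply be listed, whereas open-class words have to be recognized from other cues such as morphological affixes. The Python sketch below illustrates that contrast under stated assumptions: the word lists are copied from the slides above, and the two suffix rules (`-ly` for adverbs, `-ous` for adjectives) and the function name `candidate_classes` are hypothetical, chosen for illustration only; this is not a real tagging method.

```python
from typing import List

# Closed classes: small, fixed membership, so they can be listed outright.
# Word lists are taken from the slides; they are not complete.
CLOSED_CLASS = {
    "prepositions": {"of", "in", "by", "to", "on", "under", "over"},
    "particles": {"up", "down", "on", "off"},
    "determiners": {"a", "an", "the", "that", "those"},
    "pronouns": {"i", "me", "mine", "you", "she", "his", "them", "who"},
    "conjunctions": {"and", "but", "or"},
    "auxiliary verbs": {"can", "may", "should"},
    "numerals": {"one", "two", "three", "third"},
}

# Hypothetical suffix cues for open-class words (illustration only).
OPEN_CLASS_SUFFIXES = [("ly", "adverb"), ("ous", "adjective")]


def candidate_classes(word: str) -> List[str]:
    """Return every closed class that lists the word; otherwise a
    suffix-based open-class guess; otherwise 'unknown open class'."""
    w = word.lower()
    closed = sorted(name for name, members in CLOSED_CLASS.items() if w in members)
    if closed:
        return closed
    for suffix, guess in OPEN_CLASS_SUFFIXES:
        if w.endswith(suffix):
            return [guess]
    return ["unknown open class"]


for w in ["on", "The", "slowly", "ridiculous", "munch"]:
    print(w, "->", candidate_classes(w))
```

Note that "on" comes back as both a particle and a preposition: even a closed-class lookup leaves ambiguity that only context can resolve, and an open-class word like "munch" gets no guess at all from these crude rules.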
Choosing a Tagset
- Could pick very coarse tagsets
  - N, V, Adj, Adv, Other
- More commonly used sets are finer-grained
  - E.g., the "Penn TreeBank tagset", 45 tags: PRP$, WRB, WP$, VBG
  - Brown corpus, 87 tags
  - Prague Dependency Treebank (Czech)
    - 4452 tags
    - AAFP3----3N---- (nejnezajímavějším): Adj Regular Feminine Plural … Superlative [Hajic 2006, VMC tutorial]

Penn TreeBank POS Tagset

Using the Penn Tagset
- The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

Universal Tagset
- ~12 different tags
  - NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, ".", X

POS Tagging vs. Word Clustering
- Words often have more than one POS: back
  - The back door = JJ
  - On my back = NN
  - Win the voters back = RB
  - Promised to back the bill = VB
These examples are from Dekang Lin.

How Hard is POS Tagging?

POS tag sequences
- Some tag sequences are more likely to occur than others
- POS Ngram view: https://books.google.com/ngrams/graph?content=_ADJ_+_NOUN_%2C_ADV_+_NOUN_%2C+_ADV_+_VERB_
Existing methods often model POS tagging as a sequence tagging problem.
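One way to quantify the observation that some tag sequences are more likely than others is to estimate tag-transition probabilities P(tag | previous tag) from tagged text by counting. The sketch below computes maximum-likelihood estimates over the tagged koala sentence from the POS Tagging slide; the function name `tag_bigram_probs` and the start symbol `<s>` are illustrative assumptions, and a single sentence is far too little data for meaningful estimates.

```python
from collections import Counter, defaultdict


def tag_bigram_probs(tagged_sentences):
    """Maximum-likelihood estimates of P(tag | previous tag).

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    The hypothetical start symbol "<s>" marks the sentence beginning.
    """
    counts = defaultdict(Counter)  # counts[prev][tag] = number of times tag follows prev
    for sentence in tagged_sentences:
        prev = "<s>"
        for _word, tag in sentence:
            counts[prev][tag] += 1
            prev = tag
    # Normalize each row of counts so the probabilities for a given prev sum to 1.
    return {
        prev: {tag: c / sum(nexts.values()) for tag, c in nexts.items()}
        for prev, nexts in counts.items()
    }


# The tagged example sentence from the POS Tagging slide.
koala = [("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
         ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N")]

probs = tag_bigram_probs([koala])
print(probs["DET"])  # DET is always followed by N here
print(probs["N"])    # N is followed by V once and by P once
```

In this toy data DET is always followed by N, while N is followed by V and by P once each, so each gets probability 0.5. A sequence tagger can combine transition scores like these with word-level evidence when choosing tags.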