Scientific report document: "Bayesian Word Sense Induction" pdf
... Introduction Sense induction is the task of automatically discovering all possible senses of an ambiguous word. It is related to, but distinct from, word sense disambiguation (WSD) where the senses ... the generated word. Then, the word sense is selected based on the word, neighbor, and topic. Boyd-Graber et al. (2007) extend the topic modeling framework to include WordNet...
Uploaded: 22/02/2014, 02:20
... all WordNet synonyms of the target word, under all its possible senses, and randomly picking one of the synonyms as the source word. For example, the word ‘disc’ is one of the words in the Senseval ... were excluded since their sense annotation in Senseval-3 is not based on WordNet senses. The Senseval dataset includes a set of example occurrences in context for each word, split to...
Uploaded: 20/02/2014, 12:20
... describes SENSELEARNER – a minimally supervised word sense disambiguation system that attempts to disambiguate all content words in a text using WordNet senses. We evaluate the accuracy of SENSELEARNER ... Figure 1: Semantic model learn...
Uploaded: 20/02/2014, 15:20
Scientific report document: "Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing" pptx
... grammars with only unary and binary productions. We replace lexical words with count ≤ 5 in the training data with one of 50 unknown words using lexical features, following Petrov et al. (2006). ... of (a). The refinement annotation is hyphenated with a nonterminal symbol. ... morphology analysis, word segmentation (Johnson and Goldwater, 2009), and dependency grammar induction (Cohen et al.,
Uploaded: 19/02/2014, 19:20
Scientific report document: "Improving Word Representations via Global Context and Multiple Word Prototypes" pdf
... multi-prototype VSM where word sense discrimination is first applied by clustering contexts, and then prototypes are built using the contexts of the sense-labeled words. However, in order to ... for clustering word instances, which is used in the multi-prototype version of our model that accounts for words with multiple senses. We evaluate our new model on the standard WordSim-35...
Uploaded: 19/02/2014, 19:20
Scientific report document: "Unsupervized Word Segmentation: the case for Mandarin Chinese" doc
... density distributions for words vs. non-words, we observed that the VBE at both boundaries was the most discriminative value. Therefore, we decided to take into account the VBE only at the word-candidate ... corresponding to the sequence of words w_0 w_1 ... w_m, and len(w_i) is the length of a word w_i, used here to be able to compare segmentations resulting in a different number of...
Uploaded: 19/02/2014, 19:20
Scientific report document: "Enhanced word decomposition by calibrating the decision threshold of probabilistic models and using a model ensemble" pdf
... relative word positions and found that the calibrated PROMODES-H predicted non-boundaries better for initial word positions, whereas the calibrated PROMODES did so for mid- and final word positions. ... term word morphology. It is worthwhile studying this internal structure since a language description using its morphological formation is more compact and complete than listing all pos-...
Uploaded: 20/02/2014, 04:20
Scientific report document: "Learning Word-Class Lattices for Definition and Hypernym Extraction" doc
... frequent words F to generalize words to word classes”. We define a word class as either a word itself or its part of speech. Given a sentence s = w_1, w_2, ..., w_|s|, where w_i is the i-th word ... order of symbols like in word/phoneme lattices, and nodes are clusters of salient words aggregated using synonymy, similarity, or subtrees of a thesaurus. However, salient word...
Uploaded: 20/02/2014, 04:20
Scientific report document: "Learning Word Vectors for Sentiment Analysis" ppt
... compare our model’s word representations with several bag-of-words weighting methods, and alternative approaches to word vector induction. 4.1 Word Representation Learning We induce word representations ... similarity of w with all other words w′, we can find the words deemed most similar by the model. Table 1 shows the most similar words to given query words using our model’s...
Uploaded: 20/02/2014, 04:20
Scientific report document: "Joint Word Segmentation and POS Tagging using a Single Perceptron" docx
... last word can be a complete word or a partial word. A problem arises in whether to give POS tags to incomplete words. If partial words are given POS tags, it is likely that some partial words are ... words; (12) the ending characters c_1 and c_2 of two consecutive words; (13) a word of length l with previous word w; (14) a word of length l with next word w. Table 1: Feature templates for...
Uploaded: 20/02/2014, 09:20