Document - Scientific report: "SenseLearner: Word Sense Disambiguation for All Words in Unrestricted Text" doc
... describes SENSELEARNER – a minimally supervised word sense disambiguation system that attempts to disambiguate all content words in a text using WordNet senses. We evaluate the accuracy of SENSELEARNER ... present a method for solving the semantic ambiguity of all content words in a text. The algorithm can be thought of as a minimally supervised word sense disambi...
Uploaded: 20/02/2014, 15:20
... frequent words in T (cf. Section 3.1), the star pattern σ(s) associated with s is obtained by replacing with * all the words w_i ∉ F, that is all the tokens that are non-frequent words. For instance, ... 2008. Word lattice reranking for Chinese word segmentation and part-of-speech tagging. In Proceedings of the 22nd International Conference on Computational Linguistics (C...
Uploaded: 20/02/2014, 04:20
... have usually been investigated within Senseval using the All Words dataset, which does not include training examples. In this paper we preferred using the same test set which was used for the ... polysemous source words provide poor training models for sense matching. This can be explained by observing that polysemous source words can be substituted with the target word...
Uploaded: 20/02/2014, 12:20
Document - Scientific report: "Joint Word Segmentation and POS Tagging using a Single Perceptron" docx
... t on a word containing char c (not the starting or ending character); (12) tag t on a word starting with char c_0 and containing char c; (13) tag t on a word ending with char c_0 and containing char ... POS tagging using an HMM-based approach. Word information is used to process known words, and character information is used for unknown words in a similar way to Ng and Low (2004)....
Uploaded: 20/02/2014, 09:20
Document - Scientific report: "Bayesian Word Sense Induction" pdf
... the same number of senses for all the words, since tuning this number individually for each word would be prohibitive. We experimented with values ranging from three to nine senses. Figure 3 shows ... a grouping of these instances into classes corresponding to the induced senses. In other words, contexts that are grouped together in the same class represent a specific word...
Uploaded: 22/02/2014, 02:20
Document - Scientific report: "Modified Distortion Matrices for Phrase-Based Statistical Machine Translation" doc
... work Pre-processing approaches to word reordering aim at permuting input words in a way that minimizes the reordering needed for translation: deterministic reordering aims at finding a single optimal ... with a 42% increase of the run time. Results in the row “allReo” are obtained by encoding all the rule-generated reorderings in L×F chunk-to-word conversion mode. Except...
Uploaded: 19/02/2014, 19:20
Document - Scientific report: "Wikipedia as Sense Inventory to Improve Diversity in Web Search Results" doc
... word in the Wikipedia page for the word sense; (ii) occurrences of the word in Wikipedia pages pointing to the page for the word sense; (iii) occurrences of the word in external pages linked in ... found in the page for the sense being trained. • TiMBL-inlinks uses the examples found in Wikipedia pages pointing to the sense being trained. • TiMBL-all uses b...
Uploaded: 20/02/2014, 04:20
Document - Scientific report: "Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives" ppt
... a document D into a queried question q. Rather than translating single words in isolation, the phrase-based model translates one sequence of words into another sequence of words, thus incorporating ... ranking algorithm proceeds as follows. First, all the words in a given document are added as vertices in a graph G. Then edges are added between words if the words...
Uploaded: 20/02/2014, 04:20
Document - Scientific report: "Modeling Morphologically Rich Languages Using Split Words and Unstructured Dependencies" docx
... experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant ... contains about 600 thousand sentences in the training set and 60 thousand sentences in the test set (giving a total of about 10 million words). The versions of the corpus we...
Uploaded: 20/02/2014, 09:20
Document - Scientific report: "Archivus: A multimodal system for multimedia meeting browsing and retrieval" doc
... interaction in the domain of meeting retrieval and for developing NLP modules for this specific domain. 1 Introduction In the past few years, there has been an increasing interest in research ... controlled for in the experiment increases substantially. For instance, if it is the case that within a single interface any task that can be performed using natural language can...
Uploaded: 20/02/2014, 12:20