... data. In Proceedings of EMNLP-09. D. McCarthy, R. Koeling, J. Weeds, J. Carroll. 2004. Finding predominant word senses in untagged text. In Proceedings of the 42nd Meeting of the Association for ... abstract). In Research and Development in Information Retrieval, 279–280. P. Sorg, P. Cimiano. 2008. Cross-lingual information retrieval with explicit semantic analysis. In In...
Uploaded: 23/03/2014, 16:20
Scientific report: "Exemplar-Based Models for Word Meaning In Context" pptx
... paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models. 1 Introduction Distributional models are a popular framework for representing word meaning. ... model for modeling word meaning in context, applying the model to the task of deciding paraphrase applicability. With a very simple vector representation and just using activa...
Uploaded: 30/03/2014, 21:20
... explicitly smooth the resulting p_s(e|f), since many word pairs will be unseen for a given domain s, we are already performing an implicit form of smoothing (when computing the expected counts), since each document ... is to increase the likelihood of selecting relevant phrases for translation. Matsoukas et al. (2009) introduced assigning a pair of binary features to each training...
Uploaded: 19/02/2014, 19:20
Scientific report: "Data Cleaning for Word Alignment" pdf
... training sentences whose n-gram scores are low, we can duplicate such training sentences in word alignment. This method is appealing, but unfortunately if we use mgiza or GIZA++, our training ... is the following: if we witness bad sentence-based scores in word-based MT, we can consider our MT system failing to incorporate an n:m mapping object for those sentences. Later in o...
Uploaded: 08/03/2014, 01:20
Scientific report: "Distortion Models For Statistical Machine Translation" doc
... translations in the proper word order by attempting many possible word reorderings during the translation process. Trying all possible word reorderings is an NP-complete problem, as shown in (Knight, ... counting over alignment links in the training data. Any aligner such as (Al-Onaizan et al., 1999) or (Vogel et al., 1996) can be used to obtain word alignments. For the r...
Uploaded: 08/03/2014, 02:21
Scientific report: "Structured Models for Fine-to-Coarse Sentiment Analysis" pdf
... every sentence in a document and inference was solved using a min-cut algorithm. However, jointly modeling the document label and allowing for non-binary labels complicates min-cut style solutions as inference becomes ... algorithm is outlined in Figure 3. The algorithm works by considering a single training instance during each iteration. The weight vector w is updated in lin...
Uploaded: 08/03/2014, 02:21
Scientific report: "A Method for Word Sense Disambiguation of Unrestricted Text" potx
... and adjectives in a text, using the senses provided in WordNet. The senses are ranked using two sources of information: (1) the Internet for gathering statistics for word-word co-occurrences ... word PROCEDURE: STEP 1. Form a similarity list for each sense of one of the words. Pick one of the words, say W2, and using WordNet, form a similarity list for each sens...
Uploaded: 08/03/2014, 06:20
Scientific report: "Combining Clues for Word Alignment" pdf
... important for our purposes to allow multiple links from each word (source and target) to corresponding words in the other language in order to obtain phrasal links. We say that a word-to-word link ... task is to combine available clues in order to find inter-lingual links. Clues are defined as probabilities of associations. In order to combine all indications which are gi...
Uploaded: 08/03/2014, 21:20
Scientific report: "New Models for Improving Supertag Disambiguation" pdf
... as information extraction. We extend our supertagging models to perform this task in a fashion similar to that described in Srinivas (1997b). Selected models have been trained on 200K words. ... are combined using various voting strategies. The same 1000K word test corpus is used in models of classifier combination as is used in previous models. We created three...
Uploaded: 08/03/2014, 21:20
Scientific report: "Fertility Models for Statistical Natural Language Understanding" pdf
... of words in q is denoted by g(ci), cl begins at the first word in the sentence, and ct(c) ends at the last word in the sentence. The clumps form a proper partition of E. All the words in ... modeling techniques for modeling clump generation are n-gram language models (Miller et al., 1995; Levin and Pieraccini, 1995; Epstein, 1996), and headword language models (E...
Uploaded: 08/03/2014, 21:20