... possible word pairs $(f_j, e_i)$ as a parallel ET pair and using the IBM Model 1 (Brown et al., 1993) word-to-word translation probability as the ET translation probability. • Smoothing the ET translation ... POS tag pairs. • Word translation probability: $P(f_j \mid e_i)$. • Rank: the rank of the word-to-word probability of $f_j$ as a translation of $e_i$ among all...
Uploaded: 20/02/2014, 15:20
... stochastic models for document coherence, for both EARTHQUAKES and ACCIDENTS genre, using IDL-CH-HB . Board’s database. For both collections, we used 100 documents for training and 100 documents for ... tend to trigger the usage of certain words in a target language translation of that sentence.) We train models able to recognize local recurring patterns of word usage across sente...
Uploaded: 20/02/2014, 12:20
Document: Scientific report: "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity" pdf
... multilingual setting. In particular, translations of a word into other languages found in parallel corpora are seen as the (translational) context of that word. We assume that words that share translational ... similarity: one using syntactic context and one using translational context based on word alignment and the combination of both. For both approaches, we used a cutoff n...
Uploaded: 20/02/2014, 12:20
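The idea in the excerpt above, that a word's aligned translations act as its context, can be illustrated with a small sketch. This is not the paper's implementation: the excerpt does not say which similarity measure or cutoff was used, so cosine similarity is an assumed stand-in, and the function names and toy alignment pairs are hypothetical.

```python
import math
from collections import Counter


def translation_vectors(aligned_pairs):
    """Map each source word to a Counter of the target words it was aligned to."""
    vectors = {}
    for source_word, target_word in aligned_pairs:
        vectors.setdefault(source_word, Counter())[target_word] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    shared = u.keys() & v.keys()
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy word-alignment output (hypothetical data): (source word, aligned translation).
pairs = [
    ("car", "Auto"), ("car", "Wagen"),
    ("automobile", "Auto"), ("automobile", "Wagen"),
    ("bank", "Bank"),
]
vectors = translation_vectors(pairs)
similar = cosine(vectors["car"], vectors["automobile"])
unrelated = cosine(vectors["car"], vectors["bank"])
```

Words that share the same translations ("car" and "automobile" here) score close to 1, while words with no shared translations score 0.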
Document: Scientific report: "Lexical transfer using a vector-space model" doc
... other words in the source sentence surrounding the concerned source word. Suppose that we have translation examples including the concerned source word and we know in advance which target word ... sparseness that comes from variations in target words. The translation of a word can vary more than the meaning of the target word. For example, the English word “bill” has t...
Uploaded: 20/02/2014, 18:20
Document: Scientific report: "INFORMATION RETRIEVAL USING ROBUST NATURAL LANGUAGE PROCESSING" docx
... this using a stochastic parts of speech tagger to preprocess the text. WORD SUFFIX TRIMMER Word stemming has been an effective way of improving document recall since it reduces words ... operators) is defined using word-combination frequencies within the linguistic dependency structures. Further, the likelihood of a given word being paired with another word, within one...
Uploaded: 20/02/2014, 21:20
Document: Scientific report: "Sentiment Translation through Lexicon Induction" doc
... of a word w, the hits near positive (Pwords) and negative (Nwords) seed words is used. The SO-PMI equation is given as

$$\text{SO-PMI}(word) = \log_2 \left( \frac{\prod_{pword \in Pwords} \text{hits}(word \text{ NEAR } pword)}{\prod_{nword \in Nwords} \text{hits}(word \text{ NEAR } nword)} \times \frac{\prod_{nword \in Nwords} \text{hits}(nword)}{\prod_{pword \in Pwords} \text{hits}(pword)} \right)$$

5.2 Data Acquisition We used t...
Uploaded: 20/02/2014, 04:20
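A minimal sketch of how the SO-PMI score from the excerpt above could be computed, assuming the hit counts have already been collected into dictionaries. The function name, the dictionary layout, the `smoothing` parameter (added only to avoid taking the log of zero) and the toy counts are all assumptions for illustration; the excerpt does not describe how counts are stored or smoothed.

```python
import math


def so_pmi(word, pwords, nwords, hits, near_hits, smoothing=0.01):
    """Semantic orientation of `word` from hit counts.

    hits:      dict mapping a seed word to its total hit count
    near_hits: dict mapping (word, seed word) to the NEAR co-occurrence count
    smoothing: small constant added to every count (an assumption, to avoid log(0))
    """
    score = 0.0
    # Positive seeds: log of co-occurrence hits minus log of seed frequency.
    for pword in pwords:
        score += math.log2(near_hits.get((word, pword), 0) + smoothing)
        score -= math.log2(hits.get(pword, 0) + smoothing)
    # Negative seeds enter with the opposite sign.
    for nword in nwords:
        score -= math.log2(near_hits.get((word, nword), 0) + smoothing)
        score += math.log2(hits.get(nword, 0) + smoothing)
    return score


# Toy counts (hypothetical): "good" co-occurs more often with the positive seed.
hits = {"excellent": 100, "poor": 100}
near_hits = {("good", "excellent"): 8, ("good", "poor"): 2}
score = so_pmi("good", ["excellent"], ["poor"], hits, near_hits, smoothing=0.0)
```

Summing logarithms instead of multiplying raw counts gives the same value as the product form and avoids overflow with large seed sets. A positive score suggests positive orientation, a negative score the opposite.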
Document: Scientific report: "Crowdsourcing Translation: Professional Quality from Non-Professionals" pptx
... this translation. • Is-Best percentage: how often the translation was top-ranked among the four translations. • Is-Better percentage: how often the translation was judged as the better translation, ... [Figure 1 column headings: source; Professional LDC Translation; Non-Professional Mechanical Turk Translation] Figure 1: A comparison of professional translations provided by the LDC to non-professional translati...
Uploaded: 20/02/2014, 04:20
Document: Scientific report: "Name Translation in Statistical Machine Translation Learning When to Transliterate" pptx
... capitalized words (with a few exceptions). We use a list of about 200 Arabic and English stopwords and stopword pairs. We use lists of countries and their adjective forms to bridge cross-POS translations ... the NEWA metric (section 2) to both our SMT translations as well as the four human reference translations, using both the original named-entity translation annotation and the re-...
Uploaded: 20/02/2014, 09:20
Document: Scientific report: "Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora" ppt
... Gigaword have billions of words, but the parallel data has only about 30 million words. Step-4 and -5 are natural ways to integrate the abbreviation translation component with the baseline translation ... 0.133; phrase translation: 0.066, 0.023; lexical translation: 0.061, 0.078; reverse phrase translation: 0.059, 0.103; reverse lexical translation: 0.112, 0.090; phrase penalty: -0.150, -0.162...
Uploaded: 20/02/2014, 09:20
Document: Scientific report: "Resolving Translation Mismatches With Information Flow" pdf
... our approach with examples of translation between English and Japanese. 1 Introduction The focus of machine translation (MT) technology has been on the translation of sentence struc- ... of translation mismatches. In this paper, we propose a framework based on Situation Theory (Barwise and Perry 1983). First we will define the problem of translation mismatches, th...
Uploaded: 20/02/2014, 21:20