... Setup. Our training corpora consist of 96.9M Chinese words and 109.5M English words in 3.8M sentence pairs. We used all corpora to train our translation model and smaller corpora without the United ... 1288–1297, Portland, Oregon, June 19-24, 2011. ©2011 Association for Computational Linguistics. Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information ... July. Arne Mauser, Saša Hasan, and Hermann Ney. 2009. Extending statistical machine translation with discriminative and trigger-based lexicon models. In Proceedings of the 2009 Conference...
... Statistical Machine Translation. Machine Translation, pages 77-94. Hua Wu, Haifeng Wang and Chengqing Zong. 2008. Domain Adaptation for Statistical Machine Translation with Domain Dictionary and Monolingual ... (Och and Ney, 2004) with the alignment if and only if: (1) there must be at least one word inside one phrase aligned to a word inside the other phrase and (2) no words inside one phrase can be aligned ... FBIS corpus and the Hansards part of LDC2004T07 corpus (54.6K documents with 1M parallel sentences, 25.2M Chinese words and 29M English words). We use the Chinese Sohu weblog in 2009 and the...
... employs the word translation model to calculate the probabilities of alignments. In IBM Model 2, both the word translation model and position distribution model are used. IBM Models 3, 4 and 5 consider ... model in addition to the word translation model and position distribution model. And these three models are similar, except for the word distortion models. One-to-one and many-to-one alignments ... probability. Given the monolingual word-aligned corpus, we calculate the frequency of two words aligned in the corpus, denoted as $\mathit{freq}(w_i, w_j)$. We filtered the aligned words occurring only once....
... system, GS: System trained with only generated sentence pairs, IT: Interpolated phrase table with GS and BL. GA and IA are GS and IT systems trained with baseline word alignment models accordingly. ... 294–298, Portland, Oregon, June 19-24, 2011. ©2011 Association for Computational Linguistics. Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules. Qin Gao and ... classifier with features derived from standard phrase-based translation models and bilingual language models to identify high quality sentence pairs, and use these sentence pairs in the SMT training....
... one sentence pair in our training data. Consider the subtree rooted at word “Problem”. With the gold alignment, “Problem” is aligned to the 5th target word, and “with latter procedure” are aligned ... can help English-Hindi Statistical Machine Translation. In Proc. IJCNLP. Roy Tromble. 2009. Search and Learning for the Linear Ordering Problem with an Application to Machine Translation. Ph.D. ... Galley and Manning, 2008). Long-distance word reordering between language pairs with substantial word order difference, such as Japanese with Subject-Object-Verb (SOV) structure and English with...
... Domain-Adapted Word Segmentation for Statistical Machine Translation. Yanjun Ma, Andy Way. National Centre for Language Technology, School of Computing, Dublin City University, Dublin 9, Ireland. {yma, away}@computing.dcu.ie. Abstract: We ... domain-specific corpora to train segmenters, we make use of bilingual corpora and statistical word alignment techniques. First of all, our approach is adapted for the specific translation task at hand ... the segmentation of the respective sentences in the parallel corpus according to these candidate words; these modified sentences are then given back to the word aligner, which produces new alignments....
... that the most number of words any candidate translation has is two words. Since among all the 2-word candidate translations, the translation “every month” has the highest translation probability ... model’s accuracy on two simplified translation tasks: word translation and blank-filling. Recently, Cabezas and Resnik (2005) experimented with incorporating WSD translations into Pharaoh, a state-of-the-art ... English words provided by Hiero+WSD, though appropriate, do not ... translation choices for a word w were defined as the set of words or phrases aligned to w, as gathered from a word-aligned parallel...
... TL word is generated by exactly one SL word. We use a matrix Z for every sentence pair, whose fields describe whether or not two words are aligned. In this approach, multiple words can be aligned ... lines and j rows with binary values. The value $z_{ij} = 1$ ($z_{ij} = 0$) means that word i influences (does not influence) word j. In Figure 1 every link stands for $z_{ij} = 1$. The models 1, 2 and 2 ~ and ... contain words with similar syntactic/semantic properties. To arrive at WCs having both (method COMB), we determine TL WCs with the first method and afterwards we determine SL WCs with the...
... all) statistical machine translation systems employ a word-based alignment model (Brown et al., 1993; Vogel, Ney, and Tillman, 1996; Wang and Waibel, 1997), which treats words in a sentence ... sentence, align it with a source word. 3. Produce a word at each target word position according to the source word with which the target word position has been aligned. IBM Alignment ... edu Abstract Most statistical machine translation systems employ a word-based alignment model. In this paper we demonstrate that word-based alignment is a major cause of translation errors....
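Two ideas recur in the excerpts above: a binary alignment matrix Z per sentence pair, where $z_{ij} = 1$ marks a link between word i and word j, and the rule that a phrase pair is consistent with the alignment only if (1) at least one link falls inside it and (2) no word inside one phrase is aligned outside the other. The following Python sketch illustrates both under stated assumptions. The second condition is truncated in the excerpt, so it is completed here in the commonly used way, which is an assumption. The function names, the 0-based indexing, and the half-open spans are hypothetical choices for illustration, not taken from any of the cited papers.

```python
from typing import Iterable, List, Tuple

Link = Tuple[int, int]  # (source index i, target index j), 0-based (assumption)
Span = Tuple[int, int]  # half-open range [start, end) (assumption)


def alignment_matrix(links: Iterable[Link], src_len: int, tgt_len: int) -> List[List[int]]:
    """Build Z with Z[i][j] = 1 if source word i is linked to target word j."""
    z = [[0] * tgt_len for _ in range(src_len)]
    for i, j in links:
        z[i][j] = 1
    return z


def is_consistent(z: List[List[int]], src_span: Span, tgt_span: Span) -> bool:
    """Check a phrase pair against the alignment matrix.

    Condition (1): at least one link lies inside both spans.
    Condition (2), assumed completion: no link has exactly one end inside
    the phrase pair, i.e. no inside word is aligned to an outside word.
    """
    s0, s1 = src_span
    t0, t1 = tgt_span
    links_inside = 0
    for i, row in enumerate(z):
        for j, linked in enumerate(row):
            if not linked:
                continue
            in_src = s0 <= i < s1
            in_tgt = t0 <= j < t1
            if in_src and in_tgt:
                links_inside += 1
            elif in_src or in_tgt:
                return False  # link crosses the phrase-pair boundary
    return links_inside > 0


# Toy alignment for a 3-word sentence pair: 0-0, 1-2, 2-1.
z = alignment_matrix([(0, 0), (1, 2), (2, 1)], src_len=3, tgt_len=3)
print(is_consistent(z, (1, 3), (1, 3)))  # source words 1-2 with target words 1-2
print(is_consistent(z, (0, 2), (0, 2)))  # source word 1 links outside the target span
```

In the toy example, the first phrase pair contains both crossing links and so passes the check, while the second is rejected because source word 1 is aligned to target word 2, which lies outside the target span.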