
Statistical machine translation with word- and sentence-aligned parallel corpora

Scientific report: "Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers"

... Setup. Our training corpora consist of 96.9M Chinese words and 109.5M English words in 3.8M sentence pairs. We used all corpora to train our translation model and smaller corpora without the United ... 1288–1297, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics. Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information ... July. Arne Mauser, Saša Hasan, and Hermann Ney. 2009. Extending statistical machine translation with discriminative and trigger-based lexicon models. In Proceedings of the 2009 Conference...

Scientific report: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information"

... Statistical Machine Translation. Machine Translation, pages 77-94. Hua Wu, Haifeng Wang and Chengqing Zong. 2008. Domain Adaptation for Statistical Machine Translation with Domain Dictionary and Monolingual ... (Och and Ney, 2004) with the alignment if and only if: (1) there must be at least one word inside one phrase aligned to a word inside the other phrase and (2) no words inside one phrase can be aligned ... FBIS corpus and the Hansards part of LDC2004T07 corpus (54.6K documents with 1M parallel sentences, 25.2M Chinese words and 29M English words). We use the Chinese Sohu weblog in 2009 and the...
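
The phrase-pair condition quoted above (the consistency criterion attributed to Och and Ney, 2004) can be checked directly from a set of alignment links. This is a minimal sketch, not code from the paper: the function name `is_consistent`, the inclusive `(start, end)` span convention and the representation of an alignment as a set of `(source_index, target_index)` pairs are all assumptions. Because the snippet truncates condition (2), the code uses the standard completion from phrase-based SMT: no word inside one phrase may be aligned to a word outside the other phrase.

```python
from typing import Set, Tuple

# Assumed representation: each link is (source word index, target word index).
Alignment = Set[Tuple[int, int]]


def is_consistent(alignment: Alignment,
                  src_span: Tuple[int, int],
                  tgt_span: Tuple[int, int]) -> bool:
    """Return True if the phrase pair (inclusive spans) is consistent with the alignment."""
    s_start, s_end = src_span
    t_start, t_end = tgt_span
    has_inside_link = False
    for s, t in alignment:
        src_in = s_start <= s <= s_end
        tgt_in = t_start <= t <= t_end
        if src_in and tgt_in:
            has_inside_link = True
        elif src_in != tgt_in:
            # Condition (2) fails: a word inside one phrase links outside the other.
            return False
    # Condition (1): at least one link connects the two phrases.
    return has_inside_link


links = {(0, 0), (1, 2), (2, 1)}
print(is_consistent(links, (1, 2), (1, 2)))  # True
print(is_consistent(links, (0, 1), (0, 1)))  # False
```

In the second call, source word 1 is linked to target word 2, which lies outside the target span (0, 1), so the pair is rejected.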

Scientific report: "Improving Statistical Machine Translation with Monolingual Collocation"

... employs the word translation model to calculate the probabilities of alignments. In IBM Model 2, both the word translation model and the position distribution model are used. IBM Models 3, 4 and 5 consider ... model in addition to the word translation model and position distribution model. These three models are similar, except for their word distortion models. One-to-one and many-to-one alignments ... probability. Given the monolingual word-aligned corpus, we calculate the frequency of two words aligned in the corpus, denoted as freq(w_i, w_j). We filtered out the aligned words occurring only once....
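
The counting step in this snippet, computing freq(w_i, w_j) over a monolingual word-aligned corpus and discarding pairs seen only once, might look like the sketch below. The input format (each sentence as a token list plus a set of `(i, j)` links between positions in the same sentence), the function name `aligned_pair_frequencies` and the `min_count` parameter are hypothetical; counting pairs in link order rather than as unordered pairs is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical input: one (tokens, links) entry per sentence, where a link
# (i, j) says token i is aligned to token j within the same sentence.
Sentence = Tuple[List[str], Set[Tuple[int, int]]]


def aligned_pair_frequencies(
    corpus: List[Sentence], min_count: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count freq(w_i, w_j) for aligned word pairs, keeping pairs seen >= min_count times."""
    counts: Counter = Counter()
    for tokens, links in corpus:
        for i, j in links:
            counts[(tokens[i], tokens[j])] += 1
    # With min_count=2, pairs that occur only once are dropped.
    return {pair: c for pair, c in counts.items() if c >= min_count}


corpus = [
    (["strong", "tea"], {(0, 1)}),
    (["strong", "coffee"], {(0, 1)}),
    (["very", "strong", "tea"], {(1, 2)}),
]
print(aligned_pair_frequencies(corpus))  # {('strong', 'tea'): 2}
```

The toy corpus is invented for illustration; ("strong", "coffee") appears once and is filtered out.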

Scientific report: "Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules"

... system; GS: system trained with only generated sentence pairs; IT: interpolated phrase table with GS and BL. GA and IA are the GS and IT systems trained with the baseline word alignment models, respectively. ... 294–298, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics. Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules. Qin Gao and ... classifier with features derived from standard phrase-based translation models and bilingual language models to identify high-quality sentence pairs, and use these sentence pairs in the SMT training....

Scientific report: "A Ranking-based Approach to Word Reordering for Statistical Machine Translation"

... one sentence pair in our training data. Consider the subtree rooted at the word “Problem”. With the gold alignment, “Problem” is aligned to the 5th target word, and with latter procedure” are aligned ... can help English-Hindi Statistical Machine Translation. In Proc. IJCNLP. Roy Tromble. 2009. Search and Learning for the Linear Ordering Problem with an Application to Machine Translation. Ph.D. ... Galley and Manning, 2008). Long-distance word reordering between language pairs with substantial word order differences, such as Japanese with Subject-Object-Verb (SOV) structure and English with...

Scientific report: "Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation"

... Domain-Adapted Word Segmentation for Statistical Machine Translation. Yanjun Ma, Andy Way. National Centre for Language Technology, School of Computing, Dublin City University, Dublin 9, Ireland. {yma, away}@computing.dcu.ie. Abstract: We ... domain-specific corpora to train segmenters, we make use of bilingual corpora and statistical word alignment techniques. First of all, our approach is adapted for the specific translation task at hand ... the segmentation of the respective sentences in the parallel corpus according to these candidate words; these modified sentences are then given back to the word aligner, which produces new alignments....

Scientific report: "Word Sense Disambiguation Improves Statistical Machine Translation"

... that the maximum number of words in any candidate translation is two. Since, among all the 2-word candidate translations, the translation “every month” has the highest translation probability ... model’s accuracy on two simplified translation tasks: word translation and blank-filling. Recently, Cabezas and Resnik (2005) experimented with incorporating WSD translations into Pharaoh, a state-of-the-art ... English words provided by Hiero+WSD, though appropriate, do not ... translation choices for a word w were defined as the set of words or phrases aligned to w, as gathered from a word-aligned parallel...
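
The definition of translation choices in this snippet (the set of words or phrases aligned to w in a word-aligned parallel corpus) can be sketched as follows. This is an illustrative assumption, not the paper's code: `translation_choices` is a hypothetical name, each sentence pair is assumed to be `(source_tokens, target_tokens, links)`, and only single aligned target words are collected, so multi-word phrases are left out of this sketch.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# Hypothetical format: (source tokens, target tokens, {(source_idx, target_idx), ...})
SentencePair = Tuple[List[str], List[str], Set[Tuple[int, int]]]


def translation_choices(corpus: List[SentencePair]) -> Dict[str, Set[str]]:
    """Map each source word to the set of target words it is aligned to."""
    choices: DefaultDict[str, Set[str]] = defaultdict(set)
    for src, tgt, links in corpus:
        for i, j in links:
            choices[src[i]].add(tgt[j])
    return dict(choices)


# Toy, invented sentence pairs for illustration only.
corpus = [
    (["mei", "yue"], ["every", "month"], {(0, 0), (1, 1)}),
    (["yue"], ["moon"], {(0, 0)}),
]
choices = translation_choices(corpus)
print(sorted(choices["yue"]))  # ['month', 'moon']
```

Using a set per source word keeps each candidate translation once, no matter how often the link recurs in the corpus.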

Scientific report: "Improving Statistical Natural Language Translation with Categories and Rules"

... TL word is generated by exactly one SL word. We use a matrix Z for every sentence pair, whose fields describe whether or not two words are aligned. In this approach, multiple words can be aligned ... lines and j rows with binary values. The value z_ij = 1 (z_ij = 0) means that word i influences (does not influence) word j. In Figure 1, every link stands for z_ij = 1. The models 1, 2 and 2 ~ and ... contain words with similar syntactic/semantic properties. To arrive at WCs having both (method COMB), we determine TL WCs with the first method and afterwards we determine SL WCs with the...
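
The binary alignment matrix Z described here can be illustrated with a small sketch. Which index runs over SL words and which over TL words is not recoverable from the snippet, so the code assumes z[i][j] = 1 when SL word i is linked to TL word j; the function names `alignment_matrix` and `each_tl_word_has_one_sl_word` are hypothetical. The second function checks the constraint stated at the start of the snippet, that each TL word is generated by exactly one SL word, which under this assumed layout means every column of Z sums to 1.

```python
from typing import Iterable, List, Tuple


def alignment_matrix(n_src: int, n_tgt: int,
                     links: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Build Z with z[i][j] = 1 if SL word i is aligned to TL word j, else 0 (assumed layout)."""
    z = [[0] * n_tgt for _ in range(n_src)]
    for i, j in links:
        z[i][j] = 1
    return z


def each_tl_word_has_one_sl_word(z: List[List[int]]) -> bool:
    """True if every column of Z sums to 1, i.e. each TL word has exactly one SL source."""
    n_tgt = len(z[0]) if z else 0
    return all(sum(row[j] for row in z) == 1 for j in range(n_tgt))


z = alignment_matrix(2, 3, [(0, 0), (1, 1), (1, 2)])
print(z)  # [[1, 0, 0], [0, 1, 1]]
print(each_tl_word_has_one_sl_word(z))  # True
```

Here SL word 1 links to two TL words, which the constraint allows; adding a second link into TL word 0 would make its column sum 2 and the check would return False.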

Scientific report: "Modeling with Structures in Statistical Machine Translation"

... all) statistical machine translation systems employ a word-based alignment model (Brown et al., 1993; Vogel, Ney, and Tillman, 1996; Wang and Waibel, 1997), which treats words in a sentence ... sentence, align it with a source word. 3. Produce a word at each target word position according to the source word with which the target word position has been aligned. IBM Alignment ... edu. Abstract: Most statistical machine translation systems employ a word-based alignment model. In this paper we demonstrate that word-based alignment is a major cause of translation errors....