Improving Statistical Machine Translation by Paraphrasing the Training Data

Scientific report: "Improving Statistical Machine Translation with Monolingual Collocation" pdf

... alignments on the phrase-based SMT system. The bi-directional alignments are obtained ... Figure 3. Example of the translations generated by the baseline system and the system where the phrase ... that of the translation "can we avoid". Thus, our method selects the former as the translation. Although the phrase "我们 必须 采取 有效 措施" in the source sentence has the same ... the systems using the improved bi-directional alignments achieve higher quality of translation than the baseline system. If the same alignment method is used, the systems using CM-3 got the...

Uploaded: 20/02/2014, 04:20

Scientific report: "Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation" docx

... Setup We set out to see whether we could use the HNG method to achieve translation quality improvements by gathering additional translations to add to the training data of the entire LDC language pack, ... widely used metric in the MT community. The x-axis measures the number of sentence translation pairs in the training data. The VG curves are cut off at the point at which the stopping criterion ... returns. We further confirmed this by running a least-squares linear regression on the points of the last 700,000 words annotated in the LDC data and also for the points in the new data that we...

Uploaded: 20/02/2014, 04:20

Scientific report: "Unsupervised Search for The Optimal Segmentation for Statistical Machine Translation" doc

... respectively. trained on the English side of the training corpus using the SRILM toolkit (Stolcke, 2002). The BLEU metric (Papineni et al., 2002) was used for translation evaluation. Figure 1 compares the translation ... 2006. Word-based alignment, phrase-based translation: What’s the link? In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas (AMTA-06), pages 90–99. Franz ... Oflazer. 2006. Initial explorations in English to Turkish statistical machine translation. In Proceedings of the Workshop on Statistical Machine Translation, pages 7–14, New York City, New York,...

Uploaded: 20/02/2014, 04:20

Scientific report: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information" doc

... represents the probability that $t_{f\_in}$ is not the topic of the phrase $\tilde{f}$. Similarly, $P(\bar{t}_{f\_in} \mid f_j)$ indicates the probability that $t_{f\_in}$ is not the topic of the word $f_j$. The other method ... and context-dependent translation selection both of which put emphasis on solving the translation ambiguity by the exploitation of the context information at the sentence level, we adopt the topical ... that the more data, the better translation quality when the corpus size is less than 30K. The overall BLEU scores corresponding to the range of great N values are generally higher than the ones...

Uploaded: 19/02/2014, 19:20

Scientific report: "Modified Distortion Matrices for Phrase-Based Statistical Machine Translation" doc

... that is parsed by the decoder to modify the distortion matrix just before starting the search. As usual, the distortion matrix is queried by the distortion penalty generator and by the hypothesis expander. 7.1 ... hypothesis, thus they are not affected by changes in the matrix. 10 That is everything except the small GALE corpus and the UN corpus. As reported by Green et al. (2010) the removal of UN data ... Arabic-English statistical machine translation. Machine Translation, Published Online. David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the...

Uploaded: 19/02/2014, 19:20

Scientific report: "A Ranking-based Approach to Word Reordering for Statistical Machine Translation" doc

... parallel data, viewing the source tree nodes to be reordered as list items to be ranked. The ranks of tree nodes are determined by their relative positions in the target language – the node in the ... gets the highest rank, while the ending word in the target sentence gets the lowest rank. The ranking model is trained to directly minimize the mis-ordering of tree nodes, which differs from the ... English-Hindi Statistical Machine Translation. In Proc. IJCNLP. Roy Tromble. 2009. Search and Learning for the Linear Ordering Problem with an Application to Machine Translation. Ph.D. Thesis. Karthik...

Uploaded: 19/02/2014, 19:20

Scientific report: "Mixing Multiple Translation Models in Statistical Machine Translation" docx

... hypotheses. Union, on the other hand, uses hypotheses from all the phrase tables. The feature set of these hypotheses are expanded to include one feature set for each table. However, for the ... only the scores of the best rules, the model with the highest weighted sum of the probabilities of the rules wins. This sum has to take into account the translation table limit (ttl), on the ... shows the results of the baselines. The first group are the baseline results on the phrase-based system discussed in Section 2 and the second group are those of our hierarchical MT system. Since the Hiero...

Uploaded: 19/02/2014, 19:20

Scientific report: "Bilingual Sense Similarity for Statistical Machine Translation" ppt

... of the rule (a particular phrase pair in the training corpus), the context will be the words instantiating the non-terminals. In turn, the context for the sub-phrases that instantiate the ... large data (CE_LD) and small data (CE_SD) NIST task by applying one feature. 6.2 Effect of Combining the Two Similarities We then combine the two similarity scores by using both of them ... different languages by using their contexts. Second, we use the sense similarities between the source and target sides of a translation rule to improve statistical machine translation performance....

Uploaded: 20/02/2014, 04:20

Scientific report: "Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules" doc

... algorithm during the substitution. If the inserted phrase has a prefix or suffix sub-phrase that is the same as the suffix or prefix of the adjacent parts of the original sentence, then the duplication will ... scores of the experiments. As we can see in the results, by using only the generated sentence pairs, the performance of the system drops. However the interpolated phrase tables outperform the baseline. ... Because of this, we use the features in the phrase table to sort the rules, and keep 100 rules with the highest arithmetic mean of the feature values. The second problem is the phrase boundaries...

Uploaded: 20/02/2014, 04:20

Scientific report: "Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem" docx

... time for the two corpuses. Since in the real translation task, the size of the TSP graph is much larger than in the artificial reordering task (in our experiments the median size of the TSP graph ... the source side of b (whatever their relative positions in the source sentence), and to producing the target side of b′ directly after the target side of b; the transition cost is then the ... between the reconstructed and the original sentences, which allows us to check how well the quality of reconstruction correlates with the internal score. The training dataset for learning the LM consists...

Uploaded: 20/02/2014, 07:20

Scientific report: "Name Translation in Statistical Machine Translation Learning When to Transliterate" pptx

... the transliterations then compete with the translations generated by the general SMT system. This means that the MT system will not always use the transliterator suggestions, depending on the ... substring pairs where the English side is preceded by the letters a, o, or u. The fourth rule specifies a cost of 0.1 if the substrings occur at the end of (both) names, 0.2 otherwise. According to the fifth ... fifth rule, the Arabic letter may match an empty string on the English side, if there is an English consonant (EC) in the right context of the English side. The total cost is computed by always...

Uploaded: 20/02/2014, 09:20

Scientific report: "Segmentation for English-to-Arabic Statistical Machine Translation" ppt

... without the table difficult. 3.3 Factored Models For the Factored Translation Models experiment, the factors on the English side are the POS tags and the surface word. On the Arabic side, we use the ... Arabic. The Factored Translation Models experiments use the MOSES system. 4.1 Data Used We experiment with two domains: text news and spoken dialogue from the travel domain. For the news training data ... later, we observe the same trend, which is due to the fact that the model becomes less sparse with more training data. There has been work on translating from English to other morphologically...

Uploaded: 20/02/2014, 09:20

Scientific report: "Combination of Arabic Preprocessing Schemes for Statistical Machine Translation" ppt

... for Phrase-based Statistical Machine Translation Models. In Proc. of the Association for Machine Translation in the Americas (AMTA). P. Koehn. 2004b. Statistical Significance Tests for Machine Translation ... able to disambiguate amongst the possible analyses of a word, identify the features addressed by the scheme in the chosen analysis and process them as specified by the scheme. In this section we ... constructed from the Arabic side of the parallel corpus used in the MT experiments (Section 5). Obviously the more verbose a scheme is, the bigger the number of tokens in the text. The ST, ON,...

Uploaded: 20/02/2014, 11:21
