Scientific report: "Bilingual Topic AdMixture Models for Word Alignment" ppt
... alignment. 3.3 BiTAM-3: Word-level Admixture. It is straightforward to extend the sentence-level BiTAM-1 to a word-level admixture model, by sampling topic indicator z_{n,j} for each word-pair (f_j, e_{a_j}) ... Null word and smoothing strategies. Empirically, we find that adding a “Null” word is always beneficial to all models regardless of the number of topics selected. Topics-...
Uploaded: 17/03/2014, 04:20
... models in machine translation are the IBM models and their extensions. When trying to determine whether two words are aligned, the IBM models ask, “What is the probability that this English word ... not correspond to a translation for any French word in F. The null link l(e_0, f_j) is defined similarly. An alignment A for two sentences E and F is a set of links such that every...
Uploaded: 17/03/2014, 06:20
... more than one word, this process is first carried out for the first new word in the hypothesis. It is then repeated for the remaining words in the hypothesis extension. Once the final word in the ... translation has effectively used n-gram word sequence models as language models. Modern phrase-based translation using large scale n-gram language models generally performs well...
Uploaded: 20/02/2014, 04:20
Document: Scientific report: "Statistical phrase-based models for interactive computer-assisted translation" pdf
... Phrase-based models. The usual statistical translation models can be classified as single-word based alignment models. Models of this kind assume that an input word is generated by only one output word ... to this list, new hypotheses formed by a single translation-word from a non-translated source word. Third, add to this list, new hypotheses formed by a single word with a...
Uploaded: 20/02/2014, 12:20
Document: Scientific report: "Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach" pptx
... lexicon models lack context information that can be extracted from the same parallel corpus. This additional information could be: Simple context information: information of the words ... surrounding the word pair; Syntactic information: part-of-speech information, syntactic constituent, sentence mood; Semantic information: disambiguation information (e.g. from WordNet), cur-...
Uploaded: 20/02/2014, 18:20
Document: Scientific report: "Statistical Decision-Tree Models for Parsing*" ppt
... the root of the tree. For an n word sentence, a parse tree has n leaf nodes, where the word feature value of the ith leaf node is the ith word in the sentence. The word feature value of ... fact, no information other than the words is used from the test corpus. For the sake of efficiency, only the sentences of 40 words or fewer are included in these experiments. 4 For th...
Uploaded: 20/02/2014, 22:20
Document: Scientific report: "Phrase-Based Backoff Models for Machine Translation of Highly Inflected Languages" docx
... splitting, are interleaved such that a more specific form (i.e. a form closer to the full word form) is chosen before a more general form (i.e. a form that has undergone morphological processing). ... backoff models for combining in-domain and out-of-domain data for the purpose of bootstrapping a part-of-speech tagger for Chinese, outperforming standard methods such as EM. 4...
Uploaded: 22/02/2014, 02:20
Scientific report: "Utilizing Dependency Language Models for Graph-based Dependency Parsing Models" pptx
... for Computational Linguistics. Utilizing Dependency Language Models for Graph-based Dependency Parsing Models. Wenliang Chen, Min Zhang∗, and Haizhou Li. Human Language Technology, Institute for ... features consistently outperformed the Baselines. For the basic models (MST1/2), we obtained absolute improvements of 0.94 and 0.63 points respectively. For the enhanced models...
Uploaded: 23/03/2014, 14:20
Scientific report: "A Topic Similarity Model for Hierarchical Phrase-based Translation" ppt
... document. Then, for each word in the document, it samples a topic index from the document-topic distribution and samples the word conditioned on the topic index according to the topic-word distribution. Generally ... topic information at the word level. However, SMT has been advanced from word-based paradigm to phrase/rule-based paradigm. We therefore propose a top...
Uploaded: 23/03/2014, 14:20
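The per-word sampling step described in the excerpt above (a topic index drawn from the document-topic distribution, then a word drawn from that topic's word distribution) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `generate_document` and the toy distributions `DOC_TOPIC` and `TOPIC_WORD` are hypothetical and do not come from the paper, and the document-topic distribution is taken as given rather than sampled, since that part of the text is elided.

```python
import random


def generate_document(doc_topic, topic_word, length, rng):
    """Sample `length` (topic index, word) pairs for one document.

    doc_topic:  list of topic probabilities for this document (assumed given).
    topic_word: list of dicts, one per topic, mapping word -> probability.
    rng:        a random.Random instance, so runs are reproducible.
    """
    topic_ids = list(range(len(doc_topic)))
    pairs = []
    for _ in range(length):
        # Step 1: sample a topic index from the document-topic distribution.
        z = rng.choices(topic_ids, weights=doc_topic, k=1)[0]
        # Step 2: sample a word conditioned on that topic.
        vocab = list(topic_word[z].keys())
        probs = list(topic_word[z].values())
        w = rng.choices(vocab, weights=probs, k=1)[0]
        pairs.append((z, w))
    return pairs


# Hypothetical toy parameters, for illustration only.
DOC_TOPIC = [0.7, 0.3]
TOPIC_WORD = [
    {"bank": 0.5, "loan": 0.5},
    {"river": 0.6, "shore": 0.4},
]

sample = generate_document(DOC_TOPIC, TOPIC_WORD, 8, random.Random(0))
print(sample)
```

Passing an explicit `random.Random` keeps the sketch deterministic for a given seed; a fuller version would also draw `doc_topic` from a prior.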
Scientific report: "Reduced n-gram models for English and Chinese corpora" ppt
... Chinese corpora. For English, we can reduce the model sizes, compared to 7-gram traditional model sizes, with factors of 14.6 for a 40-million-word corpus and 11.0 for a 500-million-word corpus ... $$P_{WA}(w_i \mid w_{i-N+1}^{i-1}) = \frac{gt(w_i) \times P(w_i) + \sum_{l=1}^{N-1} gt(w_{i-l}^{i}) \times P(w_i \mid w_{i-l}^{i-1})}{\sum_{l=0}^{N-1} gt(w_{i-l}^{i})} \quad (3)$$ This averages the probabilities of a word $w_i$ following the previo...
Uploaded: 23/03/2014, 18:20
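The excerpt above describes Equation (3) as averaging the probabilities of a word following its preceding context. A minimal sketch of such a weight-normalised average is given below, assuming one weight and one probability estimate per context length (index 0 for the unigram, index l for a context of l preceding words). The function name `weighted_average` and the numbers are hypothetical; how the paper computes its weights is not shown in the excerpt, so they are simply passed in.

```python
def weighted_average(weights, probs):
    """Return sum(weight * prob) / sum(weight) over all context lengths.

    weights: one non-negative weight per context length (assumed given).
    probs:   the matching probability estimates for the same word.
    """
    if not weights or len(weights) != len(probs):
        raise ValueError("weights and probs must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(g * p for g, p in zip(weights, probs)) / total


# Hypothetical values: unigram, bigram and trigram estimates for one word.
HYPOTHETICAL_WEIGHTS = [1.0, 2.0, 1.0]
HYPOTHETICAL_PROBS = [0.01, 0.2, 0.5]

p_wa = weighted_average(HYPOTHETICAL_WEIGHTS, HYPOTHETICAL_PROBS)
print(p_wa)
```

Because the result is divided by the sum of the weights, it always lies between the smallest and largest of the input estimates.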