Scientific report: "Boosting Statistical Word Alignment Using Labeled and Unlabeled Data"

... incorporating the unlabeled data. In this algorithm, we build a word aligner by using both the labeled data and the unlabeled data. Then we build a pseudo reference set for the unlabeled data, and calculate ... is the word alignment model, which is taken as a learner in the boosting algorithm. The word alignment model is built using both the labeled data...

Uploaded: 08/03/2014, 02:21

Scientific report: "Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation"

... without is the alignment factor. The former uses Chinese surface words and English lemmas as the alignment factor, but the latter uses Chinese surface words and English surface words. Therefore, ... sparse data, and retains word meanings unchanged. It is not impossible to improve word alignment by using English lemmatization. We determined what effect lemmatization had in ex...

Uploaded: 23/03/2014, 18:20

Scientific report: "Guiding Statistical Word Alignment Models With Prior Knowledge"

... results of IBM Model-4 word alignments implemented in GIZA++ toolkit (Och and Ney, 2003). We study and compare two types of constraint and see how they affect word alignments and translation output. ... requires training the baseline word alignment model in another direction by taking f's as source words and e's as target words, which is often done for symmetric alignments,...
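The snippet mentions training the alignment model a second time in the reverse direction "for symmetric alignments". A minimal sketch of the two simplest ways to combine the directional outputs (intersection and union) follows; it assumes alignments are sets of (position, position) pairs, the helper name `symmetrize` is hypothetical, and the paper's own combination step is not shown in the snippet.

```python
from typing import Dict, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)


def symmetrize(forward: Set[Link], reverse: Set[Link]) -> Dict[str, Set[Link]]:
    """Combine two directional word alignments (hypothetical helper).

    forward: links (s, t) from the model trained source -> target.
    reverse: links (t, s) from the model trained target -> source,
             i.e. in the reverse model's own (source, target) order.
    """
    # Flip the reverse links so both sets use (source, target) order.
    flipped = {(s, t) for (t, s) in reverse}
    return {
        "intersection": forward & flipped,  # links both models agree on
        "union": forward | flipped,         # links proposed by either model
    }
```

The intersection tends to favor precision and the union recall; heuristics that grow the intersection toward the union sit between the two.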

Uploaded: 20/02/2014, 12:20

Scientific report: "Yet Another Word Alignment Tool"

... Another Word Alignment Tool. Ulrich Germann, University of Toronto, germann@cs.toronto.edu. Abstract: Yawat is a tool for the visualization and manipulation of word- and phrase-level alignments of ... into devising and improving automatic word alignment algorithms, and into evaluating their performance (e.g., Och and Ney, 2003; Taskar et al., 2005; Moore et al., 2006; Fras...

Uploaded: 20/02/2014, 09:20

Scientific report: "Improving IBM Word-Alignment Model "

... only one iteration of EM in using Model 1 to initialize their Model 2, and Och and Ney (2003) stop after five iterations in using Model 1 to initialize the HMM word-alignment model. Both of these ... 1, trained using English as the source language and French as the target language: • The standard model is initialized using uniform distributions, and trained without smoo...
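The "standard model" described in the snippet (Model 1 trained by EM from uniform distributions, English as source and French as target, no smoothing) can be sketched as below. This is a textbook IBM Model 1 under stated assumptions, not the paper's code: the `<NULL>` token, the function name `train_model1`, and the default of five iterations (borrowed from the Och and Ney setting quoted above) are illustrative choices.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

NULL = "<NULL>"  # hypothetical empty-word token on the English (source) side
Pair = Tuple[str, str]  # (french_word, english_word)


def train_model1(corpus: List[Tuple[List[str], List[str]]],
                 iterations: int = 5) -> DefaultDict[Pair, float]:
    """EM training of IBM Model 1 translation probabilities t(f | e).

    corpus: (english_tokens, french_tokens) sentence pairs; English is the
    source language and French the target language.
    """
    french_vocab = {f for _, fs in corpus for f in fs}
    # Uniform initialization: every t(f | e) starts at 1 / |French vocabulary|.
    t: DefaultDict[Pair, float] = defaultdict(lambda: 1.0 / len(french_vocab))
    for _ in range(iterations):
        count: DefaultDict[Pair, float] = defaultdict(float)  # expected c(f, e)
        total: DefaultDict[str, float] = defaultdict(float)   # expected c(e)
        for english, french in corpus:
            sources = [NULL] + english
            for f in french:
                # E-step: posterior probability that each e generated f.
                z = sum(t[(f, e)] for e in sources)
                for e in sources:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: relative-frequency re-estimate, with no smoothing applied.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On a toy corpus such as ("the house", "la maison"), ("the book", "la livre"), ("a book", "un livre"), a few iterations are enough for t(la | the) to overtake t(maison | the). Because the M-step is plain relative frequency, rare English words can end up with peaked distributions, which is the behavior that smoothing variants try to correct.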

Uploaded: 23/03/2014, 19:20

Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web"

... 1999), STRAND (Resnik and Smith, 2003), BITS (Ma and Liberman, 1999), and PTI (Chen, Chau and Yeh, 2004). Given a bilingual website, these systems identify candidate parallel documents using ... best alignments among $TC^F_i[1, K_i]$ and $TC^E_j[1, K_j]$, and then compute the best alignment between $N^F_i$ and $N^E_j$ ... where $|T^F|$ and $|T^E|$ are number...

Uploaded: 08/03/2014, 02:21

Scientific report: "Parsing Free Word Order Languages in the Paninian Framework"

... Each word group is uniquely identifiable before the core parser executes, (b) Each demand word has only one karaka chart, and (c) There are no ambiguities between source word and demand word. ... relations among word groups, and 2. To identify senses of words. The first task requires karaka charts and transformation rules. The second task requires lakshan charts...

Uploaded: 08/03/2014, 07:20

Scientific report: "Fully Unsupervised Word Segmentation with BVE and MDL"

... using the first 100,000 words of the Chinese Gigaword corpus (Huang, 2007), written in Chinese characters. The word boundaries specified in the Chinese Gigaword Corpus were used as a gold standard. ... (Miller and Stoytchev, 2008) and Markov Experts (Cheng and Mitzenmacher, 2005). Table 2 shows the results for candidate algorithms as well as the two other VE-derived algorithms, HVE-...
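The snippet says the Gigaword word boundaries served as the gold standard. One common way to score a segmenter against such boundaries is boundary precision, recall and F-score, sketched below. The metric choice is an assumption (the snippet does not show which scores the paper reports), and `boundaries` and `boundary_prf` are hypothetical helper names.

```python
from typing import List, Set, Tuple


def boundaries(words: List[str]) -> Set[int]:
    """Character offsets of word-internal boundaries (text start and end excluded)."""
    cuts: Set[int] = set()
    offset = 0
    for word in words[:-1]:
        offset += len(word)
        cuts.add(offset)
    return cuts


def boundary_prf(predicted: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """Boundary precision, recall and F-score of a segmentation against a gold one."""
    if "".join(predicted) != "".join(gold):
        raise ValueError("both segmentations must cover the same character string")
    pred_cuts, gold_cuts = boundaries(predicted), boundaries(gold)
    hits = len(pred_cuts & gold_cuts)
    precision = hits / len(pred_cuts) if pred_cuts else 0.0
    recall = hits / len(gold_cuts) if gold_cuts else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```

For example, segmenting 中国人民银行 as 中国人 / 民 / 银行 when the gold standard is 中国 / 人民 / 银行 gets one of two boundaries right, so precision, recall and F-score are all 0.5.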

Uploaded: 30/03/2014, 21:20

Scientific report: "A Stochastic Language Model using Dependency and Its Improvement by Word Clustering"

... a bunsetsu, the content word sequence and the function word sequence are independently predicted by word n-gram models equipped with unknown word models (Mori and Yamaji, 1997). The above ... called bunsetsu composed of one or more content words and function words. Let Cont be a set of content words, Func a set of function words and Sign a set of punctuation symbols...

Uploaded: 31/03/2014, 04:20

Scientific report: "Guiding an HPSG Parser using Semantic and Pragmatic Expectations"

... by The Ohio State Center for Cognitive Science and The Ohio State Departments of Computer and Information Science and Linguistics grammar (using compiled knowledge) which is then used to realize ... language generation has been successfully demonstrated using highly compiled knowledge about speech acts and their related social actions. A design and prototype implementation...

Uploaded: 20/02/2014, 21:20
