Word alignment for languages with scarce resources

Scientific paper: "Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs"

... proposes an approach to improve word alignment for languages with scarce resources using bilingual corpora of other language pairs. To perform word alignment between languages L1 and L2, we introduce ... two parameters for the distortion probability: one for head words and the other for non-head words. Distortion Probability for Head Words. The distortion probability for head words represents ... Class-based Approach to Word Alignment. Computational Linguistics, 23(2): 313-343. Adam Lopez and Philip Resnik. 2005. Improved HMM Alignment Models for Languages with Scarce Resources. In Proc. ...

Uploaded: 20/02/2014, 12:20

570 ACADEMIC WORD FAMILIES FOR IELTS (WITH VIETNAMESE MEANING)

... finance, money. Finance for education comes from taxpayers. • financial • financially • financier. Formula: formula. • formulate / • reformulate • formulation / • reformulation. Function: ... energetically. Enforce: to force, to compel (obey: to abide by). The legislation will be difficult to enforce. It is the duty of the police to enforce the law. United Nations troops enforced a ceasefire ... database. Integrate into/with something: These programs will integrate with your existing software. Integrate A (into/with B) | integrate A and B: These programs can be integrated with your existing...

Uploaded: 06/01/2014, 17:52

Scientific paper: "A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers"

... 29–32, Suntec, Singapore, 4 August 2009. © 2009 ACL and AFNLP. A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers. Han-Cheol Cho, Do-Gil Lee, Jung-Tae Lee, ... module. 1 Introduction. Word segmentation (WS) has been a fundamental research issue for languages that do not have word boundary markers (WBMs); on the contrary, other languages that do have ... information of the user input, for languages with WBMs. By utilizing the user input, the proposed method effectively refines the output of the baseline WS model and improves the overall performance. The...

Uploaded: 17/03/2014, 02:20

Scientific paper: "Word Alignment with Synonym Regularization"

... framework for word alignment that incorporates synonym knowledge collected from monolingual linguistic resources in a bilingual probabilistic model. Synonym information is helpful for word alignment ... monolingual resources in a bilingual word alignment model. We formulate a synonym pair generative model with a topic variable and use this model as a regularization term with a bilingual word alignment ... the different words coupled with the same word in the synonym pairs as synonyms. For instance, the words 'head', 'chief' and 'forefront' in the bilingual sentences are replaced with 'chief', since...

Uploaded: 20/02/2014, 04:20

Scientific paper: "Discriminative Word Alignment with Conditional Random Fields"

... many-to-one word alignments, where each source word is aligned with zero or one target words, and therefore each target word can be aligned with many source words. Each source word is labelled with ... algorithms for maximum entropy parameter estimation. In Proceedings of CoNLL, pages 49–55. J. Martin, R. Mihalcea, and T. Pedersen. 2005. Word alignment for languages with scarce resources. ... alignments, where each target word is aligned with zero or more source words. Many-to-many alignments are recoverable using the standard techniques for superimposing predicted alignments in both translation...

Uploaded: 20/02/2014, 11:21

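The CRF snippet says that many-to-many alignments can be recovered by superimposing the alignments predicted in both translation directions. The sketch below shows the two standard superimpositions, intersection and union. It assumes each directional alignment is a set of index pairs. The function name `superimpose` is hypothetical and does not come from the paper.

```python
def superimpose(src_to_tgt, tgt_to_src):
    """Combine two directional word alignments (illustrative sketch).

    src_to_tgt: set of (i, j) links from the source->target model,
                where i is a source position and j a target position.
    tgt_to_src: set of (j, i) links from the target->source model.
    Returns (intersection, union), both as sets of (i, j) links.
    """
    # Flip the reverse-direction links so both sets use (source, target) order.
    reverse = {(i, j) for (j, i) in tgt_to_src}
    intersection = src_to_tgt & reverse  # links both models agree on
    union = src_to_tgt | reverse         # may contain many-to-many links
    return intersection, union


# Forward: each source word has at most one target word (many-to-one).
forward = {(0, 0), (1, 2), (2, 2)}
# Backward, given as (target, source): each target word has at most one source word.
backward = {(0, 0), (2, 2), (1, 1)}
inter, uni = superimpose(forward, backward)
print(sorted(inter))  # [(0, 0), (2, 2)]
print(sorted(uni))    # [(0, 0), (1, 1), (1, 2), (2, 2)]
```

In the union, source word 1 links to target words 1 and 2, and target word 2 links to source words 1 and 2. Neither directional model can express a link pattern like that on its own.
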
Scientific paper: "Guiding Statistical Word Alignment Models With Prior Knowledge"

... candidate. This information is derived before word alignment model training and will act as soft constraints that need to be respected during training and alignments. For a given word pair, the ... are used to guide word alignment model training for each iteration. The BLEU score and TER with this constraint are shown in the line "BiLSA-1" of Table 1. To exploit word alignment statistics ... as 1. In building word alignment models, a special "NULL" word is usually introduced to address target words that align to no source words. Since this physically non-existing word is not in the...

Uploaded: 20/02/2014, 12:20

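The prior-knowledge snippet mentions the special "NULL" word. Statistical alignment models add this word so that a target word can align to no real source word. The sketch below shows how the NULL word is commonly handled in IBM Model 1 EM training (Brown et al., 1993). It assumes tokenized sentence pairs and uniform initialization. It does not implement the paper's constrained training. The function name `train_model1` and the `"NULL"` token string are illustrative.

```python
from collections import defaultdict

NULL = "NULL"  # hypothetical token standing for the empty source word


def train_model1(pairs, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(f | e) (sketch).

    pairs: list of (source_tokens, target_tokens). Every target word f is
    assumed to be generated by one source word e or by the NULL word.
    Returns a dict mapping (f, e) to t(f | e).
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        counts = defaultdict(float)  # expected count of (f, e)
        totals = defaultdict(float)  # expected count of e
        for src, tgt in pairs:
            src_with_null = [NULL] + list(src)  # NULL sits at position 0
            for f in tgt:
                # E-step: posterior that each source word (or NULL) generated f.
                norm = sum(t[(f, e)] for e in src_with_null)
                for e in src_with_null:
                    p = t[(f, e)] / norm
                    counts[(f, e)] += p
                    totals[e] += p
        # M-step: renormalize the expected counts for each source word.
        t = defaultdict(float, {(f, e): c / totals[e] for (f, e), c in counts.items()})
    return dict(t)


pairs = [(["das", "Haus"], ["the", "house"]),
         (["das", "Buch"], ["the", "book"]),
         (["ein", "Buch"], ["a", "book"])]
t = train_model1(pairs)
print(round(t[("the", "das")], 3), round(t[("the", "NULL")], 3))
```

Because NULL is prepended to every source sentence, it competes with the real words for every target word. Target words that no source word explains well can therefore collect probability mass under NULL.
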
Scientific paper: "Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment"

... a family of word alignments. Definition 1. The ITG alignment family is a set of word alignments that has at least one BTG derivation. The ITG alignment family is only a subset of word alignments because ... ambiguity in word alignment is the case where two or more derivations $d_1, d_2, \ldots, d_k$ of $G$ have the same underlying word alignment $A$. A grammar $G$ is non-spurious if for any given word alignment, ... PA, USA. Association for Computational Linguistics. Aria Haghighi, John Blitzer, and Dan Klein. 2009. Better word alignments with supervised ITG models. In Association for Computational Linguistics, ...

Uploaded: 07/03/2014, 22:20

Scientific paper: "Data Cleaning for Word Alignment"

... in its word alignment. Secondly, the aforementioned phrase alignment (Marcu and Wong, 2002) considers the n:m mapping directly bilingually generated by some concepts without word alignment. ... two word alignments as an alignment point, 2) add new alignment points that exist in the union with the constraint that a new alignment point connects at least one previously unaligned word, ... are less than 20 percent. 2 1:n Word Alignment. Our discussion of uni-directional alignments of word alignment is limited to IBM Model 4. Definition 1 (Word alignment task). Let $e_i$ be the i-th...

Uploaded: 08/03/2014, 01:20

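The data-cleaning snippet outlines a two-step symmetrization heuristic:

1. Keep the points that both directional alignments share.
2. Add points from the union only if they connect at least one previously unaligned word.

The sketch below is a simplified version of that heuristic. It assumes both inputs are sets of (source, target) index pairs. It omits any further conditions that the snippet elides. The order in which candidate points are visited is an arbitrary assumption, and the name `refine` is hypothetical.

```python
def refine(src_to_tgt, tgt_to_src):
    """Intersection-then-grow symmetrization (simplified sketch).

    Both inputs are sets of (i, j) links in (source, target) order.
    Step 1 keeps the links found in both directions.
    Step 2 adds a union link only if it connects at least one word
    that is still unaligned when the link is considered.
    """
    alignment = set(src_to_tgt & tgt_to_src)
    aligned_src = {i for i, _ in alignment}
    aligned_tgt = {j for _, j in alignment}
    # Sorted visiting order keeps the result deterministic (an assumption).
    for i, j in sorted((src_to_tgt | tgt_to_src) - alignment):
        if i not in aligned_src or j not in aligned_tgt:
            alignment.add((i, j))
            aligned_src.add(i)
            aligned_tgt.add(j)
    return alignment


print(sorted(refine({(0, 0), (1, 1), (2, 1)}, {(0, 0), (1, 1), (2, 2)})))
# [(0, 0), (1, 1), (2, 1), (2, 2)]
```

A union point such as (0, 1) is rejected when both source word 0 and target word 1 are already aligned. This keeps the result closer to the precise intersection while still covering unaligned words.
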
Scientific paper: "Soft Syntactic Constraints for Word Alignment through Discriminative Training"

... bilingual word alignment finds word-to-word connections across languages. Originally introduced as a byproduct of training statistical translation models in (Brown et al., 1993), word alignment ... this formalism: $A \rightarrow [AA] \mid \langle AA \rangle \mid e/f$. This grammar enforces its own weak cohesion constraint: for every possible alignment, a corresponding binary constituency tree must exist for which the alignment ... new information resulting in improved alignments. 2 Constrained Alignment. Let an alignment be the complete structure that connects two parallel sentences, and a link be one of the word-to-word...

Uploaded: 08/03/2014, 02:21

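The bracketing grammar in the soft-constraints snippet has a straight rule [AA] and an inverted rule ⟨AA⟩. It can only produce reorderings that are built by repeatedly combining adjacent blocks in straight or inverted order. For the simple case of a one-to-one alignment, that is, a permutation of target positions, a standard shift-reduce test checks whether such a binary tree exists. The sketch below implements that test. It is not the paper's discriminative training method, and the function name is hypothetical.

```python
def is_btg_permutation(perm):
    """Return True if `perm` can be derived by a binary bracketing grammar.

    perm[i] is the target position aligned to source position i
    (one-to-one, so perm is a permutation of 0..n-1).
    Each stack entry is a (low, high) block of contiguous target positions.
    Two adjacent blocks are merged whenever their target ranges touch,
    which corresponds to a straight [AA] or inverted <AA> rule.
    """
    stack = []
    for t in perm:
        stack.append((t, t))  # shift: one word forms a block
        # Reduce while the two topmost blocks cover a contiguous target span.
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # straight or inverted merge
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) <= 1  # one block means a full binary tree exists


print(is_btg_permutation([1, 0, 2]))     # True
print(is_btg_permutation([1, 3, 0, 2]))  # False: the "inside-out" pattern
```

The two four-word patterns that such a grammar cannot generate are 2413 and 3142, written 0-based here as [1, 3, 0, 2] and [2, 0, 3, 1]. This is the usual example of what the cohesion constraint rules out.
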
Scientific paper: "Using Similarity Scoring To Improve the Bilingual Dictionary for Word Alignment"

... clustering. Those words that are considered for clustering should account for more than of the cooccurrences of the source language word with any target language word. If a word falls below ... the target language words that cooccur with source language word . Similarly to the most frequent words, dictionary scores for word pairs that are too rare for clustering remain unchanged. ... threshold . enforces that all words within one cluster must have an average similarity score of at least . The second threshold, , enforces that only certain words are considered for clustering....

Uploaded: 08/03/2014, 07:20

Scientific paper: "Combining Clues for Word Alignment"

... 0.72 0 opened 0.2 0.86 0 The matrix is simply filled with all values of combined clues for each word pair. For example, the total clue value for the word pair s = "baggage" and t = "handbagaget" ... find the word-to-word relation with the highest value in the matrix to zero. 2. Check for overlaps: If the link overlaps with other links from more than one accepted link cluster, continue with ... bilingual lexical information. Word alignment approaches focus on the automatic identification of translation relations in translated texts. Alignments are usually represented as a set of links between words and...

Uploaded: 08/03/2014, 21:20

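The clue-alignment snippet fills a matrix with combined clue values for every word pair and then repeatedly takes the highest-valued cell. The sketch below rests on two assumptions, neither confirmed by the snippet:

- Clues are combined as a noisy-OR, $v = 1 - \prod_k (1 - c_k)$. This treats the clues as independent pieces of evidence.
- A simplified one-to-one linking rule stands in for the paper's overlap and link-cluster checks, which the snippet elides.

The function names and the threshold value are hypothetical.

```python
def combine_clues(clue_matrices):
    """Noisy-OR combination of several clue matrices (lists of rows)."""
    rows, cols = len(clue_matrices[0]), len(clue_matrices[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            miss = 1.0
            for m in clue_matrices:
                miss *= 1.0 - m[r][c]  # chance that this clue gives no evidence
            combined[r][c] = 1.0 - miss
    return combined


def greedy_links(matrix, threshold=0.1):
    """Repeatedly accept the best remaining cell (simplified: one-to-one)."""
    if not matrix or not matrix[0]:
        return []
    m = [row[:] for row in matrix]  # work on a copy
    links = []
    while True:
        best, r, c = max((v, i, j) for i, row in enumerate(m) for j, v in enumerate(row))
        if best <= 0.0 or best < threshold:
            break
        links.append((r, c))
        # Zero out the row and column so each word is linked at most once.
        for j in range(len(m[r])):
            m[r][j] = 0.0
        for row in m:
            row[c] = 0.0
    return links


clue_a = [[0.5, 0.0], [0.1, 0.4]]  # e.g. a string-similarity clue (toy values)
clue_b = [[0.5, 0.2], [0.0, 0.5]]  # e.g. a co-occurrence clue (toy values)
print(greedy_links(combine_clues([clue_a, clue_b])))  # [(0, 0), (1, 1)]
```

Under this combination, a word pair supported by several weak clues can still score highly. No single clue has to be decisive.
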
Scientific paper: "Unsupervised Word Alignment with Arbitrary Features"

... evaluation exercise for word alignment. In Proc. of the Workshop on Building and Using Parallel Texts. R. C. Moore. 2005. A discriminative framework for bilingual word alignment. In Proc. of ... Czech-English poses problems for word alignment models since, unlike English, Czech words have a complex inflectional morphology, and the syntax permits relatively free word order. For this language ... their performance in a translation system. Since we only have gold alignments for Czech-English (Bojar and Prokopová, 2006), we can report alignment error rate (AER; Och and Ney, 2003) only for...

Uploaded: 17/03/2014, 00:20

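The Czech-English snippet reports alignment error rate (AER; Och and Ney, 2003). The definition uses three sets:

- $A$: the predicted alignment.
- $S$: the gold links annotated as sure.
- $P$: the gold links annotated as possible, with $S \subseteq P$.

AER is then $1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}$. The sketch below assumes alignments are sets of (source index, target index) pairs. The function name `aer` is hypothetical.

```python
def aer(predicted, sure, possible):
    """Alignment error rate (Och and Ney, 2003), sketch.

    predicted: set of (i, j) links proposed by the aligner (A).
    sure: gold links annotated as sure (S).
    possible: gold links annotated as possible (P).
    """
    possible = possible | sure  # enforce S subset of P, as the definition assumes
    if not predicted and not sure:
        return 0.0  # nothing predicted and nothing required
    overlap = len(predicted & sure) + len(predicted & possible)
    return 1.0 - overlap / (len(predicted) + len(sure))


gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 2)}
print(aer({(0, 0), (1, 2)}, gold_sure, gold_possible))  # 0.5
```

Predicting a link that is only in $P$ is not penalized. Every sure link that is missed raises the error, because $|S|$ appears in the denominator.
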
Scientific paper: "Confidence Measure for Word Alignment"

... based on word alignment. In this paper we introduce a confidence measure for word alignment, which is robust to extra or missing words in the bilingual sentence pairs, as well as word alignment ... the same word does increase the confusion for word alignment and reduce the link confidence. On the other hand, additional information (such as the distance of the word pair, the alignment ... confidence sentence alignments and alignment links from multiple word alignments of the same sentence pair. Additionally, we remove low confidence alignment links from the word alignment of a bilingual...

Uploaded: 17/03/2014, 01:20

Scientific paper: "Optimizing Word Alignment Combination For Phrase Table Training"

... Translation results (BLEU score) with phrase tables trained with different word alignment combination methods. 4 Conclusions. We presented a simple yet effective method for word alignment symmetrization ... syntax-based translation framework. Most word alignment models distinguish translation direction in deriving word alignment matrix. Given a parallel sentence, word alignments in two directions are ... Statistics of word alignment set and the resulting phrase table size (number of entries in thousands (K)) with different combination methods. 3.2 Translation Results. The ultimate goal of word alignment...

Uploaded: 17/03/2014, 02:20

Scientific paper: "Bilingual Topic AdMixture Models for Word Alignment"

... and document-pair. Formally, we define the following terms: • A word-pair $(f_j, e_i)$ is the basic unit for word alignment, where $f_j$ is a French word and $e_i$ is an English word; j and i are ... corpus-level word-correlation and contextual-level topical information may help to disambiguate translation candidates and word-alignment choices. For example, the most frequent source words (e.g., ... improve alignment. 3.3 BiTAM-3: Word-level Admixture. It is straightforward to extend the sentence-level BiTAM-1 to a word-level admixture model, by sampling topic indicator $z_{n,j}$ for each word-pair ($f_j$, ...

Uploaded: 17/03/2014, 04:20
