... therefore cannot cover fresh words or new usages of existing words. Secondly, their search functions are often limited, making it hard for users to effectively find information ... definition of a word/phrase; 2) retrieve example sentences using keywords, POS tags or collocations; and 3) get the translation of a word/phrase/sentence. While Engkoo is currently buil... (Footnote 1: http://www.engkoo.com.)
Uploaded: 20/02/2014, 05:20
... language is available for download (download.wikimedia.org) in a text format suitable for inclusion in a database. For the remainder of this paper, we refer to this format. 1 Within Wikipedia, ... language article, if available, for additional information. • A second pass checks for multi-word phrases that exist as titles of Wikipedia articles. • We look for certain t...
Uploaded: 20/02/2014, 09:20
Scientific report: "Mining the Web for Bilingual Text"
... and formally evaluating performance. The most recent end-product is an automatically acquired parallel corpus comprising 2491 English-French document pairs, approximately 1.5 million words ... world changes. For example, Diekema et al., in a presentation at the 1998 TREC-7 conference (Voorhees and Harman, 1998), observed that the performance of their cross-language informati...
Uploaded: 08/03/2014, 06:20
Document: Scientific report: "A Gibbs Sampler for Phrasal Synchronous Grammar Induction"
... $p_{\text{null}} = 10^{-10}$ for this value in the experiments we report below. ... either $\phi^{P}_{z_i}$ for phrase pairs or $\phi^{\text{null}}$ for single language phrases. We choose Dirichlet process (DP) priors for these parameters: $\phi^{P}_{z_i} \sim$ ... undesirable. Word-based models are incapable of learning translational equivalences between non-compositional phrasal units, while the algorithms used for indu...
Uploaded: 20/02/2014, 07:20
Document: Scientific report: "An expressive formalism for describing tree-based grammars"
... tree description and/or of semantic formulas. The XMG formalism furthermore supports the sharing of identifiers across dimensions, hence allowing for a straightforward encoding of the syntax/semantics ... framework for the processing of linguistic meta-descriptions. 1 Introduction It is well known that grammar engineering is a complex task and that factorizing grammar information...
Uploaded: 22/02/2014, 02:20
Scientific report: "Improving Decoding Generalization for Tree-to-String Translation"
... the most representative work is the forest-based translation method (Mi et al. 2008; Mi and Huang 2008; Zhang et al. 2009) in which a packed forest (forest for short) structure is used to effectively ... Association for Computational Linguistics: short papers, pages 418–423, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics Improving Decoding Genera...
Uploaded: 17/03/2014, 00:20
Scientific report: "Soft Syntactic Constraints for Hierarchical Phrased-Based Translation"
... syntactic constraints; for example, AdvP, the top performer in Arabic, cannot possibly perform well for Chinese, since in our parses the AdvP constituents rarely include more than a single word. At the ... rules of the grammar directly from word-aligned parallel text. Rules have the form X → ⟨ē, f̄⟩, where ē and f̄ are phrases containing terminal symbols (words) and possibly co...
Uploaded: 17/03/2014, 02:20
Scientific report: "A Probabilistic Model for Canonicalizing Named Entity Mentions"
... 1 for clarity. We sample the column index c_1 for the first word in the mention, marginalizing out probabilities of other words in the mention. After we sample the column index for the first word, ... features f for our featurized log-linear distribution (§3.1.2). We then downcased all words in mentions for the purpose of defining the table and the mention words w. Ten context wor...
Uploaded: 23/03/2014, 14:20
Scientific report: "Alignment Model Adaptation for Domain-Specific Word Alignment"
... perform statistical word alignment. We use and to represent the bi-directional alignment sets, which are shown in Equations (4) and (5). For alignment in both sets, we use j for source words ... http://www.fjoch.com/GIZA++.html. In the following subsections, we will perform linear interpolation for word alignment in the source-to-target direction. For the word alignment...
Uploaded: 31/03/2014, 03:20
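The linear interpolation mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes two translation probability tables, one estimated on general data and one on in-domain data, mixed with a single weight. The function name `interpolate`, the table layout, the weight `lam`, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# (source word, target word); this table layout is an assumption.
Pair = Tuple[str, str]


def interpolate(p_general: Dict[Pair, float],
                p_domain: Dict[Pair, float],
                lam: float) -> Dict[Pair, float]:
    """Return lam * p_domain + (1 - lam) * p_general for every word pair."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # A pair missing from one table contributes probability 0 from that table.
    pairs = set(p_general) | set(p_domain)
    return {
        pair: lam * p_domain.get(pair, 0.0)
        + (1.0 - lam) * p_general.get(pair, 0.0)
        for pair in pairs
    }


# Toy tables (invented numbers, for illustration only).
p_general = {("bank", "banque"): 0.6, ("bank", "rive"): 0.4}
p_domain = {("bank", "banque"): 0.9, ("bank", "rive"): 0.1}
mixed = interpolate(p_general, p_domain, lam=0.5)
```

Because both inputs are normalized over the same source word, the mixture is also normalized; in practice the weight would typically be tuned on held-out data.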
Scientific report: "An Efficient Method for Determining Bilingual Word Classes"
... is met: FOR EACH word e: FOR EACH class E: [Determine the change of LP((E, 9v), rig) if e is moved to E. Move e to the class with the largest improvement. FOR EACH word f: FOR EACH ... and Wu, 1995)). We define bilingual word clustering as the process of forming corresponding word classes suitable for machine translation purposes for a pair of languages usin...
Uploaded: 31/03/2014, 21:20
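The exchange loop quoted in the excerpt above (try every class for each word, then move the word to the class with the largest improvement) can be sketched as follows. This is a one-sided illustration only: the paper's actual optimization criterion is not recoverable from the excerpt, so the objective is passed in as a function, and `exchange_clustering`, `toy_score`, and the toy vocabulary are hypothetical names and data.

```python
from itertools import combinations
from typing import Callable, Dict, List


def exchange_clustering(words: List[str],
                        classes: List[int],
                        assignment: Dict[str, int],
                        score: Callable[[Dict[str, int]], float],
                        max_iters: int = 10) -> Dict[str, int]:
    """Greedily move each word to the class that most improves `score`."""
    assignment = dict(assignment)  # work on a copy
    for _ in range(max_iters):
        moved = False
        for w in words:
            original = assignment[w]
            best_class, best_score = original, score(assignment)
            for c in classes:
                if c == original:
                    continue
                assignment[w] = c  # tentatively move w to class c
                s = score(assignment)
                if s > best_score:
                    best_class, best_score = c, s
            assignment[w] = best_class  # commit the best move, or restore
            moved = moved or best_class != original
        if not moved:  # stopping criterion: a full pass with no change
            break
    return assignment


def toy_score(assignment: Dict[str, int]) -> float:
    """Stand-in objective: reward same-class pairs sharing a first letter."""
    total = 0.0
    for a, b in combinations(sorted(assignment), 2):
        if assignment[a] == assignment[b]:
            total += 1.0 if a[0] == b[0] else -1.0
    return total


words = ["cat", "cow", "dog", "duck"]
start = {"cat": 0, "cow": 1, "dog": 0, "duck": 1}
result = exchange_clustering(words, [0, 1], start, toy_score)
```

Recomputing the full score for every tentative move keeps the sketch short; an efficient implementation would instead update only the counts affected by the move, which is presumably what "Determine the change" in the excerpt refers to.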