Scientific report: "Feature-based Method for Document Alignment in Comparable News Corpora" ppt
... http://www.straitstimes.com/ an English news agency in Singapore. Source © Singapore Press Holdings Ltd. 3 http://www.zaobao.com/ a Chinese news agency in Singapore. Source © Singapore Press Holdings Ltd. 4 http://cyberita.asia1.com.sg/ ... Linguistics Feature-based Method for Document Alignment in Comparable News Corpora Thuy Vu, Ai Ti Aw, Min Zhang Departmen...
Uploaded: 24/03/2014, 03:20
... the string including errors from the String-Database (the former string is referred to as the Similar-String, and the latter as the Error-String). Finally, the correction is made using the ... K (2 in the experiment) characters before and after an error-block in the Error-String, are found in the Similar-String, take out the string (denoted C) between A and B in 1 For detect...
Uploaded: 20/02/2014, 18:20
... gasolines on newer engines.” In a common dataset for NP chunking, the word “reformulated” never appears in the training data, but appears four times in the test set as part of the NP “reformulated ... the increased performance by the HMM-smoothed model on the rare-word subset contributes in part to an increase in performance on the overall dataset of 1% for tagging and 3%...
Uploaded: 17/03/2014, 01:20
Scientific report: "Pivot Approach for Extracting Paraphrase Patterns from Bilingual Corpora" ppt
... phrasal paraphrases from bilingual corpora. Our method involves three steps: (1) corpus preprocessing, including English monolingual dependency parsing and English-foreign language word alignment, ... patterns extracted using their method. However, the performance of their method is dependent on the hand-crafted queries for web mining. Shinyama et al. (2002) presented a meth...
Uploaded: 31/03/2014, 00:20
Document: Scientific report: "A Method for Measuring Machine Translation Confidence" docx
... three novel feature sets including source side information, alignment context, and dependency structures. Experimental results show that by combining the source side information, alignment context, ... derived from a window of four words. Combining alignment context with POS tags: Instead of using lexical context we have features to look at source and target POS alignment contex...
Uploaded: 20/02/2014, 04:20
Scientific report: "A Method for Relating Multiple Newspaper Articles by Using Graphs, and Its Application to Webcasting" pptx
... new information in each article. The threading technique is suitable for Webcasting (push) applications. A threading server determines relationships among articles from various news ... have links, or else must be manually linked at a high cost in terms of time and effort. This paper describes methods for relating newspaper articles automatically, and its application...
Uploaded: 08/03/2014, 06:20
Scientific report: "A Method for Word Sense Disambiguation of Unrestricted Text" potx
... adverbs and adjectives in a text, using the senses provided in WordNet. The senses are ranked using two sources of information: (1) the Internet for gathering statistics for word-word co- ... words in the similarity lists of the noun report are: (investigate-report, investigate-study) (investigate-report, investigate-news report, investigate-story, investigate-accou...
Uploaded: 08/03/2014, 06:20
Scientific report: "a Method for Automatic Evaluation of Machine Translation" pot
... the baseline metric in detail. In Section 3, we evaluate the performance of BLEU. In Section 4, we describe a human evaluation experiment. In Section 5, we compare our baseline metric performance ... ample signal in any single n-gram precision, it is more robust to combine all these signals into a single number metric. 2.1.3 Combining the modified n-gram precisions How should we co...
Uploaded: 23/03/2014, 20:20
Scientific report: "A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora" doc
... scalable mining method, called MINT (MIning Named-entity Transliteration equivalents), for mining of NETEs from large comparable corpora. MINT addresses several challenges in mining NETEs ... the world. The MINT method proposed in this paper addresses all the above issues. 3 The MINT Mining Method MINT has two stages. In the first stage, for every documen...
Uploaded: 24/03/2014, 03:20
Document: Scientific report: "A PROGRAM FOR ALIGNING SENTENCES IN BILINGUAL CORPORA" docx
... constructing a probabilistic dictionary (Table 3) for use in aligning words in machine translation (Brown et al., 1990), or for constructing a bilingual concordance (Table 4) for use in lexicography ... Crossing dependencies are possible in the latter, but not in the former. Table 1: Input to Alignment Program English According to our survey, 1988 sales of mineral...
Uploaded: 20/02/2014, 21:20