Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge" doc
... 2012. © 2012 Association for Computational Linguistics Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge Ivan Vulić and Marie-Francine Moens Department ... topic model on document-aligned comparable corpora and introduce different methods for identifying word translations across languages, underpinned by per-topi...
Uploaded: 08/03/2014, 21:20
... evaluation set is derived from WordNet in a semi-supervised way. Graph connectivity measures are employed for unsupervised parameter tuning. 1 Introduction and related work Multi-word expressions ... sequences of words that tend to co-occur more frequently than chance and are either idiosyncratic or decomposable into multiple simple words (Baldwin, 2006). Deciding idiomaticity of MW...
Uploaded: 23/03/2014, 17:20
... same with any other phrase vertex in G, then the paraphrases will be captured. The transition probability from any vertex u to any other vertex v in G, i.e., the probability of hopping from u ... identifying similar words on the graph of WordNet (Rao et al., 2008) and a related measure, the hitting time, is known to perform well in harvesting paraphrases on a graph constructed f...
Uploaded: 17/03/2014, 22:20
Scientific report: Nautilin-63, a novel acidic glycoprotein from the shell nacre of Nautilus macromphalus doc
... extracted from the nacre of the cephalopod N. macromphalus [34]. In particular, we obtained approximately 40 short sequences of different shell proteins, both extracted from the acid-soluble and from ... determined by monitoring the pH decrease (Fig. 3). In the blank experiment (without sample), the pH decreased without any time lag (approximately 120 s), corresponding to the rap...
Uploaded: 22/03/2014, 16:20
Scientific report: "Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages" docx
... endings from words in English. For Pashto, we utilize a morphological decomposition algorithm that has been shown to be effective for Arabic speech recognition (Xiang et al., 2006). We start from ... suffixes are stripped off from the Pashto words under the two constraints: (1) Longest matched affixes first; (2) Remaining stem must be at least two characters long. 2.3 Partial Word For low-...
Uploaded: 23/03/2014, 16:20
Scientific report: "Simultaneous Tokenization and Part-of-Speech Tagging for Arabic without a Morphological Analyzer" doc
... regexes. If it text-matches any closed-class expression, we pick a random choice from among those regexes and otherwise from the open-class regexes that it text-matches. Any POS ambiguities for a ... testing, we run each word through all the open and closed regexes. Text-matches for an open-class regex give rise to features as just described. Also, if the word matches any clo...
Uploaded: 30/03/2014, 21:20
Scientific report: "Pivot Approach for Extracting Paraphrase Patterns from Bilingual Corpora" ppt
... that word alignment error is the major factor that influences the performance of the methods learning paraphrases from bilingual corpora. The LW based features validate the quality of word alignment ... induced from S_E, we extract the pivot pattern P_C aligning to P_E(e) as in Algorithm 2. Note that the Chinese patterns are not extracted from parse trees. They are only sequence...
Uploaded: 31/03/2014, 00:20
Scientific report: "Bilingual Terminology Mining – Using Brain, not brawn comparable corpora" ppt
... features of the harvested comparable corpora: the number of documents, and the number of words for each language and each type of discourse. French Japanese doc. words doc. words Scientific 65 425,781 ... 538 807,287 Table 2: Comparable corpora statistics From these documents, we created two comparable corpora: scientific corpora, composed only of scientific documents; mixe...
Uploaded: 31/03/2014, 01:20
Document: Scientific report: "Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents" doc
... exploits dictionaries and phrase tables extracted from bilingual parallel corpora to determine the number of word sequences in H that can be mapped to word sequences in T. In this way a semantic ... loss in precision. Like lexical phrase tables, SPTs are extracted from parallel corpora. As a first step we annotate the parallel corpora with named-entity taggers for the source and t...
Uploaded: 19/02/2014, 19:20
Document: Scientific report: "Collecting Highly Parallel Data for Paraphrase Evaluation" doc
... Linguistics (COLING-2008). Chris Callison-Burch. 2008. Syntactic constraints on paraphrases extracted from parallel corpora. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language ... 7(1):1–29. Ali Ibrahim, Boris Katz, and Jimmy Lin. 2003. Extracting structural paraphrases from aligned monolingual corpora. In Proceedings of the 41st Annual Meeting of the A...
Uploaded: 20/02/2014, 04:20