Scientific paper: "Bilingual Terminology Mining – Using Brain, not brawn comparable corpora"

... Computational Linguistics, pages 664–671, Prague, Czech Republic, June 2007. © 2007 Association for Computational Linguistics. Bilingual Terminology Mining – Using Brain, not brawn comparable corpora. E. Morin, ... Japan kyo@p.u-tokyo.ac.jp. Abstract: Current research in text mining favours the quantity of texts over their quality. But for bilingual terminology mining,...

Uploaded: 31/03/2014, 01:20

Scientific paper: "Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval"

... for Japanese-English language pair, especially if involving the comparable corpora. Re-scoring through the Comparable Corpora: Comparable corpora could be considered for the disambiguation of ... non-aligned comparable corpora, phrasal translation as well as evaluations on Cross-Language Information Retrieval. A two-stages translation model is proposed for the acquisition of bili...

Uploaded: 20/02/2014, 16:20

Scientific paper: "Word Sense Disambiguation using lexical cohesion in the context"

... 2006 Main Conference Poster Sessions, pages 929–936, Sydney, July 2006. © 2006 Association for Computational Linguistics. Word Sense Disambiguation using lexical cohesion in the context. Dongqiang ... cross-connecting words from different domains or POS tags. It should be noted that WordNet 2.0 makes some efforts to interrelate nouns and verbs using their derived lexical forms, p...

Uploaded: 08/03/2014, 02:21

Scientific paper: "Word Sense Disambiguation using Optimised Combinations of Knowledge Sources"

... so our performance cannot be easily compared with theirs. Mahesh et. al. claim high levels of sense tagging accuracy (about 89%), but our results are not directly comparable since its authors ... novelty, not just one of the difficulty of discrimination. If that is the case, it tends to undermine the standard mark-up-model-and-test methodology of most recent NLP, since it wi...

Uploaded: 17/03/2014, 07:20

Scientific paper: "Cohesion and Collocation: Using Context Vectors in Text Segmentation"

... words not in the training corpus (personal names, rare terminology etc.) that ties text together. Such cases pose no challenge to the string-based system, but the VecTile system cannot ... The best solution might be a hybrid system with a backup procedure for unknown words. Another point to note is how well the much simpler TextTile system compares. Indeed, a close look at ...

Uploaded: 17/03/2014, 07:20

Scientific paper: "Named Entity Recognition using an HMM-based Chunk Tagger"

... IdentiFinder, which models the original process that generates the NE-class annotated words from the original NE tags. Another difference is that our model assumes mutual information independence ... $t_{i-1}$ and $t_i$ (Column: $BC_{i-1}$ in $t_{i-1}$; Row: $BC_i$ in $t_i$). 3 Determining Word Feature: As stated above, a token is denoted as an ordered pair of word-feature and word itself: ...

Uploaded: 17/03/2014, 08:20

Scientific paper: "Parsing the WSJ using CCG and Log-Linear Models"

... Hockenmaier does not use a supertagger, but does use a beam search. Parsing the 2,401 sentences in section 23 takes 1.6 minutes using the normal-form model, and 10.5 minutes using the dependency ... derivations is not possible (at least for wide-coverage automatically extracted grammars). Clark and Curran (2003) show how the sum over the complete derivation space can be performed ef-...

Uploaded: 23/03/2014, 19:20

Scientific paper: "Creating a Multilingual Collocation Dictionary from Large Text Corpora"

... multi-word expressions is widely recognized in the domains of translation and terminology. These expressions can usually not be translated literally, and one must find adequate correspondences in ... identified by the extraction system. Otherwise, if the two terms do not belong to the same chunk, it will be missed. We did not assess yet the number of missed cooccurrences, but we esti...

Uploaded: 08/03/2014, 21:20

Scientific paper: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge"

... Natural Language Processing, pages 880–889. Emmanuel Morin, Béatrice Daille, Koichi Takeuchi, and Kyo Kageura. 2007. Bilingual terminology mining - using brain, not brawn comparable corpora. In Proceedings ... International Conference on Computational Linguistics, pages 1–7. Mona T. Diab and Steve Finch. 2000. A statistical translation model using comparable co...

Uploaded: 08/03/2014, 21:20

Scientific paper: "Learning Translations of Named-Entity Phrases from Parallel Corpora"

... target language corpus may be annotated as constituting lexical compounds, which may or may not include the translations of the source language phrases of interest. Otherwise there is no annotation of the target ... score than another candidate translation that does include that particular target language word. While this is not actually a generative model, the probabilities being combi...

Uploaded: 17/03/2014, 22:20