comparable corpora and phrasal

Document: Scientific report: "Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval"

... involving the comparable corpora. Re-scoring through the Comparable Corpora: Comparable corpora could be considered for the disambiguation of translation alternatives and thus selection of best phrasal ... with 10.35%, 8.27% and 3.08% for the WWW-based, the NTCIR-based and comparable corpora-based techniques, respectively, compared to the hybrid two-stages comparable corpora and linguistics-based ... (1998-1999) for Japanese and Mainichi Daily News (1998-1999) for English were considered as comparable corpora. We have also considered documents of NTCIR-2 test collection as comparable corpora in order...

Uploaded: 20/02/2014, 16:20


Scientific report: "Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization"

... WordNet) and Collins bilingual dictionary. Section 5 evaluates our methodologies and Section 6 concludes the paper suggesting some future developments. 2 Comparable Corpora Comparable corpora ... t i z ∈ T ∗ and for every language L j , and is known, then the corpus is parallel and aligned at document level. For the purpose of this paper it is enough to assume that two corpora are comparable, ... Italian and 27,821 English news partitioned by AdnKronos into four fixed categories: QUALITY OF LIFE, MADE IN ITALY, TOURISM, CULTURE AND SCHOOL. The English and the Italian corpora are comparable, ...

Uploaded: 17/03/2014, 04:20


Scientific report: "Using Bilingual Comparable Corpora and Semi-supervised Clustering for Topic Tracking"

... Research, and International Communications Foundation. References J. Allan and R. Papka and V. Lavrenko, On-line new event detection and tracking, Proc. of the DARPA Workshop, 1998. J. Allan and V. Lavrenko ... chose the TDT3 English corpora as our gold standard corpora. TDT3 consists of 34,600 stories with 60 manually identified topics. We then created Japanese corpora (Mainichi and Yomiuri newspapers) ... stories, and tracking. 3.1 Extracting Bilingual Story Pairs We extract story pairs which consist of positive English story and its associated Japanese stories using the TDT English and Mainichi and...

Uploaded: 17/03/2014, 04:20


Document: Scientific report: "Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora"

... (Cucerzan and Yarowsky, 1999) and (Collins and Singer, 1999) present algorithms to obtain NEs from untagged corpora. However, they focus on the classification stage of already segmented entities, and ... transliteration candidates in another language. Time sequence scoring is then used to re-rank the list and choose the candidate best temporally aligned with the NE. Pairs of NEs and the best candidates ... pairs of English NEs and their Russian transliterations. Negative examples here and during the rest of the training were pairs of randomly selected non-NE English and Russian words. New...

Uploaded: 20/02/2014, 12:20


Scientific report: "Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora"

... terms and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. ... from parallel corpora by means of symmetrical word alignment and/or by phrase generation (Koehn et al., 2003). Our toolkit exploits comparable corpora in order to find and extract comparable ... extraction from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction of parallel sentences and (2) extraction and bilingual...

Uploaded: 16/03/2014, 20:20


Scientific report: "A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora"

... 1: Comparable Corpora The corpora can be categorized into two separate groups, group S (for Small) consisting of EK-S, ET-S, ER-S, and EH-S and group L (for Large) consisting of EK-L and ... language and the pairing of articles in the comparable corpora is known in advance. We want to emphasize here that such corpora are indeed available in many domains such as technical documents and ... $P(t_1^m \mid s_1^n) = \sum_{A} \prod_{j=1}^{m} p(a_j \mid a_{j-1}, s_{a_{j-1}})\, p(t_j \mid s_{a_j}, t_{j-1})$ Here, $t_j$ (and resp. $s_i$) denotes the $j$-th (and resp. $i$-th) character in $w_T$ (and resp. $w_S$) and $A = a_1^m$ is the hidden alignment between $w_T$ and $w_S$ where $t_j$ is aligned...

Uploaded: 24/03/2014, 03:20


Document: Scientific report: "Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora"

... Acquisition from Comparable Corpora Takehito Utsuro, Takashi Horiuchi and Kohei Hino Graduate School of Informatics, Takeshi Hamamoto and Takeaki Nakayama Kyoto University Dpt. Information and Computer ... sequences, and word frequency vectors $v(d_J)$ and $v(d_E^{MT})$ are generated. Then, cosine similarities between $v(d_J)$ and $v(d_E^{MT})$ are calculated and pairs of articles $d_J$ and $d_E$ ... help of any existing bilingual lexicons. On the other hand, later works such as Kaji and Aizono (1996), Fung and Yee (1998), Rapp (1999), and Tanaka (2002) studied to exploit existing bilingual lexicons...

Uploaded: 22/02/2014, 02:20


Commonly-Used Idioms, Sayings and phrasal verbs

... The taxes increased across the board and everyone must pay more. act high and mighty - to act proud and powerful The woman always acts high and mighty and nobody likes her. act one's ... energy The man stood up and belted out several old songs. B back Idioms back and forth - backwards and forwards, first one way and then the other way ... bite the hand that feeds you - to harm or turn against someone who does good things for you He is biting the hand that feeds him when he criticizes and fights against his boss....

Uploaded: 02/03/2014, 20:36


Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge"

... pair (isola, island). 2. Remove the words isola and island from their respective vocabularies. 3. Since island is not in the vocabulary, the indirect association between arcipelago and island is not ... on precision and recall of bilingual lexicon extraction from parallel corpora. This assumption should also be reasonable for many types of comparable corpora such as Wikipedia or news corpora, which are ... promising performance. 5 Conclusions and Future Work We have designed an algorithm that focuses on acquiring and keeping only highly confident translation candidates from multilingual comparable corpora. By employing...

Uploaded: 08/03/2014, 21:20


Scientific report: "Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases"

... including the joint probability phrase-based model (Marcu and Wong, 2002) and a variant on the alignment template approach (Och and Ney, 2004), and contrast them to the performance of the word-based ... pairs, and has 127 million words in English, and 106 million words in Arabic. The table shows the number of unique Arabic phrases, and gives the average number of translations into English and their ... corpus, and another created from the target language portion of the corpus, • An index that tells us the correspondence between sentence numbers and positions in the source and target language corpora, •...

Uploaded: 17/03/2014, 05:20


McGraw-Hill's Dictionary of American Idioms and Phrasal Verbs

... on me. backhanded compliment and left-handed compliment an unintended or ambiguous compliment. □ Backhanded compliments are the only kind he ever gives! □ And I think his left-handed compliments ... abbreviations and symbols are used, and these are explained in the section “Terms and Symbols.” The user who understands the meaning of entry head, variable, and wild card term is equipped to understand ... back and made a motion with his hand indicating that Mary should go first. “After you,” smiled Bob. again and again repeatedly; again and even more [times]. □ He knocked on the door again and...

Uploaded: 20/03/2014, 17:47


Scientific report: "Clustering Comparable Corpora For Bilingual Lexicon Extraction"

... and that of $P_2$ is 0.939. Both corpora are more comparable than $P_0$ of which the comparability is 0.881. Furthermore, both $P_1$ and $P_2$ are more comparable than $P_1'$ (comparability 0.912) and ... studies and is now standard. 3.2.2 Results and Analysis In a first series of experiments, bilingual lexicons were extracted from the corpora obtained by our approach ($P_1$ and $P_2$), the corpora ... English documents and 87k French documents) consisting of the corpora LAT94, MON94 and SDA94; $P_2^T$ (368k English documents and 378k French documents) consisting of Wiki-En and Wiki-Fr. 1 http://trec.nist.gov 2 http://www.clef-campaign.org 3 The...

Uploaded: 23/03/2014, 16:20


Scientific report: "Identifying Word Translations from Comparable Corpora Using Latent Topic Models"

... acquire translation candidates based on comparable and unrelated corpora comes from (Rapp, 1995). Similar approaches are described in (Diab and Finch, 2000), (Koehn and Knight, 2002) and (Gaussier et ... behind our work and gives an overview and a theoretical background of the methods. Section 4 evaluates and discusses initial results. Finally, section 5 proposes several extensions and gives a ... 479–484, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics Identifying Word Translations from Comparable Corpora Using Latent Topic Models Ivan Vulić, Wim De Smet and...

Uploaded: 23/03/2014, 16:20


Scientific report: "Extracting loanwords from Mongolian corpora and producing a Japanese-Mongolian bilingual dictionary"

... corpora (Myaeng and Jeong, 1999; Oh and Choi, 2001) use the phonetic differences between conventional Korean words and loanwords. However, these methods require manually tagged training corpora, ... or comparable corpus is not available, such as Mongolian and Japanese. Fujii et al. (2004) proposed a method that does not require tagged corpora or parallel corpora to extract loanwords and ... bilingual comparable corpora, and matched named entities in each language corpus if they were similar to each other. Thus, Lam et al.’s method cannot be used for a language pair where comparable corpora...

Uploaded: 23/03/2014, 18:20


Scientific report: "Using comparable corpora to solve problems difficult for human translators"

... several comparable corpora for English and Russian, including large reference corpora (the BNC and the Russian Reference Corpus) and corpora of major British and Russian newspapers. All corpora ... Barcelona. Michael Carl and Andy Way, editors. 2003. Recent advances in example-based machine translation. Kluwer, Dordrecht. Ido Dagan and Kenneth Church. 1997. Termight: Coordinating humans and machines ... corpus-based extraction and the very large lexicon. In Lars Borin, editor, Language and Computers, Parallel corpora, parallel worlds, pages 137–149. Rodopi. John S. Justeson and Slava M. Katz. 1995....