comparable corpora and phrasal

Scientific report document: "Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval" pptx

Scientific report

... involving the comparable corpora. Re-scoring through the Comparable Corpora Comparable corpora could be considered for the disambiguation of translation alternatives and thus selection of best phrasal ... with 10.35%, 8.27% and 3.08% for the WWW-based, the NTCIR-based and comparable corpora-based techniques, respectively compared to the hybrid two-stages comparable corpora and linguistics-based ... (1998-1999) for Japanese and Mainichi Daily News (1998-1999) for English were considered as comparable corpora. We have also considered documents of NTCIR-2 test collection as comparable corpora in order...

Scientific report: "Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization" potx

Scientific report

... WordNet) and Collins bilingual dictionary. Section 5 evaluates our methodologies and Section 6 concludes the paper suggesting some future developments. 2 Comparable Corpora Comparable corpora ... tiz∈ T∗ and for every language Lj, and is known, then the corpus is parallel and aligned at document level. For the purpose of this paper it is enough to assume that two corpora are comparable, ... Italian and 27,821 English news partitioned by AdnKronos into four fixed categories: QUALITY OF LIFE, MADE IN ITALY, TOURISM, CULTURE AND SCHOOL. The English and the Italian corpora are comparable, ...

Scientific report: "Using Bilingual Comparable Corpora and Semi-supervised Clustering for Topic Tracking" ppt

Scientific report

... Research, and International Communications Foundation. References J. Allan and R. Papka and V. Lavrenko, On-line new event detection and tracking, Proc. of the DARPA Workshop, 1998. J. Allan and V. Lavrenko ... chose the TDT3 English corpora as our gold standard corpora. TDT3 consists of 34,600 stories with 60 manually identified topics. We then created Japanese corpora (Mainichi and Yomiuri newspapers) ... stories, and tracking. 3.1 Extracting Bilingual Story Pairs We extract story pairs which consist of positive English story and its associated Japanese stories using the TDT English and Mainichi and...

Scientific report document: "Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora" ppt

Scientific report

... (Cucerzan and Yarowsky, 1999) and (Collins and Singer, 1999) present algorithms to obtain NEs from untagged corpora. However, they focus on the classification stage of already segmented entities, and ... transliteration candidates in another language. Time sequence scoring is then used to re-rank the list and choose the candidate best temporally aligned with the NE. Pairs of NEs and the best candidates ... pairs of English NEs and their Russian transliterations. Negative examples here and during the rest of the training were pairs of randomly selected non-NE English and Russian words. New...

Scientific report: "Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora" pptx

Scientific report

... terms and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. ... from parallel corpora by means of symmetrical word alignment and/or by phrase generation (Koehn et al., 2003). Our toolkit exploits comparable corpora in order to find and extract comparable ... extraction from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction of parallel sentences and (2) extraction and bilingual...

Scientific report: "A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora" doc

Scientific report

... 1: Comparable Corpora The corpora can be categorized into two separate groups, group S (for Small) consisting of EK-S, ET-S, ER-S, and EH-S and group L (for Large) consisting of EK-L and ... language and the pairing of articles in the comparable corpora is known in advance. We want to emphasize here that such corpora are indeed available in many domains such as technical documents and ... $P(t_1^m \mid s_1^n) = \sum_{A} \prod_{j=1}^{m} p(a_j \mid a_{j-1}, s_{a_{j-1}})\, p(t_j \mid s_{a_j}, t_{j-1})$ Here, $t_j$ (and resp. $s_i$) denotes the jth (and resp. ith) character in $w_T$ (and resp. $w_S$) and $A = a_1^m$ is the hidden alignment between $w_T$ and $w_S$ where $t_j$ is aligned...

Scientific report document: "Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora" pot

Scientific report

... Acquisition from Comparable Corpora Takehito Utsuro Takashi Horiuchi and Kohei Hino Graduate School of Informatics, Takeshi Hamamoto and Takeaki Nakayama Kyoto University Dpt. Information and Computer ... sequences, and word frequency vectors v (d j) and v (dlYT) are generated. Then, cosine similarities between v (d j) and v (dr') are calculated3 and pairs of articles di and dE ... help of any existing bilingual lexicons. On the other hand, later works such as Kaji and Aizono (1996), Fung and Yee (1998), Rapp (1999), and Tanaka (2002) studied to exploit existing bilingual lexicons...

Commonly-Used Idioms, Sayings and Phrasal Verbs

English grammar

... The taxes increased across the board and everyone must pay more. act high and mighty - to act proud and powerful The woman always acts high and mighty and nobody likes her. act one's ... energy The man stood up and belted out several old songs. Mr_doody2004@yahoo.com 29 B back Idioms back and forth - backwards and forwards, first one way and then the other way ... bite the hand that feeds you - to harm or turn against someone who does good things for you He is biting the hand that feeds him when he criticizes and fights against his boss....

Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge" doc

Scientific report

... pair (isola, island). 2. Remove the words isola and island from their respective vocabularies. 3. Since island is not in the vocabulary, the indirect association between arcipelago and island is not ... on precision and recall of bilingual lexicon extraction from parallel corpora. This assumption should also be reasonable for many types of comparable corpora such as Wikipedia or news corpora, which are ... promising performance. 5 Conclusions and Future Work We have designed an algorithm that focuses on acquiring and keeping only highly confident translation candidates from multilingual comparable corpora. By employing...

Scientific report: "Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases" pptx

Scientific report

... including the joint probability phrase-based model (Marcu and Wong, 2002) and a variant on the alignment template approach (Och and Ney, 2004), and contrast them to the performance of the word-based ... pairs, and has 127 million words in English, and 106 million words in Arabic. The table shows the number of unique Arabic phrases, and gives the average number of translations into English and their ... corpus, and another created from the target language portion of the corpus, • An index that tells us the correspondence between sentence numbers and positions in the source and target language corpora, •...

McGraw-Hill's Dictionary of American Idioms and Phrasal Verbs

English grammar

... on me. backhanded compliment and left-handed compliment an unintended or ambiguous compliment. □ Backhanded compliments are the only kind he ever gives! □ And I think his left-handed compliments ... abbreviations and symbols are used, and these are explained in the section “Terms and Symbols.” The user who understands the meaning of entry head, variable, and wildcard term is equipped to understand ... back and made a motion with his hand indicating that Mary should go first. “After you,” smiled Bob. again and again repeatedly; again and even more [times]. □ He knocked on the door again and...

Scientific report: "Clustering Comparable Corpora For Bilingual Lexicon Extraction" ppt

Scientific report

... and that of P2 is 0.939. Both corpora are more comparable than P0 of which the comparability is 0.881. Furthermore, both P1 and P2 are more comparable than P1 (comparability 0.912) and ... studies and is now standard. 3.2.2 Results and Analysis In a first series of experiments, bilingual lexicons were extracted from the corpora obtained by our approach (P1 and P2), the corpora ... English documents and 87k French documents) consisting of the corpora LAT94, MON94 and SDA94; P2T (368k English documents and 378k French documents) consisting of Wiki-En and Wiki-Fr. 1 http://trec.nist.gov 2 http://www.clef-campaign.org 3 The...

Scientific report: "Identifying Word Translations from Comparable Corpora Using Latent Topic Models" potx

Scientific report

... acquire translation candidates based on comparable and unrelated corpora comes from (Rapp, 1995). Similar approaches are described in (Diab and Finch, 2000), (Koehn and Knight, 2002) and (Gaussier et ... behind our work and gives an overview and a theoretical background of the methods. Section 4 evaluates and discusses initial results. Finally, section 5 proposes several extensions and gives a ... 479–484, Portland, Oregon, June 19-24, 2011. ©2011 Association for Computational Linguistics Identifying Word Translations from Comparable Corpora Using Latent Topic Models Ivan Vulić, Wim De Smet and...

Scientific report: "Extracting loanwords from Mongolian corpora and producing a Japanese-Mongolian bilingual dictionary" ppt

Scientific report

... corpora (Myaeng and Jeong, 1999; Oh and Choi, 2001) use the phonetic differences between conventional Korean words and loanwords. However, these methods require manually tagged training corpora, ... or comparable corpus is not available, such as Mongolian and Japanese. Fujii et al. (2004) proposed a method that does not require tagged corpora or parallel corpora to extract loanwords and ... bilingual comparable corpora, and matched named entities in each language corpus if they were similar to each other. Thus, Lam et al.’s method cannot be used for a language pair where comparable corpora...

Scientific report: "Using comparable corpora to solve problems difficult for human translators" pptx

Scientific report

... several comparable corpora for English and Russian, including large reference corpora (the BNC and the Russian Reference Corpus) and corpora of major British and Russian newspapers. All corpora ... Barcelona. Michael Carl and Andy Way, editors. 2003. Recent advances in example-based machine translation. Kluwer, Dordrecht. Ido Dagan and Kenneth Church. 1997. Termight: Coordinating humans and machines ... corpus-based extraction and the very large lexicon. In Lars Borin, editor, Language and Computers, Parallel corpora, parallel worlds, pages 137–149. Rodopi. John S. Justeson and Slava M. Katz. 1995....