using bilingual parallel corpora

Scientific report: "Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment" pptx

... more generic multilingual resources (e.g. bilingual dictionaries). 3 Using Parallel Corpora for CLTE Bilingual parallel corpora represent a possible solution to overcome the inadequacy of ... TE. (4) Can parallel corpora be useful also for monolingual TE? To answer this question, we experiment on monolingual RTE datasets using paraphrase tables extracted from bilingual parallel corpora. ... extracted from bilingual corpora, we conducted a series of experiments using the different resources mentioned in Section 4.2. As it can be observed in Table 1, the highest results are achieved using...

Uploaded: 17/03/2014, 00:20

10 284 0
Scientific report: "Paraphrasing with Bilingual Parallel Corpora" pot

... word- and sentence-aligned parallel corpora. In Proceedings of ACL. Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of ... and comfort as console. While monolingual parallel corpora often have identical contexts that can be used for identifying paraphrases, bilingual parallel corpora do not. Instead, we use phrases ... probability to include multiple corpora, as follows: $\hat{e}_2 = \arg\max_{e_2 \neq e_1} \sum_{C} \sum_{f \in C} p(f \mid e_1)\, p(e_2 \mid f)$ (5) where C is a parallel corpus from a set of parallel corpora. For this condition we...

Uploaded: 08/03/2014, 04:22

8 308 0
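The multi-corpus paraphrase score quoted in the excerpt above (Equation 5) can be illustrated with a minimal sketch. It assumes each parallel corpus has already been reduced to two phrase-translation tables, p(f|e) and p(e|f). The function name `best_paraphrase`, the dictionary layout, and all probabilities below are hypothetical toy values for illustration, not data or code from the paper. The sketch also assumes the arg max excludes the original phrase (e2 ≠ e1).

```python
from collections import defaultdict

def best_paraphrase(e1, corpora):
    """Pick the e2 != e1 that maximizes sum_C sum_f p(f|e1) * p(e2|f).

    `corpora` is a list of dicts (hypothetical layout), each with:
      "f_given_e": {english_phrase: {foreign_phrase: p(f|e)}}
      "e_given_f": {foreign_phrase: {english_phrase: p(e|f)}}
    """
    scores = defaultdict(float)
    for corpus in corpora:
        # Pivot through every foreign phrase f aligned with e1 in this corpus.
        for f, p_f in corpus["f_given_e"].get(e1, {}).items():
            for e2, p_e2 in corpus["e_given_f"].get(f, {}).items():
                if e2 != e1:  # assumed constraint: do not return e1 itself
                    scores[e2] += p_f * p_e2
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy tables with invented probabilities (English-German, English-French).
corpus_de = {
    "f_given_e": {"under control": {"unter kontrolle": 0.75, "im griff": 0.25}},
    "e_given_f": {
        "unter kontrolle": {"under control": 0.5, "in check": 0.5},
        "im griff": {"under control": 0.5, "in hand": 0.5},
    },
}
corpus_fr = {
    "f_given_e": {"under control": {"sous contrôle": 1.0}},
    "e_given_f": {"sous contrôle": {"under control": 0.75, "in check": 0.25}},
}

print(best_paraphrase("under control", [corpus_de, corpus_fr]))
```

Summing over corpora lets a candidate supported by several language pairs accumulate more probability mass than one that appears through a single pivot language.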
Document: Scientific report: "Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs" pptx

... amounts of bilingual data are available for the desired language pair L1-L2, large-scale bilingual corpora in L1-L3 and L2-L3 are available. Using these two additional bilingual corpora, we ... as compared with the method using the two corpora in L1-L3 and L3-L2, and a relative error rate reduction of 21.30% as compared with the method using the small bilingual corpus in L1 and ... Robert Gaizauskas. 2005. Aligning Words in English-Hindi Parallel Corpora. In Proc. of the ACL 2005 Workshop on Building and Using Parallel Texts: Data-driven Machine Translation and Beyond,...

Uploaded: 20/02/2014, 12:20

8 359 0
Scientific report: "Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora" potx

... those with parallel data, we can still obtain improvements using the pseudo-parallel data, especially in the first setting. The difference between using parallel versus pseudo-parallel data ... show the performance of our model using the pseudo-parallel data versus the real parallel data, in the two settings, respectively. The EN->CH pseudo-parallel data consists of the English ... is reasonable since the quality of the pseudo-parallel data is not as good as that of the parallel data. Therefore, the performance using pseudo-parallel data is better with a small weight (e.g....

Uploaded: 17/03/2014, 00:20

11 302 0
Document: Scientific report: "Exploring Syntactic Structural Features for Sub-Tree Alignment using Bilingual Tree Kernels" docx

... Conclusion In this paper, we explore syntactic structure features by means of Bilingual Tree Kernels and apply them to bilingual sub-tree alignment along with various lexical and plain structural ... translation, tree kernels are seldom applied. In this paper, we propose Bilingual Tree Kernels (BTKs) to model the bilingual translational equivalences, in our case, to conduct sub-tree alignment. ... structures. We propose two kinds of BTKs named dependent Bilingual Tree Kernel (dBTK), which takes the sub-tree pair as a whole and independent Bilingual Tree Kernel (iBTK), which individually models...

Uploaded: 20/02/2014, 04:20

10 467 0
Document: Scientific report: "ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation" pdf

... as there are parallel corpora available for the targeted languages. Although large multilingual corpora are still rather scarce, we strongly believe there will be more parallel corpora available in ... metrics. Several studies have already shown the validity of using parallel corpora for sense discrimination (e.g. (Ide et al., 2002)), for bilingual WSD modules (e.g. (Gale and Church, 1993; Ng ... disambiguation using a second language monolingual corpus. Computational Linguistics, 20(4):563–596. M. Diab and P. Resnik. 2002. An Unsupervised Method for Word Sense Tagging Using Parallel Corpora. ...

Uploaded: 20/02/2014, 05:20

6 538 0
Document: Scientific report: "Semi-Supervised Learning of Partial Cognates using Bilingual Bootstrapping" doc

... monolingual bootstrapping technique, we also use bilingual bootstrapping. Diab (2002) has shown that unsupervised WSD systems that use parallel corpora can achieve results that are close to ... Some improvement might be achieved when using lemmatization. We wanted to see how well we can do by using sentences as they are extracted from the parallel corpus, with no additional pre-processing ... we use the parallel data in a different way: we use words from parallel sentences as features for Machine Learning (ML). Li and Li (2004) have shown that word translation and bilingual bootstrapping...

Uploaded: 20/02/2014, 12:20

8 420 1
Document: Scientific report: "Context-dependent SMT Model using Bilingual Verb-Noun Collocation" doc

... automatically acquired from the chunk-aligned bilingual corpora. 4.1 Automatic Extraction of Bilingual Verb-Noun Collocation (BiVN) To automatically extract the bilingual verb-noun collocations, we utilize ... obtain more semantically plausible translation results, we use bilingual verb-noun collocations; these are automatically extracted by using chunk alignment and a monolingual dependency parser. ... Arbor, June 2005. © 2005 Association for Computational Linguistics Context-dependent SMT Model using Bilingual Verb-Noun Collocation Young-Sook Hwang ATR SLT Research Labs 2-2-2 Hikaridai Seika-cho Soraku-gun...

Uploaded: 20/02/2014, 15:20

8 305 0
Document: Scientific report: "Word Translation Disambiguation Using Bilingual Bootstrapping" doc

... Learning, vol. 34, pp. 107-130. G. Kikui, 1999. Resolving Translation Ambiguity Using Non-parallel Bilingual Corpora. In Proceedings of ACL ’99 Workshop on Unsupervised Learning in Natural ... Output: classifiers in English and Chinese Figure 2: Bilingual Bootstrapping Word Translation Disambiguation Using Bilingual Bootstrapping Cong Li Microsoft Research Asia 5F ... paper proposes a new method for word translation disambiguation using a machine learning technique called ‘Bilingual Bootstrapping’. Bilingual Bootstrapping makes use of, in learning, a small...

Uploaded: 20/02/2014, 21:20

9 480 0
Document: Scientific report: "ALIGNING SENTENCES IN PARALLEL CORPORA" doc

... sentences of each length less than 8]. We estimated the probabilities 173 ALIGNING SENTENCES IN PARALLEL CORPORA Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer IBM Thomas J. Watson Research ... describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information ... sentence, or even a whole passage, may be missing from one or the other of the corpora. If a person is given two parallel texts and asked to match up the sentences in them, it is natural for...

Uploaded: 20/02/2014, 21:20

8 387 0
Document: Scientific report: "A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora" doc

... primary lexicon using the scores. A threshold is applied to the DTW score of each pair, selecting the most correlated pairs as the first bilingual lexicon. 5. Find anchor points using the primary ... secondary lexicon. 3 Finding high frequency bilingual word pairs When the sentence alignments for the corpus are unknown, standard techniques for extracting bilingual lexicons cannot apply. To ... in its translation 1, suggesting a discontinuous mapping between some parallel texts. We have previously shown that using a vector representation of the frequency and positional informa-...

Uploaded: 20/02/2014, 22:20

8 427 0
Scientific report: "Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora" pdf

... for parallel data acquisition is highly beneficial for the SMT field. Comparable corpora exhibit various degrees of parallelism. Fung and Cheung (2004a) describe corpora ranging from noisy parallel, ... very non-parallel. Corpora from the last category contain “disparate, very non-parallel bilingual documents that could either be on the same topic (on-topic) or not”. This is the kind of corpora ... information concerning these corpora. 3.2 Extraction Experiments On each of our comparable corpora, and using each of our initial parallel corpora, we apply both the fragment extraction and the sentence extraction...

Uploaded: 08/03/2014, 02:21

8 263 0
Scientific report: "Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models" pdf

... have presented two novel probabilistic models for unsupervised word sense disambiguation using parallel corpora and have shown that both models outperform existing unsupervised approaches. In addition, ... experiments with real data, we make use of the parallel corpora constructed by Diab and Resnik (2002) for evaluation purposes. We chose to work on these corpora in order to permit a direct comparison ... unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002) that uses both parallel text and a sense inventory...

Uploaded: 08/03/2014, 04:22

8 361 0
Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus" pot

... Use the small parallel corpus • Step 2: (optional) Use the monolingual corpus The two steps are described in detail in the following subsections. 2.1 Step 1: Using the Small Parallel Corpus Figure ... 55–60. Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of the 40th Annual Meeting of the Association for Computational ... multilingual text analysis tools via robust projection across aligned corpora. Unsupervised Learning of Arabic Stemming using a Parallel Corpus Monica Rogati † Computer Science Department, Carnegie...

Uploaded: 08/03/2014, 04:22

8 424 0
Scientific report: "AUTOMATIC ALIGNMENT IN PARALLEL CORPORA" potx

... (associated with content words) reduces the number of parameters 335 AUTOMATIC ALIGNMENT IN PARALLEL CORPORA Harris Papageorgiou, Lambros Cranias, Stelios Piperidis I Institute for Language ... the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database. The success rate exceeded 99%. The next steps of the work ... scheme's efficiency at lower levels endowed with necessary bilingual information about potential delimiters. INTRODUCTION Parallel linguistically meaningful text units are indispensable...

Uploaded: 08/03/2014, 07:20

3 193 0
