Boosting Statistical Word Alignment Using Labeled and Unlabeled Data

Scientific report: "Boosting Statistical Word Alignment Using Labeled and Unlabeled Data" ppt

... incorporating the unlabeled data. In this algorithm, we build a word aligner by using both the labeled data and the unlabeled data. Then we build a pseudo reference set for the unlabeled data, and calculate ... for Computational Linguistics Boosting Statistical Word Alignment Using Labeled and Unlabeled Data Hua Wu Haifeng Wang Zhanyi Liu Toshiba (China) Research and Development Center 5/F., Tower ... semi-supervised boosting approach to improve statistical word alignment with limited labeled data and large amounts of unlabeled data. The proposed approach modifies the supervised boosting algorithm...

Uploaded: 08/03/2014, 02:21

Document (scientific report): "Guiding Statistical Word Alignment Models With Prior Knowledge" pdf

... results of IBM Model-4 word alignments implemented in GIZA++ toolkit (Och and Ney, 2003). We study and compare two types of constraint and see how they affect word alignments and translation output. ... the statistics of the baseline word alignment model and use them to construct the sparse matrix. We find low-dimensional representation (R = 67) of English words and Arabic words and use their similarity ... impact on word alignments and show their effectiveness in improving translation performance. 1 convenience, and is not the only way to address the empty word issue. 2.2.2 Utilizing Word Alignment...

Uploaded: 20/02/2014, 12:20

Scientific report: "Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation" ppt

Uploaded: 23/03/2014, 18:20

Scientific report: "Semi-Supervised Training for Statistical Word Alignment" docx

Uploaded: 31/03/2014, 01:20

Document (scientific report): "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity" pdf

... similarity: one using syntactic context and one using translational context based on word alignment and the combination of both. For both approaches, we used a cutoff n for each row in our word-by-context ... on Word Alignment The alignment approach to synonym extraction is based on automatic word alignment. Context vectors are built from the alignments found in a parallel corpus. Each aligned word ... automatic word alignment. We applied GIZA++ and the intersection heuristics as explained in section . From the word aligned corpora we extracted word type links, pairs of source and target words...

Uploaded: 20/02/2014, 12:20

Scientific report: "Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data" pdf

Uploaded: 23/03/2014, 17:20

Scientific report: "High-Performance Bilingual Text Alignment Using Statistical and Dictionary Information" pptx

Uploaded: 31/03/2014, 06:20

Document (scientific report): "Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs" pptx

... data, which is not included in the training data and the held-out data. The alignment links in the held-out data and the testing data are manually annotated. Testing data includes 4,926 alignment ... 2005. Combined Word Alignments. In Proc. of the ACL-2005 Workshop on Building and Using Parallel Texts: Data-driven Machine Translation and Beyond, pages 107-110. Franz Josef Och and Hermann ... Japanese word , and put the English sentences in the pairs into Set 2. e f (3) Construct the feature vectors and of the given English word using all other words as context in Set 1 and Set 2,...

Uploaded: 20/02/2014, 12:20

Document (scientific report): "Learning with Unlabeled Data for Text Categorization Using Bootstrapping and Feature Projection Techniques" doc

... with robustness from noisy data (Ko and Seo, 2004). How can labeled training data be automatically created from unlabeled data and title words? Maybe unlabeled data don’t have any information ... for using unlabeled data in text categorization have two directions; One builds classifiers from a combination of labeled and unlabeled data (Nigam, 2001; Bennett and Demiriz, 1999), and ... related works, we presented two approaches using unlabeled data in text categorization; one approach combines unlabeled data and labeled data, and the other approach uses the clustering technique...

Uploaded: 20/02/2014, 16:20

Document (scientific report): "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data" pdf

... 1268 Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data Sun Maosong, Shen Dayang*, Benjamin K Tsou** State Key Laboratory of Intelligent Technology and Systems, ... Chinese word segmentation developed so far, both statistical and rule-based, exploited two kinds of important resources, i.e., lexicon and hand-crafted linguistic resources (manually segmented and ... Chinese, and time consuming. Furthermore, even the lexicon is large enough, and the corpus annotated is balanced and huge in size, the word segmenter will still face the problem of data incompleteness,...

Uploaded: 20/02/2014, 18:20

Scientific report: "Using Similarity Scoring To Improve the Bilingual Dictionary for Word Alignment" doc

... show a significant improvement in precision and recall for word alignment when the improved dictionary is used. 1 Introduction and Related Work Word alignment is a well-studied problem in Natural ... are based on word cooccurrences in sentence-aligned bilingual data. A source language word and a target language word are said to cooccur if occurs in a source language sentence and occurs ... one word. Then the similarity score of the merged cluster will be the similarity score of the word pair. 2. Merge a cluster that contains a single word and a cluster that contains words and has...

Uploaded: 08/03/2014, 07:20

Scientific report: "Finding Word Substitutions Using a Distributional Similarity Baseline and Immediate Context Overlap" potx

... her and make cancer cells and his wife and an innocent man a mocking bird thousands of innocent unsuspecting people and or die for women and children suspects in foreign or be killed her husband ... (see Geffet and Dagan, 2004 and Geffet and Dagan, 2005, who improve the output of a distributional similarity system for an entailment task using a web-based feature inclusion check, and comment ... husband and a young girl another human being in the name and forcibly recruit thousands of people in connection with a teenage girl in the name another human being and kill her his wife and tens...

Uploaded: 08/03/2014, 21:20

Scientific report: "Phrase-based Statistical Language Generation using Graphical Models and Active Learning" potx

... He and S. Young. Semantic processing using the Hidden Vector State model. Computer Speech & Language, 19 (1):85–106, 2005. A. Isard, C. Brockmann, and J. Oberlander. Individuality and alignment ... A. Walker, O. Rambow, and M. Rogati. Training a sentence planner for spoken dialogue using boosting. Computer Speech and Language, 16(3-4), 2002. M. White, R. Rajkumar, and S. Martin. Towards ... the data (Freund et al., 1997). 7 Discussion and conclusion This paper presents and evaluates BAGEL, a statistical language generator that can be trained entirely from data, with no handcrafting...

Uploaded: 17/03/2014, 00:20

Scientific report: "Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora" docx

... tween bandwidth and largeur de bande can be obtained through the edge linking these two units (type 1), or through two edges, one from bandwidth to largeur de bande., and one from bandwidth ... English and French words, an empty English word, and an empty French word, • E comprises edges from the source to all the English words (including the empty one), edges from all the French words ... reader to (Ford and Fulkerson, 1962; Klein, 1967). 2.2 Alignment models Flows and networks define a general framework in which it is possible to model alignments between words, and to find,...

Uploaded: 17/03/2014, 07:20

Scientific report: "Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages" docx

... 2004. Improving bitext word alignments via syntax-based reordering of english. In Proc. ACL. Alexander Fraser and Daniel Marcu. 2007. Getting the structure right for word alignment: Leaf. In Proc. ... English and Pashto word to generate one more alternative for the word alignment. 3 Confidence-Based Alignment Combination Now we describe the algorithm to combine multiple sets of word alignments ... method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the...

Uploaded: 23/03/2014, 16:20

Scientific report: "Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers" doc

Uploaded: 23/03/2014, 16:20

Scientific report: "A Combination of Active Learning and Semi-supervised Learning Starting with Positive and Unlabeled Examples for Word Sense Disambiguation: An Empirical Study on Japanese Web Search Query" pdf

Uploaded: 23/03/2014, 17:20

Scientific report: "Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words" ppt

Uploaded: 23/03/2014, 18:20

Scientific report: "Word Alignment in English-Hindi Parallel Corpus Using Recency-Vector Approach: Some Studies" ppt

Uploaded: 23/03/2014, 18:20

Scientific report: "A Stochastic Language Model using Dependency and Its Improvement by Word Clustering" ppt

Uploaded: 31/03/2014, 04:20
