boosting statistical word alignment using labeled and unlabeled data

Scientific report: "Boosting Statistical Word Alignment Using Labeled and Unlabeled Data" ppt

Scientific report

... incorporating the unlabeled data. In this algorithm, we build a word aligner by using both the labeled data and the unlabeled data. Then we build a pseudo reference set for the unlabeled data, and calculate ... for Computational Linguistics Boosting Statistical Word Alignment Using Labeled and Unlabeled Data Hua Wu Haifeng Wang Zhanyi Liu Toshiba (China) Research and Development Center 5/F., Tower ... semi-supervised boosting approach to improve statistical word alignment with limited labeled data and large amounts of unlabeled data. The proposed approach modifies the supervised boosting algorithm...
Document, scientific report: "Guiding Statistical Word Alignment Models With Prior Knowledge" pdf

Scientific report

... results of IBM Model-4 word alignments implemented in GIZA++ toolkit (Och and Ney, 2003). We study and compare two types of constraint and see how they affect word alignments and translation output. ... the statistics of the baseline word alignment model and use them to construct the sparse matrix. We find low-dimensional representation (R = 67) of English words and Arabic words and use their similarity ... impact on word alignments and show their effectiveness in improving translation performance. 1 convenience, and is not the only way to address the empty word issue. 2.2.2 Utilizing Word Alignment...
Document, scientific report: "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity" pdf

Scientific report

... similarity: one using syntactic context and one using translational context based on word alignment and the combination of both. For both approaches, we used a cutoff n for each row in our word-by-context ... on Word Alignment The alignment approach to synonym extraction is based on automatic word alignment. Context vectors are built from the alignments found in a parallel corpus. Each aligned word ... automatic word alignment. We applied GIZA++ and the intersection heuristics as explained in section . From the word aligned corpora we extracted word type links, pairs of source and target words...
Document, scientific report: "Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs" pptx

Scientific report

... data, which is not included in the training data and the held-out data. The alignment links in the held-out data and the testing data are manually annotated. Testing data includes 4,926 alignment ... 2005. Combined Word Alignments. In Proc. of the ACL-2005 Workshop on Building and Using Parallel Texts: Data-driven Machine Translation and Beyond, pages 107-110. Franz Josef Och and Hermann ... Japanese word , and put the English sentences in the pairs into Set 2. ef(3) Construct the feature vectors and of the given English word using all other words as context in Set 1 and Set 2,...
Document, scientific report: "Learning with Unlabeled Data for Text Categorization Using Bootstrapping and Feature Projection Techniques" doc

Scientific report

... with robustness from noisy data (Ko and Seo, 2004). How can labeled training data be automatically created from unlabeled data and title words? Maybe unlabeled data don’t have any information ... for using unlabeled data in text categorization have two directions; one builds classifiers from a combination of labeled and unlabeled data (Nigam, 2001; Bennett and Demiriz, 1999), and ... related works, we presented two approaches using unlabeled data in text categorization; one approach combines unlabeled data and labeled data, and the other approach uses the clustering technique...
Document, scientific report: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data" pdf

Scientific report

... 1268 Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data Sun Maosong, Shen Dayang*, Benjamin K Tsou** State Key Laboratory of Intelligent Technology and Systems, ... Chinese word segmentation developed so far, both statistical and rule-based, exploited two kinds of important resources, i.e., lexicon and hand-crafted linguistic resources (manually segmented and ... Chinese, and time consuming. Furthermore, even the lexicon is large enough, and the corpus annotated is balanced and huge in size, the word segmenter will still face the problem of data incompleteness,...
Scientific report: "Using Similarity Scoring To Improve the Bilingual Dictionary for Word Alignment" doc

Scientific report

... show a significant improvement in precision and recall for word alignment when the improved dictionary is used. 1 Introduction and Related Work Word alignment is a well-studied problem in Natural ... are based on word cooccurrences in sentence-aligned bilingual data. A source language word and a target language word are said to cooccur if occurs in a source language sentence and occurs ... one word. Then the similarity score of the merged cluster will be the similarity score of the word pair. 2. Merge a cluster that contains a single word and a cluster that contains words and has...
Scientific report: "Finding Word Substitutions Using a Distributional Similarity Baseline and Immediate Context Overlap" potx

Scientific report

... her and make cancer cells and his wife and an innocent man a mocking bird thousands of innocent unsuspecting people and or die for women and children suspects in foreign or be killed her husband ... (see Geffet and Dagan, 2004 and Geffet and Dagan, 2005, who improve the output of a distributional similarity system for an entailment task using a web-based feature inclusion check, and comment ... husband and a young girl another human being in the name and forcibly recruit thousands of people in connection with a teenage girl in the name another human being and kill her his wife and tens...
Scientific report: "Phrase-based Statistical Language Generation using Graphical Models and Active Learning" potx

Scientific report

... He and S. Young. Semantic processing using the Hidden Vector State model. Computer Speech & Language, 19(1):85–106, 2005. A. Isard, C. Brockmann, and J. Oberlander. Individuality and alignment ... A. Walker, O. Rambow, and M. Rogati. Training a sentence planner for spoken dialogue using boosting. Computer Speech and Language, 16(3-4), 2002. M. White, R. Rajkumar, and S. Martin. Towards ... the data (Freund et al., 1997). 7 Discussion and conclusion This paper presents and evaluates BAGEL, a statistical language generator that can be trained entirely from data, with no handcrafting...
Scientific report: "Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora" docx

Scientific report

... tween bandwidth and largeur de bande can be obtained through the edge linking these two units (type 1), or through two edges, one from bandwidth to largeur de bande., and one from bandwidth ... English and French words, an empty English word, and an empty French word, • E comprises edges from the source to all the English words (including the empty one), edges from all the French words ... reader to (Ford and Fulkerson, 1962; Klein, 1967). 2.2 Alignment models Flows and networks define a general framework in which it is possible to model alignments between words, and to find,...
Scientific report: "Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages" docx

Scientific report

... 2004. Improving bitext word alignments via syntax-based reordering of english. In Proc. ACL. Alexander Fraser and Daniel Marcu. 2007. Getting the structure right for word alignment: Leaf. In Proc. ... English and Pashto word to generate one more alternative for the word alignment. 3 Confidence-Based Alignment Combination Now we describe the algorithm to combine multiple sets of word alignments ... method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the...