Improving IBM Word Alignment Model 1

Document, scientific report: "Word Alignment with Synonym Regularization" (doc)

... 0.145 Proposed 0.947 0.824 0.881 0.112 (b) 100k Precision Recall F-measure AER GIZA++ standard 0.925 0.791 0.853 0.136 with SRH 0.934 0.803 0.864 0.126 HM-BiTAM standard 0.898 0.851 0.874 0.124 with ... representative word. For instance, the words ‘sick’ and ‘ill’ in the bilingual sentences # vocabularies 10k 50k 100k English standard 8578 16924 22817 with SRH 5435 7235 13978 French standard 10791 21872 ... of the ACL 2010 Conference Short Papers, pages 137-141, Uppsala, Sweden, 11-16 July 2010. © 2010 Association for Computational Linguistics Word Alignment with Synonym Regularization Hiroyuki...

Uploaded: 20/02/2014, 04:20

Document, scientific report: "Smoothing a Tera-word Language Model" (doc)

... order model is taken to be the word frequencies in the Web 1T corpus. The Brown corpus was re-tokenized to match the tokenization style of the Web 1T dataset resulting in 1,186,262 tokens in 52,108 sentences. The Web 1T dataset has a 13 million word vocabulary consisting of words that appear 100 times or more in its corpus. 769 sentences in Brown that contained words outside this vocabulary ... and Linda C. Bauman Peto. 1995. A hierarchical Dirichlet language model. Natural Language Engineering, 1(3):1-19. Y.W. Teh. 2006. A hierarchical Bayesian language model based on Pitman-Yor...

Uploaded: 20/02/2014, 09:20

Document, scientific report: "Yet Another Word Alignment Tool" (docx)

... the term word alignment to refer to any form of alignment that identifies words or groups of words as ... [Footnote 1: Yawat was first presented at the 2007 Linguistic Annotation Workshop (Germann, 2007).] ... sub-sentential alignments of parallel text.” Linguistic Annotation Workshop (LAW ’07), 121-124. Prague, Czech Republic. Hwa, Rebecca and Nitin Madnani. 2004. “The umiacs word alignment interface.” http://www.umiacs.umd.edu/~nmadnani/alignment/forclip.htm. Lambert, ... translations by word alignment but also because of such interface issues that aligning words manually has the reputation of being a very tedious task. 3 Yawat. Yawat (Yet Another Word Alignment Tool)...

Uploaded: 20/02/2014, 09:20

Document, scientific report: "Discriminative Word Alignment with Conditional Random Fields" (ppt)

... 7.73 –dictionary 27.72 7.21 –sentence position 28.30 8.01 –POS – 8.19 Model 1 28.62 8.45 alignment word pair 32.41 7.20 –Markov 32.75 12.44 –Dice & Model 1 35.43 14.10 Table 3. The resulting ... f-score AER Model 4 refined 87.4 95.1 91.1 9.81 Model 4 intersection 97.9 86.0 91.6 7.42 French → English 96.7 85.0 90.5 9.21 English → French 97.3 83.0 89.6 10.01 intersection 98.7 78.6 87.5 12.02 refined ... implementation of the IBM alignment models (Brown et al., 1993). These models treat word alignment as a hidden process, and maximise the probability of the observed (e, f) sentence pairs using the expectation...

Uploaded: 20/02/2014, 11:21

Document, scientific report: "Using Word Support Model to Improve Chinese Input System" (ppt)

... usually 2, i.e. bigram model (Lin and Tsai, 1987; Gu et al., 1991; Fu et al., 1996; Ho et al., 1997; Sproat, 1990; Gao et al., 2002; Lee 2003). From the studies (Hsu 1994; Tsai and Hsu, 2002; ... 量刑/事實/1, 關於/兩性/1, 關與/實施/1, 生殖/實施/1, 關於/事實/1, 關於/史實/1 WSM Set 關於(guan yu)/7, 實施(shi shi)/4, 兩性(liang xing)/3, 量刑(liang xing)/2, 知識(zhi shi)/2, 事實(shi shi)/2, 失事(shi shi)/1, 關與(guan yu)/1, ... are (18.9%, 10.1%) and (25.6%, 16.6%), respectively. From Table 3b, the tonal and toneless STW improvements of the BiGram by using the WP identifier and the WSM are (8.6%, 11.9%) and (17.1%,...

Uploaded: 20/02/2014, 12:20

Document, scientific report: "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity" (pdf)

... from the Europarl corpus 4 #word-transl. pairs #word-transl. pairs DA 104K FR 90K DE 133K IT 96K EL 60K PT 86K EN 119K SV 97K ES 119K ALL 994K FI 89K Table 3: Number of word-translation pairs for ... automatic word alignment. Context vectors are built from the alignments found in a parallel corpus. Each aligned word type is a feature in the vector of the target word under consideration. The alignment ... introduced above. For the word alignment, we apply standard techniques derived from statistical machine translation using the well-known IBM alignment models (Brown et al., 1993) implemented in...

Uploaded: 20/02/2014, 12:20

Document, scientific report: "Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs" (pptx)

... (12) The English word in position is aligned to the Japanese word ... build a word alignment model for L1 and L2 based on the above two models. Here, we call this model an induced model. With this induced model, we perform word alignment between languages L1 and ... given English word using all other words as context in Set 1 and Set 2, respectively. ... V_CE ... V_EJ ...

Uploaded: 20/02/2014, 12:20

Document, scientific report: "Guiding Statistical Word Alignment Models With Prior Knowledge" (pdf)

... Constrained Word Alignment Models The framework that we propose to incorporate statistical constraints into word alignment models is generic. It can be applied to complicated models such as IBM Model-4 ... sophisticated Model-4 when proper constraints are active in guiding word alignment model training. We also try to put constraints in Model-4. As the Equation 1 implies, when a word-to-word generative ... in guiding word alignment training, we compare statistics of different word alignment models. We find that our baseline HMM ...

Uploaded: 20/02/2014, 12:20

Document, scientific report: "A Discriminative Syntactic Word Order Model for Machine Translation" (pdf)

... BLEU 1 best 30 best Lang Model (Permutations) 58.8 71.2 Lang Model (TargetProjective) 83.9 95.0 Local Tree Order Model 75.8 87.3 Local Tree Order Model + Lang Model 92.6 98.0 Re-ranking Models Features ... vocab avg. len vocab MT-train 500K 15.8 77K 18.7 79K MT-test 1K 17.5 – 20.9 – Ref-test 1K 17.5 – 21.2 – Table 3: Main data sets used in experiments. target words and/or will not be projective ... first-pass models in the top part, and the performance of our 15 First-pass models Model BLEU 1 best 30 best Baseline MT System 33.0 – Lang Model (Permutations) 26.3 28.7 Lang Model (TargetCohesive) 31.7...

Uploaded: 20/02/2014, 12:20

Scientific report: "Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment" (pdf)

... get 3 81 % 0 5 10 15 20 25 LG 1 42.2 1920.8 9914.1+ 10000+ 10000+ HaG 1 3.5 10.9 34.1 89.2 219.9 Table 1: Average #derivations per alignment for LG and HaG v.s. Percentage of unaligned words. ... Haghighi et al. (2009). We used a log-linear model, with features like IBM model1 ... [Figure: AER against number of iterations; curves HaG-20best, LFG-1best, LFG-20best] ... [Figure 4, panels (a), (b), (c)] Figure 4: Null-word attachment for the same alignment. ((a)...

Uploaded: 07/03/2014, 22:20

Scientific report: "Data Cleaning for Word Alignment" (pdf)

... 0.297 0.171 0.146 70 0.179 0.251 0.298 0.170 0.146 80 0.181 0.252 0.301 0.169 0.147 90 0.180 0.252 0.297 0.171 0.147 100 0.180 0.251 0.302 0.169 0.146 # 51k 51k 51k 60k 60k ave 21.0/23.8 (EN/FR) ... 0.169 99.10% 0.180 91.81% Ours 0.221 96.42% 0.192 96.38% 1 0.201 40.49% 0.187 49.37% 2 0.205 48.53% 0.188 55.03% 3 0.208 58.07% 0.187 61.22% 4 0.215 83.10% 0.190 81.57% 5 0.192 29.03% 0.180 31.52% 6 ... ENDE 10 0.167 0.088 0.143 0.097 0.079 20 0.087 0.195 0.246 0.138 0.127 30 0.145 0.229 0.279 0.157 0.137 40 0.175 0.242 0.295 0.168 0.142 50 0.229 0.250 0.297 0.170 0.145 60 0.178 0.253 0.297 0.171...

Uploaded: 08/03/2014, 01:20

Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web" (doc)

... Church 1991; Brown et al. 1991; Wu 1994) used sentence length as the basic feature for alignment. (Kay & Roscheisen 1993; and Chen 1993) used lexical information for sentence alignment. Models ... techniques, the DOM tree alignment model, sentence alignment model, and candidate web page pair verification model are introduced. 4 DOM Tree Alignment Model The Document Object Model (DOM) is an ... 4.1 DOM Tree Alignment Similar to STSG, our DOM tree alignment model supports node deletion, insertion and substitution. Besides, both STSG and our DOM tree alignment model define the alignment...

Uploaded: 08/03/2014, 02:21

Scientific report: "Improved Discriminative Bilingual Word Alignment" (pdf)

... training his stage 1 and stage 2 models. For the stage 2 model, we used a single learning rate of 0.01. For the stage 1 model, we used a sequence of learning rates: 1000, 100, 10, and 1.0. At each transition ... words being linked with the estimated conditional odds of a cluster of words being linked: LO(w_1, ..., w_k) = (links_1(w_1, ..., w_k) + 1) / ((cooc(w_1, ..., w_k) − links_1(w_1, ... between our stage 1 and stage 2 models is that the stage 1 model considers each word-to-word link separately, but allows multiple links per word, as long as they lead to an alignment consisting...

Uploaded: 08/03/2014, 02:21

Scientific report: "Soft Syntactic Constraints for Word Alignment through Discriminative Training" (pot)

... of EMNLP, pages 304–311. W. A. Gale and K. W. Church. 1991. Identifying word correspondences in parallel texts. In 4th Speech and Natural Language Workshop, pages 152-157. DARPA. D. Gildea. ... improved alignments. 2 Constrained Alignment Let an alignment be the complete structure that connects two parallel sentences, and a link be one of the word-to-word connections that make up an alignment. ... models in (Brown et al., 1993), word alignment has become the first step in training most statistical translation systems, and alignments are useful to a host of other tasks. The dominant IBM...

Uploaded: 08/03/2014, 02:21

Scientific report: "Boosting Statistical Word Alignment Using Labeled and Unlabeled Data" (ppt)

... 0.7531 0.2469 Interpolated 0.7555 0.7084 0.7312 0.2688 Method 1 0.7986 0.7197 0.7571 0.2429 Method 2 0.8060 0.7388 0.7709 0.2291 Combination 0.8175 0.7858 0.8013 0.1987 Table 2. Word Alignment ... 6. 2 Statistical Word Alignment Model According to the IBM models (Brown et al., 1993), the statistical word alignment model can be generally represented as in equation (1). ... simplified version does not take into account word classes as described in Brown et al. (1993). ... (2) ...

Uploaded: 08/03/2014, 02:21
