Refining Lexical Translation Training Scheme for Improving the Quality of Statistical Phrase-Based Translation

Cuong Hoang(1), Cuong Anh Le(1), Son Bao Pham(1,2)
(1) Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi
(2) Information Technology Institute, Vietnam National University, Hanoi
{cuongh.mi10, cuongla, sonpb}@vnu.edu.vn

ABSTRACT

Under word-based alignment, frequent words with consistent translations can be aligned at a high rate of precision. However, words that are less frequent or that exhibit diverse translations in the training corpora generally do not have statistically significant evidence for confident alignments [7]. In this work, we propose a bootstrapping algorithm to capture the alignments of those less frequent words and of words with diverse translations. Importantly, we avoid making any explicit assumption about the pair of languages used. We evaluate experimentally on two phrase-based translation systems, English-Vietnamese and English-French, and the experiments show a significant "boosting" of overall quality for both tasks.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SoICT 2012, August 23-24, 2012, Ha Long, Vietnam. Copyright 2012 ACM 978-1-4503-1232-5/12/08 $10.00.

1. INTRODUCTION

Statistical Machine Translation (SMT) is a machine translation approach in which sentence translations are generated by statistical models whose parameters are derived from the analysis of parallel sentence pairs in a bilingual corpus. In SMT, the best performing systems are based in some way on phrases (groups of words). The basic idea of phrase-based translation is to break a given source sentence into phrases, translate each phrase, and finally compose the target sentence from these phrase translations [9, 12].

For a statistical phrase-based translation system, the accuracy of the statistical word-based alignment models is very important. In fact, under the lexical alignment models (IBM Models 1-2), frequent words with a consistent translation can usually be aligned at a high rate of precision. However, for words that are less frequent or exhibit diverse translations, we generally do not have statistically significant evidence for a confident alignment.

This problem tends to reduce the translation quality in several important ways. First, the diverse translations of a source word can never be recognized by the original statistical alignment models. This is an important point: it indicates that a pure statistical IBM alignment model is not sufficient to reach state-of-the-art alignment quality. Second, a bad estimate of the lexical translations has a bad influence on the quality of the higher fertility-based alignment models [6]. Finally, phrase extraction cannot generate more diverse translations for each word or phrase. In our observation, language is by nature flexible and diverse; therefore, we need to capture those diverse translations in order to obtain higher overall quality.
To overcome this problem, some papers report improvements when linguistic knowledge is used [7]. In general, the linguistic knowledge mainly serves to filter out incorrect alignments; as a result, such methods cannot be easily applied to all language pairs without adaptation. Different from these previous methods, in this work we propose a bootstrapping word alignment algorithm for improving the modelling of lexical translation. We found that our alignment model is better at "capturing" the diverse translations of words, and in that way it reduces the bad alignments of rare words. Following the work of [6], we also show a further interesting point: although we mainly focus on IBM Models 1-2, it is possible to significantly improve the quality of the fertility-based alignment models as well. Consequently, with the improved word-based alignment models, our phrase-based SMT system gains a statistically significant improvement in translation quality. The evaluation of our work is performed on different tasks and different languages; since our approach does not rely on linguistic knowledge, we believe it will be applicable to other language pairs.

The rest of this paper is organized as follows. Section 2 presents IBM Models 1-2. Section 3 describes the problem of bad alignments for rare words and for words with diverse translations. Section 4 presents our bootstrapping word alignment algorithm. Section 5 presents our experimental evaluations. Finally, conclusions are drawn in Section 6.

2. IBM MODELS 1-2

Model 1 is a probabilistic generative model within a framework that assumes a source sentence f_1^J of length J translates as a target sentence e_1^I of length I. It is defined as a particularly simple instance of this framework by assuming that all possible lengths for f_1^J (less than some arbitrary upper bound) have a uniform probability ε. Let t(f_j|e_i) be the translation probability of f_j given e_i. The alignment is determined by specifying the values of a_j for j from 1 to J. [1] gives:

    Pr(f|e) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j|e_i)        (1)

The parameters of Model 1 for a given pair of languages are normally estimated using EM [3]. We call the expected number of times that word e_i connects to f_j in the translation pair (f_1^J|e_1^I) the count of f_j given e_i and denote it by c(f_j|e_i; f_1^J, e_1^I). Following the derivation of [1], it can be calculated as:

    c(f_j|e_i; f_1^J, e_1^I) = \frac{t(f_j|e_i)}{\sum_{i'=0}^{I} t(f_j|e_{i'})} \sum_{j'=1}^{J} \delta(f_j, f_{j'}) \sum_{i'=0}^{I} \delta(e_i, e_{i'})        (2)

In addition, we set λ_e as a normalization factor and repeatedly re-estimate the translation probability of a word f_j in f_1^J given a word e_i in e_1^I as:

    t(f_j|e_i) = \lambda_e^{-1} \, c(f_j|e_i; f_1^J, e_1^I)        (3)

IBM Model 2 is another simple model, better than Model 1 because it addresses the issue of alignment with an explicit model based on the positions of the input and output words. We make the same assumptions as in Model 1, except that we assume Pr(a_j|a_1^{j-1}, f_1^{j-1}, J, e) depends on j, a_j and J, as well as on I. The probability of a target sentence given a source sentence is then estimated as:

    Pr(f|e) = \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j|e_i) \, a(i|j, J, I)        (4)
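To make the training scheme above concrete, the following is a minimal sketch of the EM procedure for IBM Model 1 (equations (1)-(3) and the per-word normalization factor). It is only a sketch under our own assumptions: the uniform initialization, the number of iterations, and all names (train_ibm_model1, bitext, lam) are ours and are not taken from the paper or from LGIZA.

```python
from collections import defaultdict

def train_ibm_model1(bitext, iterations=5):
    """bitext: list of ([f words], [e words]) sentence pairs.
    Returns lexical translation probabilities keyed by (f, e), i.e. t(f|e)."""
    # Uniform initialization over the f vocabulary (an assumption; the paper
    # does not state how t(f|e) is initialized).
    f_vocab = set(f for fs, _ in bitext for f in fs)
    t = defaultdict(lambda: 1.0 / max(len(f_vocab), 1))

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f|e), equation (2)
        lam = defaultdict(float)     # normalization factors lambda_e
        for fs, es in bitext:
            es_null = [None] + list(es)                 # position 0 is the empty word
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)     # denominator of equation (2)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    lam[e] += c
        # M-step, equation (3): t(f|e) = lambda_e^{-1} * c(f|e)
        t = defaultdict(lambda: 1e-12,
                        {(f, e): c / lam[e] for (f, e), c in count.items()})
    return t
```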
3. A CAUSE FOR BAD WORD ALIGNMENT TRANSLATION

Without loss of generality, assume that there exists a set F_{e_i} containing the n possible word translations of a word e_i:

    F_{e_i} = {f_1, f_2, ..., f_n}

This means that, in the parallel corpus, the correct alignments of the elements of F_{e_i} are the lexical pairs (f_j : e_i), j ∈ {1, 2, ..., n}. For each training iteration of the word-based alignment model, the normalization factor λ_{e_i} is defined as the sum of the counts c(f_j|e_i) over the possible translations of e_i:

    \lambda_{e_i} = \sum_{j=1}^{n} c(f_j|e_i)        (5)

Now consider a foreign word f_j, and the case where f_j often co-occurs with a word e_k in the bilingual sentence pairs of the parallel corpus, while this pair is not a correct lexical translation pair in linguistic terms. We would like the lexical translation probability t(f_j|e_k) to always be smaller than the lexical translation probability t(f_j|e_i). Analogously to the normalization factor λ_{e_i}, the word e_k has a normalization factor λ_{e_k}:

    \lambda_{e_k} = \sum_{j=1}^{m} c(f_j|e_k)        (6)

Unfortunately, if the word e_k appears much less often than the word e_i in the training corpus, the normalization factor λ_{e_i} is usually much greater than λ_{e_k} (λ_{e_k} ≪ λ_{e_i}). Therefore, following equation (3), the lexical translation probability t(f_j|e_k) ends up much greater than the lexical translation probability t(f_j|e_i), and we cannot "capture" the option of choosing the correct alignment pair (f_j, e_i) as we expected.

Similarly, assume that the word f_j is one of the diverse translations of the word e_i. Because e_i has a larger number of possible translation options, λ_{e_i} also takes a larger value. Hence, following equation (3), the lexical translation probability t(f_j|e_i) becomes very small. In this case, even when e_k is not a rare word, it is very common that t(f_j|e_i) ≪ t(f_j|e_k), so we cannot control the diverse translation (f_j; e_i) as we expected.

In our observations, these situations happen very often, and they clearly hurt the quality of word-by-word alignment modelling. In the following we give an example that shows the problem clearly. From a parallel corpus containing 60,000 English-Vietnamese parallel sentences, Tables 1 and 2 show results derived from the "Viterbi" alignments pegged by training our lexical alignment model.

In this section we also propose an index, called the Average Number of Best Alignments (ANBA); it is the key to this work. [1] introduced the idea of an alignment between a pair of strings as an object indicating, for each word in the French string, the word in the English string from which it arose. For a specific translation model and a pair of parallel sentences, we find the "best" corresponding word for every word of the source sentence. Our focus is the correlation between the number of occurrences of a word and its probability of being chosen by our statistical alignment model as part of the best alignment pair (that word together with its corresponding "marked" alignment word).

In other words, we try to find the relationship between the number of occurrences of a word and the possibility that it is "pegged" as the best alignment pair. These best alignment pairs are often inaccurate, so the index is meant to reflect the errors that may occur. A class of words contains all the words that have the same frequency of occurrence in the training data (the Freq column). The ANBA index of a class is then the ratio between the number of times the words belonging to that class were chosen as the best word-by-word correspondence and the number of words in that class. ANBA therefore also reflects the average number of possible translations of a group of target words.
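As a hedged illustration of how the ANBA index reported in the tables below might be computed, the sketch takes a trained lexical table t (for example from the Model 1 sketch in Section 2) and, for every source word in every sentence pair, picks the target word with the highest t(f|e) as its "best" alignment. Interpreting "best" as this argmax, as well as the function and variable names, are our assumptions rather than details stated by the paper.

```python
from collections import Counter

def anba_by_frequency(bitext, t, max_freq=10):
    """ANBA per frequency class: the average number of times a target word in
    that class is chosen as the best ('Viterbi') alignment of some source word."""
    freq = Counter(e for _, es in bitext for e in es)    # the Freq column
    best_count = Counter()                               # times chosen as best alignment
    for fs, es in bitext:
        for f in fs:
            best_e = max(es, key=lambda e: t.get((f, e), 0.0))
            best_count[best_e] += 1
    table = {}
    for k in range(1, max_freq + 1):
        words = [e for e, n in freq.items() if n == k]
        if words:
            table[k] = sum(best_count[e] for e in words) / len(words)
    return table    # maps frequency class -> ANBA
```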
Consider the ANBA tables for both English and Vietnamese words, where each side in turn is chosen as the target language of the translation.

    Freq   Num of Words   ANBA   Deviation
    1      5874           3.31   1.18
    2      2350           2.76   0.29
    3      1444           2.48   0.07
    4      963            2.27   0.00
    5      698            2.13   0.01
    6      604            2.02   0.04
    7      471            1.89   0.11
    8      389            1.85   0.14
    9      301            1.78   0.20
    10     278            1.73   0.24
    σ = 2.28
    Table 1: The ANBA statistical table for Vietnamese words.

    Freq   Num of Words   ANBA   Deviation
    1      11815          4.45   3.78
    2      4219           3.08   0.32
    3      2190           2.69   0.03
    4      1350           2.46   0.00
    5      932            2.29   0.05
    6      681            2.19   0.10
    7      524            2.12   0.15
    8      412            2.01   0.25
    9      322            1.92   0.34
    10     305            1.85   0.43
    σ = 5.46
    Table 2: The ANBA statistical table for English words.

Tables 1 and 2 clearly show that we are only able to "capture" the translations of a word when that word does not appear many times (the diverse translation problem); the more often a word appears, the harder it is to capture its diverse target translations. We also see that the average ANBA index deviates strongly for the smaller word frequencies: when a word appears more rarely than the others in the training data, it is chosen as the best "Viterbi" alignment of a source word unusually often.

4. IMPROVING LEXICAL TRANSLATION MODEL

The problem of rare words, and of words that have many diverse translations, poses an interesting challenge. Fortunately, IBM Models 1-2 are simple models, in the sense that their training can be implemented very fast compared with the complexity of training the higher IBM models. Our idea therefore focuses on improving the training scheme of IBM Models 1-2 in order to obtain a better result.

4.1 Improving Lexical Translation Model

Turn back to the set F_{e_i}, which contains the n possible lexical translations of the word e_i. In the case where the pair (f_n; e_i) appears many times in the training corpus, equations (2) and (3) give t(f_n|e_i) a very high value. Consequently, the remaining lexical translation probabilities t(f_j|e_i), j ∈ {1, ..., n-1}, obtain very small values. This becomes worse when the cardinality of the set F of a word is large, which is exactly what the diversity of language leads us to expect. Clearly, those t(f_j|e_i) never reach a "satisfactory" translation probability, and noisy alignment choices follow, because pairs that are not real lexical pairs (in linguistic terms) easily gain higher lexical translation probabilities.

To overcome this noise problem, we present in the following a bootstrapping alignment algorithm that refines the training scheme of IBM Models 1-2. In more detail, we divide the training scheme of a translation model into N steps, where N is called the smoothing bootstrapping factor; in other words, N can also be seen as the number of times we re-train our IBM models. Without loss of generality, we assume that the word f_j does not occur more often than the word f_{j+1}; for convenience, we denote this assumption as

    c(f_1) ≤ c(f_2) ≤ ... ≤ c(f_n)

We divide the training work into N consecutive steps, each of which tries to separate f_n whenever the lexical translation probability t(f_n|e_i) is, at that time, the maximum translation probability compared with the other target words. Hence, if we mark and filter out every such "marked" pair of f_n and e_i in the bilingual corpus whenever it is chosen as the best lexical alignment at that time, we obtain a new training corpus with a set of updated "parallel sentences". If we re-train on this new parallel corpus, then in each training iteration we have a new normalization factor λ̃_{e_i} for the word e_i:

    \tilde{\lambda}_{e_i} = \sum_{j=1}^{n-1} c(f_j|e_i)        (7)

Interestingly, each time we separate f_n out of the set F, the new lexical translation probabilities t̃(f_k|e_i) (k ≠ n) increase, exactly as we want, because λ̃_{e_i} decreases compared with the original normalization factor λ_{e_i}: there is no longer any need to add the value c(f_n|e_i) to λ̃_{e_i}. Therefore, the possibility that the source lexical word f_j is automatically aligned to e_i increases as well. This is the main key that allows us to capture the other "diverse" translations of a word.
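As a toy numerical illustration of the effect behind equation (7) (the numbers here are invented for exposition and do not come from the paper's corpora), removing the dominant pegged pair from the counts immediately raises the translation probabilities of the remaining, rarer translation options:

```python
# Expected counts c(f_j|e_i) accumulated for one target word e_i (toy values).
counts = {"f1": 1.0, "f2": 2.0, "fn": 17.0}

lam = sum(counts.values())                            # lambda_{e_i} = 20
t_before = {f: c / lam for f, c in counts.items()}    # t(f1|e_i) = 0.05, t(f2|e_i) = 0.10

# After pegging and filtering out the dominant pair (fn; e_i):
counts.pop("fn")
lam_tilde = sum(counts.values())                      # equation (7): new factor = 3
t_after = {f: c / lam_tilde for f, c in counts.items()}
# t~(f1|e_i) ~ 0.33, t~(f2|e_i) ~ 0.67: the remaining translations now stand out.
```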
4.2 The Bootstrapping Word Alignment Algorithm

From the above analysis, we have a simple but very effective way to improve the quality of lexical translation modelling. In this section we formalize our bootstrapping word alignment algorithm. Let

    N = {∆_1, ∆_2, ..., ∆_n}    (∆_1 < ∆_2 < ... < ∆_n)

be the set that defines the covering range of the bootstrapping word alignment algorithm. Each value ∆_i can be understood as an occurrence threshold (or frequency threshold) used to separate a group of words according to their level of occurrence counts. The bootstrapping word alignment algorithm is formally described as follows:

    The Bootstrapping Word Alignment Algorithm
    Input:  the parallel corpus (f^(s), e^(s)), 1 ≤ s ≤ S;
            N = {∆_1, ∆_2, ..., ∆_n} (∆_1 < ∆_2 < ... < ∆_n)
    Output: the alignment set A
    Start with A = ∅
    For each ∆_n taken as the current threshold (largest first):
        Count the frequency of each word
        Train IBM Model 1
        For each pair (f^(s), e^(s)), 1 ≤ s ≤ S:
            For each e_i in e^(s):
                If e_i is marked in A, continue
                Find the best f_j in f^(s) that is not already marked in A
                If c(f_j) ≥ ∆_n:
                    Mark (f_j; e_i) as a pegged pair and add it to A
                    Change e_i in e^(s)
        n = n - 1; repeat with the next threshold

For a word f_j that is the best alignment of a word e_i and whose number of occurrences is above the threshold condition, we mark the pair and add (f_j; e_i) to the alignment set A. After that, we need to mark e_i as "pegged": similar to [1], we change the word e_i by adding the prefix "UNK" to it. Changing the word e_i in this way serves to "boost" the probability of obtaining the other possible translations f_k of e_i in other sentences, because the count of the pegged translation is no longer added to the total λ_{e_i} once e_i in e^(s) has been changed by the prefix.

In fact, training the lexical translation models is not very demanding in computing power. However, re-training the system n times, where n is the cardinality of the set N, does cost considerable computational resources.
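The following is a sketch of our reading of the algorithm just listed, reusing the train_ibm_model1 sketch shown earlier. The pseudocode is ambiguous about which side the frequency threshold is counted on; here we threshold the pegged f-word's corpus count, following the prose description, and we mark a pegged e_i with an "UNK_" prefix as the paper describes. The argmax-over-t choice of the best pair, the per-sentence bookkeeping, and all names are our own assumptions.

```python
from collections import Counter

def bootstrap_align(bitext, thresholds, em_iters=5):
    """Sketch of the bootstrapping word alignment algorithm (Section 4.2).
    bitext: list of ([f words], [e words]); thresholds: the set N of frequency
    thresholds, processed from the largest Delta down to the smallest."""
    pegged = []                                            # the alignment set A
    data = [(list(fs), list(es)) for fs, es in bitext]     # working copy we modify
    for delta in sorted(thresholds, reverse=True):
        f_freq = Counter(f for fs, _ in data for f in fs)  # occurrence counts c(f)
        t = train_ibm_model1(data, iterations=em_iters)    # re-train Model 1 each round
        for fs, es in data:
            used = set()                                   # f positions pegged in this pair
            for i, e in enumerate(es):
                if e.startswith("UNK_"):                   # e_i already pegged earlier
                    continue
                free = [j for j in range(len(fs)) if j not in used]
                if not free:
                    continue
                j = max(free, key=lambda j: t.get((fs[j], e), 0.0))  # best f_j for e_i
                if f_freq[fs[j]] >= delta:                 # frequency threshold Delta
                    pegged.append((fs[j], e))
                    used.add(j)
                    es[i] = "UNK_" + e                     # mark e_i for later rounds
    return pegged
```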
We also have an upgraded version of the bootstrapping word alignment algorithm: after pegging a pair such as (f_j; e_i) as the best alignment satisfying the count-frequency condition, we remove it from the training data and re-train the system with the next threshold ∆_i. This greatly reduces both the computational resources and the processing time. Together with this, we improve the way each element ∆_i of the set N is chosen, which helps us not only to cover a larger range up to ∆_n but also to further reduce the computational cost. These improved schemes are described in more detail in the experimental section.

5. EXPERIMENT

Recent research points out that it is difficult to achieve large gains in translation performance by improving word-based alignment results alone: word alignment quality [4] can improve strongly while the overall quality of the statistical phrase-based translation system remains hard to improve [2, 14]. Therefore, to confirm the influence of the improvements in lexical translation, we directly test their impact through the word alignment extraction component used for "learning" phrases in a phrase-based SMT system. The experiment is carried out on the English-Vietnamese language pair, and also on the English-French pair for larger training data. The English-Vietnamese training data is credited to [5]; the English-French training corpus is the Hansards corpus [10]. We use the MOSES framework [8] as the phrase-based SMT framework. In all evaluations we translate sentences from Vietnamese to English and measure performance with the BLEU metric [13], which estimates the accuracy of the translation output with respect to a reference translation. We use 1,000 parallel sentence pairs for testing the translation quality of the statistical phrase-based translation system.
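For reference, below is a minimal self-contained sketch of the corpus-level BLEU computation of [13], with uniform n-gram weights up to n = 4 and a single reference per sentence; real scorers, including the ones commonly used with MOSES, add tokenization and other details that are omitted here, so this is illustrative only.

```python
import math
from collections import Counter

def bleu(candidates, references, max_n=4):
    """candidates, references: parallel lists of token lists (one reference each)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    log_precisions = []
    for n in range(1, max_n + 1):
        matched, total = 0, 0
        for cand, ref in zip(candidates, references):
            c_grams, r_grams = ngrams(cand, n), ngrams(ref, n)
            matched += sum(min(c, r_grams[g]) for g, c in c_grams.items())  # clipped matches
            total += max(len(cand) - n + 1, 0)
        if matched == 0:
            return 0.0          # any zero n-gram precision gives BLEU = 0
        log_precisions.append(math.log(matched / total))

    cand_len = sum(len(c) for c in candidates)
    ref_len = sum(len(r) for r in references)
    if cand_len == 0:
        return 0.0
    bp = 1.0 if cand_len > ref_len else math.exp(1.0 - ref_len / cand_len)  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)
```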
5.1 Baseline Results

In this work we use LGIZA (available at http://code.google.com/p/lgiza/), a lightweight statistical machine translation toolkit for training IBM Models 1-3; more information about LGIZA can be found in [6]. Unlike GIZA++, LGIZA is implemented directly from the original IBM Models paper [1], without the later improvements integrated into GIZA++, such as determining word classes to obtain a low translation lexicon perplexity (Och, 1999), various smoothing techniques for the fertility, distortion or alignment parameters, and symmetrization [10][11]. Applying those improved techniques would add some noise to our comparison.

Table 3 presents the BLEU scores for each specific IBM Model trained on a bilingual corpus of 60,000 parallel sentences (0.65 million tokens). One point worth noting: for English and Vietnamese, an example of a linguistically quite different language pair, IBM Model 3 is usually not as good as IBM Model 2.

    Model          BLEU (%)
    IBM Model 1    19.07
    IBM Model 2    19.54
    IBM Model 3    18.70
    Table 3: Using the IBM Models as the baseline.

5.2 Evaluation On The Bootstrapping Word Alignment Algorithm

The following experimental evaluations focus on the impact of applying our bootstrapping word alignment algorithm. For each evaluation, with the same set N serving as the covering range, we take different smoothing factors, where the smoothing factor is the difference between two consecutive thresholds ∆_i and ∆_{i+1} (assumed constant here). Tables 4-7 show the results of applying the bootstrapping word alignment algorithm for each specific smoothing factor and each covering range N.

    N     Smooth   BLEU Score   ∆
    40    1        19.54        +0.47
    80    1        19.55        +0.48
    120   1        19.71        +0.64
    160   1        19.57        +0.50
    Table 4: Improving results when setting the smoothing factor value to 1.

    N     Smooth   BLEU Score   ∆
    40    2        19.49        +0.42
    80    2        19.55        +0.48
    120   2        19.66        +0.59
    160   2        19.63        +0.56
    Table 5: Improving results when setting the smoothing factor value to 2.

    N     Smooth   BLEU Score   ∆
    40    4        19.41        +0.34
    80    4        19.38        +0.31
    120   4        19.53        +0.46
    160   4        19.53        +0.46
    Table 6: Improving results when setting the smoothing factor value to 4.

    N     Smooth   BLEU Score   ∆
    40    8        19.39        +0.32
    80    8        19.40        +0.33
    120   8        19.48        +0.41
    160   8        19.38        +0.31
    Table 7: Improving results when setting the smoothing factor value to 8.

Clearly, the lesson from this evaluation is that better statistical machine translation quality can be obtained with our training smoothing scheme. Usually a more refined smoothing factor (a smaller value), together with a large range N over which the smoothing is applied, helps us obtain better translation quality. In fact, when the size of the set N is chosen too large it is not certain that we obtain a better result (for example, 160 vs. 120). This comes from the fact that for words which appear more often, the probability of being "pegged" in a wrong alignment pair is smaller than for words which appear less often. Therefore we do not need a constant smoothing factor that treats the different occurrence levels of words in the same way.

5.3 Evaluation On The Upgraded Version

The experimental evaluations above show that our bootstrapping word alignment algorithm clearly improves the translation quality of IBM Model 1: in almost all cases we are able to boost the quality of the translation models by around 0.5% BLEU. However, re-training the system too many times costs too much, and, as described above, a constant smoothing factor that treats all occurrence levels of words in the same way is not ideal either.

In this section we therefore improve the bootstrapping word alignment algorithm by improving the way the smoothing factors are chosen. That is, we do not choose the same smoothing factor for all bootstrapping iterations. Instead, we take the smoothing factors between each ∆_i and ∆_{i+1} as a consecutive sequence

    S = {0, 1, 2, 3, 4, 5, ..., n-1}

and obtain the corresponding set of thresholds

    N = {0, 1, 3, 6, 10, 15, ..., ∆_n}

This choice follows from what the English and Vietnamese statistical tables describe: for the words with smaller frequencies, the rarer the word, the more over-fitting we observe in its ANBA index, while the effect is slightly milder for words that occur more often. We integrate this improved choice of the set N, corresponding to the set S, into the upgraded version of the bootstrapping word alignment algorithm, and we found that the resulting algorithm achieves similar improvements.
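To make the two ways of choosing the threshold set concrete, the small sketch below builds the covering range for a constant smoothing factor (Section 5.2) and for the increasing sequence S of Section 5.3. Reading the constant-factor thresholds as multiples of the smoothing factor up to the covering range is our interpretation of the text, and the function names are hypothetical.

```python
def thresholds_constant(smooth, coverage):
    """Constant smoothing factor: Delta_i = i * smooth up to the covering range,
    e.g. thresholds_constant(4, 120) -> [4, 8, ..., 120]."""
    return list(range(smooth, coverage + 1, smooth))

def thresholds_increasing(size):
    """Increasing smoothing factors S = 0, 1, 2, ..., size-1, accumulated into
    N = 0, 1, 3, 6, 10, 15, ... (triangular numbers)."""
    total, out = 0, []
    for step in range(size):
        total += step
        out.append(total)
    return out

# thresholds_increasing(6) -> [0, 1, 3, 6, 10, 15]
```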
Tables 8 and 9 present in more detail how Model 1 and Model 2 are improved (measured by the increase in BLEU score) using this refining scheme. With the upgraded version, the improved choice of the set N not only needs much less computational power but also covers a larger range of word-occurrence classes. In practice, the bootstrapping word alignment algorithm for IBM Models 1-2 needs only a small number of runs compared with our original implementation.

    Size of S   BLEU Score   ∆
    14          19.45        +0.39
    15          19.44        +0.38
    16          19.71        +0.65
    17          19.47        +0.41
    18          19.53        +0.47
    Table 8: Evaluation on the refining scheme - improving IBM Model 1.

    Size of S   BLEU Score   ∆
    14          20.21        +0.67
    15          20.12        +0.58
    16          20.01        +0.47
    17          20.02        +0.48
    18          19.88        +0.34
    Table 9: Evaluation on the refining scheme - improving IBM Model 2.

5.4 Evaluation On Improving Fertility Model

In this section we come to one of the most interesting points we want to emphasize: the boost in translation quality that the fertility models obtain from the improved lexical translation modelling. The 0.5% BLEU improvement by itself is not the most important result. The most important point is that, by improving lexical translation modelling, the higher fertility-based models can boost their translation quality by around 1% BLEU compared with the original implementation.

To explain this boosting capacity of the higher translation models, we first consider the new word-statistics tables obtained when we apply our bootstrapping word alignment algorithm. These ANBA tables are derived from the run with |S| = 14 in the evaluations above. We can see clearly that the ANBA indices diverge much less than the results trained with our original method, because the dominant pairs were pegged and filtered out beforehand.

    Freq   Num of Words   ANBA   Deviation
    1      5874           2.07   0.49
    2      2350           1.59   0.05
    3      1444           1.46   0.01
    4      963            1.35   0.00
    5      698            1.27   0.01
    6      604            1.23   0.02
    7      471            1.21   0.03
    8      389            1.21   0.03
    9      301            1.19   0.03
    10     278            1.11   0.07
    σ = 0.73
    Table 10: The upgraded ANBA statistical table for Vietnamese words.

    Freq   Num of Words   ANBA   Deviation
    1      11815          3.63   2.94
    2      4219           2.18   0.07
    3      2190           1.93   0.00
    4      1350           1.82   0.01
    5      932            1.72   0.03
    6      681            1.63   0.08
    7      524            1.63   0.08
    8      412            1.56   0.13
    9      322            1.56   0.13
    10     305            1.48   0.19
    σ = 3.66
    Table 11: The upgraded ANBA statistical table for English words.

In other words, we have a better balance in the ANBA indices. This gives us better initial translation parameters to transfer as the starting parameters for training the fertility models. In addition to these better initial fertility transfer parameters, and more importantly, there is no "trick" that lets us train IBM Model 3 quickly while still "caring" about all the possible alignments of each pair of parallel sentences when searching for the best "Viterbi" alignments. Our strategy is to carry out the sums over translations only for some of the more probable alignments, ignoring the vast sea of much less probable ones. Since we begin with the most probable alignment we can find and then include all alignments that can be obtained from it by small changes, the improved results in lexical modelling are very important.

Based on the original equations of [9], and following our better "Viterbi" alignments from IBM Model 2, we obtain new translation probabilities and position alignment probabilities derived from the "Viterbi" alignment sequences produced by applying the bootstrapping word alignment algorithm to the training of IBM Models 1-2:

    t(f|e) = \frac{count(f, e)}{\sum_{f} count(f, e)}        (8)

    a(i|j, l_e, l_f) = \frac{count(i, j, l_e, l_f)}{\sum_{i} count(i, j, l_e, l_f)}        (9)

Hence we have new initial translation probabilities and new initial alignment probabilities. Using them as the initial parameters transferred to the training of IBM Model 3, we obtain better translation quality, as described in Table 12.

    Size of S   BLEU Score   ∆
    14          19.76        +1.06
    15          19.58        +0.88
    16          19.81        +1.11
    17          19.68        +0.98
    18          19.56        +0.86
    Table 12: Improving IBM Model 3.

It is clear from the above results that, by improving the lexical translation models and the training scheme, IBM Model 3 yields a better SMT system with a BLEU score increase of around 1%.
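The sketch below shows the re-estimation in equations (8)-(9), which turns the pegged "Viterbi" alignments into the initial t(f|e) and a(i|j, l_e, l_f) tables handed to IBM Model 3 training; representing each alignment as a list of (j, i) index pairs is our own convention, not the paper's.

```python
from collections import defaultdict

def reestimate_from_viterbi(bitext, viterbi):
    """bitext: list of ([f words], [e words]); viterbi[s]: list of (j, i) links
    for sentence pair s, with j indexing f and i indexing e."""
    t_count = defaultdict(float)
    a_count = defaultdict(float)
    for (fs, es), links in zip(bitext, viterbi):
        le, lf = len(es), len(fs)
        for j, i in links:
            t_count[(fs[j], es[i])] += 1.0
            a_count[(i, j, le, lf)] += 1.0

    # Equation (8): t(f|e) = count(f, e) / sum_f count(f, e)
    e_totals = defaultdict(float)
    for (f, e), c in t_count.items():
        e_totals[e] += c
    t = {(f, e): c / e_totals[e] for (f, e), c in t_count.items()}

    # Equation (9): a(i|j, le, lf) = count(i, j, le, lf) / sum_i count(i, j, le, lf)
    j_totals = defaultdict(float)
    for (i, j, le, lf), c in a_count.items():
        j_totals[(j, le, lf)] += c
    a = {(i, j, le, lf): c / j_totals[(j, le, lf)]
         for (i, j, le, lf), c in a_count.items()}
    return t, a
```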
Another interesting point is that in almost all cases we gain better results, almost always around 1% BLEU improvement. More appealing still, we can see that applying our improvement does not deeply change the monotonic behaviour of the ANBA index: we change it only slightly, yet the resulting improvement is quite impressive. This makes us strongly believe that in the future, by applying other and better translation training schemes, and by changing the monotonic behaviour of the ANBA index more deeply, we will be able to obtain an even stronger improvement in SMT system quality.

5.5 Evaluation On Larger Training Data

To see whether our improvement can be applied to other language pairs, in this section we deploy the improved training scheme on the English-French language pair; this also lets us test its influence on larger training data. We apply the improved bootstrapping algorithm to training data containing 100,000 English-French parallel sentences (4 million tokens). The original ANBA tables obtained when running the original IBM Models 1-2 (BLEU score 24.01) are shown in Tables 13 and 14. We choose the size of the set S to be 14. After applying our bootstrapping word alignment algorithm, we obtain a better SMT system with a BLEU score of 25.17 and better ANBA indices, as shown in Tables 15 and 16. Finally, the boost for the fertility models is shown in Table 17.

    Freq   Num of Words   ANBA   Deviation
    1      12230          3.72   2.49
    2      3880           2.80   0.43
    3      2418           2.46   0.10
    4      1606           2.16   0.00
    5      1140           1.95   0.04
    6      858            1.89   0.06
    7      670            1.73   0.17
    8      483            1.65   0.24
    9      432            1.55   0.35
    10     402            1.50   0.41
    σ = 4.30
    Table 13: The ANBA statistical table for French words.

    Freq   Num of Words   ANBA   Deviation
    1      7910           4.43   3.74
    2      3146           3.12   0.39
    3      1752           2.63   0.02
    4      1223           2.50   0.00
    5      898            2.29   0.04
    6      676            2.21   0.08
    7      540            2.10   0.16
    8      463            1.98   0.27
    9      390            1.89   0.37
    10     339            1.80   0.48
    σ = 5.55
    Table 14: The ANBA statistical table for English words.

    Freq   Num of Words   ANBA   Deviation
    1      12230          2.48   0.64
    2      3880           1.96   0.08
    3      2418           1.81   0.02
    4      1606           1.66   0.00
    5      1140           1.56   0.01
    6      858            1.58   0.01
    7      670            1.50   0.03
    8      483            1.43   0.06
    9      432            1.40   0.08
    10     402            1.40   0.08
    σ = 1.01
    Table 15: The upgraded ANBA statistical table for French words.

    Freq   Num of Words   ANBA   Deviation
    1      7910           2.95   1.10
    2      3146           2.15   0.06
    3      1752           2.07   0.02
    4      1223           1.87   0.00
    5      898            1.76   0.02
    6      676            1.72   0.03
    7      540            1.67   0.05
    8      463            1.65   0.06
    9      390            1.57   0.11
    10     339            1.59   0.10
    σ = 1.57
    Table 16: The upgraded ANBA statistical table for English words.

    System          BLEU Score   ∆
    Baseline        24.33
    Bootstrapping   26.05        +1.72
    Table 17: Improving IBM Model 3.

The original BLEU score of IBM Model 3 on this training data is 24.33. Our improvement obtains a BLEU score of 26.05, a significantly better result than the original. It appears that our bootstrapping algorithm improves the quality of the statistical phrase-based translation system even more on the larger training data.
6. CONCLUSION AND FUTURE WORK

Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations do not have statistically significant evidence for confident alignment, leading to incomplete or incorrect alignments. We have presented this aspect with the proposed ANBA index, and we have also pointed out that a pure statistical IBM translation model is not enough to reach the state of the art in word alignment modelling. To overcome this problem we presented an effective bootstrapping word alignment algorithm. In addition, we found that there are also other effective methods, quite simple and easy to apply, for boosting the quality of IBM Models 1-2 and hence improving overall machine translation quality. Following this scheme and these other methods, we believe the improvements will help the fertility models obtain state-of-the-art statistical alignment quality.

ACKNOWLEDGEMENT

This work is partially supported by the Vietnam National Foundation for Science and Technology Development (NAFOSTED), project code 102.99.35.09. This work is also partially supported by the project KC.01.TN04/11-15.

REFERENCES

[1] P. F. Brown, V. J. D. Pietra, S. A. D. Pietra, and R. L. Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19:263-311, June 1993.
[2] C. Callison-Burch, D. Talbot, and M. Osborne. Statistical machine translation with word- and sentence-aligned parallel corpora. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA, 2004. Association for Computational Linguistics.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[4] A. Fraser and D. Marcu. Measuring word alignment quality for statistical machine translation. Computational Linguistics, 33:293-303, September 2007.
[5] C. Hoang, A. Le, P. Nguyen, and T. Ho. Exploiting non-parallel corpora for statistical machine translation. In Proceedings of the 9th IEEE-RIVF International Conference on Computing and Communication Technologies, pages 97-102. IEEE Computer Society, 2012.
[6] C. Hoang, A. Le, and B. Pham. A systematic comparison of various statistical alignment models for statistical English-Vietnamese phrase-based translation (to appear). In Proceedings of the 4th International Conference on Knowledge and Systems Engineering. IEEE Computer Society, 2012.
[7] S. J. Ker and J. S. Chang. A class-based approach to word alignment. Computational Linguistics, 23:313-343, June 1997.
[8] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, pages 177-180, Stroudsburg, PA, USA, 2007. Association for Computational Linguistics.
[9] P. Koehn, F. J. Och, and D. Marcu. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03, pages 48-54, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.
[10] F. J. Och and H. Ney. Improved statistical alignment models. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL '00, pages 440-447, Stroudsburg, PA, USA, 2000. Association for Computational Linguistics.
[11] F. J. Och and H. Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, 29:19-51, March 2003.
[12] F. J. Och and H. Ney. The alignment template approach to statistical machine translation. Computational Linguistics, 30:19-51, June 2004.
[13] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311-318, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
[14] D. Vilar, M. Popović, and H. Ney. AER: Do we need to "improve" our alignments? In International Workshop on Spoken Language Translation, pages 205-212, Kyoto, Japan, November 2006.
