Scientific report: "HANDLING SYNTACTICAL AMBIGUITY IN MACHINE TRANSLATION" docx



HANDLING SYNTACTICAL AMBIGUITY IN MACHINE TRANSLATION

Vladimir Pericliev
Institute of Industrial Cybernetics and Robotics
Acad. O. Bontchev Str., bl. 12
1113 Sofia, Bulgaria

ABSTRACT

The difficulties met in resolving syntactical ambiguity in MT can be at least partially overcome by preserving the syntactical ambiguity of the source language in the target language. An extensive study of the correspondences between the syntactically ambiguous structures in English and Bulgarian has provided a solid empirical basis in favor of such an approach. Similar results could be expected for other sufficiently related languages as well. The paper concentrates on the linguistic grounds for adopting the approach proposed.

1. INTRODUCTION

Syntactical ambiguity, as part of the ambiguity problem in general, is widely recognized as a major difficulty in MT. To solve this problem, the efforts of computational linguists have been mainly directed to the process of analysis: a unique analysis is sought (semantical and/or world knowledge information being basically employed to this end), and only once such an analysis has been obtained does the system proceed to synthesis. On this approach, in addition to the well-known difficulties of a general-linguistic and computational character, there are two principal embarrassments to be encountered. It leaves us entirely unable to process, first, sentences with "unresolvable syntactical ambiguity" (with respect to the disambiguation information stored), and, secondly, sentences which must be translated ambiguously (e.g. puns and the like).

In this paper, the burden of solving the syntactical ambiguity problem is shifted from the domain of analysis to the domain of synthesis of sentences.
Thus, instead of trying to resolve such ambiguities in the source language (SL), syntactically ambiguous sentences are synthesized in the target language (TL) which preserve their ambiguity, so that the user himself, rather than the parser, disambiguates the ambiguities in question.

This way of handling syntactical ambiguity may be viewed as an illustration of a more general approach, outlined earlier (Penchev and Pericliev 1982, Pericliev 1983, Penchev and Pericliev 1984), concerned also with other types of ambiguities in the SL translated by means of syntactical, and not only syntactical, ambiguity in the TL.

In this paper, we will concentrate on the linguistic grounds for adopting such a manner of handling syntactical ambiguity in an English-into-Bulgarian translation system.

2. PHILOSOPHY

This approach may be viewed as an attempt to simulate the behavior of a man-translator who is linguistically very competent, but is quite unfamiliar with the domain he is translating his texts from. Such a man-translator will be able to say what words in the original and in the translated sentence go together under all of the syntactically admissible analyses; however, he will be, in general, unable to decide which of these parses "make sense". Our approach is an obvious way out of this situation, and it is in fact not infrequently employed in the everyday practice of more "smart" translators.

We believe that the capacity of such translators to produce quite intelligible translations is a fact that can have a very direct bearing on at least some trends in MT. Resolving syntactical ambiguity, or, to put it more accurately, evading syntactical ambiguity in MT following a similar human-like strategy is only one instance of this.

There are two further points that should be made in connection with the approach discussed.
We assume as more or less self-evident that:

(i) MT should not be intended to explicate texts in the SL by means of texts in the TL, as previous approaches imply, but should only translate them, no matter how ambiguous they might happen to be;

(ii) since ambiguities almost always pass unnoticed in speech, the user will unconsciously disambiguate them (as in fact he would have done, had he read the text in the SL); this, in effect, will not diminish the quality of the translation in comparison with the original, at least insofar as ambiguity is concerned.

3. THE DESCRIPTION OF SYNTACTICAL AMBIGUITY IN ENGLISH AND BULGARIAN

The empirical basis of the approach is provided by an extensive study of syntactical ambiguity in English and Bulgarian (Pericliev 1983), accomplished within the framework of a version of dependency grammar using dependency arcs and bracketings. In this study, from a given list of configurations for each language, all logically admissible ambiguous strings of three types in English and Bulgarian were calculated.

The first type of syntactically ambiguous strings is of the form:

(1) A -1,2,...-> B, e.g. The statistician studied(V) the whole year(PP), with the arc between V and PP labeled either adv.mod (how long?) or obj.dir (what?),

where A, B, ... are complexes of word-classes, "->" is a dependency arc, and 1, 2, ... are syntactical relations.

The second type is of the form:

(2) A -1-> B <-2- C, e.g. She greeted(V) the girl(N) with a smile(PP), with the PP either adv.mod (how?) to the V or attrib (what?) to the N.

The third type is of the form:

(3) A -1-> B <-1- C, e.g. He failed(V) entirely(Adv) to cheat(Vinf) her, with the Adv adv.mod (how?) either to the V or to the Vinf.

It was found, first, that almost all logically admissible strings of the three types are actually realized in both languages (cf. the same result also for Russian in Jordanskaja (1967)).
Secondly, and more important, there turned out to be a striking coincidence between the strings in English and Bulgarian; the latter was to be expected from the coincidence of configurations in both languages as well as from their sufficiently similar global syntactic organization.

4. TRANSLATIONAL PROBLEMS

With a view to the aims of translation, it was convenient to distinguish two cases: Case A, in which to each syntactically ambiguous string in English there corresponds a syntactically ambiguous string in Bulgarian, and Case B, in which some English strings have no corresponding Bulgarian ones. Case A provides a possibility for literal English-into-Bulgarian translation, while there is no such possibility for sentences containing strings classed under Case B.

4.1. Case A: Literal Translation

English strings which can be literally translated into Bulgarian comprise, roughly speaking, the majority and the most common of the strings appearing in real English texts. Informally, these strings can be grouped into several large classes of syntactically ambiguous constructions, such as constructions with "floating" word-classes (Adverbs, Prepositional Phrases, etc. acting as slaves either to one or to another master-word), constructions with prepositional and post-positional adjuncts to conjoined groups, constructions with several conjoined members, constructions with symmetrical predicates, some elliptical constructions, etc.

Due to space limitations, a few English phrases with their literal translations will suffice as an illustration of Case A.
(Further on, syntactical relations as labels of arcs will be omitted where superfluous in marking the ambiguity):

(4) a review(N) of a book(PP) [illegible](PP) ==> retsenzija(N) [illegible](PP) [illegible](PP)

(5) I saw(V) the car(N) outside(Adv) ==> Az vidjah(V) kolata(N) navan(Adv)

(6) very(Adv) [illegible](Adj) and [illegible](Adj) ==> mnogo(Adv) skromen(Adj) i razumen(Adj)

(7) beautiful(Adj) women(N) and girls(N) ==> krasivi(Adj) [illegible](N) i momicheta(N)

4.2. Case B: Non-Literal Translation

English strings which cannot be literally translated into Bulgarian are strings which contain: (i) word-classes (Vinf, Gerund) not present in Bulgarian, and/or (ii) syntactical relations (e.g. "composite": language theory, etc.) not present in Bulgarian, and/or (iii) other differences (in global syntactical organization, agreement, etc.).

It will be shown how certain English strings falling under this heading are related to Bulgarian strings preserving their ambiguity. A way to overcome difficulties with (ii) and (iii) is exemplified on a very common (complex) string, viz. Adj/N/Prt+N/N's+N (e.g. stylish gentlemen's suits).

As an illustration, here we confine ourselves to problems met with (i), and, more concretely, to English strings containing Vinf. These strings are mapped onto Bulgarian strings containing a da-construction or a verbal noun (Vinf generally being translated either way). E.g. the Vinf in

(8) a. He promised(V) to please(Vinf) mother, with the arc between V and Vinf labeled either obj.dir or adv.mod (promised what or why?),

is rendered by a da-construction in agreement with the subject, preserving the ambiguity:

b. Toj obeshta(V) da zaradva(da-constr) majka, with the arc between V and da-construction labeled either obj.dir or adv.mod.

In the string

(9) a. I have(V) instructions(N) to study(Vinf), with the arc between N and Vinf labeled either attrib or obj.dir (what instructions or I have to study what?),

Vinf can be rendered alternatively by a da-construction or by a prepositional verbal noun:

b.
Az imam(V) instruktsii(N) da ucha(da-constr), with the arc between N and da-construction labeled either attrib or obj.dir

c. Az imam(V) instruktsii(N) za uchene(PrVblN), with the arc between N and PrVblN labeled either attrib or obj.dir

Yet in other strings, e.g. The chicken(N) is ready(Adj) to eat(Vinf) (the chicken eats or is eaten), in order to preserve the ambiguity the infinitive should be rendered by a prepositional verbal noun: Pileto(N) e gotovo(Adj) za jadene(PrVblN), rather than by the finite da-construction, since in the latter case we would obtain two unambiguous translations: Pileto e gotovo da jade (the chicken eats) or Pileto e gotovo da se jade (the chicken is eaten), and so on.

For some English strings no syntactically ambiguous Bulgarian strings could be put into correspondence, so that a translation with our method proved to be impossible. E.g.

(10) He found(V) the mechanic(N) a helper(N), with the mechanic and a helper either obj.dir and predicative, or obj.indir and obj.dir (either the mechanic or someone else is the helper),

is such a sentence, since in Bulgarian two non-prepositional objects, a direct and an indirect one, cannot appear in one sentence.

4.3. Multiple Syntactical Ambiguity

Many very frequently encountered cases of multiple syntactical ambiguity can also be handled successfully within this approach. E.g. a phrase like Cybernetical devices and systems for automatic control and diagnosis in biomedicine, with more than 30 possible parsings, is amenable to literal translation into Bulgarian.

4.4. Semantically Irrelevant Syntactical Ambiguity

Disambiguating syntactical ambiguity is an important task in MT only because different meanings are usually associated with the different syntactical descriptions. This, however, is not always the case. There are some constructions in English whose syntactical ambiguity cannot lead to multiple understanding. E.g.
in sentences of the form A is not B (He is not happy), in which the adverbial particle not is either a verbal negation (He isn't happy) or a non-verbal negation (He's not happy), the different syntactical trees will be interpreted semantically as synonymous: 'A is not B' <==> 'A is not-B'.

We should not worry about finding Bulgarian syntactically ambiguous correspondences for such English constructions. We can choose one analysis arbitrarily, since either of the syntactical descriptions will provide correct information for our translational purposes. Indeed, the construction above has no ambiguous Bulgarian correspondence: in Bulgarian the negating particle combines either with the verb (then it is written as a separate word) or with the adjective (in which case it is prefixed to it). Either construction, however, will yield a correct translation: Toj ne e radosten or Toj e neradosten.

4.5. A Lexical Problem

Certain difficulties may arise even after English syntactically ambiguous strings have been mapped onto ambiguous Bulgarian ones. These difficulties are due to the different behavior of certain English lexemes in comparison to their Bulgarian equivalents. This behavior is displayed in the phenomenon we call "intralingual lexical-resolution of syntactical ambiguity" (the substitution of lexemes in the SL with their translational equivalents from the TL results in the resolution of the syntactical ambiguity).

For instance, in spite of the existence of ambiguous strings in both languages of the form Verb(tr/itr) -> Noun, with some particular lexemes (e.g. shoot(tr/itr) ==> zastreljam(tr) or streljam(itr)), in which to one English lexeme there correspond two in Bulgarian (one only transitive, and the other only intransitive), the ambiguity in the translation will be lost.
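The loss of ambiguity through lexical substitution can be illustrated with a minimal sketch. It assumes a toy lexicon in which each verb carries a set of valencies; the `Verb` class, the dictionary layout, and the function name are hypothetical, and only the shoot / zastreljam / streljam entry is taken from the text (the `en_verb_x` / `bg_verb_x` pair is a placeholder, not a real lexeme).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    form: str
    valency: frozenset  # subset of {"tr", "itr"}


# Toy lexicon (hypothetical structure). Only the "shoot" entry comes from the
# text; "en_verb_x" / "bg_verb_x" are placeholders, not real lexemes.
ENGLISH = {
    "shoot": Verb("shoot", frozenset({"tr", "itr"})),
    "en_verb_x": Verb("en_verb_x", frozenset({"tr", "itr"})),
}
BULGARIAN_EQUIVALENTS = {
    "shoot": [
        Verb("zastreljam", frozenset({"tr"})),
        Verb("streljam", frozenset({"itr"})),
    ],
    "en_verb_x": [Verb("bg_verb_x", frozenset({"tr", "itr"}))],
}


def ambiguity_preserving_equivalents(english_form):
    """Return Bulgarian equivalents that allow every valency of the source verb."""
    source = ENGLISH[english_form]
    return [
        target.form
        for target in BULGARIAN_EQUIVALENTS[english_form]
        if source.valency <= target.valency  # subset test on valency sets
    ]


print(ambiguity_preserving_equivalents("shoot"))      # []
print(ambiguity_preserving_equivalents("en_verb_x"))  # ['bg_verb_x']
```

An empty result for shoot means that neither Bulgarian equivalent covers both the transitive and the intransitive reading, so any choice of lexeme resolves the ambiguity.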
This situation explains why it seems impossible to translate ambiguously into Bulgarian examples containing verbs of the type given, or verbal nouns formed from such verbs, as is the case in The shooting of the hunters.

This problem, however, could generally be tackled in translation into Bulgarian, since it is a language that usually provides a series of forms for a verb: transitive, intransitive, and transitive/intransitive, which are more or less synonymous (for more details, cf. Penchev and Pericliev (1984)).

5. CONCLUDING REMARKS

To conclude, some syntactically ambiguous strings in English have literal correspondences in Bulgarian, others have non-literal ones, and still others have none. In summary, from a total of approximately 200 simple strings treated in English, more than 3/4 can, and only about 1/4 cannot, be literally translated; about half of the latter strings can be put into correspondence with syntactically ambiguous strings in Bulgarian preserving their ambiguity. This gives quite strong support to the usefulness of our approach in an English-into-Bulgarian translation system.

Several advantages of this way of handling syntactical ambiguity can be mentioned.

First, in the processing of the majority of syntactically ambiguous sentences within an English-into-Bulgarian translation system, it dispenses with semantical and world knowledge information at the very low cost of studying the ambiguity correspondences in both languages. It could be expected that investigations along this line will prove fruitful for other pairs of languages as well.

Secondly, whenever this way of handling syntactical ambiguity is applicable, translating sentences with unresolvable ambiguity, or with verbal jokes and the like, which previous approaches cannot do, turns out to be an easily attainable task.

Thirdly, the approach seems to have a very natural extension to another principal difficulty in MT, viz. coreference (cf.
the three-way ambiguity of Jim hit John and then he (Jim, John or neither?) went away and the same ambiguity of toj (= he) in its literal translation into Bulgarian: Djim udari Djon i togava toj(?) si otide).

And, finally, there is yet another reason for adopting the approach discussed here. Even if we choose to go another way and (somehow) disambiguate sentences in the SL, almost certainly their translational equivalents will again be syntactically ambiguous, and quite probably preserve the very ambiguity we tried to resolve. In this sense, for the purposes of MT (or other man-oriented applications of CL) we need not waste our efforts to disambiguate e.g. sentences like John hit the dog with the long bat or John hit the dog with the long wool, since, even if we have done that, the correct Bulgarian translations of both these sentences are syntactically ambiguous in exactly the same way, the resolution of ambiguity thus proving to be an entirely superfluous operation (cf. Djon udari kucheto s dalgata palka and Djon udari kucheto s dalgata valna).

6. REFERENCES

Jordanskaja, L. 1967. Syntactical ambiguity in Russian (with respect to automatic analysis and synthesis). Scientific and Technical Information, Moscow, No. 5, 1967. (in Russian).

Penchev, J. and V. Pericliev. 1982. On meaning in theoretical and computational semantics. In: COLING-82, Abstracts, Prague, 1982.

Penchev, J. and V. Pericliev. 1984. On meaning in theoretical and computational semantics. Bulgarian Language, Sofia, No. 4, 1984. (in Bulgarian).

Pericliev, V. 1983. Syntactical Ambiguity in Bulgarian and in English. Ph.D. Dissertation, ms., Sofia, 1983. (in Bulgarian).
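As a rough check on the proportions reported in the concluding remarks, the following sketch computes how many strings would be covered by ambiguity-preserving translation. It assumes exact values where the text gives approximations (200 strings, exactly 3/4 literal, exactly half of the remainder non-literal), so the result is only an estimate; the function and constant names are illustrative.

```python
from fractions import Fraction

# Figures taken from the concluding remarks, rounded to exact values for
# illustration: "approximately 200" strings, "more than 3/4" literal,
# "about half" of the remainder translated non-literally.
TOTAL_STRINGS = 200
LITERAL_SHARE = Fraction(3, 4)
NON_LITERAL_PRESERVED_SHARE = Fraction(1, 2)


def coverage(total, literal_share, preserved_share):
    """Split the strings into literal, non-literal and untranslatable groups."""
    literal = total * literal_share
    remainder = total - literal
    non_literal = remainder * preserved_share
    return {
        "literal": literal,
        "non_literal": non_literal,
        "no_correspondence": remainder - non_literal,
        "preserved_share": (literal + non_literal) / total,
    }


result = coverage(TOTAL_STRINGS, LITERAL_SHARE, NON_LITERAL_PRESERVED_SHARE)
for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumptions roughly 150 strings are translated literally and 25 non-literally, so about 7/8 of the strings keep their ambiguity in Bulgarian. Since the text says "more than 3/4", the actual share would be somewhat higher.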

Date posted: 17/03/2014, 19:21
