
Scientific report: "Analysis Grammar of Japanese in the Mu-Project"


Analysis Grammar of Japanese in the Mu-Project
- A Procedural Approach to Analysis Grammar -

Jun-ichi TSUJII, Jun-ichi NAKAMURA and Makoto NAGAO
Department of Electrical Engineering, Kyoto University, Kyoto, JAPAN

Abstract

Analysis grammar of Japanese in the Mu-Project is presented. It is emphasized that rules expressing constraints on single linguistic structures and rules for selecting the most preferable readings are completely different in nature, and that rules for selecting preferable readings should be utilized in analysis grammars of practical MT systems. It is also claimed that procedural control is essential in integrating such rules into a unified grammar. Some sample rules are given to make the points of discussion clear and concrete.

1. Introduction

The Mu-Project is a Japanese national project supported by grants from the Special Coordination Funds for Promoting Science & Technology of STA (Science and Technology Agency), which aims to develop Japanese-English and English-Japanese machine translation systems. We currently restrict the domain of translation to abstracts of scientific and technological papers. The systems are based on the transfer approach[1], and consist of three phases: analysis, transfer and generation. In this paper, we focus on the analysis grammar of Japanese in the Japanese-English system.

The grammar has been developed by using GRADE, which is a programming language specially designed for this project[2]. The grammar now consists of about 900 GRADE rules. The experiments so far show that the grammar works very well and is comprehensive enough to treat various linguistic phenomena in abstracts. In this paper we will discuss some of the basic design principles of the grammar together with its detailed construction. Some examples of grammar rules and analysis results will be shown to make the points of our discussion clear and concrete.

2. Procedural Grammar

There has been a prominent tendency in recent computational linguistics to re-evaluate CFG and use it directly or augment it to analyze sentences[3,4,5]. In these systems (frameworks), CFG rules independently describe constraints on single linguistic structures, and a universal rule application mechanism automatically produces a set of possible structures which satisfy the given constraints. It is well-known, however, that such sets of possible structures often become unmanageably large.

Because two separate rules such as

    NP -> NP PREP-P
    VP -> VP PREP-P

are usually prepared in CFG grammars in order to analyze noun and verb phrases modified by prepositional phrases, CFG grammars provide two syntactic analyses for

    She was given flowers by her uncle.

Furthermore, the ambiguity of the sentence is doubled by the lexical ambiguity of "by", which can be read as either a locative or an agentive preposition. Since the two syntactic structures are recognized by completely independent rules and the semantic interpretations of "by" are given by independent processes in the later stages, it is difficult to compare these four readings during the analysis and give a preference to one of them.

A rule such as

    "If a sentence is passive and there is a "by"-prepositional phrase, it is often the case that the prepositional phrase fills the deep agentive case (try this analysis first)."

seems reasonable and quite useful for choosing the most preferable interpretation, but it cannot be expressed by refining the ordinary CFG rules. This kind of rule is quite different in nature from a CFG rule. It is not a rule of constraint on a single linguistic structure (in fact, the above four readings are all linguistically possible), but a "heuristic" rule concerned with preference of readings, which compares several alternative analysis paths and chooses the most feasible one.
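The difference between a constraint and a preference can be made concrete with a small sketch. The Python below is only an illustration and is not GRADE: the `Reading` type, the function names, and the encoding of the four readings as (attachment, role) pairs are assumptions introduced here. It also assumes that "fills the deep agentive case" implies attachment to the verb phrase. The point it shows is that a preference rule reorders the linguistically possible readings instead of rejecting any of them.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Reading:
    """One reading of the by-phrase (hypothetical encoding, not from the paper)."""
    attachment: str  # "NP" or "VP": which phrase the by-phrase modifies
    by_role: str     # "locative" or "agentive": interpretation of "by"


def enumerate_readings():
    """All four readings that independent CFG rules plus lexical ambiguity allow."""
    return [Reading(a, r) for a, r in product(("NP", "VP"), ("locative", "agentive"))]


def prefers(reading, is_passive):
    """Preference rule quoted in the text: in a passive sentence, try the
    analysis in which the by-phrase fills the deep agentive case first.
    Treating that as VP attachment is an assumption of this sketch."""
    return is_passive and reading.attachment == "VP" and reading.by_role == "agentive"


def order_readings(readings, is_passive):
    """Put preferred readings first; sorted() is stable, so nothing is dropped
    and the remaining readings keep their original relative order."""
    return sorted(readings, key=lambda r: not prefers(r, is_passive))


if __name__ == "__main__":
    # "She was given flowers by her uncle." is passive.
    for reading in order_readings(enumerate_readings(), is_passive=True):
        print(reading)
```

All four readings remain available as fallbacks, which matches the "try this analysis first" wording of the rule.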
Human translators (or humans in general) have many such preference rules based on various sorts of cues such as morphological forms of words, collocations of words, text styles, word semantics, etc. These heuristic rules are quite useful not only for increasing efficiency but also for preventing proliferation of analysis results. As Wilks[6] pointed out, we cannot use semantic information as constraints on single linguistic structures, but just as preference cues to choose the most feasible interpretations among linguistically possible interpretations. We claim that many sorts of preference cues other than semantic ones exist in real texts which cannot be captured by CFG rules. We will show in this paper that, by utilizing various sorts of preference cues, our analysis grammar of Japanese can work almost deterministically to give the most preferable interpretation as the first output, without any extensive semantic processing (note that even "semantic" processing cannot disambiguate the above sentence. The four readings are semantically possible. It requires deep understanding of contexts or situations, which we cannot expect in a practical MT system).

In order to integrate heuristic rules based on various levels of cues into a unified analysis grammar, we have developed a programming language, GRADE. GRADE provides us with the following facilities.

- Explicit Control of Rule Applications: Heuristic rules can be ordered according to their strength (See 4-2).

- Multiple Relation Representation: Various levels of information including morphological, syntactic, semantic, logical etc. are expressed in a single annotated tree and can be manipulated at any time during the analysis. This is required not only because many heuristic rules are based on heterogeneous levels of cues, but also because the analysis grammar should perform semantic/logical interpretation of sentences at the same time and the rules for these phases should be written in the same framework as syntactic analysis rules (See 4-2, 4-4).

- Lexicon Driven Processing: We can write heuristic rules specific to a single or a limited number of words, such as rules concerned with collocations among words. These rules are strong in the sense that they almost always succeed. They are stored in the lexicon and invoked at appropriate times during the analysis without decreasing efficiency (See 4-1).

- Explicit Definition of Analysis Strategies: The whole analysis phase can be divided into steps. This makes the whole grammar efficient, natural and easy to read. Furthermore, strategic consideration plays an essential role in preventing undesirable interpretations from being generated (See 4-3).

3. Organization of Grammar

In this section, we will give the organization of the grammar necessary for understanding the discussion in the following sections. The main components of the grammar are as follows.

(1) Post-Morphological Analysis
(2) Determination of Scopes
(3) Analysis of Simple Noun Phrases
(4) Analysis of Simple Sentences
(5) Analysis of Embedded Sentences (Relative Clauses)
(6) Analysis of Relationships of Sentences
(7) Analysis of Outer Cases
(8) Contextual Processing (Processing of omitted case elements, Interpretation of 'Ha', etc.)
(9) Reduction of Structures for Transfer Phase

Each component consists of from 60 to 120 GRADE rules.

47 morpho-syntactic categories are provided for Japanese analysis, each of which has its own lexical description format. 12,000 lexical entries have already been prepared according to the formats. In this classification, Japanese nouns are categorized into 8 sub-classes according to their morpho-syntactic behaviour, and 53 semantic markers are used to characterize their semantic behaviour.
Each verb has a set of case frame descriptions (CFD) which correspond to different usages of the verb. A CFD gives mapping rules between surface case markers (SCM; postpositional case particles are used as SCM's in Japanese) and their deep case interpretations (DCI; 33 deep cases are used). The DCI of an SCM often depends on verbs, so the mapping rules are given to the CFD's of individual verbs. A CFD also gives a normal collocation between the verb and SCM's (postpositional case particles). Detailed lexical descriptions are given and discussed in another paper[7].

The analysis results are dependency trees which show the semantic relationships among input words.

4. Typical Steps of Analysis Grammar

In the following, we will take some sample rules to illustrate our points of discussion.

4-1 Relative Clauses

Relative clause constructions in Japanese express several different relationships between modifying clauses (relative clauses) and their antecedents. Some relative clause constructions cannot be translated as relative clauses in English. We classified Japanese relative clauses into the following four types, according to the relationships between clauses and their antecedents.

(1) Type 1 : Gaps in Cases
One of the case elements of the relative clause is deleted and the antecedent fills the gap.

(2) Type 2 : Gaps in Case Elements
The antecedent modifies a case element in the clause. That is, a gap exists in a noun phrase in the clause.

(3) Type 3 : Apposition
The clause describes the content of the antecedent, as the English "that"-clause does in 'the idea that the earth is round'.

(4) Type 4 : Partial Apposition
The antecedent and the clause are related by certain semantic/pragmatic relationships. The relative clause of this type doesn't have any gaps. This type cannot be translated directly into English relative clauses. We have to interpolate in English appropriate phrases or clauses which are implicit in Japanese, in order to express the semantic/pragmatic relationships between the antecedents and relative clauses explicitly. In other words, gaps exist in the interpolated phrases or clauses.

Because the above four types of relative clauses have the same surface forms in Japanese,

    [ ... (verb) ](Relative Clause)  [ (noun) ](Antecedent)

careful processing is required to distinguish them (note that the "antecedents", that is, the modified nouns, are located after the relative clauses in Japanese). A sophisticated analysis procedure has already been developed, which fully utilizes various levels of heuristic cues as follows.

(Rule 1) There are a limited number of nouns which are often used as antecedents of Type 3 clauses.

(Rule 2) When nouns with certain semantic markers appear in the relative clauses and those nouns are followed by one of the specific postpositional case particles, there is a high possibility that the relative clauses are Type 2. In the following example, the word "SHORISOKUDO" (processing speed) has the semantic marker AO (attribute).

[ex-1] [Type 2]
    "SHORISOKUDO" (processing speed)  "GA" (case particle: subject case)  "HAYAI" (high)  "KEISANKI" (computer)
    Relative clause: "SHORISOKUDO GA HAYAI"; antecedent: "KEISANKI"
    -> (English Translation) A computer whose processing speed is high

(Rule 3) Nouns such as "MOKUTEKI" (purpose), "GEN-IN" (reason), "SHUDAN" (method) etc. express deep case relationships by themselves, and, when these nouns appear as antecedents, it is often the case that they fill the gaps of the corresponding deep cases in the relative clauses.

[ex-2] [Type 1]
    "KONO" (this)  "SOUCHI" (device)  "O" (case particle: object case)  "TSUKAT" (to use)  "TA" (tense formative: past)  "MOKUTEKI" (purpose)
    Relative clause: "KONO SOUCHI O TSUKAT TA"; antecedent: "MOKUTEKI"
    -> (English Translation) The purpose for which (someone) used this device
                             The purpose of using this device

(Rule 4) There is a limited number of nouns which are often used as antecedents in Type 4 relative clauses.
Each of such nouns requires a specific phrase or clause to be interpolated in English.

[ex-3] [Type 4]
    "KONO" (this)  "SOUCHI" (device)  "O" (case particle: object case)  "TSUKAT" (to use)  "TA" (tense formative: past)  "KEKKA" (result)
    Relative clause: "KONO SOUCHI O TSUKAT TA"; antecedent: "KEKKA"
    -> (English Translation) The result which was obtained by using this device

In the above example, the clause "the result which someone obtained (the result : gap)", which relates the antecedent "KEKKA" (result) and the relative clause "KONO SOUCHI O TSUKAT TA" (someone used this device), is omitted in Japanese.

A set of lexical rules is defined for "KEKKA" (result), which basically works as follows: it examines first whether the deep object case has already been filled by a noun phrase in the relative clause. If so, the relative clause is taken as Type 4 and an appropriate phrase is interpolated as in [ex-3]. If not, the relative clause is taken as Type 1, as in the following example where the noun "KEKKA" (result) fills the gap of the object case in the relative clause.

[ex-4] [Type 1]
    "KONO" (this)  "JIKKEN" (experiment)  "GA" (case particle: subject case)  "TSUKAT" (to use)  "TA" (tense formative: past)  "KEKKA" (result)
    Relative clause: "KONO JIKKEN GA TSUKAT TA"; antecedent: "KEKKA"
    -> (English Translation) The result which this experiment used

Such lexical rules are invoked at the beginning of the relative clause analysis by a rule in the main flow of processing. The noun "KEKKA" (result) is given a mark as a lexical property which indicates that the noun has special rules to be invoked when it appears as an antecedent of a relative clause. All the nouns which require special treatment in the relative clause analysis are given the same marker. The rule in the main flow only checks this mark and invokes the lexical rules defined in the lexicon.

(Rule 5) Only the cases marked by postpositional case particles 'GA', 'WO' and 'NI' can be deleted in Type 1 relative clauses, when the antecedents are ordinary nouns. Gaps in Type 1 relative clauses can have other surface case marks only when the antecedents are special nouns such as those described in Rule (3).

4-2 Conjuncted Noun Phrases

Conjuncted noun phrases often appear in abstracts of scientific and technological papers. It is important to analyze them correctly, especially to determine scopes of conjunctions correctly, because they often lead to proliferation of analysis results. The particle 'TO' plays almost the same role as the English "and" to conjunct noun phrases. There are several heuristic rules based on various levels of information to determine the scopes.

<Scope Decision Rules of Conjuncted Noun Phrases by Particle 'TO'>

(Rule 1) Since the particle 'TO' is also used as a case particle, if it appears in the position:

    Noun 'TO' verb Noun,
    Noun 'TO' adjective Noun,

there are two possible interpretations, one in which 'TO' is a case particle and "noun TO adjective (verb)" forms a relative clause that modifies the second noun, and the other in which 'TO' is a conjunctive particle forming a conjuncted noun phrase. However, it is very likely that the particle 'TO' is not a conjunctive particle but a postpositional case particle, if the adjective (verb) is one of the adjectives (verbs) which require case elements with the surface case mark 'TO' and there are no extra words between 'TO' and the adjective (verb). In the following example, "KOTONARU" (to be different) is an adjective which is often collocated with a noun phrase followed by the case particle 'TO'.
[ex-5]
    YOSOKU-CHI (predicted value)  'TO'  KOTONARU (to be different)  ATAI (value)

    [dominant interpretation]
    [ YOSOKU-CHI 'TO' KOTONARU ](relative clause)  [ ATAI ](antecedent)
    = the value which is different from the predicted value

    [less dominant interpretation]
    [ [ YOSOKU-CHI ](NP) 'TO' [ KOTONARU ATAI ](NP) ](conjuncted noun phrase)
    = the predicted value and the different value

(Rule 2) If two 'TO' particles appear in the position:

    Noun-1 'TO' ... Noun-2 'TO' 'NO' Noun-3

the right boundary of the scope of the conjunction is almost always Noun-2. The second 'TO' plays the role of a delimiter which delimits the right boundary of the conjunction. This 'TO' is optional, but in real texts one often places it to make the scope unambiguous, especially when the second conjunct is a long noun phrase and the scope is highly ambiguous without it. Because the second 'TO' can be interpreted as a case particle (not as a delimiter of the conjunction) and 'NO' following a case particle turns the preceding phrase into a modifier of a noun, an interpretation in which "Noun-2 TO NO" is taken as a modifier of Noun-3 and Noun-3 is taken as the head noun of the second conjunct is also linguistically possible. However, in most cases, when two 'TO' particles appear in the above position, the second 'TO' is just a delimiter of the scope (see [ex-6]).
[ex-6]
    YOSOKU-CHI (predicted value)  TO  JIKKEN (experiment)  DE (case particle: place)  NO  JISSOKU-CHI (actual value)  TO  NO  SA (difference)

    [dominant interpretation]
    [ [ [ YOSOKU-CHI ](NP) TO [ JIKKEN DE NO JISSOKU-CHI ](NP) TO ](conjuncted NP) NO SA ](NP)
    = the difference between the predicted value and the actual value in the experiment

    [less dominant interpretations]
    (A) [ [ YOSOKU-CHI ](NP) TO [ JIKKEN ](NP) ](conjuncted NP) DE NO JISSOKU-CHI TO NO SA
    = the difference with the actual value in the predicted value and the experiment
    (B) [ [ YOSOKU-CHI ](NP) TO [ JIKKEN DE NO JISSOKU-CHI TO NO SA ](NP) ](conjuncted NP)
    = the predicted value and the difference with the actual value in the experiment

(Rule 3) If a special noun which is often collocated with conjunctive noun phrases appears in the position:

    Noun-1 'TO' ... Noun-2 'NO' <special-noun>,

the right boundary of the conjunction is almost always Noun-2. Such special nouns are marked in the lexicon. In the following example, "KANKEI" is such a special noun.

[ex-7]
    JISSOKU-CHI (actual value)  'TO'  RIRON-DE (theory)  E-TA (to obtain)  YOSOKU-CHI (predicted value)  NO  KANKEI (relationship; special noun)

    [dominant interpretation]
    [ [ JISSOKU-CHI ](NP) 'TO' [ [ RIRON-DE E-TA ](relative clause) [ YOSOKU-CHI ](antecedent) ](NP) ](conjuncted NP) NO KANKEI
    = the relationship between the actual value and the predicted value obtained by the theory

    [less dominant interpretations]
    (A) [ [ [ JISSOKU-CHI ](NP) 'TO' [ RIRON ](NP) ](conjuncted NP) -DE E-TA ](relative clause) [ YOSOKU-CHI ](antecedent) NO KANKEI
    = the relationship of the predicted value which was obtained by the actual value and the theory
    (B) [ [ JISSOKU-CHI ](NP) 'TO' [ RIRON-DE E-TA YOSOKU-CHI NO KANKEI ](NP) ](conjuncted NP)
    = the actual value and the relationship of the predicted value which was obtained by the theory

(Rule 4) In

    Noun-1 'TO' ... Noun-2,

if Noun-1 and Noun-2 are the same nouns, the right boundary of the conjunction is almost always Noun-2.

(Rule 5) In

    Noun-1 'TO' ... Noun-2,

if Noun-1 and Noun-2 are not exactly the same but are nouns with the same morphemes, the right boundary is often Noun-2. In [ex-7] above, both of the head nouns of the conjuncts, JISSOKU-CHI (actual value) and YOSOKU-CHI (predicted value), have the same morpheme "CHI" (which means "value"). Thus, this rule can correctly determine the scope even if the special word "KANKEI" (relationship) does not exist.

(Rule 6) If some special words (like 'SONO', 'SORE-NO' etc., which roughly correspond to 'the', 'its' in English) appear in the position:

    [phrases which modify noun phrases] Noun-1 'TO' <special word> Noun-2,

the modifiers preceding Noun-1 modify only Noun-1 but not the whole conjuncted noun phrase.

(Rule 7) In

    Noun-1 'TO' ... Noun-2,

if Noun-1 and Noun-2 belong to the same specific semantic categories, like action nouns, abstract nouns etc., the right boundary is often Noun-2.

(Rule 8) In most conjuncted noun phrases, the structures of conjuncts are well-balanced. Therefore, if a relative clause precedes the first conjunct and the length of the second conjunct (the number of words between 'TO' and Noun-2) is short, like

    [Relative Clause] Noun-1 'TO' ... Noun-2

the relative clause modifies both conjuncts, that is, the antecedent of the relative clause is the whole conjuncted phrase.

These heuristic rules are based on different levels of information (some are based on surface lexical items, some on morphemes of words, some on semantic information) and may lead to different decisions about scopes. However, we can distinguish strong heuristic rules (i.e. rules which almost always give correct scopes when they are applied) from others. In fact, there exists some ordering of heuristic rules according to their strength. Rules (1), (2), (3), (4) and (6), for example, almost always succeed, and rules like (7) and (8) often lead to wrong decisions. Rules like (7) and (8) should be treated as default rules which are applied only when the other stronger rules cannot decide the scopes.
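The ordering of strong and default rules can be sketched as a first-match rule cascade. The Python below is a minimal illustration under stated assumptions, not GRADE code: the `Candidate` record, the function names, and the approximation of "same morpheme" by hyphen-separated segments of the romanized noun are all hypothetical. Only Rules (3), (4), (5) and (7) are modelled, and the position of Rule (5) between the strong and the default rules is an assumption, since the text does not rank it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Special nouns marked in the lexicon; only "KANKEI" is attested in [ex-7].
SPECIAL_NOUNS = {"KANKEI"}


@dataclass(frozen=True)
class Candidate:
    """A 'Noun-1 TO ... Noun-2' configuration (hypothetical representation)."""
    noun1: str
    noun2: str
    after_noun2: Tuple[str, ...] = ()  # tokens that follow Noun-2
    sem1: Optional[str] = None         # semantic category of Noun-1
    sem2: Optional[str] = None         # semantic category of Noun-2


def rule3_special_noun(c):
    """Strong: Noun-2 'NO' <special-noun> closes the conjunction at Noun-2."""
    if len(c.after_noun2) >= 2 and c.after_noun2[0] == "NO" and c.after_noun2[1] in SPECIAL_NOUNS:
        return "Noun-2"
    return None


def rule4_same_noun(c):
    """Strong: identical nouns on both sides of 'TO'."""
    return "Noun-2" if c.noun1 == c.noun2 else None


def rule5_shared_morpheme(c):
    """Different nouns sharing a morpheme (hyphen segments stand in for morphemes)."""
    shared = set(c.noun1.split("-")) & set(c.noun2.split("-"))
    return "Noun-2" if c.noun1 != c.noun2 and shared else None


def rule7_same_category(c):
    """Default: both nouns belong to the same semantic category."""
    return "Noun-2" if c.sem1 is not None and c.sem1 == c.sem2 else None


# Stronger rules first; the default rule is consulted only if they all abstain.
ORDERED_RULES = [
    ("rule 3", rule3_special_noun),
    ("rule 4", rule4_same_noun),
    ("rule 5", rule5_shared_morpheme),
    ("rule 7", rule7_same_category),
]


def decide_right_boundary(c):
    """Return (rule name, right boundary) from the first rule that fires, else None."""
    for name, rule in ORDERED_RULES:
        decision = rule(c)
        if decision is not None:
            return name, decision
    return None


if __name__ == "__main__":
    ex7 = Candidate("JISSOKU-CHI", "YOSOKU-CHI", after_noun2=("NO", "KANKEI"))
    print(decide_right_boundary(ex7))
```

Because the list is scanned in order, a weaker rule can never override a stronger one. This is the behaviour the text asks of default rules.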
We can define in GRADE an arbitrary ordering of rule applications. This capability of controlling the sequences of rule applications is essential in integrating heuristic rules based on heterogeneous levels of information into a unified set of rules.

Note that most of these rules cannot be naturally expressed by ordinary CFG rules. Rule (2), for example, is a rule which blocks the application of an ordinary CFG rule such as

    NP -> NP <case-particle> NO N

when the <case-particle> is 'TO' and a conjunctive particle 'TO' precedes this sequence of words.

4-3 Determination of Scopes

Scopes of conjuncted noun phrases often overlap with scopes of relative clauses, which makes the problem of scope determination more complicated. For a surface sequence of phrases like

    NP-1 'TO' NP-2 <case-particle> <verb> NP-3

there are two possible relationships between the scope of the conjuncted noun phrase and the relative clause:

    (1) [ [ NP-1 'TO' NP-2 ](conjuncted noun phrase) <case-particle> <verb> ](relative clause) [ NP-3 ](antecedent)
        (the whole forms an NP)

    (2) [ NP-1 'TO' [ [ NP-2 <case-particle> <verb> ](relative clause) [ NP-3 ](antecedent) ](NP) ](conjuncted noun phrase)

This ambiguity, together with the genuine ambiguities in scopes of conjuncted noun phrases in 4-2, produces combinatorial interpretations in CFG grammars, most of which are linguistically possible but practically unthinkable. It is not only inefficient but also almost impossible to compare such an enormous number of linguistically possible structures after they have been generated. In our analysis grammar, a set of scope decision rules is applied in the early stages of processing in order to block the generation of combinatorial interpretations. In fact, the structure (2), in which a relative clause exists within the scope of a conjuncted noun phrase, is relatively rare in real texts, especially when the relative clause is rather long. Such constructions with long relative clauses are a kind of garden path sentence. Therefore, unless strong heuristic rules like (2), (3) and (4) in 4-2 suggest the structure (2), the structure (1) is adopted as the first choice (note that, in [ex-7] in 4-2, the strong heuristic rule [Rule (3)] suggests the structure (2)). Since the result of such a decision is explicitly expressed in the tree (marked SCOPE-OF-CONJUNCTION) and the grammar rules in the later stages of processing work on this structure, the other interpretations of scopes will not be tried unless the first choice fails at a later stage for some reason or alternative interpretations are explicitly requested by a human operator. Note that a structure like

    NP-1 'TO' ... <verb> NP-2 <verb> NP-3
    (two relative clause and antecedent pairs within one conjuncted noun phrase)

which is linguistically possible but extremely rare in real texts, is naturally blocked.

4-4 Sentence Relationships and Outer Case Analysis

Corresponding to English sub-ordinators and co-ordinators like 'although', 'in order to', 'and' etc., we have several different syntactic constructions for connecting two sentences S1 and S2, as follows.

    (1) S1 ending in (verb with the specific inflection form), followed by S2
    (2) S1 ending in (verb)(a postpositional particle), followed by S2
    (3) S1 ending in (verb)(a conjunctive noun), followed by S2

(1) roughly corresponds to English co-ordinate constructions, and (2) and (3) to English sub-ordinate constructions. However, the correspondence between the forms of Japanese and English sentence connections is not so straightforward. Some postpositional particles in (2), for example, are used to express several different semantic relationships between sentences, and therefore should be translated into different sub-ordinators in English according to the semantic relationships. The postpositional particle 'TAME' expresses either 'purpose-action' relationships or 'cause-effect' relationships. In order to disambiguate the semantic relationships expressed by 'TAME', a set of lexical rules is defined in the dictionary entry of 'TAME'. The rules are roughly as follows.
(1) If S1 expresses a completed action or a stative assertion, the relationship is 'cause-effect'.

(2) If S1 expresses neither a completed event nor a stative assertion and S2 expresses a controllable action, the relationship is 'purpose-action'.

[ex-8]
    (A) S1: TOKYO-NI (Tokyo)  IT- (to go)  TEITA (aspect formative)
        TAME
        S2: KAIGI-NI (meeting)  SHUSSEKI (to attend)  DEKINAKA- (cannot)  TA (tense formative: past)
        S1: completed action (the aspect formative "TEITA" means completion of an action)
        -> [cause-effect]
        = Because I was in Tokyo, I couldn't attend the meeting.

    (B) S1: TOKYO-NI (Tokyo)  IKU (to go)
        TAME
        S2: KAIGI-NI (meeting)  SHUSSEKI (to attend)  DEKINAI (cannot)
        S1: neither a completed action nor a stative assertion
        S2: "whether I can attend the meeting or not" is not controllable.
        -> [cause-effect]
        = Because I go to Tokyo, I cannot attend the meeting.

    (C) S1: TOKYO-NI (Tokyo)  IKU (to go)
        TAME
        S2: KIPPU-O (ticket)  KAT- (to buy)  TA (tense formative: past)
        S1: neither a completed action nor a stative assertion
        S2: volitional action
        -> [purpose-action]
        = In order to go to Tokyo, I bought a ticket.

Note that whether S1 expresses a completed action or not is determined in the preceding phases by using rules which utilize aspectual features of verbs described in the dictionary and aspect formatives following the verbs (the classification of Japanese verbs based on their aspectual features and related topics are discussed in [8]).

We have already written rules (some of which are heuristic ones) for 57 postpositional particles for conjunctions of sentences like 'TAME'.

Postpositional particles for cases, which follow noun phrases and express case relationships, are also very ambiguous in the sense that they express several different deep cases. While the interpretations of inner case elements are directly given in the verb dictionary in the form of mappings between surface case particles and their deep case interpretations, the outer case elements should be semantically interpreted by referring to semantic categories of noun phrases and properties of verbs. Lexical rules for 62 case particles have also been implemented and tested.

5. Conclusions

Analysis grammar of Japanese in the Mu-Project is discussed in this paper. By integrating various levels of heuristic information, the grammar can work very efficiently to produce the most natural and preferable reading as the first output result, without any extensive semantic processing.

The concept of procedural grammars was originally proposed by Winograd[9] and independently pursued by other research groups[10]. However, their claims have not been well appreciated by other researchers (or even by themselves). One often argues against procedural grammars, saying that the linguistic facts Winograd's grammar captures can also be expressed by ATN, and that the expressive power of ATN is equivalent to that of the augmented CFG. Therefore, the argument goes, procedural grammars have no advantages over the augmented CFG. They just make the whole grammars complicated and hard to maintain.

The above argument, however, misses an important point and confuses procedural grammar with the representation of grammars in the form of programs (as shown in Winograd[9]). We showed in this paper that the rules which give structural constraints on final analysis results and the rules which choose the most preferable linguistic structures (or the rules which block "garden path" structures) are different in nature. In order to integrate the latter type of rules in a unified analysis grammar, it is essential to control the sequence of rule applications explicitly and introduce strategic knowledge into grammar organizations. Furthermore, introduction of control specifications doesn't necessarily lead to a grammar in the form of programs. Our grammar writing system GRADE allows us a rule based specification of grammar, and the grammar developed by using GRADE is easy to maintain.
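As a concrete illustration of such a rule-based specification, the 'TAME' rules of Section 4-4 can be written as a short decision procedure. The Python below is a sketch, not the GRADE rules themselves: the `Clause` record and its boolean features are assumptions standing in for what the earlier analysis phases would compute, and the final fallback to 'cause-effect' is read off example (B) in the TAME examples, not stated as a rule in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Features of a clause assumed to be set by earlier analysis phases."""
    completed: bool = False     # expresses a completed action
    stative: bool = False       # expresses a stative assertion
    controllable: bool = False  # expresses a controllable (volitional) action


def interpret_tame(s1, s2):
    """Choose the semantic relationship expressed by 'S1 TAME S2'."""
    # Rule (1): a completed action or a stative assertion in S1.
    if s1.completed or s1.stative:
        return "cause-effect"
    # Rule (2): S1 is neither, and S2 is a controllable action.
    if s2.controllable:
        return "purpose-action"
    # Fallback taken from example (B): S2 is not controllable.
    return "cause-effect"


if __name__ == "__main__":
    # Example (C): "In order to go to Tokyo, I bought a ticket."
    print(interpret_tame(Clause(), Clause(controllable=True)))
```

Keeping the rules as one small function per particle mirrors the idea of storing them in the dictionary entry of the particle, so that rules for other particles can be added without touching the main flow.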
We also discuss the usefulness of lexicon driven processing in treating idiosyncratic phenomena in natural languages. Lexicon driven processing is extremely useful in the transfer phase of machine translation systems, because the transfer of lexical items (selection of appropriate target lexical items) is highly dependent on each lexical item[11].

The current version of our analysis grammar works quite well on 1,000 sample sentences in real abstracts without any pre-editing.

Acknowledgements

Appreciations go to the members of the Mu-Project, especially to the members of the Japanese analysis group [Mr. E.Sumita (Japan IBM), Mr. M.Kato (Sord Co.), Mr. S.Taniguchi (Kyosera Co.), Mr. A.Kosaka (NEC Co.), Mr. H.Sakamoto (Oki Electric Co.), Miss H.Kume (JCS), Mr. N.Ishikawa (Kyoto Univ.)] who are engaged in implementing the comprehensive Japanese analysis grammar, and also to Dr. B.Vauquois, Dr. C.Boitet (Grenoble Univ., France) and Dr. P.Sabatier (CNRS, France) for their fruitful discussions and comments.

References

[1] B.Vauquois: La Traduction Automatique à Grenoble, Documents de Linguistique Quantitative, No. 24, Paris, Dunod, 1975
[2] J.Nakamura et al.: Grammar Writing System (GRADE) of Mu-Machine Translation Project and its Characteristics, Proc. of COLING 84, 1984
[3] J.Slocum: A Status Report on the LRC Machine Translation System, Working Paper LRC-82-3, Linguistic Research Center, Univ. of Texas, 1982
[4] F.Pereira et al.: Definite Clause Grammars of Natural Language Analysis, Artificial Intelligence, Vol. 13, 1980
[5] G.Gazdar: Phrase Structure Grammars and Natural Languages, Proc. of 8th IJCAI, 1983
[6] Y.Wilks: Preference Semantics, in The Formal Semantics of Natural Language (ed: E.L.Keenan), Cambridge University Press, 1975
[7] Y.Sakamoto et al.: Lexicon Features for Japanese Syntactic Analysis in Mu-Project-JE, Proc. of COLING 84, 1984
[8] J.Tsujii: The Transfer Phase in an English-Japanese Translation System, Proc. of COLING 82, 1982
[9] T.Winograd: Understanding Natural Language, Academic Press, 1975
[10] C.Boitet et al.: Recent Developments in Russian-French Machine Translation at Grenoble, Linguistics, Vol. 19, 1981
[11] M.Nagao et al.: Dealing with Incompleteness of Linguistic Knowledge on Language Translation, Proc. of COLING 84, 1984
