...nded model for land registration____.pdf


DOCUMENT INFORMATION

Basic information

Format
Pages: 8
File size: 1.13 MB

Content

Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 9–16, Prague, Czech Republic, June 2007. © 2007 Association for Computational Linguistics

A Discriminative Syntactic Word Order Model for Machine Translation

Pi-Chuan Chang∗, Computer Science Department, Stanford University, Stanford, CA 94305, pichuan@stanford.edu
Kristina Toutanova, Microsoft Research, Redmond, WA, kristout@microsoft.com

Abstract

We present a global discriminative statistical word order model for machine translation. Our model combines syntactic movement and surface movement information, and is discriminatively trained to choose among possible word orders. We show that combining discriminative training with features to detect these two different kinds of movement phenomena leads to substantial improvements in word ordering performance over strong baselines. Integrating this word order model in a baseline MT system results in a 2.4 points improvement in BLEU for English to Japanese translation.

1 Introduction

The machine translation task can be viewed as consisting of two subtasks: predicting the collection of words in a translation, and deciding the order of the predicted words. For some language pairs, such as English and Japanese, the ordering problem is especially hard, because the target word order differs significantly from the source word order.

Previous work has shown that it is useful to model target language order in terms of movement of syntactic constituents in constituency trees (Yamada and Knight, 2001; Galley et al., 2006) or dependency trees (Quirk et al., 2005), which are obtained using a parser trained to determine linguistic constituency. Alternatively, order is modelled in terms of movement of automatically induced hierarchical structure of sentences (Chiang, 2005; Wu, 1997).

∗ This research was conducted during the author's internship at Microsoft Research.
The advantages of modeling how a target language syntax tree moves with respect to a source language syntax tree are that (i) we can capture the fact that constituents move as a whole and generally respect the phrasal cohesion constraints (Fox, 2002), and (ii) we can model broad syntactic reordering phenomena, such as subject-verb-object constructions translating into subject-object-verb ones, as is generally the case for English and Japanese.

On the other hand, there is also a significant amount of information in the surface strings of the source and target and their alignment. Many state-of-the-art SMT systems do not use trees and base the ordering decisions on surface phrases (Och and Ney, 2004; Al-Onaizan and Papineni, 2006; Kuhn et al., 2006). In this paper we develop an order model for machine translation which makes use of both syntactic and surface information.

The framework for our statistical model is as follows. We assume the existence of a dependency tree for the source sentence, an unordered dependency tree for the target sentence, and a word alignment between the target and source sentences. Figure 1 (a) shows an example of aligned source and target dependency trees. Our task is to order the target dependency tree.

We train a statistical model to select the best order of the unordered target dependency tree. An important advantage of our model is that it is global, and does not decompose the task of ordering a target sentence into a series of local decisions, as in the recently proposed order models for Machine Translation (Al-Onaizan and Papineni, 2006; Xiong et al., 2006; Kuhn et al., 2006).
Thus we are able to define features over complete target sentence orders, and avoid the independence ...

Person, Parcel, Power
An extended model for Land Registration
Peter.Laarakker@Kadaster.NL, Jaap Zevenbergen, Yola Georgiadou

Outline
• Context
• Theory on Land registration
• Role of the State
• Innovation & Crowdsourcing
• Conclusion

Context
• Land registration too slow
• Fit for purpose
• Innovation in GEO-ICT
• Crowdsourcing; What is happening?

Land Registration Theory
State roles:
- Adjudication
- Demarcation
- Surveying
- Recording
- Classification

Role of the State
James Scott: Seeing like a State
State has a world view
Van Egmond & De Vries: Sustainability, the search for the integral world view

Role of the State
State not always consistent nor legitimate:
• Conflicting laws
• Discriminatory laws
• Corruption
• Land grabbing

Role of the State
• State-citizen is complex relationship
So: State is multi-polar

Innovation and crowdsourcing
Continuum of Land Rights
• Land Registration delegated to non-State actors

Participative mapping
What happens?
• Classification
• Adjudication
• Demarcation
• Surveying
• Recording

Many influences
Power, by actor and legitimacy:
- State, legitimate: State authority
- State, non-legitimate: corruption, land grabbing, discriminatory laws
- Non-State, legitimate: non-State authority (church, family, international, etc.)
- Non-State, non-legitimate: gangs, warlords, discriminatory practices, corruption

Conclusion

Thank you

A DOP Model for Semantic Interpretation*

Remko Bonnema, Rens Bod and Remko Scha
Institute for Logic, Language and Computation, University of Amsterdam, Spuistraat 134, 1012 VB Amsterdam
Bonnema@mars.let.uva.nl, Rens.Bod@let.uva.nl, Remko.Scha@let.uva.nl

Abstract

In data-oriented language processing, an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way.
This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Tree-bank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method. A data-oriented semantic interpretation algorithm was tested on two semantically annotated corpora: the English ATIS corpus and the Dutch OVIS corpus. Experiments show an increase in semantic accuracy if larger corpus-fragments are taken into consideration.

1 Introduction

Data-oriented models of language processing embody the assumption that human language perception and production works with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one.

* This work was partially supported by NWO, the Netherlands Organization for Scientific Research (Priority Programme Language and Speech Technology).

For the syntactic dimension of language, various instantiations of this data-oriented processing or "DOP" approach have been worked out (e.g. Bod (1992-1995); Charniak (1996); Tugwell (1995); Sima'an et al. (1994); Sima'an (1994; 1996a); Goodman (1996); Rajman (1995ab); Kaplan (1996); Sekine and Grishman (1995)). A method for extending it to the semantic domain was first introduced by van den Berg et al. (1994). In the present paper we discuss a computationally effective version of that method, and an implemented system that uses it.
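Combining corpus fragments can be made concrete with a small sketch. It assumes the leftmost-substitution form of composition used in DOP, where the leftmost nonterminal leaf of one tree is identified with the root of a second tree. The `Node` class and the helper names are hypothetical and are not taken from the authors' system; fragment probabilities are left out entirely.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    is_word: bool = False  # True for terminal words such as "woman"


def word(text: str) -> Node:
    return Node(text, is_word=True)


def leftmost_open_leaf(tree: Node) -> Optional[Node]:
    """Return the leftmost leaf that is a nonterminal, or None if there is none."""
    if not tree.children:
        return None if tree.is_word else tree
    for child in tree.children:
        found = leftmost_open_leaf(child)
        if found is not None:
            return found
    return None


def compose(first: Node, second: Node) -> Node:
    """Substitute `second` at the leftmost nonterminal leaf of a copy of `first`."""
    result = copy.deepcopy(first)
    site = leftmost_open_leaf(result)
    if site is None or site.label != second.label:
        raise ValueError("composition is undefined for these two trees")
    site.children = copy.deepcopy(second.children)
    return result


def words(tree: Node) -> List[str]:
    """Read the words off the leaves, left to right."""
    if tree.is_word:
        return [tree.label]
    return [w for child in tree.children for w in words(child)]


# Hypothetical fragments: an S tree with an open N slot, and a one-word N tree.
fragment = Node("S", [
    Node("NP", [Node("Det", [word("a")]), Node("N")]),
    Node("VP", [word("whistles")]),
])
noun = Node("N", [word("woman")])

parse = compose(fragment, noun)
print(words(parse))
```

The inputs are copied rather than modified, so the same fragment can take part in several derivations of one sentence.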
We first summarize the first fully instantiated DOP model as presented in Bod (1992-1993). Then we show how this method can be straightforwardly extended into a semantic analysis method, if corpora are created in which the trees are enriched with semantic annotations. Finally, we discuss an implementation and report on experiments with two semantically analyzed corpora (ATIS and OVIS).

2 Data-Oriented Syntactic Analysis

So far, the data-oriented processing method has mainly been applied to corpora with simple syntactic annotations, consisting of labelled trees. Let us illustrate this with a very simple imaginary example. Suppose that a corpus consists of only two trees:

(S (NP (Det every) (N woman)) (VP sings))
(S (NP (Det a) (N man)) (VP whistles))

Figure 1: Imaginary corpus of two trees

We employ one operation for combining subtrees, called composition, indicated as ∘; this operation identifies the leftmost nonterminal leaf node of one tree with the root node of a second tree (i.e., the second tree is substituted on the leftmost nonterminal leaf node of the first tree). A new input sentence like "A woman whistles" can now be parsed by combining subtrees from this corpus. For instance:

(S (NP (Det a) N) (VP whistles)) ∘ (N woman) = (S (NP (Det a) (N woman)) (VP whistles))

Figure 2: Derivation and parse for "A woman whistles"

Other ...

Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 216–223, Prague, Czech Republic, June 2007. © 2007 Association for Computational Linguistics

A Simple, Similarity-based Model for Selectional Preferences

Katrin Erk, University of Texas at Austin, katrin.erk@mail.utexas.edu

Abstract

We propose a new, simple model for the automatic induction of selectional preferences, using corpus-based semantic similarity metrics. Focusing on the task of semantic role labeling, we compute selectional preferences for semantic roles.
In evaluations the similarity-based model shows lower error rates than both Resnik's WordNet-based model and the EM-based clustering model, but has coverage problems.

1 Introduction

Selectional preferences, which characterize typical arguments of predicates, are a very useful and versatile knowledge source. They have been used for example for syntactic disambiguation (Hindle and Rooth, 1993), word sense disambiguation (WSD) (McCarthy and Carroll, 2003) and semantic role labeling (SRL) (Gildea and Jurafsky, 2002).

The corpus-based induction of selectional preferences was first proposed by Resnik (1996). All later approaches have followed the same two-step procedure, first collecting argument headwords from a corpus, then generalizing to other, similar words. Some approaches have used WordNet for the generalization step (Resnik, 1996; Clark and Weir, 2001; Abe and Li, 1993), others EM-based clustering (Rooth et al., 1999).

In this paper we propose a new, simple model for selectional preference induction that uses corpus-based semantic similarity metrics, such as Cosine or Lin's (1998) mutual information-based metric, for the generalization step. This model does not require any manually created lexical resources. In addition, the corpus for computing the similarity metrics can be freely chosen, allowing greater variation in the domain of generalization than a fixed lexical resource.

We focus on one application of selectional preferences: semantic role labeling. The argument positions for which we compute selectional preferences will be semantic roles in the FrameNet (Baker et al., 1998) paradigm, and the predicates we consider will be semantic classes of words rather than individual words (which means that different preferences will be learned for different senses of a predicate word).
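The two-step procedure (collect seen headwords, then generalize by similarity) can be illustrated with a minimal sketch. The scoring rule used here, a frequency-weighted average of cosine similarities to the seen headwords, is an assumption made for illustration; the excerpt does not give the paper's exact formula. The function names, the toy context vectors and the example role are all hypothetical.

```python
import math
from typing import Dict, Optional

Vector = Dict[str, float]  # sparse context vector: context feature -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def preference(candidate: str,
               seen_headwords: Dict[str, int],
               vectors: Dict[str, Vector]) -> Optional[float]:
    """Score how well `candidate` fits a role, given headwords seen in that role.

    Returns None when the candidate has no vector, which is one way the
    coverage problem mentioned in the abstract can surface.
    """
    if candidate not in vectors:
        return None
    total = sum(seen_headwords.values())
    score = 0.0
    for head, count in seen_headwords.items():
        if head in vectors:
            score += (count / total) * cosine(vectors[candidate], vectors[head])
    return score


# Toy data, invented for illustration only.
vectors = {
    "coffee": {"cup": 2.0, "hot": 1.0},
    "water": {"cup": 1.0, "cold": 1.0},
    "tea": {"cup": 2.0, "hot": 2.0},
    "idea": {"abstract": 3.0},
}
seen_in_role = {"coffee": 3, "water": 2}  # headwords observed for one role

for candidate in ("tea", "idea", "juice"):
    print(candidate, preference(candidate, seen_in_role, vectors))
```

Because the vectors come from any corpus the user chooses, swapping in vectors from a different domain changes the generalization without touching the scoring code.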
In SRL, the two most pressing issues today are (1) the development of strong semantic features to complement the current mostly syntactically-based systems, and (2) the problem of the domain dependence (Carreras and Marquez, 2005). In the CoNLL-05 shared task, participating systems showed about 10 points F-score difference between in-domain and out-of-domain test data. Concerning (1), we focus on selectional preferences as the strongest candidate for informative semantic features. Concerning (2), the corpus-based similarity metrics that we use for selectional preference induction open up interesting possibilities of mixing domains.

We evaluate the similarity-based model against Resnik's WordNet-based model as well as the EM-based clustering approach. In the evaluation, the similarity-model shows lower error rates than both Resnik's WordNet-based model and the EM-based clustering model. However, the EM-based clustering model has higher coverage than both other paradigms.

Plan of the paper. After discussing previous approaches to selectional preference induction in Section 2, we introduce the similarity-based model in Section 3. Section 4 describes the data used for the experiments reported in Section 5, and Section 6 concludes.

2 Related Work

Selectional restrictions and selectional preferences that predicates impose on their arguments have long been used ...

A METAPLAN MODEL FOR PROBLEM-SOLVING DISCOURSE*

Lance A. Ramshaw
BBN Systems and Technologies Corporation, 10 Moulton Street, Cambridge, MA 02138 USA

ABSTRACT

The structure of problem-solving discourse in the expert advising setting can be modeled by adding a layer of metaplans to a plan-based model of the task domain.
Classes of metaplans are introduced to model both the agent's gradual refinement and instantiation of a domain plan for a task and the space of possible queries about preconditions or fillers for open variable slots that can be motivated by the exploration of particular classes of domain plans. This metaplan structure can be used to track an agent's problem-solving progress and to predict at each point likely follow-on queries based on related domain plans. The model is implemented in the Pragma system where it is used to suggest corrections for ill-formed input.

* This research was supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by ONR under Contract No. N00014-85-C-0016. The views and conclusions contained in this document are those of the author and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.

1. INTRODUCTION

Significant progress has been achieved recently in natural language (NL) understanding systems through the use of plan recognition and "plan tracking" schemes that maintain models of the agent's domain plans and goals. Such systems have been used for recognizing discourse structure, processing anaphora, providing cooperative responses, and interpreting intersentential ellipsis. However, a model of the discourse context must capture more than just the plan structure of the problem domain. Each discourse setting, whether argument, narrative, cooperative planning, or the like, involves a level of organization more abstract than that of domain plans, a level with its own structures and typical strategies. Enriching the domain plan model with a model of the agent's plans and strategies on this more abstract level can add significant power to an NL system.
This paper presents an approach to pragmatic modeling in which metaplans are used to model that level of discourse structure for problem-solving discourse of the sort arising in NL interfaces to expert systems or databases.

The discourse setting modeled by metaplans in this work is expert-assisted problem-solving. Note that the agent's current task in this context is creating a plan for achieving the domain goal, rather than executing that plan. In problem-solving discourse, the agent poses queries to the expert to gather information in order to select a plan from among the various possible plans. Meanwhile, in order to respond to the queries cooperatively, the expert must maintain a model of the plan being considered by the agent. Thus the expert is in the position of deducing from the queries that are the agent's observable behavior which possible plans the agent is currently considering.

The metaplans presented here model both the agent's plan-building choices refining the plan and instantiating its variables and also the possible queries that the agent may use to gain the information needed to make those choices. This unified model in a single formalism of the connection between the agent's plan-building choices and the queries motivated thereby allows for more precise and efficient prediction from the queries observed of the underlying plan-building choices. The model can be used for plan tracking by searching outward each time from the previous context in a tree of metaplans ...

[Mechanical Translation, vol. 4, nos. 1 and 2, November 1957; pp. 2-4]

A Model for Mechanical Translation

John P. Cleave, Birkbeck College Research Laboratory, University of London*

A mathematical model for a translating machine is proposed in which the translation of each word is conditioned by the preceding text. The machine contains a number of dictionaries where each dictionary represents one of the states of a multistate machine.
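The idea of one dictionary per machine state can be sketched as follows. This is a minimal illustration, assuming that every dictionary entry holds an output word together with the name of the dictionary to consult for the next input word. The state names, the toy English-to-French entries and the pass-through rule for unknown words are all invented for the example and do not come from the paper.

```python
from typing import Dict, List, Tuple

# dictionary (state) name -> {input word -> (output word, next dictionary)}
Dictionaries = Dict[str, Dict[str, Tuple[str, str]]]


def translate(words: List[str], dictionaries: Dictionaries, start: str) -> List[str]:
    """Word-for-word translation in which the active dictionary depends on
    the preceding text."""
    state = start
    output = []
    for source_word in words:
        entry = dictionaries[state].get(source_word)
        if entry is None:
            # Assumption: unknown words are copied and the state is kept.
            output.append(source_word)
            continue
        target_word, state = entry
        output.append(target_word)
    return output


# Toy lexicon: "bank" is rendered differently when it follows "river".
toy: Dictionaries = {
    "default": {
        "the": ("la", "default"),
        "river": ("rivière", "after_river"),
        "bank": ("banque", "default"),
    },
    "after_river": {
        "the": ("la", "default"),
        "river": ("rivière", "after_river"),
        "bank": ("rive", "default"),
    },
}

print(translate(["the", "bank"], toy, "default"))
print(translate(["river", "bank"], toy, "default"))
```

With a single dictionary whose entries always point back to itself, the same function reduces to a plain single-state, word-for-word lookup.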
* Now at Southampton University, Southampton, England.

... translation and also to a simple coding expressed by the table, which may be regarded as a dictionary. If the input data S and the output data are punched tape on an automatic computer with unidirectional reading and printing devices, then the above transformation is effected by a single-state machine.

A word-for-word translation in which the equivalents selected for an input word depend upon the context of the preceding text is represented by a compound coding, effected by a multistate machine. This type of transformation, called "conditional", is effected by the rules: [...] where r = 1, 2

We suppose that the sequence of rules provides a course of action for each possibility. (The exact conditions on the number of rules will not be investigated here, but it should be noted that the rules are in a certain order.) If we let the sign '»' denote 'precede in the message', then rule r can be abbreviated to [...]

Instead a connected series of dictionaries may be constructed by the following method, which is best illustrated by supposing one conditional rule only. Suppose the sequence of rules is [...]

The sequence of dictionaries will contain some entries which will refer the operator to another dictionary. If we let, say [...]

The last n rules cover those instances where a datum of S1 is not preceded by its relevant context. These rules cannot be reduced to the simple dictionary with a finite number of entries as in the previous simple transformation.

If the conditional rules are effected by a computing machine, each dictionary represents a state of the machine. A transformation which depends upon context therefore can be represented as a compound coding or a multistate machine.

Posted: 04/11/2017, 14:24