Balancing Keynesian and Neoclassical Models

THÔNG TIN TÀI LIỆU

Thông tin cơ bản

Định dạng
Số trang	4
Dung lượng	89,48 KB

Nội dung

Balancing Clarity and Efficiency in Typed Feature Logic through Delaying Gerald Penn University of Toronto 10 King’s College Rd. Toronto M5S 3G4 Canada gpenn@cs.toronto.edu Abstract The purpose of this paper is to re-examine the balance between clarity and efficiency in HPSG design, with particular reference to the design decisions made in the English Resource Grammar (LinGO, 1999, ERG). It is argued that a simple generaliza- tion of the conventional delay statements used in logic programming is sufficient to restore much of the functionality and concomitant benefit that the ERG elected to forego, with an acceptable although still perceptible computational cost. 1 Motivation By convention, current HPSGs consist, at the very least, of a deductive backbone of extended phrase structure rules, in which each category is a description of a typed feature structure (TFS), augmented with constraints that enforce the principles of grammar. These principles typically take the form of statements, “for all TFSs, ψ holds,” where ψ is usually an implication. Historically, HPSG used a much richer set of formal descriptive devices, however, mostly on analogy to developments in the use of types and description logics in programming language theory (A¨ıt-Kaći, 1984), which had served as the impetus for HPSG’s invention (Pol- lard, 1998). This included logic-programming-style relations (Höhfeld and Smolka, 1988), a powerful description language in which expressions could de- note sets of TFSs through the use of an explicit disjunction operator, and the full expressive power of implications, in which antecedents of the above- mentioned ψ principles could be arbitrarily complex. Early HPSG-based natural language processing systems faithfully supported large chunks of this richer functionality, in spite of their inability to han- dle it efficiently — so much so that when the designers of the ERG set out to select formal descriptive devices for their implementation with the aim of “balancing clarity and efficiency,” (Flickinger, 2000), they chose to include none of these ameni- ties. The ERG uses only phrase-structure rules and type-antecedent constraints, pushing all would-be description-level disjunctions into its type system or rules. In one respect, this choice was successful, because it did at least achieve a respectable level of efficiency. But the ERG’s selection of functionality has acquired an almost liturgical status within the HPSG community in the intervening seven years. Keeping this particular faith, moreover, comes at a considerable cost in clarity, as will be argued below. This paper identifies what it is precisely about this extra functionality that we miss (modularity, Section 2), determines what it would take at a mini- mum computationally to get it back (delaying, Sec- tion 3), and attempts to measure exactly how much that minimal computational overhead would cost (about 4 µs per delay, Section 4). This study has not been undertaken before; the ERG designers’ decision was based on largely anecdotal accounts of performance relative to then-current implemen- tations that had not been designed with the inten- tion of minimizing this extra cost (indeed, the ERG baseline had not yet been devised). 2 Modularity: the cost in clarity Semantic types and inheritance serve to organize the constraints and overall structure of an HPSG grammar. This is certainly a familiar, albeit vague justification from programming languages research, but the comparison between HPSG and modern programming languages essentially ends with this statement. Programming languages with inclusional poly- morphism (subtyping) invariably provide functions or relations and allow these to be reified as methods Balancing Keynesian and Neoclassical Models Balancing Keynesian and Neoclassical Models By: OpenStaxCollege Finding the balance between Keynesian and Neoclassical models can be compared to the challenge of riding two horses simultaneously When a circus performer stands on two horses, with a foot on each one, much of the excitement for the viewer lies in contemplating the gap between the two As modern macroeconomists ride into the future on two horses—with one foot on the short-term Keynesian perspective and one foot on the long-term neoclassical perspective—the balancing act may look uncomfortable, but there does not seem to be any way to avoid it Each approach, Keynesian and neoclassical, has its strengths and weaknesses The short-term Keynesian model, built on the importance of aggregate demand as a cause of business cycles and a degree of wage and price rigidity, does a sound job of explaining many recessions and why cyclical unemployment rises and falls By focusing on the short-run adjustments of aggregate demand, Keynesian economics risks overlooking the long-term causes of economic growth or the natural rate of unemployment that exists even when the economy is producing at potential GDP The neoclassical model, with its emphasis on aggregate supply, focuses on the underlying determinants of output and employment in markets, and thus tends to put more emphasis on economic growth and how labor markets work However, the neoclassical view is not especially helpful in explaining why unemployment moves up and down over short time horizons of a few years Nor is the neoclassical model especially helpful when the economy is mired in an especially deep and long-lasting recession, like the Great Depression of the 1930s Keynesian economics tends to view inflation as a price that might sometimes be paid for lower unemployment; neoclassical economics tends to view inflation as a cost that offers no offsetting gains in terms of lower unemployment Macroeconomics cannot, however, be summed up as an argument between one group of economists who are pure Keynesians and another group who are pure neoclassicists Instead, many mainstream economists believe both the Keynesian and neoclassical 1/4 Balancing Keynesian and Neoclassical Models perspectives Robert Solow, the Nobel laureate in economics in 1987, described the dual approach in this way: “At short time scales, I think, something sort of ‘Keynesian’ is a good approximation, and surely better than anything straight ‘neoclassical.’ At very long time scales, the interesting questions are best studied in a neoclassical framework, and attention to the Keynesian side of things would be a minor distraction At the five-to-ten-year time scale, we have to piece things together as best we can, and look for a hybrid model that will the job.” Many modern macroeconomists spend considerable time and energy trying to construct models that blend the most attractive aspects of the Keynesian and neoclassical approaches It is possible to construct a somewhat complex mathematical model where aggregate demand and sticky wages and prices matter in the short run, but wages, prices, and aggregate supply adjust in the long run However, creating an overall model that encompasses both short-term Keynesian and long-term neoclassical models is not easy Navigating Unchartered Waters Were the policies implemented to stabilize the economy and financial markets during the Great Recession effective? Many economists from both the Keynesian and neoclassical schools have found that they were, although to varying degrees Alan Blinder of Princeton University and Mark Zandi for Moody’s Analytics found that, without fiscal policy, GDP decline would have been significantly more than its 3.3% in 2008 followed by its 0.1% decline in 2009 They also estimated that there would have been 8.5 million more job losses had the government not intervened in the market with the TARP to support the financial industry and key automakers General Motors and Chrysler Federal Reserve Bank economists Carlos Carvalho, Stefano Eusip, and Christian Grisse found in their study, Policy Initiatives in the Global Recession: What Did Forecasters Expect? that once policies were implemented, forecasters adapted their expectations to these policies They were more likely to anticipate increases in investment due to lower interest rates brought on by monetary policy and increased economic growth resulting from fiscal policy The difficulty with evaluating the effectiveness of the stabilization policies that were taken in response to the Great Recession is that we will never know what would have happened had those policies not have been implemented Surely some of the programs were more effective at creating and saving jobs, while other programs were less so The final conclusion on the effectiveness of macroeconomic policies is still up for debate, and further study will no doubt consider the impact of these policies on the U.S budget and ...Copulas and credit models Rüdiger Frey Swiss Banking Institute University of Zurich freyr@isb.unizh.ch Alexander J. McNeil Department of Mathematics ETH Zurich mcneil@math.ethz.ch Mark A. Nyfeler Investment Office RTC UBS Zurich mark.nyfeler@ubs.com October 2001 1 Introduction In this article we focus on the latent variable approach to modelling credit portfolio losses. This methodology underlies all models that descend from Merton’s firm-value model (Merton 1974). In particular, it underlies the most important industry models, such as the model proposed by the KMV corporation and CreditMetrics. In these models default of an obligor occurs if a latent variable, often interpreted as the value of the obligor’s assets, falls below some threshold, often interpreted as the value of the obligor’s liabilities. Dependence between default events is caused by dependence between the latent variables. The correlation matrix of the latent variables is often calibrated by developing factor models that relate changes in asset value to changes in a small number of economic factors. For further reading see papers by Koyluoglu and Hickman (1998), Gordy (2000) and Crouhy, Galai, and Mark (2000). A core assumption of the KMV and CreditMetrics models is the multivariate normality of the latent variables. However there is no compelling reason for choosing a multivariate normal (Gaussian) distribution for asset values. The aim of this article is to show that the aggregate portfolio loss distribution is often very sensitive to the exact nature of the multivariate distribution of the latent variables. This is not simply a question of asset correlation. Even when individual default probabilities of obligors and the matrix of latent variable correlations are held fixed, it is still possible to develop alternative models which lead to much heavier-tailed loss distributions. A useful source of alternative models is the family of multivariate normal mixture distributions, which includes Student’s t distribution and the generalized hyperbolic distribution. In most cases it is as easy to base latent variable models on these mixture distributions as it is to base them on the multivariate normal distribution. An elegant way of understanding how a multivariate latent variable distribution determines the distribution of the number of defaults in a portfolio is to use the concept of copulas. In this article we show that it is the copula (or dependence structure) of the latent variables that determines the higher order joint default probabilities for groups of obligors, and thus determines the extreme risk that there are many defaults in the portfolio. If we choose alternative latent variable distributions in the normal mixture family then we implicitly work with alternative copulas which often differ markedly from the copula of a Gaussian distribution. Some of these copulas, such as the t copula, possess tail dependence and, in contrast to the multivariate normal, they have a much greater tendency to generate 1 simultaneous extreme values (Embrechts, McNeil, and Straumann 1999). This effect is highly important in latent variable models, since simultaneous low asset values will lead to many joint defaults and past experience shows that realistic credit risk models need to be able to give sufficient weight to scenarios where many joint defaults occur. This article may be understood as a model risk study in the context of latent variable models. Individual default probabilities and asset correlations are insufficient to determine the portfolio loss distribution, since they do not fix the copula of the latent variables. For large portfolios of tens of thousands of counterparties there remains considerable model risk. Risk managers who employ the latent variable methodology should be aware of this. 2 Latent Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 793–803, Portland, Oregon, June 19-24, 2011. c 2011 Association for Computational Linguistics Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models Sameer Singh § Amarnag Subramanya † Fernando Pereira † Andrew McCallum § § Department of Computer Science, University of Massachusetts, Amherst MA 01002 † Google Research, Mountain View CA 94043 sameer@cs.umass.edu, asubram@google.com, pereira@google.com, mccallum@cs.umass.edu Abstract Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses paral- lelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granular- ities of entities to facilitate more effective ap- proximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by se- lecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error re- duction of 38%) on this large dataset, demon- strating the scalability of our approach. 1 Introduction Given a collection of mentions of entities extracted from a body of text, coreference or entity resolution consists of clustering the mentions such that two mentions belong to the same cluster if and only if they refer to the same entity. Solutions to this problem are important in semantic analysis and knowledge discovery tasks (Blume, 2005; Mayfield et al., 2009). While significant progress has been made in within-document coreference (Ng, 2005; Culotta et al., 2007; Haghighi and Klein, 2007; Bengston and Roth, 2008; Haghighi and Klein, 2009; Haghighi and Klein, 2010), the larger problem of cross-document coreference has not received as much attention. Unlike inference in other language processing tasks that scales linearly in the size of the corpus, the hypothesis space for coreference grows super- exponentially with the number of mentions. Conse- quently, most of the current approaches are developed on small datasets containing a few thousand mentions. We believe that cross-document coreference resolution is most useful when applied to a very large set of documents, such as all the news ar- ticles published during the last 20 years. Such a corpus would have billions of mentions. In this paper we propose a model and inference algorithms that can scale the cross-document coreference problem to corpora of that size. Much of the previous work in cross-document coreference (Bagga and Baldwin, 1998; Ravin and Kazi, 1999; Gooi and Allan, 2004; Pedersen et al., 2006; Rao et al., 2010) groups mentions into entities with some form of greedy clustering using a pair- wise mention similarity or distance function based on mention text, context, and document-level statis- tics. Such methods have not been shown to scale up, and they cannot exploit cluster features that cannot be expressed in terms of mention pairs. We provide a detailed survey of related work in Section 6. Other previous work attempts to address some of the above concerns by mapping coreference to inference on an undirected graphical model (Culotta et al., 2007; Poon et al., 2008; Wellner et al., 2004; Wick et al., 2009a). These models contain pair- wise factors between all pairs of mentions Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:shortpapers, pages 101–106, Portland, Oregon, June 19-24, 2011. c 2011 Association for Computational Linguistics Extracting Opinion Expressions and Their Polarities – Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN), Italy {johansson, moschitti}@disi.unitn.it Abstract We investigate systems that identify opinion expressions and assigns polarities to the extracted expressions. In particular, we demon- strate the benefit of integrating opinion extraction and polarity classification into a joint model using features reflecting the global polarity structure. The model is trained using large-margin structured prediction methods. The system is evaluated on the MPQA opinion corpus, where we compare it to the only previ- ously published end-to-end system for opinion expression extraction and polarity classification. The results show an improvement of between 10 and 15 absolute points in F-measure. 1 Introduction Automatic systems for the analysis of opinions expressed in text on the web have been studied exten- sively. Initially, this was formulated as a coarse- grained task – locating opinionated documents – and tackled using methods derived from standard re- trieval or categorization. However, in recent years there has been a shift towards a more detailed task: not only finding the text expressing the opinion, but also analysing it: who holds the opinion and to what is addressed; it is positive or negative (polarity); what its intensity is. This more complex formula- tion leads us deep into NLP territory; the methods employed here have been inspired by information extraction and semantic role labeling, combinatorial optimization and structured machine learning. A crucial step in the automatic analysis of opinion is to mark up the opinion expressions: the pieces of text allowing us to infer that someone has a particular feeling about some topic. Then, opinions can be assigned a polarity describing whether the feeling is positive, neutral or negative. These two tasks have generally been tackled in isolation. Breck et al. (2007) introduced a sequence model to extract opinions and we took this one step further by adding a reranker on top of the sequence labeler to take the global sentence structure into account in (Johansson and Moschitti, 2010b); later we also added holder extraction (Johansson and Moschitti, 2010a). For the task of classifiying the polarity of a given expression, there has been fairly extensive work on suitable classification features (Wilson et al., 2009). While the tasks of expression detection and polarity classification have mostly been studied in isolation, Choi and Cardie (2010) developed a sequence labeler that simultaneously extracted opinion expressions and assigned polarities. This is so far the only published result on joint opinion segmenta- tion and polarity classification. However, their ex- periment lacked the obvious baseline: a standard pipeline consisting of an expression identifier followed by a polarity classifier. In addition, while theirs is the first end-to-end system for expression extraction with polarities, it is still a sequence labeler, which, by construction, is restricted to use simple local features. In contrast, in (Johansson and Moschitti, 2010b), we showed that global structure matters: opinions interact to a large extent, and we can learn about their interactions on the opinion level by means of their interactions on the syntactic Joint and conditional estimation of tagging and parsing models ∗ Mark Johnson Brown University Mark Johnson@Brown.edu Abstract This paper compares two different ways of estimating statistical language models. Many statistical NLP tagging and parsing models are estimated by maximizing the (joint) likelihood of the fully-observed training data. How- ever, since these applications only re- quire the conditional probability distributions, these distributions can in prin- ciple be learnt by maximizing the conditional likelihood of the training data. Perhaps somewhat surprisingly, models estimated by maximizing the joint were superior to models estimated by maximizing the conditional, even though some of the latter models intuitively had access to “more information”. 1 Introduction Many statistical NLP applications, such as tagging and parsing, involve finding the value of some hidden variable Y (e.g., a tag or a parse tree) which maximizes a conditional probability distribution P θ (Y |X), where X is a given word string. The model parameters θ are typically estimated by maximum likelihood: i.e., maximizing the likelihood of the training ∗ I would like to thank Eugene Charniak and the other members ofBLLIP for theircomments andsuggestions. Fer- nando Pereira was especially generous with comments and suggestions, as were the ACL reviewers; I apologize for not being able to follow up all of your good suggestions. This research was supported by NSF awards 9720368 and 9721276 and NIH award R01 MH60922-01A2. data. Given a (fully observed) training corpus D = ((y 1 , x 1 ), . . . , (y n , x n )), the maximum (joint) likelihood estimate (MLE) of θ is: ˆ θ = argmax θ n  i=1 P θ (y i , x i ). (1) However, it turns out there is another maximum likelihood estimation method which maximizes the conditional likelihood or “pseudo-likelihood” of the training data (Besag, 1975). Maximum conditional likelihood is consistent for the conditional distribution. Given a training corpus D, the maximum conditional likelihood estimate (MCLE) of the model parameters θ is: ˆ θ = argmax θ n  i=1 P θ (y i |x i ). (2) Figure 1 graphically depicts the difference between the MLE and MCLE. Let Ω be the universe of all possible pairs (y, x) of hidden and visible values. Informally, the MLE selects the model parameter θ which make the training data pairs (y i , x i ) as likely as possible relative to all other pairs (y  , x  ) in Ω. The MCLE, on the other hand, selects the model parameter θ in order to make the training data pair (y i , x i ) more likely than other pairs (y  , x i ) in Ω, i.e., pairs with the same visible value x i as the training datum. In statistical computational linguistics, maximum conditional likelihood estimators have mostly been used with general exponential or “maximum entropy” models because standard maximum likelihood estimation is usually computationally intractable (Berger et al., 1996; Della Pietra et al., 1997; Jelinek, 1997). Well- known computational linguistic models such as (MLE) (MCLE) Ω Y = y i , X = x i Ω X = x i Y = y i , X = x i Figure 1: The MLE makes the training data (y i , x i ) as likely as possible (relative to Ω), while the MCLE makes (y i , x i ) as likely as possible relative to other pairs (y  , x i ). Maximum-Entropy Markov Models (McCallum et al., 2000) and Stochastic Unification-based Grammars (Johnson et al., 1999) are standardly estimated with conditional estimators, and it would be interesting to know whether conditional estimation affects the quality of the estimated model. It should be noted that in practice, the MCLE of a model with a large number of features with complex dependencies may yield far better performance than the MLE of the ... policies on the U.S budget and deficit, as well as the value of the U.S dollar in the financial market 2/4 Balancing Keynesian and Neoclassical Models Key Concepts and Summary The Keynesian perspective... taxes and higher interest rates? Explain your answer Critical Thinking Question Is it a logical contradiction to be a neoclassical Keynesian? Explain 3/4 Balancing Keynesian and Neoclassical Models. .. aggregate demand as a cause of business cycles and a degree of wage and price rigidity, and thus does a sound job of explaining many recessions and why cyclical unemployment rises and falls The neoclassical

Ngày đăng: 31/10/2017, 17:12

Xem thêm