$x_F$ is set equal to $x_R$, so that the optimal action function for the objective function (22) takes the form $\alpha(x) \equiv g^{-1}(x)$. This in turn implies that its canonical form $\hat{U}(x, x_F)$ is given by

(23)  $\hat{U}(x, x_F) \equiv U(x, \alpha(x_F)) \equiv f(x) - L(x, g(\alpha(x_F))) \equiv f(x) - L(x, x_F)$.

2.3.2. Implications of squared-error loss

The most frequently used loss function in statistics is unquestionably the squared-error form

(24)  $L_{Sq}(x_R, x_F) \equiv k \cdot (x_R - x_F)^2, \quad k > 0$,

which is seen to satisfy the properties (8). Theorem 1 thus implies the following result:

COROLLARY 1. For an arbitrary squared-error function $L_{Sq}(x_R, x_F) \equiv k \cdot (x_R - x_F)^2$ with $k > 0$, an objective function $U(\cdot,\cdot)\colon \mathcal{X} \times \mathcal{A} \to \mathbb{R}^1$ with strictly monotonic optimal action function $\alpha(\cdot)$ will generate $L_{Sq}(\cdot,\cdot)$ as its loss function if and only if it takes the form

(25)  $U(x, \alpha) \equiv f(x) - k \cdot (x - g(\alpha))^2$

for some function $f(\cdot)\colon \mathcal{X} \to \mathbb{R}^1$ and monotonic function $g(\cdot)\colon \mathcal{A} \to \mathcal{X}$.

Since utility or profit functions of the form (25) are not particularly standard, it is worth describing some of their properties. One property, which may or may not be realistic for a decision setting, is that changes in the level of the choice variable $\alpha$ do not affect the curvature (i.e. the second and higher order derivatives) of $U(x, \alpha)$ with respect to $x$, but only lead to uniform changes in the level and slope with respect to $x$ – that is to say, for any pair of values $\alpha_1, \alpha_2 \in \mathcal{A}$, the difference $U(x, \alpha_1) - U(x, \alpha_2)$ is an affine function of $x$.¹

A more direct property of the form (25) is revealed by adopting the forecast-equivalent labeling of the choice variable to obtain its canonical form $\hat{U}(x, x_F)$ from (20), which, as we have seen, specifies the level of utility or profit resulting from an actual realized value of $x$ and the action that would have been optimal for a realized value of $x_F$. Under this labeling, the objective function implied by the squared-error loss function $L_{Sq}(x_R, x_F)$ is seen (by (23)) to take the form

(26)  $\hat{U}(x, x_F) \equiv f(x) - L_{Sq}(x, x_F) \equiv f(x) - k \cdot (x - x_F)^2$.

In terms of our earlier example, this states that when a firm faces a realized output price of $x$, its shortfall from optimal profits due to having planned for an output price of $x_F$ only depends upon the difference between $x$ and $x_F$ (and in particular, upon the square of this difference), and not upon how high or how low the two values might both be. Thus, the profit shortfall from having underpredicted a realized output price of $10 by one dollar is the same as the profit shortfall from having underpredicted a realized output price of $2 by one dollar. This is clearly unrealistic in any decision problem which exhibits "wealth effects" or "location effects" in the uncertain variable, such as a firm which could make money if the realized output price was $7 (so there would be a definite loss in profits from having underpredicted the price by $1), but would want to shut down if the realized output price was only $4 (in which case there would be no profit loss at all from having underpredicted the price by $1).

¹ Specifically, (25) implies $U(x, \alpha_1) - U(x, \alpha_2) \equiv -k \cdot [g(\alpha_1)^2 - g(\alpha_2)^2] + 2k \cdot [g(\alpha_1) - g(\alpha_2)] \cdot x$.
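As a quick illustration of Corollary 1, the following sketch (in Python, with an arbitrary $f(\cdot)$, monotonic $g(\cdot)$ and $k$ chosen purely for this example, none of them coming from the text) verifies numerically that an objective function of the form (25) generates the squared-error loss (24), and that the implied loss is location-independent.

```python
import numpy as np

# Minimal numerical check of Corollary 1: with U(x, a) = f(x) - k*(x - g(a))^2,
# the optimal action is alpha(x) = g^{-1}(x) and the implied loss
# L(x_R, x_F) = U(x_R, alpha(x_R)) - U(x_R, alpha(x_F)) reduces to k*(x_R - x_F)^2.
# The particular f, g and k below are illustrative assumptions, not from the text.

k = 2.0
f = lambda x: 0.1 * x          # any function of x alone; it cancels out of the loss
g = lambda a: np.exp(a)        # monotonic mapping from actions to X
g_inv = np.log                 # optimal action function alpha(x) = g^{-1}(x)

def U(x, a):
    return f(x) - k * (x - g(a)) ** 2

def loss(x_R, x_F):
    """Loss from acting on the forecast x_F when x_R is realized."""
    return U(x_R, g_inv(x_R)) - U(x_R, g_inv(x_F))

for x_R, x_F in [(10.0, 9.0), (2.0, 1.0), (7.5, 7.0)]:
    print(x_R, x_F, loss(x_R, x_F), k * (x_R - x_F) ** 2)
# The first two rows give identical losses (both are one-dollar underpredictions),
# illustrating the location-independence discussed above.
```

The first two cases correspond to the $10 and $2 underpredictions in the example above; under (25) they produce identical profit shortfalls.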
2.3.3. Are squared-error loss functions appropriate as "local approximations"?

One argument for the squared-error form $L_{Sq}(x_R, x_F) \equiv k \cdot (x_R - x_F)^2$ is that if the forecast errors $x_R - x_F$ are not too big – that is, if the forecaster is good enough at prediction – then this functional form is the natural second-order approximation to any smooth loss function that exhibits the necessary properties of being zero when $x_R = x_F$ (from (8)) and having zero first-order effect for small departures from a perfect forecast (from (15)).

However, the fact that $x_R - x_F$ may always be close to zero does not legitimize the use of the functional form $k \cdot (x_R - x_F)^2$ as a second-order approximation to a general smooth bivariate loss function $L(x_R, x_F)$, even one that satisfies $L(0, 0) = 0$ and $\partial L(x_R, x_F)/\partial x_F|_{x_R = x_F} = 0$. Consider Figure 1, which illustrates the level curves of some smooth loss function $L(x_R, x_F)$, along with the region where $|x_R - x_F|$ is less than or equal to some small value $\varepsilon$, which is seen to constitute a constant-width band about the 45° line.

[Figure 1. Level curves of a general loss function $L(x_R, x_F)$ and the band $|x_R - x_F| \leq \varepsilon$.]

This region does not constitute a small neighborhood in $\mathbb{R}^2$, even as $\varepsilon \to 0$. In particular, the second-order approximation to $L(x_R, x_F)$ when $x_R$ and $x_F$ are both small and approximately equal to each other is not the same as the second-order approximation to $L(x_R, x_F)$ when $x_R$ and $x_F$ are both large and approximately equal to each other. Legitimate second-order approximations to $L(x_R, x_F)$ can only be taken over small neighborhoods of points in $\mathbb{R}^2$, and not over bands (even narrow bands) about the 45° line, so the "quadratic approximation" $L_{Sq}(x_R, x_F) \equiv k \cdot (x_R - x_F)^2$ over such bands is not justified by Taylor's theorem.

2.3.4. Implications of error-based loss

Virtually all of the loss functions used in statistics and forecasting are of the form (27) – that is, a single-argument function of the forecast error $x_R - x_F$ which satisfies the properties (8):

(27)  $L_{err}(x_R, x_F) \equiv H(x_R - x_F)$, with $H(\cdot) \geq 0$, $H(0) = 0$ and $H(\cdot)$ quasiconvex.

Consider what Theorem 1 implies about this general error-based form:

COROLLARY 2. For an arbitrary error-based function $L_{err}(x_R, x_F) \equiv H(x_R - x_F)$ satisfying (27), an objective function $U(\cdot,\cdot)\colon \mathcal{X} \times \mathcal{A} \to \mathbb{R}^1$ with strictly monotonic optimal action function $\alpha(\cdot)$ will generate $L_{err}(\cdot,\cdot)$ as its loss function if and only if it takes the form

(28)  $U(x, \alpha) \equiv f(x) - H(x - g(\alpha))$

for some function $f(\cdot)\colon \mathcal{X} \to \mathbb{R}^1$ and monotonic function $g(\cdot)\colon \mathcal{A} \to \mathcal{X}$.

Formula (28) highlights the fact that the use of an error-based loss function of the form (27) implicitly assumes that the decision maker's underlying problem is again "location-independent", in the sense that the utility loss from having made an ex post nonoptimal choice $\alpha \neq g^{-1}(x_R)$ only depends upon the difference between the values $x_R$ and $g(\alpha)$, and not upon their general levels, so that it is again subject to the remarks following Equation (26). This location-independence is even more starkly illustrated in formula (28)'s canonical form, namely $\hat{U}(x, x_F) \equiv f(x) - H(x - x_F)$.

2.4. Location-dependent loss functions

Given a loss function $L(x_R, x_F)$ which is location-dependent and hence does not take the form (27), we can nevertheless retain most of our error-based intuition by defining $e = x_R - x_F$ and defining $L(x_R, x_F)$'s associated location-dependent error-based form by

(29)  $H(x_R, e) \equiv L(x_R, x_R - e)$,

which implies

(30)  $L(x_R, x_F) \equiv H(x_R, x_R - x_F)$.
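The following sketch makes the same point numerically. It uses an illustrative location-dependent loss (an assumption for this example, not a form proposed in the chapter), builds its associated error-based form $H(x_R, e)$ from Equation (29), and shows that the implied quadratic coefficient in the error varies with the location $x_R$, which is why a single squared-error term cannot serve as a second-order approximation over the whole band about the 45° line.

```python
import numpy as np

# Sketch of the point made in Sections 2.3.3-2.4: for a location-dependent loss,
# the curvature of the loss in the error e = x_R - x_F changes with the location x_R,
# so no single quadratic k*(x_R - x_F)^2 approximates it along the whole 45-degree band.
# The specific loss below is an illustrative assumption, not one used in the chapter.

def L(x_R, x_F):
    # Larger penalty for errors made at high realized values.
    return (1.0 + 0.5 * x_R ** 2) * (x_R - x_F) ** 2

def H(x_R, e):
    # Associated location-dependent error-based form, Equation (29).
    return L(x_R, x_R - e)

def local_curvature(x_R, h=1e-4):
    # Half the numerical second derivative of H in e at e = 0: the "local k" near (x_R, x_R).
    return (H(x_R, h) - 2.0 * H(x_R, 0.0) + H(x_R, -h)) / h ** 2 / 2.0

for x_R in [2.0, 7.0, 10.0]:
    print(x_R, local_curvature(x_R))
# The implied quadratic coefficient differs at each location, so a single
# squared-error loss cannot serve as a second-order approximation over the band.
```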
In this case Theorem 1 implies that the utility function (22) takes the form

(31)  $U(x, \alpha) \equiv f(x) - H(x, x - g(\alpha))$

for some $f(\cdot)$ and monotonic $g(\cdot)$. This is seen to be a generalization of Corollary 2, where the error-based function $H(x - g(\alpha))$ is replaced by a location-dependent form $H(x, x - g(\alpha))$. Such a function, with canonical form $\hat{U}(x, x_F) \equiv f(x) - H(x, x - x_F)$, would be appropriate when the decision maker's sensitivity to a unit error differs between prediction errors about high values of the variable $x$ and prediction errors about low values of this variable.

2.5. Distribution-forecast and distribution-realization loss functions

Although the traditional form of forecast has been the point forecast, there has recently been considerable interest in the use of distribution forecasts. As motivation, consider "forecasting" the number that will come up on a biased (i.e. "loaded") die. There is little point in giving a scalar point forecast – rather, since there will be irreducible uncertainty, the forecaster is better off studying the die (e.g., rolling it many times) and reporting the six face probabilities. We refer to such a forecast as a distribution forecast.

The decision maker bases their optimal action upon the distribution forecast $F_F(\cdot)$ by solving the first-order condition

(32)  $\int U_\alpha(x, \alpha)\, dF_F(x) = 0$

to obtain the optimal action function

(33)  $\alpha(F_F) \equiv \arg\max_{\alpha \in \mathcal{A}} \int U(x, \alpha)\, dF_F(x)$.

For the case of a distribution forecast $F_F(\cdot)$, the reduced-form payoff function takes the form

(34)  $R(x_R, F_F) \equiv U\bigl(x_R, \arg\max_{\alpha \in \mathcal{A}} \int U(x, \alpha)\, dF_F(x)\bigr) \equiv U(x_R, \alpha(F_F))$.

Recall that the point-forecast equivalent is defined as the value $x_F(F_F)$ that satisfies

(35)  $\alpha(x_F(F_F)) = \alpha(F_F)$,

and in the case of a single realization $x_R$, the distribution-forecast/point-realization loss function is given by

(36)  $L(x_R, F_F) \equiv U(x_R, \alpha(x_R)) - U(x_R, \alpha(F_F))$.

In the case of $T$ successive throws of the same loaded die, there is a sense in which the "best case scenario" is when the forecaster has correctly predicted each of the successive realized values $x_{R1}, \ldots, x_{RT}$. However, when it is taken as given that the successive throws are independent, and when the forecaster is restricted to offering a single distribution forecast $F_F(\cdot)$ which must be provided prior to any of the throws, then the "best case" distribution forecast is the one that turns out to match the empirical distribution $F_R(\cdot)$ of the sequence of realizations, which we can call its "histogram". We thus define the distribution-forecast/distribution-realization loss function by

(37)  $L(F_R, F_F) \equiv \int U(x, \alpha(F_R))\, dF_R(x) - \int U(x, \alpha(F_F))\, dF_R(x)$

and observe that much of the above point-realization-based analysis can be extended to such functions.
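A minimal numerical sketch of Equations (33), (36) and (37) for the loaded-die example follows. The objective function $U(x, \alpha)$ and the face probabilities are illustrative assumptions; only the structure of the calculation, an expected-utility maximization under the forecast distribution followed by the two loss comparisons, mirrors the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch of Equations (33), (36) and (37) for the loaded-die example.
# U(x, a) below is an illustrative objective (not from the chapter); the face
# probabilities are likewise made-up numbers standing in for a distribution forecast.

faces = np.arange(1, 7)

def U(x, a):
    return -(x - a) ** 2 - 0.1 * a          # any smooth objective works here

def optimal_action(probs):
    """alpha(F) = argmax_a  sum_i probs[i] * U(x_i, a)   (Equation (33))."""
    res = minimize_scalar(lambda a: -np.dot(probs, U(faces, a)),
                          bounds=(0.0, 7.0), method="bounded")
    return res.x

F_F = np.array([0.10, 0.10, 0.15, 0.15, 0.20, 0.30])   # distribution forecast
a_F = optimal_action(F_F)

def point_loss(x_R):
    # Distribution-forecast / point-realization loss, Equation (36).
    a_R = optimal_action(np.eye(6)[x_R - 1])            # degenerate forecast at x_R
    return U(x_R, a_R) - U(x_R, a_F)

def dist_loss(F_R):
    # Distribution-forecast / distribution-realization loss, Equation (37).
    return np.dot(F_R, U(faces, optimal_action(F_R))) - np.dot(F_R, U(faces, a_F))

print(point_loss(3), dist_loss(np.array([0.2, 0.1, 0.2, 0.1, 0.2, 0.2])))
```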
Chapter 3

FORECAST EVALUATION

KENNETH D. WEST
University of Wisconsin

Contents
Abstract
Keywords
1. Introduction
2. A brief history
3. A small number of nonnested models, Part I
4. A small number of nonnested models, Part II
5. A small number of nonnested models, Part III
6. A small number of models, nested: MSPE
7. A small number of models, nested, Part II
8. Summary on small number of models
9. Large number of models
10. Conclusions
Acknowledgements
References

Handbook of Economic Forecasting, Volume 1. Edited by Graham Elliott, Clive W.J. Granger and Allan Timmermann. © 2006 Elsevier B.V. All rights reserved. DOI: 10.1016/S1574-0706(05)01003-7

Abstract

This chapter summarizes recent literature on asymptotic inference about forecasts. Both analytical and simulation-based methods are discussed. The emphasis is on techniques applicable when the number of competing models is small. Techniques applicable when a large number of models is compared to a benchmark are also briefly discussed.

Keywords
forecast, prediction, out of sample, prediction error, forecast error, parameter estimation error, asymptotic irrelevance, hypothesis test, inference

JEL classification: C220, C320, C520, C530

1. Introduction

This chapter reviews asymptotic methods for inference about moments of functions of predictions and prediction errors. The methods may rely on conventional asymptotics or they may be bootstrap based. The relevant class of applications is one in which the investigator uses a long time series of predictions and prediction errors as a model evaluation tool. Typically the evaluation is done retrospectively rather than in real time. A classic example is Meese and Rogoff's (1983) evaluation of exchange rate models.

In most applications, the investigator aims to compare two or more models. Measures of relative model quality might include ratios or differences of mean, mean-squared or mean-absolute prediction errors; correlation between one model's prediction error and another model's prediction (also known as forecast encompassing); or comparisons of utility- or profit-based measures of predictive ability. In other applications, the investigator focuses on a single model, in which case measures of model quality might include correlation between prediction and realization, lack of serial correlation in one step ahead prediction errors, ability to predict direction of change, or bias in predictions.
Predictive ability has long played a role in the evaluation of econometric models. An early example of a study that retrospectively set aside a large number of observations for predictive evaluation is Wilson (1934, pp. 307–308). Wilson, who studied monthly price data spanning more than a century, used estimates from the first half of his data to forecast the next twenty years. He then evaluated his model by computing the correlation between prediction and realization.¹ Growth in data and computing power has led to widespread use of similar predictive evaluation techniques, as is indicated by the applications cited below.

¹ Which, incidentally and regrettably, turned out to be negative.

To prevent misunderstanding, it may help to stress that the techniques discussed here are probably of little relevance to studies that set aside only one or two or a handful of observations for out of sample evaluation. The reader is referred to textbook expositions about confidence intervals around a prediction, or to proposals for simulation methods such as Fair (1980). As well, the paper does not cover density forecasts. Inference about such forecasts is covered in the Handbook Chapter 5 by Corradi and Swanson (2006). Finally, the paper takes for granted that one wishes to perform out of sample analysis. My purpose is to describe techniques that can be used by researchers who have decided, for reasons not discussed in this chapter, to use a non-trivial portion of their samples for prediction. See recent work by Chen (2004), Clark and McCracken (2005b) and Inoue and Kilian (2004a, 2004b) for different takes on the possible power advantages of using out of sample tests.

Much of the paper uses tests for equal mean squared prediction error (MSPE) for illustration. MSPE is not only simple, but it is also arguably the most commonly used measure of predictive ability. The focus on MSPE, however, is purely for expositional reasons. This paper is intended to be useful for practitioners interested in a wide range of functions of predictions and prediction errors that have appeared in the literature. Consequently, results that are quite general are presented. Because the target audience is practitioners, I do not give technical details. Instead, I give examples, summarize findings and present guidelines.

Section 2 illustrates the evolution of the relevant methodology. Sections 3–8 discuss inference when the number of models under evaluation is small. "Small" is not precisely defined, but in sample sizes typically available in economics it suggests a number in the single digits. Section 3 discusses inference in the unusual, but conceptually simple, case in which none of the models under consideration rely on estimated regression parameters to make predictions. Sections 4 and 5 relax this assumption, but for reasons described in those sections assume that the models under consideration are nonnested. Section 4 describes when reliance on estimated regression parameters is asymptotically irrelevant, so that Section 3 procedures may still be applied. Section 5 describes how to account for reliance on estimated regression parameters. Sections 6 and 7 consider nested models. Section 6 focuses on MSPE, Section 7 on other loss functions. Section 8 summarizes the results of previous sections. Section 9 briefly discusses inference when the number of models being evaluated is large, possibly larger than the sample size. Section 10 concludes.
2. A brief history

I begin the discussion with a brief history of methodology for inference, focusing on mean squared prediction errors (MSPE). Let $e_{1t}$ and $e_{2t}$ denote one step ahead prediction errors from two competing models, and let their corresponding second moments be $\sigma_1^2 \equiv Ee_{1t}^2$ and $\sigma_2^2 \equiv Ee_{2t}^2$. (For reasons explained below, the assumption of stationarity – the absence of a $t$ subscript on $\sigma_1^2$ and $\sigma_2^2$ – is not always innocuous. For the moment, I maintain it for consistency with the literature about to be reviewed.) One wishes to test the null $H_0\colon \sigma_1^2 - \sigma_2^2 = 0$, or perhaps construct a confidence interval around the point estimate of $\sigma_1^2 - \sigma_2^2$.

Observe that $E(e_{1t} - e_{2t})(e_{1t} + e_{2t}) = \sigma_1^2 - \sigma_2^2$. Thus $\sigma_1^2 - \sigma_2^2 = 0$ if and only if the covariance or correlation between $e_{1t} - e_{2t}$ and $e_{1t} + e_{2t}$ is zero. Suppose initially that $(e_{1t}, e_{2t})$ is i.i.d. Granger and Newbold (1977) used this observation to suggest testing $H_0\colon \sigma_1^2 - \sigma_2^2 = 0$ by testing for zero correlation between $e_{1t} - e_{2t}$ and $e_{1t} + e_{2t}$. This procedure had been proposed earlier by Morgan (1939) in the context of testing for equality between the variances of two normal random variables. Granger and Newbold (1977) assumed that the forecast errors had zero mean, but Morgan (1939) indicates that this assumption is not essential. The Granger and Newbold test was extended to multistep, serially correlated and possibly non-normal prediction errors by Meese and Rogoff (1988) and Mizrach (1995).

Ashley, Granger and Schmalensee (1980) proposed a test of equal MSPE in the context of nested models. For nested models, equal MSPE is theoretically equivalent to Granger non-causality. Ashley, Granger and Schmalensee (1980) proposed executing a standard F-test, but with out of sample prediction errors used to compute the restricted and unrestricted error variances. They recommended that tests be one-sided, testing whether the unrestricted model has smaller MSPE than the restricted (nested) model: it is not clear what it means if the restricted model has a significantly smaller MSPE than the unrestricted model.

The literature on predictive inference that is a focus of this chapter draws on now-standard central limit theory introduced into econometrics research by Hansen (1982) – what I will call "standard results" in the rest of the discussion. Perhaps the first explicit use of standard results in predictive inference is Christiano (1989). Let $f_t = e_{1t}^2 - e_{2t}^2$. Christiano observed that we are interested in the mean of $f_t$, call it $Ef_t \equiv \sigma_1^2 - \sigma_2^2$.² And there are standard results on inference about means – indeed, if $f_t$ is i.i.d. with finite variance, introductory econometrics texts describe how to conduct inference about $Ef_t$ given a sample of $\{f_t\}$. A random variable like $e_{1t}^2 - e_{2t}^2$ may be non-normal and serially correlated, but results in Hansen (1982) apply to non-i.i.d. time series data. (Details below.)

One of Hansen's (1982) conditions is stationarity. Christiano acknowledged that standard results might not apply to his empirical application because of a possible failure of stationarity. Specifically, Christiano compared predictions of models estimated over samples of increasing size: the first of his 96 predictions relied on models estimated on quarterly data running from 1960 to 1969, the last from 1960 to 1988. Because of the increasing precision of the model estimates, forecast error variances might decline over time. (This is one sense in which the assumption of stationarity was described as "not always innocuous" above.)

² Actually, Christiano looked at root mean squared prediction errors, testing whether $\sigma_1 - \sigma_2 = 0$. For clarity and consistency with the rest of my discussion, I cast his analysis in terms of MSPE.
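To fix ideas, the following sketch applies both procedures described above to simulated one step ahead prediction errors: the Morgan-Granger-Newbold correlation test of equal MSPE, and inference about the mean of $f_t = e_{1t}^2 - e_{2t}^2$ in the spirit of Christiano (1989) and Diebold and Mariano (1995). The simulated errors and sample size are illustrative assumptions; the simple i.i.d. variance used for $f_t$ is the piece that the serial-correlation and parameter-estimation refinements discussed below generalize.

```python
import numpy as np
from scipy import stats

# Sketch of the two approaches described above, on simulated one step ahead errors.
# The simulated data and sample size are illustrative assumptions.

rng = np.random.default_rng(0)
T = 200
e1 = rng.normal(scale=1.0, size=T)    # errors from model 1
e2 = rng.normal(scale=1.2, size=T)    # errors from model 2 (larger variance)

# Morgan-Granger-Newbold: equal MSPE iff corr(e1 - e2, e1 + e2) = 0.
d, s = e1 - e2, e1 + e2
r = np.corrcoef(d, s)[0, 1]
mgn = r * np.sqrt((T - 2) / (1.0 - r ** 2))       # t-distributed with T-2 df under
p_mgn = 2 * stats.t.sf(abs(mgn), df=T - 2)        # i.i.d. normality

# Christiano / Diebold-Mariano style: inference about the mean of f_t = e1_t^2 - e2_t^2.
f = e1 ** 2 - e2 ** 2
dm = np.sqrt(T) * f.mean() / f.std(ddof=1)        # valid if f_t is i.i.d.; serially
p_dm = 2 * stats.norm.sf(abs(dm))                 # correlated f_t needs a HAC variance

print(p_mgn, p_dm)
```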
West, Edison and Cho (1993) and West and Cho (1995) independently used standard results to compute test statistics. The objects of interest were MSPEs and a certain utility-based measure of predictive ability. Diebold and Mariano (1995) proposed using the same standard results, also independently, but in a general context that allows one to be interested in the mean of a general loss or utility function. As detailed below, these papers explained either in context or as a general principle how to allow for multistep, non-normal, and conditionally heteroskedastic prediction errors.

The papers cited in the preceding two paragraphs all proceed without proof. None directly address the possible complications from parameter estimation noted by Christiano (1989). A possible approach to allowing for these complications in special cases is in Hoffman and Pagan (1989) and Ghysels and Hall (1990). These papers showed how
