A. Timmermann and Mark Watson provided detailed comments and suggestions that greatly improved the paper. Comments from seminar participants at the UCSD Rady School forecasting conference were also helpful.

References

Aiolfi, M., Favero, C.A. (2005). “Model uncertainty, thick modeling and the predictability of stock returns”. Journal of Forecasting 24, 233–254.
Aiolfi, M., Timmermann, A. (2006). “Persistence of forecasting performance and combination strategies”. Journal of Econometrics. In press.
Armstrong, J.S. (1989). “Combining forecasts: The end of the beginning or the beginning of the end”. International Journal of Forecasting 5, 585–588.
Bates, J.M., Granger, C.W.J. (1969). “The combination of forecasts”. Operations Research Quarterly 20, 451–468.
Bunn, D.W. (1975). “A Bayesian approach to the linear combination of forecasts”. Operations Research Quarterly 26, 325–329.
Bunn, D.W. (1985). “Statistical efficiency in the linear combination of forecasts”. International Journal of Forecasting 1, 151–163.
Chan, Y.L., Stock, J.H., Watson, M.W. (1999). “A dynamic factor model framework for forecast combination”. Spanish Economic Review 1, 91–122.
Chong, Y.Y., Hendry, D.F. (1986). “Econometric evaluation of linear macro-economic models”. Review of Economic Studies 53, 671–690.
Christoffersen, P., Diebold, F.X. (1997). “Optimal prediction under asymmetrical loss”. Econometric Theory 13, 806–817.
Clemen, R.T. (1987). “Combining overlapping information”. Management Science 33, 373–380.
Clemen, R.T. (1989). “Combining forecasts: A review and annotated bibliography”. International Journal of Forecasting 5, 559–581.
Clemen, R.T., Murphy, A.H., Winkler, R.L. (1995). “Screening probability forecasts: Contrasts between choosing and combining”. International Journal of Forecasting 11, 133–145.
Clemen, R.T., Winkler, R.L. (1986). “Combining economic forecasts”. Journal of Business and Economic Statistics 4, 39–46.
Deutsch, M., Granger, C.W.J., Terasvirta, T. (1994).
“The combination of forecasts using changing weights”. International Journal of Forecasting 10, 47–57.
Diebold, F.X. (1988). “Serial correlation and the combination of forecasts”. Journal of Business and Economic Statistics 6, 105–111.
Diebold, F.X. (1989). “Forecast combination and encompassing: Reconciling two divergent literatures”. International Journal of Forecasting 5, 589–592.
Diebold, F.X., Lopez, J.A. (1996). “Forecast evaluation and combination”. In: Maddala, G.S., Rao, C.R. (Eds.), Statistical Methods in Finance, Handbook of Statistics, vol. 14. Elsevier, Amsterdam, pp. 241–268.
Diebold, F.X., Pauly, P. (1987). “Structural change and the combination of forecasts”. Journal of Forecasting 6, 21–40.
Diebold, F.X., Pauly, P. (1990). “The use of prior information in forecast combination”. International Journal of Forecasting 6, 503–508.
Donaldson, R.G., Kamstra, M. (1996). “Forecast combining with neural networks”. Journal of Forecasting 15, 49–61.
Dunis, C., Laws, J., Chauvin, S. (2001). “The use of market data and model combinations to improve forecast accuracy”. In: Dunis, C., Timmermann, A., Moody, J.E. (Eds.), Developments in Forecast Combination and Portfolio Choice. Wiley, Oxford.
Dunis, C.L., Timmermann, A., Moody, J.E. (Eds.) (2001). Developments in Forecast Combination and Portfolio Choice. Wiley, Oxford.
Elliott, G. (2004). “Forecast combination with many forecasts”. Mimeo, Department of Economics, University of California, San Diego.
Elliott, G., Timmermann, A. (2004). “Optimal forecast combinations under general loss functions and forecast error distributions”. Journal of Econometrics 122, 47–79.
Elliott, G., Timmermann, A. (2005). “Optimal forecast combination weights under regime switching”. International Economic Review 46, 1081–1102.
Engle, R.F., Granger, C.W.J., Kraft, D. (1984). “Combining competing forecasts of inflation using a bivariate ARCH model”.
Journal of Economic Dynamics and Control 8, 151–165.
Figlewski, S., Urich, T. (1983). “Optimal aggregation of money supply forecasts: Accuracy, profitability and market efficiency”. Journal of Finance 28, 695–710.
Genest, S., Zidek, J. (1986). “Combining probability distributions: A critique and an annotated bibliography”. Statistical Science 1, 114–148.
Geweke, J., Whiteman, C. (2006). “Bayesian forecasting”. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting. Elsevier, Amsterdam, pp. 3–79. Chapter 1 in this volume.
Giacomini, R., Komunjer, I. (2005). “Evaluation and combination of conditional quantile forecasts”. Journal of Business and Economic Statistics 23, 416–431.
Granger, C.W.J., Jeon, Y. (2004). “Thick modeling”. Economic Modelling 21, 323–343.
Granger, C.W.J., Machina, M.J. (2006). “Forecasting and decision theory”. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting. Elsevier, Amsterdam, pp. 81–98. Chapter 2 in this volume.
Granger, C.W.J., Pesaran, M.H. (2000). “Economic and statistical measures of forecast accuracy”. Journal of Forecasting 19, 537–560.
Granger, C.W.J., Ramanathan, R. (1984). “Improved methods of combining forecasts”. Journal of Forecasting 3, 197–204.
Guidolin, M., Timmermann, A. (2005). “Optimal forecast combination weights under regime shifts with an application to US interest rates”. Mimeo, Federal Reserve Bank of St. Louis and the Department of Economics, University of California, San Diego.
Gupta, S., Wilton, P.C. (1987). “Combination of forecasts: An extension”. Management Science 33, 356–372.
Hamilton, J.D. (1989). “A new approach to the economic analysis of nonstationary time series and the business cycle”. Econometrica 57, 357–384.
Hendry, D.F., Clements, M.P. (2002). “Pooling of forecasts”. Econometrics Journal 5, 1–26.
Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T. (1999). “Bayesian model averaging: A tutorial”.
Statistical Science 14, 382–417.
Jackson, T., Karlsson, S. (2004). “Finding good predictors for inflation: A Bayesian model averaging approach”. Journal of Forecasting 23, 479–498.
Jagannathan, R., Ma, T. (2003). “Risk reduction in large portfolios: Why imposing the wrong constraints helps”. Journal of Finance 58, 1651–1684.
Jobson, J.D., Korkie, B. (1980). “Estimation for Markowitz efficient portfolios”. Journal of the American Statistical Association 75, 544–554.
Kang, H. (1986). “Unstable weights in the combination of forecasts”. Management Science 32, 683–695.
Leamer, E. (1978). Specification Searches. Wiley, Oxford.
Ledoit, O., Wolf, M. (2003). “Improved estimation of the covariance matrix of stock returns with an application to portfolio selection”. Journal of Empirical Finance 10, 603–621.
Ledoit, O., Wolf, M. (2004). “Honey, I shrunk the sample covariance matrix”. Journal of Portfolio Management.
LeSage, J.P., Magura, M. (1992). “A mixture-model approach to combining forecasts”. Journal of Business and Economic Statistics 10, 445–453.
Makridakis, S. (1989). “Why combining works?”. International Journal of Forecasting 5, 601–603.
Makridakis, S., Hibon, M. (2000). “The M3-competition: Results, conclusions and implications”. International Journal of Forecasting 16, 451–476.
Makridakis, S., Winkler, R.L. (1983). “Averages of forecasts: Some empirical results”. Management Science 29, 987–996.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., Winkler, R. (1982). “The accuracy of extrapolation (time series) methods: Results of a forecasting competition”. Journal of Forecasting 1, 111–153.
Marcellino, M. (2004). “Forecast pooling for short time series of macroeconomic variables”. Oxford Bulletin of Economics and Statistics 66, 91–112.
Min, C.K., Zellner, A. (1993).
“Bayesian and non-Bayesian methods for combining models and forecasts with applications to forecasting international growth rates”. Journal of Econometrics 56, 89–118.
Newbold, P., Granger, C.W.J. (1974). “Experience with forecasting univariate time series and the combination of forecasts”. Journal of the Royal Statistical Society Series A 137, 131–146.
Newbold, P., Harvey, D.I. (2001). “Forecast combination and encompassing”. In: Clements, M.P., Hendry, D.F. (Eds.), A Companion to Economic Forecasting. Blackwell, Oxford.
Palm, F.C., Zellner, A. (1992). “To combine or not to combine? Issues of combining forecasts”. Journal of Forecasting 11, 687–701.
Patton, A., Timmermann, A. (2004). “Properties of optimal forecasts under asymmetric loss and nonlinearity”. Mimeo, London School of Economics and Department of Economics, University of California, San Diego.
Pesaran, M.H., Timmermann, A. (2005). “Selection of estimation window in the presence of breaks”. Mimeo, Cambridge University and Department of Economics, University of California, San Diego.
Raftery, A.E., Madigan, D., Hoeting, J.A. (1997). “Bayesian model averaging for linear regression models”. Journal of the American Statistical Association 92, 179–191.
Reid, D.J. (1968). “Combining three estimates of gross domestic product”. Economica 35, 431–444.
Sanders, F. (1963). “On subjective probability forecasting”. Journal of Applied Meteorology 2, 196–201.
Sessions, D.N., Chatterjee, S. (1989). “The combining of forecasts using recursive techniques with nonstationary weights”. Journal of Forecasting 8, 239–251.
Stock, J.H., Watson, M. (2001). “A comparison of linear and nonlinear univariate models for forecasting macroeconomic time series”. In: Engle, R.F., White, H. (Eds.), Festschrift in Honour of Clive Granger. Cambridge University Press, Cambridge, pp. 1–44.
Stock, J.H., Watson, M. (2004). “Combination forecasts of output growth in a seven-country data set”. Journal of Forecasting 23, 405–430.
Swanson, N.R., Zeng, T. (2001). “Choosing among competing econometric forecasts: Regression-based forecast combination using model selection”. Journal of Forecasting 20, 425–440.
West, K.D. (2006). “Forecast evaluation”. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting. Elsevier, Amsterdam, pp. 99–134. Chapter 3 in this volume.
White, H. (2006). “Approximate nonlinear forecasting methods”. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting. Elsevier, Amsterdam, pp. 459–512. Chapter 9 in this volume.
Winkler, R.L. (1981). “Combining probability distributions from dependent information sources”. Management Science 27, 479–488.
Winkler, R.L. (1989). “Combining forecasts: A philosophical basis and some current issues”. International Journal of Forecasting 5, 605–609.
Winkler, R.L., Makridakis, S. (1983). “The combination of forecasts”. Journal of the Royal Statistical Society Series A 146, 150–157.
Wright, S.M., Satchell, S.E. (2003). “Generalized mean-variance analysis and robust portfolio diversification”. In: Satchell, S.E., Scowcroft, A. (Eds.), Advances in Portfolio Construction and Implementation. Butterworth Heinemann, London, pp. 40–54.
Yang, Y. (2004). “Combining forecast procedures: Some theoretical results”. Econometric Theory 20, 176–190.
Zellner, A. (1986). “Bayesian estimation and prediction using asymmetric loss functions”. Journal of the American Statistical Association 81, 446–451.
Zellner, A., Hong, C., Min, C.K. (1991). “Forecasting turning points in international output growth rates using Bayesian exponentially weighted autoregression, time-varying parameter, and pooling techniques”. Journal of Econometrics 49, 275–304.

Chapter 5

PREDICTIVE DENSITY EVALUATION

VALENTINA CORRADI
Queen Mary, University of London

NORMAN R. SWANSON
Rutgers University

Contents

Abstract 198
Keywords 199
Part I: Introduction 200
1.
Estimation, specification testing, and model evaluation 200
Part II: Testing for Correct Specification of Conditional Distributions 207
2. Specification testing and model evaluation in-sample 207
2.1. Diebold, Gunther and Tay approach – probability integral transform 208
2.2. Bai approach – martingalization 208
2.3. Hong and Li approach – a nonparametric test 210
2.4. Corradi and Swanson approach 212
2.5. Bootstrap critical values for the V_{1T} and V_{2T} tests 216
2.6. Other related work 220
3. Specification testing and model selection out-of-sample 220
3.1. Estimation and parameter estimation error in recursive and rolling estimation schemes – West as well as West and McCracken results 221
3.2. Out-of-sample implementation of Bai as well as Hong and Li tests 223
3.3. Out-of-sample implementation of Corradi and Swanson tests 225
3.4. Bootstrap critical values for the V_{1P,J} and V_{2P,J} tests under recursive estimation 228
3.4.1. The recursive PEE bootstrap 228
3.4.2. V_{1P,J} and V_{2P,J} bootstrap statistics under recursive estimation 231
3.5. Bootstrap critical values for the V_{1P,J} and V_{2P,J} tests under rolling estimation 233
Part III: Evaluation of (Multiple) Misspecified Predictive Models 234
4. Pointwise comparison of (multiple) misspecified predictive models 234
4.1. Comparison of two nonnested models: Diebold and Mariano test 235
4.2. Comparison of two nested models 238
4.2.1. Clark and McCracken tests 238

Handbook of Economic Forecasting, Volume 1. Edited by Graham Elliott, Clive W.J. Granger and Allan Timmermann. © 2006 Elsevier B.V. All rights reserved. DOI: 10.1016/S1574-0706(05)01005-0

4.2.2. Chao, Corradi and Swanson tests 240
4.3. Comparison of multiple models: The reality check 242
4.3.1. White’s reality check and extensions 243
4.3.2. Hansen’s approach applied to the reality check 247
4.3.3. The subsampling approach applied to the reality check 248
4.3.4. The false discovery rate approach applied to the reality check 249
4.4.
A predictive accuracy test that is consistent against generic alternatives 249
5. Comparison of (multiple) misspecified predictive density models 253
5.1. The Kullback–Leibler information criterion approach 253
5.2. A predictive density accuracy test for comparing multiple misspecified models 254
5.2.1. A mean square error measure of distributional accuracy 254
5.2.2. The test statistic and its asymptotic behavior 255
5.2.3. Bootstrap critical values for the density accuracy test 262
5.2.4. Empirical illustration – forecasting inflation 265
Acknowledgements 271
Part IV: Appendices and References 271
Appendix A: Assumptions 271
Appendix B: Proofs 275
References 280

Abstract

This chapter discusses estimation, specification testing, and model selection of predictive density models. In particular, predictive density estimation is briefly discussed, and a variety of different specification and model evaluation tests due to various authors including Christoffersen and Diebold [Christoffersen, P., Diebold, F.X. (2000). “How relevant is volatility forecasting for financial risk management?”. Review of Economics and Statistics 82, 12–22], Diebold, Gunther and Tay [Diebold, F.X., Gunther, T., Tay, A.S. (1998). “Evaluating density forecasts with applications to finance and management”. International Economic Review 39, 863–883], Diebold, Hahn and Tay [Diebold, F.X., Hahn, J., Tay, A.S. (1999). “Multivariate density forecast evaluation and calibration in financial risk management: High frequency returns on foreign exchange”. Review of Economics and Statistics 81, 661–673], White [White, H. (2000). “A reality check for data snooping”. Econometrica 68, 1097–1126], Bai [Bai, J. (2003). “Testing parametric conditional distributions of dynamic models”. Review of Economics and Statistics 85, 531–549], Corradi and Swanson [Corradi, V., Swanson, N.R. (2005a). “A test for comparing multiple misspecified conditional distributions”.
Econometric Theory 21, 991–1016; Corradi, V., Swanson, N.R. (2005b). “Nonparametric bootstrap procedures for predictive inference based on recursive estimation schemes”. Working Paper, Rutgers University; Corradi, V., Swanson, N.R. (2006a). “Bootstrap conditional distribution tests in the presence of dynamic misspecification”. Journal of Econometrics, in press; Corradi, V., Swanson, N.R. (2006b). “Predictive density and conditional confidence interval accuracy tests”. Journal of Econometrics, in press], Hong and Li [Hong, Y.M., Li, H.F. (2003). “Nonparametric specification testing for continuous time models with applications to term structure of interest rates”. Review of Financial Studies 18, 37–84], and others are reviewed. Extensions of some existing techniques to the case of out-of-sample evaluation are also provided, and asymptotic results associated with these extensions are outlined.

Keywords

block bootstrap, density and conditional distribution, forecast accuracy testing, mean square error, parameter estimation error, parametric and nonparametric methods, prediction, rolling and recursive estimation scheme

JEL classification: C22, C51

Part I: Introduction

1. Estimation, specification testing, and model evaluation

The topic of predictive density evaluation has received considerable attention in economics and finance over the last few years, a fact which is not at all surprising when one notes the importance of predictive densities to virtually all public and private institutions involved with the construction and dissemination of forecasts. As a case in point, consider the plethora of conditional mean forecasts reported by the news media. These sorts of predictions are not very useful for economic decision making unless confidence intervals are also provided.
Indeed, there is a clear need when forming macroeconomic policies and when managing financial risk in the insurance and banking industries to use predictive confidence intervals or entire predictive conditional distributions. One such case is when value at risk measures are constructed in order to assess the amount of capital at risk from small probability events, such as catastrophes (in insurance markets) or monetary shocks that have a large impact on interest rates [see Duffie and Pan (1997) for further discussion]. Another case is when maximizing the expected utility of an investor who is choosing an optimal asset allocation of stocks and bonds, in which case there is a need to model the joint distribution of the assets [see Guidolin and Timmermann (2005, 2006) for a discussion of this and related applications]. Finally, it is worth noting that density forecasts may be useful in multi-step ahead prediction contexts using nonlinear models, even if interest focuses only on point forecasts of the conditional mean [see Chapter 8 in this Handbook by Teräsvirta (2006)]. In this chapter we shall discuss some of the tools that are useful in such situations, with particular focus on estimation, specification testing, and model evaluation. Additionally, we shall review various tests for the evaluation of point predictions.¹

There are many important historical precedents for predictive density estimation, testing, and model selection. From the perspective of estimation, the parameters characterizing distributions, conditional distributions and predictive densities can be constructed using innumerable well established techniques, including maximum likelihood, (simulated generalized) methods of moments, and a plethora of other estimation techniques. Additionally, one can specify parametric models, nonparametric models, and semiparametric models.
For example, a random variable of interest, say y_t, may be assumed to have a particular distribution, say F(u|θ_0) = P(y_t ≤ u|θ_0) = Φ((u − μ)/σ) = ∫_{−∞}^{u} f(y) dy, where f(y) = (1/(σ√(2π))) exp(−(y − μ)²/(2σ²)). Here, the consistent maximum likelihood estimator of θ_0 = (μ, σ²) is given by μ̂ = T^{−1} Σ_{t=1}^{T} y_t and σ̂² = T^{−1} Σ_{t=1}^{T} (y_t − μ̂)², where T is the sample size. This example corresponds to the case where the variable of interest is a martingale difference sequence and so there is no potentially useful (conditioning) information which may help in prediction. Then, the predictive density for y_t is simply f̂(y) = (1/(σ̂√(2π))) exp(−(y − μ̂)²/(2σ̂²)). Alternatively, one may wish to use a nonparametric estimator. For example, if the functional form of the distribution is unknown, one might choose to construct a kernel density estimator. In this case, one would construct f̂(y) = (1/(Tλ)) Σ_{t=1}^{T} κ((y_t − y)/λ), where κ is a kernel function and λ is the bandwidth parameter that satisfies a particular rate condition in order to ensure consistent estimation, such as λ = O(T^{−1/5}). Nonparametric density estimators converge to the true underlying density at a nonparametric (slow) rate.

¹ In this chapter, the distinction that is made between specification testing and model evaluation (or predictive accuracy testing) is predicated on the fact that specification tests often consider only one model. Such tests usually attempt to ascertain whether the model is misspecified, and they usually assume correct specification under the null hypothesis. On the other hand, predictive accuracy tests compare multiple models and should (in our view) allow for various forms of misspecification, under both hypotheses.
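The two estimators described above, the Gaussian plug-in predictive density and the kernel density estimator with bandwidth λ = O(T^{−1/5}), can be sketched in a few lines. This is a minimal illustration with simulated data; the sample size, seed, and choice of Gaussian kernel are our assumptions, not from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=500)  # simulated i.i.d. Gaussian sample, T = 500
T = y.size

# Maximum likelihood estimates of (mu, sigma^2) under the Gaussian model
mu_hat = y.mean()
sigma2_hat = ((y - mu_hat) ** 2).mean()

def plug_in_density(u):
    # Predictive density f_hat(u): the N(mu_hat, sigma2_hat) density
    return np.exp(-(u - mu_hat) ** 2 / (2 * sigma2_hat)) / np.sqrt(2 * np.pi * sigma2_hat)

# Kernel density estimator with a Gaussian kernel; lam = O(T^{-1/5}) rate condition
lam = y.std() * T ** (-1 / 5)

def kernel_density(u):
    return np.mean(np.exp(-0.5 * ((y - u) / lam) ** 2)) / (lam * np.sqrt(2 * np.pi))

# Compare the two estimates on a grid; both should be close to the true N(1, 4) density
grid = np.linspace(-6.0, 8.0, 141)
gap = np.max(np.abs(plug_in_density(grid) - np.array([kernel_density(u) for u in grid])))
print(mu_hat, sigma2_hat, gap)
```

When the parametric model is correct, as here, the plug-in estimator converges faster; the kernel estimator gives up that parametric rate in exchange for robustness to functional-form misspecification.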
For this reason, a valid alternative is the use of empirical distributions, which instead converge to the cumulative distribution function (CDF) at a parametric rate [see, e.g., Andrews (1993) for a thorough overview of empirical distributions, and empirical processes in general]. In particular, the empirical distribution is crucial in our discussion of predictive density because it is useful in estimation, testing, and model evaluation; and has the property that (1/√T) Σ_{t=1}^{T} (1{y_t ≤ u} − F(u|θ_0)) satisfies a central limit theorem. Of course, in economics it is natural to suppose that better predictions can be constructed by conditioning on other important economic variables. Indeed, discussions of predictive density are usually linked to discussions of conditional distribution, where we define conditioning information as Z^t = (y_{t−1}, …, y_{t−v}, X_t, …, X_{t−w}), with v, w finite, and where X_t may be vector valued. In this context, we could define a parametric model, say F(u|Z^t, θ), to characterize the conditional distribution F_0(u|Z^t, θ_0) = Pr(Y_t ≤ u|Z^t). Needless to say, our model would be misspecified, unless F = F_0. Alternatively, one may wish to estimate and evaluate a group of alternative models, say F_1(u|Z^t, θ_1†), …, F_m(u|Z^t, θ_m†), where the parameters in these distributions correspond to the probability limits of the estimated parameters, and m is the number of models to be estimated and evaluated. Estimation in this context can be carried out in much the same way as when unconditional models are estimated. For example, one can construct a conditional distribution model by postulating that y_t|Z^t ∼ N(θ′Z^t, σ²), estimate θ by least squares and σ² using the least squares residuals, and then form predictive confidence intervals or the entire predictive density.
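The conditional Gaussian model just mentioned can be sketched as follows. This is an illustrative simulation; the data-generating values are ours, and the interval deliberately ignores parameter estimation error, an issue the chapter treats carefully later.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 400
x = rng.normal(size=T)                    # conditioning variable X_t
eps = rng.normal(size=T)
y = 0.5 + 2.0 * x + eps                   # data generated as y_t = 0.5 + 2 x_t + eps_t

# Least-squares estimate of theta in the model y_t | Z^t ~ N(theta' Z^t, sigma^2)
Z = np.column_stack([np.ones(T), x])      # Z^t = (1, x_t)
theta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
sigma2_hat = np.mean((y - Z @ theta_hat) ** 2)  # sigma^2 from least-squares residuals

# 90% predictive interval for y given a new z = (1, x_new);
# 1.645 is the standard normal 95% quantile, and parameter estimation error is ignored
x_new = 1.0
center = theta_hat[0] + theta_hat[1] * x_new
half = 1.645 * np.sqrt(sigma2_hat)
print(center - half, center + half)
```

Evaluating the fitted normal CDF or density at any u, given z, gives the entire estimated predictive distribution rather than just an interval.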
The foregoing discussion underscores the fact that there are numerous well established estimation techniques which one can use to estimate predictive density models, and hence which one can use to make associated probabilistic statements such as: “There is 0.9 probability, based on the use of my particular model, that inflation next period will lie between 4 and 5 percent.” Indeed, for a discussion of estimation, one need merely pick up any basic or advanced statistics and/or econometrics text. Naturally, and as one might expect, the appropriateness of a particular estimation technique hinges on two factors. The first is the nature of the data. Marketing survey data are quite different from aggregate measures of economic activity, and there are well established literatures describing appropriate models and estimation techniques for these and other varieties of data, from spatial to panel, and from time series to cross sectional. Given that there is already a huge literature on the topic of estimation, we shall hereafter assume that the reader has at her/his disposal software and know-how concerning model estimation [for some discussion of estimation in cross sectional, panel, and time series models, for example, the reader might refer to Baltagi (1995), Bickel and Doksum (1977), Davidson and MacKinnon (1993), Hamilton (1994), White (1994), and Wooldridge (2002), to name but a very few]. The second factor upon which the appropriateness of a particular estimation strategy hinges concerns model specification. In the context of model specification and evaluation, it is crucial to make it clear in empirical settings whether one is assuming that a model is correctly specified (prior to estimation), or whether the model is simply an approximation, possibly from amongst a group of many “approximate models”, from whence some “best” predictive density model is to be selected.
The reason this assumption is important is because it impacts on the assumed properties of the residuals from the first stage conditional mean regression in the above example, which in turn impacts on the validity and appropriateness of specification testing and model evaluation techniques that are usually applied after a model has been estimated. The focus in this chapter is on the last two issues, namely specification testing and model evaluation. One reason why we are able to discuss both of these topics in a (relatively) short handbook chapter is that the literature on the subjects is not near so large as that for estimation; although it is currently growing at an impressive rate! The fact that the literature in these areas is still relatively underdeveloped is perhaps surprising, given that the “tools” used in specification testing and model evaluation have been around for so long, and include such important classical contributions as the Kolmogorov–Smirnov test [see, e.g., Kolmogorov (1933) and Smirnov (1939)], various results on empirical processes [see, e.g., Andrews (1993) and the discussion in Chapter 19 of van der Vaart (1998) on the contributions of Glivenko, Cantelli, Doob, Donsker and others], the probability integral transform [see, e.g., Rosenblatt (1952)], and the Kullback–Leibler Information Criterion [see, e.g., White (1982) and Vuong (1989)]. However, the immaturity of the literature is perhaps not so surprising when one considers that many of the contributions in the area depend upon recent advances including results validating the use of the bootstrap [see, e.g., Horowitz (2001)] and the invention of crucial tools for dealing with parameter estimation error [see, e.g., Ghysels and Hall (1990), Khmaladze (1981, 1988) and West (1996)], for example. We start by outlining various contributions which are from the literature on (consistent) specification testing [see, e.g., Bierens (1982, 1990) and Bierens and Ploberger (1997)].
An important feature of such tests is that if one subsequently carries out a series of these tests, such as when one performs a series of specification tests using alternative conditional distributions [e.g., the conditional Kolmogorov–Smirnov test of Andrews (1997)], then sequential test bias arises (i.e., critical values may be incorrectly sized, and so inference based on such sequential tests may be incorrect). Additionally, it may be difficult in some contexts to justify the assumption under the null that a model is correctly specified, as we may want to allow for possible dynamic misspecification under the null, for example. After all, if two tests for the correct specification of two different models are carried out sequentially, then surely one of the models is misspecified under the null, implying that the critical values of one of the two tests may be incorrect, as we shall shortly illustrate. It is in this sense that the idea of model evaluation in which a group of models are jointly compared, and in which case all models are allowed to be misspecified, is important, particularly from the perspective of prediction. Also, there are many settings for which the objective is not to find the correct model, but rather to select the “best” model (based on a given metric or loss function to be used for predictive evaluation) from amongst a group of models, all of which are approximations to some underlying unknown model. Nevertheless, given that advances in multiple model comparison under misspecification derive to a large extent from earlier advances in (correct) specification testing, and given that specification testing and model evaluation are likely most powerful when used together, we shall discuss tools and techniques in both areas. Although a more mature literature, there is still a great amount of activity in the area of tests for the correct specification of conditional distributions.
One reason for this is that testing for the correct conditional distribution is equivalent to jointly evaluating many conditional features of a process, including the conditional mean, variance, and symmetry. Along these lines, Inoue (2001) constructs tests for generic conditional aspects of a distribution, and Bai and Ng (2001) construct tests for conditional asymmetry. These sorts of tests can be generalized to the evaluation of predictive intervals and predictive densities, too. One group of tests that we discuss along these lines is that due to Corradi and Swanson (2006a). In their paper, they construct Kolmogorov type conditional distribution tests in the presence of both dynamic misspecification and parameter estimation error. As shall be discussed shortly, the approach taken by these authors differs somewhat from much of the related literature because they construct statistics that allow for dynamic misspecification under both hypotheses, rather than assuming correct dynamic specification under the null hypothesis. This difference can be most easily motivated within the framework used by Diebold, Gunther and Tay (1998, DGT), Hong (2001), and Bai (2003). In their paper, DGT use the probability integral transform to show that F_t(y_t|ℑ_{t−1}, θ_0) is identically and independently distributed as a uniform random variable on [0, 1], where F_t(·|ℑ_{t−1}, θ_0) is a parametric distribution with underlying parameter θ_0, y_t is again our random variable of interest, and ℑ_{t−1} is the information set containing all “relevant” past information (see below for further discussion). They thus suggest using the difference between the empirical distribution of F_t(y_t|ℑ_{t−1}, θ̂_T) and the 45-degree line as a measure of “goodness of fit”, where θ̂_T is some estimator of θ_0.
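The DGT idea can be illustrated with a small simulation: for a correctly specified Gaussian AR(1), the probability integral transforms should behave like i.i.d. U[0, 1] draws, so the empirical CDF of the transforms should be close to the 45-degree line. The AR(1) design and the use of true rather than estimated parameters are our simplifications.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
T = 1000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.normal()   # Gaussian AR(1): a correctly specified case

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Probability integral transform u_t = F_t(y_t | info through t-1); the true
# parameters are used for simplicity -- in practice an estimator replaces them,
# which introduces the parameter estimation error discussed in the text
u = np.array([norm_cdf(y[t] - 0.6 * y[t - 1]) for t in range(1, T)])

# Maximum distance between the empirical CDF of u and the U[0,1] CDF (45-degree line)
u_sorted = np.sort(u)
n = u_sorted.size
ecdf_hi = np.arange(1, n + 1) / n
ks = max(np.max(ecdf_hi - u_sorted), np.max(u_sorted - (ecdf_hi - 1 / n)))
print(ks)  # small under correct specification
```

Under a misspecified conditional distribution, for instance fitting a Gaussian to fat-tailed data, the transforms pile up away from uniformity and this distance stays bounded away from zero.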
This approach has been shown to be very useful for financial risk management [see, e.g., Diebold, Hahn and Tay (1999)], as well as for macroeconomic forecasting [see, e.g., Diebold, Tay and Wallis (1998) and Clements and Smith (2000, 2002)]. Likewise, Bai (2003) proposes a Kolmogorov type test of F_t(u|ℑ_{t−1}, θ_0) based on the comparison of F_t(y_t|ℑ_{t−1}, θ̂_T) with the CDF of a uniform on [0, 1]. As a consequence of using estimated parameters, the limiting distribution of his test reflects the contribution of parameter estimation error and is not nuisance parameter free. To overcome this