Chapter 19

INFERENCE AND CAUSALITY IN ECONOMIC TIME SERIES MODELS

JOHN GEWEKE
Duke University

Contents
1. Introduction
2. Causality
3. Causal orderings and their implications
3.1. A canonical form for wide sense stationary multiple time series
3.2. The implications of unidirectional causality
3.3. Extensions
4. Causality and exogeneity
5. Inference
5.1. Alternative tests
5.2. Comparison of tests
6. Some practical problems for further research
6.1. The parameterization problem and asymptotic distribution theory
6.2. Non-autoregressive processes
6.3. Deterministic processes
6.4. Non-stationary processes
6.5. Multivariate methods
References

Handbook of Econometrics, Volume II, Edited by Z. Griliches and M.D. Intriligator. Elsevier Science Publishers B.V., 1984

1. Introduction

Many econometricians are apt to be uncomfortable when thinking about the concept “causality” (in part, because they usually do so under some duress). On the one hand, the concept is a primitive notion which is indispensable when thinking about economic phenomena, econometric models, and the relation between the two. On the other, the idea is notoriously difficult to formalize, as casual reading in the philosophy of science will attest. In this chapter we shall be concerned with a particular formalization that has proved useful in empirical work: hence the juxtaposition of “causality” and “inference”. It also bears close relation to notions of strictly exogenous and predetermined variables, which have considerable operational significance in statistical inference, and to the concepts of causal orderings and realizability, which are important in model construction in econometrics and engineering, respectively.

Our concept of causality was introduced to economists by C. W. J. Granger [Granger (1963, 1969)], who built on earlier work by Wiener (1956). We shall refer to the concept as Wiener-Granger causality. It applies to relations among time series. Let
$X = \{x_t, t \text{ real}\}$ and $Y = \{y_t, t \text{ real}\}$ be two time series, and let $X_t$ and $Y_t$ denote their entire histories up to and including time t: $X_t = \{x_{t-s}, s \ge 0\}$, $Y_t = \{y_{t-s}, s \ge 0\}$. Let $U_t$ denote all information accumulated as of time t, and suppose that $X_s \subseteq U_t$ if and only if $s \le t$, and $Y_s \subseteq U_t$ if and only if $s \le t$. If we are better able to predict $x_t$ using $U_{t-1}$ than we are using $U_{t-1} - Y_{t-1}$, then Y causes X. If we are better able to “predict” $x_t$ using $U_{t-1} \cup y_t$ than we are using $U_{t-1}$, then Y causes X instantaneously.¹

¹Granger’s (1963, 1969) definitions assume that the time series are stationary, predictors are linear least-squares projections, and mean-square error is the criterion for comparison of forecasts. While these assumptions are convenient to make when conducting empirical tests of the proposition that causality of a certain type is absent, they are not sui generis and therefore have not been imposed here.

Since Wiener-Granger causality is defined in terms of predictability, it cannot be an acceptable definition of causation for most philosophers of science [Bunge (1959, ch. 12)]. We do not take up that argument in this chapter. Rather, we concentrate on the operational usefulness of the definition in the construction, estimation, and application of econometric models. In Section 2, for example, we consider the logical relationships among Wiener-Granger causality, Simon’s (1952) definition of causal ordering, the engineer’s criterion of realizability [e.g. Zemanian (1972)], and the concept of structure set forth by Hurwicz (1962). Although Wiener-Granger causality is an empirical rather than a logical or ontological concept, it must be made much more specific before propositions like “Y does not cause X” can be refuted, even in principle. One must always specify the set of “all information” assumed in the definition, since Y may cause X for some sets but not others. One must also have a criterion for the comparison of predictors, and the validity of
propositions like “Y does not cause X” can be assessed only for restricted classes of predictors and distribution functions. In Section 3 we take up the case, frequently assumed in application, in which $U_t = X_t \cup Y_t$, predictors are linear, and the time series are jointly wide sense stationary, purely nondeterministic, and have autoregressive representations. In Sections 4 and 5 we move on to issues of statistical inference. In Section 4 it is shown that unidirectional causality from X to Y (i.e. Y does not cause X, and X may or may not cause Y) is logically equivalent to the existence of simultaneous equation models with X exogenous. It is also shown that unidirectional causality from X to Y is not equivalent to the assertion that X is predetermined in a particular behavioral relationship whose parameters are to be estimated. In Section 5 we take up the narrower problem of testing the proposition that Y does not cause X under the assumptions made in Section 3. Section 6 is devoted to some of the problems which arise in testing the proposition of unidirectional causality using actual economic time series, due to the fact that these series need not satisfy the ideal assumptions made in Sections 3 and 5. We concentrate on parameterization problems, processes which are nonautoregressive or have deterministic components or are nonstationary, and inference about many variables. The reader who is only interested in the mechanics of testing hypotheses about unidirectional causality can skip Sections 2 and 4, and read Sections 3, 5, and 6 in order. The material in Sections 2 and 4, however, is essential in the interpretation of the results of those tests.

2. Causality

Whether or not Wiener-Granger causality is consistent with formal definitions of causality offered by philosophers of science is an open question. In most definitions, “cause” is similar in meaning to “force” or “produce” [e.g. Blalock (1961, pp. 9-10)], which are clearly not synonymous with “predict”. Perhaps the definition closest to
Wiener-Granger causality is Feigl’s, in which “causation is defined in terms of predictability according to a law” [Feigl (1953, p. 408)]. It has been argued [Zellner (1979)] that statistical “laws” of the type embodied in Wiener-Granger causality are not admissible, as opposed to those of economic theory. Wiener-Granger causality is therefore “devoid of subject matter considerations, including subject matter theory, and thus is in conflict with others’ definitions, including Feigl’s, that mention both predictability and laws” [Zellner (1979, p. 51)]. Bunge (1959, p. 30), on the other hand, argues forcefully against a distinction between statistical and other kinds of laws: “The claim that statistical laws, in contrast to other kinds of scientific law, are incomplete, hence provisional, is largely a matter of metascientific inertia. In contemporary science and technology, and even in everyday life, we often ask questions that simply cannot be answered on any individual or dynamical laws, questions requiring a statistical approach and analysis.”

[Figure 2.1. S = outcomes]

The usefulness of the concept of Wiener-Granger causality in the conceptualization, construction, estimation and manipulation of econometric models is independent of its consistency or inconsistency with formal definitions. To evaluate its usefulness, we review and formalize some operational concepts implicit in econometric modelling.²

A definition of causal ordering in any econometric model (as opposed to the real world) was proposed by Simon (1952). Suppose S is a space of possible outcomes, and that the model imposes two sets of restrictions, A and B, on these outcomes. The entire model imposes the restriction $A \cap B$ on S. Suppose that S is mapped into two spaces, X and Y, by $P_X$ and $P_Y$, respectively. Then the ordered pair of restrictions (A, B) implies a causal ordering from X to Y if A restricts X (if at all) but not Y, and B restricts Y (if at all) without further restricting X. Formally we have
the following:

Definition. The ordered pair (A, B) of restrictions on S determines a causal ordering from X to Y if and only if $P_Y(A) = Y$ and $P_X(A \cap B) = P_X(A)$.

²Much (but not all) of what follows in this section may be found in Sims (1977a).

A geometric interpretation of this definition is provided in Figure 2.1. Some examples may also be helpful. Perhaps the simplest one which can be constructed is the following. Let $S = \{(x, y) \in \mathbb{R}^2\}$, and consider the restrictions on S:

$$x = a \qquad \text{(“A”)}$$
$$y + bx = c \qquad \text{(“B”)}$$

Let $P_X$ map S into the x coordinate and let $P_Y$ map S into the y coordinate. Then (A, B) determines a causal ordering from X to Y because A determines x without affecting y, while B together with A determines y without further restricting x. The causal ordering is a property of the model, not a property of the restrictions on S to which the model happens to give rise: clearly, there are many pairs of restrictions (C, D) such that $P_X(C \cap D) = P_X(A \cap B) = a$ and $P_Y(C \cap D) = P_Y(A \cap B) = c - ba$, and in fact one of these establishes a causal ordering from Y to X.

As a second example, let S be the family of pairs of random variables (x, y) with bivariate normal distribution. Consider the restrictions on S:

$$x = u_1 \sim N(\mu_1, \sigma_1^2) \qquad \text{(“A”)}$$
$$y + bx = u_2 \sim N(\mu_2, \sigma_2^2) \qquad \text{(“B”)}$$

where $u_1$ and $u_2$ are independent. Suppose $P_X$ and $P_Y$ map S into the marginal distributions for x and y, respectively. Then (A, B) determines a causal ordering from the marginal for x to the marginal for y. The model consisting of A, B, and the stipulation that $u_1$ and $u_2$ are independent is the simplest example of a recursive model [Strotz and Wold (1960)]. As Basmann (1965) has pointed out, any outcome in S can be described by such a model; again, the causal ordering is a property of the model, not of the outcome.

Causal orderings, or recursive models, are intended to be more than just descriptive devices. Inherent in such models is the notion that if A is changed, the outcome will still be $A \cap B$, with B unchanged.
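The claim that the causal ordering belongs to the model and not to the observed outcome can be made concrete with a minimal Python sketch (not from the chapter). `outcome_xy` encodes the pair (A, B) of the first example. `outcome_yx` is a hypothetical second model (C, D), with C: y = y0 and D: x - g*y = h. Its parameters are arbitrary assumptions, chosen so that both models produce the same point. Manipulating the first restriction of each model then traces out different sets of outcomes.

```python
def outcome_xy(a, b, c):
    """Model (A, B), causal ordering X -> Y: A fixes x, then B fixes y."""
    x = a              # restriction A: x = a (places no restriction on y)
    y = c - b * x      # restriction B: y + b*x = c, given x
    return x, y


def outcome_yx(y0, g, h):
    """Hypothetical model (C, D), causal ordering Y -> X: C fixes y, then D fixes x."""
    y = y0             # restriction C: y = y0 (places no restriction on x)
    x = h + g * y      # restriction D: x - g*y = h, given y
    return x, y


if __name__ == "__main__":
    b, c = 0.5, 3.0
    # Both models generate the same observed outcome.
    print(outcome_xy(2.0, b, c), outcome_yx(2.0, 1.0, 0.0))  # prints (2.0, 2.0) (2.0, 2.0)
    # Changing the first restriction moves the outcome along different loci:
    # y = c - b*x for (A, B), but x = y for (C, D) with g = 1, h = 0.
    print(outcome_xy(4.0, b, c), outcome_yx(1.0, 1.0, 0.0))  # prints (4.0, 1.0) (1.0, 1.0)
```

The shared point (2, 2) cannot distinguish the two models. Only their responses to manipulation of the first restriction can, which is why attention shifts to the second restriction.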
Once the possibility of changing the first restriction in the ordered pair is granted, it makes a great deal of difference which causal ordering is inherent in the model: different models describe different sets of restrictions on S arising from manipulation of the first restriction. Hence attention is focused on B. We formalize the notion that B is unchanged when A is manipulated as follows.

Definition. The set $B \subseteq S$ accepts X as input if for any $A \subseteq S$ which constrains only X (i.e. $P_X^{-1}(P_X(A)) = A$), (A, B) determines a causal ordering from X to Y.

In econometric modelling, the notion that B should accept X as input is so entrenched and natural that it is common to think of B as the model itself, with little or no attention given to the set A which restricts the admissible inputs for the model, although these restrictions may be very important. Conventional manipulation of an econometric model for policy or predictive purposes assumes that the manipulated variables are accepted as input by the model.

In many applications X and Y are time series, as they were in the notation of Section 1. Consider the simple case in which X and Y are univariate, normally distributed, jointly stationary time series, and S is the family of bivariate, normally distributed, jointly stationary time series. Suppose that the restriction A is:

$$A(L)x_t = u_t,$$

where A(L) is one-sided (i.e. involves only non-negative powers of the lag operator L) and has all roots outside the unit circle; and $U = \{u_t, t \text{ real}\}$ is a serially uncorrelated, normally distributed, stationary time series. Let the restriction B be:

$$B(L)y_t + C(L)x_t = w_t,$$

where B(L) has no roots on the unit circle, both B(L) and C(L) may be two-sided (i.e. involve negative powers of the lag operator L), and $W = \{w_t, t \text{ real}\}$ is a serially uncorrelated, normally distributed, stationary time series independent of U. Since “A” implies $x_t = A(L)^{-1}u_t$, it establishes the first time series without restricting the second, while “B” implies

$$y_t = -B(L)^{-1}C(L)x_t + B(L)^{-1}w_t, \qquad (2.1)$$

which establishes the second without changing the first. Hence, the model establishes a causal ordering from X to Y, and if for any normally distributed, jointly stationary X the outcome of the model satisfies (2.1), then B accepts X as input. Such a model might or might not be interesting for purposes of manipulation, however. In general, $y_t$ will be a function of past, current, and future X, which is undesirable if B is supposed to describe the relation between actual inputs and outputs; the restriction that B(L) and C(L) be one-sided and that B(L) have no roots inside the unit circle would obviate this difficulty. The notion that future inputs should not be involved in the determination of present outputs is known in the engineering literature as realizability [Zemanian (1972)], and we can formalize it in our notation as follows.

Definition. The set $B \subseteq S$ is realizable with time series X as input if B accepts X as input, and $P_{X_t}(A_1) = P_{X_t}(A_2)$ implies $P_{Y_t}(A_1 \cap B) = P_{Y_t}(A_2 \cap B)$ for all $A_1 \subseteq S$ and $A_2 \subseteq S$ which constrain only X, and all t.

If B accepts X as input but is not realizable, then a specification of inputs up to time t will not restrict outputs, but once outputs up to time t are restricted by B, then further restrictions on inputs (those occurring after time t) are implied. This is clearly an undesirable characteristic of any model which purports to treat time in a realistic fashion.

The concepts of causal ordering, inputs, and realizability pertain to models. One can establish whether models possess these properties without reference to the phenomena which the models are supposed to describe. Of course, our interest in these models stems from the possibility that they indeed describe actual phenomena. Hurwicz (1962) attributes the characteristic “structural” to models which meet this criterion.

Definition. The set $B \subseteq S$ is structural for inputs X if B accepts X as input, and when any set $C \subseteq X$ is implemented, then $P_Y(P_X^{-1}(C) \cap B)$
is true.

Notice that the use of the word “structural” here is not the same as its use in the parlance of simultaneous equation models. The sets of “structural”, “reduced form” and “final form” equations are either all structural or not structural in the sense of the foregoing definition, depending on whether or not the model depicts actual phenomena. This definition incorporates two terms which shall remain primitive: “implemented” and “true”. Whether or not $P_Y(P_X^{-1}(C) \cap B)$ is true for a given C is a question to which statistical inference can be addressed; at most, we can hope to attach a posterior probability to the truth of this statement. We can never know whether $P_Y(P_X^{-1}(C) \cap B)$ is true for any C: one can never prove that a model is structural, although by implementing one or more sets C serious doubts could be cast on the assertion.

Since the definition allows any set $C \subseteq X$ to be implemented, those implementing inputs in real time are permitted to change their plans. It seems implausible that the current outputs of an actual system should depend on future inputs as yet undetermined. We formalize this idea as follows.

Axiom of causality. $B \subseteq S$ is structural for inputs X only if B is realizable with X as input.

The axiom of causality is a formalization of the idea that the future cannot cause the past, an idea which appears to be uniformly accepted in the philosophy of science despite differences about the relations between antecedence and causality. For example, Blalock (1964, p. 10) finds this condition indispensable:

“Since the forcing or producing idea is not contained in the notion of temporal sequences, as just noted, our conception of causality should not depend on temporal sequences, except for the impossibility of an effect preceding its cause.”

Bunge argues that the condition is universally satisfied:

“Even relativity admits the reversal of time series of physically disconnected events but excludes the reversal of causal connections, that is, it denies
that effects can arise before they have been produced ... events whose order of succession is reversible cannot be causally connected with one another; at most they may have a common origin. To conclude, a condition for causality to hold is that C [the cause] be previous to or at most simultaneous with E [the event] (relative to a given reference system).” [Bunge (1959, p. 67)]

It is important to note that the converse of the axiom of causality is the post hoc ergo propter hoc fallacy. The fallaciousness of the converse follows from the fact that there are many $B_j \subseteq S$ which are realizable with X as input, but for which $P_Y(P_X^{-1}(C) \cap B_j) \ne P_Y(P_X^{-1}(C) \cap B_k)$ when $j \ne k$ for some choices of C. For C which have actually been implemented, $B_j$ and $B_k$ may of course produce identical outputs in spite of their logical inconsistency: one cannot establish that a restriction is structural through statistical inference, even to a specified level of a posteriori probability.³

It may seem curious to provide the name “axiom of causality” to a statement which nowhere mentions the word “cause”. The name is chosen because of Sims’ (1972) result that (in our language, and with appropriate restrictions on classes of time series and predictors) B is realizable with X as input if and only if in B Wiener-Granger causality is unidirectional from X to Y. To develop this result we shall be quite specific about the structure of the time series X and Y.

3. Causal orderings and their implications

In any empirical application the concept of Wiener-Granger causality must be formulated more narrowly than it is in Granger’s definitions. The relevant universe of information must be specified, and the class of predictors to be considered must be limited. If formal, classical hypothesis testing is contemplated, then the question of whether or not Y is causing X must be made to depend on the values of parameters which are few in number relative to the number of observations at hand. The determination of the relevant universe of
information rests primarily on a priori considerations from economic theory, in much the same way that the specification of which variables should enter a behavioral equation or system of equations does. Empirical studies which examine questions of Wiener-Granger causality differ greatly in the care with which the universe of information is chosen; in many instances, it is suggested by earlier work on substantively similar issues which did not address questions of causality. However, virtually all of these studies consider only predictors which are linear either in levels or logarithms. This choice is due mainly to the analytical convenience of the linearity specification, as it is elsewhere in econometric theory. In the present case it is especially attractive because only linear predictors are necessarily time invariant when time series are assumed to be wide sense stationary, the least restrictive class of time series for which a rich and useful theory of prediction is available. In this section we will discuss the portions of this theory essential for developing the testable implications of Wiener-Granger causality. Considerations of testing and inference are left to Section 5.

³An extended discussion of specific pitfalls encountered in using a finding that a restriction B which is realizable with X as input is in agreement with the data, to buttress a claim that B is structural, is provided by Sims (1977).

3.1. A canonical form for wide sense stationary multiple time series

We focus our attention on a wide sense stationary, purely non-deterministic time series $z_t$: $m \times 1$. By wide sense stationary, it is meant that the mean of $z_t$ exists and does not depend on t, and for all t and s, $\text{cov}(z_t, z_{t+s})$ exists and depends on s but not t. By purely non-deterministic, it is meant that the correlation of $z_{t+p}$ and $z_t$ vanishes as p increases, so that in the limit the best linear forecast of $z_{t+p}$ conditional on $\{z_{t-s}, s \ge 0\}$ is the unconditional mean of
$z_{t+p}$, which for convenience we take to be 0. It is presumed that the relevant universe of information at time t consists of $Z_t = \{z_{t-s}, s \ge 0\}$. These assumptions restrict the universe of information which might be considered, but they are no more severe than those usually made in standard linear or simultaneous equation models for the purposes of developing an asymptotic theory of inference.

We further suppose that there exists a moving average representation for $z_t$:

$$z_t = \sum_{s=0}^{\infty} A_s \varepsilon_{t-s}, \qquad E(\varepsilon_t) = 0, \qquad \text{var}(\varepsilon_t) = T. \qquad (3.1)$$

In the moving average representation, all roots of the generating function $\sum_{s=0}^{\infty} A_s z^s$ have modulus not less than unity, the coefficients satisfy the square summability condition $\sum_{s=0}^{\infty} \|A_s\|^2 < \infty$,⁴ and the vector $\varepsilon_t$ is serially uncorrelated [Wold (1938)].

⁴For any square complex matrix C, $\|C\|$ denotes the square root of the largest eigenvalue of $C'C$, and $|C|$ denotes the square root of the determinant of $C'C$.

The existence of the moving average representation is important to us for two reasons. First, it is equivalent to the existence of the spectral density matrix $S_z(\lambda)$ of $z_t$ at almost all frequencies $\lambda \in [-\pi, \pi]$ [Doob (1953, pp. 499-500)]. Second, it provides a lower bound on the mean square error of one-step ahead minimum mean square error linear forecasts, which is:

$$|T| = \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln |S_z(\lambda)| \, d\lambda \right\} > 0. \qquad (3.2)$$

The condition $|T| > 0$ is equivalent to our assumption that Z is strictly nondeterministic [Rozanov (1967, p. 72)]. Whether this lower bound can be realized depends on whether the relation in (3.1) can be inverted so that $\varepsilon_t$ becomes a linear function of $Z_{t-1}$ and $z_t$:

$$z_t = \sum_{s=1}^{\infty} B_s z_{t-s} + \varepsilon_t. \qquad (3.3)$$

A sufficient condition for invertibility is that there exist a constant $c > 0$ such that for almost all $\lambda$:

$$c I_m \le S_z(\lambda) \qquad (3.4)$$

[Rozanov (1967, pp. 77-78)], which we henceforth assume.⁵ This assumption is nontrivial, because processes like $z_t = \varepsilon_t + \varepsilon_{t-1}$ are excluded. The requirement (3.4) that the spectral density matrix be bounded uniformly away from zero almost everywhere in $[-\pi, \pi]$
is more restrictive than (3.2). On the other hand, (3.4) is less restrictive than the assumption that Z is a moving average, autoregressive process of finite order with invertible moving average and autoregressive parts, which is sometimes taken as the point of departure in the study of multiple time series.

Suppose now that $z_t$ has been partitioned into $k \times 1$ and $l \times 1$ subvectors $x_t$ and $y_t$, $z_t' = (x_t', y_t')$, reflecting an interest in causal relationships between X and Y. Adopt a corresponding partition of $S_z(\lambda)$:

$$S_z(\lambda) = \begin{bmatrix} S_x(\lambda) & S_{xy}(\lambda) \\ S_{yx}(\lambda) & S_y(\lambda) \end{bmatrix}.$$

From (3.4), X and Y each possess autoregressive representations, which we denote:

$$x_t = \sum_{s=1}^{\infty} E_{1s} x_{t-s} + u_{1t}, \qquad \text{var}(u_{1t}) = \Sigma_1, \qquad (3.5)$$

⁵$A \le B$ indicates that $B - A$ is positive semidefinite; $A < B$ indicates that $B - A$ is positive semidefinite and not null.

[...] but (5.7) and (5.16) are never exact. In all instances in which parameterizations are inexact, however, the contributions of the omitted variables to variance in the dependent variable are very small.

Table 5.1
Design of a sampling experimentᵃ

Model A:
$y_t = 1.0 + 1.5y_{t-1} - 0.5625y_{t-2} + 0.215x_{t-1} + u_{4t}$, $u_{4t} \sim N(0, 1)$
$x_t = 1.0 + 0.8x_{t-1} + u_{2t}$, $u_{2t} \sim N(0, 1)$

Model B:
$y_t = 1.0 + 0.258x_{t-1} + 0.172x_{t-2} - 0.086x_{t-3} + w_t$
$w_t = 1.5w_{t-1} - 0.5625w_{t-2} + u_{4t}$, $u_{4t} \sim N(0, 1)$
$x_t = 1.0 + 0.8x_{t-1} + u_{2t}$, $u_{2t} \sim N(0, 1)$

Approximate slopes:
Test | Model A | Model B
$T^{GL} = T^{SL}$ | 0.0985 | 0.0989
$T^{GR} = T^{SR}$ | 0.1037 | 0.1041
$T^{GW} = T^{SW}$ | 0.1093 | 0.1098
$T^{CSW'}$ | 0.1240 | 0.1125
$T^{CSL'}$ | 0.1103 | 0.0954
$T^{CSL'}$ (prefiltered) | 0.0711 | 0.1016

ᵃSource: Geweke, Meese and Dent (1983, table 4.1).

The outcome of the sampling experiment is presented in Table 5.2, for tests that Y does not cause X, and in Table 5.3, for tests that X does not cause Y. The sampling distribution of the test statistics for Sims tests requiring a serial correlation correction is very unsatisfactory when the null is true. If no prefilter is applied, rejection frequencies range from 1.5 to 6.4 times their asymptotic values. When the prefilter $(1 - 0.75L)^2$ is used, the sampling distribution of these test statistics is
about as good as that of the others. In the present case this prefilter removes all serial correlation in the disturbances of the regression equation (5.13). With actual data one cannot be sure of doing this, and the sensitivity of these test results to prefiltering, combined with the greater computational burden which they impose, argues against their use. By contrast, the sampling distributions of the Granger and Sims tests which use lagged dependent variables appear much closer to the limiting distribution. Overall, Wald variants reject somewhat too often and Lagrange variants somewhat too infrequently, but departures from limiting frequencies are not often significant (given that only 100 replications of data were generated).

The rejection frequencies when the null is false, presented in Table 5.3, accord very well with what one would expect given the approximate slopes in Table 5.1 and the rejection frequencies anticipated in the design of the experiment. Rejection frequencies for $T^{GW}$ and $T^{SW}$ are about the same; those for $T^{GL}$ and $T^{SL}$ are each lower but similar to each other. Rejection frequencies are greatest for the Sims tests requiring a serial correlation correction, but their distribution under the null is less reliable; in view of the results in Table 5.2, rejection frequencies for those tests which use unfiltered data have not been presented in Table 5.3.

Table 5.2
Outcome of sampling experimentᵃ: number of rejections in 100 replications when null is trueᵇ
Columns: Testᶜ; Parameterization (p, q, r, each 4 or 12); Model A (5% level, 10% level); Model B (5% level, 10% level).
Tests: $T^{GW}$, $T^{GL}$, $T^{SW}$, $T^{SL}$, $T^{CSW'}$(F), $T^{CSW'}$(U), $T^{CSL'}$(F), $T^{CSL'}$(U).
[Numerical entries are not legible in this copy.]

ᵃSource: Geweke, Meese and Dent (1983, table 5.2).
ᵇThe appropriate F, rather than chi-square, distribution was used.
ᶜFor tests requiring correction for serial correlation, the Hannan efficient method of estimation was used. The spectral density of the disturbances was estimated using an inverted “V” spectral window with a base of 19 ordinates, applied to the 100 residual periodogram ordinates. For those tests with the suffix (F), data were initially prefiltered by $(1 - 0.75L)^2$, which flattens the spectral density of the disturbance $w_t$. For those with the suffix (U), no prefilter was applied.

Table 5.3
Outcome of sampling experimentᵃ: number of rejections in 100 replicationsᵇ when null is false
Columns: Testᶜ; Parameterization (p, q, r, each 4 or 12); Model A (5% level, 10% level); Model B (5% level, 10% level).
Tests: $T^{GW}$, $T^{GL}$, $T^{SW}$, $T^{SL}$, $T^{CSW'}$(F), $T^{CSL'}$(F).
[Numerical entries are not legible in this copy.]

ᵃSource: Geweke, Meese and Dent (1983, table 5.4).
ᵇThe appropriate F, rather than chi-square, distribution was used.
ᶜFor tests requiring correction for serial correlation, the Hannan efficient method of estimation was used. The spectral density of the disturbances was estimated using an inverted “V” spectral window with a base of 19 ordinates, applied to the 100 residual periodogram ordinates. For those tests with the suffix (F), data were initially prefiltered by $(1 - 0.75L)^2$, which flattens the spectral density of the disturbance $w_t$.

These results are corroborated in the rest of the Geweke, Meese and Dent study, as well as in Guilkey and Salemi, and Nelson and Schwert. Guilkey and Salemi have, in addition, studied the case in which the chosen parameterization omits important contributions of the explanatory variables; as one would expect, this biases tests of the
null toward rejection and diminishes power under the alternative. The consensus is that inference can be undertaken with greatest reliability and computational ease employing either (5.1) as a restriction on (5.2) or (5.15) as a restriction on (5.16) and using either the Lagrange or Wald variant

[...]

must exist an upper bound on the rate of expansion of p and q with n such that if p and q increase without limit as n grows but satisfy the upper bound constraint on rate, estimates of $\text{var}(x_t \mid x_{t-1}, \ldots, x_{t-p})$ and $\text{var}(x_t \mid x_{t-1}, \ldots, x_{t-p}; y_{t-1}, \ldots, y_{t-q})$ will be strongly consistent for $\Sigma_1$ and $\Sigma_2$, respectively.¹⁴ The problem of prescribing which values of p and q should be used in a particular situation is, of course, more difficult.

In our discussion of test performance under the null, it was necessary to make the much stronger assumption that the dimension of the parameter space can be expanded with n in such a way that the distributions of the test statistics approach those which would obtain were the true parameterization finite and known. In the context of our example of the Wald variant of the Granger test of the hypothesis that Y does not cause X, this is equivalent to saying that p and q grow with n slowly enough that strongly consistent estimates of $\Sigma_1$ and $\Sigma_2$ are achieved, but rapidly enough that their squared bias becomes negligible relative to their variance. It is not intuitively clear that such rates of expansion must exist, and there has been little work on this problem. Sampling study results suggest that as a practical matter this problem need not be overwhelming; witness the behavior of the Granger tests and the Sims tests incorporating lagged dependent variables, under the null hypothesis. On the other hand, the poor performance of the Sims tests involving a correction for serial correlation, under the null hypothesis, may be due in large measure to this difficulty.

A variety of operational solutions of the problem, “how to choose lag length”, has appeared
in the literature. Most of these solutions [Akaike (1974), Amemiya (1980), Mallows (1973), Parzen (1977)] emerge from the objective of choosing lag length to minimize the mean square error of prediction. For example, Parzen suggests that in a multivariate autoregression for Z, that lag length which minimizes the value of the trace criterion (6.1) be chosen, where m is the number of variables in Z, p is lag length, and $\hat\Sigma_j = \sum_{t=1}^{n} \hat\varepsilon_t \hat\varepsilon_t' / (n - jm)$, where $\hat\varepsilon_t$ is the vector of ordinary least squares residuals in the linear regression of $z_t$ on $z_{t-1}, \ldots, z_{t-j}$. Choice of p in the other solutions is based on the minimization of different functions, but the value of p chosen will usually be the same [Geweke and Meese (1981)]. On the other hand, Schwarz (1978) has shown that for a wide variety of priors which place positive prior probability on all finite lag lengths (but none on infinite lag length), the posterior estimate of lag length in large samples will be that which minimizes the value of

$$\ln|\hat\Sigma_p| + m^2 p \ln(n)/n, \qquad (6.2)$$

when Z is Gaussian. Lag lengths chosen using (6.2) will generally be shorter than those chosen using (6.1).

¹⁴Upper bounds of this type have been derived by Robinson (1978) for a problem which is formally similar, that of estimating the coefficients in the population linear projection of Y on X consistently by ordinary least squares. The bounds require that lag lengths grow no faster than a power of n governed by a constant v, between 0 and −1/2, which depends on higher moments of X, and that the rates of increase in the number of past and number of future X not be too disparate.

These solutions to the problem of choosing lag length are in many respects convenient, but it must be emphasized that they were not designed as the first step in a regression strategy for estimating coefficients in an autoregression whose length may be infinite. Neither analytic work nor sampling studies have addressed the properties of inference conditional on lag length chosen by either method,
although the first one has been used in empirical work [Hsiao (1979a, 1979b)].

6.2 Non-autoregressive processes

A time series may be wide sense stationary and purely non-deterministic, and yet fail to have an autoregressive representation because one or more of the roots of the Laurent expansion of its moving average representation lie on the unit circle. A non-autoregressive process does not possess an optimal linear one-step-ahead predictor with coefficients which converge in mean square. The representations derived earlier in this chapter fail to exist for such a series. If one attempts inference about Wiener-Granger causality using non-autoregressive processes, misleading results are likely to emerge. A simple example provides some indication of what can go wrong.

Consider first two time series with joint moving average representation

$$x_t = \varepsilon_t + \rho\,\varepsilon_{t-1}, \qquad y_t = \varepsilon_{t-1} + \eta_t, \tag{6.3}$$

where $(\varepsilon_t, \eta_t)'$ is the bivariate innovation for the $(x_t, y_t)'$ process, $\lvert\rho\rvert < 1$, $E(\varepsilon_t) = E(\eta_t) = \operatorname{cov}(\varepsilon_t, \eta_t) = 0$ and $\operatorname{var}(\varepsilon_t) = \operatorname{var}(\eta_t) = 1$. In the population,

$$\operatorname{var}(x_t \mid x_{t-1}, \dots, x_{t-k}) = 1 + \rho^2 - m_k' M_k^{-1} m_k,$$

where the $k \times 1$ vector $m_k$ and the $k \times k$ tridiagonal matrix $M_k$ are

$$m_k = \begin{bmatrix} \rho \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad M_k = \begin{bmatrix} 1+\rho^2 & \rho & & 0 \\ \rho & 1+\rho^2 & \ddots & \\ & \ddots & \ddots & \rho \\ 0 & & \rho & 1+\rho^2 \end{bmatrix}.$$

Nerlove, Grether and Carvalho (1979, p. 419) give the exact inverse of a tridiagonal matrix. By their result, the entry in the first row and column of $M_k^{-1}$ is $(1 - \rho^{2k})/(1 - \rho^{2(k+1)})$, from which $\operatorname{var}(x_t \mid x_{t-1}, \dots, x_{t-k}) = (1 - \rho^{2(k+2)})/(1 - \rho^{2(k+1)})$.¹⁵

Suppose that $k$ lagged values of $y_t$ are also used to predict $x_t$. Since $\operatorname{cov}(x_t, y_{t-j}) = 0$ for $j \ge 1$, in the population

$$\operatorname{var}(x_t \mid x_{t-1}, \dots, x_{t-k};\ y_{t-1}, \dots, y_{t-k}) = 1 + \rho^2 - \begin{bmatrix} m_k' & 0' \end{bmatrix} \begin{bmatrix} M_k & B_k \\ B_k' & 2I_k \end{bmatrix}^{-1} \begin{bmatrix} m_k \\ 0 \end{bmatrix},$$

where

$$B_k = \begin{bmatrix} \rho & & & 0 \\ 1 & \rho & & \\ & \ddots & \ddots & \\ 0 & & 1 & \rho \end{bmatrix}.$$

The element in the first row and column of $\begin{bmatrix} M_k & B_k \\ B_k' & 2I_k \end{bmatrix}^{-1}$ is the same as that in the first row and column of the inverse of¹⁶

$$M_k - 0.5\,B_k B_k' = \begin{bmatrix} 1+0.5\rho^2 & 0.5\rho & & 0 \\ 0.5\rho & 0.5(1+\rho^2) & \ddots & \\ & \ddots & \ddots & 0.5\rho \\ 0 & & 0.5\rho & 0.5(1+\rho^2) \end{bmatrix}.$$

Partitioning out the first row and column of this matrix, whose lower right $(k-1)\times(k-1)$ block is $0.5M_{k-1}$, the desired element is $(1 + 0.5\rho^2 - 0.5\,m_{k-1}'M_{k-1}^{-1}m_{k-1})^{-1}$. Hence:

$$\begin{aligned} \operatorname{var}(x_t \mid x_{t-1}, \dots, x_{t-k};\ y_{t-1}, \dots, y_{t-k}) &= 1 + \rho^2 - \rho^2\big(1 + 0.5\rho^2 - 0.5\,m_{k-1}'M_{k-1}^{-1}m_{k-1}\big)^{-1} \\ &= 1 + \rho^2 - \frac{\rho^2(1 - \rho^{2k})}{1 - 0.5\rho^{2k} - 0.5\rho^{2(k+1)}} \\ &= \frac{1 - 0.5\rho^{2k} - 0.5\rho^{2(k+2)}}{1 - 0.5\rho^{2k} - 0.5\rho^{2(k+1)}}. \end{aligned} \tag{6.4}$$

¹⁵ For an alternative derivation see Whittle (1983, p. 75).

¹⁶ We use the standard result on the inverse of a partitioned matrix [e.g. Theil (1971, pp. 17-19)].

Clearly $k$ must grow with $n$, or else the hypothesis that $Y$ does not cause $X$, which is true, will be rejected in large samples. Suppose $k$ grows at a rate such that the asymptotic results of Wald (1943) on the distribution of test statistics under the alternative are valid. Then in large samples the Wald variant of the Granger test will have a limiting non-central chi-square distribution with non-centrality parameter

$$\left[\frac{(1-\rho^{2(k+2)})/(1-\rho^{2(k+1)})}{(1-0.5\rho^{2k}-0.5\rho^{2(k+2)})/(1-0.5\rho^{2k}-0.5\rho^{2(k+1)})} - 1\right] n = \frac{0.5\,\rho^{2(k+1)}(1-\rho^2)(1-\rho^{2k})}{(1-\rho^{2(k+1)})(1-0.5\rho^{2k}-0.5\rho^{2(k+2)})}\, n.$$

So long as $k/\ln(n)$ increases without bound, the non-centrality parameter will vanish asymptotically. This is a necessary and sufficient condition for the test statistic to have a valid asymptotic distribution under the null, given our assumptions. In particular, it will occur if $k = [n^\alpha]$, $0 < \alpha < 0.5$.

Suppose now that $\rho = 1$ in (6.3). The process $(x_t, y_t)'$ is then stationary but has no autoregressive representation. Nerlove, Grether and Carvalho's result does not apply when $\rho = 1$, so let $c_k$ denote the element in the first row and column of $M_k^{-1}$. Partitioning out the first row and column of $M_k$:

$$c_k = (2 - m_{k-1}'M_{k-1}^{-1}m_{k-1})^{-1} = (2 - c_{k-1})^{-1}.$$

Since $c_1 = 1/2$, mathematical induction gives $c_k = k/(k+1)$. Hence:

$$\operatorname{var}(x_t \mid x_{t-1}, \dots, x_{t-k}) = 2 - m_k'M_k^{-1}m_k = 2 - c_k = (k+2)/(k+1).$$

Substituting $\rho = 1$ and $m_{k-1}'M_{k-1}^{-1}m_{k-1} = c_{k-1}$ in the first line of (6.4):

$$\operatorname{var}(x_t \mid x_{t-1}, \dots, x_{t-k};\ y_{t-1}, \dots, y_{t-k}) = (2k+2)/(2k+1).$$

The non-centrality parameter of the limiting distribution of the Wald variant of the Granger test in this case is

$$\left[\frac{(k+2)/(k+1)}{(2k+2)/(2k+1)} - 1\right] n = \frac{kn}{2(k+1)^2}.$$

The non-centrality parameter will not vanish unless $\lim_{n\to\infty}(n/k) = 0$, a condition which cannot be met; if $k = o(n)$ it increases without bound. It therefore appears plausible that when $\lvert\rho\rvert < 1$, Granger tests may be asymptotically unbiased if the number of parameters estimated increases suitably with
sample size; in particular, $k = [n^\alpha]$, $0 < \alpha < 0.5$, may be a suitable rule. It appears equally implausible that such rules exist when $\lvert\rho\rvert = 1$; in particular, Granger tests are then likely to be biased against the null. This result is not peculiar to the Granger test. The zero of the spectral density, $S_x(\pi) = 0$, causes equally troublesome problems in Sims tests of the hypothesis of absence of Wiener-Granger causality from $Y$ to $X$.

6.3 Deterministic processes

From Wold's decomposition theorem [Wold (1938)] we know that a stationary time series may contain deterministic components. The definition of Wiener-Granger causality can be extended to encompass processes which are not purely non-deterministic. Even so, inference about directions of causality for such series is apt to be treacherous unless the deterministic process is known up to a finite number of unknown parameters. The difficulty is that, to the extent the two series in question are mutually influenced by the same deterministic components, the influence may be perceived as causal.

A simple example will illustrate this point. Let $x_t = x_{1t} + x_{2t}$, $y_t = y_{1t} + y_{2t}$ and $x_{1t} = y_{1t} = \cos(\pi t/2)$. Suppose $x_{2t}$ and $y_{2t}$ are independent white noises with zero means and unit variances. The bivariate process $(x_t, y_t)'$ may be interpreted as quarterly white noise contaminated by deterministic seasonal influence. Following Hosoya (1977), $X$ does not cause $Y$ and $Y$ does not cause $X$, because $x_{1t} = y_{1t}$ is in the Hilbert space generated by either $X_{t-1}$ or $Y_{t-1}$. It is clear from the symmetry of the situation, however, that given only finite subsets of $X_{t-1}$ and $Y_{t-1}$, the subsets taken together will separate the deterministic and non-deterministic components of the series better than either alone. Together they therefore predict $x_t$ and $y_t$ better. For instance, the linear projection of $x_t$ on $(x_{t-1}, \dots, x_{t-4k})$ is $\sum_{j=1}^{k}(x_{t-4j} - x_{t+2-4j})/(2(k+1))$, and

$$\operatorname{var}(x_t \mid x_{t-1}, \dots, x_{t-4k}) = (2k+3)/(2k+2).$$

The linear projection of $x_t$ on $4k$ lagged values of $x_t$ and $y_t$ is $\sum_{j=1}^{k}(x_{t-4j} + y_{t-4j} - x_{t+2-4j} - y_{t+2-4j})/(4k+2)$, and

$$\operatorname{var}(x_t \mid x_{t-1}, \dots, x_{t-4k};\ y_{t-1}, \dots, y_{t-4k}) = (4k+3)/(4k+2).$$

Follow the argument used in the example for non-autoregressive processes, and suppose the limiting distribution of the Wald variant of the Granger test is (possibly non-central) chi-square. Its non-centrality parameter is then $2kn/[(2k+2)(4k+3)]$. This parameter could vanish only if the number of parameters estimated increased faster than the number of observations, which is impossible. This result is consistent with the intuition that tests for the absence of Wiener-Granger causality are apt to be biased against the null when deterministic components are involved. It is equally clear from the example what happens if the determinism can be reduced to dependence on a few unknown parameters (here, perhaps the coefficients on four seasonal dummies). In that case the results of the earlier sections apply to the processes conditional on deterministic influences.

6.4 Non-stationary processes

Although in principle non-stationarity can take on many forms, most non-stationarity which we suspect¹⁷ in economic time series is
of certain specific types. If the non-stationarity arises from deterministic influences on mean or variance and their functional forms are known, the methods discussed earlier can be modified directly to accommodate these influences. For example, trend terms may be incorporated in estimated equations to allow for conditional means which change with time. Means and variances which increase with time may in some cases be eliminated by the logarithmic transformation together with a linear trend.

Non-stationarity need not be deterministic. In fact, many economic time series appear to be well described as processes whose autoregressive representations have Laurent expansions with one or more roots on the unit circle.¹⁸

¹⁷ One can never demonstrate non-stationarity with a finite sample: e.g. apparent trends can be ascribed to powerful, low frequency components in a purely non-deterministic, stationary series. As a practical matter, however, inference with only asymptotic justification is apt to be misleading in such cases.

¹⁸ Witness the popularity of the autoregressive integrated moving average models proposed by Box and Jenkins (1970). Their autoregressive representations often incorporate factors of the form $(1 - L)$, or $(1 - L^s)$ when there are $s$ observations per year.

The asymptotic distribution theory for the estimates of autoregressions of stationary series does not apply directly in such cases. Consider the first-order autoregression $x_t = \rho x_{t-1} + \varepsilon_t$ with $\lvert\rho\rvert > 1$. The least squares estimator of $\rho$ has a limiting distribution that is non-normal if $\varepsilon_t$ is non-normal [Rao (1961)], and a variance that is $o(1/n)$. If $\lvert\rho\rvert = 1$, the limiting distribution of the least squares estimator is not symmetric about $\rho$ even when the distribution of $\varepsilon_t$ is normal [Anderson (1959)], and the variance of the limiting distribution is again $o(1/n)$. The limiting distributions in these cases reflect the fact that information about the unstable or explosive roots of
these processes accumulates at a more rapid rate than information about the roots of stationary processes. The reason is that the variance of the regressors grows in the former case but remains constant in the latter. This intuition is reinforced by a useful result of Fuller (1976), who considers an autoregression of finite order,

$$y_t = \sum_{s=1}^{p} c_s y_{t-s} + \varepsilon_t,$$

with one root of $1 - \sum_{s=1}^{p} c_s L^s$ equal to unity and all others outside the unit circle. Consider the transformed equation

$$y_t = \theta_1 y_{t-1} + \sum_{i=2}^{p} \theta_i (y_{t+1-i} - y_{t-i}) + \varepsilon_t.$$

Here the ordinary least squares estimates of $\theta_2, \dots, \theta_p$ have the same limiting distribution as they would if all roots of the expansion were outside the unit circle. The ordinary least squares estimate of $\theta_1$ has the same limiting distribution as in the first-order autoregressive equation with unit root. Hence, in the original equation, inference about $c_2, \dots, c_p$ (but not $c_1$) may proceed in the usual way.

Whether or not Fuller's result can be extended to multiple roots and vector processes is an important problem for future research. If this extension is possible, the methods discussed earlier involving the estimation of vector autoregressions can be applied directly to processes which are non-stationary because of roots on the unit circle in the expansions of their autoregressive representations. For example, suppose the process $Z$ has autoregressive representation $[(1 - L) \otimes I_m]A(L)z_t = \varepsilon_t$, with all roots of $\lvert A(L)\rvert$ outside the unit circle. Then tests of unidirectional causality between subsets of $Z$ can proceed as in the stationary case. An attraction of this procedure is its conservatism: it allows non-stationarity rather than requiring it, as is the case if one works with first differences.

6.5 Multivariate methods

Our discussion has been cast in terms of Wiener-Granger causal orderings between vector time series, but most empirical work considers causal relations between univariate time series. In many instances relations between vector series would be more realistic and interesting than those
between univariate series. The chief impediment to the study of such relations is undoubtedly the number of parameters required to describe them. In the vector autoregressive parameterization this number increases as the square of the number of series, whereas available degrees of freedom increase linearly. The orders of magnitude are similar for other parameterizations. Practical methods for the study of vector time series are only beginning to be developed, and the outstanding questions are numerous. All that will be done here is to mention those which seem most important for inference about causal orderings.

The first question is whether or not inference about causal orderings is adversely affected by the size of the system, in the sense that for vector time series the power of tests is substantially less than for univariate series. Any answer to this question is a generalization about relationships among observed economic time series. Confidence intervals for measures of feedback constructed by Geweke (1982a) bear on it. They suggest that, for samples of 60-80 quarterly observations and lags of length six, feedback must be roughly twice as great for a pair of bivariate series as for a pair of univariate series if unidirectional causality is to be rejected. The large number of parameters required for pairs of vector time series thus appears potentially important in the assessment of feedback and in tests of unidirectional causal orderings.

Two kinds of approaches to the large parameter problem are being taken. The first is to reduce the number of parameters, appealing to “principles of parsimony” and using exact restrictions. For example, the autoregressive representation may be parameterized as a mixed moving average autoregression of finite order. The form of the moving average and autoregressive components is then specified by inspection of various diagnostic statistics [e.g. Wallis (1977)]. This is the extension to vector time series of the
methods of Box and Jenkins (1970). Two serious complications emerge when going from univariate to multivariate series. First, estimation, in particular by exact maximum likelihood methods, is difficult [Osborn (1977)]. Second, the restrictions specified subjectively are quite large in number, and the diagnostic statistics suggest them far less clearly than in the univariate case. At least one alternative set of exact restrictions has been suggested for vector autoregressions: Sargent and Sims (1977) experiment with the restriction that, in the finite order autoregressive model $A(L)z_t = \varepsilon_t$, the matrix $A(L)$ be of less than full rank. Computations are again burdensome. The forecasting accuracy of neither method has been assessed carefully.

A second approach is to use prior information in a probabilistic way. This information might reflect economic theory, or might be based on purely statistical considerations. In the former case, Bayes estimates of parameters and posterior odds ratios for the hypothesis of unidirectional causality will result. In the latter case, final estimates are more naturally interpreted as Stein estimates. In both cases, computation is simple if priors on parameters in a finite order autoregression are linear and one is content to use Theil-Goldberger mixed estimates. Litterman (1980) has constructed mixed estimates of a seven-variable autoregression, using six lags and 87 observations: 301 parameters are estimated, beginning with 609 degrees of freedom. The mean of the mixed prior is 1.0 for the seven coefficients on own lags and zero for all others, and variances decline as lag increases. His out-of-sample forecasts are better than those of the system estimated by classical methods. They are also better than those issued by the proprietors of two large econometric models. This second approach is attractive relative to the first for two reasons. It is computationally simple, and Bayesian or Stein estimators are methodologically better suited to the
investigator’s predicament in the estimation of large vector autoregressions than are exact restrictions.

References

Akaike, H. (1974) “A New Look at the Statistical Model Identification”, IEEE Transactions on Automatic Control, AC-19, 716-723.
Amemiya, T. (1973) “Generalized Least Squares with an Estimated Autocovariance Matrix”, Econometrica, 41, 723-732.
Amemiya, T. (1980) “Selection of Regressors”, International Economic Review, forthcoming.
Anderson, T. W. (1958) An Introduction to Multivariate Statistical Analysis. New York: Wiley.
Anderson, T. W. (1959) “On Asymptotic Distributions of Estimates of Parameters of Stochastic Difference Equations”, Annals of Mathematical Statistics, 30, 676-687.
Bahadur, R. R. (1960) “Stochastic Comparisons of Tests”, Annals of Mathematical Statistics, 38, 303-324.
Barth, J. R. and J. T. Bennett (1974) “The Role of Money in the Canadian Economy: An Empirical Test”, Canadian Journal of Economics, 7, 306-311.
Basmann, R. L. (1965) “A Note on the Statistical Testability of ‘Explicit Causal Chains’ Against the Class of ‘Interdependent Models’”, Journal of the American Statistical Association, 60, 1080-1093.
Berndt, E. R. and N. E. Savin (1977) “Conflict Among Criteria for Testing Hypotheses in the Multivariate Linear Regression Model”, Econometrica, 45, 1263-1278.
Blalock, H. M., Jr. (1961) Causal Inferences in Non-experimental Research. Chapel Hill: University of North Carolina Press.
Bunge, M. (1959) Causality: The Place of the Causal Principle in Modern Science. Cambridge: Harvard University Press.
Christ, C. F. (1966) Econometric Models and Methods. New York: Wiley.
Doob, J. (1953) Stochastic Processes. New York: Wiley.
Engle, R., D. Hendry, and J.-F. Richard (1983) “Exogeneity”, Econometrica, 51, 277-304.
Feigl, H. (1953) “Notes on Causality”, in: Feigl and Brodbeck, eds., Readings in the Philosophy of Science.
Fuller, W. A. (1976) Introduction to Statistical Time Series. New York: Wiley.
Gel’fand, I. M. and A. M. Yaglom (1959) “Calculation of the Amount of Information about a
Random Function Contained in Another Such Function”, American Mathematical Society Translations, Series 2, 12, 199-264.
Geweke, J. (1978) “Testing the Exogeneity Specification in the Complete Dynamic Simultaneous Equation Model”, Journal of Econometrics, 7, 163-185.
Geweke, J. (1981) “The Approximate Slopes of Econometric Tests”, Econometrica, 49, 1427-1442.
Geweke, J. (1982a) “Measurement of Linear Dependence and Feedback Between Time Series”, Journal of the American Statistical Association, 77, 304-324.
Geweke, J. (1982b) “The Neutrality of Money in the United States, 1867-1981: An Interpretation of the Evidence”, Carnegie-Mellon University Working Paper.
Geweke, J. and R. Meese (1981) “Estimating Regression Models of Finite But Unknown Order”, International Economic Review, forthcoming.
Geweke, J., R. Meese and W. Dent (1983) “Comparing Alternative Tests of Causality in Temporal Systems: Analytic Results and Experimental Evidence”, Journal of Econometrics, 21, 161-194.
Granger, C. W. J. (1963) “Economic Processes Involving Feedback”, Information and Control, 6, 28-48; also in C. W. J. Granger and M. Hatanaka, Spectral Analysis of Economic Time Series. Princeton: Princeton University Press.
Granger, C. W. J. (1969) “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods”, Econometrica, 37, 424-438.
Grenander, U. and G. Szegő (1958) Toeplitz Forms and Their Applications. Berkeley: University of California Press.
Guilkey, D. K. and M. K. Salemi (1982) “Small Sample Properties of Three Tests for Granger-Causal Ordering in a Bivariate Stochastic System”, Review of Economics and Statistics, 64, 668-680.
Hannan, E. J. (1963) “Regression for Time Series”, in: M. Rosenblatt, ed., Proceedings of a Symposium on Time Series Analysis. New York: Wiley.
Hannan, E. J. (1970) Multiple Time Series. New York: Wiley.
Haugh, L. D. (1976) “Checking the Independence of Two Covariance-Stationary Time Series: A Univariate Residual Cross Correlation Approach”, Journal of the American Statistical Association, 71, 378-385.
Hosoya, Y. (1977) “On the Granger Condition for Non-causality”, Econometrica, 45, 1735-1736.
Hsiao, C. (1979a) “Autoregressive Modeling of Canadian Money and Income Data”, Journal of the American Statistical Association, 74, 553-560.
Hsiao, C. (1979b) “Causality Tests in Econometrics”, Journal of Economic Dynamics and Control, 1, 321-346.
Hurwicz, L. (1962) “On the Structural Form of Interdependent Systems”, in: E. Nagel et al., Logic, Methodology and the Philosophy of Science. Palo Alto: Stanford University Press.
Koopmans, T. C. (1950) “When is an Equation System Complete for Statistical Purposes?”, in: T. C. Koopmans, ed., Statistical Inference in Dynamic Economic Models. New York: Wiley.
Koopmans, T. C. and W. C. Hood (1953) “The Estimation of Simultaneous Economic Relationships”, in: W. C. Hood and T. C. Koopmans, eds., Studies in Econometric Method. New Haven: Yale University Press.
Litterman, R. (1980) “A Bayesian Procedure for Forecasting with Vector Autoregressions”, MIT manuscript.
Mallows, C. L. (1973) “Some Comments on C_p”, Technometrics, 15, 661-675.
Neftci, S. N. (1978) “A Time-Series Analysis of the Real Wages-Employment Relationship”, Journal of Political Economy, 86, 281-292.
Nelson, C. R. and G. W. Schwert (1982) “Tests for Predictive Relationships between Time Series Variables: A Monte Carlo Investigation”, Journal of the American Statistical Association, 77, 11-18.
Nerlove, M., D. M. Grether, and J. K. Carvalho (1979) Analysis of Economic Time Series: A Synthesis. New York: Academic Press.
Oberhofer, W. and J. Kmenta (1974) “A General Procedure for Obtaining Maximum Likelihood Estimates in Generalized Regression Models”, Econometrica, 42, 579-590.
Osborn, D. R. (1977) “Exact and Approximate Maximum Likelihood Estimators for Vector Moving Average Processes”, Journal of the Royal Statistical Society, Series B, 39, 114-118.
Parzen, E. (1977) “Multiple Time Series: Determining the Order of Approximating Autoregressive Schemes”, in: P. Krishnaiah, ed., Multivariate Analysis IV,
Amsterdam: North-Holland.
Pierce, D. A. (1977) “Relationships--and the Lack Thereof--Between Economic Time Series, with Special Reference to Money and Interest Rates”, Journal of the American Statistical Association, 72, 11-22.
Pierce, D. A. (1979) “R² Measures for Time Series”, Journal of the American Statistical Association, 74, 901-910.
Pierce, D. A. and L. D. Haugh (1977) “Causality in Temporal Systems: Characterizations and a Survey”, Journal of Econometrics, 5, 265-293.
Rao, M. M. (1961) “Consistency and Limit Distributions of Estimators of Parameters in Explosive Stochastic Difference Equations”, Annals of Mathematical Statistics, 32, 195-218.
Robinson, P. (1978) “Distributed Lag Approximations to Linear Time-Invariant Systems”, Annals of Statistics, 6, 507-515.
Rozanov, Y. A. (1967) Stationary Random Processes. San Francisco: Holden-Day.
Sargent, T. J. and C. A. Sims (1977) “Business Cycle Modeling Without Pretending to Have Too Much a priori Economic Theory”, in: C. A. Sims, ed., New Methods in Business Cycle Research: Proceedings from a Conference. Minneapolis: Federal Reserve Bank of Minneapolis.
Schwarz, G. (1978) “Estimating the Dimension of a Model”, Annals of Statistics, 6, 461-464.
Simon, H. A. (1952) “On the Definition of the Causal Relation”, Journal of Philosophy, 49, 517-527; reprinted in H. A. Simon, ed. (1957), Models of Man. New York: Wiley.
Sims, C. A. (1972) “Money, Income and Causality”, American Economic Review, 62, 540-552.
Sims, C. A. (1974) “Output and Labor Input in Manufacturing”, Brookings Papers on Economic Activity, 1974, 695-736.
Sims, C. A. (1977a) “Exogeneity and Causal Ordering in Macroeconomic Models”, in: C. A. Sims, ed., New Methods in Business Cycle Research: Proceedings from a Conference. Minneapolis: Federal Reserve Bank of Minneapolis.
Sims, C. A. (1977b) “Comment”, Journal of the American Statistical Association, 72, 23-24.
Strotz, R. H. and H. A. Wold (1960) “Recursive vs. Non-recursive Systems: An Attempt at Synthesis”, Econometrica, 28, 417-427.
Theil, H. (1971) Principles of Econometrics. New
York: John Wiley.
Wald, A. (1943) “Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large”, Transactions of the American Mathematical Society, 54, 426-485.
Wallis, K. F. (1977) “Multiple Time Series Analysis and the Final Form of Econometric Models”, Econometrica, 45, 1481-1498.
Whittle, P. (1983) Prediction and Regulation by Linear Least-Square Methods.
Wiener, N. (1956) “The Theory of Prediction”, in: E. F. Beckenbach, ed., Modern Mathematics for Engineers.
Wold, H. (1938) A Study in the Analysis of Stationary Time Series. Uppsala: Almqvist and Wiksell.
Zellner, A. (1979) “Causality and Econometrics”.
Zemanian, A. H. (1972) Realizability Theory for Continuous Linear Systems. New York: Academic Press.
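Worked example for Section 6.1: the Wald variant of the Granger test of the hypothesis that Y does not cause X. The sketch below is a minimal implementation under an assumption: the statistic is taken to be n(Σ̂1 - Σ̂2)/Σ̂2, referred to chi-square on q degrees of freedom. That form matches the non-centrality parameters [Σ1/Σ2 - 1]n derived in Section 6.2, but the chapter's own definition in Section 5 is not reproduced here. Python with NumPy, the intercept in both regressions, and the helper names lag_matrix, ols_residual_variance and granger_wald are illustrative choices.

```python
import numpy as np


def lag_matrix(series: np.ndarray, lags: int, start: int) -> np.ndarray:
    """Columns series[t-1], ..., series[t-lags] for t = start, ..., len(series) - 1."""
    n = len(series)
    return np.column_stack([series[start - j : n - j] for j in range(1, lags + 1)])


def ols_residual_variance(y: np.ndarray, X: np.ndarray) -> float:
    """Mean squared OLS residual, divided by the number of observations."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid / len(y))


def granger_wald(x: np.ndarray, y: np.ndarray, p: int, q: int) -> tuple[float, int]:
    """Wald-type statistic for H0: Y does not cause X, with its degrees of freedom.

    s1 estimates var(x_t | x_{t-1..t-p}); s2 estimates var(x_t | x_{t-1..t-p}; y_{t-1..t-q}).
    """
    start = max(p, q)                       # common estimation sample for both regressions
    target = x[start:]
    n = len(target)
    const = np.ones((n, 1))
    restricted = np.hstack([const, lag_matrix(x, p, start)])
    unrestricted = np.hstack([restricted, lag_matrix(y, q, start)])
    s1 = ols_residual_variance(target, restricted)
    s2 = ols_residual_variance(target, unrestricted)
    return n * (s1 - s2) / s2, q            # compare with chi-square(q) critical values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal(2000)
    e = rng.standard_normal(2000)
    x = np.concatenate([[e[0]], 0.5 * y[:-1] + e[1:]])   # here Y does cause X
    stat, df = granger_wald(x, y, p=4, q=4)
    print(f"Wald statistic {stat:.1f} on {df} degrees of freedom")
```

Because the unrestricted regression nests the restricted one on the same sample, the statistic is never negative. Its large-sample behavior depends on how p and q grow with n, which is the point of the discussion in Section 6.1.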
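Worked example for the lag-length criteria (6.1) and (6.2) of Section 6.1. This is a sketch under stated assumptions. Parzen's criterion is taken as tr[(m/n) Σ_{j≤p} Σ̂_j^{-1} - Σ̂_p^{-1}] with Σ̂_j = Σ ε̂_t ε̂_t'/(n - jm). The Schwarz criterion is ln|Σ̂_p| + m²p ln(n)/n, here evaluated with the same Σ̂_p. Every p is fitted on a common sample of n = T - pmax rows, and the data are assumed demeaned, so no intercept is fitted. The function names are hypothetical.

```python
import numpy as np


def residual_covariances(Z: np.ndarray, pmax: int) -> tuple[list[np.ndarray], int]:
    """Sigma_hat_p = E'E / (n - p*m) for VAR(p), p = 1..pmax, on a common sample of n rows."""
    T, m = Z.shape
    n = T - pmax                                  # same observations for every p
    target = Z[pmax:]
    sigmas = []
    for p in range(1, pmax + 1):
        X = np.hstack([Z[pmax - j : T - j] for j in range(1, p + 1)])  # lags 1..p
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        sigmas.append(resid.T @ resid / (n - p * m))
    return sigmas, n


def parzen_cat(sigmas: list[np.ndarray], n: int) -> np.ndarray:
    """Criterion (6.1) as assumed above, one value per p = 1..pmax."""
    m = sigmas[0].shape[0]
    running = np.zeros((m, m))                    # running sum of inverse covariance estimates
    values = []
    for s in sigmas:
        s_inv = np.linalg.inv(s)
        running += s_inv
        values.append(np.trace((m / n) * running - s_inv))
    return np.array(values)


def schwarz(sigmas: list[np.ndarray], n: int) -> np.ndarray:
    """Criterion (6.2): ln|Sigma_hat_p| + m^2 p ln(n) / n, one value per p = 1..pmax."""
    m = sigmas[0].shape[0]
    return np.array([np.linalg.slogdet(s)[1] + m**2 * p * np.log(n) / n
                     for p, s in enumerate(sigmas, start=1)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
    A2 = np.array([[-0.3, 0.0], [0.2, -0.25]])
    Z = np.zeros((2000, 2))
    for t in range(2, 2000):                      # simulate a bivariate VAR(2)
        Z[t] = A1 @ Z[t - 1] + A2 @ Z[t - 2] + rng.standard_normal(2)
    sigmas, n = residual_covariances(Z, pmax=6)
    print("CAT choice:", 1 + int(np.argmin(parzen_cat(sigmas, n))))
    print("Schwarz choice:", 1 + int(np.argmin(schwarz(sigmas, n))))
```

Holding the sample fixed across p keeps the criteria comparable. The heavier ln(n) penalty in (6.2) is what tends to make its choices shorter than those from (6.1).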
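Worked check for the non-autoregressive example of Section 6.2, equations (6.3) and (6.4). The sketch builds the population covariance matrices implied by x_t = ε_t + ρε_{t-1}, y_t = ε_{t-1} + η_t directly. It solves the projection equations numerically and compares the results with the closed forms in the text, including the ρ = 1 case (k+2)/(k+1) and (2k+2)/(2k+1). The function names are hypothetical, and the numerical route is a cross-check rather than the chapter's own derivation.

```python
import numpy as np


def conditional_variances(rho: float, k: int) -> tuple[float, float]:
    """Population var(x_t | k lags of x) and var(x_t | k lags of x and of y) under (6.3).

    cov(x_s, x_{s-h}) = 1 + rho^2 (h = 0), rho (|h| = 1), 0 otherwise
    cov(y_s, y_{s-h}) = 2 (h = 0), 0 otherwise
    cov(x_{t-i}, y_{t-j}) = rho (i = j), 1 (i = j + 1), 0 otherwise
    cov(x_t, y_{t-j}) = 0 for j >= 1
    """
    idx = np.arange(k)
    diff = idx[:, None] - idx[None, :]                       # i - j
    M = np.where(diff == 0, 1 + rho**2, np.where(np.abs(diff) == 1, rho, 0.0))
    B = np.where(diff == 0, rho, np.where(diff == 1, 1.0, 0.0))
    m = np.zeros(k)
    m[0] = rho                                               # cov(x_t, x_{t-1})
    var_x = 1 + rho**2
    sigma1 = var_x - m @ np.linalg.solve(M, m)
    full = np.block([[M, B], [B.T, 2 * np.eye(k)]])
    g = np.concatenate([m, np.zeros(k)])                     # y lags are uncorrelated with x_t
    sigma2 = var_x - g @ np.linalg.solve(full, g)
    return float(sigma1), float(sigma2)


def closed_forms(rho: float, k: int) -> tuple[float, float]:
    """The text's formulas: (6.4) for |rho| < 1 and the (k+2)/(k+1), (2k+2)/(2k+1) case."""
    if abs(rho) == 1:
        return (k + 2) / (k + 1), (2 * k + 2) / (2 * k + 1)
    r2 = rho**2
    s1 = (1 - r2 ** (k + 2)) / (1 - r2 ** (k + 1))
    s2 = (1 - 0.5 * r2**k - 0.5 * r2 ** (k + 2)) / (1 - 0.5 * r2**k - 0.5 * r2 ** (k + 1))
    return s1, s2


def noncentrality(rho: float, k: int, n: int) -> float:
    """Non-centrality parameter [Sigma1/Sigma2 - 1] n of the Wald variant."""
    s1, s2 = closed_forms(rho, k)
    return n * (s1 / s2 - 1)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        k = int(n**0.4)
        print(n, k, round(noncentrality(0.5, k, n), 6), round(noncentrality(1.0, k, n), 2))
```

The usage loop shows the contrast drawn in the text. With k = [n^0.4], the parameter shrinks toward zero for ρ = 0.5 but grows with n for ρ = 1.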
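Worked check for the deterministic seasonal example of Section 6.3. To give the seasonal component cos(πt/2) well-defined second moments, this sketch assumes its time-averaged moments: lag-h product moment 0.5 cos(πh/2), the same for x, for y and across the two series. This amounts to treating the seasonal as having a random phase, which is an interpretive assumption. The white-noise parts add unit variance. The projection variances then reproduce (2k+3)/(2k+2) and (4k+3)/(4k+2), and the non-centrality parameter reproduces 2kn/[(2k+2)(4k+3)]. Function names are hypothetical.

```python
import numpy as np


def seasonal_cov(h: np.ndarray) -> np.ndarray:
    """Assumed time-averaged second moment of cos(pi*t/2) at lag h."""
    return 0.5 * np.cos(np.pi * h / 2)


def projection_variances(k: int) -> tuple[float, float]:
    """var(x_t | 4k lags of x) and var(x_t | 4k lags of x and of y)."""
    L = 4 * k
    lags = np.arange(1, L + 1)
    d = lags[:, None] - lags[None, :]
    Cxx = seasonal_cov(d) + np.eye(L)          # x lags with x lags (plus white noise)
    Cxy = seasonal_cov(d)                      # x lags with y lags: shared seasonal only
    c = seasonal_cov(lags)                     # x_t with lags of x, and with lags of y
    var_x = 0.5 + 1.0
    s1 = var_x - c @ np.linalg.solve(Cxx, c)
    C = np.block([[Cxx, Cxy], [Cxy.T, Cxx]])
    cc = np.concatenate([c, c])
    s2 = var_x - cc @ np.linalg.solve(C, cc)
    return float(s1), float(s2)


if __name__ == "__main__":
    n = 400
    for k in (1, 2, 5):
        s1, s2 = projection_variances(k)
        print(k, round(s1, 6), round(s2, 6), round(n * (s1 / s2 - 1), 4))
```

The odd lags receive zero weight because they carry no information about the seasonal at time t. Only lags 2, 4, ..., 4k enter the projections, as in the formulas of Section 6.3.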
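Worked example for Fuller's transformed regression in Section 6.4. The sketch maps the coefficients c_1, ..., c_p of y_t = Σ c_s y_{t-s} + ε_t to θ_1, ..., θ_p of y_t = θ_1 y_{t-1} + Σ_{i≥2} θ_i (y_{t+1-i} - y_{t-i}) + ε_t. The mapping is θ_1 = Σ c_s and θ_i = -(c_i + ... + c_p) for i ≥ 2. The sketch also fits the transformed equation by ordinary least squares. The simulated AR(2) with one unit root, (1 - L)(1 - 0.5L)y_t = ε_t, and the function names are illustrative assumptions.

```python
import numpy as np


def c_to_theta(c) -> np.ndarray:
    """theta_1 = c_1 + ... + c_p; theta_i = -(c_i + ... + c_p) for i >= 2."""
    c = np.asarray(c, dtype=float)
    theta = np.empty_like(c)
    theta[0] = c.sum()
    for i in range(1, len(c)):
        theta[i] = -c[i:].sum()
    return theta


def theta_to_c(theta) -> np.ndarray:
    """Inverse map: c_1 = theta_1 + theta_2, c_s = theta_{s+1} - theta_s, c_p = -theta_p."""
    theta = np.asarray(theta, dtype=float)
    p = len(theta)
    c = np.empty(p)
    c[0] = theta[0] + (theta[1] if p > 1 else 0.0)
    for s in range(1, p - 1):
        c[s] = theta[s + 1] - theta[s]
    if p > 1:
        c[p - 1] = -theta[p - 1]
    return c


def fuller_regression(y: np.ndarray, p: int) -> np.ndarray:
    """OLS of y_t on y_{t-1} and y_{t+1-i} - y_{t-i}, i = 2..p."""
    n = len(y)
    start = p
    dy = np.diff(y)                          # dy[s] = y[s+1] - y[s]
    cols = [y[start - 1 : n - 1]]            # y_{t-1}
    for i in range(2, p + 1):
        cols.append(dy[start - i : n - i])   # y_{t+1-i} - y_{t-i} = dy[t-i]
    X = np.column_stack(cols)
    theta, *_ = np.linalg.lstsq(X, y[start:], rcond=None)
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.standard_normal(5000)
    y = np.zeros(5000)
    for t in range(2, 5000):
        y[t] = 1.5 * y[t - 1] - 0.5 * y[t - 2] + e[t]   # c = (1.5, -0.5): theta = (1, 0.5)
    print("true theta:", c_to_theta([1.5, -0.5]), "estimate:", fuller_regression(y, 2))
```

In line with Fuller's result, the estimate of θ_2 behaves like a stationary-regression coefficient. The estimate of θ_1 converges to 1 at the faster rate associated with a unit root.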
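Worked example for Section 6.5: Theil-Goldberger mixed estimation of one equation of a vector autoregression with a prior in the spirit of Litterman's. The prior mean is 1 on the own first lag and 0 elsewhere, and prior variances decline with lag. The hyperparameters lam and cross, the 1/lag decay, the omission of scale adjustments and the known error variance sigma2 are all hypothetical simplifications, not Litterman's exact specification. The small var_dimensions helper reproduces the arithmetic quoted in the text: 7 variables and 6 lags (plus intercepts) give 301 coefficients, and 87 observations on 7 variables give 609 data points.

```python
import numpy as np


def mixed_estimate(X, y, sigma2, prior_mean, prior_var) -> np.ndarray:
    """Theil-Goldberger mixed estimate with stochastic restrictions b ~ (prior_mean, diag(prior_var))."""
    P = np.diag(1.0 / np.asarray(prior_var, dtype=float))   # prior precision
    A = X.T @ X / sigma2 + P
    rhs = X.T @ y / sigma2 + P @ prior_mean
    return np.linalg.solve(A, rhs)


def litterman_style_prior(m: int, p: int, eq: int, lam: float = 0.2, cross: float = 0.5):
    """Hypothetical prior for equation `eq` of an m-variable VAR(p) without intercept.

    Coefficients are ordered lag by lag: [z_{t-1} (m coefficients), ..., z_{t-p} (m coefficients)].
    Standard deviation is lam/lag on own lags and cross*lam/lag on other variables' lags.
    """
    mean = np.zeros(m * p)
    sd = np.empty(m * p)
    for lag in range(1, p + 1):
        for var in range(m):
            sd[(lag - 1) * m + var] = lam / lag * (1.0 if var == eq else cross)
    mean[eq] = 1.0                                           # own first lag
    return mean, sd**2


def var_dimensions(m: int, p: int, T: int) -> tuple[int, int]:
    """Coefficients (with intercepts) and data points in an m-variable VAR(p) with T observations."""
    return m * (m * p + 1), m * T


if __name__ == "__main__":
    print(var_dimensions(7, 6, 87))
```

Because the prior enters only through an added precision matrix, the estimate moves continuously from the prior mean (tight prior) to ordinary least squares (diffuse prior). This is the computational simplicity the text refers to.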