THE LINEAR REGRESSION MODEL III


CHAPTER 21

The linear regression model III — departures from the assumptions underlying the probability model

The purpose of this chapter is to consider various forms of departures from the assumptions of the probability model:

[6] (i) $D(y_t|\mathbf{X}_t;\theta)$ is normal;
(ii) $E(y_t|\mathbf{X}_t=\mathbf{x}_t)=\boldsymbol{\beta}'\mathbf{x}_t$, linear in $\mathbf{x}_t$;
(iii) $\mathrm{Var}(y_t|\mathbf{X}_t=\mathbf{x}_t)=\sigma^2$, homoskedastic;

[7] $\theta\equiv(\boldsymbol{\beta},\sigma^2)$ is time-invariant.

In each of Sections 2 to 5 the above assumptions will be relaxed one at a time, retaining the others, and the following interrelated questions will be discussed: (a) what are the implications of the departures considered? (b) how do we detect such departures? and (c) how do we proceed if departures are detected?

It is important to note at the outset that the following discussion, which considers individual assumptions being relaxed separately, limits the scope of misspecification analysis, because it is rather rare to encounter such conditions in practice. More often than not, various assumptions are invalid simultaneously. This is considered in more detail later in the chapter. A separate section discusses the problem of structural change, which constitutes a particularly important form of departure from [7].

21.1 Misspecification testing and auxiliary regressions

Misspecification testing refers to the testing of the assumptions underlying a statistical model. In its context the null hypothesis is uniquely defined as the assumption(s) in question being valid. The alternative takes a particular form of departure from the null, which is invariably non-unique. This is because departures from a given assumption can take numerous forms, with the specified alternative being only one such form. Moreover, most misspecification tests are based on the questionable presupposition that the other assumptions of the model are valid, because joint misspecification testing is considerably more involved. For these reasons the choice in a misspecification test is between rejecting and not rejecting the null; accepting the alternative should be excluded at this stage.

An important implication for the question of how to proceed if the null is rejected is that, before any action is taken, the results of the other misspecification tests should also be considered. It is often the case that a particular form of departure from one assumption might also affect other assumptions. For example, when the assumption of sample independence [8] is invalid, the other misspecification tests are influenced (see Chapter 22).

In general, the way to proceed when any of the assumptions [6]–[8] are invalid is first to narrow down the source of the departures by relating them back to the NIID assumption on $\{\mathbf{Z}_t,\,t\in\mathbb{T}\}$, and then to respecify the model taking the departure from NIID into account. The respecification of the model involves a reconsideration of the reduction from $D(\mathbf{Z}_1,\mathbf{Z}_2,\ldots,\mathbf{Z}_T;\psi)$ to $D(y_t|\mathbf{X}_t;\theta)$ so as to account for the departures from the assumptions involved. As argued in Chapters 19–20, this reduction, coming in the form of

$$D(\mathbf{Z}_1,\ldots,\mathbf{Z}_T;\psi)=\prod_{t=1}^{T}D(\mathbf{Z}_t;\psi) \qquad (21.1)$$

$$=\prod_{t=1}^{T}D(y_t|\mathbf{X}_t;\theta_1)\,D(\mathbf{X}_t;\theta_2), \qquad (21.2)$$

involves the independence and the identically distributed assumptions in (1). The normality assumption plays an important role in defining the parametrisation of interest $\theta\equiv(\boldsymbol{\beta},\sigma^2)$ as well as the weak exogeneity condition.
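To recall how normality delivers this parametrisation, consider the jointly normal case (a standard reduction consistent with Chapters 19–20, restated here with zero means for simplicity, so that no intercept appears):

$$\mathbf{Z}_t \equiv \begin{pmatrix} y_t \\ \mathbf{X}_t \end{pmatrix} \sim N\!\left(\mathbf{0},\ \begin{pmatrix} \sigma_{11} & \boldsymbol{\sigma}_{21}' \\ \boldsymbol{\sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}\right) \;\Longrightarrow\; (y_t|\mathbf{X}_t=\mathbf{x}_t) \sim N(\boldsymbol{\beta}'\mathbf{x}_t,\ \sigma^2),$$

with

$$\boldsymbol{\beta}=\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}, \qquad \sigma^2=\sigma_{11}-\boldsymbol{\sigma}_{21}'\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}.$$

Normality, linearity and homoskedasticity are thus all consequences of the same joint distribution assumption, which is why relaxing any one of them raises the question of what happens to the reduction as a whole.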
turns out that when [8] is invalid not only the results in Chapter 19 are invalid but the other misspecification tests are ‘largely’ inappropriate as well For this reason it is advisable in practice to test assumption [8] first and then proceed with the other assumptions if [8] is not reyected The sequence of misspecification tests considered in what follows is chosen only for expositional purposes With the above discussion in mind let us consider the question of general procedures for the derivation of misspecification tests In cases where the alternative in a misspecification test is given a specific parametric form the various procedures encountered in specification testing (F-type tests, Wald, 21.1 Misspecification testing 445 Lagrange multiplier and likelihood ratio) can be easily adapted to apply in the present context In addition to these procedures several specific misspecification test procedures have been proposed in the literature (see White (1982), Bierens (1982), inter alia) Of particular interest in the present book are the procedures based on the ‘omitted variables’ argument which lead to auxiliary regressions (see Ramsey (1969), (1974), Pagan and Hall (1983), Pagan (1984), inter alia) This particular procedure is given a prominent role in what follows because it is easy to implement in practice and it provides a common-sense interpretation of most other misspecification tests The ‘omitted variables’ argument was criticised in Section 20.2 because it was based on the comparison of two ‘non-comparable’ statistical GM’s This was because the information sets underlying the latter were different It was argued, however, that the argument could be reformulated by postulating the same sample information sets In particular if both parametrisations can be derived from D(Z, Z>, Zr; w) by using alternative reduction arguments then the two statistical GM’s can be made comparable Let {Z,, f¢ 1} be a vector stochastic process defined on the probability space (S, ~ P(-)) which includes the stochastic variables of interest In Chapter 17 it was argued that for a given %Co.F y= EO,/GZ)+y,, tel defines a general statistical GM M=EN/YG) with u,=w,—EU//2/) satisfying some desirable orthogonality condition: E(uju,)=0, (21.3) properties (21.4) by te construction including the (21.5) Itis important to note, however, that (3)(4) as defined above are just ‘empty boxes’ These are filled when {Z,,t¢€ 1} is given a specific probabilistic structure such as NIID In the latter case (3)-(4) take the specific forms: 1,=fx,+u, HƑ=x, and cel u*=y,—f'x,, (21.6) (21.7) with the conditioning information set being Y='X,=x,) When (21.8) any of the assumptions in NIID are invalid, however, the various properties of u, and u, no longer hold for u* and u* In particular the 446 Departures from assumptions — probability model orthogonality condition (5) is invalid The non-orthogonality E(uxus) 40, teT (21.9) can be used to derive various misspecification tests If we specify the alternative in a parametric form which includes the null as a special case (9) could be used to derive misspecification tests based on certain auxiliary regressions In order to illustrate this procedure let us consider two important parametric forms which can provide the basis of several misspecification tests: (a) g*(x,)= Yo plu)’ (b) 9(Xx)=a+ i=] ), bxu+ i=l kok +) k x i=1j2t (21.0) kok } 3, Cụ Xu Xu i=1j>1 ni I>dj an, (21.11) The polynomial g*(x,) is related to RESET type tests (see Ramsey (1969) and g(x,) is known as the Kolmogorov—Gabor 
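To make the construction of these auxiliary regressors concrete, here is a minimal sketch in Python with NumPy (the function names and the default third-degree polynomial are illustrative choices, not anything prescribed by the text):

```python
import numpy as np

def reset_regressors(X, y, max_power=3):
    """Auxiliary regressors for the RESET-type polynomial (21.10):
    powers (b'x_t)^i, i = 2,...,max_power, of the fitted values
    obtained under the null model y = Xb + u."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS fit under the null
    yhat = X @ b
    return np.column_stack([yhat**i for i in range(2, max_power + 1)])

def kolmogorov_gabor_regressors(X):
    """Second-order terms x_it * x_jt (j >= i) of the Kolmogorov-Gabor
    polynomial (21.11); higher-order terms can be built analogously."""
    k = X.shape[1]
    return np.column_stack([X[:, i] * X[:, j]
                            for i in range(k) for j in range(i, k)])
```

The first power $(\boldsymbol{\beta}'\mathbf{x}_t)^1$ is omitted from the RESET terms because it is collinear with the regressors of the null model.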
Both of these polynomials can be used to specify a general parametric form for the alternative systematic component:

$$\mu_t=\boldsymbol{\beta}_0'\mathbf{x}_t+\boldsymbol{\gamma}_0'\mathbf{z}_t^*, \qquad (21.12)$$

where $\mathbf{z}_t^*$ represents known functions of the variables $\mathbf{Z}_{t-1},\ldots,\mathbf{Z}_1,\mathbf{X}_t$. This gives rise to the alternative statistical GM

$$y_t=\boldsymbol{\beta}_0'\mathbf{x}_t+\boldsymbol{\gamma}_0'\mathbf{z}_t^*+\varepsilon_t,\quad t\in\mathbb{T}, \qquad (21.13)$$

which includes (6) as a special case under

$$H_0:\boldsymbol{\gamma}_0=\mathbf{0}\quad\text{against}\quad H_1:\boldsymbol{\gamma}_0\neq\mathbf{0}. \qquad (21.14)$$

A direct comparison between (13) and (6) gives rise to the auxiliary regression

$$u_t=(\boldsymbol{\beta}_0-\boldsymbol{\beta})'\mathbf{x}_t+\boldsymbol{\gamma}_0'\mathbf{z}_t^*+\varepsilon_t, \qquad (21.15)$$

whose operational form

$$\hat{u}_t=(\boldsymbol{\beta}_0-\hat{\boldsymbol{\beta}})'\mathbf{x}_t+\boldsymbol{\gamma}_0'\mathbf{z}_t^*+\varepsilon_t \qquad (21.16)$$

can be used to test (14) directly. The most obvious test is the F-type test discussed in Sections 19.5 and 20.3, which will take the general form

$$F_T(\mathbf{y})=\left(\frac{RRSS-URSS}{URSS}\right)\left(\frac{T-k^*}{m}\right)\;\overset{H_0}{\sim}\;F(m,\,T-k^*), \qquad (21.17)$$

where RRSS and URSS refer to the residual sums of squares from (6) and (16) (or (13)), respectively, $k^*$ is the number of parameters in (13), and $m$ is the number of restrictions. This procedure can easily be extended to the higher central moments

$$E(u_t^r|\mathbf{X}_t=\mathbf{x}_t),\quad r\geq 2. \qquad (21.18)$$

For further discussion see Spanos (1985b).
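Putting the pieces together, (21.17) can be computed by running the restricted and augmented regressions and comparing residual sums of squares. A minimal sketch, assuming the auxiliary regressors $\mathbf{z}_t^*$ have been stacked into a $T\times m$ matrix `Z` (SciPy is used only for the F tail probability; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def aux_regression_ftest(y, X, Z):
    """F-type misspecification test (21.17): restricted GM y = Xb + u
    against the augmented GM (21.13) with auxiliary regressors Z."""
    T = len(y)
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]
    rrss = np.sum((y - X @ b0)**2)              # RRSS from the null (6)
    XZ = np.column_stack([X, Z])
    b1 = np.linalg.lstsq(XZ, y, rcond=None)[0]
    urss = np.sum((y - XZ @ b1)**2)             # URSS from (13)/(16)
    m, k_star = Z.shape[1], XZ.shape[1]         # restrictions; parameters in (13)
    f_stat = ((rrss - urss) / urss) * ((T - k_star) / m)
    return f_stat, stats.f.sf(f_stat, m, T - k_star)  # statistic, p-value
```

For instance, `aux_regression_ftest(y, X, reset_regressors(X, y))` yields a RESET-type test of (21.14), with the powers of the fitted values playing the role of $\mathbf{z}_t^*$.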
21.2 Normality

As argued above, the assumptions underlying the probability model are all interrelated: they stem from the fact that $D(y_t,\mathbf{X}_t;\psi)$ is assumed to be multivariate normal. When $D(y_t,\mathbf{X}_t;\psi)$ is assumed to be some other multivariate distribution, the regression function takes a more general form (not necessarily linear),

$$E(y_t|\mathbf{X}_t=\mathbf{x}_t)=h(\psi,\mathbf{x}_t), \qquad (21.19)$$

and the skedasticity function is not necessarily free of $\mathbf{x}_t$,

$$\mathrm{Var}(y_t|\mathbf{X}_t=\mathbf{x}_t)=g(\psi,\mathbf{x}_t). \qquad (21.20)$$

Several examples of regression and skedasticity functions in the bivariate case were considered earlier in the book. In this section, however, we are going to consider relaxing the assumption of normality only, keeping linearity and homoskedasticity. In particular we will consider the consequences of assuming

$$(y_t|\mathbf{X}_t=\mathbf{x}_t)\sim D(\boldsymbol{\beta}'\mathbf{x}_t,\sigma^2), \qquad (21.21)$$

where $D(\cdot)$ is an unknown distribution, and discuss the problem of testing whether $D(\cdot)$ is in fact normal or not.

(1) Consequences of non-normality

Let us consider the effect of the non-normality assumption in (21) on the specification, estimation and testing of the linear regression model discussed in Chapter 19. As far as specification (see Section 19.2) is concerned, only marginal changes are needed: after removing assumption [6](i), the other assumptions can be reinterpreted in terms of $D(\boldsymbol{\beta}'\mathbf{x}_t,\sigma^2)$. This suggests that relaxing normality but retaining linearity and homoskedasticity might not constitute a major break from the linear regression framework.

The first casualty of (21) as far as estimation (see Section 19.4) is concerned is the method of maximum likelihood itself, which cannot be used unless the form of $D(\cdot)$ is known. We could, however, use the least-squares method of estimation briefly discussed in Section 13.1, where the form of the underlying distribution is 'apparently' not needed. Least-squares is an alternative method of estimation which is historically much older than maximum likelihood or the method of moments. It estimates the unknown parameters by minimising the sum of squared distances between the observable random variables $y_t$, $t\in\mathbb{T}$, and $h_t(\theta)$ (a function of $\theta$ purporting to approximate the mechanism giving rise to the observed values $y_t$), weighted by a precision factor $1/\kappa_t$ which is assumed known, i.e.

$$\min_{\theta\in\Theta}\sum_{t=1}^{T}\frac{(y_t-h_t(\theta))^2}{\kappa_t}. \qquad (21.22)$$

It is interesting to note that this method was first suggested by Gauss in 1794 as an alternative to maximising what we nowadays call the log-likelihood function under the normality assumption (see Section 13.1 for more details). In an attempt to motivate the least-squares method he argued that:

    the most probable value of the desired parameters will be that in which the sum of the squares of differences between the actually observed and computed values multiplied by numbers that measure the degree of precision is a minimum.

This clearly shows a direct relationship between the normality assumption and the least-squares method of estimation. It can be argued, however, that the least-squares method can be applied to estimation problems without assuming normality. In relation to such an argument Pearson (1920) warned that:

    we can only assert that the least-squares methods are theoretically accurate on the assumption that our observations obey the normal law. Hence in disregarding normal distributions and claiming great generality by merely using the principle of least-squares the apparent generalisation has been gained merely at the expense of theoretical validity.

Despite this forceful argument, let us consider the estimation of the linear regression model without assuming normality, but retaining linearity and homoskedasticity as in (21). The least-squares method suggests minimising

$$l(\boldsymbol{\beta})=\sum_{t=1}^{T}\frac{(y_t-\boldsymbol{\beta}'\mathbf{x}_t)^2}{\sigma^2}, \qquad (21.23)$$

or, equivalently,

$$l(\boldsymbol{\beta})=\sum_{t=1}^{T}(y_t-\boldsymbol{\beta}'\mathbf{x}_t)^2=(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}), \qquad (21.24)$$

$$\frac{\partial l(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}=-2\mathbf{X}'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})=\mathbf{0}. \qquad (21.25)$$

Solving the system of normal equations (25) (assuming that $\mathrm{rank}(\mathbf{X})=k$) we get the ordinary least-squares (OLS) estimator of $\boldsymbol{\beta}$:

$$\mathbf{b}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}. \qquad (21.26)$$

The OLS estimator of $\sigma^2$ is

$$s^2=\frac{1}{T-k}(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}). \qquad (21.27)$$

Let us consider the properties of the OLS estimators $\mathbf{b}$ and $s^2$ in view of the fact that the form of $D(\boldsymbol{\beta}'\mathbf{x}_t,\sigma^2)$ is not known.

Finite sample properties of $\mathbf{b}$ and $s^2$

Although $\mathbf{b}$ is identical to $\hat{\boldsymbol{\beta}}$ (the MLE of $\boldsymbol{\beta}$ under normality), the similarity does not extend to the properties unless $D(y_t|\mathbf{X}_t;\theta)$ is normal.

(a) Since $\mathbf{b}=\mathbf{L}\mathbf{y}$, where $\mathbf{L}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, the OLS estimator is linear in $\mathbf{y}$.

(b) Using the properties of the expectation operator $E(\cdot)$ we can deduce that $E(\mathbf{b})=E(\boldsymbol{\beta}+\mathbf{L}\mathbf{u})=\boldsymbol{\beta}+\mathbf{L}E(\mathbf{u})=\boldsymbol{\beta}$, i.e. $\mathbf{b}$ is an unbiased estimator of $\boldsymbol{\beta}$.

(c) $E[(\mathbf{b}-\boldsymbol{\beta})(\mathbf{b}-\boldsymbol{\beta})']=E(\mathbf{L}\mathbf{u}\mathbf{u}'\mathbf{L}')=\sigma^2\mathbf{L}\mathbf{L}'=\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.

Given that we have the mean and variance of $\mathbf{b}$ but not its distribution, what other properties can we deduce? Clearly, we cannot say anything about sufficiency or full efficiency without knowing $D(y_t|\mathbf{X}_t;\theta)$, but we can discuss relative efficiency within the class of estimators satisfying (a) and (b). The Gauss–Markov theorem provides us with such a result.

Gauss–Markov theorem: Under assumption (21), $\mathbf{b}$, the OLS estimator of $\boldsymbol{\beta}$, has minimum variance among the class of linear and unbiased estimators (for a proof see Judge et al. (1982)).

(d) As far as $s^2$ is concerned, we can show that $E(s^2)=\sigma^2$, i.e. $s^2$ is an unbiased estimator of $\sigma^2$, using only the properties of the expectation operator relative to $D(\boldsymbol{\beta}'\mathbf{x}_t,\sigma^2)$.
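For concreteness, a direct transcription of (21.26)–(21.27) as a minimal sketch (the explicit inverse mirrors the algebra above; in practice a QR-based least-squares solver is numerically preferable):

```python
import numpy as np

def ols(y, X):
    """OLS estimates via the normal equations: b of (21.26), the unbiased
    s2 of (21.27), and the covariance matrix s2 * (X'X)^{-1} from (c)."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)     # assumes rank(X) = k
    b = XtX_inv @ X.T @ y                # b = (X'X)^{-1} X'y
    resid = y - X @ b
    s2 = resid @ resid / (T - k)         # unbiased: E(s2) = sigma^2, property (d)
    return b, s2, s2 * XtX_inv
```

Note that nothing in this computation uses the form of $D(\cdot)$; only the moment properties (a)–(d) justify it under (21).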
In order to test any hypotheses or set up confidence intervals for $\theta\equiv(\boldsymbol{\beta},\sigma^2)$ we need the distribution of the OLS estimators $\mathbf{b}$ and $s^2$. Thus, unless we specify the form of $D(\boldsymbol{\beta}'\mathbf{x}_t,\sigma^2)$, no test and/or confidence-interval statistics can be derived. The question which naturally arises is to what extent 'asymptotic theory' can at least provide us with large-sample results.

Asymptotic distribution of $\mathbf{b}$ and $s^2$

Lemma 21.1
Under assumption (21),

$$\sqrt{T}(\mathbf{b}-\boldsymbol{\beta})\overset{a}{\sim}N(\mathbf{0},\sigma^2\mathbf{Q}_x^{-1}), \qquad (21.28)$$

where

$$\lim_{T\to\infty}\left(\frac{\mathbf{X}'\mathbf{X}}{T}\right)=\mathbf{Q}_x \qquad (21.29)$$

is finite and non-singular.

Lemma 21.2
Under (21) we can deduce that

$$\sqrt{T}(s^2-\sigma^2)\overset{a}{\sim}N(0,\mu_4-\sigma^4), \qquad (21.30)$$

where $\mu_4$ refers to the fourth central moment of $D(y_t|\mathbf{X}_t;\theta)$, assumed to be finite (see Schmidt (1976)). Note that in the case where $D(y_t|\mathbf{X}_t;\theta)$ is normal, $\mu_4=3\sigma^4$ and

$$\sqrt{T}(s^2-\sigma^2)\overset{a}{\sim}N(0,2\sigma^4). \qquad (21.31)$$

Lemma 21.3
Under (21),

$$\mathbf{b}\overset{P}{\longrightarrow}\boldsymbol{\beta}, \qquad (21.32)$$

provided that

$$\lim_{T\to\infty}(\mathbf{X}'\mathbf{X})^{-1}=\mathbf{0}, \qquad (21.33)$$

and

$$s^2\overset{P}{\longrightarrow}\sigma^2. \qquad (21.34)$$

From the above lemmas we can see that although the asymptotic distribution of $\mathbf{b}$ coincides with the asymptotic distribution of the MLE, this is not the case with $s^2$. The asymptotic distribution of $\mathbf{b}$ does not depend on $D(y_t|\mathbf{X}_t;\theta)$ but that of $s^2$ does, via $\mu_4$. The question which naturally arises is to what extent the various results related to tests about $\theta\equiv(\boldsymbol{\beta},\sigma^2)$ (see Section 19.5) are at least asymptotically justifiable.

Let us consider the F-test for $H_0:\mathbf{R}\boldsymbol{\beta}=\mathbf{r}$ against $H_1:\mathbf{R}\boldsymbol{\beta}\neq\mathbf{r}$. From lemma 21.1 we can deduce that under $H_0$, $\sqrt{T}(\mathbf{R}\mathbf{b}-\mathbf{r})\overset{a}{\sim}N(\mathbf{0},\sigma^2\mathbf{R}\mathbf{Q}_x^{-1}\mathbf{R}')$, which implies that

$$T(\mathbf{R}\mathbf{b}-\mathbf{r})'[\sigma^2\mathbf{R}\mathbf{Q}_x^{-1}\mathbf{R}']^{-1}(\mathbf{R}\mathbf{b}-\mathbf{r})\overset{a}{\sim}\chi^2(m). \qquad (21.35)$$

Using this result in conjunction with lemma 21.3 we can deduce that

$$\tau(\mathbf{y})=(\mathbf{R}\mathbf{b}-\mathbf{r})'[s^2\mathbf{R}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}']^{-1}(\mathbf{R}\mathbf{b}-\mathbf{r})\overset{a}{\sim}\chi^2(m) \qquad (21.36)$$

under $H_0$, and thus the F-test is robust with respect to the non-normality assumption (21) above. Although the asymptotic distribution of $\tau(\mathbf{y})$ is chi-square, in practice the F-distribution provides a better approximation for small $T$ (see Section 19.5). This is particularly true when $D(\boldsymbol{\beta}'\mathbf{x}_t,\sigma^2)$ has heavy tails. The significance t-test, being a special case of the F-test,

$$\tau_i(\mathbf{y})=\frac{b_i}{\sqrt{s^2[(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}\overset{a}{\sim}N(0,1)\quad\text{under }H_0:\beta_i=0, \qquad (21.37)$$

is also asymptotically justifiable and robust relative to the non-normality assumption (21) above.

Because of lemma 21.2, intuition suggests that the testing results in relation to $\sigma^2$ will not be robust relative to the non-normality assumption. Given that the asymptotic distribution of $s^2$ depends on $\mu_4$, or on $\kappa_4=\mu_4/\sigma^4$, the kurtosis coefficient, any departures from normality (where $\kappa_4=3$) will seriously affect the results based on the normality assumption. In particular, the size $\alpha$ and the power of these tests can be very different from the ones based on the postulated value of $\kappa_4$. This can seriously affect all tests which depend on the distribution of $s^2$, such as some heteroskedasticity and structural change tests (see Sections 21.4–21.6 below). In order to get non-normality robust tests in such cases we need to modify them to take account of $\mu_4$.

(2) Testing for departures from normality

Tests for normality can be divided into parametric and non-parametric tests, depending on whether the alternative is given a parametric form or not.

(a) Non-parametric tests

The Kolmogorov–Smirnov test. Based on the assumption that $\{u_t|\mathbf{X}_t,\,t\in\mathbb{T}\}$ is an IID process, we can use the results of Appendix 11.1 to construct a test with rejection region

$$C_1=\{\mathbf{y}:\sqrt{T}\,D_T^*>c_\alpha\}, \qquad (21.38)$$

where $D_T^*$ refers to the Kolmogorov–Smirnov test statistic in terms of the residuals. Typical values of $c_\alpha$ are:

$$\begin{array}{c|ccc}\alpha & 0.10 & 0.05 & 0.01\\ \hline c_\alpha & 1.23 & 1.36 & 1.63\end{array} \qquad (21.39)$$

For a most illuminating discussion of this and similar tests see Durbin (1973).

The Shapiro–Wilk test. This test is based on the ratio of two different estimators of the variance $\sigma^2$, constructed from the ordered residuals $\hat{u}_{(1)}\leq\hat{u}_{(2)}\leq\cdots\leq\hat{u}_{(T)}$:

$$W=\frac{\left(\sum_{t=1}^{T}a_t\,\hat{u}_{(t)}\right)^2}{\sum_{t=1}^{T}(\hat{u}_t-\bar{u})^2},$$

where $\hat{u}_{(t)}$ denotes the $t$-th ordered residual and the weights $a_t$ are tabulated constants derived from the moments of normal order statistics; small values of $W$ indicate departures from normality.
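A sketch of the test in (21.38)–(21.39) applied to OLS residuals (the function name is illustrative; note also that standardising by the estimated mean and variance makes the tabulated $c_\alpha$ only approximate, since the classical Kolmogorov–Smirnov critical values presuppose a fully specified null distribution):

```python
import numpy as np
from scipy import stats

def ks_normality_check(resid, alpha=0.05):
    """Kolmogorov-Smirnov check of the residuals against normality via
    the rejection region (21.38): reject when sqrt(T)*D_T exceeds c_alpha."""
    c_alpha = {0.10: 1.23, 0.05: 1.36, 0.01: 1.63}[alpha]  # table (21.39)
    T = len(resid)
    z = (resid - resid.mean()) / resid.std(ddof=1)  # standardised residuals
    d_stat = stats.kstest(z, "norm").statistic      # sup_z |F_T(z) - Phi(z)|
    return np.sqrt(T) * d_stat > c_alpha, np.sqrt(T) * d_stat
```

For example, `ks_normality_check(y - X @ ols(y, X)[0])` returns the reject/do-not-reject decision at the 5% level together with the scaled statistic $\sqrt{T}\,D_T^*$.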