The multivariate linear regression model
24.1 Introduction
The multivariate linear regression model is a direct extension of the linear regression model to the case where the dependent variable is an m × 1 random vector y_t. That is, the statistical GM takes the form

y_t = B′x_t + u_t,  t ∈ T,    (24.1)

where y_t: m × 1, B: k × m, x_t: k × 1, u_t: m × 1. The system (1) is effectively a system of m linear regression equations:

y_it = β_i′x_t + u_it,  i = 1, 2, …, m,  t ∈ T,    (24.2)

with B = (β₁, β₂, …, β_m).
In direct analogy with the m= 1 case (see Chapter 19) the multivariate linear regression model will be derived from first principles based on the
joint distribution of the observable random variables involved, D(Z_t; ψ).
Moreover, by construction, u_t and μ_t satisfy the following properties:

(i) E(u_t) = E[E(u_t/X_t = x_t)] = 0;
(ii) E(u_t u_s′) = E[E(u_t u_s′/X_t = x_t)] = Ω for t = s, and 0 for t ≠ s;
(iii) E(μ_t u_t′) = E[E(μ_t u_t′/X_t = x_t)] = E[μ_t E(u_t′/X_t = x_t)] = 0,  t ∈ T,

where Ω = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁ (compare these with the results in Section 19.2).
The similarity between the m = 1 case and the general case allows us to consider several loose ends left in Chapter 19. The first is the use of the joint distribution D(Z_t; ψ) in defining the model instead of concentrating exclusively on D(y_t/X_t; ψ₁). The loss of generality in postulating the form of the joint distribution is more than compensated for by the additional insight provided. In practice it is often easier to 'judge' the plausibility of assumptions relating to the nature of D(Z_t; ψ) rather than D(y_t/X_t; ψ₁). Moreover, in misspecification analysis the relationship between the assumptions underlying the model and those underlying the random vector process {Z_t, t ∈ T} enhances our understanding of the nature of the
possible departures. An interesting example of this is the relationship between the assumption that {Z_t, t ∈ T} is a

(1) normal (N);
(2) independent (I); and
(3) identically distributed (ID) process,

and the assumptions:

[6] (i) D(y_t/X_t; θ) is normal;
    (ii) E(y_t/X_t = x_t) is linear in x_t;
    (iii) Cov(y_t/X_t = x_t) is homoskedastic (free of x_t);
[7] θ = (B, Ω) are time-invariant;
[8] {y_t/X_t, t ∈ T} is an independent process.

The relationship between these components is shown diagrammatically below:

(N) ⇒ [6](i), (ii), (iii);  (ID) ⇒ [7];  (I) ⇒ [8].
Given that X_t ∼ N(0, Σ₂₂), det(Σ₂₂) ≠ 0, the reverse implication also holds.

Lemma 24.1
Z_t ∼ N(0, Σ) for t ∈ T if and only if
(i) X_t ∼ N(0, Σ₂₂), det(Σ₂₂) ≠ 0;
(ii) E(y_t/X_t = x_t) = Σ₁₂Σ₂₂⁻¹x_t;
(iii) Cov(y_t/X_t = x_t) = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁,

i.e. (y_t/X_t = x_t) ∼ N(B′x_t, Ω) (see Barra (1981)).

The statistical GM (1) for the sample period t = 1, 2, …, T is written as

Y = XB + U,    (24.6)
where Y: T × m, X: T × k, B: k × m, U: T × m. The system in (1) can be viewed as the t-th row of (6), while the i-th column,

y_i = Xβ_i + u_i,  i = 1, 2, …, m,    (24.7)

represents all T observations on the i-th regression in (2). In order to define the conditional distribution D(Y/X; ψ₁) we need the special notation of Kronecker products (see Appendix 24.2). Using this notation the matrix distribution can be written in the form

(Y/X = X) ∼ N(XB, Ω ⊗ I_T),    (24.8)

where Ω ⊗ I_T represents the covariance of

vec(Y) = (y₁′, y₂′, …, y_m′)′ : Tm × 1.
The vectoring operator vec(·) transforms a matrix into a column vector by stacking the columns of the matrix one beneath the other. Using the vectoring operator we can express (6) in the form

vec(Y) = (I_m ⊗ X) vec(B) + vec(U)    (24.9)

or

y* = X*β* + u*,    (24.10)

where y* ≡ vec(Y), X* ≡ (I_m ⊗ X), β* ≡ vec(B) and u* ≡ vec(U).
The multivariate linear regression (MLR) model is of considerable interest in econometrics because of its direct relationship with the simultaneous equations formulation to be considered in Chapter 25. In particular, the latter formulation can be viewed as a reparametrisation of the MLR model where the statistical parameters of interest θ = (B, Ω) do not coincide with the theoretical parameters of interest ξ. Instead, the two sets of parameters are related by some system of implicit equations of the form:

h_i(θ, ξ) = 0,  i = 1, 2, …, p.    (24.11)

These equations can be interpreted as providing an alternative parametrisation for the statistical GM in terms of the theoretical parameters of interest. In view of this relationship between the two statistical models, a sound understanding of the MLR model will pave the way for the simultaneous equations formulation in Chapter 25.
24.2 Specification and estimation
In direct analogy to the linear regression model (m= 1) the multivariate linear regression model is specified as follows:
(I) Statistical GM: y_t = B′x_t + u_t,  t ∈ T,

where y_t: m × 1, x_t: k × 1, B: k × m.

[1] The systematic and non-systematic components are:

μ_t = E(y_t/X_t = x_t) = B′x_t,  u_t = y_t − E(y_t/X_t = x_t),

and by construction

E(u_t) = E[E(u_t/X_t = x_t)] = 0,
E(u_t u_s′) = E[E(u_t u_s′/X_t = x_t)] = Ω for t = s, and 0 for t ≠ s,  t, s ∈ T.

[2] The statistical parameters of interest are θ = (B, Ω), where B = Σ₂₂⁻¹Σ₂₁ and Ω = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁.
[3] X_t is assumed to be weakly exogenous with respect to θ.
[4] No a priori information on θ.

(II) Probability model

Φ = { D(y_t/X_t; θ) = (2π)^(−m/2)(det Ω)^(−1/2) exp{−½(y_t − B′x_t)′Ω⁻¹(y_t − B′x_t)},  θ ∈ R^(km) × C₊^m,  y_t ∈ R^m }.

[6] (i) D(y_t/X_t; θ) is normal;
    (ii) E(y_t/X_t = x_t) = B′x_t is linear in x_t;
    (iii) Cov(y_t/X_t = x_t) = Ω is homoskedastic (free of x_t);
[7] θ is time invariant.

(III) Sampling model

[8] Y ≡ (y₁, y₂, …, y_T)′ is an independent sample sequentially drawn from D(y_t/X_t; θ), t = 1, 2, …, T, with T ≥ m + k.
The above specification is almost identical to that of the m = 1 case considered in Chapter 19. The discussion of the assumptions in that chapter applies to [1]–[8] above with only minor modifications due to m > 1. The only real change brought about by m > 1 is the increase in the number of statistical parameters of interest to mk + ½m(m + 1). It should come as no surprise to learn that the similarities between the two statistical models extend to estimation, testing and prediction.
From assumptions [6] to [8] we can deduce that the likelihood function takes the form

L(θ; Y) = c(Y) ∏_{t=1}^T D(y_t/X_t; θ),

and the log likelihood is

log L = const − (T/2) log(det Ω) − ½ ∑_{t=1}^T (y_t − B′x_t)′Ω⁻¹(y_t − B′x_t)    (24.12)
      = const − ½[T log(det Ω) + tr Ω⁻¹(Y − XB)′(Y − XB)].    (24.13)
The first-order conditions from differentiating (13) with respect to B and Ω lead to the following MLE's:

B̂ = (X′X)⁻¹X′Y,    (24.16)

Ω̂ = (1/T) Û′Û,    (24.17)

where Û = Y − XB̂. For Ω̂ to be positive definite we need to assume that T ≥ m + k (see Dykstra (1970)). It is interesting to note that (16) amounts to estimating each regression equation separately by

β̂_i = (X′X)⁻¹X′y_i,  i = 1, 2, …, m.    (24.18)

Moreover, the residuals from these separate regressions, û_i = y_i − Xβ̂_i, can be used to derive Ω̂ via ω̂_ij = (1/T)û_i′û_j, i, j = 1, 2, …, m.
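As a concrete illustration of (16)–(18), the following minimal numpy sketch simulates data from the statistical GM (6) and computes the MLE's; the dimensions, coefficient values and variable names are illustrative assumptions, not part of the text. Later sketches in this chapter continue this simulated example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, m = 200, 3, 2                        # sample size, regressors, equations (assumed)

B = np.array([[1.0, 0.5],
              [2.0, -1.0],
              [0.0, 0.3]])                 # k x m coefficient matrix
Omega = np.array([[1.0, 0.4],
                  [0.4, 2.0]])             # m x m error covariance

X = rng.normal(size=(T, k))                # T x k data matrix
U = rng.multivariate_normal(np.zeros(m), Omega, size=T)
Y = X @ B + U                              # statistical GM, equation (24.6)

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # MLE of B, (24.16): equation-by-equation OLS
U_hat = Y - X @ B_hat
Omega_hat = (U_hat.T @ U_hat) / T          # MLE of Omega, (24.17); use T - k for the unbiased version
```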
As in the case of β̂ in the linear regression model, the MLE B̂ preserves the original orthogonality between the systematic and non-systematic components. That is, for μ̂_t = B̂′x_t and û_t = y_t − B̂′x_t,

y_t = μ̂_t + û_t,  t = 1, 2, …, T,    (24.19)

and μ̂_t ⊥ û_t. This orthogonality can be used to define a goodness-of-fit measure by extending R² = 1 − (û′û)(y′y)⁻¹ to
G = I − (Û′Û)(Y′Y)⁻¹ = (Y′Y − Û′Û)(Y′Y)⁻¹.    (24.20)

The matrix G varies between the identity matrix when Û = 0 and zero when Y = Û (no explanation). In order to reduce this matrix goodness-of-fit measure to a scalar we can use the trace or the determinant:

d₁ = (1/m) tr G,  d₂ = det(G)    (24.21)

(see Hooper (1959)). In terms of the eigenvalues (λ₁, λ₂, …, λ_m) of G the above measures of goodness of fit take the form

d₁ = (1/m) ∑_{i=1}^m λ_i  and  d₂ = ∏_{i=1}^m λ_i.    (24.22)
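Continuing the sketch above, G and the two scalar measures can be computed directly; the eigenvalue route matches (24.22).

```python
G = np.eye(m) - (U_hat.T @ U_hat) @ np.linalg.inv(Y.T @ Y)   # (24.20)
lam = np.linalg.eigvals(G).real
d1 = lam.sum() / m     # trace correlation, (24.21)-(24.22)
d2 = lam.prod()        # determinant measure
```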
The orthogonality extends directly to M̂ = XB̂ and Û and can be used to show that B̂ and Ω̂ are independent random matrices. In the present context this amounts to

Cov(vec(B̂), vec(Û)) = 0.    (24.23)

Finite sample properties of B̂ and Ω̂
From the fact that B̂ and Ω̂ are MLE's we can deduce that they enjoy the invariance property of such estimators (see Chapter 13) and that they are functions of the minimal sufficient statistics, if they exist. Using the Lehmann–Scheffé result (see Chapter 12) we can see that the ratio

D(Y/X; θ)/D(Y₀/X; θ) = exp{−½ tr Ω⁻¹[Y′Y − Y₀′Y₀ − (Y − Y₀)′XB − B′X′(Y − Y₀)]}    (24.24)

is independent of θ if Y′Y = Y₀′Y₀ and Y′X = Y₀′X. This implies that

τ(Y) = (τ₁(Y), τ₂(Y)),  where τ₁(Y) = Y′Y, τ₂(Y) = Y′X,

defines the set of minimal sufficient statistics and

B̂ = (X′X)⁻¹τ₂(Y)′,    (24.25)
Ω̂ = (1/T)[τ₁(Y) − τ₂(Y)(X′X)⁻¹τ₂(Y)′].    (24.26)

In order to discuss the other properties of B̂ and Ω̂ let us derive their distributions. Since

B̂ = B + (X′X)⁻¹X′U = B + LU,  L = (X′X)⁻¹X′,

we can deduce that

B̂ ∼ N(B, Ω ⊗ (X′X)⁻¹).    (24.27)

This is because B̂ is a linear function of Y, where

(Y/X) ∼ N(XB, Ω ⊗ I_T).    (24.28)
Given that TΩ̂ = Y′M_X Y, where M_X = I_T − X(X′X)⁻¹X′, its distribution is the matrix equivalent of the chi-square, known as the Wishart distribution with T − k degrees of freedom and scale matrix Ω, written as

TΩ̂ ∼ W_m(Ω, T − k)    (24.29)

(see Appendix 24.1). In the case where m = 1, TΩ̂ = û′û and

Tω̂ ∼ σ²χ²(T − k),  E(Tω̂) = σ²(T − k).    (24.30)

The Wishart distribution enjoys most of the attractive properties of the multivariate normal distribution (see Appendix 24.1). In direct analogy to (30),

E(TΩ̂) = (T − k)Ω,    (24.31)

and thus Ω̃ = [1/(T − k)]Û′Û is an unbiased estimator of Ω. In view of (25)–(31) we can summarise the finite sample properties of the MLE's B̂ and Ω̂ of B and Ω respectively:

(1) B̂ and Ω̂ are invariant (with respect to Borel functions of the form g(·): Θ → R^r, 1 ≤ r ≤ mk + ½m(m + 1)).
(2) B̂ and Ω̂ are functions of the minimal sufficient statistics τ₁(Y) = Y′Y and τ₂(Y) = Y′X.
(3) B̂ is an unbiased estimator of B (i.e. E(B̂) = B) but Ω̂ is a biased estimator of Ω, Ω̃ = [1/(T − k)]Û′Û being unbiased.
(4) B̂ is a fully efficient estimator of B in view of the fact that Cov(B̂) = Ω ⊗ (X′X)⁻¹ and the information matrix of θ = (B, Ω) takes the form

I_T(θ) = diag(Ω⁻¹ ⊗ X′X, (T/2)(Ω⁻¹ ⊗ Ω⁻¹))    (24.32)

(see Rothenberg (1973)).
(5) B̂ and Ω̂ are independent, in view of the orthogonality in (19).

Asymptotic properties of B̂ and Ω̂
Arguing again by analogy to the m = 1 case we can derive the asymptotic properties of the MLE's B̂ and Ω̂ of B and Ω, respectively.

(i) Consistency (B̂ →ᵖ B, Ω̂ →ᵖ Ω)

In view of the result B̂ ∼ N(B, Ω ⊗ (X′X)_T⁻¹) we can deduce that if lim_{T→∞}(X′X)_T⁻¹ = 0 then Cov(B̂) → 0 and thus B̂ is a consistent estimator of B. A sufficient condition for this is that λ_min(X′X)_T → ∞ as T → ∞, where λ_min(X′X)_T and λ_max(X′X)_T⁻¹ refer to the smallest and largest eigenvalues of (X′X)_T and its inverse respectively; see Amemiya (1985).

(ii) Strong consistency (B̂ →ᵃ·ˢ· B)

If lim_{T→∞}(X′X)_T⁻¹ = 0 and

λ_max(X′X)_T / λ_min(X′X)_T ≤ C

for some arbitrary constant C, then B̂ →ᵃ·ˢ· B; see Anderson and Taylor (1979).

(iii) Asymptotic normality

From the theory of maximum likelihood estimation we know that under relatively mild conditions (see Chapter 13) the MLE θ̂ of θ satisfies √T(θ̂ − θ) ∼ N(0, I_∞(θ)⁻¹). For this result to apply, however, we need the boundedness of I_∞(θ) = lim_{T→∞}(1/T)I_T(θ) as well as its non-singularity. In the present case the asymptotic information matrix is bounded and non-singular (full rank) if lim_{T→∞}(X′X)/T = Q_x < ∞ with Q_x non-singular. Under this condition we can deduce that

√T(B̂ − B) ∼ N(0, Ω ⊗ Q_x⁻¹)    (24.33)

and

√T(Ω̂ − Ω) ∼ N(0, 2(Ω ⊗ Ω))    (24.34)

(see Rothenberg (1973)).

Note that if {(X′X)_T, T ≥ k} is a sequence of k × k positive definite matrices such that (X′X)_{T+1} − (X′X)_T is positive semi-definite and c′(X′X)_T c → ∞ as T → ∞ for every c ≠ 0, then lim_{T→∞}(X′X)_T⁻¹ = 0.

(iv) In view of (iii) we can deduce that B̂ and Ω̂ are both asymptotically unbiased and efficient.
24.3 A priori information
One particularly important departure from the assumptions underlying the multivariate linear regression model is the introduction of a priori restrictions related to θ. When such additional information is available, assumption [4] no longer applies and the results on estimation derived in Section 24.2 need to be modified. The interest in a priori restrictions in the present context arises partly because it allows us to derive tests which can be usefully employed in misspecification testing and partly because this will provide the link between the multivariate linear regression model and the simultaneous equations model to be considered in Chapter 25.

(1) Linear restrictions 'related' to X_t
The first form of restrictions to be considered is

D₁B + C₁ = 0,    (24.35)

where D₁: p × k (p ≤ k), rank(D₁) = p, and C₁: p × m are matrices of known constants. A particularly important special case of (35) is when

D₁ = (0, I_p),  B = (B₁′, B₂′)′,  C₁ = 0,    (24.36)

and (35) takes the form B₂ = 0; that is, a subset of the coefficients in B is zero. The thing to note about these restrictions is that they are not the same as the form

Rβ = r    (24.37)

discussed in the context of the m = 1 case (see Chapter 20). This is because the D₁ matrix affects all the columns of B, and thus the same restrictions, apart from the constants in C₁, are imposed on all m linear regression equations. The form of restrictions comparable to (37) in the present context is
Rβ* = r,    (24.38)

where β* = vec(B) = (β₁′, β₂′, …, β_m′)′ : mk × 1, R: p × mk, r: p × 1. This form of linear restrictions is more general than (35) as well as

BΓ₁ + Δ₁ = 0.    (24.39)

All three forms, (35), (38) and (39), will be discussed in this section because they are interesting for different reasons.
When the restrictions (35) are interpreted in the context of the statistical GM

y_t = B′x_t + u_t,  t ∈ T,    (24.40)

we can see that they are directly related to the regressors x_it, i = 1, 2, …, k. The easiest way to take (35) into consideration in the estimation of θ = (B, Ω) is to 'solve' the system (35) for B and substitute the 'solution' into (40). In order to do this, choose an arbitrary (k − p) × k matrix D* and a (k − p) × m matrix C*, with

D = (D₁′, D*′)′ : k × k, rank(D) = k,  C = (C₁′, C*′)′ : k × m,

and reformulate (35) into

DB + C = 0.    (24.41)

The fact that rank(D) = k enables us to solve (41) for B to yield

B = −D⁻¹C = G₁C₁ + G*C*,    (24.42)

where G = (G₁, G*) = −D⁻¹. Substituting this into (40) for t = 1, 2, …, T yields

Y* = X*C* + U,    (24.43)
where Y* = Y − XG₁C₁ and X* = XG*. The fact that the form of the underlying probability model is unchanged implies that the MLE of C* is

C̃* = (X*′X*)⁻¹X*′Y* = (G*′X′XG*)⁻¹G*′X′(Y − XG₁C₁)
    = (G*′X′XG*)⁻¹G*′X′X(B̂ − G₁C₁).    (24.44)

Hence, from (42) the constrained MLE of B is

B̃ = G₁C₁ + G*(G*′X′XG*)⁻¹G*′X′X(B̂ − G₁C₁)    (24.45)
  = G₁C₁ + L(B̂ − G₁C₁),  where L = G*(G*′X′XG*)⁻¹G*′X′X,
  = B̂ − P(B̂ − G₁C₁),  where P = I − L.    (24.46)
Given that L² = L, P² = P and LP = 0 (i.e. they are orthogonal projections), we can deduce that P takes the form

P = (X′X)⁻¹D₁′[D₁(X′X)⁻¹D₁′]⁻¹D₁    (24.47)

(see exercise 3). This implies that

B̃ = B̂ − (X′X)⁻¹D₁′[D₁(X′X)⁻¹D₁′]⁻¹(D₁B̂ + C₁),    (24.48)

since D₁G₁ = −I_p. Moreover, the constrained MLE of Ω is

Ω̃ = (1/T)Ũ′Ũ = Ω̂ + (1/T)(B̃ − B̂)′(X′X)(B̃ − B̂).    (24.49)

Looking at the constrained MLE's of B and Ω we can see that they are direct extensions of the results in the case of m = 1 in Chapter 20.
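A sketch of the constrained MLE's (48)–(49), continuing the simulated example above; the particular restriction (the third row of B set to zero, i.e. D₁ = (0, 0, 1), C₁ = 0) is an illustrative assumption.

```python
D1 = np.array([[0.0, 0.0, 1.0]])           # p x k, p = 1: third row of B is zero
C1 = np.zeros((1, m))

XtX = X.T @ X
A = np.linalg.solve(XtX, D1.T) @ np.linalg.inv(D1 @ np.linalg.solve(XtX, D1.T))
B_tilde = B_hat - A @ (D1 @ B_hat + C1)                                      # (24.48)
Omega_tilde = Omega_hat + (B_tilde - B_hat).T @ XtX @ (B_tilde - B_hat) / T  # (24.49)
```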
Another important special case of (35) is the case where all the coefficients apart from the constant term are zero, i.e. (35) with

D₁ = (0, I_{k−1}),  C₁ = 0,

with B partitioned into its first row (the constants) and the remaining (k − 1) × m block B₍₂₎, so that H₀ takes the form B₍₂₎ = 0.
(2) Linear restrictions 'related' to y_t

The second form of restrictions to be considered is

BΓ₁ + Δ₁ = 0,    (24.50)

where Γ₁: m × q and Δ₁: k × q are known matrices. In contrast to (35), these restrictions operate on the columns of B and thus affect the form of the underlying probability model. The constrained MLE's of B and Ω take the form

B̃ = B̂ − (B̂Γ₁ + Δ₁)(Γ₁′Ω̂Γ₁)⁻¹Γ₁′Ω̂,    (24.58)
Ω̃ = (1/T)Ũ′Ũ = Ω̂ + (1/T)(B̃ − B̂)′(X′X)(B̃ − B̂)    (24.59)

(see Richard (1979)). If we compare (58) with (48) we can see that the main difference is that Ω enters the MLE estimator of B, in view of the fact that the restrictions (50) affect the form of the probability model. It is interesting to note that if we postmultiply (58) by Γ₁ it yields (54). The above formulae, (58), (59), will be of considerable value in Chapter 25.
(3) Linear restrictions 'related' to both y_t and X_t

A natural way to proceed is to combine the linear restrictions (35) and (50) in the form

D₂BΓ₂ + C = 0,    (24.60)

where D₂: p × k, Γ₂: m × q, C: p × q are known matrices with rank(D₂) = p, rank(Γ₂) = q. Using the Lagrangian function

ℓ(B, Ω, Λ) = −(T/2) log(det Ω) − ½ tr Ω⁻¹(Y − XB)′(Y − XB) − tr[Λ′(D₂BΓ₂ + C)],    (24.61)

we can show that the restricted MLE's are

B̃ = B̂ − (X′X)⁻¹D₂′[D₂(X′X)⁻¹D₂′]⁻¹(D₂B̂Γ₂ + C)(Γ₂′Ω̂Γ₂)⁻¹Γ₂′Ω̂,    (24.62)
Ω̃ = Ω̂ + (1/T)(B̃ − B̂)′(X′X)(B̃ − B̂).    (24.63)

An alternative way to derive (62) and (63) is to consider

D₂B* + C = 0    (24.64)

for the transformed specification

Y* = XB* + E,    (24.65)

where Y* = YΓ₂, B* = BΓ₂ and E = UΓ₂.
The linear restrictions in (60) can be written in vector form as

vec(D₂BΓ₂ + C) = (Γ₂′ ⊗ D₂) vec(B) + vec(C) = 0    (24.66)

or

(Γ₂′ ⊗ D₂)β* = r,    (24.67)

where β* = vec(B) and r = −vec(C). This suggests that an obvious way to generalise is to substitute for (Γ₂′ ⊗ D₂) an arbitrary p × km matrix R, formulating the restrictions as

Rβ* = r,    (24.68)

where rank(R) = p (p ≤ km). The restrictions in (68) represent the most general form of linear restrictions, in view of the fact that β* enables us to 'reach' each coefficient of B directly and impose within-equation and between-equations restrictions separately.
In the case where only within-equation linear restrictions are available, R is block-diagonal, i.e.

R = diag(R₁, R₂, …, R_m)  and  r = (r₁′, r₂′, …, r_m′)′,    (24.69)

R_i: p_i × k, rank(R_i) = p_i, r_i: p_i × 1, i = 1, 2, …, m.

Exclusion restrictions are a special case of within-equation restrictions where R_i has a unit sub-matrix, of dimension equal to the number of excluded variables, and zeros everywhere else. Across-equations linear restrictions can be accommodated in the off-diagonal sub-matrices R_ij, i, j = 1, 2, …, m, i ≠ j, of R, with R_ij referring to the restrictions between equations i and j.

The constrained MLE's under (68) are derived from the Lagrangian

ℓ(β*, Ω, λ) = −(T/2) log(det Ω) − ½(y* − X*β*)′(Ω⁻¹ ⊗ I_T)(y* − X*β*) + λ′(Rβ* − r),

with first-order conditions

∂ℓ/∂β* = X*′(Ω⁻¹ ⊗ I_T)(y* − X*β*) + R′λ = 0,    (24.73)
∂ℓ/∂Ω: −(T/2)Ω⁻¹ + ½Ω⁻¹[∑_{t=1}^T (y_t − B′x_t)(y_t − B′x_t)′]Ω⁻¹ = 0,    (24.74)
∂ℓ/∂λ = Rβ* − r = 0.    (24.75)
Looking at the above first-order conditions (73)–(75) we can see that they constitute a system of non-linear equations which cannot be solved explicitly unless Ω is assumed to be known. In the latter case (73) and (75) imply that

β̃* = β̂* − (X*′Ω*⁻¹X*)⁻¹R′[R(X*′Ω*⁻¹X*)⁻¹R′]⁻¹(Rβ̂* − r),    (24.76)

where

Ω* ≡ Ω ⊗ I_T    (24.77)

and

β̂* = (X*′Ω*⁻¹X*)⁻¹X*′Ω*⁻¹y*.    (24.78)

If we compare these formulae with those in the m = 1 case (see Chapter 20) we can see that the only difference (when Ω is known) is the presence of Ω. This is because in the m > 1 case the restrictions Rβ* = r affect the underlying probability model by restricting y_t. In the econometric literature the estimator (78) is known as the generalised least-squares (GLS) estimator. In practice Ω is unknown, and thus in order to 'solve' the conditions (73)–(75) we need to resort to iterative numerical optimisation (see Harvey (1981), Quandt (1983) inter alia).
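The GLS formula (78) can be written out mechanically with Kronecker products, as in the hedged sketch below (continuing the simulated example, with Ω treated as known). With X* = I_m ⊗ X and no restrictions, the Kronecker factors cancel and (78) collapses to vec(B̂), which is one way of seeing why Ω does not enter the unrestricted MLE.

```python
X_star = np.kron(np.eye(m), X)                   # I_m kron X, as in (24.9)
y_star = Y.flatten(order='F')                    # vec(Y): columns stacked
W = np.kron(np.linalg.inv(Omega), np.eye(T))     # (Omega kron I_T)^{-1}
beta_gls = np.linalg.solve(X_star.T @ W @ X_star,
                           X_star.T @ W @ y_star)   # (24.78)
# beta_gls coincides with B_hat.flatten(order='F') in the unrestricted case
```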
The purpose of the next section is to consider two special cases of (68) where the restrictions can be substituted directly into a reformulated statistical GM. These are the cases of exclusion and across-equations linear homogeneous restrictions. In these two cases the constrained MLE of β* takes a form similar to (78).
24.4 The Zellner and Malinvaud formulations
In econometric modelling two special cases of the general linear restrictions

Rβ* = r    (24.79)

are particularly useful. These are the exclusion and across-equations linear homogeneous restrictions. Consider the two-equation case

( y₁ₜ )   ( β₁₁ β₂₁ β₃₁ ) ( x₁ₜ )   ( u₁ₜ )
( y₂ₜ ) = ( β₁₂ β₂₂ β₃₂ ) ( x₂ₜ ) + ( u₂ₜ ),  t ∈ T.    (24.80)
                          ( x₃ₜ )

(i) Exclusion restrictions: β₂₁ = 0, β₃₂ = 0.
(ii) Across-equations linear homogeneous restrictions: β₁₁ = β₁₂.

It turns out that in these two cases the restrictions can be accommodated directly into a reformulation of the statistical GM and no constrained optimisation is necessary. The purpose of this section is to discuss the estimation of β* under these two forms of restrictions and derive explicit formulae which will prove useful in Chapter 25.
Let us consider the exclusion restrictions first. The vectorised form of

Y = XB + U,    (24.81)

as defined in the previous sections, takes the explicit form

( y₁ )   ( X 0 … 0 ) ( β₁ )   ( u₁ )
( y₂ ) = ( 0 X … 0 ) ( β₂ ) + ( u₂ )    (24.82)
(  ⋮ )   ( ⋮      ⋮ ) (  ⋮ )   (  ⋮ )
( y_m )  ( 0 0 … X ) ( β_m )  ( u_m )

or

y* = X*β* + u*    (24.83)

in an obvious notation. Exclusion restrictions can be accommodated directly into (82) by allowing the regressor matrix to differ across the regression equations, y_i = X_iβ_i° + u_i, i = 1, 2, …, m, and redefining the β_i's accordingly. That is, reformulate (82) into

( y₁ )   ( X₁ 0 … 0  ) ( β₁° )   ( u₁ )
( y₂ ) = ( 0 X₂ … 0  ) ( β₂° ) + ( u₂ )    (24.84)
(  ⋮ )   ( ⋮        ⋮ ) (  ⋮  )   (  ⋮ )
( y_m )  ( 0 0 … X_m ) ( β_m° )  ( u_m )

or

y* = X°β° + u*,    (24.85)

where X_i refers to the regressor data matrix for the i-th equation and β_i° to the corresponding vector of coefficients. For the example in (80), with the exclusion restrictions β₂₁ = 0, β₃₂ = 0, (84) takes the form

( y₁ )   ( X₁ 0  ) ( β₁° )   ( u₁ )
( y₂ ) = ( 0  X₂ ) ( β₂° ) + ( u₂ ),    (24.86)

where X₁ = (x₁, x₃), X₂ = (x₁, x₂), β₁° = (β₁₁, β₃₁)′ and β₂° = (β₁₂, β₂₂)′.
The formulation (84) is known as the seemingly unrelated regression equations (SURE) formulation, a term coined by Zellner (1962), because the m linear regression equations in (84) seem to be unrelated at first sight, but this turns out to be false. When different restrictions are placed on different equations, the original statistical GM is affected and the various equations become interrelated. In particular, the covariance matrix Ω enters the estimator of β°. As shown in the previous section, in the case where Ω is known the MLE of β° takes the form

β̂° = [X°′(Ω⁻¹ ⊗ I_T)X°]⁻¹X°′(Ω⁻¹ ⊗ I_T)y*.    (24.87)
Otherwise, the MLE is derived using some iterative numerical procedure. For this case Zellner (1962) suggested the two-step least-squares estimator

β̃° = [X°′(Ω̂⁻¹ ⊗ I_T)X°]⁻¹X°′(Ω̂⁻¹ ⊗ I_T)y*,    (24.88)

where Ω̂ = (1/T)Û′Û, Û = Y − XB̂. It is not very difficult to see that this estimator can be viewed as an approximation to the MLE defined in the previous section by the first-order conditions (73)–(75) where only two iterations were performed: one to derive Ω̂, and one to substitute it into (87).
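A sketch of the two-step estimator (88) for the two-equation exclusion example (86), continuing the simulation; block_diag is a small helper written out to keep the sketch self-contained, and step 1 reuses the unrestricted Ω̂ computed earlier.

```python
def block_diag(*blocks):
    # minimal block-diagonal helper for (24.84)
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

x1, x2, x3 = X[:, [0]], X[:, [1]], X[:, [2]]
X1 = np.hstack([x1, x3])          # equation 1 excludes x2 (beta_21 = 0)
X2 = np.hstack([x1, x2])          # equation 2 excludes x3 (beta_32 = 0)
Xo = block_diag(X1, X2)
yo = np.concatenate([Y[:, 0], Y[:, 1]])

W = np.kron(np.linalg.inv(Omega_hat), np.eye(T))            # step 2 weighting
beta_sure = np.linalg.solve(Xo.T @ W @ Xo, Xo.T @ W @ yo)   # (24.88)
```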
Zellner went on to show that if

lim_{T→∞} (1/T)[X°′(Ω⁻¹ ⊗ I_T)X°] = Q° < ∞,    (24.89)

Q° being non-singular, then the two-step estimator (88) is consistent and has the same asymptotic distribution as the estimator (87) based on a known Ω.

Let us now consider the case of across-equations linear homogeneous restrictions, such as β₁₁ = β₁₂ in example (80). Such restrictions can be accommodated into the formulation (82) directly by redefining the regressor matrix as
X_t° = ( x₁ₜ′ 0 … 0
        0 x₂ₜ′ … 0
        ⋮         ⋮
        0 0 … x_mt′ )    (24.92)

(where x_it refers to the regressors included in the i-th equation) and the coefficient vector β° so as to include only the independent coefficients. The form

y_t = X_t°β° + u_t    (24.93)

is said to be the Malinvaud form (see Malinvaud (1970)). For the above example the restriction β₁₁ = β₁₂ can be accommodated into (80) by defining X_t° and β° as

X_t° = ( x₁ₜ x₂ₜ x₃ₜ 0 0
        x₁ₜ 0 0 x₂ₜ x₃ₜ )  and  β° = (β₁₁, β₂₁, β₃₁, β₂₂, β₃₂)′.    (24.94)

The constrained MLE of β° in the case where Ω is known is

β̂° = [∑_{t=1}^T X_t°′Ω⁻¹X_t°]⁻¹ ∑_{t=1}^T X_t°′Ω⁻¹y_t.    (24.95)
Given that Ω is usually unknown, the MLE of β° as defined in the previous section by (73)–(75) can be approximated by the GLS estimator based on the iterative formula

β̂°₍ᵢ₎ = [∑_{t=1}^T X_t°′Ω̂₍ᵢ₎⁻¹X_t°]⁻¹ ∑_{t=1}^T X_t°′Ω̂₍ᵢ₎⁻¹y_t,  i = 1, 2, …, l,    (24.96)

where Ω̂₍ᵢ₎ is based on the residuals from the previous iteration and l refers to the number of iterations, which is either chosen a priori or determined by some convergence criterion such as

‖β̂°₍ᵢ₊₁₎ − β̂°₍ᵢ₎‖ < ε  for some ε > 0, e.g. ε = 0.001.    (24.97)
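A sketch of the iterative estimator (96)–(97) for the Malinvaud form, under the illustrative across-equations restriction β₁₁ = β₁₂ of (94), continuing the simulated example; the starting value Ω = I_m and the tolerance are assumptions.

```python
def X_t_star(x):
    # Malinvaud regressor matrix (24.94); beta_star = (b11, b21, b31, b22, b32)'
    x1, x2, x3 = x
    return np.array([[x1, x2, x3, 0.0, 0.0],
                     [x1, 0.0, 0.0, x2, x3]])

beta_star = np.zeros(5)
Om = np.eye(m)                                 # initial Omega: plain least squares
for _ in range(100):
    Om_inv = np.linalg.inv(Om)
    A = sum(X_t_star(X[t]).T @ Om_inv @ X_t_star(X[t]) for t in range(T))
    b = sum(X_t_star(X[t]).T @ Om_inv @ Y[t] for t in range(T))
    beta_new = np.linalg.solve(A, b)           # (24.96)
    resid = np.array([Y[t] - X_t_star(X[t]) @ beta_new for t in range(T)])
    Om = resid.T @ resid / T                   # update Omega from current residuals
    delta = np.max(np.abs(beta_new - beta_star))
    beta_star = beta_new
    if delta < 1e-6:                           # convergence criterion (24.97)
        break
```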
24.5 Specification testing

In the context of the linear regression model the F-type test proved to be by far the most useful test in both specification and misspecification analysis; see Chapters 19–22. The question which naturally arises is whether the F-type test can be extended to the multivariate linear regression model. The main purpose of this section is to derive an extended F-type test which serves the same purpose as the F-test in Chapters 19–22.
From Section 24.2 we know that for the MLE's B̂ and Ω̂:

(i) B̂ ∼ N(B, Ω ⊗ (X′X)⁻¹);    (24.99)
(ii) TΩ̂ ∼ W_m(Ω, T − k).    (24.100)

Using these results we can deduce that, in the case where we consider one regression from the system, say the i-th,

y_i = Xβ_i + u_i,    (24.101)

the MLE's of β_i and ω_ii are

β̂_i = (X′X)⁻¹X′y_i  and  ω̂_ii = (1/T)û_i′û_i,  û_i = y_i − Xβ̂_i.    (24.102)

Moreover, using the properties of the multivariate normal (see Chapter 15) and Wishart (see Appendix 24.1) distributions we can deduce that

β̂_i ∼ N(β_i, ω_ii(X′X)⁻¹)  and  T(ω̂_ii/ω_ii) ∼ χ²(T − k).    (24.103)
These results ensure that, in the case of linear restrictions related to β_i of the form H₀: R_iβ_i = r_i against H₁: R_iβ_i ≠ r_i, where R_i and r_i are p_i × k and p_i × 1 known matrices and rank(R_i) = p_i, the F-test based on the test statistic

FT(y) = [(R_iβ̂_i − r_i)′[R_i(X′X)⁻¹R_i′]⁻¹(R_iβ̂_i − r_i) / û_i′û_i] · [(T − k)/p_i] ∼ F(p_i, T − k)    (24.104)

is applicable without any changes. In particular, tests of significance for individual coefficients based on the usual Student's t test statistic (24.105) remain valid in the present context.
Let us now consider the derivation of a test for the null hypothesis

H₀: DB − C = 0  against  H₁: DB − C ≠ 0,    (24.106)

where D and C are p × k and p × m known matrices, rank(D) = p. A particularly important special case of (106) is when

D = (0, I_{k₂}): k₂ × k,  B = (B₁′, B₂′)′  and  C = 0: k₂ × m,

i.e. H₀: B₂ = 0 against H₁: B₂ ≠ 0.
The constrained MLE's of B and Ω under H₀ take the form

B̃ = B̂ − (X′X)⁻¹D′[D(X′X)⁻¹D′]⁻¹(DB̂ − C)    (24.107)

and

Ω̃ = Ω̂ + (1/T)(B̃ − B̂)′(X′X)(B̃ − B̂),    (24.108)

where B̂ = (X′X)⁻¹X′Y, Ω̂ = (1/T)Û′Û, Û = Y − XB̂ are the unconstrained MLE's of B and Ω (see Section 24.3 above). Using the same intuitive argument as in the m = 1 case (see Chapter 20), a test for H₀ could be based on the distance
‖DB̂ − C‖.    (24.109)

The closer this distance is to zero, the more the support for H₀. If we normalise this distance by defining the matrix quadratic form

Ω̂⁻¹(DB̂ − C)′[D(X′X)⁻¹D′]⁻¹(DB̂ − C),    (24.110)

the similarity between (110) and the F-test statistic (104) is all too apparent. Moreover, in view of the equality

Ũ′Ũ = Û′Û + (DB̂ − C)′[D(X′X)⁻¹D′]⁻¹(DB̂ − C)    (24.111)

stemming from (108), (110) can be written in the form

G̃ = (Ũ′Ũ − Û′Û)(Û′Û)⁻¹,    (24.112)

where Ũ = Y − XB̃. This form constitutes a direct extension of the F-test statistic to the general m > 1 case. Continuing the analogy, we can show that
Û′Û ∼ W_m(Ω, T − k),  T ≥ m + k,    (24.113)

where Û′Û = U′M_X U, M_X = I_T − X(X′X)⁻¹X′. Moreover, in view of (112), the distribution of Ũ′Ũ − Û′Û is a direct extension of the non-central chi-square distribution, the non-central Wishart, denoted by

(Ũ′Ũ − Û′Û) ∼ W_m(Ω, p; Δ),    (24.114)

where

Δ = Ω⁻¹(DB − C)′[D(X′X)⁻¹D′]⁻¹(DB − C)    (24.115)

is the non-centrality parameter. This is because (Ũ′Ũ − Û′Û) = U′M_D U, where

M_D = X(X′X)⁻¹D′[D(X′X)⁻¹D′]⁻¹D(X′X)⁻¹X′,    (24.116)

and M_D is a symmetric idempotent matrix (M_D = M_D′, M_D M_D = M_D) with rank(M_D) = rank(D) = p. Given that M_D and M_X are orthogonal, i.e.

M_D M_X = 0,    (24.117)

we can also deduce that U′M_D U and U′M_X U are independently distributed
(see Chapter 15). The analogy between the F-test statistic

FT(y) = [(ũ′ũ − û′û)/û′û] · [(T − k)/p] ∼ F(p, T − k)    (24.118)

in the case m = 1, and G̃ as defined in (112), is now established. The problem, however, is that G̃ is a random m × m matrix, not a random variable as in the case of (118). The obvious way to reduce a matrix to a scalar is to use a real-valued matrix function such as the trace or the determinant:

τ₁(Y) = tr[(Ũ′Ũ − Û′Û)(Û′Û)⁻¹],    (24.119)
τ₂(Y) = det[(Ũ′Ũ − Û′Û)(Û′Û)⁻¹].    (24.120)

In order to construct tests for H₀ against H₁ using the rejection regions
C_i = {Y: τ_i(Y) ≥ c_i},  i = 1, 2,    (24.121)

we need the distributions of the test statistics τ₁(Y) and τ₂(Y). These distributions can be derived from the joint distribution of the eigenvalues λ₁, λ₂, …, λ_l of G̃, where l = min(m, p), because

τ₁(Y) = ∑_{i=1}^l λ_i  and  τ₂(Y) = ∏_{i=1}^l λ_i.    (24.122)

The distribution of λ ≡ (λ₁, λ₂, …, λ_l) was derived by Constantine (1963) and James (1964) in terms of a zonal polynomial expansion. Constantine (1966) went on to derive the distribution of τ₁(y) in terms of generalised Laguerre polynomials, which is rather too complicated to be used directly. For practical purposes, tables relating to the upper percentage points of

τ₁*(y) = (T − k)τ₁(y)    (24.123)

have been constructed (see Davis (1970)). For large T − k we can also use the asymptotic result

τ₁*(y) ∼ χ²(mp) under H₀    (24.124)

in order to derive c₁ in (121). The test statistic τ₁*(y) is known as the Lawley–Hotelling trace statistic. Similar results can be derived for the determinantal ratio test statistic

τ₂*(y) = (T − k)τ₂(y).    (24.125)
The test statistics τ₁(y) and τ₂(y) can be interpreted as arising from the Wald test procedure discussed in Chapter 16. Using the other two test procedures, the likelihood ratio and Lagrange multiplier procedures, we can construct alternative tests for H₀ against H₁. The likelihood ratio test procedure gives rise to the test statistic

LR(Y) = [max_{θ∈Θ₀} L(θ; Y) / max_{θ∈Θ} L(θ; Y)]^(2/T) = det(Û′Û)/det(Ũ′Ũ) = det(Ω̂)/det(Ω̃).    (24.126)

In terms of the eigenvalues of G̃ this test statistic takes the form

LR(Y) = ∏_{i=1}^l [1/(1 + λ_i)],    (24.127)

and thus its rejection region is defined by

C₀ = {Y: LR(Y) ≤ c₀},    (24.128)

c₀ being determined by the distribution of LR(Y) under H₀. This distribution can be expressed as the product of p independent beta-distributed random variables (see Johnson and Kotz (1970), Anderson (1984), inter alia). For large T we might also use the asymptotic result

−T* log LR(y) ∼ χ²(mp) under H₀,    (24.129)

where T* = T − k − ½(m − p + 1) (p ≥ m); see Davis (1979) for tables of upper percentage points.
The Lagrange multiplier test statistic, based on the derivative of the log likelihood evaluated at the constrained MLE's, can be expressed in the form

LM(Y) = tr(Ḡ),    (24.131)

where

Ḡ = (Ũ′Ũ − Û′Û)(Ũ′Ũ)⁻¹.    (24.132)
This test statistic is known in the statistical literature as Pillai's trace test statistic, because it was suggested by Pillai (1955), not as a Lagrange multiplier test statistic. In terms of the eigenvalues λ₁, λ₂, …, λ_l this statistic takes the form

LM(Y) = ∑_{i=1}^l [λ_i/(1 + λ_i)].    (24.133)

The distribution of LM(Y) was obtained by Pillai and Jayachandran (1970), but this is also rather complicated to be used directly, and several approximations have been suggested in the literature; see Pillai (1976), (1977) for references. For large T − k the critical value c_α for a rejection region identical to (121) can be based on the asymptotic result

(T − k)LM(Y) ∼ χ²(mp) under H₀.    (24.134)
A similar test statistic, known as Wilks' ratio test statistic, is defined as the other matrix scalar function of Ḡ, the determinant:

τ₃(Y) = det(Ḡ).    (24.135)

In terms of the eigenvalues of G̃ this test statistic is

τ₃(Y) = ∏_{i=1}^l [λ_i/(1 + λ_i)].    (24.136)
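Continuing the running example (with B̃ from the restricted-estimation sketch in Section 24.3, so that p = 1), all four statistics follow from the eigenvalues of G̃; the asymptotic calibrations (124), (129) and (134) can then be applied.

```python
U_tilde = Y - X @ B_tilde                      # restricted residuals
S_hat = U_hat.T @ U_hat
S_tilde = U_tilde.T @ U_tilde

p = 1                                          # number of restrictions
l = min(m, p)
lam = np.linalg.eigvals((S_tilde - S_hat) @ np.linalg.inv(S_hat)).real
lam = np.sort(lam)[::-1][:l]                   # keep the l = min(m, p) non-zero eigenvalues

tau1 = lam.sum()                   # Lawley-Hotelling trace, (24.119)/(24.122)
tau2 = lam.prod()                  # determinant statistic, (24.120)/(24.122)
LR = np.prod(1.0 / (1.0 + lam))    # likelihood ratio, (24.127)
LM = np.sum(lam / (1.0 + lam))     # Pillai's trace, (24.133)
```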
It is interesting to note that Ḡ as defined above is directly related to the multivariate goodness-of-fit measure G defined in Section 24.2; note that

G = (Y′Y − Û′Û)(Y′Y)⁻¹.    (24.137)

In order to see the relationship let us consider the special case where

H₀: B₂ = 0,  H₁: B₂ ≠ 0,    (24.138)

and

Y = X₁B₁ + X₂B₂ + U.    (24.139)

Defining the restricted residuals by Ũ = Y − X₁B̃₁, where B̃₁ = (X₁′X₁)⁻¹X₁′Y, we can relate Ḡ to the goodness-of-fit matrix of the auxiliary multiple regression

Ũ = X₁B₁* + X₂B₂* + V.    (24.140)

All the tests based on the test statistics mentioned so far are unbiased, but no uniformly most powerful test exists for H₀ against H₁; see Giri (1977) and Hart and Money (1976) for power comparisons.
A particularly important special case of H₀: DB − C = 0 against H₁: DB − C ≠ 0 is the case where the sample period is divided into two sub-samples, say T₁ = (1, 2, …, T₁) and T₂ = (T₁ + 1, …, T), where T − T₁ = T₂ and T₁, T₂ > k. If we allow the conditional means for the two sub-periods to be different, i.e.

t ∈ T₁: Y₁ = X₁B₁ + U₁,    (24.141)
t ∈ T₂: Y₂ = X₂B₂ + U₂,    (24.142)

but the conditional variances to be the same, i.e.

Cov(Y_i/X_i) = Ω ⊗ I_{T_i},  i = 1, 2,    (24.143)

then the hypothesis H₀: B₁ = B₂ against H₁: B₁ ≠ B₂ can be accommodated into the above formulation with

D = (I_k, −I_k),  B = (B₁′, B₂′)′,  C = 0,

( Y₁ )   ( X₁ 0  ) ( B₁ )   ( U₁ )
( Y₂ ) = ( 0  X₂ ) ( B₂ ) + ( U₂ ).    (24.144)

This is a direct extension of the F-test for structural change considered in Chapter 21. In the same chapter it was argued that we need to test the equality of the conditional variances before we apply the coefficient constancy test.
24.6 Misspecification testing
Misspecification testing in the context of the multivariate linear regression model is of considerable interest in econometric modelling because of its relationship to the simultaneous equations model to be discussed in Chapter 25. As argued above, the latter model is a reparametrisation of the former, and the reparametrisation can at best be as well defined (statistically) as the statistical parameters θ = (B, Ω). In practice, before any questions related to the theoretical parameters of interest ξ can be asked, the misspecification testing for θ must be successfully completed.

As far as assumptions [1]–[8] are concerned, the discussion in Chapters 19 to 22 is directly applicable with minor modifications. Let us consider the probability and sampling model assumptions in the present case where m > 1.
Normality. The assumption that D(y_t/X_t; θ) is multivariate normal can be tested using a multivariate extension of the skewness–kurtosis test. The skewness and kurtosis coefficients for a random vector u_t with mean zero and covariance matrix Ω are defined by

α₃,m = E[(u_t′Ω⁻¹u_s)³]  and  α₄,m = E[(u_t′Ω⁻¹u_t)²],  t ≠ s.    (24.157)

Using the sample analogues (160) and (161), based on the residuals û_t, we can define separate tests for the hypotheses

H₀′: α₃,m = 0  against  H₁′: α₃,m ≠ 0

and

H₀″: α₄,m = m(m + 2)  against  H₁″: α₄,m ≠ m(m + 2),

respectively. When D(y_t/X_t; θ) is normal, both H₀′ and H₀″ are valid.
As in the case where m = 1, the above tests based on the residual skewness and kurtosis coefficients are rather sensitive to outliers and should be used with caution. For further discussion of tests for multivariate normality see Mardia (1980), Small (1980) and Seber (1984), inter alia.
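A sketch of sample analogues of (157) computed from the residuals of the running example; since the sample versions (158)–(161) are not reproduced above, the normalisations below are assumptions based on Mardia's standard definitions.

```python
Om_inv = np.linalg.inv(Omega_hat)
Dmat = U_hat @ Om_inv @ U_hat.T        # (s, t) entry: u_s' Omega^{-1} u_t
a3 = (Dmat ** 3).mean()                # sample skewness; approx 0 under normality
a4 = (np.diag(Dmat) ** 2).mean()       # sample kurtosis; approx m(m + 2) under normality
```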
Linearity. A test for the linearity of the conditional mean can be based on the auxiliary regression

û_t = (B₀ − B)′x_t + Γ′ψ_t + ε_t,  t = 1, 2, …, T,    (24.163)

where ψ_t = (ψ₁ₜ, …, ψ_qt)′ are the higher-order terms related to the Kolmogorov–Gabor or RESET type polynomials (see (21.10) and (21.11)). The hypothesis to be tested in the present case takes the form

H₀: Γ = 0  against  H₁: Γ ≠ 0.    (24.164)

This hypothesis can be tested using any one of the tests τ_i(Y), i = 1, 2, 3, LR(Y) or LM(Y) discussed in Section 24.5.
Homoskedasticity. A direct extension of the White test for departures from homoskedasticity is based on the multivariate linear auxiliary regression

φ̂_t = C₀ + C₁ψ_t + ε_t,    (24.165)

where φ̂_t = (φ̂₁ₜ, φ̂₂ₜ, …, φ̂_qt)′ consists of the distinct products of the residuals,

φ̂_ijt = û_it û_jt,  i ≥ j,  i, j = 1, 2, …, m,  q = ½m(m + 1),    (24.166)

and ψ_t includes the squares and cross-products of the regressors. Testing for homoskedasticity can be based on

H₀: C₁ = 0  against  H₁: C₁ ≠ 0,    (24.167)

a linear set of restrictions which can be tested using the tests discussed in Section 24.5 above. The main difference from the m = 1 case is that we now have the cross-products of the residuals in addition to their squares.
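A minimal sketch of the auxiliary regression (165)–(167) for the running example; taking ψ_t to be the squares and cross-products of the regressors is an assumption in the spirit of the White test.

```python
iu, ju = np.triu_indices(m)
Phi = U_hat[:, iu] * U_hat[:, ju]          # distinct residual products, (24.166)
ix, jx = np.triu_indices(k)
Psi = np.column_stack([np.ones(T), X[:, ix] * X[:, jx]])
C_aux = np.linalg.lstsq(Psi, Phi, rcond=None)[0]   # auxiliary regression (24.165)
# H0 in (24.167): all rows of C_aux after the intercept row are zero
```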
Time invariance and structural change. The discussion of departures from the time invariance of θ = (β, σ²) in the m = 1 case was based on the behaviour of the recursive estimators of θ, say θ̂_t, t = k + 1, …, T. This discussion can be generalised directly to the multivariate case where θ = (B, Ω) without any great difficulty. The same applies to the discussion of structural change, whose tests were considered briefly in Section 24.5.

Independence. Using the analogy with the m = 1 case we can argue that
when {Z_t, t ∈ T}, Z_t ≡ (y_t′, X_t′)′, is assumed to be a normal, stationary and l-th order Markov process (see Chapter 8), the statistical GM takes the form

y_t = B₀′x_t + ∑_{i=1}^l A_i′y_{t−i} + ∑_{i=1}^l B_i′x_{t−i} + ε_t,    (24.168)

where {ε_t, t > l} is a vector innovation process. If we compare (168) with the statistical GM under independence,

y_t = B′x_t + u_t,    (24.169)

we can see that the independence assumption can be tested using the auxiliary multivariate regression

û_t = (B₀ − B)′x_t + ∑_{i=1}^l [A_i′y_{t−i} + B_i′x_{t−i}] + v_t.    (24.170)
In particular, the hypothesis of interest takes the form

H₀: A_i = 0 and B_i = 0 for all i = 1, 2, …, l

against

H₁: A_i ≠ 0 or B_i ≠ 0 for any i = 1, …, l.

This hypothesis can be tested using the multivariate F-type tests discussed in Section 24.5.

The independence test which corresponds to the autocorrelation approach (see Chapter 22) could be based on the auxiliary multivariate regression

û_t = D₀′x_t + C₁û_{t−1} + ⋯ + C_l û_{t−l} + v_t.    (24.171)

That is, test H₀: C₁ = C₂ = ⋯ = C_l = 0 against H₁: C_i ≠ 0 for any i = 1, 2, …, l. This can also be tested using the tests developed in Section 24.5 for linear restrictions.
Testing for departures from the independence assumption is particularly important in econometric modelling with time-series data. When the assumption is inappropriate, a respecification of the multivariate linear regression model gives rise to the multivariate dynamic linear regression model discussed below.

24.7 Prediction
In view of the assumption that

y_t = B′x_t + u_t,  t ∈ T,    (24.172)

the best predictor of y_{T+l}, given that the observations t = 1, 2, …, T were used to estimate B and Ω, can only be its conditional expectation

ŷ_{T+l} = B̂′x_{T+l},  l = 1, 2, …,    (24.173)

where x_{T+l} represents the observed value of the random vector X_t at t = T + l. The prediction error is

e_{T+l} = y_{T+l} − ŷ_{T+l} = (B − B̂)′x_{T+l} + u_{T+l}.    (24.174)

Given that e_{T+l} is a linear function of normally distributed r.v.'s,

e_{T+l} ∼ N(0, Ω(1 + x_{T+l}′(X′X)⁻¹x_{T+l}))    (24.175)

(see exercise 7), which is a direct generalisation of the prediction error distribution in the case of the linear regression model. Since Ω is unknown, its unbiased estimator

S = [1/(T − k)] Û′Û    (24.176)

is used to construct the prediction test statistic

H = (y_{T+l} − ŷ_{T+l})′S_T⁻¹(y_{T+l} − ŷ_{T+l}),    (24.177)

where

S_T = S(1 + x_{T+l}′(X′X)⁻¹x_{T+l}).    (24.178)

Hotelling (1931) showed that

[(T − k − m + 1)/((T − k)m)] H ∼ F(m, T − k − m + 1),    (24.179)

and this can be used to test hypotheses about the predictions or construct prediction regions.
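A sketch of the prediction statistic (176)–(179) for a single new observation, continuing the simulated example; scipy is assumed to be available only for the F critical value.

```python
from scipy.stats import f

x_new = rng.normal(size=k)
y_new = x_new @ B + rng.multivariate_normal(np.zeros(m), Omega)

S = U_hat.T @ U_hat / (T - k)                       # unbiased estimator, (24.176)
c = 1.0 + x_new @ np.linalg.solve(X.T @ X, x_new)   # scale factor in (24.178)
e = y_new - x_new @ B_hat
H = e @ np.linalg.solve(S * c, e)                   # (24.177)

F_stat = H * (T - k - m + 1) / ((T - k) * m)        # (24.179)
reject = F_stat > f.ppf(0.95, m, T - k - m + 1)
```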
Trang 29[1] = Ey,/ø(y?- 1} X? =x,) and u, =W,— Ely,/o(¥?_ 1)» X?= x?)
{2] 6*=(A,, ,A;,B,B,, ,B,,Q,) are the statistical parameters of interest
[3] X, is strongly exogenous with respect to 0* [4] The roots of the matrix polynomial
i1
(21 y Aa')=0
¡=1 lie within the unit circle
[5] Rank(X*) =k* where X*=(Y_,, ,Y¥_),X,X_,, ,Ä_¡),k*= mk + lm(k — 1) + lmẺ dd Probability model 0={ Diy 22 0*) (det Qo) gma X exp{ —34y, — B*X?JO¿ '(y,— B#X?)}, XO - 0*c@,¡íc uh (24.181)
[6] (i) Dy,/Zp_ ,; 0*) — normal:
(ii) E(y,/a(Vp_ 1), XP =xp) = B*'X* — linear in X*; (iii) Covty,/o(¥P_,), X°2=x°)=Q, - homoskedastic; [7] Ø* 1s time invariant dd) Sampling model [8] Y =(Y,, , Y,)'isa non-random sample sequentially drawn from Dy,/Zp_,; 0"), t=1,2, , T, respectively Note: B* lll (A1, Á›; A¡, Bọ, By BY), *— r x =(W,Si:Ÿi~2s -sŸr—p Xe ¬
The estimation, misspecification and specification testing in the context of this statistical model follows closely that of the multivariate linear regression model considered in Sections 24.2–24.5 above. The modifications to these results needed to apply to the MDLR model are analogous to the ones considered in the context of the m = 1 case (see Chapter 23). In particular, the approximate MLE's of θ*,

B̂* = (X*′X*)⁻¹X*′Y    (24.182)

and

Ω̂₀ = (1/T) Û*′Û*,  Û* = Y − X*B̂*,    (24.183)

behave asymptotically like B̂ and Ω̂ (see Anderson and Taylor (1979)). Moreover, the multivariate F-type tests considered in Section 24.5 are asymptotically justifiable in the context of the MDLR model.
For policy analysis and prediction purposes it is helpful to reformulate the statistical GM (180) in order to express it in the first-order autoregression form

y_t* = A†y*_{t−1} + B†Z_t* + u_t*,    (24.184)

where

y_t* = (y_t′, y_{t−1}′, …, y_{t−l+1}′)′,  u_t* = (u_t′, 0, …, 0)′,  Z_t* = (x_t′, x_{t−1}′, …, x_{t−l}′)′,

A† = ( A₁′ A₂′ … A_{l−1}′ A_l′
       I_m 0 … 0 0
       0 I_m … 0 0
       ⋮            ⋮
       0 0 … I_m 0 ),

B† = ( B₀′ B₁′ … B_l′
       0 0 … 0
       ⋮        ⋮
       0 0 … 0 ).

This form can be viewed as a first-order non-homogeneous vector difference equation (see Miller (1968)) which becomes

y_t* = (A†)ᵗ y₀* + ∑_{i=0}^{t−1} (A†)ⁱB†Z*_{t−i} + ∑_{i=0}^{t−1} (A†)ⁱu*_{t−i}    (24.185)

by repeated substitution. Assuming that lim_{t→∞}(A†)ᵗ = 0 (compare with |α₁| < 1 in the m = 1 case), the solution of (184) is

y_t* = ∑_{i=0}^∞ (A†)ⁱB†Z*_{t−i} + ∑_{i=0}^∞ (A†)ⁱu*_{t−i}.    (24.186)
This is known in the econometric literature as the final form, with

M₀ = B†    (24.187)

and

M_τ = (A†)^τ B†,  τ = 1, 2, …,    (24.188)

known as the impact and interim multipliers of delay τ, respectively. The long-run multiplier matrix is defined by

L = ∑_{τ=0}^∞ (A†)^τ B† = (I − A†)⁻¹B†.    (24.189)
The elements l_ij of this matrix refer to the total expected response of the j-th endogenous variable to a sustained unit change in the i-th exogenous variable, holding the other exogenous variables constant (see Schmidt (1973), (1979), for a discussion of the statistical analysis of these multipliers).

Returning to the question of prediction, we can see that the natural predictor for y_{T+1}, given y₁, …, y_T and x₁, …, x_{T+1}, is

ŷ*_{T+1} = Â†y_T* + B̂†Z*_{T+1}.    (24.190)

In order to predict y_{T+2} we need to know x_{T+1} and x_{T+2} as well as y_{T+1}. Assuming that x_{T+1} and x_{T+2} are available, we can use the predictor ŷ*_{T+1} in order to get

ŷ*_{T+2} = Â†ŷ*_{T+1} + B̂†Z*_{T+2} = Â†(Â†y_T* + B̂†Z*_{T+1}) + B̂†Z*_{T+2}
         = (Â†)²y_T* + Â†B̂†Z*_{T+1} + B̂†Z*_{T+2}.    (24.191)

Hence,

ŷ*_{T+τ} = (Â†)^τ y_T* + ∑_{j=1}^τ (Â†)^{τ−j}B̂†Z*_{T+j},  τ = 1, 2, …,    (24.192)

will provide predictions for future values of y_t, assuming that the values taken by the regressors are available. For the asymptotic covariance matrix of the prediction error (y_{T+τ} − ŷ_{T+τ}) see Schmidt (1974).
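A sketch of the multipliers (187)–(189) for a hypothetical first-order (l = 1) system, in which A† = A₁′ and B† = (B₀′, B₁′); the numerical values of A₁, B₀ and B₁ are illustrative assumptions.

```python
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])            # m x m lag matrix, eigenvalues inside the unit circle
B0 = rng.normal(size=(k, m))
B1 = rng.normal(size=(k, m))

A_dag = A1.T                            # companion matrix for l = 1
B_dag = np.hstack([B0.T, B1.T])         # acts on z_t* = (x_t', x_{t-1}')'

M0 = B_dag                                                # impact multiplier, (24.187)
M = [np.linalg.matrix_power(A_dag, tau) @ B_dag
     for tau in range(1, 5)]                              # interim multipliers, (24.188)
L_run = np.linalg.solve(np.eye(m) - A_dag, B_dag)         # long-run multipliers, (24.189)
```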
Prediction in the context of the multivariate (dynamic) linear regression model is particularly important in econometric modelling because of its relationship with the simultaneous equations model discussed in the next chapter. The above discussion of prediction carries over to the simultaneous equations model with minor modifications, and the concepts of impact, interim and long-run multipliers are useful in simulations and policy analysis.
Appendix 24.1 — The Wishart distribution

Let {Z_t, t ∈ T} be a sequence of n × 1 independent random vectors such that Z_t ∼ N(0, Ω), t ∈ T. Then, for T ≥ n, S = ∑_{t=1}^T Z_tZ_t′ ∼ W_n(Ω, T), where the density function of S is

D(S; Ω) = c⁻¹ (det S)^((T−n−1)/2) exp{−½ tr Ω⁻¹S},

where

c = 2^(Tn/2) π^(n(n−1)/4) (det Ω)^(T/2) ∏_{i=1}^n Γ((T + 1 − i)/2),

Γ(·) being the gamma function (see Press (1972)).
Properties of the Wishart distribution

(i) If S₁, …, S_k are n × n independent random matrices and S_i ∼ W_n(Ω, T_i), i = 1, 2, …, k, then ∑_{i=1}^k S_i ∼ W_n(Ω, T), where T = ∑_{i=1}^k T_i.
(ii) If S ∼ W_n(Ω, T) and M is a k × n matrix of rank k, then MSM′ ∼ W_k(MΩM′, T)

(see Muirhead (1982), Press (1972), inter alia).

These results enable us to deduce that if S ∼ W_n(Ω, T) and S and Ω are partitioned conformably as

S = ( S₁₁ S₁₂ ; S₂₁ S₂₂ ),  Ω = ( Ω₁₁ Ω₁₂ ; Ω₂₁ Ω₂₂ ),

then:

(a) S_ii ∼ W_{n_i}(Ω_ii, T), i = 1, 2, where n₁ + n₂ = n, S₁₁: n₁ × n₁, S₂₂: n₂ × n₂;
(b) S₁₁ and S₂₂ are independent if Ω₁₂ = 0;
(c) (S₁₁ − S₁₂S₂₂⁻¹S₂₁) ∼ W_{n₁}(Ω₁₁ − Ω₁₂Ω₂₂⁻¹Ω₂₁, T − n₂) and is independent of S₁₂ and S₂₂;
(d) (S₁₂/S₂₂) ∼ N(Ω₁₂Ω₂₂⁻¹S₂₂, (Ω₁₁ − Ω₁₂Ω₂₂⁻¹Ω₂₁) ⊗ S₂₂).
Appendix 24.2 — Kronecker products and matrix differentiation

Let A, B, C and D be arbitrary matrices; then:

(i) A ⊗ (αB) = α(A ⊗ B), α being a scalar;
(ii) (A + B) ⊗ C = A ⊗ C + B ⊗ C, A and B being of the same order;
(iii) A ⊗ (B + C) = A ⊗ B + A ⊗ C, B and C being of the same order;
(iv) A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C;
(v) (A ⊗ B)′ = A′ ⊗ B′;
(vi) (A ⊗ B)(C ⊗ D) = AC ⊗ BD;
(vii) (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹, A and B being square non-singular matrices;
(viii) vec(ACB) = (B′ ⊗ A) vec(C);
(ix) tr(A ⊗ B) = (tr A)(tr B), A and B being square matrices;
(x) det(A ⊗ B) = (det A)^m (det B)^n, A and B being n × n and m × m matrices;
(xi) vec(A + B) = vec(A) + vec(B), A and B being of the same order;
(xii) tr(AB) = vec(B′)′ vec(A).

Useful derivatives

∂ log(det A)/∂A = (A⁻¹)′;
∂ tr(AB)/∂B = A′;
∂ tr(A′B)/∂B = A;
∂ tr(X′AXB)/∂X = AXB + A′XB′;
∂ vec(AXB)/∂ vec(X) = B′ ⊗ A.
Important concepts

Wishart distribution, trace correlation, coefficient of alienation, iterative numerical optimisation, SURE and Malinvaud formulations, estimated GLS estimator, exclusion restrictions, linear homogeneous restrictions.

Questions

1. Compare the linear regression and multivariate linear regression statistical models.
2. Explain how linearity and homoskedasticity are related to the normality of Z_t.
3. Compare the formulations Y = XB + U and y* = X*β* + u*.
4. 'How do you explain the fact that, although y₁ₜ, y₂ₜ, …, y_mt are correlated, the MLE estimator of B, given by

B̂ = (X′X)⁻¹X′Y,

does not involve Ω?' Derive the distribution of B̂.
5. Explain why the assumption T ≥ m + k is needed for the existence of the MLE Ω̂ of Ω. Discuss the distribution of Ω̂.
6. Discuss the relationship between the goodness-of-fit measures d₁ = (1/m) tr G and d₂ = det(G), where

G = I − (Y′Y)⁻¹Û′Û,

with d = [det(Σ̂)]/[det(Σ̂₁₁) det(Σ̂₂₂)] known as Hotelling's alienation coefficient.
7. State the distributions of the MLE's B̂ and Ω̂ and discuss the properties which can be deduced from their distributions.
8. Discuss intuitively how the conditions
(i) lim_{T→∞}(X′X)_T⁻¹ = 0, and
(ii) |λ_max(X′X)_T / λ_min(X′X)_T| ≤ K for all T
imply that B̂ → B.
9. Give two examples for each of the following forms of restrictions:
(i) D₁B + C₁ = 0;
(ii) BΓ₁ + Δ₁ = 0.
Discuss the differences between them.
13. Explain how the GLS-type estimators of β* for the Zellner and Malinvaud formulations can be derived using some numerical optimisation iterative formula.
14. Discuss the question of constructing a test for H₀: DB − C = 0 against H₁: DB − C ≠ 0.
15. Compare the following test statistics defined in Section 24.5: τ₁(Y), τ₂(Y), LR(Y), LM(Y), τ₃(Y).
16. Discuss the question of testing for departures from normality and compare it with the same test for the m = 1 case.
17. Explain how you would go about testing for linearity, homoskedasticity and independence in the context of the multivariate linear regression model.
18. 'Misspecification testing in the context of the multivariate dynamic linear regression model is related to that of the non-dynamic model in the same way as the dynamic linear regression is related to the linear regression model.' Discuss.
19. Explain the concepts of impact, interim and long-run multipliers and discuss their usefulness.
Exercises
1. Verify the following:
(i) vec(B) ≠ vec(B′);
(ii) Cov(vec(Y′)) = I_T ⊗ Ω;
(iii) ∑_t (y_t − B′x_t)′Ω⁻¹(y_t − B′x_t) = tr Ω⁻¹(Y − XB)′(Y − XB).
2. Using the relationships L² = L, P² = P and LP = 0, show that for L = G*(G*′X′XG*)⁻¹G*′X′X, P takes the form

P = (X′X)⁻¹D₁′[D₁(X′X)⁻¹D₁′]⁻¹D₁,

where D = (D₁′, D*′)′ and (G₁, G*) = −D⁻¹ (see Section 24.3).
3. Verify the formulae (58), (59) and (62), (63).
4. Consider a system of two equations (m = 2) with three regressors, as in (80). Discuss the estimation of this system in the following three cases:
(i) no a priori restrictions;
(ii) β₃₁ = 0, β₁₂ = 0;
(iii) β₃₁ = β₃₂.
5. Derive the F-type test for H₀: DB − C = 0 against H₁: DB − C ≠ 0.
6. 'In testing for departures from the assumption of independence we can use either of the following auxiliary equations:

û_t = ∑_{i=1}^l A_i′y_{t−i} + ∑_{i=1}^l B_i′x_{t−i} + v_t,

û_t = (B₀ − B)′x_t + ∑_{i=1}^l A_i′y_{t−i} + ∑_{i=1}^l B_i′x_{t−i} + v_t*,

because X′Û = 0 and thus both cases should give the same answer.' Discuss.
7. Construct a 1 − α prediction region for y_{T+1}.