Statistical inference for normal linear models uses sampling distributions derived from quadratic forms in multivariate normal random variables. We now review the multivariate normal distribution and related sampling distributions.
3.1.1 Multivariate Normal Distribution
Let $N(\boldsymbol{\mu}, \mathbf{V})$ denote the multivariate normal distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{V}$. If $\mathbf{y} = (y_1, \ldots, y_n)^T$ has this distribution and $\mathbf{V}$ is positive definite, then the probability density function (pdf) is
$$f(\mathbf{y}) = (2\pi)^{-n/2}\,|\mathbf{V}|^{-1/2} \exp\left[-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^T \mathbf{V}^{-1}(\mathbf{y} - \boldsymbol{\mu})\right],$$
where $|\mathbf{V}|$ denotes the determinant of $\mathbf{V}$. Here are a few properties, when $\mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{V})$:
• If $\mathbf{x} = \mathbf{A}\mathbf{y} + \mathbf{b}$, then $\mathbf{x} \sim N(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\, \mathbf{A}\mathbf{V}\mathbf{A}^T)$.
• Suppose that $\mathbf{y}$ partitions as
$$\mathbf{y} = \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{pmatrix}, \quad \text{with} \quad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix} \quad \text{and} \quad \mathbf{V} = \begin{pmatrix} \mathbf{V}_{11} & \mathbf{V}_{12} \\ \mathbf{V}_{21} & \mathbf{V}_{22} \end{pmatrix}.$$
The marginal distribution of $\mathbf{y}_a$ is $N(\boldsymbol{\mu}_a, \mathbf{V}_{aa})$, $a = 1, 2$. The conditional distribution
$$(\mathbf{y}_1 \mid \mathbf{y}_2) \sim N\left[\boldsymbol{\mu}_1 + \mathbf{V}_{12}\mathbf{V}_{22}^{-1}(\mathbf{y}_2 - \boldsymbol{\mu}_2),\; \mathbf{V}_{11} - \mathbf{V}_{12}\mathbf{V}_{22}^{-1}\mathbf{V}_{21}\right]$$
(a NumPy sketch after this list illustrates these formulas). In addition, $\mathbf{y}_1$ and $\mathbf{y}_2$ are independent if and only if $\mathbf{V}_{12} = \mathbf{0}$.
• From the previous property, if $\mathbf{V} = \sigma^2\mathbf{I}$, then $y_i \sim N(\mu_i, \sigma^2)$ and $\{y_i\}$ are independent.
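To make the partition formulas concrete, here is a minimal NumPy sketch; the $3 \times 3$ covariance matrix and the observed value of $\mathbf{y}_2$ are made-up illustrative numbers, not from the text.

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
V = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.5],
              [0.3, 0.5, 1.0]])

# Partition with y1 = first component, y2 = last two components
V11, V12 = V[:1, :1], V[:1, 1:]
V21, V22 = V[1:, :1], V[1:, 1:]

y2 = np.array([2.5, -0.4])  # hypothetical observed value of y2

# Conditional mean: mu1 + V12 V22^{-1} (y2 - mu2)
cond_mean = mu[:1] + V12 @ np.linalg.solve(V22, y2 - mu[1:])
# Conditional covariance: V11 - V12 V22^{-1} V21
cond_cov = V11 - V12 @ np.linalg.solve(V22, V21)
print(cond_mean, cond_cov)
```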
The normal linear model assumes that $\mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{V})$ with $\mathbf{V} = \sigma^2\mathbf{I}$. The least squares estimator $\hat{\boldsymbol{\beta}}$ and the residuals $\mathbf{e}$ also have multivariate normal distributions, since they are linear functions of $\mathbf{y}$, but their elements are typically correlated. This estimator $\hat{\boldsymbol{\beta}}$ is also the maximum likelihood (ML) estimator under the normality assumption (as we showed in Section 2.1).
3.1.2 Chi-Squared, F, and t Distributions
Let $\chi_p^2$ denote a chi-squared distribution with $p$ degrees of freedom (df). A chi-squared random variable is nonnegative with mean $=$ df and variance $= 2(\text{df})$. Its distribution¹ is skewed to the right but becomes more bell-shaped as df increases.

¹The pdf is the special case of the gamma distribution pdf (4.29) with shape parameter $k = \text{df}/2$.
Recall that when $y_1, \ldots, y_p$ are independent standard normal random variables, $\sum_{i=1}^p y_i^2 \sim \chi_p^2$. In particular, if $y \sim N(0, 1)$, then $y^2 \sim \chi_1^2$. More generally:
• If a $p$-dimensional random variable $\mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{V})$ with $\mathbf{V}$ nonsingular of rank $p$, then
$$x = (\mathbf{y} - \boldsymbol{\mu})^T \mathbf{V}^{-1}(\mathbf{y} - \boldsymbol{\mu}) \sim \chi_p^2.$$
Exercise 3.1 outlines a proof.
• If $z \sim N(0, 1)$ and $x \sim \chi_p^2$, with $x$ and $z$ independent, then
$$\frac{z}{\sqrt{x/p}} \sim t_p,$$
the $t$ distribution with df $= p$.

The $t$ distribution is symmetric around 0 with variance $=$ df$/(\text{df} - 2)$ when df $> 2$. The term $x/p$ in the denominator is a mean of $p$ independent squared $N(0, 1)$ random variables, so as $p \to \infty$ it converges in probability to their expected value of 1. Therefore, the $t$ distribution converges to a $N(0, 1)$ distribution as df increases.
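As a quick numerical illustration of this convergence (a sketch assuming SciPy is available), the 97.5th percentile of the $t$ distribution approaches the corresponding $N(0, 1)$ value of about 1.96 as df grows:

```python
from scipy import stats

# t quantiles shrink toward the normal quantile as df increases
for df in (2, 5, 30, 1000):
    print(df, stats.t.ppf(0.975, df))    # 4.30, 2.57, 2.04, 1.96
print("N(0,1):", stats.norm.ppf(0.975))  # 1.96
```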
Here is a classic way the $t$ distribution occurs for independent responses $y_1, \ldots, y_n$ from a $N(\mu, \sigma^2)$ distribution with sample mean $\bar{y}$ and sample variance $s^2$: For testing $H_0$: $\mu = \mu_0$, the test statistic $z = \sqrt{n}(\bar{y} - \mu_0)/\sigma$ has the $N(0, 1)$ null distribution. Also, $s^2/\sigma^2$ is a $\chi_{n-1}^2$ variate $x = (n-1)s^2/\sigma^2$ divided by its df. Since $\bar{y}$ and $s^2$ are independent for independent observations from a normal distribution, under $H_0$
$$t = \frac{z}{\sqrt{x/(n-1)}} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}.$$
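Here is a small sketch (with simulated data, not from the text) that computes this one-sample $t$ statistic directly and checks it against SciPy's built-in test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(loc=5.3, scale=2.0, size=25)   # simulated responses
mu0 = 5.0                                     # hypothesized mean under H0

t = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(len(y)))
p = 2 * stats.t.sf(abs(t), df=len(y) - 1)     # two-sided P-value from t_{n-1}
print(t, p)
print(stats.ttest_1samp(y, mu0))              # same t statistic and P-value
```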
Larger values of $|t|$ provide stronger evidence against $H_0$.
• If $x \sim \chi_p^2$ and $y \sim \chi_q^2$, with $x$ and $y$ independent, then
$$\frac{x/p}{y/q} \sim F_{p,q},$$
the $F$ distribution with $df_1 = p$ and $df_2 = q$.

An $F$ random variable takes nonnegative values. When $df_2 > 2$, it has mean $= df_2/(df_2 - 2)$, approximately 1 for large $df_2$. We shall use this distribution for testing hypotheses in ANOVA and regression by taking a ratio of independent mean squares. For a $t$ random variable, $t^2$ has the $F$ distribution with $df_1 = 1$ and $df_2$ equal to the df for that $t$.
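The relation between $t^2$ and $F$ is easy to verify numerically; this sketch (assuming SciPy, with an arbitrary choice of df) compares a squared $t$ critical value with the corresponding $F$ critical value:

```python
from scipy import stats

df = 12
t_crit = stats.t.ppf(0.975, df)     # two-sided 5% critical value of t_df
f_crit = stats.f.ppf(0.95, 1, df)   # 5% critical value of F_{1, df}
print(t_crit**2, f_crit)            # equal, since t^2 ~ F_{1, df}
```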
3.1.3 Noncentral Distributions
In significance testing, to analyze the behavior of test statistics when null hypotheses are false, we use noncentral sampling distributions that occur under parameter values from the alternative hypothesis. Such distributions determine the power of a test (i.e., the probability of rejecting $H_0$), which can be analyzed as a function of the actual parameter value. When observations have a multivariate normal distribution, sampling distributions in such non-null cases contain the ones just summarized as special cases.
Let $\chi_{p,\lambda}^2$ denote a noncentral chi-squared distribution with df $= p$ and with noncentrality parameter $\lambda$. This is the distribution of $x = \sum_{i=1}^p y_i^2$ in which $\{y_i\}$ are independent with $y_i \sim N(\mu_i, 1)$ and $\lambda = \sum_{i=1}^p \mu_i^2$. For this distribution², $E(x) = p + \lambda$ and $\mathrm{var}(x) = 2(p + 2\lambda)$. The ordinary (central) chi-squared distribution is the special case with $\lambda = 0$.

²Here is an alternative way to define noncentrality: Let $z \sim \mathrm{Poisson}(\phi)$ and $(x \mid z) \sim \chi_{p+2z}^2$. Then unconditionally $x \sim \chi_{p,\phi}^2$. This noncentrality $\phi$ relates to the noncentrality $\lambda$ we defined by $\phi = \lambda/2$.
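A short simulation sketch (with made-up $\mu_i$ values) confirms the moment formulas $E(x) = p + \lambda$ and $\mathrm{var}(x) = 2(p + 2\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -0.5, 2.0])   # means of the N(mu_i, 1) components
p, lam = len(mu), np.sum(mu**2)   # df and noncentrality

# x = sum of squared N(mu_i, 1) draws, replicated many times
x = (rng.normal(mu, 1.0, size=(200_000, p)) ** 2).sum(axis=1)
print(x.mean(), p + lam)          # sample mean near p + lambda
print(x.var(), 2 * (p + 2 * lam)) # sample variance near 2(p + 2*lambda)
```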
• If a $p$-dimensional random variable $\mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{V})$ with $\mathbf{V}$ nonsingular of rank $p$, then
$$x = \mathbf{y}^T \mathbf{V}^{-1} \mathbf{y} \sim \chi_{p,\lambda}^2 \quad \text{with} \quad \lambda = \boldsymbol{\mu}^T \mathbf{V}^{-1} \boldsymbol{\mu}.$$
The construction of the noncentral chi-squared from a sum of squared independent $N(\mu_i, 1)$ random variables results when $\mathbf{V} = \mathbf{I}$.
• If $z \sim N(\mu, 1)$ and $x \sim \chi_p^2$, with $x$ and $z$ independent, then
$$t = \frac{z}{\sqrt{x/p}} \sim t_{p,\mu},$$
the noncentral $t$ distribution with df $= p$ and noncentrality $\mu$.

The noncentral $t$ distribution is unimodal, but skewed in the direction of the sign of $\mu = E(z)$. When $p > 1$ and $\mu \neq 0$, its mean $E(t) \approx [1 - 3/(4p - 1)]^{-1}\mu$, which is near $\mu$ but slightly larger in absolute value. For large $p$, the distribution of $t$ is approximately the $N(\mu, 1)$ distribution.
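The mean approximation can be checked against SciPy's noncentral $t$ distribution; this sketch uses an arbitrary df and noncentrality:

```python
from scipy import stats

p, mu = 10, 2.0                       # df and noncentrality (arbitrary)
approx = mu / (1 - 3 / (4 * p - 1))   # [1 - 3/(4p - 1)]^{-1} * mu
print(stats.nct.mean(p, mu), approx)  # about 2.167 vs 2.167
```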
• If $x \sim \chi_{p,\lambda}^2$ and $y \sim \chi_q^2$, with $x$ and $y$ independent, then
$$\frac{x/p}{y/q} \sim F_{p,q,\lambda},$$
the noncentral $F$ distribution with $df_1 = p$, $df_2 = q$, and noncentrality $\lambda$.

For large $df_2$, the noncentral $F$ has mean approximately $1 + \lambda/df_1$, which increases in $\lambda$ from the approximate mean of 1 for the central case.
As reality deviates farther from a particular null hypothesis, the noncentrality $\lambda$ increases. The noncentral chi-squared and noncentral $F$ distributions are stochastically increasing in $\lambda$. That is, evaluated at any positive value, the cumulative distribution function (cdf) decreases as $\lambda$ increases, so values of the statistic tend to be larger.
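This monotonicity is what drives power calculations. As a minimal sketch (assuming SciPy; the df and the 0.05 level are arbitrary choices), the power of a chi-squared test is the noncentral upper-tail probability beyond the central critical value, and it increases with $\lambda$:

```python
from scipy import stats

df = 3
crit = stats.chi2.ppf(0.95, df)           # critical value under H0 (lambda = 0)
for lam in (0.5, 2.0, 5.0, 10.0):
    power = stats.ncx2.sf(crit, df, lam)  # P(statistic > crit) given lambda
    print(lam, power)                     # rises from 0.05 as lambda grows
```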
3.1.4 Normal Quadratic Forms with Projection Matrices Are Chi-Squared

Two results about quadratic forms involving normal random variables are especially useful for statistical inference with normal linear models. The first generalizes the above quadratic form result for the noncentral chi-squared, which follows with $\mathbf{A} = \mathbf{V}^{-1}$.
• Suppose $\mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{V})$ and $\mathbf{A}$ is a symmetric matrix. Then,
$$\mathbf{y}^T\mathbf{A}\mathbf{y} \sim \chi_{r,\,\boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\mu}}^2 \quad \Longleftrightarrow \quad \mathbf{A}\mathbf{V} \text{ is idempotent of rank } r.$$
For the normal linear model, the $n$ independent observations $\mathbf{y} \sim N(\boldsymbol{\mu}, \sigma^2\mathbf{I})$ with $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$, and so $\mathbf{y}/\sigma \sim N(\boldsymbol{\mu}/\sigma, \mathbf{I})$. By this result, if $\mathbf{P}$ is a projection matrix (which is symmetric and idempotent) with rank $r$, then $\mathbf{y}^T\mathbf{P}\mathbf{y}/\sigma^2 \sim \chi_{r,\,\boldsymbol{\mu}^T\mathbf{P}\boldsymbol{\mu}/\sigma^2}^2$. Applying the result with the standardized normal variables $(\mathbf{y} - \boldsymbol{\mu})/\sigma \sim N(\mathbf{0}, \mathbf{I})$, we have:
Normal quadratic form with projection matrix and chi-squared: Suppose $\mathbf{y} \sim N(\boldsymbol{\mu}, \sigma^2\mathbf{I})$ and $\mathbf{P}$ is symmetric. Then,
$$\frac{1}{\sigma^2}(\mathbf{y} - \boldsymbol{\mu})^T\mathbf{P}(\mathbf{y} - \boldsymbol{\mu}) \sim \chi_r^2 \quad \Longleftrightarrow \quad \mathbf{P} \text{ is a projection matrix of rank } r.$$
Cochran (1934) showed³ this result, which also provides an interpretation for degrees of freedom.
• Since the df for the chi-squared distribution of a quadratic form with a normal linear model equals the rank of $\mathbf{P}$, degrees of freedom represent the dimension of the vector subspace to which $\mathbf{P}$ projects (see the simulation sketch below).
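The boxed result is easy to see in simulation. This sketch (with simulated data; the rank-1 projection onto the span of the all-ones vector is just a convenient choice of $\mathbf{P}$) compares quantiles of the standardized quadratic form with $\chi_1^2$ quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, sigma = 8, 1.5
mu = np.linspace(-1, 1, n)     # arbitrary mean vector
P = np.full((n, n), 1.0 / n)   # projection onto span{1}, rank r = 1

y = rng.normal(mu, sigma, size=(100_000, n))
# (y - mu)' P (y - mu) / sigma^2, computed row by row
q = ((y - mu) @ P * (y - mu)).sum(axis=1) / sigma**2

probs = [0.5, 0.9, 0.99]
print(np.quantile(q, probs))        # simulated quantiles ...
print(stats.chi2.ppf(probs, df=1))  # ... match chi-squared(1) quantiles
```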
The following key result also follows from Cochran (1934), building on the first result.
³From Cochran's result I, since a symmetric matrix whose eigenvalues are 0 and 1 is idempotent.
Cochran's theorem: Suppose $n$ observations $\mathbf{y} \sim N(\boldsymbol{\mu}, \sigma^2\mathbf{I})$ and $\mathbf{P}_1, \ldots, \mathbf{P}_k$ are projection matrices having $\sum_i \mathbf{P}_i = \mathbf{I}$. Then, $\{\mathbf{y}^T\mathbf{P}_i\mathbf{y}\}$ are independent and
$$\left(\frac{1}{\sigma^2}\right)\mathbf{y}^T\mathbf{P}_i\mathbf{y} \sim \chi_{r_i,\lambda_i}^2, \quad \text{where } r_i = \mathrm{rank}(\mathbf{P}_i) \text{ and } \lambda_i = \frac{1}{\sigma^2}\boldsymbol{\mu}^T\mathbf{P}_i\boldsymbol{\mu}, \; i = 1, \ldots, k,$$
with $\sum_i r_i = n$.
If we replace $\mathbf{y}$ by $(\mathbf{y} - \boldsymbol{\mu})$ in the quadratic forms, we obtain central chi-squared distributions ($\lambda_i = 0$). This result is the basis of significance tests for parameters in normal linear models. The proof of the independence result shows that all pairs of projection matrices in this decomposition satisfy $\mathbf{P}_i\mathbf{P}_j = \mathbf{0}$.
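As a concrete instance, take the simplest decomposition $\mathbf{I} = \mathbf{P}_1 + \mathbf{P}_2$ with $\mathbf{P}_1 = (1/n)\mathbf{J}$ (projection onto the span of $\mathbf{1}$), so that $\mathbf{y}^T\mathbf{P}_1\mathbf{y} = n\bar{y}^2$ and $\mathbf{y}^T\mathbf{P}_2\mathbf{y} = (n-1)s^2$. This simulation sketch (with made-up parameter values) illustrates the independence and the chi-squared df:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 10, 2.0
P1 = np.full((n, n), 1.0 / n)  # rank 1: projection onto span{1}
P2 = np.eye(n) - P1            # rank n - 1, and P1 + P2 = I

y = rng.normal(3.0, sigma, size=(100_000, n))  # mu = 3 * (all-ones vector)
q1 = (y @ P1 * y).sum(axis=1)  # equals n * ybar^2 (noncentral chi-squared)
q2 = (y @ P2 * y).sum(axis=1)  # equals (n - 1) * s^2; central since P2 mu = 0

print(np.corrcoef(q1, q2)[0, 1])    # near 0: the quadratic forms are independent
print(q2.mean() / sigma**2, n - 1)  # central chi-squared mean equals its df
```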
3.1.5 Proof of Cochran’s Theorem
We next show a proof⁴ of Cochran's theorem. You may wish to skip these technical details for now and go to the next section, which uses this result to construct significance tests for the normal linear model.

⁴This proof is based on one in Monahan (2008, pp. 113–114).
We first show that if $\mathbf{y} \sim N(\boldsymbol{\mu}, \sigma^2\mathbf{I})$ and $\mathbf{P}$ is a projection matrix having rank $r$, then $\left(\frac{1}{\sigma^2}\right)\mathbf{y}^T\mathbf{P}\mathbf{y} \sim \chi_{r,\lambda}^2$ with $\lambda = \frac{1}{\sigma^2}\boldsymbol{\mu}^T\mathbf{P}\boldsymbol{\mu}$. Since $\mathbf{P}$ is symmetric and idempotent with rank $r$, its eigenvalues are 1 ($r$ times) and 0 ($n - r$ times). By the spectral decomposition of a symmetric matrix, we can express $\mathbf{P} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$, where $\boldsymbol{\Lambda}$ is a diagonal matrix of $(1, 1, \ldots, 1, 0, \ldots, 0)$, the eigenvalues of $\mathbf{P}$, and $\mathbf{Q}$ is an orthogonal matrix with columns that are the eigenvectors of $\mathbf{P}$. Let $\mathbf{z} = \mathbf{Q}^T\mathbf{y}/\sigma$. Then, $\mathbf{z} \sim N(\mathbf{Q}^T\boldsymbol{\mu}/\sigma, \mathbf{I})$, and $\left(\frac{1}{\sigma^2}\right)\mathbf{y}^T\mathbf{P}\mathbf{y} = \mathbf{z}^T\boldsymbol{\Lambda}\mathbf{z} = \sum_{i=1}^r z_i^2$. Since each $z_i$ is normal with standard deviation 1, $\sum_{i=1}^r z_i^2$ has a noncentral chi-squared distribution with df $= r$ and noncentrality parameter
$$\sum_{i=1}^r [E(z_i)]^2 = [E(\boldsymbol{\Lambda}\mathbf{z})]^T[E(\boldsymbol{\Lambda}\mathbf{z})] = \left(\frac{1}{\sigma^2}\right)[\boldsymbol{\Lambda}\mathbf{Q}^T\boldsymbol{\mu}]^T[\boldsymbol{\Lambda}\mathbf{Q}^T\boldsymbol{\mu}] = \left(\frac{1}{\sigma^2}\right)\boldsymbol{\mu}^T\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T\boldsymbol{\mu} = \left(\frac{1}{\sigma^2}\right)\boldsymbol{\mu}^T\mathbf{P}\boldsymbol{\mu}.$$
Now we consider $k$ quadratic forms with $k$ projection matrices that are a decomposition of $\mathbf{I}$, the $n \times n$ identity matrix. The rank of a projection matrix is its trace, so
$$\sum_i r_i = \sum_i \mathrm{trace}(\mathbf{P}_i) = \mathrm{trace}\left(\sum_i \mathbf{P}_i\right) = \mathrm{trace}(\mathbf{I}) = n.$$
We apply the spectral decomposition to each projection matrix, with $\mathbf{P}_i = \mathbf{Q}_i\boldsymbol{\Lambda}_i\mathbf{Q}_i^T$, where $\boldsymbol{\Lambda}_i$ is a diagonal matrix of $(1, 1, \ldots, 1, 0, \ldots, 0)$ with $r_i$ entries that are 1. By the form of $\boldsymbol{\Lambda}_i$, this is identical to $\mathbf{P}_i = \tilde{\mathbf{Q}}_i\mathbf{I}_{r_i}\tilde{\mathbf{Q}}_i^T = \tilde{\mathbf{Q}}_i\tilde{\mathbf{Q}}_i^T$, where $\tilde{\mathbf{Q}}_i$ is a $n \times r_i$ matrix of the first $r_i$ columns of $\mathbf{Q}_i$. Note that $\tilde{\mathbf{Q}}_i^T\tilde{\mathbf{Q}}_i = \mathbf{I}_{r_i}$. We stack the $\{\tilde{\mathbf{Q}}_i\}$ together as
$$\mathbf{Q} = [\tilde{\mathbf{Q}}_1 : \tilde{\mathbf{Q}}_2 : \cdots : \tilde{\mathbf{Q}}_k],$$
for which
$$\mathbf{Q}\mathbf{Q}^T = \tilde{\mathbf{Q}}_1\tilde{\mathbf{Q}}_1^T + \cdots + \tilde{\mathbf{Q}}_k\tilde{\mathbf{Q}}_k^T = \mathbf{P}_1 + \cdots + \mathbf{P}_k = \mathbf{I}_n.$$
Thus, $\mathbf{Q}$ is an orthogonal $n \times n$ matrix and also $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}_n$ and $\tilde{\mathbf{Q}}_i^T\tilde{\mathbf{Q}}_j = \mathbf{0}$ for $i \neq j$. So $\mathbf{Q}^T\mathbf{y} \sim N(\mathbf{Q}^T\boldsymbol{\mu}, \sigma^2\mathbf{I})$, and its components $\{\tilde{\mathbf{Q}}_i^T\mathbf{y}\}$ are independent, as are $\{\|\tilde{\mathbf{Q}}_i^T\mathbf{y}\|^2 = \mathbf{y}^T\tilde{\mathbf{Q}}_i\tilde{\mathbf{Q}}_i^T\mathbf{y} = \mathbf{y}^T\mathbf{P}_i\mathbf{y}\}$. Note⁵ also that for $i \neq j$, $\mathbf{P}_i\mathbf{P}_j = \tilde{\mathbf{Q}}_i\tilde{\mathbf{Q}}_i^T\tilde{\mathbf{Q}}_j\tilde{\mathbf{Q}}_j^T = \mathbf{0}$.
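A numeric sanity check of this stacking argument (a sketch reusing the decomposition $\mathbf{P}_1 = (1/n)\mathbf{J}$, $\mathbf{P}_2 = \mathbf{I} - \mathbf{P}_1$ from above): extracting each $\mathbf{P}_i$'s eigenvectors with unit eigenvalue and stacking them yields an orthogonal $\mathbf{Q}$, and $\mathbf{P}_i\mathbf{P}_j = \mathbf{0}$ for $i \neq j$.

```python
import numpy as np

n = 5
P1 = np.full((n, n), 1.0 / n)  # projection onto span{1}
P2 = np.eye(n) - P1            # projection onto its orthogonal complement

def q_tilde(P):
    """Columns of Q tilde: eigenvectors of P with eigenvalue 1."""
    vals, vecs = np.linalg.eigh(P)
    return vecs[:, vals > 0.5]

Q = np.hstack([q_tilde(P1), q_tilde(P2)])
print(np.allclose(Q @ Q.T, np.eye(n)))  # True: Q is orthogonal
print(np.allclose(P1 @ P2, 0))          # True: P1 P2 = 0
```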