Gauss–Markov and GLS: The Gauss–Markov theorem is named after results established in 1821 by Carl Friedrich Gauss and published in 1912 by the Russian probabilist Andrey Markov.


EXERCISES

2.1 For independent observations $y_1, \ldots, y_n$ from a probability distribution with mean $\mu$, show that the least squares estimate of $\mu$ is $\bar{y}$.

2.2 In the linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, suppose $\epsilon_i$ has the Laplace density, $f(\epsilon) = (1/2b)\exp(-|\epsilon|/b)$. Show that the ML estimate minimizes $\sum_i |y_i - \mu_i|$.

2.3 Consider the least squares fit of the linear model $E(y_i) = \beta_0 + \beta_1 x_i$.

a. Show that $\hat{\beta}_1 = \left[\sum_i (x_i - \bar{x})(y_i - \bar{y})\right] / \left[\sum_i (x_i - \bar{x})^2\right]$.

b. Derive $\mathrm{var}(\hat{\beta}_1)$. State the estimated standard error of $\hat{\beta}_1$, and discuss how its magnitude is affected by (i) $n$, (ii) the variability around the fitted line, (iii) the sample variance of $x$. In experiments with control over setting values of $x$, what does (iii) suggest about the optimal way to do this?

2.4 In the linear model $E(y_i) = \beta_0 + \beta_1 x_i$, suppose that instead of observing $x_i$ we observe $x_i^* = x_i + u_i$, where $u_i$ is independent of $x_i$ for all $i$ and $\mathrm{var}(u_i) = \sigma_u^2$. Analyze the expected impact of this measurement error on $\hat{\beta}_1$ and $r$.

2.5 In the linear model $E(y_i) = \beta_0 + \beta_1 x_i$, consider the fitted line that minimizes the sum of squared perpendicular distances from the points to the line. Is this fit invariant to the units of measurement of either variable? Show that such invariance is a property of the usual least squares fit.

2.6 For the model in Section 2.3.4 for the two-way layout, construct a full-rank model matrix. Show that the normal equations imply that the marginal row and column sample totals for $y$ equal the row and column totals of the fitted values.

2.7 Refer to the analysis of covariance model $\mu_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$ for quantitative $x_1$ and binary $x_2$ for two groups, with $x_{i2} = 0$ for group 1 and $x_{i2} = 1$ for group 2. Denote the sample means on $x_1$ and $y$ by $(\bar{x}_1^{(1)}, \bar{y}^{(1)})$ for group 1 and $(\bar{x}_1^{(2)}, \bar{y}^{(2)})$ for group 2. Show that the least squares fit corresponds to parallel lines for the two groups, which pass through these points. (At the overall $\bar{x}_1$, the fitted values $\hat{\beta}_0 + \hat{\beta}_1\bar{x}_1$ and $\hat{\beta}_0 + \hat{\beta}_1\bar{x}_1 + \hat{\beta}_2$ are called adjusted means of $y$.)

2.8 By the QR decomposition, $\mathbf{X}$ can be decomposed as $\mathbf{X} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ consists of the first $p$ columns of an $n \times n$ orthogonal matrix and $\mathbf{R}$ is a $p \times p$ upper triangular matrix. Show that the least squares estimate $\hat{\boldsymbol{\beta}} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y}$.
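A minimal numerical check of Exercise 2.8 in R (a sketch: the simulated $\mathbf{X}$ and $\mathbf{y}$ are arbitrary, and base R's qr() is assumed not to pivot columns for this full-rank example):

---
set.seed(1)
X <- cbind(1, rnorm(20), rnorm(20))
y <- rnorm(20)
qrX <- qr(X)
Q <- qr.Q(qrX)                        # first p columns of the orthogonal factor
R <- qr.R(qrX)                        # p x p upper triangular factor
beta_qr <- backsolve(R, t(Q) %*% y)   # solves R beta = Q^T y
cbind(beta_qr, coef(lm(y ~ X - 1)))   # the two columns should agree
---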

2.9 In an ordinary linear model with two explanatory variables $x_1$ and $x_2$ having sample $\mathrm{corr}(x_1^*, x_2^*) > 0$, show that the estimated $\mathrm{corr}(\hat{\beta}_1, \hat{\beta}_2) < 0$.

2.10 For a projection matrix $\mathbf{P}$, for any $\mathbf{y}$ in $\mathbb{R}^n$ show that $\mathbf{P}\mathbf{y}$ and $\mathbf{y} - \mathbf{P}\mathbf{y}$ are orthogonal vectors; that is, the projection is an orthogonal projection.

2.11 Prove that $\mathbf{I} - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T$ is symmetric and idempotent (i.e., a projection matrix), and identify the vector to which it projects an arbitrary $\mathbf{y}$.

2.12 For a full-rank model matrix $\mathbf{X}$, show that $\mathrm{rank}(\mathbf{H}) = \mathrm{rank}(\mathbf{X})$, where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$.

2.13 From Exercise 1.17, if $\mathbf{A}$ is nonsingular and $\mathbf{X}^* = \mathbf{X}\mathbf{A}$ (such as in using a different parameterization for a factor), then $C(\mathbf{X}^*) = C(\mathbf{X})$. Show that the linear models with the model matrices $\mathbf{X}$ and $\mathbf{X}^*$ have the same hat matrix and the same fitted values.

2.14 For a linear model with full rank $\mathbf{X}$ and projection matrix $\mathbf{P}_X$, show that $\mathbf{P}_X\mathbf{X} = \mathbf{X}$ and that $C(\mathbf{P}_X) = C(\mathbf{X})$.

2.15 Denote the hat matrix by $\mathbf{P}_0$ for the null model and $\mathbf{H}$ for any linear model that contains an intercept term. Explain why $\mathbf{P}_0\mathbf{H} = \mathbf{H}\mathbf{P}_0 = \mathbf{P}_0$. Show this implies that each row and each column of $\mathbf{H}$ sums to 1.

2.16 When $\mathbf{X}$ does not have full rank, let's see why $\mathbf{P}_X = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T$ is invariant to the choice of generalized inverse. Let $\mathbf{G}$ and $\mathbf{H}$ be two generalized inverses of $\mathbf{X}^T\mathbf{X}$. For an arbitrary $\mathbf{v} \in \mathbb{R}^n$, let $\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2$ with $\mathbf{v}_1 = \mathbf{X}\mathbf{b} \in C(\mathbf{X})$ for some $\mathbf{b}$ and $\mathbf{v}_2$ in the orthogonal complement of $C(\mathbf{X})$.

a. Show that $\mathbf{v}^T\mathbf{X}\mathbf{G}\mathbf{X}^T\mathbf{X} = \mathbf{v}^T\mathbf{X}$, so that $\mathbf{X}\mathbf{G}\mathbf{X}^T\mathbf{X} = \mathbf{X}$ for any generalized inverse.

b. Show that $\mathbf{X}\mathbf{G}\mathbf{X}^T\mathbf{v} = \mathbf{X}\mathbf{H}\mathbf{X}^T\mathbf{v}$, and thus $\mathbf{X}\mathbf{G}\mathbf{X}^T$ is invariant to the choice of generalized inverse.

2.17 When $\mathbf{X}$ has less than full rank and we use a generalized inverse to estimate $\boldsymbol{\beta}$, explain why the space of possible least squares solutions $\hat{\boldsymbol{\beta}}$ does not form a vector space. (For a solution $\hat{\boldsymbol{\beta}}$, this space is the set of $\tilde{\boldsymbol{\beta}} = \hat{\boldsymbol{\beta}} + \boldsymbol{\gamma}$ for all $\boldsymbol{\gamma} \in N(\mathbf{X})$; such a shifted vector space is called an affine space.)

2.18 In $\mathbb{R}^3$, let $W$ be the vector subspace spanned by $(1, 0, 0)$, that is, the "$x$-axis" in three-dimensional space. Specify its orthogonal complement. For any $\mathbf{y}$ in $\mathbb{R}^3$, show its orthogonal decomposition $\mathbf{y} = \mathbf{y}_1 + \mathbf{y}_2$ with $\mathbf{y}_1 \in W$ and $\mathbf{y}_2 \in W^{\perp}$.

2.19 Two vectors that are orthogonal or that have zero correlation are linearly independent. However, orthogonal vectors need not be uncorrelated, and uncorrelated vectors need not be orthogonal.

a. Show this with two particular pairs of $4 \times 1$ vectors. (One possible illustration is sketched after this exercise.)

b. Suppose $\mathbf{u}$ and $\mathbf{v}$ have $\mathrm{corr}(\mathbf{u}, \mathbf{v}) = 0$. Explain why the centered versions $\mathbf{u}^* = (\mathbf{u} - \bar{\mathbf{u}})$ and $\mathbf{v}^* = (\mathbf{v} - \bar{\mathbf{v}})$ are orthogonal (where, e.g., $\bar{\mathbf{u}}$ denotes the vector having the mean of the elements of $\mathbf{u}$ in each component). Show that $\mathbf{u}$ and $\mathbf{v}$ themselves are orthogonal if and only if $\bar{\mathbf{u}} = \mathbf{0}$, $\bar{\mathbf{v}} = \mathbf{0}$, or both.

c. If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, then explain why they also have $\mathrm{corr}(\mathbf{u}, \mathbf{v}) = 0$ iff $\bar{\mathbf{u}} = \mathbf{0}$, $\bar{\mathbf{v}} = \mathbf{0}$, or both. (From (b) and (c), orthogonality and zero correlation are equivalent only when $\bar{\mathbf{u}} = \mathbf{0}$ and/or $\bar{\mathbf{v}} = \mathbf{0}$. Zero correlation means that the centered vectors are perpendicular. Centering typically changes the angle between the two vectors.)
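One possible illustration for part (a), sketched in R (these particular vectors are just one valid choice among many):

---
u  <- c(1, 2, 3, 4)
v1 <- c(1, 1, -1, 0)     # orthogonal to u, yet correlated with u
v2 <- c(2, 0, 0, 2)      # uncorrelated with u, yet not orthogonal to u
sum(u * v1); cor(u, v1)  # inner product 0, correlation nonzero
sum(u * v2); cor(u, v2)  # inner product nonzero, correlation 0
---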

2.20 Suppose that all the parameters in a linear model are orthogonal (Section 2.2.4).

a. When the model contains an intercept term, show that orthogonality implies that each column in $\mathbf{X}$ after the first (for the intercept) has mean 0; that is, each explanatory variable is centered. Thus, based on the previous exercise, explain why each pair of explanatory variables is uncorrelated.

b. When the explanatory variables for the model are all centered, explain why the intercept estimate does not change as the variables are added to the linear predictor. Show that that estimate equals $\bar{y}$ in each case.

2.21 Using the normal equations for a linear model, show that SSE decomposes into
$$(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{y}^T\mathbf{y} - \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{y}.$$
Thus, for nested $M_1$ and $M_0$, explain why
$$\mathrm{SSR}(M_1 \mid M_0) = \hat{\boldsymbol{\beta}}_1^T\mathbf{X}_1^T\mathbf{y} - \hat{\boldsymbol{\beta}}_0^T\mathbf{X}_0^T\mathbf{y}.$$

2.22 In Section 2.3.1 we showed the sum of squares decomposition for the null model $E(y_i) = \beta$, $i = 1, \ldots, n$. Suppose you have $n = 2$ observations.

a. Specify the model space $C(\mathbf{X})$ and its orthogonal complement, and find $\mathbf{P}_X$ and $(\mathbf{I} - \mathbf{P}_X)$.

b. Suppose $y_1 = 5$ and $y_2 = 10$. Find $\hat{\beta}$ and $\hat{\boldsymbol{\mu}}$. Show the sum of squares decomposition, and find $s$. Sketch a graph that shows $\mathbf{y}$, $\hat{\boldsymbol{\mu}}$, $C(\mathbf{X})$, and the projection of $\mathbf{y}$ to $\hat{\boldsymbol{\mu}}$.

2.23 In complete contrast to the null model is the saturated model, $E(y_i) = \beta_i$, $i = 1, \ldots, n$, which has a separate parameter for each observation. For this model:

a. Specify $\mathbf{X}$, the model space $C(\mathbf{X})$, and its orthogonal complement, and find $\mathbf{P}_X$ and $(\mathbf{I} - \mathbf{P}_X)$.

b. Find $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\mu}}$ in terms of $\mathbf{y}$. Find $s$, and explain why this model is not sensible for practice.

2.24 Verify that the $n \times n$ identity matrix $\mathbf{I}$ is a projection matrix, and describe the linear model to which it corresponds.

2.25 Section 1.4.2 stated "When $\mathbf{X}$ has full rank, $\boldsymbol{\beta}$ is identifiable, and then all linear combinations $\boldsymbol{\ell}^T\boldsymbol{\beta}$ are estimable." Find $\mathbf{a}$ such that $E(\mathbf{a}^T\mathbf{y}) = \boldsymbol{\ell}^T\boldsymbol{\beta}$ for all $\boldsymbol{\beta}$.

2.26 For a linear model with $p$ explanatory variables, explain why the sample multiple correlation $R = 0$ is equivalent to sample $\mathrm{corr}(\mathbf{y}, \mathbf{x}_j) = 0$ for $j = 1, \ldots, p$.

2.27 In Section 2.5.1 we noted that for linear models containing an intercept term, $\mathrm{corr}(\hat{\boldsymbol{\mu}}, \mathbf{e}) = 0$, and plotting $\mathbf{e}$ against $\hat{\boldsymbol{\mu}}$ helps detect violations of model assumptions. However, it is not helpful to plot $\mathbf{e}$ against $\mathbf{y}$. To see why not, using formula (2.5), show that (a) the regression of $\mathbf{y}$ on $\mathbf{e}$ has slope 1, (b) the regression of $\mathbf{e}$ on $\mathbf{y}$ has slope $1 - R^2$, and (c) $\mathrm{corr}(\mathbf{y}, \mathbf{e}) = \sqrt{1 - R^2}$.

2.28 Derive the hat matrix for the centered-data formulation of the linear model with a single explanatory variable. Explain what factors cause an observation to have a relatively large leverage.

2.29 Show that an observation in a one-way layout has the maximum possible leverage if it is the only observation for its group.

2.30 Consider the leverages for a linear model with full-rank model matrix and $p$ parameters.

a. Prove that the leverages fall between 0 and 1 and have a mean of $p/n$.

b. Show how expression (2.10) for $h_{ii}$ simplifies when each pair of explanatory variables is uncorrelated.

2.31 a. Give an example of actual variables $y$, $x_1$, $x_2$ for which you would expect $\beta_1 \neq 0$ in the model $E(y_i) = \beta_0 + \beta_1 x_{i1}$ but $\beta_1 \approx 0$ in the model $E(y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$ (e.g., perhaps $x_2$ is a "lurking variable," such that the association of $x_1$ with $y$ disappears when we adjust for $x_2$).

b. Let $r_1 = \mathrm{corr}(\mathbf{y}, \mathbf{x}_1^*)$, $r_2 = \mathrm{corr}(\mathbf{y}, \mathbf{x}_2^*)$, and let $R$ be the multiple correlation with predictors $x_1$ and $x_2$. For the case described in (a), explain why you would expect $R$ to be close to $|r_2|$.

c. For the case described in (a), which would you expect to be relatively near $\mathrm{SSR}(x_1, x_2)$: $\mathrm{SSR}(x_1)$ or $\mathrm{SSR}(x_2)$? Why?

2.32 In studying the model for the one-way layout in Section 2.3.2, we found the projection matrices and sums of squares and constructed the ANOVA table.

a. We did the analysis for a non-full-rank model matrix $\mathbf{X}$. Show that the simple form for $(\mathbf{X}^T\mathbf{X})^{-}$ stated there is in fact a generalized inverse.

b. Verify the corresponding projection matrix $\mathbf{P}_X$ specified there.

c. Verify that $\mathbf{y}^T(\mathbf{I} - \mathbf{P}_X)\mathbf{y}$ is the within-groups sum of squares stated there.

2.33 Refer to the previous exercise. Conduct a similar analysis, but making parameters identifiable by setting $\beta_0 = 0$. Specify $\mathbf{X}$ and find $\mathbf{P}_X$ and $\mathbf{y}^T(\mathbf{I} - \mathbf{P}_X)\mathbf{y}$.

2.34 From the previous exercise, setting $\beta_0 = 0$ results in $\{\hat{\beta}_i = \bar{y}_i\}$. Explain why imposing only this constraint is inadequate for models with multiple factors, and a constraint such as $\beta_1 = 0$ is more generalizable. Illustrate for the two-way layout.

2.35 Consider the main-effects linear model for the two-way layout with one observation per cell. Section 2.3.4 stated the projection matrix $\mathbf{P}_r$ that generates the treatment means. Find the projection matrix $\mathbf{P}_c$ that generates the block means.

2.36 For the two-way $r \times c$ layout with one observation per cell, find the hat matrix.

2.37 In the model for the balanced one-way layout, $E(y_{ij}) = \beta_0 + \beta_i$ with identical $n_i$, show that $\{\beta_i\}$ are orthogonal with $\beta_0$ if we impose the constraint $\sum_i \beta_i = 0$.

2.38 Section 2.4.5 considered the "main effects" model for a balanced $2 \times 2$ layout, showing there is orthogonality between each pair of parameters when we constrain $\sum_i \beta_i = \sum_j \gamma_j = 0$.

a. If you instead constrain $\beta_1 = \gamma_1 = 0$, show that pairs of columns of $\mathbf{X}$ are uncorrelated but not orthogonal.

b. Explain why $\beta_2$ for the coding $\beta_1 = 0$ in (a) is identical to $2\beta_2$ for the coding $\beta_1 + \beta_2 = 0$.

c. Explain how the results about constraints and orthogonality generalize if the model also contains a term $\delta_{ij}$ to permit interaction between $A$ and $B$ in their effects on $y$.

2.39 Extend results in Section 2.3.4 to the $r \times c$ factorial with $n$ observations per cell.

a. Express the orthogonal decomposition of $y_{ijk}$ to include main effects, interaction, and residual error.

b. Show how $\mathbf{P}_r$ generalizes from the matrix given in Section 2.3.4.

c. Show the relevant sum of squares decomposition in an ANOVA table that also shows the $df$ values. (It may help you to refer to (b) and (c) in Exercise 3.13.)

2.40 A genetic association study considers a large number of explanatory variables, with nearly all expected to have no effect or a very minor effect on the response. An alternative to the least squares estimator $\hat{\boldsymbol{\beta}}$ for the linear model incorporating those explanatory variables is the null model and its estimator, $\tilde{\boldsymbol{\beta}} = \mathbf{0}$ except for the intercept. Is $\tilde{\boldsymbol{\beta}}$ unbiased? How does $\mathrm{var}(\tilde{\beta}_j)$ compare to $\mathrm{var}(\hat{\beta}_j)$? Explain why $\sum_j E(\tilde{\beta}_j - \beta_j)^2 < \sum_j E(\hat{\beta}_j - \beta_j)^2$ unless $n$ is extremely large.

2.41 The Gauss–Markov theorem shows the best way to form a linear unbiased estimator in a linear model. Are unbiased estimators always sensible? Consider a sequence of independent Bernoulli trials with parameter $\pi$.

a. Let $y$ be the number of failures before the first success. Show that the only unbiased estimator (and thus the best unbiased estimator) of $\pi$ is $T(y) = 1$ if $y = 0$ and $T(y) = 0$ if $y > 0$. Show that the ML estimator of $\pi$ is $\hat{\pi} = 1/(1 + y)$. Although biased, is this a more efficient estimator? Why?

b. For $n$ trials, show there is no unbiased estimator of the logit, $\log[\pi/(1 - \pi)]$.

2.42 In some applications, such as regressing annual income on the number of years of education, the variance of $y$ tends to be larger at higher values of $x$. Consider the model $E(y_i) = \beta x_i$, assuming $\mathrm{var}(y_i) = x_i\sigma^2$ for unknown $\sigma^2$.

a. Show that the generalized least squares estimator minimizes $\sum_i (y_i - \beta x_i)^2 / x_i$ (i.e., giving more weight to observations with smaller $x_i$) and has $\hat{\beta}_{\mathrm{GLS}} = \bar{y}/\bar{x}$, with $\mathrm{var}(\hat{\beta}_{\mathrm{GLS}}) = \sigma^2 / (\sum_i x_i)$.

b. Show that the ordinary least squares estimator is $\hat{\beta} = (\sum_i x_i y_i) / (\sum_i x_i^2)$ and has $\mathrm{var}(\hat{\beta}) = \sigma^2(\sum_i x_i^3) / (\sum_i x_i^2)^2$.

c. Show that $\mathrm{var}(\hat{\beta}) \geq \mathrm{var}(\hat{\beta}_{\mathrm{GLS}})$.
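As an informal numerical companion to this exercise (a sketch, not a proof; the data-generating values are arbitrary), a simulation in R can compare the two estimators' variability with their theoretical variances:

---
set.seed(1)
n <- 200; sigma <- 2; beta <- 3
x <- runif(n, 1, 10)
est <- replicate(5000, {
  y <- beta * x + rnorm(n, sd = sigma * sqrt(x))  # var(y_i) = x_i * sigma^2
  c(ols = sum(x * y) / sum(x^2),                  # OLS estimator from part (b)
    gls = mean(y) / mean(x))                      # GLS estimator from part (a)
})
apply(est, 1, var)               # empirical variances of the two estimators
sigma^2 * sum(x^3) / sum(x^2)^2  # theoretical var of the OLS estimator
sigma^2 / sum(x)                 # theoretical var of the GLS estimator
---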

2.43 Write a simple program to simulate data so that, when you plot residuals against $x$ after fitting the bivariate linear model $E(y_i) = \beta_0 + \beta_1 x_i$, the plot shows inadequacy of (a) the linear predictor, (b) the constant variance assumption.
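One possible simulation in R (a sketch; many other data-generating choices would also work):

---
set.seed(1)
x <- runif(100, 0, 10)
y_a <- 1 + 2 * x + 0.5 * x^2 + rnorm(100)    # (a) true mean is quadratic in x
y_b <- 1 + 2 * x + rnorm(100, sd = 0.5 * x)  # (b) error SD grows with x
par(mfrow = c(1, 2))
plot(x, residuals(lm(y_a ~ x)))              # curved residual pattern
plot(x, residuals(lm(y_b ~ x)))              # fan-shaped residual pattern
---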

2.44 Exercise 1.21 concerned a study comparing forced expiratory volume ($y$ = fev1 in the data file FEV.dat at the text website) for three drugs, adjusting for a baseline measurement. For the R output shown, using notation you define, state the model that was fitted, and interpret all results shown.

---
> summary(lm(fev1 ~ base + factor(drug)))
               Estimate Std. Error
(Intercept)      1.1139     0.2999
base             0.8900     0.1063
factor(drug)b    0.2181     0.1375
factor(drug)p   -0.6448     0.1376
---
Residual standard error: 0.4764 on 68 degrees of freedom
Multiple R-squared: 0.6266,  Adjusted R-squared: 0.6101

> anova(lm(fev1 ~ base + factor(drug)))
Analysis of Variance Table
             Df  Sum Sq Mean Sq
base          1 16.2343 16.2343
factor(drug)  2  9.6629  4.8315
Residuals    68 15.4323  0.2269

> quantile(rstandard(lm(fev1 ~ base + factor(drug))))
     0%     25%     50%     75%    100%
-2.0139 -0.7312 -0.1870  0.6341  2.4772
---

2.45 A data set shown partly in Table 2.4 and fully available in the Optics.dat file at the text website is taken from a math education graduate student research project. For the optics module in a high school freshman physical science class, the randomized study compared two instruction methods (1 = model building inquiry, 0 = traditional scientific). The response variable was an optics post-test score. Other explanatory variables were an optics pre-test score, gender (1 = female, 0 = male), OAA (Ohio Achievement Assessment) reading score, OAA science score, attendance for the optics module (number of days), and individualized education program (IEP) for students with disabilities (1 = yes, 0 = no).

a. Fit the linear model with instruction type, pre-test score, and attendance as explanatory variables. Summarize and interpret the software output. (One possible R call is sketched after Table 2.4.)

b. Find and interpret diagnostics, including residual plots and measures of influence, for this model.

Table 2.4 Partial Optics Instruction Data for Exercise 2.45

ID Post Inst Pre Gender Reading Science Attend IEP

1 50 1 50 0 368 339 14 0

2 67 1 50 0 372 389 11 0

37 55 0 42 1 385 373 7 0

Source: Thanks to Harry Khamis, Wright State University, Statistical Consulting Center, for these data, provided with client permission. Complete data (n = 37) are in the file Optics.dat at www.stat.ufl.edu/~aa/glm/data.
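A minimal sketch for Exercise 2.45(a), assuming the column names shown in Table 2.4 (Post, Inst, Pre, Attend) match those in Optics.dat and that the file is read from the text website's data directory:

---
Optics <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Optics.dat",
                     header = TRUE)
fit <- lm(Post ~ Inst + Pre + Attend, data = Optics)
summary(fit)
plot(fitted(fit), rstandard(fit))   # part (b): residual plot
cooks.distance(fit)                 # part (b): influence measures
---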

2.46 Download from the text website the data file Crabs.dat introduced in Section 1.5.1. Fit the linear model with both weight and color as explanatory variables for the number of satellites for each crab, without interaction, treating color as qualitative. Summarize and interpret the software output, including the prediction equation, error variance, $R^2$, adjusted $R^2$, and multiple correlation. Plot the residuals against the fitted values for the model, and interpret. What explains the lower nearly straight-line boundary? By contrast, what residual pattern would you expect if the response variable is normal and the linear model holds with constant variance?
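A possible starting point in R (a sketch; it assumes Crabs.dat has columns named sat, weight, and color, so adjust the names to the actual file):

---
Crabs <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Crabs.dat",
                    header = TRUE)
fit <- lm(sat ~ weight + factor(color), data = Crabs)
summary(fit)                        # prediction equation, R-squared, etc.
plot(fitted(fit), residuals(fit))   # note the straight-line lower boundary
---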

2.47 The horseshoe crab dataset17 Crabs3.dat at www.stat.ufl.edu/~aa/glm/data collects several variables for female horseshoe crabs that have males attached during mating, over several years at Seahorse Key, Florida. Use linear modeling to describe the relation between $y$ = attached male's carapace width (AMCW) and $x_1$ = female's carapace width (FCW), $x_2$ = female's color (Fcolor, where 1 = light, 3 = medium, 5 = dark), and $x_3$ = female's surface condition (Fsurf, where lower scores represent better condition). Summarize and interpret the output, including the prediction equation, error variance, $R^2$, adjusted $R^2$, multiple correlation, and model diagnostics.
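A possible sketch in R, assuming the variable names given in the exercise (AMCW, FCW, Fcolor, Fsurf) match the columns of Crabs3.dat:

---
Crabs3 <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Crabs3.dat",
                     header = TRUE)
fit <- lm(AMCW ~ FCW + Fcolor + Fsurf, data = Crabs3)
summary(fit)
plot(fit)   # standard residual and influence diagnostics
---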

2.48 Refer to the anorexia study in Exercise 1.24. For the model fitted there, interpret the output relating to predictive power, and check the model using residuals and influence measures. Summarize your findings.

17 Thanks to Jane Brockmann for making these data available.

2.49 In later chapters, we use functions in the useful R package VGAM. In that package, the venice data set contains annual data between 1931 and 1981 on the annual maximum sea level (variable r1) in Venice. Analyze the relation between year and maximum sea level. Summarize results in a two-page report, with software output as an appendix. (An alternative to least squares uses ML with a distribution suitable for modeling extremes, as in Davison (2003, p. 475).)
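One way to begin (a sketch; it assumes the VGAM package is installed and that venice has columns year and r1, as the exercise states):

---
library(VGAM)
data(venice)
fit <- lm(r1 ~ year, data = venice)
summary(fit)
plot(r1 ~ year, data = venice)
abline(fit)
---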

Normal Linear Models: Statistical Inference

Chapter 2 introduced least squares fitting of ordinary linear models. For $n$ independent observations $\mathbf{y} = (y_1, \ldots, y_n)^T$, with $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)^T$ for $\mu_i = E(y_i)$ and a model matrix $\mathbf{X}$ and parameter vector $\boldsymbol{\beta}$, this model states that
$$\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta} \quad \text{with} \quad \mathbf{V} = \mathrm{var}(\mathbf{y}) = \sigma^2\mathbf{I}.$$

We now add to this model the assumption that $\{y_i\}$ have normal distributions. The model is then the normal linear model. This chapter presents the foundations of statistical inference about the parameters of the normal linear model.

We begin this chapter by reviewing relevant distribution theory for normal linear models. Quadratic forms incorporating normally distributed response variables and projection matrices generate chi-squared distributions. One such result, Cochran's theorem, is the basis of significance tests about $\boldsymbol{\beta}$ in the normal linear model. Section 3.2 shows how the tests use the chi-squared quadratic forms to construct test statistics having $F$ distributions. A useful general result about comparing two nested models is also derived as a likelihood-ratio test. Section 3.3 presents confidence intervals for elements of $\boldsymbol{\beta}$ and expected responses as well as prediction intervals for future observations. Following an example in Section 3.4, Section 3.5 presents methods for making multiple inferences with a fixed overall error rate, such as multiple comparison methods for constructing simultaneous confidence intervals for differences between all pairs of a set of means. Without the normality assumption, the exact inference methods of this chapter apply to the ordinary linear model in an approximate manner for large $n$.

