Class Notes in Statistics and Econometrics, Part 22


CHAPTER 43

Multiple Comparisons in the Linear Model

Due to the isomorphism of tests and confidence intervals, we will keep this whole discussion in terms of confidence intervals.

43.1. Rectangular Confidence Regions

Assume you are interested in two linear combinations of $\beta$ at the same time, i.e., you want separate confidence intervals for them. If you use the Cartesian product (or the intersection, depending on how you look at it) of the individual confidence intervals, the confidence level of this rectangular confidence region will of necessity be different from that of the individual intervals used to form it. If you want the joint confidence region to have confidence level 95%, then the individual confidence intervals must have a confidence level higher than 95%, i.e., they must be wider.

There are two main approaches to computing the confidence levels of the individual intervals: a very simple one which is widely applicable but only approximate, and a more specialized one which is precise in some situations and can be taken as an approximation in others.

43.1.1. Bonferroni Intervals. To derive the first method, the Bonferroni intervals, assume you have individual confidence intervals $R_i$ for the parameters $\phi_i$. In order to make simultaneous inferences about the whole parameter vector $\phi = \begin{bmatrix} \phi_1 \\ \vdots \\ \phi_i \end{bmatrix}$, you take the Cartesian product $R_1 \times R_2 \times \cdots \times R_i$; it is defined by
$$\begin{bmatrix} \phi_1 \\ \vdots \\ \phi_i \end{bmatrix} \in R_1 \times R_2 \times \cdots \times R_i \quad\text{if and only if}\quad \phi_i \in R_i \text{ for all } i.$$

Usually it is difficult to compute the precise confidence level of such a rectangular set. If one cannot be precise, it is safer to understate the confidence level. The following inequality from elementary probability theory, called the Bonferroni inequality, gives a lower bound for the confidence level of this Cartesian product: given $i$ events $E_i$ with $\Pr[E_i] = 1 - \alpha_i$, then $\Pr[\bigcap E_i] \ge 1 - \sum \alpha_i$. Proof: $\Pr[\bigcap E_i] = 1 - \Pr[\bigcup E_i'] \ge 1 - \sum \Pr[E_i']$.

The so-called Bonferroni bounds therefore give the individual intervals the levels $1 - \alpha/i$. Instead of $\gamma_i = \alpha/i$ one can also take any other $\gamma_i \ge 0$ with $\sum \gamma_i = \alpha$. For small $\alpha$ and small $i$ this is an amazingly precise method.
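To make this concrete, here is a minimal sketch in R/S syntax (the function name and the numbers in the example are illustrative assumptions, not from the text): each of the $i$ intervals is computed at the individual level $1 - \alpha/i$, so the Bonferroni inequality guarantees a joint level of at least $1 - \alpha$.

## Bonferroni-adjusted t-intervals, assuming coefficient estimates bhat,
## their standard errors se, and the residual degrees of freedom df.
bonferroni.intervals <- function(bhat, se, df, alpha = 0.05) {
  i <- length(bhat)                  # number of simultaneous intervals
  crit <- qt(1 - alpha/(2 * i), df)  # individual level 1 - alpha/i
  cbind(lower = bhat - crit * se, upper = bhat + crit * se)
}
## Two coefficients, 16 residual degrees of freedom, joint level >= 95%:
bonferroni.intervals(bhat = c(1.2, -0.7), se = c(0.30, 0.25), df = 16)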
43.1.2. The Multivariate t Distribution. Let $z \sim N(o, \sigma^2 \Psi)$ where $\Psi$ is positive definite and has ones in the diagonal:

(43.1.1)  $\Psi = \begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1i} \\ \rho_{12} & 1 & \rho_{23} & \cdots & \rho_{2i} \\ \rho_{13} & \rho_{23} & 1 & \cdots & \rho_{3i} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{1i} & \rho_{2i} & \rho_{3i} & \cdots & 1 \end{bmatrix}$

Let $s^2 \sim \frac{\sigma^2}{\nu}\chi^2_\nu$ be independent of $z$. Then $t = z/s$ has a multivariate t distribution with $\nu$ degrees of freedom. This is clearly what one needs for simultaneous t intervals, since it is the joint distribution of the statistics used to construct t intervals. Each $t_i$ has a t distribution. For certain special cases of $\Psi$, quantiles of this joint distribution have been calculated and tabulated. This makes it possible to compute the precise confidence levels of multiple t intervals in those situations.

Problem 432. Show that the correlation coefficient between $t_i$ and $t_j$ is $\rho_{ij}$. But give a verbal argument that the $t_i$ are not independent even if $\rho_{ij} = 0$, i.e., even if the $z_i$ are independent. (This means one cannot get the quantiles of their maximum from the individual quantiles.)

Answer. First we have $E[t_j] = E[z_j]\,E[1/s] = 0$, since $z_j$ and $s$ are independent. Therefore

(43.1.2)  $\operatorname{cov}[t_i, t_j] = E[t_i t_j] = E\big[E[t_i t_j \mid s]\big] = E\big[E[\tfrac{1}{s^2} z_i z_j \mid s]\big] = E\big[\tfrac{1}{s^2} E[z_i z_j \mid s]\big] = E\big[\tfrac{1}{s^2} E[z_i z_j]\big] = E\big[\tfrac{\sigma^2}{s^2}\big]\rho_{ij}.$

In particular, $\operatorname{var}[t_i] = E[\sigma^2/s^2]$, and the statement follows. As for the verbal argument: all the $t_i$ share the same random denominator $s$; if $s$ happens to be small, all the $|t_i|$ tend to be large at the same time, so the $t_i$ cannot be independent even when the $z_i$ are.

43.1.3. Studentized Maximum Modulus and Related Intervals. Look at the special case where all $\rho_{ij}$ are equal; call them $\rho$. Then the following quantiles have been tabulated by [HH71], and reprinted in [Seb77, pp. 404–410], where they are called $u^\alpha_{i,\nu,\rho}$:

(43.1.3)  $\Pr\big[\max_{j=1,\dots,i} |t_j| \le u^\alpha_{i,\nu,\rho}\big] = 1 - \alpha,$

where $t = \begin{bmatrix} t_1 \\ \vdots \\ t_i \end{bmatrix}$ is a multivariate equicorrelated t with $\nu$ degrees of freedom and correlation coefficient $\rho$.

If one needs only two joint confidence intervals, i.e., if $i = 2$, then there are only two off-diagonal elements in the dispersion matrix, which must be equal by symmetry. A $2 \times 2$ dispersion matrix is therefore always "equicorrelated." The values of $u^\alpha_{2,n-k,\rho}$ can therefore be used to compute simultaneous confidence intervals for any two parameters in the regression model. For $\rho$ one must use the actual correlation coefficient between the OLS estimates of the respective parameters, which is known precisely.

Problem 433. In the model $y = X\beta + \varepsilon$, with $\varepsilon \sim (o, \sigma^2 I)$, give a formula for the correlation coefficient between $g'\hat\beta$ and $h'\hat\beta$, where $g$ and $h$ are arbitrary constant vectors.

Answer. This is in Seber [Seb77, equation (5.7) on p. 128]:

(43.1.4)  $\rho = g'(X'X)^{-1}h \Big/ \sqrt{\big(g'(X'X)^{-1}g\big)\big(h'(X'X)^{-1}h\big)}.$
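Formula (43.1.4) is easy to evaluate numerically. The following R/S sketch (the function and variable names are assumptions made for illustration) returns the $\rho$ one would use when looking up the quantile $u^\alpha_{2,n-k,\rho}$:

## Correlation between the OLS estimates g'betahat and h'betahat in the
## model y = X beta + epsilon with spherical disturbances; see (43.1.4).
lincomb.corr <- function(X, g, h) {
  XtXinv <- solve(t(X) %*% X)   # (X'X)^{-1}
  as.vector((t(g) %*% XtXinv %*% h) /
            sqrt((t(g) %*% XtXinv %*% g) * (t(h) %*% XtXinv %*% h)))
}
## E.g., with an intercept and two regressors, choosing g and h to pick
## out the two slope coefficients:
## lincomb.corr(X, g = c(0, 1, 0), h = c(0, 0, 1))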
But in certain situations those equicorrelated quantiles can also be applied for testing more than two parameters. The most basic situation in which this is the case is the following: you have $n \times m$ observations $y_{ij} = \mu_i + \varepsilon_{ij}$, with $\varepsilon_{ij} \sim \operatorname{NID}(0, \sigma^2)$. Then the equicorrelated t quantiles allow you to compute precise joint confidence intervals for all $\mu_i$. Define $s^2 = \sum_{i,j} (y_{ij} - \bar y_{i\cdot})^2 / \big(n(m-1)\big)$, and define $z$ by $z_i = (\bar y_{i\cdot} - \mu_i)\sqrt{m}$. These $z_i$ are normal with mean zero and dispersion matrix $\sigma^2 I$, and they are independent of $s^2$. Therefore one gets the confidence intervals

(43.1.5)  $\mu_i \in \bar y_{i\cdot} \pm u^\alpha_{n,\,n(m-1),\,0}\, s/\sqrt{m}.$

This simplest example is a special case of "orthogonal regression," in which $X'X$ is a diagonal matrix. One can follow the same procedure in other cases of orthogonal regression, such as a regression with orthogonal polynomials as explanatory variables.

Now return to the situation of the basic example, but assume that the first row of the matrix $Y$ of observations is the reference group, and one wants to know whether the means of the other groups are significantly different from that of the first group. Give the first row the subscript $i = 0$. Then use $z_i = (\bar y_{i\cdot} - \bar y_{0\cdot})\sqrt{m}/\sqrt{2}$, $i = 1, \dots, n$. One obtains again the multivariate t, this time with $\rho = 1/2$. Miller calls these intervals "many-one intervals."

Problem 434. Assume again we are in the situation of our basic example; revert to counting $i$ from 1 to $n$. Construct simultaneous confidence intervals for the differences between the individual means and the grand mean.

Answer. One uses

(43.1.6)  $z_i = \big(\bar y_{i\cdot} - \bar y_{\cdot\cdot} - (\mu_i - \mu)\big) \Big/ \sqrt{\tfrac{n-1}{mn}},$

where $\bar y_{\cdot\cdot}$ is the grand sample mean and $\mu$ its population counterpart. Since $\bar y_{\cdot\cdot} = \frac{1}{n}\sum \bar y_{i\cdot}$, one obtains $\operatorname{cov}[\bar y_{i\cdot}, \bar y_{\cdot\cdot}] = \frac{1}{n}\operatorname{var}[\bar y_{i\cdot}] = \frac{\sigma^2}{mn}$. Therefore $\operatorname{var}[\bar y_{i\cdot} - \bar y_{\cdot\cdot}] = \sigma^2\big(\frac{1}{m} - \frac{2}{mn} + \frac{1}{mn}\big) = \sigma^2\frac{n-1}{mn}$, and the correlation coefficient between different $z_i$ is $-1/(n-1)$.

43.1.4. Studentized Range. This is a famous example; it is not in Seber [Seb77], but we should at least know what it is. Just as the projected F intervals are connected with the name of Scheffé, these intervals are connected with the name of Tukey. Again in the situation of our basic example, one uses $\bar y_{i\cdot} - \bar y_{k\cdot}$ to build confidence intervals for $\mu_i - \mu_k$ for all pairs $i, k$ with $i \ne k$. This is no longer the equicorrelated case. (Such simultaneous confidence intervals are useful if one knows that one will compare means, but does not know a priori which means.)

Problem 435. Again in our basic example (here with $n = 4$ group means), define
$$z = \frac{1}{\sqrt 2}\begin{bmatrix} \bar y_{1\cdot} - \bar y_{2\cdot} \\ \bar y_{1\cdot} - \bar y_{3\cdot} \\ \bar y_{1\cdot} - \bar y_{4\cdot} \\ \bar y_{2\cdot} - \bar y_{3\cdot} \\ \bar y_{2\cdot} - \bar y_{4\cdot} \\ \bar y_{3\cdot} - \bar y_{4\cdot} \end{bmatrix}.$$
Compute the correlation matrix of $z$.

Answer. Write $z = \frac{1}{\sqrt 2} A \bar y$; therefore $\mathcal V[z] = \frac{\sigma^2}{2m} A A'$, where

(43.1.7)  $A = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{bmatrix}, \qquad \mathcal V[z] = \frac{\sigma^2}{2m} A A' = \frac{\sigma^2}{2m}\begin{bmatrix} 2 & 1 & 1 & -1 & -1 & 0 \\ 1 & 2 & 1 & 1 & 0 & -1 \\ 1 & 1 & 2 & 0 & 1 & 1 \\ -1 & 1 & 0 & 2 & 1 & -1 \\ -1 & 0 & 1 & 1 & 2 & 1 \\ 0 & -1 & 1 & -1 & 1 & 2 \end{bmatrix}.$

Since every diagonal element of $AA'$ equals 2, the correlation matrix of $z$ is simply $\frac{1}{2}AA'$.
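This answer is easy to check numerically; here is a short R/S sketch (purely illustrative). Since $\sigma^2$ and $m$ only scale the dispersion matrix, the correlation matrix is obtained by dividing $AA'$ by its constant diagonal entry 2.

## Problem 435: the dispersion matrix of z is (sigma^2/(2m)) A A';
## sigma and m drop out when one passes to the correlation matrix.
A <- matrix(c(1, -1,  0,  0,
              1,  0, -1,  0,
              1,  0,  0, -1,
              0,  1, -1,  0,
              0,  1,  0, -1,
              0,  0,  1, -1), nrow = 6, byrow = TRUE)
AAt <- A %*% t(A)   # all diagonal entries equal 2
AAt / 2             # the correlation matrix of z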
43.2. Relation between F-test and t-tests

Assume you have constructed the t-intervals for several different linear combinations of the two parameters $\beta_1$ and $\beta_2$. In the $(\beta_1, \beta_2)$-plane, each of these intervals can be represented by a band delimited by parallel straight lines. If one draws many of these bands, their intersection approaches an ellipse, which has the same shape as the joint F-confidence region for $\beta_1$ and $\beta_2$, but is smaller, i.e., it comes from an F-test with a lower confidence level (a higher $\alpha$). The F-test, say for $\beta_1 = \beta_2 = 0$, is therefore equivalent not to two but to infinitely many t-tests, one for each linear combination of $\beta_1$ and $\beta_2$, but each of these t-tests has a higher confidence level than that of the F-test. This is the right way to look at the F-test.

In what situations would one want an F-confidence region in order to get information about many different linear combinations of the parameters at the same time?

For instance, one examines a regression output, looks at all the parameters, computes whatever linear combinations of parameters are of interest, and believes them to be significant if their t-tests reject. This whole procedure is sometimes considered a misuse of statistics, "data-snooping," but Scheffé argued that it is justified if one raises the significance level to that of the F-test implied by the infinitely many t-tests of all linear combinations of $\beta$.

Or one looks only at certain kinds of linear combinations, for instance at all contrasts, i.e., linear combinations whose coefficients sum to zero. This is a very thorough way to ascertain whether all parameters are equal.

Or one wants to draw a confidence band around the whole regression line.

Problem 436. Someone fits a regression with 18 observations, one explanatory variable and a constant term, and then draws around each point of the regression line a standard 95% t-interval. What is the probability that the band created in this way covers the true regression line over its entire length? Note: the Splus commands qf(1-alpha,df1,df2) and qt(1-alpha/2,df) give quantiles, and the commands pf(critval,df1,df2) and pt(critval,df) give the cumulative distribution functions of the F and t distributions.

Answer. Instead of $n = 18$ and $k = 2$ we do it for arbitrary $n$ and $k$. We need the $\alpha$ for which

(43.2.1)  $t_{(n-k;\,0.025)} = \sqrt{k F_{(k,\,n-k;\,\alpha)}}$

(43.2.2)  $\tfrac{1}{k}\big(t_{(n-k;\,0.025)}\big)^2 = F_{(k,\,n-k;\,\alpha)}$

(43.2.3)  $1 - \alpha = \Pr\big[F_{k,\,n-k} \le \tfrac{1}{k}\big(t_{(n-k;\,0.025)}\big)^2\big]$

With $k = 2$ the Splus command is obsno<-18; conflev<-pf(qt(0.975,obsno-2)^2/2,2,obsno-2). The value is 0.8620989.
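The Splus one-liner generalizes directly to other sample sizes and numbers of parameters. Here is a sketch in the same R/S syntax (the function name is an assumption for illustration), implementing (43.2.3):

## Coverage probability of the band made of pointwise t-intervals,
## for n observations and k regression parameters; see (43.2.3).
band.coverage <- function(n, k, level = 0.95) {
  tcrit <- qt(1 - (1 - level)/2, n - k)
  pf(tcrit^2 / k, k, n - k)
}
band.coverage(18, 2)   # 0.8620989, as computed above

Note how far the joint coverage falls below the nominal 95%: the pointwise intervals are too narrow to cover the whole regression line simultaneously.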
Problem 437. 6 points. Which options do you have if you want to test more than one hypothesis at the same time? Describe situations in which one F-test is better than two t-tests (i.e., in which an elliptical confidence region is better than a rectangular one). Are there also situations in which you might want two t-tests instead of one F-test?

In the one-dimensional case this confidence region is identical to the t-interval. But if one draws, for $i = 2$, the confidence ellipse generated by the F-test and the two intervals generated by the t-tests into the same diagram, one obtains the picture in figure 5.1 of Seber [Seb77, p. 131]. In terms of hypothesis testing this means: there are values for which the F-test does not reject but one or both t-tests reject, and there are values for which one or both t-tests fail to reject but the F-test rejects. The reason for this confusing situation is that one should not compare t-tests and F [...]

[...] can be written in the form $\Sigma u$ for some $u$. Now as an example let us do Johnson and Wichern, example 5.7 on pp. 194–197. In a survey, people in a city are asked which bank is their primary savings bank. The answers are collected as rows in the $Y$ matrix. The columns correspond to banks A, B, C, D, to some other bank, and to having no savings. Each row has exactly one 1 in the column corresponding to the respondent's primary savings bank, zeros otherwise. The people with no savings will be ignored, i.e., their rows will be trimmed from the matrix together with the last column. After this trimming, $Y$ has 5 columns, and there are 355 respondents in these five categories. It is assumed that the rows of $Y$ are independent, which presupposes sampling with replacement, i.e., the sampling [...]

[...] other consistent estimate, for that matter, in the above model): Assume $y \sim N(\mu, \Sigma)$ with unknown $\mu$ and known $\Sigma$. We allow $\Sigma$ to be singular, i.e., there may be some nonzero linear combinations $g'y$ which have zero variance. Let $q$ be the rank of $\Sigma$. Then a simultaneous $1 - \alpha$ confidence region for all linear combinations of $\mu$ is

(43.3.1)  $g'\mu \in g'y \pm \sqrt{\chi^2_{q;\alpha}\, g'\Sigma g},$

where $\chi^2_{q;\alpha}$ [...]

[...] while the d.f. of the SSE is the number of observations minus the number of parameters (intercept and slope parameters) in the regression. The d.f. of the sum of squares due to the model consists in the number of slope parameters (not counting the intercept) in the model. The "mean squares" are the corresponding sums of squares divided by their degrees of freedom. This [...]

[...] The prob>|t| value indicates the significance for the two-sided test. Problem 438. What does the c stand for in c total in Table 1? Problem 439. 4 points. Using the sample SAS regression output in Table 1, test at the 5% significance level that the coefficient of gender is $-1.0$, against the alternative that it is $< -1.0$. Answer. $-1.73266849 - (-1) = -0.73266849$ must be divided by $0.41844140$, which gives $-1.7509465$, and then [...]

[...] tests regarding the unknown $\mu$ in terms of the sample mean $\bar y$, which one may assume to be normally distributed, and whose true dispersion matrix may be assumed to be known and to be equal to the sample dispersion matrix of the $y_i$, divided by $n$. Therefore it makes sense to look at the following model (the $y$ in this model is equal to the $\bar y$ in the above model, and the $\Sigma$ in this model is equal to $S/n$, [...]

[...] R-square is ss(model) divided by ss(c total), and the adjusted R-square is $\bar R^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)}$. For every parameter, including the intercept, the estimated value is printed, and next to it the estimate of its standard deviation. The next column has the t-value, which is the estimated value divided by its estimated standard deviation. This is the test statistic [...]

[...] all the bands tangent to the ellipse. Taking only the vertical and the horizontal band tangent to the ellipse, one has now the following picture: if one of the t-tests rejects, then the F-test rejects too. But it may be possible that the F-test rejects while neither of the two t-tests rejects. In this case, there must be some other linear combination [...]

[...] • f. Here is the printout of the analysis of variance table after additional explanatory variables were included in the above regression (i.e., the dependent variable is the same, and the set of explanatory variables contains all variables used in the above regression, plus some additional ones). • g. 1 point. There were [...] additional variables. • h. 3 points. Make an F test [...]

[...] It is sufficient to take the intersection over all $g$ with unit length. What does each of these intersected regions look like? First note that the $i \times 1$ vector $u$ lies in that region if and only if $g'u$ lies in a t-interval for $g'R\beta$, whose confidence level is no longer $\alpha$ but is $\gamma = \Pr\big[|t| \le \sqrt{i F_{(i,\,n-q;\,\alpha)}}\big]$, where $t$ is distributed as a t with $n - q$ degrees of freedom. Geometrically, in Seber [Seb77]'s figure 5.1, [...]
