The estimated standard error of β̂j is


s_β̂j = sε √vjj

where sε is the standard deviation from the regression equation and vjj is the entry in row j + 1, column j + 1 of (X′X)⁻¹; that is, v00, v11, . . . , vkk denote the diagonal entries of (X′X)⁻¹.

Because the (X′X)⁻¹ matrix must be computed to obtain the β̂j's, it is easy to get the estimated standard errors.
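As an illustration (not from the text), the computation of the estimated standard errors from the diagonal of (X′X)⁻¹ can be sketched in Python with simulated data; the data and variable names here are hypothetical:

```python
import numpy as np

# Simulated data: n observations, k = 2 explanatory variables.
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # first column: intercept
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y            # least-squares estimates

# s_eps: residual standard deviation with n - (k + 1) degrees of freedom
resid = y - X @ beta_hat
s_eps = np.sqrt(resid @ resid / (n - (k + 1)))

# Standard error of beta_hat_j uses the diagonal entries v_jj of (X'X)^{-1}
se = s_eps * np.sqrt(np.diag(XtX_inv))
print(beta_hat, se)
```

Once (X′X)⁻¹ is in hand, the standard errors are a single extra line, which is the point made above.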


12.10 Research Study: Evaluation of the Performance of an Electric Drill

Defining the Problem

There have been numerous reports of homeowners encountering problems with electric drills. The drills would tend to overheat when under strenuous usage. A consumer product testing laboratory has selected a variety of brands of electric drills to determine what types of drills are most and least likely to overheat under specified conditions. After a careful evaluation of the differences in the design of the drills, the engineers selected three design factors for use in comparing the resistance of the drills to overheating. The design factors were the thickness of the insulation around the motor, the quality of the wire used in the drill’s motor, and the size of the vents in the body of the drill.

Collecting the Data

The engineers designed a study taking into account various combinations of the three design factors. There were five levels of the thickness of the insulation, three levels of the quality of the wire used in the motor, and three sizes for the vents in the drill body.

Thus, the engineers had potentially 45 (5 × 3 × 3) uniquely designed drills. However, each of these 45 drills would have differences with respect to other factors that may affect their performance. Thus, the engineers selected ten drills from each of the 45 designs. Another factor that may affect the results of the study is the conditions under which each of the drills is tested. The engineers selected two “torture tests”
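The 45 design combinations can be enumerated directly; a sketch (the level values are the IT, QW, and VS levels given later in the study):

```python
from itertools import product

# Five insulation thicknesses, three wire qualities, three vent sizes
designs = list(product([2, 3, 4, 5, 6],   # IT
                       [6, 7, 8],         # QW
                       [10, 11, 12]))     # VS
print(len(designs))  # 45 uniquely designed drills
```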

which they felt reasonably represented the types of conditions under which overheating occurred. The ten drills were then randomly assigned to one of the two torture tests. At the end of the test, the temperature of the drill was recorded. The mean temperature of the five drills was the response variable of interest to the engineers. A second response variable was the logarithm of the sample variance of the five drills.

This response variable measures the degree to which the five drills produced a consistent temperature under each of the torture tests. The goal of the study was to determine which combination of the design factors of the drills produced the smallest values of both response variables. Thus, they would obtain a design for a drill having minimum mean temperature and a design which produced drills for which an individual drill was most likely to produce a temperature closest to the mean temperature.

Summarizing the Data

The data consist of the 90 responses under the various designs and tests. The data were presented in Table 12.4 at the beginning of this chapter with the variables of interest given below.

AVTEM: mean temperature for the five drills under a given torture test
LOGV: logarithm of the variance of the temperatures of the five drills
IT: thickness of the insulation within the drill (IT = 2, 3, 4, 5, or 6)
QW: assessment of the quality of the wire used in the drill motor (QW = 6, 7, or 8)
VS: size of the vent used in the motor (VS = 10, 11, or 12)
TEST: type of torture test used
I2 = (IT − mean IT)², Q2 = (QW − mean QW)², V2 = (VS − mean VS)²

The response variables (dependent variables) are AVTEM and LOGV. The explanatory variables (independent variables) are IT, QW, and VS. Quadratic versions of all three variables will also be considered in finding an appropriate model. These variables are denoted as I2, Q2, and V2. We thus have six possible explanatory variables to be used in our model. There are a total of 90 observations in this study. A preliminary summary of the data is given by the scatterplots that follow (Figures 12.7 and 12.8).
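The quadratic variables are centered before squaring (I2 = (IT − mean IT)², and similarly for Q2 and V2). A sketch of this construction with a hypothetical sample of IT values:

```python
import numpy as np

# Hypothetical sample of insulation-thickness values (levels 2-6, balanced)
IT = np.array([2, 3, 4, 5, 6, 2, 3, 4, 5, 6], dtype=float)

# Centering before squaring reduces collinearity between IT and its quadratic term
I2 = (IT - IT.mean()) ** 2
print(I2)
```

With a balanced, symmetric set of levels the centered quadratic is uncorrelated with the linear term, which is one reason centering is used.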


FIGURE 12.7 Scatterplots of IT, QW, and VS versus AVTEM

[Figure: three scatterplots of AVTEM (140 to 190) plotted against IT (2 to 6), QW (6 to 8), and VS (10 to 12)]

FIGURE 12.8 Scatterplots of IT, QW, and VS versus LOGV

[Figure: three scatterplots of LOGV (2.6 to 3.8) plotted against IT (2 to 6), QW (6 to 8), and VS (10 to 12)]

From the scatterplots the following relationships between the variables are obtained: AVTEM tends to decrease as IT increases, but in a nonlinear fashion.

However, AVTEM appears to remain fairly constant with increases in QW or VS.

Similarly, LOGV tends to decrease as QW increases, but not at a constant rate.

LOGV tends to remain fairly constant with increases in IT or VS.

Analyzing the Data

After examining the scatterplots, the models in Table 12.18 were considered in an attempt to relate AVTEM and LOGV to the explanatory variables.

The goal was to obtain models for AVTEM and LOGV which fit the data well but did not overfit the data. Thus, models were sought which would have a significant fit (small p-value and large R² value) without having too many terms in the model. The eight models were programmed for analysis using the SAS software.

SAS output is given below, with the notation for the variables defined in Table 12.17:

TABLE 12.17 Notation for variables in regression models

Variable    Notation        Variable    Notation
IT          x1              IT * QW     x7
QW          x2              IT * VS     x8
VS          x3              VS * QW     x9
I2          x4              AVTEM       y1
Q2          x5              LOGV        y2
V2          x6

TABLE 12.18 Models for describing AVTEM

Model 1: AVTEM = β0 + β1 IT + β2 QW + β3 VS + ε
Model 2: AVTEM = β0 + β1 IT + β2 QW + β3 VS + β4 I2 + β5 Q2 + β6 V2 + ε
Model 3: AVTEM = β0 + β1 IT + β2 QW + β3 VS + β4 IT * QW + β5 IT * VS + β6 QW * VS + ε
Model 4: AVTEM = β0 + β1 IT + β2 QW + β3 VS + β4 I2 + β5 Q2 + β6 V2 + β7 IT * QW + β8 IT * VS + β9 QW * VS + ε

The SAS System

OUTPUT FROM MODELS FOR RELATING AVTEM (y1) TO EXPLANATORY VARIABLES
Dependent Variable: y1

MODEL 1:

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        7660.94568     2553.64856     131.97    <.0001
Error              86        1664.17654       19.35089
Corrected Total    89        9325.12222

Root MSE           4.39896    R-Square    0.8215
Dependent Mean   164.25556    Adj R-Sq    0.8153

Parameter Estimates
Variable    DF      Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     234.56106           7.63872      30.71      <.0001
x1           1      -6.15000           0.32788     -18.76      <.0001
x2           1      -0.67445           0.56822      -1.19      0.2385
x3           1      -3.73340           0.56843      -6.57      <.0001
---
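Each t value in the Parameter Estimates table is simply the estimate divided by its standard error; for example, for x1 in Model 1 (numbers transcribed from the output above):

```python
# x1 (IT) in Model 1: estimate and standard error from the SAS output
estimate, std_error = -6.15000, 0.32788
t_value = estimate / std_error
print(round(t_value, 2))  # matches the -18.76 reported by SAS
```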


MODEL 2:

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6        7941.21675     1323.53612      79.38    <.0001
Error              83        1383.90547       16.67356
Corrected Total    89        9325.12222

Root MSE           4.08333    R-Square    0.8516
Dependent Mean   164.25556    Adj R-Sq    0.8409

Parameter Estimates
Variable    DF      Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     234.87853           7.16673      32.77      <.0001
x1           1      -6.18215           0.30447     -20.30      <.0001
x2           1      -0.72541           0.52761      -1.37      0.1729
x3           1      -3.81541           0.52812      -7.22      <.0001
x4           1       0.96451           0.24758       3.90      0.0002
x5           1      -0.29207           0.91332      -0.32      0.7499
x6           1      -1.04740           0.91355      -1.15      0.2549
---

MODEL 3:

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6        7683.85390     1280.64232      64.76    <.0001
Error              83        1641.26833       19.77432
Corrected Total    89        9325.12222

Root MSE           4.44683    R-Square    0.8240
Dependent Mean   164.25556    Adj R-Sq    0.8113

Parameter Estimates
Variable    DF      Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     214.01181          58.56103       3.65      0.0005
x1           1      -0.53333           5.30316      -0.10      0.9201
x2           1       0.21831           7.91120       0.03      0.9781
x3           1      -2.60819           5.21968      -0.50      0.6186
x7           1      -0.29167           0.40594      -0.72      0.4745
x8           1      -0.32500           0.40594      -0.80      0.4256
x9           1       0.02498           0.70409       0.04      0.9718
---

MODEL 4:

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               9        7968.16362      885.35151      52.20    <.0001
Error              80        1356.95860       16.96198
Corrected Total    89        9325.12222

Root MSE           4.11849    R-Square    0.8545
Dependent Mean   164.25556    Adj R-Sq    0.8381

Parameter Estimates
Variable    DF      Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     203.41326          54.30065       3.75      0.0003
x1           1      -0.22505           4.91223      -0.05      0.9636
x2           1       1.72599           7.33803       0.24      0.8146
x3           1      -1.82023           4.83905      -0.38      0.7078
x4           1       0.97354           0.25005       3.89      0.0002
x5           1      -0.29587           0.92146      -0.32      0.7490
x6           1      -1.04984           0.92165      -1.14      0.2581
x7           1      -0.34034           0.37617      -0.90      0.3683
x8           1      -0.32500           0.37597      -0.86      0.3899
x9           1      -0.09944           0.65298      -0.15      0.8793
---

TABLE 12.19 Models for describing LOGV

Model 1: LOGV = β0 + β1 IT + β2 QW + β3 VS + ε
Model 2: LOGV = β0 + β1 IT + β2 QW + β3 VS + β4 I2 + β5 Q2 + β6 V2 + ε
Model 3: LOGV = β0 + β1 IT + β2 QW + β3 VS + β4 IT * QW + β5 IT * VS + β6 QW * VS + ε
Model 4: LOGV = β0 + β1 IT + β2 QW + β3 VS + β4 I2 + β5 Q2 + β6 V2 + β7 IT * QW + β8 IT * VS + β9 QW * VS + ε

OUTPUT FROM MODELS FOR RELATING LOGV (y2) TO EXPLANATORY VARIABLES
Dependent Variable: y2

MODEL 1:

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3           9.87413        3.29138     160.33    <.0001
Error              86           1.76543        0.02053
Corrected Total    89          11.63956

Root MSE         0.14328    R-Square    0.8483
Dependent Mean   3.19778    Adj R-Sq    0.8430

Parameter Estimates
Variable    DF      Estimate    Standard Error    t Value    Pr > |t|
Intercept    1       6.23345           0.24880      25.05      <.0001
x1           1       0.00667           0.01068       0.62      0.5341
x2           1      -0.40568           0.01851     -21.92      <.0001
x3           1      -0.02028           0.01851      -1.10      0.2764
---

MODEL 2:

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6           9.96474        1.66079      82.30    <.0001
Error              83           1.67482        0.02018
Corrected Total    89          11.63956

Root MSE         0.14205    R-Square    0.8561
Dependent Mean   3.19778    Adj R-Sq    0.8457

Parameter Estimates
Variable    DF      Estimate    Standard Error    t Value    Pr > |t|
Intercept    1       6.25908           0.24932      25.10      <.0001
x1           1       0.00632           0.01059       0.60      0.5525
x2           1      -0.40624           0.01835     -22.13      <.0001
x3           1      -0.02148           0.01837      -1.17      0.2457
x4           1       0.01047           0.00861       1.22      0.2274
x5           1       0.01043           0.03177       0.33      0.7436
x6           1      -0.05300           0.03178      -1.67      0.0991
---

MODEL 3:

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6           9.97345        1.66224      82.81    <.0001
Error              83           1.66610        0.02007
Corrected Total    89          11.63956

Root MSE         0.14168    R-Square    0.8569
Dependent Mean   3.19778    Adj R-Sq    0.8465

Parameter Estimates
Variable    DF      Estimate    Standard Error    t Value    Pr > |t|
Intercept    1       9.95482           1.86582       5.34      <.0001
x1           1      -0.21000           0.16896      -1.24      0.2174
x2           1      -0.81681           0.25206      -3.24      0.0017
x3           1      -0.35718           0.16630      -2.15      0.0347
x7           1    0.00083333           0.01293       0.06      0.9488
x8           1       0.01917           0.01293       1.48      0.1421
x9           1       0.03719           0.02243       1.66      0.1012
---

MODEL 4:

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               9          10.05889        1.11765      56.57    <.0001
Error              80           1.58066        0.01976
Corrected Total    89          11.63956

Root MSE         0.14056    R-Square    0.8642
Dependent Mean   3.19778    Adj R-Sq    0.8489

Parameter Estimates
Variable    DF      Estimate    Standard Error    t Value    Pr > |t|
Intercept    1       9.83366           1.85328       5.31      <.0001
x1           1      -0.20686           0.16765      -1.23      0.2209
x2           1      -0.79658           0.25045      -3.18      0.0021
x3           1      -0.34633           0.16516      -2.10      0.0392
x4           1       0.00993           0.00853       1.16      0.2482
x5           1       0.01164           0.03145       0.37      0.7122
x6           1      -0.05187           0.03146      -1.65      0.1031
x7           1    0.00033702           0.01284       0.03      0.9791
x8           1       0.01917           0.01283       1.49      0.1392
x9           1       0.03547           0.02229       1.59      0.1154
---

The fits of the eight models are summarized in Table 12.21. We repeat the table of models (Table 12.20) to assist in the evaluation:

TABLE 12.20 Models for describing AVTEM and LOGV

Models for AVTEM
Model 1: AVTEM = β0 + β1 IT + β2 QW + β3 VS + ε
Model 2: AVTEM = β0 + β1 IT + β2 QW + β3 VS + β4 I2 + β5 Q2 + β6 V2 + ε
Model 3: AVTEM = β0 + β1 IT + β2 QW + β3 VS + β4 IT * QW + β5 IT * VS + β6 QW * VS + ε
Model 4: AVTEM = β0 + β1 IT + β2 QW + β3 VS + β4 I2 + β5 Q2 + β6 V2 + β7 IT * QW + β8 IT * VS + β9 QW * VS + ε

Models for LOGV
Model 1: LOGV = β0 + β1 IT + β2 QW + β3 VS + ε
Model 2: LOGV = β0 + β1 IT + β2 QW + β3 VS + β4 I2 + β5 Q2 + β6 V2 + ε
Model 3: LOGV = β0 + β1 IT + β2 QW + β3 VS + β4 IT * QW + β5 IT * VS + β6 QW * VS + ε
Model 4: LOGV = β0 + β1 IT + β2 QW + β3 VS + β4 I2 + β5 Q2 + β6 V2 + β7 IT * QW + β8 IT * VS + β9 QW * VS + ε

TABLE 12.21 Model summary information

Models for AVTEM
Model      R²      Model p-value    p-value for Model Comparisons
Model 1    .822    <.0001           Model 2 vs Model 1: p-value = .0015
Model 2    .852    <.0001           Model 3 vs Model 1: p-value = .7605
Model 3    .824    <.0001           Model 4 vs Model 3: p-value = .0016
Model 4    .855    <.0001           Model 4 vs Model 2: p-value = .5296

Models for LOGV
Model      R²      Model p-value    p-value for Model Comparisons
Model 1    .848    <.0001           Model 2 vs Model 1: p-value = .2206
Model 2    .856    <.0001           Model 3 vs Model 1: p-value = .1842
Model 3    .857    <.0001           Model 4 vs Model 3: p-value = .2373
Model 4    .864    <.0001           Model 4 vs Model 2: p-value = .5296

All four models for AVTEM provided a significant (p-value < .0001) fit to the data set. The R² values for the four models relating AVTEM to the explanatory variables are .822, .852, .824, and .855. There is very little difference in the four values for R². Based on the significant fit and the very slight differences in the R² values, the most appropriate model would be the model with the fewest independent variables, namely model 1. Another comparison of the models involves testing whether adding extra terms to model 1 yielded any significant terms in the fitted model.

From Table 12.21, only model 2 had added terms over model 1 which were significantly different from 0. That is, the question of examining adding terms to model 1 in order to obtain model 2 is equivalent to testing in model 2 the hypotheses:

H0: β4 = β5 = β6 = 0   versus   Ha: at least one of β4, β5, β6 ≠ 0

From the SAS output, we obtain the model sums of squares from the two models and compute the value of the F statistic for the full model (model 2) versus the reduced model (model 1):

F = [(7941.21675 − 7660.94568)/(6 − 3)] / (1383.90547/83) = 5.60

with df = 3, 83 and p-value = Pr(F₃,₈₃ ≥ 5.60) = .0015
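The reduced-versus-full F statistic can be computed directly from the two ANOVA tables; a sketch in Python, with the sums of squares transcribed from the SAS output:

```python
def nested_f(ss_reg_full, ss_reg_reduced, df_extra, ss_resid_full, df_resid_full):
    """F statistic for comparing a full model to a reduced (nested) model."""
    return ((ss_reg_full - ss_reg_reduced) / df_extra) / (ss_resid_full / df_resid_full)

# Model 2 (full) versus model 1 (reduced) for AVTEM
f_stat = nested_f(7941.21675, 7660.94568, 6 - 3, 1383.90547, 83)
print(round(f_stat, 2))  # 5.6, matching the value computed in the text
```

The p-value would then come from the F distribution with 3 and 83 degrees of freedom, as reported above.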


We thus conclude that model 2 is significantly different in fit from model 1; that is, at least one of β4, β5, β6 is not equal to 0 in model 2. With p-value = .761, we would conclude that model 3 is not significantly different in fit from model 1; that is, we cannot reject the hypothesis that β4 = β5 = β6 = 0 in model 3. With p-value = .530, we would conclude that model 4 is not significantly different in fit from model 2, because we cannot reject the hypothesis that β7 = β8 = β9 = 0 in model 4.

Based on the scatterplots and the above tests, model 2 would be the most appropriate model. Although model 4 has a slightly larger R² value, the F test demonstrates that model 4 is not significantly different from model 2, whereas model 2 is significantly different from model 1. Model 2 includes the variables I2, Q2, and V2, at least one of which appears to significantly improve the fit of the model over model 1.

Model 4 is more complex than model 2 but does not appear to provide much improvement in the fit over model 2 (R² = .8545 versus .8516).
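Model 2's fitted coefficients can be packaged as a prediction function; a sketch (coefficients transcribed, rounded, from the SAS output for model 2; the centered means of 4, 7, and 11 for IT, QW, and VS are assumptions based on the balanced design, not stated in the output):

```python
def predict_avtem(it, qw, vs):
    """Predicted mean temperature under AVTEM model 2 (rounded coefficients)."""
    i2 = (it - 4) ** 2   # centered quadratics; means assumed from the balanced design
    q2 = (qw - 7) ** 2
    v2 = (vs - 11) ** 2
    return (234.879 - 6.182 * it - 0.725 * qw - 3.815 * vs
            + 0.965 * i2 - 0.292 * q2 - 1.047 * v2)

# At the center of the design the quadratic terms vanish
print(predict_avtem(4, 7, 11))
```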

For the purpose of predicting values of AVTEM, the least-squares estimates produce the following prediction equation (model 2):

ŷ = 234.879 − 6.182 IT − .725 QW − 3.815 VS + .965 I2 − .292 Q2 − 1.047 V2

For the response variable LOGV, all four models provided a significant (p-value < .0001) fit to the data set. The R² values for the four models relating LOGV to the explanatory variables are .848, .856, .857, and .864. There is very little difference in the models based on the values for R². Based on the significant fit and the very slight differences in the R² values, the most appropriate model would be the model with the fewest independent variables, namely, model 1. Another comparison of the models involves testing whether adding extra terms to model 1 yielded any significant terms in the fitted model. From Table 12.21, none of the models provided a significant improvement in fit over model 1. With p-value = .221, we would conclude that model 2 is not significantly different in fit from model 1; that is, we cannot reject the hypothesis that β4 = β5 = β6 = 0 in model 2. With p-value = .184, we would conclude that model 3 is not significantly different in fit from model 1; that is, we cannot reject the hypothesis that β4 = β5 = β6 = 0 in model 3. With p-value = .237, we would conclude that model 4 is not significantly different in fit from model 3; that is, we cannot reject the hypothesis that β4 = β5 = β6 = 0 in model 4. With p-value = .530, we would conclude that model 4 is not significantly different in fit from model 2; that is, we cannot reject the hypothesis that β7 = β8 = β9 = 0 in model 4.

Based on the scatterplots, the fit statistics, and tests of hypotheses, model 1 would appear to be the most appropriate model. Model 2 and model 3 are not significantly different from model 1. Model 4 is more complex than model 2 but does not provide much improvement in the fit over model 2. Therefore, since the models are not significantly different, the R² values are nearly the same, and model 1 is the model containing the fewest independent variables (hence the easiest to understand), we would select model 1. For the purpose of predicting values of LOGV, the least-squares estimates produce the following prediction equation (model 1):

ŷ = 6.233 + .00667 IT − .406 QW − .0203 VS

12.11 Summary and Key Formulas

This chapter consolidates the material for expressing a response y as a function of one or more independent variables. Multiple regression models (where all the independent variables are quantitative) and models that incorporate information on qualitative variables were discussed; both can be represented in the form of a general linear model

y = β0 + β1x1 + β2x2 + ⋯ + βkxk + ε

After discussing various models and the interpretation of the βs in these models, we presented the normal equations used in obtaining the least-squares estimates β̂j.

A confidence interval and statistical test about an individual parameter βj were developed using β̂j and the standard error of β̂j. We also considered a statistical test about a set of βs, a confidence interval for E(y) based on a set of xs, and a prediction interval for a given set of xs.

All of these inferences involve a fair to moderate amount of numerical calculation unless statistical software programs or packages are available. Sometimes these calculations can be done by hand if one is familiar with matrix operations (see Section 12.9). However, even these methods become unmanageable as the number of independent variables increases. Thus, the message should be very clear.

Inferences about general linear models should be done using available computer software to facilitate the analysis and to minimize computational errors. Our job in these situations is to review and interpret the output.

Aside from a few exercises that will probe your understanding of the me- chanics involved with these calculations, most of the exercises in the remainder of this chapter and in the regression problems of the next chapter will make extensive use of computer output.

Here are some reminders about multiple regression concepts:

1. Regression coefficients in a first-order model (one not containing transformed values, such as squares of a variable or product terms) should be interpreted as partial slopes—the predicted change in a dependent variable when an independent variable is increased by one unit, while other variables are held constant.

2. Correlations are important, not only between an independent variable and the dependent variable, but also between independent variables.

Collinearity—correlation between independent variables—implies that regression coefficients will change as variables are added to or deleted from a regression model.

3. The effectiveness of a regression model can be indicated not only by the R2value but also by the residual standard deviation.

4. As always, the various statistical tests in a regression model only indicate how strong the evidence is that the apparent pattern is more than random. They don’t directly indicate how good a predictive model is. In particular, a large overall F statistic may merely indicate a weak prediction in a large sample.

5. A t test in a multiple regression assesses whether that independent variable adds unique predictive value as a predictor in the model. It is quite possible that several variables may not add a statistically detectable amount of unique predictive value, yet deleting all of them from the model causes a serious drop in predictive value. This is especially true when there is severe collinearity.

6. The variance inflation factor (VIF) is a useful indicator of the overall impact of collinearity in estimating the coefficient of an independent variable. The higher the VIF number, the more serious is the impact of collinearity on the accuracy of a slope estimate.
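The VIF can be computed by regressing each independent variable on the others and using VIFj = 1/(1 − R²j); a sketch with simulated collinear data (all names and data here are illustrative):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j of X: 1 / (1 - R_j^2)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])   # regress x_j on the rest
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                   # independent of the others
X = np.column_stack([x1, x2, x3])
print(vif(X, 0), vif(X, 2))                 # large VIF for x1, near 1 for x3
```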


7. Extrapolation in multiple regression can be subtle. The values in a new set of x values may not be unreasonable when considered one by one, but the combination of values may be far outside the range of previous data.

Key Formulas

1. R²_{y·x1⋯xk} = [SS(Total) − SS(Residual)] / SS(Total) = SS(Regression) / SS(Total)

   where SS(Total) = Σ(yᵢ − ȳ)², SS(Regression) = Σ(ŷᵢ − ȳ)², and SS(Residual) = Σ(yᵢ − ŷᵢ)²

2. F test for H0: β1 = β2 = ⋯ = βk = 0

   T.S.: F = [SS(Regression)/k] / {SS(Residual)/[n − (k + 1)]}

3. s_β̂j = sε √(1 / [Σᵢ(xᵢⱼ − x̄ⱼ)² (1 − R²_{xj·x1⋯xj−1 xj+1⋯xk})])

   where sε = √MS(Residual) = √(SS(Residual)/[n − (k + 1)])

4. Confidence interval for βj: β̂j − t_{α/2} s_β̂j ≤ βj ≤ β̂j + t_{α/2} s_β̂j

5. Statistical test for βj: T.S.: t = β̂j / s_β̂j

6. Testing a subset of predictors

   H0: β_{g+1} = β_{g+2} = ⋯ = βk = 0

   T.S.: F = {[SS(Regression, complete) − SS(Regression, reduced)]/(k − g)} / {SS(Residual, complete)/[n − (k + 1)]}

7. Assessing collinearity

   VIFⱼ = 1/(1 − R²ⱼ), where R²ⱼ = R²_{xj·x1⋯xj−1 xj+1⋯xk}

12.12 Exercises

12.2 The General Linear Model

12.1 Let y be the yield in pounds of commercial cherry trees. A horticulturist wants to relate the yield to the amount of rainfall during the month prior to harvest, x1; the amount of nitrogen in the soil, x2; and the age of the tree, x3. Write a first-order multiple regression model relating y to x1, x2, and x3.


12.2 Refer to Exercise 12.1. Suppose there are three varieties of cherry trees and the horticulturist wants to relate yield to the three explanatory variables with a separate model for each variety.

a. Write a first-order general linear model which allows for different slopes and inter- cepts for each variety.

b. In terms of the coefficients of the model in part (a), identify the slopes and intercepts associated with each of the three varieties.

12.3 A kinesiologist is studying the conditioning of long-distance runners. A measure of conditioning is the maximum heart rate. The explanatory variables are x1, the age of the runner, and x2, the body mass index of the runner. Write a second-order regression model relating y to x1 and x2. Hint: A first-order model contains terms involving xᵢ, whereas a second-order regression model involves terms xᵢ, xᵢ², and xᵢxⱼ.

12.4 Refer to Exercise 12.3. Suppose the kinesiologist determines that it is important to have separate models for males and females.

a. Write a second-order general linear model which includes gender as one of the explanatory variables.

b. In terms of the coefficients of the model in part (a), identify the slopes and intercepts associated with each gender.

12.5 A research professor in a leading department of education is studying three different methods of teaching English as a second language. After three months in the program the participants take an exam; let y be the score on the exam. The following model was used to assess the efficiencies of the three methods:

y = β0 + β1x1 + β2x2 + ε

where

x1 = 1 if Method 2, 0 otherwise        x2 = 1 if Method 3, 0 otherwise

a. Interpret the βs in the above model in terms of the mean scores for each of the three methods.

b. Express the difference in mean scores for methods 1 and 2 in terms of the βs in the above model.

12.6 Refer to Exercise 12.5. Suppose the researcher wants to determine if one of the programs is more appropriate for females than males. The indicator variable x3 was now included in the model, where

x3 = 1 if Female, 0 if Male

The following first-order model was used to express y as a function of x1, x2, and x3:

y = β0 + β1x1 + β2x2 + β3x3 + β4x1 * x3 + β5x2 * x3 + ε

a. Using the coefficients, βs, from the above model, write separate models for females and males.

b. Express the difference in the mean scores for methods 1 and 2 for female participants.

c. Express the difference in the mean scores for female and male participants using method 2.

12.7 Refer to Exercise 12.5. The researcher has given each participant a test prior to the beginning of the study and obtains an index, x4, of the participant’s English proficiency. The following model was fit to the data set, ignoring differences due to gender:

y = β0 + β1x4 + β2x1 + β3x2 + β5x1 * x4 + β6x2 * x4 + ε

a. Using the coefficients, βs, from the above model, write three separate models, one for each method, relating the scores after three months in the program, y, to the scores prior to starting the program, x4.

b. If there is no difference in the slopes relating y to x4 for the three methods, which terms in the above model would be 0?

c. Write a general linear model which would include gender as one of the explanatory variables.
