Statistics in Geophysics: Linear Regression II
Steffen Unkel
Department of Statistics, Ludwig-Maximilians-University Munich, Germany
Winter Term 2013/14

Outline: Multiple linear regression model · Parameter estimation · Hypothesis testing and confidence intervals · Model choice and variable selection

Model definition

Suppose we have the following model under consideration:

    $y = X\beta + \varepsilon$,

where $y = (y_1, \ldots, y_n)'$ is an $n \times 1$ vector of observations on the response, $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$ is a $(k+1) \times 1$ vector of parameters, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random errors, and $X$ is the $n \times (k+1)$ design matrix

    $X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}$.

Model assumptions

The following assumptions are made:
1. $E(\varepsilon) = 0$.
2. $\operatorname{Cov}(\varepsilon) = \sigma^2 I$.
3. The design matrix has full column rank, that is, $\operatorname{rank}(X) = k + 1 = p$.
4. The normal regression model is obtained if additionally $\varepsilon \sim N(0, \sigma^2 I)$.

For stochastic covariates these assumptions are to be understood conditionally on $X$. It follows that $E(y) = X\beta$ and $\operatorname{Cov}(y) = \sigma^2 I$. In case of assumption 4, we have $y \sim N(X\beta, \sigma^2 I)$.

Modelling the effects of continuous covariates

We can fit nonlinear relationships between continuous covariates and the response within the scope of linear models. Two simple methods for dealing with nonlinearity:
- variable transformation;
- polynomial regression.
Sometimes it is customary to transform the continuous response as well.

Modelling the effects of categorical covariates

We might want to include a categorical variable with two or more distinct levels. In such a case we cannot set up a continuous scale for the categorical variable. A remedy is to define new covariates, so-called dummy variables, and estimate a separate effect for each category of the original covariate. We can deal with $c$ categories by the introduction of $c - 1$ dummy variables.

Example: Turkey data

       weightp agew origin
    1     13.3   28      G
    2      8.9   20      G
    3     15.1   32      G
    4     10.4   22      G
    5     13.1   29      V
    6     12.4   27      V
    7     13.2   28      V
    8     11.8   26      V
    9     11.5   21      W
    10    14.2   27      W
    11    15.4   29      W
    12    13.1   23      W
    13    13.8   25      W

Turkey weights in pounds, ages in weeks, and origin (Georgia (G); Virginia (V); Wisconsin (W)) of 13 Thanksgiving turkeys.

Dummy coding for categorical covariates

Given a covariate $x \in \{1, \ldots, c\}$ with $c$ categories, we define the $c - 1$ dummy variables

    $x_{i1} = \begin{cases} 1, & x_i = 1 \\ 0, & \text{otherwise} \end{cases} \quad \ldots \quad x_{i,c-1} = \begin{cases} 1, & x_i = c-1 \\ 0, & \text{otherwise} \end{cases}$

for $i = 1, \ldots, n$, and include them as explanatory variables in the regression model

    $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{c-1} x_{i,c-1} + \cdots + \varepsilon_i$.

For reasons of identifiability, we omit the dummy variable for category $c$, where $c$ is the reference category.
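As a concrete illustration, this coding can be reproduced in base R with model.matrix(); the following is a minimal sketch, assuming the turkey data are entered by hand (the object names turkey and X are illustrative). Note that R's default treatment coding drops the first factor level, so Georgia acts as the reference category here, matching the design matrix shown on the next slide.

    turkey <- data.frame(
      weightp = c(13.3, 8.9, 15.1, 10.4, 13.1, 12.4, 13.2,
                  11.8, 11.5, 14.2, 15.4, 13.1, 13.8),
      agew    = c(28, 20, 32, 22, 29, 27, 28, 26, 21, 27, 29, 23, 25),
      origin  = factor(c("G", "G", "G", "G", "V", "V", "V", "V",
                         "W", "W", "W", "W", "W"))
    )
    # model.matrix() builds the design matrix and dummy-codes the
    # factor; the first level ("G") is dropped as reference category
    X <- model.matrix(weightp ~ agew + origin, data = turkey)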
Design matrix for the turkey data using dummy coding

    > X
       (Intercept) agew originV originW
    1            1   28       0       0
    2            1   20       0       0
    3            1   32       0       0
    4            1   22       0       0
    5            1   29       1       0
    6            1   27       1       0
    7            1   28       1       0
    8            1   26       1       0
    9            1   21       0       1
    10           1   27       0       1
    11           1   29       0       1
    12           1   23       0       1
    13           1   25       0       1

Interactions between covariates

An interaction between predictor variables exists if the effect of a covariate depends on the value of at least one other covariate. Consider the following model between a response $y$ and two covariates $x_1$ and $x_2$:

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$,

where the term $\beta_3 x_1 x_2$ is called an interaction between $x_1$ and $x_2$. The terms $\beta_1 x_1$ and $\beta_2 x_2$ depend on only one variable and are called main effects.

Least squares estimation of regression coefficients

The error sum of squares is

    $\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta = y'y - 2\beta'X'y + \beta'X'X\beta$.

The least squares estimator of $\beta$ is the value $\hat\beta$ which minimizes $\varepsilon'\varepsilon$. Minimizing with respect to $\beta$ yields

    $\hat\beta = (X'X)^{-1}X'y$,

which is obtained irrespective of any distribution properties of the errors.

Properties of the least squares estimator

For the least squares estimator we have $E(\hat\beta) = \beta$ and $\operatorname{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1}$.

Gauss-Markov theorem: among all linear and unbiased estimators $\hat\beta^L$, the least squares estimator has minimal variances, implying $\operatorname{Var}(\hat\beta_j) \le \operatorname{Var}(\hat\beta_j^L)$, $j = 0, \ldots, k$.

If $\varepsilon \sim N(0, \sigma^2 I)$, then $\hat\beta \sim N(\beta, \sigma^2 (X'X)^{-1})$.
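A minimal sketch in R of the estimator, assuming the turkey and X objects from above; solving the normal equations $(X'X)\hat\beta = X'y$ reproduces the coefficients that lm() computes internally (via a numerically more stable QR decomposition).

    y <- turkey$weightp
    # Solve (X'X) beta = X'y rather than forming the inverse explicitly
    beta_hat <- solve(t(X) %*% X, t(X) %*% y)
    # The same estimates from R's built-in least squares fitter
    fit <- lm(weightp ~ agew + origin, data = turkey)
    coef(fit)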
ANOVA table for multiple linear regression and $R^2$

    Source of variation | df                  | Sum of squares (SS) | Mean square (MS)           | F-value
    Regression          | $k$                 | SSR                 | $MSR = SSR/k$              | $F = MSR/\hat\sigma^2$
    Residual            | $n - p = n - (k+1)$ | SSE                 | $\hat\sigma^2 = SSE/(n-p)$ |
    Total               | $n - 1$             | SST                 |                            |

The multiple coefficient of determination is still computed as $R^2 = SSR/SST$, but is no longer the square of the Pearson correlation between the response and any of the predictor variables.

Interval estimation and tests

We would like to construct confidence intervals and statistical tests for hypotheses regarding the unknown regression parameters $\beta$. A requirement for the construction of tests and confidence intervals is the assumption of normally distributed errors. However, tests and confidence intervals are relatively robust to mild departures from the normality assumption. Moreover, tests and confidence intervals derived under the assumption of normality remain valid for large sample sizes even with non-normal errors.

Testing linear hypotheses

Hypotheses:
1. General linear hypothesis: $H_0: C\beta = d$ against $H_1: C\beta \ne d$, where $C$ is an $r \times p$ matrix with $\operatorname{rank}(C) = r \le p$ ($r$ linearly independent restrictions) and $d$ is an $r \times 1$ vector.
2. Test of significance ($t$-test): $H_0: \beta_j = 0$ against $H_1: \beta_j \ne 0$.
3. Composite test of a subvector: $H_0: \beta_1 = 0$ against $H_1: \beta_1 \ne 0$, where $\beta_1$ is a subvector of $\beta$.
4. Test for significance of regression: $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against $H_1: \beta_j \ne 0$ for at least one $j \in \{1, \ldots, k\}$.

Test statistics:
1. $F = \frac{1}{r}(C\hat\beta - d)'(\hat\sigma^2\, C(X'X)^{-1}C')^{-1}(C\hat\beta - d) \sim F_{r,n-p}$
2. $t_j = \hat\beta_j/\hat\sigma_{\hat\beta_j} \sim t_{n-p}$, where $\hat\sigma_{\hat\beta_j}$ denotes the standard error of $\hat\beta_j$
3. $F = \frac{1}{r}\,\hat\beta_1'\,\widehat{\operatorname{Cov}}(\hat\beta_1)^{-1}\hat\beta_1 \sim F_{r,n-p}$
4. $F = \frac{n-p}{k}\,\frac{SSR}{SSE} = \frac{n-p}{k}\,\frac{R^2}{1-R^2} \sim F_{k,n-p}$

Critical values (reject $H_0$ in the case of):
1. $F > F_{r,n-p}(1-\alpha)$
2. $|t| > t_{n-p}(1-\alpha/2)$
3. $F > F_{r,n-p}(1-\alpha)$
4. $F > F_{k,n-p}(1-\alpha)$

Confidence intervals and regions for regression coefficients

Confidence interval for $\beta_j$: a confidence interval for $\beta_j$ with level $1-\alpha$ is given by

    $[\hat\beta_j - t_{n-p}(1-\alpha/2)\,\hat\sigma_{\hat\beta_j},\ \hat\beta_j + t_{n-p}(1-\alpha/2)\,\hat\sigma_{\hat\beta_j}]$.

Confidence region for a subvector $\beta_1$: a confidence ellipsoid for $\beta_1 = (\beta_1, \ldots, \beta_r)'$ with level $1-\alpha$ is given by

    $\left\{\beta_1 : \tfrac{1}{r}(\hat\beta_1 - \beta_1)'\,\widehat{\operatorname{Cov}}(\hat\beta_1)^{-1}(\hat\beta_1 - \beta_1) \le F_{r,n-p}(1-\alpha)\right\}$.

Prediction intervals

Confidence interval for the (conditional) mean of a future observation: a confidence interval for $E(y_0)$ of a future observation $y_0$ at location $x_0$ with level $1-\alpha$ is given by

    $x_0'\hat\beta \pm t_{n-p}(1-\alpha/2)\,\hat\sigma\,(x_0'(X'X)^{-1}x_0)^{1/2}$.

Prediction interval for a future observation: a prediction interval for a future observation $y_0$ at location $x_0$ with level $1-\alpha$ is given by

    $x_0'\hat\beta \pm t_{n-p}(1-\alpha/2)\,\hat\sigma\,(1 + x_0'(X'X)^{-1}x_0)^{1/2}$.
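These tests and intervals are available from standard accessors for a fitted lm object; a minimal sketch for the turkey fit from above, where the covariate location x0 (agew = 26, origin = "V") is made up purely for illustration.

    summary(fit)                  # t-test per coefficient, overall F-test, R^2
    confint(fit, level = 0.95)    # marginal confidence intervals for each beta_j
    x0 <- data.frame(agew = 26, origin = "V")
    predict(fit, newdata = x0, interval = "confidence")  # CI for E(y0)
    predict(fit, newdata = x0, interval = "prediction")  # PI for a new y0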
The corrected coefficient of determination

We already defined the coefficient of determination, $R^2$, as a measure for the goodness of fit to the data. The use of $R^2$ is limited, since it will never decrease with the addition of a new covariate into the model. The corrected coefficient of determination, $R^2_{adj}$, adjusts for this problem by including a correction term for the number of parameters. It is defined by

    $R^2_{adj} = 1 - \frac{n-1}{n-p}\,(1 - R^2)$.

Akaike information criterion

The Akaike information criterion (AIC) is one of the most widely used criteria for model choice within the scope of likelihood-based inference. The AIC is defined by

    $\mathrm{AIC} = -2\,l(\hat\beta_{ML}, \hat\sigma^2_{ML}) + 2(p+1)$,

where $l(\hat\beta_{ML}, \hat\sigma^2_{ML})$ is the maximum value of the log-likelihood. Smaller values of the AIC correspond to a better model fit.

Bayesian information criterion

The Bayesian information criterion (BIC) is defined by

    $\mathrm{BIC} = -2\,l(\hat\beta_{ML}, \hat\sigma^2_{ML}) + \log(n)(p+1)$.

The BIC multiplied by 1/2 is also known as the Schwarz criterion. Smaller values of the BIC correspond to a better model fit. The BIC penalizes complex models much more than the AIC.

Practical use of model choice criteria

To select the most promising models from candidate models, we first obtain a preselection of potential models. All potential models can now be assessed with the aid of one of the various model choice criteria (AIC, BIC). This method is not always practical, since the number of regressor variables and modelling variants can be very large in many applications. In this case, we can use the following partially heuristic methods; an R sketch of the criteria and procedures follows below.

Practical use of model choice criteria II

Complete model selection: in case that the number of predictor variables is not too large, we can determine the best model with the "leaps-and-bounds" algorithm.

Forward selection:
1. Based on a starting model, forward selection includes one additional variable in every iteration of the algorithm.
2. The variable which offers the greatest reduction of a preselected model choice criterion is chosen.
3. The algorithm terminates if no further reduction is possible.

Practical use of model choice criteria III

Backward elimination:
1. Backward elimination starts with the full model containing all predictor variables.
2. Subsequently, in every iteration, the covariate which provides the greatest reduction of the model choice criterion is eliminated from the model.
3. The algorithm terminates if no further reduction is possible.

Stepwise selection: stepwise selection is a combination of forward selection and backward elimination. In every iteration of the algorithm, a predictor variable may be added to the model or removed from the model.
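Both criteria (and $R^2_{adj}$) are reported directly for fitted lm objects; a minimal sketch comparing the turkey model with and without the origin effect (the model names m1 and m2 are illustrative).

    m1 <- lm(weightp ~ agew, data = turkey)
    m2 <- lm(weightp ~ agew + origin, data = turkey)
    # Smaller values indicate the preferred model; the BIC swaps the
    # AIC penalty 2(p+1) for log(n)(p+1)
    AIC(m1); AIC(m2)
    BIC(m1); BIC(m2)
    summary(m2)$adj.r.squared     # corrected coefficient of determination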
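Forward selection, backward elimination and stepwise selection are implemented in base R's step(), which by default minimizes the AIC; a minimal sketch, where the scope formula with the agew:origin interaction is an illustrative choice rather than part of the original slides.

    full <- lm(weightp ~ agew * origin, data = turkey)  # main effects + interaction
    step(full, direction = "backward")                  # backward elimination
    m0 <- lm(weightp ~ 1, data = turkey)                # intercept-only start
    step(m0, scope = ~ agew * origin, direction = "forward")
    # Stepwise selection; k = log(n) makes step() use the BIC penalty
    step(full, direction = "both", k = log(nrow(turkey)))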