Chapter: Linear Regression and Correlation Analysis

Introduction to regression analysis

Regression analysis is used to:
- Describe a relationship between two variables in mathematical terms
- Predict the value of a dependent variable based on the value of at least one independent variable
- Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable

Names for y and the xs in the regression model

Names for y            Names for the xs
Dependent variable     Independent variables
Regressand             Regressors
Effect variable        Causal variables
Explained variable     Explanatory variables

Simple Linear Regression Model
- Only one independent variable, x
- Relationship between x and y is described by a linear function
- Changes in y are assumed to be caused by changes in x

Types of Regression Models
[Figure: four panels showing a positive linear relationship, a negative linear relationship, a non-linear relationship, and no relationship]

Population Linear Regression

The population regression model:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
- $y$: dependent variable
- $x$: independent variable
- $\beta_0$: population y intercept
- $\beta_1$: population slope coefficient
- $\varepsilon$: random error term, or residual
- $\beta_0 + \beta_1 x$ is the linear component; $\varepsilon$ is the random error component

Linear Regression Assumptions
- Error values ($\varepsilon$) are statistically independent
- Error values are normally distributed for any given value of x
- The probability distribution of the errors has constant variance
- The underlying relationship between the x variable and the y variable is linear

Population Linear Regression
[Figure: plot of y against x for the model $y = \beta_0 + \beta_1 x + \varepsilon$, with intercept $\beta_0$ and slope $\beta_1$, marking for a given $x_i$ the observed value of y, the predicted value of y, and the random error $\varepsilon_i$ for this x value]

Estimated Regression Model

The sample regression line provides an estimate of the population regression line:
\[ \hat{y}_i = b_0 + b_1 x \]
- $\hat{y}_i$: estimated (or predicted) y value
- $b_0$: estimate of the regression intercept
- $b_1$: estimate of the regression slope
- $x$: independent variable

The individual random error terms $e_i$ have a mean of zero.

Linear regression equation
Interpretation of $b_0$ and $b_1$?

Coefficient of determination and correlation coefficient

The Multiple Regression Model

Idea: examine the linear relationship between one dependent variable (y) and two or more independent variables ($x_i$).

Population model:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \]
- $\beta_0$: y-intercept
- $\beta_1, \dots, \beta_k$: population slopes
- $\varepsilon$: random error

Estimated multiple regression model:
\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k \]
- $\hat{y}$: estimated (or predicted) value of y
- $b_0$: estimated intercept
- $b_1, \dots, b_k$: estimated slope coefficients

Estimates $b_0, b_1, b_2, \dots, b_k$

The estimates solve the following system of equations:
\[
\begin{aligned}
\sum y &= n b_0 + b_1 \sum x_1 + b_2 \sum x_2 + \dots + b_k \sum x_k \\
\sum x_1 y &= b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + \dots + b_k \sum x_1 x_k \\
\sum x_2 y &= b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + \dots + b_k \sum x_2 x_k \\
&\;\;\vdots \\
\sum x_k y &= b_0 \sum x_k + b_1 \sum x_1 x_k + b_2 \sum x_2 x_k + \dots + b_k \sum x_k^2
\end{aligned}
\]

Interpretation of Estimated Coefficients
- Slope ($b_i$): estimates that the average value of y changes by $b_i$ units for each one-unit increase in $x_i$, given that all other variables are unchanged
- Intercept ($b_0$): the estimated average value of y when all $x_i = 0$

Multiple Regression Model: two variable model
[Figure: the fitted plane $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$ over the $x_1$ and $x_2$ axes, labelled with the slope for variable $x_1$ and the slope for variable $x_2$]

Multiple Regression Model: two variable model
[Figure: a sample observation $y_i$ at $(x_{1i}, x_{2i})$, the fitted value $\hat{y}_i$ on the plane $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$, and the residual $e = (y - \hat{y})$ between them]

The best fit equation, $\hat{y}$, is found by minimizing the sum of squared errors, $\sum e^2$.

Multiple Regression Assumptions

Errors (residuals) from the regression model: $e = (y - \hat{y})$
- The errors are normally distributed
- The mean of the errors is zero
- Errors have a constant variance
- The model errors are independent

Example

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand. Data are collected for 15 weeks.

Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7

Example
- Dependent variable (y): pie sales
- Independent variable ($x_1$): price ($)
- Independent variable ($x_2$): advertising ($100s)

Estimated (predicted) regression equation:
\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 \]

Estimates $b_0, b_1, b_2$
\[
\begin{aligned}
\sum y &= n b_0 + b_1 \sum x_1 + b_2 \sum x_2 \\
\sum x_1 y &= b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 \\
\sum x_2 y &= b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2
\end{aligned}
\]

Multiple Coefficient of Determination

Reports the proportion of total variation in y explained by all x variables taken together:
\[ R^2 = \frac{RSS}{TSS} = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}} \]

Multiple correlation (R)

Multiple correlation provides a measure of the overall strength of the relationship between the dependent variable and the independent variables. It is defined as the positive square root of the coefficient of determination:
\[ R = \sqrt{R^2} \]

Correlation matrix

Provides measures of the strength of the relationship between the dependent variable and each independent variable.

       y           x1            x2
y      1
x1     r_{x1 y}    1
x2     r_{x2 y}    r_{x1 x2}     1

...

House Price Model

House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Least Squares Regression Properties

The sum...

y       x       $\hat{y}$    $(\hat{y} - \bar{y})^2$   $(y - \bar{y})^2$
245     1400    251.8283     1202.127                  1722.25
312     1600    273.7683     162.096                   650.25
279     1700    284.7383     3.103587                  56.25
308     1875    303.9358     304.007                   462.25
199     1100    218.9183     4567.286                  7656.25
219     1550    268.2833     331.8482                  4556.25
...
Total
2865    17150   2863.838     18911.71                  32600

Coefficient of determination
\[ R^2 = \frac{RSS}{TSS} \]

Correlation... variation in y is explained by variation in x)
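
As a rough illustration of the house price model, the sketch below fits the simple regression line to the ten (square feet, price) pairs and computes $R^2 = RSS/TSS$ with RSS taken as the regression sum of squares, as in the slides. The closed-form estimates $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$ are the standard least squares formulas and are an addition here, since the extracted slides do not show them. The function names are hypothetical, and the data values are read from the table with split digits regrouped.

```python
# House price data as read from the slides:
# price in $1000s (y) and square feet (x) for ten houses.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_simple_regression(x, y):
    """Return (b0, b1) minimizing the sum of squared errors for y_hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(x, y, b0, b1):
    """R^2 = RSS / TSS, where RSS is the regression sum of squares."""
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return rss / tss


b0, b1 = fit_simple_regression(square_feet, price)
r2 = r_squared(square_feet, price, b0, b1)
# In simple regression the correlation coefficient takes the sign of the slope.
r = (1 if b1 >= 0 else -1) * r2 ** 0.5

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"R^2 = {r2:.4f}, r = {r:.4f}")
```

On these ten observations the fit gives roughly $b_0 \approx 98.25$ and $b_1 \approx 0.1098$, so each additional square foot is associated with an estimated increase of about 0.11 thousand dollars in price, and $R^2 \approx 0.58$. The fitted values in the slide table appear to use the slope truncated to 0.1097, which would explain small differences in the last digits.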
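
For the pie sales example, the three equations for $b_0$, $b_1$, $b_2$ form a 3x3 linear system that can be solved directly. The following is a minimal sketch under stated assumptions: the week numbers 1 to 15 are assumed for the rows of the data table, the helper names (`dot`, `solve_linear_system`, `fit_two_variable_model`) are hypothetical, and plain Gaussian elimination stands in for whatever solution method the original slides used.

```python
# Pie sales example: weekly sales (y), price in $ (x1), advertising in $100s (x2).
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
         7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
               3.5, 3.2, 4.0, 3.5, 2.7]


def dot(u, v):
    """Sum of element-wise products, e.g. the sum of x1*y."""
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_linear_system(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_two_variable_model(y, x1, x2):
    """Build and solve the three equations for b0, b1, b2."""
    n = len(y)
    a = [
        [n,       sum(x1),     sum(x2)],
        [sum(x1), dot(x1, x1), dot(x1, x2)],
        [sum(x2), dot(x1, x2), dot(x2, x2)],
    ]
    b = [sum(y), dot(x1, y), dot(x2, y)]
    return solve_linear_system(a, b)


b0, b1, b2 = fit_two_variable_model(sales, price, advertising)

# Multiple coefficient of determination: regression SS over total SS.
y_bar = sum(sales) / len(sales)
y_hat = [b0 + b1 * p + b2 * a for p, a in zip(price, advertising)]
rss = sum((yh - y_bar) ** 2 for yh in y_hat)
tss = sum((yi - y_bar) ** 2 for yi in sales)
r2 = rss / tss
multiple_r = r2 ** 0.5  # positive square root of R^2

print(f"y_hat = {b0:.3f} + ({b1:.3f}) * price + ({b2:.3f}) * advertising")
print(f"R^2 = {r2:.4f}, multiple R = {multiple_r:.4f}")
```

The sign of each printed slope can be read with the interpretation rule given earlier: $b_1$ is the estimated change in average weekly sales for a one-dollar price increase with advertising held fixed, and $b_2$ is the estimated change for an extra $100 of advertising with price held fixed.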