Some Basic Fundamentals about Regression

CONSIDER A Y MEASUREMENT AND AN X MEASUREMENT

[Data table: 31 paired observations of estriol (X) and birthweight (Y).]

We want to know the relationship between birthweight and estriol. Does estriol control birthweight in some fashion?

Plot of Birthweight (Y) versus Estriol (X)

The very first thing you should do is plot the data. A candidate line can then be overlaid on the scatterplot with abline(21.5, 0.6) (a short R sketch appears after the entropy detour below).

Postulate a linear relationship between X and Y: y-intercept + slope * x. The expected value of Y (its long-run average) depends on X:

E[y | x] = α + βx

The actual value of y is the value the model predicts at this x plus an error, or residual:

y_i = α + βx_i + ε_i,    ε_i ~ N(0, σ²)

Here α + βx_i is the deterministic, explanatory part of the model, and the ε_i are the random errors, called residuals.

Ordinary regression assumes that the residuals:
- have mean 0 and constant variance, and
- are uncorrelated and independent of one another.

Brief Detour ... Why Do We Typically Assume Residuals Are Normally Distributed?

Engage in a little fun with me, but be not afraid. The motivation comes from Boltzmann's Maximum Entropy Theorem (Cercignani, 1998; Rosenkrantz, 1989).

Let X be a random variable with density function f(x), mean E[X] = μ, and variance Var[X] = E[(X − μ)²] = σ². We want to find the density function which maximizes the entropy

S(f) = −∫ f(x) ln(f(x)) dx

subject to the constraints

∫ f(x) dx = 1,   E[X] = μ,   E[X²] = σ² + μ².

Maximizing a function of a function: introduce Lagrangian multipliers λ₀, λ₁, λ₂ and take the generalized derivative of

S*(f) = S(f) + λ₀( ∫ f(x) dx − 1 ) + λ₁ ∫ x f(x) dx + λ₂ ∫ x² f(x) dx.

Add an arbitrarily small perturbation εg(x) to f(x), where g(x) is any nonzero perturbation function and ε → 0, so that the constraints are still met:

S*(f + εg) = −∫ f(x) ln( f(x) + εg(x) ) dx − ε ∫ g(x) ln( f(x) + εg(x) ) dx
             + λ₀( ∫ f(x) dx − 1 ) + λ₁ ∫ x f(x) dx + λ₂ ∫ x² f(x) dx
             + ε[ λ₀ ∫ g(x) dx + λ₁ ∫ x g(x) dx + λ₂ ∫ x² g(x) dx ].

Take the derivative of S* with respect to ε and set it equal to zero at ε = 0:

∫ [ (λ₀ − 1) − ln(f(x)) + λ₁ x + λ₂ x² ] g(x) dx = 0 for every perturbation g(x)

so that

ln(f(x)) = (λ₀ − 1) + λ₁ x + λ₂ x²
f(x) = exp(λ₀ − 1) · exp(λ₁ x + λ₂ x²) = C exp(λ₂ x² + λ₁ x),

an exponential function of a polynomial. Now choose the Lagrangian multipliers to satisfy the constraints, and after completing the square we see that

f(x) = (1 / (σ √(2π))) exp( −(x − μ)² / (2σ²) )   — the Normal distribution!!
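A minimal R sketch of this first step, using simulated placeholder data (the estriol and birthweight vectors below are stand-ins, not the values from the table; substitute the real pairs):

# Placeholder data only -- replace with the actual estriol / birthweight pairs
set.seed(1)
estriol     <- round(runif(31, min = 9, max = 27))
birthweight <- round(21.5 + 0.6 * estriol + rnorm(31, sd = 2))

# Always plot the data first
plot(estriol, birthweight, xlab = "Estriol (X)", ylab = "Birthweight (Y)",
     main = "Birthweight versus Estriol")

# Overlay the candidate line from the slide: intercept 21.5, slope 0.6
abline(21.5, 0.6)

# Least-squares fit for comparison (dashed line)
fit <- lm(birthweight ~ estriol)
abline(fit, lty = 2)
coef(fit)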
If nothing else is known about a distribution other than its mean and variance, then the Normal density is the distribution which maximizes the entropy.

- Principle of maximal entropy from thermodynamics: physical systems tend to move towards maximal-entropy configurations over time.
- If nothing else is known about a distribution, then the distribution with maximum entropy should be chosen by default.

y_i = h(x_i) + ε_i
(real observations = deterministic model of reality + departures from the model)

Philosophy of ordinary regression: after accounting for the deterministic causes of some phenomenon, the random departures from the model occur in such a way as to maximize the entropy.

How to fit these models using loglin() in R

Now consider log-linear models for a three-way contingency table T; in the R output below the table dimensions are A, C, and M, which play the roles of X, Y, and Z.

Model         R call
(XYZ)         satfit   = loglin(T, margin = list(1:3), fit = TRUE, param = TRUE)
(XY,XZ,YZ)    fit1     = loglin(T, margin = list(c(1,2), c(1,3), c(2,3)), fit = TRUE, param = TRUE)
(XY,XZ)       fit2     = loglin(T, margin = list(c(1,2), c(1,3)), fit = TRUE, param = TRUE)
(XY,YZ)       fit3     = loglin(T, margin = list(c(1,2), c(2,3)), fit = TRUE, param = TRUE)
(YZ,XZ)       fit4     = loglin(T, margin = list(c(1,3), c(2,3)), fit = TRUE, param = TRUE)
(XY,Z)        fit5     = loglin(T, margin = list(c(1,2), c(3)), fit = TRUE, param = TRUE)
(XZ,Y)        fit6     = loglin(T, margin = list(c(1,3), c(2)), fit = TRUE, param = TRUE)
(YZ,X)        fit7     = loglin(T, margin = list(c(2,3), 1), fit = TRUE, param = TRUE)
(X,Y,Z)       indepfit = loglin(T, margin = list(1, 2, 3), fit = TRUE, param = TRUE)

Compare model fits:

FIT = as.data.frame(ftable(satfit$fit, row.vars = 1:3))
y1 = c(fit1$fit)
y2 = c(fit2$fit)
y3 = c(fit3$fit)
y4 = c(fit4$fit)
y5 = c(fit5$fit)
y6 = c(fit6$fit)
y7 = c(fit7$fit)
y8 = c(indepfit$fit)
FIT = cbind(FIT, y1, y2, y3, y4, y5, y6, y7, y8)
names(FIT)[4:12]

The fit of a loglin object is a contingency table:

U = fit4$fit
> U
, , M = yes
     C
A            yes        no
  yes 909.2395833 45.7604167
  no    4.7604167  0.2395833

, , M = no
     C
A            yes          no
  yes 438.8404255 555.1595745
  no  142.1595745 179.8404255

The XY(1) partial table is:

> U[,,1]
     C
A            yes        no
  yes 909.239583 45.7604167
  no    4.760417  0.2395833

θ_XY(1) = (909.24)(0.24) / [(4.76)(45.76)] ≈ 1.0

The XY(2) partial table is:

> U[,,2]
     C
A          yes       no
  yes 438.8404 555.1596
  no  142.1596 179.8404

θ_XY(2) = (438.84)(179.84) / [(142.16)(555.16)] ≈ 1.0

Under the model, θ_XY(1) = θ_XY(2) = 1.0, so X and Y are conditionally independent given Z.

Illustration of conditional associations with the (YZ,XZ) model

The XZ(1) partial table is:

> U[,1,]
     M
A            yes       no
  yes 909.239583 438.8404
  no    4.760417 142.1596

θ_XZ(1) = (909.24)(142.16) / [(438.84)(4.76)] ≈ 61.9

The XZ(2) partial table is:

> U[,2,]
     M
A            yes       no
  yes 45.7604167 555.1596
  no   0.2395833 179.8404

θ_XZ(2) = (45.76)(179.84) / [(0.24)(555.16)] ≈ 61.9

θ_XZ(1) = θ_XZ(2) ≈ 61.9 (homogeneous conditional association)

The YZ(1) partial table is:

> U[1,,]
     M
C           yes       no
  yes 909.23958 438.8404
  no   45.76042 555.1596

θ_YZ(1) = (909.24)(555.16) / [(438.84)(45.76)] ≈ 25.1

The YZ(2) partial table is:

> U[2,,]
     M
C           yes       no
  yes 4.7604167 142.1596
  no  0.2395833 179.8404

θ_YZ(2) = (4.76)(179.84) / [(142.16)(0.24)] ≈ 25.1

θ_YZ(1) = θ_YZ(2) ≈ 25.1 (homogeneous conditional association)

But does conditional association = marginal association under the (YZ,XZ) model?

The XY(1) and XY(2) partial tables above both give θ_XY(k) ≈ 1.0, but the XY marginal table is:

> margin.table(U, c(1,2))
     C
A         yes     no
  yes 1348.08 600.92
  no   146.92 180.08

θ_XY(+) = (1348.08)(180.08) / [(146.92)(600.92)] ≈ 2.7

NOPE!!
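The odds-ratio arithmetic above can be reproduced directly from the fitted table. A minimal sketch, assuming fit4 from the loglin() calls above is in the workspace (the helper odds.ratio is ours, not part of base R):

# 2x2 cross-product ratio (odds ratio) for any two-way table
odds.ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

U <- fit4$fit

# Conditional (partial-table) odds ratios
odds.ratio(U[, , 1])   # theta_XY(1), within M = yes
odds.ratio(U[, , 2])   # theta_XY(2), within M = no
odds.ratio(U[, 1, ])   # theta_XZ(1), within C = yes
odds.ratio(U[, 2, ])   # theta_XZ(2), within C = no
odds.ratio(U[1, , ])   # theta_YZ(1), within A = yes
odds.ratio(U[2, , ])   # theta_YZ(2), within A = no

# Marginal XY odds ratio, after collapsing over M
odds.ratio(margin.table(U, c(1, 2)))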
Simpson's Paradox still holds true: θ_XY(+) ≠ θ_XY(k).

Sometimes, however, conditional association = marginal association, and we will learn when this is true soon.

Another way to compute the conditional association: the $param component of a loglin fit returns the parameters.

> fit4$param$A.M
     M
A           yes        no
  yes  1.031272 -1.031272
  no  -1.031272  1.031272

Recall that the λ^XZ_ik are redundant (sum-to-zero) parameters and that

log(θ_XZ(j)) = λ^XZ_11 + λ^XZ_22 − λ^XZ_12 − λ^XZ_21.

Thus

θ_XZ(j) = exp(λ^XZ_11 + λ^XZ_22 − λ^XZ_12 − λ^XZ_21)

> exp(sum(abs(fit4$param$A.M)))
[1] 61.87324

How do you determine which model is the best model?

MODEL FIT (fitted counts under each model; the (XYZ) saturated-model fit equals the raw data):

A    C    M    (XYZ)  (XY,XZ,YZ)  (XY,XZ)  (XY,YZ)  (YZ,XZ)  (XY,Z)  (XZ,Y)  (YZ,X)  (X,Y,Z)
yes  yes  yes    911      910.4     710.0    885.9    909.2   611.2   627.3   782.7    540.0
no   yes  yes      3        3.6       0.7     28.1      4.8    19.4     3.3   131.3     90.6
yes  no   yes     44       44.6     245.0     29.4     45.8   210.9   327.7    39.4    282.1
no   no   yes      2        1.4       4.3     16.6      0.2   118.5     1.7     6.6     47.3
yes  yes  no     538      538.6     739.0    563.1    438.8   837.8   652.9   497.5    740.2
no   yes  no      43       42.4      45.3     17.9    142.2    26.6   211.5    83.5    124.2
yes  no   no     456      455.4     255.0    470.6    555.2   289.1   341.1   629.4    386.7
no   no   no     279      279.6     276.7    264.4    179.8   162.5   110.5   105.6     64.9

The (XY,XZ,YZ) model fit is pretty close to the data, and the (YZ,XZ) model fit is also pretty close. It seems that models (XY,XZ,YZ) and (YZ,XZ) both fit the data pretty well.

Goodness-of-fit tests are based upon either the deviance

G² = 2 Σ_ijk n_ijk log( n_ijk / μ̂_ijk )

or Pearson's X²

X² = Σ_ijk ( n_ijk − μ̂_ijk )² / μ̂_ijk.

Both compare the fitted count μ̂_ijk to the actual count n_ijk.

How do you determine which model is the best model?

lrt = c(satfit$lrt, fit1$lrt, fit2$lrt, fit3$lrt, fit4$lrt, fit5$lrt, fit6$lrt, fit7$lrt, indepfit$lrt)
pearson = c(satfit$pearson, fit1$pearson, fit2$pearson, fit3$pearson, fit4$pearson,
            fit5$pearson, fit6$pearson, fit7$pearson, indepfit$pearson)
df = c(satfit$df, fit1$df, fit2$df, fit3$df, fit4$df, fit5$df, fit6$df, fit7$df, indepfit$df)
pval = 1 - pchisq(lrt, df)
comparison = data.frame(G2 = lrt, X2 = pearson, df = df, pval = pval)
rownames(comparison)

Nested models can also be compared with a likelihood-ratio test; the output below is from anova() applied to loglm() fits (MASS package) of the (YZ,XZ) and (XY,XZ,YZ) models, here named model4 and model1:

> anova(model4, model1)
LR tests for hierarchical log-linear models

             Deviance df  Delta(Dev) Delta(df) P(> Delta(Dev)
Model 1   187.7543029  2
Model 2     0.3739859  1 187.3803170         1        0.00000
Saturated   0.0000000  0   0.3739859         1        0.54084
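The deviance and Pearson statistics can also be computed by hand from the observed and fitted counts, as a check on the formulas above. A minimal sketch, assuming the count table T and the loglin() fit fit4 are in the workspace:

# Observed counts n_ijk and fitted counts mu-hat_ijk under the (YZ,XZ) model
observed      <- T
fitted.counts <- fit4$fit

# Deviance and Pearson chi-square (assumes all observed counts are positive)
G2 <- 2 * sum(observed * log(observed / fitted.counts))
X2 <- sum((observed - fitted.counts)^2 / fitted.counts)
c(G2 = G2, X2 = X2)            # should match fit4$lrt and fit4$pearson

1 - pchisq(G2, fit4$df)        # goodness-of-fit p-value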
What makes log-linear models so special?

Answer: the correspondence between log-linear and logistic models.

Example: consider the (XY,XZ,YZ) model:

log(μ_ijk) = λ + λ^X_i + λ^Y_j + λ^Z_k + λ^XY_ij + λ^XZ_ik + λ^YZ_jk

Suppose we consider Y = cigarette use as a binary response variable and treat the variables X and Z as explanatory variables in a logistic regression. In a logistic regression we are interested in:

Logit(P(Y = 1)) = log[ P(Y = 1) / (1 − P(Y = 1)) ]
                = log[ P(Y = 1 | X = i, Z = k) / P(Y = 2 | X = i, Z = k) ]
                = log( μ_i1k / μ_i2k ) = log(μ_i1k) − log(μ_i2k)
                = (λ + λ^X_i + λ^Y_1 + λ^Z_k + λ^XY_i1 + λ^XZ_ik + λ^YZ_1k)
                  − (λ + λ^X_i + λ^Y_2 + λ^Z_k + λ^XY_i2 + λ^XZ_ik + λ^YZ_2k)
                = (λ^Y_1 − λ^Y_2) + (λ^XY_i1 − λ^XY_i2) + (λ^YZ_1k − λ^YZ_2k),

a constant, plus a term that depends on X, plus a term that depends on Z. Re-parameterize:

Logit(P(Y = 1)) = α + β^X_i + β^Z_k

So the logistic model Logit(P(Y = 1)) = α + β^X_i + β^Z_k is equivalent to the (XY,XZ,YZ) log-linear model

log(μ_ijk) = λ + λ^X_i + λ^Y_j + λ^Z_k + λ^XY_ij + λ^XZ_ik + λ^YZ_jk

but it is also equivalent to the (XY,YZ) log-linear model

log(μ_ijk) = λ + λ^X_i + λ^Y_j + λ^Z_k + λ^XY_ij + λ^YZ_jk,

which is missing the associations and dependencies among the explanatory variables (the XZ term).

- For every logistic regression model there is an equivalent log-linear model; logistic regression models are a subset of log-linear models.
- But logistic regression models cannot explain relationships among the explanatory variables like the log-linear models can.

Independence Graphs

- For every log-linear model there is an associated independence graph, which has a set of vertices, with each vertex representing a variable.
- Any two variables either are or are not connected by an edge (a line); two variables are joined by an edge if there is an association between them.
- A missing edge represents conditional independence between the two corresponding variables.

Example: for models (XYZ) and (XY,XZ,YZ), the graph contains all three edges X–Y, X–Z, and Y–Z. For model (XY,YZ), the graph is X — Y — Z, with the X–Z edge missing.

4-variable example: for models (WX,WYZ) and (WX,WY,WZ,YZ), the graph has edges W–X, W–Y, W–Z, and Y–Z, with the X–Y and X–Z edges missing.

Definitions

- A path in an independence graph is a sequence of edges leading from one variable to another.
- Two variables X and Y are said to be separated by a subset of variables if all paths connecting X and Y intersect that subset. Example: W separates X and Y in the 4-variable graph above; the subset {W, Z} also separates X and Y.
- Two variables are conditionally independent given any subset of variables that separates them. For example, X and Y are conditionally independent given W and Z; X and Y are also conditionally independent given W alone.

Independence Graphs and Contingency Table Collapsibility

- People want to collapse contingency tables because it makes things simpler.
- Collapsing a multi-way table entails summing over one of the variables to eliminate a dimension. Collapsing an (XYZ) contingency table on the variable Z produces the marginal (XY) table (summing the counts over the levels of Z).
- A contingency table is collapsible on some variable Z if the marginal and the conditional odds ratios for all variables (other than Z) are the same.

3-Way Table Collapsibility Conditions

- For 3-way tables, the XY marginal and conditional odds ratios are identical, θ_XY(k) = θ_XY(+), if EITHER X and Z are conditionally independent OR Y and Z are conditionally independent.
- For the (XY,YZ) model (independence graph X — Y — Z): the table can be collapsed on Z, since θ_XY(k) = θ_XY(+), and it can be collapsed on X, since θ_YZ(k) = θ_YZ(+). But θ_XZ(k) ≠ θ_XZ(+), so it cannot be collapsed on Y.
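These three-way collapsibility claims can be checked numerically on the fitted (XY,YZ) table. A minimal sketch, reusing the odds.ratio() helper defined in the earlier sketch and assuming fit3 from the loglin() calls above:

V <- fit3$fit   # fitted counts under the (XY,YZ) model

# Collapsible on Z: the conditional XY odds ratios equal the marginal one
odds.ratio(V[, , 1]); odds.ratio(V[, , 2])
odds.ratio(margin.table(V, c(1, 2)))

# Collapsible on X: the conditional YZ odds ratios equal the marginal one
odds.ratio(V[1, , ]); odds.ratio(V[2, , ])
odds.ratio(margin.table(V, c(2, 3)))

# Not collapsible on Y: the conditional XZ odds ratios (1 under this model)
# differ from the marginal XZ odds ratio
odds.ratio(V[, 1, ]); odds.ratio(V[, 2, ])
odds.ratio(margin.table(V, c(1, 3)))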
Multi-Way Table Collapsibility Conditions

Suppose that the variables in a model for a multi-way table partition into three mutually exclusive subsets A, B, C such that B separates A from C in the independence graph. When we collapse the table over the variables in C, the model parameters for A and the model parameters relating A and B remain unchanged.

Back to regression ...

The total sum of squares decomposes into the residual sum of squares plus the regression sum of squares:

Σ (y_i − ȳ)² = Σ (y_i − ŷ_i)² + Σ (ŷ_i − ȳ)²   (sums over i = 1, …, n)

THE ANOVA (Analysis of Variance) TABLE

SOURCE        df       SS               MS
Regression    1        Σ (ŷ_i − ȳ)²     SSReg / 1
Residual      n − 2    Σ (y_i − ŷ_i)²   SSRes / (n − 2)

F = MSReg / MSRes; reject H₀: β = 0 whenever F > F_{1, n−2, 1−α}.

Review how to do simple linear regression in R:

fit = lm(y ~ x, data = Dataset)

Review how to do simple linear regression in SAS:

proc reg;
  model y = x;
run;

proc reg plots=diagnostics(stats=(default
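A minimal R sketch pulling these regression pieces together, using a placeholder data frame named Dataset (the x and y values below are made up for illustration):

# Placeholder data; substitute your own x and y columns
Dataset <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                      y = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9))

fit = lm(y ~ x, data = Dataset)

summary(fit)      # estimates of alpha and beta, standard errors, t tests
anova(fit)        # sum-of-squares decomposition and the F test of H0: beta = 0
confint(fit)      # confidence intervals for alpha and beta

# Residual checks: mean zero, constant variance, approximate normality
plot(fitted(fit), resid(fit))
qqnorm(resid(fit)); qqline(resid(fit))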
