306: Log-linear models – Poisson Regression (August 2005)
Basic Statistics For Doctors
Singapore Med J 2005; 46(8) : 377
CME Article

Biostatistics 306
Log-linear models: Poisson regression
Y H Chan

Yong Loo Lin School of Medicine, National University of Singapore, Block MD11, Clinical Research Centre #02-02, 10 Medical Drive, Singapore 117597
Y H Chan, PhD, Head, Biostatistics Unit
Correspondence to: Dr Y H Chan. Tel: (65) 6874 3698. Fax: (65) 6778 5743. Email: medcyh@nus.edu.sg

Log-linear models are used to determine whether there are any significant relationships in multiway contingency tables that have three or more categorical variables and/or to determine whether the distribution of the counts among the cells of a table can be explained by a simpler, underlying structure (restricted model). The saturated model contains all the variables being analysed and all possible interactions between the variables.

Let us use a simple 2X2 cross-tabulation (over-eating versus over-weight, Table Ia) to illustrate the log-linear model analysis. Table Ib shows the SPSS data structure. The association between the two variables could easily be assessed using the chi-square test(1) (test of independence). Table Ic shows that there is no statistically significant association (phew!), p=0.065, and Table Id shows the corresponding risk estimates.

Table Ia. Over-eating x over-weight.
Over-eating * over-weight cross-tabulation

                                          Over-weight
Over-eating                             Yes        No     Total
Yes      Count                           58        41        99
         % within over-weight         55.8%     42.7%     49.5%
No       Count                           46        55       101
         % within over-weight         44.2%     57.3%     50.5%
Total    Count                          104        96       200
         % within over-weight        100.0%    100.0%    100.0%

Table Ib. SPSS data structure for over-eating x over-weight.

Over-eating    Over-weight    Count
Yes            Yes               58
Yes            No                41
No             Yes               46
No             No                55

Coding: Yes = 1 & No = 2

Table Ic. Chi-square test.
Chi-square tests

                                  Value    df    Asymp sig    Exact sig    Exact sig
                                                 (2-sided)    (2-sided)    (1-sided)
Pearson chi-square              3.407(b)    1       .065
Continuity correction(a)         2.904      1       .088
Likelihood ratio                 3.417      1       .065
Fisher's exact test                                              .068         .044
Linear-by-linear association     3.390      1       .066
No of valid cases                  200

a Computed only for a 2x2 table.
b 0 cells (.0%) have expected count less than 5. The minimum expected count is 47.52.

Table Id. Risk estimate table.
Risk estimate

                                                    95% confidence interval
                                          Value        Lower        Upper
Odds ratio for over-eating (yes/no)       1.691         .966        2.960
For cohort over-weight = yes              1.286         .982        1.685
For cohort over-weight = no                .761         .567        1.021
No of valid cases                           200

We shall use the log-linear model analysis for the above 2X2 table. Before running the analysis for the log-linear model, we have to "weight cases" using the variable Count. Go to Data, Weight Cases to get Template I. Check "Weight cases by" and put "Count" into the Frequency Variable option.

Template I. Declaring "count" as the "Weight cases by" variable.

Go to Analyze, Loglinear, General to get Template II. Put Over-weight and Over-eating into the Factors option (a maximum of 10 categorical variables can be included).

Template II. Declaring only categorical variables.

Leave the "Distribution of Cell Counts" as Poisson, then click on the Model folder to see Template III. The Saturated model gives all possible interactions between the categorical variables. In this case, the model will be Over-weight + Over-eating + Over-eating X Over-weight.

Template III. Defining the saturated model.

Click on the Options folder in Template II to get Template IV.

Template IV. Display options.

Check the Estimates box. The model information and goodness-of-fit statistics will be automatically displayed. The following options are available in the Save folder (Template V). Leave them unchecked.

Template V. Save options.

SPSS output: saturated model (only relevant tables shown)

Table II shows the goodness-of-fit test, which will always result in a chi-square value of 0 because the saturated model will fully explain all the relationships among the variables.

Table II. Goodness-of-fit test.
Goodness-of-fit tests(a,b)

                        Value    df    Sig
Likelihood ratio         .000     0     .
Pearson chi-square       .000     0     .

a Model: Poisson
b Design: Constant + over_weight + over_eating + over_weight * over_eating

Table III. Saturated model: parameter estimates.
Parameter estimates(b,c)

                                                                                    95% confidence interval
Parameter                                    Estimate   Std error        Z     Sig   Lower bound   Upper bound
Constant                                        4.016        .134   29.922    .000         3.753         4.279
[over_weight = 1.00]                            -.177        .199    -.890    .373         -.567          .213
[over_weight = 2.00]                             0(a)
[over_eating = 1.00]                            -.291        .205   -1.417    .157         -.693          .112
[over_eating = 2.00]                             0(a)
[over_weight = 1.00] * [over_eating = 1.00]      .525        .284    1.831    .067         -.037         1.077
[over_weight = 1.00] * [over_eating = 2.00]      0(a)
[over_weight = 2.00] * [over_eating = 1.00]      0(a)
[over_weight = 2.00] * [over_eating = 2.00]      0(a)

a This parameter is set to zero because it is redundant.
b Model: Poisson
c Design: Constant + over_weight + over_eating + over_weight * over_eating

Table III shows the parameter estimates of the saturated model. Taking the exponential (exp) of an estimate gives the odds ratio. We are particularly interested in the interaction term [over_weight = 1.00] * [over_eating = 1.00], which assesses the association between the two variables. This interaction's estimate is 0.525 and exp(0.525) = 1.691 with a p-value of 0.067, which agrees with the results obtained using the chi-square test (odds ratio 1.691, Table Id; p=0.065, Table Ic).

The main effects ([over_weight = 1.00] and [over_eating = 1.00]) test the null hypothesis that the subjects are distributed evenly over the levels of each variable. Here we have both variables quite evenly distributed (over-weight: 52% vs 48% and over-eating: 49.5% vs 50.5%, Table Ia), thus p>0.05 for both main effects.

The standardised form (Z) can be used to assess which variables/interactions in the model are the most or least important in explaining the data. The higher the absolute value of Z, the more "important" the term.

If our interest is to determine relationships, we can stop here. But if we want to develop a simpler model, then the next simpler (restricted) model will be Over-weight + Over-eating (ignoring their interaction, since the variables are independent). To define this Over-weight + Over-eating restricted model, click on the Custom button in Template III. Put Over-weight and Over-eating into the Terms in Model option (Template VI).

Template VI. Defining the restricted over-weight + over-eating model.

In Template IV, check the Residuals and Frequencies options, and clear all the plot options.

SPSS outputs: restricted model Over-weight + Over-eating

Table IVa. Goodness-of-fit test: Over-weight + Over-eating.
Goodness-of-fit tests(a,b)

                        Value    df    Sig
Likelihood ratio        3.417     1   .065
Pearson chi-square      3.407     1   .065

a Model: Poisson
b Design: Constant + over_eating + over_weight

Table IVb. Residual analysis for Over-weight + Over-eating.
Cell counts and residuals(a,b)

                                Observed           Expected                   Standardised   Adjusted
Over-weight   Over-eating    Count      %       Count      %     Residual       residual   residual   Deviance
Yes           Yes               58   29.0%     51.480   25.7%       6.520           .909      1.843       .890
              No                46   23.0%     52.520   26.3%      -6.520          -.900     -1.843      -.919
No            Yes               41   20.5%     47.520   23.8%      -6.520          -.946     -1.843      -.969
              No                55   27.5%     48.480   24.2%       6.520           .936      1.843       .917

a Model: Poisson
b Design: Constant + over_eating + over_weight

The goodness-of-fit test (Table IVa) tests whether this restricted model (Over-weight + Over-eating) is an adequate fit to the data. We want the p-value (sig) to be >0.05. In this case, we have p=0.065, which means that this restricted model is adequate to fit the data.

Residual analysis helps us to spot outlier cells, where the restricted model is not fitting well. The Residual is the difference between the observed and the expected cell frequencies (observed minus expected, for example 58 - 51.480 = 6.520). The smaller the residual, the better the model is working for that cell. The standardised residuals (normalised against the mean and standard deviation) should have values
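The figures reported in Tables Ic to IVb can be checked by hand for this 2X2 table. The following is a minimal sketch in plain Python (standard library only), not the SPSS procedure itself. The function names `chi2_sf_1df`, `independence_fit` and `saturated_estimates`, and the `delta` argument, are illustrative assumptions. Adding 0.5 to every cell before fitting the saturated model is likewise an assumption about the SPSS settings used; it is consistent with the constant 4.016 = ln(55.5), the standard error 0.284 and Z = 1.831 in Table III. Adjusted residuals are left out because their exact SPSS formula is not given in the text.

```python
import math

# Observed counts from Table Ib, keyed by (over_eating, over_weight); 1 = Yes, 2 = No.
observed = {(1, 1): 58, (1, 2): 41, (2, 1): 46, (2, 2): 55}


def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def independence_fit(obs):
    """Fit the restricted (main effects only) model to a 2x2 table."""
    n = sum(obs.values())
    row = {i: obs[(i, 1)] + obs[(i, 2)] for i in (1, 2)}
    col = {j: obs[(1, j)] + obs[(2, j)] for j in (1, 2)}
    # Expected count under independence: row total * column total / N.
    expected = {(i, j): row[i] * col[j] / n for (i, j) in obs}
    pearson = sum((obs[k] - expected[k]) ** 2 / expected[k] for k in obs)
    lr = 2.0 * sum(obs[k] * math.log(obs[k] / expected[k]) for k in obs)
    residuals = {}
    for k in obs:
        raw = obs[k] - expected[k]  # observed minus expected
        standardised = raw / math.sqrt(expected[k])
        dev_sq = 2.0 * (obs[k] * math.log(obs[k] / expected[k]) - raw)
        residuals[k] = (raw, standardised, math.copysign(math.sqrt(dev_sq), raw))
    return expected, pearson, lr, residuals


def saturated_estimates(obs, delta=0.0):
    """Closed-form saturated-model estimates; category 2 (No) is the reference."""
    c = {k: v + delta for k, v in obs.items()}  # delta is added to every cell (assumption)
    constant = math.log(c[(2, 2)])
    over_weight = math.log(c[(2, 1)] / c[(2, 2)])
    over_eating = math.log(c[(1, 2)] / c[(2, 2)])
    interaction = math.log(c[(1, 1)] * c[(2, 2)] / (c[(1, 2)] * c[(2, 1)]))
    se_interaction = math.sqrt(sum(1.0 / v for v in c.values()))
    return constant, over_weight, over_eating, interaction, se_interaction


if __name__ == "__main__":
    expected, pearson, lr, residuals = independence_fit(observed)
    print(f"Pearson chi-square = {pearson:.3f}, p = {chi2_sf_1df(pearson):.3f}")
    print(f"Likelihood ratio   = {lr:.3f}, p = {chi2_sf_1df(lr):.3f}")
    for cell, (raw, std, dev) in residuals.items():
        print(cell, f"expected={expected[cell]:.2f} raw={raw:.2f} "
                    f"standardised={std:.3f} deviance={dev:.3f}")
    for delta in (0.0, 0.5):
        const, ow, oe, inter, se = saturated_estimates(observed, delta)
        print(f"delta={delta}: constant={const:.3f} over_weight={ow:.3f} "
              f"over_eating={oe:.3f} interaction={inter:.3f} "
              f"SE={se:.3f} Z={inter / se:.3f} exp={math.exp(inter):.3f}")
```

Under these assumptions, `delta=0.5` gives a constant and main effects that round to the Table III values (4.016, -.177, -.291) and an interaction estimate of about 0.520 with Z = 1.831, while `delta=0.0` gives an interaction whose exponential is the odds ratio 1.691 of Table Id. The restricted-model statistics and residuals depend only on the observed counts, so no `delta` is involved there.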