CHOOSING A LONGITUDINAL MODEL FOR A CATEGORICAL VARIABLE

- If there are no omitted variables, or the omitted variables are uncorrelated with the explanatory variables, we can use a random-effects model.

- If there are omitted variables that are correlated with the explanatory variables, we need a fixed-effects model, because it removes the effect of omitted variables that do not change over time.

- If the variables change little within subjects over time, a fixed-effects model is not suitable, because it relies only on within-subject change.

- In general, use random effects if you also want to estimate the effects of time-invariant variables, and use fixed effects if you want to control for time-invariant variables.

SETTING UP THE DATASET AS LONGITUDINAL DATA

The dataset must be in long format, not wide format. We then use the xtset command to declare the dataset as longitudinal.
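A minimal sketch of this setup, assuming the identifier variable is patient and the visit counter is visit (the names used by the commands later in these notes). The wide-format names outcome1-outcome7 and month1-month7 are hypothetical and only illustrate the reshape step.

```stata
* Hypothetical wide layout: one row per patient, with outcome1-outcome7
* and month1-month7. Skip this step if the data are already in long format.
reshape long outcome month, i(patient) j(visit)

* Declare the panel structure: patient is the panel id, visit the time index
xtset patient visit
```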

EXPLORING THE DATA PATTERNS IN THE DATASET

Describing the missing-data pattern

The pattern table shows that 224 patients completed all 7 visits and 21 patients missed only the 6th visit. The remaining patients missed more visits, mostly by dropping out and not returning. Dropout of this kind is called monotone missingness, whereas missing a visit in between and then returning for later visits is called intermittent missingness.
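A pattern table like the one described here is presumably produced by xtdescribe. A minimal sketch, assuming the panel variables are patient and visit and that a missing outcome marks a missed visit:

```stata
xtset patient visit

* Show participation patterns (1 = observed, . = missing) across the visits,
* counting only rows where the outcome was actually recorded
xtdescribe if outcome < .
```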

Plot the proportion of the outcome at each visit, within each group of the predictor

label define tr 0 "Itraconazole" 1 "Terbinafine"

label values treatment tr

graph bar (mean) proportion = outcome, over(visit) by(treatment) ytitle(Proportion with onycholysis)

Here we use the visit number rather than the exact measurement time in months to define the bars, because there may not be enough patients measured at exactly the same time to compute a proportion.

An alternative display is a line graph, plotting the observed proportions at each visit against time. For this graph, it is better to use the average time associated with each visit for the x axis than to use visit number, because the visits were not equally spaced. Both the proportions and the average times for each visit in each treatment group can be obtained using the egen command with the mean() function:

egen prop = mean(outcome), by(treatment visit)

egen mn_month = mean(month), by(treatment visit)

twoway line prop mn_month, by(treatment) sort xtitle(Time in months) ytitle(Proportion with onycholysis)

The proportions shown in figure 10.7 represent the estimated average (or marginal) probabilities of onycholysis given the two covariates, time since randomization and treatment group. We are not attempting to estimate individual patients’ personal probabilities, which may vary substantially, but are considering the population averages given the covariates.

POOLED REGRESSION MODEL (POOLED MODEL) FOR A CATEGORICAL OUTCOME

We use the pooled model when we assume that the error term is uncorrelated with the explanatory variables and that subjects share the same characteristics, that is, there is no subject-specific random intercept (random intercept = 0).

generate trt_month = treatment*month

We can fit a pooled logistic regression model with treatment, month, and the interaction between the two. We must also request robust standard errors, clustered on patient, so that the standard errors are estimated correctly.
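The command itself is not shown at this point in the notes. A sketch consistent with the factor-variables version given further on, assuming trt_month has been generated as in the line before this paragraph:

```stata
* Pooled logistic regression with the hand-made interaction term;
* or reports odds ratios, vce(cluster patient) gives cluster-robust standard errors
logit outcome treatment month trt_month, or vce(cluster patient)
```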

Instead of creating a new variable for the interaction, we could have used factor-variables syntax as follows:

logit outcome i.treatment##c.month, or vce(cluster patient)

We can check how well the predicted probabilities from the logistic regression model correspond to the observed proportions in figure 10.7. The predicted probabilities are obtained and plotted together with the observed proportions by using the following commands, which result in figure 10.8:

predict prob, pr

twoway (line prop mn_month, sort) (line prob month, sort lpatt(dash)), by(treatment) legend(order(1 "Observed proportions" 2 "Fitted probabilities")) xtitle(Time in months) ytitle(Probability of onycholysis)

The marginal probabilities predicted by the model fit the observed proportions reasonably well. However, we have treated the dependence among responses for the same patient as a nuisance by fitting an ordinary logistic regression model with robust standard errors for clustered data.

If the error term follows a normal distribution we use the probit model, and if it follows a logistic distribution we use the logit model. The linear probability model (LPM) instead fits an ordinary linear regression to the binary outcome.
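The three choices can be compared on the same specification. A sketch, assuming the toenail variables used in the pooled model (outcome, treatment, month, patient); it is illustrative rather than part of the original analysis:

```stata
* Probit: normally distributed latent error
probit outcome i.treatment##c.month, vce(cluster patient)

* Logit: logistic latent error
logit outcome i.treatment##c.month, vce(cluster patient)

* Linear probability model: ordinary least squares on the 0/1 outcome
regress outcome i.treatment##c.month, vce(cluster patient)
```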

RANDOM-INTERCEPT MODEL FOR A CATEGORICAL OUTCOME

If the unobserved variables (error term) are uncorrelated with the explanatory variables, but the subject-specific effect is nonzero or differs between subjects (random intercept), the pooled regression model is not appropriate. In that case we can use the random-effects probit, random-effects logit, or fixed-effects logit model for the categorical outcome.

In a random effects model, the unobserved variables are assumed to be uncorrelated with (or, more strongly, statistically independent of) all the observed variables. That assumption will often be wrong but, for the reasons given above (e.g., standard errors may be very high with fixed effects, and RE lets you estimate effects for time-invariant variables), an RE model may still be desirable under some circumstances. Linear RE models can be estimated via Generalized Least Squares (GLS); random-effects logit and probit models are estimated by maximum likelihood with numerical integration.

Here we use the option intpoints(30) (integration points) to ensure an accurate approximation. The value sigma_u is the estimated standard deviation of the random intercept.

The value rho is the residual intraclass correlation of the outcome.
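A sketch of the corresponding command, assuming the toenail variables from the pooled model (outcome, treatment, month, patient, visit). sigma_u and rho appear at the bottom of the output.

```stata
xtset patient visit

* Random-intercept logistic regression; intpoints(30) sets the number of
* quadrature (integration) points used to approximate the likelihood
xtlogit outcome i.treatment##c.month, re intpoints(30)

* Optional check: refit with fewer and more integration points and compare
quadchk
```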

We can convert the estimated coefficients to odds ratios (OR) by using the or option right after fitting the random-intercept model.
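For example, replaying the most recent xtlogit fit with the or option redisplays the results as odds ratios without re-estimating (a sketch; it assumes a random-intercept model has just been fit):

```stata
* Replay the last xtlogit results, reporting odds ratios instead of coefficients
xtlogit, or
```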

Fit a random-effects model for the binary birth weight variable.

Option re: tells Stata to fit the random-effects model

Option or: tells Stata to report odds ratios
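The command is not shown in the notes. A sketch of what it might look like: lowbw is a hypothetical name for the binary birth weight outcome, while birthorder, initage and nomid are the names mentioned in the GEE section of these notes and are assumed to apply here as well.

```stata
* Declare the cluster identifier (here the mother)
xtset nomid

* Random-effects logistic regression, reported as odds ratios
* (lowbw is a placeholder outcome name)
xtlogit lowbw birthorder initage, re or
```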

FIXED-EFFECTS MODEL FOR A CATEGORICAL OUTCOME

The fixed-effects model can be applied to a categorical outcome when:

- Each individual is measured at least twice

- The independent variables vary over time

However, the fixed-effects model has the following limitations:

- It cannot estimate the effects of independent variables that do not change over time

- It only estimates within-individual effects, not between-individual effects

- It cannot control for unobserved independent variables that change over time.

When the error term is correlated with the explanatory variables, the random-effects model cannot be used. We must then use the fixed-effects logit model.
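A sketch of the fixed-effects (conditional) logit fit. The variable names id, year, pov, mother, spouse, school and hours are hypothetical placeholders for a panel on poverty status; substitute the names in your own data.

```stata
* Declare the panel: id identifies the subject, year the time period
xtset id year

* Fixed-effects logit: only subjects whose outcome changes over time contribute
xtlogit pov mother spouse school hours, fe
```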

Here is how we interpret the results. The note “multiple positive outcomes within groups encountered” is a warning that you may need to check your data, because with some analyses there should be no more than one positive outcome. In the present case, that is not a problem, i.e., there is no reason that respondents cannot be in poverty at multiple points in time.

The note “324 groups (1620 obs) dropped because of all positive or all negative outcomes” means that 324 subjects were either in poverty during all 5 time periods or were not in poverty during all 5 time periods. Fixed-effects models are looking at the determinants of within-subject variability. If there is no variability within a subject, there is nothing to examine. Put another way, in the 827 groups that remained, sometime during the 5 year period the subject went from being in poverty to being out of poverty; or else switched from being out of poverty to being in poverty. If poverty status were something that hardly ever changed across time, or if very few people were ever in poverty, there would not be many cases left for a fixed effects analysis. Even as it is, more than a fourth of the sample has been dropped from the analysis. (Other techniques, like xtreg, fe, won’t cost you so many cases.)

Convert to odds ratios for easier reading

We can also use the clogit command to fit the model

I did not need to clear the xtsettings; but I did so to illustrate that with clogit, it isn’t necessary to xtset the data. Instead, the panelvar is specified by using the group option. Further, with neither method was the timevar actually needed. Instead of years, these could have been children within schools. The xt labeling of commands can be deceptive in that you do not necessarily need to have longitudinal data to use some of the commands.
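A sketch of the clogit route described here. The variable names id, pov, mother, spouse, school and hours are hypothetical placeholders, since the original command is not reproduced in these notes.

```stata
* Clear the panel settings to show that clogit does not depend on them
xtset, clear

* Conditional (fixed-effects) logit; the panel variable goes in group()
clogit pov mother spouse school hours, group(id) or
```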

GEE MODEL FOR A CATEGORICAL OUTCOME

Assumptions:

Measurements are independent across clusters (this can be relaxed for time and space). Measurements may be correlated within a cluster.

For dichotomous outcome variables, the GEE approach also requires the choice of a “working correlation structure.”

The output of the logistic GEE analysis is comparable to the output of a linear GEE analysis. The outcome variable is Ydich, which is the dichotomized version of the outcome variable Y, and the correlation structure used is “exchangeable.” The difference between the outputs is found in the Link function and the Family. In a logistic regression analysis, the link function is the logit and the family is binomial. The second part of the output consists of the parameter estimates. For each of the covariates the magnitude of the regression coefficient, the standard error, the z-value (obtained from dividing the regression coefficient by its standard error), the corresponding p-value, and the 95% confidence interval around the regression coefficient are presented. The latter is calculated in the regular way, i.e., by the regression coefficient ± 1.96 times the standard error.

From the four covariates, only X2 is significantly related to the dichotomous outcome variable Ydich. The regression coefficient is 0.2881188, and the odds ratio is therefore EXP[0.2881188] = 1.33. The 95% confidence interval around the odds ratio ranges from EXP[0.1737259] = 1.19 to EXP[0.4025118] = 1.50. The interpretation of this odds ratio is somewhat complicated. As for the regression coefficients calculated for a continuous outcome variable, the odds ratios can be interpreted in two ways. (1) The between-subjects interpretation: a subject with a one-unit higher score for covariate X2, compared to another subject, has a 1.33 times higher odds of being in the highest group for the dichotomous outcome variable Ydich, compared to the odds of being in the lowest group. (2) The within-subject interpretation: an increase of one unit in covariate X2 within a subject over a certain time period is associated with a 1.33 times higher odds of moving to the highest group of the dichotomous outcome variable Ydich compared to the odds of staying in the lowest group. The magnitude of the regression coefficient (i.e., the magnitude of the odds ratio) reflects both relationships, and it is not clear from the results of this analysis which is the most important component of the relationship.
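The odds ratio and its confidence limits quoted in this passage are simply the exponentiated coefficient and interval bounds, which can be reproduced in Stata:

```stata
* Exponentiate the coefficient of X2 and the bounds of its 95% confidence interval
display %4.2f exp(0.2881188)
display %4.2f exp(0.1737259)
display %4.2f exp(0.4025118)
```

These display 1.33, 1.19 and 1.50, matching the values in the text.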

As in the GEE analysis with a continuous outcome variable, the scale parameter (also known as the dispersion parameter) is an indication of the variance of the model. The interpretation of this coefficient is, however, different from that in the situation with a continuous outcome variable. This has to do with the characteristics of the binomial distribution on which the logistic GEE analysis is based. In the binomial distribution the variance is directly linked to the mean value. So, for the logistic GEE analysis, the scale parameter has to be one (i.e., a direct connection between the variance and the mean).
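In symbols, for a dichotomous outcome the variance is determined by the mean. The notation here (subject i, occasion t, mean \mu_{it}, scale parameter \phi) is assumed for illustration and is not taken from the source:

```latex
\[
\operatorname{Var}(Y_{it}) = \phi \, \mu_{it}\,(1 - \mu_{it}),
\qquad \mu_{it} = \Pr(Y_{it} = 1),
\qquad \phi = 1 .
\]
```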

Comparable to the situation already described for continuous outcome variables, GEE analysis requires the choice of a particular “working correlation structure.” It has already been mentioned that for a dichotomous outcome variable it is not possible to base that choice on the correlation structure of the observed data. It is therefore interesting to investigate the difference in regression coefficients estimated when different correlation structures are chosen. Output 7.4 shows the results of several analyses with different correlation structures and Table 7.5 summarizes the results of the different GEE analyses.

The most important conclusion which can be drawn from Table 7.5 is that the results of the GEE analysis with different (dependent) correlation structures are highly comparable. This finding is different from that observed in the analysis of a continuous outcome variable (see Table 4.2), for which a remarkable difference was found between the results of the analysis with different correlation structures. So, probably, the statement in the literature that GEE analysis is robust against the wrong choice of a correlation structure is particularly true for dichotomous outcome variables.

Furthermore, from Table 7.5 it can be seen that there are remarkable differences between the results obtained from the analysis with an independent correlation structure and the results obtained from the analysis with the three dependent correlation structures. It should further be noted that, comparable to the situation with a continuous outcome variable, the standard errors obtained from the analysis with an independent correlation structure are higher than those obtained from the analysis with a dependent correlation structure.
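One way to run this kind of comparison is to refit the same logistic GEE under several working correlation structures and tabulate the coefficients side by side. A sketch, assuming hypothetical variable names id and time for the panel and ydich, x1-x4 for the outcome and the four covariates; only three of the available structures are shown.

```stata
xtset id time

* Refit the model under each working correlation structure and store the results
foreach c in independent exchangeable unstructured {
    quietly xtgee ydich x1 x2 x3 x4, family(binomial) link(logit) corr(`c')
    estimates store gee_`c'
}

* Coefficients and standard errors under each working correlation structure
estimates table gee_independent gee_exchangeable gee_unstructured, b(%9.4f) se(%9.4f)
```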

Model with a continuous predictor

Fit a GEE logistic regression for the outcome with birthorder and initage as predictors

i(nomid): the cluster identifier, here the mother

corr(exc): the working correlation is exchangeable

family(binomial): the outcome is binary

link(logit): fits a logistic regression for the outcome.

robust: computes robust standard errors

ef (eform): displays odds ratios instead of the predictors' coefficients.
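Putting the listed options together gives a command along these lines. The outcome name lowbw is hypothetical (the notes do not show it), and the cluster identifier is declared here with xtset, which is the usual way to do it in current Stata, rather than with an i() option.

```stata
* Declare the cluster identifier (the mother)
xtset nomid

* Logistic GEE with exchangeable working correlation, robust standard errors,
* and exponentiated coefficients (odds ratios); lowbw is a placeholder name
xtgee lowbw birthorder initage, family(binomial) link(logit) corr(exchangeable) vce(robust) eform
```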

Model with a categorical predictor

MIXED-EFFECTS MODEL FOR A CATEGORICAL OUTCOME

Mixed-effects models are characterized as containing both fixed effects and random effects. The fixed effects are analogous to standard regression coefficients and are estimated directly. The random effects are not directly estimated (although they may be obtained postestimation) but are summarized according to their estimated variances and covariances. Random effects may take the form of either random intercepts or random coefficients, and the grouping structure of the data may consist of multiple levels of nested groups. As such, mixed-effects models are also known in the literature as multilevel models and hierarchical models. Mixed-effects commands fit mixed-effects models for a variety of distributions of the response conditional on normally distributed random effects.
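A sketch of a mixed-effects logistic model, assuming the toenail variables used earlier in these notes (outcome, treatment, month, patient). It fits a random intercept for each patient and is illustrative rather than part of the original analysis.

```stata
* Fixed effects come before ||, the random part after it:
* "patient:" with nothing following requests a random intercept per patient
melogit outcome i.treatment##c.month || patient:, or
```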
