7.4 Models for correlated data

R provides extensive support for regression models for correlated data, including repeated measures, longitudinal, time series, clustered, and other related designs. Throughout this section, we assume that repeated measurements are taken on a subject or cluster identified by a common value of the variable id.

7.4.1 Linear models with correlated outcomes

Example: 7.10.10

library(nlme)

glsres = gls(y ~ x1 + ... + xk,
  correlation=corSymm(form = ~ ordervar | id),
  weights=varIdent(form = ~ 1 | ordervar), data=ds)

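Note: The gls() function supports estimation of generalized least squares regression models with an arbitrary specification of the variance–covariance matrix. In addition to a formula interface for the mean model, the analyst specifies a within-group correlation structure (using the correlation option) as well as the within-group heteroscedasticity structure (using the weights option). The term ordervar | id specifies that observations are correlated only within the same value of id, with the integer variable ordervar giving the position of each observation within its group; varIdent(form = ~ 1 | ordervar) allows a separate variance for each value of ordervar.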

Other correlation structures are available (see help(corClasses)), as are other variance functions (see help(varClasses)).
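To make the gls() template concrete, the sketch below simulates a small hypothetical dataset (60 subjects, 3 ordered measurements each) and fits the same unstructured correlation plus occasion-specific variance model. The sample sizes, effect sizes, and the simulated shared subject effect are illustrative assumptions, not part of the original example.

```r
library(nlme)

set.seed(42)
n_id = 60       # hypothetical number of subjects
n_time = 3      # hypothetical number of measurements per subject
ds = data.frame(id = rep(1:n_id, each = n_time),
                ordervar = rep(1:n_time, times = n_id))
ds$x1 = rnorm(nrow(ds))
subj = rnorm(n_id)   # shared subject effect, so outcomes are correlated within id
# residual SD grows with ordervar, so the variances differ by occasion
ds$y = 1 + 0.5 * ds$x1 + subj[ds$id] + rnorm(nrow(ds), sd = ds$ordervar)

# unstructured correlation within id; a separate variance for each ordervar value
glsres = gls(y ~ x1, data = ds,
             correlation = corSymm(form = ~ ordervar | id),
             weights = varIdent(form = ~ 1 | ordervar))
summary(glsres)
getVarCov(glsres)   # estimated 3 x 3 covariance matrix for the first subject
```

Because ordervar takes the integer values 1, 2, 3, it can serve both as the position covariate for corSymm() and as the grouping variable for varIdent().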

7.4.2 Linear mixed models with random intercepts

See 7.4.3 (random slope models), 7.4.4 (random coefficient models), and 11.2 (empirical power calculations).

library(nlme)

lmeint = lme(fixed= y ~ x1 + ... + xk, random = ~ 1 | id, na.action=na.omit, data=ds)

Note: Best linear unbiased predictors (BLUPs) of the sum of the fixed effects plus the corresponding random effects can be generated using the coef() function, the predicted random effects using the random.effects() function (or its alias ranef()), the fixed effects using fixef(), and the estimated variance–covariance matrix of the random effects using VarCorr(). Normalized residuals (using a Cholesky decomposition, see pages 238–241 of Fitzmaurice et al. [40]) can be generated by specifying the type="normalized" option when calling residuals() on an lme object (more information can be found using help(residuals.lme)). A plot of the predicted random effects can be created using plot(ranef(lmeint)), while plot(lmeint) displays standardized residuals against fitted values. See the lmmfit package for goodness-of-fit measures for linear mixed models with one level of clustering.
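The extractor functions described in the note can be tried on simulated data. This sketch assumes a hypothetical design of 30 subjects with 5 observations each and a single covariate x1; it fits the random intercept model and pulls out the fixed effects, predicted random intercepts, subject-specific coefficients, variance components, and normalized residuals.

```r
library(nlme)

set.seed(1)
n_id = 30; n_obs = 5            # hypothetical design
ds = data.frame(id = factor(rep(1:n_id, each = n_obs)),
                x1 = rnorm(n_id * n_obs))
u = rnorm(n_id)                 # true random intercepts used to simulate y
ds$y = 2 + 0.7 * ds$x1 + u[as.integer(ds$id)] + rnorm(nrow(ds), sd = 0.5)

lmeint = lme(fixed = y ~ x1, random = ~ 1 | id,
             na.action = na.omit, data = ds)
fixef(lmeint)                   # fixed effects
head(ranef(lmeint))             # predicted random intercepts
head(coef(lmeint))              # fixed effects plus random effects, per id
VarCorr(lmeint)                 # variance components
res_norm = residuals(lmeint, type = "normalized")
```

Each row of coef(lmeint) is the fixed intercept plus that subject's predicted random intercept; the x1 column is constant because only the intercept is random.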

7.4.3 Linear mixed models with random slopes

Example: 7.10.11

See 7.4.2 (random intercept models) and 7.4.4 (random coefficient models).

library(nlme)

lmeslope = lme(fixed=y ~ time + x1 + ... + xk, random = ~ time | id, na.action=na.omit, data=ds)

Note: The default covariance structure for the random effects is unstructured (see help(reStruct) and help(pdClasses) for other options). Best linear unbiased predictors (BLUPs) of the sum of the fixed effects plus the corresponding random effects can be generated using the coef() function, the predicted random effects using the random.effects() function, and the estimated variance–covariance matrix of the random effects using VarCorr(). A plot of the predicted random effects can be created using plot(ranef(lmeslope)).
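A minimal sketch of the random slope model on simulated data follows. The design (40 subjects observed at times 0 through 4) and the variances of the true intercept and slope deviations are hypothetical choices made only for illustration.

```r
library(nlme)

set.seed(2)
n_id = 40; times = 0:4          # hypothetical design
ds = data.frame(id = factor(rep(1:n_id, each = length(times))),
                time = rep(times, times = n_id))
ds$x1 = rnorm(nrow(ds))
b0 = rnorm(n_id, sd = 1)        # true subject-specific intercept deviations
b1 = rnorm(n_id, sd = 0.3)      # true subject-specific slope deviations
idx = as.integer(ds$id)
ds$y = (1 + b0[idx]) + (0.5 + b1[idx]) * ds$time + 0.4 * ds$x1 +
  rnorm(nrow(ds), sd = 0.5)

lmeslope = lme(fixed = y ~ time + x1, random = ~ time | id,
               na.action = na.omit, data = ds)
VarCorr(lmeslope)               # SDs of intercept and slope, plus their correlation
head(coef(lmeslope))            # subject-specific intercepts and slopes
```

The correlation reported by VarCorr() reflects the default unstructured covariance for the two random effects.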


7.4.4 More complex random coefficient models

We can extend the random effects models introduced in 7.4.2 and 7.4.3 to three or more subject-specific random parameters (e.g., a quadratic growth curve or a spline/“broken stick” model [40]). We use time1 and time2 to refer to two generic functions of time.

library(nlme)

lmestick = lme(fixed= y ~ time1 + time2 + x1 + ... + xk,
  random = ~ time1 + time2 | id, data=ds, na.action=na.omit)
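As one possible choice of the two generic functions of time, the sketch below builds a "broken stick" with time1 = time and time2 = max(time - knot, 0), so the coefficient on time2 is the change in slope after the knot. The knot at 3, the 80-subject design, and all simulated effect sizes are hypothetical assumptions.

```r
library(nlme)

set.seed(3)
n_id = 80; times = 0:6; knot = 3     # hypothetical design and knot location
ds = data.frame(id = factor(rep(1:n_id, each = length(times))),
                time = rep(times, times = n_id))
ds$time1 = ds$time                   # slope before the knot
ds$time2 = pmax(ds$time - knot, 0)   # change in slope after the knot
idx = as.integer(ds$id)
# true subject-specific deviations for intercept, time1, and time2
b = cbind(rnorm(n_id, sd = 1), rnorm(n_id, sd = 0.5), rnorm(n_id, sd = 0.5))
ds$y = (2 + b[idx, 1]) + (1 + b[idx, 2]) * ds$time1 +
  (-1.5 + b[idx, 3]) * ds$time2 + rnorm(nrow(ds), sd = 0.5)

lmestick = lme(fixed = y ~ time1 + time2, random = ~ time1 + time2 | id,
               data = ds, na.action = na.omit)
VarCorr(lmestick)                    # 3 x 3 random effects covariance summary
```

A quadratic growth curve would instead use time2 = time^2; the lme() call itself is unchanged.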

7.4.5 Multilevel models

Models for studies with multiple levels of clustering can also be fit. In a typical example, a study might include schools (as one level of clustering) and classes within schools (a second level of clustering), with individual students within the classrooms providing a response. Generically, we refer to variables level1, level2, and so on, where the variable for level l identifies cluster membership at that level.

Random effects at different levels are assumed to be uncorrelated with each other.

library(nlme)

lmres = lme(fixed= y ~ x1 + ... + xk, random= ~ 1 | level1 / level2, data=ds)

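Note: The syntax level1 / level2 specifies that level2 is nested within level1 (e.g., classes within schools). A model with k levels of clustering can be fit using the syntax level1 / ... / levelk, with the outermost level listed first.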
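The schools and classes example can be simulated directly. In this sketch (a hypothetical design of 15 schools, 4 classes per school, and 10 students per class), class labels 1 to 4 are reused in every school; the nesting operator / is what makes class 1 in school 1 a different cluster from class 1 in school 2.

```r
library(nlme)

set.seed(3)
n_school = 15; n_class = 4; n_student = 10   # hypothetical design
ds = expand.grid(student = 1:n_student, level2 = 1:n_class,
                 level1 = 1:n_school)
ds$level1 = factor(ds$level1)   # school
ds$level2 = factor(ds$level2)   # class label, reused across schools
school_eff = rnorm(n_school, sd = 1)
class_eff = rnorm(n_school * n_class, sd = 0.5)
# unique index for each school-by-class combination
class_index = (as.integer(ds$level1) - 1) * n_class + as.integer(ds$level2)
ds$x1 = rnorm(nrow(ds))
ds$y = 5 + 0.3 * ds$x1 + school_eff[as.integer(ds$level1)] +
  class_eff[class_index] + rnorm(nrow(ds))

lmres = lme(fixed = y ~ x1, random = ~ 1 | level1 / level2, data = ds)
VarCorr(lmres)   # variance components for school, class within school, residual
```

ranef(lmres) returns one set of predicted random intercepts per level: 15 for schools and 60 for classes within schools.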

7.4.6 Generalized linear mixed models

Examples: 7.10.13 and 11.2

library(lme4)

glmmres = glmer(y ~ x1 + ... + xk + (1|id), family=familyval, data=ds)

Note: See help(family) for details regarding specification of distribution families and link functions.
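For a binary outcome, familyval would typically be binomial(link = "logit"). The sketch below simulates a hypothetical dataset of 100 clusters with 6 binary responses each, generated from a random intercept logistic model, and fits it with glmer(); the design and parameter values are assumptions for illustration.

```r
library(lme4)

set.seed(4)
n_id = 100; n_obs = 6           # hypothetical design
ds = data.frame(id = factor(rep(1:n_id, each = n_obs)),
                x1 = rnorm(n_id * n_obs))
u = rnorm(n_id, sd = 1)         # true random intercepts on the logit scale
eta = -0.5 + 0.8 * ds$x1 + u[as.integer(ds$id)]
ds$y = rbinom(nrow(ds), size = 1, prob = plogis(eta))

glmmres = glmer(y ~ x1 + (1 | id), family = binomial(link = "logit"),
                data = ds)
summary(glmmres)
fixef(glmmres)                  # conditional (subject-specific) log odds ratios
```

The fixed effects from glmer() are interpreted conditionally on the random intercept, which distinguishes them from the population-averaged estimates produced by generalized estimating equations.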

7.4.7 Generalized estimating equations

Example: 7.10.12

library(gee)

geeres = gee(formula = y ~ x1 + ... + xk, id=id, data=ds, family=binomial, corstr="independence")

Note: The gee() function requires that the data frame be sorted by subject identifier, so that all observations from a cluster occupy contiguous rows.

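Other correlation structures include "exchangeable", "fixed" (with the correlation matrix supplied using the R option), "stat_M_dep", "non_stat_M_dep", "AR-M" (for these three, M is set using the Mv option), and "unstructured". Note that the "unstructured" working correlation requires careful specification of ordering when missing data are monotone.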
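The sketch below fits a GEE with an exchangeable working correlation to simulated clustered binary data (a hypothetical design of 100 clusters of size 4). It sorts by id before calling gee(), as the note above requires, and reports both naive and robust standard errors.

```r
library(gee)

set.seed(5)
n_id = 100; n_obs = 4           # hypothetical design
ds = data.frame(id = rep(1:n_id, each = n_obs),
                x1 = rnorm(n_id * n_obs))
u = rnorm(n_id)                 # shared cluster effect induces within-id correlation
ds$y = rbinom(nrow(ds), size = 1, prob = plogis(-0.3 + 0.6 * ds$x1 + u[ds$id]))

ds = ds[order(ds$id), ]         # gee() needs each cluster in contiguous rows
geeres = gee(y ~ x1, id = id, data = ds, family = binomial,
             corstr = "exchangeable")
summary(geeres)$coefficients    # naive and robust (sandwich) standard errors
geeres$working.correlation      # estimated 4 x 4 exchangeable working correlation
```

Because the data were generated with a random intercept, the GEE coefficients estimate population-averaged effects and will generally be attenuated relative to the conditional values used in the simulation.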

7.4.8 MANOVA

library(car)

mod = lm(cbind(y1, y2, y3) ~ x1, data=ds)
Anova(mod, type="III")

Note: The car package has a vignette that provides detailed examples, including a repeated measures ANOVA with details of the use of the idata and idesign options. If the factor x1 has two levels, this is equivalent to Hotelling's T² test. The Hotelling package (by James Curran) can also be used to calculate Hotelling's T² statistic.
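The equivalence with Hotelling's T² for a two-level factor can be checked numerically. This sketch simulates three hypothetical outcomes for two groups of 20, fits the multivariate linear model with car, and then computes T² by hand from the pooled covariance matrix; its F transformation matches the exact MANOVA F statistic reported by base R.

```r
library(car)

set.seed(6)
n = 40                                   # hypothetical total sample size
ds = data.frame(x1 = factor(rep(c("a", "b"), each = n / 2)))
ds$y1 = rnorm(n) + 0.8 * (ds$x1 == "b")
ds$y2 = rnorm(n)
ds$y3 = rnorm(n) + 0.5 * (ds$x1 == "b")
mod = lm(cbind(y1, y2, y3) ~ x1, data = ds)
Anova(mod, type = "III")

# Hotelling's T^2 from the two group mean vectors and pooled covariance
Y = as.matrix(ds[, c("y1", "y2", "y3")])
ga = ds$x1 == "a"
n1 = sum(ga); n2 = sum(!ga); p = ncol(Y)
d = colMeans(Y[ga, ]) - colMeans(Y[!ga, ])
S = ((n1 - 1) * cov(Y[ga, ]) + (n2 - 1) * cov(Y[!ga, ])) / (n1 + n2 - 2)
T2 = (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(S) %*% d)
# exact F transformation with p and n1 + n2 - p - 1 degrees of freedom
Fstat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
pval = pf(Fstat, p, n1 + n2 - p - 1, lower.tail = FALSE)
c(T2 = T2, F = Fstat, p = pval)
```

With only two groups there is a single nonzero eigenvalue, so the Pillai, Wilks, Hotelling-Lawley, and Roy tests all reduce to this same exact F test.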

7.4.9 Time series model

Time series modeling is an extensive area with a specialized language and notation. We give only the briefest introduction here, showing how to fit an ARIMA (autoregressive integrated moving average) model to the first difference of a series, with first-order autoregressive and moving average terms. The CRAN Time Series task view provides an overview of the support available in R.

tsobj = ts(x, frequency=12, start=c(1992, 2))
arres = arima(tsobj, order=c(1, 1, 1))

Note: The ts() function creates a time series object, in this case for monthly data stored in the variable x beginning in February 1992 (by default, the series starts at time 1 and has one observation per unit of time). The start option is either a single number or a vector of two integers that specify a natural time unit and a number of samples into that time unit. The arima() function fits an ARIMA model with autoregressive, differencing, and moving average orders all equal to 1.
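Since the original code assumes an existing variable x, the sketch below generates a hypothetical monthly series with arima.sim() from an ARIMA(1,1,1) process (AR coefficient 0.5, MA coefficient 0.3, both arbitrary), then applies the same ts() and arima() calls and produces short-term forecasts.

```r
set.seed(7)
# hypothetical data: a series simulated from an ARIMA(1,1,1) process
x = as.numeric(arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3),
                         n = 120))
tsobj = ts(x, frequency = 12, start = c(1992, 2))   # monthly, from February 1992
arres = arima(tsobj, order = c(1, 1, 1))
arres                                # estimates of ar1 and ma1 with standard errors
predict(arres, n.ahead = 6)$pred     # forecasts for the next six months
```

No mean term is estimated here because arima() omits the intercept when the series is differenced.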

Derived variables and data manipulation

Merging, combining, and subsetting datasets