3.1.1 Longitudinal data
In a longitudinal study, the outcome variables of concern in each of many subjects are measured repeatedly throughout the duration of the study. This design contrasts with that of a cross-sectional study, in which each outcome variable is measured on a single occasion for each subject. Typically, a fixed number of repeated measurements will be made on all subjects at a set of common time points in a longitudinal study design. The occasions of measurement are not necessarily distributed evenly throughout the study period.
The fundamental objective of a longitudinal analysis is the assessment of within-subject changes in the outcomes and the explanation of systematic differences among subjects in their changes.(Fitzmaurice, Laird & Ware, 2004) The ability to obtain information concerning individual patterns of change is a key strength of longitudinal studies. Compared with a cross-sectional design, this design also has the potential to increase both the statistical power of comparisons and the robustness of model selection.(Zeger & Liang, 1992)
There are two main challenges in the analysis of longitudinal data. First, the analysis of longitudinal data is complicated by the dependence among repeated measures made on the same subject. Therefore, standard regression analysis may not be appropriate for longitudinal data because the basic assumption that all observations are statistically independent or at least uncorrelated with each other is no longer valid.
The repeated measures will typically exhibit positive correlation, and this correlation must be accounted for in the analysis. Failure to account for the correlation among the repeated measures would lead to a loss of important information, result in inefficient estimates of regression parameters, overestimate the variability of the estimate of change and produce inconsistent estimates of precision.(Zeger & Liang, 1992) Second, the data may be unbalanced or partially incomplete since the researcher often cannot control the circumstances for obtaining measurements. In many longitudinal studies, there may be considerable variation among subjects in the number and timing of measurements. For instance, the measurement times may be unequally spaced within an individual and may differ across subjects. Gaps in the data may also occur because the subjects fail to attend an appointed visit.
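The first challenge can be illustrated with a minimal sketch of how ignoring positive within-subject correlation overstates the variability of an estimated change. It assumes, purely for illustration, two measurements per subject with equal variance `sigma2` and correlation `rho`; the function name and all numerical values are hypothetical and are not taken from the study.

```python
def variance_of_mean_change(sigma2: float, rho: float, n_subjects: int) -> float:
    """Variance of the mean within-subject change (second minus first measurement).

    Assumes equal variance sigma2 at both occasions and correlation rho between
    the two measurements on the same subject, so that
        Var(Y2 - Y1) = 2 * sigma2 * (1 - rho).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 2.0 * sigma2 * (1.0 - rho) / n_subjects


# Hypothetical values chosen for illustration only.
sigma2, rho, n_subjects = 1.0, 0.8, 50

# Variance when the within-subject correlation is accounted for.
correct = variance_of_mean_change(sigma2, rho, n_subjects)
# Variance when the two measurements are wrongly treated as independent.
naive = variance_of_mean_change(sigma2, 0.0, n_subjects)

# With positive correlation, the naive variance is larger by a factor 1 / (1 - rho).
inflation = naive / correct
print(f"correct={correct:.4f}, naive={naive:.4f}, inflation={inflation:.1f}x")
```

Under these assumptions, the stronger the positive correlation, the more an analysis that treats the repeated measures as independent overstates the uncertainty in the estimated change.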
Many approaches to the analysis of longitudinal data have been studied, but most are restricted to the setting in which the outcome variable is normally distributed and the data are balanced and complete. Approaches to analysing longitudinal studies with a continuous outcome variable range from the simple to the complex.(Everitt, 1995; Laird, Donnelly & Ware, 1992) The simplest approach is to reduce the multiple observations within each subject to a single summary measure. This summary measures approach reduces the original multivariate problem to a univariate problem, so that standard methods for univariate data, such as the two-sample t-test, can then be applied. The summary measures approach is widely employed in practice, owing to its technical simplicity and the ease with which it can be explained to non-statisticians.(Matthews, Altman, Campbell & Royston, 1990) If the question of interest is to monitor the rate of change of an outcome variable over time, potentially useful summary measures are a simple “change score” between the repeated measurements or the slope of a linear response trajectory. Alternative summary measures include the area under the curve, the maximum value and the simple mean, with the choice depending on the research question and the nature of the data.(Matthews, Altman, Campbell & Royston, 1990) However, these methods are not reliable if many data are missing and the data are not missing at random.(Vittinghoff, Glidden, Shiboski & McCulloch, 2005)
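The summary measures named above can be sketched for a single subject as follows. This is an illustrative reduction only, not the analysis used in this thesis; the function name, the variable names and the measurement values are hypothetical, and complete data ordered by time are assumed.

```python
from statistics import mean


def summary_measures(times, values):
    """Reduce one subject's repeated measurements to single summary values.

    times and values are equal-length lists ordered by time; at least two
    distinct time points are assumed.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired (time, value) observations")

    # Ordinary least-squares slope of the linear response trajectory.
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))

    # Trapezoidal rule for the area under the response curve.
    auc = sum(
        (t1 - t0) * (y0 + y1) / 2.0
        for t0, t1, y0, y1 in zip(times, times[1:], values, values[1:])
    )

    return {
        "change_score": values[-1] - values[0],  # last minus first measurement
        "slope": sxy / sxx,
        "auc": auc,
        "maximum": max(values),
        "mean": y_bar,
    }


# Hypothetical subject measured at four yearly visits (invented numbers).
ages = [7.0, 8.0, 9.0, 10.0]
axial_length_mm = [22.8, 23.1, 23.4, 23.7]
result = summary_measures(ages, axial_length_mm)
print(result)
```

Once each subject has been reduced to one value in this way, the groups can be compared with a standard univariate method such as the two-sample t-test.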
Other response trajectories, for example, piecewise linear or curvilinear, can also be used to parsimoniously smooth and summarise within-subject changes in the outcome throughout the duration of study. Nonetheless, the richness of information available in longitudinal data can be more appropriately appreciated through the use of statistical approaches that focus on the dynamics of change as a function of time.
These more complex and powerful statistical techniques include analysis of variance or multivariate analysis of variance for repeated measures, multilevel modelling and structural equation modelling.
3.1.2 Analysis of longitudinal data in myopia
Most studies of myopia in children have investigated the relationship between age, refractive error and ocular components by presenting the mean difference of measurements taken at only two ages.(Chua, Balakrishnan, Chan, et al., 2006; Gwiazda, Hyman, Hussein, et al., 2003; C. S. Lam, Edwards, Millodot & Goh, 1999) However, the growth process, which is important in understanding the potential mechanisms of myopia pathogenesis, may not be fully reflected in these studies. Very few studies have adopted statistical approaches that focus on the growth of refractive error and ocular components as a function of age for children with myopia. A notable exception is the OLSM study, which developed growth curves of the ocular components for children aged 6 to 14 years in the USA using a mixed model approach with piecewise curves.(Jones, Mitchell, Mutti, et al., 2005)
3.1.3 Statistical analyses
In the following sections, the statistical approaches employed for i) developing the growth curves, ii) comparing the curves, and iii) modelling the changes before and after the onset of myopia are described. To illustrate the approach, exploratory analysis and procedures for modelling and comparing the curves are demonstrated using only the data involving AL. The same approaches were applied for SE of refractive error, height and other ocular components of the children, including VCD, ACD, LT and CR, but only the best-fitting FP models are presented in Chapter 4.
Analyses were performed using data collected yearly throughout the children’s elementary education, when the children were aged 6 to 13 years. The corresponding data collection period was from 1999 through 2004 for the Northern and Eastern schools and from 2001 through 2006 for the Western school. Children who had at least three visits between the ages of 6 and 13 years and met the criteria for one of the five refractive error groups were included in the analyses. Growth curve models were developed using longitudinal data of the randomly selected eye. The child’s age at each visit was calculated from the date of the visit and the date of birth.
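The age calculation and the visit-count criterion described above can be sketched as follows. This is not the code used in the study (the analyses were run in STATA). The function names, the division by 365.25 and the handling of the age boundaries are assumptions made for illustration, and the dates are invented. The refractive error group criteria and the random selection of one eye are not covered.

```python
from datetime import date


def age_at_visit(date_of_birth: date, visit_date: date) -> float:
    """Age in decimal years; dividing by 365.25 is an assumed convention."""
    return (visit_date - date_of_birth).days / 365.25


def is_eligible(date_of_birth, visit_dates, min_age=6.0, max_age=14.0, min_visits=3):
    """True if at least min_visits visits fall in the age window [min_age, max_age).

    The window treats "aged 6 to 13 years" as completed years of age. This is
    an assumption; the text does not state how boundary ages were handled.
    """
    ages = [age_at_visit(date_of_birth, v) for v in visit_dates]
    in_window = [a for a in ages if min_age <= a < max_age]
    return len(in_window) >= min_visits


# Hypothetical child; all dates are invented for illustration.
dob = date(1993, 3, 15)
visits = [date(1999, 9, 1), date(2000, 9, 1), date(2001, 9, 1), date(2002, 9, 1)]

ages = [round(age_at_visit(dob, v), 2) for v in visits]
eligible = is_eligible(dob, visits)
print(ages, eligible)
```

Keeping the age window and the minimum number of visits as parameters makes the boundary convention explicit and easy to change if a different definition applies.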
All statistical analyses were carried out using Stata version 10.1. All p-values quoted are two-sided, and a significance level of 0.05 was used.