Before using PROC MIXED to fit an individual growth model, you must structure your data file in a format suitable for analysis. When working with cross-sectional nonhierarchical data, this task is relatively straightforward as there is only one logical way of arranging the data: using a single record for each individual. When working with longitudinal data, this task is more complex as there are two equally plausible ways of arranging the information: (a) as a person-level data set, in which each person has one record and you use multiple variables to record the data for each occasion of measurement; or (b) as a person-period data set, in which each person has multiple records, one for each occasion of measurement.
In a person-level file, there are only as many records as there are people. As the number of occasions of measurement grows, the file gains new variables, but no new cases. In a person-period file, there are many more records, one for each person-period combination. As data collection lengthens, the data set grows longer.
I illustrate the difference between these two file formats using a small data set presented in Willett (1988). On each of four equally spaced occasions of measurement, 35 individuals completed an inventory assessing their performance on a simple cognitive task called "opposite naming." At the outset of the study, each participant also completed a baseline inventory on a covariate thought to be associated with the growth of skill in this domain.
Table 6.1 presents these data in a person-level format. Each individual has his or her own row of data containing the values of the outcome variable on each of the four occasions (SCORE1, SCORE2, SCORE3, and SCORE4). Each record also contains an identifying variable, ID, as well as the covariate, COVAR. Table 6.2 presents the same data in a person-period format. To conserve space, I present only three of the cases: IDs 1, 2, and 35. In the person-period format, the data set contains two variables identical to those in the person-level format (ID and COVAR) and two new variables: WAVE, which identifies the occasion of measurement to which the record refers; and Y, which records the individual's score on that occasion of measurement. The entire person-period data set for this study has a total of 140 records, 4 for each of the 35 individuals in Table 6.1.
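For readers who want to try the two layouts on a small scale, the following sketch reads the three cases shown in Table 6.2 (IDs 1, 2, and 35) into a person-level data set, with values copied from Table 6.1. It is an illustration only: the data set is named person so that it can be passed to the conversion code later in this section, but it holds 3 of the 35 cases, so the resulting person-period file would have 12 records rather than 140.

```sas
/* Sketch: a three-case subset of the person-level data in Table 6.1. */
/* One record per person; the four waves are stored as four variables. */
data person;
  input id score1 score2 score3 score4 covar;
  datalines;
1 205 217 268 302 137
2 219 243 279 302 123
35 166 197 203 233 110
;
run;

/* List the records to confirm the person-level layout. */
proc print data=person noobs;
run;
```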
Table 6.1
Person-Level Data Set with Four Waves of Data on the Growth of Opposite Naming over Time

ID   SCORE1   SCORE2   SCORE3   SCORE4   COVAR
 1      205      217      268      302     137
 2      219      243      279      302     123
 3      142      212      250      289     129
 4      206      230      248      273     125
 5      190      220      229      220      81
 6      165      205      207      263     110
 7      170      182      214      268      99
 8       96      131      159      213     113
 9      138      156      197      200     104
10      216      252      274      298      96
11      180      225      215      249     125
12       97      136      168      222     115
13      145      161      151      177     109
14      195      184      209      213      95
15      162      138      204      195     118
16      119      148      164      208     120
17      144      166      236      261     118
18      107      165      193      262     115
19      167      201      233      216     120
20      156      156      197      246     118
21      165      228      279      290     126
22      197      181      185      217     121
23      206      209      230      255     108
24      182      196      217      199     104
25      174      198      229      236     118
26      199      238      253      282     104
27      160      178      189      229     124
28      184      231      260      292     130
29      174      194      189      188      87
30      215      226      257      310     131
31      147      188      197      232     109
32      127      172      222      273     115
33      165      217      230      286     104
34       76      139      150      214     110
35      166      197      203      233     110

Table 6.2
Selected Records from the Person-Period Data Set on the Growth of Opposite Naming over Time

ID   WAVE     Y   COVAR
 1      1   205     137
 1      2   217     137
 1      3   268     137
 1      4   302     137
 2      1   219     123
 2      2   243     123
 2      3   279     123
 2      4   302     123
etc.
35      1   166     110
35      2   197     110
35      3   203     110
35      4   233     110

To use PROC MIXED to fit an individual growth model, your data must be arrayed in a person-period format. If your data are already organized this way, you are ready for analysis. If your data have been stored in the person-level format, you must first convert the structure. Fortunately, this task is relatively simple, even for complex longitudinal studies. If the data set in Table 6.1 is called person with six variables (ID, SCORE1-SCORE4, and COVAR), you can convert the file to a new data set called persper using the code:

    data persper;
      set person;
      array score [4] score1-score4;
      do i = 1 to 4;
        wave = i;         /* occasion of measurement */
        y = score[i];     /* outcome on that occasion */
        output;           /* write one record per person-period */
      end;
      drop i score1-score4;
    run;
Without going line-by-line through the program, I draw your attention to the most important aspect of the code: the presence of the output statement within the do loop. Placing the output statement within the loop ensures that the code creates a person-period structure because it outputs a new record to the persper file multiple times, once each time the loop is executed.
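One way to confirm that the conversion behaved as described is to count the records at each wave and list a few cases. The sketch below is a suggested check rather than part of the original program; it assumes the persper data set has just been created from the full 35-person file, in which case each wave should show 35 records and the listing should match Table 6.2.

```sas
/* Count records per wave: expect one record per person at each wave. */
proc freq data=persper;
  tables wave / nocum nopercent;
run;

/* List the three cases displayed in Table 6.2. */
proc print data=persper noobs;
  where id in (1, 2, 35);
  var id wave y covar;
run;
```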
As you work with longitudinal data in SAS, you will discover a need to move back and forth between data sets in the two different formats (person-level and person-period). Strategies for most of the important conversions are given in Singer (1998). To illustrate the ease with which you can move from this person-period data set back to a person-level data set, the code:
    data person;
      array score [4] score1-score4;
      do i = 1 to 4 until (last.id);
        set persper;      /* read one person-period record per pass */
        by id;            /* persper must be sorted by ID */
        score[i] = y;
      end;
      drop i wave y;
    run;
will convert the person-period data set (persper) to a person-level data set (person). In this program, it is the presence of the set statement within the do loop that creates the requisite structure. Were we to run this program using the person-period data set in Table 6.2, we would obtain the person-level data set in Table 6.1.
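As a cross-check on this conversion, PROC TRANSPOSE offers a second route from the person-period file to a person-level file, and PROC COMPARE can report whether the two results agree. This is a sketch under stated assumptions, not the approach given in Singer (1998): the data set name person_t is hypothetical, persper is assumed to be sorted by ID, and COVAR is assumed to be constant within each ID.

```sas
/* Alternative long-to-wide conversion; person_t is a hypothetical name. */
proc transpose data=persper out=person_t(drop=_name_) prefix=score;
  by id covar;    /* one output record per person */
  id wave;        /* wave values 1-4 become score1-score4 */
  var y;
run;

/* Compare with the person data set rebuilt by the DATA step. */
proc compare base=person compare=person_t;
  id id;          /* match records on the ID variable */
run;
```

If the two data sets agree, PROC COMPARE reports no unequal values for SCORE1-SCORE4 or COVAR.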