Methods and applications of longitudinal data analysis

Methods and Applications of Longitudinal Data Analysis

Xian Liu
Uniformed Services University of the Health Sciences
DoD Deployment Health Clinical Center, Defense Centers of Excellence, Walter Reed National Military Medical Center

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier
125 London Wall, EC2Y 5AS, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

Copyright © 2016 Higher Education Press. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies, and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

ISBN: 978-0-12-801342-7

For information on all Academic Press publications visit our website at http://store.elsevier.com/

Biography

Xian Liu obtained a PhD in Sociology, with a specialization in Demography, from the Population Studies Center, Institute for Social Research, University of Michigan, in 1991. Currently, he is Professor of Research in the Department of Psychiatry and Senior Scientist at the Center for the Study of Traumatic Stress, F. Edward Hébert School of Medicine, Uniformed Services University of the Health Sciences. He also serves as Research Scientist/Senior Statistician in the Deployment Health Clinical Center, Defense Centers of Excellence, at Walter Reed National Military Medical Center. His areas of expertise include longitudinal analysis in health research, survival analysis, aging and health, and the development of advanced statistical models in behavioral and medical studies. His articles have appeared in many leading scientific journals in various fields and are widely cited both in the United States and internationally. Some of Dr. Liu's papers on longitudinal modeling and survival analysis have been used as teaching materials at prominent research institutions. His book, Survival Analysis: Models and Applications, was published jointly by Wiley and the Chinese Higher Education Press in 2012. An internationally recognized scientist, Dr. Liu has received several awards for his work and has
served as Principal Investigator on a number of externally funded research projects. He has given numerous presentations, seminars, and invited talks on these topics to experts and professionals in longitudinal data analysis and survival analysis.

Preface

In recent decades, longitudinal data analysis has become a topic of tremendous interest to statisticians, demographers, policymakers, and insurance companies. Scientifically, it spans a wide array of disciplines, including the biomedical, gerontological, behavioral, and social sciences. The aim of this book is to provide a comprehensive account of the most relevant methods and techniques used in longitudinal data analysis, without concentrating on any single field of application; the book is therefore expected to appeal to readers in a wide variety of disciplines. Reflecting my multidisciplinary background in training and research, the book covers models and methods applied in biomedicine, biostatistics, demography, psychology, sociology, and epidemiology, with practical examples drawn from each of those disciplines.

Linear mixed models have been applied widely in longitudinal data analysis, and accordingly five chapters are devoted to this statistical perspective. In light of the growing popularity of nonlinear models in empirical research, particularly in the behavioral and social sciences, I also describe a variety of nonlinear mixed models in considerable detail. Regression modeling, linear and nonlinear predictions, and computer programming associated with the different phases of longitudinal data analysis are covered extensively. A large number of statistical models and methods are presented, starting with the most basic specifications and ending with some of the most advanced techniques in longitudinal data analysis. With a considerable volume of empirical illustrations, I attempt to make the transition from the introductory to the advanced levels as coherent and smooth as
possible. For almost every major method or model, I include step-by-step instructions showing readers how to perform the technique on their own. Most methods are supplemented with empirical examples, computer programs and output, and detailed interpretations of the analytic results.

My hope is that scientists, professors, and other professionals in various disciplines can benefit from this book, whether as a reference, as a textbook for graduate courses, or in any other setting that requires and promotes appropriate longitudinal data modeling. This hope stems from my observation that many researchers use incorrect models and methods to perform longitudinal analysis, particularly when handling nonlinear longitudinal data. Given the book's focus on application and practice, its audience includes professionals, academics, and graduate students who have some experience in longitudinal data analysis. Numerous illustrations on various topics allow professionals to learn new models and methods or to sharpen their skills in longitudinal data analysis. Because it covers a wide range of methods and techniques for measuring change, from the introductory to the advanced, this book is useful as a reference for planners, researchers, and professors working in settings that involve change. Scientists interested in the pattern of change over time should find it a useful guide for choosing appropriate models and methods to analyze the longitudinal data in their projects.

Graduate students in various disciplines constitute another important part of the audience. Social science students can benefit from applying the concepts and methods of longitudinal data analysis to problems in sociology, population studies, economics, psychology, geography, and political science. This book offers a useful framework and practical examples of applied social science, especially
at a time when more and more questions about change are being raised. The public availability of many large-scale longitudinal datasets will make it easy for interested students to practice the models and methods learned from this book. Graduate students in biology, medicine, and public health who plan research careers can learn from this book a rich variety of techniques for performing mathematical simulations, clinical trials, and competing-risks analyses on health, health care, mortality, and disease. Longitudinal data analysis and related courses are recognized as essential components of graduate training in demography, psychology, epidemiology, and some biomedical departments. In medical schools, for example, this book should appeal to medical students who want to know how to analyze data from a randomized controlled clinical trial adequately in order to assess the effectiveness of a new medical treatment or a new medicine on disease.

A reader who wants to understand the entire body of methods and techniques covered in this book should have a background in calculus, matrix algebra, and generalized linear modeling. Readers less familiar with these subjects may prefer to skip the detailed mathematical and statistical descriptions and focus on the empirical illustrations and computer programming. With that focus, they can still acquire the skills to apply the various models and methods effectively, thereby adding new dimensions to their professional, research, or teaching activities. The book can therefore be read selectively by readers who are not deeply versed in advanced mathematics and statistics.

The reviewers for this book include Kenneth Land, Duke University; David Swanson, University of California, Riverside; and several anonymous reviewers. Additionally, a number of colleagues and friends have enriched,
supported, and refined the intellectual development of this book, including Bradley Belsher, James Edward McCarroll, Jichuan Wang, and Chu Zhang. Sincere thanks go to Jill Tao of the SAS Institute for her help with SAS programming. I owe special thanks to Charles C. Engel, whose consistent support and help made the completion of this book possible. The staff of the Deployment Health Clinical Center, Defense Centers of Excellence, Walter Reed National Military Medical Center, showed tremendous dedication, competence, and excellence throughout the preparation of this book. The assistance of Phoebe McCutchan in editing some of the graphs was vital. Finally, I would like to thank my wife, Ming Dong, for her support and encouragement throughout the entire period of preparing, writing, and editing this book.

CHAPTER 1 Introduction

CHAPTER OUTLINE
1.1 What is Longitudinal Data Analysis?
1.2 History of Longitudinal Analysis and its Progress
1.3 Longitudinal Data Structures
    1.3.1 Multivariate Data Structure
    1.3.2 Univariate Data Structure
    1.3.3 Balanced and Unbalanced Longitudinal Data
1.4 Missing Data Patterns and Mechanisms
1.5 Sources of Correlation in Longitudinal Processes
1.6 Time Scale and the Number of Time Points
1.7 Basic Expressions of Longitudinal Modeling
1.8 Organization of the Book and Data Used for Illustrations
    1.8.1 Randomized Controlled Clinical Trial on the Effectiveness of Acupuncture Treatment on PTSD
    1.8.2 Asset and Health Dynamics among the Oldest Old (AHEAD)

1.1 WHAT IS LONGITUDINAL DATA ANALYSIS?
Methods and Applications of Longitudinal Data Analysis. http://dx.doi.org/10.1016/B978-0-12-801342-7.00001-0

We live in a dynamic world full of change. A person grows, ages, and dies. Along the way, we may contract diseases, develop functional disabilities, and lose cognitive ability. Accompanying this biological life course, social change also occurs: we attend school, build a career, and retire. Meanwhile, many of us experience family disruption, become involved in social activities, cultivate personal habits and hobbies, and adjust our daily activities to our physical and mental condition. Indeed, change characterizes almost every aspect of social life, from the facets just mentioned to unemployment, drug-use recidivism, occupational careers, and other social events. In these biological and social processes, the gradual changes and developments over the life course reflect a pattern of change over time. More formally, such a pattern is referred to as an individual's trajectory. More broadly, trajectories also describe patterns of change in phenomena such as the declining quality of a commercial product over time or the collapse of a country's political system. In business management, change in consumer purchasing behavior is generally linked both to individual characteristics and to competing products. In population studies, demographers are concerned with such longitudinal processes as internal and international migration and the intervals between successive births. In these and other cases, the pattern of change over time can be influenced by various factors, such as genetic predisposition, illness, violence, environment, and medical and social advances. Therefore, each trajectory can differ significantly among
individuals and other observational units, and according to the variables that govern the timing and rate of change over a given period.

Data collected at a single point in time do not suffice to analyze change and its pattern over time. Cross-sectional data, traditionally popular and widely used across the applied sciences, provide only a snapshot of a process and thus cannot capture change, growth, or development. Aware of the limitations of cross-sectional studies, many researchers have advanced the analytic perspective by examining data with repeated measurements. By measuring the same variable of interest repeatedly over a number of occasions, researchers can display change, reveal its pattern over time, and draw substantive conclusions about the significance of that change. Data with repeated measurements are referred to as longitudinal data. In many longitudinal designs, subjects are assigned to the levels of a treatment or of other risk factors and observed at a number of time points separated by specified intervals.

Analyzing longitudinal data poses considerable challenges to statisticians and other quantitative methodologists because of several features inherent in such data. First, the most troublesome feature of longitudinal analysis is the presence of missing data in repeated measurements. In a longitudinal survey, observations on the variables of interest are frequently lost. For example, in a clinical trial on the effectiveness of a new medical treatment, patients may be lost to follow-up because of migration or health problems. In a longitudinal observational survey, some baseline respondents may lose interest in participating at subsequent waves. Those who drop out may have distinctive characteristics, so the data collected at later time points may bear little resemblance to the sample initially gathered. Second, repeated measurements for the same
observational unit are usually correlated, because average responses vary randomly between individuals or other observational units, with some units consistently high and others consistently low. Consequently, longitudinal data are clustered within observational units. In addition, an individual's repeated measurements may respond to a time-varying, systematic process, resulting in serial correlation. Third, longitudinal data are ordered by time, at either equally or unequally spaced intervals, and each scenario calls for a specific analytic approach. Even with an equally spaced design, some respondents may enter a follow-up investigation after the scheduled survey date, which in turn produces unequal intervals across individuals.

Over the years, scientists have developed a variety of statistical models and methods to analyze longitudinal data. Most of these advanced techniques were developed in biomedical and psychological settings and are therefore relatively unfamiliar to researchers in other disciplines. To date, many researchers still use incorrect statistical methods to analyze longitudinal data without paying sufficient attention to the unique features of such data. For these researchers, the models and methods developed specifically for longitudinal data analysis can be readily borrowed after careful verification, evaluation, and modification. In health and aging research, for example, the pattern of change in health status is generally the main focus. In analyzing such longitudinal courses, failure to use appropriate methods can result in substantial bias in parameter estimates and outcome predictions. In these areas, the application of advanced models and methods is essential.

1.2 HISTORY OF LONGITUDINAL ANALYSIS AND ITS PROGRESS

There were some vague, sporadic discussions about the theory of
random effects and growth as early as the nineteenth century (Gompertz, 1820; Ware and Liang, 1996). The year 1918 saw the advent of the earliest repeated measures analysis, when Fisher (1918) published his celebrated article on the analysis of variance (ANOVA). In this landmark paper, Fisher introduced variance-component models and the concept of "intraclass correlation." Later works extended Fisher's approach to mixed modeling, developing such concepts as the split-plot design and the multilevel ANOVA (Yates, 1935; Jackson, 1939). For a long time, these variance decomposition methods were the major statistical tool for analyzing repeated measurements. Though simplistic in many ways, these early works laid a solid foundation for modern mixed modeling techniques. Around the same period, there were also some early mathematical formulations of trajectories for analyzing the pattern of change over time in biological and social research (Baker, 1954; Rao, 1958; Wishart, 1938; see the summary in Bollen and Curran, 2006, Chapter 1). Until the early 1980s, however, longitudinal data analysis was largely restricted to the formulation of classical repeated measures analysis traditionally applied in biomedical settings. Given the substantial limitations of the traditional approaches to repeated measures analysis, many methodologists expressed grave concerns about how to measure and analyze the pattern of change over time correctly (see the summary in Singer and Willett, 2003, Chapter 1). Over the past 30 years, longitudinal data analysis has grown tremendously as a consequence of rapid developments in mixed-effects modeling, multilevel analysis, and individual growth perspectives. Accompanying these developments in statistical models and methods are equally important advances in computer science, particularly powerful statistical software packages.
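Fisher's variance-component idea can be made concrete with the classical one-way ANOVA estimator of the intraclass correlation. With n subjects each measured at the same k occasions, the between-subjects mean square (MSB) and the within-subjects mean square (MSW) give ICC = (MSB − MSW) / (MSB + (k − 1)·MSW), an estimate of the share of total variance attributable to stable differences between subjects. The Python sketch below assumes a balanced design; the function name anova_icc and the toy scores are hypothetical illustrations, not code or data from this book.

```python
from statistics import mean


def anova_icc(data: list[list[float]]) -> float:
    """One-way ANOVA (variance-component) estimate of the intraclass correlation.

    data[i][j] is the j-th repeated measurement of subject i. The design is
    assumed to be balanced: every subject has the same number k of measurements.
    """
    n = len(data)                      # number of subjects
    k = len(data[0]) if data else 0    # measurements per subject
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a balanced design with n >= 2 subjects and k >= 2 occasions")

    grand_mean = mean(x for row in data for x in row)
    subject_means = [mean(row) for row in data]

    # Between-subjects mean square: k * sum_i (ybar_i - ybar)^2 / (n - 1)
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subjects mean square: sum_ij (y_ij - ybar_i)^2 / (n * (k - 1))
    msw = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))

    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("all measurements are identical; the ICC is undefined")
    return (msb - msw) / denominator


if __name__ == "__main__":
    # Hypothetical scores: 4 subjects x 3 occasions (not data from the book).
    scores = [
        [60, 55, 52],
        [42, 40, 41],
        [35, 30, 33],
        [50, 47, 45],
    ]
    print(f"ANOVA estimate of the ICC: {anova_icc(scores):.3f}")
```

A value near 1 means most of the variation lies between subjects, which is the within-subject clustering that makes repeated measurements correlated. This moment estimator can turn negative in small samples, and unbalanced or irregularly spaced data call for likelihood-based mixed-effects models instead.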
The convenience of using software packages to build and fit complex statistical models has made it possible for many scientists to analyze longitudinal data with complex, efficient statistical methods once considered impossible to implement (Singer and Willett, 2003).

As applications of statistical techniques to longitudinal data have grown, methodological innovation has accelerated at an unprecedented pace over the past three decades. The advent of modern mixed-effects modeling and related approaches to longitudinal data triggered the development of a large number of statistical models and methods built on complex multivariate regression procedures. The major contribution of mixed-effects models, which contain both fixed and random effects, is a flexible statistical approach to modeling the autoregressive process involved in individual trajectories, capturing both the average change over time and the change for each observational unit. Within this framework, measured covariates and unobserved characteristics can be incorporated in the model simultaneously, yielding more reliable results for the description of a longitudinal process. Because they handle missing data well under fairly general conditions, mixed-effects models also have the added advantage of accommodating irregularly spaced measurements. More recently, a variety of Bayes-type approximation methods have been developed to estimate parameters in longitudinal models involving nonlinear functions, such as those for proportions and counts. Given their flexibility in modeling nonnormal outcome data, these approximation techniques have enabled researchers to estimate random effects with complex structures and to perform nonlinear predictions correctly. To date, the various approximation methods have been applied by
statisticians and other quantitative methodologists to develop more refined longitudinal models, thereby expanding the capacity of mixed-effects modeling in longitudinal data analysis. Along a different track, other methodologists have advanced growth curve modeling by introducing latent factors and/or latent classes within the framework of structural equation modeling (SEM).

1.3 LONGITUDINAL DATA STRUCTURES

Methodologically, longitudinal data can be regarded as a special case of the classical repeated measures data collected on individuals in experimental studies. Strictly speaking, however, the two data types, repeated measures data and longitudinal data, differ conceptually. Classical repeated measures data represent a broader concept, as they sometimes involve a large number of time points and permit changing experimental or observational conditions (West et al., 2007). Longitudinal data, in contrast, are more specific: they generally consist of observations on the same subject ordered by a limited number of time points, with equally or unequally spaced intervals. Longitudinal data can therefore be defined as data with repeated measurements at a limited number of time points, with a predetermined design for the time scale, time intervals, and other related conditions. In statistics and econometrics, longitudinal data are often referred to as panel data.

In this section, longitudinal data structures are delineated. I first review the multivariate data format traditionally used in classical repeated measures analysis and currently applied in latent growth modeling. Second, I introduce the univariate data structure, in which an individual or other observational unit (such as a commercial product) has multiple rows of data recording repeated measurements at a limited number of time points. Lastly, balanced and unbalanced longitudinal data are defined and
described.

1.3.1 MULTIVARIATE DATA STRUCTURE

Classical repeated measures data are predominantly analyzed with ANOVA in experimental studies. Traditionally, the data structure for repeated measures ANOVA follows a multivariate format, in which each subject has a single row of data and the repeated measurements are recorded horizontally; that is, each time point is assigned its own column in the data matrix. To illustrate the multivariate data structure, I use the repeated measures data from the Randomized Controlled Clinical Trial on the Effectiveness of Acupuncture Treatment on PTSD (posttraumatic stress disorder), which is described in detail in Section 1.8.1. The response variable, the PTSD Checklist (PCL) score, gauges the severity of PTSD symptoms; it is a 17-item summary scale measured at four time points, with values ranging from 17 to 85. In the multivariate data format, the repeated measurements for each subject are specified as four outcome variables lined up in the same row, with the time point indicated by a suffix attached to the variable name. Additionally, two covariates are included in the dataset: Age and Female (male = 0, female = 1). To identify subjects in further analysis, each individual's ID number is also included. Table 1.1 displays the data matrix for the first five subjects in the multivariate format.

In Table 1.1, each subject has one row of data containing the ID number, four outcome variables (PCL1–PCL4), and the two covariates, Age and Female. Among the five subjects, one is aged below 30 years, one above 50, and the remaining three between 38 and 44. There are four men and one woman. Because all observations of the outcome variable are lined up horizontally in the same row, the multivariate structure requires additional columns and is therefore also referred to as the wide table format. Clearly, the cross-sectional data
format is a special case of the multivariate structure in which the outcome variable is observed at only one time point. The most distinctive advantage of using the multivariate data structure

Table 1.1 Multivariate Data of Repeated Measurements

ID    PCL1   PCL2   PCL3   PCL4   Age   Female
…     66     31     58     39     27    0
…     48     56     43     43     44    …
…     37     50     53     47     38    …
…     41     23     21     21     53    …
…     51     57     39     46     44    …

References

Tierney, L., Kadane, J.B., 1986. Accurate approximations for posterior moments and marginal densities. J. Am. Stat. Assoc. 81, 82–86.
Tomasko, L., Helms, R.W., Snapinn, S.M., 1999. A discriminant analysis extension to mixed models. Stat. Med. 18, 1249–1260.
Vaupel, J.W., Manton, K.G., Stallard, E., 1979. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16, 439–454.
Verbeke, G., Lesaffre, E., 1996. A linear mixed-effects model with heterogeneity in the random-effects population. J. Am. Stat. Assoc. 91, 217–221.
Verbeke, G., Molenberghs, G., 2000. Linear Mixed Models for Longitudinal Data. Springer, New York, NY.
Verbeke, G., Molenberghs, G., 2003. The use of score tests for inference of variance components. Biometrics 59, 254–262.
Verbeke, G., Lesaffre, E., Brant, L.J., 1998. The detection of residual serial correlation in linear mixed models. Stat. Med. 17, 1391–1402.
Verbrugge, L.M., Liu, X., 2014. Midlife trends in activities and disability. J. Aging Health 26, 178–206.
Wang, J., Wang, X., 2012. Structural Equation Modeling: Applications Using Mplus. Wiley, West Sussex, UK.
Ware, J.H., 1985. Linear models for the analysis of longitudinal studies. Am. Stat. 39, 95–101.
Ware, J.H., Liang, K.-Y., 1996. The design and analysis of longitudinal studies: a historical perspective. In: Armitage, P., David, H.A. (Eds.), Advances in Biometry. Wiley, New York, NY, pp. 339–362.
Wedderburn, R.W.M., 1974. Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61, 439–447.
West, B.T., Welch, K.B., Gałecki, A.T. (with contributions from B.W. Gillespie), 2007. Linear Mixed Models: A Practical Guide Using
Statistical Software. Chapman & Hall/CRC, Boca Raton, FL.
Wilkinson, L., APA Task Force on Statistical Inference, 1999. Statistical methods in psychological journals: guidelines and explanations. Am. Psych. 54, 594–604.
Willekens, F., Rogers, A., 1978. Spatial Population Analysis: Methods and Computer Programs. International Institute for Applied Systems Analysis, Laxenburg, Austria.
Winship, C., Mare, R.D., 1992. Models for sample selection bias. Ann. Rev. Soc. 18, 327–350.
Wishart, J., 1938. Growth-rate determinations in nutrition studies with the bacon pig, and their analysis. Biometrika 30, 16–28.
Wolfinger, R., 1993. Laplace's approximation for nonlinear mixed models. Biometrika 80, 791–795.
Wolfinger, R., O'Connell, M., 1993. Generalized linear mixed models: a pseudo-likelihood approach. J. Stat. Comp. Sim. 48, 233–243.
Wolfinger, R., Tobias, R., Sall, J., 1994. Computing Gaussian likelihoods and their derivatives for general linear mixed models. SIAM J. Sci. Comp. 15, 1294–1310.
Wu, M.C., Carroll, R.J., 1988. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175–188.
Yao, Q., Wei, L.J., Hogan, J.W., 1998. Analysis of incomplete repeated measurements with dependent censoring times. Biometrika 85, 139–149.
Yates, F., 1935. Discussion of Neyman's 1935 paper. J. Roy. Stat. Soc. Supp. 2, 161–166.
Zeger, S.L., Karim, M.R., 1991. Generalized linear models with random effects: a Gibbs sampling approach. J. Am. Stat. Assoc. 86, 79–86.
Zeger, S.L., Liang, K., 1986. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42, 121–130.
Zeger, S.L., Liang, K., Albert, P.S., 1988. Models for longitudinal data: a generalized estimating equation approach. Biometrics 44, 1049–1060.
Zhang, P., Song, P.X.-K., Qu, A., Greene, T., 2008. Efficient estimation for patient-specific rates of disease progression using nonnormal linear mixed models. Biometrics 64, 29–38.
Zhao, L.P., Prentice, R.L., 1990. Correlated binary regression
using a quadratic exponential model Biometrika 77, 642–648 Zimmer, Z., Liu, X., Hermalin, A.I., Chuang, Y., 1998 Educational attainment and transitions in functional status among older Taiwanese Demography 35, 361–375 Zimmer, Z., Martin, L.G., Nagin, D.S., Jones, B.L., 2012 Modeling disability trajectories and mortality of the oldest-old in China Demography 49, 291–314 Subject Index A Activities of daily living (ADL) count, 89, 93, 166, 168, 198, 358, 425 from conventional, baseline, and survivor approaches, 240 fixed-effect estimates, 214 heterogeneous patterns, 237 intraindividual growth curves, 434, 435 log-transformed, 467, 469 longitudinal trajectory, 213 marginal mean, 239 model-based predictions, 237 pattern of change over time, 241 population-averaged, 434 predicted values for, 228 repeated measurements, 428 six patterns/six time points, 238 standard errors, 238 time plots of adjusted, 88 of prediction, 86, 437 two-step parametric/nonparametric mixed models, analytic results of, 468 Adaptive Gaussian quadratures, 261, 263, 358 applied to compute fixed and random effects for, 319 default integration method, 330 empirical Bayes estimate, 262 integral approximation, 263 integral of likelihood over random effects, 328 log-likelihood function, evaluation, 261 recommended approach, 263 ADL See Activities of daily living (ADL) count ADL trajectory curve, 470 Age-specific transition probabilities, 409 AHEAD baseline survey, 90 AHEAD cohort, 468 AHEAD data/dataset, 224, 235, 305 analyses of, 222 survivors, percent distribution of, 239 longitudinal data, 164, 222, 234, 327, 395, 463 output, 485 pattern-mixture modeling to analyze, 234 sample size of, 198 survey, 18, 89, 161, 213, 358, 374 TOEP covariance pattern model, 163 AIC See Akaike Information Criterion (AIC) AICC See Corrected version of the AIC (AICC) Akaike Information Criterion (AIC), 78, 145, 288 American Psychological Association (APA) Task Force on Statistical Inference, 27 Analysis of covariance 
(ANCOVA) models, 206, 207, 209, 213 Analysis of variance (ANOVA) method, 3, 19, 207 classical model, longitudinal data analysis, 44, 85 mean square statistics, 43 null hypothesis, 37 one-factor repeated measures, 41, 42 repeated-measures models, 15, 16 repeated measures, 37, 39 empirical illustration, PCL revisited, acupuncture treatment, 45 one-factor specifications, 37, 39 two-factor, 42 two-way repeated measures, 42, 43 ANCOVA See Analysis of covariance (ANCOVA) models ANOVA See Analysis of variance (ANOVA) method Approximation method, empirical application of, 394 Associated missing-data indicator random vector, 460 B Baseline mixture, 241 Bayes formulations, 96, 448 Bayesian inference, 96, 97, 219, 269, 316, 319, 352 Bayes’ rule in, 97 overview of, 96, 447 probability model, 98 REML estimator, 104 to specify marginal mean of response probability, 319 Bayesian information criterion (BIC), 78, 145 Bayesian methods, 99 Bayes minimization, 263 Bayes model, 78, 95–99, 220, 246, 252, 253, 255, 264 approximation methods, complex simulating procedures, 298 estimates, 257, 262, 352 of between-subjects random effects, 363 expression, 105 shrinkage estimates, 115 statistic, 423 theory, 447 499 500 Subject Index BDI-II score See Beck Depression Inventory-II (BDI-II) score Beck Depression Inventory-II (BDI-II) score, 53 Best linear unbiased predictor (BLUP), 111, 151, 152, 175 approximation empirical Bayes, 269 in linear mixed models, 267 covariance estimator, 270 estimation, 114 generalized least squares estimator, 112 least square means, 222, 227 linear random coefficient model, 115 model-based least squares, 217 nonlinear predictions, 116 reduced-form empirical approach, 331 retransformation method, 376 variance–covariance matrix, 112 Between-sample variations, 38 Between-subjects random error, 317 variability, 298 BIC See Bayesian information criterion (BIC) Binary data, pair-wise correlations, 293 Binary longitudinal data conditional effect, computation, 323 
conventional logistic, overview of, 310 empirical illustration analytic plan, 328 graphical results, 339 marital status on probability of disability among older Americans, 327 nonlinear predictions, 335 SAS programs, analytic steps, 328 three logit models, analytic results of, 333 mixed-effects logit model, inference of, 318 random intercept logistic regression model, specification of, 313 Block design, data structure for analysis, 86 randomized, 39, 41, 42 BLUP See Best linear unbiased predictor (BLUP) Bootstrapping, 393 Borrowing of strength approach, 79 C Case-by-case basis, 13 Categorical response data, longitudinal transition models, 379 Chi-square distributions, 23, 105, 145 Chi-square value, 430 Cholesky decomposition application, on variance–covariance matrix, 176 CI See Confidence intervals (CI) Classical ANOVA model, CLASS statement, 211 Clinical experimental studies, 26 Coding schemes, for classification factor, 147 Cohen's d, 24 Column vector, 13 Common variance-covariance structure, 237 Compound symmetry (CS), 135 residual covariance structure, 135 variance–covariance structure, 135 Computer programming, 426 Conditional effect, 324 computation, 323 conditional log OR on logit, 326 of marital status, 369, 374 statistical significance, 408 on probability scale, 325 covariates on, 355 Wald chi-square statistic, 337 Confidence intervals (CI), 35 Conventional linear mixed models, 205, 208, 220 Conventional logistic, overview of, 310 Conventional odds ratio (OR), 310, 323 for covariate, 323 in longitudinal data analysis, 317 Cook’s distance statistic, 181, 182, 193 Corrected version of the AIC (AICC), 78, 145 Correlation matrix, 287 Corresponding variance–covariance matrix, 425 Count data, 275 Covariance matrices, 417 mixed-effects models, 74, 417 of parameter estimates, 371 standard GEE procedure, 291 Covariance pattern model, 287 Covariates, 327 COVRATIO statistics, 184, 186, 193 COVTEST option, 119, 127 COVTRACE statistics, 184, 186 Cox model,
proportional hazard rate model, 277 Cramér–Rao inequality, 480 Cross-sectional data analysis, CS See Compound symmetry (CS) Cubic polynomial function, J-shaped time trend, 82 Cumulative standard normal distribution function, 312 D Data available, at single point of time, Degrees of freedom (df), 39 Delta method, 356, 477–478 Department of Defense (DoD), 17 Deployment Health Clinical Center (DHCC), 17 DESCENDING option, 465 df See Degrees of freedom (df) DFFITS score, 181 DFFITS statistics, 184, 185 DHCC acupuncture treatment study, 29, 36, 37, 85, 117, 153, 192, 454, 455 effectiveness of acupuncture treatment on PTSD, 85 longitudinal data of, 451 Dichotomous variables, 146 Disability severity score, 457, 459 conditional density function on, 457 defined as, 89, 456, 463 model-based prediction of, 462 terms of joint distribution, 461 truncated linear mixed model, 459 Dispersion/scale parameter, 245 Distinctive mixture groups, 231 DoD See Department of Defense (DoD) E Economic capability, 216 Effect size, 23 EM algorithms See Expectation-Maximization (EM) algorithms Empirical-Bayes methods development of the retransformation method, 272 estimation, 266 within-subject random errors in nonlinear predictions, 270, 271 Empirical BLUP, 362 Erroneous model-based predictions, 429 Error distributional function, 319, 462 ESTIMATE statement, 163, 223 Estimator of variances, 247 Euclidean distance, 140 Exogenous latent variables, 413 Expectation-maximization (EM) algorithms, 16, 96, 106, 110 longitudinal data analysis, 108 random effect, 108 F Factor-scoring coefficients, 417, 420, 431 Family history of diseases, 216 First order, autoregressive processes, 138 Fisher information matrix, 188, 248, 285, 311, 346, 352, 392, 419 Fisher scoring algorithm, 248, 292 Fixed-effect estimates, 237 Fixed-effects multinomial logit model, 367 Flexible mixed-effects multinomial logit models, 364 Follow-up time point, F-test, 28, 41 G Gamma distribution, 225 Gauss–Hermite
abscissas, 263 Gauss–Hermite quadrature, 262 Gaussian distributions methods, 221 quadrature methods, 219, 254, 261, 262, 263, 268, 319, 328 See also Adaptive Gaussian quadratures rules, 255, 352 random effects, 222 Gauss–Newton method, 286 GEEs See Generalized estimating equations (GEEs) models Generalized estimating equations (GEEs) models, 248, 281, 282 approaches advantages of, 285 comparison, 295 basic specifications, 282, 284 working correlation matrix, 287 empirical illustration marital status on disability severity in older Americans, 299 inferences, 282 logit link, 299 marginal regression models, 295 naïve model with independence hypothesis, 282 odds ratios (ORs), 289, 292 Prentice’s approach, 289, 291 procedure, 291 quasi-likelihood information criteria, 288 random-effects models, 295 Zhao method, 291 Generalized least squares (GLS) equation, 111 Generalized linear mixed models (GLMMs), 244 analytic convenience of, 250 between-subjects random effects, 251 BLUP procedure, 267 complexity of data structures, 253 deriving parameter estimates, 261 empirical Bayes BLUP, 268 estimating procedures, 254 GLMs, probability distribution of, 251 hypothesis testing on fixed effects, 255
likelihood functions, 251 linearization, best linear unbiased prediction, 267 linear predictor of, 250 link function, 267 log-likelihood function, 252, 253 marginal quasi-likelihood method, 258 methods of estimating parameters, 255 adaptive Gaussian quadrature methods, 261 Gaussian quadrature, 261 Laplace method, 259 marginal quasi-likelihood method, 258 Markov chain Monte Carlo methods, 264 penalized quasi-likelihood (PQL) method, 256 mixed-effects logistic regression model, 273 mixed-effects multinomial logit regression models, 275 mixed-effects ordered logistic model, 274 mixed-effects Poisson regression model, 275 mixture of distributions, 255 nonlinear distribution of, 272 nonlinear predictions, 266 nonlinear response, 268 nonnormal distributions, 272 overview of, 244 pseudo-error term, 257 pseudo-response variable, 257 quasi-likelihood functions, 258 random components, retransformation, 266 random effects, 251 parameters, 254 variance-covariance components of, 255 vector, 249 retransformation method, 269 statistical inferences, 248 basic specifications, 249 hypothesis testing procedures on fixed effects, 253 on variance components, 255 maximization procedures on fixed effects, 253 survival models, 276 variance-covariance matrix, 243, 251, 268 variance function, 249 within-subject random errors in nonlinear predictions, 271 within-subject variance, 258 Generalized linear models (GLMs), 243 approach, 284 coding scheme, 149 estimates, 291 independence hypothesis, 248 log-likelihood function, 247 maximum likelihood estimator (MLE), 247, 346 probability distribution, 245 regression coefficient of covariate, 246 statistical expression, 246 statistical inferences, 245 systematic component, 246 user-friendly procedures, 312 variance–covariance matrix for estimates, 248 Genetic predisposition, 216 Gibbs sampler, 265, 266 Glass’s effect, 24 GLMMs See Generalized linear mixed models (GLMMs) GLMs See Generalized linear models (GLMs) GLS equation See Generalized least 
squares (GLS) equation Group-based model, 422 conditional independence hypothesis, 423 development, 422 individual-level likelihood function, 424 LGMM, statistical perspective, 422, 439 longitudinal sequence of response measurement, 422 multinomial function as, 423 GROUP BY TIME option, 223 H Handling missing data, 443 Hat-value, 183 Health and Retirement Study (HRS), 18 Heckman’s classical two-step estimator, 472 Hedges’s d, 25 Hermite polynomials, 262 Hessian matrix, 260, 262 Hessian of log-likelihood, 247 Heterogeneous linear mixed model, 218 Heterogeneous mixture patterns, classification of, 230 Heterogeneous transition pattern, 386 Hotelling’s trace test statistic, 50, 52 HRS See Health and Retirement Study (HRS) Hybrid variance–covariance structure, 142 Hypothesis testing, on nonnegative variance, 145 I ICC See Intraclass correlation (ICC) IDENTITY option, 55 ID number, 5, IIA hypothesis See Independence from irrelevant alternatives (IIA) hypothesis IIC See Intraindividual correlation (IIC) Immortal cohort, 234 Independence from irrelevant alternatives (IIA) hypothesis, 346 Individual-level likelihood function, 424 Influence diagnostics, 181 Cook’s distance statistic, 181 COVRATIO statistics, 184, 186 COVTRACE statistics, 184, 186 DFFITS statistics, 184, 185 empirical illustrations, 190 linear mixed model concerning marital status/disability severity among older Americans, 198 PCL score, acupuncture treatment linear mixed model, checks, 190 leverage statistic, 183 likelihood displacement statistic approximation, 187 linear mixed model, 173 LMAX statistic, for influential observations identification, 189 MDFFITS statistics, 184, 185 INFLUENCE option, 195 Institute for Social Research (ISR), 18 Integral approximation methods, 255 Intraclass correlation (ICC), 3, 70 Intraindividual correlation (IIC), 10, 67, 70, 316 Intraindividual growth patterns, 20 Irregular time trends, 85 ISR See Institute for Social Research (ISR) L Laplace
approximation, 255 Laplace method, 259, 260 parameters, 260 specifications, 260 Last observation carried forward (LOCF) approach, 444 advantage of, 447 classical method handling missing data, 446 Latent endogenous random variables, 413 Latent growth curve model (LGCM), 230, 411 Latent growth mixture model (LGMM), 230, 411, 419 group-based model, 422 latent growth modeling, 411 maximum likelihood approach, 420 model covariates, 421 Latent growth model (LGM), 411, 416 application, 436 assumption of multivariate normality, 418 empirical illustration, marital status effect on ADL count, 425 factor-scoring coefficients, 417 group-based model, 422 intraindividual growth curves, 434, 435 linear slope component, 417 model, 419 structural equation modeling, overview of, 412 Latent variable model, 413 covariance matrices, 413 measurement, 413 structural equation model construction, 413 LDA See Linear discriminant analysis (LDA) LD statistic See Likelihood displacement (LD) statistic Least squares means, 222 Leverage measurement, 183 LGCM See Latent growth curve model (LGCM) LGM See Latent growth model (LGM) LGMM See Latent growth mixture model (LGMM) Likelihood-based method, 79 canonical parameters, 283 for checking the polynomial form of time, 79 Likelihood displacement (LD) statistic, 181, 187, 188, 200 ADL count, for three linear mixed models, 201 linear mixed models, 194 Likelihood distance, 192 Likelihood function, 219 Likelihood ratio statistic, 116 asymptotic null distribution of, 116 Gaussian quadrature use of, 263 goodness-of-fit information, 333 log-likelihood function, 77 p-value of, 116 test statistic, 220 Linear discriminant analysis (LDA) classical approach, 230 random-effects regression models, 230 use of, 230 Linearization-based approaches, 278 Linearization methods, 259 Linear mixed-effects models, 14, 61, 73, 111, 181, 216 cases, 62, 64, 65, 66 empirical illustrations applications, 85 baseline score, adjustment, 210 BLUPs vs least squares means, 222
marital status and disability severity in older Americans, 89 PCL score, acupuncture treatment, 85 fixed effects, inference/estimation of, 73 maximum likelihood methods, 73 missing data, 78 statistical/hypothesis testing, 75 formalization of, 66 variance–covariance components, 71 general specification of, 67 intraindividual correlation, 69 longitudinal data analysis baseline response, adjustment, 206 baseline score, adjustment, 206, 208 Lord’s paradox, 206 one-factor with random intercept, 62 pattern-mixture modeling, 229, 232 basic theory, 231 empirical illustration of, 234 heterogeneous groups, classification, 229 random effects assumed distribution, misspecification, 216 in different distributions, 220 heterogeneity linear mixed model, 217 nonnormal random effect distribution, 218 and three covariates, 65 random intercept and random slope, 64 trend analysis, 79 polynomial time functions, 80 numeric checks, 84 reduce collinearity methods, 82 variance–covariance matrix, 69 Linear predictor, 245 Linear random coefficient model, 218 Linear random intercept model, 220 Linear regression estimator, 461 Linear regression model, 14 LMAX approximation, 190 LMAX score, 189 LMAX statistic, 189 LOCF approach See Last observation carried forward (LOCF) approach Log-gamma distributed random coefficient model, 220 Log-gamma distributed slopes, 219 Log-gamma linear mixed model, 219 Logistic cumulative distribution function, 274 Logistic regression model, 380, 445 Logit models, 310 See also Multinomial logit models analytic results, 334 fixed-effects, 339 Logit regression, 328 Log-likelihood functions, 73, 74, 107, 187, 219, 247, 311, 317, 319, 346, 351, 418 expression, complete-data, 421 maximization, 103 ratio statistic, 84 Log-linear probability distribution, 300 Log OR GEE model, 306 Log transformation, 461 Longitudinal clinical controlled trial, 442 Longitudinal courses, Longitudinal data analysis, 20, 111, 138, 206,
230, 286, 288, 445 ANOVA, repeated measures of, 37 balanced/unbalanced, 7, book/data organization, for illustrations, 16 asset and health dynamics among the oldest old (AHEAD), 18 confidence interval, effect size estimators, 23, 24 computation of, 27 meta-analysis, 26 defined, empirical illustration, 29 history of, intraindividual correlation, 14 MANOVA, repeated measures of, 47 missing data, patterns and mechanisms, monotone missing pattern, nonmonotone missing data, paired t-test, 21 PTSD symptom, acupuncture treatment effectiveness, 29 randomized controlled clinical trial, 17 structures See Longitudinal data structures time plots, of trends, 20 time scale/number of time points, 12 traditional methods, 19 Longitudinal data designs, applications, statistical techniques, risk factors over time points, unbalanced, Longitudinal data structures, balanced/unbalanced, multivariate data, univariate data, Longitudinal modeling, basic expressions of, 13 Longitudinal processes, sources of correlation, 10 Longitudinal regression models, on nonignorable missing data, 456 Longitudinal trajectories of mortality derived, 377 for probability of disability prediction, 339 Longitudinal transition models for categorical response data, 379 empirical illustration measures/models/SAS programs, 395 predicted transition probabilities in functional status/marital status, 395 transition probabilities effects of marital status, 407 prediction of, 399 mixed-effects multinomial logit transition model, 386 random coefficient, 389 random intercept, 386 separate creation, 394 statistical inference, 390 variance–covariance matrix approximation for transition probabilities, 392 with only fixed effects, 384 two-time multinomial transition modeling, overview of, 380 Lord’s paradox, 206, 207 LSMEANS statements, 156, 169 M MANOVA See Multivariate analysis of variance (MANOVA) MAR See Missing at random (MAR) Marginal effect, 356 discrete probability change, 325, 356 variance-covariance
matrix for, 241 Marginal predictions, 240 Marginal quasi-likelihood (MQL) technique, 255, 263, 282, 352 Marginal regression model, 286 Marital status conditional effects of, 369, 408 on probability of disability among older Americans, 327 on transition probabilities, 408 Markov Chain Monte Carlo (MCMC) method, 255, 264, 352, 449–451 approximation method, 392 imputation on variable PCL_SUM, 452 Markov chain process, 264 fixed-effects techniques, 379 hypothesis, 384 response at time point, 389 Markov random variable, 384 Maximized log-likelihoods, 77 Maximizing equation, 345 Maximum likelihood (ML), 179, 246 approach, 102, 311, 418 equations, 254 estimates factor-scoring coefficients, 430 log-likelihood functions, 107 Maximum likelihood estimate (MLE), 74, 247, 260, 284 Fisher information matrix, 248 linear mixed models, 194 for three linear mixed models on ADL count, 201 log-likelihood function, 260 parameter, 247 solution for regression coefficients, 306 MCAR See Missing completely at random (MCAR) MCMC See Markov Chain Monte Carlo (MCMC) method MDFFITS statistic, 181, 184, 185, 191 Mean PCL scores, time plot of, 33 Mean square (MS) error, 38 factor, 38 Meta-analysis, 26 estimated effect size, 26–27 medical treatment on PTSD, 26 Methods handling missing data mixed-effects regression models, 441 not at random, 454 empirical illustration analytic results/ADL count predictions, 468 pattern of change over time in ADL count, 470 measures/models/SAS programming, 463 nonignorable missing data, 463 impact of nonignorable missing data, 456 nonparametric regression model, on nonignorable missing data, 461 pattern mixture model on MNAR, 460 selection model on MNAR, 458 at random, 444 empirical illustration, analytic results with and without multiple imputations, 451 last observation carried forward (LOCF), 446 multiple imputations (MI), 447 and shrinkage, comparison, 450 simple approaches, 444 Metropolis–Hastings algorithm, 265 Metropolis sampling, 264 MI See Multiple 
imputations (MI) Mills ratio, 459, 464, 465 Minimum mean square error, of prediction, 220 Missing at random (MAR), 10, 73, 229, 442 assumption, 78, 443 defined, 443 hypothesis, 10, 451 mathematical definitions of, 441 missing-data mechanism, 444 Missing completely at random (MCAR), 10, 442 assumption, 444 defined, 442 hypothesis, 429 longitudinal data analysis, 443 mathematical definitions of, 441 Missing data See also Methods handling missing data analysis, 229, 460 classification, natural extension of, 230 nonignorable, 455, 458 patterns, classification standards for, 229 monotone missing pattern, nonmonotone missing data, patterns and mechanisms, well-assumed prior distribution of, 448 Missing not at random (MNAR), 10, 442 hypothesis, 115 mathematical definitions of, 441 mechanisms, 229 missing-data mechanism, 442, 443 nonignorable missing data, 444 statistical models, 73 Mixed-effects models, 61 logit model, 316, 318, 321, 323 analytic results/nonlinear predictions, 364 analytic steps with SAS programs, 359 binary logit model, 355, 357, 376 data, measures, and models, 358 graphical analysis, on nonlinear predictions, 374 inference of, 318 marital status conditional effects of, 369 longitudinal trajectories of disability/mortality, 357 multinomial, 350, 351, 354, 376, 387 approximation method, empirical application of, 394 covariates’ conditional effects on probability scale, 355 fixed/random effects, estimation of, 351 longitudinal data analysis, 343 and nonlinear predictions, 347 random components, 343 regression model, 275 transition model, 386, 389, 390, 394, 396 variance–covariance matrix approximation on probabilities, 353 regression model, 273 ordered logistic model, 274 Poisson regression model, 275 probit model, 273 regression models, 78, 441, 455 ML See Maximum likelihood (ML) MLE See Maximum likelihood estimate (MLE) MNAR See Missing not at random (MNAR) Model-based longitudinal trajectory, 472 Modeling nonignorable missing
data, 471 Modeling normal longitudinal data, 150 MODEL statement, 195, 210, 465 MQL estimates, 259 MQL technique See Marginal quasi-likelihood (MQL) technique Multinomial logit models, 355, 357, 424 on health states, 365 inverse of, 345 marginal means of approximate variance–covariance matrix, 401 mixed-effects models, 350, 351, 354, 376, 387 covariates’ conditional effects on probability scale, 355 empirical illustration analytic results/nonlinear predictions, 364 analytic steps with SAS programs, 359 data, measures, and models, 358 graphical analysis, on nonlinear predictions, 374 marital status, conditional effects of, 369 marital status/longitudinal trajectories of disability/mortality, 357 fixed/random effects, estimation of, 351 longitudinal data analysis, 343 and nonlinear predictions, 347 random components, 343 variance–covariance matrix approximation on probabilities, 353 regression model likelihood function for, 345 overview of, 344 transition model, 382, 385, 406 parameters, 382 Multiple imputations (MI), 444, 447 approach, 449, 450 handling missing data, 450 and shrinkage, comparison, 450 Multivariate analysis of variance (MANOVA), 19, 47 application of, 49 constant error variance–covariance matrix, 51 distinctive disadvantages, 49 empirical illustration psychiatric disorders, acupuncture treatment, 53 general uses, 47 hypothesis testing, 49 repeated measures, 47, 51 Response*Time Effect, 56 total sums of squares, 47 Wilks’ lambda distribution, 49 within-group matrix, 48 Multivariate data PCL_SUM imputed on, 451 of repeated measurements, structure, distinctive disadvantages, vs univariate longitudinal data matrix, N Naïve model, 286 Naïve variance estimator, in longitudinal data analysis, 284 National Death Index (NDI), 18 NDI See National Death Index (NDI) Newton-Raphson (NR) algorithms, 96, 107, 108 scoring method, 248 NLMIXED procedure, 370 Nonignorable missing data, 472 clinical experimental studies, 455 in longitudinal data analysis,
456, 471 regression models, 456 multivariate regression models, 472 nonparametric regression model, 461 selection model to handle, 472 statistical methods for handling, 115, 444 Nonlinear longitudinal data, generalized linear models, 243 Nonlinear predictions, 347 graphical analysis, 374 mixed-effects multinomial logit model, 347 model-based and empirical BLUPs, 229, 267 probability of disability, 335 of random components, 266 retransformation method, 339 transition probabilities, 391 within-subject random errors, 270 Nonmonotone missing-data pattern, 450 Nonparametric mixed-effects model, 462 Nonrandom factor-scoring coefficients, 431 Null hypothesis (Type-I error), 37 O Odds ratios (OR) interpretability of, 322 mixed-effects logit model, 312, 337 parameterization, 292 quadratic formula, 293 probability of disability and conditional, 338 standard error, 326 ODS Graphics, 191 ODS OUTPUT statement, 157 Offset, 276 OLS-type residuals, 178 ONLY suboption, 195 OR See Odds ratios (OR) Orthogonal polynomials, 475–476 convenient curvilinear expression, 475 least-square estimators, 476 regression law, 476 P Paired t-test, 21 Panel data, Parameter estimates, asymptotic standard errors of, 419 Parametric hazard regression model, 277 Pattern mixture model, 229, 231, 234, 239, 240, 460 conditional distribution, 232 individual pattern indicator, 233 on missing data, 460 pattern-specific parameter estimates, 233 PATTERN, variable, 235 previous model, 233 Patterson’s expression, 105 Pearson-type residuals, 175 Penalized quasi-likelihood (PQL) method, 255, 257, 263, 352 approximation techniques, 255 Gaussian quadrature, 352 pseudo-likelihood estimates of model parameters, 256 Physiological senescence parameters, 216 Pillai’s trace statistic, 50, 52 Poisson distribution, 276, 423 Poisson process, 276 Polynomial time function, longitudinal data analysis, 84 Posttraumatic stress disorder (PTSD) symptom, 29 acupuncture treatment, 153 randomized controlled
clinical trial, 17 longitudinal study, 447 severity, 160 PQL method See Penalized quasi-likelihood (PQL) method Practical significance, 24 Predicted probabilities, of disability/death, 368 Predicted response probability, variance approximation, 320 Predicted time trend, 81 Predicted transition probabilities, 405, 407 in functional status and marital status, 395 variance approximates, 404 variance–covariance matrix for, 380, 382, 386 PREDICT statements, 399 Prentice’s approach, 291 Prentice’s expansion, 290 Pre–post effect size, 28 Pre–post paired t-test, 21 PRESS statistic, defined, 183 Probability density function, 261 Probability of correct model, 424 PROBIT, 465 Probit regression models, overview of, 310 Probit survival model, 458 PROC CALIS procedure, 428, 430 PROC GENMOD procedure models, 301 PROC GLIMMIX procedure, 257, 259, 329, 330, 398 logit components, 361 multinomial distribution, 360 parameter estimates, 398 PROC NLMIXED procedure in SAS system, 359 random effect parameters, 263 PROC LOGISTIC statement, 306, 465 PROC MEANS procedure, 332, 399, 402 PROC MI procedure, 453 PROC MIXED procedures, 87, 115, 118, 119, 125, 140, 156, 163, 198, 225, 235, 453, 467, 468 PROC NLMIXED procedure, 226, 332, 359, 363, 369, 397 estimates regression coefficients, 361 RANDOM statement, 333 in SAS is applied to yield parameter estimation, 226 PROC SGPLOT procedure, 122, 434, 436 PROC SQL procedure, 128 PROC TTEST procedure, 34 Proportional odds assumption, 274 Pseudo-likelihood estimates, 260 PTSD See Posttraumatic stress disorder (PTSD) symptom PTSD Checklist (PCL) score, 5, 29, 36, 209, 210 acupuncture treatment, 155, 212 fixed effects of three linear mixed models, 212 four time points, 149 pattern of change over time, 159 prediction time plots of, 124 time trends of, 161 subject-specific time plot, 31, 123 Q Q-point Gaussian quadrature rule, 263 Quadratic polynomial function, 80, 82 high-order polynomial functions, 82 time function, 80 Quasi-likelihood function, 259,
288, 479–480 R Random coefficient model, 179, 180 linear model, 235 logistic regression model, specification of, 316 model specification/SAS program, 483–485 AHEAD longitudinal data, 485 multinomial logit model on health, 483 multivariate model, 179 Random-effects multinomial logit transition model, maximum likelihood estimates, 392 Random errors, 13, 174 covariance structure for within-subject, 133 subject-specific, 134 variance–covariance matrix, 133 Random intercept linear model, 176 Random intercept logit model, 313, 316 empirical BLUP approach and retransformation method based on, 328 reduced-form, 328 regression model, specification of, 313 Wald statistics, 337 within-subject error, 315 random, 335 within-subject variability, 314 Random intercept model, 314 Random intercept regression model, 350 multinomial logit models, 366, 396 transition model, 405 Randomized block design, 39 Randomized controlled clinical trials, 208, 446 RANDOM statement, 121, 329 Raw residuals, 175 Reduced-form fixed-effects multinomial logit model, 364 Regression coefficients, 355, 454 estimate, 181 interpretability of, 322 Regression diagnostics, 173, 174, 181 Regression modeling, 13, 174, 209 asymptotic process, 74 log-likelihood function, 247 on longitudinal data, 281 multinomial logit regression modeling, 391 multivariate, 10, 113, 242, 405 nonlinear, 330 standardized diagnostic method, 189 REML approach See Restricted maximum likelihood (REML) approach Repeated measures ANOVA, 39 REPEATED statement, 46, 55, 141, 154 Residual covariance structure, patterns, 133 between-subjects variance component, 133 classification factor coding schemes of time, 149 GLM coding, 149, 150 scaling approaches, 146 scaling of time, 146 comparison of, 143 empirical illustrations linear regression model, 153 marital status/disability severity among older Americans, 161 PCL score, acupuncture treatment, 161 two linear regression models, estimation, 153 with equal spacing, 135 autoregressive
structures (AR), 137 compound symmetry (CS), 135 Toeplitz structures (TOEP), 138 unstructured pattern (UN), 136 least squares means, 150, 151 local contrasts, 150, 151 local tests, 150, 151 nonzero off-diagonal elements, 134 with unequal time intervals, 139 hybrid residual covariance model, 142 spatial exponential model, 141 spatial Gaussian pattern model SP(GAU), 141 spatial power model, 140 variance–covariance pattern models, 134 Residual diagnostics, 174 linear mixed model, 173 types of, 174 semivariogram in linear random coefficient model, 178 in random intercept linear models, 176 Residual log-likelihood function, maximization, 103 Residual variance–covariance matrix, 70, 232 pattern models, 139 use of, 161 Restricted maximum likelihood (REML) approach, 75, 95 Bayesian inference, overview of, 96 computational procedures, 106 estimators, 96, 99, 101, 102, 105, 106, 117, 120, 126, 151, 175, 256 AHEAD survey, 124 Expectation–Maximization (EM) algorithm, 108 hypothesis testing, on variance component G, 116 justification of, 104 linear mixed models, estimator, 102 approximation of random effects, 111 best linear unbiased prediction (BLUP), 111 empirical illustrations, 117 marital status and disability among older Americans, 124 PCL score, acupuncture treatment on, 117 shrinkage/reliability, 113 log-likelihood functions, 103, 106, 107 MLE bias, in variance estimate, 99 ML estimators, comparison, 105 Newton–Raphson (NR) algorithm, 107 REML, in general linear models, 101 Retransformation method, 357 based on random intercept multinomial logit model, 362 Bayesian inference, development, 272 longitudinal trajectories of health probabilities, 359 mean multinomial logit functions, 396 mixed-effects multinomial logit model, 344 nonlinear response, 269 transition probabilities, 405 Retransformation, mixed-effects logistic regression model, 273 RLD scores, 188 RMSE See Root of mean square error (RMSE) Root of mean square error (RMSE), 191 Roy’s greatest root criterion, 50,
52 S Sandwich estimator, 367 SAS PROC GLIMMIX procedure, 328 SAS PROC MIXED procedure, 141, 145, 464 SAS PROC NLMIXED procedure, 328, 330, 358 SAS PROC SGPLOT steps, 30 SAS PROC TRAJ algorithm, 425 SAS programming, 32, 55, 56, 58, 86, 87, 91, 118, 120, 122, 153, 154, 158, 162, 168, 191, 210, 211, 223–226, 235, 236, 300, 302–304, 306, 307, 330, 360–362, 370, 397, 399, 400, 432, 483 covariance parameter, 202 COVB option, 360 reduced-form random intercept multinomial logit model, 363 SAS/STAT software, 425 SAS system, PROC NLMIXED procedure, 359 Satterthwaite approximation, 77, 449 Scaling techniques, 134 SD See Standard deviation (SD) SE See Standard error (SE) Selection model, 458 SEM See Structural equation modeling (SEM) Semi-Markov transition process, 389 Semivariogram, 180 in linear random coefficient model, 178 in random intercept linear models, 176, 178 Serial correlation, 11 SGPLOT procedure, 122, 128 Shrinkage technique, 114 Software packages, 3, 222 SOLUTION option, 121 Spatial Gaussian covariance pattern model, 141 Spatial Gaussian pattern model (SPGAU), 141 SPGAU See Spatial Gaussian pattern model (SPGAU) SQL procedure, 128 Square root of approximated variance, 322 SS See Sums of squares (SS) Standard deviation (SD), 35 Standard error (SE), 35, 36, 238 approximates, 336, 369 estimates, 366, 445 Statistical models handling missing data, 442 Stochastic missing-data indicator matrix, 231 Stochastic variations, decomposition of, 177 Structural equation modeling (SEM), 411, 430 basic null hypothesis, 415 construction of, 413 estimating procedure, 414 maximum likelihood function, 415 types of random variables, 412 Structural equation modeling, overview of, 412, 426 Subject-specific random effects, 12 Subject-specific regression coefficients, 314 Sums of squares (SS), 38 T Taylor series approximation, 321, 354 Taylor series expansion, 298, 353 first-order, 382, 477 higher order terms, 477 Taylor’s theorem, 260 t distribution, 21 Thompson’s
expression, 105 Time-independent random parameter, 314 Time trends ADL score, 163, 170, 240 irregular, 85 J-shaped, 79 PCL score, 87, 124, 213 in predicted probability of disability, 375 U-shaped, 80 Time-varying covariates, TOEP covariance structure, 144 TOEP pattern model, 138, 139 TOEP residual variance–covariance pattern model, 164 TOEP variance–covariance structure, 164 Transient states, 380 Transition models, two-time transition models, 380 Transition probabilities, 406 TREAT-by-TIME interaction, 57 T-scale, 83 t-tests, 20, 37 Type-I error, 37 U Univariate longitudinal data, Unknown distribution, 462 UN pattern model, 137 US Bureau of Census, 446 V Variance-component models, Variance-covariance matrix, 72, 477 approximation, 394 defined, 284 generalized linear mixed models (GLMMs), 243 marginal effect, 241 within-subjects random errors, 387 VARIANCE statement, 428 VAR statement, 453 W Wald chi-square statistics, 76, 325, 337, 357, 373, 408 conditional effects, 325 delta method, 325 Wald test, 84 Walter Reed National Military Medical Center (WRNMMC), 17 Wedderburn’s theory, 284 Weibull distributional function, 277 Wide table format, Wilcoxon rank test, 23 Wilks’ lambda distribution, 49, 50, 52 Within-sample variations, 38 component, random errors, 38 variations of sample data, 38 Within-study effect sizes, 28 Within-subjects random errors, 317, 347, 349, 358, 387 local approximations for, 390 multinomial logit regression modeling, 391 probabilities at series of time points, 349 variance-covariance matrix of, 387 Within-subject variability, 314, 348 Working correlation, 308 WRNMMC See Walter Reed National Military Medical Center (WRNMMC) Z Zero-inflated Poisson (ZIP) distribution, 423 ZIP distribution See Zero-inflated Poisson (ZIP) distribution Z-test, 75 ...
issues to experts and professionals specialized in longitudinal data analysis and survival analysis
Preface
In recent decades, longitudinal data analysis has become a topic of tremendous interest...
Traditional methods of longitudinal data analysis
In SAS Program 2.3, two temporary datasets from the complete longitudinal data (a long table) are created, TP21 and TP22, containing data at baseline and...
The advent of the modern mixed-effects modeling and the various approaches for the analysis of longitudinal data triggered the advancement of a large number of statistical models and methods, characterized

Title	Methods and Applications of Longitudinal Data Analysis
Author	Xian Liu
Institution	Uniformed Services University of the Health Sciences
Field	Sociology
Type	book
Year of publication	2016
City	Amsterdam

Format
Number of pages	507
File size	32.05 MB
Attachment	159. Methods and Applicatio.rar (24 MB)