Advanced Medical Statistics

EDITORS
YING LU, University of California, San Francisco, USA
JI-QIAN FANG, Sun Yat-Sen University, Guangzhou, China

World Scientific
New Jersey * London * Singapore * Hong Kong

Published by World Scientific Publishing Co. Pte. Ltd.
Toh Tuck Link, Singapore 596224
USA office: Suite 202, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ADVANCED MEDICAL STATISTICS
Copyright © 2003 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-02-4799-0
ISBN 981-02-4800-8 (pbk)

Printed in Singapore

May 30, 2003 16:8 WSPC/Advanced Medical Statistics

PREFACE

Since early in the last century, many scholars from China have studied statistics in Western countries. Some of the early pioneers, including P. L. Hsu, C. L. Chiang, C. C. Lee, K. L. Chung, and G. Tiao, achieved international recognition for their significant contributions to advanced statistics. Since the 1960s, many students from Taiwan, Hong Kong, and Mainland China have received their advanced degrees from universities in North America and Europe. Some have remained, becoming professors in academia or scientists in government or industry and making significant contributions to the fields of statistics and biostatistics. Many have been elected as fellows of the American
Statistical Association and/or senior members of the International Biometric Society. Others have become editors or associate editors of important journals, including the Annals of Statistics, the Annals of Probability, the Journals of the Royal Statistical Society, the Journal of the American Statistical Association, Biometrika, Biometrics, and Statistica Sinica. Several Chinese statisticians have been honored with the COPSS award, among whom Professors T. L. Lai and J. Fan have participated in the creation of this book.

Meanwhile, many young statisticians have trained in Mainland China. They have accumulated a rich store of experience in teaching biostatistics and applying its theory and methods to medical research in their home country. Many overseas Chinese statisticians, as well as statisticians in Mainland China, Taiwan and Hong Kong, participated in writing a book in Chinese about advances in medical statistics, which was published in 2000 by The People’s Health Press, Beijing. Now, with the help of World Scientific Publishing Co., we are pleased to present the English version of this book, “Advanced Medical Statistics”, to a much larger professional community of English readers.

The book consists of four sections and 29 chapters. The first section is about statistical methods in biomedical research, including their history and statistical thinking in medical research, medical diagnoses, dependent data, quality control and quality assurance in medical measurements, cost-effectiveness and evidence-based medicine, quality of life, meta-analysis, descriptive statistics, medical image processing, and time series. Many of these statistical methods were developed for specific medical issues. The second section covers the most important statistical issues in pharmaceutical research and development, including pharmacology and pre-clinical studies, biopharmaceutical research, toxicological study,
and confirmatory clinical trials. Some of the theory and methods are published here for the first time. The third section is concerned with statistical methods in epidemiology, including statistics in genetic studies, risk assessment, infectious diseases, disease surveys, capture-recapture models for monitoring epidemics, cancer screening, and causal inference. Most of the methods have been newly developed within the past decades. The last section is dedicated to advanced statistical theory and methods, including survival analysis, longitudinal data analysis, non-parametric curve estimation, Bayesian statistics, stochastic processes, tree-structured methods, EM algorithms, and artificial neural networks. These last chapters not only summarize the current status of research, future research topics and applications in medical research, but also provide some necessary theory and background for the statistical methods discussed in the first three sections.

All the chapters in the book are independent of each other; each is dedicated to a specific issue. To meet the needs of different readers, all chapters have a similar structure. The first subsection introduces the general concepts and the medical questions discussed in the chapter; examples are usually given in this section. The following sections present more specific details of concepts, methods and algorithms, with the emphasis on application and significance. Derivations and proofs are generally not included, but citations to the literature are provided for interested readers.

This book is aimed at a broad readership. We hope that, regardless of your background, whether as a physician, a researcher in bioscience, a professional statistician, or a graduate student, you will find the book appropriate to your needs. As statistical thinking and methods are essential tools in modern medicine and biomedical research, medical researchers, leaving aside the statistical derivations and mathematical arguments, will learn what statistical
tools are available to them, how to prepare the necessary information to use these methods, and how to interpret statistical results and their limitations. For professional medical statisticians, this book provides a broad perspective on medical statistics, its possible applications and the interactions between special subjects, and suggestions about future research topics, which will be helpful in their research as well as in consulting work with clients. For theoretical statisticians or applied statisticians working in other areas, the book provides many examples of statistical applications and of the challenges facing medical statistics, which should help them to identify new frontiers and possible application areas for their new methods. Last but not least, this book is a good reference for graduate students, providing a broad overview of medical statistics that will help them to select their research topics and guide them to the heart of the issues.

All the authors are experts in their specific areas. Each chapter reflects their own research experience, results and achievements. They have given much under the tremendous pressure of their many other obligations. As editors, we greatly appreciate their support, dedication and friendship.

Many thanks to our colleagues in the School of Public Health, Sun Yat-Sen University, who provided assistance in the preparation of the book, especially Dr. Yu Chuanhua, Dr. Yan Jie, Dr. Wang Xianhong, Dr. Ling Li, Dr. Xu Zongli, Mr. Shuming Zhu, Ms. Shaomin Wu and Ms. Fangfang Zeng. We thank the People’s Health Press, Beijing, for kindly permitting us to freely publish versions other than the Chinese one. We are most grateful to the editors of World Scientific Publishing Co., Singapore, for their work in bringing this book to publication.

Ying Lu
Jiqian Fang
Editors

ABOUT THE EDITORS

Ying Lu is an associate professor of radiology in the Department of Radiology, the director of the Biostatistics Core, UCSF Comprehensive Cancer Center, and a faculty member of the Bioengineering Graduate Program, University of California, San Francisco. He received his BS in mathematics from Fudan University (1982), his MS in applied mathematics from Shanghai Jiao Tong University (1984), and his PhD in biostatistics from the University of California, Berkeley (1990). At Berkeley, he received university fellowships (1985–1988) and the Public Health Alumni Association Scholarship (1989). In 1990, he received the Evelyn Fix Memorial Medal for an excellent statistical dissertation on animal carcinogenicity experiments, written under the guidance of Professors Manali and Chiang. He was then an assistant professor of epidemiology and public health at the University of Miami School of Medicine (1990–1993), and moved to the Department of Radiology at the University of California, San Francisco in 1994. He was the director of the Biostatistical Laboratory in the Osteoporosis Research Group, specializing in statistical applications in quality control, clinical trials and the diagnosis of osteoporosis; a member of the International Committee for Standards in Bone Measurement (1996–1998); and Vice President (1995–1997) and President (1999) of the San Francisco Bay Area Chapter, American Statistical Association. Dr. Lu has supervised two postdoctoral fellows in biostatistics and more than 20 fellows in radiology and bioengineering. He has authored or co-authored more than 80 peer-reviewed articles and book chapters on statistical methods for animal carcinogenicity experiments, medical diagnostic tests, and outcome prediction, as well as in the clinical research areas of radiology, osteoporosis, and cancer clinical trials. His papers have been published in various journals, such as Biometrics, Statistics in Medicine, Mathematical Biosciences, Medical Decision Making, Radiology, Journal of Bone and
Mineral Research, Cancer, etc.

Introduction to Artificial Neural Networks
J. L. Xia, H. W. Jiang & Q. Y. Tang

layer are $\theta_i$. The values of $a_h$, $V_{hi}$, $\theta_i$ are determined by the probability distribution $e^{-|\gamma|}$. The stimulus (activation) function is the logistic function

$$f(x) = \frac{1}{1 + e^{-x}}. \tag{3}$$

Then, using formulas (3) and (4), we compute the stimulus values of the units in the output layer:

$$C_j = f\left(\sum_{i=1}^{p} W_{ij} b_i + \gamma_j\right),$$

where $j = 1, 2, \ldots, q$, $W_{ij}$ are the connection weights from the hidden layer to the output layer, and $\gamma_j$ are the thresholds of the units in the output layer. The values of $W_{ij}$ and $\gamma_j$ are also determined by the probability distribution $e^{-|\gamma|}$. The normalized error of the output layer is given in (4):

$$d_j = C_j (1 - C_j)(C_j^k - C_j), \tag{4}$$

where $C_j^k$ is the expected value of unit $j$ in the output layer. Finally, compute the error of each unit in the hidden layer from the $d_j$:

$$e_i = b_i (1 - b_i) \sum_{j=1}^{q} W_{ij} d_j.$$

Following the above steps and applying recombination (crossover) and mutation, we adjust the input-layer-to-hidden-layer connection weights and the thresholds of the units in the hidden layer, and then the hidden-layer-to-output-layer connection weights and the thresholds of the units in the output layer. When the error between expected and observed outcomes converges to the pre-determined error tolerance, the learning of the neural network stops.

Future Research Trends

Because intelligent computing techniques such as ANN have succeeded in many applications, they attract considerable attention and play an important role in a multitude of research fields. As popular tools, they should become more mature in the future. But in terms of intelligent computing itself and of statistics, several outstanding obstacles urgently need to be resolved in this field:

(1) It is known that AI still does not possess many inherent characteristics of the brain, such as tolerance and robustness. Although ANNs have solved some problems in AI, their theories are imperfect and still in their infancy. Moreover, AI and ANN may be completely different realizations of the natural principles on which the brain is based. So a number of mathematical principles should be developed now, yet this is ignored in many standard textbooks and reviews. Researchers should pay more attention to studying existing theories and to establishing mathematical foundations. To avoid getting lost in the “forest” of biological mechanisms, the nature of AI and ANN theories should be clarified. Since a great deal of material and experience has accumulated, it is time to replace many seemingly faultless algorithms, which are full of analogies and metaphors, with objective, quantitative methods and theories.

(2) It will take a long time to reveal and understand the mechanisms of human intelligence. Biological discoveries are increasingly regarded as a way to widen the scope of AI and ANN. Once biology, neurology, and genetics make breakthroughs, AI and ANN may simulate high-level intelligent mechanisms to solve some problems that are unsolvable today. In turn, they also urge biological research to unveil more of the sealed puzzles of intelligence. Although AI and ANN are gray-box algorithms that correspond only approximately to human intelligence, they can provide useful clues and foundations for further research. As the rationality of previous models is verified, the methodology of AI and ANN, that is, how to construct more sophisticated models, becomes more individualistic and explicit.

(3) Although AI and ANN have been developing for nearly 50 years, their terminology is by no means standardized. Especially for random systems, most networks either conceal their use of statistics completely or apply it rigidly. In the words of Anderson, Pellionisz and Rosenfeld$^{22}$: “Neural networks are statistics for amateurs.” Most statisticians still stand soberly aside from the development of
ANNs and will not accept them in the short term, because ANNs are quite imperfect compared with statistics. Statisticians, especially in applied work, are reluctant to spend valuable data and time on automatic computer processing. However, with the maturing of MCMC theory and Gibbs sampling and the increasing interaction between the frequentist and Bayesian schools, ANNs based on Bayesian theory are developing rapidly. All the advantages of ANNs have been evaluated empirically, on the basis of practical applications, rather than through theoretical comparisons with statistical methods. The impact of ANNs on the theory and application of statistics is rather obscure at this stage. At present, because ANNs are a method somewhere between gray-box and black-box, their advantages cannot be evaluated in general. Combining ANNs with statistics may therefore be a feasible way out of the current dilemma. Statisticians should pay more attention to AI and ANN.

References

1. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087–1092.
2. Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature 323(6188): 533–536.
3. Holland, J. H. (1992). Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, Massachusetts.
4. Dong, C., Li, Z. N., Xia, R. W. and He, Q. Z. (1995). Advance and some problems on multilayer perceptron neural network. Mechanics Advance 25(2): 186–196.
5. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA 79: 2554–2558.
6. Feldmann, R., Monien, B. and Mysliwietz, P. (1990). Distributed game tree search. In Parallel Algorithms for Machine Intelligence and Vision, eds. Kumar, Kanal and Gopalakrishnan, Springer-Verlag, New York.
7. Hinton, G. E. and Van Camp, D. (1993). Keeping neural networks simple by minimizing the description length of the weights. Proceedings of the 6th Annual ACM Conference on Computational Learning Theory, Santa Cruz, 5–13.
8. Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986a). Learning representations by back-propagating errors. Nature 323: 533–536.
9. Kohonen, T. (1984). Associative Memory and Self-Organization, Springer-Verlag, New York.
10. Minsky, M. L. and Papert, S. A. (1988). Perceptrons, Expanded Edition, MIT Press, Cambridge, Massachusetts. (First edn. 1969.)
11. Lai, Y. X. and Lu, Y. S. (1999). BP neural network application in the distribution study of fluid diversity. Journal of Beijing Chemistry University 26(2).
12. Neal, R. M. (1996). Bayesian Learning for Neural Networks, Springer, New York.
13. Watt, R. (1991). Understanding Vision, Academic Press, London.
14. Barndorff-Nielsen, O. E. and Jensen, J. L. (1993). Networks and Chaos: Statistical and Probabilistic Aspects, Chapman and Hall, London.
15. Zhang, S. Q., Chen, C. and Wan, E. P. (1998). Gray system application in production evaluation. Geography 18(6): 581–585.
16. Hecht-Nielsen, R. (1989). Theory of the back propagation neural network. International Joint Conference on Neural Networks 1: 593–605.
17. Bornholdt, A. (1992). General asymmetric neural network and structure design by genetic algorithms. Neural Networks 5(2): 327–334.
18. Li, M. Q., Xu, B. Y. and Kou, J. S. (1999). The combination of GA and ANN. Theory and Practice of System Engineering 21(2): 65–69.
19. Jin, L., Luo, Y., Mou, Q. L. et al. (1998). Study on ANN prediction model of the humidity in the soil of cornfield. Agrology 35(1): 25–35.
20. Li, Z. and Zhang, J. T. (2000). The study on evaluation of mealie based on the combination of GA and ANN. Natural Resource Journal 15(3).
21. Deng, J. N. (1992). The Basic Methods of Gray System, Center China Sci. Tech. Univ. Press, 304–312.
22. Anderson, J. A., Pellionisz, A. and Rosenfeld, E. (1990).
Neuro-computing : Directions for Research, MIT Press, Cambridge, Massachusetts About the Author Xia Jielai, professor of the Fourth Military Medical University, earned his bachelor in Applied Mathematics from the Anhui University and master and PhD in Healthy Statistics from the Fourth Military Medical University He teaches in bio-statistics department of the university as an assistant professor (1983–1988), lecturer (1988–1995), associate professor (1995– 1998) and professor (1988-present) He visited clinical and epidemiological research center of Prince Welsh Hospital, the Chinese University of Hongkong (January–February 1994), department of genetics and biometry, Louisiana state university medical center (March–October 1994) His research fields are biostatistics and data processing, including theory and methods of statistical modeling and soft development of NoSA (Non-typical data statistical analysis system) He has published more than 50 articles in various scientific journals chap29 September 5, 2003 14:8 WSPC/Advanced Medical Statistics Index artificial neural networks (ANN), 1073 assay assay development and validation, 410 assay method, 436 assay validation, 452 biological assays, 436 chemical assays, 436 immunoassays, 436 asymptotic mean integrated square error (AMISE), 891 atrial fibrillation over ventricular arrhythmias, 427 autocorrelation, 61, 335 auto-regressive models ARIMA(p; d; q), 335 ARMA(n; n 1), 335 multivariable ARMA model, 368 autosomal chromosomes, 583 average causal effect (ACE), 784 Back Propagation (BP) Neural Networks, 1074 back-calculation or back projection, 647 back-door criterion, 804 backfitting algorithm, 853 balance at baseline, 527 cross-validated bandwidth, 852 bandwidth selection, 900 batch-to-batch variation, 457 accelerated approval, 566 accuracy accuracy errors, 104 definition, 21, 103, 436 diagnostic accuracy, 484 active control, 566 effect size, 568 control trials, 480 acute neurologic illness in children, 713 affected sib 
pair method (ASP), 602 affected-pedigree-member (APM), 605 agreement (see also Bland-Altman plot, Kappa) assessment of agreement, 131 chance-corrected agreement, 143 reader agreement, 483 Akaike Information Criterion (AIC), 705, 721, 942 allele codominant, 584 definition, freqeuncy, probabiity, 584 dominant, 584 recessive, 584 allelic association, 588 Alzheimer’s disease assessment scale, 554 analysis of variance (ANOVA), 64 analytical survey, 685 angiotensin-converting enzyme (ACE), 591 animal carcinogenicity experiments, 496 animal toxicological experiments, 972 antibiotic preparation, 978 area under curve (AUC) ROC curve, 34, 488 SROC curve, 294 plasma or blood concentration-time curve, 415 artificial intelligence (AI), 1073 Bayes, Bayesian Bayesian computation, 956 Bayesian credible intervals, 935 Bayesian Highest Posterior Density (HPD), 935 Bayesian Information Criterion (BIC), 705, 942 Bayesian meta-analysis, 269 Bayesian methods, 268 Bayesian model averaging (BMA), 942 1091 index-new September 5, 2003 14:8 WSPC/Advanced Medical Statistics 1092 Bayesian statistics, 933 Bayes theorem, 941 benchmark dose (BMD), 619, 637 Begg’s method, 312 benchmark response (BMR), 620 best linear unbiased predictor (BLUP), 695 beta-binomial model, 515, 517 bias, 305, 526 bias and variability, 445 bias-corrected and accelerated (BCa), 177 English language bias, 306 evaluation bias, 528 extractor bias, 308 indexing bias, 306 lead time bias, 743 length bias, 743 multiple publications bias, 306 operational bias, 528 publication bias, 306, 308 reference bias, 306 sampling bias, 306 selection bias, 306 sources for bias, 527 statistical bias, 534 structural bias, 533 verification bias, 21, 22 within study biases, 308 bilinear model, 367 binary test, 24, 26 binormal model, 31 binwidth, 888 bioavailability, 410, 415, 431 bioequivalence, 48, 410, 431, 432, 467, 468 biostatistics history, birth-death process, 1017, 1024 Bland-Altman plot, 134 blind blinded-reader studies, 482 blinding, 
526, 529, 537 double blind, 14, 529 blood or plasma concentration-time curve, 467 body fat, 942 Index body surface area, 1080 bone mineral density (BMD), 102 Bonferroni adjustment, 870 bootstrap, 86, 175, 866 bootstrap algorithm, 109 bootstrap variability bands, 871 conditional bootstrap, 915 parametric bootstrap, 178 box-and-whisker plot, 326 Box-Jenkins model, 350 BP neural networks (BPNN), 1075 bracketing design, 465 Bradley-Blackwood procedure, 134 branching processes, 657, 1013 breast cancer screening, 750 Breslow-Day test of homogeneity, 297 bypass angioplasty revascularization, 158 calibration, 145 capture-recapture, 701, 716 case-control study, 779 categorical data, 142, 320 causal diagram, 778, 800 causal effect model, 778 causal inference, 778 causal relationship, 334 cause-of-death, 498 censored, 815 censored and truncated data, 323 central limit theorem, 387 cervical cancer, 1029 chain binomial model, 657 change point detection, 924 Chapman-Kolmogorov equation, 993 chromosomes, 583 Ciclosporin A, 51 clearance, 413, 414 clinical decision rule, 547, 550 clinical endpoint, 548 clinical trials, 8, 414 active control trial, 480 confirmatory trial, 525 randomized controlled clinical trial (RCT), 12, 164 index-new September 5, 2003 14:8 WSPC/Advanced Medical Statistics Index two- or multi-stage randomized trial, 562 cluster analysis, 384 cluster sampling, 47 Cochrane collaboration, 234 Cochran’s semi-weighted estimator, 254 coefficient curves, 863 coefficient of variation (CV), 108, 109, 453, 723 longitudinal CV, 113 standardized coefficient of variation (SCV), 111 within-batch CV, 109 collapsibility-based criterion, 787, 792 community-based surveys, 686 comparative calibrations, 146, 147 compartment models, 412, 418, 656, 658 complete ascertainment, 595 complex human traits, 583 computer software, 41 computerized tomography (CT), 34, 379, 380 concentration, 978 conditional conjugacy, 950 conditional density estimation, 924 conditional heterogeneity, 337 
conditional maximum likelihood estimation, 845 conditional variance estimator, 922 conditional variance function, 922 confidence interval of, 86 confounder, 787, 795 confounding, 446, 526, 527, 787 conjugate priors, 944, 949 construct validity, 217 content validity, 215 continuous proportional data, 322 continuous-time Markov chain, 997 convergent validity, 219 co-primary endpoints, 546, 548 cost-complexity, 1039 cost-effectiveness analysis (CEA), 157, 161 index-new 1093 cost-effectiveness plane (CE plane), 168 cost-effectiveness ratio (CER), 161 counterfactual model, 783 counting process, 818, 1026 multiple counting process, 1030 Cox regression, proportional hazards model, 821, 908, 965, 1027, 1041 criterion validity, 216 Cronbach’s coefficient, 228 cross-calibration, 144 crossing-over, 585 crossover design, 48, 432 cross-validation, 852, 865, 901, 936 cross-validation bandwidth selector, 901 cure rate model, 963 current Good Manufacturing Practice (cGMP), 450 DAD test, 108 data monitoring, 478, 530, 538, 563 data processing system (DPS), 1078 datasets absorption, distribution, metabolism and excretion (ADME), 444 air/ethanol mix, 894 Alabama fetal growth study, 872 Alabama small-for-gestational-age (ASGA) study, 839 aminophylline treatment in severe acute asthma, 267 burn data, 907, 916 Indianapolis Study of Health and Aging, 688 Multicenter AIDS Cohort Study (MACS), 839, 877 National Cholesterol Education Program (NCEP), 159 National Health and Nutrition Examination Survey (NHAINES), 685 Primary Biliary Cirrhosis (PBC) data set, 909 September 5, 2003 14:8 WSPC/Advanced Medical Statistics 1094 stratified neurologic illness data, 730 decision set, 550 decision structure, 551 dementia screening test, 35 DerSimonian-Laird method, 261, 278 descriptive survey, 685 design density, 900 design efficiency, 94 diagnostic imaging, 481 diethylene glycol dimenthyl ether (TGDM), 627 diethylhexyl phthalate (DEHP), 512 Dirichlet distribution, 1006 Dirichlet-multinomial 
distribution, 625 discrepancy loss function, 919 discriminant validity, 219 DNA sequence, 1006 dose dosage regimen, 410 dose-related trend, 497 dose-response assessment, 618 dose-response modeling, 639 dropouts, 477 drug discovery, 410 drug interchangeability, 470 drug shelf-life, 455 d-separation criterion, 801 dual X-ray absorptiometry (DXA), 102 duodenal ulcer prevention trials, 533 edge effect, 887 effect size, 237, 249 effectiveness, 162, 444, 524 efficacy, 5, 435, 444, efficacy and safety, 444 efficacy subsets, 479 Egger’s linear regression method, 311 electrical source imaging, 381 elimination half-life, 416 Emax model, 412 empirical Bayes, 425, 695 hierarchical and empirical Bayes methods, 410 Index empirical BLUP (EBLUP), 696 epidemic transmission, 653 epilepsy epilepsy trial, 555 epileptics, 56 felbamate in epileptic patients, 427 seizure counts, 58 equal-catchability, 719, 726 equilibrium in genetics, 993 equivalence, 567 estimated shelf-life, 466 estimation, 16, 64, 65, 847 Evidence-Base Medicine (EBM), 234 exact-based logit method, 295 excitability score, 638 expectation-maximization (EM) algorithm, 15, 393, 396, 634, 1051 ECME algorithm, 1057, 1064 parameter-expanded EM algorithm, 1059 PX-E step, 1059, 1066, 1067 PX-EM, 1059 PX-M step, 1059, 1066, 1067 extra-Poisson variation, 517 extremely concordant (EC), 607 extremely discordant (ED), 607 fail-safe number, 312 Fieller’s method, 173 first phase shelf-life, 461 first-pass effect, 415 fixed-effects (FE) model, 251 Fleiss method, 258 Fourier Transform Fast Fourier Transform (FFT), 361 finite discrete Fourier transformation (DFT), 361 frailty, 828 front-door criterion, 804 Functional Observational Battery (FOB), 629 funnel graph, 309 gain field, 394 g-algorithm formula, 808 index-new September 5, 2003 14:8 WSPC/Advanced Medical Statistics Index gamma-normal hierarchical model, 1052 general variance-based method, 260 generalize simple branching processes, 1015 generalized estimating equation (GEE), 38, 
71, 515, 625, 842 generalized linear mixed models (GLMM), 410, 428, 514 generic drug products, 431 Genetic Algorithm (GA), 1084 Genome Search Meta-Analysis method (GSMA), 304 genotype, 584 Gibbs sampling method, 269, 272, 422, 938, 957, 1009 Good Clinical Practice, 102, 450, 475 Good Laboratory Practice (GLP), 450 Good Manufacturing Practice (cGMP), 444 Good Regulatory Practice (GRP), 450 Good Statistics Practice (GSP), 449 goodness of fit, 344 goodness-of-split complexity, 1042 government regulation, 14 United States Food and Drug Administration (FDA), 14, 443, 472, 496, 565 graphic methods, 325 haplotype relative risk (HRR), 610 Hardy-Weinberg equilibrium, 590 Hardy-Weinberg law, 586, 587, 993 Haseman-Elston procedure, 299, 606 hazard function, 816 additive hazards model, 825 Health and Retirement Survey (HRS), 685 health risk assessment, 618 Helicobacter pylori (HP) infection, 1000 hepatitis hepatitis A virus (HAV), 712, 728 hepatitis B virus (HBV), 273, 655 index-new 1095 hepatitis C virus (HCV), 273 heterogeneous model, 720 heterozygous, 584 hierarchical Bayes approach (HB), 695 hierarchical generalized linear models (HGLM), 428 hierarchical or clustered structure, 76 hierarchical power prior, 947 hierarchical structure, 62 histogram, 326, 886 homogeneity, 252 homozygous, 584 hospital cost, 692 human dynamic FDG-PET brain, 386 human immunodefficiency virus (HIV) HIV dynamic, 660, 662, 839 HIV/AIDS, 646 zidovudine (AZT), 948 human OB gene, 299 hypergeometric distribution, 820 Ibragimov-Has’minskii (IH) environment, 424 identical-by-descent (IBD), 602 identifiability, 412 image sampling schedule, 383 importance weighted marginal density estimation (IWMDE), 940 in vitro, 409 incomplete ascertainment, 595 incomplete data, 1051 incremental cost-effectiveness ratio (ICER), 169, 179 independence chain, 959 individual bioequivalence, 434, 469 individual causal effect (ICE), 783 infarctions acute cerebral infarctions, 50 acute myocardial infarction, 236 infectious 
diseases, 645
informative priors, 945
intent-to-treat (ITT), 534
interaction, 446
interim analysis, 478, 560, 561
International Conference on Harmonization, 449, 525
interval censored, 829
intra-class correlation, 46
intraclass correlation coefficient (ICC), 67, 137
intra-litter correlation, 626
intra-subject correlations, 845
in-utero, 624
Investigational New Drug (IND), 443, 524
ion-channels, 1005
irregularity, 333
irrelevant factor, 795
Iteratively reweighted least squares (IRLS), 73, 78
Jeffreys prior, 952
joint sojourn distribution, 755
Kaplan-Meier estimator, 818, 1026
Kappa statistics, 140, 483
Kendall’s tau, 312
kernel density estimate, 889
kernel function, 851, 889
kernel regression, 896
Kurtzke Disability Status Scale, 264
L measure, 970
Laplace’s rule, 951
Last Observation Carried Forward (LOCF), 535
latency to persistent sleep (LPS), 556
latent period of cancer, 1002
latent structure models, 146
learning sample, 1034
least significant change (LSC), 114
least squares kernel estimator, 855
least squares local linear estimator, 856
least squares local polynomial estimators, 859
leave-one-subject-out, 852
levels of validation, 189
life table, 1025
likelihood classification, 390
likelihood ratio test (LRT), 108
limit of detection/quantitation, 436
limit of quantitation (LOQ), 416
limiting dependent models, 756
Lindstrom-Bates procedure, 421, 428
linearity, 436
linguistic validation, 203
link function, 904
linkage, 587
linkage analysis, 598
linkage disequilibrium, 588
linkage equilibrium, 587
linkage studies, 298
linkage to BMI, 300
litter effect, 511, 624
local log-likelihood function, 904, 906, 913
local maximum likelihood estimator, 906, 1096
local modeling, 885
local partial likelihood, 909
local polynomial regression, 897
local quasi-likelihood estimation, 904
Localized Metropolis’ algorithm, 961
locus, 583
LOD (log-odds) score method, 598
logistic regression, 690, 727
  multilevel logistic regression, 80
log-linear models, 142, 705, 721
log-log model, 81
logrank statistic, 820
longitudinal data, 59, 807, 837
long-term memory, 1004
LOWESS, 921
lowest-observed-adverse-effect-level (LOAEL), 618
Lp Wasserstein metrics, 1041
lymphoid mononuclear cells (MNCs), 662
macro-simulation method, 769
magnetic resonance imaging (MRI), 379
magnetic resonance spectroscopy (MRS), 379
malignant melanoma, 963, 970
Mammography, 379
Mantel-Haenszel method, 255, 502
marginal posterior densities, 940
marginal quasi-likelihood (MQL), 78
Markov and Semi-Markov models, 212
  non-homogeneous continuous-time Markov model, 1003
  non-homogeneous discrete-time Markov model, 1002
  non-homogeneous Markov model with covariables, 762
Markov chain, 758
  discrete-time Markov chains, 991
  hidden Markov chain, 1006
  Markov chain in random environment, 1005
  non-homogeneous time discrete Markov chain model, 763
  time homogeneous Markov chain model, 758
Markov Chain Monte Carlo (MCMC) method, 268, 422, 956, 1007
Markov counting process, 1029
Markov process, 758
martingale, 818
mass screening, 741
matrix design, 465
maximum concentration (Cmax), 467
maximum likelihood estimate (MLE), 11, 253, 590, 847
maximum tolerated dose (MTD), 496
Mean Integrated Square Errors (MISE), 900
measurement errors, 103, 844
medical imaging, 379
Medical Outcomes Study 36-Item Short Form (SF-36), 198
Mendel’s first law, 585
meta-analysis, 233
methods for describing data, 320
Metropolis-Hastings algorithm, 958, 1009
micro-simulation model, 771
missing
  informative missing, 535
  missing at random (MAR), 535
  missing completely at random (MCAR), 535
  missing data, 208, 477
mixed effects models, 59, 633
model diagnostics, 824
model validation, 427
modeling type estimator, 690
molecular genetics, 583
monitoring time interval (MTI), 115
Monte Carlo (MC), 956
Monte Carlo EM, 422
Monte Carlo integration, 422
Monte Carlo method, 87
Monte Carlo simulation, 400
morbidity, 195
mortality, 195
multicenter studies, 478
multilevel model, 61
  multilevel Poisson regression model, 82
  multilevel probit model, 81
  multivariate multilevel model, 85
multi-phase sampling, 688
multiple comparisons, 208, 539, 541
multiple endpoints, 544
multiple event times, 827
multiple renewal process, 1028
multiple sclerosis, 264
multiplicative intensity model, 823
multiplicity, 479
multi-stage sampling, 688
Multivariate Analysis of Variance (MANOVA), 181
multivariate mortality, 972
Nadaraya-Watson type kernel estimators, 851
nasopharyngeal carcinoma (NPC), 766
natural history, 650
Nelson’s estimator, 818
nerve growth factor (NGF), 54
New Drug Application (NDA), 443
N-nitrosodiethylamine (DEN), 621
noise reduction, 386
non-inferiority, 567
noninformative priors, 951
non-linear anisotropic diffusion, 388
nonlinear mixed effects models, 410, 420
nonlinear regression analysis, 400
nonparametric goodness of fit test, 911
nonparametric likelihood ratio test, 915
nonparametric methods, 324
no-observed-adverse-effect-level (NOAEL), 618
normalizing constant, 934
Nottingham Health Profile (NHP), 198
objective endpoints, 532
odds ratios, 247
  pooled estimate of odds ratio, 257
one-stage models, 747
one-step local MLE, 913
optimal image sampling schedule (OISS), 383
optimization, 768
Ordinal-Scale Test, 29
orthogonal series method, 896
orthopantomograms, 381
osteoporosis, 116
outlier detection, 475
over-dispersion, 324, 328
parallel designs, 433
parametric model, 324
partial likelihood, 822, 908
partially linear model, 863, 849
Pearson residuals, 73
Pearson Chi-square test, 10, 591
penalized least squares criterion, 859
periodicity, 333
periodogram, 365
Peto test, 503
Peto’s method, 256
pharmacodynamic (PD), 409, 411
pharmacokinetic (PK), 51, 324, 410, 467
pharmacology, 409
pharmacometrics, 410
phase 1, 410, 523
phase 2, 410, 523
phase 3, 410, 524
phenobarbital in neonates, 427
piecewise exponential model, 965
Poisson process, 1021
  non-homogeneous Poisson process, 1021
  weighted Poisson process, 1021
Poisson regression, 57
poly-exponential models, 412
poly-k test, 503
polynomial calibration models, 978
polynomial regression, 894
population, 686
population bioequivalence, 434, 468
population genetics, 15
positive and negative predictive values, 24
Positron Emission Tomography, 379, 380
posterior
  covariance, 935
  distribution, 933
  mean, 935
  predictive density, 935
  predictive distribution, 935
  probability, 935
  quantity, 934
post-intervention distribution, 803
potency, 436
power, 93, 182, 207, 237, 446, 473
precision, 103, 436
  absolute precision errors, 106
  long-term precision, 107
  long-term precision errors, 104, 105
  precision errors, 104
  short-term precision errors, 104
pre-clinical, 410
preclinical detectable phase (PCDP), 746
prediction, 347, 935
predictive errors, 347
predictive quasi-likelihood (PQL), 78
pre-IND, 443
prevalence, 688
prevention trial, 668
primary endpoint, 545, 546, 548, 666
primary hepatocellular carcinoma (PHC), 273
principle of independent segregation, 585
priori, 411, 547
  distribution, 933, 944
  power, 946
  probability matching, 955
probability density, 886
probability theory,
process control charts, 119
  cumulative sum chart (CUSUM), 125
  exponentially weighted moving average (EWMA) control chart, 131
  moving average chart, 124
  Shewhart chart, 121
process validation, 453, 454, 455
propensity score, 787
proportional hazard models, 821, 908, 965, 1041
protease inhibitor (PI), 663
pseudo-Bayes factor, 969
pseudo-data step, 421
pseudo-likelihood method, 687
pulmonary function, 267
Q-TwiST method, 212
quality assurance (QA), 101
quality control (QC), 101
quality improvement, 101
quality of life (QOL), 195
  QOL instrument, 198
  domains, 199
  health-related quality of life (HRQOL), 195
  quality-adjusted life year (QALY), 158, 211
  WHOQOL, 196
quantitative trait locus (QTL), 607
quantitative ultrasound (QUS), 104, 116
quasi-confidence interval, 167
quasi-likelihood method, 516, 842
radiology, 101
random effect model, 59, 251, 687, 692, 695, 841
random effects, 844
random variation, 319
randomization, 11, 447, 526, 537
randomized block design, 47
randomized experiment, 784
range, 436
ratios, 321
reader agreement, 483
receiver operating characteristic (ROC) analyses, 29, 34, 486, 886
receptor, 409
recombination fraction, 585
recurrent events, 823
recursive partitioning, 1033
reference concentration (RfC), 619
reference dose (RfD), 619
reference prior, 954, 980
relapse-free survival (RFS), 970
relative risk, 237, 247
reliability, 223, 225
repeated measures, 322
reproductive and developmental toxicological data, 624
reproductive studies, 509
responses, 1043
restricted iterative generalized least squares (RIGLS), 78
restricted maximum likelihood (REML), 253, 696, 847
reverse transcriptase inhibitor, 663
reversible jump MCMC samplers, 1011
risk assessment, 495
risk differences, 247
robust method, 284
rosiglitazone maleate tablets, 48
ruggedness, 436
safety, 444, 524
safety and efficacy, 435
sample coverage, 724
sample size, 92, 112, 182, 447, 473
sample size and cost effect, 92
sample size re-estimation, 562
sample surveys, 685
sampling frame, 686
sampling plan, 686
sampling units, 686
SAS, 291
Savage-Dickey ratio, 976
scintigraph, 26
seasonality, 333, 336, 357
second phase slopes, 462
secondary endpoints, 546
segmentation, 392
segregation analysis of dominant loci, 592
segregation analysis of recessive loci, 594
segregation ratio, 589
semi-Markov process, 1005
semi-parametric, 324
semiparametric accelerated failure time model, 826
semi-parametric cure rate model, 967
semiparametric model, 842
sensitivity, 22, 280, 486, 746
sensitivity analysis, 164
sensitizing tests, 122
sequential
  group sequential procedure, 560
  sequential clinical decision rule, 561
  sequential decision structure, 561
serial correlations, 844
short-term memory, 1004
sib-pair method, 300
Sickness Impact Profile (SIP), 198
signal-noise ratio (SNR), 386
significance level, 446
simple random sampling, 687
simple random walk, 992
Simpson’s Paradox, 780, 781
simultaneous bands, 869
single ascertainment, 595
single blind, 529
single-photon computed tomography, 379
small area estimation, 692
smooth nonparametric (SNP) model, 423
smoothing estimators, 850
sojourn time of state, 997
source language, 205
Spearman-Brown prophecy formula, 227
specificity, 22, 280, 436, 486
SPECT, 381
spectral analysis, 357, 359
spectral function, 362
spline
  B-spline, 864
  spline approach, 896
  spline smoothing estimator, 326
split-halves method, 227
SROC curve, SROC regression model, 286, 549
standardized means differences, 249
standardized validity coefficient, 222
stationarity and invertibility, 353
stationary distribution, 995
stationary process, 340
statistical anisotropic diffusion, 386
statistical calibration, 978
statistical process control (SPC), 121
stochastic process, 333, 992
strata, 687
stratified cluster sampling, 62
stratified random sampling, 687
strong ignorable, ignorability, 785, 786
strongly stationary process, 340
structural nonparametric models, 850
structural nonparametric regression models, 843
study design, 474
surrogate efficacy, 523
surrogate endpoint, 524
Survey of Asset and Health Dynamics of the Oldest Old (AHEAD), 685
survival function, 816
susceptible-infection-removal (SIR), 653
symmetric beta family, 890
synthetic estimator, 694
target language, 205
temporal domain, 383
terminal nodes, 1035
test for stationarity, 341
  overfitting, 353
  of hypotheses, 16
  CER, 181
  ICER, 179
  homogeneity, 253
test sample, 1034
test-retest method, 225
threshold autoregression model, 366
thrombolytic agents, 236
time-dependent covariates, 827, 908
time-invariant, 861
time-reversible, 1007
time-to-event outcome, 815
time-to-virologic-failure endpoints, 667
tissue time activity curve (TAC), 384
topotecan in Solid Tumors, 324
toxicology, 48, 410, 495
t-PA, 558
tracer kinetic techniques, 382
transition intensity, 997
transition probability, 992
transmission/disequilibrium test (TDT), 609
treatment of congestive heart failure, 552
tree
  Classification and Regression Tree (CART), 1033
  tree node, 1035
  tree pruning, 1038
  tree splitting, 1036
trend
  trend assessment interval (TAI), 115
  trend assessment margin (TAM), 115
  trend test, 333, 502
two-phase sampling, 688
two-phase shelf-life estimation, 459
two-stage model, 752
two-step smoothing method, 860
type I error, 446
types of data, 320
ultrasound, 104, 116, 379, 380
uniform irrelevant factor, 796
unique validity variance, 223
unstandardized coefficient linking, 222
unstandardized validity coefficient, 222
urokinase, 50
vaccine attack rate, 670
vaccine efficacy, 670
vaccine studies, 669
validity, 215, 221
variability, 323
varying-coefficient models, 843, 912
  nonparametric and semiparametric varying-coefficient models, 863
VASOTEC, 552
volume of distribution, 413
Wald test, 517
weak stationary, 341
weighting type estimator, 689
within-node impurity, 1042
World Health Organization (WHO), 196, 742
xeloda, 571
x-rays, 380
x-ray mammography (MG), 381
x-ray transmission imaging, 379
Yule process, 1022
  non-homogeneous Yule process, 1024
Yule-Walker estimation, 369