
Regression Methods for Medical Research


Document information

Basic information

Pages: 313
File size: 12.7 MB

Content

Bee Choo Tai, Saw Swee Hock School of Public Health, National University of Singapore, and National University Health System; and Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore

David Machin, Medical Statistics Unit, School of Health and Related Sciences, University of Sheffield, Sheffield; and Cancer Studies, Faculty of Medicine, University of Leicester, Leicester, UK

Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research that uses more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demand an understanding of more complex and sophisticated analytic procedures.

Regression Methods for Medical Research is especially designed for clinicians, public health and environmental health professionals, para-medical research professionals, scientists, laboratory-based researchers and students.

The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to answer the key design questions posed and, in so doing, to take due account of the effects of any potentially influencing co-variables. It begins with a revision of basic statistical concepts, followed by a gentle introduction to the principles of statistical modelling. The various methods of modelling are covered in a non-technical manner so that the principles can be more easily applied in everyday practice. A chapter contrasting regression modelling with a regression tree approach is included. The emphasis is on the understanding and the application of concepts and methods. Data drawn from published studies are used to exemplify statistical concepts throughout.

Regression Methods for Medical Research

Bee Choo Tai
Saw Swee Hock School of Public Health, National University of Singapore and National University Health System; Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore

David Machin
Medical Statistics Unit, School of Health and Related Sciences, University of Sheffield; Cancer Studies, Faculty of Medicine, University of Leicester, Leicester, UK

This edition first published 2014 © 2014 by Bee Choo Tai and David Machin.

Published 2014 by John Wiley & Sons, Ltd.

Registered Office: John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Offices: 9600 Garsington Road, Oxford, OX4 2DQ, UK; The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK; 111 River Street, Hoboken, NJ 07030-5774, USA

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell.

The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting a specific method, diagnosis, or treatment by health science practitioners for any particular patient. The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. Readers should consult with a specialist where appropriate. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloging-in-Publication Data
Tai, Bee Choo, author.
Regression methods for medical research / Bee Choo Tai, David Machin.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-1-4443-3144-8 (pbk. : alk. paper) – ISBN 978-1-118-72198-8 – ISBN 978-1-118-72197-1 (Mobi) – ISBN 978-1-118-72196-4 – ISBN 978-1-118-72195-7
I. Machin, David, 1939– author. II. Title.
[DNLM: 1. Regression Analysis. 2. Biomedical Research. 3. Models, Statistical. WA 950]
R853.S7
610.72′4–dc23
2013018953

A catalogue record for this book is available from the British Library.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Cover image: Stethoscope - iStock file #13368468 © LevKing; DNA - iStock file #1643638 © Andrey Prokhorov
Cover design by Meaden Creative

Set in 10/12pt Times by SPi Publisher Services, Pondicherry, India

To Isaac Xu-En Koh and Kheng-Chuan Koh and Lorna Christine Machin

Contents
Preface
1 Introduction
2 Linear regression: practical issues
3 Multiple linear regression
4 Logistic Regression
5 Poisson Regression
6 Time-to-Event Regression
7 Model Building
8 Repeated Measures
9 Regression Trees
10 Further Time-to-Event Models
11 Further Topics
Statistical Tables
References
Index

Preface

In the course of planning a new clinical study, the key questions that require answering have to be determined and, once this is done, the purpose of the study will be to answer the questions posed. Once they are posed, the next stage of the process is to design the study in detail, and this will entail more formally stating the hypotheses of concern and considering how these may be tested. These considerations lead to establishing the statistical models underpinning the research process. Models, once established, will ultimately be fitted to the experimental data collated, and the associated statistical techniques will help to establish whether or not the research questions have been answered with the desired reliability. Thus, the chosen statistical models encapsulate the design structure and form the basis for the subsequent analysis, reporting and interpretation. In general terms, such models are termed regression models, of which there are several major types, and the fitting of these to experimental data forms the basis of this text. Our aim is not to describe regression methods in all their technical detail but rather to illustrate the situations in which each is suitable, and hence to guide medical researchers of all disciplines to use the methods appropriately.

Fortunately, several user-friendly statistical computer packages are available to assist in the model fitting processes. We have used Stata statistical software for the majority of our calculations, and to illustrate the types of commands that may be needed, but this is only one example of the packages that can be used for this purpose. Statistical software is continually evolving so that, for example, several new and improved versions of Stata have appeared during the time span in which this book has been written. We strongly advise use of the most up-to-date software available and, as we mention within the text itself, one that has excellent graphical facilities.

We caution that, although we use real data extensively, our analyses are selective and are for illustration only. They should not be used to draw conclusions from the studies concerned.

We would like to give a general thank you to colleagues and students of the Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, and a specific one for the permission to use the data from the Singapore Cardiovascular Cohort Study. Thanks are also due to colleagues at the Skaraborg Institute, Skövde, Sweden. In addition, we would like to thank the following for allowing us to use their studies for illustration: Tin Aung, Singapore Eye Research Institute;
Michael J Campbell, University of Sheffield, UK; Boon-Hock Chia, Chia Clinic, Singapore; Siow-Ann Chong, Institute of Mental Health, Singapore; Richard G Grundy, University of Nottingham, UK; James H-P Hui, National University Health System, Singapore; Ronald C-H Lee, National University of Singapore; Daniel P-K Ng, National University of Singapore; R Paul Symonds, University of Leicester, UK; Veronique Viardot-Foucault, KK Women’s and Children’s Hospital, Singapore; Joseph T-S Wee, National Cancer Centre, Singapore; Chinnaiya Anandakumar, Camden Medical Centre, Singapore; and Annapoorna Venkat, National University Health System, Singapore. Finally, we thank Haleh G Maralani for her help with some of the statistical programming.

George EP Box (1979): ‘All models are wrong, but some are useful.’

Bee Choo Tai
David Machin

Statistical Tables

Table T4  Student’s t-distribution
The value tabulated is t(α/2), such that if X is distributed as Student’s t-distribution with df degrees of freedom, then α is the probability that X ≤ −t(α/2) or X ≥ t(α/2).
[Figure: t density with tails of area α/2 below −t(α/2) and above t(α/2)]

df\α     0.20     0.10     0.05     0.04     0.03     0.02     0.01    0.001
   1    3.078    6.314   12.706   15.895   21.205   31.821   63.657    636.6
   2    1.886    2.920    4.303    4.849    5.643    6.965    9.925    31.60
   3    1.634    2.353    3.182    3.482    3.896    4.541    5.842    12.92
   4    1.530    2.132    2.776    2.999    3.298    3.747    4.604    8.610
   5    1.474    2.015    2.571    2.757    3.003    3.365    4.032    6.869
   6    1.439    1.943    2.447    2.612    2.829    3.143    3.707    5.959
   7    1.414    1.895    2.365    2.517    2.715    2.998    3.499    5.408
   8    1.397    1.860    2.306    2.449    2.634    2.896    3.355    5.041
   9    1.383    1.833    2.262    2.398    2.574    2.821    3.250    4.781
  10    1.372    1.812    2.228    2.359    2.528    2.764    3.169    4.587
  11    1.363    1.796    2.201    2.328    2.491    2.718    3.106    4.437
  12    1.356    1.782    2.179    2.303    2.461    2.681    3.055    4.318
  13    1.350    1.771    2.160    2.282    2.436    2.650    3.012    4.221
  14    1.345    1.761    2.145    2.264    2.415    2.624    2.977    4.140
  15    1.340    1.753    2.131    2.249    2.397    2.602    2.947    4.073
  16    1.337    1.746    2.120    2.235    2.382    2.583    2.921    4.015
  17    1.333    1.740    2.110    2.224    2.368    2.567    2.898    3.965
  18    1.330    1.734    2.101    2.214    2.356    2.552    2.878    3.922
  19    1.328    1.729    2.093    2.205    2.346    2.539    2.861    3.883
  20    1.325    1.725    2.086    2.196    2.336    2.528    2.845    3.850
  21    1.323    1.721    2.079    2.189    2.327    2.517    2.830    3.819
  22    1.321    1.717    2.074    2.183    2.320    2.508    2.818    3.790
  23    1.319    1.714    2.069    2.178    2.313    2.499    2.806    3.763
  24    1.318    1.711    2.064    2.172    2.307    2.492    2.797    3.744
  25    1.316    1.708    2.059    2.166    2.301    2.485    2.787    3.722
  26    1.315    1.706    2.056    2.162    2.296    2.479    2.779    3.706
  27    1.314    1.703    2.052    2.158    2.291    2.472    2.770    3.687
  28    1.313    1.701    2.048    2.154    2.286    2.467    2.763    3.673
  29    1.311    1.699    2.045    2.150    2.282    2.462    2.756    3.657
  30    1.310    1.697    2.042    2.147    2.278    2.457    2.750    3.646
   ∞    1.282    1.645    1.960    2.054    2.170    2.326    2.576    3.291

Example: If, following a Student’s t-test with df = 10, a value of the test statistic X = 2.764 is obtained, then the probability that X ≤ −2.764 or X ≥ 2.764 is α = 0.02.

Table T5  The χ² distribution
The value tabulated is χ²(α), such that if X is distributed as χ² with df degrees of freedom, then α is the probability that X ≥ χ²(α).
[Figure: χ² density with an upper tail of area α above χ²(α)]

df\α      0.2      0.1     0.05     0.04     0.03     0.02     0.01    0.001
   1     1.64     2.71     3.84     4.22     4.71     5.41     6.63    10.83
   2     3.22     4.61     5.99     6.44     7.01     7.82     9.21    13.82
   3     4.64     6.25     7.81     8.31     8.95     9.84    11.34    16.27
   4     5.99     7.78     9.49    10.03    10.71    11.67    13.28    18.47
   5     7.29     9.24    11.07    11.64    12.37    13.39    15.09    20.51
   6     8.56    10.64    12.59    13.20    13.97    15.03    16.81    22.46
   7     9.80    12.02    14.07    14.70    15.51    16.62    18.48    24.32
   8    11.03    13.36    15.51    16.17    17.01    18.17    20.09    26.12
   9    12.24    14.68    16.92    17.61    18.48    19.68    21.67    27.88
  10    13.44    15.99    18.31    19.02    19.92    21.16    23.21    29.59
  11    14.63    17.28    19.68    20.41    21.34    22.62    24.73    31.26
  12    15.81    18.55    21.03    21.79    22.74    24.05    26.22    32.91
  13    16.98    19.81    22.36    23.14    24.12    25.47    27.69    34.53
  14    18.15    21.06    23.68    24.49    25.49    26.87    29.14    36.12
  15    19.31    22.31    25.00    25.82    26.85    28.26    30.58    37.70
  16    20.47    23.54    26.30    27.14    28.19    29.63    32.00    39.25
  17    21.61    24.77    27.59    28.44    29.52    31.00    33.41    40.79
  18    22.76    25.99    28.87    29.75    30.84    32.35    34.81    42.31
  19    23.90    27.20    30.14    31.04    32.16    33.69    36.19    43.82
  20    25.04    28.41    31.41    32.32    33.46    35.02    37.57    45.31
  21    26.17    29.62    32.67    33.60    34.76    36.34    38.93    46.80
  22    27.30    30.81    33.92    34.87    36.05    37.66    40.29    48.27
  23    28.43    32.01    35.17    36.13    37.33    38.97    41.64    49.73
  24    29.55    33.20    36.42    37.39    38.61    40.27    42.98    51.18
  25    30.68    34.38    37.65    38.64    39.88    41.57    44.31    52.62
  26    31.79    35.56    38.89    39.89    41.15    42.86    45.64    54.05
  27    32.91    36.74    40.11    41.13    42.41    44.14    46.96    55.48
  28    34.03    37.92    41.34    42.37    43.66    45.42    48.28    56.89
  29    35.14    39.09    42.56    43.60    44.91    46.69    49.59    58.30
  30    36.25    40.26    43.77    44.83    46.16    47.96    50.89    59.70

Example: For an observed test statistic of X = 9.7 with 3 degrees of freedom, the tabular entries for α equal to 0.03 and 0.02 are 8.95 and 9.84, respectively. Hence X = 9.7 suggests a p-value between 0.02 and 0.03, or more precisely close to 0.021.

Table T6  The F-distribution
The value tabulated is F(α, df1, df2), such that if X has an F-distribution with df1 and df2 degrees of freedom, then α is the probability that X ≥ F(α, df1, df2). Columns give df1; rows give df2 and α.

df2      α       1       2       3       4       5       6       7       8       9      10      20       ∞
  1   0.10   39.86   49.50   53.59   55.83   57.24   58.20   58.91   59.44   59.86   60.19   61.74   63.30
      0.05  161.45  199.50  215.71  224.58  230.16  233.99  236.77  238.88  240.54  241.88  248.02  254.19
      0.01 4052.18 4999.34 5403.53 5624.26 5763.96 5858.95 5928.33 5980.95 6022.40 6055.93 6208.66 6362.80
  2   0.10    8.53    9.00    9.16    9.24    9.29    9.33    9.35    9.37    9.38    9.39    9.44    9.49
      0.05   18.51   19.00   19.16   19.25   19.30   19.33   19.35   19.37   19.38   19.40   19.45   19.49
      0.01   98.50   99.00   99.16   99.25   99.30   99.33   99.36   99.38   99.39   99.40   99.45   99.50
  3   0.10    5.54    5.46    5.39    5.34    5.31    5.28    5.27    5.25    5.24    5.23    5.18    5.13
      0.05   10.13    9.55    9.28    9.12    9.01    8.94    8.89    8.85    8.81    8.79    8.66    8.53
      0.01   34.12   30.82   29.46   28.71   28.24   27.91   27.67   27.49   27.34   27.23   26.69   26.14
  4   0.10    4.54    4.32    4.19    4.11    4.05    4.01    3.98    3.95    3.94    3.92    3.84    3.76
      0.05    7.71    6.94    6.59    6.39    6.26    6.16    6.09    6.04    6.00    5.96    5.80    5.63
      0.01   21.20   18.00   16.69   15.98   15.52   15.21   14.98   14.80   14.66   14.55   14.02   13.47
  5   0.10    4.06    3.78    3.62    3.52    3.45    3.40    3.37    3.34    3.32    3.30    3.21    3.11
      0.05    6.61    5.79    5.41    5.19    5.05    4.95    4.88    4.82    4.77    4.74    4.56    4.37
      0.01   16.26   13.27   12.06   11.39   10.97   10.67   10.46   10.29   10.16   10.05    9.55    9.03
  6   0.10    3.78    3.46    3.29    3.18    3.11    3.05    3.01    2.98    2.96    2.94    2.84    2.72
      0.05    5.99    5.14    4.76    4.53    4.39    4.28    4.21    4.15    4.10    4.06    3.87    3.67
      0.01   13.75   10.92    9.78    9.15    8.75    8.47    8.26    8.10    7.98    7.87    7.40    6.89
  7   0.10    3.59    3.26    3.07    2.96    2.88    2.83    2.78    2.75    2.72    2.70    2.59    2.47
      0.05    5.59    4.74    4.35    4.12    3.97    3.87    3.79    3.73    3.68    3.64    3.44    3.23
      0.01   12.25    9.55    8.45    7.85    7.46    7.19    6.99    6.84    6.72    6.62    6.16    5.66
  8   0.10    3.46    3.11    2.92    2.81    2.73    2.67    2.62    2.59    2.56    2.54    2.42    2.30
      0.05    5.32    4.46    4.07    3.84    3.69    3.58    3.50    3.44    3.39    3.35    3.15    2.93
      0.01   11.26    8.65    7.59    7.01    6.63    6.37    6.18    6.03    5.91    5.81    5.36    4.87
  9   0.10    3.36    3.01    2.81    2.69    2.61    2.55    2.51    2.47    2.44    2.42    2.30    2.16
      0.05    5.12    4.26    3.86    3.63    3.48    3.37    3.29    3.23    3.18    3.14    2.94    2.71
      0.01   10.56    8.02    6.99    6.42    6.06    5.80    5.61    5.47    5.35    5.26    4.81    4.32
 10   0.10    3.29    2.92    2.73    2.61    2.52    2.46    2.41    2.38    2.35    2.32    2.20    2.06
      0.05    4.96    4.10    3.71    3.48    3.33    3.22    3.14    3.07    3.02    2.98    2.77    2.54
      0.01   10.04    7.56    6.55    5.99    5.64    5.39    5.20    5.06    4.94    4.85    4.41    3.92
 20   0.10    2.97    2.59    2.38    2.25    2.16    2.09    2.04    2.00    1.96    1.94    1.79    1.61
      0.05    4.35    3.49    3.10    2.87    2.71    2.60    2.51    2.45    2.39    2.35    2.12    1.85
      0.01    8.10    5.85    4.94    4.43    4.10    3.87    3.70    3.56    3.46    3.37    2.94    2.43
 30   0.10    2.88    2.49    2.28    2.14    2.05    1.98    1.93    1.88    1.85    1.82    1.67    1.46
      0.05    4.17    3.32    2.92    2.69    2.53    2.42    2.33    2.27    2.21    2.16    1.93    1.63
      0.01    7.56    5.39    4.51    4.02    3.70    3.47    3.30    3.17    3.07    2.98    2.55    2.02
 40   0.10    2.84    2.44    2.23    2.09    2.00    1.93    1.87    1.83    1.79    1.76    1.61    1.38
      0.05    4.08    3.23    2.84    2.61    2.45    2.34    2.25    2.18    2.12    2.08    1.84    1.52
      0.01    7.31    5.18    4.31    3.83    3.51    3.29    3.12    2.99    2.89    2.80    2.37    1.82
 50   0.10    2.81    2.41    2.20    2.06    1.97    1.90    1.84    1.80    1.76    1.73    1.57    1.33
      0.05    4.03    3.18    2.79    2.56    2.40    2.29    2.20    2.13    2.07    2.03    1.78    1.45
      0.01    7.17    5.06    4.20    3.72    3.41    3.19    3.02    2.89    2.78    2.70    2.27    1.70
100   0.10    2.76    2.36    2.14    2.00    1.91    1.83    1.78    1.73    1.69    1.66    1.49    1.22
      0.05    3.94    3.09    2.70    2.46    2.31    2.19    2.10    2.03    1.97    1.93    1.68    1.30
      0.01    6.90    4.82    3.98    3.51    3.21    2.99    2.82    2.69    2.59    2.50    2.07    1.45
  ∞   0.10    2.71    2.31    2.09    1.95    1.85    1.78    1.72    1.68    1.64    1.61    1.43    1.08
      0.05    3.85    3.00    2.61    2.38    2.22    2.11    2.02    1.95    1.89    1.84    1.58    1.11
      0.01    6.66    4.63    3.80    3.34    3.04    2.82    2.66    2.53    2.43    2.34    1.90    1.16

Example: For an observed test statistic of X = 5.1 with 3 and 4 degrees of freedom, the tabular entries for α equal to 0.10, 0.05 and 0.01 are 4.19, 6.59 and 16.69, respectively. Hence X = 5.1 suggests a p-value between 0.05 and 0.10, or approximately 0.075.

Table T7  Number of subjects required for a range of the Cohen (1988) standardized effect size, Δ, for a continuous endpoint assuming a two-group comparison with equal numbers in each group, α = 0.05 (β = 0.2 corresponds to 80% power, β = 0.1 to 90% power)

    Δ   β = 0.2   β = 0.1
  0.2       788      1054
  0.3       352       470
  0.4       200       266
  0.5       128       172
  0.6        90       120
  0.7        66        88
  0.8        52        68
  0.9        42        54
  1.0        34        44
 1.35        20        26

References

Numbers in square brackets after each entry are the chapters in which these are cited.
*Books indicated for further reading in Chapter 1.

*Altman DG (1991) Practical Statistics for Medical Research. Chapman and Hall, London. [1]
*Armitage P, Berry G and Matthews JNS (2002) Statistical Methods in Medical Research (4th edn). Blackwell Science, Oxford. [1]
Altman DG, Lausen B, Sauerbrei W and Schumacher M (1994) Dangers of using “optimal” cutpoints in the evaluation of prognostic factors. Journal of the National Cancer Institute, 86, 829–835 and 1798–1799. [7]
Altman DG, Machin D, Bryant TN and Gardner MJ (eds) (2000) Statistics with Confidence (2nd edn). British Medical Journal, London. [4]
Altman DG and Royston P (2000) What do we mean by validating a prognostic model? Statistics in Medicine, 19, 453–473. [7]
Bellary S, O’Hare JP, Raymond NT, Gumber A, Mughal S, Szczepura A, Kumar S and Barnett AH (2008) Enhanced diabetes care to patients of south Asian ethnic origin (the United Kingdom Asian Diabetes Study): a cluster randomised trial. Lancet, 371, 1769–1776. [11]
Beyersmann J, Dettenkofer M, Bertz H and Schumacher M (2007) A competing risks analysis of bloodstream infection after stem-cell transplantation using subdistribution hazards and cause-specific hazards. Statistics in Medicine, 26, 5360–5369. [10]
*Bland M (2000) An Introduction to Medical Statistics (3rd edn). Oxford University Press, Oxford. [1]
Böhning D, Dietz E, Schlattmann P, Mendonça L and Kirchner U (1999) The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society, Series A, 162, 195–209. [5]
Boos D and Stefanski L (2010) Efron’s bootstrap. Significance, 7, 186–188. [5]
Breiman L, Friedman JH, Olshen RA and Stone CJ (1984) Classification and Regression Trees. Wadsworth & Brooks/Cole Advanced Books and Software, Monterey, California. [9]
Busse WW, Lemanske Jr RF and Gern JE (2010) Role of viral infections in asthma exacerbations. Lancet, 376, 826–834. [1, 2, 7, 11]
*Campbell MJ (2006) Statistics at Square Two: Understanding Modern Statistical Applications in Medicine (2nd edn). Blackwell BMJ Books, Oxford. [1]
*Campbell MJ, Machin D and Walters SJ (2007) Medical Statistics: A Commonsense Approach: A Text Book for the Health Sciences (4th edn). Wiley, Chichester. [1, 5]
Cassimally KA (2011) We come from one. Significance, 8, 19–21. [3]
Chia B-H, Chia A, Ng W-Y and Tai B-C (2010) Suicide trends in Singapore: 1955–2004. Archives of Suicide Research, 14, 276–283. [5]
Chinnaiya A, Venkat A, Chia D, Chee WY, Choo KB, Gole LA and Meng CT (1998) Intrahepatic vein fetal sampling: Current role in prenatal diagnosis. Journal of Obstetrics and Gynaecology, 24, 239–246. [1, 4, 7]
Chong S-A, Tay JAM, Subramaniam M, Pek E and Machin D (2009) Mortality rates among patients with schizophrenia and tardive dyskinesia. Journal of Clinical Psychopharmacology, 29, 5–8. [1, 5, 6, 7, 9, 11]
*Clayton D and Hills M (1993) Statistical Models in Epidemiology. Oxford University Press, Oxford. [1]
Cnattingius S, Hultman CM, Dahl M and Sparén P (1999) Very preterm birth, birth trauma, and the risk of anorexia nervosa among girls. Archives of General Psychiatry, 56, 634–638. [4]
Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences (2nd edn). Lawrence Erlbaum, New Jersey. [7]
Cohn SL, Pearson ADJ, London WB, Monclair T, Ambros PF, Brodeur GM, Faldum A, Hero B, Iehara T, Machin D, Mosseri V, Simon T, Garaventa A, Castel V and Matthay KK (2009) The International Neuroblastoma Risk Group (INRG) Classification System: An INRG Task Force Report. Journal of Clinical Oncology, 27, 289–297. [1, 9]
*Collett D (2002) Modelling Binary Data (2nd edn). Chapman and Hall/CRC, London. [1]
*Collett D (2003) Modelling Survival Data in Medical Research (2nd edn). Chapman and Hall/CRC, London. [1]
Contoli M, Message SD, Laza-Stanca V, Edwards MR, Wark PA, Bartlett NW, Kebadze T, Malia P, Stanciu LA, Parker HL, Slater L, Lewis-Antes A, Kon OM, Holgate ST, Davies DE, Kotenko SV, Papi A and Johnston SL (2006) Role of deficient type III interferon-lambda production in asthma exacerbations. Nature Medicine, 12, 1023–1026. [1]
Cox DR (1972) Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–220. [6]
Diggle PJ (1990) Time Series: A Biostatistical Introduction. Oxford University Press, Oxford. [8]
*Diggle PJ, Liang K-Y and Zeger SL (1994) Analysis of Longitudinal Data. Oxford Science Publications, Clarendon Press, Oxford. [1, 8]
*Dobson AJ and Barnett AG (2008) Introduction to Generalized Linear Models (3rd edn). Chapman and Hall/CRC, London. [1]
Everitt BS (2003) Modern Medical Statistics: A Practical Guide. Arnold Publishers, London. [9]
*Everitt BS and Rabe-Hesketh S (2006) A Handbook of Statistical Analysis using Stata (4th edn). Chapman and Hall/CRC, London. [1]
Feinstein AR (1996) Multivariable Analysis: An Introduction. Yale University Press, New Haven and London. [9]
Fine JP and Gray RJ (1999) A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94, 496–509. [10]
*Freeman JV, Walters SJ and Campbell MJ (2008) How to Display Data. BMJ Books, Blackwell Publishing, Oxford. [1]
Frison L and Pocock SJ (1992) Repeated measures in clinical trials: analysing mean summary statistics and its implications for design. Statistics in Medicine, 11, 1685–1704. [11]
Grundy RG, Wilne SH, Robinson KJ, Ironside JW, Cox T, Chong WK, Michalski A, Campbell RHA, Bailey CC, Thorpe N, Pizer B, Punt J, Walker DA, Ellison DW and Machin D (2010) Primary postoperative chemotherapy without radiotherapy for treatment of brain tumours other than ependymoma in children under 3 years: Results of the first UKCCSG/SIOP CNS 9204 trial. European Journal of Cancer, 46, 120–133. [10]
Grundy RG, Wilne SH, Weston CL, Robinson K, Lashford LS, Ironside J, Cox T, Chong WK, Campbell RHA, Bailey CC, Gattamaneni R, Picton S, Thorpe N, Mallucci C, English MW, Punt JAG, Walker DA, Ellison DW and Machin D (2007) Primary postoperative chemotherapy without radiotherapy for intracranial ependymoma in children: the UKCCSG/SIOP prospective study. Lancet Oncology, 8, 696–705. [10]
Husain R, Liang S, Foster PJ, Gazzard G, Bunce C, Chew PTK, Oen FTS, Khaw PT, Seah SKL and Aung T (2012) Cataract surgery after trabeculectomy: The effect of trabeculectomy function. Archives of Ophthalmology, 130, 165–170. [10]
ICH E9 Expert Working Group (1999) Statistical principles for clinical trials: ICH Harmonised Tripartite Guideline. Statistics in Medicine, 18, 1905–1942. [7]
Ince D (2011) The Duke University scandal – what can be done? Significance, 8, 113–115. [7]
Jackson DJ, Gangnon RE, Evans MD, Roberg KA, Anderson EL, Pappas TE, Printz MC, Lee W-M, Shult PA, Reisdorf E, Carlson-Dakes KT, Salazar LP, DaSilva DF, Tisler CJ, Gern JE and Lemanske RF (2008) Wheezing rhinovirus illnesses in early life predict asthma development in high-risk children. American Journal of Respiratory and Critical Care Medicine, 178, 667–672. [4, 7]
*Kleinbaum G, Kupper LL, Muller KE and Nizam E (2007) Applied Regression Analysis and Other Multivariable Methods (4th edn). Duxbury Press, Florence, Kentucky. [1]
Korenman S, Goldman N and Fu H (1997) Misclassification bias in estimates of bereavement effects. American Journal of Epidemiology, 145, 995–1002. [10]
LeBlanc M and Crowley J (1993) Survival trees by goodness of split. Journal of the American Statistical Association, 88, 457–467. [9]
Lee CH, Tai BC, Soon CY, Low AF, Poh KK, Yeo TC, Lim GH, Yip J, Omar AR, Teo SG and Tan HC (2010) A new set of intravascular ultrasound-derived anatomical criteria for defining functionally significant stenoses in small coronary arteries: results from Intravascular Ultrasound Diagnostic Evaluation of Atherosclerosis in Singapore (IDEAS) study. American Journal of Cardiology, 105, 1378–1384. [7, 9]
Levitan EB, Yang AZ, Wolk W and Mittleman MA (2009) Adiposity and incidence of heart failure hospitalization and mortality: a population-based prospective study. Circulation: Heart Failure, 2, 203–208. [7]
Levy HL, Milanowski A, Chakrapani A, Cleary M, Lee P, Trefz FK, Whitley CB, Feillet F, Feigenbaum AS, Bebchuk JD, Christ-Schmidt H and Dorenbaum A (2007) Efficacy of sapropterin dihydrochloride (tetrahydrobiopterin, 6R-BH4) for reduction of phenylalanine concentration in patients with phenylketonuria: a phase III randomised placebo-controlled study. Lancet, 370, 504–510. [11]
Little RJ, Cohen ML, Dickersin K, Emerson SS, Farrar JT, Neaton JD, Shih W, Siegel JP and Stern H (2012) The design and conduct of clinical trials to limit missing data. Statistics in Medicine, 31, 3433–3443. [10]
*Machin D and Campbell MJ (2005) Design of Studies for Medical Research. Wiley, Chichester. [1]
*Machin D, Campbell MJ, Tan SB and Tan SH (2009) Sample Size Tables for Clinical Studies (3rd edn). Wiley-Blackwell, Chichester. [1, 7]
*Machin D, Cheung Y-B and Parmar MKB (2006) Survival Analysis: A Practical Approach (2nd edn). Wiley, Chichester. [1]
Maheswaran R, Pearson T, Hoysal N and Campbell MJ (2010) Evaluation of the impact of a health forecast alert service on admissions for chronic obstructive pulmonary disease in Bradford and Airedale. Journal of Public Health, 32, 97–102. [1, 5]
Marshall RJ (2001) The use of classification and regression trees in clinical epidemiology. Journal of Clinical Epidemiology, 54, 603–609. [9]
Marubini E and Valsecchi MG (1995) Analysing Survival Data from Clinical Trials and Observational Studies. Wiley, Chichester. [6]
*Mitchell MN (2012) Interpreting and Visualizing Regression Models Using Stata. Stata Press, College Station, TX. [1]
Nejadnik H, Hui JH, Choong EP-F, Tai B-C and Lee E-H (2010) Autologous bone marrow-derived mesenchymal cells versus autologous chondrocyte implantation: An observational cohort study. American Journal of Sports Medicine, 38, 1110–1116. [8, 11]
Ng DPK, Fukushima M, Tai B-C, Koh D, Leong H, Imura H and Lim XL (2008) Reduced GFR and albuminuria in Chinese type 2 diabetes mellitus patients are both independently associated with activation of the TNF-α system. Diabetologia, 51, 2318–2324. [1, 7]
Pearson ADJ, Pinkerton CR, Lewis IJ, Ellershaw C and Machin D (2008) High-dose rapid and standard induction chemotherapy for patients aged over 1 year with stage 4 neuroblastoma: a randomized trial. Lancet Oncology, 9, 247–256. [6, 7, 10]
Poole-Wilson PA, Uretsky BF, Thygesen K, Cleland JGF, Massie BM and Rydén L (2003) Mode of death in heart failure: findings from the ATLAS trial. Heart, 89, 42–48. [10]
Poon C-Y, Goh B-T, Kim M-J, Rajaseharan A, Ahmed S, Thongprasom K, Chaimusik M, Suresh S, Machin D, Wong H-B and Seldrup J (2006) A randomised controlled trial to compare steroid with cyclosporine for the topical treatment of oral lichen planus. Oral Surgery, Oral Pathology, Oral Radiology and Endodontology, 102, 47–55. [1, 8, 11]
Pregibon D (1981) Logistic regression diagnostics. Annals of Statistics, 9, 705–724. [4]
*Rabe-Hesketh S and Skrondal A (2008) Multilevel and Longitudinal Modeling Using Stata (2nd edn). Stata Press, College Station, TX. [1, 8]
Richards SH, Bankhead C, Peters TJ, Austoker J, Hobbs FDR, Brown J, Tydeman C, Roberts L, Formby J, Redman V, Wilson S and Sharp DJ (2001) Cluster randomized controlled trial comparing the effectiveness and cost-effectiveness of two primary care interventions aimed at improving attendance for breast screening. Journal of Medical Screening, 8, 91–98. [11]
Rothman KJ (2002) Epidemiology: An Introduction. Oxford University Press, New York, page 194. [7]
Royston P, Reitz M and Atzpodien J (2006) An approach to estimating prognosis using fractional polynomials in metastatic renal carcinoma. British Journal of Cancer, 94, 1785–1788. [11]
Sackley CM, van den Berg ME, Lett K, Patel S, Hollands K, Wright CC and Hoppitt TJ (2009) Effects of a physiotherapy and occupational therapy intervention on mobility and activity in care home residents: a cluster randomized controlled trial. British Medical Journal, 339, b3123 (published 1 September 2009). doi: 10.1136/bmj.b3123 [11]
SAS Institute (2012) SAS Enterprise Miner (EM): Version 12.1. SAS Institute, Cary, NC. [9]
Schoenfeld D (1982) Partial residuals for the proportional hazards regression model. Biometrika, 69, 239–241. [6]
Sridhar T, Gore A, Boiangiu I, Machin D and Symonds RP (2009) Concomitant (without adjuvant) temozolomide and radiation to treat glioblastoma: A retrospective study. Clinical Oncology, 21, 19–22. [6, 7, 10]
StataCorp (2007a) Stata Base Reference Manual, Volume 1: A-H, Release 10. College Station, TX. [11]
StataCorp (2007b) Stata Base Reference Manual, Volume 2: I-P, Release 10. College Station, TX. [4]
StataCorp (2007c) Stata Base Reference Manual, Volume 2: Q-Z, Release 10. College Station, TX. [7]
*Swinscow TV and Campbell MJ (2002) Statistics at Square One (10th edn). Blackwell, BMJ Books, Oxford. [1]
Tai BC, Grundy R and Machin D (2011) On the importance of accounting for competing risks in paediatric cancer trials designed to delay or avoid radiotherapy: II. Adjustment for covariates and sample size issues. International Journal of Radiation Oncology, Biology and Physics, 79, 1139–1146. [10]
Tai B-C, Peregoudov A and Machin D (2001) A competing risk approach to the analysis of trials of alternative intrauterine devices (IUDs) for fertility regulation. Statistics in Medicine, 20, 3589–3600. [10]
Tan C-K, Law N-M, Ng H-S and Machin D (2003) Simple clinical prognostic model for hepatocellular carcinoma in developing countries and its validation. Journal of Clinical Oncology, 21, 2294–2298. [7, 9]
Therneau T and Atkinson E (2011) An introduction to recursive partitioning using the RPART routines. Technical report, Mayo Foundation, Rochester. http://CRAN.R-project.org/package=rpart [9]
Viardot-Foucault V, Prasath EB, Tai B-C, Chan JKY and Loh SF (2011) Predictive factors of success for Intra-Uterine Insemination: a single centre retrospective study. Human Reproduction, 26 (Suppl 1), 1232–1233. [5, 8]
Vuong Q (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–334. [5]
Wee J, Tan E-H, Tai B-C, Wong H-B, Leong S-S, Tan T, Chua E-T, Yang E, Lee K-M, Fong K-W, Tan HSK, Lee K-S, Loong S, Sethi V, Chua E-J and Machin D (2005) Randomized trial of radiotherapy versus concurrent chemoradiotherapy followed by adjuvant chemotherapy in patients with American Joint Committee on Cancer/International Union against cancer stage III and IV nasopharyngeal cancer of the endemic variety. Journal of Clinical Oncology, 23, 6730–6738. [1, 7, 10]
Whitehead J (1993) Sample size calculations for ordered categorical data. Statistics in Medicine, 12, 2257–2272. [4]
Wight J, Jakubovic M, Walters S, Maheswaran R, White P and Lennon V (2004) Variation in cadaveric organ donor rates in the UK. Nephrology Dialysis Transplantation, 19, 963–968. [5]
Young SS and Karr A (2011) Deming, data and observational studies: A process out of control and needing fixing. Significance, 8, 116–119. [7]
Zhang H, Crowley J, Sox HC and Olshen RA (1999) Tree-structured statistical methods. In Armitage P and Colton T (eds) Encyclopedia of Biostatistics, 6, 4561–4573. [9]
Zhang H, Holford T and Bracken MB (1996) A tree-based method of analysis for prospective studies. Statistics in Medicine, 15, 37–49. [9]

Regression Methods for Medical Research, First Edition. Bee Choo Tai and David Machin. © 2014 Bee Choo Tai and David Machin. Published 2014 by John Wiley & Sons, Ltd.

Index
adjacent values  27
adjusted analysis  172
Akaike Information Criterion (AIC)  57, 58, 63, 110, 149, 170
alternative hypothesis  31, 96, 166
Analysis of Variance (ANOVA)  6, 20–1, 42, 45, 47, 48, 52, 54, 62, 116, 177
anorexia  89–92
association  9, 38, 45, 160, 162, 170, 183, 269, 272
  linear  22, 59
asthma  8–9, 36, 64–8, 78–81, 94, 95, 148–9, 165
auto-correlation  22, 177, 178, 183–93, 203, 269, 271–5, 277
  auto-regressive  184
  correlation structure  184, 190, 203
  exchangeable  183–4, 188, 190–93
  independent  183
  uniform  183–4
  unstructured  184–5, 194, 197, 202
Auto-regressive  184
Autologous bone marrow-derived mesenchymal cells (BMSC)  182, 190, 192
Autologous chondrocyte (ACC)  182, 190, 192
backward selection  160
bandwidth  203
Barthel index  277
baseline adjustment  270–1
baseline characteristic  277
baseline hazard  126, 127, 251
before-and-after design  270
between subject variation  193
bias  168, 174–5
binary covariates  28, 79, 80, 86, 127, 136, 146, 151–4, 162, 169, 171, 214, 234
binary outcome  65–6, 83, 93, 205, 207–14
Binomial distribution  95, 116
Binomial models  99–101
body mass index (BMI)  151, 254
Bonferroni correction  169–1
bootstrap  190
box-whisker plot  27, 32
brain tumour  237
branches  15, 204, 206, 224–7, 229
case-control studies
  matched  64, 87, 89–91
  unmatched  89–91
cataract surgery  256–8
categorical covariates  69–75, 214, 244
  ordered  8, 27–8, 32–3, 70–2, 75, 77, 130, 132, 135, 147, 198, 212, 214, 230
  unordered  25, 28–31, 55, 69–70, 75, 86, 130, 132, 198
CART  205
cause-specific rates (CSR)  237–40
censored observation  120–5, 138, 141–3, 167, 237, 239–41, 248, 256, 268
censored survival time  121
chemotherapy  12, 237, 239, 258, 260, 266
change in estimates  156–7
chi-squared (χ²) distribution  96
chronic obstructive pulmonary disease (COPD)  11–12, 98
Classification And Regression Tree  205
classification trees  230–4
clinical significance  40–1
clinical trial  8, 22, 38–40, 120–2, 124, 148, 164, 179, 236–7, 239, 260, 267, 269
cluster  36, 235, 257, 269, 277–81
cluster design  12, 276–81
coefficient of determination (R²)  21
coefficient of variation  275–6
cohort  254
collinearity  59–60
competing risks (CR)  236–7
  adjusting for covariates  243–7
  CuMulative Incidence Rates (CMIR)  240–3
complementary log-log plot  136, 143
complexity parameter (CP)  229
conditional logistic regression  64, 87–92, 105
conditional probability  142
confidence interval  4, 12, 17, 20, 40–1, 61, 68, 74
confounding  157
constant hazard  253
continuous covariate  25, 33–8, 72–7
continuous explanatory variable  146
continuous outcome  65, 202, 214–22, 235
continuous time-varying covariate  258–63
continuous variable  8, 17, 34, 73, 74, 82, 130, 165, 234, 279
correlation  21–2
correlation coefficient  22
correlation matrix  183–4
correlation structure  184
cost complexity  229
covariance  195–7, 199–201, 279–81
covariate pattern  64, 83, 86, 87
covariates  2, 6, 9–10, 17, 21, 38, 43
  binary  28, 79, 80, 86, 127, 136, 146, 151–3, 162, 169, 171, 214, 234
  continuous  25, 33–8, 72–7
  continuous time-varying  258–63
  design  10, 93, 148–9, 157, 265
  discrete time-varying  256–8, 261
  independent variables  25–31
  knowingly influential  148, 149
  linear regression  43–5
  query  149, 165, 170
  time-dependent  141
  see also categorical covariates; time-varying covariates
Cox regression model  12–13, 120, 127–36, 205, 237, 243, 253, 254
  non-proportional hazards  141, 171, 262–7
  proportional hazards  167, 205, 243, 262–7
  stratified Cox model  138–41, 171
  time-varying covariates  254–66
cross-validation  164, 171, 232–3
cumulative death rate (CDR)  128, 129, 133, 134, 221–3, 225, 232, 237
cumulative exposure  109–12
CuMulative Incidence Rates (CMIR)  240–3
cumulative survival probability  141
cut-points  132, 147, 164, 205–9, 211–18, 230
cyclosporine  13, 14
database format  178–82
  long format  181, 197, 202, 256, 261
  wide format  181, 202, 256
daughter node  205–7, 209, 212–15, 218, 219, 221–7, 230, 235
decision rules  204–5
degrees of freedom  16–18, 20–1, 39, 41, 45, 47, 50, 52, 63, 96, 153
dependent variable
design covariates  10, 93, 148–9, 157, 265
deviances  96
diabetes  9–10, 147, 237
discrete time-varying covariate  256–9, 261
distant metastasis  236–9, 242–7, 267
dummy variables  29, 31
effect size  166, 167
endpoint  72, 99, 120–2, 147–51, 153–4, 161, 167, 168, 174, 181, 183, 208, 213, 237, 254, 269–70, 274, 276, 278, 281
  binary  92, 147, 162, 196–202, 207, 217
  continuous  165, 197, 207, 216, 230
entropy criterion  208
event  10, 12, 48, 65, 87, 95, 98–9, 101, 116, 120, 121, 165, 167, 236–41, 243
event-free-survival (EFS)  15, 172, 204, 237
Events Per candidate coVariate (EPV)  165
Exponential constant (e)  22–4
Exponential distribution  248, 251, 253, 254
Exponential survival function  248
F-distribution  21, 31, 62
F-statistic  20
F-test  31, 41–2, 62
fetal sampling  10, 94
fixed-effects models  185–93, 202
forced expiratory volume (FEV)  8, 22, 148–9, 165, 273–4
forced selection  155–7
forward selection  160
fractional flow reserve (FFR)  162, 205, 206, 230
fractional polynomials  282–4
Generalized Estimating Equation (GEE)  187, 202
geometric mean  23–4
Gini information index  235
glioblastoma  123–9, 142–5, 149, 249–53
global test  31, 41–2, 138, 139, 264
goodness-of-fit  87
hazard function  142
hazard rate  125–6
hazard ratio (HR)  12, 127, 171
hepatocellular carcinoma (HCC)  162, 171, 205
hierarchical models  193
hierarchical selection  157–9
  hierarchical backward  157
  hierarchical forward  158–9
high density lipoprotein (HDL)  2–8, 17–20, 22, 25–38, 40, 41, 43, 254, 282–4
histogram  2
hypothesis  31, 74
  null  2–3, 12, 16, 18, 20–2, 27, 31, 33, 39, 41, 50–5, 62–3, 65, 68, 73, 74, 80, 96, 104, 108, 127, 133, 138, 148, 152, 157, 166, 169, 202, 244, 247, 251, 258, 271, 276
human rhinovirus (HRV)  8–9, 64–7, 79, 81, 148
impurity  208
incidence-rate ratio  100
independent variables  25–31, 44
  see also covariates
indicator variables  29, 31
influential observations  36, 38
information criterion  208
interactions  148, 169, 205, 234–5, 264, 282
  logistic regression  80–2
  multiple linear regression  53–9, 61
intercept  43
interferon-λ  8, 37, 149, 165, 276
intermediate node  206, 212–14, 224–6, 229
intra-class correlation (ICC)  277–8
ischaemic heart disease (IHD)  69–77, 81–7, 148, 151–3, 155–8
intra-uterine device (IUD)  237–9, 242–3
intra-uterine insemination (IUI)  99–101
jackknife  105, 106
jittering  27
Kaplan-Meier estimates  12, 240
Kaplan-Meier (K-M) survival curves  123–5, 237
knowingly influential covariate  148, 149
lesion length  162, 205–6, 210, 211, 213, 214
leverage  83–4, 86
likelihood ratio  68, 96
linear regression  4–7, 17–18, 25–42
  assumptions  32–8
  independent variables  25–31
  ordered Normal scores  42
  residuals  33–9, 41, 42
link function  99
local maxima  211, 232
local recurrence  236–47, 267
log hazard ratio  138
log likelihood  96
log link function  99
log transformation  23–4
logarithms  22–4
logistic regression  64–97
  categorical covariates  69–72
  conditional  64, 87–92, 105
  continuous covariates  72–7
  goodness-of-fit  87
  interactions  80–1
  lack of important covariate  82
  logit transformation  64–8
  model checking  81–7
  ordered  64, 92–4, 150
logit models  119
logit transformation  64–8
logrank test  220
long format  181, 197, 202, 256, 261
longitudinal studies  176–82, 191, 260
  auto-correlation  177, 178
  before-and-after design  270
  database format  178–82
  time series  177–8
  repeated measures  13, 176–82
Locally Weighted Scatterplot Smoothing (Lowess) curve  187, 203
Lowess  187, 203
matched case-control studies  64, 89, 90, 91
maximum likelihood estimate (MLE)  96, 118–19
mean  1, 20, 29–32, 47, 51, 65, 87, 95, 103–5, 116, 175, 203, 216–19, 271, 275–6
  comparing two  2–5, 17, 18
  geometric  23–4
median  15, 23–4, 27, 32, 150, 151, 185, 190, 275
minimum lumen area  162, 205–6, 210, 211, 213, 217, 218
misclassification costs  226–8
missing covariate values  233–4
missing data  168–9
misspecified model  104
mixed-effects models  193–202
modeling a difference  271–3
modeling a ratio  273–4
multi-level models  278–81
multiple comparisons  169
multiple linear regression  43–63
  adequacy of fit  45–8
  assumptions  61
  collinearity  59–60
  interactions  53–8
  nested models  51, 53, 57–9, 61–3
  non-nested models  58
  parsimonious models  61
multiple logistic regression  78–81
multiple modeling  170
multiple tests  169–70, 173
multivariable model  16, 47, 148, 167, 170
nasopharyngeal cancer  12–13, 171–2, 237, 244, 246, 262, 264, 265, 267
natural logarithm  116
nested models  51, 53, 57–9, 61–3
neuroblastoma  14, 15, 140, 173, 204, 266
non-informative censoring  124
non-linearity  33, 43, 49, 51, 61, 148, 176, 282
non-nested models  58
non-parametric  250
non-proportional hazards  141, 172, 267–9
Normal distribution  2, 3,
null hypothesis  2–3
null model  41, 58, 96, 102, 104, 105, 108, 148, 152, 157, 160
numerically discrete covariates  28
odds  10, 64, 65, 73, 94, 202
odds ratio (OR)  64–5, 92, 94–5
offset  108
optimal split  220
optimal subtree  229
optimum cut-point  208, 217
oral lichen planus (OLP)  13–14, 22, 34, 38, 176, 177, 179, 282
ordered categorical covariates  8, 27–8, 32–3, 70–2, 75, 77, 130, 132, 135, 148, 202, 212, 214, 230
ordered logistic regression  64, 92–4, 150
ordinary least-squares  17, 178, 187, 278
outlier  27, 36–8, 41, 86, 168, 276
over-dispersion  98, 100, 103–5
overall survival  12, 204, 249, 250, 260, 264–6
p-value  104, 114, 127, 133, 138, 141, 149, 152, 155, 157, 160, 164, 168, 169, 173
parallel lines model  146
parametric models  247–54
  Exponential distribution  248–51, 253, 254
  Weibull distribution  248–51, 253, 254
parsimonious models  61
  change in estimates  156–7
  covariates  147–51
  data checking  167–8
  forced selection  155–6
  grouping terms  159–60
  hierarchical selection  157–9
  missing data  168–9
  multiple modeling  170
  multiple tests  169–70
  prognostic index (PI)  161–4
  selection and stepwise  160
  stratification  171–2
  subgroup analysis  172–4
Pearson statistic  86, 90, 104
Pearson’s residuals  84
percentile  27
plaque burden  162, 205–7, 209–10, 212, 213, 217, 220
Poisson distribution  98, 99, 103–5, 112, 116–18
Poisson regression  98–119
  cumulative exposure  109–12
  over-dispersion  103–5
  population size at risk  101–3, 105–9
  residuals  114–15
  robust estimates  98, 105, 190, 203
  zero-inflated models  112–14
polynomial model  51, 282, 283
population-averaged models  187, 188, 192, 193, 202–3
population parameters  17, 185
population size at risk
  known  105–9
  unknown  101–3
Prais-Watson method  177
predicted value  33, 34, 136, 284
pregnancy  99–102, 104, 196–203, 237, 238, 242
Pregibon leverage  83
primary splits  215, 219, 221, 231, 234
probability  3, 42, 65, 66, 83–7, 95, 99, 118, 141, 142, 248
prognostic index (PI)  161–4, 234–5
prognostic model  164, 171, 205
proportional hazards (PH)  127, 136–41
proportional odds  94
Q-Q plot  35–6
quadratic models  48–51, 148
quantile  35, 36, 62, 114, 115
query covariates  149, 165, 170
R routine  219
radiotherapy  12, 237
random intercept  195–7, 199
random-effects model  176, 194–6
random-effects slope  195–7, 199
random variation
rank  27, 42, 121, 122, 141, 144–5, 197, 206–9, 213, 214, 216, 219, 240
recursive partitioning  206
regression coefficient  1, 5, 10, 12, 17, 30–2, 38, 40, 45, 51–5, 57, 59, 61–3, 66–74, 77, 80, 82–3, 88, 89, 99, 105, 110, 113, 126–9, 132, 133, 135–8, 143, 149, 152, 156, 163–5, 168, 170, 177, 183, 185, 190–1, 202, 203, 254, 265, 271, 279
regression models  234–5
regression tree  204–35
relative risk (RR)  94–5
repeated measures  13, 176–203
residual variation
residuals  143–5
  linear regression  33–9, 41, 42
  Pearson’s  83
  Q-Q plot  35–6
  Schoenfeld  138, 143–4
  standardized  64, 115, 116
Rivermead mobility index  277
robust estimates  90
Root Mean Square Error (MSE)  177
  of ANOVA table  51
root node  204, 206, 208, 209, 212, 215, 216, 219, 226, 229
rpart  213, 235
sampling with replacement  105
sandwich estimate  105
SAS  235
scale parameter  248
scatter plot  5–7, 19, 27, 28, 32–9, 41, 49, 59, 60, 62, 72, 83, 86, 139, 145, 165, 177, 185–7, 203, 207, 209–14, 216–20, 272, 274
schizophrenia  128–39, 155, 233, 237
Schoenfeld residuals  138, 143–4
serial-correlation  183–5
  see also auto-correlation
SF-36 quality-of-life  179
shape parameter  248
significance level  157, 160, 169
significance test  2, 169, 170
skewed distribution  23, 118, 147
slope  8, 18, 22, 23, 41, 43, 48, 53–4, 59, 61, 165, 190, 194–200
smoothed hazard function  253
‘split-sample’ analysis  171
standard deviation  1–3, 6, 16, 17, 20, 29, 33, 38, 39, 42, 51, 83, 87, 95, 105, 116, 166, 167, 177, 183, 185, 194, 196, 272, 273, 277, 278
standard error  39
standard Normal distribution
standardized difference  166
standardised residuals  64, 115, 116
Stata  15, 27, 31, 66, 83, 105, 154, 187, 213, 235, 282
statistical significance  2, 3, 17, 18, 40–1, 45, 146, 157, 169, 251, 258
stepwise methods  160
steroid  13, 14, 176
stratification  171–2
stratified Cox model  138–41, 171
study size  165–7
sub-distribution  243, 247
sub-distribution hazard ratio  243–6
sub-trees  225–30, 233
subgroup analysis  172–4
subject-specific models  202–3
suicide  101–9, 112–14
surrogate split  233, 234
t-distribution  22
t-test  3, 16–17, 39, 45
tardive dyskinesia  109–11, 128–32, 134, 135, 137, 139, 148, 155, 220, 223, 224, 232, 278, 280
terminal node  15, 204, 205, 206, 212, 219, 221–32
test statistic  45
time-dependent covariates  141
time-series  177
time-to-event data  120–2
time-to-event outcome  220–3
time-varying covariates  141, 254–67
  continuous  258–62
  discrete  255–8
TNF-α system  9–10, 147
total sum of squares (SS)  17, 46, 57, 87, 215
total variation  20, 45, 47, 50, 55, 57
trabeculectomy  255
transformations  149–51
  see also log transformation; logit transformation
tree building  206–24
  binary covariates  214
  binary outcome  207–14
  categorical covariates  214
  continuous covariates  207–9
  continuous outcome  214–19
  cross-validation  232–3
  cut-off points  230
  missing covariate values  233–4
  time-to-event outcome  220–4
tree pruning  224–30
  branches  204, 206, 224–7, 229
  cost complexity  229
  misclassification costs  227
  sub-trees  225–6
  tree homogeneity
  tree quality  226–9
tree homogeneity  228
tree quality  226–7
Type I error  166
Type II error  166
unadjusted analysis  172
unexplained variation  43, 51, 281
univariate model  78, 80, 89, 96, 114
  non-nested model  58
  regression models  59, 60
unmatched case-control studies  89
unordered categorical covariates  25, 28–31, 55, 69–70, 75, 86, 130, 132, 199
validation  233
  cross-validation  164, 171, 232–3
variance  98, 103–5, 116–17, 272, 274
Visual Analog Scale (VAS)  14, 38, 176
Vuong test  113–14
Weibull distribution  248–51, 253, 254
wide format  181, 202, 255
within-cluster standard deviation  277
within-subject correlation  257
z-distribution  47
zero count  105, 112–13, 162
Zero-inflated Poisson (ZiP)  112–14

...
$$s_{\mathrm{Pool}} = \sqrt{\frac{(n_M - 1)s_M^2 + (n_F - 1)s_F^2}{(n_M - 1) + (n_F - 1)}}$$
For the data concerned,
$$s_{\mathrm{Pool}} = \sqrt{\frac{(55 - 1) \times 0.3425^2 + (65 - 1) \times 0.2881^2}{(55 - 1) + (65 - 1)}} = 0.3142.$$
This is then used in equation (1.1), (y − yM) − (µ...

... deaths (%) Live-births Intrahepatic vein (IHV) Percutaneous umbilical cord sampling (PUBS) Cardiocentesis Total (%) 21 11 39 (10.2) 10 21 2 18 (4.7) 30 (7.9) 52 (17.8) 240 20 (28.6) 50 15 (75.0) 87...

... (2002) Statistical Methods in Medical Research (4th edn). Blackwell Science, Oxford.
Bland M (2000) An Introduction to Medical Statistics (3rd edn). Oxford University Press, Oxford.
Campbell MJ (2006)
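
As a numerical check on the pooled standard deviation s_Pool in the excerpt above, here is a minimal Python sketch. It is not code from the book. The function name `pooled_sd` is a hypothetical helper, and the inputs (group sizes 55 and 65, standard deviations 0.3425 and 0.2881 for groups M and F) are read from the garbled excerpt, so treat them as assumptions.

```python
import math


def pooled_sd(n_m: int, s_m: float, n_f: int, s_f: float) -> float:
    """Pooled standard deviation of two independent groups (hypothetical helper).

    Each group's variance s**2 is weighted by its degrees of freedom (n - 1);
    the result is the square root of that weighted average of the variances.
    """
    if n_m < 2 or n_f < 2:
        raise ValueError("each group needs at least two observations")
    numerator = (n_m - 1) * s_m**2 + (n_f - 1) * s_f**2
    denominator = (n_m - 1) + (n_f - 1)
    return math.sqrt(numerator / denominator)


# Inputs assumed from the excerpt: group M (n = 55, s = 0.3425), group F (n = 65, s = 0.2881)
s_pool = pooled_sd(55, 0.3425, 65, 0.2881)
print(round(s_pool, 4))  # 0.3142
```

With equal group sizes the expression reduces to the square root of the mean of the two variances. With unequal sizes, as here, the larger group carries proportionally more weight.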

Posted: 03/09/2021, 23:32