A Handbook of Statistical Analyses Using R
SECOND EDITION

© 2010 by Taylor and Francis Group, LLC
Downloaded by [King Mongkut's Institute of Technology, Ladkrabang] at 01:45 11 September 2014

Brian S. Everitt and Torsten Hothorn

CRC Press
Taylor & Francis Group
Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an informa business
A CHAPMAN & HALL BOOK

Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper
10

International Standard Book Number: 978-1-4200-7933-3 (Paperback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use
material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Everitt, Brian.
A handbook of statistical analyses using R / Brian S. Everitt and Torsten Hothorn. -- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4200-7933-3 (pbk. : alk. paper)
1. Mathematical statistics--Data processing--Handbooks, manuals, etc. 2. R (Computer program language)--Handbooks, manuals, etc. I. Hothorn, Torsten. II. Title.
QA276.45.R3E94 2010
519.50285'5133--dc22
2009018062

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Dedication

To our wives, Mary-Elizabeth and Carolin, for their constant support and encouragement.

Preface to Second Edition

Like the first edition, this book is intended as a guide to data analysis with the R system for statistical computing. New chapters on graphical displays, generalised additive models and simultaneous inference have been added to this second edition. A section on generalised linear mixed models now completes the chapter on the analysis of longitudinal data where the response variable does not have a normal distribution. In addition,
new examples and additional exercises have been added to several chapters. We have also taken the opportunity to correct a number of errors that were present in the first edition. Most of these errors were kindly pointed out to us by a variety of people to whom we are very grateful, especially Guido Schwarzer, Mike Cheung, Tobias Verbeke, Yihui Xie, Lothar Häberle, and Radoslav Harman.

We learnt that many instructors use our book successfully for introductory courses in applied statistics. We have had the pleasure of giving some courses based on the first edition of the book ourselves, and we are happy to share slides covering many sections of particular chapters with our readers. LaTeX sources and PDF versions of slides covering several chapters are available from the second author upon request.

A new version of the HSAUR package, now called HSAUR2 for obvious reasons, is available from CRAN. The package vignettes have been updated to cover the new and modified material as well. Otherwise, the technical infrastructure remains as described in the preface to the first edition, with two small exceptions: names of R add-on packages are now printed in bold font, and we refrain from showing significance stars in model summaries.

Lastly, we would like to thank Thomas Kneib and Achim Zeileis for commenting on the newly added material, and again the CRC Press staff, in particular Rob Calver, for their support during the preparation of this second edition.

Brian S. Everitt and Torsten Hothorn
London and München, April 2009

Preface to First Edition

This book is intended as a guide to data analysis with the R system for statistical computing. R is an environment incorporating an implementation of the S programming language, which is powerful and flexible and has excellent graphical facilities (R Development Core Team, 2009b). In the Handbook we aim
to give relatively brief and straightforward descriptions of how to conduct a range of statistical analyses using R. Each chapter deals with the analysis appropriate for one or several data sets. A brief account of the relevant statistical background is included in each chapter along with appropriate references, but our prime focus is on how to use R and how to interpret results. We hope the book will provide students and researchers in many disciplines with a self-contained means of using R to analyse their data.

R is an open-source project developed by dozens of volunteers for more than ten years now and is available from the Internet under the General Public Licence. R has become the lingua franca of statistical computing. Increasingly, implementations of new statistical methodology first appear as R add-on packages. In some communities, such as bioinformatics, R is already the primary workhorse for statistical analyses. Because the sources of the R system are open and available to everyone without restrictions, and because of its powerful language and graphical capabilities, R has started to become the main computing engine for reproducible statistical research (Leisch, 2002a,b, 2003; Leisch and Rossini, 2003; Gentleman, 2005). For a reproducible piece of research, the original observations, all data preprocessing steps, the statistical analysis and the scientific report form a unity. All of them need to be available for inspection, reproduction and modification by the readers.

Reproducibility is a natural requirement for textbooks such as the Handbook of Statistical Analyses Using R. Therefore, this book is fully reproducible using R version 2.2.1 or later. All analyses and results, including figures and tables, can be reproduced by the reader without having to retype a single line of R code. The data sets presented in this book are collected in a dedicated add-on package called HSAUR accompanying this book. The package can be installed from the
Comprehensive R Archive Network (CRAN) via

R> install.packages("HSAUR")

and its functionality is attached by

R> library("HSAUR")

The relevant parts of each chapter are available as a vignette, basically a document including both the R sources and the rendered output of every analysis contained in the book. For example, the first chapter can be inspected by

R> vignette("Ch_introduction_to_R", package = "HSAUR")

and the R sources are available for reproducing our analyses by

R> edit(vignette("Ch_introduction_to_R", package = "HSAUR"))

An overview of all chapter vignettes included in the package can be obtained from

R> vignette(package = "HSAUR")

We welcome comments on the R package HSAUR. Where we think these add to or improve our analysis of a data set, we will incorporate them into the package and, hopefully at a later stage, into a revised or second edition of the book.

Plots and tables of results obtained from R are all labelled as ‘Figures’ in the text. For the graphical material, the corresponding figure also contains the ‘essence’ of the R code used to produce the figure. This code may differ a little from that given in the HSAUR package, since the latter may include some features, for example thicker line widths, designed to make a basic plot more suitable for publication.

We would like to thank the R Development Core Team for the R system, and the authors of contributed add-on packages, particularly Uwe Ligges and Vince Carey for helpful advice on scatterplot3d and gee. Kurt Hornik, Ludwig A. Hothorn, Fritz Leisch and Rafael Weißbach provided good advice with some statistical and technical problems. We are also very grateful to Achim Zeileis for reading the entire manuscript, pointing out inconsistencies and even bugs, and making many suggestions which have led to improvements. Lastly we would like to thank the CRC Press
staff, in particular Rob Calver, for their support during the preparation of the book. Any errors in the book are, of course, the joint responsibility of the two authors.

Brian S. Everitt and Torsten Hothorn
London and Erlangen, December 2005

List of Figures

1.1 Histograms of the market value and the logarithm of the market value for the companies contained in the Forbes 2000 list
1.2 Raw scatterplot of the logarithms of market value and sales
1.3 Scatterplot with transparent shading of points of the logarithms of market value and sales
1.4 Boxplots of the logarithms of the market value for four selected countries; the width of the boxes is proportional to the square roots of the number of companies
2.1 Histogram (top) and boxplot (bottom) of malignant melanoma mortality rates
2.2 Parallel boxplots of malignant melanoma mortality rates by contiguity to an ocean
2.3 Estimated densities of malignant melanoma mortality rates by contiguity to an ocean
2.4 Scatterplot of malignant melanoma mortality rates by geographical location
2.5 Scatterplot of malignant melanoma mortality rates against latitude
2.6 Bar chart of happiness
2.7 Spineplot of health status and happiness
2.8 Spinogram (left) and conditional density plot (right) of happiness depending on log-income
3.1 Boxplots of estimates of room width in feet and metres (after conversion to feet) and normal probability plots of estimates of room width made in feet and in metres
3.2 R output of the independent samples t-test for the roomwidth data
3.3 R output of the independent samples Welch test for the roomwidth data
3.4 R output of the Wilcoxon rank sum test for the roomwidth data
3.5 Boxplot and normal probability plot for differences between the two mooring methods
3.6 R output of the paired t-test for the waves data
3.7 R output of the Wilcoxon signed rank test for the waves data
3.8 Enhanced scatterplot of water hardness and mortality, showing both the joint and the marginal distributions and, in addition, the location of the city by different plotting symbols
3.9 R output of Pearson's correlation coefficient for the water data
3.10 R output of the chi-squared test for the pistonrings data
3.11 Association plot of the residuals for the pistonrings data
3.12 R output of McNemar's test for the rearrests data
3.13 R output of an exact version of McNemar's test for the rearrests data computed via a binomial test
4.1 An approximation for the conditional distribution of the difference of mean roomwidth estimates in the feet and metres group under the null hypothesis. The vertical lines show the negative and positive absolute value of the test statistic T obtained from the original data
4.2 R output of the exact permutation test applied to the roomwidth data
4.3 R output of the exact conditional Wilcoxon rank sum test applied to the roomwidth data
4.4 R output of Fisher's exact test for the suicides data
5.1 Plot of mean weight gain for each level of the two factors
5.2 R output of the ANOVA fit for the weightgain data
5.3 Interaction plot of type and source
5.4 Plot of mean litter weight for each level of the two factors for the foster data
5.5 Graphical presentation of multiple comparison results for the foster feeding data
5.6 Scatterplot matrix of epoch means for Egyptian skulls data
6.1 Scatterplot of velocity and distance
6.2 Scatterplot of velocity and distance with estimated regression line (left) and plot of residuals against fitted values (right)
6.3 Boxplots of rainfall
6.4 Scatterplots of rainfall against the continuous covariates
6.5 R output of the linear model fit for the clouds data
6.6 Regression relationship between S-Ne criterion and rainfall with and without seeding
6.7 Plot of residuals against fitted values for clouds seeding data
6.8 Normal probability plot of residuals from cloud seeding model clouds_lm
6.9 Index plot of Cook's distances for cloud seeding data
7.1 Conditional density plots of the erythrocyte sedimentation rate (ESR) given fibrinogen and globulin
7.2 R output of the summary method for the logistic regression model fitted to ESR and fibrinogen
7.3 R output of the summary method for the logistic regression model fitted to ESR and both globulin and fibrinogen
7.4 Bubbleplot of fitted values for a logistic regression model fitted to the plasma data
7.5 R output of the summary method for the logistic regression model fitted to the womensrole data
7.6 Fitted (from womensrole_glm_1) and observed probabilities of agreeing for the womensrole data
7.7 R output of the summary method for the logistic regression model fitted to the womensrole data
7.8 Fitted (from womensrole_glm_2) and observed probabilities of agreeing for the womensrole data
7.9 Plot of deviance residuals from logistic regression model fitted to the womensrole data
7.10 R output of the summary method for the Poisson regression model fitted to the polyps data
7.11 R output of the print method for the conditional logistic regression model fitted to the backpain data
8.1 Three commonly used kernel functions
8.2 Kernel estimate showing the contributions of Gaussian kernels evaluated for the individual observations with bandwidth h = 0.4
8.3 Epanechnikov kernel for a grid between (−1.1, −1.1) and (1.1, 1.1)
8.4 Density estimates of the geyser eruption data imposed on a histogram of the data
8.5 A contour plot of the bivariate density estimate of the CYGOB1 data, i.e., a two-dimensional graphical display for a three-dimensional problem
8.6 The bivariate density estimate of the CYGOB1 data, here shown in a three-dimensional fashion using the persp function
8.7 Fitted normal density and two-component normal mixture for geyser eruption data
8.8 Bootstrap distribution and confidence intervals for the mean estimates of a two-component mixture for the geyser data
9.1 Initial tree for the body fat data with the distribution of body fat in terminal nodes visualised via boxplots
9.2 Pruned regression tree for body fat data
9.3 Observed and predicted DXA measurements
9.4 Pruned classification tree of the glaucoma data with class distribution in the leaves
9.5 Estimated class probabilities depending on two important variables. The 0.5 cut-off for the estimated glaucoma probability is depicted as a horizontal line. Glaucomatous eyes are plotted as circles and normal eyes as triangles
9.6 Conditional inference tree with the distribution of body fat content shown for each terminal leaf
9.7 Conditional inference tree with the distribution of glaucomatous eyes shown for each terminal leaf
10.1 A linear spline function with knots at a = 1, b = and c =
10.2 Scatterplot of year and winning time
10.3 Scatterplot of year and winning time with fitted values from a simple linear model
10.4 Scatterplot of year and winning time with fitted values from a smooth non-parametric model
10.5 Scatterplot of year and winning time with fitted values from a quadratic model
10.6 Partial contributions of six explanatory covariates to the predicted SO2 concentration
10.7 Residual plot of SO2 concentration
10.8 Spinograms of the three explanatory variables and response variable kyphosis
10.9 Partial contributions of three explanatory variables with confidence bands
11.1 ‘Bath tub’ shape of a hazard function
11.2 Survival times comparing treated and control patients
11.3 Kaplan-Meier estimates for breast cancer patients who either received a hormonal therapy or not
11.4 R output of the summary method for GBSG2_coxph
11.5 Estimated regression coefficient for age depending on time for the GBSG2 data
11.6 Martingale residuals for the GBSG2 data
11.7 Conditional inference tree for the GBSG2 data with the survival function, estimated by Kaplan-Meier, shown for every subgroup of patients identified by the tree
12.1 Boxplots for the repeated measures by treatment group for the BtheB data
12.2 R output of the linear mixed-effects model fit for the BtheB data
12.3 R output of the asymptotic p-values for linear mixed-effects model fit for the BtheB data
12.4 Quantile-quantile plots of predicted random intercepts and residuals for the random intercept model BtheB_lmer1 fitted to the BtheB data
12.5 Distribution of BDI values for patients that do (circles) and do not (bullets) attend the next scheduled visit
13.1 Simulation of a positive response in a random intercept logistic regression model for 20 subjects. The thick line is the average over all 20 subjects
13.2 R output of the summary method for the btb_gee model (slightly abbreviated)
13.3 R output of the summary method for the btb_gee1 model (slightly abbreviated)
13.4 R output of the summary method for the resp_glm model
13.5 R output of the summary method for the resp_gee1 model (slightly abbreviated)
13.6 R output of the summary method for the resp_gee2 model (slightly abbreviated)
13.7 Boxplots of numbers of seizures in each two-week period post randomisation for placebo and active treatments
13.8 Boxplots of log of numbers of seizures in each two-week period post randomisation for placebo and active treatments
13.9 R output of the summary method for the epilepsy_glm model
13.10 R output of the summary method for the epilepsy_gee1 model (slightly abbreviated)
13.11 R output of the summary method for the epilepsy_gee2 model (slightly abbreviated)
13.12 R output of the summary method for the epilepsy_gee3 model (slightly abbreviated)
13.13 R output of the summary method for the resp_lmer model (abbreviated)
14.1 Distribution of levels of expressed alpha synuclein mRNA in three groups defined by the NACP-REP1 allele lengths
14.2 Simultaneous confidence intervals for the alpha data based on the ordinary covariance matrix (left) and a sandwich estimator (right)
14.3 Probability of damage caused by roe deer browsing for six tree species. Sample sizes are given in brackets
14.4 Regression relationship between S-Ne criterion and rainfall with and without seeding. The confidence bands cover the area within the dashed curves
15.1 R output of the summary method for smokingOR
15.2 Forest plot of observed effect sizes and 95% confidence intervals for the nicotine gum studies
15.3 R output of the summary method for BCG_OR
15.4 R output of the summary method for BCG_DSL
15.5 R output of the summary method for BCG_mod
15.6 Plot of observed effect size for the BCG vaccine data against latitude, with a weighted least squares regression fit shown in addition
15.7 Example funnel plots from simulated data. The asymmetry in the lower plot is a hint that publication bias might be a problem
15.8 Funnel plot for nicotine gum data
16.1 Scatterplot matrix for the heptathlon data (all countries)
16.2 Scatterplot matrix for the heptathlon data after removing observations of the PNG competitor
16.3 Barplot of the variances explained by the principal components (with observations for PNG removed)
16.4 Biplot of the (scaled) first two principal components (with observations for PNG removed)
16.5 Scatterplot of the score assigned to each athlete in 1988 and the first principal component
17.1 Two-dimensional solution from classical multidimensional scaling of distance matrix for water vole populations
17.2 Minimum spanning tree for the watervoles data
17.3 Two-dimensional solution from non-metric multidimensional scaling of distance matrix for voting matrix
17.4 The Shepard diagram for the voting data shows some discrepancies between the original dissimilarities and the multidimensional scaling solution
18.1 Bivariate data showing the presence of three clusters
18.2 Example of a dendrogram
18.3 Darwin's Tree of Life
18.4 Image plot of the dissimilarity matrix of the pottery data
18.5 Hierarchical clustering of pottery data and resulting dendrograms
18.6 3D scatterplot of the logarithms of the three variables available for each of the exoplanets
18.7 Within-cluster sum of squares for different numbers of clusters for the exoplanet data
18.8 Plot of BIC values for a variety of models and a range of number of clusters
18.9 Scatterplot matrix of planets data showing a three-cluster solution from Mclust
18.10 3D scatterplot of planets data showing a three-cluster solution from Mclust

List of Tables

2.1 USmelanoma data. USA mortality rates for white males due to malignant melanoma
2.2 CHFLS data. Chinese Health and Family Life Survey
2.3 household data. Household expenditure for single men and women
2.4 suicides2 data. Mortality rates per 100,000 from male suicides
2.5 USstates data. Socio-demographic variables for ten US states
2.6 banknote data (package alr3). Swiss bank note data
3.1 roomwidth data. Room width estimates (width) in feet and in metres (unit)
3.2 waves data. Bending stress (root mean squared bending moment in Newton metres) for two mooring methods in a wave energy experiment
3.3 water data. Mortality (per 100,000 males per year, mortality) and water hardness for 61 cities in England and Wales
3.4 pistonrings data. Number of piston ring failures for three legs of four compressors
3.5 rearrests data. Rearrests of juvenile felons by type of court in which they were tried
3.6 The general r × c table
3.7 Frequencies in matched samples data
4.1 suicides data. Crowd behaviour at threatened suicides
4.2 Classification system for the response variable
4.3 Lanza data. Misoprostol randomised clinical trial from Lanza (1987)
4.4 Lanza data. Misoprostol randomised clinical trial from Lanza et al. (1988a)
4.5 Lanza data. Misoprostol randomised clinical trial from Lanza et al. (1988b)
4.6 Lanza data. Misoprostol randomised clinical trial from Lanza et al. (1989)
4.7 anomalies data. Abnormalities of the face and digits of newborn infants exposed to antiepileptic drugs as assessed by a paediatrician (MD) and a research assistant (RA)
4.8 orallesions data. Oral lesions found in house-to-house surveys in three geographic regions of rural India
5.1 weightgain data. Rat weight gain for diets differing by the amount of protein (type) and source of protein (source)
5.2 foster data. Foster feeding experiment for rats with different genotypes of the litter (litgen) and mother (motgen)
5.3 skulls data. Measurements of four variables taken from Egyptian skulls of five periods
5.4 schooldays data. Days absent from school
5.5 students data. Treatment and results of two tests in three groups of students
6.1 hubble data. Distance and velocity for 24 galaxies
6.2 clouds data. Cloud seeding experiments in Florida – see above for explanations of the variables
6.3 Analysis of variance table for the multiple linear regression model
7.1 plasma data. Blood plasma data
7.2 womensrole data. Women's role in society data
7.3 polyps data. Number of polyps for two treatment arms
7.4 backpain data. Number of drivers (D) and non-drivers (D̄), suburban (S) and city inhabitants (S̄) either suffering from a herniated disc (cases) or not (controls)
7.5 bladdercancer data. Number of recurrent tumours for bladder cancer patients
7.6 leuk data (package MASS). Survival times of patients suffering from leukemia
8.1 faithful data (package datasets). Old Faithful geyser waiting times between two eruptions
8.2 CYGOB1 data. Energy output and surface temperature of Star Cluster CYG OB1
8.3 galaxies data (package MASS). Velocities of 82 galaxies
8.4 birthdeathrates data. Birth and death rates for 69 countries
8.5 schizophrenia data. Age of onset of schizophrenia for both sexes
9.1 bodyfat data (package mboost). Body fat prediction by skinfold thickness, circumferences, and bone breadths
10.1 men1500m data. Olympic Games 1896 to 2004 winners of the men's 1500m
10.2 USairpollution data. Air pollution in 41 US cities
10.3 kyphosis data (package rpart). Children who have had corrective spinal surgery
11.1 glioma data. Patients suffering from two types of glioma treated with the standard therapy or a novel radioimmunotherapy (RIT)
11.2 GBSG2 data (package ipred). Randomised clinical trial data from patients suffering from node-positive breast cancer. Only the data of the first 20 patients are shown here
11.3 mastectomy data. Survival times in months after mastectomy of women with breast cancer
12.1 BtheB data. Data of a randomised trial evaluating the effects of Beat the Blues
12.2 phosphate data. Plasma inorganic phosphate levels for various time points after glucose challenge
13.1 respiratory data. Randomised clinical trial data from patients suffering from respiratory illness. Only the data of the first seven patients are shown here
13.2 epilepsy data. Randomised clinical trial data from patients suffering from epilepsy. Only the data of the first seven patients are shown here
13.3 schizophrenia2 data. Clinical trial data from patients suffering from schizophrenia. Only the data of the first four patients are shown here
14.1 alpha data (package coin). Allele length and levels of expressed alpha synuclein mRNA in alcohol-dependent patients
14.2 trees513 data (package multcomp)
15.1 smoking data. Meta-analysis on nicotine gum showing the number of quitters who have been treated (qt), the total number of treated (tt) as well as the number of quitters in the control group (qc) with total number of smokers in the control group (tc)
15.2 BCG data. Meta-analysis on BCG vaccine with the following data: the number of TBC cases after a vaccination with BCG (BCGTB), the total number of people who received BCG (BCG) as well as the number of TBC cases without vaccination (NoVaccTB) and the total number of people in the study without vaccination (NoVacc)
15.4 aspirin data. Meta-analysis on aspirin and myocardial infarct; the table shows the number of deaths after placebo (dp), the total number of subjects treated with placebo (tp) as well as the number of deaths after aspirin (da) and the total number of subjects treated with aspirin (ta)
15.5 toothpaste data. Meta-analysis on trials comparing two toothpastes; the number of individuals in the study, the mean and the standard deviation for each study A and B are shown
16.1 heptathlon data. Results of the Olympic heptathlon, Seoul, 1988
16.2 meteo data. Meteorological measurements in an 11-year period
16.3 Correlations for calculus measurements for the six anterior mandibular teeth
17.1 watervoles data. Water voles data – dissimilarity matrix
17.2 voting data. House of Representatives voting data
17.3 eurodist data (package datasets). Distances between European cities, in km
17.4 gardenflowers data. Dissimilarity matrix of 18 species of gardenflowers
18.1 pottery data. Romano-British pottery data
18.2 planets data. Jupiter mass, period and eccentricity of exoplanets
18.3 Number of possible partitions depending on the sample size n and number of clusters k

Contents

1 An Introduction to R
1.1 What is R?
1.2 Installing R
1.3 Help and Documentation
1.4 Data Objects in R
1.5 Data Import and Export
1.6 Basic Data Manipulation
1.7 Computing with Data
1.8 Organising an Analysis
1.9 Summary

2 Data Analysis Using Graphical Displays
2.1 Introduction
2.2 Initial Data Analysis
2.3 Analysis Using R
2.4 Summary

3 Simple Inference
3.1 Introduction
3.2 Statistical Tests
3.3 Analysis Using R
3.4 Summary

4 Conditional Inference
4.1 Introduction
4.2 Conditional Test Procedures
4.3 Analysis Using R
4.4 Summary

5 Analysis of Variance
5.1 Introduction
5.2 Analysis of Variance
5.3 Analysis Using R
5.4 Summary

6 Simple and Multiple Linear Regression
6.1 Introduction
6.2 Simple Linear Regression
6.3 Multiple Linear Regression
6.4 Analysis Using R
6.5 Summary

7 Logistic Regression and Generalised Linear Models
7.1 Introduction
7.2 Logistic Regression and Generalised Linear Models
7.3 Analysis Using R
7.4 Summary

8 Density Estimation
8.1 Introduction
8.2 Density Estimation
8.3 Analysis Using R
8.4 Summary

9 Recursive Partitioning
9.1 Introduction
9.2 Recursive Partitioning
9.3 Analysis Using R
9.4 Summary

10 Smoothers and Generalised Additive Models
10.1 Introduction
10.2 Smoothers and Generalised Additive Models
10.3 Analysis Using R

11 Survival Analysis
11.1 Introduction
11.2 Survival Analysis
11.3 Analysis Using R
11.4 Summary

12 Analysing Longitudinal Data I
12.1 Introduction
12.2 Analysing Longitudinal Data
12.3 Linear Mixed Effects Models
12.4 Analysis Using R
12.5 Prediction of Random Effects
12.6 The Problem of Dropouts
12.7 Summary

13 Analysing Longitudinal Data II
13.1 Introduction
13.2 Methods for Non-normal Distributions
13.3 Analysis Using R: GEE
13.4 Analysis Using R: Random Effects
13.5 Summary

14 Simultaneous Inference and Multiple Comparisons
14.1 Introduction
14.2 Simultaneous Inference and Multiple Comparisons
14.3 Analysis Using R
14.4 Summary

15 Meta-Analysis
15.1 Introduction
15.2 Systematic Reviews and Meta-Analysis
15.3 Statistics of Meta-Analysis
15.4 Analysis Using R
15.5 Meta-Regression
15.6 Publication Bias
15.7 Summary

16 Principal Component Analysis
16.1 Introduction
16.2 Principal Component Analysis
16.3 Analysis Using R
16.4 Summary

17 Multidimensional Scaling
17.1 Introduction
17.2 Multidimensional Scaling
17.3 Analysis Using R
17.4 Summary

18 Cluster Analysis
18.1 Introduction
18.2 Cluster Analysis
18.3 Analysis Using R
18.4 Summary

Bibliography