COMMON ERRORS IN STATISTICS (AND HOW TO AVOID THEM)

Phillip I. Good
James W. Hardin

A John Wiley & Sons, Inc., Publication

Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

Library of Congress Cataloging-in-Publication Data:
Good, Phillip I.
Common errors in statistics (and how to avoid them) / Phillip I. Good, James W. Hardin.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-46068-0 (pbk. : acid-free paper)
Statistics. I. Hardin, James W. (James William) II. Title.
QA276.G586 2003
519.5--dc21 2003043279

Printed in the United States of America.

Contents

Preface

PART I FOUNDATIONS

1 Sources of Error
    Prescription
    Fundamental Concepts
    Ad Hoc, Post Hoc Hypotheses

2 Hypotheses: The Why of Your Research
    Prescription
    What Is a Hypothesis?
    Null Hypothesis
    Neyman–Pearson Theory
    Deduction and Induction
    Losses
    Decisions
    To Learn More

3 Collecting Data
    Preparation
    Measuring Devices
    Determining Sample Size
    Fundamental Assumptions
    Experimental Design
    Four Guidelines
    To Learn More

PART II HYPOTHESIS TESTING AND ESTIMATION

4 Estimation
    Prevention
    Desirable and Not-So-Desirable Estimators
    Interval Estimates
    Improved Results
    Summary
    To Learn More

5 Testing Hypotheses: Choosing a Test Statistic
    Comparing Means of Two Populations
    Comparing Variances
    Comparing the Means of K Samples
    Higher-Order Experimental Designs
    Contingency Tables
    Inferior Tests
    Multiple Tests
    Before You Draw Conclusions
    Summary
    To Learn More

6 Strengths and Limitations of Some Miscellaneous Statistical Procedures
    Bootstrap
    Bayesian Methodology
    Meta-Analysis
    Permutation Tests
    To Learn More

7 Reporting Your Results
    Fundamentals
    Tables
    Standard Error
    p Values
    Confidence Intervals
    Recognizing and Reporting Biases
    Reporting Power
    Drawing Conclusions
    Summary
    To Learn More

8 Graphics
    The Soccer Data
    Five Rules for Avoiding Bad Graphics
    One Rule for Correct Usage of Three-Dimensional Graphics
    One Rule for the Misunderstood Pie Chart
    Three Rules for Effective Display of Subgroup Information
    Two Rules for Text Elements in Graphics
    Multidimensional Displays
    Choosing Effective Display Elements
    Choosing Graphical Displays
    Summary
    To Learn More

PART III BUILDING A MODEL

9 Univariate Regression
    Model Selection
    Estimating Coefficients
    Further Considerations
    Summary
    To Learn More

10 Multivariable Regression
    Generalized Linear Models
    Reporting Your Results
    A Conjecture
    Building a Successful Model
    To Learn More

11 Validation
    Methods of Validation
    Measures of Predictive Success
    Long-Term Stability
    To Learn More

Appendix A
Appendix B
Glossary, Grouped by Related but Distinct Terms
Bibliography
Author Index
Subject Index

Preface

One of the very first statistical applications on which Dr. Good worked was an analysis of leukemia cases in Hiroshima, Japan, following World War II; on August 6, 1945, this city was the target site of the first atomic bomb dropped by the United States. Was the high incidence of leukemia cases among survivors the result of exposure to radiation from the atomic bomb? Was there a relationship between the number of leukemia cases and the number of survivors at certain distances from the atomic bomb’s epicenter?
To assist in the analysis, Dr. Good had an electric (not an electronic) calculator, reams of paper on which to write down intermediate results, and a prepublication copy of Scheffé’s Analysis of Variance. The work took several months and the results were somewhat inconclusive, mainly because he could never seem to get the same answer twice, a consequence of errors in transcription rather than the absence of any actual relationship between radiation and leukemia.

Today, of course, we have high-speed computers and prepackaged statistical routines to perform the necessary calculations. Yet statistical software will no more make one a statistician than a scalpel will turn one into a neurosurgeon. Allowing these tools to do our thinking for us is a sure recipe for disaster.

Pressed by management or the need for funding, too many research workers have no choice but to go forward with data analysis regardless of the extent of their statistical training. Alas, while a semester or two of undergraduate statistics may suffice to develop familiarity with the names of some statistical methods, it is not enough to be aware of all the circumstances under which these methods may be applicable.

The purpose of the present text is to provide a mathematically rigorous but readily understandable foundation for statistical procedures. Here for the second time are such basic concepts in statistics as null and alternative