A Chronicle of Permutation Statistical Methods
Kenneth J. Berry • Janis E. Johnston • Paul W. Mielke Jr.
A Chronicle of
Permutation Statistical Methods
1920–2000, and Beyond
Colorado State University
Fort Collins, CO
USA
Alexandria, VA
USA
Additional material to this book can be downloaded from http://extra.springer.com
ISBN 978-3-319-02743-2 ISBN 978-3-319-02744-9 (eBook)
DOI 10.1007/978-3-319-02744-9
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2014935885
© Springer International Publishing Switzerland 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
For our families: Nancy T. Berry, Ellen E. Berry, Laura B. Berry, Lindsay A. Johnston, James B. Johnston, Roberta R. Mielke, William W. Mielke, Emily (Mielke) Spear, and Lynn (Mielke) Basila.
Preface

The stimulus for this volume on the historical development of permutation statistical methods from 1920 to 2000 was a 2006 Ph.D. dissertation by the second author on ranching in Colorado in which permutation methods were extensively employed [695]. This was followed by an invited overview paper on permutation statistical methods in Wiley Interdisciplinary Reviews: Computational Statistics by all three authors in 2011 [117]. Although a number of research monographs and textbooks have been published on permutation statistical methods, few have included much historical material, with the notable exception of Edgington and Onghena in the fourth edition of their book on Randomization Tests published in 2007 [396]. In addition, David provided a brief history of the beginnings of permutation statistical methods in a 2008 publication [326], which was preceded by a more technical and detailed description of the structure of permutation tests by Bell and Sen in 1984 [93]. However, none of these sources provides an extensive historical account of the development of permutation statistical methods.
As Stephen Stigler noted in the opening paragraph of his 1999 book Statistics on the Table: The History of Statistical Concepts and Methods:

[s]tatistical concepts are ubiquitous in every province of human thought. They are more likely to be noticed in the sciences, but they also underlie crucial arguments in history, literature, and religion. As a consequence, the history of statistics is broad in scope and rich in diversity, occasionally technical and complicated in structure, and never covered completely [1321, p. 1].
This book emphasizes the historical and social context of permutation statistical methods, as well as the motivation for the development of selected permutation tests. The field is broadly interpreted, and it is notable that many of the early pioneers were major contributors to, and may be best remembered for, work in other disciplines and areas. Many of the early contributors to the development of permutation methods were trained for other professions such as mathematics, economics, agriculture, the military, or chemistry. In more recent times, researchers from atmospheric science, biology, botany, computer science, ecology, epidemiology, environmental health, geology, medicine, psychology, and sociology have made significant contributions to the advancement of permutation statistical methods. Their common characteristic was an interest in, and capacity to use, quantitative methods on problems judged to be important in their respective disciplines.
The purpose of this book is to chronicle the birth and development of permutation statistical methods over the approximately 80-year period from 1920 to 2000. As to what the state of permutation methods will be 80 years in the future, one can only guess. Not even our adult children will live to see the permutation methods of that day. As for ourselves, we have to deal with the present and the past. It is our hope in this writing that knowledge of the past will help the reader to think critically about the present. Those who write intellectual history, as Hayden White maintained, "do not build up knowledge that others might use, they generate a discourse about the past" (White, quoted in Cohen [267, pp. 184–185]). Although the authors are not historians, they are still appreciative of the responsibility historians necessarily assume when trying to accurately, impartially, and objectively interpret the past.

Moreover, the authors are acutely aware of the 1984 Orwellian warning that "Who controls the past controls the future" [1073, p. 19]. The authors are also fully cognizant that there are the records of the past, and then there is the interpretation of those records. The gap between them is a source of concern. As Appleby, Hunt, and Jacob noted in Telling the Truth About History, "[a]t best, the past only dimly corresponds to what the historians say about it" [28, p. 248]. In writing this book, the authors were reminded of the memorable quote by Walter Sellar and Robert Yeatman, the authors of 1066 and All That: A Memorable History of England: "History is not what you thought. It is what you can remember" [1245, p. vii].¹ In researching the development of permutation methods, the authors constantly discovered historical events of which they were not aware, remembered events they thought they had forgotten, and often found what they thought they remembered was incorrect. Debates as to how to present historical information about the development of permutation methods will likely be prompted by this volume. What is not up for debate is the impact that permutation methods have had on contemporary statistical methods. Finally, as researchers who have worked in the field of statistics for many years, the authors fondly recall a sentient quote by Karl Pearson:

I do feel how wrongful it was to work for so many years at statistics and neglect its history [1098, p. 1].

1 Emphasis in the original.
A number of books and articles detailing the history of statistics have been written, but there is little coverage of the historical development of permutation methods. While many of the books and articles have briefly touched on the development of permutation methods, none has been devoted entirely to the topic. Among the many important sources on the history of probability and statistics, a few have served the authors well, being informative, interesting, or both. Among these we count Natural Selection, Heredity and Eugenics: Selected Correspondence of R.A. Fisher with Leonard Darwin and Others and Statistical Inference and Analysis: Selected Correspondence of R.A. Fisher by J.H. Bennett [96, 97]; "A history of statistics in the social sciences" by V. Coven [289]; A History of Inverse Probability from Thomas Bayes to Karl Pearson by A.I. Dale [310]; Games, Gods, and Gambling: The Origin and History of Probability and Statistical Ideas from the Earliest Times to the Newtonian Era by F.N. David [320]; "Behavioral statistics: An historical perspective" by A.L. Dudycha and L.W. Dudycha [361]; "A brief history of statistics in three and one-half chapters" by S.E. Fienberg [428]; The Making of Statisticians edited by J. Gani [493]; The Empire of Chance: How Probability Changed Science and Everyday Life by G. Gigerenzer, Z. Swijtink, T.M. Porter, and L. Daston [512]; The Emergence of Probability and The Taming of Chance by I. Hacking [567, 568]; History of Probability and Statistics and Their Applications Before 1750 and A History of Mathematical Statistics from 1750 to 1930 by A. Hald [571, 572]; the seven-part series "The method of least squares and some alternatives," Parts I–VI and an addendum to Part IV, by H.L. Harter [589–595]; Statisticians of the Centuries edited by C.C. Heyde and E. Seneta [613]; Leading Personalities in Statistical Sciences: From the Seventeenth Century to the Present edited by N.L. Johnson and S. Kotz [691]; and Bibliography of Statistical Literature: 1950–1958, Bibliography of Statistical Literature: 1940–1949, and Bibliography of Statistical Literature: Pre 1940 by M.G. Kendall and A.G. Doig [743–745].
Also, Studies in the History of Statistics and Probability edited by M.G. Kendall and R.L. Plackett [747]; Creative Minds, Charmed Lives: Interviews at Institute for Mathematical Sciences, National University of Singapore edited by L.Y. Kiang [752]; "A bibliography of contingency table literature: 1900 to 1974" by R.A. Killion and D.A. Zahn [754]; The Probabilistic Revolution edited by L. Krüger, L. Daston, and M. Heidelberger [775]; Reminiscences of a Statistician: The Company I Kept and Fisher, Neyman, and the Creation of Classical Statistics by E.L. Lehmann [814, 816]; Statistics in Britain, 1865–1930: The Social Construction of Scientific Knowledge by D. MacKenzie [863]; The History of Statistics in the 17th and 18th Centuries Against the Changing Background of Intellectual, Scientific and Religious Thought edited by E.S. Pearson [1098]; Studies in the History of Statistics and Probability edited by E.S. Pearson and M.G. Kendall [1103]; The Rise of Statistical Thinking, 1820–1900 by T.M. Porter [1141]; Milestones in Computer Science and Information Technology by E.D. Reilly [1162]; The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century by D. Salsburg [1218]; Bibliography of Nonparametric Statistics by I.R. Savage [1225]; Theory of Probability: A Historical Essay by O.B. Sheynin [1263]; American Contributions to Mathematical Statistics in the Nineteenth Century, Volumes 1 and 2, The History of Statistics: The Measurement of Uncertainty Before 1900, and Statistics on the Table: The History of Statistical Concepts and Methods by S.M. Stigler [1318–1321]; Studies in the History of Statistical Method by H.M. Walker [1409]; and the 44 articles published by various authors under the title "Studies in the history of probability and statistics" that appeared in Biometrika between 1955 and 2000.
In addition, the authors have consulted myriad addresses, anthologies, articles, autobiographies, bibliographies, biographies, books, celebrations, chronicles, collections, commentaries, comments, compendiums, compilations, conversations, correspondences, dialogues, discussions, dissertations, documents, essays, eulogies, encyclopedias, festschrifts, histories, letters, manuscripts, memoirs, memorials, obituaries, remembrances, reports, reviews, speeches, summaries, synopses, theses, tributes, web sites, and various other sources on the contributions of individual statisticians to permutation methods, many of which are listed in the references at the end of the book.
No preface to a chronicle of the development of permutation statistical methods would be complete without acknowledging the major contributors to the field, some of whom contributed theory, others methods and algorithms, and still others promoted permutation methods to new audiences. At the risk of slighting someone of importance, in the early years from 1920 to 1939 important contributions were made by Thomas Eden, Ronald Fisher, Roy Geary, Harold Hotelling, Joseph Irwin, Jerzy Neyman, Edwin Olds, Margaret Pabst, Edwin Pitman, Bernard Welch, and Frank Yates. Later, the prominent names were Bernard Babington Smith, George Box, Meyer Dwass, Eugene Edgington, Churchill Eisenhart, Alvan Feinstein, Leon Festinger, David Finney, Gerald Freeman, Milton Friedman, Arthur Ghent, John Haldane, John Halton, Wassily Hoeffding, Lawrence Hubert, Maurice Kendall, Oscar Kempthorne, William Kruskal, Erich Lehmann, Patrick Leslie, Henry Mann, M. Donal McCarthy, Cyrus Mehta, Nitin Patel, Henry Scheffé, Cedric Smith, Charles Spearman, Charles Stein, John Tukey, Abraham Wald, Dirk van der Reyden, W. Allen Wallis, John Whitfield, Donald Whitney, Frank Wilcoxon, Samuel Wilks, and Jacob Wolfowitz. More recently, one should recognize Alan Agresti, Brian Cade, Herbert David, Hugh Dudley, David Freedman, Phillip Good, Peter Kennedy, David Lane, John Ludbrook, Bryan Manly, Patrick Onghena, Fortunato Pesarin, Jon Richards, and Cajo ter Braak.
Acknowledgments The authors wish to thank the editors and staff at Springer-Verlag. A very special thanks to Federica Corradi Dell'Acqua, Assistant Editor, Statistics and Natural Language Processing, who guided the project through from beginning to end; this book would not have been written without her guidance and oversight. We also wish to thank Norm Walsh, who answered all our LaTeX questions. We are grateful to Roberta Mielke, who read the entire manuscript and made many helpful comments, and Cristi MacWaters, Interlibrary Loan Coordinator at Morgan Library, Colorado State University, who retrieved many of the manuscripts we needed. Finally, we wish to thank Steve and Linda Jones, proprietors of the Rainbow Restaurant, 212 West Laurel Street, Fort Collins, Colorado, for their gracious hospitality; the bulk of this book was written at table 20 in their restaurant adjacent to the campus of Colorado State University.

August 2013
Acronyms

AAAS American Association for the Advancement of Science
ACM Association for Computing Machinery
ALGOL Algorithmic computer language
AMAP Approximate Multivariate Association Procedure
ANOVA Analysis of variance
ARE Asymptotic relative efficiency
ARPAnet Advanced Research Projects Agency network
ASCC Automatic sequence controlled calculator
BAAS British Association for the Advancement of Science
BASIC Beginners All-Purpose Symbolic Instruction Code
CDF Cumulative distribution function
CEEB College Entrance Examination Board
CF Correction factor (analysis of variance)
CIT California Institute of Technology
COBOL Common business oriented language
CTR Computing Tabulating Recording Corporation
DARPA Defense Advanced Research Projects Agency
DHSS Department of Health and Social Security
DOE The design of experiments (Fisher)
ECDF Empirical cumulative distribution function
ECST Exact chi-squared test
EDSAC Electronic delay storage automatic calculator
EMAP Exact multivariate association procedure
ENIAC Electronic numerical integrator and computer
ETH Eidgenössische Technische Hochschule
FEPT Fisher exact probability test
FLOPS Floating-point operations per second
FORTRAN Formula Translation
GCHQ Government Communications Headquarters
GL Generalized logistic (distribution)
GPD Generalized Pareto distribution
GUI Graphical user interface
IAS Institute for Advanced Study (Princeton)
IBM International Business Machines (Corporation)
ICI Imperial Chemical Industries
IEEE Institute of Electrical and Electronics Engineers
IMS Institute of Mathematical Statistics
LAD Least absolute deviation (regression)
LANL Los Alamos National Laboratory
LASL Los Alamos Scientific Laboratory
LLNL Lawrence Livermore National Laboratory
LSED Least sum of Euclidean distances
MANIAC Mathematical analyzer, numerical integrator and computer
MANOVA Multivariate analysis of variance
MIT Massachusetts Institute of Technology
MITS Micro Instrumentation Telemetry Systems
MPP Massively parallel processing
MRBP Multivariate randomized block permutation procedures
MRPP Multi-response permutation procedures
MS Mean square (analysis of variance)
MSPA Multivariate sequential permutation analyses
MXH Multivariate extended hypergeometric
NBA National Basketball Association
NCAR National Center for Atmospheric Research
NHSRC National Homeland Security Research Center
NIST National Institute of Standards and Technology
NIH National Institutes of Health
NSFNET National Science Foundation NETwork
OBE Order of the British Empire
OECD Organization for Economic Cooperation and Development
OLS Ordinary least squares (regression)
ORACLE Oak Ridge Automatic Computer and Logical Engine
OSRD Office of Scientific Research and Development
PET Personal Electronic Transactor (Commodore PET)
ΦΚΘ Phi Kappa Theta (fraternity)
PISA Programme for International Student Assessment
PSI Statisticians in the Pharmaceutical Industry
RAND Research and Development (Corporation)
RIDIT Relative to an identified distribution
SAT Scholastic aptitude test
SFMT SIMD-Oriented Fast Mersenne Twister
SIAM Society for Industrial and Applied Mathematics
SIMD Single instruction [stream], multiple data [stream]
SLC Super Little Chip
SNL Sandia National Laboratories
SPSS Statistical Package for the Social Sciences
SREB Southern Regional Education Board
SRG Statistical Research Group (Columbia University)
SRI Stanford Research Institute
SS Sum of squares (analysis of variance)
TAOCP The Art of Computer Programming
UCLA University of California, Los Angeles
UNIVAC Universal Automatic Computer
USDA United States Department of Agriculture
WMW Wilcoxon–Mann–Whitney two-sample rank-sum test
Contents

1 Introduction
1.1 Overview of This Chapter
1.2 Two Models of Statistical Inference
1.3 Permutation Tests
1.3.1 Exact Permutation Tests
1.3.2 Moment-Approximation Permutation Tests
1.3.3 Resampling-Approximation Permutation Tests
1.3.4 Compared with Parametric Tests
1.3.5 The Bootstrap and the Jackknife
1.4 Student's t Test
1.4.1 An Exact Permutation t Test
1.4.2 A Moment-Approximation t Test
1.4.3 A Resampling-Approximation t Test
1.5 An Example Data Analysis
1.6 Overviews of Chaps. 2–6
2 1920–1939
2.1 Overview of This Chapter
2.2 Neyman–Fisher–Geary and the Beginning
2.2.1 Spława-Neyman and Agricultural Experiments
2.2.2 Fisher and the Binomial Distribution
2.2.3 Geary and Correlation
2.3 Fisher and the Variance-Ratio Statistic
2.3.1 Snedecor and the F Distribution
2.4 Eden–Yates and Non-normal Data
2.5 Fisher and 2 × 2 Contingency Tables
2.6 Yates and the Chi-Squared Test for Small Samples
2.6.1 Calculation with an Arbitrary Initial Value
2.7 Irwin and Fourfold Contingency Tables
2.8 The Rothamsted Manorial Estate
2.8.1 The Rothamsted Lady Tasting Tea Experiment
2.8.2 Analysis of The Lady Tasting Tea Experiment
2.9 Fisher and the Analysis of Darwin's Zea mays Data
2.10 Fisher and the Coefficient of Racial Likeness
2.11 Hotelling–Pabst and Simple Bivariate Correlation
2.12 Friedman and Analysis of Variance for Ranks
2.13 Welch's Randomized Blocks and Latin Squares
2.14 Egon Pearson on Randomization
2.15 Pitman and Three Seminal Articles
2.15.1 Permutation Analysis of Two Samples
2.15.2 Permutation Analysis of Correlation
2.15.3 Permutation Analysis of Variance
2.16 Welch and the Correlation Ratio
2.17 Olds and Rank-Order Correlation
2.18 Kendall and Rank Correlation
2.19 McCarthy and Randomized Blocks
2.20 Computing and Calculators
2.20.1 The Method of Differences
2.20.2 Statistical Computing in the 1920s and 1930s
2.21 Looking Ahead
3 1940–1959
3.1 Overview of This Chapter
3.2 Development of Computing
3.3 Kendall–Babington Smith and Paired Comparisons
3.4 Dixon and a Two-Sample Rank Test
3.5 Swed–Eisenhart and Tables for the Runs Test
3.6 Scheffé and Non-parametric Statistical Inference
3.7 Wald–Wolfowitz and Serial Correlation
3.8 Mann and a Test of Randomness Against Trend
3.9 Barnard and 2 × 2 Contingency Tables
3.10 Wilcoxon and the Two-Sample Rank-Sum Test
3.10.1 Unpaired Samples
3.10.2 Paired Samples
3.11 Festinger and the Two-Sample Rank-Sum Test
3.12 Mann–Whitney and a Two-Sample Rank-Sum Test
3.13 Whitfield and a Measure of Ranked Correlation
3.13.1 An Example of Whitfield's Approach
3.14 Olmstead–Tukey and the Quadrant-Sum Test
3.15 Haldane–Smith and a Test for Birth-Order Effects
3.16 Finney and the Fisher–Yates Test for 2 × 2 Tables
3.17 Lehmann–Stein and Non-parametric Tests
3.18 Rank-Order Statistics
3.18.1 Kendall and Rank Correlation Methods
3.18.2 Wilks and Order Statistics
3.19 van der Reyden and a Two-Sample Rank-Sum Test
3.20 White and Tables for the Rank-Sum Test
3.21 Other Results for the Two-Sample Rank-Sum Test
3.22 David–Kendall–Stuart and Rank-Order Correlation
3.23 Freeman–Halton and an Exact Test of Contingency
3.24 Kruskal–Wallis and the C-sample Rank-Sum Test
3.25 Box–Andersen and Permutation Theory
3.26 Leslie and Small Contingency Tables
3.27 A Two-Sample Rank Test for Dispersion
3.27.1 Rosenbaum's Rank Test for Dispersion
3.27.2 Kamat's Rank Test for Dispersion
3.28 Dwass and Modified Randomization Tests
3.29 Looking Ahead
4 1960–1979
4.1 Overview of This Chapter
4.2 Development of Computing
4.3 Permutation Algorithms and Programs
4.3.1 Permutation Methods and Contingency Tables
4.4 Ghent and the Fisher–Yates Exact Test
4.5 Programs for Contingency Table Analysis
4.6 Siegel–Tukey and Tables for the Test of Variability
4.7 Other Tables of Critical Values
4.8 Edgington and Randomization Tests
4.9 The Matrix Occupancy Problem
4.10 Kempthorne and Experimental Inference
4.11 Baker–Collier and Permutation F Tests
4.11.1 A Permutation Computer Program
4.11.2 Simple Randomized Designs
4.11.3 Randomized Block Designs
4.12 Permutation Tests in the 1970s
4.13 Feinstein and Randomization
4.14 The Mann–Whitney, Pitman, and Cochran Tests
4.15 Mielke–Berry–Johnson and MRPP
4.15.1 Least Absolute Deviations Regression
4.15.2 Multi-Response Permutation Procedures
4.15.3 An Example MRPP Analysis
4.15.4 Approximate Probability Values
4.16 Determining the Number of Contingency Tables
4.17 Soms and the Fisher Exact Permutation Test
4.18 Baker–Hubert and Ordering Theory
4.19 Green and Two Permutation Tests for Location
4.20 Agresti–Wackerly–Boyett and Approximate Tests
4.21 Boyett and Random R by C Tables
4.22 Looking Ahead
5 1980–2000
5.1 Overview of This Chapter
5.2 Development of Computing
5.3 Permutation Methods and Contingency Tables
5.4 Yates and 2 × 2 Contingency Tables
5.5 Mehta–Patel and a Network Algorithm
5.5.1 Multi-Way Contingency Tables
5.5.2 Additional Contingency Table Analyses
5.6 MRPP and the Pearson Type III Distribution
5.7 MRPP and Commensuration
5.8 Tukey and Rerandomization
5.9 Matched-Pairs Permutation Analysis
5.10 Subroutine PERMUT
5.11 Moment Approximations and the F Test
5.11.1 Additional Applications of MRPP
5.12 Mielke–Iyer and MRBP
5.13 Relationships of MRBP to Other Tests
5.14 Kappa and the Measurement of Agreement
5.14.1 Extensions to Interval and Ordinal Data
5.14.2 Extension of Kappa to Multiple Raters
5.14.3 Limitations of Kappa
5.14.4 Relationships Between ℜ and Existing Measures
5.14.5 Agreement with Two Groups and a Standard
5.15 Basu and the Fisher Randomization Test
5.16 Still–White and Permutation Analysis of Variance
5.17 Walters and the Utility of Resampling Methods
5.18 Conover–Iman and Rank Transformations
5.19 Green and Randomization Tests
5.20 Gabriel–Hall and Rerandomization Inference
5.21 Pagano–Tritchler and Polynomial-Time Algorithms
5.22 Welch and a Median Permutation Test
5.23 Boik and the Fisher–Pitman Permutation Test
5.24 Mielke–Yao Empirical Coverage Tests
5.25 Randomization in Clinical Trials
5.26 The Period from 1990 to 2000
5.27 Algorithms and Programs
5.28 Page–Brin and Google
5.29 Spino–Pagano and Trimmed/Winsorized Means
5.30 May–Hunter and Advantages of Permutation Tests
5.31 Mielke–Berry and Tests for Common Locations
5.32 Kennedy–Cade and Multiple Regression
5.33 Blair et al. and Hotelling's T² Test
5.34 Mielke–Berry–Neidt and Hotelling's T² Test
5.35 Cade–Richards and Tests for LAD Regression
5.36 Walker–Loftis–Mielke and Spatial Dependence
5.37 Frick on Process-Based Testing
5.38 Ludbrook–Dudley and Biomedical Research
5.39 The Fisher Z Transformation
5.40 Looking Ahead
6 Beyond 2000
6.1 Overview of This Chapter
6.2 Computing After Year 2000
6.3 Books on Permutation Methods
6.4 A Summary of Contributions by Publication Year
6.5 Agresti and Exact Inference for Categorical Data
6.6 The Unweighted Kappa Measure of Agreement
6.7 Mielke et al. and Combining Probability Values
6.8 Legendre and Kendall's Coefficient of Concordance
6.9 The Weighted Kappa Measure of Agreement
6.10 Berry et al. and Measures of Ordinal Association
6.11 Resampling for Multi-Way Contingency Tables
6.11.1 Description
6.11.2 An Example Analysis
6.12 Mielke–Berry and a Multivariate Similarity Test
6.13 Cohen's Weighted Kappa with Multiple Raters
6.14 Exact Variance of Weighted Kappa
6.14.1 An Example Analysis
6.15 Campbell and Two-by-Two Contingency Tables
6.16 Permutation Tests and Robustness
6.16.1 Robustness and Rank-Order Statistics
6.16.2 Mielke et al. and Robustness
6.17 Advantages of the Median for Analyzing Data
6.18 Consideration of Statistical Outliers
6.19 Multivariate Multiple Regression Analysis
6.19.1 A Permutation Test
6.19.2 An Example Analysis
6.20 O'Gorman and Multiple Linear Regression
6.21 Brusco–Stahl–Steinley and Weighted Kappa
6.22 Mielke et al. and Ridit Analysis
6.23 Knijnenburg et al. and Probability Values
6.24 Reiss et al. and Multivariate Analysis of Variance
6.25 A Permutation Analysis of Trend
6.26 Curran-Everett and Permutation Methods
Epilogue
References
Name Index
Subject Index
1 Introduction

Permutation statistical methods are a paradox of old and new. While permutation methods pre-date many traditional parametric statistical methods, only recently have permutation methods become part of the mainstream discussion regarding statistical testing. Permutation statistical methods follow a permutation model whereby a test statistic is computed on the observed data, then (1) the observed data are permuted over all possible arrangements of the observations—an exact permutation test, (2) the observed data are used for calculating the exact moments of the underlying discrete permutation distribution and the moments are fitted to an associated continuous distribution—a moment-approximation permutation test, or (3) the observed data are permuted over a random subset of all possible arrangements of the observations—a resampling-approximation permutation test [977, pp. 216–218].
1.1 Overview of This Chapter
This first chapter begins with a brief description of the advantages of permutation methods from statisticians who were, or are, advocates of permutation tests, followed by a description of the methods of permutation tests, including exact, moment-approximation, and resampling-approximation permutation tests. The chapter continues with an example that contrasts the well-known Student t test and results from exact, moment-approximation, and resampling-approximation permutation tests using historical data. The chapter concludes with brief overviews of the remaining chapters.
Permutation tests are often described as the gold standard against which conventional parametric tests are tested and evaluated. Bakeman, Robinson, and Quera remarked that "like Read and Cressie (1988), we think permutation tests represent the standard against which asymptotic tests must be judged" [50, p. 6]. Edgington and Onghena opined that "randomization tests have come to be recognized by many in the field of medicine as the 'gold standard' of statistical tests for randomized experiments" [396, p. 9]; Friedman, in comparing tests of significance
for m rankings, referred to an exact permutation test as "the correct one" [486, p. 88]; Feinstein remarked that conventional statistical tests "yield reasonably reliable approximations of the more exact results provided by permutation procedures" [421, p. 912]; and Good noted that Fisher himself regarded randomization as a technique for validating tests of significance, i.e., making sure that conventional probability values were accurate [521, p. 263].
Early statisticians understood well the value of permutation statistical tests even during the period in which the computationally-intensive nature of the tests made them impractical. Notably, in 1955 Kempthorne wrote that "[t]ests of significance in the randomized experiment have frequently been presented by way of normal law theory, whereas their validity stems from randomization theory" [719, p. 947] and

[w]hen one considers the whole problem of experimental inference, that is of tests of significance, estimation of treatment differences and estimation of the errors of estimated differences, there seems little point in the present state of knowledge in using a method of inference other than randomization analysis [719, p. 966].

In 1966 Kempthorne re-emphasized that "the proper way to make tests of significance in the simple randomized experiments is by way of the randomization (or permutation) test" [720, p. 20] and "in the randomized experiment one should, logically, make tests of significance by way of the randomization test" [720, p. 21].¹ Similarly, in 1959 Scheffé stated that the conventional analysis of variance F test "can often be regarded as a good approximation to a permutation [randomization] test, which is an exact test under a less restrictive model" [1232, p. 313]. In 1968 Bradley indicated that "eminent statisticians have stated that the randomization test is the truly correct one and that the corresponding parametric test is valid only to the extent that it results in the same statistical decision" [201, p. 85].
With the advent of high-speed computing, permutation tests became more practical and researchers increasingly appreciated the benefits of the randomization model. In 1998, Ludbrook and Dudley stated that "it is our thesis that the randomization rather than the population model applies, and that the statistical procedures best adapted to this model are those based on permutation" [856, p. 127], concluding that "statistical inferences from the experiments are valid only under the randomization model of inference" [856, p. 131].

In 2000, Bergmann, Ludbrook, and Dudley, in a cogent analysis of the Wilcoxon–Mann–Whitney two-sample rank-sum test, observed that "the only accurate form of the Wilcoxon–Mann–Whitney procedure is one in which the exact permutation null distribution is compiled for the actual data" [100, p. 72] and concluded:

[o]n theoretical grounds, it is clear that the only infallible way of executing the [Wilcoxon–Mann–Whitney] test is to compile the null distribution of the rank-sum statistic by exact permutation. This was, in effect, Wilcoxon's (1945) thesis and it provided the theoretical basis for his [two-sample rank-sum] test [100, p. 76].
1 The terms “permutation test” and “randomization test” are often used interchangeably.
1.2 Two Models of Statistical Inference

Essentially, two models of statistical inference coexist: the population model and the permutation model; see, for further discussion, articles by Curran-Everett [307], Hubbard [663], Kempthorne [721], Kennedy [748], Lachin [787], Ludbrook [849, 850], and Ludbrook and Dudley [854]. The population model, formally proposed by Jerzy Neyman and Egon Pearson in 1928 [1035, 1036], assumes random sampling from one or more specified populations. Under the population model, the level of statistical significance that results from applying a statistical test to the results of an experiment or a survey corresponds to the frequency with which the null hypothesis would be rejected in repeated random samplings from the same specified population(s). Because repeated sampling of the true population(s) is usually impractical, it is assumed that the sampling distribution of the test statistics generated under repeated random sampling conforms to an assumed, conjectured, hypothetical distribution, such as the normal distribution.
The size of a statistical test, e.g., 0.05, is the probability under a specified null hypothesis that repeated outcomes based on random samples of the same size are equal to or more extreme than the observed outcome. In the population model, assignment of treatments to subjects is viewed as fixed, with the stochastic element taking the form of an error that would vary if the experiment was repeated [748]. Probability values are then calculated based on the potential outcomes of conceptual repeated draws of these errors. The model is sometimes referred to as the "conditional-on-assignment" model, as the distribution used for structuring the test is conditional on the treatment assignment of the observed sample; see, for example, a comprehensive and informative 1995 article by Peter Kennedy in the Journal of Business & Economic Statistics [748].
The permutation model was introduced by R.A. Fisher in 1925 [448] and further developed by R.C. Geary in 1927 [500], T. Eden and F. Yates in 1933 [379], and E.J.G. Pitman in 1937 and 1938 [1129–1131]. Permutation tests do not refer to any particular statistical tests, but to a general method of determining probability values. In a permutation statistical test the only assumption made is that experimental variability has caused the observed result. That assumption, or null hypothesis, is then tested. The smaller the probability, the stronger is the evidence against the assumption [648]. Under the permutation model, a permutation test statistic is computed for the observed data, then the observations are permuted over all possible arrangements of the observations and the test statistic is computed for each equally-likely arrangement of the observed data [307]. For clarification, an ordered sequence of $n$ exchangeable objects $(\omega_1, \ldots, \omega_n)$ yields $n!$ equally-likely arrangements of the $n$ objects, vide infra. The proportion of cases with test statistic values equal to or more extreme than the observed case yields the probability of the observed test statistic. In contrast to the population model, the assignment of errors to subjects is viewed as fixed, with the stochastic element taking the form of the assignment of treatments to subjects for each arrangement [748]. Probability values are then calculated according to all outcomes associated with assignments
of treatments to subjects for each case. This model is sometimes referred to as the "conditional-on-errors" model, as the distribution used for structuring the test is conditional on the individual errors drawn for the observed sample; see, for example, a 1995 article by Peter Kennedy [748].
Exchangeability

A sufficient condition for a permutation test is the exchangeability of the random variables. Sequences that are independent and identically distributed (i.i.d.) are always exchangeable, but so is sampling without replacement from a finite population. However, while i.i.d. implies exchangeability, exchangeability does not imply i.i.d. [528, 601, 758]. Diaconis and Freedman present a readable discussion of exchangeability using urns and colored balls [346]. More formally, variables $X_1, X_2, \ldots, X_n$ are exchangeable if

$$P\left[ \bigcap_{i=1}^{n} \left( X_i \le x_i \right) \right] = P\left[ \bigcap_{i=1}^{n} \left( X_{\pi(i)} \le x_i \right) \right]$$

for every permutation $\pi$ of the indices $\{1, \ldots, n\}$.

1.3 Permutation Tests
1.3.1 Exact Permutation Tests
Exact permutation tests enumerate all equally-likely arrangements of the observed data. For each arrangement, the desired test statistic is calculated. The obtained data yield the observed value of the test statistic. The probability of obtaining the observed value of the test statistic, or a more extreme value, is the proportion of the enumerated test statistics with values equal to or more extreme than the value of the observed test statistic. As sample sizes increase, the number of possible arrangements can become very large and exact methods become impractical. For example, permuting two small samples of sizes $n_1 = n_2 = 20$ yields

$$M = \frac{(n_1 + n_2)!}{n_1!\, n_2!} = \frac{(20 + 20)!}{(20!)^2} = 137{,}846{,}528{,}820$$

different arrangements of the observed data.
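The size of M can be verified directly; the short Python sketch below is illustrative only and assumes nothing beyond the counting formula above.

```python
from math import comb, factorial

# M = (n1 + n2)! / (n1! n2!) counts the equally-likely two-sample arrangements:
# the number of ways to choose which n1 of the N = n1 + n2 pooled observations
# are labeled Sample 1.
n1 = n2 = 20
M = factorial(n1 + n2) // (factorial(n1) * factorial(n2))
assert M == comb(n1 + n2, n1)  # the same count, written as a binomial coefficient
print(f"{M:,}")                # 137,846,528,820
```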
1.3.2 Moment-Approximation Permutation Tests
The moment-approximation of a test statistic requires computation of the exact moments of the test statistic, assuming equally-likely arrangements of the observed data. The moments are then used to fit a specified distribution. For example, the first three exact moments may be used to fit a Pearson type III distribution. Then, the Pearson type III distribution approximates the underlying discrete permutation distribution and provides an approximate probability value. For many years moment-approximation permutation tests provided an important intermediary approximation when computers lacked both the speed and the storage for calculating exact permutation tests. More recently, resampling-approximation permutation tests have largely replaced moment-approximation permutation tests, except when either the size of the data set is very large or the probability of the observed test statistic is very small.
1.3.3 Resampling-Approximation Permutation Tests
Resampling-approximation permutation tests generate and examine a Monte Carlo random subset of all possible equally-likely arrangements of the observed data. In the case of a resampling-approximation permutation test, the probability of obtaining the observed value of the test statistic, or a more extreme value, is the proportion of the resampled test statistics with values equal to or more extreme than the value of the observed test statistic [368, 649]. Thus, resampling permutation probability values are computationally quite similar to exact permutation tests, but the number of resamplings to be considered is decided upon by the researcher rather than by considering all possible arrangements of the observed data. With sufficient resamplings, a researcher can compute a probability value to any accuracy desired. Read and Cressie [1157], Bakeman, Robinson, and Quera [50], and Edgington and Onghena [396, p. 9] described permutation methods as the "gold standard" against which asymptotic methods must be judged. Tukey took it one step further, labeling resampling permutation methods the "platinum standard" of permutation methods [216, 1381, 1382].²
1.3.4 Compared with Parametric Tests
Permutation tests differ from traditional parametric tests based on an assumed population model in several ways.
2 In a reversal Tukey could not have predicted, at the time of this writing gold was trading at $1,775 per troy ounce, while platinum was only $1,712 per troy ounce [275].
1. Permutation tests are data dependent, in that all the information required for analysis is contained within the observed data set; see a 2007 discussion by Mielke and Berry [965, p. 3].³

2. Permutation tests do not assume an underlying theoretical distribution; see a 1983 article by Gabriel and Hall [489].

3. Permutation tests do not depend on the assumptions associated with traditional parametric tests, such as normality and homogeneity; see articles by Kennedy in 1995 [748] and Berry, Mielke, and Mielke in 2002 [162].⁴

4. Permutation tests provide probability values based on the discrete permutation distribution of equally-likely test statistic values, rather than an approximate probability value based on a conjectured theoretical distribution, such as a normal, chi-squared, or F distribution; see a 2001 article by Berry, Johnston, and Mielke [117].

5. Whereas permutation tests are suitable when a random sample is obtained from a designated population, permutation tests are also appropriate for nonrandom samples, such as are common in biomedical research; see discussions by Kempthorne in 1977 [721], Gabriel and Hall in 1983 [489], Bear in 1995 [88], Frick in 1998 [482], Ludbrook and Dudley in 1998 [856], and Edgington and Onghena in 2007 [396, pp. 6–8].

6. Permutation tests are appropriate when analyzing entire populations, as permutation tests are not predicated on repeated random sampling from a specified population; see discussions by Ludbrook and Dudley in 1998 [856], Holford in 2003 [638], and Edgington and Onghena in 2007 [396, pp. 1–8].

7. Permutation tests can be defined for any selected test statistic; thus, researchers have the option of using a wide variety of test statistics, including the majority of statistics commonly utilized in traditional statistical approaches; see discussions by Mielke and Berry in 2007 [965].

8. Permutation tests are ideal for very small data sets, when conjectured, hypothetical distribution functions may provide very poor fits; see a 1998 article by Ludbrook and Dudley [856].

9. Appropriate permutation tests are resistant to extreme values, such as are common in demographic data, e.g., income, age at first marriage, number of children, and so on; see a discussion by Mielke and Berry in 2007 [965, pp. 52–53] and an article by Mielke, Berry, and Johnston in 2011 [978]. Consequently, the need for any data transformation is mitigated in the permutation context and in general is not recommended; transformations such as the square root or logarithm, the use of rank-order statistics,⁵ and, in particular, the choice of a distance function may be very misleading [978].

3 Echoing Fisher's argument that inference must be based solely on the data at hand [460], Haber refers to data dependency as "the data at hand principle" [565, p. 148].

4 Barton and David noted that it is desirable to make the minimum of assumptions since, witness the oft-cited Bertrand paradox [163], the assumptions made will often prejudice the conclusions reached [83, p. 455].
10. Permutation tests provide data-dependent statistical inferences only to the actual experiment or survey that has been performed, and are not dependent on a contrived super population; see, for example, discussions by Feinstein in 1973 [421] and Edgington and Onghena in 2007 [396, pp. 7–8].
1.3.5 The Bootstrap and the Jackknife
This chronicle is confined to permutation methods, although many researchers consider that permutation methods, bootstrapping, and the jackknife are closely related. Traditionally, jackknife (leave-one-out) methods have been used to reduce bias in small samples, calculate confidence intervals around parameter estimates, and test hypotheses [789, 876, 1376], while bootstrap methods have been used to estimate standard errors in cases where the distribution of the data is unknown [789]. In general, permutation methods are considered to be more powerful than either the bootstrap or (possibly) the jackknife approaches [789].
While permutation methods and bootstrapping both involve computing simulations, and the rejection of the null hypothesis occurs when a common test statistic is extreme under both bootstrapping and permutation, they are conceptually and mechanically quite different. On the other hand, they do have some similarities, including equivalence in an asymptotic sense [358, 1189]. The two approaches differ in their distinct sampling methods. In resampling, a "new" sample is obtained by drawing the data without replacement, whereas in bootstrapping a "new" sample is obtained by drawing from the data with replacement [748, 1189]. Thus, bootstrapping and resampling are associated with sampling with and without replacement, respectively. Philip Good has been reported as saying that the difference between permutation tests and bootstrap tests is that "[p]ermutations test hypotheses concerning distributions; bootstraps test hypotheses concerning parameters."
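The sampling distinction can be made concrete in a few lines of Python; this sketch is not from the original text, and the pooled data values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2014)
pooled = np.array([4.2, 5.1, 6.3, 3.9, 7.4, 5.8])  # hypothetical pooled observations

# Permutation (resampling) draw: a rearrangement WITHOUT replacement;
# every observation appears exactly once, only its group label moves.
permutation_draw = rng.permutation(pooled)

# Bootstrap draw: sampling WITH replacement; observations may repeat or
# drop out, which conceptually induces an infinite population.
bootstrap_draw = rng.choice(pooled, size=pooled.size, replace=True)

print(permutation_draw)
print(bootstrap_draw)
```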
Specifically, resampling is a data-dependent procedure, dealing with all finite arrangements of the observed data, and based on sampling without replacement. In contrast, bootstrapping involves repeated sampling from a finite population that conceptually yields an induced infinite population based on sampling with replacement. In addition, when bootstrapping is used with small samples it is necessary to make complex adjustments to control the risk of error; see, for example, discussions by Hall and Wilson in 1991 [577], Efron and Tibshirani in 1993 [402], and Westfall and Young, also in 1993 [1437]. Finally, the bootstrap distribution may be viewed as an unconditional approximation to the null distribution of the test statistic, while the resampling distribution may be viewed as a conditional distribution of the test statistic [1189].

5 Rank-order statistics were among the earliest permutation tests, transforming the observed data into ranks, e.g., from smallest to largest. While they were an important step in the history of permutation tests, modern computing has superseded the need for rank-order tests in the majority of cases.
In 1991 Donegani argued that it is preferable to compute a permutation test based on sampling without replacement (i.e., resampling) than with replacement (i.e., bootstrap), although, as he noted, the two techniques are asymptotically equivalent [358]. In a thorough comparison and analysis of the two methods, he demonstrated that (1) the bootstrap procedure is "bad" for small sample sizes or whenever the alternative is close to the null hypothesis and (2) resampling tests should be used in order to take advantage of their flexibility in the choice of a distance criterion [358, p. 183].
In 1988 Tukey stated that the relationship between permutation procedures, on the one hand, and bootstrap and jackknife procedures, on the other hand, is "far from close" [1382]. Specifically, Tukey listed four major differences between bootstrap and jackknife procedures, which he called "resampling," and permutation methods, which he called "rerandomization" [1382].

1. Bootstrap and jackknife procedures need not begin until the data is collected. Rerandomization requires planning before the data collection is specified.

2. Bootstrap and jackknife procedures play games of omission of units with data already collected. Rerandomization plays games of exchange of treatments, while using all numerical results each time.

3. Bootstrap and jackknife procedures apply to experiences as well as experiments. Rerandomization only applies to randomized experiments.

4. Bootstrap and jackknife procedures give one only a better approximation to a desired confidence interval. Rerandomization gives one a "platinum standard" significance test, which can be extended in simple cases—by the usual devices—to a "platinum standard" confidence interval.
Thus, bootstrapping remains firmly in the conditional-on-assignment tradition, assuming that the true error distribution can be approximated by a discrete distribution with equal probability attached to each of the cases [850]. On the other hand, permutation tests view the errors as fixed in repeated samples [748]. Finally, some researchers have tacitly conceived of permutation methods in a Bayesian context. Specifically, this interpretation amounts to a primitive Bayesian analysis where the prior distribution is the assumption of equally-likely arrangements associated with the observed data, and the posterior distribution is the resulting data-dependent distribution of the test statistic induced by the prior distribution.
1.4 Student’s t Test
Student’s pooled t test [1331] for two independent samples is a convenient vehicle
to illustrate permutation tests and to compare a permutation test with its parametric
counterpart As a historical note, Student’s 1908 publication used z for the test statistic, and not t The first mention of t appeared in a letter from William Sealy
Gosset (“Student”) to R.A Fisher in November of 1922 It appears that the decision
to change from z to t originated with Fisher, but the choice of the letter t was due
Trang 29to Student Eisenhart [408] and Box [196] provide historical commentaries on the
transition from Student’s z test to Student’s t test.
Student’s pooled t test for two independent samples is well-known, familiar
to most researchers, widely used in quantitative analyses, and elegantly simple.The pooled t test evaluates the mean difference between two independent randomsamples Under the null hypothesis, H0W 1 D 2, Student’s pooled t test statistic
is defined as
t D .Nx1 Nx2/ 1 2/
sNx1 Nx2 ;where the standard error of the sampling distribution of differences between twoindependent sample means is given by
sNx1 Nx2 D
24.n1 1/s2
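Translated directly into Python, the two formulas read as follows; this is a minimal sketch, and the two small samples in the final line are hypothetical.

```python
from math import sqrt

def pooled_t(x, y, mu_diff=0.0):
    """Student's pooled two-sample t statistic for H0: mu1 - mu2 = mu_diff."""
    n1, n2 = len(x), len(y)
    xbar1, xbar2 = sum(x) / n1, sum(y) / n2
    # Unbiased sample variances s1^2 and s2^2.
    s2_1 = sum((v - xbar1) ** 2 for v in x) / (n1 - 1)
    s2_2 = sum((v - xbar2) ** 2 for v in y) / (n2 - 1)
    # Standard error of the difference between two independent sample means.
    se = sqrt(((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)
              * (1 / n1 + 1 / n2))
    return (xbar1 - xbar2 - mu_diff) / se

print(pooled_t([13.0, 9.0, 11.0], [6.0, 8.0, 7.0]))  # about +3.10
```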
1.4.1 An Exact Permutation t Test
Exact permutation tests are based on all possible arrangements of the observed data. For the two-sample t test, the number of permutations of the observed data is given by

$$M = \frac{N!}{n_1!\, n_2!},$$

where $N = n_1 + n_2$.
Let $x_{ij}$ denote the $i$th observed score in the $j$th independent sample, $j = 1, 2$ and $i = 1, \ldots, n_j$, let $t_{\mathrm{o}}$ denote the Student t statistic computed on the observed data, and let $t_k$ denote the Student t statistic computed on each permutation of the observed data for $k = 1, \ldots, M$. For the first permutation of the observed data set, interchange $x_{31}$ and $x_{12}$, compute $t_1$, and compare $t_1$ with $t_{\mathrm{o}}$. For the second permutation, interchange $x_{12}$ and $x_{22}$, compute $t_2$, and compare $t_2$ with $t_{\mathrm{o}}$. Continue the process for $k = 1, \ldots, M$.
To illustrate the exact permutation procedure, consider two independent samples of $n_1 = n_2 = 3$ observations, and let $\{x_{11}, x_{21}, x_{31}\}$ denote the $n_1 = 3$ observations in Sample 1 and $\{x_{12}, x_{22}, x_{32}\}$ denote the $n_2 = 3$ observations in Sample 2. Table 1.1 depicts the

$$M = \frac{(3 + 3)!}{3!\, 3!} = 20$$

equally-likely arrangements of the observed data, where a subscript of 1 or 2 indicates the placement of the observation after permutation. The exact two-sided probability value is then given by

$$P = \frac{\text{number of } |t_k| \text{ values} \ge |t_{\mathrm{o}}|}{M}.$$
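The enumeration just described is easy to sketch with Python's itertools; the loop below visits exactly the M = 20 equally-likely splits when n1 = n2 = 3, and the sample values are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance
from math import sqrt

def t_stat(x, y):
    # Student's pooled two-sample t statistic under H0: mu1 = mu2.
    n1, n2 = len(x), len(y)
    se = sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y))
              / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se

def exact_two_sided_p(x, y):
    """Enumerate all M = N!/(n1! n2!) splits of the pooled data and return the
    proportion whose |t| equals or exceeds the observed |t|."""
    pooled, n1, N = list(x) + list(y), len(x), len(x) + len(y)
    t_obs = abs(t_stat(x, y))
    hits = total = 0
    for idx in combinations(range(N), n1):   # indices assigned to Sample 1
        chosen = set(idx)
        s1 = [pooled[i] for i in idx]
        s2 = [pooled[i] for i in range(N) if i not in chosen]
        total += 1
        hits += abs(t_stat(s1, s2)) >= t_obs
    return hits / total                      # total == M

print(exact_two_sided_p([13.0, 9.0, 11.0], [6.0, 8.0, 7.0]))  # M = 20 splits
```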
1.4.2 A Moment-Approximation t Test

Prior to the development of high-speed computing, only a limited number of random permutations of the data was possible, thus yielding too few places of accuracy for research purposes.
A moment-approximation permutation test is based, for example, on the first three exact moments of the underlying discrete permutation distribution, yielding the exact mean, variance, and skewness, i.e., $\mu_x$, $\sigma_x^2$, and $\gamma_x$. Computational details for the exact moments are given in Sect. 4.15 of Chap. 4. An approximate probability value is obtained by fitting the exact moments to the associated Pearson type III distribution, which is completely characterized by the first three moments, and integrating the obtained Pearson type III distribution.
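The fitting step can be sketched with scipy, whose pearson3 distribution is parameterized directly by skewness, with loc and scale equal to the mean and standard deviation; the moment values below are hypothetical stand-ins for the exact permutation moments.

```python
from math import sqrt
from scipy.stats import pearson3

# Hypothetical exact moments of the discrete permutation distribution of a
# statistic T: mean, variance, and skewness.
mu, var, gamma = 0.0, 1.5625, -0.60
sigma = sqrt(var)

# pearson3(skew, loc, scale) reproduces the first three moments exactly, so
# integrating its tail approximates the discrete permutation probability value.
fitted = pearson3(gamma, loc=mu, scale=sigma)
t_obs = -2.40
print(fitted.cdf(t_obs))  # approximate P(T <= t_obs)
```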
1.4.3 A Resampling-Approximation t Test
When M is very large, exact permutation tests are impractical, even with high-speed computers, and resampling-approximation permutation tests become an important alternative. Resampling-approximation tests provide more precise probability values than moment-approximation tests and are similar in structure to exact tests, except that only a random sample of size L, selected from all M possible permutations, is generated, where L is usually a large number chosen to guarantee accuracy to a specified number of places. For instance, L = 1,000,000 will likely ensure three places of accuracy [696]. The resampling two-sided approximate probability value is then given by

$$\hat{P} = \frac{\text{number of } |t_k| \text{ values} \ge |t_{\mathrm{o}}|}{L}.$$
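A resampling sketch in Python follows; L random shuffles of the pooled data stand in for the full enumeration, the default L is kept modest so the pure-Python loop stays quick, and the two samples are hypothetical.

```python
import numpy as np

def resampled_two_sided_p(x, y, L=100_000, seed=2014):
    """Approximate the exact permutation probability value from L random
    splits of the pooled data, each drawn without replacement via a shuffle."""
    rng = np.random.default_rng(seed)
    pooled = np.asarray(list(x) + list(y), dtype=float)
    n1, n2 = len(x), len(y)

    def t_stat(a, b):
        # Student's pooled two-sample t statistic.
        sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
        return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

    t_obs = abs(t_stat(pooled[:n1], pooled[n1:]))  # the observed split comes first
    hits = 0
    for _ in range(L):
        rng.shuffle(pooled)                        # in-place random rearrangement
        if abs(t_stat(pooled[:n1], pooled[n1:])) >= t_obs:
            hits += 1
    return hits / L

print(resampled_two_sided_p([13.0, 9.0, 11.0], [6.0, 8.0, 7.0]))
```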
1.5 An Example Data Analysis
The English poor laws, the relief expenditure act, and a comparison of two English counties provide vehicles to illustrate exact, moment-approximation, and resampling-approximation permutation tests.
The English Poor Laws
Up until the Reformation, it was considered a Christian duty in England to undertake the seven corporal works of mercy. In accordance with Matthew 25:32–46, Christians were to feed the hungry, give drink to the thirsty, welcome a stranger, clothe the naked, visit the sick, visit the prisoner, and bury the dead. After the Reformation and the establishment of the Church of England, many of these precepts were neglected, the poor were left without adequate assistance, and it became necessary to regulate relief of the poor by statute. The Poor Laws passed during the reign of Elizabeth I played a determining role in England's system of welfare, signaling a progression from private charity to a welfare state, where care of the poor was embodied in law. Boyer [198] provides an exhaustive description of the historical development of the English Poor Laws.

In 1552, Parish registers of the poor were introduced to ensure a well-documented official record, and in 1563, Justices of the Peace were empowered to raise funds to support the poor. In 1572, it was made compulsory that all people pay a poor tax, with those funds used to help the deserving poor. In 1597, Parliament passed a law that each parish appoint an Overseer of the Poor, who calculated how much money was needed for the parish, set the poor tax accordingly, collected the poor rate from property owners, dispensed either food or money to the poor, and supervised the parish poor house. In 1601, the Poor Law Act was passed by Parliament, which brought together all prior measures into one legal document. The act of 1601 endured until the Poor Law Amendment Act was passed in 1834.
Consider an example data analysis utilizing Student's pooled two-sample t test based on historical parish-relief expenditure data from the 1800s [697]. To investigate factors that contributed to the level of relief expenditures, Boyer [198] assembled a data set comprised of a sample of 311 parishes in 20 counties in the south of England in 1831. The relief expenditure data were obtained from Blaug [172].⁶ Table 1.2 contains the 1831 per capita relief expenditures, in shillings, for 36 parishes in two counties: Oxford and Hertford. For this example, the data were rounded to four places.

The relief expenditure data from Oxford and Hertford counties are listed in Table 1.2. Oxford County consisted of 24 parishes with a sample mean relief of $\bar{x}_1 = 20.28$ shillings and a sample variance of $s_1^2 = 58.37$ shillings. Hertford County consisted of 12 parishes with a sample mean relief of $\bar{x}_2 = 13.47$ shillings and a sample variance of $s_2^2 = 37.58$ shillings. A conventional two-sample t test yields $t_{\mathrm{o}} = +2.68$ and, with $24 + 12 - 2 = 34$ degrees of freedom, a two-sided approximate probability value of $\hat{P} = 0.0113$. Although there are

$$M = \frac{(24 + 12)!}{24!\, 12!} = 1{,}251{,}677{,}700$$

possible arrangements of the observed data and an exact permutation test is therefore not practical, it is not impossible.

6 The complete data set is available in several formats at the Cambridge University Press site: http://uk.cambridge.org/resources/0521806631
Table 1.2 Average per capita relief expenditures for Oxford and Hertford counties in shillings: 1831 (columns of parish names and expenditures)
Finally, a resampling analysis of the Oxford and Hertford relief expenditure data based on L = 1,000,000 random arrangements of the observed data in Table 1.2 yields 8,478 calculated t values equal to or more extreme than the observed value of $t_{\mathrm{o}} = +2.68$, and a two-sided approximate probability value of

$$\hat{P} = \frac{8{,}478}{1{,}000{,}000} = 0.0085.$$
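The conventional test reported above can be reproduced from the summary statistics alone; the brief Python check below is not from the original text and uses only the means, variances, and sample sizes quoted in it.

```python
from math import sqrt
from scipy.stats import t as t_dist

# Summary statistics quoted in the text for the 1831 relief expenditures.
n1, xbar1, s2_1 = 24, 20.28, 58.37  # Oxford
n2, xbar2, s2_2 = 12, 13.47, 37.58  # Hertford

sp2 = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)  # pooled variance
t_obs = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_two_sided = 2 * t_dist.sf(t_obs, df=n1 + n2 - 2)

print(round(t_obs, 2))        # approximately 2.68, matching the reported value
print(round(p_two_sided, 4))  # approximately 0.0113
```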
1.6 Overviews of Chaps. 2–6

Chapters 2–6 describe the birth and development of statistical permutation methods. Chapter 2 covers the period from 1920 to 1939; Chap. 3, the period from 1940 to 1959; Chap. 4, the period from 1960 to 1979; and Chap. 5, the period from 1980 to 2000. Chapter 6 looks beyond the year 2000, summarizing the development of permutation methods from 2001 to 2010. Following Chap. 6 is a brief epilogue summarizing the attributes that distinguish permutation statistical methods from conventional statistical methods.
Chapter 2: 1920–1939

Chapter 2 chronicles the period from 1920 to 1939 when the earliest discussions of permutation methods appeared in the literature. In this period J. Spława-Neyman, R.A. Fisher, R.C. Geary, T. Eden, F. Yates, and E.J.G. Pitman laid the foundations of permutation methods as we know them today. As is evident in this period,
permutation methods had their roots in agriculture and, from the beginning, were widely recognized as the gold standard against which conventional methods could be verified and confirmed.
In 1923 Spława-Neyman introduced a permutation model for the analysis of field experiments [1312], and in 1925 Fisher calculated an exact probability using the binomial distribution [448]. Two years later in 1927, Geary used an exact analysis to support the use of asymptotic methods for correlation and regression [500], and in 1933 Eden and Yates used a resampling-approximation permutation approach to validate the assumption of normality in an agricultural experiment [379].
In 1935, Fisher’s well-known hypothesized experiment involving “the lady
tasting tea” was published in the first edition of The Design of Experiments [451]
In 1936, Fisher used a shuffling technique to demonstrate how a permutation testworks [453], and in the same year Hotelling and Pabst utilized permutation methods
to calculate exact probability values for the analysis of rank data [653]
In 1937 and 1938, Pitman published three seminal articles on permutation methods. The first article dealt with permutation methods in general, with an emphasis on the two-sample test; the second article, with permutation methods as applied to bivariate correlation; and the third article, with permutation methods as applied to a randomized blocks analysis of variance [1129–1131].
In addition to laying the foundations for permutation tests, the 1920s and 1930s were also periods in which tools to ease the computation of permutation tests were developed. Probability tables provided exact values for small samples, rank tests simplified the calculations, and desktop calculators became more available. Importantly, statistical laboratories began to appear in the United States in the 1920s and 1930s, notably at the University of Michigan and Iowa State College of Agriculture (now, Iowa State University). These statistical centers not only resulted in setting the foundations for the development of the computing power that would eventually make permutation tests feasible, they also initiated the formal study of statistics as a stand-alone discipline.
Chapter 3: 1940–1959
Chapter 3 explores the period between 1940 and 1959 with attention to the continuing development of permutation methods. This period may be considered as a bridge between the early years, when permutation methods were first conceptualized, and the next period, 1960–1979, in which gains in computer technology provided the necessary tools to successfully employ specific permutation tests.

Between 1940 and 1959, the work on establishing permutation statistical methods that began in the 1920s continued. In the 1940s, researchers applied known permutation techniques to create tables of exact probability values for small samples, among them tables for 2 × 2 contingency tables; the Spearman and Kendall rank-order correlation coefficients; the Wilcoxon, Mann–Whitney, and Festinger two-sample rank-sum tests; and the Mann test for trend.
Theoretical work, driven primarily by the computational challenges of calculating exact permutation probability values, was also completed during this period. Instead of focusing on new permutation tests, however, attention turned to developing simpler computational alternatives by converting data to rank-order statistics. Examples of rank tests developed between 1940 and 1959 include non-parametric randomization tests, exact tests for randomness based on serial correlation, and tests of significance when the underlying probability distribution is unknown.
While this theoretical undertaking continued, other researchers worked on developing practical non-parametric rank tests. Key among these tests were the Kendall rank-order correlation coefficient, the Kruskal–Wallis one-way analysis of variance rank test, the Wilcoxon and Mann–Whitney two-sample rank-sum tests, and the Mood median test.
Chapter 4: 1960–1979
Chapter 4 surveys the development of permutation methods in the period between 1960 and 1979, a period that witnessed dramatic improvements in computer technology, a process that was integral to the further development of permutation statistical methods. Prior to 1960, computers were based on vacuum tubes⁷ and were large, slow, and expensive, and their availability was severely limited. Between 1960 and 1979 computers increasingly became based on transistors and were smaller, faster, more affordable, and more readily available to researchers. As computers became more accessible to researchers, work on permutation tests continued, with much of the focus of that work driven by computer limitations in speed and storage.
During this period, work on permutation methods fell primarily into three categories: writing algorithms that efficiently generated permutation sequences; designing exact permutation analogs for existing parametric statistics; and, for the first time, developing statistics specifically designed for permutation methods. Numerous algorithms were published in the 1960s and 1970s with a focus on increasing the speed and efficiency of the routines for generating permutation sequences. Other researchers focused on existing statistics, creating permutation counterparts for well-known conventional statistics, notably the Fisher exact probability test for 2 × 2 contingency tables, the Pitman test for two independent samples, the F test for randomized block designs, and the chi-squared test for goodness of fit. The first procedures designed specifically for permutation methods, multi-response permutation procedures (MRPP), appeared during this period.
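For a sense of what such enumeration involves, the following is a minimal modern Python sketch (an illustration, not one of the period algorithms): it generates every distinct two-sample arrangement by choosing which index positions form the first sample.

    from itertools import combinations

    def two_sample_arrangements(values, n1):
        # Yield every distinct split of the pooled values into a first
        # sample of size n1 and a second sample of the remaining values.
        idx = range(len(values))
        for first in combinations(idx, n1):
            chosen = set(first)
            x = [values[i] for i in first]
            y = [values[i] for i in idx if i not in chosen]
            yield x, y

    # Small example: 6 pooled values split 3 and 3 gives C(6, 3) = 20
    # equally likely arrangements.
    data = [20.3, 18.9, 25.1, 13.5, 12.0, 15.2]
    print(sum(1 for _ in two_sample_arrangements(data, 3)))  # 20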
⁷ The diode and triode vacuum tubes were invented in 1906 and 1908, respectively, by Lee de Forest.
Chapter 5: 1980–2000
Chapter 5 details the development of permutation methods during the period 1980 to 2000. It is in this period that permutation tests may be said to have arrived. One measure of this arrival was the expansion in the coverage of permutation tests, branching out from the traditional coverage areas in computer technology and statistical journals into such diverse subject areas as anthropology, atmospheric science, biomedical science, psychology, and environmental health. A second measure of the arrival of permutation statistical methods was the sheer number of algorithms that continued to be developed in this period, including the development of a pivotal network algorithm by Mehta and Patel in 1980 [919]. Finally, additional procedures designed specifically for permutation methods, multivariate randomized block permutation (MRBP) procedures, were published in 1982 by Mielke and Iyer [984].
This period was also home to the first books that dealt specifically with permutation tests, including volumes by Edgington in 1980, 1987, and 1995 [392–394], Hubert in 1987 [666], Noreen in 1989 [1041], Good in 1994 and 1999 [522–524], Manly in 1991 and 1997 [875, 876], and Simon in 1997 [1277], among others. Permutation versions of known statistics continued to be developed in the 1980s and 1990s, and work also continued on developing permutation statistical tests that did not possess existing parametric analogs.

Chapter 6: Beyond 2000
Chapter 6 describes permutation methods after the year 2000, an era in which permutation tests have become much more commonplace. Computer memory and speed issues that hampered early permutation tests are no longer factors, and computers are readily available to virtually all researchers. Software packages for permutation tests now exist for well-known statistical programs such as StatXact, SPSS, Stata, and SAS. A number of books on permutation methods have been published in this period, including works by Chihara and Hesterberg in 2011, Edgington and Onghena in 2007 [396], Good in 2000 and 2001 [525–527], Lunneborg in 2000 [858], Manly in 2007 [877], Mielke and Berry in 2001 and 2007 [961, 965], and Pesarin and Salmaso in 2010 [1122].
Among the many permutation methods considered in this period are analysis of variance, linear regression and correlation, analysis of clinical trials, measures of agreement and concordance, rank tests, ridit analysis, power, and Bayesian hierarchical analysis. In addition, permutation methods expanded into new fields of inquiry, including animal research, bioinformatics, chemistry, clinical trials, operations research, and veterinary medicine.
The growth in the field of permutation methods is made palpable by a search of the Web of Science® using the key word "permutation." Between 1915 and 1959, the key word search reveals 43 journal articles. That number increases to 540 articles for the period between 1960 and 1979 and jumps to 3,792 articles for the period between 1980 and 1999. From 2000 to 2010, the key word search for permutation results in 9,259 journal articles.
Epilogue
A brief coda concludes the book. Chapter 2 contains a description of the celebrated "lady tasting tea" experiment introduced by Fisher in 1935 [451, pp. 11–29], which is the iconic permutation test. The Epilogue returns full circle to the lady tasting tea experiment, analyzing the original experiment to summarize the attributes that distinguish permutation tests from conventional tests in general.
Researchers early on understood the superiority of permutation tests for calculating exact probability values. These same researchers also understood well the practical limitations of calculating exact probability values. While some researchers turned to developing asymptotic solutions for calculating probability values, other researchers remained focused on the continued development of permutation tests. This book chronicles the search for better methods of calculating permutation tests, the development of permutation counterparts for existing parametric statistical tests, and the development of separate, unique permutation tests.
2 1920–1939
The second chapter of A Chronicle of Permutation Statistical Methods is devoted to describing the earliest permutation tests and the statisticians who developed them. Examples of these early tests are provided and, in many cases, include the original data. The chapter begins with a brief overview of the development of permutation methods in the 1920s and 1930s and is followed by an in-depth treatment of selected contributions. The chapter concludes with a brief discussion of the early threads in the permutation literature that proved to be important as the field progressed and developed from the early 1920s to the present.
2.1 Overview of This Chapter
The 1920s and 1930s ushered in the field of permutation statistical methods. Several important themes emerged in these early years. First was the use of permutation methods to evaluate statistics based on normal theory. Second was the considerable frustration expressed with the difficulty of the computations on which exact permutation methods were based. Third was the widespread reluctance to substitute permutation methods for normal-theory methods, regarding permutation tests as a valuable device, but not as replacements for existing statistical tests. Fourth was the use of moments to approximate the discrete permutation distribution, as exact computations were too cumbersome except for the very smallest of samples. Fifth was the recognition that a permutation distribution could be based on only the variable portion of the sample statistic, thereby greatly reducing the number of calculations required. Sixth was an early reliance on recursion methods to generate successive values of the test statistic. And seventh was a fixation on the use of levels of significance, such as $\alpha = 0.05$, even when the exact probability value was available from the discrete permutation distribution.
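To illustrate the fifth theme, consider the pooled two-sample t statistic; the reduction below is a standard identity, sketched here for concreteness rather than taken from any particular paper of the period. Under permutation of a fixed pooled sample of size $N = n_1 + n_2$, the total sum $S$ and total sum of squares $Q$ never change, so the entire statistic is determined by $T$, the sum of the values assigned to the first sample:
\[
\bar{x}_1 - \bar{x}_2 = \frac{T}{n_1} - \frac{S - T}{n_2},
\qquad
s_p^2 = \frac{Q - S^2/N - (\bar{x}_1 - \bar{x}_2)^2\, n_1 n_2/N}{N - 2}.
\]
Because $|t|$ is a monotone function of $T$ alone, only $T$, the variable portion of the statistic, need be computed for each arrangement.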
The initial contributions to permutation methods were made by J. Spława-Neyman, R.A. Fisher, and R.C. Geary in the 1920s [448, 500, 1312]. Spława-Neyman's 1923 article foreshadowed the use of permutation methods, which were developed
by Fisher while at the Rothamsted Experimental Station. In 1927, Geary was the first to use an exact permutation analysis to evaluate and demonstrate the utility of asymptotic approaches. In the early 1930s T. Eden and F. Yates utilized permutation methods to evaluate conventional parametric methods in an agricultural experiment, using a random sample of all permutations of the observed data comprised of measurements on heights of Yeoman II wheat shoots [379]. This was perhaps the first example of the use of resampling techniques in an experiment. The middle 1930s witnessed three articles emphasizing permutation methods to generate exact probability values for 2 × 2 contingency tables by R.A. Fisher, F. Yates, and J.O. Irwin [452, 674, 1472]. In 1926 Fisher published an article on "The arrangement of field experiments" [449] in which the term "randomization" was apparently used for the first time [176, 323]. In 1935 Fisher compared the means of randomized pairs of observations by permutation methods using data from Charles Darwin on Zea mays plantings [451], and in 1936 Fisher described a card-shuffling procedure for analyzing data that offered an alternative approach to permutation statistical tests [453].
In 1936 H. Hotelling and M.R. Pabst utilized permutation methods to circumvent the assumption of normality and for calculating exact probability values for small samples of rank data [653], and in 1937 M. Friedman built on the work of Hotelling and Pabst to investigate the use of rank data in the ordinary analysis of variance [485]. In 1937 B.L. Welch compared the normal theory of Fisher's variance-ratio z test (later, Snedecor's F test) with permutation-version analyses of randomized block and Latin square designs [1428], and in 1938 Welch used an exact permutation test to address tests of homogeneity for the correlation ratio, $\eta^2$ [1429]. Egon Pearson was highly critical of permutation methods, especially the permutation methods of Fisher, and in 1937 Pearson published an important critique of permutation methods with special attention to the works of Fisher on the analysis of Darwin's Zea mays data and Fisher's thinly-veiled criticism of the coefficient of racial likeness developed by Pearson's famous father, Karl Pearson [1093].
In 1937 and 1938 E.J.G. Pitman published three seminal articles on permutation tests in which he examined permutation versions of two-sample tests, bivariate correlation, and randomized blocks analysis of variance [1129–1131]. Building on the work of Hotelling and Pabst in 1936, E.G. Olds used permutation methods to generate exact probability values for Spearman's rank-order correlation coefficient in 1938 [1054], and in that same year M.G. Kendall incorporated permutation methods in the construction of a new measure of rank-order correlation based on the difference between the sums of concordant and discordant pairs [728]. Finally, in 1939 M.D. McCarthy argued for the use of permutation methods as first approximations before considering the data by means of an asymptotic distribution.
2.2 Neyman–Fisher–Geary and the Beginning
Although precursors to permutation methods based on discrete probability values were common prior to 1920 [396, pp. 13–15], it was not until the early 1920s that statistical tests were developed in forms that are recognized today as permutation methods. The 1920s and 1930s were critical to the development of permutation methods because it was during this nascent period that permutation methods were first conceptualized and began to develop into a legitimate statistical approach. The beginnings are founded in three farsighted publications in the 1920s by J. Spława-Neyman, R.A. Fisher, and R.C. Geary.¹
2.2.1 Spława-Neyman and Agricultural Experiments
In 1923 Jerzy Spława-Neyman introduced a permutation model for the analysis of agricultural field experiments. This early paper used permutation methods to compare and evaluate differences among several crop varieties [1312].
J. Spława-Neyman
Jerzy Spława-Neyman earned an undergraduate degree from the University of Kharkov (later, Maxim Gorki University²) in mathematics in 1917, and the following year was a docent at the Institute of Technology, Kharkov. He took his first job as the only statistician at the National Institute of Agriculture in Bydgoszcz in northern Poland and went on to receive a Ph.D. in mathematics from the University of Warsaw in 1924 with a dissertation, written in Bydgoszcz, on applying the theory of probability to agricultural experiments [817, p. 161]. It was during this period that he dropped the "Spława" from his surname, resulting in the more commonly recognized Jerzy Neyman. Constance Reid, Spława-Neyman's biographer, explained that Neyman published his early papers under the name Spława-Neyman, and that the word Spława refers to Neyman's family coat of arms and was a sign of nobility [1160, p. 45]. Spława-Neyman is used here because the 1923 paper was published under that name.
After a year of lecturing on statistics at the Central College of Agriculture in Warsaw and the Universities of Warsaw and Krakow, Neyman was sent by the Polish government to University College, London, to study statistics with Karl Pearson [817, p. 161]. Thus it was in 1925 that Neyman moved to England and, coincidentally, began a decade-long association with Egon Pearson, the son of Karl Pearson. That collaboration eventually yielded

(continued)
¹ For an enlightened discussion of the differences and similarities between Neyman and Fisher and their collective impact on the field of statistics, see a 1996 article by Stephen Fienberg and Judith Tanur in International Statistical Review [430] and also E.L. Lehmann's remarkable last book, published posthumously in 2011, on Fisher, Neyman, and the Creation of Classical Statistics [816].
² Maxim Gorki (Maksim Gorky) is a pseudonym for Aleksei Maksimovich Peshkov (1868–1936), Russian short-story writer, novelist, and political activist.