Statistical Tables, Explained and Applied

Louis Laurencelle, Université du Québec à Trois-Rivières, Canada
François-A. Dupuis, Université Laval, Canada

World Scientific: New Jersey, London, Singapore, Hong Kong

Published by World Scientific Publishing Co. Pte. Ltd., P O Box 128, Farrer Road, Singapore 912805. USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661. UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE.

British Library Cataloguing-in-Publication Data: a catalogue record for this book is available from the British Library.

Translation of the original French edition, Copyright © 2000 by Les Éditions Le Griffon d'argile.

STATISTICAL TABLES, EXPLAINED AND APPLIED. Copyright © 2002 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher. For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; in this case permission to photocopy is not required from the publisher.

ISBN 981-02-4919-5
ISBN 981-02-4920-9 (pbk)

Printed in Singapore by World Scientific Printers.

Contents

Introduction ... vii
Common abbreviations and notations ... ix
Normal distribution
Chi-square (χ²) distribution ... 17
Student's t distribution, with Dunn-Sidak's t and significance table for r ... 27
F distribution ... 43
Studentized range (q) distribution ... 63
Dunnett's t distribution ... 73
E² (monotonic variation) distribution ... 85
Fmax distribution ... 103
Cochran's C distribution ... 113
Orthogonal polynomials ... 125
Binomial distribution ... 147
Number-of-runs distribution ... 169
Random numbers ... 185
Supplementary examples ... 197
Mathematical complements ... 211
    Beta [Beta distribution βx(a,b), Beta function B(a,b)] ... 213
    Binomial expansion ... 213
    Combinations, C(m,n) ... 214
    Correlation coefficient, ρXY, rXY ... 214
    Distribution function, P(x) ... 214
    Expectation (of a random variable), μ or E(X) ... 215
    Exponential distribution, E(θ) ... 215
    Factorial (function), n! ... 215
    Factorial (ascending, descending) ... 215
    Gamma [Gamma distribution Gk(x), Gamma function Γ(x)] ... 216
    Integration (analytic, direct) ... 216
    Integration (numerical) ... 217
    Interpolation (linear, harmonic) ... 217
    Mean (of a random variable), μ or E(X), X̄ ... 218
    Moments of a distribution [μ, σ², γ1, γ2] ... 218
    Moment estimates [X̄, s², g1, g2] ... 219
    Poisson distribution, Po(λt) ... 220
    Probability density function, p(x) ... 220
    Probability distribution function, P(x) ... 220
    Simpson's (parabolic) rule ... 220
    Standard deviation (of a random variable), σ, s ... 221
    Uniform distribution, U(a,b) and U(0,1) ... 221
    Variance (of a random variable), σ² or var(X), s² ... 222
Bibliographical references ... 223
Index of examples ... 227
General index ... 231

Introduction

While preparing this book for publication, we had in mind three objectives: (1) to make available, in a handy format, tables of areas, percentiles and critical values for current applications of inferential statistics; (2) to provide, for each table, clear and sufficient guidelines as to their correct use and interpretation; and (3) to present the mathematical basis for the interested reader, together with the recipes and computational algorithms that were used to produce the tables.

As for our first objective, the reader will find several "classical" tables of distributions, like those of the normal law, Student's t, Chi-square and F of Fisher-Snedecor. All values have been re-computed, occasionally with our own algorithms; if our values should disagree with older ones, ours should prevail!
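Re-computing a classical table value is straightforward with modern numerical libraries. As a minimal sketch (using the error function, one standard route, and not necessarily the authors' own algorithms), the normal distribution function can be evaluated as follows:

```python
import math

def normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function.
    (A standard numerical route; the book's own algorithms are not shown here.)"""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A familiar table entry: Phi(1.96) is about 0.9750
print(round(normal_cdf(1.96), 4))
```

For example, normal_cdf(1.0) reproduces the value Φ(1) ≈ 0.841344746 quoted in the Mathematical complements.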
Moreover, many other tables are new or made available for the first time; let us mention those of centiles for the E² statistic concerning nonlinear monotonic variation in analysis of variance (ANOVA), of coefficients for the reconversion of orthogonal polynomials, and an extensive set of critical values for the binomial and the number-of-runs distributions.

To meet our second objective, we provide, for each distribution, a section on how to read off and use appropriate values in the tables, and another one with illustrative examples. Supplementary examples are presented in a separate section, thus covering most common situations in the realm of significance-testing procedures.

Finally, our third objective required us to compile more or less scattered and ill-known published documents on the origins, properties and computational algorithms (exact or approximate) for each selected distribution or probability law. For the most important distributions (normal, χ², t, F, binomial, random numbers), we present computer algorithms that efficiently generate pseudo-random values with chosen characteristics. The reader should benefit from our compiled information and results, as we have tried to render them in the simplest and most easy-to-use fashion.

The selection of our set of tabled statistical distributions (there are many more) has been partly dictated by the practice of ANOVA. Thus, statistics like Hartley's Fmax and Cochran's C are often used for assessing the equality-of-variances assumption generally required for a valid significance test with the F-ratio distribution. Also, Dunn-Sidak's t test, the Studentized range q statistic, the E² statistic and orthogonal polynomials all serve for comparing means in the context of ANOVA or in its sequel. Apart from Winer's classic Statistical principles in experimental design (McGraw-Hill, 1971, 1991), we refer the reader to Hochberg and Tamhane's (Wiley, 1987) treatise on that topic.
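The generation of pseudo-random variates mentioned above can be illustrated with a small sketch. The Box-Muller transform below is one common method and is only an assumption about how such a generator might look, not the book's algorithm:

```python
import math
import random

def normal_variate(rng=random.random):
    """One standard normal variate via the Box-Muller transform."""
    u1, u2 = rng(), rng()
    # 1 - u1 lies in (0, 1], so the logarithm is always defined
    return math.sqrt(-2.0 * math.log(1.0 - u1)) * math.cos(2.0 * math.pi * u2)

def chi_square_variate(df, rng=random.random):
    """A chi-square variate with integer df, as a sum of squared normals."""
    return sum(normal_variate(rng) ** 2 for _ in range(df))
```

Sums, ratios and transforms of such variates then yield t, F and the other tabled distributions.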
Briefly, our suggestion for the interpretation of effects in ANOVA is a function of the research hypotheses on the effects of the independent variable (I.V.). We distinguish two global settings:

1) If there are no specific or directional hypotheses on the effects of the I.V., the global F test for main effect may suffice. When significant at the prescribed level, that test indicates that the I.V. succeeded in bringing up real modifications in the measured phenomenon. Should we want a detailed analysis of the effects of the I.V., we may compare means pairwise according to the levels of the I.V.; the usually recommended procedure for this is the HSD test of Tukey; some may prefer the less conservative Newman-Keuls approach. If the wished-for comparisons extend beyond the mean vs. mean format to include linear combinations of means (i.e. group of means vs. group of means), Scheffé's procedure and criterion may be called for, based on the F distribution.

2) If planned or pre-determined comparisons are in order, justified by specific or directional research hypotheses or by the structure of the I.V., the test to be applied depends on such structure and/or hypotheses. For comparing one level of the I.V. (e.g. mean Xk) to every other level (e.g. X1, X2, ..., Xk-1), Dunnett's t test may be used. To verify that a given power of the I.V. (or regressor variable) has a linear influence on the dependent (or measured) variable, orthogonal polynomials analysis is well suited, except when the research hypothesis does not specify a particular function or the I.V. is not metric, in which cases tests on monotonic variation, using the E² statistic, may be applied. On the other hand, if specific hypotheses concern only a subset of pairwise comparisons, an appropriate procedure is Dunn-Sidak's t test [akin to the Bonferroni probability criterion].

Such simple rules as those given above cannot serve to cope adequately with every special case or situation that one encounters in the practice of ANOVA.
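As an aside, the Dunn-Sidak criterion mentioned in setting 2 amounts to running each of the m planned comparisons at a reduced per-comparison significance level. A minimal sketch of the standard correction formulas (an illustration, not code from the book):

```python
def sidak_alpha(alpha, m):
    """Per-comparison level under the Dunn-Sidak correction:
    alpha' = 1 - (1 - alpha)**(1/m), for m planned comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

def bonferroni_alpha(alpha, m):
    """The slightly more conservative Bonferroni analogue, alpha/m."""
    return alpha / m
```

With alpha = 0.05 and m = 5 comparisons, sidak_alpha gives about 0.0102, marginally more liberal than the Bonferroni value 0.01.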
Controversial as they may be, we propose these rules as a starting point to help clarify and improve the criteria and procedures by which means are to be compared in ANOVA designs.

The "Mathematical complements" section, at the end of the book, is in fact a short dictionary of concepts and methods, statistical as well as mathematical, that appear in the distribution sections. The purpose of the book being mostly utilitarian, we limit ourselves to the presentation of the main results, theorems and conclusions, without algebraic demonstration or proof. However, as one may read in some mathematical textbooks, "the reader may easily verify that ..."

Common abbreviations and notations

d.f.: distribution function (of r.v. X), also denoted P(x)
df: degrees of freedom (of a statistic), related to parameter ν in some p.d.f.'s
E(X): mathematical expectation (or mean) of r.v. X, relative to some p.d.f.
e: Euler's constant, defined by e = 1 + 1/1! + 1/2! + ... ≈ 2.7183
exp(x): value of e (e ≈ 2.7183) to the xth power, or e^x
ln: natural (or Napierian) logarithm
p.d.f.: probability density function (of r.v. X), also denoted p(x)
p.m.f.: probability mass function (of X, a discrete r.v.), also denoted p(x)
π: usually, area of the unit circle (π ≈ 3.1416); may also designate the true probability of success in a trial, in binomial (or Bernoulli) sampling
r.v.: random variable, random variate
s.d.: standard deviation (s or sx for a sample, σ or σx for a population or p.d.f.)
var: variance, usually population variance (σ²)

Mathematical complements

... segmenting the interval in many sub-intervals and mimicking the variation of f(x) in each sub-interval with a 2nd-degree function, i.e. a parabolic arc. Simpson's rule demands that the original interval be partitioned into 2n segments, allowing the positioning of n arcs. In each segment, we get the triple of coordinates {(xL, yL), (xC, yC), (xR, yR)} corresponding to the Left, Center and Right positions in the segment. Then, using y = f(x), the area component for that segment is:

    (Area) = [yL + 4yC + yR] × (1/6)(xR - xL).

Easy to implement in a computer program, this method is the most efficient among the so-called simple methods of numerical integration.

Example. Let Q = ∫₀¹ √x dx = 2/3. In order to position a (single) parabolic arc in the integration interval (0, 1), we must have yL = f(0) = 0, yC = f(½) ≈ 0.7071, and yR = f(1) = 1, whence Q1 = [0 + 4×0.7071 + 1] × (1/6)(1 - 0) ≈ 0.63807. For two arcs, using ordinates f(0), f(¼), f(½), f(¾) and f(1), we get Q2 ≈ [0 + 4×0.5 + 0.7071] × (1/6)(0.5) + [0.7071 + 4×0.8660 + 1] × (1/6)(0.5) = [0 + 4×0.5 + 2×0.7071 + 4×0.8660 + 1] × (1/6)(0.5) ≈ 0.65652. Likewise, we could calculate Q3 ≈ 0.66115, Q4 ≈ 0.66308, ..., with Qn → 2/3 as n increases.

Example. Let Φ(1) ≈ 0.841344746, where, for z ≥ 0, Φ(z) = ½ + ∫₀ᶻ φ(z) dz. Using a single parabolic arc and ordinates φ(0) = 1/√(2π) ≈ 0.39894, φ(½) ≈ 0.35207, φ(1) ≈ 0.24197, we obtain Φ1 ≈ ½ + [0.39894 + 4×0.35207 + 0.24197] × (1/6)(1 - 0) ≈ 0.84153.

General index (fragment)

F 57; rel. with t 57; rel. with χ² 57; rel. with Beta 101, 213; rel. with binomial 58
Fmax 111; rel. with F 111
Gamma 216
normal 13
Poisson (rel. with Chi-square) 219
rectangular (discrete uniform) 221
Student's t 39; rel. with F(1, ν) 40
uniform 221
Dunnett's method 82
Dunn-Sidak criterion 40; example 36
Expectation 215; see Moments
Factorial (function) 215; rel. with Gamma function 215; ascending 215; descending 215
Gamma: distribution 216; function 216; rel. with factorial 216
g1, γ1 (skewness index) 218-219
g2, γ2 (kurtosis index) 218-219
Generation of random variates
(r.v.'s): binomial 167-168; Chi-square 25; exponential 24; F 60; normal 14, from random numbers 192, correlated normal 15; Student's t 41; uniform 193-194
Harmonic: interpolation, see Interpolation; mean of unequal sample sizes nj, for q test 69, 210
Homogeneity of variances: see Test procedures
HSD (Tukey's method) 67
Integration: analytic, direct 216; numerical 217, trapezoidal rule 217, Simpson's (parabolic) rule 220
Interpolation: for sufficient size n*(r) 41; harmonic, Def. 217, for F 53, for q 67, for C 119, for ln Fmax 109; linear, Def. 217, for F 53, for q 67
Isotonic regression: see Regression (monotonic)
Kurtosis (γ2, g2) 218-219
Mean 218; see Moments; absolute diff., expect., for X normal 14; of upper 100α% for X normal 14
Model: Dominance (monotonic variation), Def. 95, 99, weight function 101; normal 12, for IQ, height, measurement error 12, and goodness-of-fit test 203; Poisson, for rare events 219; runs test for residuals with respect to model 182; Simple order (monotonic variation), Def. 95, 99, weight function 101
Moments, Def. 218: Beta 213; binomial 165; calculation by d.f. 112, 122; C (Cochran's) (partial) 123; Chi (χ/√ν) 25; Chi-square 24; exponential 215; F 59-60, for paired (correlated) variances 60; Fmax (partial) 112; Gamma 216; normal 14; number of runs 183-184; Poisson 219; product of two indep. r.v.'s 70; q (partial) 71; rectangular (discrete uniform) 221; sample estimates 219; Student's t 40; uniform 221
Multiple comparisons methods: general rules viii; Dunnett 82; Newman-Keuls 68; Scheffé 208; Tukey HSD 67
Newman-Keuls' method 68
Permutations: generation of random permutations of numbers 1 to N 195
Probability density function (p.d.f.), Def. 220: Beta 213; Cauchy (as t1) 39; Chi (χ/√ν) 25; Chi-square 23; exponential 215, rel. with Gamma 215; F 57, for paired (corr.) variances 60; Gamma 216, rel. with exponential 216; normal 13; q 69; range of k normal r.v.'s 70; Student's t 38; uniform 192, 221
Probability distribution function (d.f.): see Distribution function
Probability mass function (p.m.f.)
binomial 165; number of runs 183; Poisson 219; rectangular (discrete uniform) 192, 221
Regression: monotonic (isotonic) 95-101; polynomial, Def. 140, 133, conversion from orthogonal to regular 138-139, runs of positive and negative residuals 182, simple linear by orthogonal 136, test of predicted value from linear regression 201
Run(s), Def. 181
Sampling (generation of samples) 191; with/without replacement 194-195
Sampling distribution: of mean 199; of p (proportion) 161-168; of r; of s.d. s 25; of w/s 70; of variance 23
Scheffé's method 208-210
Shape coefficients (of a distribution): see Moments
Simpson's rule 220; see Integration (numerical)
Skewness (γ1, g1) 218-219
Standard deviation (s.d.) 221; confidence interval 21; see Moments
Standard error: of estimation (based on linear regression) 202; of measurement 12; of a predicted value from linear regression 202
Test procedures: analysis of variance, see Analysis of variance; for correlation (sample r), with H0: ρ = 0 37, 41, with H0: ρ = ρ0 200; Dominance model of monotonic variation 98; goodness-of-fit (by Chi-square) 193, 202; guessing in a multi-item multiple-choice exam 162; Heads or Tails 162; interaction (dependence) in contingency table 22; mean (sample X̄), σ known 199, σ² unknown 199; number of runs 181-182; power components in polynomial regression 133, 137; proportion (sample p) 163; Simple order model of monotonic variation 96; for the diff. between two error-prone measurements 12; two indep. r coefficients 207; two indep. means 36, σ known 204; two indep. variances 55; two paired means 205; two paired variances 206; means (multiple planned compar.), according to Dunn-Sidak 36, by Dunnett's method 82; means (multiple unplanned compar.)
by Newman-Keuls' method 68; by Tukey's HSD method 67; by Scheffé's method 208-210
homogeneity of variances: k = 2 variances 55; k > 2 109-110, 119, 207; rules for multiple comparisons viii
Tukey's HSD method 67
Unequal groups: Bartlett's χ² 207-208; C 119-120; Dunnett's t 83; Fmax 110-111; q 68-69; Scheffé's method 210
Variance 222; component of orthogonal polynomial regression 142; confidence interval 21; homogeneity of, see Test procedures; see Moments

Statistical Tables, Explained and Applied

This book contains several new or unpublished tables, such as one on the significance of the correlation coefficient r, one giving the percentiles of the E² statistic for monotonic variation (with two structural models of variation), an extensive table for the number-of-runs test, three tables for the binomial sum of probabilities, and a table of coefficients for the re-conversion of orthogonal polynomials. In the case of the more familiar tables, such as those of the normal integral, or Student's t, Chi-square and F percentiles, all values have been re-computed, occasionally with the authors' own algorithms, using the most accurate methods available today.

For each of the fifteen distributions in the book, the authors have gathered the essential information so that interested readers can handle by themselves all phases of the computations. An appendix, containing supplementary examples that pertain to the various tables, helps to complete the authors' review of current hypothesis-testing procedures. A mini-dictionary of often-used concepts and methods, statistical as well as mathematical, completes the book.

Besides meeting the needs of practitioners of inferential statistics, this book should be helpful to statistics teachers as well as graduate students, researchers and professionals in scientific computing, who will all find it a rich source of essential data and references on the more important statistical distributions.

www.worldscientific.com
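The parabolic (Simpson's) rule described in the Mathematical complements excerpt above is easy to program. A minimal sketch (an illustration, not the book's program):

```python
import math

def simpson(f, a, b, n_arcs):
    """Composite Simpson's rule: split [a, b] into n_arcs parabolic arcs
    (2*n_arcs half-segments) and sum [yL + 4*yC + yR] * (xR - xL) / 6."""
    h = (b - a) / n_arcs
    total = 0.0
    for i in range(n_arcs):
        x_left = a + i * h
        x_right = x_left + h
        x_center = 0.5 * (x_left + x_right)
        total += (f(x_left) + 4.0 * f(x_center) + f(x_right)) * h / 6.0
    return total

# Q1 and Q2 from the sqrt(x) example in the complements
print(simpson(math.sqrt, 0.0, 1.0, 1), simpson(math.sqrt, 0.0, 1.0, 2))
```

With one arc this reproduces Q1 ≈ 0.63807, with two arcs Q2 ≈ 0.65652, and the estimates converge to 2/3 as the number of arcs grows.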
Normal distribution

Mathematical presentation
Calculation and moments
Generation of pseudo-random variates

[Figure: normal density curves, e.g. N(60, 5²), together with the standard normal distribution]

... standard score, whose density function is the so-called standard normal distribution, N(0, 1). The maximum p.d.f., at z = 0, equals p(0) ≈ 0.3989, and