Mathematical Statistics
Basic Ideas and Selected Topics
Volume I
Second Edition

Peter J. Bickel
University of California

Kjell A. Doksum
University of California

PRENTICE HALL
Upper Saddle River, New Jersey 07458

Library of Congress Cataloging-in-Publication Data

Bickel, Peter J.
Mathematical statistics: basic ideas and selected topics / Peter J. Bickel, Kjell A. Doksum. 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-13-850363-X (v. 1)
1. Mathematical statistics. I. Doksum, Kjell A. II. Title.
QA276.B47 2001
519.5-dc21
00-031377

Acquisition Editor: Kathleen Boothby Sestak
Editor in Chief: Sally Yagan
Assistant Vice President of Production and Manufacturing: David W. Riccardi
Executive Managing Editor: Kathleen Schiaparelli
Senior Managing Editor: Linda Mihatov Behrens
Production Editor: Bob Walters
Manufacturing Buyer: Alan Fischer
Manufacturing Manager: Trudy Pisciotti
Marketing Manager: Angela Battle
Marketing Assistant: Vince Jansen
Director of Marketing: John Tweeddale
Editorial Assistant: Joanne Wendelken
Art Director: Jayne Conte
Cover Design: Jayne Conte

© 2001, 1977 by Prentice-Hall, Inc.
Upper Saddle River, New Jersey 07458

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America

ISBN: 0-13-850363-X

Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall of Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Pearson Education Asia Pte. Ltd.
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro

To Erich L. Lehmann
CONTENTS

PREFACE TO THE SECOND EDITION: VOLUME I
PREFACE TO THE FIRST EDITION

1 STATISTICAL MODELS, GOALS, AND PERFORMANCE CRITERIA
  1.1 Data, Models, Parameters, and Statistics
    1.1.1 Data and Models
    1.1.2 Parametrizations and Parameters
    1.1.3 Statistics as Functions on the Sample Space
    1.1.4 Examples, Regression Models
  1.2 Bayesian Models
  1.3 The Decision Theoretic Framework
    1.3.1 Components of the Decision Theory Framework
    1.3.2 Comparison of Decision Procedures
    1.3.3 Bayes and Minimax Criteria
  1.4 Prediction
  1.5 Sufficiency
  1.6 Exponential Families
    1.6.1 The One-Parameter Case
    1.6.2 The Multiparameter Case
    1.6.3 Building Exponential Families
    1.6.4 Properties of Exponential Families
    1.6.5 Conjugate Families of Prior Distributions
  1.7 Problems and Complements
  1.8 Notes
  1.9 References

2 METHODS OF ESTIMATION
  2.1 Basic Heuristics of Estimation
    2.1.1 Minimum Contrast Estimates; Estimating Equations
    2.1.2 The Plug-In and Extension Principles
  2.2 Minimum Contrast Estimates and Estimating Equations
    2.2.1 Least Squares and Weighted Least Squares
    2.2.2 Maximum Likelihood
  2.3 Maximum Likelihood in Multiparameter Exponential Families
  *2.4 Algorithmic Issues
    2.4.1 The Method of Bisection
    2.4.2 Coordinate Ascent
    2.4.3 The Newton-Raphson Algorithm
    2.4.4 The EM (Expectation/Maximization) Algorithm
  2.5 Problems and Complements
  2.6 Notes
  2.7 References

3 MEASURES OF PERFORMANCE
  3.1 Introduction
  3.2 Bayes Procedures
  3.3 Minimax Procedures
  *3.4 Unbiased Estimation and Risk Inequalities
    3.4.1 Unbiased Estimation, Survey Sampling
    3.4.2 The Information Inequality
  *3.5 Nondecision Theoretic Criteria
    3.5.1 Computation
    3.5.2 Interpretability
    3.5.3 Robustness
  3.6 Problems and Complements
  3.7 Notes
  3.8 References

4 TESTING AND CONFIDENCE REGIONS
  4.1 Introduction
  4.2 Choosing a Test Statistic: The Neyman-Pearson Lemma
  4.3 Uniformly Most Powerful Tests and Monotone Likelihood Ratio Models
  4.4 Confidence Bounds, Intervals, and Regions
  4.5 The Duality Between Confidence Regions and Tests
  *4.6 Uniformly Most Accurate Confidence Bounds
  *4.7 Frequentist and Bayesian Formulations
  4.8 Prediction Intervals
  4.9 Likelihood Ratio Procedures
    4.9.1 Introduction
    4.9.2 Tests for the Mean of a Normal Distribution: Matched Pair Experiments
    4.9.3 Tests and Confidence Intervals for the Difference in Means of Two Normal Populations
    4.9.4 The Two-Sample Problem with Unequal Variances
    4.9.5 Likelihood Ratio Procedures for Bivariate Normal Distributions
  4.10 Problems and Complements
  4.11 Notes
  4.12 References

5 ASYMPTOTIC APPROXIMATIONS
  5.1 Introduction: The Meaning and Uses of Asymptotics
  5.2 Consistency
    5.2.1 Plug-In Estimates and MLEs in Exponential Family Models
    5.2.2 Consistency of Minimum Contrast Estimates
  5.3 First- and Higher-Order Asymptotics: The Delta Method with Applications
    5.3.1 The Delta Method for Moments
    5.3.2 The Delta Method for In Law Approximations
    5.3.3 Asymptotic Normality of the Maximum Likelihood Estimate in Exponential Families
  5.4 Asymptotic Theory in One Dimension
    5.4.1 Estimation: The Multinomial Case
    *5.4.2 Asymptotic Normality of Minimum Contrast and M-Estimates
    *5.4.3 Asymptotic Normality and Efficiency of the MLE
    *5.4.4 Testing
    *5.4.5 Confidence Bounds
  5.5 Asymptotic Behavior and Optimality of the Posterior Distribution
  5.6 Problems and Complements
  5.7 Notes
  5.8 References

6 INFERENCE IN THE MULTIPARAMETER CASE
  6.1 Inference for Gaussian Linear Models
    6.1.1 The Classical Gaussian Linear Model
    6.1.2 Estimation
    6.1.3 Tests and Confidence Intervals
  *6.2 Asymptotic Estimation Theory in p Dimensions
    6.2.1 Estimating Equations
    6.2.2 Asymptotic Normality and Efficiency of the MLE
    6.2.3 The Posterior Distribution in the Multiparameter Case
  *6.3 Large Sample Tests and Confidence Regions
    6.3.1 Asymptotic Approximation to the Distribution of the Likelihood Ratio Statistic
    6.3.2 Wald's and Rao's Large Sample Tests
  *6.4 Large Sample Methods for Discrete Data
    6.4.1 Goodness-of-Fit in a Multinomial Model. Pearson's $\chi^2$ Test
    6.4.2 Goodness-of-Fit to Composite Multinomial Models. Contingency Tables
    6.4.3 Logistic Regression for Binary Responses
  *6.5 Generalized Linear Models
  *6.6 Robustness Properties and Semiparametric Models
  6.7 Problems and Complements
  6.8 Notes
  6.9 References

A A REVIEW OF BASIC PROBABILITY THEORY
  A.1 The Basic Model
  A.2 Elementary Properties of Probability Models
  A.3 Discrete Probability Models
  A.4 Conditional Probability and Independence
  A.5 Compound Experiments
  A.6 Bernoulli and Multinomial Trials, Sampling With and Without Replacement
  A.7 Probabilities on Euclidean Space
  A.8 Random Variables and Vectors: Transformations
  A.9 Independence of Random Variables and Vectors
  A.10 The Expectation of a Random Variable
  A.11 Moments
  A.12 Moment and Cumulant Generating Functions
  A.13 Some Classical Discrete and Continuous Distributions
  A.14 Modes of Convergence of Random Variables and Limit Theorems
  A.15 Further Limit Theorems and Inequalities
  A.16 Poisson Process
  A.17 Notes
  A.18 References

B ADDITIONAL TOPICS IN PROBABILITY AND ANALYSIS
  B.1 Conditioning by a Random Variable or Vector
    B.1.1 The Discrete Case
    B.1.2 Conditional Expectation for Discrete Variables
    B.1.3 Properties of Conditional Expected Values
    B.1.4 Continuous Variables
    B.1.5 Comments on the General Case
  B.2 Distribution Theory for Transformations of Random Vectors
    B.2.1 The Basic Framework
    B.2.2 The Gamma and Beta Distributions
  B.3 Distribution Theory for Samples from a Normal Population
    B.3.1 The $\chi^2$, F, and t Distributions
    B.3.2 Orthogonal Transformations
  B.4 The Bivariate Normal Distribution
  B.5 Moments of Random Vectors and Matrices
    B.5.1 Basic Properties of Expectations
    B.5.2 Properties of Variance
  B.6 The Multivariate Normal Distribution
    B.6.1 Definition and Density
    B.6.2 Basic Properties. Conditional Distributions
  B.7 Convergence for Random Vectors: $O_P$ and $o_P$ Notation
  B.8 Multivariate Calculus
  B.9 Convexity and Inequalities
  B.10 Topics in Matrix Theory and Elementary Hilbert Space Theory
    B.10.1 Symmetric Matrices
    B.10.2 Order on Symmetric Matrices
    B.10.3 Elementary Hilbert Space Theory
  B.11 Problems and Complements
  B.12 Notes
  B.13 References

C TABLES
  Table I The Standard Normal Distribution
  Table I' Auxiliary Table of the Standard Normal Distribution
  Table II t Distribution Critical Values
  Table III $\chi^2$ Distribution Critical Values
  Table IV F Distribution Critical Values

INDEX


Appendix C
TABLES

[Table I: The standard normal distribution, Pr(Z < z)]

Table I' Auxiliary table of the standard normal distribution

Pr(Z > z)    .50    .45    .40    .35    .30    .25    .20    .15    .10
z              0   .126   .253   .385   .524   .674   .842  1.036  1.282

Pr(Z > z)    .09    .08    .07    .06    .05    .04    .03   .025
z          1.341  1.405  1.476  1.555  1.645  1.751  1.881  1.960

Pr(Z > z)    .02    .01   .005   .001  .0005  .0001 .00005 .00001
z          2.054  2.326  2.576  3.090  3.291  3.719  3.891  4.265

Entries in the top row are areas to the right of values in the second row.

Table II t distribution critical values, Pr(T > t)
                              Right tail probability p
df      .25    .10    .05   .025    .02    .01   .005  .0025   .001  .0005
1     1.000  3.078  6.314  12.71  15.89  31.82  63.66  127.3  318.3  636.6
2     0.816  1.886  2.920  4.303  4.849  6.965  9.925  14.09  22.33  31.60
3     0.765  1.638  2.353  3.182  3.482  4.541  5.841  7.453  10.21  12.92
4     0.741  1.533  2.132  2.776  2.999  3.747  4.604  5.598  7.173  8.610
5     0.727  1.476  2.015  2.571  2.757  3.365  4.032  4.773  5.893  6.869
6     0.718  1.440  1.943  2.447  2.612  3.143  3.707  4.317  5.208  5.959
7     0.711  1.415  1.895  2.365  2.517  2.998  3.499  4.029  4.785  5.408
8     0.706  1.397  1.860  2.306  2.449  2.896  3.355  3.833  4.501  5.041
9     0.703  1.383  1.833  2.262  2.398  2.821  3.250  3.690  4.297  4.781
10    0.700  1.372  1.812  2.228  2.359  2.764  3.169  3.581  4.144  4.587
11    0.697  1.363  1.796  2.201  2.328  2.718  3.106  3.497  4.025  4.437
12    0.695  1.356  1.782  2.179  2.303  2.681  3.055  3.428  3.930  4.318
13    0.694  1.350  1.771  2.160  2.282  2.650  3.012  3.372  3.852  4.221
14    0.692  1.345  1.761  2.145  2.264  2.624  2.977  3.326  3.787  4.140
15    0.691  1.341  1.753  2.131  2.249  2.602  2.947  3.286  3.733  4.073
16    0.690  1.337  1.746  2.120  2.235  2.583  2.921  3.252  3.686  4.015
17    0.689  1.333  1.740  2.110  2.224  2.567  2.898  3.222  3.646  3.965
18    0.688  1.330  1.734  2.101  2.214  2.552  2.878  3.197  3.610  3.922
19    0.688  1.328  1.729  2.093  2.205  2.539  2.861  3.174  3.579  3.883
20    0.687  1.325  1.725  2.086  2.197  2.528  2.845  3.153  3.552  3.850
21    0.686  1.323  1.721  2.080  2.189  2.518  2.831  3.135  3.527  3.819
22    0.686  1.321  1.717  2.074  2.183  2.508  2.819  3.119  3.505  3.792
23    0.685  1.319  1.714  2.069  2.177  2.500  2.807  3.104  3.485  3.768
24    0.685  1.318  1.711  2.064  2.172  2.492  2.797  3.091  3.467  3.745
25    0.684  1.316  1.708  2.060  2.167  2.485  2.787  3.078  3.450  3.725
30    0.683  1.310  1.697  2.042  2.147  2.457  2.750  3.030  3.385  3.646
40    0.681  1.303  1.684  2.021  2.123  2.423  2.704  2.971  3.307  3.551
50    0.679  1.299  1.676  2.009  2.109  2.403  2.678  2.937  3.261  3.496
60    0.679  1.296  1.671  2.000  2.099  2.390  2.660  2.915  3.232  3.460
100   0.677  1.290  1.660  1.984  2.081  2.364  2.626  2.871  3.174  3.390
1000  0.675  1.282  1.646  1.962  2.056  2.330  2.581  2.813  3.098  3.300
∞     0.674  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.090  3.291
        50%    80%    90%    95%    96%    98%    99%  99.5%  99.8%  99.9%
                                Confidence level C

The entries in the top row are the probabilities of exceeding the tabled values. The left column gives the degrees of freedom.

Table III $\chi^2$ distribution critical values, Pr($\chi^2$ > x)

                              Right tail probability p
df      .25     .10     .05    .025     .02     .01    .005   .0025    .001   .0005
1      1.32    2.71    3.84    5.02    5.41    6.63    7.88    9.14   10.83   12.12
2      2.77    4.61    5.99    7.38    7.82    9.21   10.60   11.98   13.82   15.20
3      4.11    6.25    7.81    9.35    9.84   11.34   12.84   14.32   16.27   17.73
4      5.39    7.78    9.49   11.14   11.67   13.28   14.86   16.42   18.47   20.00
5      6.63    9.24   11.07   12.83   13.39   15.09   16.75   18.39   20.52   22.11
6      7.84   10.64   12.59   14.45   15.03   16.81   18.55   20.25   22.46   24.10
7      9.04   12.02   14.07   16.01   16.62   18.48   20.28   22.04   24.32   26.02
8     10.22   13.36   15.51   17.53   18.17   20.09   21.95   23.77   26.12   27.87
9     11.39   14.68   16.92   19.02   19.68   21.67   23.59   25.46   27.88   29.67
10    12.55   15.99   18.31   20.48   21.16   23.21   25.19   27.11   29.59   31.42
11    13.70   17.28   19.68   21.92   22.62   24.72   26.76   28.73   31.26   33.14
12    14.85   18.55   21.03   23.34   24.05   26.22   28.30   30.32   32.91   34.82
13    15.98   19.81   22.36   24.74   25.47   27.69   29.82   31.88   34.53   36.48
14    17.12   21.06   23.68   26.12   26.87   29.14   31.32   33.43   36.12   38.11
15    18.25   22.31   25.00   27.49   28.26   30.58   32.80   34.95   37.70   39.72
16    19.37   23.54   26.30   28.85   29.63   32.00   34.27   36.46   39.25   41.31
17    20.49   24.77   27.59   30.19   31.00   33.41   35.72   37.95   40.79   42.88
18    21.60   25.99   28.87   31.53   32.35   34.81   37.16   39.42   42.31   44.43
19    22.72   27.20   30.14   32.85   33.69   36.19   38.58   40.88   43.82   45.97
20    23.83   28.41   31.41   34.17   35.02   37.57   40.00   42.34   45.31   47.50
21    24.93   29.62   32.67   35.48   36.34   38.93   41.40   43.78   46.80   49.01
22    26.04   30.81   33.92   36.78   37.66   40.29   42.80   45.20   48.27   50.51
23    27.14   32.01   35.17   38.08   38.97   41.64   44.18   46.62   49.73   52.00
24    28.24   33.20   36.42   39.36   40.27   42.98   45.56   48.03   51.18   53.48
25    29.34   34.38   37.65   40.65   41.57   44.31   46.93   49.44   52.62   54.95
26    30.43   35.56   38.89   41.92   42.86   45.64   48.29   50.83   54.05   56.41
27    31.53   36.74   40.11   43.19   44.14   46.96   49.64   52.22   55.48   57.86
28    32.62   37.92   41.34   44.46   45.42   48.28   50.99   53.59   56.89   59.30
29    33.71   39.09   42.56   45.72   46.69   49.59   52.34   54.97   58.30   60.73
30    34.80   40.26   43.77   46.98   47.96   50.89   53.67   56.33   59.70   62.16
40    45.62   51.81   55.76   59.34   60.44   63.69   66.77   69.70   73.40   76.09
50    56.33   63.17   67.50   71.42   72.61   76.15   79.49   82.66   86.66   89.56
60    66.98   74.40   79.08   83.30   84.58   88.38   91.95   95.34   99.61  102.69
80    88.13   96.58  101.88  106.63  108.07  112.33  116.32  120.10  124.84  128.26
100  109.14  118.50  124.34  129.56  131.14  135.81  140.17  144.29  149.45  153.17

The entries in the top row are the probabilities of exceeding the tabled values: p = Pr($\chi^2$ > x), where x is in the body of the table and p is in the top row (margin). df denotes degrees of freedom and is given in the left column (margin).

Table IV F distribution critical values, Pr(F > f)

                                          r1
r2  Pr(F > f)      1      2      3      4      5      6      7      8     10     15
1     0.05       161    199    216    225    230    234    237    239    242    246
      0.025      648    799    864    900    922    937    948    957    969    985
      0.01      4052   4999   5403   5625   5764   5859   5928   5981   6056   6157
2     0.05     18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.40  19.43
      0.025    38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37  39.40  39.43
      0.01     98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.40  99.43
3     0.05     10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.79   8.70
      0.025    17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54  14.42  14.25
      0.01     34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.23  26.87
4     0.05      7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   5.96   5.86
      0.025    12.22  10.65   9.98   9.60   9.36   9.20   9.07   8.98   8.84   8.66
      0.01     21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.55  14.20
5     0.05      6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.74   4.62
      0.025    10.01   8.43   7.76   7.39   7.15   6.98   6.85   6.76   6.62   6.43
      0.01     16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.05   9.72
6     0.05      5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.06   3.94
      0.025     8.81   7.26   6.60   6.23   5.99   5.82   5.70   5.60   5.46   5.27
      0.01     13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.87   7.56
7     0.05      5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.64   3.51
      0.025     8.07   6.54   5.89   5.52   5.29   5.12   4.99   4.90   4.76   4.57
      0.01     12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.62   6.31
8     0.05      5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.35   3.22
      0.025     7.57   6.06   5.42   5.05   4.82   4.65   4.53   4.43   4.30   4.10
      0.01     11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.81   5.52
9     0.05      5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.14   3.01
      0.025     7.21   5.71   5.08   4.72   4.48   4.32   4.20   4.10   3.96   3.77
      0.01     10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.26   4.96
10    0.05      4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   2.98   2.85
      0.025     6.94   5.46   4.83   4.47   4.24   4.07   3.95   3.85   3.72   3.52
      0.01     10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.85   4.56
12    0.05      4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.75   2.62
      0.025     6.55   5.10   4.47   4.12   3.89   3.73   3.61   3.51   3.37   3.18
      0.01      9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.30   4.01
15    0.05      4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.54   2.40
      0.025     6.20   4.77   4.15   3.80   3.58   3.41   3.29   3.20   3.06   2.86
      0.01      8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.80   3.52

r1 = numerator degrees of freedom, r2 = denominator degrees of freedom


INDEX

$X \sim F$, $X$ is distributed according to $F$, 463
$B(n, \theta)$, binomial distribution
  with parameters $n$ and $\theta$, 461
$E(\lambda)$, exponential distribution with parameter $\lambda$, 464
$H(D, N, n)$, hypergeometric distribution with parameters $D$, $N$, $n$, 461
$M(n, \theta_1, \ldots, \theta_q)$, multinomial distribution with parameters $n$, $\theta_1, \ldots, \theta_q$, 462
$N(\mu, \Sigma)$, multivariate normal distribution, 507
$N(\mu, \sigma^2)$, normal distribution with mean $\mu$ and variance $\sigma^2$, 464
$N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, bivariate normal distribution, 492
$P(\lambda)$, Poisson distribution with parameter $\lambda$, 462
$U(a, b)$, uniform distribution on the interval $(a, b)$, 465

acceptance, 215
action space, 17
adaptation, 388
algorithm, 102, 127
  bisection, 127, 210
  coordinate ascent, 129
  EM, 133
  Newton-Raphson, 102, 132, 189, 210
    for GLM, 413
  proportional fitting, 157
alternative, 215, 217
analysis of variance (ANOVA), 367
  table, 379
antisymmetric, 207, 209
asymptotic distribution
  of quadratic forms, 510
asymptotic efficiency, 331
  of Bayes estimate, 342
  of MLE, 331, 386
asymptotic equivalence of MLE and Bayes estimate, 342
asymptotic normality, 311
  of M-estimate, estimating equation estimate, 330
  of estimate, 300
  of minimum contrast estimate, 327
  of MLE, 331, 386
  of posterior, 339, 391
  of sample correlation, 319
asymptotic order in probability notation, 516
asymptotic relative efficiency, 357
autoregressive model, 11, 292

Bayes credible bound, 251
Bayes credible interval, 252
Bayes credible region, 251
  asymptotic, 344
Bayes estimate, 162
  Bernoulli trials, 166
  equivariance, 168
  Gaussian model, 163
  linear
Bayes risk, 162
Bayes rule, 27, 162
Bayes' rule, 445, 479, 482
Bayes' theorem, 14
Bayesian models, 12
Bayesian prediction interval, 254
Behrens-Fisher problem, 264
Bernoulli trials, 447
Bernstein's inequality, 469
  binomial case
Bernstein-von Mises theorem, 339
Berry-Esseen bound, 299
Berry-Esseen theorem, 471
beta distribution, 488
  as prior for Bernoulli trials, 15
  moments, 526
beta function, 488
bias, 20, 176
  sample variance, 78
binomial distribution, 447, 461
bioequivalence trials, 198
bivariate log normal distribution, 535
bivariate normal distribution, 497
  cumulants, 506
  geometry, 532
  nondegenerate, 499
bivariate normal model, 266

Cauchy distribution, 526
Cauchy-Schwarz inequality, 39, 458
  generalized, 521
center of a population distribution, 71
central limit theorem, 470
  multivariate, 510
chain rule, 517
change of variable formula, 452
characteristic function, 505
Chebychev bound, 346
Chebychev's inequality, 299, 469
chi-square distribution, 491
  noncentral, 530
chi-square test, 402
chi-squared distribution, 488
classification
  Bayes rule, 165
coefficient of determination, 37
coefficient of skewness, 457
collinearity, 69, 90
comparison, 247
complete families of tests, 232
compound experiment, 446
concave function
conditional distribution, 478
  for bivariate normal case, 501
  for multivariate normal case, 509
conditional expectation, 483
confidence band
  quantiles
    simultaneous, 284
confidence bound, 23, 234, 235
  mean
    nonparametric, 241
  uniformly most accurate, 248
confidence interval, 24, 234, 235
  Bernoulli trials
    approximate, 237
    exact, 244
  location parameter
    nonparametric, 286
  median
    nonparametric, 282
  one-sample Student t, 235
  quantile
    nonparametric, 284
  shift parameter
    nonparametric, 287
  two-sample Student t, 263
  unbiased, 283
confidence level, 235
confidence rectangle
  Gaussian model, 240
confidence region, 233, 239
  distribution function, 240
confidence regions
  Gaussian linear model, 383
conjugate normal mixture distributions, 92
consistency, 301
  of estimate, 300, 301
  of minimum contrast estimates, 304
  of MLE, 305, 347
  of posterior, 338
  of test, 333
  uniform, 301
contingency tables, 403
contrast function, 99
control observation
convergence
  in $L_p$ norm, 536
  in law, distribution, 466
  in law, in distribution
    for vectors, 511
  in probability, 466
    for vectors, 511
  of random variables, 466
convergence of sample quantile, 536
convex function, 518
convex support, 122
convexity, 518
correlation, 267, 458
  inequality, 458
  multiple, 40
  ratio, 82
covariance, 458
  of random vectors, 504
covariate, 10
  stochastic, 387, 419
Cramér-Rao lower bound, 181
Cramér-von Mises statistic, 271
critical region, 23, 215
critical value, 216, 217
cumulant, 460
  generating function, 460
    in normal distribution, 460
cumulant generating function
  for random vector, 505
curved exponential family, 125
  existence of MLE, 125

De Moivre-Laplace theorem, 470
decision rule, 19
  admissible, 31
  Bayes, 27, 161, 162
  inadmissible
  minimax, 28, 170, 171
  randomized, 28
  unbiased, 78
decision theory, 16
delta method, 306
  for distributions, 311
  for moments, 306
density, 456
  conditional, 482
density function, 449
design, 366
  matrix, 366
  random, 387
  values, 366
deviance, 414
  decomposition, 414
Dirichlet distribution, 74, 198, 202
distribution function (d.f.), 450
distribution of quadratic form, 533
dominated convergence theorem, 514
double exponential distribution, 526
duality between confidence regions and tests, 241
duality theorem, 243
Dynkin, Lehmann, Scheffé's theorem, 86

Edgeworth approximations, 317
eigenvalues, 520
empirical distribution, 104
empirical distribution function, 8, 139
  bivariate, 139
entropy
  maximum, 91
error
  autoregressive, 11
estimate, 99
  consistent, 301
  empirical substitution, 139
  estimating equation, 100
  frequency plug-in, 103
  Hodges-Lehmann, 149
  least squares, 100
  maximum likelihood, 114
  method of moments, 101
  minimum contrast, 99
  plug-in, 104
  squared error Bayes, 162
  unbiased, 176
estimating equation estimate
  asymptotic normality, 384
estimation, 16
events, 442
  independent, 445
expectation, 454, 455
  conditional, 479
exponential distribution, 464
exponential family, 49
  conjugate prior, 62
  convexity, 61
  curved, 57
  identifiability, 60
  log concavity, 61
  MLE, 121
  moment generating function, 59
  multiparameter, 53
    canonical, 54
  one-parameter, 49
    canonical, 52
  rank of, 60
  submodel, 56
  supermodel, 58
  UMVU estimate, 186
extension principle, 102, 104

F distribution, 491
  moments, 530
  noncentral, 531
F statistic, 376
factorization theorem, 43
Fisher consistent, 158
Fisher information, 180
  matrix, 185
Fisher's discriminant function, 226
Fisher's genetic linkage model, 405
Fisher's method of scoring, 434
fitted value, 372
fixed design, 387
Fréchet differentiable, 516
frequency function, 449
  conditional, 477
frequency plug-in principle, 103

gamma distribution, 488
  moments, 526
gamma function, 488
gamma model
  MLE, 124, 129, 130
Gauss-Markov linear model, 418
Gauss-Markov assumptions, 108
Gauss-Markov theorem, 418
Gaussian linear model, 366
  canonical form, 368
  confidence intervals, 381
  confidence regions, 383
  estimation in, 369
  identifiability, 371
  likelihood ratio statistic, 374
  MLE, 371
  testing, 378
  UMVU estimate, 371
Gaussian model
  Bayes estimate, 163
  existence of MLE, 123
  mixture, 134
Gaussian two-sample model, 261
generalized linear models (GLM), 411
geometric distribution, 72, 87
GLM, 412
  estimate
    asymptotic distributions, 415
  Gaussian, 435
  likelihood ratio
    asymptotic distribution, 415
  likelihood ratio test, 414
  Poisson, 435
goodness-of-fit test, 220, 223
gross error models, 190

Hölder's inequality, 518
Hammersley's theorem, 513
Hardy-Weinberg proportions, 103, 403
  chi-square test, 405
  MLE, 118, 124
  UMVU estimate, 183
hat matrix, 372
hazard rates, 69, 70
heavy tails, 208
Hessian, 386
hierarchical Bayesian normal model, 92
hierarchical binomial-beta model, 93
Hodges's example, 332
Hodges-Lehmann estimate, 207
Hoeffding bound, 299, 346
Hoeffding's inequality, 519
Horvitz-Thompson estimate, 178
Huber estimate, 207, 390
hypergeometric distribution, 3, 461
hypergeometric probability, 448
hypothesis, 215
  composite, 215
  null, 215
  simple, 215

identifiable
independence, 445, 453
independent experiments, 446
indifference region, 230
influence function, 196
information bound, 181
  asymptotic variance, 327
information inequality, 179, 181, 186, 188, 206
integrable, 455
interquartile range (IQR), 196
invariance
  shift, 77
inverse Gaussian distribution, 94
IQR, 196
iterated expectation theorem, 481

Jacobian, 485
  theorem, 486
Jensen's inequality, 518

Kolmogorov statistic, 220
Kolmogorov's theorem, 86
Kullback-Leibler divergence, 116
  and MLE, 116
Kullback-Leibler loss function, 169
kurtosis, 279, 457

$L_p$ norm, 536
Laplace distribution, 374, 526
law of large numbers
  weak
    Bernoulli's, 468
    Khintchin's, 469
least absolute deviation estimates, 149, 374
least favorable prior distribution, 170
least squares, 107, 120
  weighted, 107, 112
Lehmann alternative, 275
level (of significance), 217
life testing, 89
likelihood equations, 117
likelihood function, 47
likelihood ratio, 48, 256
  asymptotic chi-square distribution, 394, 395
  confidence region, 257
    asymptotic, 395
  logistic regression, 410
  test, 256
    bivariate normal, 266
    Gaussian one-sample model, 257
    Gaussian two-sample model, 261
    one-sample scale, 291
    two-sample scale, 293
likelihood ratio statistic
  in Gaussian linear model, 376
  simple, 223
likelihood ratio test, 335
linear model
  Gaussian, 366
  non-Gaussian, 389
  stochastic covariates, 419
    Gaussian, 387
    heteroscedastic, 421
linear regression model, 109
link function, 412
  canonical, 412
location parameter, 209, 463
location-scale parameter family, 463
location-scale regression
  existence of MLE, 127
log linear model, 412
logistic distribution, 57, 132
  likelihood equations, 154
  neural nets, 151
  Newton-Raphson, 132
logistic linear regression model, 408
logistic regression, 56, 90, 408
logistic transform, 408
  empirical, 409
logit, 90, 408
loss function, 18
  0-1, 19
  absolute, 18
  Euclidean, 18
  Kullback-Leibler, 169, 202
  quadratic, 18

M-estimate, 330
  asymptotic normality, 384
marginal density, 452
Markov's inequality, 469
matched pair experiment, 257
maximum likelihood, 114
maximum likelihood estimate, 114, see MLE
maximum likelihood estimate (MLE), 114
mean, 71, 454, 455
  sensitivity curve, 192
mean absolute prediction error, 80, 83
mean squared error (MSE), 20
mean squared prediction error (MSPE), 32
median, 71
  MSE, 297
  population, 77, 80, 105
  sample, 192
  sensitivity curve, 193
Mendel's genetic model, 214
  chi-square test, 403
meta-analysis, 222
method of moments, 101
minimax estimate
  Bernoulli trials, 173
  distribution function, 202
minimax rule, 28, 170
minimax test, 173
MLE, 114, see maximum likelihood estimate
  as projection, 371
  asymptotic normality
    exponential family, 322
  Cauchy model, 149
  equivariance, 114, 144
  existence, 121
  uniqueness, 121
MLR, 228, see monotone likelihood ratio
model, 1, 5
  AR(1), 11
  Cox, 70
  Gaussian linear regression, 366
  gross error, 190, 210
  Lehmann, 69
  linear, 10
    Gaussian, 10
  location
    symmetric, 191
  logistic linear, 408
  nonparametric
  one-sample, 3, 366
  parametric
  proportional hazard, 70
  regression
  regular
  scale, 69
  semiparametric, 6
  shift, 4
  symmetric, 68
  two-sample
moment, 456
  central, 457
  of random matrix, 502
moment generating function, 459
  for random vector, 504
monotone likelihood ratio (MLR), 228
Monte Carlo method, 219, 221, 298, 314
MSE, 20, see mean squared error
  sample mean, 21
  sample variance, 78
MSPE, 32, see mean squared prediction error
  bivariate normal, 36
  multivariate normal, 37
MSPE predictor, 83, 372
multinomial distribution, 462
multinomial trials, 55, 447, 462
  consistent estimates, 302
  Dirichlet prior, 198
  estimation
    asymptotic normality, 324
  in contingency tables, 403
  Kullback-Leibler loss
    Bayes estimate, 202
  minimax estimate, 201
  MLE, 119, 124
  Pearson's chi-square test, 401
  UMVU estimate, 187
multiple correlation coefficient, 37
multivariate normal distribution, 506

natural parameter space, 52, 54
natural sufficient statistic, 54
negative binomial distribution, 87
neural net model, 151
Neyman allocation, 76
Neyman-Pearson framework, 23
Neyman-Pearson lemma, 224
Neyman-Pearson test, 165
noncentral t distribution, 260
noncentral F distribution, 376
noncentral chi-square distribution, 375
nonnegative definite, 519
normal distribution, 464, see Gaussian
  central moments, 529
normal equations, 101
  weighted least squares, 113
normalizing transformation, zero skewness, 351

observations, 5
one-way layout, 367
  binomial testing, 410
  confidence intervals, 382
  testing, 378
order statistics, 527
orthogonal, 41
orthogonal matrix, 494
orthogonality, 522

p-sample problem, 367
p-value, 221
parameter
  nuisance
parametrization
Pareto density, 85
Pearson's chi-square, 402
placebo
plug-in principle, 102
Poisson distribution, 462
Poisson process, 472
Poisson's theorem, 472
Polya's theorem, 515
population, 448
population R-squared, 37
population quantile, 105
positive definite, 519
posterior, 13
power function, 78, 217
  asymptotic, 334
  sample size, 230
prediction, 16, 32
  training set, 19
prediction error, 32
  absolute, 33
  squared, 32
  weighted, 84
prediction interval, 252
  Bayesian, 254
  distribution-free, 254
  Student t, 253
predictor, 32
  linear, 38, 39
principal axis theorem, 519
prior, 13
  conjugate, 15
    binomial, 15
    exponential family, 62
    general model, 73
    multinomial, 74
    normal case, 63
    Poisson, 73
  improper, 163
  Jeffrey's, 203
  least favorable, 170
probability
  conditional, 444
  continuous, 449
  discrete, 443, 449
  distribution, 442
  subjective, 442
probability distribution, 451
probit model, 416
product moment, 457
  central, 457
projection, 41, 371
projection matrix, 372
projections on linear spaces, 522
Pythagorean identity, 41, 377
  in Hilbert space, 522

quality control, 229
quantile
  population, 104
  sensitivity curve, 195

random, 441
random design, 387
random effects model, 167
random experiments, 441
random variable, 451
random vector, 451
randomization
randomized test, 79, 224
rank, 48
ranking, 16
Rao score, 399
  confidence region, 400
  statistic, 399
    asymptotic chi-square distribution, 400
  test, 399
    multinomial goodness-of-fit, 402
Rao test, 335, 336
Rayleigh distribution, 53
regression, 9, 366
  confidence intervals, 381
  confidence regions, 383
  heteroscedastic, 58, 153
  homoscedastic, 58
  Laplace model, 149
  linear, 109
  location-scale, 57
  logistic, 56
  Poisson, 204
  polynomial, 146
  testing, 378
  weighted least squares, 147
  weighted, linear, 112
regression line, 502
regression toward the mean, 36
rejection, 215
relative frequency, 441
residual, 48, 111, 372
  sum of squares, 379
response, 10, 366
risk function, 20
  maximum, 28
  testing, 22
risk set, 29
  convexity, 79
robustness, 190, 418
  of level
    t-statistics, 314
    asymptotic, 419
  of tests, 419

saddle point, 199
sample
  correlation, 140, 267
  covariance, 140
  cumulant, 139
  mean, 8, 45
  median, 105, 149, 192
  quantile, 105
  random
  regression line, 111
  variance, 8, 45
sample of size n, 448
sample space, 5, 442
sampling inspection
scale, 457
scale parameter, 463
Scheffé's theorem, 468
  multivariate case, 514
score test, 335, 336, 399
selecting at random, 444
selection, 75, 247
sensitivity curve, 192
Shannon's lemma, 116
shift and scale equivariant, 209
shift equivariant, 206, 208
signal to noise
  fixed, 126
Slutsky's theorem, 467
  multivariate, 512
spectral theorem, 519
square root matrix, 507
standard deviation, 457
standard error, 381
standard normal distribution, 464
statistic
  ancillary, 48
  equivalent, 43
  sufficient, 42
    Bayes, 46
    minimal, 46
    natural, 52
stochastic ordering, 67, 209
stratified sampling, 76, 205
substitution theorem for conditional expectations, 481
superefficiency, 332
survey sampling, 177
  model based approach, 350
survival functions, 70
symmetric distribution, 68
symmetric matrices, 519
symmetric variable, 68

t distribution, 491
  moments, 530
Taylor expansion, 517
test function, 23
test size, 217
test statistic, 216
testing, 16, 213
  Bayes, 165, 225
testing independence in contingency tables, 405
total differential, 517
transformation
  k linear, 516
  affine, 487
  linear, 487, 516
  orthogonal, 494
trimmed mean, 194, 206
  sensitivity curve, 194
type I error, 23, 216
type II error, 23, 216

UMP, 226, 227, see uniformly most powerful
UMVU, 177, see uniformly minimum variance unbiased
uncorrelated, 459
unidentifiable
uniform distribution, 465
  discrete
    MLE, 115
uniformly minimum variance unbiased (UMVU), 177
uniformly most powerful
  asymptotically, 334
uniformly most powerful (UMP), 226, 227

variance, 457
  of random matrix, 503
  sensitivity curve, 195
variance stabilizing transformation, 316, 317
  for binomial, 352
  for correlation coefficient, 320, 350
  for Poisson, 317
  in GLM, 416
variance-covariance matrix, 498
von Neumann's theorem, 171

Wald confidence regions, 399
Wald statistic, 398
  asymptotic chi-square distribution, 398
Wald test, 335, 399
  multinomial goodness-of-fit, 401
weak law of large numbers
  for vectors
Weibull density, 84
weighted least squares, 113, 147
Wilks's theorem, 393-395, 397

Z-score, 457