The Basic Practice of Statistics
Ninth Edition

David S. Moore, Purdue University
William I. Notz, The Ohio State University

Senior Vice President, STEM: Daryl Fox
Program Director, Math, Statistics, Earth Sciences, and Environmental Science: Andrew Dunaway
Program Manager: Sarah Seymour
Executive Content Development Manager, STEM: Debbie Hardin
Development Editor: David Dietz
Executive Project Manager, Content, STEM: Katrina Mangold
Director of Content, Math and Statistics: Daniel Lauve
Executive Media Editor: Catriona Kaplan
Associate Editor: Andy Newton
Assistant Editor: Justin Jones
Marketing Manager: Leah Christians
Marketing Assistant: Morgan Psiuk
Director of Content Management Enhancement: Tracey Kuehn
Senior Managing Editor: Lisa Kinne
Senior Content Project Manager: Edward Dionne
Project Manager: Heidi Allgair, SPi Global
Senior Workflow Project Manager: Paul Rohloff
Production Supervisor: Robert Cherry
Director of Design, Content Management: Diana Blume
Design Services Manager: Natasha Wolfe
Cover Design: Laura de Grasse
Interior Design: Vicki Tomaselli
Art Manager: Matthew McAdams
Director of Digital Production: Keri deManigold
Media Project Manager: Hanna Squire
Executive Permissions Editor: Cecilia Varas
Rights and Billing Editor: Alexis Gargin
Composition: SPi Global
Printing and Binding: LSC Communications

ISBN 978-1-319-38369-5 (ePub)
© 2021, 2018, 2015, 2012 by W. H. Freeman and Company
All rights reserved.
Printed in the United States of America
6 25 24 23 22 21 20

Macmillan Learning
One New York Plaza
Suite 4600
New York, NY 10004-1562
www.macmillanlearning.com

BRIEF CONTENTS

Chapter 0 Getting Started

Part I Exploring Data
EXPLORING DATA: Variables and Distributions
Chapter 1 Picturing Distributions with Graphs
Chapter 2 Describing Distributions with Numbers
Chapter 3 The Normal Distributions
EXPLORING DATA: Relationships
Chapter 4 Scatterplots and Correlation
Chapter 5 Regression
Chapter 6 Two-Way Tables*
Chapter 7 Exploring Data: Part I Review

Part II Producing Data
PRODUCING DATA
Chapter 8 Producing Data: Sampling
Chapter 9 Producing Data: Experiments
Chapter 10 Data Ethics*
Chapter 11 Producing Data: Part II Review

Part III From Data Production to Inference
PROBABILITY AND SAMPLING DISTRIBUTIONS
Chapter 12 Introducing Probability
Chapter 13 General Rules of Probability*
Chapter 14 Binomial Distributions*
Chapter 15 Sampling Distributions
FOUNDATIONS OF INFERENCE
Chapter 16 Confidence Intervals: The Basics
Chapter 17 Tests of Significance: The Basics
Chapter 18 Inference in Practice
Chapter 19 From Data Production to Inference: Part III Review

Part IV Inference about Variables
QUANTITATIVE RESPONSE VARIABLE
Chapter 20 Inference about a Population Mean
Chapter 21 Comparing Two Means
CATEGORICAL RESPONSE VARIABLE
Chapter 22 Inference about a Population Proportion
Chapter 23 Comparing Two Proportions
Chapter 24 Inference about Variables: Part IV Review

Part V Inference about Relationships
INFERENCE ABOUT RELATIONSHIPS
Chapter 25 Two Categorical Variables: The Chi-Square Test
Chapter 26 Inference for Regression
Chapter 27 One-Way Analysis of Variance: Comparing Several Means

Part VI Optional Companion Chapters (Available Online)
Chapter 28 Nonparametric Tests
Chapter 29 Multiple Regression*
Chapter 30 Two-Way Analysis of Variance
Chapter 31 Statistical Process Control
Chapter 32 Resampling: Permutation Tests and the Bootstrap

*Starred material is optional and can be skipped without loss of continuity.

CONTENTS

Why Did You Do That?
Preface
Acknowledgments
About the Authors

Chapter 0 Getting Started
0.1 How the Data Were Obtained Matters
0.2 Always Look at the Data
0.3 Variation Is Everywhere
0.4 What Lies Ahead in This Book

Part I Exploring Data

Chapter 1 Picturing Distributions with Graphs
1.1 Individuals and Variables
1.2 Categorical Variables: Pie Charts and Bar Graphs
1.3 Quantitative Variables: Histograms
1.4 Interpreting Histograms
1.5 Quantitative Variables: Stemplots
1.6 Time Plots

Chapter 2 Describing Distributions with Numbers
2.1 Measuring Center: The Mean
2.2 Measuring Center: The Median
2.3 Comparing the Mean and the Median
2.4 Measuring Variability: The Quartiles
2.5 The Five-Number Summary and Boxplots
2.6 Spotting Suspected Outliers and Modified Boxplots*
2.7 Measuring Variability: The Standard Deviation
2.8 Choosing Measures of Center and Variability
2.9 Examples of Technology
2.10 Organizing a Statistical Problem

Chapter 3 The Normal Distributions
3.1 Density Curves
3.2 Describing Density Curves
3.3 Normal Distributions
3.4 The 68–95–99.7 Rule
3.5 The Standard Normal Distribution
3.6 Finding Normal Proportions
3.7 Using the Standard Normal Table
3.8 Finding a Value Given a Proportion

Chapter 4 Scatterplots and Correlation
4.1 Explanatory and Response Variables
4.2 Displaying Relationships: Scatterplots
4.3 Interpreting Scatterplots
4.4 Adding Categorical Variables to Scatterplots
4.5 Measuring Linear Association: Correlation
4.6 Facts about Correlation

Chapter 5 Regression
5.1 Regression Lines
5.2 The Least-Squares Regression Line
5.3 Examples of Technology
5.4 Facts about Least-Squares Regression
5.5 Residuals
5.6 Influential Observations
5.7 Cautions about Correlation and Regression
5.8 Association Does Not Imply Causation
5.9 Correlation, Prediction, and Big Data*

Chapter 6 Two-Way Tables*
6.1 Marginal Distributions
6.2 Conditional Distributions
6.3 Simpson’s Paradox

Chapter 7 Exploring Data: Part I Review
Part I Skills Review
Test Yourself
Supplementary Exercises
Online Data for Additional Analyses

Part II Producing Data

Chapter 8 Producing Data: Sampling
8.1 Population versus Sample
8.2 How to Sample Badly
8.3 Simple Random Samples

Posterior probabilities, 315 Power of significance tests, 413–419 finding, by using software, 415–416 Predicted response, 128, 129, 133, 136, 591 Prediction big data and, 148–149 extrapolation, 143 inference for, 602–606 regression line for, 133–134 Prediction intervals, 604–605, 29-51–29-53 Predictor variables See Explanatory variables Prior probability, 315 Privacy, 254 Probability assignment of, 278, 285, 289 binomial, 326–328 conditional, 302–305, 311–315 defined, 272–274 of failure, 324 idea of, 272–274 models, 308–311 personal, 288–289 posterior, 315 prior, 315 randomness and, 274–275 random variables, 287–288 of success, 324 Probability 0, 278, 285 Probability 0.5, 272–273, 278 Probability applet, 273–274 Probability distribution, 284, 287–288 Probability histograms, 331, 333 Probability models continuous, 283–287 defined, 275–276 finite, 280–282, 287 tree diagrams, 308–311 Probability rules, 278–280, 295–316 addition rule, 296–298 Bayes’ rule, 311–316 conditional probability, 302–305 independent events, showing, 307–308 multiplication rule, 298–301, 305–307 Processes, 31-2–31-6 capability of, 31-31–31-33 defined, 31-2 describing, 31-2–31-4 focus on, rather than on products, 31-29 Process knowledge, 31-44 Process monitoring conditions, 31-8 control charts for, 31-8–31-23 defined, 31-8 Proportions, 493–510 comparing two, 517–530 cumulative, 84–87, 89 finding a value when given, 89–91 large-sample confidence intervals for, 496–502, 520–521 plus four confidence intervals for, 508–510, 528–530 population, inference about, 493–510 sample proportion p^, 494–496 sample size, choosing, 502–504 sampling distribution of difference between, 519 significance tests for, 504–507, 524–528 technology, examples of, 521–523 two-sample problems for, 517–519 Pseudo-random numbers, 205 Pullen, Ernest, 305 P-Value of a Test of
Significance applet, 388, 389, 391, 411–412 P-values, 387–396 approximate, table for finding, 395–396 in f distribution, 635, 30-2 fixed standards for (significance levels), 390 for permutation tests, computing, 32-5, 32-7–32-16 in significance tests, 408–412 statistical significance, 387–391 in tests for population mean, 392–394 two-sided, 389, 393, 395 Q Quadratic regression, 29-31–29-33 Quantitative variables, 13, 20–23 defined, 13 histograms, 20–23 stemplots, 29–32 Quantum mechanics, 274 Quantum theory, 274 Quartiles, 50–52 calculating, 51 of density curves, 77, 78 even numbers for finding, 51–52 first, 51, 52–53, 55 in five-number summary, 52–53 interquartile range, 55–56 odd numbers for finding, 51 rule for finding, 51, 52 technology for finding, 60–61 third, 51, 52–53, 55, 56, 60 Quintiles, 94 R Random digit dialing (RDD) method, 214–216 Random digits, 205–206 Randomization in experiments as basis for inference, 32-2–32-9 in statistical design, 235, 238–241 Randomization distribution See Permutation distribution Randomized block designs, 30-6 Randomized comparative experiments, 32-2–32-9 completely randomized designs, 232–234 defined, 231 inference and, 403–404 logic of, 234–236 Randomness in binomial distributions, 331 probability and, 274–275 Random number generators, 283 Random phenomenon, 273, 275–277, 287, 288 Random samples, 272, 274 inference and, 403–404 random numbers in, 274 versus rational subgroups, 31-29–31-30 in significance tests, 272 simple, 204–209 Random sampling error, 405–406 Random variables, 280–282, 287–288 Rational subgroups, 31-29–31-30 R charts, 31-20–31-22, 31-33 Real data, distributions of, 80, 81 Reasoning of a Statistical Test applet, 383 Region, 106–107 Regression, 124–149 association, causation and, 144–148 big data, 148–149 correlation and, 132–133, 142–144, 148–149 explanatory and response variables in, distinction between, 132 influential observations, 139–142 prediction, 148–149 quadratic, 29-31–29-33 residuals, 134–139 
toward the mean, 126 Regression inference, 587–612 conditions for, 589–590, 606–611 confidence intervals for regression slope, 600–602 correlation, testing lack of, 598–600 hypothesis of no linear relationship, testing, 597–598 inference about prediction, 602–606 parameters, estimating, 590–593 technology, examples of, 593–596 Regression lines, 124–127 defined, 124–125 extrapolation, 143 intercept of, 126, 127 least-squares, 127–129 multiple linear regression model with two, 29-24–29-28 population, 589–593, 597, 598, 600, 601, 606 for prediction, 133–134 residuals, 134–139 slope of, 126, 127 using, 126 Regression slope, 597, 600–602 Regression standard error, 591–593, 597, 601, 605, 610, 29-6, 29-10, 29-13 Regression toward the mean, 126 Relationship See Association Relative frequency (percents), histogram of, 25–26 Replication, in statistical design, 235, 240 Resampling, 32-1–32-27 bootstrap, 32-19–32-28 permutation tests, 32-1–32-2, 32-9–32-18 randomization in experiments as basis for inference, 32-2–32-9 Residual, 134–139 Residual plot, 137–138, 29-54 Resistant measure, 47, 55 Response bias, 212–213 Response variables, 99–100, 101, 102, 104, 110, 29-53–29-54, 30-5 Robinson, Duncan, 169 Robustness of t procedures, 454–457, 479–481 Rounding data, 30 Roundoff error, 16, 140, 164, 357, 482, 557, 561, 565, 592, 645 Row totals, 161, 163, 164 Row variable, 160, 161 R statistical package, 16, 415, 32-25–32-26 Runs signal, 31-21–31-22 S Sample bias, 203–204, 211 comparing two, with Wilcoxon rank sum test, 28-2–28-6 convenience, 203 defined, 201 inference from, trustworthiness of, 208–209 multistage, 210 nonresponse in, 211–212, 215 population versus, 201 random, 272, 274 with replacement, 32-19–32-20 simple random, 204–209 stratified random sample, 209–210 wording effects, 213 Sample mean x¯ for measuring center, 46–47 for sampling distributions, 349–352 simple random sample of, 350 Sample proportion p^, 494–496 Sample size ANOVA for, 631 for confidence intervals, 
412–413 for desired margin of error, 412, 503 meta-analysis, 474 for proportion, choosing, 502–504 in significance tests, 409, 413–419 Sample space S, 276–277 Sample survey, 3, 201–203 cautions about, 211–213 defined, 201 nonresponse in, 211–212, 215 random digit dialing (RDD) method, 214–216 technology in, impact of, 214–217 undercoverage in, 211, 215–216 voluntary response in, 204, 205, 208, 215 web surveys, 215–216 Sampling bad, 203–204 binomial distributions in, 325–326 inference from, trustworthiness of, 208–209 population versus sample, 201–203 random sampling error, 405–406 surveys, cautions about, 211–214 technology, impact of, 214–217 Sampling design, 209–211 Sampling distributions, 342–360 central limit theorem, 353–357 of a count, 325 defined, 347–349 of difference between proportions, 519 law of large numbers, 345–347 for median, 358–359 notation, 343–344 parameters, 343–344 population distribution distinguished from, 347–351 for sample mean, 349–352 shape, center, and variability of, 348–349 statistical estimation, 344–345 statistical significance and, 357–360 for variance, 358–359 Sampling frame, 222 SAS statistical package, 415 SAT, 26, 57–58, 80, 84, 85–90, 101, 104, 106–107, 474, 29-36–29-38 Scatterplots, 100–108 categorical variables in, 106–108, 113 defined, 100 examining, 103 horizontal axis of, 100–101 interpreting, 102–106 outliers in, 103, 107, 113 relationships, displaying, 100–102 s charts, 31-14–31-20 See also Control charts Segmented bar graph, 165, 166 Shape of distributions, 22, 24, 27 of sampling distribution, 348–349 Sigman, Stan, 31-1 Significance levels, 390, 402, 411, 413–418 Significance tests, 382–396 cautions about, 408–412 for comparing proportions, 524–528 effect size in, 413–414, 416 four-step process, 392 hypotheses, stating, 385–387 for population mean, 392–395 power of, 413–419 for a proportion, 504–507 P-value and, 387–391 reasoning of, 383–385 for regression slope, 597 sample size in, 409, 413–419 from a table, 395–396 
Type I and Type II errors in, 416–418 Simple conditions for inference about a mean, 367, 403 Simple permutation test, 32-5 Simple random sample (SRS), 204–209 binomial distributions, 326 confidence intervals for population mean, 374–375 independent, 30-6 inference about a mean, conditions for, 367 inference using the z procedures, 404–405 of sample mean x¯, 350 Simple Random Sample applet, 205, 206, 207, 208, 210–211, 232, 234, 349, 360 Simpson’s paradox, 146, 168–170 Simulation, 347 Single-peaked distributions, 24, 25 Sinusoidal form, 103 Six Sigma, 31-45 68–95–99.7 rule, 80–83 Skewed curve, 77–78 Skewed distributions, 75–78 benefits of, in statistics, 29 in comparing mean and median, 49 defined, 24–25, 55, 59 five-number summary for describing, 59 interquartile range for describing, 55 outliers and, 47, 55 to the right, 49, 50, 53, 55 Skewness, 24, 25, 30 Slope of a line, 126, 127 regression, confidence intervals for, 600–602 Smith, Gary, 156 Social science experiments, 257–259 Solve, in four-step process, 61–62 SoundScan, 1–2, Special cause, 31-7 s-type, 31-17–31-18, 31-27, 31-29 x¯-type, 31-18–31-19, 31-29 Spread, 24 Squared deviations, 57, 58 Squared multiple correlation coefficient, 29-12 Square root of variance, standard deviation as, 57, 58 Srivastava, Diane, 551 Standard deviation, 56–58 in ANOVA, checking, 632 binomial mean and, 330–332 calculating, 57–58 defined, 57 of density curves, 78 inference about, avoiding, 483–484 of Normal distributions, 79 outliers and, 59 in process monitoring, 31-14 sampling distributions for, 358–359 as square root of variance, 57, 58 usefulness of, properties that determine, 58 variation of data set described by, 56–58 Standard error, 441 bootstrap, 32-23–32-28 regression, 591–593, 597, 605, 610 of sample proportion, 496 Standardized observation, 84 Standardized value See Z-score Standardized variable, 84 Standardizing, 83–84 Standard Normal distribution, 83–84 Standard Normal table, 86–88, 680–681 State, in four-step 
process, 61–62 Statistical design of experiments, 235, 240 Statistical estimation, 344–345, 367–369 Statistical inference See Inference Statistically significant, 235, 359, 390 Statistical Power applet, 414–415, 418–419 Statistical problems, organizing, 61–64 for confidence intervals, 373–374 four-step process for, 8, 61–64 in real-world settings, 62 for significance tests, 392 Statistical process control, 31-1–31-38 capability distinguished from, 31-31–31-33 comments on, 31-29–31-31 common cause in, 31-7 control charts, 31-20–31-29, 31-33–31-34 defined, 31-7 desirability of, 31-30 idea of, 31-6–31-8 p charts, control limits for, 31-34–31-38 processes, 31-2–31-6 rational subgroups, 31-29–31-30 s charts for process monitoring, 31-14–31-20 special cause in, 31-7 x¯ charts for process monitoring, 31-8–31-13, 31-15 Statistical significance, 357–360 Statistical study, 13 Statistical test of significance See Significance tests Statistics, defined, Stemplots, 45, 46, 48–49, 53, 55, 62–63 back-to-back, 471, 658 of large data set, 29 making, 29 pattern of, 29, 30 quantitative variables, 29–32 of residuals, 29-54 split stems, 30 Straight lines, review of, 126 Strata, 209, 210 Stratified random sample, 209–210 Strength of least-squares regression line, 133–134 of a relationship, 102–103, 104, 108, 109, 111–112, 113 Student’s t, 444 S-type special cause, 31-17–31-18, 31-27, 31-29 Subjects, of experiments, 227–229 Success, probability of, 324 Sum of squares for groups (SSG), 28-25–28-27 Sums of squares, 644 Survey, 201–203, 211–214 Symmetric density curves, 77–78 Symmetric distributions, 24, 25, 27, 49, 53, 58, 59 Symmetry, 24, 30 T Tables ANOVA, 29-49–29-53 chi-square, 569 of random digits, 205–206 significance from, 395–396 standard Normal, 86–88 two-way, 160–171, 556–559 t approximation, 481–483 t confidence interval, one-sample, 444–446 t critical values, 443 t distributions, 442–443, 472–473, 478, 481–482, 483 Technology, examples of, 60–61 Temeles, Ethan, 620 Tests of 
significance See Significance tests Test statistic, P-value and, 387–388, 392–396 Texas Instruments graphing calculator, 60–61, 594, 601 Theory of Probability, The (Gnedenko, B. V.), 311 Third quartile (Q3), 51, 52–53, 55, 56, 60 Three-sigma (3σ) control chart, 31-14–31-15 Tikhonov, Alexei, 331 Time plots, 32–34, 45 Time series data, 33 t procedures, 452–457 comparing groups, 637–638 matched pairs, 452–454 robustness of, 454–457 using, 455–457 Treatments, 227–229 Tree diagrams, 308–311 Trend, in time plots, 33 t test, one-sample, 446–449 Tukey, John, 638 Tukey pairwise multiple comparisons, 637–642, 30-2 Two-sample problems, 467–471, 517–519 Two-sample t procedures approximate distribution of, 481–483 pooled, avoiding, 483 robustness of, 479–481 Two-sample t statistic, 471–473, 475, 477, 481 Two-sided alternative hypothesis, 386 Two-sided P-value, 389, 393, 395 Two-Variable Statistical Calculator applet, 102, 114 Two-variable statistics, 102, 113 Two-way ANOVA, 30-1–30-23 conditions, 30-4–30-6 defined, 30-4 details of, 30-19–30-23 error sum of squares in, 30-21 F tests, 30-22–30-23 inference for, 30-11–30-19 interaction, 30-4–30-11 main effects, 30-6–30-11 one-way ANOVA compared to, 30-22 sum of squares, 30-21–30-22 Two-way tables, 160–171, 556–559 conditional distributions, 163–168 defined, 161 expected counts in, 561–562 marginal distributions, 161–163 Simpson’s paradox, 168–170 Type I and Type II errors in significance tests, 416–418 U Unbiased estimator, 350 Undercoverage, in sampling, 211, 215–216 Uniform density curve, 78, 284–285 Uniform distribution, 283, 284 Upper arm lengths, 82 V Values critical, 443 finding, when given a proportion, 89–91 Van Buren, Abigail, Variability choosing measures of, 59–60 of distributions, 24, 27 measuring, 50–60 quartiles for measuring, 50–51 of sampling distribution, 348–349 standard deviation for measuring, 56–58 Variables categorical, 13, 15–20, 106–108, 113 cause-and-effect relationship between, 124 column, 160, 161
confounded, 225–226, 230, 236 defined, 13 exact definitions of, 13, 201 explanatory, 99–100, 101, 102, 104, 106, 109–110, 227–229 individuals and, 13–15 lurking, 3, 143–144, 146–148 quantitative, 13, 20–23 random, 287–288 relationships among, 29-40–29-41 response, 99–100, 101, 102, 104, 110, 30-5 row, 160, 161 Variance defined, 56 pooled sample, 644 sampling distributions for, 358–359 of a set of observations, 57 standard deviation as square root of, 57, 58 Variation, 5–7 Venn diagram, 296–298 Vohs, Kathleen, 490, 491 Voluntary response sample, 204, 205, 208, 215 W Wald, Abraham, 99 Web surveys, 215–216 Wechsler Adult Intelligence Scale (WAIS), 431 Weighted average, 63, 643 Wilcoxon rank sum statistic, 28-4, 28-26 Wilcoxon rank sum test, 28-2–28-6 comparing two samples, 28-2–28-6 defined, 28-4 hypotheses tested with, 28-10–28-11 ties in, 28-11–28-16 Wilcoxon signed rank statistic, 28-17, 28-22–28-24 Wilcoxon signed rank test, 28-16–28-19 for matched pairs, 28-17–28-19 ties in, 28-21–28-24 Wording effects, 213 X x¯, sampling distribution of, 349–352 x¯ charts, 31-8–31-13, 31-15 See also Control charts x¯-type special cause, 31-18–31-19, 31-29 Z z procedures, 402, 403–406, 409, 414 z-score, 83–84, 86, 88 described, 83 standardizing, 83–84, 86, 88 unstandardizing, 89–90 ∑ (sigma), 46

The back cover reads as follows:

Still ahead of the curve

First published in 1995, David Moore’s The Basic Practice of Statistics has advanced statistics education by providing students and educators with a text that focuses on data, including its collection and interpretation; emphasizes practice over theory; engages students by using real data in examples and exercises; and facilitates learning through its clear writing and systematic approach to problem solving.

In the decades since that landmark publication, these aspects of BPS have improved and grown in importance. Data analysis is central to the science of statistics. Statistical studies play a key role in many academic disciplines, in today’s workplace, and in daily life. The increasing diversity of students’ needs means it is ever more crucial to support learning with innovative pedagogy and straightforward communication.

The ninth edition of BPS responds to these trends with increased emphasis on the four-step problem-solving process and reaching conclusions based on data, use of current research data from a wide variety of fields, and detailed guidance on using specific statistical analysis software and calculators.

Achieve, from Macmillan Learning

Now available with the ninth edition, Achieve for The Basic Practice of Statistics connects the text’s hallmark problem-solving approach and real-world examples to rich digital resources that foster further understanding and application of statistics. Achieve supports learning before, during, and after class for students, with over 3,000 homework questions, answer-specific feedback that coaches students toward the correct answer, and a variety of additional digital resources. Instructors can track class performance with powerful analytics in an easy-to-use environment, while ensuring that their students have affordable access to course materials. Visit macmillanlearning.com/achieve for more information.

Cover image from Gremlin, Getty Images.
Macmillan Learning, macmillanlearning.com

... in The Basic Practice of Statistics Why Did You Write The Basic Practice of Statistics? Several factors influenced the writing of The Basic Practice of Statistics Easy-to-use statistical software... without loss of continuity WHY DID YOU DO THAT? The Authors Answer Questions about The Basic Practice of Statistics Welcome to the ninth edition of The Basic Practice of Statistics As the title... “algebra” in the sense of being able to read and use simple equations Why Should I Use The Basic Practice of Statistics to Teach an Introductory Statistics Course? The Basic Practice of Statistics