Statistics: Concepts and Controversies

Senior Publisher: Craig Bleyer
Publisher: Ruth Baruth
Executive Marketing Manager: Jennifer Somerville
Development Editors: Shona Burke, Anne Scanlan-Rohrer
Senior Media Editor: Roland Cheyney
Assistant Editor: Brian Tedesco
Editorial Assistant: Katrina Wilhelm
Marketing Assistant: Eileen Rothschild
Photo Editor: Ted Szczepanski
Photo Researcher: Julie Tesser
Text Designer: Vicki Tomaselli
Cover Designer: Paula Jo Smith
Senior Project Editor: Mary Louise Byrd
Illustrations: ICC Macmillan Inc., Mark Chickinelli
Production Manager: Paul W. Rohloff
Composition: ICC Macmillan Inc.
Printing and Binding: RR Donnelly

Minitab is a registered trademark of Minitab, Inc.
SPSS is a registered trademark of SPSS Inc.

Library of Congress Control Number: 2008932369
ISBN-13: 978-1-4292-2991-3
ISBN-10: 1-4292-2991-8

© 2009 by W. H. Freeman and Company
All rights reserved
Printed in the United States of America
First printing

W. H. Freeman and Company
41 Madison Avenue
New York, NY 10010
www.whfreeman.com

Statistics: Concepts and Controversies
SEVENTH EDITION
David S. Moore, Purdue University
William I. Notz, The Ohio State University
W. H. Freeman and Company, New York

Brief Contents

Part I Producing Data
1 Where Do Data Come From?
2 Samples, Good and Bad
3 What Do Samples Tell Us?
4 Sample Surveys in the Real World
5 Experiments, Good and Bad
6 Experiments in the Real World
7 Data Ethics
8 Measuring
9 Do the Numbers Make Sense?
Part I Review

Part II Organizing Data
10 Graphs, Good and Bad
11 Displaying Distributions with Graphs
12 Describing Distributions with Numbers
13 Normal Distributions
14 Describing Relationships: Scatterplots and Correlation
15 Describing Relationships: Regression, Prediction, and Causation
16 The Consumer Price Index and Government Statistics
Part II Review

Part III Chance
17 Thinking about Chance
18 Probability Models
19 Simulation
20 The House Edge: Expected Values
Part III Review

Part IV Inference
21 What Is a Confidence Interval?
22 What Is a Test of Significance?
23 Use and Abuse of Statistical Inference
24 Two-Way Tables and the Chi-Square Test∗
Part IV Review
∗ This material is optional

Contents

To the Teacher: Statistics as a Liberal Discipline
Applications Index
Prelude: Making Sense of Statistics
Statistics and You: What Lies Ahead in This Book
About the Authors

Part I Producing Data

1 Where Do Data Come From?
  Case Study
  Talking about data: Individuals and variables
  Observational studies
  Sample surveys
  Census
  Experiments
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

2 Samples, Good and Bad
  Case Study
  How to sample badly
  Simple random samples
  Can you trust a sample?
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

3 What Do Samples Tell Us?
  Case Study
  From sample to population
  Sampling variability
  Margin of error and all that
  Confidence statements
  Sampling from large populations
  Statistical Controversies: Should Election Polls Be Banned?
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

4 Sample Surveys in the Real World
  Case Study
  How sample surveys go wrong
  Sampling errors
  Nonsampling errors
  Wording questions
  How to live with nonsampling errors
  Sample design in the real world
  Questions to ask before you believe a poll
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

5 Experiments, Good and Bad
  Case Study
  Talking about experiments
  How to experiment badly
  Randomized comparative experiments
  The logic of experimental design
  Statistical significance
  How to live with observational studies
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

6 Experiments in the Real World
  Case Study
  Equal treatment for all
  Double-blind experiments
  Refusals, nonadherers, and dropouts
  Can we generalize?
  Experimental design in the real world
  Matched pairs and block designs
  Statistical Controversies: Is It or Isn’t It a Placebo?
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

7 Data Ethics
  Case Study
  First principles
  Institutional review boards
  Informed consent
  Confidentiality
  Clinical trials
  Statistical Controversies: Hope for Sale?
  Behavioral and social science experiments
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

8 Measuring
  Case Study
  Measurement basics
  Know your variables
  Measurements valid and invalid
  Statistical Controversies: SAT Exams in College Admissions
  Measurements accurate and inaccurate
  Improving reliability, reducing bias
  Pity the poor psychologist
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

9 Do the Numbers Make Sense?
  Case Study
  What didn’t they tell us?
  Are the numbers consistent with each other?
  Are the numbers plausible?
  Are the numbers too good to be true?
  Is the arithmetic right?
  Is there a hidden agenda?
  Statistics in Summary
  Case Study Evaluated
  Chapter Exercises
  Exploring the Web
  Notes and Data Sources

Answers to Odd-Numbered Exercises

II.23 Prior to 2007 gold did not hold its value. In 1985 dollars, the 2007 price of gold was $360.74, so in 2007 the value was greater than in 1985 and it had held its value.
II.25 (a) Roughly symmetric; the two highest and one lowest time might be considered outliers.
II.27 The mean would be higher because housing prices have a right-skewed distribution.
II.29 (a) Fidelity Technology Fund because of the higher correlation. (b) No.

Chapter 17
17.3 Answers will vary but long trials of this experiment suggest about 40% heads.
17.5 The proportion from Table A is 0.105.
17.7 Results will vary with the type of thumbtack used.
17.9 (a) (b) (c) 0.01 (d) 0.6
17.11 (b) A personal probability might take into account information about your driving habits. (c) Most people believe they are better-than-average drivers.
17.17 If two people talk at length, they will eventually discover something in common.
17.19 0, 0.45, 0.495, 0.4995
17.21 The “law of averages” is no more reliable for forecasting the weather (in the
short run) than it is for other predictions.
17.25 51, 507, 5031, 50074; 1, 7, 31, 74 heads away from half the number of tosses.

Chapter 18
18.3 0.54
18.5 (a) 0.55 (b) 0.48 (c) 0.52
18.7 In Models 1, 3, and 4, the probabilities do not sum to 1; Model has probabilities greater than Model is legitimate
18.9 Each possible value (1, 2, 3, 4) has probability 1/4.
18.11 Possible totals: 2 through 8; probabilities 1/16, 2/16, 3/16, 4/16, 3/16, 2/16, 1/16. The probability is 4/16.
18.13 (a) 0.24 (b) 0.23 (c) 0.47; 0.5
18.15 (a) 0.518 to 0.582 (b) 0.16
18.17 0.9938
18.19 (a) About 50%. Probability is the proportion of times the outcome would occur in a very long series of repetitions. (b) About 68%. (c) About 32%.
18.21 (a) 69.4 (b) Answers will vary. (c) Answers will vary.

Chapter 19
19.3 (a) to for Democrats, to for Republicans (b) to for Democrats, to for Republicans (c) to for Democrats, to for Republicans, and or for undecided (d) 00 to 49 for Democrats, 50 to 87 for Republicans, and 88 to 99 for undecided
19.5 (a) chose Democrats, chose Republicans (b) chose Democrats, chose Republicans (c) chose Democrats, chose Republicans, and were undecided (d) chose Democrats, chose Republicans, and was undecided
19.7 (a) 0.1 (b) to for A, to for B, to for C, for D or F
19.9 Results will vary depending on the starting point in Table A.
19.11 (a) With to meaning a made free throw, he makes all 10 zero times. Thus, we estimate the probability of making all 10 free throws to be 0/25 = 0. (b) The longest run of shots missed was
19.13 (a) or is a pass, to a failure (b) 0.5 (c) No: the probability of passing probably increases on each trial because of learning from previous attempts.
19.15 (a) is a narrow flat side, to is a broad concave side, to is a broad convex side, and is a narrow hollow side (b) Results will vary with the starting line in Table A.
19.17 (a) to means System A works (b) to means System B works (c) Results will vary with the starting line in Table
A.
19.19 00 to 24 means a passenger does not show up. Results will vary with the starting line in Table A.
19.21 to means the van is available. Results will vary with the starting line in Table A.
19.23 (a) BBB, BBG, BGB, GBB, GGB, GBG, BGG, GGG. The first has probability 0.132651; the next three, 0.127449; the next three, 0.122451; and the last, 0.117649. (In practice, these should be rounded to or decimal places.) (b) 0.867349

Chapter 20
20.3 $0.60
20.5 $0.4996
20.7 (a) 300 (b) There is no difference, except in phrasing. Saving 400 is the same as losing 200. (c) No. The choice seems to be based on how the options “sound.”
20.9 8700 units
20.11 The probabilities for the outcomes 2, 3, 4, ..., 12 are 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, and 1/36, respectively. The expected value is 7.
20.13 The expected profit is $10.20 per customer.
20.15 (a) $99.625 (b) Although the probability is very small, having to pay $100,000 would probably be financially disastrous for you. (c) The insurance company will earn about $100 per policy by the law of large numbers, and because they sell lots of policies, they can afford the rare case of paying $100,000.
20.17 (a) 2.4 (b) Assign and to an A, to a B, to a C, a D, and an F. Results will vary with the starting line in Table A.
20.19 (b) Results will vary depending on the approach to the simulation. Using to for heads and to for tails, the estimate of the expected value is $0.84.
20.21 Results will vary depending on the approach to the simulation.
20.23 (a) $0.9025 (b) $0.8574

Part III Review
III.1 (b) If all are equally likely, this would be 30%.
III.3 (a) 0.1 (b) Use to for Type O, to for A, and for B, and for AB
III.5 Results vary with the starting point in Table A.
III.7 3.5
III.9 (a) 0.03 (b) Use 00 to 91 for one pair or worse and 92 to 99 for two pairs or better. Results vary with the starting point in Table A.
III.11 (a) All probabilities are between 0 and 1 and add to 1. (b) 0.425 (c) 0.403
III.13 (a)
0.27 (b) 0.57
III.15 (a) 68% (b) 95%
III.17 0.7%
III.19 Correct assignment: each face value has probability 1/13.
III.21 Ohio State and Purdue: 0.2. Minnesota, Indiana, and Wisconsin: 0.1.

Chapter 21
21.5 (a) The 15,000 alumni. (b) p is the proportion of all alumni who support the president’s decision. (c) 0.33
21.7 (a) p is the proportion of all U.S. adults who would say that having a gun in the house makes it a safer place. (b) Using z∗ = 2, 0.42 ± 0.031. (c) It agrees to two decimal places.
21.9 4048
21.11 Using z∗ = 2, 0.075 ± 0.0372. This interval does cover the true parameter value.
21.13 Many people may be reluctant to admit that they have engaged in binge drinking. Underestimate.
21.15 (a) Using z∗ = 2, 0.6603 ± 0.0293 for TVs and 0.1803 ± 0.0238 for Fox. (b) The margin of error for our 95% confidence intervals (“19 cases out of 20”) was no more than 3%.
21.17 (a) Approximately Normal with mean 0.10 and standard deviation 0.015. (b) 96.4%
21.19 (a) $\hat{p} \pm \sqrt{\hat{p}(1 - \hat{p})/n}$ (c) 0.129 ± 0.0072. This is half as wide.
21.21 0.42 ± 0.025. This is narrower.
21.23 0.52 ± 0.015, 0.52 ± 0.018, 0.52 ± 0.024, and 0.52 ± 0.037. Increasing the confidence level increases the margin of error.
21.25 0.0245 ± 0.0007
21.29 (a) to 30 (b) 15.6 to 20.4
21.31 0.485 to 0.587
21.33 Very roughly Normal.
21.35 (a) $\bar{x}$ = 0.496, which is very close to 0.5. (b) We estimate σ = 0.3251.

Chapter 22
22.5 Conclusion 1: The difference in earnings in our sample was so large that it would rarely occur (probability 0.028) in samples drawn from a population in which men’s and women’s earnings are equal. Conclusion 2: A difference this large would not be unexpected (probability 0.576) in samples drawn from a population in which black and white earnings are equal.
22.7 (b) The difference could be plausibly attributed to chance.
22.9 The difference could be plausibly attributed to chance.
22.11 (a) People might be embarrassed to admit they did not attend religious services. (b) p is the
proportion of American adults who attended religious services last week. Use H0: p = 0.4 and Ha: p < 0.4.
22.13 H0: p = 0.049 and Ha: p ≠ 0.049
22.15 (a) p is the graduation rate for all athletes at this university. (b) H0: p = 0.71 and Ha: p < 0.71. (c) 0.624. The P-value is the probability that $\hat{p}$ ≤ 0.624 when we assume a 71% graduation rate. (d) $\hat{p}$-values as extreme as 0.624 would be unlikely if the true proportion were 0.71.
22.17 Significant at both 10% and 5%.
22.19 A test is significant at the 1% level if outcomes as or more extreme than observed occur less than once in 100 times. A test is significant at the 5% level if outcomes as or more extreme than observed occur less than 5 in 100 times. Something that occurs less than once in 100 times also occurs less than 5 in 100 times, but the opposite is not necessarily true.
22.21 (a) Use digits to for “Method A wins trial” and to for “Method B wins trial,” and take 20 digits from Table A. (b) The numbers of times B won in our 10 simulations were 12, 13, 11, 12, 11, 10, 8, 14, 9, 12. The estimated P-value is 0.5. (c) We simulated the probability of observing results at least as extreme as those in our sample.
22.23 The P-value is less than 0.0006 using Table B. It is consistent with part (d) of Exercise 22.16.
22.25 (a) Approximately Normal with mean 0.1 and standard deviation 0.01464. (b) The P-value is less than 0.0003, so the evidence is very strong.
22.27 The P-value is less than 0.0003, so the evidence is very strong that fewer than half of all drivers are speeding.
22.29 H0: μ = 19 and Ha: μ < 19
22.31 The P-value is about 0.25, so we have little reason to believe that the mean of all numbers produced by this software is not 0.5.

Chapter 23
23.3 Our confidence method can be applied only to an SRS.
23.5 Our confidence interval contains some values of p that indicate that the majority favor Clinton.
23.7 (a) No. In a sample of 200 people we would expect to see people with P < 0.01. (b) Test
this person again.
23.9 Were these random samples? What were the sample sizes?
23.11 It is essentially correct.
23.13 (a) 5% (b) We expect 5% of 77 (about 4) tests to show significance by chance.
23.15 (a) Randomly assign 50 to get the vaccine and give a placebo to the rest. (b) H0: p1 = p2 and Ha: p1 > p2, where p1 and p2 are the proportions who are infected after receiving the placebo and vaccine, respectively. (c) Differences as or more extreme than those observed would occur 15% of the time by chance. (d) Yes.
23.17 Using z∗ = 2, 0.65 ± 0.067
23.19 906 passing: 0.888 to 0.924. 907 passing: 0.889 to 0.925.

Chapter 24
24.5 At least one close family member smokes: 35.7% of the students smoke. No close family member smokes: 25% of the students smoke. The smoking habits of students are associated with the presence or absence of close family members who smoke.
24.7 (a) Hatched: 16, 38, 75. Did not hatch: 11, 18, 29. (b) Cold: 59.3%. Neutral: 67.9%. Hot: 72.1%. Cold did not prevent hatching, but it did make it less likely.
24.9 (a) About 1,439,000 earned bachelor’s degrees. (b) 57.4%, 59.3%, 49.4%, 49.1%. Women earned the majority of bachelor’s and master’s degrees but smaller percentages of professional and doctorate degrees.
24.11 Start by setting a equal to any number between and 40
24.13 (a) White defendant: 19 yes, 141 no. Black defendant: 17 yes, 149 no. (b) Overall death penalty: 11.9% of white defendants, 10.2% of black defendants. For white victims: 12.6% and 17.5%. For black victims: 0% and 5.8%. (c) White defendants killed whites 94.3% of the time, while black defendants killed whites 38.0% of the time. The death penalty was more likely when the victim was white (14%) rather than black (5.4%). White defendants more frequently killed whites, the group more likely to bring the death penalty, even though whites were less likely to get the death penalty than blacks who killed whites.
24.15 (a) Row totals: 82, 37; column totals: 20, 91, 8. Grand total: 119. Expected counts: 13.8, 62.7, 5.5; 6.2, 28.3, 2.5. Middle-column counts differ the most. (b) 6.93; the largest contribution is from the lower right, with slightly less from the middle column. (c) df = 2; significant at α = 0.05 but not at α = 0.01.
24.17 χ² = 1.703, df = 2, not significant at α = 0.05.
24.19 (a) χ² = 14.863; P < 0.001. The differences are significant. (b) The data should come from independent SRSs of the (unregulated) child care providers in each city.

Part IV Review
IV.1 Using z∗ = 2, 0.799 ± 0.025
IV.3 Between about 82.7% and 87.3%.
IV.5 Using z∗ = 2, 0.23 ± 0.027
IV.7 You have information about all states, not just a sample.
IV.9 (a) If there were no difference between the two groups of doctors, results like these would rarely happen by chance. (b) The two proportions are given for comparison.
IV.11 Mornings: 0.550 to 0.590. Evenings: 0.733 to 0.767. The evening proportion is quite a bit higher and the two intervals do not overlap.
IV.13 (a) H0: p = 0.57 and Ha: p > 0.57. (b) Normal with mean 0.57 and standard deviation 0.01. (c) Yes.
IV.15 (a) People are either reluctant to admit that they don’t attend regularly, or they believe they are more regular than they really are. (b) Sample results vary from the population truth. (c) Both intervals are based on methods that work 95% of the time.
IV.17 H0: p = 0.75 and Ha: p > 0.75; standard score 1.00; P = 0.16. Not significant at α = 0.05.
IV.19 0.286 to 0.354
IV.21 H0: p = 0.57 and Ha: p > 0.57; standard score 17.99; P < 0.0003; significant for any reasonable choice of α.
IV.23 Standard score −2.77; P = 0.0028
IV.25 (b) $\bar{x}$ = 15.59 ft and s = 2.550 ft; 15.0 to 16.2. (c) What population are we examining: full-grown sharks, male sharks?
IV.27 Standard score 1.53; P = 0.063
IV.29 (a) No complaint: 743. Medical complaint: 199. Nonmedical complaint: 440. Stayed: 1306. Left: 76. (b) 2.96%, 13.07%, 6.36% (c) Expected counts: 702.14, 188.06, 415.80; 40.86, 10.94, 24.20. All counts are greater than (d) H0: There is no relationship between a member’s complaining and leaving the HMO. Ha: There is some relationship. df = 2; this is very significant. Complaining and leaving are associated.
IV.31 (a) Rows: 1313, 1840; 991, 614. Morning: 57.0%. Evening: 75.0%. (b) We have very large samples with very different proportions. (c) χ² = 172 with df = 1; very significant.

Index

Aaron, Hank, 237, 239, 240, 241, 242, 244, 245, 251, 263
accuracy, in measurement, 151–153
Adams, Evelyn Marie, 381
AIDS, 105, 351
  ethics, 136, 137, 138
alternative hypothesis, 485, 487, 490, 491, 493, 495, 547
  one-sided, two-sided, 488
American College Testing (ACT), 276, 277, 300, 476
American Community Survey, 11, 16
American Football League (AFL), 311
American Gaming Association, 443
American Journal of Sociology, 502
American Psychological Association, 133, 141, 519, 560–561
American Statistical Association, xiii, 141
anecdote, xxvii–xxviii, xxxi
anonymity, 127, 128, 137
Anscombe, Frank, 332
applet, 263, 285, 309, 336, 392, 427
Archaeopteryx, 293–294, 312–313
asbestos, 386–387, 392
association between variables, 292, 325, 362–363, 530, 533–535
  explaining, 322–323, 362–363
  negative, 292–294, 297, 300
  positive, 292–294, 297, 300
astragalus, 378–379, 424–425
authoritarian personality, 156–157
bar graph, 195–198, 217, 265, 360, 524
base period, 340–341, 351
Beck Depression Inventory, 117
Behavioral Risk Factor Surveillance System, 456, 458, 461, 462, 466, 473–474, 475, 478
bell curve, 272
Benford’s law, 451–452
bias, 22, 39–40, 124, 155, 456, 559
  in experimentation, 88, 102–103, 105
  in estimation, 39–41, 47, 458, 467
  in measurement, 153–155, 156, 157
  in sampling, 22–23, 37, 47, 456, 467
birthday problem, 426
block designs, 111–113, 114
body mass index (BMI), 322–323
Boggs, Rex, 366–367
Bonds, Barry, 236, 239, 240, 241, 243, 244–245, 246, 249, 250, 263
bone marrow transplant, 130
boxplot, 244–247, 255, 361
Brett, George, 281–282
Broca, Paul, 152
Bryant, Kobe, 253–254
Buffon, Count, 377, 386, 474, 442, 486–488, 509
Buffon’s needle, 427
Bureau of Economic Analysis, 349, 357
Bureau of Justice Statistics, 349
Bureau of Labor Statistics (BLS), xxxi, 48, 145–146, 155, 200, 210, 346–349, 350, 353, 356, 357, 476
Burt, Cyril, 178, 555
Bush, George W., 21, 35, 45, 329, 369–370
Business Week poll, 30
call-in opinion poll, 21, 22, 23, 31
cancer, and power lines, xxvii, 7–8
cancer clusters, 382
Carolina Abecedarian Project, 82–83, 108
Carter, Jimmy, 313
categorical variable. See variable
causation, 13, 14, 88, 91, 93, 320–326, 362–363, 382–383, 446, 533
census, 11–12, 14, 58
  U.S. Census, 5, 6, 11–12, 139, 350, 432
Census Bureau, 5, 11, 12, 16, 35, 66, 179, 194, 208, 235, 349, 350, 432, 440
center, of a distribution, 222, 229, 240, 242, 245, 249, 253, 255, 267, 361
  of a density curve, 269–270, 278
Centers for Disease Control and Prevention, 139
central limit theorem, 468, 471
Challenger disaster, 390
chance behavior, 375–378, 379–384
Chance News, 179, 216, 336, 519
chi-square distribution, 528–529, 536
chi-square statistic, 526–528, 536
Circulation, 455
Cleveland, William, 220
clinical trials, 84, 87, 107, 120, 496, 557
  ethics of, 128–132
closed question, in sample survey, 74
clusters, in sampling, 66
  in scatterplot, 300
Cobb, Ty, 281–282
coin tossing, 376–378, 380, 384, 386, 407, 413–414, 486–488
College Board, 148, 149, 287, 334
column variable, 522
common response, 322–324, 325, 326
completely randomized design, 109, 114
computer-assisted interviewing, 60, 146
confidence, statistical, 42–45
confidence interval, xxxi, 455–479, 506–507, 511–512, 546–547
  definition of, 457, 462
  for population mean, 469–471
  for population proportion, 460, 464–466, 546–547
confidence level, 44, 462, 471, 506
confidence statement, 43–45, 47
confidentiality, 124, 127–128, 134
confounding, 83–84, 91, 93, 174, 322–324, 326
Congressional Budget Office, 317
constant dollars, 342
constitutional amendment, 35, 36–37, 42, 51, 73
Consumer Expenditure Survey, 346
Consumer Price Index (CPI), 340, 341, 342–348, 351–352, 357, 363
  table of, 343
Consumer Reports, 235, 257, 260
control, experimental, 89, 93, 110, 111
control group, 87, 91, 130
convenience sample, 22, 29, 47
correlation, 295–300, 313, 318–320, 362
  squared, 319–320, 322, 323, 326, 362
cost of living, 347–348
count, 4, 147–148, 195, 225, 268, 522, 536
craps, 425
Crohn’s disease, 84–85
critical values of chi-square distribution, table of, 529
critical values of normal distribution, 465–466, 470, 547
  table of, 465
cross-section data, 559
Current Population Survey (CPS), 9, 19, 54, 62, 66, 75, 77, 127, 146, 155, 194, 202, 239, 351, 357, 476, 498
cycles, 212, 213, 214
data, xxvii
data analysis, xxxiii, 292, 359, 515
Dear Abby, 383
degrees of freedom, 528, 529, 532, 547
density curve, 268–270, 278, 279, 280, 401, 403, 468–469, 529
dependent variable. See variable, response
Detroit Area Study, 481–482
dice, 398–399, 405, 425
DiMaggio, Joe, 355
Dirksen, Everett, 318
distribution, 195, 197, 207, 217, 222–226, 229, 240, 244, 247, 249, 251, 399–403, 522
  skewed, 223–226, 229, 240, 247, 254, 255, 269–270, 361, 468, 528
  symmetric, 223–226, 229, 240, 247, 254, 255, 269–270, 361
domestic violence, 133–134, 176
double-blind experiment, 103–104, 105, 114, 131
draft lottery, 34
dropouts, 104–105, 506, 561
Duke University Medical Center, 125
Einstein, Albert, xxx, 377
Environmental Protection Agency, 353
error curve, 272
errors in measurement, 151–152, 157
errors in sampling, 58–65
ESP, 500, 516
ethics, 123–142
  in clinical trials, 128–132
  in behavioral studies, 132–134
event, 396–397, 403
exit polls, 46
expected counts, 525, 526, 527, 532
expected value, 429–443, 446
  definition of, 431
experiment, 12–14, 81–91, 93, 325, 482
  cautions about, 102, 104–108
  design of, 102, 108–114
  randomized comparative, 85–88, 101, 103, 107, 181, 321
FairTest, 149
family, definition of, 256
Fatal Accident Reporting System, 145, 147
Federal Bureau of Investigation (FBI), 161, 171, 209
Federal Election Commission, 370
FedStats, 357
Fermat, Pierre, 379, 451
five-number summary, 244, 253–255, 361
fixed market basket price index, 341–342, 351, 363
Food and Drug Administration, 113
Forbes, 162, 174, 256
F-scale, 156
Gallup poll, 19, 29, 35, 42, 43, 53, 54, 62, 65, 71, 185, 461
Galton, Francis, 272, 316
gambling, 401, 429–430, 433–435. See also lottery
Gauss, Carl Friedrich, 272
General Aptitude Test Battery, 160
General Social Survey (GSS), 10, 19, 62, 136, 351, 357
Gilovich, Thomas, 452
Gnedenko, B. V., 397
Goodall, Jane,
Gould, Stephen Jay, 555
government statistics, 348–351
graphs, 193–216. See also bar graph, boxplot, density curve, histogram, line graph, pie chart, scatterplot, stemplot
Greenspan, Alan, 348
Gretzky, Wayne, 151
Grosvenor, Charles, xxviii
gross domestic product (GDP), 290–291
guns and crime, 320, 324
Harris Poll, 53, 54, 55, 78
Helsinki Declaration, 128
Hill, Theodore, 451
histogram, 38, 217–226, 229, 230, 231, 259, 265–269, 288, 361, 400
historical controls, 89
Hogan, Ben, 355
hot hand, 380, 414, 418
household, definition of, 256
hypothesis testing. See significance test
incoherent, 399
income inequality, 173–174, 247–248, 346, 559
independence, 413, 414, 416–418, 427, 433, 435
independent variable. See variable, explanatory
index number, 340–341, 351, 363
individual, 4, 14, 82, 218–219, 290
inference, statistical, xxxiii, 453, 456, 471, 482, 505, 515, 545
informed consent, 124, 125–126, 133–134
insurance, 378, 433–434
interaction, 110
intercept of a line, 315, 316, 362
International Bureau of Weights and Measures, 153–154
institutional review board, 125, 134, 141
instrument, in measurement, 144, 157
IQ tests, 149, 156, 178, 201, 272, 280–281, 282, 284, 299, 302, 327, 449, 450, 476
James, LeBron, 423, 442
Journal of the American Medical Association, xxviii, 99, 186, 189, 555
Keno, 439
Kerrich, John, 377, 386
Kerry, John, 45, 53, 369
Keynes, John Maynard, xxix
Landers, Ann, xxviii, 23, 30
law of averages, 383–384, 392
law of large numbers, 433–435, 437
least-squares regression, 314–316, 318–320, 326
  equation of, 335
Lewis, C. S., 384
line graph, 199–202, 203, 207, 247, 248, 343, 360, 361
longitudinal data, 559
Los Angeles Times, 473, 475
Lott, John, 324
lottery, 381, 391, 429–432, 435, 438
LSAT, 516
lurking variable. See variable
M&M candy, 406–407
mall interviews, 22–23
margin of error, 41–43, 44, 47, 58, 62, 67, 68, 70, 272, 461–462, 470, 473, 547
  quick method, 42–43, 68, 474
market research, xxx, 10
Mason, Jackie, xxix
matched pairs design, 110–111, 114, 183, 499, 552
matching, 91, 98, 110–111
Mathematical Association of America, xiii
Mays, Willie, 263
McNamara, John, 167–168
mean, 249–256, 296, 361, 467
  of a density curve, 269–270, 278, 279
  of Normal distribution, 270–272, 362
  units of, 297
measurement, 143–164, 561
  bias, 151–155, 156, 157
  definition of, 145, 157
  errors in, 152, 157
  reliability, 152–155, 156, 157
  validity, 147–151, 157, 174
median, 240–242, 244, 247, 253–255, 277, 361
  of a density curve, 269–270
  units of, 297
Mendel, Gregor, 450
meta-analysis, 108
Meyer, Eric, 166
minimum wage, 355
Minitab, 196, 198, 200, 207, 210, 213, 219, 226, 227, 228, 230, 245, 246, 248, 253, 259, 260, 288, 289, 291, 293, 301, 329, 492
model for data, 317, 359
Mondale, Walter, 314
Mosteller, Frederick, xxxi
Nash, Steve, 391
National Assessment of Adult Literacy, 491–492
National Assessment of Educational Progress, 52–53, 470–471, 553
National Cancer Institute, xxvii, 382
National Center for Education Statistics, 357
National Center for Health Statistics, 349, 357, 378, 493–494
National Coalition Against Legalized Gambling, 443
National Collegiate Athletic Association (NCAA), 284, 498
National Crime Victimization Survey, 161
National Football League (NFL), 311, 395
National Health Survey, 60
National Household Survey on Drug Abuse, 214
National Indian Gaming Association, 443
National Institute of Standards and Technology, 153, 154, 163
National Institutes of Health, 86, 117
National Opinion Research Center, 10, 351, 560
National Science Foundation, 96
natural supplements, 113
New England Journal of Medicine, xxvii, 99, 124, 131, 189, 555
New York Times, 52, 65, 74, 116, 130, 168, 189, 381, 438, 473, 474, 475, 500
Nicklaus, Jack, 355
Nielsen, Arthur, xxx
Nielsen Media Research, 10, 19
nonadherer, 104–105
nonresponse, 61–62, 63, 65, 71, 104, 114, 127, 506
nonsampling errors, 58, 60–65, 71
Normal distributions, 255, 265–286, 361–362, 399–403, 446, 457–458, 463, 464, 465, 467–469, 483–484, 486–488, 489–490
  table of percentiles, Table B, 598
North American Association of State and Provincial Lotteries, 443
null hypothesis, 485, 487, 491, 493, 495, 507, 508–509, 512, 525, 547
numbers racket, 438
Nurses Health Study, 187
obesity, 105, 232–233, 322–323
observational study, 6–8, 12, 14, 81, 91–93, 325, 455, 533
odds, 387
Office of Minority Health, 104
Office for National Statistics (Britain), 349
Ohio State University, 123
open question, in sample survey, 74–75
opinion polls, 9, 46, 57, 59–65, 70. See also Gallup poll
  ethics, 138–139
outliers, 221, 226, 228, 240, 253–255, 291, 298, 300, 303, 314, 318–319, 361, 362, 470
P-value, 486, 488, 489–490, 492, 493–494, 495, 507, 508–509, 511, 512, 533, 547
parameter, 36, 39–40, 44, 47, 48, 58, 65, 155, 457, 459, 460, 467
pari-mutuel system, 432, 434
Pascal, Blaise, 379, 451
pattern and deviations, 200–201, 221, 229, 267, 288, 292, 361
Pearson, Karl, 377, 386
percentages, 4, 195, 225, 522–523, 547. See also rates
  arithmetic of, 170–173
percentile, 247–248, 277, 356
  of Normal distributions, 277–278, 362, 402–403, 489–490, 492, 494, 547
Perot, H. Ross, 64
personal probability, 385–386, 388, 399, 412
Pew Research Center, 57, 62, 78
Physicians’ Health Study, 94–95, 129
pictogram, 198–199, 207, 361
pie chart, 195–198, 207, 217, 360
pig whipworms, 84–85
placebo, 85, 86, 87, 102–104, 105, 113, 130, 131–132
  ethics of, 130–132
placebo effect, 85, 93, 103, 114, 131
Playfair, William, 265
Point of Purchase Survey, 347
poker, 448
Popular Science, 234, 258
population, 8–10, 14, 29, 36, 44, 45, 47, 155, 453, 456, 482, 561
prediction, 311, 312, 313, 314, 316–318, 324, 326, 362, 513–514
predictive validity, 150–151, 156, 157
preelection polls, 46
primary sampling unit, 66
privacy. See confidentiality
probability, xxxiii, 373, 375–378, 387–388, 412
  rules of, 397–399, 427
probability model, 395–409, 412, 413, 414, 418, 419, 420, 421, 430, 431, 433, 436, 446
probability sample. See sample
processing error, 60, 71
Public Health Service, 129
Purdue University, 521
Quackwatch, 120
quantitative variable. See variable
quartiles, 240–244, 247, 255, 277, 361
  of a density curve, 269
  units of, 297
race, 5, 61, 63, 104, 129, 481–482, 535
random digits, 24–27, 30, 389
  table of, Table A, 596–597
  use in experimental design, 87
  use in sampling design, 25–27
  use in simulation, 54, 412–416, 435–436
random digit dialing, 57, 59
random error, in measurement, 151–152, 157
random phenomenon, 375–378, 387, 388, 414, 431, 433, 437, 446
random sampling error, 58, 62, 71
randomization, in experiments, 85–88, 93, 99, 102, 109, 111–112, 114
  in sampling, 23–28, 66–68
randomized comparative experiment. See experiment
rates, 148, 195
Reagan, Ronald, 313–314
real income, 343, 345, 355–356, 363
Reese, George, 427
refusals, 104
regression, 312–316, 324, 326, 362
regression to the mean, 316
relationship, describing, 290, 292–294. See also correlation, regression
reliability in measurement, 152–155, 156–157
research randomizer, 27–28
response error, 60, 71
response rate, 70, 72
response variable. See variable
ring-no-answer, 72–73
risk, 251–252, 386–387
Rosenthal, Robert, 513
roulette, 406, 429, 438
rounding data, 228–229
roundoff error, 195
row variable, 522
run, 380, 412, 413–414
Ruth, Babe, 354 sample, 8–10, 14, 29, 36, 44, 45, 47, 65–68, 155, 453, 456, 482 cautions about, 58–65, 70 list-assisted, 69 multistage, 66 probability, 69, 71 simple random, 23–27, 29–30, 41, 65, 66, 67, 68, 69, 86, 87, 181, 399–403, 457, 463, 466, 467, 470, 494, 506 stratified, 66–68, 71, 112, 506 systematic, 77 using software, 27–28 sample survey, 8–11, 14, 57–65, 125, 127, 346, 349 See also Current Population Survey, opinion polls sampling distribution, 37–41, 224, 267, 399–402, 403, 471, 546 definition of, 401 of chi-square statistic, 528–529, 532, 536, 548 of sample mean, 467–469, 471, 491–492, 493 of sample proportion, 36–39, 265–267, 399–402, 457–458, 463, 483, 487, 509–511, 546 sampling errors, 58–59, 71 sampling frame, 58, 71 sampling variability, 37–41, 45, 47, 457 See also sampling distribution SAT, 144, 145, 148, 149–151, 156, 166–167, 201, 258–259, 259–260, 262, 275–276, 277–278, 284, 287, 300, 316–317, 323, 327, 364–365 scales, on graph, 202–205, 361 scatterplot, 289–294, 300, 302, 303, 308, 312–313, 319, 362, 366 seasonal variation, 201 seasonally adjusted, 201, 356 Science, 76, 140, 169, 171 shape of a distribution, 222, 229, 267 sickle cell anemia, 86 significance, statistical, 90–91, 93, 106, 489, 495, 505, 507, 508–509, 513–514, 533, 547 significance level, 489 significance test, 481–503, 508–511, 513, 547 for population mean, 491–495 for population proportion, 482–488 for two-way table, 523–530, 533, 548 simple random sample See sample Simpson’s paradox, 533–535, 536 simulation, 54, 411–428, 435–436, 446, 475, 499–500 68–95–99.7 rule, 273–275, 278, 362, 400, 401, 402, 459, 464, 546 slope of a line, 315, 316, 362 slot machine, 434 smoking and health, 320, 325–326 social statistics, 350–351 Spencer, Henry, xxvii Spielberger Trait Anger Scale, 455, 531 spread, of a density curve, 269–270 of a distribution, 222, 229, 242, 244, 245, 249, 255, 267, 361 SRS See sample, simple random standard deviation, 249–252, 255–256, 296, 362, 467 of 
Normal distribution, 271–272, 273, 362 of sample mean, 467 of sample proportion, 457, 509–510 units of, 297 standard score, 275–277, 278, 296, 297, 362, 402, 490, 492, 493 Stanford-Binet test, 284 Standard & Poor’s 500 index, 203, 285, 369 State Committee for Statistics, 128 statistic, 36, 39, 40, 44, 45, 47, 58, 65, 224, 272, 399–401, 456, 458 Statistical Abstract of the U.S., 169, 172, 178, 179, 193, 209, 211, 342, 371, 448, 538 Statistics Canada, 349 stem-and-leaf plot See stemplot stemplot, 226–229, 240, 252, 253, 280–281, 361 back-to-back, 236, 246 stochastic beetle, 425, 440 strata, 66 subjects, 82, 106, 107, 108, 109, 110, 111, 112 sunspot cycle, 212–213 Super Bowl, 311, 395 Survey of Study Habits and Attitudes, 501 Suzuki, Ichiro, 391 tables, 193–195 telemarketing, 45, 123 telephone samples, 59, 60, 68–69 television ratings, 10 test of significance See significance test tetrahedron, 406 three-way table, 534 treatment, experimental, 82, 106, 108, 109, 110, 111, 112 tree diagram, 419–421 trend, 200, 361 Tuskegee syphilis study, 129 Tversky, Amos, 438–439, 452 two-way tables, 521–523, 536, 547 quantitative, 197, 207, 217, 229, 255, 267, 288–290, 295, 300, 360, 361 response, 82, 92, 93, 110, 182, 289–290, 297, 300, 312, 316–317, 326, 362 variability, in estimation, 39–41, 45, 47 variance, 249 voluntary response, 22–23, 29, 47, 174, 453 UCLA, 498 unbiased estimation, 41, 458, 467 undercoverage, 58–59, 71 unemployment, measuring, 145–147, 154–155, 201 unit, of measurement, 15, 144, 145, 194, 297 University of Wisconsin, 136 U.S.A Today, 51 U.S census, 6, 11–12, 139, 350 U.S News and World Report, 163 Wainer, Howard, 201 Wald, Abraham, 292 Wall Street Journal, 74, 76 Washington Post, 17, 117 Wechsler Adult Intelligence Scale (WAIS), 282–283, 285, 450 Wechsler Intelligence Scale for Children (WISC), 284–285 weighting, 65 Williams, Ted, 281–282 Woods, Tiger, 355 wording questions, 63–65, 71 World Bank, 290 write-in opinion poll, 23 validity, in measurement, 
147–151, 157, 174 variable, 4–5, 14, 144–145, 157, 561 See also distribution categorical, 197, 207, 217, 360, 523, 536 explanatory, 82, 92, 93, 109, 110, 182–183, 289–290, 297, 300, 312, 316–317, 326, 362 lurking, xxix, 83–85, 91, 93, 183, 320–325, 363, 455, 533–535, 536, 560 Yale University, 136 Zogby International, 52, 69