5 STEPS TO A 5: AP Statistics
Duane C. Hinders
2008–2009

Other books in McGraw-Hill’s 5 Steps to a 5 series include:
AP Biology
AP Calculus AB/BC
AP Chemistry
AP Computer Science
AP English Language
AP English Literature
AP European History
AP Microeconomics/Macroeconomics
AP Physics B and C
AP Psychology
AP Spanish Language
AP U.S. Government and Politics
AP U.S. History
AP World History
11 Practice Tests for the AP Exams
Writing the AP English Essay

McGRAW-HILL
New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto

Copyright © 2008, 2004 by The McGraw-Hill Companies, Inc. All rights reserved. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.

0-07-159466-3

The material in this eBook also appears in the print version of this title: 0-07-148856-1.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please contact George Hoare, Special Sales, at george_hoare@mcgraw-hill.com or (212) 904-4069.

TERMS OF USE
This is a copyrighted work and The McGraw-Hill Companies, Inc. (“McGraw-Hill”) and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one
copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill’s prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED “AS IS.” McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

DOI: 10.1036/0071488561
CONTENTS
Preface
Acknowledgments
About the Author
Introduction: The Five-Step Program
    The Basics
STEP 1 Set Up Your Study Program
  1 What You Need to Know About the AP Statistics Exam
    Background Information
    Some Frequently Asked Questions About the AP Statistics Exam
  2 How to Plan Your Time
    Three Approaches to Preparing for the AP Statistics Exam
    Calendar for Each Plan
STEP 2 Determine Your Test Readiness
  3 Take a Diagnostic Exam
    AP Statistics Diagnostic Test
    Answers and Explanations
    Interpretation: How Ready Are You?
STEP 3 Develop Strategies for Success
  4 Tips for Taking the Exam
    General Test-Taking Tips
    Tips for Multiple-Choice Questions
    Tips for Free-Response Questions
    Specific Statistics Content Tips
STEP 4 Review the Knowledge You Need to Score High
  5 Overview of Statistics/Basic Vocabulary
    Quantitative versus Qualitative Data
    Descriptive versus Inferential Statistics
    Collecting Data: Surveys, Experiments, Observational Studies
    Random Variables
  6 One-Variable Data Analysis
    Graphical Analysis
    Measures of Center
    Measures of Spread
    Position of a Term in a Distribution
    Normal Distribution
  7 Two-Variable Data Analysis
    Scatterplots
    Correlation
    Lines of Best Fit
    Residuals
    Coefficient of Determination
    Outliers and Influential Observations
    Transformations to Achieve Linearity
  8 Design of a Study: Sampling, Surveys, and Experiments
    Samples
    Sampling Bias
    Experiments and Observational Studies
  9 Random Variables and Probability
    Probability
    Random Variables
    Normal Probabilities
    Simulation and Random Number Generation
    Transforming and Combining Random Variables
    Rules for the Mean and Standard Deviation of Combined Random Variables
  10 Binomial Distributions, Geometric Distributions, and Sampling Distributions
    Binomial Distributions
    Normal Approximation to the Binomial
    Geometric Distributions
    Sampling Distributions
    Sampling Distributions of a Sample Proportion
  11 Confidence Intervals and Introduction to Inference
    Estimation and Confidence Intervals
    Confidence Intervals for Means and Proportions
    Sample Size
    Statistical Significance and P-value
    The Hypothesis-Testing Procedure
    Type-I and Type-II Errors and the Power of a Test
  12 Inference for Means and Proportions
    Significance Testing
    Inference for a Single Population Mean
    Inference for the Difference Between Two Population Means
    Inference for a Single Population Proportion
    Inference for the Difference Between Two Population Proportions
  13 Inference for Regression
    Simple Linear Regression
    Inference for the Slope of a Regression Line
    Confidence Interval for the Slope of a Regression Line
    Inference for Regression Using Technology
  14 Inference for Categorical Data: Chi-square
    Chi-square Goodness-of-Fit Test
    Inference for Two-Way Tables
STEP 5 Build Your Test-Taking Confidence
    AP Statistics Practice Exam 1
    AP Statistics Practice Exam 2
Appendixes
    Formulas
    Tables
    Bibliography
    Web sites
    Glossary

APPENDIXES
Formulas
Tables
Bibliography
Web sites
Glossary

FORMULAS

I. Descriptive Statistics

$\bar{x} = \dfrac{\sum x_i}{n}$

$s_x = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1}}$

$s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1) + (n_2-1)}}$

$\hat{y} = b_0 + b_1 x$

$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

$b_0 = \bar{y} - b_1 \bar{x}$

$r = \dfrac{1}{n-1}\sum \left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$

$b_1 = r\,\dfrac{s_y}{s_x}$

$s_{b_1} = \dfrac{\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}}}{\sqrt{\sum (x_i - \bar{x})^2}}$

II. Probability

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$

$E(X) = \mu_X = \sum x_i p_i$

$\operatorname{Var}(X) = \sigma_X^2 = \sum (x_i - \mu_X)^2 p_i$

If X has a binomial distribution with parameters n and p, then:

$P(X = k) = \dbinom{n}{k} p^k (1-p)^{n-k}$

$\mu_X = np$

$\sigma_X = \sqrt{np(1-p)}$

$\mu_{\hat{p}} = p$

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$

If $\bar{x}$ is the mean of a random sample of size n from an infinite population with mean μ and standard deviation σ, then

$\mu_{\bar{x}} = \mu$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

III. Inferential Statistics

Standardized test statistic: $\dfrac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$

Confidence interval: statistic ± (critical value) · (standard deviation of statistic)

Single-sample
Statistic                                       Standard deviation
Sample mean                                     $\dfrac{\sigma}{\sqrt{n}}$
Sample proportion                               $\sqrt{\dfrac{p(1-p)}{n}}$

Two-sample
Statistic                                       Standard deviation
Difference of sample means ($\sigma_1 \neq \sigma_2$)         $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Difference of sample means ($\sigma_1 = \sigma_2$)            $\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Difference of sample proportions ($p_1 \neq p_2$)   $\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
Difference of sample proportions ($p_1 = p_2$)      $\sqrt{p(1-p)}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

Chi-square test statistic = $\sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$

TABLES

TABLE A Standard Normal Probabilities
Table entry for z is the probability lying below z.

   z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
−3.4  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
−3.3  .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
−3.2  .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
−3.1  .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
−3.0  .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
−2.9  .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
−2.8  .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
−2.7  .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
−2.6  .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
−2.5  .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
−2.4  .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
−2.3  .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
−2.2  .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
−2.1  .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
−2.0  .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
−1.9  .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
−1.8  .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
−1.7  .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
−1.6  .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
−1.5  .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
−1.4  .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
−1.3  .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
−1.2  .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
−1.1  .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
−1.0  .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
−0.9  .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
−0.8  .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
−0.7  .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
−0.6  .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
−0.5  .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
−0.4  .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
−0.3  .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
−0.2  .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
−0.1  .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
−0.0  .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641

TABLE A Standard Normal Probabilities (continued)
Table entry for z is the probability lying below z.

   z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 0.3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 0.4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 0.5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 0.6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 0.7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 0.8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 0.9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
 2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
 3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
 3.1  .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
 3.2  .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
 3.3  .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
 3.4  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998

TABLE B t Distribution Critical Values
Table entry for p and C is the point t* with probability p lying above it and probability C lying between −t* and t*. Column headings give the tail probability p; the bottom row (C) gives the corresponding confidence level.

  df    .25    .20    .15    .10    .05   .025    .02    .01   .005  .0025   .001  .0005
   1  1.000  1.376  1.963  3.078  6.314  12.71  15.89  31.82  63.66  127.3  318.3  636.6
   2   .816  1.061  1.386  1.886  2.920  4.303  4.849  6.965  9.925  14.09  22.33  31.60
   3   .765   .978  1.250  1.638  2.353  3.182  3.482  4.541  5.841  7.453  10.21  12.92
   4   .741   .941  1.190  1.533  2.132  2.776  2.999  3.747  4.604  5.598  7.173  8.610
   5   .727   .920  1.156  1.476  2.015  2.571  2.757  3.365  4.032  4.773  5.893  6.869
   6   .718   .906  1.134  1.440  1.943  2.447  2.612  3.143  3.707  4.317  5.208  5.959
   7   .711   .896  1.119  1.415  1.895  2.365  2.517  2.998  3.499  4.029  4.785  5.408
   8   .706   .889  1.108  1.397  1.860  2.306  2.449  2.896  3.355  3.833  4.501  5.041
   9   .703   .883  1.100  1.383  1.833  2.262  2.398  2.821  3.250  3.690  4.297  4.781
  10   .700   .879  1.093  1.372  1.812  2.228  2.359  2.764  3.169  3.581  4.144  4.587
  11   .697   .876  1.088  1.363  1.796  2.201  2.328  2.718  3.106  3.497  4.025  4.437
  12   .695   .873  1.083  1.356  1.782  2.179  2.303  2.681  3.055  3.428  3.930  4.318
  13   .694   .870  1.079  1.350  1.771  2.160  2.282  2.650  3.012  3.372  3.852  4.221
  14   .692   .868  1.076  1.345  1.761  2.145  2.264  2.624  2.977  3.326  3.787  4.140
  15   .691   .866  1.074  1.341  1.753  2.131  2.249  2.602  2.947  3.286  3.733  4.073
  16   .690   .865  1.071  1.337  1.746  2.120  2.235  2.583  2.921  3.252  3.686  4.015
  17   .689   .863  1.069  1.333  1.740  2.110  2.224  2.567  2.898  3.222  3.646  3.965
  18   .688   .862  1.067  1.330  1.734  2.101  2.214  2.552  2.878  3.197  3.611  3.922
  19   .688   .861  1.066  1.328  1.729  2.093  2.205  2.539  2.861  3.174  3.579  3.883
  20   .687   .860  1.064  1.325  1.725  2.086  2.197  2.528  2.845  3.153  3.552  3.850
  21   .686   .859  1.063  1.323  1.721  2.080  2.189  2.518  2.831  3.135  3.527  3.819
  22   .686   .858  1.061  1.321  1.717  2.074  2.183  2.508  2.819  3.119  3.505  3.792
  23   .685   .858  1.060  1.319  1.714  2.069  2.177  2.500  2.807  3.104  3.485  3.768
  24   .685   .857  1.059  1.318  1.711  2.064  2.172  2.492  2.797  3.091  3.467  3.745
  25   .684   .856  1.058  1.316  1.708  2.060  2.167  2.485  2.787  3.078  3.450  3.725
  26   .684   .856  1.058  1.315  1.706  2.056  2.162  2.479  2.779  3.067  3.435  3.707
  27   .684   .855  1.057  1.314  1.703  2.052  2.158  2.473  2.771  3.057  3.421  3.690
  28   .683   .855  1.056  1.313  1.701  2.048  2.154  2.467  2.763  3.047  3.408  3.674
  29   .683   .854  1.055  1.311  1.699  2.045  2.150  2.462  2.756  3.038  3.396  3.659
  30   .683   .854  1.055  1.310  1.697  2.042  2.147  2.457  2.750  3.030  3.385  3.646
  40   .681   .851  1.050  1.303  1.684  2.021  2.123  2.423  2.704  2.971  3.307  3.551
  50   .679   .849  1.047  1.299  1.676  2.009  2.109  2.403  2.678  2.937  3.261  3.496
  60   .679   .848  1.045  1.296  1.671  2.000  2.099  2.390  2.660  2.915  3.232  3.460
  80   .678   .846  1.043  1.292  1.664  1.990  2.088  2.374  2.639  2.887  3.195  3.416
 100   .677   .845  1.042  1.290  1.660  1.984  2.081  2.364  2.626  2.871  3.174  3.390
1000   .675   .842  1.037  1.282  1.646  1.962  2.056  2.330  2.581  2.813  3.098  3.300
   ∞   .674   .841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.091  3.291
   C    50%    60%    70%    80%    90%    95%    96%    98%    99%  99.5%  99.8%  99.9%

TABLE C χ² Critical Values
Table entry for p is the point χ² with probability p lying above it. Column headings give the tail probability p.

  df    .25    .20    .15    .10    .05   .025    .02    .01   .005  .0025   .001  .0005
   1   1.32   1.64   2.07   2.71   3.84   5.02   5.41   6.63   7.88   9.14  10.83  12.12
   2   2.77   3.22   3.79   4.61   5.99   7.38   7.82   9.21  10.60  11.98  13.82  15.20
   3   4.11   4.64   5.32   6.25   7.81   9.35   9.84  11.34  12.84  14.32  16.27  17.73
   4   5.39   5.99   6.74   7.78   9.49  11.14  11.67  13.28  14.86  16.42  18.47  20.00
   5   6.63   7.29   8.12   9.24  11.07  12.83  13.39  15.09  16.75  18.39  20.51  22.11
   6   7.84   8.56   9.45  10.64  12.59  14.45  15.03  16.81  18.55  20.25  22.46  24.10
   7   9.04   9.80  10.75  12.02  14.07  16.01  16.62  18.48  20.28  22.04  24.32  26.02
   8  10.22  11.03  12.03  13.36  15.51  17.53  18.17  20.09  21.95  23.77  26.12  27.87
   9  11.39  12.24  13.29  14.68  16.92  19.02  19.68  21.67  23.59  25.46  27.88  29.67
  10  12.55  13.44  14.53  15.99  18.31  20.48  21.16  23.21  25.19  27.11  29.59  31.42
  11  13.70  14.63  15.77  17.28  19.68  21.92  22.62  24.72  26.76  28.73  31.26  33.14
  12  14.85  15.81  16.99  18.55  21.03  23.34  24.05  26.22  28.30  30.32  32.91  34.82
  13  15.98  16.98  18.20  19.81  22.36  24.74  25.47  27.69  29.82  31.88  34.53  36.48
  14  17.12  18.15  19.41  21.06  23.68  26.12  26.87  29.14  31.32  33.43  36.12  38.11
  15  18.25  19.31  20.60  22.31  25.00  27.49  28.26  30.58  32.80  34.95  37.70  39.72
  16  19.37  20.47  21.79  23.54  26.30  28.85  29.63  32.00  34.27  36.46  39.25  41.31
  17  20.49  21.61  22.98  24.77  27.59  30.19  31.00  33.41  35.72  37.95  40.79  42.88
  18  21.60  22.76  24.16  25.99  28.87  31.53  32.35  34.81  37.16  39.42  42.31  44.43
  19  22.72  23.90  25.33  27.20  30.14  32.85  33.69  36.19  38.58  40.88  43.82  45.97
  20  23.83  25.04  26.50  28.41  31.41  34.17  35.02  37.57  40.00  42.34  45.31  47.50
  21  24.93  26.17  27.66  29.62  32.67  35.48  36.34  38.93  41.40  43.78  46.80  49.01
  22  26.04  27.30  28.82  30.81  33.92  36.78  37.66  40.29  42.80  45.20  48.27  50.51
  23  27.14  28.43  29.98  32.01  35.17  38.08  38.97  41.64  44.18  46.62  49.73  52.00
  24  28.24  29.55  31.13  33.20  36.42  39.36  40.27  42.98  45.56  48.03  51.18  53.48
  25  29.34  30.68  32.28  34.38  37.65  40.65  41.57  44.31  46.93  49.44  52.62  54.95
  26  30.43  31.79  33.43  35.56  38.89  41.92  42.86  45.64  48.29  50.83  54.05  56.41
  27  31.53  32.91  34.57  36.74  40.11  43.19  44.14  46.96  49.64  52.22  55.48  57.86
  28  32.62  34.03  35.71  37.92  41.34  44.46  45.42  48.28  50.99  53.59  56.89  59.30
  29  33.71  35.14  36.85  39.09  42.56  45.72  46.69  49.59  52.34  54.97  58.30  60.73
  30  34.80  36.25  37.99  40.26  43.77  46.98  47.96  50.89  53.67  56.33  59.70  62.16
  40  45.62  47.27  49.24  51.81  55.76  59.34  60.44  63.69  66.77  69.70  73.40  76.09
  50  56.33  58.16  60.35  63.17  67.50  71.42  72.61  76.15  79.49  82.66  86.66  89.56
  60  66.98  68.97  71.34  74.40  79.08  83.30  84.58  88.38  91.95  95.34  99.61  102.7
  80  88.13  90.41  93.11  96.58  101.9  106.6  108.1  112.3  116.3  120.1  124.8  128.3
 100  109.1  111.7  114.7  118.5  124.3  129.6  131.1  135.8  140.2  144.3  149.4  153.2

BIBLIOGRAPHY
Advanced Placement Program Course Description. 2007–2008. New York: The College Board.
Utts, J., and Heckard, R. 2004. Mind on Statistics, 2nd ed. Belmont, CA: Duxbury/Thomson Learning.
Bock, D., Velleman, P., and De Veaux, R. 2004. Stats: Modeling the World. Boston: Pearson/Addison Wesley.
Watkins, A., Scheaffer, R., and Cobb, G. 2004. Statistics in Action: Understanding a World of Data. Emeryville, CA: Key Curriculum Press.
McClave, J., and Sincich, T. 2003. Statistics, 9th ed. Upper Saddle River, NJ: Prentice Hall.
Yates, D., Starnes, D., and Moore, D. 2005. Statistics through Applications. New York: W. H. Freeman and Company.
Moore, D. S. 2006. Introduction to the Practice of Statistics, 5th ed. New York: W. H. Freeman and Company.
Peck, R., Olsen, C., and Devore, J. 2005. Introduction to Statistics and Data Analysis, 2nd ed. Belmont, CA: Duxbury/Thomson Learning.
Yates, D., Moore, D., and Starnes, D. 2008. The Practice of Statistics, 3rd ed. New York: W. H. Freeman and
Company.

WEB SITES
Here is a list of web sites that contain information and links that you might find useful in your preparation for the AP Statistics exam:
AP Central: http://apcentral.collegeboard.com/
Bureau of the Census: http://www.census.gov/
Chance News: http://www.dartmouth.edu/~chance/chance_news/news.html
Data and Story Library (DASL): http://lib.stat.cmu.edu/DASL/
Exploring Data: http://exploringdata.cqu.edu.au/
Statistical Abstract of the U.S.: http://www.census.gov/statab/www/

GLOSSARY
Alternative hypothesis—the theory that the researcher hopes to confirm by rejecting the null hypothesis
Association—when some of the variability in one variable can be accounted for by the other
Bar graph—graph in which the frequencies of categories are displayed with bars; analogous to a histogram for numerical data
Bimodal—distribution with two (or more) most common values; see mode
Binomial distribution—probability distribution for a random variable X in a binomial setting; $P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$, where n is the number of independent trials, p is the probability of success on each trial, and x is the count of successes out of the n trials
Binomial setting (experiment)—when each of a fixed number, n, of observations either succeeds or fails, independently, with probability p
Bivariate data—having to do with two variables
Block—a grouping of experimental units thought to be related to the response to the treatment
Block design—procedure by which experimental units are put into homogeneous groups in an attempt to control for the effects of the group on the response
Blocking—see block design
Boxplot (box and whisker plot)—graphical representation of the 5-number summary of a dataset. Each value in the 5-number summary is located over its corresponding value on a number line. A box is drawn that ranges from Q1 to Q3, and “whiskers” extend from Q1 to the minimum value and from Q3 to the maximum value
Categorical data—see qualitative data
Census—attempt to contact every member of a population
Center—the “middle” of a distribution; either the mean or the median
Central limit theorem—theorem that states that the sampling distribution of a sample mean becomes approximately normal when the sample size is large
Chi-square (χ²) goodness-of-fit test—compares a set of observed categorical values to a set of expected values under a set of hypothesized proportions for the categories; $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
Coefficient of determination (r²)—measures the proportion of variation in the response variable explained by regression on the explanatory variable
Complement of an event—set of all outcomes in the sample space that are not in the event
Completely randomized design—when all subjects (or experimental units) are randomly assigned to treatments in an experiment
Conditional probability—the probability of one event succeeding given that some other event has already occurred
Confidence interval—an interval that, with a given level of confidence, is likely to contain a population value; (estimate) ± (margin of error)
Confidence level—the probability that the procedure used to construct an interval will generate an interval that does contain the population value
Confounding variable—has an effect on the outcomes of the study but whose effects cannot be separated from those of the treatment variable
Contingency table—see two-way table
Continuous data—data that can be measured, or take on values in an interval; the set of possible values cannot be counted
Continuous random variable—a random variable whose values are continuous data; takes all values in an interval
Control—see statistical control
Convenience sample—sample chosen without any random mechanism; chooses individuals based on ease of selection
Correlation coefficient (r)—measures the strength of the linear relationship between two quantitative variables; $r = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$
Correlation is not causation—just because two variables correlate strongly does not mean that one caused the other
Critical value—values in a distribution that identify certain specified areas of the distribution
Degrees of freedom—number of independent datapoints in a distribution
Density function—a function that is everywhere non-negative and has a total area equal to 1 underneath it and above the horizontal axis
Descriptive statistics—process of examining data analytically and graphically
Dimension—size of a two-way table; r × c
Discrete data—data that can be counted (possibly infinite) or placed in order
Discrete random variable—random variable whose values are discrete data
Dotplot—graph in which data values are identified as dots placed above their corresponding values on a number line
Double blind—experimental design in which neither the subjects nor the study administrators know what treatment a subject has received
Empirical Rule (68-95-99.7 Rule)—states that, in a normal distribution, about 68% of the terms are within one standard deviation of the mean, about 95% are within two standard deviations, and about 99.7% are within three standard deviations
Estimate—sample value used to approximate a value of a parameter
Event—in probability, a subset of a sample space; a set of one or more simple outcomes
Expected value—mean value of a discrete random variable
Experiment—study in which a researcher measures the responses to a treatment variable, or variables, imposed and controlled by the researcher
Experimental units—individuals on which experiments are conducted
Explanatory variable—explains changes in response variable; treatment variable; independent variable
Extrapolation—predictions about the value of a variable based on the value of another variable outside the
range of measured values
First quartile—25th percentile
Five (5)-number summary—for a dataset, [minimum value, Q1, median, Q3, maximum value]
Geometric setting—independent observations, each of which succeeds or fails with the same probability p; number of trials needed until first success is variable of interest
Histogram—graph in which the frequencies of numerical data are displayed with bars; analogous to a bar graph for categorical data
Homogeneity of proportions—chi-square hypothesis in which proportions of a categorical variable are tested for homogeneity across two or more populations
Independent events—knowing one event occurs does not change the probability that the other occurs; P(A) = P(A | B)
Independent variable—see explanatory variable
Inferential statistics—use of sample data to make inferences about populations
Influential observation—observation, usually in the x direction, whose removal would have a marked impact on the slope of the regression line
Interpolation—predictions about the value of a variable based on the value of another variable within the range of measured values
Interquartile range—value of the third quartile minus the value of the first quartile; contains middle 50% of the data
Least-squares regression line—of all possible lines, the line that minimizes the sum of squared errors (residuals) from the line
Line of best fit—see least-squares regression line
Lurking variable—one that has an effect on the outcomes of the study but whose influence was not part of the investigation
Margin of error—measure of uncertainty in the estimate of a parameter; (critical value) × (standard error)
Marginal totals—row and column totals in a two-way table
Matched pairs—experimental units paired by a researcher based on some common characteristic or characteristics
Matched pairs design—experimental design that utilizes each pair as a block; one unit receives one treatment, and the other unit receives the other treatment
Mean—sum of all the values in a dataset divided by the number of values
Median—halfway through an ordered dataset, below and above which there lie an equal number of data values; 50th percentile
Mode—most common value in a distribution
Mound-shaped (bell-shaped)—distribution in which data values tend to cluster about the center of the distribution; characteristic of a normal distribution
Mutually exclusive events—events that cannot occur simultaneously; if one occurs, the other doesn’t
Negatively associated—larger values of one variable are associated with smaller values of the other; see association
Nonresponse bias—occurs when subjects selected for a sample do not respond
Normal curve—familiar bell-shaped density curve; symmetric about its mean; defined in terms of its mean and standard deviation; $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
Normal distribution—distribution of a random variable X so that P(a < X < b) is the area under the normal curve between a and b
Null hypothesis—hypothesis being tested—usually a statement that there is no effect or difference between treatments; what a researcher wants to disprove to support his/her alternative
Numerical data—see quantitative data
Observational study—when variables of interest are observed and measured but no treatment is imposed in an attempt to influence the response
Observed values—counts of outcomes in an experiment or study; compared with expected values in a chi-square analysis
One-sided alternative—alternative hypothesis that varies from the null in only one direction
One-sided test—used when an alternative hypothesis states that the true value is less than or greater than the hypothesized value
Outcome—simple events in a probability experiment
Outlier—a data value that is far removed from the general pattern of the data
P(A and B)—probability that both A and B occur; P(A and B) = P(A) · P(B | A)
P(A or B)—probability that either A or B occurs; P(A or B) = P(A) + P(B) − P(A and B)
P value—probability of getting a sample value at least as extreme as the one obtained, by chance alone, assuming the null hypothesis is true
Parameter—measure that describes a population
Percentile rank—proportion of terms in the distribution less than the value being considered
Placebo—an inactive procedure or treatment
Placebo effect—effect, often positive, attributable to the patient’s expectation that the treatment will have an effect
Point estimate—value based on sample data that represents a likely value for a population parameter
Positively associated—larger values of one variable are associated with larger values of the other; see association
Power of the test—probability of rejecting the null hypothesis when a specific alternative is true
Probability distribution—identification of the outcomes of a random variable together with the probabilities associated with those outcomes
Probability histogram—histogram for a probability distribution; the horizontal axis shows the outcomes, and the vertical axis shows the probabilities of those outcomes
Probability of an event—relative frequency of the number of ways an event can succeed to the total number of ways it can succeed or fail
Probability sample—sampling technique that uses a random mechanism to select the members of the sample
Proportion—ratio of the count of a particular outcome to the total number of outcomes
Qualitative data—data whose values range over categories rather than numerical values
Quantitative data—data whose values are numerical
Quartiles—25th, 50th, and 75th percentiles of a dataset
Random phenomenon—unclear how any one trial will turn out, but there is a regular distribution of outcomes in a large number of trials
Random sample—sample in which each member of the sample is chosen by chance and each member of the population has an equal chance to be in the sample
Random variable—numerical outcome of a random phenomenon (random experiment)
Randomization—random assignment of experimental units to treatments
Range—difference between the maximum and minimum values of a dataset
Replication—repetition of each treatment enough times to help control for chance variation
Representative sample—sample that possesses the essential characteristics of the population from which it was taken
Residual—in a regression, the actual value minus the predicted value
Resistant statistic—one whose numerical value is not influenced by extreme values in the dataset
Response bias—bias that stems from respondents’ inaccurate or untruthful responses
Response variable—measures the outcome of a study
Robust—when a procedure may still be useful even if the conditions needed to justify it are not completely satisfied
Robust procedure—procedure that still works reasonably well even if the assumptions needed for it are violated; the t procedures are robust against the assumption of normality as long as there are no outliers or severe skewness
Sample space—set of all possible mutually exclusive outcomes of a probability experiment
Sample survey—using a sample from a population to obtain responses to questions from individuals
Sampling distribution of a statistic—distribution of all possible values of a statistic for samples of a given size
Sampling frame—list of experimental units from which the sample is selected
Scatterplot—graphical representation of a set of ordered pairs; horizontal axis is first element in the pair, vertical axis is the second
Shape—geometric description of a dataset: mound-shaped, symmetric, uniform, skewed, etc.
Significance level (α)—probability value that, when compared to the P value, determines whether a finding is statistically significant
Simple random sample (SRS)—sample in which all possible samples of the same size are equally likely to be the sample chosen
Simulation—random imitation of a probabilistic situation
Skewed—distribution that is asymmetrical
Skewed left (right)—asymmetrical with more of a tail on the left (right) than on the right (left)
Spread—variability of a distribution
Standard deviation—square root of the variance; $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}$
Standard error—estimate of the standard deviation of a statistic, based on sample data
Standard normal distribution—normal distribution with a mean of 0 and a standard deviation of 1
Standard normal probability—normal probability calculated from the standard normal distribution
Statistic—measure that describes a sample (e.g., sample mean)
Statistical control—holding constant variables in an experiment that might affect the response but are not one of the treatment variables
Statistically significant—a finding that is unlikely to have occurred by chance
Statistics—science of data
Stemplot (stem-and-leaf plot)—graph in which quantitative data are broken into “stems” and “leaves”; visually similar to a histogram except that all the data are retained
Stratified random sample—groups of interest (strata) chosen in such a way that they appear in approximately the same proportions in the sample as in the population
Subjects—human experimental units
Survey—obtaining responses to questions from individuals
Symmetric—data values distributed equally above and below the center of the distribution
Systematic bias—the mean of the sampling distribution of a statistic does not equal the mean of the population; see unbiased estimate
Systematic sample—probability sample in which one of the first n subjects is chosen at random for the sample and then each nth person after that is chosen for the sample
t distribution—the distribution with n − 1 degrees of freedom for the t statistic
t statistic—$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
Test statistic—$\dfrac{\text{estimator} - \text{hypothesized value}}{\text{standard error}}$
Third quartile—75th percentile
Treatment variable—see explanatory variable
Tree diagram—graphical technique for showing all possible outcomes in a probability experiment
Two-sided alternative—alternative hypothesis that can vary from the null in either direction; values much greater than or much less than the null provide evidence against the null
Two-sided test—a hypothesis test with
a two-sided alternative Two-way table—table that lists the outcomes of two categorical variables; the values of one category are given as the row variable, and the values of the other category are given as the column variable; also called a contingency table Type-I error—the error made when a true hypothesis is rejected Type-II error—the error made when a false hypothesis is not rejected Unbiased estimate—mean of the sampling distribution of the estimate equals the parameter being estimate Undercoverage—some groups in a population are not included in a sample from that population Uniform—distribution in which all data values have the same frequency of occurrence Univariate data—having to with a single variable Variance—average of the squared deviations from their mean of a set of observations; ∑ (x − x ) = s n −1 Voluntary response bias—bias inherent when people choose to respond to a survey or poll; bias is typically toward opinions of those who feel most strongly Voluntary response sample—sample in which participants are free to respond or not to a survey or a poll Wording bias—creation of response bias attributable to the phrasing of a question z score—number of standard deviations a term is above or below the mean; z= x−x s ... STEPS TO A AP Statistics Other books in McGraw-Hill’s Steps to a series include: AP Biology AP Calculus AB/BC AP Chemistry AP Computer Science AP English Language AP English Literature AP European... History AP Microeconomics/Macroeconomics AP Physics B and C AP Psychology AP Spanish Language AP U.S Government and Politics AP U.S History AP World History 11 Practice Tests for the AP Exams... About the AP Statistics Exam, Background Information, Some Frequently Asked Questions About the AP Statistics Exam, How to Plan Your Time, 10 Three Approaches to Preparing for the AP Statistics
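Quartiles, range, and outliers can be computed in a few lines. The sketch below assumes the "median of the halves" convention for quartiles (the overall median is left out of both halves when n is odd), which is common on graphing calculators; other software interpolates differently and may report slightly different quartiles. The 1.5 × IQR outlier rule is a widely used convention, not part of the glossary definition above, and the data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Needs at least 2 values. When n is odd, the overall median is
    excluded from both the lower and the upper half.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

data = [1, 3, 4, 7, 8, 10, 12, 30]          # hypothetical dataset
q1, med, q3 = quartiles(data)
iqr = q3 - q1                                # interquartile range
data_range = max(data) - min(data)           # maximum minus minimum

# Convention (assumption): flag values more than 1.5 * IQR beyond the quartiles.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, med, q3, data_range, outliers)     # 3.5 7.5 11.0 29 [30]
```

Note that the range (29) is driven almost entirely by the single outlier, while the quartiles barely notice it; this is what makes the median and IQR resistant statistics.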
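The normal curve formula and the definition of a normal distribution (P(a < X < b) is the area under the curve between a and b) can be checked against each other numerically. This sketch evaluates f(x) directly and approximates the area with a simple midpoint sum, then compares the result with the exact value from Python's standard-library `statistics.NormalDist`; the function names are illustrative only.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal curve f(x) for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_prob_between(a: float, b: float, mu: float = 0.0,
                        sigma: float = 1.0, steps: int = 10_000) -> float:
    """Approximate P(a < X < b) as the area under f(x) from a to b (midpoint rule)."""
    width = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * width, mu, sigma)
               for i in range(steps)) * width

# Area within one standard deviation of the mean of the standard normal distribution.
area = normal_prob_between(-1, 1)
exact = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(round(area, 4), round(exact, 4))   # 0.6827 0.6827
```

In practice you would use the exact CDF (or a table of standard normal probabilities); the numeric sum is only there to show that "probability equals area under the curve."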
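The rules for P(A and B) and P(A or B) can be verified by brute-force counting on a small sample space. In this sketch the sample space is the 36 equally likely outcomes of rolling two fair dice; the events A ("first die shows a 6") and B ("sum is at least 10") are hypothetical choices for illustration. P(B | A) is computed independently by restricting attention to outcomes in which A occurred.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event, space=sample_space):
    """Number of outcomes in the event divided by the total number of outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def event_a(o):
    return o[0] == 6            # A: the first die shows a 6

def event_b(o):
    return o[0] + o[1] >= 10    # B: the sum is at least 10

p_a, p_b = prob(event_a), prob(event_b)
p_a_and_b = prob(lambda o: event_a(o) and event_b(o))
p_a_or_b = prob(lambda o: event_a(o) or event_b(o))

# P(B | A): restrict the sample space to the outcomes in which A occurred.
p_b_given_a = prob(event_b, [o for o in sample_space if event_a(o)])

print(p_a_and_b, p_a * p_b_given_a)      # multiplication rule: 1/12 1/12
print(p_a_or_b, p_a + p_b - p_a_and_b)   # addition rule: 1/4 1/4
```

Subtracting P(A and B) in the addition rule matters here: the outcomes (6, 4), (6, 5), and (6, 6) belong to both events and would otherwise be counted twice.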
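A residual is defined relative to a fitted regression line. The sketch below assumes the usual least-squares line (slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept a = ȳ − b·x̄) and computes actual minus predicted for each point of a small hypothetical dataset.

```python
def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys):
    """Residual = actual y minus predicted y-hat, for each data point."""
    a, b = least_squares(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs, ys = [1, 2, 3, 4], [2, 4, 5, 4]     # hypothetical data
a, b = least_squares(xs, ys)            # a = 2.0, b = 0.7 (up to rounding)
res = residuals(xs, ys)                 # about [-0.7, 0.6, 0.9, -0.8]
```

For a least-squares line the residuals always sum to zero (up to floating-point rounding), which is a quick sanity check on hand calculations.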
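The differences between a simple random sample, a systematic sample, and a stratified random sample are easiest to see side by side. This is a minimal sketch using Python's `random` module; the population of 100 numbered units, the "juniors"/"seniors" strata, and the 10% sampling fraction are all hypothetical. Proportional allocation with rounding is one assumption among several possible ways to size the stratum samples.

```python
import random

def simple_random_sample(population, n, rng=random):
    """SRS: every subset of size n is equally likely (sampling without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k units, then take every kth unit after it."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, fraction, rng=random):
    """Take a separate SRS from each stratum, sized in proportion to the stratum (rounded)."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(fraction * len(members))))
    return sample

rng = random.Random(2008)                 # seeded so the run is reproducible
population = list(range(1, 101))          # hypothetical population of 100 numbered units
srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strata = {"juniors": list(range(1, 61)), "seniors": list(range(61, 101))}  # hypothetical strata
strat = stratified_sample(strata, 0.1, rng)
```

A systematic sample gives every unit the same chance (1 in 10 here) of being chosen, so it fits the glossary's definition of a random sample, but it is not an SRS: two adjacent units, such as 1 and 2, can never both be selected.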
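A simulation imitates a random phenomenon many times and uses the long-run relative frequency as an estimate. The situation below (probability of at least one 6 in four rolls of a fair die) is a hypothetical example chosen because its exact answer, 1 − (5/6)⁴ ≈ 0.518, is easy to get from the complement rule and compare with the simulated value.

```python
import random

def simulate_at_least_one_six(trials, rolls=4, seed=None):
    """Estimate P(at least one 6 in `rolls` rolls of a fair die) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.randint(1, 6) == 6 for _ in range(rolls)):
            hits += 1
    return hits / trials

exact = 1 - (5 / 6) ** 4          # complement rule: 1 - P(no sixes in four rolls)
estimate = simulate_at_least_one_six(100_000, seed=1)
print(exact, estimate)
```

With a large number of trials the estimate settles close to the exact value; rerunning with only a few hundred trials shows much more chance variation from run to run.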
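The variance, standard deviation, and z score formulas translate directly into code. This sketch divides by n − 1, as in the sample formulas above, and checks the results against Python's `statistics.variance` and `statistics.stdev`, which use the same divisor. The dataset is hypothetical.

```python
import math
import statistics

def sample_variance(xs):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

def z_score(x, xs):
    """Number of standard deviations x lies above (+) or below (-) the mean of xs."""
    return (x - sum(xs) / len(xs)) / sample_sd(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data; mean is 5
print(sample_variance(data))             # 32/7, about 4.571
print(round(z_score(9, data), 3))        # about 1.871
```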
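A stemplot keeps every data value while showing the shape of the distribution. This sketch assumes nonnegative integer data with the tens digit as the stem and the units digit as the leaf; it keeps empty stems so that gaps in the data stay visible. Real stemplots often split stems or use other place values, which this simple version does not attempt.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot lines for nonnegative integers: stem = tens, leaf = units digit."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    stems = range(min(leaves), max(leaves) + 1)   # include empty stems to show gaps
    return [f"{s} | {' '.join(str(leaf) for leaf in leaves[s])}".rstrip() for s in stems]

for line in stemplot([12, 15, 21, 23, 23, 38, 52]):   # hypothetical data
    print(line)
# 1 | 2 5
# 2 | 1 3 3
# 3 | 8
# 4 |
# 5 | 2
```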
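The ideas of a sampling distribution and an unbiased estimate can be made concrete by listing every possible sample from a tiny population. In this sketch the population {1, 2, 3, 4} and the sample size n = 2 are hypothetical, and sampling is assumed to be with replacement (independent draws), which is the setting in which s² with divisor n − 1 is exactly unbiased for the population variance. Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]                 # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)         # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance (divide by N)

def var_n_minus_1(sample):
    """Sample variance s^2 with divisor n - 1."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

def var_n(sample):
    """Alternative 'variance' with divisor n, for comparison."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)

n = 2
samples = list(product(population, repeat=n))   # all 16 ordered samples, with replacement
count = len(samples)
mean_of_xbar = sum(Fraction(sum(s), n) for s in samples) / count
mean_of_s2 = sum(var_n_minus_1(s) for s in samples) / count
mean_of_var_n = sum(var_n(s) for s in samples) / count

print(mean_of_xbar, mu)          # 5/2 5/2  (x-bar is unbiased for mu)
print(mean_of_s2, sigma2)        # 5/4 5/4  (s^2 is unbiased for sigma^2)
print(mean_of_var_n)             # 5/8      (divisor n underestimates sigma^2)
```

The divisor-n version has systematic bias: the mean of its sampling distribution misses the parameter, which is one reason the sample formulas in this glossary divide by n − 1.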
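The significance level α is the probability of a Type-I error when the null hypothesis is true. The simulation below illustrates this under stated assumptions: a normal population with known σ = 1, a true null hypothesis µ = 0, samples of size 25, and a two-sided z test. The function name and its parameters are hypothetical; `statistics.NormalDist().inv_cdf` supplies the critical value.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=25, trials=20_000, seed=0):
    """Simulate two-sided z tests when H0 (mu = 0) is true; return the rejection rate.

    Assumes a normal population with known sigma = 1, so z = x-bar / (1 / sqrt(n)).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(0, 1) for _ in range(n))
        z = xbar / (1 / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1        # rejecting a true null hypothesis: a Type-I error
    return rejections / trials

rate = type_i_error_rate()
print(rate)                        # should be close to alpha = 0.05
```

Lowering α reduces the Type-I error rate but, for a fixed sample size, makes Type-II errors more likely.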
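For a two-way table, the expected value for each cell in a chi-square analysis is (row total × column total) / grand total, and the chi-square statistic sums (observed − expected)² / expected over all cells. This sketch applies those formulas to a hypothetical 2 × 2 table of observed values; degrees of freedom are (rows − 1)(columns − 1).

```python
def expected_counts(table):
    """Expected count for each cell = (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[10, 20],
            [30, 40]]             # hypothetical counts for two categorical variables
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected_counts(observed))        # [[12.0, 18.0], [28.0, 42.0]]
print(chi_square_statistic(observed))   # 50/63, about 0.794, with df = 1
```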