How do we know this text’s exercises are perfectly adapted for online learning with Enhanced WebAssign? The text author wrote them.

Enhanced WebAssign for Elementary Statistics: Looking at the Big Picture is an easy-to-use online teaching and learning system that provides assignable homework, automatic grading, and interactive assistance for students. With more than 1,000 exercises pulled directly from the text, written and customized by Nancy Pfenning to be ideal for the online environment, students get problem-solving practice that clarifies statistics, builds skills, and boosts conceptual understanding. And when you choose Enhanced WebAssign, students also get access to a Multimedia eBook, a complete interactive version of the text.

Students Get Interactive Practice

As students work problems, they can link directly to:
Watch It: Videos of worked exercises and examples from the text
Read It: Relevant eBook selections from the text

You Save Time on Homework Management, Including Automatic Grading

Enhanced WebAssign’s simple, user-friendly interface lets you quickly master the essential functions, and help is always available if you need it. Create a course in two easy steps, enroll students quickly (or let them enroll themselves), and select problems for an assignment in fewer than five minutes. Enhanced WebAssign automatically grades the assignments and sends results to your gradebook. It’s that easy!
Screenshots shown here are for illustrative purposes only. Find out more and see a sample assignment at www.webassign.net/brookscole

Elementary Statistics: Looking at the Big Picture
Nancy Pfenning
University of Pittsburgh

Australia • Brazil • Japan • Korea • Mexico • Singapore • Spain • United Kingdom • United States

Publisher: Richard Stratton
Senior Sponsoring Editor: Molly Taylor
Associate Editor: Daniel Seibert
Editorial Assistant: Shaylin Walsh
Senior Marketing Manager: Greta Kleinert
Marketing Coordinator: Erica O’Connell
Marketing Communications Manager: Mary Anne Payumo
Content Project Manager: Susan Miscio
Art Director: Linda Helcher
Senior Print Buyer: Diane Gibbons
Senior Rights Acquisition Account Manager, Text: Katie Huha
Production Service: S4Carlisle Publishing Services
Rights Acquisition Account Manager, Images: Don Schlotman

© 2011 Brooks/Cole, Cengage Learning

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.
For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be emailed to permissionrequest@cengage.com.

Library of Congress Control Number: 2009935400
ISBN-13: 978-0-495-01652-6
ISBN-10: 0-495-01652-7
Brooks/Cole
20 Channel Center Street
Boston, MA 02210
USA

Photo Researcher: Jennifer Lim
Interior and Cover Designer: KeDesign
Cover Image: © Veer Incorporated
Compositor: S4Carlisle Publishing Services

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at: international.cengage.com/region

Cengage Learning products are represented in Canada by Nelson Education, Ltd.

For your course and learning solutions, visit www.cengage.com

Purchase any of our products at your local college store or at our preferred online store www.ichapters.com

Printed in the United States of America
12 11 10 09

To Frank, Andreas & Mary, Marina, and Nils

Contents

Preface

1 Introduction: Variables and Processes in Statistics
  Types of Variables: Categorical or Quantitative
  Students Talk Stats: Identifying Types of Variables
  Handling Data for Two Types of Variables
  Roles of Variables: Explanatory or Response
  Statistics as a Four-Stage Process
  Summary / Exercises

PART I Data Production

2 Sampling: Which Individuals Are Studied
  Sources of Bias in Sampling: When Selected Individuals Are Not Representative
  Probability Sampling Plans: Relying on Randomness
  The Role of Sample Size: Bigger Is Better If the Sample Is Representative
  From Sample to Population: To What Extent Can We Generalize?
  Students Talk Stats: Seeking a Representative Sample
  Summary / Exercises

3 Design: How Individuals Are Studied
  3.1 Various Designs for Studying Variables
    Identifying Study Design
    Observational Studies versus Experiments: Who Controls the Variables?
    Errors in Studies’ Conclusions: The Imperfect Nature of Statistical Studies
  3.2 Sample Surveys: When Individuals Report Their Own Values
    Sources of Bias in Sample Surveys
  3.3 Observational Studies: When Nature Takes Its Course
    Confounding Variables and Causation
    Paired or Two-Sample Studies
    Prospective or Retrospective Studies: Forward or Backward in Time
  3.4 Experiments: When Researchers Take Control
    Randomized Controlled Experiments
    Double-Blind Experiments
    “Blind” Subjects
    “Blind” Experimenters
    Pitfalls in Experimentation
    Modifications to Randomization
    Students Talk Stats: Does Watching TV Cause ADHD? Considering Study Design
  Summary / Exercises

PART II Displaying and Summarizing Data

4 Displaying and Summarizing Data for a Single Variable
  4.1 Single Categorical Variable
    Summaries and Pie Charts
    The Role of Sample Size: Why Some Proportions Tell Us More Than Others Do
    Bar Graphs: Another Way to Visualize Categorical Data
    Mode and Majority: The Value That Dominates
    Revisiting Two Types of Bias
    Students Talk Stats: Biased Sample, Biased Assessment
  4.2 Single Quantitative Variables and the Shape of a Distribution
    Thinking about Quantitative Data
    Stemplots: A Detailed Picture of Number Values
    Histograms: A More General Picture of Number Values
  4.3 Center and Spread: What’s Typical for Quantitative Values, and How They Vary
    Five-Number Summary: Landmark Values for Center and Spread
    Boxplots: Depicting the Key Number Values
    Mean and Standard Deviation: Center and Spread in a Nutshell
  4.4 Normal Distributions: The Shape of Things to Come
    The 68-95-99.7 Rule for Samples: What’s “Normal” for a Data Set
    From a Histogram to a Smooth Curve
    Standardizing Values of Normal Variables: Storing Information in the Letter z
    Students Talk Stats: When the 68-95-99.7 Rule Does Not Apply
    “Unstandardizing” z-Scores: Back to Original Units
    The Normal Table: A
      Precursor to Software
  Summary / Exercises

5 Displaying and Summarizing Relationships
  5.1 Relationships between One Categorical and One Quantitative Variable
    Different Approaches for Different Study Designs
    Displays
    Summaries
    Notation
    Data from a Two-Sample Design
    Data from a Several-Sample Design
    Data from a Paired Design
    Students Talk Stats: Displaying and Summarizing Paired Data
    Generalizing from Samples to Populations: The Role of Spreads
    The Role of Sample Size: When Differences Have More Impact
  5.2 Relationships between Two Categorical Variables
    Summaries and Displays: Two-Way Tables, Conditional Percentages, and Bar Graphs
    The Role of Sample Size: Larger Samples Let Us Rule Out Chance

Solutions to Selected Exercises

6.38 (a) independent (b) non-overlapping
6.39 0.02 is P(A given W) and 1.00 is P(W given A)
6.47 (a) 0.44 × 0.38 + 0.56 × 0.43 = 0.41 (b) gender
6.49 (a) P(Preg) = 0.5, P(not Preg) = 0.5, P(Pos given Preg) = 0.75, P(Pos given not Preg) = 0.25, P(Pos) = P(Preg) × P(Pos given Preg) + P(not Preg) × P(Pos given not Preg) = 0.5(0.75) + 0.5(0.25) = 0.5, P(Preg given Pos) = P(Preg and Pos)/P(Pos) = 0.5(0.75)/0.5 = 0.75
  (b) P(Pos) = P(Preg) × P(Pos given Preg) + P(not Preg) × P(Pos given not Preg) = 0.8(0.75) + 0.2(0.25) = 0.65, P(Preg given Pos) = P(Preg and Pos)/P(Pos) = 0.8(0.75)/0.65 = 0.92
  (c) P(Pos) = P(Preg) × P(Pos given Preg) + P(not Preg) × P(Pos given not Preg) = 0.2(0.75) + 0.8(0.25) = 0.35, P(Preg given Pos) = P(Preg and Pos)/P(Pos) = 0.2(0.75)/0.35 = 0.43
  (d) As the probability of having the condition in question decreases, so does the probability of having the condition, given that one tests positive for the condition
6.53 (a)
  Expected     Liberal                Not Liberal
  Year 1970    (60 × 100)/200 = 30    (140 × 100)/200 = 70
  Year 2003    (60 × 100)/200 = 30    (140 × 100)/200 = 70
  (b) (c) 6,000
6.55 (a) 0.361 × 0.316 = 0.114 (b) higher

Chapter 7

7.1 (a) 0, 1, 2, 3, 4, 5, 6, 7
  (b)
  X            0     1     2     3     4     5     6     7
  Probability  0.20  0.05  0.05  0.05  0.05  0.50  0.05  0.05
7.3 (a) 1 − (0.02 + 0.04 + 0.34 + 0.24 + 0.28 + 0.07) = 0.01 (b) [Figure: probability histogram of self-rated ability] (c) left-skewed (d) 5,
7.4 (a) (b) because the distribution is skewed, not symmetric
7.5 (a)
  Outcome  X  Probability
  NNN      0  0.75 × 0.75 × 0.75 = 0.421875
  NND      1  0.75 × 0.75 × 0.25 = 0.140625
  NDN      1  0.75 × 0.25 × 0.75 = 0.140625
  DNN      1  0.25 × 0.75 × 0.75 = 0.140625
  NDD      2  0.75 × 0.25 × 0.25 = 0.046875
  DND      2  0.25 × 0.75 × 0.25 = 0.046875
  DDN      2  0.25 × 0.25 × 0.75 = 0.046875
  DDD      3  0.25 × 0.25 × 0.25 = 0.015625

  X  Probability
  0  0.421875
  1  3(0.140625) = 0.421875
  2  3(0.046875) = 0.140625
  3  0.015625
  (b) skewed right (c) 1;
7.7 (a) A mean could be calculated for the second question only, because it involves a quantitative variable (b) the principle of long-run observed outcomes
7.8 (a) 0.01 + 0.02 + 0.04 = 0.07 (b) Nonoverlapping “Or” Rule (c) The problem arises because of the way the values of X are assessed (self-ratings are biased higher than true ability) (d) because ratings of friends are not independent (they would probably tend to be similar) (e) 0.01(0.01) = 0.0001 (f) 0.34 + 0.34 − 0.34(0.34) = 0.56 (g) 0.07/(0.24 + 0.28 + 0.07) = 0.12
7.9 Weight 0 with 9/16, 1 with 6/16, and 2 with 1/16: 0(9/16) + 1(6/16) + 2(1/16) = 8/16 = 0.5
7.10 (a) histogram for bulls below left
  [Figures: probability histograms for Bulls (left) and Cows (right)]
  (b) The histogram for cows is shown above right; both histograms are skewed right, but the histogram for bulls has a second peak at 2 (c) bullocks, since 0.28 is greater than 0.06 (d) 0.66 + 0.04 + 0.28 = 0.98 (e) 0.78 + 0.11 + 0.06 = 0.95 (f) No; the Independent “And” Rule does not apply because the probability of owning no cows could be affected by whether or not bullocks are owned; B and C are likely to be dependent (g) $\mu_B$ = 0(0.66) + 1(0.04) + 2(0.28) + 3(0.02) = 0.66 (h) $\mu_C$ = 0(0.78) + 1(0.11) + 2(0.06) + 3(0.03) + 4(0.02) = 0.40 (i) (66(0) + 4(1) + 28(2) + 2(3))/100 = 0.66, which equals the mean (j) 0, because a family never owns 0.40
cow
7.12 (a) 0.5 is the average of the numbers 0 and 1 (b) $\sqrt{0.5^2(0.5) + 0.5^2(0.5)} = 0.5$ (c) 0.5 makes sense as the typical distance of the numbers 0 and 1 from their mean 0.5
7.13 (a) $\sigma_B = \sqrt{(0 - 0.66)^2(0.66) + (1 - 0.66)^2(0.04) + (2 - 0.66)^2(0.28) + (3 - 0.66)^2(0.02)} = 0.95$ (b) $\mu_{B+C} = 0.66 + 0.40 = 1.06$ (c) The formula can be used only if the random variables are independent, but as discussed above, B and C are likely to be dependent
7.15 (a) We cannot use the 68-95-99.7 Rule to find probabilities because the distribution is not normal
  (b) μ = 1(0.01) + 2(0.02) + 3(0.04) + 4(0.34) + 5(0.24) + 6(0.28) + 7(0.07) = 4.9
  (c) The typical distance of values from their mean couldn’t be 0.012 or 0.12 (the histogram would have too little spread) or 12.0 (the histogram would have too much spread). Alternatively, we consider that the values range from 1 to 7
  (d) 4.9 + 2(1.2) = 7.3
  (e) P(X > 7.3) = 0
  (f) The probability of being more than 2 standard deviations above the mean equals 0, not (1 − 0.95)/2 = 0.025 as it would be if the 68-95-99.7 Rule held. The distribution is skewed left, not normal, and not at all smooth because only seven values are possible
  (g) When X = 1, Y = (50/3)(1) − 50/3 = 0. When X = 7, Y = (50/3)(7) − 50/3 = 100
  (h) $\mu_{-50/3 + (50/3)X} = -50/3 + (50/3)(4.9) = 65$
  (i) $\sigma_{-50/3 + (50/3)X} = (50/3)(1.2) = 20$
7.18 mean (9/5)(30) + 32 = 86, standard deviation (9/5)(5) = 9
7.20 (a) 0.66 + 0.66 = 1.32 (b) No, because standard deviations are not additive
7.22 (a) mean 4.7 + 4.7 = 9.4, standard deviation $\sqrt{1.0^2 + 1.0^2} = 1.4$ (b) The mean of the total days worked by both of them in a week can be computed as in part (a) but not standard deviation because days worked by co-workers would not be independent (they could tend to be similar, or to offset one another)
7.24 (a) The number of children is fixed, so there is a fixed sample size n. There is the same probability of sickle-cell disease each time, p = 0.25. The births are independent of each other, and for each child there are just two possibilities: disease or not (b) n = and p = 0.25
7.25 (a) not
binomial because there are more than two possible values (b) not binomial because sample size n is not fixed (c) X is binomial (d) not binomial because sampling without replacement from a relatively small population results in dependence of selections
7.27 (a) when we sample from 100 people (b) 25 (c) 0.25
7.28 Depending on socio-economic level of students at a particular high school, there could be fewer or more single-child families. A private high school could tend to have students from smaller, wealthier families. If the sample is biased, then the distribution of sample proportion would not be centered at population proportion 0.05
7.29 (a) If X = 3 and n = 100, $\hat{p} = 3/100 = 0.03$
  (b) If $\hat{p} = 0.01$ and n = 200, X = 0.01(200) = 2
  (c) Mean is np = 50(0.02) = 1 and standard deviation is $\sqrt{np(1 - p)} = \sqrt{50(0.02)(0.98)} = 1.0$, approximately
  (d) On average, we expect to get about 1 married student in our sample, and the number who are married will tend to differ from this mean by about 1
  (e) Mean is p = 0.02 and standard deviation is $\sqrt{p(1 - p)/n} = \sqrt{0.02(0.98)/50} = 0.02$, approximately
  (f) Mean is np = 500(0.02) = 10 and standard deviation is $\sqrt{np(1 - p)} = \sqrt{500(0.02)(0.98)} = 3.1$, approximately
  (g) Mean is p = 0.02 and standard deviation is $\sqrt{p(1 - p)/n} = \sqrt{0.02(0.98)/500} = 0.01$, approximately
  (h) On average, we expect the proportion of married students in our sample to be about 0.02, and the proportion who are married will tend to differ from this by about 0.01
  (i) Sample proportions are closer to 0.02 for the larger samples (500)
7.30 The mean is 0.02 but we cannot report the standard deviation because there is too much dependence; since 100 is not more than 10 times 40, the rule of thumb for approximate independence is not satisfied
7.32 (a)
7.34 (a) Centers would both be 0.25 (b) There would be less spread for samples of 40 (c) Shape would be closer to normal for samples of 40
7.35 (a) more normal for the larger sample size, 500 (b) The distribution of X is approximately normal for samples
of 500 but not 50: 500(0.02) and 500(0.98) are both at least 10, but 50(0.02) is less than 10 (c) The distribution of $\hat{p}$ is also very right-skewed because it has the same shape as X
7.36 (a) mean 0.25 (b) standard deviation $\sqrt{0.25(1 - 0.25)/40} = 0.07$ (c) shape close to normal because 40(0.25) = 10 and 40(1 − 0.25) = 30 are both at least 10
7.38 (a) (2) Sample proportion of women in random samples of 100 is most normal (balanced population, large sample size) and (3) sample proportion of blacks in random samples of 10 is least normal (unbalanced population, small sample size) (b) (4) Smallest standard deviation is 0.03, for p = 0.10 and n = 100 (sample proportion of blacks in samples of 100); (1) largest standard deviation is 0.16, for p = 0.50 and n = 10 (sample proportion of women in samples of 10)
7.40 the same
7.42 X is discrete and it has an infinite number of possible values: 1, 3, 5, etc.
7.43 (a) continuous quantitative (b) categorical (c) discrete quantitative (d) continuous quantitative (e) continuous quantitative (f) continuous quantitative (g) discrete quantitative
7.45 (a) No, P(X ≥ 2) and P(X > 2) aren’t necessarily equal because X is discrete (b) Yes, $P(\bar{X} \geq 2)$ and $P(\bar{X} > 2)$ should be equal because $\bar{X}$, the mean of all values, is continuous (c) Yes, because X is continuous (d) Yes, because the random variable is continuous
7.47 (a) [Figure: probability histogram of X] (b) (c) The one on the bottom is more normal
7.48 (a) 0.997, because these values are 3 standard deviations on either side of the mean (b) (1 − 0.68)/2 = 0.16, since 127 is one standard deviation above the mean (c) 112 and 132 (within 2 standard deviations of the mean) (d) 112 (2 standard deviations below the mean)
7.49 [Figure: normal curve of circumference (inches), axis marked 17, 22, 27, 32, 37, 42, 47]
  (a) 0.025 (b) This is somewhat unusual (on the large side) (c) between 0.0015 and 0.025 because 20 is between 17 and 22 (d) this is very unusual (extremely small) (e) 12 inches below the mean, where there are 5 inches in a standard deviation: (−12)/5 = −2.4,
so it is 2.4 standard deviations below the mean (f) z = (20 − 32)/5 = −2.4 (g) x = 32 + 0.8(5) = 36
7.51 (a) z = (117 − 100)/15 = +1.13; unexceptional (b) z = (144 − 100)/15 = +2.93; exceptional (c) z = (132 − 100)/15 = +2.13; borderline (d) z = (129 − 100)/15 = +1.93; borderline
7.53 (a) less than 0.025 (b) more than 0.025 (c) more than 0.025 (d) less than 0.025 (e) more than 0.025 (f) more than 0.025
7.54 (a) (b) (c) (d)
7.57 (a) between −1 and −2 because 0.10 is between 0.16 and 0.025 (b) greater than +3 because 0.001 is less than 0.0015
7.59 (a) z = (106 − 122)/5 = −3.2; probability of this short less than 0.0015 (b) z = (125 − 122)/5 = +0.6; probability of this tall between 0.5 and 0.16 (c) z = (116 − 122)/5 = −1.2; probability of this short between 0.16 and 0.025
7.61 (a) Because 0.02 is between 0.025 and 0.0015, the z-score is between +2 and +3, so the height is between 122 + 2(5) = 132 and 122 + 3(5) = 137 centimeters (b) Because 0.20 is between 0.16 and 0.5, the z-score is between −1 and 0, so the height is between 122 − 1(5) = 117 and 122 − 0(5) = 122 centimeters
7.63 (a) z = −1.8, probability between 0.025 and 0.05 (b) z = +2.9, probability between 0.005 and 0 (c) z = +2.22, probability between 0.025 and 0.01 (d) z = −2.41, probability between 0.005 and 0.01
7.65 (a) z = (106 − 122)/5 = −3.2; very short (b) z = (125 − 122)/5 = +0.6; somewhat tall (c) z = (116 − 122)/5 = −1.2; somewhat short
7.67 (a) less than 0.01 (b) more than 0.01 (c) less than 0.01 (d) more than 0.01
7.69 (a) z = −2.576, x = 2.45 − 2.576(0.17) = 2.01 (b) z = +1.645, x = 2.45 + 1.645(0.17) = 2.73 (c) z = −1.960, x = 2.06 − 1.960(0.17) = 1.73 (d) z = +2.326, x = 2.06 + 2.326(0.17) = 2.46
7.72 (a) pie chart (b) p because it describes the entire population of assaults (c) not necessarily, because sample proportion $\hat{p}$ varies (d) closer to 0.20 because larger samples behave more like the population (e) 23 is a count (f) 0.15 is a proportion (g) observational study
7.77 (a) overestimates (b) underestimates (c) overestimates (d) not obvious
7.84 (a) $X_1 + X_2$ represents the thickness of bread for a sandwich constructed from two individual slices, and $2X_1$ represents the thickness of bread for a sandwich constructed by folding a single slice (b) There is more of a tendency for extremely thin or thick sandwiches when a sandwich is constructed by folding over a single slice (c) Mean of $2X_1$ is 2(0.5) = 1.0 and mean of $X_1 + X_2$ is 0.5 + 0.5 = 1.0; they are equal (d) Standard deviation of $2X_1$ is 2(0.01) = 0.02 and the standard deviation of $X_1 + X_2$ is $\sqrt{0.01^2 + 0.01^2} = 0.014$. Standard deviation of $2X_1$ is larger because there is more of a tendency for extremely thin or thick sandwiches when a sandwich is constructed by folding over a single slice
7.87 (a) categorical (under 23 or not)
  [Figure: normal curve for X, axis marked 2.4, 5.6, 8.8, 12, 15.2, 18.4, 21.6]
  (b)
  [Figure: normal curve for sample proportion, axis marked 0.03, 0.07, 0.11, 0.15, 0.19, 0.23, 0.27]
  (c) Yes, because most of the area under the curve is to the left of 20 (d)
  [Figure: normal curve for X, axis marked 11.4, 16.0, 20.6, 25.2, 29.8, 34.4, 39.0]
  (e) No, because there is almost no area under the curve to the left of 0.05
7.89 (a) mean 25(0.5) = 12.5, standard deviation $\sqrt{25(0.5)(0.5)} = 2.5$ (b) z = (11 − 12.5)/2.5 = −0.60 (c) a bit low (d) mean 0.5, standard deviation $\sqrt{0.5(0.5)/25} = 0.1$ (e) z = (0.44 − 0.50)/0.1 = −0.60 (f) 0.6
7.91 (a) between 0.84 and 0.975 (b) between 0.975 and 0.9985 (c) between 0.16 and 0.50 (d) between 0.975 and 0.9985 (e) between 0.84 and 0.975 (f) between 0 and 0.0015

Chapter 8

8.1 (a) population proportion p, a parameter (b) population mean μ, a parameter
8.2 (a) sample mean $\bar{x}$, a statistic (b) population proportion p, a parameter (c) 20
8.5 (a) both distributions centered at 0.66 (b) less spread for samples of 50 students (c) shape more normal for samples of 50 students
8.6 (a) cannot use normal approximation because sample is too small (b) sample proportion not centered at 0.66 because sample is biased; premeds may tend to have a different graduation rate (c) can use methods presented
8.7 (a) standard deviation $\sqrt{0.66(1 - 0.66)/90} = 0.05$, sample proportion 64/90 = 0.71, standardized sample proportion z = (0.71 − 0.66)/0.05 = +1, probability approximately 0.16
8.11 (a) z = (0.93 − 0.88)/0.02 = +2.5 (b) between 0.025 and 0.0015 (c) between 0.01 and 0.005 (d) very unlikely (e) (2) (f) not improbable
8.13 (a) mean 0.80, standard deviation $\sqrt{0.80(1 - 0.80)/64} = 0.05$ (b) The shape is approximately normal because 64(0.8) and 64(0.2) are both at least 10 (c) [Figure: normal curve of sample proportion believing in God for samples of size 64, axis marked 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95] (d) No, because 48/64 = 0.75 is just 1 standard deviation below the mean (e) 0.80 is a parameter (f) p (g) 0.75 is a statistic (h) $\hat{p}$
8.15 (a) population mean μ (b) 20
8.17 (a) population mean (b) population standard deviation and sample size (c) population shape and sample size
8.19 (a) Both distributions are centered at 10.5 (b) There should be less spread for the means of 16 selections (c) Shape is more normal for 16 than for
8.21 (a) to the right of 69 (b) at 69 (c) at 69 (d) to the left of 69
8.23 (a) $2.8/\sqrt{16} = 0.7$ (b) $2.8/\sqrt{49} = 0.4$
8.25 (a) extremely right-skewed (b) somewhat right-skewed (c) approximately normal
8.27 (a) more than 0.5 (b) less than 0.16 (c) more than 0.025
8.28 z = (50 − 42.6)/3.0 = +2.47, which is between +2.326 and +2.576, so the probability is between 0.01 and 0.005
8.32 (a) $z = (212 - 217)/(126/\sqrt{3{,}115}) = -2.21$ (b) between 0.025 and 0.0015 (c) between 0.01 and 0.025 (d) very unlikely (e) (2) (f) $z = (212 - 217)/(126/\sqrt{9}) = -0.12$, which is close to zero, so the sample mean is fairly close to 217
8.35 (a) two categorical variables (b) one each, quantitative and categorical (c) one quantitative variable (d) one categorical variable
8.36 (c)
8.40 The sample is too large relative to population size (should be no more than one-tenth population size, and it is half) and so samples without replacement have too much dependence
8.45 (a) approximately normal (b) left-skewed/low outliers (c) right-skewed/high outliers (d) right-skewed/high outliers
8.47 4, 3, 2, 1, respectively
8.49 40,000 because 40,000(1/4,000) = 10
8.51 (a) 0.40 (b) $\sqrt{0.40(1 - 0.40)/100} = 0.05$ (c) positive (d) 0.10 (e) $\sqrt{0.10(1 - 0.10)/100} = 0.03$ (f) negative (g) z = (0.25 − 0.40)/0.05 = −3,
unusually low; z = (0.25 − 0.10)/0.03 = +5, unusually high; z = (0.46 − 0.40)/0.05 = +1.2, not unusual; z = (0.16 − 0.10)/0.03 = +2, unusually high
8.53 (a) categorical; consider the numbers of bedrooms in new single-family houses completed in 2003 and summarize with mean (b) close to symmetric
  [Figure: probability histogram of sample proportion with a given number of bedrooms in samples of size 6]
  (c) 0.51 (d) $\sqrt{0.51(1 - 0.51)/6} = 0.20$ (e) symmetric but not normal because the sample size is relatively small: np = 6(0.51) = 3.06 is less than 10 (f) 0.51 (g) $\sqrt{0.51(1 - 0.51)/24} = 0.10$ (h) approximately normal because the sample size is relatively large: np = 24(0.51) = 12.24 and n(1 − p) = 24(0.49) = 11.76 are both at least 10
8.54 0.40 negative (less than 0.51), 0.53 positive (greater than 0.51), 0.90 positive (greater than 0.51)
8.55 0.90 most extreme (farthest from 0.51), 0.53 least extreme (closest to 0.51)
8.56 (a) $\hat{p} = 12/30 = 0.40$, z = (0.40 − 0.51)/0.09 = −1.22, not unusual (b) $\hat{p} = 16/30 = 0.53$, z = (0.53 − 0.51)/0.09 = +0.22, not unusual (c) $\hat{p} = 27/30 = 0.90$, z = (0.90 − 0.51)/0.09 = +4.33, unusually many
8.57 (a) 0.51 plus or minus 2(0.09): between 0.33 and 0.69 (b) No, because 0.70 is outside the range of sample proportions we would see 95% of the time (c) 0.32
8.64 (a) mean 0.44, standard deviation $\sqrt{0.44(1 - 0.44)/66} = 0.06$, shape approximately normal because 66(0.44) and 66(1 − 0.44) are both at least 10 (b) A sample proportion of 0.42 is not at all unusual because it is just a third of a standard deviation below the mean
8.65 (a) Proportion living on campus is 222/445 = 0.499, or 0.50 rounded to two decimal places
  [Figure: frequency histogram of sample proportion on campus]
  (b) mean 0.5, standard deviation $\sqrt{0.5(1 - 0.5)/10} = 0.16$. Shape is symmetric and somewhat normal because although the sample size is small, the underlying distribution is nicely balanced
  (c) mean 0.47, standard deviation 0.13
  [Figure: frequency histogram of sample proportion on campus in samples of size 10, axis marked 0.2 to 0.7]
  (d) The mean and standard deviation conform fairly well, and the shape is fairly normal
8.71 (a) population mean μ (b) 40 negative, 50 positive, 38 negative, 37 negative (c) 50 most extreme (farthest from 42.6) and 40 least extreme (closest to 42.6)
8.72 (a) 0, 1, 2, 3, 4, 5, 6, (b) (c) left-skewed (d) 200 (larger sample because it has less spread) (e) (1) extremely improbable (2) very improbable (3) not improbable (f) compare means
8.75 (a) mean 93, standard deviation $15/\sqrt{144} = 1.25$ (b) mean 93, standard deviation $15/\sqrt{36} = 2.5$ (c) sample mean MQ greater than 93 (d) z = (96.5 − 93)/1.25 = +2.8, which is high enough to suggest that these children are not representative in terms of MQ: theirs tend to be higher than usual (e) (85.7 − 93)/2.5 = −2.92, which is low enough to suggest that these children are not representative in terms of MQ: theirs tend to be lower than usual
8.76 (a) $15/\sqrt{25} = 3$ (b) 93 plus or minus 2(3): the interval is (87, 99) (c) 0.32
8.81 (a) Mean is 610, standard deviation is $72/\sqrt{40} = 11.4$, shape should be close to normal because the population is normal
  [Figure: frequency histogram of $\bar{X}$ = mean Math SAT score in samples of size 40, axis marked 585 to 630]
  (b) mean 610.5, standard deviation 11.5 (c) Means and standard deviations are very close; shape is bell-shaped but has a low outlier (d) If we continued to take repeated random samples, some sample means would be on the high side to balance out those that are on the low side

Chapter 9

9.4 (a) 0.53 (b) It doesn’t make sense to set up a 95% confidence interval for the proportion of all Topeka voters who opposed the repeal because the data actually already represent all voters from that election (c) p
9.5 (a) The survey from 1999 allowed for overlapping categories because they total to more than 100%. The survey from 2004 apparently did not because readers specified the main reason for choosing the last book they had read (b) 0.66 (c) The difference can be attributed to the questions: For 66% the cover was one of several reasons for choosing a book; for 7% the cover was the predominant reason (d) (1) This interval has a 95% probability of containing the proportion of all people for whom a book’s cover was the deciding factor in choosing the last book they read
9.6 Those who come forward may be more likely, or less likely, to be male than abuse victims in general
9.7 population proportion, 0.10
9.8 (a) 0.60 ± 2(0.035) = (0.53, 0.67) (b) They should check if their sample was representative of all recent U.S. college graduates
9.9 (a) $\sqrt{0.10(1 - 0.10)/708} = 0.01$ (b) 68, 640 (c) 0.10 ± 2(0.01) = (0.08, 0.12) (d) only values greater than 0.05 (e) Yes, because the range of plausible values is strictly higher than 0.05 (f) Yes, because the population size would be much more than 10 times 708
9.11 $0.20 \pm 2\sqrt{0.20(1 - 0.20)/504} = 0.20 \pm 0.04 = (0.16, 0.24)$ or $0.20 \pm 1.96\sqrt{0.20(1 - 0.20)/504} = 0.20 \pm 0.03 = (0.17, 0.23)$
9.14 Using actual binomial probabilities would be preferable; because there are only two nonsuccesses, a normal approximation would not be appropriate
9.17 (a) the first survey, because all the interval’s values are greater than 0.5 (b) narrower (c) $n = 1/0.04^2 = 625$
9.18 (a) A 90% confidence interval would be narrower because the multiplier is 1.645 instead of 2 (b) The point estimate for population proportion is halfway between 0.67 and 0.91: (0.67 + 0.91)/2 = 0.79 (c) The margin of error is the distance from 0.79 to 0.67 or to 0.91, which is 0.12 (d) The approximate standard deviation is half the margin of error, or 0.06 (e) (3) We are 95% sure that population proportion falls in this interval (f) Yes, because the population (several thousand) is at least 10 times the sample size (75)
9.19 (a) $1/\sqrt{16{,}000} = 0.008$ (b) Yes, because 0.52 − 0.008 is still greater than 0.5 (c) $1/\sqrt{8{,}000} = 0.011$ (d) $1/\sqrt{0.13(16{,}000)} = 0.022$ (e) 0.53 is within margin of error (0.022) of 0.52, and so it is a plausible value and the rate for the population of Hispanics may not have dropped at all (f) larger samples
9.21 (d)
9.29 Actual
binomial probabilities should be used because there were too few in one of the categories to justify use of a normal approximation 9.35 100(0.95) ϭ 95 Solutions to Selected Exercises 9.36 a One point of view is that population proportion equals 0.05 and the other is that it is higher than 0.05 b One would hope that researchers were careful to obtain a representative sample of children and to perform the strep test carefully c No: Certainly there would be more than 10 times 708 children in the city d No: The sample size is large enough because we would expect 0.05(708) ϭ 35 with and 0.95(708) ϭ 673 without; both numbers are greater than 10 e pN = 68>708 = 0.10, z ϭ (0.10 Ϫ 0.05)/ 0.01 ϭ ϩ5 f less than 0.0015 because ϩ5 is greater than ϩ3 g (1) If researchers suspected that resistance rates were going to be higher in a particular city, the data would be more convincing than if they had just gone “fishing” for unusual values 9.37 a greater-than b H0 : p ϭ 0.5 vs Ha : p Ͼ 0.5 9.38 a shaded area shown in first graph below 703 d shaded area shown in graph above e No, because neither of the P-values is small f Yes, because the null hypothesis that p ϭ 0.5 is not rejected 9.40 a 2,858/5,776 ϭ 0.495 b pN c Yes, because it is less than 0.5 d greater than 0.16 because Ϫ0.79 is to the right of Ϫ1 e z 9.41 a z ϭ because pN and p0 are both 0.5 b The entire half of the normal curve to the right of 0.5 is shaded; the P-value is 0.5 c The entire half of the z curve to the right of is shaded d The entire area under the curve is shaded e The entire area under the curve is shaded; the P-value is 1.0 f No, because the P-value is not small at all in either case 9.46 a z is more extreme for the House because the sample size is larger b The P-value for the House is smaller because z is more extreme c 0.040/2 ϭ 0.020 d one-sided only, because 0.02 is less than 0.025 but 0.04 is not e 0.13 9.47 a 0.11 (it is the smallest) b 0.20 (it is the largest) c 0.20 (it is the farthest from 0.15) d 
0.14 (it is the closest to 0.15) 9.49 We have very strong evidence that this county’s proportion of Caesarians does not conform to the national rate of 0.26 0.50 0.52 ∧ Sample proportion p 9.50 not very small 9.52 a Type I (rejecting null hypothesis that the draws are random, even though it is true) b Type II (failing to reject null hypothesis that the draws are random, even though it is false) c Type I, because they would invest time, money, and effort into improving the lottery when it’s actually fine 0.85 b shaded area shown in graph above area shown in graph below z c shaded 0.50 0.52 ∧ Sample proportion p 9.56 a 0.05 is usually the default cutoff level a b 0.01 should be used so that an athlete is not unfairly barred from participation in his sport c 0.10 should be used because no immediate negative consequences would result from a positive test 9.59 a 11.04 is the z statistic for the test making a comparison to all Americans aged 18 to 44, because 0.955 is further from 0.855 than it is from 0.942 b Yes, the P-value is approximately zero, ecause the z-statistic is so high 9.65 a The interval wouldn’t come close to containing 0.13 b H0 : p ϭ 0.06 and H0 : p ϭ 0.14 c p0 d z ϭ if pN = p0 e P-value would be 1, twice the probability of z being greater than the absolute value of 9.66 d The data provide evidence that Group A strep in the city is more resistant to macrolides than it is in the United States in general 0.85 z 9.67 b The data fail to prove that women are more likely to die in the week after Christmas than in the week before 704 Solutions to Selected Exercises 9.73 P-value is 0.2148 9.76 P-value is 0.0154 9.77 P-value is 0.1977 9.96 z ϭ ϩ5 goes “off the charts,” so tables would not be helpful; we could say the P-value is approximately zero 9.98 0.0197 10.18 a H0 : m ϭ 1.0 versus Ha : m Ͼ 1.0 - 1.0 b t = 1.4 1.35> 10 = +0.94 c The standardized sample mean is identified as t because it is calculated using s and not s (and the sample size is small) d 
right-tailed e not small at all f The data do not provide evidence that mean number of calves sired by all male beluga whales in captivity exceeds 1.0
10.20 The shape of the distribution of ages is close enough to normal but that of weights is not, because of right skewness and high outliers, and the fact that the sample size is rather small
Chapter 10
10.1 a 6.5 b No, because the sample would not be representative of all Iberian rock lizards
10.3 We know σ, so the standardized statistic is z
10.4 a μ and σ b 504 ± 2(110/√121) = (484, 524) c We already know the population mean to be 504 d Narrowest is (4) e Widest is (1)
10.22 a H0: μ = 44 versus Ha: μ < 44 b to justify calling the standardized statistic “t” (otherwise, because of the small sample size, the distribution would not be symmetric and bell-shaped) c less than 44 because t is negative d 0.002 is small e The average price of Merlot wines is less than the company’s wine prices in general f 2(0.002) = 0.004
10.6 2(20/√16) = 10
10.23 a B b B c B
10.8 a t procedure b z procedure c (11.08, 12.15) d sample mean 11.614, P-value 0.000 e Selections were apparently not truly random because the P-value is very small f Yes, because the P-value would have been half the size of 0.000 g No, because sample mean is actually greater than 10.5 h Because the sample mean was significantly greater than 10.5, people apparently perceive larger numbers to be more random i larger j smaller
10.26 a 0.05 b 19; c one (0.020) d It will not always be the case that exactly one P-value is small enough; sometimes none will be small enough, sometimes maybe two e The interval will not contain 610.44 f because about half of the scores are below the mean and the other half above
10.9 a $9.04 b because the sample size (n = 82) is fairly large c 9.04 ± 2(1.28/√82) = (8.76, 9.32) dollars d Yes: $9.00 is well-contained in the confidence interval e narrower f narrower g Ha: μ ≠ 9.00
10.11 a H0: μ = 22.6 versus Ha: μ > 22.6 b because 5 is the
population standard deviation c z = (25.3 - 22.6)/(5/√6) = +1.32 d not small because z is closer to 1 than to 2 e no f 2(1.32) = 2.64 would be significant
10.13 a The P-value is quite small (0.007), so we reject the null hypothesis and conclude that μ > 7.151 for the 20th century b Greater longevity in the latter century is a possible explanation for longer tenures c 2(0.007) = 0.014
10.27 a x̄ is approximately normal because the sample is large (192) b Standardized sample mean follows an approximate z distribution because for a large sample, s is close to σ c 57 ± 2(16/√192) = 57 ± 2.3 = (54.7, 59.3) d t e 40 is not a plausible value for population mean because it is below the interval, not inside f (57 - 40)/(16/√192) = 14.72 g 40 is not a plausible value for population mean because then standardized sample mean would be too large to be believable h Ha: μ ≠ 40 i Yes, because 40 is not contained in the confidence interval
10.30 a We have strong evidence that all male workers average more than 40 hours a week b No, because according to the test 40 is not a plausible value for population mean
10.14 because the standard deviation (1.362) is clearly more than
10.16 a 90.4 ± 1.86(9.7/√9) = 90.4 ± 6.0 = (84.4, 96.4) words per minute b 1.645 c narrower d narrower e wider
10.33 a [Figure: histogram of goat scores (control); vertical axis is frequency] b H0: μ = versus Ha: μ > c A formal test is not necessary to establish that the goats did not perform significantly better than chance because their sample mean (8.8) was worse than chance d No. We can say that we have no compelling evidence that goats in general perform better than chance with the given social cues e 8.8 ± 2(1.8/√23) = 8.8 ± 0.75 = (8.05, 9.55)
10.60 a [Figure: stemplot; values not recoverable] b (55.07, 71.09) (narrower) c t = (63 - 50)/(15/√12) = 3.00 d Yes, because the probability of t being at least as high as 3.00 is less than 0.01 e Ha: μ > 50 f paired
10.37 a negative b positive c positive d negative e negative f negative g positive
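The one-sample calculations in the solutions above follow two patterns: a standardized sample mean, such as t = (63 - 50)/(15/√12), and an approximate interval, such as 8.8 ± 2(1.8/√23). The sketch below is one way to check that arithmetic. The function names are hypothetical, and the default multiplier of 2 is the rough 95% multiplier these solutions use, not an exact z or t value.

```python
import math


def standardized_mean(sample_mean, null_mean, sd, n):
    """Standardized sample mean: (x-bar - mu0) / (sd / sqrt(n))."""
    return (sample_mean - null_mean) / (sd / math.sqrt(n))


def approx_interval(sample_mean, sd, n, multiplier=2.0):
    """Approximate interval: x-bar +/- multiplier * sd / sqrt(n)."""
    margin = multiplier * sd / math.sqrt(n)
    return (sample_mean - margin, sample_mean + margin)


# Standardized mean with sample mean 63, null value 50, s = 15, n = 12
t_stat = standardized_mean(63, 50, 15, 12)

# Interval with sample mean 8.8, s = 1.8, n = 23
low, high = approx_interval(8.8, 1.8, 23)

print(round(t_stat, 2))
print(round(low, 2), round(high, 2))
```

Whether the standardized value is read as z or t depends, as the solutions note, on whether the population standard deviation is known and on the sample size; the arithmetic is the same either way.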
10.71 a 0.0089 b (3)
10.73 a Age at first conception would also be right-skewed for human males because most would occur between the ages of 20 and 30, a few within a few years below 20, but some as much as 20 or 30 years more than 30 b
Variable   N  Mean   StDev  SE Mean  95.0% CI
BelugaAge  7  13.00  2.71   1.02     (10.49, 15.51)
c Ha: μ < 12
10.82 a (51.89, 65.01) b 134 minutes a day is not plausible for students’ population mean; they apparently watch much less c H0: μ = 134 versus Ha: μ ≠ 134
Chapter 11
11.1 a hypothesis test to determine that conditions of stress affect speech rates of stutterers in general
11.2 a two-sample b There is no relationship between age (adult or juvenile) and bucket-selection performance; or, population mean bucket-selection performance scores are the same for adults and juveniles. Symbolically, the latter can be written as H0: μ1 = μ2
11.4 a paired b by reporting the standard deviation of the differences between normal and stressed speech rates c histogram of differences d inference about μd based on x̄d
11.6 a paired b (2) c No, a histogram should be used to display the single sample of differences d μd = 0 e (16.69 - 0)/(14.81/√16) = 4.51 f yes g 16.69 ± 2.131(14.81/√16) = (8.79, 24.58) h Students might do better when they’re older, or they might do better second time around because of having thought about the questions during the interim period i A better design would be to compare improvements for students who ... group of similar students who did not participate j because it is roughly normal
11.9 In each case, we see if the t statistic exceeds a yes b yes c no d yes e no
11.13 a 25.3 - 17.57 = +7.73 b Those who didn’t study did better c +7.73/3.33 = +2.32 d No, because the P-value is not less than α = 0.05 e Yes, because the P-value is less than α = 0.05 f larger g Brighter students may exaggerate how little they need to study, by way of bragging, and struggling students may exaggerate how much time they study, by way of complaining. Students could simply be
asked to report hours studied in the previous week, or during any typical week, or they may be given journals for one or more weeks to record study hours. Either design has disadvantages: Retrospective is subject to students’ faulty memories, and prospective may influence students’ behaviors. Anonymity is important because students may over-report hours studied if they think their professors will have access to the information
11.14 a t statistic would be larger for the difference between ulna lengths because those sample means are farther apart b t statistic would be larger for the difference between ulna lengths because that standard deviation is smaller c 4.0 is for femurs and 7.3 is for ulnas d both, because both t statistics are unusually large e larger
11.18 (0.16 - 0.09) ± 2√(0.11²/22 + 0.06²/20) = 0.07 ± 0.05 = (0.02, 0.12)
11.19 a 3.2 - 2.6 = 0.6 b 0.6(60) = 36 minutes c large sample sizes d very strong evidence that population means are different e only positive numbers f two categorical variables
11.20 a Both, because the P-values, 0.021 and 0.024, are both fairly small b Yes, because 0.11 is not more than twice 0.06 c yes d 0.012
11.24 a No; the categorical explanatory variable is which company and the quantitative response variable is lead concentration. There is a column of responses for each categorical group b I = 3, N = + + = 14 c [Figure: lead concentration (mg per liter)]
11.36 a Physical complaint is categorical b GSI score is quantitative c μ1 = μ2 = μ3 = μ4 = μ5 = μ6 = μ7 d Not all population means are the same e numerator I - 1 = 7 - 1 = 6, denominator N - I = 76 - 7 = 69 f no g If a Type II Error is made, then no attempt is made to refer the adolescents for psychiatric screening on the basis of their physical complaint; perhaps such screening would have benefitted the adolescents
11.37 a Cancun cheapest and Los Cabos most expensive b no c Not all the population means are equal d no e No, they provide evidence that not all four
are the same f Los Cabos
11.38 The largest sample standard deviation (104.7) is more than twice the smallest (40.3)
11.44 a H0: μ1 = μ2 b H0: μ1 = μ2 = μ3 c H0: μd = 0
11.58 a paired b not large c no d The confidence interval contains zero, so we would not reject the null hypothesis claiming the population mean of differences to be zero e x̄d f width of interval is 38.8 - (-25.2) = 64; halfway between endpoints is -25.2 + 32 = 6.8 g Math
11.24 c [Figure: lead concentration for IBM, Memorex, Zenith] d Zenith e Memorex f Memorex g top-heavy h 21.36, 6.98, 21.91 i The sample standard deviations do not satisfy the rule because 21.91 is more than twice 6.98 j not small k no
11.26 a t² = 2.35² = 5.52 b It is very close to F = 5.53
11.65 a One group of children has had cancer, another independent group has not b μ1 < μ2 c because x1 x2
11.77 a quite small b yes c quite large d no
11.82 a Means and standard deviations are Rainbow 1.015 and 0.446, Sucker 2.810 (highest mean) and 1.176 (highest sd), Whitefish 0.6050 and 0.0636 b The sample standard deviations are too different
11.31 gender
11.85 a Gaze 9.00 and control 9.31. A test is not necessary because the sample actually performed worse when cued with a gaze b Mean of differences is 2.20, t = 2.58, P-value = 0.015 (one-sided). The juveniles did significantly better with gaze than no cue (control) c If the mean of differences in part (b) had been negative, there would have been no evidence of performing better when cued with a gaze d Testing H0: μ = versus Ha: μ > has t = 3.28, P-value = 0.005. The null hypothesis is rejected and we conclude that, in general, juvenile goats respond to gazing, which is consistent with our conclusion in part (b)
11.34 Individuals in certain age groups could be more or less likely to admit to having been involved in automobile accidents than those in other age groups
11.89 The two-sample confidence interval (-6.522, -3.350) contains only negative numbers, suggesting times are shorter for freestyle overall on average
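Two of the checks in the solutions above are plain arithmetic: recovering the width and midpoint of a confidence interval from its endpoints (as in 11.58 f), and squaring a t statistic to compare it with a reported F statistic (as in 11.26). A minimal Python sketch follows; `interval_width_and_midpoint` is a hypothetical helper name introduced here for illustration.

```python
def interval_width_and_midpoint(lower, upper):
    """Return (width, midpoint) of an interval given its two endpoints."""
    width = upper - lower
    midpoint = lower + width / 2
    return width, midpoint


# Endpoints taken from solution 11.58 f
width, midpoint = interval_width_and_midpoint(-25.2, 38.8)
print(round(width, 1), round(midpoint, 1))

# Squared t statistic from solution 11.26 a, to set beside the reported F = 5.53
t_squared = 2.35 ** 2
print(round(t_squared, 2))
```

The small gap between the squared t statistic and the reported F is consistent with t having been rounded to two decimals before squaring.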
11.27 a all three b all three c none
11.29 a elementary/middle schools b middle schools c because their sample size (6) was smallest d I = 4 and N = 49 + 6 + 17 + 12 = 84 e I - 1 = 4 - 1 = 3 are the group degrees of freedom and N - I = (49 + 6 + 17 + 12) - 4 = 84 - 4 = 80 are the error degrees of freedom f MSG = SSG/DFG = 9,220/3 = 3,073, MSE = SSE/DFE = 585,561/80 = 7,320 g F = 3,073/7,320 = 0.42 h no i larger j 1,250
11.91 a alternative b Mean of sampled differences is 14.75, standardized mean is t = 3.46, P-value is 0.001 c yes d (6.36, 23.13) e 10, 20 f because certain universities would attract students with varying relative strengths or weaknesses in math g two quantitative
11.96 a P-value is 0.002 so the difference is significant b those without pierced ears c gender d First separate males and females, then do two two-sample tests
11.97 (-6.671, -5.431) Yes, because inches is contained in the interval
12.17 a p̂1 and p̂2 b Yes, because chi-square would be 8.4161 c ANOVA
12.24 a men b (2) and (3)
12.28 a cell phones b gender and carrying cell phones c gender and carrying cell phones
12.30 a 0.182, 0.164 b X1 and X2 c a bit smaller than 0.05 d Yes, because the P-value is on the small side e smaller than 3.84 f Type II g Type I
12.31 a Taking Celebrex or not is explanatory and having heart problems or not is response b less than 0.05 c greater than 3.84
Chapter 12
12.2 a religion b 0.84 c 0.406
12.4 H0: p1 = p2 versus Ha: p1 > p2; z = ((0.83 - 0.72) - 0)/√(0.79(1 - 0.79)(1/282 + 1/162)) = 2.74; because z is greater than 2.576, the P-value is less than 0.005 and we conclude the difference is significant
... requests for medications and what, if anything, the doctor prescribes m because the expected counts would be too small (between and 2)
12.5 a large b small c yes
12.8 The interval comes very close to containing zero, and the P-value is also somewhat borderline, just under 0.05
12.10 a -2.55 b 6.492 c -4.27 d 18.226
12.12 a for those requesting Paxil b
Paxil request: 14/51 = 0.27; general request: 1/50 = 0.02; no request: 2/48 = 0.04 c The proportion seems very different for those requesting Paxil d 5.82 = 17(51)/149; 5.70 = 17(50)/149; 5.48 = 17(48)/149; 45.18 = 132(51)/149; 44.30 = 132(50)/149; 42.52 = 132(48)/149 e 5.82/51 = 5.70/50 = 5.48/48 = 0.114 f for those requesting Paxil g 11.503 = (14 - 5.82)²/5.82; 1.481 = (37 - 45.18)²/45.18; 3.880 = (1 - 5.70)²/5.70; 0.500 = (49 - 44.30)²/44.30; 2.207 = (2 - 5.48)²/5.48; 0.284 = (46 - 42.52)²/42.52 h “patients” requesting Paxil and receiving Paxil i 19.855 = 11.503 + 1.481 + 3.880 + 0.500 + 2.207 + 0.284 j There are r = 3 possibilities for the row variable and c = 2 possibilities for the column variable; the degrees of freedom are (r - 1)(c - 1) = 2(1) = 2 k large, because it is considerably greater than 6.0 l The size of chi-square suggests that, in general, there is a relationship between a patient’s
12.41 a When the three individual groups are compared the chi-square statistic is fairly large, the P-value is fairly small, and there is evidence at the 0.05 level of a relationship between treatment and nausea. In contrast, when the acupuncture and placebo groups are combined, the chi-square statistic is not large, the P-value is not small, and there is no evidence of a relationship b Report those on the left, because apparently acupuncture versus placebo makes a difference, and these groups should not be combined c Both, because all counts are at least
12.42 a (given the two counts of 19, there is enough information to fill in the rest of the table) b
Observed             One Kind  Two Kinds  Three Kinds  Total
Sequential choice    19        19         3            41
Simultaneous choice  10        21         25           56
Total                29        40         28           97
c more often d less often e yes f one that people tend to buy in bulk
12.44 a sample proportions 389/885 = 0.44, 248/2,226 = 0.11, z = 20.46, P-value = 0.000, conclude drug users are more likely to carry weapons than non-users b sample proportions 389/637 = 0.61, 496/2,474 = 0.20, z = 20.46, P-value =
0.000, conclude those who carry weapons are more likely to use drugs c chi-square = 418.737, P-value = 0.000, conclude taking drugs or not and carrying weapons or not are related d both
12.49 a chi-square = 10.161, P-value = 0.001; females were more likely to eat breakfast b chi-square = 7.428, P-value = 0.006, females were more likely to carry a cell phone c Those who did carry a cell phone were more likely to eat breakfast: 0.56 of those who carried a cell phone ate breakfast; 0.54 of those who didn’t carry a cell phone ate breakfast; the P-value is 0.741 so the relationship is not significant
13.1 for departures versus gates, on the left (there is less scatter and s will be smaller)
13.2 a because longer trips cost more money b No, it is doubtful that representatives would base the length of their trip on how much money is paid c -683 + (1,176)(1) = 493 dollars d $3,401 e s f 1,176 dollars g b1 h The intercept -683 should be interpreted theoretically as the y-value where the line crosses the y-axis; in actuality there would be no trips of length zero i b0 j b0 k all of these l moderately strong
13.6 (d)
13.8 a in the sample b 39 - 2 = 37 c both d 0.000 e (2) and (4)
13.9 Ha: β1 > 0 because we expect the relationship to be positive
13.10 weak evidence of a strong relationship
13.12 strong evidence of a weak relationship
13.13 a - = b (2) because r is of moderate size but the P-value is not small at all c d 0.23/2 = 0.115 e chi-square
13.15 a Because children grow taller as they get older, we expect a positive relationship: correlation r greater than zero. Age tells us quite a lot, but not everything, about how tall a child will be, so we expect r to be somewhere between 0.5 and 1.0. This should be true for the larger population of children, not just the sample b (2.0, 2.5) c b1 large d weak evidence of a strong relationship
13.18 a small b confidence interval c 55.3 ± 2(1.2) = (52.9, 57.7) d Yes, it came quite close e 55.3 ± 2(1.2)/√25 =
(54.82, 55.78) f Yes, it came quite close g wider h because depth tells us a lot about age; without information about depth, it is difficult to pin down the age
13.25 for departures versus gates (on the left)
13.29 A government report about all 50 states tells us everything about the population; there is no larger group to generalize to
Chapter 13
13.20 a (6) b (3) c (5) d (4) e (1) f (2)
13.21 a interval estimate of mean amount for all 5-day trips taken by the larger population of representatives b 2(1,176) = 2,352 dollars because 10 days is 2 times as long as 5 days c interval estimate of mean amount paid for all 5-day trips d (9,988, 12,166) e The interval is not very accurate because 10 is too far from the mean trip length, days
13.34 a Distribution of b1 should be centered at β1 = 2.25 b 250 children c s d 3(2.25) = 6.75 inches
13.36 (c) [small sample with non-normal responses]
13.41 a closer to b relatively large because predictions of sale price based on assessed price are not very accurate c Conclude there is no evidence of a relationship between sale price and assessed price for the larger population of properties
13.42 a Texas school districts (for each district, record rate of autism and how much mercury is released in the county where that district is located) b Explanatory variable (quantitative) is pounds of environmentally released mercury in 2001; response variable (also quantitative) is rate of autism c 0.17/1,000 d b1 e No, the interval would not contain zero f much less than 0.05 g Setting as urban, small metro, or rural could play a role in mercury pollution, and also in autism rates via a variety of socioeconomic factors
13.48 a 0.90 (closer to 1), 0.08 b ŷ = 29.2 + 7.12x (steeper slope), ŷ = -784 + 0.47x c 17.55 (less prediction error), 40.15 d (4.32, 9.92) (interval contains only positive slopes), (-3.96, +4.89) e 0.000 (small P-value), 0.815 f 5.85 (large t), 0.24 g (94.96, 179.84) (narrower interval), (40.3, 234.5)
13.50 a Velocity (feet per
second) plotted against Depth (feet) [figure] (1) decrease (2) linear (3) high b the equation of the regression line c 10 - 2 = 8 d t = -6.80, P-value is 0.000; there is strong evidence of a relationship in general between depth and velocity e displaying and summarizing probability statistical inference data production
13.54 a touch on point
[Regression plot: Touch = 6.90442 + 0.656637 Point; S = 2.54852, R-Sq = 34.4%]
[Regression plot: Touch = 14.4103 - 0.0370370 Gaze; S = 3.14569, R-Sq = 0.1%]
Chapter 14
14.5 one categorical variable; probability
14.6 one categorical variable; displaying and summarizing (Section 4.1)
14.7 categorical explanatory and quantitative response variables; statistical inference (Chapter 11)
14.8 two categorical variables; statistical inference (Chapter 12)
14.9 two quantitative variables; displaying and summarizing (Section 5.3)
14.12 two categorical variables; probability
14.14 two quantitative variables; data production
14.17 one quantitative variable; probability
14.18 two categorical variables; data production
14.19 one quantitative variable; data production
14.21 one categorical variable; statistical inference (Chapter 9)
14.27 two categorical variables; displaying and summarizing (Section 5.2)
14.29 one categorical variable; data production
14.36 two quantitative variables; statistical inference (Chapter 13)
14.37 categorical explanatory and quantitative response variables; displaying and summarizing (Section 5.1)
14.43 categorical explanatory and quantitative response variables; data production
14.44 categorical explanatory and quantitative response variables; probability
13.54 b s = 2.549 c (8.306, 19.949) d because the sample size is small e s = 3.146 f (6.892, 21.262) g 21.262 - 6.892 = 14.37 is wider than 4(3.146) = 12.584 h touch on point r = +0.59, touch on gaze r = 0.03; r is closer to 1 for the
smaller s because the points are more tightly clustered i Smaller P-value corresponds to r closer to 1 j point score
14.46 one quantitative variable; statistical inference (Chapter 10)
... sample statistics for the data at hand. The following diagram shows how summarizing data fits into the “big picture” of statistics.
Data Production: Take sample data from the population, with sampling...
summaries of variables and their relationships reflect the true nature of the variables and relationships in the sample only if the design for gathering the information is sound.
Bias Due to Study...
individuals. The science of statistics concerns itself with gathering data about a group of individuals, displaying and summarizing the data, and using the information provided by the data to draw