13 Inference about Comparing Two Populations

Chapter Outline
13-1 Inference about the Difference between Two Means: Independent Samples
13-2 Observational and Experimental Data
13-3 Inference about the Difference between Two Means: Matched Pairs Experiment
13-4 Inference about the Ratio of Two Variances
13-5 Inference about the Difference between Two Population Proportions
Appendix 13 Review of Chapters 12 and 13

General Social Survey
DATA GSS2014*
Comparing Democrats and Republicans: Who Is Likely to Have Completed a University Degree?

In the business of politics, it is important to be able to determine what differences exist between supporters and opponents. In 2014, the General Social Survey asked people, What is the highest degree earned (DEGREE)? The responses are:

Left high school
Completed high school
Completed junior college
Completed Bachelor's degree
Completed graduate degree

The survey also asked, Do you think of yourself as Democrat, Independent, or Republican (PARTYID3)?

Democrat
Independent
Republican

Do these data allow us to infer that people who identify themselves as Republican Party supporters are more likely to have completed a Bachelor's or graduate degree than their Democratic counterparts? On page 489, we will provide our answer.

Copyright 2018 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. WCN 02-200-203
Introduction

We can compare learning how to use statistical techniques to learning how to drive a car. We began by describing what you are going to do in this course (Chapter 1) and then presented the essential background material (Chapters 2–9). Learning the concepts of statistical inference and applying them the way we did in Chapters 10 and 11 is akin to driving a car in an empty parking lot. You're driving, but it's not a realistic experience. Learning Chapter 12 is like driving on a quiet side street with little traffic. The experience represents real driving, but many of the difficulties have been eliminated. In this chapter, you begin to drive for real, with many of the actual problems faced by licensed drivers, and the experience prepares you to tackle the next difficulty.

In this chapter, we present a variety of techniques used to compare two populations. In Sections 13-1 and 13-3, we deal with interval variables; the parameter of interest is the difference between two means. The difference between these two sections introduces yet another factor that determines the correct statistical method: the design of the experiment used to gather the data. In Section 13-1, the samples are independently drawn, whereas in Section 13-3, the samples are taken from a matched pairs experiment. In Section 13-2, we discuss the difference between observational and experimental data, a distinction that is critical to the way in which we interpret statistical results.

Section 13-4 presents the procedures employed to infer whether two population variances differ. The parameter is the ratio $\sigma_1^2/\sigma_2^2$. (When comparing two variances, we use the ratio rather than the difference because of the nature of the sampling distribution.)
Section 13-5 addresses the problem of comparing two populations of nominal data. The parameter to be tested and estimated is the difference between two proportions.

13-1 Inference about the Difference between Two Means: Independent Samples

In order to test and estimate the difference between two population means, the statistics practitioner draws random samples from each of two populations. In this section, we discuss independent samples. In Section 13-3, where we present the matched pairs experiment, the distinction between independent samples and matched pairs will be made clear. For now, we define independent samples as samples completely unrelated to one another.

Figure 13.1 depicts the sampling process. Observe that we draw a sample of size $n_1$ from population 1 and a sample of size $n_2$ from population 2. For each sample, we compute the sample means and sample variances.

Figure 13.1 Independent Samples from Two Populations
Population 1: parameters $\mu_1$ and $\sigma_1^2$; sample size $n_1$; statistics $\bar{x}_1$ and $s_1^2$
Population 2: parameters $\mu_2$ and $\sigma_2^2$; sample size $n_2$; statistics $\bar{x}_2$ and $s_2^2$

The best estimator of the difference between two population means, $\mu_1 - \mu_2$, is the difference between two sample means, $\bar{x}_1 - \bar{x}_2$. In Section 9-3 we presented the sampling distribution of $\bar{x}_1 - \bar{x}_2$.

Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

$\bar{x}_1 - \bar{x}_2$ is normally distributed if the populations are normal and approximately normal if the populations are nonnormal and the sample sizes are large.

The expected value of $\bar{x}_1 - \bar{x}_2$ is
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

The variance of $\bar{x}_1 - \bar{x}_2$ is
$$V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

The standard error of $\bar{x}_1 - \bar{x}_2$ is
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Thus,
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is a standard normal (or approximately normal) random variable. It follows that the test statistic is
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

The interval estimator is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

However, these formulas are rarely used because the population variances $\sigma_1^2$ and $\sigma_2^2$ are virtually always unknown. Consequently, it is necessary to estimate the standard error of the sampling distribution. The way to do this depends on whether the two unknown population variances are equal. When they are equal, the test statistic is defined in the following way.

Test Statistic for $\mu_1 - \mu_2$ When $\sigma_1^2 = \sigma_2^2$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad \nu = n_1 + n_2 - 2$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The quantity $s_p^2$ is called the pooled variance estimator. It is the weighted average of the two sample variances with the number of degrees of freedom in each sample used as weights. The requirement that the population variances be equal makes this calculation feasible because we need only one estimate of the common value of $\sigma_1^2$ and $\sigma_2^2$. It makes sense for us to use the pooled variance estimator because, in combining both samples, we produce a better estimate.

The test statistic is Student t distributed with $n_1 + n_2 - 2$ degrees of freedom, provided that the two populations are normal. The confidence interval estimator is derived by mathematics that by now has become routine.

Confidence Interval Estimator of $\mu_1 - \mu_2$ When $\sigma_1^2 = \sigma_2^2$
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \qquad \nu = n_1 + n_2 - 2$$

We will refer to these formulas as the equal-variances test statistic and confidence interval estimator, respectively.

When the population variances are unequal, we cannot use the pooled variance estimate. Instead, we estimate each population variance with its sample variance. Unfortunately, the sampling distribution of the resulting statistic
$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
is neither normally nor Student t distributed. However, it can be approximated by a Student t distribution with degrees of freedom equal to
$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
(It is usually necessary to round this number to the nearest integer.) The test statistic and confidence interval estimator are easily derived from the sampling distribution.

Test Statistic for $\mu_1 - \mu_2$ When $\sigma_1^2 \neq \sigma_2^2$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad \nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

Confidence Interval Estimator of $\mu_1 - \mu_2$ When $\sigma_1^2 \neq \sigma_2^2$
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad \nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

We will refer to these formulas as the unequal-variances test statistic and confidence interval estimator, respectively.

The question naturally arises, How do we know when the population variances are equal? The answer is that because $\sigma_1^2$ and $\sigma_2^2$ are unknown, we can't know for certain whether they're equal. However, we can perform a statistical test to determine whether there is evidence to infer that the population variances differ. We conduct the F-test of the ratio of two variances, which we briefly present here and save the details for Section 13-4.

Testing the Population Variances

The hypotheses to be tested are
$$H_0: \sigma_1^2/\sigma_2^2 = 1$$
$$H_1: \sigma_1^2/\sigma_2^2 \neq 1$$

The test statistic is the ratio of the sample variances $s_1^2/s_2^2$, which is F-distributed with degrees of freedom $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$. Recall that we introduced the F-distribution in Section 8-4. The required condition is the same as that for the t-test of $\mu_1 - \mu_2$, which is that both populations are normally distributed.

This is a two-tail test so that the rejection region is
$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < F_{1-\alpha/2,\nu_1,\nu_2}$$

Put simply, we will reject the null hypothesis that states that the population variances are equal when the ratio of the sample variances is large or if it is small. The table in Appendix B that lists the critical values of the F-distribution defines "large" and "small."

13-1a Decision Rule: Equal-Variances or Unequal-Variances t-Tests and Estimators

Recall that we can never have enough statistical evidence to conclude that the null hypothesis is true. This means that we can only determine whether there is enough evidence to infer that the population variances differ. Accordingly, we adopt the following rule: We will use the equal-variances test statistic and confidence interval estimator unless there is evidence (based on the F-test of the population variances) to indicate that the population variances are unequal, in which case we will apply the unequal-variances test statistic and confidence interval estimator.

Example 13.1* Direct and Broker-Purchased Mutual Funds
DATA Xm13-01

Millions of investors buy mutual funds (see page 161 for a description of mutual funds), choosing from thousands of possibilities. Some funds can be purchased directly from banks or other financial institutions whereas others must be purchased through brokers, who charge a fee for this service. This raises the question, Can investors do better by buying mutual funds directly than by purchasing mutual funds through brokers?
To help answer this question, a group of researchers randomly sampled the annual returns from mutual funds that can be acquired directly and mutual funds that are bought through brokers and recorded the net annual returns, which are the returns on investment after deducting all relevant fees. These are listed next.

Direct
9.33 6.94 16.17 16.97 5.94 12.61 3.33 16.13 11.2 1.14
4.68 3.09 7.26 2.05 13.07 0.59 13.57 0.35 2.69 18.45
4.23 10.28 7.1 −3.09 5.6 5.27 8.09 15.05 13.21 1.72
14.69 −2.97 10.37 −0.63 −0.15 0.27 4.59 6.38 −0.24 10.32
10.29 4.39 −2.06 7.66 10.83 14.48 4.8 13.12 −6.54 −1.06

Broker
3.24 −6.76 12.8 11.1 2.73 −0.13 18.22 −0.8 −5.75 2.59
3.71 13.15 11.05 −3.12 8.94 2.74 4.07 5.6 −0.85 −0.28
16.4 4.36 6.39 −11.07 9.24 −1.9 9.49 −2.67 6.7 8.97
0.19 1.87 12.39 −1.53 6.54 5.23 10.92 6.87 −2.15 −1.69
9.43 8.31 −3.99 −4.44 8.63 7.06 1.57 −8.44 −5.72 6.95

Can we conclude at the 5% significance level that directly purchased mutual funds outperform mutual funds bought through brokers?

*Source: D. Bergstresser, J. Chalmers, and P. Tufano, "Assessing the Costs and Benefits of Brokers in the Mutual Fund Industry."

Solution:

Identify

To answer the question, we need to compare the population of returns from direct and the returns from broker-bought mutual funds. The data are obviously interval (we've recorded real numbers). This problem objective–data type combination tells us that the parameter to be tested is the difference between two means, $\mu_1 - \mu_2$. The hypothesis to be tested is that the mean net annual return from directly purchased mutual funds ($\mu_1$) is larger than the mean of broker-purchased funds ($\mu_2$). Hence, the alternative hypothesis is
$$H_1: (\mu_1 - \mu_2) > 0$$
As usual, the null hypothesis automatically follows:
$$H_0: (\mu_1 - \mu_2) = 0$$

To decide which of the t-tests of $\mu_1 - \mu_2$ to apply, we conduct the F-test of $\sigma_1^2/\sigma_2^2$.
$$H_0: \sigma_1^2/\sigma_2^2 = 1$$
$$H_1: \sigma_1^2/\sigma_2^2 \neq 1$$

Compute

Manually:

From the data, we calculated the following statistics:
$$s_1^2 = 37.49 \quad \text{and} \quad s_2^2 = 43.34$$

Test statistic:
$$F = s_1^2/s_2^2 = 37.49/43.34 = 0.86$$

Rejection region:
$$F > F_{\alpha/2,\nu_1,\nu_2} = F_{.025,49,49} \approx F_{.025,50,50} = 1.75$$
or
$$F < F_{1-\alpha/2,\nu_1,\nu_2} = F_{.975,49,49} = 1/F_{.025,49,49} \approx 1/F_{.025,50,50} = 1/1.75 = .57$$

Because F = .86 is not greater than 1.75 or smaller than .57, we cannot reject the null hypothesis.

Excel Data Analysis

[Excel output: F-Test: Two-Sample for Variances. Rows: Mean, Variance, Observations, df, F; the values and the remaining rows are cut off in the source.]

Because there is not enough evidence to infer that the population variances differ, the decision rule of Section 13-1a calls for the equal-variances t-test of $\mu_1 - \mu_2$.

Compute

Manually:

From the data, we calculated the following statistics:
$$\bar{x}_1 = 6.63 \qquad \bar{x}_2 = 3.72 \qquad s_1^2 = 37.49 \qquad s_2^2 = 43.34$$

The pooled variance estimator is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(50 - 1)37.49 + (50 - 1)43.34}{50 + 50 - 2} = 40.42$$

The number of degrees of freedom of the test statistic is
$$\nu = n_1 + n_2 - 2 = 50 + 50 - 2 = 98$$

The rejection region is
$$t > t_{\alpha,\nu} = t_{.05,98} \approx t_{.05,100} = 1.660$$

We determine that the value of the test statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(6.63 - 3.72) - 0}{\sqrt{40.42\left(\dfrac{1}{50} + \dfrac{1}{50}\right)}} = 2.29$$

Excel Data Analysis

[Excel output: t-Test: Two-Sample Assuming Equal Variances. Rows: Mean, Variance, Observations, Pooled Variance, Hypothesized Mean Difference, df, t Stat; the values and the remaining rows are cut off in the source.]
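The manual calculations in Example 13.1 can be retraced with a short Python sketch. It is only an illustration under stated assumptions: it starts from the summary statistics and table critical values quoted in the example rather than from the raw data, the function and variable names are hypothetical and not part of the text, and the unequal-variances degrees of freedom are computed purely for comparison, since the example itself uses the equal-variances test.

```python
import math

# Summary statistics quoted in Example 13.1 (1 = direct, 2 = broker).
x1_bar, x2_bar = 6.63, 3.72
s1_sq, s2_sq = 37.49, 43.34
n1, n2 = 50, 50

# Critical values quoted in the example (read from tables, not computed here).
F_UPPER = 1.75          # approximates F_{.025,49,49}
F_LOWER = 1 / F_UPPER   # approximates F_{.975,49,49}
T_CRIT = 1.660          # approximates t_{.05,98}


def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of the sample variances, weights = degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def equal_variances_t(x1_bar, x2_bar, sp_sq, n1, n2, hypothesized_diff=0.0):
    """Equal-variances t statistic and its degrees of freedom."""
    std_error = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return (x1_bar - x2_bar - hypothesized_diff) / std_error, n1 + n2 - 2


def unequal_variances_df(s1_sq, n1, s2_sq, n2):
    """Approximate degrees of freedom for the unequal-variances statistic."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Step 1: two-tail F-test of the ratio of variances.
f_ratio = s1_sq / s2_sq
variances_differ = f_ratio > F_UPPER or f_ratio < F_LOWER

# Step 2: equal-variances t-test (one-tail, H1: mu1 - mu2 > 0).
sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
t_stat, df = equal_variances_t(x1_bar, x2_bar, sp_sq, n1, n2)
reject_h0 = t_stat > T_CRIT

# For comparison only: degrees of freedom under the unequal-variances formula.
nu_unequal = unequal_variances_df(s1_sq, n1, s2_sq, n2)

print(f"F = {f_ratio:.3f}, evidence that variances differ: {variances_differ}")
print(f"pooled variance = {sp_sq:.3f}, t = {t_stat:.2f}, df = {df}")
print(f"reject H0 at the 5% level: {reject_h0}")
print(f"unequal-variances df = {nu_unequal:.1f}")
```

With these inputs the F ratio comes out near 0.865 (reported as .86 in the example), the pooled variance is 40.415, and the t statistic is about 2.29 with 98 degrees of freedom, which exceeds 1.660. Because the two sample sizes are equal and the sample variances are close, the unequal-variances formula gives nearly the same degrees of freedom (about 97.5).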