So far, we have compared the means of two populations by using independent samples.
In this section and Section 10.6, we compare such means by using a paired sample. A paired sample may be appropriate when the members of the two populations have a natural pairing.
Each pair in a paired sample consists of a member of one population and that member’s corresponding member in the other population. With a simple random paired sample, each possible paired sample is equally likely to be the one selected.
Example 10.14 provides an unrealistically simple illustration of paired samples, but it will help you understand the concept.
EXAMPLE 10.14 Introducing Random Paired Samples
Husbands and Wives Let’s consider two small populations, one consisting of five married women and the other of their five husbands, as shown in the following figure. The arrows in the figure indicate the married couples, which constitute the pairs for these two populations.
[Figure: the wife population (Elizabeth, Carol, Maria, Gloria, Laura) and the husband population (Karim, Harold, Paul, Joshua, Songtao), with arrows joining each married couple.]
478 CHAPTER 10 Inferences for Two Population Means
Suppose that we take a paired sample of size 3 (i.e., a sample of three pairs) from these two populations.
a. List the possible paired samples.
b. If a paired sample is selected at random (simple random paired sample), find the chance of obtaining any particular paired sample.
Solution We designated a wife–husband pair by using the first letter of each name. For example, (E, K) represents the couple Elizabeth and Karim.
a. There are 10 possible paired samples of size 3, as displayed in Table 10.12.
TABLE 10.12 Possible paired samples of size 3 from the wife and husband populations
Paired sample
(E, K), (C, H), (M, P)
(E, K), (C, H), (G, J)
(E, K), (C, H), (L, S)
(E, K), (M, P), (G, J)
(E, K), (M, P), (L, S)
(E, K), (G, J), (L, S)
(C, H), (M, P), (G, J)
(C, H), (M, P), (L, S)
(C, H), (G, J), (L, S)
(M, P), (G, J), (L, S)
b. For a simple random paired sample of size 3, each of the 10 possible paired samples listed in Table 10.12 is equally likely to be the one selected. Therefore, the chance of obtaining any particular paired sample of size 3 is 1/10.
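The enumeration in this example can be checked with a short script. The following is a minimal sketch in Python, not part of the original example; the variable names are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import combinations

# Wife-husband pairs from Example 10.14, abbreviated by first letters.
pairs = [("E", "K"), ("C", "H"), ("M", "P"), ("G", "J"), ("L", "S")]

# A paired sample of size 3 is a choice of 3 of the 5 couples.
paired_samples = list(combinations(pairs, 3))

for sample in paired_samples:
    print(", ".join(f"({w}, {h})" for w, h in sample))

# With a simple random paired sample, every possible sample is equally likely.
chance = Fraction(1, len(paired_samples))
print("Number of paired samples:", len(paired_samples))
print("Chance of any particular sample:", chance)
```

Because the couples, not the individuals, are what is sampled, the count is the number of ways to choose 3 couples from 5.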
The previous example provides a concrete illustration of paired samples and emphasizes that, for simple random paired samples of any given size, each possible paired sample is equally likely to be the one selected. In practice, we neither obtain the number of possible paired samples nor explicitly compute the chance of selecting a particular paired sample. However, these concepts underlie the methods we do use.
Comparing Two Population Means, Using a Paired Sample
We are now ready to examine a process for comparing the means of two populations by using a paired sample.
EXAMPLE 10.15 Comparing Two Means, Using a Paired Sample
Ages of Married People The U.S. Census Bureau publishes information on the ages of married people in Current Population Reports. Suppose that we want to decide whether, in the United States, the mean age of married men differs from the mean age of married women.
a. Formulate the problem statistically by posing it as a hypothesis test.
b. Explain the basic idea for carrying out the hypothesis test.
c. Suppose that 10 married couples in the United States are selected at random and that the ages, in years, of the people chosen are as shown in the second and third columns of Table 10.13. Discuss the use of these data to make a decision concerning the hypothesis test.
TABLE 10.13 Ages, in years, of a random sample of 10 married couples
Couple   Husband   Wife   Difference, d
1        59        53       6
2        21        22      −1
3        33        36      −3
4        78        74       4
5        70        64       6
6        33        35      −2
7        68        67       1
8        32        28       4
9        54        41      13
10       52        44       8
Total                      36
Solution
a. To formulate the problem statistically, we first note that we have one variable, namely age, and two populations:
Population 1: All married men
Population 2: All married women.
Let μ1 and μ2 denote the means of the variable “age” for Population 1 and Population 2, respectively:
μ1 = mean age of all married men
μ2 = mean age of all married women.
We want to perform the hypothesis test
H0: μ1 = μ2 (mean ages of married men and women are the same)
Ha: μ1 ≠ μ2 (mean ages of married men and women differ).
b. Independent samples could be used to carry out the hypothesis test: Take independent simple random samples of, say, 10 married men and 10 married women and then apply a pooled or nonpooled t-test to the age data obtained. However, in this case, a paired sample is more appropriate. Here, a pair consists of a married couple. The variable we analyze is the difference between the ages of the husband and wife in a couple. By using a paired sample, we can remove an extraneous source of variation: the variation in the ages among married couples.
The sampling error thus made in estimating the difference between the population means will generally be smaller and, therefore, we are more likely to detect differences between the population means when such differences exist.
c. The last column of Table 10.13 contains the difference, d, between the ages of each of the 10 couples sampled. We refer to each difference as a paired difference because it is the difference of a pair of observations. For example, in the first couple, the husband is 59 years old and the wife is 53 years old, giving a paired difference of 6 years, meaning that the husband is 6 years older than his wife.
If the null hypothesis of equal mean ages is true, the paired differences of the ages for the married couples sampled should average about 0; that is, the sample mean, d̄, of the paired differences should be roughly 0. If d̄ differs too much from 0, we would take this as evidence that the null hypothesis is false.
From the last column of Table 10.13, we find that the sample mean of the paired differences is
$$\bar{d} = \frac{\sum d_i}{n} = \frac{36}{10} = 3.6.$$
The question now is, can this difference of 3.6 years be reasonably attributed to sampling error, or is the difference large enough to indicate that the two populations have different means? To answer that question, we need to know the distribution of the variable d̄, which we discuss next.
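The arithmetic in part (c) can be reproduced directly from the second and third columns of Table 10.13. This is a minimal sketch in Python; the list names are illustrative and do not come from the text.

```python
# Ages, in years, from Table 10.13 (couples 1 through 10).
husband = [59, 21, 33, 78, 70, 33, 68, 32, 54, 52]
wife = [53, 22, 36, 74, 64, 35, 67, 28, 41, 44]

# Paired difference for each couple: husband's age minus wife's age.
d = [h - w for h, w in zip(husband, wife)]

# Sample mean of the paired differences.
n = len(d)
d_bar = sum(d) / n

print("Paired differences:", d)
print("Sum of differences:", sum(d))
print("Sample mean of differences:", d_bar)
```

The differences are taken within each couple, so the order of the two lists must match the pairing in the table.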
The Paired t-Statistic
Suppose that x is a variable on each of two populations whose members can be paired.
For each pair, we let d denote the difference between the values of the variable x on the members of the pair. We call d the paired-difference variable.
It can be shown that the mean of the paired differences equals the difference between the two population means. In symbols,
μd = μ1 − μ2.
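One way to see why this relation holds, sketched here for finite populations: suppose the populations consist of N pairs, and write x_{1i} and x_{2i} for the values of the variable on the two members of the i-th pair. (N and the double subscripts are notation assumed for this sketch; they do not appear in the text.) Splitting the sum gives

```latex
\mu_d = \frac{1}{N}\sum_{i=1}^{N} \bigl(x_{1i} - x_{2i}\bigr)
      = \frac{1}{N}\sum_{i=1}^{N} x_{1i} \;-\; \frac{1}{N}\sum_{i=1}^{N} x_{2i}
      = \mu_1 - \mu_2 .
```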
Furthermore, if d is normally distributed, we can apply this equation and our knowledge of the studentized version of a sample mean (Key Fact 8.5 on page 344) to obtain Key Fact 10.6.
KEY FACT 10.6 Distribution of the Paired t-Statistic
Suppose that x is a variable on each of two populations whose members can be paired. Further suppose that the paired-difference variable d is normally distributed. Then, for paired samples of size n, the variable
$$t = \frac{\bar{d} - (\mu_1 - \mu_2)}{s_d/\sqrt{n}}$$
has the t-distribution with df = n − 1.
Note: We use the phrase normal differences as an abbreviation of “the paired-difference variable is normally distributed.”
Hypothesis Tests for the Means of Two Populations, Using a Paired Sample
We now present a hypothesis-testing procedure based on a paired sample for comparing the means of two populations when the paired-difference variable is normally distributed. In light of Key Fact 10.6, for a hypothesis test with null hypothesis H0: μ1 = μ2, we can use the variable
$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$
as the test statistic and obtain the critical value(s) or P-value from the t-table, Table IV.
We call this hypothesis-testing procedure the paired t-test. Note that the paired t-test is simply the one-mean t-test applied to the paired-difference variable with null hypothesis H0: μd = 0. Procedure 10.6 provides a step-by-step method for performing a paired t-test by using either the critical-value approach or the P-value approach.
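As an illustration of the test statistic, the following sketch computes it for the paired differences in Table 10.13 using only the Python standard library. It is a hedged example, not Procedure 10.6 itself: the variable names are illustrative, and the critical value 2.262 (two-tailed, 5% level, df = 9) is an assumption of this sketch, read from a standard t-table rather than computed.

```python
import math
import statistics

# Paired differences (husband minus wife) from Table 10.13.
d = [6, -1, -3, 4, 6, -2, 1, 4, 13, 8]

n = len(d)
d_bar = statistics.mean(d)   # sample mean of the paired differences
s_d = statistics.stdev(d)    # sample standard deviation (divides by n - 1)

# Paired t-statistic under H0: mu_1 = mu_2 (equivalently, mu_d = 0).
t = d_bar / (s_d / math.sqrt(n))
df = n - 1

# Assumed critical value for a two-tailed test at the 5% level with df = 9,
# taken from a standard t-table.
t_critical = 2.262

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.3f}, t = {t:.3f}, df = {df}")
print("Reject H0" if abs(t) >= t_critical else "Do not reject H0")
```

Note that the computation uses only the column of differences; once the differences are formed, the two original columns play no further role.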
Properties and guidelines for use of the paired t-test are the same as those given for the one-mean z-test in Key Fact 9.7 on page 379 when applied to paired differences. In particular, the paired t-test is robust to moderate violations of the normality assumption but, even for large samples, can sometimes be unduly affected by outliers because the sample mean and sample standard deviation are not resistant to outliers. Here are two other important points:
• Do not apply the paired t-test to independent samples, and, likewise, do not apply a pooled or nonpooled t-test to a paired sample.
• The normality assumption for a paired t-test refers to the distribution of the paired-difference variable, not to the two distributions of the variable under consideration.
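To see what can go wrong when the first point is ignored, the sketch below deliberately misapplies the nonpooled standard error, the square root of s1²/n1 + s2²/n2, to the paired data of Table 10.13 and compares the result with the paired calculation. This is an illustration only, with variable names chosen here; it is not a recommended analysis.

```python
import math
import statistics

# Ages, in years, from Table 10.13 (couples 1 through 10).
husband = [59, 21, 33, 78, 70, 33, 68, 32, 54, 52]
wife = [53, 22, 36, 74, 64, 35, 67, 28, 41, 44]
n = len(husband)

mean_diff = statistics.mean(husband) - statistics.mean(wife)

# Correct for these data: standard error based on the paired differences.
d = [h - w for h, w in zip(husband, wife)]
se_paired = statistics.stdev(d) / math.sqrt(n)

# Incorrect for these data: nonpooled standard error, which treats the
# two columns as if they were independent samples.
se_nonpooled = math.sqrt(
    statistics.variance(husband) / n + statistics.variance(wife) / n
)

t_paired = mean_diff / se_paired
t_nonpooled = mean_diff / se_nonpooled

print(f"Paired:    SE = {se_paired:.2f}, t = {t_paired:.2f}")
print(f"Nonpooled: SE = {se_nonpooled:.2f}, t = {t_nonpooled:.2f}")
```

The nonpooled standard error is several times larger here because it absorbs the couple-to-couple variation in ages, which the paired differences cancel out.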
EXAMPLE 10.16 The Paired t-Test
Ages of Married People We now return to the hypothesis test posed in Example 10.15. A random sample of 10 married couples gave the data on ages, in years, shown in the second and third columns of Table 10.13 on page 478. At the 5% significance level, do the data provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women?
FIGURE 10.15 Normal probability plot of the paired differences in Table 10.13 (horizontal axis: paired difference, in years; vertical axis: normal score)
Solution First, we check the two conditions required for using the paired t-test, as listed in Procedure 10.6.
• Assumption 1 is satisfied because we have a simple random paired sample. Each pair consists of a married couple.
• Because the sample size, n = 10, is small, we need to examine issues of normality and outliers. (See the first bulleted item in Key Fact 9.7 on page 379.) To do so, we construct in Fig. 10.15 a normal probability plot for the sample of paired