Significance tests for population means

When carrying out tests or measurements, it is often possible to form a hypothesis as a result of these tests.

For example, the boiling point of water is found to be: 101.7°C, 99.8°C, 100.4°C, 100.3°C, 99.5°C and 98.9°C, as a result of six tests. The mean of these six results is 100.1°C. Based on these results, how confidently can it be predicted that, at this particular height above sea level and at this particular barometric pressure, water boils at 100.1°C? In other words, are the results based on sampling significantly different from the true result? There are a variety of ways of testing significance, but only one or two of these in common use are introduced in this section.
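As a quick check of the arithmetic in this example, the mean of the six readings can be computed directly. This is only an illustrative sketch; the variable names are chosen here and are not part of the text.

```python
from statistics import mean

# The six boiling-point readings quoted in the example, in degrees Celsius.
readings = [101.7, 99.8, 100.4, 100.3, 99.5, 98.9]

# Sample mean of the six tests.
sample_mean = mean(readings)
print(f"sample mean = {sample_mean:.1f} degrees C")
```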

Usually, in significance tests, some predictions about population parameters, based on sample data, are required. In significance tests for population means, a random sample is drawn from the population and the mean value of the sample, $\bar{x}$, is determined. The testing procedure depends on whether or not the standard deviation of the population is known.

(a) When the standard deviation of the population is known

A null hypothesis is made that there is no difference between the value of a sample mean $\bar{x}$ and that of the population mean, $\mu$, i.e. H0: $\bar{x} = \mu$. If many samples had been drawn from a population and a sampling distribution of means had been formed, then, provided N is large (usually taken as N ≥ 30), the means would form a normal distribution, having a mean value of $\mu_{\bar{x}}$ and a standard deviation, or standard error of the means, of $\sigma_{\bar{x}}$ (see Section 61.3).

598 STATISTICS AND PROBABILITY

The particular value of $\bar{x}$ of a large sample drawn for a significance test is therefore part of a normal distribution, and it is possible to determine by how much $\bar{x}$ is likely to differ from $\mu_{\bar{x}}$ in terms of the normal standard variate z. The relationship is

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}.$$

However, with reference to Chapter 61, page 578,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} \quad \text{for finite populations,}$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \quad \text{for infinite populations,}$$

and $\mu_{\bar{x}} = \mu$, where N is the sample size, $N_p$ is the size of the population, $\mu$ is the mean of the population and $\sigma$ is the standard deviation of the population.

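Substituting for $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ in the equation for z gives:

$$z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{N}}} \quad \text{for infinite populations,} \qquad (1)$$

$$z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{N}} \sqrt{\dfrac{N_p - N}{N_p - 1}}} \quad \text{for populations of size } N_p \qquad (2)$$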
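Equations (1) and (2) differ only in the finite-population factor, so they can be sketched as a single function. The function names and the illustrative numbers below are assumptions made for this sketch and do not come from the text.

```python
import math


def standard_error(sigma, n, population_size=None):
    """Standard error of the means for a sample of size n.

    When population_size is given, the finite-population factor
    sqrt((Np - N) / (Np - 1)) of equation (2) is applied.
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


def z_value(sample_mean, population_mean, sigma, n, population_size=None):
    """z-value of a sample mean, equation (1) or equation (2)."""
    return (sample_mean - population_mean) / standard_error(sigma, n, population_size)


# Hypothetical values, chosen only to illustrate the two cases.
z_inf = z_value(50.5, 50.0, 2.0, 64)        # infinite population, equation (1)
z_fin = z_value(50.5, 50.0, 2.0, 64, 400)   # population of 400, equation (2)
print(round(z_inf, 3), round(z_fin, 3))
```

The finite-population factor is less than 1, so for the same data it reduces the standard error and increases the magnitude of z.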

Table 62.1 on page 594 gives the relationship between z-values and levels of significance for both one-tailed and two-tailed tests. It can be seen from this table that for a level of significance of, say, 0.05 and a two-tailed test, the z-value is ±1.96, and z-values outside of this range are significant. Thus, for a given level of significance (i.e. a known value of z), the mean of the population, $\mu$, can be predicted by using equations (1) and (2) above, based on the mean of a sample $\bar{x}$. Alternatively, if the mean of the population is known, the significance of a particular value of z, based on sample data, can be established. If the z-value based on the mean of a random sample for a two-tailed test is found to be, say, 2.01, then at a level of significance of 0.05 the result is probably significant: the mean of the sampling distribution is said to differ significantly from what would be expected as a result of the null hypothesis (i.e. that $\bar{x} = \mu$), because 2.01 lies outside the range ±1.96 (see page 592).

The hypothesis would then be rejected and an alternative hypothesis formed, i.e. H1: $\bar{x} \neq \mu$. The rules of decision for such a test would be:

(i) reject the hypothesis at a 0.05 level of significance, i.e. if the z-value of the sample mean is outside of the range −1.96 to +1.96.

(ii) accept the hypothesis otherwise.
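The two decision rules can be sketched in code as follows. The text reads the critical value from Table 62.1; as an assumption made for this sketch, it is computed here from the standard normal distribution in Python's standard library, and the function names are illustrative only.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-tailed critical z-value for a level of significance alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def two_tailed_decision(z, alpha=0.05):
    """Apply rules (i) and (ii): reject H0 if z lies outside +/- the critical value."""
    return "reject H0" if abs(z) > critical_z(alpha) else "accept H0"


print(round(critical_z(0.05), 2))   # critical value at the 0.05 level
print(two_tailed_decision(2.01))    # the z-value used as an example in the text
print(two_tailed_decision(1.50))    # a hypothetical z-value inside the range
```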

For small sample sizes (usually taken as N<30), the sampling distribution is not normally distributed, but approximates to Student’s t-distributions (see Section 61.5). In this case, t-values rather than z-values are used and the equations analogous to equations (1) and (2) are:

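$$|t| = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{N}}} \quad \text{for infinite populations,} \qquad (3)$$

$$|t| = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{N}} \sqrt{\dfrac{N_p - N}{N_p - 1}}} \quad \text{for populations of size } N_p \qquad (4)$$

where |t| means the modulus of t, i.e. the positive value of t.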

(b) When the standard deviation of the population is not known

It is found, in practice, that if the standard deviation of a sample is determined, its value tends to be less than the value of the standard deviation of the population from which it is drawn. This is as expected, since the range of a sample is likely to be less than the range of the population. The difference between the two standard deviations becomes more pronounced when the sample size is small. Investigations have shown that the variance, $s^2$, of a sample of N items is approximately related to the variance, $\sigma^2$, of the population from which it is drawn by:

$$s^2 = \left(\frac{N - 1}{N}\right) \sigma^2$$

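The factor $\frac{N-1}{N}$ relates the two variances; its reciprocal, $\frac{N}{N-1}$, is the factor usually known as Bessel's correction. This relationship may be used to find the relationship between the standard deviation of a sample, s, and an estimate of the standard deviation of a population, $\hat{\sigma}$, and is:

$$\hat{\sigma}^2 = s^2 \left(\frac{N}{N - 1}\right), \quad \text{i.e.} \quad \hat{\sigma} = s \sqrt{\frac{N}{N - 1}}$$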
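The relationship between s and the population estimate can be checked numerically. In the sketch below, the data values are hypothetical; `pstdev` from Python's standard library divides by N (matching s as used here) and `stdev` divides by N − 1, so the corrected value should agree with `stdev`.

```python
import math
from statistics import pstdev, stdev

# Hypothetical sample, used only to illustrate the relationship.
data = [4.1, 3.9, 4.4, 4.0, 3.6]
n = len(data)

s = pstdev(data)                        # sample standard deviation, divisor N
sigma_hat = s * math.sqrt(n / (n - 1))  # estimate of the population value

print(round(s, 4), round(sigma_hat, 4), round(stdev(data), 4))
```

For a sample this small the correction is noticeable, which is why it cannot be ignored when N is small.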

Ch62-H8152.tex 23/6/2006 15: 15 Page 599

SIGNIFICANCE TESTING 599

For large samples, say, a minimum of N being 30, the factor $\sqrt{\frac{N}{N-1}}$ is $\sqrt{\frac{30}{29}}$, which is approximately equal to 1.017. Thus, for large samples s is very nearly equal to $\hat{\sigma}$ and the factor $\sqrt{\frac{N}{N-1}}$ can be omitted without introducing any appreciable error. In equations (1) and (2), s can be written for $\sigma$, giving:

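$$z = \frac{\bar{x} - \mu}{\dfrac{s}{\sqrt{N}}} \quad \text{for infinite populations,} \qquad (5)$$

$$\text{and} \quad z = \frac{\bar{x} - \mu}{\dfrac{s}{\sqrt{N}} \sqrt{\dfrac{N_p - N}{N_p - 1}}} \quad \text{for populations of size } N_p \qquad (6)$$

For small samples, the factor $\sqrt{\frac{N}{N-1}}$ cannot be disregarded and substituting $\sigma = s \sqrt{\frac{N}{N-1}}$ in equations (3) and (4) gives: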

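$$|t| = \frac{\bar{x} - \mu}{\dfrac{s \sqrt{\dfrac{N}{N-1}}}{\sqrt{N}}} = \frac{(\bar{x} - \mu) \sqrt{N - 1}}{s} \qquad (7)$$

for infinite populations, and

$$|t| = \frac{\bar{x} - \mu}{\dfrac{s \sqrt{\dfrac{N}{N-1}}}{\sqrt{N}} \sqrt{\dfrac{N_p - N}{N_p - 1}}} = \frac{(\bar{x} - \mu) \sqrt{N - 1}}{s \sqrt{\dfrac{N_p - N}{N_p - 1}}} \qquad (8)$$

for populations of size $N_p$.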
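Equations (7) and (8) can be sketched as one function, with the finite-population factor applied only when a population size is supplied. The function name and the numbers are assumptions made for illustration; s is taken to be the sample standard deviation computed with divisor N, as in this section.

```python
import math


def t_value(sample_mean, population_mean, s, n, population_size=None):
    """|t| from equation (7), or equation (8) when population_size is given."""
    t = abs(sample_mean - population_mean) * math.sqrt(n - 1) / s
    if population_size is not None:
        t /= math.sqrt((population_size - n) / (population_size - 1))
    return t


# Hypothetical values: sample of 10 with mean 10.4 and s = 0.6, tested against 10.0.
t_short = t_value(10.4, 10.0, 0.6, 10)

# The unsimplified left-hand form of equation (7) gives the same result.
t_long = abs(10.4 - 10.0) / (0.6 * math.sqrt(10 / 9) / math.sqrt(10))

print(round(t_short, 3), round(t_long, 3))
```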

The equations given in this section are parts of tests which are applied to determine population means. The way in which some of them are used is shown in the following worked problems.

Problem 4. Sugar is packed in bags by an automatic machine. The mean mass of the contents of a bag is 1.000 kg. Random samples of 36 bags are selected throughout the day and the mean mass of a particular sample is found to be 1.003 kg. If the manufacturer is willing to accept a standard deviation on all bags packed of 0.01 kg and a level of significance of 0.05, above which values the machine must be stopped and adjustments made, determine if, as a result of the sample under test, the machine should be adjusted.

Population mean $\mu$ = 1.000 kg, sample mean $\bar{x}$ = 1.003 kg, population standard deviation $\sigma$ = 0.01 kg and sample size, N = 36.

A null hypothesis for this problem is that the sample mean and the mean of the population are equal, i.e. H0: $\bar{x} = \mu$.

Since the manufacturer is interested in deviations on both sides of the mean, the alternative hypothesis is that the sample mean is not equal to the population mean, i.e. H1: $\bar{x} \neq \mu$.

The decision rules associated with these hypotheses are:

(i) reject H0 if the z-value of the sample mean is outside of the range of the z-values corresponding to a level of significance of 0.05 for a two-tailed test, i.e. stop machine and adjust, and

(ii) accept H0 otherwise, i.e. keep the machine running.

The sample size is over 30 so this is a ‘large sample’ problem and the population can be considered to be infinite. Because values of $\bar{x}$, $\mu$, $\sigma$ and N are all known, equation (1) can be used to determine the z-value of the sample mean,

$$\text{i.e.} \quad z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{N}}} = \frac{1.003 - 1.000}{\dfrac{0.01}{\sqrt{36}}} = \frac{0.003}{0.001667} = 1.8$$

The z-value corresponding to a level of significance of 0.05 for a two-tailed test is given in Table 62.1 on page 594 and is ±1.96. Since the z-value of the sample is within this range, the null hypothesis is accepted and the machine should not be adjusted.
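The working for Problem 4 can be reproduced in a few lines. This is a sketch only: the text takes the critical value from Table 62.1, whereas here, as an assumption, it is computed from the standard normal distribution, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data from Problem 4.
mu, x_bar, sigma, n = 1.000, 1.003, 0.01, 36

# Equation (1): z-value of the sample mean for an infinite population.
z = (x_bar - mu) / (sigma / math.sqrt(n))

# Two-tailed critical value at a level of significance of 0.05 (about 1.96).
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

# Decision rule (i): stop and adjust only if z lies outside the range.
adjust_machine = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, adjust machine: {adjust_machine}")
```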


Problem 5. The mean lifetime of a random sample of 50 similar torch bulbs drawn from a batch of 500 bulbs is 72 hours. The standard deviation of the lifetime of the sample is 10.4 hours. The batch is classed as inferior if the mean lifetime of the batch is less than the population mean of 75 hours. Determine whether, as a result of the sample data, the batch is considered to be inferior at a level of significance of (a) 0.05 and (b) 0.01.

Population size, $N_p$ = 500, population mean, $\mu$ = 75 hours, mean of sample, $\bar{x}$ = 72 hours, standard deviation of sample, s = 10.4 hours, size of sample, N = 50.

The null hypothesis is that the mean of the sample is equal to the mean of the population, i.e. H0: $\bar{x} = \mu$. The alternative hypothesis is that the mean of the sample is less than the mean of the population, i.e. H1: $\bar{x} < \mu$.

(The fact that $\bar{x}$ = 72 should not lead to the conclusion that the batch is necessarily inferior. At a level of significance of 0.05, the result is ‘probably significant’, but since this corresponds to a confidence level of 95%, there are still 5 times in every 100 when the result can be significantly different, that is, be outside of the range of z-values for this data. This particular sample result may be one of these 5 times.)

The decision rules associated with the hypotheses are:

(i) reject H0 if the z-value (or t-value) of the sample mean is less than the z-value (or t-value) corresponding to a level of significance of (a) 0.05 and (b) 0.01, i.e. the batch is inferior,

(ii) accept H0 otherwise, i.e. the batch is not inferior.

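The data given is N, $N_p$, $\bar{x}$, s and $\mu$. The alternative hypothesis indicates a one-tailed distribution and since N > 30 the ‘large sample’ theory applies.

From equation (6),

$$z = \frac{\bar{x} - \mu}{\dfrac{s}{\sqrt{N}} \sqrt{\dfrac{N_p - N}{N_p - 1}}} = \frac{72 - 75}{\dfrac{10.4}{\sqrt{50}} \sqrt{\dfrac{500 - 50}{500 - 1}}} = \frac{-3}{(1.471)(0.9496)} = -2.15$$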

(a) For a level of significance of 0.05 and a one-tailed test, all values to the left of the z-ordinate at −1.645 (see Table 62.1 on page 594) indicate that the results are significant, that is, they differ significantly from the null hypothesis. Since the z-value of the sample mean is −2.15, i.e. less than −1.645, the batch is considered to be inferior at a level of significance of 0.05.

(b) The z-value for a level of significance of 0.01 for a one-tailed test is −2.33 and in this case, z-values of sample means lying to the left of the z-ordinate at −2.33 are significant. Since the z-value of the sample lies to the right of this ordinate, it does not differ significantly from the null hypothesis and the batch is not considered to be inferior at a level of significance of 0.01.

(At first sight, it may appear contradictory for a mean value to be significant at a level of significance of 0.05 but not at 0.01. However, as stated earlier in the chapter, a result that is probably significant, i.e. at a level of significance of between 0.01 and 0.05, need only exceed a smaller critical z-value in magnitude than a result that is highly significant, i.e. at a level of significance of 0.01 or better. Here |z| = 2.15 exceeds 1.645 but not 2.33, hence the results of the problem are logical.)
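The calculation and both decisions in Problem 5 can be sketched as follows. As an assumption for this sketch, the one-tailed critical values are computed from the standard normal distribution rather than read from Table 62.1; they come out at about −1.645 and −2.33. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data from Problem 5.
n_p, mu, x_bar, s, n = 500, 75.0, 72.0, 10.4, 50

# Equation (6): large sample drawn from a finite population.
standard_error = (s / math.sqrt(n)) * math.sqrt((n_p - n) / (n_p - 1))
z = (x_bar - mu) / standard_error

# One-tailed (left-hand) test: the batch is inferior if z is below the critical value.
inferior = {alpha: z < NormalDist().inv_cdf(alpha) for alpha in (0.05, 0.01)}

print(f"z = {z:.2f}")
for alpha, verdict in inferior.items():
    print(f"level of significance {alpha}: inferior = {verdict}")
```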

Problem 6. An analysis of the mass of carbon in six similar specimens of cast iron, each of mass 425.0 g, yielded the following results:

17.1 g, 17.3 g, 16.8 g, 16.9 g, 17.8 g, and 17.4 g

Test the hypothesis that the percentage of carbon is 4.00% assuming an arbitrary level of significance of (a) 0.2 and (b) 0.1.

The sample mean,

$$\bar{x} = \frac{17.1 + 17.3 + 16.8 + 16.9 + 17.8 + 17.4}{6} = 17.22$$

The sample standard deviation,

$$s = \sqrt{\frac{(17.1 - 17.22)^2 + (17.3 - 17.22)^2 + (16.8 - 17.22)^2 + \cdots + (17.4 - 17.22)^2}{6}} = 0.334$$


The null hypothesis is that the sample and population means are equal, i.e. H0: $\bar{x} = \mu$.

The alternative hypothesis is that the sample and population means are not equal, i.e. H1: $\bar{x} \neq \mu$.

The decision rules are:

(i) reject H0 if the z- or t-value of the sample mean is outside of the range of the z- or t-value corresponding to a level of significance of (a) 0.2 and (b) 0.1, i.e. the percentage of carbon is not 4.00%,

(ii) accept H0 otherwise, i.e. the percentage of carbon is 4.00%.

The number of tests taken, N, is 6 and an infinite number of tests could have been taken, hence the population is considered to be infinite. Because N<30, a t-distribution is used.

If the mean mass of carbon in the bulk of the metal is 4.00%, the mean mass of carbon in a specimen is 4.00% of 425.0, i.e. 17.00 g, thus $\mu$ = 17.00.

From equation (7),

$$|t| = \frac{(\bar{x} - \mu) \sqrt{N - 1}}{s} = \frac{(17.22 - 17.00) \sqrt{6 - 1}}{0.334} = 1.473$$

In general, for any two-tailed distribution there is a critical region both to the left and to the right of the mean of the distribution. For a level of significance of 0.2, an area of 0.1 lies in the left-hand tail of the t-distribution and an area of 0.1 lies in the right-hand tail. Thus, for a level of significance of $\alpha$, a value $t_{(1 - \alpha/2)}$ is required for a two-tailed distribution when using Table 61.2 on page 587. This conversion is necessary because the t-distribution is given in terms of levels of confidence and for a one-tailed distribution. The required t-value for a value of $\alpha$ of 0.2 is $t_{(1 - 0.2/2)}$, i.e. $t_{0.90}$. The degrees of freedom $\nu$ are N − 1, that is 5. From Table 61.2 on page 587, the percentile value corresponding to ($t_{0.90}$, $\nu$ = 5) is 1.48, and for a two-tailed test, ±1.48. Since the t-value of the sample, 1.473, is within this range, the hypothesis is accepted at a level of significance of 0.2.

The t-value for $\alpha$ = 0.1 is $t_{(1 - 0.1/2)}$, i.e. $t_{0.95}$. The percentile value corresponding to ($t_{0.95}$, $\nu$ = 5) is 2.02, and since the t-value of the sample is within the range ±2.02, the hypothesis is also accepted at this level of significance. Thus, it is probable that the metal contains 4% carbon at levels of significance of 0.2 and 0.1.
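Problem 6 can be checked with the short sketch below. The critical values 1.48 and 2.02 are the Table 61.2 figures quoted in the text and are hard-coded here; the variable names are illustrative. Because the code does not round the mean and standard deviation before use, it gives |t| of about 1.45 rather than the 1.473 obtained from the rounded values 17.22 and 0.334; the conclusions are unchanged.

```python
import math
from statistics import mean, pstdev

# Data from Problem 6: mass of carbon (g) in six 425.0 g specimens.
masses = [17.1, 17.3, 16.8, 16.9, 17.8, 17.4]
n = len(masses)

mu = 0.04 * 425.0      # hypothesised mean mass of carbon, 17.00 g
x_bar = mean(masses)   # sample mean
s = pstdev(masses)     # sample standard deviation, divisor N

# Equation (7): small sample, infinite population.
t = abs(x_bar - mu) * math.sqrt(n - 1) / s

# Percentile values quoted in the text from Table 61.2 for 5 degrees of freedom.
t_090, t_095 = 1.48, 2.02
accept_at_0_2 = t <= t_090
accept_at_0_1 = t <= t_095

print(f"mean = {x_bar:.2f}, s = {s:.3f}, |t| = {t:.3f}")
print(f"accept at 0.2: {accept_at_0_2}, accept at 0.1: {accept_at_0_1}")
```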

Now try the following exercise.

Exercise 224 Further problems on significance tests for population means

1. A batch of cables produced by a manufacturer has a mean breaking strength of 2000 kN and a standard deviation of 100 kN. A sample of 50 cables is found to have a mean breaking strength of 2050 kN. Test the hypothesis that the breaking strength of the sample is greater than the breaking strength of the population from which it is drawn at a level of significance of 0.01.

[z (sample) = 3.54, $z_\alpha$ = 2.58, hence hypothesis is rejected, where $z_\alpha$ is the z-value corresponding to a level of significance of $\alpha$]

2. Nine estimations of the percentage of copper in a bronze alloy have a mean of 80.8% and standard deviation of 1.2%. Assuming that the percentage of copper in samples is normally distributed, test the null hypothesis that the true percentage of copper is 80% against an alternative hypothesis that it exceeds 80%, at a level of significance of 0.1.

[$t_{0.95,\,\nu=8}$ = 1.86, |t| = 1.88, hence null hypothesis rejected]

3. The internal diameter of a pipe has a mean value of 3.0000 cm with a standard deviation of 0.015 cm. A random sample of 30 measurements is taken and the mean of the sample is 3.0078 cm. Test the hypothesis that the mean diameter of the pipe is 3.0000 cm at a level of significance of 0.01.

[z (sample) = 2.85, $z_\alpha$ = ±2.58, hence hypothesis is rejected]

4. A fishing line has a mean breaking strength of 10.25 kN. Following a special treatment on the line, the following results are obtained for 20 specimens taken from the line.


Breaking strength (kN)    Frequency
 9.8                      1
10.0                      1
10.1                      4
10.2                      5
10.5                      3
10.7                      2
10.8                      2
10.9                      1
11.0                      1

Test the hypothesis that the special treatment has improved the breaking strength at a level of significance of 0.1.

[$\bar{x}$ = 10.38, s = 0.33, $t_{0.95,\,\nu=19}$ = 1.73, |t| = 1.72, hence hypothesis is accepted]

5. A machine produces ball bearings having a mean diameter of 0.50 cm. A sample of 10 ball bearings is drawn at random and the sample mean is 0.53 cm with a standard deviation of 0.03 cm. Test the hypothesis that the mean diameter is 0.50 cm at a level of significance of (a) 0.05 and (b) 0.01.

[|t| = 3.00, (a) $t_{0.975,\,\nu=9}$ = 2.26, hence hypothesis rejected, (b) $t_{0.995,\,\nu=9}$ = 3.25, hence hypothesis is accepted]

6. Six similar switches are tested to destruction at an overload of 20% of their normal maximum current rating. The mean number of operations before failure is 8200 with a standard deviation of 145. The manufacturer of the switches claims that they can be operated at least 8000 times at a 20% overload current.

Can the manufacturer’s claim be supported at a level of significance of (a) 0.1 and (b) 0.2?

[|t| = 3.08, (a) $t_{0.95,\,\nu=5}$ = 2.02, hence claim supported, (b) $t_{0.99,\,\nu=5}$ = 3.36, hence claim not supported]

