Testing for a normal distribution

Part of the document: Ebook Higher engineering mathematics (5th edition) Part 2 (pages 250 - 254)

It should never be assumed that because data is continuous it automatically follows that it is normally distributed. One way of checking that data is normally distributed is by using normal probability paper, often just called probability paper. This is special graph paper which has linear markings on one axis and percentage probability values from 0.01 to 99.99 on the other axis (see Figs. 58.6 and 58.7).

The divisions on the probability axis are such that a straight line graph results for normally distributed data when percentage cumulative frequency values are plotted against upper class boundary values. If the points do not lie in a reasonably straight line, then the data is not normally distributed. The method used to test the normality of a distribution is shown in Problems 5 and 6. The mean value and standard deviation of normally distributed data may be determined using normal probability paper. For normally distributed data, the area beneath the standardized normal curve between the mean (z = 0) and a z-value of unity (i.e. one standard deviation) may be obtained from Table 58.1.

For one standard deviation, this area is 0.3413, i.e. 34.13%. An area of ±1 standard deviation is placed symmetrically on either side of the z = 0 value, i.e. on either side of the 50% cumulative frequency value. Thus an area corresponding to ±1 standard deviation extends from percentage cumulative frequency values of (50 + 34.13)% to (50 − 34.13)%, i.e. from 84.13% to 15.87%. For most purposes, these values are taken as 84% and 16%. Thus, when using normal probability paper, the standard deviation of the distribution is given by:

$$\sigma = \frac{(\text{variable value for 84\% cumulative frequency}) - (\text{variable value for 16\% cumulative frequency})}{2}$$
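The 84.13% and 15.87% figures, and the half-spread rule for the standard deviation, can be checked numerically. The following is a minimal Python sketch, not part of the original text: it assumes Python 3.8 or later (for `statistics.NormalDist`), the helper name `sigma_from_readings` is hypothetical, and the two readings passed to it are the values quoted for points Q and R in Problem 5.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()

# Area between z = 0 and z = 1 as a percentage (the 34.13% quoted from the table).
area_0_to_1 = (std_normal.cdf(1.0) - std_normal.cdf(0.0)) * 100

# Percentage cumulative frequencies one standard deviation either side of the mean.
upper_pct = 50 + area_0_to_1
lower_pct = 50 - area_0_to_1


def sigma_from_readings(x_at_84: float, x_at_16: float) -> float:
    """Hypothetical helper: half the spread between the 84% and 16% readings."""
    return (x_at_84 - x_at_16) / 2


print(f"area = {area_0_to_1:.2f}%, upper = {upper_pct:.2f}%, lower = {lower_pct:.2f}%")
print(f"sigma = {sigma_from_readings(35.7, 31.4):.2f}")
```

Rounding the two percentages to 84% and 16%, as the text does, shifts the readings only slightly, which is why the rounded values are adequate for graphical work.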

[Figure 58.6: normal probability paper, with percentage cumulative frequency (0.01 to 99.99) plotted against upper class boundary (30 to 42); points P, Q and R are marked on the line]

Problem 5. Use normal probability paper to determine whether the data given below, which refers to the masses of 50 copper ingots, is approximately normally distributed. If the data is normally distributed, determine the mean and standard deviation of the data from the graph drawn.

| Class mid-point value (kg) | Frequency |
|---|---|
| 29.5 | 2 |
| 30.5 | 4 |
| 31.5 | 6 |
| 32.5 | 8 |
| 33.5 | 9 |
| 34.5 | 8 |
| 35.5 | 6 |
| 36.5 | 4 |
| 37.5 | 2 |
| 38.5 | 1 |

To test the normality of a distribution, the upper class boundary/percentage cumulative frequency values are plotted on normal probability paper. The upper class boundary values are: 30, 31, 32, …, 38, 39.

[Figure 58.7: normal probability paper, with percentage cumulative frequency (0.01 to 99.99) plotted against upper class boundary (10 to 110); points A, B and C are marked on the line]

The corresponding cumulative frequency values (for 'less than' the upper class boundary values) are: 2, (4 + 2) = 6, (6 + 4 + 2) = 12, 20, 29, 37, 43, 47, 49 and 50. The corresponding percentage cumulative frequency values are (2/50) × 100 = 4, (6/50) × 100 = 12, 24, 40, 58, 74, 86, 94, 98 and 100%.

The co-ordinates of upper class boundary/percentage cumulative frequency values are plotted as shown in Fig. 58.6. When plotting these values, it will always be found that the co-ordinate for the 100% cumulative frequency value cannot be plotted, since the maximum value on the probability scale is 99.99. Since the points plotted in Fig. 58.6 lie very nearly in a straight line, the data is approximately normally distributed.

The mean value and standard deviation can be determined from Fig. 58.6. Since a normal curve is symmetrical, the mean value is the value of the variable corresponding to a 50% cumulative frequency value, shown as point P on the graph. This shows that the mean value is 33.6 kg. The standard deviation is determined using the 84% and 16% cumulative frequency values, shown as Q and R in Fig. 58.6. The variable values for Q and R are 35.7 and 31.4 respectively; thus two standard deviations correspond to 35.7 − 31.4, i.e. 4.3, showing that the standard deviation of the distribution is approximately 4.3/2, i.e. 2.15 kg.
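The graphical procedure has a numerical analogue, sketched below under stated assumptions rather than as the book's method. The probability scale on the paper is spaced so that a normal cumulative distribution plots as a straight line, so converting each percentage cumulative frequency to a standard normal z-value and fitting a straight line against the upper class boundaries plays the role of drawing the line by eye. The use of `statistics.NormalDist.inv_cdf` (Python 3.8 or later), the least-squares fit and all variable names are choices made for this sketch.

```python
from math import sqrt
from statistics import NormalDist

# Problem 5 data: upper class boundaries (kg) and class frequencies.
upper_boundaries = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
frequencies = [2, 4, 6, 8, 9, 8, 6, 4, 2, 1]

total = sum(frequencies)
running = 0
points = []  # (upper class boundary, z-value) pairs
for boundary, freq in zip(upper_boundaries, frequencies):
    running += freq
    fraction = running / total
    if fraction < 1.0:  # the 100% point cannot be placed on the probability scale
        points.append((boundary, NormalDist().inv_cdf(fraction)))

# Least-squares straight line z = slope * x + intercept through the points.
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_z = sum(z for _, z in points) / n
sxx = sum((x - mean_x) ** 2 for x, _ in points)
szz = sum((z - mean_z) ** 2 for _, z in points)
sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
slope = sxz / sxx
intercept = mean_z - slope * mean_x

r = sxz / sqrt(sxx * szz)           # close to 1 when the points are nearly collinear
mean_estimate = -intercept / slope  # x at z = 0, i.e. the 50% point
sigma_estimate = 1 / slope          # change in x per unit change in z

print(f"correlation of points with a straight line: {r:.4f}")
print(f"estimated mean = {mean_estimate:.2f} kg, sigma = {sigma_estimate:.2f} kg")
```

For this data the estimates should come out close to the graphical readings of 33.6 kg and 2.15 kg. A correlation coefficient near 1 corresponds to the points lying very nearly on a straight line; it is a convenience for this sketch, not a formal test of normality.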

The mean value and standard deviation of the distribution can be calculated using

$$\text{mean, } \bar{x} = \frac{\sum fx}{\sum f} \quad \text{and standard deviation, } \sigma = \sqrt{\frac{\sum\left[f(x - \bar{x})^2\right]}{\sum f}}$$

where f is the frequency of a class and x is the class mid-point value. Using these formulae gives a mean value of the distribution of 33.6 kg (as obtained graphically) and a standard deviation of 2.12 kg, showing that the graphical method of determining the mean and standard deviation gives quite realistic results.
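The calculation with these formulae can be reproduced in a few lines of Python. This is an illustrative sketch only: the function name `grouped_mean_and_sigma` is hypothetical, and the data is the Problem 5 table of class mid-point values and frequencies.

```python
from math import sqrt


def grouped_mean_and_sigma(mid_points, frequencies):
    """Mean and standard deviation of grouped data, using sum(f*x)/sum(f)
    and sqrt(sum(f*(x - mean)^2)/sum(f))."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(mid_points, frequencies)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(mid_points, frequencies)) / total
    return mean, sqrt(variance)


# Problem 5 data: class mid-point values (kg) and frequencies.
mid_points = [29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5]
frequencies = [2, 4, 6, 8, 9, 8, 6, 4, 2, 1]

mean, sigma = grouped_mean_and_sigma(mid_points, frequencies)
print(f"mean = {mean:.1f} kg, sigma = {sigma:.2f} kg")  # mean = 33.6 kg, sigma = 2.12 kg
```

The divisor is Σf, as in the formula above, so this is the population form of the standard deviation and not the sample form with Σf − 1.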

Problem 6. Use normal probability paper to determine whether the data given below is normally distributed. Use the graph and assume a normal distribution whether this is so or not, to find approximate values of the mean and standard deviation of the distribution.

| Class mid-point values | Frequency |
|---|---|
| 5 | 1 |
| 15 | 2 |
| 25 | 3 |
| 35 | 6 |
| 45 | 9 |
| 55 | 6 |
| 65 | 2 |
| 75 | 2 |
| 85 | 1 |
| 95 | 1 |

To test the normality of a distribution, the upper class boundary/percentage cumulative frequency values are plotted on normal probability paper. The upper class boundary values are: 10, 20, 30, …, 90 and 100.

The corresponding cumulative frequency values are 1, 1 + 2 = 3, 1 + 2 + 3 = 6, 12, 21, 27, 29, 31, 32 and 33. The percentage cumulative frequency values are (1/33) × 100 = 3, (3/33) × 100 = 9, 18, 36, 64, 82, 88, 94, 97 and 100.

The co-ordinates of upper class boundary values/percentage cumulative frequency values are plotted as shown in Fig. 58.7. Although six of the points lie approximately in a straight line, three points corresponding to upper class boundary values of 50, 60 and 70 are not close to the line and indicate that the data is not normally distributed.

However, if a normal distribution is assumed, the mean value corresponds to the variable value at a cumulative frequency of 50% and, from Fig. 58.7, point A is 48. The value of the standard deviation of the distribution can be obtained from the variable values corresponding to the 84% and 16% cumulative frequency values, shown as B and C in Fig. 58.7, which give 2σ = 69 − 28, i.e. the standard deviation σ = 20.5. The calculated values of the mean and standard deviation of the distribution are 45.9 and 19.4 respectively, showing that errors are introduced if the graphical method of determining these values is used for data which is not normally distributed.
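The size of the errors mentioned above can be quantified with a short Python sketch. It recomputes the mean and standard deviation from the Problem 6 table using the grouped-data formulae and compares them with the graphical readings quoted in the text (48 and 20.5). The variable names and the choice of percentage error as the measure are assumptions made for this sketch.

```python
from math import sqrt

# Problem 6 data: class mid-point values and frequencies.
mid_points = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
frequencies = [1, 2, 3, 6, 9, 6, 2, 2, 1, 1]

total = sum(frequencies)
mean = sum(f * x for x, f in zip(mid_points, frequencies)) / total
sigma = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(mid_points, frequencies)) / total)

# Graphical readings quoted in the text: point A, and (B - C) / 2.
graph_mean = 48
graph_sigma = (69 - 28) / 2

# Percentage error of each graphical reading relative to the calculated value.
mean_error_pct = (graph_mean - mean) / mean * 100
sigma_error_pct = (graph_sigma - sigma) / sigma * 100

print(f"calculated: mean = {mean:.1f}, sigma = {sigma:.1f}")
print(f"graphical error: mean {mean_error_pct:.1f}%, sigma {sigma_error_pct:.1f}%")
```

Both graphical readings overestimate the calculated values by roughly 5%, whereas in Problem 5 the graphical and calculated values agreed to within about 0.03 kg.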

Now try the following exercise.

Exercise 217 Further problems on testing for a normal distribution

1. A frequency distribution of 150 measurements is as shown:

| Class mid-point value | Frequency |
|---|---|
| 26.4 | 5 |
| 26.6 | 12 |
| 26.8 | 24 |
| 27.0 | 36 |
| 27.2 | 36 |
| 27.4 | 25 |
| 27.6 | 12 |

Use normal probability paper to show that this data approximates to a normal distribution and hence determine the approximate values of the mean and standard deviation of the distribution. Use the formulae for mean and standard deviation to verify the results obtained.

[Graphically, x̄ = 27.1, σ = 0.3; by calculation, x̄ = 27.079, σ = 0.3001]


2. A frequency distribution of the class mid-point values of the breaking loads for 275 similar fibres is as shown below:

| Load (kN) | Frequency |
|---|---|
| 17 | 9 |
| 19 | 23 |
| 21 | 55 |
| 23 | 78 |
| 25 | 64 |
| 27 | 28 |
| 29 | 14 |
| 31 | 4 |

Use normal probability paper to show that this distribution is approximately normally distributed and determine the mean and standard deviation of the distribution (a) from the graph and (b) by calculation.

[(a) x̄ = 23.5 kN, σ = 2.9 kN (b) x̄ = 23.364 kN, σ = 2.917 kN]

