A Textbook of Computer Based Numerical and Statistical Techniques, part 55



$= \dfrac{50 \times 225}{100} = 112.5$

Since n is large, the test statistic is

$Z = \sqrt{2\chi^2} - \sqrt{2n - 1} \sim N(0, 1)$

Since Z > 3, it is significant at all levels of significance; hence H0 is rejected and we conclude that σ ≠ 10.
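As a quick numerical check of the large-sample statistic $Z = \sqrt{2\chi^2} - \sqrt{2n - 1}$, the following Python sketch evaluates it for $\chi^2 = 112.5$. The sample size n = 50 is an assumption read off the computation 50 × 225 / 100 above, and the variable names are illustrative only.

```python
import math

chi_sq = 112.5   # value computed above
n = 50           # assumed sample size (read off 50 x 225 / 100)

# Large-sample normal approximation: Z = sqrt(2 * chi^2) - sqrt(2n - 1)
z = math.sqrt(2 * chi_sq) - math.sqrt(2 * n - 1)

print(round(z, 2))   # 5.05
print(z > 3)         # True
```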

Example 2. It is believed that the precision (as measured by the variance) of an instrument is no more than 0.16. Write down the null and alternative hypotheses for testing this belief. Carry out the test at the 1% level, given 11 measurements of the same subject on the instrument:

2.5, 2.3, 2.4, 2.3, 2.5, 2.7, 2.5, 2.6, 2.6, 2.7, 2.5

[B.U (2006), Kanpur (2007)]

Sol. Null Hypothesis, H0: $\sigma^2 = 0.16$

Alternative Hypothesis, H1: $\sigma^2 > 0.16$

Computation of Sample Variance

$\bar{X} = \dfrac{27.6}{11} = 2.51$

$\sum (X_i - \bar{X})^2 = 0.1891$

Under the null hypothesis H0: $\sigma^2 = 0.16$, the test statistic is:

$\chi^2 = \dfrac{nS^2}{\sigma^2} = \dfrac{\sum (X_i - \bar{X})^2}{\sigma^2} = \dfrac{0.1891}{0.16} = 1.182$

which follows the $\chi^2$-distribution with (11 – 1) = 10 d.f.


Since the calculated value of $\chi^2$ is less than the tabulated value 23.2 of $\chi^2$ for 10 d.f. at the 1% level of significance, it is not significant. Hence H0 may be accepted, and we conclude that the data are consistent with the hypothesis that the precision of the instrument is 0.16.
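The arithmetic of Example 2 can be reproduced with a short Python sketch. It uses only the 11 measurements, the hypothesised variance 0.16 and the tabulated value 23.2 quoted in the text; the variable names are illustrative assumptions.

```python
measurements = [2.5, 2.3, 2.4, 2.3, 2.5, 2.7, 2.5, 2.6, 2.6, 2.7, 2.5]
sigma0_sq = 0.16        # variance under H0
critical_value = 23.2   # tabulated chi-square, 10 d.f., 1% level (from the text)

n = len(measurements)
mean = sum(measurements) / n
sum_sq_dev = sum((x - mean) ** 2 for x in measurements)

# Test statistic: n*S^2 / sigma^2 = sum of squared deviations / sigma^2
chi_sq = sum_sq_dev / sigma0_sq
df = n - 1

print(round(mean, 2), round(sum_sq_dev, 4), round(chi_sq, 3), df)
print("reject H0" if chi_sq > critical_value else "do not reject H0")
```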

(ii) Chi-Square Test of Goodness of Fit: The $\chi^2$ test is an approximate test for large values of n. It enables us to ascertain how well a theoretical distribution fits an empirical distribution, i.e., a distribution obtained from sample data. If the calculated value of $\chi^2$ is less than the table value at a specified level of significance, the fit is considered to be good. Generally we take the 5% level of significance. Similarly, if the calculated value of $\chi^2$ is greater than the table value, the fit is considered to be poor.

Example 3. The following table shows the distribution of digits in numbers chosen at random from a telephone directory:

Test whether the digits may be taken to occur equally frequently in the directory.

Sol. Null Hypothesis H0: The digits occur equally frequently in the directory, i.e., there is no significant difference between the observed and expected frequencies.

Under H0, the expected frequency of each digit is $\dfrac{10000}{10} = 1000$.

To find the value of $\chi^2$:

$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{58542}{1000} = 58.542$.

Conclusion: The tabulated value of $\chi^2$ at the 5% level of significance for 9 d.f. is 16.919. Since the calculated value of $\chi^2$ is greater than the tabulated value, H0 is rejected, i.e., there is a significant difference between the observed and theoretical frequencies, and the digits do not occur equally frequently in the directory.

Example 4. The following table gives the number of aircraft accidents that occurred during the various days of the week. Find whether the accidents are uniformly distributed over the week.

(Given: the values of chi-square significant at 5, 6, 7 d.f. are respectively 11.07, 12.59, 14.07 at the 5% level of significance.)

Sol. Here we set up the null hypothesis that the accidents are uniformly distributed over the week.

Under the null hypothesis, the expected frequencies of the accidents on each of the days would be:


Days                       Sun  Mon  Tues  Wed  Thurs  Fri  Sat  Total
Expected no. of accidents  12   12   12    12   12     12   12   84

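$\chi^2 = \dfrac{(14-12)^2}{12} + \dfrac{(16-12)^2}{12} + \dfrac{(8-12)^2}{12} + \dfrac{(12-12)^2}{12} + \dfrac{(11-12)^2}{12} + \dfrac{(9-12)^2}{12} + \dfrac{(14-12)^2}{12}$

$= \dfrac{1}{12}(4 + 16 + 16 + 0 + 1 + 9 + 4) = \dfrac{50}{12} = 4.17$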

The number of degrees of freedom

= Number of observations – Number of independent constraints

= 7 – 1 = 6

The tabulated $\chi^2_{0.05}$ for 6 d.f. = 12.59.

Since the calculated $\chi^2$ is much less than the tabulated value, it is not significant and we accept the null hypothesis. Hence we conclude that the accidents are uniformly distributed over the week.
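The goodness-of-fit computation of Example 4 can be sketched in Python as follows. The observed counts are the ones appearing in the $\chi^2$ sum above (their Sunday-to-Saturday order is assumed), and the critical value 12.59 is the one given in the problem; the variable names are illustrative.

```python
observed = [14, 16, 8, 12, 11, 9, 14]   # counts used in the chi-square sum (Sun..Sat assumed)
critical_value = 12.59                  # tabulated chi-square, 6 d.f., 5% level (from the text)

total = sum(observed)                   # 84 accidents in all
expected = total / len(observed)        # 12 per day under a uniform distribution

# chi^2 = sum of (O - E)^2 / E over the seven days
chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1                  # one constraint: the totals must agree

print(round(chi_sq, 2), df)
print("reject H0" if chi_sq > critical_value else "do not reject H0")
```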

Example 5. Records taken of the number of male and female births in 800 families having four children are as follows:

Test whether the data are consistent with the hypothesis that the Binomial law holds and the chance of male birth is equal to that of female birth, namely p = q = 1/2.

Sol. H0: The data are consistent with the hypothesis of equal probability for male and female births, i.e., p = q = 1/2.

We use the Binomial distribution to calculate the theoretical frequencies, given by:

N(r) = N × P(X = r), where N is the total frequency and N(r) is the number of families with r male children;

$P(X = r) = {}^{n}C_r \, p^r q^{n-r}$, where p and q are the probabilities of male and female births and n is the number of children.

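N(0) = No. of families with 0 male children $= 800 \times {}^{4}C_0 \left(\dfrac{1}{2}\right)^4 = 800 \times 1 \times \dfrac{1}{2^4} = 50$

$N(1) = 800 \times {}^{4}C_1 \left(\dfrac{1}{2}\right)^1 \left(\dfrac{1}{2}\right)^3 = 200$; $N(2) = 800 \times {}^{4}C_2 \left(\dfrac{1}{2}\right)^2 \left(\dfrac{1}{2}\right)^2 = 300$

$N(3) = 800 \times {}^{4}C_3 \left(\dfrac{1}{2}\right)^3 \left(\dfrac{1}{2}\right)^1 = 200$; $N(4) = 800 \times {}^{4}C_4 \left(\dfrac{1}{2}\right)^4 \left(\dfrac{1}{2}\right)^0 = 50$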

$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = 54.433$

Conclusion: The table value of $\chi^2$ at the 5% level of significance for 5 – 1 = 4 d.f. is 9.49.

Since the calculated value of $\chi^2$ is greater than the tabulated value, H0 is rejected, i.e., the data are not consistent with the hypothesis that the Binomial law holds with the chance of a male birth equal to that of a female birth.

Since the fitting is Binomial, the degrees of freedom are ν = n – 1, where n = 5 is the number of classes, i.e., ν = 5 – 1 = 4.
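The theoretical frequencies N(r) = N × P(X = r) of Example 5 can be checked with a minimal Python sketch. The helper name `binomial_expected` is an illustrative assumption, not something from the text; it uses `math.comb` from the standard library (Python 3.8 or later).

```python
from math import comb

def binomial_expected(total, n, p):
    """Expected frequencies total * C(n, r) * p^r * q^(n - r) for r = 0..n."""
    q = 1 - p
    return [total * comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]

# 800 families, 4 children each, p = q = 1/2 as in Example 5
expected = binomial_expected(800, 4, 0.5)
print(expected)   # [50.0, 200.0, 300.0, 200.0, 50.0]
```

The five expected frequencies sum to 800, which is the single constraint that reduces the degrees of freedom from 5 to 4.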

Example 6. A survey of 320 families with 5 children each revealed the following distribution:

No of boys

No of girls

No of families

Is this result consistent with the hypothesis that male and female births are equally probable?

Sol. Let us set up the null hypothesis that the data are consistent with the hypothesis of equal probability for male and female births. Then under the null hypothesis:

p = Probability of male birth = 1/2 = q

p(r) = Probability of 'r' male births in a family of 5

$= \dbinom{5}{r} p^r q^{5-r} = \dbinom{5}{r} \left(\dfrac{1}{2}\right)^5$

The frequency of r male births is given by:

$f(r) = N\,p(r) = 320 \times \dbinom{5}{r} \times \left(\dfrac{1}{2}\right)^5 = 10 \times \dbinom{5}{r}$   ...(1)


Substituting r = 0, 1, 2, 3, 4, 5 successively in (1), we get the expected frequencies as follows:

f(0) = 10 × 1 = 10, f(1) = 10 × 5C1 = 50

f(2) = 10 × 5C2 = 100, f(3) = 10 × 5C3 = 100

f(4) = 10 × 5C4 = 50, f(5) = 10 × 5C5 = 10

Calculations for $\chi^2$ (columns: O, E, $(O - E)^2 / E$)

The tabulated $\chi^2_{0.05}$ for 6 – 1 = 5 d.f. is 11.07.

Since the calculated value of $\chi^2$ is less than the tabulated value, it is not significant at the 5% level of significance, and hence the null hypothesis of equal probability for male and female births may be accepted.

Example 7. Fit a Poisson distribution to the following data and test the goodness of fit:

Sol. Mean of the given distribution is:

$\bar{X} = \dfrac{\sum f_i x_i}{N} = \dfrac{189}{392} = 0.482$

In order to fit a Poisson distribution to the given data, we take the mean (parameter) m of the Poisson distribution equal to the mean of the given distribution, i.e., we take $m = \bar{X} = 0.482$.

The frequency of r successes is given by the Poisson law as:

$f(r) = N\,p(r) = 392 \times \dfrac{e^{-0.482}\,(0.482)^r}{r!}$; r = 0, 1, 2, ..., 6

Now,

$f(0) = 392 \times e^{-0.482} = 392 \times \text{Antilog}\,[-0.482 \log e]$

$= 392 \times \text{Antilog}\,[-0.482 \times \log 2.7183]$   [∵ e = 2.7183]

$= 392 \times \text{Antilog}\,[-0.482 \times 0.4343]$

$= 392 \times \text{Antilog}\,[-0.2093]$

$= 392 \times \text{Antilog}\,[\bar{1}.7907] = 392 \times 0.6176 = 242.1$

f(1) = m × f(0) = 0.482 × 242.1 = 116.69

$f(2) = \dfrac{m}{2} \times f(1) = 0.241 \times 116.69 = 28.12$

$f(3) = \dfrac{m}{3} \times f(2) = \dfrac{0.482}{3} \times 28.12 = 4.518$

$f(4) = \dfrac{m}{4} \times f(3) = \dfrac{0.482}{4} \times 4.518 = 0.544$

$f(5) = \dfrac{m}{5} \times f(4) = \dfrac{0.482}{5} \times 0.544 = 0.052$

$f(6) = \dfrac{m}{6} \times f(5) = \dfrac{0.482}{6} \times 0.052 = 0.004$

Hence the theoretical Poisson frequencies correct to one decimal place are as given below:

X                   0      1      2     3    4    5    6
Expected Frequency  242.1  116.7  28.1  4.5  0.5  0.1  0

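CALCULATIONS FOR CHI-SQUARE

Since the expected frequencies of the last four classes are small, these classes are pooled: the observed frequencies 7, 5, 2, 1 give 15, and the expected frequencies 4.5, 0.5, 0.1, 0 give 5.1.

$\chi^2 = \sum \dfrac{(O - E)^2}{E} = 40.937$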

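Degrees of freedom = 7 – 1 – 1 – 3 = 2 (one d.f. is lost for the total, one for estimating m from the data, and three for pooling the last four classes). The tabulated value of $\chi^2$ for 2 degrees of freedom at the 5% level of significance is 5.99.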


Conclusion: Since the calculated value of $\chi^2$ (40.937) is much greater than 5.99, it is highly significant. Hence we say that the Poisson distribution is not a good fit to the given data.
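The expected Poisson frequencies of Example 7 can be regenerated without log tables. The sketch below, in Python, uses the recurrence f(r) = (m / r) × f(r – 1) with N = 392 and m = 0.482 as in the text; the variable names are illustrative, and small differences in the last digit against the hand computation come from rounding.

```python
import math

N = 392      # total frequency
m = 0.482    # fitted Poisson mean, as used in the text

# f(0) = N * e^(-m); then f(r) = (m / r) * f(r - 1) for r = 1..6
freqs = [N * math.exp(-m)]
for r in range(1, 7):
    freqs.append(freqs[-1] * m / r)

for r, f in enumerate(freqs):
    print(r, round(f, 1))

# Expected frequency of the pooled classes r = 3, 4, 5, 6
pooled_tail = sum(freqs[3:])
print(round(pooled_tail, 1))
```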

Example 8. A die is thrown 276 times and the results of these throws are given below:

No. appeared on the die  1   2   3   4   5   6
Frequency                40  32  29  59  57  59

Test whether the die is biased or not.

Sol. Null Hypothesis H0: The die is unbiased.

Under this H0, the expected frequency for each face is $\dfrac{276}{6} = 46$.

To find the value of $\chi^2$:

$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{980}{46} = 21.30$.

Conclusion: The tabulated value of $\chi^2$ at the 5% level of significance for (6 – 1 = 5) d.f. is 11.07. Since the calculated value of $\chi^2$ = 21.30 > 11.07, the tabulated value, H0 is rejected, i.e., the die is biased.

Example 9. The theory predicts that the proportion of beans in the four groups $G_1, G_2, G_3, G_4$ should be in the ratio 9 : 3 : 3 : 1. In an experiment with 1600 beans the numbers in the four groups were 882, 313, 287 and 118. Does the experimental result support the theory?

Sol. H0: The experimental result supports the theory, i.e., there is no significant difference between the observed and theoretical frequencies. Under H0, the theoretical frequencies can be calculated as follows:

$E(G_1) = \dfrac{1600 \times 9}{16} = 900$;

$E(G_2) = \dfrac{1600 \times 3}{16} = 300$;

$E(G_3) = \dfrac{1600 \times 3}{16} = 300$;

$E(G_4) = \dfrac{1600 \times 1}{16} = 100$


To calculate the value of $\chi^2$:

$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = 4.7266$.

Conclusion: The table value of $\chi^2$ at the 5% level of significance for 3 d.f. is 7.815. Since the calculated value of $\chi^2$ is less than the tabulated value, H0 is accepted, i.e., the experimental result supports the theory.
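When the theory specifies a ratio rather than equal frequencies, the expected counts are the total split in that ratio. A minimal Python sketch for Example 9, with illustrative variable names and the critical value 7.815 taken from the text:

```python
observed = [882, 313, 287, 118]   # G1..G4
ratio = [9, 3, 3, 1]              # ratio predicted by the theory
critical_value = 7.815            # tabulated chi-square, 3 d.f., 5% level (from the text)

total = sum(observed)             # 1600 beans
expected = [total * part / sum(ratio) for part in ratio]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(expected)                   # [900.0, 300.0, 300.0, 100.0]
print(round(chi_sq, 4), df)
print("reject H0" if chi_sq > critical_value else "do not reject H0")
```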

(iii) $\chi^2$ Test as a Test of Attributes: Let us consider two attributes A and B, with A divided into r classes $A_1, A_2, \ldots, A_r$ and B divided into s classes $B_1, B_2, \ldots, B_s$. Such a classification, in which attributes are divided into more than two classes, is known as manifold classification. The various cell frequencies can be expressed in the following table, known as an r × s manifold contingency table. Here $(A_i)$ is the number of persons possessing the attribute $A_i$, $(B_j)$ is the number of persons possessing the attribute $B_j$, and $(A_iB_j)$ is the number of persons possessing both the attributes $A_i$ and $B_j$, for i = 1, 2, ..., r; j = 1, 2, ..., s.

$\sum_{i=1}^{r} (A_i) = \sum_{j=1}^{s} (B_j) = N$, where N is the total frequency.

The r × s contingency table is given below:

B \ A   A1       A2       A3       ...  Ar       Total
B1      (A1B1)   (A2B1)   (A3B1)   ...  (ArB1)   (B1)
B2      (A1B2)   (A2B2)   (A3B2)   ...  (ArB2)   (B2)
B3      (A1B3)   (A2B3)   (A3B3)   ...  (ArB3)   (B3)
...     ...      ...      ...      ...  ...      ...
Bs      (A1Bs)   (A2Bs)   (A3Bs)   ...  (ArBs)   (Bs)
Total   (A1)     (A2)     (A3)     ...  (Ar)     N

The problem is to test if two attributes A and B under consideration are independent or not.

Under the null hypothesis that the two attributes are independent, the theoretical cell frequencies are calculated as follows:


$P(A_i)$ = Probability that a person possesses the attribute $A_i$ $= \dfrac{(A_i)}{N}$, i = 1, 2, ..., r

$P(B_j)$ = Probability that a person possesses the attribute $B_j$ $= \dfrac{(B_j)}{N}$, j = 1, 2, ..., s

$P(A_iB_j)$ = Probability that a person possesses both attributes $A_i$ and $B_j$ $= \dfrac{(A_iB_j)}{N}$

If $(A_iB_j)_0$ is the expected number of persons possessing both the attributes $A_i$ and $B_j$, then

$(A_iB_j)_0 = N \cdot P(A_iB_j) = N \cdot P(A_i)\,P(B_j)$   (since A and B are independent)

$= N \cdot \dfrac{(A_i)}{N} \cdot \dfrac{(B_j)}{N} = \dfrac{(A_i)(B_j)}{N}$

Hence

$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \dfrac{\left[(A_iB_j) - (A_iB_j)_0\right]^2}{(A_iB_j)_0}$

which is distributed as a $\chi^2$ variate with (r – 1)(s – 1) d.f.
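The expected cell frequencies (row total × column total) / N and the resulting $\chi^2$ with (r – 1)(s – 1) d.f. can be sketched in Python as below. The function name `chi_square_independence` and the 2 × 3 table are hypothetical, invented purely for illustration; they do not come from the text.

```python
def chi_square_independence(table):
    """Return (chi_sq, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected frequency of each cell under independence
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df, expected

# Hypothetical 2 x 3 table of observed frequencies (illustration only)
table = [[10, 20, 30],
         [20, 20, 20]]
chi_sq, df, expected = chi_square_independence(table)
print(expected)              # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(round(chi_sq, 3), df)  # 5.333 2
```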

Some Noteworthy Points:

1. For a 2 × 2 contingency table where the frequencies are $\begin{array}{c|c} a & b \\ \hline c & d \end{array}$, $\chi^2$ can be calculated from the independent frequencies as

$\chi^2 = \dfrac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)}$

2. If the contingency table is not 2 × 2, then the above formula for calculating $\chi^2$ cannot be used. Hence, we use the formula for the expected frequency, $(A_iB_j)_0 = \dfrac{(A_i)(B_j)}{N}$, i.e., the expected frequency in each cell = (Product of column total and row total) / (Whole total).

3. If $\begin{array}{c|c} a & b \\ \hline c & d \end{array}$ is the 2 × 2 contingency table with two attributes, $Q = \dfrac{ad - bc}{ad + bc}$ is called the coefficient of association. If the attributes are independent, then $\dfrac{a}{b} = \dfrac{c}{d}$.

Remark: Yates's Correction: In a 2 × 2 table, if the frequency of a cell is small, we make Yates's correction to make $\chi^2$ continuous. Decrease by 1/2 those cell frequencies which are greater than the expected frequencies, and increase by 1/2 those which are less than expectation. This will not affect the marginal totals. This correction is known as Yates's correction for continuity.


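After Yates's correction, with N = a + b + c + d,

$\chi^2 = \dfrac{N\left(bc - ad - \frac{N}{2}\right)^2}{(a + c)(b + d)(c + d)(a + b)}$ when ad – bc < 0

$\chi^2 = \dfrac{N\left(ad - bc - \frac{N}{2}\right)^2}{(a + c)(b + d)(c + d)(a + b)}$ when ad – bc > 0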
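To see the effect of the correction numerically, the following Python sketch computes the 2 × 2 shortcut $\chi^2 = N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$ with and without subtracting N/2 from |ad – bc|. The function name `chi_square_2x2` and the cell frequencies are hypothetical, chosen only for illustration.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Shortcut chi-square for a 2 x 2 table, optionally with Yates's correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff -= n / 2   # Yates's correction: subtract N/2 from |ad - bc|
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical cell frequencies (illustration only)
a, b, c, d = 12, 5, 3, 10
plain = chi_square_2x2(a, b, c, d)
corrected = chi_square_2x2(a, b, c, d, yates=True)
print(round(plain, 3), round(corrected, 3))
```

The corrected value is the smaller of the two, so the correction makes the test more conservative when cell frequencies are small.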

Example 10. (2 × 2 contingency table) For the 2 × 2 table

a  b
c  d

prove that the chi-square test of independence gives

$\chi^2 = \dfrac{N(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$, N = a + b + c + d   ...(1)

[Guwahati Univ. B.Sc., 2002]

Sol. Under the hypothesis of independence of attributes,

$E(a) = \dfrac{(a + b)(a + c)}{N}$

$E(b) = \dfrac{(a + b)(b + d)}{N}$

$E(c) = \dfrac{(a + c)(c + d)}{N}$

$E(d) = \dfrac{(b + d)(c + d)}{N}$

$\chi^2 = \dfrac{[a - E(a)]^2}{E(a)} + \dfrac{[b - E(b)]^2}{E(b)} + \dfrac{[c - E(c)]^2}{E(c)} + \dfrac{[d - E(d)]^2}{E(d)}$

$a - E(a) = a - \dfrac{(a + b)(a + c)}{N} = \dfrac{a(a + b + c + d) - (a^2 + ac + ab + bc)}{N} = \dfrac{ad - bc}{N}$
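The identity to be proved can at least be checked numerically. The Python sketch below compares the cell-by-cell sum of $[O - E]^2 / E$ with the closed form $N(ad - bc)^2 / [(a+c)(b+d)(a+b)(c+d)]$, using exact rational arithmetic so that equality is not blurred by rounding. The function names and the sample frequencies are hypothetical; a numerical check illustrates the result but does not replace the algebraic proof.

```python
from fractions import Fraction

def cellwise_chi_sq(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells, with E from the marginal totals."""
    n = a + b + c + d
    cells = [
        (a, Fraction((a + b) * (a + c), n)),
        (b, Fraction((a + b) * (b + d), n)),
        (c, Fraction((a + c) * (c + d), n)),
        (d, Fraction((b + d) * (c + d), n)),
    ]
    return sum((o - e) ** 2 / e for o, e in cells)

def shortcut_chi_sq(a, b, c, d):
    """Closed form N(ad - bc)^2 / [(a+c)(b+d)(a+b)(c+d)]."""
    n = a + b + c + d
    return Fraction(n * (a * d - b * c) ** 2,
                    (a + c) * (b + d) * (a + b) * (c + d))

# Hypothetical cell frequencies (illustration only)
print(cellwise_chi_sq(12, 5, 3, 10) == shortcut_chi_sq(12, 5, 3, 10))   # True
```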
