= (50 × 225)/100 = 112.5
Since n is large, the test statistic is
$$Z = \sqrt{2\chi^2} - \sqrt{2n - 1} \sim N(0, 1)$$
Here Z = √225 − √99 = 15 − 9.95 = 5.05. Since Z > 3, it is significant at all levels of significance; hence H0 is rejected and we conclude that σ ≠ 10.
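As a quick numerical check of the large-sample statistic above, the following minimal Python sketch evaluates Z for the values used in this example (χ² = 112.5, n = 50). The function name `large_sample_z` is an illustrative choice and does not come from the text.

```python
import math


def large_sample_z(chi_sq: float, n: int) -> float:
    """Z = sqrt(2*chi^2) - sqrt(2n - 1), approximately N(0, 1) for large n."""
    return math.sqrt(2 * chi_sq) - math.sqrt(2 * n - 1)


# Values from the worked example: chi^2 = 112.5, n = 50.
z = large_sample_z(112.5, 50)
print(round(z, 2))  # 5.05
```

Since the result exceeds 3, it agrees with the rejection of H0 stated in the text.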
Example 2. It is believed that the precision (as measured by the variance) of an instrument is no more than 0.16. Write down the null and alternative hypotheses for testing this belief. Carry out the test at the 1% level, given 11 measurements of the same subject on the instrument:
2.5, 2.3, 2.4, 2.3, 2.5, 2.7, 2.5, 2.6, 2.6, 2.7, 2.5
[B.U (2006), Kanpur (2007)]
Sol. Null Hypothesis, H0: σ² = 0.16
Alternative Hypothesis, H1: σ² > 0.16
Computation of Sample Variance
$$\bar{X} = \frac{27.6}{11} = 2.51, \qquad \sum (X_i - \bar{X})^2 = 0.1891$$
Under the null hypothesis H0: σ² = 0.16, the test statistic is:
$$\chi^2 = \frac{nS^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} = \frac{0.1891}{0.16} = 1.182$$
which follows the χ²-distribution with d.f. (11 − 1) = 10.
Since the calculated value of χ² is less than the tabulated value 23.2 of χ² for 10 d.f. at the 1% level of significance, it is not significant. Hence H0 may be accepted and we conclude that the data are consistent with the hypothesis that the precision of the instrument is 0.16.
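The arithmetic of Example 2 can be reproduced with a short Python sketch. It uses only the eleven measurements and the hypothesised variance given in the example. The critical value 23.2 is copied from the text rather than computed, and the variable names are illustrative.

```python
# Measurements from Example 2.
data = [2.5, 2.3, 2.4, 2.3, 2.5, 2.7, 2.5, 2.6, 2.6, 2.7, 2.5]
sigma0_sq = 0.16          # variance under H0
critical_1pct = 23.2      # tabulated chi-square, 10 d.f., 1% level (from the text)

mean = sum(data) / len(data)
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # sum of (X_i - mean)^2
chi_sq = sum_sq_dev / sigma0_sq                   # n*S^2 / sigma^2
df = len(data) - 1

print(round(mean, 2), round(sum_sq_dev, 4), round(chi_sq, 3), df)
# 2.51 0.1891 1.182 10
print("reject H0" if chi_sq > critical_1pct else "do not reject H0")
# do not reject H0
```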
(ii) Chi-Square Test of Goodness of Fit: The χ² test is an approximate test for large values of n. It enables us to ascertain how well theoretical distributions fit empirical distributions, i.e., distributions obtained from sample data. If the calculated value of chi-square is less than the table value at a specified level of significance, the fit is considered to be good. Generally we take significance at the 5% level. Similarly, if the calculated value of χ² is greater than the table value, the fit is considered to be poor.
Example 3. The following table shows the distribution of digits in numbers chosen at random from a telephone directory:
Test whether the digits may be taken to occur equally frequently in the directory.
Sol. Null Hypothesis H0: The digits in the directory occur equally frequently, i.e., there is no significant difference between the observed and expected frequencies.
Under H0, the expected frequency of each digit is 10,000/10 = 1000.
To find the value of χ²:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{58542}{1000} = 58.542$$
Conclusion: The tabulated value of χ² at the 5% level of significance for 9 d.f. is 16.919. Since the calculated value of χ² is greater than the tabulated value, H0 is rejected, i.e., there is a significant difference between the observed and theoretical frequencies: the digits in the directory do not occur equally frequently.
Example 4. The following table gives the number of aircraft accidents that occurred during the various days of the week. Find whether the accidents are uniformly distributed over the week.
(Given: The values of chi-square significant at 5, 6, 7 d.f. are respectively 11.07, 12.59, 14.07 at the 5% level of significance.)
Sol. Here we set up the null hypothesis that the accidents are uniformly distributed over the week.
Under the null hypothesis, the expected frequencies of the accidents on each of the days would be:
Days               Sun  Mon  Tues  Wed  Thurs  Fri  Sat  Total
No. of accidents    12   12    12   12     12   12   12     84
$$\chi^2 = \frac{(14-12)^2}{12} + \frac{(16-12)^2}{12} + \frac{(8-12)^2}{12} + \frac{(12-12)^2}{12} + \frac{(11-12)^2}{12} + \frac{(9-12)^2}{12} + \frac{(14-12)^2}{12}$$
$$= \frac{1}{12}(4 + 16 + 16 + 0 + 1 + 9 + 4) = \frac{50}{12} = 4.17$$
The number of degrees of freedom
= Number of observations − Number of independent constraints
= 7 − 1 = 6.
The tabulated χ² at the 5% level for 6 d.f. = 12.59.
Since the calculated χ² is much less than the tabulated value, it is not significant and we accept the null hypothesis. Hence we conclude that the accidents are uniformly distributed over the week.
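The goodness-of-fit computation of Example 4 can be sketched in a few lines of Python. The observed counts 14, 16, 8, 12, 11, 9, 14 are read off the numerators of the worked computation above, and the critical value 12.59 is the one quoted in the example. The helper name `chi_square_gof` is a hypothetical choice for illustration.

```python
def chi_square_gof(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [14, 16, 8, 12, 11, 9, 14]            # accidents, Sun to Sat
# Uniform hypothesis: every day gets total / 7 = 12 accidents.
expected = [sum(observed) / len(observed)] * len(observed)

chi_sq = chi_square_gof(observed, expected)
df = len(observed) - 1                           # one constraint: the total
critical_5pct = 12.59                            # tabulated value for 6 d.f.

print(round(chi_sq, 2), df, chi_sq < critical_5pct)  # 4.17 6 True
```

The same helper applies to any of the goodness-of-fit examples in this section once the observed and expected frequencies are listed.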
Example 5. Records taken of the number of male and female births in 800 families having four children are as follows:
Test whether the data are consistent with the hypothesis that the Binomial law holds and the chance of male birth is equal to that of female birth, namely p = q = 1/2.
Sol. H0: The data are consistent with the hypothesis of equal probability for male and female births, i.e., p = q = 1/2.
We use the Binomial distribution to calculate the theoretical frequencies, given by
N(r) = N × P(X = r),
where N is the total frequency and N(r) is the number of families with r male children. Here
$$P(X = r) = {}^{n}C_r \, p^r q^{n-r},$$
where p and q are the probabilities of male and female births and n is the number of children.
$$N(0) = 800 \times {}^{4}C_0 \left(\frac{1}{2}\right)^{4} = 800 \times \frac{1}{2^4} = 50$$
$$N(1) = 800 \times {}^{4}C_1 \left(\frac{1}{2}\right)^{1}\left(\frac{1}{2}\right)^{3} = 200; \qquad N(2) = 800 \times {}^{4}C_2 \left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{2} = 300$$
$$N(3) = 800 \times {}^{4}C_3 \left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{1} = 200; \qquad N(4) = 800 \times {}^{4}C_4 \left(\frac{1}{2}\right)^{4}\left(\frac{1}{2}\right)^{0} = 50$$
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 54.433$$
Conclusion: The table value of χ² at the 5% level of significance for 5 − 1 = 4 d.f. is 9.49.
Since the calculated value of χ² is greater than the tabulated value, H0 is rejected, i.e., the data are not consistent with the hypothesis that the Binomial law holds with the chance of a male birth equal to that of a female birth.
Since the fitting is Binomial with p specified, the degrees of freedom are ν = (number of classes) − 1, i.e., ν = 5 − 1 = 4.
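The theoretical frequencies N(r) = N × P(X = r) used in Example 5 can be generated with a small Python sketch. The function name `binomial_expected` is illustrative; `math.comb` is the standard-library binomial coefficient (Python 3.8 or later).

```python
from math import comb


def binomial_expected(total, n, p):
    """Expected frequencies total * C(n, r) * p^r * (1 - p)^(n - r), r = 0..n."""
    return [total * comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]


# 800 families, 4 children each, p = q = 1/2.
expected = binomial_expected(800, 4, 0.5)
print(expected)  # [50.0, 200.0, 300.0, 200.0, 50.0]
```

The expected frequencies sum to 800, the total number of families, which is the single constraint that gives 5 − 1 = 4 degrees of freedom.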
Example 6 A survey of 320 families with 5 children each revealed the following distribution:
No of boys
No of girls
No of families
Is this result consistent with the hypothesis that male and female births are equally probable?
Sol. Let us set up the null hypothesis that the data are consistent with the hypothesis of equal probability for male and female births. Then under the null hypothesis:
p = Probability of male birth = 1/2 = q
p(r) = Probability of 'r' male births in a family of 5
$$p(r) = \binom{5}{r} p^r q^{5-r} = \binom{5}{r}\left(\frac{1}{2}\right)^{5}$$
The frequency of r male births is given by:
$$f(r) = N\,p(r) = 320 \times \binom{5}{r} \times \left(\frac{1}{2}\right)^{5} = 10 \times \binom{5}{r} \qquad (1)$$
Substituting r = 0, 1, 2, 3, 4, 5 successively in (1), we get the expected frequencies as follows:
f(0) = 10 × 1 = 10, f(1) = 10 × 5C1 = 50
f(2) = 10 × 5C2 = 100, f(3) = 10 × 5C3 = 100
f(4) = 10 × 5C4 = 50, f(5) = 10 × 5C5 = 10
Calculations for χ²
$$\chi^2 = \sum \left[\frac{(O - E)^2}{E}\right]$$
Tabulated χ² at the 5% level for 6 − 1 = 5 d.f. is 11.07.
Since the calculated value of χ² is less than the tabulated value, it is not significant at the 5% level of significance and hence the null hypothesis of equal probability for male and female births may be accepted.
Example 7 Fit a Poisson distribution to the following data and test the goodness of fit:
Sol. Mean of the given distribution is:
$$\bar{X} = \frac{\sum_i f_i x_i}{N} = \frac{189}{392} = 0.482$$
In order to fit a Poisson distribution to the given data, we take the mean (parameter) m of the Poisson distribution equal to the mean of the given distribution, i.e., we take m = X̄ = 0.482.
The frequency of r successes is given by the Poisson law as:
$$f(r) = N\,p(r) = 392 \times \frac{e^{-0.482}\,(0.482)^r}{r!}; \qquad r = 0, 1, 2, \ldots, 6$$
Now,
f(0) = 392 × e^(−0.482) = 392 × Antilog [−0.482 log e]
= 392 × Antilog [−0.482 × log 2.7183]   [∵ e = 2.7183]
= 392 × Antilog [−0.482 × 0.4343]
= 392 × Antilog [−0.2093]
= 392 × Antilog [1̄.7907] = 392 × 0.6176 = 242.1
f(1) = m × f(0) = 0.482 × 242.1 = 116.69
f(2) = (m/2) × f(1) = 0.241 × 116.69 = 28.12
f(3) = (m/3) × f(2) = (0.482/3) × 28.12 = 4.518
f(4) = (m/4) × f(3) = (0.482/4) × 4.518 = 0.544
f(5) = (m/5) × f(4) = (0.482/5) × 0.544 = 0.052
f(6) = (m/6) × f(5) = (0.482/6) × 0.052 = 0.004
Hence the theoretical Poisson frequencies correct to one decimal place are as given below:
X                     0      1      2     3    4    5    6
Expected Frequency    242.1  116.7  28.1  4.5  0.5  0.1  0
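CALCULATIONS FOR CHI-SQUARE
Since the expected frequencies of the last four classes are small, these classes are pooled: the observed frequencies 7 + 5 + 2 + 1 = 15 are compared with the expected frequencies 4.5 + 0.5 + 0.1 + 0 = 5.1.
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 40.937$$
Degrees of freedom = 7 − 1 − 1 − 3 = 2 (one d.f. is lost for the total, one for estimating m from the data, and three for pooling the last four classes).
Tabulated value of χ² for 2 degrees of freedom at the 5% level of significance is 5.99.
Conclusion: Since the calculated value of χ² (40.937) is much greater than 5.99, it is highly significant. Hence we say that the Poisson distribution is not a good fit to the given data.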
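The expected Poisson frequencies of Example 7 follow from f(0) = N e^(−m) and the recurrence f(r) = (m/r) f(r − 1). The sketch below reproduces them in Python with `math.exp` in place of the logarithm tables used in the text. The function name `poisson_expected` is an illustrative assumption.

```python
import math


def poisson_expected(total, m, r_max):
    """Expected frequencies via f(0) = total * exp(-m), f(r) = (m / r) * f(r - 1)."""
    freqs = [total * math.exp(-m)]
    for r in range(1, r_max + 1):
        freqs.append(freqs[-1] * m / r)
    return freqs


# N = 392 and m = 0.482, as estimated from the sample mean in the example.
freqs = poisson_expected(392, 0.482, 6)
print([round(f, 1) for f in freqs])
# [242.1, 116.7, 28.1, 4.5, 0.5, 0.1, 0.0]
```

Using the recurrence avoids recomputing the exponential and the factorial for every class, which is also why the hand calculation proceeds this way.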
Example 8. A die is thrown 276 times and the results of these throws are given below:
No. appeared on the die    1   2   3   4   5   6
Frequency                 40  32  29  59  57  59
Test whether the die is biased or not.
Sol. Null Hypothesis H0: Die is unbiased.
Under this H0, the expected frequency for each face is 276/6 = 46.
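To find the value of χ²:
O_i     E_i     O_i − E_i    (O_i − E_i)²
40      46      −6           36
32      46      −14          196
29      46      −17          289
59      46      13           169
57      46      11           121
59      46      13           169
Total                        980
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{980}{46} = 21.30$$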
Conclusion: The tabulated value of χ² at the 5% level of significance for (6 − 1 = 5) d.f. is 11.07. Since the calculated value of χ² = 21.30 > 11.07, the tabulated value, H0 is rejected, i.e., the die is biased.
Example 9. The theory predicts that the proportions of beans in the four groups G1, G2, G3, G4 should be in the ratio 9 : 3 : 3 : 1. In an experiment with 1600 beans the numbers in the four groups were 882, 313, 287 and 118. Does the experimental result support the theory?
Sol. H0: The experimental result supports the theory, i.e., there is no significant difference between the observed and theoretical frequencies. Under H0, the theoretical frequencies can be calculated as follows:
$$E(G_1) = \frac{1600 \times 9}{16} = 900; \qquad E(G_2) = \frac{1600 \times 3}{16} = 300;$$
$$E(G_3) = \frac{1600 \times 3}{16} = 300; \qquad E(G_4) = \frac{1600 \times 1}{16} = 100$$
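To calculate the value of χ²:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(882-900)^2}{900} + \frac{(313-300)^2}{300} + \frac{(287-300)^2}{300} + \frac{(118-100)^2}{100} = 4.7266$$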
Conclusion: The table value of χ² at the 5% level of significance for 3 d.f. is 7.815. Since the calculated value of χ² is less than the tabulated value, H0 is accepted, i.e., the experimental result supports the theory.
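When the hypothesis specifies a ratio, as in Example 9, the expected frequencies are the total split in that ratio. The following Python sketch repeats the computation from the observed counts and the 9 : 3 : 3 : 1 ratio given in the example. The critical value 7.815 is taken from the text and the variable names are illustrative.

```python
observed = [882, 313, 287, 118]   # beans in groups G1..G4
ratio = [9, 3, 3, 1]              # theoretical ratio

total = sum(observed)
# Split the total in the theoretical ratio: 900, 300, 300, 100.
expected = [total * part / sum(ratio) for part in ratio]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical_5pct = 7.815             # tabulated value for 3 d.f.

print(expected)                   # [900.0, 300.0, 300.0, 100.0]
print(round(chi_sq, 4), df, chi_sq < critical_5pct)  # 4.7267 3 True
```

The value differs from the 4.7266 quoted in the example only in the last digit, because the hand calculation truncates the intermediate terms.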
(iii) χ² test as a test of Attributes: Let us consider two attributes A and B, with A divided into r classes A1, A2, ..., Ar and B divided into s classes B1, B2, ..., Bs. Such a classification, in which the attributes are divided into more than two classes, is known as manifold classification. The various cell frequencies can be expressed in the following table, known as the r × s manifold contingency table. Here (Ai) is the number of persons possessing the attribute Ai, (Bj) is the number of persons possessing the attribute Bj, and (AiBj) is the number of persons possessing both the attributes Ai and Bj [i = 1, 2, ..., r; j = 1, 2, ..., s]. Also,
$$\sum_{i=1}^{r} (A_i) = \sum_{j=1}^{s} (B_j) = N,$$
where N is the total frequency.
The r × s contingency table is given below:
B \ A    A1        A2        A3        ...   Ar        Total
B1       (A1B1)    (A2B1)    (A3B1)    ...   (ArB1)    (B1)
B2       (A1B2)    (A2B2)    (A3B2)    ...   (ArB2)    (B2)
B3       (A1B3)    (A2B3)    (A3B3)    ...   (ArB3)    (B3)
...
Bs       (A1Bs)    (A2Bs)    (A3Bs)    ...   (ArBs)    (Bs)
Total    (A1)      (A2)      (A3)      ...   (Ar)      N
The problem is to test if two attributes A and B under consideration are independent or not.
Under the null hypothesis that the two attributes are independent, the theoretical cell frequencies are calculated as follows.
P(Ai) = Probability that a person possesses the attribute Ai = (Ai)/N, i = 1, 2, ..., r
P(Bj) = Probability that a person possesses the attribute Bj = (Bj)/N, j = 1, 2, ..., s
P(AiBj) = Probability that a person possesses both attributes Ai and Bj = (AiBj)/N
If (AiBj)0 is the expected number of persons possessing both the attributes Ai and Bj, then
$$(A_iB_j)_0 = N\,P(A_iB_j) = N\,P(A_i)\,P(B_j) = N \cdot \frac{(A_i)}{N} \cdot \frac{(B_j)}{N} = \frac{(A_i)(B_j)}{N}$$
(since A and B are independent). Hence
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\left[(A_iB_j) - (A_iB_j)_0\right]^2}{(A_iB_j)_0}$$
which is distributed as a χ² variate with (r − 1)(s − 1) d.f.
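The expected cell frequencies (Ai)(Bj)/N and the resulting χ² statistic can be sketched for a general contingency table as follows. The 2 × 3 table in the usage lines is hypothetical data invented purely for illustration, and the function names are not from the text.

```python
def expected_frequencies(table):
    """Expected cell count = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for a contingency table."""
    expected = expected_frequencies(table)
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df


# Hypothetical 2 x 3 table (illustrative numbers only).
table = [[20, 30, 50],
         [30, 20, 50]]
chi_sq, df = chi_square_independence(table)
print(expected_frequencies(table))  # [[25.0, 25.0, 50.0], [25.0, 25.0, 50.0]]
print(chi_sq, df)                   # 4.0 2
```

The degrees of freedom are computed as (rows − 1)(columns − 1), matching the (r − 1)(s − 1) rule stated above.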
Some Remarkable Points:
1. For a 2 × 2 contingency table where the frequencies are
a  b
c  d
χ² can be calculated from the independent frequencies as
$$\chi^2 = \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)}$$
2. If the contingency table is not 2 × 2, then the above formula for calculating χ² cannot be used. Instead, we use the formula for the expected frequency, (AiBj)0 = (Ai)(Bj)/N, i.e., the expected frequency in each cell = (product of column total and row total)/(whole total).
3. If
a  b
c  d
is the 2 × 2 contingency table with two attributes, Q = (ad − bc)/(ad + bc) is called the coefficient of association. If the attributes are independent, then a/b = c/d.
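The 2 × 2 shortcut formula and the coefficient of association Q can be sketched in Python as below. Both tables in the usage lines are hypothetical numbers chosen for illustration: the first shows associated attributes, the second has a/b = c/d, so both χ² and Q vanish.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut: N (ad - bc)^2 / [(a+b)(c+d)(b+d)(a+c)], with N = a+b+c+d."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (b + d) * (a + c))


def association_q(a, b, c, d):
    """Coefficient of association Q = (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


# Hypothetical table with some association.
print(chi_square_2x2(30, 20, 20, 30))            # 4.0
print(round(association_q(30, 20, 20, 30), 4))   # 0.3846

# Hypothetical table with a/b = c/d (independent attributes).
print(chi_square_2x2(10, 20, 30, 60), association_q(10, 20, 30, 60))  # 0.0 0.0
```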
Remark: Yates's Correction: In a 2 × 2 table, if the frequency of a cell is small, we make Yates's correction to make χ² continuous. Decrease by 1/2 those cell frequencies which are greater than the expected frequencies, and increase by 1/2 those which are less than expectation. This will not affect the marginal totals. This correction is known as Yates's correction for continuity.
After Yates's correction, with N = a + b + c + d,
$$\chi^2 = \frac{N\left(ad - bc - \frac{N}{2}\right)^2}{(a + c)(b + d)(c + d)(a + b)} \quad \text{when } ad - bc > 0,$$
$$\chi^2 = \frac{N\left(bc - ad - \frac{N}{2}\right)^2}{(a + c)(b + d)(c + d)(a + b)} \quad \text{when } ad - bc < 0.$$
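Yates's correction can be sketched in two equivalent ways: by moving each cell frequency 1/2 towards its expected value, as the remark describes, or through the usual closed form N(|ad − bc| − N/2)² / [(a + c)(b + d)(c + d)(a + b)]. The Python sketch below implements both and checks that they agree on a hypothetical table. The function names and the data are illustrative assumptions.

```python
def yates_by_adjustment(a, b, c, d):
    """Shift each observed cell by 1/2 towards its expected value, then sum (O - E)^2 / E."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    total = 0.0
    for o, e in zip(observed, expected):
        if o > e:
            o -= 0.5      # decrease frequencies above expectation
        elif o < e:
            o += 0.5      # increase frequencies below expectation
        total += (o - e) ** 2 / e
    return total


def yates_closed_form(a, b, c, d):
    """Closed form N (|ad - bc| - N/2)^2 / [(a+c)(b+d)(c+d)(a+b)]."""
    n = a + b + c + d
    return n * (abs(a * d - b * c) - n / 2) ** 2 / ((a + c) * (b + d) * (c + d) * (a + b))


# Hypothetical 2 x 2 table (illustrative numbers only).
print(round(yates_by_adjustment(30, 20, 20, 30), 4))  # 3.24
print(round(yates_closed_form(30, 20, 20, 30), 4))    # 3.24
```

Using the absolute value |ad − bc| covers both signs of ad − bc in a single expression.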
Example 10. (2 × 2 contingency table) For the 2 × 2 table
a  b
c  d
prove that the chi-square test of independence gives
$$\chi^2 = \frac{N(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}, \qquad N = a + b + c + d \qquad (1)$$
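[Guwahati Univ. B.Sc., 2002]
Sol. Under the hypothesis of independence of attributes,
$$E(a) = \frac{(a + b)(a + c)}{N}, \qquad E(b) = \frac{(a + b)(b + d)}{N},$$
$$E(c) = \frac{(a + c)(c + d)}{N}, \qquad E(d) = \frac{(b + d)(c + d)}{N}$$
$$\chi^2 = \frac{[a - E(a)]^2}{E(a)} + \frac{[b - E(b)]^2}{E(b)} + \frac{[c - E(c)]^2}{E(c)} + \frac{[d - E(d)]^2}{E(d)}$$
$$a - E(a) = a - \frac{(a + b)(a + c)}{N} = \frac{a(a + b + c + d) - (a^2 + ac + ab + bc)}{N} = \frac{ad - bc}{N}$$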