where $m$ is the number of degrees of freedom. If $|\rho| \le 3$, then the null hypothesis should be accepted; if $|\rho| > 3$, then the null hypothesis should be rejected. This test has a fixed size $\alpha = 0.0027$.

2°. Test for excluding outliers. To test the null hypothesis, the following statistic is used:
$$x^* = \frac{1}{s^*}\max_{1\le i\le n} |X_i - m^*|, \tag{21.3.3.2}$$
where $s^*$ is the sample mean-square deviation. The null hypothesis $H_0$ for a given size $\alpha$ is accepted if $x^* < x_{1-\alpha}$, where $x_\alpha$ is the $\alpha$-quantile of the statistic $x^*$ under the assumption that the sample is normal. The values of $x_\alpha$ for various $n$ and $\alpha$ can be found in statistical tables.

3°. Test based on the sample first absolute central moment. The test is based on the statistic
$$\mu^* = \mu^*(X_1, \dots, X_n) = \frac{1}{n s^*}\sum_{i=1}^n |X_i - m^*|. \tag{21.3.3.3}$$
Under the assumption that the null hypothesis $H_0$ is true, the distribution of the statistic $\mu^*$ depends only on the sample size $n$ and is independent of the parameters $a$ and $\sigma^2$. The null hypothesis $H_0$ for a given size $\alpha$ is accepted if $\mu_{\alpha/2} < \mu^* < \mu_{1-\alpha/2}$, where $\mu_\alpha$ is the $\alpha$-quantile of the statistic $\mu^*$. The values of $\mu_\alpha$ for various $n$ and $\alpha$ can be found in statistical tables.

4°. Test based on the sample asymmetry coefficient. The test is based on the statistic
$$\gamma_1^* = \gamma_1^*(X_1, \dots, X_n) = \frac{1}{n (s^*)^3}\sum_{i=1}^n (X_i - m^*)^3. \tag{21.3.3.4}$$
The null hypothesis $H_0$ for a given size $\alpha$ is accepted if $|\gamma_1^*| < \gamma_{1,\,1-\alpha/2}$, where $\gamma_{1,\alpha}$ is the $\alpha$-quantile of the statistic $\gamma_1^*$. The values of $\gamma_{1,\alpha}$ for various $n$ and $\alpha$ can be found in statistical tables.

5°. Test based on the sample excess coefficient. The test verifies the closeness of the sample excess (the test statistic)
$$\gamma_2^* = \gamma_2^*(X_1, \dots, X_n) = \frac{1}{n (s^*)^4}\sum_{i=1}^n (X_i - m^*)^4 \tag{21.3.3.5}$$
and the theoretical excess $\gamma_2 + 3 = E\{(X - E\{X\})^4\}/(\operatorname{Var}\{X\})^2$, equal to 3 for the normal law. The null hypothesis $H_0$ for a given size $\alpha$ is accepted if the inequality $\gamma_{2,\,\alpha/2} < \gamma_2^* < \gamma_{2,\,1-\alpha/2}$ holds, where $\gamma_{2,\alpha}$ is the $\alpha$-quantile of the statistic $\gamma_2^*$. The values of $\gamma_{2,\alpha}$ for various $n$ and $\alpha$ can be found in statistical tables.
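For orientation, a minimal numerical sketch of the moment statistics (21.3.3.3)–(21.3.3.5); the sample data are illustrative, the biased moments (division by $n$, as in the formulas above) are used, and the critical quantiles $\mu_\alpha$, $\gamma_{1,\alpha}$, $\gamma_{2,\alpha}$ must still be taken from statistical tables.

```python
import numpy as np

def normality_moment_statistics(x):
    """Compute the statistics (21.3.3.3)-(21.3.3.5) for a sample x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x.mean()                # sample mean m*
    s = x.std(ddof=0)           # sample mean-square deviation s*
    dev = x - m
    mu = np.abs(dev).sum() / (n * s)        # first absolute central moment mu*
    gamma1 = (dev**3).sum() / (n * s**3)    # sample asymmetry gamma1*
    gamma2 = (dev**4).sum() / (n * s**4)    # sample excess gamma2* (close to 3 for normal data)
    return mu, gamma1, gamma2

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=200)   # illustrative normal sample
mu, g1, g2 = normality_moment_statistics(sample)
print(f"mu* = {mu:.3f} (about sqrt(2/pi) = 0.798 under normality)")
print(f"gamma1* = {g1:.3f} (about 0), gamma2* = {g2:.3f} (about 3)")
```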
21.3.3-3. Comparison of expectations of two normal populations.

Suppose that $X$ and $Y$ are two populations with known variances $\sigma_1^2$ and $\sigma_2^2$ and unknown expectations $a_1$ and $a_2$. Two independent samples $X_1, \dots, X_n$ and $Y_1, \dots, Y_k$ are drawn from the populations and the sample expectations (means) $m_1^*$ and $m_2^*$ are calculated. The hypothesis that the expectations are equal to each other is tested using the statistic
$$U = \frac{m_1^* - m_2^*}{\sqrt{\sigma_1^2/n + \sigma_2^2/k}}, \tag{21.3.3.6}$$
which has a normal distribution with parameters $(0, 1)$ under the assumption that the null hypothesis $H_0$: $a_1 = a_2$ is true.

If the variances of the populations are unknown, then either the sample sizes should be sufficiently large to obtain reliable estimators of the variances, or the variances should coincide; otherwise, the known tests are inefficient. If the variances of the populations are equal to each other, $\sigma_1^2 = \sigma_2^2$, then one can test the null hypothesis $H_0$: $a_1 = a_2$ using the statistic
$$T = (m_1^* - m_2^*)\left[\left(\frac{1}{n} + \frac{1}{k}\right)\frac{(n-1)s_1^{2*} + (k-1)s_2^{2*}}{n + k - 2}\right]^{-1/2}, \tag{21.3.3.7}$$
which has the $t$-distribution (Student's distribution) with $\nu = n + k - 2$ degrees of freedom. The choice of the critical region depends on the form of the alternative hypothesis:
1. For the alternative hypothesis $H_1$: $a_1 > a_2$, one should choose a right-sided critical region.
2. For the alternative hypothesis $H_1$: $a_1 < a_2$, one should choose a left-sided critical region.
3. For the alternative hypothesis $H_1$: $a_1 \ne a_2$, one should choose a two-sided critical region.

21.3.3-4. Tests for variances to be equal.

Suppose that there are $L$ independent samples $X_{11}, \dots, X_{1n_1}$; $X_{21}, \dots, X_{2n_2}$; $\dots$; $X_{L1}, \dots, X_{Ln_L}$ of sizes $n_1, \dots, n_L$ drawn from distinct normal populations with unknown expectations $a_1, \dots, a_L$ and unknown variances $\sigma_1^2, \dots, \sigma_L^2$. It is required to test the hypothesis $H_0$: $\sigma_1^2 = \cdots = \sigma_L^2$ (the variances of all populations are the same) against the alternative hypothesis $H_1$ that some variances are different.

1°. Bartlett's test. The statistic in this test has the form
$$b = N\ln\left[\frac{1}{N}\sum_{i=1}^L (n_i - 1)s_i^{2*}\right] - \sum_{i=1}^L (n_i - 1)\ln s_i^{2*}, \tag{21.3.3.8}$$
where
$$N = \sum_{i=1}^L (n_i - 1), \qquad s_i^{2*} = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (X_{ij} - m_i^*)^2, \qquad m_i^* = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}.$$
The statistic $b$ permits reducing the problem of testing the hypothesis that the variances of normal samples are equal to each other to the problem of testing the hypothesis that the expectations of approximately normal samples are equal to each other. If the null hypothesis $H_0$ is true and all $n_i > 5$, then the ratio
$$B = b\left[1 + \frac{1}{3(L-1)}\left(\sum_{i=1}^L \frac{1}{n_i - 1} - \frac{1}{N}\right)\right]^{-1}$$
is distributed approximately according to the chi-square law with $L - 1$ degrees of freedom. The null hypothesis $H_0$ for a given size $\alpha$ is accepted if $B < \chi^2_{1-\alpha}(L-1)$, where $\chi^2_\alpha$ is the $\alpha$-quantile of the chi-square distribution with $L - 1$ degrees of freedom.

2°. Cochran's test. If all samples have the same size ($n_1 = \cdots = n_L = n$), then the null hypothesis $H_0$ is tested against the alternative hypothesis $H_1$ using the Cochran statistic
$$G = \frac{s_{\max}^{2*}}{s_1^{2*} + \cdots + s_L^{2*}}, \tag{21.3.3.9}$$
where $s_{\max}^{2*} = \max_{1\le i\le L} s_i^{2*}$. Cochran's test is, in general, less powerful than Bartlett's test, but it is simpler. The null hypothesis $H_0$ for a given size $\alpha$ is accepted if $G < G_\alpha$, where $G_\alpha$ is the critical value of the statistic $G$ for significance level $\alpha$. The values of $G_\alpha$ for various $\alpha$, $L$, and $\nu = n - 1$ can be found in statistical tables.

3°. Fisher's test. For $L = 2$, to test the null hypothesis $H_0$ that the variances of two samples coincide, it is most expedient to use Fisher's test based on the statistic
$$\Psi = \frac{s_2^{2*}}{s_1^{2*}}, \tag{21.3.3.10}$$
where $s_1^{2*}$ and $s_2^{2*}$ are the adjusted sample variances of the two samples. Under $H_0$, the statistic $\Psi$ has the $F$-distribution (Fisher–Snedecor distribution) with $n_2 - 1$ and $n_1 - 1$ degrees of freedom. The one-sided Fisher test verifies the null hypothesis $H_0$: $\sigma_1^2 = \sigma_2^2$ against the alternative hypothesis $H_1$: $\sigma_1^2 < \sigma_2^2$; the critical region of the one-sided Fisher test for a given size $\alpha$ is determined by the inequality $\Psi > \Psi_{1-\alpha}(n_2 - 1, n_1 - 1)$. The two-sided Fisher test verifies the null hypothesis $H_0$: $\sigma_1^2 = \sigma_2^2$ against the alternative hypothesis $H_1$: $\sigma_1^2 \ne \sigma_2^2$; the acceptance region of the two-sided Fisher test for a given size $\alpha$ is determined by the two inequalities $\Psi_{\alpha/2} < \Psi < \Psi_{1-\alpha/2}$, where $\Psi_\alpha$ is the $\alpha$-quantile of the $F$-distribution with parameters $n_2 - 1$ and $n_1 - 1$.
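A minimal sketch (with illustrative data and standard SciPy calls) of how the tests of Subsections 21.3.3-3 and 21.3.3-4 combine in practice: first check equality of variances with Bartlett's statistic (21.3.3.8), then compare means with the pooled statistic (21.3.3.7).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=30)   # sample from population X (illustrative)
y = rng.normal(10.5, 2.0, size=25)   # sample from population Y (illustrative)

# Bartlett's test for equal variances; scipy computes the corrected ratio B of Pt 1.
B, p_var = stats.bartlett(x, y)
print(f"Bartlett: B = {B:.3f}, p = {p_var:.3f}")

# If equality of variances is not rejected, use the pooled-variance statistic T
# of (21.3.3.7); scipy's ttest_ind with equal_var=True implements the same test.
n, k = x.size, y.size
sp2 = ((n - 1) * x.var(ddof=1) + (k - 1) * y.var(ddof=1)) / (n + k - 2)
T = (x.mean() - y.mean()) / np.sqrt(sp2 * (1.0 / n + 1.0 / k))
T_scipy, p_mean = stats.ttest_ind(x, y, equal_var=True)
print(f"T = {T:.3f} (scipy: {T_scipy:.3f}), two-sided p = {p_mean:.3f}")
```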
21.3.3-5. Sample correlation.

Suppose that a sample $X_1, \dots, X_n$ is two-dimensional and its elements $X_i = (X_{i1}, X_{i2})$ are two-dimensional random variables with joint normal distribution with means $a_1$ and $a_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and correlation $r$. It is required to test the hypothesis that the components $X^{(1)}$ and $X^{(2)}$ of the vector $X$ are independent, i.e., to test the hypothesis that the correlation is zero.

The estimation of the correlation $r$ is based on the sample correlation
$$r^* = \frac{\sum_{i=1}^n (X_{i1} - m_1^*)(X_{i2} - m_2^*)}{\sqrt{\sum_{i=1}^n (X_{i1} - m_1^*)^2 \sum_{i=1}^n (X_{i2} - m_2^*)^2}}, \tag{21.3.3.11}$$
where
$$m_1^* = \frac{1}{n}\sum_{i=1}^n X_{i1}, \qquad m_2^* = \frac{1}{n}\sum_{i=1}^n X_{i2}.$$
Under the null hypothesis, the distribution of the statistic $r^*$ depends only on the sample size $n$; the statistic $r^*$ itself is a consistent asymptotically efficient estimator of the correlation $r$. The null hypothesis $H_0$: $r = 0$ ($X^{(1)}$ and $X^{(2)}$ are independent) is accepted against the alternative hypothesis $H_1$: $r \ne 0$ ($X^{(1)}$ and $X^{(2)}$ are dependent) if the inequality $r_{\alpha/2} < r^* < r_{1-\alpha/2}$ is satisfied. Here $r_\alpha$ is the $\alpha$-quantile of the sample correlation under the assumption that the null hypothesis $H_0$ is true; the relation $r_\alpha = -r_{1-\alpha}$ holds because of symmetry.

To construct the confidence intervals, one should use the Fisher transformation
$$y = \operatorname{arctanh} r^* = \frac{1}{2}\ln\frac{1 + r^*}{1 - r^*}, \tag{21.3.3.12}$$
which, for $n > 10$, is approximately normal with parameters
$$E\{y\} \approx \frac{1}{2}\ln\frac{1 + r}{1 - r} + \frac{r}{2(n-3)}, \qquad \operatorname{Var}\{y\} \approx \frac{1}{n - 3}.$$
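A minimal sketch (illustrative data) of the Fisher-transformation test of $H_0$: $r = 0$ based on (21.3.3.12): under $H_0$, the quantity $y\sqrt{n-3}$ is approximately standard normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Illustrative bivariate normal sample with true correlation 0.3.
cov = [[1.0, 0.3], [0.3, 1.0]]
X = rng.multivariate_normal([0.0, 0.0], cov, size=100)

r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]   # sample correlation (21.3.3.11)
y = np.arctanh(r)                         # Fisher transformation (21.3.3.12)
n = X.shape[0]
z = y * np.sqrt(n - 3)                    # approximately N(0, 1) under H0: r = 0
p = 2 * stats.norm.sf(abs(z))
print(f"r* = {r:.3f}, z = {z:.3f}, two-sided p = {p:.4f}")
```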
21.3.3-6. Regression analysis.

Let $X_1, \dots, X_n$ be the results of $n$ independent observations
$$X_i = \sum_{j=1}^L \theta_j f_j(t_i) + \varepsilon_i,$$
where $f_1(t), \dots, f_L(t)$ are known functions, $\theta_1, \dots, \theta_L$ are unknown parameters, and $\varepsilon_1, \dots, \varepsilon_n$ are random errors known to be independent and normally distributed with zero mean and the same unknown variance $\sigma^2$. The regression parameters are subject to the following constraints:
1. The number of observations $n$ is greater than the number $L$ of unknown parameters.
2. The vectors
$$\mathbf{f}_i = (f_i(t_1), \dots, f_i(t_n)) \quad (i = 1, 2, \dots, L) \tag{21.3.3.13}$$
must be linearly independent.

1°. Estimation of the unknown parameters $\theta_1, \dots, \theta_L$ and construction of (one-dimensional) confidence intervals for them. To solve this problem, we consider the sum of squares
$$S^2 = S^2(\theta_1, \dots, \theta_L) = \sum_{i=1}^n [X_i - \theta_1 f_1(t_i) - \cdots - \theta_L f_L(t_i)]^2. \tag{21.3.3.14}$$
The estimators $\theta_1^*, \dots, \theta_L^*$ form a solution of the system of normal equations
$$\theta_1^* \sum_{j=1}^n f_1(t_j) f_i(t_j) + \cdots + \theta_L^* \sum_{j=1}^n f_L(t_j) f_i(t_j) = \sum_{j=1}^n X_j f_i(t_j) \quad (i = 1, \dots, L). \tag{21.3.3.15}$$
The estimators $\theta_1^*, \dots, \theta_L^*$ are linear and efficient; in particular, they are unbiased and have the minimal variance among all unbiased estimators.

Remark. If we omit the requirement that the errors $\varepsilon_1, \dots, \varepsilon_n$ be normally distributed and only assume that they are uncorrelated and have zero expectation and the same variance $\sigma^2$, then the estimators $\theta_1^*, \dots, \theta_L^*$ are linear and unbiased and have the minimal variance in the class of all linear unbiased estimators.

The confidence intervals for a given confidence level $\gamma$ for the unknown parameters $\theta_1, \dots, \theta_L$ have the form
$$|\theta_i - \theta_i^*| < t_{(1+\gamma)/2}\sqrt{c_i^2 s_0^{2*}}, \tag{21.3.3.16}$$
where $t_\gamma$ is the $\gamma$-quantile of the $t$-distribution with $n - L$ degrees of freedom,
$$s_0^{2*} = \frac{1}{n - L}\min_{\theta_1, \dots, \theta_L} S^2(\theta_1, \dots, \theta_L) = \frac{S^2(\theta_1^*, \dots, \theta_L^*)}{n - L}, \qquad c_i^2 = \sum_{j=1}^n c_{ij}^2,$$
and $c_{ij}$ are the coefficients in the representation $\theta_i^* = \sum_{j=1}^n c_{ij} X_j$.

System (21.3.3.15) can be solved in the simplest way if the vectors (21.3.3.13) are orthogonal. In this case, system (21.3.3.15) splits into the separate equations
$$\theta_i^* \sum_{j=1}^n f_i^2(t_j) = \sum_{j=1}^n X_j f_i(t_j).$$
Then the estimators $\theta_1^*, \dots, \theta_L^*$ are independent, linear, and efficient.

Remark. If we omit the requirement that the errors $\varepsilon_1, \dots, \varepsilon_n$ be normally distributed, then the estimators $\theta_1^*, \dots, \theta_L^*$ are uncorrelated, linear, and unbiased and have the minimal variance in the class of all linear unbiased estimators.

2°. Testing the hypothesis that some $\theta_i$ are zero. Suppose that it is required to test the null hypothesis $H_0$: $\theta_{k+1} = \cdots = \theta_L = 0$ ($0 \le k < L$). This problem can be solved using the statistic
$$\Psi = \frac{s_1^{2*}}{s_0^{2*}},$$
where
$$s_0^{2*} = \frac{1}{n - L}\min_{\theta_1, \dots, \theta_L} S^2(\theta_1, \dots, \theta_L), \qquad s_1^{2*} = \frac{S_2^2 - (n - L)s_0^{2*}}{L - k}, \qquad S_2^2 = \min_{\theta_1, \dots, \theta_k} S^2(\theta_1, \dots, \theta_k, 0, \dots, 0).$$
The hypothesis $H_0$ for a given size $\alpha$ is accepted if $\Psi < \Psi_{1-\alpha}$, where $\Psi_\alpha$ is the $\alpha$-quantile of the $F$-distribution with parameters $L - k$ and $n - L$.

3°. Finding the estimator $x^*(t)$ of the regression $x(t) = \sum_{i=1}^L \theta_i f_i(t)$ at an arbitrary time and the construction of confidence intervals. The estimator $x^*(t)$ of the regression $x(t)$ is obtained if the $\theta_i$ in $x(t)$ are replaced by their estimators:
$$x^*(t) = \sum_{i=1}^L \theta_i^* f_i(t).$$
The estimator $x^*(t)$ is a linear, efficient, normally distributed, and unbiased estimator of the regression $x(t)$. The confidence interval of confidence level $\gamma$ is given by the inequality
$$|x(t) - x^*(t)| < t_{(1+\gamma)/2}\, c(t)\sqrt{s_0^{2*}}, \tag{21.3.3.17}$$
where
$$c^2(t) = \sum_{j=1}^n \Big[\sum_{i=1}^L c_{ij} f_i(t)\Big]^2$$
and $t_\gamma$ is the $\gamma$-quantile of the $t$-distribution with $n - L$ degrees of freedom.

Example 1. Consider a linear regression $x(t) = \theta_1 + \theta_2 t$.

1°. The estimators $\theta_1^*$ and $\theta_2^*$ of the unknown parameters $\theta_1$ and $\theta_2$ are given by the formulas
$$\theta_1^* = \sum_{j=1}^n c_{1j} X_j, \qquad \theta_2^* = \sum_{j=1}^n c_{2j} X_j,$$
where
$$c_{1j} = \frac{\sum_{k=1}^n t_k^2 - t_j \sum_{k=1}^n t_k}{n\sum_{k=1}^n t_k^2 - \left(\sum_{k=1}^n t_k\right)^2}, \qquad c_{2j} = \frac{n t_j - \sum_{k=1}^n t_k}{n\sum_{k=1}^n t_k^2 - \left(\sum_{k=1}^n t_k\right)^2}.$$
The statistic $s_0^{2*}$, which up to the factor $(n-2)/\sigma^2$ has the $\chi^2$-distribution with $n - 2$ degrees of freedom, is determined by the formula
$$s_0^{2*} = \frac{1}{n - 2}\sum_{j=1}^n (X_j - \theta_1^* - \theta_2^* t_j)^2.$$
The confidence intervals of confidence level $\gamma$ for the parameters $\theta_i$ are given by the formula
$$|\theta_i - \theta_i^*| < t_{(1+\gamma)/2}\sqrt{c_i^2 s_0^{2*}}, \qquad i = 1, 2,$$
where $t_\gamma$ is the $\gamma$-quantile of the $t$-distribution with $n - 2$ degrees of freedom.

2°. We test the null hypothesis $H_0$: $\theta_2 = 0$, i.e., the hypothesis that $x(t)$ is independent of time. The value of $S_2^2$ is given by the formula
$$S_2^2 = \sum_{i=1}^n (X_i - m^*)^2, \qquad m^* = \frac{1}{n}\sum_{i=1}^n X_i,$$
and the value of $s_1^{2*}$ is given by the formula
$$s_1^{2*} = S_2^2 - (n - 2)s_0^{2*}.$$
Thus, the hypothesis $H_0$ for a given size $\alpha$ is accepted if $\Psi = s_1^{2*}/s_0^{2*} < \Psi_{1-\alpha}$, where $\Psi_\alpha$ is the $\alpha$-quantile of the $F$-distribution with parameters $1$ and $n - 2$.

3°. The estimator $x^*(t)$ of the regression $x(t)$ has the form
$$x^*(t) = \theta_1^* + \theta_2^* t.$$
The coefficient $c(t)$ is determined by the formula
$$c^2(t) = \sum_{j=1}^n (c_{1j} + c_{2j} t)^2 = \sum_{j=1}^n c_{1j}^2 + 2t\sum_{j=1}^n c_{1j} c_{2j} + t^2 \sum_{j=1}^n c_{2j}^2 = b_0 + b_1 t + b_2 t^2.$$
Thus the boundaries of the confidence interval for a given confidence level $\gamma$ are given by the formulas
$$x_L^*(t) = \theta_1^* + \theta_2^* t - t_{(1+\gamma)/2}\sqrt{s_0^{2*}(b_0 + b_1 t + b_2 t^2)},$$
$$x_R^*(t) = \theta_1^* + \theta_2^* t + t_{(1+\gamma)/2}\sqrt{s_0^{2*}(b_0 + b_1 t + b_2 t^2)},$$
where $t_\gamma$ is the $\gamma$-quantile of the $t$-distribution with $n - 2$ degrees of freedom.
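A minimal sketch of Example 1 with illustrative data. Here numpy.linalg.lstsq minimizes the sum of squares (21.3.3.14), which is equivalent to solving the normal equations (21.3.3.15), and the identity $c_i^2 = [(\mathbf{F}^\top\mathbf{F})^{-1}]_{ii}$ is used for the coefficients in (21.3.3.16).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 40
t = np.linspace(0.0, 10.0, n)
X = 1.0 + 0.5 * t + rng.normal(0.0, 0.8, size=n)  # illustrative data, true theta = (1.0, 0.5)

# Design matrix F with columns f_1(t) = 1 and f_2(t) = t; least squares
# minimizes S^2 of (21.3.3.14).
F = np.column_stack([np.ones(n), t])
theta, _, _, _ = np.linalg.lstsq(F, X, rcond=None)

L = 2
resid = X - F @ theta
s0_sq = resid @ resid / (n - L)               # s_0^{2*}

# c_i^2 = sum_j c_ij^2 is the i-th diagonal entry of (F^T F)^{-1}.
c_sq = np.diag(np.linalg.inv(F.T @ F))
gamma = 0.95
t_q = stats.t.ppf((1 + gamma) / 2, df=n - L)  # t_{(1+gamma)/2}
half_width = t_q * np.sqrt(c_sq * s0_sq)      # bound in (21.3.3.16)
for i in range(L):
    print(f"theta_{i+1}* = {theta[i]:.3f} +/- {half_width[i]:.3f}")
```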
21.3.3-7. Analysis of variance.

Analysis of variance is a statistical method for clarifying the influence of several factors on experimental results and for planning subsequent experiments.

1°. The simplest problem of analysis of variance. Suppose that there are $L$ independent samples
$$X_{11}, \dots, X_{1n_1};\ X_{21}, \dots, X_{2n_2};\ \dots;\ X_{L1}, \dots, X_{Ln_L}, \tag{21.3.3.18}$$
drawn from normal populations with unknown expectations $a_1, \dots, a_L$ and unknown but equal variances $\sigma^2$. It is necessary to test the null hypothesis $H_0$: $a_1 = \cdots = a_L$ that all theoretical expectations $a_i$ are the same against the alternative hypothesis $H_1$ that some theoretical expectations are different.

The intragroup variances are determined by the formulas
$$s_{(i)}^{2*} = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (X_{ij} - m_i^*)^2 \quad (i = 1, 2, \dots, L), \tag{21.3.3.19}$$
where $m_i^* = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$ is the sample mean of the corresponding sample. The random variable $(n_i - 1)s_{(i)}^{2*}/\sigma^2$ has the chi-square distribution with $n_i - 1$ degrees of freedom. An unbiased estimator of the unknown variance $\sigma^2$ is given by the statistic
$$s_0^{2*} = \frac{\sum_{i=1}^L (n_i - 1)s_{(i)}^{2*}}{\sum_{i=1}^L (n_i - 1)}, \tag{21.3.3.20}$$
called the residual variance.

The intergroup sample variance is defined to be the statistic
$$s_1^{2*} = \frac{1}{L - 1}\sum_{i=1}^L n_i (m_i^* - m^*)^2, \tag{21.3.3.21}$$
where $m^*$ is the common sample mean of the pooled sample. The statistic $s_1^{2*}$ is independent of $s_0^{2*}$ and, under $H_0$, is an unbiased estimator of the unknown variance $\sigma^2$. The random variable $s_0^{2*}\sum_{i=1}^L (n_i - 1)/\sigma^2$ is distributed by the chi-square law with $\sum_{i=1}^L (n_i - 1)$ degrees of freedom, and, under $H_0$, the random variable $s_1^{2*}(L - 1)/\sigma^2$ is distributed by the chi-square law with $L - 1$ degrees of freedom. According to the one-sided Fisher test, the null hypothesis $H_0$ must be accepted for a given confidence level $\gamma$ if $\Psi = s_1^{2*}/s_0^{2*} < \Psi_\gamma$, where $\Psi_\gamma$ is the $\gamma$-quantile of the $F$-distribution with parameters $L - 1$ and $\sum_{i=1}^L (n_i - 1)$.

2°. Multifactor analysis of variance. We consider two-factor analysis of variance. Suppose that the first factor acts at $L_1$ levels and the second factor acts at $L_2$ levels (the two-factor $(L_1, L_2)$-level model of analysis of variance), and suppose that we have $n_{ij}$ observations in which the first factor acted at the $i$th level and the second factor acted at the $j$th level. The observation results $X_{ijk}$ are independent normally distributed random variables with the same (unknown) variance $\sigma^2$ and unknown expectations $a_{ij}$. It is required to test the null hypothesis $H_0$ that the first and second factors do not affect the results of observations, i.e., that all $a_{ij}$ are the same. The action of two factors at $L_1$ and $L_2$ levels is identified with the action of a single factor at $L_1 L_2$ levels; then to test the hypothesis $H_0$, it is expedient to use the one-factor $L_1 L_2$-level model. The statistics $s_0^{2*}$ and $s_1^{2*}$ are determined by the formulas
$$s_0^{2*} = \frac{\sum_{i=1}^{L_1}\sum_{j=1}^{L_2}\sum_{k=1}^{n_{ij}} (X_{ijk} - m_{ij}^*)^2}{\sum_{i=1}^{L_1}\sum_{j=1}^{L_2} (n_{ij} - 1)}, \qquad s_1^{2*} = \frac{1}{L_1 L_2 - 1}\sum_{i=1}^{L_1}\sum_{j=1}^{L_2} n_{ij}(m_{ij}^* - m^*)^2. \tag{21.3.3.22}$$
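A minimal sketch (illustrative data) of the one-way analysis of variance of Pt 1°, computing $s_0^{2*}$ and $s_1^{2*}$ directly and cross-checking the ratio $\Psi$ against scipy.stats.f_oneway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Three illustrative groups with a common variance and (possibly) different means.
groups = [rng.normal(mu, 1.0, size=n) for mu, n in [(5.0, 12), (5.4, 15), (6.1, 10)]]

L = len(groups)
ns = np.array([g.size for g in groups])
means = np.array([g.mean() for g in groups])
m = np.concatenate(groups).mean()                 # pooled sample mean m*

# Residual (intragroup) variance (21.3.3.20) and intergroup variance (21.3.3.21).
s0_sq = sum((n - 1) * g.var(ddof=1) for g, n in zip(groups, ns)) / (ns - 1).sum()
s1_sq = (ns * (means - m) ** 2).sum() / (L - 1)
Psi = s1_sq / s0_sq                               # F-ratio with (L-1, sum(n_i - 1)) d.o.f.

F, p = stats.f_oneway(*groups)
print(f"Psi = {Psi:.3f} (scipy F = {F:.3f}), p = {p:.4f}")
```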