20.2. RANDOM VARIABLES AND THEIR CHARACTERISTICS

Figure 20.9. Probability density (a) and cumulative distribution (b) functions of the normal distribution for $a = 1$, $\sigma = 1/4$.

20.2.4-3. Normal distribution.

A random variable $X$ has the normal distribution with parameters $(a, \sigma^2)$ (see Fig. 20.9a) if its probability density function has the form

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-a)^2}{2\sigma^2}\right], \quad x \in (-\infty, \infty). \tag{20.2.4.5}$$

The cumulative distribution function (see Fig. 20.9b) and the characteristic function have the form

$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} \exp\left[-\frac{(t-a)^2}{2\sigma^2}\right] dt, \quad f(t) = \exp\left(iat - \frac{\sigma^2 t^2}{2}\right), \tag{20.2.4.6}$$

and the numerical characteristics are given by the formulas

$$E\{X\} = a, \quad \operatorname{Var}\{X\} = \sigma^2, \quad \operatorname{Mode}\{X\} = \operatorname{Med}\{X\} = a, \quad \gamma_1 = 0, \quad \gamma_2 = 0,$$

$$\mu_k = \begin{cases} 0, & k = 2m - 1, \quad m = 1, 2, \dots, \\ (k-1)!!\,\sigma^{k}, & k = 2m, \quad m = 1, 2, \dots \end{cases}$$

The linear transformation $Y = (X - a)/\sigma$ reduces the normal distribution with parameters $(a, \sigma^2)$ and cumulative distribution function $F(x)$ to the standard normal distribution with parameters $(0, 1)$ and cumulative distribution function

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt; \tag{20.2.4.7}$$

moreover, $\Phi(-x) = 1 - \Phi(x)$.

Remark 1. The values of the cumulative distribution function $\Phi(x)$ of the standard normal distribution are computed by the function NORMSDIST(z) in EXCEL software; for example, for $\Phi(2)$, the function call NORMSDIST(2) returns the value 0.9772.

Remark 2. The values of the cumulative distribution function $F(x)$ of the normal distribution are computed by the function pnorm(x,mu,sigma) in MATHCAD software; to compute $\Phi(x)$, one should use pnorm(x,0,1). For example, $\Phi(2)$ = pnorm(2,0,1) = 0.9772.
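The reduction of $F(x)$ to $\Phi(x)$ can also be sketched in code. The following minimal Python example relies on the standard identity $\Phi(x) = \tfrac12\,[1 + \operatorname{erf}(x/\sqrt{2})]$; the function names `std_normal_cdf` and `normal_cdf` are illustrative choices and do not come from the text.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x) for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_cdf(x: float, a: float, sigma: float) -> float:
    """F(x) for the normal distribution with parameters (a, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # The substitution y = (x - a) / sigma reduces F to Phi.
    return std_normal_cdf((x - a) / sigma)


print(round(std_normal_cdf(2.0), 4))         # 0.9772
print(round(normal_cdf(1.5, 1.0, 0.25), 4))  # same value, since (1.5 - 1)/0.25 = 2
```

The second call uses the parameters of Fig. 20.9 ($a = 1$, $\sigma = 1/4$) and gives the same value as $\Phi(2)$ because the point $x = 1.5$ lies two standard deviations above the mean.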
The probability that a random variable $X$ normally distributed with parameters $(m, \sigma^2)$ lies in the interval $(a, b)$ is given by the formula

$$P(a < X < b) = P\left(\frac{a - m}{\sigma} < \frac{X - m}{\sigma} < \frac{b - m}{\sigma}\right) = \Phi\left(\frac{b - m}{\sigma}\right) - \Phi\left(\frac{a - m}{\sigma}\right). \tag{20.2.4.8}$$

PROBABILITY THEORY

A normally distributed random variable takes values close to its expectation with large probability; this is expressed by the sigma rule

$$P(|X - m| \ge k\sigma) = 2[1 - \Phi(k)] = \begin{cases} 0.3173 & \text{for } k = 1, \\ 0.0455 & \text{for } k = 2, \\ 0.0027 & \text{for } k = 3. \end{cases}$$

The three-sigma rule is most frequently used.

The fundamental role of the normal distribution is due to the fact that, under mild assumptions, the distribution of a sum of random variables is asymptotically normal as the number of terms increases. The corresponding conditions are given in the central limit theorem.

20.2.4-4. Cauchy distribution.

A random variable $X$ obeys the Cauchy distribution with parameters $(a, \lambda)$ ($\lambda > 0$) (see Fig. 20.10a) if

$$p(x) = \frac{\lambda}{\pi[\lambda^2 + (x - a)^2]}, \quad x \in (-\infty, \infty). \tag{20.2.4.9}$$

Figure 20.10. Probability density (a) and cumulative distribution (b) functions of the Cauchy distribution for $a = 1$, $\lambda = 4$.

The cumulative distribution function has the form (see Fig. 20.10b)

$$F(x) = \frac{1}{\pi} \arctan\frac{x - a}{\lambda} + \frac{1}{2}. \tag{20.2.4.10}$$

The numerical characteristics of a random variable that has a Cauchy distribution do not exist in the usual sense. The expectation exists only in the sense of the Cauchy principal value (see Paragraph 10.2.2-3) and is given by the formula

$$E\{X\} = \lim_{T \to \infty} \frac{\lambda}{\pi} \int_{-T}^{T} \frac{x\, dx}{\lambda^2 + (x - a)^2} = a.$$

20.2.4-5. Chi-square distribution.

A random variable $X = \chi^2(n)$ has the chi-square distribution with $n$ degrees of freedom if its probability density function has the form (see Fig. 20.11a)

$$p(x) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2} & \text{for } x > 0, \\ 0 & \text{for } x \le 0. \end{cases} \tag{20.2.4.11}$$

Figure 20.11.
Probability density (a) and cumulative distribution (b) functions of the chi-square distribution for $n = 1$ (curve 1), $n = 2$ (curve 2), and $n = 3$ (curve 3).

The cumulative distribution function can be written as (see Fig. 20.11b)

$$F(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_{0}^{x} \xi^{n/2 - 1} e^{-\xi/2}\, d\xi, \tag{20.2.4.12}$$

where $\Gamma(x)$ is the gamma function.

Remark. The values $\chi^2(x, n)$ of the cumulative distribution function of the chi-square distribution with $n$ degrees of freedom can be obtained using the expression 1 – CHIDIST(x; deg freedom) in EXCEL software. For example, for the chi-square distribution with 10 degrees of freedom at the point $x = 2$, one gets $\chi^2(2, 10)$ = 1 – CHIDIST(2; 10) = 0.0037. A similar result is obtained if we use the function pchisq(x,n) in MATHCAD software: $\chi^2(2, 10)$ = pchisq(2, 10) = 0.0037.

Main property of the chi-square distribution. For an arbitrary $n$, the sum

$$X = \sum_{k=1}^{n} X_k^2$$

of squares of independent random variables obeying the standard normal distribution has the chi-square distribution with $n$ degrees of freedom.

THEOREM ON DECOMPOSITION. Suppose that the sum $\sum_{k=1}^{n} X_k^2$ of squares of independent standard normally distributed random variables is expressed as the sum of $L$ quadratic forms $y_j(X_1, \dots, X_n)$ of ranks $n_j$, respectively. The variables $y_1, \dots, y_L$ are independent and obey the chi-square distributions with $n_1, \dots, n_L$ degrees of freedom if and only if $n_1 + \dots + n_L = n$.

THEOREM ON ADDITION, OR STABILITY PROPERTY. The sum of $L$ independent random variables $y_1, \dots, y_L$ obeying the chi-square distributions with $n_1, \dots, n_L$ degrees of freedom, respectively, has the chi-square distribution with $n = n_1 + \dots + n_L$ degrees of freedom.

The characteristic function has the form

$$f(t) = (1 - 2it)^{-n/2},$$

and the numerical characteristics are given by the formulas

$$E\{\chi^2(n)\} = n, \quad \operatorname{Var}\{\chi^2(n)\} = 2n, \quad \alpha_k = n(n + 2) \cdots [n + 2(k - 1)],$$

$$\gamma_1 = 2\sqrt{\frac{2}{n}}, \quad \gamma_2 = \frac{12}{n}, \quad \operatorname{Mode}\{\chi^2(n)\} = n - 2 \quad (n \ge 2).$$

Relationship with other distributions:
1.
For $n = 1$, formula (20.2.4.11) gives the probability density function of the square $X^2$ of a random variable with the standard normal distribution.
2. For $n = 2$, formula (20.2.4.11) gives the exponential distribution with parameter $\lambda = \frac{1}{2}$.
3. As $n \to \infty$, the random variable $X = \chi^2(n)$ has an asymptotically normal distribution with parameters $(n, 2n)$.
4. As $n \to \infty$, the random variable $\sqrt{2\chi^2(n)}$ has an asymptotically normal distribution with parameters $(\sqrt{2n - 1}, 1)$.

For the quantiles (denoted by $\chi^2_\gamma$ or $\chi^2_\gamma(n)$), one has the approximation formula

$$\chi^2_\gamma(n) \approx \frac{1}{2}\left(\sqrt{2n - 1} + t_\gamma\right)^2 \quad (n \ge 30),$$

where $t_\gamma$ is the quantile of the standard normal distribution. For $\gamma$ close to 0 or 1, it is more expedient to use the approximation given by the formula

$$\chi^2_\gamma(n) \approx n\left(1 - \frac{2}{9n} + t_\gamma \sqrt{\frac{2}{9n}}\right)^3.$$

The quantiles $\chi^2_\gamma(n)$ are tabulated; they can also be computed in EXCEL, MATHCAD, and other software.

Remark. Tables often list $\chi^2_{1-\gamma}(n)$ rather than $\chi^2_\gamma(n)$.

20.2.4-6. Student’s t-distribution.

A random variable $X = t(n)$ has Student’s distribution (t-distribution) with $n$ degrees of freedom ($n > 0$) if its probability density function has the form (see Fig. 20.12a)

$$p(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}, \quad x \in (-\infty, \infty), \tag{20.2.4.13}$$

where $\Gamma(x)$ is the gamma function.

Figure 20.12. Probability density (a) and cumulative distribution (b) functions of Student’s t-distribution for $n = 3$.

The cumulative distribution function has the form (see Fig. 20.12b)

$$F(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \int_{-\infty}^{x} \left(1 + \frac{\xi^2}{n}\right)^{-\frac{n+1}{2}} d\xi. \tag{20.2.4.14}$$

Remark. The values of Student’s distribution function $t(n)$ with $n$ degrees of freedom can be computed, for example, by using the function pt(x,n) in MATHCAD software.

Main property of Student’s distribution.
If $\eta$ and $\chi^2(n)$ are independent random variables and $\eta$ has the standard normal distribution, then the random variable

$$t(n) = \eta \sqrt{\frac{n}{\chi^2(n)}}$$

has Student’s distribution with $n$ degrees of freedom.

The numerical characteristics are given by the formulas

$$E\{t(n)\} = 0 \quad (n > 1), \qquad \operatorname{Var}\{t(n)\} = \begin{cases} \dfrac{n}{n - 2} & \text{for } n > 2, \\ \infty & \text{for } 1 < n \le 2, \end{cases}$$

$$\operatorname{Mode}\{t(n)\} = \operatorname{Med}\{t(n)\} = 0 \quad (n > 1),$$

$$\alpha_{2k-1} = 0, \quad \alpha_{2k} = \frac{n^k\, \Gamma(n/2 - k)\, \Gamma(k + 1/2)}{\sqrt{\pi}\, \Gamma(n/2)} \quad (2k < n),$$

$$\gamma_1 = 0, \quad \gamma_2 = \frac{6}{n - 4} \quad (n > 4).$$

Relationship with other distributions:
1. For $n = 1$, Student’s distribution coincides with the Cauchy distribution (with $a = 0$, $\lambda = 1$).
2. As $n \to \infty$, Student’s distribution is asymptotically normal with parameters $(0, 1)$.

The quantiles of Student’s distribution are denoted by $t_\gamma(n)$ and satisfy

$$t_\gamma(n) = -t_{1-\gamma}(n), \quad |t|_{1-\gamma}(n) = t_{1-\gamma/2}(n),$$

where $|t|_{1-\gamma}(n)$ denotes the quantile of level $1 - \gamma$ of the random variable $|t(n)|$. The quantiles $t_\gamma(n)$ are tabulated; they can also be computed in EXCEL, MATHCAD, and other software.

Student’s distribution is used when testing the hypothesis about the mean of a normally distributed population with unknown variance.

20.2.5. Multivariate Random Variables

20.2.5-1. Distribution of bivariate random variable.

Suppose that random variables $X_1, \dots, X_n$ are defined on a probability space $(\Omega, \mathcal{F}, P)$; then one says that an $n$-dimensional random vector $X = (X_1, \dots, X_n)$ or a system of random variables is given. The random variables $X_1, \dots, X_n$ can be viewed as the coordinates of points in an $n$-dimensional space.

The distribution function $F(x_1, x_2) = F_{X_1,X_2}(x_1, x_2)$ of a two-dimensional random vector $(X_1, X_2)$, or the joint distribution function of the random variables $X_1$ and $X_2$, is defined as the probability of the simultaneous occurrence (intersection) of the events $(X_1 < x_1)$ and $(X_2 < x_2)$; i.e.,

$$F(x_1, x_2) = F_{X_1,X_2}(x_1, x_2) = P(X_1 < x_1, X_2 < x_2). \tag{20.2.5.1}$$

Geometrically, $F(x_1, x_2)$ can be interpreted as the probability that the random point $(X_1, X_2)$ lies in the lower left infinite quadrant with vertex $(x_1, x_2)$ (see Fig. 20.13).
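The quadrant interpretation of definition (20.2.5.1) can be made concrete with a small sketch. The Python example below replaces the probability $P$ by the relative frequency in a finite sample, which is an assumption of this illustration and not a construction from the text; the function name `empirical_joint_cdf` and the sample points are hypothetical.

```python
from typing import Iterable, Tuple


def empirical_joint_cdf(points: Iterable[Tuple[float, float]],
                        x1: float, x2: float) -> float:
    """Fraction of sample points lying in the open lower left quadrant
    with vertex (x1, x2), i.e. satisfying X1 < x1 and X2 < x2."""
    pts = list(points)
    if not pts:
        raise ValueError("the sample must be nonempty")
    hits = sum(1 for (u, v) in pts if u < x1 and v < x2)
    return hits / len(pts)


# Hypothetical sample of four equally likely points.
sample = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]

print(empirical_joint_cdf(sample, 2.5, 2.5))  # 0.75
print(empirical_joint_cdf(sample, 1.0, 1.0))  # 0.25
```

The inequalities are strict, as in (20.2.5.1), so a point lying exactly on the boundary of the quadrant is not counted; this is why only $(0, 0)$ contributes in the second call.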
Given the joint distribution of random variables $X_1$ and $X_2$, one can find the distributions of each of the random variables $X_1$ and $X_2$, known as the marginal distributions:

$$\begin{aligned} F_{X_1}(x_1) &= P(X_1 < x_1) = P(X_1 < x_1, X_2 < +\infty) = F_{X_1,X_2}(x_1, +\infty), \\ F_{X_2}(x_2) &= P(X_2 < x_2) = P(X_1 < +\infty, X_2 < x_2) = F_{X_1,X_2}(+\infty, x_2). \end{aligned} \tag{20.2.5.2}$$

Figure 20.13. Geometrical interpretation of the distribution function $F_{X_1,X_2}(x_1, x_2)$.

The marginal distributions do not completely characterize the two-dimensional random variable $(X_1, X_2)$; i.e., the joint distribution of the random variables $X_1$ and $X_2$ cannot in general be reconstructed from the marginal distributions.

Properties of the joint distribution function of random variables $X_1$ and $X_2$:
1. The function $F(x_1, x_2)$ is a nondecreasing function of each of the arguments.
2. $F(x_1, -\infty) = F(-\infty, x_2) = F(-\infty, -\infty) = 0$.
3. $F(+\infty, +\infty) = 1$.
4. The probability that the random vector lies in a rectangle with sides parallel to the coordinate axes is
$$P(a_1 \le X_1 < b_1, a_2 \le X_2 < b_2) = F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2).$$
5. The function $F(x_1, x_2)$ is left continuous in each of the arguments.

20.2.5-2. Discrete bivariate random variables.

A bivariate random variable $(X_1, X_2)$ is said to be discrete if each of the random variables $X_1$ and $X_2$ is discrete. If the random variable $X_1$ takes the values $x_{11}, \dots, x_{1m}$ and the random variable $X_2$ takes the values $x_{21}, \dots, x_{2n}$, then the random vector $(X_1, X_2)$ can take only the pairs of values $(x_{1i}, x_{2j})$ ($i = 1, \dots, m$, $j = 1, \dots, n$). It is convenient to describe the distribution of a bivariate discrete random variable using the distribution matrix shown in Fig. 20.14.

Figure 20.14. Distribution matrix.
The entries $p_{ij} = P(X_1 = x_{1i}, X_2 = x_{2j})$ of the distribution matrix are the probabilities of the simultaneous occurrence of the events $(X_1 = x_{1i})$ and $(X_2 = x_{2j})$; $P_{X_1,i} = p_{i1} + \dots + p_{in}$ is the probability that the random variable $X_1$ takes the value $x_{1i}$; $P_{X_2,j} = p_{1j} + \dots + p_{mj}$ is the probability that the random variable $X_2$ takes the value $x_{2j}$; the last column (resp., row) shows the distribution of the random variable $X_1$ (resp., $X_2$).

The distribution function of a discrete bivariate random variable can be determined by the formula

$$F(x_1, x_2) = \sum_{x_{1i} < x_1} \sum_{x_{2j} < x_2} p_{ij}. \tag{20.2.5.3}$$

20.2.5-3. Continuous bivariate random variables.

A bivariate random variable $(X_1, X_2)$ is said to be continuous if its joint distribution function $F(x_1, x_2)$ can be represented as

$$F(x_1, x_2) = \int_{-\infty}^{x_2} \int_{-\infty}^{x_1} p(y_1, y_2)\, dy_1\, dy_2, \tag{20.2.5.4}$$

where the joint probability function $p(x_1, x_2) = p_{X_1,X_2}(x_1, x_2)$ is piecewise continuous. The joint probability function can be expressed via the joint distribution function as follows:

$$p_{X_1,X_2}(x_1, x_2) = p(x_1, x_2) = F_{x_1 x_2}(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1\, \partial x_2}. \tag{20.2.5.5}$$

Formulas (20.2.5.4) and (20.2.5.5) establish a one-to-one correspondence (up to sets of probability zero) between the joint probability functions and the joint distribution functions of continuous bivariate random variables.

The differential $p(x_1, x_2)\, dx_1\, dx_2$ is called a probability element. Up to higher-order infinitesimals, the probability element is equal to the probability for the random variable $(X_1, X_2)$ to lie in the infinitesimal rectangle $(x_1, x_1 + \Delta x_1) \times (x_2, x_2 + \Delta x_2)$.
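For the discrete case, the row and column sums of the distribution matrix and formula (20.2.5.3) can be sketched in code. The following Python example uses a hypothetical $2 \times 3$ distribution matrix chosen only for illustration; the names `x1_vals`, `x2_vals`, `p`, `marginal_x1`, `marginal_x2`, and `joint_cdf` are not from the text. Exact fractions are used so that the sums can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical values taken by X1 (rows) and X2 (columns).
x1_vals = [0, 1]
x2_vals = [0, 1, 2]

# Hypothetical distribution matrix p[i][j] = P(X1 = x1_vals[i], X2 = x2_vals[j]).
p = [
    [Fraction(1, 8), Fraction(1, 4), Fraction(1, 8)],
    [Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)],
]


def marginal_x1(matrix):
    """Row sums: P(X1 = x1_vals[i]) = p[i][0] + ... + p[i][n-1]."""
    return [sum(row, Fraction(0)) for row in matrix]


def marginal_x2(matrix):
    """Column sums: P(X2 = x2_vals[j]) = p[0][j] + ... + p[m-1][j]."""
    return [sum(col, Fraction(0)) for col in zip(*matrix)]


def joint_cdf(x1, x2):
    """F(x1, x2): sum of p[i][j] over x1_vals[i] < x1 and x2_vals[j] < x2."""
    return sum(
        (p[i][j]
         for i, u in enumerate(x1_vals) if u < x1
         for j, v in enumerate(x2_vals) if v < x2),
        Fraction(0),
    )


print(joint_cdf(1, 2))  # 3/8
# Rectangle probability P(1 <= X1 < 2, 1 <= X2 < 3) from four values of F:
print(joint_cdf(2, 3) - joint_cdf(2, 1) - joint_cdf(1, 3) + joint_cdf(1, 1))  # 1/4
```

The last line evaluates $F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2)$ for the rectangle $1 \le X_1 < 2$, $1 \le X_2 < 3$, and the result agrees with the direct sum $p_{22} + p_{23} = 1/8 + 1/8$.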
The probability density function of the two-dimensional random variable $(X_1, X_2)$, which is also called the joint probability function of the random variables $X_1$ and $X_2$, determines the probability density functions of the random variables $X_1$ and $X_2$, which are called the marginal probability functions of the two-dimensional random variable $(X_1, X_2)$, by the formulas

$$p_{X_1}(x_1) = \int_{-\infty}^{+\infty} p_{X_1,X_2}(x_1, x_2)\, dx_2, \quad p_{X_2}(x_2) = \int_{-\infty}^{+\infty} p_{X_1,X_2}(x_1, x_2)\, dx_1. \tag{20.2.5.6}$$

In the general case, the joint probability function cannot be reconstructed from the marginal probability functions, and hence the latter do not completely characterize the bivariate random variable $(X_1, X_2)$.

Properties of the joint probability function of random variables $X_1$ and $X_2$:
1. The function $p(x_1, x_2)$ is nonnegative; i.e., $p(x_1, x_2) \ge 0$.
2. $\displaystyle \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} p(x_1, x_2)\, dx_1\, dx_2 = 1$.
3. $\displaystyle P(a_1 < X_1 < b_1, a_2 < X_2 < b_2) = \int_{a_1}^{b_1} dx_1 \int_{a_2}^{b_2} p(x_1, x_2)\, dx_2 = \int_{a_2}^{b_2} dx_2 \int_{a_1}^{b_1} p(x_1, x_2)\, dx_1$.
4. The probability for a two-dimensional random variable $(X_1, X_2)$ to lie in a domain $D \subset \mathbb{R}^2$ is numerically equal to the volume of the curvilinear cylinder with base $D$ bounded above by the surface of the joint probability function:
$$P[(X_1, X_2) \in D] = \iint_{(x_1, x_2) \in D} p_{X_1,X_2}(x_1, x_2)\, dx_1\, dx_2.$$
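The marginal formulas (20.2.5.6) and the rectangle probability in Property 3 can be checked numerically on a simple case. The sketch below assumes a hypothetical joint probability function, $p(x_1, x_2) = x_1 + x_2$ on the unit square and zero elsewhere, and integrates it with a midpoint rule; neither this density nor the helper names `density`, `midpoint`, `marginal_x1`, and `rect_prob` appear in the text. Since the density vanishes outside the unit square, the infinite limits of integration are replaced by $[0, 1]$.

```python
def density(x1: float, x2: float) -> float:
    """Hypothetical joint density: x1 + x2 on the unit square, 0 elsewhere."""
    if 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0:
        return x1 + x2
    return 0.0


def midpoint(f, lo: float, hi: float, n: int = 200) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def marginal_x1(x1: float) -> float:
    """Marginal density of X1: integrate the joint density over x2."""
    return midpoint(lambda x2: density(x1, x2), 0.0, 1.0)


def rect_prob(a1: float, b1: float, a2: float, b2: float) -> float:
    """P(a1 < X1 < b1, a2 < X2 < b2) as an iterated integral (Property 3)."""
    return midpoint(
        lambda x1: midpoint(lambda x2: density(x1, x2), a2, b2), a1, b1
    )


print(round(marginal_x1(0.3), 6))               # 0.8
print(round(rect_prob(0.0, 0.5, 0.0, 0.5), 6))  # 0.125
print(round(rect_prob(0.0, 1.0, 0.0, 1.0), 6))  # 1.0
```

For this density the exact marginal is $p_{X_1}(x_1) = x_1 + \tfrac12$ on $[0, 1]$, and the midpoint rule reproduces it up to rounding because the integrand is linear in each variable. The last call confirms the normalization in Property 2.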