Handbook of Mathematics for Engineers and Scientists, Part 157


Random variables $X_1$ and $X_2$ are said to be independent if the relation
$$P(X_1 \in S_1,\ X_2 \in S_2) = P(X_1 \in S_1)\,P(X_2 \in S_2) \qquad (20.2.5.7)$$
holds for any measurable sets $S_1$ and $S_2$.

THEOREM 1. Random variables $X_1$ and $X_2$ are independent if and only if $F_{X_1,X_2}(x_1, x_2) = F_{X_1}(x_1)\,F_{X_2}(x_2)$.

THEOREM 2. Random variables $X_1$ and $X_2$ are independent if and only if the characteristic function of the bivariate random variable $(X_1, X_2)$ is equal to the product of the characteristic functions of $X_1$ and $X_2$: $f_{X_1,X_2}(t_1, t_2) = f_{X_1}(t_1)\,f_{X_2}(t_2)$.

20.2.5-4. Numerical characteristics of bivariate random variables.

The expectation of a function $g(X_1, X_2)$ of a bivariate random variable $(X_1, X_2)$ is defined as the expression computed by the formula
$$\mathrm{E}\{g(X_1, X_2)\} = \begin{cases} \displaystyle\sum_i \sum_j g(x_{1i}, x_{2j})\,p_{ij} & \text{in the discrete case,}\\[1mm] \displaystyle\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} g(x_1, x_2)\,p(x_1, x_2)\,dx_1\,dx_2 & \text{in the continuous case,} \end{cases} \qquad (20.2.5.8)$$
if these expressions exist in the sense of absolute convergence; otherwise, one says that $\mathrm{E}\{g(X_1, X_2)\}$ does not exist.

The moment of order $r_1 + r_2$ of a two-dimensional random variable $(X_1, X_2)$ about a point $(a_1, a_2)$ is defined as the expectation $\mathrm{E}\{(X_1 - a_1)^{r_1}(X_2 - a_2)^{r_2}\}$. If $a_1 = a_2 = 0$, then the moment of order $r_1 + r_2$ is called simply the moment, or the initial moment. The initial moment of order $r_1 + r_2$ is usually denoted by $\alpha_{r_1,r_2}$; i.e., $\alpha_{r_1,r_2} = \mathrm{E}\{X_1^{r_1} X_2^{r_2}\}$.

The first initial moments are the expectations of the random variables $X_1$ and $X_2$; i.e., $\alpha_{1,0} = \mathrm{E}\{X_1^1 X_2^0\} = \mathrm{E}\{X_1\}$ and $\alpha_{0,1} = \mathrm{E}\{X_1^0 X_2^1\} = \mathrm{E}\{X_2\}$. The point $(\mathrm{E}\{X_1\}, \mathrm{E}\{X_2\})$ on the $OXY$-plane characterizes the position of the random point $(X_1, X_2)$, which spreads about this point. Obviously, the first central moments are zero.

The second initial moments are given by the formulas $\alpha_{2,0} = \alpha_2(X_1)$, $\alpha_{0,2} = \alpha_2(X_2)$, $\alpha_{1,1} = \mathrm{E}\{X_1 X_2\}$.

If $a_1 = \mathrm{E}\{X_1\}$ and $a_2 = \mathrm{E}\{X_2\}$, then the moment of order $r_1 + r_2$ of the bivariate random variable $(X_1, X_2)$ is called the central moment. The central moment of order $r_1 + r_2$ is usually denoted by $\mu_{r_1,r_2}$; i.e., $\mu_{r_1,r_2} = \mathrm{E}\{(X_1 - \mathrm{E}\{X_1\})^{r_1}(X_2 - \mathrm{E}\{X_2\})^{r_2}\}$.

The second central moments are of special interest and have special names and notation:
$$\lambda_{11} = \mu_{2,0} = \mathrm{Var}\{X_1\}, \qquad \lambda_{22} = \mu_{0,2} = \mathrm{Var}\{X_2\}, \qquad \lambda_{12} = \lambda_{21} = \mu_{1,1} = \mathrm{E}\{(X_1 - \mathrm{E}\{X_1\})(X_2 - \mathrm{E}\{X_2\})\}.$$
The first two of these moments are the variances of the respective random variables, and the third moment is called the covariance and is considered below.

20.2.5-5. Covariance and correlation of two random variables.

The covariance (correlation moment, or mixed second moment) $\mathrm{Cov}(X_1, X_2)$ of random variables $X_1$ and $X_2$ is defined as the central moment of order $(1 + 1)$:
$$\mathrm{Cov}(X_1, X_2) = \mu_{1,1} = \mathrm{E}\{(X_1 - \mathrm{E}\{X_1\})(X_2 - \mathrm{E}\{X_2\})\}. \qquad (20.2.5.9)$$

Properties of the covariance:
1. $\mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(X_2, X_1)$.
2. $\mathrm{Cov}(X, X) = \mathrm{Var}\{X\}$.
3. If the random variables $X_1$ and $X_2$ are independent, then $\mathrm{Cov}(X_1, X_2) = 0$. If $\mathrm{Cov}(X_1, X_2) \ne 0$, then the random variables $X_1$ and $X_2$ are dependent.
4. If $Y_1 = a_1 X_1 + b_1$ and $Y_2 = a_2 X_2 + b_2$, then $\mathrm{Cov}(Y_1, Y_2) = a_1 a_2\,\mathrm{Cov}(X_1, X_2)$.
5. $\mathrm{Cov}(X_1, X_2) = \mathrm{E}\{X_1 X_2\} - \mathrm{E}\{X_1\}\mathrm{E}\{X_2\}$.
6. $|\mathrm{Cov}(X_1, X_2)| \le \sqrt{\mathrm{Var}\{X_1\}\,\mathrm{Var}\{X_2\}}$. Moreover, $|\mathrm{Cov}(X_1, X_2)| = \sqrt{\mathrm{Var}\{X_1\}\,\mathrm{Var}\{X_2\}}$ if and only if the random variables $X_1$ and $X_2$ are linearly dependent.
7. $\mathrm{Var}\{X_1 + X_2\} = \mathrm{Var}\{X_1\} + \mathrm{Var}\{X_2\} + 2\,\mathrm{Cov}(X_1, X_2)$.
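These relations can be checked numerically. The following Python sketch (an added illustration, not part of the handbook, with an arbitrarily chosen joint probability table) evaluates the discrete case of formula (20.2.5.8) and verifies property 5, $\mathrm{Cov}(X_1, X_2) = \mathrm{E}\{X_1 X_2\} - \mathrm{E}\{X_1\}\mathrm{E}\{X_2\}$.

```python
# Illustration of formula (20.2.5.8) and covariance property 5
# for a small discrete bivariate distribution (values chosen arbitrarily).

x1_vals = [0, 1, 2]
x2_vals = [0, 1]
# Joint probabilities p[i][j] = P(X1 = x1_vals[i], X2 = x2_vals[j]);
# all six probabilities sum to 1.
p = [[0.10, 0.20],
     [0.25, 0.15],
     [0.05, 0.25]]

def expect(g):
    """E{g(X1, X2)} by the discrete case of (20.2.5.8)."""
    return sum(g(x1, x2) * p[i][j]
               for i, x1 in enumerate(x1_vals)
               for j, x2 in enumerate(x2_vals))

e1 = expect(lambda x1, x2: x1)          # alpha_{1,0} = E{X1}
e2 = expect(lambda x1, x2: x2)          # alpha_{0,1} = E{X2}
e12 = expect(lambda x1, x2: x1 * x2)    # alpha_{1,1} = E{X1 X2}
cov = expect(lambda x1, x2: (x1 - e1) * (x2 - e2))  # mu_{1,1} = Cov(X1, X2)

print(cov, e12 - e1 * e2)  # the two values agree (property 5)
```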
If $\mathrm{Cov}(X_1, X_2) = 0$, then the random variables $X_1$ and $X_2$ are said to be uncorrelated; if $\mathrm{Cov}(X_1, X_2) \ne 0$, then they are correlated. Independent random variables are always uncorrelated, but uncorrelated random variables are not necessarily independent.

Example 1. Suppose that we throw two dice. Let $X_1$ be the number of spots on top of the first die, and let $X_2$ be the number of spots on top of the second die. We consider the random variables $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$ (the sum and difference of the points obtained). Then
$$\mathrm{Cov}(Y_1, Y_2) = \mathrm{E}\{(X_1 + X_2 - \mathrm{E}\{X_1 + X_2\})(X_1 - X_2 - \mathrm{E}\{X_1 - X_2\})\} = \mathrm{E}\{(X_1 - \mathrm{E}\{X_1\})^2 - (X_2 - \mathrm{E}\{X_2\})^2\} = \mathrm{Var}\{X_1\} - \mathrm{Var}\{X_2\} = 0,$$
since $X_1$ and $X_2$ are identically distributed and hence $\mathrm{Var}\{X_1\} = \mathrm{Var}\{X_2\}$. But $Y_1$ and $Y_2$ are obviously dependent; for example, if $Y_1 = 2$, then one necessarily has $Y_2 = 0$.

The covariance of random variables $X_1$ and $X_2$ characterizes both the degree of their dependence on each other and their spread around the point $(\mathrm{E}\{X_1\}, \mathrm{E}\{X_2\})$. The covariance of $X_1$ and $X_2$ has dimension equal to the product of the dimensions of $X_1$ and $X_2$. Along with the covariance of $X_1$ and $X_2$, one often uses the correlation $\rho(X_1, X_2)$, which is a dimensionless normalized quantity.

The correlation (or correlation coefficient) of random variables $X_1$ and $X_2$ is the ratio of the covariance of $X_1$ and $X_2$ to the product of their standard deviations,
$$\rho(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_{X_1}\sigma_{X_2}}. \qquad (20.2.5.10)$$
The correlation of random variables $X_1$ and $X_2$ indicates the degree of linear dependence between the variables. If $\rho(X_1, X_2) = 0$, then there is no linear relation between the random variables, but there may well be some other relation between them.

Properties of the correlation:
1. $\rho(X_1, X_2) = \rho(X_2, X_1)$.
2. $\rho(X, X) = 1$.
3. If random variables $X_1$ and $X_2$ are independent, then $\rho(X_1, X_2) = 0$. If $\rho(X_1, X_2) \ne 0$, then the random variables $X_1$ and $X_2$ are dependent.
4. If $Y_1 = a_1 X_1 + b_1$ and $Y_2 = a_2 X_2 + b_2$, then $\rho(Y_1, Y_2) = \rho(X_1, X_2)$ for $a_1 a_2 > 0$ and $\rho(Y_1, Y_2) = -\rho(X_1, X_2)$ for $a_1 a_2 < 0$.
5. $|\rho(X_1, X_2)| \le 1$. Moreover, $|\rho(X_1, X_2)| = 1$ if and only if the random variables $X_1$ and $X_2$ are linearly dependent.
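The uncorrelated-but-dependent effect of Example 1 can be verified by direct enumeration. The Python sketch below (an added illustration) lists all 36 equally likely outcomes of two fair dice, confirms that $\mathrm{Cov}(Y_1, Y_2) = 0$, and shows that $Y_1 = 2$ forces $Y_2 = 0$.

```python
from itertools import product

# All 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
y1 = [x1 + x2 for x1, x2 in outcomes]   # sum of spots
y2 = [x1 - x2 for x1, x2 in outcomes]   # difference of spots

def mean(v):
    return sum(v) / len(v)

# Covariance via property 5: E{Y1 Y2} - E{Y1} E{Y2}.
cov = mean([a * b for a, b in zip(y1, y2)]) - mean(y1) * mean(y2)
print(cov)  # 0.0: Y1 and Y2 are uncorrelated

# ...but they are dependent: among outcomes with Y1 = 2, only Y2 = 0 occurs.
print({b for a, b in zip(y1, y2) if a == 2})  # {0}
```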
20.2.5-6. Conditional distributions.

The joint distribution of random variables $X_1$ and $X_2$ determines the conditional distribution of one of the random variables given that the other random variable takes a certain value (or lies in a certain interval).

If the joint distribution is discrete, then the conditional distributions of $X_1$ and $X_2$ are also discrete. The conditional distributions are described by the formulas
$$P_{1|2}(x_{1i}|x_{2j}) = P(X_1 = x_{1i}\,|\,X_2 = x_{2j}) = \frac{P(X_1 = x_{1i},\ X_2 = x_{2j})}{P(X_2 = x_{2j})} = \frac{p_{ij}}{p_{X_2,j}},$$
$$P_{2|1}(x_{2j}|x_{1i}) = P(X_2 = x_{2j}\,|\,X_1 = x_{1i}) = \frac{P(X_1 = x_{1i},\ X_2 = x_{2j})}{P(X_1 = x_{1i})} = \frac{p_{ij}}{p_{X_1,i}},$$
$$i = 1, \dots, m; \quad j = 1, \dots, n. \qquad (20.2.5.11)$$
The probabilities $P_{1|2}(x_{1i}|x_{2j})$, $i = 1, \dots, m$, define the conditional probability mass function of the random variable $X_1$ given $X_2 = x_{2j}$; and the probabilities $P_{2|1}(x_{2j}|x_{1i})$, $j = 1, \dots, n$, define the conditional probability mass function of the random variable $X_2$ given $X_1 = x_{1i}$. These conditional probability mass functions have the properties of ordinary probability mass functions; for example, the sum of the probabilities in each of them is equal to 1:
$$\sum_i P_{1|2}(x_{1i}|x_{2j}) = \sum_j P_{2|1}(x_{2j}|x_{1i}) = 1.$$

If the joint distribution is continuous, then the conditional distributions of the random variables $X_1$ and $X_2$ are also continuous and are described by the conditional probability density functions
$$p_{1|2}(x_1|x_2) = \frac{p_{X_1,X_2}(x_1, x_2)}{p_{X_2}(x_2)}, \qquad p_{2|1}(x_2|x_1) = \frac{p_{X_1,X_2}(x_1, x_2)}{p_{X_1}(x_1)}. \qquad (20.2.5.12)$$

The conditional distributions of the random variables $X_1$ and $X_2$ can also be described by the conditional cumulative distribution functions
$$F_{X_2}(x_2\,|\,X_1 = x_1) = P(X_2 < x_2\,|\,X_1 = x_1), \qquad F_{X_1}(x_1\,|\,X_2 = x_2) = P(X_1 < x_1\,|\,X_2 = x_2). \qquad (20.2.5.13)$$

The total probability formulas for the cumulative distribution functions of continuous random variables have the form
$$F_{X_2}(x_2) = \int_{-\infty}^{+\infty} F_{X_2}(x_2\,|\,X_1 = x_1)\,p_{X_1}(x_1)\,dx_1, \qquad F_{X_1}(x_1) = \int_{-\infty}^{+\infty} F_{X_1}(x_1\,|\,X_2 = x_2)\,p_{X_2}(x_2)\,dx_2. \qquad (20.2.5.14)$$

THEOREM ON MULTIPLICATION OF DENSITIES. The joint probability density function of two random variables is equal to the product of the probability density function of one random variable by the conditional probability density function of the other random variable, given the value of the first random variable:
$$p_{X_1,X_2}(x_1, x_2) = p_{X_2}(x_2)\,p_{1|2}(x_1|x_2) = p_{X_1}(x_1)\,p_{2|1}(x_2|x_1). \qquad (20.2.5.15)$$

Bayes' formulas:
$$P_{1|2}(x_{1i}|x_{2j}) = \frac{P(X_1 = x_{1i})\,P_{2|1}(x_{2j}|x_{1i})}{\sum_i P(X_1 = x_{1i})\,P_{2|1}(x_{2j}|x_{1i})}, \qquad P_{2|1}(x_{2j}|x_{1i}) = \frac{P(X_2 = x_{2j})\,P_{1|2}(x_{1i}|x_{2j})}{\sum_j P(X_2 = x_{2j})\,P_{1|2}(x_{1i}|x_{2j})}; \qquad (20.2.5.16)$$
$$p_{1|2}(x_1|x_2) = \frac{p_{X_1}(x_1)\,p_{2|1}(x_2|x_1)}{\int_{-\infty}^{+\infty} p_{X_1}(x_1)\,p_{2|1}(x_2|x_1)\,dx_1}, \qquad p_{2|1}(x_2|x_1) = \frac{p_{X_2}(x_2)\,p_{1|2}(x_1|x_2)}{\int_{-\infty}^{+\infty} p_{X_2}(x_2)\,p_{1|2}(x_1|x_2)\,dx_2}. \qquad (20.2.5.17)$$

20.2.5-7. Conditional expectation. Regression.

The conditional expectation of a discrete random variable $X_2$, given $X_1 = x_1$ (where $x_1$ is a possible value of the random variable $X_1$), is defined as the sum of the products of the possible values of $X_2$ by their conditional probabilities,
$$\mathrm{E}\{X_2\,|\,X_1 = x_1\} = \sum_j x_{2j}\,P_{2|1}(x_{2j}|x_1). \qquad (20.2.5.18)$$
For continuous random variables,
$$\mathrm{E}\{X_2\,|\,X_1 = x_1\} = \int_{-\infty}^{+\infty} x_2\,p_{2|1}(x_2|x_1)\,dx_2. \qquad (20.2.5.19)$$

Properties of the conditional expectation:
1. If random variables $X$ and $Y$ are independent, then their conditional expectations coincide with the unconditional expectations; i.e., $\mathrm{E}\{Y\,|\,X = x\} = \mathrm{E}\{Y\}$ and $\mathrm{E}\{X\,|\,Y = y\} = \mathrm{E}\{X\}$.
2. $\mathrm{E}\{f(X)h(Y)\,|\,X = x\} = f(x)\,\mathrm{E}\{h(Y)\,|\,X = x\}$.
3. Additivity of the conditional expectation: $\mathrm{E}\{Y_1 + Y_2\,|\,X = x\} = \mathrm{E}\{Y_1\,|\,X = x\} + \mathrm{E}\{Y_2\,|\,X = x\}$.
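The following Python sketch (an added illustration with an arbitrarily chosen joint probability table) applies formulas (20.2.5.11) and (20.2.5.18): it forms the conditional probability mass function of $X_2$ given $X_1 = x_{1i}$ and the corresponding conditional expectations, i.e., sample values of the regression function discussed next.

```python
x1_vals = [0, 1]
x2_vals = [1, 2, 3]
# Joint probabilities p[i][j] = P(X1 = x1_vals[i], X2 = x2_vals[j]); they sum to 1.
p = [[0.10, 0.15, 0.05],
     [0.20, 0.30, 0.20]]

for i, x1 in enumerate(x1_vals):
    p_x1 = sum(p[i])                                          # marginal P(X1 = x1)
    cond = [p[i][j] / p_x1 for j in range(len(x2_vals))]      # (20.2.5.11): P_{2|1}(x2j | x1)
    e_x2_given_x1 = sum(x2 * q for x2, q in zip(x2_vals, cond))  # (20.2.5.18)
    print(x1, cond, e_x2_given_x1)
```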
A function $g_2(X_1)$ is called the best mean-square approximation to a random variable $X_2$ if the expectation $\mathrm{E}\{[X_2 - g_2(X_1)]^2\}$ takes the least possible value; the function $g_2(x_1)$ is called the mean-square regression of $X_2$ on $X_1$. The conditional expectation $\mathrm{E}\{X_2\,|\,X_1\}$ is a function of $X_1$,
$$\mathrm{E}\{X_2\,|\,X_1\} = g_2(X_1). \qquad (20.2.5.20)$$
It is called the regression function of $X_2$ on $X_1$ and is the mean-square regression of $X_2$ on $X_1$.

In most cases, it suffices to approximate the regression (20.2.5.20) by the linear function
$$g_2(X_1) = \alpha + \beta_{21} X_1 = \mathrm{E}\{X_2\} + \beta_{21}(X_1 - \mathrm{E}\{X_1\}).$$
Here the coefficient $\beta_{21} = \rho_{12}\,\sigma_{X_2}/\sigma_{X_1}$ is called the regression coefficient of $X_2$ on $X_1$, where $\rho_{12} = \rho(X_1, X_2)$. The number $\sigma_{X_2}^2(1 - \rho_{12}^2)$ is called the residual variance of the random variable $X_2$ with respect to the random variable $X_1$; this number characterizes the error arising if $X_2$ is replaced by the linear function $g_2(X_1) = \alpha + \beta_{21} X_1$.

Remark 1. The regression (20.2.5.20) can be approximated more precisely by a polynomial of degree $k > 1$ (parabolic regression of order $k$) or by some other nonlinear function (exponential regression, logarithmic regression, etc.).

Remark 2. If $X_2$ is taken as the independent variable, then we obtain the mean-square regression $\mathrm{E}\{X_1\,|\,X_2\} = g_1(X_2)$ of $X_1$ on $X_2$ and the linear regression
$$g_1(X_2) = \mathrm{E}\{X_1\} + \beta_{12}(X_2 - \mathrm{E}\{X_2\}), \qquad \beta_{12} = \rho_{12}\,\frac{\sigma_{X_1}}{\sigma_{X_2}},$$
of $X_1$ on $X_2$.

Remark 3. All regression lines pass through the point $(\mathrm{E}\{X_1\}, \mathrm{E}\{X_2\})$.
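The linear regression and the residual variance can be illustrated on simulated data. The Python sketch below (an added example; the sample is generated from an assumed linear model $X_2 = 2X_1 + \text{noise}$) estimates $\beta_{21} = \rho_{12}\,\sigma_{X_2}/\sigma_{X_1}$ and $\sigma_{X_2}^2(1 - \rho_{12}^2)$ and compares them with the exact values 2 and 0.25 for this model.

```python
import random

random.seed(1)
# Simulated sample: X2 depends linearly on X1 plus independent noise.
n = 100_000
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [2.0 * a + random.gauss(0.0, 0.5) for a in x1]

def mean(v):
    return sum(v) / len(v)

m1, m2 = mean(x1), mean(x2)
var1 = mean([(a - m1) ** 2 for a in x1])
var2 = mean([(b - m2) ** 2 for b in x2])
cov = mean([(a - m1) * (b - m2) for a, b in zip(x1, x2)])
rho = cov / (var1 ** 0.5 * var2 ** 0.5)

beta21 = rho * (var2 ** 0.5) / (var1 ** 0.5)     # regression coefficient of X2 on X1
resid_var = var2 * (1.0 - rho ** 2)              # residual variance of X2 w.r.t. X1

print(beta21, resid_var)  # close to the exact values 2.0 and 0.25 for this model
```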
20.2.5-8. Distribution function of a multivariate random variable.

The probability $P(X_1 < x_1, \dots, X_n < x_n)$, treated as a function of a point $x = (x_1, \dots, x_n)$ of the $n$-dimensional space and denoted by
$$F_X(x) = F(x) = P(X_1 < x_1, \dots, X_n < x_n), \qquad (20.2.5.21)$$
is called the multiple (or joint) distribution function of the $n$-dimensional random vector $X = (X_1, \dots, X_n)$.

Properties of the joint distribution function of a random vector $X$:
1. $F(x)$ is a nondecreasing function in each of the arguments.
2. If at least one of the arguments $x_1, \dots, x_n$ is equal to $-\infty$, then the joint distribution function is equal to zero.
3. The $m$-dimensional distribution function of the subsystem of $m < n$ random variables $X_1, \dots, X_m$ is obtained by setting the arguments corresponding to the remaining random variables $X_{m+1}, \dots, X_n$ equal to $+\infty$:
$$F_{X_1,\dots,X_m}(x_1, \dots, x_m) = F_X(x_1, \dots, x_m, +\infty, \dots, +\infty).$$
(The $m$-dimensional distribution function $F_{X_1,\dots,X_m}(x_1, \dots, x_m)$ is usually called the marginal distribution function.)
4. The function $F_X(x)$ is left continuous in each of the arguments.

An $n$-dimensional random variable $X$ is said to be discrete if each of the random variables $X_1, X_2, \dots, X_n$ is discrete. The distribution of a subsystem $X_1, \dots, X_m$ of random variables and the conditional distributions are defined as in Paragraphs 20.2.5-6 and 20.2.5-7.

An $n$-dimensional random variable $X$ is said to be continuous if its distribution function $F(x)$ can be written in the form
$$F(x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} p(y)\,dy, \qquad (20.2.5.22)$$
where $dy = dy_1 \cdots dy_n$ and the function $p(x)$, called the multiple (or joint) probability density function of the random variables $X_1, \dots, X_n$, is piecewise continuous. The joint probability density function can be expressed via the joint distribution function by the formula
$$p(x) = \frac{\partial^n F_X(x)}{\partial x_1 \cdots \partial x_n}; \qquad (20.2.5.23)$$
i.e., the joint probability density function is the $n$th mixed partial derivative (one differentiation in each of the arguments) of the joint distribution function. Formulas (20.2.5.22) and (20.2.5.23) establish a one-to-one correspondence (up to sets of probability zero) between the joint probability density functions and the joint distribution functions of continuous multivariate random variables. The differential $p(x)\,dx$ is called a probability element.

The joint probability density function of $n$ random variables $X_1, X_2, \dots, X_n$ has the same properties as the joint probability density function of two random variables $X_1$ and $X_2$ (see Paragraph 20.2.1-4). The marginal and conditional probability density functions obtained from a continuous $n$-dimensional probability distribution are defined precisely as in Paragraphs 20.2.1-4 and 20.2.1-8.

Remark 1. The distribution of a system of two or more multivariate random variables $X_1 = (X_{11}, X_{12}, \dots)$ and $X_2 = (X_{21}, X_{22}, \dots)$ is the joint distribution of all the variables $X_{11}, X_{12}, \dots;\ X_{21}, X_{22}, \dots$

Remark 2. A joint distribution can be discrete in some of the random variables and continuous in the others.

20.2.5-9. Numerical characteristics of multivariate random variables.

The expectation of a function $g(X)$ of a multivariate random variable $X$ is defined by the formula
$$\mathrm{E}\{g(X)\} = \begin{cases} \displaystyle\sum_{i_1} \cdots \sum_{i_n} g(x_{1 i_1}, \dots, x_{n i_n})\,p_{i_1 i_2 \dots i_n} & \text{in the discrete case,}\\[1mm] \displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} g(x)\,p(x)\,dx & \text{in the continuous case,} \end{cases} \qquad (20.2.5.24)$$
if these expressions exist in the sense of absolute convergence; otherwise, one says that $\mathrm{E}\{g(X)\}$ does not exist.

The moment of order $r_1 + \dots + r_n$ of a random variable $X$ about a point $(a_1, \dots, a_n)$ is defined as the expectation $\mathrm{E}\{(X_1 - a_1)^{r_1} \cdots (X_n - a_n)^{r_n}\}$. For $a_1 = \dots = a_n = 0$, the moment of order $r_1 + \dots + r_n$ of an $n$-dimensional random variable $X$ is called the initial moment and is denoted by $\alpha_{r_1 \dots r_n} = \mathrm{E}\{X_1^{r_1} \cdots X_n^{r_n}\}$.

The first initial moments are the expectations of the coordinates $X_1, \dots, X_n$. The point $(\mathrm{E}\{X_1\}, \dots, \mathrm{E}\{X_n\})$ in the space $R^n$ characterizes the position of the random point $(X_1, \dots, X_n)$, which spreads about this point. The first central moments are naturally zero.

If $a_1 = \mathrm{E}\{X_1\}, \dots, a_n = \mathrm{E}\{X_n\}$, then the moment of order $r_1 + \dots + r_n$ of the $n$-dimensional random variable $X$ is called the central moment and is denoted by
$$\mu_{r_1 \dots r_n} = \mathrm{E}\bigl\{(X_1 - \mathrm{E}\{X_1\})^{r_1} \cdots (X_n - \mathrm{E}\{X_n\})^{r_n}\bigr\}.$$

The second central moments have the following notation:
$$\lambda_{ij} = \lambda_{ji} = \mathrm{E}\bigl\{(X_i - \mathrm{E}\{X_i\})(X_j - \mathrm{E}\{X_j\})\bigr\} = \begin{cases} \mathrm{Var}\{X_i\} = \sigma_i^2 & \text{for } i = j,\\ \mathrm{Cov}(X_i, X_j) & \text{for } i \ne j. \end{cases} \qquad (20.2.5.25)$$
The moments $\lambda_{ij}$ given by relation (20.2.5.25) determine the covariance matrix (matrix of moments) $[\lambda_{ij}]$. Obviously, the covariance matrix is real and symmetric; its determinant $\det[\lambda_{ij}]$ is called the generalized variance of the $n$-dimensional distribution. The correlations
$$\rho_{ij} = \rho(X_i, X_j) = \frac{\mathrm{Cov}(X_i, X_j)}{\sigma_{X_i}\sigma_{X_j}} = \frac{\lambda_{ij}}{\sqrt{\lambda_{ii}\lambda_{jj}}} \qquad (i, j = 1, 2, \dots, n) \qquad (20.2.5.26)$$
determine the correlation matrix $[\rho_{ij}]$ of the $n$-dimensional distribution, provided that all variances $\mathrm{Var}\{X_i\}$ are nonzero. Obviously, the correlation matrix is real and symmetric. The quantity $\sqrt{\det[\rho_{ij}]}$ is called the spread coefficient.

20.2.5-10. Regression.

A function $g_1(X_2, \dots, X_n)$ is called the best mean-square approximation to a random variable $X_1$ if the expectation $\mathrm{E}\{[X_1 - g_1(X_2, \dots, X_n)]^2\}$ takes the least possible value. The function $g_1(x_2, \dots, x_n)$ is called the mean-square regression of $X_1$ on $X_2, \dots, X_n$. The conditional expectation $\mathrm{E}\{X_1\,|\,X_2, \dots, X_n\}$ is a function of $X_2, \dots, X_n$,
$$\mathrm{E}\{X_1\,|\,X_2, \dots, X_n\} = g_1(X_2, \dots, X_n). \qquad (20.2.5.27)$$
It is called the regression function of $X_1$ on $X_2, \dots, X_n$ and is the mean-square regression of $X_1$ on $X_2, \dots, X_n$.

In most cases, it suffices to approximate the regression (20.2.5.27) by the linear function
$$g_i = \mathrm{E}\{X_i\} + \sum_{j \ne i} \beta_{ij}(X_j - \mathrm{E}\{X_j\}). \qquad (20.2.5.28)$$
Relation (20.2.5.28) determines the linear regression of $X_i$ on the other $n - 1$ variables. The regression coefficients $\beta_{ij}$ are determined by the relation
$$\beta_{ij} = -\frac{\Lambda_{ij}}{\Lambda_{ii}},$$
where $\Lambda_{ij}$ are the entries of the inverse of the covariance matrix. The measure of correlation between $X_i$ and the other $n - 1$ variables is the multiple correlation coefficient
$$\rho(X_i, g_i) = \sqrt{1 - \frac{1}{\lambda_{ii}\Lambda_{ii}}}.$$
The residual of $X_i$ with respect to the other $n - 1$ variables is defined as the random variable $\Delta_i = X_i - g_i$. It satisfies the relations
$$\mathrm{Cov}(\Delta_i, X_j) = \begin{cases} 0 & \text{for } i \ne j,\\ \mathrm{Var}\{\Delta_i\} & \text{for } i = j \quad (\text{the residual variance}). \end{cases}$$
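The quantities of Paragraphs 20.2.5-9 and 20.2.5-10 can be computed directly from data. The following Python sketch (an added illustration; it assumes the NumPy library is available and uses a simulated three-dimensional sample) forms the covariance and correlation matrices, the generalized variance $\det[\lambda_{ij}]$, the regression coefficients $\beta_{ij} = -\Lambda_{ij}/\Lambda_{ii}$, and the multiple correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated three-dimensional sample with correlated coordinates.
n = 200_000
z = rng.standard_normal((n, 3))
a = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.3, 0.4, 1.0]])
x = z @ a.T                          # columns are X1, X2, X3

lam = np.cov(x, rowvar=False)        # covariance matrix [lambda_ij]
rho = np.corrcoef(x, rowvar=False)   # correlation matrix [rho_ij]
gen_var = np.linalg.det(lam)         # generalized variance

Lam = np.linalg.inv(lam)             # entries Lambda_ij of the inverse covariance matrix
i = 0                                # linear regression of X1 on X2, X3
beta = [-Lam[i, j] / Lam[i, i] for j in range(3) if j != i]
mult_corr = np.sqrt(1.0 - 1.0 / (lam[i, i] * Lam[i, i]))  # multiple correlation coefficient

print(gen_var, beta, mult_corr)
```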
20.2.5-11. Characteristic functions.

The characteristic function of an $n$-dimensional random variable $X$ is defined as the expectation of the random variable $\exp\Bigl(i \sum_{j=1}^n t_j X_j\Bigr)$:
$$f_X(t) = f(t) = \mathrm{E}\Bigl\{\exp\Bigl(i \sum_{j=1}^n t_j X_j\Bigr)\Bigr\}, \qquad (20.2.5.29)$$
where $t = (t_1, \dots, t_n)$ and $i$ is the imaginary unit, $i^2 = -1$.
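Definition (20.2.5.29) can be illustrated by a Monte Carlo estimate. The Python sketch below (an added example, assuming NumPy) averages $\exp(i\,t \cdot X)$ over a large sample from the standard normal random vector and compares the result with the known characteristic function $\exp(-\tfrac{1}{2}|t|^2)$ of that distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, n_samples = 3, 500_000
x = rng.standard_normal((n_samples, n_dim))   # independent N(0, 1) coordinates

t = np.array([0.5, -1.0, 0.25])
# Empirical characteristic function: average of exp(i * t . X) over the sample.
f_emp = np.mean(np.exp(1j * (x @ t)))
# Known characteristic function of the standard normal vector at the same point t.
f_exact = np.exp(-0.5 * np.dot(t, t))

print(f_emp, f_exact)   # real part of f_emp is close to f_exact; imaginary part near 0
```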
