20.2. RANDOM VARIABLES AND THEIR CHARACTERISTICS

For continuous random variables,

$$f(\mathbf{t})=\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\exp\Bigl(i\sum_{j=1}^{n}t_jx_j\Bigr)p(\mathbf{x})\,d\mathbf{x}.$$

The inversion formula for a continuous distribution has the form

$$p(\mathbf{x})=\frac{1}{(2\pi)^n}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\exp\Bigl(-i\sum_{j=1}^{n}t_jx_j\Bigr)f(\mathbf{t})\,d\mathbf{t},$$

where $d\mathbf{t}=dt_1\cdots dt_n$.

If the initial moments of a random variable $\mathbf{X}$ exist, then

$$E\{X_1^{r_1}\cdots X_n^{r_n}\}=i^{-\sum_{j=1}^{n}r_j}\,\frac{\partial^{\,r_1+\cdots+r_n}f(\mathbf{t})}{\partial t_1^{r_1}\cdots\partial t_n^{r_n}}\bigg|_{t_1=\cdots=t_n=0}.$$

The characteristic function corresponding to the m-dimensional marginal distribution of m out of n variables $X_1,\dots,X_n$ can be obtained from the characteristic function (20.2.5.29) if the variables $t_j$ corresponding to the random variables that are not contained in the m-dimensional marginal distribution are replaced by zeros.

CONTINUITY THEOREM FOR CHARACTERISTIC FUNCTIONS. The weak convergence $F_n\to F$ of a sequence of distribution functions $F_1(x), F_2(x),\dots$ is equivalent to the uniform convergence $f_n(t)\to f(t)$ of characteristic functions on each finite interval.

20.2.5-12. Independency of random variables.

Random variables $X_1,\dots,X_n$ are said to be independent if the events $\{X_1\in S_1\},\dots,\{X_n\in S_n\}$ are independent for any measurable sets $S_1,\dots,S_n$. For this, it is necessary and sufficient that

$$P(X_1\in S_1,\dots,X_n\in S_n)=\prod_{k=1}^{n}P(X_k\in S_k).\tag{20.2.5.30}$$

Relation (20.2.5.30) is equivalent to one of the following three:

1. In the general case: for any $\mathbf{x}\in\mathbb{R}^n$, $F_{\mathbf{X}}(\mathbf{x})=\prod_{k=1}^{n}F_{X_k}(x_k)$.
2. For absolutely continuous distributions: for any $\mathbf{x}\in\mathbb{R}^n$ (except possibly for a set of measure zero), $p_{\mathbf{X}}(\mathbf{x})=\prod_{k=1}^{n}p_{X_k}(x_k)$.
3. For discrete distributions: for any $\mathbf{x}\in\mathbb{R}^n$, $P(X_1=x_1,\dots,X_n=x_n)=\prod_{k=1}^{n}P(X_k=x_k)$.

The joint distribution of independent random variables is uniquely determined by their individual distributions. Independent random variables are uncorrelated, but the converse is not true in general.
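The last statement can be illustrated with a small exact computation. The sketch below is only an illustration under assumed data: the choice of X uniform on {-1, 0, 1} with Y = X² is hypothetical and does not come from the text. It checks that the covariance vanishes while the factorization condition for discrete distributions fails.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X is uniform on {-1, 0, 1} and Y = X**2.
p_x = {x: Fraction(1, 3) for x in (-1, 0, 1)}

# Joint distribution of (X, Y): all mass sits on the points (x, x**2).
joint = {(x, x * x): p for x, p in p_x.items()}


def expectation(g):
    """E{g(X, Y)} computed exactly from the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def marginal(index):
    """Marginal distribution of coordinate `index` (0 for X, 1 for Y)."""
    out = {}
    for point, p in joint.items():
        out[point[index]] = out.get(point[index], Fraction(0)) + p
    return out


# Covariance E{XY} - E{X} E{Y}
cov_xy = (expectation(lambda x, y: x * y)
          - expectation(lambda x, y: x) * expectation(lambda x, y: y))

# Discrete independence requires P(X=x, Y=y) = P(X=x) P(Y=y) for all x, y.
p_x_marg, p_y_marg = marginal(0), marginal(1)
factorizes = all(
    joint.get((x, y), Fraction(0)) == p_x_marg[x] * p_y_marg[y]
    for x, y in product(p_x_marg, p_y_marg)
)

print("Cov(X, Y) =", cov_xy)
print("joint distribution factorizes:", factorizes)
```

Exact rational arithmetic is used so that the zero covariance is not blurred by rounding; the same check with floating-point probabilities would need a tolerance.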
Random variables $X_1,\dots,X_n$ are independent if and only if the characteristic function of the multivariate random variable $\mathbf{X}$ is equal to the product of the characteristic functions of the random variables $X_1,\dots,X_n$.

PROBABILITY THEORY

20.3. Limit Theorems

20.3.1. Convergence of Random Variables

20.3.1-1. Convergence in probability.

A sequence of random variables $X_1, X_2,\dots$ is said to converge in probability to a random variable $X$ ($X_n\xrightarrow{P}X$) if

$$\lim_{n\to\infty}P(|X_n-X|\ge\varepsilon)=0\tag{20.3.1.1}$$

for each $\varepsilon>0$, i.e., if for any $\varepsilon>0$ and $\delta>0$ there exists a number $N$, depending on $\varepsilon$ and $\delta$, such that the inequality $P(|X_n-X|>\varepsilon)<\delta$ holds for $n>N$.

A sequence of k-dimensional random variables $\mathbf{X}_n$ is said to converge in probability to a random variable $\mathbf{X}$ if each coordinate of the random variable $\mathbf{X}_n$ converges in probability to the respective coordinate of the random variable $\mathbf{X}$.

20.3.1-2. Almost sure convergence (convergence with probability 1).

A sequence of random variables $X_1, X_2,\dots$ is said to converge almost surely (or with probability 1) to a random variable $X$ ($X_n\xrightarrow{\text{a.s.}}X$) if

$$P\bigl[\omega\in\Omega:\ \lim_{n\to\infty}X_n(\omega)=X(\omega)\bigr]=1.\tag{20.3.1.2}$$

A sequence $X_n$ converges almost surely to $X$ if and only if

$$P\Bigl(\bigcup_{m=1}^{\infty}\{|X_{n+m}-X|\ge\varepsilon\}\Bigr)\to 0\quad\text{as } n\to\infty$$

for each $\varepsilon>0$.

Convergence almost surely implies convergence in probability. The converse statement is not true in general.

20.3.1-3. Convergence in mean.

A sequence of random variables $X_1, X_2,\dots$ with finite pth initial moments ($p=1,2,\dots$) is said to converge in pth mean to a random variable $X$ ($E\{|X|^p\}<\infty$) if

$$\lim_{n\to\infty}E\{|X_n-X|^p\}=0.\tag{20.3.1.3}$$

Convergence in pth mean for $p=2$ is called convergence in mean square. If $X_n\to X$ in pth mean, then $X_n\to X$ in $p_1$th mean for all $p_1\le p$.

Convergence in pth mean implies convergence in probability. The converse statement is not true in general.

20.3.1-4. Convergence in distribution.
Suppose that a sequence $F_1(x), F_2(x),\dots$ of cumulative distribution functions converges to a distribution function $F(x)$,

$$\lim_{n\to\infty}F_n(x)=F(x),\tag{20.3.1.4}$$

for every point $x$ at which $F(x)$ is continuous. In this case, we say that the sequence $X_1, X_2,\dots$ of the corresponding random variables converges to the random variable $X$ in distribution. The random variables $X_1, X_2,\dots$ can be defined on different probability spaces.

A sequence $F_1(x), F_2(x),\dots$ of distribution functions weakly converges to a distribution function $F(x)$ ($F_n\to F$) if

$$\lim_{n\to\infty}E\{h(X_n)\}=E\{h(X)\}\tag{20.3.1.5}$$

for any bounded continuous function $h$.

Convergence in distribution and weak convergence of distribution functions are equivalent.

The weak convergence $F_{X_n}\to F$ for random variables having a probability density function means the convergence

$$\int_{-\infty}^{+\infty}g(x)\,p_{X_n}(x)\,dx\to\int_{-\infty}^{+\infty}g(x)\,p(x)\,dx\tag{20.3.1.6}$$

for any bounded continuous function $g(x)$.

20.3.2. Limit Theorems

20.3.2-1. Law of large numbers.

The law of large numbers consists of several theorems establishing the stability of average results and revealing conditions for this stability to occur.

The notion of convergence in probability is most often used for the case in which the limit random variable $X$ has the degenerate distribution concentrated at a point $a$ ($P(X=a)=1$) and

$$X_n=\frac{1}{n}\sum_{k=1}^{n}Y_k,\tag{20.3.2.1}$$

where $Y_1, Y_2,\dots$ are arbitrary random variables.

A sequence $Y_1, Y_2,\dots$ satisfies the weak law of large numbers if the limit relation

$$\lim_{n\to\infty}P\Bigl(\Bigl|\frac{1}{n}\sum_{k=1}^{n}Y_k-a\Bigr|\ge\varepsilon\Bigr)\equiv\lim_{n\to\infty}P(|X_n-a|\ge\varepsilon)=0\tag{20.3.2.2}$$

holds for any $\varepsilon>0$.

If the relation

$$P\Bigl(\omega\in\Omega:\ \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}Y_k=a\Bigr)\equiv P\bigl(\omega\in\Omega:\ \lim_{n\to\infty}X_n=a\bigr)=1\tag{20.3.2.3}$$

is satisfied instead of (20.3.2.2), i.e., the sequence $X_n$ converges to the number $a$ with probability 1, then the sequence $Y_1, Y_2,\dots$ satisfies the strong law of large numbers.

Markov inequality.
For any nonnegative random variable $X$ that has an expectation $E\{X\}$, the inequality

$$P(X\ge\varepsilon)\le\frac{E\{X\}}{\varepsilon}\tag{20.3.2.4}$$

holds for each $\varepsilon>0$.

Chebyshev inequality. For any random variable $X$ with finite variance, the inequality

$$P(|X-E\{X\}|\ge\varepsilon)\le\frac{\operatorname{Var}\{X\}}{\varepsilon^2}\tag{20.3.2.5}$$

holds for each $\varepsilon>0$.

CHEBYSHEV THEOREM. If $X_1, X_2,\dots$ is a sequence of independent random variables with uniformly bounded finite variances, $\operatorname{Var}\{X_1\}\le C$, $\operatorname{Var}\{X_2\}\le C,\dots$, then the limit relation

$$\lim_{n\to\infty}P\Bigl(\Bigl|\frac{1}{n}\sum_{k=1}^{n}X_k-\frac{1}{n}\sum_{k=1}^{n}E\{X_k\}\Bigr|<\varepsilon\Bigr)=1\tag{20.3.2.6}$$

holds for each $\varepsilon>0$.

BERNOULLI THEOREM. Let $\mu_n$ be the number of occurrences of an event $A$ (the number of successes) in $n$ independent trials, and let $p=P(A)$ be the probability of the occurrence of the event $A$ (the probability of success) in each of the trials. Then the sequence of relative frequencies $\mu_n/n$ of the occurrence of the event $A$ in $n$ independent trials converges in probability to $p=P(A)$ as $n\to\infty$; i.e., the limit relation

$$\lim_{n\to\infty}P\Bigl(\Bigl|\frac{\mu_n}{n}-p\Bigr|<\varepsilon\Bigr)=1\tag{20.3.2.7}$$

holds for each $\varepsilon>0$.

POISSON THEOREM. If in a sequence of independent trials the probability that an event $A$ occurs in the kth trial is equal to $p_k$, then

$$\lim_{n\to\infty}P\Bigl(\Bigl|\frac{\mu_n}{n}-\frac{p_1+\cdots+p_n}{n}\Bigr|<\varepsilon\Bigr)=1\tag{20.3.2.8}$$

for each $\varepsilon>0$.

KOLMOGOROV THEOREM. If a sequence of independent random variables $X_1, X_2,\dots$ satisfies the condition

$$\sum_{k=1}^{\infty}\frac{\operatorname{Var}\{X_k\}}{k^2}<+\infty,\tag{20.3.2.9}$$

then it obeys the strong law of large numbers.

The existence of the expectation is a necessary and sufficient condition for the strong law of large numbers to apply to a sequence of independent identically distributed random variables.

20.3.2-2. Central limit theorem.

A random variable $X_n$ with distribution function $F_{X_n}$ is asymptotically normally distributed if there exists a sequence of pairs of real numbers $m_n$, $\sigma_n^2$ such that the random variables $(X_n-m_n)/\sigma_n$ converge in distribution to a standard normal variable.
This occurs if and only if the limit relation

$$\lim_{n\to\infty}P(m_n+a\sigma_n<X_n<m_n+b\sigma_n)=\Phi(b)-\Phi(a),\tag{20.3.2.10}$$

where $\Phi(x)$ is the distribution function of the standard normal law, holds for any $a$ and $b$ ($b>a$).

LYAPUNOV CENTRAL LIMIT THEOREM. If $X_1, X_2,\dots$ is a sequence of independent random variables satisfying the Lyapunov condition

$$\lim_{n\to\infty}\frac{\sum_{k=1}^{n}\alpha_3(X_k)}{\Bigl(\sum_{k=1}^{n}\operatorname{Var}\{X_k\}\Bigr)^{3/2}}=0,$$

where $\alpha_3(X_k)=E\{|X_k-E\{X_k\}|^3\}$ is the third absolute central moment of the random variable $X_k$, then the sequence of random variables

$$Y_n=\frac{\sum_{k=1}^{n}(X_k-E\{X_k\})}{\sqrt{\sum_{k=1}^{n}\operatorname{Var}\{X_k\}}}$$

converges in distribution to the standard normal law, i.e., the following limit exists:

$$\lim_{n\to\infty}P\Biggl(\frac{\sum_{k=1}^{n}(X_k-E\{X_k\})}{\sqrt{\sum_{k=1}^{n}\operatorname{Var}\{X_k\}}}<t\Biggr)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-u^2/2}\,du=\Phi(t).\tag{20.3.2.11}$$

LINDEBERG CENTRAL LIMIT THEOREM. Let $X_1, X_2,\dots$ be a sequence of independent identically distributed random variables with finite expectation $E\{X_k\}=m$ and finite variance $\sigma^2$. Then, as $n\to\infty$, the random variable $\frac{1}{n}\sum_{k=1}^{n}X_k$ has an asymptotically normal probability distribution with parameters $(m,\sigma^2/n)$.

Let $\mu_n$ be the number of occurrences of an event $A$ (the number of successes) in $n$ independent trials, and let $p=P(A)$ be the probability of the occurrence of the event $A$ (the probability of success) in each of the trials. Then the sequence of relative frequencies $\mu_n/n$ has an asymptotically normal probability distribution with parameters $(p,\,p(1-p)/n)$.

20.4. Stochastic Processes

20.4.1. Theory of Stochastic Processes

20.4.1-1. Notion of stochastic process.

Let a family

$$\xi(t)=\xi(\omega,t),\quad\omega\in\Omega,\tag{20.4.1.1}$$

of random variables depending on a parameter $t\in T$ be given on a probability space $(\Omega,\mathcal{F},P)$. The variable $\xi(t)$, $t\in T$, can be treated as a random function of the variable $t\in T$. The values of this function are the values of the random variable (20.4.1.1).

The random function $\xi(t)$ of the independent variable $t$ is called a stochastic process.
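The two readings of the family (20.4.1.1) can be made concrete with a toy model. The following is a minimal sketch under assumptions that are not in the text: a hypothetical sample space of four equally likely outcomes and a cosine whose phase depends on the outcome. Fixing the outcome gives an ordinary function of t, and fixing t gives a random variable.

```python
import math

# Hypothetical finite sample space: four equally likely outcomes.
OMEGA = (0, 1, 2, 3)
PROB = {w: 0.25 for w in OMEGA}


def xi(w, t):
    """Value xi(w, t) of the illustrative process: the phase depends on w."""
    return math.cos(t + w * math.pi / 2)


def sample_function(w):
    """Fix the outcome w: the result is an ordinary function of t."""
    return lambda t: xi(w, t)


def distribution_at(t):
    """Fix t: the result is a random variable, listed as (value, probability)."""
    return [(xi(w, t), PROB[w]) for w in OMEGA]


def mean_at(t):
    """E{xi(t)} computed from the distribution of xi(t)."""
    return sum(value * p for value, p in distribution_at(t))


path = sample_function(1)               # one function of t
values_at_half = distribution_at(0.5)   # one random variable

print([round(path(t), 3) for t in (0.0, 0.5, 1.0)])
print(values_at_half)
```

With the four phases spaced by π/2, the values at any fixed t cancel in pairs, so `mean_at(t)` is zero up to rounding for every t in this particular model.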
If a random outcome $\omega\in\Omega$ occurs, then the actual process is described by the corresponding trajectory, which is called a realization of the process, or a sample function.

A stochastic process can be simply a numerical function $\xi(t)$ of time admitting different realizations $\xi(\omega,t)$ (one-dimensional stochastic process) or a vector function $\Xi(t)$ (multidimensional, or vector, stochastic process). The study of a multidimensional stochastic process can be reduced to the study of one-dimensional stochastic processes by a transformation taking $\Xi(t)=(\xi_1(t),\dots,\xi_n(t))$ to the auxiliary process

$$\xi_{\mathbf{a}}(t)=\Xi(t)\cdot\mathbf{a}=\sum_{i=1}^{n}a_i\xi_i(t),\tag{20.4.1.2}$$

where $\mathbf{a}=(a_1,\dots,a_n)$ is an arbitrary n-dimensional vector. Therefore, the study of one-dimensional stochastic processes $\xi(t)$ is the main point in the theory of stochastic processes. If the parameter $t$ ranges in some interval of the real line $\mathbb{R}$, then the stochastic process is called a stochastic process with continuous time, and if the parameter $t$ takes integer values, then the process is called a stochastic process with discrete time (a random sequence).

To describe a stochastic process, one should specify an infinite set of compatible finite-dimensional probability distributions of the random vectors $\Xi(\mathbf{t})$ corresponding to all possible finite subsets $\mathbf{t}=(t_1,\dots,t_n)$ of values of the argument.

Remark. Specifying compatible finite-dimensional probability distributions of random vectors may be insufficient for specifying the probabilities of events depending on the values of $\xi(t)$ on an infinite set of values of the parameter $t$; i.e., this does not uniquely determine the stochastic process $\xi(t)$.

Example. Suppose that $\xi(t)=\cos(\omega t+\Phi)$, $0\le t\le 1$, is a harmonic oscillation with random phase $\Phi$, $Z$ is a random variable uniformly distributed on the interval $[0,1]$, and the stochastic process $\zeta(t)$, $0\le t\le 1$, is given by the relation

$$\zeta(t)=\begin{cases}\xi(t)&\text{for } t\ne Z,\\ \xi(t)+3&\text{for } t=Z.\end{cases}$$
Since $P[(Z=t_1)\cup\cdots\cup(Z=t_n)]=0$ for any finite set $\mathbf{t}=(t_1,\dots,t_n)$, we see that all finite-dimensional distributions of the stochastic processes $\xi(t)$ and $\zeta(t)$ are the same. At the same time, these processes differ from each other.

Specifying the set of finite-dimensional probability distributions often permits one to clarify whether there exists at least one stochastic process $\xi(t)$ with finite-dimensional distributions whose realizations satisfy a certain property (for example, are continuous or differentiable).

20.4.1-2. Correlation function.

Let $\xi(t)$ and $\zeta(t)$ be real stochastic processes.

The autocorrelation function of a stochastic process is defined as the function

$$B_{\xi\xi}(t,s)=E\bigl\{\bigl[\xi(t)-E\{\xi(t)\}\bigr]\bigl[\xi(s)-E\{\xi(s)\}\bigr]\bigr\},\tag{20.4.1.3}$$

which is the second central moment function.

Remark. The autocorrelation function of a stochastic process is also called the covariance function.

The mixed second moment, i.e., the function

$$B_{\xi\zeta}(t,s)=E\{\xi(t)\zeta(s)\},\tag{20.4.1.4}$$

of values of $\xi(t)$ and $\zeta(t)$ at two points is called the cross-correlation function (cross-covariance function).

The mixed central moment (covariance), i.e., the function

$$b_{\xi\zeta}(t,s)=E\bigl\{\bigl[\xi(t)-E\{\xi(t)\}\bigr]\bigl[\zeta(s)-E\{\zeta(s)\}\bigr]\bigr\},\tag{20.4.1.5}$$

is called the central cross-correlation function.

The correlation coefficient, i.e., the function

$$\rho_{\xi\zeta}(t,s)=\frac{b_{\xi\zeta}(t,s)}{\sqrt{\operatorname{Var}\{\xi(t)\}\operatorname{Var}\{\zeta(s)\}}},\tag{20.4.1.6}$$

is called the normalized cross-correlation function.

The following relations hold if $B_{\xi\xi}(t,s)$ is understood as the noncentral second moment $E\{\xi(t)\xi(s)\}$, by analogy with (20.4.1.4); for the central moment (20.4.1.3), $B_{\xi\xi}(t,t)$ is $\operatorname{Var}\{\xi(t)\}$ itself:

$$E\{\xi^2(t)\}=B_{\xi\xi}(t,t)\ge 0,\qquad |B_{\xi\xi}(t,s)|^2\le B_{\xi\xi}(t,t)\,B_{\xi\xi}(s,s),$$

$$\operatorname{Var}\{\xi(t)\}=B_{\xi\xi}(t,t)-[E\{\xi(t)\}]^2,\qquad \operatorname{Cov}[\xi(t),\zeta(s)]=B_{\xi\zeta}(t,s)-E\{\xi(t)\}\,E\{\zeta(s)\}.$$

Suppose that $\xi(t)$ and $\zeta(t)$ are complex stochastic processes (essentially, two-dimensional stochastic processes).
The autocorrelation function and the cross-correlation function are determined by the relations

$$B_{\xi\xi}(t,s)=E\bigl\{\overline{\xi(t)}\,\xi(s)\bigr\}=\overline{B_{\xi\xi}(s,t)},\qquad B_{\xi\zeta}(t,s)=E\bigl\{\overline{\xi(t)}\,\zeta(s)\bigr\}=\overline{B_{\zeta\xi}(s,t)},\tag{20.4.1.7}$$

where $\overline{\xi(t)}$, $\overline{B_{\xi\xi}(s,t)}$, and $\overline{B_{\zeta\xi}(s,t)}$ are the complex conjugates of $\xi(t)$, $B_{\xi\xi}(s,t)$, and $B_{\zeta\xi}(s,t)$, respectively.

20.4.1-3. Differentiation and integration of stochastic process.

To differentiate a stochastic process is to calculate the limit

$$\lim_{h\to 0}\frac{\xi(t+h)-\xi(t)}{h}=\xi'_t(t).\tag{20.4.1.8}$$

If the limit is understood in the sense of convergence in mean square (resp., with probability 1), then the differentiation is also understood in mean square (resp., with probability 1).

The following formulas hold:

$$E\{\xi'_t(t)\}=\frac{dE\{\xi(t)\}}{dt},\qquad B_{\xi'_t\xi'_t}(t,s)=\frac{\partial^2B_{\xi\xi}(t,s)}{\partial t\,\partial s},\qquad B_{\xi'_t\xi}(t,s)=\frac{\partial B_{\xi\xi}(t,s)}{\partial t}.\tag{20.4.1.9}$$

The integral

$$\int_a^b\xi(t)\,dt\tag{20.4.1.10}$$

of a stochastic process $\xi(t)$ defined on an interval $[a,b]$ with autocorrelation function $B_{\xi\xi}(t,s)$ is the limit in mean square of the integral sums

$$\sum_{k=1}^{n}\xi(s_k)(t_k-t_{k-1})$$

as the diameter of the partition $a=t_0<t_1<\cdots<t_n=b$, where $s_k\in[t_{k-1},t_k]$, tends to zero.
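The differentiation formulas (20.4.1.9) of Paragraph 20.4.1-3 can be checked numerically on a simple real process. The sketch below rests on assumptions that are not in the text: a hypothetical process ξ(t) = A cos t + B sin t with independent A and B taking the values ±1 with probability 1/2 each. The process has zero mean, so the central and noncentral second moments coincide, and its sample functions are smooth, so the pathwise derivative is taken as a stand-in for the mean-square derivative. The step size `h` is an arbitrary choice.

```python
import math
from itertools import product

# Hypothetical process: xi(t) = A cos t + B sin t, where A and B are
# independent and take the values -1 and +1 with probability 1/2 each.
OUTCOMES = [(a, b, 0.25) for a, b in product((-1.0, 1.0), repeat=2)]


def xi(a, b, t):
    return a * math.cos(t) + b * math.sin(t)


def dxi(a, b, t):
    # Pathwise derivative of the sample function with respect to t
    return -a * math.sin(t) + b * math.cos(t)


def corr(f, g, t, s):
    """E{f(t) g(s)} as an exact finite sum over the four outcomes."""
    return sum(p * f(a, b, t) * g(a, b, s) for a, b, p in OUTCOMES)


def B(t, s):
    """Autocorrelation function of xi; equals cos(t - s) for this process."""
    return corr(xi, xi, t, s)


def d2B_dtds(t, s, h=1e-4):
    # Central-difference approximation of the mixed derivative of B
    return (B(t + h, s + h) - B(t + h, s - h)
            - B(t - h, s + h) + B(t - h, s - h)) / (4 * h * h)


def dB_dt(t, s, h=1e-4):
    # Central-difference approximation of the derivative of B in t
    return (B(t + h, s) - B(t - h, s)) / (2 * h)


t, s = 0.7, 0.2
lhs_dd = corr(dxi, dxi, t, s)   # autocorrelation of the derivative
lhs_d = corr(dxi, xi, t, s)     # cross-correlation of derivative and process

print(lhs_dd, d2B_dtds(t, s))
print(lhs_d, dB_dt(t, s))
```

The two numbers on each printed line should agree up to the finite-difference error, which is the content of the second and third formulas in (20.4.1.9) for this example.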