20.2. Random Variables and Their Characteristics

20.2.1. One-Dimensional Random Variables

20.2.1-1. Notion of random variable. Distribution function of random variable.

A random variable X is a real function X = X(ω), ω ∈ Ω, on a probability space Ω such that the set {ω : X(ω) ≤ x} belongs to the σ-algebra F of events for each real x. Any rule (table, function, graph, or otherwise) that permits one to find the probabilities of events A ∈ F is usually called the distribution law of a random variable. In general, random variables can be discrete or continuous.

The cumulative distribution function of a random variable X is the function F_X(x) whose value at each point x is equal to the probability of the event {X < x}:

    F_X(x) = F(x) = P(X < x).    (20.2.1.1)

Properties of the cumulative distribution function:
1. F(x) is bounded, i.e., 0 ≤ F(x) ≤ 1.
2. F(x) is a nondecreasing function on (−∞, ∞); i.e., if x_2 > x_1, then F(x_2) ≥ F(x_1).
3. lim_{x→−∞} F(x) = F(−∞) = 0.
4. lim_{x→+∞} F(x) = F(+∞) = 1.
5. The probability that a random variable X lies in the interval [x_1, x_2) is equal to the increment of its cumulative distribution function on this interval; i.e., P(x_1 ≤ X < x_2) = F(x_2) − F(x_1).
6. F(x) is left continuous; i.e., lim_{x→x_0−0} F(x) = F(x_0).

20.2.1-2. Discrete random variables.

A random variable X is said to be discrete if the set of its possible values (the spectrum of the discrete random variable) is at most countable. A discrete distribution is determined by a finite or countable set of probabilities P(X = x_i) such that Σ_i P(X = x_i) = 1. To define a discrete random variable, it is necessary to specify the values x_1, x_2, ... and the corresponding probabilities p_1, p_2, ..., where p_i = P(X = x_i).

Remark. In what follows, we assume that the values of a discrete random variable X are arranged in ascending order.

In this case, the cumulative distribution function of a discrete random variable X is the step function defined as the sum

    F(x) = Σ_{x_i < x} P(X = x_i).    (20.2.1.2)

It is often convenient to write out the cumulative distribution function using the function θ(x) such that θ(x) = 1 for x > 0 and θ(x) = 0 for x ≤ 0:

    F(x) = Σ_i P(X = x_i) θ(x − x_i).

For discrete random variables, one can introduce the notion of probability density function p(x) by setting

    p(x) = Σ_i P(X = x_i) δ(x − x_i),

where δ(x) is the delta function.

20.2.1-3. Continuous random variables. Probability density function.

A random variable X is said to be continuous if its cumulative distribution function F_X(x) can be represented in the form

    F(x) = ∫_{−∞}^{x} p(y) dy.    (20.2.1.3)

The function p(x) is called the probability density function of the random variable X. Obviously, relation (20.2.1.3) is equivalent to the relation

    p(x) = lim_{Δx→0} P(x ≤ X ≤ x + Δx)/Δx = dF(x)/dx.    (20.2.1.4)

The differential dF(x) = p(x) dx ≈ P(x ≤ X < x + dx) is called a probability element.

Properties of the probability density function:
1. p(x) ≥ 0.
2. P(x_1 ≤ X < x_2) = ∫_{x_1}^{x_2} p(y) dy.
3. ∫_{−∞}^{+∞} p(x) dx = 1.
4. P(x ≤ X < x + Δx) ≈ p(x) Δx.
5. For continuous random variables, one always has P(X = x) = 0, but the event {X = x} is not necessarily impossible.
6. For continuous random variables, P(x_1 ≤ X < x_2) = P(x_1 < X < x_2) = P(x_1 < X ≤ x_2) = P(x_1 ≤ X ≤ x_2).
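Relation (20.2.1.3) and property 2 of the density can be illustrated numerically. The following Python sketch is not part of the original text; it assumes the standard normal density as a test case (the interval and grid size are arbitrary choices) and compares a Riemann-sum approximation of ∫_{x_1}^{x_2} p(y) dy with the increment F(x_2) − F(x_1) of the exact distribution function.

    import math
    import numpy as np

    # Illustrative sketch, assuming X is standard normal; not from the handbook.
    # It checks P(x1 <= X < x2) = \int_{x1}^{x2} p(y) dy = F(x2) - F(x1).

    def p(x):
        # probability density of the standard normal distribution
        return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

    def F(x):
        # the corresponding cumulative distribution function, via the error function
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    x1, x2 = -1.0, 2.0
    grid = np.linspace(x1, x2, 300001)
    riemann_sum = np.sum(p(grid[:-1])) * (grid[1] - grid[0])  # approximates the integral over [x1, x2)
    print(riemann_sum, F(x2) - F(x1))                         # both values are close to 0.8186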
20.2.1-4. Unified description of probability distribution.

Discrete and continuous probability distributions can be studied simultaneously if the probability of each event {a ≤ X < b} is represented in terms of the integral

    P(a ≤ X < b) = ∫_a^b dF(x),    (20.2.1.5)

where F(x) = P(X < x) is the cumulative distribution function of the random variable X. For a continuous distribution, the integral (20.2.1.5) becomes the Riemann integral. For a discrete distribution, the integral can be reduced to the form

    P(a ≤ X < b) = Σ_{a ≤ x_i < b} P(X = x_i).

In particular, the integral can also be used in the case of mixed distributions, i.e., distributions that are partially continuous and partially discrete.

20.2.1-5. Symmetric random variables.

A random variable X is said to be symmetric if the condition

    P(X < −x) = P(X > x)    (20.2.1.6)

holds for all x.

Properties of symmetric random variables:
1. P(|X| < x) = F(x) − F(−x) = 2F(x) − 1.
2. F(0) = 0.5.
3. If a moment α_{2k+1} of odd order about the origin (see Paragraph 20.2.2-3) exists, then it is zero.
4. If t_γ is the quantile of level γ (see Paragraph 20.2.2-5), then t_γ = −t_{1−γ}.

A random variable Y is said to be symmetric about its expected value if the random variable X = Y − E{Y} is symmetric, where E{Y} is the expected value of a random variable Y (see Paragraph 20.2.2-1).

Properties of random variables symmetric about the expected value:
1. P(|Y − E{Y}| < x) = 2F_Y(x + E{Y}) − 1.
2. F_Y(E{Y}) = 0.5.
3. If a central moment μ_{2k+1} of odd order (see Paragraph 20.2.2-3) exists, then it is equal to zero.

20.2.1-6. Functions of random variables.

Suppose that a random variable Y is related to a random variable X by a functional dependence Y = f(X).

If X is discrete, then, obviously, Y is also discrete. To find the distribution law of the random variable Y, it suffices to calculate the values f(x_i). If there are repeated values among y_i = f(x_i), then these repeated values are taken into account only once, the corresponding probabilities being added.

If X is a continuous random variable with probability density function p_X(x), then, in general, the random variable Y is also continuous. The cumulative distribution function of Y is given by the formula

    F_Y(y) = P(Y < y) = P[f(X) < y] = ∫_{f(x) < y} p_X(x) dx.    (20.2.1.7)

If the function y = f(x) is differentiable and monotone on the entire range of the argument x, then the probability density function p_Y(y) of the random variable Y is given by the formula

    p_Y(y) = p_X[ψ(y)] |ψ′(y)|,    (20.2.1.8)

where ψ is the inverse function of f(x). If f(x) is a nonmonotonic function, then the inverse function is nonunique and the probability density function of the random variable Y is the sum of as many terms as there are values (for a given y) of the inverse function:

    p_Y(y) = Σ_{i=1}^{k} p_X[ψ_i(y)] |ψ′_i(y)|,    (20.2.1.9)

where ψ_1(y), ..., ψ_k(y) are the values of the inverse function for a given y.

Example 1. Suppose that a random variable X has the probability density

    p_X(x) = (1/√(2π)) e^{−x²/2}.

Find the distribution of the random variable Y = X².

In this case, y = f(x) = x². According to (20.2.1.7), we obtain

    F_Y(y) = ∫_{x² < y} (1/√(2π)) e^{−x²/2} dx = (1/√(2π)) ∫_{−√y}^{√y} e^{−x²/2} dx
           = (2/√(2π)) ∫_0^{√y} e^{−x²/2} dx = (1/√(2π)) ∫_0^y (e^{−t/2}/√t) dt.

Differentiation with respect to y yields the probability density p_Y(y) = e^{−y/2}/√(2πy) for y > 0.
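The change-of-variables formula (20.2.1.9) can be cross-checked by simulation. The Python sketch below is illustrative only; it assumes X standard normal, as in Example 1, for which the two inverse branches ψ_{1,2}(y) = ±√y give p_Y(y) = e^{−y/2}/√(2πy), and compares this density with a histogram of simulated values of Y = X². The sample size and binning are arbitrary choices.

    import numpy as np

    # Illustrative Monte Carlo check of formula (20.2.1.9) for Y = X^2, X standard normal.
    # The two inverse branches psi_{1,2}(y) = +-sqrt(y) give
    #   p_Y(y) = [p_X(sqrt(y)) + p_X(-sqrt(y))] / (2*sqrt(y)) = exp(-y/2) / sqrt(2*pi*y).

    rng = np.random.default_rng(0)
    y_samples = rng.standard_normal(1_000_000) ** 2

    hist, edges = np.histogram(y_samples, bins=60, range=(0.0, 6.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p_Y = np.exp(-centers / 2) / np.sqrt(2 * np.pi * centers)

    # away from the integrable singularity at y = 0, the histogram tracks the formula
    # to within sampling and binning error
    print(np.max(np.abs(hist[5:] - p_Y[5:])))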
Example 2. Suppose that a random variable X has the probability density

    p_X(x) = (1/(√(2π) σ)) exp[−(x − a)²/(2σ²)].

Find the probability density of the random variable Y = e^X.

For y > 0, the cumulative distribution function of the random variable Y = e^X is determined by the relations

    F_Y(y) = P(Y < y) = P(e^X < y) = P(X < ln y) = F_X(ln y).

We differentiate this relation and obtain

    p_Y(y) = dF_Y(y)/dy = dF_X(ln y)/dy = p_X(ln y) (1/y) = (1/(√(2π) σ y)) exp[−(ln y − a)²/(2σ²)]

for y > 0. The distribution of Y is called the log-normal distribution.

Example 3. Suppose that a random variable X has the probability density p_X(x) for x ∈ (−∞, ∞). Then the probability density of the random variable Y = |X| is given by the formula

    p_Y(y) = p_X(y) + p_X(−y)    (y ≥ 0).

In particular, if X is symmetric, then p_Y(y) = 2p_X(y) (y ≥ 0).

20.2.2. Characteristics of One-Dimensional Random Variables

20.2.2-1. Expectation.

The expectation (expected value) E{X} of a discrete or continuous random variable X is the expression given by the formula

    E{X} = ∫_{−∞}^{+∞} x dF(x) =
        Σ_i x_i p_i                     in the discrete case,
        ∫_{−∞}^{+∞} x p(x) dx           in the continuous case.    (20.2.2.1)

For the existence of the expectation (20.2.2.1), it is necessary that the corresponding series or integral converge absolutely.

The expectation is the main characteristic defining the "position" of a random variable, i.e., the number near which its possible values are concentrated. We note that the expectation is not a function of the variable x but a functional describing the properties of the distribution of the random variable X. There are distributions for which the expectation does not exist.

Example 1. For the Cauchy distribution given by the probability density function

    p(x) = 1/[π(1 + x²)],    x ∈ (−∞, +∞),

the expectation does not exist because the integral ∫_{−∞}^{+∞} |x|/[π(1 + x²)] dx diverges.

20.2.2-2. Expectation of function of random variable.

If a random variable Y is related to a random variable X by a functional dependence Y = f(X), then the expectation of the random variable Y = f(X) can be determined by two methods. The first method is to construct the distribution of the random variable Y and then use already known formulas to find E{Y}. The second method is to use the formulas

    E{Y} = E{f(X)} =
        Σ_i f(x_i) p_i                  in the discrete case,
        ∫_{−∞}^{+∞} f(x) p(x) dx        in the continuous case    (20.2.2.2)

if these expressions exist in the sense of absolute convergence.

Example 2. Suppose that a random variable X is uniformly distributed in the interval (−π/2, π/2), i.e., p(x) = 1/π for x ∈ (−π/2, π/2). Then the expectation of the random variable Y = sin(X) is equal to

    E{Y} = ∫_{−∞}^{+∞} f(x) p(x) dx = ∫_{−π/2}^{π/2} (1/π) sin x dx = 0.

Properties of the expectation:
1. E{C} = C for any real C.
2. E{αX + βY} = αE{X} + βE{Y} for any real α and β.
3. E{X} ≤ E{Y} if X(ω) ≤ Y(ω), ω ∈ Ω.
4. E{Σ_{k=1}^{∞} X_k} = Σ_{k=1}^{∞} E{X_k} if the series Σ_{k=1}^{∞} E{|X_k|} converges.
5. g(E{X}) ≤ E{g(X)} for convex functions g.
6. Any bounded random variable has a finite expectation.
7. |E{X}| ≤ E{|X|}.
8. The Cauchy–Schwarz inequality (E{|XY|})² ≤ E{X²} E{Y²} holds.
9. E{Π_{k=1}^{n} X_k} = Π_{k=1}^{n} E{X_k} for mutually independent random variables X_1, ..., X_n.
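The two methods of Paragraph 20.2.2-2 can be compared numerically. The Python sketch below is illustrative and reproduces Example 2 (X uniform on (−π/2, π/2), Y = sin X, so E{Y} = 0); the grid and sample sizes are arbitrary assumptions.

    import numpy as np

    # Illustrative sketch of the two ways to compute E{f(X)} from Paragraph 20.2.2-2,
    # applied to Example 2: X uniform on (-pi/2, pi/2), f(x) = sin x, exact answer 0.

    # Formula (20.2.2.2): E{f(X)} = \int f(x) p(x) dx, approximated by a Riemann sum
    x = np.linspace(-np.pi / 2, np.pi / 2, 100001)
    p = np.full_like(x, 1 / np.pi)                    # uniform density on (-pi/2, pi/2)
    integral = np.sum(np.sin(x) * p) * (x[1] - x[0])

    # Alternative: average f over samples of X, without constructing the distribution of Y
    rng = np.random.default_rng(0)
    samples = rng.uniform(-np.pi / 2, np.pi / 2, 1_000_000)
    mc_mean = np.mean(np.sin(samples))

    print(integral, mc_mean)   # both values are close to 0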
20.2.2-3. Moments.

The expectation E{(X − a)^k} is called the kth moment of a random variable X about a. The moments about zero are usually referred to simply as the moments of a random variable. (Sometimes they are called initial moments.) The kth moment satisfies the relation

    α_k = E{X^k} = ∫_{−∞}^{+∞} x^k dF(x) =
        Σ_i x_i^k p_i                   in the discrete case,
        ∫_{−∞}^{+∞} x^k p(x) dx         in the continuous case.    (20.2.2.3)

If a = E{X}, then the kth moment of the random variable X about a is called the kth central moment. The kth central moment satisfies the relation

    μ_k = E{(X − E{X})^k} =
        Σ_i (x_i − E{X})^k p_i                  in the discrete case,
        ∫_{−∞}^{+∞} (x − E{X})^k p(x) dx        in the continuous case.    (20.2.2.4)

In particular, μ_0 = 1 for any random variable.

The number m_k = E{|X − a|^k} is called the kth absolute moment of X about a.

The existence of a kth moment α_k or μ_k implies the existence of the moments α_m and μ_m of orders m ≤ k; if the integral (or series) for α_k or μ_k diverges, then all integrals (series) for α_m and μ_m of orders m ≥ k also diverge.

There is a simple relationship between the central and initial moments:

    μ_k = Σ_{m=0}^{k} C_k^m α_m (−α_1)^{k−m},    α_0 = 1;    α_k = Σ_{m=0}^{k} C_k^m μ_m (α_1)^{k−m}.    (20.2.2.5)

Relations (20.2.2.5) can be represented in the following easy-to-memorize symbolic form:

    μ_k = (α − α_1)^k,    α_k = (μ + α_1)^k,

where it is assumed that after the right-hand sides have been multiplied out according to the binomial formula, the expressions α^m and μ^m are replaced by α_m and μ_m, respectively.

If the probability distribution is symmetric about its expectation, then all existing central moments μ_k of odd order k are zero.

The probability distribution is uniquely determined by the moments α_0, α_1, ..., provided that they all exist and the series Σ_{m=0}^{∞} |α_m| t^m/m! converges for some t > 0.

20.2.2-4. Variance.

The variance of a random variable is the measure Var{X} of the deviation of a random variable X from its expectation E{X}, determined by the relation

    Var{X} = E{(X − E{X})²}.    (20.2.2.6)

The variance Var{X} is the second central moment of the random variable X. The variance can be determined by the formulas

    Var{X} = ∫_{−∞}^{+∞} (x − E{X})² dF(x) =
        Σ_i (x_i − E{X})² p_i                   in the discrete case,
        ∫_{−∞}^{+∞} (x − E{X})² p(x) dx         in the continuous case.    (20.2.2.7)

The variance characterizes the spread in values of the random variable X about its expectation.

Properties of the variance:
1. Var{C} = 0 for any real C.
2. The variance is nonnegative: Var{X} ≥ 0.
3. Var{αX + β} = α² Var{X} for any real numbers α and β.
4. Var{X} = E{X²} − (E{X})².
5. min_m E{(X − m)²} = Var{X}, and the minimum is attained for m = E{X}.
6. Var{X_1 + ··· + X_n} = Var{X_1} + ··· + Var{X_n} for pairwise independent random variables X_1, ..., X_n.
7. If X and Y are independent random variables, then Var{XY} = Var{X} Var{Y} + Var{X}(E{Y})² + Var{Y}(E{X})².

20.2.2-5. Numerical characteristics of random variables.

A quantile of level γ of a one-dimensional distribution is a number t_γ for which the value of the corresponding distribution function is equal to γ; i.e.,

    P(X < t_γ) = F(t_γ) = γ    (0 < γ < 1).    (20.2.2.8)

Quantiles exist for each probability distribution, but they are not necessarily uniquely determined. Quantiles are widely used in statistics. The quantile t_{1/2} is called the median Med{X}. For n = 4, the quantiles t_{m/n} are called quartiles, for n = 10, they are called deciles, and for n = 100, they are called percentiles.

A mode Mode{X} of a continuous probability distribution is a point of maximum of the probability density function p(x). A mode of a discrete probability distribution is a value Mode{X} preceded and followed by values associated with probabilities smaller than p(Mode{X}). Distributions with one, two, or more modes are said to be unimodal, bimodal, or multimodal, respectively.
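For a concrete discrete distribution these characteristics are easy to compute directly. The Python sketch below uses a small hypothetical distribution (the values and probabilities are made up for illustration); it verifies variance property 4, Var{X} = E{X²} − (E{X})², and picks out the mode and a median in the sense of Paragraph 20.2.2-5 (for a discrete distribution the median is not necessarily unique; the smallest admissible value is taken here).

    import numpy as np

    # Hypothetical discrete distribution, used only to illustrate (20.2.2.7),
    # variance property 4, and the mode/median of Paragraph 20.2.2-5.
    x = np.array([0, 1, 2, 3, 4])            # values x_i
    p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # probabilities p_i (sum to 1)

    mean = np.sum(x * p)                              # E{X}
    var_central = np.sum((x - mean) ** 2 * p)         # E{(X - E{X})^2}, formula (20.2.2.7)
    var_shortcut = np.sum(x ** 2 * p) - mean ** 2     # E{X^2} - (E{X})^2, property 4
    print(var_central, var_shortcut)                  # both equal 1.2

    mode = x[np.argmax(p)]                            # value with the largest probability
    F = np.cumsum(p)                                  # P(X <= x_i)
    median = x[np.searchsorted(F, 0.5)]               # smallest x_i with P(X <= x_i) >= 0.5
    print(mode, median)                               # 2 and 2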
The standard deviation (root-mean-square deviation) of a random variable X is the square root of its variance,

    σ = √(Var{X}).

The standard deviation has the same dimension as the random variable itself.

The coefficient of variation is the ratio of the standard deviation to the expected value,

    v = σ/E{X}.

The asymmetry coefficient, or skewness, is defined by the formula

    γ_1 = μ_3/(μ_2)^{3/2}.    (20.2.2.9)

If γ_1 > 0, then the distribution curve is more flattened to the right of the mode Mode{X}; if γ_1 < 0, then the distribution curve is more flattened to the left of the mode Mode{X} (see Fig. 20.1). (As a rule, this applies to continuous random variables.)

Figure 20.1. Relationship of the distribution curve and the asymmetry coefficient.

The excess coefficient, or excess, or kurtosis, is defined by the formula

    γ_2 = μ_4/μ_2² − 3.    (20.2.2.10)

One says that for γ_2 = 0 the distribution has a normal excess, for γ_2 > 0 the distribution has a positive excess, and for γ_2 < 0 the distribution has a negative excess.

Remark. The coefficients γ_1² and γ_2 + 3 or (γ_2 + 3)/2 are often used instead of γ_1 and γ_2.

Pearson's first skewness coefficient for a unimodal distribution is defined by the formula

    s = (E{X} − Mode{X})/σ.
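The coefficients (20.2.2.9) and (20.2.2.10) can be estimated from simulated data. The Python sketch below is illustrative; it assumes X exponentially distributed with unit rate, for which the exact values are γ_1 = 2 and γ_2 = 6, and estimates the central moments by sample averages (the sample size is an arbitrary choice, so the estimates carry Monte Carlo error).

    import numpy as np

    # Illustrative estimation of skewness (20.2.2.9) and excess (20.2.2.10);
    # assumption: X exponential with unit rate, exact gamma_1 = 2, gamma_2 = 6.

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=2_000_000)

    mu = np.mean(x)
    m2 = np.mean((x - mu) ** 2)      # central moment mu_2 (the variance)
    m3 = np.mean((x - mu) ** 3)      # central moment mu_3
    m4 = np.mean((x - mu) ** 4)      # central moment mu_4

    gamma1 = m3 / m2 ** 1.5          # asymmetry coefficient, formula (20.2.2.9)
    gamma2 = m4 / m2 ** 2 - 3        # excess coefficient, formula (20.2.2.10)
    print(gamma1, gamma2)            # approximately 2 and 6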