Handbook of Mathematics for Engineers and Scientists, Part 160

Chapter 21. Mathematical Statistics

21.1. Introduction to Mathematical Statistics

21.1.1. Basic Notions and Problems of Mathematical Statistics

21.1.1-1. Basic problems of mathematical statistics.

The term "statistics" derives from the Latin word "status," meaning "state." Statistics comprises three major divisions: the collection of statistical data, their statistical analysis, and the development of mathematical methods for processing and using statistical data to draw scientific and practical conclusions. It is the last division that is commonly known as mathematical statistics.

The original material for a statistical study is a set of results specially gathered for this study or a set of results of specially performed experiments. The following problems arise in this connection.

1. Estimating the unknown probability of a random event.

2. Finding the unknown theoretical distribution function. The problem is stated as follows. Given the values x_1, ..., x_n of a random variable X obtained in n independent trials, find, at least approximately, the unknown distribution function F(x) of the random variable X.

3. Determining the unknown parameters of the theoretical distribution function. The problem is stated as follows. A random variable X has the distribution function F(x; θ_1, ..., θ_k) depending on k parameters θ_1, ..., θ_k, whose values are unknown. The main goal is to estimate the unknown parameters θ_1, ..., θ_k using only the results X_1, ..., X_n of observations of the random variable X. Instead of seeking approximate values of the unknown parameters θ_1, ..., θ_k in the form of functions θ*_1, ..., θ*_k, in a number of problems it is preferable to seek functions θ*_{i,L} and θ*_{i,R} (i = 1, 2, ..., k), depending on the results of observations and on known variables, such that with sufficient reliability one can claim that θ*_{i,L} < θ_i < θ*_{i,R} (i = 1, 2, ..., k). The functions θ*_{i,L} and θ*_{i,R} (i = 1, 2, ..., k) are called the confidence boundaries for θ_1, ..., θ_k.

4. Testing statistical hypotheses. The problem is stated as follows. Some reasoning suggests that the distribution function of a random variable X is F(x); the question is whether the observed values are compatible with the hypothesis that the random variable X has the distribution F(x).

5. Estimation of dependence. A sequence of observations is performed simultaneously for two random variables X and Y. The results of observations are given by the pairs of values x_1, y_1; x_2, y_2; ...; x_n, y_n. It is required to find a functional or correlation relationship between X and Y.

21.1.1-2. Population and sample.

The set of all possible results of observations that can be made under a given set of conditions is called the population. In some problems, the population is treated as a random variable X.

An example of a population is the entire population of a country; in this population we may, for example, be interested in the age of people. Another example of a population is the set of parts produced by a given machine; these parts can be either accepted or rejected. The number of entities in a population is called its size and is usually denoted by N.

A set of entities randomly selected from a population is called a sample. A sample must be representative of the population; i.e., it must show the right proportions characteristic of the population. This is achieved by the randomness of the selection, when all entities in the population can be selected with the same probability.

The number of elements in a sample is called its size and is usually denoted by n. The elements of a sample will be denoted by X_1, ..., X_n.

Note that sampling itself can be performed by various methods. Having selected an element and measured its value, one can delete this element from the population so that it cannot be selected in subsequent trials (sampling without replacement). Alternatively, after measuring the value of an element, one can return it to the population (sampling with replacement). Obviously, for a sufficiently large population size the difference between sampling with and without replacement disappears.
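The last remark can be checked numerically. The following minimal sketch (Python with NumPy assumed; the population, its size N, and the sample size n are invented for illustration) draws one sample under each scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical population of size N = 100,000 (values are arbitrary).
population = rng.normal(loc=50.0, scale=10.0, size=100_000)

n = 30  # sample size

# Sampling with replacement: an element may be drawn more than once.
sample_with = rng.choice(population, size=n, replace=True)

# Sampling without replacement: each element can be drawn at most once.
sample_without = rng.choice(population, size=n, replace=False)

# For n much smaller than N the two schemes are practically indistinguishable.
print(sample_with.mean(), sample_without.mean())
```

With n much smaller than N, the finite-population effect of deleting drawn elements is negligible, which is the practical content of the remark above.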
21.1.1-3. Theoretical distribution function.

Each element X_i of a sample has the distribution function F(x), and the elements X_1, ..., X_n are assumed to be independent; this holds exactly for sampling with replacement and asymptotically, as the population size N → ∞, for sampling without replacement. A sample X_1, ..., X_n is interpreted as a set of n independent identically distributed random variables with distribution function F(x), or as n independent realizations of an observable random variable X with distribution function F(x). The distribution function F(x) is called the theoretical distribution function.

The joint distribution function F_{X_1,...,X_n}(x_1, ..., x_n) of the sample X_1, ..., X_n is given by the formula

    F_{X_1,...,X_n}(x_1, ..., x_n) = P(X_1 < x_1, ..., X_n < x_n) = F(x_1) F(x_2) ··· F(x_n).   (21.1.1.1)

21.1.2. Simplest Statistical Transformations

21.1.2-1. Series of order statistics.

By arranging the elements of a sample X_1, ..., X_n in ascending order, X_(1) ≤ ··· ≤ X_(n), we obtain the series of order statistics X_(1), ..., X_(n). Obviously, this transformation does not lead to a loss of information about the theoretical distribution function. The variables X_(1) and X_(n) are called the extreme order statistics. The difference

    R = X_(n) − X_(1)   (21.1.2.1)

of the extreme order statistics is called the range statistic, or the sample range R. The series of order statistics is used to construct the empirical distribution function (see Paragraph 21.1.2-6).

21.1.2-2. Statistical series.

If a sample X_1, ..., X_n contains coinciding elements, which may happen in observations of a discrete random variable, then it is expedient to group the elements. For a common value of several variates in the sample X_1, ..., X_n, the size of the corresponding group of coinciding elements is called the frequency, or the weight, of this variate value. By n_i we denote the number of occurrences of the ith variate value. The set Z_1, ..., Z_L of distinct variate values arranged in ascending order, together with the corresponding frequencies n_1, ..., n_L, represents the sample X_1, ..., X_n and is called a statistical series (see Example 1 in Paragraph 21.1.2-7).
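Both transformations amount to sorting and frequency counting. A minimal sketch (Python assumed; the discrete sample values are made up for illustration):

```python
from collections import Counter

sample = [3, 1, 4, 1, 5, 2, 2, 3, 1, 4]  # hypothetical observations of a discrete X

# Series of order statistics X_(1) <= ... <= X_(n).
order_stats = sorted(sample)

# Sample range R = X_(n) - X_(1), formula (21.1.2.1).
R = order_stats[-1] - order_stats[0]

# Statistical series: distinct variate values Z_1 < ... < Z_L
# paired with their frequencies n_1, ..., n_L.
statistical_series = sorted(Counter(sample).items())

print("order statistics:", order_stats)
print("sample range R =", R)
print("statistical series:", statistical_series)
```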
21.1.2-3. Interval series.

Interval series are used in observations of continuous random variables. In this case, the entire sample range is divided into finitely many bins, or class intervals, and then the number of variates in each bin is calculated. The ordered sequence of class intervals with the corresponding frequencies or relative frequencies of occurrences of variates in each of these intervals is called an interval series. It is convenient to represent an interval series as a table with two rows (e.g., see Example 2 in Paragraph 21.1.2-7). The first row of the table contains the class intervals [x_0, x_1), [x_1, x_2), ..., [x_{L−1}, x_L), which are usually chosen to have the same length. The interval length h is usually determined by the Sturges formula

    h = (X_(n) − X_(1)) / (1 + log₂ n),   (21.1.2.2)

where 1 + log₂ n = L is the number of intervals (log₂ n ≈ 3.322 lg n). The second row of the interval series contains the frequencies or relative frequencies of occurrences of the sample elements in each of these intervals.

Remark. It is recommended to take X_(1) − h/2 for the left boundary of the first interval.

21.1.2-4. Relative frequencies.

Let H be the event that the value of a random variable X belongs to a set S_H. Suppose also that a random sample X_1, ..., X_n is given. The number n_H of sample elements lying in S_H is called the frequency of the event H. The ratio of the frequency n_H to the sample size is called the relative frequency and is denoted by

    p*_H = n_H / n.   (21.1.2.3)

Since a random sample can be treated as the result of a sequence of n Bernoulli trials (Paragraph 20.1.3-2), the random variable n_H has the binomial distribution with parameter p = P(H), where P(H) is the probability of the event H. One has

    E{p*_H} = P(H),   Var{p*_H} = P(H)[1 − P(H)] / n.   (21.1.2.4)

The relative frequency p*_H is an unbiased consistent estimator of the corresponding probability P(H). As n → ∞, the estimator p*_H is asymptotically normal with the parameters (21.1.2.4).

Let H_i (i = 1, 2, ..., L) be the random events that the random variable takes the value Z_i (in the discrete case) or lies in the ith interval of the interval series (in the continuous case), and let n_i and p*_i be their frequencies and relative frequencies, respectively. The cumulative frequencies N_l are determined by the formula

    N_l = Σ_{i=1}^{l} n_i.   (21.1.2.5)

The cumulative relative frequencies W_l are given by the expression

    W_l = Σ_{i=1}^{l} p*_i = N_l / n.   (21.1.2.6)
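A sketch combining the interval series of Paragraph 21.1.2-3 with the relative and cumulative frequencies of Paragraph 21.1.2-4 (Python with NumPy assumed; the data are invented, and since 1 + log₂ n is generally not an integer, rounding L up is a pragmatic choice made here, not prescribed by the text):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=200)  # hypothetical continuous observations
n = len(x)

# Number of intervals L = 1 + log2 n and interval length h, formula (21.1.2.2);
# here L is rounded up to an integer.
L = 1 + math.ceil(math.log2(n))
h = (x.max() - x.min()) / L

left = x.min() - h / 2               # left boundary shifted by h/2, as in the remark
edges = left + h * np.arange(L + 2)  # boundaries of L + 1 class intervals of length h

counts, _ = np.histogram(x, bins=edges)  # frequencies n_i
rel_freq = counts / n                    # relative frequencies p*_i, formula (21.1.2.3)
cum_rel = np.cumsum(rel_freq)            # cumulative relative frequencies W_l, (21.1.2.6)

for lo, hi, ni, wl in zip(edges[:-1], edges[1:], counts, cum_rel):
    print(f"[{lo:6.2f}, {hi:6.2f})  n_i = {ni:3d}  W_l = {wl:.3f}")
```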
21.1.2-5. Notion of statistic.

To make justified statistical conclusions, one needs a sample of sufficiently large size n. Obviously, it is rather difficult to use and store such samples. The notion of statistic allows one to avoid these problems. A statistic S = (S_1, ..., S_k) is an arbitrary k-dimensional function of the sample X_1, ..., X_n:

    S_i = S_i(X_1, ..., X_n)   (i = 1, 2, ..., k).   (21.1.2.7)

Being a function of the random vector (X_1, ..., X_n), the statistic S = (S_1, ..., S_k) is itself a random vector, and its distribution function F_{S_1,...,S_k}(x_1, ..., x_k) = P(S_1 < x_1, ..., S_k < x_k) is given by the formula

    F_{S_1,...,S_k}(x_1, ..., x_k) = Σ P(y_1) ··· P(y_n)

for a discrete random variable X and by the formula

    F_{S_1,...,S_k}(x_1, ..., x_k) = ∫ ··· ∫ p(y_1) ··· p(y_n) dy_1 ··· dy_n

for a continuous random variable, where the summation or integration extends over all possible values y_1, ..., y_n (in the discrete case, each y_i belongs to the set Z_1, ..., Z_L) satisfying the inequalities S_1(y_1, ..., y_n) < x_1, S_2(y_1, ..., y_n) < x_2, ..., S_k(y_1, ..., y_n) < x_k.

21.1.2-6. Empirical distribution function.

The empirical (sample) distribution function corresponding to a random sample X_1, ..., X_n is defined for each real x by the formula

    F*_n(x) = μ_n(X_1, ..., X_n; x) / n,   (21.1.2.8)

where μ_n(X_1, ..., X_n; x) is the number of sample elements whose values are less than x. It is a nondecreasing step function such that F*_n(−∞) = 0 and F*_n(+∞) = 1.

Since each X_i is less than x with probability p_x = F(x), while the X_i themselves are independent, it follows that μ_n(X_1, ..., X_n; x) is an integer random variable distributed according to the binomial law

    P(μ_n(X_1, ..., X_n; x) = k) = C_n^k [F(x)]^k [1 − F(x)]^{n−k},

with E{F*_n(x)} = F(x) and Var{F*_n(x)} = F(x)[1 − F(x)]/n.

By the Glivenko–Cantelli theorem,

    D_n = sup_x |F*_n(x) − F(x)| → 0 almost surely   (21.1.2.9)

as n → ∞; i.e., the variable D_n converges to 0 with probability 1, or almost surely (see Paragraph 20.3.1-2). The random variable D_n measures how close F*_n(x) and F(x) are. The empirical distribution function F*_n(x) is an unbiased consistent estimator of the theoretical distribution function.

If a sample is given by a statistical series, then the following formula can be used:

    F*(x) = Σ_{Z_i < x} p*_i.   (21.1.2.10)

It is convenient to construct the empirical distribution function F*_n(x) using the series of order statistics X_(1) ≤ ··· ≤ X_(n). In this case,

    F*_n(x) = 0     if x ≤ X_(1),
              k/n   if X_(k) < x ≤ X_(k+1),
              1     if x > X_(n);   (21.1.2.11)

i.e., the function F*_n(x) is constant on each interval (X_(k), X_(k+1)] and increases by 1/n at each point X_(k).
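A sketch of F*_n(x) in the form (21.1.2.11) and of the statistic D_n from (21.1.2.9) (Python with NumPy assumed; the choice of a standard normal sample, so that F(x) is the normal distribution function expressed through erf, is purely illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
sample = rng.normal(size=500)  # hypothetical observations; theoretical F is Phi
xs = np.sort(sample)           # series of order statistics X_(1) <= ... <= X_(n)
n = len(xs)

def ecdf(x):
    """Empirical distribution function F*_n(x), formula (21.1.2.11):
    the fraction of sample elements strictly less than x."""
    return np.searchsorted(xs, x, side="left") / n

def Phi(t):
    """Theoretical distribution function F(x) of the standard normal law."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

print("F*_n(0) =", ecdf(0.0))  # should be close to Phi(0) = 0.5

# The supremum in (21.1.2.9) is attained at the jump points X_(k), so it is
# enough to compare F with both one-sided values of F*_n at each X_(k).
F = np.array([Phi(t) for t in xs])
k = np.arange(1, n + 1)
D_n = max(np.abs(k / n - F).max(), np.abs((k - 1) / n - F).max())
print("D_n =", D_n)  # tends to 0 almost surely as n grows (Glivenko-Cantelli)
```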
21.1.2-7. Graphical representation of statistical distribution.

1°. A broken line passing through the points with coordinates (Z_i, n_i) (i = 1, 2, ..., L), where the Z_i are the variate values in a statistical series and the n_i are the corresponding frequencies, is called the frequency polygon, or a distribution polygon. If the relative frequencies p*_1 = n_1/n, ..., p*_L = n_L/n are used instead of the frequencies n_i (n_1 + ··· + n_L = n), then the polygon is called the relative frequency polygon.

Example 1. For the statistical series

    Z_j   | 0     1     2     3     4     5
    p*_j  | 0.1   0.15  0.3   0.25  0.15  0.05

the relative frequency polygon, i.e., the broken line through the points (Z_j, p*_j), has the form shown in Fig. 21.1.

Figure 21.1. Example of a relative frequency polygon.

2°. The bar graph consisting of rectangles whose bases are class intervals of length Δ_i = x_{i+1} − x_i and whose heights are equal to the frequency densities n_i/Δ_i is called the frequency histogram. The area of a frequency histogram is equal to the size of the corresponding random sample.

The bar graph consisting of rectangles whose bases are class intervals of length Δ_i = x_{i+1} − x_i and whose heights are equal to the relative frequency densities p*_i/Δ_i = n_i/(nΔ_i) is called the relative frequency histogram. The area of the relative frequency histogram is equal to 1. The relative frequency histogram is an estimator of the probability density.

Example 2. For the interval series

    [x_i, x_{i+1})  | [0, 5)  [5, 10)  [10, 15)  [15, 20)  [20, 25)
    n_i             | 4       6        12        10        8

the relative frequency histogram has the form shown in Fig. 21.2.

Figure 21.2. Example of a relative frequency histogram.

21.1.2-8. Main distributions of mathematical statistics.

The normal distribution, the chi-square distribution, and the Student distribution were considered in Paragraphs 20.2.4-3, 20.2.4-5, and 20.2.4-6, respectively.

1°. A random variable Ψ has a Fisher–Snedecor distribution, or an F-distribution, with n_1 and n_2 degrees of freedom if

    Ψ = (n_2 χ²_1) / (n_1 χ²_2),   (21.1.2.12)

where χ²_1 and χ²_2 are independent random variables obeying the chi-square distribution with n_1 and n_2 degrees of freedom, respectively. The F-distribution is characterized by the probability density function

    f(x) = [Γ((n_1 + n_2)/2) / (Γ(n_1/2) Γ(n_2/2))] n_1^{n_1/2} n_2^{n_2/2} x^{n_1/2 − 1} (n_2 + n_1 x)^{−(n_1 + n_2)/2}   (x > 0),   (21.1.2.13)

where Γ(x) is the gamma function. The quantiles of the F-distribution are usually denoted by φ_α.

2°. The Kolmogorov distribution function has the form

    K(x) = Σ_{k=−∞}^{∞} (−1)^k e^{−2k²x²}   (x > 0).   (21.1.2.14)

The Kolmogorov distribution is the distribution of the random variable η = max_{0≤t≤1} |ξ(t)|, where ξ(t) is a Wiener process on the interval 0 ≤ t ≤ 1 with fixed endpoints ξ(0) = 0 and ξ(1) = 0.

21.1.3. Numerical Characteristics of Statistical Distribution

21.1.3-1. Sample moments.

The kth sample moment of a random sample X_1, ..., X_n is defined as

    α*_k = (1/n) Σ_{i=1}^{n} X_i^k.   (21.1.3.1)

The kth sample central moment of a random sample X_1, ..., X_n is defined as

    μ*_k = (1/n) Σ_{i=1}^{n} (X_i − α*_1)^k.   (21.1.3.2)

The sample moments satisfy the following formulas:

    E{α*_k} = α_k,   Var{α*_k} = (α_{2k} − α_k²)/n,   (21.1.3.3)

    E{μ*_k} = μ_k + O(1/n),
    Var{μ*_k} = [μ_{2k} − 2k μ_{k−1} μ_{k+1} − μ_k² + k² μ_2 μ_{k−1}²]/n + O(1/n²).   (21.1.3.4)

The sample moment α*_k is an unbiased consistent estimator of the corresponding population moment α_k. The sample central moment μ*_k is a biased consistent estimator of the corresponding population central moment μ_k. If the moment α_{2k} exists, then the sample moment α*_k is asymptotically normally distributed with parameters (α_k, (α_{2k} − α_k²)/n) as n → ∞.

Unbiased consistent estimators for μ_3 and μ_4 are given by

    μ̂_3 = n² μ*_3 / [(n − 1)(n − 2)],
    μ̂_4 = [n(n² − 2n + 3) μ*_4 − 3n(2n − 3)(μ*_2)²] / [(n − 1)(n − 2)(n − 3)].   (21.1.3.5)

21.1.3-2. Sample mean.

The sample mean of a random sample X_1, ..., X_n is defined as the first-order sample moment, i.e.,

    m* = α*_1 = (1/n) Σ_{i=1}^{n} X_i.   (21.1.3.6)

The sample mean of a random sample X_1, ..., X_n is also denoted by X̄. It satisfies the following formulas:

    E{m*} = m   (m = α_1),   Var{m*} = σ²/n,   (21.1.3.7)

    E{(m* − m)³} = μ_3/n²,   E{(m* − m)⁴} = [3(n − 1)σ⁴ + μ_4]/n³.   (21.1.3.8)

The sample mean m* is an unbiased consistent estimator of the population expectation E{X} = m. If the population variance σ² exists, then the sample mean m* is asymptotically normally distributed with parameters (m, σ²/n). The sample mean of the function Y = f(X) of a random variable X is

    Ȳ = (1/n) Σ_{i=1}^{n} f(X_i).
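A sketch of the sample moments (21.1.3.1), (21.1.3.2) and of the bias-corrected estimators (21.1.3.5) (Python with NumPy assumed; the helper names and the exponential test data are mine, chosen because the Exp(1) law has the known values μ_3 = 2 and μ_4 = 9):

```python
import numpy as np

def sample_moment(x, k):
    """k-th sample moment alpha*_k, formula (21.1.3.1)."""
    return np.mean(x ** k)

def sample_central_moment(x, k):
    """k-th sample central moment mu*_k, formula (21.1.3.2)."""
    return np.mean((x - x.mean()) ** k)

def unbiased_mu3(x):
    """Unbiased estimator of mu_3, first formula in (21.1.3.5)."""
    n = len(x)
    return n**2 * sample_central_moment(x, 3) / ((n - 1) * (n - 2))

def unbiased_mu4(x):
    """Unbiased estimator of mu_4, second formula in (21.1.3.5)."""
    n = len(x)
    m2 = sample_central_moment(x, 2)
    m4 = sample_central_moment(x, 4)
    return (n * (n**2 - 2*n + 3) * m4 - 3 * n * (2*n - 3) * m2**2) / (
        (n - 1) * (n - 2) * (n - 3))

rng = np.random.default_rng(3)
x = rng.exponential(size=1000)  # hypothetical skewed data with mu_3 = 2, mu_4 = 9
print("sample mean m* =", sample_moment(x, 1))   # (21.1.3.6); close to 1
print("mu_3 estimate  =", unbiased_mu3(x))       # close to 2
print("mu_4 estimate  =", unbiased_mu4(x))       # close to 9
```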
