Class Notes in Statistics and Econometrics, Part 4


CHAPTER 7

Chebyshev Inequality, Weak Law of Large Numbers, and Central Limit Theorem

7.1. Chebyshev Inequality

If the random variable y has finite expected value µ and standard deviation σ, and k is some positive number, then the Chebyshev Inequality says

(7.1.1)  Pr[|y − µ| ≥ kσ] ≤ 1/k².

In words, the probability that a given random variable y differs from its expected value by more than k standard deviations is less than 1/k². (Here "more than" and "less than" are short forms for "more than or equal to" and "less than or equal to.") One does not need to know the full distribution of y for that, only its expected value and standard deviation. We will give a proof here only for the case that y has a discrete distribution, but the inequality is valid in general.

Going over to the standardized variable z = (y − µ)/σ, we have to show Pr[|z| ≥ k] ≤ 1/k². Assume z takes the values z_1, z_2, . . . with probabilities p(z_1), p(z_2), . . . ; then

(7.1.2)  Pr[|z| ≥ k] = ∑_{i : |z_i| ≥ k} p(z_i).

Now multiply by k²:

(7.1.3)  k² Pr[|z| ≥ k] = ∑_{i : |z_i| ≥ k} k² p(z_i)

(7.1.4)  ≤ ∑_{i : |z_i| ≥ k} z_i² p(z_i)

(7.1.5)  ≤ ∑_{all i} z_i² p(z_i) = var[z] = 1.

Dividing by k² gives (7.1.1).

The Chebyshev inequality is sharp for all k ≥ 1. Proof: the random variable which takes the value −k with probability 1/(2k²), the value +k with probability 1/(2k²), and the value 0 with probability 1 − 1/k² has expected value 0 and variance 1, and for it the ≤-sign in (7.1.1) becomes an equality.

Problem 115. [HT83, p. 316] Let y be the number of successes in n trials of a Bernoulli experiment with success probability p. Show that

(7.1.6)  Pr[|y/n − p| < ε] ≥ 1 − 1/(4nε²).

Hint: first compute what Chebyshev tells you about the lefthand side, and then you will need still another inequality.

Answer. E[y/n] = p and var[y/n] = pq/n (where q = 1 − p). Chebyshev therefore says

(7.1.7)  Pr[|y/n − p| ≥ k√(pq/n)] ≤ 1/k².

Setting ε = k√(pq/n), i.e., 1/k² = pq/(nε²), one can rewrite (7.1.7) as

(7.1.8)  Pr[|y/n − p| ≥ ε] ≤ pq/(nε²).

Now note that pq ≤ 1/4 whatever the values of p and q are.

Problem 116. 2 points For a standard normal variable, Pr[|z| ≥ 1] is approximately 1/3; please look up the precise value in a table. What does the Chebyshev inequality say about this probability? Also, Pr[|z| ≥ 2] is approximately 5%; again look up the precise value. What does Chebyshev say?

Answer. Pr[|z| ≥ 1] = 0.3174; the Chebyshev inequality only says that Pr[|z| ≥ 1] ≤ 1. Also, Pr[|z| ≥ 2] = 0.0456, while Chebyshev says it is ≤ 0.25.
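The inequality is also easy to check numerically. The following is a minimal simulation sketch, assuming Python with NumPy is available (the distribution, sample sizes, and seed are arbitrary choices for illustration): it compares the empirical tail probability Pr[|y − µ| ≥ kσ] with the bound 1/k² for an exponential variable, and verifies that the three-point distribution described above attains the bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential(1) has mu = 1 and sigma = 1.
y = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = 1.0, 1.0

for k in (1.5, 2.0, 3.0):
    tail = np.mean(np.abs(y - mu) >= k * sigma)   # empirical Pr[|y - mu| >= k*sigma]
    print(f"k={k}: empirical tail {tail:.4f} <= Chebyshev bound {1/k**2:.4f}")

# The three-point distribution that makes (7.1.1) an equality, here for k = 2:
k = 2.0
z = rng.choice([-k, 0.0, k], size=1_000_000,
               p=[1/(2*k**2), 1 - 1/k**2, 1/(2*k**2)])
print(np.mean(np.abs(z) >= k), "vs bound", 1/k**2)   # both close to 0.25
```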
7.2. The Probability Limit and the Law of Large Numbers

Let y_1, y_2, y_3, . . . be a sequence of independent random variables all of which have the same expected value µ and variance σ². Then ȳ_n = (1/n) ∑_{i=1}^n y_i has expected value µ and variance σ²/n. I.e., its probability mass is clustered much more closely around the value µ than that of the individual y_i. To make this statement more precise we need a concept of convergence of random variables. It is not possible to define it in the "obvious" way that the sequence of random variables y_n converges toward y if every realization of them converges, since it is possible, although extremely unlikely, that e.g. all throws of a coin show heads ad infinitum, or follow another sequence for which the average number of heads does not converge towards 1/2. Therefore we will use the following definition: the sequence of random variables y_1, y_2, . . . converges in probability to another random variable y if and only if for every δ > 0

(7.2.1)  lim_{n→∞} Pr[|y_n − y| ≥ δ] = 0.

One can also say that the probability limit of y_n is y, in formulas

(7.2.2)  plim_{n→∞} y_n = y.

In many applications, the limiting variable y is a degenerate random variable, i.e., it is a constant.

The Weak Law of Large Numbers says that, if the expected value exists, then the probability limit of the sample means of an ever increasing sample is the expected value, i.e., plim_{n→∞} ȳ_n = µ.

Problem 117. 5 points Assuming that not only the expected value but also the variance exists, derive the Weak Law of Large Numbers, which can be written as

(7.2.3)  lim_{n→∞} Pr[|ȳ_n − E[y]| ≥ δ] = 0 for all δ > 0,

from the Chebyshev inequality

(7.2.4)  Pr[|x − µ| ≥ kσ] ≤ 1/k², where µ = E[x] and σ² = var[x].

Answer. From the nonnegativity of probability and the Chebyshev inequality applied to x = ȳ_n follows 0 ≤ Pr[|ȳ_n − µ| ≥ kσ/√n] ≤ 1/k² for all k. Set k = δ√n/σ to get 0 ≤ Pr[|ȳ_n − µ| ≥ δ] ≤ σ²/(nδ²). For any fixed δ > 0, the upper bound converges towards zero as n → ∞, and the lower bound is zero; therefore the probability itself also converges towards zero.

Problem 118. 4 points Let y_1, . . . , y_n be a sample from some unknown probability distribution, with sample mean ȳ = (1/n) ∑_{i=1}^n y_i and sample variance s² = (1/n) ∑_{i=1}^n (y_i − ȳ)². Show that the data satisfy the following "sample equivalent" of the Chebyshev inequality: if k is any fixed positive number, and m is the number of observations y_j which satisfy |y_j − ȳ| ≥ ks, then m ≤ n/k². In symbols,

(7.2.5)  #{y_i : |y_i − ȳ| ≥ ks} ≤ n/k².

Hint: apply the usual Chebyshev inequality to the so-called empirical distribution of the sample. The empirical distribution is the discrete probability distribution defined by Pr[y = y_i] = k/n when the number y_i appears k times in the sample. (If all y_i are different, then all probabilities are 1/n.) The empirical distribution corresponds to the experiment of randomly picking one observation out of the given sample.

Answer. The only thing to note is: the sample mean is the expected value in the empirical distribution, the sample variance is its variance, and the relative number m/n is the probability of the corresponding event, since for any set S

(7.2.6)  #{y_i : y_i ∈ S} = n Pr[S].

• a. 3 points What happens to this result when the distribution from which the y_i are taken does not have an expected value or a variance?

Answer. The result still holds, but ȳ and s² do not converge as the number of observations increases.
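As a numerical illustration of the Weak Law of Large Numbers, the following sketch (again assuming Python with NumPy; uniform observations, δ, and the number of replications are arbitrary choices) estimates Pr[|ȳ_n − µ| ≥ δ] by simulation and compares it with the Chebyshev bound σ²/(nδ²) used in Problem 117; both shrink towards zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma2, delta = 0.5, 1/12, 0.05      # uniform(0,1): mean 1/2, variance 1/12
reps = 2_000                             # number of simulated samples per n

for n in (10, 100, 1000, 5000):
    ybar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    p = np.mean(np.abs(ybar - mu) >= delta)          # empirical Pr[|ybar_n - mu| >= delta]
    bound = sigma2 / (n * delta**2)                  # Chebyshev upper bound
    print(f"n={n:5d}: empirical Pr = {p:.4f}   Chebyshev bound = {bound:.4f}")
```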
7.3. Central Limit Theorem

Assume all y_i are independent and have the same distribution with mean µ, variance σ², and also a moment generating function. Again, let ȳ_n be the sample mean of the first n observations. The Central Limit Theorem says that the probability distribution of

(7.3.1)  (ȳ_n − µ)/(σ/√n)

converges to a N(0, 1). This is a different concept of convergence than the probability limit; it is convergence in distribution.

Problem 119. 1 point Construct a sequence of random variables y_1, y_2, . . . with the following property: their cumulative distribution functions converge to the cumulative distribution function of a standard normal, but the random variables themselves do not converge in probability. (This is easy!)

Answer. One example would be: all y_i are independent standard normal variables.

Why do we have the funny expression (ȳ_n − µ)/(σ/√n)? Because this is the standardized version of ȳ_n. We know from the Law of Large Numbers that the distribution of ȳ_n becomes more and more concentrated around µ. If we standardize the sample averages ȳ_n, we compensate for this concentration. The Central Limit Theorem tells us therefore what happens to the shape of the cumulative distribution function of ȳ_n: if we disregard the fact that it becomes more and more concentrated (by multiplying it by a factor which is chosen such that the variance remains constant), then we see that its geometric shape comes closer and closer to a normal distribution.

Proof of the Central Limit Theorem: By Problem 120,

(7.3.2)  (ȳ_n − µ)/(σ/√n) = (1/√n) ∑_{i=1}^n (y_i − µ)/σ = (1/√n) ∑_{i=1}^n z_i, where z_i = (y_i − µ)/σ.

Let m_3, m_4, etc., be the third, fourth, etc., moments of z_i; then the m.g.f. of z_i is

(7.3.3)  m_{z_i}(t) = 1 + t²/2! + m_3 t³/3! + m_4 t⁴/4! + · · ·

Therefore the m.g.f. of (1/√n) ∑_{i=1}^n z_i is (multiply n copies together and substitute t/√n for t):

(7.3.4)  [1 + t²/(2! n) + m_3 t³/(3! n^{3/2}) + m_4 t⁴/(4! n²) + · · ·]^n = [1 + w_n/n]^n

where

(7.3.5)  w_n = t²/2! + m_3 t³/(3! √n) + m_4 t⁴/(4! n) + · · · .

Now use Euler's limit, this time in the form: if w_n → w for n → ∞, then (1 + w_n/n)^n → e^w. Since our w_n → t²/2, the m.g.f. of the standardized ȳ_n converges toward e^{t²/2}, which is the m.g.f. of a standard normal distribution.

The Central Limit Theorem is an example of emergence: independently of the distributions of the individual summands, the distribution of the sum has a very specific shape, the Gaussian bell curve. The signals turn into white noise. Here emergence is the emergence of homogeneity and indeterminacy. In capitalism, much more specific outcomes emerge: whether one quits the job or not, whether one sells the stock or not, whether one gets a divorce or not, the outcome for society is to perpetuate the system. Not many activities don't have this outcome.

Problem 120. Show in detail that (ȳ_n − µ)/(σ/√n) = (1/√n) ∑_{i=1}^n (y_i − µ)/σ.

Answer. Lhs = (√n/σ)[(1/n) ∑_{i=1}^n y_i − µ] = (√n/σ)[(1/n) ∑_{i=1}^n y_i − (1/n) ∑_{i=1}^n µ] = (√n/σ)(1/n) ∑_{i=1}^n (y_i − µ) = rhs.

Problem 121. 3 points Explain verbally clearly what the Law of Large Numbers means, what the Central Limit Theorem means, and what their difference is.
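Convergence in distribution can also be illustrated by simulation. The sketch below (a minimal example, assuming Python with NumPy; the exponential distribution and sample sizes are arbitrary choices) standardizes sample means as in (7.3.1) and compares the empirical probabilities Pr[z ≤ c] with the standard normal cdf Φ(c); the agreement improves as n grows even though the individual summands are heavily skewed.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

mu, sigma = 1.0, 1.0        # exponential(1): mean 1, standard deviation 1
reps = 100_000

for n in (2, 10, 100):
    y = rng.exponential(1.0, size=(reps, n))
    z = (y.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample mean, (7.3.1)
    for c in (-1.0, 0.0, 1.0):
        print(f"n={n:3d}  Pr[z <= {c:+.0f}] = {np.mean(z <= c):.3f}   Phi({c:+.0f}) = {phi(c):.3f}")
```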
Problem 122. (For this problem, a table is needed.) [Lar82, exercise 5.6.1, p. 301] If you roll a pair of dice 180 times, what is the approximate probability that the sum seven appears 25 or more times? Hint: use the Central Limit Theorem (but don't worry about the continuity correction, which is beyond the scope of this class).

Answer. Let x_i be the random variable that equals one if the i-th roll is a seven, and zero otherwise. Since 7 can be obtained in six ways (1+6, 2+5, 3+4, 4+3, 5+2, 6+1), the probability to get a 7 (which is at the same time the expected value of x_i) is 6/36 = 1/6. Since x_i² = x_i, var[x_i] = E[x_i] − (E[x_i])² = 1/6 − 1/36 = 5/36. Define x = ∑_{i=1}^{180} x_i. We need Pr[x ≥ 25]. Since x is the sum of many independent identically distributed random variables, the CLT says that x is asymptotically normal. Which normal? That which has the same expected value and variance as x: E[x] = 180 · (1/6) = 30 and var[x] = 180 · (5/36) = 25. Therefore define y ∼ N(30, 25). The CLT says that Pr[x ≥ 25] ≈ Pr[y ≥ 25]. Now Pr[y ≥ 25] = Pr[y − 30 ≥ −5] = Pr[(y − 30)/5 ≥ −1], and by the symmetry of the normal distribution this equals Pr[(y − 30)/5 ≤ 1]. But z = (y − 30)/5 is a standard Normal, therefore Pr[(y − 30)/5 ≤ 1] = F_z(1), i.e., the cumulative distribution function of the standard Normal evaluated at +1. One can look this up in a table; the probability asked for is .8413. Larson uses the continuity correction: x is discrete, and Pr[x ≥ 25] = Pr[x > 24]. Therefore Pr[y ≥ 25] and Pr[y > 24] are two alternative good approximations; but the best is Pr[y ≥ 24.5] = .8643. This is the continuity correction.
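The two approximations in Problem 122 can be checked against the exact binomial probability. Here is a minimal sketch, assuming only Python's standard library (math.comb and math.erf):

```python
from math import comb, erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

n, p = 180, 1/6
mu, sd = n * p, sqrt(n * p * (1 - p))        # 30 and 5

# Exact binomial tail Pr[x >= 25]
exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(25, n + 1))

plain = 1.0 - phi((25 - mu) / sd)            # CLT without correction, about .8413
corrected = 1.0 - phi((24.5 - mu) / sd)      # with continuity correction, about .8643

print(f"exact binomial:            {exact:.4f}")
print(f"CLT, no correction:        {plain:.4f}")
print(f"CLT, continuity corrected: {corrected:.4f}")
```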
CHAPTER 8

Vector Random Variables

In this chapter [...]

[...] into the integral, i.e., the formula is

(8.1.1)  E[g(x, y)] = ∫∫_{R²} g(x, y) f_{x,y}(x, y) dx dy.

Problem 124. Assume there are two transportation choices available: bus and car. If you pick at random a neoclassical individual ω and ask which utility this person derives from using bus or car, the answer will be two numbers that can be written as a vector with components u(ω) and v(ω) (u for bus and v for car).

• a. 3 points Assuming [...]

[...] functions, the same is true for the joint density function and the joint probability mass function. Only under this strong definition of independence is it true that any functions of independent random variables are independent.

Problem 134. 4 points Prove that, if x and y are independent, then E[xy] = E[x] E[y] and therefore cov[x, y] = 0. (You may assume x and y have density functions.) Give a counterexample [...]

[...] the vector random variables x and y have the property that x_i is independent of every y_j for all i and j; does that make x and y independent random vectors? Interestingly, the answer is no. Give a counterexample showing that this does not even hold for indicator variables, i.e., construct two random vectors x and y, consisting of indicator variables, with the property that each component of x is independent of each component of y, but x and y are not independent as vector random variables. Hint: such an example can be constructed in the simplest possible case, in which x has two components and y has one component; i.e., you merely have to find three indicator variables x_1, x_2, and y with the property that x_1 is independent of y, and x_2 is independent of y, but the vector with components x_1 and x_2 is not independent of y. For these [...]

[...] independent but not mutually independent.

Answer. Go back to throwing a coin twice independently and define A = {HH, HT}, B = {TH, HH}, and C = {HH, TT}, and x_1 = I_A, x_2 = I_B, and y = I_C. They are pairwise independent, but A ∩ B ∩ C = A ∩ B, i.e., x_1 x_2 y = x_1 x_2, therefore E[x_1 x_2 y] = E[x_1 x_2] = 1/4 ≠ 1/8 = E[x_1 x_2] E[y]; therefore they are not independent.

Problem 137. 4 points Show that, if x and y are indicator variables (i.e., each takes only the values 0 and 1), and if cov[x, y] = 0, then x and y are independent. (I.e., in this respect indicator variables have similar properties as jointly normal random variables.)

Answer. Define the events A = {ω ∈ U : x(ω) = 1} and B = {ω ∈ U : y(ω) = 1}, i.e., x = i_A (the indicator variable of the event A) and y = i_B. Then xy = i_{A∩B}. If cov[x, y] = E[xy] − E[x] E[y] = Pr[A ∩ B] − Pr[A] Pr[B] = 0, then A and B are independent [...]

[...] first flip giving a head. For instance, the above two probabilities can be achieved by the following experimental setup: a person has one fair coin and flips it twice in a row. Then the two flips are independent. But the probabilities of 1/2 for heads and 1/2 for tails can also be achieved as follows: the person has two coins in his or her pocket; one has two heads, and one has two tails. If at random one of [...]

[...] because of the formula for the marginal density:

(8.3.8)  f_x(x) = ∫_{−∞}^{+∞} f_{x,y}(x, y) dy.

You see that formula (8.3.6) divides the joint density exactly by the right number, which makes the integral equal to 1.

Problem 132. [BD77, example 1.1.4 on p. 7] x and y are two independent random variables uniformly distributed over [0, 1]. Define u = min(x, y) and v = max(x, y).

• a. Draw in the x, y plane the event {max(x, y) ≤ 0.5 and min(x, y) > 0.4} and compute its probability.

Answer. The event is the square between 0.4 and 0.5, and its probability is 0.01.

• b. Compute the probability of the event {max(x, y) ≤ 0.5 and min(x, y) ≤ 0.4}.

Answer. It is Pr[max(x, y) ≤ 0.5] − Pr[max(x, y) ≤ 0.5 and min(x, y) > 0.4], i.e., the area of the square from 0 to 0.5 minus the square we just had, i.e., 0.24.

• c. Compute Pr[max(x, y) ≤ 0.5 | min(x, y) ≤ 0.4].

Answer.

(8.3.9)  Pr[max(x, y) ≤ 0.5 | min(x, y) ≤ 0.4] = Pr[max(x, y) ≤ 0.5 and min(x, y) ≤ 0.4] / Pr[min(x, y) ≤ 0.4] = 0.24/(1 − 0.36) = 0.24/0.64 = 3/8.

• d. Compute the joint cumulative distribution function of u and v.

Answer. One good way is to do it geometrically: for arbitrary 0 ≤ u, v ≤ 1 draw the area {u ≤ u and v ≤ v} and then derive its size. If u ≤ v then Pr[u ≤ u and [...]
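The probabilities in parts a to c of Problem 132 can likewise be checked by simulation. A minimal sketch, assuming Python with NumPy (sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000

x = rng.uniform(size=N)
y = rng.uniform(size=N)
u = np.minimum(x, y)     # u = min(x, y)
v = np.maximum(x, y)     # v = max(x, y)

a = np.mean((v <= 0.5) & (u > 0.4))    # part a: should be close to 0.01
b = np.mean((v <= 0.5) & (u <= 0.4))   # part b: should be close to 0.24
c = b / np.mean(u <= 0.4)              # part c: conditional probability, close to 3/8 = 0.375

print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
```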
