Class Notes in Statistics and Econometrics Part 8 ppt



CHAPTER 15 Hypothesis Testing

Imagine you are a business person considering a major investment in order to launch a new product. The sales prospects of this product are not known with certainty. You have to rely on the outcome of $n$ marketing surveys that measure the demand for the product once it is offered. If $\mu$ is the actual (unknown) rate of return on the investment, each of these surveys will be modeled here as a random variable which has a Normal distribution with this mean $\mu$ and known variance 1. Let $y_1, y_2, \ldots, y_n$ be the observed survey results. How would you decide whether to build the plant?

The intuitively reasonable thing to do is to go ahead with the investment if the sample mean of the observations is greater than a given value $c$, and not to do it otherwise. This is indeed an optimal decision rule, and we will discuss in what respect it is, and how $c$ should be picked.

Your decision can be the wrong decision in two different ways: either you decide to go ahead with the investment although there will be no demand for the product, or you fail to invest although there would have been demand. There is no decision rule which eliminates both errors at once; the first error would be minimized by the rule never to produce, and the second by the rule always to produce. In order to determine the right tradeoff between these errors, it is important to be aware of their asymmetry. The error to go ahead with production although there is no demand has potentially disastrous consequences (loss of a lot of money), while the other error may cause you to miss a profit opportunity, but there is no actual loss involved, and presumably you can find other opportunities to invest your money.

To express this asymmetry, the error with the potentially disastrous consequences is called “error of type one,” and the other “error of type two.” The distinction between type one and type two errors can also be made in other cases.
Locking up an innocent person is an error of type one, while letting a criminal go unpunished is an error of type two; publishing a paper with false results is an error of type one, while foregoing an opportunity to publish is an error of type two (at least this is what it ought to be).

Such an asymmetric situation calls for an asymmetric decision rule. One needs strict safeguards against committing an error of type one, and if there are several decision rules which are equally safe with respect to errors of type one, then one will select among those the decision rule which minimizes the error of type two.

Let us look here at decision rules of the form: make the investment if $\bar{y} > c$. An error of type one occurs if the decision rule advises you to make the investment while there is no demand for the product. This will be the case if $\bar{y} > c$ but $\mu \le 0$. The probability of this error depends on the unknown parameter $\mu$, but it is at most $\alpha = \Pr[\bar{y} > c \mid \mu = 0]$. This maximum value of the type one error probability is called the significance level, and you, as the director of the firm, will have to decide on $\alpha$ depending on how tolerable it is to lose money on this venture, which presumably depends on the chances to lose money on alternative investments. It is a serious shortcoming of the classical theory of hypothesis testing that it does not provide good guidelines for how $\alpha$ should be chosen, and how it should change with sample size. Instead, there is the tradition to choose $\alpha$ to be either 5% or 1% or 0.1%. Given $\alpha$, a table of the cumulative standard normal distribution function allows you to find that $c$ for which $\Pr[\bar{y} > c \mid \mu = 0] = \alpha$.

Problem 213. 2 points Assume each $y_i \sim N(\mu, 1)$, $n = 400$ and $\alpha = 0.05$, and different $y_i$ are independent. Compute the value $c$ which satisfies $\Pr[\bar{y} > c \mid \mu = 0] = \alpha$. You should either look it up in a table and include a xerox copy of the table with the entry circled and the complete bibliographic reference written on the xerox copy, or do it on a computer, writing exactly which commands you used. In R, the function qnorm does what you need; find out about it by typing help(qnorm).

Answer. In the case $n = 400$, $\bar{y}$ has variance 1/400 and therefore standard deviation 1/20 = 0.05. Therefore $20\bar{y}$ is a standard normal under $\mu = 0$: from $\Pr[\bar{y} > c \mid \mu = 0] = 0.05$ follows $\Pr[20\bar{y} > 20c \mid \mu = 0] = 0.05$. Therefore $20c = 1.645$ can be looked up in a table, perhaps use [JHG+88, p. 986], the row for $\infty$ d.f. Let us do this in R. The p-“quantile” of the distribution of the random variable $y$ is defined as that value $q$ for which $\Pr[y \le q] = p$. If $y$ is normally distributed, this quantile is computed by the R-function qnorm(p, mean=0, sd=1, lower.tail=TRUE). In the present case we need either qnorm(p=1-0.05, mean=0, sd=0.05) or qnorm(p=0.05, mean=0, sd=0.05, lower.tail=FALSE), which gives the value 0.08224268.

Choosing a decision which makes a loss unlikely is not enough; your decision must also give you a chance of success. E.g., the decision rule to build the plant if $-0.06 \le \bar{y} \le -0.05$ and not to build it otherwise is completely perverse, although the significance level of this decision rule is approximately 4% (if $n = 100$). In other words, the significance level is not enough information for evaluating the performance of the test. You also need the “power function,” which gives you the probability with which the test advises you to make the “critical” decision, as a function of the true parameter values. (Here the “critical” decision is that decision which might potentially lead to an error of type one.) By the definition of the significance level, the power function does not exceed the significance level for those parameter values for which going ahead would lead to a type one error. But only those tests are “powerful” whose power function is high for those parameter values for which it would be correct to go ahead. In our case, the power function must not exceed 0.05 when $\mu \le 0$, and we want it as high as possible when $\mu > 0$. Figure 1 shows the power function for the decision rule to go ahead whenever $\bar{y} \ge c$, where $c$ is chosen in such a way that the significance level is 5%, for $n = 100$.

[Figure 1. Eventually this Figure will show the Power function of a one-sided normal test, i.e., the probability of error of type one as a function of $\mu$; right now this is simply the cdf of a Standard Normal. Horizontal axis from -3 to 3.]

The hypothesis whose rejection, although it is true, constitutes an error of type one, is called the null hypothesis, and its alternative the alternative hypothesis. (In the examples the null hypotheses were: the return on the investment is zero or negative, the defendant is innocent, or the results about which one wants to publish a research paper are wrong.) The null hypothesis is therefore the hypothesis that nothing is the case. The test examines whether this hypothesis should be rejected, and it safeguards against rejecting erroneously the hypothesis which one wants to reject. If you reject the null hypothesis, you don’t want to regret it.

Mathematically, every test can be identified with its null hypothesis, which is a region in parameter space (often consisting of one point only), and its “critical region,” which is the event that the test comes out in favor of the “critical decision,” i.e., rejects the null hypothesis. The critical region is usually an event of the form that the value of a certain random variable, the “test statistic,” is within a given range, usually that it is too high. The power function of the test is the probability of the critical region as a function of the unknown parameters, and the significance level is the maximum (or, if this maximum depends on unknown parameters, any upper bound) of the power function over the null hypothesis.

Problem 214. Mr. Jones is on trial for counterfeiting Picasso paintings, and you are an expert witness who has developed fool-proof statistical significance tests for identifying the painter of a given painting.

• a. 2 points There are two ways you can set up your test.
a: You can either say: The null hypothesis is that the painting was done by Picasso, and the alternative hypothesis that it was done by Mr. Jones.
b: Alternatively, you might say: The null hypothesis is that the painting was done by Mr. Jones, and the alternative hypothesis that it was done by Picasso.
Does it matter which way you do the test, and if so, which way is the correct one? Give a reason for your answer, i.e., say what would be the consequences of testing in the incorrect way.

Answer.
The determination of what the null and what the alternative hypothesis is depends on what is considered to be the catastrophic error which is to be guarded against. At a trial, Mr. Jones is considered innocent until proven guilty. Mr. Jones should not be convicted unless he can be proven guilty beyond “reasonable doubt.” Therefore the test must be set up in such a way that the hypothesis that the painting is by Picasso will only be rejected if the chance that it is actually by Picasso is very small. The error of type one is that the painting is considered counterfeited although it is really by Picasso. Since the error of type one is always the error to reject the null hypothesis although it is true, solution a. is the correct one. You are not proving, you are testing.

• b. 2 points After the trial a customer calls you who is in the process of acquiring a very expensive alleged Picasso painting, and who wants to be sure that this painting is not one of Jones’s falsifications. Would you now set up your test in the same way as in the trial or in the opposite way?

Answer. It is worse to spend money on a counterfeit painting than to forego purchasing a true Picasso. Therefore the null hypothesis would be that the painting was done by Mr. Jones, i.e., it is the opposite way.

Problem 215. 7 points Someone makes an extended experiment throwing a coin 10,000 times. The relative frequency of heads in these 10,000 throws is a random variable. Given that the probability of getting a head is $p$, what are the mean and standard deviation of the relative frequency? Design a test, at 1% significance level, of the null hypothesis that the coin is fair, against the alternative hypothesis that $p < 0.5$. For this you should use the central limit theorem. If the head showed 4,900 times, would you reject the null hypothesis?

Answer. Let $x_i$ be the random variable that equals one when the $i$-th throw is a head, and zero otherwise.
The expected value of $x$ is $p$, the probability of throwing a head. Since $x^2 = x$, $\operatorname{var}[x] = E[x] - (E[x])^2 = p(1 - p)$. The relative frequency of heads is simply the average of all $x_i$, call it $\bar{x}$. It has mean $p$ and variance $\sigma^2_{\bar{x}} = p(1-p)/10{,}000$, i.e., standard deviation $\sqrt{p(1-p)}/100$. Given that it is a fair coin, its mean is 0.5 and its standard deviation is 0.005. Reject if the actual frequency is less than $0.5 - 2.326\,\sigma_{\bar{x}} = 0.48837$. The observed frequency 0.49 is above this threshold. Another approach:

(15.0.33)  $\Pr(\bar{x} \le 0.49) = \Pr\Bigl(\dfrac{\bar{x} - 0.5}{0.005} \le -2\Bigr) = 0.0227$

since the fraction is, by the central limit theorem, approximately a standard normal random variable. This probability is larger than 0.01. Therefore do not reject.

15.1. Duality between Significance Tests and Confidence Regions

There is a duality between confidence regions with confidence level $1 - \alpha$ and certain significance tests. Let us look at a family of significance tests, which all have a significance level $\le \alpha$, and which define for every possible value of the parameter $\phi_0 \in \Omega$ a critical region $C(\phi_0)$ for rejecting the simple null hypothesis that the true parameter is equal to $\phi_0$. The condition that all significance levels are $\le \alpha$ means mathematically

(15.1.1)  $\Pr\bigl[C(\phi_0) \mid \phi = \phi_0\bigr] \le \alpha$ for all $\phi_0 \in \Omega$.

Mathematically, confidence regions and such families of tests are one and the same thing: if one has a confidence region $R(y)$, one can define a test of the null hypothesis $\phi = \phi_0$ as follows: for an observed outcome $y$ reject the null hypothesis if and only if $\phi_0$ is not contained in $R(y)$. On the other hand, given a family of tests, one can build a confidence region by the prescription: $R(y)$ is the set of all those parameter values which would not be rejected by a test based on observation $y$.

Problem 216. Show that with these definitions, equations (14.0.5) and (15.1.1) are equivalent.

Answer.
Since $\phi_0 \in R(y)$ iff $y \in C'(\phi_0)$ (the complement of the critical region rejecting that the parameter value is $\phi_0$), it follows that $\Pr[R(y) \ni \phi_0 \mid \phi = \phi_0] = 1 - \Pr[C(\phi_0) \mid \phi = \phi_0] \ge 1 - \alpha$.

This duality is discussed in [BD77, pp. 177–182].

15.2. The Neyman Pearson Lemma and Likelihood Ratio Tests

Look one more time at the example with the fertilizer. Why are we considering only regions of the form $\bar{y} \ge \mu_0$, why not one of the form $\mu_1 \le \bar{y} \le \mu_2$, or maybe not use the mean but decide to build if $y_1 \ge \mu_3$? Here the $\mu_1$, $\mu_2$, and $\mu_3$ can be chosen such that the probability of committing an error of type one is still $\alpha$. It seems intuitively clear that these alternative decision rules are not reasonable. The Neyman Pearson lemma proves this intuition right. It says that the critical regions of the form $\bar{y} \ge \mu_0$ are uniformly most powerful, in the sense that every other critical region with the same probability of type one error has equal or higher probability of committing an error of type two, regardless of the true value of $\mu$.

Here are formulation and proof of the Neyman Pearson lemma, first for the case that both null hypothesis and alternative hypothesis are simple: $H_0: \theta = \theta_0$, $H_A: \theta = \theta_1$. In other words, we want to determine on the basis of the observations of the random variables $y_1, \ldots, y_n$ whether the true $\theta$ was $\theta_0$ or $\theta_1$, and a determination $\theta = \theta_1$ when in fact $\theta = \theta_0$ is an error of type one. The critical region $C$ is the set of all outcomes that lead us to conclude that the parameter has value $\theta_1$. The Neyman Pearson lemma says that a uniformly most powerful test exists in this situation. It is a so-called likelihood-ratio test, which has the following critical [...]...
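The two one-sided normal tests worked out above (Problem 213 and Problem 215) can be checked numerically. The notes do this with R's qnorm; the following is a minimal sketch of the same arithmetic using Python's standard library instead, which is a substitution and not the notes' own code. The function names `critical_value`, `power`, and `coin_lower_threshold` are made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_value(n, alpha):
    """Return c with Pr[ybar > c | mu = 0] = alpha when each y_i ~ N(mu, 1)."""
    sd_ybar = 1 / sqrt(n)  # ybar has variance 1/n
    return sd_ybar * STD_NORMAL.inv_cdf(1 - alpha)


def power(mu, n, alpha):
    """Probability that the rule 'invest if ybar > c' advises investing, given mu."""
    sd_ybar = 1 / sqrt(n)
    c = critical_value(n, alpha)
    return 1 - STD_NORMAL.cdf((c - mu) / sd_ybar)


def coin_lower_threshold(throws, alpha, p0=0.5):
    """CLT threshold: reject p = p0 in favor of p < p0 if the frequency falls below it."""
    sd_freq = sqrt(p0 * (1 - p0) / throws)
    return p0 + sd_freq * STD_NORMAL.inv_cdf(alpha)  # inv_cdf(alpha) is negative


# Problem 213: n = 400, alpha = 0.05.
c = critical_value(n=400, alpha=0.05)
print("c =", round(c, 8))

# Power function at a few illustrative values of mu (values chosen arbitrarily).
for mu in (-0.1, 0.0, 0.1, 0.2):
    print("mu =", mu, "power =", round(power(mu, 400, 0.05), 4))

# Problem 215: 10,000 throws, 1% level, 4,900 heads observed.
threshold = coin_lower_threshold(throws=10_000, alpha=0.01)
reject = 4_900 / 10_000 < threshold
print("threshold =", round(threshold, 5), "reject:", reject)
```

The printed value of `c` should agree with the qnorm result quoted in the answer to Problem 213, and the power at $\mu = 0$ equals the significance level by construction.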
... • b. 1 point Government agencies enforcing discrimination laws traditionally have been using the selection ratio $p_1/p_2$. Compute the selection ratio for both firms.

Answer. In firm A, the selection ratio is $\frac{1}{32} \cdot \frac{68}{10} = \frac{17}{80} = 0.2125$. In firm B, it is $\frac{13}{54} = 0.2407$.

• c. 3 points Statisticians argue that the selection ratio is a flawed measure of discrimination, see [Gas88, pp. 207–11 of ...

... and $y = [\,6 \;\; 12 \;\; 18 \;\; 30 \;\; 54\,]$. I.e., the three subjects receiving treatment A had the results 18, 30, and 54, and the two subjects receiving treatment B the results 6 and 12. This gives a value of $t = 1.81$. Does this indicate a significant difference between A and B? The usual approach to answer this question is discussed in chapter/section 42, p. 959. It makes the following assumptions: ...

... $\theta \in \omega$, then take twice the difference of the attained levels of the log likelihood functions, and compare with the $\chi^2$ tables.

15.3. The Runs Test

[Spr98, pp. 171–175] is a good introductory treatment, similar to the one given here. More detail in [GC92, Chapter 3] (not in University of Utah Main Library) and even more in [Bra68, Chapters 11 and 23] (which is in the Library). Each of ...

... 1]. Demonstrate this by comparing firm A with firm C which hires 5 out of 32 black and 40 out of 68 white applicants.

Table 2. Selection Ratio gives Conflicting Verdicts
Hirings by Two Different Firms with 100 Applications Each

              Firm A                Firm C
              Minority  Majority    Minority  Majority
Hired             1        10           5        40
Not Hired        31        58          27        28

Answer. In firm C the selection ratio is $\frac{5}{32} \cdot \frac{68}{40} = \frac{17}{64} = 0.265625$. In firm A, the chances for blacks to be hired is 21% that of whites, and in firm C it is 26%. Firm C seems better. But if we compare the chances not to get hired we get a conflicting verdict: In firm A the ratio is $\frac{31}{32} \cdot \frac{68}{58} = 1.1357$. In firm C it is $\frac{27}{32} \cdot \frac{68}{28} = 2.0491$. In firm C, the chances not to get hired is twice as high for Minorities as it is for Whites; in firm A the chances not to get hired are more equal. Here ...

... factors used in the statistic for the goodness of fit test.

Problem 219. 2 points A matrix $\Omega$ is a g-inverse of $\Psi$ iff $\Psi\Omega\Psi = \Psi$. Show that the following matrix

(15.4.2)  $\dfrac{1}{n}\begin{bmatrix} \frac{1}{p_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{p_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{p_r} \end{bmatrix}$

is a g-inverse of the covariance matrix of the multinomial distribution given in (8.4.2).

Answer. Postmultiplied by the g-inverse given in (15.4.2), the covariance matrix from (8.4.2) ...

... that are $\chi^2$’s. Here it is again: Assume $y$ is a jointly normal vector random variable with mean vector $\mu$ and covariance matrix $\sigma^2\Psi$, and $\Omega$ is a symmetric nonnegative definite matrix. Then $(y - \mu)^\top \Omega (y - \mu) \sim \sigma^2\chi^2_k$ iff $\Psi\Omega\Psi\Omega\Psi = \Psi\Omega\Psi$ and $k$ is the rank of $\Omega$. If $\Psi$ is singular, i.e., does not have an inverse, and $\Omega$ is a g-inverse of $\Psi$, then condition (10.4.9) holds. A matrix $\Omega$ is a g-inverse of $\Psi$ iff $\Psi\Omega\Psi = \Psi$. Every ...

... Which of the two firms seems to discriminate more? Is the difference of probabilities or the odds ratio the more relevant statistic here?

Answer. In firm A, 3.125% of the minority applicants and 14.7% of the Majority applicants were hired. The difference of the probabilities is 11.581% and the odds ratio is $\frac{29}{155} = 0.1871$. In firm B, 4.167% of the minority applicants and 17.308% of the majority applicants were ...

... selection ratio for its complement. The odds ratio and the differences in probabilities do not give rise to such discrepancies: the odds ratio for not being hired is just the inverse of the odds ratio for being hired, and the difference in the probabilities of not being hired is the negative of the difference in the probabilities of being hired. As long as $p_1$ and $p_2$ are both close to zero, the odds ratio is ...

... the new hires were Minorities.

Table 1. Which Firm’s Hiring Policies are More Equitable?
Hirings by Two Different Firms with 100 Applications Each

              Firm A                Firm B
              Minority  Majority    Minority  Majority
Hired             1        10           2         9
Not Hired        31        58          46        43

• a. 3 points Let $p_1$ be the proportion of Minorities hired, and $p_2$ the proportion of Majorities hired. Compute the difference $p_1 - p_2$ and the odds ratio ...
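The hiring measures quoted in these excerpts (difference of probabilities, selection ratio, ratio of the chances not to get hired, odds ratio) can be recomputed from the counts in Tables 1 and 2. The following is a minimal sketch in Python using exact fractions; the names `FIRMS` and `hiring_measures` are made up for this illustration, and the formulas are the ones the excerpts state or imply, not code from the notes.

```python
from fractions import Fraction

# Counts from Tables 1 and 2, in the order:
# (minority hired, minority not hired, majority hired, majority not hired)
FIRMS = {
    "A": (1, 31, 10, 58),
    "B": (2, 46, 9, 43),
    "C": (5, 27, 40, 28),
}


def hiring_measures(min_hired, min_not, maj_hired, maj_not):
    """Return the four comparison measures as exact fractions."""
    p1 = Fraction(min_hired, min_hired + min_not)  # proportion of minorities hired
    p2 = Fraction(maj_hired, maj_hired + maj_not)  # proportion of majority hired
    return {
        "difference": p1 - p2,  # negative when minorities are hired less often
        "selection_ratio": p1 / p2,
        "not_hired_ratio": (1 - p1) / (1 - p2),
        "odds_ratio": (p1 / (1 - p1)) / (p2 / (1 - p2)),
    }


for name, counts in FIRMS.items():
    measures = hiring_measures(*counts)
    print(name, {key: round(float(value), 4) for key, value in measures.items()})
```

Comparing firms A and C this way shows the conflict described in the excerpt: the selection ratio for being hired is higher in firm C, while the ratio of the chances not to get hired is also further from 1 in firm C.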

Date posted: 04/07/2014, 15:20
