
CHAPTER 14

Hypothesis testing and confidence regions

The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s and early 1930s, complementing Fisher's work on estimation. As in estimation, we begin by postulating a statistical model, but instead of seeking an estimator of θ in Θ we consider the question whether θ ∈ Θ₀ ⊂ Θ or θ ∈ Θ₁ = Θ − Θ₀ is mostly supported by the observed data. The discussion which follows will proceed in a similar way, though less systematically and formally, to the discussion of estimation. This is due to the complexity of the topic, which arises mainly because one is asked to assimilate too many concepts too quickly just to be able to define the problem properly. This difficulty, however, is inherent in testing if any proper understanding of the topic is to be attempted, and thus unavoidable. Every effort is made to ensure that the formal definitions are supplemented with intuitive explanations and examples. In Sections 14.1 and 14.2 the concepts needed to define a test and some criteria for 'good' tests are discussed using a simple example. In Section 14.3 the question of constructing 'good' tests is considered. Section 14.4 relates hypothesis testing to confidence estimation, bringing out the duality between the two areas. In Section 14.5 the related topic of prediction is considered.

14.1 Testing, definitions and concepts

Let X be a random variable (r.v.) defined on the probability space (S, ℱ, P(·)) and consider the statistical model associated with X:

(i) Φ = {f(x; θ), θ ∈ Θ};
(ii) X = (X₁, X₂, ..., Xₙ)′ is a random sample from f(x; θ).


Hypothesis testing is concerned with deciding whether a conjecture about θ, of the form 'θ belongs to some subset Θ₀ of Θ', is supported by the data x = (x₁, x₂, ..., xₙ)′. We call such a conjecture the null hypothesis and denote it by H₀: θ ∈ Θ₀. If the sample realisation x ∈ C₀ we accept H₀; if x ∈ C₁ we reject it. The mapping which enables us to define C₀ and C₁ we call a test statistic, τ(X): 𝒳 → ℝ (see Fig. 11.4).

In order to illustrate the concepts introduced so far let us consider the following example. Let X be the random variable representing the marks achieved by students in an econometric theory paper and let the statistical model be:

(i) Φ = {f(x; θ) = (1/(8√(2π))) exp(−½((x − θ)/8)²), θ ∈ Θ = [0, 100]};
(ii) X = (X₁, X₂, ..., Xₙ)′, n = 40, is a random sample from f(x; θ).

The hypothesis to be tested is H₀: θ = 60 (i.e. X ~ N(60, 64)), Θ₀ = {60}, against H₁: θ ≠ 60 (i.e. X ~ N(μ, 64), μ ≠ 60), Θ₁ = [0, 100] − {60}. Common sense suggests that if some 'good' estimator of θ, say X̄ₙ = (1/n) Σⁿᵢ₌₁ Xᵢ, for the sample realisation x takes a value 'around' 60, then we will be inclined to accept H₀. Let us formalise this argument:

The acceptance region takes the form 60 − ε < X̄ₙ < 60 + ε, ε > 0, or

C₀ = {x: |X̄ₙ − 60| < ε},

and

C₁ = {x: |X̄ₙ − 60| ≥ ε} is the rejection region.

The next question is, 'how do we choose ε?' If ε is too small we run the risk of rejecting H₀ when it is true; we call this type I error. On the other hand, if ε is too large we run the risk of accepting H₀ when it is false; we call this type II error. Formally, if x ∈ C₁ (reject H₀) and θ ∈ Θ₀ (H₀ is true), we commit a type I error; if x ∈ C₀ (accept H₀) and θ ∈ Θ₁ (H₀ is false), we commit a type II error (see Table 14.1).

Table 14.1

              H₀ accepted     H₀ rejected
H₀ true       correct         type I error
H₀ false      type II error   correct


The hypothesis to be tested is formally stated as follows:

H₀: θ ∈ Θ₀, Θ₀ ⊂ Θ.    (14.1)

Against the null hypothesis H₀ we postulate the alternative H₁, which takes the form:

H₁: θ ∉ Θ₀    (14.2)

or, equivalently,

H₁: θ ∈ Θ₁ = Θ − Θ₀.    (14.3)

It is important to note at the outset that H₀ and H₁ are in effect hypotheses about the distribution of the sample f(x; θ), i.e.

H₀: f(x; θ), θ ∈ Θ₀;  H₁: f(x; θ), θ ∈ Θ₁.    (14.4)

A hypothesis H₀ or H₁ is called simple if knowing θ ∈ Θ₀ or θ ∈ Θ₁ specifies f(x; θ) completely; otherwise it is called a composite hypothesis. That is, if f(x; θ), θ ∈ Θ₀, or f(x; θ), θ ∈ Θ₁, contains only one density function we say that H₀ or H₁ is a simple hypothesis, respectively; otherwise they are said to be composite.

In testing a null hypothesis H₀ against an alternative H₁ the issue is to decide whether the sample realisation x 'supports' H₀ or H₁. In the former case we say that H₀ is accepted, in the latter H₀ is rejected. In order to be able to make such a decision we need to formulate a mapping which relates Θ₀ to some subset of the observation space 𝒳, say C₀, which we call an acceptance region, and its complement C₁ (C₀ ∪ C₁ = 𝒳, C₀ ∩ C₁ = ∅), which we call the rejection region (see Fig. 11.4). Obviously, in any particular situation we cannot say for certain in which of the four boxes of Table 14.1 we are; at best we can only make a probabilistic statement relating to this. Moreover, if we were to choose ε 'too small' we would run a higher risk of committing a type I error than a type II error, and vice versa. That is, there is a trade-off between the probability of type I error, i.e.

Pr(x ∈ C₁; θ ∈ Θ₀) = α,    (14.5)

and the probability β of type II error, i.e.

Pr(x ∈ C₀; θ ∈ Θ₁) = β.    (14.6)

Ideally we would like α = β = 0 for all θ ∈ Θ, which is not possible for a fixed n. Moreover, we cannot control both simultaneously because of the trade-off

between them. 'How do we proceed, then?' In order to help us decide let us consider the following analogy.


The jury in a criminal offence trial are instructed to choose between:

H₀: the accused is not guilty; and
H₁: the accused is guilty;

with their decision based on the evidence presented in court. This evidence in hypothesis testing comes in the form of Φ and X. The jury are instructed to accept H₀ unless they have been convinced otherwise beyond any reasonable doubt. This requirement is designed to protect an innocent person from being convicted, and it corresponds to choosing a small value for α, the probability of convicting the accused when innocent. By adopting such a strategy, however, they are running the risk of letting a number of 'crooks off the hook'. This corresponds to being prepared to accept a relatively high value of β, the probability of not convicting the accused when guilty, in order to protect an innocent person from conviction. This is based on the moral argument that it is preferable to let off a number of guilty people rather than to sentence an innocent person. However, we can never be sure that an innocent person has not been sent to prison, and the strategy is designed to keep the probability of this happening very low. A similar strategy is also adopted in hypothesis testing, where a small value of α is chosen and, for that given α, β is minimised. Formally, this amounts to choosing α* such that

Pr(x ∈ C₁; θ ∈ Θ₀) = α(θ) ≤ α* for θ ∈ Θ₀,    (14.7)

and

Pr(x ∈ C₀; θ ∈ Θ₁) = β(θ) is minimised for θ ∈ Θ₁,    (14.8)

by choosing C₁ or C₀ appropriately. In the case of the above example, if we were to choose α, say α* = 0.05, then

Pr(|X̄ₙ − 60| ≥ ε; θ = 60) = 0.05.    (14.9)


Under H₀ the standardised statistic τ(X) = (X̄ₙ − 60)/1.265, where 1.265 = σ/√n = 8/√40, is distributed as N(0, 1), and thus the distribution of τ(·) is known completely (no unknown parameters). When this is the case this distribution can be used in conjunction with the above probabilistic statement to determine ε. In order to do this we need to relate |X̄ₙ − 60| to τ(X) (a statistic) whose distribution is known. The obvious way to do this is to standardise the former, i.e. consider |X̄ₙ − 60|/1.265, which is equal to |τ(X)|. This suggests changing the above probabilistic statement to the equivalent statement

Pr(|X̄ₙ − 60|/1.265 ≥ c_α; θ = 60) = 0.05, where c_α = ε/1.265.    (14.12)

Given that the distribution of the statistic τ(X) is symmetric and we want to determine c_α such that Pr(|τ(X)| ≥ c_α) = 0.05, we should choose the value of c_α from the tables of N(0, 1) which leaves α*/2 = 0.025 probability on either side of the distribution, as shown in Fig. 14.1. The value of c_α given by the N(0, 1) tables is c_α = 1.96. This in turn implies that the rejection region for the test is

C₁ = {x: |X̄ₙ − 60|/1.265 ≥ 1.96} = {x: |τ(X)| ≥ 1.96},    (14.13)

or

C₁ = {x: |X̄ₙ − 60| ≥ 2.48}.    (14.14)

That is, for sample realisations x which give rise to X̄ₙ falling outside the interval (57.52, 62.48) we reject H₀.
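As a check on these numbers, here is a minimal sketch in Python (scipy is assumed to be available; the variable names are illustrative) that recomputes c_α, ε and the acceptance interval for the marks example.

```python
import numpy as np
from scipy import stats

sigma, n = 8.0, 40          # X ~ N(theta, 64), random sample of size 40
theta0 = 60.0               # value of theta under H0
alpha = 0.05                # chosen size of the test

se = sigma / np.sqrt(n)                  # 8/sqrt(40) = 1.265
c_alpha = stats.norm.ppf(1 - alpha / 2)  # two-sided critical value, 1.96
eps = c_alpha * se                       # 2.48

print(f"standard error   : {se:.3f}")
print(f"critical value   : {c_alpha:.2f}")
print(f"acceptance region: ({theta0 - eps:.2f}, {theta0 + eps:.2f})")
# reject H0 whenever the sample mean falls outside (57.52, 62.48)
```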

Let us summarise the argument so far in order to keep the discussion in perspective. We set out to construct a test for H₀: θ = 60 against H₁: θ ≠ 60, and intuition suggested the rejection region {|X̄ₙ − 60| ≥ ε}. In order to determine ε we had to

(i) choose an α; and then
(ii) use the distribution of τ(X) under H₀ to determine c_α, and hence ε.


Given that C₁ = {x: |τ(X)| ≥ 1.96} defines a test with α = 0.05, the question which naturally arises is: 'What do we need the probability of type II error, β, for?' The answer is that we need β to decide whether the test defined in terms of C₁ is a 'good' or a 'bad' test. As we mentioned at the outset, the way we decided to 'solve' the problem of the trade-off between α and β was to choose a small value for α and define C₁ so as to minimise β. At this stage we do not know whether the test defined above is a 'good' test or not. Let us set up the apparatus that enables us to consider the question of optimality.

14.2 Optimal tests

Since the acceptance and rejection regions constitute a partition of the observation space 𝒳, i.e. C₀ ∪ C₁ = 𝒳 and C₀ ∩ C₁ = ∅, it follows that Pr(x ∈ C₀) = 1 − Pr(x ∈ C₁) for all θ ∈ Θ. Hence, minimisation of Pr(x ∈ C₀) for all θ ∈ Θ₁ is equivalent to maximising Pr(x ∈ C₁) for all θ ∈ Θ₁.

Definition 1

The probability of rejecting H₀ when false at some point θ₁ ∈ Θ₁, i.e. Pr(x ∈ C₁; θ = θ₁), is called the power of the test at θ = θ₁.

Note that

Pr(x ∈ C₁; θ = θ₁) = 1 − Pr(x ∈ C₀; θ = θ₁) = 1 − β(θ₁).    (14.15)

In the case of the example above we can define the power of the test at some θ₁ ∈ Θ₁, say θ₁ = 54, to be Pr(|X̄ₙ − 60|/1.265 ≥ 1.96; θ = 54). 'How do we calculate this probability?' The temptation is to suggest using the same distribution as above, i.e. τ(X) = (X̄ₙ − 60)/1.265 ~ N(0, 1). This is, however, wrong because θ is no longer equal to 60; we assumed that θ = 54, and thus (X̄ₙ − 54)/1.265 ~ N(0, 1). This implies that

τ(X) ~ N((54 − 60)/1.265, 1) for θ = 54.

Using this we can define the power of the test at θ = 54 to be

Pr(|X̄ₙ − 60|/1.265 ≥ 1.96; θ = 54)
  = Pr((X̄ₙ − 54)/1.265 ≥ 1.96 − (54 − 60)/1.265) + Pr((X̄ₙ − 54)/1.265 ≤ −1.96 − (54 − 60)/1.265),

which is approximately 0.9973.

To assess the test we need to calculate the power for all θ ∈ Θ₁. Following the same procedure, the

power of the test defined by C₁ for θ = 56, 58, 60, 62, 64, 66 is as follows:

Pr(|τ(X)| ≥ 1.96; θ = 56) = 0.8849,
Pr(|τ(X)| ≥ 1.96; θ = 58) = 0.3520,
Pr(|τ(X)| ≥ 1.96; θ = 60) = 0.05,
Pr(|τ(X)| ≥ 1.96; θ = 62) = 0.3520,
Pr(|τ(X)| ≥ 1.96; θ = 64) = 0.8849,
Pr(|τ(X)| ≥ 1.96; θ = 66) = 0.9973.

As we can see, the power of the test increases as we go further away from θ = 60 (H₀), and the power at θ = 60 equals the probability of type I error.
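These power values follow from the shifted distribution of τ(X) under each θ₁. A minimal sketch in Python (scipy assumed to be available; names illustrative) that reproduces them:

```python
import numpy as np
from scipy import stats

sigma, n, theta0, c = 8.0, 40, 60.0, 1.96
se = sigma / np.sqrt(n)                      # 1.265

def power(theta1):
    # under theta1, tau(X) = (Xbar - 60)/se ~ N(delta, 1) with delta = (theta1 - 60)/se
    delta = (theta1 - theta0) / se
    return stats.norm.sf(c - delta) + stats.norm.cdf(-c - delta)

for theta1 in (54, 56, 58, 60, 62, 64, 66):
    print(theta1, round(power(theta1), 4))
# prints 0.9973, 0.8854, 0.3526, 0.05, 0.3526, 0.8854, 0.9973; the small
# differences from the table above come from rounding the standardised shifts
```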

This prompts us to define the power function as follows:

Definition 2

𝒫(θ) = Pr(x ∈ C₁), θ ∈ Θ, is called the power function of the test defined by the rejection region C₁.

Definition 3

α = max_{θ ∈ Θ₀} 𝒫(θ) is defined to be the size (or the significance level) of the test.

In the case where H₀ is simple, say θ = θ₀, then α = 𝒫(θ₀). These definitions enable us to define a criterion for 'a best' test of a given size α as the one (if it exists) whose power function 𝒫(θ), θ ∈ Θ₁, is maximum at every θ.

Definition 4

A test of H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁ as defined by some rejection region C₁ is said to be a uniformly most powerful (UMP) test of size α if

(i) max_{θ ∈ Θ₀} 𝒫(θ) = α;
(ii) 𝒫(θ) ≥ 𝒫*(θ) for all θ ∈ Θ₁,

where 𝒫*(θ) is the power function of any other test of size α.

As we saw above, in order to be able to determine the power function we need to know the distribution of the test statistic τ(X) (in terms of which C₁ is defined) under H₁ (i.e. when H₀ is false). The concept of a UMP test provides us with the criterion needed to choose between tests for the same H₀.

Let us illustrate these concepts using the two-sided test considered above, with rejection region

C₁ = {x: |τ(X)| ≥ 1.96}.    (14.19)

We shall compare the power of this test with the power of three other size 0.05 tests (see Fig. 14.2), defined by the rejection regions

C₁* = {x: τ(X) ≥ 1.645},    (14.16)
C₁** = {x: τ(X) ≤ −1.645},    (14.17)
C₁*** = {x: |τ(X)| ≤ 0.063}.    (14.18)

Fig. 14.2 The rejection regions (14.16), (14.17) and (14.18).

All four rejection regions define size 0.05 tests for H₀: θ = 60 against H₁: θ ≠ 60. In order to discriminate between 'bad', 'good' and 'better' tests we have to calculate their power functions and compare them. The diagram of the power functions 𝒫(θ), 𝒫*(θ), 𝒫**(θ) and 𝒫***(θ) is given in Fig. 14.3.

Fig. 14.3 The power functions 𝒫(θ), 𝒫*(θ), 𝒫**(θ) and 𝒫***(θ).

Looking at the diagram we can see that only one thing is clear-cut: C₁*** defines a very bad test, its power function being dominated by those of the other tests. Comparing the other three tests we can see that C₁* is more powerful than the other two for θ > 60 but 𝒫*(θ) < α for θ < 60, while C₁** is more powerful than the other two for θ < 60 but 𝒫**(θ) < α for θ > 60; none of the tests is more powerful over the whole range. That is, there is no UMP test of size 0.05 for H₀: θ = 60 against H₁: θ ≠ 60. As will be seen in the sequel, no UMP tests exist in most situations of interest in practice. The procedure adopted in such cases is to reduce the class of all tests to some subclass by imposing restrictions and then to seek an optimal test within the subclass. One of the most important restrictions used in this context is the criterion of unbiasedness.

Definition 5

A test of H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁ is said to be unbiased if

max_{θ₀ ∈ Θ₀} 𝒫(θ₀) ≤ 𝒫(θ₁) for all θ₁ ∈ Θ₁.    (14.20)

In other words, a test is unbiased if it rejects H₀ more often when it is false than when it is true; a minimal but sensible requirement. Another form

these added restrictions can take, which reduces the problem to one where UMP tests do exist, is related to the probability model Φ. These include restrictions such as requiring Φ to belong to the one-parameter exponential family. In the case of the above example we can see that the tests defined by C₁*, C₁** and C₁*** are biased, and C₁ is UMP within the class of unbiased tests; C₁* and C₁** are biased for θ < 60 and θ > 60 respectively. It is obvious, however, that for H₀: θ = 60 against H₁′: θ > 60 or H₁*: θ < 60, the tests defined by C₁* and C₁** are UMP, respectively. That is, for the one-sided alternatives there exist UMP tests, given by C₁* and C₁**. It is important to note that in the case of H₁′ and H₁* above the parameter space implicitly assumed is different. In the case of H₁′ the parameter space implicitly assumed is Θ = [60, 100], and in the case of H₁*, Θ = [0, 60]. This is because Θ₀ and Θ₁ together must make up the whole of the parameter space Θ.


Collecting all the above concepts together, we say that a test has been defined when the following components have been specified:

(T1) a test statistic τ(X);
(T2) the size of the test, α;
(T3) the distribution of τ(X) under H₀ and H₁;
(T4) the rejection region C₁ (or, equivalently, C₀).

Let us illustrate this using the marks example above. The test statistic is

τ(X) = √n(X̄ₙ − θ₀)/σ = (X̄ₙ − 60)/1.265;    (14.21)

we call it a statistic because σ is known and θ₀ is specified by H₀, so it involves no unknown parameters. If we choose the size α = 0.05, the fact that τ(X) ~ N(0, 1) under H₀ enables us to define the rejection region C₁ = {x: |τ(X)| ≥ c_α}, where c_α is determined from Pr(|τ(X)| ≥ c_α; θ = 60) = 0.05 to be 1.96 from the standard normal tables; i.e. if φ(z) denotes the density function of N(0, 1), then

∫_{−c_α}^{c_α} φ(z) dz = 0.95.    (14.22)

In order to derive the power function we need the distribution of τ(X) under H₁. Under H₁ we know that

τ*(X) = √n(X̄ₙ − θ₁)/σ ~ N(0, 1)    (14.23)

for any θ₁ ∈ Θ₁, and hence we can relate τ(X) to τ*(X) by

τ(X) = τ*(X) + √n(θ₁ − θ₀)/σ    (14.24)

to deduce that

τ(X) ~ N(√n(θ₁ − θ₀)/σ, 1)    (14.25)

under H₁. This enables us to define the power function as

𝒫(θ₁) = Pr(x: |τ(X)| ≥ c_α)
      = Pr(τ*(X) ≤ −c_α − √n(θ₁ − θ₀)/σ) + Pr(τ*(X) ≥ c_α − √n(θ₁ − θ₀)/σ),  θ₁ ∈ Θ₁.    (14.26)
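A quick way to see that (14.25) and (14.26) describe what actually happens is to simulate the test under a specific θ₁ and compare the rejection frequency with the analytic power. A rough sketch (Python with numpy/scipy assumed; the names and the choice θ₁ = 62 are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma, n, theta0, alpha = 8.0, 40, 60.0, 0.05
c = stats.norm.ppf(1 - alpha / 2)          # 1.96
se = sigma / np.sqrt(n)

theta1 = 62.0                              # a point in Theta_1
reps = 100_000
xbar = rng.normal(theta1, sigma, size=(reps, n)).mean(axis=1)
tau = (xbar - theta0) / se                 # test statistic under theta = theta1
reject_freq = np.mean(np.abs(tau) >= c)    # Monte Carlo estimate of the power

delta = np.sqrt(n) * (theta1 - theta0) / sigma
analytic = stats.norm.cdf(-c - delta) + stats.norm.sf(c - delta)   # eq. (14.26)
print(reject_freq, analytic)               # both close to 0.35
```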

The key ingredient in the above construction is the test statistic, whose distribution we need to know under both H₀ and H₁. Hence,

constructing an optimal test is largely a matter of being able to find a statistic τ(X) with the following properties:

(i) τ(X) depends on X via a 'good' estimator of θ; and
(ii) the distribution of τ(X) under both H₀ and H₁ does not depend on any unknown parameters.

We call such a statistic a pivot.

It is no exaggeration to say that hypothesis testing is based on our ability to construct such pivots. When X is a random sample from N(μ, σ²), pivots are readily available in the form of

√n(X̄ₙ − μ)/σ ~ N(0, 1),  √n(X̄ₙ − μ)/s ~ t(n − 1),  (n − 1)s²/σ² ~ χ²(n − 1),    (14.27)

but in general such pivots are very hard to come by.

The first pivot was used above to construct tests for μ when σ² is known (both one-sided and two-sided tests). The second pivot can be used to set up similar tests for μ when σ² is unknown. For example, in testing H₀: μ = μ₀ against H₁: μ ≠ μ₀ the rejection region can be defined by

C₁ = {x: |τ₁(X)| ≥ c_α}, where τ₁(X) = √n(X̄ₙ − μ₀)/s,    (14.28)

and c_α can be determined from ∫_{−c_α}^{c_α} f(t) dt = 1 − α, f(t) being the density of Student's t-distribution with n − 1 degrees of freedom. For H₀: μ = μ₀ against H₁: μ < μ₀ the rejection region takes the form C₁ = {x: τ₁(X) ≤ c_α}, with

α = ∫_{−∞}^{c_α} f(t) dt    (14.29)

determining c_α. The pivot

τ₂(X) = (n − 1)s²/σ₀² ~ χ²(n − 1)    (14.30)

can be used to test hypotheses about σ². For example, in the case of a random sample from N(μ, σ²), testing H₀: σ² ≥ σ₀² against H₁: σ² < σ₀², the rejection region for an optimal test takes the form

C₁ = {x: τ₂(X) ≤ c_α},    (14.31)

where c_α is determined via ∫_0^{c_α} g(u) du = α, g(u) being the density of χ²(n − 1).
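To make the two pivots concrete, here is a small sketch (Python with numpy/scipy assumed; the data are simulated and the hypothesised values 60 and 100 are illustrative choices) that carries out the two-sided t-test (14.28) and the one-sided variance test (14.30)-(14.31) on the same sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(62.0, 8.0, size=40)       # simulated sample; mu and sigma unknown to the tester
n, alpha = x.size, 0.05
xbar, s = x.mean(), x.std(ddof=1)

# (14.28): H0: mu = 60 against H1: mu != 60, sigma^2 unknown
mu0 = 60.0
tau1 = np.sqrt(n) * (xbar - mu0) / s
c_t = stats.t.ppf(1 - alpha / 2, df=n - 1)
print("reject H0: mu = 60?", abs(tau1) >= c_t)

# (14.30)-(14.31): H0: sigma^2 >= 100 against H1: sigma^2 < 100
sigma0_sq = 100.0
tau2 = (n - 1) * s**2 / sigma0_sq
c_chi = stats.chi2.ppf(alpha, df=n - 1)  # lower alpha-quantile of chi^2(n-1)
print("reject H0: sigma^2 >= 100?", tau2 <= c_chi)
```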


14.3 Constructing optimal tests

In constructing the tests considered so far we used ad hoc intuitive arguments which led us to a pivot. As with estimation, it would be helpful if there were general methods for constructing optimal tests. It turns out that the availability of a method for constructing optimal tests depends crucially on the nature of the hypotheses (H₀ and H₁) and/or the probability model postulated. As far as the nature of H₀ and H₁ is concerned, existence and optimality depend crucially on whether these hypotheses are simple or composite. As mentioned in Section 14.2, a hypothesis H₀ or H₁ is called simple if Θ₀ or Θ₁ contains just one point, respectively. In the case of the 'marks' example above, Θ₀ = {60} and Θ₁ = [0, 60) ∪ (60, 100], i.e. H₀ is simple and H₁ is composite, since it contains more than one point. Care should be exercised when θ is a vector of unknown parameters, because in such a case Θ₀ or Θ₁ must contain a single vector in order to be simple. For example, in the case of sampling from N(μ, σ²) where σ² is not known, H₀: μ = μ₀ is not a simple hypothesis, since Θ₀ = {(μ₀, σ²): σ² ∈ ℝ₊}.

(1) Simple null and simple alternative

The theory concerning two simple hypotheses was fully developed in the 1920s by Neyman and Pearson. Let

Φ = {f(x; θ), θ ∈ Θ}

be the probability model and X = (X₁, X₂, ..., Xₙ)′ be the sampling model, and consider the simple null and simple alternative H₀: θ = θ₀ and H₁: θ = θ₁, Θ = {θ₀, θ₁}; i.e. the probability model contains only two densities, f(x; θ₀) and f(x; θ₁). Given the available data x we want to choose between the two distributions. The following theorem provides us with sufficient conditions for the existence of a UMP test for this, the simplest of the cases in testing.

Neyman-Pearson theorem

Let X = (X₁, X₂, ..., Xₙ)′ be a sample from a continuous distribution with density f(x; θ), and consider testing the simple H₀: θ = θ₀ against the simple H₁: θ = θ₁. Then the test defined by the rejection region

C₁ = {x: f(x; θ₀)/f(x; θ₁) ≤ k},

with k chosen so that Pr(x ∈ C₁; θ = θ₀) = α, is a most powerful test of size α.


In this simple case

𝒫(θ) = α for θ = θ₀,  𝒫(θ) = 1 − β for θ = θ₁.    (14.34)

The Neyman-Pearson theorem suggests that it is intuitively sensible to base the acceptance or rejection of H₀ on the relative values of the distributions of the sample evaluated at θ = θ₀ and θ = θ₁, i.e. to reject H₀ if the ratio f(x; θ₀)/f(x; θ₁) is relatively small. This amounts to rejecting H₀ when the evidence in the form of x favours H₁, giving it higher 'support'. It is very important to note that the Neyman-Pearson theorem does not solve the problem completely, because the problem of relating the ratio f(x; θ₀)/f(x; θ₁) to a pivotal quantity (test statistic) remains. Consider the case where X ~ N(θ, σ²), σ² known, and we want to test H₀: θ = θ₀ against H₁: θ = θ₁ (θ₀ < θ₁). From the Neyman-Pearson theorem we know that the rejection region is defined in terms of the ratio

f(x; θ₀)/f(x; θ₁) = exp{ −(1/(2σ²))[n(θ₀² − θ₁²) − 2(θ₀ − θ₁) Σⁿᵢ₌₁ xᵢ] }.    (14.35)
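Because (14.35) is a monotone decreasing function of Σᵢ xᵢ when θ₀ < θ₁, rejecting for small values of the ratio is the same as rejecting for large values of X̄ₙ, i.e. of τ(X) = √n(X̄ₙ − θ₀)/σ. A small numerical sketch of this equivalence (Python assumed; the particular values of θ₀, θ₁, σ and n are illustrative):

```python
import numpy as np

theta0, theta1, sigma, n = 60.0, 64.0, 8.0, 40    # illustrative values, theta0 < theta1

def log_ratio(x):
    # log of f(x; theta0)/f(x; theta1) for a random sample from N(theta, sigma^2), eq. (14.35)
    return -(n * (theta0**2 - theta1**2) - 2 * (theta0 - theta1) * x.sum()) / (2 * sigma**2)

rng = np.random.default_rng(2)
for _ in range(5):
    x = rng.normal(theta0, sigma, n)
    tau = np.sqrt(n) * (x.mean() - theta0) / sigma
    print(f"xbar = {x.mean():6.2f}  tau = {tau:5.2f}  log ratio = {log_ratio(x):8.2f}")
# the log ratio decreases as xbar (and hence tau) increases, so {ratio <= k}
# is the same region as {tau >= c*} for a suitable c*
```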


For example, if α = 0.05, c* = 1.645 and the power of the test is

Pr(√n(X̄ₙ − θ₀)/σ ≥ c*; θ = θ₁) = Pr(τ₁(X) ≥ c* − √n(θ₁ − θ₀)/σ) = 1 − β,    (14.39)

where

τ₁(X) = √n(X̄ₙ − θ₁)/σ ~ N(0, 1) under H₁.    (14.40)

In this case we can control β if we can increase the sample size, since

1 − β = Pr(τ₁(X) ≥ c* − √n(θ₁ − θ₀)/σ)    (14.41)

increases to one as n → ∞ (the term √n(θ₁ − θ₀)/σ grows without bound).
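As a worked illustration of (14.41), the following sketch (Python with scipy assumed; the effect size θ₁ − θ₀ = 2 is an illustrative choice) computes the power of the one-sided test for several sample sizes:

```python
import numpy as np
from scipy import stats

sigma, theta0, theta1, alpha = 8.0, 60.0, 62.0, 0.05
c_star = stats.norm.ppf(1 - alpha)                 # 1.645 for a one-sided size 0.05 test

for n in (20, 40, 80, 160, 320):
    shift = np.sqrt(n) * (theta1 - theta0) / sigma
    power = stats.norm.sf(c_star - shift)          # eq. (14.41)
    print(f"n = {n:3d}  power = {power:.3f}")
# power rises towards 1 as n grows, which is how beta is 'controlled' by the sample size
```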

For the hypothesis H₀: θ = θ₀ against H₁: θ = θ₁ with θ₁ < θ₀, the test statistic takes the same form, τ(X) = √n(X̄ₙ − θ₀)/σ, but the ratio is now a monotone increasing function of X̄ₙ, which gives rise to the rejection region

C₁ = {x: τ(X) ≤ c_α}.    (14.42)

(2) Composite null and composite alternative (one-parameter case)

For the hypotheses

H₀: θ ≥ θ₀ against H₁: θ < θ₀,

being the other extreme from two simple hypotheses, no result such as the Neyman-Pearson theorem exists, and it comes as no surprise that no UMP tests exist in general. The only result of some interest in this case is that if we restrict the probability model by requiring the density functions to have a monotone likelihood ratio in the test statistic τ(X), then UMP tests do exist. This result is of limited value, however, since it does not provide us with a general way of constructing such tests.

(3) Simple H₀ against composite H₁

In the case where we want to test H₀: θ = θ₀ against H₁: θ > θ₀ (or θ < θ₀), uniformly most powerful (UMP) tests do not exist in general. In some particular cases, however, such UMP tests do exist, and the Neyman-Pearson theorem can help us derive them. If the UMP test for the simple H₀: θ = θ₀ against the simple H₁: θ = θ₁ does not depend on θ₁, then the same test is UMP for the one-sided alternative θ > θ₀ (or θ < θ₀). In the example discussed above, the tests defined by

C₁ = {x: τ(X) ≥ c_α*}    (14.43)

and

C₁ = {x: τ(X) ≤ −c_α*}    (14.44)

are also UMP for the hypotheses H₀: θ = θ₀ against H₁: θ > θ₀ and H₀: θ = θ₀ against H₁: θ < θ₀, respectively. This is indeed confirmed by the diagram of the power functions derived for the 'marks' example above. Another result in this class of hypotheses is available in the case where sampling is from a one-parameter exponential family of densities (normal, binomial, Poisson, etc.). In such cases UMP tests do exist for one-sided alternatives.

Two-sided alternatives

For testing H₀: θ = θ₀ against H₁: θ ≠ θ₀, no UMP tests exist in general. This is rather unfortunate, since most tests in practice are of this type. One interesting result in this case is that if we restrict the probability model to the one-parameter exponential family and narrow down the class of tests by imposing unbiasedness, then UMP unbiased tests do exist. The test defined by the rejection region

C₁ = {x: |τ(X)| ≥ c_α}    (14.45)

(see the 'marks' example) is indeed UMP unbiased, the one-sided tests being biased when considered over the whole of Θ.
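The bias of the one-sided tests can be seen numerically. A brief sketch (Python with scipy assumed; the chosen points 58 and 62 are illustrative) comparing the two-sided region (14.45) with the one-sided region (14.43) on either side of θ = 60:

```python
import numpy as np
from scipy import stats

sigma, n, theta0 = 8.0, 40, 60.0
se = sigma / np.sqrt(n)

def power_two_sided(theta, c=1.96):
    d = (theta - theta0) / se
    return stats.norm.sf(c - d) + stats.norm.cdf(-c - d)

def power_one_sided(theta, c=1.645):      # rejection region {tau >= 1.645}
    d = (theta - theta0) / se
    return stats.norm.sf(c - d)

for theta in (58.0, 62.0):
    print(theta, round(power_two_sided(theta), 4), round(power_one_sided(theta), 4))
# at 62 the one-sided test is more powerful (about 0.47 vs 0.35), but at 58 its power
# drops to about 0.0006, far below the size 0.05: it is biased, while the two-sided test is not
```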

14.4 The likelihood ratio test procedure

The discussion so far suggests that no UMP tests exist for a wide variety of cases which are important in practice. However, the likelihood ratio test procedure yields very satisfactory tests for a great number of cases where none of the above methods is applicable. It is particularly valuable in the case where both hypotheses are composite and θ is a vector of parameters. This procedure not only has a lot of intuitive appeal but also frequently leads to tests with good properties.

Consider H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁. Let the likelihood function be L(θ; x); then the likelihood ratio is defined by

λ(x) = max_{θ ∈ Θ₀} L(θ; x) / max_{θ ∈ Θ} L(θ; x) = L(θ̂₀; x)/L(θ̂; x).    (14.46)

The numerator measures the highest 'support' x gives to θ ∈ Θ₀ and the denominator measures the maximum value of the likelihood function (see Fig. 14.4). By definition λ(x) can never exceed unity, and the smaller it is, the less H₀ is 'supported' by the data. This suggests that the rejection region based on λ(x) must be of the form

C₁ = {x: λ(x) ≤ k}, 0 ≤ k ≤ 1,    (14.47)

where k is determined so that the test has size α. This presupposes that the distribution of λ(x) under both H₀ and H₁ is known. This is usually the exception rather than the rule. The exceptions arise when Φ is a normal family of densities and X is a random sample, in which case λ(x) is often a monotone function of one of the pivots we encountered above. Let us

illustrate the procedure and the difficulties arising by considering several examples.

Example 1

Let

Φ = {f(x; μ, σ²) = (1/(σ√(2π))) exp(−(1/(2σ²))(x − μ)²), θ ≡ (μ, σ²) ∈ ℝ × ℝ₊}

be the probability model and X = (X₁, X₂, ..., Xₙ)′ be a random sample from f(x; μ, σ²), and consider H₀: μ = μ₀ against H₁: μ ≠ μ₀. The likelihood function is

L(θ; x) = (2πσ²)^(−n/2) exp(−(1/(2σ²)) Σⁿᵢ₌₁ (xᵢ − μ)²),

and maximising it over Θ₀ and over Θ gives

λ(x) = [ Σⁿᵢ₌₁ (xᵢ − μ₀)² / Σⁿᵢ₌₁ (xᵢ − x̄ₙ)² ]^(−n/2).

At first sight it might seem an impossible task to determine the distribution of λ(x). Note, however, that

Σⁿᵢ₌₁ (xᵢ − μ₀)² = Σⁿᵢ₌₁ (xᵢ − x̄ₙ)² + n(x̄ₙ − μ₀)²,

which implies that

λ(x) = [1 + n(x̄ₙ − μ₀)²/Σⁿᵢ₌₁ (xᵢ − x̄ₙ)²]^(−n/2) = [1 + W²/(n − 1)]^(−n/2),

where W = √n(x̄ₙ − μ₀)/s ~ t(n − 1) under H₀, and

W ~ t(n − 1; δ) under H₁, with non-centrality parameter δ = √n(μ₁ − μ₀)/σ, μ₁ ∈ Θ₁.

Since λ(x) is a monotone decreasing function of W², the rejection region takes the form

C₁ = {x: |W| ≥ c_α},


and α, c_α and 𝒫(θ) can be derived from the distribution of W.

Example 2

In the context of the statistical model of Example 1, consider H₀: σ² = σ₀² against H₁: σ² ≠ σ₀², with Θ = ℝ × ℝ₊ and Θ₀ = {(μ, σ₀²), μ ∈ ℝ}. In this case

λ(x) = (v/n)^(n/2) exp{(n − v)/2}, where v = Σⁿᵢ₌₁ (xᵢ − x̄ₙ)²/σ₀².

The inequality λ(x) ≤ k is equivalent to v ≤ k₁ or v ≥ k₂, where

v ~ χ²(n − 1) under H₀, and v ~ χ²(n − 1; δ) under H₁, δ = σ₁²/σ₀², σ₁² ∈ Θ₁,

with k₁ and k₂ defined by

∫_{k₁}^{k₂} dχ²(n − 1) = 1 − α;

for example, if α = 0.1 and n − 1 = 30, an equal-tail choice gives k₁ ≈ 18.5 and k₂ ≈ 43.8.

Hence, the rejection region is C₁ = {x: v ≤ k₁ or v ≥ k₂}. Using the analogy between this and the various tests of μ we encountered so far, we can postulate that in the case of the one-sided hypotheses:

(i) H₀: σ² ≥ σ₀², H₁: σ² < σ₀², the rejection region is C₁ = {x: v ≤ k₁};
(ii) H₀: σ² ≤ σ₀², H₁: σ² > σ₀², the rejection region is C₁ = {x: v ≥ k₂}.
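A short numerical sketch of Example 2 (Python with numpy/scipy assumed; the simulated data, σ₀² and α are illustrative, and k₁, k₂ are taken as the equal-tail χ²(n − 1) quantiles rather than the exact likelihood-ratio cut-offs):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(50.0, 9.0, size=31)                 # simulated sample, n - 1 = 30
n, alpha, sigma0_sq = x.size, 0.10, 64.0

v = ((x - x.mean())**2).sum() / sigma0_sq          # v ~ chi^2(n-1) under H0: sigma^2 = 64
k1 = stats.chi2.ppf(alpha / 2, df=n - 1)           # about 18.5
k2 = stats.chi2.ppf(1 - alpha / 2, df=n - 1)       # about 43.8
print(f"v = {v:.1f}, reject H0? {v <= k1 or v >= k2}")
```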

The question arising at this stage is: 'What use is the likelihood ratio test procedure if the distribution of λ(X) is only known when a well-known pivot exists already?' The answer is that it is reassuring to know that the procedure in these cases leads to certain well-known pivots, because the likelihood ratio test procedure is of considerable importance when no such pivots exist. Under certain conditions we can derive the asymptotic distribution of λ(X): it can be shown that

−2 log λ(X) ~ᵃ χ²(r)    (14.48)

('~ᵃ' reads 'asymptotically distributed, under H₀, as'), r being the number of parameters tested. This will be pursued further in Section 16.2.

14.5 Confidence estimation

In point estimation, when an estimator θ̂ of θ is constructed we usually think of it not just as a point but as a point surrounded by some region of possible error, i.e. θ̂ ± ε, where ε is related to the standard error of θ̂. This can be viewed as a crude form of a confidence interval for θ:

(θ̂ − ε ≤ θ ≤ θ̂ + ε);    (14.49)

crude because there is no guarantee that such an interval will include θ. Indeed, the probability that θ does not belong to this interval is positive. In order to formalise this argument we need to attach probabilities to such intervals. In general, interval estimation

refers to constructing random intervals of the form

(τ(X) ≤ θ ≤ τ̄(X)),    (14.50)

together with an associated probability for such a statement being valid. τ(X) and τ̄(X) are two statistics referred to as the lower and upper 'bounds' respectively; they are in effect stochastic bounds on θ. The associated probability will take the form

Pr(τ(X) ≤ θ ≤ τ̄(X)) = 1 − α,    (14.51)

where the probabilistic statement is based on the distribution of τ(X) and τ̄(X). The main problem is to construct such statistics whose distribution does not depend on the unknown parameter(s) θ. This, however, is the same problem as in hypothesis testing. In that context we


constructed pivots whose distributions do not depend on any unknown parameters, and the same pivots can be used here. The statement in (14.51) suggests that in the long run (in repeated experiments) the random interval (τ(X), τ̄(X)) will include the 'true' but unknown θ. For any particular realisation x, however, we do not know 'for sure' whether (τ(x), τ̄(x)) includes the 'true' θ or not; we are only (1 − α) confident that it does. The duality between hypothesis testing and confidence intervals can be seen in the 'marks' example discussed above. For the null hypothesis H₀: θ = θ₀, θ₀ ∈ Θ, against H₁: θ ≠ θ₀, we constructed a size α test based on the acceptance region

C₀(θ₀) = {x: θ₀ − c_α σ/√n ≤ X̄ₙ ≤ θ₀ + c_α σ/√n},    (14.53)

with c_α defined by

∫_{−c_α}^{c_α} φ(z) dz = 1 − α, Z ~ N(0, 1).    (14.54)

This implies that Pr(x ∈ C₀(θ₀); θ = θ₀) = 1 − α, and hence by a simple manipulation of C₀ we can define the (1 − α) confidence interval

C(X) = {θ₀: X̄ₙ − c_α σ/√n ≤ θ₀ ≤ X̄ₙ + c_α σ/√n},    (14.55)

for which

Pr(θ₀ ∈ C(X)) = 1 − α.    (14.56)
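For the 'marks' example this gives the familiar X̄ₙ ± 1.96·σ/√n interval. A minimal sketch (Python with numpy/scipy assumed; the simulated data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sigma, n, alpha = 8.0, 40, 0.05
x = rng.normal(61.0, sigma, size=n)      # simulated marks; the true theta is unknown in practice

c = stats.norm.ppf(1 - alpha / 2)        # 1.96
half_width = c * sigma / np.sqrt(n)      # 2.48
lo, hi = x.mean() - half_width, x.mean() + half_width
print(f"95% confidence interval for theta: ({lo:.2f}, {hi:.2f})")
# duality: theta0 = 60 is rejected by the size 0.05 test exactly when 60 lies outside (lo, hi)
```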

In general, any acceptance region for a size α test can be transformed into a (1 − α) confidence interval for θ by changing C₀, a function of x ∈ 𝒳, into C, a function of θ₀ ∈ Θ. One-sided tests correspond to one-sided confidence intervals of the form

Pr(τ(X) ≤ θ) ≥ 1 − α    (14.57)

or

Pr(θ ≤ τ̄(X)) ≥ 1 − α.    (14.58)

In general, when Θ = ℝᵐ, m ≥ 1, a family of subsets C(X) of Θ, where C(X) depends on X but not on θ, is called a random region. For example,

C(X) = {θ: τ(X) ≤ θ ≤ τ̄(X)} or C(X) = {θ: τ(X) ≤ θ}.    (14.59)

The problem of confidence estimation is one of constructing a random region C(X) such that, for a given α ∈ (0, 1),

Pr(θ ∈ C(X)) ≥ 1 − α for all θ ∈ Θ.    (14.60)


It is interesting to note that C(X) could be interpreted as

C(X) = {θ: τᵢ(X) ≤ θᵢ ≤ τ̄ᵢ(X), i = 1, 2, ..., m},    (14.61)

in which case, if the (τᵢ(X), τ̄ᵢ(X)), i = 1, 2, ..., m, represent independent (1 − αᵢ) confidence intervals, then

(1 − α) = ∏ᵐᵢ₌₁ (1 − αᵢ).    (14.62)

For instance, two independent 0.975-level intervals yield a joint confidence level of 0.975² ≈ 0.95.

The duality between hypothesis testing and confidence estimation does not end at the construction stage. The various properties of tests have corresponding counterparts in confidence estimation.

Definition 7

A family of (1 − α) level confidence regions C(X) is said to be uniformly most accurate (UMA) among (1 − α) level confidence regions C*(X) if

Pr(x: θ₁ ∈ C(X)/θ) ≤ Pr(x: θ₁ ∈ C*(X)/θ) for all θ₁ ≠ θ, θ ∈ Θ.    (14.63)

This shows that when power is reinterpreted as accuracy (a UMA region covers false values of θ with the smallest possible probability) it provides us with the basic optimality criterion in confidence estimation. It turns out (not surprisingly) that UMP tests lead to UMA confidence regions. This is because

θ₀ ∈ C(x) if and only if x ∈ C₀(θ₀),    (14.64)

where C₀(θ₀) represents the acceptance region of H₀: θ = θ₀. In effect the confidence region C(X) can be formed by

C(x) = {θ₀: x ∈ C₀(θ₀)},    (14.65)

and the acceptance region C₀(θ₀) by

C₀(θ₀) = {x: θ₀ ∈ C(x)};    (14.66)

hence

Pr(x: x ∈ C₀(θ₀)/θ = θ₀) = Pr(x: θ₀ ∈ C(x)/θ = θ₀) = 1 − α.    (14.67)

This duality between C₀(θ₀) and C(X) is illustrated for the above example, assuming n = 1 to enable us to draw the graph, in Fig. 14.5.

Fig. 14.5 The duality between hypothesis testing and interval estimation.

Definition 8

A confidence region C(X) for θ₀ is said to be unbiased at confidence level (1 − α) if

Pr(x: θ₁ ∈ C(x)/θ₀) ≤ 1 − α for all θ₀, θ₁ ∈ Θ, θ₁ ≠ θ₀.    (14.68)

In general, a 'good' test will give rise to a good confidence region, and vice versa (see Lehmann (1959)).

14.6 Prediction

In the context of a statistical model as defined by the probability and sampling model components, prediction refers to the construction of an 'optimal' Borel function l(·) (see Chapter 6) of the form

l(·): Θ → ℝ,    (14.69)

which purports to provide a 'good guess' for the value of a random variable Xₙ₊₁ which does not belong to the postulated sample. If we denote the

predictor by X̂ₙ₊₁ = l(θ̂ₙ), the problem is to choose l(·) so that l(θ̂ₙ) is a 'good' predictor of Xₙ₊₁. Given that θ̂ₙ = h(Xₙ) we can define l(·) as a function of the sample directly, i.e. l(θ̂ₙ) = g(Xₙ). Properties of optimal predictors were discussed in Section 12.3, but no methods of constructing such predictors were mentioned. The purpose of this section is to consider this problem briefly.

The problem of constructing optimal predictors refers to the question of how we choose the function l(·) so that the resulting predictor satisfies certain desirable properties. To be able to answer this question we need to specify what the desirable properties are. The single most widely used criterion for a good predictor is minimum mean square error (MSE). This criterion suggests choosing l(·) so as to minimise

E(Xₙ₊₁ − l(Xₙ))²,    (14.70)

where E(·) is defined in terms of the joint distribution of Xₙ₊₁ and Xₙ, say D(Xₙ₊₁, Xₙ; ψ). It turns out that the solution of this minimisation problem is theoretically extremely simple. This is because (70) can be expressed in the form

E(Xₙ₊₁ − l(Xₙ))² = E([Xₙ₊₁ − E(Xₙ₊₁/σ(Xₙ))] + [E(Xₙ₊₁/σ(Xₙ)) − l(Xₙ)])²
                 = E(Xₙ₊₁ − E(Xₙ₊₁/σ(Xₙ)))² + E(E(Xₙ₊₁/σ(Xₙ)) − l(Xₙ))²
                   + 2E([Xₙ₊₁ − E(Xₙ₊₁/σ(Xₙ))][E(Xₙ₊₁/σ(Xₙ)) − l(Xₙ)]).    (14.71)

Using the properties CES and SCES of Section 7.2 we can show that the last term is equal to zero. Hence, (71) is minimised when

l(Xₙ) = E(Xₙ₊₁/σ(Xₙ)),    (14.72)

in which case the second term is also equal to zero. That is, the form of the predictor X̂ₙ₊₁ = l(Xₙ) which minimises (70) is

X̂ₙ₊₁ = E(Xₙ₊₁/σ(Xₙ)),    (14.73)

where σ(Xₙ) is the σ-field generated by Xₙ (see Chapter 4).

As argued in Sections 5.4 and 7.2, the functional form of E(Xₙ₊₁/σ(Xₙ)) depends entirely on the form of the joint distribution D(Xₙ₊₁, Xₙ; ψ). For example, in the case where D(Xₙ₊₁, Xₙ; ψ) is multivariate normal, the conditional expectation is linear, i.e.

E(Xₙ₊₁/σ(Xₙ)) = β′Xₙ    (14.74)

(see Chapter 15). In practice, when D(Xₙ₊₁, Xₙ; ψ) is not known, a linear

functional form is often postulated for the conditional expectation

E(Xₙ₊₁/σ(Xₙ)) = g(Xₙ).    (14.75)

In such cases the joint distribution is implicitly assumed to be closely approximated by a normal distribution.

The predicted value of Xₙ₊₁ will take the form

X̂ₙ₊₁ = E(Xₙ₊₁/Xₙ = xₙ),    (14.76)

where xₙ refers to the observed realisation of the sample Xₙ. The intuition underlying (76) is that the best 'guess' for the value of Xₙ₊₁ must be the average of all its possible values, in view of the past realisations of X (Xₙ = xₙ) (see Fig. 14.6).
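A rough sketch of why the conditional mean is the MSE-optimal predictor, for a jointly normal pair (Python with numpy assumed; the simulated setup and coefficients are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
reps = 200_000

# a jointly normal pair: X_next = 0.6 * X_last + noise (illustrative coefficients)
x_last = rng.normal(0.0, 1.0, reps)
x_next = 0.6 * x_last + rng.normal(0.0, 0.8, reps)

cond_mean = 0.6 * x_last              # E(X_{n+1} | X_n), linear because the pair is normal
naive = np.full(reps, x_next.mean())  # a predictor that ignores the conditioning information

mse_cond = np.mean((x_next - cond_mean)**2)
mse_naive = np.mean((x_next - naive)**2)
print(mse_cond, mse_naive)            # the conditional-mean predictor has the smaller MSE
```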

It is important to note that in the case where Xₙ₊₁ and Xₙ are independent,

E(Xₙ₊₁/σ(Xₙ)) = E(Xₙ₊₁).    (14.77)

That is, the conditional expectation coincides with the marginal expectation of Xₙ₊₁ (see Chapters 6 and 7). This is the reason why, in the case of the random sample Xₙ where Xᵢ ~ N(θ, 1), i = 1, 2, ..., n, if Xₙ₊₁ is also assumed to have the same distribution, its best predictor (in the MSE sense) is its mean, estimated by X̂ₙ₊₁ = (1/n) Σⁿᵢ₌₁ Xᵢ (see Section 12.3). It goes without saying that for prediction purposes we prefer to have non-random sampling

models, because the past history of the stochastic process {Xₙ, n ≥ 1} will be of considerable value in such a case.

The same analysis as in Section 14.5 goes through for prediction, with minor interpretation changes.

Important concepts

Null and alternative hypotheses, acceptance region, rejection region, test statistic, type I error, type II error, power of a test, the power function, size of a test, uniformly most powerful test, unbiased test, simple hypothesis, composite hypothesis, Neyman-Pearson lemma, likelihood ratio test, pivots, confidence region, confidence level, uniformly most accurate confidence regions, unbiased confidence region, optimal predictor, minimum mean square error.

Questions

1 Explain the relationship between H₀ and H₁ and the distribution of the sample.

2 Describe the relationship between the acceptance and rejection regions and Θ₀ and Θ₁.

3 Define the concepts of a test statistic, type I and type II errors, and the probabilities of type I and type II errors.

4 Explain intuitively why we cannot control both the probability of type I error and the probability of type II error. How do we 'solve' this problem in hypothesis testing?

5 Define and explain the concepts of the power of a test and the power function of a test.

6 Explain the concept of the size of a test.
7 Define and explain the concept of a UMP test.
8 State the components needed to define a test.

9 Explain why we need to know the distribution of the test statistic under both the null and the alternative hypotheses.

10 Define the concept of a pivot and explain its role in hypothesis testing.
11 Explain the concepts of one-sided and two-sided tests.

12 Explain the circumstances under which UMP tests exist.

13 Explain the Neyman-Pearson theorem and the likelihood ratio test procedure as ways of constructing optimal tests.

14 Explain intuitively the meaning of the statement Pr(τ(X) ≤ θ ≤ τ̄(X)) = 1 − α.

15 Define the concept of a (1 − α) confidence region for θ.

16 Explain the relationship between C₀(θ₀), the acceptance region for H₀: θ = θ₀, and the corresponding (1 − α) confidence region C(X).

17 Define and explain the concept of a (1 − α) uniformly most accurate confidence region.

Exercises

1 For the 'marks' example of Section 14.2 construct a size 0.05 test for H₀: θ = 60 against H₁: θ < 60. Is it unbiased? Using this, construct a 0.95 confidence level interval for θ.

2 Let X ~ N(μ, σ²) and consider the following hypotheses:

(i) H₀: μ ≤ μ₀, H₁: μ > μ₀, σ² > 0, μ₀ known;
(ii) H₀: σ² ≥ σ₀², H₁: σ² < σ₀², μ ∈ ℝ, σ₀² known;
(iii) H₀: μ = μ₀, σ² = σ₀², H₁: μ ≠ μ₀, σ² ≠ σ₀²;
(iv) H₀: μ = μ₀, σ² ≤ σ₀², H₁: μ ≠ μ₀, σ² > σ₀².

State whether the above null and alternative hypotheses are simple or composite and explain your answer.

3 Let X = (X₁, ..., Xₙ)′ be a random sample from N(θ, 1), where Θ = {θ₀, θ₁}. Construct a size α test for

H₀: θ = θ₀ against H₁: θ = θ₁.

Using this, construct a (1 − α) confidence level interval for θ.

4 Let X = (X₁, ..., Xₙ)′ be a random sample from a Bernoulli distribution with density function f(x; θ) = θˣ(1 − θ)¹⁻ˣ, x = 0, 1. Construct a size α test for H₀: θ ≤ θ₀ against H₁: θ > θ₀. (Hint: Σⁿᵢ₌₁ Xᵢ is binomially distributed.)

5 Let X = (X₁, ..., Xₙ)′ be a random sample from N(μ, σ²).

(i) Show that the test defined by the rejection region

C₁ = {x: √n(X̄ₙ − μ₀)/s ≥ c_α}

defines a UMP unbiased test for H₀: μ ≤ μ₀ against H₁: μ > μ₀.

(ii) Derive a UMP unbiased test for H₀: μ ≥ μ₀ against H₁: μ < μ₀.

6 Let X = (X₁, ..., Xₙ)′ and Y = (Y₁, ..., Yₘ)′ be two random samples from N(μ, σ₁²) and N(μ, σ₂²) respectively. Show that for the hypotheses:

(i) H₀: σ₁² ≤ σ₂², H₁: σ₁² > σ₂²;
(ii) H₀: σ₁² ≥ σ₂², H₁: σ₁² < σ₂²;
(iii) H₀: σ₁² = σ₂², H₁: σ₁² ≠ σ₂²,

the rejection regions

C₁ = {x, y: τ(x, y) ≥ k₁},  C₁ = {x, y: τ(x, y) ≤ k₂},  C₁ = {x, y: τ(x, y) ≤ k₃ or τ(x, y) ≥ k₄},

respectively, where τ(x, y) is the ratio of the two sample variances, define UMP unbiased tests (see Lehmann (1959), pp. 169-70). What is the distribution of τ(x, y)?

(iv) Construct a size α test for H₀: σ₁² = σ₂² = σ², H₁: σ₁² ≠ σ₂², using the likelihood ratio test procedure.
