CHAPTER 14
Hypothesis testing and confidence regions
The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s and early 1930s, complementing Fisher's work on estimation. As in estimation, we begin by postulating a statistical model, but instead of seeking an estimator of $\theta$ in $\Theta$ we consider the question whether $\theta \in \Theta_0 \subset \Theta$ or $\theta \in \Theta_1 = \Theta - \Theta_0$ is mostly supported by the observed data. The discussion which follows will proceed in a similar way to the discussion of estimation, though less systematically and formally. This is due to the complexity of the topic, which arises mainly because one is asked to assimilate too many concepts too quickly just to be able to define the problem properly. This difficulty, however, is inherent in testing if any proper understanding of the topic is to be attempted, and is thus unavoidable. Every effort is made to ensure that the formal definitions are supplemented with intuitive explanations and examples. In Sections 14.1 and 14.2 the concepts needed to define a test and some criteria for ‘good’ tests are discussed using a simple example. In Sections 14.3 and 14.4 the question of constructing ‘good’ tests is considered. Section 14.5 relates hypothesis testing to confidence estimation, bringing out the duality between the two areas. In Section 14.6 the related topic of prediction is considered.

14.1 Testing, definitions and concepts
Let $X$ be a random variable (r.v.) defined on the probability space $(S, \mathcal{F}, P(\cdot))$ and consider the statistical model associated with $X$:
(i) $\Phi = \{f(x; \theta), \theta \in \Theta\}$;
(ii) $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ is a random sample from $f(x; \theta)$.
… conjecture about $\theta$ of the form ‘$\theta$ belongs to some subset $\Theta_0$ of $\Theta$’ is supported by the data $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. We call such a conjecture the null hypothesis and denote it by $H_0: \theta \in \Theta_0$; if the sample realisation $\mathbf{x} \in C_0$ we accept $H_0$, and if $\mathbf{x} \in C_1$ we reject it. The mapping which enables us to define $C_0$ and $C_1$ we call a test statistic, $\tau(\mathbf{X}): \mathcal{X} \to \mathbb{R}$ (see Fig 11.4).
In order to illustrate the concepts introduced so far let us consider the following example. Let $X$ be the random variable representing the marks achieved by students in an econometric theory paper, and let the statistical model be:
(i) $\Phi = \left\{ f(x; \theta) = \frac{1}{8\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}\left(\frac{x - \theta}{8}\right)^2 \right\}, \ \theta \in \Theta = [0, 100] \right\}$;
(ii) $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$, $n = 40$, is a random sample from $f(x; \theta)$.
The hypothesis to be tested is $H_0: \theta = 60$ (i.e. $X \sim N(60, 64)$), $\Theta_0 = \{60\}$, against $H_1: \theta \neq 60$ (i.e. $X \sim N(\mu, 64)$, $\mu \neq 60$), $\Theta_1 = [0, 100] - \{60\}$. Common sense suggests that if some ‘good’ estimator of $\theta$, say $\bar{X}_n = (1/n)\sum_{i=1}^{n} X_i$, takes a value ‘around’ 60 for the sample realisation $\mathbf{x}$, then we will be inclined to accept $H_0$. Let us formalise this argument.
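The acceptance region takes the form $60 - \varepsilon \le \bar{X}_n \le 60 + \varepsilon$, $\varepsilon > 0$, or
$$C_0 = \{\mathbf{x}: |\bar{X}_n - 60| \le \varepsilon\},$$
and
$$C_1 = \{\mathbf{x}: |\bar{X}_n - 60| > \varepsilon\}$$
is the rejection region.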
The next question is, ‘how do we choose $\varepsilon$?’ If $\varepsilon$ is too small we run the risk of rejecting $H_0$ when it is true; we call this a type I error. On the other hand, if $\varepsilon$ is too large we run the risk of accepting $H_0$ when it is false; we call this a type II error. Formally, if $\mathbf{x} \in C_1$ (reject $H_0$) and $\theta \in \Theta_0$ ($H_0$ is true) we commit a type I error; if $\mathbf{x} \in C_0$ (accept $H_0$) and $\theta \in \Theta_1$ ($H_0$ is false) we commit a type II error (see Table 14.1).

Table 14.1
                 $H_0$ accepted     $H_0$ rejected
$H_0$ true       correct            type I error
$H_0$ false      type II error      correct

The hypothesis to be tested is formally stated as follows:
$$H_0: \theta \in \Theta_0, \quad \Theta_0 \subset \Theta. \tag{14.1}$$
Against the null hypothesis $H_0$ we postulate the alternative $H_1$, which takes the form:
$$H_1: \theta \notin \Theta_0 \tag{14.2}$$
or, equivalently,
$$H_1: \theta \in \Theta_1 = \Theta - \Theta_0. \tag{14.3}$$
It is important to note at the outset that $H_0$ and $H_1$ are in effect hypotheses about the distribution of the sample $f(\mathbf{x}; \theta)$, i.e.
$$H_0: f(\mathbf{x}; \theta), \ \theta \in \Theta_0, \qquad H_1: f(\mathbf{x}; \theta), \ \theta \in \Theta_1. \tag{14.4}$$
A hypothesis $H_0$ or $H_1$ is called simple if knowing $\theta \in \Theta_0$ or $\theta \in \Theta_1$ specifies $f(\mathbf{x}; \theta)$ completely; otherwise it is called a composite hypothesis. That is, if $f(\mathbf{x}; \theta), \theta \in \Theta_0$ or $f(\mathbf{x}; \theta), \theta \in \Theta_1$ contains only one density function we say that $H_0$ or $H_1$, respectively, is a simple hypothesis; otherwise it is said to be composite.
In testing a null hypothesis $H_0$ against an alternative $H_1$ the issue is to decide whether the sample realisation $\mathbf{x}$ ‘supports’ $H_0$ or $H_1$. In the former case we say that $H_0$ is accepted, in the latter that $H_0$ is rejected. In order to be able to make such a decision we need to formulate a mapping which relates $\Theta_0$ to some subset of the observation space $\mathcal{X}$, say $C_0$, which we call an acceptance region; its complement $C_1$ ($C_0 \cup C_1 = \mathcal{X}$, $C_0 \cap C_1 = \varnothing$) we call the rejection region (see Fig 11.4). Obviously, in any particular situation we cannot say for certain in which of the four boxes in Table 14.1 we are; at best we can only make a probabilistic statement relating to this. Moreover, if we were to choose $\varepsilon$ ‘too small’ we would run a higher risk of committing a type I error than of committing a type II error, and vice versa. That is, there is a trade-off between the probability $\alpha$ of type I error, i.e.
$$\Pr(\mathbf{x} \in C_1; \theta \in \Theta_0) = \alpha, \tag{14.5}$$
and the probability $\beta$ of type II error, i.e.
$$\Pr(\mathbf{x} \in C_0; \theta \in \Theta_1) = \beta. \tag{14.6}$$
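For the marks example this trade-off can be made concrete. The sketch below is a minimal illustration, assuming (as is shown formally later in this section) that $\bar{X}_n \sim N(\theta, 64/40)$; the alternative value $\theta_1 = 62$ and the grid of $\varepsilon$ values are arbitrary choices made only for illustration. It evaluates $\alpha(\varepsilon)$ and $\beta(\varepsilon)$ with the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

# Marks example: X ~ N(theta, 64), n = 40, so Xbar_n ~ N(theta, 64/40).
SIGMA, N, THETA0 = 8.0, 40, 60.0
SE = SIGMA / N ** 0.5  # standard error of Xbar_n, about 1.265


def alpha(eps):
    """Pr(|Xbar_n - 60| > eps; theta = 60): probability of a type I error."""
    return 2 * (1 - NormalDist().cdf(eps / SE))


def beta(eps, theta1):
    """Pr(|Xbar_n - 60| <= eps; theta = theta1): probability of a type II error."""
    xbar = NormalDist(mu=theta1, sigma=SE)
    return xbar.cdf(THETA0 + eps) - xbar.cdf(THETA0 - eps)


# Illustrative grid of eps values; theta1 = 62 is an arbitrary point in Theta_1.
for eps in (1.0, 2.0, 2.48, 3.0, 4.0):
    print(f"eps={eps:4.2f}  alpha={alpha(eps):.4f}  beta(theta1=62)={beta(eps, 62.0):.4f}")
```

Reading down the output, $\alpha$ falls while $\beta$ rises as $\varepsilon$ grows; no value of $\varepsilon$ makes both small for a fixed $n$.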
Ideally we would like $\alpha = \beta = 0$ for all $\theta \in \Theta$, which is not possible for a fixed $n$. Moreover, we cannot control both simultaneously because of the trade-off between them. ‘How do we proceed, then?’ In order to help us decide, let us consider an analogy with a criminal trial.
The jury in a criminal offence trial are instructed to choose between:
$H_0$: the accused is not guilty; and
$H_1$: the accused is guilty;
with their decision based on the evidence presented in the court. This evidence in hypothesis testing comes in the form of $\Phi$ and $\mathbf{X}$. The jury are instructed to accept $H_0$ unless they have been convinced beyond any reasonable doubt otherwise. This requirement is designed to protect an innocent person from being convicted, and it corresponds to choosing a small value for $\alpha$, the probability of convicting the accused when innocent. By adopting such a strategy, however, they are running the risk of letting a number of ‘crooks off the hook’. This corresponds to being prepared to accept a relatively high value of $\beta$, the probability of not convicting the accused when guilty, in order to protect an innocent person from conviction. This is based on the moral argument that it is preferable to let off a number of guilty people rather than to sentence an innocent person. However, we can never be sure that an innocent person has not been sent to prison, and the strategy is designed to keep the probability of this happening very low. A similar strategy is also adopted in hypothesis testing, where a small value of $\alpha$ is chosen and, for the given $\alpha$, $\beta$ is minimised. Formally, this amounts to choosing $\alpha^*$ such that
$$\Pr(\mathbf{x} \in C_1; \theta \in \Theta_0) = \alpha(\theta) \le \alpha^* \quad \text{for } \theta \in \Theta_0, \tag{14.7}$$
and
$$\Pr(\mathbf{x} \in C_0; \theta \in \Theta_1) = \beta(\theta) \ \text{is minimised for } \theta \in \Theta_1, \tag{14.8}$$
by choosing $C_1$ or $C_0$ appropriately. In the case of the above example, if we were to choose $\alpha$, say $\alpha^* = 0.05$, then
$$\Pr(|\bar{X}_n - 60| > \varepsilon; \theta = 60) = 0.05. \tag{14.9}$$
… and thus the distribution of $\tau(\cdot)$ is known completely (no unknown parameters). When this is the case, this distribution can be used in conjunction with the above probabilistic statement to determine $\varepsilon$. In order to do this we need to relate $|\bar{X}_n - 60|$ to $\tau(\mathbf{X})$ (a statistic) whose distribution is known. The obvious way to do this is to standardise the former, i.e. consider $|\bar{X}_n - 60|/1.265$, which is equal to $|\tau(\mathbf{X})|$ (note that $\sigma/\sqrt{n} = 8/\sqrt{40} \approx 1.265$). This suggests changing the above probabilistic statement to the equivalent statement
$$\Pr\left(\frac{|\bar{X}_n - 60|}{1.265} > c_\alpha; \theta = 60\right) = 0.05, \quad \text{where } c_\alpha = \frac{\varepsilon}{1.265}. \tag{14.12}$$
Given that the distribution of the statistic $\tau(\mathbf{X})$ is symmetric and we want to determine $c_\alpha$ such that $\Pr(|\tau(\mathbf{X})| > c_\alpha) = 0.05$, we should choose the value of $c_\alpha$ from the tables of $N(0, 1)$ which leaves $\alpha^*/2 = 0.025$ probability on either side of the distribution, as shown in Fig 14.1. The value of $c_\alpha$ given from the $N(0, 1)$ tables is $c_\alpha = 1.96$. This in turn implies that the rejection region for the test is
$$C_1 = \left\{\mathbf{x}: \frac{|\bar{X}_n - 60|}{1.265} > 1.96\right\} = \{\mathbf{x}: |\tau(\mathbf{X})| > 1.96\} \tag{14.13}$$
or
$$C_1 = \{\mathbf{x}: |\bar{X}_n - 60| > 2.48\}. \tag{14.14}$$
That is, for sample realisations $\mathbf{x}$ which give rise to $\bar{X}_n$ falling outside the interval $(57.52, 62.48)$ we reject $H_0$.
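The arithmetic leading to (14.13) and (14.14) can be checked numerically. This is a minimal sketch that uses `statistics.NormalDist().inv_cdf` in place of the $N(0, 1)$ tables; the helper name `reject_h0` is ours, not the text's.

```python
from statistics import NormalDist

SIGMA, N, THETA0, ALPHA = 8.0, 40, 60.0, 0.05
se = SIGMA / N ** 0.5  # 8 / sqrt(40), about 1.265

# Two-sided test: leave ALPHA/2 probability in each tail of N(0, 1).
c_alpha = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96
eps = c_alpha * se                             # about 2.48
lower, upper = THETA0 - eps, THETA0 + eps      # acceptance interval for Xbar_n


def reject_h0(xbar):
    """True if the sample mean falls in the rejection region C_1 of (14.14)."""
    return abs(xbar - THETA0) > eps


print(f"c_alpha = {c_alpha:.3f}, eps = {eps:.3f}, C_0 interval = ({lower:.2f}, {upper:.2f})")
# prints: c_alpha = 1.960, eps = 2.479, C_0 interval = (57.52, 62.48)
```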
Let us summarise the argument so far in order to keep the discussion in perspective. We set out to construct a test for $H_0: \theta = 60$ against $H_1: \theta \neq 60$, and intuition suggested the rejection region $\{\mathbf{x}: |\bar{X}_n - 60| > \varepsilon\}$. In order to determine $\varepsilon$ we had to
(i) choose an $\alpha$; and then
…
Given that $C_1 = \{\mathbf{x}: |\tau(\mathbf{X})| > 1.96\}$ defines a test with $\alpha = 0.05$, the question which naturally arises is: ‘What do we need the probability of type II error $\beta$ for?’ The answer is that we need $\beta$ to decide whether the test defined in terms of $C_1$ is a ‘good’ or a ‘bad’ test. As we mentioned at the outset, the way we decided to ‘solve’ the problem of the trade-off between $\alpha$ and $\beta$ was to choose a small value for $\alpha$ and define $C_1$ so as to minimise $\beta$. At this stage we do not know whether the test defined above is a ‘good’ test or not. Let us consider setting up the apparatus to enable us to consider the question of optimality.
14.2 Optimal tests
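Since the acceptance and rejection regions constitute a partition of the observation space $\mathcal{X}$, i.e. $C_0 \cup C_1 = \mathcal{X}$ and $C_0 \cap C_1 = \varnothing$, it follows that $\Pr(\mathbf{x} \in C_0) = 1 - \Pr(\mathbf{x} \in C_1)$ for all $\theta \in \Theta$. Hence, minimisation of $\Pr(\mathbf{x} \in C_0)$ for all $\theta \in \Theta_1$ is equivalent to maximising $\Pr(\mathbf{x} \in C_1)$ for all $\theta \in \Theta_1$.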
Definition 1
The probability of rejecting $H_0$ when false at some point $\theta_1 \in \Theta_1$, i.e. $\Pr(\mathbf{x} \in C_1; \theta = \theta_1)$, is called the power of the test at $\theta = \theta_1$.
Note that
$$\Pr(\mathbf{x} \in C_1; \theta = \theta_1) = 1 - \Pr(\mathbf{x} \in C_0; \theta = \theta_1) = 1 - \beta(\theta_1). \tag{14.15}$$
In the case of the example above we can define the power of the test at some $\theta_1 \in \Theta_1$, say $\theta_1 = 54$, to be $\Pr[|\bar{X}_n - 60|/1.265 > 1.96; \theta = 54]$. ‘How do we calculate this probability?’ The temptation is to suggest using the same distribution as above, i.e. $\tau(\mathbf{X}) = (\bar{X}_n - 60)/1.265 \sim N(0, 1)$. This is, however, wrong because $\theta$ is no longer equal to 60; we assumed that $\theta = 54$ and thus $(\bar{X}_n - 54)/1.265 \sim N(0, 1)$. This implies that
$$\tau(\mathbf{X}) \sim N\left(\frac{54 - 60}{1.265}, 1\right) \quad \text{for } \theta = 54.$$
Using this we can define the power of the test at $\theta = 54$ to be
$$\Pr\left(\frac{|\bar{X}_n - 60|}{1.265} > 1.96; \theta = 54\right) = \Pr\left(\frac{\bar{X}_n - 54}{1.265} < -1.96 - \frac{54 - 60}{1.265}\right) + \Pr\left(\frac{\bar{X}_n - 54}{1.265} > 1.96 - \frac{54 - 60}{1.265}\right) = 0.9973.$$
In general we need to calculate the power for all $\theta \in \Theta_1$. Following the same procedure, the power of the test defined by $C_1$ for $\theta = 56, 58, 60, 62, 64, 66$ is as follows:
$\Pr(|\tau(\mathbf{X})| > 1.96; \theta = 56) = 0.8849$,
$\Pr(|\tau(\mathbf{X})| > 1.96; \theta = 58) = 0.3520$,
$\Pr(|\tau(\mathbf{X})| > 1.96; \theta = 60) = 0.05$,
$\Pr(|\tau(\mathbf{X})| > 1.96; \theta = 62) = 0.3520$,
$\Pr(|\tau(\mathbf{X})| > 1.96; \theta = 64) = 0.8849$,
$\Pr(|\tau(\mathbf{X})| > 1.96; \theta = 66) = 0.9973$.
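These power values can be reproduced with a few lines of code. The sketch below assumes, as in the text, that $\tau(\mathbf{X}) \sim N((\theta_1 - 60)/1.265, 1)$ when $\theta = \theta_1$; the function name `power_two_sided` is hypothetical. Because it uses unrounded constants ($\sigma/\sqrt{n} = 8/\sqrt{40}$ and $c_\alpha \approx 1.95996$), its output agrees with the listed values only to within about 0.001.

```python
from statistics import NormalDist

SIGMA, N, THETA0, ALPHA = 8.0, 40, 60.0, 0.05
SE = SIGMA / N ** 0.5
Z = NormalDist()
C_ALPHA = Z.inv_cdf(1 - ALPHA / 2)


def power_two_sided(theta1):
    """Pr(|tau(X)| > c_alpha; theta = theta1), where tau(X) = (Xbar_n - 60)/SE.

    Under theta = theta1, tau(X) ~ N(delta, 1) with delta = (theta1 - 60)/SE,
    so the two tails are evaluated with the shifted standard normal.
    """
    delta = (theta1 - THETA0) / SE
    return Z.cdf(-C_ALPHA - delta) + (1 - Z.cdf(C_ALPHA - delta))


for theta in (54, 56, 58, 60, 62, 64, 66):
    print(theta, round(power_two_sided(theta), 4))
```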
As we can see, the power of the test increases as we go further away from $\theta = 60$ ($H_0$), and the power at $\theta = 60$ equals the probability of type I error. This prompts us to define the power function as follows:

Definition 2
$\mathcal{P}(\theta) = \Pr(\mathbf{x} \in C_1)$, $\theta \in \Theta$, is called the power function of the test defined by the rejection region $C_1$.
Definition 3
$\alpha = \max_{\theta \in \Theta_0} \mathcal{P}(\theta)$ is defined to be the size (or the significance level) of the test.
In the case where $H_0$ is simple, say $\theta = \theta_0$, then $\alpha = \mathcal{P}(\theta_0)$. These definitions enable us to define a criterion for ‘a best’ test of a given size $\alpha$: it is the one (if it exists) whose power function $\mathcal{P}(\theta)$, $\theta \in \Theta_1$, is maximum at every $\theta$.
Definition 4
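A test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$, as defined by some rejection region $C_1$, is said to be a uniformly most powerful (UMP) test of size $\alpha$ if
(i) $\max_{\theta \in \Theta_0} \mathcal{P}(\theta) = \alpha$;
(ii) $\mathcal{P}(\theta) \ge \mathcal{P}^*(\theta)$ for all $\theta \in \Theta_1$;
where $\mathcal{P}^*(\theta)$ is the power function of any other test of size $\alpha$.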
As we saw above, in order to be able to determine the power function we need to know the distribution of the test statistic $\tau(\mathbf{X})$ (in terms of which $C_1$ is defined) under $H_1$ (i.e. when $H_0$ is false). The concept of a UMP test provides us with the criterion needed to choose between tests for the same $H_0$.

… above with rejection region
$$C_1 = \{\mathbf{x}: |\tau(\mathbf{X})| > 1.96\}. \tag{14.19}$$
To that end we shall compare the power of this test with the power of the size 0.05 tests (Fig 14.2) defined by the rejection regions
$$C_1^* = \{\mathbf{x}: \tau(\mathbf{X}) \ge 1.645\}, \tag{14.16}$$
$$C_1^{**} = \{\mathbf{x}: \tau(\mathbf{X}) \le -1.645\}, \tag{14.17}$$
$$C_1^{***} = \{\mathbf{x}: |\tau(\mathbf{X})| \le 0.063\}. \tag{14.18}$$
Fig 14.2 The rejection regions (14.16), (14.17) and (14.18) (each panel shows the density $f(z)$ of $N(0, 1)$ with the corresponding rejection region).
All the rejection regions define size 0.05 tests for $H_0: \theta = 60$ against $H_1: \theta \neq 60$. In order to discriminate between ‘bad’, ‘good’ and ‘better’ tests we have to calculate their power functions and compare them. The diagram of the power functions $\mathcal{P}(\theta)$, $\mathcal{P}^*(\theta)$, $\mathcal{P}^{**}(\theta)$, $\mathcal{P}^{***}(\theta)$ is illustrated in Fig 14.3.
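A numerical version of this comparison may help in reading Fig 14.3. The sketch below evaluates the four power functions at a few values of $\theta$, assuming $\tau(\mathbf{X}) \sim N(\delta, 1)$ with $\delta = (\theta - 60)/1.265$. The constant 0.063 for $C_1^{***}$ is taken to be the value for which $\Pr(|Z| \le c) \approx 0.05$; all function names are illustrative.

```python
from statistics import NormalDist

SE = 8.0 / 40 ** 0.5  # standard error of Xbar_n in the marks example
Z = NormalDist()

# Each function gives Pr(tau in rejection region) when tau ~ N(delta, 1),
# where delta = (theta - 60) / SE.


def p_two_sided(delta, c=1.96):   # C_1:    |tau| > 1.96
    return Z.cdf(-c - delta) + 1 - Z.cdf(c - delta)


def p_upper(delta, c=1.645):      # C_1*:   tau >= 1.645
    return 1 - Z.cdf(c - delta)


def p_lower(delta, c=1.645):      # C_1**:  tau <= -1.645
    return Z.cdf(-c - delta)


def p_centre(delta, c=0.063):     # C_1***: |tau| <= 0.063
    return Z.cdf(c - delta) - Z.cdf(-c - delta)


tests = {"C1": p_two_sided, "C1*": p_upper, "C1**": p_lower, "C1***": p_centre}
for theta in (56, 58, 60, 62, 64):
    delta = (theta - 60) / SE
    row = "  ".join(f"{name}={p(delta):.3f}" for name, p in tests.items())
    print(f"theta={theta}: {row}")
```

At $\theta = 60$ every column is close to 0.05; away from 60 each one-sided test beats $C_1$ on one side but falls below 0.05 on the other, while $C_1^{***}$ is below 0.05 everywhere except at 60.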
Looking at the diagram we can see that only one thing is clear-cut: $C_1^{***}$ defines a very bad test, its power function being dominated by those of the other tests. Comparing the other three tests we can see that $C_1^*$ is more powerful than the other two for $\theta > 60$ but $\mathcal{P}^*(\theta) < \alpha$ for $\theta < 60$; $C_1^{**}$ is more powerful than the other two for $\theta < 60$ but $\mathcal{P}^{**}(\theta) < \alpha$ for $\theta > 60$; and none of the tests is more powerful over the whole range. That is, there is no UMP test of size 0.05 for $H_0: \theta = 60$ against $H_1: \theta \neq 60$. As will be seen in the sequel, no UMP tests exist in most situations of interest in practice. The procedure adopted in such cases is to reduce the class of all tests to some subclass by imposing some further restrictions, and then to look for UMP tests within the subclass. One of the most important restrictions used in this context is the criterion of unbiasedness.

Fig 14.3 The power functions $\mathcal{P}(\theta)$, $\mathcal{P}^*(\theta)$, $\mathcal{P}^{**}(\theta)$, $\mathcal{P}^{***}(\theta)$.

Definition 5
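A test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ is said to be unbiased if
$$\max_{\theta \in \Theta_0} \mathcal{P}(\theta) \le \min_{\theta \in \Theta_1} \mathcal{P}(\theta). \tag{14.20}$$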
In other words, a test is unbiased if it rejects $H_0$ more often when it is false than when it is true; a minimal but sensible requirement. Another form these added restrictions can take, which reduces the problem to one where UMP tests do exist, is related to the probability model $\Phi$. These include restrictions such as that $\Phi$ belongs to the one-parameter exponential family. In the case of the above example we can see that the test defined by $C_1^{***}$ is biased, and $C_1$ is now UMP within the class of unbiased tests. This is because $C_1^*$ and $C_1^{**}$ are biased for $\theta < 60$ and $\theta > 60$ respectively. It is obvious, however, that for $H_0: \theta = 60$ against $H_1^*: \theta > 60$ or $H_1^{**}: \theta < 60$, the tests defined by $C_1^*$ and $C_1^{**}$, respectively, are UMP. That is, for the one-sided alternatives there exist UMP tests given by $C_1^*$ and $C_1^{**}$. It is important to note that in the case of $H_1^*$ and $H_1^{**}$ above the parameter space implicitly assumed is different. In the case of $H_1^*$ the parameter space implicitly assumed is $\Theta = [60, 100]$, and in the case of $H_1^{**}$ it is $\Theta = [0, 60]$. This is …
Collecting all the above concepts together, we say that a test has been defined when the following components have been specified:
(T1) a test statistic $\tau(\mathbf{X})$;
(T2) the size of the test $\alpha$;
(T3) the distribution of $\tau(\mathbf{X})$ under $H_0$ and $H_1$;
(T4) the rejection region $C_1$ (or, equivalently, $C_0$).
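Let us illustrate this using the marks example above. The test statistic is
$$\tau(\mathbf{X}) = \frac{\sqrt{n}(\bar{X}_n - \theta_0)}{\sigma} = \frac{\bar{X}_n - 60}{1.265}; \tag{14.21}$$
we call it a statistic because $\sigma$ is known and $\theta_0$ is known under $H_0$ and $H_1$. If we choose the size $\alpha = 0.05$, the fact that $\tau(\mathbf{X}) \sim N(0, 1)$ under $H_0$ enables us to define the rejection region $C_1 = \{\mathbf{x}: |\tau(\mathbf{X})| > c_\alpha\}$, where $c_\alpha$ is determined from $\Pr(|\tau(\mathbf{X})| > c_\alpha; \theta = 60) = 0.05$ to be 1.96 from the standard normal tables, i.e. if $\phi(z)$ denotes the density function of $N(0, 1)$ then
$$\int_{-c_\alpha}^{c_\alpha} \phi(z)\, dz = 0.95. \tag{14.22}$$
In order to derive the power function we need the distribution of $\tau(\mathbf{X})$ under $H_1$. Under $H_1$ we know that
$$\tau^*(\mathbf{X}) = \frac{\sqrt{n}(\bar{X}_n - \theta_1)}{\sigma} \sim N(0, 1) \tag{14.23}$$
for any $\theta_1 \in \Theta_1$, and hence we can relate $\tau(\mathbf{X})$ to $\tau^*(\mathbf{X})$ by
$$\tau(\mathbf{X}) = \tau^*(\mathbf{X}) + \frac{\sqrt{n}(\theta_1 - \theta_0)}{\sigma} \tag{14.24}$$
to deduce that
$$\tau(\mathbf{X}) \sim N\left(\frac{\sqrt{n}(\theta_1 - \theta_0)}{\sigma}, 1\right) \tag{14.25}$$
under $H_1$. This enables us to define the power function as
$$\mathcal{P}(\theta_1) = \Pr(\mathbf{x}: |\tau(\mathbf{X})| > c_\alpha) = \Pr\left(\tau^*(\mathbf{X}) < -c_\alpha - \frac{\sqrt{n}(\theta_1 - \theta_0)}{\sigma}\right) + \Pr\left(\tau^*(\mathbf{X}) > c_\alpha - \frac{\sqrt{n}(\theta_1 - \theta_0)}{\sigma}\right), \quad \theta_1 \in \Theta_1. \tag{14.26}$$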
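Formula (14.26) also shows how the power depends on the sample size through $\sqrt{n}(\theta_1 - \theta_0)/\sigma$. A minimal sketch, with the sample sizes 10 and 160 chosen purely for illustration (the example itself has $n = 40$) and a hypothetical function name `power`:

```python
from statistics import NormalDist

Z = NormalDist()


def power(theta1, n, theta0=60.0, sigma=8.0, alpha=0.05):
    """Power function (14.26) of the two-sided test |tau(X)| > c_alpha."""
    c = Z.inv_cdf(1 - alpha / 2)
    delta = n ** 0.5 * (theta1 - theta0) / sigma  # mean of tau(X) under H1, from (14.25)
    return Z.cdf(-c - delta) + (1 - Z.cdf(c - delta))


for n in (10, 40, 160):
    print(n, [round(power(t, n), 3) for t in (58, 62, 64)])
```

For every fixed $\theta_1 \neq 60$ the power rises with $n$, while $\mathcal{P}(60)$ stays at $\alpha$ whatever the sample size.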
… for which we need to know the distribution under both $H_0$ and $H_1$. Hence, constructing an optimal test is largely a matter of being able to find a statistic $\tau(\mathbf{X})$ which should have the following properties:
(i) $\tau(\mathbf{X})$ depends on $\mathbf{X}$ via a ‘good’ estimator of $\theta$; and
(ii) the distribution of $\tau(\mathbf{X})$ under both $H_0$ and $H_1$ does not depend on any unknown parameters.
We call such a statistic a pivot.
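It is no exaggeration to say that hypothesis testing is based on our ability to construct such pivots. When $\mathbf{X}$ is a random sample from $N(\mu, \sigma^2)$, pivots are readily available in the form of
$$\sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) \sim N(0, 1), \qquad \sqrt{n}\left(\frac{\bar{X}_n - \mu}{s}\right) \sim t(n - 1), \qquad (n - 1)\frac{s^2}{\sigma^2} \sim \chi^2(n - 1), \tag{14.27}$$
but in general these pivots are very hard to come by.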
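The first pivot was used above to construct tests for $\mu$ when $\sigma^2$ is known (both one-sided and two-sided tests). The second pivot can be used to set up similar tests for $\mu$ when $\sigma^2$ is unknown. For example, for testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$ the rejection region can be defined by
$$C_1 = \{\mathbf{x}: |\tau_1(\mathbf{X})| > c_\alpha\}, \quad \text{where } \tau_1(\mathbf{X}) = \sqrt{n}\left(\frac{\bar{X}_n - \mu_0}{s}\right), \tag{14.28}$$
and $c_\alpha$ can be determined by $\int_{-c_\alpha}^{c_\alpha} f(t)\, dt = 1 - \alpha$, $f(t)$ being the density of the Student's t-distribution with $n - 1$ degrees of freedom. For $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ the rejection region takes the form $C_1 = \{\mathbf{x}: \tau_1(\mathbf{X}) \ge c_\alpha\}$, with
$$\alpha = \int_{c_\alpha}^{\infty} f(t)\, dt \tag{14.29}$$
determining $c_\alpha$. The pivot
$$\tau_2(\mathbf{X}) = (n - 1)\frac{s^2}{\sigma_0^2}, \quad \text{which is } \chi^2(n - 1) \text{ when } \sigma^2 = \sigma_0^2, \tag{14.30}$$
can be used to test hypotheses about $\sigma^2$. For example, in the case of a random sample from $N(\mu, \sigma^2)$, for testing $H_0: \sigma^2 \ge \sigma_0^2$ against $H_1: \sigma^2 < \sigma_0^2$ the rejection region for an optimal test takes the form
$$C_1 = \{\mathbf{x}: \tau_2(\mathbf{X}) \le c_\alpha\}, \tag{14.31}$$
where $c_\alpha$ is determined via $\int_0^{c_\alpha} d\chi^2(n - 1) = \alpha$.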
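The statistics in (14.28) and (14.30) are simple to compute; the critical values are not, because the Python standard library provides no Student's t or chi-square quantile function. The sketch below therefore takes $c_\alpha$ as an argument to be read from tables (or from a library routine such as SciPy's `scipy.stats.t.ppf`); the function names and the four-point data set are made up for illustration.

```python
import math
import statistics


def tau1(sample, mu0):
    """t statistic of (14.28): sqrt(n) * (Xbar_n - mu0) / s, s using divisor n - 1."""
    n = len(sample)
    return math.sqrt(n) * (statistics.fmean(sample) - mu0) / statistics.stdev(sample)


def tau2(sample, sigma0_sq):
    """Variance statistic (n - 1) * s^2 / sigma0^2 of (14.30)."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_sq


def t_test_two_sided(sample, mu0, c_alpha):
    """Reject H0: mu = mu0 if |tau1| > c_alpha; c_alpha must come from t(n - 1) tables."""
    return abs(tau1(sample, mu0)) > c_alpha


data = [2.0, 4.0, 6.0, 8.0]  # hypothetical observations
print(tau1(data, 3.0), tau2(data, 5.0))
```

Passing the critical value in, rather than computing it, keeps the sketch dependency-free and makes explicit that (T3), the distribution of the statistic, is what fixes $c_\alpha$.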
14.3 Constructing optimal tests
In constructing the tests considered so far we used ad hoc intuitive arguments which led us to a pivot. As with estimation, it would be helpful if there were general methods for constructing optimal tests. It turns out that the availability of a method for constructing optimal tests depends crucially on the nature of the hypotheses ($H_0$ and $H_1$) and/or the probability model postulated. As far as the nature of $H_0$ and $H_1$ is concerned, existence and optimality depend crucially on whether these hypotheses are simple or composite. As mentioned in Section 14.1, a hypothesis $H_0$ or $H_1$ is called simple if $\Theta_0$ or $\Theta_1$, respectively, contains just one point. In the case of the ‘marks’ example above, $\Theta_0 = \{60\}$ and $\Theta_1 = [0, 60) \cup (60, 100]$, i.e. $H_0$ is simple and $H_1$ is composite since $\Theta_1$ contains more than one point. Care should be exercised when $\theta$ is a vector of unknown parameters, because in such a case $\Theta_0$ or $\Theta_1$ must contain a single vector in order to be simple. For example, in the case of sampling from $N(\mu, \sigma^2)$ with $\sigma^2$ unknown, $H_0: \mu = \mu_0$ is not a simple hypothesis since $\Theta_0 = \{(\mu_0, \sigma^2), \sigma^2 \in \mathbb{R}_+\}$.
(1) Simple null and simple alternative
The theory concerning two simple hypotheses was fully developed in the 1920s by Neyman and Pearson. Let
$$\Phi = \{f(x; \theta), \theta \in \Theta\}$$
be the probability model and $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ be the sampling model, and consider the simple null and simple alternative $H_0: \theta = \theta_0$ and $H_1: \theta = \theta_1$, $\Theta = \{\theta_0, \theta_1\}$; i.e. there are only two possible distributions for $\Phi$, namely $f(x; \theta_0)$ and $f(x; \theta_1)$. Given the available data $\mathbf{x}$ we want to choose between the two distributions. The following theorem provides us with sufficient conditions for the existence of a UMP test for this, the simplest of the cases in testing.

Neyman–Pearson theorem
Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from a continuous distribution …
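In this simple case
$$\mathcal{P}(\theta) = \begin{cases} \alpha & \text{for } \theta = \theta_0, \\ 1 - \beta & \text{for } \theta = \theta_1. \end{cases} \tag{14.34}$$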
The Neyman–Pearson theorem suggests that it is intuitively sensible to base the acceptance or rejection of $H_0$ on the relative values of the distributions of the sample evaluated at $\theta = \theta_0$ and $\theta = \theta_1$, i.e. reject $H_0$ if the ratio $f(\mathbf{x}; \theta_0)/f(\mathbf{x}; \theta_1)$ is relatively small. This amounts to rejecting $H_0$ when the evidence in the form of $\mathbf{x}$ favours $H_1$, giving it a higher ‘support’. It is very important to note that the Neyman–Pearson theorem does not solve the problem completely, because the problem of relating the ratio $f(\mathbf{x}; \theta_0)/f(\mathbf{x}; \theta_1)$ to a pivotal quantity (test statistic) remains. Consider the case where $X \sim N(\theta, \sigma^2)$, $\sigma^2$ known, and we want to test $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ ($\theta_0 < \theta_1$). From the Neyman–Pearson theorem we know that the rejection region is defined in terms of the ratio
$$\frac{f(\mathbf{x}; \theta_0)}{f(\mathbf{x}; \theta_1)} = \exp\left\{-\frac{1}{2\sigma^2}\left[n(\theta_0^2 - \theta_1^2) - 2n\bar{x}_n(\theta_0 - \theta_1)\right]\right\} \tag{14.35}$$
…
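For example, if $\alpha = 0.05$, $c_\alpha^* = 1.645$ and the power of the test is
$$\mathcal{P}(\theta_1) = \Pr\left(\tau_1(\mathbf{X}) \ge c_\alpha^* - \frac{\sqrt{n}(\theta_1 - \theta_0)}{\sigma}\right) = 1 - \beta, \tag{14.39}$$
where
$$\tau_1(\mathbf{X}) = \frac{\sqrt{n}(\bar{X}_n - \theta_1)}{\sigma} \sim N(0, 1) \ \text{under } H_1. \tag{14.40}$$
In this case we can control $\beta$ if we can increase the sample size, since
$$\beta = \Pr\left(\tau_1(\mathbf{X}) < c_\alpha^* - \frac{\sqrt{n}(\theta_1 - \theta_0)}{\sigma}\right), \tag{14.41}$$
and the right-hand side decreases towards zero as $n$ increases (recall that $\theta_1 > \theta_0$).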
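Equation (14.41) can be turned around to find the sample size needed for a target $\beta$: requiring $c_\alpha^* - \sqrt{n}(\theta_1 - \theta_0)/\sigma \le -z_\beta$, where $\Phi(z_\beta) = 1 - \beta$, gives $n \ge [(c_\alpha^* + z_\beta)\sigma/(\theta_1 - \theta_0)]^2$. The sketch below applies this to the marks setting with $\theta_0 = 60$, $\sigma = 8$ and the illustrative choices $\theta_1 = 62$, $\beta = 0.10$; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def one_sided_power(n, theta0, theta1, sigma, alpha=0.05):
    """1 - beta for the test rejecting H0: theta = theta0 when tau(X) >= c*, theta1 > theta0."""
    c_star = Z.inv_cdf(1 - alpha)  # 1.645 for alpha = 0.05
    shift = math.sqrt(n) * (theta1 - theta0) / sigma
    return 1 - Z.cdf(c_star - shift)


def min_sample_size(theta0, theta1, sigma, alpha=0.05, beta=0.10):
    """Smallest n with c* - sqrt(n) * (theta1 - theta0) / sigma <= -z_beta."""
    c_star = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(1 - beta)
    return math.ceil(((c_star + z_beta) * sigma / (theta1 - theta0)) ** 2)


n_req = min_sample_size(60.0, 62.0, 8.0)
print(n_req, round(one_sided_power(n_req, 60.0, 62.0, 8.0), 4))
```

With these inputs the bound is about 137.02, so the sketch returns $n = 138$; at $n = 137$ the power falls just short of 0.90.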
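For the hypothesis $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ when $\theta_1 < \theta_0$, the test statistic takes the form
$$\tau(\mathbf{X}) = \frac{\sqrt{n}(\bar{X}_n - \theta_0)}{\sigma},$$
since $\log[f(\mathbf{x}; \theta_0)/f(\mathbf{x}; \theta_1)] = \frac{n(\theta_0 - \theta_1)}{\sigma^2}\left[\bar{x}_n - \frac{\theta_0 + \theta_1}{2}\right]$ is now an increasing function of $\bar{x}_n$ (because $\theta_0 - \theta_1 > 0$), so that small values of the ratio correspond to small values of $\tau(\mathbf{X})$. This gives rise to the rejection region
$$C_1 = \{\mathbf{x}: \tau(\mathbf{X}) \le -c_\alpha^*\}. \tag{14.42}$$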
(2) Composite null and composite alternative (one-parameter case)
For the hypothesis
$$H_0: \theta \ge \theta_0 \quad \text{against} \quad H_1: \theta < \theta_0,$$
being the other extreme from two simple hypotheses, no such results as the Neyman–Pearson theorem exist, and it comes as no surprise that no UMP tests exist in general. The only result of some interest in this case is that if we restrict the probability model to require the density functions to have a monotone likelihood ratio in the test statistic $\tau(\mathbf{X})$, then UMP tests do exist. This result is of limited value, however, since it does not provide us with a …

(3) Simple $H_0$ against composite $H_1$
In the case where we want to test $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$ (or $\theta < \theta_0$), uniformly most powerful (UMP) tests do not exist in general. In some particular cases, however, such UMP tests do exist and the Neyman–Pearson theorem can help us derive them. If the UMP test for the simple $H_0: \theta = \theta_0$ against the simple $H_1: \theta = \theta_1$ does not depend on $\theta_1$, then the same test is UMP for the one-sided alternative $\theta > \theta_0$ (or $\theta < \theta_0$). In the example discussed above the tests defined by
$$C_1 = \{\mathbf{x}: \tau(\mathbf{X}) \ge c_\alpha^*\} \tag{14.43}$$
and
$$C_1 = \{\mathbf{x}: \tau(\mathbf{X}) \le -c_\alpha^*\} \tag{14.44}$$
are also UMP for the hypotheses $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$, and $H_0: \theta = \theta_0$ against $H_1: \theta < \theta_0$, respectively. This is indeed confirmed by the diagram of the power function derived for the ‘marks’ example above. Another result in the simple class of hypotheses is available in the case where sampling is from a one-parameter exponential family of densities (normal, binomial, Poisson, etc.). In such cases UMP tests do exist for one-sided alternatives.
Two-sided alternatives
For testing $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$, no UMP tests exist in general. This is rather unfortunate, since most tests in practice are of this type. One interesting result in this case is that if we restrict the probability model to the one-parameter exponential family and narrow down the class of tests by imposing unbiasedness, then UMP tests do exist. The test defined by the rejection region
$$C_1 = \{\mathbf{x}: |\tau(\mathbf{X})| > c_\alpha\} \tag{14.45}$$
(see the ‘marks’ example) is indeed UMP unbiased, the one-sided tests being biased over the whole of $\Theta$.
14.4 The likelihood ratio test procedure
The discussion so far suggests that no UMP tests exist for a wide variety of cases which are important in practice. However, the likelihood ratio test procedure yields very satisfactory tests for a great number of cases where none of the above methods is applicable. It is particularly valuable in the case where both hypotheses are composite and $\theta$ is a vector of parameters. This procedure not only has a lot of intuitive appeal but also frequently …

Consider $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$. Let the likelihood function be $L(\theta; \mathbf{x})$; then the likelihood ratio is defined by
$$\lambda(\mathbf{x}) = \frac{\max_{\theta \in \Theta_0} L(\theta; \mathbf{x})}{\max_{\theta \in \Theta} L(\theta; \mathbf{x})} = \frac{L(\tilde{\theta}; \mathbf{x})}{L(\hat{\theta}; \mathbf{x})}, \tag{14.46}$$
where $\tilde{\theta}$ and $\hat{\theta}$ denote the maximum likelihood estimators of $\theta$ under $H_0$ and over $\Theta$, respectively.
The numerator measures the highest ‘support’ $\mathbf{x}$ renders to $\theta \in \Theta_0$, and the denominator measures the maximum value of the likelihood function (see Fig 14.4). By definition $\lambda(\mathbf{x})$ can never exceed unity, and the smaller it is the less $H_0$ is ‘supported’ by the data. This suggests that the rejection region based on $\lambda(\mathbf{x})$ must be of the form
$$C_1 = \{\mathbf{x}: \lambda(\mathbf{x}) \le k\}, \quad 0 < k < 1. \tag{14.47}$$
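… distribution of $\lambda(\mathbf{x})$ under both $H_0$ and $H_1$ is known. This is usually the exception rather than the rule. The exceptions arise when $\Phi$ is a normal family of densities and $\mathbf{X}$ is a random sample, in which case $\lambda(\mathbf{x})$ is often a monotone function of some of the pivots we encountered above. Let us illustrate the procedure and the difficulties arising by considering several examples.

Example 1
Let
$$\Phi = \left\{ f(x; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2 \right\}, \ \theta = (\mu, \sigma^2) \in \mathbb{R} \times \mathbb{R}_+ \right\}$$
be the probability model and $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ be a random sample from $f(x; \mu, \sigma^2)$. Consider $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$. The likelihood function is
$$L(\theta; \mathbf{x}) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \right\},$$
and maximising it under $H_0$ and over $\Theta$ gives
$$\lambda(\mathbf{x}) = \left[\frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2}\right]^{-n/2}.$$
At first sight it might seem an impossible task to determine the distribution of $\lambda(\mathbf{x})$. Note, however, that
$$\sum_{i=1}^{n}(x_i - \mu_0)^2 = \sum_{i=1}^{n}(x_i - \bar{x}_n)^2 + n(\bar{x}_n - \mu_0)^2,$$
which implies that
$$\lambda(\mathbf{x}) = \left[1 + \frac{n(\bar{x}_n - \mu_0)^2}{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2}\right]^{-n/2} = \left(1 + \frac{W^2}{n - 1}\right)^{-n/2},$$
where
$$W = \sqrt{n}\left(\frac{\bar{X}_n - \mu_0}{s}\right) \sim t(n - 1) \ \text{under } H_0,$$
$$W \sim t(n - 1; \delta) \ \text{under } H_1, \quad \delta = \frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma}, \ \mu_1 \in \Theta_1.$$
Since $\lambda(\mathbf{x})$ is a monotone decreasing function of $|W|$, the rejection region takes the form $C_1 = \{\mathbf{x}: |W| \ge c_\alpha\}$,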
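and $\alpha$, $c_\alpha$ and $\mathcal{P}(\theta)$ can be derived from the distribution of $W$.

Example 2
In the context of the statistical model of Example 1 consider $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \neq \sigma_0^2$, with $\Theta = \mathbb{R} \times \mathbb{R}_+$ and $\Theta_0 = \{(\mu, \sigma_0^2), \mu \in \mathbb{R}\}$. Then
$$\lambda(\mathbf{x}) = \frac{\max_{\theta \in \Theta_0} L(\theta; \mathbf{x})}{\max_{\theta \in \Theta} L(\theta; \mathbf{x})} = \left(\frac{v}{n}\right)^{n/2} \exp\left\{-\frac{1}{2}(v - n)\right\}, \qquad v = \frac{1}{\sigma_0^2}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2.$$
The inequality $\lambda(\mathbf{x}) \le k$ is equivalent to $v \le k_1$ or $v \ge k_2$, where
$$v = \sum_{i=1}^{n}\frac{(X_i - \bar{X}_n)^2}{\sigma_0^2} \sim \chi^2(n - 1) \ \text{under } H_0$$
and
$$\frac{v}{\delta} \sim \chi^2(n - 1) \ \text{under } H_1, \quad \delta = \frac{\sigma_1^2}{\sigma_0^2}, \ \sigma_1^2 \in \Theta_1,$$
with $k_1$ and $k_2$ defined by
$$\int_{k_1}^{k_2} d\chi^2(n - 1) = 1 - \alpha;$$
e.g. if $\alpha = 0.1$ and $n - 1 = 30$, placing $\alpha/2 = 0.05$ in each tail gives $k_1 = 18.5$ and $k_2 = 43.8$.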
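Hence, the rejection region is $C_1 = \{\mathbf{x}: v \le k_1 \text{ or } v \ge k_2\}$. Using the analogy between this and the various tests of $\mu$ we encountered so far, we can postulate that in the case of the one-sided hypotheses (with $k_1$ or $k_2$ now chosen so that the single tail has probability $\alpha$):
(i) $H_0: \sigma^2 \ge \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$, the rejection region is $C_1 = \{\mathbf{x}: v \le k_1\}$;
(ii) $H_0: \sigma^2 \le \sigma_0^2$, $H_1: \sigma^2 > \sigma_0^2$, the rejection region is $C_1 = \{\mathbf{x}: v \ge k_2\}$.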
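The closed forms in Examples 1 and 2 can be checked against a direct maximisation of the normal log-likelihood. This is a sketch under stated assumptions: the eight observations, $\mu_0 = 60$ and $\sigma_0^2 = 9$ are hypothetical values chosen only to exercise the formulas, and the function names are ours.

```python
import math
import statistics


def normal_loglik(sample, mu, sigma_sq):
    """Log-likelihood of an i.i.d. N(mu, sigma_sq) sample."""
    n = len(sample)
    ss = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * sigma_sq) - ss / (2 * sigma_sq)


def lr_mean(sample, mu0):
    """lambda(x) for H0: mu = mu0 with sigma^2 unknown (Example 1), by direct maximisation."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s0 = sum((x - mu0) ** 2 for x in sample) / n   # MLE of sigma^2 under H0
    s1 = sum((x - xbar) ** 2 for x in sample) / n  # unrestricted MLE of sigma^2
    return math.exp(normal_loglik(sample, mu0, s0) - normal_loglik(sample, xbar, s1))


def lr_variance(sample, sigma0_sq):
    """lambda(x) for H0: sigma^2 = sigma0_sq with mu unknown (Example 2)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s1 = sum((x - xbar) ** 2 for x in sample) / n
    return math.exp(normal_loglik(sample, xbar, sigma0_sq) - normal_loglik(sample, xbar, s1))


data = [61.2, 58.4, 63.9, 57.1, 60.8, 65.3, 59.7, 62.4]  # hypothetical marks
n = len(data)
W = math.sqrt(n) * (statistics.fmean(data) - 60.0) / statistics.stdev(data)
v = sum((x - statistics.fmean(data)) ** 2 for x in data) / 9.0

print(lr_mean(data, 60.0), (1 + W ** 2 / (n - 1)) ** (-n / 2))
print(lr_variance(data, 9.0), (v / n) ** (n / 2) * math.exp(-(v - n) / 2))
```

Both printed pairs should agree up to rounding error, and both ratios lie in $(0, 1]$, as the definition (14.46) requires.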
The question arising at this stage is: ‘What use is the likelihood ratio test procedure if the distribution of $\lambda(\mathbf{X})$ is only known when a well-known pivot exists already?’ The answer is that it is reassuring to know that the procedure in these cases leads to certain well-known pivots, because the likelihood ratio test procedure is of considerable importance when no such pivots exist. Under certain conditions we can derive the asymptotic distribution of $\lambda(\mathbf{X})$. We can show that under certain conditions
$$-2\log\lambda(\mathbf{X}) \overset{H_0}{\underset{a}{\sim}} \chi^2(r) \tag{14.48}$$
($\overset{H_0}{\underset{a}{\sim}}$ reads ‘asymptotically distributed under $H_0$’), $r$ being the number of parameters tested. This will be pursued further in Section 16.2.

14.5 Confidence estimation
In point estimation, when an estimator $\hat{\theta}$ of $\theta$ is constructed we usually think of it not just as a point but as a point surrounded by some region of possible error, i.e. $\hat{\theta} \pm e$, where $e$ is related to the standard error of $\hat{\theta}$. This can be viewed as a crude form of a confidence interval for $\theta$,
$$(\hat{\theta} - e \le \theta \le \hat{\theta} + e); \tag{14.49}$$
crude because there is no guarantee that such an interval will include $\theta$. Indeed, we can show that the probability that $\theta$ does not belong to this interval is actually non-zero. In order to formalise this argument we need to attach probabilities to such intervals. In general, interval estimation refers to constructing random intervals of the form
$$(\underline{\tau}(\mathbf{X}) \le \theta \le \overline{\tau}(\mathbf{X})), \tag{14.50}$$
together with an associated probability for such a statement being valid. $\underline{\tau}(\mathbf{X})$ and $\overline{\tau}(\mathbf{X})$ are two statistics referred to as the lower and upper ‘bound’ respectively; they are in effect stochastic bounds on $\theta$. The associated probability will take the form
$$\Pr(\underline{\tau}(\mathbf{X}) \le \theta \le \overline{\tau}(\mathbf{X})) = 1 - \alpha, \tag{14.51}$$
where the probabilistic statement is based on the distribution of $\underline{\tau}(\mathbf{X})$ and $\overline{\tau}(\mathbf{X})$. The main problem is to construct such statistics for which the distribution does not depend on the unknown parameter(s) $\theta$. This, however, is the same problem as in hypothesis testing. In that context we …
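… suggests that in the long run (in repeated experiments) the random interval $(\underline{\tau}(\mathbf{X}), \overline{\tau}(\mathbf{X}))$ will include the ‘true’ but unknown $\theta$. For any particular realisation $\mathbf{x}$, however, we do not know ‘for sure’ whether $(\underline{\tau}(\mathbf{x}), \overline{\tau}(\mathbf{x}))$ includes the ‘true’ $\theta$ or not; we are only $(1 - \alpha)$ confident that it does. The duality between hypothesis testing and confidence intervals can be seen in the ‘marks’ example discussed above. For the null hypothesis $H_0: \theta = \theta_0$, $\theta_0 \in \Theta$, against $H_1: \theta \neq \theta_0$, we constructed a size $\alpha$ test based on the acceptance region
$$C_0(\theta_0) = \left\{\mathbf{x}: \theta_0 - c_\alpha\frac{\sigma}{\sqrt{n}} \le \bar{X}_n \le \theta_0 + c_\alpha\frac{\sigma}{\sqrt{n}}\right\}, \tag{14.53}$$
with $c_\alpha$ defined by
$$\int_{-c_\alpha}^{c_\alpha} \phi(z)\, dz = 1 - \alpha, \quad Z \sim N(0, 1). \tag{14.54}$$
This implies that $\Pr(\mathbf{x} \in C_0; \theta = \theta_0) = 1 - \alpha$, and hence by a simple manipulation of $C_0$ we can define the $(1 - \alpha)$ confidence interval
$$C(\mathbf{X}) = \left\{\theta_0: \bar{X}_n - c_\alpha\frac{\sigma}{\sqrt{n}} \le \theta_0 \le \bar{X}_n + c_\alpha\frac{\sigma}{\sqrt{n}}\right\}, \tag{14.55}$$
with
$$\Pr(\theta_0 \in C(\mathbf{X})) = 1 - \alpha. \tag{14.56}$$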
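The duality and the repeated-sampling interpretation can both be illustrated by simulation. The sketch below assumes the marks setting ($\sigma = 8$, $n = 40$, $\alpha = 0.05$) with an arbitrary true value $\theta = 60$ and arbitrary random seeds; it builds the interval (14.55), checks that it contains $\theta_0$ exactly when $\bar{X}_n$ lies in $C_0(\theta_0)$, and estimates the long-run coverage. The function names are illustrative.

```python
import random
from statistics import NormalDist, fmean

SIGMA, N, ALPHA = 8.0, 40, 0.05
C_ALPHA = NormalDist().inv_cdf(1 - ALPHA / 2)
HALF_WIDTH = C_ALPHA * SIGMA / N ** 0.5


def confidence_interval(sample):
    """(1 - alpha) interval (14.55): Xbar_n -/+ c_alpha * sigma / sqrt(n)."""
    xbar = fmean(sample)
    return xbar - HALF_WIDTH, xbar + HALF_WIDTH


def accepts(sample, theta0):
    """True if x lies in the acceptance region C_0(theta0) of (14.53)."""
    return abs(fmean(sample) - theta0) <= HALF_WIDTH


def coverage(theta_true=60.0, reps=20_000, seed=1):
    """Fraction of simulated samples whose interval contains theta_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(theta_true, SIGMA) for _ in range(N)]
        lo, hi = confidence_interval(sample)
        hits += lo <= theta_true <= hi
    return hits / reps


print(round(coverage(), 3))
```

The estimated coverage should be close to 0.95, in line with (14.56); any single realised interval, however, either contains 60 or does not.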
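In general, any acceptance region for a size $\alpha$ test can be transformed into a $(1 - \alpha)$ confidence interval for $\theta$ by changing $C_0$, a function of $\mathbf{x} \in \mathcal{X}$, to $C$, a function of $\theta_0 \in \Theta$. One-sided tests correspond to one-sided confidence intervals of the form
$$\Pr(\underline{\tau}(\mathbf{X}) \le \theta) \ge 1 - \alpha \tag{14.57}$$
or
$$\Pr(\theta \le \overline{\tau}(\mathbf{X})) \ge 1 - \alpha. \tag{14.58}$$
In general, when $\Theta = \mathbb{R}^m$, $m > 1$, the family of subsets $C(\mathbf{X})$ of $\Theta$, where $C(\mathbf{X})$ depends on $\mathbf{X}$ but not on $\theta$, is called a random region. For example,
$$C(\mathbf{X}) = \{\theta: \underline{\tau}(\mathbf{X}) \le \theta \le \overline{\tau}(\mathbf{X})\} \quad \text{or} \quad C(\mathbf{X}) = \{\theta: \underline{\tau}(\mathbf{X}) \le \theta\}. \tag{14.59}$$
The problem of confidence estimation is one of constructing a random region $C(\mathbf{X})$ such that, for a given $\alpha \in (0, 1)$, …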
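It is interesting to note that $C(\mathbf{X})$ could be interpreted as
$$C(\mathbf{X}) = \{\theta: \underline{\tau}_i(\mathbf{X}) \le \theta_i \le \overline{\tau}_i(\mathbf{X}), \ i = 1, 2, \ldots, m\}, \tag{14.61}$$
in which case, if $(\underline{\tau}_i(\mathbf{X}), \overline{\tau}_i(\mathbf{X}))$ represent independent $(1 - \alpha_i)$ confidence intervals, $i = 1, 2, \ldots, m$, then
$$(1 - \alpha) = \prod_{i=1}^{m}(1 - \alpha_i). \tag{14.62}$$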
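As a quick arithmetic check of (14.62), three independent 95% intervals have joint level $0.95^3 = 0.857375$, and a joint level of 0.95 requires each of three intervals to be at level $0.95^{1/3} \approx 0.983$. A minimal sketch (the function names are illustrative):

```python
import math


def joint_level(levels):
    """(1 - alpha) = prod(1 - alpha_i) for independent intervals, as in (14.62)."""
    return math.prod(levels)


def per_interval_level(joint, m):
    """Common level each of m independent intervals needs to reach a given joint level."""
    return joint ** (1 / m)


print(joint_level([0.95, 0.95, 0.95]))  # about 0.857
print(per_interval_level(0.95, 3))      # about 0.983
```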
The duality between hypothesis testing and confidence estimation does not end at the construction stage. The various properties of tests have corresponding counterparts in confidence estimation.
Definition 7
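A family of $(1 - \alpha)$ level confidence regions $C(\mathbf{X})$ is said to be uniformly most accurate (UMA) among $(1 - \alpha)$ level confidence regions $C^*(\mathbf{X})$ if
$$\Pr(\mathbf{x}: \theta_0 \in C(\mathbf{X}) / \theta) \le \Pr(\mathbf{x}: \theta_0 \in C^*(\mathbf{X}) / \theta) \quad \text{for all } \theta_0 \neq \theta, \ \theta_0, \theta \in \Theta. \tag{14.63}$$
This clearly shows that when power is reinterpreted as accuracy it provides us with the basic optimality criterion in confidence estimation. It turns out (not surprisingly) that UMP tests lead to UMA confidence regions. This is because
$$\theta_0 \in C(\mathbf{X}) \quad \text{if and only if} \quad \mathbf{x} \in C_0(\theta_0), \tag{14.64}$$
where $C_0(\theta_0)$ represents the acceptance region of $H_0: \theta = \theta_0$. In effect the confidence region $C(\mathbf{X})$ can be formed by
$$C(\mathbf{x}) = \{\theta_0: \mathbf{x} \in C_0(\theta_0)\}, \tag{14.65}$$
and the acceptance region $C_0(\theta_0)$ by
$$C_0(\theta_0) = \{\mathbf{x}: \theta_0 \in C(\mathbf{x})\}, \tag{14.66}$$
hence
$$\Pr(\mathbf{x}: \mathbf{x} \in C_0(\theta_0) / \theta = \theta_0) = \Pr(\mathbf{x}: \theta_0 \in C(\mathbf{x}) / \theta = \theta_0) = 1 - \alpha. \tag{14.67}$$
This duality between $C_0(\theta_0)$ and $C(\mathbf{X})$ is illustrated below for the above example, assuming that $n = 1$ to enable us to draw the graph given in Fig 14.5.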
Fig 14.5 The duality between hypothesis testing and interval estimation.

Definition 8
A confidence region $C(\mathbf{X})$ for $\theta$ is said to be unbiased at confidence level $(1 - \alpha)$ if
$$\Pr(\mathbf{x}: \theta_0 \in C(\mathbf{X}) / \theta) \le 1 - \alpha \quad \text{for all } \theta_0 \neq \theta, \ \theta_0, \theta \in \Theta. \tag{14.68}$$
In general, a ‘good’ test will give rise to a good confidence region and vice versa (see Lehmann (1959)).
14.6 Prediction
In the context of a statistical model as defined by the probability and sampling model components, prediction refers to the construction of an ‘optimal’ Borel function $l(\cdot)$ (see Chapter 6) of the form:
$$l(\cdot): \Theta \to \mathcal{X}, \tag{14.69}$$
which purports to provide a ‘good guess’ for the value of a random variable $X_{n+1}$ which does not belong to the postulated sample. If we denote the …
… ‘good’ predictor of $X_{n+1}$. Given that $\hat{\theta}_n = h(\mathbf{X}_n)$, we can define $l(\cdot)$ as a function of the sample directly, i.e. $l(\hat{\theta}_n) = l(h(\mathbf{X}_n))$. Properties of optimal predictors were discussed in Section 12.3, but no methods of constructing such predictors were mentioned. The purpose of this section is to consider this problem briefly.

The problem of constructing optimal predictors refers to the question of ‘how do we choose the function $l(\cdot)$ so that the resulting predictor satisfies certain desirable properties?’ To be able to answer this question we need to specify what the desirable properties are. The single most widely used criterion for a good predictor is minimum mean square error (MSE). This criterion suggests choosing $l(\cdot)$ in such a way as to minimise
$$E(X_{n+1} - l(\mathbf{X}_n))^2, \tag{14.70}$$
where $E(\cdot)$ is defined in terms of the joint distribution of $X_{n+1}$ and $\mathbf{X}_n$, say $D(X_{n+1}, \mathbf{X}_n; \psi)$. It turns out that the solution of this minimisation problem is theoretically extremely simple. This is because (14.70) can be expressed in the form
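$$\begin{aligned}
E(X_{n+1} - l(\mathbf{X}_n))^2 &= E\big([X_{n+1} - E(X_{n+1}/\sigma(\mathbf{X}_n))] + [E(X_{n+1}/\sigma(\mathbf{X}_n)) - l(\mathbf{X}_n)]\big)^2 \\
&= E(X_{n+1} - E(X_{n+1}/\sigma(\mathbf{X}_n)))^2 + E(E(X_{n+1}/\sigma(\mathbf{X}_n)) - l(\mathbf{X}_n))^2 \\
&\quad + 2E\big([X_{n+1} - E(X_{n+1}/\sigma(\mathbf{X}_n))][E(X_{n+1}/\sigma(\mathbf{X}_n)) - l(\mathbf{X}_n)]\big). \qquad (14.71)
\end{aligned}$$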
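Using the properties CES and SCES of Section 7.2 we can show that the last term is equal to zero: the second bracket is a function of $\mathbf{X}_n$ only, so it can be taken outside the conditional expectation given $\sigma(\mathbf{X}_n)$, while the first bracket has conditional mean zero. Since the first term does not depend on $l(\cdot)$, (71) is minimised when
$$l(\mathbf{X}_n) = E(X_{n+1}/\sigma(\mathbf{X}_n)), \qquad (14.72)$$
in which case the second term is also equal to zero. That is, the form of the predictor $\hat{X}_{n+1} = l(\mathbf{X}_n)$ which minimises (70) is
$$\hat{X}_{n+1} = E(X_{n+1}/\sigma(\mathbf{X}_n)), \qquad (14.73)$$
where $\sigma(\mathbf{X}_n)$ is the $\sigma$-field generated by $\mathbf{X}_n$ (see Chapter 4).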
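The decomposition (14.71) can be checked numerically. A minimal Monte Carlo sketch, assuming for illustration a single conditioning variable and a bivariate normal pair with unit variances and correlation $\rho = 0.6$, so that $E(X_{n+1}/X_n = x) = \rho x$. The helpers `simulate_pairs` and `mse_terms` are hypothetical names, and sample averages stand in for expectations. Three predictors are compared: the conditional mean, the marginal mean (zero) and the last observed value.

```python
import random


def simulate_pairs(rho, size, seed=0):
    """Draw (x_n, x_next): x_n ~ N(0, 1), x_next = rho*x_n + e with e ~ N(0, 1 - rho**2),
    so the pair is bivariate normal with unit variances and correlation rho."""
    rng = random.Random(seed)
    sd_e = (1 - rho ** 2) ** 0.5
    pairs = []
    for _ in range(size):
        x_n = rng.gauss(0.0, 1.0)
        pairs.append((x_n, rho * x_n + rng.gauss(0.0, sd_e)))
    return pairs


def mse_terms(pairs, rho, predictor):
    """Sample averages of the MSE (14.70) and of the three terms of (14.71)."""
    total = first = second = cross = 0.0
    for x_n, x_next in pairs:
        cond_mean = rho * x_n           # E(X_{n+1}/X_n = x_n) in this model
        u = x_next - cond_mean          # first bracket of (14.71)
        v = cond_mean - predictor(x_n)  # second bracket of (14.71)
        total += (x_next - predictor(x_n)) ** 2
        first += u * u
        second += v * v
        cross += 2 * u * v
    m = len(pairs)
    return total / m, first / m, second / m, cross / m


rho = 0.6
pairs = simulate_pairs(rho, 100_000)
predictors = {
    "conditional mean": lambda x: rho * x,
    "marginal mean": lambda x: 0.0,
    "last value": lambda x: x,
}
results = {name: mse_terms(pairs, rho, fn) for name, fn in predictors.items()}
for name, (total, first, second, cross) in results.items():
    print(f"{name:17s} MSE={total:.3f} first={first:.3f} second={second:.3f} cross={cross:+.4f}")
```

The population MSEs in this model are $1 - \rho^2 = 0.64$, $1$ and $2 - 2\rho = 0.8$ respectively; the cross term averages close to zero for every predictor, and only the conditional mean also drives the second term to zero.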
As argued in Sections 5.4 and 7.2, the functional form of $E(X_{n+1}/\sigma(\mathbf{X}_n))$ depends entirely on the form of the joint distribution $D(X_{n+1}, \mathbf{X}_n; \psi)$. For example, in the case where $D(X_{n+1}, \mathbf{X}_n; \psi)$ is multivariate normal, the conditional expectation is linear, i.e.
$$E(X_{n+1}/\sigma(\mathbf{X}_n)) = \boldsymbol{\beta}'\mathbf{X}_n \qquad (14.74)$$
(see Chapter 15). In practice, when $D(X_{n+1}, \mathbf{X}_n; \psi)$ is not known, a linear functional form is commonly assumed for
$$E(X_{n+1}/\sigma(\mathbf{X}_n)) = g(\mathbf{X}_n). \qquad (14.75)$$
In such cases the joint distribution is implicitly assumed to be closely approximated by a normal distribution.
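Under normality the coefficients in (14.74) follow from second moments alone, via the partitioned-normal formula $\boldsymbol{\beta} = \Sigma_{22}^{-1}\sigma_{21}$ with $\Sigma_{22} = \mathrm{Cov}(\mathbf{X}_n)$ and $\sigma_{21} = \mathrm{Cov}(\mathbf{X}_n, X_{n+1})$. A minimal sketch, assuming zero means (as in the form $\boldsymbol{\beta}'\mathbf{X}_n$) and, purely for illustration, the covariance structure of a stationary Gaussian AR(1) process with $\rho = 0.6$. The helper names and the observed values `x_obs` are hypothetical.

```python
import numpy as np


def conditional_mean_coefficients(cov):
    """beta such that E(X_{n+1}/X_n = x) = beta' x for a zero-mean multivariate normal
    vector (X_1, ..., X_n, X_{n+1}) with covariance `cov` (last row/column = X_{n+1}).
    Partitioned-normal formula: beta = Sigma_22^{-1} sigma_21."""
    cov = np.asarray(cov, dtype=float)
    sigma_22 = cov[:-1, :-1]  # Cov(X_n)
    sigma_21 = cov[:-1, -1]   # Cov(X_n, X_{n+1})
    return np.linalg.solve(sigma_22, sigma_21)


def ar1_covariance(rho, size):
    """Illustrative covariance of a stationary Gaussian AR(1) process with unit
    innovation variance: Cov(X_i, X_j) = rho**|i - j| / (1 - rho**2)."""
    lags = np.abs(np.subtract.outer(np.arange(size), np.arange(size)))
    return rho ** lags / (1 - rho ** 2)


beta = conditional_mean_coefficients(ar1_covariance(0.6, 6))  # n = 5 past values
x_obs = np.array([0.2, -0.1, 0.5, 1.0, 0.8])  # hypothetical realisation x_n
x_hat = beta @ x_obs                          # predicted value, as in (14.76)
print(np.round(beta, 6), round(float(x_hat), 6))
```

For this covariance the solution puts all the weight on the most recent observation, $\boldsymbol{\beta} \approx (0, 0, 0, 0, 0.6)'$, reflecting the Markov property of the AR(1) process; other covariance structures spread the weight across the past values.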
The predicted value of $X_{n+1}$ will take the form
$$\hat{x}_{n+1} = E(X_{n+1}/\mathbf{X}_n = \mathbf{x}_n), \qquad (14.76)$$
where $\mathbf{x}_n$ refers to the observed realisation of the sample $\mathbf{X}_n$. The intuition underlying (76) is that the best ‘guess’ for the value of $X_{n+1}$ is the average of all its possible values, given the past realisation of $\mathbf{X}$ ($\mathbf{X}_n = \mathbf{x}_n$) (see Fig. 14.6).
It is important to note that in the case where $X_{n+1}$ and $\mathbf{X}_n$ are independent,
$$E(X_{n+1}/\sigma(\mathbf{X}_n)) = E(X_{n+1}). \qquad (14.77)$$
That is, the conditional expectation coincides with the marginal expectation of $X_{n+1}$ (see Chapters 6 and 7). This is the reason why, in the case of the random sample $\mathbf{X}_n$ where $X_i \sim N(\theta, 1)$, $i = 1, 2, \ldots, n$, if $X_{n+1}$ is also assumed to have the same distribution, its best predictor (in the MSE sense) is its mean $\theta$; estimating $\theta$ by the sample mean gives $\hat{X}_{n+1} = (1/n)\sum_{i=1}^{n} X_i$ (see Section 12.3). It goes without saying that for prediction purposes we prefer to have non-random sampling models, because the past history of the stochastic process $\{X_n, n \ge 1\}$ will be of considerable value in such a case.
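For the random-sample case the cost of predicting with $\bar{x}_n$ instead of the unknown $\theta$ can be made explicit: since $X_{n+1}$ is independent of $\bar{X}_n$, $E(X_{n+1} - \bar{X}_n)^2 = \mathrm{Var}(X_{n+1}) + \mathrm{Var}(\bar{X}_n) = 1 + 1/n$, compared with $1$ if $\theta$ were known. A minimal Monte Carlo sketch, assuming the illustrative values $\theta = 2$ and $n = 10$; `predictive_mse` is a hypothetical helper.

```python
import random
from statistics import fmean


def predictive_mse(theta, n, reps, seed=1):
    """Monte Carlo estimate of E(X_{n+1} - x_bar_n)^2 when X_1, ..., X_{n+1} are i.i.d. N(theta, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        # Independent of the sample, so E(X_{n+1}/sigma(X_n)) = E(X_{n+1}) = theta.
        x_next = rng.gauss(theta, 1.0)
        total += (x_next - fmean(sample)) ** 2
    return total / reps


n = 10
mse = predictive_mse(theta=2.0, n=n, reps=50_000)
theory = 1 + 1 / n  # Var(X_{n+1}) + Var(x_bar_n)
print(round(mse, 3), theory)
```

The extra $1/n$ is the price of estimating $\theta$; the sample carries no other information about $X_{n+1}$, which is why dependent (non-random) sampling models are more useful for prediction.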
and the same analysis as in Section 14.5 goes through with minor interpretation changes.
Important concepts
Null and alternative hypotheses, acceptance region, rejection region, test statistic, type I error, type II error, power of a test, the power function, size of a test, uniformly most powerful test, unbiased test, simple hypothesis, composite hypothesis, Neyman–Pearson lemma, likelihood ratio test, pivots, confidence region, confidence level, uniformly most accurate confidence regions, unbiased confidence region, optimal predictor, minimum mean square error.
Questions
1 Explain the relationship between $H_0$ and $H_1$ and the distribution of the sample.
2 Describe the relationship between the acceptance and rejection regions and $\Theta_0$ and $\Theta_1$.
3 Define the concepts of a test statistic, type I and type II errors and probabilities of type I and type II errors.
4 Explain intuitively why we cannot control both probabilities of type I and type II errors. How do we ‘solve’ this problem in hypothesis testing?
5 Define and explain the concepts of the power of a test and the power function of a test.
6 Explain the concept of the size of a test.
7 Define and explain the concept of a UMP test.
8 State the components needed to define a test.
9 Explain why we need to know the distribution of the test statistic under both the null and the alternative hypotheses.
10 Define the concept of a pivot and explain its role in hypothesis testing.
11 Explain the concepts of one-sided and two-sided tests.
12 Explain the circumstances under which UMP tests exist.
13 Explain the Neyman–Pearson theorem and the likelihood ratio test procedure as ways of constructing optimal tests.
14 Explain intuitively the meaning of the statement $\Pr(\underline{\tau}(\mathbf{X}) \le \theta \le \overline{\tau}(\mathbf{X})) = 1 - \alpha$.
15 Define the concept of a $(1-\alpha)$ confidence region for $\theta$.
16 Explain the relationship between $C_0(\theta_0)$, the acceptance region for
17 Define and explain the concept of a $(1-\alpha)$ uniformly most accurate confidence region.
Exercises
1 For the ‘marks’ example of Section 14.2, construct a size 0.05 test for $H_0: \theta = 60$ against $H_1: \theta < 60$. Is it unbiased? Using this, construct a confidence interval for $\theta$ at confidence level 0.95.
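2 Let $X \sim N(\mu, \sigma^2)$ and consider the following hypotheses:
(i) $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$, $\sigma^2 > 0$, $\mu_0$ known;
(ii) $H_0: \sigma^2 \ge \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$, $\mu \in \mathbb{R}$, $\sigma_0^2$ known;
(iii) $H_0: \mu = \mu_0, \sigma^2 = \sigma_0^2$, $H_1: \mu \neq \mu_0, \sigma^2 \neq \sigma_0^2$;
(iv) $H_0: \mu = \mu_0, \sigma^2 \le \sigma_0^2$, $H_1: \mu \neq \mu_0, \sigma^2 < \sigma_0^2$.
State whether the above null and alternative hypotheses are simple or composite and explain your answer.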
3 Let X=(X,, ,X,,)’ bea random sample from N(O, 1) where 0Q =
{0,,0,} Construct a size x test for
Hy: 0=09, against H,: 0=6)
Using this, construct a (1~—2) significance level confidence interval for @
4 LetX =(X,, ,X,}’ bea random sample from a Bernoulli distribution with a density function f(x; V=FU-6)'-*, x=0, 1 Construct a size « test for Hy: 0<, against H,: > 0p (Hint: (97 , X;) is binomially distributed.) 5 Let X=(X,, , X,)' be a random sample from Nịu, ø?) () Show that the test defined by the rejection region X,- 1 ¢ — Cc, a Yn tl, ` — đe]
defnes a UMP unbiased test for Hạ: u < uọ against H,: > Uo
(") Derive a UMP unbiased test for Hg: p>) against H,: p< po
6 Let X=(X,, , X,) and Y=(Y,, , Y,,)’ be two random samples
from N(u,o7) and N(u, 03) respectively Show that for the hypotheses:
Trang 27Additional references 311 (iii) Hy:o?=03, Hị:ø1zZ0), the rejection regions are: Cp ={xX yxy) 2k}, Cị= {X, y:T{X, Y) <ky}, Cy = x,y: k3 <x, y) <k4}, respectively, where
define UMP unbiased tests (see Lehmann (1959), pp 169-70) What is the distribution of 1(x, y)?
(iv) Construct a size « test for Hy: 0? =03=07, H,: 07 £04 using
the likelihood ratio test procedure
Additional references