Chapter 12: ESTIMATION I - PROPERTIES OF ESTIMATORS

Estimation in what follows refers to point estimation unless indicated otherwise. Let $(S, \mathcal{F}, P(\cdot))$ be the probability space of reference, with $X$ a r.v. defined on this space. The following statistical model is postulated:

(i) $\Phi = \{ f(x; \theta),\ \theta \in \Theta \}$, $\Theta \subseteq \mathbb{R}$;
(ii) $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ is a random sample from $f(x; \theta)$.

Estimation in the context of this statistical model takes the form of constructing a mapping $h(\cdot): \mathcal{X} \to \Theta$, where $\mathcal{X}$ is the observation space and $h(\cdot)$ is a Borel function. The composite function (a statistic) $\hat{\theta} = h(\mathbf{X}): S \to \Theta$ is called an estimator, and its value $h(\mathbf{x})$, $\mathbf{x} \in \mathcal{X}$, an estimate. It is important to distinguish between the two because the former is a random variable (r.v.) and the latter is a real number.

Example 1
Let $f(x; \theta) = [1/\sqrt{(2\pi)}] \exp\{-\tfrac{1}{2}(x - \theta)^2\}$, $\theta \in \mathbb{R}$, and $\mathbf{X}$ be a random sample from $f(x; \theta)$. Then $\mathcal{X} = \mathbb{R}^n$ and the following functions define estimators of $\theta$:

(i) $\hat{\theta}_1 = \frac{1}{n} \sum_{i=1}^{n} X_i$;
(ii) $\hat{\theta}_2 = \frac{1}{k} \sum_{i=1}^{k} X_i$, $k = 1, 2, \ldots, n-1$;
(iii) $\hat{\theta}_3 = \frac{1}{2}(X_1 + X_n)$;
(iv) $\hat{\theta}_4 = \frac{1}{n}(X_1 + X_n)$;
(v) $\hat{\theta}_5 = \frac{1}{n} \sum_{i=1}^{n} X_i^2$;
(vi) $\hat{\theta}_6 = \frac{1}{n} \sum_{i=1}^{n} i X_i$;
(vii) $\hat{\theta}_7 = \frac{1}{n+1} \sum_{i=1}^{n} X_i$.

It is obvious that we can construct infinitely many such estimators. However, constructing 'good' estimators is not so obvious. From the above examples it is clear that we need some criteria to choose between these estimators; in other words, we need to formalise what we mean by a 'good' estimator. Moreover, it will be of considerable help if we can devise general methods of constructing such good estimators, a question considered in the next chapter.

12.1 Finite sample properties

In order to set up criteria for choosing between estimators we first need to understand the role of an estimator. An estimator is constructed with the sole aim of providing us with the 'most representative value' of $\theta$ in the parameter space $\Theta$, based on the available information in the form of the statistical model. Given that the estimator $\hat{\theta} = h(\mathbf{X})$ is a r.v. (being a Borel function of a random vector $\mathbf{X}$), any formalisation of what we mean by a 'most representative value' must be in terms of the distribution of $\hat{\theta}$, say $f(\hat{\theta})$. This is because any statement about 'how near $\hat{\theta}$ is to the true $\theta$' can only be a probabilistic one. The obvious property to require a 'good' estimator $\hat{\theta}$ of $\theta$ to satisfy is that $f(\hat{\theta})$ is centred around $\theta$.

Definition
An estimator $\hat{\theta}$ of $\theta$ is said to be an unbiased estimator of $\theta$ if

$E(\hat{\theta}) = \int \hat{\theta} f(\hat{\theta})\, d\hat{\theta} = \theta.$   (12.1)

That is, the distribution of $\hat{\theta}$ has mean equal to the unknown parameter to be estimated.

Note that an alternative, but equivalent, way to define $E(\hat{\theta})$ is

$E(\hat{\theta}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(\mathbf{x}) f(\mathbf{x}; \theta)\, d\mathbf{x},$   (12.2)

where $f(\mathbf{x}; \theta) \equiv f(x_1, x_2, \ldots, x_n; \theta)$ is the distribution of the sample $\mathbf{X}$. Sometimes we can derive $E(\hat{\theta})$ without having to derive either of the above distributions, by just using the properties of $E(\cdot)$ (see Chapter 4). For example, in the case of the estimators suggested above, using independence and the properties of the normal distribution we can deduce that $\hat{\theta}_1 \sim N(\theta, 1/n)$; this is because $\hat{\theta}_1$ is a linear function of normally distributed r.v.'s (see Chapter 6.3), and

$E(\hat{\theta}_1) = E\!\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\theta = \theta$   (12.3)

(see Fig. 12.1). The second equality is due to independence and the property $E(c) = c$ for $c$ a constant, and the third equality holds because of the identically distributed assumption. Similarly, for the variance of $\hat{\theta}_1$,

$\operatorname{Var}(\hat{\theta}_1) = E(\hat{\theta}_1 - \theta)^2 = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n}.$
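As a quick check on these two moments, the following fragment simulates the sampling distribution of $\hat{\theta}_1$. It is a minimal sketch, not part of the original text; the values $\theta = 2$, $n = 25$ and the replication count are illustrative assumptions.

```python
import numpy as np

# Monte Carlo sketch of the sampling distribution of theta_hat_1,
# the sample mean of n draws from N(theta, 1).
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 25, 100_000

samples = rng.normal(loc=theta, scale=1.0, size=(reps, n))
theta_hat_1 = samples.mean(axis=1)      # one estimate per replication

print(theta_hat_1.mean())   # ~ theta = 2.0, corroborating E(theta_hat_1) = theta
print(theta_hat_1.var())    # ~ 1/n = 0.04, corroborating Var(theta_hat_1) = 1/n
```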
A = " Using similar arguments we can deduce that ˆ ñ,~N( 0) ˆ 0:~N(6,1), k=tl.,2, ,n—], ˆ 26 2} non ˆ Ø,~ đ, ~ ny*(n; n0) f(ơ,) 0, Fig 12.1 The sampling distribution of 0, 6, = A(x) 234 Properties of estimators ˆ n+1\„ (n+1)2n+1) 0~A( MT) ˆ n n 0.~N( Thuy} Hence, the estimators 6,, 6, 03 are indeed unbiased but 6,, 05,6, and 6, are biased We define bias to be B(#)= E(#)—6 and thus B(O,)=[(2—n)/n]@, B(Ô;)= n?(1 + 0) — 0, B(8,)= [(n— 1)/218, B(Ô;)= — 0/(n+ 1) As can be seen from the above discussion, it is often possible to derive the mean of an estimator without having to derive its distribution It must be remembered, however, that unbiasedness is a property based on the distribution of ổ This distribution is often called sampling distribution of in order to distinguish it from any other distribution of functions of r.v.’s Although unbiasedness seems at first sight to be a highly desirable property it turns out to be a rather severe restriction in some cases and in most situations there are too many unbiased estimators for this property to be used as the sole criterion for judging estimators The question which naturally arises is, ‘how can we choose among unbiased estimators?’ Returning to the above example, we can see that the unbiased estimators 6,,65,6, have the same mean but they not have the same variances Given that the variance is a measure of dispersion, intuition suggests that the estimator with the smallest variance is in a sense better because its distribution is more ‘concentrated’ around Ø This argument leads to the second property, that of relative efficiency Definition An unbiased estimator 0, of is said to be relatively more efficient than some other unbiased estimator @, if - + Var(Ô,)1,(0)7', i=1,2, ,m (m being the number of parameters), where I,(0);,' represents the ith diagonal element of the inverse of the matrix oe (ee) | élog f(x; 0)\/é log f(x; 0)\ _ =| 67 log f(x; 0) | (12.13) called the sample information matrix; the second equality holding under the restrictions CR1—CR3 In order to illustrate these, consider the following example: 240 Properties of estimators Example ! oP = Fam Vv f(x; “3 I {x-—p\? - DO=< (H) X=(X,,X;, , X„} 1s a random sample from ƒ(x; 8) L In example ` L : + „Ø=(u,ø“)clRxIR”?; (1) | discussed above we deduced that =- ] H ` X, Hit (12.14) is a ‘good’ estimator of y, and intuition suggests that since j/ is in effect the sample moment corresponding to « the sample variance ] H Pa Y (X10? (12.15) i=1 should be a ‘good’ estimator of o? In order to check our intuition let us examine whether G7” satisfies any of the above discussed properties t(Š (X;—ñ) }* (0x —H) -tñ~/)P) =e 0x ¡~M?+(Ñ—H ,°~3X,—/lli (12.16) Since E(X,—n)°=ø3, Eyi— "=~ and E[(X,—g)(ji ơ2 =~ from independence, we can deduce that n n r| Š =a |= y (02 i=1 i=1 -2 =n 1)o? (12.17) This, however, implies that E(¢?) = [(n— 1)/n]ø? # ø2, that is, ¢? is a biased estimator of ø?, Moreover, it is clear that the estimator H s”=——_n—1 j= (X,—/? (12.18) is unbiased From Chapter 6.3 we also know that (15 ~2-1) (12.19) 12.1 Finite sample properties 241 and thus Var(s= ot ap 2(n — 1) = 20+ (12.20) n-l since the variance of a chi-square r.v equals twice its degrees of freedom Let us consider the question whether i= X,, and s? are efficient estimators: 6)= I] ƒ(x; i=1 cư củ ø./(2n) (a?) 7/2 = Gye 1/x;—ŠŸ &xP EXP) 2\ ø n 55 (x;—)?}, (12.21) => log ƒ(x; Ø)= =slog 2n—5 log ø? —3 H Ø Clog f(x; 8) Clog f(x; 8) _ 60 lv ou Clog f(x; 0) _ (12.22) ( ) a n " of at | \ W398 M298 2, G7 log f(x; 0) ê?log ƒ(x; 6) Go? 
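The following fragment, an illustrative sketch rather than part of the original text, compares the Monte Carlo bias and variance of the unbiased $\hat{\theta}_1$ and $\hat{\theta}_3$ with the biased $\hat{\theta}_7$; the parameter values are assumptions.

```python
import numpy as np

# Monte Carlo comparison of three estimators of theta in the N(theta, 1) model.
rng = np.random.default_rng(1)
theta, n, reps = 2.0, 25, 100_000
x = rng.normal(theta, 1.0, size=(reps, n))

theta_hat_1 = x.mean(axis=1)              # (1/n) * sum of X_i
theta_hat_3 = 0.5 * (x[:, 0] + x[:, -1])  # (X_1 + X_n) / 2
theta_hat_7 = x.sum(axis=1) / (n + 1)     # biased: E = n * theta / (n + 1)

for name, est in [("theta_hat_1", theta_hat_1),
                  ("theta_hat_3", theta_hat_3),
                  ("theta_hat_7", theta_hat_7)]:
    print(name, est.mean() - theta, est.var())
# Expected: bias ~ 0,      var ~ 1/n = 0.04        for theta_hat_1 (most efficient)
#           bias ~ 0,      var ~ 1/2               for theta_hat_3
#           bias ~ -theta/(n+1) ~ -0.077,
#                          var ~ n/(n+1)^2 ~ 0.037 for theta_hat_7
```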
Relative efficiency, however, only ranks whichever unbiased estimators we happen to be comparing; it does not tell us whether we could do even better. What is needed is a benchmark against which the variance of any unbiased estimator can be judged. Such a benchmark is provided by the Cramér–Rao lower bound: under certain regularity conditions on $\Phi$ (the restrictions CR1–CR3), any unbiased estimator $\hat{\theta}$ of $\theta$ satisfies

$\operatorname{Var}(\hat{\theta}_i) \geq [\mathbf{I}_n(\theta)^{-1}]_{ii}, \quad i = 1, 2, \ldots, m$

($m$ being the number of parameters), where $[\mathbf{I}_n(\theta)^{-1}]_{ii}$ represents the $i$th diagonal element of the inverse of the matrix

$\mathbf{I}_n(\theta) = E\!\left[ \left( \frac{\partial \log f(\mathbf{x}; \theta)}{\partial \theta} \right)\!\left( \frac{\partial \log f(\mathbf{x}; \theta)}{\partial \theta} \right)' \right] = -E\!\left[ \frac{\partial^2 \log f(\mathbf{x}; \theta)}{\partial \theta\, \partial \theta'} \right],$   (12.13)

called the sample information matrix, the second equality holding under the restrictions CR1–CR3. An unbiased estimator whose variance attains this bound is said to be (fully) efficient. In order to illustrate these concepts, consider the following example.

Example 2
Let
(i) $\Phi = \left\{ f(x; \theta) = \dfrac{1}{\sigma\sqrt{(2\pi)}} \exp\!\left\{ -\dfrac{1}{2}\left( \dfrac{x-\mu}{\sigma} \right)^{\!2} \right\},\ \theta \equiv (\mu, \sigma^2) \in \mathbb{R} \times \mathbb{R}_+ \right\}$;
(ii) $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ is a random sample from $f(x; \theta)$.

In Example 1 discussed above we deduced that

$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i$   (12.14)

is a 'good' estimator of $\mu$, and intuition suggests that, since $\hat{\mu}$ is in effect the sample moment corresponding to $\mu$, the sample variance

$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\mu})^2$   (12.15)

should be a 'good' estimator of $\sigma^2$. In order to check our intuition, let us examine whether $\hat{\sigma}^2$ satisfies any of the above discussed properties:

$E\!\left( \sum_{i=1}^{n} (X_i - \hat{\mu})^2 \right) = E\!\left( \sum_{i=1}^{n} [(X_i - \mu) - (\hat{\mu} - \mu)]^2 \right) = \sum_{i=1}^{n} E[(X_i - \mu)^2 + (\hat{\mu} - \mu)^2 - 2(X_i - \mu)(\hat{\mu} - \mu)].$   (12.16)

Since $E(X_i - \mu)^2 = \sigma^2$, $E(\hat{\mu} - \mu)^2 = \sigma^2/n$ and $E[(X_i - \mu)(\hat{\mu} - \mu)] = \sigma^2/n$ from independence, we can deduce that

$E\!\left( \sum_{i=1}^{n} (X_i - \hat{\mu})^2 \right) = \sum_{i=1}^{n} \left( \sigma^2 + \frac{\sigma^2}{n} - \frac{2\sigma^2}{n} \right) = (n-1)\sigma^2.$   (12.17)

This, however, implies that $E(\hat{\sigma}^2) = [(n-1)/n]\sigma^2 \neq \sigma^2$; that is, $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$. Moreover, it is clear that the estimator

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \hat{\mu})^2$   (12.18)

is unbiased. From Chapter 6.3 we also know that

$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$   (12.19)

and thus

$\operatorname{Var}(s^2) = \frac{\sigma^4}{(n-1)^2}\, 2(n-1) = \frac{2\sigma^4}{n-1},$   (12.20)

since the variance of a chi-square r.v. equals twice its degrees of freedom.

Let us consider the question whether $\hat{\mu} \equiv \bar{X}_n$ and $s^2$ are efficient estimators:

$f(\mathbf{x}; \theta) = \prod_{i=1}^{n} f(x_i; \theta) = (2\pi\sigma^2)^{-n/2} \exp\!\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right\}$   (12.21)

$\Rightarrow\ \log f(\mathbf{x}; \theta) = -\frac{n}{2} \log 2\pi - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2,$

$\frac{\partial \log f(\mathbf{x}; \theta)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu), \qquad \frac{\partial \log f(\mathbf{x}; \theta)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2,$   (12.22)

$\frac{\partial^2 \log f(\mathbf{x}; \theta)}{\partial \mu^2} = -\frac{n}{\sigma^2}, \qquad \frac{\partial^2 \log f(\mathbf{x}; \theta)}{\partial \mu\, \partial \sigma^2} = -\frac{1}{\sigma^4} \sum_{i=1}^{n} (x_i - \mu), \qquad \frac{\partial^2 \log f(\mathbf{x}; \theta)}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6} \sum_{i=1}^{n} (x_i - \mu)^2,$   (12.23)

so that, taking expectations,

$\mathbf{I}_n(\theta) = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\[4pt] 0 & \dfrac{n}{2\sigma^4} \end{pmatrix}$   (12.24)

and

$[\mathbf{I}_n(\theta)]^{-1} = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\[4pt] 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}.$   (12.25)

This clearly shows that although $\bar{X}_n$ achieves the Cramér–Rao lower bound, $s^2$ does not, since $\operatorname{Var}(s^2) = 2\sigma^4/(n-1) > 2\sigma^4/n$. It turns out, however, that no other unbiased estimator exists which is relatively more efficient than $s^2$, although there are more efficient biased estimators such as

$\tilde{\sigma}^2 = \frac{1}{n+1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2.$   (12.26)

Efficiency can be seen as a property indicating that the estimator 'utilises' all the information contained in the statistical model. An important concept related to the information of a statistical model is that of a sufficient statistic. This concept was introduced by Fisher (1922) as a way to reduce the sampling information by discarding only the information of no relevance to any inference about $\theta$. In other words, a statistic $\tau(\mathbf{X})$ is said to be sufficient for $\theta$ if it makes no difference whether we use $\mathbf{X}$ or $\tau(\mathbf{X})$ in inference concerning $\theta$. Obviously in such a case we would prefer to work with $\tau(\mathbf{X})$ instead of $\mathbf{X}$, the former being of lower dimensionality.

Definition
A statistic $\tau(\cdot): \mathbb{R}^n \to \mathbb{R}^m$, $n > m$, is called sufficient for $\theta$ if the conditional distribution $f(\mathbf{x} \mid \tau(\mathbf{x}) = t)$ is independent of $\theta$; i.e. $\theta$ does not appear in $f(\mathbf{x} \mid \tau(\mathbf{x}) = t)$ and the domain of $f(\cdot)$ does not involve $\theta$.

In Example 1 above intuition suggests that $\tau(\mathbf{X}) = \sum_{i=1}^{n} X_i$ must be a sufficient statistic for $\theta$, since in constructing a 'very good' estimator of $\theta$, namely $\hat{\theta}_1$, we only needed to know the sum of the sample and not the sample itself. That is, as far as inference about $\theta$ is concerned, knowing all the numbers $(X_1, X_2, \ldots, X_n)$ or just $\sum_{i=1}^{n} X_i$ makes no difference. Verifying this directly, by deriving $f(\mathbf{x} \mid \tau(\mathbf{x}) = t)$ and showing that it is independent of $\theta$, can be a very difficult exercise. One indirect way of verifying sufficiency is provided by the following lemma.

Fisher–Neyman factorisation lemma
The statistic $\tau(\mathbf{X})$ is sufficient for $\theta$ if and only if there exists a factorisation of the form

$f(\mathbf{x}; \theta) = f(\tau(\mathbf{x}); \theta) \cdot h(\mathbf{x}),$   (12.27)

where $f(\tau(\mathbf{x}); \theta)$ is the density function of $\tau(\mathbf{X})$ and depends on $\theta$, and $h(\mathbf{x})$, some function of $\mathbf{x}$, is independent of $\theta$.

Even this result, however, is of no great help, because we have to have the statistic $\tau(\mathbf{X})$ as well as its distribution to begin with. The following method, suggested by Lehmann and Scheffé (1950), provides us with a very convenient way to derive minimal sufficient statistics. A sufficient statistic $\tau(\mathbf{X})$ is said to be minimal if the sample $\mathbf{X}$ cannot be reduced beyond $\tau(\mathbf{X})$ without losing sufficiency. They suggested choosing an arbitrary value $\mathbf{x}_0$ in $\mathcal{X}$ and forming the ratio

$g(\mathbf{x}, \mathbf{x}_0; \theta) = \frac{f(\mathbf{x}; \theta)}{f(\mathbf{x}_0; \theta)}, \quad \theta \in \Theta;$   (12.28)

the values of $\mathbf{x}_0$ which make $g(\mathbf{x}, \mathbf{x}_0; \theta)$ independent of $\theta$ determine the required minimal sufficient statistics. In Example 2 above,

$g(\mathbf{x}, \mathbf{x}_0; \theta) = \exp\!\left\{ -\frac{1}{2\sigma^2} \left[ \left( \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_{0i}^2 \right) - 2\mu \left( \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_{0i} \right) \right] \right\}.$   (12.29)

This clearly shows that $\tau(\mathbf{X}) = \left( \sum_{i=1}^{n} X_i, \sum_{i=1}^{n} X_i^2 \right)$ is a minimal sufficient statistic, since for values of $\mathbf{x}_0$ with $\sum_i x_{0i} = \sum_i x_i$ and $\sum_i x_{0i}^2 = \sum_i x_i^2$ we get $g(\mathbf{x}, \mathbf{x}_0; \theta) = 1$, independent of $\theta$. Hence, we can conclude that $(\bar{X}_n, s^2)$, being simple functions of $\tau(\mathbf{X})$, are sufficient statistics. It is important to note that we cannot take $\sum_{i=1}^{n} X_i$ or $\sum_{i=1}^{n} X_i^2$ separately as minimal sufficient statistics; they are jointly sufficient for $\theta \equiv (\mu, \sigma^2)$.
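Returning to the three estimators of $\sigma^2$ met above (divisors $n$, $n-1$ and $n+1$ in (12.15), (12.18) and (12.26)), the following sketch, not part of the original text and using illustrative parameter values, makes the bias and mean squared error comparison concrete.

```python
import numpy as np

# Monte Carlo sketch of three divisors for estimating sigma^2 from normal data:
# n   -> biased downwards by sigma^2/n   (eq. (12.15))
# n-1 -> unbiased s^2                    (eq. (12.18))
# n+1 -> biased but smallest MSE         (eq. (12.26))
rng = np.random.default_rng(2)
mu, sigma2, n, reps = 0.0, 4.0, 20, 200_000
x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
for div in (n, n - 1, n + 1):
    est = ss / div
    print(f"divisor {div}: bias ~ {est.mean() - sigma2:+.4f}, "
          f"MSE ~ {((est - sigma2) ** 2).mean():.4f}")
# Expected: bias ~ -sigma2/n for divisor n, ~ 0 for n-1,
# and the smallest MSE for divisor n+1.
```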
In contrast to unbiasedness and efficiency, sufficiency is a property of statistics in general, not just of estimators, and it is inextricably bound up with the nature of $\Phi$. For some parametric families of density functions, such as the exponential family of distributions, sufficient statistics exist; for other families they might not.

Intuition suggests that, since efficiency is related to full utilisation of the information in the statistical model, and sufficiency can be seen as a maximal reduction of such information without losing any relevant information as far as inference about $\theta$ is concerned, there must be a direct relationship between the two properties. A relationship along the lines that, when an efficient estimator is needed, we should look no further than the sufficient statistics, is provided by the following lemma.

Rao–Blackwell lemma
Let $\tau(\mathbf{X})$ be a sufficient statistic for $\theta$ and $t(\mathbf{X})$ be an estimator of $\theta$; then

$E(h(\mathbf{X}) - \theta)^2 \leq E(t(\mathbf{X}) - \theta)^2, \quad \theta \in \Theta,$   (12.30)

where $h(\mathbf{X}) = E(t(\mathbf{X}) \mid \tau(\mathbf{X}) = t)$, i.e. the conditional expectation of $t(\mathbf{X})$ given $\tau(\mathbf{X}) = t$.

From the above discussion of the properties of unbiasedness, relative and full efficiency, and sufficiency, we can see that these properties are directly related to the distribution of the estimator $\hat{\theta}$ of $\theta$. As argued repeatedly, deriving the distribution of Borel functions of r.v.'s such as $\hat{\theta} = h(\mathbf{X})$ is a very difficult exercise, and very few results are available in the literature; these are mainly related to simple functions of normally distributed r.v.'s (see Section 6.3). For the cases where no such results are available (which is the rule rather than the exception) we have to resort to asymptotic results. This implies that we need to extend the above list of criteria for 'good' estimators to include asymptotic properties of estimators. These asymptotic properties refer to the behaviour of $\hat{\theta}_n$ as $n \to \infty$. In order to emphasise the distinction between these asymptotic properties and the properties considered so far, we call the latter finite sample (or small sample) properties. The finite sample properties are related directly to the distribution of $\hat{\theta}_n$, say $f(\hat{\theta}_n)$; the asymptotic properties, on the other hand, are related to the asymptotic distribution of $\hat{\theta}_n$.

12.2 Asymptotic properties

A natural property to require estimators to have is that as $n \to \infty$ (i.e. as the sample size increases) the probability of $\hat{\theta}_n$ being close to the true value $\theta$ should increase as well. We formalise this idea using the concept of convergence in probability associated with the weak law of large numbers (WLLN) (see Section 9.2).

Definition
An estimator $\hat{\theta}_n = h(\mathbf{X})$ is said to be consistent for $\theta$ if

$\lim_{n \to \infty} \Pr(|\hat{\theta}_n - \theta| < \varepsilon) = 1 \quad \text{for every } \varepsilon > 0.$

Note that consistency restricts only the limiting behaviour of $\hat{\theta}_n$: for a small $n$ the difference $|\hat{\theta}_n - \theta|$ might be enormous, but the probability of this occurring decreases to zero as $n \to \infty$. Fig. 12.4 illustrates the concept in the case where $\hat{\theta}_n$ has a well-behaved symmetric distribution which concentrates within $(\theta - \varepsilon, \theta + \varepsilon)$ as $n$ increases.

[Fig. 12.4 Consistency in the case of a symmetric, uniformly converging distribution.]
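A minimal simulation sketch of this definition follows; it is not from the original text, and $\theta = 2$, $\varepsilon = 0.1$ and the sample sizes are illustrative assumptions. It estimates $\Pr(|\hat{\theta}_n - \theta| < \varepsilon)$ for the sample mean at increasing $n$.

```python
import numpy as np

# Monte Carlo sketch of consistency: Pr(|theta_hat_n - theta| < eps) -> 1
# as n grows, for theta_hat_n the sample mean of N(theta, 1) data.
rng = np.random.default_rng(3)
theta, eps, reps = 2.0, 0.1, 2_000

for n in (10, 100, 1_000, 10_000):
    means = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - theta) < eps))
# The printed probabilities increase towards 1 with n
# (roughly 0.25, 0.68, 1.00, 1.00 for these sample sizes).
```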
