Estimation I — properties of estimators
Estimation in what follows refers to point estimation unless indicated otherwise. Let (S, ℱ, P(·)) be the probability space of reference, with X a r.v. defined on this space. The following statistical model is postulated:
(i) Φ = {f(x; θ), θ ∈ Θ}, Θ ⊆ ℝ;
(ii) X ≡ (X₁, X₂, ..., Xₙ)' is a random sample from f(x; θ).
Estimation in the context of this statistical model takes the form of constructing a mapping h(·): 𝒳 → Θ, where 𝒳 is the observation space and h(·) is a Borel function. The composite function (a statistic) θ̂ = h(X): S → Θ is called an estimator and its value h(x), x ∈ 𝒳, an estimate. It is important to distinguish between the two because the former is a random variable (r.v.) and the latter is a real number.
Example 1
(a) Probability model: Φ = {f(x; θ) = (1/√(2π)) exp[−½(x − θ)²], θ ∈ ℝ}, i.e. X ~ N(θ, 1);
(b) Sampling model: X ≡ (X₁, X₂, ..., Xₙ)' is a random sample from f(x; θ).
For this model consider the following estimators of θ:
(i) θ̂₁ = (1/n) Σᵢ₌₁ⁿ Xᵢ;
(ii) θ̂₂ = (1/k) Σᵢ₌₁ᵏ Xᵢ, 1 ≤ k ≤ n − 1;
(iii) θ̂₃ = X₁;
(iv) θ̂₄ = (1/n)(X₁ + Xₙ);
(v) θ̂₅ = (1/n) Σᵢ₌₁ⁿ Xᵢ²;
(vi) θ̂₆ = (1/n) Σᵢ₌₁ⁿ iXᵢ;
(vii) θ̂₇ = [1/(n + 1)] Σᵢ₌₁ⁿ Xᵢ.
It is obvious that we can construct infinitely many such estimators. However, constructing 'good' estimators is not so obvious. From the above examples it is clear that we need some criteria to choose between these estimators; in other words, we need to formalise what we mean by a 'good' estimator. Moreover, it will be of considerable help if we could devise general methods of constructing such good estimators, a question considered in the next chapter.
12.1 Finite sample properties
In order to be able to set up criteria for choosing between estimators we need to understand the role of an estimator first. An estimator is constructed with the sole aim of providing us with the 'most representative value' of θ in the parameter space Θ, based on the available information in the form of the statistical model. Given that the estimator θ̂ = h(X) is a r.v. (being a Borel function of a random vector X), any formalisation of what we mean by a 'most representative value' must be in terms of the distribution of θ̂, say f(θ̂). This is because any statement about 'how near θ̂ is to the true θ' can only be a probabilistic one.
A natural first property to require of θ̂ is unbiasedness: θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ, where the expectation is taken with respect to the distribution of θ̂,
E(θ̂) = ∫ θ̂ f(θ̂) dθ̂.  (12.1)
Note that an alternative, but equivalent, way to define E(θ̂) is
E(θ̂) = ∫₋∞^∞ ⋯ ∫₋∞^∞ h(x) f(x; θ) dx,  (12.2)
where f(x; θ) ≡ f(x₁, x₂, ..., xₙ; θ) is the distribution of the sample X. Sometimes we can derive E(θ̂) without having to derive either of the above distributions, by just using the properties of E(·) (see Chapter 4). For example, in the case of the estimators suggested in example 1 above, using independence and the properties of the normal distribution we can deduce
that θ̂₁ ~ N(θ, 1/n); this is because θ̂₁ is a linear function of normally distributed r.v.'s (see Chapter 6.3), and
E(θ̂₁) = E[(1/n) Σᵢ₌₁ⁿ Xᵢ] = (1/n) Σᵢ₌₁ⁿ E(Xᵢ) = (1/n) Σᵢ₌₁ⁿ θ = nθ/n = θ.  (12.3)
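The following is a minimal numerical sketch of this result, assuming Python with numpy; the values θ = 2, n = 20 and the number of replications are illustrative choices, not values from the text.

```python
import numpy as np

# Monte Carlo check of E(theta1_hat) = theta and Var(theta1_hat) = 1/n
# for the sample mean of n i.i.d. N(theta, 1) observations.
rng = np.random.default_rng(0)
theta, n, replications = 2.0, 20, 100_000

samples = rng.normal(loc=theta, scale=1.0, size=(replications, n))
theta1_hat = samples.mean(axis=1)            # theta1_hat = (1/n) * sum(X_i)

print("mean of estimates:", theta1_hat.mean())      # close to theta = 2.0
print("variance of estimates:", theta1_hat.var())   # close to 1/n = 0.05
```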
Similarly, for the remaining estimators we can deduce, for example, that
θ̂₆ ~ N[(n + 1)θ/2, (n + 1)(2n + 1)/(6n)],  θ̂₇ ~ N[nθ/(n + 1), n/(n + 1)²].
Hence, the estimators θ̂₁, θ̂₂, θ̂₃ are indeed unbiased but θ̂₄, θ̂₅, θ̂₆ and θ̂₇ are biased. We define the bias to be B(θ̂) = E(θ̂) − θ, and thus B(θ̂₄) = [(2 − n)/n]θ, B(θ̂₅) = (1 + θ²) − θ, B(θ̂₆) = [(n − 1)/2]θ, B(θ̂₇) = −θ/(n + 1). As can be seen
from the above discussion, it is often possible to derive the mean of an estimator θ̂ without having to derive its distribution. It must be remembered, however, that unbiasedness is a property based on the distribution of θ̂. This distribution is often called the sampling distribution of θ̂ in order to distinguish it from any other distribution of functions of r.v.'s. Although unbiasedness seems at first sight to be a highly desirable property, it turns out to be a rather severe restriction in some cases, and in most situations there are too many unbiased estimators for this property to be used as the sole criterion for judging estimators. The question which
naturally arises is, 'how can we choose among unbiased estimators?' Returning to the above example, we can see that the unbiased estimators θ̂₁, θ̂₂, θ̂₃ have the same mean but they do not have the same variances. Given that the variance is a measure of dispersion, intuition suggests that the estimator with the smallest variance is in a sense better, because its distribution is more 'concentrated' around θ. This argument leads to the second property, that of relative efficiency.
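As an informal check of these means and variances, the sketch below simulates the seven estimators using the definitions listed in example 1 above; the values θ = 1.5, n = 10, k = 4 are illustrative assumptions.

```python
import numpy as np

# Simulation of the seven estimators of example 1, checking which are unbiased
# and comparing their dispersions.
rng = np.random.default_rng(1)
theta, n, k, reps = 1.5, 10, 4, 200_000
X = rng.normal(theta, 1.0, size=(reps, n))
i = np.arange(1, n + 1)

estimators = {
    "theta1 (sample mean)":      X.mean(axis=1),
    "theta2 (mean of first k)":  X[:, :k].mean(axis=1),
    "theta3 (X1 alone)":         X[:, 0],
    "theta4 ((X1+Xn)/n)":        (X[:, 0] + X[:, -1]) / n,
    "theta5 (mean of X_i^2)":    (X ** 2).mean(axis=1),
    "theta6 ((1/n) sum i*X_i)":  (i * X).sum(axis=1) / n,
    "theta7 (sum/(n+1))":        X.sum(axis=1) / (n + 1),
}
for name, est in estimators.items():
    print(f"{name:28s} mean={est.mean():7.3f}  var={est.var():7.3f}")
# Unbiased: theta1, theta2, theta3 (means close to theta = 1.5);
# biased:   theta4, theta5, theta6, theta7.
```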
Definition 2
An unbiased estimator θ̂₁ of θ is said to be relatively more efficient than some other unbiased estimator θ̂₂ if Var(θ̂₁) < Var(θ̂₂), or, in terms of the relative efficiency ratio, eff(θ̂₁, θ̂₂) = Var(θ̂₂)/Var(θ̂₁) > 1.
In the above definition θ̂₁ is relatively more efficient than either θ̂₂ or θ̂₃, since
Var(θ̂₁) = 1/n ≤ 1/k = Var(θ̂₂), k = 1, 2, ..., n − 1,
and
Var(θ̂₂) = 1/k < 1 = Var(θ̂₃), for k > 1,
i.e. θ̂₂ is relatively more efficient than θ̂₃ (see Fig. 12.2).
In the case of biased estimators, relative efficiency can be defined in terms of the mean square error (MSE), which takes the form
MSE(θ̂) = E(θ̂ − θ)²;
Fig. 12.2 The sampling distributions of θ̂₁, θ̂₂ and θ̂₃.

That is, an estimator θ̂* is relatively more efficient than θ̂ if
E(θ̂* − θ)² < E(θ̂ − θ)²,
or
MSE(θ̂*) < MSE(θ̂).
As can be seen, this definition includes the definition in the case of unbiased estimators as a special case. Moreover, the definition in terms of the MSE enables us to compare an unbiased with a biased estimator in terms of efficiency. For example,
MSE(θ̂₇) < MSE(θ̂₃),
and intuition suggests that θ̂₇ is a 'better' estimator than θ̂₃ despite the fact that θ̂₃ is unbiased and θ̂₇ is not; Fig. 12.3 illustrates the case. In circumstances like this it seems a bit unreasonable to insist on unbiasedness. Caution should be exercised, however, when different distributions are involved, as in the case of θ̂₅.
Let us consider the concept of MSE in some detail. As defined above, the MSE of θ̂ depends not only on θ̂ but on the value of θ in Θ chosen as well. That is, for some θ₀ ∈ Θ,
MSE(θ̂, θ₀) = E(θ̂ − θ₀)² = E[(θ̂ − E(θ̂)) + (E(θ̂) − θ₀)]²
= Var(θ̂) + [B(θ̂, θ₀)]².  (12.7)
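A small numerical sketch of this decomposition is given below, assuming Python with numpy and using θ̂₇ of example 1; θ₀ = 1 and n = 10 are illustrative choices.

```python
import numpy as np

# Check that MSE(theta_hat, theta0) = Var(theta_hat) + Bias^2
# for theta7 = sum(X_i)/(n+1) under N(theta0, 1) sampling.
rng = np.random.default_rng(2)
theta0, n, reps = 1.0, 10, 500_000
X = rng.normal(theta0, 1.0, size=(reps, n))
theta7 = X.sum(axis=1) / (n + 1)

mse   = np.mean((theta7 - theta0) ** 2)
var   = theta7.var()
bias2 = (theta7.mean() - theta0) ** 2
print(mse, var + bias2)   # the two numbers agree (up to floating-point error)
```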
Fig. 12.3 Comparing the sampling distributions of θ̂₇ and θ̂₃.
Ideally, we would like an estimator θ̂ whose MSE is no greater than that of any other estimator for every value of θ (uniformly in Θ); that is,
MSE(θ̂, θ) ≤ MSE(θ̃, θ) for all θ ∈ Θ,
where θ̃ denotes any other estimator of θ. For any two estimators θ̂ and θ̃ of θ, if MSE(θ̂, θ) ≤ MSE(θ̃, θ) for all θ ∈ Θ, with strict inequality holding for some θ ∈ Θ, then θ̃ is said to be inadmissible. For example, θ̂₃ above is inadmissible because
MSE(θ̂₁, θ) < MSE(θ̂₃, θ), for all θ ∈ Θ, if n > 1.
In view of this we can see that θ̂₂ and θ̂₃ are inadmissible, because in MSE terms they are dominated by θ̂₁. The question which naturally arises is: 'Can we find an estimator which dominates every other in MSE terms?' A moment's reflection suggests that this is impossible, because the MSE criterion depends on the value of θ chosen. In order to see this let us choose a particular value of θ in Θ, say θ₀, and define the estimator
θ̂* = θ₀ for all x ∈ 𝒳.
Then MSE(θ̂*, θ₀) = 0, and any uniformly best estimator would have to satisfy MSE(θ̂, θ) = 0 for all θ ∈ Θ, since θ₀ was arbitrarily chosen; that is, it would have to estimate θ perfectly whatever its 'true' value! Who needs criteria in such a case? Hence, the fact that there are no uniformly best estimators in MSE terms is due to the nature of the problem itself.
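The point can be illustrated numerically; the sketch below (an assumption-laden illustration, with θ₀ = 0 and n = 20 as arbitrary choices) shows that the constant 'estimator' θ̂* = θ₀ has zero MSE at θ₀ but is badly beaten by the sample mean elsewhere.

```python
import numpy as np

# The 'estimator' theta_star = theta0 ignores the data: its MSE is zero when the
# true value equals theta0 but grows quadratically elsewhere, which is why no
# estimator can be uniformly best in MSE terms.
theta0 = 0.0
for true_theta in [0.0, 0.5, 1.0, 2.0]:
    mse_theta_star = (theta0 - true_theta) ** 2   # no randomness: MSE = bias^2
    mse_sample_mean = 1.0 / 20                    # variance of the mean, n = 20
    print(f"theta={true_theta:4.1f}  MSE(theta*)={mse_theta_star:5.2f}  "
          f"MSE(sample mean)={mse_sample_mean:5.3f}")
```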
Relative efficiency enables us only to compare the estimators we happen to consider. This, however, is not very satisfactory, since there might be much better estimators in terms of MSE about which we know nothing. In order to avoid merely choosing the better of two inefficient estimators we need some absolute measure of efficiency. Such a measure is provided by the Cramér–Rao lower bound, which takes the form
CR(θ) = [1 + dB(θ)/dθ]² / E{[∂ log f(x; θ)/∂θ]²};  (12.8)
here f(x; θ) is the distribution of the sample and B(θ) the bias. It can be shown that for any estimator θ̂* of θ
MSE(θ̂*, θ) ≥ CR(θ)
under the following regularity conditions on Φ:
(CR1) the set A = {x: f(x; θ) > 0} does not depend on θ;
(CR2) for each θ ∈ Θ the derivatives ∂ⁱ log f(x; θ)/∂θⁱ, i = 1, 2, 3, exist for all x ∈ 𝒳;
(CR3) 0 < E{[∂ log f(x; θ)/∂θ]²} < ∞ for all θ ∈ Θ.
In the case of unbiased estimators the inequality takes the form
Var(θ̂*) ≥ {E[(∂ log f(x; θ)/∂θ)²]}⁻¹;
the inverse of this lower bound is called Fisher's information and is denoted by Iₙ(θ).

Definition 3

An unbiased estimator θ̂ of θ is said to be (fully) efficient if
Var(θ̂) = {E[(∂ log f(x; θ)/∂θ)²]}⁻¹.
For the model of example 1,
log f(x; θ) = −(n/2) log(2π) − ½ Σᵢ₌₁ⁿ (xᵢ − θ)²,
∂ log f(x; θ)/∂θ = Σᵢ₌₁ⁿ (xᵢ − θ),
and
E{[∂ log f(x; θ)/∂θ]²} = E{[Σᵢ₌₁ⁿ (Xᵢ − θ)]²} = Σᵢ₌₁ⁿ E(Xᵢ − θ)² = n,
by independence. An alternative way to derive the Cramér–Rao lower bound is to use the equality
E{[∂ log f(x; θ)/∂θ]²} = −E[∂² log f(x; θ)/∂θ²],  (12.9)
which holds true under CR1–CR3, where f(x; θ) is the 'true' density function. In the above example ∂² log f(x; θ)/∂θ² = −n and hence the equality holds.
So, for this example, CR(θ) = 1/n and, as can be seen from above, the only estimator which achieves the bound is θ̂₁, that is, Var(θ̂₁) = CR(θ); hence θ̂₁ is a fully efficient estimator. The properties of unbiasedness, relative efficiency and full efficiency enabled us to reduce the number of the originally suggested estimators considerably. Moreover, by narrowing the class of estimators considered to the class of unbiased estimators, we succeeded in 'solving' the problem of 'no uniformly best estimator' discussed above in relation to the MSE criterion. This, however, is not very surprising, given that by assuming unbiasedness we exclude the bias term which is largely responsible for the problem.
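The sketch below gives a rough numerical check of the Fisher information and of the fact that θ̂₁ attains the bound; it assumes Python with numpy, and θ = 0.7, n = 25 are illustrative values.

```python
import numpy as np

# For N(theta, 1) the score of the sample is sum(x_i - theta), so the Fisher
# information is I_n(theta) = E[score^2] = n and the Cramer-Rao bound is 1/n,
# which the sample mean attains.
rng = np.random.default_rng(3)
theta, n, reps = 0.7, 25, 200_000
X = rng.normal(theta, 1.0, size=(reps, n))

score = (X - theta).sum(axis=1)     # d log f(x; theta)/d theta
print("E[score^2] (should be n = 25):", np.mean(score ** 2))
print("Var(sample mean) (should be 1/n = 0.04):", X.mean(axis=1).var())
```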
Sometimes the class of estimators considered is narrowed even further by requiring unbiased as well as linear estimators, that is, estimators which are linear functions of the r.v.'s of the sample. For example, in the case of example 1 above, θ̂₁, θ̂₂, θ̂₃, θ̂₄, θ̂₆ and θ̂₇ are linear estimators. Within the class of linear and unbiased estimators we can show that θ̂₁ has minimum variance. In order to show this, let us take a general linear estimator
θ̃ = c + Σᵢ₌₁ⁿ aᵢXᵢ,  (12.10)
which includes the above linear estimators as special cases, and determine the values of c and aᵢ, i = 1, 2, ..., n, which ensure that θ̃ is the best linear unbiased estimator (BLUE) of the mean μ = E(Xᵢ). Firstly, for θ̃ to be unbiased we must have E(θ̃) = μ, which implies that c = 0 and Σᵢ₌₁ⁿ aᵢ = 1. Secondly, since Var(θ̃) = Σᵢ₌₁ⁿ aᵢ²σ² = σ² Σᵢ₌₁ⁿ aᵢ², we must choose the aᵢ's so as to minimise Σᵢ₌₁ⁿ aᵢ² subject to the restriction Σᵢ₌₁ⁿ aᵢ = 1. Setting up the Lagrangian for this problem we have
min over the aᵢ: l(a, λ) = Σᵢ₌₁ⁿ aᵢ² − λ(Σᵢ₌₁ⁿ aᵢ − 1),  (12.11)
∂l/∂aᵢ = 2aᵢ − λ = 0, i = 1, 2, ..., n, i.e. aᵢ = λ/2.
Summing over i,
Σᵢ₌₁ⁿ aᵢ = n(λ/2) = 1, i.e. λ/2 = 1/n, i.e. aᵢ = 1/n, i = 1, 2, ..., n.
For c = 0 and aᵢ = 1/n, i = 1, 2, ..., n, θ̃ = (1/n) Σᵢ₌₁ⁿ Xᵢ, which is identical to θ̂₁. Hence θ̂₁ is BLUE (it has minimum variance within the class of linear and unbiased estimators of μ). This result will be of considerable interest in Chapter 21, in relation to the so-called Gauss–Markov theorem.
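A direct numerical illustration of the Lagrangian argument is sketched below (assuming Python with numpy; n = 8 and the random alternative weights are illustrative): among weight vectors summing to one, the equal weights give the smallest value of Σ aᵢ².

```python
import numpy as np

# Among weight vectors a with sum(a) = 1, the variance sigma^2 * sum(a_i^2) of
# a linear unbiased estimator is smallest for the equal weights a_i = 1/n.
rng = np.random.default_rng(4)
n = 8
equal = np.full(n, 1.0 / n)
print("equal weights, sum a_i^2 =", np.sum(equal ** 2))   # = 1/n = 0.125

# random alternative weight vectors rescaled to sum to one
for _ in range(3):
    a = rng.uniform(0.1, 1.0, size=n)
    a /= a.sum()
    print("random weights, sum a_i^2 =", np.sum(a ** 2))  # always >= 1/n
```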
The properties of unbiasedness and efficiency can be generalised directly to the multiparameter case where θ ≡ (θ₁, θ₂, ..., θₘ)'. θ̂ is said to be an unbiased estimator of θ if
E(θ̂) = θ, i.e. E(θ̂ᵢ) = θᵢ, i = 1, 2, ..., m.
In the case of full efficiency we can show that the Cramér–Rao inequality for an unbiased estimator θ̂ of θ takes the form
Cov(θ̂) ≥ {E[(∂ log f(x; θ)/∂θ)(∂ log f(x; θ)/∂θ)']}⁻¹,  (12.12)
or
Var(θ̂ᵢ) ≥ [Iₙ(θ)⁻¹]ᵢᵢ, i = 1, 2, ..., m
(m being the number of parameters), where [Iₙ(θ)⁻¹]ᵢᵢ represents the ith diagonal element of the inverse of the matrix
Iₙ(θ) = E[(∂ log f(x; θ)/∂θ)(∂ log f(x; θ)/∂θ)'] = −E[∂² log f(x; θ)/∂θ ∂θ'].  (12.13)
Example 2

(i) Φ = {f(x; θ) = [1/(σ√(2π))] exp[−½((x − μ)/σ)²], θ ≡ (μ, σ²) ∈ ℝ × ℝ₊};
(ii) X ≡ (X₁, X₂, ..., Xₙ)' is a random sample from f(x; θ).
In example 1 discussed above we deduced that
μ̂ = (1/n) Σᵢ₌₁ⁿ Xᵢ (≡ X̄ₙ)  (12.14)
is a ‘good’ estimator of y, and intuition suggests that since j/ is in effect the sample moment corresponding to « the sample variance
] H
Pa Y (X10? (12.15)
i=1
should be a ‘good’ estimator of o? In order to check our intuition let us examine whether G7” satisfies any of the above discussed properties t(Š (X;—ñ) }* (0x —H) -tñ~/)P) =e 3 0x ¡~M?+(Ñ—H ,°~3X,—/lli (12.16) Since 2 ơ2
E(X,—n)°=ø3, Eyi— "=~ and E[(X,—g)(ji =~
from independence, we can deduce that
n n 2 2
r| Š =a |= y (02 -2 =n 1)o? (12.17)
i=1 i=1
This, however, implies that E(¢?) = [(n— 1)/n]ø? # ø2, that is, ¢? is a biased
estimator of σ². An obvious way to construct an unbiased estimator of σ² is to divide by (n − 1) instead, s² = [1/(n − 1)] Σᵢ₌₁ⁿ (Xᵢ − μ̂)², whose distribution is a scaled chi-square, [(n − 1)/σ²]s² ~ χ²(n − 1), and thus
Var(s²) = [σ⁴/(n − 1)²]·2(n − 1) = 2σ⁴/(n − 1).  (12.20)
Comparing these variances with the corresponding Cramér–Rao lower bounds shows that although X̄ₙ achieves the Cramér–Rao lower bound, s² does not. It turns out, however, that no other unbiased estimator exists which is relatively more efficient than s², although there are more efficient biased estimators, such as
σ̃² = [1/(n + 1)] Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)².  (12.26)
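A minimal simulation sketch of this trade-off is given below, assuming Python with numpy; μ = 0, σ² = 4 and n = 10 are illustrative values.

```python
import numpy as np

# Monte Carlo comparison of three estimators of sigma^2 for normal data:
# divide the sum of squared deviations by n, n-1 or n+1.
rng = np.random.default_rng(5)
mu, sigma2, n, reps = 0.0, 4.0, 10, 300_000
X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
ss = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

for divisor, label in [(n, "sigma_hat^2 (1/n)"),
                       (n - 1, "s^2 (1/(n-1))"),
                       (n + 1, "sigma_tilde^2 (1/(n+1))")]:
    est = ss / divisor
    bias = est.mean() - sigma2
    mse = np.mean((est - sigma2) ** 2)
    print(f"{label:25s} bias={bias:7.3f}  MSE={mse:7.3f}")
# s^2 is (approximately) unbiased; the 1/(n+1) version is biased but has the
# smallest MSE, illustrating the remark about (12.26).
```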
Efficiency can be seen as a property indicating that the estimator 'utilises' all the information contained in the statistical model. An important concept related to the information of a statistical model is the concept of a sufficient statistic. This concept was introduced by Fisher (1922) as a way to reduce the sampling information by discarding only the information of no relevance to any inference about θ. In other words, a statistic τ(X) is said to be sufficient for θ if it makes no difference whether we use X or τ(X) in inference concerning θ. Obviously in such a case we would prefer to work with τ(X) instead of X, the former being of lower dimensionality.
Definition 4
A statistic τ(·): 𝒳 → ℝᵐ, m < n, is called sufficient for θ if the conditional distribution f(x/τ(x) = t) is independent of θ, i.e. θ does not appear in f(x/τ(x) = t) and the domain of f(·) does not involve θ.

In example 1 above, intuition suggests that τ(X) = Σᵢ₌₁ⁿ Xᵢ must be a sufficient statistic for θ, since in constructing a 'very good' estimator of θ, namely θ̂₁, we only needed to know the sum of the sample and not the sample itself. That is, as far as inference about θ is concerned, knowing all the numbers (X₁, X₂, ..., Xₙ) or just Σᵢ₌₁ⁿ Xᵢ makes no difference. Verifying this directly by deriving f(x/τ(x) = t) and showing that it is independent of θ can be a very difficult exercise. One indirect way of verifying sufficiency is provided by the following lemma.
Fisher-Neyman factorisation lemma
The statistic τ(X) is sufficient for θ if and only if there exists a factorisation of the form
f(x; θ) = f(τ(x); θ) · h(x),  (12.27)
where f(τ(x); θ) is the density function of τ(X) and depends on θ, and h(x) is some function of x which does not depend on θ.
A sufficient statistic τ(X) is said to be minimal if the sample X cannot be reduced beyond τ(X) without losing sufficiency. Lehmann and Scheffé suggested constructing minimal sufficient statistics by choosing an arbitrary value x₀ in 𝒳 and forming the ratio
g(x, x₀; θ) = f(x; θ)/f(x₀; θ), x ∈ 𝒳, θ ∈ Θ;  (12.28)
the values of x₀ which make g(x, x₀; θ) independent of θ define the required minimal sufficient statistics. In example 2 above,
g(x, x₀; θ) = exp{−[1/(2σ²)][(Σᵢ₌₁ⁿ xᵢ² − Σᵢ₌₁ⁿ x₀ᵢ²) − 2μ(Σᵢ₌₁ⁿ xᵢ − Σᵢ₌₁ⁿ x₀ᵢ)]}.  (12.29)
This clearly shows that τ(X) = (Σᵢ₌₁ⁿ Xᵢ, Σᵢ₌₁ⁿ Xᵢ²) is a minimal sufficient statistic, since g(x, x₀; θ) = 1, free of θ, exactly for those values of x₀ whose statistics coincide with those of x. Hence, we can conclude that (X̄ₙ, s²), being simple functions of τ(X), are sufficient statistics. It is important to note that we cannot take Σᵢ₌₁ⁿ Xᵢ or Σᵢ₌₁ⁿ Xᵢ² separately as minimal sufficient statistics; they are jointly sufficient for θ ≡ (μ, σ²).
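The behaviour of the ratio (12.28) can be checked numerically; the sketch below assumes Python with numpy, and the particular data vectors and θ values are arbitrary illustrative choices.

```python
import numpy as np

# g(x, x0; theta) = f(x; theta)/f(x0; theta) for i.i.d. N(mu, sigma2) data
# depends on theta only through sum(x) and sum(x^2): if x0 has the same sum
# and sum of squares as x, the ratio is 1 for every theta.
def log_density(x, mu, sigma2):
    return -0.5 * len(x) * np.log(2 * np.pi * sigma2) \
           - 0.5 * np.sum((x - mu) ** 2) / sigma2

x  = np.array([1.0, -0.5, 2.0, 0.5])
x0 = np.array([2.0, 0.5, 1.0, -0.5])   # a permutation: same sum, same sum of squares
x1 = np.array([1.0, 1.0, 1.0, 0.0])    # different sufficient statistics

for mu, sigma2 in [(0.0, 1.0), (1.0, 2.0), (-3.0, 0.5)]:
    g_same = np.exp(log_density(x, mu, sigma2) - log_density(x0, mu, sigma2))
    g_diff = np.exp(log_density(x, mu, sigma2) - log_density(x1, mu, sigma2))
    print(f"theta=({mu},{sigma2}):  g(x,x0)={g_same:.3f}   g(x,x1)={g_diff:.3f}")
# g(x, x0) equals 1 for every theta; g(x, x1) changes with theta.
```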
In contrast to unbiasedness and efficiency, sufficiency is a property of statistics in general, not just estimators, and it is inextricably bound up with the nature of Φ. For some parametric families of density functions, such as the exponential family of distributions, sufficient statistics exist; for other families they might not. Intuition suggests that, since efficiency is related to full utilisation of the information in the statistical model, and sufficiency can be seen as a maximal reduction of such information without losing any relevant information as far as inference about θ is concerned, there must be a direct relationship between the two properties. A relationship along the lines that when an efficient estimator is needed we should look no further than the sufficient statistics is provided by the following lemma.
Rao and Blackwell lemma
Let τ(X) be a sufficient statistic for θ and t(X) be an estimator of θ; then
E[h(X) − θ]² ≤ E[t(X) − θ]², θ ∈ Θ,  (12.30)
where h(X) = E(t(X)/τ(X) = t), i.e. the conditional expectation of t(X) given τ(X) = t.
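The sketch below illustrates the lemma for the N(θ, 1) model, assuming Python with numpy; it uses the known fact that for i.i.d. observations E(X₁/Σᵢ Xᵢ = s) = s/n, so conditioning the crude estimator t(X) = X₁ on the sufficient statistic yields the sample mean. The values θ = 0.3 and n = 12 are illustrative.

```python
import numpy as np

# Rao-Blackwell in action: start from the crude unbiased estimator t(X) = X1
# and condition on the sufficient statistic sum(X_i); the result is X_bar,
# which is unbiased with a much smaller variance.
rng = np.random.default_rng(6)
theta, n, reps = 0.3, 12, 200_000
X = rng.normal(theta, 1.0, size=(reps, n))

t = X[:, 0]            # crude unbiased estimator: the first observation
h = X.mean(axis=1)     # E(t | sufficient statistic) = sample mean
print("both unbiased:", t.mean(), h.mean())   # both close to theta
print("variances:", t.var(), h.var())         # roughly 1 versus 1/n
```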
All the properties discussed so far are related to the distribution of the estimator θ̂ of θ. As argued repeatedly, deriving the distribution of Borel functions of r.v.'s such as θ̂ = h(X) is a very difficult exercise and very few results are available in the literature. These results are mainly related to simple functions of normally distributed r.v.'s (see Section 6.3). For the cases where no such results are available (which is the rule rather than the exception) we have to resort to asymptotic results. This implies that we need to extend the above list of criteria for 'good' estimators to include asymptotic properties of estimators. These asymptotic properties refer to the behaviour of θ̂ as n → ∞. In order to emphasise the distinction between these asymptotic properties and the properties considered so far we call the latter finite sample (or small sample) properties. The finite sample properties are related directly to the distribution of θ̂ₙ, say f(θ̂ₙ). On the other hand, the asymptotic properties are related to the asymptotic distribution of θ̂ₙ.
12.2 Asymptotic properties
A natural property to require estimators to have is that as n → ∞ (i.e. as the sample size increases) the probability of θ̂ being close to the true value θ should increase as well. We formalise this idea using the concept of convergence in probability, associated with the weak law of large numbers (WLLN) (see Section 9.2).
Definition 5
An estimator θ̂ₙ = h(X) is said to be consistent for θ if, for any ε > 0,
limₙ→∞ Pr(|θ̂ₙ − θ| < ε) = 1,  (12.31)
and we write θ̂ₙ →ᴾ θ.
This is in effect an extension of the WLLN for the sample mean X̄ₙ to some Borel function h(X). It is important to note that consistency does not refer to θ̂ₙ approaching θ in the sense of mathematical convergence. The convergence refers to the probability associated with the event |θ̂ₙ − θ| < ε, as derived from the distribution of θ̂ₙ, as n → ∞. Moreover, consistency is a very minimal property (although a very important one): if θ̂ₙ is a consistent estimator of θ then so is θ̂ₙ* = θ̂ₙ + 7 405 926/n, and if Pr(|θ̂ₙ − θ| > 7 405 926/n) = 1/n, n ≥ 1, then for a small n the difference |θ̂ₙ − θ| might be enormous, but the probability of this occurring decreases to zero as n → ∞.
Fig. 12.4 Consistency in the case of a symmetric uniformly converging distribution.
The figure shows such a symmetric distribution for sample sizes n₁ < n₂ < n₃ < n₄ < n₅. This diagram seems to suggest that if the sampling distribution f(θ̂ₙ) becomes less and less dispersed as n → ∞ and eventually collapses at the point θ (i.e. becomes degenerate), then θ̂ₙ is a consistent estimator of θ. The following lemma formalises this argument.
Lemma 12.1
If θ̂ₙ is an estimator of θ which satisfies the following properties:
(i) limₙ→∞ E(θ̂ₙ) = θ;
(ii) limₙ→∞ Var(θ̂ₙ) = 0,
then θ̂ₙ →ᴾ θ.
It is important, however, to note that these are only sufficient conditions for consistency (not necessary); that is, consistency is not equivalent to the above conditions, since for consistency Var(θ̂ₙ) need not even exist. The above lemma, however, enables us to prove consistency in many cases of interest in practice. If we return to example 1 above we can see that
Pr(|θ̂₁ − θ| < ε) ≥ 1 − 1/(nε²)
by Chebyshev’s inequality and lim,,_, ,,[ 1 —(1/ne?)] = 1 Alternatively, using Lemma 12.1 we can see that both conditions are satisfied Similarly, we can
P P P
show that ổ; — Ø, ổ + Ø ('+>` reads 'đoes not converge in probability to'),
P P P P
6, + 0,65 + 8,06 + 0,8, + 0 Moreover, for ở? and s? ofexample 2 we can P P
show that đ?— øˆ and s?— øŸ,
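The contrast between a consistent and an inconsistent estimator can be seen in the following sketch, assuming Python with numpy; θ = 0, ε = 0.2 and the grid of sample sizes are illustrative choices.

```python
import numpy as np

# Pr(|estimator - theta| < eps) as n grows: the sample mean (theta1) is
# consistent, a single observation (theta3) is not.
rng = np.random.default_rng(7)
theta, eps, reps = 0.0, 0.2, 50_000

for n in [10, 50, 200, 1000]:
    X = rng.normal(theta, 1.0, size=(reps, n))
    p1 = np.mean(np.abs(X.mean(axis=1) - theta) < eps)   # tends to 1 as n grows
    p3 = np.mean(np.abs(X[:, 0] - theta) < eps)          # stays around 0.159
    print(f"n={n:5d}  Pr(|theta1-theta|<eps)={p1:.3f}  Pr(|theta3-theta|<eps)={p3:.3f}")
```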
A stronger form of consistency, associated with almost sure convergence, is a very desirable asymptotic property for estimators.
Definition 6
An estimator θ̂ₙ is said to be a strongly consistent estimator of θ if
Pr(limₙ→∞ θ̂ₙ = θ) = 1,
and this is denoted by θ̂ₙ → θ almost surely (a.s.).
The strong consistency of θ̂₁ in example 1 is verified directly by the SLLN, and that of s² from the fact that it is a continuous function of the sample moments X̄ₙ and m₂ = (1/n) Σᵢ₌₁ⁿ Xᵢ² (see Chapter 10). Consistency and strong consistency can be seen as extensions of the weak law and strong law of large numbers for Σᵢ₌₁ⁿ Xᵢ to the general statistic θ̂ₙ, respectively. Extending the central limit theorem to θ̂ₙ leads to the property of asymptotic normality.
Definition 7
An estimator θ̂ₙ is said to be asymptotically normal if two sequences {Vₙ(θ), n ≥ 1} and {θₙ, n ≥ 1} exist such that
[Vₙ(θ)]^(−1/2) (θ̂ₙ − θₙ) →ᴰ Z ~ N(0, 1).  (12.32)
The sequence {Vₙ(θ), n ≥ 1} plays the role of the asymptotic variance. In relation to this form of asymptotic normality we consider two further asymptotic properties.
Definition 8
An estimator θ̂ₙ with Var(θ̂ₙ) = O(1/n) is said to be asymptotically unbiased if
√n(θₙ − θ) → 0 as n → ∞.  (12.34)
This is automatically satisfied in the case of an asymptotically normal estimator θ̂ₙ with Vₙ(θ) = Var(θ̂ₙ) and E(θ̂ₙ) = θₙ. Thus, asymptotic normality can be written in the form
√n(θ̂ₙ − θ) ~ N(0, V(θ)).  (12.35)
It must be emphasised that asymptotic unbiasedness is a stronger condition than limₙ→∞ E(θ̂ₙ) = θ, the former specifying the rate of convergence.
In relation to the variance of the asymptotic normal distribution we can define the concept of asymptotic efficiency.

Definition 9

An asymptotically normal estimator θ̂ₙ is said to be asymptotically efficient if V(θ) = I∞(θ)⁻¹, where
I∞(θ) = limₙ→∞ (1/n) Iₙ(θ),  (12.36)
i.e. the asymptotic variance achieves the limit of the Cramér–Rao lower bound (see Rothenberg (1973)).
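A rough numerical illustration of asymptotic normality is sketched below, assuming Python with numpy; it uses the sample variance s² of example 2 (whose exact distribution is chi-square rather than normal), and μ = 0, σ² = 1, n = 400 are illustrative values.

```python
import numpy as np

# The standardised quantity sqrt(n)(s^2 - sigma^2)/sqrt(2*sigma^4) is
# approximately N(0, 1) for large n, even though s^2 itself has a (scaled)
# chi-square distribution in finite samples.
rng = np.random.default_rng(8)
mu, sigma2, n, reps = 0.0, 1.0, 400, 100_000
X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
s2 = X.var(axis=1, ddof=1)

z = np.sqrt(n) * (s2 - sigma2) / np.sqrt(2 * sigma2 ** 2)
print("mean (close to 0):", z.mean())
print("variance (close to 1):", z.var())
print("Pr(z < 1.96) (close to 0.975):", np.mean(z < 1.96))
```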
At this stage it is important to distinguish between three different forms of the information matrix: the sample information matrix Iₙ(θ) (see (12.13)); the single-observation information matrix I(θ), defined with the density f(x; θ) of one observation in (12.13), i.e.
I(θ) = E[(∂ log f(x; θ)/∂θ)(∂ log f(x; θ)/∂θ)'];
and the asymptotic information matrix I∞(θ) in (12.36).
12.3 Predictors and their properties

Consider the simple statistical model:
(a) Probability model: Φ = {f(x; θ) = (1/√(2π)) exp[−½(x − θ)²], θ ∈ ℝ}, i.e. X ~ N(θ, 1);
(b) Sampling model: X ≡ (X₁, X₂, ..., Xₙ)' is a random sample from f(x; θ).
Hence the distribution of the sample is
f(x₁, x₂, ..., xₙ; θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ).  (12.37)
Prediction of the value of X beyond the sample observations, say Xₙ₊₁, refers to the construction of a Borel function l(·) from the parameter space Θ to the observation space 𝒳:
l(·): Θ → 𝒳.  (12.38)
If θ is known we can use the assumption that Xₙ₊₁ ~ N(θ, 1) to make probabilistic statements about Xₙ₊₁. Otherwise we need to estimate θ first and then use the estimate to construct l(·). In the present example we know from Sections 12.1 and 12.2 above that
θ̂ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ  (12.39)
is a ‘good’ estimator of @ Intuition suggests that a ‘good’ predictor of X,,.,
might be to use /(6,)=6,, that is,
X41 =9, (12.40)
The random variable X̂ₙ₊₁ = l(θ̂ₙ) is called the predictor of Xₙ₊₁ and its value the prediction. Note that the main difference between estimation and prediction is that in the latter case what we are 'estimating' (Xₙ₊₁) is a random variable itself, not a constant parameter θ.
In order to consider the optimal properties of a predictor X̂ₙ₊₁ = l(θ̂ₙ) we define the prediction error to be
eₙ₊₁ = Xₙ₊₁ − X̂ₙ₊₁.  (12.41)
Given that both Xₙ₊₁ and X̂ₙ₊₁ are random variables, eₙ₊₁ is also a random variable and has its own distribution. Using the expectation operator with respect to the distribution of eₙ₊₁ we can define the following properties:
(1) Unbiasedness. The predictor X̂ₙ₊₁ of Xₙ₊₁ is said to be unbiased if
E(eₙ₊₁) = 0.  (12.42)
(2) Minimum MSE. The predictor X̂ₙ₊₁ of Xₙ₊₁ is said to be a minimum mean square error predictor if
E(eₙ₊₁²) ≡ E(Xₙ₊₁ − X̂ₙ₊₁)² ≤ E(Xₙ₊₁ − X̃ₙ₊₁)²  (12.43)
for any other predictor X̃ₙ₊₁ of Xₙ₊₁.
Another property of predictors commonly used in practice is linearity.
(3) Linearity. The predictor X̂ₙ₊₁ of Xₙ₊₁ is said to be linear if l(θ̂ₙ) is a linear function of the r.v.'s of the sample.
In the case of the example considered above we can deduce that
eₙ₊₁ ~ N(0, 1 + 1/n),  (12.44)
given that eₙ₊₁ is a linear function of normally distributed r.v.'s, eₙ₊₁ = Xₙ₊₁ − (1/n) Σᵢ₌₁ⁿ Xᵢ. Hence, X̂ₙ₊₁ is both linear and unbiased. Moreover, using the same procedure as in Section 13.1 for linear least-squares estimators, we can show that X̂ₙ₊₁ is also minimum MSE within the class of linear unbiased predictors.
The above properties of predictors are directly related to the same properties for estimators discussed in Section 12.1. This is not surprising, however, given that a predictor can be viewed as an 'estimator' of a random variable which does not belong to the sample.
Important concepts
Estimator, estimate, unbiased estimator, bias, relative efficiency, mean square error, full efficiency, Cramér–Rao lower bound, information matrix, sufficient statistic, finite sample properties, asymptotic properties, consistency, strong consistency, asymptotic normality, asymptotic unbiasedness, asymptotic efficiency, BLUE.
Questions
1 Define the concept of an estimator as a mapping and contrast it with the concept of an estimate
2 Define the finite sample properties of unbiasedness, relative and full efficiency, sufficiency and explain their meaning
3 'Underlying every expectation operator E(·) there is an implicit distribution.' Explain.
4 Explain the Cramér–Rao lower bound and the concept of the information matrix.
5 Explain the Lehmann–Scheffé method of constructing minimal sufficient statistics.
6 Contrast unbiasedness and efficiency with sufficiency
7 Explain the difference between small sample and asymptotic properties
8 Define and compare consistency and strong consistency
10 Explain the concept of asymptotic efficiency in relation to asymptotically normal estimators. What happens when the asymptotic distribution is not normal?
11 Explain intuitively why √n(θₙ − θ) → 0 as n → ∞ is a stronger condition than limₙ→∞ E(θ̂ₙ) = θ.
12 Explain the relationships between Iₙ(θ), I(θ) and I∞(θ).

Exercises
1 Let X = (X₁, X₂, ..., Xₙ)' be a random sample from N(θ, 1) and consider the following estimators of θ:
θ̂₁ = X₁; θ̂₂ = (1/k) Σᵢ₌₁ᵏ Xᵢ, k = 1, 2, ..., n − 1; θ̂₃ = ½X₁ + ½Xₙ; θ̂₄ = (1/n) Σᵢ₌₁ⁿ iXᵢ; θ̂₅ = θ̂₁ + θ̂₃.
(i) Derive the distribution of these estimators
(ii) Using these distributions, consider the question whether these estimators satisfy the properties of unbiasedness, full efficiency and consistency.
(iii) Choose the relatively most efficient estimator.
2 Consider the following estimator defined by
θ̂ₙ = 0 with Pr(θ̂ₙ = 0) = n/(n + 1), and θ̂ₙ = n with Pr(θ̂ₙ = n) = 1/(n + 1),
and show that:
(i) θ̂ₙ as defined above has a proper sampling distribution;
(ii) θ̂ₙ is a biased estimator of zero;
(iii) limₙ→∞ Var(θ̂ₙ) does not exist; and
(iv) θ̂ₙ is a consistent estimator of zero.
3 Let X = (X₁, X₂, ..., Xₙ)' be a random sample from N(0, σ²) and consider
σ̂² = (1/n) Σᵢ₌₁ⁿ Xᵢ²
as an estimator of σ².
(ii) Compare it with the σ̂² of example 2 above and explain intuitively why the differences occur.
(iii) Derive the asymptotic distribution of σ̂².
4 Let X = (X₁, ..., Xₙ)' be a random sample from the exponential distribution with density
f(x; θ) = (1/θ) e^(−x/θ), x > 0.
Construct a minimal sufficient statistic for θ using the Lehmann–Scheffé method.

Additional references