Chapter 6

NON-LINEAR REGRESSION MODELS

TAKESHI AMEMIYA*

Stanford University

Contents

1. Introduction  334
2. Single equation - i.i.d. case  336
   2.1. Model  336
   2.2. Asymptotic properties  337
   2.3. Computation  341
   2.4. Tests of hypotheses  347
   2.5. Confidence regions  352
3. Single equation - non-i.i.d. case  354
   3.1. Autocorrelated errors  354
   3.2. Heteroscedastic errors  358
4. Multivariate models  359
5. Simultaneous equations models  362
   5.1. Non-linear two-stage least squares estimator  362
   5.2. Other single equation estimators  370
   5.3. Non-linear simultaneous equations  375
   5.4. Non-linear three-stage least squares estimator  376
   5.5. Non-linear full information maximum likelihood estimator  379
References  385

*This work was supported by National Science Foundation Grant SES79-12965 at the Institute for Mathematical Studies in the Social Sciences, Stanford University. The author is indebted to the following people for valuable comments: R. C. Fair, A. R. Gallant, Z. Griliches, M. D. Intriligator, T. E. MaCurdy, J. L. Powell, R. E. Quandt, N. E. Savin, and H. White.

Handbook of Econometrics, Volume I, Edited by Z. Griliches and M. D. Intriligator
© North-Holland Publishing Company, 1983

1. Introduction

This is a survey of non-linear regression models, with an emphasis on the theory of estimation and hypothesis testing rather than computation and applications, although there will be some discussion of the last two topics. For a general discussion of computation the reader is referred to Chapter 12 of this Handbook by Quandt. My aim is to present the gist of major results; therefore, I will sometimes omit proofs and less significant assumptions. For those, the reader must consult the original sources.

The advent of advanced computer technology has made it possible for the econometrician to estimate an increasing number of non-linear regression models in recent years. Non-linearity arises in many diverse ways in econometric applications. Perhaps the simplest and best known case of non-linearity in econometrics is that which arises as the observed variables in a linear regression model are transformed to take account of the first-order autoregression of the error terms. Another well-known case is the distributed-lag model in which the coefficients on the lagged exogenous variables are specified to decrease with lags in a certain non-linear fashion, such as geometrically declining coefficients. In both of these cases, non-linearity appears only in parameters but not in variables.

More general non-linear models are used in the estimation of production functions and demand functions. Even a simple Cobb-Douglas production function cannot be transformed into linearity if the error term is added rather than multiplied [see Bodkin and Klein (1967)]. CES [Arrow, Chenery, Minhas and Solow (1961)] and VES [Revankar (1971)] production functions are more highly non-linear. In the estimation of expenditure functions, a number of highly non-linear functions have been proposed (some of these are used on the supply side as well) - Translog [Christensen, Jorgenson and Lau (1975)], Generalized Leontief [Diewert (1974)], S-Branch [Brown and Heien (1972)], and Quadratic [Howe, Pollak and Wales (1979)], to name a few. Some of these and other papers with applications will be mentioned in various relevant parts of this chapter.
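To see concretely why an additive error prevents the usual log-linearization of the Cobb-Douglas function while a multiplicative error does not, consider this minimal sketch (my addition, not from the chapter; all parameter values and data are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5
K = rng.uniform(1.0, 10.0, T)
L = rng.uniform(1.0, 10.0, T)
b1, b2, b3 = 2.0, 0.3, 0.6          # illustrative Cobb-Douglas parameters
v = rng.normal(0.0, 0.1, T)

# Multiplicative error: log Q is exactly linear in log K, log L, and the error.
Q_mult = b1 * K**b2 * L**b3 * np.exp(v)
lin = np.log(b1) + b2 * np.log(K) + b3 * np.log(L)
print(np.log(Q_mult) - lin - v)      # identically zero: OLS on logs works

# Additive error: log Q is no longer linear in the parameters,
# so the model must be estimated by non-linear least squares.
Q_add = b1 * K**b2 * L**b3 + v
print(np.log(Q_add) - lin - v)       # non-zero: no log transformation linearizes it
```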
The non-linear regression models I will consider in this chapter can be written in their most general form as

$$ f_i(y_t, x_t, \alpha_i) = u_{it}, \qquad i = 1,2,\dots,n; \quad t = 1,2,\dots,T, \tag{1.1} $$

where $y_t$, $x_t$, and $\alpha_i$ are vectors of endogenous variables, exogenous variables, and parameters, respectively, and $u_{it}$ are unobservable error terms with zero mean. Eqs. (1.1), in full generality, constitute the non-linear simultaneous equations model, which is analyzed in Section 5. I devote most of the discussion in the chapter to this section because this area has been only recently developed and therefore there is little account of it in general references.

Many simpler models arising as special cases of (1.1) are considered in other sections. In Section 2 I take up the simplest of these, which I will call the standard non-linear regression model, defined by

$$ y_t = f(x_t, \beta_0) + u_t, \qquad t = 1,2,\dots,T, \tag{1.2} $$

where $\{u_t\}$ are scalar i.i.d. (independent and identically distributed) random variables with zero mean and constant variance. Since this is the model which has been most extensively analyzed in the literature, I will also devote a lot of space to the analysis of this model. Section 3 considers the non-i.i.d. case of the above model, and Section 4 treats its multivariate generalization.

Now, I should mention what will not be discussed. I will not discuss the maximum likelihood estimation of non-linear models unless the model is written in the regression form (1.1). Many non-linear models are discussed elsewhere in this Handbook; see, for example, the chapters by Dhrymes, McFadden, and Maddala. The reader is advised to recognize a close connection between the non-linear least squares estimator analyzed in this chapter and the maximum likelihood estimator studied in the other chapters; essentially the same techniques are used to derive the asymptotic properties of the two estimators and analogous computer algorithms can be used to compute both.

I will not discuss splines and other methods of function approximation, since space is limited and these techniques have not been as frequently used in econometrics as they have in engineering applications. A good introduction to the econometric applications of spline functions can be found in Poirier (1976).

Above I mentioned the linear model with the transformation to reduce the autocorrelation of the error terms and the distributed-lag model. I will not specifically study these models because they are very large topics by themselves and are best dealt with separately. (See the chapter by Hendry, Pagan, and Sargan in this Handbook.) There are a few other important topics which, although non-linearity is involved, would best be studied within another context, e.g. non-linear errors-in-variables models and non-linear time-series models. Regarding these two topics, I recommend Wolter and Fuller (1978) and Priestley (1978).

Finally, I conclude this introduction by citing general references on non-linear regression models. Malinvaud (1970b) devotes one long chapter to non-linear regression models in which he discusses the asymptotic properties of the non-linear least squares estimator in a multivariate model. There are three references which are especially good in the discussion of computation algorithms, confidence regions, and worked out examples: Draper and Smith (1966), Bard (1974), and Judge, Griffiths, Hill and Lee (1980). Several chapters in Goldfeld and Quandt (1972) are devoted to the discussion of non-linear regression models.
Their Chapter 1 presents an excellent review of optimization techniques which can be used in the computation of both the non-linear least squares and the maximum likelihood estimators. Chapter 2 discusses the construction of confidence regions in the non-linear regression model and the asymptotic properties of the maximum likelihood estimator (but not of the non-linear least squares estimator). Chapter 5 considers the Cobb-Douglas production function with both multiplicative and additive errors, and Chapter 8 considers non-linear (only in variables) simultaneous equations models. There are two noteworthy survey articles: Gallant (1975a), with emphasis on testing and computation, and Bunke, Henschke, Strüby and Wisotzki (1977), which is more theoretically oriented. None of the above-mentioned references, however, discusses the estimation of simultaneous equations models non-linear both in variables and parameters.

2. Single equation - i.i.d. case

2.1. Model

In this section I consider the standard non-linear regression model

$$ y_t = f(x_t, \beta_0) + u_t, \qquad t = 1,2,\dots,T, \tag{2.1} $$

where $y_t$ is a scalar endogenous variable, $x_t$ is a vector of exogenous variables, $\beta_0$ is a $K$-vector of unknown parameters, and $\{u_t\}$ are unobservable scalar i.i.d. random variables with $Eu_t = 0$ and $Vu_t = \sigma_0^2$, another unknown parameter. Note that, unlike the linear model where $f(x_t, \beta_0) = x_t'\beta_0$, the dimensions of the vectors $x_t$ and $\beta_0$ are not necessarily the same. We will assume that $f$ is twice continuously differentiable. As for the other assumptions on $f$, I will mention them as they are required for obtaining various results in the course of the subsequent discussion.

Econometric examples of (2.1) include the Cobb-Douglas production function with an additive error,

$$ Q_t = \beta_1 K_t^{\beta_2} L_t^{\beta_3} + u_t, \tag{2.2} $$

and the CES (constant elasticity of substitution) production function:

$$ Q_t = \beta_1\left[\beta_2 K_t^{-\beta_4} + (1 - \beta_2)L_t^{-\beta_4}\right]^{-\beta_3/\beta_4} + u_t. \tag{2.3} $$

Sometimes I will write (2.1) in vector notation as

$$ y = f(\beta_0) + u, \tag{2.4} $$

where $y$, $f(\beta_0)$, and $u$ are $T$-vectors whose $t$th element is equal to $y_t$, $f(x_t, \beta_0)$, and $u_t$, respectively. I will also use the symbol $f_t(\beta_0)$ to denote $f(x_t, \beta_0)$.

The non-linear least squares (NLLS) estimator, denoted $\hat\beta$, is defined as the value of $\beta$ that minimizes the sum of squared residuals

$$ S_T(\beta) = \sum_{t=1}^{T}\left[y_t - f(x_t, \beta)\right]^2. \tag{2.5} $$

It is important to distinguish between the $\beta$ that appears in (2.5), which is the argument of the function $f(x_t, \cdot)$, and $\beta_0$, which is a fixed true value. In what follows, I will discuss the properties of $\hat\beta$, the method of computation, and statistical inference based on $\hat\beta$.
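Before turning to the asymptotic theory, a minimal numerical sketch may help fix ideas. The following is my addition, not from the chapter: it simulates the Cobb-Douglas model (2.2) with invented parameter values and computes $\hat\beta$ by minimizing (2.5) with SciPy's general-purpose least-squares routine.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

# Simulated data for the Cobb-Douglas model (2.2): Q_t = b1 * K_t^b2 * L_t^b3 + u_t
T = 200
K = rng.uniform(1.0, 10.0, T)
L = rng.uniform(1.0, 10.0, T)
beta_true = np.array([2.0, 0.3, 0.6])          # illustrative "true" parameters
Q = beta_true[0] * K**beta_true[1] * L**beta_true[2] + rng.normal(0.0, 0.5, T)

def residuals(beta):
    """Residuals y_t - f(x_t, beta); S_T(beta) in (2.5) is their sum of squares."""
    b1, b2, b3 = beta
    return Q - b1 * K**b2 * L**b3

# NLLS: minimize S_T(beta), starting from a rough guess.
fit = least_squares(residuals, x0=np.array([1.0, 0.5, 0.5]))
print("NLLS estimate:", fit.x)
print("S_T(beta_hat):", np.sum(fit.fun**2))
```

Any general-purpose minimizer would do here; Section 2.3 below discusses iterative methods tailored specifically to (2.5).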
2.2. Asymptotic properties

2.2.1. Consistency

The consistency of the NLLS estimator is rigorously proved in Jennrich (1969) and Malinvaud (1970a). The former proves strong consistency ($\hat\beta$ converging to $\beta_0$ almost surely) and the latter weak consistency ($\hat\beta$ converging to $\beta_0$ in probability). Weak consistency is more common in the econometric literature and is often called by the simpler name of consistency. The main reason why strong consistency, rather than weak consistency, is proved is that the former implies the latter and is often easier to prove. I will mainly follow Jennrich's proof but translate his result into weak consistency.

The consistency of $\hat\beta$ is proved by proving that $\operatorname{plim} T^{-1}S_T(\beta)$ is minimized at the true value $\beta_0$. Strong consistency is proved by showing the same holds for the almost sure limit of $T^{-1}S_T(\beta)$ instead. This method of proof can be used to prove the consistency of any other type of estimator which is obtained by either minimizing or maximizing a random function over the parameter space. For example, I used the same method to prove the strong consistency of the maximum likelihood estimator (MLE) of the Tobit model in Amemiya (1973b).

This method of proof is intuitively appealing because it seems obvious that if $T^{-1}S_T(\beta)$ is close to $\operatorname{plim} T^{-1}S_T(\beta)$ and if the latter is minimized at $\beta_0$, then $\hat\beta$, which minimizes the former, should be close to $\beta_0$. However, we need the following three assumptions in order for the proof to work:

The parameter space $B$ is compact (closed and bounded) and $\beta_0$ is its interior point. (2.6)

$S_T(\beta)$ is continuous in $\beta$. (2.7)

$\operatorname{plim} T^{-1}S_T(\beta)$ exists, is non-stochastic, and its convergence is uniform in $\beta$. (2.8)

The meaning of (2.8) is as follows. Define $S(\beta) = \operatorname{plim} T^{-1}S_T(\beta)$. Then, given $\varepsilon, \delta > 0$, there exists $T_0$, independent of $\beta$, such that for all $T \geq T_0$ and for all $\beta$, $P\left[\left|T^{-1}S_T(\beta) - S(\beta)\right| > \varepsilon\right] < \delta$. It is easy to construct examples in which the violation of any single assumption above leads to the inconsistency of $\hat\beta$. [See Amemiya (1980).]

I will now give a sketch of the proof of the consistency and indicate what additional assumptions are needed as I go along. From (2.1) and (2.5), we get

$$ \frac{1}{T}S_T(\beta) = \frac{1}{T}\sum u_t^2 + \frac{2}{T}\sum u_t\left[f_t(\beta_0) - f_t(\beta)\right] + \frac{1}{T}\sum\left[f_t(\beta_0) - f_t(\beta)\right]^2 \equiv A_1 + A_2 + A_3, \tag{2.9} $$

where $\sum$ means $\sum_{t=1}^{T}$ unless otherwise noted. First, $\operatorname{plim} A_1 = \sigma_0^2$ by a law of large numbers [see, for example, Kolmogorov Theorem 2, p. 115, in Rao (1973)]. Secondly, for fixed $\beta_0$ and $\beta$, $\operatorname{plim} A_2 = 0$ follows from the convergence of $T^{-1}\sum\left[f_t(\beta_0) - f_t(\beta)\right]^2$ by Chebyshev's inequality:

$$ P\left[\,\left|\frac{1}{T}\sum u_t\left[f_t(\beta_0) - f_t(\beta)\right]\right| > \varepsilon\right] \leq \frac{\sigma_0^2}{T\varepsilon^2}\cdot\frac{1}{T}\sum\left[f_t(\beta_0) - f_t(\beta)\right]^2. \tag{2.10} $$

Since the uniform convergence of $A_2$ follows from the uniform convergence of the right-hand side of (2.10), it suffices to assume that

$$ \frac{1}{T}\sum\left[f_t(\beta_1) - f_t(\beta_2)\right]^2 \text{ converges uniformly in } \beta_1, \beta_2 \in B. \tag{2.11} $$

Having thus disposed of $A_1$ and $A_2$, we need only to assume that $\lim A_3$ is uniquely minimized at $\beta_0$; namely,

$$ \lim \frac{1}{T}\sum\left[f_t(\beta) - f_t(\beta_0)\right]^2 \neq 0 \quad\text{if } \beta \neq \beta_0. \tag{2.12} $$

To sum up, the non-linear least squares estimator $\hat\beta$ of the model (2.1) is consistent if (2.6), (2.11), and (2.12) are satisfied. I will comment on the significance and the plausibility of these three assumptions.

The assumption of a compact parameter space (2.6) is convenient but can be rather easily removed. The trick is to dispose of the region outside a certain compact subset of the parameter space by assuming that in that region $T^{-1}\sum\left[f_t(\beta_0) - f_t(\beta)\right]^2$ is sufficiently large. This is done by Malinvaud (1970a). An essentially similar argument appears also in Wald (1949) in the proof of the consistency of the maximum likelihood estimator.

It would be nice if assumption (2.11) could be paraphrased into separate assumptions on the functional form of $f$ and on the properties of the exogenous sequence $\{x_t\}$, which are easily verifiable. Several authors have attempted to obtain such assumptions. Jennrich (1969) observes that if $f$ is bounded and continuous, (2.11) is implied by the assumption that the empirical distribution function of $\{x_t\}$ converges to a distribution function. He also notes that another way to satisfy (2.11) is to assume that $\{x_t\}$ are i.i.d. with a distribution function $F$, and $f$ is bounded uniformly in $\beta$ by a function which is square integrable with respect to $F$. Malinvaud (1970a) generalizes the first idea of Jennrich by introducing the concept of weak convergence of measure, whereas Gallant (1977) generalizes the second idea of Jennrich by considering the notion of Cesàro summability. However, it seems to me that the best procedure is to leave (2.11) as it is and try to verify it directly.

The assumption (2.12) is comparable to the familiar assumption in the linear model that $\lim T^{-1}X'X$ exists and is positive definite. It can be easily proved that in the linear model the above assumption is not necessary for the consistency of least squares and it is sufficient to assume $(X'X)^{-1} \to 0$. This observation suggests that assumption (2.12) can be relaxed in an analogous way. One such result can be found in Wu (1981).
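The behaviour of the three terms in the decomposition (2.9) is easy to check by simulation. In the minimal sketch below (my addition; the regression function and all numbers are invented for illustration), $A_1 \to \sigma_0^2 = 0.25$, $A_2 \to 0$, and $A_3$ approaches a strictly positive limit at a fixed $\beta \neq \beta_0$:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x, b):
    # Simple illustrative regression function, non-linear in the parameter b.
    return np.exp(-b * x)

beta_true, sigma = 1.0, 0.5

def decomposition(beta, T):
    """The three terms A1, A2, A3 of (2.9) for one simulated sample of size T."""
    x = rng.uniform(0.0, 2.0, T)
    u = rng.normal(0.0, sigma, T)
    d = f(x, beta_true) - f(x, beta)
    A1 = np.mean(u**2)            # -> sigma0^2 = 0.25
    A2 = 2.0 * np.mean(u * d)     # -> 0
    A3 = np.mean(d**2)            # -> positive limit when beta != beta_true
    return A1, A2, A3

for T in (100, 10_000, 1_000_000):
    A1, A2, A3 = decomposition(beta=1.5, T=T)
    print(f"T={T:>9}  A1={A1:.4f}  A2={A2:+.5f}  A3={A3:.4f}")
```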
2.2.2. Asymptotic normality

The asymptotic normality of the NLLS estimator $\hat\beta$ is rigorously proved in Jennrich (1969). Again, I will give a sketch of the proof, explaining the required assumptions as I go along, rather than reproducing Jennrich's result in a theorem-proof format.

The asymptotic normality of the NLLS estimator, as in the case of the MLE, can be derived from the following Taylor expansion:

$$ \left.\frac{\partial S_T}{\partial\beta}\right|_{\hat\beta} = \left.\frac{\partial S_T}{\partial\beta}\right|_{\beta_0} + \left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\beta^*}\left(\hat\beta - \beta_0\right), \tag{2.13} $$

where $\partial^2 S_T/\partial\beta\,\partial\beta'$ is a $K \times K$ matrix of second-order derivatives and $\beta^*$ lies between $\hat\beta$ and $\beta_0$. To be able to write down (2.13), we must assume that $f_t$ is twice continuously differentiable with respect to $\beta$. Since the left-hand side of (2.13) is zero (because $\hat\beta$ minimizes $S_T$), from (2.13) we obtain:

$$ \sqrt{T}\left(\hat\beta - \beta_0\right) = -\left[\frac{1}{T}\left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\beta^*}\right]^{-1}\frac{1}{\sqrt{T}}\left.\frac{\partial S_T}{\partial\beta}\right|_{\beta_0}. \tag{2.14} $$

Thus, we are done if we can show that (i) the limit distribution of $T^{-1/2}(\partial S_T/\partial\beta)|_{\beta_0}$ is normal and (ii) $T^{-1}(\partial^2 S_T/\partial\beta\,\partial\beta')|_{\beta^*}$ converges in probability to a non-singular matrix. We will consider these two statements in turn.

The proof of statement (i) is straightforward. Differentiating (2.5) with respect to $\beta$, we obtain:

$$ \frac{\partial S_T}{\partial\beta} = -2\sum\left[y_t - f_t(\beta)\right]\frac{\partial f_t}{\partial\beta}. \tag{2.15} $$

Evaluating (2.15) at $\beta_0$ and dividing it by $\sqrt{T}$, we have:

$$ \frac{1}{\sqrt{T}}\left.\frac{\partial S_T}{\partial\beta}\right|_{\beta_0} = -\frac{2}{\sqrt{T}}\sum u_t\left.\frac{\partial f_t}{\partial\beta}\right|_{\beta_0}. \tag{2.16} $$

But it is easy to find the conditions for the asymptotic normality of (2.16) because the summand in the right-hand side is a weighted average of an i.i.d. sequence - the kind encountered in the least squares estimation of a linear model. Therefore, if we assume that

$$ C \equiv \lim_{T\to\infty}\frac{1}{T}\sum\left.\frac{\partial f_t}{\partial\beta}\frac{\partial f_t}{\partial\beta'}\right|_{\beta_0} \text{ exists and is non-singular,} \tag{2.17} $$

then

$$ \frac{1}{\sqrt{T}}\left.\frac{\partial S_T}{\partial\beta}\right|_{\beta_0} \to N\!\left(0,\, 4\sigma_0^2 C\right). \tag{2.18} $$

This result can be straightforwardly obtained from the Lindeberg-Feller central limit theorem [Rao (1973, p. 128)] or, more directly, from Anderson (1971, Theorem 2.6.1, p. 23).

Proving (ii) poses a more difficult problem. Write an element of the matrix $T^{-1}(\partial^2 S_T/\partial\beta\,\partial\beta')|_{\beta^*}$ as $h_T(\beta^*)$. One might think that $\operatorname{plim} h_T(\beta^*) = \operatorname{plim} h_T(\beta_0)$ follows from the well-known theorem which says that the probability limit of a continuous function is the function of the probability limit, but the theorem does not apply because $h_T$ is in general a function of an increasing number of random variables $y_1, y_2, \dots, y_T$. But, by a slight modification of Lemma 4, p. 1003, of Amemiya (1973b), we can show that if $h_T(\beta)$ converges almost surely to a certain non-stochastic function $h(\beta)$ uniformly in $\beta$, then $\operatorname{plim} h_T(\beta^*) = h(\operatorname{plim}\beta^*) = h(\beta_0)$.

Differentiating (2.15) again with respect to $\beta$ and dividing by $T$ yields

$$ \frac{1}{T}\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'} = \frac{2}{T}\sum\frac{\partial f_t}{\partial\beta}\frac{\partial f_t}{\partial\beta'} - \frac{2}{T}\sum u_t\frac{\partial^2 f_t}{\partial\beta\,\partial\beta'} - \frac{2}{T}\sum\left[f_t(\beta_0) - f_t(\beta)\right]\frac{\partial^2 f_t}{\partial\beta\,\partial\beta'}. \tag{2.19} $$

We must show that each of the three terms in the right-hand side of (2.19) converges almost surely to a non-stochastic function uniformly in $\beta$. For this purpose the following assumptions will suffice:

$$ \frac{1}{T}\sum\frac{\partial f_t}{\partial\beta}\frac{\partial f_t}{\partial\beta'} \text{ converges uniformly in } \beta \text{ in an open neighborhood of } \beta_0, \tag{2.20} $$

and

$$ \frac{1}{T}\sum\left(\frac{\partial^2 f_t}{\partial\beta_i\,\partial\beta_j}\right)^2 \text{ converges uniformly in } \beta \text{ in an open neighborhood of } \beta_0. \tag{2.21} $$

Then, we obtain:

$$ \operatorname{plim}\frac{1}{T}\left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\beta^*} = 2C. \tag{2.22} $$

Finally, from (2.14), (2.18), and (2.22) we obtain:

$$ \sqrt{T}\left(\hat\beta - \beta_0\right) \to N\!\left(0,\, \sigma_0^2 C^{-1}\right). \tag{2.23} $$

The assumptions we needed in proving (2.23) were (2.17), (2.20), and (2.21), as well as the assumption that $\hat\beta$ is consistent. It is worth pointing out that in the process of proving (2.23) we have in effect shown that we have, asymptotically,

$$ \hat\beta \cong \beta_0 + (G'G)^{-1}G'u, \tag{2.24} $$

where I have put $G = (\partial f/\partial\beta')|_{\beta_0}$, a $T \times K$ matrix. Note that (2.24) exactly holds in the linear case. The practical consequence of the approximation (2.24) is that all the results for the linear regression model are asymptotically valid for the non-linear regression model if we treat $G$ as the regressor matrix. In particular, we can use the usual $t$ and $F$ statistics with an approximate precision, as I will explain more fully in Sections 2.4 and 2.5 below. Since the matrix $G$ depends on the unknown parameters, we must in practice evaluate it at $\hat\beta$.
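The practical recipe in the last paragraph can be stated as a short sketch (my addition; the model and data are invented): estimate $\sigma_0^2$ from the residuals, evaluate $G$ at $\hat\beta$, and use $\hat\sigma^2(\hat G'\hat G)^{-1}$ as the approximate covariance matrix of $\hat\beta$, in line with (2.23)-(2.24).

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(3)

def f(x, b):
    return b[0] * np.exp(-b[1] * x)     # illustrative regression function

T = 500
x = rng.uniform(0.0, 3.0, T)
beta_true = np.array([2.0, 1.3])
y = f(x, beta_true) + rng.normal(0.0, 0.2, T)

fit = least_squares(lambda b: y - f(x, b), x0=np.array([1.0, 1.0]))
beta_hat = fit.x

# G = (df/dbeta')|beta_hat, the T x K "regressor matrix" of (2.24).
# fit.jac is the Jacobian of the residuals y_t - f_t(beta), hence the minus sign.
G = -fit.jac
sigma2_hat = np.sum(fit.fun**2) / (T - len(beta_hat))    # estimate of sigma0^2
avar = sigma2_hat * np.linalg.inv(G.T @ G)               # approx. Cov(beta_hat)
print("beta_hat:", beta_hat)
print("approximate standard errors:", np.sqrt(np.diag(avar)))
```

The usual $t$ statistics then follow by dividing each element of beta_hat by its standard error, exactly as in the linear model with $G$ as the regressor matrix.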
2.3. Computation

Since there is in general no explicit formula for the NLLS estimator $\hat\beta$, the minimization of (2.5) must usually be carried out by some iterative method. There are two general types of iteration methods: general optimization methods applied to the non-linear least squares problem in particular, and procedures which are specifically designed to cope with the present problem. In this chapter I will discuss two representative methods - the Newton-Raphson iteration, which belongs to the first type, and the Gauss-Newton iteration, which belongs to the second type - and a few major variants of each method. These cover a majority of the iterative methods currently used in econometric applications. Although not discussed here, I should mention another method sometimes used in econometric applications, namely the so-called conjugate gradient method of Powell (1964), which does not require the calculation of derivatives and is based on a different principle from the Newton methods. Much more detailed discussion of these and other methods can be found in Chapter 12 of this Handbook and in Goldfeld and Quandt (1972, ch. 1).

2.3.1. Newton-Raphson iteration

The Newton-Raphson method is based on the following quadratic approximation of a minimand (it also works for a maximand):

$$ S_T(\beta) \cong S_T(\hat\beta_1) + \left.\frac{\partial S_T}{\partial\beta'}\right|_{\hat\beta_1}\left(\beta - \hat\beta_1\right) + \frac{1}{2}\left(\beta - \hat\beta_1\right)'\left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\hat\beta_1}\left(\beta - \hat\beta_1\right), \tag{2.25} $$

where $\hat\beta_1$ is the initial estimate [obtained by a pure guess or by a method such as the one proposed by Hartley and Booker (1965) described below]. The second-round estimator $\hat\beta_2$ of the iteration is obtained by minimizing the right-hand side of (2.25). Therefore,

$$ \hat\beta_2 = \hat\beta_1 - \left[\left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\hat\beta_1}\right]^{-1}\left.\frac{\partial S_T}{\partial\beta}\right|_{\hat\beta_1}. \tag{2.26} $$

The iteration is to be repeated until the sequence $\{\hat\beta_n\}$ thus obtained converges to the desired degree of accuracy. Inserting (2.26) into (2.25) and writing $n+1$ and $n$ for 2 and 1, we obtain:

$$ S_T(\hat\beta_{n+1}) \cong S_T(\hat\beta_n) - \frac{1}{2}\left.\frac{\partial S_T}{\partial\beta'}\right|_{\hat\beta_n}\left[\left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\hat\beta_n}\right]^{-1}\left.\frac{\partial S_T}{\partial\beta}\right|_{\hat\beta_n}. \tag{2.27} $$

The above equation shows two weaknesses of the Newton-Raphson iteration. (i) Even if (2.27) holds exactly, $S_T(\hat\beta_{n+1}) < S_T(\hat\beta_n)$ is not guaranteed unless $(\partial^2 S_T/\partial\beta\,\partial\beta')|_{\hat\beta_n}$ is a positive definite matrix. (ii) Even if the matrix is positive [...]
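To make the update (2.26) concrete, here is a minimal sketch (my addition, not from the chapter) applying the Newton-Raphson iteration to an invented exponential model $f(x_t, \beta) = \beta_1 e^{-\beta_2 x_t}$, with the gradient and Hessian of $S_T$ computed analytically:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 3.0, 300)
y = 2.0 * np.exp(-1.3 * x) + rng.normal(0.0, 0.2, 300)   # data from f = b1*exp(-b2*x)

def newton_raphson_step(b):
    """One Newton-Raphson update (2.26) for S_T(beta) of (2.5)."""
    b1, b2 = b
    e = np.exp(-b2 * x)
    r = y - b1 * e                                       # residuals
    # Gradient of S_T: -2 * sum r_t * (df_t/dbeta), as in (2.15).
    g = -2.0 * np.array([np.sum(r * e), np.sum(r * (-b1 * x * e))])
    # Hessian of S_T: 2*sum (df)(df)' - 2*sum r_t * (d2f/dbeta dbeta').
    J = np.column_stack([e, -b1 * x * e])
    H = 2.0 * (J.T @ J)
    H[0, 1] -= 2.0 * np.sum(r * (-x * e)); H[1, 0] = H[0, 1]
    H[1, 1] -= 2.0 * np.sum(r * (b1 * x**2 * e))
    return b - np.linalg.solve(H, g)

beta = np.array([1.5, 1.0])
for _ in range(8):
    beta = newton_raphson_step(beta)
print("Newton-Raphson estimate:", beta)                  # close to (2.0, 1.3)
```

Weakness (i) shows up in practice when the residual-weighted second-derivative term makes the Hessian indefinite far from the minimum; practical implementations typically modify the Hessian to keep it positive definite.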
[...] Phillips (1976) proves that the Zellner iteration (4.4) converges to the quasi MLE $\hat\theta$ if $T$ is sufficiently large and if $\hat\theta_1$ (or any other initial estimate) is sufficiently close to $\hat\theta$.¹ Therefore, Phillips' [...]

¹In the linear SUR model, the least squares and the generalized least squares estimators are identically equal for every finite sample if the same conditions are met.

[...] may obtain a good starting value by this method, as Gallant's example shows.

2.3.2. Gauss-Newton iteration

This is the method specifically designed to calculate the NLLS estimator. Expanding $f_t(\beta)$ in a Taylor series around the initial estimate $\hat\beta_1$, we get:

$$ f_t(\beta) \cong f_t(\hat\beta_1) + \left.\frac{\partial f_t}{\partial\beta'}\right|_{\hat\beta_1}\left(\beta - \hat\beta_1\right). \tag{2.35} $$

Substituting the right-hand side of (2.35) for $f_t(\beta)$ in (2.5) yields

$$ S_T(\beta) \cong \sum\left[y_t - f_t(\hat\beta_1) - \left.\frac{\partial f_t}{\partial\beta'}\right|_{\hat\beta_1}\left(\beta - \hat\beta_1\right)\right]^2. \tag{2.36} $$

The second-round estimator $\hat\beta_2$ of the Gauss-Newton iteration is defined as the value of $\beta$ that minimizes (2.36); therefore,

$$ \hat\beta_2 = \hat\beta_1 + \left[\sum\left.\frac{\partial f_t}{\partial\beta}\frac{\partial f_t}{\partial\beta'}\right|_{\hat\beta_1}\right]^{-1}\sum\left.\frac{\partial f_t}{\partial\beta}\right|_{\hat\beta_1}\left[y_t - f_t(\hat\beta_1)\right]. \tag{2.37} $$

[...] that of $\hat\beta$ if we use a consistent estimator (such as the Hartley-Booker estimator) to start this iteration. An advantage of the Gauss-Newton iteration over the Newton-Raphson iteration is that the former requires only the first derivatives of $f_t$.

The Gauss-Newton iteration may be alternatively motivated as follows. Evaluating the approximation (2.35) at $\beta_0$ and inserting it into eq. (2.1) yields

$$ y_t - f_t(\hat\beta_1) + \left.\frac{\partial f_t}{\partial\beta'}\right|_{\hat\beta_1}\hat\beta_1 = \left.\frac{\partial f_t}{\partial\beta'}\right|_{\hat\beta_1}\beta_0 + u_t. \tag{2.38} $$

Then, [...] proof. Rewrite the Gauss-Newton iteration (2.37) as (I have also changed 1 to $n$ and 2 to $n+1$ in the subscripts)

$$ \hat\beta_{n+1} = h(\hat\beta_n), \tag{2.42} $$

where $h$ is a vector-valued function implicitly defined by (2.37). By a Taylor expansion:

$$ h(\hat\beta_n) = h(\hat\beta) + \left.\frac{\partial h}{\partial\beta'}\right|_{\beta_n^*}\left(\hat\beta_n - \hat\beta\right), \tag{2.43} $$

where $\beta_n^*$ lies between $\hat\beta_n$ and $\hat\beta$. If we define $A_n = (\partial h/\partial\beta')|_{\beta_n^*}$ and denote the largest characteristic root of $A_n'A_n$ by $\lambda_n$, we can [...]

[...] separately considered. Assuming $Q$ is full rank, we can find a $K_1 \times K$ matrix $R$ such that $A \equiv (R', Q')'$ is non-singular. If we define $\alpha = A\beta$ and partition $\alpha' = (\alpha'_{(1)}, \alpha'_{(2)})$, the hypothesis $Q\beta = c$ is equivalent to the hypothesis $\alpha_{(2)} = c$. As noted after eq. (2.24), all the results of the linear regression model can be extended to the non-linear model by treating $G = (\partial f/\partial\beta')|_{\beta_0}$ as the regressor matrix [...] than (2.47) except when $K_2 = 1$. All these observations indicate a preference for (2.49) over (2.47). See Gallant (1975b) for a tabulation of the power function of the test based on $S_T(\hat\beta)/S_T(\tilde\beta)$, which is equivalent to the test based on (2.49).

2.4.2. Non-linear hypotheses under non-normality

Now I consider the test of a non-linear hypothesis

$$ h(\beta) = 0, \tag{2.53} $$

where $h$ is a $q$-vector valued non-linear function [...]

[...] second-round estimator $\hat\beta_2$ has the same asymptotic distribution as $\hat\beta$. In this case, a further iteration does not bring any improvement so far as the asymptotic distribution is concerned. This is shown below. By a Taylor expansion of $(\partial S_T/\partial\beta)|_{\hat\beta_1}$ around $\beta_0$, we obtain:

$$ \left.\frac{\partial S_T}{\partial\beta}\right|_{\hat\beta_1} = \left.\frac{\partial S_T}{\partial\beta}\right|_{\beta_0} + \left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\beta^*}\left(\hat\beta_1 - \beta_0\right), \tag{2.30} $$

where $\beta^*$ lies between $\hat\beta_1$ and $\beta_0$. Inserting (2.30) into (2.26) yields

$$ \sqrt{T}\left(\hat\beta_2 - \beta_0\right) = \left(I - \left[\left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\hat\beta_1}\right]^{-1}\left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\beta^*}\right)\sqrt{T}\left(\hat\beta_1 - \beta_0\right) - \left[\frac{1}{T}\left.\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\right|_{\hat\beta_1}\right]^{-1}\frac{1}{\sqrt{T}}\left.\frac{\partial S_T}{\partial\beta}\right|_{\beta_0}. \tag{2.31} $$

[...] of the Zellner iteration (4.4), the iteration can also be defined implicitly by the normal equation

$$ H_T(\hat\theta_{n+1}, \hat\theta_n) = 0. \tag{4.7} $$

But the quasi MLE $\hat\theta$ is a stationary point of the iteration (4.7); that is,

$$ H_T(\hat\theta, \hat\theta) = 0. \tag{4.9} $$

Phillips proves that (4.7) defines a unique function $\hat\theta_{n+1} = A(\hat\theta_n)$ by showing that the mapping $(a, b) \to (z, w)$ defined by $z = H_T(a, b)$ and $w = b$ has a Jacobian which is a P-matrix (every principal minor is positive) [...]

[...] later disclaimed by Goldfeld and Quandt (1972) as a computational error, since they found this estimator to be inconsistent. This was also pointed out by Edgerton (1972). In fact, the consistency of NL2S requires that the rank of $W$ must be at least equal to the number of regression parameters to be estimated, as I will explain below. Goldfeld and Quandt (1968) also tried the Theil interpretation and discarded [...]
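Returning to the Gauss-Newton iteration of Section 2.3.2: the following minimal sketch (my addition; model, data, and starting values invented) implements the update (2.37), which adds $(J'J)^{-1}J'r$ to the current iterate, where $J$ is the Jacobian $\partial f/\partial\beta'$ and $r$ the current residual vector:

```python
import numpy as np

def gauss_newton(y, x, f, jac, beta, steps=10):
    """Gauss-Newton iteration (2.37): beta <- beta + (J'J)^{-1} J'(y - f(beta)),
    with J the T x K Jacobian df/dbeta' evaluated at the current iterate."""
    for _ in range(steps):
        r = y - f(x, beta)          # current residuals
        J = jac(x, beta)            # first derivatives only, per Section 2.3.2
        beta = beta + np.linalg.solve(J.T @ J, J.T @ r)
    return beta

# Illustrative use with the exponential model f = b1 * exp(-b2 * x).
rng = np.random.default_rng(5)
f = lambda x, b: b[0] * np.exp(-b[1] * x)
jac = lambda x, b: np.column_stack([np.exp(-b[1] * x),
                                    -b[0] * x * np.exp(-b[1] * x)])
x = rng.uniform(0.0, 3.0, 300)
y = f(x, np.array([2.0, 1.3])) + rng.normal(0.0, 0.2, 300)
print("Gauss-Newton estimate:", gauss_newton(y, x, f, jac, beta=np.array([1.0, 1.0])))
```

Note that only first derivatives of $f$ are required, in line with the advantage over the Newton-Raphson iteration noted above.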
[...] ones used in Section 2.2. I will give an intuitive proof of the consistency by writing the expression corresponding to (2.9). [The method of proof used in Amemiya (1974) is slightly different.] Using a Taylor expansion of $f(\alpha)$ around $\alpha_0$, we obtain the approximation

$$ \frac{1}{T}S_T(\alpha|W) \cong \frac{1}{T}u'P_W u + \frac{2}{T}(\alpha_0 - \alpha)'G'P_W u + \frac{1}{T}(\alpha - \alpha_0)'G'P_W G(\alpha - \alpha_0), \tag{5.11} $$

where $P_W = W(W'W)^{-1}W'$ and $G = (\partial f/\partial\alpha')|_{\alpha_0}$. It is apparent from (5.11) that [...]
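To illustrate the criterion whose expansion appears in (5.11), here is a minimal sketch (my addition; the single-equation model, instruments, and data are invented) of the NL2S estimator minimizing $[y - f(\alpha)]'P_W[y - f(\alpha)]$ over $\alpha$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
T = 1000
w = rng.normal(size=T)                     # exogenous instrument
e = rng.normal(size=T)
z = w + e                                  # endogenous regressor
u = 0.8 * e + 0.6 * rng.normal(size=T)     # error correlated with z through e
alpha0 = 0.5
y = np.exp(alpha0 * z) + u

# Instrument matrix W; its rank (3) is at least the number of parameters (1),
# as the consistency of NL2S requires.
W = np.column_stack([np.ones(T), w, w**2])

def nl2s_objective(a):
    """S_T(alpha | W) = r' P_W r with r = y - f(alpha), P_W = W (W'W)^{-1} W'."""
    r = y - np.exp(a * z)
    return r @ (W @ np.linalg.solve(W.T @ W, W.T @ r))

fit = minimize_scalar(nl2s_objective, bounds=(0.0, 1.0), method="bounded")
print("NL2S estimate of alpha:", fit.x)    # close to 0.5; plain NLLS would be
                                           # inconsistent here since E[u|z] != 0
```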