Chapter 6

NON-LINEAR REGRESSION MODELS

TAKESHI AMEMIYA*

Stanford University

Contents

1 Introduction 334
2 Single equation: i.i.d. case 336
2.1 Model 336
2.2 Asymptotic properties 337
2.3 Computation 341
2.4 Tests of hypotheses 347
2.5 Confidence regions 352
3 Single equation: non-i.i.d. case 354
3.1 Autocorrelated errors 354
3.2 Heteroscedastic errors 358
4 Multivariate models 359
5 Simultaneous equations models 362
5.1 Non-linear two-stage least squares estimator 362
5.2 Other single equation estimators 370
5.3 Non-linear simultaneous equations 375
5.4 Non-linear three-stage least squares estimator 376
5.5 Non-linear full information maximum likelihood estimator 379
References 385

*This work was supported by National Science Foundation Grant SES7912965 at the Institute for Mathematical Studies in the Social Sciences, Stanford University. The author is indebted to the following people for valuable comments: R. C. Fair, A. R. Gallant, Z. Griliches, M. D. Intriligator, T. E. MaCurdy, J. L. Powell, R. E. Quandt, N. E. Savin, and H. White.

Handbook of Econometrics, Volume I, Edited by Z. Griliches and M. D. Intriligator
North-Holland Publishing Company, 1983

334 T. Amemiya

1 Introduction

This is a survey of non-linear regression models, with an emphasis on the theory of estimation and hypothesis testing rather than computation and applications, although there will be some discussion of the last two topics. For a general discussion of computation the reader is referred to Chapter 12 of this Handbook by Quandt. My aim is to present the gist of major results; therefore, I will sometimes omit proofs and less significant assumptions. For those, the reader must consult the original sources.

The advent of advanced computer technology has made it possible for the econometrician to estimate an increasing number of non-linear regression models in recent years. Non-linearity arises in many diverse ways in econometric applications. Perhaps the simplest and best known case of non-linearity in
econometrics is that which arises as the observed variables in a linear regression model are transformed to take account of the first-order autoregression of the error terms. Another well-known case is the distributed-lag model in which the coefficients on the lagged exogenous variables are specified to decrease with lags in a certain non-linear fashion, such as geometrically declining coefficients. In both of these cases, non-linearity appears only in parameters but not in variables.

More general non-linear models are used in the estimation of production functions and demand functions. Even a simple Cobb-Douglas production function cannot be transformed into linearity if the error term is added rather than multiplied [see Bodkin and Klein (1967)]. CES [Arrow, Chenery, Minhas and Solow (1961)] and VES [Revankar (1971)] production functions are more highly non-linear. In the estimation of expenditure functions, a number of highly non-linear functions have been proposed (some of these are used on the supply side as well): Translog [Christensen, Jorgenson and Lau (1975)], Generalized Leontief [Diewert (1974)], S-Branch [Brown and Heien (1972)], and Quadratic [Howe, Pollak and Wales (1979)], to name a few. Some of these and other papers with applications will be mentioned in various relevant parts of this chapter.

The non-linear regression models I will consider in this chapter can be written in their most general form as

$$f_i(y_t, x_t, \alpha_i) = u_{it}, \qquad i = 1, 2, \ldots, n; \quad t = 1, 2, \ldots, T, \tag{1.1}$$

where $y_t$, $x_t$, and $\alpha_i$ are vectors of endogenous variables, exogenous variables, and parameters, respectively, and $u_{it}$ are unobservable error terms with zero mean. Eqs. (1.1), with all generality, constitute the non-linear simultaneous equations model, which is analyzed in Section 5. I devote most of the discussion in the chapter to this section because this area has been only recently developed and therefore there is little account of it in general references.

Ch. 6: Non-linear Regression Models 335

Many simpler models arising as special cases of (1.1) are
considered in other sections. In Section 2 I take up the simplest of these, which I will call the standard non-linear regression model, defined by

$$y_t = f(x_t, \beta_0) + u_t, \qquad t = 1, 2, \ldots, T, \tag{1.2}$$

where $\{u_t\}$ are scalar i.i.d. (independent and identically distributed) random variables with zero mean and constant variance. Since this is the model which has been most extensively analyzed in the literature, I will also devote a lot of space to the analysis of this model. Section 3 considers the non-i.i.d. case of the above model, and Section 4 treats its multivariate generalization.

Now, I should mention what will not be discussed. I will not discuss the maximum likelihood estimation of non-linear models unless the model is written in the regression form (1.1). Many non-linear models are discussed elsewhere in this Handbook; see, for example, the chapters by Dhrymes, McFadden, and Maddala. The reader is advised to recognize a close connection between the non-linear least squares estimator analyzed in this chapter and the maximum likelihood estimator studied in the other chapters; essentially the same techniques are used to derive the asymptotic properties of the two estimators, and analogous computer algorithms can be used to compute both. I will not discuss splines and other methods of function approximation, since space is limited and these techniques have not been as frequently used in econometrics as they have in engineering applications. A good introduction to the econometric applications of spline functions can be found in Poirier (1976).

Above I mentioned the linear model with the transformation to reduce the autocorrelation of the error terms and the distributed-lag model. I will not specifically study these models because they are very large topics by themselves and are best dealt with separately. (See the chapter by Hendry, Pagan, and Sargan in this Handbook.) There are a few other important topics which, although non-linearity is involved, would best be studied within another context, e.g. non-linear
error-in-variable models and non-linear time-series models. Regarding these two topics, I recommend Wolter and Fuller (1978) and Priestley (1978).

Finally, I conclude this introduction by citing general references on non-linear regression models. Malinvaud (1970b) devotes one long chapter to non-linear regression models in which he discusses the asymptotic properties of the non-linear least squares estimator in a multivariate model. There are three references which are especially good in the discussion of computation algorithms, confidence regions, and worked-out examples: Draper and Smith (1966), Bard (1974), and Judge, Griffiths, Hill and Lee (1980). Several chapters in Goldfeld and Quandt (1972) are devoted to the discussion of non-linear regression models. One of them presents an excellent review of optimization techniques which can be used in the computation of both the non-linear least squares and the maximum likelihood estimators; another discusses the construction of confidence regions in the non-linear regression model and the asymptotic properties of the maximum likelihood estimator (but not of the non-linear least squares estimator); another considers the Cobb-Douglas production function with both multiplicative and additive errors; and another considers non-linear (only in variables) simultaneous equations models. There are two noteworthy survey articles: Gallant (1975a), with emphasis on testing and computation, and Bunke, Henschke, Strüby and Wisotzki (1977), which is more theoretically oriented. None of the above-mentioned references, however, discusses the estimation of simultaneous equations models non-linear both in variables and parameters.

2 Single equation: i.i.d. case

2.1 Model

In this section I consider the standard non-linear regression model

$$y_t = f(x_t, \beta_0) + u_t, \qquad t = 1, 2, \ldots, T, \tag{2.1}$$

where $y_t$ is a scalar endogenous variable, $x_t$ is a vector of exogenous variables, $\beta_0$ is a $K$-vector of unknown parameters, and $\{u_t\}$ are unobservable scalar i.i.d. random
variables with $Eu_t = 0$ and $Vu_t = \sigma_0^2$, another unknown parameter. Note that, unlike the linear model where $f(x_t, \beta_0) = x_t'\beta_0$, the dimensions of the vectors $x_t$ and $\beta_0$ are not necessarily the same. We will assume that $f$ is twice continuously differentiable. As for the other assumptions on $f$, I will mention them as they are required for obtaining various results in the course of the subsequent discussion.

Econometric examples of (2.1) include the Cobb-Douglas production function with an additive error,

$$Q_t = \beta_1 K_t^{\beta_2} L_t^{\beta_3} + u_t, \tag{2.2}$$

and the CES (constant elasticity of substitution) production function:

(2.3)

Sometimes I will write (2.1) in vector notation as

$$y = f(\beta_0) + u, \tag{2.4}$$

where $y$, $f(\beta_0)$, and $u$ are $T$-vectors whose $t$th element is equal to $y_t$, $f(x_t, \beta_0)$, and $u_t$, respectively. I will also use the symbol $f_t(\beta_0)$ to denote $f(x_t, \beta_0)$.

The non-linear least squares (NLLS) estimator, denoted $\hat\beta$, is defined as the value of $\beta$ that minimizes the sum of squared residuals

$$S_T(\beta) = \sum_{t=1}^{T}\left[y_t - f(x_t, \beta)\right]^2. \tag{2.5}$$

It is important to distinguish between the $\beta$ that appears in (2.5), which is the argument of the function $f(x_t, \cdot)$, and $\beta_0$, which is a fixed true value. In what follows, I will discuss the properties of $\hat\beta$, the method of computation, and statistical inference based on $\hat\beta$.

2.2 Asymptotic properties

2.2.1 Consistency

The consistency of the NLLS estimator is rigorously proved in Jennrich (1969) and Malinvaud (1970a). The former proves strong consistency ($\hat\beta$
converging to $\beta_0$ almost surely) and the latter weak consistency ($\hat\beta$ converging to $\beta_0$ in probability). Weak consistency is more common in the econometric literature and is often called by the simpler name of consistency. The main reason why strong consistency, rather than weak consistency, is proved is that the former implies the latter and is often easier to prove. I will mainly follow Jennrich's proof but translate his result into weak consistency.

The consistency of $\hat\beta$ is proved by proving that $\operatorname{plim} T^{-1}S_T(\beta)$ is minimized at the true value $\beta_0$. Strong consistency is proved by showing the same holds for the almost sure limit of $T^{-1}S_T(\beta)$ instead. This method of proof can be used to prove the consistency of any other type of estimator which is obtained by either minimizing or maximizing a random function over the parameter space. For example, I used the same method to prove the strong consistency of the maximum likelihood estimator (MLE) of the Tobit model in Amemiya (1973b).

This method of proof is intuitively appealing because it seems obvious that if $T^{-1}S_T(\beta)$ is close to $\operatorname{plim} T^{-1}S_T(\beta)$ and if the latter is minimized at $\beta_0$, then $\hat\beta$, which minimizes the former, should be close to $\beta_0$. However, we need the following three assumptions in order for the proof to work:

The parameter space $B$ is compact (closed and bounded) and $\beta_0$ is its interior point. (2.6)
$S_T(\beta)$ is continuous in $\beta$. (2.7)
$\operatorname{plim} T^{-1}S_T(\beta)$ exists, is non-stochastic, and its convergence is uniform in $\beta$. (2.8)

The meaning of (2.8) is as follows. Define $S(\beta) = \operatorname{plim} T^{-1}S_T(\beta)$. Then, given $\epsilon, \delta > 0$, there exists $T_0$, independent of $\beta$, such that for all $T \ge T_0$ and for all $\beta$, $P\left[\,|T^{-1}S_T(\beta) - S(\beta)| > \epsilon\,\right] < \delta$.

It is easy to construct examples in which the violation of any single assumption above leads to the inconsistency of $\hat\beta$. [See Amemiya (1980).]
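The minimization argument above can be illustrated numerically. The following is a minimal sketch under stated assumptions, not Jennrich's proof and not a computation from this chapter: it simulates the Cobb-Douglas model (2.2) with hypothetical values $\beta_0 = (2, 0.3, 0.6)$, $\sigma_0 = 0.5$ and inputs drawn uniformly on $[1, 10]$, and minimizes $S_T(\beta)$ of (2.5) by Gauss-Newton steps with step halving (one common iterative method, chosen here only for brevity). As $T$ grows, the printed $\hat\beta$ should settle near $\beta_0$. All function names are illustrative.

```python
import math
import random


def cobb_douglas(beta, k, l):
    """Regression function of (2.2): f(x_t, beta) = beta1 * K^beta2 * L^beta3."""
    b1, b2, b3 = beta
    return b1 * k ** b2 * l ** b3


def gradient(beta, k, l):
    """Row t of G: the partial derivatives of f(x_t, beta) with respect to beta."""
    b1, b2, b3 = beta
    base = k ** b2 * l ** b3
    return [base, b1 * base * math.log(k), b1 * base * math.log(l)]


def ssr(beta, data):
    """S_T(beta) of (2.5); returns infinity if a trial point overflows."""
    try:
        return sum((q - cobb_douglas(beta, k, l)) ** 2 for k, l, q in data)
    except OverflowError:
        return math.inf


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def nlls_gauss_newton(data, beta, max_iter=100, tol=1e-10):
    """Minimize S_T(beta) with steps (G'G)^{-1} G'(y - f), halving each step until S_T falls."""
    beta = list(beta)
    for _ in range(max_iter):
        gtg = [[0.0] * 3 for _ in range(3)]
        gte = [0.0] * 3
        for k, l, q in data:
            g = gradient(beta, k, l)
            e = q - cobb_douglas(beta, k, l)
            for i in range(3):
                gte[i] += g[i] * e
                for j in range(3):
                    gtg[i][j] += g[i] * g[j]
        step = solve(gtg, gte)
        current = ssr(beta, data)
        lam = 1.0
        while ssr([b + lam * s for b, s in zip(beta, step)], data) > current:
            lam /= 2
            if lam < 1e-12:
                return beta  # no further decrease found: treat as converged
        beta = [b + lam * s for b, s in zip(beta, step)]
        if max(abs(lam * s) for s in step) < tol:
            break
    return beta


def simulate(t_obs, beta0, sigma, seed):
    """Hypothetical design: K_t, L_t ~ U[1, 10] and u_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(t_obs):
        k, l = rng.uniform(1, 10), rng.uniform(1, 10)
        data.append((k, l, cobb_douglas(beta0, k, l) + rng.gauss(0, sigma)))
    return data


BETA0 = [2.0, 0.3, 0.6]  # hypothetical true value beta_0
for t_obs in (50, 500, 5000):
    sample = simulate(t_obs, BETA0, sigma=0.5, seed=1)
    estimate = nlls_gauss_newton(sample, [1.0, 0.5, 0.5])
    print(t_obs, [round(b, 3) for b in estimate])
```

Each Gauss-Newton step is a linear least squares regression of the current residuals on the rows of $\partial f/\partial\beta'$, so the linear-model machinery reappears inside the iteration; step halving is added only to keep $S_T(\beta)$ decreasing from a poor starting value.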
I will now give a sketch of the proof of the consistency and indicate what additional assumptions are needed as I go along. From (2.1) and (2.5), we get

$$\frac{1}{T}S_T(\beta) = \frac{1}{T}\sum u_t^2 + \frac{2}{T}\sum\left[f_t(\beta_0) - f_t(\beta)\right]u_t + \frac{1}{T}\sum\left[f_t(\beta_0) - f_t(\beta)\right]^2 \equiv A_1 + A_2 + A_3, \tag{2.9}$$

where $\sum$ means $\sum_{t=1}^{T}$ unless otherwise noted. First, $\operatorname{plim} A_1 = \sigma_0^2$ by a law of large numbers [see, for example, Kolmogorov Theorem 2, p. 115, in Rao (1973)]. Secondly, for fixed $\beta_0$ and $\beta$, $\operatorname{plim} A_2 = 0$ follows from the convergence of $T^{-1}\sum[f_t(\beta_0) - f_t(\beta)]^2$ by Chebyshev's inequality:

$$P\left(\left|\frac{1}{T}\sum\left[f_t(\beta_0) - f_t(\beta)\right]u_t\right| > \epsilon\right) \le \frac{\sigma_0^2}{\epsilon^2 T^2}\sum\left[f_t(\beta_0) - f_t(\beta)\right]^2. \tag{2.10}$$

Since the uniform convergence of $A_2$ follows from the uniform convergence of the right-hand side of (2.10), it suffices to assume

$$\frac{1}{T}\sum f_t(\beta_1)f_t(\beta_2) \ \text{converges uniformly in } \beta_1, \beta_2 \in B. \tag{2.11}$$

Having thus disposed of $A_1$ and $A_2$, we need only to assume that $\lim A_3$ is uniquely minimized at $\beta_0$; namely,

$$\lim \frac{1}{T}\sum\left[f_t(\beta_0) - f_t(\beta)\right]^2 \neq 0 \quad \text{if } \beta \neq \beta_0. \tag{2.12}$$

To sum up, the non-linear least squares estimator $\hat\beta$ of the model (2.1) is consistent if (2.6), (2.11), and (2.12) are satisfied. I will comment on the significance and the plausibility of these three assumptions.

The assumption of a compact parameter space (2.6) is convenient but can be rather easily removed. The trick is to dispose of the region outside a certain compact subset of the parameter space by assuming that in that region $T^{-1}\sum[f_t(\beta_0) - f_t(\beta)]^2$ is sufficiently large. This is done by Malinvaud (1970a). An essentially similar argument appears also in Wald (1949) in the proof of the consistency of the maximum likelihood estimator.

It would be nice if assumption (2.11) could be paraphrased into separate assumptions on the functional form of $f$ and on the properties of the exogenous sequence $\{x_t\}$, which are easily verifiable. Several authors have attempted to obtain such assumptions. Jennrich (1969) observes that if $f$ is bounded and continuous, (2.11) is implied by the assumption that the empirical distribution function of $\{x_t\}$ converges to a distribution function. He also notes that another way to satisfy (2.11) is to assume that $\{x_t\}$ are i.i.d. with a distribution function $F$, and $f$ is bounded uniformly
in $\beta$ by a function which is square integrable with respect to $F$. Malinvaud (1970a) generalizes the first idea of Jennrich by introducing the concept of weak convergence of measure, whereas Gallant (1977) generalizes the second idea of Jennrich by considering the notion of Cesàro summability. However, it seems to me that the best procedure is to leave (2.11) as it is and try to verify it directly.

The assumption (2.12) is comparable to the familiar assumption in the linear model that $\lim T^{-1}X'X$ exists and is positive definite. It can be easily proved that in the linear model the above assumption is not necessary for the consistency of least squares and it is sufficient to assume $(X'X)^{-1} \to 0$. This observation suggests that assumption (2.12) can be relaxed in an analogous way. One such result can be found in Wu (1981).

2.2.2 Asymptotic normality

The asymptotic normality of the NLLS estimator $\hat\beta$ is rigorously proved in Jennrich (1969). Again, I will give a sketch of the proof, explaining the required assumptions as I go along, rather than reproducing Jennrich's result in a theorem-proof format.

The asymptotic normality of the NLLS estimator, as in the case of the MLE, can be derived from the following Taylor expansion:

$$\frac{\partial S_T}{\partial\beta}\bigg|_{\hat\beta} = \frac{\partial S_T}{\partial\beta}\bigg|_{\beta_0} + \frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\bigg|_{\beta^*}\left(\hat\beta - \beta_0\right), \tag{2.13}$$

where $\partial^2 S_T/\partial\beta\,\partial\beta'$ is a $K \times K$ matrix of second-order derivatives and $\beta^*$ lies between $\hat\beta$
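and $\beta_0$. To be able to write down (2.13), we must assume that $f_t$ is twice continuously differentiable with respect to $\beta$. Since the left-hand side of (2.13) is zero (because $\hat\beta$ minimizes $S_T$), from (2.13) we obtain:

$$\sqrt{T}\left(\hat\beta - \beta_0\right) = -\left[\frac{1}{T}\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\bigg|_{\beta^*}\right]^{-1}\frac{1}{\sqrt{T}}\frac{\partial S_T}{\partial\beta}\bigg|_{\beta_0}. \tag{2.14}$$

Thus, we are done if we can show that (i) the limit distribution of $T^{-1/2}(\partial S_T/\partial\beta)_{\beta_0}$ is normal and (ii) $T^{-1}(\partial^2 S_T/\partial\beta\,\partial\beta')_{\beta^*}$ converges in probability to a non-singular matrix. We will consider these two statements in turn.

The proof of statement (i) is straightforward. Differentiating (2.5) with respect to $\beta$, we obtain:

$$\frac{\partial S_T}{\partial\beta} = -2\sum\left[y_t - f_t(\beta)\right]\frac{\partial f_t}{\partial\beta}. \tag{2.15}$$

Evaluating (2.15) at $\beta_0$ and dividing it by $\sqrt{T}$, we have:

$$\frac{1}{\sqrt{T}}\frac{\partial S_T}{\partial\beta}\bigg|_{\beta_0} = -\frac{2}{\sqrt{T}}\sum u_t\frac{\partial f_t}{\partial\beta}\bigg|_{\beta_0}. \tag{2.16}$$

But it is easy to find the conditions for the asymptotic normality of (2.16) because the summand in the right-hand side is a weighted average of an i.i.d. sequence, the kind encountered in the least squares estimation of a linear model. Therefore, if we assume

$$C = \lim\frac{1}{T}\sum\frac{\partial f_t}{\partial\beta}\bigg|_{\beta_0}\frac{\partial f_t}{\partial\beta'}\bigg|_{\beta_0} \ \text{exists and is non-singular,} \tag{2.17}$$

then

$$\frac{1}{\sqrt{T}}\frac{\partial S_T}{\partial\beta}\bigg|_{\beta_0} \to N\left(0, 4\sigma_0^2 C\right). \tag{2.18}$$

This result can be straightforwardly obtained from the Lindeberg-Feller central limit theorem [Rao (1973, p. 128)] or, more directly, from Theorem 2.6.1 of Anderson (1971, p. 23).

Proving (ii) poses a more difficult problem. Write an element of the matrix $T^{-1}(\partial^2 S_T/\partial\beta\,\partial\beta')_{\beta^*}$ as $h_T(\beta^*)$. One might think that $\operatorname{plim} h_T(\beta^*) = \operatorname{plim} h_T(\beta_0)$ follows from the well-known theorem which says that the probability limit of a continuous function is the function of the probability limit, but the theorem does not apply because $h_T$ is in general a function of an increasing number of random variables $y_1, y_2, \ldots, y_T$. But, by a slight modification of Lemma 4, p. 1003, of Amemiya (1973b), we can show that if $h_T(\beta)$ converges almost surely to a certain non-stochastic function $h(\beta)$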
uniformly in $\beta$, then $\operatorname{plim} h_T(\beta^*) = h(\operatorname{plim}\beta^*) = h(\beta_0)$. Differentiating (2.15) again with respect to $\beta$ and dividing by $T$ yields

$$\frac{1}{T}\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'} = \frac{2}{T}\sum\frac{\partial f_t}{\partial\beta}\frac{\partial f_t}{\partial\beta'} - \frac{2}{T}\sum u_t\frac{\partial^2 f_t}{\partial\beta\,\partial\beta'} + \frac{2}{T}\sum\left[f_t(\beta) - f_t(\beta_0)\right]\frac{\partial^2 f_t}{\partial\beta\,\partial\beta'}. \tag{2.19}$$

We must show that each of the three terms in the right-hand side of (2.19) converges almost surely to a non-stochastic function uniformly in $\beta$. For this purpose the following assumptions will suffice:

$$\frac{1}{T}\sum\frac{\partial f_t}{\partial\beta}\frac{\partial f_t}{\partial\beta'} \ \text{converges uniformly in } \beta \text{ in an open neighborhood of } \beta_0, \tag{2.20}$$

and

$$\frac{1}{T}\sum\left[\frac{\partial^2 f_t}{\partial\beta_i\,\partial\beta_j}\right]^2 \ \text{converges uniformly in } \beta \text{ in an open neighborhood of } \beta_0. \tag{2.21}$$

Then, we obtain:

$$\operatorname{plim}\frac{1}{T}\frac{\partial^2 S_T}{\partial\beta\,\partial\beta'}\bigg|_{\beta^*} = 2C. \tag{2.22}$$

Finally, from (2.14), (2.18), and (2.22) we obtain:

$$\sqrt{T}\left(\hat\beta - \beta_0\right) \to N\left(0, \sigma_0^2 C^{-1}\right). \tag{2.23}$$

The assumptions we needed in proving (2.23) were (2.17), (2.20), and (2.21) as well as the assumption that $\hat\beta$ is consistent.

It is worth pointing out that in the process of proving (2.23) we have in effect shown that we have, asymptotically,

$$\hat\beta - \beta_0 \cong (G'G)^{-1}G'u, \tag{2.24}$$

where I have put $G = (\partial f/\partial\beta')_{\beta_0}$, a $T \times K$ matrix. Note that (2.24) holds exactly in the linear case. The practical consequence of the approximation (2.24) is that all the results for the linear regression model are asymptotically valid for the non-linear regression model if we treat $G$ as the regressor matrix. In particular, we can use the usual $t$ and $F$ statistics with an approximate precision, as I will explain more fully in Sections 2.4 and 2.5 below. Since the matrix $G$ depends on the unknown parameters, we must in practice evaluate it at $\hat\beta$.

2.3 Computation

Since there is in general no explicit formula for the NLLS estimator $\hat\beta$, the minimization of (2.5) must usually be carried out by some iterative method. There …

… likelihood estimator applied to the original equations (5.44) and (5.45). Quandt concludes that NLFI is the best, OLS is the worst, and the rest are more or less similar, although, to a certain extent, the asymptotic ranking (5.43) is preserved.

5.3 Non-linear simultaneous equations

So far I have considered the estimation of the parameters of a single equation in the system of
simultaneous equations. Now, I will consider the estimation of all the parameters of the system. The equations of the model are

$$f_i(y_t, x_t, \alpha_i) = u_{it}, \qquad i = 1, 2, \ldots, n; \quad t = 1, 2, \ldots, T, \tag{5.48}$$

where $y_t$ is an $n$-vector of endogenous variables, $x_t$ is a vector of exogenous variables, and $\alpha_i$ is a $K_i$-vector of unknown parameters. I will assume that the $n$-vector $u_t = (u_{1t}, u_{2t}, \ldots, u_{nt})'$ is an i.i.d. vector random variable with zero mean and variance-covariance matrix $\Sigma$. Not all of the elements of the vectors $y_t$ and $x_t$ may actually appear in the arguments of each $f_i$. I assume that each equation has its own vector of parameters $\alpha_i$ and that there are no constraints among the $\alpha_i$'s, but the results I state subsequently can easily be modified if we can express each $\alpha_i$ parametrically as $\alpha_i(\theta)$, where the number of elements in $\theta$ is fewer than $\sum_{i=1}^{n}K_i$.

Strictly speaking, (5.48) is not a complete model by itself because there is no guarantee that a unique solution for $y_t$ exists for every possible value of $u_{it}$ unless some stringent assumptions are made on the form of $f_i$. Therefore, we will assume either that $f_i$ satisfies such assumptions or that if there is more than one solution for $y_t$ there is some additional mechanism by which a unique solution is chosen.

I have already mentioned two simple examples of (5.48): the model of Goldfeld and Quandt (1968), defined by (5.8) and (5.9), and the model of Quandt (1975), defined by (5.44) and (5.45). The first model is shown to possess two solutions occurring in two different regions; therefore, the model is not complete unless we specify some mechanism by which one of the two solutions is chosen. Goldfeld and Quandt conduct a Monte Carlo study in which they analyze how the performance of several estimators is affected by various mechanisms for choosing solutions, such as always choosing solutions from one region or mixing solutions from two regions. [See Kelejian (1975) for a further study on this issue.]
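To see concretely why (5.48) may fail to be a complete model, consider a hypothetical scalar equation ($n = 1$), not one of the models cited above: $y_t^2 - \alpha x_t = u_t$. For given $x_t$ and $u_t$ there are two solutions for $y_t$ when $\alpha x_t + u_t > 0$, one when it is zero, and none when it is negative, so a selection mechanism has to be added before $y_t$ is determined. The sketch below enumerates the solutions and applies three illustrative rules (always the positive root, always the negative root, or a random mix of the two regions); the function names and rule labels are assumptions made for this example.

```python
import math
import random


def solutions(x, u, alpha):
    """All real y solving the hypothetical equation y**2 - alpha * x = u."""
    s = alpha * x + u
    if s < 0:
        return []          # no solution: this u lies outside the admissible region
    if s == 0:
        return [0.0]       # a single (double) root
    r = math.sqrt(s)
    return [-r, r]         # one solution in each region, y < 0 and y > 0


def select(x, u, alpha, rule, rng=None):
    """Apply a hypothetical selection mechanism; returns None when no solution exists."""
    roots = solutions(x, u, alpha)
    if not roots:
        return None
    if rule == "positive":
        return max(roots)
    if rule == "negative":
        return min(roots)
    if rule == "mix":
        return rng.choice(roots)   # mix solutions from the two regions at random
    raise ValueError(f"unknown rule: {rule}")


rng = random.Random(0)
for u in (-3.0, 0.0, 2.0):
    print(u, solutions(1.0, u, 2.0), select(1.0, u, 2.0, "mix", rng))
```

Which rule generated the data changes the distribution of $y_t$ given $x_t$, which is why a likelihood cannot be written down without one; the chapter notes below that the NL2S and NL3S estimators remain consistent whichever mechanism is used.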
Quandt (1975) shows that in the second model above there is a one-to-one correspondence between $y$ and $u$ if a sign restriction is assumed on one of its parameters ($\alpha < 0$) and on one of its exogenous variables ($x_t > 0$ for all $t$).

I will not discuss the problem of identification in the model (5.48). There are not many useful results in the literature beyond the basic discussion of Fisher (1966) as summarized in Goldfeld and Quandt (1972, p. 221 ff). I will merely point out that non-linearity generally helps rather than hampers identification, so that, for example, the number of excluded exogenous variables in a given equation need not be greater than or equal to the number of parameters of the same equation in a non-linear model. I should also point out that I have actually given one sufficient condition for identifiability: that $\operatorname{plim} T^{-1}G'P_W G$ in the right-hand side of (5.14) be non-singular.

To facilitate the discussion of the subsequent sections I will give a list of symbols:

$\alpha = (\alpha_1', \alpha_2', \ldots, \alpha_n')'$,
$\Lambda = \Sigma \otimes I$, where $\otimes$ is the Kronecker product,
$f_{it} = f_i(y_t, x_t, \alpha_i)$,
$f_t$ = an $n$-vector whose $i$th element is $f_{it}$,
$f_{(i)}$ = a $T$-vector whose $t$th element is $f_{it}$,
$f = (f_{(1)}', f_{(2)}', \ldots, f_{(n)}')'$, an $nT$-vector,
$F = (f_{(1)}, f_{(2)}, \ldots, f_{(n)})$, a $T \times n$ matrix,
$g_{it} = \partial f_{it}/\partial\alpha_i$, a $K_i$-vector,
$G_i = \partial f_{(i)}/\partial\alpha_i'$, a $T \times K_i$ matrix whose $t$th row is $g_{it}'$,
$G = \operatorname{diag}\{G_1, G_2, \ldots, G_n\}$, a block diagonal matrix. (5.49)

5.4 Non-linear three-stage least squares estimator

Before starting the main discussion I wish to point out that all the results of Sections 5.1 and 5.2 are valid if we change (5.1) to

(5.50)

Consequently, the minimand (5.10) which defines the class of NL2S estimators should be changed to

$$f'W(W'W)^{-1}W'f. \tag{5.51}$$

The asymptotic normality result (5.14) need not be changed. The significance of the above modification is that a NL2S estimator can be applied to each equation of (5.48).†
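The minimand (5.51) is the squared length of the projection of the residual vector $f$ onto the column space of the instrument matrix $W$. The sketch below, in plain Python with small hypothetical matrices, only evaluates $f'W(W'W)^{-1}W'f$ for a given residual vector and forms the Kronecker product $\Lambda = \Sigma \otimes I$ from the symbol list (5.49); it does not carry out the minimization over $\alpha$, and all helper names are illustrative.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + ident for row, ident in zip(a, identity(n))]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def kron(a, b):
    """Kronecker product a (x) b, e.g. Lambda = Sigma (x) I_T in (5.49)."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def nl2s_minimand(f, w):
    """f'W(W'W)^{-1}W'f for a T-vector f of residuals and a T x m instrument matrix W."""
    wt = transpose(w)
    wtf = matmul(wt, [[v] for v in f])  # W'f, an m x 1 column
    quad = matmul(matmul(transpose(wtf), inverse(matmul(wt, w))), wtf)
    return quad[0][0]


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # hypothetical T = 3, m = 2
print(nl2s_minimand([2.0, -1.0, 1.0], W))      # f lies in the column space of W
print(nl2s_minimand([1.0, 1.0, -1.0], W))      # f is orthogonal to the columns of W
print(kron([[1.0, 0.5], [0.5, 2.0]], identity(2)))
```

When $f$ lies in the column space of $W$ the minimand equals $f'f$, and when $f$ is orthogonal to every instrument it is zero; an estimator of this class therefore drives the residuals toward being uncorrelated with $W$ rather than toward zero.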
As a natural extension of the class of the NL2S estimators defined in Section 5.1.3, Jorgenson and Laffont (1974) defined the class of non-linear three-stage least squares estimators (NL3S) as the value of $\alpha$ that minimizes

$$f'\left[\hat\Sigma^{-1} \otimes W(W'W)^{-1}W'\right]f, \tag{5.52}$$

where $\hat\Sigma$ is a consistent estimate of $\Sigma$. For example,

$$\hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}f_t(\hat\alpha)f_t(\hat\alpha)', \tag{5.53}$$

where $\hat\alpha$ is the NL2S estimator obtained from each equation. The above definition is analogous to the definition of the linear 3SLS as a generalization of the linear 2SLS. The consistency and the asymptotic normality of the estimator defined above are proved in Jorgenson and Laffont (1974) and Gallant (1977).

The consistency of the NL2S and NL3S estimators of the parameters of the model (5.48) can be proved with minimal assumptions on $u_{it}$, namely, those stated after (5.48). This robustness makes the estimators attractive. Another important strength of the estimators is that they retain their consistency regardless of whether or not (5.48) yields a unique solution for $y_t$ and, in the case of multiple solutions, regardless of what additional mechanism chooses a unique solution. See MaCurdy (1980) for an interesting discussion of this point.

† Another advantage of this modification is that the Box-Cox transformation model $(y_t^{\lambda} - 1)/\lambda = \beta'x_t + u_t$ [see Box and Cox (1964)] can be regarded as a special case of (5.52). See Amemiya and Powell (1980) for the application of NL2S to the Box-Cox model.

Amemiya (1977) defined the class of the NL3S estimators slightly more generally as the value of $\alpha$ that minimizes

$$f'\hat\Lambda^{-1}S\left(S'\hat\Lambda^{-1}S\right)^{-1}S'\hat\Lambda^{-1}f, \tag{5.54}$$

where $\hat\Lambda$ is a consistent estimate of $\Lambda$ and $S$ is a matrix of constants with $nT$ rows and with the rank of at least $\sum_{i=1}^{n}K_i$. This definition is reduced to the Jorgenson-Laffont definition if $S = \operatorname{diag}(W, W, \ldots, W)$. Its asymptotic variance-covariance matrix is given by

$$V_S = \operatorname{plim} T\left[G'\Lambda^{-1}S\left(S'\Lambda^{-1}S\right)^{-1}S'\Lambda^{-1}G\right]^{-1}. \tag{5.55}$$

Its lower bound is equal to

$$V_B = \lim T\left[EG'\Lambda^{-1}EG\right]^{-1}, \tag{5.56}$$

which is attained when one chooses $S = EG$. I will call this estimator the BNL3S
estimator (B for best).

We can also attain the lower bound (5.56) using the Jorgenson-Laffont definition, but that is possible if and only if the space spanned by the column vectors of $W$ contains the union of the spaces spanned by the column vectors of $EG_i$ for $i = 1, 2, \ldots, n$. This necessitates including many columns in $W$, which is likely to increase the finite sample variance of the estimator although it has no effect asymptotically. This is a disadvantage of the Jorgenson-Laffont definition compared to my definition.

Noting that BNL3S is not practical, just as BNL2S is not, Amemiya (1976) suggests the following approximation:
(1) Compute $\hat\alpha_i$, an SNL2S estimator of $\alpha_i$, $i = 1, 2, \ldots, n$.
(2) Evaluate $G_i$ at $\hat\alpha_i$; call it $\hat G_i$.
(3) Treat $\hat G_i$ as the dependent variables of the regression and search for the optimal set of independent variables $W_i$ that best predict $\hat G_i$.
(4) Choose $S = \operatorname{diag}\{P_1\hat G_1, P_2\hat G_2, \ldots, P_n\hat G_n\}$, where $P_i = W_i(W_i'W_i)^{-1}W_i'$.

In Section 5.1.4 I discussed tests of hypotheses based on NL2S developed by Gallant and Jorgenson (1979). These tests can easily be generalized to the tests based on NL3S, as shown by the same authors. Let $\hat\alpha$ be the NL3S estimator here and let $\hat\Sigma$ be a consistent estimate of $\Sigma$. Also, let $S_T(\alpha)$ refer to the NL3S minimand, and let $\tilde\alpha$ refer to the constrained NL3S subject to the condition (5.15) or (5.16). Then, the Wald and SSRD test statistics can now be defined as

$$\text{Wald} = h(\hat\alpha)'\left\{\hat H\left[\hat G'\left(\hat\Sigma^{-1} \otimes P\right)\hat G\right]^{-1}\hat H'\right\}^{-1}h(\hat\alpha), \tag{5.57}$$

and

$$\text{SSRD} = S_T(\tilde\alpha) - S_T(\hat\alpha), \tag{5.58}$$

where $\hat G$ is $G$ evaluated at $\hat\alpha$ and $P = W(W'W)^{-1}W'$. Note that (5.57) and (5.58) are similar to (5.18) and (5.19). The limit distribution of the two statistics under the alternative hypothesis (if it is "close" to the null hypothesis as before) is given by

$$\text{Wald, SSRD} \sim \chi^2\left[q,\; h(\alpha_0)'\left\{H\left[G'\left(\Sigma^{-1} \otimes P\right)G\right]^{-1}H'\right\}^{-1}h(\alpha_0)\right], \tag{5.59}$$

or, alternatively, using (5.22):

$$\text{Wald, SSRD} \sim \chi^2\left[q,\; (\alpha_0 - \alpha^*)'G'(P_1 - P_2)G(\alpha_0 - \alpha^*)\right], \tag{5.60}$$

where

$$P_1 = \left(\Sigma^{-1} \otimes P\right)G\left[G'\left(\Sigma^{-1} \otimes P\right)G\right]^{-1}G'\left(\Sigma^{-1} \otimes P\right)$$

and

$$P_2 = \left(\Sigma^{-1} \otimes P\right)GR\left[R'G'\left(\Sigma^{-1} \otimes P\right)GR\right]^{-1}R'G'\left(\Sigma^{-1} \otimes P\right).$$

As an
application of the SSRD test, Gallant and Jorgenson tested the hypothesis of symmetry of the matrix of parameters in the three-equation translog expenditure model of Jorgenson and Lau (1975). The other applications of the NL3S estimators include Jorgenson and Lau (1978), which was previously mentioned in Section 5.1.4, and Haessel (1976), who estimated a system of demand equations, non-linear only in parameters, by both NL2S and NL3S estimators.

5.5 Non-linear full information (NLFI) maximum likelihood estimator

In this section we consider the maximum likelihood estimator of model (5.48) under the normality assumption of $u_{it}$. To do so we must assume that (5.48) defines a one-to-one correspondence between $y_t$ and $u_t$. This assumption enables us to write down the likelihood function in the usual way as the product of the density of $u_t$ and the Jacobian. Unfortunately, this is a rather stringent assumption, which considerably limits the usefulness of the NLFI estimator in practice. We have already noted that Goldfeld and Quandt's model defined by (5.8) and (5.9) does not satisfy this assumption. This example illustrates two types of problems which confront NLFI:
(1) Since there is no solution for $y$ for some values of $u$, the domain of $u$ must be restricted, which implies that the normality assumption cannot hold.
(2) Since there are two solutions for $y$ for some values of $u$, one must specify a mechanism for choosing a unique solution in order to write down the likelihood function.
One should note that the NL2S and the NL3S estimators are free from both of these problems.

Assuming $u_t \sim N(0, \Sigma)$, we can write the log-likelihood function of the model (5.48) as

$$L^* = -\frac{T}{2}\log|\Sigma| + \sum_{t=1}^{T}\log\left\|\frac{\partial f_t}{\partial y_t'}\right\| - \frac{1}{2}\sum_{t=1}^{T}f_t'\Sigma^{-1}f_t. \tag{5.61}$$

Solving $\partial L^*/\partial\Sigma = 0$ for $\Sigma$, we get:

$$\Sigma = \frac{1}{T}\sum_{t=1}^{T}f_t f_t'. \tag{5.62}$$

Inserting (5.62) into (5.61) yields the concentrated log-likelihood function

$$L = \sum_{t}\log\left\|\frac{\partial f_t}{\partial y_t'}\right\| - \frac{T}{2}\log\left|\frac{1}{T}\sum_{t}f_t f_t'\right|. \tag{5.63}$$

The NLFI maximum likelihood estimator of $\alpha$ is defined as the value of $\alpha$ that maximizes
(5.63).

It is shown in Amemiya (1977) that the NLFI estimator is consistent if the true distribution of $u_t$ is normal but is generally inconsistent if $u_t$ is not normal.‡ This result is contrary to the result in the linear case, where the FIML estimator derived from the normality assumption retains its consistency even if the true distribution is not normal. It is also shown in the same article that NLFI is asymptotically more efficient than BNL3S in general if the true distribution of $u_t$ is normal. (On the other hand, NL3S is more robust than NLFI because NL3S is consistent even if the distribution of $u_t$ is not normal.) This result is also contrary to the result in the linear case, where FIML and 3SLS have the same asymptotic distribution. In the subsequent subsections I will further discuss these results as well as some other problems related to the NLFI estimator. I will not discuss hypothesis testing here since the discussion of Section 2.4.2 is applicable to the present model.

‡ This result is completely separate from and in no way contradicts the quite likely fact that the maximum likelihood estimator of a non-linear model derived under the assumption of a certain regular non-normal distribution is consistent if the true distribution is the same as the assumed distribution.

5.5.1 Consistency

Differentiating (5.63) with respect to $\alpha_i$, we obtain:

$$\frac{\partial L}{\partial\alpha_i} = \sum_{t=1}^{T}\frac{\partial g_{it}}{\partial u_{it}} - T\sum_{t=1}^{T}g_{it}f_t'\left(\sum_{t=1}^{T}f_t f_t'\right)^{-1}_{i}, \tag{5.64}$$

where $(\cdot)^{-1}_{i}$ denotes the $i$th column of the inverse of the matrix within the parentheses. The consistency of NLFI is equivalent to the condition:

$$\operatorname{plim}\frac{1}{T}\frac{\partial L}{\partial\alpha_i}\bigg|_{\alpha_0} = 0, \tag{5.65}$$

and hence to the condition:

$$\lim\frac{1}{T}\sum_{t=1}^{T}E\frac{\partial g_{it}}{\partial u_{it}} = \lim\frac{1}{T}\sum_{t=1}^{T}E\,g_{it}u_t'\sigma^{i}, \tag{5.66}$$

where $\sigma^{i}$ is the $i$th column of $\Sigma^{-1}$. Now, (5.66) could hold even if each term of a summation is different from the corresponding term of the other, but that event is extremely unlikely. Therefore, we can say that the consistency of NLFI is essentially equivalent to the condition:

$$E\frac{\partial g_{it}}{\partial u_{it}} = E\,g_{it}u_t'\sigma^{i}. \tag{5.67}$$

It is interesting to note that condition
(5.67) holds if $u_t$ is normal because of the following lemma.¹⁷

Lemma

Suppose $u = (u_1, u_2, \ldots, u_n)'$ is distributed as $N(0, \Sigma)$, where $\Sigma$ is positive definite. If $\partial h(u)/\partial u_i$ is continuous, $E|\partial h/\partial u_i| < \infty$, and $E|hu_j| < \infty$ …

¹⁸ … $< M$ for some $M$ and $\lim_{|u_i|\to\infty}$ … $\psi(u_i) = 0$.

¹⁹ It is sufficient if all the conditions of footnote 18 hold uniformly with respect to all the other elements of $u$.

Now, the question is: Does (5.70) hold for a density $\psi(u)$ other than normal? The term within the square brackets in (5.70) is clearly zero if $u$ is normal. Moreover, we can say "if and only if" in the preceding sentence, provided that we restrict our attention to the class of continuously differentiable densities $\psi$, as proved by White (1980b). However, $\partial\psi/\partial u_i + u'\sigma^{i}\psi(u) = 0$ is not a necessary condition for (5.70) to hold, as we have noted in footnote 13 regarding a simple example of (5.5) and (5.6). This was first noted by Phillips (1981), who gives another interesting example. His model is defined by

$$\log y_{1t} + \alpha_1 x_t = u_{1t}, \tag{5.71}$$

$$y_{2t} + \alpha_2 y_{1t} = u_{2t}. \tag{5.72}$$

In this example $g_{1t} = x_t$ and $g_{2t} = \exp(u_{1t} - \alpha_1 x_t)$; therefore, (5.70) clearly holds for $i = 1$ for any density $\psi$, and Phillips found a class of densities for which (5.70) holds for the case $i = 2$.

What is the significance of these examples?
It is that given $g_i$ we can sometimes find a class of non-normal densities $\psi$ for which (5.70) holds. When $g_i$ are simple, as in these examples, we can find a fairly broad class of non-normal densities for which (5.70) holds. However, if $g_i$ is a more complicated non-linear function of the exogenous variables and the parameters $\{\alpha_i\}$ as well as of $u$, (5.70) can be made to hold only when we specify a density which depends on the exogenous variables and the parameters of the model. In such a case, normality can be regarded, for all practical purposes, as a necessary and sufficient condition for the consistency of NLFI.

5.5.2 Comparison between NLFI and NL3S

Amemiya (1977) showed that the asymptotic equivalence of NLFI and BNL3S occurs if and (almost) only if $f_{it}$ can be written in the form

$$f_i(y_t, x_t, \alpha_i) = A_i(\alpha_i)'z(y_t, x_t) + B_i(\alpha_i, x_t), \tag{5.73}$$

where $z$ is an $n$-vector of surrogate variables.

Another instructive way to compare NLFI and BNL3S is to compare certain iterative methods to obtain the two estimators. By equating the right-hand side of (5.64) to zero and rearranging terms we can obtain the following iteration to obtain NLFI:

(5.74)

where

(5.75)

and $\hat G = \operatorname{diag}(\hat G_1, \hat G_2, \ldots, \hat G_n)$, and where all the variables that appear in the second term of the right-hand side of (5.74) are evaluated at $\hat\alpha_{(1)}$. The Gauss-Newton iteration to obtain BNL3S can be written as

(5.76)

where $\bar G_i = EG_i$ and $\bar G = \operatorname{diag}(\bar G_1, \bar G_2, \ldots, \bar G_n)$ as before. Thus, we see that the only difference between (5.74) and (5.76) is in the respective "instrumental variables" used in the formulae. Note that $\hat G_i$ defined in (5.75) can work as a proper set of "instrumental variables" (that is, variables uncorrelated with $u_t$) only if $u_t$ satisfies the condition of the aforementioned lemma, whereas $\bar G_i$ is always a proper set of instrumental variables, which implies that BNL3S is more robust than NLFI. If $u_t$ is normal, however, $\hat G_i$ catches more of the part of $G_i$ uncorrelated with $u_t$ than $\bar G_i$ does, which implies that
NLFI is more efficient than BNL3S under normality.

Note that (5.74) is a generalization of the formula expounded by Hausman (1975) for the linear case. Unlike in the linear case, however, the iteration defined by (5.74) does not have the property that $\hat{\alpha}_{(2)}$ is asymptotically equivalent to NLFI when $\hat{\alpha}_{(1)}$ is consistent. Therefore, its main value may be pedagogical, and it may not be recommendable in practice.

5.5.3 Computation of NLFI

The discussion of the computation of NLFI preceded the theoretical discussion of its statistical properties by more than ten years. The first paper on computation was by Eisenpress and Greenstadt (1966), who proposed a modified Newton-Raphson iteration; their modification is of the kind that combines both (2.28) and (2.29). Chow (1973) differs from these two authors essentially in that he obtained simpler formulae by assuming that different parameters appear in different equations, as in (5.48). I have already mentioned the iterative method considered by Amemiya (1977), mainly for a pedagogical purpose. Dagenais (1978) modified my iteration to speed up convergence and, in certain examples of non-linear models, compared it with a Newton-Raphson method due to Chow and Fair (1973) and with the DFP iteration mentioned in Section 2.3.1. The results are inconclusive. Belsley (1979) compared the computation speed of the DFP iteration in computing NLFI and NL3S in five models of varying degrees of complexity and found that NL3S was three to ten times faster. Nevertheless, Belsley shows that the computation of NLFI is quite feasible and can be improved by using a more suitable algorithm and by using the approximation of the Jacobian due to Fair; see eq. (5.79) below.

5.5.4 Other related papers

Fair and Parke (1980) estimated Fair's (1976) macro model (97 equations, 29 of which are stochastic, with 182 parameters including 12 first-order autoregressive coefficients), which is non-linear in variables as well as in parameters (this latter non-linearity
is caused by the transformation that takes account of the first-order autoregression of the errors), by OLS, SNL2S, the Jorgenson-Laffont NL3S, and NLFI. The latter two estimators are calculated by a derivative-free algorithm due to Parke. This algorithm for NLFI uses the following approximation of the Jacobian term:

$$\sum_{t=1}^{T} \log |J_t| \cong \frac{T}{N} \sum_{i=1}^{N} \log |J_{t_i}|, \qquad (5.79)$$

where $J_t = \partial f_t / \partial y_t'$, $N$ is a small integer, and $t_1, t_2, \ldots, t_N$ are equally spaced between 1 and $T$. Fair finds that in terms of predictive accuracy there is not much difference among the estimators, but in terms of policy response OLS is set apart from the rest.

Bianchi and Calzolari (1980) propose a method by which one can calculate the mean squared prediction error matrix of a vector predictor based on any estimator of the non-linear simultaneous equations model. Suppose the structural equations can be written as $f(y_p, x_p, \alpha) = u_p$ at the prediction period $p$, and that we can solve them for $y_p$ as $y_p = g(x_p, \alpha, u_p)$. Define the predictor $\hat{y}_p$ based on the estimator $\hat{\alpha}$ by $\hat{y}_p = g(x_p, \hat{\alpha}, 0)$. (Note that $y_p$ is an $n$-vector.)
Then we have

$$E(y_p - \hat{y}_p)(y_p - \hat{y}_p)' \cong E[g(x_p, \alpha, u_p) - g(x_p, \alpha, 0)][g(x_p, \alpha, u_p) - g(x_p, \alpha, 0)]' + E[g(x_p, \alpha, 0) - g(x_p, \hat{\alpha}, 0)][g(x_p, \alpha, 0) - g(x_p, \hat{\alpha}, 0)]' \equiv A_1 + A_2.$$

The authors suggest that $A_1$ be evaluated by simulation. As for $A_2$, we can easily obtain its asymptotic value from the knowledge of the asymptotic distribution of $\hat{\alpha}$.

Hatanaka (1978) considers a simultaneous equations model that is non-linear only in variables. Such a model can be written as $F(Y, X)\Gamma + XB = U$. Define $\hat{Y}$ by $F(\hat{Y}, X)\hat{\Gamma} + X\hat{B} = 0$, where $\hat{\Gamma}$ and $\hat{B}$ are the OLS estimates. Hatanaka then proposes using $F(\hat{Y}, X)$ as the instruments to calculate 3SLS. He proposes the method-of-scoring iteration to calculate NLFI, where the iteration is started at the aforementioned 3SLS estimator. He also proves the consistency and the asymptotic normality of NLFI and obtains its asymptotic covariance matrix [which can also be obtained from Amemiya (1977) by an appropriate reduction].

References

Akaike, H. (1973) “Information Theory and an Extension of the Maximum Likelihood Principle”, in: B. N. Petrov and F. Csaki (eds.), Second International Symposium on Information Theory. Budapest: Akademiai Kiado, pp. 267-281.
Amemiya, T. (1973a) “Generalized Least Squares with an Estimated Autocovariance Matrix”, Econometrica, 41, 723-732.
Amemiya, T. (1973b) “Regression Analysis When the Dependent Variable Is Truncated Normal”, Econometrica, 41, 997-1016.
Amemiya, T. (1974) “The Nonlinear Two-Stage Least-Squares Estimator”, Journal of Econometrics, 2, 105-110.
Amemiya, T. (1975) “The Nonlinear Limited-Information Maximum-Likelihood Estimator and the Modified Nonlinear Two-Stage Least-Squares Estimator”, Journal of Econometrics, 3, 375-386.
Amemiya, T. (1976) “Estimation in Nonlinear Simultaneous Equation Models”, paper presented at the Institut National de la Statistique et des Etudes Economiques, Paris, March 10, and published in French in: E. Malinvaud (ed.), Cahiers du Séminaire d'Econométrie, no. 19 (1978).
Amemiya, T. (1977) “The Maximum Likelihood and the Nonlinear Three-Stage Least Squares Estimator in the General Nonlinear Simultaneous Equation Model”, Econometrica, 45, 955-968.
Amemiya, T. (1980) “Lecture Notes in Advanced Theory of Econometrics”, Department of Economics, Stanford University.
Amemiya, T. (1981) “Correction to a Lemma”, forthcoming in Econometrica.
Amemiya, T. and W. A. Fuller (1967) “A Comparative Study of Alternative Estimators in a Distributed Lag Model”, Econometrica, 35, 509-529.
Amemiya, T. and J. L. Powell (1980) “A Comparison of the Box-Cox Maximum Likelihood Estimator and the Nonlinear Two Stage Least Squares Estimator”, Technical Report No. 322, August, Institute for Mathematical Studies in the Social Sciences, Stanford University.
Anderson, T. W. (1958) An Introduction to Multivariate Statistical Analysis. New York: John Wiley & Sons.
Anderson, T. W. (1971) The Statistical Analysis of Time Series. New York: John Wiley & Sons.
Arrow, K. J., H. B. Chenery, B. S. Minhas and R. M. Solow (1961) “Capital-Labor Substitution and Economic Efficiency”, Review of Economics and Statistics, 43, 225-250.
Bard, Y. (1974) Nonlinear Parameter Estimation. New York: Academic Press.
Bates, D. M. and D. G. Watts (1980) “Relative Curvature Measures of Nonlinearity”, Journal of the Royal Statistical Society, Ser. B, 42, 1-25 (with discussion).
Beale, E. M. L. (1960) “Confidence Regions in Non-Linear Estimation”, Journal of the Royal Statistical Society, Ser. B, 22, 41-88 (with discussion).
Beauchamp, J. J. and R. G. Cornell (1966) “Simultaneous Nonlinear Estimation”, Technometrics, 8, 319-326.
Belsley, D. A. (1979) “On the Computational Competitiveness of Full-Information Maximum-Likelihood and Three-Stage Least-Squares in the Estimation of Nonlinear Simultaneous-Equations Models”, Journal of Econometrics, 9, 315-342.
Berndt, E. R., W. E. Diewert and M. N. Darrough (1977) “Flexible Functional Forms and Expenditure Distributions: An Application to Canadian Consumer Demand Functions”, International Economic Review, 18, 651-676.
Berndt, E. R., B. H. Hall, R. E. Hall and J. A. Hausman (1974) “Estimation and Inference in Nonlinear Structural Models”, Annals of Economic and Social Measurement, 3, 653-666.
Berndt, E. R. and N. E. Savin (1977) “Conflict Among Criteria for Testing Hypotheses in the Multivariate Linear Regression Model”, Econometrica, 45, 1263-1278.
Bianchi, C. and G. Calzolari (1980) “The One-Period Forecast Error in Non-linear Econometric Models”, International Economic Review, 21, 201-208.
Bodkin, R. G. and L. R. Klein (1967) “Nonlinear Estimation of Aggregate Production Functions”, Review of Economics and Statistics, 49, 28-44.
Box, G. E. P. and D. R. Cox (1964) “An Analysis of Transformations”, Journal of the Royal Statistical Society, Ser. B, 26, 211-252 (with discussion).
Brown, M. and D. Heien (1972) “The S-Branch Utility Tree: A Generalization of the Linear Expenditure System”, Econometrica, 40, 737-747.
Bunke, H., K. Henschke, R. Strüby and C. Wisotzki (1977) “Parameter Estimation in Nonlinear Regression Models”, Mathematische Operationsforschung und Statistik, Series Statistics, 8, 23-40.
Charatsis, E. G. (1971) “A Computer Program for Estimation of the Constant Elasticity of Substitution Production Function”, Applied Statistics, 20, 286-296.
Chow, G. C. (1973) “On the Computation of Full-Information Maximum Likelihood Estimates for Nonlinear Equation Systems”, Review of Economics and Statistics, 55, 104-109.
Chow, G. C. and R. C. Fair (1973) “Maximum Likelihood Estimation of Linear Equation Systems with Auto-Regressive Residuals”, Annals of Economic and Social Measurement, 2, 17-28.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1975) “Transcendental Logarithmic Utility Functions”, American Economic Review, 65, 367-383.
Cox, D. R. (1961) “Tests of Separate Families of Hypotheses”, in: J. Neyman (ed.), Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. I. Berkeley: University of California Press, pp. 105-123.
Cox, D. R. (1962) “Further Results on Tests of Separate Families of Hypotheses”, Journal of the Royal Statistical Society, Ser. B, 24, 406-424.
Dagenais, M. G. (1978) “The Computation of FIML Estimates as Iterative Generalized Least Squares Estimates in Linear and Nonlinear Simultaneous Equations Models”, Econometrica, 46, 1351-1362.
Darrough, M. N. (1977) “A Model of Consumption and Leisure in an Intertemporal Framework: A Systematic Treatment Using Japanese Data”, International Economic Review, 18, 677-696.
Davidon, W. C. (1959) “Variable Metric Method for Minimization”, AEC Research Development Report, ANL-5990.
Deaton, A. S. (1974) “The Analysis of Consumer Demand in the United Kingdom, 1900-1970”, Econometrica, 42, 341-368.
Diewert, W. E. (1973) “Separability and a Generalization of the Cobb-Douglas Cost, Production, and Indirect Utility Functions”, Technical Report 86, January, Institute for Mathematical Studies in the Social Sciences, Stanford University.
Diewert, W. E. (1974) “Applications of Duality Theory”, in: M. Intriligator and D. Kendrick (eds.), Frontiers of Quantitative Economics, vol. II. Amsterdam: North-Holland Publishing Co., pp. 106-171.
Draper, N. R. and H. Smith (1966) Applied Regression Analysis. New York: John Wiley & Sons.
Edgerton, D. L. (1972) “Some Properties of Two Stage Least Squares as Applied to Non-Linear Models”, International Economic Review, 13, 26-32.
Eisenpress, H. and J. Greenstadt (1966) “The Estimation of Nonlinear Econometric Systems”, Econometrica, 34, 851-861.
Fair, R. C. (1976) A Model of Macroeconomic Activity, Vol. II: The Empirical Model. Cambridge, Mass.: Ballinger.
Fair, R. C. and W. R. Parke (1980) “Full-Information Estimates of a Non-linear Macroeconometric Model”, Journal of Econometrics, 13, 269-291.
Fisher, F. M. (1966) The Identification Problem in Econometrics. New York: McGraw-Hill.
Fletcher, R. and M. J. D. Powell (1963) “A Rapidly Convergent Descent Method for Minimization”, Computer Journal, 6, 163-168.
Gale, D. and H. Nikaido (1965) “The Jacobian Matrix and Global Univalence of Mappings”, Mathematische Annalen, 159, 81-93.
Gallant, A. R. (1975a) “Nonlinear Regression”, The American Statistician, 29, 73-81.
Gallant, A. R. (1975b) “The Power of the Likelihood Ratio Test of Location in Nonlinear Regression Models”, Journal of the American Statistical Association, 70, 198-203.
Gallant, A. R. (1975c) “Testing a Subset of the Parameters of a Nonlinear Regression Model”, Journal of the American Statistical Association, 70, 927-932.
Gallant, A. R. (1975d) “Seemingly Unrelated Nonlinear Regression”, Journal of Econometrics, 3, 35-50.
Gallant, A. R. (1977) “Three-Stage Least-Squares Estimation for a System of Simultaneous, Nonlinear, Implicit Equations”, Journal of Econometrics, 5, 71-88.
Gallant, A. R. and J. J. Goebel (1976) “Nonlinear Regression with Autocorrelated Errors”, Journal of the American Statistical Association, 71, 961-967.
Gallant, A. R. and A. Holly (1980) “Statistical Inference in an Implicit, Nonlinear, Simultaneous Equation Model in the Context of Maximum Likelihood Estimation”, Econometrica, 48, 697-720.
Gallant, A. R. and D. W. Jorgenson (1979) “Statistical Inference for a System of Simultaneous, Non-Linear, Implicit Equations in the Context of Instrumental Variable Estimation”, Journal of Econometrics, 11, 275-302.
Glasbey, C. A. (1979) “Correlated Residuals in Non-Linear Regression Applied to Growth Data”, Applied Statistics, 28, 251-259.
Guttman, I. and D. A. Meeter (1965) “On Beale's Measures of Non-Linearity”, Technometrics, 7, 623-637.
Goldfeld, S. M. and R. E. Quandt (1968) “Nonlinear Simultaneous Equations: Estimation and Prediction”, International Economic Review, 9, 113-136.
Goldfeld, S. M. and R. E. Quandt (1972) Nonlinear Methods in Econometrics. Amsterdam: North-Holland Publishing Co.
Goldfeld, S. M., R. E. Quandt and H. F. Trotter (1966) “Maximization by Quadratic Hill-Climbing”, Econometrica, 34, 541-551.
Haessel, W. (1976) “Demand for Agricultural Commodities in Ghana: An Application of Nonlinear Two-Stage Least Squares with Prior Information”, American Journal of Agricultural Economics, 58, 341-345.
Hannan, E. J. (1963) “Regression for Time Series”, in: M. Rosenblatt (ed.), Time Series Analysis. New York: John Wiley & Sons, pp. 17-37.
Hannan, E. J. (1971) “Non-Linear Time Series Regression”, Journal of Applied Probability, 8, 767-780.
Hartley, H. (1961) “The Modified Gauss-Newton Method for the Fitting of Non-Linear Regression Functions by Least Squares”, Technometrics, 3, 269-280.
Hartley, H. (1964) “Exact Confidence Regions for the Parameters in Non-Linear Regression Laws”, Biometrika, 51, 347-353.
Hartley, H. and A. Booker (1965) “Non-Linear Least Squares Estimation”, Annals of Mathematical Statistics, 36, 638-650.
Hatanaka, M. (1978) “On the Efficient Estimation Methods for the Macro-Economic Models Nonlinear in Variables”, Journal of Econometrics, 8, 323-356.
Hausman, J. A. (1975) “An Instrumental Variable Approach to Full Information Estimators for Linear and Certain Nonlinear Econometric Models”, Econometrica, 43, 727-738.
Hildreth, C. and J. P. Houck (1968) “Some Estimators for a Linear Model with Random Coefficients”, Journal of the American Statistical Association, 63, 584-595.
Hoadley, B. (1971) “Asymptotic Properties of Maximum Likelihood Estimators for the Independent Not Identically Distributed Case”, Annals of Mathematical Statistics, 42, 1977-1991.
Howe, H., R. A. Pollak and T. J. Wales (1979) “Theory and Time Series Estimation of the Quadratic Expenditure System”, Econometrica, 47, 1231-1248.
Hurd, M. D. (1974) “Wage Changes, Desired Manhours and Unemployment”, Memorandum No. 155 (Revised), October, Center for Research in Economic Growth, Stanford University.
Jennrich, R. I. (1969) “Asymptotic Properties of Non-linear Least Squares Estimation”, Annals of Mathematical Statistics, 40, 633-643.
Jorgenson, D. W. and J. Laffont (1974) “Efficient Estimation of Nonlinear Simultaneous Equations with Additive Disturbances”, Annals of Economic and Social Measurement, 3, 615-640.
Jorgenson, D. W. and L. J. Lau (1975) “The Structure of Consumer Preferences”, Annals of Economic and Social Measurement, 4, 49-101.
Jorgenson, D. W. and L. J. Lau (1978) “Testing the Integrability of Consumer Demand Functions, United States, 1947-1971”, mimeo.
Judge, G. G., W. E. Griffiths, R. C. Hill and T. C. Lee (1980) The Theory and Practice of Econometrics. New York: John Wiley & Sons.
Just, R. E. and R. D. Pope (1978) “Stochastic Specification of Production Functions and Economic Implications”, Journal of Econometrics, 7, 67-86.
Kelejian, H. H. (1971) “Two-Stage Least Squares and Econometric Systems Linear in Parameters but Nonlinear in the Endogenous Variables”, Journal of the American Statistical Association, 66, 373-374.
Kelejian, H. H. (1974) “Efficient Instrumental Variable Estimation of Large Scale Nonlinear Econometric Models”, mimeo.
Kelejian, H. H. (1975) “Nonlinear Systems and Non-Unique Solutions: Some Results Concerning Estimation”, mimeo., May (Revised).
MacKinnon, J. G. (1976) “Estimating the Linear Expenditure System and Its Generalizations”, in: S. M. Goldfeld and R. E. Quandt (eds.), Studies in Nonlinear Estimation. Cambridge, Mass.: Ballinger, pp. 143-166.
MaCurdy, T. E. (1980) “An Intertemporal Analysis of Taxation and Work Disincentives”, Working Papers in Economics no. E-80-4, The Hoover Institution, Stanford University.
Malinvaud, E. (1970a) “The Consistency of Nonlinear Regressions”, Annals of Mathematical Statistics, 41, 956-969.
Malinvaud, E. (1970b) Statistical Methods of Econometrics (2nd rev. edn.). Amsterdam: North-Holland Publishing Co.
Marquardt, D. W. (1963) “An Algorithm for Least Squares Estimation of Nonlinear Parameters”, Journal of the Society for Industrial and Applied Mathematics, 11, 431-441.
Mizon, G. E. (1977) “Inference Procedures in Nonlinear Models: An Application in a UK Industrial Cross Section Study of Factor Substitution and Returns to Scale”, Econometrica, 45, 1221-1242.
Phillips, P. C. B. (1976) “The Iterated Minimum Distance Estimator and the Quasi-Maximum Likelihood Estimator”, Econometrica, 44, 449-460.
Phillips, P. C. B. (1981) “On the Consistency of Non-Linear FIML”, mimeo.
Poirier, D. J. (1976) The Econometrics of Structural Change. Amsterdam: North-Holland Publishing Co.
Powell, M. J. D. (1964) “An Efficient Method for Finding the Minimum of a Function of Several Variables Without Calculating Derivatives”, Computer Journal, 7, 155-162.
Priestley, M. B. (1978) “Non-Linear Models in Time Series Analysis”, The Statistician, 27, 159-176.
Quandt, R. E. (1975) “A Note on Amemiya's Nonlinear Two-Stage Least Squares Estimators”, Research Memorandum no. 178, May, Econometric Research Program, Princeton University.
Rao, C. R. (1947) “Large Sample Tests of Statistical Hypotheses Concerning Several Parameters with Applications to Problems of Estimation”, Proceedings of the Cambridge Philosophical Society, 44, 50-57.
Rao, C. R. (1973) Linear Statistical Inference and Its Applications (2nd edn.). New York: John Wiley & Sons.
Revankar, N. S. (1971) “A Class of Variable Elasticity of Substitution Production Functions”, Econometrica, 39, 61-71.
Rice, P. and V. K. Smith (1977) “An Econometric Model of the Petroleum Industry”, Journal of Econometrics, 263-288.
Robinson, P. M. (1972) “Non-Linear Regression for Multiple Time-Series”, Journal of Applied Probability, 9, 758-768.
Sargent, T. J. (1978) “Estimation of Dynamic Labor Demand Schedules Under Rational Expectations”, Journal of Political Economy, 86, 1009-1044.
Silvey, S. D. (1959) “The Lagrangian Multiplier Test”, Annals of Mathematical Statistics, 30, 389-407.
Stein, C. (1973) “Estimation of the Mean of a Multivariate Normal Distribution”, Technical Report no. 48, June 26, Department of Statistics, Stanford University.
Strickland, A. D. and L. W. Weiss (1976) “Advertising, Concentration, and Price-Cost Margins”, Journal of Political Economy, 84, 1109-1121.
Theil, H. (1953) “Repeated Least-Squares Applied to Complete Equation Systems”, mimeo. The Hague: Central Planning Bureau.
Theil, H. (1971) Principles of Econometrics. New York: John Wiley & Sons.
Tornheim, L. (1963) “Convergence in Nonlinear Regression”, Technometrics, 5, 513-514.
Tsurumi, H. (1970) “Nonlinear Two-Stage Least Squares Estimation of CES Production Functions Applied to the Canadian Manufacturing Industries”, Review of Economics and Statistics, 52, 200-207.
Wald, A. (1943) “Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large”, Transactions of the American Mathematical Society, 54, 426-482.
Wald, A. (1949) “Note on the Consistency of the Maximum Likelihood Estimate”, Annals of Mathematical Statistics, 20, 595-601.
White, H. (1980a) “Nonlinear Regression on Cross-Section Data”, Econometrica, 48, 721-746.
White, H. (1980b) “A Note on Normality and the Consistency of the Nonlinear Simultaneous Equations Maximum Likelihood Estimator”, May, mimeo.
Wolter, K. M. and W. A. Fuller (1978) “Estimation of Nonlinear Errors-in-Variables Models”, mimeo.
Wu, C. F. (1981) “Asymptotic Theory of Nonlinear Least Squares Estimation”, Annals of Statistics, 9, 501-513.
Zellner, A. (1962) “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias”, Journal of the American Statistical Association, 57, 348-368.
Zellner, A., D. S. Huang and L. C. Chau (1965) “Further Analysis of the Short-Run Consumption Function with Emphasis on the Role of Liquid Assets”, Econometrica, 33, 571-581.
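The Fair-Parke treatment of the Jacobian term in NLFI, discussed in Section 5.5.4, can be illustrated with a short Python sketch using NumPy. It assumes that the approximation replaces $\sum_{t=1}^{T}\log|J_t|$ by $(T/N)\sum_{i=1}^{N}\log|J_{t_i}|$, with $t_1, \ldots, t_N$ equally spaced between 1 and $T$. The function names, the rounding of the equally spaced periods to integer indices, and the two-equation Jacobian in the example are hypothetical and are not taken from Parke's algorithm.

```python
import numpy as np


def exact_log_jacobian(jacobians):
    """Return sum_t log|det J_t| for an array of Jacobians of shape (T, n, n)."""
    # slogdet works on stacked matrices and returns (sign, log|det|).
    _, logabsdet = np.linalg.slogdet(jacobians)
    return float(np.sum(logabsdet))


def fair_parke_log_jacobian(jacobians, n_points):
    """Approximate sum_t log|det J_t| by (T/N) * sum_i log|det J_{t_i}|,
    where t_1, ..., t_N (N = n_points) are equally spaced from 1 to T."""
    T = jacobians.shape[0]
    # Equally spaced 0-based period indices, rounded to the nearest integer.
    idx = np.round(np.linspace(0, T - 1, n_points)).astype(int)
    _, logabsdet = np.linalg.slogdet(jacobians[idx])
    return float(T / n_points * np.sum(logabsdet))


if __name__ == "__main__":
    # Hypothetical two-equation system whose Jacobian depends on y_2t:
    # J_t = [[1, 2*a*y_2t], [-b, 1]], so det J_t = 1 + 2*a*b*y_2t.
    rng = np.random.default_rng(0)
    T, a, b = 200, 0.3, 0.5
    y2 = rng.normal(size=T)
    J = np.empty((T, 2, 2))
    J[:, 0, 0] = 1.0
    J[:, 0, 1] = 2 * a * y2
    J[:, 1, 0] = -b
    J[:, 1, 1] = 1.0
    print("exact:", exact_log_jacobian(J))
    print("N = 10:", fair_parke_log_jacobian(J, n_points=10))
```

The saving is that only N determinants, rather than T, are computed at each evaluation of the likelihood; the sketch says nothing about how accurate the approximation is in any particular model.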
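The Bianchi-Calzolari calculation of the mean squared prediction error, discussed in Section 5.5.4, can likewise be sketched in Python. The sketch assumes that the mean squared prediction error matrix of $\hat{y}_p = g(x_p, \hat{\alpha}, 0)$ is approximated by $A_1 + A_2$. Here $A_1 = E[g(x_p,\alpha,u_p) - g(x_p,\alpha,0)][\cdot]'$ reflects the disturbance, and $A_2 = E[g(x_p,\alpha,0) - g(x_p,\hat{\alpha},0)][\cdot]'$ reflects estimation error. $A_1$ is evaluated by simulation, as the authors suggest. For $A_2$ the sketch uses the delta method, $DVD'$ with $D = \partial g(x_p,\alpha,0)/\partial\alpha'$ and $V$ the asymptotic covariance matrix of $\hat{\alpha}$. This is one way of using the asymptotic distribution of $\hat{\alpha}$ and not necessarily the authors' own. The solved form `g`, the normal disturbances, and all numerical values are hypothetical.

```python
import numpy as np


def g(x, alpha, u):
    """Hypothetical solved form y_p = g(x_p, alpha, u_p) of a two-equation
    model: y1 = exp(alpha_1 * x + u1), y2 = alpha_2 * y1 + u2."""
    y1 = np.exp(alpha[0] * x + u[..., 0])
    y2 = alpha[1] * y1 + u[..., 1]
    return np.stack([y1, y2], axis=-1)


def a1_by_simulation(x, alpha, sigma, n_draws=100_000, seed=0):
    """Monte Carlo estimate of A_1 = E[d d'], d = g(x, alpha, u) - g(x, alpha, 0),
    with u drawn from N(0, sigma); normality is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    n = sigma.shape[0]
    u = rng.multivariate_normal(np.zeros(n), sigma, size=n_draws)
    d = g(x, alpha, u) - g(x, alpha, np.zeros(n))
    return d.T @ d / n_draws


def a2_delta_method(x, alpha_hat, v_alpha, h=1e-6):
    """Delta-method value of A_2: D V D', where D = dg(x, alpha, 0)/dalpha' is
    taken by forward differences at alpha_hat and V = v_alpha is the
    (asymptotic) covariance matrix of alpha_hat."""
    u0 = np.zeros(2)
    base = g(x, alpha_hat, u0)
    D = np.empty((base.size, alpha_hat.size))
    for j in range(alpha_hat.size):
        step = np.zeros_like(alpha_hat)
        step[j] = h
        D[:, j] = (g(x, alpha_hat + step, u0) - base) / h
    return D @ v_alpha @ D.T


if __name__ == "__main__":
    x_p = 1.0
    alpha_hat = np.array([0.1, 0.5])
    sigma = np.array([[0.04, 0.0], [0.0, 0.09]])
    v_alpha = 0.01 * np.eye(2)  # hypothetical covariance of alpha_hat
    mse = a1_by_simulation(x_p, alpha_hat, sigma) + a2_delta_method(x_p, alpha_hat, v_alpha)
    print(mse)
```

In an application the unknown $\alpha$ and the disturbance covariance matrix would be replaced by their estimates, as is done in the example lines.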