CHAPTER 45

Flexible Functional Form

So far we have assumed that the mean of the dependent variable is a linear function of the explanatory variables. In this chapter, this assumption will be relaxed.

We first discuss the case where the explanatory variables are categorical. For categorical variables (gender, nationality, occupation, etc.), the concept of linearity does not make sense, and indeed it is customary to fit arbitrary numerical functions of these variables. One can do the same with numerical variables which assume only a limited number of values (such as the number of people in a household). As long as there are repeated observations for each level of these variables, it is possible to introduce a different dummy variable for every level, and in this way also allow arbitrary functions. Linear restrictions between the coefficients of these dummies can be interpreted as the selection of more restricted functional spaces.

45.1. Categorical Variables: Regression with Dummies and Factors

If the explanatory variables are categorical then it is customary to fit arbitrary functions of these variables. This can be done with the use of dummy variables, or by the use of variables coded as "factors." If there are more than two categories, you need several regressors taking only the values 0 and 1, which is why they are called "dummy variables." One regressor with several levels 0, 1, 2, etc. is too restrictive. The "factor" data type in R allows one to code several levels in one variable, which will automatically be expanded into a set of dummy variables. Therefore let us first discuss dummy variables.

If one has a categorical variable with $j$ possible outcomes, the simplest and most obvious thing to do would be to include $j$ regressors in the equation, each taking the value 1 if the observation has this level, and the value 0 otherwise. But if one does this, one has to leave the intercept out of the regression, otherwise one gets perfect multicollinearity. Usually in practice one keeps the intercept and omits one of the dummy variables. This makes it a little more difficult to interpret the dummy variables.
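To make this concrete, here is a minimal R sketch (the data frame and variable names are invented for illustration): it codes a three-level categorical variable both by hand and as a factor, and shows why keeping all $j$ dummies forces the intercept out.

    # Hypothetical data: a three-level categorical variable and a response
    set.seed(1)
    d <- data.frame(
      group = factor(rep(c("a", "b", "c"), each = 10)),
      y     = rnorm(30)
    )

    # Coding the dummies by hand: one 0/1 regressor per omitted level
    d$db <- as.numeric(d$group == "b")
    d$dc <- as.numeric(d$group == "c")

    # With an intercept, one level ("a") must be omitted; its mean becomes
    # the intercept, and each dummy coefficient is a difference from it.
    coef(lm(y ~ db + dc, data = d))

    # The factor version produces the same fit automatically:
    coef(lm(y ~ group, data = d))

    # All j dummies at once are only possible without the intercept,
    # since the dummy columns sum to the intercept column (perfect
    # multicollinearity). Then each coefficient is simply a group mean:
    coef(lm(y ~ group - 1, data = d))

    # model.matrix shows the dummy expansion R performs internally:
    head(model.matrix(~ group, data = d))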
Problem 441. In the intermediate econometrics textbook [WW79], the following regression line is estimated:

(45.1.1)    $b_t = 0.13 + 0.068\, y_t + 0.23\, w_t + \hat\varepsilon_t$,

where $b_t$ is the public purchase of Canadian government bonds (in billion \$), $y_t$ is the national income, and $w_t$ is a dummy variable with the value $w_t = 1$ for the war years 1940–45, and zero otherwise.

• a. 1 point This equation represents two regression lines, one for peace and one for war, both of which have the same slope, but which have different intercepts. What is the intercept of the peace time regression, and what is that of the war time regression line?

Answer. In peace, $w_t = 0$, therefore the regression reads $b_t = 0.13 + 0.068\, y_t + \hat\varepsilon_t$, so the intercept is 0.13. In war, $w_t = 1$, therefore $b_t = 0.13 + 0.068\, y_t + 0.23 + \hat\varepsilon_t$, so the intercept is $0.13 + 0.23 = 0.36$.

• b. 1 point What would the estimated equation have been if, instead of $w_t$, they had used a variable $p_t$ with the values $p_t = 0$ during the war years, and $p_t = 1$ otherwise? (Hint: the coefficient of $p_t$ will be negative, because the intercept in peace times is below the intercept in war times.)

Answer. Now the intercept of the whole equation is the intercept of the war regression line, which is 0.36, and the coefficient of $p_t$ is the difference between the peace and war intercepts, which is $-0.23$:

(45.1.2)    $b_t = 0.36 + 0.068\, y_t - 0.23\, p_t + \hat\varepsilon_t$.

• c. 1 point What would the estimated equation have been if they had thrown in both $w_t$ and $p_t$, but left out the intercept term?

Answer. Now the coefficient of $w_t$ is the intercept in the war years, which is 0.36, and the coefficient of $p_t$ is the intercept in the peace years, which is 0.13:

(45.1.3)    $b_t = 0.36\, w_t + 0.13\, p_t + 0.068\, y_t + \hat\varepsilon_t$.

• d. 2 points What would the estimated equation have been if bond sales and income had been measured in millions of dollars instead of billions of dollars? (1 billion = 1000 million.)

Answer. From $b_t = 0.13 + 0.068\, y_t + 0.23\, w_t + \hat\varepsilon_t$ follows $1000\, b_t = 130 + 0.068 \cdot 1000\, y_t + 230\, w_t + 1000\, \hat\varepsilon_t$, or

(45.1.4)    $b^{(m)}_t = 130 + 0.068\, y^{(m)}_t + 230\, w_t + \hat\varepsilon^{(m)}_t$,

where $b^{(m)}_t$ is bond sales in millions (i.e., $b^{(m)}_t = 1000\, b_t$), and $y^{(m)}_t$ is national income in millions (i.e., $y^{(m)}_t = 1000\, y_t$).

Problem 442. 5 points Assume you run a time series regression $y = X\beta + \varepsilon$, but you have reason to believe that the values of the parameter $\beta$ are not equal in all time periods $t$. What would you do?

Answer. Include dummies, run separate regressions for subperiods, or use a varying-parameter model. There are various ways to set it up. Threshold effects might be represented by the following dummies:

(45.1.5)    $\begin{bmatrix} \iota & o & o & o \\ \iota & \iota & o & o \\ \iota & \iota & \iota & o \\ \iota & \iota & \iota & \iota \end{bmatrix}$

Each dummy switches from $o$ to $\iota$ at a later threshold, so its coefficient measures the additional shift occurring at that threshold.

In the example in Problem 441, the slope of the numerical variable does not change with the levels of the categorical variable; in other words, there is no interaction between those variables, but each variable makes a separate contribution to the response variable. The presence of interaction can be modeled by including products of the dummy variables with the explanatory variables with which interaction exists.

How do you know the interpretation of the coefficients of a given set of dummies? Write the equation for every category separately. E.g. [Gre97, p. 383]: Winter $y = \beta_1 + \beta_5 x$, Spring $y = \beta_1 + \beta_2 + \beta_5 x$, Summer $y = \beta_1 + \beta_3 + \beta_5 x$, Autumn $y = \beta_1 + \beta_4 + \beta_5 x$. I.e., the overall intercept $\beta_1$ is the intercept in Winter, the coefficient of the first seasonal dummy $\beta_2$ is the difference between Spring and Winter, that of the second dummy $\beta_3$ the difference between Summer and Winter, and $\beta_4$ the difference between Autumn and Winter. If the slope differs too, use the design matrix

(45.1.6)    $\begin{bmatrix} \iota & o & x_1 & o \\ \iota & \iota & x_2 & x_2 \end{bmatrix} = \begin{bmatrix} \iota & d & x & d * x \end{bmatrix}$

where the observations are partitioned into the two groups, $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, $d$ is the dummy which is $o$ in the first group and $\iota$ in the second, and $*$ denotes the Hadamard product of two matrices (their element-wise multiplication). This last term is called an interaction term. An R sketch of this follows below.

An alternative to using dummy variables is to use factor variables. If one includes a factor variable in a regression formula, the statistical package converts it into a set of dummies. Look at Section 22.5 for an example of how to use factor variables instead of dummies in R.
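The seasonal example above can be reproduced in R (a sketch with simulated data; all variable names are invented): with the default treatment contrasts, lm(y ~ season + x) yields exactly the parametrization $\beta_1, \ldots, \beta_5$ described above, and season * x adds interaction columns, which are the Hadamard products $d * x$ of (45.1.6).

    # Simulated seasonal data (hypothetical, for illustration only)
    set.seed(2)
    n <- 120
    season <- factor(rep(c("Winter", "Spring", "Summer", "Autumn"), length.out = n),
                     levels = c("Winter", "Spring", "Summer", "Autumn"))
    x <- runif(n)
    y <- 1 + 0.5 * (season == "Spring") + 2 * x + rnorm(n, sd = 0.1)

    # Common slope: the intercept is the Winter intercept; the seasonal
    # coefficients are the differences from Winter (beta_2, beta_3, beta_4)
    coef(lm(y ~ season + x))

    # Different slopes: the season:x coefficients are the slope
    # differences from Winter; the corresponding design-matrix columns
    # are the element-wise products of each dummy with x
    fit <- lm(y ~ season * x)
    coef(fit)
    head(model.matrix(fit))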
45.2. Flexible Functional Form for Numerical Variables

Here the issue is: how does one find the right transformation of the explanatory variables before running the regression? Each of the methods to be discussed has a smoothing parameter.

To fix notation, assume for now that only one explanatory variable $x$ is given and you want to estimate the model $y = f(x) + \varepsilon$ with the usual assumption $\varepsilon \sim (o, \sigma^2 I)$. But whereas the regression model specified that $f$ is an affine function, we allow $f$ to be an element of an appropriate larger function space. The size of this space is characterized by a so-called smoothing parameter.

45.2.1. Polynomial Regression. The most frequently used method is polynomial regression, i.e., one chooses $f$ to be a polynomial of order $m$ (i.e., it has $m$ terms, including the constant term) or degree $m-1$ (i.e., the highest power is $x^{m-1}$):

$f(x) = \theta_0 + \theta_1 x + \cdots + \theta_{m-1} x^{m-1}.$

Motivation: This is a seamless generalization of ordinary least squares, since affine functions are exactly polynomials of degree 1 (order 2). Taylor's theorem says that any $f \in W^m[a,b]$ can be approximated by a polynomial of order $m$ (degree $m-1$) plus a remainder term which can be written as an integral involving the $m$th derivative, see [Eub88, (3.5) on p. 90]. The Weierstrass Approximation Theorem says that any continuous function over a closed and bounded interval can be uniformly approximated by polynomials of sufficiently high degree. Here one has to decide what degree to use; the degree of the polynomial plays the role of the smoothing parameter.

Some practical hints: For higher-degree polynomials do not use the "power basis" $1, x, x^2, \ldots, x^{m-1}$; there are two reasonable alternatives. Either one can use Legendre polynomials [Eub88, (3.10) and (3.11) on p. 54], which are obtained from the power basis by Gram-Schmidt orthonormalization over the interval $[a, b]$. This does not make the design matrix orthogonal, but at least one should expect it not to be too ill-conditioned, and the roots and the general shape of Legendre polynomials are well understood. As the second main choice one may select polynomials that make the design matrix itself exactly orthonormal. The Splus-function poly does that.

The $j$th Legendre polynomial has exactly $j$ real roots in the interval [Dav75, Chapter X], [Sze59, Chapter III]. The orthogonal polynomials probably have a similar property. This gives another justification for using polynomial regression, which is similar to the justification one sometimes reads for using Fourier series: the data have high-frequency and low-frequency components, and one wants to extract the low-frequency components.

In practice, polynomials do not always give a good fit. There are better alternatives available, which will be discussed in turn.

45.2.2. The Box-Cox Transformation. An early attempt used in econometrics was to use a family of functions which is not as complete as the polynomials but which encompasses many functional forms encountered in economics. These functions are only defined for $x > 0$ and have the form

(45.2.1)    $B(x, \lambda) = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln(x) & \text{if } \lambda = 0 \end{cases}$

[DM93, p. 484] has a plot with the curves for $\lambda = 1.5, 1, 0.5, 0, -0.5$, and $-1$. They point out a serious disadvantage of this transformation: if $\lambda \neq 0$, $B(x, \lambda)$ is bounded either from below or from above. For $\lambda < 0$, $B(x, \lambda)$ cannot be greater than $-1/\lambda$, and for $\lambda > 0$, it cannot be less than $-1/\lambda$. About the Box-Cox transformation read [Gre97, 10.4].
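The following R sketch shows both ideas on simulated data (the data are invented for illustration): poly(x, degree) builds the orthogonal-polynomial design matrix mentioned above, and a small hand-written function implements the Box-Cox family (45.2.1).

    # Simulated positive regressor and curved response (for illustration)
    set.seed(3)
    x <- sort(runif(100, 0.1, 5))
    y <- log(x) + rnorm(100, sd = 0.2)

    # Orthogonal polynomials: poly() makes the design-matrix columns
    # orthonormal, avoiding the ill-conditioning of the power basis
    fit_raw  <- lm(y ~ x + I(x^2) + I(x^3))   # power basis
    fit_orth <- lm(y ~ poly(x, degree = 3))   # orthogonal basis
    # Same column space, hence identical fitted values:
    max(abs(fitted(fit_raw) - fitted(fit_orth)))

    # The Box-Cox family (45.2.1), defined for x > 0
    box_cox <- function(x, lambda) {
      if (lambda == 0) log(x) else (x^lambda - 1) / lambda
    }
    # For lambda < 0 the transform is bounded above by -1/lambda:
    max(box_cox(seq(0.01, 100, by = 0.01), -0.5))   # approaches 2 = -1/(-0.5)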
45.2.3. Kernel Estimates. For nonparametric estimates look at [Loa99], which comes with the R package locfit. Figure 1.1 there is a good example: it shows actuarial data which are roughly fitted by a straight line, but a better idea of the accelerations and decelerations can be very useful for a life insurance company. Chapter 1 gives a historical overview: Spencer's rule from 1904 was designed for computational convenience (for hand calculations), and it reproduces polynomials up to the 3rd degree. Figure 2.1 illustrates how local regression is done. Pp. 18/19: the emphasis is on fitted values, not on the parameter estimates. There are two important parameters: the bandwidth and the degree of the polynomial. To see the effects of bandwidth, see the plots on p. 21; using our data we can do plots of the sort

    plot(locfit(r~year,data=uslt,alpha=0.1,deg=3),get.data=T)

and then vary alpha and deg.

Problem 443. What kind of smoothing would be best for the time series of the variable r (profit rate) in dataset uslt?

Problem 444. Locally constant smooths are not good at the edges, and also not at the maxima and minima of the data. Why not?

The kernel estimator can be considered a local fit of a constant. Straight lines are better, and cubic parabolas even better; quadratic ones are not as good. The birth rate data which require smoothing with a varying bandwidth are interesting, see Simonoff p. 157, description in the text on p. 158.

45.2.4. Regression Splines. About the word "spline," [Wah90, p. vii] writes: "The mechanical spline is a thin reedlike strip that was used to draw curves needed in the fabrication of cross sections of ships' hulls. Ducks or weights were placed on the strip to force it to go through given points, and the free portion of the strip would assume a position in space that minimized the bending energy."

One of the drawbacks of polynomial regression is that its fit is global. One method to provide for local fits is to fit a piecewise polynomial. A spline is a piecewise [...]

[...] Here are some guidelines for how to choose knots, taken from [Eub88, p. 357]: For $m = 2$, linear splines, place knots at points where the data exhibit a change in slope. For $m = 3$, quadratic splines, locate knots near local maxima, minima, or inflection points in the data. For $m = 4$, cubic splines, arrange the knots so that they are close to inflection points in the data and so that not more than one extreme point (maximum or minimum) and one inflection point occur between any two knots. It is also possible to determine the number of knots and select their location so as to optimize the fit, but this is a hairy minimization problem; [Eub88, p. 362] gives some shortcuts. Extensions: Sometimes one wants knots which are not so smooth; this can be obtained by letting several knots coincide. Or one wants [...]

[...] If one starts with a cubic spline, i.e., a spline of order 4, and postulates in addition that the 2nd derivative is zero outside the boundary points, one obtains what is called a "natural cubic spline"; compare the Splus-function ns. There is exactly one natural spline going through $n$ datapoints. One has to choose the order and the location of the knots. The most popular are cubic splines, and higher orders do [...]

[...] One can specify here different univariate smoothing techniques for the individual variables and then combine it all into a joint fit by the method of back-substitution (usually called backfitting). Back-substitution is an iterative procedure by which one obtains the joint fit by an iteration involving only fits on one explanatory variable each. One starts with some initial set of functions $f_i^{(0)}$ and then, cycling through $j = 1, \ldots, k, 1, \ldots, k, \ldots$ [...]
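A minimal R sketch of this back-substitution cycle, assuming two explanatory variables and using smooth.spline as the univariate smoother (the data and the fixed number of sweeps are invented for illustration):

    # Simulated additive data: y = f1(x1) + f2(x2) + noise (hypothetical)
    set.seed(4)
    n  <- 200
    x1 <- runif(n); x2 <- runif(n)
    y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.1)

    # Back-substitution: start with f_j^(0) = 0 and cycle through j,
    # smoothing the partial residuals y minus the other components' fits
    f1 <- rep(0, n); f2 <- rep(0, n)
    alpha <- mean(y)
    for (sweep in 1:10) {
      f1 <- predict(smooth.spline(x1, y - alpha - f2), x1)$y
      f1 <- f1 - mean(f1)          # center so the intercept stays in alpha
      f2 <- predict(smooth.spline(x2, y - alpha - f1), x2)$y
      f2 <- f2 - mean(f2)
    }
    # Joint additive fit obtained from univariate smooths only:
    yhat <- alpha + f1 + f2

Each pass requires only one-dimensional smoothing, which is what makes additive models tractable in several variables; in practice one iterates until the component functions stop changing rather than for a fixed number of sweeps.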
[...] which means $f$ itself and its derivatives up to and including the $(m-1)$st derivative are absolutely continuous over a closed and bounded interval $[a, b]$, and the $m$th derivative is square integrable over $[a, b]$. If we allow such a general $f$, then the estimation criterion can no longer be the minimization of the sum of squared errors, because in this case one could simply choose an interpolant of the data [...]

[...] values are also continuous functions of time. Here are a few remarks about intrinsically linear functions, following [Gre97, (8.4) on p. 390]. Say $y$ is the vector of observations of the dependent variable, and $Z$ is the data matrix of the explanatory variables; for instance in the above example $Z$ would contain only $t$, $x$, and perhaps a categorical variable denoting the different age groups. In the above example [...]

[...] variables and the variables in the regression. [Gre97, Definition 8.1 on p. 396] says something about the relationship between the parameters of interest and the regression coefficients: if the $k$ regression coefficients $\beta_1, \ldots, \beta_k$ can be written as $k$ one-to-one, possibly nonlinear, functions of the $k$ underlying parameters $\theta_1, \ldots, \theta_k$, then the model is intrinsically linear in $\theta$. [Gre97, p. 391/2] brings the [...]

[...] maximal correlation again if one maximizes over functions $\varphi$ of one variable and $\theta$ of $k$ variables. In the case of joint normality, the above result generalizes to the following: the optimal $\varphi$ can be chosen to be the identity, and the optimal $\theta$ is linear; it can be chosen to be the best linear predictor. In the case of several [...]

[...] Secondly, there are situations in which the functions of $x$ and $y$ which have the highest correlation are not very interesting functions. Here is an example in which the function with the highest correlation may be uninteresting: if $y$ and one of the $x_i$ change sign together, one gets a correlation of 1 by predicting the sign of $y$ by the sign of $x_i$ and ignoring all other components of $x$. Here is another example in which the function with the highest correlation may be uninteresting. Let $\begin{bmatrix} x \\ y \end{bmatrix}$ be a mixture:

(46.1.1)    $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{cases} \begin{bmatrix} \bar x \\ \bar y \end{bmatrix} & \text{with probability } 1 - \alpha \\[1ex] \begin{bmatrix} \tilde x \\ \tilde y \end{bmatrix} & \text{with probability } \alpha \end{cases}$

where $\bar x$ and $\bar y$ are independent random variables which have density functions, while $\tilde x$ and $\tilde y$ are discrete, and let $D_x$ and $D_y$ be the supports [...]
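Since the preview breaks off here, the following is only a rough R sketch of the alternating idea behind these maximal-correlation (ACE-type) estimates for one explanatory variable, on invented data and with smooth.spline standing in for the conditional-expectation step; a tested implementation is in the CRAN package acepack, so treat this as illustration, not as the algorithm of the text.

    # Simulated data with a nonlinear relation (hypothetical)
    set.seed(5)
    n <- 300
    x <- runif(n, 0, 2)
    y <- exp(sin(2 * x) + rnorm(n, sd = 0.3))

    # Alternate: phi(x) <- E[theta(y) | x], theta(y) <- E[phi(x) | y],
    # renormalizing theta at every step to rule out the trivial
    # solution theta = phi = 0
    theta <- scale(y)[, 1]
    for (it in 1:20) {
      phi   <- predict(smooth.spline(x, theta), x)$y   # E[theta(y) | x]
      theta <- predict(smooth.spline(y, phi), y)$y     # E[phi(x) | y]
      theta <- (theta - mean(theta)) / sd(theta)       # normalize
    }
    cor(phi, theta)   # estimate of the maximal correlation

    # Pointer to a proper implementation (interface as documented in
    # the 'acepack' package, shown here unverified):
    # library(acepack); a <- ace(x, y); cor(a$tx, a$ty)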