A term in $x_t^{*2}$ can be cancelled from the numerator and denominator of (4A.29), and, recalling that $x_t^* = (x_t - \bar{x})$, this gives the variance of the slope coefficient as

$$\operatorname{var}(\hat{\beta}) = \frac{s^2}{\sum (x_t - \bar{x})^2} \qquad (4A.30)$$

so that the standard error can be obtained by taking the square root of (4A.30):

$$SE(\hat{\beta}) = s \sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} \qquad (4A.31)$$

Turning now to the derivation of the intercept standard error, this is much more difficult than that of the slope standard error. In fact, both are very much easier using matrix algebra, as shown in the following chapter. This derivation is therefore offered in summary form. It is possible to express $\hat{\alpha}$ as a function of the true $\alpha$ and of the disturbances, $u_t$:

$$\hat{\alpha} = \alpha + \sum u_t \left[ \frac{\sum x_t^2 - x_t \sum x_t}{T \sum x_t^2 - \left(\sum x_t\right)^2} \right] \qquad (4A.32)$$

Denoting all the elements in square brackets as $g_t$, (4A.32) can be written

$$\hat{\alpha} - \alpha = \sum u_t g_t \qquad (4A.33)$$

From (4A.15), the intercept variance would be written

$$\operatorname{var}(\hat{\alpha}) = E\left[\sum u_t g_t\right]^2 = \sum g_t^2 E\left[u_t^2\right] = s^2 \sum g_t^2 \qquad (4A.34)$$

Writing (4A.34) out in full for $g_t^2$ and expanding the brackets,

$$\operatorname{var}(\hat{\alpha}) = s^2 \left[ \frac{T\left(\sum x_t^2\right)^2 - 2 \sum x_t \sum x_t^2 \sum x_t + \sum x_t^2 \left(\sum x_t\right)^2}{\left(T \sum x_t^2 - \left(\sum x_t\right)^2\right)^2} \right] \qquad (4A.35)$$

This looks rather complex, but, fortunately, if we take $\sum x_t^2$ outside the square brackets in the numerator, the remaining numerator cancels with a term in the denominator to leave the required result:

$$SE(\hat{\alpha}) = s \sqrt{\frac{\sum x_t^2}{T \sum (x_t - \bar{x})^2}} \qquad (4A.36)$$
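The closed forms (4A.31) and (4A.36) are easy to verify numerically. The following is a minimal sketch only – it assumes numpy is available and uses made-up data – and cross-checks the two formulae against the matrix expression $s^2(X'X)^{-1}$ derived in the next chapter.

```python
# Numerical check of (4A.31) and (4A.36) on made-up data (numpy assumed).
import numpy as np

rng = np.random.default_rng(42)
T = 50
x = rng.normal(10.0, 2.0, T)
u = rng.normal(0.0, 1.5, T)
y = 2.0 + 0.5 * x + u                       # true alpha = 2, beta = 0.5

# Bivariate OLS estimates.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# s^2: residual variance estimator with T - 2 degrees of freedom.
resid = y - (alpha_hat + beta_hat * x)
s2 = np.sum(resid ** 2) / (T - 2)

se_beta = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))                        # (4A.31)
se_alpha = np.sqrt(s2 * np.sum(x ** 2) / (T * np.sum((x - x.mean()) ** 2)))  # (4A.36)

# Cross-check against the matrix formula s^2 (X'X)^{-1} from the next chapter.
X = np.column_stack([np.ones(T), x])
se_matrix = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
print(np.allclose([se_alpha, se_beta], se_matrix))   # True: the formulae agree
```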
5 Further issues in regression analysis

Learning outcomes
In this chapter, you will learn how to
● construct models with more than one explanatory variable;
● derive the OLS parameter and standard error estimators in the multiple regression context;
● determine how well the model fits the data;
● understand the principles of nested and non-nested models;
● test multiple hypotheses using an F-test;
● form restricted regressions; and
● test for omitted and redundant variables.

5.1 Generalising the simple model to multiple linear regression

Previously, a model of the following form has been used:

$$y_t = \alpha + \beta x_t + u_t, \qquad t = 1, 2, \ldots, T \qquad (5.1)$$

Equation (5.1) is a simple bivariate regression model. That is, changes in the dependent variable are explained by reference to changes in one single explanatory variable, x. What if the real estate theory or the idea that is sought to be tested suggests that the dependent variable is influenced by more than one independent variable, however? For example, simple estimation and tests of the capital asset pricing model can be conducted using an equation of the form of (5.1), but arbitrage pricing theory does not presuppose that there is only a single factor affecting stock returns. So, to give one illustration, REIT excess returns might be purported to depend on their sensitivity to unexpected changes in
(1) inflation;
(2) the differences in returns on short- and long-dated bonds;
(3) the dividend yield; or
(4) default risks.

Having just one independent variable would be no good in this case. It would, of course, be possible to use each of the four proposed explanatory factors in separate regressions. It is of greater interest, though, and it is also more valid, to have more than one explanatory variable in the regression equation at the same time, and therefore to examine the effect of all the explanatory variables together on the explained variable. It is very easy to generalise the simple model to one with k regressors (independent variables). Equation (5.1) becomes

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t, \qquad t = 1, 2, \ldots, T \qquad (5.2)$$

The variables $x_{2t}, x_{3t}, \ldots, x_{kt}$ are therefore a set of $k - 1$ explanatory variables that are thought to influence y, and the coefficient estimates $\beta_2, \beta_3, \ldots, \beta_k$ are the parameters that quantify the effect of each of these explanatory variables on y. The coefficient interpretations are slightly altered in the multiple regression context. Each coefficient is now known as a partial regression coefficient, interpreted as representing the partial effect of the given explanatory variable on the explained variable, after holding constant, or eliminating the effect of, all the other explanatory variables. For example, $\hat{\beta}_2$ measures the effect of $x_2$ on y after eliminating the effects of $x_3, x_4, \ldots, x_k$. Stating this in other words, each coefficient measures the average change in the dependent variable per unit change in a given independent variable, holding all other independent variables constant at their average values.

5.2 The constant term

In (5.2) above, astute readers will have noticed that the explanatory variables are numbered $x_2, x_3, \ldots$ – i.e. the list starts with $x_2$ and not $x_1$. So, where is $x_1$? In fact, it is the constant term, usually represented by a column of ones of length T:

$$x_1 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \qquad (5.3)$$

Thus there is a variable implicitly hiding next to $\beta_1$, which is a column vector of ones, the length of which is the number of observations in the sample. The $x_1$ in the regression equation is not usually written, in the same way that one unit of p and two units of q would be written as 'p + 2q' and not '1p + 2q'. $\beta_1$ is the coefficient attached to the constant term (which was called α in the previous chapter). This coefficient can still be referred to as the intercept, which can be interpreted as the average value that y would take if all the explanatory variables took a value of zero.

A tighter definition of k, the number of explanatory variables, is probably now necessary. Throughout this book, k is defined as the number of 'explanatory variables' or 'regressors', including the constant term. This is equivalent to the number of parameters that are estimated in the regression equation. Strictly speaking, it is not sensible to call the constant an explanatory variable, since it does not explain anything and it always takes the same values. This definition of k will be employed for notational convenience, however.

Equation (5.2) can be expressed even more compactly by writing it in matrix form:

$$y = X\beta + u \qquad (5.4)$$

where: y is of dimension T × 1; X is of dimension T × k; β is of dimension k × 1; and u is of dimension T × 1. The difference between (5.2) and (5.4) is that all the time observations have been stacked up in a vector, and also that all the different explanatory variables have been squashed together so that there is a column for each in the X matrix. Such a notation may seem unnecessarily complex, but, in fact, the matrix notation is usually more compact and convenient. So, for example, if k is two – i.e. there are two regressors, one of which is the constant term (equivalent to a simple bivariate regression $y_t = \alpha + \beta x_t + u_t$) – it is possible to write

$$\underset{T \times 1}{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}} = \underset{T \times 2}{\begin{bmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{bmatrix}} \underset{2 \times 1}{\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}} + \underset{T \times 1}{\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}} \qquad (5.5)$$

so that the $x_{ij}$ element of the matrix X represents the jth time observation on the ith variable. Notice that the matrices written in this way are conformable – in other words, there is a valid matrix multiplication and addition on the RHS.¹

¹ The above presentation is the standard way to express matrices in the time series econometrics literature, although the ordering of the indices is different from that used in the mathematics of matrix algebra (as presented in chapter 2 of this book). In the latter case, $x_{ij}$ would represent the element in row i and column j, although, in the notation used from this point of the book onwards, it is the other way around.
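To make the stacked notation concrete, the sketch below – numpy assumed, with invented observations – builds the T × 2 matrix X of (5.5) with its column of ones and confirms that the matrix product reproduces the equation row by row.

```python
# Illustrative sketch: stacking a bivariate regression into y = X beta + u form.
import numpy as np

T = 5
x2 = np.array([1.2, 0.7, 3.1, 2.4, 1.9])   # made-up observations on x_2
beta = np.array([2.0, 0.5])                # [beta_1 (intercept), beta_2]
u = np.zeros(T)                            # disturbances; zero here for clarity

# X has a column of ones (the 'hidden' x_1) next to the observations on x_2.
X = np.column_stack([np.ones(T), x2])      # dimension T x 2

y = X @ beta + u                           # (T x 2)(2 x 1) + (T x 1) = T x 1
print(X.shape, beta.shape, y.shape)        # (5, 2) (2,) (5,)

# Row t of the product reads y_t = beta_1 * 1 + beta_2 * x_{2t} + u_t.
assert np.allclose(y, beta[0] + beta[1] * x2 + u)
```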
5.3 How are the parameters (the elements of the β vector) calculated in the generalised case?

Previously, the residual sum of squares, $\sum \hat{u}_i^2$, was minimised with respect to α and β. In the multiple regression context, in order to obtain estimates of the parameters $\beta_1, \beta_2, \ldots, \beta_k$, the RSS would be minimised with respect to all the elements of β. Now, the residuals can be stacked in a vector:

$$\hat{u} = \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{bmatrix} \qquad (5.6)$$

The RSS is still the relevant loss function, and would be given in matrix notation by equation (5.7):

$$L = \hat{u}'\hat{u} = \begin{bmatrix} \hat{u}_1 & \hat{u}_2 & \cdots & \hat{u}_T \end{bmatrix} \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{bmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_T^2 = \sum \hat{u}_t^2 \qquad (5.7)$$

Using a similar procedure to that employed in the bivariate regression case – i.e. substituting into (5.7), and denoting the vector of estimated parameters as $\hat{\beta}$ – it can be shown (see the appendix to this chapter) that the coefficient estimates will be given by the elements of the expression

$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = (X'X)^{-1} X'y \qquad (5.8)$$

If one were to check the dimensions of the RHS of (5.8), it would be observed to be k × 1. This is as required, since there are k parameters to be estimated by the formula for $\hat{\beta}$.

How are the standard errors of the coefficient estimates calculated, though? Previously, to estimate the variance of the errors, σ², an estimator denoted by s² was used:

$$s^2 = \frac{\sum \hat{u}_t^2}{T - 2} \qquad (5.9)$$

The denominator of (5.9) is given by T − 2, which is the number of degrees of freedom for the bivariate regression model – i.e. the number of observations minus two. This applies, essentially, because two observations are effectively 'lost' in estimating the two model parameters – i.e. in deriving estimates for α and β. In the case in which there is more than one explanatory variable plus a constant, and using the matrix notation, (5.9) would be modified to

$$s^2 = \frac{\hat{u}'\hat{u}}{T - k} \qquad (5.10)$$

where k = the number of regressors including a constant. In this case, k observations are 'lost' as k parameters are estimated, leaving T − k degrees of freedom. It can also be shown (see the appendix to this chapter) that the parameter variance–covariance matrix is given by

$$\operatorname{var}(\hat{\beta}) = s^2 (X'X)^{-1} \qquad (5.11)$$

The leading diagonal terms give the coefficient variances while the off-diagonal terms give the covariances between the parameter estimates, so that the variance of $\hat{\beta}_1$ is the first diagonal element, the variance of $\hat{\beta}_2$ is the second element on the leading diagonal and the variance of $\hat{\beta}_k$ is the kth diagonal element. The coefficient standard errors are therefore simply given by taking the square roots of each of the terms on the leading diagonal.
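Formulae (5.8), (5.10) and (5.11) translate almost literally into matrix code. The following is a minimal sketch, assuming numpy and simulated data rather than a real data set.

```python
# Minimal sketch of (5.8), (5.10) and (5.11); numpy assumed, data simulated.
import numpy as np

rng = np.random.default_rng(0)
T, k = 100, 3                                  # k counts the constant term
X = np.column_stack([np.ones(T),               # constant (column of ones)
                     rng.normal(size=T),       # x_2
                     rng.normal(size=T)])      # x_3
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(scale=0.8, size=T)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y    # (5.8): (X'X)^{-1} X'y
u_hat = y - X @ beta_hat                       # residual vector
s2 = (u_hat @ u_hat) / (T - k)                 # (5.10): T - k degrees of freedom
var_cov = s2 * np.linalg.inv(X.T @ X)          # (5.11)
std_err = np.sqrt(np.diag(var_cov))            # leading-diagonal square roots

print(beta_hat, std_err)
```

In production code, the explicit inverse is usually avoided in favour of a linear solver such as numpy.linalg.lstsq, which is numerically more stable; it is used here only to mirror the algebra.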
Example 5.1
The following model with three regressors (including the constant) is estimated over fifteen observations,

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u \qquad (5.12)$$

and the following data have been calculated from the original xs:

$$(X'X)^{-1} = \begin{bmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{bmatrix}, \quad X'y = \begin{bmatrix} -3.0 \\ 2.2 \\ 0.6 \end{bmatrix}, \quad \hat{u}'\hat{u} = 10.96$$

Calculate the coefficient estimates and their standard errors.

$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = (X'X)^{-1} X'y = \begin{bmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{bmatrix} \begin{bmatrix} -3.0 \\ 2.2 \\ 0.6 \end{bmatrix} = \begin{bmatrix} 1.10 \\ -4.40 \\ 19.88 \end{bmatrix} \qquad (5.13)$$

To calculate the standard errors, an estimate of σ² is required:

$$s^2 = \frac{RSS}{T - k} = \frac{10.96}{15 - 3} = 0.91 \qquad (5.14)$$

The variance–covariance matrix of $\hat{\beta}$ is given by

$$s^2 (X'X)^{-1} = 0.91 (X'X)^{-1} = \begin{bmatrix} 1.82 & 3.19 & -0.91 \\ 3.19 & 0.91 & 5.92 \\ -0.91 & 5.92 & 3.91 \end{bmatrix} \qquad (5.15)$$

The coefficient variances are on the diagonal, and the standard errors are found by taking the square roots of each of the coefficient variances:

$$\operatorname{var}(\hat{\beta}_1) = 1.82 \Leftrightarrow SE(\hat{\beta}_1) = 1.35 \qquad (5.16)$$

$$\operatorname{var}(\hat{\beta}_2) = 0.91 \Leftrightarrow SE(\hat{\beta}_2) = 0.95 \qquad (5.17)$$

$$\operatorname{var}(\hat{\beta}_3) = 3.91 \Leftrightarrow SE(\hat{\beta}_3) = 1.98 \qquad (5.18)$$

The estimated equation would be written

$$\hat{y} = \underset{(1.35)}{1.10} - \underset{(0.95)}{4.40}\, x_2 + \underset{(1.98)}{19.88}\, x_3 \qquad (5.19)$$

In practice, fortunately, all econometrics software packages will estimate the coefficient values and their standard errors. Clearly, though, it is still useful to understand where these estimates came from.
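The arithmetic of example 5.1 can be replicated directly from the quantities given above. The sketch below (numpy assumed) reproduces (5.13) to (5.18), up to the rounding of s² to 0.91 used in the text.

```python
# Reproducing example 5.1; the input matrices are taken from the text.
import numpy as np

XtX_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
Xty = np.array([-3.0, 2.2, 0.6])
rss, T, k = 10.96, 15, 3

beta_hat = XtX_inv @ Xty               # -> [ 1.10, -4.40, 19.88]   (5.13)
s2 = rss / (T - k)                     # -> 0.913...; the text rounds to 0.91 (5.14)
var_cov = s2 * XtX_inv                 # (5.15)
std_err = np.sqrt(np.diag(var_cov))   # -> approx. [1.35, 0.96, 1.98]

print(beta_hat, std_err)
```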
5.4 A special type of hypothesis test: the t-ratio

Recall from equation (4.29) in the previous chapter that the formula under a test of significance approach to hypothesis testing using a t-test for variable i is

$$\text{test statistic} = \frac{\hat{\beta}_i - \beta_i^*}{SE(\hat{\beta}_i)} \qquad (5.20)$$

If the test is

H₀: βᵢ = 0
H₁: βᵢ ≠ 0

– i.e. a test that the population parameter is zero against a two-sided alternative – this is known as a t-ratio test. Since $\beta_i^* = 0$, the expression in (5.20) collapses to

$$\text{test statistic} = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)} \qquad (5.21)$$

Thus the ratio of the coefficient to its standard error, given by this expression, is known as the t-ratio or t-statistic. In the last example above, the t-ratios associated with each of the three coefficients would be given by:

                β̂1       β̂2       β̂3
Coefficient    1.10    −4.40    19.88
SE             1.35     0.95     1.98
t-ratio        0.81    −4.63    10.04

Note that, if a coefficient is negative, its t-ratio will also be negative. In order to test (separately) the null hypotheses that β₁ = 0, β₂ = 0 and β₃ = 0, the test statistics would be compared with the appropriate critical value from a t-distribution. In this case, the number of degrees of freedom, given by T − k, is equal to 15 − 3 = 12. The 5 per cent critical value for this two-sided test (remember, 2.5 per cent in each tail for a 5 per cent test) is 2.179, while the 1 per cent two-sided critical value (0.5 per cent in each tail) is 3.055. Given these t-ratios and critical values, would the following null hypotheses be rejected?

H₀: β₁ = 0? No.
H₀: β₂ = 0? Yes.
H₀: β₃ = 0? Yes.

If H₀ is rejected, it would be said that the test statistic is significant. If the variable is not 'significant', it means that, while the estimated value of the coefficient is not exactly zero (e.g. 1.10 in the example above), the coefficient is statistically indistinguishable from zero. If a zero were placed in the fitted equation instead of the estimated value, this would mean that, whatever happened to the value of that explanatory variable, the dependent variable would be unaffected. This would then be taken to mean that the variable is not helping to explain variations in y, and that it could therefore be removed from the regression equation. For example, if the t-ratio associated with x₃ had been 1.04 rather than 10.04, the variable would be classed as insignificant – i.e. not statistically different from zero. The only insignificant term in the above regression is the intercept. There are good statistical reasons for always retaining the constant, even if it is not significant; see chapter 6.

It is worth noting that, for degrees of freedom greater than around twenty-five, the 5 per cent two-sided critical value is approximately ±2. So, as a rule of thumb (i.e. a rough guide), the null hypothesis would be rejected if the t-statistic exceeds two in absolute value. Some authors place the t-ratios in parentheses below the corresponding coefficient estimates rather than the standard errors. Accordingly, one needs to check which convention is being used in each particular application, and also to state this clearly when presenting estimation results.
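A sketch of these t-ratio calculations follows; it assumes scipy is available for the t-distribution critical values, and reuses the coefficient estimates and standard errors of example 5.1.

```python
# Sketch of the t-ratio test for example 5.1 (numpy and scipy assumed).
import numpy as np
from scipy import stats

beta_hat = np.array([1.10, -4.40, 19.88])
std_err = np.array([1.35, 0.95, 1.98])
T, k = 15, 3

t_ratios = beta_hat / std_err          # -> approx. [0.81, -4.63, 10.04]
df = T - k                             # 12 degrees of freedom

crit_5pct = stats.t.ppf(0.975, df)     # two-sided 5% critical value: 2.179
crit_1pct = stats.t.ppf(0.995, df)     # two-sided 1% critical value: 3.055

reject_5pct = np.abs(t_ratios) > crit_5pct
print(t_ratios, reject_5pct)           # rejections: [False, True, True]
```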
5.5 Goodness of fit statistics

5.5.1 R²

It is desirable to have some measure of how well the regression model actually fits the data. In other words, it is desirable to have an answer to the question 'How well does the model containing the explanatory variables that was proposed actually explain variations in the dependent variable?' Quantities known as goodness of fit statistics are available to test how well the sample regression function (SRF) fits the data – that is, how 'close' the fitted regression line is to all the data points taken together. Note that it is not possible to say how well the sample regression function fits the population regression function – i.e. how the estimated model compares with the true relationship between the variables – as the latter is never known.

What measures might therefore make plausible candidates to be goodness of fit statistics? A first response to this might be to look at the residual sum of squares. Recall that OLS selected the coefficient estimates that minimised this quantity, so the lower the minimised value of the RSS was, the better the model fitted the data. Consideration of the RSS is certainly one possibility, but the RSS is unbounded from above (strictly, it is bounded from above by the total sum of squares – see below) – i.e. it can take any (non-negative) value. So, for example, if the value of the RSS under OLS estimation was 136.4, what does this actually mean? It would be very difficult, by looking at this number alone, to tell whether the regression line fitted the data closely or not. The value of the RSS depends to a great extent on the scale of the dependent variable. Thus one way to reduce the RSS pointlessly would be to divide all the observations on y by ten!

In fact, a scaled version of the residual sum of squares is usually employed. The most common goodness of fit statistic is known as R². One way to define R² is to say that it is the square of the correlation coefficient between y and ŷ – that is, the square of the correlation between the values of the dependent variable and the corresponding fitted values from the model. A correlation coefficient must lie between −1 and +1 by definition. Since R² (defined in this way) is the square of a correlation coefficient, it must lie between zero and one. If this correlation is high, the model fits the data well, while, if the correlation is low (close to zero), the model is not providing a good fit to the data.

Another definition of R² requires a consideration of what the model is attempting to explain. What the model is trying to do in effect is to explain variability of y about its mean value, ȳ. This quantity, ȳ, which is more specifically known as the unconditional mean of y, acts like a benchmark, since, if the researcher had no model for y, he/she could do no worse than to regress y on a constant only. In fact, the coefficient estimate for this regression would be the mean of y. So, from the regression

$$y_t = \beta_1 + u_t \qquad (5.22)$$

the coefficient estimate, $\hat{\beta}_1$, will be the mean of y – i.e. ȳ. The total variation across all observations of the dependent variable about its mean value is known as the total sum of squares, TSS, which is given by

$$TSS = \sum_t (y_t - \bar{y})^2 \qquad (5.23)$$

The TSS can be split into two parts: the part that has been explained by the model (known as the explained sum of squares, ESS) and the part that the model was not able to explain (the RSS). That is,

$$TSS = ESS + RSS \qquad (5.24)$$

$$\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2 \qquad (5.25)$$

Recall that the residual sum of squares can also be expressed as $\sum_t (y_t - \hat{y}_t)^2$, since a residual for observation t is defined as the difference between the actual and fitted values for that observation. The goodness of fit statistic is given by the ratio of the explained sum of squares to the total sum of squares:

$$R^2 = \frac{ESS}{TSS} \qquad (5.26)$$
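The decomposition (5.24) and the definition (5.26) are easy to confirm numerically. The sketch below – numpy assumed, with simulated data – also checks the equivalence with the squared-correlation definition of R² given earlier, which holds because the regression includes a constant.

```python
# Sketch of TSS = ESS + RSS and R^2 = ESS/TSS (numpy assumed, made-up data).
import numpy as np

rng = np.random.default_rng(1)
T = 40
X = np.column_stack([np.ones(T), rng.normal(size=T)])  # constant included
y = X @ np.array([4.0, 1.5]) + rng.normal(size=T)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
y_fit = X @ beta_hat

tss = np.sum((y - y.mean()) ** 2)       # (5.23)
ess = np.sum((y_fit - y.mean()) ** 2)
rss = np.sum((y - y_fit) ** 2)

r_squared = ess / tss                                # (5.26)
print(np.isclose(tss, ess + rss), r_squared)         # decomposition (5.24) holds

# Equivalently, R^2 is the squared correlation between y and the fitted values.
print(np.corrcoef(y, y_fit)[0, 1] ** 2)
```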
[...]