Chapter 4
Further development and analysis of the classical linear regression model
Phan Tuyết Trinh, Tô Thị Phương Thảo, Nguyễn Hoàng Minh Huy,
Lâm Bá Du, Lê Chí Cang, Huỳnh Thái Huy
Supervisor: Dr Phùng Đức Nam
1 Generalising the simple model to multiple linear regression
2 The constant term
3 How are the parameters calculated in the generalised case?
4 Testing multiple hypotheses: the F-test
5 Sample output for multiple hypothesis tests
6 Multiple regression using an APT-style model
7 Data mining and the true size of the test
8 Goodness of fit statistics
9 Hedonic pricing models
10 Tests of non-nested hypotheses
11 Quantile regression
4.1 Generalising the simple model to multiple linear regression
Stock returns might be purported to depend on their sensitivity to unexpected changes in:
• inflation
• the differences in returns on short- and long-dated bonds
• industrial production
• default risks
4.2 The constant term
k is defined as the number of ‘explanatory variables’ or ‘regressors’, including the constant term.
Equivalently, k is the number of parameters that are estimated in the regression equation.
4.3 How are the parameters (the elements of the β vector) calculated in the generalised case?
The elements of the β vector
• SRF (Sample Regression Function)
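Ordinary least squares (OLS)
• $\hat{\beta} = (X'X)^{-1}X'y$
• $s^2 = \dfrac{\hat{u}'\hat{u}}{T-k}$ ($s^2$: an estimate of the variance of the errors, $\sigma^2$)
• $\mathrm{var}(\hat{\beta}) = s^2 (X'X)^{-1}$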
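The OLS calculation can be sketched numerically as follows. This is a minimal illustration on simulated data, assuming the standard matrix results $\hat{\beta} = (X'X)^{-1}X'y$, $s^2 = \hat{u}'\hat{u}/(T-k)$ and $\mathrm{var}(\hat{\beta}) = s^2 (X'X)^{-1}$; the function name `ols` and the data-generating numbers are hypothetical and not taken from the slides.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, the error-variance estimate and standard errors.

    X is a T x k matrix whose first column is ones (the constant term).
    """
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y           # (X'X)^{-1} X'y
    resid = y - X @ beta_hat               # residual vector u-hat
    s2 = resid @ resid / (T - k)           # s^2 = u'u / (T - k)
    var_beta = s2 * XtX_inv                # var(beta-hat) = s^2 (X'X)^{-1}
    se = np.sqrt(np.diag(var_beta))        # standard errors of the coefficients
    return beta_hat, s2, se

# Simulated data (hypothetical): y = 1 + 2*x2 - 0.5*x3 + noise
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=T)

beta_hat, s2, se = ols(X, y)
print("coefficients:", beta_hat)
print("s^2:", s2)
print("standard errors:", se)
```

Here k = 3 because the constant counts as a regressor, which matches the definition of k used in section 4.2.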
Trang 12● Mô hình gốc/Mô hình không ràng buộc – UnRestricted
Ước lượng bằng OLS thu được tổng bình phương các phần dư
URSS, có bậc tự do df (degree of freedom) = T – k
● Mô hình có ràng buộc (Mô hình bị thu hẹp, mất đi m hệ số hồi
Trang 13Ví dụ mô hình có ràng buộc (Restricted)
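The comparison of the two models can be sketched in code. The block below assumes the standard F-test statistic $F = \frac{RRSS - URSS}{URSS} \times \frac{T-k}{m}$, where RRSS is the residual sum of squares of the restricted model; the function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def f_test_statistic(rrss, urss, T, k, m):
    """F = ((RRSS - URSS) / URSS) * ((T - k) / m).

    rrss: residual sum of squares of the restricted model
    urss: residual sum of squares of the unrestricted model
    T:    number of observations
    k:    number of regressors in the unrestricted model (incl. constant)
    m:    number of restrictions
    """
    if urss <= 0 or m <= 0 or T <= k:
        raise ValueError("need urss > 0, m > 0 and T > k")
    return (rrss - urss) / urss * (T - k) / m

# Hypothetical inputs: two restrictions imposed on a model with k = 4
f_stat = f_test_statistic(rrss=120.0, urss=100.0, T=104, k=4, m=2)
print(round(f_stat, 2))
```

Under the null hypothesis the statistic is compared with the critical value of an F(m, T - k) distribution; a large value means the restrictions raise the residual sum of squares by too much to be supported by the data.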
View/Coefficient Diagnostics/Wald Test –
4.6 Multiple regression using an APT-style model
• Can the monthly returns on Microsoft stock be explained by reference to unexpected changes in a set of macroeconomic and financial variables?
=> Arbitrage pricing theory (APT)
Steps to estimate the regression model
• Step 1: Open a new EViews workfile
• Step 2: Import the data
• Step 3: Generate variables:
The APT posits that the stock return can be explained by reference to the unexpected changes in the macroeconomic variables rather than their levels.
Unexpected value = Actual value - Expected value
Generate variables
• Genr
Dspread = baa_aaa_spread - baa_aaa_spread(-1)
Dcredit = consumer_credit - consumer_credit(-1)
Rmsoft = 100*dlog(microsoft)
Rsandp = 100*dlog(sandp)
Dmoney = m1money_supply - m1money_supply(-1)
Inflation = 100*dlog(cpi)
Term = ustb10y - ustb3m
Dinflation = inflation - inflation(-1)
Mustb3m = ustb3m/12
Rterm = term - term(-1)
Ermsoft = rmsoft - mustb3m
Ersandp = rsandp - mustb3m
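For readers without EViews, some of the Genr transformations can be sketched in Python. This is only an illustration: the helper names `dlog_pct` and `first_diff` and the four made-up observations are assumptions, and the block relies on `dlog(x)` meaning the first difference of the natural log of x.

```python
import numpy as np

def dlog_pct(series):
    """100 * (log x_t - log x_{t-1}): continuously compounded percentage change."""
    series = np.asarray(series, dtype=float)
    return 100.0 * np.diff(np.log(series))

def first_diff(series):
    """x_t - x_{t-1}."""
    return np.diff(np.asarray(series, dtype=float))

# Hypothetical monthly data (made-up numbers, four observations)
microsoft = [50.0, 52.0, 51.0, 53.5]   # share price
ustb3m = [4.8, 4.9, 5.0, 5.1]          # annualised 3-month T-bill yield, in %
ustb10y = [6.0, 6.2, 6.1, 6.4]         # annualised 10-year yield, in %

rmsoft = dlog_pct(microsoft)                      # Rmsoft = 100*dlog(microsoft)
mustb3m = np.asarray(ustb3m) / 12.0               # Mustb3m = ustb3m/12
term = np.asarray(ustb10y) - np.asarray(ustb3m)   # Term = ustb10y - ustb3m
rterm = first_diff(term)                          # Rterm = term - term(-1)

# Differencing drops the first observation, so align the monthly yield with it
ermsoft = rmsoft - mustb3m[1:]                    # Ermsoft = rmsoft - mustb3m

print(rmsoft, rterm, ermsoft)
```

The only design point worth noting is the alignment step: every differenced series is one observation shorter than the level series it came from.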
• Step 4: Object/New Object/Equation, named msoftreg, with the specification: ERMSOFT C ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM
• Method: Least Squares.
• View/Coefficient Diagnostics/Wald Test – Coefficient Restrictions
• C(3) = 0, C(4) = 0, C(5) = 0, C(6) = 0, C(7) = 0, i.e. the joint null hypothesis that the coefficients on DPROD, DCREDIT, DINFLATION, DMONEY and DSPREAD are all zero
Null Hypothesis Summary:
Normalized Restriction (= 0)    Value    Std. Err.
Stepwise regression
• Stepwise regression is an automatic variable selection procedure which chooses the jointly most ‘important’ explanatory variables from a set of candidate variables.
• The simplest is the uni-directional forwards method.
• Start with no variables => add the variable with the lowest p-value => then add the variable with the next lowest p-value, and so on
• Object/New Object
• Equation: Msoftstepwise
• Method: STEPLS - Stepwise Least Squares
• Dependent variable and always-included regressors: ERMSOFT C
• Explanatory variables: ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD
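The uni-directional forwards method can be sketched as below. This is not the EViews STEPLS implementation: it is a simplified illustration that ranks candidates by the absolute t-ratio (which gives the same ordering as the p-value when all candidate models at a given step have the same degrees of freedom) and stops when no remaining candidate has |t| above an assumed threshold of 2.0. All names and the simulated data are hypothetical.

```python
import numpy as np

def t_stats(X, y):
    """t-ratios of the OLS coefficients in a regression of y on X."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (T - k)
    return beta / np.sqrt(s2 * np.diag(XtX_inv))

def forward_stepwise(candidates, y, t_threshold=2.0):
    """candidates: dict mapping name -> 1-D array. Returns names in order of entry."""
    T = len(y)
    selected, remaining = [], list(candidates)
    while remaining:
        best_name, best_t = None, 0.0
        for name in remaining:
            # Constant + variables already chosen + the candidate under trial
            cols = [np.ones(T)] + [candidates[n] for n in selected] + [candidates[name]]
            t = abs(t_stats(np.column_stack(cols), y)[-1])
            if t > best_t:
                best_name, best_t = name, t
        if best_name is None or best_t < t_threshold:
            break                      # no remaining candidate is significant enough
        selected.append(best_name)
        remaining.remove(best_name)
    return selected

# Simulated data (hypothetical): only x1 and x3 truly affect y
rng = np.random.default_rng(1)
T = 300
data = {name: rng.normal(size=T) for name in ["x1", "x2", "x3", "x4"]}
y = 0.5 + 2.0 * data["x1"] + 1.0 * data["x3"] + rng.normal(size=T)

selected = forward_stepwise(data, y)
print(selected)
```

The constant is always kept in the model, mirroring the `ERMSOFT C` entry in the dialog above.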
Stepwise procedures have been strongly criticised by statistical purists. At the most basic level, they are sometimes argued to be no better than automated procedures for data mining, in particular if the list of potential candidate variables is long and results from a ‘fishing trip’ rather than a strong prior financial theory.
Sample sizes and asymptotic theory
• A question that is often asked by those new to econometrics is ‘what is an appropriate sample size for model estimation?’
• Most testing procedures in econometrics rely on asymptotic theory. The results in theory hold only if there are an infinite number of observations.
4.7 Data mining and the true size of the test
• Test statistics are assumed to follow a random distribution
=> they will take on extreme values that fall in the rejection region some of the time by chance alone
=> there is always the possibility of rejecting a correct null hypothesis
• If enough explanatory variables are employed in a regression, often one or more will be significant by chance alone.
• If an α% size of test is used, on average one in every (100/α) regressions will have a significant slope coefficient by chance alone.
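The ‘one in every (100/α)’ statement can be illustrated with a short calculation. The sketch below assumes the tests are independent and every null hypothesis is true; under those assumptions the expected number of false rejections in n tests is α·n and the probability of at least one is 1 - (1 - α)^n. The function names are illustrative.

```python
def expected_false_rejections(alpha, n_tests):
    """Expected number of true nulls rejected by chance in n independent tests."""
    return alpha * n_tests

def prob_at_least_one_significant(alpha, n_tests):
    """P(at least one of n independent tests rejects a true null)."""
    return 1.0 - (1.0 - alpha) ** n_tests

alpha = 0.05  # a 5% size of test, so 100/alpha% = 20 regressions
for n in (1, 5, 20):
    print(n,
          round(expected_false_rejections(alpha, n), 2),
          round(prob_at_least_one_significant(alpha, n), 3))
```

With a 5% test, 20 unrelated candidate variables are expected to produce one spuriously significant coefficient, and the chance of finding at least one is well above one half.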
• Trying many variables in a regression without basing the selection of the candidate variables on a financial or economic theory is known as ‘data mining’ or ‘data snooping’.
=> The true significance level will be considerably greater than the nominal significance level assumed.
To avoid data mining:
• ensure that the selection of candidate regressors for inclusion in a model is made on the basis of financial or economic theory
• examine the forecast performance of the model in an ‘out-of-sample’ data set
4.8 Goodness of fit statistics
“How well does the model containing the explanatory variables that was proposed actually explain variations in the dependent variable?”
• Quantities known as goodness of fit statistics are available to test how well the SRF fits the data, that is, how ‘close’ the fitted regression line is to all of the data points taken together.
What measures might make plausible candidates to be goodness of fit statistics?
• RSS: the value of RSS depends to a great extent on the scale of the dependent variable
• $R^2$: a scaled version of RSS
• $R^2$ is the square of the correlation coefficient between $y$ and $\hat{y}$, that is, the square of the correlation between the values of the dependent variable and the corresponding fitted values from the model
• $R^2$ must lie between 0 and 1
• If this correlation is high, the model fits the data well, while if the correlation is low (close to zero), the model is not providing a good fit to the data
The TSS can be split into 2 parts (TSS = ESS + RSS):
• the part that has been explained by the model (the explained sum of squares, ESS)
• the part that the model was not able to explain (the RSS).
$R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS}$
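A small numerical check may help here. The sketch below fits a simple regression with a constant by OLS on made-up data, computes TSS, ESS and RSS, and confirms that $R^2 = ESS/TSS = 1 - RSS/TSS$. The decomposition TSS = ESS + RSS relies on the fitted values coming from an OLS regression that includes a constant; the helper names and data are hypothetical.

```python
def fit_line(x, y):
    """OLS intercept and slope for a regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def sums_of_squares(y, y_hat):
    """Return (TSS, ESS, RSS) measured about the mean of y."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return tss, ess, rss

# Made-up data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]
tss, ess, rss = sums_of_squares(y, y_hat)
r2 = ess / tss

print("TSS, ESS, RSS:", tss, ess, rss)
print("R^2:", r2, "and 1 - RSS/TSS:", 1.0 - rss / tss)
```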
RSS = TSS, i.e. ESS = 0, so $R^2$ = ESS/TSS = 0
• The model has not succeeded in explaining any of the variability of y about its mean value
• This would happen only where the estimated values of all of the slope coefficients are exactly zero
ESS = TSS, i.e. RSS = 0, so $R^2$ = ESS/TSS = 1
• The model has explained all of the variability of y about its mean value
• This would happen only in the case where all of the observation points lie exactly on the fitted line.
Problems with $R^2$ as a goodness of fit measure
• $R^2$ is defined in terms of variation about the mean of y, so that if a model is reparameterised (rearranged) and the dependent variable changes, $R^2$ will change.
• $R^2$ never falls if more regressors are added to the regression.
• $R^2$ often takes values of 0.9 or higher for time series regressions, and hence it is not good at discriminating between models, since a wide array of models will frequently have broadly similar (and high) values of $R^2$.
Adjusted $R^2$
$\bar{R}^2 = 1 - \left[\dfrac{T-1}{T-k}\,(1 - R^2)\right]$
So if an extra regressor (variable) is added to the model, k increases and, unless $R^2$ increases by a more than offsetting amount, $\bar{R}^2$ will actually fall.
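The penalty for extra regressors can be seen in a two-line calculation. The sketch assumes the usual definition $\bar{R}^2 = 1 - \frac{T-1}{T-k}(1 - R^2)$; the sample size and the two $R^2$ values are hypothetical.

```python
def adjusted_r2(r2, T, k):
    """Adjusted R^2 = 1 - (T - 1) / (T - k) * (1 - R^2)."""
    if T <= k:
        raise ValueError("need more observations than parameters")
    return 1.0 - (T - 1) / (T - k) * (1.0 - r2)

# Hypothetical case: adding a fourth regressor raises R^2 only slightly
before = adjusted_r2(r2=0.400, T=50, k=3)
after = adjusted_r2(r2=0.405, T=50, k=4)
print(round(before, 4), round(after, 4))
```

In this made-up case $R^2$ rises from 0.400 to 0.405 but the adjusted measure falls, so the extra variable would not be retained on this criterion.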
4.9 Hedonic pricing models
• One application of econometric techniques where the coefficients have a particularly intuitively appealing interpretation is in the area of hedonic pricing models.
• Hedonic models are often used to produce appraisals or valuations of properties, given their characteristics (e.g. size of dwelling, number of bedrooms, location, number of bathrooms, etc.). In these models, the coefficient estimates represent ‘prices of the characteristics’.
• One such application of a hedonic pricing model is given by Des Rosiers and Thériault (1996), who consider the effect of various amenities on rental values for buildings and apartments in five sub-markets in the Quebec area of Canada.
• The paper employs 1990 data for the Quebec City region, and there are 13,378 observations.
LnAGE       log of the apparent age of the property
NBROOMS     number of bedrooms
AREABYRM    area per room (in square metres)
ELEVATOR    a dummy variable = 1 if the building has an elevator; 0 otherwise
BASEMENT    a dummy variable = 1 if the unit is located in a basement; 0 otherwise
OUTPARK     number of outdoor parking spaces
INDPARK     number of indoor parking spaces
NOLEASE     a dummy variable = 1 if the unit has no lease attached to it; 0 otherwise
LnDISTCBD   log of the distance in kilometres to the central business district (CBD)
SINGLPAR    percentage of single parent families in the area where the building stands
DSHOPCNTR   distance in kilometres to the nearest shopping centre
VACDIFF1    vacancy difference between the building and the census figure
Hedonic model of rental values in Quebec City, 1990. Dependent variable: Canadian dollars per month.
• This list includes several variables that are dummy variables.
• Dummy variables can be used in the context of cross-sectional or time series regressions.
• The dummy variables are used in the same way as other explanatory variables and the coefficients on the dummy variables can be interpreted as the average differences in the values of the dependent variable for each category.
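The ‘average difference’ interpretation of a dummy coefficient can be checked in the simplest possible case, a regression of rent on a constant and a single 0/1 dummy. The rents below are invented for illustration and are not from Des Rosiers and Thériault (1996); in a multiple regression the same interpretation holds only after controlling for the other regressors.

```python
def regress_on_dummy(y, d):
    """OLS of y on a constant and a single 0/1 dummy d; returns (intercept, slope)."""
    n = len(y)
    y_bar, d_bar = sum(y) / n, sum(d) / n
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    slope = sxy / sxx
    return y_bar - slope * d_bar, slope

# Hypothetical monthly rents and an elevator dummy (1 = building has an elevator)
rent = [520.0, 480.0, 610.0, 650.0, 500.0, 630.0]
elevator = [0, 0, 1, 1, 0, 1]

intercept, slope = regress_on_dummy(rent, elevator)

mean_without = sum(r for r, e in zip(rent, elevator) if e == 0) / elevator.count(0)
mean_with = sum(r for r, e in zip(rent, elevator) if e == 1) / elevator.count(1)

print("intercept:", intercept, "mean rent without elevator:", mean_without)
print("dummy coefficient:", slope, "difference in means:", mean_with - mean_without)
```

The intercept reproduces the mean rent of the base category and the dummy coefficient reproduces the difference between the two group means.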
The relationship between the regression F-statistic and $R^2$
• Recall that the regression F-statistic tests the null hypothesis that all of the regression slope parameters are simultaneously zero.
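The link between the two statistics can be written down directly. The sketch below assumes the standard result for a regression with a constant, $F = \frac{R^2}{1 - R^2} \times \frac{T-k}{k-1}$, which follows from the F-test formula when the restricted model contains only the constant (so m = k - 1 and RRSS = TSS). The function name and inputs are hypothetical.

```python
def regression_f_from_r2(r2, T, k):
    """Regression F-statistic implied by R^2: F = R^2/(1 - R^2) * (T - k)/(k - 1)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must satisfy 0 <= R^2 < 1")
    if k < 2 or T <= k:
        raise ValueError("need at least one slope coefficient and T > k")
    return r2 / (1.0 - r2) * (T - k) / (k - 1)

# Hypothetical regression: R^2 = 0.2 with T = 104 observations and k = 5 parameters
f_stat = regression_f_from_r2(r2=0.2, T=104, k=5)
print(round(f_stat, 4))
```

Because F is increasing in $R^2$ for given T and k, a test that all slope parameters are zero is equivalent to a test of whether $R^2$ is significantly above zero.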
• One limitation of such studies that is worth mentioning at this stage is their assumption that the implicit price of each characteristic is identical across types of property, and that these characteristics do not become saturated.
4.10 Tests of non-nested hypotheses
• Suppose that there are two researchers working independently, each with a separate financial theory for explaining the variation in some variable, $y_t$:
$y_t = \alpha_1 + \alpha_2 x_{2t} + u_t$    (4.48)
$y_t = \beta_1 + \beta_2 x_{3t} + v_t$    (4.49)
• An alternative approach to comparing between non-nested models would be to estimate an encompassing or hybrid model. In the case of (4.48) and (4.49), the relevant encompassing model would be:
$y_t = \gamma_1 + \gamma_2 x_{2t} + \gamma_3 x_{3t} + w_t$    (4.50)
Selecting between models
1. γ2 is statistically significant but γ3 is not. In this case, (4.50) collapses to (4.48), and the latter is the preferred model.
2. γ3 is statistically significant but γ2 is not. In this case, (4.50) collapses to (4.49), and the latter is the preferred model.
3. γ2 and γ3 are both statistically significant. This would imply that both x2 and x3 have incremental explanatory power for y, in which case both variables should be retained. Models (4.48) and (4.49) are both rejected and (4.50) is the preferred model.
4. Neither γ2 nor γ3 is statistically significant. In this case, none of the models can be dropped, and some other method for choosing between them must be employed.
• There are several limitations to the use of encompassing regressions to select between non-nested models.
• It could be the case that if x2 and x3 are both included, neither γ2 nor γ3 is statistically significant, while each is significant in its separate regression, (4.48) or (4.49).
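The encompassing approach can be sketched on simulated data. The block assumes an encompassing model of the form $y_t = \gamma_1 + \gamma_2 x_{2t} + \gamma_3 x_{3t} + w_t$ and a two-sided 5% test using the large-sample critical value 1.96; the decision function, the labels it returns and the simulated series are hypothetical, and the data are generated so that only x2 truly matters.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS coefficients and their t-ratios."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (T - k)
    return beta, beta / np.sqrt(s2 * np.diag(XtX_inv))

def select_model(t_gamma2, t_gamma3, crit=1.96):
    """Apply the four-way decision rule to the t-ratios on gamma2 and gamma3."""
    sig2, sig3 = abs(t_gamma2) > crit, abs(t_gamma3) > crit
    if sig2 and not sig3:
        return "prefer (4.48)"
    if sig3 and not sig2:
        return "prefer (4.49)"
    if sig2 and sig3:
        return "prefer (4.50)"
    return "inconclusive"

# Simulated data (hypothetical): y depends on x2 but not on x3
rng = np.random.default_rng(42)
T = 500
x2, x3 = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 0.8 * x2 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2, x3])   # the encompassing regression
gamma, t = ols_t_stats(X, y)
verdict = select_model(t[1], t[2])
print("t-ratios:", t, "=>", verdict)
```

When x2 and x3 are highly correlated, both t-ratios can be small even though each variable is significant on its own, which is the limitation noted above.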
4.11 Quantile regression
Background and motivation
• For example, there may be a non-linear (∩-shaped) relationship between regulation and GDP growth.
• Estimating a standard linear regression model may lead to seriously misleading estimates: it will ‘average’ the positive and negative effects from very low and very high regulation.
• Quantile regressions, developed by Koenker and Bassett (1978), represent a more natural and flexible way to capture the complexities inherent in the relationship by estimating models for the conditional quantile functions.
• Quantile regressions can be conducted in both time series and cross-sectional contexts.
• In the literature on quantile regressions, the dependent variable (often called the response variable) is usually assumed to be independently distributed and homoscedastic.
• Quantile regression is a non-parametric technique.
• Quantiles, denoted τ, refer to the position where an observation falls within an ordered series for y. The τ-th quantile, Q(τ), of a random variable y with cumulative distribution function F(y) is defined as
$Q(\tau) = \inf\{\, y : F(y) \ge \tau \,\}$
where inf refers to the infimum, or the ‘greatest lower bound’, which is the smallest value of y satisfying the inequality.
• Quantiles must lie between 0 and 1.
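The definition $Q(\tau) = \inf\{y : F(y) \ge \tau\}$ can be applied to a sample by replacing F with the empirical distribution function, i.e. the share of observations less than or equal to y. The sketch below does exactly that; the function name and the five data points are hypothetical.

```python
def empirical_quantile(sample, tau):
    """Smallest sample value y whose empirical CDF F(y) is at least tau."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must satisfy 0 < tau <= 1")
    ordered = sorted(sample)
    n = len(ordered)
    for i, value in enumerate(ordered, start=1):
        # At the i-th ordered value, at least i of the n observations are <= value
        if i / n >= tau:
            return value

sample = [7, 1, 5, 3, 9]   # made-up observations
median = empirical_quantile(sample, 0.5)
upper = empirical_quantile(sample, 0.9)
print(median, upper)
```

With five observations the median is the third ordered value and the 0.9 quantile is the largest value, since only there does the empirical distribution function reach 0.9.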
Estimation of quantile functions
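The idea behind estimation can be sketched in the simplest case, a model with a constant only. The block assumes the usual asymmetric ‘check’ loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}[u < 0])$ and shows that the value q minimising $\sum_i \rho_\tau(y_i - q)$ is a sample τ-quantile; a quantile regression minimises the same loss over regression coefficients instead of a single constant. The search over sample values and all names are illustrative, not a production estimator.

```python
def check_loss(u, tau):
    """Asymmetric absolute loss: tau*u for u >= 0 and (tau - 1)*u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_loss(sample, q, tau):
    """Sum of check losses when every observation is 'fitted' by the constant q."""
    return sum(check_loss(y - q, tau) for y in sample)

def quantile_by_minimisation(sample, tau):
    """Minimise the total check loss over the sample values.

    The objective is convex and piecewise linear in q, so a minimum is
    attained at one of the observations.
    """
    return min(sorted(sample), key=lambda q: total_loss(sample, q, tau))

sample = [7, 1, 5, 3, 9]   # made-up observations
q50 = quantile_by_minimisation(sample, 0.5)
q90 = quantile_by_minimisation(sample, 0.9)
print(q50, q90)
```

For τ = 0.5 the loss is proportional to the sum of absolute deviations, so the minimiser is the median; for τ = 0.9 positive errors are penalised nine times as heavily as negative ones, which pushes the fitted value towards the top of the sample.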
An application of quantile regression: evaluating fund performance
• A study by Bassett and Chen (2001) performs a style attribution analysis for a mutual fund and, for comparison, the S&P500 index.
• The aim is to examine how a portfolio’s exposure to various styles varies with performance.
• Bassett and Chen (2001) conduct a style analysis in this spirit by regressing the returns of a fund on the returns of a large growth portfolio, the returns of a large value portfolio, the returns of a small growth portfolio, and the returns of a small value portfolio.