Graduate Course Material: Data Analysis with Stata
Page 1
Pham Thi Bich Ngoc, Ph.D. (University of Kiel, Germany)
FEC/Hoa Sen University
ngoc.phamthibich@hoasen.edu.vn
UNIVERSITY OF ECONOMICS HO CHI MINH CITY, June 2014
Page 2
Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with each other
The standard error of an OLS coefficient estimate will be higher if the corresponding independent variable is more highly correlated with the other independent variables in the model
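Page 3
Perfect multicollinearity occurs when there is a perfect linear correlation between two or more independent variables
It also occurs when an independent variable takes a constant value in all observations, since it is then perfectly collinear with the intercept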
June 14 - Dr Pham Thi Bich Ngoc 3
Page 4
The symptoms of a multicollinearity problem
1. Independent variable(s) considered critical in explaining the model’s dependent variable are not statistically significant according to the tests
Page 5
2. High R², highly significant F-test, but few or no statistically significant t tests
3. Parameter estimates drastically change values and become statistically significant when some independent variables are excluded from the regression
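Page 6
A simple test for multicollinearity is to conduct “artificial” regressions between each independent variable (as the “dependent” variable) and the remaining independent variables
The variance inflation factor (VIF) of variable j is computed from the R² of its artificial regression, R_j²:
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$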
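Page 7
VIFj = 2, for example, means that the variance of the coefficient estimate for variable j is twice what it would be if that variable were not affected by multicollinearity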
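The artificial-regression logic behind the VIF can be checked by hand in Stata. The following is a minimal sketch, assuming Stata's bundled auto dataset and an illustrative model of price on mpg, weight and length; neither the data nor the choice of variables comes from the lecture.

```stata
* Illustrative data only: the auto dataset ships with Stata
sysuse auto, clear

* Main regression, followed by the built-in VIF table
regress price mpg weight length
estat vif

* "Artificial" regression: weight on the remaining regressors
quietly regress weight mpg length

* VIF for weight = 1 / (1 - R-squared of the artificial regression)
display "VIF for weight = " 1/(1 - e(r2))
```

The value displayed by the last line should agree with the entry for weight in the estat vif table.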
Page 8
Although it is useful to be aware of the presence of multicollinearity, it is not easy to remedy severe (non-perfect) multicollinearity
If possible, adding observations or taking a new sample might help lessen multicollinearity
Page 9
Exclude the independent variables that appear to be causing the problem
Modifying the model specification can sometimes help, for example:
• using real instead of nominal economic data
• using a reciprocal instead of a polynomial specification on a given independent variable
Page 10
Var(u | x) = σ² [MLR.5]
Homoskedasticity assumption: the error variance is constant
This assumption implies that, conditional on the explanatory variables, the variance of the unobserved error, u, is constant
If this is not true, that is, if the variance of u is different for different values of the x’s, then the errors are heteroskedastic
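Page 12
This provides an estimator of the variance of $\hat{\beta}_j$ which is consistent under heteroskedasticity
Its square root can be used as a standard error for inference
These are called heteroskedasticity-consistent standard errors [or robust standard errors…]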
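Page 13
Robust standard errors only have asymptotic justification
With small sample sizes, t statistics formed with robust standard errors will not have a distribution close to the t distribution, and inferences will not be correct
In Stata, robust standard errors are obtained with the robust option of regress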
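A short Stata sketch of the robust option follows. It assumes the bundled auto dataset and an illustrative model that is not part of the lecture; the estat hettest step (Breusch-Pagan / Cook-Weisberg test) is also an addition made here to motivate the robust standard errors.

```stata
* Illustrative data only: the auto dataset ships with Stata
sysuse auto, clear

* OLS with conventional standard errors
regress price mpg weight

* Breusch-Pagan / Cook-Weisberg test, H0: constant error variance
estat hettest

* Same model with heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)
```

The coefficient estimates are identical in the two regressions; only the standard errors, t statistics and confidence intervals change.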
Page 14
(3) Autocorrelation
Autocorrelation occurs in time-series studies
when the errors associated with a given time
period carry over into future time periods
For example, if we are predicting the growth of stock dividends, an overestimate in one year is likely to lead to overestimates in succeeding
years
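Page 15
Test: Durbin-Watson statistic:
$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
with critical values d-lower and d-upper for n and K-1 d.f.
Decision zones for d:
0 to d-lower: positive autocorrelation (autocorrelation is clearly evident)
d-lower to d-upper: zone of indecision (ambiguous, cannot rule out autocorrelation)
d-upper to 4-d-upper: no autocorrelation (autocorrelation is not evident)
4-d-upper to 4-d-lower: zone of indecision (ambiguous, cannot rule out autocorrelation)
4-d-lower to 4: negative autocorrelation (autocorrelation is clearly evident)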
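The location of these zones can be motivated by a standard approximation. The symbol $\hat{\rho}$ below, the sample first-order autocorrelation of the residuals, is introduced here for illustration and does not appear on the slide.

```latex
% Expand the numerator of d and note that the two sums of squares are nearly equal
\sum_{i=2}^{n} (e_i - e_{i-1})^2
  = \sum_{i=2}^{n} e_i^2 + \sum_{i=2}^{n} e_{i-1}^2 - 2 \sum_{i=2}^{n} e_i e_{i-1}
  \approx 2 \sum_{i=1}^{n} e_i^2 - 2 \sum_{i=2}^{n} e_i e_{i-1}

% Dividing by the sum of squared residuals gives
d \approx 2 (1 - \hat{\rho}),
\qquad
\hat{\rho} = \frac{\sum_{i=2}^{n} e_i e_{i-1}}{\sum_{i=1}^{n} e_i^2}
```

So $\hat{\rho} = 0$ gives $d \approx 2$, strong positive autocorrelation ($\hat{\rho} \to 1$) pushes $d$ toward 0, and strong negative autocorrelation ($\hat{\rho} \to -1$) pushes $d$ toward 4.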
Page 16
regress lnY lnK lnL lnM horizontal Bam Bch
estat vif
calculates the centered or uncentered variance inflation
factors (VIFs) for the independent variables specified in a
linear regression model
Page 17
regress lnY lnK lnL lnM horizontal Bam Bch
Page 18
regress lnY lnK lnL lnM horizontal Bam Bch
estat bgodfrey
Breusch-Godfrey test for higher-order serial correlation
H0: no serial correlation
estat dwatson
Durbin-Watson d statistic to test for first-order serial correlation
The Durbin-Watson statistic has a range from 0 to 4 with a midpoint of 2
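Both tests require time-series data declared with tsset. The sketch below runs them on simulated data with AR(1) errors and then recomputes d by hand from the residuals. The variable names, the seed and the autocorrelation coefficient of 0.7 are arbitrary choices made for illustration.

```stata
* Simulated time series with positively autocorrelated errors (illustrative only)
clear
set seed 2014
set obs 100
generate t = _n
tsset t

generate x = rnormal()
generate e = rnormal() in 1
* replace works through the observations in order, so e[_n-1] is already filled in
replace e = 0.7*e[_n-1] + rnormal() in 2/l
generate y = 1 + 2*x + e

regress y x
estat dwatson               // Durbin-Watson d statistic
estat bgodfrey, lags(1 2)   // Breusch-Godfrey test, H0: no serial correlation

* Manual d: sum of squared residual changes over sum of squared residuals
predict ehat, residuals
generate num = (ehat - L.ehat)^2
generate den = ehat^2
quietly summarize num
scalar snum = r(sum)
quietly summarize den
display "manual d = " snum / r(sum)
```

With positively autocorrelated errors, d should fall well below 2, and the manual value should match the one reported by estat dwatson.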
For panel data:
Page 19
regress lnY lnK lnL lnM horizontal Bam Bch
estat ovtest
Ramsey regression specification-error test for omitted variables
H0: model has no omitted variables
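A minimal sketch of the Ramsey test, again assuming the bundled auto dataset and an illustrative model that is not from the lecture:

```stata
* Illustrative data only: the auto dataset ships with Stata
sysuse auto, clear
regress price mpg weight

* Ramsey RESET; by default it uses powers of the fitted values
estat ovtest
```

A small p-value leads to rejecting H0 and suggests that the functional form is misspecified or that relevant variables are missing.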
Page 20
xtreg lnY lnK lnL lnM horizontal Bam Bch
Multicollinearity: not problematic
Page 21
Pooled OLS
Hausman test
David Roodman, 2009. "How to do xtabond2: An introduction to difference and system GMM in Stata," Stata Journal, StataCorp LP, vol. 9(1), pages 86-136, March.
David Roodman, 2006. "How to Do xtabond2: An Introduction to "Difference" and "System" GMM in Stata," Working Papers 103, Center for Global Development.
Page 22
• Suppose y is firm output and x is the number of employees
• We have i = 1…n firms and t = 1…T time periods (years)
• A simple econometric model:
$$y_{it} = a_0 + a_1 x_{it} + u_{it}$$
u_it is a random error term: u_it ~ N(0, σ²)
Assumptions: the intercept and slope coefficients are constant across time and firms, and the error term captures differences over time and across firms
Pooled regression by OLS (STATA: xtreg …)
Page 23
Pooled regression by OLS may result in heterogeneity bias:
[Figure: the pooled regression line y_it = a_0 + a_1 x_it + u_it drawn against the true model lines for Firm 1, Firm 2, Firm 3 and Firm 4]
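Heterogeneity bias can be reproduced with a small simulation. The sketch below is only an illustration under assumed numbers: four firms observed for ten years, a true slope of 1 for every firm, and firm intercepts that fall as the typical level of x rises. All variable names and parameter values are hypothetical.

```stata
* Simulated panel: 4 firms x 10 years (illustrative numbers only)
clear
set seed 2014
set obs 4
generate id = _n
expand 10
bysort id: generate year = _n

* Firms with larger x have lower intercepts; the true slope is 1 for every firm
generate x = 5*id + 4*runiform()
generate y = (40 - 10*id) + 1*x + rnormal(0, 0.5)

* Pooled OLS ignores the firm-specific intercepts
regress y x

* Fixed effects (within) estimator uses only within-firm variation
xtset id year
xtreg y x, fe
```

Under this design the pooled slope should come out negative, because the between-firm pattern dominates, while the fixed effects slope should be close to the true value of 1.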
Page 24
Fixed Effects Estimation:
(One Way) Fixed Effects Model:
Allow each group (firm) to have its own intercept:
$$y_{it} = a_{0i} + a_1 x_{it} + u_{it}$$
HOW? Create a set of dummy (binary) variables, one for each firm, and include them as regressors:
$$y_{it} = \sum_{j=1}^{N} a_{0j} D_{j,it} + a_1 x_{it} + u_{it}$$
where D_{j,it} = 1 if observation it belongs to firm j and 0 otherwise
This form of estimation is also known as Least Squares Dummy Variables (LSDV)
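Page 25
(Two Way) Fixed Effects Model:
Also allow the intercept to vary across the different time periods (Two Way Fixed Effects):
$$y_{it} = a_{0i} + \lambda_t + a_1 x_{it} + u_{it}$$
where a_{0i} is the firm-specific intercept and λ_t is the intercept shift for period t, each estimated with a set of dummy variables
STATA: xtreg … i.id i.year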
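The dummy-variable approach can be tried on any panel. The sketch below assumes the Grunfeld investment data distributed with Stata's documentation (loaded with webuse, which needs internet access) and its variables invest, mvalue, kstock, company and year. The data and the model are illustrative and are not the lecture's own example.

```stata
* Illustrative panel only: Grunfeld investment data (downloaded by webuse)
webuse grunfeld, clear
xtset company year

* One-way fixed effects as LSDV: one dummy per firm via factor-variable notation
regress invest mvalue kstock i.company

* The same slope estimates from the built-in fixed effects estimator
xtreg invest mvalue kstock, fe

* Two-way fixed effects: firm effects through fe, time effects through year dummies
xtreg invest mvalue kstock i.year, fe
```

The coefficients on mvalue and kstock from the LSDV regression and from xtreg, fe are the same; the two commands differ only in how the firm intercepts are handled and reported.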
Page 26
Fixed Effects/Within:
discards all variation between individuals and uses only variation over time within an individual
$$(y_{it} - \bar{y}_i) = a_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$$
STATA: xtreg … , fe
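The within transformation can be carried out by hand to see what the fe option does. This is a sketch assuming the Grunfeld data from webuse (internet access required); the data, the model and the generated variable names mean_* and dm_* are illustrative and not from the lecture.

```stata
* Illustrative panel only: Grunfeld investment data (downloaded by webuse)
webuse grunfeld, clear
xtset company year

* Subtract each firm's own time average from every variable
foreach v in invest mvalue kstock {
    bysort company: egen double mean_`v' = mean(`v')
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned data, without a constant
regress dm_invest dm_mvalue dm_kstock, noconstant

* Built-in within estimator for comparison
xtreg invest mvalue kstock, fe
```

The slope estimates agree between the two regressions. The standard errors from the manual version are somewhat too small, because regress does not account for the degrees of freedom used up by the firm means.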
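Page 27
Random Effects Estimation:
$$y_{it} = a_0 + a_1 x_{it} + v_i + \varepsilon_{it}$$
where v_i is a random firm-specific error component and ε_it is the remaining error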
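Page 28
We assume that (v_i is the group-specific error component, ε_it the remaining error):
$$E(v_i) = E(\varepsilon_{it}) = 0$$
$$E(v_i^2) = \sigma_v^2, \quad E(\varepsilon_{it}^2) = \sigma_\varepsilon^2 \quad \text{(both components homoscedastic)}$$
$$E(v_i \varepsilon_{jt}) = 0 \ \text{for all } i, j, t \quad \text{(independence of two components)}$$
$$E(\varepsilon_{it} \varepsilon_{js}) = 0 \ \text{if } t \neq s \text{ or } i \neq j \quad \text{(no autocorrelation)}$$
$$E(v_i v_j) = 0 \ \text{if } i \neq j \quad \text{(no correlation across groups)}$$
$$E(v_i x_{it}) = E(\varepsilon_{it} x_{it}) = 0 \quad \text{(both independent of regressor)}$$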
Page 29
Choosing between Fixed Effects (FE) and Random Effects (RE)
1. With large T and small N there is likely to be little difference, so FE is preferable as it is easier to compute
2. With large N and small T, estimates can differ significantly. If the cross-sectional groups are a random sample of the population, RE is preferable. If not, FE is preferable
3. If the error component, v_i, is correlated with x then RE is biased, but FE is not
4. For large N and small T, and if the assumptions behind RE hold, RE is more efficient than FE
Page 31
Hausman test:
Tests for the statistical significance of the difference between the coefficient estimates obtained by FE and by RE, under the null hypothesis that the RE estimates are efficient and consistent, and the FE estimates are consistent but inefficient
STATA: hausman FE RE (LM test: xttest0 after xtreg , re)
Page 32
estimates store RE
hausman FE RE
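The stored names FE and RE must exist before hausman is called, so both models have to be estimated and stored first. A minimal sketch of the full sequence, assuming the Grunfeld data from webuse (internet access required) and an illustrative model that is not the lecture's own:

```stata
* Illustrative panel only: Grunfeld investment data (downloaded by webuse)
webuse grunfeld, clear
xtset company year

* Fixed effects estimates, stored under the name FE
xtreg invest mvalue kstock, fe
estimates store FE

* Random effects estimates, stored under the name RE
xtreg invest mvalue kstock, re
estimates store RE

* Breusch-Pagan LM test for random effects; run directly after xtreg, re
xttest0

* Hausman test: the consistent estimator (FE) first, the efficient one (RE) second
hausman FE RE
```

A small p-value from hausman leads to rejecting the null hypothesis and favors FE; a large p-value means the data give no reason to abandon the more efficient RE estimates.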