(1)PANEL DATA MODELS
(2)Panel data
data on MANY units and SEVERAL time periods
(3)Example
Viet Nam provincial data on
GDP: provincial GDP (mil VND)
LABFO: number of laborers in each province (1000 persons)
RINVEST: gross investment of each province (mil VND)
PCI: composite index, scaled to 100 points, measuring and ranking Viet Nam's provinces on their overall economic governance quality
(4)Example
provcode  province  year      rgdp   labfo  rinvest      pci
An Giang         1  2007  22000000  1221.3  5600000  66.4688
                 1  2008  25000000  1244.9  4600000  61.1247
                 1  2009  25000000  1227.3  4800000   58.177
                 1  2010  27000000    1255  4500000  61.9379
                 1  2011  29000000  1300.4  3900000    62.22
Bac Can          2  2007   1500000   177.2   592714  46.4687
                 2  2008   2000000   179.8  1100000  39.7762
                 2  2009   2400000   189.8  1100000  75.9563
                 2  2010   3400000     194  2600000  51.4864
                 2  2011   4200000   199.6  2900000    52.71
(5)Why panel data?
more information
heterogeneity among units
(6)Manipulation of panel data
xtset province year
panel variable: province (strongly balanced)
time variable: year, 2007 to 2011
delta: 1 unit
(7)Manipulation of panel data
xtdescribe
province: 1, 2, ..., 58                 n = 58
year:     2007, 2008, ..., 2011         T = 5
Delta(year) = 1 unit
Span(year)  = 5 periods
(province*year uniquely identifies each observation)

Distribution of T_i:   min    5%   25%   50%   75%   95%   max
                         5     5     5     5     5     5     5

  Freq.   Percent     Cum.   Pattern
     58    100.00   100.00   11111
     58    100.00            XXXXX
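The pattern column reported by xtdescribe can be reproduced by hand. The following is a minimal Python sketch, not Stata's implementation: the helper name `participation_patterns` and the toy observation list are assumptions made for illustration, and only the year range 2007 to 2011 comes from the slides.

```python
from collections import Counter

def participation_patterns(obs, years):
    """Count participation patterns such as '11111' across units.

    obs   -- iterable of (unit, year) pairs, one per observation
    years -- ordered list of all periods in the panel
    """
    seen = {}
    for unit, year in obs:
        seen.setdefault(unit, set()).add(year)
    patterns = Counter()
    for unit_years in seen.values():
        # '1' where the unit is observed in that year, '.' where it is not
        patterns["".join("1" if y in unit_years else "." for y in years)] += 1
    return patterns

years = [2007, 2008, 2009, 2010, 2011]
# Toy data (hypothetical): provinces 1 and 2, each observed in every year
obs = [(p, y) for p in (1, 2) for y in years]

patterns = participation_patterns(obs, years)
# The panel is balanced when every unit shows the all-ones pattern
balanced = set(patterns) == {"1" * len(years)}
print(dict(patterns), balanced)
```

A panel is balanced when the only pattern present is a full run of ones, which is what the single 11111 row for all 58 provinces says.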
(9)Manipulation of panel data
xtsum rgdp rinvest pci

Variable            Mean       Std Dev    Min         Max         Observations
rgdp     overall    2.05e+07   2.96e+07   1485281     1.71e+08    N = 280
         between               2.89e+07   2651895     1.39e+08    n = 58
         within                4884013    -502939.8   5.25e+07    T-bar = 4.82759
rinvest  overall    8788711    1.37e+07   592714.3    9.53e+07    N = 273
         between               1.33e+07   1351629     8.79e+07    n = 58
         within                1960123    -6043249    1.63e+07    T-bar = 4.7069
pci      overall    57.23728   6.969482   36.39006    77.19708    N = 290
         between               4.380076   49.67015    67.33098    n = 58
         within                5.445562   41.77384    79.91406    T = 5
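The between and within rows can be illustrated in a few lines. This is a minimal Python sketch using only the pci values of the two provinces listed in the example slide. It assumes the usual xtsum convention that the within values are deviations from the province's own mean with the overall mean added back; that convention is why a within minimum can be negative even when the variable itself never is.

```python
from statistics import mean

# pci values for the two provinces shown in the example slide
pci = {
    "An Giang": [66.4688, 61.1247, 58.177, 61.9379, 62.22],
    "Bac Can": [46.4687, 39.7762, 75.9563, 51.4864, 52.71],
}

all_values = [x for xs in pci.values() for x in xs]
grand_mean = mean(all_values)

# Between variation: one mean per province
unit_means = {name: mean(xs) for name, xs in pci.items()}

# Within variation: deviation from the province's own mean,
# re-centred on the grand mean (assumed xtsum convention)
within = {
    name: [x - unit_means[name] + grand_mean for x in xs]
    for name, xs in pci.items()
}

for name in pci:
    print(name, round(unit_means[name], 4), [round(w, 2) for w in within[name]])
```

With only two provinces the numbers will not match the table, which uses all 58; the sketch only shows how each row is constructed.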
(10)Manipulation of panel data
* Tabulate panel data
pcidummy = 1 if pci is above average
xttab pcidummy

               Overall           Between         Within
pcidummy    Freq   Percent    Freq   Percent    Percent
0            135     46.55      49     84.48      55.10
1            155     53.45      54     93.10      57.41
Total        290    100.00     103    177.59      56.31
                            (n = 58)

Overall: 53.45% of province-year observations have pci above average
Between: 93.1% of provinces have pci above average in at least one year
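The three percentages in the xttab output can be computed directly. The following is a minimal Python sketch on a hypothetical balanced toy panel; the function name `xttab` is borrowed for readability and this is not Stata's code.

```python
def xttab(panel, value):
    """Overall, between and within percentages for `value`.

    panel -- dict mapping unit -> list of observed values over time
    Assumes a balanced panel, so unit fractions can be averaged directly.
    """
    n_obs = sum(len(v) for v in panel.values())
    # Overall: share of all unit-period observations equal to `value`
    overall = 100 * sum(v.count(value) for v in panel.values()) / n_obs
    # Between: share of units that ever take `value`
    ever = [v for v in panel.values() if value in v]
    between = 100 * len(ever) / len(panel)
    # Within: among those units, the average share of periods spent at `value`
    within = 100 * sum(v.count(value) / len(v) for v in ever) / len(ever) if ever else 0.0
    return overall, between, within

# Toy data (hypothetical): 1 = pci above average in that year
toy = {"A": [1, 1, 0, 0], "B": [0, 0, 0, 0], "C": [1, 1, 1, 1]}
overall, between, within = xttab(toy, 1)
print(overall, between, within)
```

The same arithmetic reproduces the slide: 155/290 = 53.45% overall, 54/58 = 93.10% between, and 155/(54 × 5) = 57.41% within.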
(11)Manipulation of panel data
xtline pci if province<=10, overlay
[Figure: PCI (vertical axis, 40 to 80) against YEAR (2007 to 2011), one overlaid line per province for province<=10]
(12)Pooled OLS
Fixed Effects (FE) Model
Random Effects (RE) Model
(13)Basic considerations
Pooled OLS: $y_{it} = \alpha + X_{it}\beta + u_{it}$
Unit-specific effects: $y_{it} = \alpha_i + X_{it}\beta + u_{it}$
Two-way effects model: $y_{it} = \alpha_i + \gamma_t + X_{it}\beta + u_{it}$
Mixed model [or Random coefficients model]
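The difference between pooled OLS and the unit-specific effects model is easiest to see on data where the unit effect is correlated with the regressor. The following is a minimal Python sketch on synthetic data: the true slope of 1.5, the unit effects and the small deterministic disturbance are all hypothetical choices, not values from the lecture.

```python
def slope(xs, ys):
    """OLS slope of y on x with an intercept (single regressor)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def demean(values):
    """Subtract the unit's own mean from each value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

BETA = 1.5  # true slope in the synthetic data (hypothetical)
panel = {}
for i in range(5):                      # 5 units
    alpha_i = 2.0 * i                   # unit effect, correlated with x by construction
    xs = [i + t for t in range(4)]      # 4 periods
    # small deterministic disturbance that alternates in sign
    ys = [alpha_i + BETA * x + 0.1 * (-1) ** (i + t) for t, x in enumerate(xs)]
    panel[i] = (xs, ys)

# Pooled OLS: stack all observations and ignore the unit effects
pooled = slope([x for xs, _ in panel.values() for x in xs],
               [y for _, ys in panel.values() for y in ys])

# Within (FE) estimator: demean x and y inside each unit, then pool
fe = slope([x for xs, _ in panel.values() for x in demean(xs)],
           [y for _, ys in panel.values() for y in demean(ys)])

print(f"pooled slope = {pooled:.3f}, within slope = {fe:.3f}, true = {BETA}")
```

Because the unit effect rises with x, the pooled slope absorbs part of it and lands well above 1.5, while demeaning within each unit removes the effect and brings the estimate back close to the true value.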
(14)Pooled OLS
assumes identical intercept for all units and time periods
assumes errors are independent across all units i.
(15)Pooled OLS
assumes identical intercept for all units and time periods
(Std Err adjusted for 58 clusters in province)

                        Robust
lgdp        Coef        Std Err     t       P>|t|    [95% Conf Interval]
llabor      .4986047    .1981111    2.52    0.015     .1018941    .8953153
linvest     .6307949    .1446832    4.36    0.000     .3410717     .920518
pci         .0107977    .0063007    1.71    0.092    -.0018192    .0234146
_cons       2.726831     1.10059    2.48    0.016     .5229367    4.930725
(16)Unit-specific effects model
sigma_u   .48582326
sigma_e   .15132407
rho       .91156075   (fraction of variance due to u_i)
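The rho line follows directly from the two standard deviations reported with it. The following is a minimal Python sketch of that arithmetic; the function name `rho` is just a label for the illustration.

```python
def rho(sigma_u, sigma_e):
    """Share of the total error variance attributable to the unit effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

# sigma_u and sigma_e as reported on this slide
fe_rho = rho(0.48582326, 0.15132407)
print(round(fe_rho, 4))  # 0.9116
```

The same function applies to the sigma_u and sigma_e pairs reported for the two-way and random effects models.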
(17)Two-way effects model
xtreg lgdp llabor linvest pci i.year, fe
lgdp        Coef        Std Err     t        P>|t|    [95% Conf Interval]
year
  2008      .0979883    .0202528     4.84    0.000    .0580589    .1379177
  2009      .1845611    .0219074     8.42    0.000    .1413696    .2277525
  2010      .2969721    .0238665    12.44    0.000    .2499183    .3440259
  2011      .4228365    .0253706    16.67    0.000    .3728171    .4728559
_cons       14.54723     .887371    16.39    0.000    12.79774    16.29672

sigma_u   .81298858
sigma_e   .09500439
rho       .98652813   (fraction of variance due to u_i)
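The year coefficients are differences in lgdp relative to the omitted base year, 2007. Assuming lgdp is the natural log of rgdp (the slides do not say so explicitly), they convert to percentage differences as in the following minimal Python sketch.

```python
import math

# Year coefficients from the two-way FE output (2007 is the omitted base year)
year_coef = {2008: 0.0979883, 2009: 0.1845611, 2010: 0.2969721, 2011: 0.4228365}

# With a log dependent variable, exp(coef) - 1 is the proportional
# difference from the base year, holding the other regressors fixed
pct_vs_2007 = {yr: 100 * (math.exp(b) - 1) for yr, b in year_coef.items()}

for yr, pct in pct_vs_2007.items():
    print(yr, f"{pct:.1f}%")
```

For small coefficients the raw value times 100 is a close approximation, but by 2011 the gap between the two is noticeable.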
(18)Problems of FE models
FE models are equivalent to pooled OLS with unit-specific dummies and/or time-specific dummies included.
Problem: including so many dummy variables uses up degrees of freedom.
Solution: Random Effects [RE] Model, which keeps the same equation but treats $\alpha_i$ as random instead of estimating one dummy per unit:
$y_{it} = \alpha_i + X_{it}\beta + u_{it}$
(19)Random effects model
sigma_u   .37524081
sigma_e   .15132407
rho       .86012018   (fraction of variance due to u_i)
(20)Problem of RE model
No dummy variables added, so efficient.
Assumption: $\alpha_i \sim N(0, \sigma^2)$, uncorrelated with $X_{it}$
If the assumption is not satisfied, then $\hat{\beta}_{RE}$ is inconsistent.
In summary
FE: inefficient, but consistent
RE: efficient, but possibly inconsistent
RE will be better if $\hat{\beta}_{RE}$ is consistent.
(21)Hausman test
Recall that RE will be better if $\hat{\beta}_{RE}$ is consistent.
$\hat{\beta}_{RE}$ is unbiased if it is not systematically different from $\hat{\beta}_{FE}$.
Hausman test null hypothesis: $\hat{\beta}_{RE}$ is not systematically different from $\hat{\beta}_{FE}$
if the null hypothesis is rejected: FE is better
if the null hypothesis is not rejected: RE is better
(22)Hausman test
xtreg lgdp llabor linvest pci, fe
estimate store fixed
xtreg lgdp llabor linvest pci, re
estimate store random
(23)Hausman test
hausman fixed random

                  Coefficients
            (b)         (B)          (b-B)         sqrt(diag(V_b-V_B))
            fixed       random       Difference    S.E
llabor      1.173004    .8752699      .2977342     .141169
linvest      .258814    .3487632     -.0899493     .0160913
pci         .0044829    .0042964      .0001865

b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
        = 30.38
Prob>chi2 = 0.0000
(V_b-V_B is not positive definite)

Prob>chi2 is below 0.05, so the null hypothesis is rejected: FE is better.
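The chi2 formula is easiest to follow for a single coefficient, where the matrix inverse reduces to a division. The following is a minimal Python sketch under that simplification. The helper `hausman_1df` is hypothetical, and it does not reproduce the joint chi2(3) statistic of 30.38, which needs the full covariance matrices of both estimators.

```python
import math

def hausman_1df(b_fe, b_re, se_diff):
    """Single-coefficient Hausman-type statistic and its chi2(1) p-value.

    se_diff is sqrt(V_b - V_B) for that coefficient, i.e. the S.E column.
    """
    h = ((b_fe - b_re) / se_diff) ** 2
    # Survival function of a chi-squared variable with 1 degree of freedom
    p = math.erfc(math.sqrt(h / 2))
    return h, p

# llabor row of the Hausman table: fixed, random, S.E
h, p = hausman_1df(1.173004, 0.8752699, 0.141169)
print(f"H = {h:.2f}, p = {p:.3f}")
```

A small p-value here says the FE and RE estimates of this one coefficient differ by more than sampling noise would explain, which is the same logic the joint test applies to all three coefficients at once.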
(24)Random Coefficients Model
xtmixed lgdp llabor linvest pci || province: pci
Random-effects Parameters    Estimate    Std Err     [95% Conf Interval]
province: Independent
  sd(pci)                    .0044457    .0017468    .0020582    .0096027
  sd(_cons)                  .3422018    .0864983    .2085088    .5616171
sd(Residual)                 .1501907    .0075356    .1361241    .1657109

LR test vs linear regression: chi2(2) = 341.52   Prob > chi2 = 0.0000
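The model behind this command can be written out explicitly. The following is a sketch, assuming the usual reading of `|| province: pci` as a random intercept and a random slope on pci at the province level; the symbols $a_i$, $b_i$, $\sigma_a$, $\sigma_b$ and $\sigma_u$ are notation introduced here, not taken from the slides.

```latex
y_{it} = (\alpha + a_i) + X_{it}\beta + b_i\,\mathrm{pci}_{it} + u_{it},
\qquad
a_i \sim N(0, \sigma_a^2),\quad
b_i \sim N(0, \sigma_b^2),\quad
u_{it} \sim N(0, \sigma_u^2)
```

Under this reading, sd(_cons) estimates $\sigma_a$, sd(pci) estimates $\sigma_b$, and sd(Residual) estimates $\sigma_u$. "Independent" means $a_i$ and $b_i$ are taken to be uncorrelated, and the LR test compares this model with a linear regression in which both variances are zero, which is why it has 2 degrees of freedom.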