Lecture 11: Panel Data Models


PANEL DATA MODELS


Panel data

• Data on many units over several time periods

Example

Viet Nam provincial data on:

• GDP: provincial GDP (million VND)
• LABFO: number of laborers in each province (thousand persons)
• RINVEST: gross investment of each province (million VND)
• PCI: composite index on a 100-point scale that measures and ranks Vietnam's provinces by their overall economic governance quality

Example

provcode  province  year      rgdp   labfo  rinvest      pci
       1  An Giang  2007  22000000  1221.3  5600000  66.4688
       1  An Giang  2008  25000000  1244.9  4600000  61.1247
       1  An Giang  2009  25000000  1227.3  4800000   58.177
       1  An Giang  2010  27000000    1255  4500000  61.9379
       1  An Giang  2011  29000000  1300.4  3900000    62.22
       2  Bac Can   2007   1500000   177.2   592714  46.4687
       2  Bac Can   2008   2000000   179.8  1100000  39.7762
       2  Bac Can   2009   2400000   189.8  1100000  75.9563
       2  Bac Can   2010   3400000     194  2600000  51.4864
       2  Bac Can   2011   4200000   199.6  2900000    52.71

Why panel data?

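• More information: more observations and more variation than a single cross-section or a single time series
• Heterogeneity among units can be modelled and controlled for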

Manipulation of panel data

xtset province year

       panel variable:  province (strongly balanced)
        time variable:  year, 2007 to 2011
                delta:  1 unit
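As a minimal sketch of how the panel is declared, the block below rebuilds only the two provinces listed in the example table (An Giang and Bac Can) and runs `xtset` on them. The value-label name `provlbl` is a hypothetical name chosen for illustration, and the full 58-province file is not reproduced here.

```stata
* Sketch: enter the ten example rows by hand and declare the panel structure.
clear
input province year rgdp labfo rinvest pci
1 2007 22000000 1221.3 5600000 66.4688
1 2008 25000000 1244.9 4600000 61.1247
1 2009 25000000 1227.3 4800000 58.177
1 2010 27000000 1255   4500000 61.9379
1 2011 29000000 1300.4 3900000 62.22
2 2007  1500000  177.2  592714 46.4687
2 2008  2000000  179.8 1100000 39.7762
2 2009  2400000  189.8 1100000 75.9563
2 2010  3400000  194   2600000 51.4864
2 2011  4200000  199.6 2900000 52.71
end

* Attach province names to the numeric codes (label name is hypothetical).
label define provlbl 1 "An Giang" 2 "Bac Can"
label values province provlbl

* Panel variable first, time variable second.
xtset province year
```

With only these ten rows, Stata should again report a strongly balanced panel, now with two units instead of 58.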

xtdescribe

province:  1, 2, ..., 58                                   n =         58
    year:  2007, 2008, ..., 2011                           T =          5
           Delta(year) = 1 unit
           Span(year)  = 5 periods
           (province*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         5       5       5         5         5       5       5

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
       58    100.00  100.00 |  11111
 ---------------------------+---------
       58    100.00         |  XXXXX


xtsum rgdp rinvest pci

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
rgdp     overall |  2.05e+07   2.96e+07    1485281   1.71e+08 |     N =     280
         between |             2.89e+07    2651895   1.39e+08 |     n =      58
         within  |              4884013  -502939.8   5.25e+07 | T-bar = 4.82759
                 |                                            |
rinvest  overall |   8788711   1.37e+07   592714.3   9.53e+07 |     N =     273
         between |             1.33e+07    1351629   8.79e+07 |     n =      58
         within  |              1960123   -6043249   1.63e+07 | T-bar =  4.7069
                 |                                            |
pci      overall |  57.23728   6.969482   36.39006   77.19708 |     N =     290
         between |             4.380076   49.67015   67.33098 |     n =      58
         within  |             5.445562   41.77384   79.91406 |     T =       5
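The overall, between and within rows come from splitting each variable into a unit mean and a deviation from that mean. The definitions below are a sketch of what `xtsum` summarizes in each row; the symbols $\bar{x}_i$ (mean of unit $i$) and $\bar{\bar{x}}$ (overall mean) are notation assumed here, not taken from the slides.

```latex
\begin{aligned}
\text{overall:} &\quad x_{it} \\
\text{between:} &\quad \bar{x}_i = \frac{1}{T_i}\sum_{t} x_{it} \\
\text{within:}  &\quad x_{it} - \bar{x}_i + \bar{\bar{x}}
\end{aligned}
```

Because the overall mean is added back to the deviation, the within minimum can be negative even for a strictly positive variable, which is what the rgdp and rinvest rows show.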

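xttab pcidummy

                  Overall             Between            Within
 pcidummy |    Freq.  Percent      Freq.  Percent        Percent
----------+-----------------------------------------------------
        0 |      135    46.55        49     84.48          55.10
        1 |      155    53.45        54     93.10          57.41
----------+-----------------------------------------------------
    Total |      290   100.00       103    177.59          56.31
                               (n = 58)

* Tabulate panel data

pcidummy = 1 if pci is above average, 0 otherwise.

• Overall: 53.45% of province-year observations have pci above average.
• Between: 93.1% of provinces have pci above average in at least one year.
• Within: provinces that are ever above average are above average in 57.41% of their observations.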
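The slides do not show how the indicator was built. One plausible construction is sketched below on simulated data (not the lecture's dataset), assuming "average" means the overall sample mean of pci; the seed and the simulated distribution are arbitrary.

```stata
* Sketch on SIMULATED data: 58 provinces observed for 5 years each.
clear
set seed 2011
set obs 58
generate province = _n
expand 5
bysort province: generate year = 2006 + _n
xtset province year
generate pci = rnormal(57, 7)        // made-up index values

* Assumption: the cutoff is the overall sample mean of pci.
summarize pci, meanonly
generate byte pcidummy = (pci > r(mean)) if !missing(pci)

* Overall, between and within frequencies of the indicator.
xttab pcidummy
```

The `if !missing(pci)` qualifier matters on real data, because Stata treats missing values as larger than any number and would otherwise code them as 1.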

xtline pci if province<=10, overlay

[Figure: PCI (vertical axis, 40 to 80) against YEAR (2007 to 2011), one overlaid line per province for provinces 1 to 10]

Pooled OLS

Fixed Effects (FE) Model

Random Effects (RE) Model


Basic considerations

• Pooled OLS: $y_{it} = \alpha + X_{it}\beta + u_{it}$
• Unit-specific effects: $y_{it} = \alpha_i + X_{it}\beta + u_{it}$
• Two-way effects model: $y_{it} = \alpha_i + \lambda_t + X_{it}\beta + u_{it}$
• Mixed model [or random coefficients model]

Pooled OLS

• Assumes an identical intercept for all units and time periods
• Assumes the errors are independent across all units i

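                             (Std. Err. adjusted for 58 clusters in province)
------------------------------------------------------------------------------
             |               Robust
        lgdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      llabor |   .4986047   .1981111     2.52   0.015     .1018941    .8953153
     linvest |   .6307949   .1446832     4.36   0.000     .3410717     .920518
         pci |   .0107977   .0063007     1.71   0.092    -.0018192    .0234146
       _cons |   2.726831    1.10059     2.48   0.016     .5229367    4.930725
------------------------------------------------------------------------------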
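The "Robust" column heading indicates robust standard errors, most likely clustered by province. A sketch of a pooled regression of that kind follows, run on simulated data (not the lecture's dataset). The `vce(cluster province)` option is an assumption about how the standard errors were obtained, and every coefficient in the data-generating lines is made up.

```stata
* Sketch on SIMULATED data; variable names mirror the slides, values do not.
clear
set seed 12345
set obs 58
generate province = _n
generate a_i = rnormal(0, 0.5)       // unobserved province effect
expand 5
bysort province: generate year = 2006 + _n
xtset province year
generate llabor  = rnormal(6, 1)
generate linvest = rnormal(15, 1)
generate pci     = rnormal(57, 7)
generate lgdp    = 2 + 0.5*llabor + 0.6*linvest + 0.01*pci + a_i + rnormal(0, 0.15)

* Pooled OLS: one intercept and one set of slopes for all provinces and years.
* Clustering lets the errors be correlated within a province over time.
regress lgdp llabor linvest pci, vce(cluster province) noheader
```

Clustering changes only the standard errors. The coefficients are the same as in plain `regress`.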

Unit-specific effects model

     sigma_u |  .48582326
     sigma_e |  .15132407
         rho |  .91156075   (fraction of variance due to u_i)
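The reported rho can be checked from the two standard deviations. In Stata's output, u_i denotes the unit-specific effect and e the idiosyncratic error, so the share of variance due to u_i works out as follows (intermediate values rounded):

```latex
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
     = \frac{0.48582326^2}{0.48582326^2 + 0.15132407^2}
     \approx \frac{0.23602}{0.23602 + 0.02290}
     \approx 0.9116
```

About 91% of the unexplained variation in lgdp is therefore attributed to differences between provinces rather than to year-to-year noise.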

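Two-way effects model

xtreg lgdp llabor linvest pci i.year, fe

        lgdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |
       2008  |   .0979883   .0202528     4.84   0.000     .0580589    .1379177
       2009  |   .1845611   .0219074     8.42   0.000     .1413696    .2277525
       2010  |   .2969721   .0238665    12.44   0.000     .2499183    .3440259
       2011  |   .4228365   .0253706    16.67   0.000     .3728171    .4728559
             |
       _cons |   14.54723    .887371    16.39   0.000     12.79774    16.29672
-------------+----------------------------------------------------------------
     sigma_u |  .81298858
     sigma_e |  .09500439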
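Each year dummy above is individually significant. A joint test of all year effects is a natural companion check. The sketch below runs one with `testparm` on simulated data (not the lecture's dataset), where the trend of 0.1 per year is made up purely for illustration.

```stata
* Sketch on SIMULATED data with a province effect and a common yearly trend.
clear
set seed 12345
set obs 58
generate province = _n
generate a_i = rnormal(0, 0.5)       // unobserved province effect
expand 5
bysort province: generate year = 2006 + _n
xtset province year
generate llabor  = rnormal(6, 1)
generate linvest = rnormal(15, 1)
generate pci     = rnormal(57, 7)
generate lgdp    = 2 + 0.5*llabor + 0.6*linvest + 0.01*pci ///
                   + 0.1*(year - 2007) + a_i + rnormal(0, 0.15)

* Two-way model: province fixed effects plus year dummies (2007 is the base).
xtreg lgdp llabor linvest pci i.year, fe

* Joint F test that all year dummies are zero.
testparm i.year
```

A small p-value from `testparm` supports keeping the time effects in the model.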

Problems of FE models

• FE models are equivalent to pooled OLS with unit-specific dummies and/or time-specific dummies included.
• Problem: including so many dummy variables uses up degrees of freedom.
• Solution: Random Effects [RE] model, in which $\alpha_i$ is treated as random rather than estimated as a separate parameter for each unit:

$y_{it} = \alpha_i + X_{it}\beta + u_{it}$
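The equivalence between FE and pooled OLS with unit dummies can be checked directly. The sketch below uses simulated data (not the lecture's dataset) and compares `xtreg, fe` with a dummy-variable regression. The slope estimates should coincide, while the dummy-variable version also prints one coefficient per province.

```stata
* Sketch on SIMULATED data; all data-generating values are made up.
clear
set seed 12345
set obs 58
generate province = _n
generate a_i = rnormal(0, 0.5)       // unobserved province effect
expand 5
bysort province: generate year = 2006 + _n
xtset province year
generate llabor  = rnormal(6, 1)
generate linvest = rnormal(15, 1)
generate pci     = rnormal(57, 7)
generate lgdp    = 2 + 0.5*llabor + 0.6*linvest + 0.01*pci + a_i + rnormal(0, 0.15)

* Within (fixed effects) estimator.
xtreg lgdp llabor linvest pci, fe

* Same slopes from pooled OLS with a dummy for each province (57 dummies).
regress lgdp llabor linvest pci i.province
```

With 58 provinces the dummy-variable regression estimates 57 extra parameters, which is the degrees-of-freedom cost the slide refers to.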

Random effects model

     sigma_u |  .37524081
     sigma_e |  .15132407

Problem of RE model

• No dummy variables are added, so RE is efficient.
• Assumption: $\alpha_i \sim N(\alpha, \sigma_\alpha^2)$, uncorrelated with $X_{it}$
• If the assumption is not satisfied, then $\hat{\beta}_{RE}$ is inconsistent.
• In summary:
  - FE ($\hat{\beta}_{FE}$): inefficient, but consistent
  - RE ($\hat{\beta}_{RE}$): efficient, but possibly inconsistent
• RE will be better if $\hat{\beta}_{RE}$ is consistent.

Hausman test

• Recall that RE will be better if $\hat{\beta}_{RE}$ is consistent.
• $\hat{\beta}_{RE}$ is consistent if it is not systematically different from $\hat{\beta}_{FE}$.
• Hausman test null hypothesis: $\hat{\beta}_{RE}$ is not systematically different from $\hat{\beta}_{FE}$
  - if the null hypothesis is rejected: FE is better
  - if the null hypothesis is not rejected: RE is better

Hausman test

xtreg lgdp llabor linvest pci, fe
estimates store fixed
xtreg lgdp llabor linvest pci, re
estimates store random
hausman fixed random

Hausman test

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference          S.E.
-------------+----------------------------------------------------------------
      llabor |    1.173004     .8752699        .2977342         .141169
     linvest |     .258814     .3487632       -.0899493        .0160913
         pci |    .0044829     .0042964        .0001865               .
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       30.38
                Prob>chi2 =      0.0000
                (V_b-V_B is not positive definite)
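With Prob>chi2 = 0.0000 the null hypothesis is rejected, which favors FE. The note that V_b-V_B is not positive definite means the statistic should be read with some caution. One common response is to rerun the test with the `sigmamore` option, which bases both covariance matrices on the same error-variance estimate. The sketch below shows the sequence on simulated data (not the lecture's dataset), where linvest is deliberately made to correlate with the province effect so that RE is inconsistent by construction.

```stata
* Sketch on SIMULATED data; the correlation between linvest and a_i is built in.
clear
set seed 12345
set obs 58
generate province = _n
generate a_i = rnormal(0, 0.5)       // unobserved province effect
expand 5
bysort province: generate year = 2006 + _n
xtset province year
generate llabor  = rnormal(6, 1)
generate linvest = 15 + a_i + rnormal(0, 1)   // correlated with a_i
generate pci     = rnormal(57, 7)
generate lgdp    = 2 + 0.5*llabor + 0.6*linvest + 0.01*pci + a_i + rnormal(0, 0.15)

xtreg lgdp llabor linvest pci, fe
estimates store fixed
xtreg lgdp llabor linvest pci, re
estimates store random

* Consistent estimator first, efficient estimator second.
hausman fixed random, sigmamore
```

The order of the two stored results matters: the estimator that is consistent under both hypotheses (FE) goes first.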

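Random Coefficients Model

xtmixed lgdp llabor linvest pci || province: pci

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
province: Independent        |
                     sd(pci) |   .0044457   .0017468      .0020582    .0096027
                   sd(_cons) |   .3422018   .0864983      .2085088    .5616171
-----------------------------+------------------------------------------------
                sd(Residual) |   .1501907   .0075356      .1361241    .1657109
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =   341.52   Prob > chi2 = 0.0000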
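In this model both the intercept and the pci slope vary across provinces. The sketch below fits the same specification on simulated data (not the lecture's dataset) using `mixed`, the name `xtmixed` took from Stata 13 onward. The random-slope standard deviation of 0.02 is made up, and convergence on other data is not guaranteed.

```stata
* Sketch on SIMULATED data with a random intercept and a random slope on pci.
clear
set seed 12345
set obs 58
generate province = _n
generate a_i = rnormal(0, 0.35)      // province-specific intercept shift
generate b_i = rnormal(0, 0.02)      // province-specific shift in the pci slope
expand 5
bysort province: generate year = 2006 + _n
xtset province year
generate llabor  = rnormal(6, 1)
generate linvest = rnormal(15, 1)
generate pci     = rnormal(57, 7)
generate lgdp    = 2 + 0.5*llabor + 0.6*linvest + (0.01 + b_i)*pci ///
                   + a_i + rnormal(0, 0.15)

* Random intercept and random pci slope by province (independent by default).
* stddeviations reports sd() rather than var(), matching the output above.
mixed lgdp llabor linvest pci || province: pci, stddeviations
```

The LR test printed at the bottom compares this model with an ordinary linear regression without any random effects.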
