Dr Pham Thi Bich Ngoc
Hoa Sen University
ngoc.phamthibich@hoasen.edu.vn
Learn and use STATA:
http://www.ats.ucla.edu/stat/stata/
"Introductory Econometrics: A Modern Approach" - Jeffrey M. Wooldridge (2012)
"Econometric Analysis of Cross Section and Panel Data" - Jeffrey M. Wooldridge (2010)
These are models that combine cross-section and time-series data.
In panel data the same cross-sectional unit (industry, firm, country) is surveyed over time, so we have data that are pooled over space as well as time.
Entity dimension and time dimension
i: ID (firm, individual, household, country, industry)
t: time (day, week, quarter, year)

ID      YEAR  WAGE  EDU  EXP  MARRIED
KHOA    2010  7     12   6    0
KHOA    2011  8     12   7    0
KHOA    2012  8     12   8    0
PHUONG  2010  5     12   1    0
PHUONG  2011  5     13   3    0

Excel file: BT1
If all the cross-sectional units have the same number of time series observations, the panel is balanced; if not, it is unbalanced.
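\[
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
\]
Rows: time series (t = 1, ..., T). Columns: cross section (i = 1, ..., N).
A matrix of balanced panel data observations on variable y, with N cross-sectional observations and T time series observations.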
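The balanced/unbalanced distinction can be checked mechanically once the data are in long format (one row per unit and period). The following is a minimal Python sketch, not part of the original slides; it reuses the KHOA/PHUONG wage rows from the earlier example, and the function names `periods_per_unit` and `is_balanced` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict

# Long-format panel: one row per (unit, period).
# Values copied from the KHOA/PHUONG example: (id, year, wage).
rows = [
    ("KHOA", 2010, 7), ("KHOA", 2011, 8), ("KHOA", 2012, 8),
    ("PHUONG", 2010, 5), ("PHUONG", 2011, 5),
]


def periods_per_unit(rows):
    """Return {unit id: number of distinct time periods observed}."""
    seen = defaultdict(set)
    for unit, year, _ in rows:
        seen[unit].add(year)
    return {unit: len(years) for unit, years in seen.items()}


def is_balanced(rows):
    """True when every unit has the same number of observed periods."""
    return len(set(periods_per_unit(rows).values())) == 1


counts = periods_per_unit(rows)
print(counts)             # KHOA has 3 periods, PHUONG has 2
print(is_balanced(rows))  # the example panel is unbalanced
```

Dropping the KHOA 2012 row would leave two periods per person and make this small panel balanced.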
1. Panel data can take explicit account of individual-specific heterogeneity ("individual" here means related to the micro unit).
2. By combining data in two dimensions, panel data gives more data variation, less collinearity and more degrees of freedom.
3. Panel data is better suited than cross-sectional data for studying the dynamics of change. For example, it is well suited to understanding transition behaviour, such as company bankruptcy or merger, the effects of technological change, or economic cycles.
Grunfeld and Griliches [1960]
◦ i = 10 firms: GM, CH, GE, WE, US, AF, DM, GY, UN,
yit = Real per capita GDP
1 ln( ) ln( )
LWAGE = log of wage = dependent variable in regressions
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar,
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
Pooled OLS
Difference in Differences, First Differences (FD), Between Effects, Fixed Effects (FE), Random Effects (RE), and the Hausman test
Two-Stage Least Squares (2SLS)
David Roodman, 2009. "How to do xtabond2: An introduction to difference and system GMM in Stata," Stata Journal, StataCorp LP, vol. 9(1), pages 86-136, March.
David Roodman, 2006. "How to Do xtabond2: An Introduction to "Difference" and "System" GMM in Stata," Working Papers 103, Center for Global Development.
A. Pooled OLS
(Pooled Cross Section)
We often loosely use the term panel data to refer to any data set that has both a cross-sectional dimension and a time-series dimension.
We may want to pool cross sections just to get bigger sample sizes.
We may want to pool cross sections to investigate the effect of time.
We may want to pool cross sections to investigate whether relationships have changed over time.
Pooling treats the observations from every time period as ordinary observations.
Time stands for the sum of all events affecting the economy.
• Suppose y is firm output and x is the number of employees
• We have i = 1, …, n firms and t = 1, …, T time periods (years)
• A simple econometric model:
\[ y_{it} = a_0 + a_1 x_{it} + \epsilon_{it} \]
ϵit is a random error term: ϵit ~ N(0, σ²)
Assumptions: the intercept and slope coefficients are constant across time and firms, and the error term captures all differences over time and across firms. Is this realistic?
Pooled regression by POLS may result in heterogeneity bias:
Pooled regression: yit = a0 + a1 xit + uit
[Figure: the pooled regression line compared with the true model lines for Firm 1, Firm 2, Firm 3 and Firm 4]
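Heterogeneity bias is easy to reproduce numerically. The Python sketch below is an illustration with made-up data, not the data behind the slide: three hypothetical firms share a true slope of +1 but have different intercepts, and the firms with larger x have lower intercepts. The helper `ols_slope` is a hypothetical name for a one-regressor OLS slope.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: firm f has intercept 30 - 20*f and true slope +1.
panel = {}
for f in range(3):
    x = [10.0 * f + t for t in range(3)]
    y = [(30.0 - 20.0 * f) + 1.0 * xi for xi in x]
    panel[f] = (x, y)

# Pooled OLS ignores the firm identity and stacks all observations.
pooled_x = [xi for x, _ in panel.values() for xi in x]
pooled_y = [yi for _, y in panel.values() for yi in y]
pooled = ols_slope(pooled_x, pooled_y)

# Firm-by-firm regressions recover the true slope.
by_firm = {f: ols_slope(x, y) for f, (x, y) in panel.items()}

print(round(pooled, 4))  # negative, although every true slope is +1
print(by_firm)
```

In this constructed case the pooled slope even has the wrong sign, because the differences in intercepts across firms are attributed to x.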
STATA: reg depvar [indepvars] [i.year]
B. Fixed Effects Model
(One-Way) Fixed Effects Model (individual effects)
Allow each group (firm) to have its own intercept:
\[ y_{it} = a_{0i} + a_1 x_{it} + e_{it} \]
How? Create a set of dummy (binary) variables, one for each firm, and include them as regressors:
\[ y_{it} = \sum_{j=1}^{N} a_{0j} D_j + a_1 x_{it} + e_{it} \]
where D_j = 1 for observations on firm j and 0 otherwise.
This form of estimation is also known as Least Squares Dummy Variables (LSDV).
Fixed Effects Estimation:
STATA: reg depvar [indepvars] i.id
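(Two-Way) Fixed Effects Model (individual + time effects)
Allow the intercept to vary across the different time periods as well as across firms (Two-Way Fixed Effects):
\[ y_{it} = \sum_{j=1}^{N} a_{0j} D_j + \sum_{s=2}^{T} \lambda_s T_s + a_1 x_{it} + e_{it} \]
where D_j = 1 for observations on firm j and T_s = 1 for observations in period s (0 otherwise); one period dummy is omitted to avoid perfect collinearity with the firm dummies.
STATA: reg depvar [indepvars] i.id i.year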
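Fixed Effects/Within:
Discards all variation between individuals and uses only variation over time within an individual:
\[ y_{it} - \bar{y}_i = (a_{0i} - a_{0i}) + a_1 (x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i) \]
where \bar{y}_i, \bar{x}_i and \bar{e}_i are the averages over time for unit i, so the unit-specific intercept a_{0i} drops out.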
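The within transformation can be sketched in a few lines of Python. This is a minimal illustration for a single regressor with hypothetical numbers, not the course's own code: each variable is demeaned by unit, the slope is estimated from the demeaned data, and the unit intercepts are recovered afterwards as the unit mean of y minus the slope times the unit mean of x. The name `within_estimator` is an assumption made for this example.

```python
def within_estimator(panel):
    """Within (fixed effects) estimator for one regressor.

    panel: {unit: (x_list, y_list)}
    Returns (slope, {unit: estimated intercept}).
    """
    sxy = 0.0
    sxx = 0.0
    means = {}
    for unit, (x, y) in panel.items():
        mx, my = sum(x) / len(x), sum(y) / len(y)
        means[unit] = (mx, my)
        # Only deviations from the unit's own mean enter the estimator.
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx += sum((xi - mx) ** 2 for xi in x)
    a1 = sxy / sxx
    a0 = {unit: my - a1 * mx for unit, (mx, my) in means.items()}
    return a1, a0


# Hypothetical two-firm panel with T = 3.
panel = {
    "A": ([1.0, 2.0, 3.0], [5.0, 7.5, 8.5]),
    "B": ([4.0, 5.0, 6.0], [1.0, 3.0, 5.0]),
}
a1, a0 = within_estimator(panel)
print(a1)  # common slope estimated from within-firm variation only
print(a0)  # one intercept per firm
```

The slope obtained this way is numerically the same as the slope from the dummy-variable (LSDV) regression, which is why the two are treated as one estimator.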
C. Random Effects Model
Previously we've assumed that ui was correlated with the x's, but what if it's not?
OLS would be consistent in that case, but the composite error will be serially correlated, because the same ui appears in every period for unit i.
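Need to transform the model and do GLS to solve the problem and make correct inferences.
End up with a sort of weighted average of OLS and Fixed Effects, using quasi-demeaned data:
\[ \theta = 1 - \left[ \frac{\sigma_e^2}{\sigma_e^2 + T \sigma_u^2} \right]^{1/2} \]
\[ y_{it} - \theta \bar{y}_i = a_0 (1 - \theta) + a_1 (x_{it} - \theta \bar{x}_i) + (v_{it} - \theta \bar{v}_i) \]
where \sigma_u^2 is the variance of the unobserved effect u_i, \sigma_e^2 is the variance of the idiosyncratic error e_{it}, and v_{it} = u_i + e_{it} is the composite error.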
If θ = 1, then this is just the fixed effects estimator
If θ = 0, then this is just the OLS estimator
So, the bigger the variance of the
unobserved effect, the closer it is to FE
The smaller the variance of the unobserved
effect, the closer it is to OLS
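The behaviour of θ at its two limits can be checked numerically. The Python sketch below assumes the usual random effects weight θ = 1 − sqrt(σ_e² / (σ_e² + T·σ_u²)), where σ_u² is the variance of the unobserved effect and σ_e² the variance of the idiosyncratic error; the function names `re_theta` and `quasi_demean` and the variance values are hypothetical and chosen only for illustration.

```python
import math


def re_theta(sigma2_e, sigma2_u, T):
    """Random effects weight: 1 - sqrt(s2_e / (s2_e + T * s2_u))."""
    return 1.0 - math.sqrt(sigma2_e / (sigma2_e + T * sigma2_u))


def quasi_demean(values, theta):
    """Subtract theta times the unit mean from each observation."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


# No unobserved effect: theta = 0, the data are untouched (pooled OLS).
print(re_theta(1.0, 0.0, 5))
# Larger variance of the unobserved effect pushes theta towards 1 (FE).
for s2_u in (1.0, 100.0, 1e6):
    print(s2_u, round(re_theta(1.0, s2_u, 5), 4))

y_unit = [1.0, 2.0, 3.0]
print(quasi_demean(y_unit, 0.0))  # unchanged data
print(quasi_demean(y_unit, 1.0))  # fully demeaned data, as in FE
```

The same formula shows that θ also approaches 1 as T grows, so with long panels random effects and fixed effects estimates tend to be close.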
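We assume that:
\[
\begin{aligned}
E(u_i) &= E(e_{it}) = 0 \\
E(u_i^2) &= \sigma_u^2, \quad E(e_{it}^2) = \sigma_e^2 && \text{(both components homoscedastic)} \\
E(u_i e_{jt}) &= 0 \quad \text{for all } i, j, t && \text{(independence of the two components)} \\
E(e_{it} e_{js}) &= 0 \quad \text{if } t \neq s \text{ or } i \neq j && \text{(no autocorrelation)} \\
E(u_i u_j) &= 0 \quad \text{if } i \neq j && \text{(no correlation across groups)} \\
E(u_i \mid x_{it}) &= E(e_{it} \mid x_{it}) = 0 && \text{(both independent of the regressors)}
\end{aligned}
\]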
Choosing between Fixed Effects (FE) and Random Effects (RE)
1. With large T and small N there is likely to be little difference, so FE is preferable as it is easier to compute.
2. With large N and small T, estimates can differ significantly. If the cross-sectional groups are a random sample of the population, RE is preferable. If not, FE is preferable.
3. If the individual error component, ui, is correlated with x, then RE is biased, but FE is not.
4. For large N and small T, and if the assumptions behind RE hold, RE is more efficient than FE.
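Test for Var(ui) = 0, that is, for the absence of random individual effects
◦ If Ti = T for all i, the Lagrange-multiplier test statistic (Breusch-Pagan, 1980) is:
\[ LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat{e}_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{e}_{it}^2} - 1 \right]^2 \sim \chi^2(1) \]
where \hat{e}_{it} are the pooled OLS residuals.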
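◦ For unbalanced panels, the modified Breusch-Pagan LM test for random effects (Baltagi-Li, 1990) is:
\[ LM = \frac{(N \bar{T})^2}{2} \cdot \frac{A^2}{\sum_{i=1}^{N} T_i^2 - N \bar{T}} \sim \chi^2(1), \qquad A = 1 - \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T_i} \hat{e}_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \hat{e}_{it}^2} \]
where \hat{e}_{it} are the pooled OLS residuals and \bar{T} is the average number of periods per unit.
◦ Alternative one-sided test: because a variance cannot be negative, the relevant alternative hypothesis is Var(ui) > 0.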
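For a balanced panel, the Breusch-Pagan LM statistic needs only the pooled OLS residuals grouped by unit. The Python sketch below assumes the standard balanced-panel form LM = NT / (2(T−1)) · [Σ_i (Σ_t e_it)² / Σ_i Σ_t e_it² − 1]², compared with a chi-square distribution with one degree of freedom; the function name `breusch_pagan_lm` and the residual values are hypothetical.

```python
import math


def breusch_pagan_lm(resid):
    """Breusch-Pagan LM test for random effects, balanced panel.

    resid: list of per-unit lists of pooled OLS residuals.
    Returns (LM statistic, p-value from chi-square with 1 df).
    """
    n_units = len(resid)
    n_periods = len(resid[0])
    if any(len(r) != n_periods for r in resid):
        raise ValueError("this formula requires a balanced panel")
    # Squared sums of residuals within each unit, summed over units.
    num = sum(sum(r) ** 2 for r in resid)
    # Sum of all squared residuals.
    den = sum(e * e for r in resid for e in r)
    scale = n_units * n_periods / (2.0 * (n_periods - 1))
    lm = scale * (num / den - 1.0) ** 2
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical residuals: each unit's residuals share the same sign,
# which is what an omitted individual effect would produce.
lm, p_value = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
print(lm, p_value)
```

With only two units and two periods the statistic is small; in real applications N is large and the same pattern of residuals would lead to a clear rejection of Var(ui) = 0.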
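H0: the individual effect ui is uncorrelated with the regressors; H1: it is correlated with them.
Fixed effects estimator is consistent under H0 and H1; random effects estimator is efficient under H0, but it is inconsistent under H1.
Hausman Test Statistic:
\[ H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[ Var(\hat{\beta}_{FE}) - Var(\hat{\beta}_{RE}) \right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \sim \chi^2(k) \]
where k is the number of coefficients being compared.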
Tests for the statistical significance of the difference between the coefficient estimates obtained by FE and by RE, under the null hypothesis that the RE estimates are efficient and consistent, and the FE estimates are consistent but inefficient.
Hausman test:
STATA: hausman FE RE
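When only one coefficient is compared, the Hausman statistic reduces to a scalar: the squared difference between the FE and RE estimates divided by the difference of their variances, compared with a chi-square distribution with one degree of freedom. The Python sketch below illustrates that special case; the function name `hausman_one_coef` and all input numbers are hypothetical and do not come from any data set in these slides.

```python
import math


def hausman_one_coef(b_fe, var_fe, b_re, var_re):
    """Hausman test for a single coefficient.

    H = (b_fe - b_re)^2 / (var_fe - var_re), chi-square with 1 df.
    Returns (H, p-value).
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        # Under H0 RE is efficient, so var_fe should exceed var_re.
        raise ValueError("Var(FE) - Var(RE) must be positive")
    h = (b_fe - b_re) ** 2 / var_diff
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates and variances (squared standard errors).
h, p_value = hausman_one_coef(b_fe=0.50, var_fe=0.010,
                              b_re=0.30, var_re=0.006)
print(h, p_value)
if p_value < 0.05:
    print("Reject H0: prefer fixed effects")
else:
    print("Do not reject H0: random effects is acceptable")
```

With these made-up inputs the statistic is about 10, well above the 5% critical value of 3.84, so the example favours fixed effects. With several coefficients the same idea uses the full vector of differences and the difference of the two covariance matrices.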
The data in WAGEPAN.RAW are from Vella and Verbeek (1998). Each of the 545 men in the sample worked in every year from 1980 through 1987.
Some variables in the data set change over time: experience, marital status, and union status are the three important ones.
Other variables do not change: race and education are the key examples. If we use fixed effects (or first differencing), we cannot include race, education, or experience in the equation.
We use three methods: pooled OLS, random effects, and fixed effects.
In the first two methods, we can include educ and race dummies (black and hispan), but these drop out of the fixed effects analysis.
The time-varying variables are exper, exper2 (experience squared), union, and married. “exper” is dropped in the FE analysis (but exper2 remains), because experience rises by one year for every man in every period and is therefore collinear with the year dummies. Each regression also contains a full set of year dummies.