Graduate course materials: data analysis with Stata
Page 1
Pham Thi Bich Ngoc, Ph.D. (University of Kiel, Germany)
FEC/Hoa Sen University
ngoc.phamthibich@hoasen.edu.vn
University of Economics Ho Chi Minh City, 03 June 2014
Page 2
Where to learn and use Stata?
http://www.ats.ucla.edu/stat/stata/
“Econometric Analysis of Cross Section and Panel Data”, Jeffrey M. Wooldridge (2010)
Page 3
Panel data models combine cross-section and time-series data.
In panel data the same cross-sectional unit (industry, firm, country) is surveyed over time, so the data are pooled over space as well as time.
Page 4
1. Panel data can take explicit account of individual-specific heterogeneity (“individual” here refers to the micro-unit).
2. By combining data in two dimensions, panel data gives more data variation, less collinearity and more degrees of freedom.
3. Panel data is better suited than cross-sectional data for studying the dynamics of change, for example company bankruptcy or merger.
Page 5
4. Panel data is better at detecting and measuring effects that cannot be observed in either cross-section or time-series data.
5. Panel data enables the study of more complex behavioural models, for example the effects of technological change or economic cycles.
6. Panel data can minimise the effects of aggregation bias from aggregating firms.
Page 6
If all the cross-sectional units have the same number of time series observations, the panel is balanced; if not, it is unbalanced.
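\[
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
\]
Rows: time series (t = 1, ..., T). Columns: cross section (i = 1, ..., N).
A matrix of balanced panel data observations on variable y: N cross-sectional observations, T time series observations.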
Page 7
Grunfeld and Griliches [1960]
◦ i = 10 firms: GM, CH, GE, WE, US, AF, DM, GY, UN,
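A panel of this kind can be declared and inspected in Stata along the following lines. This is a minimal sketch that assumes the example dataset shipped with Stata under the name grunfeld (loaded with webuse, which needs internet access) and its variable names company, year, invest, mvalue and kstock; it is not necessarily the exact file used in the lecture.

```stata
* Sketch only: uses Stata's example Grunfeld dataset (an assumption,
* not necessarily the file used in the lecture).
webuse grunfeld, clear

* Declare the panel structure: company is the unit, year is the time index.
xtset company year

* Describe the participation pattern (shows whether the panel is balanced).
xtdescribe

* Overall, between-unit and within-unit summary statistics.
xtsum invest mvalue kstock
```

The output of xtset and xtdescribe indicates whether every firm is observed in every year, which is the balanced case described earlier.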
Page 8
y_it = real per capita GDP
s_i = average saving rate (over 1960-1985)
n_i = average population growth rate (over 1960-1985)
g + d = 5%
COM_i = 1 if communist, 0 otherwise
OPEC_i = 1 if OPEC, 0 otherwise
Page 9
LWAGE = log of wage (dependent variable in regressions)
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar,
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
Page 10
Two basic windows
Page 11
The usual: open, save, print
Log-file open/suspend/close
Do-file editor
Browse and Edit
Break
Page 12
Open draft-student.dta
Create a do-file and a .log file
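These two steps might look as follows in a do-file. This is a minimal sketch: it assumes draft-student.dta sits in the current working directory, and the log file name session.log is a hypothetical choice.

```stata
* Sketch of a do-file. Assumes draft-student.dta is in the working directory.
capture log close                       // close any log left open earlier
log using "session.log", replace text   // hypothetical log file name
use "draft-student.dta", clear          // load the course dataset
describe                                // list variables and storage types
log close
```

Running the commands from a do-file rather than typing them interactively keeps the analysis reproducible, and the log records both commands and output.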
A 3-factor Cobb-Douglas function (simple):
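One form consistent with the variable names used in the commands that follow (lnY, lnK, lnL, lnM) is sketched here. It assumes Y is output and K, L and M are capital, labour and materials; the symbols A, the beta coefficients and the error term are notation introduced for this illustration.

```latex
\[
Y_{it} = A \, K_{it}^{\beta_K} L_{it}^{\beta_L} M_{it}^{\beta_M} e^{\varepsilon_{it}}
\]
\[
\ln Y_{it} = \beta_0 + \beta_K \ln K_{it} + \beta_L \ln L_{it} + \beta_M \ln M_{it} + \varepsilon_{it},
\qquad \beta_0 = \ln A
\]
```

Taking logs makes the model linear in the parameters, so it can be estimated with regress, and each coefficient can be read as an output elasticity.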
Page 14
summarize [varlist] [, detail]
◦ Number of observations, mean, SD, range
E.g. sum lnY lnK lnL lnM
Page 15
histogram varname
◦ Simple histogram of your variable
◦ E.g. histogram lnY
histogram lnY, frac by(D7, title("Firm Sales in 2007 and the Rest") subtitle("(in VND)"))
qnorm varname
◦ Quantile plot of your variable to check normality
◦ E.g. qnorm lnY
Page 16
regress lnY lnK lnL lnM horizontal Bam Bch
predict r, resid
kdensity r, normal
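The same residual check can be tried on a dataset that ships with Stata. This is a minimal sketch that assumes the built-in auto dataset and an arbitrary illustrative model; the generated variable names lnprice and lnweight are hypothetical.

```stata
* Sketch only: built-in auto data stands in for the course dataset.
sysuse auto, clear
generate lnprice  = ln(price)    // hypothetical variable names
generate lnweight = ln(weight)

regress lnprice lnweight mpg

* Store the residuals, then compare their density with a normal curve.
predict r, resid
kdensity r, normal

* Two further normality checks: a quantile plot and the Shapiro-Wilk test.
qnorm r
swilk r
```

A residual density that tracks the overlaid normal curve, and a qnorm plot close to the 45-degree line, are consistent with normally distributed errors.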
Page 17
tabulate [varname]
◦ Counts and percentages
◦ (see also, table - this is very different!)
Page 18
tabulate [var1] [var2]
◦ “Cross-tab”
◦ Descriptive options
E.g. tab D7 sectorcode if sectorcode<11
Page 19
scatter [var1] [var2]
◦ Scatterplot of the two variables
twoway lfit [var1] [var2]
twoway scatter [var1] [var2] || lfit [var1] [var2] ||, by(var3, total row(1))
http://www.stata.com/support/faqs/graphics/gph/graphdocs/twoway-linear-prediction-plot/index.html
E.g. graph lnY against lnK (scatter plot with linear fit)
Page 20
pwcorr [varlist] [, sig]
◦ Pairwise correlations between variables
◦ “sig” option gives p-values
spearman [varlist] [, stats(rho p)]
E.g. correlation between lnY, lnK, lnL and lnM?
Page 21
regress depvar [indepvars] [if] [in] [weight] [, options]
regress fits a model of depvar on indepvars using linear regression.
E.g. regress lnY lnK lnL lnM horizontal Bam Bch
Checking homoscedasticity of residuals:
rvfplot, yline(0)
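The visual check can be paired with a formal test. This is a minimal sketch that assumes the built-in auto dataset and an arbitrary illustrative model, not the course data.

```stata
* Sketch only: built-in auto data stands in for the course dataset.
sysuse auto, clear
regress price mpg weight

* Residuals against fitted values: a fan shape suggests heteroskedasticity.
rvfplot, yline(0)

* Breusch-Pagan / Cook-Weisberg test; the null is constant variance.
estat hettest

* If the null is rejected, one common response is robust standard errors.
regress price mpg weight, vce(robust)
```

With vce(robust) the coefficient estimates are unchanged; only the standard errors, and hence the test statistics, differ.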
Page 22
xtset id year
xtreg lnY lnK lnL lnM …
xtreg lnY lnK lnL lnM … i.year
xtreg lnY lnK lnL lnM … i.year i.industry
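A complete run of these panel commands might look like the sketch below. It assumes Stata's example grunfeld dataset (variables company, year, invest, mvalue, kstock) in place of the course data, and it adds a Hausman comparison of fixed and random effects, which is an illustration rather than a step taken from the slides.

```stata
* Sketch only: Stata's example Grunfeld data stands in for the course dataset.
webuse grunfeld, clear
xtset company year

* Fixed-effects (within) estimator.
xtreg invest mvalue kstock, fe
estimates store fe

* Random-effects estimator (xtreg's default when no option is given).
xtreg invest mvalue kstock, re
estimates store re

* Hausman test: are the fe and re coefficients systematically different?
hausman fe re

* Year dummies via factor-variable notation, then a joint test on them.
xtreg invest mvalue kstock i.year, fe
testparm i.year
```

Note that under the fe option, a regressor that never changes within a unit (an industry code, for instance) is collinear with the unit effects and is dropped, so dummies of that kind are only estimable in a random-effects or pooled specification.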