Panchanan Das

Econometrics in Theory and Practice
Analysis of Cross Section, Time Series and Panel Data with Stata 15.1

Panchanan Das
Department of Economics
University of Calcutta
Kolkata, India

ISBN 978-981-32-9018-1
ISBN 978-981-32-9019-8 (eBook)
https://doi.org/10.1007/978-981-32-9019-8

© Springer Nature Singapore Pte Ltd. 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Dedicated to my father, the late Bibhuti Bhusan Das

Preface

This book is an outcome of my
experience in learning and teaching econometrics for more than three decades. Good-quality econometrics books are available, but there is a dearth of user-friendly books that properly combine theory and application with statistical software. The books by Maddala, Wooldridge, Greene, Enders, Maddala and Kim, Hsiao, and Baltagi, in particular, are invaluable. The book by Gujarati is also a good one in its ability to elaborate econometric theories for graduate students. However, many scholars, students and researchers today use statistical software for empirical analysis. I have used both EViews and Stata in my teaching and research, and have personally found Stata to be as powerful and flexible as EViews. Furthermore, Stata is used extensively to process large data sets. This book combines econometric theory and application with Stata 15.1.

The basic purpose of this text is to introduce econometric analysis of cross section, time series and panel data with the application of statistical software. This book may serve as a basic text for those who wish to learn and apply econometric analysis in empirical research. The level of presentation is kept as simple as possible to make it useful for undergraduate as well as graduate students. It contains several examples with real data, Stata programmes and interpretation of the results. This book is intended primarily for graduate and post-graduate students in universities in India and abroad and for researchers in the social sciences, business, management, operations research, engineering or applied mathematics.

In this book, we view econometrics as a subject dealing with a set of data analytic techniques that are used extensively in empirical research. The aim is to provide students with the skills required to undertake independent applied research using modern econometric methods. It covers the statistical tools needed to understand empirical economic research and to plan and execute
independent research projects. It attempts to provide a balance between theory and applied research. Various concepts and techniques of econometric analysis are supported by carefully developed examples with the use of the statistical software package Stata 15.1. Hopefully, this book will successfully bridge the gap between learning econometrics and learning how to use Stata. It is an attempt to present econometric theories in a student-friendly manner so that the techniques needed for empirical research are properly understood. It should appeal to both students and professional analysts because of its balanced discussion of theory and software applications. However, this book does not claim to be a substitute for the well-established texts used in academia; rather, it can serve as a supplementary text in both undergraduate- and post-graduate-level econometrics courses.

The discussion in this book assumes that the reader is somewhat familiar with the Stata software and other statistical programming. The Stata help manuals from the Stata Corporation offer detailed explanations and syntax for all the commands used in this book. The data used for illustration are taken mainly from official sources like CSO, NSSO and ILO.

The topics covered in this book are divided into four parts. Part I discusses introductory econometric methods covering the syllabus of graduate courses in the University of Calcutta, Delhi University and other leading universities in India and abroad. This part of the book provides an introduction to the basic econometric methods for data analysis that economists and other social scientists use to estimate economic and social relationships, and to test hypotheses about them, using real-world data. There are five chapters in this part covering data management issues, details of linear regression models and the related problems due to the violation of the classical assumptions. Chapter 1 provides some basic steps
used in econometrics and the statistical software Stata 15.1 for useful application of econometric theories. Chapter 2 discusses the linear regression model and its application with cross section data. Chapter 3 deals with the problem of statistical inference in a linear regression model. Chapter 4 relaxes the homoscedasticity and non-autocorrelation assumptions on the random error of a linear regression model and shows how the parameters of the linear model are correctly estimated. Chapter 5 discusses the detection of multicollinearity and alternatives for handling the problem.

Part II discusses some advanced topics used frequently in empirical research with cross section data. This part contains three chapters on some specific problems of regression analysis. Chapter 6 explains how qualitative explanatory variables can be incorporated into a linear model. Chapter 7 provides econometric models with limited dependent variables and covers the problems of truncated distribution, sample selection bias and multinomial logit. Special emphasis is given to multivariate analysis, particularly principal component analysis and factor analysis, because of their popularity in empirical research with cross section data. Chapter 8 captures these issues.

Part III deals with time series econometric analysis. Time series data have some special features, which should be handled with great care. The modern approach to time series econometrics has developed since the early 1980s with the publications of Engle and Granger, and it has become very popular in empirical research with the development of user-friendly software. This book covers intensively both univariate and multivariate time series econometric models and their applications with software programming in six chapters. This part starts with the discussion on the data generating process of time series data in Chap. 9. Chapter 10 deals with different features of the data generating process (DGP) of a time series in a univariate framework. The presence of
unit roots in macroeconomic time series has been a major area of theoretical and applied research since the early 1980s. Chapter 11 presents some issues regarding unit root tests and explores some of the implications for macroeconomic theory and policy. Chapter 12 explores the basic conceptual issues involved in estimating the relationship between two or more nonstationary time series with unit roots. Chapter 13 examines the behaviour of volatility in terms of the conditional heteroscedasticity model. Forecasting is important in economics, commerce and various disciplines of social science and pure science. Chapter 14 aims to provide an overview of forecasting based on time series analysis.

Part IV takes care of panel data analysis in four chapters. Panel data have several advantages over cross section and time series data. Panel data econometrics has gained popularity because of the availability of panel data in the public domain today. Different aspects of fixed effects and random effects are discussed here. I have extended panel data analysis to dynamic panel data models, which are the most suitable for macroeconomic research. Chapter 15 discusses different types of panel data model in a static framework. Chapter 16 deals with testing of hypotheses to examine panel data in a static framework. Panel data with long time periods have been used predominantly in applied macroeconomic research on topics such as purchasing power parity, growth convergence and business cycle synchronisation. Chapter 17 provides some theoretical issues and their application in testing for unit roots in panel data. Dynamic models in a panel data framework are very popular in empirical research. Chapter 18 focuses on some issues of the dynamic panel data model.

All chapters in this book provide applications of econometric models by using Stata. Simple yet rigorous presentation of some difficult topics is the major strength of this book. While Bayesian, nonparametric and semiparametric econometrics are
popular methods today for capturing the behaviour of data in more complex real situations, I do not attempt to cover these topics because of my comparative disadvantage in these areas and to keep the technical difficulty at the lowest possible level. Despite these limitations, the topics covered in this book are basic and necessary for the econometrics training of every student in economics and other disciplines. I hope the students of econometrics will share my enthusiasm and optimism about the importance of the different econometric methods they will learn through reading this book. Hopefully, it will enhance their interest in empirical research in economics and other fields of social science.

Kolkata, India
May 2019
Panchanan Das

Acknowledgements

My interest in econometrics was initiated by my teachers at different levels more than three decades ago. I acknowledge the contribution of Amiya Kumar Bagchi, my teacher and Ph.D. supervisor, in the field of empirical research, which encouraged me, at least indirectly, to learn econometrics. Among others, I should mention Dipankor Coondoo of the Indian Statistical Institute, Kolkata, who helped me to understand clearly different issues of the subject. Sankar Kumar Bhoumik, my senior colleague and friend, helped me a lot to learn the subject by providing access to teaching at the post-graduate level at the Department of Economics, University of Calcutta, well before I joined the Department as a permanent faculty member. I also gratefully acknowledge my teacher, Manoj Kumar Sanyal, who is a continuous source of encouragement in learning and thinking. I think, in some way, they have prepared the background for this book being written.

A number of friends and colleagues have commented on earlier drafts of the book, or helped in other ways. I am grateful to Maniklal Adhikary, Anindita Sengupta, Pradip Kumar Biswas and others for their assistance and encouragement. Discussions with Oleg Golichenko and Kirdina Svetlana of the Higher School of Economics, Moscow,
were helpful in clarifying some of my ideas. Any remaining errors and omissions are, of course, my responsibility, and I shall be glad to have them brought to my attention. I am grateful to the Department of Economics, University of Calcutta, for providing an adequate infrastructure where I spent time during my learning and teaching of economics. Special thanks are due to the Head of the Department of Economics and the authority of the University of Calcutta. I am extremely grateful to my wife, Krishna, who took over many of my roles in the household during the preparation of the manuscript.

Finally, thanks to the editorial team of Springer for help with indexing and proof-reading. I am grateful to Sagarika Ghosh of Springer for encouragement for this project.

Kolkata, India
May 2019
Panchanan Das

18.4 Instrumental Variable Estimation

This command produces the generalised 2SLS estimator of the model. As shown in the following output table, the same instruments are used to estimate the random effects model.

    xtivreg ln_lab ln_lab_pro gdp_growth (l.ln_lab = l2.ln_lab), re

    G2SLS random-effects IV regression              Number of obs     =        208
    Group variable: country_SA                      Number of groups  =          8

    R-sq:                                           Obs per group:
         within  = 0.9628                                         min =         26
         between = 1.0000                                         avg =       26.0
         overall = 0.9993                                         max =         26

                                                    Wald chi2(3)      =  118897.60
    corr(u_i, X)       = 0 (assumed)                Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
          ln_lab |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          ln_lab |
             L1. |   .9961779   .0031267   318.61   0.000     .9900498    1.002306
                 |
      ln_lab_pro |   .0060347   .0081933     0.74   0.461    -.0100239    .0220933
      gdp_growth |  -.0083758   .0013638    -6.14   0.000    -.0110488   -.0057028
           _cons |   .0692175   .0751947     0.92   0.357    -.0781613    .2165964
    -------------+----------------------------------------------------------------
         sigma_u |  .01455816
         sigma_e |  .05769063
             rho |  .05986757   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    Instrumented:   L.ln_lab
    Instruments:    ln_lab_pro gdp_growth L2.ln_lab

xtivreg, fd is used to implement the first-differenced two-stage least-squares estimator of Anderson and Hsiao (1981). The first-differenced estimator removes the unobserved heterogeneity ($\mu_i$).
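The way differencing removes the unobserved heterogeneity can be made explicit with a short derivation. It is only a sketch: it assumes the simple AR(1) specification with an individual effect written in the notation of Sect. 18.5, homoscedastic and serially uncorrelated errors with variance $\sigma_\varepsilon^2$ (a symbol introduced here), and no further regressors.

```latex
\begin{align*}
y_{it}     &= \phi_1 y_{i,t-1} + \mu_i + \varepsilon_{it} \\
y_{i,t-1}  &= \phi_1 y_{i,t-2} + \mu_i + \varepsilon_{i,t-1} \\
\Delta y_{it} &= \phi_1 \Delta y_{i,t-1} + \Delta\varepsilon_{it}
  && \text{(subtracting: } \mu_i \text{ cancels)} \\
\operatorname{Cov}(\Delta y_{i,t-1}, \Delta\varepsilon_{it})
  &= \operatorname{Cov}(y_{i,t-1} - y_{i,t-2},\; \varepsilon_{it} - \varepsilon_{i,t-1})
   = -\sigma_\varepsilon^2 \neq 0 \\
\operatorname{Cov}(y_{i,t-2}, \Delta\varepsilon_{it}) &= 0
\end{align*}
```

Under these assumptions the differenced regressor is correlated with the differenced error, so OLS on the differenced equation is inconsistent, while the second lag $y_{i,t-2}$ is uncorrelated with $\Delta\varepsilon_{it}$ and still related to $\Delta y_{i,t-1}$. That is the sense in which a second lag can serve as an instrument.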
The xtivreg, fd command and its output are shown below.

    xtivreg ln_lab ln_lab_pro gdp_growth (l.ln_lab = l2.ln_lab), fd

    First-differenced IV regression
    Group variable: country_SA                      Number of obs     =        200
    Time variable: year                             Number of groups  =          8

    R-sq:                                           Obs per group:
         within  = 0.8775                                         min =         25
         between = 0.8953                                         avg =       25.0
         overall = 0.8391                                         max =         25

                                                    Wald chi2(3)      =      92.06
    corr(u_i, Xb)      = -0.9327                    Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
        D.ln_lab |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          ln_lab |
             LD. |  -.5909087    .259678    -2.28   0.023    -1.099868   -.0819491
                 |
      ln_lab_pro |
             D1. |  -.6514837   .2253443    -2.89   0.004     -1.09315    -.209817
                 |
      gdp_growth |
             D1. |  -.0022558   .0026486    -0.85   0.394    -.0074469    .0029353
                 |
           _cons |   .0769693   .0153361     5.02   0.000     .0469111    .1070275
    -------------+----------------------------------------------------------------
         sigma_u |  3.6551524
         sigma_e |  .06148097
             rho |  .99971716   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    Instrumented:   L.ln_lab
    Instruments:    ln_lab_pro gdp_growth L2.ln_lab

18.5 Arellano–Bond GMM Estimator

The instrumental variable method suggested by Anderson and Hsiao (1981) does not consider all potential orthogonality conditions. The first-differenced instrumental variable (IV) estimation method can produce consistent estimates, but these estimates are not necessarily efficient. This is because the IV method does not utilise all the available moment conditions. The use of the lagged difference as an instrument results in an inefficient estimator (Arellano 1989). Arellano and Bond (1991) developed a dynamic panel data model by utilising the orthogonality conditions that exist between lagged values of $y_{it}$ and the disturbances $\varepsilon_{it}$. They derived a GMM estimator for the parameters of a dynamic panel data model by using more of the available instruments. Following the methodology developed in Holtz-Eakin et al. (1988), they identify a number of valid instruments in terms of the lagged values of the dependent variable, the predetermined variables and the endogenous variables. The Arellano and Bond (1991) model may be looked at as an extension of the GMM framework
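developed by Hansen (1982). This model combines all the lagged levels along with the first differences of the strictly exogenous variables to form a potentially large instrument matrix. Using this instrument matrix, Arellano and Bond (1991) derive the one-step and two-step GMM estimators, as well as the robust VCE estimator for the one-step model. Later on, Windmeijer (2005) formulated a bias-corrected robust estimator for the VCE of two-step GMM estimators.

Arellano and Bond (1991) derived all of the relevant moment conditions for GMM estimation of a dynamic panel data model. The moment conditions are based on the first-differenced model as shown in (18.4.2):

\[
\Delta y_{it} = \phi_1 \Delta y_{i,t-1} + \Delta\varepsilon_{it}, \qquad t = 2, \ldots, T
\]

The number of moment conditions depends on T. For t = 2, Eq. (18.4.1) will be

\[
(y_{i2} - y_{i1}) = \phi_1 (y_{i1} - y_{i0}) + (\varepsilon_{i2} - \varepsilon_{i1}) \tag{18.5.1}
\]

Here, $y_{i0}$ is a valid instrument and the moment condition is

\[
E(\Delta\varepsilon_{i2}\, y_{i0}) = 0 \tag{18.5.2}
\]

For t = 3, Eq. (18.4.1) is

\[
(y_{i3} - y_{i2}) = \phi_1 (y_{i2} - y_{i1}) + (\varepsilon_{i3} - \varepsilon_{i2}) \tag{18.5.3}
\]

In this case, $y_{i0}$ and $y_{i1}$ are valid instruments, because they are highly correlated with $(y_{i2} - y_{i1})$ and not correlated with $(\varepsilon_{i3} - \varepsilon_{i2})$ as long as the $\varepsilon_{it}$ are not serially correlated. The moment conditions are

\[
E(\Delta\varepsilon_{i3}\, y_{i0}) = 0 \quad \text{and} \quad E(\Delta\varepsilon_{i3}\, y_{i1}) = 0 \tag{18.5.4}
\]

For t = 4, Eq. (18.4.1) is

\[
(y_{i4} - y_{i3}) = \phi_1 (y_{i3} - y_{i2}) + (\varepsilon_{i4} - \varepsilon_{i3}) \tag{18.5.5}
\]

In this case, $y_{i0}$, $y_{i1}$ and $y_{i2}$ are valid instruments for $(y_{i3} - y_{i2})$. The moment conditions are

\[
E(\Delta\varepsilon_{i4}\, y_{i0}) = 0, \quad E(\Delta\varepsilon_{i4}\, y_{i1}) = 0 \quad \text{and} \quad E(\Delta\varepsilon_{i4}\, y_{i2}) = 0 \tag{18.5.6}
\]

In this way, for period t, the set of valid instruments will be $(y_{i0}, y_{i1}, \ldots, y_{i,t-2})$, and the moment conditions are obtained accordingly. Therefore, for T = 4 (t = 2, 3 and 4), we have 6 moment conditions:

\[
\begin{aligned}
&E(\Delta\varepsilon_{i2}\, y_{i0}) = 0 \\
&E(\Delta\varepsilon_{i3}\, y_{i0}) = 0, \quad E(\Delta\varepsilon_{i3}\, y_{i1}) = 0 \\
&E(\Delta\varepsilon_{i4}\, y_{i0}) = 0, \quad E(\Delta\varepsilon_{i4}\, y_{i1}) = 0, \quad E(\Delta\varepsilon_{i4}\, y_{i2}) = 0
\end{aligned}
\]

For GMM estimation, let us define

\[
g_i(\phi_1) =
\begin{bmatrix}
\Delta\varepsilon_{i2}\, y_{i0} \\
\Delta\varepsilon_{i3}\, y_{i0} \\
\Delta\varepsilon_{i3}\, y_{i1} \\
\Delta\varepsilon_{i4}\, y_{i0} \\
\Delta\varepsilon_{i4}\, y_{i1} \\
\Delta\varepsilon_{i4}\, y_{i2}
\end{bmatrix}
=
\begin{bmatrix}
(\Delta y_{i2} - \phi_1 \Delta y_{i1})\, y_{i0} \\
(\Delta y_{i3} - \phi_1 \Delta y_{i2})\, y_{i0} \\
(\Delta y_{i3} - \phi_1 \Delta y_{i2})\, y_{i1} \\
(\Delta y_{i4} - \phi_1 \Delta y_{i3})\, y_{i0} \\
(\Delta y_{i4} - \phi_1 \Delta y_{i3})\, y_{i1} \\
(\Delta y_{i4} - \phi_1 \Delta y_{i3})\, y_{i2}
\end{bmatrix}
\tag{18.5.7}
\]

It may be re-expressed in matrix form as

\[
g_i(\phi_1) =
\begin{bmatrix}
y_{i0} & 0 & 0 \\
0 & y_{i0} & 0 \\
0 & y_{i1} & 0 \\
0 & 0 & y_{i0} \\
0 & 0 & y_{i1} \\
0 & 0 & y_{i2}
\end{bmatrix}
\begin{bmatrix}
\Delta y_{i2} - \phi_1 \Delta y_{i1} \\
\Delta y_{i3} - \phi_1 \Delta y_{i2} \\
\Delta y_{i4} - \phi_1 \Delta y_{i3}
\end{bmatrix}
\tag{18.5.8}
\]

or,

\[
g_i(\phi_1) = X_i' \left( \Delta Y_i - \phi_1 \Delta Y_{i,-1} \right) = X_i' \Delta\varepsilon_i \tag{18.5.9}
\]

Here, the instrument matrix is

\[
X_i =
\begin{bmatrix}
y_{i0} & 0 & 0 & 0 & 0 & 0 \\
0 & y_{i0} & y_{i1} & 0 & 0 & 0 \\
0 & 0 & 0 & y_{i0} & y_{i1} & y_{i2}
\end{bmatrix}
\tag{18.5.10}
\]

The vectors of $\Delta y$ are

\[
\Delta Y_i =
\begin{bmatrix}
\Delta y_{i2} \\ \Delta y_{i3} \\ \Delta y_{i4}
\end{bmatrix}
\tag{18.5.11}
\]

\[
\Delta Y_{i,-1} =
\begin{bmatrix}
\Delta y_{i1} \\ \Delta y_{i2} \\ \Delta y_{i3}
\end{bmatrix}
\tag{18.5.12}
\]

Notice that $g_i(\phi_1)$ is a linear function of $\phi_1$. Therefore, the moment condition for exogeneity becomes

\[
E\left( X_i' \Delta\varepsilon_i \right) = 0 \tag{18.5.13}
\]

For general T, the instrument matrix will be

\[
X_i =
\begin{bmatrix}
y_{i0} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & y_{i0} & y_{i1} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & y_{i0} & \cdots & y_{i,T-2}
\end{bmatrix}
\tag{18.5.14}
\]

The $\Delta y$ vectors are

\[
\Delta Y_i =
\begin{bmatrix}
\Delta y_{i2} \\ \vdots \\ \Delta y_{iT}
\end{bmatrix}
\tag{18.5.15}
\]

\[
\Delta Y_{i,-1} =
\begin{bmatrix}
\Delta y_{i1} \\ \vdots \\ \Delta y_{i,T-1}
\end{bmatrix}
\tag{18.5.16}
\]

Let us define

\[
S = E\left[ g_i(\phi_1)\, g_i(\phi_1)' \right] = E\left[ X_i' \Delta\varepsilon_i \Delta\varepsilon_i' X_i \right] \tag{18.5.17}
\]

Under conditional heteroscedasticity, a consistent estimate is

\[
\hat{S} = \frac{1}{N} \sum_{i=1}^{N} X_i' \Delta\hat{\varepsilon}_i \Delta\hat{\varepsilon}_i' X_i \tag{18.5.18}
\]

Here, $\Delta\hat{\varepsilon}_i = \Delta Y_i - \hat{\phi}_1 \Delta Y_{i,-1}$ are consistent estimates of the first-differenced residuals obtained from a preliminary consistent estimator. The sample moments used for GMM estimation are

\[
g_N(\phi_1) = \frac{1}{N} \sum_{i=1}^{N} X_i' \left( \Delta Y_i - \phi_1 \Delta Y_{i,-1} \right) = S_{x\Delta y} - S_{x\Delta y_{-1}} \phi_1 \tag{18.5.19}
\]

Here, $S_{x\Delta y} = \frac{1}{N} \sum_{i=1}^{N} X_i' \Delta Y_i$ and $S_{x\Delta y_{-1}} = \frac{1}{N} \sum_{i=1}^{N} X_i' \Delta Y_{i,-1}$.

The efficient GMM estimator is obtained by solving the following problem:

\[
\min_{\phi_1} \; N\, g_N(\phi_1)' \hat{S}^{-1} g_N(\phi_1) = N \left( S_{x\Delta y} - S_{x\Delta y_{-1}} \phi_1 \right)' \hat{S}^{-1} \left( S_{x\Delta y} - S_{x\Delta y_{-1}} \phi_1 \right)
\]

The solution is

\[
\hat{\phi}_1 = \left( S_{x\Delta y_{-1}}' \hat{S}^{-1} S_{x\Delta y_{-1}} \right)^{-1} S_{x\Delta y_{-1}}' \hat{S}^{-1} S_{x\Delta y} \tag{18.5.20}
\]

This estimator is known as the two-step Arellano–Bond GMM estimator.

The GMM estimator suffers from a weak instrument problem when the autoregressive coefficient ($\phi_1$) in a dynamic panel model approaches unity. When $\phi_1 = 1$, the moment conditions are completely irrelevant for the true parameter $\phi_1$. This is because in this case lagged levels are weak predictors of the first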
differences. The estimated asymptotic standard errors of two-step GMM estimators are downward biased (Windmeijer 2005). In this case a variance correction is needed to improve inference using the Wald test.

18.5.1 Illustration by Using Stata

In Stata, the linear dynamic panel data model developed by Arellano and Bond (1991) is estimated by using the command xtabond. By using the menu in Stata, we can follow the path:

    Statistics > Longitudinal/panel data > Dynamic panel data (DPD) > Arellano-Bond estimation

The Arellano–Bond estimator sets up a GMM problem in which the model is specified as a system of equations. The test of autocorrelation of order m and the Sargan test of over-identifying restrictions derived by Arellano and Bond (1991) can be obtained with estat abond and estat sargan, respectively.

We start with the one-step estimator of Arellano and Bond (1991) by using the same data set as used in earlier models. In this data set, ln_lab is the log of wage workers, ln_lab_pro denotes the log of output per worker and gdp_growth represents the GDP growth rate. We estimate one-step estimators of a dynamic model of labour demand in which ln_lab is the dependent variable and its first lag, along with the current and one-period lagged values of labour productivity and GDP growth, are included as regressors by using the following command:

    xtabond ln_lab l(0/1).ln_lab_pro l(0/1).gdp_growth, lags(1) noconstant

The output window in Stata 15.1 is shown below. Although the moment conditions use first-differenced errors, xtabond estimates the coefficients of the level model and reports them accordingly. The Wald statistic is used to test the null hypothesis that all the coefficients are zero, and the null hypothesis is strongly rejected. The footer in the output table reports the instruments used in the estimation process. The first line indicates that xtabond used lags from 2 on back to create the GMM-type instruments. The notation L(2/.).ln_lab indicates that GMM-type instruments were created using lag 2 of ln_lab and all earlier lags. The third line indicates that the first differences of all the exogenous variables were used as standard instruments.

The following table of the output reports the coefficients, their standard errors and z statistics from the one-step estimators of a dynamic model of labour demand in which the log of labour employment (ln_lab) is the dependent variable and the log of labour productivity and GDP growth (ln_lab_pro and gdp_growth), along with the first lag of ln_lab, are included as regressors.

    Arellano-Bond dynamic panel-data estimation     Number of obs     =        208
    Group variable: country_SA                      Number of groups  =          8
    Time variable: year
                                                    Obs per group:
                                                                  min =         26
                                                                  avg =         26
                                                                  max =         26

    Number of instruments =    184                  Wald chi2(5)      =    6666.20
                                                    Prob > chi2       =     0.0000
    One-step results
    ------------------------------------------------------------------------------
          ln_lab |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          ln_lab |
             L1. |   .9502519   .0221508    42.90   0.000     .9068372    .9936667
                 |
      ln_lab_pro |
             --. |  -.9646305   .2613085    -3.69   0.000    -1.476786   -.4524753
             L1. |   1.034263   .2595388     3.99   0.000      .525576    1.542949
                 |
      gdp_growth |
             --. |  -.0001647   .0025689    -0.06   0.949    -.0051997    .0048703
             L1. |   .0078905    .001317     5.99   0.000     .0053093    .0104717
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(2/.).ln_lab
            Standard: D.ln_lab_pro LD.ln_lab_pro D.gdp_growth LD.gdp_growth

After using xtabond to estimate the model, we need to perform the Sargan test of over-identifying restrictions by using the command estat sargan. For a homoscedastic error term, the Sargan test has an asymptotic $\chi^2$ distribution. The Sargan test reported below comes from the one-step homoscedastic estimator. The output provides no evidence against the null hypothesis that the over-identifying restrictions are valid (p = 0.8819). Thus, we do not need to reconsider our model or our instruments. Arellano and Bond (1991) found that the one-step Sargan test tends to over-reject in the presence of heteroscedasticity, whereas the test based on the two-step estimator tends to under-reject.

    estat sargan

    Sargan test of overidentifying restrictions
            H0: overidentifying restrictions are valid

            chi2(179)    =  156.8911
            Prob > chi2  =    0.8819

The Arellano–Bond test for serial correlation in the first-differenced errors at order m can be performed by using estat abond. It calculates the first- and second-order autocorrelation in the first-differenced errors. The output shown below presents strong evidence against the null hypothesis of zero autocorrelation in the first-differenced errors at order 1. Serial correlation in the first-differenced errors at an order higher than 1 implies that the moment conditions used by xtabond are not valid. The result shown below presents no significant evidence of serial correlation in the first-differenced errors at order 2.

    estat abond

    Arellano-Bond test for zero autocorrelation in first-differenced errors

      Order        z     Prob > z
      ---------------------------
        1      -6.6549     0.0000
        2       1.6257     0.1040
      ---------------------------
      H0: no autocorrelation

One-Step Estimator with Robust VCE

To estimate the same model with the one-step robust estimator, we have to use the following command:

    xtabond ln_lab l(0/1).ln_lab_pro l(0/1).gdp_growth, lags(1) vce(robust)

The coefficients are the same, but some robust standard errors are higher than those that assume a homoscedastic error term.

    Arellano-Bond dynamic panel-data estimation     Number of obs     =        208
    Group variable: country_SA                      Number of groups  =          8
    Time variable: year
                                                    Obs per group:
                                                                  min =         26
                                                                  avg =         26
                                                                  max =         26

    Number of instruments =    185                  Wald chi2(5)      =   31812.58
                                                    Prob > chi2       =     0.0000
    One-step results
                               (Std. Err. adjusted for clustering on country_SA)
    ------------------------------------------------------------------------------
                 |               Robust
          ln_lab |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          ln_lab |
             L1. |   .9502519   .0116998    81.22   0.000     .9273207    .9731831
                 |
      ln_lab_pro |
             --. |  -.9646305   .4003911    -2.41   0.016    -1.749383   -.1798784
             L1. |   1.034263    .415636     2.49   0.013      .219631    1.848894
                 |
      gdp_growth |
             --. |  -.0001647   .0036093    -0.05   0.964    -.0072388    .0069094
             L1. |   .0078905     .00287     2.75   0.006     .0022655    .0135155
                 |
           _cons |  -.1263982   .1397189    -0.90   0.366    -.4002421    .1474458
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(2/.).ln_lab
            Standard: D.ln_lab_pro LD.ln_lab_pro D.gdp_growth LD.gdp_growth
    Instruments for level equation
            Standard: _cons

Two-Step Estimator with Windmeijer Bias-Corrected Robust VCE

The Windmeijer bias-corrected robust VCE of the same model can be obtained by using the following command:

    xtabond ln_lab l(0/1).ln_lab_pro l(0/1).gdp_growth, lags(1) twostep vce(robust) noconstant

The results are shown in the following output. The estimated coefficients differ from the one-step estimates.

    Arellano-Bond dynamic panel-data estimation     Number of obs     =        208
    Group variable: country_SA                      Number of groups  =          8
    Time variable: year
                                                    Obs per group:
                                                                  min =         26
                                                                  avg =         26
                                                                  max =         26

    Number of instruments =    184                  Wald chi2(5)      =      80.64
                                                    Prob > chi2       =     0.0000
    Two-step results
                               (Std. Err. adjusted for clustering on country_SA)
    ------------------------------------------------------------------------------
                 |              WC-Robust
          ln_lab |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          ln_lab |
             L1. |   .4879422   .2715575     1.80   0.072    -.0443007    1.020185
                 |
      ln_lab_pro |
             --. |  -.2034783   1.773437    -0.11   0.909    -3.679351    3.272394
             L1. |   1.104816   2.067326     0.53   0.593    -2.947069    5.156701
                 |
      gdp_growth |
             --. |  -.0030979   .0141439    -0.22   0.827    -.0308194    .0246236
             L1. |   .0013066   .0036273     0.36   0.719    -.0058028    .0084159
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(2/.).ln_lab
            Standard: D.ln_lab_pro LD.ln_lab_pro D.gdp_growth LD.gdp_growth

The test for autocorrelation presents no evidence of model misspecification.

    estat abond

    Arellano-Bond test for zero autocorrelation in first-differenced errors

      Order        z     Prob > z
      ---------------------------
        1      -.90131     0.3674
        2       1.4019     0.1609
      ---------------------------
      H0: no autocorrelation

18.6 System GMM Estimator

The Arellano–Bond (1991) model is extended further by Arellano and Bover (1995), Ahn and Schmidt (1995) and Blundell and Bond (1998) to accommodate large autoregressive parameters and a large ratio of the variance of the cross section-specific effect to the variance of the idiosyncratic error.
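Why large autoregressive parameters and a large variance ratio are troublesome for the first-differenced GMM estimator can be illustrated with a short derivation. It is a sketch under assumptions that the text does not spell out: an AR(1) model $y_{it} = \phi_1 y_{i,t-1} + \mu_i + \varepsilon_{it}$ with $|\phi_1| < 1$, no other regressors, a mean-stationary initial observation $y_{i0}$, and the single instrument $y_{i0}$ for $\Delta y_{i1}$ (the simplest case analysed by Blundell and Bond 1998). The symbols $\pi$, $k$, $u_{i0}$, $\sigma_\mu^2$ and $\sigma_\varepsilon^2$ are notation introduced here for illustration.

```latex
\begin{align*}
\Delta y_{i1} &= (\phi_1 - 1)\, y_{i0} + \mu_i + \varepsilon_{i1}
  && \text{(first-stage relation)} \\
y_{i0} &= \frac{\mu_i}{1 - \phi_1} + u_{i0}, \qquad
  \operatorname{Var}(u_{i0}) = \frac{\sigma_\varepsilon^2}{1 - \phi_1^2}
  && \text{(mean stationarity)} \\
\operatorname{plim} \hat{\pi}
  &= \frac{\operatorname{Cov}(\Delta y_{i1}, y_{i0})}{\operatorname{Var}(y_{i0})}
   = (\phi_1 - 1) + \frac{\sigma_\mu^2 / (1 - \phi_1)}
       {\sigma_\mu^2 / (1 - \phi_1)^2 + \sigma_\varepsilon^2 / (1 - \phi_1^2)} \\
  &= (\phi_1 - 1)\, \frac{k}{\sigma_\mu^2 / \sigma_\varepsilon^2 + k},
  \qquad k = \frac{1 - \phi_1}{1 + \phi_1}
\end{align*}
```

Under these assumptions the first-stage coefficient $\pi$ goes to zero as $\phi_1 \to 1$ (because $k \to 0$) and also as $\sigma_\mu^2 / \sigma_\varepsilon^2$ grows. In both cases the lagged level carries little information about the first difference, which is the weak instrument problem that the extensions discussed in this section are designed to address.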
Blundell and Bond (1998) advocated the use of extra moment conditions based on the stationarity restrictions of the time series properties of the data, as suggested by Arellano and Bover (1995). They propose a system GMM procedure that uses moment conditions based on the level equations together with the usual Arellano and Bond type orthogonality conditions. Their modification of the estimator includes lagged levels as well as lagged differences.

To discuss the system GMM method, we consider the following model:

\[
y_{it} = \phi_1 y_{i,t-1} + \beta' x_{it} + \mu_i + \varepsilon_{it} \tag{18.6.1}
\]

Here, $x_{it}$ is a vector containing both contemporaneous and lagged values of explanatory variables. The dynamic panel data model in (18.6.1) captures both the long-run equilibrium and the short-run dynamics. The idiosyncratic errors obey the following conditional moment restriction:

\[
E\left( \varepsilon_{it} \mid y_{i0}, y_{i1}, \ldots, y_{i,t-1};\; x_{i0}, x_{i1}, \ldots, x_{it};\; \mu_i \right) = 0, \qquad t = 1, 2, \ldots, T \tag{18.6.2}
\]

The first-differenced form of (18.6.1) is

\[
\Delta y_{it} = \phi_1 \Delta y_{i,t-1} + \beta' \Delta x_{it} + \Delta\varepsilon_{it} \tag{18.6.3}
\]

The unconditional moment conditions are:

\[
E\left[ \left( y_{i0}, y_{i1}, \ldots, y_{i,t-2} \right)' \Delta\varepsilon_{it} \right] = 0 \tag{18.6.4}
\]

\[
E\left[ \left( x_{i0}', x_{i1}', \ldots, x_{i,t-1}' \right)' \Delta\varepsilon_{it} \right] = 0 \tag{18.6.5}
\]

Anderson and Hsiao (1981) use simple IV estimators of this type for the AR(1) model in a multivariate framework. Arellano and Bond (1991) use GMM estimators in this framework. Ahn and Schmidt (1995) suggest an additional set of T − 2 nonlinear moment conditions:

\[
E\left( (\mu_i + \varepsilon_{it})\, \Delta\varepsilon_{i,t-1} \right) = 0, \qquad t = 3, \ldots, T \tag{18.6.6}
\]

Blundell and Bond (1998) use lagged changes of the variables as instruments for current levels. Under the assumptions that $E(\Delta y_{it} \mid \mu_i) = 0$ and $E(\Delta x_{it} \mid \mu_i) = 0$, the additional moment conditions are

\[
E\left[ \left( \Delta y_{i1}, \ldots, \Delta y_{i,t-1} \right)' (\mu_i + \varepsilon_{it}) \right] = 0
\quad \text{and} \quad
E\left[ \left( \Delta x_{i1}', \ldots, \Delta x_{it}' \right)' (\mu_i + \varepsilon_{it}) \right] = 0 \tag{18.6.7}
\]

However, the moment conditions shown in (18.6.7) are redundant because they can be expressed as linear combinations of the moments shown in (18.6.4) and (18.6.5). Kiviet et al. (2013) suggest the following non-redundant moment conditions:

\[
E\left( \Delta y_{i,t-1} (\mu_i + \varepsilon_{it}) \right) = 0, \qquad t = 2, 3, \ldots, T \tag{18.6.8}
\]

Along with, for endogenous $x_{it}$,

\[
E\left( \Delta x_{i,t-1} (\mu_i + \varepsilon_{it}) \right) = 0, \qquad t = 2, 3, \ldots, T \tag{18.6.9}
\]

If $x_{it}$ is exogenous,

\[
E\left( \Delta x_{it} (\mu_i + \varepsilon_{it}) \right) = 0, \qquad t = 1, 2, 3, \ldots, T \tag{18.6.10}
\]

Equations (18.6.4), (18.6.5), (18.6.8) and either (18.6.9) or (18.6.10) form what is known as the system GMM estimator. The system GMM estimator involves a set of additional restrictions on the initial conditions of the process generating y. The model developed by Hsiao et al. (2002) uses direct maximum likelihood estimation with the differenced data, which needs fewer restrictions, under the assumption that the idiosyncratic errors are normally distributed. Both approaches yield consistent estimators for all values of $\phi_1$. Phillips and Han (2008) introduced a differencing-based estimator in an AR(1) model for which asymptotic Gaussian-based inference is valid for all values of $\phi_1 \in (-1, 1)$.

18.6.1 Illustration by Using Stata

In Stata, xtdpdsys estimates a linear dynamic panel data model where the unobserved cross section effects are correlated with the lags of the dependent variable, as developed in Blundell and Bond (1998). To estimate this model, we can use the menu in the Stata main window in the following sequence:

    Statistics > Longitudinal/panel data > Dynamic panel data (DPD) > Arellano-Bover/Blundell-Bond estimation

To estimate the same model by using this methodology, we have to carry out the following command:

    xtdpdsys ln_lab l(0/1).ln_lab_pro l(0/1).gdp_growth, lags(1) vce(robust)

Compared with the one-step robust Arellano and Bond (1991) estimates reported above, the system estimator provides larger estimates, in absolute value, of the coefficient on lagged ln_lab and of the coefficients on the other regressors. The number of instruments used in the system estimation (211) is higher than that used in the Arellano and Bond (1991) estimation (185).

    System dynamic panel-data estimation            Number of obs     =        216
    Group variable: country_SA                      Number of groups  =          8
    Time variable: year
                                                    Obs per group:
                                                                  min =         27
                                                                  avg =         27
                                                                  max =         27

    Number of instruments =    211                  Wald chi2(5)      =  157857.60
                                                    Prob > chi2       =     0.0000
    One-step results
    ------------------------------------------------------------------------------
                 |               Robust
          ln_lab |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          ln_lab |
             L1. |    1.00402   .0037401   268.45   0.000       .99669    1.011351
                 |
      ln_lab_pro |
             --. |  -1.227652   .4701487    -2.61   0.009    -2.149127   -.3061779
             L1. |   1.241446   .4814403     2.58   0.010     .2978402    2.185052
                 |
      gdp_growth |
             --. |   .0019207    .004403     0.44   0.663     -.006709    .0105505
             L1. |   .0097494    .002956     3.30   0.001     .0039557    .0155431
                 |
           _cons |   -.131618   .1137913    -1.16   0.247    -.3546449    .0914089
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(2/.).ln_lab
            Standard: D.ln_lab_pro LD.ln_lab_pro D.gdp_growth LD.gdp_growth
    Instruments for level equation
            GMM-type: LD.ln_lab
            Standard: _cons

Summary Points

• The dynamic panel data model incorporates both the long-run equilibrium relationship and the short-run dynamics.
• The unobserved effects are correlated with the lagged dependent variables, making standard estimators inconsistent.
• Anderson and Hsiao (1981) propose an instrumental variable procedure to estimate the dynamic panel model.
• Arellano and Bond (1991) derive the corresponding one-step and two-step GMM estimators, as well as the robust VCE estimator for the one-step model.
• Blundell and Bond (1998) propose a system GMM procedure that uses moment conditions based on the level equations together with the usual Arellano and Bond type orthogonality conditions.
• Kiviet et al. (2013) further developed the system GMM method by introducing non-redundant moment conditions.

Appendix: Generalised Method of Moments

The generalised method of moments (GMM) is an extension of the classical theory of the method of moments of Fisher (1925). The basis of the method of moments is that a sample statistic will converge in probability to some constant in a random sample. To estimate K parameters, $\theta_1, \ldots, \theta_K$, we have to compute K statistics, $m_1, \ldots, m_K$, whose probability limits are known functions of the parameters. These K moments are equated to the K functions, and the
functions are inverted to express the parameters as functions of the moments. The moments will be consistent by virtue of a law of large numbers. They will be asymptotically normally distributed by virtue of the central limit theorem.

Suppose that a sample consists of n observations, $y_1, \ldots, y_n$. The kth order raw moment is

\[ m_k = \frac{1}{n}\sum_{i=1}^{n} y_i^k \tag{18.A.1} \]

Therefore,

\[ E(m_k) = \mu_k = E\left(y_i^k\right) \tag{18.A.2} \]

In general, $\mu_k$ will be a function of the underlying parameters. By computing K raw moments and equating them to these functions, we obtain K equations that can be solved to provide estimates of the K unknown parameters. The moments based on powers of y provide a natural source of information about the parameters.

In the method of moments, there are exactly as many moment equations as there are parameters to be estimated. Thus, each of these is exactly identified. There will be a single solution to the moment equations, and at that solution, the equations will be exactly satisfied. But in many cases there are more moment equations than parameters, so the system is overdetermined and there may be conflicting sets of solutions.

Suppose that the model involves K parameters, $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$, and there are L moment conditions, L > K. The GMM estimator is based on a set of population orthogonality conditions:

\[ E\big(m_l(y_i, x_i, z_i, \theta)\big) = E\big(m_{il}(\theta)\big) = 0 \tag{18.A.3} \]

The corresponding sample means are

\[ \bar{m}_l(\theta) = \frac{1}{n}\sum_{i=1}^{n} m_{il}(y_i, x_i, z_i, \theta) = \frac{1}{n}\sum_{i=1}^{n} m_{il}(\theta) \tag{18.A.4} \]

Equation (18.A.4) provides a system of L equations in K unknown parameters and will not have a unique solution. We can reconcile the $\binom{L}{K}$ different sets of estimates that can be obtained from Eq. (18.A.4) by minimising a criterion function:

\[ q = \sum_{l=1}^{L} \bar{m}_l^2 = \bar{m}(\theta)'\,\bar{m}(\theta) \tag{18.A.5} \]

We can also use the criterion as a weighted sum of squares

\[ q = \bar{m}(\theta)'\, W_n\, \bar{m}(\theta) \tag{18.A.6} \]

Here $W_n$ is any positive definite matrix that may depend on the data but is not a function of $\theta$ to produce a consistent
estimator of $\theta$. The estimators defined by choosing $\theta$ to minimise (18.A.6) are minimum distance estimators or GMM estimators. If $W_n$ is a positive definite matrix, then the GMM estimator of $\theta$ is consistent.

References

Ahn, S.C., and P. Schmidt. 1995. Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics 68: 5–27.
Anderson, T.W., and C. Hsiao. 1981. Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association 76: 598–606.
Arellano, M. 1989. A Note on the Anderson-Hsiao Estimator for Panel Data. Economics Letters 31: 337–341.
Arellano, M., and S. Bond. 1991. Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies 58: 277–297.
Arellano, M., and O. Bover. 1995. Another Look at the Instrumental Variable Estimation of Error Component Models. Journal of Econometrics 68: 29–51.
Blundell, R., and S. Bond. 1998. Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics 87: 115–143.
Fisher, Ronald A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
Hansen, L.P. 1982. Large Sample Properties of Generalized Method of Moments Estimators. Econometrica 50: 1029–1054.
Holtz-Eakin, D., W.K. Newey, and H.S. Rosen. 1988. Estimating Vector Autoregressions with Panel Data. Econometrica 56: 1371–1395.
Hsiao, C., M.H. Pesaran, and A.K. Tahmiscioglu. 2002. Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods. Journal of Econometrics 109: 107–150.
Kiviet, J.F., M. Pleus, and R. Poldermans. 2013. Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models. Mimeo: University of Amsterdam.
Nickell, S. 1981. Biases in Dynamic Models with Fixed Effects. Econometrica 49: 1417–1426.
Phillips, P.C.B., and C. Han. 2008. Gaussian Inference in AR(1) Time Series With or Without a Unit Root. Econometric Theory 24: 631–650.
Windmeijer, F. 2005. A Finite Sample Correction for the Variance of
Linear Efficient Two-Step GMM Estimators. Journal of Econometrics 126: 25–51.