Stata time series fall 2011

Center for Teaching, Research & Learning
Social Science Research Lab
American University, Washington, D.C.
http://www.american.edu/provost/ctrl/
202-885-3862

Stata & Time series

Stata is a general-purpose statistical software package. Its capabilities include data management, statistical analysis, graphics, simulations, and custom programming.

Course Objective

This course is designed to give a basic understanding of some of the features available in Stata for time series analysis. Time series data consist of a set of variables observed and recorded over time. For this tutorial we use the “Time series.dta” data set, which contains the following variables: date, unemployment, consumer price index (CPI), interest rate, and GDP growth. “Time series.dta” contains quarterly observations from 1960 to 2005.

Learning Outcomes

- Opening the data set and data description
- Declaring the data to be Time Series
- Useful time series commands
- Autocorrelation and cross-correlation analysis
- Unit Root test

Opening the data set and data description

We recommend that you create a log file before you start working in Stata, so that all your computations are saved in a file you can review afterwards. To do this, go to:

    File > Log > Begin

This file will record all the input that you type, as well as all the output produced by Stata. Alternatively, you can type (in the command window):

    log using "C:\Users\CTRL\Desktop\TSlog.log"

Opening the data file

For this tutorial, we will use Time series.dta, which can be downloaded from: http://www.american.edu/provost/ctrl/trainingguides.cfm

In Stata 11 and earlier versions, before you open the dataset, you may need to set the memory size. (In this instance, this isn’t necessary, as the example dataset is relatively small and does not require a lot of memory.)
To tell Stata how much memory to set aside for data, type:

    set mem 100m

(This command is not needed in Stata 12.)

Once you have downloaded and unzipped the dataset, you can open it by going to:

    File > Open

Alternatively, you can type:

    use "C:\Users\CTRL\Desktop\Time series.dta", clear

The clear option clears Stata’s memory, allowing you to open a new dataset.

To get a sense of what the data file contains, we can use two commands, summarize and describe. Both provide useful information about the data set and its variables.

summarize calculates and displays a variety of univariate summary statistics. If no variable list is specified, summary statistics are calculated for all the variables in the dataset.

describe produces a summary of the dataset in memory or of the data stored in a Stata-format dataset.

Example using “Time series.dta”

    summarize

    Variable    Obs    Mean        Std. Dev.    Min          Max
    unemp       181    5.914917    1.453928     3.4          10.66667
    cpi         181    95.91184    54.13317     29.39667     192.1667
    interest    181    6.167403    3.3706       .98          19.1
    gdp         181    2.031231    2.001162     -1.703726    9.718504
    datevar     181    90          52.39434     0            180

    describe

    Contains data from C:\Users\CTRL\Desktop\Time series.dta
      obs:   181
     vars:     5                          12 Oct 2011 10:00
     size: 3,620

    variable name   storage type   display format   value label   variable label
    unemp           float          %9.0g                          Unemployment Rate
    cpi             float          %9.0g                          Consumer Price Index
    interest        float          %9.0g                          Federal Funds Interest Rate
    gdp             float          %9.0g                          GDP annual growth
    datevar         float          %tq                            Date variable

    Sorted by: datevar
    Note: dataset has changed since last saved

Declaring the data to be Time Series

Using the time variable “datevar”, we can declare the data to be time series in order to use the time series operators.

Using the tsset command

tsset declares the data in memory to be a time series. tssetting the data is what makes Stata's time-series operators such as L and F (lag and lead) work. You must also tsset the data before using the other time series commands. If you save the
data after tsset, Stata will remember that the data are time series and you will not have to tsset again.

Example using “Time series.dta”

    tsset datevar

    time variable:  datevar, 1960q1 to 2005q1
            delta:  1 quarter

Useful Time Series commands

In this section, we introduce a few basic but very helpful commands.

tin (times in, from time A to time B, including both end points) option:

    list datevar unemp if tin(2000q1,2000q4)

           datevar    unemp
    161    2000q1     4.033333
    162    2000q2     3.933333
    163    2000q3     4
    164    2000q4     3.9

twithin (times within time A and time B, excluding the two time points) option:

    list datevar unemp if twithin(2001q1,2001q3)

           datevar    unemp
    166    2001q2     4.4

Generating values based on past observations using the lag operator (L), and forward-looking values using the lead operator (F):

    generate unempL1=L1.unemp
    generate unempL2=L2.unemp
    list datevar unemp unempL1 unempL2 in 1/5

    datevar    unemp       unempL1     unempL2
    1960q1     5.133333    .           .
    1960q2     5.233333    5.133333    .
    1960q3     5.533333    5.233333    5.133333
    1960q4     6.266667    5.533333    5.233333
    1961q1     6.8         6.266667    5.533333

    generate unempF1=F1.unemp
    generate unempF2=F2.unemp
    list datevar unemp unempF1 unempF2 in 1/5

    datevar    unemp       unempF1     unempF2
    1960q1     5.133333    5.233333    5.533333
    1960q2     5.233333    5.533333    6.266667
    1960q3     5.533333    6.266667    6.8
    1960q4     6.266667    6.8         7
    1961q1     6.8         7           6.766667

To generate the difference between current and previous values, use the D operator. The transformations are as follows: D1 = Y_t - Y_{t-1} and D2 = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}).

    generate unempD1=D1.unemp
    generate unempD2=D2.unemp
    list datevar unemp unempD1 unempD2 in 1/5

    datevar    unemp       unempD1     unempD2
    1960q1     5.133333    .           .
    1960q2     5.233333    .0999999    .
    1960q3     5.533333    .3000002    .2000003
    1960q4     6.266667    .7333336    .4333334
    1961q1     6.8         .5333333    -.2000003

Autocorrelation and cross-correlation analysis

In this section, we show you how to explore autocorrelation and cross-correlation. Autocorrelation represents the correlation between a variable and its previous values; use the ac and pac commands to explore it. To explore the relationship between two time series, use the
command xcorr, making sure that you always list the independent variable first and the dependent variable second.

ac produces a correlogram (a graph of autocorrelations) with pointwise confidence intervals that are based on Bartlett's formula for MA(q) processes.

pac produces a partial correlogram (a graph of partial autocorrelations) with confidence intervals calculated using a standard error of 1/sqrt(n). The residual variances for each lag may optionally be included on the graph.

xcorr plots the sample cross-correlation function.

Example using “Time series.dta”

    ac unemp, lags(10)

[Figure: correlogram of unemp for lags up to 10, with Bartlett's formula for MA(q) 95% confidence bands]

In this case, the autocorrelation graph indicates that unemployment is correlated with up to eight previous quarters.

    pac unemp, lags(10)

[Figure: partial correlogram of unemp for lags up to 10, with 95% confidence bands, se = 1/sqrt(n)]

    xcorr gdp unemp

[Figure: cross-correlogram of gdp and unemp, lags -20 to 20]

The graph above indicates that GDP growth is negatively correlated with unemployment at lags of about six to nine months (two to three quarters).

Unit Root test

In this section, we demonstrate how to evaluate whether a series has a unit root. When working with time series data it is important to check for a unit root. A series with a unit root is non-stationary: it contains a stochastic trend, so shocks have permanent effects and standard regression inference on the series can be misleading. Let’s look at unemployment across time and test for a unit root.

    line unemp datevar

[Figure: line plot of Unemployment Rate against Date variable, 1960q1 to 2005q1]

To test for a unit root we can use the Dickey-Fuller test, which examines the series for a stochastic trend, with the following command:

    dfuller unemp, lag(5)

    Augmented Dickey-Fuller test for unit root         Number of obs = 175

                         Interpolated Dickey-Fuller
            Test         1% Critical    5% Critical    10% Critical
            Statistic    Value          Value          Value
    Z(t)    -2.481       -3.485         -2.885         -2.575

    MacKinnon approximate p-value for Z(t) = 0.1201

In this case the null hypothesis is that unemployment has a unit root.
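The null hypothesis can be made concrete by writing out the regression that an augmented Dickey-Fuller test fits. The following is a sketch in standard textbook notation (the symbols y_t, alpha, beta, gamma_j and epsilon_t are assumptions, not names used in this tutorial), with a constant, no trend term, and five lagged differences to match lag(5):

```latex
\Delta y_t = \alpha + \beta\, y_{t-1} + \sum_{j=1}^{5} \gamma_j\, \Delta y_{t-j} + \varepsilon_t,
\qquad H_0:\ \beta = 0 \ \text{(unit root)}, \qquad H_a:\ \beta < 0
```

Under this reading, the reported Z(t) is the t statistic on beta, and it is compared with Dickey-Fuller critical values rather than with the usual t distribution.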
The Z(t) statistic does not allow us to reject this null hypothesis: it is smaller in absolute value than the 1% critical value (i.e. |-2.481| < |-3.485|), and it is also smaller than the 5% and 10% critical values, which agrees with the p-value of 0.1201. We therefore treat unemployment as having a unit root.

When we test the first difference of unemployment, we find that it does not have a unit root:

    dfuller unempD1, lag(5)

    Augmented Dickey-Fuller test for unit root         Number of obs = 174

                         Interpolated Dickey-Fuller
            Test         1% Critical    5% Critical    10% Critical
            Statistic    Value          Value          Value
    Z(t)    -4.593       -3.485         -2.885         -2.575

    MacKinnon approximate p-value for Z(t) = 0.0001

In this case the Z(t) statistic is larger in absolute value than the 1% critical value (i.e. |-4.593| > |-3.485|), so we can reject the null hypothesis of a unit root.
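The two unit root tests can be collected into a short do-file. This is only a sketch under stated assumptions: it assumes the dataset sits in Stata's current working directory (the tutorial itself uses a full Desktop path), and it applies the D. operator directly inside dfuller instead of first generating unempD1, which should run the same test once the data are tsset.

```stata
* Sketch only: assumes "Time series.dta" is in the current working directory
use "Time series.dta", clear

* Declare the quarterly time variable so that time-series operators work
tsset datevar

* ADF test on the level of unemployment (null hypothesis: unit root)
dfuller unemp, lags(5)

* ADF test on the first difference, using the D. operator directly
dfuller D.unemp, lags(5)
```

Using D.unemp avoids keeping an extra generated variable in the dataset; generating unempD1 first, as in the tutorial, is equally valid when the differenced series is needed elsewhere.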
