13.2 Survey of selected time series models
13.2.1 Univariate time series models
AR(p) time series process
We start by considering a simple first-order autoregressive (AR) process. The value of $y$ at the current time $t$ is explained by its value at time $t-1$, a constant $c$, and a white noise error process $\{\varepsilon_t\}$:
$y_t = c + \phi y_{t-1} + \varepsilon_t$. (13.1)
Basically, (13.1) is a first-order inhomogeneous difference equation. The path of this process depends on the value of $\phi$. If $|\phi| \geq 1$, then shocks accumulate over time and the process is non-stationary. Incidentally, if $|\phi| > 1$ the process grows without bounds, and if $|\phi| = 1$ it has a unit root. The latter will be discussed in more detail in the subsection on multivariate time series modelling. For now, only the covariance-stationary case, $|\phi| < 1$, is considered. With the lag operator $L$, (13.1) can be rewritten as
$(1 - \phi L) y_t = c + \varepsilon_t$. (13.2)
The stable solution to this process is given by an infinite sum of past errors with decaying weights:
$y_t = (c + \varepsilon_t) + \phi (c + \varepsilon_{t-1}) + \phi^2 (c + \varepsilon_{t-2}) + \phi^3 (c + \varepsilon_{t-3}) + \cdots$ (13.3a)

$\;\;\;\; = \dfrac{c}{1-\phi} + \varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + \phi^3 \varepsilon_{t-3} + \cdots$ (13.3b)

The expected value and the second-order moments of the AR(1) process in (13.1) are given by
$\mu = E[y_t] = \dfrac{c}{1-\phi}$, (13.4a)

$\gamma_0 = E[(y_t - \mu)^2] = \dfrac{\sigma^2}{1-\phi^2}$, (13.4b)

$\gamma_j = E[(y_t - \mu)(y_{t-j} - \mu)] = \dfrac{\phi^j}{1-\phi^2}\,\sigma^2$. (13.4c)
From (13.4c), the geometrically decaying pattern of the auto-covariances is evident.
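To make (13.4a)–(13.4c) concrete, the following sketch (not part of the original text) simulates an AR(1) process in Python, assuming numpy is available, and compares the empirical mean, variance, and autocovariances with their theoretical counterparts; the parameter values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(42)
c, phi, sigma, T = 1.0, 0.8, 1.0, 100_000   # |phi| < 1, hence covariance-stationary

eps = rng.normal(0.0, sigma, T)
y = np.empty(T)
y[0] = c / (1 - phi)                        # start at the unconditional mean
for t in range(1, T):
    y[t] = c + phi * y[t - 1] + eps[t]

mu_theory = c / (1 - phi)                   # (13.4a)
gamma0_theory = sigma**2 / (1 - phi**2)     # (13.4b)

print("mean     :", y.mean(), "vs", mu_theory)
print("variance :", y.var(), "vs", gamma0_theory)
for j in (1, 2, 3):
    emp = np.mean((y[j:] - y.mean()) * (y[:-j] - y.mean()))   # empirical autocovariance
    theo = phi**j * sigma**2 / (1 - phi**2)                   # (13.4c)
    print(f"gamma_{j}  : {emp:.4f} vs {theo:.4f}")
```

For a long simulated sample the empirical moments should lie close to the theoretical values, and the autocovariances should display the geometric decay noted above.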
The AR(1) process can be generalized to an AR(p) process:
$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$. (13.5)

As with (13.1), (13.5) can be rewritten as
$(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) y_t = c + \varepsilon_t$. (13.6)
It can be shown that such an AR(p) process is stationary if all roots $z_0$ of the polynomial

$\phi_p(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p$ (13.7)

have a modulus greater than one. The modulus of a complex number $z = z_1 + i z_2$ is defined as $|z| = \sqrt{z_1^2 + z_2^2}$. Viewing the stationarity condition from that angle, it turns out that in the case of an AR(1) process, as in (13.1), $|\phi| < 1$ is required because the only solution to $1 - \phi z = 0$ is given for $z = 1/\phi$, and $|z| = |1/\phi| > 1$ when $|\phi| < 1$.
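As an illustration of the root condition in (13.7), the following sketch (not code from the text) uses numpy.roots to check whether all roots of the characteristic polynomial lie outside the unit circle; the helper name is_stationary() and the parameter values are made up for illustration.

```python
import numpy as np

def is_stationary(phi):
    """phi = [phi_1, ..., phi_p]; True if all roots of (13.7) have modulus > 1."""
    # numpy.roots expects coefficients ordered from the highest power downwards:
    # -phi_p z^p - ... - phi_1 z + 1
    coefs = np.r_[-np.asarray(phi, dtype=float)[::-1], 1.0]
    roots = np.roots(coefs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.5]))          # AR(1) with phi = 0.5: root z = 2, stationary
print(is_stationary([0.5, 0.6]))     # phi_1 + phi_2 > 1: a root inside the unit circle
```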
If the error process $\{\varepsilon_t\}$ is normally distributed, (13.5) can be consistently estimated by the ordinary least-squares (OLS) method. Furthermore, the OLS estimator for the unknown coefficient vector $\boldsymbol{\beta} = (c, \boldsymbol{\phi})'$ is asymptotically normally distributed.
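A minimal sketch of this OLS approach (not part of the original text): the hypothetical function ar_ols() regresses $y_t$ on a constant and its first $p$ lags, as in (13.5), using numpy's least-squares solver; the simulated AR(2) series only serves as arbitrary example data.

```python
import numpy as np

def ar_ols(y, p):
    """OLS estimate of (c, phi_1, ..., phi_p) for the AR(p) model in (13.5)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # regressor matrix: a constant and the first p lags of y
    X = np.column_stack([np.ones(T - p)] + [y[p - k:T - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

# usage with an arbitrary simulated AR(2) series
rng = np.random.default_rng(0)
y_sim = np.zeros(5_000)
for t in range(2, len(y_sim)):
    y_sim[t] = 0.5 + 0.6 * y_sim[t - 1] + 0.2 * y_sim[t - 2] + rng.normal()
print(ar_ols(y_sim, 2))   # roughly [0.5, 0.6, 0.2]
```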
Alternatively, the model parameters can be estimated by the principle of maximum likelihood. However, one problem arises in the context of AR(p) models, and this holds true for the more general class of ARMA(p,q) models discussed later. For iid random variables with probability density function $f(y_t; \boldsymbol{\theta})$ for $t = 1, \ldots, T$ and parameter vector $\boldsymbol{\theta}$, the joint density function is the product of the marginal densities:
$f(\mathbf{y}; \boldsymbol{\theta}) = f(y_1, \ldots, y_T; \boldsymbol{\theta}) = \prod_{t=1}^{T} f(y_t; \boldsymbol{\theta})$. (13.8)

This joint density function can, in line with the ML principle, be interpreted as a function of the parameters $\boldsymbol{\theta}$ given the data vector $\mathbf{y}$; that is, the likelihood function is given by
$\mathfrak{L}(\boldsymbol{\theta} \mid \mathbf{y}) = \mathfrak{L}(\boldsymbol{\theta} \mid y_1, \ldots, y_T) = \prod_{t=1}^{T} f(y_t; \boldsymbol{\theta})$. (13.9)

The log-likelihood function then has the simple form
$\ln \mathfrak{L}(\boldsymbol{\theta} \mid \mathbf{y}) = \sum_{t=1}^{T} \ln f(y_t; \boldsymbol{\theta})$. (13.10)
Because our model assumes that the time series $\{y_t\}$ has been generated from a covariance-stationary process, the iid assumption is violated and hence the log-likelihood cannot be derived as swiftly as in (13.8)–(13.10). That is, $y_t$ is modelled as a function of its own history and is therefore not independent of $y_{t-1}, \ldots, y_{t-p}$, even if $\{\varepsilon_t\}$ is normally distributed with expectation $\mu = 0$ and variance $\sigma^2$. In order to apply the ML principle, one therefore has two options: either estimate the full-information likelihood function or derive the likelihood function from a conditional marginal factorization. The derivation of the log-likelihood for both options is provided, for instance, in Hamilton (1994). Here, we will focus on the second option. The idea is that the joint density function can be factored as the product of the conditional density function given all past information and the joint density function of the initial values:
$f(y_T, \ldots, y_1; \boldsymbol{\theta}) = \left( \prod_{t=p+1}^{T} f(y_t \mid I_{t-1}, \boldsymbol{\theta}) \right) \cdot f(y_p, \ldots, y_1; \boldsymbol{\theta})$, (13.11)
where $I_{t-1}$ denotes the information available at time $t$. This joint density function can then be interpreted as the likelihood function with respect to the parameter vector $\boldsymbol{\theta}$ given the sample $\mathbf{y}$, and therefore the log-likelihood is given by
$\ln \mathfrak{L}(\boldsymbol{\theta} \mid \mathbf{y}) = \sum_{t=p+1}^{T} \ln f(y_t \mid I_{t-1}, \boldsymbol{\theta}) + \ln f(y_p, \ldots, y_1; \boldsymbol{\theta})$. (13.12)

The log-likelihood consists of two terms. The first term is the conditional log-likelihood and the second term is the marginal log-likelihood for the initial values. Maximizing the exact log-likelihood, as in (13.12), and maximizing only the conditional log-likelihood, that is, the first term of the exact log-likelihood, are asymptotically equivalent: both estimators are consistent and have the same limiting normal distribution. Bear in mind that in small samples the two estimators might differ by a non-negligible amount, in particular if the roots are close to unity.
Because a closed-form solution does not exist, numerical optimization methods are used to derive optimal parameter values.
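The following sketch (not from the text) illustrates the conditional ML approach, that is, maximizing only the first term of (13.12) for a Gaussian AR(p) model by numerical optimization; it assumes numpy and scipy are available, and the function name ar_conditional_mle() is made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def ar_conditional_mle(y, p):
    """Conditional ML estimates (c, phi_1..phi_p, sigma) for a Gaussian AR(p)."""
    y = np.asarray(y, dtype=float)
    T = len(y)

    def neg_loglik(params):
        c, phi, log_sigma = params[0], params[1:p + 1], params[-1]
        sigma = np.exp(log_sigma)                       # keep sigma positive
        # conditional mean c + phi_1 y_{t-1} + ... + phi_p y_{t-p} for t = p+1, ..., T
        mean = c + sum(phi[k] * y[p - 1 - k:T - 1 - k] for k in range(p))
        resid = y[p:] - mean
        # negative conditional Gaussian log-likelihood (first term of (13.12))
        return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (resid / sigma)**2)

    start = np.r_[y.mean(), np.zeros(p), np.log(y.std())]
    res = minimize(neg_loglik, start, method="BFGS")
    return res.x[0], res.x[1:p + 1], np.exp(res.x[-1])

# usage: ar_conditional_mle(y_sim, 2) for a stationary series such as y_sim above
```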
MA(q) time series process
It was shown above that a finite stable AR(p) process can be inverted to a moving average (MA) of current and past shocks. It is now considered how a process can be modelled as a finite moving average of its shocks. Such a process is called MA(q), where the parameter $q$ refers to the highest lag of shocks included in such a process.
An MA(1) process is given by
$y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1}$, (13.13)
where $\{\varepsilon_t\}$ is a white noise process and $\mu$, $\theta$ can be any constants. This process has moments
$\mu = E[y_t] = E[\mu + \varepsilon_t + \theta \varepsilon_{t-1}]$, (13.14a)

$\gamma_0 = E[(y_t - \mu)^2] = (1 + \theta^2)\,\sigma^2$, (13.14b)

$\gamma_1 = E[(y_t - \mu)(y_{t-1} - \mu)] = \theta \sigma^2$. (13.14c)
The higher auto-covariances $\gamma_j$ with $j > 1$ are zero. Neither the mean nor the auto-covariances are functions of time; hence an MA(1) process is covariance-stationary for all values of $\theta$. Incidentally, this process also has the characteristic of ergodicity.
The MA(1) process can be extended to the general class of MA(q) processes:
$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$. (13.15)

With the lag operator $L$, this process can be rewritten as
$y_t - \mu = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$ (13.16a)

$\;\;\;\;\;\;\;\; = (1 + \theta_1 L + \cdots + \theta_q L^q)\,\varepsilon_t = \theta_q(L)\,\varepsilon_t$. (13.16b)
Much as a stable AR(p) process can be rewritten as an infinite MA process, an MA(q) process can be transformed into an infinite AR process as long as the roots of the characteristic polynomial, the $z$-transform, have modulus greater than 1, that is, lie outside the unit circle:

$\theta_q(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$. (13.17)

The expected value of an MA(q) process is $\mu$ and hence invariant with respect to its order. The second-order moments are
$\gamma_0 = E[(y_t - \mu)^2] = (1 + \theta_1^2 + \cdots + \theta_q^2)\,\sigma^2$, (13.18a)

$\gamma_j = E[(\varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q})(\varepsilon_{t-j} + \theta_1 \varepsilon_{t-j-1} + \cdots + \theta_q \varepsilon_{t-j-q})]$. (13.18b)

Because the $\{\varepsilon_t\}$ are uncorrelated with each other by assumption, (13.18b) can be simplified to

$\gamma_j = \begin{cases} (\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \cdots + \theta_q \theta_{q-j})\,\sigma^2 & \text{for } j = 1, 2, \ldots, q, \\ 0 & \text{for } j > q. \end{cases}$ (13.19)
That is, empirically an MA(q) process can be detected by its first $q$ significant autocorrelations and a slowly decaying or alternating pattern of its partial autocorrelations.
For large sample sizes $T$, a 95% significance band can be calculated as

$\left( \varrho_j - \dfrac{2}{\sqrt{T}},\; \varrho_j + \dfrac{2}{\sqrt{T}} \right)$, (13.20)

where $\varrho_j$ refers to the $j$th-order autocorrelation.
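The following sketch (not part of the original text) illustrates how the autocovariance pattern in (13.19) and the band in (13.20) can be used to detect an MA(q) process: it simulates an MA(2) series with numpy and compares empirical autocorrelations with the theoretical ones and with $2/\sqrt{T}$; all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, theta, sigma, T = 0.0, np.array([0.6, 0.3]), 1.0, 50_000
q = len(theta)

# simulate y_t = mu + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}
eps = rng.normal(0.0, sigma, T + q)
y = mu + eps[q:] + sum(theta[i] * eps[q - 1 - i:T + q - 1 - i] for i in range(q))

def acf(x, j):
    x = x - x.mean()
    return np.dot(x[j:], x[:-j]) / np.dot(x, x)

band = 2 / np.sqrt(T)
psi = np.r_[1.0, theta]                                  # theta_0 = 1
for j in range(1, q + 3):
    gamma_j = sigma**2 * np.sum(psi[:-j] * psi[j:]) if j <= q else 0.0   # (13.19)
    rho_j = gamma_j / (sigma**2 * np.sum(psi**2))        # theoretical autocorrelation
    print(f"lag {j}: empirical {acf(y, j):+.3f}, theoretical {rho_j:+.3f}, band +/-{band:.3f}")
```

Only the first $q$ empirical autocorrelations should lie clearly outside the band; higher-order ones should be statistically indistinguishable from zero.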
It has been stated above that a finite AR process can be inverted to an infinite MA process. Before we proceed further, let us first examine the stability condition of such an MA($\infty$) process,

$y_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$. (13.21)
The coefficients for an infinite process are denoted by $\psi$ instead of $\theta$. It can be shown that such an infinite process is covariance-stationary if the coefficient sequence $\{\psi_j\}$ is either square summable,

$\sum_{j=0}^{\infty} \psi_j^2 < \infty$, (13.22)
or absolutely summable,

$\sum_{j=0}^{\infty} |\psi_j| < \infty$, (13.23)
where absolute summability is sufficient for square summability—that is, the former implies the latter, but not vice versa.
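As a short worked example (added here, not in the original text), the MA($\infty$) representation of a stationary AR(1) process has weights $\psi_j = \phi^j$, which satisfy both conditions when $|\phi| < 1$:

$\sum_{j=0}^{\infty} |\psi_j| = \sum_{j=0}^{\infty} |\phi|^j = \dfrac{1}{1-|\phi|} < \infty, \qquad \sum_{j=0}^{\infty} \psi_j^2 = \sum_{j=0}^{\infty} \phi^{2j} = \dfrac{1}{1-\phi^2} < \infty$.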
ARMA(p,q) time series process
The previous subsections have shown how a time series can be explained either by its history or by current and past shocks. Furthermore, the moments of these data-generating processes have been derived and the mutual convertibility of these model classes has been stated for parameter sets that fulfill the stability condition.
These two time series processes are now put together and a more general class of ARMA(p,q) processes is investigated.
In practice, it is often cumbersome to detect a pure AR(p) or MA(q) process from the behavior of its empirical ACF and PACF when neither of the two cuts off at a finite lag order. In these instances, the time series might have been generated by a mixed autoregressive moving average process.
For a stationary time series $\{y_t\}$, such a mixed process is defined as
$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$. (13.24)

By assumption, $\{y_t\}$ is stationary, that is, the roots of the characteristic polynomial lie outside the unit circle. Hence, with the lag operator, (13.24) can be transformed to
$y_t = \dfrac{c}{1 - \phi_1 L - \cdots - \phi_p L^p} + \dfrac{1 + \theta_1 L + \cdots + \theta_q L^q}{1 - \phi_1 L - \cdots - \phi_p L^p}\,\varepsilon_t$ (13.25a)

$\;\;\;\; = \mu + \psi(L)\,\varepsilon_t$. (13.25b)
The condition of absolute summability for the lag coefficients $\{\psi_j\}$ must hold. Put differently, the stationarity condition depends only on the AR parameters and not on the MA ones.
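The $\psi$ weights in (13.25b) can be computed recursively from the AR and MA coefficients via $\psi_j = \theta_j + \phi_1 \psi_{j-1} + \cdots + \phi_p \psi_{j-p}$, with $\theta_0 = 1$ and $\theta_j = 0$ for $j > q$. The following sketch (not from the text, the function name arma_psi_weights() is hypothetical) implements this recursion.

```python
import numpy as np

def arma_psi_weights(phi, theta, n=20):
    """psi_0, ..., psi_n of the Wold representation (13.25b) for given AR/MA coefficients."""
    phi, theta = np.asarray(phi, float), np.asarray(theta, float)
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for j in range(1, n + 1):
        psi[j] = theta[j - 1] if j - 1 < len(theta) else 0.0
        for k in range(1, min(j, len(phi)) + 1):
            psi[j] += phi[k - 1] * psi[j - k]
    return psi

print(arma_psi_weights(phi=[0.5], theta=[0.4], n=5))
# ARMA(1,1): psi_1 = phi + theta = 0.9, and psi_j = phi * psi_{j-1} thereafter
```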
We now briefly touch on the Box–Jenkins approach to time series modelling (see Box and Jenkins 1976). This approach consists of three stages: identification, estimation, and diagnostic checking. As a first step, the series is visually inspected for stationarity. If an investigator has doubts that this condition is met, he/she has to suitably transform the series before proceeding. Such transformations could involve the removal of a deterministic trend or taking first differences with respect to time. Furthermore, variance instability, such as higher fluctuations as time proceeds, can be coped with by using the logarithmic values of the series instead. By inspecting the empirical ACF and PACF, a tentative ARMA(p,q) model is specified. The next stage is the estimation of a preliminary model. The ML principle allows one to discriminate between different model specifications by calculating information criteria and/or applying likelihood-ratio tests. Thus, one has a second set of tools for determining an appropriate lag order of ARMA(p,q) models, in addition to the order decision derived from the ACF and PACF. Specifically, the Akaike, Schwarz, and/or Hannan–Quinn information criteria can be utilized in the determination of an appropriate model structure (see Akaike 1981, Hannan and Quinn 1979, Quinn 1980, Schwarz 1978):
$\mathrm{AIC}(p,q) = \ln(\hat{\sigma}^2) + \dfrac{2(p+q)}{T}$, (13.26a)

$\mathrm{BIC}(p,q) = \ln(\hat{\sigma}^2) + \dfrac{\ln(T)(p+q)}{T}$, (13.26b)

$\mathrm{HQ}(p,q) = \ln(\hat{\sigma}^2) + \dfrac{\ln(\ln(T))(p+q)}{T}$, (13.26c)

where $\hat{\sigma}^2$ denotes the estimated variance of an ARMA(p,q) process. The lag order $(p,q)$ that minimizes the information criteria is then selected. As an alternative, a likelihood-ratio test can be computed for an unrestricted and a restricted model. The test statistic is defined as
$2\,[\mathfrak{L}(\hat{\boldsymbol{\theta}}) - \mathfrak{L}(\tilde{\boldsymbol{\theta}})] \sim \chi^2(m)$, (13.27)

where $\mathfrak{L}(\hat{\boldsymbol{\theta}})$ denotes the estimate of the unrestricted log-likelihood and $\mathfrak{L}(\tilde{\boldsymbol{\theta}})$ that of the restricted log-likelihood. This test statistic is distributed as $\chi^2$ with $m$ degrees of freedom, which corresponds to the number of restrictions. Next, one should check the model's stability as well as the significance of its parameters. If one of these tests fails, the econometrician has to start again by specifying a more parsimonious model with respect to the ARMA order. In the last step, diagnostic checking, one should then examine the residuals for lack of correlation and for normality, and conduct tests for correctness of the model order (i.e., over- and under-fitting). Incidentally, by calculating pseudo ex ante forecasts, the model's suitability for prediction can be examined.
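As an illustration of order selection by information criteria (a sketch, not code from the text), the snippet below fits ARMA(p,q) candidates over a small grid using the ARIMA class from the statsmodels package (an ARIMA model with $d = 0$ is an ARMA model) and picks the orders minimizing AIC and BIC; the simulated placeholder data would be replaced by the series under study.

```python
import warnings
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

warnings.filterwarnings("ignore")            # silence convergence chatter for the sketch

rng = np.random.default_rng(1)
y = rng.normal(size=500)                     # placeholder data; use your own stationary series

results = {}
for p in range(3):
    for q in range(3):
        fit = ARIMA(y, order=(p, 0, q)).fit()
        results[(p, q)] = (fit.aic, fit.bic)

best_aic = min(results, key=lambda k: results[k][0])
best_bic = min(results, key=lambda k: results[k][1])
print("order minimizing AIC:", best_aic, "order minimizing BIC:", best_bic)
```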
Once a stable (in the sense of covariance-stationary) ARMA(p,q) model has been estimated, it can be used to predict future values of $y_t$. These forecasts can be computed recursively from the linear predictor:
$y_T(h) = \phi_1 \bar{y}_{T+h-1} + \cdots + \phi_p \bar{y}_{T+h-p} + \bar{\varepsilon}_{T+h} + \theta_1 \bar{\varepsilon}_{T+h-1} + \cdots + \theta_q \bar{\varepsilon}_{T+h-q}$, (13.28)

where $\bar{y}_t = y_t$ for $t \leq T$ and $\bar{y}_{T+j} = y_T(j)$ for $j = 1, \ldots, h-1$, and where future errors are replaced by their expectation of zero, that is, $\bar{\varepsilon}_t = \varepsilon_t$ for $t \leq T$ and $\bar{\varepsilon}_t = 0$ for $t > T$. Using the Wold representation of a covariance-stationary ARMA(p,q) process (see (13.25a) and (13.25b)), this predictor is equivalent to
$y_T(h) = \mu + \psi_h \varepsilon_T + \psi_{h+1} \varepsilon_{T-1} + \psi_{h+2} \varepsilon_{T-2} + \cdots$ (13.29)

It can be shown that this predictor is minimal with respect to the mean squared error criterion based on the information set $I_T$; see, for instance, Judge et al. (1985, Chapter 7) and Hamilton (1994, Chapter 4). Incidentally, if the forecast horizon $h$ is greater than the moving average order $q$, the forecasts are determined solely by the autoregressive terms in (13.28).
If $\{\varepsilon_t\}$ is assumed to be normally distributed, then it follows that the $h$-steps-ahead forecast is distributed as
$y_{t+h} \mid I_t \sim N\!\left( y_{t+h|t},\; \sigma^2 (1 + \psi_1^2 + \cdots + \psi_{h-1}^2) \right)$, (13.30)

where the $\psi_i$, $i = 1, \ldots, h-1$, denote the coefficients from the Wold representation of a covariance-stationary ARMA(p,q) process. The 95% forecast confidence band
can then be computed as

$y_{t+h|t} \pm 1.96 \cdot \sqrt{\sigma^2 (1 + \psi_1^2 + \cdots + \psi_{h-1}^2)}$. (13.31)
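A minimal sketch (not from the text) of the forecast band (13.31) for an assumed ARMA(1,1) model: the $\psi$ weights of the Wold representation are built up recursively and the half-width $1.96\sqrt{\sigma^2(1+\psi_1^2+\cdots+\psi_{h-1}^2)}$ is reported for each horizon; parameter values are arbitrary.

```python
import numpy as np

phi, theta, sigma, h = 0.5, 0.4, 1.0, 5

# psi-weights of an ARMA(1,1): psi_0 = 1, psi_1 = phi + theta, psi_j = phi * psi_{j-1} for j >= 2
psi = np.empty(h)
psi[0] = 1.0
if h > 1:
    psi[1] = phi + theta
    for j in range(2, h):
        psi[j] = phi * psi[j - 1]

# forecast error variance at horizon j: sigma^2 * (1 + psi_1^2 + ... + psi_{j-1}^2)
fc_var = sigma**2 * np.cumsum(psi**2)
half_width = 1.96 * np.sqrt(fc_var)
for j, w in enumerate(half_width, start=1):
    print(f"horizon {j}: point forecast +/- {w:.3f}")
```

The band widens with the horizon and, once $h$ exceeds the MA order, its growth is governed solely by the autoregressive part, mirroring the remark on (13.28) above.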