8.5 Box-Jenkins Modeling and Forecasting


Sections 8.1 through 8.4 introduced the AR(1) model, including model properties, identification methods, and forecasting. We now introduce a broader class of models known as autoregressive integrated moving average (ARIMA) models, attributable to George Box and Gwilym Jenkins (see Box, Jenkins, and Reinsel, 1994).

8.5.1 Models

AR(p) Models

The autoregressive model of order 1 allows us to relate the current behavior of an observation directly to its immediate past value. Moreover, in some applications, there are also important effects of observations that are more distant in the past than simply the immediately preceding observation. To quantify this, we have already introduced the lag $k$ autocorrelation $\rho_k$ that captures the linear relationship between $y_t$ and $y_{t-k}$. To incorporate this feature into a forecasting framework, we have the autoregressive model of order p, denoted by AR(p). The model equation is

$$y_t = \beta_0 + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \varepsilon_t, \qquad t = p+1, \ldots, T, \qquad (8.5)$$

where $\{\varepsilon_t\}$ is a white noise process such that $\mathrm{Cov}(\varepsilon_{t+k}, y_t) = 0$ for $k > 0$, and $\beta_0, \beta_1, \ldots, \beta_p$ are unknown parameters.

As a convention, when data analysts specify an AR(p) model, they include not only $y_{t-p}$ as a predictor variable but also the intervening lags, $y_{t-1}, \ldots, y_{t-p+1}$. The exceptions to this convention are the seasonal autoregressive models, which will be introduced in Section 9.4. Also by convention, the AR(p) is a model of a stationary stochastic process. Thus, certain restrictions on the parameters $\beta_1, \ldots, \beta_p$ are necessary to ensure (weak) stationarity. These restrictions are developed in the following subsection.
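To make the AR(p) recursion concrete, here is a minimal simulation sketch in Python. The coefficient values and sample size are illustrative choices, not taken from the text, and are selected so that the process is stationary.

```python
import numpy as np

# Illustrative AR(2): y_t = beta0 + beta1*y_{t-1} + beta2*y_{t-2} + eps_t
rng = np.random.default_rng(seed=42)
beta0, beta1, beta2 = 5.0, 0.5, 0.3   # illustrative, chosen to satisfy stationarity
sigma, T = 1.0, 500

y = np.zeros(T)
y[:2] = beta0 / (1 - beta1 - beta2)   # start near the stationary mean
for t in range(2, T):
    y[t] = beta0 + beta1 * y[t - 1] + beta2 * y[t - 2] + rng.normal(0, sigma)
```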

Backshift Notation

The backshift, or backward-shift, operator B is defined by $B y_t = y_{t-1}$. The notation $B^k$ means apply the operator $k$ times; that is,

$$B^k y_t = B B \cdots B y_t = B^{k-1} y_{t-1} = \cdots = y_{t-k}.$$

This operator is linear in the sense that $B(a_1 y_t + a_2 y_{t-1}) = a_1 y_{t-1} + a_2 y_{t-2}$, where $a_1$ and $a_2$ are constants. Thus, we can express the AR(p) model as

$$\beta_0 + \varepsilon_t = y_t - \beta_1 y_{t-1} - \cdots - \beta_p y_{t-p} = \left(1 - \beta_1 B - \cdots - \beta_p B^p\right) y_t = \Phi(B) y_t.$$

If $x$ is a scalar, then $\Phi(x) = 1 - \beta_1 x - \cdots - \beta_p x^p$ is a $p$th-order polynomial in $x$. Thus, there exist $p$ roots of the equation $\Phi(x) = 0$. These roots, say, $g_1, \ldots, g_p$, may or may not be complex numbers. It can be shown (see Box, Jenkins, and Reinsel, 1994) that, for stationarity, all roots must lie strictly outside the unit circle. To illustrate, for $p = 1$, we have $\Phi(x) = 1 - \beta_1 x$. The root of this equation is $g_1 = \beta_1^{-1}$. Thus, we require $|g_1| > 1$, or $|\beta_1| < 1$, for stationarity.
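The root condition can be checked numerically. The following sketch, a continuation of the illustrative AR(2) example above, uses numpy to find the roots of $\Phi(x) = 1 - \beta_1 x - \cdots - \beta_p x^p$ and verify that they all lie outside the unit circle.

```python
import numpy as np

# Phi(x) = 1 - 0.5*x - 0.3*x^2, written highest power first for np.roots
beta = [0.5, 0.3]                             # illustrative AR(2) coefficients
poly = [-b for b in reversed(beta)] + [1.0]   # [-beta2, -beta1, 1]
roots = np.roots(poly)

# Stationarity requires every root to lie strictly outside the unit circle
print(roots, np.all(np.abs(roots) > 1))       # both |roots| > 1 here
```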

MA(q) Models

One interpretation of the model $y_t = \beta_0 + \varepsilon_t$ is that the disturbance $\varepsilon_t$ perturbs the measure of the "true" expected value of $y_t$. Similarly, we can consider the model $y_t = \beta_0 + \varepsilon_t - \theta_1 \varepsilon_{t-1}$, where $\theta_1 \varepsilon_{t-1}$ is the perturbation from the previous time period. Extending this line of thought, we introduce the moving average model of order q, denoted by MA(q). The model equation is

$$y_t = \beta_0 + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}, \qquad (8.6)$$

where the process $\{\varepsilon_t\}$ is a white noise process such that $\mathrm{Cov}(\varepsilon_{t+k}, y_t) = 0$ for $k > 0$ and $\beta_0, \theta_1, \ldots, \theta_q$ are unknown parameters.

With equation (8.6), it is easy to see that $\mathrm{Cov}(y_{t+k}, y_t) = 0$ for $k > q$. Thus, $\rho_k = 0$ for $k > q$. Unlike the AR(p) model, the MA(q) process is stationary for any finite values of the parameters $\beta_0, \theta_1, \ldots, \theta_q$. It is convenient to write the MA(q) using backshift notation, as follows:

$$y_t - \beta_0 = \left(1 - \theta_1 B - \cdots - \theta_q B^q\right) \varepsilon_t = \Theta(B) \varepsilon_t.$$

As with $\Phi(x)$, if $x$ is a scalar, then $\Theta(x) = 1 - \theta_1 x - \cdots - \theta_q x^q$ is a $q$th-order polynomial in $x$. It is unfortunate that the phrase "moving average" is used both for the model defined by equation (8.6) and for the estimate defined in Section 9.2. We will attempt to clarify the usage as it arises.
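The cutoff property $\rho_k = 0$ for $k > q$ can be seen by simulation. The sketch below generates a long MA(1) series and compares sample autocorrelations at lags 1 and 2; the parameter values are illustrative.

```python
import numpy as np

# Simulate an MA(1): y_t = beta0 + eps_t - theta1*eps_{t-1}
rng = np.random.default_rng(seed=0)
beta0, theta1, T = 10.0, 0.6, 10_000
eps = rng.normal(0.0, 1.0, T + 1)
y = beta0 + eps[1:] - theta1 * eps[:-1]

def sample_acf(x, k):
    """Lag-k sample autocorrelation."""
    x = x - x.mean()
    return np.sum(x[k:] * x[:-k]) / np.sum(x * x)

# Theory: rho_1 = -theta1/(1 + theta1^2) ~ -0.44, and rho_k = 0 for k > 1
print(sample_acf(y, 1), sample_acf(y, 2))   # ~ -0.44 and ~ 0
```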

ARMA and ARIMA Models

Combining the AR(p) and the MA(q) models yields the autoregressive moving average model of order p and q, or ARMA(p, q),

$$y_t - \beta_1 y_{t-1} - \cdots - \beta_p y_{t-p} = \beta_0 + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}, \qquad (8.7)$$

which can be represented as

$$\Phi(B) y_t = \beta_0 + \Theta(B) \varepsilon_t. \qquad (8.8)$$

In many applications, the data require differencing to exhibit stationarity. We assume that the data are differenced $d$ times to yield

$$w_t = (1-B)^d y_t = (1-B)^{d-1}(y_t - y_{t-1}) = (1-B)^{d-2}\big(y_t - y_{t-1} - (y_{t-1} - y_{t-2})\big) = \cdots \qquad (8.9)$$

In practice, $d$ is typically 0, 1, or 2. With this, the autoregressive integrated moving average model of order $(p, d, q)$, denoted by ARIMA(p, d, q), is

$$\Phi(B) w_t = \beta_0 + \Theta(B) \varepsilon_t. \qquad (8.10)$$

Often, $\beta_0$ is zero for $d > 0$.

Several procedures are available for estimating model parameters, including maximum likelihood estimation and conditional and unconditional least squares estimation. In most cases, these procedures require iterative fitting; see Abraham and Ledolter (1983) for further information.
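These iterative procedures are implemented in standard software. As one sketch, the statsmodels package fits an ARIMA model by maximum likelihood; the simulated series and the chosen order (1, 1, 1) here are illustrative, not a recommendation.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Illustrative data: a random walk with drift, which one difference renders stationary
rng = np.random.default_rng(seed=1)
y = np.cumsum(rng.normal(0.2, 1.0, 300))

# order=(p, d, q); d = 1 differences the series once before fitting the ARMA part
fit = ARIMA(y, order=(1, 1, 1)).fit()   # maximum likelihood estimation
print(fit.summary())
print(fit.forecast(steps=5))            # point forecasts via the chain rule
```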

Example: Forecasting Mortality Rates. To quantify values in life insurance and annuities, actuaries need forecasts of age-specific mortality rates. Since its publication, the method proposed by Lee and Carter (1992) has proved popular for forecasting mortality. For example, Li and Chan (2007) used these methods to produce forecasts of 1921–2000 Canadian population rates and 1900–2000 U.S. population rates. They showed how to modify the basic methodology to incorporate atypical events, including wars and pandemics such as influenza and pneumonia.

The Lee-Carter method is usually based on central death rates at age $x$ at time $t$, denoted by $m_{x,t}$. The model equation is

$$m_{x,t} = \alpha_x + \beta_x \kappa_t + \varepsilon_{x,t}. \qquad (8.11)$$

Here, the intercept ($\alpha_x$) and slope ($\beta_x$) depend only on age $x$, not on time $t$. The parameter $\kappa_t$ captures the important time effects (except for those in the disturbance term $\varepsilon_{x,t}$).

At first glance, the Lee-Carter model appears to be a linear regression with one explanatory variable. However, the term $\kappa_t$ is not observed, and so different techniques are required for model estimation. Several algorithms are available, including the singular value decomposition proposed by Lee and Carter (1992), the principal components approach, and a Poisson regression model; see Li and Chan (2007) for references. A sketch of the singular value decomposition approach appears below.
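Here is a minimal sketch of the singular value decomposition approach, with illustrative function and variable names of my own. Note that Lee and Carter (1992) applied the decomposition to logarithms of the central death rates, so in that usage the input matrix would hold $\ln m_{x,t}$.

```python
import numpy as np

def lee_carter_svd(M):
    """Estimate alpha_x, beta_x, kappa_t in M[x, t] = alpha_x + beta_x*kappa_t + eps
    from an (ages x years) matrix via the rank-one SVD approximation."""
    alpha = M.mean(axis=1)                  # age-specific average level
    U, s, Vt = np.linalg.svd(M - alpha[:, None], full_matrices=False)
    beta, kappa = U[:, 0], s[0] * Vt[0, :]  # leading singular pair
    # Usual identification: scale so sum(beta) = 1
    # (kappa sums to zero by construction, since each row of M - alpha does)
    c = beta.sum()
    return alpha, beta / c, kappa * c
```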

The time-varying term $\kappa_t$ is typically represented using an ARIMA model. Li and Chan found that a random walk (with adjustments for unusual events) was a suitable model for Canadian and U.S. rates (with different coefficients), reinforcing the findings of Lee and Carter.

8.5.2 Forecasting

Optimal Point Forecasts

Similar to the forecasts that were introduced in Section 8.4, it is common to provide forecasts that are estimates of conditional expectations of the predictive distribution. Specifically, assume that we have available a realization of $\{y_1, y_2, \ldots, y_T\}$ and want to forecast $y_{T+l}$, the value of the series $l$ lead time units in the future. If the parameters of the process were known, then we would use $\mathrm{E}(y_{T+l} \mid y_T, y_{T-1}, y_{T-2}, \ldots)$, that is, the conditional expectation of $y_{T+l}$ given the value of the series up to and including time $T$. We use the notation $\mathrm{E}_T$ for this conditional expectation.

To illustrate, taking $t = T + l$ and applying $\mathrm{E}_T$ to both sides of equation (8.7) yields

$$y_T(l) - \beta_1 y_T(l-1) - \cdots - \beta_p y_T(l-p) = \beta_0 + \mathrm{E}_T\left(\varepsilon_{T+l} - \theta_1 \varepsilon_{T+l-1} - \cdots - \theta_q \varepsilon_{T+l-q}\right), \qquad (8.12)$$

using the notation $y_T(k) = \mathrm{E}_T(y_{T+k})$. For $k \le 0$, $\mathrm{E}_T(y_{T+k}) = y_{T+k}$, as the value of $y_{T+k}$ is known at time $T$. Further, $\mathrm{E}_T(\varepsilon_{T+k}) = 0$ for $k > 0$, as disturbance terms in the future are assumed to be uncorrelated with current and past values of the series. Thus, equation (8.12) provides the basis of the chain rule of forecasting, whereby we recursively compute forecasts at lead time $l$ from prior forecasts and realizations of the series. To implement equation (8.12), we substitute estimates for parameters and residuals for disturbance terms.
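A minimal sketch of this chain rule in Python, assuming the parameter estimates and the fitted residuals (standing in for the disturbances) are already available; the function and argument names are mine, not from the text.

```python
def chain_rule_forecast(y, resid, beta0, betas, thetas, L):
    """Point forecasts y_T(1), ..., y_T(L) from equation (8.12).
    y: observed series y_1..y_T; resid: residuals e_1..e_T;
    betas, thetas: estimated AR and MA coefficients."""
    y_ext = list(y)   # known values; forecasts are appended as they are computed
    T = len(y)
    for l in range(1, L + 1):
        # AR part: y_T(l - j) is a prior forecast (l - j > 0) or a known value
        ar = sum(b * y_ext[T + l - 1 - j] for j, b in enumerate(betas, start=1))
        # MA part: E_T(eps_{T+l-i}) is 0 in the future, the residual otherwise
        ma = sum(-th * resid[T + l - 1 - i]
                 for i, th in enumerate(thetas, start=1) if l - i <= 0)
        y_ext.append(beta0 + ar + ma)
    return y_ext[T:]
```

For the AR(1) model, this reduces to the recursion of Section 8.4; for the MA(1) model, it reproduces the special case discussed next.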

Special Case – MA(1) Model. We have already seen the forecasting chain rule for the AR(1) model in Section 8.4. For the MA(1) model, note that for $l \ge 2$, we have $y_T(l) = \mathrm{E}_T(y_{T+l}) = \mathrm{E}_T(\beta_0 + \varepsilon_{T+l} - \theta_1 \varepsilon_{T+l-1}) = \beta_0$, because $\varepsilon_{T+l}$ and $\varepsilon_{T+l-1}$ are in the future at time $T$. For $l = 1$, we have $y_T(1) = \mathrm{E}_T(\beta_0 + \varepsilon_{T+1} - \theta_1 \varepsilon_T) = \beta_0 - \theta_1 \mathrm{E}_T(\varepsilon_T)$. Typically, one would estimate the term $\mathrm{E}_T(\varepsilon_T)$ using the residual at time $T$.

ψ-Coefficient Representation

Any ARIMA(p, d, q) model can be expressed as

$$y_t = \beta_0^* + \varepsilon_t + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \cdots = \beta_0^* + \sum_{k=0}^{\infty} \psi_k \varepsilon_{t-k},$$

called the ψ-coefficient representation (with $\psi_0 = 1$). That is, the current value of a process can be expressed as a constant plus a linear combination of the current and previous disturbances. Values of $\{\psi_k\}$ depend on the linear parameters of the ARIMA process and can be determined via straightforward recursive substitution.

To illustrate, for the AR(1) model, we have

$$y_t = \beta_0 + \varepsilon_t + \beta_1 y_{t-1} = \beta_0 + \varepsilon_t + \beta_1(\beta_0 + \varepsilon_{t-1} + \beta_1 y_{t-2}) = \cdots = \frac{\beta_0}{1-\beta_1} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_1^2 \varepsilon_{t-2} + \cdots = \frac{\beta_0}{1-\beta_1} + \sum_{k=0}^{\infty} \beta_1^k \varepsilon_{t-k}.$$

That is, $\psi_k = \beta_1^k$.
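A sketch of this recursive substitution in code, under the sign conventions of equations (8.5) and (8.6); the function name is mine. For an ARMA model, matching coefficients of $B^k$ in $\Phi(B)\left(\sum_k \psi_k B^k\right) = \Theta(B)$ gives $\psi_0 = 1$ and $\psi_k = \sum_j \beta_j \psi_{k-j} - \theta_k$, with $\theta_k = 0$ for $k > q$.

```python
def psi_weights(betas, thetas, K):
    """First K+1 psi-coefficients of an ARMA model via recursive substitution:
    psi_0 = 1;  psi_k = sum_j beta_j*psi_{k-j} - theta_k  (theta_k = 0 for k > q)."""
    psi = [1.0]
    for k in range(1, K + 1):
        ar_part = sum(b * psi[k - j]
                      for j, b in enumerate(betas, start=1) if k - j >= 0)
        theta_k = thetas[k - 1] if k <= len(thetas) else 0.0
        psi.append(ar_part - theta_k)
    return psi

# AR(1) check: psi_k = beta1**k, as derived above
print(psi_weights([0.7], [], 4))   # [1.0, 0.7, 0.49, 0.343, 0.2401]
```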

Forecast Interval

Using the ψ-coefficient representation, we can express the conditional expectation of $y_{T+l}$ as

$$\mathrm{E}_T(y_{T+l}) = \beta_0^* + \sum_{k=0}^{\infty} \psi_k \mathrm{E}_T(\varepsilon_{T+l-k}) = \beta_0^* + \sum_{k=l}^{\infty} \psi_k \mathrm{E}_T(\varepsilon_{T+l-k}).$$

This is because, at time $T$, the errors $\varepsilon_T, \varepsilon_{T-1}, \ldots$ have been determined by the realization of the process. However, the errors $\varepsilon_{T+1}, \ldots, \varepsilon_{T+l}$ have not been realized and hence have conditional expectation zero. Thus, the $l$-step forecast error is

$$y_{T+l} - \mathrm{E}_T(y_{T+l}) = \left(\beta_0^* + \sum_{k=0}^{\infty} \psi_k \varepsilon_{T+l-k}\right) - \left(\beta_0^* + \sum_{k=l}^{\infty} \psi_k \mathrm{E}_T(\varepsilon_{T+l-k})\right) = \sum_{k=0}^{l-1} \psi_k \varepsilon_{T+l-k}.$$

We focus on the variability of the forecast errors. Straightforward calculations yield $\mathrm{Var}(y_{T+l} - \mathrm{E}_T(y_{T+l})) = \sigma^2 \sum_{k=0}^{l-1} \psi_k^2$. Thus, assuming normality of the errors, a $100(1-\alpha)\%$ forecast interval for $y_{T+l}$ is

$$\hat{y}_{T+l} \pm (t\text{-value}) \, s \sqrt{\sum_{k=0}^{l-1} \psi_k^2},$$

where the $t$-value is the $(1-\alpha/2)$th percentile from a $t$-distribution with $df = T - (\text{number of linear parameters})$. If $y_t$ is an ARIMA(p, d, q) process, then $\psi_k$ is a function of $\beta_1, \ldots, \beta_p, \theta_1, \ldots, \theta_q$, and the number of linear parameters is $1 + p + q$.

[Figure 8.7: Residuals from a quadratic trend in time model of the Hong Kong exchange rates.]
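Putting the pieces together, here is a sketch of the interval computation, assuming the point forecast, the residual standard error $s$, and the ψ-weights (e.g., from the psi_weights function above) are available; the names are mine.

```python
import numpy as np
from scipy import stats

def forecast_interval(y_hat, s, psi, l, T, n_params, alpha=0.05):
    """100(1-alpha)% forecast interval for y_{T+l}:
    y_hat +/- t-value * s * sqrt(psi_0^2 + ... + psi_{l-1}^2)."""
    t_value = stats.t.ppf(1 - alpha / 2, df=T - n_params)   # n_params = 1 + p + q
    half_width = t_value * s * np.sqrt(np.sum(np.asarray(psi[:l]) ** 2))
    return y_hat - half_width, y_hat + half_width
```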
