(1) Introduction to Time Series Analysis. Lecture 13.
Peter Bartlett

Last lecture:
1. Yule-Walker estimation
2. Maximum likelihood estimation
(2) Introduction to Time Series Analysis. Lecture 13.
1. Review: Maximum likelihood estimation
2. Computational simplifications: un/conditional least squares
3. Diagnostics
4. Model selection
(3) Review: Maximum likelihood estimator

Suppose that X_1, X_2, ..., X_n is drawn from a zero-mean Gaussian ARMA(p,q) process. The likelihood of parameters φ ∈ R^p, θ ∈ R^q, σ_w² ∈ R_+ is defined as the density of X = (X_1, X_2, ..., X_n)′ under the Gaussian model with those parameters:

$$L(\phi, \theta, \sigma_w^2) = \frac{1}{(2\pi)^{n/2}\,|\Gamma_n|^{1/2}} \exp\left(-\frac{1}{2} X' \Gamma_n^{-1} X\right),$$

where |A| denotes the determinant of a matrix A, and Γ_n is the variance/covariance matrix of X with the given parameter values.
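As a concrete illustration, here is a minimal Python sketch (assuming numpy and scipy are available) that evaluates this Gaussian log-likelihood for an AR(1), whose autocovariance γ(h) = σ_w² φ^|h|/(1 − φ²) is known in closed form; the helper names are illustrative, not from any particular library.

```python
import numpy as np
from scipy.linalg import toeplitz

def ar1_acvf(phi, sigma2, n):
    """Autocovariances gamma(0), ..., gamma(n-1) of a causal AR(1):
    gamma(h) = sigma2 * phi**h / (1 - phi**2)."""
    h = np.arange(n)
    return sigma2 * phi**h / (1.0 - phi**2)

def gaussian_loglik(x, gamma):
    """log L(phi, theta, sigma2_w): the N(0, Gamma_n) log-density at x,
    where Gamma_n is the Toeplitz covariance matrix built from gamma."""
    n = len(x)
    Gamma = toeplitz(gamma)                # Gamma_n[i, j] = gamma(|i - j|)
    sign, logdet = np.linalg.slogdet(Gamma)
    quad = x @ np.linalg.solve(Gamma, x)   # x' Gamma_n^{-1} x
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Example: log-likelihood of a short series under phi = 0.5, sigma2_w = 1.
x = np.array([0.3, -0.1, 0.4, 0.2])
print(gaussian_loglik(x, ar1_acvf(0.5, 1.0, len(x))))
```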
(4) Maximum likelihood estimation: Simplifications

We can simplify the likelihood by expressing it in terms of the innovations. Since the innovations are linear in previous and current values, we can write

$$\underbrace{\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}}_{X} = C \underbrace{\begin{pmatrix} X_1 - X_1^0 \\ \vdots \\ X_n - X_n^{n-1} \end{pmatrix}}_{U},$$

where C is a lower triangular matrix with ones on the diagonal. Take the variance/covariance of both sides to see that

$$\Gamma_n = C D C', \qquad \text{where } D = \operatorname{diag}\left(P_1^0, \ldots, P_n^{n-1}\right).$$
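Numerically, this factorization is the LDL′ variant of the Cholesky decomposition: rescaling each column of the Cholesky factor of Γ_n by its diagonal entry yields the unit-diagonal C, and the squared diagonals give D. A small sketch under the same AR(1) assumption as above:

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky

# AR(1) autocovariances gamma(h) = sigma2 * phi**h / (1 - phi**2).
phi, sigma2, n = 0.5, 1.0, 5
gamma = sigma2 * phi**np.arange(n) / (1 - phi**2)

Gamma = toeplitz(gamma)            # Gamma_n
L = cholesky(Gamma, lower=True)    # Gamma_n = L L'
d = np.diag(L)
C = L / d                          # lower triangular, ones on the diagonal
D = np.diag(d**2)                  # D = diag(P_1^0, ..., P_n^{n-1})
assert np.allclose(C @ D @ C.T, Gamma)

# One-step prediction MSEs: gamma(0) = 4/3 first, then sigma2 = 1 for an AR(1).
print(np.diag(D))
```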
(5) Maximum likelihood estimation

Thus, |Γ_n| = |C|² P_1^0 · · · P_n^{n-1} = P_1^0 · · · P_n^{n-1}, and

$$X' \Gamma_n^{-1} X = U' C' \Gamma_n^{-1} C U = U' C' (C')^{-1} D^{-1} C^{-1} C U = U' D^{-1} U.$$

So we can rewrite the likelihood as

$$L(\phi, \theta, \sigma_w^2) = \frac{1}{\left((2\pi)^n P_1^0 \cdots P_n^{n-1}\right)^{1/2}} \exp\left(-\frac{1}{2} \sum_{i=1}^n \frac{\left(X_i - X_i^{i-1}\right)^2}{P_i^{i-1}}\right) = \frac{1}{\left((2\pi\sigma_w^2)^n\, r_1^0 \cdots r_n^{n-1}\right)^{1/2}} \exp\left(-\frac{S(\phi, \theta)}{2\sigma_w^2}\right),$$

where r_i^{i-1} = P_i^{i-1}/σ_w² and

$$S(\phi, \theta) = \sum_{i=1}^n \frac{\left(X_i - X_i^{i-1}\right)^2}{r_i^{i-1}}.$$
(6) Maximum likelihood estimation

The log likelihood of φ, θ, σ_w² is

$$\ell(\phi, \theta, \sigma_w^2) = \log L(\phi, \theta, \sigma_w^2) = -\frac{n}{2} \log(2\pi\sigma_w^2) - \frac{1}{2} \sum_{i=1}^n \log r_i^{i-1} - \frac{S(\phi, \theta)}{2\sigma_w^2}.$$

Differentiating with respect to σ_w² shows that the MLE (φ̂, θ̂, σ̂_w²) satisfies

$$\frac{n}{2\hat\sigma_w^2} = \frac{S(\hat\phi, \hat\theta)}{2\hat\sigma_w^4} \quad\Leftrightarrow\quad \hat\sigma_w^2 = \frac{S(\hat\phi, \hat\theta)}{n},$$

and φ̂, θ̂ minimize

$$\log\left(\frac{S(\hat\phi, \hat\theta)}{n}\right) + \frac{1}{n} \sum_{i=1}^n \log r_i^{i-1}.$$
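For a zero-mean causal AR(1) these quantities are available in closed form (X_1^0 = 0, P_1^0 = σ_w²/(1 − φ²), hence r_1^0 = 1/(1 − φ²) and r_i^{i-1} = 1 for i ≥ 2), so the profiled criterion can be minimized directly. A sketch assuming scipy; the function name is illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def profile_criterion(phi, x):
    """log(S(phi)/n) + (1/n) * sum_i log r_i^{i-1} for a zero-mean AR(1),
    where r_1^0 = 1/(1 - phi^2) and r_i^{i-1} = 1 for i >= 2."""
    n = len(x)
    S = (1 - phi**2) * x[0]**2 + np.sum((x[1:] - phi * x[:-1])**2)
    return np.log(S / n) + (1.0 / n) * np.log(1.0 / (1 - phi**2))

# Simulate an AR(1) with phi = 0.6 as toy data.
rng = np.random.default_rng(0)
n = 200
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

res = minimize_scalar(profile_criterion, args=(x,),
                      bounds=(-0.99, 0.99), method="bounded")
phi_hat = res.x
S_hat = (1 - phi_hat**2) * x[0]**2 + np.sum((x[1:] - phi_hat * x[:-1])**2)
print(phi_hat, S_hat / n)   # sigma2_hat = S(phi_hat)/n
```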
(7) Summary: Maximum likelihood estimation

The MLE (φ̂, θ̂, σ̂_w²) satisfies

$$\hat\sigma_w^2 = \frac{S(\hat\phi, \hat\theta)}{n},$$

and φ̂, θ̂ minimize

$$\log\left(\frac{S(\hat\phi, \hat\theta)}{n}\right) + \frac{1}{n} \sum_{i=1}^n \log r_i^{i-1},$$

where r_i^{i-1} = P_i^{i-1}/σ_w² and

$$S(\phi, \theta) = \sum_{i=1}^n \frac{\left(X_i - X_i^{i-1}\right)^2}{r_i^{i-1}}.$$
(8) Maximum likelihood estimation

Minimization is done numerically (e.g., Newton-Raphson).

Computational simplifications:
• Unconditional least squares: drop the log r_i^{i-1} terms.
• Conditional least squares: also approximate the computation of X_i^{i-1} by dropping initial terms in S. For example, for an AR(2), all but the first two terms in S depend linearly on φ_1, φ_2, so we have a least squares problem (see the sketch below).

The differences diminish as sample size increases. For example, P_t^{t-1} → σ_w², so r_t^{t-1} → 1, and thus n^{-1} Σ_i log r_i^{i-1} → 0.
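To make the AR(2) case concrete, here is a minimal conditional least squares sketch in Python: dropping the first two terms of S leaves a problem that is linear in (φ_1, φ_2), solvable by ordinary least squares.

```python
import numpy as np

def ar2_cls(x):
    """Conditional least squares for AR(2): regress x_t on (x_{t-1}, x_{t-2}),
    dropping the first two terms of S so the problem is linear in phi."""
    y = x[2:]
    X = np.column_stack([x[1:-1], x[:-2]])
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ phi
    sigma2 = resid @ resid / len(y)
    return phi, sigma2

# Simulate an AR(2) with (phi_1, phi_2) = (0.5, -0.3).
rng = np.random.default_rng(1)
n = 500
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

phi_hat, sigma2_hat = ar2_cls(x)
print(phi_hat)   # should be near (0.5, -0.3)
```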
(9) Review: Maximum likelihood estimation

For an ARMA(p,q) process, the MLE and un/conditional least squares estimators satisfy

$$\begin{pmatrix} \hat\phi \\ \hat\theta \end{pmatrix} - \begin{pmatrix} \phi \\ \theta \end{pmatrix} \sim AN\left(0,\; \frac{\sigma_w^2}{n} \begin{pmatrix} \Gamma_{\phi\phi} & \Gamma_{\phi\theta} \\ \Gamma_{\theta\phi} & \Gamma_{\theta\theta} \end{pmatrix}^{-1}\right),$$

where

$$\begin{pmatrix} \Gamma_{\phi\phi} & \Gamma_{\phi\theta} \\ \Gamma_{\theta\phi} & \Gamma_{\theta\theta} \end{pmatrix} = \operatorname{Cov}((X, Y), (X, Y)),$$

with X = (X_1, ..., X_p)′ from φ(B)X_t = W_t, and Y = (Y_1, ..., Y_q)′ from θ(B)Y_t = W_t.
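For example, for an AR(1), Γ_φφ = Var(X_t) = σ_w²/(1 − φ²), so the asymptotic variance of φ̂ reduces to (1 − φ²)/n. A small sketch of the resulting approximate 95% confidence interval (the helper name is illustrative):

```python
import numpy as np

def ar1_ci(phi_hat, n, z=1.96):
    """Approximate 95% CI for phi in an AR(1),
    using phi_hat ~ AN(phi, (1 - phi^2)/n)."""
    se = np.sqrt((1 - phi_hat**2) / n)
    return phi_hat - z * se, phi_hat + z * se

print(ar1_ci(0.5, 200))   # roughly (0.38, 0.62)
```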
(10) Introduction to Time Series Analysis. Lecture 13.
1. Review: Maximum likelihood estimation
2. Computational simplifications: un/conditional least squares
3. Diagnostics
4. Model selection
(11) Building ARMA models

1. Plot the time series. Look for trends, seasonal components, step changes, outliers.
2. Nonlinearly transform data, if necessary.
3. Identify preliminary values of p and q.
4. Estimate parameters (see the sketch after this list).
5. Use diagnostics to confirm residuals are white/iid/normal.
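One way to carry out steps 4 and 5 in practice; a sketch assuming the statsmodels package (its ARIMA class is one option among several, and the data here are simulated):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Toy data: simulate an AR(1) with phi = 0.6 (step 1 would plot this).
rng = np.random.default_rng(2)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

# Steps 3-4: posit (p, q) = (1, 0) and estimate parameters by maximum likelihood.
fit = ARIMA(x, order=(1, 0, 0)).fit()
print(fit.params)

# Step 5: residual diagnostics; the residuals should look white.
resid = fit.resid
print(resid.mean(), resid.var())
```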
(12) Diagnostics

How do we check that a model fits well? The residuals (innovations, x_t − x_t^{t-1}) should be white. Consider the standardized innovations,

$$e_t = \frac{x_t - \hat x_t^{t-1}}{\sqrt{\hat P_t^{t-1}}}.$$

This should behave like a mean-zero, unit-variance, iid sequence.
• Check a time plot (see the sketch below).
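A minimal sketch of this check, assuming the one-step predictions x̂_t^{t-1} and their MSEs P̂_t^{t-1} have already been computed (e.g., by the innovations algorithm); the toy example uses white noise, for which the predictions are 0 and the MSEs are 1:

```python
import numpy as np

def standardized_innovations(x, x_pred, P_pred):
    """e_t = (x_t - xhat_t^{t-1}) / sqrt(Phat_t^{t-1})."""
    return (x - x_pred) / np.sqrt(P_pred)

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
e = standardized_innovations(x, np.zeros(100), np.ones(100))

# If the model fits: mean near 0, variance near 1, and a time plot of e
# should show no remaining structure.
print(e.mean(), e.var())
```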
(13) Testing i.i.d.: Turning point test

{X_t} i.i.d. implies that X_t, X_{t+1}, and X_{t+2} are equally likely to occur in any of six possible orders (provided X_t, X_{t+1}, X_{t+2} are distinct).

[Figure: the six possible orderings of three consecutive values.]
(14) Testing i.i.d.: Turning point test

Define T = |{t : X_t, X_{t+1}, X_{t+2} is a turning point}|.

ET = 2(n − 2)/3.

Can show T ∼ AN(2n/3, 8n/45).

Reject (at the 5% level) the hypothesis that the series is i.i.d. if

$$\left| T - \frac{2n}{3} \right| > 1.96 \sqrt{\frac{8n}{45}}.$$
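A sketch of the turning point test as stated above (the function name is illustrative):

```python
import numpy as np

def turning_point_test(x, z=1.96):
    """Reject iid at the 5% level if |T - 2n/3| > z * sqrt(8n/45),
    where T counts interior points that are a local max or local min."""
    x = np.asarray(x)
    n = len(x)
    mid, left, right = x[1:-1], x[:-2], x[2:]
    T = np.sum((mid > left) & (mid > right) | (mid < left) & (mid < right))
    stat = abs(T - 2 * n / 3) / np.sqrt(8 * n / 45)
    return T, stat > z   # (count, reject?)

rng = np.random.default_rng(4)
print(turning_point_test(rng.standard_normal(200)))
```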
(15) Testing i.i.d.: Difference-sign test

Define S = |{i : X_i > X_{i-1}}| = |{i : (∇X)_i > 0}|.

ES = (n − 1)/2.

Can show S ∼ AN(n/2, n/12).

Reject (at the 5% level) the hypothesis that the series is i.i.d. if

$$\left| S - \frac{n}{2} \right| > 1.96 \sqrt{\frac{n}{12}}.$$

Tests for trend.
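Likewise for the difference-sign test; the second call below shows it rejecting on a trending series:

```python
import numpy as np

def difference_sign_test(x, z=1.96):
    """Reject iid at the 5% level if |S - n/2| > z * sqrt(n/12),
    where S counts positive first differences. Sensitive to trend."""
    x = np.asarray(x)
    n = len(x)
    S = np.sum(np.diff(x) > 0)
    stat = abs(S - n / 2) / np.sqrt(n / 12)
    return S, stat > z

rng = np.random.default_rng(5)
print(difference_sign_test(rng.standard_normal(200)))   # iid: no rejection
print(difference_sign_test(np.arange(200.0)))           # trend: rejects
```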
(16) Testing i.i.d.: Rank test

Define N = |{(i, j) : X_i > X_j and i > j}|.

EN = n(n − 1)/4.

Can show N ∼ AN(n²/4, n³/36).

Reject (at the 5% level) the hypothesis that the series is i.i.d. if

$$\left| N - \frac{n^2}{4} \right| > 1.96 \sqrt{\frac{n^3}{36}}.$$
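And the rank test (the O(n²) pair count is done naively here):

```python
import numpy as np

def rank_test(x, z=1.96):
    """Reject iid at the 5% level if |N - n^2/4| > z * sqrt(n^3/36),
    where N counts pairs i > j with X_i > X_j."""
    x = np.asarray(x)
    n = len(x)
    N = sum(np.sum(x[i] > x[:i]) for i in range(n))
    stat = abs(N - n**2 / 4) / np.sqrt(n**3 / 36)
    return N, stat > z

rng = np.random.default_rng(6)
print(rank_test(rng.standard_normal(200)))
```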
(17) Testing if an i.i.d. sequence is Gaussian: qq plot

Plot the pairs (m_1, X_(1)), ..., (m_n, X_(n)), where m_j = EZ_(j), Z_(1) < · · · < Z_(n) are the order statistics of a sample of size n from N(0,1), and X_(1) < · · · < X_(n) are the order statistics of the series X_1, ..., X_n.

Idea: if X_i ∼ N(µ, σ²), then EX_(j) = µ + σm_j, so the plot of (m_j, X_(j)) should be linear.
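A sketch of the construction; note it substitutes the common approximation m_j ≈ Φ⁻¹((j − 3/8)/(n + 1/4)) for the exact expected order statistics EZ_(j):

```python
import numpy as np
from scipy.stats import norm

def qq_pairs(x):
    """Return (m_j, X_(j)) pairs: approximate N(0,1) expected order
    statistics against the sorted data. Linearity suggests Gaussianity."""
    x_sorted = np.sort(x)
    n = len(x)
    j = np.arange(1, n + 1)
    m = norm.ppf((j - 0.375) / (n + 0.25))   # approximation to E Z_(j)
    return m, x_sorted

rng = np.random.default_rng(7)
m, xs = qq_pairs(2 + 3 * rng.standard_normal(100))
# A fitted line should have slope ~ sigma = 3 and intercept ~ mu = 2.
slope, intercept = np.polyfit(m, xs, 1)
print(slope, intercept)
```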
(18) Introduction to Time Series Analysis. Lecture 13.
1. Review: Maximum likelihood estimation
2. Computational simplifications: un/conditional least squares
3. Diagnostics
4. Model selection