MULTISTEP YULE-WALKER ESTIMATION
OF AUTOREGRESSIVE MODELS
YOU TINGYAN
(B.Sc. Nanjing Normal University)
A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY
NATIONAL UNIVERSITY OF SINGAPORE
2010
Acknowledgements
It is a pleasure to convey my gratitude to everyone who made this thesis possible. First and foremost, I am heartily thankful to my supervisor, Prof. Xia Yingcun, whose encouragement, supervision and support from the preliminary to the concluding stages enabled me to develop an understanding of the subject. His supervision, advice and guidance from the very early stages of this research, together with the extraordinary experiences he gave me throughout the work, were critical to the completion of this thesis. Above all, he provided sustained encouragement and support in many ways. His true scientist's intuition makes him a constant oasis of ideas and passion for science, which exceptionally inspired and enriched my growth as a student, a researcher and an aspiring scientist. I appreciate him more than he knows.

I would also like to record my gratitude to my classmates and seniors, Jiang Binyan, Jiang Qian, Liang Xuehua, Zhu Yongting, Yu Xiaojiang and Jiang Xiaojun, for their involvement in my research. It was kind of them to always grant me their time, even to answer some of my naive questions about time series and estimation methods. Many thanks go in particular to Fu Jingyu, who used her precious time to read this thesis and gave critical and constructive comments on it.

Lastly, I offer my regards and blessings to the staff of the department's general office, and to all of those who supported me in any respect during the completion of this project.
Contents

Acknowledgements
Summary
List of Tables
List of Figures
1 Introduction
  1.1 Introduction
  1.2 AR model and its estimation
  1.3 Organization of this Thesis
2 Literature Review
  2.1 Univariate Time Series Background
  2.2 Time Series Models
  2.3 Autoregressive (AR) Model
  2.4 AR Model Properties
    2.4.1 Stationarity
    2.4.2 ACF and PACF for AR Model
  2.5 Basic Methods for Parameter Estimation
    2.5.1 Maximum Likelihood Estimation (MLE)
    2.5.2 Least Squares Estimation Method (LS)
    2.5.3 Yule-Walker Method (YW)
    2.5.4 Burg's Estimation Method (B)
  2.6 Monte Carlo Simulation
3 Multistep Yule-Walker Estimation Method
  3.1 Multistep Yule-Walker Estimation (MYW)
  3.2 Bias of the YW Method on Finite Samples
  3.3 Theoretical Support of MYW
4 Simulation Results
  4.1 Comparisons of Estimation Accuracy for the AR(2) Model
    4.1.1 Percentage of Outperformance of MYW
    4.1.2 Difference between the SSE of ACFs for the YW and MYW Methods
    4.1.3 The Effect of Different Forward Steps m
  4.2 Estimation Accuracy for the Fractional ARIMA Model
5 Real Data Application
  5.1 Data Source
  5.2 Numerical Results
6 Conclusion and Future Research
Bibliography
Appendix
Summary
The aim of this work is to fit a "wrong" model to an observed time series by employing higher-order Yule-Walker equations in order to enhance the fitting accuracy. Several parameter estimation methods for autoregressive models are reviewed, including the Maximum Likelihood method, the Least Squares method, the Yule-Walker method and Burg's method. The estimation accuracy of the well-known Yule-Walker method is compared with that of our new multistep Yule-Walker method on the basis of the autocorrelation function (ACF), and the effect of the number of Yule-Walker equations on the estimation performance is investigated. Monte Carlo analysis and real data are used to check the performance of the proposed method.

Keywords: Time series, Autoregressive Model, Least Squares method, Yule-Walker Method, ACF
List of Tables
4.1 Detailed Percentage for a Better Performance of the MYW method
4.2 List of "best" m for the MYW method
List of Figures
4.1 Percentage of outperformance of MYW out of 1000 simulation iterations for n = 200, 500, 1000 and 2000
4.3 SSE of ACF for both methods and its difference with n = 200
4.4 SSE of ACF for both methods and its difference with n = 1000
4.5 SSE of ACF for both methods and its difference with n = 500
4.6 SSE of ACF for both methods and its difference with n = 2000
4.6 SSE of ACF for the MYW method with n = 200, 500, 1000 and 2000
4.7 Difference of SSE of ACF with n = 200, 500, 1000 and 2000 for p = 2, d = 0.2
4.9 Difference of SSE of ACF for n = 500 with p = 1
4.10 Difference of SSE of ACF for n = 500 with p = 2
4.11 Difference of SSE of ACF for n = 500 with p = 3
4.12 Difference of SSE of ACF for n = 500 with p = 4
4.13 Difference of SSE of ACF for n = 1000 with p = 1
4.14 Difference of SSE of ACF for n = 1000 with p = 2
4.15 Difference of SSE of ACF for n = 1000 with p = 3
4.16 Difference of SSE of ACF for n = 1000 with p = 4
4.17 Difference of SSE of ACF for n = 2000 with p = 1
4.18 Difference of SSE of ACF for n = 2000 with p = 2
4.19 Difference of SSE of ACF for n = 2000 with p = 3
4.20 Difference of SSE of ACF for n = 2000 with p = 4
5.2 Difference between SSE of ACF for the two methods with p = 1
5.3 SSE of ACF for the MYW method with p = 1
5.4 Difference between SSE of ACF for the two methods with p = 2
5.5 SSE of ACF for the MYW method with p = 2
5.6 Difference between SSE of ACF for the two methods with p = 3
5.7 SSE of ACF for the MYW method with p = 3
5.8 Difference between SSE of ACF for the two methods with p = 4
5.9 SSE of ACF for the MYW method with p = 4
5.10 Difference between SSE of ACF for the two methods with p = 5
5.11 SSE of ACF for the MYW method with p = 5
Chapter 1
Introduction
1.1 Introduction
In recent years, great interest has been given to the development and application of time series methods. There are two categories of methods for time series analysis: frequency-domain methods and time-domain methods. The former include spectral analysis and wavelet analysis; the latter include autocorrelation and cross-correlation analysis. These methods are commonly applied to astronomical phenomena, weather patterns, financial asset prices, economic activity, and so on. The time series models introduced include simple autoregressive (AR) models, simple moving-average (MA) models, mixed autoregressive moving-average (ARMA) models, seasonal models, unit-root nonstationarity, and fractionally differenced models for long-range dependence. The most fundamental class of time series models is the autoregressive moving average (ARMA) model. Techniques to estimate the parameters of the ARMA model fall into two classes. One is to construct a likelihood function and derive the parameters by maximizing it with some iterative nonlinear optimization procedure. The other obtains the parameters in two steps: first estimate the autoregressive (AR) coefficients, then derive the spectral parameters of the moving-average (MA) part. Within the scope of our work, the focus is on methods for estimating the AR parameters. After reviewing several commonly used AR parameter estimation methods, a new multistep Yule-Walker estimation method is introduced, which increases the number of equations in the Yule-Walker system to enhance the fitting accuracy. The criterion used to compare the performance of the methods is the match between the ACFs of the model-generated series and the original series, which was introduced in detail by Xia and Tong (2010).
1.2 AR model and its estimation
Various models have been developed to mimic observed time series. However, it is often said that, to some extent, all models are wrong. No model can exactly reflect the observed series, and some inaccuracy always exists in the postulated model. The best we can do is to find a model that captures the characteristics of the series to the greatest possible extent, and to fit the "wrong" model with a parameter estimation method that reduces the estimation bias effectively. Our work focuses on AR models and their estimation methods, in order to evaluate the performance of different parameter estimation methods for fitting the AR model. The autoregressive (AR) model, whose use was popularized by Box and Jenkins in 1970, represents a linear regression relationship of the current value of the series against one or more past values of the series. In the mid-seventies, autoregressive modeling was first introduced in nuclear engineering and was soon widely used in other industries. Nowadays, autoregressive modeling is a popular means for identifying, monitoring, malfunction detection and diagnosing system performance. An autoregressive model depends on a limited number of parameters, which are estimated from time series data. Many techniques exist for computing AR coefficients, among which the two main categories are Least Squares and Burg's method, and a wide range of implementations can be found in MATLAB. When using algorithms from different sources, two points deserve attention: whether or not the series has already had its mean removed, and whether the signs of the coefficients are inverted in the definitions or assumptions. Comparisons of the finite-sample accuracies of these methods have been made, and the results provide some useful insights into the behavior of these estimators. It has been proved that these estimation techniques lead to approximately the same parameter estimates for large samples; nevertheless, the Yule-Walker and Least Squares methods are used more frequently than other methods, mostly for historical reasons. Among all of the methods, the most common is the so-called Yule-Walker method, which applies least squares regression to the Yule-Walker equation system. The basic steps to obtain the Yule-Walker equations are first to multiply the AR model by its past values at lags n = 1, 2, ..., p, and then to take expectations and normalize (Box and Jenkins, 1976). However, previous research has shown that in some situations the Yule-Walker estimation method produces poor parameter estimates with large bias, even for moderately sized samples. In our study, we propose an improvement on the Yule-Walker method, namely to increase the number of equations in the Yule-Walker system, and we investigate whether this helps to enhance the parameter estimation accuracy. Monte Carlo analysis is used to generate simulation results for the new method, and real data are also used to check its performance.
1.3 Organization of this Thesis
The outline of this work is as follows. In Chapter 1, the aim and purpose of the work are presented and a general introduction to approaches to parameter estimation for autoregressive models is given. In Chapter 2, the literature is reviewed on the definition of univariate time series, the background of time series model classes and the properties of the autoregressive model. Emphasis is given to methods for estimating the parameters of the AR(p) model, including the Maximum Likelihood method, the Least Squares method, the Yule-Walker method and Burg's method; the Monte Carlo analysis used in the numerical examples of the following chapters is also briefly described. In Chapter 3, we present the modification we propose to the Yule-Walker method. The bias of the Yule-Walker estimator in finite samples, which leads to the poor performance of the Yule-Walker method, is demonstrated, and theoretical support for the better estimation performance of the Multistep Yule-Walker method is given. Simulation results for autoregressive processes supporting the modification are presented in Chapter 4, while in Chapter 5 we illustrate our findings with an application of the Multistep Yule-Walker method to the daily exchange rate of the Japanese Yen against the US Dollar. Finally, conclusions and some remarks on further study are presented in Chapter 6.
Chapter 2
Literature Review
2.1 Univariate Time Series Background
A time series is a set of observations {xt}, each recorded at a specific time t, sequentially over equal time increments or in continuous time. If single observations are recorded, the series is called a univariate time series. Univariate time series can be extended to deal with vector-valued data, where more than one observation is recorded at each time; this leads to multivariate time series models, in which vectors are used for the multivariate data. Another extension is the forcing time series, on which the observed series may have no causal effect. The difference between a multivariate series and a forcing series is that we can control the forcing series under experimental design, which means it is deterministic, while the multivariate series is entirely stochastic. We only cover univariate time series in this thesis, so hereinafter a univariate time series is simply referred to as a time series.
Time series can be either discrete or continuous. A discrete-time series is one in which the observation times form a discrete set, for example when observations are recorded at fixed time intervals; a continuous-time series is obtained when the observation times form a continuum. Time series are used in a wide range of areas: they arise when monitoring engineering processes, recording stock prices in financial markets, tracking corporate business metrics, and so on. Because data points taken over time may have an internal structure, such as autocorrelation, trend or seasonal variation, time series analysis has been developed to account for these features and to investigate the information behind the series. For example, in the financial industry, time series analysis is used to observe changing price trends of stocks, bonds or other financial assets over time; it can also be used to compare the changes in these financial variables with other comparable variables over the same time period. To be more specific, to analyze how the daily closing prices of a given stock change over a period of one year, one would obtain the closing prices of the stock for each day of the year and record them in chronological order as a time series with a daily interval and a one-year period. There are a number of approaches to modeling time series, from the simplest models to more complicated ones that take trend, seasonal and residual effects into account. One approach is to decompose the time series into trend, seasonal and residual components. Another approach is to analyze the series in the frequency domain, which is the common method in scientific and engineering applications. We do not cover the complicated models in this work and only outline a few of the most common approaches below.
The simplest model for a time series is one in which there is no trend or seasonal component and the observations are simply independent and identically distributed (i.i.d.) random variables with zero mean, denoted X1, X2, · · · . We call such a series of random variables Xt an i.i.d. time series if, for any positive integer n and real numbers x1, x2, · · · , xn,

P[X1 ≤ x1, · · · , Xn ≤ xn] = P[X1 ≤ x1] · · · P[Xn ≤ xn] = F(x1) · · · F(xn)   (2.1)

where F(·) is the cumulative distribution function of the i.i.d. random variables X1, X2, · · · . In this simple model we do not consider dependence between observations. In particular, for all h ≥ 1 and all x, x1, · · · , xn,

P[Xn+h ≤ x | X1 = x1, · · · , Xn = xn] = P[Xn+h ≤ x],   (2.2)

so X1, ..., Xn contain no useful information for forecasting the behavior of Xn+h. The function f that minimizes the mean squared error E[(Xn+h − f(X1, · · · , Xn))²] is the zero function, whatever the given values of X1, · · · , Xn. This property makes the i.i.d. series rather uninteresting and limits its use for forecasting. However, it plays a very important part as a building block for more complex time series models. In other time series a trend is clear in the data pattern, and the zero-mean model is no longer suitable for such cases. We then have the following model:

Xt = mt + Yt   (2.3)
The model separates the time series into two parts: mt is the trend component, a function that changes slowly over time, and Yt is a time series with zero mean.

A common assumption in many time series techniques is that the data are stationary. If a time series {Xt} has properties similar to those of its time-shifted versions, we can loosely say that the series is stationary. To make the properties more precise, we focus on the first- and second-order moments of {Xt}. The first-order moment of {Xt} is the mean function µX(t) = E(Xt); we usually assume that {Xt} is a time series with E(Xt²) < ∞. For the second-order moments we introduce the concept of covariance. The covariance γi = Cov(Xt, Xt−i) is called the lag-i autocovariance of {Xt}. It has two important properties: (a) γ0 = Var(Xt) and (b) γ−i = γi. The second property holds because Cov(Xt, Xt−(−i)) = Cov(Xt+i, Xt) = Cov(Xt1, Xt1−i), where t1 = t + i. When the autocovariance is normalized by the variance, the autocorrelation (ACF) is obtained. For a stationary process, the mean, variance and autocorrelation structure do not change over time. So if we have a series whose statistical properties above are constant, with no periodic fluctuations such as a seasonal trend, we can call it stationary. Stationarity, however, has more precise mathematical definitions; in Section 2.4.1, a fuller account of stationarity for autoregressive processes is given for our purposes.
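To make these quantities concrete, the lag-i sample autocovariance and the sample ACF can be computed directly from a series. The following minimal Python sketch is our own illustration (the function name and the divide-by-n convention are our choices, not the thesis's):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_i = gamma_i / gamma_0 for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()                    # work with the mean-removed series
    gamma0 = np.sum(xc ** 2) / n         # lag-0 autocovariance (the variance)
    acf = np.empty(max_lag + 1)
    for i in range(max_lag + 1):
        # divide-by-n sample autocovariance at lag i, normalized by gamma_0
        acf[i] = np.sum(xc[: n - i] * xc[i:]) / n / gamma0
    return acf
```

The divide-by-n convention is worth noting here, since it is precisely the source of the triangular bias discussed later in Section 3.2.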
2.2 Time Series Models
A time series model for an observed series {Xt} is a specification of the joint distributions of the sequence of random variables {Xt}. Models for time series data can take many different forms and represent different stochastic processes. We have briefly introduced the simplest model, in which the observations are independent and identically distributed (i.i.d.) random variables with zero mean and no trend or seasonal components. Three broad classes of models are of practical importance for modeling the variation of a process: autoregressive (AR) models, moving average (MA) models and integrated (I) models. An autoregressive (AR) model is a linear regression relationship of the current value of the series against one or more past values of the series; a detailed description of the autoregressive model is given in the following section. A moving average (MA) model is a linear regression relationship of the current value of the series against the random shocks at one or more past time points. The random shocks at each point are assumed to come from the same distribution, typically a normal distribution with zero mean and constant finite variance. In the moving average model, these random shocks are passed on to future values of the time series, which distinguishes it from the other classes of model. Fitting an MA model is more complicated than fitting AR models because the error terms in MA models are not observable; iterative nonlinear fitting procedures must therefore be used for MA model estimation instead of linear least squares. We do not go further into this topic in this study.
New models, such as the autoregressive moving average (ARMA) model and the autoregressive integrated moving average (ARIMA) model, can be obtained by combining the fundamental classes. The autoregressive moving average (ARMA) model is a combination of the autoregressive (AR) model and the moving average (MA) model. The autoregressive integrated moving average (ARIMA) model was introduced by Box and Jenkins (1976); it predicts the mean of a time series as a linear combination of its own past values and past errors, and it requires long time series data. Box and Jenkins also introduced the concept of seasonal and non-seasonal (S-NS) ARIMA models for describing seasonal time series, together with an iterative procedure for developing such models. Beyond this, the autoregressive fractionally integrated moving average (ARFIMA) model has been introduced to incorporate long-range dependence explicitly into the time series model.

All the above classes represent a linear relationship between the current data point and previous data points. In empirical situations involving more complicated time series, linear models are not sufficient to capture all the information, and it is also an interesting topic to consider nonlinear dependence of a series on previous data points, as in chaotic time series. Models representing changes of variance over time, known as heteroskedasticity, have therefore been introduced. These models are called autoregressive conditional heteroskedasticity (ARCH) models, and this model class has a wide variety of representations, such as GARCH, TARCH, EGARCH, FIGARCH and CGARCH. In the ARCH model class, changes in variability are related to recent past values of the observed series; similarly, the GARCH model assumes correlation between a time series and its own lagged values. The ARCH model class has been widely used in modeling and forecasting several kinds of time series data, including inflation, stock prices, exchange rates and interest rates.
2.3 Autoregressive (AR) Model
This study focuses on one specific type of time series model: the autoregressive (AR) model. The use of the AR(p) model was popularized by Box and Jenkins in 1970 (Box, 1994). As mentioned above, the AR(p) model is a linear regression relationship of the current value of the series against past values of the series. The value p is called the order of the AR model, meaning that the current value is represented by p past values of the series. An autoregressive process of order p is a zero-mean stationary process.

To better understand the general autoregressive model, we start from the simplest AR(1) model:

Xt = ϕ0 + ϕ1 Xt−1 + at   (2.4)
For the AR(1) model, conditional on the past observation, we have

E(Xt | Xt−1) = ϕ0 + ϕ1 Xt−1   (2.5)

Var(Xt | Xt−1) = Var(at) = σa²   (2.6)

From the conditional mean and variance given the past data point Xt−1, the current data point is centered around ϕ0 + ϕ1 Xt−1 with standard deviation σa. When a single past data point Xt−1 is not enough to determine the conditional expectation of Xt, we are inspired to take more past data points into the model to give a better indication of the current data point. Thus the more flexible and general AR(p) model satisfies the following equation:

Xt = ϕ1 Xt−1 + ϕ2 Xt−2 + · · · + ϕp Xt−p + at   (2.7)

where p is the order and {at} is assumed to be a white noise series with zero mean and constant finite variance σa². The AR(p) model has the same form as the linear regression model if Xt serves as the dependent variable and the lagged values Xt−1, Xt−2, ..., Xt−p serve as the explanatory variables. Thus the autoregressive model has several properties similar to those of the simple linear regression model, although there are still some differences between the two. In this model, the past p values Xt−i (i = 1, ..., p) jointly determine the conditional expectation of Xt given the past data. The coefficients ϕ1, ϕ2, · · · , ϕp are such that
all the roots of the polynomial equation

1 − Σ_{i=1}^{p} ϕi x^{−i} = 0   (2.8)

fall inside the unit circle; equivalently, the polynomial

A(z) = 1 − ϕ1 z − · · · − ϕp z^p   (2.9)

has all its zeros outside the unit circle. This is a necessary condition for the stationarity of the autoregressive process, which is the main content of the following section.
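For a concrete feel of Equation 2.7, a zero-mean AR(p) series can be generated by direct recursion. The sketch below is our own illustrative helper (the Gaussian white noise and the burn-in length are our assumptions), and it is reused in later examples:

```python
import numpy as np

def simulate_ar(phi, n, burn_in=500, seed=0):
    """Generate n observations of a zero-mean AR(p) process
    X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} + a_t,  a_t ~ N(0, 1)."""
    phi = np.asarray(phi, dtype=float)
    rng = np.random.default_rng(seed)
    p = len(phi)
    x = np.zeros(n + burn_in)
    a = rng.standard_normal(n + burn_in)
    for t in range(p, n + burn_in):
        x[t] = phi @ x[t - p : t][::-1] + a[t]  # most recent lag first
    return x[burn_in:]                          # drop burn-in so start-up effects fade
```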
2.4 AR Model Properties

2.4.1 Stationarity
The foundation of time series analysis is stationarity. We say a time series {Xt} is strictly stationary if the joint distribution of (Xt1, ..., Xtk) is identical to that of (Xt1+t, ..., Xtk+t) for all t, where k is an arbitrary positive integer and (t1, ..., tk) is a collection of k positive integers representing the recording times. Put more simply, if the joint distribution of (Xt1+t, ..., Xtk+t) is invariant under time shifts, the time series is strictly stationary. This condition is very strong and is usually used in theoretical research; for real-world time series it is hard to verify. Thus we use another version of stationarity, called weak stationarity. As the name suggests, it is a weaker form of stationarity, which holds if both the mean of Xt and the covariance between Xt and Xt−i are time-invariant, where i is an arbitrary integer. That is, for a time series {Xt} to be weakly stationary, it should satisfy two conditions: (a) constant mean, E(Xt) = µ; and (b) Cov(Xt, Xt−i) = γi depends only on i. To illustrate weak stationarity, take a series of T observed data points {Xt | t = 1, ..., T} as an example: in the time plot of a weakly stationary series, the values fluctuate within a fixed interval with constant variation. In practical applications, weak stationarity is more widely used and enables one to make inferences about future observations. When we speak of weak stationarity, the finiteness of the first two moments of {Xt} is implicitly assumed. From the definitions, a strictly stationary time series {Xt} whose first two moments are finite is also weakly stationary, so strict stationarity (with finite second moments) implies weak stationarity; the converse does not hold. In addition, if the time series {Xt} is normally distributed, the two forms of stationarity are equivalent, owing to the special properties of the normal distribution.
2.4.2 ACF and PACF for AR Model
Methods for time series analysis may be divided into two classes: frequency-domain methods and time-domain methods. Autocorrelation and cross-correlation analysis belong to the latter class, which examines serial dependence. In linear time series analysis, correlation is of great importance for understanding the various classes of models, and special attention is paid to the correlations between a variable and its past values. This concept of correlation is generalized to autocorrelation, the basic tool for studying a stationary time series; in other texts it is also referred to as serial correlation.

Consider a weakly stationary time series {Xt}; the linear dependence between Xt and its past values Xt−i is of interest. We call the correlation coefficient between Xt and Xt−i the lag-i autocorrelation of Xt, commonly denoted by ρi. Specifically, we define

ρi = Cov(Xt, Xt−i) / √(Var(Xt) Var(Xt−i)) = Cov(Xt, Xt−i) / Var(Xt) = γi / γ0   (2.10)

Under the weak stationarity condition, Var(Xt) = Var(Xt−i) and ρi is a function of i only. From the definition we have ρ0 = 1, ρi = ρ−i, and −1 ≤ ρi ≤ 1. In addition, a weakly stationary series {Xt} is not autocorrelated if and only if ρi = 0 for all i > 0.
Here we also introduce the partial autocorrelation function (PACF) of a stationary time series to understand further properties of the series. The PACF is a function of the ACF and is a powerful tool for determining the order p of an AR model. A simple yet effective way to introduce the PACF is to consider the following AR models in consecutive orders:

xt = Φ0,1 + Φ1,1 xt−1 + e1t
xt = Φ0,2 + Φ1,2 xt−1 + Φ2,2 xt−2 + e2t
xt = Φ0,3 + Φ1,3 xt−1 + Φ2,3 xt−2 + Φ3,3 xt−3 + e3t
xt = Φ0,4 + Φ1,4 xt−1 + Φ2,4 xt−2 + Φ3,4 xt−3 + Φ4,4 xt−4 + e4t
...

where Φ0,j, Φi,j and ejt are, respectively, the constant term, the coefficient of xt−i and the error term of an AR(j) model. These equations have the same form as a multiple linear regression, so the PACF estimators, being coefficients in these models, can be obtained by least squares regression. Specifically, the estimate Φ1,1 in the first equation is called the lag-1 sample PACF of xt; the estimate Φ2,2 in the second equation is the lag-2 sample PACF of xt; the estimate Φ3,3 in the third equation is the lag-3 sample PACF of xt, and so on. By definition, the lag-2 PACF Φ2,2 shows the added contribution of xt−2 to xt over the AR(1) model xt = Φ0 + Φ1 xt−1 + e1t; the lag-3 PACF shows the added contribution of xt−3 to xt over an AR(2) model, and so on. Therefore, for an AR(p) model the lag-p sample PACF should be nonzero, while Φj,j should be close to zero for all j > p. This means the sample PACF cuts off at lag p, a property often used to determine the order p of an autoregressive model. The following additional properties of the sample PACF hold for a stationary AR(p) model (a Python sketch of the lag-by-lag computation follows this list):

• Φp,p converges to Φp as the sample size T goes to infinity.
• The asymptotic variance of Φj,j is 1/T for j > p.
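The lag-by-lag construction translates directly into code: fit AR(1), AR(2), ... by least squares and keep the last coefficient each time. The following Python sketch is our own illustration:

```python
import numpy as np

def pacf_by_regression(x, max_lag):
    """Lag-j sample PACF = coefficient Phi_{j,j} of x_{t-j} in an OLS fit of AR(j)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pacf = []
    for j in range(1, max_lag + 1):
        # regress x_t on (1, x_{t-1}, ..., x_{t-j}) for t = j..n-1
        y = x[j:]
        X = np.column_stack([np.ones(n - j)] +
                            [x[j - i : n - i] for i in range(1, j + 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        pacf.append(beta[-1])            # last coefficient is Phi_{j,j}
    return np.array(pacf)
```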
2.5 Basic Methods for Parameter Estimation
The AR model is widely used in science, engineering, econometrics, biometrics, geophysics and elsewhere. When a series is to be modeled by an AR model, the appropriate order p should be determined and the parameters of the model must be estimated. A number of methods are available for estimating the parameters of this model, of which the following are perhaps the most commonly used.
2.5.1 Maximum Likelihood Estimation (MLE)
The Maximum Likelihood method is in wide use for estimation, and time series analysis also adopts it to estimate the parameters of the stationary ARMA(p, q) model. To use the Maximum Likelihood method, let us assume that the time series {Xt} follows the Gaussian distribution. Consider the general ARMA(p, q) model

Xt = ϕ1 Xt−1 + · · · + ϕp Xt−p + at − θ1 at−1 − · · · − θq at−q   (2.11)

where µ = E(Xt) and at ∼ N(0, σa²). The joint probability density of a = (a1, a2, · · · , an)′ is

P(a | ϕ, µ, θ, σa²) = (2πσa²)^{−n/2} exp[ −(1/(2σa²)) Σ_{t=1}^{n} at² ]   (2.12)

Setting X0 and a0 as the initial values for X and a, we get the log-likelihood function

ln L∗(ϕ, µ, θ, σa²) = −(n/2) ln 2πσa² − S∗(ϕ, µ, θ) / (2σa²)   (2.13)
where

S∗(ϕ, µ, θ) = Σ_{t=1}^{n} at²(ϕ, µ, θ | X0, a0, X)   (2.14)

By maximizing ln L∗ for the given series data, the Maximum Likelihood estimators are obtained. Since the above log-likelihood function is based on initial conditions, the estimators ϕ̂, µ̂ and θ̂ are called the conditional Maximum Likelihood estimators. The estimator σ̂a² of σa² is obtained as

σ̂a² = S∗(ϕ̂, µ̂, θ̂) / (n − (2p + q + 1))   (2.15)

after ϕ̂, µ̂ and θ̂ are calculated.

Alternatively, because of the stationarity of the time series, an improvement was proposed by Box, Jenkins and Reinsel (1994), treating the unknown future values in forward form and the unknown past values in backward form. The unconditional log-likelihood function resulting from this improvement is

ln L(ϕ, µ, θ, σa²) = −(n/2) ln 2πσa² − S(ϕ, µ, θ) / (2σa²)   (2.16)

with the unconditional sum of squares function

S(ϕ, µ, θ) = Σ_{t=−∞}^{n} [E(at | ϕ, µ, θ)]²   (2.17)

Similarly, the estimator σ̂a² of σa² is calculated as

σ̂a² = S(ϕ̂, µ̂, θ̂) / n   (2.18)
The unconditional Maximum Likelihood method is efficient for seasonal models, nonstationary models, or relatively short series. Both the conditional and unconditional likelihood functions are approximations; the exact closed form is very difficult to derive, although Newbold (1974) gave an expression for the ARMA(p, q) model.

One thing to mention here is that when X1, X2, ..., Xn are independent and identically distributed (i.i.d.) and n is sufficiently large, the Maximum Likelihood estimators are approximately normally distributed, with variances at least as small as those of other asymptotically normally distributed estimators (Lehmann, 1983). Even if {Xt} is not normally distributed, Equation 2.16 can still be used as a measure of goodness of fit, and the estimator obtained by maximizing Equation 2.16 is still called the Maximum Likelihood estimator. For the scope of our study, we obtain the ML estimator for the AR process by setting θ = 0.
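For the AR case with θ = 0, the conditional ML estimator can be computed by numerically maximizing the conditional Gaussian log-likelihood of X_{p+1}, ..., X_n given the first p observations. The sketch below is a minimal illustration of our own, assuming a zero-mean series and a generic quasi-Newton optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def conditional_mle_ar(x, p):
    """Conditional Gaussian MLE for a zero-mean AR(p) model."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def neg_loglik(params):
        phi, log_s2 = params[:p], params[p]
        s2 = np.exp(log_s2)              # log-variance parameterization keeps s2 > 0
        resid = x[p:] - sum(phi[i] * x[p - 1 - i : n - 1 - i] for i in range(p))
        return 0.5 * (n - p) * np.log(2 * np.pi * s2) + resid @ resid / (2 * s2)

    res = minimize(neg_loglik, x0=np.zeros(p + 1), method="BFGS")
    return res.x[:p], np.exp(res.x[p])   # (phi estimates, innovation variance)
```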
2.5.2 Least Squares Estimation Method (LS)
Regression analysis is possibly the most widely used statistical method in data analysis. Among the various regression methods, Least Squares is well developed for linear regression models and is frequently used for estimation. The principle of the Least Squares approach is to minimize the sum of squares of the error terms εt. The AR model is a simple linear regression model, and it can be fitted by the least squares method, minimizing the sum of squared errors to estimate the parameters. Consider the following AR(p) model:

Y(t) = ϕ1 Y(t − 1) + ϕ2 Y(t − 2) + · · · + ϕp Y(t − p) + εt   (2.19)

The shock εt is under the following assumptions:

1. E(εt) = 0
2. E(εt²) = σε²
3. E(εt εk) = 0 for t ≠ k
4. E(Y(t − k) εt) = 0 for k ≥ 1

That is, εt is a zero-mean white noise series of constant variance σε², uncorrelated with the past of the series. Let ϕ denote the vector of unknown parameters,

ϕ = [ϕ1, ..., ϕp]^T   (2.20)
The AR model parameters in Equation 2.19 are estimated by minimizing the sum of squared errors Σt εt². The Least Squares estimate of ϕ is thus defined as

ϕ̂LS = arg min_ϕ Σ_{t=p+1}^{n} [ y(t) − Σ_{i=1}^{p} ϕi y(t − i) ]²   (2.21)

Denote

Ỹ(t) = [ y(t − 1) · · · y(t − p) ]^T   (2.22)

Then minimizing Equation 2.21 yields

ϕ̂LS = [ Σ_{t=p+1}^{n} Ỹ(t) Ỹ(t)^T ]^{−1} [ Σ_{t=p+1}^{n} Ỹ(t) y(t) ]   (2.23)
Detailed information on the above algorithm was given by Kay and Marple (1981). The LS method solves the linear system through the normal equations, for which there are two common solution methods: Cholesky factorization and QR factorization. Cholesky factorization is faster in computation, while QR factorization has better numerical properties. In the Least Squares method we assume that earlier observations receive the same weight as recent observations. Written out, the least squares normal equations give the linear system Ax = b as follows:
\[
\begin{pmatrix}
\sum_{t=p+1}^{N} y_{t-1}^2 & \sum_{t=p+1}^{N} y_{t-1}y_{t-2} & \cdots & \sum_{t=p+1}^{N} y_{t-1}y_{t-p}\\
\sum_{t=p+1}^{N} y_{t-1}y_{t-2} & \sum_{t=p+1}^{N} y_{t-2}^2 & \cdots & \sum_{t=p+1}^{N} y_{t-2}y_{t-p}\\
\vdots & \vdots & \ddots & \vdots\\
\sum_{t=p+1}^{N} y_{t-1}y_{t-p} & \sum_{t=p+1}^{N} y_{t-2}y_{t-p} & \cdots & \sum_{t=p+1}^{N} y_{t-p}^2
\end{pmatrix}
\begin{pmatrix} \phi_1\\ \phi_2\\ \vdots\\ \phi_p \end{pmatrix}
=
\begin{pmatrix}
\sum_{t=p+1}^{N} y_t y_{t-1}\\ \sum_{t=p+1}^{N} y_t y_{t-2}\\ \vdots\\ \sum_{t=p+1}^{N} y_t y_{t-p}
\end{pmatrix}
\]
QR factorization (Golub and Van Loan, 1996) can be used to solve this linear system. Rewriting the normal equations A^T A x = A^T b using the QR factorization A = QR:

A^T A x = A^T b
R^T Q^T Q R x = R^T Q^T b
R^T R x = R^T Q^T b   (Q^T Q = I)
R x = Q^T b   (R nonsingular)
The results from this method are used as the model parameters. As noted above, the Least Squares method gives earlier observations the same weight as recent ones. However, the recent observations may be more informative about the true behavior of the process, so the discounted least squares method was proposed to account for this, giving older observations proportionally less weight than recent ones.
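As an illustration of Equations 2.21-2.23, the following Python sketch (our own helper) builds the lagged design matrix and solves the least squares problem with a stable factorization-based routine, rather than forming the normal equations explicitly:

```python
import numpy as np

def least_squares_ar(y, p):
    """AR(p) coefficients by ordinary least squares: regress y(t) on its p lags."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # row for time t holds (y(t-1), ..., y(t-p)), for t = p..n-1
    Y = np.column_stack([y[p - i : n - i] for i in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(Y, y[p:], rcond=None)
    return phi
```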
2.5.3 Yule-Walker Method (YW)

The Yule-Walker method, also called the autocorrelation method, is a numerically simple approach to estimating the AR parameters of the ARMA model. In this method an autoregressive (AR) model is likewise fitted by minimizing the forward prediction error in a least squares sense; the difference is that the Yule-Walker method solves the Yule-Walker equations, which are formed from sample covariances. A stationary autoregressive (AR) process {Yt} of order p can be fully identified from the first p + 1 autocovariances, that is Cov(Yt, Yt+k), k = 0, 1, · · · , p, through the Yule-Walker equations. Moreover, the Yule-Walker equations can be employed to estimate the AR parameters and the disturbance variance from the first p + 1 sample autocovariances.
Rewriting Equation 2.19, we can put Yt in the form

Yt = Σ_{j=1}^{p} ϕj Yt−j + εt   (2.24)

Multiplying both sides of Equation 2.24 by Yt−j, j = 0, 1, · · · , p, and then taking expectations, we get the Yule-Walker equation

Γp ϕ = γp   (2.25)
where Γp is the covariance matrix [γ(i − j)]_{i,j=1}^{p} and γp = (γ(1), · · · , γ(p))′. Replacing the covariances γ(j) by the corresponding sample covariances γ̂(j), the Yule-Walker estimator of ϕ is given by (Young and Jakeman, 1979)

\[
\hat\phi_{YW} =
\begin{pmatrix}
\hat\gamma(0) & \hat\gamma(1) & \cdots & \hat\gamma(p-1)\\
\hat\gamma(1) & \hat\gamma(0) & \cdots & \hat\gamma(p-2)\\
\vdots & \vdots & \ddots & \vdots\\
\hat\gamma(p-1) & \hat\gamma(p-2) & \cdots & \hat\gamma(0)
\end{pmatrix}^{-1}
\begin{pmatrix}
\hat\gamma(1)\\ \hat\gamma(2)\\ \vdots\\ \hat\gamma(p)
\end{pmatrix}
\tag{2.26}
\]

or

Γ̂p ϕ̂YW = γ̂p   (2.27)

Here the autocovariance can be replaced by the autocorrelation (ACF): when normalized by the variance, the autocovariance γi becomes the autocorrelation ρi, whose values vary within the interval [−1, 1]. In this context the terms autocovariance and autocorrelation can be used interchangeably.
Various algorithms, such as the Least Squares algorithm or the Levinson-Durbin algorithm, can be used to solve the above linear Yule-Walker system. The Levinson-Durbin recursion is particularly efficient for computing the AR(p) parameters from the first p autocorrelations. The Toeplitz structure of the matrix in Equation 2.26 provides computational convenience and makes the Yule-Walker method more computationally efficient than the Least Squares method. This computational simplicity makes Yule-Walker an attractive choice for many applications.
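For illustration, the following Python sketch (our own helper, using the divide-by-N sample autocovariance) forms the system of Equation 2.27 and exploits its Toeplitz structure through a Levinson-type solver:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_ar(x, p):
    """Yule-Walker estimate: solve Gamma_p phi = gamma_p from sample autocovariances."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # divide-by-N sample autocovariances gamma_0 .. gamma_p
    gamma = np.array([np.sum(xc[: n - k] * xc[k:]) / n for k in range(p + 1)])
    # symmetric Toeplitz system with first column (gamma_0, ..., gamma_{p-1})
    return solve_toeplitz(gamma[:p], gamma[1:])
```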
2.5.4 Burg's Estimation Method (B)
Burg’s method is another different class of estimation method. It has been found
that Burg’s method, which is to solve the lattice filter equations using the harmonic
mean of forward and backward squared prediction errors, gives a quite good performance with high accuracy and is regarded to be the most preferable method when
the signal energy is non-uniformly distributed in a frequency range. This is often
the case with audio signals. Burg’s method is quite different from the Least Square
and Yule-Walker method which estimate the autoregressive parameters directly.
Different from the Least Square method which minimizing the residual, Burg’s
method deals with prediction error. Different from the Yule-Walker method, in
which the estimated coefficients ϕp1 , · · · , ϕpp are precisely the coefficients of the
best linear predictor of YP +1 in terms of Yp , · · · , Y1 under the assumption that
the ACF of Yt coincides with the sample ACF at lag 1, ..., p, Burg’s method first
estimates the reflection coefficients, which are defined as the last autoregressive
parameter estimate for each model order p. Reflection coefficients consists of unbiased estimates of the partial autocorrelation (PACF) coefficient. Under Burg’s
method, PACF Φ11 , Φ22 , · · · , Φpp is estimated by minimizing the sum of squares of
forward and backward one-step prediction errors with respect to the coefficients Φii .
Levinson-Durbin algorithm is also used here to determine the parameter estimates.
It recursively computes the successive intermediate reflection coefficients to derive
the parameters for the AR model. Given a observed stationary zero mean series
CHAPTER 2. LITERATURE REVIEW
26
Y(t), we denote ui (t), t = i1 , ..., n, 0 ≤ i < n, to be the difference between xn+1+i−t
and the best linear estimate of xn+1+i−t in terms of the preceding i observations.
Also, denote vi (t), t = i1 , ..., n, 0 ≤ i < n, to be the difference between xn+1−t
and the best linear estimate of xn+1−t in terms of the subsequent i observations.
ui (t) and vi (t) are so called forward and backward prediction errors and satisfy the
following recursions:
u0 (t) = v0 (t) = xn+1−t
(2.28)
ui (t) = ui+1 (t − 1) − Φii vi−1 (t)
(2.29)
vi (t) = vi−1 (t) − Φii ui−1 (t − 1)
(2.30)
Burg's estimate Φ11^(B) of Φ11 is obtained by minimizing δ1², i.e.

Φ11^(B) = arg min δ1²,   δ1² = (1 / (2(n − 1))) Σ_{t=2}^{n} [ u1²(t) + v1²(t) ]   (2.31)

The values of u1(t), v1(t) and δ1² generated from Equation 2.31 are then used in the recursions above with i = 2, and Burg's estimate Φ22^(B) of Φ22 is obtained. Continuing this recursive process, we finally get Φpp^(B). For pure autoregressive models, Burg's method usually performs better, with a higher likelihood, than the Yule-Walker method.
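A compact sketch of Burg's recursion, in the convention x_t = a_1 x_{t−1} + · · · + a_p x_{t−p} + e_t, is given below as our own illustration; the array bookkeeping keeps the forward error at time t aligned with the backward error at time t − 1:

```python
import numpy as np

def burg_ar(x, p):
    """Burg's method: reflection coefficients from forward/backward prediction
    errors, combined with Levinson-Durbin updates of the AR coefficients."""
    x = np.asarray(x, dtype=float)
    f = x.copy()                          # forward prediction errors
    b = x.copy()                          # backward prediction errors
    a = np.array([])                      # AR coefficients so far
    for m in range(p):
        ff, bb = f[m + 1:], b[m:-1]       # pair f(t) with b(t-1)
        k = 2.0 * np.dot(ff, bb) / (np.dot(ff, ff) + np.dot(bb, bb))
        f_new = ff - k * bb               # compute both updates before
        b_new = bb - k * ff               # overwriting the error arrays
        f[m + 1:], b[m + 1:] = f_new, b_new
        # Levinson-Durbin update: k is the new reflection (PACF) coefficient
        a = np.array([k]) if m == 0 else np.concatenate((a - k * a[::-1], [k]))
    return a
```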
2.6 Monte Carlo Simulation
Monte Carlo simulation is a method that takes sets of random numbers as input to repeatedly evaluate a deterministic model. The aim of Monte Carlo simulation is to understand the impact of uncertainty, and to develop plans to mitigate or otherwise cope with risk. The method is especially useful for uncertainty propagation problems, such as determining variation, assessing the effect of sensitivity errors, or modeling the performance or reliability of a system without enough information. A simulation involving an extremely large number of evaluations of the model may only be feasible on supercomputers. Monte Carlo simulation is a sampling method: it randomly generates the inputs from probability distributions to simulate the process of sampling from an actual population. To use the method, we first choose distributions for the inputs that match the existing data or represent our current state of knowledge. There are several ways to present the data generated by the simulation, such as histograms, summary statistics, error bars, reliability predictions, tolerance zones and confidence intervals. Monte Carlo simulation is an all-round method with a wide range of applications in various fields, and we can benefit greatly from it when analyzing the behavior of an activity, plan or process that involves uncertainty. Whether dealing with variable market demand in economics, fluctuating costs in business, variation in a manufacturing process, or unpredictable weather data in meteorology, one can always find an important role for Monte Carlo simulation.

Though Monte Carlo simulation is powerful, its steps are quite simple. The following steps illustrate the common simulation procedure:
Step 1: Create a parametric model, y = f(x1, x2, ..., xq).
Step 2: Generate a set of random inputs, xi,1, xi,2, ..., xi,q.
Step 3: Evaluate the model and store the result as yi.
Step 4: Repeat steps 2 and 3 for i = 1 to n.
Step 5: Analyze the results using probability distributions, confidence intervals, etc.
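As a toy illustration of the five steps (the model f and the input distributions below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x1, x2):                         # Step 1: a parametric model
    return x1 ** 2 + 0.5 * x2

n = 10_000
y = np.empty(n)
for i in range(n):                     # Step 4: repeat steps 2 and 3
    x1 = rng.normal(0.0, 1.0)          # Step 2: random inputs drawn from
    x2 = rng.uniform(-1.0, 1.0)        #         the chosen distributions
    y[i] = f(x1, x2)                   # Step 3: evaluate and store

lo, hi = np.percentile(y, [2.5, 97.5]) # Step 5: analyze the results
print(f"mean = {y.mean():.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```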
Chapter 3
Multistep Yule-Walker Estimation Method
In introducing the Yule-Walker method we found it computationally attractive; however, its drawback has also come into view. We have the unnormalized autocorrelation (also called the autocovariance)

γk = E[y(t) y(t − k)]   (3.1)

and the sample autocovariance

γ̂k = (1 / (N − k)) Σ_{t=k+1}^{N} y(t) y(t − k)   (3.2)

In the Yule-Walker method, the AR(p) parameters depend on merely the first p + 1 lags, from γ0 to γp. This subset of the autocorrelation lags reflects only part of the information contained in the series, which means that an AR model
generated by the Yule-Walker method will match the first p + 1 autocorrelations well but give a very poor representation of the remaining autocorrelation lags, from γp+1 onwards. Recognizing the poor performance of the straightforward application of the original Yule-Walker method, modifications have been proposed for better estimation performance. Several modifications of the basic method have been presented, such as increasing the number of equations in the Yule-Walker system and increasing the order of the estimated model. The basic ideas of the modifications are very simple, yet significant improvements in the quality of the estimates have been achieved, and different algorithms, together with a wide range of claims about their relative performance, have been presented by a number of researchers. In our work the focus is mainly on clarifying and putting in proper perspective the former modification, that is, increasing the number of Yule-Walker equations. We call this the Multistep Yule-Walker (MYW) method hereinafter. The following is a detailed description of this modification.
3.1 Multistep Yule-Walker Estimation (MYW)
To reflect the complete set of autocorrelations, it is better to take the autocorrelation lags beyond p into account. Thus the following extended Yule-Walker system is proposed:

\[
\begin{pmatrix}
\gamma(0) & \gamma(1) & \cdots & \gamma(p-1)\\
\gamma(1) & \gamma(0) & \cdots & \gamma(p-2)\\
\vdots & \vdots & \ddots & \vdots\\
\gamma(p-1) & \gamma(p-2) & \cdots & \gamma(0)\\
\gamma(p) & \gamma(p-1) & \cdots & \gamma(1)\\
\gamma(p+1) & \gamma(p) & \cdots & \gamma(2)\\
\vdots & \vdots & \ddots & \vdots\\
\gamma(p+m-1) & \gamma(p+m-2) & \cdots & \gamma(m)
\end{pmatrix}
\phi_{MYW} =
\begin{pmatrix}
\gamma(1)\\ \gamma(2)\\ \vdots\\ \gamma(p)\\ \gamma(p+1)\\ \gamma(p+2)\\ \vdots\\ \gamma(p+m)
\end{pmatrix}
\tag{3.3}
\]

or

Γm ϕMYW = γm   (3.4)

The Multistep Yule-Walker estimate ϕ̂MYW is obtained from the above system, which involves the high-lag coefficients γk, k > p. In this system the number of equations is larger than the number of parameters, and the overdetermined system of equations can be solved in the least squares sense. The estimate ϕ̂MYW is thus given by

ϕ̂MYW = arg min_ϕ ‖ Γm ϕ − γm ‖²_Q   (3.5)

where ‖x‖²_Q = x^T Q x and Q is a positive definite weighting matrix, generally set to I for simplicity. The QR factorization procedure mentioned in Section 2.5.2 can also be applied here to solve the above system.
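For illustration, the following Python sketch (our own helper) builds the extended system of Equation 3.3 from sample autocovariances and solves it by least squares with Q = I; setting m = 0 recovers the ordinary Yule-Walker estimate:

```python
import numpy as np

def multistep_yule_walker(x, p, m):
    """MYW estimate: p + m Yule-Walker equations solved in the least squares sense."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = np.array([np.sum(xc[: n - k] * xc[k:]) / n
                      for k in range(p + m + 1)])   # gamma_0 .. gamma_{p+m}
    # 0-based row q encodes equation q+1:
    # (gamma_q, gamma_{q-1}, ..., gamma_{q-p+1}) phi = gamma_{q+1}, with gamma_{-k} = gamma_k
    G = np.array([[gamma[abs(q - j)] for j in range(p)] for q in range(p + m)])
    rhs = gamma[1 : p + m + 1]                      # gamma_1 .. gamma_{p+m}
    phi, *_ = np.linalg.lstsq(G, rhs, rcond=None)
    return phi
```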
3.2 Bias of the YW Method on Finite Samples
In some applications, such as radar, the number of observations is finite, and in such finite-sample cases ϕ̂YW does not show a good fitting performance: the autocorrelation estimates in the YW method carry a small triangular bias. A finite-order AR model can be written as

yt + ϕ1 yt−1 + · · · + ϕp yt−p = εt   (3.6)

where εt is a white noise process with zero mean and finite variance σε².

The first p true parameters determine the first p lags of the true normalized AR autocorrelation function, which satisfies the Yule-Walker relationship with the true parameters ϕi:

ρ(q) + ϕ1 ρ(q − 1) + · · · + ϕp ρ(q − p) = 0   (3.7)

The estimator of the normalized autocorrelation function of N observations yn at lag q is

ρ̂(q) = γ̂(q) / γ̂(0) = [ (1/N) Σ_{t=1}^{N−q} yt yt+q ] / [ (1/N) Σ_{t=1}^{N} yt² ]   (3.8)

The expectation of the autocovariance estimator is

E[γ̂(q)] = (1/N) Σ_{t=1}^{N−q} E[yt yt+q] = γ(q) (N − q)/N = γ(q) {1 − q/N}   (3.9)

(Broersen, 2008). From Equation 3.9 we thus get a triangular bias factor 1 − q/N in γ̂(q), the estimator of the true autocovariance. In the Yule-Walker method, we replace the normalized autocorrelations ρ(q) in Equation 3.7 by their estimators from Equation 3.8 to derive the autoregressive parameters ϕ̂i from the p equations

ρ̂(q) + ϕ̂1 ρ̂(q − 1) + · · · + ϕ̂p ρ̂(q − p) = 0   (3.10)

The bias in Equation 3.9 is passed down from the estimated autocorrelation function to the estimated AR model parameters, which can leave the Yule-Walker estimator substantially biased away from the true coefficients.
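A quick numerical check of this triangular bias can be run with a process whose autocovariance is known in closed form; the sketch below is our own illustration, using an AR(1) with γ(q) = ϕ^q / (1 − ϕ²) for unit innovation variance:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, N, q, reps = 0.9, 50, 10, 20_000
true_gamma_q = phi ** q / (1 - phi ** 2)      # AR(1) autocovariance at lag q

est = np.empty(reps)
for r in range(reps):
    x = np.zeros(N + 200)
    e = rng.standard_normal(N + 200)
    for t in range(1, N + 200):
        x[t] = phi * x[t - 1] + e[t]
    y = x[200:]                               # discard burn-in
    est[r] = np.sum(y[: N - q] * y[q:]) / N   # divide-by-N estimator of gamma(q)

# the ratio should be close to 1 - q/N = 0.8, matching Equation 3.9
print(est.mean() / true_gamma_q)
```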
3.3 Theoretical Support of MYW
Suppose yt is the observed time series, a strictly stationary and strongly mixing sequence with exponentially decreasing mixing coefficients, and let xt be the time series generated by the parametric model

xt = gϕ(xt−1, · · · , xt−p) + εt,   (3.11)

where εt is the innovation and the function gϕ(·) is known up to the parameters ϕ. Denote the l-step-ahead prediction of yt+l based on model (3.11) by

gϕ^[l] = E(xt+l | xt = yt).   (3.12)

For the AR model, which is linear, gϕ^[l] is simply a compound function,

gϕ^[l] = gϕ(gϕ(· · · gϕ(yt) · · · )).   (3.13)
If we use an AR(p) model to match yt, then by the Yule-Walker equation we have the recursive formula for its ACF, i.e.

γ(k) = γ(k − 1)ϕ1 + γ(k − 2)ϕ2 + · · · + γ(k − p)ϕp,   k = 1, 2, . . .   (3.14)

Let l ≥ p, ϕ = (ϕ1, ϕ2, · · · , ϕp)^T, γl = (γ(1), γ(2), · · · , γ(l))^T, and

\[
\Gamma_l =
\begin{pmatrix}
\gamma(0) & \gamma(1) & \cdots & \gamma(p-1)\\
\gamma(1) & \gamma(0) & \cdots & \gamma(p-2)\\
\vdots & \vdots & \ddots & \vdots\\
\gamma(l-1) & \gamma(l-2) & \cdots & \gamma(l-p)
\end{pmatrix}
\tag{3.15}
\]

So the Yule-Walker equations can be written as

Γl ϕ = γl   (3.16)

Since ϕ is selected to match the ACF of yt, we can replace the ACF of xt by the ACF of yt, which is denoted by γ̃(k) and estimated by γ̂(k) = T^{−1} Σ_{t=1}^{T−k} (yt − ȳ)(yt+k − ȳ). Denote by Γ̂l and γ̂l the sample versions of Γl and γl respectively, and by Γ̃l and γ̃l the corresponding population quantities for yt. Let ϕ{l} be the general form covering the two methods, with ϕ{l} = ϕYW for l = p and ϕ{l} = ϕMYW for l > p. Denoting the minimizer by ϕ̂{l}, we have

ϕ̂{l} = (Γ̂l^T Γ̂l)^{−1} Γ̂l^T γ̂l   (3.17)

It is easy to see that ϕ̂{p} is the most efficient among all ϕ̂{l}, l = p, p + 1, · · ·, in the observation-error-free case, i.e. ϵt = 0. Otherwise, we have the following theorem
(Xia and Tong, 2010):

Theorem 3.1. Assume that the moments E‖yt‖^{2δ}, E‖gϑ^[k](yt, · · · , yt−p)‖^{2δ}, E‖∂gϑ^[k](yt, · · · , yt−p)/∂ϕ‖^{2δ} and E‖∂²gϑ^[k](yt, · · · , yt−p)/∂ϕ∂ϕ^T‖^{2δ} exist for some δ > 2. Then, in distribution,

√n {ϕ̂{l} − ϑ} ∼ N(0, Σl)   (3.18)

where ϑ = (Γ̃l^T Γ̃l)^{−1} Γ̃l^T γ̃l and Σl is a positive definite matrix. As a special case, if yt = xt + ϵt with Var(εt) > 0 and Var(ϵt) = σϵ² > 0, then the above asymptotic result holds with ϑ = ϕ + σϵ² (Γl^T Γl + σϵ² Γp + σϵ⁴ I)^{−1} (Γp + σϵ² I) ϕ.

Clearly, the bias σϵ² (Γl^T Γl + σϵ² Γp + σϵ⁴ I)^{−1} (Γp + σϵ² I) ϕ in the estimator becomes smaller as l grows. Denote γ̄k = (γ(k), γ(k + 1), · · · , γ(k + p − 1)); then we have

Γl^T Γl = Γp^T Γp + Σ_{k=p}^{l} γ̄k γ̄k^T

Thus, if a larger l is used, or if the ACF decays very slowly, the bias of the estimator can be reduced effectively. This leads to the result that the Multistep Yule-Walker method (l > p) has a less significant bias than the ordinary Yule-Walker method, and the estimation accuracy may increase considerably as the number of YW equations increases. The simulations in Chapter 4 give strong support to this result.
Chapter 4
Simulation Results
4.1 Comparisons of Estimation Accuracy for the AR(2) Model

4.1.1 Percentage of Outperformance of MYW
Simulations have been carried out to compare the estimation performance of the ordinary Yule-Walker method and the Multistep Yule-Walker method. In our simulation we generate series from the following AR(2) model:

y(t) = 0.9 y(t − 1) − 0.87 y(t − 2) + ε(t)   (4.1)

1000 independent realizations (N = 1000) of n data points each are generated with the true coefficients ϕ = [0.9, −0.87]. The error term ε(t) is randomly generated from a normal distribution with zero mean and unit variance. We then assume a "wrong" model for the generated series and estimate its parameters by both the ordinary Yule-Walker method and the Multistep Yule-Walker method, with the forward step m, the number of additional Yule-Walker equations, increased from 1 to 20. The ACFs of the original process, of the process generated by the Yule-Walker estimator, and of the process generated by the Multistep Yule-Walker estimator are obtained from the Monte Carlo simulation. We then compare the estimation accuracy by checking how well the ACF of each estimated series fits the ACF of the original time series, through the sum of squared errors (SSE). This comparison criterion, matching the ACFs, has proven advantageous in capturing the features of a series, especially when the true model is absent, the data set is short, or the data is highly cyclical.
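For concreteness, one replication of this comparison can be sketched as follows, reusing the helpers sketched in Chapters 2 and 3 (simulate_ar, sample_acf, yule_walker_ar, multistep_yule_walker). The fitted "wrong" order p_fit, the number of ACF lags compared and the length of the model-generated series are our assumptions, as the text does not fix them explicitly:

```python
import numpy as np

def sse_of_acf(phi_hat, x, max_lag=20, n_gen=5000):
    """SSE between the sample ACF of the original series and that of a long
    series generated from the estimated coefficients."""
    x_gen = simulate_ar(phi_hat, n_gen)
    return np.sum((sample_acf(x, max_lag) - sample_acf(x_gen, max_lag)) ** 2)

N, n, p_fit, m = 1000, 200, 1, 10
wins = 0
for rep in range(N):
    y = simulate_ar([0.9, -0.87], n, seed=rep)    # the AR(2) of Equation 4.1
    phi_yw = yule_walker_ar(y, p_fit)
    phi_myw = multistep_yule_walker(y, p_fit, m)
    wins += sse_of_acf(phi_myw, y) < sse_of_acf(phi_yw, y)
print(f"MYW beats YW in {100 * wins / N:.1f}% of {N} replications")
```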
We start our simulation from sample size n = 200, noting that n = 200 may not be large enough for some of the methods to perform well, and consider sample sizes n = 200, 500, 1000 and 2000. Figure 4.1 below shows the percentage of the 1000 simulations in which the SSE of ACFs from the MYW method is smaller than that of the ordinary YW method, i.e. in which the MYW method outperforms the YW method.
Figure 4.1: Percentage of outperformance of MYW out of 1000 simulation iterations for n = 200, 500, 1000 and 2000. (Four panels, one per sample size; each plots the percentage of times the new method beats the old method against the forward step m.)
To see the percentages more clearly, Table 4.1 below gives the detailed percentages of outperformance of the MYW method for sample sizes n = 200, 500, 1000 and 2000:
Table 4.1: Detailed Percentage for a Better Performance of the MYW method

Forward Step   n=200 (%)   n=500 (%)   n=1000 (%)   n=2000 (%)
      1           40.2        31.4        21.3         13.7
      2          100.0       100.0       100.0        100.0
      3          100.0       100.0       100.0        100.0
      4          100.0       100.0       100.0        100.0
      5           99.8       100.0       100.0        100.0
      6           99.9       100.0        99.9        100.0
      7           99.8        99.7       100.0        100.0
      8          100.0       100.0       100.0        100.0
      9           99.8       100.0       100.0        100.0
     10           99.6       100.0       100.0        100.0
     11           99.6       100.0       100.0        100.0
     12           99.9       100.0       100.0        100.0
     13           99.9       100.0       100.0        100.0
     14           99.7       100.0       100.0        100.0
     15           99.6       100.0       100.0        100.0
     16           99.3       100.0        99.9        100.0
     17           99.5        99.9       100.0        100.0
     18           99.4       100.0       100.0        100.0
     19           99.5       100.0       100.0        100.0
     20           99.3        99.9       100.0        100.0
From Table 4.1 it is easy to see that increasing the number of equations in the Yule-Walker system undoubtedly improves the estimation accuracy. For forward steps m > 1, the Multistep Yule-Walker method beats the Yule-Walker method, with a smaller sum of squared errors of the ACFs, in nearly 100% of cases, and the estimation accuracy also improves as the sample size n increases. The next section investigates the exact SSEs of the two methods as well as their difference.
4.1.2 Difference between the SSE of ACFs for YW and MYW Methods
With N = 1000 simulation iterations, we show four sets of graphs, one for each of the sample sizes n = 200, 500, 1000 and 2000. Each set consists of two graphs. The upper graph, with two lines, shows the SSEs for the two methods: the line with asterisks represents SSE_YW, the SSE of the ACF for the YW method, and the line with circles represents SSE_MYW, the SSE of the ACF for the MYW method. The lower graph, with a single line, shows the difference Dif = SSE_YW − SSE_MYW. If Dif > 0, the series generated by the YW method has a greater ACF departure from the original series than that generated by the MYW method, which means that the series generated from the MYW parameter estimates matches the original series better than that from the old YW method.
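For instance, with sse_yw and sse_myw holding the Monte Carlo average SSEs of the ACF for each forward step m (illustrative variable names), the plotted difference is simply:

Dif = sse_yw - sse_myw;   % Dif > 0: the MYW series matches the original ACF better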
Figure 4.3: SSE of ACF for both methods and its difference with n=200
Figure 4.4: SSE of ACF for both methods and its difference with n=1000
Figure 4.5: SSE of ACF for both methods and its difference with n=500
Figure 4.6: SSE of ACF for both methods and its difference with n=2000
[In each figure, the upper panel plots the SSE of the ACF for YW (asterisks) and MYW (circles) against the forward step m = 1 to 20; the lower panel plots their difference.]
In the upper graph of every set in Figures 4.3, 4.4, 4.5 and 4.6, the line with asterisks lies above the line with circles whenever m > 1. This is confirmed by the lower graph of every set, in which the difference line stays above zero. We can therefore conclude, for our simulation, that the Multistep Yule-Walker method estimates the parameters of this assumed AR(2) model more accurately, except in the case m = 1. Also, as the sample size n increases, the difference in the SSE of the ACF for the two methods becomes more apparent. All these results accord with the conclusion drawn from the percentage of outperformance of MYW.
4.1.3 The Effect of Different Forward Step m
A good choice of m is important in practice. This section asks whether there is a "best" number of equations to add in the MYW method, i.e. an m giving the smallest ACF departure. To see more directly which value of m gives a more satisfactory result, we extract the line with circles from the graphs above. As described, this line shows the SSE of the ACF between the original AR process and the process generated by the MYW parameter estimates for m = 1 to 20. We take SSE_B = 0.5 × 10^−3 as a baseline and regard any m whose SSE is less than SSE_B as a "best" m. The results are presented in Figure 4.6 below:
Figure 4.6: SSE of ACF for the MYW method with n=200, 500, 1000 and 2000
[Four panels plot the SSE of the ACF for the MYW method against the forward step m = 1 to 20, one panel for each sample size.]
In general, an improvement in estimation accuracy can be seen for 1 < m < 20 in Figure 4.6 above. The results under the SSE criterion indicate that the Multistep Yule-Walker method is attractive for parameter estimation when more information on the ACF lags is added to the Yule-Walker system. To find the "best" m, we examine the four sample sizes one by one and list the values of m for which the SSE is smaller than SSE_B = 0.5 × 10^−3. The results are given in Table 4.2 below:
Table 4.2: List of "best" m for the MYW method

Forward Step   n=200   n=500   n=1000   n=2000
[Rows for m = 1 to 20 mark, for each sample size, the forward steps whose SSE falls below the baseline SSE_B.]
From Table 4.2, for the parameter estimation of the AR(2) model y(t) = 0.9y(t − 1) − 0.87y(t − 2) + ε(t), the Multistep Yule-Walker method almost always attains a small enough SSE of the ACF for an excellent fit when m > 5, with only a few exceptions. However, the rule "the larger the better" does not hold when choosing the "best" m, since a large m may also increase the variability of the estimator. There is therefore no universal "best" m for all cases; a different "best" m exists for each case, balancing estimation accuracy against variability.
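As a small illustration, the qualifying steps can be read off programmatically; here sse_myw is an assumed 1-by-20 vector of the average SSEs of the ACF for m = 1 to 20:

SSE_B  = 0.5e-3;                  % the baseline used in the text
best_m = find(sse_myw < SSE_B);   % all forward steps whose SSE falls below the baseline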
4.2 Estimation Accuracy for the Fractional ARIMA Model
The simulations in Section 4.1 consider a simple AR(2) model, obtained from the ARMA(p, q) model by setting p = 2 and q = 0. This model does not account for the long memory characteristic, under which the ACF of the process decays very slowly, as is often the case for more complicated real-world data. To evaluate the estimation accuracy of the Multistep Yule-Walker method further, we also try the fractional ARIMA (ARFIMA) model, the best known of the stationary, invertible, long memory processes. The fractional ARIMA(p, d, q) process has been widely used in fields such as astronomy, hydrology, mathematics and computer science to represent time series with the long memory property, so proper care should be taken with the long-range persistence present in the data.
For a time series {Xt}, define the differenced series µt = (1 − B)^d Xt, where B is the backshift operator (BXt = Xt−1) and d is an integer; for d = 1 this gives µt = Xt − Xt−1. If µt follows an ARMA(p, q) process, we call {Xt} an ARIMA(p, d, q) process. A fractional ARIMA(p, d, q) model generalizes the ARIMA model by allowing a noninteger d varying in the interval (−0.5, 0.5). In a fractional ARIMA(p, d, q) model the differencing order d is thus fractional, and for 0 < d < 0.5 the ACF has long-range dependence, decaying very slowly (hyperbolically rather than exponentially) over time.
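For instance, an ARFIMA(0, d, 0) series can be generated approximately by truncating the MA(∞) representation of the fractional difference, as in the sketch below; the truncation length and burn-in are assumptions of this illustration, and the Farima function in the Appendix, parameterized by the Hurst exponent H = d + 0.5, serves the same purpose:

% Approximate ARFIMA(0,d,0): x(t) = sum_j psi_j*e(t-j), with weights
% psi_j = Gamma(j+d)/(Gamma(d)*Gamma(j+1)); the truncation is an approximation.
d = 0.2; n = 1000;
j = 1:(2*n-1);
psi = cumprod([1, (j - 1 + d) ./ j]);   % psi_0 = 1, psi_j = psi_{j-1}*(j-1+d)/j
e = randn(2*n, 1);
x = filter(psi, 1, e);
x = x(n+1:end);                         % discard burn-in; x is the long memory series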
In this study, long memory series are generated from the fractional ARIMA(p, d, q) model with d = 0.2. The innovations εt are assumed independently and identically normally distributed as N(0, 1). Any AR(p) model is a "wrong" model for this long memory series. First, we assume a "wrong" AR(2) model for the generated series and estimate the parameters of the AR model with the Yule-Walker method and the Multistep Yule-Walker method. The differences of the SSE of the ACFs for the two methods are presented for sample sizes n = 200, 500, 1000 and 2000 in Figure 4.7 below:
Figure 4.7: Difference of SSE of ACF with n=200, 500, 1000 and 2000 for p=2, d=0.2
[Four panels plot the difference SSE_YW − SSE_MYW against the forward step m = 1 to 20, one panel for each sample size.]
As before, the Multistep Yule-Walker method clearly performs better. To eliminate estimation bias arising from the choice of order p, three other models are also considered: AR(1), AR(3) and AR(4), for sample sizes n = 500, 1000 and 2000.
Figure 4.9: Difference of SSE of ACF for n=500 with p=1
Figure 4.10: Difference of SSE of ACF for n=500 with p=2
Figure 4.11: Difference of SSE of ACF for n=500 with p=3
Figure 4.12: Difference of SSE of ACF for n=500 with p=4
Figure 4.13: Difference of SSE of ACF for n=1000 with p=1
Figure 4.14: Difference of SSE of ACF for n=1000 with p=2
Figure 4.15: Difference of SSE of ACF for n=1000 with p=3
Figure 4.16: Difference of SSE of ACF for n=1000 with p=4
Figure 4.17: Difference of SSE of ACF for n=2000 with p=1
Figure 4.18: Difference of SSE of ACF for n=2000 with p=2
Figure 4.19: Difference of SSE of ACF for n=2000 with p=3
Figure 4.20: Difference of SSE of ACF for n=2000 with p=4
With almost every line in the above plots lying above zero, the MYW method improves the estimation accuracy for all sample sizes and all four values of p. Among them, the best performance, with a relatively large difference between the SSEs of the ACFs for the two methods, is found when p = 1. So among the four "wrong" models, AR(1) estimated by the Multistep Yule-Walker method gives a better fit to the original process, under the ACF matching criterion, than the Yule-Walker method.
Chapter 5
Real Data Application
5.1 Data Source
The interest of a piece of work lies in whether it can explain and motivate its methodology with real data. In this work, an effort was made to apply the modified method to real data sets.

Since 1973, when the floating exchange rate system was implemented, world leaders, policy makers, economic researchers and financial specialists have paid serious attention to the volatility of foreign exchange rates. Disputes over whether the increased volatility of exchange rates may have a negative impact on international trade, and over what can be done to curb currency speculation, arose from a series of financial crises in Mexico, Russia and Asia. It is therefore of great importance to fit exchange rate data with a good model that gives proper predictions. It is well known that the probability densities of changes in foreign exchange rates
generally have fat tails compared with the normal distribution, and that the volatility shows long-lasting autocorrelation. The linear AR model is frequently used to fit real exchange rate series because it suffices to reflect these characteristics and has some predictive ability over the long run. The advantages of fitting the AR model to real exchange rates, both theoretical and empirical, can be found in many recent research papers. Under the AR model assumption for the exchange rate data, the Multistep Yule-Walker method proposed in this study can be used to estimate the parameters of the AR model for daily exchange rate series. The estimation performance of the univariate time series representation of the daily USD/JPY real exchange rate is compared using data for the period 2001-2004. The ultimate test of usefulness is the estimation accuracy in terms of the sum of squared errors (SSE) of the ACFs. We compare the SSE of the ACFs of the model fitted by Yule-Walker estimation with that fitted by Multistep Yule-Walker estimation, to check which method generates a series that better matches the observed data.
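A sketch of this comparison, where rate is assumed to hold the 1005 daily USD/JPY observations and nlag is an illustrative lag count (armaacf and YuleWalker are the Appendix functions):

% Real-data version of the ACF-matching comparison (illustrative names).
p = 2; m = 10; nlag = 20;
acf_data = autocorr(rate, nlag);            % sample ACF of the observed series
phi_yw   = YuleWalker(rate, p, 1);          % ordinary Yule-Walker
phi_myw  = YuleWalker(rate, p, m);          % Multistep Yule-Walker
sse_yw   = sum((armaacf(phi_yw,  [], nlag) - acf_data).^2);
sse_myw  = sum((armaacf(phi_myw, [], nlag) - acf_data).^2);
dif      = sse_yw - sse_myw;                % dif > 0 favors the MYW method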
5.2 Numerical Results
We use 1005 observations of the daily USD/JPY real exchange rate for the period 2001-2004. Five AR(p) models, with p = 1, 2, 3, 4 and 5, are fitted to the series. Both the Yule-Walker method and the Multistep Yule-Walker method are used to estimate the parameters of each assumed model, and the SSEs of the ACFs for the two methods are compared. The results are shown in the plots below:
Figure 5.2: Difference between SSE of ACF for the two methods with p=1
Figure 5.3: SSE of ACF for the MYW method with p=1
Figure 5.4: Difference between SSE of ACF for the two methods with p=2
Figure 5.5: SSE of ACF for the MYW method with p=2
Figure 5.6: Difference between SSE of ACF for the two methods with p=3
Figure 5.7: SSE of ACF for the MYW method with p=3
Figure 5.8: Difference between SSE of ACF for the two methods with p=4
Figure 5.9: SSE of ACF for the MYW method with p=4
Figure 5.10: Difference between SSE of ACF for the two methods with p=5
Figure 5.11: SSE of ACF for the MYW method with p=5
[Each figure plots its quantity against the forward step m = 1 to 20.]
Five "wrong" models have been tried here: AR(1), AR(2), AR(3), AR(4) and AR(5). For the models AR(1), AR(3) and AR(4), the SSE of the ACFs for the Multistep Yule-Walker method is close to zero, which indicates an excellent fit to the original series for m > 2, and the difference between the SSEs of the ACFs for the two methods is relatively large for m > 1. So the Multistep Yule-Walker method performs better in these three models. For the model AR(2), the improvement in estimation accuracy of the Multistep Yule-Walker method starts from m = 5, and for AR(5) it starts from m = 10.

Our results indicate that the exchange rate series generated by the AR model with parameters estimated by the Multistep Yule-Walker method has a very small SSE of the ACFs. A better fit is given by the three assumed models AR(1), AR(3) and AR(4). Overall, in all five cases, the Multistep Yule-Walker method outperforms the Yule-Walker method, with the line representing the difference between
the SSEs of the ACFs for the two methods lying above zero. We can therefore conclude that the Multistep Yule-Walker method achieves fairly accurate estimation for the foreign exchange market and can be used for prediction. The model applied above to the USD/JPY exchange rate could easily be applied to other exchange rates as well, without much alteration of the program.
Chapter 6
Conclusion and Future Research
In this study, a modification of the Yule-Walker method is introduced to fit a "wrong" AR model; it involves a higher-order system of p + m linear equations for the estimation of the p autoregression parameters. The Yule-Walker method, which uses the sample ACF to fit an autoregressive (AR) model to time series data, yields a strong distortion in finite samples. This study attempted to reduce the bias generated by the old Yule-Walker method by adding more ACF lags. Monte Carlo simulations are presented to support the analysis. Better estimation performance is obtained by increasing the number of equations in the Yule-Walker system. It is shown that the new Multistep Yule-Walker method improves the parameter estimation for finite samples, and the accuracy generally grows when more than one equation is added to the Yule-Walker system. The Multistep Yule-Walker method and the Yule-Walker method are compared in terms of the sum of squared error of
the ACFs. The new method gives a good trade-off between estimation accuracy and computational complexity. In further study, the difference in estimation accuracy between the Yule-Walker method and the Multistep Yule-Walker method could be examined both theoretically and empirically, and new adaptations could be explored to use the method more effectively and achieve further performance improvements. Attention could also be paid to other factors that affect the performance of the Multistep Yule-Walker method. Finally, other, more thorough performance evaluation approaches for finite samples could be used for a more reasonable comparison of the estimation accuracy of the method.
Bibliography

[1] A. M. Walker (1962), Large sample estimation of parameters for autoregressive processes with moving average residuals. Biometrika, 49, 117-131.

[2] B. Friedlander (1982), A recursive maximum likelihood for ARMA spectral estimation. IEEE Transactions on Information Theory, IT-28, 4, 639-646.

[3] B. Friedlander (1983), Instrumental variable methods for ARMA spectral estimation. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-31, 2 (Apr. 1983), 404-415.

[4] B. Friedlander (1983), The asymptotic performance of the modified Yule-Walker estimator. In Proceedings of the 2nd ASSP Workshop on Spectral Estimation, Tampa, FL, Nov. 1983, pp. 22-26.

[5] B. Friedlander (1983), Instrumental variable methods for ARMA spectral estimation. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-31, 2 (Apr. 1983), 404-415.

[6] B. Friedlander and B. Porat (1984), The modified Yule-Walker method of ARMA spectral estimation. IEEE Transactions on Aerospace and Electronic Systems, AES-20, pp. 158-173.

[7] B. Friedlander (1984), The overdetermined recursive instrumental variable method. IEEE Transactions on Automatic Control, AC-29, pp. 353-356.

[8] B. Friedlander and K. C. Sharman (1985), Performance evaluation of the modified Yule-Walker estimator. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-33, pp. 719-725.

[9] B. Porat (1983), ARMA spectral estimation based on partial autocorrelations. Circuits, Systems and Signal Processing, 2, no. 3, pp. 341-360.

[10] E. Wensink and W. J. Dijkhof (2003), On finite sample statistics for Yule-Walker estimates. IEEE Transactions on Information Theory, 49, pp. 509-516.

[11] J. A. Cadzow, Spectral estimation: An overdetermined rational model equation approach. Proceedings of the IEEE, 70, pp. 907-939.

[12] J. A. Cadzow, ARMA modeling of time series. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-3, pp. 124-128.

[13] J. A. Cadzow (1982), Spectral estimation: An overdetermined rational model equation approach. Proceedings of the IEEE, 70, 9 (Sept. 1982), 907-939.

[14] L. P. Hansen (1982), Large sample properties of generalized method of moments estimators. Econometrica, 50, pp. 1029-1054.

[15] M. Kaveh and S. P. Bruzzone (1983), Statistical efficiency of correlation-based methods for ARMA spectral estimation. IEE Proceedings, 130, part F, pp. 211-217.

[16] M. Kaveh and S. P. Bruzzone (1983), Statistical efficiency of correlation-based methods for ARMA spectral estimation. IEE Proceedings, 130, pt. F, 3 (Apr. 1983), 211-217.

[17] M. Pagano (1974), Estimation of models of autoregressive signal plus white noise. Annals of Statistics, 2, no. 1, pp. 99-108.

[18] M. Pagano (1974), Estimation of models of the autoregressive signal plus noise. Annals of Statistics, 2, 1, 99-108.

[19] P. M. T. Broersen (2007), Historical misconceptions in autocorrelation estimation. IEEE Transactions on Instrumentation and Measurement, 56, no. 4, pp. 1189-1197.

[20] P. M. T. Broersen (2008), Finite-sample bias in the Yule-Walker method of autoregressive estimation. In IEEE International Instrumentation and Measurement Technology Conference, Victoria, Vancouver Island, Canada.

[21] P. Shaman and R. A. Stine (1988), The bias of autoregressive coefficient estimators. Journal of the American Statistical Association, 83, pp. 842-848.

[22] P. Stoica and T. Söderström (1983), Optimal instrumental variable estimation and approximate implementations. IEEE Transactions on Automatic Control, AC-28, pp. 757-772.

[23] P. Stoica (1983), Generalized Yule-Walker equations and testing the orders of multivariate time series. International Journal of Control, 37, no. 5, pp. 1159-1166.

[24] P. Stoica, T. Söderström and B. Friedlander (1985), Optimal instrumental variable estimates of the AR parameters of an ARMA process. IEEE Transactions on Automatic Control, AC-30, no. 11, pp. 1066-1074.

[25] P. Stoica, B. Friedlander and T. Söderström (1984), Optimal instrumental variable multistep algorithms for estimation of the AR parameters of an ARMA process. Syst. Contr. Technol., Palo Alto, CA, Tech. Rep. 5498-04, May 1984; also in Proc. 24th IEEE Conference on Decision and Control, Fort Lauderdale, FL, Dec. 11-13, 1985.

[26] P. Stoica, B. Friedlander and T. Söderström (1986), Least-squares, Yule-Walker, and overdetermined Yule-Walker estimation of AR parameters: a Monte Carlo analysis of finite-sample properties. International Journal of Control, 43, no. 1, pp. 13-27.

[27] R. S. Tsay (2005), Analysis of Financial Time Series, 2nd edition. John Wiley & Sons, Hoboken, New Jersey.

[28] S. M. Kay (1980), A new ARMA spectral estimator. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-28, no. 5, 585-588.

[29] S. Kay and J. Makhoul (1983), On the statistics of the estimated reflection coefficients of an autoregressive process. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-31, pp. 1447-1455.

[30] T. W. Anderson (1971), The Statistical Analysis of Time Series. New York: Wiley.

[31] T. Kailath, A. Vieira and M. Morf (1978), Inverses of Toeplitz operators, innovations, and orthogonal polynomials. SIAM Review, 20, pp. 106-110.

[32] T. Söderström and P. Stoica (1981), Comparison of some instrumental variable methods: consistency and accuracy aspects. Automatica, 17, no. 1, pp. 101-115.

[33] W. W. S. Wei (1990), Time Series Analysis. Addison-Wesley, Redwood City, CA.

[34] Y. C. Xia and H. Tong (2010), Feature matching in time series modelling. Manuscript #472, Department of Statistics and Actuarial Science, University of Hong Kong.

[35] Y. T. Chan and R. P. Langford (1982), Spectral estimation via the high-order Yule-Walker equations. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-30 (Oct. 1982), pp. 689-698.
Appendix

Related MATLAB code
function acvf = armaacvf(phi, theta, n)
% ARMAACVF(PHI,THETA,N) computes the ACVF out to lag N of the ARMA model with
%   given coefficients in (PHI,THETA), assuming sigma_a^2 = 1.
% phi, theta must be column vectors (px1 and qx1, respectively).
% Returns acvf = [gamma_0 gamma_1 ... gamma_n]'.
[p, m] = size(phi);
[q, m] = size(theta);
phi1 = [1; -phi];
if q > p
    phi1 = [phi1; zeros(q-p,1)];
end
theta1 = [1; -theta];
if p > q
    theta1 = [theta1; zeros(p-q,1)];
end
m = 1 + max(p,q);
% Find gamma_0, ..., gamma_{m-1} by solving a linear system.
% Set up the matrix.
R = zeros(m,m);
T = toeplitz(1:m);
for i = 1:m
    for j = 1:m
        R(i,T(i,j)) = R(i,T(i,j)) + phi1(j);
    end
end
% Set up the right-hand side.
psi1 = 1;
for i = 2:m
    psi1 = [psi1; theta1(i) - psi1(1:(i-1))'*phi1(i:-1:2)];
end
rhs = zeros(m,1);
for i = 1:m
    rhs(i) = theta1(i:m)'*psi1(1:(m-i+1));
end
acvf = R\rhs;
if m > n
    acvf = acvf(1:(n+1));
else
    % Extend recursively: gamma_i = phi' * [gamma_{i-1}; ...; gamma_{i-p}].
    for i = m:n
        temp = phi'*acvf(i:-1:(i-p+1));
        acvf = [acvf; temp];
    end
end
function acf = armaacf(phi, theta, n)
% ARMAACF(PHI,THETA,N) computes the ACF out to lag N of the ARMA model with
%   given coefficients in (PHI,THETA).
% phi, theta must be column vectors (px1 and qx1, respectively).
acvf = armaacvf(phi, theta, n);
acf = acvf/acvf(1);
function theta = YuleWalker(y, p, M)
% YULEWALKER(Y,P,M) estimates the AR(p) coefficients from the sample ACF.
% M >= 1 is the forward step: the system uses the ACF at lags 1,...,p+M-1,
% so M = 1 gives the ordinary Yule-Walker equations.
r = autocorr(y, p+M+1);
r = r/r(1);
y = r(2:p+M);             % right-hand side: ACF at lags 1,...,p+M-1
x = zeros(1,p);
x(1,:) = r(1:p);
for i = 2:p+M-1
    xi = [r(i) x(i-1,:)];
    x(i,:) = xi(1:p);
end
theta = (x'*x)\(x'*y);    % least squares solution of the (overdetermined) system
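A hypothetical usage example for the function above (the simulated data and the choice M = 10 are illustrative):

% Simulate the AR(2) model (4.1) and apply both estimators.
e = randn(500, 1);
y = filter(1, [1 -0.9 0.87], e);
theta_yw  = YuleWalker(y, 2, 1);    % ordinary Yule-Walker (M = 1)
theta_myw = YuleWalker(y, 2, 10);   % Multistep Yule-Walker (M = 10)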
function x = Farima(N, H)
% Input:
%   N: signal length 2^n.
%   H: Hurst parameter, 0.5 [...]
and for forecasting 2.3 Autoregressive (AR) Model This study focuses on one specific type of time series model: the autoregressive (AR) model The AR(p) model was developed by Box and Jenkins in 1970 (Box, 1994) As mentioned above, AR (p) model is a linear regression relationship of the current value of the series against past values of the series The value of p is called the order of the AR model, which... expectation of the multiple values and normalize it (Box and Jenkins, 1976) However, some previous research has been done to show that in some occasions the Yule- Walker estimation method leads to poor parameter estimates with large bias even for moderately sized data samples In our study, we propose an improved method on the Yule- Walker method which is to increase the equation numbers in the Yule- Walker .. .MULTISTEP YULE- WALKER ESTIMATION OF AUTOREGRESSIVE MODELS YOU TINGYAN (B.Sc Nanjing Normal University) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE DEPARTMENT OF STATISTICS... well-known Yule- Walker method and our new multistep Yule- Walker method based on the autocorrelation function (ACF) is made The effect of different number of Yule- Walker equations on the estimation. .. distribution, confidence interval, etc CHAPTER MULTISTEP YULE- WALKER ESTIMATION METHOD 29 Chapter Multistep Yule- Walker Estimation Method When introducing the Yule- Walker Method, we can find its computational