Chapter 1 Introduction
This first chapter offers a general description of the short term load forecasting (STLF) problem and its significance for the power industry. The two main approaches to STLF – the statistical approach and the artificial neural network approach – are then introduced and detailed, followed by the motivation for this thesis and its contributions. Finally, a bibliographic review of STLF methods from these two disciplines is given, and the structure of the thesis is explained.
1.1 Load Forecasting
Load forecasting has always been an issue of major interest for the electricity industry. During the operation of a power system, the system response closely follows the load requirements: when the load demand increases or decreases, the power generation has to be adjusted accordingly. To provide this on-demand power generation, the electric utility operator needs to have a sufficient quantity of generation resources available. Thus, if the operator has some a priori knowledge of the future load requirements, the generation resources can be allocated optimally.
There are three kinds of load forecasting: short term, medium term, and long term. Utility operators need to perform all three forecasts, as they influence different aspects of the power supply chain. Short term load forecasts typically cover one hour to one week ahead, and are needed for the daily operation of the power system. Medium term forecasts typically cover one week to one year ahead, and are needed for fuel supply planning and maintenance. Long term load forecasts usually cover a period longer than a year, and are needed for power system planning.
1.2 Importance of Short Term Load Forecasting
Short term load forecasting (STLF) is the keystone of the operation of today’s
power systems. Without access to good short term forecasts, it would be impossible for any electric utility to operate in an economical, reliable and secure manner.
STLF provides the input data for load flow studies and contingency analysis.
Utilities need to perform these studies to calculate the generating requirements of each
generator in the system, to determine the line flows, to determine the bus voltages, and to
ensure that the system continues to operate reliably even in the case of contingencies such
as loss of a generator or of a line. STLF is also used by the utility engineers in other offline network studies, such as preparing a list of corrective actions for different types of
expected faults. Such corrective actions may include load shedding, switching off
interconnections and forming islands, starting up of peaking units or increasing the
spinning and standby reserves of the system [1]. Thus, the STLF is used by the system
operators and regulatory agencies to ensure the safe and reliable operation of the system,
and by the producers to ensure the optimal utilization of generators and power stations.
With the advent of deregulation and the rise of competitive electricity markets,
STLF has also become important for market operators, transmission owners and other
market participants [2]. As an accurate electricity price forecast is not possible without an
accurate load forecast, hence the operational plans and bidding strategies of the market
players require STLF as well. Forecast errors will have negative implications for the
company profits, and eventually for shareholder value.
1.3 Approaches to Short Term Load Forecasting
STLF methods, and more generally, time series prediction (TSP) methods can be
broadly divided into two categories: statistical methods and computational intelligence
(CI) methods.
1.3.1 Statistical Methods
1.3.1.1 Time Series Models
Modern statistical methods for time series prediction can be said to have begun in 1927, when Yule came up with an autoregressive technique to predict the annual number of sunspots. According to this model, the next-step value is a weighted average of previous observations of the series. To model more interesting behavior from this linear system, outside intervention in the form of noise was introduced. For the next half-century, the reigning paradigm for predicting any time series remained that of a linear model driven by noise. The popular models developed during this period include moving average models, exponential smoothing methods, and the Box-Jenkins approach to modeling autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. These models, referred to together as time series models, assume that the data follows a stationary pattern, i.e. the series is normally distributed with a constant mean and variance over a long time period. They also assume that the series has uncorrelated random errors, and that no outliers are present.
Applied to load forecasting, time series methods provide satisfactory results as long as the variables affecting the load demand, such as environmental variables, do not change suddenly. Whenever there is an abrupt change in such variables, the accuracy of the time series models suffers. Also, the assumption of stationarity of the load series is rather restrictive, and whenever the historical load data deviates significantly from this assumption, the forecasting accuracy decreases.
1.3.1.2 Regression Models
Regression methods are another popular tool for load forecasting. Here the load is
modeled as a linear combination of relevant variables such as weather conditions and day
type. Temperature is usually the most important factor for load forecasting among weather
variables, though its importance depends upon the kind of forecast and the type of climate.
For example, for STLF, temperature effects might be more critical for tropical regions
than temperate ones. Typically temperature is modeled in a nonlinear fashion. Other
weather variables such as wind velocity, humidity and cloud cover can be included in the
regression model to obtain higher accuracy. Clearly, no two utilities are the same, and a
detailed case study analysis of the different geographical, meteorological, and social
factors affecting the load demand needs to be carried out before proceeding with the
regression methods. Once the variables have been determined, the coefficients of these
variables can be estimated using least squares or other regression methods.
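To make the estimation step concrete, here is a small hedged sketch that fits such a regression by ordinary least squares using numpy. The synthetic data, the quadratic temperature term and the weekend dummy are illustrative assumptions, not a model from this thesis.

```python
import numpy as np

# Synthetic hourly data (illustrative only): temperature and a weekend flag.
rng = np.random.default_rng(0)
n = 24 * 365
temp = 15 + 10 * np.sin(2 * np.pi * np.arange(n) / (24 * 365)) + rng.normal(0, 2, n)
weekend = (np.arange(n) // 24 % 7 >= 5).astype(float)
load = 500 + 4 * (temp - 18) ** 2 - 60 * weekend + rng.normal(0, 20, n)

# Design matrix: intercept, temperature (nonlinear via a squared term), day type.
X = np.column_stack([np.ones(n), temp, temp ** 2, weekend])

# Least-squares estimate of the regression coefficients.
coef, *_ = np.linalg.lstsq(X, load, rcond=None)
print("estimated coefficients:", coef)
print("in-sample RMSE:", np.sqrt(np.mean((X @ coef - load) ** 2)))
```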
Though regression methods are popular tools for STLF among electric utilities, they
have their share of drawbacks. The relationship between the load demand and the
influencing factors is a nonlinear and complex one, and developing an accurate model is a
challenge. From on-site tests, it has been seen that the performance of regression methods
deteriorates when the weather changes abruptly, leading to load deviation [3]. This
drawback occurs in particular because the model is linearized so as to obtain its
coefficients. But the load patterns are nonlinear; hence a linearized model fails to
represent the load demand accurately during certain distinct time periods.
1.3.1.3 Kalman Filtering Based Models
Towards the end of the 1980s, as computers became more powerful, it became possible
to record longer time series and apply more complex algorithms to them. Drawing on
ideas from differential topology and dynamical systems, it became possible to represent a time series as being generated by deterministic governing equations. Kalman filtering techniques characterize such dynamical systems by a state-space representation. The
theory of Kalman filtering provides an efficient computational (recursive) means to
estimate the state of a process, in a way that minimizes the mean of the squared error. The
filter supports estimation of the past, present and even future states, and it can do so even
when the precise nature of the modeled system is unknown [4]. A significant challenge in
the use of Kalman filtering based methods is the estimation of the state-space model
parameters.
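To make the recursion concrete, here is a minimal sketch of a scalar Kalman filter tracking a slowly drifting load level. The state-space matrices and noise variances are illustrative assumptions, not an STLF model from the literature.

```python
import numpy as np

def kalman_filter(y, a=1.0, c=1.0, q=1e-2, r=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, y_t = c*x_t + v_t,
    with process noise variance q and measurement noise variance r."""
    x, p = y[0], 1.0          # initial state estimate and its variance
    estimates = []
    for obs in y:
        # Predict step: propagate the state and its uncertainty.
        x_pred, p_pred = a * x, a * p * a + q
        # Update step: blend prediction and observation via the Kalman gain.
        k = p_pred * c / (c * p_pred * c + r)
        x = x_pred + k * (obs - c * x_pred)
        p = (1 - k * c) * p_pred
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(1)
true_level = np.cumsum(rng.normal(0, 0.1, 200)) + 50
observed = true_level + rng.normal(0, 1.0, 200)
print(kalman_filter(observed)[-5:])  # filtered estimates of the last 5 hours
```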
1.3.1.4 Non-linear Time Series Models
To overcome the limitations of the linear time series models, a second generation of
non-linear statistical time series models has been developed. Some of these models, such as the autoregressive conditional heteroskedastic (ARCH) and generalized autoregressive conditional heteroskedastic (GARCH) models, attempt to model the variance of the time series as a function of its past values. These models have achieved only limited success for STLF, since they are mostly specialized for particular problems in particular domains, for example volatility clustering in financial indices.
Regime-switching models, first developed in econometrics, are gradually being applied successfully to STLF as well. As the name suggests, these models involve switching between a finite number of linear regimes. The models differ only in their assumptions about the stochastic process generating the regime:
i. The mixture of normal distributions model has state transition probabilities which are independent of the history of the regime. Compared to a single normal distribution, this approach is better able to model fatter-than-normal tails and skewness [5].

ii. In the Markov-switching model, the switching between two or more regimes is governed by a discrete-state homogeneous Markov chain [6]. In one possible formulation, the model can be divided into two parts: first, a regression model which regresses the model variable on hidden state variables, and second, an autoregressive model which describes the hidden state variables.

iii. In the threshold autoregressive (TAR) model [7][8], the switching between two or more linear autoregressive models is governed by an observable variable, called the threshold variable. In the case where this threshold variable is a lagged value of the time series, the model is called a self-exciting threshold autoregressive (SETAR) model.

iv. In the smooth transition autoregressive (STAR) model, the switching is governed by an observable threshold variable, similar to the TAR model, but a smooth transition between the regimes is enforced.
As a few of these non-linear time series models form the basis of the hybrid models proposed in this work, they are explained in detail in Chapter 2.
1.3.2 Computational Intelligence Methods
The deregulated markets and the constant need to improve the accuracy of load forecasting have forced electricity utility operators to focus much attention on computational intelligence based forecasting methods. It has been calculated in [9] that a reduction of 1% in forecasting error could save up to $1.6 million annually for a utility.
Computational intelligence techniques broadly fall into four classes – expert
systems, fuzzy logic systems, neural networks and evolutionary computation systems. A
brief introduction to these four approaches is provided.
1.3.2.1 Expert Systems
An expert system is a computer program which simulates the judgment and behavior
of a human or an organization that has expert knowledge and experience in a particular
field. Typically an expert system would comprise four parts: a knowledge base, a data
base, an inference mechanism, and a user interface. For STLF, the knowledge base is
typically a set of rules represented in the IF-THEN form, and can consist of relationships
between the changes in the load demand and changes in factors which affect the use of
electricity. The data base is typically a collection of facts provided by the human experts
after interviewing them, and also facts obtained using the inference mechanism of the
system. The inference mechanism is the “thinking” part of the expert system, because it
makes the logical decisions using the knowledge from the knowledge base and
information from the data base. Forward chaining and backward chaining are two popular
reasoning mechanisms used by the inference mechanism [10].
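As a toy illustration of forward chaining, the sketch below repeatedly fires IF-THEN rules against a fact base until no new facts can be derived. The rules and facts are invented for illustration and are far simpler than the load-forecasting rule bases discussed here.

```python
# Minimal forward-chaining sketch: each rule maps a set of required facts
# to a conclusion; firing continues until the fact base stops growing.
rules = [
    ({"heat_wave", "weekday"}, "demand_spike_expected"),
    ({"demand_spike_expected"}, "raise_spinning_reserve"),
]
facts = {"heat_wave", "weekday"}

changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)   # fire the rule, derive a new fact
            changed = True
print(facts)
```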
In terms of advantages, expert systems can be used to make decisions when human experts are unavailable, thus reducing the work burden of the human experts. When human experts retire, their knowledge can still be retained in these systems.
1.3.2.2 Fuzzy Logic Systems
Fuzzy systems are knowledge-based software environments which are constructed from a collection of linguistic IF-THEN rules, and realize a nonlinear mapping with the interesting mathematical properties of “low-order interpolation” and “universal function approximation”. These systems facilitate the design of reasoning mechanisms for partially known, nonlinear and complex processes.
A fuzzy logic system comprises four parts – a fuzzifier, a fuzzy inference engine, a fuzzy rule base and a defuzzifier. The system takes a crisp input value, which is fuzzified (i.e. converted into the corresponding membership grades in the input fuzzy sets) and then fed to the fuzzy inference engine. Using the stored IF-THEN fuzzy rules from the rule base, the inference engine produces a fuzzy output, which undergoes defuzzification to yield a crisp output.
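A minimal sketch of this fuzzify–infer–defuzzify pipeline is given below, using triangular membership functions and centroid defuzzification; the temperature/load rules and all numbers are illustrative assumptions only.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b over [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def forecast_adjustment(temp):
    # Fuzzify: membership of the crisp temperature in two input sets.
    mu_hot, mu_mild = tri(temp, 25, 35, 45), tri(temp, 10, 20, 30)
    # Infer: IF hot THEN large increase; IF mild THEN small increase.
    y = np.linspace(0, 200, 401)                  # output universe (MW)
    agg = np.maximum(np.minimum(mu_hot, tri(y, 100, 150, 200)),
                     np.minimum(mu_mild, tri(y, 0, 30, 60)))
    # Defuzzify: centroid of the aggregated output fuzzy set.
    return np.sum(y * agg) / np.sum(agg)

print(forecast_adjustment(32.0))  # crisp load adjustment in MW
```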
Fuzzy logic is often combined with other computational intelligence methods such
as expert systems and neural networks.
1.3.2.3 Artificial Neural Networks (ANN)
Artificial neural networks are massively parallel, distributed processing systems built by analogy to the human neural network – the fundamental information processing system. Generally speaking, the practical use of neural networks has been recognized mainly because of such distinguishing features as:

i. general nonlinear mapping between a subset of the past time series values and the future time series values;

ii. the capability of capturing essential functional relationships among the data, which is valuable when such relationships are not known a priori or are very difficult to describe mathematically, and/or when the collected observation data are corrupted by noise;

iii. universal function approximation capability, which enables modeling of an arbitrary nonlinear continuous function to any degree of accuracy;

iv. the capability of learning and generalizing from examples using a data-driven self-adaptive approach [11].
In fact, there are several kinds of ANN models. Every neural network model can be classified by its architecture, processing and training. The architecture describes the neural connections. Processing describes how the network produces an output for every input and weight. The training algorithm describes how the neural network adapts its weights for every training vector.
The multilayer perceptron (MLP) is one of the most researched network architectures. It is a supervised learning neural architecture, and it has been very popular for time series prediction in general, and STLF in particular. This is because, in its simplest form, a TSP problem can be rewritten as a supervised learning problem, with the current and past values of the time series as the input values to the network, and the one-step-ahead value as the output value. This formulation allows one to exploit the universal function approximation and generalization capabilities of the MLP. The radial basis function (RBF) network is another popular supervised learning architecture which can be used for the same purposes.
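The sketch below shows this rewriting of one-step-ahead prediction as supervised learning: a sliding window over the series becomes the input matrix and the next value becomes the target. The window length of 24 is an arbitrary illustrative choice.

```python
import numpy as np

def make_supervised(series, window=24):
    """Turn a 1-D series into (inputs, targets) for one-step-ahead learning:
    X[i] holds `window` consecutive past values, y[i] is the next value."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

load = np.sin(np.arange(1000) * 2 * np.pi / 24) + 5  # stand-in hourly load
X, y = make_supervised(load, window=24)
print(X.shape, y.shape)  # (976, 24) (976,) — ready for any regressor, e.g. an MLP
```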
The Self-Organizing Map (SOM) is an important unsupervised learning neural
architecture, based on the unsupervised competitive-cooperative learning paradigm.
In contrast to the supervised learning methods, the SOM has not been popular for time series prediction or STLF. This is mostly because the SOM is traditionally viewed as a data vector quantization and clustering algorithm [12][13], less suitable for function approximation by itself. Hence, when used for TSP, the SOM usually appears in a hybrid model, where the SOM is first used for clustering, and subsequently another function approximation method such as an MLP or support vector regression (SVR) is used to learn the function.
As the MLP and SOM form the basis of the work proposed in this thesis, they are
reviewed in greater detail in Chapter 3.
1.3.2.4 Evolutionary Approach
The algorithms developed under the common term of evolutionary computation are inspired by the study of the evolutionary behavior of biological processes. They are mainly based on the selection of a population of possible initial solutions to a given problem. Through stepwise processing of the population using evolutionary operators such as crossover, recombination, selection and mutation, the fitness of the population steadily improves.
Consider how a genetic algorithm might be applied to load forecasting. First an
appropriate model (either linear or nonlinear) is selected and an initial population of
candidate solutions is created. A candidate solution is produced by randomly choosing a
set of parameter values for the selected forecasting model. Each solution is then ranked
based on its prediction error over a set of training data. A new population of solutions is
generated by selecting fitter solutions and applying a crossover or mutation operation.
New populations are created until the fittest solution has a sufficiently small prediction
error or repeated generations produce no reduction of error.
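A minimal sketch of this loop is shown below, evolving the two coefficients of an assumed AR(2) forecasting model against squared prediction error; the population size, mutation scale and synthetic series are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
series = np.sin(np.arange(300) / 5.0) + rng.normal(0, 0.05, 300)

def error(params):
    """Prediction error of an AR(2) model y_t = b1*y_{t-1} + b2*y_{t-2}."""
    b1, b2 = params
    pred = b1 * series[1:-1] + b2 * series[:-2]
    return np.mean((series[2:] - pred) ** 2)

pop = rng.normal(0, 1, (30, 2))                 # initial random candidates
for generation in range(100):
    fitness = np.array([error(p) for p in pop])
    parents = pop[np.argsort(fitness)[:10]]     # select the fittest solutions
    children = parents[rng.integers(0, 10, 20)] + rng.normal(0, 0.1, (20, 2))
    pop = np.vstack([parents, children])        # next population: elite + mutants
best = pop[np.argmin([error(p) for p in pop])]
print("best AR(2) coefficients:", best, "error:", error(best))
```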
1.3.3 Hybrid Approaches
Hybrid models have been proposed to overcome the inadequacies of using an individual model, whether a statistical method or a computational intelligence method. Also referred to as ensemble methods or combined methods, these models are usually employed to improve the prediction accuracy. There is still an absence of good theory in this field on how to proceed with hybridization, though trends are emerging. Broadly speaking, hybrid methods can combine two linear models, two nonlinear models, or one linear and one nonlinear model.
In linear hybridization, two or more linear statistical models are combined. Though some work has been done in this direction, as discussed in the literature review, the field never really took off, because a linear hybrid model still suffers from many of the problems of its linear components.
The most heavily researched hybrid models are those involving two nonlinear models, especially two computational intelligence models. This is because the three popular CI models – ANNs, fuzzy logic and evolutionary computation – have their own capabilities and restrictions, which are usually complementary to each other. For example, the black-box modeling approach of neural networks might be well suited for process modeling or for intelligent control, but is not that suitable for decision control. Similarly, fuzzy logic systems can easily handle imprecise data and explain their decisions in linguistic form in the context of the available facts; however, they cannot automatically acquire the linguistic rules needed to make these decisions. It is these capabilities and restrictions of the individual intelligent technologies that have driven their fusion into hybrid intelligent systems, which have been successfully applied to various complex problems, including STLF.
The third class of hybrid models, which this thesis is about, involves one statistical method and one computational intelligence method. Usually the CI method is a neural network, chosen for its flexibility and powerful pattern recognition capabilities. But when developed as a predictive model, a neural network is difficult to interpret due to its black-box nature, and it becomes hard to test its parameters for statistical significance. Hence, time series models, linear ones such as ARMA or ARIMA, or nonlinear ones such as STAR, are introduced into the hybrid model to address the concern of interpretability.
1.4 Motivation
Though a comfortable state of performance has been achieved for electricity load forecasting, market players will always bring in new dynamic bidding strategies, which, coupled with price-dependent load, will introduce new variability and non-stationarity into the electricity load demand series. Besides, stricter power quality requirements and the development of distributed energy resources are other reasons why the modern power system will always require more advanced and more accurate load forecasting tools.
Consider why a SOM based hybrid model is an appealing option. Though every
possible approach has been applied for STLF, the more popular ones are the time series
approaches and computational intelligence approaches of feed-forward neural networks.
An extensive literature review is done in Section 1.6. Both these approaches attempt to
build a single global model to describe the load dynamics. The difference between time
series approaches and supervised learning neural networks is that while time series
approaches build an exact model of the dynamics (“hard computing”), the supervised
learning neural networks allow some tolerance for imprecision and uncertainty to achieve
tractability and robustness (“soft computing”). However, there is an exciting alternative to
building a global model, which is to build local models for the series dynamics, where
each local model handles a smaller section of the series dynamics. This is definitely an
area which needs further study, because a time series such as the load demand series exhibits various stylized facts, discussed further in Chapter 4. The complexity of a global model increases greatly if it is to handle all the stylized facts. Working with multiple local models
might bring down the complexity. On the other hand, the challenges faced in working with local models are manifold. Firstly, what factors should decide the division of the series dynamics into local models? Secondly, how do we combine the results from multiple local models to give the final prediction value?
In this thesis, SOM based hybrid models are proposed to explore the above-mentioned idea of local models. As mentioned earlier, SOMs have traditionally seen less application to STLF, mostly because of the prevalent attitude among researchers that SOMs are an unsupervised learning method, suitable only for data vector quantization and clustering [12][13]. But this same clustering property makes SOMs an excellent tool for building local models.
Another motivation for this thesis is to further explore the idea of transitions
between local models. Once the local models have been built, how does the transition
from one model to another take place? Is it a sudden jump, where a local model M1 was
being used to describe the series on a particular day and a different local model M2 is
being used for the next day? After analyzing the electricity load demand series, it was found that regimes were present in the series, due to seasonal effects and market effects, and that the transition between these regimes was smooth. A sudden jump from one local model to another might therefore not be the best approach. Hence this thesis studies the NCSTAR model in Chapter 6, which allows smooth transitions between local models. The idea is to obtain highly accurate learning and prediction not only for test samples which clearly belong to a particular local model, but also for test samples which represent the transition from one local model to another.
Earlier researchers have proposed working with local models for STLF in different ways, and to attain different aims. For example, in [14], the same wavelet-based neural network is trained four times over different periods of the year to handle the four seasons. But this paper does not consider the transitions between the local models (i.e. the seasons) to be smooth. Not much work has been done on enforcing smooth transitions between regimes or local models for STLF. After an extensive literature review (please see Section 1.6), the only paper which was found to handle smooth transitions between local models for electricity load forecasting is [30]. So definitely more study has to be done on how to identify local models, how to implement smooth transitions between local models, and how introducing the smooth transition will affect the prediction accuracy of the overall model for STLF. This is exactly what this thesis sets out to do.
1.5 Contribution of the Thesis
In this work, two SOM based hybrid models are proposed for STLF.
In the first model, a load forecasting technique is proposed which uses a weighted SOM for splitting the past historical data into clusters. For the standard SOM, all the inputs to the neural network are equally weighted. This is a drawback compared to supervised learning methods, which have procedures to adjust their network weights, e.g. the back-propagation method for MLPs and the pseudo-inverse method for RBFs. Hence, a strategy is proposed which weights the inputs according to their correlation with the output. Once the training with the weighted SOM is complete, the time series has been divided into smaller clusters, one cluster for each neuron. Next, a local linear model is built for each of these clusters using an autoregressive model, which helps to smooth the results.
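The sketch below illustrates one plausible reading of this weighting scheme: each lagged input is scaled by the absolute correlation between that lag and the one-step-ahead target before the vectors are handed to a SOM. It is a simplified stand-in, not the exact procedure of Chapter 5.

```python
import numpy as np

def autocorr_weights(series, window):
    """Weight for lag j = |correlation between input lag j and target y_t|."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(window)])

rng = np.random.default_rng(3)
load = np.sin(np.arange(2000) * 2 * np.pi / 24) + rng.normal(0, 0.3, 2000)
w = autocorr_weights(load, window=24)
X = np.array([load[i:i + 24] for i in range(len(load) - 24)])
X_weighted = X * w   # inputs scaled by relevance before SOM training
print(np.round(w, 2))
```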
In the second hybrid model, the aim is to allow for smooth transitions between the
local models. Here the model of interest is a linear model with time varying coefficients
which are the outputs of a single hidden layer feedforward neural network. The hidden
layer is responsible for partitioning the input space into multiple sub-spaces through
multivariate thresholds and smooth transition between the sub-spaces. Significant research
has already been done into the specification, estimation and evaluation of this model. In
this thesis, a new SOM-based method is proposed to smartly initialize the weights of the
hidden layer before the network training. First, a SOM network is applied to split the
historical data dynamics into clusters. Then the Ho-Kashyap algorithm is used to obtain
the equations of the hyperplanes separating the clusters. These hyperplanes' equations are
then used to smartly initialize the weights and biases of the hidden layer of the network.
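As a rough sketch of the Ho-Kashyap step under simplifying assumptions (two clusters, a linearly separable toy set), the code below iterates the classical updates a = Y⁺b, b ← b + ρ(e + |e|) to recover a separating hyperplane. It illustrates the classical algorithm, not the thesis implementation.

```python
import numpy as np

def ho_kashyap(A, B, rho=0.1, iters=500):
    """Find w, w0 with w.x + w0 > 0 on cluster A and < 0 on cluster B."""
    # Stack samples with a bias column; negate cluster B so that Y @ a > 0
    # for all rows iff the hyperplane separates the two clusters.
    Y = np.vstack([np.hstack([A, np.ones((len(A), 1))]),
                   -np.hstack([B, np.ones((len(B), 1))])])
    b = np.ones(len(Y))                # positive margins, updated iteratively
    a = np.linalg.pinv(Y) @ b
    for _ in range(iters):
        e = Y @ a - b                  # error against the current margins
        b = b + rho * (e + np.abs(e))  # only increase margins (keep b > 0)
        a = np.linalg.pinv(Y) @ b
    return a                           # [w, w0]: hyperplane coefficients

rng = np.random.default_rng(4)
A = rng.normal([0, 0], 0.5, (50, 2))
B = rng.normal([3, 3], 0.5, (50, 2))
print(ho_kashyap(A, B))  # usable to initialize a hidden unit's weights and bias
```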
1.6 Literature Survey
The two approaches to STLF, and to TSP in general – statistical methods and CI methods – have already been discussed above, and their different sub-categories have been introduced. Some of the approaches described, such as non-linear time series models, SOMs, and MLPs, are more relevant to the work in this thesis than others. What follows is a bibliographical survey of methods for STLF, with more emphasis given to the methods relevant to the work done in this thesis.
1.6.1 Statistical Methods
In the field of linear approaches to time series, the Box-Jenkins methodology is the most popular approach to handling ARMA and ARIMA models, and consists of model identification and selection, parameter estimation and model checking. The Box-Jenkins methodology is among the oldest methods applied to STLF. It was proposed in [15], and further developed in [16]. With a more modern perspective, [17] is an influential text on nonlinear time series models, including several of those described in Section 1.3.1.4. ARMA and ARIMA models continue to be very popular for STLF.
In [18], the load demand is modeled as the sum of two terms, the first depending on the time of day and the normal weather pattern for that day, and the second being a residual term which models the random disturbances using an ARMA model. Usually the Box-Jenkins models assume Gaussian noise; the ARMA modeling method proposed in [19] allows for non-Gaussian noise as well. Other works which use the Box-Jenkins method for STLF are [20][21].
In [22], a periodic autoregression model is used to develop 24 seasonal equations,
using the last 48 load values within each equation. The motivation is that by following a
seasonal-modeling approach, it is possible to incorporate a priori information concerning
the seasonalities at several levels (daily, weekly, yearly, etc.) by appropriately choosing
the model structure and estimation method. In [23], an ARMAX model is proposed for
STLF, where the X represents an exogenous variable, temperature in this case. This is actually a hybrid model, as it uses a computational intelligence method, particle swarm optimization, to determine the order of the model as well as its coefficients, instead of the traditional Box-Jenkins approach.
An ARIMA model uses differencing to handle the non-stationarity of the series, and
then uses ARMA to handle the resulting stationary series. In [24], six methods are
compared for STLF, and ARIMA is found to be a suitable benchmark. In [25], a modified ARIMA model is proposed. This model takes as input not only past loads, but also the estimates of past loads provided by human experts. Thus this model, in a sense, incorporates the knowledge of experienced human operators. This method is shown to be superior to both ANN and ARIMA.
Now consider the previous work in STLF on regime-switching models, i.e. the non-linear statistical time series models discussed earlier in Section 1.3.1.4. The threshold autoregressive (TAR) model was proposed by [7] and [8]. In [26], a TAR model with multiple thresholds is developed for load forecasting. This model chooses the optimum number of thresholds as the one which minimizes the sum of threshold variances.
A generalization of the TAR model is the smooth transition autoregressive (STAR)
model, which was initially proposed in [27], and further developed in [28] and [29]. A
modified STAR model for load forecasting is proposed in [30] where temperature plays
the role of threshold variable. This method uses periodic autoregressive models to
represent the linear regime, as they better capture the fact that the autocorrelation at a
particular lag of one half-hour varies across the week. Such switching regime models have
also been proposed for electricity price forecasting [31] [32].
1.6.2 Computational Intelligence Methods
Four CI methods were introduced earlier in Section 1.3.2, but the following
literature review focuses mostly on neural networks, as these are the most popular
amongst the four for STLF, and also the most relevant to the work done in this thesis.
There are several kinds of ANN models, classified by their architecture, processing
and training. For STLF, the popular ones have been used, e.g. radial basis function
networks [33][34], self-organizing maps [35] and recurrent neural networks [36][37].
However, the most popular network architecture is the multi-layer perceptron described in Section 1.3.2.3, as its structure lends itself naturally to unknown function approximation. In
[38], a fully connected three-layer feedforward ANN is implemented with the backpropagation learning rule, the input variables being historical hourly load data, day of the week and temperature. In [3], a multi-layered feedforward ANN is developed
which takes three types of variables as inputs - season related inputs, weather related
inputs, and historical loads. In [39], electricity price is also considered as a main
characteristic of the load. Other recent works involving the MLP for STLF include [40][41][34][42].
In [43], in order to reduce the neural network structure and learning time, a one-hour-ahead load forecasting method is proposed which uses the correction of similar day
data. In this proposed prediction method, the forecasted load power is obtained by adding
a correction to the selected similar day data. In [44], weather ensemble predictions are
used for STLF. A weather ensemble prediction consists of multiple scenarios for a
weather variable. These scenarios are used to produce multiple scenarios for load
forecasts. In [45], the network committee technique, a technique from neural network architecture, is applied to improve the accuracy of forecasting the next-day peak
load.
1.6.3 Hybrid Methods
Hybrid models combining statistical models and neural networks are rare for STLF,
though they have been proposed for other TSP fields. In [46], a hybrid ARIMA/ANN
model is proposed. Because of the complexity of a moving trend as well as a cyclic
seasonal variation, an adaptive ARIMA model is first used to forecast the monthly load
and then the forecast load of the ARIMA model is used as an additive input to the ANN.
The prediction accuracy of this approach is shown to be better than traditional methods of
time series models and regression methods. In [47], a recurrent neural network is trained
by features extracted from ARIMA analyses, and used for predicting the mid-term price
trend of the Taiwan stock exchange weighted index. In [48], again an ARIMA model and
neural network model are combined to forecast time series of reliability data with growth
trend, and the results are shown to be better than either of the component models. In [49],
seasonal ARIMA (SARIMA) model and the neural network MLP are combined to
forecast time series with seasonality.
It was mentioned earlier in Section 1.3.2.3 that a neural network can be implemented for both supervised and unsupervised learning. But unsupervised
learning architectures, such as SOMs, have traditionally been used for data vector quantization and clustering. Hence, as noted in Section 1.3.2.3, when used for TSP the SOM usually appears in a hybrid model: the SOM is first used for clustering, and subsequently another function approximation method such as an MLP or support vector regression (SVR) is used to learn the function.
to learn the function. In [50][51][52], a two-stage adaptive hybrid network is proposed. In
the first stage, a SOM network is applied to cluster the input data into several subsets in an
unsupervised manner. In the next stage, support vector machines (SVMs) are used to fit
the training data of each subset in a supervised manner. In [53], profiling is done through
SOMs, followed by prediction through radial function networks. In [54], the first SOM
module is used to forecast normal and abnormal days, and the second MLP module is able
to make the load model sensitive weather factors such as temperature.
As was mentioned in Section 1.3.3, the most heavily researched hybrid models for
TSP in general involve those where both the component models are computational
intelligence methods. In [55], a real-time pricing type scenario is envisioned where energy
prices change on an hourly basis, and the consumer is able to react to those price signals
by changing his load demand. In [56], attention is paid to special days. An ANN provides
the forecast scaled load curve and fuzzy inference models give the forecast maximum and
minimum loads of the special day. Similarly, significant work has also been done on
hybridizing evolutionary algorithms with neural networks. In [57], a genetic algorithm is
used to tune the parameters of a neural network which is used for STLF. A similar
approach is presented in [58]. In [59], a fuzzy neural network is combined with a chaos-search genetic algorithm and simulated annealing, and is found to be able to exploit all the
original methods' advantages. Similarly, particle swarm optimization is a recent CI
approach which has been hybridized with other CI approaches such as neural networks
[60][61] and support vector machines [62] to successfully improve the prediction accuracy
for STLF.
1.7 Structure of the Thesis
The thesis consists of the following chapters.
In this first chapter, short term load forecasting was introduced. The two approaches
to short term load forecasting, statistical approach and computational intelligence based
approach, were introduced, and their hybrid methods were discussed. Relevant work from
past research was presented. Finally the motivation for this thesis, and its contributions
were presented.
In the second chapter, statistical methods for time series analysis are briefly
discussed. These include the more traditional Box-Jenkins methodology, Holt-Winters
exponential smoothing, and the more recent regime-switching models.
In the third chapter, two popular neural network models, multilayer perceptron for
supervised learning and self-organizing maps for unsupervised learning are described. The
architecture, the learning rule and relevant issues are presented.
In the fourth chapter, the stylized facts of the load demand series are presented. It is
necessary to understand the unique properties of the load demand series before any
attempt is made to model them.
In the fifth chapter, the first hybrid model is presented. First it is explained how an
unsupervised model such as a self-organizing map can be used for time series prediction.
Then the hybrid model, involving autocorrelation-weighted inputs to the self-organizing map and an autoregressive model, is explained, along with the motivation for weighting with autocorrelation coefficients.
In the sixth chapter, the second hybrid model is proposed to overcome certain issues
with the first proposed model. The need for smooth transitions between regimes in the
load series is highlighted. The contribution of this chapter, a novel method to smartly initialize the weights of the hidden layer of the neural network model NCSTAR, is presented.
The final chapter concludes this thesis with some directions for future work.
Chapter 2 Statistical Models for Time Series Analysis
In this chapter, the classical tools for time series prediction are reviewed, and recent developments in
nonlinear modeling are detailed. First, the commonly used Box-Jenkins approach to time series analysis is
described. Then, another commonly used classical method, the Holt-Winters exponential smoothing
procedure is explained. Finally, an overview of the more recent regime-switching models is given.
2.1 Box-Jenkins Methodology
ARMA models, as described by the Box-Jenkins methodology, are a very rich class
of possible models. The assumptions for this class of models are that (a) the series is stationary, or can be transformed into a stationary one using a simple transformation such as differencing, and (b) the series follows a linear model.
The original Box-Jenkins modeling procedure involves three iterative stages: model identification, model estimation and model validation. Later work [63] includes a preliminary stage of data preparation and a final stage of forecasting.
• Data preparation can involve several sub-steps. If the variance of the series changes with its level, then a transformation of the data, such as taking logarithms, might be necessary to make it a homoscedastic (constant variance) series. Similarly, it needs to be determined whether the series is stationary, and whether there is any significant seasonality which needs to be modeled. Differencing can be used to handle non-stationarity and to remove seasonality.
• Model identification involves identifying the order of the autoregressive and moving average terms to obtain a good fit to the data. Several graph-based approaches exist, including the autocorrelation function and partial autocorrelation function approaches, and newer model selection tools such as Akaike’s Information Criterion have been developed.
• Model estimation involves finding the values of the model coefficients in order to obtain a good fit to the data. The main approaches are non-linear least squares and maximum likelihood estimation.
• Model validation involves testing the residuals. As the Box-Jenkins models assume that the error term follows a stationary univariate process, the residuals should have nearly the properties of i.i.d. normal random variables. If the assumptions are not satisfied, then a more appropriate model needs to be found; the residual analysis should hopefully provide some clues on how to develop it.
2.1.1 AR Model
An autoregressive model of order p ≥ 1 is defined as

$$X_t = b_1 X_{t-1} + \cdots + b_p X_{t-p} + \varepsilon_t \qquad (2.1)$$

where $\{\varepsilon_t\} \sim N(0, \sigma^2)$ is white noise. This model is written as an AR(p) process. The equation explicitly specifies the linear relationship between the current value and its past values.
2.1.2 MA Model
A moving average model of order q ≥ 1 is defined as

$$X_t = \varepsilon_t + a_1 \varepsilon_{t-1} + \cdots + a_q \varepsilon_{t-q} \qquad (2.2)$$

where $\{\varepsilon_t\} \sim N(0, \sigma^2)$ is white noise. This model is written as an MA(q) process. For h < q, there is a correlation between $X_t$ and $X_{t-h}$, because they depend on the same error terms $\varepsilon_{t-j}$.
2.1.3 ARMA Model
Combining the AR and MA forms gives the popular autoregressive moving average (ARMA) model, defined as

$$X_t = b_1 X_{t-1} + \cdots + b_p X_{t-p} + \varepsilon_t + a_1 \varepsilon_{t-1} + \cdots + a_q \varepsilon_{t-q} \qquad (2.3)$$

where $\{\varepsilon_t\} \sim N(0, \sigma^2)$ is white noise and (p, q) are the orders of the AR and MA parts. ARMA models are a popular choice for approximating various stationary processes.
2.1.4 ARIMA Model
An autoregressive integrated moving average (ARIMA) model is a generalization of an ARMA model. A time series which needs to be differenced to be made stationary is said to be an “integrated” version of a stationary series. So an ARIMA(p, d, q) process is one where the series needs to be differenced d times to obtain an ARMA(p, q) process. This model, as mentioned in Section 1.6.1, continues to be popular for STLF, and has been used as a benchmark in this work.
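As a hedged illustration of how such a benchmark might be fitted in practice, the sketch below uses the statsmodels ARIMA implementation on a synthetic series; the order (2, 1, 1) and the data are arbitrary choices for demonstration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic non-stationary series: a random walk plus a daily cycle.
rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(0, 1, 500)) + 5 * np.sin(np.arange(500) * 2 * np.pi / 24)

# Fit an ARIMA(p, d, q) model: d=1 differences the series once to make
# it stationary, then an ARMA(2, 1) model is fitted to the differences.
model = ARIMA(y, order=(2, 1, 1)).fit()
print(model.summary().tables[0])
print("first 5 of a 24-step-ahead forecast:", model.forecast(steps=24)[:5])
```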
2.2 Holt Winters Exponential Smoothing Method
2.2.1 Introduction
Exponential smoothing is a procedure whereby the forecast is continually revised in the light of more recent experience. The method assigns exponentially decreasing weights to observations as they get older. It consists of three steps: deciding on the model to use and setting the initial values of the model parameters, updating the estimates of the model parameters, and finally forecasting.
Single exponential smoothing, used for short-range smoothing, assumes that the data
fluctuates around a reasonably stable mean (no trend or seasonality). The double exponential smoothing method is used when the data shows a trend. Finally, the method which is most
interesting for this thesis, triple exponential smoothing, also called Holt-Winters
smoothing, can handle both trend and seasonality.
There are two main Holt-Winters smoothing models, depending on the type of
seasonality – multiplicative seasonal model and additive seasonal model. The difference
between the two is that in the multiplicative case, the size of the seasonal fluctuations
varies, depending on the overall level of the series, whereas in the additive case, the series
shows steady seasonal fluctuations. So an additive seasonal model is appropriate for a
time series when the amplitude of the seasonal pattern is independent of the average level
of the series.
2.2.2 Model Set-up
Consider the case when the series exhibits additive seasonality. In this model, the
assumption is that the time series can be represented by the model

$$y_t = b_1 + b_2 t + S_t + \varepsilon_t \qquad (2.4)$$

where $b_1$ is the base signal, called the permanent component; $b_2$ is a linear trend component, which may be deleted if necessary; $S_t$ is an additive seasonal factor such that, for a season length of $L$ periods, $\sum_{t=1}^{L} S_t = 0$; and $\varepsilon_t$ is the random error component.
2.2.3 Notation Used for the Updating Process
Let the current deseasonalized level of the process at the end of period $T$ be denoted by $\bar{R}_T$. At the end of a time period $t$, let $\bar{R}_t$ be the estimate of the deseasonalized level, $G_t$ the estimate of the trend, and $S_t$ the estimate of the seasonal component.
2.2.4 Procedure for Updating the Estimates of Model Parameters
2.2.4.1 Overall smoothing
$$\bar{R}_t = \alpha (y_t - S_{t-L}) + (1 - \alpha)(\bar{R}_{t-1} + G_{t-1}) \qquad (2.5)$$

where $0 < \alpha < 1$ is a smoothing constant and $S_{t-L}$ is the seasonal factor computed one season ($L$ periods) ago. Subtracting $S_{t-L}$ from $y_t$ deseasonalizes the data, so that only the trend component and the prior value of the permanent component enter into the updating process for $\bar{R}_t$.
2.2.4.2 Smoothing of the trend factor
$$G_t = \beta (\bar{R}_t - \bar{R}_{t-1}) + (1 - \beta) G_{t-1} \qquad (2.6)$$

where $0 < \beta < 1$ is another smoothing constant. The estimate of the trend component is simply the smoothed difference between two successive estimates of the deseasonalized level.
2.2.4.3 Smoothing of the seasonal component
$$S_t = \gamma (y_t - \bar{R}_t) + (1 - \gamma) S_{t-L} \qquad (2.7)$$

where $0 < \gamma < 1$ is the third smoothing constant. The estimate of the seasonal component is a combination of the most recently observed seasonal factor, given by the demand $y_t$ after removing the deseasonalized level estimate $\bar{R}_t$, and the previous best seasonal factor estimate for this time period.
All the parameters of the method, $\alpha$, $\beta$, and $\gamma$, are estimated by minimizing the sum of squared one-step-ahead in-sample errors. The initial smoothed values for the level, trend and seasonal components are estimated by averaging the early observations.
2.2.4.4 Value of forecast
The forecast for the next period is given by

$$\hat{y}_t = \bar{R}_{t-1} + G_{t-1} + S_{t-L} \qquad (2.8)$$

Note that the best available estimate of the seasonal factor for this time period in the season is used, which was last updated $L$ periods ago.
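A compact sketch of the updating equations (2.5)-(2.8) is given below as a plain Python loop; the smoothing constants are fixed by hand rather than estimated, and the crude initialization by averaging follows the description above.

```python
import numpy as np

def holt_winters_additive(y, L, alpha=0.3, beta=0.1, gamma=0.2):
    """One pass of additive Holt-Winters; returns one-step-ahead forecasts."""
    R = np.mean(y[:L])                             # initial level: season 1 mean
    G = (np.mean(y[L:2*L]) - np.mean(y[:L])) / L   # initial trend
    S = list(y[:L] - R)                            # initial seasonal factors
    forecasts = []
    for t in range(L, len(y)):
        forecasts.append(R + G + S[t - L])                          # Eq. (2.8)
        R_new = alpha * (y[t] - S[t - L]) + (1 - alpha) * (R + G)   # Eq. (2.5)
        G = beta * (R_new - R) + (1 - beta) * G                     # Eq. (2.6)
        S.append(gamma * (y[t] - R_new) + (1 - gamma) * S[t - L])   # Eq. (2.7)
        R = R_new
    return np.array(forecasts)

t = np.arange(24 * 28)
demand = 100 + 0.01 * t + 10 * np.sin(2 * np.pi * t / 24)
pred = holt_winters_additive(demand, L=24)
print("MAE:", np.mean(np.abs(pred - demand[24:])))
```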
2.2.5 Exponential smoothing for double seasonality
When dealing with daily load forecasting, the series shows only one significant
seasonality, which is the within-week cycle. Hence the above proposed method can be
satisfactorily applied in that scenario.
But when concerned with hourly load forecasting, there are two seasonalities, the
within-day cycle and the within-week cycle. To handle this double seasonality scenario,
[64] proposes an extension of the classical seasonal Holt-Winters smoothing method.
Using a new formulation where $S_t$ and $T_t$ denote the smoothed level and trend, $D_t$ and $W_t$ are the seasonal indices (intra-day and intra-week), $s_1$ and $s_2$ are the seasonal periodicity lengths for the intra-day and intra-week cycles respectively, $\alpha$, $\gamma$, $\delta$, and $\omega$ are the smoothing parameters, and $\hat{y}_t(k)$ is the $k$-step-ahead forecast made from forecast origin $t$:

$$S_t = \alpha \frac{y_t}{D_{t-s_1} W_{t-s_2}} + (1 - \alpha)(S_{t-1} + T_{t-1}) \qquad (2.9a)$$

$$T_t = \gamma (S_t - S_{t-1}) + (1 - \gamma) T_{t-1} \qquad (2.9b)$$

$$D_t = \delta \frac{y_t}{S_t W_{t-s_2}} + (1 - \delta) D_{t-s_1} \qquad (2.9c)$$

$$W_t = \omega \frac{y_t}{S_t D_{t-s_1}} + (1 - \omega) W_{t-s_2} \qquad (2.9d)$$

$$\hat{y}_t(k) = (S_t + k T_t) D_{t-s_1+k} W_{t-s_2+k} + \phi^k \left( y_t - (S_{t-1} + T_{t-1}) D_{t-s_1} W_{t-s_2} \right) \qquad (2.9e)$$

The multiplicative seasonality formulation has been used here, though it is mentioned in [64] that the additive formulation gives similar results. The term involving $\phi$ is an adjustment for first-order autocorrelation.
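A rough Python sketch of updates (2.9a)-(2.9e), under the simplifying assumptions of hand-picked smoothing parameters, crude seed-by-averaging initialization, and no autocorrelation adjustment (φ = 0), might look as follows.

```python
import numpy as np

def double_seasonal_hw(y, s1=24, s2=168, alpha=0.1, gamma=0.05,
                       delta=0.15, omega=0.15):
    """Multiplicative double seasonal Holt-Winters, one-step-ahead, phi = 0."""
    S, T = float(np.mean(y[:s2])), 0.0
    day = y[:s1] / np.mean(y[:s1])
    D = list(np.tile(day, s2 // s1))          # intra-day indices, crude seed
    W = list(y[:s2] / np.mean(y[:s2]))        # intra-week indices, crude seed
    preds = []
    for t in range(s2, len(y)):
        preds.append((S + T) * D[t - s1] * W[t - s2])     # Eq. (2.9e), k=1, phi=0
        S_new = alpha * y[t] / (D[t - s1] * W[t - s2]) + (1 - alpha) * (S + T)
        T = gamma * (S_new - S) + (1 - gamma) * T                              # (2.9b)
        D.append(delta * y[t] / (S_new * W[t - s2]) + (1 - delta) * D[t - s1])  # (2.9c)
        W.append(omega * y[t] / (S_new * D[t - s1]) + (1 - omega) * W[t - s2])  # (2.9d)
        S = S_new
    return np.array(preds)

t = np.arange(24 * 7 * 8)
y = (100 + 8 * np.sin(2 * np.pi * t / 24)) * (1 + 0.1 * np.sin(2 * np.pi * t / 168))
print("MAPE %:", 100 * np.mean(np.abs(double_seasonal_hw(y) - y[168:]) / y[168:]))
```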
In [65], a comparison of several univariate methods for STLF is presented. Besides
the exponential smoothing for double seasonality described above, the other methods
compared are double seasonal ARIMA model, artificial neural network, and a regression
method with principal component analysis. It is reported that in terms of mean absolute
percentage error (MAPE), the best approach is double seasonal exponential smoothing.
Hence, in this work, the standard Holt-Winters exponential smoothing has been used as a benchmark for daily load forecasting, and the double seasonal exponential smoothing proposed in [64] has been used as a benchmark for hourly load forecasting.
2.3 Nonlinear Models
Regime-switching models were earlier mentioned briefly in Section 1.3.1.4.
Nonlinear models can prove to be better in terms of estimation and forecasting compared
to linear models because of their flexibility in capturing the characteristics of the data. In
this thesis, only the threshold models will be considered.
In order to keep clarity within the various threshold models, a homogeneous notation is used for all the models. Henceforth the following notation will be used:

• $y_t$ is the value of a time series $\{y_t\}$ at time $t$;

• $\tilde{x}_t \in \mathbb{R}^p$ is a $p \times 1$ vector of lagged values of $y_t$ and/or some exogenous variables;

• $x_t \in \mathbb{R}^{p+1}$ is defined as $x_t = [1, \tilde{x}_t^T]^T$, where the first element is referred to as the intercept;

• the general nonlinear model is then expressed as

$$y_t = \Phi(x_t; \psi) + \varepsilon_t \qquad (2.10)$$

where $\Phi(x_t; \psi)$ is a nonlinear function of the variable $x_t$ with parameter vector $\psi$, and $\{\varepsilon_t\}$ is a sequence of independent, normally distributed random variables with zero mean and variance $\sigma^2$;
• the logistic function which is used later on, when defined over the domain $\mathbb{R}^p$, is usually written as

$$f(\gamma(x_t - \beta)) = \frac{1}{1 + \exp(-\gamma(x_t - \beta))} \qquad (2.11a)$$

where $\gamma$, the slope parameter, determines the smoothness of the change between models, i.e. the smoothness of the transition from one regime to another, and $\beta$ can be considered as the threshold which marks the regime switch. In its one-dimensional form, it can be written as

$$f(\gamma(y_{t-d} - c)) = \frac{1}{1 + \exp(-\gamma(y_{t-d} - c))} \qquad (2.11b)$$

where $y_{t-d}$ is usually known as the transition or threshold variable, and $d$ is called the delay parameter.
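The sketch below simply evaluates the one-dimensional logistic transition function (2.11b) for a few slopes, showing how γ controls how sharply the model moves between regimes; the numbers are arbitrary.

```python
import numpy as np

def logistic_transition(y_lag, gamma, c):
    """Eq. (2.11b): smooth 0-to-1 switch around threshold c with slope gamma."""
    return 1.0 / (1.0 + np.exp(-gamma * (y_lag - c)))

y = np.linspace(-2, 4, 7)
for gamma in (0.5, 2.0, 50.0):      # large gamma approaches a TAR step function
    print(f"gamma={gamma:5.1f}:", np.round(logistic_transition(y, gamma, c=1.0), 3))
```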
2.3.1 Threshold Autoregressive Model (TAR)
To overcome the limitations of the linear approach, the threshold autoregressive (TAR) model was proposed, which allows for a locally linear approximation over a number of regimes. It can be formulated as

$$y_t = \sum_{i=1}^{k} \omega_i x_t \, I(s_t \in A_i) + \varepsilon_t \qquad (2.12a)$$

$$= \sum_{i=1}^{k} \{\omega_{i,0} + \omega_{i,1} y_{t-1} + \omega_{i,2} y_{t-2} + \cdots + \omega_{i,p} y_{t-p}\} \, I(s_t \in A_i) + \varepsilon_t \qquad (2.12b)$$

where $s_t$ is the threshold variable, $I$ is an indicator (or step) function, $\omega_i$ is the vector of autoregressive parameters for the $i$th linear regime, and $\{A_i\}$ forms a partition of $(-\infty, \infty)$, with $\bigcup_{i=1}^{k} A_i = (-\infty, \infty)$ and $A_i \cap A_j = \emptyset$ for all $i \neq j$. So basically one of the autoregressive models is activated, depending upon the value of the threshold variable $s_t$ relative to the partition $\{A_i\}$.
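To make this concrete, here is a small sketch of a two-regime SETAR model, where the lagged value itself selects which AR(1) regime generates the next observation; the coefficients and threshold are invented for illustration.

```python
import numpy as np

def simulate_setar(n, c=0.0, d=1, coefs=((0.2, 0.9), (1.0, -0.4)), sigma=0.3):
    """Two-regime SETAR: regime chosen by whether y_{t-d} exceeds threshold c."""
    rng = np.random.default_rng(6)
    y = np.zeros(n)
    for t in range(d, n):
        w0, w1 = coefs[0] if y[t - d] <= c else coefs[1]   # indicator I(s_t in A_i)
        y[t] = w0 + w1 * y[t - 1] + rng.normal(0, sigma)   # active AR(1) regime
    return y

y = simulate_setar(500)
# Regime occupancy: fraction of time the threshold variable was above c.
print("share of observations in regime 2:", np.mean(y[:-1] > 0.0))
```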
2.3.2 Smooth Transition Autoregressive Model (STAR)
If one has good reason to believe that the transitions between the regimes are smooth, and not discontinuous as assumed by the TAR model, then one can choose the smooth transition autoregressive (STAR) model. In this model, the indicator function $I(\cdot)$ changes from a step function to a smooth function, such as the sigmoid of Equation 2.11b. The STAR model with $k$ regimes is defined as

$$y_t = \sum_{i=1}^{k} \omega_i x_t \, F_i(s_t; \gamma_i, c_i) + \varepsilon_t \qquad (2.13)$$
The transition function $F(s_t; \gamma_i, c_i)$ is a continuous function bounded between 0 and 1. The transition variable $s_t$ can be a lagged endogenous variable, that is, $s_t = y_{t-d}$ for a certain integer $d > 0$. But this is not a required assumption, as the transition variable can also be an exogenous variable ($s_t = z_t$), or a (possibly nonlinear) function of lagged endogenous variables, $s_t = h(\tilde{x}_t; \alpha)$, for some function $h$ which depends on the $(p \times 1)$ parameter vector $\alpha$. Finally, a model with smoothly changing parameters is obtained if the transition variable is a linear time trend, i.e. $s_t = t$.
The observable variable st and the associated value of F(st; γi, ci) determine the
regime that occurs at time t. Different types of regime-switching behavior can be obtained
by different choices for the transition function. The first-order logistic function, Equation
2.11b is a popular choice for F(st; γi, ci), and the resultant model is called a logistic STAR
(LSTAR).
In the LSTAR model, the transition function $F_i(s_t; \gamma_i, c_i)$ in Equation 2.13 is defined as

$$F_i(s_t; \gamma_i, c_i) = \begin{cases} 1 - f(s_t; \gamma_i, c_i) & \text{if } i = 1 \\ f(s_t; \gamma_i, c_i) - f(s_t; \gamma_{i+1}, c_{i+1}) & \text{if } 1 < i < k \\ f(s_t; \gamma_i, c_i) & \text{if } i = k \end{cases}$$
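Under the convention just given (and assuming ordered thresholds), the regime weights of a k-regime LSTAR can be computed as in the sketch below; it reuses the logistic function of (2.11b) with invented slopes and thresholds.

```python
import numpy as np

def logistic(s, gamma, c):
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def lstar_weights(s, gammas, cs):
    """Transition functions F_i for a k-regime LSTAR at threshold value s."""
    k = len(gammas)
    f = [logistic(s, gammas[i], cs[i]) for i in range(k)]
    F = [1.0 - f[0]]                                   # i = 1
    F += [f[i] - f[i + 1] for i in range(1, k - 1)]    # 1 < i < k
    F.append(f[k - 1])                                 # i = k
    return np.array(F)

# Three regimes with ordered thresholds; weights shift smoothly with s.
for s in (-1.0, 0.5, 2.5):
    print(s, np.round(lstar_weights(s, gammas=[4, 4, 4], cs=[0.0, 1.0, 2.0]), 3))
```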
Figure 6.4. The histogram of the residual error for New England hourly data
TABLE 6.3: PERCENTAGE OF HOURS WITH A CERTAIN MAPE RANGE FOR NEW ENGLAND DATA
MAPE Range   % of Hours
< 1%         24.2
1% to 2%     20.7
2% to 3%     17.4
3% to 4%     11.7
4% to 5%     8.7
5% to 6%     6.7
6% to 7%     4.0
> 7%         6.6
At the 95% confidence level, the null hypothesis of normality is rejected for the obtained residuals by all three tests: the Kolmogorov-Smirnov test, the Lilliefors test and the Jarque-Bera test. In Figure 6.4, comparing the histogram of the residuals with that of a normal distribution, we can see that the residuals have fat tails. This means that from time to time there are rather large errors which are hard to reconcile with the standard distributional assumption of normality. In a fat-tailed distribution, the probability of very large and very small values is much higher than would be implied by a normal distribution. This explains why the hypothesis of normality is rejected by the tests.
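A hedged sketch of how such tests might be run on a residual vector is shown below, using scipy's Jarque-Bera and Kolmogorov-Smirnov tests and the Lilliefors test from statsmodels; the heavy-tailed synthetic residuals stand in for the model's actual errors.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(7)
residuals = rng.standard_t(df=3, size=2000)   # fat-tailed stand-in residuals

z = (residuals - residuals.mean()) / residuals.std()
print("Kolmogorov-Smirnov p:", stats.kstest(z, "norm").pvalue)
print("Lilliefors p:", lilliefors(residuals)[1])
print("Jarque-Bera p:", stats.jarque_bera(residuals).pvalue)
```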
Consider why the fat tails are present in the error residuals. This thesis hypothesizes that sudden changes in weather, the summer effect and the winter effect, are responsible for the fat tails. The winter months of December and January and the summer months of June and July often include some extremely high demand days, because of a sudden heat wave or a sudden cold wave on those particular days. The NCSTAR model does not consider exogenous weather forecasts in its inputs, because it assumes that weather variables evolve in a smooth fashion and that the load series can sufficiently capture the change. This assumption leads to bigger errors on the days when the weather changes suddenly; hence the fat tails.
To support this hypothesis, consider Figure 6.5. Let $L_k$ and $L_{k+1}$ denote the load demand at consecutive hours $k$ and $k+1$ respectively, so that $\Delta = L_{k+1}/L_k$ denotes the change over the two consecutive hours. Figure 6.5 plots the histogram of $\Delta$ for the four years of data. The histogram appears to be the superposition of three bell-shaped segments. The
biggest bell-shaped segment is in the middle, around ∆ = 1. This corresponds to the
normal weather, when the weather changes smoothly. Then there are two smaller bell-shaped segments, one below and one above the central one. These are created by more-than-normal or less-than-normal changes in demand over two consecutive hours due to weather effects. Hence weather effects lead to fat tails, which in turn lead to the rejection of the hypothesis of normality of the error residuals.
Figure 6.5. Hourly change $\Delta = L_{k+1}/L_k$ for New England hourly data
6.4.1.8 Benchmarking
The benchmark used is the semigroup based system-type neural network (ST-NN)
architecture proposed in [104][105]. In this method, the network is decomposed into two
channels - a semigroup channel and a function channel. The semigroup channel models
the dependency of the load on temperature, whereas the functional channel represents the
fundamental characteristics of daily load cycles.
Table 6.4 compares the two models for the seven days of the week. Clearly NCSTAR outperforms the ST-NN model on each day, and the improvements in accuracy are largest for the weekend days. Looking at the MAPE values for the combined data, the MAPEs are 3.75% and 3.07% for the ST-NN and NCSTAR methods respectively. NCSTAR improves the results by 0.68% over ST-NN, which is a significant improvement. In Table 6.5, the results are compared for the twelve months of the year. While ST-NN and NCSTAR perform rather similarly in the easy-to-predict months of March, April and May, it can be seen that NCSTAR provides a much better result in the months which form the transition between seasons, such as June and September.
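For reference, the error measure used in these comparisons is the mean absolute percentage error; a one-line implementation on illustrative arrays is sketched below.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

print(mape([100, 110, 95], [97, 112, 99]))  # roughly 3.0 (percent)
```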
TABLE 6.4: COMPARISON OF PREDICTION RESULTS FOR NEW ENGLAND FOR EACH DAY OF THE WEEK
Day        ST-NN MAPE (%)  NCSTAR MAPE (%)
Monday     4.01            3.00
Tuesday    3.42            2.73
Wednesday  3.24            2.86
Thursday   3.24            2.96
Friday     3.50            3.23
Saturday   4.61            3.42
Sunday     4.26            3.07
Total      3.75            3.07
TABLE 6.5: COMPARISON OF PREDICTION RESULTS FOR NEW ENGLAND FOR INDIVIDUAL MONTHS
Month      ST-NN MAPE (%)  NCSTAR MAPE (%)
January    2.91            2.83
February   2.70            2.42
March      2.79            2.90
April      3.20            3.23
May        3.10            3.18
June       4.52            2.41
July       3.97            3.79
August     5.28            5.00
September  6.10            3.23
October    2.85            2.42
November   3.19            2.10
December   4.40            3.25
6.4.1.9 Confidence Interval of the Predicted Value
We have already discussed how to deliver a single point forecast. However, the
decision-maker needs additional information bounding possible future values of the
forecasted process, in order to assess the uncertainties involved in prediction. This can be
done by looking at the confidence interval of the predicted values. Confidence intervals
should be as narrow as possible, while encompassing enough of the true values to justify the reliability of the model.
Figure 6.6 shows the actual load demand and the 95% confidence interval for the
predicted load demand for three representative weeks of the year, the first week of
February, July and November respectively. The confidence interval was calculated after
running 1000 simulations of the proposed hybrid model. From the figure, it can be seen
that the width of the confidence interval changes over the day. As a general rule, the
confidence interval is narrow during the non-working hours, such as night-time, and
widens during the working hours. The confidence interval is widest for the periods which are the most difficult to predict; which periods these are depends on factors such as seasonality, as can be seen in the figures for the three weeks of February, July and November.
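One hedged way to obtain such intervals, consistent with the 1000-simulation procedure mentioned above, is to rerun a stochastic forecaster many times and take empirical percentiles of the simulated forecasts, as sketched below with a stand-in noise model.

```python
import numpy as np

rng = np.random.default_rng(8)

def noisy_forecast(horizon=24):
    """Stand-in for one stochastic run of the hybrid model: a daily load
    profile perturbed by random parameter/noise realizations."""
    base = 100 + 20 * np.sin(2 * np.pi * np.arange(horizon) / 24)
    return base + np.cumsum(rng.normal(0, 1.0, horizon))

runs = np.array([noisy_forecast() for _ in range(1000)])   # 1000 simulations
lower, point, upper = np.percentile(runs, [2.5, 50, 97.5], axis=0)
print("hour 18 forecast:", round(point[18], 1),
      "95% interval:", (round(lower[18], 1), round(upper[18], 1)))
```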
Figure 6.6. Actual load demand (on left) and the 95% confidence interval (on right) for the
predicted load demand for three representative weeks of the year, the first week of February, July
and November respectively. The right side figures show the predicted value (thick line) and the
upper and lower bounds of the 95% confidence interval (thin lines)
6.4.2 Alberta Market
6.4.2.1 Data Source and Period
Five years of publicly available hourly load demand data for the Alberta market
[106] in Canada is used. Data from January 1, 2000 to December 31, 2002 is used to train
the network, data from January 1, 2003 to December 31, 2003 is used as a validation set,
whereas data from January 1, 2004 to December 31, 2004 is used to calculate the out-of-sample
accuracy. The model is built using the steps of model identification, model
estimation and model validation as explained earlier in Section 6.2.
6.4.2.2 Benchmark with AESO
The forecast load published by the Alberta Electric System Operator (AESO) is a
reasonable benchmark for the proposed NCSTAR model. For the test period of 2004, the
MAPE of NCSTAR is 1.08% while the MAPE of AESO is 1.26%.
Tables 6.6 and 6.7 show the breakdown of MAPE results over the days of the week and
the months of the year for the period 2004 respectively. Clearly the NCSTAR model has
been able to significantly improve the forecasting results over AESO.
TABLE 6.6: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR ALBERTA FOR EACH DAY OF THE WEEK

            AESO    NCSTAR
Monday      1.44    1.23
Tuesday     1.19    1.11
Wednesday   1.09    1.04
Thursday    1.03    0.93
Friday      1.18    1.06
Saturday    1.29    1.03
Sunday      1.36    1.11
Total       1.26    1.08
TABLE 6.7: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR ALBERTA FOR INDIVIDUAL MONTHS

            AESO    NCSTAR
January     1.26    1.25
February    0.90    1.00
March       1.41    0.87
April       1.11    0.96
May         1.20    0.89
June        1.29    1.06
July        1.61    1.33
August      1.12    1.22
September   1.28    0.93
October     1.02    0.98
November    1.13    0.92
December    1.36    1.44
Looking at the monthly MAPE breakdown for both NCSTAR and AESO, we notice
that the MAPE is high for two periods: the winter months of December and January, and
the summer months of June and July. Both periods correspond to a relatively higher
demand, and they often include some extremely high demand days caused by a sudden
heat wave or cold wave. As the NCSTAR model does not incorporate future weather or
temperature information, the prediction accuracy deteriorates during these periods. The
absence of meteorological variables in the NCSTAR model is nevertheless justified: for
short lead times like one-day-ahead, the meteorological variables usually evolve in a very
smooth fashion, and the load series itself can sufficiently capture the change.
6.4.2.3 Benchmark with PREDICT2-ES, ARIMA and ANN
We also compare our work against a recent model proposed in [107]. This is the
PREDICT2-ES model, a predictor based on non-linear chaotic dynamics, with five
optimal parameters searched through an evolutionary strategy. Comparisons are made for
four weeks of the year 2004, each representing a different season. The results are shown
in Table 6.8. ARIMA refers to the autoregressive integrated moving average methodology
developed by Box and Jenkins [15]. In Table 6.8, ANN refers to artificial neural networks.
The MAPEs shown for ARIMA, ANN and PREDICT2-ES are taken from [107]. AESO
refers to the error of the forecast load published by AESO for the same period. It can be
seen that ARIMA and ANN have worse accuracy than the other three approaches. AESO
and NCSTAR are a slight improvement over the PREDICT2-ES model, and the accuracies
of AESO and NCSTAR are comparable for these studied periods.
TABLE 6.8: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR ALBERTA

Test period               ARIMA   ANN     PREDICT2-ES   AESO    NCSTAR
2/16/2004 - 2/22/2004     1.440   2.130   0.945         0.877   0.877
5/11/2004 - 5/17/2004     1.070   1.100   0.812         0.832   0.840
8/16/2004 - 8/22/2004     2.540   2.130   1.272         0.986   1.195
10/25/2004 - 10/31/2004   1.500   0.820   0.745         0.727   0.652
6.4.2.4 Error Analysis
Here we apply methods from the model validation stage to see whether the
NCSTAR and AESO models are able to model the data adequately. Error residuals are
obtained for both models, and histograms are plotted with a best-fit normal distribution
superimposed on top. The results are shown in Figure 6.7 for (a) the NCSTAR model and
(b) the AESO model. As with the error residuals from the New England market, here too
the null hypothesis of a normal distribution is rejected for both models. This is possibly
due to fat tails, or outliers caused by unexpected weather conditions, as was described in
Section 6.4.1.7 for the New England data. Comparing the two plots, not much can be said,
except that the frequency for the NCSTAR model is higher around the centre compared to
the AESO model, and the histogram appears more symmetric for NCSTAR than for
AESO.
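As a sketch of this validation step, residual normality can be checked as follows; the choice of the D'Agostino-Pearson test (SciPy's normaltest) is an illustrative assumption, since the text does not name the specific test used:

```python
import numpy as np
from scipy import stats

def check_residual_normality(residuals, alpha=0.05):
    """Fit a normal distribution to the residuals and test the null
    hypothesis of normality; fat tails push the p-value towards zero."""
    residuals = np.asarray(residuals, dtype=float)
    mu, sigma = residuals.mean(), residuals.std(ddof=1)  # best-fit normal
    stat, p_value = stats.normaltest(residuals)          # D'Agostino-Pearson
    rejected = p_value < alpha
    return mu, sigma, p_value, rejected
```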
For further comparison between the two models, it is necessary to look at the
distribution of forecasting errors across ranges, which is shown in Table 6.9. The
NCSTAR model has 57.6% of hours with a MAPE of less than 1%, compared to only
50.7% of hours for the AESO model. Thus the NCSTAR model achieves better accuracy
than AESO for a larger number of hours.
Figure 6.7. The histogram of the residual error for Alberta hourly data for (a) NCSTAR
model and (b) AESO model
TABLE 6.9: PERCENTAGE OF HOURS WITHIN EACH MAPE RANGE FOR ALBERTA DATA

MAPE Range   % of Hours, NCSTAR model   % of Hours, AESO model
< 1%         57.6                       50.7
1% to 2%     28.0                       29.8
2% to 3%     9.5                        12.3
3% to 4%     3.5                        4.8
> 4%         1.4                        2.4
6.4.3 New South Wales Market
Here the results for NCSTAR are compared against the results for a Bayesian
Markov chain scheme proposed in [108], on six months of data from the New South
Wales market [109] in Australia. That model takes a Bayesian approach to estimating
multi-equation models for intra-day load forecasting, where a first-order vector
autoregression is used for the errors.
The training period is from January 1, 1998 to December 31, 2000. The test period
is February 1, 2001 to July 31, 2001. The results for the six months of data and the
monthly breakdown are shown in Table 6.10 and Table 6.11 respectively. Clearly, the
NCSTAR model improves the prediction accuracy not only over the complete six months
of data, but also over the individual months.
TABLE 6.10: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR NEW SOUTH WALES

            Mean         Mean        Median       Median
            (Bayesian)   (NCSTAR)    (Bayesian)   (NCSTAR)
Weekdays    3.10         2.17        3.10         1.82
Weekends    3.43         2.31        3.33         2.01
TABLE 6.11: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR NSW FOR INDIVIDUAL MONTHS

            Weekdays             Weekends
            Bayesian   NCSTAR    Bayesian   NCSTAR
February    3.97       2.25      4.00       2.63
March       3.17       2.79      3.44       2.78
April       2.31       2.05      3.06       1.46
May         3.14       2.00      3.74       2.51
June        4.45       2.20      4.53       2.32
July        2.17       1.81      2.33       2.02
6.4.4 England and Wales Market
Three markets have been studied so far to show the superior performance of
NCSTAR over the ISOs and other benchmarks for hourly load forecasting. Now, instead
of hourly load forecasting, we turn to daily load forecasting, which is also an interesting
problem for the power industry. Just as season effects and market effects lead to multiple
regimes in the hourly demand series, the same effects lead to multiple regimes in the daily
demand series. Hence NCSTAR is a suitable model for daily demand series as well.
The data used here is obtained from the National Grid [93], the transmission
company for the England and Wales market. Daily load demand is obtained by summing
the hourly demands of each day, as sketched below. The benchmark is an ARIMA model
proposed in [94]. The test period is the year 2003. While [94] uses data from 1970 to 1998
to identify their ARIMA model, only data from the three preceding years, 2000 to 2002, is
used to obtain the NCSTAR model.
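A one-line illustration of this aggregation, assuming the hourly loads are held in a pandas series indexed by timestamps; the variable and function names are hypothetical:

```python
import pandas as pd

def to_daily(hourly_load: pd.Series) -> pd.Series:
    """Daily load demand as the sum of the hourly demands of each day."""
    return hourly_load.resample("D").sum()
```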
The results are shown in Table 6.12. They are impressive, considering that the
NCSTAR model applied here had no special treatment for holidays, not even averaging.
On the other hand, [94] included a separate model for special days, which was possible
because their longer training period provides enough samples of special days.
TABLE 6.12: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR ENGLAND AND WALES FOR INDIVIDUAL MONTHS OF 2003

            ARIMA   NCSTAR
January     2.91    2.09
February    2.53    1.19
March       1.76    2.50
April       3.02    1.40
May         2.28    2.08
June        1.21    0.89
July        2.22    0.84
August      1.96    1.78
September   1.48    1.12
October     2.20    1.38
November    1.68    1.52
December    3.61    2.22
Total       2.24    1.59
6.4.5 Singapore Market
Finally, the NCSTAR model is applied to the Singapore market data. But first, a
brief look at the seasonality pattern in Singapore is in order.
Seasons, as such, do not exist in Singapore. Singapore has a tropical rainforest
climate with no distinctive seasons: uniform temperature and pressure, high humidity and
abundant rainfall. May and June are the hottest months, while November and December
make up the wetter monsoon season. Now consider the (semi-hourly) electricity demand
for 2006, as shown in Figure 6.8. It can be clearly seen that seasonality effects are not
present here, though one might note that the beginning and end of the year, i.e. December,
January and February, have a slightly lower demand. This could be because of the wetter
monsoon season and/or holiday season effects.
Figure 6.8. Semi-hourly electricity demand in Singapore from 1 Jan 2006 to 31 Dec 2006
For NCSTAR, the training period is the data from January 1, 2005 to December 31,
2007, the validation data is from January 1, 2008 to December 31, 2008, and finally the
testing period is from January 1, 2009 to December 31, 2009. The model is built using the
steps of model identification, model estimation and model validation as explained earlier
in Section 6.2. The load on holidays was replaced by the mean of the load demand one
week earlier and one week later.
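A minimal sketch of this holiday treatment, assuming the series is held in pandas with a datetime index; the function and its interface are assumptions for the sketch:

```python
import pandas as pd

def replace_holidays(load: pd.Series, holidays) -> pd.Series:
    """Replace the load on holiday dates with the mean of the load at the
    same time of day one week before and one week after.

    load     : series indexed by a DatetimeIndex (hourly or semi-hourly)
    holidays : iterable of datetime.date objects
    """
    cleaned = load.copy()
    week = pd.Timedelta(days=7)
    holiday_mask = pd.Index(cleaned.index.date).isin(set(holidays))
    for t in cleaned.index[holiday_mask]:
        before, after = cleaned.get(t - week), cleaned.get(t + week)
        neighbours = [v for v in (before, after) if v is not None and pd.notna(v)]
        if neighbours:  # fall back to a single neighbour at the series edges
            cleaned.loc[t] = sum(neighbours) / len(neighbours)
    return cleaned
```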
The MAPE results, broken down by day of the week and by month, are shown in
Table 6.13 and Table 6.14 respectively.
TABLE 6.13: MAPE PREDICTION RESULTS (%) FOR SINGAPORE FOR EACH DAY OF THE WEEK

            NCSTAR
Monday      1.43
Tuesday     1.50
Wednesday   1.13
Thursday    1.35
Friday      1.88
Saturday    1.54
Sunday      1.15
Total       1.43
TABLE 6.14: MAPE PREDICTION RESULTS (%) FOR SINGAPORE FOR INDIVIDUAL MONTHS

            NCSTAR
January     1.49
February    1.10
March       1.25
April       1.38
May         1.68
June        1.48
July        1.33
August      1.48
September   1.57
October     1.78
November    1.33
December    1.21
Unfortunately, the author was unable to find a suitable benchmark to compare this
work against. A suitable benchmark would have been a paper with at least one year of test
data, preferably the year 2009. The strength of the proposed NCSTAR model lies in its
ability to model smooth transitions between multiple regimes, and seasonality is a
prominent source of multiple regimes in the load demand time series. In order to cover all
the seasons at least once, at least one year of data is needed for testing. Unfortunately, no
such paper was available for benchmarking.
The total MAPE obtained for Singapore is 1.43%. Other papers have earlier
obtained MAPEs significantly lower than this result [110][111]. As mentioned earlier, one
reason why NCSTAR fails to obtain competitive results for the Singapore data is the lack
of multiple regimes due to the lack of seasonality, so the smooth transition mechanism of
the NCSTAR model brings little benefit here.
6.4.6 Speed and Complexity Issues
The proposed NCSTAR model with SOM-based initialization trains more accurately
and faster than the original NCSTAR model proposed in [68], whose weight initialization
involves parallel hyperplanes, as discussed in Section 6.2.6. The reason for this faster
training, as discussed earlier, is that the SOM-based initialization starts the training from a
point close to the global optimum, so the training has less chance of getting stuck in a
local minimum, which is a major concern for the MLP. This is shown in Figure 6.9. For
the Alberta data, the NCSTAR model is implemented twice: once with the proposed
SOM-based initialization, and once with the original weight initialization proposed in
[68]. In both scenarios, training uses the regular back-propagation algorithm [76].
Researchers have devised heuristics such as a momentum term, and second-order
algorithms such as conjugate gradient descent and the Levenberg-Marquardt algorithm, to
speed up learning compared to back-propagation and to help prevent getting stuck in local
minima; but for benchmarking purposes, the regular back-propagation algorithm is used
for both models.
Figure 6.9 shows the fall of the mean square error (MSE) over 1000 iterations for
Monday. The continuous line is the SOM-based initialized version, and the dotted line is
the original NCSTAR initialization proposed in [68]. The MSE falls faster, and deeper, for
the SOM-based initialization than for the original initialization. Hence, the training is both
more accurate and faster.
Figure 6.9. Comparison of training performance of SOM-based initialization (solid line)
and originally proposed initialization (dotted line) for the NCSTAR model
In terms of absolute time, the program is run on a Core 2 Duo 3 GHz processor. A
separate model is built for each hour of the week, i.e. 168 models. Building all the models
takes 84.71 seconds in a typical run, i.e. about 0.50 seconds per hourly model. This is a
significant amount of time, especially when compared to traditional methods such as
ARIMA, which are faster. The model is time consuming because it must first cluster the
data and then train an MLP for each hour of each day; this is necessary to handle the
weekend effect.
Building an hourly model involves three time-consuming steps: the SOM clustering,
the Ho-Kashyap training, and the back-propagation training of the MLP. For a typical
hourly model, these three steps consume 0.27, 0.09 and 0.12 seconds respectively. The
most time consuming step is the first one, i.e. the clustering with the SOM.
Consider the complexity of the NCSTAR model. Usually this is discussed in terms
of the number of input/output variables and the complexity of the learning algorithm. The
NCSTAR model has two components (a sketch of the combined prediction step follows
the list):
• a linear autoregressive model for prediction (Equation 6.1). The input is zt, which
contains lagged values of yt. How many lags, and which ones, to include is
determined by the ACF, as described in the section on model identification. Usually
only three lags are taken: lag 1, lag 24 and lag 168.
• the coefficients of the linear model, which are the output of a single hidden layer
MLP (Equation 6.2). Here the input is xt, which needs to have 24 variables: the
lagged values of the 24 hours of the previous day. It is important to include all 24
lags because the data is clustered on the basis of historical daily load profile
patterns, and a pattern might appear in any section of the day.
Hence, overall there are 28 variables as input to the NCSTAR model, which means the
complexity is fairly high, though in the literature it is not uncommon to find STLF models
with this many or more variables. Also, because the model assumes that weather variables
change smoothly (and that these changes can be captured by the load demand itself), no
exogenous variables have to be included, which keeps the model relatively simple.
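As an illustration of how these two components combine at prediction time, a minimal sketch assuming the functional form of Equations 6.1 and 6.2 (local AR coefficients blended by logistic hidden-layer activations); all names and shapes here are assumptions of this sketch, not the exact implementation:

```python
import numpy as np

def ncstar_forecast(z_t, x_t, theta, omega, beta):
    """One-step NCSTAR forecast (sketch).

    z_t   : (p,)   lagged loads for the AR part, e.g. lags 1, 24 and 168
    x_t   : (q,)   transition variables: the 24 hourly loads of the previous day
    theta : (h, p) one local AR coefficient vector per hidden neuron (regime)
    omega : (h, q) hidden-layer weights (separating-plane normals)
    beta  : (h,)   hidden-layer biases
    """
    g = 1.0 / (1.0 + np.exp(-(omega @ x_t + beta)))  # smooth regime memberships
    blended_coeffs = theta.T @ g                     # (p,) effective AR coefficients
    return float(z_t @ blended_coeffs)
```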
Finally, the learning algorithm is the concentrated least squares method, a modified
back-propagation algorithm that exploits certain linearities in the model to reduce the
dimensionality of the iterative estimation problem, as discussed in Section 6.2.4.
6.5 Limitations of the Proposed Model
The strength of the proposed model lies in its ability to model the smooth transition
between multiple regimes. The model may therefore not be well suited to power markets
where the load pattern does not show multiple regimes. As strong seasonal effects are an
important cause of multiple regimes, the model may not perform well enough for power
markets with no seasonal effects. This can be seen from the results: for the Singapore
data, the results are not as impressive as for the previous three markets of New England,
Alberta and New South Wales, and a likely reason is the lack of strong seasonality in
Singapore.
6.6 Discussion and Conclusion
In this chapter, it is first explained why the NCSTAR model, with its multivariate
thresholds and smooth transitions between regimes, is suitable for short-term load
forecasting: the load demand is highly dependent on seasonal factors, and seasons tend to
change gradually. Next, the inadequacies of the current methods of initializing the weights
of the NCSTAR neural network are highlighted, and the importance of having good initial
network weights is explained. Finally, a two-step method to obtain fairly good initial
weights is proposed. In the first step, unsupervised learning is used to cluster the historical
data into separate regimes. The second step uses the Ho-Kashyap algorithm to find the
equations of the separating planes, which are then used to initialize the weights of the
hidden layer neurons. Experiments on four prominent energy markets show that the
proposed method gives competitive results in terms of prediction accuracy for hourly load
forecasting as well as daily load forecasting.
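As an illustration of the second step, a minimal sketch of the Ho-Kashyap procedure following its standard formulation in pattern-recognition texts [102]; the interface, step size and stopping rule are assumptions of this sketch:

```python
import numpy as np

def ho_kashyap(Y, lr=0.1, n_iter=1000, tol=1e-3):
    """Ho-Kashyap procedure for a separating hyperplane between two clusters.

    Y : (n, d+1) augmented sample matrix, with the samples of the second
        cluster negated, so that a separating weight vector a satisfies Y a > 0.
    Returns the weight vector a, whose components can initialize one hidden neuron.
    """
    n = Y.shape[0]
    b = np.ones(n)                       # margin vector, kept positive
    Y_pinv = np.linalg.pinv(Y)
    a = Y_pinv @ b
    for _ in range(n_iter):
        e = Y @ a - b
        e_plus = 0.5 * (e + np.abs(e))   # only grow b where the error is positive
        b = b + 2.0 * lr * e_plus
        a = Y_pinv @ b
        if np.all(np.abs(e) < tol):      # converged to a separating solution
            break
    return a
```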
A notable advantage of the proposed method is that it can easily handle the presence
of multiple regimes in the electricity load data, which might occur due to seasonal effects
or market effects. This is because the NCSTAR model works with a weighted sum of
several local AR models instead of a single global model for the whole series. This
handling of multiple regimes is a desirable property because, in a deregulated power
market, players will continue to bring in new dynamic bidding strategies which will
introduce more local regimes into the price-dependent load demand series.
The proposed method of weight initialization for the NCSTAR model also makes it
more robust to initial conditions because the first step of the initialization method involves
a SOM, which is generally robust against bad initializations.
In the present model, exogenous variables such as weather factors like temperature
or humidity are not included. This can be justified by the observation that for short lead
times like one-day-ahead, the weather variables evolve in a smooth fashion. However, it is
also noted that the prediction accuracy is worst for the peak winter and peak summer
months, which are most associated with sudden cold waves and heat waves respectively.
Assuming that good weather forecasts are available, it will be interesting to incorporate
weather factors into the model in such a way that they influence the predicted load
demand only if the predicted weather differs significantly from the historically observed
normal weather for the next day. Furthermore, which weather variable to consider will
depend on the characteristics of the energy market being studied: while humidity might be
an important factor for load forecasting in tropical markets, it matters much less in
temperate markets. Future work in this field will have to investigate these characteristics
deeply before weather variables can be incorporated into the model to improve the
prediction accuracy further.
Chapter 7 Conclusion
The purpose of this thesis was to approach the problem of short term load
forecasting using hybrid methods. This work was primarily concerned with SOM-based
hybrid models. The aim was to show that unsupervised learning methods such as SOMs,
which are usually not associated much with time series prediction, can be hybridized
judiciously with other time series models such as a linear AR model or a non-linear STAR
model, to give good results for STLF.
The first hybrid method proposed used SOMs for data vector quantization and
clustering. SOMs were used to divide the historical dynamics into several clusters. Thus,
this method implied working with several local models instead of a single global model.
Each local model handles only a small section of the dynamics. Besides, each local model
can be a simple model, such as the AR model in the proposed hybrid model. The model
was shown to give more accurate results than the popular ARIMA time series model.
In the previous model, the transition from one local model to another was a discrete
event, i.e. the jump from one local model to another was immediate, as a series segment
can belong to only one local model. A better way is to model the transitions as smooth. In
nature, when a regime caused by cold weather changes to a regime caused by mild
weather, the transition is a gradual one; the same can be said of other regime changes,
caused by season effects or market effects. Obviously, the smoothness of the transition
varies. A smooth transition model helps to improve the accuracy during the transition
period. In the second hybrid model, the smooth transition between regimes is modeled.
First, a linear-neural model is described which is capable of handling multiple regimes
and multiple transition variables. Next, a new method is proposed to intelligently
initialize the weights of the hidden layer of the neural network before its training. Finally,
the model is implemented on four electricity markets to show its competitiveness
compared to popular time series models and other recently proposed models.
The unique contribution of this thesis is the in-depth study of smooth transition
models for the area of short-term load forecasting. Though regime-switching models,
specifically smooth transition autoregressive models, have been popular in econometrics
[112][113], a detailed study of smooth transition models has been lacking for the domain
of STLF. As the electricity load demand series has its own stylized characteristics, it is
interesting to see how the smooth transition approach applies in this field. In this thesis, a
model building procedure is developed for the smooth transition approach, involving
identification, estimation and validation stages.
Another unique contribution of this thesis is the new approach to initializing the
weights of the NCSTAR model. This new approach, involving the SOM and the
Ho-Kashyap methodology, was required because the original NCSTAR model proposed a
global search for weight initialization, which is not an ideal approach for initializing the
weights of a multilayer perceptron. Though this new weight initialization method has
been proposed here for STLF, it is a very general method which can easily be extended to
other time series prediction fields as well.
A final unique contribution of this thesis is the idea, explored in the first hybrid
model, that the inputs to the SOM can be weighted in order to achieve improved
forecasting accuracy. Not much work has been done in the field of weighted SOMs. In
this thesis, the weights for the SOM inputs were chosen to be the autocorrelation
coefficients between time lags. For the domain of short-term load forecasting this is
interesting because the autocorrelation varies periodically due to the weekend effect. It
was shown that autocorrelation weighting improves the prediction accuracy for STLF
under certain situations.
In terms of future lines of research, the following paths remain open for future
developments:
• For STLF, the question of whether to include exogenous variables such as
weather-related variables can be addressed. If exogenous weather variables are
included, a study will need to be done comparing the various available weather
variables to find which one contributes most to improving the accuracy. An
appropriate study will also be required to find whether the relationship between
load and weather is linear or not. A good idea would be to weight the effect of the
weather variable such that if the forecast value differs significantly from the
historically observed value, it is given more weight.
• In the first proposed hybrid model, the autocorrelation coefficient is used as a
measure of dependence within a time series. This is just one, and a rather simple
one, amongst many measures of dependence within a time series. Other measures
have been proposed, such as Cohen's kappa, Cramér's v, and Goodman and
Kruskal's lambda, e.g. in [114]. Weighting the SOM inputs with these measures
can be implemented.
• In the first proposed hybrid model, the bigger idea is to build local models instead
of a global model. The SOM is just one amongst many possible tools for
clustering and subsequently building a local model. Other clustering algorithms,
statistical as well as neural network based, can be explored and their performances
compared.
In the second proposed hybrid model, the SOM is used to initialize the weights of
the hidden layer of a neural network. Choosing initial weights continues to be a
challenging problem in other computational intelligence methods, such as particle swarm
optimization, as well. Hybridizing with unsupervised learning methods such as the SOM
for weight initialization is an interesting problem to study.
Bibliography
[1] A. J. Wood and B. F. Wollenberg, Power Generation Operation and Control, John
Wiley & Sons, 1984.
[2] E.A. Feinberg and D. Genethliou, "Load forecasting". In: Applied Mathematics for
Restructured Electric Power Systems: Optimization, Control, and Computational
Intelligence, J.H. Chow et al. (eds.), Springer, 2005.
[3] A. D. Papalexopoulos, S. Hao and T. M. Peng, "An Implementation of a Neural
Network Based Load Forecasting Model for the EMS", IEEE Transactions on Power
Systems, vol. 9, no. 3, pp. 1956-1962, 1994.
[4] G. Welch and G. Bishop, "An Introduction to the Kalman Filter", Technical Report TR 95-041, University of North Carolina, Chapel Hill, NC, USA, 1995.
[5] J. Wang, "Modeling and Generating Daily Changes in Market Variables Using A
Multivariate Mixture of Normal Distributions", Proceedings of the 33rd Conference on
Winter Simulation, pp. 283-289, 2001.
[6] L. E. Baum, T. Petrie, G. Soules and N. Weiss, "A Maximization Technique Occurring
in the Statistical Analysis of Probabilistic Functions of Markov Chains", The Annals of
Mathematical Statistics, vol. 41, no. 1, pp. 164-171, 1970.
[7] H. Tong, "On a Threshold Model", in Pattern Recognition and Signal Processing, C.
H. Chen, Ed. Amsterdam, The Netherlands: Sijthoff and Noordhoff, 1978.
[8] H. Tong and K. S. Lim, "Threshold Autoregression, Limit Cycles and Cyclical Data
(with discussion)", Journal of Royal Statistical Society, ser. B 42, pp. 245-292, 1980.
[9] B. F. Hobbs, S. Jitprapaikulsarn, S. Konda, V. Chankong, K. A. Loparo and D. J.
Maratukulam, "Analysis of the value for unit commitment of improved load forecasting",
IEEE Transactions on Power Systems, vol. 14, no. 4, pp. 1342-1348, 1999.
[10] J. A. Gonzalez, and D. D. Douglas. The Engineering of Knowledge-Based Systems:
Theory and Practice, Englewood Cliffs, NJ: Prentice Hall, 2000.
[11] A. K. Palit and D. Popovic, Computational Intelligence in Time Series Forecasting:
Theory and Engineering Applications, Springer, London, 2005.
[12] R. Xu and D. Wunsch, "Survey of Clustering Algorithms", IEEE Transactions on
Neural Networks, vol. 16, no. 3, pp. 645-678, 2005.
[13] A. Flexer, "On the Use of Self-Organizing Maps for Clustering and Visualization",
Intelligent Data Analysis, vol. 5, no. 5, pp. 373-384, 2001.
[14] N. M. Pindoriya, S. N. Singh and S. K. Singh, “Forecasting of Short-Term Electric
Load Using Application of Wavelets with Feed-Forward Neural Networks”, International
Journal of Emerging Electric Power Systems, vol. 11, no. 1, 2010.
[15] G. E. P. Box, G. M. Jenkins and G. C. Reinsel, Time Series Analysis: Forecasting and
Control, 3rd ed., Prentice Hall, 1994.
[16] T. Anderson, The Statistical Analysis of Time Series, Wiley & Sons, 1971.
[17] H. Tong, Non-linear Time Series: A Dynamical System Approach, Oxford University
Press, Oxford, 1990.
[18] G. Gross and F. D. Galiana, "Short-term load forecasting", Proceedings of the IEEE,
vol. 75, no. 12, pp. 1558-1573, 1987.
[19] S. J. Huang and K. R. Shih, "Short-Term Load Forecasting via ARMA Model
Identification Including Non-Gaussian Process Identification", IEEE Transactions on
Power Systems, vol. 18, no. 2, pp. 673-679, 2003.
[20] J. Y. Fan and J. D. McDonald, "A Real-Time Implementation of Short-Term Load
Forecasting For Distribution Power Systems", IEEE Transactions on Power Systems, vol.
9, no. 2, pp. 988-994, 1994.
[21] M. T. Hagan and S. M. Behr, "The Time Series Approach To Short-Term Load
Forecasting", IEEE Transactions on Power Systems, vol. 2, no. 3, pp. 785-791, 1987.
[22] M. Espinoza, C. Joye and R. Belmans, "Short-Term Load Forecasting, Profile
Identification and Customer Segmentation: A Methodology Based On Periodic Time
Series", IEEE Transactions on Power Systems, vol. 20, no. 3, pp. 1622-1631, 2005.
[23] C. M. Huang, C. J. Huang, and M. L. Wang, "A Particle Swarm Optimization To
Identifying the ARMAX Model for Short-Term Load Forecasting", IEEE Transactions on
Power Systems, vol. 20, no. 2, pp. 1126-1133, 2005.
[24] J. W. Taylor, L. M. de Menezes and P. E. McSharry, "A Comparison of Univariate
Methods for Forecasting Electricity Demand Up to A Day Ahead", International Journal
of Forecasting, vol. 22, pp. 1-16, 2006.
[25] N. Amjady, "Short Term Load Forecasting Using Time-Series Modeling With Peak
Load Estimation Capability", IEEE Transactions on Power Systems, vol. 16, no. 3, pp.
498-505, 2001.
[26] S. R. Huang, "Short-Term Load Forecasting Using Threshold Autoregressive
Models", IEE Proceedings on General Transmission and Distribution, vol. 144, no. 5, pp.
477-481, 1997.
[27] K. S. Chan and H. Tong, "On Estimating Thresholds In Autoregressive Models",
Journal of Time Series Analysis, vol. 7, pp. 179-190, 1986.
[28] R. Luukkonen, P. Saikkonen and T. Terasvirta, "Testing Linearity Against Smooth
Transition Autoregressive Models", Biometrika, vol. 75, pp. 491-499, 1988.
[29] T. Terasvirta, "Specification, Estimation, and Evaluation of Smooth Transition
Autoregressive Models", Journal of American Statistical Association, vol. 89, no. 425, pp.
208-218, 1994.
[30] L. F. Amaral, R. C. Souza and M. Stevenson, "A Smooth Transition Periodic
Autoregressive Model for Short-Term Load Forecasting", International Journal of
Forecasting, vol. 24, no. 4, pp. 603-615, 2004.
[31] A. T. Robinson, "Electricity pool prices: A case study in nonlinear time series",
Applied Economics, vol. 32, no. 5, pp. 527-532, 2000.
[32] M. Stevenson, "Filtering and forecasting electricity prices in the increasingly
deregulated Australian electricity market", International Institute of Forecasters
Conference, pp. 1-31, 2001.
[33] D. K. Ranaweera, N. F. Hubele and A. D. Papalexopoulos, "Application of radial
basis function neural network model for short-term load forecasting", IEE Proceedings of
Generation, Transmission and Distribution, vol. 142, pp. 45-50, 1995.
[34] E. Gonzalez-Romera, M. A. Jaramillo-Moran and D. Carmona-Fernandez, "Monthly
electricity energy demand forecasting based on trend extraction", IEEE Transactions on
Power Systems, vol. 21, no. 4, pp. 1946-1953, 2006.
[35] M. Becalli, M.Cellura, V. Lo Brano and A. Marvuglia, "Forecasting daily urban
electricity load profiles using artificial neural networks", Energy Conversion and
Management, vol. 45, pp. 2879-2900, 2004.
[36] T. Senjyu, P. Mandal, K. Uezato and T. Funabashi, "Next day load curve forecasting
using recurrent neural network structure", IEE Proceedings of Generation, Transmission
and Distribution, vol. 151, no. 3, pp. 388-394, 2004.
[37] C. N. Tran, D. C. Park and W. S. Choi, "Short-term load forecasting using multiscale
bilinear recurrent neural network with an adaptive learning algorithm", In: King, I. et al.
(Eds.), Thirteenth International Conference on Neural Information Processing (ICONIP
2006), LNCS, vol. 4233. Springer, pp. 964-973, 2006.
[38] A. G. Bakirtzis, V. Petridis, S. J. Kiartzis, M. C. Alexiadis and A. H. Maissis, "A
neural network short term load forecasting model for the Greek power system", IEEE
Transactions on Power Systems, vol. 11, no. 2, pp. 858-864, 1996.
[39] H. Chen, C. A. Canizares and A. Singh, "ANN based short-term load forecasting in
electricity markets", Proceedings of the IEEE Power Engineering Society Transmission
and Distribution Conference, vol. 2, pp. 411-415, 2001.
[40] H. S. Hippert, C. E. Pedreira and R. C. Souza, "Neural networks for short-term load
forecasting: A review and evaluation", IEEE Transactions on Power Systems, vol. 16, no.
1, pp. 44-55, 2001.
[41] H. S. Hippert, D. W. Bunn and R. C. Souza, "Large neural networks for electricity
load forecasting: Are they overfitted", International Journal of Forecasting, vol. 21, pp.
425-434, 2005.
[42] J. V. Ringwood, D. Bofelli and F. T. Murray, "Forecasting electricity demand on
short, medium and long time scales using neural networks", Journal of Intelligent and
Robotic Systems, vol. 31, pp. 129-147, 2001.
[43] T. Senjyu, H. Takara, K. Uezato and T. Funabashi, "One hour ahead load forecasting
using neural network", IEEE Transactions on Power Systems, vol. 17, no. 1, pp. 113-119,
2002.
[44] J. W. Taylor and R. Buizza, "Neural network load forecasting with weather ensemble
predictions", IEEE Transactions on Power Systems, vol. 17, pp. 626-632, 2002.
[45] R. E. Abdel-Aal, "Improving electric load forecasts using network committees",
Electric Power Systems Research, vol. 74, pp. 83-94, 2005.
[46] A. A. El Desouky and M. M. Elkateb, "Hybrid adaptive techniques for electric-load
forecast using ANN and ARIMA", IEE Proceedings of Generation, Transmission and
Distribution, vol. 147, no. 4, pp. 213-217, 2000.
[47] J. H. Wang and J. Y. Leu, "Stock market trend prediction using ARIMA-based neural
networks", Proceedings of IEEE International Conference on Neural Networks, vol. 4, pp.
2160-2165, 1996.
[48] C. T. Su, L. I. Tong and C. M. Leou, "Combination of Time series and neural
network for reliability forecasting modeling", Journal of the Chinese Institute of Industrial
Engineers, vol. 14, no. 4, pp. 419-429, 1997.
[49] F. M. Tseng, H. C. Yu and G. H. Tzeng, "Combining neural network model with
seasonal time series ARIMA model", Technological Forecasting and Social Change, vol.
69, no. 1, pp. 71-87, 2001.
[50] S. Fan and L. Chen, "Short-term load forecasting based on an adaptive hybrid
method", IEEE Transactions on Power Systems, vol. 21, no. 1, pp. 392-401, 2006.
[51] M. Martin-Merino and J. Roman, "A New SOM algorithm for electricity load
forecasting", Lecture Notes in Computer Science, Springer-Verlag, pp. 995-1003, 2006.
[52] Z. Bao, D. Pi, and Y. Sun, "Short term load forecasting based on self-organizing map
and support vector machine," Lecture Notes in Computer Science, vol. 3610, pp. 688-691,
2005.
[53] A. Lendasse, M. Cottrell, V. Wertz and M. Verleysen, "Prediction of electric load
using Kohonen maps - Application to the Polish electricity consumption", Proceedings of
the American Control Conference, vol. 5, pp. 3684-3689, 2002.
[54] M. Farhadi and S. M. M. Tafreshi, "Effective model for next day load curve
forecasting based upon combination of perceptron and kohonen ANNs applied to Iran
power network", 29th International Telecommunications Energy Conference, pp. 267-273,
2007.
[55] A. Khotanzad, E. Zhou and H. Elragal, "A Neuro-Fuzzy approach to short-term load
forecasting in a price sensitive environment", IEEE Transactions on Power Systems, vol.
17, no. 4, pp. 1273-1282, 2002.
[56] K. H. Kim, H. S. Youn and Y. C. Kang, "STLF for special days in anomalous load
conditions using neural networks and fuzzy inference method", IEEE Transactions on
Power Systems, vol. 15, pp. 559-565, 2000.
[57] S. H. Ling, F. H. F. Leung, H. K. Lam, Y. S. Lee and P. K. S. Tam, "A novel
genetic-algorithm-based neural network for short-term load forecasting", IEEE Transactions on
Industrial Electronics, vol. 50, no. 4, pp. 793-799, 2003.
[58] Ronaldo R. B. de Aquino, O. N. Neto, Milde M. S. Lira, A. A. Ferreira, K. F. Santos.
"Using Genetic Algorithm to Develop a Neural-Network-Based Load Forecasting",
Lecture Notes in Computer Science, Springer Berlin, pp. 738-747, 2007.
[59] G. C. Liao and T. P. Tsao, "Application of a fuzzy neural network combined with a
chaos genetic algorithm and simulated annealing to short-term load forecasting", IEEE
Transactions on Evolutionary Computation, vol. 10, no. 3, pp. 330-340, 2006.
[60] Z. A. Bashir and M. E. El-Hawary, "Short-term load forecasting using artificial
neural networks based on particle swarm optimization algorithm", Canadian Conference
on Electrical and Computer Engineering, pp. 272-275, 2007.
[61] D. Niu, Z. Gu and M. Xing, "Research on neural networks based on culture particle
swarm optimization and its application in power load forecasting", Third International
Conference on Natural Computation, pp. 270-274, 2007.
[62] J. Wang, Y. Zhou and Y. Chen, "Electricity load forecasting based on support vector
machines and simulated annealing particle swarm optimization algorithm", Proceedings of
the IEEE International Conference on Automation and Logistics, pp. 2836-2840, 2007.
[63] S. Makridakis, S. C. Wheelwright and R. J. Hyndman, Forecasting: Methods and
Applications, 3rd ed., John Wiley and Sons, 1998.
[64] J. W. Taylor, "Short-Term Electricity Demand Forecasting Using Double Seasonal
Exponential Smoothing", Journal of Operational Research Society, vol. 54, pp. 799-805,
2003.
[65] J. W. Taylor and P. E. McSharry, "Short-Term Load Forecasting Methods: An
Evaluation Based on European Data", IEEE Transactions on Power Systems, vol. 22, pp.
2213-2219, 2007.
[66] T. Teräsvirta, M. C. Medeiros and G. Rech, "Building neural network models for
time series: a statistical approach", Journal of Forecasting, vol. 25, no. 1, pp. 49-75, 2006.
[67] A. Veiga and M. Medeiros, "A hybrid linear-neural model for time series
forecasting", Proceedings of the NEURAP, pp. 377-384, 1998.
[68] A. Veiga and M. Medeiros, "A hybrid linear-neural model for time series
forecasting", IEEE Transactions on Neural Networks, vol. 11, no. 6, pp. 1402-1412, Nov
2000.
[69] A. Veiga and M. Medeiros, "A flexible coefficient smooth transition time series
model", IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 97-113, Jan 2005.
[70] T. E. Jin, "Training issues and learning algorithms for feedforward and recurrent
neural networks", Ph. D. thesis, National University of Singapore, Singapore, 2009.
[71] D. S. Yeung and X. Sun, "Using function approximation to analyze the sensitivity of
MLPs with antisymmetric squashing activation function", IEEE Transactions on Neural
Networks, vol. 13, no. 1, pp. 34-44, 2002.
[72] T. Kohonen, Self-Organizing Maps, Springer-Verlag, Berlin, Germany, 1997.
[73] A. Jain and B. Satish, "Clustering based Short Term Load Forecasting using Support
Vector Machines", IEEE Power Tech Conference, 2009.
[74] J. M. Fidalgo and M. A. Matos, "Forecasting Portugal global load with artificial
neural networks", Lecture Notes in Computer Science, Springer-Verlag, pp. 728-737,
2007.
[75] G. A. Barreto and A. F. R. Araujo, "Identification and control of dynamic systems
using the self-organizing maps", IEEE Transactions on Neural Networks, vol. 15, no. 5,
pp. 1244-1259, 2004.
[76] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.
[77] J. Goppert and W. Rosenstiel, "Topology preserving interpolation in self-organizing
maps", Proceedings of the NeuroNIMES'93, pp. 425-534, 1993.
[78] J. Goppert and W. Rosenstiel, "Topology interpolation in SOM by affine
transformations", Proceedings of the ESANN'95, pp. 15-20, 1995.
[79] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press,
1994.
[80] C. W. J. Granger, and P. Newbold. "Spurious Regressions in Econometrics." Journal
of Econometrics, vol. 2, pp. 111–120, 1974.
[81] D. A. Dickey and W. Fuller, "Likelihood Ratio Statistics for Autoregressive Time
Series With A Unit Root", Econometrica, vol. 49, pp. 1057-1072, 1981.
[82] P. Phillips, and P. Perron. "Testing for a Unit Root in Time Series Regression."
Biometrika. Vol. 75, pp. 335–346, 1988.
[83] D. Kwiatkowski, P. Phillips, P. Schmidt and Y. Shin, "Testing the null hypothesis of
stationarity against the alternative of a unit root", Journal of Econometrics, vol. 54, pp.
159-178, 1992.
[84] M. Cottrell, J. C. Fort and G. Pages, "Theoretical aspects of the SOM algorithm",
Neurocomputing, vol. 21, pp. 119-138, 1998.
[85] T. Kohonen, Self-Organizing Maps, 3rd ed., Springer, Berlin, 2001, p. 139.
[86] M. Cottrell and J. C. Fort, "Etude d'un algorithme d'auto-organisation", Annales de
l'Institut Henri Poincare, vol. 23, no. 1, pp. 1-20, 1987.
[87] C. Bouton and G. Pages, "Self-Organization of the one-dimensional Kohonen
algorithm with non-uniformly distributed stimuli", Stochastic Processes and their
Applications, vol. 47, pp. 249-274, 1993.
[88] C. Bouton and G. Pages, "Convergence in distribution of the one-dimensional
Kohonen algorithm when the stimuli are not uniform", Advances in Applied Probability,
vol. 26, pp. 80-103, 1994.
[89] E. Erwin, K. Obermayer and K. Shulten, "Self-organizing maps: stationary states,
metastability and convergence rate", Biological Cybernetics, vol. 67, pp. 35-45, 1992.
[90] E. Erwin, K. Obermayer and K. Shulten, "Self-organizing maps: ordering,
convergence properties and energy functions", Biological Cybernetics, vol. 67, pp. 47-55,
1992.
[91] M. Cottrell, "Theoretical aspects of the SOM algorithm", Neurocomputing, vol. 21,
pp. 119-138, 1998.
[92] H. Robbins and S. Monro, "A stochastic approximation method", Annals of
Mathematical Statistics, vol. 22, pp. 400-407, 1951.
[93] National Grid. Available www.nationalgrid.com/UK
[94] C. L. Hor, S. J. Watson and S. Majithia, “Daily load forecasting and maximum
demand estimation using ARIMA and GARCH”, Proceedings PMAPS, pp. 1-6, 2006.
[95] D. van Dijk, T. Terasvirta and P. H. Franses, "Smooth transition autoregressive
models - a survey of recent developments", Econometric Reviews, vol. 21, no. 3, pp. 1-47,
2002.
[96] Robert B. Davies, "Hypothesis testing when a nuisance parameter is present only
under the alternatives", Biometrika, vol. 74, no. 1, pp. 33-43, 1987.
[97] R. Luukkonen, P. Saikkonen and T. Terasvirta, "Testing linearity against smooth
transition autoregressive models", Biometrika, vol. 75, pp. 491-499, 1988.
[98] C. W. J. Granger and T. Terasvirta, Modelling Nonlinear Economic Relationships,
Oxford University Press, Oxford, 1993.
[99] S. Leybourne, P. Newbold and D. Vougas, "Unit roots and smooth transitions",
Journal of Time Series Analysis, vol. 19, pp. 83-97, 1998.
[100] G. Thimm and E. Fiesler, "High-order and multilayer perceptron initialization",
IEEE Transactions on Neural Networks, vol. 8, no. 2, pp. 349-359, 1997.
[101] J. F. Kolen and J. B. Pollack, "Backpropagation is sensitive to initial conditions",
Laboratory for Artificial Intelligence Research, Comput. Inform. Sci. Dep, Tech. Rep. TR
90-JK-BPSIC, 1990.
[102] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000.
[103] New England ISO. Available www.iso-ne.com
[104] K. Y. Lee and Shu Du, "Short term load forecasting using semigroup based
system-type neural network", Proceedings of Intelligent System Application to Power Systems,
pp. 291-296, 2005.
[105] Shu Du, "Short term load forecasting using system-type neural network
architecture", Master's thesis, Baylor University, 2009.
[106] The Alberta electric system operator. Available www.aeso.ca
[107] C. U. Vila, A. Z. de Souza, J. W. Lima and P. P. Balestrassi, "Electricity demand
and spot price forecasting using evolutionary computation combined with chaotic
nonlinear dynamic model", Electrical Power and Energy Systems, vol. 21, no. 2, pp. 108-116, 2010.
[108] R. Cottet and M. Smith, "Bayesian modelling and forecasting of intraday electricity
load", Journal of American Statistical Association, vol. 98, no. 464, pp. 839-849, 2003.
[109] Australian Energy Market Operator. Available www.aemo.com.au
[110] D. Srinivasan, A. C. Liew and C. S. Chang, "A neural network short-term load
forecaster", Electric Power Systems Research, vol. 28, no. 3, pp. 227-234, 1994.
[111] D. Srinivasan, "Evolving artificial neural networks for short term load forecasting",
Neurocomputing, vol. 23, no. 1-3, pp. 265-276, 1998.
[112] J. Skalin and T. Terasvirta, "Another look at Swedish business cycles", Journal of
Applied Econometrics, vol. 14, pp. 359-378, 1999.
[113] J. Skalin and T. Terasvirta, "Modelling asymmetries and moving equilibria in
unemployment rates", Macroeconomic Dynamics, 2001.
[114] C. Weib and R. Gob, "Measuring serial dependence in categorical time series",
Advances in Statistical Analysis, vol. 92, no. 1, pp. 71-89, 2008.