Chapter 1 Introduction
This first chapter offers a general description of the short term load forecasting (STLF) problem and its significance for the power industry. The two main approaches to STLF – the statistical approach and the artificial neural network approach – are then introduced and detailed, followed by the motivation for this thesis and its contributions. Finally, a bibliographic review of STLF methods from these two disciplines is given, and the structure of the thesis is explained.
1.1 Load Forecasting
Load forecasting has always been an issue of major interest for the electricity industry. During the operation of a power system, the system response closely follows the load requirements: when the load demand increases or decreases, the power generation has to be adjusted accordingly. To provide this on-demand power generation, the electric utility operator needs to have a sufficient quantity of generation resources available. Thus, if the operator has some a priori knowledge of the future load requirements, the generation resources can be allocated optimally.
There are three kinds of load forecasting: short term, medium term, and long term. Utility operators need to perform all three forecasts, as they influence different aspects of the power supply chain. Short term load forecasts typically cover one hour to one week ahead, and are needed for the daily operation of the power system. Medium term forecasts typically cover one week to one year ahead, and are needed for fuel supply planning and maintenance. Long term load forecasts usually cover a period longer than a year, and are needed for power system planning.
1.2 Importance of Short Term Load Forecasting
Short term load forecasting (STLF) is the keystone of the operation of today’s
power systems. Without access to good short term forecasts, it would be impossible for any electric utility to operate in an economical, reliable and secure manner.
STLF provides the input data for load flow studies and contingency analysis.
Utilities need to perform these studies to calculate the generating requirements of each
generator in the system, to determine the line flows, to determine the bus voltages, and to
ensure that the system continues to operate reliably even in the case of contingencies such
as loss of a generator or of a line. STLF is also used by the utility engineers in other offline network studies, such as preparing a list of corrective actions for different types of
expected faults. Such corrective actions may include load shedding, switching off
interconnections and forming islands, starting up of peaking units or increasing the
spinning and standby reserves of the system [1]. Thus, the STLF is used by the system
operators and regulatory agencies to ensure the safe and reliable operation of the system,
and by the producers to ensure the optimal utilization of generators and power stations.
With the advent of deregulation and the rise of competitive electricity markets,
STLF has also become important for market operators, transmission owners and other
market participants [2]. As an accurate electricity price forecast is not possible without an
accurate load forecast, hence the operational plans and bidding strategies of the market
players require STLF as well. Forecast errors will have negative implications for the
company profits, and eventually for shareholder value.
1.3 Approaches to Short Term Load Forecasting
STLF methods, and more generally, time series prediction (TSP) methods can be
broadly divided into two categories: statistical methods and computational intelligence
(CI) methods.
1.3.1 Statistical Methods
1.3.1.1 Time Series Models
Modern statistical methods for time series prediction can be said to have begun in 1927, when Yule came up with an autoregressive technique to predict the annual number of sunspots. According to this model, the next-step value is a weighted average of previous observations of the series. To model more interesting behavior from this linear system, outside intervention in the form of noise was introduced. For the next half-century, the reigning paradigm for predicting any time series remained that of a linear model driven by noise. The popular models developed during this period include moving average models, exponential smoothing methods, and the Box-Jenkins approach to modeling autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. These models, referred to together as time series models, assume that the data follows a stationary pattern, i.e. the series is normally distributed with a constant mean and variance over a long time period. They also assume that the series has uncorrelated random errors, and that no outliers are present.
Applied to load forecasting, time series methods provide satisfactory results as long as the variables affecting the load demand, such as environmental variables, do not change suddenly. Whenever there is an abrupt change in such variables, the accuracy of the time series models suffers. Also, the assumption of stationarity of the load series is rather restrictive, and whenever the historical load data deviates significantly from this assumption, the forecasting accuracy decreases.
1.3.1.2 Regression Models
Regression methods are another popular tool for load forecasting. Here the load is
modeled as a linear combination of relevant variables such as weather conditions and day
type. Temperature is usually the most important factor for load forecasting among weather
variables, though its importance depends upon the kind of forecast and the type of climate.
For example, for STLF, temperature effects might be more critical for tropical regions
than temperate ones. Typically temperature is modeled in a nonlinear fashion. Other
weather variables such as wind velocity, humidity and cloud cover can be included in the
regression model to obtain higher accuracy. Clearly, no two utilities are the same, and a
detailed case study analysis of the different geographical, meteorological, and social
factors affecting the load demand needs to be carried out before proceeding with the
regression methods. Once the variables have been determined, the coefficients of these
variables can be estimated using least squares or other regression methods.
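To make the estimation step concrete, here is a small hedged sketch that fits such a regression by ordinary least squares using numpy. The synthetic data, the quadratic temperature term and the weekend dummy are illustrative assumptions, not a model from this thesis.

```python
import numpy as np

# Synthetic hourly data (illustrative only): temperature and a weekend flag.
rng = np.random.default_rng(0)
n = 24 * 365
temp = 15 + 10 * np.sin(2 * np.pi * np.arange(n) / (24 * 365)) + rng.normal(0, 2, n)
weekend = (np.arange(n) // 24 % 7 >= 5).astype(float)
load = 500 + 4 * (temp - 18) ** 2 - 60 * weekend + rng.normal(0, 20, n)

# Design matrix: intercept, temperature (nonlinear via a squared term), day type.
X = np.column_stack([np.ones(n), temp, temp ** 2, weekend])

# Least-squares estimate of the regression coefficients.
coef, *_ = np.linalg.lstsq(X, load, rcond=None)
print("estimated coefficients:", coef)
print("in-sample RMSE:", np.sqrt(np.mean((X @ coef - load) ** 2)))
```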
Though regression methods are popular tools for STLF among electric utilities, they
have their share of drawbacks. The relationship between the load demand and the
influencing factors is a nonlinear and complex one, and developing an accurate model is a
challenge. From on-site tests, it has been seen that the performance of regression methods
deteriorates when the weather changes abruptly, leading to load deviation [3]. This
drawback occurs in particular because the model is linearized so as to obtain its
coefficients. But the load patterns are nonlinear; hence a linearized model fails to
represent the load demand accurately during certain distinct time periods.
1.3.1.3 Kalman Filtering Based Models
Towards the end of the 1980s, as computers became more powerful, it became possible
to record longer time series and apply more complex algorithms to them. Drawing on
ideas from differential topology and dynamical systems, it became possible to represent a time series as being generated by deterministic governing equations. Kalman filtering techniques characterize such dynamical systems by a state-space representation. The
theory of Kalman filtering provides an efficient computational (recursive) means to
estimate the state of a process, in a way that minimizes the mean of the squared error. The
filter supports estimation of the past, present and even future states, and it can do so even
when the precise nature of the modeled system is unknown [4]. A significant challenge in
the use of Kalman filtering based methods is the estimation of the state-space model
parameters.
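To make the recursion concrete, here is a minimal sketch of a scalar Kalman filter tracking a slowly drifting load level. The state-space matrices and noise variances are illustrative assumptions, not an STLF model from the literature.

```python
import numpy as np

def kalman_filter(y, a=1.0, c=1.0, q=1e-2, r=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, y_t = c*x_t + v_t,
    with process noise variance q and measurement noise variance r."""
    x, p = y[0], 1.0          # initial state estimate and its variance
    estimates = []
    for obs in y:
        # Predict step: propagate the state and its uncertainty.
        x_pred, p_pred = a * x, a * p * a + q
        # Update step: blend prediction and observation via the Kalman gain.
        k = p_pred * c / (c * p_pred * c + r)
        x = x_pred + k * (obs - c * x_pred)
        p = (1 - k * c) * p_pred
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(1)
true_level = np.cumsum(rng.normal(0, 0.1, 200)) + 50
observed = true_level + rng.normal(0, 1.0, 200)
print(kalman_filter(observed)[-5:])  # filtered estimates of the last 5 hours
```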
1.3.1.4 Non-linear Time Series Models
To overcome the limitations of the linear time series models, a second generation of
non-linear statistical time series models has been developed. Some of these models, such as the autoregressive conditional heteroskedastic (ARCH) and generalized autoregressive conditional heteroskedastic (GARCH) models, attempt to model the variance of the time series as a function of its past values. These models have achieved only limited success for STLF, since they are mostly specialized for particular problems in particular domains, for example volatility clustering in financial indices.
Regime-switching models, first developed in econometrics, are gradually being applied successfully to STLF as well. As the name suggests, these models involve switching between a finite number of linear regimes. The models differ only in their assumptions about the stochastic process generating the regime:
i. The mixture of normal distributions model has state transition probabilities which are independent of the history of the regime. Compared to a single normal distribution, this approach is better able to model fatter-than-normal tails and skewness [5].

ii. In the Markov-switching model, the switching between two or more regimes is governed by a discrete-state homogeneous Markov chain [6]. In one possible formulation, the model can be divided into two parts: first, a regression model which regresses the model variable on hidden state variables, and second, an autoregressive model which describes the hidden state variables.

iii. In the threshold autoregressive (TAR) model [7][8], the switching between two or more linear autoregressive models is governed by an observable variable, called the threshold variable. In the case where this threshold variable is a lagged value of the time series, the model is called a self-exciting threshold autoregressive (SETAR) model.

iv. In the smooth transition autoregressive (STAR) model, the switching is governed by an observable threshold variable, similar to the TAR model, but a smooth transition between the regimes is enforced.
As a few of these non-linear time series models form the basis of the hybrid models proposed in this work, they are explained in detail in Chapter 2.
1.3.2 Computational Intelligence Methods
The deregulated markets and the constant need to improve the accuracy of load forecasting have forced electricity utility operators to focus much attention on computational intelligence based forecasting methods. It has been calculated in [9] that a reduction of 1% in forecasting error could save up to $1.6 million annually for a utility.
Computational intelligence techniques broadly fall into four classes – expert
systems, fuzzy logic systems, neural networks and evolutionary computation systems. A
brief introduction to these four approaches is provided.
1.3.2.1 Expert Systems
An expert system is a computer program which simulates the judgment and behavior
of a human or an organization that has expert knowledge and experience in a particular
field. Typically an expert system would comprise four parts: a knowledge base, a data
base, an inference mechanism, and a user interface. For STLF, the knowledge base is
typically a set of rules represented in the IF-THEN form, and can consist of relationships
between the changes in the load demand and changes in factors which affect the use of
electricity. The data base is typically a collection of facts provided by the human experts
after interviewing them, and also facts obtained using the inference mechanism of the
system. The inference mechanism is the “thinking” part of the expert system, because it
makes the logical decisions using the knowledge from the knowledge base and
information from the data base. Forward chaining and backward chaining are two popular
reasoning mechanisms used by the inference mechanism [10].
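As a toy illustration of forward chaining, the sketch below repeatedly fires IF-THEN rules against a fact base until no new facts can be derived. The rules and facts are invented for illustration and are far simpler than the load-forecasting rule bases discussed here.

```python
# Minimal forward-chaining sketch: each rule maps a set of required facts
# to a conclusion; firing continues until the fact base stops growing.
rules = [
    ({"heat_wave", "weekday"}, "demand_spike_expected"),
    ({"demand_spike_expected"}, "raise_spinning_reserve"),
]
facts = {"heat_wave", "weekday"}

changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)   # fire the rule, derive a new fact
            changed = True
print(facts)
```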
In terms of advantages, expert systems can be used to make decisions when human experts are unavailable, thus reducing the work burden of the human experts. When human experts retire, their knowledge can still be retained in these systems.
1.3.2.2 Fuzzy Logic Systems
Fuzzy systems are knowledge-based software environments which are constructed from a collection of linguistic IF-THEN rules, and realize a nonlinear mapping with the interesting mathematical properties of “low-order interpolation” and “universal function approximation”. These systems facilitate the design of reasoning mechanisms for partially known, nonlinear and complex processes.
A fuzzy logic system comprises four parts – a fuzzifier, a fuzzy inference engine, a fuzzy rule base and a defuzzifier. The system takes a crisp input value, which is fuzzified (i.e. converted into the corresponding membership grades in the input fuzzy sets) and then fed to the fuzzy inference engine. Using the stored IF-THEN fuzzy rules from the rule base, the inference engine produces a fuzzy output, which undergoes defuzzification to yield a crisp output.
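A minimal sketch of this fuzzify–infer–defuzzify pipeline is given below, using triangular membership functions and centroid defuzzification; the temperature/load rules and all numbers are illustrative assumptions only.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b over [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def forecast_adjustment(temp):
    # Fuzzify: membership of the crisp temperature in two input sets.
    mu_hot, mu_mild = tri(temp, 25, 35, 45), tri(temp, 10, 20, 30)
    # Infer: IF hot THEN large increase; IF mild THEN small increase.
    y = np.linspace(0, 200, 401)                  # output universe (MW)
    agg = np.maximum(np.minimum(mu_hot, tri(y, 100, 150, 200)),
                     np.minimum(mu_mild, tri(y, 0, 30, 60)))
    # Defuzzify: centroid of the aggregated output fuzzy set.
    return np.sum(y * agg) / np.sum(agg)

print(forecast_adjustment(32.0))  # crisp load adjustment in MW
```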
Fuzzy logic is often combined with other computational intelligence methods such
as expert systems and neural networks.
1.3.2.3 Artificial Neural Networks (ANN)
Artificial neural networks are massively parallel, distributed processing systems built by analogy to the human neural network – the fundamental information processing system. Generally speaking, the practical use of neural networks has been recognized mainly because of such distinguishing features as:

i. general nonlinear mapping between a subset of the past time series values and the future time series values;

ii. the capability of capturing essential functional relationships among the data, which is valuable when such relationships are not known a priori or are very difficult to describe mathematically, and/or when the collected observation data are corrupted by noise;

iii. universal function approximation capability, which enables modeling of an arbitrary nonlinear continuous function to any degree of accuracy;

iv. the capability of learning and generalizing from examples using a data-driven self-adaptive approach [11].
In fact, there are several kinds of ANN models. Every neural network model can be classified by its architecture, processing and training. The architecture describes the neural connections. Processing describes how the network produces an output for every input and weight. The training algorithm describes how the neural network adapts its weights for every training vector.
The multilayer perceptron (MLP) is one of the most researched network architectures. It is a supervised learning neural architecture, and it has been very popular for time series prediction in general, and STLF in particular. This is because, in its simplest form, a TSP problem can be rewritten as a supervised learning problem, with the current and past values of the time series as the input values to the network, and the one-step-ahead value as the output value. This formulation allows one to exploit the universal function approximation and generalization capabilities of the MLP. The radial basis function (RBF) network is another popular supervised learning architecture which can be used for the same purposes.
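The sketch below shows this rewriting of one-step-ahead prediction as supervised learning: a sliding window over the series becomes the input matrix and the next value becomes the target. The window length of 24 is an arbitrary illustrative choice.

```python
import numpy as np

def make_supervised(series, window=24):
    """Turn a 1-D series into (inputs, targets) for one-step-ahead learning:
    X[i] holds `window` consecutive past values, y[i] is the next value."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

load = np.sin(np.arange(1000) * 2 * np.pi / 24) + 5  # stand-in hourly load
X, y = make_supervised(load, window=24)
print(X.shape, y.shape)  # (976, 24) (976,) — ready for any regressor, e.g. an MLP
```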
The Self-Organizing Map (SOM) is an important unsupervised learning neural
architecture, based on the unsupervised competitive-cooperative learning paradigm.
In contrast to the supervised learning methods, the SOM has not been popular for time series prediction or STLF. This is mostly because the SOM is traditionally viewed as a data vector quantization and clustering algorithm [12][13], less suitable for function approximation by itself. Hence, when used for TSP, the SOM usually appears in a hybrid model, where the SOM is first used for clustering, and subsequently another function approximation method such as an MLP or support vector regression (SVR) is used to learn the function.
As the MLP and SOM form the basis of the work proposed in this thesis, they are
reviewed in greater detail in Chapter 3.
1.3.2.4 Evolutionary Approach
The algorithms developed under the common term of evolutionary computation are inspired by the study of the evolutionary behavior of biological processes. They are mainly based on the selection of a population of possible initial solutions to a given problem. Through stepwise processing of the population using evolutionary operators such as crossover, recombination, selection and mutation, the fitness of the population steadily improves.
Consider how a genetic algorithm might be applied to load forecasting. First an
appropriate model (either linear or nonlinear) is selected and an initial population of
candidate solutions is created. A candidate solution is produced by randomly choosing a
set of parameter values for the selected forecasting model. Each solution is then ranked
based on its prediction error over a set of training data. A new population of solutions is
generated by selecting fitter solutions and applying a crossover or mutation operation.
New populations are created until the fittest solution has a sufficiently small prediction
error or repeated generations produce no reduction of error.
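A minimal sketch of this loop is shown below, evolving the two coefficients of an assumed AR(2) forecasting model against squared prediction error; the population size, mutation scale and synthetic series are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
series = np.sin(np.arange(300) / 5.0) + rng.normal(0, 0.05, 300)

def error(params):
    """Prediction error of an AR(2) model y_t = b1*y_{t-1} + b2*y_{t-2}."""
    b1, b2 = params
    pred = b1 * series[1:-1] + b2 * series[:-2]
    return np.mean((series[2:] - pred) ** 2)

pop = rng.normal(0, 1, (30, 2))                 # initial random candidates
for generation in range(100):
    fitness = np.array([error(p) for p in pop])
    parents = pop[np.argsort(fitness)[:10]]     # select the fittest solutions
    children = parents[rng.integers(0, 10, 20)] + rng.normal(0, 0.1, (20, 2))
    pop = np.vstack([parents, children])        # next population: elite + mutants
best = pop[np.argmin([error(p) for p in pop])]
print("best AR(2) coefficients:", best, "error:", error(best))
```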
1.3.3 Hybrid Approaches
Hybrid models have been proposed to overcome the inadequacies of using an individual model, whether a statistical method or a computational intelligence method. Also referred to as ensemble methods or combined methods, these models are usually employed to improve the prediction accuracy. There is still an absence of good theory in this field on how to proceed with hybridization, though trends are emerging. Broadly speaking, hybrid methods can combine two linear models, two nonlinear models, or one linear and one nonlinear model.
In linear hybridization, two or more linear statistical models are combined. Though some work has been done in this direction, as discussed in the literature review, the field never really took off, because a linear hybrid model still suffers from many of the problems of its linear components.
The most heavily researched hybrid models are those involving two nonlinear models, especially two computational intelligence models. This is because the three popular CI models – ANNs, fuzzy logic and evolutionary computation – have their own capabilities and restrictions, which are usually complementary to each other. For example, the black-box modeling approach of neural networks might be well suited for process modeling or for intelligent control, but is not that suitable for decision control. Similarly, fuzzy logic systems can easily handle imprecise data and explain their decisions in linguistic form in the context of the available facts; however, they cannot automatically acquire the linguistic rules needed to make these decisions. It is these capabilities and restrictions of the individual intelligent technologies that have driven their fusion into hybrid intelligent systems, which have been successfully applied to various complex problems, including STLF.
The third class of hybrid models, which this thesis is about, involves one statistical method and one computational intelligence method. Usually the CI method is a neural network, chosen for its flexibility and powerful pattern recognition capabilities. But when developed as a predictive model, a neural network is difficult to interpret due to its black-box nature, and it becomes hard to test its parameters for statistical significance. Hence, time series models, linear ones such as ARMA or ARIMA, or nonlinear ones such as STAR, are introduced into the hybrid model to address the concern of interpretability.
1.4 Motivation
Though a comfortable state of performance has been achieved for electricity load forecasting, market players will always bring in new dynamic bidding strategies, which, coupled with price-dependent load, will introduce new variability and non-stationarity into the electricity load demand series. Besides, stricter power quality requirements and the development of distributed energy resources are other reasons why the modern power system will always require more advanced and more accurate load forecasting tools.
Consider why a SOM based hybrid model is an appealing option. Though every
possible approach has been applied for STLF, the more popular ones are the time series
approaches and computational intelligence approaches of feed-forward neural networks.
An extensive literature review is done in Section 1.6. Both these approaches attempt to
build a single global model to describe the load dynamics. The difference between time
series approaches and supervised learning neural networks is that while time series
approaches build an exact model of the dynamics (“hard computing”), the supervised
learning neural networks allow some tolerance for imprecision and uncertainty to achieve
tractability and robustness (“soft computing”). However, there is an exciting alternative to
building a global model, which is to build local models for the series dynamics, where
each local model handles a smaller section of the series dynamics. This is definitely an
area which needs further study, because a time series such as the load demand series exhibits various stylized facts, discussed further in Chapter 4. The complexity of a global model increases greatly if it is to handle all the stylized facts. Working with multiple local models
might bring down the complexity. On the other hand, the challenges faced in working with local models are manifold. Firstly, what factors should decide the division of the series dynamics into local models? Secondly, how do we combine the results from multiple local models to give the final prediction value?
In this thesis, SOM based hybrid models are proposed to explore the above-mentioned idea of local models. As mentioned earlier, SOMs have traditionally seen less application to STLF, mostly because of the prevalent attitude among researchers that SOMs are an unsupervised learning method, suitable only for data vector quantization and clustering [12][13]. But this same clustering property makes SOMs an excellent tool for building local models.
Another motivation for this thesis is to further explore the idea of transitions
between local models. Once the local models have been built, how does the transition
from one model to another take place? Is it a sudden jump, where a local model M1 was
being used to describe the series on a particular day and a different local model M2 is
being used for the next day? After analyzing the electricity load demand series, it was found that regimes were present in the series, due to seasonal effects and market effects, and that the transition between these regimes was smooth. A sudden jump from one local model to another might therefore not be the best approach. Hence this thesis studies the NCSTAR model in Chapter 6, which allows smooth transitions between local models. The idea is to obtain highly accurate learning and prediction not only for test samples which clearly belong to a particular local model, but also for test samples which represent the transition from one local model to another.
Earlier researchers have proposed working with local models for STLF in different ways, and to attain different aims. For example, in [14], the same wavelet-based neural network is trained four times over different periods of the year to handle the four seasons. But this paper does not consider the transitions between the local models (i.e. the seasons) to be smooth. Not much work has been done on enforcing smooth transitions between regimes or local models for STLF. After an extensive literature review (please see Section 1.6), the only paper which was found to handle smooth transitions between local models for electricity load forecasting is [30]. So definitely more study has to be done on how to identify local models, how to implement smooth transitions between local models, and how introducing the smooth transition will affect the prediction accuracy of the overall model for STLF. This is exactly what this thesis sets out to do.
1.5 Contribution of the Thesis
In this work, two SOM based hybrid models are proposed for STLF.
In the first model, a load forecasting technique is proposed which uses a weighted SOM for splitting the past historical data into clusters. For the standard SOM, all the inputs to the neural network are equally weighted. This is a drawback compared to supervised learning methods, which have procedures to adjust their network weights, e.g. the back-propagation method for MLPs and the pseudo-inverse method for RBFs. Hence, a strategy is proposed which weights the inputs according to their correlation with the output. Once the training with the weighted SOM is complete, the time series has been divided into smaller clusters, one cluster for each neuron. Next, a local linear model is built for each of these clusters using an autoregressive model, which helps to smooth the results.
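The sketch below illustrates one plausible reading of this weighting scheme: each lagged input is scaled by the absolute correlation between that lag and the one-step-ahead target before the vectors are handed to a SOM. It is a simplified stand-in, not the exact procedure of Chapter 5.

```python
import numpy as np

def autocorr_weights(series, window):
    """Weight for lag j = |correlation between input lag j and target y_t|."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(window)])

rng = np.random.default_rng(3)
load = np.sin(np.arange(2000) * 2 * np.pi / 24) + rng.normal(0, 0.3, 2000)
w = autocorr_weights(load, window=24)
X = np.array([load[i:i + 24] for i in range(len(load) - 24)])
X_weighted = X * w   # inputs scaled by relevance before SOM training
print(np.round(w, 2))
```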
In the second hybrid model, the aim is to allow for smooth transitions between the
local models. Here the model of interest is a linear model with time varying coefficients
which are the outputs of a single hidden layer feedforward neural network. The hidden
layer is responsible for partitioning the input space into multiple sub-spaces through
multivariate thresholds and smooth transition between the sub-spaces. Significant research
has already been done into the specification, estimation and evaluation of this model. In
this thesis, a new SOM-based method is proposed to smartly initialize the weights of the
hidden layer before the network training. First, a SOM network is applied to split the
historical data dynamics into clusters. Then the Ho-Kashyap algorithm is used to obtain
the equations of the hyperplanes separating the clusters. These hyperplanes' equations are
then used to smartly initialize the weights and biases of the hidden layer of the network.
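As a rough sketch of the Ho-Kashyap step under simplifying assumptions (two clusters, a linearly separable toy set), the code below iterates the classical updates a = Y⁺b, b ← b + ρ(e + |e|) to recover a separating hyperplane. It illustrates the classical algorithm, not the thesis implementation.

```python
import numpy as np

def ho_kashyap(A, B, rho=0.1, iters=500):
    """Find w, w0 with w.x + w0 > 0 on cluster A and < 0 on cluster B."""
    # Stack samples with a bias column; negate cluster B so that Y @ a > 0
    # for all rows iff the hyperplane separates the two clusters.
    Y = np.vstack([np.hstack([A, np.ones((len(A), 1))]),
                   -np.hstack([B, np.ones((len(B), 1))])])
    b = np.ones(len(Y))                # positive margins, updated iteratively
    a = np.linalg.pinv(Y) @ b
    for _ in range(iters):
        e = Y @ a - b                  # error against the current margins
        b = b + rho * (e + np.abs(e))  # only increase margins (keep b > 0)
        a = np.linalg.pinv(Y) @ b
    return a                           # [w, w0]: hyperplane coefficients

rng = np.random.default_rng(4)
A = rng.normal([0, 0], 0.5, (50, 2))
B = rng.normal([3, 3], 0.5, (50, 2))
print(ho_kashyap(A, B))  # usable to initialize a hidden unit's weights and bias
```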
1.6 Literature Survey
The two approaches to STLF, and to TSP in general – statistical methods and CI methods – have already been discussed above, and their different sub-categories have been introduced. Some of the approaches described, such as non-linear time series models, SOMs, and MLPs, are more relevant to the work in this thesis than others. What follows is a bibliographical survey of methods for STLF, with more emphasis given to the methods relevant to the work done in this thesis.
1.6.1 Statistical Methods
In the field of linear approaches to time series, the Box-Jenkins methodology is the most popular approach to handling ARMA and ARIMA models, and consists of model identification and selection, parameter estimation and model checking. The Box-Jenkins methodology is among the oldest methods applied to STLF. It was proposed in [15], and further developed in [16]. With a more modern perspective, [17] is an influential text on nonlinear time series models, including several of those described in Section 1.3.1.4. ARMA and ARIMA models continue to be very popular for STLF.
In [18], the load demand is modeled as the sum of two terms, the first depending on the time of day and the normal weather pattern for that day, and the second being a residual term which models the random disturbances using an ARMA model. Usually the Box-Jenkins models assume Gaussian noise; the ARMA modeling method proposed in [19] allows for non-Gaussian noise as well. Other works which use the Box-Jenkins method for STLF are [20][21].
In [22], a periodic autoregression model is used to develop 24 seasonal equations,
using the last 48 load values within each equation. The motivation is that by following a
seasonal-modeling approach, it is possible to incorporate a priori information concerning
the seasonalities at several levels (daily, weekly, yearly, etc.) by appropriately choosing
the model structure and estimation method. In [23], an ARMAX model is proposed for
STLF, where the X represents an exogenous variable, temperature in this case. This is actually a hybrid model, as it uses a computational intelligence method, particle swarm optimization, to determine the order of the model as well as its coefficients, instead of the traditional Box-Jenkins approach.
An ARIMA model uses differencing to handle the non-stationarity of the series, and
then uses ARMA to handle the resulting stationary series. In [24], six methods are
compared for STLF, and ARIMA is found to be a suitable benchmark. In [25], a modified ARIMA model is proposed. This model takes as input not only past loads, but also the estimates of past loads provided by human experts. Thus this model, in a sense, incorporates the knowledge of experienced human operators. This method is shown to be superior to both ANN and ARIMA.
Now consider the previous work in STLF on regime-switching models, i.e. the non-linear statistical time series models discussed earlier in Section 1.3.1.4. The threshold autoregressive (TAR) model was proposed by [7] and [8]. In [26], a TAR model with multiple thresholds is developed for load forecasting. This model chooses the optimum number of thresholds as the one which minimizes the sum of threshold variances.
A generalization of the TAR model is the smooth transition autoregressive (STAR)
model, which was initially proposed in [27], and further developed in [28] and [29]. A
modified STAR model for load forecasting is proposed in [30] where temperature plays
the role of threshold variable. This method uses periodic autoregressive models to
represent the linear regime, as they better capture the fact that the autocorrelation at a
particular lag of one half-hour varies across the week. Such switching regime models have
also been proposed for electricity price forecasting [31] [32].
1.6.2 Computational Intelligence Methods
Four CI methods were introduced earlier in Section 1.3.2, but the following
literature review focuses mostly on neural networks, as these are the most popular
amongst the four for STLF, and also the most relevant to the work done in this thesis.
There are several kinds of ANN models, classified by their architecture, processing
and training. For STLF, the popular ones have been used, e.g. radial basis function
networks [33][34], self-organizing maps [35] and recurrent neural networks [36][37].
However, the most popular network architecture is the multi-layer perceptron described in Section 1.3.2.3, as its structure lends itself naturally to unknown function approximation. In
[38], a fully connected three-layer feedforward ANN is implemented with the backpropagation learning rule, the input variables being historical hourly load data, day of the week and temperature. In [3], a multi-layered feedforward ANN is developed
which takes three types of variables as inputs - season related inputs, weather related
inputs, and historical loads. In [39], electricity price is also considered as a main
characteristic of the load. Other recent works involving the MLP for STLF include [40][41][34][42].
In [43], in order to reduce the neural network structure and learning time, a one-hour-ahead load forecasting method is proposed which uses the correction of similar day
data. In this proposed prediction method, the forecasted load power is obtained by adding
a correction to the selected similar day data. In [44], weather ensemble predictions are
used for STLF. A weather ensemble prediction consists of multiple scenarios for a
weather variable. These scenarios are used to produce multiple scenarios for load
forecasts. In [45], the network committee technique, a technique from neural network architecture, is applied to improve the accuracy of forecasting the next-day peak
load.
1.6.3 Hybrid Methods
Hybrid models combining statistical models and neural networks are rare for STLF,
though they have been proposed for other TSP fields. In [46], a hybrid ARIMA/ANN
model is proposed. Because of the complexity of a moving trend as well as a cyclic
seasonal variation, an adaptive ARIMA model is first used to forecast the monthly load
and then the forecast load of the ARIMA model is used as an additive input to the ANN.
The prediction accuracy of this approach is shown to be better than traditional methods of
time series models and regression methods. In [47], a recurrent neural network is trained
by features extracted from ARIMA analyses, and used for predicting the mid-term price
trend of the Taiwan stock exchange weighted index. In [48], again an ARIMA model and
neural network model are combined to forecast time series of reliability data with growth
trend, and the results are shown to be better than either of the component models. In [49],
seasonal ARIMA (SARIMA) model and the neural network MLP are combined to
forecast time series with seasonality.
It was mentioned earlier in Section 1.3.2.3 that a neural network can be implemented for both supervised and unsupervised learning. But unsupervised
learning architectures, such as SOMs, have traditionally been used for data vector quantization and clustering. Hence, as noted in Section 1.3.2.3, when used for TSP the SOM usually appears in a hybrid model: the SOM is first used for clustering, and subsequently another function approximation method such as an MLP or support vector regression (SVR) is used to learn the function.
to learn the function. In [50][51][52], a two-stage adaptive hybrid network is proposed. In
the first stage, a SOM network is applied to cluster the input data into several subsets in an
unsupervised manner. In the next stage, support vector machines (SVMs) are used to fit
the training data of each subset in a supervised manner. In [53], profiling is done through
SOMs, followed by prediction through radial function networks. In [54], the first SOM
module is used to forecast normal and abnormal days, and the second MLP module is able
to make the load model sensitive weather factors such as temperature.
As was mentioned in Section 1.3.3, the most heavily researched hybrid models for
TSP in general involve those where both the component models are computational
intelligence methods. In [55], a real-time pricing type scenario is envisioned where energy
prices change on an hourly basis, and the consumer is able to react to those price signals
by changing his load demand. In [56], attention is paid to special days. An ANN provides
the forecast scaled load curve and fuzzy inference models give the forecast maximum and
minimum loads of the special day. Similarly, significant work has also been done on
hybridizing evolutionary algorithms with neural networks. In [57], a genetic algorithm is
used to tune the parameters of a neural network which is used for STLF. A similar
approach is presented in [58]. In [59], a fuzzy neural network is combined with a chaos-search genetic algorithm and simulated annealing, and is found to be able to exploit all the
original methods' advantages. Similarly, particle swarm optimization is a recent CI
approach which has been hybridized with other CI approaches such as neural networks
[60][61] and support vector machines [62] to successfully improve the prediction accuracy
for STLF.
1.7 Structure of the Thesis
The thesis consists of the following chapters.
In this first chapter, short term load forecasting was introduced. The two approaches
to short term load forecasting, statistical approach and computational intelligence based
approach, were introduced, and their hybrid methods were discussed. Relevant work from
past research was presented. Finally the motivation for this thesis, and its contributions
were presented.
In the second chapter, statistical methods for time series analysis are briefly
discussed. These include the more traditional Box-Jenkins methodology, Holt-Winters
exponential smoothing, and the more recent regime-switching models.
In the third chapter, two popular neural network models, multilayer perceptron for
supervised learning and self-organizing maps for unsupervised learning are described. The
architecture, the learning rule and relevant issues are presented.
In the fourth chapter, the stylized facts of the load demand series are presented. It is
necessary to understand the unique properties of the load demand series before any
attempt is made to model them.
In the fifth chapter, the first hybrid model is presented. First it is explained how an
unsupervised model such as a self-organizing map can be used for time series prediction.
Then the hybrid model, involving autocorrelation-weighted inputs to the self-organizing map and an autoregressive model, is explained, along with the motivation for weighting with autocorrelation coefficients.
In the sixth chapter, the second hybrid model is proposed to overcome certain issues
with the first proposed model. The need for smooth transitions between regimes in the
load series is highlighted. The contribution of this chapter, a novel method to smartly initialize the weights of the hidden layer of the neural network model NCSTAR, is presented.
The final chapter concludes this thesis with some directions for future work.
Chapter 2 Statistical Models for Time Series Analysis
In this chapter, the classical tools for time series prediction are reviewed, and recent developments in
nonlinear modeling are detailed. First, the commonly used Box-Jenkins approach to time series analysis is
described. Then, another commonly used classical method, the Holt-Winters exponential smoothing
procedure is explained. Finally, an overview of the more recent regime-switching models is given.
2.1 Box-Jenkins Methodology
ARMA models, as described by the Box-Jenkins methodology, are a very rich class
of possible models. The assumptions for this class of models are that (a) the series is stationary, or can be transformed into a stationary one using a simple transformation such as differencing, and (b) the series follows a linear model.
The original Box-Jenkins modeling procedure involves three iterative stages: model identification, model estimation and model validation. Later work [63] includes a preliminary stage of data preparation and a final stage of forecasting.
• Data preparation can involve several sub-steps. If the variance of the series changes with its level, then a transformation of the data, such as taking logarithms, might be necessary to make it a homoscedastic (constant variance) series. Similarly, it needs to be determined whether the series is stationary, and whether there is any significant seasonality which needs to be modeled. Differencing can be used to handle non-stationarity and to remove seasonality.
• Model identification involves identifying the order of the autoregressive and moving average terms to obtain a good fit to the data. Several graph-based approaches exist, including the autocorrelation function and partial autocorrelation function approaches, and newer model selection tools such as Akaike’s Information Criterion have been developed.
• Model estimation involves finding the values of the model coefficients in order to obtain a good fit to the data. The main approaches are non-linear least squares and maximum likelihood estimation.
• Model validation involves testing the residuals. As the Box-Jenkins models assume that the error term follows a stationary univariate process, the residuals should have nearly the properties of i.i.d. normal random variables. If the assumptions are not satisfied, then a more appropriate model needs to be found; the residual analysis should hopefully provide some clues on how to develop it.
2.1.1 AR Model
An autoregressive model of order p ≥ 1 is defined as

$$X_t = b_1 X_{t-1} + \cdots + b_p X_{t-p} + \varepsilon_t \qquad (2.1)$$

where $\{\varepsilon_t\} \sim N(0, \sigma^2)$ is white noise. This model is written as an AR(p) process. The equation explicitly specifies the linear relationship between the current value and its past values.
2.1.2 MA Model
A moving average model of order q ≥ 1 is defined as

$$X_t = \varepsilon_t + a_1 \varepsilon_{t-1} + \cdots + a_q \varepsilon_{t-q} \qquad (2.2)$$

where $\{\varepsilon_t\} \sim N(0, \sigma^2)$ is white noise. This model is written as an MA(q) process. For h < q, there is a correlation between $X_t$ and $X_{t-h}$, because they depend on the same error terms $\varepsilon_{t-j}$.
2.1.3 ARMA Model
Combining the AR and MA forms gives the popular autoregressive moving average (ARMA) model, defined as

$$X_t = b_1 X_{t-1} + \cdots + b_p X_{t-p} + \varepsilon_t + a_1 \varepsilon_{t-1} + \cdots + a_q \varepsilon_{t-q} \qquad (2.3)$$

where $\{\varepsilon_t\} \sim N(0, \sigma^2)$ is white noise and (p, q) are the orders of the AR and MA parts. ARMA models are a popular choice for approximating various stationary processes.
2.1.4 ARIMA Model
An autoregressive integrated moving average (ARIMA) model is a generalization of an ARMA model. A time series which needs to be differenced to be made stationary is said to be an “integrated” version of a stationary series. So an ARIMA(p, d, q) process is one where the series needs to be differenced d times to obtain an ARMA(p, q) process. This model, as mentioned in Section 1.6.1, continues to be popular for STLF, and has been used as a benchmark in this work.
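As a hedged illustration of how such a benchmark might be fitted in practice, the sketch below uses the statsmodels ARIMA implementation on a synthetic series; the order (2, 1, 1) and the data are arbitrary choices for demonstration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic non-stationary series: a random walk plus a daily cycle.
rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(0, 1, 500)) + 5 * np.sin(np.arange(500) * 2 * np.pi / 24)

# Fit an ARIMA(p, d, q) model: d=1 differences the series once to make
# it stationary, then an ARMA(2, 1) model is fitted to the differences.
model = ARIMA(y, order=(2, 1, 1)).fit()
print(model.summary().tables[0])
print("first 5 of a 24-step-ahead forecast:", model.forecast(steps=24)[:5])
```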
2.2 Holt Winters Exponential Smoothing Method
2.2.1 Introduction
Exponential smoothing is a procedure whereby the forecast is continually revised in the light of more recent experience. The method assigns exponentially decreasing weights to observations as they get older. It consists of three steps: deciding on the model to use and setting the initial values of the model parameters, updating the estimates of the model parameters, and finally forecasting.
Single exponential smoothing, used for short-range smoothing, assumes that the data
fluctuates around a reasonably stable mean (no trend or seasonality). The double exponential smoothing method is used when the data shows a trend. Finally, the method which is most
interesting for this thesis, triple exponential smoothing, also called Holt-Winters
smoothing, can handle both trend and seasonality.
There are two main Holt-Winters smoothing models, depending on the type of
seasonality – multiplicative seasonal model and additive seasonal model. The difference
between the two is that in the multiplicative case, the size of the seasonal fluctuations
varies, depending on the overall level of the series, whereas in the additive case, the series
shows steady seasonal fluctuations. So an additive seasonal model is appropriate for a
time series when the amplitude of the seasonal pattern is independent of the average level
of the series.
2.2.2 Model Set-up
Consider the case when the series exhibits additive seasonality. In this model, the
assumption is that the time series can be represented by the model

$$y_t = b_1 + b_2 t + S_t + \varepsilon_t \qquad (2.4)$$

where $b_1$ is the base signal, called the permanent component; $b_2$ is a linear trend component, which may be deleted if necessary; $S_t$ is an additive seasonal factor such that, for a season length of $L$ periods, $\sum_{t=1}^{L} S_t = 0$; and $\varepsilon_t$ is the random error component.
2.2.3 Notation Used for the Updating Process
Let the current deseasonalized level of the process at the end of period $T$ be denoted by $\bar{R}_T$. At the end of a time period $t$, let $\bar{R}_t$ be the estimate of the deseasonalized level, $G_t$ the estimate of the trend, and $S_t$ the estimate of the seasonal component.
2.2.4 Procedure for Updating the Estimates of Model Parameters
2.2.4.1 Overall smoothing
$$\bar{R}_t = \alpha (y_t - S_{t-L}) + (1 - \alpha)(\bar{R}_{t-1} + G_{t-1}) \qquad (2.5)$$

where $0 < \alpha < 1$ is a smoothing constant and $S_{t-L}$ is the seasonal factor computed one season ($L$ periods) ago. Subtracting $S_{t-L}$ from $y_t$ deseasonalizes the data, so that only the trend component and the prior value of the permanent component enter into the updating process for $\bar{R}_t$.
2.2.4.2 Smoothing of the trend factor
$$G_t = \beta (\bar{R}_t - \bar{R}_{t-1}) + (1 - \beta) G_{t-1} \qquad (2.6)$$

where $0 < \beta < 1$ is another smoothing constant. The estimate of the trend component is simply the smoothed difference between two successive estimates of the deseasonalized level.
2.2.4.3 Smoothing of the seasonal component
$$S_t = \gamma (y_t - \bar{R}_t) + (1 - \gamma) S_{t-L} \qquad (2.7)$$

where $0 < \gamma < 1$ is the third smoothing constant. The estimate of the seasonal component is a combination of the most recently observed seasonal factor, given by the demand $y_t$ after removing the deseasonalized level estimate $\bar{R}_t$, and the previous best seasonal factor estimate for this time period.
All the parameters of the method, $\alpha$, $\beta$, and $\gamma$, are estimated by minimizing the sum of squared one-step-ahead in-sample errors. The initial smoothed values for the level, trend and seasonal components are estimated by averaging the early observations.
2.2.4.4 Value of forecast
The forecast for the next period is given by

$$\hat{y}_t = \bar{R}_{t-1} + G_{t-1} + S_{t-L} \qquad (2.8)$$

Note that the best available estimate of the seasonal factor for this time period in the season is used, which was last updated $L$ periods ago.
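A compact sketch of the updating equations (2.5)-(2.8) is given below as a plain Python loop; the smoothing constants are fixed by hand rather than estimated, and the crude initialization by averaging follows the description above.

```python
import numpy as np

def holt_winters_additive(y, L, alpha=0.3, beta=0.1, gamma=0.2):
    """One pass of additive Holt-Winters; returns one-step-ahead forecasts."""
    R = np.mean(y[:L])                             # initial level: season 1 mean
    G = (np.mean(y[L:2*L]) - np.mean(y[:L])) / L   # initial trend
    S = list(y[:L] - R)                            # initial seasonal factors
    forecasts = []
    for t in range(L, len(y)):
        forecasts.append(R + G + S[t - L])                          # Eq. (2.8)
        R_new = alpha * (y[t] - S[t - L]) + (1 - alpha) * (R + G)   # Eq. (2.5)
        G = beta * (R_new - R) + (1 - beta) * G                     # Eq. (2.6)
        S.append(gamma * (y[t] - R_new) + (1 - gamma) * S[t - L])   # Eq. (2.7)
        R = R_new
    return np.array(forecasts)

t = np.arange(24 * 28)
demand = 100 + 0.01 * t + 10 * np.sin(2 * np.pi * t / 24)
pred = holt_winters_additive(demand, L=24)
print("MAE:", np.mean(np.abs(pred - demand[24:])))
```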
2.2.5 Exponential smoothing for double seasonality
When dealing with daily load forecasting, the series shows only one significant
seasonality, which is the within-week cycle. Hence the above proposed method can be
satisfactorily applied in that scenario.
But when concerned with hourly load forecasting, there are two seasonalities, the
within-day cycle and the within-week cycle. To handle this double seasonality scenario,
[64] proposes an extension of the classical seasonal Holt-Winters smoothing method.
Using a new formulation where $S_t$ and $T_t$ denote the smoothed level and trend, $D_t$ and $W_t$ are the seasonal indices (intra-day and intra-week), $s_1$ and $s_2$ are the seasonal periodicity lengths for the intra-day and intra-week cycles respectively, $\alpha$, $\gamma$, $\delta$, and $\omega$ are the smoothing parameters, and $\hat{y}_t(k)$ is the $k$-step-ahead forecast made from forecast origin $t$:

$$S_t = \alpha \frac{y_t}{D_{t-s_1} W_{t-s_2}} + (1 - \alpha)(S_{t-1} + T_{t-1}) \qquad (2.9a)$$

$$T_t = \gamma (S_t - S_{t-1}) + (1 - \gamma) T_{t-1} \qquad (2.9b)$$

$$D_t = \delta \frac{y_t}{S_t W_{t-s_2}} + (1 - \delta) D_{t-s_1} \qquad (2.9c)$$

$$W_t = \omega \frac{y_t}{S_t D_{t-s_1}} + (1 - \omega) W_{t-s_2} \qquad (2.9d)$$

$$\hat{y}_t(k) = (S_t + k T_t) D_{t-s_1+k} W_{t-s_2+k} + \phi^k \left( y_t - (S_{t-1} + T_{t-1}) D_{t-s_1} W_{t-s_2} \right) \qquad (2.9e)$$

The multiplicative seasonality formulation has been used here, though it is mentioned in [64] that the additive formulation gives similar results. The term involving $\phi$ is an adjustment for first-order autocorrelation.
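A rough Python sketch of updates (2.9a)-(2.9e), under the simplifying assumptions of hand-picked smoothing parameters, crude seed-by-averaging initialization, and no autocorrelation adjustment (φ = 0), might look as follows.

```python
import numpy as np

def double_seasonal_hw(y, s1=24, s2=168, alpha=0.1, gamma=0.05,
                       delta=0.15, omega=0.15):
    """Multiplicative double seasonal Holt-Winters, one-step-ahead, phi = 0."""
    S, T = float(np.mean(y[:s2])), 0.0
    day = y[:s1] / np.mean(y[:s1])
    D = list(np.tile(day, s2 // s1))          # intra-day indices, crude seed
    W = list(y[:s2] / np.mean(y[:s2]))        # intra-week indices, crude seed
    preds = []
    for t in range(s2, len(y)):
        preds.append((S + T) * D[t - s1] * W[t - s2])     # Eq. (2.9e), k=1, phi=0
        S_new = alpha * y[t] / (D[t - s1] * W[t - s2]) + (1 - alpha) * (S + T)
        T = gamma * (S_new - S) + (1 - gamma) * T                              # (2.9b)
        D.append(delta * y[t] / (S_new * W[t - s2]) + (1 - delta) * D[t - s1])  # (2.9c)
        W.append(omega * y[t] / (S_new * D[t - s1]) + (1 - omega) * W[t - s2])  # (2.9d)
        S = S_new
    return np.array(preds)

t = np.arange(24 * 7 * 8)
y = (100 + 8 * np.sin(2 * np.pi * t / 24)) * (1 + 0.1 * np.sin(2 * np.pi * t / 168))
print("MAPE %:", 100 * np.mean(np.abs(double_seasonal_hw(y) - y[168:]) / y[168:]))
```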
In [65], a comparison of several univariate methods for STLF is presented. Besides
the exponential smoothing for double seasonality described above, the other methods
compared are double seasonal ARIMA model, artificial neural network, and a regression
method with principal component analysis. It is reported that in terms of mean absolute
percentage error (MAPE), the best approach is double seasonal exponential smoothing.
Hence, in this work, the standard Holt-Winters exponential smoothing has been used as a benchmark for daily load forecasting, and the double seasonal exponential smoothing proposed in [64] has been used as a benchmark for hourly load forecasting.
2.3 Nonlinear Models
Regime-switching models were earlier mentioned briefly in Section 1.3.1.4.
Nonlinear models can prove to be better in terms of estimation and forecasting compared
to linear models because of their flexibility in capturing the characteristics of the data. In
this thesis, only the threshold models will be considered.
In order to keep clarity within the various threshold models, a homogeneous notation is used for all the models. Henceforth the following notation will be used:

• $y_t$ is the value of a time series $\{y_t\}$ at time $t$;

• $\tilde{x}_t \in \mathbb{R}^p$ is a $p \times 1$ vector of lagged values of $y_t$ and/or some exogenous variables;

• $x_t \in \mathbb{R}^{p+1}$ is defined as $x_t = [1, \tilde{x}_t^T]^T$, where the first element is referred to as the intercept;

• the general nonlinear model is then expressed as

$$y_t = \Phi(x_t; \psi) + \varepsilon_t \qquad (2.10)$$

where $\Phi(x_t; \psi)$ is a nonlinear function of the variable $x_t$ with parameter vector $\psi$, and $\{\varepsilon_t\}$ is a sequence of independent, normally distributed random variables with zero mean and variance $\sigma^2$;
• the logistic function which is used later on, when defined over the domain $\mathbb{R}^p$, is usually written as

$$f(\gamma(x_t - \beta)) = \frac{1}{1 + \exp(-\gamma(x_t - \beta))} \qquad (2.11a)$$

where $\gamma$, the slope parameter, determines the smoothness of the change between models, i.e. the smoothness of the transition from one regime to another, and $\beta$ can be considered as the threshold which marks the regime switch. In its one-dimensional form, it can be written as

$$f(\gamma(y_{t-d} - c)) = \frac{1}{1 + \exp(-\gamma(y_{t-d} - c))} \qquad (2.11b)$$

where $y_{t-d}$ is usually known as the transition or threshold variable, and $d$ is called the delay parameter.
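The sketch below simply evaluates the one-dimensional logistic transition function (2.11b) for a few slopes, showing how γ controls how sharply the model moves between regimes; the numbers are arbitrary.

```python
import numpy as np

def logistic_transition(y_lag, gamma, c):
    """Eq. (2.11b): smooth 0-to-1 switch around threshold c with slope gamma."""
    return 1.0 / (1.0 + np.exp(-gamma * (y_lag - c)))

y = np.linspace(-2, 4, 7)
for gamma in (0.5, 2.0, 50.0):      # large gamma approaches a TAR step function
    print(f"gamma={gamma:5.1f}:", np.round(logistic_transition(y, gamma, c=1.0), 3))
```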
2.3.1 Threshold Autoregressive Model (TAR)
To overcome the limitations of the linear approach, the threshold autoregressive (TAR) model was proposed, which allows for a locally linear approximation over a number of regimes. It can be formulated as

$$y_t = \sum_{i=1}^{k} \omega_i x_t \, I(s_t \in A_i) + \varepsilon_t \qquad (2.12a)$$

$$= \sum_{i=1}^{k} \{\omega_{i,0} + \omega_{i,1} y_{t-1} + \omega_{i,2} y_{t-2} + \cdots + \omega_{i,p} y_{t-p}\} \, I(s_t \in A_i) + \varepsilon_t \qquad (2.12b)$$

where $s_t$ is the threshold variable, $I$ is an indicator (or step) function, $\omega_i$ is the vector of autoregressive parameters for the $i$th linear regime, and $\{A_i\}$ forms a partition of $(-\infty, \infty)$, with $\bigcup_{i=1}^{k} A_i = (-\infty, \infty)$ and $A_i \cap A_j = \emptyset$ for all $i \neq j$. So basically one of the autoregressive models is activated, depending upon the value of the threshold variable $s_t$ relative to the partition $\{A_i\}$.
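To make this concrete, here is a small sketch of a two-regime SETAR model, where the lagged value itself selects which AR(1) regime generates the next observation; the coefficients and threshold are invented for illustration.

```python
import numpy as np

def simulate_setar(n, c=0.0, d=1, coefs=((0.2, 0.9), (1.0, -0.4)), sigma=0.3):
    """Two-regime SETAR: regime chosen by whether y_{t-d} exceeds threshold c."""
    rng = np.random.default_rng(6)
    y = np.zeros(n)
    for t in range(d, n):
        w0, w1 = coefs[0] if y[t - d] <= c else coefs[1]   # indicator I(s_t in A_i)
        y[t] = w0 + w1 * y[t - 1] + rng.normal(0, sigma)   # active AR(1) regime
    return y

y = simulate_setar(500)
# Regime occupancy: fraction of time the threshold variable was above c.
print("share of observations in regime 2:", np.mean(y[:-1] > 0.0))
```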
2.3.2 Smooth Transition Autoregressive Model (STAR)
If one has good reason to believe that the transitions between the regimes are smooth, and not discontinuous as assumed by the TAR model, then one can choose the smooth transition autoregressive (STAR) model. In this model, the indicator function $I(\cdot)$ changes from a step function to a smooth function, such as the sigmoid of Equation 2.11b. The STAR model with $k$ regimes is defined as

$$y_t = \sum_{i=1}^{k} \omega_i x_t \, F_i(s_t; \gamma_i, c_i) + \varepsilon_t \qquad (2.13)$$
The transition function $F(s_t; \gamma_i, c_i)$ is a continuous function bounded between 0 and 1. The transition variable $s_t$ can be a lagged endogenous variable, that is, $s_t = y_{t-d}$ for a certain integer $d > 0$. But this is not a required assumption, as the transition variable can also be an exogenous variable ($s_t = z_t$), or a (possibly nonlinear) function of lagged endogenous variables, $s_t = h(\tilde{x}_t; \alpha)$, for some function $h$ which depends on the $(p \times 1)$ parameter vector $\alpha$. Finally, a model with smoothly changing parameters is obtained if the transition variable is a linear time trend, i.e. $s_t = t$.
The observable variable st and the associated value of F(st; γi, ci) determine the
regime that occurs at time t. Different types of regime-switching behavior can be obtained
by different choices for the transition function. The first-order logistic function, Equation
2.11b is a popular choice for F(st; γi, ci), and the resultant model is called a logistic STAR
(LSTAR).
In the LSTAR model, the transition function $F_i(s_t; \gamma_i, c_i)$ in Equation 2.13 is defined as

$$F_i(s_t; \gamma_i, c_i) = \begin{cases} 1 - f(s_t; \gamma_i, c_i) & \text{if } i = 1 \\ f(s_t; \gamma_i, c_i) - f(s_t; \gamma_{i+1}, c_{i+1}) & \text{if } 1 < i < k \\ f(s_t; \gamma_i, c_i) & \text{if } i = k \end{cases}$$
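Under the convention just given (and assuming ordered thresholds), the regime weights of a k-regime LSTAR can be computed as in the sketch below; it reuses the logistic function of (2.11b) with invented slopes and thresholds.

```python
import numpy as np

def logistic(s, gamma, c):
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def lstar_weights(s, gammas, cs):
    """Transition functions F_i for a k-regime LSTAR at threshold value s."""
    k = len(gammas)
    f = [logistic(s, gammas[i], cs[i]) for i in range(k)]
    F = [1.0 - f[0]]                                   # i = 1
    F += [f[i] - f[i + 1] for i in range(1, k - 1)]    # 1 < i < k
    F.append(f[k - 1])                                 # i = k
    return np.array(F)

# Three regimes with ordered thresholds; weights shift smoothly with s.
for s in (-1.0, 0.5, 2.5):
    print(s, np.round(lstar_weights(s, gammas=[4, 4, 4], cs=[0.0, 1.0, 2.0]), 3))
```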
Figure 6.4. The histogram of the residual error for New England hourly data
TABLE 6.3: PERCENTAGE OF HOURS WITH A CERTAIN MAPE RANGE FOR NEW ENGLAND DATA
MAPE Range   % of Hours
< 1%         24.2
1% to 2%     20.7
2% to 3%     17.4
3% to 4%     11.7
4% to 5%     8.7
5% to 6%     6.7
6% to 7%     4.0
> 7%         6.6
At the 95% confidence level, the null hypothesis of normality is rejected for the obtained residuals by all three tests: the Kolmogorov-Smirnov test, the Lilliefors test and the Jarque-Bera test. In Figure 6.4, comparing the histogram of the residuals with that of a normal distribution, we can see that the residuals have fat tails. This means that from time to time there are rather large errors which are hard to reconcile with the standard distributional assumption of normality. In a fat-tailed distribution, the probability of very large and very small values is much higher than would be implied by a normal distribution. This explains why the hypothesis of normality is rejected by the tests.
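A hedged sketch of how such tests might be run on a residual vector is shown below, using scipy's Jarque-Bera and Kolmogorov-Smirnov tests and the Lilliefors test from statsmodels; the heavy-tailed synthetic residuals stand in for the model's actual errors.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(7)
residuals = rng.standard_t(df=3, size=2000)   # fat-tailed stand-in residuals

z = (residuals - residuals.mean()) / residuals.std()
print("Kolmogorov-Smirnov p:", stats.kstest(z, "norm").pvalue)
print("Lilliefors p:", lilliefors(residuals)[1])
print("Jarque-Bera p:", stats.jarque_bera(residuals).pvalue)
```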
Consider why the fat tails are present in the error residuals. This thesis hypothesizes that sudden changes in weather, the summer effect and the winter effect, are responsible for the fat tails. The winter months of December and January and the summer months of June and July often include some extremely high demand days, because of a sudden heat wave or a sudden cold wave on those particular days. The NCSTAR model does not consider exogenous weather forecasts in its inputs, because it assumes that weather variables evolve in a smooth fashion and that the load series can sufficiently capture the change. This assumption leads to bigger errors on the days when the weather changes suddenly; hence the fat tails.
To support this hypothesis, consider Figure 6.5. Let $L_k$ and $L_{k+1}$ denote the load demand at consecutive hours $k$ and $k+1$ respectively, so that $\Delta = L_{k+1}/L_k$ denotes the change over the two consecutive hours. Figure 6.5 plots the histogram of $\Delta$ for the four years of data. The histogram appears to be the superposition of three bell-shaped segments. The
biggest bell-shaped segment is in the middle, around ∆ = 1. This corresponds to the
normal weather, when the weather changes smoothly. Then there are two smaller bell-shaped segments, one below and one above the central one. These are created by more-than-normal or less-than-normal changes in demand over two consecutive hours due to weather effects. Hence weather effects lead to fat tails, which in turn lead to the rejection of the hypothesis of normality of the error residuals.
Figure 6.5. Hourly change $\Delta = L_{k+1}/L_k$ for New England hourly data
6.4.1.8 Benchmarking
The benchmark used is the semigroup based system-type neural network (ST-NN)
architecture proposed in [104][105]. In this method, the network is decomposed into two
channels - a semigroup channel and a function channel. The semigroup channel models
the dependency of the load on temperature, whereas the functional channel represents the
fundamental characteristics of daily load cycles.
Table 6.4 compares the two models for the seven days of the week. Clearly NCSTAR outperforms the ST-NN model on each day, and the improvements in accuracy are largest for the weekend days. Looking at the MAPE values for the combined data, the MAPEs are 3.75% and 3.07% for the ST-NN and NCSTAR methods respectively. NCSTAR improves the results by 0.68% over ST-NN, which is a significant improvement. In Table 6.5, the results are compared for the twelve months of the year. While ST-NN and NCSTAR perform rather similarly in the easy-to-predict months of March, April and May, it can be seen that NCSTAR provides a much better result in the months which form the transition between seasons, such as June and September.
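For reference, the error measure used in these comparisons is the mean absolute percentage error; a one-line implementation on illustrative arrays is sketched below.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

print(mape([100, 110, 95], [97, 112, 99]))  # roughly 3.0 (percent)
```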
TABLE 6.4: COMPARISON OF PREDICTION RESULTS FOR NEW ENGLAND FOR EACH DAY OF THE WEEK
Day        ST-NN MAPE (%)  NCSTAR MAPE (%)
Monday     4.01            3.00
Tuesday    3.42            2.73
Wednesday  3.24            2.86
Thursday   3.24            2.96
Friday     3.50            3.23
Saturday   4.61            3.42
Sunday     4.26            3.07
Total      3.75            3.07
TABLE 6.5: COMPARISON OF PREDICTION RESULTS FOR NEW ENGLAND FOR INDIVIDUAL MONTHS
Month      ST-NN MAPE (%)  NCSTAR MAPE (%)
January    2.91            2.83
February   2.70            2.42
March      2.79            2.90
April      3.20            3.23
May        3.10            3.18
June       4.52            2.41
July       3.97            3.79
August     5.28            5.00
September  6.10            3.23
October    2.85            2.42
November   3.19            2.10
December   4.40            3.25
6.4.1.9 Confidence Interval of the Predicted Value
We have already discussed how to deliver a single point forecast. However, the
decision-maker needs additional information bounding possible future values of the
forecasted process, in order to assess the uncertainties involved in prediction. This can be
done by looking at the confidence interval of the predicted values. Confidence intervals
should be as narrow as possible, while encompassing enough of the true values to justify the reliability of the model.
Figure 6.6 shows the actual load demand and the 95% confidence interval for the
predicted load demand for three representative weeks of the year, the first week of
February, July and November respectively. The confidence interval was calculated after
running 1000 simulations of the proposed hybrid model. From the figure, it can be seen
that the width of the confidence interval changes over the day. As a general rule, the
confidence interval is narrow during the non-working hours, such as night-time, and
widens during the working hours. The confidence interval is widest for the periods which are the most difficult to predict; which periods these are depends on factors such as seasonality, as can be seen in the figures for the three weeks of February, July and November.
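One hedged way to obtain such intervals, consistent with the 1000-simulation procedure mentioned above, is to rerun a stochastic forecaster many times and take empirical percentiles of the simulated forecasts, as sketched below with a stand-in noise model.

```python
import numpy as np

rng = np.random.default_rng(8)

def noisy_forecast(horizon=24):
    """Stand-in for one stochastic run of the hybrid model: a daily load
    profile perturbed by random parameter/noise realizations."""
    base = 100 + 20 * np.sin(2 * np.pi * np.arange(horizon) / 24)
    return base + np.cumsum(rng.normal(0, 1.0, horizon))

runs = np.array([noisy_forecast() for _ in range(1000)])   # 1000 simulations
lower, point, upper = np.percentile(runs, [2.5, 50, 97.5], axis=0)
print("hour 18 forecast:", round(point[18], 1),
      "95% interval:", (round(lower[18], 1), round(upper[18], 1)))
```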
Figure 6.6. Actual load demand (on left) and the 95% confidence interval (on right) for the
predicted load demand for three representative weeks of the year, the first week of February, July
and November respectively. The right side figures show the predicted value (thick line) and the
upper and lower bounds of the 95% confidence interval (thin lines)
6.4.2 Alberta Market
6.4.2.1 Data Source and Period
Five years of publicly available hourly load demand data for the Alberta market
[106] in Canada is used. Data from January 1, 2000 to December 31, 2002 is used to train
the network, data from January 1, 2003 to December 31, 2003 is used as a validation set,
whereas data from January 1, 2004 to December 31, 2004 is used to calculate the out-of-sample
accuracy. The model is built using the steps of model identification, model
estimation and model validation as explained earlier in Section 6.2.
6.4.2.2 Benchmark with AESO
The forecast load published by the Alberta Electric System Operator (AESO) is a
reasonable benchmark for the proposed NCSTAR model. For the test period of 2004, the
MAPE of NCSTAR is 1.08% while the MAPE of AESO is 1.26%.
Tables 6.6 and 6.7 show the breakdown of MAPE results over the days of the week and
the months of the year for the period 2004 respectively. Clearly the NCSTAR model has
been able to significantly improve the forecasting results over AESO.
TABLE 6.6: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR ALBERTA FOR EACH DAY OF THE WEEK

            AESO    NCSTAR
Monday      1.44    1.23
Tuesday     1.19    1.11
Wednesday   1.09    1.04
Thursday    1.03    0.93
Friday      1.18    1.06
Saturday    1.29    1.03
Sunday      1.36    1.11
Total       1.26    1.08
TABLE 6.7: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR ALBERTA FOR INDIVIDUAL MONTHS

            AESO    NCSTAR
January     1.26    1.25
February    0.90    1.00
March       1.41    0.87
April       1.11    0.96
May         1.20    0.89
June        1.29    1.06
July        1.61    1.33
August      1.12    1.22
September   1.28    0.93
October     1.02    0.98
November    1.13    0.92
December    1.36    1.44
Looking at the monthly MAPE breakdown for both NCSTAR and AESO, we notice
that the MAPE is high for two periods: the winter months of December and January, and
the summer months of June and July. Both periods correspond to a relatively higher
demand, and they often include some extremely high demand days caused by a sudden
heat wave or cold wave. As the NCSTAR model does not incorporate future weather or
temperature information, the prediction accuracy deteriorates during these periods. The
absence of meteorological variables in the NCSTAR model is nevertheless justified: for
short lead times like one-day-ahead, the meteorological variables usually evolve in a very
smooth fashion, and the load series itself can sufficiently capture the change.
6.4.2.3 Benchmark with PREDICT2-ES, ARIMA and ANN
We also compare our work against a recent model proposed in [107]. This is the
PREDICT2-ES model, a predictor based on non-linear chaotic dynamics, with five
optimal parameters searched through an evolutionary strategy. Comparisons are made for
four weeks of the year 2004, each representing a different season. The results are shown
in Table 6.8. ARIMA refers to the autoregressive integrated moving average methodology
developed by Box and Jenkins [15]. In Table 6.8, ANN refers to artificial neural networks.
The MAPEs shown for ARIMA, ANN and PREDICT2-ES are taken from [107]. AESO
refers to the error of the forecast load published by AESO for the same period. It can be
seen that ARIMA and ANN have worse accuracy than the other three approaches. AESO
and NCSTAR are a slight improvement over the PREDICT2-ES model, and the accuracies
of AESO and NCSTAR are comparable for these studied periods.
TABLE 6.8: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR ALBERTA

Test period               ARIMA   ANN     PREDICT2-ES   AESO    NCSTAR
2/16/2004 - 2/22/2004     1.440   2.130   0.945         0.877   0.877
5/11/2004 - 5/17/2004     1.070   1.100   0.812         0.832   0.840
8/16/2004 - 8/22/2004     2.540   2.130   1.272         0.986   1.195
10/25/2004 - 10/31/2004   1.500   0.820   0.745         0.727   0.652
6.4.2.4 Error Analysis
Here we apply methods from the model validation stage to see whether the
NCSTAR and AESO models are able to model the data adequately. Error residuals are
obtained for both models, and histograms are plotted with a best-fit normal distribution
superimposed on top. The results are shown in Figure 6.7 for (a) the NCSTAR model and
(b) the AESO model. As with the error residuals from the New England market, here too
the null hypothesis of a normal distribution is rejected for both models. This is possibly
due to fat tails, or outliers caused by unexpected weather conditions, as was described in
Section 6.4.1.7 for the New England data. Comparing the two plots, not much can be said,
except that the frequency for the NCSTAR model is higher around the centre compared to
the AESO model, and the histogram appears more symmetric for NCSTAR than for
AESO.
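As a sketch of this validation step, residual normality can be checked as follows; the choice of the D'Agostino-Pearson test (SciPy's normaltest) is an illustrative assumption, since the text does not name the specific test used:

```python
import numpy as np
from scipy import stats

def check_residual_normality(residuals, alpha=0.05):
    """Fit a normal distribution to the residuals and test the null
    hypothesis of normality; fat tails push the p-value towards zero."""
    residuals = np.asarray(residuals, dtype=float)
    mu, sigma = residuals.mean(), residuals.std(ddof=1)  # best-fit normal
    stat, p_value = stats.normaltest(residuals)          # D'Agostino-Pearson
    rejected = p_value < alpha
    return mu, sigma, p_value, rejected
```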
For further comparison between the two models, it is necessary to look at the
distribution of forecasting errors across ranges, which is shown in Table 6.9. The
NCSTAR model has 57.6% of hours with a MAPE of less than 1%, compared to only
50.7% of hours for the AESO model. Thus the NCSTAR model achieves better accuracy
than AESO for a larger number of hours.
Figure 6.7. The histogram of the residual error for Alberta hourly data for (a) NCSTAR
model and (b) AESO model
TABLE 6.9: PERCENTAGE OF HOURS WITHIN EACH MAPE RANGE FOR ALBERTA DATA

MAPE Range   % of Hours, NCSTAR model   % of Hours, AESO model
< 1%         57.6                       50.7
1% to 2%     28.0                       29.8
2% to 3%     9.5                        12.3
3% to 4%     3.5                        4.8
> 4%         1.4                        2.4
6.4.3 New South Wales Market
Here the results for NCSTAR are compared against the results for a Bayesian
Markov chain scheme proposed in [108], on six months of data from the New South
Wales market [109] in Australia. That model takes a Bayesian approach to estimating
multi-equation models for intra-day load forecasting, where a first-order vector
autoregression is used for the errors.
The training period is from January 1, 1998 to December 31, 2000. The test period
is February 1, 2001 to July 31, 2001. The results for the six months of data and the
monthly breakdown are shown in Table 6.10 and Table 6.11 respectively. Clearly, the
NCSTAR model improves the prediction accuracy not only over the complete six months
of data, but also over the individual months.
TABLE 6.10: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR NEW SOUTH WALES

            Mean         Mean        Median       Median
            (Bayesian)   (NCSTAR)    (Bayesian)   (NCSTAR)
Weekdays    3.10         2.17        3.10         1.82
Weekends    3.43         2.31        3.33         2.01
TABLE 6.11: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR NSW FOR INDIVIDUAL MONTHS

            Weekdays             Weekends
            Bayesian   NCSTAR    Bayesian   NCSTAR
February    3.97       2.25      4.00       2.63
March       3.17       2.79      3.44       2.78
April       2.31       2.05      3.06       1.46
May         3.14       2.00      3.74       2.51
June        4.45       2.20      4.53       2.32
July        2.17       1.81      2.33       2.02
6.4.4 England and Wales Market
Three markets have been studied so far to show the superior performance of
NCSTAR over the ISOs and other benchmarks for hourly load forecasting. Now, instead
of hourly load forecasting, we turn to daily load forecasting, which is also an interesting
problem for the power industry. Just as season effects and market effects lead to multiple
regimes in the hourly demand series, the same effects lead to multiple regimes in the daily
demand series. Hence NCSTAR is a suitable model for daily demand series as well.
The data used here is obtained from the National Grid [93], the transmission
company for the England and Wales market. Daily load demand is obtained by summing
the hourly demands of each day, as sketched below. The benchmark is an ARIMA model
proposed in [94]. The test period is the year 2003. While [94] uses data from 1970 to 1998
to identify their ARIMA model, only data from the three preceding years, 2000 to 2002, is
used to obtain the NCSTAR model.
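A one-line illustration of this aggregation, assuming the hourly loads are held in a pandas series indexed by timestamps; the variable and function names are hypothetical:

```python
import pandas as pd

def to_daily(hourly_load: pd.Series) -> pd.Series:
    """Daily load demand as the sum of the hourly demands of each day."""
    return hourly_load.resample("D").sum()
```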
The results are shown in Table 6.12. They are impressive, considering that the
NCSTAR model applied here had no special treatment for holidays, not even averaging.
On the other hand, [94] included a separate model for special days, which was possible
because their longer training period provides enough samples of special days.
TABLE 6.12: COMPARISON OF PREDICTION RESULTS (MAPE, %) FOR ENGLAND AND WALES FOR INDIVIDUAL MONTHS OF 2003

            ARIMA   NCSTAR
January     2.91    2.09
February    2.53    1.19
March       1.76    2.50
April       3.02    1.40
May         2.28    2.08
June        1.21    0.89
July        2.22    0.84
August      1.96    1.78
September   1.48    1.12
October     2.20    1.38
November    1.68    1.52
December    3.61    2.22
Total       2.24    1.59
6.4.5 Singapore Market
Finally, the NCSTAR model is applied to the Singapore market data. But first, a
brief look at the seasonality pattern in Singapore is in order.
Seasons, as such, do not exist in Singapore. Singapore has a tropical rainforest
climate with no distinctive seasons: uniform temperature and pressure, high humidity and
abundant rainfall. May and June are the hottest months, while November and December
make up the wetter monsoon season. Now consider the (semi-hourly) electricity demand
for 2006, as shown in Figure 6.8. It can be clearly seen that seasonality effects are not
present here, though one might note that the beginning and end of the year, i.e. December,
January and February, have a slightly lower demand. This could be because of the wetter
monsoon season and/or holiday season effects.
Figure 6.8. Semi-hourly electricity demand in Singapore from 1 Jan 2006 to 31 Dec 2006
For NCSTAR, the training period is the data from January 1, 2005 to December 31,
2007, the validation data is from January 1, 2008 to December 31, 2008, and finally the
testing period is from January 1, 2009 to December 31, 2009. The model is built using the
steps of model identification, model estimation and model validation as explained earlier
in Section 6.2. The load on holidays was replaced by the mean of the load demand one
week earlier and one week later.
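A minimal sketch of this holiday treatment, assuming the series is held in pandas with a datetime index; the function and its interface are assumptions for the sketch:

```python
import pandas as pd

def replace_holidays(load: pd.Series, holidays) -> pd.Series:
    """Replace the load on holiday dates with the mean of the load at the
    same time of day one week before and one week after.

    load     : series indexed by a DatetimeIndex (hourly or semi-hourly)
    holidays : iterable of datetime.date objects
    """
    cleaned = load.copy()
    week = pd.Timedelta(days=7)
    holiday_mask = pd.Index(cleaned.index.date).isin(set(holidays))
    for t in cleaned.index[holiday_mask]:
        before, after = cleaned.get(t - week), cleaned.get(t + week)
        neighbours = [v for v in (before, after) if v is not None and pd.notna(v)]
        if neighbours:  # fall back to a single neighbour at the series edges
            cleaned.loc[t] = sum(neighbours) / len(neighbours)
    return cleaned
```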
The MAPE results, broken down by day of the week and by month, are shown in
Table 6.13 and Table 6.14 respectively.
TABLE 6.13: MAPE PREDICTION RESULTS (%) FOR SINGAPORE FOR EACH DAY OF THE WEEK

            NCSTAR
Monday      1.43
Tuesday     1.50
Wednesday   1.13
Thursday    1.35
Friday      1.88
Saturday    1.54
Sunday      1.15
Total       1.43
TABLE 6.14: MAPE PREDICTION RESULTS (%) FOR SINGAPORE FOR INDIVIDUAL MONTHS

            NCSTAR
January     1.49
February    1.10
March       1.25
April       1.38
May         1.68
June        1.48
July        1.33
August      1.48
September   1.57
October     1.78
November    1.33
December    1.21
Unfortunately, the author was unable to find a suitable benchmark to compare this
work against. A suitable benchmark would have been a paper with at least one year of test
data, preferably the year 2009. The strength of the proposed NCSTAR model lies in its
ability to model smooth transitions between multiple regimes, and seasonality is a
prominent source of multiple regimes in the load demand time series. In order to cover all
the seasons at least once, at least one year of data is needed for testing. Unfortunately, no
such paper was available for benchmarking.
The total MAPE obtained for Singapore is 1.43%. Other papers have earlier
obtained MAPEs significantly lower than this result [110][111]. As mentioned earlier, one
reason why NCSTAR fails to obtain competitive results for the Singapore data is the lack
of multiple regimes due to the lack of seasonality, so the smooth transition mechanism of
the NCSTAR model brings little benefit here.
6.4.6 Speed and Complexity Issues
The proposed NCSTAR model with SOM-based initialization trains more accurately
and faster than the original NCSTAR model proposed in [68], whose weight initialization
involves parallel hyperplanes, as discussed in Section 6.2.6. The reason for this faster
training, as discussed earlier, is that the SOM-based initialization starts the training from a
point close to the global optimum, so the training has less chance of getting stuck in a
local minimum, which is a major concern for the MLP. This is shown in Figure 6.9. For
the Alberta data, the NCSTAR model is implemented twice: once with the proposed
SOM-based initialization, and once with the original weight initialization proposed in
[68]. In both scenarios, training uses the regular back-propagation algorithm [76].
Researchers have devised heuristics such as a momentum term, and second-order
algorithms such as conjugate gradient descent and the Levenberg-Marquardt algorithm, to
speed up learning compared to back-propagation and to help prevent getting stuck in local
minima; but for benchmarking purposes, the regular back-propagation algorithm is used
for both models.
Figure 6.9 shows the fall of the mean square error (MSE) over 1000 iterations for
Monday. The continuous line is the SOM-based initialized version, and the dotted line is
the original NCSTAR initialization proposed in [68]. The MSE falls faster, and deeper, for
the SOM-based initialization than for the original initialization. Hence, the training is both
more accurate and faster.
Figure 6.9. Comparison of training performance of SOM-based initialization (solid line)
and originally proposed initialization (dotted line) for the NCSTAR model
In terms of absolute time, the program is run on a Core 2 Duo 3 GHz processor. A
separate model is built for each hour of the week, i.e. 168 models. Building all the models
takes 84.71 seconds in a typical run, i.e. about 0.50 seconds per hourly model. This is a
significant amount of time, especially when compared to traditional methods such as
ARIMA, which are faster. The model is time consuming because it must first cluster the
data and then train an MLP for each hour of each day; this is necessary to handle the
weekend effect.
Building an hourly model involves three time-consuming steps: the SOM clustering,
the Ho-Kashyap training, and the back-propagation training of the MLP. For a typical
hourly model, these three steps consume 0.27, 0.09 and 0.12 seconds respectively. The
most time consuming step is the first one, i.e. the clustering with the SOM.
Consider the complexity of the NCSTAR model. Usually this is discussed in terms
of the number of input/output variables and the complexity of the learning algorithm. The
NCSTAR model has two components (a sketch of the combined prediction step follows
the list):
• a linear autoregressive model for prediction (Equation 6.1). The input is zt, which
contains lagged values of yt. How many lags, and which ones, to include is
determined by the ACF, as described in the section on model identification. Usually
only three lags are taken: lag 1, lag 24 and lag 168.
• the coefficients of the linear model, which are the output of a single hidden layer
MLP (Equation 6.2). Here the input is xt, which needs to have 24 variables: the
lagged values of the 24 hours of the previous day. It is important to include all 24
lags because the data is clustered on the basis of historical daily load profile
patterns, and a pattern might appear in any section of the day.
Hence, overall there are 28 variables as input to the NCSTAR model, which means the
complexity is fairly high, though in the literature it is not uncommon to find STLF models
with this many or more variables. Also, because the model assumes that weather variables
change smoothly (and that these changes can be captured by the load demand itself), no
exogenous variables have to be included, which keeps the model relatively simple.
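As an illustration of how these two components combine at prediction time, a minimal sketch assuming the functional form of Equations 6.1 and 6.2 (local AR coefficients blended by logistic hidden-layer activations); all names and shapes here are assumptions of this sketch, not the exact implementation:

```python
import numpy as np

def ncstar_forecast(z_t, x_t, theta, omega, beta):
    """One-step NCSTAR forecast (sketch).

    z_t   : (p,)   lagged loads for the AR part, e.g. lags 1, 24 and 168
    x_t   : (q,)   transition variables: the 24 hourly loads of the previous day
    theta : (h, p) one local AR coefficient vector per hidden neuron (regime)
    omega : (h, q) hidden-layer weights (separating-plane normals)
    beta  : (h,)   hidden-layer biases
    """
    g = 1.0 / (1.0 + np.exp(-(omega @ x_t + beta)))  # smooth regime memberships
    blended_coeffs = theta.T @ g                     # (p,) effective AR coefficients
    return float(z_t @ blended_coeffs)
```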
Finally, the learning algorithm is the concentrated least squares method, a modified
back-propagation algorithm that exploits certain linearities in the model to reduce the
dimensionality of the iterative estimation problem, as discussed in Section 6.2.4.
6.5 Limitations of the Proposed Model
The strength of the proposed model lies in its ability to model the smooth transition
between multiple regimes. The model may therefore not be well suited to power markets
where the load pattern does not show multiple regimes. As strong seasonal effects are an
important cause of multiple regimes, the model may not perform well enough for power
markets with no seasonal effects. This can be seen from the results: for the Singapore
data, the results are not as impressive as for the previous three markets of New England,
Alberta and New South Wales, and a likely reason is the lack of strong seasonality in
Singapore.
6.6 Discussion and Conclusion
In this chapter, it is first explained why the NCSTAR model, with its multivariate
thresholds and smooth transitions between regimes, is suitable for short-term load
forecasting: the load demand is highly dependent on seasonal factors, and seasons tend to
change gradually. Next, the inadequacies of the current methods of initializing the weights
of the NCSTAR neural network are highlighted, and the importance of having good initial
network weights is explained. Finally, a two-step method to obtain fairly good initial
weights is proposed. In the first step, unsupervised learning is used to cluster the historical
data into separate regimes. The second step uses the Ho-Kashyap algorithm to find the
equations of the separating planes, which are then used to initialize the weights of the
hidden layer neurons. Experiments on four prominent energy markets show that the
proposed method gives competitive results in terms of prediction accuracy for hourly load
forecasting as well as daily load forecasting.
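As an illustration of the second step, a minimal sketch of the Ho-Kashyap procedure following its standard formulation in pattern-recognition texts [102]; the interface, step size and stopping rule are assumptions of this sketch:

```python
import numpy as np

def ho_kashyap(Y, lr=0.1, n_iter=1000, tol=1e-3):
    """Ho-Kashyap procedure for a separating hyperplane between two clusters.

    Y : (n, d+1) augmented sample matrix, with the samples of the second
        cluster negated, so that a separating weight vector a satisfies Y a > 0.
    Returns the weight vector a, whose components can initialize one hidden neuron.
    """
    n = Y.shape[0]
    b = np.ones(n)                       # margin vector, kept positive
    Y_pinv = np.linalg.pinv(Y)
    a = Y_pinv @ b
    for _ in range(n_iter):
        e = Y @ a - b
        e_plus = 0.5 * (e + np.abs(e))   # only grow b where the error is positive
        b = b + 2.0 * lr * e_plus
        a = Y_pinv @ b
        if np.all(np.abs(e) < tol):      # converged to a separating solution
            break
    return a
```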
A notable advantage of the proposed method is that it can easily handle the presence
of multiple regimes in the electricity load data, which might occur due to seasonal effects
or market effects. This is because the NCSTAR model works with a weighted sum of
several local AR models instead of a single global model for the whole series. This
handling of multiple regimes is a desirable property because, in a deregulated power
market, players will continue to bring in new dynamic bidding strategies which will
introduce more local regimes into the price-dependent load demand series.
The proposed method of weight initialization for the NCSTAR model also makes it
more robust to initial conditions because the first step of the initialization method involves
a SOM, which is generally robust against bad initializations.
In the present model, exogenous variables such as weather factors like temperature
or humidity are not included. This can be justified by the observation that for short lead
times like one-day-ahead, the weather variables evolve in a smooth fashion. However, it is
also noted that the prediction accuracy is worst for the peak winter and peak summer
months, which are most associated with sudden cold waves and heat waves respectively.
Assuming that good weather forecasts are available, it will be interesting to incorporate
weather factors into the model in such a way that they influence the predicted load
demand only if the predicted weather differs significantly from the historically observed
normal weather for the next day. Furthermore, which weather variable to consider will
depend on the characteristics of the energy market being studied: while humidity might be
an important factor for load forecasting in tropical markets, it matters much less in
temperate markets. Future work in this field will have to investigate these characteristics
deeply before weather variables can be incorporated into the model to improve the
prediction accuracy further.
Chapter 7 Conclusion
The purpose of this thesis was to approach the problem of short term load
forecasting using hybrid methods. This work was primarily concerned with SOM-based
hybrid models. The aim was to show that unsupervised learning methods such as SOMs,
which are usually not associated much with time series prediction, can be hybridized
judiciously with other time series models such as a linear AR model or a non-linear STAR
model, to give good results for STLF.
The first hybrid method proposed used SOMs for data vector quantization and
clustering. SOMs were used to divide the historical dynamics into several clusters. Thus,
this method implied working with several local models instead of a single global model.
Each local model handles only a small section of the dynamics. Besides, each local model
can be a simple model, such as the AR model in the proposed hybrid model. The model
was shown to give more accurate results than the popular ARIMA time series model.
In the previous model, the transition from one local model to another was a discrete
event, i.e. the jump from one local model to another was immediate, as a series segment
can belong to only one local model. A better way is to model the transitions as smooth. In
nature, when a regime caused by cold weather changes to a regime caused by mild
weather, the transition is a gradual one; the same can be said of other regime changes,
caused by season effects or market effects. Obviously, the smoothness of the transition
varies. A smooth transition model helps to improve the accuracy during the transition
period. In the second hybrid model, the smooth transition between regimes is modeled.
First, a linear-neural model is described which is capable of handling multiple regimes
and multiple transition variables. Next, a new method is proposed to intelligently
initialize the weights of the hidden layer of the neural network before its training. Finally,
the model is implemented on four electricity markets to show its competitiveness
compared to popular time series models and other recently proposed models.
The unique contribution of this thesis is the in-depth study of smooth transition
models for the area of short-term load forecasting. Though regime-switching models,
specifically smooth transition autoregressive models, have been popular in econometrics
[112][113], a detailed study of smooth transition models has been lacking for the domain
of STLF. As the electricity load demand series has its own stylized characteristics, it is
interesting to see how the smooth transition approach applies in this field. In this thesis, a
model building procedure is developed for the smooth transition approach, involving
identification, estimation and validation stages.
Another unique contribution of this thesis is the new approach to initializing the
weights of the NCSTAR model. This new approach, involving the SOM and the
Ho-Kashyap methodology, was required because the original NCSTAR model proposed a
global search for weight initialization, which is not an ideal approach for initializing the
weights of a multilayer perceptron. Though this new weight initialization method has
been proposed here for STLF, it is a very general method which can easily be extended to
other time series prediction fields as well.
A final unique contribution of this thesis is the idea, explored in the first hybrid
model, that the inputs to the SOM can be weighted in order to achieve improved
forecasting accuracy. Not much work has been done in the field of weighted SOMs. In
this thesis, the weights for the SOM inputs were chosen to be the autocorrelation
coefficients between time lags. For the domain of short-term load forecasting this is
interesting because the autocorrelation varies periodically due to the weekend effect. It
was shown that autocorrelation weighting improves the prediction accuracy for STLF
under certain situations.
In terms of future lines of research, the following paths remain open for future
developments:
• For STLF, the question of whether to include exogenous variables such as
weather-related variables can be addressed. If exogenous weather variables are
included, a study will need to be done comparing the various available weather
variables to find which one contributes most to improving the accuracy. An
appropriate study will also be required to find whether the relationship between
load and weather is linear or not. A good idea would be to weight the effect of the
weather variable such that if the forecast value differs significantly from the
historically observed value, it is given more weight.
• In the first proposed hybrid model, the autocorrelation coefficient is used as a
measure of dependence within a time series. This is just one, and a rather simple
one, amongst many measures of dependence within a time series. Other measures
have been proposed, such as Cohen's kappa, Cramér's v, and Goodman and
Kruskal's lambda, e.g. in [114]. Weighting the SOM inputs with these measures
can be implemented.
• In the first proposed hybrid model, the bigger idea is to build local models instead
of a global model. The SOM is just one amongst many possible tools for
clustering and subsequently building a local model. Other clustering algorithms,
statistical as well as neural network based, can be explored and their performances
compared.
In the second proposed hybrid model, the SOM is used to initialize the weights of
the hidden layer of a neural network. Choosing initial weights continues to be a
challenging problem in other computational intelligence methods, such as particle swarm
optimization, as well. Hybridizing with unsupervised learning methods such as the SOM
for weight initialization is an interesting problem to study.
Bibliography
[1] A. J. Wood and B. F. Wollenberg, Power Generation Operation and Control, John
Wiley & Sons, 1984.
[2] E.A. Feinberg and D. Genethliou, "Load forecasting". In: Applied Mathematics for
Restructured Electric Power Systems: Optimization, Control, and Computational
Intelligence, J.H. Chow et al. (eds.), Springer, 2005.
[3] A. D. Papalexopoulos, S. Hao and T. M. Peng, "An Implementation of a Neural
Network Based Load Forecasting Model for the EMS", IEEE Transactions on Power
Systems, vol. 9, no. 3, pp. 1956-1962, 1994.
[4] G. Welch and G. Bishop, "An Introduction to the Kalman Filter", Technical Report TR 95-041, University of North Carolina, Chapel Hill, NC, USA, 1995.
[5] J. Wang, "Modeling and Generating Daily Changes in Market Variables Using A
Multivariate Mixture of Normal Distributions", Proceedings of the 33rd Conference on
Winter Simulation, pp. 283-289, 2001.
[6] L. E. Baum, T. Petrie, G. Soules and N. Weiss, "A Maximization Technique Occurring
in the Statistical Analysis of Probabilistic Functions of Markov Chains", The Annals of
Mathematical Statistics, vol. 41, no. 1, pp. 164-171, 1970.
[7] H. Tong, "On a Threshold Model", in Pattern Recognition and Signal Processing, C.
H. Chen, Ed. Amsterdam, The Netherlands: Sijthoff and Noordhoff, 1978.
[8] H. Tong and K. S. Lim, "Threshold Autoregression, Limit Cycles and Cyclical Data
(with discussion)", Journal of Royal Statistical Society, ser. B 42, pp. 245-292, 1980.
[9] B. F. Hobbs, S. Jitprapaikulsarn, S. Konda, V. Chankong, K. A. Loparo and D. J.
Maratukulam, "Analysis of the value for unit commitment of improved load forecasting",
IEEE Transactions on Power Systems, vol. 14, no. 4, pp. 1342-1348, 1999.
[10] J. A. Gonzalez, and D. D. Douglas. The Engineering of Knowledge-Based Systems:
Theory and Practice, Englewood Cliffs, NJ: Prentice Hall, 2000.
[11] A. K. Palit and D. Popovic, Computational Intelligence in Time Series Forecasting:
Theory and Engineering Applications, Springer, London, 2005.
[12] R. Xu and D. Wunsch, "Survey of Clustering Algorithms", IEEE Transactions on
Neural Networks, vol. 16, no. 3, pp. 645-678, 2005.
[13] A. Flexer, "On the Use of Self-Organizing Maps for Clustering and Visualization",
Intelligent Data Analysis, vol. 5, no. 5, pp. 373-384, 2001.
[14] N. M. Pindoriya, S. N. Singh and S. K. Singh, “Forecasting of Short-Term Electric
Load Using Application of Wavelets with Feed-Forward Neural Networks”, International
Journal of Emerging Electric Power Systems, vol. 11, no. 1, 2010.
[15] G. E. P. Box, G. M. Jenkins and G. C. Reinsel, Time Series Analysis: Forecasting and
Control, 3rd ed., Prentice Hall, 1994.
[16] T. Anderson, The Statistical Analysis of Time Series, Wiley & Sons, 1971.
[17] H. Tong, Non-linear Time Series: A Dynamical System Approach, Oxford University
Press, Oxford, 1990.
[18] G. Gross and F. D. Galiana, "Short-term load forecasting", Proceedings of the IEEE,
vol. 75, no. 12, pp. 1558-1573, 1987.
[19] S. J. Huang and K. R. Shih, "Short-Term Load Forecasting via ARMA Model
Identification Including Non-Gaussian Process Identification", IEEE Transactions on
Power Systems, vol. 18, no. 2, pp. 673-679, 2003.
[20] J. Y. Fan and J. D. McDonald, "A Real-Time Implementation of Short-Term Load
Forecasting For Distribution Power Systems", IEEE Transactions on Power Systems, vol.
9, no. 2, pp. 988-994, 1994.
[21] M. T. Hagan and S. M. Behr, "The Time Series Approach To Short-Term Load
Forecasting", IEEE Transactions on Power Systems, vol. 2, no. 3, pp. 785-791, 1987.
[22] M. Espinoza, C. Joye and R. Belmans, "Short-Term Load Forecasting, Profile
Identification and Customer Segmentation: A Methodology Based On Periodic Time
Series", IEEE Transactions on Power Systems, vol. 20, no. 3, pp. 1622-1631, 2005.
[23] C. M. Huang, C. J. Huang, and M. L. Wang, "A Particle Swarm Optimization To
Identifying the ARMAX Model for Short-Term Load Forecasting", IEEE Transactions on
Power Systems, vol. 20, no. 2, pp. 1126-1133, 2005.
[24] J. W. Taylor, L. M. de Menezes and P. E. McSharry, "A Comparison of Univariate
Methods for Forecasting Electricity Demand Up to A Day Ahead", International Journal
of Forecasting, vol. 22, pp. 1-16, 2006.
[25] N. Amjady, "Short Term Load Forecasting Using Time-Series Modeling With Peak
Load Estimation Capability", IEEE Transactions on Power Systems, vol. 16, no. 3, pp.
498-505, 2001.
[26] S. R. Huang, "Short-Term Load Forecasting Using Threshold Autoregressive
Models", IEE Proceedings on General Transmission and Distribution, vol. 144, no. 5, pp.
477-481, 1997.
[27] K. S. Chan and H. Tong, "On Estimating Thresholds In Autoregressive Models",
Journal of Time Series Analysis, vol. 7, pp. 179-190, 1986.
[28] R. Luukkonen, P. Saikkonen and T. Terasvirta, "Testing Linearity Against Smooth
Transition Autoregressive Models", Biometrika, vol. 75, pp. 491-499, 1988.
[29] T. Terasvirta, "Specification, Estimation, and Evaluation of Smooth Transition
Autoregressive Models", Journal of American Statistical Association, vol. 89, no. 425, pp.
208-218, 1994.
[30] L. F. Amaral, R. C. Souza and M. Stevenson, "A Smooth Transition Periodic
Autoregressive Model for Short-Term Load Forecasting", International Journal of
Forecasting, vol. 24, no. 4, pp. 603-615, 2004.
[31] A. T. Robinson, "Electricity pool prices: A case study in nonlinear time series",
Applied Economics, vol. 32, no. 5, pp. 527-532, 2000.
[32] M. Stevenson, "Filtering and forecasting electricity prices in the increasingly
deregulated Australian electricity market", International Institute of Forecasters
Conference, pp. 1-31, 2001.
[33] D. K. Ranaweera, N. F. Hubele and A. D. Papalexopoulos, "Application of radial
basis function neural network model for short-term load forecasting", IEE Proceedings of
Generation, Transmission and Distribution, vol. 142, pp. 45-50, 1995.
[34] E. Gonzalez-Romera, M. A. Jaramillo-Moran and D. Carmona-Fernandez, "Monthly
electricity energy demand forecasting based on trend extraction", IEEE Transactions on
Power Systems, vol. 21, no. 4, pp. 1946-1953, 2006.
[35] M. Becalli, M.Cellura, V. Lo Brano and A. Marvuglia, "Forecasting daily urban
electricity load profiles using artificial neural networks", Energy Conversion and
Management, vol. 45, pp. 2879-2900, 2004.
[36] T. Senjyu, P. Mandal, K. Uezato and T. Funabashi, "Next day load curve forecasting
using recurrent neural network structure", IEE Proceedings of Generation, Transmission
and Distribution, vol. 151, no. 3, pp. 388-394, 2004.
[37] C. N. Tran, D. C. Park and W. S. Choi, "Short-term load forecasting using multiscale
bilinear recurrent neural network with an adaptive learning algorithm", In: King, I. et al.
(Eds.), Thirteenth International Conference on Neural Information Processing (ICONIP
2006), LNCS, vol. 4233. Springer, pp. 964-973, 2006.
[38] A. G. Bakirtzis, V. Petridis, S. J. Kiartzis, M. C. Alexiadis and A. H. Maissis, "A
neural network short term load forecasting model for the Greek power system", IEEE
Transactions on Power Systems, vol. 11, no. 2, pp. 858-864, 1996.
[39] H. Chen, C. A. Canizares and A. Singh, "ANN based short-term load forecasting in
electricity markets", Proceedings of the IEEE Power Engineering Society Transmission
and Distribution Conference, vol. 2, pp. 411-415, 2001.
[40] H. S. Hippert, C. E. Pedreira and R. C. Souza, "Neural networks for short-term load
forecasting: A review and evaluation", IEEE Transactions on Power Systems, vol. 16, no.
1, pp. 44-55, 2001.
[41] H. S. Hippert, D. W. Bunn and R. C. Souza, "Large neural networks for electricity
load forecasting: Are they overfitted", International Journal of Forecasting, vol. 21, pp.
425-434, 2005.
[42] J. V. Ringwood, D. Bofelli and F. T. Murray, "Forecasting electricity demand on
short, medium and long time scales using neural networks", Journal of Intelligent and
Robotic Systems, vol. 31, pp. 129-147, 2001.
[43] T. Senjyu, H. Takara, K. Uezato and T. Funabashi, "One hour ahead load forecasting
using neural network", IEEE Transactions on Power Systems, vol. 17, no. 1, pp. 113-119,
2002.
[44] J. W. Taylor and R. Buizza, "Neural network load forecasting with weather ensemble
predictions", IEEE Transactions on Power Systems, vol. 17, pp. 626-632, 2002.
[45] R. E. Abdel-Aal, "Improving electric load forecasts using network committees",
Electric Power Systems Research, vol. 74, pp. 83-94, 2005.
[46] A. A. El Desouky and M. M. Elkateb, "Hybrid adaptive techniques for electric-load
forecast using ANN and ARIMA", IEE Proceedings of Generation, Transmission and
Distribution, vol. 147, no. 4, pp. 213-217, 2000.
[47] J. H. Wang and J. Y. Leu, "Stock market trend prediction using ARIMA-based neural
networks", Proceedings of IEEE International Conference on Neural Networks, vol. 4, pp.
2160-2165, 1996.
[48] C. T. Su, L. I. Tong and C. M. Leou, "Combination of Time series and neural
network for reliability forecasting modeling", Journal of the Chinese Institute of Industrial
Engineers, vol. 14, no. 4, pp. 419-429, 1997.
[49] F. M. Tseng, H. C. Yu and G. H. Tzeng, "Combining neural network model with
seasonal time series ARIMA model", Technological Forecasting and Social Change, vol.
69, no. 1, pp. 71-87, 2001.
[50] S. Fan and L. Chen, "Short-term load forecasting based on an adaptive hybrid
method", IEEE Transactions on Power Systems, vol. 21, no. 1, pp. 392-401, 2006.
[51] M. Martin-Merino and J. Roman, "A New SOM algorithm for electricity load
forecasting", Lecture Notes in Computer Science, Springer-Verlag, pp. 995-1003, 2006.
[52] Z. Bao, D. Pi, and Y. Sun, "Short term load forecasting based on self-organizing map
and support vector machine," Lecture Notes in Computer Science, vol. 3610, pp. 688-691,
2005.
[53] A. Lendasse, M. Cottrell, V. Wertz and M. Verleysen, "Prediction of electric load
using Kohonen maps - Application to the Polish electricity consumption", Proceedings of
the American Control Conference, vol. 5, pp. 3684-3689, 2002.
[54] M. Farhadi and S. M. M. Tafreshi, "Effective model for next day load curve
forecasting based upon combination of perceptron and kohonen ANNs applied to Iran
power network", 29th International Telecommunications Energy Conference, pp. 267-273,
2007.
[55] A. Khotanzad, E. Zhou and H. Elragal, "A Neuro-Fuzzy approach to short-term load
forecasting in a price sensitive environment", IEEE Transactions on Power Systems, vol.
17, no. 4, pp. 1273-1282, 2002.
[56] K. H. Kim, H. S. Youn and Y. C. Kang, "STLF for special days in anomalous load
conditions using neural networks and fuzzy inference method", IEEE Transactions on
Power Systems, vol. 15, pp. 559-565, 2000.
[57] S. H. Ling, F. H. F. Leung, H. K. Lam, Y. S. Lee and P. K. S. Tam, "A novel
genetic-algorithm-based neural network for short-term load forecasting", IEEE Transactions on
Industrial Electronics, vol. 50, no. 4, pp. 793-799, 2003.
[58] Ronaldo R. B. de Aquino, O. N. Neto, Milde M. S. Lira, A. A. Ferreira, K. F. Santos.
"Using Genetic Algorithm to Develop a Neural-Network-Based Load Forecasting",
Lecture Notes in Computer Science, Springer Berlin, pp. 738-747, 2007.
[59] G. C. Liao and T. P. Tsao, "Application of a fuzzy neural network combined with a
chaos genetic algorithm and simulated annealing to short-term load forecasting", IEEE
Transactions on Evolutionary Computation, vol. 10, no. 3, pp. 330-340, 2006.
[60] Z. A. Bashir and M. E. El-Hawary, "Short-term load forecasting using artificial
neural networks based on particle swarm optimization algorithm", Canadian Conference
on Electrical and Computer Engineering, pp. 272-275, 2007.
[61] D. Niu, Z. Gu and M. Xing, "Research on neural networks based on culture particle
swarm optimization and its application in power load forecasting", Third International
Conference on Natural Computation, pp. 270-274, 2007.
[62] J. Wang, Y. Zhou and Y. Chen, "Electricity load forecasting based on support vector
machines and simulated annealing particle swarm optimization algorithm", Proceedings of
the IEEE International Conference on Automation and Logistics, pp. 2836-2840, 2007.
[63] S. Makridakis, S. C. Wheelwright and R. J. Hyndman, Forecasting: Methods and
Applications, 3rd ed., John Wiley and Sons, 1998.
[64] J. W. Taylor, "Short-Term Electricity Demand Forecasting Using Double Seasonal
Exponential Smoothing", Journal of Operational Research Society, vol. 54, pp. 799-805,
2003.
[65] J. W. Taylor and P. E. McSharry, "Short-Term Load Forecasting Methods: An
Evaluation Based on European Data", IEEE Transactions on Power Systems, vol. 22, pp.
2213-2219, 2007.
[66] T. Teräsvirta, M. C. Medeiros and G. Rech, "Building neural network models for
time series: a statistical approach", Journal of Forecasting, vol. 25, no. 1, pp. 49-75, 2006.
[67] A. Veiga and M. Medeiros, "A hybrid linear-neural model for time series
forecasting", Proceedings of the NEURAP, pp. 377-384, 1998.
[68] A. Veiga and M. Medeiros, "A hybrid linear-neural model for time series
forecasting", IEEE Transactions on Neural Networks, vol. 11, no. 6, pp. 1402-1412, Nov
2000.
[69] A. Veiga and M. Medeiros, "A flexible coefficient smooth transition time series
model", IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 97-113, Jan 2005.
[70] T. E. Jin, "Training issues and learning algorithms for feedforward and recurrent
neural networks", Ph. D. thesis, National University of Singapore, Singapore, 2009.
[71] D. S. Yeung and X. Sun, "Using function approximation to analyze the sensitivity of
MLPs with antisymmetric squashing activation function", IEEE Transactions on Neural
Networks, vol. 13, no. 1, pp. 34-44, 2002.
[72] T. Kohonen, Self-Organizing Maps, Springer-Verlag, Berlin, Germany, 1997.
[73] A. Jain and B. Satish, "Clustering based Short Term Load Forecasting using Support
Vector Machines", IEEE Power Tech Conference, 2009.
[74] J. M. Fidalgo and M. A. Matos, "Forecasting Portugal global load with artificial
neural networks", Lecture Notes in Computer Science, Springer-Verlag, pp. 728-737,
2007.
[75] G. A. Barreto and A. F. R. Araujo, "Identification and control of dynamic systems
using the self-organizing maps", IEEE Transactions on Neural Networks, vol. 15, no. 5,
pp. 1244-1259, 2004.
[76] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.
[77] J. Goppert and W. Rosenstiel, "Topology preserving interpolation in self-organizing
maps", Proceedings of the NeuroNIMES'93, pp. 425-534, 1993.
[78] J. Goppert and W. Rosenstiel, "Topology interpolation in SOM by affine
transformations", Proceedings of the ESANN'95, pp. 15-20, 1995.
[79] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press,
1994.
[80] C. W. J. Granger, and P. Newbold. "Spurious Regressions in Econometrics." Journal
of Econometrics, vol. 2, pp. 111–120, 1974.
[81] D. A. Dickey and W. Fuller, "Likelihood Ratio Statistics for Autoregressive Time
Series With A Unit Root", Econometrica, vol. 49, pp. 1057-1072, 1981.
[82] P. Phillips, and P. Perron. "Testing for a Unit Root in Time Series Regression."
Biometrika. Vol. 75, pp. 335–346, 1988.
[83] D. Kwiatkowski, P. Phillips, P. Schmidt and Y. Shin, "Testing the null hypothesis of
stationarity against the alternative of a unit root", Journal of Econometrics, vol. 54, pp.
159-178, 1992.
[84] M. Cottrell, J. C. Fort and G. Pages, "Theoretical aspects of the SOM algorithm",
Neurocomputing, vol. 21, pp. 119-138, 1998.
[85] T. Kohonen, Self-Organizing Maps, 3rd ed., Springer, Berlin, 2001, p. 139.
[86] M. Cottrell and J. C. Fort, "Etude d'un algorithme d'auto-organisation", Annales de
l'Institut Henri Poincare, vol. 23, no. 1, pp. 1-20, 1987.
[87] C. Bouton and G. Pages, "Self-Organization of the one-dimensional Kohonen
algorithm with non-uniformly distributed stimuli", Stochastic Processes and their
Applications, vol. 47, pp. 249-274, 1993.
[88] C. Bouton and G. Pages, "Convergence in distribution of the one-dimensional
Kohonen algorithm when the stimuli are not uniform", Advances in Applied Probability,
vol. 26, pp. 80-103, 1994.
[89] E. Erwin, K. Obermayer and K. Shulten, "Self-organizing maps: stationary states,
metastability and convergence rate", Biological Cybernetics, vol. 67, pp. 35-45, 1992.
[90] E. Erwin, K. Obermayer and K. Shulten, "Self-organizing maps: ordering,
convergence properties and energy functions", Biological Cybernetics, vol. 67, pp. 47-55,
1992.
[91] M. Cottrell, "Theoretical aspects of the SOM algorithm", Neurocomputing, vol. 21,
pp. 119-138, 1998.
[92] H. Robbins and S. Monro, "A stochastic approximation method", Annals of
Mathematical Statistics, vol. 22, pp. 400-407, 1951.
[93] National Grid. Available www.nationalgrid.com/UK
[94] C. L. Hor, S. J. Watson and S. Majithia, “Daily load forecasting and maximum
demand estimation using ARIMA and GARCH”, Proceedings PMAPS, pp. 1-6, 2006.
[95] D. van Dijk, T. Terasvirta and P. H. Franses, "Smooth transition autoregressive
models - a survey of recent developments", Econometric Reviews, vol. 21, no. 3, pp. 1-47,
2002.
[96] Robert B. Davies, "Hypothesis testing when a nuisance parameter is present only
under the alternatives", Biometrika, vol. 74, no. 1, pp. 33-43, 1987.
[97] R. Luukkonen, P. Saikkonen and T. Terasvirta, "Testing linearity against smooth
transition autoregressive models", Biometrika, vol. 75, pp. 491-499, 1988.
[98] C. W. J. Granger and T. Terasvirta, Modelling Nonlinear Economic Relationships,
Oxford University Press, Oxford, 1993.
[99] S. Leybourne, P. Newbold and D. Vougas, "Unit roots and smooth transitions",
Journal of Time Series Analysis, vol. 19, pp. 83-97, 1998.
[100] G. Thimm and E. Fiesler, "High-order and multilayer perceptron initialization",
IEEE Transactions on Neural Networks, vol. 8, no. 2, pp. 349-359, 1997.
[101] J. F. Kolen and J. B. Pollack, "Backpropagation is sensitive to initial conditions",
Laboratory for Artificial Intelligence Research, Comput. Inform. Sci. Dep, Tech. Rep. TR
90-JK-BPSIC, 1990.
[102] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000.
[103] New England ISO. Available www.iso-ne.com
[104] K. Y. Lee and Shu Du, "Short term load forecasting using semigroup based
system-type neural network", Proceedings of Intelligent System Application to Power Systems,
pp. 291-296, 2005.
[105] Shu Du, "Short term load forecasting using system-type neural network
architecture", Master's thesis, Baylor University, 2009.
[106] The Alberta electric system operator. Available www.aeso.ca
[107] C. U. Vila, A. Z. de Souza, J. W. Lima and P. P. Balestrassi, "Electricity demand
and spot price forecasting using evolutionary computation combined with chaotic
nonlinear dynamic model", Electrical Power and Energy Systems, vol. 21, no. 2, pp. 108-116, 2010.
[108] R. Cottet and M. Smith, "Bayesian modelling and forecasting of intraday electricity
load", Journal of American Statistical Association, vol. 98, no. 464, pp. 839-849, 2003.
[109] Australian Energy Market Operator. Available www.aemo.com.au
[110] D. Srinivasan, A. C. Liew and C. S. Chang, "A neural network short-term load
forecaster", Electric Power Systems Research, vol. 28, no. 3, pp. 227-234, 1994.
[111] D. Srinivasan, "Evolving artificial neural networks for short term load forecasting",
Neurocomputing, vol. 23, no. 1-3, pp. 265-276, 1998.
[112] J. Skalin and T. Terasvirta, "Another look at Swedish business cycles", Journal of
Applied Econometrics, vol. 14, pp. 359-378, 1999.
[113] J. Skalin and T. Terasvirta, "Modelling asymmetries and moving equilibria in
unemployment rates", Macroeconomic Dynamics, 2001.
[114] C. Weib and R. Gob, "Measuring serial dependence in categorical time series",
Advances in Statistical Analysis, vol. 92, no. 1, pp. 71-89, 2008.