In this section, the load of a given hour is modeled as a function of the most recent hours only. A model of this kind is not expected to be accurate at long lead times; the aim is to see whether better accuracy can be achieved for the closest hours than with the hour-by-hour models of section 5.2.
The model is intended only for lead times of one to five hours, the maximum lead time being determined by the number of output nodes of the network. Feeding earlier forecasts back into the model recursively is not tested, because the errors would grow unacceptably large if the lead times were extended in this way.
Two variants of the model are used. The first has a single network that is applied to the whole data set; the second has a separate network for each hour of the day. The idea of the latter is to divide the data of the whole year into smaller sets and to train a separate network on each of them, which reduces the computational effort compared with training a single network on a full year of data.
The structure of the first model variant is very simple: a single MLP network with one input layer, one hidden layer and one output layer. The input consists of the most recent load values, three binary variables indicating the type of the day, and five binary variables expressing the hour of the day. The days are divided into three classes instead of four, because the data of the previous day is not used in forecasting. The output consists of load forecasts for the next few hours. If the number of input load values is p and the number of output load values is q, there are p + 8 input variables and q output variables in all.
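As an illustration of this input layout, the following is a minimal sketch in Python/NumPy (an assumption; the original experiments were run in Matlab) of how one input vector could be assembled. The particular binary codings used for the day type and the hour are assumptions, since the text only fixes their number.

```python
import numpy as np

def encode_hour(hour):
    """Five binary variables for the hour of the day (0-23).
    The exact coding is an assumption; the text only specifies five binary variables."""
    return [(hour >> b) & 1 for b in range(5)]

def build_input(recent_loads, day_type, hour):
    """Assemble one network input vector:
    p most recent loads + 3 day-type bits + 5 hour bits = p + 8 variables."""
    day_bits = [1 if day_type == k else 0 for k in range(3)]  # one-of-three day classes (assumed coding)
    return np.array(list(recent_loads) + day_bits + encode_hour(hour), dtype=float)

# Example with p = 3 recent hourly loads, day class 0, hour 14:
x = build_input([612.0, 598.0, 605.0], day_type=0, hour=14)
print(x.shape)  # (11,) = p + 8 input variables
```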
The model variant with a separate network for each hour of the day has the same input and output variables, except that the hour of the day is naturally excluded, because each network is intended for one particular hour only.
Test results
One network for all hours
First, training with the data of a whole year, from May 24, 1996 to May 23, 1997, was considered. Testing was carried out with the remaining data, from May 24, 1997 to August 8, 1997.
The number of training cases is considerable: there are 24 cases per day and more than 300 days, giving over 7000 cases. Therefore, the number of hidden layer neurons should also be quite large (see equation 3.3). Small hidden layer sizes were tried, but the results were inadequate.
The problem with a large hidden layer is the long computation time needed for training. Even for a very simple input-output structure, with p = 3 and q = 1, sufficient training with 50 hidden layer neurons took Matlab several hours to complete. For the first six test weeks (the time before the summer holidays) the average error varied between 1.6 % and 1.9 %.
Increasing the hidden layer size further would probably still improve the results. However, to avoid excessive growth in computing time, reducing the training set was considered instead.
Training sets of one and two months were tested. They consist of the end part of the one-year data set used above, and the week following each training set was used as the test set. The reason for not using a longer test set is that, in practice, training with a short training set would be repeated frequently, so there is little point in testing a network on load data from a much later time.
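A minimal sketch of this evaluation scheme, assuming the hourly loads are stored in a single NumPy array in chronological order (the function name and indexing are illustrative, not the thesis code):

```python
import numpy as np

def split_train_test(load, train_end, train_hours, test_hours=7 * 24):
    """Training set: the train_hours hours ending at index train_end;
    test set: the week (test_hours hours) immediately following it."""
    train = load[train_end - train_hours:train_end]
    test = load[train_end:train_end + test_hours]
    return train, test

# One-month and two-month training sets ending at the same hour,
# both tested on the week that follows:
# train_1m, test_week = split_train_test(load, train_end, train_hours=31 * 24)
# train_2m, test_week = split_train_test(load, train_end, train_hours=61 * 24)
```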
The average forecasting errors with different model structures are given in tables 5.1 and 5.2. In almost all cases p = 3, because this was found to be the most suitable number of input load values in preliminary test runs. The number of output neurons varies between 1 and 7, and the number of hidden layer neurons between 10 and 30.
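The tabulated figures are average percentage errors; assuming they are mean absolute percentage errors computed separately for each lead time (an assumption, as the exact definition is not repeated here), they would be obtained roughly as follows:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error in percent (assumed error measure)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# With actual and forecast as (n_cases, q) arrays, one column per lead time:
# errors_per_lead = [mape(actual[:, k], forecast[:, k]) for k in range(q)]
```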
Table 5.1: The average forecasting errors (%) for the test set May 24 – May 30, 1997. The training period is March 23 – May 23, 1997.
p  q  hidden neurons  lead 1h  lead 2h  lead 3h  lead 4h  lead 5h  lead 6h  lead 7h
3 1 10 1.37 - - - - - -
3 1 20 1.35 - - - - - -
3 1 30 1.40 - - - - - -
3 3 10 2.57 2.54 2.60 - - - -
3 3 20 1.47 2.05 2.42 - - - -
3 3 30 1.47 1.82 2.33 - - - -
3 5 10 2.37 3.05 3.06 3.00 3.54 - -
3 5 20 1.63 1.93 2.48 2.67 3.23 - -
3 5 30 1.47 1.80 2.30 2.55 2.84 - -
5 5 30 1.59 1.92 2.22 2.31 2.66 - -
3 7 30 1.51 1.76 2.07 2.30 2.73 3.05 3.32
Table 5.2: The average forecasting errors (%) for the test set May 24 – May 30, 1997. The training period is April 23 – May 23, 1997.
p  q  hidden neurons  lead 1h  lead 2h  lead 3h  lead 4h  lead 5h  lead 6h  lead 7h
3 1 10 1.31 - - - - - -
3 1 20 1.26 - - - - - -
3 1 30 1.23 - - - - - -
3 3 10 2.07 2.49 2.53 - - - -
3 3 20 1.38 1.91 2.17 - - - -
3 3 30 1.32 1.78 2.22 - - - -
3 5 10 2.60 3.13 3.37 3.46 3.40 - -
3 5 20 1.37 1.76 2.39 2.47 2.57 - -
3 5 30 1.41 1.85 2.20 2.50 2.66 - -
5 5 30 1.44 1.91 2.25 2.50 2.85 - -
3 7 30 1.42 1.75 2.25 2.40 2.69 3.03 3.28
It seems clear that the results with the shorter training set are slightly better, and that a hidden layer of 30 neurons is superior to the smaller hidden layers. Increasing the output layer size does not considerably worsen the forecasting accuracy at the shortest lead times. Therefore, the model with p = 3, q = 5 or 7, and 30 hidden layer neurons seems the most appropriate.
To get a more reliable picture of the accuracy, the model with three load inputs and five outputs was applied to the same test sets as used in the first part of this chapter. The training sets were approximately one month long (Sept. 17 – Oct. 15, 1996; Oct. 31 – Nov. 30, 1996; Jan. 19 – Feb. 17, 1997; and Mar. 15 – Apr. 19, 1997). The average error percentages for different lead times are given in figure 5.20.
Figure 5.20: The average forecasting errors for different training and test sets with p = 3, q = 5, and 30 hidden layer neurons. Each test set consists of the week immediately following the training set.
The results are good for the first two test cases. For the third one, however, the errors are large; this is the week that also caused large forecasting errors with the hour-by-hour model. Although the errors for the shortest lead times are quite small, there seems to be no clear improvement over the hour-by-hour model that utilizes the data of the previous day and week.
Separate networks for different hours
The reason for using a separate network for each hour of the day is to enable training with the load data of the whole year within a reasonable time. The training set therefore consists of the data from May 24, 1996 to May 23, 1997, and the test set of the remaining data, from May 24, 1997 to August 18, 1997.
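A minimal sketch of the corresponding data partitioning, assuming the training cases have already been arranged into input and target matrices (the names and shapes are illustrative):

```python
import numpy as np

def split_by_hour(X, y, hours):
    """Partition the training cases into 24 subsets, one per hour of the day.

    X     : (n_cases, p + 3) inputs (recent loads + day-type bits; the hour
            code is dropped, since each network serves a single hour)
    y     : (n_cases, q) targets, the loads of the following hours
    hours : (n_cases,) hour of the day (0-23) of each case
    """
    hours = np.asarray(hours)
    return {h: (X[hours == h], y[hours == h]) for h in range(24)}

# Each of the 24 subsets then trains its own small MLP,
# e.g. with 3-10 hidden neurons as in table 5.3.
```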
The average error percentages with different model structures are given in table 5.3.
The errors are clearly larger than with one network trained on a short training set.
Larger hidden layer sizes were tried, but without considerable improvement in the results.
Table 5.3: The average forecasting errors (%) with the model using separate networks for each hour of the day. The training set consists of the data from May 24, 1996 to May 23, 1997; the test set is the period May 24 – August 18, 1997.
p  q  hidden neurons  lead 1h  lead 2h  lead 3h  lead 4h  lead 5h
3 1 3 2.35 - - - -
3 1 5 4.14 - - - -
3 1 7 2.30 - - - -
5 1 3 2.43 - - - -
5 1 5 2.40 - - - -
5 1 7 2.37 - - - -
3 3 5 2.50 3.38 3.73 - -
3 3 7 2.42 3.08 3.73 - -
5 3 7 2.55 3.20 3.78 - -
3 3 10 2.30 3.04 3.68 - -
3 5 5 2.65 3.31 3.70 4.11 11.04
5 5 5 3.09 3.29 3.92 7.03 4.67
5 5 7 2.64 3.16 3.78 4.21 4.60
2 5 10 2.32 3.04 3.67 4.13 4.45
5 5 10 2.50 3.14 3.77 4.26 4.61