5.1 Predicting Electricity Peaks on a Low Voltage Network

In the previous chapter, we looked at load measurements for all households together and ignored their chronological order. In contrast, in this chapter we are interested in short-term forecasting of individual household profiles. Therefore, information about the time at which measurements were taken becomes relevant.

To illustrate different popular methods and compare their errors, we use a subset of the End point monitoring of household electricity demand data from the Thames Valley Vision project, kindly made publicly available by Scottish and Southern Energy Networks, containing profiles for 226 households at 30-minute resolution.

We use the time window from Sunday, 19 July 2015, 00.00 to Monday, 21 September 2015, 00.00. The first eight weeks (or fewer, depending on the model) are used for training the models, and we want to predict each half-hourly usage of each household in the ninth week, commencing on 14 September 2015.

In Fig. 5.1, the mean value (top) and the maximum value (bottom) of each half-hour on each day of the week are computed for every household, and a box-plot over households is presented. In Fig. 5.2, a box-plot is produced for each household over all the values recorded during its eight weeks of observations.
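The computation behind these figures can be sketched as follows. This is a minimal illustration; it assumes the profiles sit in a pandas DataFrame df with a half-hourly DatetimeIndex and one column per household, and the variable names are ours, not from the original analysis.

    import pandas as pd
    import matplotlib.pyplot as plt

    # df: half-hourly DatetimeIndex, one column per household (assumed layout)
    day = df.index.day_name()                          # day of the week
    slot = df.index.hour * 2 + df.index.minute // 30   # half-hour of day, 0..47

    mean_profiles = df.groupby([day, slot]).mean()     # Fig. 5.1, top
    max_profiles = df.groupby([day, slot]).max()       # Fig. 5.1, bottom

    # Box-plot over households for each half-hour of, e.g., Mondays
    mean_profiles.loc['Monday'].T.boxplot()
    plt.xlabel('half-hour of day')
    plt.ylabel('mean usage')
    plt.show()

    # Fig. 5.2: one box-plot per household over all its recorded values
    df.boxplot(rot=90)
    plt.show()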

Fig. 5.1 Usage on different days

Fig. 5.2 Households' half-hourly usage box-plots

The data for each household consists of 3072 half-hourly values. We want to predict the last 336 values in each time series. The exploratory data analysis confirms that daily seasonality is present. Examples of seasonal decomposition using an additive seasonal model with a lag of 48 half-hours, i.e. one day, and of the auto-correlation and partial auto-correlation functions with lag 48, are given for two households in Fig. 5.3. In most cases, no uniform trend is observed, the daily seasonal component usually contains the expected morning and evening peaks, and the residuals mostly look random. The auto-correlation and partial auto-correlation functions inform us how many past observations are relevant for predicting a half-hour, and they look different for different households.

Fig. 5.3 Exploratory data analysis of household profiles
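The decomposition and correlation plots of Fig. 5.3 can be reproduced along the following lines, here sketched with statsmodels; the series name y (one household's half-hourly profile as a pandas Series) is our assumption.

    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # Additive decomposition with a period of 48 half-hours, i.e. one day
    result = seasonal_decompose(y, model='additive', period=48)
    result.plot()                 # trend, daily seasonal component, residuals

    plot_acf(y, lags=48)          # auto-correlation up to one day
    plot_pacf(y, lags=48)         # partial auto-correlation up to one day
    plt.show()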

5.1.1 Short Term Load Forecasts

In this section, several popular forecasting algorithms from both statistical and machine learning backgrounds are tested. We evaluate them using the four error measures described in Sect. 2.2: MAPE, MAE, MAD and \(E_4\).
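For reference, one plausible implementation of the four measures is sketched below. The authoritative definitions are in Sect. 2.2; in particular, the exact forms of MAD and of the 4th-norm error \(E_4\) used here should be read as our assumptions.

    import numpy as np

    def mape(y, f):   # mean absolute percentage error, in percent
        return 100 * np.mean(np.abs((y - f) / y))

    def mae(y, f):    # mean absolute error
        return np.mean(np.abs(y - f))

    def mad(y, f):    # median absolute deviation of the errors (assumed form)
        return np.median(np.abs(y - f))

    def e4(y, f):     # 4th-norm error, emphasising large deviations (assumed form)
        return np.mean((y - f) ** 4) ** 0.25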

Since we want to compare errors across forecasting algorithms, we established two simple benchmarks in Chap. 2. The last week (LW) forecast, where the data from one week before is used to predict the same half-hour of the test week, is extremely simple (no calculation is needed) but relatively competitive. A simple average of the same half-hour over several past weeks is also popular, the so-called similar day (SD) forecast.

Deciding how many weeks of history to average over is not always straightforward, especially when seasons change. Here we have done a quick optimisation of the number of weeks used. Although the smallest error is obtained for one week of history, i.e. when the SD forecast coincides with the LW forecast, we use four weeks, as this resulted in the smallest 4th-norm error, and we are interested in peaks. Examples of LW and SD forecasts are given in Figs. 5.4 and 5.5.
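In code, the two benchmarks amount to the following minimal numpy sketch, where history is assumed to hold one household's training series as a flat array of half-hourly values.

    import numpy as np

    HH_WEEK = 7 * 48   # number of half-hours in a week

    def lw_forecast(history):
        # Last week: repeat the most recent week of observations
        return history[-HH_WEEK:]

    def sd_forecast(history, weeks=4):
        # Similar day: average the same half-hour over the last `weeks` weeks
        past = history[-weeks * HH_WEEK:].reshape(weeks, HH_WEEK)
        return past.mean(axis=0)

With weeks=1, sd_forecast reduces to lw_forecast, as noted above.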

In addition to the two benchmarks, LW and SD, four algorithms are compared: SARIMA (seasonal ARIMA), permutation merge (PM), LSTM (a recurrent neural network) and MLP (a feed-forward neural network). Detailed descriptions of the algorithms are given in Chap. 2.

Fig. 5.4 The LW forecast (red cross) and observation (blue dot) for one household

Fig. 5.5 The SD forecast (red cross) and observation (blue dot) for one household

5.1.1.1 SARIMA

As previously discussed, this is a variation of the widely used ARIMA model, where past values are used to predict the future, while the moving-average part helps to pick up changes in the observations and integration ensures stationarity of the data. In the seasonal autoregressive integrated moving average (SARIMA) model, a seasonal part is added; in our case, this is the detected daily seasonality. The model is split into a general and a seasonal part: the general part is assumed to be without periodicity, and the seasonal part with it. The parameters we use are \(p=\{2, 3, 5, 6\}\), \(d=1\), \(q=\{0, 1, 3\}\) for the general part and \(P=\{1,2\}\), \(D=0\), \(Q=\{0,1,2\}\) for the seasonal part of the model. The parameters were obtained by a localised search based on the Akaike Information Criterion (AIC) for each household. An example showing some success in predicting peaks can be seen in Fig. 5.6.
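A per-household fit along these lines can be set up with statsmodels, as in the following sketch. For simplicity it searches the full grid over the parameter sets quoted above, whereas we used a localised search; the function names are ours.

    import itertools
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def fit_sarima(y, season=48):
        # AIC-based search over the parameter sets quoted above (d=1, D=0)
        best_aic, best_fit = float('inf'), None
        for p, q, P, Q in itertools.product([2, 3, 5, 6], [0, 1, 3],
                                            [1, 2], [0, 1, 2]):
            try:
                fit = SARIMAX(y, order=(p, 1, q),
                              seasonal_order=(P, 0, Q, season)).fit(disp=False)
            except Exception:
                continue   # skip non-converging configurations
            if fit.aic < best_aic:
                best_aic, best_fit = fit.aic, fit
        return best_fit

    # forecast = fit_sarima(y).forecast(steps=336)   # the test week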

Fig. 5.6 The SARIMA forecast (red cross) and observation (blue dot) for one household

5.1.1.2 PM

The PM algorithm with window size 1, which allows permutations with one half-hour before and after, was run for different numbers of weeks of history. When using only one week, it is equal to the LW benchmark. As shown in Fig. 5.7, no single value optimised all four errors. We have chosen four weeks of history based on the smallest \(E_4\) error. While relatively similar to the SD forecast, PM manages to capture some peak timings better (compare Fig. 5.5 with Fig. 5.8).
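The full algorithm is given in Chap. 2; the following greedy sketch is ours and conveys only the idea for window size 1: each past week may swap a value with its immediate neighbour so as to better align peaks with the most recent week, after which the weeks are merged by averaging.

    import numpy as np

    HH_WEEK = 7 * 48

    def pm_forecast(history, weeks=4):
        # Simplified permutation-merge sketch (window size 1, greedy alignment)
        past = history[-weeks * HH_WEEK:].reshape(weeks, HH_WEEK).copy()
        ref = past[-1]                      # most recent week as the reference
        for w in past[:-1]:
            for t in range(HH_WEEK - 1):
                # swap neighbours if that brings the week closer to the reference
                cur = (w[t] - ref[t]) ** 2 + (w[t + 1] - ref[t + 1]) ** 2
                swp = (w[t + 1] - ref[t]) ** 2 + (w[t] - ref[t + 1]) ** 2
                if swp < cur:
                    w[t], w[t + 1] = w[t + 1], w[t]
        return past.mean(axis=0)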

Fig. 5.7 PM algorithm's performance for different mean error values

Fig. 5.8 The PM4 forecast (red cross) and observation (blue dot) for one household

5.1.1.3 LSTM

A Long Short-Term Memory (LSTM) network, a recurrent neural network with two hidden layers of 20 nodes each, was used, implemented with the Python Keras library [1], training the model over 5 epochs and using a batch size of 48 (the number of samples per gradient update). The optimiser used was 'adam', a method for first-order gradient-based optimisation of stochastic objective functions, based on adaptive estimates of lower-order moments [2]. For the input, we used the previous load, the half-hour of the day and a day-of-the-week code with 4 values: 0 for a working day followed by a working day; 1 for a working day followed by a non-working day; 2 for a non-working day followed by a working day; 3 for a non-working day followed by a non-working day.

We ran a limited parameter search from 10 to 30 nodes in each layer and noticed, similarly to [3], that an equal number of nodes per layer seems to work best. The minimal errors for all four error measures were obtained for the configuration with 20 nodes in each hidden layer, which agrees with the optimal choice obtained by [4], based on the MAPE error only. An example of an LSTM forecast can be seen in Fig. 5.9.
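In Keras (imported here via tensorflow.keras), the chosen configuration can be sketched as follows; the input window length n_steps and the feature layout are our assumptions, while the layer sizes, optimiser, epochs and batch size follow the text.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    n_steps, n_features = 48, 3   # assumed: load, half-hour of day, day-type code

    model = Sequential([
        LSTM(20, return_sequences=True, input_shape=(n_steps, n_features)),
        LSTM(20),                 # second hidden layer, also 20 nodes
        Dense(1)                  # next half-hour's load
    ])
    model.compile(optimizer='adam', loss='mse')
    # model.fit(X_train, y_train, epochs=5, batch_size=48)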

Fig. 5.9 The LSTM forecast (red cross) and observation (blue dot) for one household

Fig. 5.10 The MLP forecast (red cross) and observation (blue dot) for one household

5.1.1.4 MLP

A multi-layer perceptron (MLP), a feed-forward neural network with five hidden layers of nine nodes each, was chosen after a limited parameter search: we first decided on the number of nodes in two layers (from 5 to 20), and then added layers with the optimal number of neurons, 9, until the errors started to grow. All four error measures behaved in the same way.

We used MLPRegressor from the Python scikit-learn library [5], with 'relu', the rectified linear unit function \(f(x) = \max (0, x)\), as the activation function for the hidden layers. An optimiser in the family of quasi-Newton methods, 'lbfgs', was used as the solver for weight optimisation. The learning rate for weight updates was set to 'adaptive', i.e. kept constant at 0.01 as long as the training loss kept decreasing. Each time two consecutive epochs failed to decrease the training loss by at least 0.0001, or to increase the validation score by at least 0.0001, the learning rate was divided by 5. The L2 penalty was set to \(\alpha =0.01\). An example is shown in Fig. 5.10, where the timing of regular peaks is mostly captured, but their amplitudes are underestimated.
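The corresponding scikit-learn configuration is roughly the following sketch; the training data layout is assumed to be the same as for the LSTM.

    from sklearn.neural_network import MLPRegressor

    mlp = MLPRegressor(hidden_layer_sizes=(9, 9, 9, 9, 9),  # five layers of nine nodes
                       activation='relu',
                       solver='lbfgs',
                       alpha=0.01,                # L2 penalty
                       learning_rate='adaptive',  # in scikit-learn this schedule
                       learning_rate_init=0.01)   # only takes effect with 'sgd'
    # mlp.fit(X_train, y_train); forecast = mlp.predict(X_test)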

5.1.2 Forecast Uncertainty

Table 5.1 Mean errors
Table 5.2 Median errors

In Tables 5.1, 5.2 and 5.3, the means, medians and maxima over households of all four errors are given for the four algorithms and the two benchmarks. The box-plot of the four error means across the 226 households is given in Fig. 5.11. The results show that the SARIMA forecast has the smallest errors in the \(E_4\) measure and performs best with respect to peaks. The two benchmarks are very competitive when looking across all the values, with LW doing very well in the other three error measures. PM and MLP are slightly worse, and LSTM lags behind.

Table 5.3 Maximum errors
Fig. 5.11 Average errors across households

Fig. 5.12 Histogram of errors—differences between predicted and observed values

While the four error measures give values that are all positive, the differences between predicted and actual values can be negative in the case of underestimation. This is important, especially for peaks: the consequences of underestimated peaks (higher prices, outages, or storage control problems) are usually much worse than those of overestimated peaks (higher costs, or non-optimal running). Histograms of these differences for all the methods used are given in Fig. 5.12, with normal probability density contours based on the differences' mean and standard deviation. One can see that the differences are not normally distributed. Almost all forecasts are rather one-sided, i.e. underestimating; this is especially pronounced for the SD, PM, LSTM and MLP forecasts. One can also notice a similarity in the difference profiles between LW and SARIMA on the one hand, and between the other four forecasts on the other.

We note that this predictive task is quite challenging. In the week commencing 31 August 2015, there is the double challenge of a summer bank holiday (31 August) and the beginning of the school year, while the summer weeks before that are, in the UK, generally characterised by lower consumption. This leaves only one full week of behaviour relatively similar to the week that we want to predict, which explains why LW's MAPE is on average better than that of more sophisticated methods.

5.1.3 Heteroscedasticity in Forecasts

In this section, we look into the timing and frequency of the largest errors for all the forecasting methods compared in the previous section, to see if we can spot any patterns. Are different methods better at different time-steps? Can we identify time periods that are more difficult to forecast? To this end, we apply the scedasis introduced in Sect. 4.4 to capture how the largest absolute errors stemming from each forecasting method evolve over time. The interpretation is the following: the higher the scedasis at time \(t\in [0,1]\), the higher the propensity for extreme errors to occur at that time t. A value around 1 means stationarity in the errors of the forecasts.
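A kernel-based estimator in the spirit of (4.22) can be sketched as follows; the biweight kernel, the bandwidth h and the function names are our assumptions, and the exact estimator is the one given in Sect. 4.4.

    import numpy as np

    def scedasis(errors, k=50, h=0.1, grid=200):
        # Kernel-smoothed propensity of extreme absolute errors over rescaled
        # time in [0, 1]; values near 1 indicate stationarity of the errors
        e = np.abs(np.asarray(errors))
        n = len(e)
        threshold = np.sort(e)[-k]              # k-th largest absolute error
        exceed = e >= threshold
        s = np.linspace(0, 1, grid)             # evaluation points in [0, 1]
        t = np.arange(1, n + 1) / n
        u = (s[:, None] - t[None, :]) / h
        kern = np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)
        return (kern * exceed).sum(axis=1) / (k * h)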

Fig. 5.13 Estimated scedasis, \(\hat{c}\), as a function of time

Figure 5.13 displays the estimated scedasis, as given in (4.22), when we select the largest 50 errors determined by each forecasting method. The SARIMA model yields the least oscillation around 1, which is indicative of the satisfactory performance of this time series model in capturing the relevant traits in the data. Both PM4 and SD4 seem to have better predictive power on later days of the week, as they exhibit a decreasing trend in the likelihood of large errors. All methods show large uncertainty in the forecasts delivered between Wednesday and Thursday, where all the sample paths for the scedasis tend to concentrate above 1. The maxima box-plots for the early hours of Thursday in Fig. 5.1c, when compared with Fig. 5.1a, show more spread in values. In this way, the estimated scedasis values give us a way to quantify which times are more difficult for prediction for the different algorithms.