# Case Study


## Abstract

In the previous chapter, we looked at load measurements for all households together and we ignored their chronological order. In contrast, in this chapter, we are interested in short term forecasting of household profiles individually. Therefore, information about the time at which measurements were taken becomes relevant.

## 5.1 Predicting Electricity Peaks on a Low Voltage Network


To illustrate several popular methods and compare their errors, we use a subset of the end-point monitoring of household electricity demand data from the Thames Valley Vision project, kindly made publicly available by Scottish and Southern Energy Networks, containing profiles for 226 households at 30-min resolution.

We use the time window from Sunday, 19 July 2015 00:00 to Monday, 21 September 2015 00:00. The first eight weeks (or fewer, depending on the model) are used for training the models, and we predict each half-hour's usage for each household in the ninth week (commencing on 14 September 2015).
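The split above can be sketched in a few lines; this is a minimal illustration (not code from the study), assuming one household's readings are stored as a flat list of half-hourly values, 48 per day:

```python
# Half-hourly data: 48 slots per day, 336 per week.
HALF_HOURS_PER_WEEK = 48 * 7

def split_train_test(series, train_weeks=8):
    """Split a half-hourly series into training weeks and one test week."""
    n_train = train_weeks * HALF_HOURS_PER_WEEK
    train = series[:n_train]
    test = series[n_train:n_train + HALF_HOURS_PER_WEEK]
    return train, test

# Nine weeks of dummy data: 8 training weeks followed by 1 test week.
series = list(range(9 * HALF_HOURS_PER_WEEK))
train, test = split_train_test(series)
print(len(train), len(test))  # 2688 336
```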

### 5.1.1 Short Term Load Forecasts

In this section, we test several popular forecasting algorithms from both statistical and machine-learning backgrounds, evaluating them using the four error measures described in Sect. 2.2: MAPE, MAE, MAD and \(E_4\).
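For concreteness, the four measures can be sketched as follows. This is a plain-Python sketch, not the definitions from Sect. 2.2 verbatim; in particular the normalisation of \(E_4\) (here taken as the unnormalised 4th norm of the errors) is an assumption:

```python
import statistics

def mape(actual, forecast):
    """Mean absolute percentage error (in %); assumes no zero actuals."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mad(actual, forecast):
    """Median absolute deviation of the forecast errors."""
    return statistics.median(abs(a - f) for a, f in zip(actual, forecast))

def e_p(actual, forecast, p=4):
    """p-norm error; p = 4 weights large (peak) errors heavily."""
    return sum(abs(a - f) ** p for a, f in zip(actual, forecast)) ** (1 / p)

actual, forecast = [1.0, 2.0, 4.0], [1.5, 2.0, 3.0]
print(mae(actual, forecast))  # 0.5
```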

Since we want to compare errors across forecasting algorithms, we established two simple benchmarks in Chap. 2. The **last week (LW)** forecast, which uses the data from one week before to predict the same half-hour of the test week, is extremely simple (no calculation is needed) but relatively competitive. A simple average over several past weeks, the so-called **similar day (SD)** forecast, is also popular.

Deciding how many weeks of history to average over is not always straightforward, especially when seasons change. Here we have done a quick optimisation of the number of weeks used. Although the smallest error is obtained for one week, i.e. when the SD forecast coincides with the LW forecast, we use four weeks of history, as this resulted in the smallest 4th-norm error, and we are interested in peaks. Examples of LW and SD forecasts are given in Figs. 5.4 and 5.5.
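The two benchmarks can be sketched as below; this is a minimal illustration under the assumption that `history` holds whole weeks of half-hourly loads for one household, most recent week last (336 values per week):

```python
SLOTS = 48 * 7  # half-hours per week

def last_week_forecast(history):
    """LW: repeat the most recent week."""
    return history[-SLOTS:]

def similar_day_forecast(history, weeks=4):
    """SD: average the same half-hour over the last `weeks` weeks."""
    recent = history[-weeks * SLOTS:]
    return [sum(recent[w * SLOTS + t] for w in range(weeks)) / weeks
            for t in range(SLOTS)]

# Four weeks of flat dummy loads: 1, 2, 3 and 4 kWh per half-hour.
history = [1.0] * SLOTS + [2.0] * SLOTS + [3.0] * SLOTS + [4.0] * SLOTS
print(last_week_forecast(history)[0])     # 4.0
print(similar_day_forecast(history)[0])   # (1+2+3+4)/4 = 2.5
```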

#### 5.1.1.1 SARIMA

An example showing some success in predicting peaks can be seen in Fig. 5.6.

#### 5.1.1.2 PM

#### 5.1.1.3 LSTM

The Long Short Term Memory (LSTM), a recurrent neural network, was used with two hidden layers of 20 nodes each, implemented with the Python Keras library [1], training the model over 5 epochs with a batch size of 48 (the number of samples per gradient update). The optimiser was ‘adam’, a method for first-order gradient-based optimisation of stochastic objective functions, based on adaptive estimates of lower-order moments [2]. As input, we used the previous load, the half-hour, and day-of-the-week values. The latter were coded by four values: 0 for a working day followed by a working day; 1 for a working day followed by a non-working day; 2 for a non-working day followed by a working day; and 3 for a non-working day followed by a non-working day.
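The day-pair coding can be written as a small helper; this is a sketch of the encoding as described above, where the working/non-working flags for each day are assumed inputs (weekends and bank holidays mapping to non-working):

```python
def day_pair_code(day_working: bool, next_day_working: bool) -> int:
    """Encode a pair of consecutive days as described in the text:
    0: working -> working      1: working -> non-working
    2: non-working -> working  3: non-working -> non-working
    """
    return (0 if day_working else 2) + (0 if next_day_working else 1)

print(day_pair_code(True, True))    # 0  (e.g. Tuesday -> Wednesday)
print(day_pair_code(False, False))  # 3  (e.g. Saturday -> Sunday)
```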

#### 5.1.1.4 MLP

A multi-layer perceptron (MLP), a feed-forward neural network with five hidden layers of nine nodes each, was chosen after a limited parameter search: first deciding on the number of nodes in two layers (from 5 to 20), then adding layers with the optimal number of neurons, 9, until the errors started to grow. All four error measures behaved the same way.

We used MLPRegressor from the Python scikit-learn library [5], with ‘relu’, the rectified linear unit function \(f(x) = \max(0, x)\), as the activation function for the hidden layers. An optimiser in the family of quasi-Newton methods, ‘lbfgs’, was used as the solver for weight optimisation. The learning rate for weight updates was set to ‘adaptive’, i.e. kept constant at 0.01 as long as the training loss continued to decrease. Each time two consecutive epochs failed to decrease the training loss by at least 0.0001, or failed to increase the validation score by at least 0.0001, the learning rate was divided by 5. The L2 penalty was set to \(\alpha = 0.01\). An example is shown in Fig. 5.10, where the timing of regular peaks is mostly captured, but amplitudes are underestimated.
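The ‘adaptive’ schedule can be illustrated in isolation; the sketch below is a hypothetical standalone version of the rule described above (tracking only training loss, not the validation score), not scikit-learn's internal implementation:

```python
# Sketch of the 'adaptive' learning-rate rule: keep the rate constant
# while the training loss improves by at least `tol`, and divide it by 5
# whenever two consecutive epochs fail to improve.
def adaptive_rates(losses, lr=0.01, tol=1e-4):
    """Return the learning rate in effect after each epoch."""
    rates, bad_epochs = [], 0
    best = float("inf")
    for loss in losses:
        if loss > best - tol:      # no sufficient improvement this epoch
            bad_epochs += 1
        else:
            bad_epochs = 0
            best = loss
        if bad_epochs >= 2:        # two stagnant epochs in a row
            lr /= 5
            bad_epochs = 0
        rates.append(lr)
    return rates

print(adaptive_rates([1.0, 0.5, 0.5, 0.5]))  # [0.01, 0.01, 0.01, 0.002]
```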

### 5.1.2 Forecast Uncertainty

**Mean errors**

| Error | LW | SD | PM | SARIMA | LSTM | MLP |
|---|---|---|---|---|---|---|
| MAPE (%) | | 44.3813 | 45.1576 | 43.3889 | 47.1202 | 47.7836 |
| MAE (kWh) | | 0.0844 | 0.0861 | 0.0866 | 0.0957 | 0.0902 |
| MAD (kWh) | | 0.04314 | 0.0440 | 0.0450 | 0.0512 | 0.0538 |
| \(E_4\) (kWh) | 1.2299 | 1.0588 | 1.0774 | | 1.2307 | 1.0924 |

**Median errors**

| Error | LW | SD | PM | SARIMA | LSTM | MLP |
|---|---|---|---|---|---|---|
| MAPE (%) | 42.7588 | | 43.1196 | 44.0777 | 45.2884 | 43.8456 |
| MAE (kWh) | 0.0742 | | 0.0758 | 0.0757 | 0.0792 | 0.0754 |
| MAD (kWh) | | 0.0363 | 0.0375 | 0.0387 | 0.0445 | 0.0454 |
| \(E_4\) (kWh) | 1.1785 | | 0.9916 | 1.0127 | 1.1598 | 1.0183 |

**Maximum errors**

| Error | LW | SD | PM | SARIMA | LSTM | MLP |
|---|---|---|---|---|---|---|
| MAPE (%) | 150.3073 | 463.8288 | 459.1680 | | 343.5002 | 681.7654 |
| MAE (kWh) | | 0.4831 | 0.4823 | 0.3111 | 0.7156 | 0.5721 |
| MAD (kWh) | | 0.1680 | 0.1664 | 0.1700 | 0.1742 | 0.1799 |
| \(E_4\) (kWh) | 3.6269 | 5.7239 | 5.8357 | | 8.2499 | 6.4193 |

While the four error measures all give positive values, the differences between predicted and actual values can be negative in the case of underestimation. This matters especially for peaks: the consequences of underestimating peaks (higher prices, outages, or storage-control problems) are usually much worse than those of overestimating them (higher costs, or non-optimal running). Histograms of these differences for all methods are given in Fig. 5.12, with normal probability density contours based on the differences' mean and standard deviation. One can see that the differences are not normally distributed. Almost all forecasts are one-sided, and therefore underestimate; this is especially pronounced for the SD, PM, LSTM and MLP forecasts. One can also notice a similarity in the difference profiles between LW and SARIMA on the one hand, and among the other four forecasts on the other.
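One simple way to quantify this one-sidedness, as a hypothetical sketch rather than a statistic used in the study, is the share of half-hours where the forecast falls below the actual load:

```python
def underestimation_share(actual, forecast):
    """Fraction of time steps where the forecast is below the actual load.

    0.5 indicates symmetric errors; values well above 0.5 indicate the
    systematic underestimation visible in the histograms of Fig. 5.12.
    """
    under = sum(1 for a, f in zip(actual, forecast) if f < a)
    return under / len(actual)

print(underestimation_share([1.0, 2.0, 3.0, 4.0], [0.5, 1.5, 3.5, 4.0]))  # 0.5
```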

We note that this predictive task is quite challenging. The week commencing 31 August 2015 poses a double challenge: a summer bank holiday (31 August) and the beginning of the school year, while the preceding summer weeks in the UK are generally characterised by lower consumption. This leaves only one full week of behaviour relatively similar to the week we want to predict, which explains why LW's MAPE is on average better than that of more sophisticated methods.

### 5.1.3 Heteroscedasticity in Forecasts

A scedasis value around 1 indicates stationarity in the forecast errors.

Figure 5.13 displays the estimated scedasis, as given in (4.22), when we select the largest 50 errors determined by each forecasting method. The SARIMA model yields the least oscillation around 1, which is indicative of satisfactory performance of this time-series model in capturing the relevant traits in the data. Both PM4 and SD4 seem to have better predictive power on later days of the week, as they exhibit a decreasing trend in the likelihood of large errors. All methods show large uncertainty in the forecasts delivered between Wednesday and Thursday, where all the sample paths for the scedasis tend to concentrate above 1. The early-hours-of-Thursday maxima box-plots in Fig. 5.1c, when compared with Fig. 5.1a, show more spread in values. In this way, the estimated scedasis values give us a way to quantify which times are more difficult to predict for the different algorithms.


## References

- 1. Chollet, F., et al.: Keras. https://keras.io (2015)
- 2. Kingma, D., Lei Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015 (2015)
- 3. Cavallo, J., Marinescu, A., Dusparic, I., Clarke, S.: Evaluation of forecasting methods for very small-scale networks. In: Woon, W.L., Aung, Z., Madnick, S. (eds.) Data Analytics for Renewable Energy Integration, pp. 56–75. Springer International Publishing, Cham (2015)
- 4. Kong, W., Dong, Z.Y., Jia, Y., Hill, D.J., Xu, Y., Zhang, Y.: Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid (2017)
- 5. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. **12**, 2825–2830 (2011)

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.