1 Introduction

As energy systems around the world transition to an increasing share of renewable energy sources, the implementation of smart grids supporting this transition advances as well. Smart grid implementation implies a growing number of smart meters that record power or energy consumption and generation as time series [4]. These recorded energy time series are characterized by multi-seasonality, an aggregation-level-dependent predictability, and a dependence on exogenous influences such as weather [16]. The increasing number of recorded energy time series enables a wide range of applications for this data and motivates the goal of their automated operation. Exemplary applications for the smart grid that support the transition to renewable energy sources include customer profiling, load analysis, load forecasting, and load management [39, 47]. However, to perform well, these applications usually require clean data that represents the typical behavior of the underlying system well [30, 47].

Unfortunately, recorded time series are usually not clean, but contain anomalies [13]. Anomalies are patterns that deviate from what is considered normal [10]. They can occur in energy time series for many reasons, including smart meter failures [46], unusual consumption [32, 40], and energy theft [24]. All anomalies have in common that they potentially represent false or misleading information, which can be problematic for any analysis of this data performed by the mentioned applications [47]. For example, anomalies such as positive or negative spikes may strongly deviate from what is considered normal, and a subsequent forecasting method that uses the data as input in an automated manner may generate an incorrect forecast. This forecast could in turn lead to an inappropriate energy schedule and ultimately affect the stability of the energy system in an automated smart grid setting.

Therefore, managing anomalies in energy time series – in the sense of dealing with their presence – is an important issue for applications in the energy system such as billing and forecasting [47]. In energy time series forecasting, the importance of adequate anomaly management is generally acknowledged, e.g. [2, 3, 36]. For this reason, various anomaly management strategies exist, including the use of robust forecasting methods [23, 28], the use of information on detected anomalies [43], and the compensation of detected anomalies [9, 14, 37, 50]. However, it is not clear which strategy is best for managing anomalies in energy time series forecasting regarding both the obtained accuracy and the associated effort, which is why a rigorous comparison of the available strategies is needed.

Therefore, the present paper introduces and compares different general strategies for managing anomalies in energy time series forecasting. For this purpose, we build on the typically used strategies mentioned above and describe three different general strategies based on them, namely the raw, the detection, and the compensation strategy. While the raw strategy applies forecasting methods directly to the data input without any changes, the detection strategy provides information on anomalies detected in the input data to the forecasting method. The compensation strategy cleans the input data by detecting and thereafter compensating anomalies in the input data before applying a forecasting method.

To comparatively evaluate these strategies, we use a representative selection of forecasting methods, including naive, simple statistical, statistical learning, and machine learning methods. We also make use of real-world energy time series with inserted synthetic anomalies derived from real-world data. Given these forecasting methods and data, we compare the obtained forecast accuracy of all proposed strategies and present an example of how these strategies work and perform.

The remainder of the present paper is structured as follows: After describing related work in Sect. 2, Sect. 3 introduces the strategies for managing anomalies in energy time series forecasting. In Sect. 4, we evaluate the presented strategies. Finally, we discuss the results and the strategies in Sect. 5 and conclude the paper in Sect. 6.

2 Related Work

Since anomalies potentially limit the performance of any downstream application, dealing with their presence is a well-known topic. For example, all kinds of preprocessing methods aim to raise data quality to ensure the validity and reliability of data analysis results, e.g. [5, 19, 37]. Similarly, the influence of the choice of preprocessing methods on the accuracy of forecasting methods is also known, e.g. [1, 3]. In energy time series forecasting, several works also address how to deal with the presence of anomalies. We organize these works along three strategies.

Works of the first strategy focus on the robustness of forecasting methods. These works, for example, develop forecasting methods that are robust against anomalies, e.g. [23, 28, 29, 51, 53], strengthen existing forecasting methods, e.g. [54], or at least investigate the robustness of forecasting methods with respect to anomalous data, e.g. [27, 52]. The second strategy consists of works that make use of information on detected anomalies. In [43], for example, the information on predicted anomalies is used to adapt the energy production. Works of the third strategy detect anomalies and replace the detected anomalies with appropriate values, e.g. [9, 14, 30, 33], or even remove the detected anomalies, e.g. [11].

Despite these works on specific anomaly management strategies, it is not known which strategy is the best for managing anomalies in energy time series forecasting. For this reason, a rigorous comparison of the available strategies – as done in the present paper – is lacking.

3 Strategies for Managing Anomalies in Energy Time Series Forecasting

In this section, we present three general strategies for managing anomalies in energy time series forecasting, which build on the previously described anomaly management strategies in the literature.Footnote 1 All of these strategies apply a forecasting method \(f(\circ )\) to create a forecast for an input power time series \(\textbf{y} = \{y_t\}_{t \in T}\) with T measured values. This forecasting method creates a forecast based on historical values of the input power time series and exogenous features \(\textbf{e}\) such as calendar information or weather forecasts. More specifically, the forecasting method combines the most recent N historical values of the input power time series \(\textbf{y}_{t} = y_{t-(N-1)}, \dots , y_t\) with the exogenous features \(\textbf{e}_{t+H} = e_{t+1}, \dots , e_{t+H}\) for the forecasting horizon H. Using this combination, the forecasting method then generates a forecast at time point t

$$\begin{aligned} \hat{\textbf{y}}_{t+H} = f(\textbf{y}_{t}, \textbf{e}_{t+H}), \end{aligned}$$
(1)

where \(\hat{\textbf{y}}_{t+H} = \hat{y}_{t+1}, \dots , \hat{y}_{t+H}\) contains the forecast values of the input power time series for each time step in the forecast horizon. Nevertheless, the considered strategies comprise different steps and thus differ in the inputs to the applied forecasting method (see Fig. 1). We thus describe the included steps, the used input, and the underlying assumptions for each strategy in the following.
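As an illustration, the generic forecasting interface of Eq. (1) can be sketched in Python; the function and variable names are hypothetical, not taken from the paper's implementation:

```python
import numpy as np

def make_forecast(f, y, e_future, N, H, t):
    """Apply a forecasting method f at time point t, following Eq. (1).

    y        -- historical power time series (1-D array)
    e_future -- exogenous features for the steps t+1 .. t+H
    N        -- number of most recent historical values used as input
    H        -- forecast horizon
    """
    y_window = y[t - (N - 1): t + 1]    # y_{t-(N-1)}, ..., y_t
    return f(y_window, e_future)        # \hat{y}_{t+1}, ..., \hat{y}_{t+H}

# Toy example: a "forecaster" that repeats the last observed value H times.
H = 4
persistence = lambda y_win, e: np.repeat(y_win[-1], len(e))
y = np.arange(10.0)
forecast = make_forecast(persistence, y, np.zeros(H), N=5, H=H, t=9)
print(forecast)  # [9. 9. 9. 9.]
```

The strategies below only differ in what happens to `y` (and which extra inputs are passed) before `f` is applied.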

Fig. 1.

The three strategies for managing anomalies in energy time series forecasting. The raw strategy directly uses the input power time series to create a forecast. The detection strategy first detects anomalies in the input power time series, before providing a forecast using the information on the detected anomalies from the power time series with detected anomalies. The compensation strategy detects anomalies and additionally compensates the detected anomalies before performing a forecast based on the power time series with compensated detected anomalies.

Raw Strategy. The first strategy is the so-called raw strategy. It directly uses a power time series containing anomalies as input to a forecasting method. Given this input, the applied forecasting method provides a forecast of the input power time series. Formally, the raw strategy thus creates a forecast at time point t

$$\begin{aligned} \hat{\textbf{y}}_{t+H}^{\text {raw}} = f(\textbf{y}_{t}, \textbf{e}_{t+H}), \end{aligned}$$
(2)

where \(\textbf{y}_{t}\) are the historical values of the input power time series containing anomalies and H, \(\textbf{e}_{t+H}\), and \(\hat{\textbf{y}}_{t+H}\) are defined as above.

The raw strategy assumes that the anomalies contained in the input time series do not strongly affect the forecast of the applied forecasting method or that the applied forecasting method is robust against anomalies. Therefore, the applied forecasting method is assumed to still achieve an accurate forecast.

Detection Strategy. The second strategy is the so-called detection strategy. This strategy first applies an anomaly detection method to the power time series containing anomalies to detect the contained anomalies, whereby the anomaly detection method can be either supervised or unsupervised. The resulting power time series with detected anomalies serves as input to the forecasting method, which then provides the forecast of the power time series. Formally, the detection strategy therefore results in a forecast at time point t

$$\begin{aligned} \hat{\textbf{y}}_{t+H}^{\text {detection}} = f(\textbf{y}_{t}, \textbf{d}_{t+H}, \textbf{e}_{t+H}), \end{aligned}$$
(3)

where \(\textbf{d}_{t+H} = d_{t+1}, \dots , d_{t+H}\) are the labels of the detected anomalies for the forecasting horizon H and \(\textbf{y}_{t}\), \(\textbf{e}_{t+H}\), and \(\hat{\textbf{y}}_{t+H}\) are defined as above.

The assumption of the detection strategy is that the applied forecasting method can incorporate information about detected anomalies in its model so that the consideration of detected anomalies leads to an accurate forecast.

Compensation Strategy. The third strategy is the so-called compensation strategy. It also first applies a supervised or unsupervised anomaly detection method to the power time series containing anomalies to identify the contained anomalies. However, this strategy then uses the power time series with detected anomalies as input to an anomaly compensation method \(c(\circ )\) that replaces the detected anomalies with realistic values, i.e.,

$$\begin{aligned} \tilde{\textbf{y}}_{t+H} = c(\textbf{y}_{t}, \textbf{d}_{t+H},\circ ), \end{aligned}$$
(4)

where \(\tilde{\textbf{y}}_{t+H}\) is the power time series with compensated detected anomalies and \(\circ \) are additional parameters of the compensation method. This power time series with compensated detected anomalies \(\tilde{\textbf{y}}_{t+H}\) serves as input to the forecasting method that provides the forecast of the power time series. Formally, we describe the forecast of the compensation strategy at time point t with

$$\begin{aligned} \hat{\textbf{y}}_{t+H}^{\text {compensation}} = f(\tilde{\textbf{y}}_{t}, \textbf{e}_{t+H}), \end{aligned}$$
(5)

where \(\tilde{\textbf{y}}_{t}\) are the historical values of the input power time series with compensated detected anomalies and H, \(\textbf{e}_{t+H}\), and \(\hat{\textbf{y}}_{t+H}\) are defined as above.

The compensation strategy assumes that anomalies have to be detected and compensated in order to enable the applied forecasting method to provide an accurate forecast.
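To summarize, the three strategies differ only in the pipeline around the forecasting method. The following sketch contrasts them; the threshold detector, median replacement, and mean forecaster are illustrative stand-ins, not the methods used in the paper:

```python
import numpy as np

def raw_strategy(f, y, e):
    """Eq. (2): apply the forecasting method directly to the raw input."""
    return f(y, e)

def detection_strategy(f, detect, y, e):
    """Eq. (3): forward the labels of detected anomalies to the forecaster."""
    d = detect(y)
    return f(y, e, d)

def compensation_strategy(f, detect, compensate, y, e):
    """Eqs. (4) and (5): detect, compensate, then forecast on cleaned data."""
    d = detect(y)
    y_clean = compensate(y, d)
    return f(y_clean, e)

# Toy stand-in components, for illustration only:
detect = lambda y: np.abs(y - np.median(y)) > 10.0   # crude threshold detector
def compensate(y, d):
    y = y.copy()
    y[d] = np.median(y[~d])                          # replace with clean median
    return y
f_mean = lambda y, e, d=None: np.repeat(y.mean(), len(e))

y = np.array([1.0, 1.0, 1.0, 100.0, 1.0])            # one obvious spike
e = np.zeros(2)
res = compensation_strategy(f_mean, detect, compensate, y, e)
print(res)  # [1. 1.]
```

Note how the raw strategy would instead forecast the spike-distorted mean (20.8 here), which illustrates the compensation strategy's assumption.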

4 Evaluation

To evaluate the proposed strategies for managing anomalies in energy time series forecasting, we compare the forecasting accuracy of all strategies using different forecasting methods. Before presenting the results, we detail the performed evaluation: We introduce the used data and the inserted synthetic anomalies, the anomaly detection methods applied in the detection and compensation strategies, and the anomaly compensation method applied in the compensation strategy. We also describe the used forecasting methods and the experimental setting.

4.1 Data and Inserted Synthetic Anomalies

For the evaluation, we use real-world data in which we insert synthetic anomalies. The chosen data set is the “ElectricityLoadDiagrams20112014 Data Set”Footnote 2 from the UCI Machine Learning Repository [18]. It includes electrical power time series from 370 clients with different consumption patterns [38]. The 370 time series are available in a quarter-hourly resolution for a period of up to four years, namely from the beginning of 2011 until the end of 2014. We choose the power time series MT_200 for the evaluation because it covers the entire four-year period, reflects the electrical load of a typical client, and is comparatively anomaly-free relative to the other time series in the data set (see Fig. 2).

Fig. 2.

Overview of the data used for the evaluation.

Since the chosen time series does not include labeled anomalies and thus does not allow for controlled experimental conditions, we insert synthetic anomalies into the complete chosen time series. For this, we consider the two anomaly groups used in [44], namely technical faults in the metering infrastructure and unusual consumption. Using the corresponding available anomaly generation methodFootnote 3, we insert four types of anomalies from each group: Anomalies of types 1 to 4 are from the group of technical faults and based on anomalies identified in real-world power time series in [45]. These anomalies violate the underlying distribution corresponding to normal behavior. Anomalies of types 5 to 8 are from the group of unusual consumption and represent unusual behavior as described in [44]. These anomalies are characterized by unusually low or high power consumption. We give formulas and examples for all types of anomalies in Appendix 1.

For the evaluation, we insert 20 anomalies of each of types 1 to 4 from the group of technical faults in one setting and 20 anomalies of each of types 5 to 8 from the group of unusual consumption in another setting into the selected time series. We insert 20 anomalies per type to consider a reasonable number of anomalies, and we insert all four types of anomalies from a group at once to consider their simultaneous occurrence [44, 45]. The inserted anomalies correspond to 5% of the data for the technical faults and 11% of the data for the unusual consumption.
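The actual anomaly generation method is available via the referenced footnote. As a simplified illustration of the general idea only, the following sketch inserts spike-like anomalies at random positions and records their labels:

```python
import numpy as np

rng = np.random.default_rng(0)

def insert_spike_anomalies(y, n_anomalies, magnitude=3.0):
    """Insert n positive spikes into a copy of y and return the labels.

    A simplified stand-in for the paper's anomaly generation method,
    not its actual code.
    """
    y_anom = y.copy()
    labels = np.zeros(len(y), dtype=bool)
    idx = rng.choice(len(y), size=n_anomalies, replace=False)
    y_anom[idx] += magnitude * y.std()   # spikes scaled to the series
    labels[idx] = True
    return y_anom, labels

y = np.sin(np.linspace(0, 8 * np.pi, 500)) + 2.0   # toy load-like series
y_anom, labels = insert_spike_anomalies(y, n_anomalies=20)
print(labels.sum())  # 20
```

Keeping the ground-truth labels is what enables the controlled evaluation of the detection and compensation strategies.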

4.2 Applied Anomaly Detection Methods

For the evaluation of the detection and compensation strategies, we choose anomaly detection methods based on the evaluation results in [44], where a variety of anomaly detection methods is already evaluated on the selected data. More specifically, we choose the method from the evaluated supervised and unsupervised anomaly detection methods that overall performs best for the considered groups of anomalies. For both groups of anomalies, the best-performing method is an unsupervised anomaly detection method, namely the Variational Autoencoder (VAE) for the technical faults and the Local Outlier Factor (LOF) for the unusual consumption. We briefly introduce both chosen anomaly detection methods, before we describe their application.

The Variational Autoencoder (VAE) learns to map its input to its output via the probability distribution of ideally anomaly-free data in the latent space, so it is trained to only reconstruct non-anomalous data [26]. The Local Outlier Factor (LOF) estimates the local density of a sample by the distance to its k-nearest neighbors and determines anomalies as samples whose local density is low compared to that of their neighbors [8].
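As an illustration of the LOF idea, the following sketch applies scikit-learn's `LocalOutlierFactor` to daily profile vectors; the random profiles are a stand-in for the latent representations used in the paper:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Daily load profiles as feature vectors; the synthetic data below is a
# stand-in for the latent representations used in the paper.
rng = np.random.default_rng(1)
profiles = rng.normal(1.0, 0.1, size=(100, 96))   # 100 days, quarter-hourly
profiles[7] += 5.0                                 # one unusually high day

lof = LocalOutlierFactor(n_neighbors=20)
pred = lof.fit_predict(profiles)                   # -1 = anomaly, 1 = normal
flagged = np.where(pred == -1)[0]
print(flagged)
```

The shifted day stands out against the locally dense cluster of normal days and is therefore flagged.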

To enhance the detection performance of the selected anomaly detection methods, we apply them to the latent space representation of the selected data as suggested in [44] and visualized in Fig. 8 in Appendix 2. We choose the generative method to create these latent space representations for each selected anomaly detection method based on the evaluation results in [44]: We create the latent space data representation for the VAE with a conditional Invertible Neural Network (cINN) [6] and the latent space data representation for the LOF with a conditional Variational Autoencoder (cVAE) [41]. We detail the architecture and training of the used cINN and cVAE in Appendix 2. Given the created latent space representation, we apply the selected anomaly detection methods to the entire selected time series of the chosen data as in [44].

4.3 Applied Anomaly Compensation Method

For the anomaly compensation in the evaluation of the proposed compensation strategy, we use a Prophet-based imputation method because of its superior imputation performance for power time series determined in [48]. The Prophet-based imputation method [48] is built on the forecasting method Prophet which is capable of estimating a time series model on irregularly spaced data [42]. Prophet uses a modular regression model that considers trend, seasonality, and holidays as key components. It can be described as

$$\begin{aligned} y(t) = g(t) + s(t) + h(t) + \varepsilon _t, \end{aligned}$$
(6)

where g models the trend, s the seasonality, h the holidays, and \(\varepsilon _t\) all other changes not represented in the model. The Prophet-based imputation method trains the regression model using all values available in the power time series. Given the trained regression model, the Prophet-based imputation method considers all anomalies in the power time series as missing values and imputes them with the corresponding values from the trained regression model.
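The core idea – treat detected anomalies as missing values and fill them from a model fitted on the remaining data – can be sketched as follows. For a lightweight, dependency-free illustration, a simple seasonal-mean model stands in for Prophet:

```python
import numpy as np

def impute_detected_anomalies(y, labels, period=96):
    """Treat labeled anomalies as missing and impute them from a model
    fitted on the non-anomalous values.

    The paper uses a Prophet-based regression model; here a seasonal-mean
    model (one mean per position in the daily cycle) is a stand-in.
    """
    y_imputed = y.astype(float).copy()
    phase = np.arange(len(y)) % period
    for p in range(period):
        clean = (phase == p) & ~labels          # clean values at this phase
        if clean.any():
            seasonal_value = y[clean].mean()
            y_imputed[(phase == p) & labels] = seasonal_value
    return y_imputed

# Toy example: a repeating daily profile with one corrupted "day".
period = 4
y = np.tile([1.0, 2.0, 3.0, 2.0], 5)
labels = np.zeros(20, dtype=bool)
labels[8:12] = True                             # third "day" flagged
y[8:12] = 50.0                                  # corrupted values
print(impute_detected_anomalies(y, labels, period))
```

Because the model is trained only on values not flagged as anomalous, the imputed values follow the typical profile rather than the corruption.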

4.4 Anomaly-Free Baseline Strategy

In the evaluation, we examine the proposed raw, detection, and compensation strategies all based on the selected data containing inserted synthetic anomalies. For the evaluation of these strategies, we additionally provide an anomaly-free baseline. This baseline strategy comprises forecasts that are calculated on that selected data but without any inserted anomalies (see Fig. 3).

Fig. 3.

For evaluating the proposed strategies on the data with inserted synthetic anomalies, we use the forecast calculated on the input power time series without inserted anomalies as an anomaly-free baseline strategy.

4.5 Applied Forecasting Methods

For the evaluation of the proposed strategies, we consider a multi-step 24 h-ahead forecast with a multiple output strategy for which we apply a representative selection of forecasting methods to the selected data. Due to the quarter-hourly resolution of the selected data, the forecast comprises 96 values. For forecasting methods with hyperparameters, we use hyperparameters that we initially choose based on best practices and then verify. We first present the selected forecasting methods and their input data for the raw and compensation strategies, before we describe them for the detection strategy and the anomaly-free baseline strategy. We lastly present the used train-test split.

Methods Applied in Raw and Compensation Strategies. To examine the raw and compensation strategies comprehensively, we consider methods with different learning assumptions. We apply eight forecasting methods, namely two naive and six advanced methods. The advanced methods comprise a simple statistical method, a simple and two more complex machine learning methods, and two statistical learning methods.

The first naive method is the Last Day Forecast. It uses the values of the previous 24 h for the values to be predicted, i.e.,

$$\begin{aligned} \hat{y}_{t,h} = y_{t - 96 + h}, \end{aligned}$$
(7)

where \(\hat{y}_{t,h}\) is the forecast value of the electrical load for the forecast horizon h at time t and \(y_t\) is the electrical load at time t.

The second naive method is the Last Week Forecast. It takes the corresponding values of the last week as the forecast values, i.e.,

$$\begin{aligned} \hat{y}_{t,h} = y_{t - 672 + h}, \end{aligned}$$
(8)

where \(\hat{y}_{t,h}\) is the forecast value of the electrical load for the forecast horizon h at time t and \(y_{t - 672 + h}\) is the corresponding electrical load one week earlier.
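Both naive forecasts from Eqs. (7) and (8) can be sketched directly:

```python
import numpy as np

def last_day_forecast(y, t, H=96):
    """Eq. (7): repeat the previous day's values (96 quarter-hourly steps)."""
    return np.array([y[t - 96 + h] for h in range(1, H + 1)])

def last_week_forecast(y, t, H=96):
    """Eq. (8): repeat the values from one week ago (672 steps)."""
    return np.array([y[t - 672 + h] for h in range(1, H + 1)])

y = np.arange(2000.0)                      # toy quarter-hourly series
print(last_day_forecast(y, t=1000)[:3])    # [905. 906. 907.]
print(last_week_forecast(y, t=1000)[:3])   # [329. 330. 331.]
```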

The first advanced method is the Linear Regression (LinR). As a statistical method, it models the forecast values as a linear relationship between the historical load values and calendar information and determines the corresponding parameters using ordinary least squares. It is defined as

$$\begin{aligned} \hat{y}_{t,h} = c_h + \sum \limits _j{\beta _{h,j} \cdot y_{t-j}} + \sum \limits _k{\gamma _{h, k} \cdot C_{t, k}} + \varepsilon , \end{aligned}$$
(9)

where \(c_h\) is a constant, index j iterates over the lagged load features \(y_{t-j}\), index k iterates over the calendar information \(C_{t, k}\), and \(\varepsilon \) is the error term.
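A minimal sketch of fitting Eq. (9) by ordinary least squares with NumPy, for a single horizon step (the paper fits one such model per step h; the feature construction here is illustrative):

```python
import numpy as np

def fit_linear_forecaster(y, n_lags=96, calendar=None):
    """Fit one horizon step of Eq. (9) by ordinary least squares.

    Builds a design matrix of lagged load values (plus optional calendar
    features and a constant) and solves for the coefficients.
    """
    T = len(y)
    X = np.array([y[t - n_lags:t] for t in range(n_lags, T)])  # lag features
    if calendar is not None:
        X = np.hstack([X, calendar[n_lags:]])                  # calendar info
    X = np.hstack([np.ones((len(X), 1)), X])                   # constant c_h
    target = y[n_lags:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

y = np.sin(np.linspace(0, 20 * np.pi, 2000))   # toy seasonal series
beta = fit_linear_forecaster(y, n_lags=4)
print(beta.shape)  # (5,)
```

The returned vector contains the constant followed by one coefficient per lag (and per calendar feature, if given).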

The second advanced method is a commonly applied simple machine learning method, namely a Neural Network (NN). It organizes a network of interconnected nodes in input, hidden, and output layers to apply different functions that activate the corresponding nodes to learn the relationship between input and output (e.g., [31, 49]). The implementation of the used NN is detailed in Table 6 in Appendix 3. For its training, we use a batch size of 64, the Adam optimizer [25] with default parameters, and a maximum of 50 epochs.

The third advanced method is the Profile Neural Network (PNN) [21] as a state-of-the-art and more complex machine learning method. It combines statistical information in the form of standard load profiles with convolutional neural networks (CNNs) to improve the forecasting accuracy. For this, it decomposes a power time series into a standard load profile module, a trend module, and a colorful noise module, before aggregating their outputs to obtain the forecast [21]. For the training, the PNN uses a batch size of 512, the Adam optimizer [25], and a maximum of 50 epochs.

The fourth advanced method is the Random Forest (RF) Regressor, representing a statistical learning method. It creates several regression trees on randomly drawn bootstrap samples and takes the mean of the individual trees’ forecasts as the forecast [7], i.e.,

$$\begin{aligned} \hat{y}_{t, h} = \frac{1}{B}\sum _{b=1}^B t_{b, h}(x), \end{aligned}$$
(10)

where B is the number of bootstrap samples of the training set, \(t_b\) is an individual fitted tree, and x are the values from the test set. For the evaluation, we use \(B = 100\).
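Eq. (10) corresponds to the standard random forest prediction, which is the mean of the individual trees' predictions. The following sketch verifies this equivalence with scikit-learn on toy data (the features are a stand-in for the lagged load and calendar inputs):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy regression data standing in for lagged load + calendar features.
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(500, 8))
y = X[:, 0] * 2.0 + rng.normal(0, 0.05, 500)

# B = 100 trees, as in the evaluation.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Eq. (10): the forest forecast equals the mean over its fitted trees.
manual_mean = np.mean([tree.predict(X[:5]) for tree in rf.estimators_], axis=0)
print(np.allclose(manual_mean, rf.predict(X[:5])))  # True
```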

The fifth advanced method is the Support Vector Regression (SVR) and represents another statistical learning method. It determines a regression plane with the smallest distance to all data points used for the training. The data points closest to the regression plane on both sides are the so-called support vectors [17]. We apply the SVR with a linear kernel, \(C = 1.0\), and \(\epsilon = 1.0\).

The sixth advanced method is the XGBoost Regressor, which represents a more complex machine learning method. It iteratively creates regression trees and uses gradient descent to minimize a regularized objective function [12].

All introduced forecasting methods use the historical values of the selected power time series that contains inserted synthetic anomalies. The advanced methods also consider calendar information as input (see Table 5 in Appendix 3 for more details). While the naive methods directly use the mentioned historical load values, all other methods obtain the normalized load of the last 24 h and the calendar information for the first value to be predicted.

Methods Applied in Detection Strategy. For the detection strategy that can use the information on the detected anomalies for the forecast, we apply the forecasting methods introduced for the raw and compensation strategies. This way, we also evaluate the detection method using forecasting methods with different learning assumptions. However, we adapt the previously introduced methods as follows: We change the Last Day Forecast so that it uses the value from a week ago in the case of a detected anomaly. Similarly, we modify the Last Week Forecast so that it uses the corresponding value of the second to last week as the forecast value if the value to be predicted is a detected anomaly. In accordance with the detection strategy, all other forecasting methods obtain the information on the detected anomalies of the last 24 h as additional features.
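One plausible reading of the adapted Last Day Forecast can be sketched as follows; the exact fallback logic is an assumption based on the description above, not code from the paper:

```python
import numpy as np

def last_day_forecast_with_detection(y, d, t, H=96):
    """Last Day Forecast adapted for the detection strategy.

    Assumed logic: if the value one day back is a detected anomaly,
    fall back to the value one week back instead.
    """
    out = np.empty(H)
    for h in range(1, H + 1):
        if d[t - 96 + h]:                  # detected anomaly yesterday
            out[h - 1] = y[t - 672 + h]    # use last week's value instead
        else:
            out[h - 1] = y[t - 96 + h]
    return out

y = np.arange(2000.0)                      # toy quarter-hourly series
d = np.zeros(2000, dtype=bool)
d[905] = True                              # one anomalous value yesterday
res = last_day_forecast_with_detection(y, d, t=1000)
print(res[:3])  # [329. 906. 907.]
```

For the learning-based methods, the analogous adaptation simply appends the anomaly labels `d` of the last 24 h to the feature vector.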

Methods Applied in Anomaly-Free Baseline Strategy. To calculate the anomaly-free baseline strategy for the data containing synthetic anomalies, we apply all forecasting methods described for the raw and compensation strategies to the same data but without inserted synthetic anomalies. These forecasting methods obtain the inputs in the way described for the raw and compensation strategies.

Train-Test Split. Regardless of the considered strategy, we use the same train-test split for all evaluated forecasting methods. Each forecasting method is trained on 80% of the available data and tested on the remaining 20%. For all strategies, the available data is the selected time series without the first 96 data points. When calculating the anomaly-free baseline strategy for this data, we use the same period of time, i.e., all values except the first 96 data points.
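The chronological split described above can be sketched as:

```python
import numpy as np

def train_test_split_series(y, train_frac=0.8, skip=96):
    """Chronological split: drop the first 96 data points, then train on
    the first 80% and test on the remaining 20%."""
    y = y[skip:]
    n_train = int(len(y) * train_frac)
    return y[:n_train], y[n_train:]

y = np.arange(1096.0)                      # toy series
y_train, y_test = train_test_split_series(y)
print(len(y_train), len(y_test))           # 800 200
```

A chronological (rather than shuffled) split avoids leaking future values into the training data.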

4.6 Experimental Setting

For the evaluation, we use two evaluation metrics and a defined hardware and software setting.

Metrics. In order to evaluate the proposed strategies for managing anomalies in energy time series forecasting, we examine the accuracy of the obtained forecasts compared to the data without inserted synthetic anomalies using two metrics.

The first metric is the commonly used root mean squared error (RMSE). Given N data points to be predicted, it is defined as

$$\begin{aligned} {\text {RMSE}} = \sqrt{\frac{1}{N}\sum _{t=1}^{N} (y_t - \hat{y}_t)^2}, \end{aligned}$$
(11)

with the actual value \(y_t\) of the anomaly-free time series and the forecast value \(\hat{y}_t\). Due to the squared differences considered, the RMSE is sensitive to outliers.

Therefore, we also consider a second commonly used metric, the mean absolute error (MAE), which is robust to outliers. It is defined as

$$\begin{aligned} {\text {MAE}} = \frac{1}{N}\sum \limits _{t=1}^N |y_t - \hat{y}_t | \end{aligned}$$
(12)

with N data points to be forecast, the actual value \(y_t\) of the anomaly-free time series, and the forecast value \(\hat{y}_t\).
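Both metrics are straightforward to compute; the toy example below illustrates the RMSE's sensitivity to a single large error compared to the MAE:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (11)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (12)."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 8.0])    # one large error of 4
print(rmse(y_true, y_pred), mae(y_true, y_pred))  # 2.0 1.0
```

The single error of 4 yields an RMSE twice the MAE, which is why the paper reports both metrics.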

Hard- and Software. In order to obtain comparable results, we use the same hardware throughout the evaluation and implement all evaluated strategies as well as the used anomaly detection, anomaly compensation, and forecasting methods in Python (see Appendix 4 for more details).

4.7 Results

To examine the presented strategies, we compare their accuracy on the selected time series with the described inserted synthetic anomalies and using the described anomaly detection, anomaly compensation, and forecasting methods. After presenting the results of this comparison for the technical faults and unusual consumption, we show a part of the selected time series as an example of how the different strategies work and perform.

Fig. 4.

The accuracy of the eight forecasting methods applied to the data with 20 synthetic anomalies of each type from the technical faults and unusual consumption. For each forecasting method introduced in Sect. 4.5, the bars indicate the average RMSE or MAE for the raw strategy, detection strategy, compensation strategy, and anomaly-free baseline strategy. The error bars show the observed standard deviation across all runs on the test data set. Note that the anomaly-free baseline strategy generally performs best because it uses data that does not contain inserted synthetic anomalies.

Comparison. For the comparison, we apply all proposed strategies to the selected data with synthetic anomalies from the technical faults and unusual consumption. For both groups of anomalies, we insert 20 anomalies of each type belonging to this group. Figure 4a and 4c show the resulting RMSE and MAE for the technical faults and Fig. 4b and 4d for the unusual consumption. For each considered forecasting method, the bars indicate the average RMSE or MAE for the raw strategy, the detection strategy, the compensation strategy, and the anomaly-free baseline strategy. The error bars show the observed standard deviation across all runs on the test data set.

Technical Faults. Regarding the technical faults, all considered forecasting methods except the Last Day Forecast, the Last Week Forecast, the LinR, and the NN have both the lowest RMSE and MAE when using the compensation strategy. The SVR only has the lowest RMSE with the compensation strategy but the lowest MAE with the raw strategy. Even though the difference to the compensation strategy is only small, the Last Day Forecast, the Last Week Forecast, and the NN achieve their lowest RMSE and MAE using the detection strategy, and the LinR achieves its lowest RMSE and MAE with the raw strategy. Moreover, the difference between the RMSE when using the compensation strategy and the RMSE using the second-best strategy is largest for the XGBoost Regressor, the RF Regressor, and the SVR. Similarly, the difference between the MAE when using the compensation strategy and the MAE using the second-best strategy is largest for the XGBoost Regressor, the PNN, and the RF Regressor. Additionally, we see the largest difference between the RMSEs of the raw, detection, and compensation strategies for the Last Day Forecast and the Last Week Forecast, followed by the XGBoost Regressor. With respect to the MAE, we observe the largest differences for the PNN, the XGBoost Regressor, and the RF Regressor.

Compared to the anomaly-free baseline strategy, the RMSE of all forecasting methods, especially of the Last Day Forecast, the Last Week Forecast, the SVR, and the XGBoost Regressor, is also noticeably greater for all three strategies. Concerning the MAE, we also see large differences between the anomaly-free baseline strategy and the three other strategies for all forecasting methods but especially the Last Day Forecast, the XGBoost Regressor, the LinR, and the SVR. Considering the actual accuracy, the LinR, the PNN, and the NN form the group of forecasting methods that achieve the lowest RMSE and the SVR, the PNN, and the LinR the group with the lowest MAE.

Unusual Consumption. For the unusual consumption, all considered forecasting methods except the NN achieve both the lowest RMSE and MAE using the compensation strategy. The NN has its lowest RMSE with the detection strategy. The Last Day Forecast also has its lowest RMSE using the compensation strategy but its lowest MAE using the detection strategy. The difference in the RMSE and MAE between using the compensation strategy and using the second-best strategy is large for the XGBoost Regressor, the LinR, the RF Regressor, and the SVR, and small for the NN, the PNN, the Last Day Forecast, and the Last Week Forecast. Moreover, we observe the largest differences between the RMSEs of the raw, detection, and compensation strategies for the LinR, the RF Regressor, and the SVR. The largest observed differences in the MAE of these three strategies are for the LinR, the XGBoost Regressor, and the NN.

In comparison to the anomaly-free baseline strategy, the RMSE and MAE of all forecasting methods is clearly larger for all three strategies. With regard to their actual accuracy, the PNN achieves the lowest RMSE, followed by the SVR and the LinR. Considering the accuracy in terms of the MAE, the SVR, the PNN, and the Last Day Forecast achieve the lowest MAE.

Example. To demonstrate how the different strategies work and perform, we finally look at a part of the time series used for the evaluation in more detail. Using three days of this time series from June 2014, Fig. 5 illustrates an inserted synthetic anomaly, how this anomaly is detected, and what the resulting forecasts of the different strategies look like.

Fig. 5.

Three exemplary days of the power time series used for the evaluation, where a synthetic anomaly of type 8 is inserted, detected, and compensated. Finally, the inserted anomaly is dealt with differently in the forecast depending on the strategy.

More specifically, Fig. 5a illustrates the selected original time series, which we assume to be anomaly-free, and the time series with inserted synthetic anomalies. In the latter, we observe an anomaly of type 8, which increases the load values for roughly one of the three days. In addition to these two time series, Fig. 5b shows the data points of the time series with inserted synthetic anomalies that are detected as anomalous by the LOF, which is the applied anomaly detection method. We observe that the LOF detects many but not all data points of the inserted synthetic anomaly as anomalous. Figure 5c then illustrates how these detected anomalous data points are compensated using the Prophet-based imputation method. With regard to the compensated detected anomalous data points, we observe that the compensated values are all close to the original anomaly-free time series.
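To make the detection and compensation steps of Fig. 5b and 5c concrete, the following sketch runs an LOF detector over a sliding-window embedding of a synthetic load series with a type-8-like level shift and imputes the flagged points. The series, the window size, and the neighbor count are illustrative assumptions, and plain linear interpolation stands in for the Prophet-based imputation method used in the paper:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
t = np.arange(3 * 96)  # three days at an assumed 15-min resolution
load = 10 + 3 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 0.2, t.size)
load[100:180] += 5.0   # type-8-like anomaly: increased load for ~1 day

# Detect anomalous points with LOF on a sliding-window embedding
window = 8
X = np.lib.stride_tricks.sliding_window_view(load, window)
labels = LocalOutlierFactor(n_neighbors=48).fit_predict(X)  # -1 = outlier
mask = np.zeros(load.size, dtype=bool)
mask[window - 1:][labels == -1] = True  # flag each outlying window's last point

# Compensate the flagged points (linear interpolation as a simple stand-in
# for the Prophet-based imputation)
series = pd.Series(load)
series[mask] = np.nan
compensated = series.interpolate(limit_direction="both").to_numpy()
```

As in Fig. 5b, such a detector typically flags many but not all points of an inserted anomaly, so the compensation step only repairs the detected subset.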

Finally, Fig. 5d additionally shows the multi-step 24 h-ahead forecasts of the four different strategies using the PNN and given the previously described information. We observe that all strategies result in different forecasts: The forecast of the compensation strategy, which is based on the time series with compensated synthetic anomalies introduced in Fig. 5c, is closest to the forecast of the anomaly-free baseline strategy. Moreover, the forecast of the detection strategy, which uses the information on the detected anomalous data points introduced in Fig. 5b, is closer to the forecast of the anomaly-free baseline strategy than the forecast of the raw strategy.
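How the raw, detection, and compensation strategies differ as inputs to one and the same forecasting method can be sketched as follows. The synthetic series, the idealized compensation, and the choice of a linear regression with lag features are assumptions for illustration only; the point is that the detection strategy simply appends the anomaly flags as additional features, while the compensation strategy replaces the series itself:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, lag, horizon = 10 * 96, 96, 96  # ten days; 24 h of lags; 24 h-ahead forecast
t = np.arange(n)
y_raw = 10 + 3 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 0.2, n)
flags = np.zeros(n, dtype=bool)
flags[300:340] = True
y_raw[300:340] += 5.0        # inserted synthetic anomaly
y_comp = y_raw.copy()
y_comp[300:340] -= 5.0       # idealized compensation for illustration

def make_supervised(y, anomaly_flags=None):
    """Build lag features; the detection strategy appends anomaly flags."""
    X, Y = [], []
    for i in range(lag, n - horizon + 1):
        feats = y[i - lag:i]
        if anomaly_flags is not None:
            feats = np.concatenate([feats, anomaly_flags[i - lag:i].astype(float)])
        X.append(feats)
        Y.append(y[i:i + horizon])
    return np.array(X), np.array(Y)

forecasts = {}
for name, (series, f) in {"raw": (y_raw, None),
                          "detection": (y_raw, flags),
                          "compensation": (y_comp, None)}.items():
    X, Y = make_supervised(series, f)
    model = LinearRegression().fit(X[:-1], Y[:-1])
    forecasts[name] = model.predict(X[-1:])[0]  # multi-step 24 h-ahead forecast
```

The same pattern applies unchanged to any of the evaluated forecasting methods that accept a feature matrix.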

5 Discussion

In this section, we first discuss the results of the evaluation of the proposed strategies for managing anomalies in energy time series forecasting, before reviewing the evaluation's limitations and the insights gained.

In the comparison of the accuracy of the proposed strategies, we observe that using the compensation strategy yields the lowest RMSE and MAE for most forecasting methods and both groups of anomalies. However, while the results are generally consistent across both accuracy metrics, some forecasting methods benefit from the two other strategies with respect to the RMSE, the MAE, or both: the NN and the Last Day Forecast perform best with the detection strategy for both the technical faults and the unusual consumption, the Last Week Forecast with the detection strategy for the technical faults only, and the LinR and the SVR with the raw strategy for the technical faults. However, it is worth noting that the compensation strategy is often the second-best strategy in these cases, with similar accuracy, so it could serve as a default strategy.

Nevertheless, both the compensation and the detection strategy incur additional computational costs: the detection strategy requires anomaly detection, and the compensation strategy requires anomaly detection and subsequent compensation. In the case of the anomaly detection, the computational costs also include the creation of the latent space representation that we use to enhance the detection performance. Whether the improvement in accuracy over the raw strategy, which uses the data essentially as-is, justifies this additional computational cost depends on the forecasting method and on the anomalies contained in the data, and thus requires careful consideration. From these results, one could infer that applying strategies that actively handle anomalies, namely the compensation and the detection strategies, is generally beneficial and, more specifically, that using the compensation strategy is mostly beneficial. Based on this inference, a best practice could be to apply the compensation strategy in energy time series forecasting. Additionally, given the nature of the data used for the evaluation, we assume that the gained insights also apply to similar periodic data, for example, from the areas of solar power generation, mobility, and sales.

Moreover, the comparison of the accuracy of all strategies shows that there is no clearly best-performing forecasting method for technical faults and unusual consumption. Instead, there are rather groups of similarly well-performing forecasting methods, for example, the LinR and the PNN for the technical faults and the PNN and the SVR for the unusual consumption. Additionally, regarding their actual accuracy, we observe that even naive forecasting methods can provide reasonable forecasts, which can serve as a computationally light baseline when looking for competitive forecasts.
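The two naive baselines mentioned above can be written in a few lines, which illustrates why they are computationally light reference points; the 96 steps per day assume a 15-min resolution, which is an assumption for illustration:

```python
import numpy as np

def last_day_forecast(y, steps_per_day=96):
    """Repeat the most recent day as the forecast for the next day."""
    return np.asarray(y)[-steps_per_day:]

def last_week_forecast(y, steps_per_day=96):
    """Use the same weekday one week earlier as the forecast."""
    return np.asarray(y)[-7 * steps_per_day:-6 * steps_per_day]

history = np.arange(8 * 96, dtype=float)  # eight days of dummy values
print(last_day_forecast(history).shape)   # (96,)
print(last_week_forecast(history).shape)  # (96,)
```

For strongly periodic load data, such persistence forecasts are often hard to beat by a large margin, which makes them useful sanity checks for more complex methods.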

Furthermore, the example of three days from the time series used for the evaluation illustrates how the strategies differ. By showing an inserted synthetic anomaly, the data points of this anomaly that are detected as anomalous, and how the detected anomalous data points are compensated, the example presents the inputs for the raw, detection, and compensation strategies. Additionally, the example includes the resulting forecasts of all strategies for the next day. Thereby, the influence of the different inputs on the forecast accuracy of the different strategies becomes comprehensible. Considering the results of the example, we also observe that the compensation strategy provides the comparatively best forecast, although it depends on the only partly detected and compensated inserted synthetic anomaly. With regard to the performance of the detection method, however, it should be noted that, in our experience, anomaly types of unusual consumption are difficult to detect.

Nevertheless, we note that these results are associated with certain limitations. One limitation is that we only evaluate the proposed strategies with the selected anomaly detection, anomaly compensation, and forecasting methods, since the performance of the proposed strategies highly depends on these methods. For example, forecasting methods vary by design in their sensitivity to anomalies, and detection methods may not detect all anomalous data points. While we believe that the selected methods are a representative sample of existing methods, it would be interesting to extend the evaluation to further anomaly detection, anomaly compensation, and forecasting methods. The performance of the selected methods additionally depends on the chosen hyperparameters. Although we selected these hyperparameters carefully, their optimal choice could be investigated further. Moreover, the reported results are based on the selected data. Although, based on our domain knowledge, we consider the selected time series comparatively anomaly-free, it could contain anomalies that influence the results. For example, such contained anomalies could worsen the results of the raw strategy and improve the results of the detection and compensation strategies. However, in this case, the relative comparison of the strategies would remain the same. Nevertheless, future work could examine more closely whether anomalies are contained in the data and affect the results. In addition to contained anomalies, the results also depend on the inserted anomalies and might change with different numbers and types of inserted anomalies. Future work could thus also examine the influence of the inserted anomalies on the results. Furthermore, the data used for the evaluation represents the electrical power consumption on a client level. In future work, it might, therefore, be interesting to use other data to investigate how the aggregation level of the data influences the results.

Overall, we conclude from the performed evaluation that the compensation strategy is generally beneficial, as it mostly allows for better or at least similar forecasting results compared to the other evaluated strategies when the input data contains anomalies. By enabling accurate forecasts, the compensation strategy provides a means for appropriately managing anomalies in forecasts using energy time series, which could also be beneficial for automated machine learning forecasting.

6 Conclusion

In the present paper, we evaluate three general strategies for managing anomalies in automated energy time series forecasting, namely the raw, the detection, and the compensation strategy. For the evaluation, we apply a representative selection of forecasting methods to real-world data containing inserted synthetic anomalies in order to compare these strategies regarding the obtained forecast accuracy. We also present an example of how these strategies work and perform.

Despite the additional computational costs it incurs, the compensation strategy is generally beneficial, as it mostly outperforms the detection and the raw strategies when the input data contains anomalies.

Given the proposed strategies for managing anomalies in energy time series, future work could address several follow-up questions. For example, future work could verify the results using other data, including labeled data, as well as other anomaly detection and anomaly compensation methods. Similarly, future work could evaluate the proposed strategies with further forecasting methods. Furthermore, future work could integrate the proposed strategies into existing approaches for automated machine learning to include them in the optimization problem of finding the best forecast for a given data set.