1 Introduction

Very short-term load forecasting (VSTLF) provides load forecasts up to one day ahead. Across the power industry, such forecasts are typically used by utilities and grid operators for real-time scheduling of electricity generation, load frequency control, and demand response. Very short-term load forecasts are also crucial to the business operations of retailers, power marketers and trading firms.

VSTLF is often viewed as a sub-problem of short-term load forecasting (STLF), largely because both can take weather forecasts as inputs for the forecast period. STLF has been extensively studied over the past several decades, as summarized by several review articles [1,2,3,4]. A recent development in STLF was the Global Energy Forecasting Competition 2012 (GEFCom2012) [5].

Many STLF models, such as regression models [2, 6] and artificial neural networks (ANN) [3], can be used for VSTLF. Nevertheless, to achieve high accuracy over the very short horizon, it should be recognized that the difference between VSTLF and STLF in practice is two-fold. From the modeling perspective, VSTLF models can rely on lagged load as an independent variable, in addition to the weather and calendar variables commonly used in STLF. From the implementation perspective, VSTLF requires the model to be estimated quickly enough to produce the forecast in time. The short lead time also challenges the data collection process: although smart grid technologies have made it possible to push recent load information to the operation room, many power companies still do not have access to high-quality load data for the most recent hour(s) when forecasting the load of the next hour.

The VSTLF literature has been primarily devoted to the modeling aspect. Researchers have tried various techniques to forecast the load of the next few minutes to hours. Liu et al. compared five techniques for VSTLF in [7]. Although the paper has been frequently cited, its autoregressive models were incorrectly applied to the load series. Charytoniuk and Chen proposed an approach using a set of ANNs to model the load dynamics instead of the actual loads [8]. Taylor used observations of minute-by-minute British electricity demand to evaluate various methods for VSTLF, including autoregressive integrated moving average (ARIMA) models and two exponential smoothing methods [9]. Alamaniotis et al. proposed an ensemble of kernel-based Gaussian processes [10]. Guan et al. pre-filtered the spikes in the load series and decomposed the series using wavelets before feeding it into a neural network [11].

Although lagged load has often been used in the VSTLF literature, researchers typically assume that recent load observations are available with high quality whenever needed. In other words, there are few studies on the data quality issues of the lagged load variables. In reality, the information and communication technologies deployed by many power companies cannot guarantee accurate real-time load data. The most recent load data may arrive one or several hours, or even several days, late. Owing to meter malfunctions, communication failures and equipment outages, the raw load data may not be fully cleansed until the load settlement process several weeks later. Hence it is very likely that the load value of the most recent hour is inaccurate. Malicious attacks on the data acquisition system may also lead to bad load data with many anomalies [12]. In fact, anomalies in the most recent load observations often degrade the performance of VSTLF models.

Although some papers in the load forecasting literature have touched on data quality issues, few are specifically devoted to VSTLF. Among power and energy applications, anomaly detection has emerged as an important topic in several areas, such as electric load forecasting [11, 13,14,15], load pattern grouping [16], gas load forecasting [17] and load data cleansing [15, 18, 19]. Some of these works focus on anomaly detection for STLF. Chakhchoukh et al. proposed a robust method for outlier and break detection in seasonal ARIMA parameter estimation, forecasting the electricity consumption in France up to one day ahead [15]. Engineers from the British Columbia Transmission Corporation proposed several novel methods to cleanse corrupted and missing observations in the load data [18, 19]. In GEFCom2014, the winning contestant Jingrui Xie used a procedure based on a multiple linear regression model for outlier detection and data cleansing for STLF [14].

The main contribution of this paper is a novel anomaly detection method for VSTLF. We propose a model-based anomaly detection method that consists of two components: a dynamic regression model and an adaptive anomaly threshold. Due to the lack of a benchmark anomaly detection method specifically for VSTLF, three other methods are selected for comparison. Two of them are so-called “naïve methods” commonly used in the industry, while the third is the method developed and used by Jingrui Xie in GEFCom2014 [14]. Publicly available data from ISO New England (ISONE) is used to construct the case study, in which anomalies are introduced by deliberately increasing the most recent load observations to different levels.

The rest of this paper is organized as follows. Section 2 introduces the background of this study. Section 3 reviews three anomaly detection methods and then proposes a model-based anomaly detection method for VSTLF. Section 4 reports the framework used to simulate the anomalies and presents the computational results. Section 5 proposes a general anomaly detection framework and discusses some future research directions. Section 6 concludes this paper.

2 Background

In this section, we introduce the background of this paper, including the data, the models and their VSTLF performance on the case study data. All numerical experiments in this paper are performed using MATLAB (R2014a) on a personal laptop with an Intel Core i5 2.40 GHz CPU, 4 GB of usable RAM and Microsoft Windows 8 Professional. The two regression models are implemented using the “robustfit” function of MATLAB.

2.1 ISONE data

ISONE has made its load and temperature data publicly available from its website [20]. The data has been widely used in the load forecasting community [11, 21]. The Global Energy Forecasting Competition 2017 also used ISONE data in its qualifying match.

This paper takes three years (2013 to 2015) of hourly system total load and dry bulb temperature data to construct the case study. The goal is to forecast the one-hour-ahead loads of 2015. We conduct one-hour-ahead ex-post forecasting on a rolling basis, with the model re-estimated every hour using two years of data. In other words, to forecast each hourly load in 2015, the most recent two years of hourly load and temperature values are used as the training data for parameter estimation.

2.2 Models for VSTLF

Regression analysis is a widely used technique for load forecasting [2, 22,23,24]. In the regression analysis framework, the load is usually treated as the dependent variable, while the weather and calendar variables are treated as independent variables. The parameters of regression models are usually estimated using the ordinary least squares method. Most of the top teams in GEFCom2012 adopted regression models [5, 25]. The benchmark model of GEFCom2012, a.k.a. Tao’s Vanilla benchmark, is also a regression model:

$$ \begin{aligned} E(Load_{t}) & = \beta_{0} + \beta_{1} Trend_{t} + \beta_{2} Month_{t} + \beta_{3} Hour_{t} \cdot Weekday_{t} \\ & \quad + \beta_{4} T_{t} \cdot Hour_{t} + \beta_{5} T_{t}^{2} \cdot Hour_{t} + \beta_{6} T_{t}^{3} \cdot Hour_{t} \\ & \quad + \beta_{7} T_{t} \cdot Month_{t} + \beta_{8} T_{t}^{2} \cdot Month_{t} + \beta_{9} T_{t}^{3} \cdot Month_{t} \end{aligned} $$
(1)

where $Trend_{t}$ is an increasing natural number representing a linear trend at time t; $Hour_{t}$, $Weekday_{t}$ and $Month_{t}$ are class variables representing the 24 hours of a day, 7 days of a week and 12 months of a year, respectively; $T_{t}$ is a quantitative variable representing the temperature at time t. For ease of presentation, we use $\beta_{j}$ to denote the coefficients. Nevertheless, it should be noted that $\beta_{j}$ for a quantitative variable is a single coefficient, while $\beta_{j}$ for a class variable or an interaction involving one or two class variables is a vector of multiple coefficients. In total, this Vanilla model consists of 290 coefficients to be estimated.

To enhance the accuracy in the very short term, we augment the Vanilla model by adding a lagged load variable:

$$ \begin{aligned} E(Load_{t}) & = \beta_{0} + \beta_{1} Trend_{t} + \beta_{2} Month_{t} + \beta_{3} Hour_{t} \cdot Weekday_{t} \\ & \quad + \beta_{4} T_{t} \cdot Hour_{t} + \beta_{5} T_{t}^{2} \cdot Hour_{t} + \beta_{6} T_{t}^{3} \cdot Hour_{t} \\ & \quad + \beta_{7} T_{t} \cdot Month_{t} + \beta_{8} T_{t}^{2} \cdot Month_{t} + \beta_{9} T_{t}^{3} \cdot Month_{t} \\ & \quad + \beta_{10} Load_{t - 1} \end{aligned} $$
(2)

where $Load_{t-1}$ is the load of the preceding hour. Hence, there are 291 coefficients in total to be estimated. With the lagged dependent variable, model (2) is a dynamic regression model, abbreviated as DRM.
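For illustration, the structure of model (2) can be written down compactly with a modeling library. The sketch below uses the statsmodels formula API in Python with assumed column names (load, temp, month, weekday, hour); the experiments in this paper were implemented in MATLAB with “robustfit”, so this is only an illustration of the model structure, not the actual implementation. Dropping the load_lag1 term recovers the Vanilla model (1).

```python
# Illustrative sketch of model (2); the paper's experiments used
# MATLAB's "robustfit". Column names are assumptions for this sketch.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_drm(df: pd.DataFrame):
    """Fit the DRM on a frame of consecutive hourly observations."""
    df = df.copy()
    df["trend"] = np.arange(1, len(df) + 1)   # linear trend Trend_t
    df["load_lag1"] = df["load"].shift(1)     # lagged load Load_{t-1}
    formula = (
        "load ~ trend + C(month) + C(hour):C(weekday)"
        " + temp:C(hour) + I(temp**2):C(hour) + I(temp**3):C(hour)"
        " + temp:C(month) + I(temp**2):C(month) + I(temp**3):C(month)"
        " + load_lag1"                        # drop this term for (1)
    )
    # The first row has no lagged load and is excluded from fitting.
    return smf.ols(formula, data=df.dropna(subset=["load_lag1"])).fit()
```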

2.3 Benchmarking VSTLF performance on ISONE data

We then conduct one-hour-ahead forecasting for 2015 using the two models introduced above. We use the mean absolute percentage error (MAPE) over all hourly loads of 2015 to evaluate the performance of the models. MAPE is defined as follows:

$$ MAPE = \frac{100\%}{n}\sum\limits_{t = 1}^{n} \left| \frac{A_{t} - F_{t}}{A_{t}} \right| $$
(3)

where $A_{t}$ and $F_{t}$ are the actual and forecasted hourly loads at time t, respectively. A smaller MAPE value indicates that the corresponding model produces more accurate forecasts.
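As a minimal illustration, Eq. (3) translates directly into a few lines of Python:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error of Eq. (3), in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))
```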

Table 1 shows the VSTLF performance of the two models. For each model, we conduct two experiments. One assumes that the actual load value of the most recent hour is available and accurate, so the actual value is used for $Load_{t-1}$. The other does not use the actual load value of the most recent hour, either because it is unavailable or of poor quality; to forecast the next hourly load, we first forecast the load of the most recent hour, then use that predicted value for $Load_{t-1}$ together with the actual values of the earlier loads.
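The two experiments can be sketched as a rolling loop, reusing the fit_drm() helper above. This is a simplified illustration, not the paper’s MATLAB code: refitting every hour as written is computationally naive, and the forecast of hour t−1 in the second experiment is approximated in-sample with the current model rather than taken from the previous forecasting origin.

```python
def rolling_one_hour_ahead(df, use_actual_lag, window=2 * 8760):
    """One-hour-ahead ex-post forecasts over df[window:], with the
    lagged load taken either as the actual or the predicted value."""
    forecasts = []
    for t in range(window, len(df)):
        model = fit_drm(df.iloc[t - window:t])
        row = df.iloc[[t]].copy()
        row["trend"] = window + 1             # one step past the window
        if use_actual_lag:
            row["load_lag1"] = df["load"].iloc[t - 1]
        else:
            # Forecast hour t-1 first (using the actual load at t-2),
            # then feed that prediction in as the lagged load.
            prev = df.iloc[[t - 1]].copy()
            prev["trend"] = window
            prev["load_lag1"] = df["load"].iloc[t - 2]
            row["load_lag1"] = float(model.predict(prev).item())
        forecasts.append(float(model.predict(row).item()))
    return forecasts
```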

Table 1 MAPE of hourly load forecast in 2015

The following observations can be made from Table 1:

1) Using different values for $Load_{t-1}$, the MAPE results of the Vanilla model differ. This is because the forecasts are produced on a rolling basis with the model re-estimated every hour, so the value used for the most recent hour enters the two-year training window.

2) Whether the actual or predicted load values are used for the lagged loads, the DRM produces much more accurate one-hour-ahead forecasts than the Vanilla model, largely due to the inclusion of the lagged load variable.

3) The Vanilla model is not very sensitive to the load of the most recent hour. When the predicted load value is used for $Load_{t-1}$, the MAPE value increases by only 0.01%. The reason is that the most recent observation is only one of the 17520 hourly observations within two years that are equally weighted in the least squares estimation.

4) The DRM is quite sensitive to the most recent hourly load. When the predicted load value is used for $Load_{t-1}$, the MAPE value increases from 0.84% to 1.46%. The reason is that $Load_{t-1}$ enters the DRM directly as an independent variable.

3 Anomaly detection methods

In this section, we first introduce three anomaly detection methods, and then propose a DRM-based detection method with an adaptive threshold.

3.1 A naïve method

In the power industry, a naïve anomaly detection method is often used for load forecasting. The mean and standard deviation of the hourly load values of all observations in one preceding year are first calculated and denoted as $\mu_L$ and $\sigma_L$, respectively. Then the hourly observations with load values outside the interval $[\mu_L - h\sigma_L, \mu_L + h\sigma_L]$ are treated as anomalies, where the threshold h is given beforehand. This naïve method is denoted as “Method I” in this paper.
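Method I amounts to a few lines of code. The following Python sketch is an illustration of the rule above, assuming `history` holds the hourly loads of the preceding year.

```python
import numpy as np

def method_i(history, new_load, h=2.0):
    """Flag new_load if it lies outside [mu_L - h*sigma_L, mu_L + h*sigma_L]."""
    history = np.asarray(history, dtype=float)
    mu, sigma = history.mean(), history.std()
    return abs(new_load - mu) > h * sigma
```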

3.2 A seasonal naïve method

The seasonal naïve method is widely used for anomaly detection in load data. Denoted as “Method II” in this paper, this method compares the load in each hour of a day with the corresponding hourly loads. For the ith hour of the day, i = 1, 2, …, 24, the mean and standard deviation of the hourly load values of all observations at hour i are first calculated and denoted as $\mu_L(i)$ and $\sigma_L(i)$, respectively. Then the observations at hour i with load values outside the interval $[\mu_L(i) - h\sigma_L(i), \mu_L(i) + h\sigma_L(i)]$ are treated as anomalies, where the threshold h is again given beforehand.
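A sketch of Method II, assuming `history` starts at hour 0 of some day so that every 24th element shares the same hour of the day:

```python
import numpy as np

def method_ii(history, new_load, hour, h=2.0):
    """Same rule as Method I, but with per-hour-of-day statistics (hour: 0-23)."""
    same_hour = np.asarray(history, dtype=float)[hour::24]
    mu, sigma = same_hour.mean(), same_hour.std()
    return abs(new_load - mu) > h * sigma
```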

3.3 A Vanilla model-based method with a fixed threshold

In GEFCom2014, the winning contestant Jingrui Xie developed an anomaly detection method based on the Vanilla model with a fixed threshold [14]. The parameters of the Vanilla benchmark model are first estimated using the training data. Then the absolute percentage error (APE) for each hourly load observation in the historical data is calculated. The observations in the new data with APE values greater than the fixed threshold h are treated as anomalies; h was set to 0.5 in [14]. This method is denoted as “Xie’s method” in this paper.
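A sketch of the detection rule in Xie’s method, assuming (consistently with Eq. (3)) that the APE is taken relative to the observed load:

```python
def xie_method(new_load, vanilla_forecast, h=0.5):
    """Flag the observation if its APE against the Vanilla model
    forecast exceeds the fixed threshold h (0.5 in [14])."""
    ape = abs(new_load - vanilla_forecast) / abs(new_load)
    return ape > h
```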

3.4 A DRM-based method with an adaptive threshold

Due to the outstanding performance of the DRM shown in Table 1, we propose a real-time anomaly detection method with an adaptive threshold for VSTLF based on the DRM. First, the parameters of the DRM are estimated using the training dataset (i.e., part of the historical data). Then the percentage error (PE) of each hourly load in the training dataset is calculated. Finally, a newly-collected hourly observation is treated as an anomaly if the corresponding PE value falls outside the interval $[\mu_P - h\sigma_P, \mu_P + h\sigma_P]$, where $\mu_P$ and $\sigma_P$ are the mean and standard deviation of the PE values of all observations in the two recent years of the rolling period. Hence these thresholds are updated on a rolling basis as the VSTLF progresses. The flow chart of the proposed anomaly detection method for one instance is depicted in Fig. 1. As the forecasting origin advances during the sliding simulation, the work flow is repeated on a rolling basis as well. The proposed anomaly detection method can be effective for both missing data and corrupted data. In this paper, we test it on the ISONE data with simulated anomalies.
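The detection rule of the proposed method can be sketched as follows, where `pe_history` holds the PE values over the two-year rolling window and the PE convention follows Eq. (3). When an observation is flagged, it is replaced with the DRM forecast before the next forecasting step.

```python
import numpy as np

def proposed_method(pe_history, new_load, drm_forecast, h=2.0):
    """Flag the new observation if its PE against the DRM forecast
    lies outside [mu_P - h*sigma_P, mu_P + h*sigma_P]."""
    pe_history = np.asarray(pe_history, dtype=float)
    mu_p, sigma_p = pe_history.mean(), pe_history.std()
    pe = (new_load - drm_forecast) / new_load
    return abs(pe - mu_p) > h * sigma_p
```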

Fig. 1 Flow chart of the proposed anomaly detection method

4 Case study

In this section, we present the case study, including the anomaly simulation method and a comparative analysis of the four anomaly detection methods on the ISONE data with simulated anomalies.

The simulated anomalies are injected into the load data of 2015, one at a time. In total, p% of the hourly loads are randomly selected and altered by multiplying them by (1 + k%), turning these selected data points into anomalies. Figure 2 depicts the hourly load profile of the corrupted data for one week (from 13 July 2015 to 19 July 2015) in the summer of 2015, where k = 20 and p = 50. Only anomalies with k ≥ 0 are tested in this section, since similar observations can be obtained for k < 0.
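The anomaly injection can be sketched as follows; the random selection and the (1 + k%) scaling follow the description above.

```python
import numpy as np

def inject_anomalies(loads, p, k, seed=None):
    """Scale a random p% of the hourly loads by (1 + k/100); return
    the corrupted series and the indices of the injected anomalies."""
    rng = np.random.default_rng(seed)
    corrupted = np.asarray(loads, dtype=float).copy()
    idx = rng.choice(len(corrupted),
                     size=int(round(len(corrupted) * p / 100)),
                     replace=False)
    corrupted[idx] *= 1 + k / 100
    return corrupted, idx
```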

Fig. 2 Load profile of the corrupted data

Before forecasting the load of the next hour using the DRM, each anomaly detection method is applied individually to determine whether the newly-acquired hourly load of the current hour should be used as-is or replaced with its predicted value. To measure the performance of the four anomaly detection methods, we use two measures introduced in [26]. One is the false negative rate (FNR), the ratio of the number of undetected anomalies to the number of all anomalies. The other is the false positive rate (FPR), the ratio of the number of normal points detected as anomalies to the number of normal points. A smaller FNR or FPR value indicates a more effective anomaly detection method. Since the ultimate goal of anomaly detection is to enhance the VSTLF accuracy, we also use the MAPE value to evaluate the one-hour-ahead forecasting accuracy after each anomaly detection method is applied.
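Given boolean indicators of which points were flagged and which are true (injected) anomalies, the two rates are computed directly:

```python
import numpy as np

def fnr_fpr(flagged, true_anomaly):
    """FNR: share of anomalies missed; FPR: share of normal points flagged."""
    flagged = np.asarray(flagged, dtype=bool)
    true_anomaly = np.asarray(true_anomaly, dtype=bool)
    fnr = np.sum(~flagged & true_anomaly) / np.sum(true_anomaly)
    fpr = np.sum(flagged & ~true_anomaly) / np.sum(~true_anomaly)
    return fnr, fpr
```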

4.1 Varying magnitude of anomalies

To investigate how the magnitude of anomalies affects the performance of the anomaly detection methods, we first fix the percentage of anomalies in the testing dataset (i.e., the full year of 2015) at 50% (i.e., p = 50) and then vary the magnitude k of the anomalies from 1.25 to 40, doubling the value of k at each step. For each k and each anomaly detection method, we repeat the test 5 times; increasing the number of repetitions does not alter the findings and conclusions of this paper. The averages of the FNR, FPR and MAPE values over the 5 tests are recorded in Table 2 for all four methods. The thresholds h of Method I, Method II, Xie’s method [14] and the proposed method are set to 2, 2, 0.2 and 2, respectively. The following observations can be made from Table 2.

Table 2 FNR, FPR and MAPE under various magnitudes of anomaly load
(1) For each method, as k increases, FPR remains the same and FNR decreases. The reason is that FPR is determined by h, which stays the same for all possible k.

(2) For k = 1.25, a lower FNR does not necessarily result in a lower MAPE. The reason is that there is an increasing trend in the electric loads from 2013 to 2015. A higher FNR means more undetected anomalies of k% increased loads in the training dataset, which help offset the bias in the load forecast for 2015.

(3) For Methods I and II, MAPE increases as k increases. The main reason is that these two methods cannot detect and correct enough anomalies (i.e., FNR is not low enough) to maintain strong forecasting performance.

(4) For Xie’s method and the proposed method, MAPE first increases and then decreases as k increases. The initial increase of MAPE is mainly due to the increase of k itself, while the later decrease is mainly due to the significant drop in FNR.

Figure 3 shows the forecasted hourly load profiles for the same week in the summer of 2015, with the anomalies generated under the setting k = 20 and p = 50. All four methods over-predict the actual load to some extent, due to the anomalies of increased loads. Nevertheless, the forecast produced by the proposed method is overall much closer to the actual load and much less affected by the anomalies than the forecasts produced by the other three methods.

Fig. 3 Forecasted loads for the corrupted data using the four methods

4.2 Varying the amount of anomalies and threshold in the proposed method

We first fix the magnitude of the anomaly load at 110%, i.e., k = 10. We then set the percentage of anomalies in 2015 to 25%, 50% and 75%, respectively. The threshold h in the proposed anomaly detection method is varied from 1 to 8 in increments of 1. For each (h, p) pair, we repeat the test 5 times. Figure 4 shows the averages of the FNR and FPR values over the 5 tests for different h values with p = 50. Figure 5 shows the averages of the MAPE values over the 5 tests for different h and p values.
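The sweep can be organized as in the sketch below. run_simulation() is a hypothetical driver, not defined in this paper: it is assumed to inject anomalies with inject_anomalies(), run the rolling DRM forecast with the proposed detector at threshold h, and return the (FNR, FPR, MAPE) triple for 2015.

```python
import numpy as np

for p in (25, 50, 75):
    for h in range(1, 9):
        runs = [run_simulation(p=p, k=10, h=h, seed=s)  # hypothetical driver
                for s in range(5)]                      # 5 repetitions per (h, p)
        print(p, h, np.mean(runs, axis=0))              # average FNR, FPR, MAPE
```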

Fig. 4 FNR and FPR of the proposed method for p = 50

Fig. 5 Forecast error in MAPE of the DRM for VSTLF

The following can be observed from Figs. 4 and 5:

(1) As h increases, FNR increases and FPR decreases.

(2) For any given h, MAPE increases as the percentage of anomalies increases.

(3) For any given p, as h increases from 1 to 4, MAPE decreases, primarily due to the significant decrease of FPR; as h increases from 5 to 8, MAPE increases, primarily due to the increase of FNR.

(4) For any given p, MAPE reaches its minimum when h is 4 or 5. Taking p = 50 as an example, with h = 4 or 5, both FNR and FPR are below 0.8%, while MAPE = 1.17%, lying in the middle of the interval [0.84%, 1.46%] from Table 1. This is close to the lowest possible MAPE attainable, because most anomalies are replaced with forecasted loads and few normal observations are flagged as anomalies. Optimal h values for other k values, or for anomalies that decrease rather than increase the recent load, can be obtained similarly.

4.3 Comparisons

Based on the computational results in Table 2, the order of the anomaly detection methods from most effective to least effective is: the proposed method, Xie’s method, Method II and Method I. In principle, Xie’s method and the proposed method are the two better ones, since they are based on more comprehensive models than the naïve and seasonal naïve models. Method II outperforms Method I due to the inherent seasonality of the load series. The reason that the proposed method outperforms Xie’s method is two-fold. First, as shown in Table 1, the underlying model of the proposed method (i.e., the DRM) produces much more accurate forecasts for VSTLF than the underlying model of Xie’s method (i.e., the Vanilla model). Second, a threshold determined by the mean and standard deviation of the PE values adapts to the data better than the fixed APE threshold in Xie’s method.

5 Discussion

In this section, we discuss the general anomaly detection framework and some future research directions.

5.1 A general anomaly detection framework

As shown in Table 3, the four anomaly detection methods can be categorized using a general anomaly detection framework consisting of two components: an underlying model and a threshold. The evidence from the earlier observations suggests that the accuracy of the underlying model directly influences the effectiveness of the anomaly detection method, and that an adaptive threshold is superior to a fixed one. Future research can be carried out within this framework by testing additional underlying models and other means of defining the threshold. Similar anomaly detection methods can be tested for STLF applications as well.

Table 3 Details of four methods under proposed framework

5.2 Underlying models in proposed framework

In this paper, the DRM is constructed by adding one lagged load variable to the Vanilla model. Adding two or more lagged load variables and using the resulting model as the underlying model in the proposed anomaly detection framework may yield more accurate VSTLF forecasts and, in turn, more accurate real-time anomaly detection. Other alternatives, such as artificial neural networks, support vector regression, fuzzy regression and robust regression models, are also good candidates for the underlying model. Hence, an important future direction is to seek the best underlying model to fine-tune the proposed anomaly detection framework.

6 Conclusion

Anomaly detection and cleansing of load data is an essential task in the smart grid era. High-quality real-time load data helps achieve accurate very short-term forecasts, which in turn support operational excellence. In this paper, extending the Vanilla benchmark model of GEFCom2012, a DRM is proposed for VSTLF. We then propose a real-time anomaly detection method based on the DRM for corrupted load data, which can be cleansed by replacing the detected anomalies with the forecasted hourly loads from the last sliding simulation. Extensive tests on the ISONE data with simulated anomalies show that the proposed anomaly detection method outperforms two commonly-used naïve methods and one state-of-the-art method. Finally, a general framework is proposed to lay the groundwork for future research on anomaly detection for load forecasting.

Although designed for hourly load data, the proposed anomaly detection method may be equally applicable to other anomaly scenarios in time series data, such as weather data and renewable generation data. Further investigation into more effective model-based real-time anomaly detection methods and robust VSTLF techniques is also of particular importance to the field.