1 Introduction

1.1 Motivation

Over the past two decades, the electric power industry has been deregulated in many regions. Retailers and large customers purchase electricity from wholesale markets, while most small and medium-sized customers buy their electricity from retail markets. Electricity price forecasting plays an increasingly important role in generation, retailing, and planning.

Electricity price forecasting can be categorized into short-term and mid-term forecasting. Short-term price forecasting helps market participants determine day-ahead bids and offers in the wholesale market to maximize benefit and manage risk [1]. Mid-term forecasting of the monthly average electricity price (e.g., month ahead) is essential in guiding participants over mid- and long-term time scales. In retail markets, the month-to-month variable rate plan is prevalent: the retail rate varies from month to month based on the market price, and retailers usually issue the rate for the next month one month ahead of time. In this sense, mid-term price forecasting is helpful for retailers. In addition, mid- and long-term price forecasting can provide price signals for generation expansion.

However, the disadvantages of monthly average electricity price forecasting are obvious.

1) For retailers, although monthly average electricity price forecasting can help set the rate plan, its contribution is very limited. The monthly average price is the load-weighted price of the system load profile. If the load profiles of a retailer's customers differ from the system load profile, the retailer's average cost per kWh may differ from the monthly average electricity price because of the significant price difference between peak and off-peak hours. Therefore, if a month ahead average daily electricity price profile can be forecasted, retailers can set the month-to-month variable rate plan based on the load profiles of their customers. The rate can even be customized for each consumer based on his/her load profile.

2) For GenCos, electricity prices in different periods are essential for investment decision making. Electricity prices in different periods are taken into consideration in microgrid planning [2, 3]. The microgrid can benefit significantly from high prices in peak hours if the generation cost is lower than the wholesale market price [4]. Energy storage systems are cost-effective if the price differences between peak hours and off-peak hours are significant [5]. However, the monthly average price cannot provide enough information for microgrid planning.
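The point made in item 1) can be illustrated with a small arithmetic sketch. The four-period day, the prices and both load profiles below are hypothetical numbers chosen only for illustration; the helper `load_weighted_price` is likewise an assumed name, not part of the paper.

```python
# Hypothetical 4-period day: prices in $/MWh and loads in MWh (illustrative only).
prices = [20.0, 30.0, 80.0, 40.0]            # off-peak, shoulder, peak, evening
system_load = [100.0, 150.0, 200.0, 150.0]   # system load profile
retailer_load = [10.0, 10.0, 40.0, 10.0]     # a peak-heavy retailer portfolio


def load_weighted_price(prices, loads):
    """Return the load-weighted average price: sum(p * q) / sum(q)."""
    total_energy = sum(loads)
    total_cost = sum(p * q for p, q in zip(prices, loads))
    return total_cost / total_energy


# The system-wide average and the retailer's own average cost differ
# because the retailer buys a larger share of its energy in the peak period.
system_average = load_weighted_price(prices, system_load)
retailer_average = load_weighted_price(prices, retailer_load)
```

With these made-up numbers the retailer's average cost per MWh is higher than the system load-weighted average, which is why a single monthly average price is not enough to set a rate for this portfolio.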

Therefore, month ahead average daily electricity price profile forecasting is essential.

1.2 Literature review and contributions

A variety of methods and ideas have been tried for electricity price forecasting with varying degrees of success [6]. The Global Energy Forecasting Competition (GEFCom) has been held annually by the IEEE Working Group on Energy Forecasting since 2012. The theme of GEFCom2014 is electricity price forecasting [7]. Electricity price forecasting can be categorized into short-term and medium-and-long term time scales.

Most research focuses on short-term electricity price forecasting, for which numerous methods have been proposed. Methodologies fall mainly into three types: game theory models, simulation models and time-series models [8]. In [9], a dynamic platform is proposed to foster the development of data-mining enhanced multi-agent systems. It is used to predict power load and settlement prices in the Greek day-ahead market. Time-series models are also popular in short-term price forecasting and include parsimonious stochastic models, regression or causal models and artificial intelligence (AI) models [8]. In [10], a new functional forecasting method is proposed, which attempts to generalize the standard seasonal ARMAX time-series model to the L2 Hilbert space. A neural network (NN) model is presented in [11]. In this model, different influential factors are the feedback. Historical prices from the financial market, weekly price/load information, historical loads and day type are chosen as the explanatory factors. A support vector machine (SVM) model considering the impacts of oil and natural gas prices is proposed in [12] to forecast daily electricity price. In [13], a genetic optimal regression of relevance vector machines (RVM) model is proposed. The final prediction model is the optimal linear assembly of several individual RVM models with different kernel functions.

With the continuous deepening of research, hybrid models that combine several methods have been developed. Hybrid models usually outperform individual forecasting models. A short-term forecasting of the electricity price with data-driven algorithms is studied in [14]. A stacked denoising autoencoder (SDA) model, a class of deep neural networks (DNN), and its extended version are utilized to forecast the daily electricity price profile. In [15], a hybrid architecture combining the advantages of autoregressive integrated moving average (ARIMA) models and the local learning techniques is proposed. A hybrid model is proposed in [16]. In this model, an artificial neural network (ANN), an adaptive neuro-fuzzy inference system and an autoregressive moving average (ARMA) are utilized to generate three independent price forecasts. A new data fusion algorithm is then proposed to combine them. A hybrid approach to construct prediction intervals of marginal clearing prices (MCPs) with a two-stage formulation is proposed in [17].

Although short-term electricity price forecasting has been well studied, only a few studies focus on mid-term price forecasting. Mid-term electricity price forecasting is much more complicated, whereas short-term price forecasting can take advantage of trends from the immediate past [18]. Data-driven approaches, which take impact factors as inputs, are prevalent in mid-term price forecasting. In [19], several methods with some economic data as inputs are utilized to forecast the monthly average price, and the best mean absolute percentage error (MAPE) in these methods is 12.97%. An SVM model considering calendar day, fuel prices, electric loads, weather and import/export power is proposed in [20], and the MAPE is 8.04%. A data-driven approach with two regression-based linear forecasting models is proposed in [21], and the MAPE is 9.67%.

Reference [22] forecasts the mid-term UK baseload electricity prices by forecasting the prices of each day and averaging the forecasts afterwards. Block transaction is adopted in the UK spot market, so the UK baseload electricity price fluctuates less and is less sensitive to load variation and fuel price fluctuation. However, the day-ahead price in the Electric Reliability Council of Texas (ERCOT) varies from hour to hour and day to day, and it is highly sensitive to load variations and fuel price fluctuations. Month ahead daily load forecasting and daily fuel price forecasting are not accurate enough, which may result in low accuracy if the hourly day-ahead prices of ERCOT are forecasted and the forecasts are averaged afterwards. In contrast, the month ahead monthly load forecasting accuracy is approximately 98%, and month ahead fuel price forecasting can learn from the futures price, which usually gives good results [23]. Hence, forecasting the average daily electricity price profile directly, instead of forecasting the hourly electricity prices and averaging the forecasts afterwards, has a clear advantage owing to the higher accuracy of monthly load forecasting and average fuel price forecasting.

To the best of our knowledge, up until now, month ahead average daily electricity price profile forecasting has not yet been investigated, but it merits in-depth study. The main contributions of this paper are:

1) Month ahead average daily day-ahead electricity price profile forecasting is proposed in this paper. A hybrid nonlinear regression and SVM model is proposed for month ahead average daily electricity price profile forecasting.

2) A nonlinear price regression model with deviation compensation is proposed to improve the forecast accuracy.

3) Off-peak hours, peak hours in peak months and peak hours in off-peak months are distinguished and different methods are adopted to further improve prediction accuracy.

1.3 Differences between the proposed average daily price profile forecasting and hourly price forecasting and averaging afterwards

The conventional mid-term average price forecasting is to forecast the hourly price for the next month and average the price afterwards. The distinctions between the proposed method and the conventional ones are as follows.

1) The forecast object

The conventional method forecasts the hourly prices of each day for next month, and then averages the forecast prices of each period, e.g., averages the forecast prices at 10:00 a.m. every day. In contrast, the proposed method directly forecasts the average price of a period, e.g., the average price at 10:00 a.m. for the next month.

2) The training set

For the conventional method, the training set includes the historical hourly price data for each day, e.g., 24-hour prices for one day. In contrast, the training set of the proposed method includes historical average daily price profiles, e.g., the average price at 10:00 a.m. for a month.

3) The historical impact factor data

For the conventional method, the historical impact factor data include the historical load profile of each day and the historical fuel price of each day. For the proposed method, the historical impact factor data include the historical average load profile of each month, historical average fuel price of each month, etc.

4) The forecasting accuracy of input factors

For the conventional method, the forecasting accuracy of the month ahead daily load profile and fuel price of each day is relatively low. However, the electricity price is strongly sensitive to the load and fuel prices. Forecasting errors of these input factors may result in non-negligible errors of the forecasted electricity price. For the proposed method, the forecasting accuracy of month ahead average daily load profile and month ahead average fuel price is relatively high. In other words, the forecasting errors of the input factors are small, and the forecasting error of the proposed method mainly originates from the modeling error.

Therefore, month ahead average daily price profile forecasting is proposed in this paper to improve forecast accuracy.

2 ERCOT electricity market and data sources

2.1 Overview of ERCOT electricity market

ERCOT was formed in 1970 and then became the central operating coordinator for Texas. With the deregulation of the electric power industry, ERCOT became an ISO in 1996 [24]. ERCOT developed the zonal wholesale market and then transformed to the nodal market in 2010. ERCOT manages the flow of electric power to 24 million Texas customers, covering approximately 90 percent of the state’s electric load [25]. The total operational capacity is 78543 MW. The capacity fuel types in percentages are shown in Table 1.

Table 1 Capacity fuel types in percentages in ERCOT

ERCOT runs both day-ahead and real-time markets. A two-sided auction is adopted, and demand is allowed to bid in the spot markets. In the day-ahead market, producers submit their offers and consumers submit their bids to ERCOT. Energy is co-optimized with ancillary services (AS) and certain congestion revenue rights (CRR). The locational marginal price (LMP) in ERCOT has two components: the energy component and the congestion component.

The retail market in Texas is deregulated and well-developed. As of September 2014, 114 retailers were actively doing business in ERCOT [26]. In terms of rate structure, there are mainly three types of plans offered in the market: the fixed rate plan, the variable rate plan (month-to-month) and the indexed plan (market rate) [27].

2.2 Data sources

The data regarding the prices of natural gas delivered for electricity generation and system capacity of Texas are obtained from [28]. The data of consumer price index (CPI) and average wages of power plant operators are obtained from [29]. The data of lending rate in the USA are obtained from [30]. The data for day-ahead electricity prices and loads were obtained from [25].

Features for the forecast month must themselves be predicted before the electricity price can be forecasted. Among these features, the lending rate and average wages change little and usually remain constant over a long period. However, the natural gas price and CPI vary from month to month, and the natural gas price may even fluctuate sharply at times. Research on natural gas price forecasting has been reported and some well-performing methods are available [23, 31, 32]. CPI forecasting has also been well investigated in the economic field [33, 34, 35]. As this paper mainly focuses on the method of electricity price forecasting, forecasting of natural gas prices, CPI, average wages and lending rate for the forecast month is beyond the scope of this study. These parameters are assumed to be predicted accurately.

3 Nonlinear regression model with deviation compensation (NRM-DC)

3.1 Nonlinear regression model (NRM)

1) The compositions of costs

According to engineering economics, the cost is divided into the period cost and product cost [36]. The composition of cost and the impact factor related to cost are shown in Fig. 1.

Fig. 1 Composition of cost and the impact factor related to cost

Period costs are defined as costs charged to expenses in the period in which they are incurred, and mainly consist of selling expenses, administrative expenses and financial expenses such as insurance and income tax expenses. Selling expenses and administrative expenses are mostly determined by the management level and price level. Financial expenses are closely related to the price level and lending rate.

Product costs consist of the costs involved in the purchase or production of goods, including the direct material, direct labor cost and manufacturing overhead. Direct material includes raw materials that can be processed into products. In power plants, direct material cost is directly related to fuel price. Direct labor cost is the wages paid to workers who produce the products and is directly related to social average wages of the industry. The remaining cost belongs to the manufacturing overhead, which is mainly influenced by the price level.

As analyzed above, cost (\( C_{\text{cost}} \)) can be described as a linear superposition of fuel price, price level, average wages and lending rate with different weights. It can be described as:

$$ C_{\text{cost}} = a_{1} f + a_{2} w + a_{3} l + a_{4} c + a_{5} $$
(1)

where \( f \) is fuel price; \( w \) is average wages; \( l \) is lending rate; \( c \) is CPI; \( a_{1} \), \( a_{2} \), \( a_{3} \), \( a_{4} \), \( a_{5} \) are the fitting coefficients and \( a_{5} \) is a fixed asset.

In ERCOT, electricity prices are driven to a large extent by changes in fuel prices, and natural gas prices in particular [37]. Based on the natural gas prices, coal prices and heat rate, which can be obtained from the EIA, the costs of electricity generated with natural gas-fired and coal-fired power plants are compared in Fig. 2. The cost of electricity generated with natural gas is much higher than that generated with coal. As natural gas accounts for 62.33% of installed capacity in Texas and coal for only 23.51%, coal-fired power plants alone cannot meet the system load. Typically, the fuel type of marginal units is natural gas. Therefore, the natural gas price is selected as the dominant impact factor in fuel price in this paper.

Fig. 2 Costs of electricity generated with natural gas and coal

2) The relationship between price and cost

Cost is the basis of price, but it is not the only factor that influences price. Price is determined by supply and demand in economic theory [38]. Spot price can even be forecasted directly by modeling the supply curve and demand curve in reference [39]. Figure 3 illustrates the relationship between electricity prices and demand in the PJM electricity market from January 1, 1999 to December 31, 2000 [40]. As demand increases, the price grows slowly at the beginning and rises significantly when the demand is high. An exponential function is used in [41], a Box–Cox transformation model is used in [42] and a hockey-stick shaped model is used in [40] to describe the relationship between price and supply-demand situation. The exponential function is employed in this paper.

Fig. 3 Electricity prices and demand

3) Nonlinear regression model

The mean-reversion model can be used to describe electricity price and price fluctuation around the mean value [43]. Price (\( C_{\text{price}} \)) can be expressed as the generation cost multiplied by the supply demand coefficient (SDC) (i.e. \( k_{\text{SDC}} \)), as in (2). It is worth noting that other forms of functions can also be used.

$$ C_{\text{price}} = C_{\text{cost}} k_{\text{SDC}} $$
(2)

The relationship between supply and demand can be expressed by the average hourly loading rate in month horizon (AHLRMH) (i.e. \( \rho_{\text{AHLRMH}} \)), which is the average hourly load in a month horizon (\( P_{\text{av}} \)) divided by system capacity (\( P_{\text{s}} \)), as in (3).

$$ \rho_{\text{AHLRMH}} = \frac{{P_{\text{av}} }}{{P_{\text{s}} }} $$
(3)

As each month has different numbers of days, the average hourly load in the month horizon rather than total hourly load in the month horizon is chosen. An example is given to show how to calculate average hourly load in month horizon. Assume that average hourly load at 10:00 a.m. in September (with 30 days) is needed. It is the sum of all the loads at 10:00 a.m. in September divided by 30.
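The September example can be written as a short sketch. The record layout `(day, hour, value)`, the function name `average_hourly_profile` and the load values are assumptions made for illustration; they are not taken from the ERCOT data.

```python
from collections import defaultdict


def average_hourly_profile(records):
    """Average values by hour of day over one month.

    records: iterable of (day, hour, value) tuples for a single month.
    Returns a dict mapping hour -> mean of that hour's values across days.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _day, hour, value in records:
        sums[hour] += value
        counts[hour] += 1
    # Dividing by the number of days makes months of different length comparable.
    return {hour: sums[hour] / counts[hour] for hour in sums}


# Hypothetical 30-day September with a made-up load (MW) at 10:00 a.m.
september = [(day, 10, 40000.0 + 100.0 * day) for day in range(1, 31)]
profile = average_hourly_profile(september)
```

The same helper applies unchanged to hourly prices, which gives the average daily price profile that the proposed method forecasts.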

Each hour of each month has its own average loading rate (i.e. \( \rho_{\text{AHLRMH}}^{\text{each}} \)), but this value alone does not directly reflect how tight supply and demand are. Therefore, a benchmark average loading rate (BALR) (i.e. \( \rho_{\text{B}} \)) is defined as the average of the loading rates of all the hours of all the months, and is used as the reference for measuring the tightness of supply and demand. The supply and demand situation coefficient (SDSC) (i.e. \( k_{\text{SDSC}} \)) is then defined as in (4).

$$ k_{\text{SDSC}} = \frac{{\rho_{\text{AHLRMH}}^{\text{each}} }}{{\rho_{\text{B}} }} $$
(4)

SDSC is a number close to 1.0 that reflects the supply and demand situation directly. An SDSC greater than 1.0 indicates that supply is tight relative to demand, in which case prices may go up, and vice versa.
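A minimal sketch of (3) and (4) is given below. The dictionary layout, the function name `sdsc_table`, the per-month capacity lookup and all numbers are assumptions for illustration only.

```python
def sdsc_table(avg_load, capacity):
    """Compute SDSC per (month, hour) from average hourly loads.

    avg_load: dict mapping (month, hour) -> average hourly load P_av (MW).
    capacity: dict mapping month -> system capacity P_s (MW); keeping one
        value per month is an assumption of this sketch.
    Returns (sdsc, rho_b), where rho_b is the benchmark average loading rate.
    """
    # Eq. (3): loading rate for each hour of each month.
    rho = {(month, hour): load / capacity[month]
           for (month, hour), load in avg_load.items()}
    # Benchmark: mean loading rate over all hours of all months.
    rho_b = sum(rho.values()) / len(rho)
    # Eq. (4): ratio of each loading rate to the benchmark.
    sdsc = {key: value / rho_b for key, value in rho.items()}
    return sdsc, rho_b


# Made-up average loads (MW) for two hours in two months.
avg_load = {
    ("2016-01", 4): 30000.0, ("2016-01", 17): 40000.0,
    ("2016-08", 4): 35000.0, ("2016-08", 17): 55000.0,
}
capacity = {"2016-01": 80000.0, "2016-08": 80000.0}
sdsc, rho_b = sdsc_table(avg_load, capacity)
```

In this toy data the August 17:00 entry has an SDSC above 1.0 and the early-morning entries fall below 1.0, matching the interpretation given in the text.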

An exponential function is employed to describe the relationship between price and the supply-demand situation. The expression of SDC is as (5). It is worth noting that other forms of functions can also be used.

$$ k_{\text{SDC}} = a^{{k_{\text{SDSC}} - b}} $$
(5)

By substituting (1) and (5) into (2), the electricity price can be expressed by a nonlinear regression model as:

$$ C_{\text{price}} = (a_{1} f + a_{2} w + a_{3} l + a_{4} c + a_{5} )a_{6}^{{k_{\text{SDSC}} - a_{7} }} $$
(6)

where \( a_{1} \), \( a_{2} \), …, \( a_{7} \) are the parameters to be fitted.

To reduce the influence of price fluctuation, a logarithmic transformation is applied to both sides of (6) for smoothing, which gives:

$$ \begin{aligned} \log (C_{\text{price}} ) & = \log (a_{1} f + a_{2} w + a_{3} l + a_{4} c + a_{5} ) \\ & \quad + (k_{\text{SDSC}} - a_{7} ) \cdot \log a_{6} \\ \end{aligned} $$
(7)
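The model in (6) and its logarithmic form (7) can be sketched as plain functions. The parameter values below are arbitrary placeholders, not fitted ERCOT coefficients, and the least-squares objective on log prices is an assumption of this sketch, since the fitting criterion is not specified here.

```python
import math


def cost(fuel, wage, lending, cpi, a):
    """Eq. (1): linear cost term; a = (a1, ..., a7), only a1..a5 are used here."""
    return a[0] * fuel + a[1] * wage + a[2] * lending + a[3] * cpi + a[4]


def price(fuel, wage, lending, cpi, k_sdsc, a):
    """Eq. (6): cost multiplied by the exponential supply demand coefficient."""
    return cost(fuel, wage, lending, cpi, a) * a[5] ** (k_sdsc - a[6])


def log_price(fuel, wage, lending, cpi, k_sdsc, a):
    """Eq. (7): the same model after taking logarithms of both sides."""
    return (math.log(cost(fuel, wage, lending, cpi, a))
            + (k_sdsc - a[6]) * math.log(a[5]))


def sum_squared_log_error(samples, a):
    """Assumed fitting objective: squared error between log prices.

    samples: iterable of (fuel, wage, lending, cpi, k_sdsc, observed_price).
    """
    return sum(
        (math.log(p) - log_price(fuel, wage, lending, cpi, k, a)) ** 2
        for fuel, wage, lending, cpi, k, p in samples
    )


# Placeholder parameters and inputs, chosen only so the example runs.
params = (5.0, 0.1, 0.5, 0.02, 2.0, 3.0, 1.0)
inputs = dict(fuel=3.0, wage=30.0, lending=3.5, cpi=240.0)
p_tight = price(k_sdsc=1.2, a=params, **inputs)   # SDSC above 1.0
p_loose = price(k_sdsc=0.8, a=params, **inputs)   # SDSC below 1.0
```

Because the placeholder base \( a_{6} \) is greater than 1, the modeled price rises with SDSC. In practice the objective would be minimized over \( a_{1} \), …, \( a_{7} \) with a nonlinear least-squares routine, separately for each period.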

As the prices vary widely in different periods, the fitting may not perform well if all the historical data are used for fitting one model. Therefore, different periods are separated for regression in this paper. For example, to forecast prices at 10:00 a.m., only historical data at 10:00 a.m. are used for training.

3.2 Deviation compensation model based on SVM

It is widely acknowledged that a forecast produced by a single model may carry some systematic bias. Using ERCOT data from October 2010 to April 2016, examples of deviation analysis for April and September based on the proposed nonlinear regression model are shown in Fig. 4. The deviation is the actual price minus the predicted price.

Fig. 4 Deviation analysis

It is apparent that the deviations exhibit certain distribution characteristics. Therefore, it is reasonable to adjust the results obtained from the nonlinear regression model. An improvement of the forecast accuracy can be expected. Therefore, a deviation compensation model is proposed.

As the deviation distribution is nonlinear and may be related to various factors, conventional regression methods may be ineffective.

SVM is an effective statistical machine learning method that is suitable for high-order nonlinear regression problems [44]. It is adopted here to predict the deviations between the prices predicted by the proposed nonlinear regression model and the actual prices. Natural gas price, CPI, average wages, lending rate and SDSC are preprocessed by principal component analysis (PCA) to extract the principal components [45]. The principal components are selected as the input features of the SVM model.

The framework of the proposed deviation compensation model is shown in Fig. 5.

Fig. 5 Framework of the proposed deviation compensation model
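The compensation step itself is a simple addition, sketched below. To keep the example dependency-free, the mean historical deviation stands in for the SVM with PCA features described above; this stand-in, the function names and the numbers are assumptions of the sketch and are not the paper's deviation model.

```python
def fit_mean_deviation(actual, base_forecast):
    """Stand-in deviation model: the mean historical deviation.

    Deviation is defined as actual price minus predicted price. The paper
    predicts it with an SVM on principal components; a constant mean is
    used here only for illustration.
    """
    deviations = [y - y_hat for y, y_hat in zip(actual, base_forecast)]
    return sum(deviations) / len(deviations)


def compensate(base_forecast, predicted_deviation):
    """Add the predicted deviation back to the regression forecast."""
    return base_forecast + predicted_deviation


# Made-up historical prices for one period and the matching NRM predictions.
actual_history = [25.0, 27.0, 30.0]
nrm_history = [23.0, 26.0, 27.0]
predicted_deviation = fit_mean_deviation(actual_history, nrm_history)

# A new NRM forecast for the month ahead, corrected by the predicted deviation.
final_forecast = compensate(28.0, predicted_deviation)
```

Since the deviation is actual minus predicted, a positive predicted deviation raises the final forecast above the regression output.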

3.3 Performance evaluation

Root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are widely used performance evaluation indices in forecasting. RMSE, MAE and MAPE are defined as:

$$ RMSE = \sqrt {\frac{1}{N}\sum\limits_{n = 1}^{N} {|y_{n} - \hat{y}_{n} |^{2} } } $$
(8)
$$ MAE =\frac{1}{N}\sum\limits_{n = 1}^{N} {|y_{n} - \hat{y}_{n} |} $$
(9)
$$ MAPE = \frac{1}{N}\sum\limits_{n = 1}^{N} {\frac{{ |y_{n} - \hat{y}_{n} |}}{{y_{n} }}} $$
(10)

where N is the number of forecasted data; \( y_{n} \) is the actual value and \( \hat{y}_{n} \) is the forecasted value.
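For reference, the three indices can be computed as in the following sketch, which uses the standard definitions with division by the number of forecasted data N. The function names and sample values are illustrative assumptions.

```python
import math


def rmse(actual, forecast):
    """Root mean square error: sqrt of the mean squared deviation."""
    n = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, forecast)) / n


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (multiply by 100 for %).

    Assumes every actual value is positive, as is typical for monthly
    average prices.
    """
    n = len(actual)
    return sum(abs(y - y_hat) / y for y, y_hat in zip(actual, forecast)) / n


# Made-up actual and forecasted prices.
actual = [20.0, 40.0]
forecast = [22.0, 36.0]
scores = (rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```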

These three indices are adopted for performance evaluation in this paper.

3.4 ERCOT case study

Historical data from December 2010 to November 2016 are used in the case study. A rolling forecast for the month ahead average daily electricity price profiles from May 2016 to November 2016 is presented. Two benchmarks are designed to verify the performance of the proposed method. The actual price data from the same month and same periods in the previous year are used as the forecasted price in Benchmark 1. The actual price data of the same periods in the previous month are used as the forecasted price in Benchmark 2. The results are shown in Figs. 6 and 7, respectively. Data in Fig. 6 are ordered by months and by periods.

Fig. 6 Actual prices and forecasted prices based on NRM-DC

Fig. 7 Forecasted results of each hour
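The two benchmarks described above are simple lookups into the price history, as the following sketch shows. The `history` dictionary keyed by `(year, month, period)`, the function names and the prices are assumptions for illustration.

```python
def previous_month(year, month):
    """Return the (year, month) pair immediately preceding the given month."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def benchmark_1(history, year, month, period):
    """Benchmark 1: actual price of the same month and period in the previous year."""
    return history[(year - 1, month, period)]


def benchmark_2(history, year, month, period):
    """Benchmark 2: actual price of the same period in the previous month."""
    prev_year, prev_month = previous_month(year, month)
    return history[(prev_year, prev_month, period)]


# Made-up average prices ($/MWh) for period 10, keyed by (year, month, period).
history = {
    (2015, 5, 10): 24.0,
    (2015, 12, 10): 19.0,
    (2016, 4, 10): 21.0,
}
b1_may_2016 = benchmark_1(history, 2016, 5, 10)
b2_may_2016 = benchmark_2(history, 2016, 5, 10)
b2_jan_2016 = benchmark_2(history, 2016, 1, 10)   # wraps to December 2015
```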

RMSE and MAPE of NRM, NRM-DC, Benchmark 1 and Benchmark 2 are shown in Table 2.

Table 2 RMSE and MAPE of NRM, NRM-DC, benchmark 1 and benchmark 2

It is apparent from Table 2 that the proposed NRM and NRM-DC models outperform Benchmark 1 and Benchmark 2 in this case. The forecasted prices coincide with the trend of the actual electricity prices. NRM-DC performs better than NRM, especially in off-peak periods (all periods except periods 7, 8, 15, 16, 17 and 18). It is apparent from Table 3 that the proposed NRM and NRM-DC methods perform better than the SVM model in off-peak periods in this case.

Table 3 RMSE and MAPE of NRM, NRM-DC and SVM model in off-peak periods

However, errors in the peak hours (periods 7, 8, 15, 16, 17 and 18) are much higher than those in the off-peak hours. A specialized model is proposed in Section 4 to forecast prices in peak hours.

4 Price forecasting for peak hours

The supply and demand situation is usually tense in peak hours, which often results in price spikes. The prices in off-peak hours may be similar to the prices of the same period in the previous time window [46], but the average electricity prices in peak hours may become remarkably higher than usual. Large forecasting errors are produced. The price spike is considered to be an abnormal price in many studies and several models have been proposed for price spike forecasting [47, 48]. Price forecasting methods for peak hours should be differentiated from those for off-peak hours.

As shown in Fig. 6, the maximum hourly average price is approximately 9 times the minimum hourly average price. The hourly average prices of periods 7, 8, 15, 16, 17 and 18 from December 2010 to April 2016 are shown in Figs. 8 and 9. Price spikes can degrade the fit of the nonlinear regression model.

Fig. 8 Month ahead average hourly prices of periods 7 and 8

Fig. 9 Month ahead average hourly prices of periods 15, 16, 17 and 18

As one can observe, prices in peak hours are not always very high. According to the prices and SDSC, months can be divided into peak months and off-peak months. Peak months of periods 7 and 8 are December, January and February in winter. Peak months of periods 15 and 18 are July and August in summer. Peak months of periods 16 and 17 are June, July and August in summer. Prices in peak hours of off-peak months are moderate, whereas prices in peak hours of peak months are extremely high. Therefore, different forecasting methods should be adopted to forecast the prices in different scenarios.

4.1 Price forecasting for peak hours in off-peak months

Prices of peak hours in peak months are very high, which degrades the fit of the nonlinear price regression model. However, prices of peak hours in off-peak months still exhibit strong patterns, which means the NRM-DC model may remain applicable.

Historical data of off-peak months are used for training. The results are shown in Tables 4, 5 and 6, respectively.

Table 4 Forecasting results of periods 7, 8 in off-peak months
Table 5 Forecasting results of periods 15, 18 in off-peak months
Table 6 Forecasting results of periods 16, 17 in off-peak months

The proposed method for forecasting prices of peak hours in off-peak months can significantly improve the prediction accuracy in this case.

4.2 Price forecasting for peak hours in peak months

Prices of peak hours in peak months can be extremely high and uncertain. The relationship between price and SDSC can be strongly nonlinear. SVM shows advantages for strong nonlinear problems. Therefore, it is employed to forecast the prices of peak hours in peak months.

The influences of CPI, wages and lending rate on prices become minor as SDSC increases. When supply and demand are tight, prices are mainly related to supply, demand and variable costs. Therefore, only SDSC, natural gas prices, month and period are selected as the input features. Results are shown in Table 7.

Table 7 Forecasting results for peak hours in peak months

It is apparent that the RMSE and MAPE of the SVM model are much lower than those of the other two models.

5 Framework and final forecasting results of hybrid nonlinear regression and SVM model

Section 3 proposes a nonlinear regression model with deviation compensation. Different methods are employed to forecast prices of peak hours in peak months in Section 4. A hybrid nonlinear regression and SVM model is proposed to synthesize the advantages of these methods. In summary, the framework of the hybrid model is shown in Fig. 10.

Fig. 10 Framework of hybrid nonlinear regression and SVM model

Historical data are used for training forecasting models. Data of off-peak hours are used as the training sets to forecast the prices of off-peak hours based on the NRM-DC model. Data of peak hours in off-peak months are used to forecast the prices of peak hours in off-peak months based on the NRM-DC model. All of the data are used to forecast the prices of peak hours in peak months based on the SVM model.
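The routing logic of the hybrid model can be sketched as follows. The peak periods and peak months are those listed in Section 4; the integer encoding of months (1 to 12), the scenario labels and the function names are assumptions of this sketch.

```python
# Peak months for each peak period, as listed in Section 4.
PEAK_MONTHS = {
    7: {12, 1, 2}, 8: {12, 1, 2},     # winter peaks
    15: {7, 8}, 18: {7, 8},           # summer peaks
    16: {6, 7, 8}, 17: {6, 7, 8},     # summer peaks
}


def scenario(period, month):
    """Classify a (period, month) pair into one of the three scenarios."""
    peak_months = PEAK_MONTHS.get(period)
    if peak_months is None:
        return "off_peak_hour"
    if month in peak_months:
        return "peak_hour_peak_month"
    return "peak_hour_off_peak_month"


def select_model(period, month):
    """Return the name of the model used for this scenario in the hybrid scheme."""
    if scenario(period, month) == "peak_hour_peak_month":
        return "SVM"
    return "NRM-DC"


# Example: period 16 in June is a peak hour in a peak month.
june_afternoon = select_model(16, 6)
```

Note that the two NRM-DC scenarios share a model type but not a training set: off-peak hours and peak hours in off-peak months are each trained on their own data, as described above.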

The forecast results based on the hybrid model proposed in this paper are shown in Table 8.

Table 8 Forecast results based on hybrid model and a brief comparison with other models

It is apparent that the proposed hybrid model performs better than the NRM-DC and SVM models in this case based on the ERCOT dataset.

6 Conclusion

Month ahead average daily electricity price profile forecasting is an essential task for retailers and for investment decision making in electricity markets. A hybrid nonlinear regression and SVM model is proposed in this paper for month ahead average hourly price forecasting. In this model, prices of different periods in different months are forecasted by different methods. Three methods are adopted for prices of off-peak hours, peak hours in off-peak months and peak hours in peak months. A nonlinear price regression model with deviation compensation is proposed to forecast the prices of off-peak hours and prices of peak hours in off-peak months. SVM is adopted to forecast the prices of peak hours in peak months. The case study suggests that the hybrid method proposed in this paper performs well in month ahead average daily electricity price profile forecasting based on the ERCOT dataset.

Future work will investigate the applications of month ahead average daily electricity price profile forecasting, especially in retail market and investment decision.