Expectile regression averaging method for probabilistic forecasting of electricity prices

In this paper we propose a new method for probabilistic forecasting of electricity prices. It is based on averaging point forecasts from different models combined with expectile regression. We show that deriving the predicted distribution in terms of expectiles might in some cases be advantageous compared with the commonly used quantiles. We apply the proposed method to the day-ahead electricity prices from the German market and compare its accuracy with the Quantile Regression Averaging method and with quantile- as well as expectile-based historical simulation. The obtained results indicate that using expectile regression improves the accuracy of the probabilistic forecasts of electricity prices, but a variance stabilizing transformation should be applied prior to modelling.


Introduction
For the last three decades electricity markets have been undergoing significant structural changes. At the same time the price risk for wholesale electricity market participants has increased significantly. Limited storage possibilities, technical constraints of the transmission grid and the importance of electricity supply lead to much higher price variability than in other commodity markets. In recent years we have observed a rapid transformation of the overall electricity production profile in the European electricity markets, with a growing share of renewable energy sources (RES). This makes not only electricity demand but also supply highly weather dependent and, as a consequence, electricity prices can be even more volatile. The distributed generation caused by the growth of RES has changed not only energy production, but also the profiles of the market participants. A number of small producers and traders have joined the market. They face a significant risk associated with electricity price volatility, but at the same time electricity markets give them a range of trading opportunities. Since electricity prices are not known in advance, any trade planning needs to be based on price forecasts. In this context some trading strategies have recently been proposed in the literature (Maciejowska et al., 2019; Serafin et al., 2022; Agakishiev et al., 2023). They utilize point as well as probabilistic forecasts of electricity prices. We believe that improving the accuracy of the forecasts would also improve the efficiency of trading strategies.
There are many articles dedicated to point forecasting of the day-ahead electricity prices, see Weron (2014) or Lago et al. (2021) for extensive reviews. The extension from point to probabilistic forecasting methods has also gained much attention in recent years, see Nowotarski and Weron (2018) for a review. The latter takes into account not only the best estimate of a future value but also the uncertainty of the prediction. As a consequence, it brings much more information to a decision maker and allows, e.g., for direct risk management. One of the methods that have been successfully applied in probabilistic electricity price forecasting is the Quantile Regression Averaging (QRA) proposed by Nowotarski and Weron (2014). It is built on the quantile regression method (Koenker and Bassett, 1978) combined with different point forecasts of electricity prices. In this paper we follow this direction and introduce the Expectile Regression Averaging (ERA) method. It uses the notion of expectiles introduced originally by Newey and Powell (1987). Expectiles can be viewed as a description of the distribution analogous to quantiles (Gneiting, 2009). However, the estimation of the expectile regression is based on the least squares method, in contrast to the quantile regression, which is based on least absolute deviations.
Due to their numerical and statistical properties expectiles have seen increasing interest in recent years. They were used, among others, in regression analysis (Waltrup et al., 2015), functional factor modelling (Burdejová and Härdle, 2019), estimation of extremes (Girard et al., 2022) and multivariate data analysis (Cascos and Ochoa, 2021). Expectiles have also gained a lot of attention in finance, since Kuan et al. (2008) adopted them as a risk measure, called the expectile Value at Risk (EVaR). Although the interpretation of EVaR is less straightforward than for the classical risk measures, using expectiles allows one to overcome the known drawbacks of the latter, like non-coherence or non-elicitability (Ziegel, 2016; Bellini and Di Bernardino, 2015). Expectiles were also recently used as a risk measure for the electricity market by Syuhada et al. (2021) and Janczura and Wójcik (2022). For other applications concerning electricity markets see also the work of Taylor (2021) or Melzer et al. (2019). However, to the best of our knowledge, expectiles have not yet been used in the context of forecast averaging for electricity prices.
The rest of the paper is structured as follows. In Section 2 we briefly describe the notion of expectiles and show their analogies with as well as differences from quantiles. Section 3 is devoted to the construction of probabilistic forecasts of electricity prices. In particular, in Section 3.1 we introduce the Expectile Regression Averaging method. Next, in Section 4 we apply the proposed technique to the day-ahead electricity prices from the German market and compare its performance with some benchmark probabilistic forecasts. Finally, in Section 5 we conclude.

Expectiles and quantiles
A standard way of describing the probability distribution of a random variable is in terms of the cumulative distribution function (CDF) and its inverse, i.e. quantiles. Another notion that can be used in such a context is the expectile. An expectile at level $\tau$, $e_\tau$ ($0 < \tau < 1$), is defined as the unique solution of (Newey and Powell, 1987)

$$\tau\, E\left[(Y - e_\tau)_+\right] = -(1 - \tau)\, E\left[(Y - e_\tau)_-\right], \tag{1}$$

where $(x)_+ = \max(x, 0)$ and $(x)_- = \min(x, 0)$ denote the positive and negative part of a variable $x$. For $\tau = 1/2$ the expectile is equal to the mean of the distribution, so expectiles are often seen as an asymmetric generalization of the mean (Gneiting, 2009). On the other hand, if the expected value in (1) is replaced with the probability mass, then the formula defines the quantile, yielding the median for $\tau = 1/2$. Hence, expectiles generalize the mean in a similar way as quantiles generalize the median, but are based on the mean distance instead of the mass of the distribution. As a consequence, they include information on the size of exceedances, in contrast to quantiles, which are based only on their frequency.
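Expectiles can also be defined as the minimizers of a quadratic loss function (Bellini and Di Bernardino, 2015)

$$e_\tau(Y) = \arg\min_{x \in \mathbb{R}} \left\{ \tau\, E\left[(Y - x)_+^2\right] + (1 - \tau)\, E\left[(Y - x)_-^2\right] \right\}. \tag{2}$$

Note that for $\tau = 1/2$ this loss function is proportional to the standard mean square error (MSE). For quantiles we have an analogous absolute loss function

$$q_\alpha(Y) = \arg\min_{x \in \mathbb{R}} \left\{ \alpha\, E\left[(Y - x)_+\right] - (1 - \alpha)\, E\left[(Y - x)_-\right] \right\}, \tag{3}$$

which for $\alpha = 1/2$ is proportional to the mean absolute error (MAE). The loss functions (2) and (3) are also the basis for the expectile and quantile regression, respectively, generalizing the classical linear regression model in terms of the predicted variable distribution. These methods will be further used in the paper for forecast construction.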
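The minimization of the asymmetric quadratic loss can be illustrated numerically. The following is a minimal sketch, not the implementation used in the paper: the function name `sample_expectile` and the toy sample are our own assumptions. It uses the fact that the minimizer of the asymmetric squared loss on a sample is a weighted mean, with weight $\tau$ for observations above the expectile and $1 - \tau$ for the remaining ones, and iterates this fixed-point relation.

```python
import numpy as np

def sample_expectile(y, tau, tol=1e-12, max_iter=100):
    """Sample tau-expectile: minimizer of the asymmetric squared loss."""
    y = np.asarray(y, dtype=float)
    e = y.mean()  # the 0.5-expectile is the mean, a natural starting point
    for _ in range(max_iter):
        # weight tau for observations above the current guess, 1 - tau otherwise
        w = np.where(y > e, tau, 1.0 - tau)
        e_new = np.sum(w * y) / np.sum(w)  # weighted mean solves the first-order condition
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # hypothetical toy sample with one spike
print(sample_expectile(y, 0.5))  # coincides with the sample mean
print(sample_expectile(y, 0.9))  # depends on the size of the spike
print(np.quantile(y, 0.9))       # depends only on the ordering of the sample
```

Changing the spike from 100 to 1000 moves the 0.9-expectile substantially, which illustrates the sensitivity of expectiles to the size of exceedances discussed above.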
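Both quantiles and expectiles describe the distribution of a random variable. Naturally, they are related to each other. As shown by Yao and Tong (1996), there exists a unique function $h$ such that

$$q_\alpha(Y) = e_{h(\alpha)}(Y). \tag{4}$$

It is given by

$$h(\alpha) = \frac{-\alpha\, q_\alpha(Y) + G(q_\alpha(Y))}{-E(Y) + 2 G(q_\alpha(Y)) + (1 - 2\alpha)\, q_\alpha(Y)}, \tag{5}$$

where $G(x) = \int_{-\infty}^{x} y\, dF(y)$ is the partial moment function and $F$ is the CDF of $Y$. There also exists the inverse relation. Expectiles are linked with the CDF $F$ by (Waltrup et al., 2015)

$$\tau = \frac{G(e_\tau(Y)) - e_\tau(Y)\, F(e_\tau(Y))}{2\left[G(e_\tau(Y)) - e_\tau(Y)\, F(e_\tau(Y))\right] + e_\tau(Y) - E(Y)}. \tag{6}$$

Hence, quantiles can be calculated from expectiles, and expectiles can be calculated from quantiles, but this usually requires some numerical approximations.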
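The link between expectiles and the CDF can be checked on an empirical distribution. The sketch below assumes the relation in the form $\tau = (G(e) - eF(e)) / (2[G(e) - eF(e)] + e - E(Y))$ and replaces $F$, $G$ and $E(Y)$ with their sample counterparts. The function name `expectile_level` and the toy sample are hypothetical.

```python
import numpy as np

def expectile_level(y, x):
    """Level tau at which x is an expectile of the empirical distribution of y."""
    y = np.asarray(y, dtype=float)
    F = np.mean(y <= x)        # empirical CDF at x
    G = np.mean(y * (y <= x))  # empirical partial moment at x
    a = G - x * F
    return a / (2.0 * a + x - y.mean())

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # hypothetical toy sample
tau = expectile_level(y, 70.0)

# Verify the defining first-order condition of the expectile at x = 70
lhs = tau * np.sum(np.maximum(y - 70.0, 0.0))
rhs = (1.0 - tau) * np.sum(np.maximum(70.0 - y, 0.0))
print(tau, lhs, rhs)
```

At the sample mean the same function returns the level 1/2, in line with the interpretation of the 0.5-expectile as the mean.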
3 Probabilistic forecasts of electricity prices

Quantile and Expectile regression averaging
One of the commonly used methods for probabilistic forecasting of electricity prices is the Quantile Regression Averaging (QRA) proposed by Nowotarski and Weron (2014). It is based on applying the quantile regression (Koenker and Bassett, 1978) to a pool of point forecasts of different individual models. Denote the electricity price for delivery during hour $h$ on day $t$ by $P_{h,t}$. In the QRA method, probabilistic forecasts of $P_{h,t}$ are determined as the following linear combination (Nowotarski and Weron, 2014)

$$\hat{q}_{P_{h,t}}(\alpha) = \hat{P}_{h,t}\, w_\alpha, \tag{7}$$

where $\hat{q}_{P_{h,t}}(\alpha)$ is the $\alpha$-quantile of the forecasted distribution, $\hat{P}_{h,t}$ is a vector of $K$ corresponding individual point forecasts, while $w_\alpha$ is a column of weights for the $\alpha$-quantile. The weights $w_\alpha$ are estimated by minimizing the quantile loss function

$$w_\alpha = \arg\min_{w} \sum_{t} \left(\alpha - 1_{\{P_{h,t} < \hat{P}_{h,t} w\}}\right) \left(P_{h,t} - \hat{P}_{h,t}\, w\right). \tag{8}$$

In this paper we follow the forecast averaging approach, but we propose to combine it with the expectile regression. It is similar to the quantile regression, but the absolute loss function (8) is replaced with the quadratic one (2). Hence, in the Expectile Regression Averaging (ERA) method the $\tau$-expectile of the predicted distribution, $\hat{e}_{P_{h,t}}(\tau)$, is calculated as

$$\hat{e}_{P_{h,t}}(\tau) = \hat{P}_{h,t}\, w_\tau, \tag{9}$$

where $\hat{P}_{h,t}$ is the vector of point forecasts from the individual models and $w_\tau$ are the weights estimated from

$$w_\tau = \arg\min_{w} \sum_{t} \left|\tau - 1_{\{P_{h,t} < \hat{P}_{h,t} w\}}\right| \left(P_{h,t} - \hat{P}_{h,t}\, w\right)^2. \tag{10}$$

Note that the expectile regression is based on $L_2$ optimization, yielding here an asymmetrically weighted least squares problem (ordinary least squares, OLS, for $\tau = 1/2$), while the quantile regression is based on $L_1$ optimization. The latter is more robust to outliers, but on the other hand the least squares method possesses better numerical properties.
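The estimation of the ERA weights can be sketched as an iteratively reweighted least squares procedure. This is a common way of solving asymmetric least squares problems, and the paper does not state which solver was used, so the code below is only an illustration under that assumption. The function name `fit_expectile_weights`, the synthetic data and the inclusion of an intercept column are hypothetical choices.

```python
import numpy as np

def fit_expectile_weights(P_hat, p, tau, n_iter=100, tol=1e-10):
    """Weights w minimizing sum_t |tau - 1{p_t < P_hat_t w}| (p_t - P_hat_t w)^2."""
    P_hat = np.asarray(P_hat, dtype=float)        # shape (T, K): K point forecasts per day
    p = np.asarray(p, dtype=float)                # shape (T,): observed prices
    w = np.linalg.lstsq(P_hat, p, rcond=None)[0]  # OLS start (the tau = 0.5 solution)
    for _ in range(n_iter):
        resid = p - P_hat @ w
        a = np.where(resid > 0, tau, 1.0 - tau)   # asymmetric weights
        sw = np.sqrt(a)
        # weighted least squares step with the current weights
        w_new = np.linalg.lstsq(P_hat * sw[:, None], p * sw, rcond=None)[0]
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

# Hypothetical example: two noisy point forecasts of the same price series
rng = np.random.default_rng(0)
price = 40.0 + 10.0 * rng.standard_normal(200)
f1 = price + 3.0 * rng.standard_normal(200)
f2 = price + 5.0 * rng.standard_normal(200)
X = np.column_stack([np.ones(200), f1, f2])  # intercept column is an assumption

w_low = fit_expectile_weights(X, price, 0.05)
w_mid = fit_expectile_weights(X, price, 0.50)
w_high = fit_expectile_weights(X, price, 0.95)
expectile_forecast = X[-1] @ w_high          # 0.95-expectile forecast for the last day
```

Each iteration is an ordinary weighted least squares fit, which is the numerical advantage over the linear programming typically needed for quantile regression.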

Individual models
The ERA and QRA methods use a linear combination of individual forecasts, so they first require deriving a set of point forecasts. To this end, we consider five expert models, which are standard, frequently used approaches in electricity price modelling (see e.g. Misiorek et al., 2006; Kristiansen, 2012; Maciejowska, 2020). All are built on autoregressive models with exogenous variables (ARX), in which one assumes that electricity prices can be explained by market fundamentals of a technical or economic nature, such as load, generation or weather conditions. Since the forecasts of physical system variables are often publicly available, the construction of price predictions with the ARX models is straightforward.
In the first model considered in this paper we assume that the electricity price for delivery during hour $h$ of day $t$, $P_{h,t}$, is given by
Model 1:
where $P_{h,t-i}$ are the autoregressive terms, $Z^i_{h,t}$, $i = 1, 2, \ldots, k$, are the exogenous variables and $\varepsilon_{h,t}$ is the noise term. In order to account for the weekly seasonality of electricity prices, we also use a dummy variable $D_t$ related to different days of the week. Here we consider Monday, Saturday, Sunday/Holiday, and the other days of the week.
The second model differs from the first one by the number of regressors, for which we consider all prices from a given hour during the previous week, i.e.
Model 2:
The third model also uses the minimum and maximum of the previous days' prices, so it allows for taking into account nonlinear intraday effects
Model 3:
The structure of the fourth model, called the p-ARX (Misiorek et al., 2006), is similar to Model 1, but it is applied to prices with pre-processed spikes. Precisely, the prices that deviate from the mean level in the calibration window by more than three standard deviations are transformed as
where the upper level is set to $L_U = \mu_{P_{h,t}} + 3\sigma_{P_{h,t}}$, while the lower level to $L_L = \mu_{P_{h,t}} - 3\sigma_{P_{h,t}}$.
The fifth model specification, m-ARX proposed by Ziel and Weron (2018), is a modification of Model 2, including the weekly mean of the prices $P^W_{h,t} = \frac{1}{7}\sum_{i=1}^{7} P_{h,t-i}$ in the following way
Model 5:
The parameters of the ARX models, $\theta_i$, $\alpha_i$, $\psi_i$, $\delta$, $\eta$, can be estimated using the least squares method. Then, the day-ahead point forecasts for each hour are given by the corresponding linear combination of explanatory variables. The set of these forecasts, $\hat{P}_{h,t}$, is then used in the ERA (9) and QRA (7) methods.
As a benchmark we also calculate the probabilistic forecasts using the standard historical simulation method. For each of the individual models we derive the out-of-sample point prediction errors

$$\hat{\varepsilon}_{h,t} = P_{h,t} - \hat{P}_{h,t}.$$

Then, the probabilistic forecast is calculated as the sum of the point forecast and the distribution of the errors. Here, this distribution is considered in terms of quantiles as well as expectiles.
Overall, we consider 12 methods for deriving probabilistic forecasts: QRA, ERA as well as historical simulation of expectiles and quantiles from the five individual models. The models are fitted for each hour separately, so in total we consider 24 one-dimensional time series. This is a common approach in electricity price modelling since electricity delivered during different hours is in fact traded as separate products.
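The historical simulation benchmark can be sketched as follows. This is a simplified illustration, assuming that the errors are defined as the observed price minus the point forecast and that their quantiles and expectiles are estimated empirically. The function names, the error sample and the point forecast value are hypothetical.

```python
import numpy as np

def sample_expectile(y, tau, tol=1e-12, max_iter=100):
    """Sample tau-expectile computed as a fixed point of an asymmetric weighted mean."""
    y = np.asarray(y, dtype=float)
    e = y.mean()
    for _ in range(max_iter):
        w = np.where(y > e, tau, 1.0 - tau)
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e

def hist_sim_quantiles(point_forecast, errors, alphas):
    """Quantile-based historical simulation: point forecast plus empirical error quantiles."""
    return point_forecast + np.quantile(np.asarray(errors, dtype=float), alphas)

def hist_sim_expectiles(point_forecast, errors, taus):
    """Expectile-based historical simulation: point forecast plus error expectiles."""
    return point_forecast + np.array([sample_expectile(errors, t) for t in taus])

errors = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])  # hypothetical out-of-sample errors
levels = [0.05, 0.5, 0.95]
q = hist_sim_quantiles(40.0, errors, levels)    # 40.0 is a hypothetical point forecast
e = hist_sim_expectiles(40.0, errors, levels)
print(q, e)
```

For a symmetric error sample, as in this toy case, both descriptions are centred at the point forecast, but the tail expectiles lie closer to the centre than the tail quantiles at the same nominal level.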

Prediction intervals from expectiles
The considered probabilistic forecasts are given either in terms of quantiles or of expectiles. Both are a proper description of the predicted distribution, but their accuracy should be evaluated using different scoring functions. Hence, in order to compare the quantile- and expectile-based methods, we transform expectiles into the corresponding quantiles. This yields the prediction intervals (PI) commonly used in the forecasting context. To this end, we use a procedure proposed by Waltrup et al. (2015). It is based on finding a CDF that minimizes the distance between the derived expectiles and their theoretical values resulting from that CDF through relation (6), in which $F(e_\tau(Y))$ is the value of the CDF at the $\tau$-expectile and $G(e_\tau(Y))$ is the value of the partial moment function, $G(x) = \int_{-\infty}^{x} y\, dF(y)$. Next, the values of the CDF at the desired quantiles are approximated using linear interpolation and finally inverted, yielding prediction intervals. For a detailed description of this procedure see Waltrup et al. (2015).

Variance stabilizing transformation
Since electricity prices are known to be highly volatile, we apply a variance stabilizing transformation prior to modelling (see Uniejewski et al., 2018, for a discussion of the use of different transformations in this context). Here, we apply the inverse hyperbolic sine (asinh) function, which can be viewed as a generalization of the logarithmic transformation that is also suitable for negative prices. Namely, we consider

$$\operatorname{asinh}(y_{h,t}) = \log\left(y_{h,t} + \sqrt{y_{h,t}^2 + 1}\right),$$

where $y_{h,t}$ is the normalized price, $y_{h,t} = (p_{h,t} - \mu_{p_{h,t}})/\sigma_{p_{h,t}}$, with $\sigma_{p_{h,t}}$ being the standard deviation of prices, $p_{h,t}$, in the calibration window and $\mu_{p_{h,t}}$ the corresponding mean.
For practical applications one is usually interested in predictions of the original prices. Hence, predictions calculated for transformed prices are in the end inverted back. Since inverting the asinh transformation of random variables is not straightforward (see Narajewski and Ziel (2020) for a discussion of this issue), we use the Monte Carlo approach. Namely, first we simulate $n$ day-ahead price scenarios using the predicted distribution. Next, we invert each of them using the hyperbolic sine function and the normalization constants,

$$p^i_{h,t} = \sigma_{p_{h,t}} \sinh\left(x^i_{h,t}\right) + \mu_{p_{h,t}}, \quad i = 1, 2, \ldots, n,$$

where $x^i_{h,t}$ denotes the $i$-th simulated scenario of the transformed price. Finally, the empirical distribution of the inverted scenarios $p^1_{h,t}, p^2_{h,t}, \ldots, p^n_{h,t}$ yields the probabilistic forecast of the price for day $t$ and hour $h$.
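The transformation and its Monte Carlo inversion can be sketched as below. The paper does not specify how the scenarios are drawn from the predicted distribution, so the inverse-CDF sampling with linear interpolation between predicted quantiles is our assumption, as are all function names and numerical values.

```python
import numpy as np

def asinh_transform(p, mu, sigma):
    """Normalize prices with the calibration-window mean and std, then apply asinh."""
    return np.arcsinh((np.asarray(p, dtype=float) - mu) / sigma)

def inverse_asinh(x, mu, sigma):
    """Map transformed values back to the original price scale."""
    return sigma * np.sinh(np.asarray(x, dtype=float)) + mu

def mc_price_quantiles(levels, transformed_quantiles, mu, sigma, alphas, n=10000, seed=0):
    """Simulate scenarios on the transformed scale, invert them, return price quantiles."""
    rng = np.random.default_rng(seed)
    # Assumption: scenarios are drawn by inverse-CDF sampling restricted to the
    # range of available levels, so the extreme tails are truncated.
    u = rng.uniform(levels[0], levels[-1], size=n)
    scenarios = np.interp(u, levels, transformed_quantiles)
    prices = inverse_asinh(scenarios, mu, sigma)
    return np.quantile(prices, alphas)

mu, sigma = 40.0, 15.0                              # hypothetical calibration-window moments
levels = np.array([0.01, 0.25, 0.5, 0.75, 0.99])
tq = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])          # hypothetical predicted quantiles (transformed scale)
pq = mc_price_quantiles(levels, tq, mu, sigma, [0.05, 0.5, 0.95])
print(pq)
print(asinh_transform(-10.0, mu, sigma))            # negative prices pose no problem
```

Working with simulated scenarios avoids inverting expectiles directly, which would not be valid because expectiles, unlike quantiles, are not preserved under nonlinear monotone transformations.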
4 German electricity market case study

Datasets
We apply the ERA, QRA as well as the historical simulation methods from individual models (11)-(15) to the day-ahead hourly electricity prices from the German EPEX spot market spanning the period 1.01.2017-31.12.2020. The considered prices are plotted in Figure 1. For the calculation of the point forecasts we use the set of exogenous variables $Z^i_{h,t}$ consisting of: i) forecasts of generation; ii) forecasts of wind generation; iii) forecasts of solar generation; and iv) forecasts of load. All these values are published by the Transmission System Operator (TSO) and are freely available from the ENTSO-E platform (https://transparency.entsoe.eu/). The values of the considered variables are plotted in Figure 2.

Fig. 1 Hourly day-ahead electricity prices from the German EPEX spot market from the period 1.01.2017-31.12.2020.

Forecasts construction
Electricity price predictions are calculated in a moving window scheme. For each day of the validation window we calculate the day-ahead probabilistic forecast based on the parameters estimated from the preceding calibration window. The derivation of the probabilistic forecasts for all considered methods first requires calculating the point forecasts. Hence, we divide the calibration window into two yearly parts. The first one is used for the estimation of the parameters of the individual models (11)-(15). Next, the resulting point forecasts are derived for the second part of the calibration window. Finally, these point forecasts are used to calculate probabilistic forecasts for the validation window. Here, the forecasts are evaluated in a two-year window spanning the years 2019-2020. The comparison of forecasts is done in terms of quantiles (prediction intervals). In order to transform the expectile-based predictions, we apply the procedure (6) to a grid of expectiles calculated at the following levels: $\tau$ = 0.001, 0.0025, 0.005, 0.0075, 0.01, 0.02, 0.04, ..., 0.98, 0.99, 0.9925, 0.995, 0.9975, 0.999.

Forecasts evaluation
Fig. 2 Values of the exogenous variables: forecasted generation, forecasted wind generation, forecasted load and forecasted solar generation for the German market from the period 1.01.2017-31.12.2020 (source: ENTSO-E).

The accuracy of prediction intervals is compared using the pinball loss (PL), which is a proper scoring function for quantiles (Gneiting and Katzfuss, 2014),

$$PL\left(\hat{q}_{P_{t,h}}(\alpha), P_{t,h}, \alpha\right) = \begin{cases} (1 - \alpha)\left(\hat{q}_{P_{t,h}}(\alpha) - P_{t,h}\right), & P_{t,h} < \hat{q}_{P_{t,h}}(\alpha), \\ \alpha\left(P_{t,h} - \hat{q}_{P_{t,h}}(\alpha)\right), & P_{t,h} \geq \hat{q}_{P_{t,h}}(\alpha), \end{cases}$$

where $\hat{q}_{P_{t,h}}(\alpha)$ is the $\alpha$-quantile of the forecasted price distribution and $P_{t,h}$ is the actually observed value. We calculate the averaged pinball score for each hour and percentile in the validation window. The values are then averaged over all percentiles, yielding the hourly pinball score, or over all hours, yielding the percentile pinball score. The obtained results are plotted in Figure 3. As can be observed, the ERA and QRA averaging schemes yield lower pinball scores than the historical simulation method for most percentiles. The largest difference is obtained in the middle of the distribution. A similar picture is obtained for the hourly pinball score, with the lowest values for the ERA method for most of the hours. The only exceptions are the peak hours of 20 and 15, for which the historical simulation methods yield more accurate results than the forecast averaging.
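The pinball loss and its averaging over the validation window can be written in a few lines. The sketch below uses the standard definition, in which an observation below the forecasted quantile is penalized with weight $1 - \alpha$ and an observation above it with weight $\alpha$. The function names and numbers are hypothetical.

```python
def pinball_loss(q_hat, p, alpha):
    """Pinball loss of the alpha-quantile forecast q_hat for the observed price p."""
    if p < q_hat:
        return (1.0 - alpha) * (q_hat - p)
    return alpha * (p - q_hat)

def mean_pinball(q_forecasts, prices, alpha):
    """Pinball score averaged over a sequence of forecasts and observations."""
    losses = [pinball_loss(q, p, alpha) for q, p in zip(q_forecasts, prices)]
    return sum(losses) / len(losses)

# Hypothetical 95% quantile forecasts and observed prices for four days
q95 = [50.0, 52.0, 48.0, 55.0]
obs = [40.0, 60.0, 45.0, 54.0]
score = mean_pinball(q95, obs, 0.95)
print(score)
```

In this toy example the single observation exceeding its 95% quantile forecast contributes most of the score, since exceedances of a high quantile are penalized heavily.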
The significance of the pinball score differences is further verified using the one-sided Diebold and Mariano (1995) test. In Figure 4 we show the number of hours as well as percentiles for which each of the considered models was significantly outperformed by its competitors. The obtained results confirm the conclusions drawn from the mean pinball scores in Figure 3. The ERA and QRA averaging schemes significantly outperform the historical simulation methods. There are no hours for which the accuracy of ERA or QRA was significantly lower, while they outperformed the historical simulation for 5 up to 23 hours, depending on the model specification. Similarly for percentiles, we can see a significant improvement in the forecast accuracy if averaging methods are used. This is especially apparent for the ERA method, which outperforms the other approaches for 33 up to 86 percentiles. It yields significantly better results also in comparison with the QRA method, outperforming the latter for 8 hours and 67 percentiles. Looking at the differences between the quantile- and expectile-based historical simulation within a given model specification, we do not observe a clear pattern and the overall accuracy is at a similar level.
In order to further evaluate the predictions, we calculate the coverage probability $P(P_{t,h} < \hat{q}_{P_{t,h}}(\alpha))$ at the 5% and 95% $\alpha$-levels. Note that these in fact measure the accuracy of the Value at Risk forecasts at the 95% level, $VaR_{95\%}$, for a seller and a buyer, respectively. Results obtained for each of the hours are plotted in Figure 5. Here, we can see a clear difference between the results obtained with the quantile- and expectile-based methods. The coverage probabilities obtained for the latter are closer to the expected 5% and 95% levels. This is visible for both approaches: the ERA method and the expectile-based historical simulation. The coverage probabilities obtained with the QRA as well as the quantile-based historical simulation methods are too high for the 5% quantile and at the same time too low for the 95% one, yielding too narrow prediction intervals. The coverage probabilities obtained with the expectile-based methods are close to the expected 5% and 95% levels, with the only exception being higher quantiles in the night hours, for which they are lower by approximately 1%. The significance of the differences from the expected 5% and 95% levels is verified using the Kupiec (1995) test. The number of hours for which the obtained coverage probabilities were significantly different from 5% and 95% is given in Table 1. In the case of the expectile-based methods the obtained values are significantly different from the expected ones only for a few hours, mainly during the night, and for the 95% level. For the quantile-based methods the accuracy is much worse, as the number of hours with significant differences ranges from 10 up to 22.
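The coverage evaluation can be sketched with the likelihood ratio form of the Kupiec (1995) proportion-of-failures test, in which the statistic is asymptotically chi-squared with one degree of freedom. The function names and the counts in the example are hypothetical; only the validation window length of 731 days (the years 2019-2020) follows from the text.

```python
import math

def coverage(q_forecasts, prices):
    """Empirical probability that the observed price falls below the quantile forecast."""
    hits = [p < q for q, p in zip(q_forecasts, prices)]
    return sum(hits) / len(hits)

def kupiec_test(n_hits, n, alpha):
    """LR statistic and p-value for H0: P(price < quantile forecast) = alpha."""
    def loglik(prob):
        # Binomial log-likelihood; terms with a zero count contribute nothing
        ll = 0.0
        if n_hits > 0:
            ll += n_hits * math.log(prob)
        if n - n_hits > 0:
            ll += (n - n_hits) * math.log(1.0 - prob)
        return ll
    lr = -2.0 * (loglik(alpha) - loglik(n_hits / n))
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

# Hypothetical counts of prices below the 5% quantile forecast in 731 days
lr_ok, p_ok = kupiec_test(38, 731, 0.05)    # close to the expected 36.55 hits
lr_bad, p_bad = kupiec_test(70, 731, 0.05)  # clearly too many hits
print(p_ok, p_bad)
```

A p-value below the chosen significance level, 5% in the paper, indicates that the coverage probability differs significantly from the nominal level.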

Table 1 Number of hours for which the coverage probability obtained for the considered methods applied to asinh transformed prices was significantly different from the expected 5% or 95% value according to the Kupiec (1995) test performed at the 5% significance level. 'Q' denotes the quantile-based approach, while 'EX' the expectile-based one.

              Averaging     Historical simulation
              (QRA/ERA)     Model 1     Model 2     Model 3     Model 4     Model 5
              Q     EX      Q     EX    Q     EX    Q     EX    Q     EX    Q     EX
P 5%          21    0       10    0     13    0     11    1     11    1     13    0
P 95%         22    3       22    7     14    2     17    6     20    8     14    2

Fig. 5 Hourly coverage probabilities at the 5% (top panel) and 95% (bottom panel) level obtained for each of the considered models applied to asinh transformed prices. 'Q-hist' denotes the historical simulation in terms of quantiles, while 'EX-hist' the historical simulation in terms of expectiles. Numbers are related to the individual Models. The 5% and 95% levels are marked with horizontal blue lines.

The obtained results are summarized in Table 2. As a reference we also provide the results obtained when no transformation was applied to electricity prices prior to modelling. The coverage probabilities and the pinball scores are averaged over all hours and in the latter case also over all percentiles. The coverage probabilities are additionally evaluated with the Kupiec (1995) test at the 5% significance level. The best averaged pinball score was obtained for the ERA method applied to asinh transformed prices. Also the averaged coverage probabilities are in this case closer to 5% or 95% for the expectile-based methods. We observe a considerable improvement of the forecast accuracy if the asinh transformation is applied to electricity prices, especially in the lower tails of the predicted distribution and in the overall pinball score. Interestingly, if no transformation is applied, then the quantile-based methods yield higher accuracy than their expectile analogues. Those methods rely on absolute deviations instead of least squares, so they are more robust to outliers. Nevertheless, the accuracy of the forecasts obtained without transformation is lower than that of their transformed versions for all of the considered methods.

Conclusions
In this paper we proposed a new method for probabilistic forecasting of electricity prices. It is based on combining forecast averaging with the expectile regression. Precisely, it yields forecasts of expectiles, given as linear combinations of a pool of point forecasts. The predicted distribution is then given in terms of expectiles, so it can be directly used, e.g., for risk management purposes. On the other hand, from a grid of expectiles one can also calculate quantiles of the same distribution. Such a transformation yields prediction intervals, commonly used in the forecasting context. The proposed ERA approach was applied to the German electricity market data. Its accuracy for hourly, day-ahead electricity prices was compared with the QRA as well as the historical simulation methods. In terms of the pinball score both considered forecast averaging methods, ERA and QRA, significantly outperformed historical simulation. Results of the expectile- as well as quantile-based historical simulation methods were in this case similar. We also calculated coverage probabilities at the 5% and 95% levels. For this accuracy measure all expectile-based methods significantly outperformed the quantile-based approaches. Overall, the best results were obtained for the ERA method applied to prices after a variance stabilizing transformation. Such a transformation improved forecast accuracy for all considered methods.
We believe that utilizing the notion of expectiles in probabilistic forecasting of electricity prices might improve the accuracy of the forecasts. Since using expectile regression leads to least squares optimization, it naturally inherits its good numerical properties. However, for data as volatile as electricity prices it should be applied with caution, as the least squares method is not robust to outliers. Hence, a variance stabilizing transformation or outlier treatment methods might be necessary to apply it efficiently.

Table 2 The values of the pinball score (PS) as well as the coverage probabilities P 5% and P 95% at the 5% and 95% levels obtained for the considered methods. The values were averaged over 24 hours and all percentiles. Predictions were calculated for prices transformed with the asinh function, as well as without a transformation. 'Q' denotes the quantile-based approach, while 'EX' the expectile-based one. The coverage probabilities that are not significantly different from the expected level according to the Kupiec (1995) test and the lowest pinball scores are given in bold.

Fig. 3 Hourly (top panel) or percentile (bottom panel) mean pinball score obtained for each of the considered models applied to asinh transformed prices. 'Q-hist' denotes the historical simulation in terms of quantiles, while 'EX-hist' the historical simulation in terms of expectiles. Numbers are related to the individual Models.

Fig. 4 Number of hours (left panel) or percentiles (right panel) for which the prediction from the model in the row is significantly worse than the prediction from the model in the column according to the Diebold and Mariano (1995) test. 'Q-hist' denotes the historical simulation in terms of quantiles, while 'EX-hist' the historical simulation in terms of expectiles. Numbers are related to the individual Models. The test was performed at the 5% significance level.