Daily Middle-Term Probabilistic Forecasting of Power Consumption in North-East England

Probabilistic forecasting of power consumption over a middle-term horizon (months to a year) is a major challenge in the energy sector. It plays a key role in planning future generation plants and the transmission grid. We propose a new model that incorporates trend and seasonality features, as in traditional time-series analysis, and weather conditions as explanatory variables in a parsimonious machine learning approach known as a Gaussian Process. Applying the model to a daily power consumption dataset for North East England, provided by one of the largest energy suppliers, we obtain promising results in Out-of-Sample density forecasts up to one year, even with a small dataset of only two years of In-Sample data. To verify the quality of the probabilistic forecast, we consider measures that are common in the energy sector, such as the pinball loss and the Winkler score, together with conditional and unconditional backtesting, standard in the banking sector after the introduction of the Basel II Accords.


Introduction
Power consumption forecasting has received significant attention from both academics and practitioners in recent years. In particular, middle-term forecasting, i.e. over a time horizon between a few months and a year, plays a key role in the planning of power systems, both for network reliability and for investment strategies in future generation plants and transmission (see, e.g., Hong & Fan, 2016).
Through a probabilistic forecast, one obtains the full probability distribution of future consumption. It is the latest frontier of current research and is more helpful for utilities and grid operators than a point consumption forecast. In fact, it provides not only the expected value of the forecast but also its dispersion, a piece of information that is relevant for generating reliable scenarios.
This technique gained momentum in the energy sector after the Global Energy Forecasting Competition 2014 (GEFCom) on a dataset of U.S.A. power consumption (see, e.g., Hong & Fan, 2016; Nowotarski & Weron, 2018, and references therein).
In particular, it is important to be able to select the relevant drivers and their relationship with power consumption; this makes it possible to understand and to hedge the risks.
We are interested in the impact of weather conditions; they play the most relevant role in middle-term forecasts, compared to economic and demographic drivers, which play a role in longer-term forecasts (Hyndman & Fan, 2010). We focus on a region in the UK with relatively homogeneous weather conditions and on the power consumption of one main operator on the household energy market. We aim to model the dependence of power consumption on weather conditions; the most natural technique, now standard in power consumption forecasting, is known as ex-post forecasting. It has been applied to middle-term probabilistic forecasting of power consumption on the French distribution network (Goude et al., 2013) and on the National Electricity Market of Australia (Hyndman & Fan, 2010).
We use a Machine Learning (ML) technique. After the seminal paper of Park et al. (1991), these techniques have been shown to provide interesting results in short-term point consumption forecasting (see, e.g., Fan & Chen, 2006; Meng et al., 2009; Shi et al., 2018). In this paper, we apply an ML technique also to a middle-term consumption forecast up to one year; moreover, we consider a density forecast and not only a point forecast. Furthermore, the main successes of ML have been shown with big-data analytics (see, e.g., Chen et al., 2012; Marino et al., 2016; Mocanu et al., 2016) and, in particular, in price forecasting (see, e.g., Nowotarski & Weron, 2018, for a review). The real challenge is to use these techniques with small datasets. This is a common requirement in the industrial sector: it is quite hard to obtain homogeneous long time-series within a middle-to-large operator. This is due both to the rapid changes observed in this energy market and to mergers and acquisitions, corporate transactions that have become frequent after the liberalization of the sector: these features of the energy market can significantly change over time the composition of the clients' portfolio and thus the characteristics of the dataset. In order to show the effectiveness and quality of the proposed technique, we consider the extreme case where we analyse a two-year In-Sample dataset to forecast one year of power consumption.
To the best of our knowledge, the use of ML for middle-term density forecasting using a small dataset is new in the literature.
In particular, the Gaussian Process technique is natural for density forecasting, because the model provides density forecasts of power consumption as its output. The model has a relatively low computational cost and it is very parsimonious, with only three parameters, compared to other ML techniques. We are able to compute the densities of the consumption forecast for the hybrid model we introduce.
The performance of the model is assessed by comparing the forecasted results with the realised consumption over the last year of the dataset. In order to assess the quality of the forecast we consider techniques that are standard either in the energy sector, such as pinball and Winkler scores (see, e.g., Nowotarski & Weron, 2018), or in the banking sector after the introduction of the Basel II Accords, such as backtesting (Kupiec, 1995; Christoffersen, 1998).
The main contributions of the paper are threefold. First, we introduce a hybrid model that combines the advantages of classical univariate time-series analysis and simple ML techniques. In particular, via a Gaussian Process we incorporate in the power consumption density forecast the dependence on weather conditions and we deduce the density characteristics for the hybrid model we consider. Second, we show that an ML technique relying on a small dataset can achieve promising results in forecasting power consumption using only weather data in middle-term forecasts. Third, we assess the density forecast via both sharpness and reliability measures, showing the quality of the achieved results. In particular, we show that a Gaussian Process can achieve very good results even with a short daily time-series with only a two-year In-Sample set.
The rest of the paper is organized as follows. In Section 2 we summarise the key characteristics of the dataset we analyse. In Section 3 we present the methodology; in particular, we describe in detail i) the proposed model and how the weather conditions are introduced via a Gaussian Process, ii) the forecasting technique and iii) the evaluation methods. Section 4 shows the main numerical results and Section 5 concludes.

Dataset Description
North East England is one of the nine regions of England and the eighth most

• Temperature, in °C;
• Wind Speed, in m/s;
• Precipitation Amount, in mm;
• Solar Radiation, in kJ/m²;
• Relative Humidity, in %;
• Cloud Cover, on a scale from 0 (clear) to 8 (completely cloudy).

of measurement in weather conditions: it implies ranges of values that can differ by orders of magnitude. Due to the difference in units of measurement, in order to obtain a homogeneous dataset it is often useful to standardise each input variable. In the non-parametric ML approach that we consider in the next section, we use the Standardised Euclidean Distance.
The yearly and weekly seasonality of power consumption is rather evident in the dataset. Figure 1 represents the first two years of daily power consumption. It suggests a yearly seasonal behaviour and slightly higher power consumption during weekends.
We observe that i) power consumption is more than three times larger in winter than in summer, ii) consumption on weekdays is lower than on Sundays (and, to a lesser extent, on Saturdays) and iii) volatility is larger in winter than in summer. These are well-known stylised facts common to most power consumption time-series: for these reasons, and in particular due to the observed volatility behaviour, it is now standard to model the logarithm of consumption (see, e.g., Hong & Fan, 2016, for a review).
In the next section we describe in detail the adopted methodology.

The model
The main goal of this study is forecasting future power consumption over the middle-term horizon, modeling both seasonal and weather-related features. Our model choice is guided by the aim of obtaining a density forecast with a reasonably parsimonious description, without the need to resort to extremely fine-tuned models.
As already mentioned in the introduction, we model log-scaled daily power consumption data, as is standard in the literature on probabilistic electric consumption forecasting. The characteristics of power demand that we aim to model are:
• long-term trend;
• yearly and weekly seasonal behaviour;
• daily autocorrelation;
• the relation with weather conditions;
• weather-based error correlation and variance clustering.
The hybrid model we consider is split into two parts: a classical approach that models the first three characteristics (trend, seasonality and autocorrelation) and a ML method that takes care of the weather influence.
First, the relation between consumption and calendar variables is established through a General Linear Model (GLM). Then, we investigate the relation between GLM residuals and weather variables through an ML method known as a Gaussian Process (GP) (see, e.g., Rasmussen & Williams, 2006). We call this hybrid model GPX, because it is the natural extension of AutoRegressive eXogenous models, known as ARX (see, e.g., Box et al., 2015, p. 534 et seq.).
The model of the natural logarithm of power consumption is given by equation (1), where calendar time is measured via the ordinal index $t$ of the observation, starting from 1 on the first date in the dataset; $T_t$ is the trend term, $S_t$ the seasonality, both yearly and weekly (the latter introduced via two dummy variables for Saturday and Sunday), $\omega = 2\pi/365$ and $R_t$ are the residuals. We also consider an Auto-Regressive (AR) term; in Section 4 we show that the null hypothesis of a unit root can be rejected and that an AR(1) properly describes the time-series.
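As an illustration of how the calendar part of such a model could be calibrated, the following minimal sketch builds a design matrix and fits it by Ordinary Least Squares. The specific functional form (linear trend, a single yearly harmonic, Saturday and Sunday dummies, one AR(1) lag estimated jointly) and the names `glm_design` and `fit_ols` are assumptions made for this example; they are not taken from equation (1).

```python
import numpy as np

def glm_design(y_log, weekday, omega=2 * np.pi / 365):
    """Build an illustrative design matrix for log-consumption y_log.

    weekday[t] is 0 (Monday) ... 6 (Sunday). The functional form is an
    assumption of this sketch, not the paper's exact specification.
    """
    y_log = np.asarray(y_log, dtype=float)
    weekday = np.asarray(weekday)
    t = np.arange(1, len(y_log) + 1)           # ordinal time, starting at 1
    cols = [
        np.ones_like(t, dtype=float),          # intercept
        t.astype(float),                       # linear trend (assumed)
        np.cos(omega * t), np.sin(omega * t),  # yearly seasonality
        (weekday == 5).astype(float),          # Saturday dummy
        (weekday == 6).astype(float),          # Sunday dummy
    ]
    X = np.column_stack(cols)[1:]              # drop first row: no lag available
    lag = y_log[:-1, None]                     # AR(1) regressor Y_{t-1}
    return np.hstack([X, lag]), y_log[1:]

def fit_ols(X, y):
    """Ordinary Least Squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta
```

The residuals returned by `fit_ols` play the role of $R_t$, i.e. the quantity that is subsequently modeled with the Gaussian Process.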
The described hybrid approach, where one first removes trend, seasonality and AR components and then analyses the residuals separately, is standard in the energy literature (see, e.g., Benth et al., 2008, and references therein). In this study residuals of power consumption are modeled via a GP that incorporates the information coming from weather conditions: this is the main contribution of this study from a modeling perspective.
It is well known that, after having detrended and deseasonalized the time-series, the impact on power consumption of weather conditions in general, and of temperature in particular, is very important and cannot be neglected (see, e.g., Hong & Fan, 2016).
As already stated in the introduction, in this study we propose to incorporate weather conditions in the model via a Gaussian Process. GPs are well known in the ML literature and provide a simple tool for density forecasting. In this subsection we briefly recall the main characteristics of a GP, following the reference book of Rasmussen & Williams (2006) and using a notation similar to the one they use in a function-space view.
A GP "is a collection of random variables, any finite number of which have a joint Gaussian distribution" (cf. Rasmussen & Williams, 2006, Def. 2.1, p. 13). It is completely specified by its mean function and covariance function. In the case of zero mean, the random variables represent the value of the function $R(x)$ at "location" $x$; in Rasmussen & Williams (2006) they are indicated as:
$$R(x) \sim \mathcal{GP}\left(0, h(x, x')\right)$$
where $h(x, x')$ is an arbitrary kernel function between the locations $x$ and $x'$.
In practice, this notation indicates that for any collection of $n$ observations, the corresponding residuals are Gaussian random variables such that
$$R \mid X \sim \mathcal{N}(0, H)$$
where $\mathcal{N}(\cdot, \cdot)$ is a multivariate Gaussian distribution with zero mean and a positive definite covariance matrix $H \in \mathbb{R}^{n \times n}$, $R \in \mathbb{R}^{n}$ and $X \in \mathbb{R}^{n \times m}$, where $m$ is the number of regressors. In this study we consider 9 regressors: the 7 weather conditions in the dataset (cf. Section 2) and 2 others related to the calendar time $t$, in order to introduce a yearly calendar effect also in the correlation matrix. In particular, we consider $\cos(\omega t)$ and $\sin(\omega t)$, where $\omega = 2\pi/365$ is defined as in the seasonality $S_t$ in equation (1). In the following, we continue to refer to these 9 regressors as weather conditions even if they contain these two other explanatory variables. The covariance $H_{ij}$ between the $i$-th and $j$-th residuals depends on the weather conditions $X$ of the corresponding dates $t_i$ and $t_j$.
An example of a kernel function is the Kronecker delta multiplied by a positive scalar $\sigma^2$, i.e. $h(x, x') = \sigma^2 \delta(x, x')$; in this case, the residuals correspond to Gaussian i.i.d. random variables with variance $\sigma^2$, as in the standard linear regression.
In this study we consider a kernel function for the residuals equal to
$$h(x, x') = k(x, x') + \sigma^2 \delta(x, x')$$
where
$$k(x, x') := \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2 \sigma_l^2}\right)$$
with $\|x - x'\|$ the Standardised Euclidean Distance (SED) between $x$ and $x'$, and $\sigma_f \geq 0$, $\sigma_l > 0$ two additional parameters w.r.t. the standard linear regression (see, e.g., Lourenco & Santos, 2012; Morad et al., 2018). The choice of the SED over the L2 norm has the advantage of allowing the same contribution from each weather condition, independently of its unit of measure.
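A minimal sketch of such a kernel is given below, assuming a squared-exponential form on the standardised distance and a noise term added on the diagonal. The function names and the `scale` argument (one standard deviation per regressor, estimated In-Sample) are hypothetical and serve only to illustrate the role of the three parameters.

```python
import numpy as np

def sed_kernel(XA, XB, scale, sigma_f, sigma_l):
    """Squared-exponential kernel on the Standardised Euclidean Distance.

    scale holds one standard deviation per regressor, so that every
    weather variable contributes on a comparable scale (assumed form).
    """
    A = np.asarray(XA, dtype=float) / scale
    B = np.asarray(XB, dtype=float) / scale
    # Squared pairwise distances between standardised rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / sigma_l**2)

def train_covariance(X, scale, sigma_f, sigma_l, sigma):
    """Kernel matrix plus the i.i.d. noise term sigma^2 on the diagonal."""
    K = sed_kernel(X, X, scale, sigma_f, sigma_l)
    return K + sigma**2 * np.eye(len(K))
```

With `sigma_f = 0` the covariance reduces to `sigma**2` times the identity, i.e. the i.i.d. residuals of a standard linear regression.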
As is standard in the statistical literature, the dataset is divided into an In-Sample (IS) set for model calibration (training set $X$, with $n$ observations) and an Out-of-Sample (OS) set for forecasting and for evaluating the quality of the achieved forecast (test set $X_*$, with $n_*$ observations), respectively two and one year long. One can compute pairwise, through the chosen kernel function, the covariance matrix of a finite number of GP observations, which depends on the three parameters $\sigma_f$, $\sigma_l$ and $\sigma$.
A GP presents the great advantage that it is immediate to infer the Out-of-Sample residuals and their distribution. The joint distribution of the points in the In-Sample and the Out-of-Sample set is (cf. Rasmussen & Williams, 2006, eq. (2.21), p. 16)
$$\begin{bmatrix} R \\ R_* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} H(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$$
where $H(X, X) = K(X, X) + \sigma^2 I$ is the IS covariance matrix, $K(X, X_*)$ denotes the $n \times n_*$ matrix of the covariances evaluated at all pairs of training and test points, and similarly for the other entries $K(X_*, X_*)$ and $K(X_*, X)$.
Hereinafter, in order to simplify the notation, we indicate with $R_t$ the residual at time $t$ both IS and OS, where $R_t$ for $t = 1, \ldots, n$ are the IS values and $R_t$ for $t = n+1, \ldots, n+n_*$ are the OS ones.
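We can use the GP for the prediction of OS residuals. Rasmussen & Williams (2006) show that the OS residuals $R_*$, given the IS residuals $R$ and the weather conditions both IS, $X$, and OS, $X_*$, are (cf. eq. (2.22), p. 16):
$$R_* \mid X, R, X_* \sim \mathcal{N}\left(\mathbb{E}[R_*], \mathrm{cov}(R_*)\right)$$
where, with $H(X, X)$ the IS covariance matrix of the residuals,
$$\mathbb{E}[R_*] = K(X_*, X)\, H(X, X)^{-1} R \qquad (7)$$
$$\mathrm{cov}(R_*) = K(X_*, X_*) - K(X_*, X)\, H(X, X)^{-1} K(X, X_*) \qquad (8)$$
Plugging the OS residuals into the hybrid model (1), we are able to obtain an ex-post probabilistic forecast of power consumption.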
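The conditional mean and covariance of the OS residuals can be computed without forming an explicit matrix inverse. The sketch below follows the standard Cholesky-based recipe for Gaussian conditioning; the function name `gp_predict` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def gp_predict(H, K_s, K_ss, R):
    """Posterior mean and covariance of Out-of-Sample residuals.

    H    : (n, n)    In-Sample covariance, kernel matrix plus noise term
    K_s  : (n, n_s)  cross-covariance between IS and OS points
    K_ss : (n_s, n_s) covariance among OS points
    R    : (n,)      In-Sample residuals
    """
    R = np.asarray(R, dtype=float)
    L = np.linalg.cholesky(H)                            # H = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, R))  # H^{-1} R
    mean = K_s.T @ alpha                                 # conditional mean
    V = np.linalg.solve(L, K_s)                          # L^{-1} K_s
    cov = K_ss - V.T @ V                                 # conditional covariance
    return mean, cov
```

The diagonal of `cov` gives the variance of each OS residual, while the off-diagonal entries carry the weather-driven correlation between different OS days.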
The next subsection describes in detail the ex-post density forecasting technique and the last subsection the evaluation methods we consider.

The ex-post forecasting technique and the flow diagram
Focusing on the data structure described in the previous section, the middle-term daily density forecast of power consumption is obtained via ex-post forecasting, a technique introduced in the power consumption sector by Hyndman & Fan (2010) and now commonly used in middle- to long-term power consumption forecasts (see, e.g., Goude et al., 2013).
As shown in the flow diagram of Figure 2, the method is divided into three stages (see, e.g., Hyndman & Fan, 2010, p. 1144): calibration, forecasting and evaluation.
In the first stage, the GPX model (in its two components, GLM and GP) is calibrated on the IS training set, with both power and meteorological data. The GLM is calibrated through Ordinary Least Squares, while the Gaussian Process parameters $\sigma_f$, $\sigma_l$ and $\sigma$ are calibrated by maximising the log-likelihood
$$\max_{\sigma_f, \sigma_l, \sigma} \left\{ -\frac{1}{2} R^\top H(X, X)^{-1} R - \frac{1}{2} \ln \left| H(X, X) \right| - \frac{n}{2} \ln 2\pi \right\} \qquad (9)$$
where $H(X, X)$ is defined in (5). The log-likelihood is maximised through a gradient descent iterative procedure with an adaptive step length (see, e.g., Rasmussen & Williams, 2006, and references therein).
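To make the calibration objective concrete, the following sketch evaluates the Gaussian log-likelihood of the residuals for a given covariance matrix. The helper `grid_search` is a deliberately crude stand-in for the gradient-based optimiser mentioned above, and `build_H` is a hypothetical callback that maps a parameter tuple to the IS covariance matrix.

```python
import numpy as np

def log_marginal_likelihood(R, H):
    """Gaussian log-likelihood of residuals R with covariance matrix H."""
    R = np.asarray(R, dtype=float)
    n = len(R)
    L = np.linalg.cholesky(H)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, R))  # H^{-1} R
    log_det = 2.0 * np.log(np.diag(L)).sum()             # ln |H|
    return -0.5 * R @ alpha - 0.5 * log_det - 0.5 * n * np.log(2 * np.pi)

def grid_search(R, build_H, grid):
    """Return the parameter tuple in `grid` with the highest log-likelihood.

    build_H(params) must return the In-Sample covariance matrix; this is
    an illustrative substitute for a gradient-based optimiser.
    """
    return max(grid, key=lambda p: log_marginal_likelihood(R, build_H(p)))
```

In practice one would optimise over the logarithm of the three parameters to keep them positive; that refinement is omitted here for brevity.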
In the second stage, the density forecasting is obtained via an ex-post forecast.
This forecast uses the weather conditions in the OS set in order to forecast the power consumption; as well explained by Goude et al. (2013, p. 443), "assuming that the realisation of the meteorological covariates is known in advance (...) allows us to quantify the performances of our model without embedding the meteorological forecasting errors". The idea is that, in order to focus on the ability of the model to describe a strong and reliable relation between daily weather conditions and power consumption, one assumes perfect knowledge of the weather conditions in the OS period.
With GPX an ex-post point forecast is straightforward. The expected value of consumption is obtained via (1), where the residuals' expected value is given by (7). The main strength of GPX is that the whole OS (conditional) distribution can be easily obtained. OS log-consumption at day $t = n + i$, given IS consumption and weather conditions up to day $t$, is a Gaussian r.v. with conditional mean given by equation (10) and conditional variance given by equation (11), where $\mathrm{cov}(R_{n+i}, R_{n+i-j})$ is obtained from (8).
Equation (11) is the most relevant modeling result in this paper: it makes it possible to obtain the ex-post density forecast for the proposed consumption model (1) and (2). It extends the known formula for autoregressive processes in the presence of i.i.d. residuals (see, e.g., Box et al., 2015, Ch. 3.2.3, p. 58) to the case of interest, where residuals are modeled via a GP. It can be proved by induction.
Finally, in the third stage, the quality of the model forecast is evaluated by comparing it with the last year of realised OS consumption data. Model evaluation is carried out both in terms of point consumption forecast and in terms of reliability and sharpness of the predicted densities. These evaluation methods are described in the next subsection.
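The mechanism behind such a variance formula can be illustrated numerically. The sketch below assumes a simplified recursion $Y_t = m_t + \phi Y_{t-1} + R_t$, with $m_t$ deterministic and jointly Gaussian, possibly correlated, residuals; this recursion, the function name and its arguments are assumptions of the example and are not meant to reproduce equations (10) and (11) exactly.

```python
import numpy as np

def ar1_forecast_moments(y_n, m, r_mean, r_cov, phi):
    """Mean and variance of Y_{n+i}, i = 1..n_s, under the assumed recursion
    Y_t = m_t + phi * Y_{t-1} + R_t with jointly Gaussian residuals R_t.

    m, r_mean : (n_s,) deterministic part and residual means on OS dates
    r_cov     : (n_s, n_s) covariance matrix of the OS residuals
    """
    n_s = len(m)
    idx = np.arange(n_s)
    # A[i, k] = phi^(i-k) for k <= i: weight of the day-k shock in Y at day i.
    A = np.tril(float(phi) ** np.clip(idx[:, None] - idx[None, :], 0, None))
    mean = A @ (np.asarray(m) + np.asarray(r_mean)) + phi ** (idx + 1) * y_n
    var = np.einsum("ik,kl,il->i", A, np.asarray(r_cov), A)  # diag(A C A^T)
    return mean, var
```

When `r_cov` is diagonal with constant variance, `var` reduces to the familiar geometric-sum expression for an AR(1) process with i.i.d. innovations; off-diagonal entries add the cross terms induced by correlated residuals.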

Besides the standard measures of point consumption forecasts, such as the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error (MAPE), we provide some evaluation methods for density forecasting that are becoming standard in the power sector. Compared to a point forecast, which provides only the expected value, it is more difficult to assess the quality of a density forecast, because we are not able to observe the realised distribution of the underlying process. Therefore, we cannot compare the predicted distribution to the true one, as we only have one realisation for each distribution. The evaluation is based on two main measures: the sharpness, which verifies that the forecast is as tight as possible around the expected value, and the reliability, which attests the statistical significance of the distribution. For their detailed description we refer to Nowotarski & Weron (2018); in this subsection, we briefly summarise their main characteristics.
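The two point-forecast measures mentioned above can be written in a few lines. The function names below are chosen for illustration, and the MAPE is expressed in percent.

```python
import math

def rmse(actual, forecast):
    """Root Mean Squared Error between realised and forecasted values."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n
```

Because the RMSE is expressed in the units of consumption, it weighs errors on high-consumption days more heavily than the MAPE does.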
Sharpness is measured via the Winkler score and the pinball loss function. The score function proposed by Winkler (1972), now known as the Winkler score, is one of the main measures of the sharpness of Confidence Intervals (CI). Let $\hat{L}_t$ and $\hat{U}_t$ be, respectively, the lower and upper bounds of a given (central) $q$ CI, where $q$ is the CI level, and $y_t$ the actual consumption at time $t$; the Winkler score is defined as:
$$W(\hat{L}_t, \hat{U}_t, q; y_t) := \begin{cases} \hat{U}_t - \hat{L}_t & \text{if } \hat{L}_t \leq y_t \leq \hat{U}_t \\ \hat{U}_t - \hat{L}_t + \frac{2}{1-q} (\hat{L}_t - y_t) & \text{if } y_t < \hat{L}_t \\ \hat{U}_t - \hat{L}_t + \frac{2}{1-q} (y_t - \hat{U}_t) & \text{if } y_t > \hat{U}_t \end{cases}$$
The pinball loss function is an error measure for quantile forecasts; it is the function to be minimised in quantile regression. Let $\hat{Y}_{t,q}$ be the consumption forecast at the $q$-th quantile; then the pinball loss function over the OS set can be written as:
$$\frac{1}{n_*} \sum_{t=n+1}^{n+n_*} P(\hat{Y}_{t,q}, q; y_t)$$
where
$$P(\hat{Y}_{t,q}, q; y_t) := \begin{cases} (1 - q)\,(\hat{Y}_{t,q} - y_t) & \text{if } y_t < \hat{Y}_{t,q} \\ q\,(y_t - \hat{Y}_{t,q}) & \text{if } y_t \geq \hat{Y}_{t,q} \end{cases}$$
Let us notice that not only the value of the pinball loss provides useful information, but also its shape as a function of the quantile $q$. An asymmetric pinball loss indicates that the density forecast does not reproduce the right and left tails of the true consumption density with the same accuracy, while a symmetric pinball loss suggests that the shape of the actual distribution is forecasted adequately. We recall that in power consumption not only higher consumptions matter (right tail) but also lower ones, which can lead to negative electricity prices.
Reliability refers to the statistical consistency between the density forecasts and the realised OS observations in the test set; e.g., if 90% of the realised daily power consumptions fall within the 90% CI, then this CI is said to be reliable.
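The two sharpness measures described in this subsection can be sketched for a single day as follows, using their standard textbook definitions; averaging over the OS days gives the reported scores. The function names are illustrative.

```python
def winkler_score(lower, upper, y, q):
    """Winkler score of a central q-level interval [lower, upper] for outcome y."""
    width = upper - lower
    if y < lower:
        return width + 2.0 / (1.0 - q) * (lower - y)   # penalty below the CI
    if y > upper:
        return width + 2.0 / (1.0 - q) * (y - upper)   # penalty above the CI
    return width                                       # inside: CI width only

def pinball_loss(y_hat_q, y, q):
    """Pinball loss of the q-th quantile forecast y_hat_q for outcome y."""
    if y < y_hat_q:
        return (1.0 - q) * (y_hat_q - y)
    return q * (y - y_hat_q)
```

For example, a 90% interval [10, 20] scores 10 when the observation is 15, and is penalised by 20 times the distance to the nearest bound when the observation falls outside it.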
A simple and intuitive way to check model reliability from a qualitative point of view has been shown in Mori & Ohmi (2005). For a given nominal level $q$, one considers the (central) $q$ CI and the indicator $I_t$ that takes two values, 1 if the actual consumption falls within the forecasted CI and zero otherwise, i.e.
$$I_t := \begin{cases} 1 & \text{if } y_t \in [\hat{L}_t, \hat{U}_t] \\ 0 & \text{otherwise} \end{cases}$$
The empirical coverage is the OS mean of the indicator. The closer the empirical coverage is to the nominal level, the better. In Section 4, we show both the nominal level and the empirical coverage for several values of $q$; in particular, we show that for GPX these two values turn out to be very close.
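A minimal sketch of the indicator and of the empirical coverage is given below; the function names are chosen for illustration.

```python
def coverage_indicator(lower, upper, actual):
    """I_t = 1 if the realised value lies inside [lower_t, upper_t], else 0."""
    return [1 if lo <= y <= up else 0 for lo, up, y in zip(lower, upper, actual)]

def empirical_coverage(indicator):
    """Out-of-Sample mean of the indicator."""
    return sum(indicator) / len(indicator)
```

The result of `empirical_coverage` is then compared with the nominal level $q$ of the interval.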
In order to verify that the two values are close enough also from a quantitative point of view, it has become standard in the banking industry to run two statistical tests; if $I_t$ is considered in relation to quantiles instead of CIs, these tests correspond to the two most common risk management tests. The first one is named Unconditional Coverage and tests the null hypothesis that the empirical coverage equals the nominal level, i.e. $\mathbb{E}[I_t] = q$ (Kupiec, 1995). Let $n_0$ and $n_1$ be, respectively, the number of zeros and ones of the indicator $I_t$; the test is carried out in the likelihood ratio (LR) framework
$$LR_{UC} = -2 \ln \left[ \frac{(1 - q)^{n_0}\, q^{n_1}}{(1 - \hat{\pi})^{n_0}\, \hat{\pi}^{n_1}} \right], \qquad \hat{\pi} := \frac{n_1}{n_0 + n_1}$$
which is asymptotically distributed as a $\chi^2(1)$.
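The Unconditional Coverage test can be sketched with the standard library only, using the fact that the survival function of a $\chi^2(1)$ variable can be written through the complementary error function. The function names are hypothetical.

```python
import math

def _xlogy(n, p):
    """n * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if n == 0 else n * math.log(p)

def lr_unconditional_coverage(indicator, q):
    """Kupiec likelihood-ratio statistic and its asymptotic chi2(1) p-value."""
    n1 = sum(indicator)                 # observations inside the CI
    n0 = len(indicator) - n1            # observations outside the CI
    pi_hat = n1 / (n0 + n1)             # empirical coverage
    log_null = _xlogy(n0, 1 - q) + _xlogy(n1, q)
    log_alt = _xlogy(n0, 1 - pi_hat) + _xlogy(n1, pi_hat)
    lr = -2.0 * (log_null - log_alt)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))  # chi2(1) survival
    return lr, p_value
```

As a usage example, an indicator with 350 ones out of 365 days at the 95% nominal level gives a statistic well below the 5% critical value of a $\chi^2(1)$ (about 3.84), so the null hypothesis is not rejected.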
The second one is named Conditional Coverage and tests the null hypothesis against the alternative that the ones and the zeros are clustered together in the indicator $I_t$ time-series. In the alternative model the time-series is modeled as a first-order Markov chain. Let $n_{ij}$ be the number of observations with the value $i$ for the indicator $I_t$ followed by $j$ for $I_{t+1}$, and $\pi_{ij} := n_{ij} / (n_{i0} + n_{i1})$; the LR statistic is
$$LR_{CC} = -2 \ln \left[ \frac{(1 - q)^{n_0}\, q^{n_1}}{\pi_{00}^{n_{00}}\, \pi_{01}^{n_{01}}\, \pi_{10}^{n_{10}}\, \pi_{11}^{n_{11}}} \right]$$
This LR statistic is distributed asymptotically as a $\chi^2(2)$ (Christoffersen, 1998).
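A sketch of the Conditional Coverage statistic is given below. As is customary, the counts of zeros and ones in the null likelihood are taken over the transitions, i.e. the first observation is dropped; this convention and the function names are assumptions of the example. The survival function of a $\chi^2(2)$ variable is $e^{-x/2}$.

```python
import math

def _xlogy(n, p):
    """n * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if n == 0 else n * math.log(p)

def lr_conditional_coverage(indicator, q):
    """Christoffersen conditional-coverage LR statistic and chi2(2) p-value."""
    # n[i][j]: number of times value i is immediately followed by value j.
    n = [[0, 0], [0, 0]]
    for prev, nxt in zip(indicator[:-1], indicator[1:]):
        n[prev][nxt] += 1
    n0 = n[0][0] + n[1][0]              # zeros among the "followed-by" values
    n1 = n[0][1] + n[1][1]              # ones among the "followed-by" values
    log_null = _xlogy(n0, 1 - q) + _xlogy(n1, q)
    log_alt = 0.0
    for i in (0, 1):
        row = n[i][0] + n[i][1]
        for j in (0, 1):
            if n[i][j] > 0:
                log_alt += n[i][j] * math.log(n[i][j] / row)  # pi_ij estimate
    lr = -2.0 * (log_null - log_alt)
    p_value = math.exp(-max(lr, 0.0) / 2.0)   # chi2(2) survival function
    return lr, p_value
```

A series in which all the zeros appear in one block yields a much larger statistic than a series with the same coverage and evenly spread zeros, which is how the test detects clustered violations.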
In the following section we summarise the main results in the three steps described in Figure 2: calibration, forecasting and evaluation.

Results
We compare the GPX model with simple benchmarks in the power industry: the GLM (with i.i.d. residuals) and the ARX model, where weather conditions are introduced as linear regressors (see, e.g., Box et al., 2015, p. 534). The three models are calibrated In-Sample (two years of data, from the 1st of April 2014 to the 31st of March 2016) and the quality of the forecast is tested on the Out-of-Sample set (one year of data, from the 1st of April 2016 to the 31st of March 2017).
The great majority of the studies in power focus on price forecasting (see, e.g., Nowotarski & Weron, 2018, for a review); in this paper we focus on a middle-term density forecast for power consumption, where the number of studies in the literature is rather limited (see, e.g., Hong & Fan, 2016). As discussed in the previous section (see Figure 2), the analysis is divided into three steps: calibration, forecasting and evaluation.
After data pre-processing, the GPX model (1) and (2) is calibrated In-Sample, considering first the GLM and then the GP.
The data pre-processing consists of the treatment of leap years and outliers. We remove the 29th of February in leap years from the dataset. Outliers may influence the seasonality analysis. For this reason, they have been removed following the technique described in Benth et al. (2008). With that technique, only one outlier has been detected, corresponding to the 25th of July 2015. It has been removed and the GLM is calibrated through Ordinary Least Squares. After estimating the GLM, the outlier has been inserted back into the time series.
The GLM considers both time effects and an autoregressive component, which are calibrated IS in three steps. First, only the parameters related to the time effects are calibrated through Ordinary Least Squares; they include a long-term trend $T_t$ and a (yearly and weekly) seasonality $S_t$. Second, we measure the autocorrelation and the partial autocorrelation of the residuals of this regression; Figure 3 highlights the need for a one-day autoregressive component. Furthermore, the augmented Dickey-Fuller test rejects the null hypothesis of a unit root in favour of the alternative for the de-seasonalized process, ensuring stability of the autoregression. Finally, we perform an

Ex-post forecasting is straightforward with GPX: each density forecast at time

of March 2016, while the forecast goes up to one year later.
The quality of this forecast is the main result of the paper.In the remaining part of this section we provide some quantitative criteria that show the goodness of both the point and the density forecasting.
For the evaluation of the model, GPX is compared to two simple benchmark models in this field: the GLM and the ARX. The former is equivalent to a Gaussian process regression with a diagonal covariance matrix ($\sigma_f = 0$). The latter is an extension of the GLM which introduces a linear dependence on exogenous variables. We first consider accuracy measures for the point forecast and then we show the results for the sharpness and reliability of the density forecast: these evaluation techniques have been described in Section 3.

lower than 5%, the threshold that, for practitioners, identifies a good forecast in power consumption. Moreover, a lower RMSE (almost one third lower than ARX) indicates that GPX significantly reduces the error also in winter, when the forecast is more relevant due to the higher consumption in absolute terms and the higher volatility: a behaviour we have observed in Figure 4.
Second, we consider the analysis of sharpness. The bottom panel in Figure 5 represents the Winkler score for the three models for confidence levels ranging from the 1st to the 99th percentile. We observe that the Winkler score of GPX is significantly lower than that of the other two models for all percentiles.
Figure 5: Pinball & Winkler loss functions for the three models for the 1st to the 99th percentiles. We observe that not only GPX presents the lowest score for all percentiles (i.e. it is sharper and more accurate), but also that the pinball loss is more symmetric for GPX than for the other two models.
Interesting results also arise from the analysis of the pinball loss. The top panel in Figure 5 shows the pinball loss for the 1st to the 99th percentiles of the GLM, GPX and ARX predictions. We observe that the plot of the pinball loss provides useful information not only in terms of sharpness and accuracy (the pinball loss of GPX is lower than that of the other two models for all percentiles, hence GPX is sharper) but also in terms of its symmetric shape. The proposed density forecast of power consumption reproduces both the right and left tails of the actual consumption density with the same accuracy.
Finally, the analysis of reliability is also presented. The soundness of our model can also be tested from a quantitative point of view through likelihood ratio (LR) tests. In particular, Table 6 summarises

one-year time horizon for daily power consumption. In fact, not only do we have a MAPE lower than 5% over the whole time horizon, but the nominal level and the empirical coverage also appear very close for all values of $q$ considered in Table 5 and

Concluding remarks
In this paper we have introduced the GPX (cf. equations (1) and (2)), a new hybrid model for power consumption where weather conditions are included via a simple ML technique: a Gaussian Process.
This technique makes it possible to provide, in an elementary way, a density forecast over a middle-term horizon, a forecast that is very important both for network reliability of power systems and for investment strategies in new plants and transmission facilities.
In terms of point forecasting we are able to predict daily power consumption with a MAPE lower than 5% over one year (cf.
populous conurbation in the United Kingdom. The dataset we analyse contains both the time-series of daily power consumption values and seven daily average weather conditions. It is three years long, from April 2014 to March 2017. Power consumption is the aggregated household consumption of one of the main UK power suppliers. The weather dataset represents the daily average of hourly records of weather conditions in North East England. It includes seven different weather indicators:

Figure 1: Power consumption (blue line) from April 2014 to March 2016. One can notice a yearly seasonal behaviour and locally higher values on Sundays (marked with a black circle).

Figure 2: Flow-diagram of the three stages of the method: calibration, forecasting and evaluation of the proposed model.

The function $W(\cdot)$ is equal to the CI width if an observation (the actual power consumption) lies inside the forecasted CI and it adds a penalty if the observation lies outside the CI: in this way it rewards a forecaster for a sharp (narrow) and accurate CI. A lower score indicates a better probabilistic forecast.

Figure 3: Autocorrelation function and partial autocorrelation function of the seasonally adjusted consumption time-series. We observe that an AR(1) explains well the observed autocorrelation in the time-series.
Gaussian with conditional mean (10) and conditional variance (11). In Figure 4, we show the OS power consumption forecast of GPX: the continuous pink line indicates the point forecast, while the transparent bright red area indicates the 95% confidence interval; we also show with a dot-dashed green line the realised OS power consumption. Results look impressive: not only does the point forecast track the realised consumption closely (even the spikes in winter are tracked very closely), but the realised consumption also falls within the 95% CI in all but 15 days (95.89%), and the densities reproduce the observed behaviour of periods of low volatility in summer followed by periods of high volatility in winter. Let us underline that the last power consumption considered in model calibration is the 31st

Figure 4: Realised (dot-dashed green line) and expected (continuous pink line) power consumption in MWh from April 2016 to March 2017, with predicted Confidence Intervals at 95% (transparent bright red).
the results of the reliability tests, standard in the backtesting of VaR in the banking industry after the introduction of Basel II: the Unconditional Coverage and the Conditional Coverage. One can notice that only GPX obtains LRs small enough to pass the tests, while the null hypotheses are rejected for both GLM and ARX.
shown in Figure 6. The reliability is confirmed by the LR tests; the Conditional Coverage has also shown that the violations are not clustered in a particular period of the year, as is also revealed by a direct inspection of Figure 4. The results imply that the GPX model is able to capture a very accurate relation between weather conditions and the power consumption distribution.

Table 1: Descriptive statistics for daily power consumption and average daily weather data in NE England over the whole time window.

Table 2. After the calibration of the linear part of the hybrid model, GLM residuals are then calibrated via the GP described in Section 3, maximising the log-likelihood (9). Calibrated parameters are reported in Table 3.

Table 2: GLM parameters calibrated IS with their standard errors (SE). With *** we indicate statistical significance of the parameters at the 1% significance level.

Table 3: GP parameters calibrated IS with their standard errors (SE). With *** we indicate statistical significance of the parameters at the 1% significance level. Standard errors are obtained by means of a parametric bootstrapping technique (see, e.g., Efron & Tibshirani, 1986) with 1000 samples.

Table 4: RMSE and MAPE for the three models considered. We observe that GPX not only presents a MAPE lower than 5%, i.e. it is considered a good forecast by practitioners, but also that its lower absolute error indicates a more precise point forecast in winter, i.e. when forecasting is more relevant.
Table 5 provides the backtested Confidence Intervals. The qualitative results of this evaluation method in terms of reliability of the proposed density forecast look very good: one can notice that the GPX backtested CIs are close to the nominal ones, with a maximum absolute error of 2.1% at the 90% level. One can also notice the strong reliability of GPX at 95% and 99%, which are the most used nominal levels. Moreover, one can see in Figure 6 that the GPX backtested CIs are much closer to the nominal ones for any choice of nominal level.

Table 5: Backtested Confidence Intervals. We observe that the empirical coverage for GPX is very close to nominal levels.

Table 5 (together with Figure 6) and Table 6 are the strongest results of our evaluation analysis: the GPX model is able to provide reliable confidence intervals over

Figure 6: Backtested Confidence Intervals. We observe that the empirical coverage for GPX is very close to

Table 6: Likelihood Ratio tests at the 90% level of the 99% CI. The $\chi^2$ test statistic represents the threshold above which one should reject the null hypothesis.
Table 4); while, as far as density forecasting is concerned, we have shown in Table 5, Figure 6 and Table 6 that it is reliable, accurate and sharp, even with a small dataset, in a detailed evaluation analysis of the results.