Abstract
We address the issue of public or bank holidays in electricity load modeling and forecasting. Special characteristics of public holidays such as their classification into fixeddate and weekday holidays are discussed in detail. We present stateoftheart techniques to deal with public holidays such as removing them from the data set, treating them as Sunday dummy or introducing separate holiday dummies. We analyze pros and cons of these approaches and provide a large load forecasting study for Germany that compares the techniques using standard performance and significance measures. Finally, we give general recommendations for the treatment of public holidays in energy forecasting to suggest how the drawbacks particular to most of the stateoftheart methods can be mitigated. This is especially useful, as the incorporation of holiday effects can improve the forecasting accuracy during public holidays periods by more than 80%, but even for nonholidays periods, the forecasting error can be reduced by approximately 10%.
Introduction
In almost every data set of electricity load or price date set we observe holiday effects. As public holidays occur rarely and have a special impact, it is challenging to treat them correctly in modeling and forecasting [1]. Furthermore, consideration of public holidays in energy forecasting yields highly relevant practical benefits. For instance in the Global Energy Forecasting Competition 2012 (GEFCom2012) one challenges posted by GEFCom2012 was holiday modeling [2]. Similarly, in the Global Energy Forecasting Competition 2014 (GEFCom2014) the probabilistic price forecasting track contained one task relates to public holidays [3]. Also recently, three of ten tasks of the RTE Dayahead Demand Forecasting Competition 2017 were dedicated to the topic of public holidays. So compared to their relative frequency they seem to be overrepresented in energy forecasting competitions which expresses the importance.
The question of holidays in electricity load is a very old question [4]. Hence, many energy forecasters deal with the problem of the treatment of public holidays (i.e. bank holidays or federal holidays) but the methods they use to approach this problem distinguish substantially. Also the winners of beforementioned forecasting competitions used different approach to deal with public holidays.
In this paper, we describe in detail special characteristics of public holidays and how they influence regular weekly consumption pattern. We also discuss the impact of specific ways to account for public holidays effects in different energy forecasting models and refer to the corresponding literature. Furthermore, we provide a large forecasting study for German electricity on 64 methodologies to deal with public holidays, as there is no such study in literature. Smaller studies of this kind can be found in [1, 5], but these studies are limited in size and show a lack of statistical validity. Additionally we address the question of the impact of univariate and multivariate modeling frameworks in the context of holiday public holidays, see [6]. The forecasting study gives substantial insight on the question how to deal with public holidays in load forecasting which are also summarized in the final conclusion section.
What is so special about public holidays in energy forecasting?
In general, holidays are specific days of the year where people change their behaviors. Thus, the electricity demand is usually influenced by this change as well. This transfers directly to formation of electricity prices and related processes.
An important issue is that the occurrence of public holidays (or bank holidays) is usually known years in advance. Public holidays are set by governments of a country, state or region and affect the human activity by law. From the energy and load forecasting perspective, these can be modeled appropriately. However, sometimes it happens that governments designate, suspend or carry over public holidays. These structural breaks in the holiday pattern complicate load forecasting. However, their impact is minor and is ignored in the modeling part of this article.
All over the world, holidays tend to decrease human working activity. Usually, this decreases the electricity demand of a certain region. However, the smaller the aggregation level, the less this general statement holds. In almost every country, there are certain regions which increase their electricity consumption during public holidays, especially those ones in touristic areas.
A public holiday occurs usually once a year. In energy forecasting this can be regarded as a rare event. Hence, it is rather difficult to establish a model which would be able to credit for such a day. Most load series have a clear weekly seasonal pattern in which especially the behavior on Saturdays and Sundays is different to those on the other days. Often public holidays violate weekly pattern quite clearly and shift the consumption behavior towards the weekend pattern.
Two classes of public holidays
As noted by e.g. [1, 7, 8], public holiday can be categorized into two groups. The first group contains the fixeddate public holidays. They are those that always occur on the fixed day each year. Typical examples are New Year’s Day which is always on 1 January and also Christmas Day which is celebrated in many African, American and European countries at 25 December. Additionally, many national holidays (such as Independence Day in the US) fall into this category, because they are linked to historically important dates and events of the corresponding country.
The second group contains the so called weekday public holidays [9] (which can be analogously referred to as variable date public holiday or flexible date public holidays [8], or fixed weekday holidays [1]). These holidays are not fixed in their date, but in the weekday of their occurrence. The date of occurrence of the public holiday can thus vary from year to year, but usually falls on a similar part of the year. In many European countries, several Christian public holidays days such as Easter or Ascension can be classified into the second group. In countries like the US or the UK, some bank holidays fall into this category as well. For instance, in the UK, the Spring bank holiday is always at last Monday of May. In states with Islamic oriented public holidays, there are weekday holidays as well. Most of these countries have the so called Lunar Events and Solar Events. The latter ones depend on the Sun and thus match the standard (Georgian) calendar year. The Lunar Events depend on the Moon cycle, typical examples are e.g. Ramadan or Ashura. These public holidays are very special because they move in the long run around the entire year. This is because the Luna year (or Islam/Muslim year) does not match the periodicity of the sun. From the modeling and forecasting perspective tracking, this phenomenon is very challenging. As temperature cycles follow the Solar year and the energy consumption depends on the temperature (due to cooling and/or heating), this leads to changing interactions between Luna public holidays and temperature effects.
Note that, depending on a country, there can be either more fixeddate public holidays or weekday public holidays within a course of a year. However, almost every country has representatives of both groups. In some countries (esp. those with Islamic holidays) it is also possible that a weekday public holiday falls on the same date with fixeddate public holiday.
Figure 1 illustrates the two types of holidays in selected weeks for seven subsequent years (2010–2016) for the German electricity load. We clearly see that the two weekday public holidays (W.Hld), Good Friday and Easter Monday, which occur around Easter, are always fall on the dates between the end of March and the end of April. Taking a closer look, we observe that even though the date is varying, the impact seems to be quite stable over the years. In contrast, the fixeddate public holiday (F.Hld), Labor Day (1 May), has clearly different effects. If it falls on a Sunday (2011 and 2016), there is no (distinct) impact observable, whereas the impact is quite obvious for the other days. Still, reference [10] argues that if a fixeddate public holiday is on a Sunday, the load tends to be reduced even more, at least for the considered Turkish load data.
Finally, there are very few exceptions of holidays that can not be clearly grouped into the classes. These are public holidays which have neither date nor weekday fixed. Notable examples are the Vernal Equinox Day in Japan or the TombSweeping Day in China.
Another issue is that some public and bank holidays are only celebrated in parts of the corresponding regions. We refer to these days as to regional public holidays. There are modeling techniques available that can cover these issues [11]. Still, in this German load forecasting study, we focus only on the impact of standard public holidays and ignore an impact of regional public holidays (Table 1).
Modeling framework of the forecasting study
For dealing with public holidays, there are many techniques used in the energy forecasting community. They range from extremely simple techniques up to very sophisticated methodologies. In this and the subsequent section, we list and describe the majority of used approaches.
Therefore, we denote \(Y_t\) the electricity load at time t that we want to forecast. Moreover, let \(Y_{d,h}\) be the corresponding hourday representation, that gives \(Y_t\) with its corresponding day d and hour h, so formally \(Y_{t}=Y_{24(d1)+h}\). Without loss of generality we consider hourly data with 24 observations each day, if we consider another frequency (e.g. halfhourly data with 48 observations) we require obvious adoptions.
As highlighted by [6] there are two main modeling streams in energy forecasting. In one stream a single big model is fitted to hourly process \(Y_t\) and this ‘highfrequency’ time series is forecasted many hours ahead using a recursive forecasting approach. The alternative approach, is based on the consideration of 24 single model equations, for each hour separately. These 24 single models may be interrelated or span a (fully) multivariate model. Here each hour is forecasted separately. We refer the first highfrequent approach to be a univariate modeling framework, and the second one to be a multivariate modeling framework as in [6].
This empirical study shows that the public holiday impact depends substantially on the considered modeling framework. Intuitively this makes sense as in the multivariate framework less observations are taking into account for the estimation. So if we have e.g. three years of data available, we have only three observations for a public holiday at a certain hour, so the forecasting methods might struggle to capture the right behavior well. In contrast, in the hourly data approach we would have \(3\times 24 = 72\) observations on a holiday available which potentially increases to chance explore the holiday behavior correctly. Still, the 24 consecutive hours of a day are usually highly depended (esp. highly correlated) which reduces the potential gain in the forecasting accuracy.
In the next subsection we introduce two benchmark models, for the univariate and multivariate modeling framework. Both will capture exactly the same effects and will serve as basis for including public holiday modeling methods. These public holiday extensions will be described in the subsequent section leading to 32 models out of 8 model classes which can be applied to the univariate and multivariate modeling framework.
Two benchmark models
For all modeling approaches, we consider a core benchmark model that does not capture the public holiday effects, but allows for simple addition of holiday effects. Therefore, denote \({\mathcal{Y}}_t = (t, Y_{t1}, Y_{t2}, \ldots )\) all available load information which includes timeinformation (represented by t) and past load values. The available load information set can be denoted in the dayhour notation \({\mathcal{Y}}_{d,h} = (d,h, Y_{d,h1}, Y_{d,h2},\ldots , Y_{d,1}, Y_{d1,24}, Y_{d1,23}, \ldots )\) so that it contains exactly the same information as \({\mathcal{Y}}_t\).
We denote the considered univariate benchmark model by:
and a considered benchmark multivariate model by
where \({\varepsilon }_t\) and \({\varepsilon }_{d,h}\) are the error terms (also noise or innovation). So, in the model we have as an input the available data, such as the actual time information, coded by t or d and h. Note that for the univariate approach, a single function f must be determined, whereas for the multivariate approach 24 functions \(f_h\) are required. If the underlying data is only of daily frequency then of course everything can be applied by just considering one single equation as in the multivariate model (2) (e.g. \(h=1\)).
We design the benchmark models so that they capture the most relevant features in a suitable way, these are autoregressive effects, such as daily, weekly and annual seasonal effects. We first introduce the multivariate model first, and then then the univariate one.
The considered multivariate benchmark model is:
where \(\beta _{k,h}\) are weekly dummy parameters, \(\alpha _{j,h}^{\text{sin}}\) and \(\alpha _{j,h}^{\text{cos}}\) annual component parameters; and \(\phi _{k,h}\) the autoregressive parameters. Moreover, \(DoW_k(d)\) denotes the dayoftheweek dummy which is always zero expect if d falls on the k’s weekday. So if d is a Monday then \(DoW_1(d)\) is 1. Additionally, \(A=365.24\) is the number of days in a (meteorologic) year; J is the order of the Fourier approximation of the annual seasonal component; p is the order of autoregressive lags. We choose an Fourier approximation order of \(J=2\), which introduces \(2\times 2= 4\) parameters describing the annual effects and is sufficient to capture the standard heating and cooling cycles in electricity load data. The autoregressive effects in captured by the \(p_h\) autoregressive parameters. We choose \(p_h=7\) which leads to a memory of a week to capture weekly autoregressive effects. All in all this model specification has \(7+4+7=18\) parameters for each hour of the day. As we have 24 hours for each day, they are in total \(24\times 18 = 432\) parameters in the full specification of model (3).
The univariate model that we introduce now will have the same amount of parameters (also for the corresponding effects). We consider for the univariate framework the function:
where \(HoD_k\) and \(HoW_k\) denote the houroftheday dummy and houroftheweek dummy, and again J as order of the annual Fourier approximation such as p as order of autoregressive lags. The parameters \(\beta _{k}\), \(\alpha _{j,k}^{\text{sin}}\), \(\alpha _{j,k}^{\text{cos}}\) and \(\phi _k\) are unknown and have to be determined given the data. The houroftheday dummy \(HoD_k(t)\) is always zero except if t is the k’s hour of the day, and similarly the houroftheweek dummy \(HoW_k(t)\) is always zero except if t is the k’s hour of the week. So they are \(7\times 24=168\) houroftheweek variables \(\beta _k\) in the model which capture the weekly profile in the data. Additionally, there are the annual effects which are in fact annual effects with daily interactions, so the annual seasonal effects can vary of the day. These effects are modeled by a Fourier approximation of order J which is multiplied with the houroftheday dummy to allow for the mentioned interaction effects. Again, we choose an Fourier approximation order of \(J=2\), which introduces \(2\times 2\times 24= 96\) parameters describing the annual effects. The autoregressive effects in captured by choosing \(p=168\), which leads to a memory of a week, as for 3. The univariate model specification has in total \(168+96+168=432\) parameters, which is the same as for the multivariate specification.
The specification of \(f_h\) and f [(3) and (4)] are very similar from the modeling perspective. Both models have in total the same amount of parameters (\(24\times (7+4+7)=432\)) which represent similar effects.
Here the daily and weekly seasonality is described by \(24\times 7=168\) parameters, and the annual effects by \(24\times 4=96\) parameters. For the annual component we use a Fourier approximation, up to order 2 with a periodicity of a year. Note that other choices like periodic Bsplines, periodic wavelets, oder calender month dummies would be possible as well.
The autoregressive effects are described by \(7\times 24=168\) parameters in both core models. Note that if we would delete the the autoregressive effects from the model (3) and (4), then both models would be identical, in the sense that the OLS fit would be the same. So they would yield the same forecast. Only the autoregressive effects are treated slightly differently.
Moreover, both models are different from the estimation/training perspective and provide different forecasting results. The first univariate model description to be more robust, whereas the multivariate specification is able to take different levels variation in the data into account. This is because implicitly, most estimation/training methods assume for the univariate approach that the variance of the error is constant whereas the multivariate approach allows for variation over the day.
Finally, the models are simple in the sense that no complicated or external regressors effects are included, like nonlinear effects, change points, temperature effects, clockchange effects and longterm trends. Only historic load information and deterministic time information is used. Note that for this empirical load forecasting study, the effect of ignoring nonlinearities is acceptable, as in Germany, only heating plays a major role but no cooling, therefore there are no strong nonlinearities observable. Still, remember that most nonlinear modeling techniques embed the linear approaches that are considered here. For instance, both models above can be written as a special artificial neural network (ANN) with a feed forward neural network skiplayer and no hidden layer.
Public holiday models
As mentioned, we will use the benchmark models (3) and (4) to build up public holiday extended versions. Therefore we have to treat the public holidays and introduce the following notations: \({\mathbb{F}}\) is the set of all fixeddate holidays of the target region; \(F_k(d)\) is the fixeddate public holiday dummy of public holiday \(k \in {\mathbb{F}}\). It is 1 if the day d is holiday k and zero otherwise. For instance if k is New Years day, then \(F_k(d)\) is 1 if d is on 1 January and 0 otherwise; F(d) is the (overall) fixeddate public holiday dummy, which is 1 if day d is any fixeddate public holiday. Formally, we have \(F(d) = \sum \nolimits _{k\in {\mathbb{F}}} F_k(d)\); \({\mathbb{W}}\) is the set of all weekday holidays. For the US, this contains Birthday of Martin Luther King, Jr., Washington’s Birthday, Memorial Day, Labor Day, Columbus Day and Thanksgiving Day; \(W_k(d)\) is the fixeddate public holiday dummy of public holiday \(k \in {\mathbb{W}}\). It is 1 if the day d is holiday k and zero otherwise. For instance, if k is on Christmas Day, then \(\text{W}_k(d)\) is 1 if d is the 25 Dec and 0 otherwise; W(d) is the (overall) weekday public holiday dummy, which is 1 if day d is any weekday public holiday. Formally, we have \(\text{W}(d) = \sum _{k\in {\mathbb{W}}} \text{W}_k(d)\). \({\mathbb{H}}= {\mathbb{F}}\cup {\mathbb{W}}\) is the set of all holidays. \(H(d) = F(d) + W(d)\) is the overall holiday dummy. (In countries where weekday public holidays can fall on a fixeddate holiday this definition should be adopted to \(H(d) = \min \{F(d) + W(d), 1\}\) to avoid that the day is counted twice.)
Analogously to the holiday dummies which depend on the day d we define the same dummies depended on the time index t.
For the estimation/training of the forecasting model we suppose that we have always the past D days of load observations available, or alternatively \(T=24D\) hours of load data. Without loss of generality we denote this available sample by \( {\mathcal{Y}}_{D, h} = (Y_{1,h}, Y_{2,h}, \ldots , Y_{D,h})'\) of length D and \( {\mathcal{Y}}_{T} = (Y_{1}, Y_{2,} \ldots , Y_{T})'\) of length T. Moreover, denote the corresponding time index set \({\mathbb{T}}= \{1,2,\ldots , T\}\) and \({\mathbb{D}}= \{1,2,\ldots , D \}\).
Ignoring public holidays
In relatively many studies the public holiday effects are simply ignored. So, we simply keep the model specifications (3) and (4) as they are. For the estimation, all available data is considered. Thus, we tend to get a high error on days where there is a public holiday. However, if the holiday is on a weekend, especially on a Sunday, the effect is likely less relevant.
This approach has still many opponents even though the modeler and forecaster are already aware of the problem. Especially in studies where the focus in on other modeling aspects this approach is still popular, [6, 9, 12,13,14,15,16,17,18].
For the estimation of this model we consider the standard regression framework. So for the model (1) and (2) with the benchmark specifications (3) and (4) we have the OLS estimator in the univariate case:
and in the multivariate case:
with \({\mathcal{X}}_{T}\) and \({\mathcal{X}}_{D,h}\) as regressor matrix the corresponding model specification. We denote the models by Ubench and Mbench for the univariate and multivariate specification. In general for all model abbreviation we apply the U in front of a univariate model and M in front of a multivariate model.
Removing public holidays
In some studies [19,20,21,22,23] unwanted holidays are removed (or excluded) from the underlying data set, or they are treated as missing data. In contrast to the ignoring the public holidays approach we have the advantage of not having the public holiday bias for the training or estimation. However, due to the removing of the holidays, we might ignore relevant information from our data, this could help to improve the forecasts. However, this modeling technique has the disadvantage—that it is not incorporating the effects of public holidays at all. So if we want to forecast these days, we have the problem that we can never capture the public holiday effects. Additionally, the removing of data, makes the estimation of autoregressive effects complicated. Thus, it should not be suitable for shortterm forecasts where the autoregressive impact is strong. However, to overcome this problem (at least partially) [24] removed public holiday from the training set, and replaced them by the average of the load in the corresponding periods of the two adjacent weeks for corresponding lagged inputs.
Note that some of the listed studies which remove the public holidays from their underlying data sets mention explicitly that the forecasting methods is not tailored for forecasting those days. However, some of them also list results for the performance including those holidays.
In the empirical application, we consider the parameter estimation as in (5) or (6). The only thing that change is the supplied vector of observed data \({\mathcal{Y}}_{T}\) or \({\mathcal{Y}}_{D,h}\) and the corresponding regression matrix \(\varvec{{\mathcal{X}}}_{T}\) or \(\varvec{{\mathcal{X}}}_{D,h}\). In the univariate case, the reduced observation vector can be defined as \(\widetilde{\varvec{{\mathcal{Y}}}}_{T} = (Y_t)_{t\in {\mathbb{T}}\setminus \{ t\in {\mathbb{T}} {H}(t) = 1 \} }\). \(\widetilde{\varvec{{\mathcal{Y}}}}_{T}\) gets slightly smaller as all observation of the public holidays (\({\mathbb{H}}\)) will be removed. For the univariate and multivariate specification, we denote the corresponding model by remh.
We also consider the two special cases: first where we only remove the fixeddate public holidays (\({\mathbb{F}}\)) from the estimation procedure and a second where we remove all weekday public holidays (\({\mathbb{W}}\)). The resulting models are denoted by ‘remfh’ such as ‘remwh’.
Public holidays as weekday dummy
Many electricity load forecasting methods treat use daytype dummies to incorporate the specific holiday effects. Here public holidays are usually treated as a weekday dummy. If the modeler or forecasting does not distinguish between different impacts of Saturdays and Sundays and just considered weekend dummies in model f or \(f_h\), then the public holidays are often considered as a weekend as well, as in [25] or [26] for selected holidays. However, in most cases, the public holidays are usually treated as a Sunday [27,28,29] or sometimes as a Saturday [30], depending on the motivation of the forecaster. In the rare case of [31] for load modeling in Iran [32], the public holidays were considered as a Friday. Sometimes this assigning to the weekday dummy is tailored for every single public holiday, e.g. in [33], the majority of the public holidays is treated as Saturday, but some as Sunday.
A clear plus of this modeling approach is that captures automatically the different effects of fixeddate public holidays and weekday public holidays. Another important feature is that all public holidays get the same effect, similarly as selected weekday effects.
In univariate modeling frameworks, those dummies are not very popular as they somehow introduce structural breaks at midnight. However, this problem can be removed by introducing ’smooth’ dummies. In [11], a such sophisticated univariate dummy based approach is used. A Bspline basis function approach is used to guarantee a smooth transition effect for the public holidays.
Here we implement only two strategies namely those that treats every holiday as a Sunday, and the Saturday option. The latter one seems to be the most plausible way for the data sets of consideration, especially if you have in mind that concerning many legal issues most countries of the world treat public holidays as Sunday. So the considered countries in this study as well.
Formally, this approach changes the dayoftheweek dummy \(DoW_k\) (or analogously the houroftheweek dummy \(HoW_k\)) in the model specification. The regular 7periodic pattern gets disturbed by the holidays. Let w be the target weekday which replaces the public holidays (e.g. \(w=6\) for Saturday and \(w=7\) for Sunday). Then, we define formally dayoftheweek dummy with holiday replacement as:
The resulting model specification for the multivariate framework is:
For the univariate model is constructed in a similar way using \(HoW_k\) instead of \(DoW_k\), so:
with \({\mathcal{D}}_w = \{24(w1)+1, \ldots , 24w\}\) We denote the resulting models by ‘hSat’ (\(w=6\)) such as ‘hSun’ (\(w=7\)). They are estimated using OLS.
Additional public holiday dummies
In some studies, the public/bank holidays are incorporated by introducing new public holiday dummies. These dummies model the public holiday effect separately instead of forcing it to follow the weekday effect as in previous model class.
More precisely, these methodologies provide a list of public holiday sets. For each set of the list, a public holiday dummy is introduced. In the most simple case, the list contains only a single set of all public holidays, so \({\mathbb{H}}\). Another approach would be to consider a list of two elements, the first of all fixeddate public holidays \({\mathbb{F}}\) and second all weekday public holidays \({\mathbb{W}}\). Following this we would have two subgroups of public holidays which lead to two new public holiday dummies. In this study, we consider seven different groups, namely:

1)
all public holiday with the list \({\mathcal{L}}=({\mathbb{H}})\) and resulting holiday dummy \({\mathcal{D}}_1 = {H}\);

2)
all fixeddate public holidays: \({\mathcal{L}}=({\mathbb{F}})\) and resulting holiday dummy \({\mathcal{D}}_1 = {F}\);

3)
all weekday public holidays: \({\mathcal{L}}=({\mathbb{W}})\) and resulting holiday dummy \({\mathcal{D}}_1 = {W}\);

4)
all fixeddate public holidays and all weekday public holidays: \({\mathcal{L}}=({\mathbb{F}},{\mathbb{W}})\) and resulting holiday dummies \({\mathcal{D}}_1= \text{F}\) and \({\mathcal{D}}_2 = \text{W}\);

5)
all fixeddate public holidays separately and all weekday public holidays: \({\mathcal{L}}=( (l)_{l\in {\mathbb{F}}}, {\mathbb{W}})\) with resulting holiday dummies \({\mathcal{D}}_k= \text{F}_k\) with \(k\in {\mathbb{F}}\) and \({\mathcal{D}}_{F+1} = {W}\) with F as number of fixeddate public holidays;

6)
all fixeddate public holidays and all weekday public holidays separately: \({\mathcal{L}}=( {\mathbb{F}}, (l)_{l\in {\mathbb{W}}} )\) with resulting holiday dummies \({\mathcal{D}}_1 = {F}\) and \({\mathcal{D}}_{1+k} = {W}_k\) with \(k\in {\mathbb{W}}\);

7)
all public holidays separately: \({\mathcal{L}}=( (l)_{l\in {\mathbb{F}}},(l)_{l\in {\mathbb{W}}}) \) with resulting holiday dummies \({\mathcal{D}}_k = {F}_k\) with \(k\in {\mathbb{F}}\) and \({\mathcal{D}}_{F+k} = {W}_k\) with \(k\in {\mathbb{W}}\) and F as number of fixeddate public holidays;
The first three options lead to a single new public holiday dummy. The fourth to two new public holiday dummies. And the latter ones leads usually to more dummies depending on the region to model. In 7), every single public holiday receive its own dummy.
Of course, there are many more options of subgroup creating. However they must be tailors for every country and set of public holidays. For instance in [34], they created three subgroups Monday holidays, Summer holidays and Winter holidays. In [35], a similar holiday classification was used forecasting electricity load. Such a classification is usually based on expert knowledge of the forecaster about the target region and energy time series.
In literature, the most common approach seems to add holiday dummies seems to be the option 1). This is used by e.g. [36,37,38,39,40,41]. Especially in artificial intelligent based methods where a training set is provided to e.g. the neural net or support vector machine algorithms often just add such a holiday dummy. Also [42] indirectly use the 1) method, even they are introducing a ‘working day’ dummy which is zero on working days and nonzero on nonworking days. As they also incorporate Saturday and Sunday dummies, this approach spans up the same model space as the direct approach.
Having the list of public holidays \({\mathcal{L}}\) which define the public holiday dummies, there are basically two options to incorporate them into the model. It includes the additional dummy and the replacing dummy approach. The latter one is described in the next subsection.
The additional public holiday approach of the model specification gets only extended by the additional dummies. For instance, the multivariate model specification (3) turns to:
where L is the length of the set of holidays list \({\mathcal{L}}\) and \({\mathcal{D}}_{l}\) represents the l’s holiday dummy that corresponds to l’s element of \({\mathcal{L}}\). For instance, for 4) we have \( \sum _{l=1}^L \gamma _{l,h} {\mathcal{D}}_{l}(d)= \gamma _{1,h} {\mathcal{D}}_1(d) + \gamma _{2,h} {\mathcal{D}}_2(d) = \gamma _{1,h} {F}(d)\, + \gamma _{2,h} {W}(d)\), so two additional holiday dummies are added.
For the univariate model, the holiday dummies are added in the same way. We denote the additional models with the corresponding options lists above by Adh \(\textcircled {1}\), Adfh \(\textcircled {2}\), Adwh \(\textcircled {3}\), Adfhwh \(\textcircled {4}\), AdfHwh \(\textcircled {5}\), AdfhwH \(\textcircled {6}\) and and AdH \(\textcircled {7}\). We estimate them using OLS.
In theory, the model approach of treating each holiday separately using dummies (option 7)) has a nice feature that the holiday impact can be modeled explicitly for all holidays. This is valid especially for regional holidays, or in cases where it is relatively well known that the holiday impact differs.
However, as pointed out in Sect. 2, only weekday public holidays occur always on the same weekday. Here the additional holiday dummies approach seems to be reasonable as the impact to the holiday is more or less the same in every year. However, for modeling fixeddate public holidays we might run into troubles with the additional approach.
For a better understanding, we consider a simple example. Assume that we observe a certain fixeddate public holiday in a 3 year insample period on a Wednesday, Thursday and Friday. In the next year where we want to forecast the load of this public holiday, it falls on a Sunday. Thus, a simple additional dummy approach would have noticed a clear reduction in the load for the three insample years and puts a clear negative coefficient on this public holiday dummy. If in the next year, the public holiday falls on a Sunday then there will be a double reducing impact, because the Sunday dummy and the public holiday dummy are active. The replacing dummy approach is able to overcome this problem.
Replacing public holiday dummies
The replacing public holiday approach can be seen as a modification of the additional approach. Here, the holiday dummies are added to the model as before. Additionaly, the dummies for the weekly pattern (DoW or HoW) are adapted so that the corresponding weekday effect is set to zero if there is a holiday. The basic idea is to avoid double reducing effects, if e.g. the public holiday falls on a Sunday.
The replacing holiday dummy options is present in literature as well, so in [43]. Furthermore, specific nonlinear methods that utilize factor variable for day types fall into this category as well, as they span up the same space of possible solutions. (Note that this potentially holds for nonlinear methods of Sect. 4.4 as well, especially those which enlarge the feature space to explicitly include interactions like polynomial kernel regression based methods.) Reference [44] who won the electricity load and electricity price forecasting track in the Global Energy Forecasting Competition 2014 (see [3]) utilized such a daytype methodology using a single holiday group (but also a dummy for the day before and after the public holiday). Reference [45] use such a way of modeling with two groups of public holidays, but the grouping itself is not reported. Also [46] used such a sophisticated day type specification for the Kaggle Global Energy Forecasting Competition 2012 for load forecasting. For the US data, they considered four groups for Christmas and New Year’s Day, for Christmas Eve, for Independence day and for Thanksgiving. Also [47] use 4 groups of public holidays (New Years Day, Christmas Day and Boxing Day, New Years Eve, Remaining Public Holidays) for German industrial load modeling and replace the corresponding weekday dummy.
For the replacing method, the holiday dummies are added as in (13) but additionally weekday dummy \(DoW_k\) is modified. In detail, the dummy is set to zero if any holiday is active, so formally we have:
This leads to the model specification:
Similarly as before the univariate model version is constructed in the same way. We denote the replacing models by Rph \(\textcircled {1}\), Rpfh \(\textcircled {2}\), Rpwh \(\textcircled {3}\), Rpfhwh \(\textcircled {4}\), RpfHwh \(\textcircled {5}\), RpfhwH \(\textcircled {6}\) and and RpH \(\textcircled {7}\).
In the replacing approach double counting fixeddate public holiday on a Sunday is avoided, as the Sunday dummy is replaced by zero and only the public holiday dummy is active. Still the replacing approach has the disadvantage that there is no active weekday dummy. The impact is completely estimated by very few public holidays in the data set which might lead to very volatile estimates and higher forecasting errors.
Hybrid dummy approach: additional weekday public holidays dummies and replacing fixeddate holidays
We have seen the additional public holiday approach where public holidays are added and the replacing one where public holidays are added but the corresponding weekday dummies is replaced by 0. The replacing method was mainly introduced by the problem that additional approach double counts fixeddate public holiday on weekends (esp. Sunday). However, this is only an issue for fixeddate public holidays but not for weekday holidays. So there is also the option of considering the replacing approach for the weekday public holidays, but keeping the (simple) additional holiday approach.
Therefore denote \({\mathcal{L}}\) the list of public holiday sets that follow the additional approach and \({\mathcal{R}}\) the list of public holidays which will be treated using the replacing method. As we have in \({\mathcal{L}}\) only weekday public holidays and in \({\mathcal{R}}\) only fixed date public holidays we receive in total four possible combinations:

a)
\({\mathcal{L}}= ({\mathbb{W}})\) and \({\mathcal{R}}= ({\mathbb{F}})\)

b)
\({\mathcal{L}}= ((l)_{l\in {\mathbb{W}}})\) and \({\mathcal{R}}= ({\mathbb{F}})\)

c)
\({\mathcal{L}}= ({\mathbb{W}})\) and \({\mathcal{R}}= ((l)_{l\in {\mathbb{F}}})\)

d)
\({\mathcal{L}}= ((l)_{l\in {\mathbb{W}}})\) and \({\mathcal{R}}= ((l)_{l\in {\mathbb{F}}})\)
In the multivariate case, the resulting model is then
with \({\mathcal{D}}^{{\mathcal{L}}}_{l}\) as weekday dummies for \({\mathcal{L}}\) and \({\mathcal{D}}^{{\mathcal{R}}}_{l}\) as weekday dummies for \({\mathcal{R}}\). Here L and R are the elements in \({\mathcal{L}}\) and \({\mathcal{R}}\). The replacing dayoftheweek dummy \(DoW_{\text{Re}{\mathcal{R}},k}\) is defined as in (11) using \({\mathcal{R}}\) as holiday list, instead of \({\mathcal{L}}\). We denote the four model options by AdwhRpfh \(\textcircled {a}\), AdwHRpfh \(\textcircled {b}\), AdwhRpfH \(\textcircled {c}\) and AdwHRpfH \(\textcircled {d}\). Again, the corresponding univariate framework is constructed in the same way. All models are estimated using OLS.
However, note that out of the four options \(\textcircled {a}\)–\(\textcircled {d}\) for weekday holiday list \({\mathcal{L}}\) and fixeddate list \({\mathcal{R}}\) two models coincide with previous ones due to singularities. The options \(\textcircled {b}\) and \(\textcircled {d}\) where the additional list of weekday public holidays \({\mathcal{L}}\) contains all public holidays separately match the replacing model versions where each weekday public holiday is modeled with its own dummy. So RpfhwH is equivalent to AdwHRpfh and repH is equivalent to AdwHRpfH. Thus, we are dropping the latter one (AdwHRpfh and AdwHRpfH) from the analysis. The same holds for the univariate framework.
Finally, theoretically we could also create the other mixtures (e.g. weekday public holiday as replacing and fixeddate public holidays as additional dummy approach), but due to the motivation the crucial problem are the fixeddate public holidays. Furthermore, the resulting models does not lead to further information concerning the modeling of public holidays.
Hybrid approach: weekday public holidays as dummies and fixeddate holidays as weekdays
As mentioned in the model description in the previous Sect. 4.4, the additional holiday dummy approach might be inappropriate for the fixeddate public holidays. However, this issue might be solved by combining the holiday as weekday approach (Sect. 4.3) with the additional holiday dummy methods (Sect. 4.4). So it is able to construct a model that joins the properties of both public holiday modeling approaches efficiently.
The key idea is to use additional public holidays dummies for weekday public holidays where they work well, but to consider the holiday as weekday approach for the fixeddate holidays.
For the additional modeling approach, we define the multivariate model by:
where \({\mathcal{L}}\) is the holiday list which contain only weekday holidays and:
Here for public holiday list \({\mathcal{L}}\), we have two reasonable options as only contain weekday public holidays:

1)
\({\mathcal{L}}=({\mathbb{W}})\) where all weekday public holidays with a single parameter and dummy W;

2)
\({\mathcal{L}}=((l)_{l\in {\mathbb{W}}})\) where all weekday public holidays with a separate parameter and dummies \({W}_k\) for all \(k\in {\mathbb{W}}\).
For the replacing weekday of the fixeddate public holidays in (15) we consider again the plausible options \(w=6\) (Saturday) and \(w=7\) (Sunday). Even though the hybrid model is motivated by the disadvantages of the additional holiday dummy approach we can use it with the replacing method as well. In this case, the \(DoW_{{\mathbb{F}}w,k}(d)\) gets adopted to:
For the univariate model, the construction of the hybrid model follows the same steps. We denote the additional hybrid model as in (14) by HyAd6wh for option 1) and \(w=6\) and HyAd6wH for option 2) and \(w=6\). For the replacing approach we denote it by HyRp6wh for option 1) and \(w=6\) and HyRp6wH for option 2) and \(w=6\). If we use \(w=7\) instead of \(w=6\) we change the 6 in the name to 7, so e.g. HyAd6wh to HyAd7wh. As before the models are estimated using OLS.
Finally, we want to mention that the hybrid model has the disadvantage that fixeddate public holidays are always treated as a specific weekday, e.g. Sunday. So holidays with minor effect or regional public holidays can not be well modeled.
Public holidays with separate additional dummies and weekday holiday impact adjustment
Another approach to overcome the problem of the additive holiday dummy approach of Sect. 4.4 is based on an impact adjustment for weekday holidays. Here, an impact profile shall measure the additional impact of the weekday public holiday at a certain weekday. In the simplified framework we would expect an fixeddate public holiday to have a 100% reducing effect if it falls on a standard working day e.g. Wednesday towards the Sunday behavior. But if weekday public holiday fall on a falls on Saturday, we expect a smaller reducing effect; so that the Saturday turns more or less to a Sunday. On a Sunday we expect essentially no effect at all. This effect can also be observed in Fig. 1.
Now the idea is to compute an impact profile to measure this load (or consumption/demand/price) reduction. Therefore we consider the weekly load profile \({\mathcal{P}}_{w,h}\) as a basis. Thus, \({\mathcal{P}}_{w,h}\) is simply the mean load at weekday w and hour h. So it represents \(7\times 24\) values. In Fig. 2, we see such a typical profile \({\mathcal{P}}_{w,h}\) for German load data. Given this profile we can define the impact profile. It is constructed using two reference days, a zero impact day \(w_0\) and a full impact day \(w_1\). Then we can define the impact function \({\mathcal{I}}_{w,h}\):
In this study we consider only zero impact weekday as Sunday (\(w_0=7\)) and the full impact weekday as Wednesday (\(w_1=3\)). Figure 2 visualizes the impact profile \({\mathcal{I}}_{w,h}\) for German load data. We see clearly that the impact for the Sunday is set to zero and for Wednesday it is one. For the working days Tuesday and Thursdays the impact is close to 1 as well. For the Monday and Friday we observe the reducing effect from the transition from and towards the weekend. And for Saturday we observe a relatively small effect of roughly about 0.3 during the main hours.
Given the impact function we can setup the corresponding model.
where \({\mathcal{L}}\) is a list containing fixeddate public holiday. For the holiday list \({\mathcal{L}}\) we can consider the same two possibilities 1) \({\mathcal{L}}=({\mathbb{W}})\) and 2) \({\mathcal{L}}=((l)_{l\in {\mathbb{W}}})\) as in the previous subsection. We see that the impact profile \({\mathcal{I}}_{w,h}\) is included via simple multiplication with the corresponding fixeddate holiday dummies \(\text{F}_k\). Note that for the fixeddate public holidays it only makes sense to consider. This approach was used by [8] in the GEFCOM2014 probabilistic load forecasting framework. In [48] this methodology is applied to mid and longterm energy system modeling. The model framework can be improved by considering the impact coefficients tailored for the corresponding holidays, or broader classes e.g. summer and winter holidays. We denote the model using option 1) with ‘impwh’, the option 2) with ‘impwH’. Similarly the univariate versions can be derived.
Finally, we want to mention that there are some other forecasting methods that deal with public holiday which do not fit in the classes introduced before. Most notable these are fuzzy linear regression approaches [10, 49, 50], such as factor variable approches as used in [51, 52].
Empirical study
Forecasting study and evaluation design
We test the several listed methodologies. Note that we could also perform a probabilistic forecasting study, but even the simple point forecasting study illustrates the results sufficiently well. We evaluate the forecasting performance for the electricity load data of Germany which was also used in Fig. 1 for illustration purpose. In total, we consider 8 years of load data, from January 2009 to December 2016. As in [5], we consider 3 years of insample training, in detail \(3\times 365\) days.
The evaluation framework is motivated by this one of [20] where the performance also evaluated once for all outofsample days and once without the holidays. As basic evaluation method, we consider the mean absolute error (MAE) which is optimal if the point forecast corresponds to the median. We also evaluate the performance concerning the root mean square error (RMSE) which is optimal if the point forecast shall match the mean. As the models are designed on forecasting the mean, the evaluation focus should be on the RMSE. Note that we are not considering the mean absolute percentage error (MAPE) even though it is still popular in energy forecasting, especially in engineering literature. The reason is that the MAPE is not a suitable forecasting evaluation measure, especially with respect to significance evaluation [53].
For the MAE and RMSE, we consider several options to evaluate the performance differently. First we are evaluating the performance across the full outof sample period, afterwards we are looking at the public holidays itself.
Formally, we consider the definitions of the MAE and RMSE are:
where \({\mathbb{O}}\) is an selected outofsample day subset. To define these outofsample day subsets \({\mathbb{O}}\), let \({\mathbb{O}}\text{os}\) be the set of all outofsample days; here 5 years from January 2012 to December 2016. Then, we consider in total five options for the subset \({\mathbb{O}}\):

1)
all days:
$$\begin{aligned} {\mathbb{O}}_{\text{all}} = {\mathbb{O}}\text{os} \end{aligned}$$ 
2)
all nonholidays days:
$$\begin{aligned} {\mathbb{O}}_{\text{nH}} = \{ d\in {\mathbb{O}}\text{os}  H(d)=0 \} \end{aligned}$$ 
3)
all public holidays:
$$\begin{aligned} {\mathbb{O}}_{\text{H}} = \{ d\in {\mathbb{O}}\text{os}  H(d)=1 \} \end{aligned}$$ 
4)
all fixeddate public holidays:
$$\begin{aligned} {\mathbb{O}}_{\text{F}} = \{ d\in {\mathbb{O}}\text{os}  F(d)=1 \} \end{aligned}$$ 
5)
all weekday public holidays:
$$\begin{aligned} {\mathbb{O}}_{\text{W}} = \{ d\in {\mathbb{O}}\text{os}  W(d)=1 \} \end{aligned}$$
The options \({\mathbb{O}}_{\text{all}} \) and \({\mathbb{O}}_{\text{nH}}\) evaluate the overall performance on all days, mainly focusing on the overall performance. The difference between the corresponding measures e.g. \({MAE}_{{\mathbb{O}}_{\text{all}}}\) and \({MAE}_{{\mathbb{O}}_{\text{nH}}}\) allows us to assess the impact of the holidays to the overall performance.
The remaining three subsets \({\mathbb{O}}_{\text{H}}\), \({\mathbb{O}}_{\text{F}}\) and \({\mathbb{O}}_{\text{W}}\) are useful to compare the forecasting accuracy of the modeling procedures on the public holidays itself. Moreover, they allow us to draw different conclusions for the holiday type.
The scores defined by (19) and (20) allow the comparison of prediction methods. However, just reporting the outofsample MAE or RMSE does not directly allow significance statements. Here, we are a multivariate Diebold–Mariano (DM) test as introduced in [6] for electricity price forecasting and extended by [54].
Therefore, denote by \(\widehat{\varvec{\varepsilon }}_{Y,d} = (\widehat{{\varepsilon }}_{d,1}, \ldots , \widehat{{\varepsilon }}_{d,24})'\) vectors of outofsample errors \(\widehat{{\varepsilon }}_{d,h} = Y_{d,h}  \widehat{Y}_{d,h}\) for given outofsample day d. Now, we are considering the forecasting error for two models say A and B with the forecast errors \(\widehat{\varvec{\varepsilon }}_{A,d}\) and \(\widehat{\varvec{\varepsilon }}_{B,d}\) of day d. Then we define the multivariate loss differential series:
with \(\Vert \cdot \Vert _p\)norm, i.e., \(\Vert \widehat{\varvec{\varepsilon }}_{A,d}\Vert _p = (\sum _{h=1}^{24} \widehat{{\varepsilon }}_{A,d,h}^p)^{1/p}\). For the DMtest, we are considering the options \(p=1\) and \(p=2\). The former one corresponds to the MAE, the latter one to the RMSE. When applying the DMtest we are always testing the best model A (under a certain score) against another model B. For each pair (A, B), we carry out the DMtest with the null hypothesis \(H_0: E(\varDelta _{A,B,d}) \le 0\), i.e., the testing if the forecasts of B is outperformed by A. As in the standard DMtest, we assume that the series of the loss differences \(\varDelta _{A,B,d}\) is covariance stationary. This yield asymptotically normally distributed test statistics.
Finally, remember that we are considering the multivariate and univariate modeling framework. Both modeling frameworks describe the same effects and are comparable, still they are not identical. So we can only judge properly within this corresponding modeling class as both benchmark model are not designed to be optimal.
Results for forecasting accuracy
In Table 2 the results of the multivariate and univariate forecast for the German electricity loads are presented. There we see the MAE and RMSE values of all models for the considered forecasting horizon of 1 day. (Note that the authors also considered higher forecasting horizons of 7 days and 30 days in detail. Due to space limitations the results are not shown here, but the interested reader may request the results from the authors. However, the results are in general in line with the 1 day ahead forecasts.)
Below the the MAE and RMSE values, we report the test statistic of the Diebold–Mariano test with respect to the best model in the corresponding category. The best model is highlighted by bold font and all models that are not significantly worse (significance level \(\alpha =1\%\)) within each column are underlined. Moreover, the cells of Table 2 are colored with respect to the test statistic of the corresponding DMtest using a red \(\rightarrow \) yellow \(\rightarrow \) color scheme. Here green represents the best model.
First we are evaluating the overall performance of the forecasting methodology. So we are considering the option 1) and 2) with the evaluation for all outofsample days and all nonholidays in the outofsample period.
First of all, we observe that for the multivariate benchmark model (see Table 2) the 1day ahead MAE on all outofsample days (\({\mathbb{O}}_{\text{all}}\)) is about 1.48 GW and for all outofsample days without the public holidays (\({\mathbb{O}}_{\text{nH}}\)) it is only about 1.31 GW. Thus the removing of only 9 public holidays (\(9/356 \approx 2.5\%\) of the data) reduces the outofsample MAE by more than \(10\%\). Thus, the impact on the forecasting performance of a public holiday is about four times as high as a normal day. For the RMSE and other forecasting horizons, similar relationships hold. Also, the univariate models have the same pattern regarding the performance of the benchmark model. However, in absolute terms, the univariate models have better forecasting accuracy with respect to short term horizons.
If we compare the benchmark model of Table 2 with the simple approach where the public holidays were removed from the data set (rmh, remfh and remwh), then we notice that the forecasting performance is effected as well. Especially the model where all holidays got removed (remh) shows a clear improvement in the forecasting accuracy in terms of MAE, but it stayed at a similar level for the RMSE. In multiple cases, the remH approach leads to the best overall forecasting accuracy for the multivariate and univariate framework.
However, especially for multivariate shortterm forecasts multiple models which incorporate public holiday effect have better forecasting results. Here, three model forecasts are not significantly worse from the best model forecast of MRpH for both MAE and RMSE. These are MAdwhRpfH, Mimpwh and MimpwH. The former one seems to be better for all outofsample days and the latter one for all nonholidays. The replacing public holiday method MrepH has an 1day ahead MAE on all outofsample days of only 1.23 GW which is about 17% smaller than the MAE of the benchmark model. Also on all nonholidays the forecasting accuracy increased by about 9% for all nonholidays. In terms of RMSE, the improvements are similar. This illustrates well, that the proper inclusion of public holidays improves the forecasting accuracy even for the nonholidays substantially.
For univariate frameworks, the picture is not that clear and looks quite diffuse. Here URpH reduces only the RMSE compared to the Uremh model which is best in terms of 1day ahead MAE. Still, the mentioned RMSE improvement of URpH is 19%. Additionally, the DMtest with respct to the \(\Vert \cdot \Vert _2\)norm shows that only the forecasts of the MAEoptimal Uremh are significantly worse than URpH.
Now, we want to evaluate the forecasting performance of the considered models on the public holidays. In Table 2, we observe that for the benchmark model the error is much larger as for the nonholidays. For example the 1day ahead MAE of the multivariate benchmark model (Mbenchmark) is on all holidays 8.59 GW and on all nonholidays 1.31 GW, so the absolute error is 656% larger on holidays than on nonholidays. Interestingly, this error impact is of the same magnitude for the fixeddate public holidays and the weekday public holidays. These observations holds similarly for all forecasting horizons and error measures.
Furthermore, we notice that the removing the holidays in the data set (remh, remfh and remwh) reduces consistently the forecasting accuracy compared to benchmark. In contrast, all considered public holiday methods improved the forecasting accuracy substantially. Now, we discuss the results of all model classes in more detail. However, before we start with the individual models we observe that the univariate and multivariate models in Table 2 shows similar overall pictures.
The public holiday as weekday models hSat and hSun improve the forecasting accuracy clearly with respect to the benchmark model. So we have e.g. a 1day ahead MAEs of 2.51 GW and 2.38 GW with respect to all holidays for the multivariate models whereas the benchmark has a MAE of 8.59 GW. Thus, the improvement is 71 and 72%. It seems that fixeddate and weekday public holidays can be modeled both similarly well. Still, it seems that for the multivariate models the fixeddate public holidays can be described slightly better, and for the univariate ones, the weekday public holidays. Moreover, we notice that in most cases hSat and hSun are significantly worse than the best approach.
The additional public holiday methods (Adh, Adfh, \(\ldots \) AdH) also improve the forecasting accuracy. Here the performance depends crucially on the selected sets of holiday dummies. We see that the multivariate and univariate models with a single fixeddate public holiday dummy Adfh performs better than the benchmark for the fixeddate public holidays, but similar for the weekday public holidays. Similarly, the model with a weekday public holiday dummy Adwh improves the forecasting accuracy for these holidays. In contrast, the simple Adh model improve the forecasting accuracy for all holidays as expected. If we compare Adh (one additional dummy) with AdH (additional dummy for each public holiday) we notice that it seems that many additional public holiday dummies does not improve the forecasting performance. For the univariate framework UAdh yield quite clearly outperforms UAdH on the fixeddate public holidays but vice versa on the weekday holidays.
However, wee see that the public holiday forecasting accuracy for the additional public holiday methods is worse than performance of the hSat and hSat methods, for both the univariate and multivariate models. Still, if we have a closer look at the results, we notice that the forecasting performance of most additional public holiday approaches are very competitive for the weekday public holidays but show a lack of accuracy for the fixeddate public holidays. The reason for this is explained in detail in the previous section. Even the simple Adh model performs well for the weekday public holidays.
For the replacing dummy methods, the results differ to the additional public holiday dummy models. In general, the replacing models RpXXX tend to perform better than there additional dummy counterparts AdXXX. We observe, that the replacing method RpH (each holiday with a replacing dummy) yields high forecasting accuracy for the multivariate and univariate approaches. We have an 1step ahead MAE for all holidays of 1.66 GW for MRpH and only 1.44 GW for URpH. Thus, with an MAE improvement compared to the benchmark of 81 and 82% the error level at the holidays is only about 36 and 10% higher than the error level of nonholidays as reported in Table 2. The high forecasting accuracy seems to be preserved with increasing forecasting horizon.
The next class of models is the hybrid class containing additional weekday holiday dummies but replacing fixeddate holiday dummies. The representatives AdwhRpfh and AdwhRpfH show similar performance to the replacing methods. So it seems that replacing approach substantially improves the forecasting accuracy compared to the pure additional holiday dummy methodology. Still, the hybrid approach is not able to outperform the pure replacing holiday dummy method.
Furthermore, we have the hybrid approaches (HyAd6wh, HyAd6wH, \(\ldots \), HyRp7wH) which consider the fixeddate public holidays as Saturday or Sunday and the weekday public holidays as holiday dummy. Overall they show a good forecasting performance. We observe that the forecasting accuracy for the fixeddate public holidays is of the same magnitude as the corresponding holiday as weekday methods (hSat and hSun). So the improvement of the forecasting accuracy is mainly from the weekday public holiday. For weekday public holidays, the hybrid model show excellent performance with a similar error level as the best replacing holiday dummy methods.
Finally, we have the impact adjusted additional public holiday dummy methods (impwh, impwH). Here, we observe that the forecasting accuracy of the multivariate methods is relatively poor in the short runs. But here we want to mention that it keeps almost the same error level for longer forecasting horizons where the forecasting accuracy is not significantly worse than the best model. For the univariate framework, we observe that the approach is well working for the weekday public holidays, but struggles with the fixeddate public holidays. The reason is likely the nontrivial interactions between the autoregressive coefficients with the changing impact functions.
As mentioned, there are only a very few limited studies that compare different public holiday modeling methods. In a small daily load forecasting study, reference [5] compare four different methods to deal with public holidays were evaluated in a neural net framework. First, an additional holiday dummy (A1), second the holidays were treated as a Sunday (A2), third an additional factor variable is introduced (similarly as in [52]) was used to code the day before the holiday, the holiday itself, and the day after the holiday (A3) and fourth a methodology (A2) and (A3) combined. Reference [5] concludes that the method A3 performs best. However, their outof sample period was only 2 month. Further no significant statements are made. Still, A3 can be regarded as an additional dummy approach that additionally takes bridging effects into account. However, in our analysis, the additional holiday dummy approach is outperformed by the holiday as Sunday model and even more significantly by the replacing public holiday model.
Summary and conclusion
We discuss the impact of public holidays modeling approaches used for electricity load forecasting. Therefore, we elaborate on the challenging properties of public holidays, especially of those which can be classified as fixeddate public holidays or weekday public holidays.
Several approaches were established. Among them are e.g. removing public holidays from the data set, treating public holidays as Sunday dummy or introducing separate holiday dummies. These approaches were compared with respect to a suitable benchmark model. The benchmark model includes daily, weekly and annual seasonalities such as autoregressive effects. All considered 31 forecasting methods were tested in a univariate and multivariate modeling frameworks [6]. The former one treats the data as a highfrequent (e.g. every hour) time series whereas in the latter every load period (e.g. every hours) is modeled and forecasted separately. We evaluate all forecasting models and different forecasting horizons using the MAE and RMSE performance measures such as the Diebold–Mariano test.
The empirical results show that the incorporation of holiday effects can rise the forecasting accuracy substantially. On public holidays, the improvement can exceed 80%, but, remarkably, also during nonholiday periods the forecast accuracy increases by about 10% only due to properly covered holiday effects. The most promising methodology to deal with public holidays is the replacing public holiday dummy approach (model repH of Sect. 4.5). This approach works especially well in multivariate modeling settings. In the replacing public holiday dummy approach, we add public holiday dummies to the model but set the weekday dummies at the holidays to zero.
More sophisticated studies could be carried out in future as well. This can be done by altering the insample period length, including more data sets and implementing more advanced modeling approaches, among which are e.g. forecasting of bridging effects.
References
 [1]
Hong T (2010) Short term electric load forecasting. North Carolina State University, Raleigh
 [2]
Hong T, Pinson P, Fan S (2014) Global energy forecasting competition 2012. Int J Forecast 30(2):357–363
 [3]
Hong T, Pinson P, Fan S et al (2016) Probabilistic energy forecasting: global energy forecasting competition 2014 and beyond. Int J Forecast 32(2):896–913
 [4]
Brubacher SR, Wilson GT (1976) Interpolating time series with application to the estimation of holiday effects on electricity demand. Appl Stat 25:107–116
 [5]
Aquino I, Perez C, Chavez JK et al (2007) Daily load forecasting using quick propagation neural network with a special holiday encoding. In: Proceedings of 2007 international joint conference on neural networks, Orlando, USA, 12–17 Aug. 2007, pp 1935–1940
 [6]
Ziel F, Weron R (2016) Dayahead electricity price forecasting with highdimensional structures: univariate vs. multivariate models. Technical report, Hugo Steinhaus Center, Wroclaw University of Technology
 [7]
Koopman SJ, Ooms M, Carnero MA (2007) Periodic seasonal REG–ARFIMA–GARCH models for daily electricity spot prices. J Am Stat Assoc 102(477):16–27
 [8]
Ziel F, Liu B (2016) Lasso estimation for GEFCom2014 probabilistic electric load forecasting. Int J Forecast 32(3):1029–1037
 [9]
Bosco BP, Parisio LP, Pelagatti MM (2007) Deregulated wholesale electricity prices in Italy: an empirical analysis. Int Adv Econ Res 13(4):415–432
 [10]
Çevik HH, Çunkas M (2016) A fuzzy logic based short term load forecast for the holidays. Int J Mach Learn Comput 6(1):57
 [11]
Ziel F, Steinert R, Husmann S (2015) Efficient modeling and forecasting of electricity spot prices. Energy Econ 47:98–111
 [12]
Haben S, Giasemidis G (2016) A hybrid model of kernel density estimation and quantile regression for GEFCom2014 probabilistic load forecasting. Int J Forecast 32(3):1017–1022
 [13]
Jurado S, Nebot À, Mugica F (2015) Hybrid methodologies for electricity load forecasting: entropybased feature selection with machine learning and soft computing techniques. Energy 86:276–291
 [14]
Koprinska I, Rana M, Agelidis VG (2015) Correlation and instance based feature selection for electricity load forecasting. Knowl Based Syst 82:29–40
 [15]
Macaira PM, Sousa RC, Oliveira FLC (2016) Forecasting Brazil’s electricity consumption with pegels exponential smoothing techniques. IEEE Latin Am Trans 14(3):1252–1258
 [16]
McLoughlin F, Duffy A, Conlon M (2015) A clustering approach to domestic electricity load profile characterisation using smart metering data. Appl Energy 141:190–199
 [17]
Nowotarski J, Weron R (2016) On the importance of the longterm seasonal component in dayahead electricity price forecasting. Energy Econ 57:228–235
 [18]
Tascikaraoglu A, Sanandaji BM (2016) Shortterm residential electric load forecasting: a compressive spatiotemporal approach. Energy Build 111:380–392
 [19]
Antoniadis A, Brossat X, Cugliari J et al (2016) A prediction interval for a functionvalued forecast model: application to load forecasting. Int J Forecast 32(3):939–947
 [20]
Dordonnat V, Koopman SJ, Ooms M (2012) Dynamic factors in periodic timevarying regressions with an application to hourly electricity load modelling. Comput Stat Data Anal 56(11):3134–3152
 [21]
Dudek G (2016) Patternbased local linear regression models for shortterm load forecasting. Electr Power Syst Res 130:139–147
 [22]
Liebl D (2013) Modeling and forecasting electricity spot prices: a functional data perspective. Ann Appl Stat 7(3):1562–1592
 [23]
Yuan X (2010) Analysis the electricity load and the summer airconditioning load in Taizhou. In: Proceedings of 2010 China international conference on electricity distribution (CICED), Nanjing, China, 13–16 Sept. 2010, pp 1–16
 [24]
Taylor JW (2012) Shortterm load forecasting with exponentially weighted methods. IEEE Trans Power Syst 27(1):458–464
 [25]
Aung Z, Toukhy M, Williams J et al (2012) Towards accurate electricity load forecasting in smart grids. Int J Emerg Electr Power Syst 2012, 16(6):563–567
 [26]
Charlton N, Singleton C (2014) A refined parametric model for short term load forecasting. Int J Forecast 30(2):364–368
 [27]
Krogh B, De Llinas ES, Lesser D (1982) Design and implementation of an online load forecasting algorithm. IEEE Trans Power Appar Syst 9:3284–3289
 [28]
Potočnik P, Govekar E (2016) Applied shortterm forecasting for the Slovenian natural gas market. In: Proceedings of 2016 13th international conference on the european energy market (EEM), Porto, Portugal, 6–9 June 2016, pp 1–5
 [29]
Zhang L, Luh PB (2005) Neural networkbased market clearing price prediction and confidence interval estimation with an improved extended Kalman filter method. IEEE Trans Power Syst 20(1):59–66
 [30]
Rahman S, Bhatnagar R (1988) An expert system based algorithm for short term load forecast. IEEE Trans Power Syst 3(2):392–399
 [31]
Amjady N (2001) Shortterm hourly load forecasting using timeseries modeling with peak load estimation capability. IEEE Trans Power Syst 16(4):798–805
 [32]
Bahrami S, Hooshmand RA, Parastegari M (2014) Short term electric load forecasting by wavelet transform and grey model improved by pso (particle swarm optimization) algorithm. Energy 72:434–442
 [33]
Hong T, Wilson J, Xie J (2014) Long term probabilistic load forecasting and normalization with hourly information. IEEE Trans Smart Grid 5(1):456–462
 [34]
Khotanzad A, Hwang RC, Abaye A et al (1995) An adaptive modular artificial neural network hourly load forecaster and its implementation at electric utilities. IEEE Trans Power Syst 10(3):1716–1722
 [35]
Cho H, Goude Y, Brossat X et al (2013) Modeling and forecasting daily electricity load curves: a hybrid approach. J Am Stat Assoc 108(501):7–21
 [36]
Chen BJ, Chang MW, Lin CJ (2004) Load forecasting using support vector machines: a study on eunite competition 2001. IEEE Trans Power Syst 19(4):1821–1830
 [37]
Filatov A, Smirnova E (2012) The electricity price and quantity forecasting at the dayahead market. Sci Enq Contemp World: Theor Basics Innov Approach 44
 [38]
Lisi F, Nan F (2014) Component estimation for electricity prices: procedures and comparisons. Energy Econ 44:143–159
 [39]
Niu D, Wang Y, Wu DD (2010) Power load forecasting using support vector machine and ant colony optimization. Expert Syst Appl 37(3):2531–2539
 [40]
Panapakidis IP, Dagoumas AS (2016) Dayahead electricity price forecasting via the application of artificial neural network based models. Appl Energy 172:132–151
 [41]
Uniejewski B, Nowotarski J, Weron R (2016) Automated variable selection and shrinkage for dayahead electricity price forecasting. Energies 9(8):621
 [42]
Hernández L, Baladrón C, Aguiar JM et al (2014) Artificial neural networks for shortterm load forecasting in microgrids environment. Energy 75:252–264
 [43]
Ba A, Sinn M, Goude Y et al (2012) Adaptive learning of smoothing functions: application to electricity load forecasting. In: Proceedings of advances in neural information processing systems, pp 2510–2518
 [44]
Gaillard P, Goude Y, Nedellec R (2016) Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting. Int J Forecast 32(3):1038–1050
 [45]
Goude Y, Nedellec R, Kong N (2014) Local short and middle term electricity load forecasting with semiparametric additive models. IEEE Trans Smart Grid 5(1):440–446
 [46]
Nedellec R, Cugliari J, Goude Y (2014) Gefcom 2012: electric load forecasting and backcasting with semiparametric models. Int J Forecast 30(2):375–381
 [47]
Berk K, Hoffmann A, Müller A (2017) Probabilistic forecasting of electricity load with inhomogeneous Markov switching models. SSRN Electron J. https://doi.org/10.2139/ssrn.2954365
 [48]
Ziel F, Steinert R (2017) Probabilistic midand longterm electricity price forecasting. arXiv preprint arXiv:1703.10806
 [49]
Song KB, Baek YS, Hong DH et al (2005) Shortterm load forecasting for the holidays using fuzzy linear regression method. IEEE Trans Power Syst 20(1):96–101
 [50]
Wi YM, Joo SK, Song KB (2012) Holiday load forecasting using fuzzy polynomial regression with weather feature selection and adjustment. IEEE Trans Power Syst 27(2):596–603
 [51]
Fan S, Hyndman RJ (2012) Shortterm load forecasting based on a semiparametric additive model. IEEE Trans Power Syst 27(1):134–141
 [52]
Taieb SB, Hyndman RJ (2014) A gradient boosting approach to the Kaggle load forecasting competition. Int J Forecast 30(2):382–394
 [53]
Franses PH (2016) A note on the mean absolute scaled error. Int J Forecast 32(1):20–22
 [54]
Uniejewski B, Weron R, Ziel F et al (2017) Variance stabilizing transformations for electricity spot price forecasting. Technical report, Hugo Steinhaus Center, Wroclaw University of Technology
Author information
Affiliations
Corresponding author
Additional information
CrossCheck date: 9 January 2018
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
ZIEL, F. Modeling public holidays in load forecasting: a German case study. J. Mod. Power Syst. Clean Energy 6, 191–207 (2018). https://doi.org/10.1007/s4056501803855
Received:
Accepted:
Published:
Issue Date:
Keywords
 Public holidays
 Bank holidays
 Special day effects
 Forecasting
 Electricity load
 Holiday dummy