Multilayer perceptron for short-term load forecasting: from global to local approach
Abstract
Many forecasting models are built on neural networks. The key issues in these models, which strongly affect forecast accuracy, are data representation and the decomposition of the forecasting problem. In this work, we consider both of these issues, using short-term electricity load demand forecasting as an example. A load time series exhibits both a trend and multiple seasonal cycles. To deal with multiple seasonality, we consider four methods of problem decomposition. Depending on the degree of decomposition, the problem is split into local subproblems, which are modeled using neural networks. We move from the global model, which is competent for all forecasting tasks, through local models competent for the subproblems, to models built individually for each forecasting task. Additionally, we consider different ways of encoding the input data and analyze the impact of data representation on the results. The forecasting models are examined on real power system data from four European countries. Results indicate that the local approaches can significantly improve the accuracy of load forecasting compared to the global approach. A greater degree of decomposition leads to a greater reduction in forecast errors.
Keywords
Data representation, Forecasting problem decomposition, Neural networks, Short-term load forecasting
1 Introduction
Short-term load forecasting (STLF) aims to predict future load demand over horizons ranging from an hour to a week ahead. It is essential for power system control and scheduling. From an energy generation point of view, STLF is necessary for electric utility operations such as unit commitment, generation dispatch, hydro scheduling, hydrothermal coordination, spinning reserve allocation, interchange and low flow evaluation, fuel allocation and network diagnosis. Since electricity load demand is the basic driver of electricity prices, load forecasting plays an important role in competitive energy markets. Forecast accuracy is a key factor in the financial performance of energy companies, other market participants and financial institutions operating in energy markets. Improving STLF accuracy can significantly reduce power system operating costs.
Neural networks (NNs) have been widely used in STLF since the early 1990s owing to their capacity to capture the nonlinear relationship between explanatory variables and load. Modeling this relationship is not an easy task because it is unstable in time and strongly dependent on the period of the year, the day of the week and the hour of the day, as well as on other factors. NNs have many attractive features that are extremely useful in hard forecasting problems. These include the universal approximation property, the capability of learning from examples, several learning paradigms, many architectures, massive parallelism, robustness in the presence of noise and fault tolerance.

Recent years have brought many new NN architectures and learning methods for forecasting in power systems and energy markets, for example:

deep convolutional NN: in [4], a hybrid method based on a deep convolutional NN is introduced for short-term PV power forecasting,

long short-term memory recurrent NN: in [5], this framework, which is the latest and one of the most popular deep learning techniques, is proposed for STLF of individual residential households,

stacked denoising autoencoders: in [6], this model, a class of deep NNs, and its extended version are used to forecast hourly electricity prices,

pooling-based deep recurrent NN: in [7], this network batches a group of customers’ load profiles into a pool of inputs and is applied to household load forecasting,

randomized algorithms for NN training: in [8], a novel hybrid method of probabilistic electricity load forecasting is proposed, which includes randomized training of an improved wavelet NN, wavelet preprocessing and bootstrapping,

second-order gray NN: in [9], a method based on wavelet decomposition and a second-order gray NN combined with an augmented Dickey–Fuller test is proposed to improve the accuracy of load forecasting,

echo state networks: in [10], the wavelet echo state network is applied to both STLF and short-term temperature forecasting,

spiking NN: in [11], a spiking NN short-term load forecaster is proposed.
STLF problems are usually decomposed to make them simpler to solve. Subproblems can be modeled using simpler NN architectures, which are easier to train and generalize better than one big NN designed for the complex original problem. Different decomposition methods have been proposed. In early works on neural models for STLF, separate models were proposed for different day types. A typical example is [12], where seven NNs were built corresponding to the seven day types from Monday to Sunday. In [13], the load patterns were classified into weekday patterns and weekend-day patterns, and the NN weights were estimated using previous load data for each pattern. Then, the daily period was divided into three parts (hours 1–9, 10–19 and 20–24), and each part was modeled using an individual NN. A similar approach was applied in [14]. For forecasts of weekdays from Tuesday to Friday, there was one NN for each of these days. For Saturdays, Sundays, Mondays and days after a holiday, the daily period was divided into several parts, which were modeled by separate NNs. Decomposition into day types was also used in [15], where separate NNs with different input variables were formed for each day type. More recent examples of STLF problem decomposition into day types can be found in [16, 17, 18].
Another common practice is to decompose the load time series into 24 series, each corresponding to one hour of the day. Then, 24 NNs with a single output learn on these series to provide load forecasts for the next day [19, 20]. In [21], for each day and each of the 24 hours, a new model was trained on the 20 previous days. Thus, this local approach produced models that are competent for just 1 hour of the current forecasted day. For 24-h-ahead STLF in [10], 24 individual models based on wavelet echo state networks were built, one for each hour of the day.
Decomposition of the forecasting problem into 12 months was used in [22]. The forecasting model consists of twelve NNs, each of which produces the final 24-h-ahead load forecast for 1 month of the year. This decomposition is justified by differences in weather during the year. A separate model is also designed for each month in [23]. The proposed models are composed of two hybrid NNs derived from fuzzy NNs.
Several levels of STLF problem decomposition were used in the forecasting system known as ANNSTLF [24], which by 1997 was being used by over 30 utilities in the USA and Canada. This system has three MLP modules: an hourly module with 24 MLPs, a daily module with 7 MLPs and a weekly module. The final load forecast for each hour is an adaptive combination (based on recursive least squares) of the forecasts given for this hour by the three MLP modules.
Another approach to STLF problem decomposition relies on clustering the NN inputs and designing a separate NN for each cluster. This approach combines unsupervised and supervised learning. In [25], unsupervised learning is used to identify days with similar daily load and temperature patterns. The training patterns for clustering are selected from patterns that come from the same period of the year as the forecasted day and represent the same day type. For each cluster, an NN with 24 output nodes, which produce the 24 hourly loads of the forecasted day, is trained. Thus, the relationship between input and output variables is modeled locally within each cluster. In [26], two-stage clustering was used. A self-organizing map (SOM) identifies similarities among the different load patterns and forms clusters. Then, the k-means algorithm identifies similarities among different clusters and groups similar clusters together. A dedicated MLP is trained for each cluster to forecast the load curve. Clustering is also applied to the training set in [10]. After the clusters are formed, NNs are trained separately on the data of the corresponding clusters.
A completely different approach to STLF problem decomposition is based on the multilevel wavelet transform (WT). A load time series consisting of both global smooth trends and sharp local variations is decomposed into an approximation and details, i.e., low- and high-frequency components. These components are modeled separately by NNs. In [10], each NN receives as inputs a frequency component determined for day d, the next-day temperature and a day-type index. The output of the NN is the predicted next-day value of the corresponding component. An NN in the next layer reconstructs the next-day load forecast from the outputs (frequency components) of the NNs in the previous layer. In [27], a forecasting model composed of the WT, an NN and an evolutionary algorithm was proposed. Each component is predicted by a combination of the NN and the evolutionary algorithm, and the hourly load forecast is then obtained by the inverse WT. The key idea of the STLF method proposed in [28] is to select the load of a similar day as the input, apply the WT to decompose it into low-frequency and high-frequency components and then use separate NNs to predict the two components of tomorrow’s load.

Other examples of STLF problem decomposition include:

decomposition of the load time series into two parts, the daily average load and the intraday load variation, which are modeled independently using NNs [29],

decomposition of the STLF problem at the distribution level into “regular” and “irregular” nodes based on load pattern similarities [30]; these node types are forecasted separately, and each irregular node is forecasted by an individual NN model,

decomposition into geographical regions with distinct climate characteristics and consumer bases; in [31], load forecasting in the four regions of Taiwan is performed independently using NNs.
In this paper, we apply MLP to STLF. This type of network is the most widely used NN for modeling nonlinear relationships due to its valuable features, such as the universal approximation property and flexible fitting to the target function. This flexibility is easily achieved by adding hidden neurons. Training algorithms for MLP have been developed for many years and are still being improved. They can be implemented in many programming languages and environments, such as MATLAB, Python, R, C++, Java and Scala. There are also well-known effective methods of dealing with overfitting in MLP, such as regularization. In the simulation study in this work, we use the powerful and robust Levenberg–Marquardt learning algorithm, which interpolates between the Gauss–Newton algorithm and gradient descent. To avoid overfitting, this algorithm is enriched by Bayesian regularization, which provides effective and robust MLP training.
MLP performs very well in STLF compared with other NNs. In [3], a comparison of several neural STLF models on four datasets is presented. The compared neural models are: MLP, radial basis function NN, generalized regression NN (GRNN), two models based on fuzzy counter-propagation NNs and three models based on self-organizing maps. MLP was trained using a local learning procedure. The results demonstrate that MLP and GRNN achieve a similar level of accuracy on all datasets (with a slight advantage for GRNN). The MLP model also performs very well compared with other state-of-the-art models as well as with the classical statistical models ARIMA and exponential smoothing (see the comparison in Table 9 in Sect. 4).

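In addition to the global approach, in which a single NN is competent for all forecasting tasks, we consider four local approaches based on STLF problem decomposition:

forecasting for each day of the week separately using seven NNs,

forecasting for each hour of the day separately using 24 NNs,

forecasting for each day of the week and each hour of the day separately using 168 NNs,

forecasting separately for each forecasting task using an NN built only for this task.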
In the proposed models, the daily load curves are preprocessed to filter out the trend and the weekly and annual cycles, and are then introduced to the models as input and output variables. Depending on the approach, other input data are also used: the period of the year, the day of the week and the hour of the day. Model performance depends on how the data are represented. Six coding methods are considered for the day of the week and five for the hour of the day. In the experimental part of this work, both the global and local approaches and the data representation methods are examined.
The paper is organized as follows. Section 2 describes the methods of data representation. The global and local versions of the forecasting models are presented in Sect. 3. In Sect. 4, the models using different data representations are tested on real load data and compared. Finally, Sect. 5 concludes the paper.
2 Data representation
The proposed neural models forecast the load at time point t of day i + τ, where i is the number of the current day and τ is the forecast horizon in days. The forecasted day falls in a certain period of the year and is a certain day of the week. It is assumed that the load pattern of the current day i is available and can be used as a model input. It is represented by a vector x = [x_{1} x_{2} … x_{n}], where n is the number of time points in the daily period (24 for hourly resolution). The load pattern of the forecasted day i + τ is represented by a vector y = [y_{1} y_{2} … y_{n}]. A neural model (MLP) produces one component of the y-vector, y_{t}, as its output. The time point t (the hour of the day in our case) is represented by a vector h. The period of the year in which the forecasted day falls is represented by a vector p, and the day type (Monday, …, Sunday) by a vector d. Vectors x, p, d and h can be used as NN inputs, and component y_{t} is the output. The methods of representing the input and output data are described below.
2.1 Representation of load time series
Note that after normalization, the daily load vectors L_{i} are transformed into x-vectors, all of which have unit length, zero mean and the same variance. Thus, the load time series, which is nonstationary in mean and variance, is represented by x-vectors with the same mean and variance. They carry information about the shape of the daily load curve. (The trend and the weekly and annual variations are filtered out.)
Since the daily periods of the load time series are coded as x- and y-vectors, the input and output data are unified and the relationships between them are simplified. This is discussed further in [32]. The expected result is a simpler and more accurate forecasting model.
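The two properties stated in this section (zero mean and unit length) are enough to sketch the normalization. The Python code below assumes the x-vector is the daily load curve centered by its mean and divided by the Euclidean norm of the centered curve; Eqs. (1) and (2) of the paper are not reproduced in this section, so this is a reconstruction under that assumption. The y-vector helpers are hypothetical: they encode the forecasted day with the mean and dispersion of the current day i, which is one way to make a forecast decodable back into loads.

```python
import numpy as np

def to_x_vector(daily_load):
    """Center a daily load curve and scale it to unit Euclidean length."""
    load = np.asarray(daily_load, dtype=float)
    centered = load - load.mean()
    return centered / np.linalg.norm(centered)

def _day_stats(current_load):
    """Mean and dispersion (norm of the centered curve) of the current day."""
    cur = np.asarray(current_load, dtype=float)
    return cur.mean(), np.linalg.norm(cur - cur.mean())

def encode_y(target_load, current_load):
    """Hypothetical y-encoding: normalize day i + tau with day i statistics."""
    mean, dispersion = _day_stats(current_load)
    return (np.asarray(target_load, dtype=float) - mean) / dispersion

def decode_y(y, current_load):
    """Invert encode_y to recover forecasted loads (e.g., in MW)."""
    mean, dispersion = _day_stats(current_load)
    return np.asarray(y, dtype=float) * dispersion + mean
```

Because the mean and the norm are removed, two days whose curves differ only by a level shift or a scale factor map to the same x-vector, which is the sense in which the trend and the slow seasonal variations are filtered out.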
2.2 Representation of period of the year
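The period of the year is represented by a p-vector whose components are the sine and cosine of the day’s position in the annual cycle. Days at the same position in the yearly cycle have similar p-vector components. This representation of the period of the year has been used in many works, see [12, 10, 29].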
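A minimal sketch of such a periodic p-vector, assuming the day’s position is its day of the year divided by a fixed annual period of 365.25 days (the exact scaling used in the paper is not given in this section, and `period_of_year` is a hypothetical name):

```python
import math
from datetime import date

def period_of_year(day: date) -> list:
    """Hypothetical p-vector: sine and cosine of the day's angle in the annual cycle."""
    day_of_year = day.timetuple().tm_yday          # 1 = January 1
    angle = 2 * math.pi * day_of_year / 365.25     # assumed annual period in days
    return [math.sin(angle), math.cos(angle)]

# December 31 and January 1 end up close together, unlike with a plain day index.
gap_new_year = math.dist(period_of_year(date(2014, 12, 31)), period_of_year(date(2015, 1, 1)))
# Days half a year apart end up on opposite sides of the unit circle.
gap_half_year = math.dist(period_of_year(date(2015, 1, 1)), period_of_year(date(2015, 7, 1)))
```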
2.3 Representations for day of the week
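Six representations of the day of the week are considered, in which the day index δ = 1, 2, …, 7 denotes Monday, Tuesday, …, Sunday (Table 1). In the first representation, the day index is simply scaled to δ/7 (see dc = 1 in Table 1).

Table 1 Representations for day of the week (column d^{dc} gives the code for representation dc)
Day | d^{1} | d^{2} | d^{3} | d^{4} | d^{5} | d^{6}
Monday (δ = 1) | δ/7 | [sin(2πδ/7) cos(2πδ/7)] | 1000000 | 001 | 001 | 00
Tuesday (δ = 2) | δ/7 | [sin(2πδ/7) cos(2πδ/7)] | 0100000 | 010 | 011 | 01
Wednesday (δ = 3) | δ/7 | [sin(2πδ/7) cos(2πδ/7)] | 0010000 | 011 | 010 | 01
Thursday (δ = 4) | δ/7 | [sin(2πδ/7) cos(2πδ/7)] | 0001000 | 100 | 110 | 01
Friday (δ = 5) | δ/7 | [sin(2πδ/7) cos(2πδ/7)] | 0000100 | 101 | 111 | 01
Saturday (δ = 6) | δ/7 | [sin(2πδ/7) cos(2πδ/7)] | 0000010 | 110 | 101 | 10
Sunday (δ = 7) | δ/7 | [sin(2πδ/7) cos(2πδ/7)] | 0000001 | 111 | 100 | 11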
In the second representation, sine and cosine functions are used to code the day index in a similar way as the period of the year (see dc = 2 in Table 1). Note that in this “periodic” representation, the d-vector of the last day of the week is similar to that of the first day.
The third representation uses seven bits. The δ-th day of the week is represented by a vector with a one at the δ-th position and zeros at the remaining positions (see dc = 3 in Table 1).
In the fourth representation, index δ is encoded in the natural binary system. Three bits are needed to code the seven values of the index (see dc = 4 in Table 1). Note that in this representation, two neighboring values of the index can differ significantly in the binary space. For example, values 3 and 4 are represented by vectors [011] and [100], respectively, which differ in all three bits. This irregularity can hinder NN learning. To remedy this, the fifth representation uses Gray code, in which two adjacent values of the index differ in only one bit (see dc = 5 in Table 1).
In the last representation, days of the week are grouped according to the load pattern similarity. Four groups are assumed: (1) Mondays, (2) Tuesdays–Fridays, (3) Saturdays and (4) Sundays. The group number is binary encoded (see dc = 6 in Table 1).
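The six day-of-week codes in Table 1 can be generated programmatically, which is also a convenient way to check the table. A sketch in Python (`encode_day`, `bits`, `gray` and `GROUP` are illustrative names, not taken from the paper):

```python
import math

def gray(n: int) -> int:
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)

def bits(n: int, width: int) -> list:
    """Binary digits of n, most significant bit first."""
    return [(n >> k) & 1 for k in range(width - 1, -1, -1)]

# Day-type groups for d^6: Monday, Tuesday-Friday, Saturday, Sunday.
GROUP = {1: 0, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 3}

def encode_day(delta: int, dc: int) -> list:
    """Return code d^dc for day index delta = 1 (Monday), ..., 7 (Sunday)."""
    if not 1 <= delta <= 7:
        raise ValueError("delta must be in 1..7")
    if dc == 1:                                   # scaled index
        return [delta / 7]
    if dc == 2:                                   # periodic sine/cosine code
        angle = 2 * math.pi * delta / 7
        return [math.sin(angle), math.cos(angle)]
    if dc == 3:                                   # one bit per day
        return [1 if k == delta else 0 for k in range(1, 8)]
    if dc == 4:                                   # natural binary, 3 bits
        return bits(delta, 3)
    if dc == 5:                                   # Gray code, 3 bits
        return bits(gray(delta), 3)
    if dc == 6:                                   # binary-coded day-type group
        return bits(GROUP[delta], 2)
    raise ValueError("dc must be in 1..6")
```

For example, `encode_day(4, 4)` gives `[1, 0, 0]`, while `encode_day(4, 5)` gives `[1, 1, 0]`, one bit away from the Gray code of Wednesday.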
2.4 Representations for hour of the day
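Five representations of the hour of the day t = 1, 2, …, 24 are considered (Table 2). They are analogous to the first five representations of the day of the week: the scaled index t/24, sine and cosine coding, a 24-bit vector with a one at the t-th position, and natural binary and Gray coding with five bits.

Table 2 Representations for hour of the day (column h^{hc} gives the code for representation hc)
Hour | h^{1} | h^{2} | h^{3} | h^{4} | h^{5}
Hour 1 | t/24 | [sin(2πt/24) cos(2πt/24)] | 1000…0 | 00001 | 00001
Hour 2 | t/24 | [sin(2πt/24) cos(2πt/24)] | 0100…0 | 00010 | 00011
Hour 3 | t/24 | [sin(2πt/24) cos(2πt/24)] | 0010…0 | 00011 | 00010
… | … | … | … | … | …
Hour 24 | t/24 | [sin(2πt/24) cos(2πt/24)] | 0000…1 | 11000 | 10100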
The features of a coding method influence NN learning and the network’s ability to map inputs to outputs. The first important feature is the adjacency property: adjacent values in the original space are represented by adjacent values in the code space. All representation methods have this property except natural binary coding. The second feature is the periodicity property: the first and last values in the original space are represented by similar values in the code space. This property can be important for periodically changing variables such as the load, with its daily, weekly and annual periods. Only the representations based on sine and cosine functions have this property. Another important feature is the number of components of the code vector. This determines the number of free parameters of the NN (the weights connecting the inputs with the hidden neurons) that are used to map the encoded variable into the output variable (in the context of the other input variables, of course). The more components (inputs) a variable has, the more weights are assigned to it, which enables the network to model more complex relationships. For the day of the week and the hour of the day, the third representation method delivers the most inputs: 7 for d^{3} and 24 for h^{3}, respectively. The first representation method of these variables (d^{1} and h^{1}, respectively) has only one component. One component per value is also used in the load time series representation: each load value is represented by one component of the x-vector.
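The adjacency and periodicity properties can be checked numerically. The following sketch builds the five hour codes from the hour-of-day table (`encode_hour` and `hamming` are illustrative names) and measures (i) the largest number of bits that change between consecutive hours under natural binary and Gray coding and (ii) how far hour 24 lies from hour 1 under the scaled-index and sine/cosine codes.

```python
import math

def encode_hour(t: int, hc: int) -> list:
    """Return code h^hc for hour t = 1, ..., 24."""
    if not 1 <= t <= 24:
        raise ValueError("t must be in 1..24")
    if hc == 1:                                   # scaled index
        return [t / 24]
    if hc == 2:                                   # periodic sine/cosine code
        angle = 2 * math.pi * t / 24
        return [math.sin(angle), math.cos(angle)]
    if hc == 3:                                   # one bit per hour
        return [1.0 if k == t else 0.0 for k in range(1, 25)]
    if hc in (4, 5):                              # natural binary or Gray code, 5 bits
        code = t if hc == 4 else t ^ (t >> 1)
        return [float((code >> k) & 1) for k in range(4, -1, -1)]
    raise ValueError("hc must be in 1..5")

def hamming(a, b) -> int:
    """Number of positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Adjacency: worst-case number of bits flipped between consecutive hours.
worst_binary = max(hamming(encode_hour(t, 4), encode_hour(t + 1, 4)) for t in range(1, 24))
worst_gray = max(hamming(encode_hour(t, 5), encode_hour(t + 1, 5)) for t in range(1, 24))

# Periodicity: distance between the last and the first hour of the day.
wrap_scaled = math.dist(encode_hour(24, 1), encode_hour(1, 1))
wrap_periodic = math.dist(encode_hour(24, 2), encode_hour(1, 2))
```

With natural binary coding, the worst transition is from hour 15 (01111) to hour 16 (10000), where all five bits change, whereas Gray coding changes exactly one bit per step. Under the sine/cosine code, hour 24 is exactly as far from hour 1 as any two consecutive hours are from each other, while the scaled index t/24 places them almost a full unit apart.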
In Sect. 4, the representation methods are tested experimentally.
3 Forecasting models for STLF based on MLPs
Five variants of the MLP-based forecasting model, v.1–v.5, are examined. They differ in how the STLF problem is decomposed. In each case, the forecasting task is to forecast the power system load at hour χ = 1, 2, …, 24 of day i + τ.
In all cases, an MLP with one hidden layer is used. The MLPs are trained with the Levenberg–Marquardt algorithm with Bayesian regularization. This algorithm minimizes a combination of the squared errors and the squared network weights. Extending the cost function in this way, so that training seeks not just a small error but a small error achieved with small weights, prevents overfitting. Compared with other methods of improving generalization in NNs, Bayesian regularization gives very good results [33].
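To make the cost function concrete, the sketch below trains a one-hidden-layer MLP with Levenberg–Marquardt steps on F(w) = β·Σe² + α·Σw². It is a simplified stand-in, not the authors’ setup: in full Bayesian regularization (as in MATLAB’s `trainbr`), α and β are re-estimated during training from the effective number of parameters, whereas here they are fixed, illustrative values. The damping parameter μ moves each step between Gauss–Newton (small μ) and gradient descent (large μ). All function names are hypothetical.

```python
import numpy as np

def unpack(w, n_in, n_hid):
    """Split the flat parameter vector into W1 (n_hid x n_in), b1, w2 and b2."""
    k = n_hid * n_in
    W1 = w[:k].reshape(n_hid, n_in)
    b1 = w[k:k + n_hid]
    w2 = w[k + n_hid:k + 2 * n_hid]
    b2 = w[k + 2 * n_hid]
    return W1, b1, w2, b2

def forward(w, X, n_hid):
    """One-hidden-layer MLP with tanh hidden units and a linear output."""
    W1, b1, w2, b2 = unpack(w, X.shape[1], n_hid)
    H = np.tanh(X @ W1.T + b1)
    return H @ w2 + b2, H

def jacobian(w, X, n_hid):
    """Derivatives of each network output with respect to each parameter."""
    _, _, w2, _ = unpack(w, X.shape[1], n_hid)
    _, H = forward(w, X, n_hid)
    dA = (1 - H ** 2) * w2                        # d output / d hidden pre-activation
    J_W1 = (dA[:, :, None] * X[:, None, :]).reshape(len(X), -1)
    return np.hstack([J_W1, dA, H, np.ones((len(X), 1))])

def train_lm_regularized(X, t, n_hid=3, alpha=0.01, beta=1.0, epochs=100, seed=0):
    """Levenberg-Marquardt minimization of F(w) = beta*sum(e**2) + alpha*sum(w**2)."""
    rng = np.random.default_rng(seed)
    n_par = n_hid * X.shape[1] + 2 * n_hid + 1
    w = rng.normal(scale=0.5, size=n_par)
    eye = np.eye(n_par)

    def cost(w):
        e = forward(w, X, n_hid)[0] - t
        return beta * (e @ e) + alpha * (w @ w)

    mu, history = 1e-2, [cost(w)]
    for _ in range(epochs):
        e = forward(w, X, n_hid)[0] - t
        J = jacobian(w, X, n_hid)
        A = beta * J.T @ J + alpha * eye          # Gauss-Newton Hessian of F, halved
        g = beta * J.T @ e + alpha * w            # gradient of F, halved
        while True:
            step = np.linalg.solve(A + mu * eye, -g)
            if cost(w + step) < history[-1]:
                w, mu = w + step, mu / 10         # success: behave more like Gauss-Newton
                break
            mu *= 10                              # failure: behave more like gradient descent
            if mu > 1e10:                         # no further progress possible
                return w, history
        history.append(cost(w))
    return w, history
```

Because a step is accepted only when it lowers F, the recorded cost history decreases monotonically; the α·Σw² term keeps the weights small even when the network has more hidden neurons than the data require.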
3.1 Global model (v.1)

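The global model is a single MLP that is competent for all forecasting tasks. Its output is y_{i,t}, and its inputs are:

the x-vector of day i, x_{i},

the period of the year in which the forecasted day falls, p_{i+τ},

the type of the forecasted day, d_{i+τ}, and

the forecasted hour, h_{t}.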
In the experimental part of this work, the training set contains samples from the first two or three years of the data, and the test set contains samples from the following year. The model, which is trained only once on the training set, is competent for the entire test period, i.e., for i = m + 1 to m + v, where m is the number of days in the training period and v is 366 for a 1-year test period.
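A sketch of how training samples for the global model could be assembled, assuming the codes have already been computed and stored in arrays indexed by day and hour (`build_global_training_set` and this array layout are hypothetical). Each day i contributes n samples, one per hour, which differ only in the hour code h_t and the target y_{i,t}:

```python
import numpy as np

def build_global_training_set(X, Y, P, D, H, days):
    """Stack (input, target) pairs for the global model v.1.

    X[i]: x-vector of day i; Y[i]: y-vector for day i + tau;
    P[i], D[i]: period-of-year and day-of-week codes of day i + tau;
    H[t]: code of hour t (0-based index); days: indices i to include.
    """
    inputs, targets = [], []
    for i in days:
        for t in range(len(Y[i])):
            # One sample per (day, hour): [x_i, p_{i+tau}, d_{i+tau}, h_t] -> y_{i,t}
            inputs.append(np.concatenate([X[i], P[i], D[i], H[t]]))
            targets.append(Y[i][t])
    return np.array(inputs), np.array(targets)
```

With a two-component p-vector and the 7-bit and 24-bit codes for the day and hour (dc = 3, hc = 3), each sample has 24 + 2 + 7 + 24 = 57 inputs for hourly data.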
3.2 Separate NN for each day of the week (v.2)
In this variant, the NN for day type δ learns only on training patterns representing this day type. Each NN is competent for the entire test period, but only for one of the seven days of the week.
3.3 Separate NN for each hour of the day (v.3)
In this variant, 24 NNs are built. Each of them is competent for the entire test period, but only for one of the 24 hours of the day. Note that the input patterns [x_{i} p_{i+τ} d_{i+τ}] are the same for all 24 NNs, but their target outputs differ: the NN for hour χ has the χ-th component of the y-vector, y_{i,χ}, as its output.
3.4 Separate NN for each day of the week and hour of the day (v.4)
In this case, 7 × 24 = 168 NNs are built. The NN designed for day type δ and hour χ learns on the training patterns corresponding to days of type δ and to hour χ.
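Variants v.1–v.4 draw on the same pool of training samples and differ only in how the samples are routed to sub-models. A sketch of this routing (the names `MODEL_KEY` and `split_by_variant` and the (δ, χ, sample) tuple layout are illustrative assumptions):

```python
from collections import defaultdict

# Which sub-model handles a task for day type delta (1..7) and hour chi (1..24).
MODEL_KEY = {
    "v.1": lambda delta, chi: "global",        # one NN for all tasks
    "v.2": lambda delta, chi: delta,           # 7 NNs, one per day of the week
    "v.3": lambda delta, chi: chi,             # 24 NNs, one per hour of the day
    "v.4": lambda delta, chi: (delta, chi),    # 168 NNs, one per (day, hour) pair
}

def split_by_variant(samples, variant):
    """Group (delta, chi, sample) tuples into per-model training sets."""
    key = MODEL_KEY[variant]
    groups = defaultdict(list)
    for delta, chi, sample in samples:
        groups[key(delta, chi)].append(sample)
    return dict(groups)
```

The finer the key, the fewer samples each sub-model sees: two weeks of hourly data give 336 samples to the global model but only 2 to each of the 168 models in v.4. Model v.5 does not fit this scheme, since it builds a separate NN for every forecasting task.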
3.5 Separate NN for each forecasting task (v.5)
Note that for this model, the historical period from which the training set is generated is not limited to m days but also includes the recent days from m + 1 to q − 1, i.e., all data available up to the last day of the history. The more recent information contained in the data from i = m + 1 to q − 1 is thus used to build the forecasting model for day q. The models described above do not use this information. For example, when the last day of the 1-year test period (v = 366) is forecasted using models v.1–v.4, the models are trained on information older than 1 year. Model v.5 learns from the most recent information about the load time series, which we expect to improve model performance.
In [34], a similar neural model was proposed, but it was trained locally on the training samples (x_{i}, y_{i,t}) whose x-vectors belong to the set of k nearest neighbors of the current x-vector. In experiments conducted as part of this work, we noticed that, counterintuitively, increasing k improves the results. Thus, in this study, we do not limit the training samples to the k nearest neighbors but train the model using all samples from the history [see i ∈ ∆^{1…q−1} in (13)].
4 Simulation study
In this section, the neural models v.1–v.5 and the data representation methods are evaluated on real data: four datasets containing the hourly loads of the Polish (PL), British (GB), French (FR) and German (DE) power systems in the period 2012–2015. (The source of the data is www.entsoe.eu.) A one-day-ahead STLF problem is considered (τ = 1). Then, the forecast accuracy of the best variant of the proposed neural model is compared with the accuracy achieved by state-of-the-art models applied to STLF. The comparative models include neural networks, linear regression, nonparametric regression, clustering-based models, artificial immune systems, ARIMA, exponential smoothing and the naïve model.
The optimization and training procedures for the neural models in variants v.1–v.4 are as follows. First, the number of hidden neurons and the best methods of input data representation are selected. To this end, training is repeated for each variant of data representation and for #neurons = 1, 2, …, g. For example, model v.1 has four input variables: x_{i}, p_{i+τ}, d_{i+τ} and h_{t}. The day of the week is encoded in one of six ways, and the hour of the day in one of five ways. Thus, there are 6 × 5 = 30 combinations of input data representation. For each combination, we train the model with 1, 2, …, g hidden neurons. The training set is created using data from the period 2012–2013, and the trained model is tested on data from 2014. The model with the lowest error in this test period is selected. Then, the selected model is trained on data from the period 2012–2014 and tested on data from 2015. The mean error over the forecasting tasks from this test period (2015) measures the model quality.
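The exhaustive model selection described above can be sketched as follows. `validation_error` is a hypothetical callback that trains a model with the given day code, hour code and number of hidden neurons on the 2012–2013 data and returns its error on 2014; the function keeps the configuration with the lowest error.

```python
from itertools import product

def select_configuration(validation_error, g, day_codes=range(1, 7), hour_codes=range(1, 6)):
    """Try every (dc, hc, #neurons) combination; return the best one and its error."""
    errors = {
        config: validation_error(*config)
        for config in product(day_codes, hour_codes, range(1, g + 1))
    }
    best = min(errors, key=errors.get)
    return best, errors[best]
```

For v.1 this amounts to 6 × 5 × g training runs. For variants whose sub-models do not use one of the codes (v.2 searches only the hour code and v.3 only the day code), the unused argument can be a single placeholder, e.g. `day_codes=[None]`.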
For the neural model in variant v.5, the optimization and training procedures are different. This model is trained independently for each forecasting task from the test period (2015). First, to select the number of hidden neurons, ten models are built for ten forecasting tasks from the history that are similar to the current forecasting task. The history from which these ten tasks are selected is limited to the year 2014 and the part of 2015 preceding the current forecasting task for day q. Forecasting tasks similar to the current one are tasks for the same day type δ as the current task whose x-vectors are similar to the x-vector of the current task, x_{q}. Similarity between x-vectors is measured by the Euclidean distance. For example, when the forecasting task concerns July 1, 2015, a Wednesday, we select ten similar forecasting tasks from the period from January 1, 2014, to June 29, 2015. For PL data, the selected tasks are (in order of decreasing similarity): June 11, 2014, June 25, 2014, June 17, 2015, June 24, 2015, July 2, 2014, July 16, 2014, July 9, 2014, June 18, 2014, July 30, 2014 and June 10, 2015. As we can see, these days come from the same period of the year as the current forecasted day. The model learns independently for each of the ten similar tasks on the training set generated from the historical data according to (13), where q is now the number of the similar day. Training is repeated for #neurons = 1, 2, …, g, and the number of neurons for which the mean error over the similar tasks is minimal is selected. Then, the model with this number of neurons learns on the training set (13) generated from the period starting on January 1, 2012, and the forecast for the current task is generated. Thus, for each forecasting task from 2015, a separate NN is created and optimized on the historical forecasting tasks that are most similar to the current one.
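The two steps of the v.5 neuron selection, finding similar historical tasks and choosing the number of neurons that minimizes their mean error, are sketched below under simplifying assumptions: the history is given as arrays of x-vectors and day types of the forecasted days (already restricted to the allowed period), and training is hidden behind a hypothetical callback `task_error`. The default g = 5 mirrors the range used for v.5a.

```python
import numpy as np

def similar_tasks(x_query, day_type, X_hist, day_types_hist, k=10):
    """Indices of the k historical tasks most similar to the current task.

    Only tasks whose forecasted day has the same day type are eligible; they
    are ranked by the Euclidean distance between their x-vectors and x_query.
    """
    X_hist = np.asarray(X_hist, dtype=float)
    eligible = np.flatnonzero(np.asarray(day_types_hist) == day_type)
    dist = np.linalg.norm(X_hist[eligible] - np.asarray(x_query, dtype=float), axis=1)
    return eligible[np.argsort(dist, kind="stable")[:k]]

def choose_neurons(task_error, tasks, g=5):
    """Number of hidden neurons with the lowest mean error over the similar tasks.

    task_error(task, n) is a hypothetical callback that trains an n-neuron
    model for a historical task and returns its forecast error.
    """
    mean_error = {n: np.mean([task_error(task, n) for task in tasks]) for n in range(1, g + 1)}
    return min(mean_error, key=mean_error.get)
```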
The error measure applied in this study is the mean absolute percentage error (MAPE), which is traditionally used as an error measure in STLF. Atypical days such as public holidays are excluded from the training and test sets (between 10 and 20 days in a year).
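For reference, MAPE = (100/N) Σ |L_{j} − L̂_{j}| / L_{j} over the N forecasted loads. A minimal sketch, assuming the errors are computed on loads (e.g., in MW) after the forecasted y-values have been decoded, with atypical days removed through a boolean mask (`is_atypical` is an illustrative name):

```python
import numpy as np

def mape(actual, forecast, is_atypical=None):
    """Mean absolute percentage error (%), optionally excluding atypical days."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if is_atypical is not None:
        keep = ~np.asarray(is_atypical, dtype=bool)
        actual, forecast = actual[keep], forecast[keep]
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))
```

For example, actual loads of 100 and 200 forecasted as 110 and 190 give errors of 10% and 5%, i.e., a MAPE of 7.5%.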
Table 3 Results for variant 1
Parameter | PL | GB | FR | DE
dc | 2 | 3 | 2 | 6
hc | 3 | 4 | 3 | 4
#neurons | 12 | 13 | 8 | 14
MAPE 2014 | 1.24 | 2.12 | 1.65 | 1.67
MAPE 2015 | 1.35 | 2.81 | 1.70 | 1.66
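Table 4 Results for variant 2
Dataset | Parameter | Mon | Tue | Wed | Thu | Fri | Sat | Sun | Mean
PL | hc | 4 | 5 | 5 | 3 | 4 | 4 | 4 | –
PL | #neurons | 6 | 13 | 10 | 4 | 16 | 10 | 9 | 9.71
PL | MAPE 2014 | 2.07 | 1.13 | 1.20 | 1.12 | 1.24 | 1.29 | 1.28 | 1.33
PL | MAPE 2015 | 2.15 | 1.11 | 1.24 | 1.28 | 1.38 | 1.49 | 1.49 | 1.45
GB | hc | 3 | 4 | 4 | 4 | 4 | 5 | 5 | –
GB | #neurons | 5 | 9 | 8 | 9 | 7 | 6 | 6 | 7.14
GB | MAPE 2014 | 2.38 | 3.12 | 2.56 | 3.05 | 2.66 | 3.36 | 3.34 | 2.92
GB | MAPE 2015 | 3.88 | 3.53 | 3.49 | 3.83 | 3.23 | 3.41 | 3.43 | 3.54
FR | hc | 4 | 5 | 4 | 5 | 5 | 4 | 4 | –
FR | #neurons | 5 | 8 | 9 | 7 | 8 | 6 | 7 | 7.14
FR | MAPE 2014 | 2.36 | 1.76 | 1.72 | 1.96 | 1.56 | 1.85 | 1.77 | 1.85
FR | MAPE 2015 | 2.32 | 1.93 | 1.78 | 2.07 | 1.97 | 1.88 | 1.68 | 1.95
DE | hc | 3 | 3 | 4 | 4 | 4 | 5 | 4 | –
DE | #neurons | 1 | 6 | 7 | 9 | 10 | 8 | 6 | 6.71
DE | MAPE 2014 | 2.64 | 1.20 | 1.39 | 1.39 | 1.59 | 1.56 | 1.56 | 1.62
DE | MAPE 2015 | 2.27 | 1.86 | 2.33 | 1.58 | 1.84 | 1.88 | 1.64 | 1.91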
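Table 5 Results for variant 3 (hours 1–12)
Dataset | Parameter | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
PL | dc | 3 | 6 | 4 | 5 | 4 | 6 | 4 | 3 | 6 | 6 | 5 | 3
PL | #neurons | 2 | 7 | 11 | 8 | 10 | 7 | 6 | 3 | 8 | 5 | 8 | 7
PL | MAPE 2014 | 0.56 | 0.66 | 0.71 | 0.74 | 0.84 | 0.92 | 1.09 | 1.21 | 1.23 | 1.16 | 1.23 | 1.25
PL | MAPE 2015 | 0.49 | 0.67 | 0.74 | 0.79 | 0.92 | 1.00 | 1.19 | 1.21 | 1.60 | 1.27 | 1.33 | 1.33
GB | dc | 2 | 5 | 1 | 6 | 6 | 5 | 5 | 6 | 4 | 6 | 6 | 3
GB | #neurons | 4 | 3 | 6 | 2 | 5 | 5 | 8 | 5 | 7 | 8 | 3 | 5
GB | MAPE 2014 | 0.43 | 0.94 | 1.10 | 1.38 | 1.63 | 1.82 | 1.99 | 1.90 | 1.96 | 2.09 | 2.69 | 3.36
GB | MAPE 2015 | 0.66 | 0.99 | 1.31 | 1.54 | 1.63 | 2.05 | 2.18 | 2.25 | 2.35 | 2.37 | 2.69 | 3.78
FR | dc | 1 | 1 | 4 | 2 | 2 | 1 | 4 | 3 | 3 | 2 | 3 | 2
FR | #neurons | 6 | 14 | 7 | 10 | 18 | 16 | 8 | 4 | 4 | 10 | 7 | 9
FR | MAPE 2014 | 0.38 | 0.62 | 0.78 | 0.86 | 0.96 | 1.15 | 1.39 | 1.54 | 1.41 | 1.33 | 1.36 | 1.47
FR | MAPE 2015 | 0.41 | 0.60 | 0.85 | 1.04 | 1.10 | 1.30 | 1.54 | 1.59 | 1.42 | 1.59 | 1.44 | 1.59
DE | dc | 6 | 4 | 3 | 6 | 6 | 6 | 3 | 6 | 3 | 5 | 6 | 4
DE | #neurons | 1 | 10 | 3 | 5 | 15 | 3 | 7 | 10 | 5 | 6 | 8 | 5
DE | MAPE 2014 | 0.41 | 0.58 | 0.69 | 0.80 | 0.84 | 0.94 | 1.25 | 1.29 | 1.31 | 1.23 | 1.30 | 1.29
DE | MAPE 2015 | 0.38 | 0.65 | 0.73 | 0.82 | 0.92 | 0.98 | 1.39 | 1.49 | 1.55 | 1.32 | 1.41 | 1.48

Table 5 (continued) Results for variant 3 (hours 13–24 and mean over all 24 hours)
Dataset | Parameter | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | Mean
PL | dc | 4 | 6 | 4 | 4 | 4 | 5 | 2 | 1 | 1 | 3 | 3 | 4 | –
PL | #neurons | 7 | 5 | 6 | 7 | 3 | 7 | 7 | 9 | 8 | 5 | 8 | 6 | 6.67
PL | MAPE 2014 | 1.31 | 1.38 | 1.38 | 1.50 | 1.48 | 1.40 | 1.33 | 1.30 | 1.17 | 1.08 | 1.12 | 1.14 | 1.13
PL | MAPE 2015 | 1.44 | 1.50 | 1.44 | 1.64 | 1.65 | 1.43 | 1.36 | 1.37 | 1.36 | 1.19 | 1.19 | 1.22 | 1.22
GB | dc | 5 | 6 | 5 | 5 | 5 | 5 | 5 | 3 | 3 | 2 | 4 | 3 | –
GB | #neurons | 6 | 7 | 5 | 7 | 8 | 4 | 3 | 3 | 2 | 3 | 2 | 3 | 4.75
GB | MAPE 2014 | 3.92 | 4.24 | 4.42 | 4.39 | 4.09 | 3.52 | 2.93 | 2.67 | 2.48 | 2.26 | 2.12 | 2.25 | 2.52
GB | MAPE 2015 | 4.21 | 4.99 | 5.19 | 4.72 | 4.09 | 3.84 | 3.11 | 2.98 | 2.51 | 2.33 | 2.37 | 2.62 | 2.78
FR | dc | 4 | 5 | 3 | 4 | 2 | 2 | 1 | 2 | 5 | 2 | 3 | 2 | –
FR | #neurons | 9 | 5 | 9 | 6 | 6 | 5 | 5 | 4 | 3 | 6 | 3 | 6 | 7.50
FR | MAPE 2014 | 1.56 | 1.80 | 1.96 | 2.10 | 2.21 | 2.00 | 1.88 | 1.76 | 1.71 | 1.71 | 1.64 | 1.67 | 1.47
FR | MAPE 2015 | 1.67 | 1.87 | 2.41 | 2.22 | 2.37 | 2.11 | 1.93 | 1.80 | 1.81 | 1.83 | 1.64 | 1.73 | 1.58
DE | dc | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 1 | 2 | 1 | 3 | 3 | –
DE | #neurons | 2 | 5 | 5 | 6 | 7 | 6 | 4 | 7 | 12 | 5 | 2 | 2 | 5.88
DE | MAPE 2014 | 1.40 | 1.48 | 1.63 | 1.59 | 1.61 | 1.54 | 1.41 | 1.32 | 1.34 | 1.26 | 1.31 | 1.36 | 1.22
DE | MAPE 2015 | 1.40 | 1.59 | 1.73 | 1.73 | 1.73 | 1.54 | 1.44 | 1.49 | 1.56 | 1.26 | 1.33 | 1.36 | 1.30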
Table 6 Results for variant 4
Parameter | PL | GB | FR | DE
#neurons (mean over the 168 NNs) | 3.48 | 2.97 | 2.77 | 2.40
MAPE 2014 | 1.16 | 2.64 | 1.53 | 1.22
MAPE 2015 | 1.20 | 2.73 | 1.56 | 1.24
Table 7 MAPE for variant 5
Variant | PL | GB | FR | DE
v.5a | 1.17* | 2.63* | 1.63* | 1.25*
v.5b | 1.27 | 2.53* | 1.61* | 1.26
v.5c | 1.16* | 2.57* | 1.62* | 1.27*
Three variants of model v.5 are considered:
(a) v.5a: an optimization procedure is carried out to select the optimal number of neurons for each forecasting task (#neurons = 1, 2, …, 5),
(b) v.5b: there is no optimization phase; the hidden layer has one neuron for all forecasting tasks, and
(c) v.5c: there is no optimization phase; the hidden layer has two neurons for all forecasting tasks.
The best results for each dataset in Table 7 are marked with an asterisk (the best results were confirmed by the Wilcoxon rank sum test at the 5% significance level). As the table shows, there is no significant difference in errors between variants v.5a and v.5c. Note, however, that variant v.5c does not require the time-consuming optimization phase.
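The significance check can be illustrated with a standard-library-only sketch of the two-sided Wilcoxon rank sum test based on the normal approximation, with tied values given their average rank and no tie correction of the variance. It is assumed here that the compared samples are the absolute percentage errors of two model variants over the test tasks; SciPy’s `scipy.stats.ranksums` provides a library implementation of this test. `rank_sum_test` is a hypothetical name.

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test (normal approximation). Returns (z, p)."""
    pooled = sorted((value, group) for group, sample in enumerate((a, b)) for value in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1                                # extend the run of tied values
        avg_rank = (i + j) / 2 + 1                # ranks are 1-based
        rank_sum_a += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1
    n1, n2 = len(a), len(b)
    expected = n1 * (n1 + n2 + 1) / 2             # mean of the rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation under H0
    z = (rank_sum_a - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided p-value
    return z, p
```

A negative z means that the first sample tends to have smaller values (here: smaller errors); the difference is significant at the 5% level when p < 0.05.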
Table 8 MAPE for the best models in each variant
Dataset | v.1 | v.2 | v.3 | v.4 | v.5
PL | 1.35 | 1.45 | 1.22 | 1.20 | 1.16*
GB | 2.81 | 3.54 | 2.78 | 2.73 | 2.57*
FR | 1.70 | 1.95 | 1.58 | 1.56* | 1.62
DE | 1.66 | 1.91 | 1.30 | 1.24* | 1.27*

The comparison with other STLF models is performed on the following datasets:

PL: hourly load time series of the Polish power system in the period 2002–2004,

FR: half-hourly load time series of the French power system in the period 2007–2009,

GB: half-hourly load time series of the British power system in the period 2007–2009, and

VC: half-hourly load time series of the power system of Victoria, Australia, in the period 2006–2008.

The following comparative models are used:

RBFNN: radial basis function NN,

GRNN: generalized regression NN,

FCPNN1: fuzzy counter-propagation NN, instar clustering variant,

FCPNN2: fuzzy counter-propagation NN, outstar clustering variant,

SOM1: self-organizing map, variant clustering concatenated x- and y-patterns,

SOM2: self-organizing map, variant clustering x- and y-patterns independently,

SOM3: self-organizing map, variant clustering y-patterns,

PCR: principal components regression,

PLSR: partial least-squares regression,

NWE: Nadaraya–Watson estimator,

FNM: fuzzy neighborhood model,

FP1 + k-means: model based on k-means clustering of concatenated x- and y-patterns,

FP2 + k-means: model based on k-means clustering of x- and y-patterns independently,

AIS1: artificial immune system working on concatenated x- and y-patterns,

AIS2: artificial immune system working on separate populations of x- and y-patterns,

AISLFS: artificial immune system with local feature selection,

ARIMA: autoregressive integrated moving average model ARIMA(p, d, q) × (P, D, Q)_{v},

ES: exponential smoothing state space model,

Naïve: naïve model in which the forecasted daily curve is the same as the curve seven days earlier.
The first seven models are based on different types of NNs and are described in detail in [3]. The next two models, PCR and PLSR [35], are linear regression models in which the input components are constructed as linear combinations of the original components. In these models, the relationship between input and output patterns is modeled locally in the neighborhood of a query pattern. NWE and FNM are nonparametric regression models [36], where the regression curve is a linear combination of y-vectors weighted by a function that nonlinearly maps the distance between x-vectors. FP1 + k-means and FP2 + k-means [36] aggregate the x- and y-patterns into clusters, assign the query pattern to a cluster and reconstruct the forecasted y-pattern from the cluster characteristics. The artificial immune systems AIS1, AIS2 and AISLFS [36, 37] are biologically inspired computational methods in which the forecasting problem is solved in the process of creating the immune memory. In these models, antibodies are the recognition and prediction units, which memorize features of the time series and reconstruct the forecasted pattern. For ARIMA and ES, the time series was decomposed by hour of the day (or half hour, depending on the time series resolution), creating a separate series for each. In this way, the daily seasonality was eliminated. ARIMA and ES were then used to model these series independently. In all the above forecasting models except ARIMA, ES and Naïve, the time series are represented by patterns defined in the same way as in this work [see Eqs. (1) and (2)].
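The per-hour decomposition used for ARIMA and ES, and the naïve benchmark, amount to simple array reshaping. A NumPy sketch, assuming the load series starts at the first time point of a day and has n = 24 (hourly) or n = 48 (half-hourly) time points per day; fitting ARIMA or ES to each column is not shown, and the function names are illustrative:

```python
import numpy as np

def daily_matrix(load, n=24):
    """Reshape a load series into a (days x n) matrix; drops an incomplete last day.

    Column t is the sub-series of loads at time point t on consecutive days,
    i.e., one of the n series that ARIMA or ES models independently.
    """
    load = np.asarray(load, dtype=float)
    n_days = len(load) // n
    return load[: n_days * n].reshape(n_days, n)

def naive_forecast(load, n=24):
    """Next day's curve equals the curve from seven days before that day."""
    days = daily_matrix(load, n)
    if len(days) < 7:
        raise ValueError("at least seven complete days are required")
    # If days[-1] is day d, the forecasted day is d + 1, and d + 1 - 7 is days[-7].
    return days[-7]
```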
Forecast errors and their interquartile ranges for the proposed and comparative models
Model          PL            FR            GB            VC
               MAPE   IQR    MAPE   IQR    MAPE   IQR    MAPE   IQR
MLP (v.5)      1.45   1.38   1.59   1.64   1.63   1.68   2.99   2.74
RBFNN          1.67   1.53   1.70   1.70   1.84   1.90   3.23   3.05
GRNN           1.38   1.33   1.64   1.71   1.56   1.64   2.83   2.59
FCPNN1         1.71   1.46   1.90   1.95   1.69   1.79   3.18   2.97
FCPNN2         1.63   1.50   1.82   1.86   1.66   1.71   3.22   2.99
SOM1           1.74   1.65   2.10   2.19   1.95   1.98   3.41   3.12
SOM2           1.73   1.53   1.95   2.04   1.78   1.89   3.28   3.08
SOM3           1.99   1.83   2.06   2.18   1.95   2.02   3.63   3.47
PCR            1.35   1.33   1.71   1.78   1.60   1.68   3.00   2.70
PLSR           1.34   1.32   1.57   1.61   1.54   1.61   2.83   2.60
NWE            1.30   1.30   1.66   1.67   1.55   1.63   2.82   2.56
FNM            1.38   1.38   1.67   1.71   1.60   1.66   2.91   2.67
FP1 + k-means  1.69   1.64   2.05   2.17   1.84   1.88   3.34   3.01
FP2 + k-means  1.59   1.51   1.94   2.05   1.76   1.84   3.13   2.94
AIS1           1.50   1.50   1.93   1.95   1.77   1.84   3.04   2.75
AIS2           1.50   1.51   1.93   1.96   1.78   1.87   3.33   2.93
AISLFS         1.51   1.49   1.79   1.81   1.67   1.73   3.13   2.75
ARIMA          1.82   1.71   2.32   2.53   2.02   2.07   3.67   3.42
ES             1.66   1.57   2.10   2.29   1.85   1.84   3.52   3.35
Naïve          3.43   3.42   5.05   5.96   3.52   3.82   4.54   4.20
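The table reports MAPE together with an interquartile range. The sketch below shows one way to compute both from a set of forecasts, assuming that IQR is the interquartile range of the individual absolute percentage errors (APEs); it also implements the Naïve benchmark for hourly data, where seven days correspond to 168 time steps. The function names and the quartile method (the default "exclusive" method of Python's `statistics.quantiles`) are assumptions; other quartile conventions give slightly different IQR values.

```python
import statistics
from typing import List, Sequence, Tuple


def naive_forecast(load: Sequence[float], t: int, period: int = 168) -> float:
    """Naive benchmark: the load at time t is forecast as the load seven days earlier.

    period = 168 assumes hourly data; use 336 for half-hourly data.
    """
    if t < period:
        raise ValueError("not enough history for the naive forecast")
    return load[t - period]


def absolute_percentage_errors(actual: Sequence[float],
                               forecast: Sequence[float]) -> List[float]:
    # APE of each forecast in percent; actual loads are assumed to be nonzero
    return [100.0 * abs(a - f) / abs(a) for a, f in zip(actual, forecast)]


def mape_and_iqr(actual: Sequence[float],
                 forecast: Sequence[float]) -> Tuple[float, float]:
    """Mean of the APEs and the interquartile range (Q3 - Q1) of the APEs."""
    apes = absolute_percentage_errors(actual, forecast)
    if len(apes) < 2:
        raise ValueError("need at least two forecasts to estimate quartiles")
    q1, _, q3 = statistics.quantiles(apes, n=4)  # default method='exclusive'
    return statistics.mean(apes), q3 - q1
```

For example, APEs of 10, 5, 0 and 0 % give MAPE = 3.75 % and, with this quartile method, IQR = 8.75 %.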
5 Conclusions
The main contribution of this work is an examination of global and local versions of neural models for STLF. The models are analyzed in the context of data representation methods. The daily load curve is introduced to the models as an x-vector: the normalized vector of hourly loads. This preprocessing simplifies the forecasting problem by filtering out both the trend and the seasonal variations with periods longer than a day, so x-vectors express only the shape of the daily curve. Similar preprocessing is used for the output variable. The day of the week is encoded in six ways and the hour of the day in five ways. In the optimization procedures, the best coding methods and the number of hidden neurons are selected for each forecasting model.
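As an illustration of this kind of preprocessing, the sketch below subtracts the daily mean from each hourly load and divides by the daily dispersion, so that days with different levels and amplitudes but the same shape yield identical x-patterns. The exact definitions are Eqs. (1) and (2) of the paper, which are not reproduced in this section; here the dispersion is assumed to be the square root of the sum of squared deviations, and the target day is assumed to be coded with the input day's mean and dispersion so that a forecasted pattern can be decoded back into loads. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def daily_stats(day: Sequence[float]) -> Tuple[float, float]:
    """Mean and dispersion (root of the summed squared deviations) of one daily curve."""
    mean = sum(day) / len(day)
    dispersion = math.sqrt(sum((v - mean) ** 2 for v in day))
    if dispersion == 0.0:
        raise ValueError("flat daily curve: dispersion is zero")
    return mean, dispersion


def encode(day: Sequence[float], mean: float, dispersion: float) -> List[float]:
    # remove the level (mean) and the amplitude (dispersion), keeping only the shape
    return [(v - mean) / dispersion for v in day]


def decode(pattern: Sequence[float], mean: float, dispersion: float) -> List[float]:
    # invert encode(): turn a (forecasted) pattern back into loads
    return [p * dispersion + mean for p in pattern]


def make_patterns(input_day: Sequence[float], target_day: Sequence[float]):
    """Build an (x, y) pair; both are coded with the input day's statistics (assumption)."""
    mean, disp = daily_stats(input_day)
    return encode(input_day, mean, disp), encode(target_day, mean, disp), mean, disp
```

Because the level and amplitude are divided out, a model trained on such patterns only has to learn how the shape of one day maps to the shape of the next; the trend re-enters only when `decode` is applied.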
In the global approach, the model is competent for every day of the week and every hour of the day. The relationship between the input and output variables is complex in this case, so a more complex network with more hidden neurons is required, and both learning and optimizing such a model are difficult and time-consuming. Since the accuracy of the global model is limited, decomposing the forecasting problem into subproblems and modeling these subproblems separately should lead to an improvement in accuracy. The first decomposition method splits the problem into seven subproblems, one for each day of the week. However, the experimental study did not confirm that this approach improves the results. The second decomposition method splits the problem into 24 subproblems, one for each hour of the day; local NNs are built for each hour, and the results improve. Further improvement is achieved when the problem is decomposed into subproblems representing each combination of day of the week and hour of the day. In this case, the local relationships between the input and output variables within the subproblems are simpler and can be modeled with fewer neurons, using easy optimization and learning procedures. Finally, the most local decomposition method splits the problem into individual forecasting tasks, i.e., forecasts for a given hour of a given day. In this case, an individual NN is trained for each forecasting task and is competent only for this task, so each new task requires training a new NN. An advantage of this method is that the model does not require an optimization phase (selection of the number of neurons and of the data coding method). In its optimal variant, it has only two hidden neurons, so it learns very fast. The most local models, v.4 and v.5, reduced the forecast errors significantly compared to the global model.
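The decomposition methods described above amount to splitting the training set by calendar keys before fitting one network per group. The sketch below shows only that grouping step for the global, per-day, per-hour and per-day-and-hour schemes; the `Sample` record, the scheme names and the 0 = Monday convention are hypothetical, and the most local variant (one network per forecasting task) is not shown.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Sample(NamedTuple):
    day_of_week: int        # hypothetical convention: 0 = Monday, ..., 6 = Sunday
    hour: int               # 0..23 for hourly data
    x: Tuple[float, ...]    # input pattern (x-vector)
    y: float                # target value for this hour


def subproblem_key(scheme: str, s: Sample) -> Tuple[int, ...]:
    """Return the key of the subproblem that sample s belongs to under a given scheme."""
    if scheme == "global":      # one model for all days and hours
        return ()
    if scheme == "day":         # 7 subproblems
        return (s.day_of_week,)
    if scheme == "hour":        # 24 subproblems
        return (s.hour,)
    if scheme == "day_hour":    # 7 x 24 = 168 subproblems
        return (s.day_of_week, s.hour)
    raise ValueError(f"unknown scheme: {scheme!r}")


def partition(samples: Sequence[Sample],
              scheme: str) -> Dict[Tuple[int, ...], List[Sample]]:
    """Group training samples into local training sets, one per subproblem."""
    groups: Dict[Tuple[int, ...], List[Sample]] = defaultdict(list)
    for s in samples:
        groups[subproblem_key(scheme, s)].append(s)
    return dict(groups)
```

A separate local network would then be trained on each group; the finer the key, the smaller and more homogeneous each local training set becomes.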
The final recommendation for STLF is to use the local MLP models v.4 or v.5, which gave the most accurate results. Note that these models have a very simple architecture: v.4 needs only 2–4 hidden neurons, and in v.5 just two or even one hidden neuron provides sufficiently accurate forecasts. Such simple models learn much faster and are more resistant to overfitting than the more complex models v.1, v.2 and v.3. Moreover, their error function landscape is less complex, so finding a global minimum is more likely.
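To show how small the recommended networks are, the sketch below implements an MLP with a single hidden layer of two tanh neurons and a linear output. It is trained by plain batch gradient descent on the mean squared error, which is an assumption chosen for brevity and not necessarily the learning algorithm used in the paper; the class name `TinyMLP`, the learning rate and the epoch count are illustrative.

```python
import math
import random
from typing import List, Sequence


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by batch gradient descent (sketch)."""

    def __init__(self, n_inputs: int, n_hidden: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.c = 0.0

    def _hidden(self, x: Sequence[float]) -> List[float]:
        # tanh activations of the hidden neurons
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def predict(self, x: Sequence[float]) -> float:
        h = self._hidden(x)
        return sum(vj * hj for vj, hj in zip(self.v, h)) + self.c

    def mse(self, X: Sequence[Sequence[float]], Y: Sequence[float]) -> float:
        return sum((self.predict(x) - y) ** 2 for x, y in zip(X, Y)) / len(X)

    def fit(self, X: Sequence[Sequence[float]], Y: Sequence[float],
            lr: float = 0.05, epochs: int = 2000) -> "TinyMLP":
        n = len(X)
        for _ in range(epochs):
            gW = [[0.0] * len(row) for row in self.W]
            gb = [0.0] * len(self.b)
            gv = [0.0] * len(self.v)
            gc = 0.0
            for x, y in zip(X, Y):
                h = self._hidden(x)
                out = sum(vj * hj for vj, hj in zip(self.v, h)) + self.c
                g = 2.0 * (out - y) / n                  # d(MSE)/d(output) for this sample
                gc += g
                for j, hj in enumerate(h):
                    gv[j] += g * hj
                    gh = g * self.v[j] * (1.0 - hj * hj)  # back-propagate through tanh
                    gb[j] += gh
                    for i, xi in enumerate(x):
                        gW[j][i] += gh * xi
            # gradient-descent step on all parameters
            self.c -= lr * gc
            for j in range(len(self.v)):
                self.v[j] -= lr * gv[j]
                self.b[j] -= lr * gb[j]
                for i in range(len(self.W[j])):
                    self.W[j][i] -= lr * gW[j][i]
        return self
```

Applied to load data, each `x` would hold an x-pattern and each `y` a value of the corresponding y-pattern. With only a handful of weights, each training run is cheap, which is what makes training a separate network per subproblem or per forecasting task practical.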
Notes
Compliance with ethical standards
Conflict of interest
The author declares that he has no conflict of interest.
References
1. Hippert HS, Pedreira CE, Souza RC (2001) Neural networks for short-term load forecasting: a review and evaluation. IEEE Trans Power Syst 16(1):44–55
2. Kodogiannis VS, Anagnostakis EM (2002) Soft computing based techniques for short-term load forecasting. Fuzzy Sets Syst 128:413–426
3. Dudek G (2016) Neural networks for pattern-based short-term load forecasting: a comparative study. Neurocomputing 205:64–74
4. Zang H et al (2018) Hybrid method for short-term photovoltaic power forecasting based on deep convolutional neural network. IET Gener Transm Distrib 12(20):4557–4567
5. Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y (2019) Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans Smart Grid 10(1):841–851
6. Wang L, Zhang Z, Chen J (2017) Short-term electricity price forecasting with stacked denoising autoencoders. IEEE Trans Power Syst 32(4):2673–2681
7. Shi H, Xu M, Li R (2018) Deep learning for household load forecasting—a novel pooling deep RNN. IEEE Trans Smart Grid 9(5):5271–5280
8. Rafiei M, Niknam T, Aghaei J, Shafie-Khah M, Catalão JPS (2018) Probabilistic load forecasting using an improved wavelet neural network trained by generalized extreme learning machine. IEEE Trans Smart Grid 9(6):6961–6971
9. Li B, Zhang J, He Y, Wang Y (2017) Short-term load-forecasting method based on wavelet decomposition with second-order gray neural network model combined with ADF test. IEEE Access 5:16324–16331
10. Deihimi A, Orang O, Showkati H (2013) Short-term electric load and temperature forecasting using wavelet echo state networks with neural reconstruction. Energy 57:382–401
11. Kulkarni S, Simon SP, Sundareswaran K (2013) A spiking neural network (SNN) forecast engine for short-term electrical load forecasting. Appl Soft Comput 13(8):3628–3635
12. Papalexopoulos AD, Hao S, Peng TM (1994) An implementation of a neural network based load forecasting model for the EMS. IEEE Trans Power Syst 9(4):1956–1962
13. Lee KY, Cha YT, Park JH (1992) Short-term load forecasting using an artificial neural network. IEEE Trans Power Syst 7(1):124–132
14. Srinivasan D (1998) Evolving artificial neural networks for short term load forecasting. Neurocomputing 23(1–3):265–276
15. Topalli AK, Erkmen I, Topalli I (2006) Intelligent short-term load forecasting in Turkey. Int J Electr Power Energy Syst 28(7):437–447
16. Methaprayoon K, Lee WJ, Rasmiddatta S, Liao JR, Ross RJ (2007) Multistage artificial neural network short-term load forecasting engine with front-end weather forecast. IEEE Trans Ind Appl 43(6):1410–1416
17. Fan S, Chen L, Lee WJ (2009) Short-term load forecasting using comprehensive combination based on multimeteorological information. IEEE Trans Ind Appl 45(4):1460–1466
18. Cecati C, Kolbusz J, Różycki P, Siano P, Wilamowski BM (2015) A novel RBF training algorithm for short-term electric load forecasting and comparative studies. IEEE Trans Ind Electron 62(10):6519–6529
19. Kalaitzakis K, Stavrakakis GS, Anagnostakis EM (2002) Short-term load forecasting based on artificial neural networks parallel implementation. Electr Power Syst Res 63(3):185–196
20. Kodogiannis VS, Anagnostakis EM (1999) A study of advanced learning algorithms for short-term load forecasting. Eng Appl Artif Intell 12(2):159–173
21. Dillon TS, Sestito S, Leung S (1991) An adaptive neural network approach in load forecasting in a power system. In: Proceedings first international forum on applications of neural networks to power systems, pp 17–21
22. Tamimi M, Egbert R (2000) Short term electric load forecasting via fuzzy neural collaboration. Electr Power Syst Res 56(3):243–248
23. Hanmandlu M, Chauhan BK (2011) Load forecasting using hybrid models. IEEE Trans Power Syst 26(1):20–29
24. Khotanzad A, Hwang RC, Abaye A, Maratukulam D (1995) An adaptive modular artificial neural network hourly load forecaster and its implementation at electric utilities. IEEE Trans Power Syst 10(4):1716–1722
25. Djukanovic M, Ruzic S, Babic B, Sobajic DJ, Pao YH (1995) A neural-net based short term load forecasting using moving window procedure. Int J Electr Power Energy Syst 17(6):391–397
26. Hernández L, Baladrón C, Aguiar JM, Carro B, Sánchez-Esguevillas A, Lloret J (2014) Artificial neural networks for short-term load forecasting in microgrids environment. Energy 75:252–264
27. Amjady N, Keynia F (2009) Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary algorithm. Energy 34(1):46–57
28. Chen Y, Luh PB, Guan C, Zhao YG, Michel LD, Coolbeth MA, Friedland PB, Rourke SJ (2010) Short-term load forecasting: similar day-based wavelet neural networks. IEEE Trans Power Syst 25(1):322–330
29. Ding N, Benoit C, Foggia G, Bésanger Y, Wurtz F (2016) Neural network-based model design for short-term load forecast in distribution systems. IEEE Trans Power Syst 31(1):72–81
30. Sun X, Luh PB, Cheung KW, Guan W, Michel LD, Venkata SS, Miller MT (2016) An efficient approach to short-term load forecasting at the distribution level. IEEE Trans Power Syst 31(4):2526–2537
31. Chu WC, Chen YP, Xu ZW, Lee WJ (2011) Multiregion short-term load forecasting in consideration of HI and load/weather diversity. IEEE Trans Ind Appl 47(1):232–237
32. Dudek G (2015) Pattern similarity-based methods for short-term load forecasting—part 1: principles. Appl Soft Comput 37:277–287
33. Ferreira VH, da Silva APA (2007) Toward estimating autonomous neural network-based electric load forecasters. IEEE Trans Power Syst 22(4):1554–1562
34. Dudek G (2013) Forecasting time series with multiple seasonal cycles using neural networks with local learning. In: Rutkowski L et al (eds) Artificial intelligence and soft computing, ICAISC 2013, LNCS 7894, pp 52–63
35. Dudek G (2016) Pattern-based local linear regression models for short-term load forecasting. Electr Power Syst Res 130:139–147
36. Dudek G (2015) Pattern similarity-based methods for short-term load forecasting—part 2: models. Appl Soft Comput 36:422–441
37. Dudek G (2017) Artificial immune system with local feature selection for short-term load forecasting. IEEE Trans Evol Comput 21:116–130
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.