Abstract
About 23% of the German energy demand is supplied by natural gas. Additionally, for about the same amount Germany serves as a transit country. Thereby, the German network represents a central hub in the European natural gas transport network. The transport infrastructure is operated by transmissions system operators (TSOs). The number one priority of the TSOs is to ensure the security of supply. However, the TSOs have only very limited knowledge about the intentions and planned actions of the shippers (traders). Open Grid Europe (OGE), one of Germany’s largest TSO, operates a highpressure transport network of about 12,000 km length. With the introduction of peakload gas power stations, it is of great importance to predict in and outflow of the network to ensure the necessary flexibility and security of supply for the German Energy Transition (“Energiewende”). In this paper, we introduce a novel hybrid forecast method applied to gas flows at the boundary nodes of a transport network. This method employs an optimized feature selection and minimization. We use a combination of a FAR, LSTM and mathematical programming to achieve robust highquality forecasts on realworld data for different types of network nodes.
1 Introduction
About 23% of the German (and European) energy demand is met by natural gas [2]. Additionally, for about the same amount Germany serves as a transit country. Thereby, the German network represents a central hub in the European natural gas transport network. In light of the German Energy transition (“Energiewende”) with an increasing share of renewable energy sources as well as the envisioned international transition towards substantially less fossil fuels and related greenhouse gas emissions, the importance of natural gas will increase even more. A critical task of gas power plants is to deliver electricity in peak load situations when electricity from renewable energy sources is not sufficient to cope with the demands. From the gas network point of view, this leads to huge gas demands on very short notice.
The gas transmission network is run by transmission system operator, hereinafter referred to as TSOs. An integrated organization would conduct jointly operations including gas trading, running storage facilities and operating the transmission network. Thereby, the network capacities could affect transport requirements. However, TSOs face novel challenges to ensure the security of supply caused by the liberalization of the European gas markets [1], which makes TSOs no longer allowed to own, trade or store gas. Instead, trading will be conducted by independent companies to ensure discriminatoryfree access to the transport network for all traders. Therefore, natural gas forecasting has become a fundamental input to the TSOs’ decisionmaking mechanisms. Meanwhile, the natural gas market is becoming more and more competitive and is moving towards more shortterm planning, e.g., dayahead contracts, which makes the dispatch of natural gas in the pipeline network even more challenging [12]. Therefore, a highaccuracy and highfrequency forecasting of local supplies and demands of natural gas consumption is essential for efficient network operation of TSOs.
Although on the contractual level all gas transports of a market area have to be balanced, this needs only to be achieved on average over time. Some outflow might actually only be balanced by an inflow at a later time.
Despite these challenges for the TSOs they need to meet all transport demands. The TSO has the obligation to monitor the situation, foresee possible shortages and react accordingly to ensure the safety of supply. Since changes in gas networks happen rather slowly it is therefore extremely important to have accurate forecasts on the demands and supply of the network to be able to react on time.
We collaborated with one of the biggest German TSOs, operating a gas network with a pipe length of about 12,000 km in total (see Fig. 1), to improve their hourly forecasts for demand and supply. We aim to:

Predict as precisely as possible the average hourly gas flows for the upcoming gas day, i.e., from 6 am to 6 am, just before the start of the gas day (at about 5:59 am);

The prediction needs to be appropriate for all different types of nodes ranging from connections to other networks or countries to industrial users or municipal consumers, leading to very diverse data characteristics;
To reach these goals, we investigated real data from the transport network operated by Open Grid Europe. We propose a powerful and robust hybrid forecast model that benefits from the combination of state of the art forecasting approaches and optimisation, leading to improved forecast accuracy. We interpreted the most important features that our model automatically selects.
In the following, the next subsections present nomenclature and an overview of related work. Section 2 describes the data we used in this study. Section 3 gives details on the proposed models. Section 4 describes the evaluation methodology and the evaluation of computational experiments. Finally, we draw some conclusions.
1.1 Nomenclature
Open Grid Europe GmbH  OGE 
Municipal power stations  MUN 
Power stations and industry  IND 
Storage  STO 
Transfer points to other networks  NET 
Mathematical programming  MP 
Linear program  LP 
Mixed integer linear programming  MILP 
Functional autoregressive  FAR 
Long short term memory  LSTM 
Hybrid model  HYB 
Heating day degree  HDD 
Baseline forecast (persistance)  BAS 
1.2 Related work
Models on natural gas demand forecasting are mainly focused on long term issues. There are quite some publications regarding electricity demand forecasting, (see, e.g. [3, 30, 32, 47]) but electricity behaves very differently from gas.
A survey on models to predict natural gas consumption published between 1949 and 2010 is presented by [42] who evidences that only a few works are focused on hourly gas flow prediction. A more recent survey [51] considers 187 papers published between 2002 and 2017. The authors point out that the majority of works provide daily predictions and recognize that neural networks are the most used models. The authors also show that, on the considered period, most of the works were performed at an aggregated level (i.e. country or city) and only three papers proposed models to forecast the hourly gas consumption.
In [49], two neural networks were tested to forecast natural gas consumption based on historical data and environmental variables. The authors found a better prediction accuracy when using the multilayer perceptron compared to the radial basis function. In [48], a model similar to radial basis neural network was proposed to predict gas consumption in a distribution system. In this work, input variables were selected using a genetic algorithm. Residential hourly gas consumption was predicted with neural networks by [17]. In this work, the heating degreehour method which considers the gap between outdoor and indoor temperature was considered. The best hyperparameters configuration consisted of 29 neurons, a feedforward backpropagation algorithm and tangent, sigmoid and linear functions for the input, hidden and output layers respectively. Similarly, [45] proposed neural networks to forecast residential natural gas demand. The proposed network consisted of a multilayer perceptron with one hidden layer. The input features included calendar (i.e. month, day of the month, day of the week, hour) and weather (temperature) information. The authors found that the average prediction error was higher during the winter months because gas flow was higher. More recently, [23] compared several machine learning models to predict residential natural gas hourly demand and found that recurrent neural network and linear regression were the most accurate models. The prediction results of monthly gas consumption of residential buildings using Extreme Learning Machine (ELM), artificial neural networks (ANNs) and genetic programming (GP) were presented by [24]. The ELM is characterized by higher training speed compared to backpropagation and it was found to perform better, in terms of RMSE, compared to the other two techniques. In [26] the authors set up a two stages methodology to predict daily gas consumption of utility companies. In the first stage, two NNs are run in parallel to produce daily forecasts; in the second stage, a nonlinear transformation of some features of the input vector is performed. The combination of the two stages is based on several methods such as average forecast, recursive least squares, etc. The results show that the mix between the two forecasters has higher accuracy although the combination of the two models increases the complexity. Overall, these works show that the consumer profile is very important when forecasting gas flow. In this regard, [38] identified seventeen groups profiles, based on their historical consumption and predicted daily gas demand. The overall prediction was obtained from the combination of single predictions.
The backpropagation algorithm optimized with a genetic algorithm was implemented by [54] to increase the training speed and to achieve a global minimum. The authors predict next day gas loads based on temperature and weather conditions. Furthermore, the authors tested the algorithm on a three years real dataset recorded in Shanghai to predict one month and a half gas load. Similarly, [55] propose a recurrent neural network to predict daily gas flow. The OutputInputHidden FeedbackElman neural network takes into account, not only the hidden nodes’ feedbacks but also considers the output nodes’ feedbacks. The results improved compared to these obtained with the standard Elman network. However, the authors recognize that further research is needed to forecast gas demand during holidays. In [4], an adaptive networkbased fuzzy inference system (ANFIS) consisting of a neural network integrated with fuzzy logic was proposed to forecast short term natural gas demand. The main advantage of this model was its ability to handle uncertainty, noise and nonlinearity in the data and, compared to standard neural network models, provided more accurate results. Wavelet transform has been deployed by [44] to decompose the hourly gas demand time series and BiLSTM and LSTM are optimized using genetic algorithm. The model was applied to winter data on which it has shown good prediction accuracy.
Several static and adaptive models have been tested by [37] for shortterm gas consumption forecast (randomwalk, temperature correlation model, linear regression model, ARX, adaptive (recursive) linear autoregressive model (RARX), neural network (NN), Recurrent NN, Support Vector Regression). They found that the best performance was obtained by the RARX of order 3. Furthermore, they found that nonlinear models such as neural networks and support vector machines had a lower generalization capacity compared to linear models. Finally, they concluded that the adaptive models overall performed better than static models.
The traditional approaches are regression and econometric models. In this regard, the performance of non linear mixed effects, ARIMAX and ARX models to predict gas consumption of 62 residential and small commercial customers was assessed by [10]. The authors forecast daily consumption of an entire month based on the previous 18 months. The time series included zero flows and missing data which were excluded for the training process. The prediction performance was similar in terms of daily mean absolute error which was close to zero for all the tested models. Thus, the authors propose to combine multiple models although they recognize that this might be a difficult task because of increased computational complexity. Multiple linear regression has been proposed by [40] who predicted annual gas consumption based on socioeconomic variables (GDP and inflation in the case of Turkey) that have been selected based on their statistical significance. Based on the forecast, the authors propose alternative energy policies. Robust least square method combined with loglinear Cobb–Douglas model has been proposed by [15]. The authors compared the proposed robust and ordinary least square methods for the yearly forecast of the natural gas demand in Brazil, considering the total demand as well as the industrial and power sectors demand. The authors showed that using the proposed model can be very useful when a large amount of past data is not available, which is usually necessary for the calibration of more sophisticated forecast models.
A hybrid model formed by a grey model and an autoregressive integrated moving average model has been proposed by [52] to predict monthly shale gas production. The authors conclude that the results of the combined model are more accurate than the single linear and nonlinear models.
In [33], Multivariate Adaptive and Conic Multivariate Adaptive Regression Splines were proposed to predict residential daily gas demand. The two models provided better results in terms of prediction errors (MAE and RMSE) compared to these obtained with linear regression and neural networks. In [41], the nonlinear characteristics of the natural gas consumption is modeled with several Grey models that are compared to predict the yearly natural gas consumption in China. Nonlinear programming and genetic algorithm have been proposed by [19] to predict natural gas consumption in the residential and commercial sectors on a yearly basis. Similarly, [25] proposed the breeder hybrid algorithm which consists of three steps for natural gas flow demand forecast. In the first stage, the coefficients of a nonlinear regression model are estimated. Successively, the estimates are improved using a genetic algorithm. Finally, the optimized coefficients are deployed as initial solutions for the simulated annealing. Nearest neighbor and local regression were proposed by [6] to predict gas flow in a small gas network with a 15 minutes resolution. The authors evidence the importance of environmental variables such as the temperature. Their method allowed to detect anomalies and the consumption patterns based on one year historical data. In the literature, there are also combinations of several methods to predict one dayhead natural gas consumption. In [34], the time series were decomposed into lowfrequency and highfrequency components using Wavelet transform. In a second step, the genetic algorithm and Adaptive NeuroFuzzy Inference System were deployed to predict each of the decomposed time series. The output was finally fed into a feedforward neural network to refine the prediction. The research was focused on different types of natural gas distribution points. The authors obtained better prediction results using the data of distribution points located near the city center. Neural networks have been also compared to the performance of autoregressive models. In [46], for instance, short term natural gas consumption in Turkey was predicted using SARIMAX model and Neural Networks (Multilayer and Radial Basis) and multivariate regression. They found that SARIMAX had better prediction performance. The temperature correlation model, proposed by [43], was compared with several configurations of ARX, stepwise regression, Support Vector Regression and neural network. The authors found that SVR and NN performed better on the training set, while high order ARX model performed better on the test set. Support Vector Regression has been deployed with false neighbours filtered approach to predict short term natural gas consumption [56]. The local predictor was based on the nearest neighbour approach so that the Euclidean distance between the training and test data and the neighbour filter was applied to determine the validity of the predicted values based on the exponential separation rate. The authors obtained better performance prediction compared to ARIMA, neural networks and Support Vector Regression.
Overall, the analyzed literature shows that there are few works that are focused on the comparison between methods to predict hourly gas flow of different types of nodes in a gas network or combining the advantages of different forecasting methods to a hybrid model for hourly gas flows. Therefore, we propose a hybrid model based on optimisation and machine learning and compare its results to four different models to predict hourly gas flow. To address the heterogeneity of the time series for the different node types we compare results obtained for four different types of nodes.
2 Data
We consider highresolution natural gas inflows and outflows in the highpressure gas pipeline network operated by Open Grid Europe GmbH (OGE).
The gas transmission network has more than 1000 boundary nodes which can be classified into four different groups:

Network Transfer Points (labeled NET) are large nodes with natural gas imported and exported to other networks mostly outside Germany. These can be entries and/or exits.

Municipal Utility nodes (MUN) serve residential and small commercial constituents and are always only exits. They are often temperature dependent, exhibit daily and seasonal patterns, and simultaneously are influenced by weekends/holidays.

Industry and Power Stations (IND) represent electricity generation and factory production nodes. These are also always exits and naturally exhibit weekly patterns due to working routines.

Storage nodes (STO) usually have a large number of zero flow hours with some substantial, often constant transfer in between. The nodes can always be both entries and exits.
While, in principle, we know the above classification, it is proved not to be reliable regarding the behavior of the nodes, so we will not use this information as part of the forecast, but just to explain certain behavior.
Table 1 depicts the number of nodes belonging to the different groups and the percentage of gas flow explained by each group.
As an illustration, we carefully selected three nodes for each type. The three network nodes we selected occupy 22% of the whole network flow. The municipal nodes are considered important by the TSO. The Industry nodes selected represent power plants and play a key role in energy generation with high renewable energy shares, as they are fast to start and can produce the necessary energy in times of peak demand. For the representative nodes from the Storage group, we selected the most frequently used nodes in the observed period. Figure 2 shows normalized (to the range of [0, 1]) flows of the nodes considered in this study.
For each node, the gas flows are measured hourly. Additionally, we were given the average daily temperatures measured at the nodes. Some statistical properties of selected nodes are given in Table 2. As can be seen from Fig. 2 and in Table 2 some nodes have continuous flow, while other are active only occasionally. Storage nodes have the highest percentage of zero flows. For the ones considered in this study hours with zero flow amount for 26–53% of the time. Network nodes show the highest variability and are always inflows. Industry nodes are always outflows and are clearly not temperature dependent. Municipal nodes are usually temperature dependent and have strong daily, weekly and seasonal patterns.
3 Methods
Many research studies showed that combining forecasts improves accuracy relative to individual forecasts [16, 27]. In this section, we will first present three different individual forecasting methods; Functional AutoRegressive (FAR), Long ShortTerm Memory Network (LSTM) and Mathematical Programming (MP) model. Then we will propose a hybrid model (HYB) based on MP method which is using the output of two other forecasting models, FAR and LSTM, as additional inputs (features).
3.1 Functional autoregressive (FAR) method
Recent development in functional data analysis provides an efficient way for jointly analyzing the large dimensional processes such as natural gas flows. On each day, the hourly gas flows are represented as a continuous gas flow curve over an infinite time interval that naturally inherits the serial and crossdependence in the raw data. The serial crossdependence among the daily gas flow curves is described by the functional autoregressive (FAR) model ([8]), which extends the discrete time series analysis from a finite dimensional space to an infinite world. The most popular estimation methods for FAR type models include the functional Yule–Walker estimation and the sieve maximum likelihood (SML) estimation, see [7,8,9, 21], and [12,13,14, 31].
Our interest is to predict the daily gas flow curves based on the learned dynamic dependence over time. We detail the FAR setup for the daily curves of natural gas flows and show how to obtain the SML estimator of the functional parameters.
Let \( \{ X_{t} (\tau )\} _{{t = 1}}^{n} \) denote the gas flow curve on day t, which is a squareintegrable random function in the Hilbert space \({\mathcal {H}}\) defined over a time domain \(\tau \in [0,1]\) without loss of generality. The functional autoregressive model of order 1, i.e. the FAR(1) model, is defined as:
where \(\mu (\tau )\) is the timedependent mean function of \(X_t(\tau )\). The innovation \(\varepsilon _t(\tau )\) is a strong \({\mathcal {H}}\)white noise with zero mean and bounded second moment \(E\Vert \varepsilon (\tau )\Vert ^2<\infty \). The norm \(\Vert \cdot \Vert \) is induced by the inner product \(<\cdot>\) of \({\mathcal {H}}\). The AR operator is represented as a kernel \(K \in L^2([0,1])\), which is one implementable form of the HilbertSchmidt operator specifying the serial dependence of the curve on its own past value. This choice of convolution kernel operator is very common in the study of functional linear and autoregressive processes, see, [29, 31, 53] and [14]. The kernel K is usually taken to be an even function with \(\Vert K\Vert _2<1\). Here, \(\Vert \cdot \Vert _2\) denotes the standard \( L_2 \) norm.
For the functional observations and the autoregressive terms defined on the infinite dimensional space, we project them onto Fourier basis functions given their periodic features, which is also easy to derive a closedform solution. We represent the functional terms in the basis of \(L^2([0,1])\) given by the trigonometric functions \(\Phi _{0}=I_{[0, 1]}\), \(\Phi _{2k}(\tau )=\sqrt{2}\cos (2\pi k\tau )\) and \(\Phi _{2k1}(\tau )=\sqrt{2}\sin (2\pi k\tau )\) for \(k \in {{\mathbb {Z}}}\backslash \{0\}\), as follows:
where \(a_{t,0}\), \(a_{t,k}\), \(b_{t,k}\) denote the constant, cosine, and sine Fourier basis coefficients corresponding to the observed gas flow curves \(X_{t}(\tau )\); \(c_{j,0}\) and \(c_{j,k}\) are the constant and cosine basis coefficients for the unknown even kernel \(K(\tau )\); \(\omega _{0}\), \(\omega _{k}\), \(\eta _{k}\) are for the intercept function \( \delta (\tau )=\mu (\tau )\int _{0}^{1}K(\tau s)\mu (s)\,ds \), and \(\epsilon _{t,0}\), \(\epsilon _{t,k}\), \(e_{t,k}\) are for the innovation \(\varepsilon _{t}(\tau )\).
Plugin the Fourier expansions into (1) and rearrange the equations, we obtain the recursive relationship of the Fourier coefficients. However, it is unfeasible to estimate the Fourier coefficients in infinitedimensional parameter spaces. This makes it necessary to conduct regularization or dimension reduction. Among others, [20] proposed the method of sieve to conduct estimation over the approximating subspaces \(\{\Theta _{m_n}\}\), called sieves, rather than over the original infinitedimensional space \(\Theta \). We refer to [11] for more theoretical details and [13] for a specific application example of the sieve method.
Under sieves, the unknown parameters are estimated under a subspace \(\{\Theta _{m_n}\}\). The estimation of the FAR model is thus converted to an estimation problem of a finite number of unknown Fourier coefficients. Further assume the Fourier coefficients of the innovation function \(\varepsilon _t(\tau )\), i.e, \(\epsilon _{t,0}\), \(\epsilon _{t,k}\), \(e_{t,k}\), are IID Gaussian distributed with zero mean and variance \( \sigma _k^2 \), a transition density under \(\Theta _{m_n}\) is defined as follows:
based on which the maximum likelihood estimator can be obtained with closedform solution under sieve \(\Theta _{m_n}\) as:
which lead to the kernel estimator \( {\hat{K}}(\cdot ) \).
We implement the FAR modelling to forecast the daily gas flow curves. The hstep ahead gas flow forecast, denoted as \({{\hat{X}}}_{t+h}(\tau )\) is:
At each forecast point, we estimate the Fourier coefficients to obtain the estimated mean function \({{\hat{\mu }}}(\cdot )\) and kernel operator \({{\hat{K}}}(\cdot ).\) The fitted model is then used to compute \(h=1\) and 2step ahead forecasts of the gas flow curves.
3.2 Long shortterm memory network
In this section, we use Long ShortTerm Memory Network (LSTM) to predict gas flow based on the previous 24h. LSTM are a special types of Recurrent Neural Networks (RNN) that have been introduced in the eighties (i.e. [18, 39]) to model time interrelations by allowing connection between hidden units with a time delay [35]. At each iteration, the hidden state vector receives the input vector and its previous hidden state. The hidden state vector can therefore be seen as a representation of time sequences [36].
Long ShortTerm Memory (LSTM) networks, proposed by [22], include a memory gate that controls what passes through the network and what is blocked so that some of the information that is feedback to the network is remembered and some other is forgotten. An additional gate keeps memory and filters out what has to be forgotten. At each time step, the network memorizes the information and filters out what is not relevant for the prediction. Finally, another set of gates ignores what is irrelevant. The formulation of the LSTM, as presented in [28], consists of three layers that are called gates: the input (3), forget (4) and output gates (7), respectively:
where \(h_{t1}\) is the hidden state computed at time \(t1\) which is calculated based on previous hidden state, \(h_t\). Each of the gate has a sigmoid function so that the values range between 0 and 1.
The decision on which information will be stored into a cell is based on the input layer (3) and on a hyperbolic tangent (or sigmoid) function assigned to the layer that returns the set of candidate values, \( {\hat{C}}\) (5):
To update the cell state \(C_{t1}\) into \(C_t\), the old state is multiplied by the forget gate and added to the new candidate values (6):
Finally, the output is obtained by passing the cell state \(C_t\) to a rectified linear function (or hyperbolic tangent) to decide which part of the information is passed to the output 7. Moreover, the cell state is multiplied by the output of the relu (or tanh) gate (8).
Thanks to this architecture, LSTM has the ability to look back several time steps and, thus, to improve the predictions. Also recurrent neural networks can look time steps back but the problem they incur is called vanishing or exploding gradient for which the results either become very large or small.
The LSTM deployed to forecast gas flow consisted of one single layer and an early stop function with patience set to four. This means that the training of the network stops as soon as the value of loss function remains the same after four iterations. In this work, different types of parameters are manually selected to forecast gas flow depending on the type of node and based on trial and error.
The configuration of the parameters of the network for each node is reported in Table 3. The most influential parameter is the batch size that is the number of training examples that are utilized in each iteration. The higher the value of this parameter the faster is the training of the network. Only two types of activation functions are selected for the hidden state, depending on the node: hyperbolic tangent function (tanh) or sigmoid. The transfer function of the output gate selected for all nodes, except for one network node, is the Rectified Linear Unit (Relu). Overall for storage nodes that are characterized by high variability between negative and positive values and by a high number of hours with zero flows, the batch size was set between 48 and 80 and the number of neurons between 70 and 80.
3.3 Mathematical programming (MP) for time series forecasting
In this Section, we use Linear Programs (LP) together with Mixed Integer Linear Programs (MILP) for prediction of the flows—supplies and demands of the gas network. Given a set of measurements \(m_{d,h} \in M\) for each day \(d \in D\) and each hour \(h \in H\). Let us define \(M_{d} \subseteq M\) as a subset of the measurements before day d. The features \(i \in F_h = \{1, \ldots , n_h\}\) are defined as arbitrary functions of historical flow values, \(f_{h,i}(d):D \rightarrow M_d, i \le p_h \le n_h\) and exogenous variables \(f_{h,i}(d), i \in \{p_h + 1, \ldots , n_h\}\). We can approximate gas flow with weighted sum of features
where \(p_{d,h}\) is the flow value which is approximated, and \(w_{h,i}\) define the weights.
The approximation error is defined as
and the optimal weights are calculated by minimizing the sum of absolute errors for each day d and hour h
This problem is not a LP because of the nonlinear absolute value in the objective function but it can be transformed into a LP. We can rewrite the error \(e_{d,h}\) as the difference of two nonnegative variables:
Then the transformed objective function is
For a solution to be optimal regarding to the objective \(e_{d,h}^+\cdot e_{d,h}^ = 0\) must be true, so we can write
and consequently the final LP problem becomes
Furthermore, we can force our model to be unbiased by requiring
and setting bounds for the weights \(l \le w_{h,i} \le u\) to limit the influence of a single specific feature.
For each day in the test set (out of sample days that we want to forecast) the forecasted flow values are computed by first computing the weights via an LP with 16 weeks of historical data and then using the weighted sum of features (9) for each hour to forecast the flow values. The lower and upper bounds for the weights are set to \(l=2\) and \(u=2\), respectively. For the computation of the forecasted flow values, it might be that also forecasted flow values of prior hours are used as input values for calculating the features, if the corresponding hours do not lie in the past.
3.4 Training: feature selection
In the training procedure of this method, a slightly different model is used which automatically chooses for each hour the B features which are most important, to limit overfitting in the LP. Therefore, we add additional binary variables \(x_{h,i}\) to the problem, which determine whether feature i is chosen for hour h, i.e., whether the weight of feature i and hour h is not equal to zero. Then, we link these variables to the weight variables
and limit the number of chosen features by B
The solution of the resulting MILP leads for each hour h to one feature set \(F_h\) of at most B features which are most important for this hour.
The list of features used in this study is presented in Table 4. The whole set of features F we used consists of 29 different features based on historical flow values, one temperature feature and two different features describing position of the predicted gas day in the week and the offset feature. Using sensitivity analysis the number of chosen features was limited to six. One year of historical measurements was used for training and selecting optimal set of features for every node and hour. Figure 3 is showing the heatmap of selected features for MP model for each group of nodes summed up for 24 h.
For all nodes the feature representing the flow of the previous hour (\(f_1\)) is the mostly used feature. For all hours except the first predicted gas hour this feature value is calculated based on the the forecasted flow of previous hour when the final forecast is calculated. The same hour yesterday(\(f_4\)) is also widely selected among all groups. As it was expected Weekend (\(f_{31}\)), Evening (\(f_{32}\)) as well as Mean temperature difference feature (\(f_{30}\)) are usually chosen only for the Municipal utilities since the behaviour of those nodes shows strong daily, weekly and seasonal patterns. In the case of Industry nodes the features of Mean flow of the same and previous day (\(f_{29},f_{15}\)) together with the Ratio features \(f_{11},f_{12}\) are the most frequently chosen features. For Transfer Points and Storages features of mean flow of the same and previous day (\(f_{29},f_{15}\)) are also dominating ones except for first gas hour where this pattern is not present. For all observed nodes the Offset feature (representing the bias in the model) is selected very frequently.
Figure 4 shows a scatter plot of flow amount versus the three most frequently chosen features among different types of nodes for all hours of the day. It can be seen that the flow depends linearly on \(f_4\) while other features are showing a nonlinear dependency.
3.4.1 Hybrid model
The main advantage of the mathematical programming method (MP) proposed in this paper is flexibility in the sense that adding new features when they are available in order to improve the forecast is very simple.
In this section, we propose a hybrid model(HYB), combining mathematical programming (MP) with the two other proposed methods by adding the outputs from LSTM and FAR model as an exogenous inputs to the MP model. The optimal sets of features chosen for every node and hour were kept from previous MP training and extended with the forecasts from the LSTM and FAR model as additional features. The final forecast is calculated as weighted sum of all features from the extended features set:
New optimal weights are calculated by running an LP:
where \({\text{ LSTM}}_h(d)\) and \({\text{ FAR}}_h(d)\) represent the forecasted values obtained from the LSTM and FAR models, respectively.
4 Testing and results
4.1 Influence of temperature
The temperature is one of the most important factors that influence gas consumption. When the natural gas is consumed for heating such as in residential areas, the temperature usually has an inverse relationship with gas consumption which is also dependent on other environmental data such as the time of day, the day of week, the season, etc. Furthermore, temperature and time of day are the factors that mostly impact the forecast error [45].
The majority of models presented in the literature are focused on residential and small commercial consumer at individual or aggregate level. Several authors have considered the temperature (i.e. [19, 33, 34]) or meteorological data [46] in their models to forecast gas flow and reduce the prediction error. In [50], the authors pointed out that the nonlinear characteristics of temperature has been assessed long time ago and gas consumption is proportional to the Heating Degree Day (HDD) [5]. This proportionality is evidenced when plotting the average daily temperature versus the gas consumption.
The scatter plots of daily changes of temperature versus daily changes of gas flow of the nodes considered in this work are shown in Fig. 5. Storage nodes have a high percentage of zero flows, those are the nodes that better approximate the non linear relationship expressed by the HDD. As expected, the Municipal nodes present a positive correlation between the flow and the temperature. Among the network nodes, one of them presents a negative relationship with the temperature. The remaining nodes appear to be independent from the temperature.
4.2 Results
4.2.1 Objectives, setup and evaluation metrics
In this paper, we observed a data set of hourly gas flow time series from 12 nodes with 17.520 observations (2 years).
Our goal is to predict values \(p_{d,0}\) to \(p_{d,23}\) for a given \(d\in D\). Of course only data for days \(d1\) and earlier can be used. All proposed methods are tested on the last 60 days of the data set.
Our basis comparison are the mean absolute deviation (MAD) between the hourly forecast and the measured flow during one day (24 h) ahead forecast, defined as
and mean absolute percentage error (MAPE) defined as
All results are also compared to a baseline (BAS) forecast defined as
Even though our main goal is to predict, as precisely as possible, the average hourly gas flows for the next 24 h, we adopted mean daily errors (MAD and MAPE) as accuracy metrics having in mind that the transport system operator has certain flexibility throughout the day. In particular, this means that, in principle, the maximum hourly error could be compensated by technical measures of TSO as long as it is not prolonged for several hours and have same direction.
4.2.2 Forecasting results
Mean MAD and MAPE (over 60 days in the test set) achieved by proposed models are presented in the Table 5.
It can be seen that between individual forecasting methods the LSTM model is the most robust one and obtained the best results for nodes from all 4 behavior groups. The MP model achieved the best results for all Municipal utilities, and the performance is especially good for MUN2 node where all models had a particularly high MAPE. FAR model outperformed others for the Industry node IND2. Even though FAR errors are slightly higher than other two proposed models (MP and LSTM) it can be seen that for most of the nodes the performance is very similar. For Storage nodes with intermittent behaviour none of the proposed individual methods has demonstrated adequate accuracy.
The Hybrid model showed an improvement for all 4 groups nodes. The improvement is especially significant for Storage nodes, where the average MAPE is lower for more than 2(%) for nodes STO1 and STO2. For node STO3 the LSTM model has the lowest MAPE but the lowest MAD is achieved by the HYB model.
The four types of nodes considered in the paper have very different behaviors, and the time series of the corresponding flows can be forecasted with wide accuracy ranges. The Storage nodes have a characteristic behavior with the intermittent flow of a large scale which, in addition, can have both directions. The MUN2 node also shows some intermittent behavior, especially in the test period, which results in higher mean daily errors.
Figure 6 shows calculated 24 h ahead forecast and the measured flow of all proposed models for a 1 week period.
The proposed Hybrid model is built on the top of Mathematical Programming (MP) method, which represents a weighted sum of features (calculated on previous flow values as well as some exogenous variables) chosen by Mixed Integer Programming (MIP) in the offline regime. The Hybrid model’s main advantage is its property to easily include new features like forecasts from different models in order to improve the final result. Using the Hybrid model, we are ensuring that the provided forecast is at least ‘as good as’ the best model’s result, always taking advantage of discarding the influence of previously inaccurate model. Even though for some nodes, single forecast models outperform the Hybrid model, the proposed method shows robust, stable accuracy very similar to the best individual model and, in some cases, brings a significant improvements. For example, in the case for MUN2 node, which shows some intermittent behaviour and significant changes in the daily levels, the LSTM model fails to give a good forecast with the mean daily MAPE of more than 50%. Simultaneously, the Hybrid model successfully sets the weights for the corresponding features and discards LSTM forecast, using the linear combination of other features to provide the forecast, which reduces the MAPE by 26%.
5 Conclusions
In this paper, we proposed a robust and powerful hybrid forecast model combining Mathematical Programming with Functional AutoRegressive and Long Short Term Neural Network model for forecasting gas flows at the boundary nodes of gas transport network. Our experiments are based on real world data from one of Germany’s largest transmission system operators, Open Grid Europe. We showed that the proposed method is appropriate for choosing optimal set of features and forecasting various behaviours from different nodes groups in the complex gas transmission network. From obtained results it is clear that even though in some specific cases single forecast models outperform the Hybrid model, the proposed method can achieve stable accuracy close to the best individual model and in some cases brings a significant improvement to the forecast quality.
References
Regulation (EC) no 715/2009 of the European Parliament and of the Council on conditions for access to the natural gas transmission networks (2009)
The Arbeitsgemeinschaft Energiebilanzen (AG Energiebilanzen): Evaluation tables of the energy balance for GermanyEnergy data for the years 1991–2018 (2019)
Alfares, H., Nazeeruddin, M.: Electric load forecasting: literature survey and classification of methods. Int. J. Syst. Sci. 33(1), 23–34 (2002). https://doi.org/10.1080/00207720110067421
Azadeh, A., Asadzadeh, S., Ghanbari, A.: An adaptive networkbased fuzzy inference system for shortterm natural gas demand estimation: Uncertain and complex environments. Energy Policy 38(3), 1529–1536 (2010). https://doi.org/10.1016/j.enpol.2009.11.036. (Security, prosperity and community—towards a common european energy policy? Special section with regular papers)
Baker, D.: Effect of observation time on mean temperature estimation. J. Appl. Meteorol. 14(4), 471–476 (1975). https://doi.org/10.1175/15200450(1975)014<0471:EOOTOM>;2.0.CO;2
Baldacci, L., Golfarelli, M., Lombardi, D., Sami, F.: Natural gas consumption forecasting for anomaly detection. Expert Syst. Appl. 62, 190–201 (2016). https://doi.org/10.1016/j.eswa.2016.06.013
Besse, P., Cardot, H., Stephenson, D.: Autoregressive forecasting of some climatic variations. Scand. J. Stat. 27, 673–687 (2000)
Bosq, D.: Modelization, nonparametric estimation and prediction for continuous time processes. In: Roussas, G. (ed.) Nonparametric Functional Estimation and Related Topics, NATO Science Series C, pp. 509–529. Springer, Berlin (1991)
Bosq, D.: Linear Processes in Function Spaces: Theory and Applications. Springer, New York (2000)
Brabec, M., Konár, O., Pelikán, E., Malý, M.: A nonlinear mixed effects model for the prediction of natural gas consumption by individual customers. Int. J. Forecast. 24(4), 659–678 (2008). https://doi.org/10.1016/j.ijforecast.2008.08.005. (Energy forecasting)
Chen, X.: Large sample sieve estimation of seminonparametric models. Handb. Econ. 6, 5549–5632 (2007)
Chen, Y., Chua, W.S., Koch, T.: Forecasting dayahead highresolution naturalgas demand and supply in Germany. Appl. Energy 228, 1091–1110 (2018)
Chen, Y., Li, B.: An adaptive functional autoregressive forecast model to predict electricity price curves. J. Bus. Econ. Stat. 35, 371–388 (2017)
Chen, Y., Marron, J.S., Zhang, J.: Modeling seasonality and serial dependence of electricity price curves with warping functional autoregressive dynamics. Ann. Appl. Stat. 13, 1590–1616 (2019)
Costa, O.L.V., de Oliveira Ribeiro, C., Ho, L.L., Rego, E.E., Parente, V., Toro, J.: A robust least square approach for forecasting models: an application to Brazil’s natural gas demand. Energy Syst. 87, 1868–3975 (2019). https://doi.org/10.1007/s12667019003511
Degiannakis, S.: Multiple days ahead realized volatility forecasting: single, combined and average forecasts. Glob. Financ. J. 36, 41–61 (2018). https://doi.org/10.1016/j.gfj.2017.12.002
Dombaycı, O.: The prediction of heating energy consumption in a model house by using artificial neural networks in denizliturkey. Adv. Eng. Softw. 41(2), 141–147 (2010). https://doi.org/10.1016/j.advengsoft.2009.09.012
Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 20 (1990)
Forouzanfar, M., Doustmohammadi, A., Menhaj, M., Hasanzadeh, S.: Modeling and estimation of the natural gas consumption for residential and commercial sectors in iran. Appl. Energy 87(1), 268–274 (2010). https://doi.org/10.1016/j.apenergy.2009.07.008
Grenander, U.: Abstract Inference. Wiley, New York (1981)
Guillas, S.: Rates of convergence of autocorrelation estimates for autoregressive hilbertian processes. Stat. Probab. Lett. 55, 281–291 (2001)
Hochreiter, S., Schmidhuber, J.: Long shortterm memory. Neural Comput. 9(8), 1735–1780 (1997)
Hribar, R., Potočnik, P., Šilc, J., Papa, G.: A comparison of models for forecasting the residential natural gas demand of an urban area. Energy 167, 511–522 (2019)
Izadyar, N., Ong, H.C., Shamshirband, S., Ghadamian, H., Tong, C.W.: Intelligent forecasting of residential heating demand for the district heating system based on the monthly overall natural gas consumption. Energy Build. 104, 208–214 (2015). https://doi.org/10.1016/j.enbuild.2015.07.006
Karadede, Y., Ozdemir, G., Aydemir, E.: Breeder hybrid algorithm approach for natural gas demand forecasting model. Energy 141, 1269–1284 (2017). https://doi.org/10.1016/j.energy.2017.09.130
Khotanzad, A., Elragal, H., Lu, T.: Combination of artificial neuralnetwork forecasters for prediction of natural gas consumption. IEEE Trans. Neural Netw. 11(2), 464–473 (2000). https://doi.org/10.1109/72.839015
Liang, Z., Liang, J., Wang, C., Dong, X., Miao, X.: Shortterm wind power combined forecasting based on error forecast correction. Energy Convers. Manag. 119, 215–226 (2016). https://doi.org/10.1016/j.enconman.2016.04.036
Liu, C., Jin, Z., Gu, J., Qiu, C.: Shortterm load forecasting using a long shortterm memory network. In: 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGTEurope), pp. 1–6 (2017). https://doi.org/10.1109/ISGTEurope.2017.8260110
Liu, X., Xiao, H., Chen, R.: Convolutional autoregressive models for functional time series. J. Econom. 194, 263–282 (2016)
Mirasgedis, S., Sarafidis, Y., Georgopoulou, E., Lalas, D.P., Moschovits, M., Karagiannis, F., Papakonstantinou, D.: Models for midterm electricity demand forecasting incorporating weather influences. Energy 31(2), 208–227 (2006). https://doi.org/10.1016/j.energy.2005.02.016
Mokhtari, F., Mourid, T.: Prediction of continuous time autoregressive processes via the reproducing kernel spaces. Stat. Infer. Stoch. Process. 6, 247–266 (2003)
Nguyen, H.T., Nabney, I.T.: Shortterm electricity demand and gas price forecasts using wavelet transforms and adaptive models. Energy 35(9), 3674–3685 (2010). https://doi.org/10.1016/j.energy.2010.05.013
Özmen, A., Yılmaz, Y., Weber, G.W.: Natural gas consumption forecast with mars and cmars models for residential users. Energy Econ. 70, 357–381 (2018). https://doi.org/10.1016/j.eneco.2018.01.022
Panapakidis, I., Dagoumas, A.: Dayahead natural gas demand forecasting based on the combination of wavelet transform and anfis/genetic algorithm/neural network model. Energy 118, 231–245 (2017). https://doi.org/10.1016/j.energy.2016.12.033
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural network. In: Proceedings of the 30th International Conference on Machine Learning, vol. 28 (2013)
Plunkett, K., Elman, J.: Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations. MIT Press, New York (1997)
Potočnik, P., Soldo, B., Šimunović, G., Šarić, T., Jeromen, A., Govekar, E.: Comparison of static and adaptive models for shortterm residential natural gas forecasting in croatia. Appl. Energy 129, 94–103 (2014). https://doi.org/10.1016/j.apenergy.2014.04.102
Ravnik, J., Hriberšek, M.: A method for natural gas forecasting and preliminary allocation based on unique standard natural gas consumption profiles. Energy 180, 149–162 (2019)
Rumelhart, D., Hinton, G., Williams, R.: Learning representations by backpropagating errors. Nature 323(6088), 533–536 (1986)
Sen, D., Erdem Günay, M., Tunç, K.M.: Forecasting annual natural gas consumption using socioeconomic indicators for making future policies. Energy 173, 1106–1118 (2019)
Shaikh, F., Ji, Q., Shaikh, P., Mirjat, N., Uqaili, M.: Forecasting china’s natural gas demand based on optimised nonlinear grey models. Energy 140, 941–951 (2017). https://doi.org/10.1016/j.energy.2017.09.037
Soldo, B.: Forecasting natural gas consumption. Appl. Energy 92, 26–37 (2012). https://doi.org/10.1016/j.apenergy.2011.11.003
Soldo, B., Potočnik, P., Šimunović, G., Šarić, T., Govekar, E.: Improving the residential natural gas consumption forecasting models by using solar radiation. Energy Build. 69, 498–506 (2014). https://doi.org/10.1016/j.enbuild.2013.11.032
Su, H., Zio, E., Zhang, J., Xu, M., Li, X., Zhang, Z.: A hybrid hourly natural gas demand forecasting method based on the integration of wavelet transform and enhanced deeprnn model. Energy 178, 585–597 (2019)
Szoplik, J.: Forecasting of natural gas consumption with artificial neural networks. Energy 85, 208–220 (2015). https://doi.org/10.1016/j.energy.2015.03.084
Taşpınar, F., V̧elebi, N., Tutkun, N.,: Forecasting of daily natural gas consumption on regional basis in Turkey using various computational methods. Energy Build. 56, 23–31 (2013). https://doi.org/10.1016/j.enbuild.2012.10.023
Taylor, J.: Shortterm electricity demand forecasting using double seasonal exponential smoothing. J. Oper. Res. Soc. 54(8), 799–805 (2003). https://doi.org/10.1057/palgrave.jors.2601589
Thaler, M., Grabec, I., Poredoş, A.: Prediction of energy consumption and risk of excess demand in a distribution system. Phys. A Stat. Mech. Appl. 355(1), 46–53 (2005). https://doi.org/10.1016/j.physa.2005.02.066. (Market dynamics and quantitative economics)
Tonković, Z., ZekićSušac, M., Somolanji, M.: Predicting natural gas consumption by neural networks. Tehnički Vjesnik 16(3), 51–6 (2009)
Vitullo, S.R., Brown, R., Corliss, G.F., Marx, B.: Mathematical models for natural gas forecasting. Can. Appl. Math. Q. 17(7), 807–827 (2009)
Šebalj, D., Mesarić, J., Dujak, D.: Predicting natural gas consumption–a literature review. In: Proceedings of the Central European Conference on Information and Intelligent Systems (2017)
Wang, Q., Li, S., Li, R., Ma, M.: Forecasting U.S. shale gas monthly production using a hybrid arima and metabolic nonlinear grey model. Energy 160, 378–387 (2018)
Xu, M., Li, J., Chen, Y.: Varying coefficient functional autoregressive model with application to the US treasuries. J. Multivar. Anal. 159, 168–183 (2017)
Yu, F., Xu, X.: A shortterm load forecasting model of natural gas based on optimized genetic algorithm and improved bp neural network. Appl. Energy 134, 102–113 (2014). https://doi.org/10.1016/j.apenergy.2014.07.104
Zhou, H., Su, G., Li, G.: Forecasting daily gas load with Oihf–Elman neural network. Proced. Comput. Sci. 5, 754–758 (2011). https://doi.org/10.1016/j.procs.2011.07.100. (The 2nd international conference on ambient systems, networks and technologies (ANT2011)/The 8th international conference on mobile web information systems (MobiWIS 2011))
Zhu, L., Li, M., Wu, Q., Jiang, L.: Shortterm natural gas demand prediction based on support vector regression with false neighbours filtered. Energy 80, 428–436 (2015). https://doi.org/10.1016/j.energy.2014.11.083
Acknowledgements
This work has been supported by the Research Campus Modal (Mathematical Optimization and Data Analysis Laboratories) funded by the German Federal Ministry of Education and Research (fund numbers 05M14ZAM, 05M20ZBM).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Petkovic, M., Chen, Y., Gamrath, I. et al. A hybrid approach for high precision prediction of gas flows. Energy Syst 13, 383–408 (2022). https://doi.org/10.1007/s12667021004664
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12667021004664