A deep learning model for gas storage optimization

Curin, Nicolas; Kettler, Michael; Kleisinger-Yu, Xi; Komaric, Vlatka; Krabichler, Thomas; Teichmann, Josef; Wutte, Hanna

doi:10.1007/s10203-021-00363-6

Open access
Published: 19 November 2021

Decisions in Economics and Finance, Volume 44, pages 1021–1037 (2021)

Abstract

To the best of our knowledge, the application of deep learning in the field of quantitative risk management is still a relatively recent phenomenon. In this article, we utilize techniques inspired by reinforcement learning in order to optimize the operation plans of underground natural gas storage facilities. We provide a theoretical framework and assess the performance of the proposed method numerically in comparison to a state-of-the-art least-squares Monte-Carlo approach. Due to the inherent intricacy originating from the high-dimensional forward market as well as the numerous constraints and frictions, the optimization exercise can hardly be tackled by means of traditional techniques.

1 Introduction

Natural gas prices exhibit distinct yearly seasonal patterns. Owing to limited storage capacities and pronounced fluctuations in demand, prices tend to be lower in summer, and higher as well as more volatile in winter. Physical storage facilities are required in order to exploit this seasonality. Various market participants are willing to own or lease storage facilities, or to trade storage capacities, which creates high demand for underground gas storage facilities worldwide. As an example, US working gas^{Footnote 1} in underground storage was at a record high in 2020 compared with the previous 5 years; see Fig. 1. Given the smoothing effect of storage facilities on gas price spreads, it is vital to optimize their usage for trading, pricing, hedging, risk management and investment purposes.

Over the last decades, various articles have contributed to the modeling and optimization of energy storage. For standard references, see Section 12.6 of Geman (2009), Section 5.3.4 of Fiorenzani et al. (2012), and Holland (2007, 2008). Other references include, e.g., De Jong (2015), Boogert and De Jong (2008, 2011), Safarov and Atkinson (2017), Cummins et al. (2017), Carmona and Ludkovski (2010), Bjerksund et al. (2011), Thompson et al. (2009), Hénaff et al. (2018), Malyscheff and Trafalis (2017), Jaillet et al. (2004), Warin (2012) and Makassikis et al. (2007). Much of the literature puts more emphasis on the modeling (and prediction) of gas prices than on developing algorithms for the optimization of storage plans. Conventionally, storage plans have been optimized by means of least-squares Monte Carlo (LSMC) approaches (Malyscheff and Trafalis 2017; Warin 2012; Boogert and De Jong 2011) or support vector machine regression (SVR), by treating the task as a stochastic control problem with HJB equations (Thompson 2016), or by applying real option theory (Thompson et al. 2009). The bottleneck of classical optimization techniques is the so-called curse of dimensionality, i.e., the running time often grows exponentially in the number of state space dimensions. Techniques inspired by reinforcement learning make it possible to tackle these intricate optimization tasks without simplifications. To this end, one designs an artificial financial agent who is able to trade off numerous aspects without further ado. The recent work Bachouch et al. (2020) applies a number of genuine reinforcement learning algorithms to the problem of gas storage valuation, viewed as a discrete-time stochastic control problem on a finite time horizon. Regarding the problem of pricing commodity swing options, Daluiso et al. (2020) employ an actor-critic reinforcement learning technique to approximate actions in day-ahead forward markets that maximize accumulated expected payoffs.
In contrast to these reinforcement learning methods, our artificial agent does not learn to act under all configurations, but only under those relevant for the given scenarios.^{Footnote 2} Furthermore, we neither make Markov assumptions nor use dynamic programming.

In a spirit similar to the present work, Barrera-Esteve et al. (2006) propose to address the problem of pricing a swing option on natural gas with a policy search method, i.e., to train a neural network to approximate optimal gas consumption rates maximizing expected terminal wealth. Thus, in accordance with our approach, Barrera-Esteve et al. (2006) view the task of pricing the swing option as a general parametric optimization problem rather than a stochastic control problem. By contrast, however, we do not assume any Markov setting such as the one arising from the one-factor model for gas forward prices considered therein. Moreover, we stress that the deep hedging approach followed in the present work can be readily generalized to minimizing risk measures instead of maximizing expected rewards.

Beyond that, to the best of the authors’ knowledge, our techniques inspired by reinforcement learning have not been applied to gas storage and related problems. Thus, their full potential as well as new challenges in storage-related optimization problems are yet to be investigated.

In this article, we present a fresh machine learning approach for the optimization of gas storage. Along the lines of deep hedging (see Buehler et al. 2019), we determine optimal strategy networks, i.e., neural networks that approximate optimal strategies for trading in spot and forward markets utilizing storage facilities. Optimality is understood as maximizing the expected utility of accumulated wealth at terminal time. More specifically, we introduce two models of the intrinsic valuation type: a simple spot-only model (SMod) allowing for trades in a spot proxy only, and a more involved spot-and-forward model (SFMod) that additionally incorporates trades in monthly forwards with delivery periods. Traditionally, models with trades in spot proxies based on an artificial daily forward curve, implied from the tradable monthly forwards with delivery periods, have been employed for simplicity. However, the main purpose of gas storage optimization is to maximize the profits or utility of storage managers rather than to compute theoretical valuations. Therefore, the use of tradable instruments, and thus the use of SFMod, is more relevant for gas storage optimization.

The paper is structured as follows. In Sect. 2, we provide a brief overview of important aspects in gas storage modeling and outline the machine learning concept that we employ for optimizing gas storage usage. In Sect. 3, we present the spot-only deep learning model (SMod) for trading strategies utilizing gas storage facilities, and compare it in numerical tests against a set of benchmark strategies derived via LSMC. In Sect. 4, we present the spot-and-forward model (SFMod), which additionally includes trades in monthly forwards with delivery periods, and likewise investigate its performance in numerical tests. Sect. 5 concludes.

Throughout this article, we consider a discrete time setting. Let the time instances $\mathbb {T}=\{0,1,2,\ldots ,K-1\}$ for some $K\in \mathbb {N}$ be the trading horizon in days, and let $(\varOmega ,\mathcal {F},\mathbb {F},\mathbb {P})$ with $\mathbb {F}=(\mathcal {F}_k)_{k\in \mathbb {T}}$ be a filtered probability space with real-world measure $\mathbb {P}$. Additionally, we assume the existence of an equivalent risk-neutral measure ${\mathbb {Q}}$.

2 Optimizing gas storage by means of deep hedging

There are three types of underground natural gas storage: depleted natural gas or oil fields, aquifers, and salt caverns. They are often located close to a pipeline, which makes the delivery of physical transactions more convenient. Compared with storage through conversion to LNG, underground natural gas storage is bigger and cheaper, but restricted to regional use. In Table 1, we provide an overview of the main stylized characteristics of a gas storage that are relevant for modeling and optimization. An owner of a physical natural gas storage might want to hedge on the forward market by selling virtual storage. Typically, such virtual storage is structured based on the stylized characteristics in order to ease negotiation and improve the liquidity of the transaction.

Table 1 This table lists the most important characteristics for modeling gas storage


The goal of gas storage optimization is to find an optimal plan for withdrawing gas from and injecting gas into the storage over a certain period of time subject to the above-mentioned constraints. Withdrawing gas from or injecting gas into the storage corresponds to going short or long, respectively, in the spot market with respect to a certain storage level. Hence, gas storage optimization can be seen as the problem of identifying optimal actions in an uncertain and restricted market environment to maximize an expected terminal utility of accrued wealth. More formally, let us consider an agent trading in a market and let $h_k$ denote the action she takes on day k. A trading strategy over the whole trading horizon is then collected in $H=\{h_0,\ldots ,h_{K-1}\}$. At maturity, the agent gains utility $U(W_H)$ based on the stochastic terminal wealth $W_H$ that she accrued by trading according to the strategy H. The agent seeks to identify an optimal strategy $H^*$ satisfying

$$\begin{aligned} H^*=\underset{H}{\text {arg}\max }\ {\mathbb {E}}_{\mathbb {P}} [U(W_H)]. \end{aligned}$$

(1)

Reinforcement learning (Sutton and Barto 2018) is a broad and very active area of research, suggesting a plethora of algorithms to solve intricate optimization problems. A very popular strand of deep reinforcement learning focuses on approximating optimal actions by (e.g., feed-forward) neural networks; see Definition 1 below. Neural networks are very suitable for such tasks because of their universality and their efficient trainability. Parameters $\theta $ of these network strategies $G^\theta =\{g^\theta _0,\ldots ,g^\theta _{K-1}\}$ are trained to maximize an estimate of the expected terminal utility, i.e., to solve

$$\begin{aligned} \max _\theta {\mathbb {E}}_\mathbb {P} \left[ U\left( W_{G^\theta }\right) \right] . \end{aligned}$$

(2)

Notice that, in contrast to reinforcement learning, we neither calculate intermediate value functions nor compute actions on states that are unlikely to occur. This considerably reduces the numerical complexity of the method. Additionally, we are constrained neither to time-consistent settings nor to expected-reward settings.
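To make problem (2) concrete, here is a minimal sketch under strongly simplifying assumptions that are not part of the paper: a hypothetical two-day storage in which the agent injects a volume q on day 0 and withdraws it on day 1, a one-parameter strategy $q=c\,\sigma(\theta)$ instead of a neural network, and plain gradient ascent with central finite differences in place of automatic differentiation and Adam. It only illustrates the core idea of optimizing strategy parameters directly against a Monte Carlo estimate of expected terminal utility, without any value function.

```python
import math
import random

def exp_utility(x, gamma):
    """Exponential utility U(x) = (1 - exp(-gamma * x)) / gamma."""
    return (1.0 - math.exp(-gamma * x)) / gamma

def sample_objective(theta, scenarios, capacity, gamma):
    """Monte Carlo estimate of E[U(W_theta)] for the hypothetical two-day storage."""
    q = capacity / (1.0 + math.exp(-theta))  # injected volume, always in (0, capacity)
    total = 0.0
    for s0, s1 in scenarios:
        wealth = -q * s0 + q * s1  # pay for the injection, receive for the withdrawal
        total += exp_utility(wealth, gamma)
    return total / len(scenarios)

def train(scenarios, capacity=1.0, gamma=3.0, lr=1.0, steps=200, eps=1e-4):
    """Gradient ascent on theta; central finite differences stand in for autodiff."""
    theta = 0.0
    for _ in range(steps):
        up = sample_objective(theta + eps, scenarios, capacity, gamma)
        down = sample_objective(theta - eps, scenarios, capacity, gamma)
        theta += lr * (up - down) / (2.0 * eps)
    return theta

if __name__ == "__main__":
    rng = random.Random(0)
    # hypothetical scenarios: the day-1 price exceeds the day-0 price by 0.5 on average
    scenarios = [(10.0, 10.0 + rng.gauss(0.5, 0.2)) for _ in range(1000)]
    theta = train(scenarios)
    print(theta, sample_objective(theta, scenarios, 1.0, 3.0))
```

Since the toy spread is positive on average, the estimated expected utility increases with the injected volume here, so training pushes $\theta$ upwards and q towards the capacity.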

Definition 1

(Feedforward neural network) Let $L, d_0, d_L\in {\mathbb {N}}$. A feed-forward neural network ${g^\theta :\,{\mathbb {R}}^{d_0}\rightarrow {\mathbb {R}}^{d_L}}$ is defined as

$$\begin{aligned} g^\theta (x)=A^{L}\circ \varphi \circ A^{L-1}\circ \ldots \circ \varphi \circ A^{1}(x),\quad \end{aligned}$$

(3)

where

$L\in \mathbb {N}$ is the number of layers ($L-1$ hidden layers),
$\varphi (\cdot )$ denotes a non-linear activation function that is applied component-wise, e.g., the sigmoid activation function $\varphi (\cdot )=(1+e^{-\,\cdot })^{-1}$, and
$A^{l}, l=1,\ldots ,L$ denote affine linear maps in the respective dimensions, whose parameters are stored in $\theta \in {\mathbb {R}}^q$ for some $q\in \mathbb {N}$.
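Definition 1 can be mirrored in a few lines of dependency-free Python. This is a minimal sketch: the list-of-lists representation of the affine maps $A^l(x)=W^l x+b^l$, the helper names and the random initialization are our own illustrative choices (the numerical tests use tensorflow.keras). The dimensions [3, 16, 1] correspond to $L=2$ and $d_1=16$ with three inputs, such as time, storage level and spot price.

```python
import math
import random
from typing import List, Sequence, Tuple

# One affine map A^l(x) = W x + b, stored as (W, b)
Layer = Tuple[List[List[float]], List[float]]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def affine(layer: Layer, x: Sequence[float]) -> List[float]:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def feedforward(layers: Sequence[Layer], x: Sequence[float]) -> List[float]:
    """g(x) = A^L o phi o A^{L-1} o ... o phi o A^1(x) with phi = sigmoid, componentwise."""
    h = list(x)
    for layer in layers[:-1]:
        h = [sigmoid(z) for z in affine(layer, h)]
    return affine(layers[-1], h)  # the last affine map is not followed by phi

def init_layers(dims: Sequence[int], seed: int = 0) -> List[Layer]:
    """Random parameters for dims = [d_0, d_1, ..., d_L]."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)] for _ in range(d_out)],
         [0.0] * d_out)
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]

def num_params(layers: Sequence[Layer]) -> int:
    """Dimension q of the parameter vector theta."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

layers = init_layers([3, 16, 1])
print(num_params(layers))  # 3*16 + 16 + 16*1 + 1 = 81
print(feedforward(layers, [0.1, 0.5, 0.2]))
```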

Note that in reinforcement learning, it is commonly assumed that environments can be described by (known or unknown) Markov decision processes. However, in many real-world applications and in particular in financial markets, information above and beyond knowledge of the current state can be used to better predict the dynamics of the environment and improve control, rendering Markov assumptions unrealistic. While enlarging the state space can partially act as a remedy, we highlight that the approach outlined above does not necessarily require a Markovian framework. Instead, we approximate trading strategies along the lines of deep hedging without restricting the state space or even restricting ourselves to specific market dynamics.

Finally, we note that the trading action $h_k$ may only depend on information that is available in the market up until time k. This motivates parameterizing the strategy $h_k$ on a given day k by a neural network $g^\theta _k$ mapping appropriate market information and storage levels to trading actions.

3 SMod: intrinsic spot trading

Following the machine learning approach outlined in Sect. 2, we now introduce a deep hedging model for gas storage optimization that is based on trading day-ahead prices of gas. Note that in commodity markets, day-ahead futures or forwards are seen as a close proxy of the spot price. Therefore, we refer to trading in the day-ahead price of gas as spot trading. For simplicity, we assume no discounting and zero transaction costs, i.e., $\kappa \equiv 0$; for a more general formulation including costs, see Remark 1.

Let $S=(S_k)_{k\in {\mathbb {T}}}$ with $S_k=F(k,k+1,k+1)$ denote the $\mathbb {F}$-adapted gas spot price, and let $h^S_k$ denote the $\mathcal {F}_k$-measurable action on day k. An action $h^S_k>0$ refers to an injection of $|h^S_k|$ MWh into the storage, and $h^S_k<0$ refers to a withdrawal of $|h^S_k|$ MWh from it. A trading strategy over the whole trading horizon is denoted by $\widetilde{H}^S=\{h^S_0,h^S_1,\ldots ,h^S_{K-1}\}$ and its terminal value is given by $(\widetilde{H}^S\bullet S)_{K-1}:=\sum _{k=0}^{K-1} h^S_k S_k$. Moreover, the storage level (or working gas) $H_n^S$ on day n is given by

$$\begin{aligned} H_{n}^S:=\sum _{k=0}^{n-1}h^S_k, \end{aligned}$$

with an initially empty storage, i.e., $H_0^S:=0$.

Suppose a storage manager’s preferences can be expressed through a (concave, non-decreasing) utility function $U:{\mathbb {R}}\rightarrow {\mathbb {R}}$. In line with Sect. 2, she aims to identify strategies maximizing her expected utility of terminal wealth, i.e., to maximize

$$\begin{aligned} \mathbb {E}_\mathbb {P}\left[ U(W_{K-1})\right] \end{aligned}$$

(4)

over all eligible $\widetilde{H}^S$, where

$$\begin{aligned} W_{K-1}&:=\sum _{k=0}^{K-1}-h^S_k S_k \end{aligned}$$

(5)

denotes the resulting terminal profit and loss^{Footnote 3} (P&L). The optimization is subject to the constraints

$$\begin{aligned}&H_K^S = 0, \end{aligned}$$

(6)

$$\begin{aligned}&0 \, \le H_k^S \, \le \, c, \qquad \text {and } \qquad \ell _k \, \le \,h^S_k \, \le \, u_k, \end{aligned}$$

(7)

for all $k\in \mathbb {T}$. The constraint (6) states that the storage must be empty at maturity; if this is not adhered to, a contractually agreed penalty becomes due. Unless gas prices become negative, any profit-seeking agent would comply with (6) naturally, since profits can only be generated by disposing of previously stored gas. The daily constraints (7) can be transformed and merged into a single admissible range. Indeed, one simply imposes

$$\begin{aligned} \widetilde{\ell }_k \, \le \, h^S_k\, \le \,\widetilde{u}_k, \end{aligned}$$

(8)

where

$$\begin{aligned} \widetilde{\ell }_k:=\max \left\{ \ell _k,-H_k^S\right\} ,\qquad \text {and}\qquad \widetilde{u}_k:=\min \left\{ u_k,c-H_k^S\right\} . \end{aligned}$$
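The bookkeeping above (storage level $H_n^S$, the merged bounds (8) and the terminal P&L (5)) can be sketched for a single price path as follows. This is an illustrative, dependency-free sketch; the function names and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

def storage_levels(actions: Sequence[float]) -> List[float]:
    """H_n = sum_{k<n} h_k for n = 0, ..., K, starting from an empty storage (H_0 = 0)."""
    levels = [0.0]
    for h in actions:
        levels.append(levels[-1] + h)
    return levels

def merged_bounds(level: float, lower: float, upper: float, capacity: float) -> Tuple[float, float]:
    """(8): max(l_k, -H_k) <= h_k <= min(u_k, c - H_k)."""
    return max(lower, -level), min(upper, capacity - level)

def terminal_pnl(actions: Sequence[float], spot: Sequence[float]) -> float:
    """(5): W_{K-1} = sum_k -h_k S_k; injections cost money, withdrawals earn it."""
    return sum(-h * s for h, s in zip(actions, spot))

def is_admissible(actions, lower, upper, capacity, tol=1e-9) -> bool:
    """Check (6) and, via (8) with H_0 = 0, the daily constraints (7)."""
    levels = storage_levels(actions)
    if abs(levels[-1]) > tol:  # (6): storage must be empty at maturity
        return False
    for k, h in enumerate(actions):
        lo, up = merged_bounds(levels[k], lower[k], upper[k], capacity)
        if not lo - tol <= h <= up + tol:
            return False
    return True

# Toy path: inject on two cheap days, withdraw on two expensive days.
spot = [1.0, 2.0, 3.0, 4.0]
plan = [1.0, 1.0, -1.0, -1.0]
print(storage_levels(plan))  # [0.0, 1.0, 2.0, 1.0, 0.0]
print(terminal_pnl(plan, spot))  # 4.0
print(is_admissible(plan, [-1.0] * 4, [1.0] * 4, capacity=2.0))  # True
```

With a capacity of 1.0 instead, the same plan is rejected on day 1, because $\widetilde{u}_1=\min\{1, 1-1\}=0$.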

Pursuing a strategy learning approach, we approximate each action $h^S_k$ in terms of a deep neural network $g_k$; for ease of readability, we henceforth omit the dependence on the model parameters $\theta $. The inputs to these network strategies are the current spot price $S_k$, the latest storage fill level $H_k^S$ (which iteratively depends on the previous neural networks) and the time k, i.e., $g_k = g_k\big (k,H_k^S,S_k\big )$. These K networks are summarized in the storage schedule $\widetilde{G}=\{g_0, \ldots , g_{K-1}\}$. Note that we allow for parameter sharing amongst the network instances $g_k$, i.e., the number of distinct neural networks $N\in {\mathbb {N}}$ can be significantly smaller than the number of trading days. In the most extreme case, the present framework allows for modeling each strategy with the same network, i.e., $g_k\equiv g$ for all $k\in \mathbb {T}$. The inputs of the neural networks in the numerical tests below were normalized in order to ensure a swift and stable learning process. The model parameters of $\widetilde{G}$ were trained with the standard Adam variant of stochastic gradient descent by minimizing the negative expected utility

$$\begin{aligned} \mathbb {E}_\mathbb {P}\left[ -U\left( \sum _{k=0}^{K-1}-g_k\left( k,H_k^S,S_k\right) S_k\right) \right] , \end{aligned}$$

(9)

subject to the constraints (6)–(8).
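Parameter sharing amounts to a map from trading days to a small set of distinct networks. The assignment below, contiguous blocks of (almost) equal length, is a hypothetical choice for illustration; the numerical tests use one instance per month, so actual calendar month boundaries would take its place in practice.

```python
def network_index(k: int, num_days: int, num_networks: int) -> int:
    """Index of the shared sub-network used on day k (0-based).

    Hypothetical assignment: split the K trading days into N contiguous blocks of
    (almost) equal length. N = K gives one network per day, N = 1 gives g_k = g.
    """
    if not 0 <= k < num_days:
        raise ValueError("day index outside the trading horizon")
    if not 1 <= num_networks <= num_days:
        raise ValueError("need 1 <= N <= K")
    return k * num_networks // num_days

K, N = 351, 12
blocks = [network_index(k, K, N) for k in range(K)]
print(blocks[0], blocks[-1], blocks.count(0))  # 0 11 30
```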

3.1 Training setup

In the following, we state the precise training setup for the example of exponential utility $ U(\cdot ):=(1-e^{-\gamma \cdot })/\gamma $ with risk aversion rate $\gamma \in {\mathbb {R}}^+$.

Training Data: time horizon of storage $\mathbb {T}$, M trajectories of the spot $(S^i_k )_{k\in \mathbb {T};i=1,\ldots ,M}$.
Training object: storage action (withdrawal or injection rate) $\widetilde{G}$ over the whole storage horizon, that is a neural network consisting of $N\in {\mathbb {N}}$ ($N\le K$) distinct sub-networks, each of which has L layers. The network’s input is time as well as respective spot and storage fill level.
Training criterion: minimize an estimate of expected negative utility over batches $B\subset \{1,\ldots ,M\}$ of training data, i.e.,
$$\begin{aligned} \min _{\widetilde{G}(S^i)\in \mathcal {G}^i,i\in B}\frac{1}{|B|} \sum _{i\in B}-U\left( W^i_{K-1}\right) , \end{aligned}$$
where
$$\begin{aligned} W^i_{K-1}&:=\sum _{k=0}^{K-1}-g_k(k,H^{S^i}_k,S^i_k) S^i_k,\\ \mathcal {G}^i&=\left\{ \widetilde{G}(S^i) ~\Big | ~H_K^{S^i}=0;\ \widetilde{\ell }_k\le g_k\left( k,H_k^{S^i},S^i_k\right) \le \widetilde{u}_k\, ~\text {for}~k\in \mathbb {T}\right\} . \end{aligned}$$
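The training criterion itself, evaluated on a batch of already computed terminal P&Ls $W^i_{K-1}$, can be sketched as below. This sketch only evaluates the loss; it does not handle the constraint sets $\mathcal {G}^i$ or the gradient step. Using `math.expm1` is our choice for numerical accuracy near zero; note that it raises `OverflowError` when $-\gamma W$ is very large, so P&L values on a moderate (e.g. scaled) range are assumed.

```python
import math
from typing import Sequence

def exp_utility(x: float, gamma: float) -> float:
    """U(x) = (1 - exp(-gamma * x)) / gamma; expm1 keeps small arguments accurate."""
    return -math.expm1(-gamma * x) / gamma

def batch_loss(batch_pnl: Sequence[float], gamma: float) -> float:
    """Training criterion on a batch B: (1/|B|) * sum_{i in B} -U(W^i_{K-1})."""
    return sum(-exp_utility(w, gamma) for w in batch_pnl) / len(batch_pnl)

# Same mean P&L, but the dispersed batch is penalized under risk aversion gamma = 3.
print(batch_loss([0.0, 0.0], gamma=3.0))   # 0.0
print(batch_loss([1.0, -1.0], gamma=3.0))  # approximately 3.02
```

As $\gamma \to 0$, $U(x)\to x$, so the criterion approaches the negative mean P&L.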

Remark 1

For the full case as described in Table 1, where costs $\kappa $ and C are non-zero, terminal profit and loss is given by

$$\begin{aligned} W_{K-1}:=\left( \sum _{k=0}^{K-1}-h^S_k S_k-\big |h^S_kS_k\big |\cdot \kappa \right) -C. \end{aligned}$$

The training can be performed analogously.
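The P&L of Remark 1 translates directly into code. Since the contents of Table 1 are not reproduced here, the sketch only uses what the formula shows: $\kappa$ acts as a proportional rate on the traded value $|h^S_k S_k|$, and C enters as a constant deduction. The toy numbers are hypothetical.

```python
from typing import Sequence

def terminal_pnl_with_costs(actions: Sequence[float], spot: Sequence[float],
                            kappa: float, fixed_cost: float) -> float:
    """Remark 1: W_{K-1} = sum_k (-h_k S_k - |h_k S_k| * kappa) - C."""
    trading = sum(-h * s - abs(h * s) * kappa for h, s in zip(actions, spot))
    return trading - fixed_cost

plan, spot = [1.0, 1.0, -1.0, -1.0], [1.0, 2.0, 3.0, 4.0]
print(terminal_pnl_with_costs(plan, spot, kappa=0.0, fixed_cost=0.0))  # 4.0, i.e. (5)
print(terminal_pnl_with_costs(plan, spot, kappa=0.1, fixed_cost=0.5))  # about 2.5
```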

Remark 2

Of course, a more general path dependence could be considered here to deal with possible non-Markovianity.

For numerical testing, spot curves of gas as well as benchmark strategies were provided by Axpo Solutions AG in form of $10{,}000\times 351$ matrices, representing $M=10{,}000$ scenarios of $K=351$ trading days.^{Footnote 4} Benchmark strategies had been derived utilising an LSMC technique.^{Footnote 5} We would like to emphasize that the proposed framework is not linked to specific stochastic dynamics of the spot price scenarios, i.e., we can exploit the methodology regardless of the model choice. Therefore, we leave the nature of the scenarios unspecified. If required, one can enrich the base scenarios with arbitrary stress scenarios.

The neural network model was implemented in tensorflow.keras with the sigmoid activation function. The network output, which takes values in $[0,1]$, was mapped to the admissible range $[\widetilde{\ell }_k,\widetilde{u}_k]$ from (8) by inverting the affine transformation $g_k\mapsto \frac{g_k-\widetilde{\ell }_k}{\widetilde{u}_k-\widetilde{\ell }_k}\in [0,1]$, which sends $\widetilde{\ell }_k$ to 0 and $\widetilde{u}_k$ to 1. For the final zero storage constraint, we additionally checked for every k that

$$\begin{aligned} H^S_k \le \sum _{j=k+1}^{K-1}|\ell _j| \end{aligned}$$

(10)

prevailed, in order to ensure that an empty final storage $H^S_K=0$ remained reachable. If the condition was violated on any day, the upper action bounds on all subsequent days were overridden by their lower counterparts, forcing a complete withdrawal of storage until maturity. In reality, it is possible to leave a non-empty storage by paying the penalty. For our modeling, however, the zero final storage rule was strictly adhered to. Figure 2 visualizes the constraints: the left plot shows the normalized zero storage constraint (10), and the right plot shows the daily injection and withdrawal bounds (8). The daily constraints entail two regimes with

$$\begin{aligned} \ell _k={\left\{ \begin{array}{ll} -600, &{} \quad {\text {if}} \;k\le 170,\\ -3072,&{}\quad {\text {otherwise}}, \end{array}\right. } \qquad u_k={\left\{ \begin{array}{ll} 2808,&{}\quad {\text {if}}\; k\le 200,\\ 408,&{}\quad {\text {otherwise}}. \end{array}\right. } \end{aligned}$$
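The output mapping, the check (10) and the override rule can be combined into a rollout for one spot path. This is a minimal sketch under stated assumptions: `policy` is a hypothetical stand-in for the trained networks and returns a value in $[0,1]$, and the override is applied from the day on which (10) first fails onward. The helper `regime_bounds` encodes the two regimes above; the short usage example passes toy bounds instead. We do not claim that this rule reaches $H^S_K=0$ for arbitrary bounds.

```python
from typing import Callable, List, Sequence, Tuple

def regime_bounds(k: int) -> Tuple[float, float]:
    """(l_k, u_k) of the two regimes used in the numerical tests."""
    lower = -600.0 if k <= 170 else -3072.0
    upper = 2808.0 if k <= 200 else 408.0
    return lower, upper

def rollout(policy: Callable[[int, float, float], float], spot: Sequence[float],
            lows: Sequence[float], ups: Sequence[float], capacity: float) -> List[float]:
    """Map outputs y in [0, 1] to actions in [l~_k, u~_k] and enforce (10)."""
    K = len(spot)
    tail = [0.0] * (K + 1)  # tail[k] = sum_{j=k}^{K-1} |l_j|
    for k in range(K - 1, -1, -1):
        tail[k] = tail[k + 1] + abs(lows[k])
    level, forced, actions = 0.0, False, []
    for k in range(K):
        if level > tail[k + 1]:  # (10) violated: H_k > sum_{j=k+1}^{K-1} |l_j|
            forced = True        # from now on, upper bounds := lower bounds
        lo = max(lows[k], -level)                             # l~_k from (8)
        up = lo if forced else min(ups[k], capacity - level)  # u~_k from (8)
        y = min(max(policy(k, level, spot[k]), 0.0), 1.0)
        h = lo + (up - lo) * y  # inverse of h -> (h - l~_k) / (u~_k - l~_k)
        actions.append(h)
        level += h
    return actions

# Toy usage: a greedy policy that always asks for the maximal injection.
acts = rollout(lambda k, H, S: 1.0, [1.0] * 4, [-1.0] * 4, [1.0] * 4, capacity=10.0)
print(acts, sum(acts))  # [1.0, 1.0, -1.0, -1.0] 0.0
```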

The network strategies $g_k$ were trained on the spot prices provided by Axpo Solutions AG. The data set was split into a training set of 6000 and a validation set of 4000 scenarios in order to perform in-sample and out-of-sample tests. LSMC benchmark strategies were optimized on the entire set of 10,000 scenarios in order to serve as a suitable proxy of the optimal solution.^{Footnote 6} In the course of various experiments, we assessed the required training time depending on the depth and the number of distinct neural networks ($N\le K$), different learning rates and the batch size in order to empirically fine-tune a suitable setting for SMod. The training generally concluded quickly and is well manageable on a standard 8-core notebook. For illustration, training SMod with as many as K neural networks for 1000 epochs on 6000 scenarios takes less than 2 h. A core strength of the proposed deep hedging approach, however, is that the strategy networks can be trained when time is not pressing (e.g., on a weekly basis), and the trained strategies can be evaluated at any given time within a couple of seconds. This evaluation runtime is competitive with the LSMC approach. Moreover, it turned out that it is not necessary to build SMod on K neural networks. In fact, we found that $N=12$ instances, each used recurrently during a month, with $L=2$ and $d_1=16$ (see Definition 1 above) already provided a decent approximation of the optimal policy. Indeed, after 1000 epochs on 6000 scenarios with a learning rate of 0.001, a batch size of 64, and a risk aversion rate of $\gamma =3$, the strategy of the artificial financial agent gets convincingly close to the benchmark solution. Figure 3 compares the P&L of the spot-only model with that of the benchmark in in-sample and out-of-sample tests, and visualizes the storage fill levels of SMod and of the benchmark.
A comparison of descriptive statistics of the terminal P&L between SMod and the benchmark is reported in the table at the bottom of Fig. 6. When we used fewer instances $N<12$, ceteris paribus, we encountered inferior performance after the same number of epochs. It turns out that a larger N promotes faster learning of the required solution complexity.

4 SFMod: intrinsic spot and forward trading

In the following, we extend the previous model by additionally trading the front-month rolling forward, whose delivery period is a whole month. At any point in time, a front-month rolling forward curve contains the first nearby monthly forward. We assume that a monthly forward contract is only traded before its delivery period starts (and no longer during the delivery period), and that delivery obligations are valued using the spot prices of the delivery days within the delivery period. Note that we restrict ourselves to those forwards whose delivery months lie within the time horizon of the storage problem. A visualization of the forward rolling mechanism is provided in Fig. 4.

Let $0=n_0<n_1<\cdots<n_J<K$ be the first days of the months $\mathcal {J}=\{0,1,\ldots ,J\}$, respectively. Let $h^j_k$ with $j\in \mathcal {J}$ denote the action on day k on the forward $F(k,n_j, n_{j+1}-1)$, which has the delivery period $[n_j, n_{j+1}-1]$. An action $h^j_k>0$ refers to buying and $h^j_k<0$ refers to selling $F(k,n_j, n_{j+1}-1)$. The above assumption implies that $h^j_k=0$ for $k<n_{j-1}$ and for $k\ge n_j$; in particular, $h^J_k=0$ for all $k\in \mathbb {T}$. Consistent with SMod, we aim to maximize

$$\begin{aligned} \mathbb {E}_\mathbb {P} \big [U(W_{K-1})\big ] \end{aligned}$$

(11)

with the terminal P&L

$$\begin{aligned} W_{K-1} := W^S_{K-1} + W^F_{K-1}. \end{aligned}$$

$W^S_{K-1}$ denotes the terminal P&L from spot trading and is still given by (5); here, $h^S_k$ schedules the storage activity for the next day. Similarly, $W^F_{K-1}$ denotes the terminal P&L from trading the monthly forwards and is defined as

$$\begin{aligned} W^F_{K-1}= \sum _{j=1}^{J-1} ~\sum _{ k=n_{j-1} }^{n_j-1}\left( -h^j_k F\left( k, n_j, n_{j+1}-1\right) \left( n_{j+1}-n_{j}\right) \right) . \end{aligned}$$

(12)
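Formula (12) can be evaluated with two nested loops: forward j is traded on days $n_{j-1},\ldots ,n_j-1$, and by (12) and (13) each unit of $h^j_k$ is a daily quantity delivered on each of the $n_{j+1}-n_j$ days of the delivery month. The callables below are hypothetical interfaces chosen for illustration, with `fwd_action(j, k)` standing for $h^j_k$ and `fwd_price(j, k)` for $F(k,n_j,n_{j+1}-1)$.

```python
from typing import Callable, Sequence

def forward_pnl(month_starts: Sequence[int],
                fwd_action: Callable[[int, int], float],
                fwd_price: Callable[[int, int], float]) -> float:
    """W^F_{K-1} from (12), with month_starts = [n_0, n_1, ..., n_J]."""
    J = len(month_starts) - 1
    total = 0.0
    for j in range(1, J):  # j = 1, ..., J-1
        delivery_days = month_starts[j + 1] - month_starts[j]
        for k in range(month_starts[j - 1], month_starts[j]):  # trading window of forward j
            total -= fwd_action(j, k) * fwd_price(j, k) * delivery_days
    return total

# Toy usage: months of 3 days, buy 1 unit of the front month every day at price 2.
print(forward_pnl([0, 3, 6, 9], lambda j, k: 1.0, lambda j, k: 2.0))  # -36.0
```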

For a forward with delivery period $[n_j, n_{j+1}-1]$, the daily delivery quantity $d^j$ is fixed on day $n_j -1$ for $j\ge 1$ and is given by

$$\begin{aligned} d^j:=\sum _{k=n_{j-1}}^{n_j-1} h^j_k,\qquad \text {for}~ j>0, \end{aligned}$$

(13)

and $d^0:=0$. The storage level $H_n$ on day n depends on both spot and monthly forward trading activities. For $n\in [n_{I-1},n_I)$, $H_n$ is given by

$$\begin{aligned} H_{n}:=\sum _{k=0}^{n-1}h^S_k + \sum _{j=1}^{I-2} d^j \left( n_{j+1}-n_{j}\right) +d^{I-1} \left( n-n_{I-1}+1\right) \end{aligned}$$

with initially empty storage, i.e. $H_0:=0$. The optimization of (11) is subject to the constraints

$$\begin{aligned} H_K&= 0,\end{aligned}$$

(14)

$$\begin{aligned} 0 \,\le H_k \, \le \, c, \qquad&\text {and } \qquad \ell _k-d^j \, \le \,h^S_k \, \le \, u_k-d^j \end{aligned}$$

(15)

for $n_j\le k< n_{j+1}$, $j\le J-1$, and

$$\begin{aligned} h^j_k&\le \alpha \frac{c}{n_{j+1}-n_j} \end{aligned}$$

(16)

for some $\alpha \in [0,1]$. Alternatively, the daily constraints (15) can be expressed as iterative daily bounds

$$\begin{aligned} \widetilde{\ell }_k \, \le \, h^S_k&+d^j\, \le \,\widetilde{u}_k, \end{aligned}$$

(17)

where

$$\begin{aligned} \widetilde{\ell }_k:=\max \left\{ \ell _k,-H_k\right\} ,\qquad&\text {and}\qquad \widetilde{u}_k:=\min \left\{ u_k,c-H_k\right\} . \end{aligned}$$
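The SFMod bookkeeping, i.e. the delivery quantities (13), the storage level $H_n$, the spot bounds implied by (17) and the cap (16), can be sketched as follows. This is an illustrative sketch: the nested-list layout `fwd_actions[j][k]` for $h^j_k$ and all names are hypothetical, and `storage_level` follows the formula for $H_n$ as stated, writing m for I-1; we read its $+1$ as counting the delivery on day n itself, in line with $h^S_k$ scheduling the activity of the next day.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

def delivery_quantities(month_starts: Sequence[int],
                        fwd_actions: Sequence[Sequence[float]]) -> List[float]:
    """(13): d^0 = 0 and d^j = sum_{k=n_{j-1}}^{n_j - 1} h^j_k for j >= 1."""
    d = [0.0]
    for j in range(1, len(month_starts)):
        window = range(month_starts[j - 1], month_starts[j])
        d.append(sum(fwd_actions[j][k] for k in window))
    return d

def storage_level(n: int, spot_actions: Sequence[float],
                  month_starts: Sequence[int], d: Sequence[float]) -> float:
    """H_n for n in [n_m, n_{m+1}): spot trades plus full and partial forward deliveries."""
    m = bisect_right(month_starts, n) - 1  # current month index (m = I - 1)
    level = sum(spot_actions[:n])
    level += sum(d[j] * (month_starts[j + 1] - month_starts[j]) for j in range(1, m))
    level += d[m] * (n - month_starts[m] + 1)
    return level

def spot_bounds(level: float, d_j: float, lower: float, upper: float,
                capacity: float) -> Tuple[float, float]:
    """(17) solved for h^S_k: l~_k - d^j <= h^S_k <= u~_k - d^j."""
    return max(lower, -level) - d_j, min(upper, capacity - level) - d_j

def forward_cap(alpha: float, capacity: float, month_length: int) -> float:
    """(16): upper bound alpha * c / (n_{j+1} - n_j) on h^j_k."""
    return alpha * capacity / month_length

month_starts, K = [0, 3, 6, 9], 12  # toy months of 3 days
fwd = [[0.0] * K for _ in month_starts]
for k in range(3):
    fwd[1][k] = 1.0  # buy 1 unit of forward 1 on each day of month 0
d = delivery_quantities(month_starts, fwd)
print(d)  # [0.0, 3.0, 0.0, 0.0]
print([storage_level(n, [0.0] * K, month_starts, d) for n in (2, 3, 5, 6)])  # [0.0, 3.0, 9.0, 9.0]
print(spot_bounds(9.0, 3.0, -5.0, 5.0, capacity=10.0))  # (-8.0, -2.0)
```

In the last line, the forward delivers 3 units per day while only 1 unit of free capacity remains, so spot trading is forced to withdraw at least 2 units.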

Remark 3

In SFMod, the aggregated action on day k is $(h^S_k+d^j)$ for all $n_j\le k< n_{j+1}$. Hence, pure spot trading is restricted by the daily delivery amount $d^j$, which results from the forward actions $h^j_{\tilde{k}}$ with $n_{j-1}\le \tilde{k}<n_j$. In other words, forward trading activities have a delayed effect on spot trading, but spot trading does not affect forward trading. The delivery quantities for the upcoming days of the current month are fixed once trading in the respective forward has terminated. The delivery obligations of the due forwards restrict the spot trading activities of the current month, since the sum of the daily delivery and the spot trade is bounded by the daily withdrawal and injection rates. The constraint (16) ensures that the maximally traded amount can be stored in the absence of spot trading; it can also be interpreted as a liquidity constraint. Moreover, the scaling factor $\alpha \in [0,1]$ bounds the volume of forward trading and helps maintain an appropriate balance between spot and forward trading.

4.1 Training setup

Similarly to Sect. 3, we approximate for each trading day k the actions $(h_k^S, h_k^j)$ by neural networks $g_k=(g_k^S, g_k^F)$ collected in $\widetilde{G}=\{g_0, \ldots , g_{K-1}\}$. Note that within this section, network strategies have a two-dimensional output since, in addition to actions in the spot market, we also model strategies on the monthly forwards. For ease of notation, we abbreviate the front-month forward as $F_k=F(k, n_j, n_{j+1}-1)$ for $n_{j-1}\le k<n_j$, $j\in \mathcal {J}$. The most important aspects of the training can be summarized as follows.

Training data: time horizon of storage $\mathbb {T}$, M trajectories of the spot $(S^i_k )_{k\in \mathbb {T};i=1,\ldots ,M}$, and of rolling month forward $(F^i_k)_{k\in \mathbb {T};i=1,\ldots ,M}$ respectively;
Training object: trading strategy network with two outputs, one for the spot action and one for the action in the rolling month forward, consisting of $N\in {\mathbb {N}}$ ($N\le K$) distinct sub-networks, each of which has L layers. The network's input is time as well as the respective spot price, forward price and storage fill level.
Training criterion: minimize an estimate of expected negative utility over batches $B\subset \{1,\ldots ,M\}$ of training data, i.e.,
$$\begin{aligned} \min _{\widetilde{G}\in \mathcal {G}^i, i\in B}\frac{1}{|B|} \sum _{i\in B}-U\left( W^{i,S}_{K-1} + W^{i,F}_{K-1} \right) , \end{aligned}$$
where
$$\begin{aligned} W^{i,S}_{K-1}&:=\sum _{k=0}^{K-1}-g_k^S\left( k,H^{S^i}_k,S^i_k, F^i_k\right) S^i_k,\\ W^{i,F}_{K-1}&:= \sum _{j=1}^{J-1} ~\sum _{ k=n_{j-1} }^{n_j-1}\left( -g^F_k\left( k,H^{S^i}_k,S^i_k, F^i_k\right) F^i_k(n_{j+1}-n_{j})\right) . \end{aligned}$$
For each scenario i, $\mathcal {G}^i$ contains those strategies $\widetilde{G}$ that fulfill all constraints (13)–(16).

For numerical testing, 10,000 scenarios of spot as well as monthly forward price curves of 12 months were provided by Axpo Solutions AG. 10,000 rolling monthly forward curves were inferred, each of which contains only the first nearby contract on any trading day. As in Sect. 3, the data set was split into 6000 training and 4000 test scenarios. Furthermore, we relied on the same network architecture as in SMod, i.e., 12 distinct neural networks representing 12 months of trading with $L=2$ and $d_1=16$. Note, however, that forward trading is discontinued in the last month, because the corresponding contract delivers beyond the trading horizon of the storage. Taking into account the revised constraints (13)–(16), strategy networks were then trained for 1000 epochs, using Adam stochastic gradient descent with a learning rate of 0.001, a batch size of 64, and a risk aversion of $\gamma =3$. Figure 5 visualizes the resulting policy in terms of the storage level across the scenarios over time with respect to different choices of $\alpha $. Figure 6 provides a visualization and detailed summary statistics for comparing this model with SMod and its benchmark. The comparison is based on the same setup for the spot strategy component and on the same training conditions. Both $\alpha $ and $\gamma $ can be used to control the distribution accordingly (e.g., shrink the variance of the P&L). Compared with the spot-only model, SFMod entails not only a slightly higher P&L on average, but also a higher volatility.

Figure 6 substantiates that SFMod, which allows trading activities on forwards, is the most favorable choice in terms of maximizing the expected terminal wealth. Compared with SMod, its technical setup is slightly more involved, but in terms of computational time and effort it is still well manageable on a standard 8-core notebook. For SFMod, the higher first moment of the P&L distribution across all scenarios comes with a higher standard deviation. Furthermore, the first moment is sensitive to the limitation on forward market activities, expressed by the control variable $\alpha $. With a lower risk aversion $\gamma $, the expected profit can be increased even further at the expense of higher volatility. One direction of future work might be to generate superior P&L distributions with less risk; another might be to increase the model-theoretic complexity by including more forward curves. Note that the performance of SFMod is not adversely affected if we further extend the scope of forward trading activities or incorporate more realistic model features such as, for instance, $H^{S^i}_k$-dependent transaction costs.

5 Conclusion

We proposed a flexible and powerful framework that is capable of dealing with the intricacy of optimizing underground gas storage facilities in the presence of forward markets. Traditional techniques such as least-squares Monte Carlo (LSMC) or dynamic programming suffer from the so-called curse of dimensionality, whereas the proposed deep learning technique is hardly affected by the dimensionality. Moreover, our experimental results show that the proposed deep hedging approach performs as well as or better than the well-established, state-of-the-art LSMC benchmark. These advances pave the way for better energy storage and production plans in very general, non-Markovian markets.
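To make the curse-of-dimensionality argument concrete, the following sketch merely counts objects: the nodes of a tensor-product grid with 10 points per state variable (as a dynamic-programming scheme might use), the number of monomials of total degree at most 4 (a full polynomial LSMC basis), and the weights of a network with one hidden layer of width 16. The state dimensions, grid resolution and polynomial degree are illustrative assumptions, e.g., $d = 14$ for a storage level, a spot price and twelve forward prices. Parameter counts say nothing about sample complexity or training effort; they only show that the network's size grows linearly in $d$ while the grid grows exponentially.

```python
from math import comb


def dp_grid_size(points_per_dim, dim):
    """Number of nodes of a tensor-product grid for dynamic programming."""
    return points_per_dim ** dim


def polynomial_basis_size(dim, degree):
    """Number of monomials of total degree <= degree in dim variables."""
    return comb(dim + degree, degree)


def network_parameters(dim, width=16, outputs=2):
    """Weights and biases of a network with one hidden layer of the given width."""
    return width * (dim + 1) + outputs * (width + 1)


for dim in (2, 14, 26):
    print(dim, dp_grid_size(10, dim), polynomial_basis_size(dim, 4), network_parameters(dim))
```

For $d = 14$ this gives $10^{14}$ grid nodes and 3060 monomials, against 274 network parameters. The LSMC benchmark in this article sidesteps the blow-up by using only four hand-picked basis functions, at the price of a coarser description of the forward curve.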

Notes

Working gas refers to the volume of gas in storage that is available for withdrawal at a particular point in time. It is computed as the total gas volume minus the base gas.
Strictly speaking, we train our model for a fixed initial state and a given set of future scenarios. The validity of the approach and the model risk can therefore only be controlled within the range of these scenarios. If the market moves substantially outside this range (e.g., due to unprecedented market circumstances), the network has to be re-evaluated and, if necessary, re-trained.
The negative sign reflects the direction of the cash flows.
Forward curves were generated according to a three-factor model of Heath-Jarrow-Morton type respecting no-arbitrage conditions along the lines of Bjerksund et al. (2010). Spot prices were obtained from these as forward prices with instantaneous delivery.
Four basis functions were chosen in the LSMC approach: the spot price, the sum of the forward prices, and the squares of both.
If required, the same split into training and validation sets could equally be applied to the LSMC approach.
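In an LSMC scheme, the continuation value at each decision date and storage level is approximated by regressing realized future cash flows on basis functions of the current market state. The sketch below shows only that regression step, using the four basis functions listed in the note on the LSMC approach (spot price, sum of forward prices, and their squares). The constant term, the normal-equation solver and the synthetic self-check data are our assumptions, and the backward induction over time and storage levels that a full LSMC valuation requires is omitted.

```python
import random


def basis(spot, forwards):
    """Spot, sum of forwards and both squared, plus an assumed constant term."""
    f_sum = sum(forwards)
    return [1.0, spot, f_sum, spot ** 2, f_sum ** 2]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_continuation(states, continuation_values):
    """Least-squares fit of realized continuation values on the basis (normal equations)."""
    rows = [basis(s, f) for s, f in states]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, continuation_values)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, spot, forwards):
    """Regression estimate of the continuation value for one state."""
    return sum(c * phi for c, phi in zip(coeffs, basis(spot, forwards)))


# Synthetic self-check (hypothetical data): targets lie exactly in the span of the basis.
rng = random.Random(0)
true_coeffs = [0.5, 1.0, -0.3, 0.2, 0.1]
states = []
for _ in range(500):
    level = rng.uniform(0.0, 0.2)
    states.append((rng.uniform(0.0, 2.0), [level + rng.uniform(-0.01, 0.01) for _ in range(12)]))
targets = [predict(true_coeffs, s, f) for s, f in states]
coeffs = fit_continuation(states, targets)
```

Because the synthetic targets are an exact combination of the basis functions, the fitted coefficients recover `true_coeffs` up to rounding error; with simulated cash flows the fit would instead be a noisy conditional-expectation estimate.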

References

Bachouch, A., Huré, C., Langrené, N., Pham, H.: Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications. SIAM J. Numer. Anal. 59, 525–557 (2020)
Barrera-Esteve, C., Bergeret, F., Dossal, C.H., Gobet, E., Meziou, A., Munos, R., Reboul-Salze, D.: Numerical methods for the pricing of Swing options: a stochastic control approach. Methodol. Comput. Appl. Probab. 8(4), 517–540 (2006)
Bjerksund, P., Rasmussen, H., Stensland, G.: Valuation and risk management in the Norwegian electricity market. In: Bjørndal, E., Bjørndal, M., Pardalos, P., Rönnqvist, M. (Eds.) Energy, Natural Resources and Environmental Economics, pp. 167–185. Springer, Berlin (2010)
Bjerksund, P., Stensland, G., Vagstad, F.: Gas storage valuation: price modelling v. optimization methods. Energy J. 32(1), 203–227 (2011)
Boogert, A., De Jong, C.: Gas storage valuation using a Monte Carlo method. J. Deriv. 15(3), 81–98 (2008)
Boogert, A., De Jong, C.: Gas storage valuation using a multifactor price process. J. Energy Mark. 4(4), 29–52 (2011)
Buehler, H., Gonon, L., Teichmann, J., Wood, B.: Deep hedging. Quant. Finance 19(8), 1271–1291 (2019)
Carmona, R., Ludkovski, M.: Valuation of energy storage: an optimal switching approach. Quant. Finance 10(4), 359–374 (2010)
Cummins, M., Kiely, G., Murphy, B.: Gas storage valuation under Lévy processes using the fast Fourier transform. J. Energy Mark. 10(4), 43–86 (2017)
Daluiso, R., Nastasi, E., Pallavicini, A., Sartorelli, G.: Pricing commodity swing options (2020)
De Jong, C.: Gas storage valuation and optimization. J. Nat. Gas Sci. Eng. 24, 365–378 (2015)
Fiorenzani, S., Ravelli, S., Edoli, E.: The Handbook of Energy Trading. Wiley, Hoboken (2012)
Geman, H.: Commodities and Commodity Derivatives: Modeling and Pricing for Agricultural, Metals and Energy. Wiley, Hoboken (2009)
Hénaff, P., Laachir, I., Russo, F.: Gas storage valuation and hedging: a quantification of model risk. Int. J. Financ. Stud. 6(1), 27 (2018)
Holland, A.: Optimization of injection/withdrawal schedules for natural gas storage facilities. In: International Conference on Innovative Techniques and Applications of Artificial Intelligence, pp. 287–300. Springer (2007)
Holland, A.: A decision support tool for energy storage optimization. In: 2008 20th IEEE International Conference on Tools with Artificial Intelligence, vol. 2, pp. 299–306. IEEE (2008)
Jaillet, P., Ronn, E.I., Tompaidis, S.: Valuation of commodity-based swing options. Manag. Sci. 50(7), 909–921 (2004)
Makassikis, C., Vialle, S., Warin, X.: Distribution of a stochastic control algorithm applied to gas storage valuation. In: 2007 IEEE International Symposium on Signal Processing and Information Technology, pp. 485–490 (2007). https://doi.org/10.1109/ISSPIT.2007.4458002
Malyscheff, A.M., Trafalis, T.B.: Natural gas storage valuation via least squares Monte Carlo and support vector regression. Energy Syst. 8(4), 815–855 (2017)
Safarov, N., Atkinson, C.: Natural gas storage valuation and optimization under time-inhomogeneous exponential Lévy processes. Int. J. Comput. Math. 94(11), 2147–2165 (2017)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Thompson, M.: Natural gas storage valuation, optimization, market and credit risk management. J. Commod. Mark. 2(1), 26–44 (2016). https://doi.org/10.1016/j.jcomm.2016.07.004
Thompson, M., Davison, M., Rasmussen, H.: Natural gas storage valuation and optimization: a real options application. Naval Res. Logist. (NRL) 56(3), 226–238 (2009)
Warin, X.: Gas storage hedging. In: Breton, M., Ben-Ameur, H. (Eds.) Numerical Methods in Finance, pp. 421–445. Springer (2012)


Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Funding

Open access funding provided by Swiss Federal Institute of Technology Zurich.

Author information

Authors and Affiliations

Axpo Solutions AG, Baden, Switzerland
Nicolas Curin, Michael Kettler & Vlatka Komaric
Department of Mathematics, ETH Zürich, Zurich, Switzerland
Xi Kleisinger-Yu, Josef Teichmann & Hanna Wutte
Eastern Switzerland University of Applied Sciences, St. Gallen, Switzerland
Thomas Krabichler


Corresponding author

Correspondence to Thomas Krabichler.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

N. Curin, M. Kettler, V. Komaric: Opinions expressed in this paper are those of the authors, and do not necessarily reflect the view of Axpo Solutions AG.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Curin, N., Kettler, M., Kleisinger-Yu, X. et al. A deep learning model for gas storage optimization. Decisions Econ Finan 44, 1021–1037 (2021). https://doi.org/10.1007/s10203-021-00363-6


Received: 31 January 2021
Accepted: 11 October 2021
Published: 19 November 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10203-021-00363-6
