1 Introduction

Natural gas prices exhibit distinct yearly seasonal patterns. Due to limited storage capacities and pronounced fluctuations in demand, prices tend to be lower in summer, and higher as well as more volatile in winter. Physical storage facilities are required in order to exploit this seasonality. Various market participants are willing to own or lease storage facilities, or to trade storage capacities, which creates high demand for underground gas storage facilities worldwide. As an example, US working gas in underground storage was at its record high in 2020 compared to the previous 5 years; see Fig. 1. Given the smoothing effect of storage facilities on gas price spreads, it is vital to optimize their usage for trading, pricing, hedging, risk management and investment purposes.

Fig. 1

US natural gas (in contrast to liquefied natural gas, or LNG for short) futures curve and storage information provided by the EIA (US Energy Information Administration), as of February 18, 2021; downloaded from https://www.eia.gov/naturalgas/storage/dashboard/. The upper plot shows a 1-year natural gas futures curve consisting of twelve monthly futures contracts with delivery months ranging from March 2021 to February 2022. It features the above-mentioned seasonal pattern, namely higher prices in winter and lower prices in summer. The lower plot shows the lower bound of underground gas storage in the US (Lower 48 weekly working gas). Gas storage in the year 2020 reached a record high compared with that of the previous 5 years

Over the last decades, various articles have contributed to the modeling and optimization of energy storage. For standard references, see Section 12.6 of Geman (2009), Section 5.3.4 of Fiorenzani et al. (2012), and Holland (2007, 2008). Other references include, e.g., De Jong (2015), Boogert and De Jong (2008, 2011), Safarov and Atkinson (2017), Cummins et al. (2017), Carmona and Ludkovski (2010), Bjerksund et al. (2011), Thompson et al. (2009), Hénaff et al. (2018), Malyscheff and Trafalis (2017), Jaillet et al. (2004), Warin (2012) and Makassikis et al. (2007). Much of the literature puts more emphasis on the modeling (and prediction) of gas prices than on developing algorithms for the optimization of storage plans. Conventionally, storage plans were optimized by means of Least-Squares Monte Carlo (LSMC) approaches (Malyscheff and Trafalis 2017; Warin 2012; Boogert and De Jong 2011) or support vector machine regression (SVR), treated as a stochastic control problem with HJB equations (Thompson 2016), or viewed as an application of real option theory (Thompson et al. 2009). The bottleneck of classical optimization techniques is the so-called curse of dimensionality, i.e., the running time often grows exponentially in the number of state space dimensions. Techniques inspired by reinforcement learning make it possible to tackle these intricate optimization tasks without simplifications. To this end, one designs an artificial financial agent that is able to trade off numerous aspects without further ado. The recent work of Bachouch et al. (2020) applies a number of true reinforcement learning algorithms to the problem of gas storage valuation, seen as a discrete-time stochastic control problem with finite time horizon. Regarding the problem of pricing commodity swing options, Daluiso et al. (2020) employ an actor-critic reinforcement learning technique to approximate actions in day-ahead forward markets maximizing accumulated expected payoffs.
In contrast to these reinforcement learning methods, our artificial agent does not learn to act under all configurations, but only under those relevant for the given scenarios. Furthermore, no Markov assumptions are made, and no dynamic programming is used.

In a spirit similar to the present work, Barrera-Esteve et al. (2006) suggest addressing the problem of pricing a swing option on natural gas with a policy search method, i.e., training a neural network to approximate optimal gas consumption rates maximizing expected terminal wealth. Thus, in accordance with our approach, Barrera-Esteve et al. (2006) view the task of pricing the swing option as a general parametric optimization problem rather than a stochastic control problem. By contrast, however, we do not assume any Markov setting such as that arising from the one-factor model for forward prices of gas which is considered therein. Moreover, we stress that the deep hedging approach followed in the present work can be readily generalized to minimizing risk measures instead of maximizing expected rewards.

Beyond that, to the best of the authors’ knowledge, our techniques inspired by reinforcement learning have not been applied to gas storage and related problems. Thus, their full potential as well as new challenges in storage-related optimization problems are yet to be investigated.

In this article, we present a fresh machine learning approach for the optimization of gas storage. Along the lines of deep hedging (see Buehler et al. 2019), we determine optimal strategy networks, i.e., neural networks that approximate optimal strategies for trading in spot and forward markets utilizing storage facilities. Optimality is understood as maximizing expected utility of accumulated wealth at terminal time. More specifically, we introduce two models which are of the intrinsic valuation type: a simple spot-only model (SMod) allowing for trades in a spot-proxy only, and a more involved model referred to as spot-and-forward model (SFMod) additionally incorporating trades in monthly forwards with delivery periods. Traditionally, models with trades in spot-proxies based on an artificial daily forward curve, which is implied from the tradable monthly forwards with delivery period, have been employed for simplicity. However, the main purpose of gas storage optimization is to maximize profits or utility of storage managers rather than theoretical valuations. Therefore, the use of tradables, and thus the use of SFMod, is more relevant for gas storage optimization.

The paper is structured in the following way: In Sect. 2, we provide a brief overview of important aspects in gas storage modeling and outline the machine learning concept that we employ for optimizing gas storage usage. In Sect. 3, we present the spot-only deep learning model (SMod) for trading strategies utilizing gas storage facilities. We compare our model in numerical tests against a set of benchmark strategies derived via LSMC. In Sect. 4, we present the spot-and-forward model (SFMod), which additionally includes trades in monthly forwards with delivery periods, and investigate its performance in numerical tests likewise.

Throughout this article, we consider a discrete time setting. Let the time instances \(\mathbb {T}=\{0,1,2,\ldots ,K-1\}\) for some \(K\in \mathbb {N}\) be the trading horizon in days, and let \((\varOmega ,\mathcal {F},\mathbb {F},\mathbb {P})\) with \(\mathbb {F}=(\mathcal {F}_k)_{k\in \mathbb {T}}\) be a filtered probability space with real-world measure \(\mathbb {P}\). Additionally, we assume the existence of an equivalent risk-neutral measure \({\mathbb {Q}}\).

2 Optimizing gas storage by means of deep hedging

There are three types of underground natural gas storage: depleted natural gas or oil fields, aquifers, and salt caverns. They are often located close to a pipeline, which makes the delivery of physical transactions more convenient. Compared to storage through conversion to LNG, underground natural gas storage is bigger and cheaper, but restricted to regional use. In Table 1, we provide an overview of the main stylized characteristics of a gas storage that are relevant for modeling and optimization. An owner of physical natural gas storage might want to hedge on the forward market by selling virtual storage. Typically, such a virtual storage is structured based on the stylized characteristics in order to ease negotiation and improve the liquidity of the transaction.

Table 1 This table lists the most important characteristics for modeling gas storage

The goal of gas storage optimization is to find an optimal plan for withdrawing and injecting gas into the storage over a certain period of time subject to the above-mentioned constraints. Withdrawing gas from the storage corresponds to going short, and injecting gas into it to going long, in the spot market with respect to a certain storage level. Hence, gas storage optimization can be seen as the problem of identifying optimal actions in an uncertain and restricted market environment to maximize an expected terminal utility of accrued wealth. More formally, let us consider an agent trading in a market and let \(h_k\) denote the action she takes on day k. A trading strategy over the whole trading horizon is then collected in \(H=\{h_0,\ldots ,h_{K-1}\}\). At maturity, the agent gains utility \(U(W_H)\) based on the stochastic terminal wealth \(W_H\) that she accrued by trading according to the strategy H. The agent seeks to identify an optimal strategy \(H^*\) satisfying

$$\begin{aligned} H^*=\underset{H}{\text {arg}\max }\ {\mathbb {E}}_{\mathbb {P}} [U(W_H)]. \end{aligned}$$
(1)

Reinforcement learning (Sutton and Barto 2018) is a broad and very active area of research, suggesting a plethora of algorithms to solve intricate optimization problems. A very popular strand of deep reinforcement learning focuses on approximating optimal actions by (e.g., feed-forward) neural networks; see Definition 1 below. Neural networks are very suitable for such tasks because of their universality and their efficient trainability. Parameters \(\theta \) of these network strategies \(G^\theta =\{g^\theta _0,\ldots ,g^\theta _{K-1}\}\) are trained to maximize an estimate of the expected terminal utility, i.e., to solve

$$\begin{aligned} \max _\theta {\mathbb {E}}_\mathbb {P} \left[ U\left( W_{G^\theta }\right) \right] . \end{aligned}$$
(2)

Notice that, in contrast to reinforcement learning, we neither calculate intermediate value functions nor calculate actions on states which are unlikely to appear. This considerably reduces the numerical complexity of the method. Additionally, we are constrained neither to time-consistent settings nor to expected reward settings.

Definition 1

(Feedforward neural network) Let \(L, d_0, d_L\in {\mathbb {N}}\). A feed-forward neural network \({g^\theta :\,{\mathbb {R}}^{d_0}\rightarrow {\mathbb {R}}^{d_L}}\) is defined as

$$\begin{aligned} g^\theta (x)=A^{L}\circ \varphi \circ A^{L-1}\circ \ldots \circ \varphi \circ A^{1}(x),\quad \end{aligned}$$
(3)

where

  • \(L\in \mathbb {N}\) is the number of layers (\(L-1\) hidden layers),

  • \(\varphi (\cdot )\) denotes a non-linear activation function that is applied component-wise, e.g., the sigmoid activation function \(\varphi (\cdot )=(1+e^{-\,\cdot })^{-1}\), and

  • \(A^{l}, l=1,\ldots ,L\) denote affine linear maps in the respective dimensions, whose parameters are stored in \(\theta \in {\mathbb {R}}^q\) for some \(q\in \mathbb {N}\).
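To make Definition 1 concrete, the following minimal sketch evaluates the composition (3) in plain Python. The layer widths, the random initialization and the helper names (`init_params`, `feed_forward`) are illustrative assumptions and not part of the original implementation.

```python
import math
import random


def sigmoid(z):
    """Sigmoid activation, applied to a single number."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Affine map x -> W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_params(dims, seed=0):
    """Draw small random parameters theta for widths dims = [d_0, ..., d_L]."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(d_in)] for _ in range(d_out)],
         [0.0] * d_out)
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def feed_forward(theta, x):
    """Evaluate A^L o phi o A^(L-1) o ... o phi o A^1 at x."""
    for weights, bias in theta[:-1]:
        x = [sigmoid(z) for z in affine(weights, bias, x)]
    weights, bias = theta[-1]
    return affine(weights, bias, x)  # no activation after the last affine map


theta = init_params([3, 16, 1])      # illustrative: L = 2, d_0 = 3, d_1 = 16, d_L = 1
output = feed_forward(theta, [0.1, 0.5, 0.2])
```

In this illustrative configuration, \(\theta \) collects \(q=3\cdot 16+16+16\cdot 1+1=81\) parameters.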

Note that in reinforcement learning, it is commonly assumed that environments can be described by (known or unknown) Markov decision processes. However, in many real-world applications and in particular in financial markets, information above and beyond knowledge of the current state can be used to better predict the dynamics of the environment and improve control, rendering Markov assumptions unrealistic. While enlarging the state space can partially act as remedy, we highlight that the approach outlined above does not necessarily require a Markovian framework. Instead, we approximate trading strategies along the lines of deep hedging without restricting the state space or even restricting ourselves to specific market dynamics.

Finally, for obvious reasons, we want to note that the trading action \(h_k\) should only depend on information which is available in the market up until time k. This entails the parameterization of the strategy \(h_k\) on a given day k by a neural network \(g^\theta _k\) mapping appropriate market information and storage levels to trading actions.

3 SMod: intrinsic spot trading

Following the machine learning approach outlined in Sect. 2, we introduce in what follows a deep hedging model for gas storage optimization that is based on trading day-ahead prices of gas. Note that in commodities markets, day-ahead futures or forwards are seen as a close proxy of the spot price. Therefore, we tacitly refer to trading activities in the day-ahead price of gas as spot trading. For simplicity, we assume no discounting and zero transaction cost, i.e., \(\kappa \equiv 0\); for a more general formulation including costs, see Remark 1.

Let \(S=(S_k)_{k\in {\mathbb {T}}}\) with \(S_k:=F(k,k+1,k+1)\) denote the \(\mathbb {F}\)-adapted gas spot price, and \(h^S_k\) the \(\mathcal {F}_k\)-measurable action on day k. Here, \(h^S_k>0\) refers to an injection of \(|h^S_k|\) MWh into the storage and \(h^S_k<0\) refers to a withdrawal of \(|h^S_k|\) MWh from it. A trading strategy over the whole trading horizon is denoted by \(\widetilde{H}^S=\{h^S_0,h^S_1,\ldots ,h^S_{K-1}\}\) and its terminal value is given by \((\widetilde{H}^S\bullet S)_{K-1}:=\sum _{k=0}^{K-1} h^S_k S_k\). Moreover, the storage level (or working gas) \(H_n^S\) on day n is given by

$$\begin{aligned} H_{n}^S:=\sum _{k=0}^{n-1}h^S_k, \end{aligned}$$

with an initially empty storage, i.e., \(H_0^S:=0\).

Suppose a storage manager’s preferences can be expressed through a (concave, non-decreasing) utility function \(U:{\mathbb {R}}\rightarrow {\mathbb {R}}\). In line with Sect. 2, she aims to identify strategies maximizing her expected utility of terminal wealth, i.e., to maximize

$$\begin{aligned} \mathbb {E}_\mathbb {P}\left[ U(W_{K-1})\right] \end{aligned}$$
(4)

over all eligible \(\widetilde{H}^S\), where

$$\begin{aligned} W_{K-1}&:=\sum _{k=0}^{K-1}-h^S_k S_k \end{aligned}$$
(5)

denotes the resulting terminal profit and loss (P&L). The optimization is subject to the constraints

$$\begin{aligned}&H_K^S = 0, \end{aligned}$$
(6)
$$\begin{aligned}&0 \, \le H_k^S \, \le \, c, \qquad \text {and } \qquad \ell _k \, \le \,h^S_k \, \le \, u_k, \end{aligned}$$
(7)

for all \(k\in \mathbb {T}\). The constraint (6) says that the storage must be empty at maturity; if one does not adhere to that, a contractually agreed penalty becomes due. Unless gas prices become negative, any profit-seeking agent would comply with (6) naturally, since profits can only be generated by disposing of previously stored gas. The daily constraints (7) can be transformed and merged to a single target range. Indeed, one simply imposes

$$\begin{aligned} \widetilde{\ell }_k \, \le \, h^S_k\, \le \,\widetilde{u}_k, \end{aligned}$$
(8)

where

$$\begin{aligned} \widetilde{\ell }_k:=\max \left\{ \ell _k,-H_k^S\right\} ,\qquad \text {and}\qquad \widetilde{u}_k:=\min \left\{ u_k,c-H_k^S\right\} . \end{aligned}$$
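The merged bounds in (8) can be sketched in a few lines. The function name `effective_bounds` and the numerical values in the usage note are hypothetical and serve only as an illustration.

```python
def effective_bounds(level, lower, upper, capacity):
    """Merge rate limits and level limits into one range for the action h_k.

    level    -- current storage level H_k (0 <= level <= capacity)
    lower    -- maximal withdrawal l_k (non-positive)
    upper    -- maximal injection u_k (non-negative)
    capacity -- storage capacity c
    """
    lo = max(lower, -level)            # cannot withdraw more than is stored
    hi = min(upper, capacity - level)  # cannot inject beyond the capacity
    return lo, hi
```

For instance, with a hypothetical capacity of 250,000 MWh, rate limits of −600 and 2808 MWh per day and a current level of 249,000 MWh, the admissible range is \([-600, 1000]\): the injection is capped by the remaining free capacity rather than by the rate limit.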

Pursuing a strategy learning approach, we approximate each action \(h^S_k\) in terms of a deep neural network \(g_k\); for ease of readability, we henceforth skip the dependence on the model parameters \(\theta \). The inputs to these network strategies are the current spot price \(S_k\), the latest storage fill level \(H_k^S\) (that iteratively depends on the previous neural networks) and the time k, i.e., \(g_k = g_k\big (k,H_k^S,S_k\big )\). These K networks are summarized in the storage schedule \(\widetilde{G}=\{g_0, \ldots , g_{K-1}\}\). Note that we allow for parameter sharing amongst the network instances \(g_k\), i.e., the number of distinct neural networks \(N\in {\mathbb {N}}\) can be significantly smaller than the number of trading days. In the most extreme case, the present framework allows for modeling each strategy with the same network, i.e., \(g_k\equiv g\) for all \(k\in \mathbb {T}\). It needs to be noted that the inputs of the neural networks in the numerical tests below were normalized in order to ensure a swift and stable learning process. The model parameters of \(\widetilde{G}\) were trained with standard Adam stochastic gradient descent on negative expected utility

$$\begin{aligned} \mathbb {E}_\mathbb {P}\left[ -U\left( \sum _{k=0}^{K-1}-g_k\left( k,H_k^S,S_k\right) S_k\right) \right] , \end{aligned}$$
(9)

subject to the constraints (6)–(8).
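The iterative dependence of the storage level on the previous actions can be illustrated by rolling a strategy forward along one price path. The sketch below assumes that each policy output lies in \([0,1]\) and is mapped linearly onto the daily range (8); `toy_policy` is a hypothetical stand-in for a trained network \(g_k\), all numbers are invented, and the terminal constraint (6) is deliberately not enforced here.

```python
import math


def rollout(policy, spot, lower, upper, capacity):
    """Apply a policy day by day; return actions, storage levels and P&L.

    policy(k, level, price) must return a number in [0, 1], which is mapped
    linearly onto the admissible range [lo, hi] of day k.
    """
    level, pnl = 0.0, 0.0              # empty storage, zero wealth at k = 0
    actions, levels = [], [level]
    for k, price in enumerate(spot):
        lo = max(lower[k], -level)
        hi = min(upper[k], capacity - level)
        action = lo + policy(k, level, price) * (hi - lo)
        pnl -= action * price          # injecting costs money, withdrawing earns it
        level += action
        actions.append(action)
        levels.append(level)
    return actions, levels, pnl


def toy_policy(k, level, price):
    """Hypothetical rule: inject when the price is low, withdraw when it is high."""
    return 1.0 / (1.0 + math.exp(price - 20.0))


spot = [15.0, 15.0, 25.0, 25.0]
actions, levels, pnl = rollout(toy_policy, spot, [-100.0] * 4, [100.0] * 4, 150.0)
```

Because every action is constructed inside its admissible range, the level constraint in (7) holds by construction along the whole path; only the final emptying of the storage requires additional care.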

3.1 Training setup

In the following, we state the precise training setup by the example of exponential utility \( U(\cdot ):=(1-e^{-\gamma \cdot })/\gamma \), with risk aversion rate \(\gamma \in {\mathbb {R}}^+\).

  • Training Data: time horizon of storage \(\mathbb {T}\), M trajectories of the spot \((S^i_k )_{k\in \mathbb {T};i=1,\ldots ,M}\).

  • Training object: storage action (withdrawal or injection rate) \(\widetilde{G}\) over the whole storage horizon, that is a neural network consisting of \(N\in {\mathbb {N}}\) (\(N\le K\)) distinct sub-networks, each of which has L layers. The network’s input is time as well as respective spot and storage fill level.

  • Training criterion: minimize an estimate of expected negative utility over batches \(B\subset \{1,\ldots ,M\}\) of training data, i.e.,

    $$\begin{aligned} \min _{\widetilde{G}(S^i)\in \mathcal {G}^i,i\in B}\frac{1}{|B|} \sum _{i\in B}-U\left( W^i_{K-1}\right) , \end{aligned}$$

    where

    $$\begin{aligned} W^i_{K-1}&:=\sum _{k=0}^{K-1}-g_k(k,H^{S^i}_k,S^i_k) S^i_k,\\ \mathcal {G}^i&=\left\{ \widetilde{G}(S^i) ~\Big | ~H_K^{S^i}=0;\ \widetilde{\ell }_k\le g_k\left( k,H_k^{S^i},S^i_k\right) \le \widetilde{u}_k\, ~\text {for}~k\in \mathbb {T}\right\} . \end{aligned}$$
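The training criterion above amounts to a sample mean of negative utilities. A minimal sketch with exponential utility follows; the two batches of terminal P&Ls are hypothetical and chosen only to show the effect of risk aversion.

```python
import math


def exp_utility(wealth, gamma):
    """Exponential utility U(w) = (1 - exp(-gamma * w)) / gamma."""
    return (1.0 - math.exp(-gamma * wealth)) / gamma


def batch_loss(terminal_pnls, gamma):
    """Sample mean of -U(W^i) over one batch of scenarios."""
    return sum(-exp_utility(w, gamma) for w in terminal_pnls) / len(terminal_pnls)


# Two hypothetical batches with the same mean P&L but different spread.
steady = [1.0, 1.0, 1.0, 1.0]
risky = [0.0, 2.0, 0.0, 2.0]
loss_steady = batch_loss(steady, gamma=3.0)
loss_risky = batch_loss(risky, gamma=3.0)
```

Since U is concave, the batch with the wider spread receives the larger loss although both batches have the same mean, which is how the risk aversion rate \(\gamma \) shapes the trained strategy.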

Remark 1

For the full case as described in Table 1, where costs \(\kappa \) and C are non-zero, terminal profit and loss is given by

$$\begin{aligned} W_{K-1}:=\left( \sum _{k=0}^{K-1}-h^S_k S_k-\big |h^S_kS_k\big |\cdot \kappa \right) -C. \end{aligned}$$

The training can be performed analogously.
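As a small worked example of Remark 1, the following sketch evaluates the terminal P&L with a proportional cost \(\kappa \) and a fixed cost C. The function name and all numbers are hypothetical.

```python
def terminal_pnl(actions, spot, kappa=0.0, fixed_cost=0.0):
    """Terminal P&L with proportional cost kappa and fixed cost C."""
    trading = sum(-h * s - abs(h * s) * kappa for h, s in zip(actions, spot))
    return trading - fixed_cost


actions = [100.0, -100.0]     # inject 100 MWh, then withdraw it again
spot = [10.0, 14.0]           # hypothetical prices
pnl_free = terminal_pnl(actions, spot)
pnl_cost = terminal_pnl(actions, spot, kappa=0.01, fixed_cost=50.0)
```

Without costs the round trip earns 400; with \(\kappa =0.01\) and \(C=50\) the proportional costs of 10 and 14 and the fixed cost reduce this to 326.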

Remark 2

Of course, a more general path dependence could be considered here to deal with possible non-Markovianity.

For numerical testing, spot curves of gas as well as benchmark strategies were provided by Axpo Solutions AG in the form of \(10{,}000\times 351\) matrices, representing \(M=10{,}000\) scenarios of \(K=351\) trading days. Benchmark strategies had been derived using an LSMC technique. We would like to emphasize that the proposed framework is not linked to specific stochastic dynamics of the spot price scenarios, i.e., we can exploit the methodology regardless of the model choice. Therefore, we leave the nature of the scenarios unspecified. If required, one can enrich the base scenarios with arbitrary stress scenarios.

The neural network model was implemented in tensorflow.keras with the sigmoid activation function. The daily bounds \(\widetilde{\ell }_k\) and \(\widetilde{u}_k\) from (8) and the network-based action \(g_k\) were parameterized via the inverse of the linear transformation that maps them to 0, 1 and \(\frac{g_k-\widetilde{\ell }_k}{\widetilde{u}_k-\widetilde{\ell }_k}\in [0,1]\), respectively. For the final zero storage constraint, we additionally checked for every k that

$$\begin{aligned} H^S_k \le \sum _{i=k+1}^{K-1}|\ell _i| \end{aligned}$$
(10)

prevailed, in order to ensure that an empty final storage \(H^S_K=0\) remained reachable. If the condition was violated on any day, the upper action bounds on all subsequent days were overridden by their lower counterparts, forcing a complete withdrawal of storage until maturity. In reality, it is possible to leave a non-empty storage by paying the penalty. Yet, for our modeling, the zero final storage rule was strictly adhered to. Figure 2 visualizes the constraints: the left plot shows normalized zero storage constraint (10), and the right plot shows daily injection and withdrawal bounds (8). The daily constraints entail two regimes with

$$\begin{aligned} \ell _k={\left\{ \begin{array}{ll} -600, &{} \quad {\text {if}} \;k\le 170,\\ -3072,&{}\quad {\text {otherwise}}, \end{array}\right. } \qquad u_k={\left\{ \begin{array}{ll} 2808,&{}\quad {\text {if}}\; k\le 200,\\ 408,&{}\quad {\text {otherwise}}. \end{array}\right. } \end{aligned}$$
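The reachability check (10) can be illustrated with the withdrawal regime stated above, reading the right-hand side of (10) as the total remaining withdrawal capacity, i.e., summing the magnitudes of the (negative) bounds \(\ell _i\). The capacity value and the helper names are assumptions made for this sketch; the capacity is not stated in this section.

```python
K = 351                                   # trading days, as in the data set
CAPACITY = 250_000.0                      # hypothetical capacity c (assumption)

lower = [-600.0 if k <= 170 else -3072.0 for k in range(K)]


def max_emptiable_level(k):
    """Right-hand side of (10): largest level that can still be withdrawn
    on the days k+1, ..., K-1 (the bounds are negative, so magnitudes are summed)."""
    return sum(-lower[i] for i in range(k + 1, K))


def capped_upper(k, level, upper_k):
    """Override the injection bound once condition (10) is violated."""
    if level > max_emptiable_level(k):
        return lower[k]                   # forces maximal withdrawal
    return upper_k


# First day on which a completely full storage could no longer be emptied.
critical_day = next(k for k in range(K) if CAPACITY > max_emptiable_level(k))
```

Under the assumed capacity of 250,000 MWh, this sketch yields day 269 as the first day on which a full storage can no longer be emptied, which coincides with the critical day quoted in the caption of Fig. 2.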
Fig. 2

The daily constraints of a gas storage. The left figure visualizes the empty final storage constraint (10); the y-axis is normalized by the total storage capacity c. The critical boundary is reached on trading day 269. In other words, if the relative storage level is beyond the blue line at any time after trading day 269, the only admissible action is maximal withdrawal up until maturity. The right figure visualizes the daily injection and withdrawal constraints

The network strategies \(g_k\) were trained based on the spot prices provided by Axpo Solutions AG. The data set was split into a training set of 6000 and a validation set of 4000 scenarios in order to perform in-sample and out-of-sample tests. LSMC benchmark strategies were optimized on the entire set of 10,000 scenarios in order to serve as a suitable proxy of the optimal solution. In the course of various experiments, we assessed the required training time depending on the depth and the number of distinct neural networks (\(N\le K\)), different learning rates and the batch size in order to empirically fine-tune a suitable setting for SMod. The training generally concluded quickly and is well manageable on a standard 8-core notebook. Illustratively, the training of SMod using as many as K neural networks and 1000 epochs on 6000 scenarios takes less than 2 h. A core strength of the proposed deep hedging approach, however, is that the training of strategy networks can be performed when time is not pressing (e.g., on a weekly basis) and trained strategies can be readily evaluated at any given time within a couple of seconds. This runtime for the evaluation is competitive with the LSMC approach. Moreover, it turned out that it is not necessary to build SMod on K neural networks. In fact, we found that \(N=12\) instances, each used recurrently during a month, with \(L=2\) and \(d_1=16\) (see Definition 1 above) already provided a decent approximation of the optimal policy. Indeed, after 1000 epochs on 6000 scenarios, with a learning rate of 0.001, a batch size of 64, and a risk aversion rate of \(\gamma =3\), the strategy of the artificial financial agent gets convincingly close to the benchmark solution. Figure 3 provides a visualization of the P&L line-up between the spot-only and the benchmark model in in-sample and out-of-sample tests, as well as a visualization of the storage fill levels of SMod and of the benchmark, respectively.
A comparison of descriptive statistics on the terminal P&L between SMod and the benchmark is reported in the table at the bottom of Fig. 6. When we used fewer instances \(N<12\), ceteris paribus, we encountered inferior performance after the same number of epochs. It turns out that a higher parameter N promotes a quicker learning of the required solution complexity.

Fig. 3

A line-up between the performance of the spot-only model (SMod) and that of the benchmark (LSMC). The upper left plot shows all the spot price scenarios. The upper right plot shows the storage fill levels across all scenarios as inferred from the neural networks of SMod, which are similar to those of the benchmark actions. The scale is normalized by the storage capacity c. The optimal policy tends to inject gas until the storage capacity is reached, and withdraws it after a certain waiting period until the storage is empty again. This is in line with the underlying seasonality pattern. The plots below compare the terminal P&L between SMod and the benchmark in million CHF. The distribution on the left is based on the training set and that on the right on the test set. Both in- and out-of-sample results are compellingly close to the benchmark

4 SFMod: intrinsic spot and forward trading

In the following, we extend the previous model by trading additionally on the front month rolling forwards with delivery period of a whole month. A front month rolling forward curve contains at any point in time the first nearby monthly forward. We inherently assume that a monthly forward contract is only traded before its delivery period starts (and no longer during the delivery period), and that delivery obligations are valued using the spot prices whose delivery days lie within the delivery period. Note that we restrict ourselves to those forwards that have delivery months within the time horizon of the storage problem. A visualization of the forward rolling mechanism is provided in Fig. 4.

Fig. 4

The mechanism of the rolling strategies in SFMod

Let \(0=n_0<n_1<\cdots<n_J<K\) be the first days of the months \(\mathcal {J}=\{0,1,\ldots ,J\}\) respectively. Let \(h^j_k\) with \(j\in \mathcal {J}\) denote the action on day k on the forward \(F(k,n_j, n_{j+1}-1)\), which has the delivery period \([n_j, n_{j+1}-1]\). Here, \(h^j_k>0\) refers to buying and \(h^j_k<0\) refers to selling \(F(k,n_j, n_{j+1}-1)\). The above assumption implies that \(h^j_k=0\) for \(k<n_{j-1}\) and for \(k\ge n_j\); in particular \(h^J_k=0\) for all \(k\in \mathbb {T}\). Consistently with SMod, we aim to maximize

$$\begin{aligned} \mathbb {E}_\mathbb {P} \big [U(W_{K-1})\big ] \end{aligned}$$
(11)

with the terminal P&L

$$\begin{aligned} W_{K-1} := W^S_{K-1} + W^F_{K-1}. \end{aligned}$$

\(W^S_{K-1}\) denotes the terminal P&L from the spot trading and is still given by (5). \(h^S_k\) schedules the storage activity for the next day. Similarly, \(W^F_{K-1}\) denotes the terminal P&L from trading the monthly forward, and is defined as

$$\begin{aligned} W^F_{K-1}= \sum _{j=1}^{J-1} ~\sum _{ k=n_{j-1} }^{n_j-1}\left( -h^j_k F\left( k, n_j, n_{j+1}-1\right) \left( n_{j+1}-n_{j}\right) \right) . \end{aligned}$$
(12)
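The double sum in (12) can be sketched as follows. The function `forward_pnl`, the short toy calendar and the flat forward price are hypothetical; the sketch assumes that on each day only the front-month forward is traded, so that one action and one price per day suffice.

```python
def forward_pnl(forward_actions, forward_prices, month_starts):
    """Terminal P&L of front-month forward trading, following (12).

    forward_actions[k] -- daily volume h_k^j bought (>0) or sold (<0) on day k
                          in the forward delivering in the next month j
    forward_prices[k]  -- price F(k, n_j, n_{j+1} - 1) of that forward on day k
    month_starts       -- [n_0, n_1, ..., n_J], first days of the months
    """
    pnl = 0.0
    last = len(month_starts) - 1                   # index J
    for j in range(1, last):                       # j = 1, ..., J-1
        days_in_month = month_starts[j + 1] - month_starts[j]
        for k in range(month_starts[j - 1], month_starts[j]):
            pnl -= forward_actions[k] * forward_prices[k] * days_in_month
    return pnl


month_starts = [0, 3, 5, 8]                        # toy calendar with J = 3
forward_actions = [1.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0]
forward_prices = [20.0] * 8
pnl = forward_pnl(forward_actions, forward_prices, month_starts)
```

In this toy calendar, buying one unit per day for the two-day month 1 costs 40, while selling two units per day for the three-day month 2 earns 120, giving a forward P&L of 80.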

For a forward with the delivery period \([n_j, n_{j+1}-1]\), the daily delivery quantity \(d^j\) is fixed on day \(n_j -1\) for \(j\ge 0\), and is given by

$$\begin{aligned} d^j:=\sum _{k=n_{j-1}}^{n_j-1} h^j_k,\qquad \text {for}~ j>0, \end{aligned}$$
(13)

and \(d^0:=0\). The storage level \(H_n\) on day n depends on both spot and monthly forward trading activities. For \(n\in [n_{I-1},n_I)\), \(H_n\) is given by

$$\begin{aligned} H_{n}:=\sum _{k=0}^{n-1}h^S_k + \sum _{j=1}^{I-2} \left( d^j \left( n_{j+1}-n_{j}\right) \right) +d^{I-1} \left( n-n_{I-1}+1\right) \end{aligned}$$

with initially empty storage, i.e. \(H_0:=0\). The optimization of (11) is subject to the constraints

$$\begin{aligned} H_K&= 0,\end{aligned}$$
(14)
$$\begin{aligned} 0 \,\le H_k \, \le \, c, \qquad&\text {and } \qquad \ell _k-d^j \, \le \,h^S_k \, \le \, u_k-d^j \end{aligned}$$
(15)

for \(n_j\le k< n_{j+1}\), \(j\le J-1\), and

$$\begin{aligned} h^j_k&\le \alpha \frac{c}{n_{j+1}-n_j} \end{aligned}$$
(16)

for some \(\alpha \in [0,1]\). Alternatively, the daily constraints (15) can be expressed as iterative daily bounds

$$\begin{aligned} \widetilde{\ell }_k \, \le \, h^S_k&+d^j\, \le \,\widetilde{u}_k, \end{aligned}$$
(17)

where

$$\begin{aligned} \widetilde{\ell }_k:=\max \left\{ \ell _k,-H_k\right\} ,\qquad&\text {and}\qquad \widetilde{u}_k:=\min \left\{ u_k,c-H_k\right\} . \end{aligned}$$

Remark 3

In SFMod, the aggregated action on day k is \((h^S_k+d^j)\) for all \(n_j\le k< n_{j+1}\). Hence, the action of pure spot trading is restricted by the daily delivery amount \(d^j\), which results from \(h^j_{\tilde{k}}\) with \(n_{j-1}\le \tilde{k}<n_j\). In other words, forward trading activities have a delayed effect on the spot trading, but spot trading does not affect forward trading. The delivery quantities of the upcoming days in the current month are fixed after the respective forward trading has already terminated. The delivery obligations of the due forwards restrict the spot trading activities of the current month, as the sum of daily delivery and the spot trading is bounded by the daily withdrawal and injection rates. The constraint (16) ensures that the maximally traded amount can be stored in case of no spot trading. It can also be interpreted as a liquidity constraint. Moreover, with the scaling factor \(\alpha \in [0,1]\), one can bound the volume of forward trading and maintain an appropriate balance between spot and forward trading.
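The interplay described in Remark 3 can be sketched as follows: the delivery quantities (13) are accumulated from the forward actions of the preceding month, the bound (16) limits each forward action, and the admissible range of the spot action is shifted by the delivery so that the aggregated action respects the daily bounds. All function names and numbers are hypothetical.

```python
def delivery_quantities(forward_actions, month_starts):
    """Daily delivery d^j of each month j, following (13), with d^0 = 0."""
    d = [0.0]
    for j in range(1, len(month_starts)):
        d.append(sum(forward_actions[month_starts[j - 1]:month_starts[j]]))
    return d


def forward_cap(alpha, capacity, days_in_month):
    """Right-hand side of (16): bound on a single forward action."""
    return alpha * capacity / days_in_month


def spot_bounds(level, lower_k, upper_k, capacity, d_j):
    """Admissible range for the spot action h_k^S given the delivery d^j,
    so that the aggregated action h_k^S + d^j stays within the daily bounds."""
    lo = max(lower_k, -level) - d_j
    hi = min(upper_k, capacity - level) - d_j
    return lo, hi


month_starts = [0, 3, 5, 8]                        # toy calendar
forward_actions = [1.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0]
d = delivery_quantities(forward_actions, month_starts)
bounds = spot_bounds(1000.0, -600.0, 2808.0, 250_000.0, 500.0)
```

In the usage example, a daily delivery of 500 MWh into the storage widens the admissible spot withdrawal to 1100 MWh and narrows the admissible spot injection to 2308 MWh, since only the sum of both is subject to the rate limits.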

4.1 Training setup

Similarly to Sect. 3, we approximate for each trading day k actions \((h_k^S, h_k^j)\) by neural networks \(g_k=(g_k^S, g_k^F)\) collected in \(\widetilde{G}=\{g_0, \ldots , g_{K-1}\}\). Note that within this section, network strategies entail a two-dimensional output since in addition to actions in the spot market we also model strategies on the monthly forwards. For ease of notation, we abbreviate monthly forwards as \(F_k=F(k, n_j, n_{j+1}-1)\) for all \(j\in \mathcal {J}\). The most important aspects of the training can be summarized as follows.

  • Training data: time horizon of storage \(\mathbb {T}\), M trajectories of the spot \((S^i_k )_{k\in \mathbb {T};i=1,\ldots ,M}\), and of rolling month forward \((F^i_k)_{k\in \mathbb {T};i=1,\ldots ,M}\) respectively;

  • Training object: trading strategy network with two outputs, one for the spot action and one for the action in the rolling month forward, consisting of \(N\in {\mathbb {N}}\) (\(N\le K\)) distinct sub-networks, each of which has L layers. The network’s input is time as well as the respective spot price, forward price and storage fill level.

  • Training criterion: minimize an estimate of expected negative utility over batches \(B\subset \{1,\ldots ,M\}\) of training data, i.e.,

    $$\begin{aligned} \min _{\widetilde{G}\in \mathcal {G}^i, i\in B}\frac{1}{|B|} \sum _{i\in B}-U\left( W^{i,S}_{K-1} + W^{i,F}_{K-1} \right) , \end{aligned}$$

    where

    $$\begin{aligned} W^{i,S}_{K-1}&:=\sum _{k=0}^{K-1}-g_k^S\left( k,H^{S^i}_k,S^i_k, F^i_k\right) S^i_k,\\ W^{i,F}_{K-1}&:= \sum _{j=1}^{J-1} ~\sum _{ k=n_{j-1} }^{n_j-1}\left( -g^F_k\left( k,H^{S^i}_k,S^i_k, F^i_k\right) F^i_k(n_{j+1}-n_{j})\right) . \end{aligned}$$

    For each scenario i, \(\mathcal {G}^i\) contains those strategies \(\widetilde{G}\) that fulfill all constraints (13)–(16).

For numerical testing, 10,000 scenarios of spot as well as monthly forward price curves of 12 months were provided by Axpo Solutions AG. 10,000 rolling monthly forward curves were inferred, each of which contains only the first nearby contract on any trading day. As in Sect. 3, the data set was split into 6000 training and 4000 test scenarios. Furthermore, we relied on the same network architecture as in SMod, i.e., 12 distinct neural networks representing 12 months of trading with \(L=2\) and \(d_1=16\). Note, however, that forward trading is discontinued in the last month, because the corresponding contract delivers beyond the trading horizon of the storage. Taking into account the revised constraints (13)–(16), strategy networks were then trained for 1000 epochs, using Adam stochastic gradient descent with a learning rate of 0.001, a batch size of 64, and a risk aversion of \(\gamma =3\). Figure 5 visualizes the resulting policy in terms of the storage level across the scenarios over time with respect to different choices of \(\alpha \). Figure 6 provides a visualization and detailed summary statistics for comparing this model with SMod and its benchmark. The comparison is based on the same setup for the spot strategy component and on the same training conditions. Both \(\alpha \) and \(\gamma \) can be used to control the distribution accordingly (e.g., shrink the variance of the P&L). Compared with the spot-only model, SFMod entails not only a slightly higher P&L on average, but also a higher volatility.

Fig. 5

A comparison between the performance of SFMod with different choices for \(\alpha \) and that of the benchmark (LSMC). The plots compare the terminal P&L between SFMod (\(\alpha = 0.1\) and \(\alpha =0.5\)) and the benchmark in million CHF. The left chart exhibits the P&L on the training set and the right chart that on the test set. In both plots, the P&L distribution of SFMod is not conclusively more favorable than that of the benchmark. The optimal policy features a seasonality pattern consistent with that in SMod

Fig. 6

The boxplot and the table provide a line-up of the considered models. The first moment and the volatility of the P&L distribution are largest for SFMod. Moreover, they depend on the choice of \(\alpha \). Please note that the LSMC approach is still a competitive benchmark, even though it does not allow for forward trading activities

Figure 6 substantiates that SFMod, which allows trading activities on forwards, is the most favorable choice in terms of maximizing the expected terminal wealth. In comparison to SMod, it is slightly more involved in its technical setup, but in terms of computational time and effort, it is still well manageable on a standard 8-core notebook. Regarding SFMod, the higher first moment of the P&L distribution across all scenarios comes with a higher standard deviation. Furthermore, the first moment is sensitive to the limitation on forward market activities, expressed by the control variable \(\alpha \). With a lower risk aversion \(\gamma \), the expected profit can be increased even further at the expense of an increasing volatility. One direction of future work might be to generate superior P&L distributions with less risk. Another possible direction might be to increase the model-theoretic complexity with more forward curves. It needs to be noted that the performance of SFMod is not adversely affected if we further extend the scope of forward trading activities or incorporate more realistic model features such as, for instance, \(H^{S^i}_k\)-dependent transaction cost.

5 Conclusion

We proposed a flexible and powerful framework that is capable of dealing with the intricacy of optimizing underground gas storage facilities in the presence of forward markets. Traditional techniques such as, for instance, least-squares Monte Carlo (LSMC) or dynamic programming are subject to the so-called curse of dimensionality, whereas the proposed deep learning technique is hardly affected by the dimensionality. Moreover, our experimental results show that the proposed deep hedging approach performs as well as or better than the most established state-of-the-art LSMC benchmark. These advances pave the way for better storage and production plans of energy for very general, non-Markovian markets.