A deep learning model for gas storage optimization

To the best of our knowledge, the application of deep learning in quantitative risk management is still a relatively recent phenomenon. In this article, we use techniques inspired by reinforcement learning to optimize the operation plans of underground natural gas storage facilities. We provide a theoretical framework and assess the performance of the proposed method numerically against a state-of-the-art least-squares Monte Carlo approach. Owing to the inherent intricacy originating from the high-dimensional forward market as well as the numerous constraints and frictions, the optimization exercise can hardly be tackled by means of traditional techniques.

Mathematics Subject Classification (2010): 65K99, 91G60

1 Introduction

Natural gas prices exhibit distinct yearly seasonal patterns. Due to limited storage capacities and pronounced fluctuations in demand, prices tend to be lower in summer, and higher as well as more volatile in winter. Physical storage facilities are required in order to exploit this seasonality. Various market participants are willing to own or lease storage facilities, which creates high demand for underground gas storage facilities worldwide. As an example, US working gas in underground storage was at a record high in 2020 compared to the previous five years; see Figure 1. Given the rising prices of gas storage and the smoothing effect of storage facilities on gas price spreads, it is vital to optimize their usage for hedging, risk management and investment purposes.
Over the last decades, various articles have contributed to the modeling and optimization of energy storage. For standard references, see Section 12.6 of Geman (2009), Section 5.3.4 of Fiorenzani et al. (2012), and Holland (2007, 2008). Other references include, e.g., De Jong (2015); Boogert and De Jong (2008); Safarov and Atkinson (2017); Cummins et al. (2017); Carmona and Ludkovski (2010); Bjerksund et al. (2011); Thompson et al. (2009); Hénaff et al. (2018); Malyscheff and Trafalis (2017). Much of the literature puts more emphasis on the modeling (and prediction) of gas prices than on developing algorithms for the optimization of storage plans. Conventionally, storage plans have been optimized by means of least-squares Monte Carlo (LSMC) approaches or support vector machine regression (SVR) (Malyscheff and Trafalis, 2017), treated as a stochastic control problem with HJB equations (Thompson, 2016), or approached via real option theory (Thompson et al., 2009). The bottleneck of classical optimization techniques is the so-called curse of dimensionality, i.e., the running time often grows exponentially in the number of state space dimensions. Based on techniques inspired by reinforcement learning, one manages to tackle these intricate optimization tasks without simplifications. To this end, one designs an artificial financial agent with superhuman experience and a decent risk appetite, who is able to trade off numerous aspects without further ado. Regarding the problem of pricing commodity swing options, Daluiso et al. (2020) employ an actor-critic reinforcement learning technique to approximate actions in day-ahead forward markets that maximize accumulated expected payoffs. In contrast to reinforcement learning, our artificial agent does not learn to act under all configurations, but only under those relevant for the given scenarios. Furthermore, neither Markov assumptions are made, nor is dynamic programming used. Beyond that, to the best of the authors' knowledge, techniques inspired by reinforcement learning have not been applied to gas storage and related problems. Thus, their full potential as well as new challenges in storage-related optimization problems are yet to be investigated.

Fig. 1: US natural gas (in contrast to liquefied natural gas, or briefly LNG) futures curve and storage information provided by the EIA (US Energy Information Administration), as per October 2, 2020; downloaded from https://www.eia.gov/naturalgas/storage/dashboard/. The upper plot shows one-year natural gas futures curves consisting of twelve monthly futures contracts with delivery months ranging from November 2020 to October 2021. It features the above-mentioned seasonal pattern, namely higher prices in winter and lower prices in summer. The lower plot shows weekly working gas in underground storage in the US lower 48. Gas storage in the year 2020 reached a record high compared with that of the previous five years.
In this article, we present a fresh machine learning approach to the optimization of gas storage. Along the lines of deep hedging (see Buehler et al. (2019)), we determine optimal strategy networks, i.e., neural networks that approximate optimal strategies for trading in spot and forward markets utilizing storage facilities. Optimality is understood as maximizing expected utility of accumulated wealth at terminal time. More specifically, we introduce two models of the intrinsic valuation type: a simple spot-only model (SMod) allowing for trades in a spot proxy (more precisely, day-ahead forwards) only, and a more involved spot-and-forward model (SFMod) additionally incorporating trades in monthly forwards with delivery periods. Traditionally, models with trades in spot proxies based on an artificial daily forward curve, implied from the tradable monthly forwards with delivery period, have been employed for simplicity. However, the main purpose of gas storage optimization is to maximize the profits or utility of storage managers rather than theoretical valuations. Therefore, the use of tradables, and thus the use of SFMod, is more relevant for gas storage optimization.
The paper is structured as follows: In Section 2, we provide a brief overview of important aspects of gas storage modeling and outline the machine learning concept that we employ for optimizing gas storage usage. In Section 3, we present the spot-only deep learning model (SMod) for trading strategies utilizing gas storage facilities. We compare our model in numerical tests against a set of benchmark strategies derived via LSMC. In Section 4, we present the spot-and-forward model (SFMod), which additionally includes trades in monthly forwards with delivery periods, and likewise investigate its performance in numerical tests.
Throughout this article, we consider a discrete time setting. Let the time instances T = {0, 1, 2, …, K − 1} for some K ∈ N be the trading horizon in days, and let (Ω, F, F, P) with F = (F_k)_{k∈T} be a filtered probability space with real-world measure P. Additionally, we assume the existence of an equivalent risk-neutral measure Q.

2 Optimizing gas storage by means of deep hedging
There are three types of underground natural gas storage: depleted natural gas or oil fields, aquifers, and salt caverns. They are often located close to a pipeline, which makes the delivery of physical transactions more convenient. Compared to storage through conversion to LNG, underground natural gas storage is bigger and cheaper, but restricted to regional use. In Table 1, we provide an overview of the main stylized characteristics of a gas storage facility that are relevant for modeling and optimization.
The goal of gas storage optimization is to find an optimal plan for withdrawing gas from and injecting gas into the storage over a certain period of time, subject to the above-mentioned constraints. Extracting gas from, respectively feeding gas into, the storage corresponds to going short or long in the spot market with respect to a certain storage level. Hence, gas storage optimization can be seen as the problem of identifying optimal actions in an uncertain and restricted market environment so as to maximize an expected terminal utility of accrued wealth. More formally, let us consider an agent trading in a market and let h_k denote the action she takes on day k. A trading strategy over the whole trading horizon is then collected in H = {h_0, …, h_{K−1}}. At maturity, the agent gains utility U(W_H) based on the stochastic terminal wealth W_H that she accrued by trading according to the strategy H. The agent seeks to identify an optimal strategy H* satisfying

    H* ∈ argmax_H E[U(W_H)].   (1)

Table 1: The most important characteristics for modeling gas storage (unit: therm or MWh):

- initial storage: 0 units (plus cushion gas)
- terminal storage: 0 units (plus cushion gas)
- capacity: c units
- injection rate on day k: u_k units (u_k > 0)
- withdrawal rate on day k: ℓ_k units (ℓ_k < 0)

Note that in underground storage, there usually is cushion or base gas, which is the volume of natural gas intended as permanent, non-withdrawable inventory to maintain minimal pressure. This cushion can be neglected for modeling purposes. For simplicity, the injection and withdrawal costs are assumed to be proportional to injection and withdrawal, respectively. In reality, these costs additionally depend on the pressure in the underground storage, which in turn depends on the level of working gas.

Reinforcement learning (Sutton and Barto, 2018) is a broad and very active area of research, suggesting a plethora of algorithms to solve intricate optimization problems. A very popular strand of deep reinforcement learning focuses on approximating optimal actions by (e.g., feed-forward) neural networks; see Definition 1 below. Neural networks are very suitable for such tasks because of their universality and their efficient trainability. Parameters θ of these network strategies G_θ = {g_{θ,0}, …, g_{θ,K−1}} are trained to maximize an estimate of the expected terminal utility, i.e., to solve

    max_θ E[U(W_{G_θ})],   (2)

where each g_{θ,k} is a feed-forward neural network in the sense of Definition 1, i.e., a composition F = A_L ∘ φ ∘ A_{L−1} ∘ … ∘ φ ∘ A_1, in which

- L ∈ N is the number of layers (L − 1 hidden layers),
- φ(⋅) denotes a non-linear activation function that is applied component-wise, e.g., the sigmoid activation function φ(⋅) = (1 + e^{−⋅})^{−1}, and
- A_l, l = 1, …, L, denote affine linear maps in the respective dimensions, whose parameters are stored in θ ∈ R^q for some q ∈ N.
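To make the composition of affine maps and activations in Definition 1 concrete, here is a minimal NumPy sketch (our illustrative code, not the paper's implementation; the input (k, H_k, S_k) and the hidden width 16 are example choices):

```python
import numpy as np

def sigmoid(x):
    """Component-wise sigmoid activation φ(x) = (1 + e^(-x))^(-1)."""
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(x, weights, biases):
    """Evaluate F = A_L ∘ φ ∘ A_{L-1} ∘ ... ∘ φ ∘ A_1: affine maps
    interleaved with the activation; no activation on the output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = sigmoid(W @ x + b)               # hidden layers
    return weights[-1] @ x + biases[-1]      # output layer (affine only)

# Example: L = 2, one hidden layer of width 16; the input is
# (time k, storage level H_k, spot price S_k), the output a scalar action.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 3)), rng.normal(size=(1, 16))]
biases = [np.zeros(16), np.zeros(1)]
action = feedforward(np.array([0.0, 0.0, 20.0]), weights, biases)
```

In a framework such as tensorflow.keras, the same architecture is a `Dense(16, activation="sigmoid")` layer followed by a linear `Dense(1)` output.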
Note that in reinforcement learning, it is commonly assumed that environments can be described by (known or unknown) Markov decision processes. However, in many real-world applications, and in particular in financial markets, information above and beyond knowledge of the current state can be used to better predict the dynamics of the environment and improve control, rendering Markov assumptions unrealistic. While enlarging the state space can partially act as a remedy, we highlight that the approach outlined above does not necessarily require a Markovian framework. Instead, we approximate trading strategies along the lines of deep hedging without restricting the state space or restricting ourselves to specific market dynamics.
Finally, for obvious reasons, we note that the trading action h_k should only depend on information available in the market up until time k. This entails parameterizing the strategy h_k on a given day k by a neural network g_{θ,k} mapping appropriate market information and storage levels to trading actions.
3 SMod: intrinsic spot trading

Following the machine learning approach outlined in Section 2, we introduce in what follows a deep hedging model for gas storage optimization that is based on trading day-ahead prices of gas. Note that in commodities markets, day-ahead futures or forwards are seen as a close proxy of the spot price. Therefore, we tacitly refer to trading activities in the day-ahead price of gas as spot trading. For simplicity, we assume no discounting and zero transaction costs, i.e., κ ≡ 0; for a more general formulation including costs, see Remark 1.
Let S_k = F(k, k+1, k+1), k ∈ T, denote the F-adapted gas spot price, and h^S_k the F_k-measurable action on day k; h^S_k > 0 refers to an injection of h^S_k MWh into the storage, and h^S_k < 0 to a withdrawal of −h^S_k MWh from it. A trading strategy over the whole trading horizon is denoted by H^S = {h^S_0, h^S_1, …, h^S_{K−1}}. Moreover, the storage level (or working gas) H^S_n on day n is given by

    H^S_n = Σ_{k=0}^{n−1} h^S_k,

with an initially empty storage, i.e., H^S_0 := 0. Suppose a storage manager's preferences can be expressed through a (concave, non-decreasing) utility function U: R → R. In line with Section 2, she aims to identify strategies maximizing her expected utility of terminal wealth, i.e., to maximize E[U(W^S_{K−1})] over all eligible H^S, where

    W^S_{K−1} = −Σ_{k=0}^{K−1} h^S_k S_k   (5)

denotes the resulting terminal profit and loss (P&L); the negative sign is added in analogy to the direction of the cashflows. The optimization is subject to the constraints

    H^S_K = 0   (6)

and

    ℓ_k ≤ h^S_k ≤ u_k,  0 ≤ H^S_{k+1} ≤ c,   (7)

for all k ∈ T. Constraint (6) says that the storage must be empty at maturity; if one does not adhere to it, a contractually agreed penalty becomes due. Unless gas prices become negative, any profit-seeking agent would comply with (6) naturally, since profits can only be generated by disposing of previously stored gas. The daily constraints (7) can be transformed and merged into a single target range. Indeed, one simply imposes

    ℓ̃_k ≤ h^S_k ≤ ũ_k,   (8)

where ℓ̃_k := max(ℓ_k, −H^S_k) and ũ_k := min(u_k, c − H^S_k). Pursuing a strategy learning approach, we approximate each action h^S_k by a deep neural network g_k; for ease of readability, we henceforth skip the dependence on the model parameters θ. The inputs to these network strategies are the current spot price S_k, the latest storage fill level H^S_k (which iteratively depends on the previous neural networks) and the time k, i.e., g_k = g_k(k, H^S_k, S_k). These K networks are summarized in the storage schedule G = {g_0, …, g_{K−1}}. Note that we allow for parameter sharing amongst the network instances g_k, i.e., the number of distinct neural networks N ∈ N can be significantly smaller than the number of trading days. In the most extreme case, the present framework allows for modeling each strategy with the same network, i.e., g_k ≡ g for all k ∈ T. It should be noted that the inputs of the neural networks in the numerical tests below were normalized in order to ensure a swift and stable learning process. The model parameters of G were trained with standard Adam stochastic gradient descent on the negative expected utility, subject to the constraints mentioned in Section 2.
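The bookkeeping above can be sanity-checked with a short sketch (illustrative toy numbers; `storage_levels`, `terminal_pnl` and `daily_bounds` are our hypothetical helper names):

```python
import numpy as np

def storage_levels(h):
    """H^S_0 = 0 and H^S_n = sum_{k<n} h^S_k (injection positive, withdrawal negative)."""
    return np.concatenate(([0.0], np.cumsum(h)))

def terminal_pnl(h, S):
    """Terminal P&L: W = -sum_k h^S_k S_k; buying gas costs cash, selling earns it."""
    return -float(np.sum(h * S))

def daily_bounds(H_k, lo_rate, up_rate, capacity):
    """Merged daily bounds: intersect the rate limits with what the current
    fill level H_k admits without under- or over-filling the storage."""
    return max(lo_rate, -H_k), min(up_rate, capacity - H_k)

# Toy trade: inject 1 MWh at 10 CHF, withdraw it at 15 CHF.
h = np.array([1.0, -1.0])
S = np.array([10.0, 15.0])
```

The toy trade leaves the storage empty at maturity, as constraint (6) requires, and realizes a profit of 5 CHF.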

Training setup
In the following, we state the precise training setup by the example of the exponential utility U(x) := (1 − e^{−rx})/r, with risk aversion rate r ∈ R_+.
- Training data: time horizon of storage T, M trajectories of the spot (S^i_k)_{k∈T; i=1,…,M}.
- Training object: storage action (withdrawal or injection rate) G over the whole storage horizon, that is, a neural network consisting of N ∈ N (N ≤ K) distinct sub-networks, each of which has L layers. The network's input is time as well as the respective spot price and storage fill level.
- Training criterion: minimize an estimate of the expected negative utility over batches B ⊂ {1, …, M} of training data, i.e.,

    min_θ −(1/|B|) Σ_{i∈B} U(W^{S,i}_{K−1}),

where W^{S,i}_{K−1} = −Σ_{k=0}^{K−1} g_k(k, H^{S,i}_k, S^i_k) S^i_k denotes the terminal P&L in scenario i.

Remark 1 For the full case as described in Table 1, where the costs κ and C are non-zero, the terminal profit and loss additionally accounts for the accrued injection and withdrawal costs. The training can be performed analogously.
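As a sketch of this criterion with the exponential utility from above (illustrative code, not the paper's implementation):

```python
import numpy as np

def exp_utility(w, r=3.0):
    """Exponential utility U(w) = (1 - exp(-r w)) / r with risk aversion r > 0."""
    return (1.0 - np.exp(-r * w)) / r

def batch_loss(terminal_pnls, r=3.0):
    """Monte-Carlo estimate of the negative expected utility over a batch:
    the quantity minimized by stochastic gradient descent."""
    return -float(np.mean(exp_utility(np.asarray(terminal_pnls), r)))
```

Since U is increasing and concave, minimizing `batch_loss` rewards higher terminal P&L while penalizing risk relative to a risk-neutral objective.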
Remark 2 Of course, a more general path dependence could be considered here to deal with possible non-Markovianity.
For numerical testing, spot curves of gas as well as benchmark strategies were provided by Axpo Solutions AG in the form of 1 000 × 351 matrices, representing M = 1 000 scenarios of K = 351 trading days. The benchmark strategies had been derived utilizing the LSMC technique. We emphasize that the proposed framework is not linked to specific stochastic dynamics of the spot price scenarios, i.e., we can exploit the methodology regardless of the model choice. Therefore, we leave the nature of the scenarios unspecified. If required, one can enrich the base scenarios with arbitrary stress scenarios.
The neural network model was implemented in tensorflow.keras with the sigmoid activation function. The network-based action g_k was parameterized via the daily bounds ℓ̃_k and ũ_k from (8): a sigmoid output in [0, 1] is mapped to [ℓ̃_k, ũ_k] using the inverse of the linear transformation (g_k − ℓ̃_k)/(ũ_k − ℓ̃_k) ∈ [0, 1]. For the final zero-storage constraint, we additionally checked for every k that condition (10) prevailed, namely that the current storage level did not exceed the amount that can still be withdrawn until maturity, in order to ensure that an empty final storage H^S_K = 0 remained reachable. If the condition was violated on any day, the upper action bounds on all subsequent days were overridden by their lower counterparts, forcing a complete withdrawal from storage until maturity. In reality, it is possible to leave a non-empty storage by paying the penalty; yet, for our modeling, the zero final storage rule was strictly adhered to. Figure 2 visualizes the constraints: the left plot shows the normalized zero-storage constraint (10), and the right plot shows the daily injection and withdrawal bounds (8) of the trained financial agent. The original daily constraints with the two regimes are still recognizable. The network strategies g_k were trained based on the spot prices provided by Axpo Solutions AG. The data set was split into a training set of 900 and a validation set of 100 scenarios in order to perform in-sample and out-of-sample tests. The LSMC benchmark strategies were optimized on the entire set of 1 000 scenarios; thus, they serve as the reference optimal solution. In the course of various experiments, we assessed the required training time depending on the depth and the number of distinct neural networks (N ≤ K), different learning rates and batch sizes, in order to empirically fine-tune a suitable setting for SMod.
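The bound-respecting parameterization of the actions can be sketched as follows (our reading of the text: an unconstrained network output is squashed to [0, 1] by a sigmoid and rescaled to the admissible range; the helper name is hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def admissible_action(raw_output, lo, hi):
    """Map an unconstrained network output to the admissible range [lo, hi]
    by squashing with a sigmoid and rescaling -- the inverse of the
    normalization (g - lo) / (hi - lo) in [0, 1] described in the text."""
    return lo + sigmoid(raw_output) * (hi - lo)
```

This makes the daily constraints (8) feasible by construction, so no penalty terms for them are needed during training.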
The training generally concluded quickly and is well manageable on a standard 8-core notebook. Illustratively, training SMod with as many as K neural networks in the implementation for 1 000 epochs on 900 scenarios takes less than 10 minutes. This runtime is competitive with the LSMC approach. Moreover, it turned out that it is not necessary to build SMod on K neural networks. In fact, we found that 12 instances with L = 2 and d_1 = 16 (see Definition 1 above) already provided a decent approximation of the optimal policy. Indeed, after 1 000 epochs on 900 scenarios, with a learning rate of 0.05, a batch size of 64, and a risk aversion rate of r = 3, the strategy of the artificial financial agent gets convincingly close to the benchmark solution. Figure 3 provides a visualization of the P&L line-up between the spot-only and the benchmark model in in-sample and out-of-sample tests, as well as a visualization of the storage fill levels of SMod and of the benchmark, respectively. A comparison of descriptive statistics of the terminal P&L between SMod and the benchmark is reported in the table at the bottom of Figure 6. Both in- and out-of-sample results are compellingly close to the benchmark. The optimal policy tends to inject gas until the storage capacity is reached, and withdraws it after a certain waiting period until the storage is empty again, in line with the underlying seasonality pattern.

4 SFMod: intrinsic spot and forward trading
In the following, we extend the previous model by additionally trading the front month rolling forwards with a delivery period of a whole month. A front month rolling forward curve contains at any point in time the first nearby monthly forward. We inherently assume that a monthly forward contract is only traded before its delivery period starts (and no longer during the delivery period), and that delivery obligations are valued using the spot prices whose delivery days lie within the delivery period. Note that we restrict ourselves to those forwards that have delivery months within the time horizon of the storage problem. A visualization of the forward rolling mechanism is provided in Figure 4.

Let 0 = n_0 < n_1 < … < n_J < K be the first days of the months J = {0, 1, …, J}, respectively. Let h^j_k with j ∈ J denote the action on day k in the forward F(k, n_j, n_{j+1} − 1), which has the delivery period [n_j, n_{j+1} − 1]; h^j_k > 0 refers to buying and h^j_k < 0 to selling F(k, n_j, n_{j+1} − 1). The above assumption implies that h^j_k = 0 for k < n_{j−1} and for k ≥ n_j; in particular, h^J_k = 0 for all k ∈ T. Consistently with SMod, we aim to maximize

    E[U(W^S_{K−1} + W^F_{K−1})]   (11)

with the terminal P&L split into a spot and a forward part: W^S_{K−1} denotes the terminal P&L from spot trading and is unchangedly given by (5); h^S_k schedules the storage activity for the next day. Similarly, W^F_{K−1} denotes the terminal P&L from trading the monthly forwards, accumulating the cashflows from the positions h^j_k at the respective forward prices F(k, n_j, n_{j+1} − 1) together with the value of the resulting delivery obligations. For a forward with delivery period [n_j, n_{j+1} − 1], the daily delivery quantity d_j is fixed on day n_j − 1 for j ≥ 1 and is given by the position accumulated over the trading window,

    d_j = Σ_{k=n_{j−1}}^{n_j − 1} h^j_k,

and d_0 := 0. The storage level H_n on day n depends on both spot and monthly forward trading activities; with an initially empty storage, i.e., H_0 := 0, it is given by

    H_n = Σ_{k=0}^{n−1} (h^S_k + d_{j(k)}),

where j(k) denotes the month containing day k.
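A small sketch of the delivery and storage bookkeeping (our plausible reading of the fixing rule; `daily_delivery` and `storage_levels_sfmod` are hypothetical helper names):

```python
import numpy as np

def daily_delivery(h_fwd_month):
    """d_j: the forward position accumulated over month j's trading window,
    delivered at this constant daily rate during the delivery period
    (a plausible reading of the fixing on day n_j - 1)."""
    return float(np.sum(h_fwd_month))

def storage_levels_sfmod(h_spot, delivery_per_day):
    """H_0 = 0 and H_n = sum_{k<n} (h^S_k + d_{j(k)}): spot actions plus the
    fixed daily delivery of the month containing day k."""
    combined = np.asarray(h_spot) + np.asarray(delivery_per_day)
    return np.concatenate(([0.0], np.cumsum(combined)))
```

For instance, a position of 1 MWh/day bought in two steps of 0.5 MWh/day is delivered into storage on every day of the delivery month, and spot sales can then drain it.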
The optimization of (11) is subject to the constraints (13)-(16): the terminal storage must again be empty (cf. (6)); the aggregated daily action h^S_k + d_j must respect the daily injection and withdrawal rates for n_j ≤ k < n_{j+1}, j ≤ J − 1, while the storage level stays within [0, c] at all times (15); and the volume of forward trading is bounded, for a scaling factor α ∈ [0, 1], such that the maximally traded forward amount can be stored even in the absence of spot trading (16). Alternatively, the daily constraints (15) can be expressed as iterative daily bounds analogous to (8), with the bounds shifted by the daily delivery quantity d_j.

Remark 3 In SFMod, the aggregated action on day k is (h^S_k + d_j) for all n_j ≤ k < n_{j+1}. Hence, the action of pure spot trading is restricted by the daily delivery amount d_j, which results from h^j_k with n_{j−1} ≤ k < n_j. In other words, forward trading activities have a delayed effect on spot trading, but spot trading does not affect forward trading. The delivery quantities of the upcoming days in the current month are fixed after the respective forward trading has already terminated. The delivery obligations of the due forwards restrict the spot trading activities of the current month, as the sum of daily delivery and spot trading is bounded by the daily withdrawal and injection rates. The constraint (16) ensures that the maximally traded amount can be stored in the case of no spot trading; it can also be viewed as a liquidity constraint. Moreover, with the scaling factor α ∈ [0, 1], one can bound the volume of forward trading and maintain an appropriate balance between spot and forward trading.
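Remark 3's shifted spot bounds can be sketched as follows (illustrative helper with a hypothetical name; sign convention: withdrawal_rate ≤ 0 ≤ injection_rate):

```python
def spot_bounds_with_delivery(d_j, withdrawal_rate, injection_rate):
    """Per Remark 3, the aggregated action h^S_k + d_j must respect the
    daily rate limits, so pure spot trading on day k is confined to the
    shifted interval [withdrawal_rate - d_j, injection_rate - d_j]."""
    return withdrawal_rate - d_j, injection_rate - d_j
```

With a fixed delivery of d_j = 2 MWh/day and an injection rate of 2 MWh/day, no additional spot purchase is admissible on that day, matching the remark that delivery obligations restrict spot trading.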

Training setup
Similarly to Section 3, we approximate for each trading day k the actions (h^S_k, h^j_k) by neural networks g_k = (g^S_k, g^F_k), collected in G = {g_0, …, g_{K−1}}. Note that within this section, network strategies entail a two-dimensional output, since in addition to actions in the spot market we also model strategies on the monthly forwards. For ease of notation, we abbreviate the monthly forwards as F_k = F(k, n_j, n_{j+1} − 1) for the applicable j ∈ J. The most important aspects of the training can be summarized as follows.
- Training data: time horizon of storage T, M trajectories of the spot (S^i_k)_{k∈T; i=1,…,M} and of the rolling month forward (F^i_k)_{k∈T; i=1,…,M}, respectively.
- Training object: trading strategy network with two outputs, for the spot action and the action in the rolling month forward, consisting of N ∈ N (N ≤ K) distinct sub-networks, each of which has L layers. The network's input is time as well as the respective spot price and storage fill level.
- Training criterion: minimize an estimate of the expected negative utility over batches B ⊂ {1, …, M} of training data, i.e., min_θ −(1/|B|) Σ_{i∈B} U(W^{S,i}_{K−1} + W^{F,i}_{K−1}), where for each scenario i, G^i denotes the set of strategies G that fulfil all constraints (13)-(16).
For numerical testing, 1 000 scenarios of spot as well as monthly forward price curves covering 12 months were provided by Axpo Solutions AG. From these, 1 000 rolling monthly forward curves were inferred, each of which contains only the first nearby contract on any trading day. As in Section 3, the data set was split into 900 training and 100 test scenarios. Furthermore, we relied on the same network architecture as in SMod, i.e., 12 distinct neural networks representing 12 months of trading with L = 2 and d_1 = 16. Note, however, that forward trading is discontinued in the last month, because the corresponding contract delivers beyond the trading horizon of the storage. Taking into account the revised constraints (13)-(16), the strategy networks were then trained for 1 000 epochs, using Adam stochastic gradient descent with a learning rate of 0.05 and a batch size of 100. Figure 5 visualizes the resulting policy in terms of the storage level across the scenarios over time with respect to different choices of α. Figure 6 provides a visualization and detailed summary statistics for comparing this model with SMod and its benchmark. The comparison is based on the same setup for the spot strategy component and on the same training conditions, with the exception of the risk aversion rate. Compared with the spot-only model, SFMod entails not only a significantly higher P&L on average, but also a higher volatility. Moreover, the P&L originates mostly from forward trading activities, and is thus highly sensitive to the choice of α in (16). Hence, a suitable choice of α is essential.

Fig. 5: A comparison between the performance of SFMod with different choices of α and that of the benchmark (LSMC). The upper plots compare the terminal P&L between SFMod (α = 0.1 and α = 0.5) and the benchmark in million CHF; the upper left exhibits the P&L on the training set and the upper right that on the test set. In both plots, the P&L distributions of SFMod are conclusively more favorable than that of the benchmark. The plots at the bottom visualize the storage levels proposed by SFMod, for α = 0.1 on the left and for α = 0.5 on the right; the scale is normalized by the storage capacity c. The optimal policy features a seasonality pattern consistent with that of SMod.
Figure 6 substantiates that SFMod, which allows trading activities on forwards, is clearly the most favorable choice in terms of maximizing the expected utility of terminal wealth. In comparison to SMod, it is slightly more involved in its technical setup, but in terms of computational time and effort, it is still well manageable on a standard 8-core notebook. Regarding SFMod, the higher first moment of the P&L distribution across all scenarios comes with a higher standard deviation. Furthermore, the first moment is sensitive to the limitation on forward market activities, expressed by the control variable α. One direction of future work might be to generate superior P&L distributions with less risk. Another possible direction might increase the model-theoretic complexity with more forward curves. It should be noted that the performance of SFMod is not adversely affected if we further extend the scope of forward trading activities or incorporate more realistic model features such as, for instance, transaction costs depending on the storage level H^S_k.

Fig. 6: The boxplot and the table provide a line-up of the considered models. The first moments and the volatility of the P&L distributions are largest for SFMod; moreover, they depend on the choice of α. Note that the LSMC approach may not serve as a fully valid and competitive benchmark here, as it does not allow for forward trading activities.

Conclusion
We proposed a flexible and powerful framework that is capable of dealing with the intricacy of optimizing underground gas storage facilities in the presence of forward markets. Traditional techniques such as, for instance, least-squares Monte Carlo (LSMC) or dynamic programming are subject to the so-called curse of dimensionality, whereas the proposed deep learning technique is hardly affected by the dimensionality. Moreover, our experimental results show that the proposed deep hedging approach performs as well as or better than the well-established state-of-the-art LSMC benchmark. These advances pave the way for unprecedented storage and production plans of energy.

Fig. 2: The daily constraints of a gas storage. The left figure visualizes the empty final storage constraint (10); the y-axis is normalized by the total storage capacity c. The critical boundary is reached on trading day 269; in other words, if the relative storage level is beyond the blue line at any time after trading day 269, the only admissible action remaining is maximal withdrawal up until maturity. The right figure visualizes the daily injection and withdrawal constraints (already incorporating an optimal strategy).

Fig. 3: A line-up between the performance of the spot-only model (SMod) and that of the benchmark (LSMC). The upper plots compare the terminal P&L between SMod and the benchmark in million CHF; the distribution on the left is based on the training set and that on the right on the test set. Both in- and out-of-sample results are compellingly close to the benchmark. The lower left plot shows the storage fill levels across all scenarios as inferred from the neural networks of SMod, and the lower right plot shows those of the benchmark actions. The scale is normalized by the storage capacity c. The optimal policy tends to inject gas until the storage capacity is reached, and withdraws it after a certain waiting period until the storage is empty again. This is in line with the underlying seasonality pattern.

Fig. 4: The mechanism of the rolling strategies in SFMod.
The table at the bottom of Figure 6 reports the following summary statistics of the terminal P&L (in CHF) for the compared models, in the column order of Figure 6:

mean:   …,539     | 4,284,428 | 1,440,326 | 5,655,886 | 4,425,810 | 1,436,510 | 1,448,475
median: 5,362,543 | 4,341,783 | 1,391,877 | 5,576,047 | 4,258,077 | 1,284,445 | 1,382,435
std:    3,111,824 | 2,708,985 | 1,132,284 | 2,839,238 | 2,526,534 | 1,019,118 | 1,109,619