Deep learning for quadratic hedging in incomplete jump market

We propose a deep learning approach to study the minimal variance pricing and hedging problem in an incomplete jump diffusion market. It is based upon a rigorous stochastic calculus derivation of the optimal hedging portfolio, optimal option price, and the corresponding equivalent martingale measure through the means of the Stackelberg game approach. A deep learning algorithm based on the combination of the feedforward and LSTM neural networks is tested on three different market models, two of which are incomplete. In contrast, the complete market Black-Scholes model serves as a benchmark for the algorithm's performance. The results that indicate the algorithm's good performance are presented and discussed. In particular, we apply our results to the special incomplete market model studied by Merton and give a detailed comparison between our results based on the minimal variance principle and the results obtained by Merton based on a different pricing principle. Using deep learning, we find that the minimal variance principle leads to typically higher option prices than those deduced from the Merton principle. On the other hand, the minimal variance principle leads to lower losses than the Merton principle.


Introduction
Many jump diffusion models have been extensively studied in the past. Of particular interest to us is the Merton model proposed by Merton (1976), where the jump distribution is log-normal. Merton derived a closed-form solution to the hedging problem for the European option, assuming diversifiability of the jump component. There are also other popular jump diffusion models. Kou (2002) proposed an alternative approach where a double exponential jump distribution is assumed. The variance gamma model proposed in (Madan and Seneta, 1990) also supports the inclusion of jumps.
Regarding the use of machine learning in stochastic control theory, there have been many new developments in the past few years. With new techniques, it has become possible to solve numerically problems that do not admit closed-form solutions. In (Han et al., 2016), a deep learning approach for solving stochastic control problems was proposed. The idea is that one discretizes time and, at each timestep, approximates the control with a feed-forward neural network. In the papers (Buehler, Gonon, Teichmann, and Wood, 2019; Carbonneau and Godin, 2021), the authors extend these results to reinforcement learning algorithms for hedging in high dimensions. Methods proposed in (Han et al., 2016) also provide a strong tool for finding numerical solutions to high-dimensional partial differential equations. In a seminal paper (Han, Jentzen, and E, 2018) and later on in (Beck, E, and Jentzen, 2019; Chan-Wai-Nam, Mikael, and Warin, 2019), the authors transform PDEs to backward stochastic differential equations and use deep learning to solve the associated stochastic control problem.
Since control problems related to hedging rely heavily on time series dynamics, it is convenient to consider recurrent neural networks (RNN) instead of feed-forward ones. These types of networks were proposed in (Rumelhart, Hinton, and Williams, 1986). Such networks can contain cycles, which makes them suitable for problems that have sequential inputs. A particular case of RNN called long short-term memory (LSTM) was first introduced in (Hochreiter and Schmidhuber, 1997) and has become very prominent in the last few years for time series modelling. The main advantage of LSTM over a plain RNN is its ability to provide a memory structure that can store information over a large number of timesteps. It has been heavily used in financial time series prediction; for instance, see (Cao, Li, and Li, 2019; Selvin, Vinayakumar, Gopalakrishnan, Menon, and Soman, 2017; Siami-Namini, Tavakoli, and Namin, 2018). Another advantage of LSTM networks is that they can also be used for non-Markovian models. In particular, Han et al. (2021) used them to apply the methods proposed in (Han et al., 2016) to stochastic control problems with delay. Deep learning for mean-field stochastic control has been studied by Agram et al. (2020), where the initial state was a part of the control process, and then extended to mean-field systems with delay in (Agram, Grid, Kebiri, and Øksendal, 2022). Lai et al. (2022) also used LSTM networks to develop a data-driven approach for options market making.
Since our main concern is hedging in an incomplete market, we also refer to Fecamp et al. (2020), who present several deep learning algorithms for discrete-time hedging, where the sources of incompleteness are illiquidity, non-tradable risk factors, and transaction costs. They propose a modified global LSTM approach where a single LSTM net is used that takes the whole time series as an input. The above approach was adopted in (Gao, Wu, and Duan, 2021), where the authors study the double exponential jump distribution model proposed in (Kou, 2002).
Furthermore, we do not consider the calibration of SDEs to fit market data in this paper. For promising approaches, one can check (Boursin, Remlinger, and Mikael, 2022; Remlinger, Mikael, and Elie, 2022), where generative adversarial networks are used. Another approach could feature neural SDEs. A good survey is (Kidger, Foster, Li, and Lyons, 2021).
In what follows, let us describe our purpose in the present paper. We consider a financial market where the underlying price of a risky asset follows a jump diffusion process with $\mathbb{R}^* = \mathbb{R} \setminus \{0\}$, and $S_0(t) = 1$ for all $t$ denotes the value of a risk-free money market account.
The results in this paper can easily be extended to an arbitrary number of risky assets, but since the features of incomplete markets we are dealing with can be fully illustrated by a single jump diffusion risky asset, we will for simplicity concentrate on this case in the following. We emphasise, however, that we deal with an arbitrary number $m$ of independent Brownian motions and an arbitrary number $k$ of independent Poisson random measures in the representation (1.1).
Let $z \in \mathbb{R}$ be an initial endowment and let $\pi(t) \in \mathbb{R}$ be a self-financing portfolio, representing the fraction of the total wealth $X(t) = X^{z,\pi}(t)$ invested in the risky asset at time $t$. We say that $\pi$ is admissible if in addition $\pi$ is a predictable process in $L^2(dt \times dP)$. The set of admissible portfolios is denoted by $\mathcal{A}$. We associate to an admissible portfolio $\pi$ the wealth dynamics

Here and in the following, we write for notational simplicity $X(t)$ instead of $X(t^-)$, in agreement with our convention for $S(t)$.
Let $T$ be the time to maturity and let $F$ be a given $T$-claim, i.e. $F \in L^2(P)$ is an $\mathcal{F}_T$-measurable random variable, representing the payoff at that time. Then for each initial wealth $z \in \mathbb{R}$ and each portfolio $\pi \in \mathcal{A}$, we want to minimize the expected squared hedging error
$$J(z, \pi) = E\Big[\tfrac{1}{2}\big(X^{z,\pi}(T) - F\big)^2\Big].$$
Then the problem we consider is the following.
Problem 1.1 Find the optimal initial endowment $\hat{z} \in \mathbb{R}$ and the optimal portfolio $\hat{\pi} \in \mathcal{A}$, such that
$$\inf_{z \in \mathbb{R},\, \pi \in \mathcal{A}} J(z, \pi) = J(\hat{z}, \hat{\pi}).$$
Heuristically, this means that we define the price of the option with payoff $F$ to be the initial endowment $\hat{z}$ needed to get the terminal wealth $X(T)$ as close as possible to $F$ in quadratic mean by an admissible portfolio $\hat{\pi}$.
We may regard the minimal variance problem (Problem 1.1) as a Stackelberg game, in which the first player chooses the initial endowment $z$, followed by the second player choosing the optimal portfolio $\pi$ based on this initial endowment. Knowing this response $\pi = \pi_z$ from the follower, the first player chooses the initial endowment $\hat{z}$ which leads to a response $\hat{\pi} = \pi_{\hat{z}}$ which is optimal, in the sense that $J(\hat{z}, \pi_{\hat{z}}) \le J(z, \pi)$ over all admissible pairs $(z, \pi)$.
For a general (discontinuous) semimartingale market, this is already a known result; we refer, for example, to (Černý and Kallsen, 2007), where the authors prove that the optimal quadratic hedging strategy π can first be characterized in a linear feedback form which is independent of the initial endowment z, and that z can then be identified in a second step. Therefore, the results of Section 2 of the current paper might be derived from already existing results in the literature (Černý and Kallsen, 2007), but we found it convenient for the reader to provide explicit expressions for the optimal quadratic hedging problem in a jump diffusion market (Lévy process). Schweizer (1996) proves (under some conditions, including a non-arbitrage condition) that there exists a signed measure P, called the variance-optimal measure, such that (1.4) holds. We consider in the present paper a jump diffusion market and we show that P is a positive measure and we find it explicitly.
In the next section, we prove the existence of the optimal hedging strategy and the optimal initial endowment for a jump diffusion market. We show that the minimal variance price can be represented as the expected value of the option's (discounted) payoff under an equivalent martingale measure (which is given explicitly). The analysis concludes with a look at specific examples of jump diffusion market models.
Section 3 is devoted to the deep learning approach. Since the option price and the optimal hedging portfolio cannot always be computed explicitly, or doing so is not feasible, we need to use numerical methods. We propose a deep learning algorithm that approximates the option price and the hedging portfolio. Specifically, we use a joint feedforward and multilayered LSTM network that adopts the "online" approximation approach proposed in (Han et al., 2016), where a neural network is used at each timestep to predict the control process.
First, we test this algorithm in the case of a complete market, namely in the case of the Black-Scholes (BS) model. This is done in order to estimate its performance, since both the option price and the hedging portfolio can be obtained explicitly in the BS case.
Then we apply it to models that assume an incomplete market. First, a continuous model where the underlying asset depends on multiple independent Brownian motions is used in order to study the algorithm's scalability to multi-dimensional inputs. The same is then also done for the jump diffusion Merton model.
We show the algorithm's success in all three models and discuss its performance. Since Merton's reasoning behind his hedging strategy is not consistent with the behaviour of financial markets, we also compare our results to those obtained by Merton and discuss how the approaches differ. In particular, we show how the minimal variance approach provides a safer hedging strategy compared to the one proposed by Merton. Finally, the algorithm's performance is also tested for another jump diffusion model, namely the Kou model.
Optimal quadratic hedging portfolio/optimal initial endowment

In this section, we find the optimal strategy pair to Problem 1.1.

Equivalent martingale measures (EMMs)
Since EMMs play a crucial role in our discussion, we start this section by recalling that an important group of measures Q ∈ M can be described as follows (we refer to Chapter 1 in (Øksendal and Sulem, 2007) for more details): Let θ 0 (t) and θ 1 (t, ζ) > −1 be F-predictable processes such that Define the local martingale Z(t) = Z θ 0 ,θ 1 (t), by A sufficient condition for Z(•) being a true martingale is For proof see (Kallsen and Shiryaev, 2002).Then the measure Q θ 0 ,θ 1 defined by

The optimal portfolio
Assume as before that the wealth process $X(t) = X^{z,\pi}(t)$, corresponding to an initial wealth $z$ and a self-financing portfolio $\pi$, is given by

Let $F$ be a given $T$-claim representing the terminal payoff of the option. By the Itô/martingale representation theorem for jump diffusions (see (Løkka, 2004)) we can write $F = F(T)$, where the martingale

for unique $\mathbb{F}$-predictable processes $\beta(t) \in L^2(\lambda_0 \times P)$, $\kappa(t, \zeta) \in L^2(\lambda_0 \times \nu \times P)$, where $\lambda_0$ denotes Lebesgue measure. Note that $\beta$ and $\kappa$ depend linearly on $F$ (they can be given in terms of Malliavin derivatives of $F$).
Then by the Itô formula for jump diffusions (see e.g. Theorem 1.14 in (Øksendal and Sulem, 2007)) we get and

This gives
We can minimize J(π) by minimizing the dt-integrand pointwise for each t.This gives the following result: Theorem 2.1 Recall that we have made the assumption (1.2). a) For given initial value X(0) = z > 0 the portfolio π = π z which minimises is given in feedback form with respect to X(t) = X z, π by or, equivalently, , where .

The optimal initial endowment
Completing the Stackelberg game, we now proceed to find the initial endowment $\hat{z}$ which leads to a response $\hat{\pi} = \pi_{\hat{z}}$ that is optimal for Problem 1.1, in the sense that $J(\hat{z}, \hat{\pi}) \le J(z, \pi)$ over all pairs $(z, \pi)$. We shall first find the explicit solution for $\hat{X}$.
Writing X = X for notational simplicity, equation (2.5) is of the form where , (2.7) (2.8) We rewrite (2.6) as dX (t) − X (t) dΓ t = C (t) dΛ t , (2.9) and multiply this equation by a process of the form where ρ, λ and θ are processes to be determined.Then (2.9) gets the form (2.10) We want to choose ρ, λ and θ such that Y t becomes an integrating factor, in the sense that To this end, note that by the Itô formula for Lévy processes, we have Therefore, again by the Itô formula, using (2.6), where (2.11) This gives Then Substituting this into (2.10),we get Solving for X (t), we obtain the following: Theorem 2.2 Given initial value z the corresponding optimal wealth process X z (t) is given by In particular, note that Going back to our problem, choose z ∈ R and let π z be the corresponding optimal hedging portfolio given by (2.3) and let X z be the corresponding optimal wealth process given by (2.5) and (2.13) respectively.Then Note that, if we define and then we can verify by the Itô formula that and (2.15) ) Proof.
To see this, we verify that the coefficients θ 0 (t □

Using this, we obtain the following, which is the main result in this section:

Theorem 2.4 (i) The unique minimal variance price $\hat{z}$ of a European option with terminal payoff $F$ at time $T$ is given by (2.17), where $A_T$ is given by (2.12), $C(s)$ and $\Lambda_s$ are given by (2.7) and (2.8), respectively, and $K$ is given by (2.11).
(ii) Assume that the coefficients $\alpha(t)$, $\sigma(t)$ and $\gamma(t, \zeta)$ are bounded and deterministic functions. Then $\hat{z} = E_{Q^*}[F]$, where $Q^*$ is the EMM given by (2.16).

Proof.
(i) To minimize $J_0(z) := E\big[\tfrac{1}{2}(\hat{X}_z(T) - F)^2\big]$ with respect to $z$ we get by (2.13) and (2.14) that This is 0 if and only if (2.17) holds.
If α 1 is deterministic, we cancel out the factor exp for all t, and β = κ = 0. Hence the optimal portfolio is given in feedback form by Assume, for example, that α(t) > 0. Then G(t) < 0, and we see that if X(t) < F , then π(t)X(t) > 0, and hence the optimal portfolio pushes X(t) upwards towards F. Similarly, if X(t) > F then π(t)X(t) < 0 and the optimal push of X(t) is downwards towards F .This is to be expected since the portfolio tries to minimize the terminal variance Moreover, if we start at z = X(0) = F , we can choose π = 0 and this gives = 0, which is clearly optimal.By uniqueness of ( z, π) we conclude that ( z, π) = (F, 0) is the optimal pair in this case.
Remark 2.6 That the price $\hat{z}$ is an arbitrage-free price of $F$ follows from Theorem 2.4 (ii).
Remark 2.7 Note that, as remarked earlier, the coefficients β and κ depend linearly on F .Therefore it follows from the formula (2.17) that the map Φ : is linear and bounded.By the Riesz representation theorem this map can be represented by a random variable Z ∈ L 2 (P, F T ), in the sense that Therefore, if we define the (signed) measure Q on F T by Comparing with the Schweizer variance-optimal pricing measure P (1.4) we conclude the following: Corollary 2.8 (i) P = Q In particular, P always exists in this market, without any non-arbitrage conditions.
(ii) Moreover, we have P = Q * , which is a positive EMM.

Example: European call option
We give some details about how to compute the minimal variance price $\hat{z}$ explicitly in the case of a European call option that will be used in Section 3.
(i) Note that the term C(s) in Theorem 2.4 depends on the coefficients β and κ in the Itô representation of F .These coefficients can for example be found by using the generalised Clark-Ocone formula for Lévy processes, extended to L 2 (P ).See Theorem 12.26 in (Nunno, Øksendal, and Proske, 2008).
Let us find these coefficients in the case of a European call option, where where K is a given exercise price.In this case F (ω) represents the payoff at time T (fixed) of a (European call) option which gives the owner the right to buy the stock with value S(T, ω) at a fixed exercise price K. Thus if S(T, ω) > K the owner of the option gets the profit S(T, ω) − K and if S(T, ω) ≤ K the owner does not exercise the option and the profit is 0. Hence in this case Thus, we may write where The function f is not differentiable at x = K, so we cannot use the chain rule directly to evaluate D t F .However, we can approximate f by C 1 functions f n with the property that we see We get where D t F and D t,ζ F denote the generalised Malliavin derivatives (also called the Hida-Malliavin derivative) of F at t and (t, ζ) respectively, with respect to B(•) and N (•, •), respectively.Combining this with the chain rule for the Hida-Malliavin derivative and the Markov property of the process S(•), and assuming for simplicity that σ is constant and γ(t, ζ) = γ(ζ) does not depend on t, we obtain the following for β: (2.20) To find the corresponding result for κ we first use the chain rule for D t,ζ and get Then we obtain (2.21) where in general E y [h(S(u))] means E[h(S y (u))], i.e. expectation when S starts at y.
(ii) Since the coefficients $\alpha$, $\sigma$ and $\gamma$ of the process $S$ are deterministic and bounded, we compute numerically the minimal variance price where $\beta$, $\kappa$ are given by (2.20), (2.21), respectively, we use the Itô formula combined with (2.15) to obtain where $G(t)$ is given by (2.4).

Application to Black Scholes market with two independent Brownian motions
In this example, we consider a market driven by two independent Brownian motions, $B_1(t)$, $B_2(t)$: where the coefficients $\alpha_0$, $\beta_1$, $\beta_2$ are assumed to be bounded constants. For a self-financing portfolio $\pi$ and an initial wealth $z$, we have a wealth dynamic We want to find the pair $(\hat{z}, \hat{\pi})$ which minimizes the expected squared hedging error

Corollary 2.9 From Theorem 2.4 (ii), we conclude that (2.23) where (2.24)

Looking at the above corollary, we can see that the process $Z^*$ coincides with the one in the Black-Scholes setting, where the volatility coefficient of the single Brownian motion $B(t)$ is given by $\beta = \sqrt{\beta_1^2 + \beta_2^2}$. Since the processes $\beta B(t)$ and $\beta_1 B_1(t) + \beta_2 B_2(t)$ have the same distribution, so do the option payoffs. This yields that the minimal-variance European call option price $E_{Q^*}[F]$ in an incomplete market with two independent Brownian motions coincides with the unique Black-Scholes price in a single Brownian motion case for volatility $\beta = \sqrt{\beta_1^2 + \beta_2^2}$. Furthermore, we know that the hedging portfolio in the BS model is only a function of time and the underlying asset. Since for the choice of volatility parameter as above, the stocks in both the BS and the two-Brownian motion model follow the same distribution, we can deduce that for initial endowment $E_{Q^*}[F]$ from Equation (2.23) and for the BS hedging portfolio, we get The above can, of course, be extended to an arbitrary number of Brownian motions.
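The reduction to a single effective volatility can be illustrated numerically. The following is a minimal sketch, assuming zero interest rate as elsewhere in the paper; the function names and the values of `beta1` and `beta2` are illustrative assumptions and are not taken from the paper's code.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0: float, strike: float, vol: float, maturity: float) -> float:
    """Black-Scholes European call price with zero interest rate."""
    d1 = (math.log(s0 / strike) + 0.5 * vol ** 2 * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    return s0 * norm_cdf(d1) - strike * norm_cdf(d2)


def effective_vol(betas) -> float:
    """Volatility of a single Brownian motion matching sum_l beta_l * B_l(t) in law."""
    return math.sqrt(sum(b * b for b in betas))


# Illustrative (assumed) coefficients of the two Brownian motions.
beta1, beta2 = 0.12, 0.16
beta = effective_vol([beta1, beta2])  # equals 0.2 up to rounding

# Price in the two-Brownian-motion market, computed as a one-dimensional BS price.
price = bs_call(s0=1.0, strike=0.5, vol=beta, maturity=1.0)
```

Passing a longer list to `effective_vol` covers the case of an arbitrary number of Brownian motions in the same way.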

Application to Merton model
Here we consider the special case of the Merton model first proposed in (Merton, 1976).
It deals with European options in a market modeled by a jump diffusion. More precisely, the market consists of (i) a risk-free asset, with unit price $S_0(t) = 1$ for all $t$, (ii) a risky asset, with price $S(t)$ given by
$$dS(t) = S(t)\Big[\alpha_0\,dt + \sigma_0\,dB(t) + \int_{\mathbb{R}^*} (y - 1)\,\tilde{N}(dt, dy)\Big], \qquad S(0) = S_0 > 0. \tag{2.25}$$
Here $\tilde{N}$ is a compensated Poisson random measure corresponding to the compound Poisson process with intensity $\lambda$ and jump sizes $y = \exp(Y)$, where $Y \sim N(\mu, \delta^2)$, independent of $B$ and the jump times. Heuristically speaking, $y$ represents the absolute jump size, while $\gamma_0(t, y) = y - 1$ equals the relative jump size. The coefficients $\alpha_0$ and $\sigma_0$ are assumed to be bounded and deterministic.
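To make the dynamics (2.25) concrete, the sketch below samples the terminal stock price under the original measure. It relies on the standard exponential representation of a geometric jump diffusion with compensated log-normal jumps, which we assume here rather than quote from the paper; the function names and all parameter values are illustrative assumptions.

```python
import math
import random


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def merton_terminal_price(rng, s0, alpha0, sigma0, lam, mu, delta, maturity):
    """Sample S(T) for constant coefficients.

    Assumed representation:
    S(T) = s0 * exp((alpha0 - sigma0^2/2 - lam*k) T + sigma0 B(T) + sum_i Y_i),
    with k = E[y - 1] and Y_i ~ N(mu, delta^2) the log jump sizes.
    """
    k = math.exp(mu + 0.5 * delta ** 2) - 1.0
    n_jumps = sample_poisson(rng, lam * maturity)
    log_jumps = sum(rng.gauss(mu, delta) for _ in range(n_jumps))
    drift = (alpha0 - 0.5 * sigma0 ** 2 - lam * k) * maturity
    diffusion = sigma0 * math.sqrt(maturity) * rng.gauss(0.0, 1.0)
    return s0 * math.exp(drift + diffusion + log_jumps)


rng = random.Random(0)
# Illustrative parameters (assumed, not the paper's Section 3 values).
samples = [
    merton_terminal_price(rng, 1.0, 0.3, 0.2, lam=1.0, mu=-0.1, delta=0.05, maturity=1.0)
    for _ in range(2000)
]
sample_mean = sum(samples) / len(samples)
```

Because the jump measure is compensated, the jumps do not change the expected growth rate, so the sample mean should be close to `s0 * exp(alpha0 * T)`.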
If we denote $k = E[y - 1] = \exp(\mu + \delta^2/2) - 1$ and observe that $y - 1 > -1$ a.s., we can use the Itô formula and obtain the explicit solution

Merton then proceeds to argue that the jumps in the asset price are not systemic and, therefore, the risk related to the jumps is diversifiable. In other words, this means that all the properties of the jump component of $S(\cdot)$ under the risk-neutral measure are the same as under the natural measure. This argument yields the choice $\theta_0 = \alpha_0/\sigma_0$ and $\theta_1(y) = 0$, which results in the EMM $Q_M$ corresponding to the Radon-Nikodým derivative

and enables Merton to obtain the option price as

where $BS(t, S(t), \sigma_0, r)$ denotes the price of the European call option under the BS model at time $t$, with stock price $S(t)$, volatility $\sigma_0$, and interest rate $r$. This approach results in a non-symmetric loss. Merton argues that if no jump occurs, then the owner of the option collects a small profit. However, in the rare case when a jump does occur, the option holder suffers a significant loss. The non-symmetric loss distribution can, in some instances, be undesirable. Apart from that, the assumption of diversifiability of the jump risk does not hold in practice; for example, market indexes do experience occasional jumps in price.
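For reference, Merton's price can be evaluated by the well-known series of Black-Scholes prices weighted by Poisson probabilities. The sketch below uses the textbook form of that series for a compensated log-normal jump component; we assume, but cannot confirm from the text above, that it agrees with formula (2.26). Function names, the truncation level `n_terms`, and the parameter values are illustrative assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, strike, vol, rate, tau):
    """Black-Scholes European call price with interest rate `rate`."""
    d1 = (math.log(s / strike) + (rate + 0.5 * vol ** 2) * tau) / (vol * math.sqrt(tau))
    d2 = d1 - vol * math.sqrt(tau)
    return s * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)


def merton_call(s, strike, sigma0, lam, mu, delta, tau, rate=0.0, n_terms=60):
    """Textbook Merton (1976) series, truncated after `n_terms` terms."""
    k = math.exp(mu + 0.5 * delta ** 2) - 1.0      # k = E[y - 1]
    lam_adj = lam * (1.0 + k)                      # jump intensity entering the weights
    total = 0.0
    for n in range(n_terms):
        weight = math.exp(-lam_adj * tau) * (lam_adj * tau) ** n / math.factorial(n)
        vol_n = math.sqrt(sigma0 ** 2 + n * delta ** 2 / tau)
        rate_n = rate - lam * k + n * math.log(1.0 + k) / tau
        total += weight * bs_call(s, strike, vol_n, rate_n, tau)
    return total


# Illustrative parameters (assumed).
price_no_jumps = merton_call(1.0, 0.5, 0.2, lam=0.0, mu=-0.1, delta=0.05, tau=1.0)
price_with_jumps = merton_call(1.0, 0.5, 0.2, lam=1.0, mu=-0.1, delta=0.05, tau=1.0)
```

With `lam = 0` only the first term survives and the series reduces to the plain BS price, in line with the remark that the Merton model then devolves into the BS model.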
Using the results from the previous section, we can obtain the optimal portfolio, optimal wealth process, and the unique minimal variance price given by (2.3), (2.13) and (2.17) respectively. To evaluate these expressions, one has to compute the coefficients $\beta(t)$ and $\kappa(t, y)$ given by equations (2.20) and (2.21). If we denote where we use the fact that $H(t) \mid N(t) = j \sim N\big(j\mu, \sigma^2(T - t) + j\delta^2\big)$. Unfortunately, this representation is not enough to obtain the desired results explicitly. However, we may observe the following. The Lévy measure associated with the compound process in equation (2.25) equals $\lambda \mu_y$, where $\mu_y$ is the law of the lognormally distributed random variable $y$. Hence, even though the coefficient $\gamma_0$ is stochastic, we have that $m = \int_{\mathbb{R}^*} \gamma_0(y)^2\,\nu(dy)$ is deterministic. Since $\alpha_0$ and $\sigma_0$ are also deterministic, so is By part (ii) of Theorem 2.4, we have an explicit representation of the EMM $Q^*$ and the corresponding Radon-Nikodým derivative given by Moreover, the minimal variance price $\hat{z}$ of an option with payoff $F$ at time $T$ is (2.28)

Fig. 1 One realization of $Z_M$ and $Z^*$ for the choice of parameters specified in Section 3.3.3

Remark 2.10 It must be stated that, in theory, we do not have a guarantee that $G(y_i - 1) > -1$, since the lognormal distribution takes values on the whole positive line. Consequently, the logarithm may not be defined, in which case we use the alternative formulation from (Lamberton and Lapeyre, 2011). This results in a signed measure $Q^*$. Nonetheless, in practice, for a reasonable choice of parameters, $G(y_i - 1) > -1$ holds with probability practically equal to 1, and we do not need to worry about this technical problem.
We cannot derive the same formula as in (2.26) since the distribution of $\ln(1 + G(y - 1))$ is unknown; however, we can use the Monte Carlo approach to obtain the option price. This option price obviously differs from Merton's, as does the hedging portfolio that also considers the risk coming from the jumps and results in a symmetric loss. Using the Euler-Maruyama discretization rule, we simulate one realization of $Z_M$ and $Z^*$ and present it in Figure 1. We can see that $Z^*$ has upward jumps, which are not present in $Z_M$.
In Figures 2 and 3, we can see how the price of the European option changes for both models depending on the parameters λ, µ, σ, δ. The option price in both models grows together with the absolute value of the parameters. Note that the option prices coincide when jumps are excluded, namely when λ = 0, since the Merton model then devolves into the BS model. The changes in prices are larger in the case described by Figure 2, where we consider the dependence on parameters λ and µ, than in Figure 3, where σ and δ are taken into consideration, due to those parameters' effect on stock and wealth dynamics. Large values of parameters λ and µ change the dynamics directly, while σ and especially δ do it indirectly. Figure 2 shows that the model prices are similar for small absolute values of the parameters. When the parameters grow, the minimal variance price grows faster; however, the difference stabilizes. As a matter of fact, the difference appears to be most significant when λ is relatively small, while |µ| is large. This is because once λ is large, the compensated part of the jump component, which is taken into account in both hedging strategies, takes over.
To continue with observations for the other two parameters, we can see that even though absolute differences are smaller, relatively speaking, models differ more when σ and δ parameters are modified, as shown in Figure 3.The difference between the prices seems to be biggest when jump sizes are constant while volatility in the model is large.
Fig. 2 Option price comparison depending on parameters λ and µ, where $S_0 = 1$, $K = 0.5$, σ = 0.2 and δ = 0.05

Numerical methods for obtaining the optimal portfolio are presented in the next section.

Deep Learning Approach
Here we present the algorithm using deep learning methods that returns both the optimal initial value and the optimal portfolio. In (Fecamp, Mikael, and Warin, 2020), different types of neural networks (NN) are compared when studying hedging in discrete-time incomplete markets. It turns out that long short-term memory (LSTM) networks, which are a particular case of recurrent neural networks (RNN), outperform standard feed-forward NN. Because the model discussed in the aforementioned paper doesn't incorporate jumps in the price dynamics, we also conducted a comparison between the performance of LSTM and feed-forward neural networks. However, we detected no significant difference between the two approaches. A possible intuitive reason for the good performance of the LSTM network in the jump diffusion models could be its ability to learn the jump dynamics from the previous jump occurrences. Furthermore, RNN and, in particular, LSTM networks seem to be the most natural approach when working with time series data and, importantly, allow generalization to non-Markovian models. Hence, we opted to focus exclusively on the LSTM model in the subsequent work. In particular, a possible extension that would benefit from the use of LSTM networks is the case of stochastic volatility models. Historical information that LSTM keeps or forgets would help us to store and use knowledge about the behaviour of the volatility parameter. The performance of our network is to be tested on four models. First, we consider the BS model, where the market is complete. The reason for that is that both option value and hedging portfolio are known and can be compared to our results. Then, we consider three cases of incomplete markets. We start with the BS model with multiple independent Brownian motions and continue with two jump diffusion models, the Merton and the Kou double exponential models.

Data Generation
To generate the data, we discretize the time interval $[0, T]$ into $R$ equidistant points $t_i$ at a distance $\Delta_t$. We denote different realizations of the process with $(j)$ in superscript, where $j \in [M]$ and $M$ denotes the batch size. When the superscript is omitted, we assume the vector notation. Additionally, we here assume the most general case with a $d_B$-dimensional Brownian motion and a $d_N$-dimensional Poisson random measure.
For initial wealth $x_0$ we construct under portfolio $\pi$ the wealth process at time $t_i$ using the update rule (3.1). Here $\alpha_0 \in \mathbb{R}$ and $\sigma_0 \in \mathbb{R}^{d_B}$ are drift and volatility parameters, respectively. Parameter $\gamma_0 \in \{0, 1\}$ serves as a dummy variable that indicates whether a model is continuous or not. Moreover, $B_i$ are $M \times d_B$ dimensional independent $N(0, 1)$ random variables, while the jump increments are built from independent random variables $J_i^l$. This discretization coincides with the continuous-time SDEs in the Merton model described in Section 2.5 and in the BS model when $\gamma_0 = 0$ and the Brownian motion is one-dimensional.
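One plausible form of a single Euler step for the wealth process is sketched below. The exact expression of Equation (3.1) is not reproduced in the text above, so the formula, the function name `wealth_step`, and the arguments `jump_sizes`, `lam` and `k` are hypothetical; the sketch only assumes that wealth follows the geometric dynamics of the stock scaled by the portfolio fraction, with a compensated jump term switched on by the dummy variable.

```python
import math


def wealth_step(x, pi, dt, alpha0, sigma0, normals,
                gamma0=0, jump_sizes=(), lam=0.0, k=0.0):
    """One hypothetical Euler step x_i -> x_{i+1} for a single sample path.

    sigma0, normals : sequences of length d_B (volatilities and N(0,1) draws)
    jump_sizes      : absolute jump sizes y that occurred in (t_i, t_{i+1}]
    lam, k          : jump intensity and mean relative jump size E[y - 1]
    """
    diffusion = math.sqrt(dt) * sum(s * z for s, z in zip(sigma0, normals))
    compensated_jumps = sum(y - 1.0 for y in jump_sizes) - lam * k * dt
    relative_return = alpha0 * dt + diffusion + gamma0 * compensated_jumps
    return x + x * pi * relative_return


# Setting pi = 1 turns the same rule into an update for the stock price itself.
s_next = wealth_step(1.0, 1.0, dt=1.0 / 80, alpha0=0.3, sigma0=[0.2], normals=[0.5])
```

With `gamma0 = 0` and a one-dimensional `sigma0` the step reduces to the Euler scheme for the BS model.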
To obtain a discrete stock process s, which we need to compute the loss function later on, we choose the initial stock value s 0 and set π = 1 in equation (3.1).
For option strike price $K$ we first compute the option's $t_R$-claim as $F^{(j)} = (s_R^{(j)} - K)^+$. Then we can define the loss as in Equation (3.2). Note that here we still, for brevity's sake, assume that the interest rate $r = 0$, since all the proceeding computations can be done with the same amount of accuracy.
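The payoff and loss computation can be sketched as follows. We assume here that the loss is the batch average of the squared terminal hedging error, without any constant factor; the precise normalisation in Equation (3.2) is not visible in the text above, and the function names are illustrative.

```python
def call_payoff(s_terminal: float, strike: float) -> float:
    """European call payoff (s_R - K)^+ for one path."""
    return max(s_terminal - strike, 0.0)


def quadratic_hedging_loss(x_terminal, s_terminal, strike):
    """Batch mean of (x_R - F)^2, assumed form of the loss."""
    errors = [
        (x - call_payoff(s, strike)) ** 2
        for x, s in zip(x_terminal, s_terminal)
    ]
    return sum(errors) / len(errors)


# Toy batch of M = 3 paths: terminal wealth, terminal stock price.
loss = quadratic_hedging_loss([0.5, 0.25, 0.0], [1.0, 0.75, 0.25], strike=0.5)
```

In the toy batch every terminal wealth equals the payoff, so the loss vanishes; any mismatch is penalised symmetrically, which is the source of the symmetric loss distribution discussed in Section 2.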

Network Architecture and Algorithm
As mentioned above, we make use of LSTM networks. A feed-forward network approach was also tested. However, it resulted in worse performance; hence we decided to stick with LSTM to allow more generality. We divide our learning task into two parts. First, we need to find the initial wealth value $x_0$. This is a relatively simple task, so a simple one-layer linear feed-forward NN with hidden dimension $d$ of the form (3.3) is used. Recall that we interpret the initial wealth $x_0$ as an option price, a deterministic quantity. Hence, $x_0^j = x_0$ for each $j \in [M]$ and we may use a vector of initial stock prices $y = [s_0, \ldots, s_0]^\top$ as an input for the NN $L$. Since the wealth process in Equation (1.3) is a geometric jump diffusion where a positive initial value is assumed, we use the softplus activation function $f(x) = \log(1 + e^x)$.
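A direct evaluation of $\log(1 + e^x)$ overflows for large $x$, so implementations typically use an algebraically equivalent form. The snippet below is a small sketch of such a numerically stable variant; it is an illustration and not the paper's code (deep learning libraries ship their own implementation).

```python
import math


def softplus(x: float) -> float:
    """Stable softplus: log(1 + e^x) = max(x, 0) + log(1 + e^{-|x|})."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


values = [softplus(v) for v in (-5.0, 0.0, 5.0, 1000.0)]
```

The function is positive and close to the identity for large inputs, which makes it a natural output activation for a quantity such as the initial wealth that is assumed to be positive.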
In the second part of the learning process, we strive to find the optimal portfolio $\pi$. We use a stacked LSTM network with two layers. Each layer is built from LSTM cells, one at each timestep. We do not give a detailed description of LSTM cells here. An interested reader can find more information in (Hochreiter and Schmidhuber, 1997). At each time step, the update rule (3.1) is used to obtain the wealth state $x_i$, which we feed as an input to the first LSTM layer, as shown in Figure 4. It is important to note that learning the optimal portfolio $\pi$ is a challenging problem since the portfolio at each timestep depends indirectly on the previous portfolio values. Our network architecture captures this dependency through hidden states $h_i^j$ and cell states $c_i^j$ for $i \in [R]$ and $j = 1, 2$. The second LSTM layer is used mainly to take into account the non-linearity of our problem and consequently the non-linearity of the solution $\pi$. After the second LSTM layer returns its hidden state $h_i^2 \in \mathbb{R}^d$, we feed it to another linear network of the form (3.3), which at last yields $\pi_i$. To summarize, our network takes as input the initial stock value $[s_0, \ldots, s_0]^\top \in \mathbb{R}^d$, Brownian motion increments $B \in \mathbb{R}^{M \times d_B \times R}$ and jump increments $J \in \mathbb{R}^{M \times d_N \times R}$ and returns an output $(x_0, \pi) \in \mathbb{R}^{M \times (R+1)}$, where we can interpret $x_0$ as the option price and $\pi$ as the hedging portfolio. Once done, we compute loss$(s_R, x_R)$ and use backpropagation to update the weights. In particular, the Adam optimizer is used. It is fitting to note here how the backpropagation approach to modifying the weights connects naturally to the theoretical control problem we solve in Section 2.
Recall that, in the Stackelberg game approach, we first determine the optimal portfolio $\hat{\pi}_z$ given an arbitrary initial state $z$, and only after that do we also find the optimal initial state $\hat{z}$ where behavior under the above portfolio is assumed. Similarly, we start with an arbitrary initial state and portfolio when taking the deep learning approach. Then, once the loss is computed, we modify the weights in a backward manner. Hence we first modify the weights responsible for the selection of the portfolio $\pi_{x_0}$ given the (at the moment arbitrary) initial wealth $x_0$, and only after that do we also modify the weights associated with the initial wealth.
Remark 3.1 Since the control process $\pi$ is $\mathbb{F}$-adapted, one may feel inclined to use the Brownian motion sequence $(B_i)_{i=1}^R$ as an input for the LSTM network. Indeed, it holds that $\mathcal{F}_t^X \subseteq \mathcal{F}_t$, so it may happen that using the sequence $(x_i)_{i=1}^R$ as an input results in sub-optimal controls. However, as shown in Equation (2.3), the optimal control is of the feedback form, hence the choice of $(x_i)_{i=1}^R$ as the data input is reasonable. Apart from that, as discussed in (Han and Hu, 2021), taking $(B_i)_{i=1}^R$ as an input does not improve the performance.
To summarise, let us present our Algorithm 1 in a more compact way.
Algorithm 1 Price of a European option and a hedging portfolio in Merton model
Require: Maturity time T, drift parameter α_0, volatility parameter β_0, jump parameter γ_0, initial stock price s_0, strike price K, number of time steps R, batch size M and number of epochs P
for 1 ≤ epoch ≤ P do
    Using discretization rules in Equation (3.1) and NN architecture in Figure 4,
    compute loss(s_R, x_R) as in Equation (3.2)
    Using backpropagation, update the weights of the NN architecture
end for
return x_0 and (π_i)_{i∈[R]}

Numerical Results
Here we present the numerical results obtained using Algorithm 1. For calculations, we used the PyTorch library for Python. The hidden dimension is set to 512, the batch size to 256, and the learning rate to 0.0005. Unless stated otherwise, we set the maturity T = 1. To assess the results, an evaluation set of size 10000 is used. The programming code can be found in the GitHub repository: https://github.com/janrems/DeepLearningQHedging.

We examine four market models: the standard Black-Scholes (BS) model, the BS model with multiple independent Brownian motions, the Merton model, and the Kou double exponential jump diffusion model. Our analysis focuses on the convergence of losses in all these models. For the first three models, we also explore the convergence of initial values toward the specified option price. Additionally, we compare the predicted hedging portfolio to its analytical counterpart for both variations of the BS model. Furthermore, we assess the algorithm's performance as the input dimension increases. In the continuous case, we achieve this by comparing results from the two studied BS variations. In jump models, we test scalability with respect to the input dimension in the Merton case. Finally, we investigate the scalability of the algorithm in the temporal dimension, specifically when adjusting the timestep parameter R and maturity T.

Black Scholes market model
For the BS model, we take the number of time steps R = 80. Recall that the BS model is a complete market model, which means that there exist an initial wealth and a hedging portfolio that, at least in continuous time, ensure loss = 0. Apart from that, we have analytic solutions for both the option price and the hedging portfolio. We consider the case where in Equation (3.1) we set α_0 = 0.3, σ_0 = 0.2, γ_0 = 0, λ = 0, while we put s_0 = 1 and K = 0.5. Different choices of the listed parameters result in different training times but do not affect the model's accuracy. For this particular choice of parameters, we train for 6000 epochs.
All the results presented here and for the other models are obtained in the following way. At the epoch where the minimal loss is achieved, we save the network weights. Then, using those weights, we evaluate our model on a larger evaluation set of size 10000.
The first two outputs of the algorithm are the optimal initial wealth x_0 and the minimal obtained loss loss_min. Both of them can be found in Table 1. In Figures 5a and 5b, we can see how the loss converges towards 0 and the initial value x_0 converges towards the theoretical BS option price 0.5.

Now, let us take a look at our algorithm's other output, namely the hedging portfolio π. Unlike the initial value x_0, which is deterministic, the portfolio π differs over different market realizations. We select portfolio realizations obtained on the evaluation set and denote them with π. We also wish to determine how close π is to the theoretical portfolio given by the BS model. To obtain its discrete version from our discrete stock process (s_i)_{i∈[R]} we define and put where Φ stands for the standard normal cumulative distribution function.
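As a point of reference for the benchmark values quoted above, the following is a minimal sketch of the standard Black-Scholes call price and delta. It assumes a zero interest rate, which is our assumption (it is consistent with the quoted theoretical price of 0.5 for s_0 = 1, K = 0.5, σ_0 = 0.2 and T = 1). The function names are hypothetical, and the sketch returns the delta as a number of shares; it does not attempt to convert it to the portfolio convention used for φ in the text.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price_and_delta(s, strike, sigma, tau, rate=0.0):
    """Black-Scholes call price and delta for time to maturity tau > 0.

    rate=0.0 is an assumption of this sketch, not a value stated in the text.
    """
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(s / strike) + (rate + 0.5 * sigma**2) * tau) / sd
    d2 = d1 - sd
    price = s * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)
    delta = norm_cdf(d1)  # number of shares in the replicating portfolio
    return price, delta


# Parameters of the BS experiment described in the text.
price, delta = bs_call_price_and_delta(s=1.0, strike=0.5, sigma=0.2, tau=1.0)
print(f"price = {price:.4f}, delta = {delta:.4f}")
```

Because the option is deep in the money, the price is close to s_0 − K and the delta is close to one.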
We define and estimate the distance between π and φ as a discrete L^2 distance estimate, namely We can see one instance of market realization, namely the stock process s, the state process x, the computed portfolio π, the theoretical portfolio φ, and the option payoff F at terminal time, in Figure 5c.
The estimated portfolio follows the theoretical one closely. Moreover, as expected, the terminal wealth almost exactly matches the option payoff.

In contrast to the classical Black-Scholes market with one Brownian motion, here the noise in the model comes from multiple independent Brownian motions. In turn, this results in an incomplete market model. Let us recall that this means there is no guarantee that the terminal wealth X_T equals the option payoff F almost surely. Besides, there is, in general, a continuum of possible option prices. However, as discussed in Section 2.4.1, in the case of the European call option, following the minimal-variance hedging approach, we end up with the price and the optimal hedging strategy coming from the classical Black-Scholes model. This gives us a nice opportunity to study how our algorithm scales in the presence of multidimensional noisy input. We choose the same parameters as in the previous section, the only difference being the volatility coefficient, which is now multidimensional and given by σ_0 = [0.11, 0.16, 0.05]. Note that we have ∥σ_0∥ ≈ 0.2.
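The remark on the norm of σ_0 can be checked numerically. The sketch below computes ∥σ_0∥ and verifies by simulation that the aggregated noise σ_0 · ΔB, with independent Gaussian components, has variance ∥σ_0∥² Δt, which is why the one-dimensional BS model with volatility 0.2 is the natural comparison. The step size Δt = 1/80 and the sample size are illustrative choices of ours.

```python
import math
import random

sigma0 = [0.11, 0.16, 0.05]
norm = math.sqrt(sum(v * v for v in sigma0))  # Euclidean norm of sigma_0

rng = random.Random(0)
dt = 1.0 / 80          # illustrative step size (R = 80, T = 1)
n_samples = 20000

# Aggregated increments sigma_0 . dB with independent dB_k ~ N(0, dt).
samples = [
    sum(v * rng.gauss(0.0, math.sqrt(dt)) for v in sigma0)
    for _ in range(n_samples)
]
sample_var = sum(x * x for x in samples) / n_samples

print(f"norm of sigma_0        = {norm:.4f}")
print(f"sample variance / dt   = {sample_var / dt:.5f}")
print(f"squared norm           = {norm**2:.5f}")
```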

Comparing the results in Table 2 and the graphs in Figure 6 with their counterparts in Section 3.3.1, we can see that our algorithm scales well with the increased size of the input. The results in the two tables differ slightly, which is to be expected given the random nature of our machine learning algorithm.

Merton market model
In our particular example of the Merton model, we follow the setting from Section 2.5 and again use Algorithm 1, where we set the parameters to α_0 = 0.2, σ_0 = 0.2, γ_0 = 1, s_0 = 1 and K = 0.5. For the log-normally distributed jumps, we set the expected value µ = −0.2 and the standard deviation δ = 0.05, while the jump intensity is λ = 5. Because of the downward jumps, we need to work with a finer discretization to avoid negative wealth and stock values. Hence, we choose R = 150, which in turn leads to a longer training time, since we now have 150 times 2 LSTM cells in our NN. As the stock process is noisier than the one in the BS case, so is the portfolio, which in turn also means a harder learning task. We use 7,000 epochs and disregard the training rounds where negative values occur.
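The effect of the discretization on positivity can be illustrated with a small simulation. The sketch below uses an Euler-type step for the stock process with compensated log-normal jumps; since Equation (3.1) is not reproduced in this section, the exact form of the step is our assumption, as are the helper names `poisson` and `negative_fraction`. It only treats the stock (no portfolio), so it understates the problem for a leveraged wealth process.

```python
import math
import random


def poisson(rng, mean):
    """Sample a Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def negative_fraction(num_steps, num_paths=4000, T=1.0, alpha=0.2, sigma=0.2,
                      gamma=1.0, lam=5.0, mu=-0.2, delta=0.05, seed=0):
    """Fraction of simulated stock paths that hit a non-positive value."""
    rng = random.Random(seed)
    dt = T / num_steps
    sqrt_dt = math.sqrt(dt)
    k = math.exp(mu + 0.5 * delta**2) - 1.0  # mean relative jump E[e^Y - 1]
    negatives = 0
    for _ in range(num_paths):
        s = 1.0
        for _ in range(num_steps):
            n_jumps = poisson(rng, lam * dt)
            jumps = sum(math.exp(rng.gauss(mu, delta)) - 1.0
                        for _ in range(n_jumps))
            # Assumed Euler step: drift + diffusion + compensated jumps.
            s *= (1.0 + alpha * dt + sigma * rng.gauss(0.0, sqrt_dt)
                  + gamma * (jumps - lam * k * dt))
            if s <= 0.0:
                negatives += 1
                break
    return negatives / num_paths


coarse = negative_fraction(num_steps=2)
fine = negative_fraction(num_steps=150)
print(f"R = 2:   fraction of non-positive paths = {coarse:.4f}")
print(f"R = 150: fraction of non-positive paths = {fine:.4f}")
```

With a coarse grid, several downward jumps can fall into a single step and push the multiplicative factor below zero; with R = 150 this becomes extremely unlikely for the stock process.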
In Figure 7c, we see one market realization at the epoch at which the lowest loss was obtained.
The portfolio is much more dynamic, which, as discussed, results in slower learning. Furthermore, we see a more significant gap between terminal wealth and option payoff. As we can see in Figures 7a and 7b, both losses and initial values converge toward the desired quantities. However, convergence becomes slow after just 3000 epochs. Tackling this plateau problem, for instance by scheduling the learning rate, is one of the tasks for future work. Notice that there are a few spikes in both of these graphs. We believe they are due to computational errors that occur when the wealth process takes values close to zero, which happens more often than in the previous two cases due to downward jumps.
The option price for the case of the Merton model is given by Equation (2.28). Using the Monte Carlo approach, we get an estimate ẑ = 0.519. Here, we again emphasize that this price differs from the one presented by Merton in Equation (2.26), which equals 0.515 for our particular choice of parameters. This means that the difference between the option price proposed in (Merton, 1976) and the minimal variance price is nontrivial in practice.
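For orientation, the value 0.515 can be reproduced with the classical series representation of Merton's (1976) price, obtained by conditioning on the number of jumps. The sketch below assumes a zero interest rate, γ_0 = 1, and that jump risk is unpriced (the jump intensity and jump distribution are unchanged under the pricing measure); whether this coincides term by term with Equation (2.26) cannot be confirmed from this section, and the function name is hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_call_price(s0=1.0, strike=0.5, T=1.0, sigma=0.2,
                      lam=5.0, mu=-0.2, delta=0.05, n_terms=60):
    """Merton-type call price with zero interest rate (assumption).

    Conditional on n jumps, log S_T is normal with mean `mean` and
    variance `var`, so each term is a log-normal call expectation.
    """
    k = math.exp(mu + 0.5 * delta**2) - 1.0  # mean relative jump E[e^Y - 1]
    price = 0.0
    for n in range(n_terms):
        weight = math.exp(-lam * T) * (lam * T) ** n / math.factorial(n)
        mean = math.log(s0) - (0.5 * sigma**2 + lam * k) * T + n * mu
        var = sigma**2 * T + n * delta**2
        sd = math.sqrt(var)
        d2 = (mean - math.log(strike)) / sd
        d1 = d2 + sd
        expected_payoff = (math.exp(mean + 0.5 * var) * norm_cdf(d1)
                           - strike * norm_cdf(d2))
        price += weight * expected_payoff
    return price


merton_price = merton_call_price()
print(f"Merton-type price = {merton_price:.4f}")
```

Under these assumptions the series evaluates to roughly 0.515, slightly above the intrinsic value s_0 − K = 0.5 and below the minimal variance estimate ẑ = 0.519 reported in the text.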
In Figure 8, we can see how the distributions of the differences between terminal wealth and option payoff differ between Merton's approach and ours. Using Merton's portfolio, one usually observes small gains, which are balanced by occasional large losses. With our approach, on the other hand, the distribution is symmetric around zero. Apart from that, we are able to avoid large losses entirely, which is an advantage. The price of this advantage is that the minimal variance hedging strategy is more costly, as discussed in Section 2.5.

Fig. 8 Distribution of the difference between terminal wealth and option payoff under Merton's and our approach in 1000 realizations
The performance of our deep learning algorithm for the Merton model is presented in Table 3. Compared to the performance in the BS case, the error of the price prediction is around ten times higher, which is due to the increased volatility in the model.

Model     ẑ       x_0     loss_min
Merton    0.519   0.521   8.13e-5

Table 3: Comparison of theoretical and numerical results in Merton model

In the case of both complete and incomplete markets, we showed that the deep learning algorithm performs well by comparing the results with cases where explicit solutions or at least Monte Carlo estimates are available.
To analyze the scalability of the algorithm with respect to an increase in input dimension, similarly to the continuous case, we examine the Merton jump diffusion model driven by a multi-dimensional compound Poisson process. Specifically, we compare the outcomes with a one-dimensional version driven by a modified compound Poisson process with mixed jump distributions. We make use of the following well-known result.
Let Σ_{i=1}^{N_1(t)} Y^1_i and Σ_{i=1}^{N_2(t)} Y^2_i be two compound Poisson processes, where N_1(t) and N_2(t) are independent homogeneous Poisson processes with intensities λ_1 and λ_2, respectively. Furthermore, the jumps Y^1_i and Y^2_i are independent, coming from possibly different distributions. Consider now N(t), a homogeneous Poisson process with intensity λ = λ_1 + λ_2, and random variables

where p_l ∈ [0, 1] governs whether an exponentially distributed jump is upward or downward. To ensure the existence of moments, we need additional conditions on the intensities of the positive jump part, hence η^l_1 > 1 while η^l_2 > 0. The compensator of the drift in Equation (3.1) changes accordingly and has components

Unfortunately, the property of jumps being both positive and negative and having heavier tails makes the Kou model impractical when one wants to obtain the option price through equivalent martingale measures and the Monte Carlo approach described in Section 2.5. The reason is that the condition G(e^{Y^l_i} − 1) > −1, for G as in Equation (2.27), is not satisfied for a general choice of parameters η^l_1, η^l_2 and p_l. Nonetheless, the option price can still be obtained through the use of the deep learning algorithm. Again, to avoid negative wealth and stock values, we select R = 150. Due to the heaviness of the tails of the jump distribution, one needs to be careful when choosing parameters. In particular, η_2 should not be too small. Here we choose s_0 = 1, K = 0.5, α_0 = 0.15, σ_0 = 0.2, γ_0 = 1, λ = 10, η_1 = 50, η_2 = 25 and p = 0.3. According to (Kou, 2002), these parameters should reflect the ones observed on the US stock market.

Graphs of the loss and initial value convergence and a graph of one market realization can be found in Figure 9. The minimal obtained loss is 1.54e-5, lower than the losses recorded for the Merton model due to the choice of less noisy parameters. The algorithm predicts the initial wealth value x_0 = 0.4987. We know that the true option price should be higher than the BS price of 0.5, but not by much, due to the more conservative jump parameters. Hence, the predicted result does not seem too far off. The reason for the undershoot in the predicted initial value is the inclusion of the positive jumps in the Kou model.

A possible additional direction for advancing the current study is to explore the use of deep learning algorithms in pricing American and path-dependent options within the double exponential jump diffusion model, drawing inspiration from the research conducted in (Kou and Wang, 2004).
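To make the double exponential jump distribution of the Kou model concrete, the sketch below samples jumps and compares the Monte Carlo mean of e^Y − 1 with its closed form. It assumes the convention of (Kou, 2002) that p is the probability of an upward jump with rate η_1 and 1 − p the probability of a downward jump with rate η_2; the function names are hypothetical. The closed form also shows why η_1 > 1 is required for the mean to exist.

```python
import math
import random


def sample_kou_jump(rng, p, eta1, eta2):
    """Draw one double exponential jump Y (assumed convention: p = upward)."""
    if rng.random() < p:
        return rng.expovariate(eta1)    # upward jump, mean 1 / eta1
    return -rng.expovariate(eta2)       # downward jump, mean -1 / eta2


def kou_mean_relative_jump(p, eta1, eta2):
    """Closed form of E[e^Y - 1]; finite only if eta1 > 1."""
    if eta1 <= 1.0:
        raise ValueError("eta1 must exceed 1 for E[e^Y] to be finite")
    return p * eta1 / (eta1 - 1.0) + (1.0 - p) * eta2 / (eta2 + 1.0) - 1.0


# Parameters quoted in the text for the Kou experiment.
p, eta1, eta2 = 0.3, 50.0, 25.0
rng = random.Random(0)
n_samples = 200000
mc_mean = sum(math.exp(sample_kou_jump(rng, p, eta1, eta2)) - 1.0
              for _ in range(n_samples)) / n_samples
exact_mean = kou_mean_relative_jump(p, eta1, eta2)

print(f"Monte Carlo E[e^Y - 1] = {mc_mean:.5f}")
print(f"closed form E[e^Y - 1] = {exact_mean:.5f}")
```

For these parameters the mean relative jump is about −0.021, much smaller in magnitude than in the Merton example, which is in line with the description of the jump parameters as more conservative.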

Scalability in temporal dimension
In this section, we explore how the algorithm's performance is influenced by variations in the temporal dimension. Specifically, we consider different maturities T and varying numbers of timesteps R.
We examined the algorithm's performance within the context of both the complete BS market model and the incomplete Merton market model. Various combinations of the parameters T and R were examined. The financial model parameters remain consistent with those outlined in Sections 3.3.1 and 3.3.3.
Concerning machine learning parameters, we maintained a batch size of 256, a hidden dimension of 512, and a learning rate of 0.0005. For practical reasons, we ran the learning process for a fixed 3000 epochs in each studied case. In both market models, the algorithm's performance was evaluated by comparing the loss and the absolute error of the predicted option prices. For the BS model, the estimated expected L^2 distance defined in Equation (3.4) was also included in the assessment.
Results are presented in Tables 5 and 6 for the BS and Merton models, respectively.
It should be noted that, due to the random nature of both the inputs and the algorithm, the presented results are noisy as well.
Nonetheless, we may conclude with reasonable confidence that the algorithm's performance deteriorates with increasing maturity. This is particularly evident in the absolute error of option prices (Table 5b) and the expected L^2 distance (Table 5c). The use of the discrete Euler-Maruyama approximation to the continuous solution likely contributes to this trend, as errors accumulate over time. We believe that more computing power may improve performance by allowing finer discretization, longer training, and higher hidden dimensions.
Regarding performance with respect to the discretization parameter R, it is harder to draw strong conclusions, since the differences in the results are less significant. It seems that finer discretization positively impacts the algorithm's performance in terms of loss and absolute error in option prices. Conversely, an inverse trend is observed in the expected L^2 distance. This discrepancy may arise from increased variation in the discrete theoretical portfolio as the number of timesteps grows. Nevertheless, focusing on loss and option price, which are the key outputs of our algorithm, we find a positive response to an increased number of discretization points. As discussed earlier, this may prove beneficial when considering longer maturities.

Before presenting the results for the Merton case, we must first address the problem regarding the use of discretizations with a small number of timesteps, which was already mentioned in Section 3.3.3. To circumvent the problem of negative wealth and stock values, we present an alternative discretization approach. Instead of directly working with the discrete wealth process x given by the update rule in Equation (3.1), we take its logarithm. Using Itô's formula and the Euler-Maruyama scheme, we obtain its discrete version through an update rule where y_0 = log(x_0) and then put x_i = exp(y_i). The same is done for the discrete stock process s, where we additionally put π = 1.
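The idea behind the logarithmic discretization can be sketched as follows. Since the paper's update rules are not reproduced in this section, both steps below are our assumptions: `direct_step` is an Euler step for a wealth process with proportional portfolio π and compensated jumps, and `log_step` is the corresponding step for y = log(x) obtained from Itô's formula for jump diffusions. The argument `jumps` holds relative jump sizes e^Y − 1, `k` is their mean, and all names are hypothetical.

```python
import math


def direct_step(x, pi, dt, dB, jumps, alpha, sigma, gamma, lam, k):
    """Assumed Euler step applied to the wealth itself; may turn negative."""
    rel = alpha * dt + sigma * dB + gamma * (sum(jumps) - lam * k * dt)
    return x * (1.0 + pi * rel)


def log_step(y, pi, dt, dB, jumps, alpha, sigma, gamma, lam, k):
    """Assumed Euler step for y = log(x); exp(y) is positive by construction.

    Requires 1 + pi * gamma * j > 0 for every relative jump j.
    """
    drift = (pi * alpha - 0.5 * (pi * sigma) ** 2 - pi * gamma * lam * k) * dt
    jump_part = sum(math.log(1.0 + pi * gamma * j) for j in jumps)
    return y + drift + pi * sigma * dB + jump_part


# One coarse step with four downward jumps and a leveraged position.
params = dict(pi=2.0, dt=0.1, dB=0.0, jumps=[-0.18] * 4,
              alpha=0.2, sigma=0.2, gamma=1.0, lam=5.0, k=-0.18)
x_direct = direct_step(1.0, **params)
x_log = math.exp(log_step(math.log(1.0), **params))

print(f"direct scheme: x = {x_direct:.4f}")
print(f"log scheme:    x = {x_log:.4f}")
```

In the log scheme the control enters through the terms −(πσ)²/2 and log(1 + πγ j), which illustrates the non-linear dependence of y on π.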
Let us now present the results regarding the Merton case. In Tables 6a and 6b, we see that the algorithm responds to changes in maturity and the number of timesteps in a similar way as in the BS case. The weaker performance compared with the BS case is due to the increased randomness in the model, as explained in Section 3.3.3. Additionally, it must be mentioned that the algorithm's performance slightly decreases when the logarithmic approach outlined above is used. This decline could be attributed to the non-linear dependence of the process y on the control variable π. Hence, for the jump diffusion models, we advise the use of the original update rule with at least 150 timesteps.

Fig. 4 Neural network architecture with two LSTM layers.Dashed lines represent the update rule in Equation (3.1)

Fig. 5 Convergence of loss and initial values and one market realization for the BS model

Fig. 6 Convergence of loss and initial values and one market realization in the BS model with multiple Brownian motions

Fig. 7 Convergence of loss and initial values and one market realization for the Merton model

Fig. 9 Convergence of loss and initial values and one market realization for the Kou model

Table 1: Comparison of theoretical and numerical results in BS model

3.3.2 Black Scholes market with multiple independent Brownian motions

Table 2: Comparison of theoretical and numerical results for the BS model with multiple Brownian motions (BSMB)

Table 4: Comparison of theoretical and numerical results in Merton model

have the same distribution. This result easily extends to the sum of an arbitrary number of compound Poisson processes. For simulation purposes, we select a three-dimensional compound Poisson process with intensities [3, 5, 2]. The lognormal jumps have means [0.1, 0.1, 0.05] and standard deviations [0.05, 0.02, 0.01]. The obtained results are presented in Table 4. Once again, the similarity in the results indicates that our algorithm scales effectively even in cases involving discontinuous multi-dimensional input.

Here we present numerical results for a market model introduced in (Kou, 2002). Kou proposed a model where the stock's behaviour is governed by a jump diffusion process in which the jumps have a double exponential distribution. This means we can use the same type of discretization as for the Merton case described in equations (3.1) and (3.1), where the jump variables Y^l_i now follow a double exponential distribution

Table 5: Comparison of algorithm performance for different choices of timesteps R and maturities T - BS model

Table 6: Comparison of algorithm performance for different choices of timesteps R and maturities T - Merton model