Abstract
We price European-style options written on forward contracts in a commodity market, which we model with an infinite-dimensional Heath–Jarrow–Morton (HJM) approach. For this purpose, we introduce a new class of state-dependent volatility operators that map the square-integrable noise into the Filipović space of forward curves. For calibration, we specify a fully parametrized version of our model and train a neural network to approximate the true option price as a function of the model parameters. This neural network can then be used to calibrate the HJM parameters based on observed option prices. We conduct a numerical case study based on artificially generated option prices in a deterministic volatility setting. In this setting, we derive closed-form pricing formulas, allowing us to benchmark the neural network-based calibration approach. We also study calibration in illiquid markets with a large bid–ask spread. The experiments reveal a high degree of accuracy in recovering the prices after calibration, even if the original meaning of the model parameters is partly lost in the approximation step.
Notes
The code is implemented in TensorFlow 2.1.0 and is available at https://github.com/silvialava/HJM_calibration_with_NN.
References
Andresen, A., Koekebakker, S., & Westgaard, S. (2010). Modeling electricity forward prices using the multivariate normal inverse Gaussian distribution. The Journal of Energy Markets, 3(3), 3.
Barth, A., & Benth, F. E. (2014). The forward dynamics in energy markets: Infinite-dimensional modelling and simulation. Stochastics: An International Journal of Probability and Stochastic Processes, 86(6), 932–966.
Barth, A., & Lang, A. (2012). Simulation of stochastic partial differential equations using finite element methods. Stochastics: An International Journal of Probability and Stochastic Processes, 84(2–3), 217–231.
Barth, A., & Lang, A. (2012). Multilevel Monte Carlo method with applications to stochastic partial differential equations. International Journal of Computer Mathematics, 89(18), 2479–2498.
Barth, A., Lang, A., & Schwab, C. (2013). Multilevel Monte Carlo method for parabolic stochastic partial differential equations. BIT Numerical Mathematics, 53(1), 3–27.
Bayer, C., & Stemper, B. (2018). Deep calibration of rough stochastic volatility models. arXiv:1810.03399
Bayer, C., Horvath, B., Muguruza, A., Stemper, B., & Tomas, M. (2019). On deep calibration of (rough) stochastic volatility models. arXiv:1908.08806
Benth, F. E. (2015). Kriging smooth energy futures curves. Energy Risk.
Benth, F. E., Benth, J. Š., & Koekebakker, S. (2008) Stochastic modelling of electricity and related markets (Vol. 11). World Scientific.
Benth, F. E., & Koekebakker, S. (2008). Stochastic modeling of financial electricity contracts. Energy Economics, 30(3), 1116–1157.
Benth, F. E., & Krühner, P. (2014). Representation of infinite-dimensional forward price models in commodity markets. Communications in Mathematics and Statistics, 2(1), 47–106.
Benth, F. E., & Krühner, P. (2015). Derivatives pricing in energy markets: An infinite-dimensional approach. SIAM Journal on Financial Mathematics, 6(1), 825–869.
Benth, F. E., & Paraschiv, F. (2018). A space-time random field model for electricity forward prices. Journal of Banking & Finance, 95, 203–216.
Bühler, H., Gonon, L., Teichmann, J., & Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8), 1271–1291.
Carmona, R., & Nadtochiy, S. (2012). Tangent Lévy market models. Finance and Stochastics, 16(1), 63–104.
Chataigner, M., Crépey, S., & Dixon, M. (2020). Deep local volatility. Risks, 8(3), 82.
Clewlow, L., & Strickland, C. (2000). Energy derivatives: Pricing and risk management. London: Lacima Publications.
Cuchiero, C., Khosrawi, W., & Teichmann, J. (2020). A generative adversarial network approach to calibration of local stochastic volatility models. Risks, 8(4), 101.
Da Prato, G., & Zabczyk, J. (2014). Stochastic equations in infinite dimensions. Cambridge: Cambridge University Press.
De Spiegeleer, J., Madan, D. B., Reyners, S., & Schoutens, W. (2018). Machine learning for quantitative finance: fast derivative pricing, hedging and fitting. Quantitative Finance, 18(10), 1635–1643.
Engel, K.-J., & Nagel, R. (2006). A short course on operator semigroups. Berlin: Springer.
Ferguson, R., & Green, A. (2018). Deeply learning derivatives. arXiv preprint arXiv:1809.02233
Filipović, D. (2001). Consistency problems for Heath–Jarrow–Morton interest rate models. Lecture notes in mathematics (Vol. 1760). Springer.
Filipović, D. (2009). Term-structure models. A graduate course. Berlin: Springer.
Frestad, D. (2008). Common and unique factors influencing daily swap returns in the Nordic electricity market, 1997–2005. Energy Economics, 30(3), 1081–1097.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. London: MIT Press.
Gottschling, N. M., Antun, V., Adcock, B., & Hansen, A. C. (2020). The troublesome kernel: Why deep learning for inverse problems is typically unstable. arXiv preprint arXiv:2001.01258
Heath, D., Jarrow, R., & Morton, A. (1992). Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation. Econometrica, 60(1), 77–105.
Hernandez, A. (2016). Model calibration with neural networks. Available at SSRN 2812140.
Higham, C. F., & Higham, D. J. (2019). Deep learning: An introduction for applied mathematicians. SIAM Review, 61(4), 860–891.
Horvath, B., Muguruza, A., & Tomas, M. (2020). Deep learning volatility: A deep neural network perspective on pricing and calibration in (rough) volatility models. Quantitative Finance, 1–17.
Hutchinson, J. M., Lo, A. W., & Poggio, T. (1994). A nonparametric approach to pricing and hedging derivative securities via learning networks. The Journal of Finance, 49(3), 851–889.
Kallsen, J., & Krühner, P. (2015). On a Heath–Jarrow–Morton approach for stock options. Finance and Stochastics, 19(3), 583–615.
Kingma, D. P. & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Koekebakker, S., & Ollmar, F. (2005). Forward curve dynamics in the Nordic electricity market. Managerial Finance, 31(6), 73–94.
Kondratyev, A. (2018). Learning curve dynamics with artificial neural networks. Available at SSRN 3041232.
Kovács, M., Larsson, S., & Lindgren, F. (2010). Strong convergence of the finite element method with truncated noise for semilinear parabolic stochastic equations with additive noise. Numerical Algorithms, 53(2–3), 309–320.
Nelson, C. R., & Siegel, A. F. (1987). Parsimonious modeling of yield curves. Journal of Business, 60(4), 473–489.
Peszat, S., & Zabczyk, J. (2007). Stochastic partial differential equations with Lévy noise: An evolution equation approach (Vol. 113). Cambridge: Cambridge University Press.
Rynne, B., & Youngson, M. A. (2013). Linear functional analysis. Berlin: Springer Science & Business Media.
Tappe, S. (2012). Some refinements of existence results for SPDEs driven by Wiener processes and Poisson random measures. International Journal of Stochastic Analysis.
Acknowledgements
The authors would like to thank Vegard Antun for valuable coding support and related advice, and Christian Bayer for useful discussions. The authors are also grateful to two anonymous referees for their valuable comments, which helped to improve the exposition of the paper with the goal of reaching a larger audience.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Part of the project has been carried out during Silvia Lavagnini’s 3-month visit at UCSB, funded by the Kristine Bonnevie travel stipend 2019 from the Faculty of Mathematics and Natural Sciences (University of Oslo).
Appendices
Appendix A: Proofs of the main results
We report in this section the proofs of the main results, in the order they appear in the paper.
1.1 A.1 Proof of Theorem 2.2
For every \(x\in \mathbb {R}_{+}\) and \(f \in \mathcal {H}_{\alpha }\), by the Cauchy–Schwarz inequality we can write that
which is bounded, since \(\kappa _t (x,\cdot ,f) \in \mathcal {H}\) for every \(x\in \mathbb {R}_+\) and every \(f \in \mathcal {H}_{\alpha }\) by Assumption 1, and because \(h\in \mathcal {H}\). Thus, \(\sigma _t(f) h\) is well defined for all \(h \in \mathcal {H}\).
We need to show that \(\sigma _t(f) h \in \mathcal {H}_{\alpha }\) for every \(f \in \mathcal {H}_{\alpha }\). We start by noticing that for every \(x\in \mathbb {R}_{+}\) the following equality holds:
where the differentiation under the integral sign is justified by dominated convergence because of Assumption 2 and the fact that \(\int _{\mathcal {O}} \bar{\kappa }_x (y) h(y)\,\mathrm{d}y < \infty\). Moreover, by Assumption 3 and the Cauchy–Schwarz inequality, we find that
which shows that \(\sigma _t(f) h\in \mathcal {H}_{\alpha }\) and boundedness of the operator \(\sigma _t(f)\) for each \(f \in \mathcal {H}_{\alpha }.\)
1.2 A.2 Proof of Theorem 2.3
We first observe that for every \(h\in \mathcal {H}\) and every \(f, f_1\in \mathcal {H}_{\alpha }\), it holds that
where we used the Cauchy–Schwarz inequality twice. By Assumption 3 this is bounded, which allows us to apply Fubini's theorem and calculate as follows:
for \(\sigma _t(f)^{*}f_1\) defined by
From Rynne and Youngson (2013, Theorem 6.1), \(\sigma _t(f)^{*}\) is the unique adjoint operator of \(\sigma _t(f)\), for \(f \in \mathcal {H}_{\alpha }\).
1.3 A.3 Proof of Theorem 2.4
We start with the growth condition. For \(h\in \mathcal {H}\) and \(f_1\in \mathcal {H}_{\alpha }\), we can write that
where we have used the Cauchy–Schwarz inequality, together with the inequality \(\left|f_1(0)\right|\le \left\Vert f_1\right\Vert _{\alpha }\) and Assumption 2. With some abuse of notation, it follows that \(\left\Vert \sigma _{t}(f_1)\right\Vert _{\mathcal {L}(\mathcal {H}, \mathcal {H}_{\alpha })} \le C(t)(1+\left\Vert f_1\right\Vert _{\alpha })\) for a suitably chosen constant C(t). Similarly, from Assumption 1, it follows that
from which \(\left\Vert \sigma _{t}(f_1)-\sigma _{t}(f_2)\right\Vert _{\mathcal {L} (\mathcal {H}, \mathcal {H}_{\alpha })}\le C(t) \left\Vert f_1 -f_2\right\Vert _{\alpha }\) for a suitably chosen C(t), which proves the Lipschitz continuity of the volatility operator, and concludes the proof.
1.4 A.4 Proof of Proposition 2.5
For the volatility operator \(\sigma _t\) to be well defined, we need to check that the function \(\kappa _t\) introduced in Eq. (2.6) satisfies the assumptions of Theorem 2.2. We start by observing that \(\kappa _t(x,\cdot ) \in \mathcal {H}\) if and only if \(\omega \in \mathcal {H}\). Then we can calculate the derivative
which, in particular, by Assumption 2 is bounded by
For the \(\mathcal {H}\)-norm, we then have that
where we have used that \(\left\Vert \bar{\omega }_x\right\Vert \le C_1\), which implies that Assumption 3 in Theorem 2.2 is satisfied for \(\alpha\) such that \(\int _{\mathbb {R}_+} e^{-2bx} \alpha (x)\,\mathrm{d}x <\infty\). Finally, the Lipschitz condition is trivially satisfied, and the growth condition is fulfilled because a(t) is bounded.
1.5 A.5 Proof of Lemma 3.1
For w in Eq. (3.2), we get that \(w_{\ell }(v)= \frac{1}{\ell }\) and \(\mathcal {W}_{\ell }(u) = \frac{u}{\ell }\). Then
and from Eqs. (3.4) and (3.6), we can write that
Integration by parts gives the result.
1.6 A.6 Proof of Proposition 3.2
Let \(f := \mathcal {D}_{\ell }^{w*}\delta _{T_1-s}^*(1)\). We start by applying the covariance operator to \(h := \sigma _s(g_s)^*f:\)
where we used Theorem 2.3 and the linearity of the scalar product. Further, we apply \(\sigma _s(g_s)\):
for \(\varPsi _s(x, \cdot ) := \int _{\mathcal {O}} \int _{\mathcal {O}}\kappa _s(x,z, g_s)q(z,y) \kappa _s (\cdot ,y, g_s)\mathrm{d}y \mathrm{d}z\). We go now back to the definition of f:
By Lemma 3.1, we can write that
to which, finally, we apply the operator \(\delta _{T_1-s}\mathcal {D}_{\ell }^{w}\):
finalizing the proof.
1.7 A.7 Proof of Proposition 5.2
We consider the representation
where we have introduced
By applying integration by parts repeatedly, and since \(\omega ''\) vanishes, we obtain
By substituting Eq. (A.2) into (A.1), we get that
where we used the definition of \(d_{\ell }\) in Lemma 3.1. By integration by parts, we get
where we introduced
Using the definition of \(\omega\) in Eq. (5.3), we get that
where \(\mathrm {sgn}\) denotes the sign function. Similarly,
By substituting these findings and rearranging the terms, we get that
which concludes the proof.
Appendix B: The non-injectivity issue
From the numerical experiments, we observe that the accuracy achieved in calibration is not particularly convincing, especially for the parameters governing the volatility and the covariance operator, a, b and k. Slightly better results were obtained for the Nelson–Siegel curve parameters \(\alpha _0\), \(\alpha _1\) and \(\alpha _3\), with the exception of \(\alpha _2\) (see Figs. 3, 6 and 8). On the other hand, the relative error for the price approximation after calibration shows a high degree of accuracy (Figs. 4, 7 and 9). We may conclude that the original meaning of the model parameters is lost in the approximation step. Indeed, as pointed out in Bayer and Stemper (2018), it is somewhat to be expected that the neural network is non-injective in the input parameters on a large part of the input domain. We shall briefly analyse this.
The pricing formula (5.1), once the strike K and the time to maturity \(\tau\) are fixed, crucially depends on \(\xi\) and \(\mu (g_{t})\) as derived, respectively, in Proposition 5.3 and Eq. (5.6):
However, \(\xi\) is only a scaling factor, while \(\mu (g_{t})\) has more influence on the final price level, since it determines the distance from the strike price K. Let us first focus on \(\xi\):
where \(B(b^2, e^{-b\ell })\) simply indicates a term proportional to \(b^2\) and \(e^{-b\ell }\). In the front coefficient, a decrease in a might be, for example, compensated by an increase in b or k, and vice versa, meaning that several combinations of values for a, b, and k lead to the same overall \(\xi\). Thus, we may suspect that it is hard for the neural network to identify the right vector of parameters, despite reaching a good level of accuracy for the price.
In Fig. 11, we report an example of non-injectivity with respect to the parameters a, b and k that we have observed in the grid-based learning approach. Here the neural network is not injective when all the parameters except one are fixed, and is only mildly sensitive to changes in the parameters. This also explains the difficulty in calibration.
Similar observations can be made for the drift:
The role of \(\alpha _0\) is special, since it defines the starting level of the curve, and indeed \(\alpha _0\) is the parameter that is estimated most accurately. However, \(\alpha _2\) appears first added to \(\alpha _1\) and then multiplied by \(\alpha _3\), making it hard for the neural network to disentangle its role. In the Nelson–Siegel curve in Eq. (5.5), \(\alpha _2\) defines the position of the “bump”, but the drift \(\mu (g_t)\) is obtained by integrating the curve within the delivery period of the contract. This integration smooths the curve and makes it hard to locate the “bump”. This might explain why the accuracy in estimating \(\alpha _2\) is worse than for the other Nelson–Siegel parameters.
We conclude the article with the following theorem showing that it is possible to construct ReLU neural networks which act as simple linear maps.
Theorem B.1
Let \(A \in \mathbb {R}^{p \times d}\). Then for any \(L \ge 2\) and any \(\mathbf {n}= (d, n_1, \ldots , n_{L-1}, p)\) with \(n_i \ge 2d\), \(i=1,\ldots , (L-1)\), there exists an L-layer ReLU neural network \(\mathcal {N} :\mathbb {R}^d \rightarrow \mathbb {R}^p\) with dimension \(\mathbf {n}\), which satisfies
Proof
We follow a similar approach to Gottschling et al. (2020, Section 8.5). Let \(\nu _i\ge 0\) be such that \(n_i = 2d+\nu _i\) for \(i=1, \dots ,(L-1)\). For \(I_d\) the identity matrix of dimension d, we define the following weights:
where \(\top\) denotes the transpose operator. Here \(O_i \in \mathbb {R}^{d\times \nu _i}\) are matrices with all entries equal to 0, padding the dimensions so that \(V_i \in \mathbb {R}^{n_{i}\times n_{i-1}}\) for \(i=1,\ldots ,(L-1)\). By considering zero-bias vectors \(v_i\), the linear maps \(H_i\) introduced in the neural network definition in Eq. (4.1) then coincide with the matrices \(V_i\).
We observe that for every \(x\in \mathbb {R}^d\), the ReLU activation function satisfies
where the activation function is meant to act componentwise. By straightforward calculation, one can then see that the neural network defined here satisfies the equality \(\mathcal {N}(x) = Ax\) for every \(x\in \mathbb {R}^d\), which means that it acts on x as a linear map. \(\square\)
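The construction in the proof can be checked numerically. The following sketch builds the weight matrices explicitly: the first layer maps \(x \mapsto (x, -x, 0)\), the hidden layers reproduce this split via the identity \(\mathrm{ReLU}(x) - \mathrm{ReLU}(-x) = x\), and the last layer applies \((A, -A, 0)\). The function names are ours, for illustration only.

```python
import numpy as np

def build_relu_linear_net(A, L=3, nu=0):
    """Weights of an L-layer ReLU network computing x -> A x,
    with zero-padding blocks of width nu as in the proof of Theorem B.1."""
    p, d = A.shape
    I = np.eye(d)
    # first layer: x -> [x; -x; 0], so ReLU yields [x_+; x_-; 0]
    W = [np.vstack([I, -I, np.zeros((nu, d))])]
    # hidden layers: [u; v; 0] -> [u - v; v - u; 0]; since u - v = x,
    # ReLU again recovers [x_+; x_-; 0]
    for _ in range(L - 2):
        W.append(np.block([
            [I, -I, np.zeros((d, nu))],
            [-I, I, np.zeros((d, nu))],
            [np.zeros((nu, 2 * d + nu))],
        ]))
    # output layer (no activation): [x_+; x_-; 0] -> A x_+ - A x_- = A x
    W.append(np.hstack([A, -A, np.zeros((p, nu))]))
    return W

def forward(W, x):
    a = x
    for V in W[:-1]:
        a = np.maximum(V @ a, 0.0)  # ReLU on every hidden layer
    return W[-1] @ a

A = np.array([[1.0, -2.0], [0.5, 3.0]])
W = build_relu_linear_net(A, L=4, nu=1)
x = np.array([-1.3, 2.7])
print(forward(W, x), A @ x)  # the two outputs coincide
```

In particular, choosing a non-injective A (e.g. a rank-deficient or zero matrix) yields a non-injective ReLU network, as used in the discussion below Theorem B.1.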
Theorem B.1 proves that we can construct a ReLU L-layer neural network which corresponds to a linear map. As there are infinitely many non-injective linear maps (the zero map being a trivial example), it is then possible to construct infinitely many non-injective ReLU neural networks. Obviously, this does not show that a non-injective network, such as the one constructed in the proof of Theorem B.1, will also minimize the objective function used for training. It might, however, offer some insight into why neural networks are unlikely to be injective in their input parameters.
Cite this article
Benth, F.E., Detering, N. & Lavagnini, S. Accuracy of deep learning in calibrating HJM forward curves. Digit Finance 3, 209–248 (2021). https://doi.org/10.1007/s42521-021-00030-w
Keywords
- Heath–Jarrow–Morton approach
- Infinite dimension
- Energy markets
- Option pricing
- Neural networks
- Model calibration