Abstract
Non-linear partial differential Kolmogorov equations are successfully used to describe a wide range of time-dependent phenomena in natural sciences, engineering, and finance. For example, in physical systems the Allen–Cahn equation describes pattern formation associated with phase transitions, while in finance the Black–Scholes equation describes the evolution of the price of derivative investment instruments. Such modern applications often require solving these equations in high-dimensional regimes in which classical approaches are ineffective. Recently, an interesting new approach based on deep learning has been introduced by E, Han and Jentzen [1, 2]. The main idea is to construct a deep network which is trained from the samples of discrete stochastic differential equations underlying Kolmogorov's equation. The network is able to approximate, numerically at least, the solutions of the Kolmogorov equation with polynomial complexity in whole spatial domains. In this contribution we study variants of the deep networks by using different discretization schemes of the stochastic differential equation. We compare the performance of the associated networks on benchmarked examples and show that, for some discretization schemes, improvements in the accuracy are possible without affecting the observed computational complexity.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017). https://doi.org/10.1007/s40304-017-0117-6
Bellman, R.: Dynamic Programming, vol. XXV. Princeton University Press, Princeton, NJ (1957)
Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A.: Deep splitting method for parabolic PDEs. arXiv:1907.03452 (2019)
Chan-Wai-Nam, Q., Mikael, J., Warin, X.: Machine learning for semi linear PDEs. J. Sci. Comput. 79(3), 1667–1712 (2019). https://doi.org/10.1007/s10915-019-00908-3
Lee, H., Kang, I.S.: Neural algorithm for solving differential equations. J. Comput. Phys. 91(1), 110–131 (1990)
Meade, A.J., Jr., Fernandez, A.A.: Solution of nonlinear ordinary differential equations by feedforward neural networks. Math. Comput. Model. 20(9), 19–44 (1994)
Dissanayake, M., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Methods Eng. 10(3), 195–201 (1994)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998)
Lagaris, I., Likas, A., Papageorgiou, D.: Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 11, 1041–1049 (2000)
Malek, A., Shekari-Beidokhti, R.: Numerical solution for high order differential equations using a hybrid neural network—optimization method. Appl. Math. Comput. 183(1), 260–271 (2006)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Sahli-Costabal, F., Yang, Y., Perdikaris, P., Hurtado, D.E., Kuhl, E.: Physics-informed neural networks for cardiac activation mapping. Front. Phys. 8, 42 (2020). https://doi.org/10.3389/fphy.2020.00042
Raissi, M., Yazdani, A., Karniadakis, G.L.: Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science 367, eaaw4741 (2020)
Sirignano, J., Spiliopoulos, K.: DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
Bismut, J.-M.: Théorie probabiliste du contrôle des diffusions, vol. 167. American Mathematical Society (AMS), Providence, RI (1976)
Pardoux, E., Peng, S.: Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 14(1), 55–61 (1990)
Zhou, M., Han, J., Lu, J.: Actor-critic method for high dimensional static Hamilton–Jacobi–Bellman partial differential equations based on neural networks. SIAM J. Sci. Comput. 43(6), A4043–A4066 (2021)
Beck, C., E, W., Jentzen, A.: Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci. 29(4), 1563–1619 (2019). https://doi.org/10.1007/s00332-018-9525-3
Huré, C., Pham, H., Warin, X.: Some machine learning schemes for high-dimensional nonlinear PDEs. arXiv:1902.01599 (2019)
Pham, H., Warin, X.: Neural networks-based backward scheme for fully nonlinear PDEs. CoRR arXiv:1908.00412 (2019)
Raissi, M.: Forward–backward stochastic neural networks: deep learning of high-dimensional partial differential equations. arXiv:1804.07010 (2018)
Gonon, L., Schwab, C.: Deep ReLU network expression rates for option prices in high-dimensional, exponential Lévy models. In: Seminar for Applied Mathematics, ETH Zürich, Switzerland, Technical Report 2020-52 (2020). https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2020/2020-52.pdf
Han, J., Long, J.: Convergence of the deep BSDE method for coupled FBSDEs. Probab. Uncertain. Quant. Risk 5(1), 1–33 (2020)
Jiang, Y., Li, J.: Convergence of the deep BSDE method for FBSDEs with non-Lipschitz coefficients. arXiv:2101.01869 (2021)
E, W., Han, J., Jentzen, A.: Algorithms for solving high dimensional PDEs: from nonlinear Monte Carlo to machine learning. Nonlinearity 35(1), 278 (2021)
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. arXiv:1809.03062 (2018)
Grohs, P., Hornung, F., Jentzen, A., von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations (2018)
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differ. Equ. Appl. 1(2) (2020). https://doi.org/10.1007/s42985-019-0006-9
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: Multilevel Picard iterations for solving smooth semilinear parabolic heat equations. Numer. Anal. (2016)
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. J. Sci. Comput. 79(3), 1534–1571 (2019). https://doi.org/10.1007/s10915-018-00903-0
Becker, S., Braunwarth, R., Hutzenthaler, M., Jentzen, A., von Wurstemberger, P.: Numerical simulations for full history recursive multilevel Picard approximations for systems of high-dimensional partial differential equations. arXiv:2005.10206 (2020)
Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations, vol. 23. Springer, Berlin (2013)
Leimkuhler, B., Matthews, C.: Rational construction of stochastic numerical methods for molecular sampling. Appl. Math. Res. Exp. 2013(1), 34–56 (2012)
Marino, R.: DNN-PDEs. https://github.com/RaffaeleMarino/DNN-PDEs (2019)
Kolmogoroff, A.: The theory of continuous random processes. Math. Ann. 108(1), 149–160 (1933)
Gardiner, C.W., et al.: Handbook of Stochastic Methods, vol. 3. Springer, Berlin (1985)
Glasserman, P.: Monte Carlo Methods in Financial Engineering, vol. 53. Springer, Berlin (2004)
Brenner, H.: Coupling between the translational and rotational Brownian motions of rigid particles of arbitrary shape: II. General theory. J. Colloid Interface Sci. 23(3), 407–436 (1967)
Brenner, H.: Taylor dispersion in systems of sedimenting nonspherical Brownian particles. J. Colloid Interface Sci. 80(2), 548–588 (1981)
Brenner, H.: Taylor dispersion in systems of sedimenting nonspherical Brownian particles: II. Homogeneous ellipsoidal particles. J. Colloid Interface Sci. 80(2), 548–588 (1981)
Marino, R., Aurell, E.: Advective-diffusive motion on large scales from small-scale dynamics with an internal symmetry. Phys. Rev. E 93(6), 062147 (2016)
Marino, R., Eichhorn, R., Aurell, E.: Entropy production of a Brownian ellipsoid in the overdamped limit. Phys. Rev. E 93(1), 012132 (2016)
Aurell, E., Bo, S., Dias, M., Eichhorn, R., Marino, R.: Diffusion of a Brownian ellipsoid in a force field. EPL (Europhys. Lett.) 114(3), 30005 (2016)
Jentzen, A., Röckner, M.: A Milstein scheme for SPDEs. Found. Comput. Math. 15(2), 313–362 (2015)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)
Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637–654 (1973)
Hull, J.: Options, Futures, and Other Derivatives, 6th ed. Pearson Prentice Hall, Upper Saddle River, NJ (2006). http://gso.gbv.de/DB=2.1/CMD?ACT=SRCHA&SRT=YOP&IKT=1016&TRM=ppn+563580607&sourceid=fbw_bibsonomy
Hammersley, J.: Monte Carlo methods. Springer, Berlin (2013)
Hutzenthaler, M., Jentzen, A., Kruse, T. et al.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. arXiv:1708.03223 (2017)
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. J. Sci. Comput. 79(3), 1534–1571 (2019)
Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving stochastic differential equations and Kolmogorov equations by means of deep learning. arXiv:1806.00421 (2018)
Acknowledgements
The work of R. M. was supported by Swiss National Foundation grant number 200021E 17554; he is now supported by the FARE grant No. R167TEEE7B. N. M. learned about the topic during the conference “Intelligent Machines and Mathematics” in Bologna (January 2019) and acknowledges interesting related discussions with Pierluigi Contucci and Philip Grohs. We also thank Martin Hutzenthaler as well as the referees for constructive comments that improved the paper.
Ethics declarations
Conflict of interest
The authors declare no competing financial interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Derivation of Equation (28)
For completeness we derive Eq. (28), which follows from the Leimkuhler and Matthews discretization of stochastic trajectories given by (21). Note the differences between this equation and (24), which is implied by the Euler–Maruyama discretization (19). Using (14) and applying Itô's lemma (i.e., expanding to second order), we get for small enough \(\tau \)
To evaluate the last term we work in components and use \(\mathbb {E}[\Delta W_j^n\Delta W_i^n] = \delta _{ij}\tau \) and \(\mathbb {E}[\Delta W_j^n\Delta W_i^{n+1}] = 0\) (because increments are independent). This yields
Finally, using (1) for the partial derivative with respect to time we obtain (28).
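The averaging of consecutive Wiener increments, which is what makes the cross term \(\mathbb {E}[\Delta W_j^n\Delta W_i^{n+1}]\) appear in the derivation above, can be sketched in a few lines. This is a minimal illustration for a generic drift \(\mu \) and a constant scalar diffusion coefficient \(\sigma \); the function name is ours, not the paper's.

```python
import numpy as np

def lm_trajectory(mu, sigma, y0, tau, n_steps, rng):
    """Leimkuhler-Matthews discretization: the noise entering step n is the
    average of two consecutive Wiener increments, (dW^n + dW^{n+1}) / 2."""
    y = np.array(y0, dtype=float)
    d = y.size
    # n_steps + 1 increments are needed: step n uses dW[n] and dW[n + 1]
    dW = rng.normal(0.0, np.sqrt(tau), size=(n_steps + 1, d))
    path = [y.copy()]
    for n in range(n_steps):
        y = y + mu(y) * tau + sigma * 0.5 * (dW[n] + dW[n + 1])
        path.append(y.copy())
    return np.array(path)
```

With \(\sigma = 0\) the update reduces to the explicit Euler step, which gives a quick consistency check of the deterministic part.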
Appendix B: Further Assessment of the LM Scheme for the Heat Equation
The Leimkuhler–Matthews (LM) scheme appears to perform worse than both the Euler–Maruyama and Milstein schemes for the Black–Scholes equation (Sect. 5.1). This is also the case when the diffusion tensor is homogeneous, as tested on the Allen–Cahn equation (Sect. 5.2). However, we observe slight improvements when the time step \(\tau \) of the discretization becomes smaller, and we therefore conclude that the LM scheme is limited by the number of neural networks needed, one per time-slice. Taking smaller time steps adds networks, which becomes prohibitive in terms of memory requirements.
However, the LM scheme is well suited for sampling from stationary distributions, and it is therefore natural to ask whether this quality persists when solving an equation admitting a stationary limiting distribution. The simplest such equation is the heat equation, which we discuss here. We find that, for the same discretization interval used for the non-linear equations, the LM scheme still leads to higher errors than Euler–Maruyama.
We look at the (forward) heat equation \(\frac{\partial g(\vec {x}, t)}{\partial t}=\Delta _x g(\vec {x}, t)\) in \(d=10\) dimensions, on the interval \(t \in [0,T]\), and with initial condition \(g(\vec {x}, 0)=||\vec {x}||^2_{\mathbb {R}^d}\). The exact solution is given by \(g(\vec {x}, t)=||\vec {x}||^2_{\mathbb {R}^d} + 2td\) (since \(\Delta _x ||\vec {x}||^2_{\mathbb {R}^d} = 2d\)).
We only compare the Leimkuhler–Matthews scheme with the Euler–Maruyama one (since the diffusion coefficient is constant). Figure 11 compares the two schemes through the averaged relative approximation error
$$\langle \epsilon \rangle = \frac{1}{N}\sum _{i=1}^{N} \frac{\big |\mathcal {N}\mathcal {N}(\vec {x}_i, T|\vec {\theta }) - g(\vec {x}_i, T)\big |}{\big |g(\vec {x}_i, T)\big |},$$
computed over \(N=10^4\) points \(\vec {x}_i\), \(i=1,\dots ,N\), randomly chosen in \([0,1]^d\), with \(\mathcal {N}\mathcal {N}(\vec {x}, T|\vec {\theta })\) the solution returned by the respective \(\mathcal {DNN}\). Figure 11 shows that the Euler–Maruyama scheme approximates the solution of the heat equation better than the Leimkuhler–Matthews one.
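As a sanity check on the reference solution, \(g(\vec {x},T)\) can also be estimated directly from the Feynman–Kac representation: for \(\partial _t g = \Delta _x g\) the underlying process is \(\vec {Y}^t = \vec {x} + \sqrt{2}\,\vec {W}^t\), so \(g(\vec {x},T) = \mathbb {E}\big [\,||\vec {x} + \sqrt{2T}\,\vec {Z}||^2\,\big ]\) with \(\vec {Z}\) a standard Gaussian vector. A minimal Monte Carlo sketch (our own, not the paper's code):

```python
import numpy as np

def heat_mc(x, T, n_samples, rng):
    """Monte Carlo estimate of g(x, T) for dg/dt = Laplacian g, g(x, 0) = ||x||^2.
    Feynman-Kac: g(x, T) = E[ ||x + sqrt(2 T) Z||^2 ], Z ~ N(0, I_d)."""
    z = rng.normal(size=(n_samples, x.size))
    y = x + np.sqrt(2.0 * T) * z
    return float(np.mean(np.sum(y * y, axis=1)))
```

For \(d=10\), \(\vec {x} = (1,\dots ,1)\) and \(T=1\) the estimate should be close to the exact value \(||\vec {x}||^2 + 2Td = 30\).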
Appendix C: A Simpler Deep Learning Algorithm for Linear Equations. The Example of Geometric Brownian Motion
The main focus of this paper is on non-linear Kolmogorov equations; however, for completeness we briefly discuss a simpler algorithm that applies to linear equations. Physical applications of linear Kolmogorov equations can be found in [42,43,44]. For linear equations the standard Feynman–Kac formula is simply given by
i.e., the first term in (7). In [52] it is elegantly observed that this expectation minimizes a certain "mean square error", and that this observation can be taken as the basis of a deep learning algorithm. The main idea is to minimize the following loss:
In practice the expectations are replaced by empirical averages over a sample set of discretized trajectories, and the minimization over \(\vec {\theta }\) is carried out by a gradient descent algorithm. One important simplification with respect to the case of non-linear equations is that only a single deep network, at time T, is optimized. We refer to [52] for further details of the algorithm. Our purpose here is to briefly compare the errors incurred by the Euler–Maruyama and Milstein discretizations of the trajectories.
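To make the idea concrete, here is a toy version of this minimization for a one-dimensional geometric Brownian motion, where the deep network of [52] is replaced by a small polynomial model fitted by least squares; all names and parameter values below are our illustrative choices, not the paper's. The least-squares minimizer approximates the conditional expectation \(\mathbb {E}[\psi (X_T) \mid \xi ]\), which is exactly the Feynman–Kac solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-d geometric Brownian motion with exact terminal samples:
# X_T = xi * exp((mu - sigma^2 / 2) T + sigma sqrt(T) Z), Z ~ N(0, 1)
mu, sigma, T = 0.05, 0.2, 1.0
psi = lambda x: np.maximum(x - 1.0, 0.0)   # call-type terminal condition

# Random initial points xi, and one trajectory endpoint per xi
n = 200_000
xi = rng.uniform(0.5, 1.5, size=n)
z = rng.normal(size=n)
x_T = xi * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)

# Minimize the empirical "mean square error" over a small function class
# (cubic polynomials in xi, standing in for the deep network)
coeffs = np.polyfit(xi, psi(x_T), deg=3)
approx = lambda x: np.polyval(coeffs, x)
```

The fitted curve \(x \mapsto \texttt {approx}(x)\) is then an estimate of \(g(x, 0) = \mathbb {E}[\psi (X_T) \mid X_0 = x]\) on the sampled range.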
We work with a geometric Brownian motion, i.e., the linear part of the (backward) Black–Scholes Eq. (30),
for which the exact solution \(g(\vec {x}, t)\), \(0\le t\le T\) is known,
where \(\vec {W}^t\) is the standard Brownian motion (conditioned to start at the origin \(\vec {W}^0 = 0\)). This is a one-dimensional integral that can be computed by the Monte Carlo method.
The performance is assessed by computing the average relative error
$$\epsilon = \frac{\big |\mathcal {N}\mathcal {N}(\vec {x}, 0|\vec {\theta }) - g(\vec {x}, 0)\big |}{\big |g(\vec {x}, 0)\big |},$$
where \(\mathcal{N}\mathcal{N}(\vec {x}, 0|\vec {\theta })\), at \(t=0\), is the approximate solution obtained by the deep learning algorithm. We take \(r=\frac{1}{20}\), \(\mu =r-\frac{1}{10}\), \(\sigma _i=\frac{1}{10}+\frac{i}{200}\), \(d=100\), \(t\in [0,T=1]\), with time step \(\tau =T/N\) and \(N=40\). We fix the initial condition to \(g(\vec {x},0)=\psi (\vec {x})=\exp {(-rT)}\max \{[\max _{i \in {1,2,\ldots ,d}}x_i]-100,0\}\) (as in [52]).
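Since all components share the same one-dimensional Brownian motion, the reference value at a point \(\vec {x}\) reduces to a one-dimensional integral over \(W_T\) and can be estimated by plain Monte Carlo. A sketch with the parameters of the text (the function name and sample size are our illustrative choices):

```python
import numpy as np

def reference_value(x0, r, mu, sigma, T, n_samples, rng):
    """MC estimate of E[exp(-r T) max(max_i X_T^i - 100, 0)], where
    X_T^i = x0_i * exp((mu - sigma_i^2 / 2) T + sigma_i W_T)
    and a single one-dimensional W_T drives all components."""
    w_T = rng.normal(0.0, np.sqrt(T), size=(n_samples, 1))
    x_T = x0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * w_T)
    payoff = np.exp(-r * T) * np.maximum(x_T.max(axis=1) - 100.0, 0.0)
    return float(payoff.mean())

d = 100
r = 1 / 20
mu = r - 1 / 10
sigma = 0.1 + np.arange(1, d + 1) / 200   # sigma_i = 1/10 + i/200
```

The estimate is monotone in the initial point, which gives a simple consistency check when the same Brownian samples are reused.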
In Table 2 and Fig. 12 we report the relative error \(\epsilon \) obtained with the two discretization schemes used to minimize the empirical averages corresponding to (46). We also average \(\epsilon \) over 15 different experiments to reduce the variance. The analysis shows that choosing a higher-order discretization scheme for training the neural network yields an improvement.
Appendix D: Computational Complexity for the Milstein Scheme
In Sect. 5.3, using the Euler–Maruyama scheme, we showed that the deep learning methodology studied in this paper does not suffer from the curse of dimensionality, in the sense that the number of training trajectories scales polynomially in the dimension of space. In this appendix we repeat the analysis for the Milstein scheme and show that the number of training trajectories again scales polynomially in the dimension.
To the best of our knowledge, no exact solution is known in the literature for a non-linear Kolmogorov equation with non-constant transport coefficients. For this reason, we perform an accurate sampling of solutions of Eq. (32) in the interval \([49.995:50.005]^d\) using the multilevel Picard method, and then use those solutions as reference values for our analysis.
The deep network of Sect. 4.2 (with Milstein discretization) is used, since the diffusion term is non-homogeneous. For the number of initial random points in \([49.995:50.005]^d\) of the stochastic input trajectories we take \(P=4, 8, 16, 32, 64, 128, 256, 512, 1024\). Once the network is trained we obtain the approximate solution \(\mathcal {N}\mathcal {N}(\vec {x},0|\vec {\theta })\). The relative error is then computed from (36), using \(M=10^2\) uniformly random points \(\vec {x}_m\in [49.995:50.005]^d\), \(m=1, \dots , M\), with reference values obtained by the multilevel Picard method.
Figure 13 shows the behaviour of \(\langle \epsilon \rangle \) as a function of P, for various values of d. For small values of d, few points P suffice for the network to approximate the exact solution well. In contrast, as expected, as d grows the number of points P needed to approximate the solution well becomes larger.
By fixing a pre-specified relative error of \(\langle \epsilon \rangle \approx 0.0011\) we observe that P scales polynomially with d, roughly as \(P=O(d^{1.49})\). This value is comparable with the one obtained for the Euler–Maruyama scheme in Sect. 5.3. We conclude that the Milstein scheme improves the accuracy of the neural network without modifying the computational complexity, as expected.
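The exponent in \(P=O(d^{1.49})\) is the slope of a least-squares fit of \(\log P\) against \(\log d\). A short sketch of this extraction with synthetic points (the actual data of Fig. 13 are not reproduced here; the values below are for illustration only):

```python
import numpy as np

# Synthetic (d, P) pairs following the reported scaling, illustration only
d = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
P = 4.0 * d ** 1.49

# The slope of log P versus log d gives the scaling exponent
slope, intercept = np.polyfit(np.log(d), np.log(P), 1)
```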
Appendix E: HJB Equation with Non-constant Diffusion Coefficient
In Sect. 5.4, we compared the Euler–Maruyama and Leimkuhler–Matthews schemes on the Hamilton–Jacobi–Bellman equation (40). In this appendix we compare the Euler–Maruyama and Milstein schemes on a Hamilton–Jacobi–Bellman equation with non-constant diffusion tensor. The equation we solve is:
where \(\lambda \) is a positive constant set to 1, and \(D(\vec {x})_{ij}=x_i^2 \delta _{ij}\), where \(\delta _{ij}\) is the Kronecker delta. The terminal condition is \(g(\vec {x}, T)=\phi (\vec {x})=\ln ((1+||\vec {x}||^2)/2)\) with \(\vec {x} \in \mathbb {R}^d\).
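For a diagonal diffusion with \(\sigma _{ii}(\vec {x}) \propto x_i\) the Milstein correction is explicit, \(\frac{1}{2}\sigma \sigma ' \big ((\Delta W)^2 - \tau \big )\). The gain of the higher-order scheme can be checked against the exactly solvable driftless case \(dY = Y\,dW\); the following is an illustrative sketch (it does not reproduce the HJB drift or the normalization of \(D\) above).

```python
import numpy as np

def em_step(y, tau, dw):
    # Euler-Maruyama for dY = Y dW (sigma(y) = y, drift omitted)
    return y + y * dw

def milstein_step(y, tau, dw):
    # Milstein adds 0.5 * sigma * sigma' * (dW^2 - tau); here sigma(y) = y
    return y + y * dw + 0.5 * y * (dw ** 2 - tau)

def strong_error(step, y0, T, n_steps, n_paths, seed):
    """Mean absolute error at time T against the exact solution of dY = Y dW."""
    rng = np.random.default_rng(seed)
    tau = T / n_steps
    dw = rng.normal(0.0, np.sqrt(tau), size=(n_paths, n_steps))
    y = np.full(n_paths, y0)
    for n in range(n_steps):
        y = step(y, tau, dw[:, n])
    exact = y0 * np.exp(dw.sum(axis=1) - 0.5 * T)   # Y_T = Y_0 exp(W_T - T/2)
    return float(np.mean(np.abs(y - exact)))
```

With the same seed (hence the same Brownian increments for both schemes) the Milstein error is clearly smaller than the Euler–Maruyama one, reflecting strong order 1 versus 1/2.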
In Fig. 14 we compare the outputs of \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{EM}}\) (blue points) and \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{M}}\) (green points). Each point is the average over 5 samples. The blue points, the output of \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{EM}}\), reach the approximate value \(g_{\textrm{EM}}(\vec {x}=(50, \dots , 50), 0)=10.049\pm 0.007\), while the green points, obtained by \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{M}}\), give \(g_{\textrm{M}}(\vec {x}=(50, \dots , 50), 0)=10.008 \pm 0.008\). The difference between the two schemes is again clearly visible. No numerical or analytical result is available in the literature for Eq. (50), but numerical observations suggest that the true value of \(g(\vec {x}=(50, \dots , 50), 0)\) is around 10.008.
Appendix F: Time Scaling of the Deep Learning Algorithm
In this appendix we briefly discuss how, in practice, the training time scales with the dimension d when the deep network is trained to compute an approximate solution. This experiment is performed on a cluster of 22 nodes with 200 cores and 250 GB of RAM, distributed over 22 machines with different hardware.
We illustrate this for the heat equation and for the exactly solvable non-linear diffusion equation of Sect. 5.3. Since the diffusion tensor is constant we use the algorithm presented in Sect. 4.1 based on the Euler-Maruyama discretization.
We use \(P=4096\) random initial points for the set of stochastic trajectories used for training, and keep the number of training steps at 15000. Figure 15 shows how the time needed to train the network and find an approximate solution scales with the number of dimensions d. Interestingly, for both equations the points lie on the same straight line on a log-log scale, with slope \(\sim 1.1\).
Appendix G: Derivation of Equation (26)
For completeness we derive Eq. (26) which follows from the Milstein discretization of stochastic trajectories given by (20) [37, 45]. We recall that the \(\partial _{x_k}\) operator acts only on B.
For small enough \(\tau \), after the discretization of Eq. (3), and using (20), one obtains:
Finally, discarding all terms of order greater than \(\tau \) and using the assumption that \( g[\vec {Y}^n_{M},\tau _n]\) must be a solution of the nonlinear backward Kolmogorov Eq. (1), we end up with Eq. (26).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Marino, R., Macris, N. Solving Non-linear Kolmogorov Equations in Large Dimensions by Using Deep Learning: A Numerical Comparison of Discretization Schemes. J Sci Comput 94, 8 (2023). https://doi.org/10.1007/s10915-022-02044-x