
Solving Non-linear Kolmogorov Equations in Large Dimensions by Using Deep Learning: A Numerical Comparison of Discretization Schemes

Abstract

Non-linear partial differential Kolmogorov equations are successfully used to describe a wide range of time-dependent phenomena in the natural sciences, engineering and finance. For example, in physical systems the Allen–Cahn equation describes pattern formation associated with phase transitions, while in finance the Black–Scholes equation describes the evolution of the price of derivative investment instruments. Such modern applications often require solving these equations in high-dimensional regimes in which classical approaches are ineffective. Recently, an interesting new approach based on deep learning has been introduced by E, Han and Jentzen [1, 2]. The main idea is to construct a deep network that is trained on samples of the discretized stochastic differential equations underlying Kolmogorov’s equation. The network is able to approximate, numerically at least, the solutions of the Kolmogorov equation with polynomial complexity over whole spatial domains. In this contribution we study variants of these deep networks obtained by using different discretization schemes for the stochastic differential equation. We compare the performance of the associated networks on benchmark examples and show that, for some discretization schemes, improvements in accuracy are possible without affecting the observed computational complexity.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

  1. A possible route to make this argument rigorous would be to use methods of [24, 25]. Rigorous aspects are beyond the scope of this paper and we do not further address this point here.

References

  1. Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)

  2. E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017). https://doi.org/10.1007/s40304-017-0117-6

  3. Bellman, R.: Dynamic Programming, vol. XXV. Princeton University Press, Princeton, NJ (1957)

  4. Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A.: Deep splitting method for parabolic PDEs. arXiv:1907.03452 (2019)

  5. Chan-Wai-Nam, Q., Mikael, J., Warin, X.: Machine learning for semi linear PDEs. J. Sci. Comput. 79(3), 1667–1712 (2019). https://doi.org/10.1007/s10915-019-00908-3

  6. Lee, H., Kang, I.S.: Neural algorithm for solving differential equations. J. Comput. Phys. 91(1), 110–131 (1990)

  7. Meade, A.J., Jr., Fernandez, A.A.: Solution of nonlinear ordinary differential equations by feedforward neural networks. Math. Comput. Model. 20(9), 19–44 (1994)

  8. Dissanayake, M., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Methods Eng. 10(3), 195–201 (1994)

  9. Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998)

  10. Lagaris, I., Likas, A., Papageorgiou, D.: Neural-network methods for boundary value problems with irregular boundaries. In: IEEE Transactions on Neural Networks/A Publication of the IEEE Neural Networks Council, vol. 11, pp. 1041–1049 (2000)

  11. Malek, A., Shekari-Beidokhti, R.: Numerical solution for high order differential equations using a hybrid neural network—optimization method. Appl. Math. Comput. 183(1), 260–271 (2006)

  12. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)

  13. Sahli-Costabal, F., Yang, Y., Perdikaris, P., Hurtado, D.E., Kuhl, E.: Physics-informed neural networks for cardiac activation mapping. Front. Phys. 8, 42 (2020). https://doi.org/10.3389/fphy.2020.00042

  14. Raissi, M., Yazdani, A., Karniadakis, G.L.: Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science 367, eaaw4741 (2020)

  15. Sirignano, J., Spiliopoulos, K.: DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)

  16. Bismut, J.-M.: Théorie probabiliste du contrôle des diffusions, vol. 167. American Mathematical Society (AMS), Providence, RI (1976)

  17. Pardoux, E., Peng, S.: Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 14(1), 55–61 (1990)

  18. Zhou, M., Han, J., Lu, J.: Actor-critic method for high dimensional static Hamilton–Jacobi–Bellman partial differential equations based on neural networks. SIAM J. Sci. Comput. 43(6), A4043–A4066 (2021)

  19. Beck, C., E, W., Jentzen, A.: Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci. 29(4), 1563–1619 (2019). https://doi.org/10.1007/s00332-018-9525-3

  20. Huré, C., Pham, H., Warin, X.: Some machine learning schemes for high-dimensional nonlinear PDEs. arXiv:1902.01599 (2019)

  21. Pham, H., Warin, X.: Neural networks-based backward scheme for fully nonlinear PDEs. CoRR arXiv:1908.00412 (2019)

  22. Raissi, M.: Forward–backward stochastic neural networks: deep learning of high-dimensional partial differential equations. arXiv:1804.07010 (2018)

  23. Gonon, L., Schwab, C.: Deep ReLU network expression rates for option prices in high-dimensional, exponential Lévy models. In: Seminar for Applied Mathematics, ETH Zürich, Switzerland, Technical Report 2020-52 (2020). https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2020/2020-52.pdf

  24. Han, J., Long, J.: Convergence of the deep BSDE method for coupled FBSDES. Probab. Uncertain. Quant. Risk 5(1), 1–33 (2020)

  25. Jiang, Y., Li, J.: Convergence of the deep BSDE method for FBSDES with non-Lipschitz coefficients. arXiv:2101.01869 (2021)

  26. E, W., Han, J., Jentzen, A.: Algorithms for solving high dimensional PDEs: from nonlinear Monte Carlo to machine learning. Nonlinearity 35(1), 278 (2021)

  27. Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. arXiv:1809.03062 (2018)

  28. Grohs, P., Hornung, F., Jentzen, A., von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations (2018)

  29. Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. In: SN Partial Differential Equations and Applications, vol. 1, no. 2, Apr. 2020. https://doi.org/10.1007/s42985-019-0006-9

  30. E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: Multilevel Picard iterations for solving smooth semilinear parabolic heat equations. Numer. Anal. (2016)

  31. E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. J. Sci. Comput. 79(3), 1534–1571 (2019). https://doi.org/10.1007/s10915-018-00903-0

  32. Becker, S., Braunwarth, R., Hutzenthaler, M., Jentzen, A., von Wurstemberger, P.: Numerical simulations for full history recursive multilevel Picard approximations for systems of high-dimensional partial differential equations. arXiv:2005.10206 (2020)

  33. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations, vol. 23. Springer, Berlin (2013)

  34. Leimkuhler, B., Matthews, C.: Rational construction of stochastic numerical methods for molecular sampling. Appl. Math. Res. Exp. 2013(1), 34–56 (2012)

  35. Marino, R.: DNN-PDEs. https://github.com/RaffaeleMarino/DNN-PDEs (2019)

  36. Kolmogoroff, A.: The theory of continuous random processes. Math. Ann. 108(1), 149–160 (1933)

  37. Gardiner, C.W., et al.: Handbook of Stochastic Methods, vol. 3. Springer, Berlin (1985)

  38. Glasserman, P.: Monte Carlo Methods in Financial Engineering, vol. 53. Springer, Berlin (2004)

  39. Brenner, H.: Coupling between the translational and rotational Brownian motions of rigid particles of arbitrary shape. II. General theory. J. Colloid Interface Sci. 23(3), 407–436 (1967)

  40. Brenner, H.: Taylor dispersion in systems of sedimenting nonspherical Brownian particles. J. Colloid Interface Sci. 80(2), 548–588 (1981)

  41. Brenner, H.: Taylor dispersion in systems of sedimenting nonspherical Brownian particles. II. Homogeneous ellipsoidal particles. J. Colloid Interface Sci. 80(2), 548–588 (1981)

  42. Marino, R., Aurell, E.: Advective-diffusive motion on large scales from small-scale dynamics with an internal symmetry. Phys. Rev. E 93(6), 062147 (2016)

  43. Marino, R., Eichhorn, R., Aurell, E.: Entropy production of a Brownian ellipsoid in the overdamped limit. Phys. Rev. E 93(1), 012132 (2016)

  44. Aurell, E., Bo, S., Dias, M., Eichhorn, R., Marino, R.: Diffusion of a Brownian ellipsoid in a force field. EPL (Europhys. Lett.) 114(3), 30005 (2016)

  45. Jentzen, A., Röckner, M.: A Milstein scheme for SPDEs. Found. Comput. Math. 15(2), 313–362 (2015)

  46. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)

  47. Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637–654 (1973)

  48. Hull, J.: Options, Futures, and Other Derivatives, 6th ed. Pearson Prentice Hall, Upper Saddle River, NJ (2006). http://gso.gbv.de/DB=2.1/CMD?ACT=SRCHA&SRT=YOP&IKT=1016&TRM=ppn+563580607&sourceid=fbw_bibsonomy

  49. Hammersley, J.: Monte Carlo methods. Springer, Berlin (2013)

  50. Hutzenthaler, M., Jentzen, A., Kruse, T. et al.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. arXiv:1708.03223 (2017)

  51. E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. J. Sci. Comput. 79(3), 1534–1571 (2019)

  52. Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving stochastic differential equations and Kolmogorov equations by means of deep learning. arXiv:1806.00421 (2018)

Acknowledgements

The work of R. M. was supported by Swiss National Foundation grant number 200021E 17554; he is now supported by the FARE grant No. R167TEEE7B. N. M. learned about the topic during the conference “Intelligent Machines and Mathematics” in Bologna (January 2019) and acknowledges interesting related discussions with Pierluigi Contucci and Philip Grohs. We also thank Martin Hutzenthaler as well as the referees for constructive comments that improved the paper.

Author information

Corresponding author

Correspondence to Raffaele Marino.

Ethics declarations

Conflict of interest

The authors declare no competing financial interests.

Appendices

Appendix A: Derivation of Equation (28)

For completeness we derive Eq. (28), which follows from the Leimkuhler and Matthews discretization of stochastic trajectories given by (21). Note the differences between this equation and (24), which is implied by the Euler–Maruyama discretization (19). Using (14) and applying Itô’s lemma (i.e., expanding to second order), for small enough \(\tau \) we get

$$\begin{aligned} g[\vec {Y}^n_{LM}, \tau _{n+1}]-g[\vec {Y}^n_{LM},\tau _n] =&\ \partial _t g[\vec {Y}^n_{LM},\tau _n]\,\tau + \nabla _{\vec x} g[\vec {Y}^n_{LM},\tau _n]^T A[\vec {Y}^n_{LM},\tau _n]\,\tau \\ &+\frac{1}{2}\nabla _{\vec x} g[\vec {Y}^n_{LM},\tau _n]^T B[\vec {Y}^n_{LM},\tau _n]\,(\Delta \vec {W}^n+\Delta \vec {W}^{n+1})\\ &+\frac{1}{8}\,\mathbb {E}\bigl [(\Delta \vec {W}^n+\Delta \vec {W}^{n+1})^T B^T[\vec {Y}^n_{LM},\tau _n]\,\text {Hess}_{\vec {x}}g[\vec {Y}^n_{LM},\tau _n]\,B[\vec {Y}^n_{LM},\tau _n]\,(\Delta \vec {W}^n+\Delta \vec {W}^{n+1})\bigr ]. \end{aligned}$$
(42)

To evaluate the last term we work in components and use \(\mathbb {E}[\Delta W_j^n\Delta W_i^n] = \delta _{ij}\tau \) and \(\mathbb {E}[\Delta W_j^n\Delta W_i^{n+1}] = 0\) (because increments are independent). This yields

$$\begin{aligned} g[\vec {Y}^n_{LM}, \tau _{n+1}]-g[\vec {Y}^n_{LM},\tau _n] =&\ \partial _t g[\vec {Y}^n_{LM},\tau _n]\,\tau + \nabla _{\vec x} g[\vec {Y}^n_{LM},\tau _n]^T A[\vec {Y}^n_{LM},\tau _n]\,\tau \\ &+\frac{1}{2}\nabla _{\vec x} g[\vec {Y}^n_{LM},\tau _n]^T B[\vec {Y}^n_{LM},\tau _n]\,(\Delta \vec {W}^n+\Delta \vec {W}^{n+1})\\ &+ \frac{1}{4}\,\text {Tr}\bigl \{B B^T[\vec {Y}^n_{LM},\tau _n]\,\text {Hess}_{\vec {x}}g[\vec {Y}^n_{LM},\tau _n]\bigr \}\,\tau . \end{aligned}$$
(43)

Finally, using (1) for the partial derivative with respect to time we obtain (28).
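For readers who prefer pseudocode, the following minimal NumPy sketch contrasts one Euler–Maruyama update with one Leimkuhler–Matthews update, in which the noise enters through the average of two consecutive Gaussian increments, consistent with the \((\Delta \vec {W}^n+\Delta \vec {W}^{n+1})/2\) structure appearing in (42). The SDE form \(d\vec {Y}=A\,dt+B\,d\vec {W}\), the function names and the toy drift and diffusion are illustrative assumptions, not the code of [35].

```python
import numpy as np

def euler_maruyama_step(y, t, tau, A, B, rng):
    """One Euler-Maruyama update: Y <- Y + A(Y,t)*tau + B(Y,t) @ dW."""
    dW = rng.normal(0.0, np.sqrt(tau), size=y.shape)
    return y + A(y, t) * tau + B(y, t) @ dW, dW

def leimkuhler_matthews_step(y, t, tau, A, B, dW_prev, rng):
    """One Leimkuhler-Matthews update: the noise is the average of the
    previous and the freshly drawn Gaussian increments,
    Y <- Y + A(Y,t)*tau + B(Y,t) @ (dW_prev + dW_next) / 2."""
    dW_next = rng.normal(0.0, np.sqrt(tau), size=y.shape)
    y_new = y + A(y, t) * tau + B(y, t) @ ((dW_prev + dW_next) / 2.0)
    return y_new, dW_next

# Toy drift/diffusion in d = 10 dimensions (illustrative choices only).
d, tau, rng = 10, 0.01, np.random.default_rng(0)
A = lambda y, t: -y                        # linear drift
B = lambda y, t: np.sqrt(2.0) * np.eye(d)  # constant diffusion matrix
y_em = np.ones(d)
y_lm = np.ones(d)
dW_prev = rng.normal(0.0, np.sqrt(tau), size=d)
for n in range(40):
    y_em, _ = euler_maruyama_step(y_em, n * tau, tau, A, B, rng)
    y_lm, dW_prev = leimkuhler_matthews_step(y_lm, n * tau, tau, A, B, dW_prev, rng)
```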

Appendix B: Further Assessment of the LM Scheme for the Heat Equation

The Leimkuhler–Matthews (LM) scheme appears to perform worse than both the Euler–Maruyama and Milstein schemes for the Black–Scholes equation (Sect. 5.1). This is also the case when the diffusion tensor is homogeneous, as tested on the Allen–Cahn equation (Sect. 5.2). However, we observe slight improvements when the time step \(\tau \) of the discretization becomes smaller, and we therefore conclude that the LM scheme is limited by the number of neural networks needed, one for each time slice. Taking smaller time steps adds networks, which becomes prohibitive in terms of memory requirements.

However, the LM scheme is well suited for sampling from stationary distributions, and it is therefore natural to ask whether this quality persists when solving an equation admitting a stationary limiting distribution. The simplest such equation is the heat equation, which we discuss here. We find that, for the same discretization interval used in the non-linear equations, the LM scheme still leads to higher errors than Euler–Maruyama.

Fig. 11

Comparison between the Leimkuhler–Matthews (red points) and Euler–Maruyama (blue points) schemes for the simple heat equation in ten dimensions. The comparison is performed by computing the averaged relative approximation error defined in Eq. (44) as a function of the number of initial points (P) of the stochastic trajectories used as inputs to the \(\mathcal {DNN}\)’s. We observe that the Euler–Maruyama scheme is superior (Color figure online)

We look at the (forward) heat equation \(\frac{\partial g(\vec {x}, t)}{\partial t}=\Delta _x g(\vec {x}, t)\) in \(d=10\) dimensions, on the interval \(t \in [0,T]\), and with initial condition \(g(\vec {x}, 0)=||\vec {x}||^2_{\mathbb {R}^d}\). The exact solution is given by \(g(\vec {x}, t)=||\vec {x}||^2_{\mathbb {R}^d} + td\).

We only compare the Leimkuhler–Matthews scheme with the Euler–Maruyama one (since the diffusion coefficient is constant). Figure 11 shows the comparison between the two schemes by computing the averaged relative approximation error \(\langle \epsilon \rangle \):

$$\begin{aligned} \langle \epsilon \rangle = \frac{1}{N} \sum _{i=1}^N \left| \frac{g(\vec {x}_i, T) - \mathcal {N}\mathcal {N}(\vec {x}_i, T|\vec {\theta })}{g(\vec {x}_i, T)} \right| \end{aligned}$$
(44)

over \(N=10^4\) points \(\vec {x}_i\), \(i=1,\dots ,N\), randomly chosen in \([0,1]^d\), where \(\mathcal {N}\mathcal {N}(\vec {x}, T|\vec {\theta })\) is the solution returned by the respective \(\mathcal {DNN}\). Figure 11 shows that the Euler–Maruyama scheme approximates the solution of the heat equation more accurately than the Leimkuhler–Matthews one.
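As an illustration, here is a minimal NumPy sketch of how the averaged relative approximation error (44) can be estimated; `nn_solution` is a placeholder for the trained network \(\mathcal {N}\mathcal {N}(\cdot , T|\vec {\theta })\), and the exact solution used is the one quoted above for the heat equation.

```python
import numpy as np

def averaged_relative_error(nn_solution, T, d=10, N=10**4, seed=0):
    """Estimate <eps> of Eq. (44): the mean over N random points in [0,1]^d of
    |g(x,T) - NN(x,T)| / |g(x,T)|, with the exact heat-equation solution
    g(x,T) = ||x||^2 + T*d quoted in the text."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(N, d))       # evaluation points in [0,1]^d
    g_exact = np.sum(x**2, axis=1) + T * d       # exact solution at time T
    g_nn = np.array([nn_solution(xi, T) for xi in x])
    return np.mean(np.abs((g_exact - g_nn) / g_exact))

# Usage with a dummy surrogate standing in for the trained network:
eps = averaged_relative_error(lambda xi, T: np.sum(xi**2) + T * len(xi), T=1.0)
```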

Appendix C: A Simpler Deep Learning Algorithm for Linear Equations. The Example of Geometric Brownian Motion

The main focus of this paper is on non-linear Kolmogorov equations; however, for completeness we briefly discuss a simpler algorithm that applies to linear equations. Physical examples of applications of linear Kolmogorov equations can be found in [42,43,44]. For linear equations the standard Feynman–Kac formula is simply given by

$$\begin{aligned} g(\vec {x}, t) =\mathbb {E}_{\vec x} [\phi (\vec {X}(T))] \end{aligned}$$
(45)

i.e., the first term in (7). In [52] it is beautifully remarked that this expectation minimizes a certain "mean square error" and that this observation can be taken as the basis of a deep learning algorithm. The main idea is to minimize the following loss:

$$\begin{aligned} \mathcal {L}(\vec {\theta })=\frac{1}{\vert b-a\vert ^d}\int _{[a,b]^d} d\vec {x}\,\mathbb {E}_{\vec x}\bigl [ \vert \phi (\vec {X}(T)) - \mathcal {N}\mathcal {N}(\vec {x}, T\vert \vec {\theta })\vert ^2\bigr ] \end{aligned}$$
(46)

In practice the expectations are replaced by empirical averages over a sample set of discretized trajectories, and the minimization over \(\vec {\theta }\) is carried out by a gradient descent algorithm. One important simplification with respect to the case of non-linear equations is that only one deep network, at time T, is optimized. We refer to [52] for further details of the algorithm. Our purpose here is to briefly compare the errors incurred by the Euler–Maruyama and Milstein discretizations of the trajectories.
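As a rough illustration of this simpler algorithm (a sketch of our own, not the authors' implementation [35]; the function names, network size and hyper-parameters are assumptions), a single network at the final time is fitted by stochastic gradient descent on the empirical counterpart of (46), with the trajectories generated here by an Euler–Maruyama rollout:

```python
import torch

def train_terminal_network(phi, A, B_diag, a, b, d, T, n_steps=20,
                           batch=256, iters=2000, lr=1e-3):
    """Fit NN(x, T | theta) ~ E_x[phi(X(T))] by minimizing the empirical
    version of the loss (46); X(T) is simulated with an Euler-Maruyama
    rollout, with B_diag(x, t) returning the diagonal of the diffusion."""
    net = torch.nn.Sequential(
        torch.nn.Linear(d, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, 1),
    )
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    tau = T / n_steps
    for _ in range(iters):
        x0 = a + (b - a) * torch.rand(batch, d)       # x ~ Uniform([a,b]^d)
        x = x0.clone()
        for n in range(n_steps):                      # Euler-Maruyama rollout
            dW = torch.randn(batch, d) * tau**0.5
            x = x + A(x, n * tau) * tau + B_diag(x, n * tau) * dW
        loss = torch.mean((phi(x) - net(x0).squeeze(-1)) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```

For the geometric Brownian motion (47) below one would take \(A(\vec x,t)=\mu \vec x\) and diagonal diffusion entries \(\sigma _i x_i\); replacing the inner Euler–Maruyama rollout by a Milstein one is the only change needed for the comparison discussed here.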

We work with a geometric Brownian motion, i.e., the linear part of the (backward) Black–Scholes Eq. (30),

$$\begin{aligned} \frac{\partial g(\vec {x},t)}{\partial t} + \frac{1}{2}\sum ^d_{i=1}|\sigma _i x_i|^2 \frac{\partial ^2 g}{\partial x^2_i}(\vec {x},t)+\sum ^d_{i=1}\mu x_i\frac{\partial g}{\partial x_i}(\vec {x},t) = 0, \qquad g(\vec {x}, T) = \psi (\vec {x}) \end{aligned}$$
(47)

for which the exact solution \(g(\vec {x}, t)\), \(0\le t\le T\) is known,

$$\begin{aligned} g(\vec {x}, t)= \mathbb {E}\biggl [\psi \biggl (x_1 \exp \Bigl (\sigma _1 W^{T-t}_1 + \Bigl (\mu -\frac{|\sigma _1|^2}{2}\Bigr )(T-t)\Bigr ), \dots , x_d \exp \Bigl (\sigma _d W^{T-t}_d + \Bigl (\mu -\frac{|\sigma _d|^2}{2}\Bigr )(T-t)\Bigr )\biggr )\biggr ] \end{aligned}$$
(48)

where \(\vec {W}^t\) is a standard d-dimensional Brownian motion (started at the origin, \(\vec {W}^0 = 0\)). The resulting Gaussian expectation can be computed by the Monte Carlo method.
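A minimal sketch of such a Monte Carlo estimate of (48), assuming independent Brownian components as in the formula above; the payoff \(\psi \), the parameter values repeated from the text, and the evaluation point in the last line are illustrative placeholders.

```python
import numpy as np

def gbm_reference(x, t, T, mu, sigma, psi, n_samples=10**5, seed=0):
    """Monte Carlo estimate of Eq. (48): average psi over geometric
    Brownian motion endpoints started at x and run for a time T - t,
    with independent Brownian components."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(T - t), size=(n_samples, x.shape[0]))  # W^{T-t}
    X = x * np.exp(sigma * W + (mu - 0.5 * sigma**2) * (T - t))
    return np.mean(psi(X))

# Parameters as in the text: r = 1/20, mu = r - 1/10, sigma_i = 1/10 + i/200.
d, T, r = 100, 1.0, 1.0 / 20.0
mu = r - 1.0 / 10.0
sigma = 1.0 / 10.0 + np.arange(1, d + 1) / 200.0
psi = lambda X: np.exp(-r * T) * np.maximum(X.max(axis=1) - 100.0, 0.0)
g_ref = gbm_reference(np.full(d, 100.0), 0.0, T, mu, sigma, psi)  # illustrative point
```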

The performance is assessed by computing the average relative error defined as

$$\begin{aligned} \epsilon = \int _{[0,1]^d} \left| \frac{g(\vec {x}, 0) - \mathcal{N}\mathcal{N}(\vec {x}, 0|\vec {\theta })}{g(\vec {x}, 0)} \right| d\vec {x} \end{aligned}$$
(49)

where \(\mathcal{N}\mathcal{N}(\vec {x}, 0|\vec {\theta })\), at \(t=0\), is the approximate solution obtained by the deep learning algorithm. We take \(r=\frac{1}{20}\), \(\mu =r-\frac{1}{10}\), \(\sigma _i=\frac{1}{10}+\frac{i}{200}\), \(d=100\), \(t\in [0,T=1]\), using a time step \(\tau =T/N\) with \(N=40\). We fix the terminal condition to \(g(\vec {x},T)=\psi (\vec {x})=\exp {(-rT)}\max \{[\max _{i \in \{1,2,\ldots ,d\}}x_i]-100,0\}\) (as in [52]).

In Table 2 and Fig. 12 we report the relative error \(\epsilon \) obtained with the two discretization schemes used to minimize the empirical averages corresponding to (46). We also average \(\epsilon \) over 15 different experiments to reduce the variance. The analysis shows that there is an improvement when a higher-order discretization scheme is chosen for training the neural network.

Fig. 12

Comparison between the Milstein (green points) and Euler–Maruyama (blue points) schemes. The comparison is performed by computing the relative approximation error defined in Eq. (49) as a function of the number of steps used for training the parameters \(\vec {\theta }\). The figure shows that, on average, the Milstein scheme approximates the solution of the linear Black–Scholes equation better than the Euler–Maruyama one. Error bars are SEM (Color figure online)

Table 2 The table shows the average values of relative error \(\epsilon \)

Appendix D: Computational Complexity for the Milstein Scheme

In Sect. 5.3 we showed, using the Euler–Maruyama scheme, that the deep learning methodology studied in this paper does not suffer from the curse of dimensionality, in the sense that the number of training trajectories scales polynomially with the dimension of space. In this appendix we repeat the analysis for the Milstein scheme and show that the number of training trajectories again scales polynomially with the dimension of space.

To the best of our knowledge, no exact solution is known in the literature for a non-linear Kolmogorov equation with non-constant transport coefficients. For this reason, we perform an accurate sampling of solutions of Eq. (32) on the domain \([49.995:50.005]^d\) using the multilevel Picard method, and we then use those solutions as reference values for our analysis.

The deep network of Sect. 4.2 (with Milstein discretization) is used, since the diffusion term is non-homogeneous. For the number of initial random points in \([49.995:50.005]^d\) of the stochastic input trajectories we take \(P=4,8,16,32, 64, 128, 256, 512, 1024\). Once the network is trained we obtain the approximate solution \(\mathcal {N}\mathcal {N}(\vec {x},0|\vec {\theta })\). The relative error is then easily computed from (36), using \(M=10^2\) uniformly random points \(\vec {x}_m\in [49.995:50.005]^d\), \(m=1, \cdots , M\), with reference values computed by the multilevel Picard method.

Figure 13 shows the behaviour of \(\langle \epsilon \rangle \) as a function of P, for various values of d. For small values of d, few points P are needed to approximate the exact solution well. In contrast, as expected, as d grows the number of points P needed to approximate the solution well becomes larger.

By fixing a pre-specified relative error of \(\langle \epsilon \rangle \approx 0.0011\) we observe that P scales polynomially with d, roughly as \(P=O(d^{1.49})\). This scaling is comparable with the one obtained for the Euler–Maruyama scheme in Sect. 5.3. We conclude that, as expected, the Milstein scheme improves the accuracy of the neural network without modifying the computational complexity.
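The exponent quoted above can be extracted with a simple least-squares fit of \(\log P\) against \(\log d\); a minimal sketch follows, where the threshold values of P are placeholders to be replaced by the measured ones read off from Fig. 13.

```python
import numpy as np

# Dimensions studied in the text and, for each, the smallest P reaching
# <eps> < 0.0011 (placeholder values -- substitute the measured thresholds).
d = np.array([2, 4, 6, 8, 10])
P = np.array([16.0, 64.0, 128.0, 256.0, 512.0])   # illustrative only

# Least-squares fit of log P = log A + b * log d, i.e. P ~ A * d**b.
b, logA = np.polyfit(np.log(d), np.log(P), deg=1)
print(f"P ~ {np.exp(logA):.2f} * d^{b:.2f}")
```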

Fig. 13

Left: average relative error \(\langle \epsilon \rangle \) as a function of the number of initial points of stochastic trajectories, \(P=4,8,16,32, 64, 128, 256, 512, 1024\), on a log–log scale, for different values of \(d=2,4,6,8,10\). Right: computational complexity, given an average relative error \(\langle \epsilon \rangle < 0.0011 \), measured by P as a function of \(d=2,4,6,8,10\), on a log–log scale; the slope is \(b\approx 1.49\). The computational complexity obtained is proportional to \(A(\epsilon )d^{1.49} \), with \(A(\epsilon ) \sim 7.6\). The results are obtained using the algorithm described in Sect. 4.2, for Eq. (32) with \(T=0.01\)

Appendix E: HJB Equation with Non-constant Diffusion Coefficient

In Sect. 5.4 we presented a comparison between the Euler–Maruyama and Leimkuhler–Matthews schemes for the Hamilton–Jacobi–Bellman equation (40). In this appendix we compare the Euler–Maruyama and Milstein schemes on a Hamilton–Jacobi–Bellman equation with a non-constant diffusion tensor. The equation that we solve is:

$$\begin{aligned} \frac{\partial g}{\partial t}(\vec {x},t) + \textbf{D}(\vec {x})\Delta g(\vec {x},t) = \lambda ||\nabla g(\vec {x},t)||^2, \end{aligned}$$
(50)

where \(\lambda \) is a positive constant set to 1, and \(D(\vec {x})_{ij}=x_i^2 \delta _{ij}\), with \(\delta _{ij}\) the Kronecker delta. The terminal condition is \(g(\vec {x}, T)=\phi (\vec {x})=\ln ((1+||\vec {x}||^2)/2)\) with \(\vec {x} \in \mathbb {R}^d\).

In Fig. 14 we present a comparison between the outputs of \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{EM}}\) (blue points) and \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{M}}\) (green points). Each point is the average over 5 samples. The blue points identify the output of \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{EM}}\) and reach the approximate value \(g_{\textrm{EM}}(\vec {x}=(50, \dots , 50), 0)=10.049\pm 0.007\). The green points represent the value obtained by \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{M}}\), \(g_{M}(\vec {x}=(50, \dots , 50), 0)=10.008 \pm 0.008\). The difference between the two schemes is again clearly visible. No numerical or analytical result is available in the literature for Eq. (50), but our numerical observations suggest that the true value of \(g(\vec {x}=(50, \dots , 50), 0)\) could be around \(\sim 10.008\).

Fig. 14

The figure shows the value of \(g(\vec {x}=(50, \dots , 50), 0)\) returned by the neural networks \(\mathcal {DNN}_{EM}\) (blue points) and \(\mathcal {DNN}_{M}\) (green points) as a function of the number of training steps used to learn the parameters. Each point is the mean over five independent runs. The number of equidistant time steps was fixed at 40, i.e. for each of the neural networks \(\mathcal {DNN}_{EM}\) and \(\mathcal {DNN}_{M}\) the value of N is fixed to 40. All the parameters were initialized uniformly at random in \([-1,1]\). The total number of training steps \(t_s\) was fixed to 6000 and the learning rate to \(\eta =0.008\) (Color figure online)

Appendix F: Time Scaling of the Deep Learning Algorithm

In this appendix we briefly discuss how, in practice, the training time scales with the dimension d when the deep network is trained to compute an approximate solution. This experiment is performed on a cluster composed of 22 nodes with 200 cores and 250 GB of RAM in total, distributed over 22 different machines with heterogeneous hardware.

We illustrate this for the heat equation and for the exactly solvable non-linear diffusion equation of Sect. 5.3. Since the diffusion tensor is constant we use the algorithm presented in Sect. 4.1 based on the Euler-Maruyama discretization.

We use \(P=4096\) random initial points for the set of stochastic trajectories used for training, and keep the number of training steps at 15000. Figure 15 shows how the time needed to train the network and find an approximate solution scales with the dimension d. Interestingly, for both equations the points lie on the same straight line, on a log–log scale, with a slope \(\sim 1.1\).

Fig. 15

The figure shows how the time taken by the algorithm scales with the dimension d. Each point is an average over 3 experiments. Black circles correspond to the non-linear diffusion equation and blue squares to the heat equation. For both equations the points lie on the same straight line on a log–log scale, with slope \(\sim 1.1\) (Color figure online)

Appendix G: Derivation of Equation (26)

For completeness we derive Eq. (26) which follows from the Milstein discretization of stochastic trajectories given by (20) [37, 45]. We recall that the \(\partial _{x_k}\) operator acts only on B.

For small enough \(\tau \), after the discretization of Eq. (3), and using (20), one obtains:

$$\begin{aligned} g[\vec {Y}^{n+1}_{M}, \tau _{n+1}]&= g[\vec {Y}^n_{M},\tau _n]+ (\partial _t g[\vec {Y}^n_{M},\tau _n])\tau \\&\quad + \sum ^d_{i=1}(\partial _{x_i}g[\vec {Y}^n_{M},\tau _n])\Big (A_i[\vec {Y}^n_M,\tau _n]\tau + \sum _{j=1}^d B_{ij}[\vec {Y}^n_M,\tau _n]\Delta W^n_j \\&\quad +\frac{1}{2}\sum _{j,k,l=1}^d B_{kl}[\vec {Y}^n_M,\tau _n] (\partial _{x_{k}}B_{ij}[\vec {Y}^n_M,\tau _n]) (\Delta W^{n}_{j}\Delta W^{n}_{l}-\delta _{jl}\tau )\Big ) \\&\quad +\frac{1}{2}\sum ^d_{i,r=1}(\partial _{x_i}\partial _{x_r}g[\vec {Y}^n_{M},\tau _n])\Big (A_i[\vec {Y}^n_M,\tau _n]\tau + \sum _{j=1}^d B_{ij}[\vec {Y}^n_M,\tau _n]\Delta W^n_j\\&\quad +\frac{1}{2}\sum _{j,k,l=1}^d B_{kl}[\vec {Y}^n_M,\tau _n] (\partial _{x_{k}}B_{ij}[\vec {Y}^n_M,\tau _n]) (\Delta W^{n}_{j}\Delta W^{n}_{l}-\delta _{jl}\tau )\Big ) \\&\quad \Big (A_r[\vec {Y}^n_M,\tau _n]\tau + \sum _{j'=1}^d B_{rj'}[\vec {Y}^n_M,\tau _n]\Delta W^n_{j'} \\&\quad +\frac{1}{2}\sum _{j',k',l'=1}^d B_{k'l'}[\vec {Y}^n_M,\tau _n] (\partial _{x_{k'}}B_{rj'}[\vec {Y}^n_M,\tau _n]) (\Delta W^{n}_{j'}\Delta W^{n}_{l'}-\delta _{j'l'}\tau )\Big ) \end{aligned}$$

Finally, discarding all the terms of order greater than \(\tau \) and using the fact that \( g[\vec {Y}^n_{M},\tau _n]\) must be a solution of the nonlinear backward Kolmogorov Eq. (1), we end up with Eq. (26).
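For completeness, a minimal NumPy sketch of one Milstein update in the special case of a diagonal diffusion matrix \(B_{ij}=b_i(\vec y)\,\delta _{ij}\) (the case relevant for the Black–Scholes and HJB examples), where the double sum in (20) collapses to a single derivative per component; the function names and the geometric-Brownian-motion coefficients in the usage lines are illustrative assumptions.

```python
import numpy as np

def milstein_step_diagonal(y, t, tau, A, b, db, rng):
    """One Milstein update for a diagonal diffusion B_ij = b_i(y) delta_ij:
    Y_i <- Y_i + A_i*tau + b_i*dW_i + 0.5*b_i*(db_i/dy_i)*(dW_i**2 - tau).
    A(y,t): drift vector; b(y,t): diagonal diffusion entries;
    db(y,t): componentwise derivatives d b_i / d y_i."""
    dW = rng.normal(0.0, np.sqrt(tau), size=y.shape)
    return (y + A(y, t) * tau + b(y, t) * dW
            + 0.5 * b(y, t) * db(y, t) * (dW**2 - tau))

# Illustrative geometric-Brownian-motion coefficients, b_i(y) = sigma_i * y_i.
d, tau, rng = 5, 0.025, np.random.default_rng(1)
sigma = 0.1 + np.arange(1, d + 1) / 200.0
A  = lambda y, t: 0.05 * y       # drift mu * y, with mu chosen for illustration
b  = lambda y, t: sigma * y      # diagonal diffusion entries
db = lambda y, t: sigma          # derivatives d b_i / d y_i
y = np.full(d, 100.0)
for n in range(40):
    y = milstein_step_diagonal(y, n * tau, tau, A, b, db, rng)
```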

About this article

Cite this article

Marino, R., Macris, N. Solving Non-linear Kolmogorov Equations in Large Dimensions by Using Deep Learning: A Numerical Comparison of Discretization Schemes. J Sci Comput 94, 8 (2023). https://doi.org/10.1007/s10915-022-02044-x
