Abstract
Non-linear partial differential Kolmogorov equations are successfully used to describe a wide range of time-dependent phenomena in natural sciences, engineering, and finance. For example, in physical systems the Allen–Cahn equation describes pattern formation associated with phase transitions, while in finance the Black–Scholes equation describes the evolution of the price of derivative investment instruments. Such modern applications often require solving these equations in high-dimensional regimes in which classical approaches are ineffective. Recently, an interesting new approach based on deep learning has been introduced by E, Han and Jentzen [1, 2]. The main idea is to construct a deep network which is trained from the samples of discrete stochastic differential equations underlying Kolmogorov's equation. The network is able to approximate, numerically at least, the solutions of the Kolmogorov equation with polynomial complexity in whole spatial domains. In this contribution we study variants of the deep networks by using different discretization schemes of the stochastic differential equation. We compare the performance of the associated networks on benchmarked examples and show that, for some discretization schemes, improvements in the accuracy are possible without affecting the observed computational complexity.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017). https://doi.org/10.1007/s40304-017-0117-6
Bellman, R.: Dynamic Programming, vol. XXV. Princeton University Press, Princeton, NJ (1957)
Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A.: Deep splitting method for parabolic PDEs. arXiv:1907.03452 (2019)
Chan-Wai-Nam, Q., Mikael, J., Warin, X.: Machine learning for semi linear PDEs. J. Sci. Comput. 79(3), 1667–1712 (2019). https://doi.org/10.1007/s10915-019-00908-3
Lee, H., Kang, I.S.: Neural algorithm for solving differential equations. J. Comput. Phys. 91(1), 110–131 (1990)
Meade, A.J., Jr., Fernandez, A.A.: Solution of nonlinear ordinary differential equations by feedforward neural networks. Math. Comput. Model. 20(9), 19–44 (1994)
Dissanayake, M., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Methods Eng. 10(3), 195–201 (1994)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998)
Lagaris, I., Likas, A., Papageorgiou, D.: Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 11, 1041–1049 (2000)
Malek, A., Shekari-Beidokhti, R.: Numerical solution for high order differential equations using a hybrid neural network—optimization method. Appl. Math. Comput. 183(1), 260–271 (2006)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Sahli-Costabal, F., Yang, Y., Perdikaris, P., Hurtado, D.E., Kuhl, E.: Physics-informed neural networks for cardiac activation mapping. Front. Phys. 8, 42 (2020). https://doi.org/10.3389/fphy.2020.00042
Raissi, M., Yazdani, A., Karniadakis, G.L.: Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science 367, eaaw4741 (2020)
Sirignano, J., Spiliopoulos, K.: DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
Bismut, J.-M.: Théorie probabiliste du contrôle des diffusions, vol. 167. American Mathematical Society (AMS), Providence, RI (1976)
Pardoux, E., Peng, S.: Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 14(1), 55–61 (1990)
Zhou, M., Han, J., Lu, J.: Actor-critic method for high dimensional static Hamilton–Jacobi–Bellman partial differential equations based on neural networks. SIAM J. Sci. Comput. 43(6), A4043–A4066 (2021)
Beck, C., E, W., Jentzen, A.: Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci. 29(4), 1563–1619 (2019). https://doi.org/10.1007/s00332-018-9525-3
Huré, C., Pham, H., Warin, X.: Some machine learning schemes for high-dimensional nonlinear PDEs. arXiv:1902.01599 (2019)
Pham, H., Warin, X.: Neural networks-based backward scheme for fully nonlinear PDEs. CoRR arXiv:1908.00412 (2019)
Raissi, M.: Forward–backward stochastic neural networks: deep learning of high-dimensional partial differential equations. arXiv:1804.07010 (2018)
Gonon, L., Schwab, C.: Deep ReLU network expression rates for option prices in high-dimensional, exponential Lévy models. In: Seminar for Applied Mathematics, ETH Zürich, Switzerland, Technical Report 2020-52 (2020). https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2020/2020-52.pdf
Han, J., Long, J.: Convergence of the deep BSDE method for coupled FBSDEs. Probab. Uncertain. Quant. Risk 5(1), 1–33 (2020)
Jiang, Y., Li, J.: Convergence of the deep BSDE method for FBSDEs with non-Lipschitz coefficients. arXiv:2101.01869 (2021)
E, W., Han, J., Jentzen, A.: Algorithms for solving high dimensional PDEs: from nonlinear Monte Carlo to machine learning. Nonlinearity 35(1), 278 (2021)
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. arXiv:1809.03062 (2018)
Grohs, P., Hornung, F., Jentzen, A., von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations (2018)
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differ. Equ. Appl. 1(2) (2020). https://doi.org/10.1007/s42985-019-0006-9
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: Multilevel Picard iterations for solving smooth semilinear parabolic heat equations. Numer. Anal. (2016)
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. J. Sci. Comput. 79(3), 1534–1571 (2019). https://doi.org/10.1007/s10915-018-00903-0
Becker, S., Braunwarth, R., Hutzenthaler, M., Jentzen, A., von Wurstemberger, P.: Numerical simulations for full history recursive multilevel Picard approximations for systems of high-dimensional partial differential equations. arXiv:2005.10206 (2020)
Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations, vol. 23. Springer, Berlin (2013)
Leimkuhler, B., Matthews, C.: Rational construction of stochastic numerical methods for molecular sampling. Appl. Math. Res. Exp. 2013(1), 34–56 (2012)
Marino, R.: DNN-PDEs. https://github.com/RaffaeleMarino/DNN-PDEs (2019)
Kolmogoroff, A.: The theory of continuous random processes. Math. Ann. 108(1), 149–160 (1933)
Gardiner, C.W., et al.: Handbook of Stochastic Methods, vol. 3. Springer, Berlin (1985)
Glasserman, P.: Monte Carlo Methods in Financial Engineering, vol. 53. Springer, Berlin (2004)
Brenner, H.: Coupling between the translational and rotational Brownian motions of rigid particles of arbitrary shape: II. General theory. J. Colloid Interface Sci. 23(3), 407–436 (1967)
Brenner, H.: Taylor dispersion in systems of sedimenting nonspherical Brownian particles. J. Colloid Interface Sci. 80(2), 548–588 (1981)
Brenner, H.: Taylor dispersion in systems of sedimenting nonspherical Brownian particles: II. Homogeneous ellipsoidal particles. J. Colloid Interface Sci. 80(2), 548–588 (1981)
Marino, R., Aurell, E.: Advective-diffusive motion on large scales from small-scale dynamics with an internal symmetry. Phys. Rev. E 93(6), 062147 (2016)
Marino, R., Eichhorn, R., Aurell, E.: Entropy production of a Brownian ellipsoid in the overdamped limit. Phys. Rev. E 93(1), 012132 (2016)
Aurell, E., Bo, S., Dias, M., Eichhorn, R., Marino, R.: Diffusion of a Brownian ellipsoid in a force field. EPL (Europhys. Lett.) 114(3), 30005 (2016)
Jentzen, A., Röckner, M.: A Milstein scheme for SPDEs. Found. Comput. Math. 15(2), 313–362 (2015)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)
Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637–654 (1973)
Hull, J.: Options, Futures, and Other Derivatives, 6th ed. Pearson Prentice Hall, Upper Saddle River, NJ (2006). http://gso.gbv.de/DB=2.1/CMD?ACT=SRCHA&SRT=YOP&IKT=1016&TRM=ppn+563580607&sourceid=fbw_bibsonomy
Hammersley, J.: Monte Carlo methods. Springer, Berlin (2013)
Hutzenthaler, M., Jentzen, A., Kruse, T. et al.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. arXiv:1708.03223 (2017)
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. J. Sci. Comput. 79(3), 1534–1571 (2019)
Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving stochastic differential equations and Kolmogorov equations by means of deep learning. arXiv:1806.00421 (2018)
Acknowledgements
The work of R. M. was supported by Swiss National Foundation grant number 200021E 17554; he is now supported by the FARE grant No. R167TEEE7B. N. M. learned about the topic during the conference “Intelligent Machines and Mathematics” in Bologna (January 2019) and acknowledges interesting related discussions with Pierluigi Contucci and Philip Grohs. We also thank Martin Hutzenthaler as well as the referees for constructive comments that improved the paper.
Ethics declarations
Conflict of interest
The authors declare no competing financial interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Derivation of Equation (28)
For completeness we derive Eq. (28), which follows from the Leimkuhler and Matthews discretization of stochastic trajectories given by (21). Note the differences between this equation and (24), which is implied by the Euler–Maruyama discretization (19). Using (14) and applying Itô's lemma (i.e., expanding to second order), we get for small enough \(\tau \)
To evaluate the last term we work in components and use \(\mathbb {E}[\Delta W_j^n\Delta W_i^n] = \delta _{ij}\tau \) and \(\mathbb {E}[\Delta W_j^n\Delta W_i^{n+1}] = 0\) (because increments are independent). This yields
Finally, using (1) for the partial derivative with respect to time we obtain (28).
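The averaging of consecutive Wiener increments, which is what makes the cross term \(\mathbb {E}[\Delta W_j^n\Delta W_i^{n+1}]\) appear in the derivation above, can be sketched in a few lines. This is a minimal illustration for a generic drift \(\mu \) and a constant scalar diffusion coefficient \(\sigma \); the function name is ours, not the paper's.

```python
import numpy as np

def lm_trajectory(mu, sigma, y0, tau, n_steps, rng):
    """Leimkuhler-Matthews discretization: the noise entering step n is the
    average of two consecutive Wiener increments, (dW^n + dW^{n+1}) / 2."""
    y = np.array(y0, dtype=float)
    d = y.size
    # n_steps + 1 increments are needed: step n uses dW[n] and dW[n + 1]
    dW = rng.normal(0.0, np.sqrt(tau), size=(n_steps + 1, d))
    path = [y.copy()]
    for n in range(n_steps):
        y = y + mu(y) * tau + sigma * 0.5 * (dW[n] + dW[n + 1])
        path.append(y.copy())
    return np.array(path)
```

With \(\sigma = 0\) the update reduces to the explicit Euler step, which gives a quick consistency check of the deterministic part.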
Appendix B: Further Assessment of the LM Scheme for the Heat Equation
The Leimkuhler–Matthews (LM) scheme appears to perform worse than both the Euler–Maruyama and Milstein schemes for the Black–Scholes equation (Sect. 5.1). This is also the case when the diffusion tensor is homogeneous, as tested on the Allen–Cahn equation (Sect. 5.2). However, we observe slight improvements when the time step \(\tau \) of the discretization becomes smaller, and we therefore conclude that the LM scheme is limited by the number of neural networks needed, one per time-slice. Taking smaller time steps adds networks, which becomes prohibitive in terms of memory requirements.
However, the LM scheme is well suited for sampling from stationary distributions, and it is therefore natural to ask whether this quality persists when solving an equation admitting a stationary limiting distribution. The simplest such equation is the heat equation, which we discuss here. We find that, for the same discretization interval used for the non-linear equations, the LM scheme still leads to higher errors than Euler–Maruyama.
We look at the (forward) heat equation \(\frac{\partial g(\vec {x}, t)}{\partial t}=\Delta _x g(\vec {x}, t)\) in \(d=10\) dimensions, on the interval \(t \in [0,T]\), and with initial condition \(g(\vec {x}, 0)=||\vec {x}||^2_{\mathbb {R}^d}\). The exact solution is given by \(g(\vec {x}, t)=||\vec {x}||^2_{\mathbb {R}^d} + 2td\) (since \(\Delta _x ||\vec {x}||^2_{\mathbb {R}^d} = 2d\)).
We only compare the Leimkuhler–Matthews scheme with the Euler–Maruyama one (since the diffusion coefficient is constant). Figure 11 compares the two schemes through the averaged relative approximation error
$$\langle \epsilon \rangle = \frac{1}{N}\sum _{i=1}^{N} \frac{\big |\mathcal {N}\mathcal {N}(\vec {x}_i, T|\vec {\theta }) - g(\vec {x}_i, T)\big |}{\big |g(\vec {x}_i, T)\big |},$$
computed over \(N=10^4\) points \(\vec {x}_i\), \(i=1,\dots ,N\), randomly chosen in \([0,1]^d\), with \(\mathcal {N}\mathcal {N}(\vec {x}, T|\vec {\theta })\) the solution returned by the respective \(\mathcal {DNN}\). Figure 11 shows that the Euler–Maruyama scheme approximates the solution of the heat equation better than the Leimkuhler–Matthews one.
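As a sanity check on the reference solution, \(g(\vec {x},T)\) can also be estimated directly from the Feynman–Kac representation: for \(\partial _t g = \Delta _x g\) the underlying process is \(\vec {Y}^t = \vec {x} + \sqrt{2}\,\vec {W}^t\), so \(g(\vec {x},T) = \mathbb {E}\big [\,||\vec {x} + \sqrt{2T}\,\vec {Z}||^2\,\big ]\) with \(\vec {Z}\) a standard Gaussian vector. A minimal Monte Carlo sketch (our own, not the paper's code):

```python
import numpy as np

def heat_mc(x, T, n_samples, rng):
    """Monte Carlo estimate of g(x, T) for dg/dt = Laplacian g, g(x, 0) = ||x||^2.
    Feynman-Kac: g(x, T) = E[ ||x + sqrt(2 T) Z||^2 ], Z ~ N(0, I_d)."""
    z = rng.normal(size=(n_samples, x.size))
    y = x + np.sqrt(2.0 * T) * z
    return float(np.mean(np.sum(y * y, axis=1)))
```

For \(d=10\), \(\vec {x} = (1,\dots ,1)\) and \(T=1\) the estimate should be close to the exact value \(||\vec {x}||^2 + 2Td = 30\).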
Appendix C: A Simpler Deep Learning Algorithm for Linear Equations. The Example of Geometric Brownian Motion
The main focus of this paper is on non-linear Kolmogorov equations; however, for completeness we briefly discuss a simpler algorithm that applies to linear equations. Physical applications of linear Kolmogorov equations can be found in [42,43,44]. For linear equations the standard Feynman–Kac formula is simply given by
i.e., the first term in (7). In [52] it is elegantly observed that this expectation minimizes a certain "mean square error", and that this observation can be taken as the basis of a deep learning algorithm. The main idea is to minimize the following loss:
In practice the expectations are replaced by empirical averages over a sample set of discretized trajectories, and the minimization over \(\vec {\theta }\) is carried out by a gradient descent algorithm. One important simplification with respect to the case of non-linear equations is that only a single deep network, at time T, is optimized. We refer to [52] for further details of the algorithm. Our purpose here is to briefly compare the errors incurred by the Euler–Maruyama and Milstein discretizations of the trajectories.
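To make the idea concrete, here is a toy version of this minimization for a one-dimensional geometric Brownian motion, where the deep network of [52] is replaced by a small polynomial model fitted by least squares; all names and parameter values below are our illustrative choices, not the paper's. The least-squares minimizer approximates the conditional expectation \(\mathbb {E}[\psi (X_T) \mid \xi ]\), which is exactly the Feynman–Kac solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-d geometric Brownian motion with exact terminal samples:
# X_T = xi * exp((mu - sigma^2 / 2) T + sigma sqrt(T) Z), Z ~ N(0, 1)
mu, sigma, T = 0.05, 0.2, 1.0
psi = lambda x: np.maximum(x - 1.0, 0.0)   # call-type terminal condition

# Random initial points xi, and one trajectory endpoint per xi
n = 200_000
xi = rng.uniform(0.5, 1.5, size=n)
z = rng.normal(size=n)
x_T = xi * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)

# Minimize the empirical "mean square error" over a small function class
# (cubic polynomials in xi, standing in for the deep network)
coeffs = np.polyfit(xi, psi(x_T), deg=3)
approx = lambda x: np.polyval(coeffs, x)
```

The fitted curve \(x \mapsto \texttt {approx}(x)\) is then an estimate of \(g(x, 0) = \mathbb {E}[\psi (X_T) \mid X_0 = x]\) on the sampled range.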
We work with a geometric Brownian motion, i.e., the linear part of the (backward) Black–Scholes Eq. (30),
for which the exact solution \(g(\vec {x}, t)\), \(0\le t\le T\) is known,
where \(\vec {W}^t\) is the standard Brownian motion (conditioned to start at the origin \(\vec {W}^0 = 0\)). This is a one-dimensional integral that can be computed by the Monte Carlo method.
The performance is assessed by computing the average relative error
$$\epsilon = \frac{\big |\mathcal {N}\mathcal {N}(\vec {x}, 0|\vec {\theta }) - g(\vec {x}, 0)\big |}{\big |g(\vec {x}, 0)\big |},$$
where \(\mathcal{N}\mathcal{N}(\vec {x}, 0|\vec {\theta })\), at \(t=0\), is the approximate solution obtained by the deep learning algorithm. We take \(r=\frac{1}{20}\), \(\mu =r-\frac{1}{10}\), \(\sigma _i=\frac{1}{10}+\frac{i}{200}\), \(d=100\), \(t\in [0,T=1]\), with time step \(\tau =T/N\) and \(N=40\). We fix the initial condition to \(g(\vec {x},0)=\psi (\vec {x})=\exp {(-rT)}\max \{[\max _{i \in {1,2,\ldots ,d}}x_i]-100,0\}\) (as in [52]).
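Since all components share the same one-dimensional Brownian motion, the reference value at a point \(\vec {x}\) reduces to a one-dimensional integral over \(W_T\) and can be estimated by plain Monte Carlo. A sketch with the parameters of the text (the function name and sample size are our illustrative choices):

```python
import numpy as np

def reference_value(x0, r, mu, sigma, T, n_samples, rng):
    """MC estimate of E[exp(-r T) max(max_i X_T^i - 100, 0)], where
    X_T^i = x0_i * exp((mu - sigma_i^2 / 2) T + sigma_i W_T)
    and a single one-dimensional W_T drives all components."""
    w_T = rng.normal(0.0, np.sqrt(T), size=(n_samples, 1))
    x_T = x0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * w_T)
    payoff = np.exp(-r * T) * np.maximum(x_T.max(axis=1) - 100.0, 0.0)
    return float(payoff.mean())

d = 100
r = 1 / 20
mu = r - 1 / 10
sigma = 0.1 + np.arange(1, d + 1) / 200   # sigma_i = 1/10 + i/200
```

The estimate is monotone in the initial point, which gives a simple consistency check when the same Brownian samples are reused.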
In Table 2 and Fig. 12 we report the relative error \(\epsilon \) obtained with the two discretization schemes used to minimize the empirical averages corresponding to (46). We also average \(\epsilon \) over 15 different experiments to reduce the variance. The analysis shows that choosing a higher-order discretization scheme for training the neural network yields an improvement.
Appendix D: Computational Complexity for the Milstein Scheme
In Sect. 5.3, using the Euler–Maruyama scheme, we showed that the deep learning methodology studied in this paper does not suffer from the curse of dimensionality, in the sense that the number of training trajectories scales polynomially in the dimension of space. In this appendix we repeat the analysis for the Milstein scheme and show that the number of training trajectories again scales polynomially in the dimension.
To the best of our knowledge, no exact solution is known in the literature for a non-linear Kolmogorov equation with non-constant transport coefficients. For this reason, we perform an accurate sampling of solutions of Eq. (32) in the interval \([49.995:50.005]^d\) using the multilevel Picard method, and then use those solutions as reference values for our analysis.
The deep network of Sect. 4.2 (with Milstein discretization) is used, since the diffusion term is non-homogeneous. For the number of initial random points in \([49.995:50.005]^d\) of the stochastic input trajectories we take \(P=4, 8, 16, 32, 64, 128, 256, 512, 1024\). Once the network is trained we obtain the approximate solution \(\mathcal {N}\mathcal {N}(\vec {x},0|\vec {\theta })\). The relative error is then computed from (36), using \(M=10^2\) uniformly random points \(\vec {x}_m\in [49.995:50.005]^d\), \(m=1, \dots , M\), with reference values obtained by the multilevel Picard method.
Figure 13 shows the behaviour of \(\langle \epsilon \rangle \) as a function of P, for various values of d. For small values of d, few points P suffice for the network to approximate the exact solution well. In contrast, as expected, as d grows the number of points P needed to approximate the solution well becomes larger.
By fixing a pre-specified relative error of \(\langle \epsilon \rangle \approx 0.0011\) we observe that P scales polynomially with d, roughly as \(P=O(d^{1.49})\). This value is comparable with the one obtained for the Euler–Maruyama scheme in Sect. 5.3. We conclude that the Milstein scheme improves the accuracy of the neural network without modifying the computational complexity, as expected.
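The exponent in \(P=O(d^{1.49})\) is the slope of a least-squares fit of \(\log P\) against \(\log d\). A short sketch of this extraction with synthetic points (the actual data of Fig. 13 are not reproduced here; the values below are for illustration only):

```python
import numpy as np

# Synthetic (d, P) pairs following the reported scaling, illustration only
d = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
P = 4.0 * d ** 1.49

# The slope of log P versus log d gives the scaling exponent
slope, intercept = np.polyfit(np.log(d), np.log(P), 1)
```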
Appendix E: HJB Equation with Non-constant Diffusion Coefficient
In Sect. 5.4, we compared the Euler–Maruyama and Leimkuhler–Matthews schemes on the Hamilton–Jacobi–Bellman equation (40). In this appendix we compare the Euler–Maruyama and Milstein schemes on a Hamilton–Jacobi–Bellman equation with non-constant diffusion tensor. The equation we solve is:
where \(\lambda \) is a positive constant set to 1, and \(D(\vec {x})_{ij}=x_i^2 \delta _{ij}\), where \(\delta _{ij}\) is the Kronecker delta. The terminal condition is \(g(\vec {x}, T)=\phi (\vec {x})=\ln ((1+||\vec {x}||^2)/2)\) with \(\vec {x} \in \mathbb {R}^d\).
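For a diagonal diffusion with \(\sigma _{ii}(\vec {x}) \propto x_i\) the Milstein correction is explicit, \(\frac{1}{2}\sigma \sigma ' \big ((\Delta W)^2 - \tau \big )\). The gain of the higher-order scheme can be checked against the exactly solvable driftless case \(dY = Y\,dW\); the following is an illustrative sketch (it does not reproduce the HJB drift or the normalization of \(D\) above).

```python
import numpy as np

def em_step(y, tau, dw):
    # Euler-Maruyama for dY = Y dW (sigma(y) = y, drift omitted)
    return y + y * dw

def milstein_step(y, tau, dw):
    # Milstein adds 0.5 * sigma * sigma' * (dW^2 - tau); here sigma(y) = y
    return y + y * dw + 0.5 * y * (dw ** 2 - tau)

def strong_error(step, y0, T, n_steps, n_paths, seed):
    """Mean absolute error at time T against the exact solution of dY = Y dW."""
    rng = np.random.default_rng(seed)
    tau = T / n_steps
    dw = rng.normal(0.0, np.sqrt(tau), size=(n_paths, n_steps))
    y = np.full(n_paths, y0)
    for n in range(n_steps):
        y = step(y, tau, dw[:, n])
    exact = y0 * np.exp(dw.sum(axis=1) - 0.5 * T)   # Y_T = Y_0 exp(W_T - T/2)
    return float(np.mean(np.abs(y - exact)))
```

With the same seed (hence the same Brownian increments for both schemes) the Milstein error is clearly smaller than the Euler–Maruyama one, reflecting strong order 1 versus 1/2.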
In Fig. 14 we compare the outputs of \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{EM}}\) (blue points) and \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{M}}\) (green points). Each point is the average over 5 samples. The blue points, the output of \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{EM}}\), reach the approximate value \(g_{\textrm{EM}}(\vec {x}=(50, \dots , 50), 0)=10.049\pm 0.007\), while the green points, obtained by \(\mathcal {D}\mathcal {N}\mathcal {N}_{\textrm{M}}\), give \(g_{\textrm{M}}(\vec {x}=(50, \dots , 50), 0)=10.008 \pm 0.008\). The difference between the two schemes is again clearly visible. No numerical or analytical result is available in the literature for Eq. (50), but numerical observations suggest that the true value of \(g(\vec {x}=(50, \dots , 50), 0)\) is around 10.008.
Appendix F: Time Scaling of the Deep Learning Algorithm
In this appendix we briefly discuss how, in practice, the training time scales with the dimension d when the deep network is trained to compute an approximate solution. This experiment is performed on a cluster of 22 nodes with 200 cores and 250 GB of RAM, distributed over 22 machines with different hardware.
We illustrate this for the heat equation and for the exactly solvable non-linear diffusion equation of Sect. 5.3. Since the diffusion tensor is constant we use the algorithm presented in Sect. 4.1 based on the Euler-Maruyama discretization.
We use \(P=4096\) random initial points for the set of stochastic trajectories used for training, and keep the number of training steps at 15000. Figure 15 shows how the time needed to train the network and find an approximate solution scales with the number of dimensions d. Interestingly, for both equations the points lie on the same straight line on a log-log scale, with slope \(\sim 1.1\).
Appendix G: Derivation of Equation (26)
For completeness we derive Eq. (26) which follows from the Milstein discretization of stochastic trajectories given by (20) [37, 45]. We recall that the \(\partial _{x_k}\) operator acts only on B.
For small enough \(\tau \), after the discretization of Eq. (3), and using (20), one obtains:
Finally, discarding all terms of order greater than \(\tau \) and using the assumption that \( g[\vec {Y}^n_{M},\tau _n]\) must be a solution of the nonlinear backward Kolmogorov Eq. (1), we end up with Eq. (26).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Marino, R., Macris, N. Solving Non-linear Kolmogorov Equations in Large Dimensions by Using Deep Learning: A Numerical Comparison of Discretization Schemes. J Sci Comput 94, 8 (2023). https://doi.org/10.1007/s10915-022-02044-x