Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations

Abstract

We study a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension. The algorithm is based on an analogy between the BSDE and reinforcement learning: the gradient of the solution plays the role of the policy function, and the loss function is given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the studied algorithm for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen–Cahn equation, the Hamilton–Jacobi–Bellman equation, and a nonlinear pricing model for financial derivatives.


References

  1. Bellman, R.: Dynamic Programming. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ. Reprint of the 1957 edition, with a new introduction by Stuart Dreyfus (2010)

  2. Bender, C., Denk, R.: A forward scheme for backward SDEs. Stoch. Process. Appl. 117(12), 1793–1812 (2007)

  3. Bender, C., Schweizer, N., Zhuo, J.: A primal-dual algorithm for BSDEs. arXiv:1310.3694 (2014)

  4. Bergman, Y.Z.: Option pricing with differential interest rates. Rev. Financ. Stud. 8(2), 475–500 (1995)

  5. Briand, P., Labart, C.: Simulation of BSDEs by Wiener chaos expansion. Ann. Appl. Probab. 24(3), 1129–1171 (2014)

  6. Chassagneux, J.-F.: Linear multistep schemes for BSDEs. SIAM J. Numer. Anal. 52(6), 2815–2836 (2014)

  7. Chassagneux, J.-F., Richou, A.: Numerical simulation of quadratic BSDEs. Ann. Appl. Probab. 26(1), 262–304 (2016)

  8. Crisan, D., Manolarakis, K.: Solving backward stochastic differential equations using the cubature method: application to nonlinear pricing. SIAM J. Financ. Math. 3(1), 534–571 (2012)

  9. Darbon, J., Osher, S.: Algorithms for overcoming the curse of dimensionality for certain Hamilton–Jacobi equations arising in control theory and elsewhere. Res. Math. Sci. 3(19), 26 (2016)

  10. Debnath, L.: Nonlinear Partial Differential Equations for Scientists and Engineers, 3rd edn. Birkhäuser/Springer, New York (2012)

  11. E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. arXiv:1706.04702 (2017)

  12. E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: Linear scaling algorithms for solving high-dimensional nonlinear parabolic differential equations. arXiv:1607.03295 (2017)

  13. E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. arXiv:1708.03223 (2017)

  14. Gobet, E., Lemor, J.-P., Warin, X.: A regression-based Monte Carlo method to solve backward stochastic differential equations. Ann. Appl. Probab. 15(3), 2172–2202 (2005)

  15. Gobet, E., Turkedjiev, P.: Linear regression MDP scheme for discrete backward stochastic differential equations under general conditions. Math. Comput. 85(299), 1359–1391 (2016)

  16. Gobet, E., Turkedjiev, P.: Adaptive importance sampling in least-squares Monte Carlo algorithms for backward stochastic differential equations. Stoch. Process. Appl. 127(4), 1171–1203 (2017)

  17. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org

  18. Han, J., E, W.: Deep learning approximation for stochastic control problems. arXiv:1611.07422 (2016)

  19. Han, J., Jentzen, A., E, W.: Overcoming the curse of dimensionality: solving high-dimensional partial differential equations using deep learning. arXiv:1707.02568 (2017)

  20. Henry-Labordère, P.: Counterparty risk valuation: a marked branching diffusion approach. arXiv:1203.2369 (2012)

  21. Henry-Labordère, P., Oudjane, N., Tan, X., Touzi, N., Warin, X.: Branching diffusion representation of semilinear PDEs and Monte Carlo approximation. arXiv:1603.01727 (2016)

  22. Henry-Labordère, P., Tan, X., Touzi, N.: A numerical algorithm for a class of BSDEs via the branching process. Stoch. Process. Appl. 124(2), 1112–1140 (2014)

  23. Hinton, G.E., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29, 82–97 (2012)

  24. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the International Conference on Machine Learning (ICML) (2015)

  25. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)

  26. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)

  27. LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436–444 (2015)

  28. Pardoux, É., Peng, S.: Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 14(1), 55–61 (1990)

  29. Pardoux, É., Peng, S.: Backward stochastic differential equations and quasilinear parabolic partial differential equations. In: Stochastic Partial Differential Equations and Their Applications (Charlotte, NC, 1991), Lecture Notes in Control and Inform. Sci., vol. 176, pp. 200–217. Springer, Berlin (1992)

  30. Pardoux, É., Tang, S.: Forward-backward stochastic differential equations and quasilinear parabolic PDEs. Probab. Theory Relat. Fields 114(2), 123–150 (1999)

  31. Peng, S.: Probabilistic interpretation for systems of quasilinear parabolic partial differential equations. Stoch. Stoch. Rep. 37(1–2), 61–74 (1991)


Acknowledgements

We gratefully acknowledge Christian Beck and Sebastian Becker for useful suggestions regarding the implementation of the deep BSDE method. This project has been partially supported by the Major Program of NNSFC under grant 91130005, the research grant ONR N00014-13-1-0338, and the research grant DOE DE-SC0009248.

Author information

Correspondence to Weinan E.

Appendix A: Special Cases of the Proposed Algorithm

In this section, we illustrate the general algorithm in Subsect. 3.2 in several special cases. More specifically, in Subsects. 5.1 and 5.2, we provide special choices for the functions \( \psi _m \), \( m \in {\mathbb {N}}\), and \( \Psi _m \), \( m \in {\mathbb {N}}\), employed in (3.14), and in Subsects. 5.3 and 5.4, we provide special choices for the function \( \Upsilon \) in (3.9).

Stochastic Gradient Descent (SGD)

Example 5.1

Assume the setting in Subsect. 3.2, let \( ( \gamma _m )_{ m \in {\mathbb {N}}} \subseteq (0,\infty ) \), and assume for all \( m \in {\mathbb {N}}\), \( x \in {\mathbb {R}}^{ \varrho } \), \( ( \varphi _j )_{ j \in {\mathbb {N}}} \in ( {\mathbb {R}}^{ \rho } )^{ {\mathbb {N}}} \) that

$$\begin{aligned} \varrho = \rho , \qquad \Psi _m( x, ( \varphi _j )_{ j \in {\mathbb {N}}} ) = \varphi _1 , \qquad \text {and} \qquad \psi _m( x ) = \gamma _m x . \end{aligned}$$
(5.1)

Then it holds for all \( m \in {\mathbb {N}}\) that

$$\begin{aligned} \Theta _{ m } = \Theta _{ m - 1 } - \gamma _{ m } \Phi ^{ m - 1, 1 }_{ \mathbb {S}_m }( \Theta _{ m - 1 } ) . \end{aligned}$$
(5.2)
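The recursion (5.2) is ordinary stochastic gradient descent with one gradient sample per step. The following is a minimal NumPy sketch, not the paper's implementation; the callable grad_sample and the sequence gamma are illustrative stand-ins for the gradient sample \( \Phi ^{ m - 1, 1 }_{ \mathbb {S}_m } \) and the step sizes \( ( \gamma _m )_{ m \in {\mathbb {N}}} \).

```python
import numpy as np

def sgd(theta0, grad_sample, gamma, rng=np.random.default_rng(0)):
    """Plain SGD, cf. (5.2): Theta_m = Theta_{m-1} - gamma_m * (gradient sample)."""
    theta = np.asarray(theta0, dtype=float)
    for gamma_m in gamma:
        # grad_sample is a hypothetical user-supplied callable returning one
        # unbiased gradient sample at the current parameter value
        theta = theta - gamma_m * grad_sample(theta, rng)
    return theta
```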

Adaptive Moment Estimation (Adam) with Mini-Batches

In this subsection, we illustrate how the so-called Adam optimizer (see [25]) can be employed in conjunction with the deep BSDE method in Subsect. 3.2 (cf. also Subsect. 4.1 above).

Example 5.2

Assume the setting in Subsect. 3.2, assume that \( \varrho = 2 \rho \), let \( {\text {Pow}}_r :{\mathbb {R}}^{ \rho } \rightarrow {\mathbb {R}}^{ \rho } \), \( r \in (0,\infty ) \), be the functions which satisfy for all \( r \in (0,\infty ) \), \( x = ( x_1, \dots , x_{ \rho } ) \in {\mathbb {R}}^{ \rho } \) that

$$\begin{aligned} {\text {Pow}}_{ r }( x ) = ( | x_1 |^r, \dots , | x_{ \rho } |^r ) , \end{aligned}$$
(5.3)

let \( \varepsilon \in (0,\infty ) \), \( ( \gamma _m )_{ m \in {\mathbb {N}}} \subseteq (0,\infty ) \), \( ( J_m )_{ m \in {\mathbb {N}}_0 } \subseteq {\mathbb {N}}\), \( \mathbb {X}, \mathbb {Y} \in (0,1) \), let \( \mathbf{m} = ( \mathbf{m}^{ (1) } , \dots , \mathbf{m}^{ ( \rho ) } ) :{\mathbb {N}}_0 \times \Omega \rightarrow {\mathbb {R}}^{ \rho } \) and \( \mathbb {M} = ( \mathbb {M}^{ (1) } , \dots , \mathbb {M}^{ ( \rho ) } ) :{\mathbb {N}}_0 \times \Omega \rightarrow {\mathbb {R}}^{ \rho } \) be the stochastic processes which satisfy for all \( m \in {\mathbb {N}}_0 \) that \( \Xi _m = ( \mathbf{m}_m^{ (1) }, \dots , \mathbf{m}^{ (\rho ) }_m , \mathbb {M}_m^{ (1) }, \dots , \mathbb {M}_m^{ (\rho ) } ) \), and assume for all \( m \in {\mathbb {N}}\), \( x = ( x_1, \dots , x_{ \rho } ) , y = ( y_1, \dots , y_{ \rho } ) \in {\mathbb {R}}^{ \rho } \), \( ( \varphi _j )_{ j \in {\mathbb {N}}} \in ( {\mathbb {R}}^{ \rho } )^{ {\mathbb {N}}} \) that

$$\begin{aligned} \Psi _m( x, y, ( \varphi _j )_{ j \in {\mathbb {N}}} ) = \Big ( \mathbb {X} x + ( 1 - \mathbb {X} ) \Big ( \tfrac{ 1 }{ J_m } \textstyle \sum _{ j = 1 }^{ J_m } \varphi _j \Big ) , \; \mathbb {Y} y + ( 1 - \mathbb {Y} ) \, {\text {Pow}}_2\Big ( \tfrac{ 1 }{ J_m } \textstyle \sum _{ j = 1 }^{ J_m } \varphi _j \Big ) \Big ) \end{aligned}$$
(5.4)

and

$$\begin{aligned} \psi _m( x, y ) = \left( \left[ \varepsilon + \tfrac{ \sqrt{ | y_1 | } }{ \sqrt{ 1 - \mathbb {Y}^m } } \right] ^{ - 1 } \frac{ \gamma _m x_1 }{ ( 1 - \mathbb {X}^m ) } , \dots , \left[ \varepsilon + \tfrac{ \sqrt{ | y_{ \rho } | } }{ \sqrt{ 1 - \mathbb {Y}^m } } \right] ^{ - 1 } \frac{ \gamma _m x_{ \rho } }{ ( 1 - \mathbb {X}^m ) } \right) . \end{aligned}$$
(5.5)

Then it holds for all \( m \in {\mathbb {N}}\) that

$$\begin{aligned} \begin{aligned} \Theta _{ m }&= \Theta _{ m - 1 } - \left( \left[ \varepsilon + \tfrac{ \sqrt{ | \mathbb {M}^{ (1) }_m | } }{ \sqrt{ 1 - \mathbb {Y}^m } } \right] ^{ - 1 } \frac{ \gamma _m \mathbf{m}^{ (1) }_m }{ ( 1 - \mathbb {X}^m ) } , \dots , \left[ \varepsilon + \tfrac{ \sqrt{ | \mathbb {M}^{ ( \rho ) }_m | } }{ \sqrt{ 1 - \mathbb {Y}^m } } \right] ^{ - 1 } \frac{ \gamma _m \mathbf{m}^{ (\rho ) }_m }{ ( 1 - \mathbb {X}^m ) } \right) , \\ \mathbf{m}_m&= \mathbb {X} \, \mathbf{m}_{ m - 1 } + \frac{ ( 1 - \mathbb {X} ) }{ J_m } \left( \sum _{ j = 1 }^{ J_m } \Phi ^{ m - 1 , j }_{ \mathbb {S}_m }( \Theta _{ m - 1 } ) \right) , \\ \mathbb {M}_m&= \mathbb {Y} \, \mathbb {M}_{ m - 1 } + \left( 1 - \mathbb {Y} \right) {\text {Pow}}_{ 2 }\left( \frac{ 1 }{ J_m } \sum _{ j = 1 }^{ J_m } \Phi ^{ m - 1 , j }_{ \mathbb {S}_m }( \Theta _{ m - 1 } ) \right) . \end{aligned} \end{aligned}$$
(5.6)
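In the more familiar notation of [25], \( \mathbb {X} \) and \( \mathbb {Y} \) play the roles of the Adam decay rates \( \beta _1 \) and \( \beta _2 \), \( J_m \) is the mini-batch size, and the divisions by \( 1 - \mathbb {X}^m \) and \( 1 - \mathbb {Y}^m \) are the usual bias corrections. The following NumPy sketch of one step of the recursion (5.6) is illustrative only; the name adam_step and the argument layout are ours, not the paper's.

```python
import numpy as np

def adam_step(theta, m1, m2, grad_samples, step, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the mini-batch Adam recursion (5.6).

    grad_samples: array of shape (J, rho), the J gradient samples of the batch.
    m1, m2: running first- and second-moment estimates (bold m and M in (5.6)).
    step: the current iteration index m >= 1, used for bias correction.
    """
    g = grad_samples.mean(axis=0)            # mini-batch average of the J samples
    m1 = beta1 * m1 + (1 - beta1) * g        # first-moment recursion for bold m
    m2 = beta2 * m2 + (1 - beta2) * g**2     # second-moment recursion (Pow_2) for M
    m1_hat = m1 / (1 - beta1**step)          # bias correction: division by (1 - X^m)
    m2_hat = m2 / (1 - beta2**step)          # bias correction: division by (1 - Y^m)
    theta = theta - lr * m1_hat / (eps + np.sqrt(m2_hat))
    return theta, m1, m2
```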

Euler–Maruyama Scheme

Example 5.3

Assume the setting in Subsect. 3.2, let \( \mu :[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \) and \( \sigma :[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{ d \times d } \) be functions, and assume for all \( s, t \in [0,T] \), \( x, w \in {\mathbb {R}}^d \) that

$$\begin{aligned} \Upsilon ( s, t, x, w ) = x + \mu ( s, x ) \, ( t - s ) + \sigma ( s, x ) \, w . \end{aligned}$$
(5.7)

Then it holds for all \( m, j \in {\mathbb {N}}_0 \), \( n \in \{ 0, 1, \dots , N - 1 \} \) that

$$\begin{aligned} \mathcal {X}^{ m, j }_{ n + 1 } = \mathcal {X}^{ m, j }_n + \mu \left( t_n, \mathcal {X}^{ m, j }_n \right) \left( t_{ n + 1 } - t_n \right) + \sigma \left( t_n, \mathcal {X}^{ m, j }_n \right) \left( W^{ m, j }_{ t_{ n + 1 } } - W^{ m, j }_{ t_n } \right) . \end{aligned}$$
(5.8)

In the setting of Example 5.3, we consider, under suitable further hypotheses and for every sufficiently large \( m \in {\mathbb {N}}_0 \), the random variable \( \mathcal {U}^{ \Theta _m } \) as an approximation of \( u(0,\xi ) \), where \( u :[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^k \) is a suitable solution of the PDE

$$\begin{aligned}&\frac{ \partial u}{ \partial t } ( t, x ) + \frac{ 1 }{ 2 } \sum \limits _{ j = 1 }^d \left( \frac{ \partial ^2 u}{ \partial x^2 } \right) ( t, x )\left[ \sigma ( t, x ) \, e^{ (d) }_j , \sigma ( t, x ) \, e^{ (d) }_j \right] + \left( \frac{ \partial u}{ \partial x } \right) ( t, x ) \, \mu ( t, x ) \nonumber \\&\quad + f\left( t, x, u(t,x), \left( \frac{ \partial u}{ \partial x } \right) ( t, x ) \, \sigma ( t, x ) \right) = 0 \end{aligned}$$
(5.9)

with \( u(T,x) = g(x) \), \( e^{ (d) }_1 = (1,0,\dots ,0) \), \( \dots \), \( e^{ (d) }_d = (0,\dots ,0,1) \in {\mathbb {R}}^d \) for \(t \in [0,T] \), \( x = ( x_1, \dots , x_d ) \in {\mathbb {R}}^d \) (cf. (PDE) in Sect. 2 above).
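As a concrete illustration of the scheme (5.8), the following NumPy sketch simulates a batch of Euler–Maruyama paths on a uniform time grid. It is a sketch under stated assumptions, not the paper's code: mu and sigma are hypothetical user-supplied callables returning arrays of shape (num_paths, d) and (num_paths, d, d), respectively.

```python
import numpy as np

def euler_maruyama(mu, sigma, xi, T, N, num_paths, rng=np.random.default_rng(0)):
    """Simulate X_{n+1} = X_n + mu(t_n, X_n) dt + sigma(t_n, X_n) dW_n, cf. (5.8)."""
    d = len(xi)
    dt = T / N
    X = np.tile(np.asarray(xi, dtype=float), (num_paths, 1))  # X_0 = xi on every path
    path = [X]
    for n in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=(num_paths, d))        # Brownian increments
        drift = mu(n * dt, X) * dt                                    # mu(t_n, X_n) (t_{n+1} - t_n)
        diffusion = np.einsum('pij,pj->pi', sigma(n * dt, X), dW)     # sigma(t_n, X_n) dW_n
        X = X + drift + diffusion
        path.append(X)
    return np.stack(path)  # shape (N + 1, num_paths, d)
```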

Geometric Brownian Motion

Example 5.4

Assume the setting in Subsect. 3.2, let \( \bar{\mu }, \bar{\sigma } \in {\mathbb {R}}\), and assume for all \( s, t \in [0,T] \), \( x = ( x_1, \dots , x_d ) \), \( w = ( w_1, \dots , w_d ) \in {\mathbb {R}}^d \) that

$$\begin{aligned} \Upsilon ( s, t, x, w ) = \exp \left( \left( \bar{\mu } - \frac{ \bar{\sigma }^2 }{ 2 } \right) ( t - s ) \right) \exp \left( \bar{\sigma } {\text {diag}}_{ {\mathbb {R}}^{ d \times d } }( w_1, \dots , w_d ) \right) x . \end{aligned}$$
(5.10)

Then it holds for all \( m, j \in {\mathbb {N}}_0 \), \( n \in \{ 0, 1, \dots , N \} \) that

$$\begin{aligned} \mathcal {X}^{ m, j }_n = \exp \left( \left( \bar{\mu } - \frac{ \bar{\sigma }^2 }{ 2 } \right) t_n {\text {Id}}_{ {\mathbb {R}}^d } + \bar{\sigma } {\text {diag}}_{ {\mathbb {R}}^{ d \times d } }\left( W_{ t_n }^{ m, j } \right) \right) \xi . \end{aligned}$$
(5.11)

In the setting of Example 5.4, we view, under suitable further hypotheses (cf. Subsect. 4.4 above) and for every sufficiently large \( m \in {\mathbb {N}}_0 \), the random variable \( \mathcal {U}^{ \Theta _m } \) as an approximation of \( u(0,\xi ) \), where \( u :[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^k \) is a suitable solution of the PDE

$$\begin{aligned}&\tfrac{ \partial u}{ \partial t } ( t, x ) + \tfrac{\bar{\sigma }^2}{ 2 } \textstyle \sum \limits _{i=1}^d | x_i |^2 \, \big ( \tfrac{ \partial ^2 u}{ \partial x^2_i } \big )(t,x) + \bar{\mu } \sum \limits _{i=1}^d x_i \, \big (\tfrac{\partial u}{\partial x_i}\big )(t,x) \nonumber \\&\quad + f\big ( t, x, u(t,x), \bar{\sigma } \, ( \tfrac{ \partial u}{ \partial x } )( t, x ) {\text {diag}}_{ {\mathbb {R}}^{ d \times d } }(x_1, \dots , x_d) \big ) = 0 \end{aligned}$$
(5.12)

with \( u(T,x) = g(x) \) for \( t \in [0,T] \), \( x = ( x_1, \dots , x_d ) \in {\mathbb {R}}^d \).
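Because the geometric Brownian motion in (5.11) can be sampled exactly, no time-discretization error enters the forward dynamics, and the exponential of the diagonal matrix acts componentwise. A minimal NumPy sketch with illustrative names, assuming a uniform time grid:

```python
import numpy as np

def gbm_exact(xi, mu_bar, sigma_bar, T, N, num_paths, rng=np.random.default_rng(0)):
    """Sample X_{t_n} = exp((mu - sigma^2/2) t_n Id + sigma diag(W_{t_n})) xi, cf. (5.11)."""
    d = len(xi)
    t = np.linspace(0.0, T, N + 1)                                # grid t_0, ..., t_N
    dW = rng.normal(0.0, np.sqrt(T / N), size=(num_paths, N, d))  # Brownian increments
    W = np.concatenate([np.zeros((num_paths, 1, d)), dW.cumsum(axis=1)], axis=1)
    drift = (mu_bar - 0.5 * sigma_bar**2) * t                     # scalar drift at each t_n
    # the matrix exponential of a diagonal matrix is the componentwise exponential,
    # so the whole formula reduces to elementwise operations
    return np.exp(drift[None, :, None] + sigma_bar * W) * np.asarray(xi)[None, None, :]
```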


About this article


Cite this article

E, W., Han, J. & Jentzen, A. Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations. Commun. Math. Stat. 5, 349–380 (2017). https://doi.org/10.1007/s40304-017-0117-6


Keywords

  • PDEs
  • High dimension
  • Backward stochastic differential equations
  • Deep learning
  • Control
  • Feynman–Kac

Mathematics Subject Classification

  • 65M75
  • 60H35
  • 65C30