Machine Learning for Semi Linear PDEs

Abstract

Recent machine learning algorithms dedicated to solving semi-linear PDEs are improved by using different neural network architectures and different parameterizations. These algorithms are compared with a new one that solves a fixed-point problem using deep learning techniques. The new algorithm appears to be competitive in terms of accuracy with the best existing algorithms.

References

  1. Pardoux, E., Peng, S.: Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 14(1), 55–61 (1990)

  2. Bouchard, B., Touzi, N.: Discrete-time approximation and Monte-Carlo simulation of backward stochastic differential equations. Stoch. Process. Appl. 111(2), 175–206 (2004)

  3. Gobet, E., Lemor, J.-P., Warin, X., et al.: A regression-based Monte Carlo method to solve backward stochastic differential equations. Ann. Appl. Prob. 15(3), 2172–2202 (2005)

  4. Lemor, J.-P., Gobet, E., Warin, X., et al.: Rate of convergence of an empirical regression method for solving generalized backward stochastic differential equations. Bernoulli 12(5), 889–916 (2006)

  5. Gobet, E., Turkedjiev, P.: Linear regression MDP scheme for discrete backward stochastic differential equations under general conditions. Math. Comput. 85(299), 1359–1391 (2016)

  6. Fahim, A., Touzi, N., Warin, X.: A probabilistic numerical method for fully nonlinear parabolic PDEs. Ann. Appl. Prob. 21, 1322–1364 (2011)

  7. Cheridito, P., et al.: Second-order backward stochastic differential equations and fully nonlinear parabolic PDEs. Commun. Pure Appl. Math. 60(7), 1081–1110 (2007)

  8. Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14(1), 113–147 (2001)

  9. Bouchard, B., Warin, X.: Monte-Carlo valuation of American options: facts and new algorithms to improve existing methods. In: Numerical Methods in Finance. Springer, Berlin, pp. 215–255 (2012)

  10. Henry-Labordere, P., et al.: Branching Diffusion Representation of Semilinear PDEs and Monte Carlo Approximation. arXiv preprint arXiv:1603.01727 (2016)

  11. Bouchard, B., et al.: Numerical approximation of BSDEs using local polynomial drivers and branching processes. Monte Carlo Methods Appl. 23(4), 241–263 (2017)

  12. Bouchard, B., Tan, X., Warin, X.: Numerical Approximation of General Lipschitz BSDEs with Branching Processes. arXiv preprint arXiv:1710.10933 (2017)

  13. Warin, X.: Variations on Branching Methods for Non-linear PDEs. arXiv preprint arXiv:1701.07660 (2017)

  14. Fournié, E., et al.: Applications of Malliavin calculus to Monte Carlo methods in finance. Finance Stoch. 3(4), 391–412 (1999)

  15. Warin, X.: Nesting Monte Carlo for High-Dimensional Non Linear PDEs. arXiv preprint arXiv:1804.08432 (2018)

  16. Warin, X.: Monte Carlo for High-Dimensional Degenerated Semi Linear and Full Non Linear PDEs. arXiv preprint arXiv:1805.05078 (2018)

  17. Weinan, E., et al.: Linear scaling algorithms for solving high-dimensional nonlinear parabolic differential equations. SAM Research Report (2017)

  18. Weinan, E., et al.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. arXiv preprint arXiv:1607.03295 (2016)

  19. Hutzenthaler, M., Kruse, T.: Multi-level Picard Approximations of High Dimensional Semilinear Parabolic Differential Equations with Gradient-Dependent Nonlinearities. arXiv preprint arXiv:1711.01080 (2017)

  20. Hutzenthaler, M., et al.: Overcoming the Curse of Dimensionality in the Numerical Approximation of Semilinear Parabolic Partial Differential Equations. arXiv preprint arXiv:1807.01212 (2018)

  21. Han, J., Jentzen, A., Weinan, E.: Overcoming the Curse of Dimensionality: Solving High-Dimensional Partial Differential Equations Using Deep Learning. arXiv:1707.02568 (2017)

  22. Weinan, E., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017)

  23. Beck, C., Weinan, E., Jentzen, A.: Machine Learning Approximation Algorithms for High-Dimensional Fully Nonlinear Partial Differential Equations and Second-Order Backward Stochastic Differential Equations. arXiv preprint arXiv:1709.05963 (2017)

  24. Raissi, M.: Forward–Backward Stochastic Neural Networks: Deep Learning of High Dimensional Partial Differential Equations. arXiv preprint arXiv:1804.07010 (2018)

  25. Fujii, M., Takahashi, A., Takahashi, M.: Asymptotic Expansion as Prior Knowledge in Deep Learning Method for High Dimensional BSDEs. arXiv preprint arXiv:1710.07030 (2017)

  26. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML'15), JMLR.org, Lille, pp. 448–456 (2015). http://dl.acm.org/citation.cfm?id=3045118.3045167

  27. Cooijmans, T., et al.: Recurrent Batch Normalization. arXiv preprint arXiv:1603.09025 (2016)

  28. Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and Accurate Deep Network Learning by Exponential Linear Units (elus). arXiv preprint arXiv:1511.07289 (2015)

  29. He, K., et al.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  30. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)

  31. Olah, C.: Understanding LSTM Networks, Blog, http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (2015)

  32. Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. Blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (2015)

  33. Han, J., Jentzen, A., et al.: Solving High-Dimensional Partial Differential Equations Using Deep Learning. arXiv preprint arXiv:1707.02568 (2017)

  34. Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)

  35. Richou, A.: Étude théorique et numérique des équations différentielles stochastiques rétrogrades [Theoretical and numerical study of backward stochastic differential equations]. Ph.D. thesis, Université Rennes 1, supervised by Y. Hu and P. Briand (2010)

  36. Ruder, S.: An Overview of Gradient Descent Optimization Algorithms. arXiv preprint arXiv:1609.04747 (2016)

  37. Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade, 2nd edn. Springer, Berlin, pp. 437–478 (2012). ISBN: 978-3-642-35289-8. https://doi.org/10.1007/978-3-642-35289-8_26

  38. Han, J., Weinan, E.: Deep Learning Approximation for Stochastic Control Problems. arXiv preprint arXiv:1611.07422 (2016)

  39. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014)

  40. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)

  41. Braun, S.: LSTM Benchmarks for Deep Learning Frameworks. arXiv preprint arXiv:1806.01818 (2018)

  42. Touzi, N.: Optimal Stochastic Control, Stochastic Target Problems, and Backward SDE, vol. 29. Springer, Berlin (2012)

  43. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991)

  44. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)

Acknowledgements

The authors would like to thank Simon Fécamp for useful discussions and technical advice.

Author information

Corresponding author

Correspondence to Xavier Warin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Some Test PDEs

We recall our notation for the semi-linear PDEs:

$$\begin{aligned} -\partial _t u(t, x) - \mathcal {L} u(t, x)&= f\left( t, x, u(t, x), \sigma ^\top (t, x) \nabla u(t, x)\right) \\ u(T, x)&= g(x) \end{aligned}$$

where \(\mathcal {L}u (t, x) := \frac{1}{2} {{\,\mathrm{{\text {Tr}}}\,}}\left( \sigma ^\top \sigma (t, x) \nabla ^2 u(t, x)\right) + \mu (t, x)^\top \nabla u(t, x)\). For each example, we thus give the corresponding \(\mu \), \(\sigma \), f and g. In the implementation, we instead use the function

$$\begin{aligned} \tilde{f}(t, x, u(t, x), Du(t, x)) := f(t, x, u(t, x), \sigma ^\top (t, x) Du(t, x)) \end{aligned}$$

for convenience, as this does not influence the results and allows for a more direct formulation for some PDEs.
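
To make the reparameterization concrete, here is a minimal NumPy sketch of how a driver \(f\) is wrapped into \(\tilde{f}\) (the helper name `make_f_tilde` and the sample data are ours, not the authors' code):

```python
import numpy as np

def make_f_tilde(f, sigma):
    """Wrap f(t, x, y, z) into f_tilde(t, x, y, Du) = f(t, x, y, sigma(t, x)^T Du)."""
    def f_tilde(t, x, y, Du):
        z = sigma(t, x).T @ Du   # z = sigma^T(t, x) Du(t, x)
        return f(t, x, y, z)
    return f_tilde

# Example with the Hamilton-Jacobi-Bellman data of Sect. 1.1.3 below:
d, lam = 100, 1.0
sigma = lambda t, x: np.sqrt(2.0) * np.eye(d)
f = lambda t, x, y, z: -0.5 * lam * np.sum(z**2)
f_tilde = make_f_tilde(f, sigma)
```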

1.1.1 A Black–Scholes Equation with Default Risk

From [21, 22]. If not otherwise stated, the parameters take values: \(\overline{\mu }=0.02\), \(\overline{\sigma }=0.2\), \(\delta =2/3\), \(R=0.02\), \(\gamma _h=0.2\), \(\gamma _l=0.02\), \(v_h=50\), \(v_l=70\). We use the initial condition \(X_0 = (100, \ldots , 100)\).

$$\begin{aligned} \mu : (t, x)&\mapsto \overline{\mu } x \\ \sigma : (t, x)&\mapsto \overline{\sigma } \; {\mathrm {diag}}(\left\{ x_i\right\} _{i=1..d}) \\ f : (t, x, y, z)&\mapsto -(1-\delta ) \times {\mathrm {min}} \left\{ \gamma _h, \; {\mathrm {max}}\left\{ \gamma _l, \; \frac{\gamma _h - \gamma _l}{v_h - v_l} \left( y - v_h \right) + \gamma _h \right\} \right\} y - R y\\ g : x&\mapsto \min _{i=1..d} x_i \end{aligned}$$

We use the closed-form expression for the SDE dynamics (componentwise):

$$\begin{aligned} X_t&= X_s \exp \left[ \left( \bar{\mu }-\frac{\bar{\sigma }^2}{2}\right) (t-s) + \bar{\sigma } (W_t - W_s)\right]&\forall t > s \end{aligned}$$

Baseline values are those reported in [21] (figure omitted).
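
As an illustration, a minimal NumPy sketch (our function names, not the authors' code) of the exact log-normal step above and of the default-risk driver:

```python
import numpy as np

mu_bar, sigma_bar, delta, R = 0.02, 0.2, 2.0 / 3.0, 0.02
gamma_h, gamma_l, v_h, v_l = 0.2, 0.02, 50.0, 70.0

def gbm_step(x_s, dt, rng):
    """Exact componentwise step X_t = X_s exp((mu - sigma^2/2) dt + sigma dW)."""
    dw = rng.normal(scale=np.sqrt(dt), size=x_s.shape)
    return x_s * np.exp((mu_bar - 0.5 * sigma_bar**2) * dt + sigma_bar * dw)

def f(t, x, y, z):
    """Driver -(1 - delta) Q(y) y - R y with the clamped default intensity Q."""
    q = np.minimum(gamma_h, np.maximum(
        gamma_l, (gamma_h - gamma_l) / (v_h - v_l) * (y - v_h) + gamma_h))
    return -(1.0 - delta) * q * y - R * y

rng = np.random.default_rng(0)
x = gbm_step(np.full(100, 100.0), dt=0.1, rng=rng)  # d = 100, X_0 = (100, ..., 100)
```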

1.1.2 A Black–Scholes–Barenblatt Equation

From [24]. If not otherwise stated, the parameters take values: \(\overline{\sigma }=0.4\), \(r=0.05\). We use the initial condition \(X_0 = (1.0, 0.5, 1.0, \ldots )\).

$$\begin{aligned} \mu : (t, x)&\mapsto 0 \\ \sigma : (t, x)&\mapsto \overline{\sigma } \; {\mathrm {diag}}(\left\{ x_i\right\} _{i=1..d}) \\ f : (t, x, y, z)&\mapsto -r \left( y - \frac{1}{\overline{\sigma }}\sum _{i=1}^{d} z_i\right) \\ g : x&\mapsto \left\| x\right\| ^2 \end{aligned}$$

We use the closed-form expression for the SDE dynamics (componentwise):

$$\begin{aligned} X_t&= X_s \exp \left[ -\frac{\bar{\sigma }^2}{2} (t-s) + \bar{\sigma } (W_t - W_s)\right]&\forall t > s \end{aligned}$$

Exact solution:

$$\begin{aligned} u(t, x)&= \exp ((r+\overline{\sigma }^2)(T-t))g(x) \end{aligned}$$
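
A short NumPy transcription of this closed-form solution, convenient for measuring the error of a learned approximation (helper names ours; the maturity T = 1.0 and d = 100 are illustrative assumptions):

```python
import numpy as np

sigma_bar, r, T, d = 0.4, 0.05, 1.0, 100   # T and d assumed for illustration

def g(x):
    return np.sum(x**2, axis=-1)           # terminal condition ||x||^2

def u_exact(t, x):
    return np.exp((r + sigma_bar**2) * (T - t)) * g(x)

x0 = np.tile([1.0, 0.5], d // 2)           # X_0 = (1.0, 0.5, 1.0, ...)
print(u_exact(0.0, x0))
```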

1.1.3 A Hamilton–Jacobi–Bellman Equation

From [21, 22]. If not otherwise stated: \(\lambda =1.0\) and \(X_0 = (0, \ldots , 0)\).

$$\begin{aligned} \mu : (t, x)&\mapsto 0 \\ \sigma : (t, x)&\mapsto \sqrt{2} \; \mathbf {I}_d \\ f : (t, x, y, z)&\mapsto - 0.5 \; \lambda \left\| z\right\| ^2 \\ g : x&\mapsto \log \left( 0.5 \left[ 1 + \left\| x\right\| ^2 \right] \right) \end{aligned}$$

Monte-Carlo solution:

$$\begin{aligned} u(t, X_t) =&-\frac{1}{\lambda } \log \left( \mathbb {E}\left[ \; \exp \left( - \lambda g\left( X_t + \sqrt{2} B_{T-t}\right) \right) \right] \right) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x) =&\left( \mathbb {E}\left[ \exp \left\{ -\lambda g(X_t + \sqrt{2} B_{T-t})\right\} \right] \right) ^{-1} \\&\times \mathbb {E} \left[ \frac{\partial \, g}{\partial \, x_j}(X_t + \sqrt{2} B_{T-t}) \exp \left\{ -\lambda g(X_t + \sqrt{2} B_{T-t})\right\} \right] \end{aligned}$$

where

$$\begin{aligned} \frac{\partial \, g}{\partial \, x_j}(x)&= \frac{2x_j}{1+\left\| x\right\| ^2} \end{aligned}$$

Baseline computed using 10 million Monte Carlo realizations with \(d=100\) (figure omitted).
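
A minimal Monte Carlo sketch of this baseline computation for \(u(0, X_0)\) (NumPy; the sample size and the maturity T = 1.0 are illustrative assumptions):

```python
import numpy as np

lam, d, T = 1.0, 100, 1.0                  # T assumed for illustration

def g(x):
    return np.log(0.5 * (1.0 + np.sum(x**2, axis=-1)))

def u_mc(t, x, n_samples, rng):
    # Brownian increment B_{T-t}, one row per Monte Carlo sample
    b = rng.normal(scale=np.sqrt(T - t), size=(n_samples, d))
    return -np.log(np.mean(np.exp(-lam * g(x + np.sqrt(2.0) * b)))) / lam

rng = np.random.default_rng(0)
print(u_mc(0.0, np.zeros(d), n_samples=100_000, rng=rng))
```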

1.1.4 An Oscillating Example with a Square Non-linearity

From [15]. If not otherwise stated, the parameters take values: \(\mu _0=0.2\), \(\sigma _0=1.0\), \(a=0.5\), \(r=0.1\). The \(\min \) and \(\max \) in f are there to make f Lipschitz. We use the initial condition \(X_0=(1.0, 0.5, 1.0, \ldots )\).

$$\begin{aligned} \mu : (t, x)&\mapsto \mu _0 / d \\ \sigma : (t, x)&\mapsto \sigma _0 / \sqrt{d} \; \mathbf {I}_d \\ f : (t, x, y, z)&\mapsto \phi (t, x) \\&\quad + r \left( \max \left[ -\exp (2a(T-t)), \min \left\{ \frac{1}{\sigma _0 \sqrt{d}} y \sum _{i=1}^{d} z_i , \exp (2a(T-t)) \right\} \right] \right) ^2 \end{aligned}$$

where

$$\begin{aligned} \phi : (t, x)&\mapsto \cos \left( \sum _{i=1}^{d} x_i \right) \left( a + \frac{\sigma _0^2}{2} \right) \exp (a(T-t)) + \sin \left( \sum _{i=1}^{d} x_i \right) \mu _0 \exp (a(T-t)) \\&\qquad - r \left( \cos \left( \sum _{i=1}^{d} x_i \right) \sin \left( \sum _{i=1}^{d} x_i \right) \exp (2a(T-t))\right) ^2 \\ g : x&\mapsto \cos \left( \sum _{i=1}^d x_i \right) \end{aligned}$$

Exact solution:

$$\begin{aligned} u(t, x)&= \cos \left( \sum _{i=1}^d x_i \right) \exp (a(T-t)) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x)&= -\sin \left( \sum _{i=1}^d x_i \right) \exp (a(T-t)) \end{aligned}$$
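
For reference, a NumPy transcription of this exact solution (helper names ours; T = 1.0 is an illustrative assumption):

```python
import numpy as np

a, T = 0.5, 1.0                            # T assumed for illustration

def u_exact(t, x):
    return np.cos(np.sum(x, axis=-1)) * np.exp(a * (T - t))

def grad_u_exact(t, x):
    # Every component of the gradient equals -sin(sum_i x_i) exp(a (T - t)).
    s = np.sum(x, axis=-1, keepdims=True)
    return -np.sin(s) * np.exp(a * (T - t)) * np.ones_like(x)
```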

1.1.5 A Non-Lipschitz Terminal Condition

From [35]. If not otherwise stated: \(\alpha =0.5\), \(X_0 = (0, \ldots , 0)\).

$$\begin{aligned} \mu : (t, x)&\mapsto 0 \\ \sigma : (t, x)&\mapsto \mathbf {I}_d \\ f : (t, x, y, z)&\mapsto - 0.5 \left\| z\right\| ^2 \\ g : x&\mapsto \sum _{i=1}^d \left( \max \left\{ 0, \min \left[ 1, x_i\right] \right\} \right) ^\alpha \end{aligned}$$

Monte-Carlo solution:

$$\begin{aligned} u(t, X_t) =&\log \left( \mathbb {E}\left[ \; \exp \left( g(X_t + B_{T-t})\right) \right] \right) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x) =&\left( \mathbb {E}\left[ \exp \left\{ g(X_t + B_{T-t})\right\} \right] \right) ^{-1} \\&\times \mathbb {E} \left[ \frac{\partial \, g}{\partial \, x_j}(X_t + B_{T-t}) \exp \left\{ g(X_t + B_{T-t})\right\} \right] \end{aligned}$$

where, componentwise, \(\frac{\partial g}{\partial x_j}(x) = g'(x_j)\) with

$$\begin{aligned} g'(x)&= \left\{ \begin{array}{ll} 0 &{}\quad \text {if}~ x\le 0~ \text {or}~ x\ge 1 \\ \alpha x^{\alpha -1} &{}\quad \text {otherwise}\end{array} \right. \end{aligned}$$

Baseline computed using 10 million Monte Carlo realizations with \(d=10\) (figure omitted).
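
A NumPy transcription of g and of its componentwise derivative above; note that \(\alpha = 0.5\) makes g non-Lipschitz at 0 (helper names ours):

```python
import numpy as np

alpha = 0.5

def g(x):
    return np.sum(np.clip(x, 0.0, 1.0)**alpha, axis=-1)

def grad_g(x):
    # dg/dx_j = alpha * x_j^(alpha - 1) for 0 < x_j < 1, and 0 elsewhere;
    # the clip only avoids evaluating the power at x <= 0.
    inside = (x > 0.0) & (x < 1.0)
    return np.where(inside, alpha * np.clip(x, 1e-12, None)**(alpha - 1.0), 0.0)
```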

1.1.6 An Oscillating Example with Cox–Ingersoll–Ross Propagation

From [16]. If not otherwise stated: \(a=0.1\), \(\alpha =0.2\), \(T=1.0\), \(\hat{k}=0.1\), \(\hat{m}=0.3\), \(\hat{\sigma }=0.2\). We use the initial condition \(X_0 = (0.3, \ldots , 0.3)\). Note that we have \(2 \hat{k} \hat{m} > \hat{\sigma }^2\), so that X remains positive.

$$\begin{aligned} \mu : (t, x)&\mapsto \hat{k}\left( \hat{m} - x \right) \\ \sigma : (t, x)&\mapsto \hat{\sigma } {\mathrm {diag}} \left\{ \sqrt{x} \right\} \\ f : (t, x, y, z)&\mapsto \phi (t, x) + a y \left( \sum _{i=1}^d z_i \right) \end{aligned}$$

where

$$\begin{aligned} \phi : (t, x)&\mapsto \cos \left( \sum _{i=1}^{d} x_i \right) \left( -\alpha + \frac{{\hat{\sigma }}^2}{2} \right) \exp (-\alpha (T-t)) \\&\qquad + \sin \left( \sum _{i=1}^{d} x_i \right) \exp (-\alpha (T-t)) \sum _{i=1}^{d} \hat{k}\left( \hat{m} - x_i\right) \\&\qquad + a \cos \left( \sum _{i=1}^{d} x_i \right) \sin \left( \sum _{i=1}^{d} x_i\right) \exp (-2\alpha (T-t)) \sum _{i=1}^d \hat{\sigma } \sqrt{x_i}\\ g : x&\mapsto \cos \left( \sum _{i=1}^d x_i \right) \end{aligned}$$

Exact solution:

$$\begin{aligned} u(t, x)&= \cos \left( \sum _{i=1}^d x_i \right) \exp (-\alpha (T-t)) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x)&= -\sin \left( \sum _{i=1}^d x_i \right) \exp (-\alpha (T-t)) \end{aligned}$$
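
No closed-form simulation is given for these CIR dynamics; a standard choice (our assumption here, not necessarily the scheme used in [16]) is a full-truncation Euler step, which keeps the square root well defined:

```python
import numpy as np

k_hat, m_hat, sigma_hat = 0.1, 0.3, 0.2    # satisfies 2 k m > sigma^2

def cir_step(x_s, dt, rng):
    """One full-truncation Euler step of dX = k (m - X) dt + sigma sqrt(X) dW."""
    xp = np.maximum(x_s, 0.0)              # truncate before the square root
    dw = rng.normal(scale=np.sqrt(dt), size=x_s.shape)
    return x_s + k_hat * (m_hat - xp) * dt + sigma_hat * np.sqrt(xp) * dw

rng = np.random.default_rng(0)
x = cir_step(np.full(10, 0.3), dt=0.01, rng=rng)   # X_0 = (0.3, ..., 0.3)
```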

1.1.7 An Oscillating Example with Inverse Non-linearity

If not otherwise stated, the parameters take values: \(\mu _0 = 0.2\), \(\sigma _0 = 1.0\), \(a=0.5\), \(r=0.1\). We use the initial condition \(X_0=(1.0, 0.5, 1.0, \ldots )\).

$$\begin{aligned} \mu : (t, x)&\mapsto \mu _0 /d \mathbb {1}_d\\ \sigma : (t, x)&\mapsto \sigma _0 \; \mathbf {I}_d \\ f : (t, x, y, z)&\mapsto \phi (t, x) + r \frac{d y}{\sum _{i=1}^d z_i} \\ g : x&\mapsto 2\sum _{i=1}^d x_i + \cos \left( \sum _{i=1}^d x_i\right) \end{aligned}$$

where

$$\begin{aligned} \phi : (t, x)&\mapsto 2a \sum _{i=1}^d x_i \exp (a(T-t)) + \cos \left( \sum _{i=1}^{d} x_i \right) \left( a + \frac{d \sigma _0^2}{2} \right) \exp (a(T-t)) \\&\quad - \mu _0 \left[ 2- \sin \left( \sum _{i=1}^{d} x_i \right) \right] \exp (a(T-t)) - r \frac{2 \sum _{i=1}^d x_i + \cos \left( \sum _{i=1}^{d} x_i \right) }{\sigma _0 \left[ 2 - \sin \left( \sum _{i=1}^{d} x_i \right) \right] } \end{aligned}$$

Exact solution:

$$\begin{aligned} u(t, x)&= \left[ 2\sum _{i=1}^d x_i + \cos \left( \sum _{i=1}^d x_i \right) \right] \exp (a(T-t)) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x)&= \left[ 2-\sin \left( \sum _{i=1}^d x_i \right) \right] \exp (a(T-t)) \end{aligned}$$
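
A NumPy sketch of the driver with the inverse non-linearity, including the compensating term \(\phi \) above (helper names ours; T = 1.0 is an illustrative assumption):

```python
import numpy as np

mu0, sigma0, a, r, T = 0.2, 1.0, 0.5, 0.1, 1.0   # T assumed for illustration

def f(t, x, y, z):
    d = x.shape[-1]
    s = np.sum(x, axis=-1)
    phi = ((2.0 * a * s + np.cos(s) * (a + d * sigma0**2 / 2.0)
            - mu0 * (2.0 - np.sin(s))) * np.exp(a * (T - t))
           - r * (2.0 * s + np.cos(s)) / (sigma0 * (2.0 - np.sin(s))))
    return phi + r * d * y / np.sum(z, axis=-1)
```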

1.2 Proof of Proposition 4.6

In the sequel, C denotes a constant depending only on u, f and g, which may vary from one line to another.

First, note that due to the Lipschitz property in Assumption 4.3, the boundedness of f and g, and the regularity of u from Assumption 4.5, the solution u of (1) and its gradient Du satisfy the following Feynman–Kac relations (see an adaptation of Proposition 1.7 in [42]):

$$\begin{aligned} u\left( t,x\right)&= \frac{1}{2}\mathbb {E}_{t,x} \left[ \phi \left( t, t+\tau ,X_{t+\tau }^{t,x}, u\left( t+\tau ,X_{t+\tau }^{t,x}\right) , Du\left( t+\tau ,X_{t+\tau }^{t,x}\right) \right) \right. \nonumber \\&\quad \left. +\, \phi \left( t, t+\tau ,\widehat{X}_{t+\tau }^{t,x}, u\left( t+\tau ,\widehat{X}_{t+\tau }^{t,x}\right) , Du\left( t+\tau ,\widehat{X}_{t+\tau }^{t,x}\right) \right) \right] \nonumber \\ Du\left( t,x\right)&= \mathbb {E}_{t,x} \left[ \sigma ^{-\top } \frac{W_{\left( t + \left( \tau \wedge \Delta t \right) \right) \wedge T}-W_{t}}{\tau \wedge \left( T-t\right) \wedge \Delta t} \right. \nonumber \\&\quad \times \frac{1}{2} \left( \phi \left( t,t+\tau , X_{t+\tau }^{t,x}, u\left( t+\tau , X_{t+\tau }^{t,x}\right) , Du\left( t+\tau , X_{t+\tau }^{t,x}\right) \right) \right. \nonumber \\&\quad \left. \left. -\, \phi \left( t,t+\tau , \widehat{X}_{t+\tau }^{t,x}, u\left( t+\tau , \widehat{X}_{t+\tau }^{t,x}\right) , Du\left( t+\tau , \widehat{X}_{t+\tau }^{t,x}\right) \right) \right) \right] . \end{aligned}$$
(39)

Picking \((t,y) \in [0,T] \times \Omega _\epsilon ^{0,x}\) and using Eq. (39), we have:

$$\begin{aligned} B(\theta ,t, y) = \,&| {\bar{u}}^{\epsilon }(\theta , t, y) - u(\theta ,t,y)|^2 \\ \le \,&2 | u(t,y)-u(\theta ,t,y)|^2 + 2 \left( \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } \right. \right. \\&\times \frac{1}{2}\left( \frac{f(t,X^{t,y}_{t+\tau },u(\theta , t+\tau , X^{t,y}_{t+\tau }), v(\theta ,t+\tau , X^{t,y}_{t+\tau }))}{ \rho (\tau )} \right. \\&- \frac{f(t,X^{t,y}_{t+\tau },u( t+\tau , X^{t,y}_{t+\tau }), Du(t+\tau , X^{t,y}_{t+\tau }))}{ \rho (\tau )} \\&+ \frac{f(t,{\hat{X}}^{t,y}_{t+\tau },u(\theta , t+\tau , {\hat{X}}^{t,y}_{t+\tau }), v(\theta ,t+\tau , {\hat{X}}^{t,y}_{t+\tau }))}{ \rho (\tau )} \\&\left. - \left. \frac{f(t,{\hat{X}}^{t,y}_{t+\tau },u( t+\tau , {\hat{X}} ^{t,y}_{t+\tau }), Du(t+\tau , {\hat{X}}^{t,y}_{t+\tau }))}{ \rho (\tau )} \right) \right] \\&+ \frac{1}{2}\mathbb {E}_{t,y} \big [ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_\epsilon } \big (\phi \big (t, t+\tau ,X_{t+\tau }^{t,y}, u(t+\tau ,X_{t+\tau }^{t,y}), Du(t+\tau ,X_{t+\tau }^{t,y}) \big ) \\&+ \left. \phi \big (t, t+\tau ,\widehat{X}_{t+\tau }^{t,y}, u(t+\tau ,\widehat{X}_{t+\tau }^{t,y}), Du(t+\tau ,\widehat{X}_{t+\tau }^{t,y}) \big ) \big )\big ] \right) ^2 \end{aligned}$$

Using Jensen's inequality, the boundedness of f and g, the fact that \(\rho \) is bounded from below, and Assumption 4.3, we get:

$$\begin{aligned} B(\theta ,t,y) \le \,&2 | u(t,y)-u(\theta ,t,y)|^2 \\&+ C \left( K^2 \mathbb {E}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } \left( (u(t+\tau ,X^{t,y}_{t+\tau })-u(\theta ,t+\tau ,X^{t,y}_{t+\tau }))^2 \right. \right. \right. \\&+ \left. \left. \left. || Du(t+\tau ,X^{t,y}_{t+\tau }) -v(\theta ,t+\tau ,X^{t,y}_{t+\tau })||^2_2 \right) \right] + \mathbb {E}[ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_{\epsilon }}]\right) \end{aligned}$$

Then, by the universal approximation results of [43, 44], there exists \( (u(\theta ^{*},t,x), v(\theta ^*,t,x )) \in \kappa \) such that:

$$\begin{aligned} \sup _{({\hat{t}}, y) \in [t,T]\times \Omega _\epsilon ^{0,x}} ( | u({\hat{t}}, y) - u(\theta ^*, {\hat{t}}, y)| + || Du({\hat{t}}, y)- v(\theta ^{*}, {\hat{t}},y)||_2) \le \epsilon \end{aligned}$$
(40)

Then we have that

$$\begin{aligned} B(\theta ^{*},t,y) \le&C \left( \epsilon + \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_{\epsilon }}\right] \right) \nonumber \\ \le&C \left( \epsilon + \mathbb {E}_{t,y}\left[ 1_{\exists s \in [t,T] / X^{t,y}_{s} \notin \Omega ^{0,x}_{\epsilon }}\right] \right) \end{aligned}$$
(41)

Similarly introducing \({\hat{W}}^{t}_\tau = \sigma ^{-\top } \frac{W_{(t+ (\tau \wedge \Delta t )) \wedge T}-W_{t}}{\tau \wedge (T-t) \wedge \Delta t}\),

$$\begin{aligned} C(\theta ^{*},t,y)&= || {\bar{v}}^\epsilon ( \theta ^{*} ,t, y) - v(\theta ^{*} ,t,y)||^2_2 \nonumber \\&\le 2 || Du(t,y)-v(\theta ^{*} ,t,y)||^2_2 + 2 \left( \frac{1}{2} \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } {\hat{W}}^{t}_\tau \right. \right. \nonumber \\&\qquad \times \left( \frac{f(t,X^{t,y}_{t+\tau },u(\theta ^{*} , t+\tau , X^{t,y}_{t+\tau }), v(\theta ^{*},t+\tau , X^{t,y}_{t+\tau }))}{ \rho (\tau )} \right. \nonumber \\&\qquad - \frac{f(t,X^{t,y}_{t+\tau },u( t+\tau , X^{t,y}_{t+\tau }), Du(t+\tau , X^{t,y}_{t+\tau }))}{ \rho (\tau )} \nonumber \\&\qquad - \frac{f(t,{\hat{X}}^{t,y}_{t+\tau },u(\theta ^{*} , t+\tau , {\hat{X}}^{t,y}_{t+\tau }), v(\theta ^{*},t+\tau , {\hat{X}}^{t,y}_{t+\tau }))}{ \rho (\tau )} \nonumber \\&\qquad \left. \left. + \frac{f(t,{\hat{X}}^{t,y}_{t+\tau },u( t+\tau , {\hat{X}} ^{t,y}_{t+\tau }), Du(t+\tau , {\hat{X}}^{t,y}_{t+\tau }))}{ \rho (\tau )} \right) \right] \nonumber \\&\qquad + \frac{1}{2}\mathbb {E}_{t,y} \big [ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_\epsilon } {\hat{W}}^{t}_\tau \big (\phi \big (t, t+\tau ,X_{t+\tau }^{t,y}, u(t+\tau ,X_{t+\tau }^{t,y}), Du(t+\tau ,X_{t+\tau }^{t,y}) \big ) \nonumber \\&\qquad - \left. \phi \big (t, t+\tau ,\widehat{X}_{t+\tau }^{t,y}, u(t+\tau ,\widehat{X}_{t+\tau }^{t,y}), Du(t+\tau ,\widehat{X}_{t+\tau }^{t,y}) \big ) \big )\Bigg ] \right) ^2 \nonumber \\&\le 2 || Du(t,y)-v(\theta ^{*} ,t,y)||^2 \nonumber \\&\qquad + \,C \mathbb {E}_{t,y}\left[ ({\hat{W}}^{t}_\tau )^\top {\hat{W}}^{t}_\tau \left( \frac{||f||_\infty ^2 }{\rho (\tau )^2} + \frac{|g(X^{t,y}_{t+\tau }) - g( {\hat{X}}^{t,y}_{t+\tau })|^2}{{\bar{F}}(T)^2}\right) \right] \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_{\epsilon }} \right] \nonumber \\&\qquad + C K \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } \frac{\left( {\hat{W}}^{t}_\tau \right) ^\top {\hat{W}}^{t}_\tau }{\rho (\tau )^2} \left( u(t+\tau ,X^{t,y}_{t+\tau }) - u\left( \theta ^{*},t+\tau ,X^{t,y}_{t+\tau }\right) \right) ^2 \right] \nonumber \\&\qquad + \,C K \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } \frac{({\hat{W}}^{t}_\tau )^\top {\hat{W}}^{t}_\tau }{\rho (\tau )^2} || Du\left( t+\tau ,X^{t,y}_{t+\tau }\right) \right. \nonumber \\&\qquad \left. - \,v(\theta ^{*},t+\tau ,X^{t,y}_{t+\tau }) ||^2_2 \right] , \end{aligned}$$
(42)

where we have used Jensen's and Cauchy–Schwarz's inequalities.

Introducing \(G\in {{\,\mathrm{\mathbb {R}}\,}}^d\), a vector of independent centered unit Gaussian random variables, we have

$$\begin{aligned} \mathbb {E}_{t,x}\left[ \frac{({\hat{W}}^{t}_\tau )^\top {\hat{W}}^{t}_\tau }{\rho (\tau )^2}\right] = \mathbb {E}\left[ \frac{1}{\tau \rho (\tau )^2}\right] \mathbb {E}[G^\top \sigma ^{-1} \sigma ^{-\top } G] < \infty \end{aligned}$$

when \(u<1\) in Eq. (11).

Similarly using the fact that g is Lipschitz,

$$\begin{aligned} \mathbb {E}_{t,y}\left[ ({\hat{W}}^{t}_\tau )^\top {\hat{W}}^{t}_\tau \left( \frac{||f||_\infty ^2 }{\rho (\tau )^2} + \frac{|g(X^{t,y}_{t+\tau }) - g( {\hat{X}}^{t,y}_{t+\tau })|^2}{{\bar{F}}(T)^2}\right) \right]< C < \infty \end{aligned}$$

Then using Eq. (40) in Eq. (42), we get

$$\begin{aligned} C(\theta ^{*}, t,y) \le C ( \epsilon + \mathbb {E}_{t,y}[ 1_{\exists s \in [t,T] / X^{t,y}_{s} \notin \Omega ^{0,x}_{\epsilon }}]) \end{aligned}$$
(43)

Injecting (41) and (43) into Eq. (31) and using the definition of \(\Omega ^{0,x}_{\epsilon }\) completes the proof.

1.3 Proof of Proposition 4.7

Let \(\bar{K}\) be a compact set. It is always possible to find \(n_0\) such that \( \bar{K} \subset \Omega ^{0,x}_{\epsilon _n}\) for all \(n> n_0\).

Let u be a function from \({{\,\mathrm{\mathbb {R}}\,}}\times {{\,\mathrm{\mathbb {R}}\,}}^d\) to \({{\,\mathrm{\mathbb {R}}\,}}\) and v a function from \({{\,\mathrm{\mathbb {R}}\,}}\times {{\,\mathrm{\mathbb {R}}\,}}^d\) to \({{\,\mathrm{\mathbb {R}}\,}}^d\), and define

$$\begin{aligned} \psi (t,t+\tau ,x,u,v) =&1_{X^{t,x}_{t+\tau } \in \Omega ^{0,x}_{\epsilon _n}} \frac{1}{2} \left[ \phi \left( t, t+\tau ,X_{t+\tau }^{t,x}, u\left( t+\tau ,X_{t+\tau }^{t,x}\right) , v\left( t+\tau ,X_{t+\tau }^{t,x}\right) \right) \right. \\&\left. +\, \phi \left( t, t+\tau ,\widehat{X}_{t+\tau }^{t,x}, u\left( t+\tau ,\widehat{X}_{t+\tau }^{t,x}\right) , v\left( t+\tau ,\widehat{X}_{t+\tau }^{t,x}\right) \right) \right] . \end{aligned}$$

We note that, for \(\zeta \) an independent uniform random variable on [0, T],

$$\begin{aligned} D_n&:= \mathbb {E} \left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \Bigg ( \mathbb {E}_{t,X^{0,x}_t} \left[ \psi (t,t+\tau ,X^{0,x}_t,u(\theta _n, \cdot ,\cdot ),v(\theta _n, \cdot ,\cdot )) \right] - u(\theta _n,t ,X^{0,x}_{t}) \Bigg )^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&= T \mathbb {E} \left[ 1_{X^{0,x}_{\zeta } \in \Omega ^{0,x}_{\epsilon _n}} \Bigg ( \mathbb {E}_{\zeta ,X^{0,x}_\zeta } \left[ \psi (\zeta ,\zeta +\tau ,X^{0,x}_\zeta ,u(\theta _n, \cdot ,\cdot ),v(\theta _n, \cdot ,\cdot )) \right] - u(\theta _n, \zeta ,X^{0,x}_{\zeta }) \Bigg )^2 \right] \\&= T \ell (\theta _n) \end{aligned}$$

Using Eq. (39), the expression of \(D_n\), the Lipschitz property of f, Jensen's inequality, the boundedness of f and g, and the fact that \(\rho \) is bounded from below by \(\hat{\rho }(T)\), we get:

$$\begin{aligned} F_n&= \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in {\bar{K}}} (u( t,X^{0,x}_t) - u( \theta _n, t,X^{0,x}_t)) ^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\le \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} (u( t,X^{0,x}_t) - u( \theta _n, t,X^{0,x}_t)) ^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\le 3 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \left( \mathbb {E}_{t,X^{0,x}_t} (\psi (t,t+\tau ,X^{0,x}_{t},u, Du) \right. \right. \\&\quad \left. \left. - \psi (t,t+\tau ,X^{0,x}_{t}, u(\theta _n,\cdot ,\cdot ), v(\theta _n,\cdot ,\cdot ))) \right) ^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\quad +\, 3 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \left( \mathbb {E}_{t,X^{0,x}_t} ( \psi (t,t+\tau ,X^{0,x}_{t}, u(\theta _n,\cdot ,\cdot ), v(\theta _n,\cdot ,\cdot ))) - u( \theta _n, t,X^{0,x}_t) \right) ^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\quad +\, 3 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \mathbb {E}_{t,X^{0,x}_t} \left( 1_{X^{0,x}_{t+ \tau } \notin \Omega ^{0,x}_{\epsilon _n}} \frac{1}{4} \big [ \phi \big (t, t+\tau ,X_{t+\tau }^{t,x}, u(t+\tau ,X_{t+\tau }^{t,x}), Du(t+\tau ,X_{t+\tau }^{t,x}) \big ) \right. \right. \\&\quad + \left. \left. \phi \big (t, t+\tau ,\widehat{X}_{t+\tau }^{t,x}, u(t+\tau ,\widehat{X}_{t+\tau }^{t,x}), Du(t+\tau ,\widehat{X}_{t+\tau }^{t,x}) \big ) \big ]^2 \right) {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\le 3 K^2 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \mathbb {E}_{t,X^{0,x}_t} \left[ 1_{\tau <T -t} 1_{X^{0,x}_{t+\tau } \in \Omega ^{0,x}_{\epsilon _n}}\right. \right. \\&\quad \left. \left. \times \frac{(u( \theta _n, t+\tau ,X^{0,x}_{t+\tau }) - u(t+\tau ,X^{0,x}_{t+\tau }))^2}{\rho (\tau )^2 }\right] {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\quad +\, 3 T \ell (\theta _n) + 3 \left( \frac{ ||f||_\infty ^2}{{\hat{\rho }}(T)^2} + \frac{ ||g||_\infty ^2}{{\bar{F}}(T)^2}\right) \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \mathbb {E}_{t,X^{0,x}_t} \left( 1_{X^{0,x}_{t+ \tau } \notin \Omega ^{0,x}_{\epsilon _n}} \right) {{\,\mathrm{{\text {d}}}\,}}t\right] \\&\le 3 K^2 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \mathbb {E}_{t,X^{0,x}_t} \left[ \int _t^T 1_{X^{0,x}_{s} \in \Omega ^{0,x}_{\epsilon _n}} \frac{(u( \theta _n, s,X^{0,x}_{s}) - u(s,X^{0,x}_{s}))^2}{\rho (s-t) } {{\,\mathrm{{\text {d}}}\,}}s \right] {{\,\mathrm{{\text {d}}}\,}}t \right] + 3 T \ell (\theta _n) + C \epsilon \\&\le 3 \frac{K^2 T}{{\hat{\rho }}(T)} \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} (u( \theta _n, t,X^{0,x}_{t}) - u(t,X^{0,x}_{t}))^2 {{\,\mathrm{{\text {d}}}\,}}t \right] + 3T \ell (\theta _n) + C \epsilon \end{aligned}$$

Then, if K is small enough so that \(3 K^2 T < {\hat{\rho }}(T)\),

$$\begin{aligned} F_n \le {\hat{\rho }}(T) \frac{T \ell (\theta _n) + C \epsilon }{ {\hat{\rho }}(T) - 3 K^2 T}, \end{aligned}$$

which completes the proof.

Cite this article

Chan-Wai-Nam, Q., Mikael, J. & Warin, X. Machine Learning for Semi Linear PDEs. J Sci Comput 79, 1667–1712 (2019). https://doi.org/10.1007/s10915-019-00908-3
