Machine Learning for Semi Linear PDEs

Chan-Wai-Nam, Quentin; Mikael, Joseph; Warin, Xavier

doi:10.1007/s10915-019-00908-3

Published: 12 February 2019

Volume 79, pages 1667–1712, (2019)

Journal of Scientific Computing

Abstract

Recent machine learning algorithms dedicated to solving semi-linear PDEs are improved by using different neural network architectures and different parameterizations. These algorithms are compared to a new one that solves a fixed point problem by using deep learning techniques. This new algorithm appears to be competitive in terms of accuracy with the best existing algorithms.

References

1. Pardoux, E., Peng, S.: Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 14(1), 55–61 (1990)
2. Bouchard, B., Touzi, N.: Discrete-time approximation and Monte-Carlo simulation of backward stochastic differential equations. Stoch. Process. Appl. 111(2), 175–206 (2004)
3. Gobet, E., Lemor, J.-P., Warin, X., et al.: A regression-based Monte Carlo method to solve backward stochastic differential equations. Ann. Appl. Prob. 15(3), 2172–2202 (2005)
4. Lemor, J.-P., Gobet, E., Warin, X., et al.: Rate of convergence of an empirical regression method for solving generalized backward stochastic differential equations. Bernoulli 12(5), 889–916 (2006)
5. Gobet, E., Turkedjiev, P.: Linear regression MDP scheme for discrete backward stochastic differential equations under general conditions. Math. Comput. 85(299), 1359–1391 (2016)
6. Fahim, A., Touzi, N., Warin, X.: A probabilistic numerical method for fully nonlinear parabolic PDEs. Ann. Appl. Prob. 21, 1322–1364 (2011)
7. Cheridito, P., et al.: Second-order backward stochastic differential equations and fully nonlinear parabolic PDEs. Commun. Pure Appl. Math. 60(7), 1081–1110 (2007)
8. Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14(1), 113–147 (2001)
9. Bouchard, B., Warin, X.: Monte-Carlo valuation of American options: facts and new algorithms to improve existing methods. In: Numerical Methods in Finance. Springer, Berlin, pp. 215–255 (2012)
10. Henry-Labordere, P., et al.: Branching Diffusion Representation of Semilinear PDEs and Monte Carlo Approximation. arXiv preprint arXiv:1603.01727 (2016)
11. Bouchard, B., et al.: Numerical approximation of BSDEs using local polynomial drivers and branching processes. Monte Carlo Methods Appl. 23(4), 241–263 (2017)
12. Bouchard, B., Tan, X., Warin, X.: Numerical Approximation of General Lipschitz BSDEs with Branching Processes. arXiv preprint arXiv:1710.10933 (2017)
13. Warin, X.: Variations on Branching Methods for Non-linear PDEs. arXiv preprint arXiv:1701.07660 (2017)
14. Fournié, E., et al.: Applications of Malliavin calculus to Monte Carlo methods in finance. Finance Stoch. 3(4), 391–412 (1999)
15. Warin, X.: Nesting Monte Carlo for High-Dimensional Non Linear PDEs. arXiv preprint arXiv:1804.08432 (2018)
16. Warin, X.: Monte Carlo for High-Dimensional Degenerated Semi Linear and Full Non Linear PDEs. arXiv preprint arXiv:1805.05078 (2018)
17. E, W., et al.: Linear scaling algorithms for solving high-dimensional nonlinear parabolic differential equations. In: SAM Research Report 2017 (2017)
18. E, W., et al.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations, vol. 46. arXiv preprint arXiv:1607.03295 (2016)
19. Hutzenthaler, M., Kruse, T.: Multi-level Picard Approximations of High Dimensional Semilinear Parabolic Differential Equations with Gradient-Dependent Nonlinearities. arXiv preprint arXiv:1711.01080 (2017)
20. Hutzenthaler, M., et al.: Overcoming the Curse of Dimensionality in the Numerical Approximation of Semilinear Parabolic Partial Differential Equations. arXiv preprint arXiv:1807.01212 (2018)
21. Han, J., Jentzen, A., E, W.: Overcoming the Curse of Dimensionality: Solving High-Dimensional Partial Differential Equations Using Deep Learning. arXiv preprint arXiv:1707.02568 (2017)
22. E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017)
23. Beck, C., E, W., Jentzen, A.: Machine Learning Approximation Algorithms for High-Dimensional Fully Nonlinear Partial Differential Equations and Second-order Backward Stochastic Differential Equations. arXiv preprint arXiv:1709.05963 (2017)
24. Raissi, M.: Forward–Backward Stochastic Neural Networks: Deep Learning of High Dimensional Partial Differential Equations. arXiv preprint arXiv:1804.07010 (2018)
25. Fujii, M., Takahashi, A., Takahashi, M.: Asymptotic Expansion as Prior Knowledge in Deep Learning Method for High Dimensional BSDEs. arXiv preprint arXiv:1710.07030 (2017)
26. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, vol. 37. ICML’15, JMLR.org, Lille, pp. 448–456. http://dl.acm.org/citation.cfm?id=3045118.3045167 (2015)
27. Cooijmans, T., et al.: Recurrent Batch Normalization. arXiv preprint arXiv:1603.09025 (2016)
28. Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv preprint arXiv:1511.07289 (2015)
29. He, K., et al.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
30. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
31. Olah, C.: Understanding LSTM Networks. Blog, http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (2015)
32. Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. Blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (2015)
33. Han, J., Jentzen, A., et al.: Solving High-Dimensional Partial Differential Equations Using Deep Learning. arXiv preprint arXiv:1707.02568 (2017)
34. Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
35. Richou, A.: Étude théorique et numérique des équations différentielles stochastiques rétrogrades. Ph.D. thesis, supervised by Hu, Y. and Briand, P., Mathématiques et applications, Rennes 1 (2010)
36. Ruder, S.: An Overview of Gradient Descent Optimization Algorithms. arXiv preprint arXiv:1609.04747 (2016)
37. Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade, 2nd edn. Springer, Berlin, pp. 437–478 (2012). ISBN: 978-3-642-35289-8. https://doi.org/10.1007/978-3-642-35289-8_26
38. Han, J., E, W.: Deep Learning Approximation for Stochastic Control Problems. arXiv preprint arXiv:1611.07422 (2016)
39. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014)
40. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)
41. Braun, S.: LSTM Benchmarks for Deep Learning Frameworks. arXiv preprint arXiv:1806.01818 (2018)
42. Touzi, N.: Optimal Stochastic Control, Stochastic Target Problems, and Backward SDE, vol. 29. Springer, Berlin (2012)
43. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991)
44. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)

Acknowledgements

The authors would like to thank Simon Fécamp for useful discussions and technical advice.

Author information

Authors and Affiliations

EDF Lab Paris-Saclay, Palaiseau, France
Quentin Chan-Wai-Nam, Joseph Mikael & Xavier Warin
Laboratoire de Finance des Marchés de l’Energie, FiME, Chatou, France
Xavier Warin

Corresponding author

Correspondence to Xavier Warin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Some Test PDEs

We recall our notations for the semilinear PDEs:

$$\begin{aligned} -\partial _t u(t, x) - \mathcal {L} u(t, x)&= f\left( t, x, u(t, x), \sigma ^\top (t, x) \nabla u(t, x)\right) \\ u(T, x)&= g(x) \end{aligned}$$

where $\mathcal {L}u (t, x) := \frac{1}{2} \mathrm{Tr}\left( \sigma ^\top \sigma (t, x) \nabla ^2 u(t, x)\right) + \mu (t, x)^\top \nabla u(t, x)$. For each example, we thus give the corresponding $\mu $, $\sigma $, $f$ and $g$. In the implementation, we instead use the function

$$\begin{aligned} \tilde{f}(t, x, u(t, x), Du(t, x)) := f(t, x, u(t, x), \sigma ^\top (t, x) Du(t, x)) \end{aligned}$$

for convenience, as this does not influence the results and allows for a more direct formulation for some PDEs.

1.1.1 A Black–Scholes Equation with Default Risk

From [21, 22]. If not otherwise stated, the parameters take values: $\overline{\mu }=0.02$, $\overline{\sigma }=0.2$, $\delta =2/3$, $R=0.02$, $\gamma _h=0.2$, $\gamma _l=0.02$, $v_h=50$, $v_l=70$. We use the initial condition $X_0 = (100, \ldots , 100)$.

$$\begin{aligned} \mu : (t, x)&\mapsto \overline{\mu } x \\ \sigma : (t, x)&\mapsto \overline{\sigma } \; {\mathrm {diag}}(\left\{ x_i\right\} _{i=1..d}) \\ f : (t, x, y, z)&\mapsto -(1-\delta ) \times {\mathrm {min}} \left\{ \gamma _h, \; {\mathrm {max}}\left\{ \gamma _l, \; \frac{\gamma _h - \gamma _l}{v_h - v_l} \left( y - v_h \right) + \gamma _h \right\} \right\} y - R y\\ g : x&\mapsto \min _{i=1..d} x_i \end{aligned}$$

We used the closed-form solution of the SDE:

$$\begin{aligned} X_t&= X_s \exp \left[ \left( \bar{\mu }-\frac{\bar{\sigma }^2}{2}\right) (t-s) + \bar{\sigma } (W_t - W_s)\right]&\forall t > s \end{aligned}$$
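The ingredients of this example can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the horizon `T=1.0` and the sample sizes are assumptions that the text above does not fix.

```python
import numpy as np

# Parameter values quoted in the text for the default-risk example.
MU_BAR, SIGMA_BAR = 0.02, 0.2
DELTA, R = 2.0 / 3.0, 0.02
GAMMA_H, GAMMA_L = 0.2, 0.02
V_H, V_L = 50.0, 70.0


def gbm_step(x_s, dt, dw):
    """Exact transition X_s -> X_t of the componentwise geometric Brownian motion."""
    return x_s * np.exp((MU_BAR - 0.5 * SIGMA_BAR**2) * dt + SIGMA_BAR * dw)


def driver(y):
    """Driver f(t, x, y, z) of the example; it depends on y only."""
    slope = (GAMMA_H - GAMMA_L) / (V_H - V_L)
    intensity = np.minimum(GAMMA_H, np.maximum(GAMMA_L, slope * (y - V_H) + GAMMA_H))
    return -(1.0 - DELTA) * intensity * y - R * y


def terminal(x):
    """Terminal condition g(x) = min_i x_i, taken along the last axis."""
    return np.min(x, axis=-1)


def simulate_terminal_state(d=100, T=1.0, n_paths=10_000, seed=0):
    """Sample X_T from X_0 = (100, ..., 100) in one exact step (T is an assumed horizon)."""
    rng = np.random.default_rng(seed)
    x0 = np.full((n_paths, d), 100.0)
    dw = np.sqrt(T) * rng.standard_normal((n_paths, d))  # W_T - W_0 ~ N(0, T I_d)
    return gbm_step(x0, T, dw)
```

Because the transition is exact, no time discretization error is introduced when sampling $X$ at the dates the algorithms need.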

Baseline (from [21]):

1.1.2 A Black–Scholes–Barenblatt Equation

From [24]. If not otherwise stated, the parameters take values: $\overline{\sigma }=0.4$, $r=0.05$. We use the initial condition $X_0 = (1.0, 0.5, 1.0, \ldots )$.

$$\begin{aligned} \mu : (t, x)&\mapsto 0 \\ \sigma : (t, x)&\mapsto \overline{\sigma } \; {\mathrm {diag}}(\left\{ x_i\right\} _{i=1..d}) \\ f : (t, x, y, z)&\mapsto -r \left( y - \frac{1}{\overline{\sigma }}\sum _{i=1}^{d} z_i\right) \\ g : x&\mapsto \left\| x\right\| ^2 \end{aligned}$$

We used the closed-form solution of the SDE:

$$\begin{aligned} X_t&= X_s \exp \left[ -\frac{\bar{\sigma }^2}{2} (t-s) + \bar{\sigma } (W_t - W_s)\right]&\forall t > s \end{aligned}$$

Exact solution:

$$\begin{aligned} u(t, x)&= \exp ((r+\overline{\sigma }^2)(T-t))g(x) \end{aligned}$$
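As a quick consistency check, the exact solution can be plugged back into the PDE with the coefficients listed above. The sketch below does this analytically in NumPy; the helper names are hypothetical, `T=1.0` is an assumed horizon, and reading $X_0 = (1.0, 0.5, 1.0, \ldots )$ as an alternating pattern is also an assumption.

```python
import numpy as np

SIGMA_BAR, R_RATE = 0.4, 0.05  # parameter values quoted in the text


def initial_state(d):
    """X_0 = (1.0, 0.5, 1.0, ...), read here as alternating 1.0 and 0.5 (assumption)."""
    x0 = np.ones(d)
    x0[1::2] = 0.5
    return x0


def u_exact(t, x, T=1.0):
    """Exact solution u(t, x) = exp((r + sigma^2)(T - t)) * ||x||^2."""
    return np.exp((R_RATE + SIGMA_BAR**2) * (T - t)) * np.sum(x**2, axis=-1)


def grad_u_exact(t, x, T=1.0):
    """Gradient of the exact solution with respect to x."""
    return 2.0 * np.exp((R_RATE + SIGMA_BAR**2) * (T - t)) * x


def driver(y, z):
    """Driver f(t, x, y, z) = -r (y - sum_i z_i / sigma_bar)."""
    return -R_RATE * (y - np.sum(z, axis=-1) / SIGMA_BAR)


def pde_residual(t, x, T=1.0):
    """Return -du/dt - L u - f evaluated on the exact solution (should vanish)."""
    growth = np.exp((R_RATE + SIGMA_BAR**2) * (T - t))
    u = u_exact(t, x, T)
    du_dt = -(R_RATE + SIGMA_BAR**2) * u
    grad = grad_u_exact(t, x, T)
    hess_diag = 2.0 * growth * np.ones_like(x)       # d^2 u / dx_i^2
    generator = 0.5 * SIGMA_BAR**2 * np.sum(x**2 * hess_diag, axis=-1)  # mu = 0
    z = SIGMA_BAR * x * grad                          # z = sigma^T grad u
    return -du_dt - generator - driver(u, z)
```

Both sides reduce to $r\,u(t,x)$, so the residual is zero up to rounding error for any $t$ and $x$.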

1.1.3 A Hamilton–Jacobi–Bellman Equation

From [21, 22]. If not otherwise stated: $\lambda =1.0$ and $X_0 = (0, \ldots , 0)$.

$$\begin{aligned} \mu : (t, x)&\mapsto 0 \\ \sigma : (t, x)&\mapsto \sqrt{2} \; \mathbf {I}_d \\ f : (t, x, y, z)&\mapsto - 0.5 \; \lambda \left\| z\right\| ^2 \\ g : x&\mapsto \log \left( 0.5 \left[ 1 + \left\| x\right\| ^2 \right] \right) \end{aligned}$$

Monte-Carlo solution:

$$\begin{aligned} u(t, X_t) =&-\frac{1}{\lambda } \log \left( \mathbb {E}\left[ \; \exp \left( - \lambda g\left( X_t + \sqrt{2} B_{T-t}\right) \right) \right] \right) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x) =&\left( \mathbb {E}\left[ \exp \left\{ -\lambda g(X_t + \sqrt{2} B_{T-t})\right\} \right] \right) ^{-1} \\&\times \mathbb {E} \left[ \frac{\partial \, g}{\partial \, x_j}(X_t + \sqrt{2} B_{T-t}) \exp \left\{ -\lambda g(X_t + \sqrt{2} B_{T-t})\right\} \right] \end{aligned}$$

where

$$\begin{aligned} \frac{\partial \, g}{\partial \, x_j}(x)&= \frac{2x_j}{1+\left\| x\right\| ^2} \end{aligned}$$
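The two expectations above can be estimated with plain Monte Carlo. The following NumPy sketch is one possible way to do so; it is not the authors' code, the function names are hypothetical, and the horizon `T=1.0`, the sample size and the seed are assumptions.

```python
import numpy as np


def g(x):
    """Terminal condition g(x) = log(0.5 * (1 + ||x||^2)), along the last axis."""
    return np.log(0.5 * (1.0 + np.sum(x**2, axis=-1)))


def grad_g(x):
    """Gradient of g: 2 x_j / (1 + ||x||^2)."""
    return 2.0 * x / (1.0 + np.sum(x**2, axis=-1, keepdims=True))


def hjb_monte_carlo(t, x, T=1.0, lam=1.0, n_samples=100_000, seed=0):
    """Estimate u(t, x) and its gradient from the representation formulas."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Samples of x + sqrt(2) B_{T-t}, with B_{T-t} ~ N(0, (T - t) I_d).
    noise = rng.standard_normal((n_samples, x.size))
    endpoints = x + np.sqrt(2.0 * (T - t)) * noise
    weights = np.exp(-lam * g(endpoints))             # shape (n_samples,)
    u = -np.log(weights.mean()) / lam
    grad_u = (grad_g(endpoints) * weights[:, None]).mean(axis=0) / weights.mean()
    return u, grad_u
```

At $t = T$ the noise term vanishes and the estimator returns $g(x)$ and $\nabla g(x)$ exactly, which gives a simple sanity check before running it with many samples.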

Baseline (computed using 10 million Monte-Carlo realizations, $d=100$):

1.1.4 An Oscillating Example with a Square Non-linearity

From [15]. If not otherwise stated, the parameters take values: $\mu _0=0.2$, $\sigma _0=1.0$, $a=0.5$, $r=0.1$. The intended effect of the $\min $ and $\max $ in f is to make f Lipschitz. We used the initial condition $X_0=(1.0, 0.5, 1.0, \ldots )$.

$$\begin{aligned} \mu : (t, x)&\mapsto \mu _0 / d \\ \sigma : (t, x)&\mapsto \sigma _0 / \sqrt{d} \; \mathbf {I}_d \\ f : (t, x, y, z)&\mapsto \phi (t, x) \\&\quad + r \left( \max \left[ -\exp (2a(T-t)), \min \left\{ \frac{1}{\sigma _0 \sqrt{d}} y \sum _{i=1}^{d} z_i , \exp (2a(T-t)) \right\} \right] \right) ^2 \end{aligned}$$

where

$$\begin{aligned} \phi : (t, x)&\mapsto \cos \left( \sum _{i=1}^{d} x_i \right) \left( a + \frac{\sigma _0^2}{2} \right) \exp (a(T-t)) + \sin \left( \sum _{i=1}^{d} x_i \right) \mu _0 \exp (a(T-t)) \\&\qquad - r \left( \cos \left( \sum _{i=1}^{d} x_i \right) \sin \left( \sum _{i=1}^{d} x_i \right) \exp (2a(T-t))\right) ^2 \\ g : x&\mapsto \cos \left( \sum _{i=1}^d x_i \right) \end{aligned}$$

Exact solution:

$$\begin{aligned} u(t, x)&= \cos \left( \sum _{i=1}^d x_i \right) \exp (a(T-t)) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x)&= -\sin \left( \sum _{i=1}^d x_i \right) \exp (a(T-t)) \end{aligned}$$
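One way to check that $\phi $ and the exact solution are consistent is to evaluate the PDE residual numerically. The sketch below uses central finite differences for $\partial _t u$ and the Laplacian; it is only an illustration with hypothetical function names, and `T` and the step `h` are assumptions.

```python
import numpy as np

MU0, SIGMA0, A, R_COEF = 0.2, 1.0, 0.5, 0.1  # parameter values quoted in the text


def u_exact(t, x, T):
    return np.cos(np.sum(x)) * np.exp(A * (T - t))


def grad_u_exact(t, x, T):
    return -np.sin(np.sum(x)) * np.exp(A * (T - t)) * np.ones_like(x)


def phi(t, x, T):
    s, e = np.sum(x), np.exp(A * (T - t))
    return (np.cos(s) * (A + 0.5 * SIGMA0**2) * e
            + np.sin(s) * MU0 * e
            - R_COEF * (np.cos(s) * np.sin(s) * e**2) ** 2)


def driver(t, x, y, z, T):
    """f(t, x, y, z): phi plus the squared, truncated product term."""
    d = x.size
    bound = np.exp(2.0 * A * (T - t))
    inner = y * np.sum(z) / (SIGMA0 * np.sqrt(d))
    # max[-bound, min{inner, bound}] is a clip to [-bound, bound].
    return phi(t, x, T) + R_COEF * np.clip(inner, -bound, bound) ** 2


def pde_residual(t, x, T, h=1e-4):
    """Finite-difference estimate of -du/dt - L u - f on the exact solution."""
    d = x.size
    du_dt = (u_exact(t + h, x, T) - u_exact(t - h, x, T)) / (2.0 * h)
    laplacian = 0.0
    for i in range(d):
        step = np.zeros(d)
        step[i] = h
        laplacian += (u_exact(t, x + step, T) - 2.0 * u_exact(t, x, T)
                      + u_exact(t, x - step, T)) / h**2
    grad = grad_u_exact(t, x, T)
    generator = 0.5 * (SIGMA0**2 / d) * laplacian + (MU0 / d) * np.sum(grad)
    z = (SIGMA0 / np.sqrt(d)) * grad                  # z = sigma^T grad u
    return -du_dt - generator - driver(t, x, u_exact(t, x, T), z, T)
```

On the exact solution the truncation is inactive, since $|\cos \sin | \le 1/2$, so the residual should be zero up to finite-difference error.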

1.1.5 A Non-Lipschitz Terminal Condition

From [35]. If not otherwise stated: $\alpha =0.5$, $X_0 = (0, \ldots , 0)$.

$$\begin{aligned} \mu : (t, x)&\mapsto 0 \\ \sigma : (t, x)&\mapsto \mathbf {I}_d \\ f : (t, x, y, z)&\mapsto - 0.5 \left\| z\right\| ^2 \\ g : x&\mapsto \sum _{i=1}^d \left( \max \left\{ 0, \min \left[ 1, x_i\right] \right\} \right) ^\alpha \end{aligned}$$

Monte-Carlo solution:

$$\begin{aligned} u(t, X_t) =&\log \left( \mathbb {E}\left[ \; \exp \left( g(X_t + B_{T-t})\right) \right] \right) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x) =&\left( \mathbb {E}\left[ \exp \left\{ g(X_t + B_{T-t})\right\} \right] \right) ^{-1} \\&\times \mathbb {E} \left[ g'(X_t + B_{T-t}) \exp \left\{ g(X_t + B_{T-t})\right\} \right] \end{aligned}$$

where

$$\begin{aligned} g'(x)&= \left\{ \begin{array}{ll} 0 &{}\quad {\text {if}}~ x\le 0~ \text {or}~ x\ge 1 \\ \alpha x^{\alpha -1} &{}\quad {\text {else}}\end{array} \right. \end{aligned}$$
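To make the non-Lipschitz behaviour concrete, the terminal condition and its componentwise derivative can be written as below. This is a minimal sketch with hypothetical function names; it only encodes the two formulas above.

```python
import numpy as np


def g(x, alpha=0.5):
    """g(x) = sum_i (max{0, min[1, x_i]})^alpha, along the last axis."""
    return np.sum(np.clip(x, 0.0, 1.0) ** alpha, axis=-1)


def g_prime(x, alpha=0.5):
    """Componentwise derivative: alpha * x^(alpha - 1) on (0, 1), zero elsewhere."""
    x = np.asarray(x, dtype=float)
    inside = (x > 0.0) & (x < 1.0)
    out = np.zeros_like(x)
    out[inside] = alpha * x[inside] ** (alpha - 1.0)
    return out
```

For $\alpha = 0.5$ the derivative grows like $x^{-1/2}$ as $x \rightarrow 0^+$, so no finite Lipschitz constant exists near the origin, which is where $X_0$ starts.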

Baseline (computed using 10 million Monte-Carlo realizations, $d=10$):

1.1.6 An Oscillating Example with Cox–Ingersoll–Ross Propagation

From [16]. If not otherwise stated: $a=0.1$, $\alpha =0.2$, $T=1.0$, $\hat{k}=0.1$, $\hat{m}=0.3$, $\hat{\sigma }=0.2$. We used the initial condition $X_0 = (0.3, \ldots , 0.3)$. Note that we have $2 \hat{k} \hat{m} > \hat{\sigma }^2$ so that X remains positive.

$$\begin{aligned} \mu : (t, x)&\mapsto \hat{k}\left( \hat{m} - x \right) \\ \sigma : (t, x)&\mapsto \hat{\sigma } {\mathrm {diag}} \left\{ \sqrt{x} \right\} \\ f : (t, x, y, z)&\mapsto \phi (t, x) + a y \left( \sum _{i=1}^d z_i \right) \end{aligned}$$

where

$$\begin{aligned} \phi : (t, x)&\mapsto \cos \left( \sum _{i=1}^{d} x_i \right) \left( -\alpha + \frac{{\hat{\sigma }}^2}{2} \right) \exp (-\alpha (T-t)) \\&\qquad + \sin \left( \sum _{i=1}^{d} x_i \right) \exp (-\alpha (T-t)) \sum _{i=1}^{d} \hat{k}\left( \hat{m} - x_i\right) \\&\qquad + a \cos \left( \sum _{i=1}^{d} x_i \right) \sin \left( \sum _{i=1}^{d} x_i\right) \exp (-2\alpha (T-t)) \sum _{i=1}^d \hat{\sigma } \sqrt{x_i}\\ g : x&\mapsto \cos \left( \sum _{i=1}^d x_i \right) \end{aligned}$$

Exact solution:

$$\begin{aligned} u(t, x)&= \cos \left( \sum _{i=1}^d x_i \right) \exp (-\alpha (T-t)) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x)&= -\sin \left( \sum _{i=1}^d x_i \right) \exp (-\alpha (T-t)) \end{aligned}$$

1.1.7 An Oscillating Example with Inverse Non-linearity

If not stated otherwise, we took parameters $\mu _0 = 0.2$, $\sigma _0 = 1.0$, $a=0.5$, $r=0.1$ and the initial condition $X_0=(1.0, 0.5, 1.0, \ldots )$.

$$\begin{aligned} \mu : (t, x)&\mapsto \mu _0 /d \mathbb {1}_d\\ \sigma : (t, x)&\mapsto \sigma _0 \; \mathbf {I}_d \\ f : (t, x, y, z)&\mapsto \phi (t, x) + r \frac{d y}{\sum _{i=1}^d z_i} \\ g : x&\mapsto 2\sum _{i=1}^d x_i + \cos \left( \sum _{i=1}^d x_i\right) \end{aligned}$$

where

$$\begin{aligned} \phi : (t, x)&\mapsto 2a \sum _{i=1}^d x_i \exp (a(T-t)) + \cos \left( \sum _{i=1}^{d} x_i \right) \left( a + \frac{d \sigma _0^2}{2} \right) \exp (a(T-t)) \\&\quad - \mu _0 \left[ 2- \sin \left( \sum _{i=1}^{d} x_i \right) \right] \exp (a(T-t)) - r \frac{2 \sum _{i=1}^d x_i + \cos \left( \sum _{i=1}^{d} x_i \right) }{\sigma _0 \left[ 2 - \sin \left( \sum _{i=1}^{d} x_i \right) \right] } \end{aligned}$$

Exact solution:

$$\begin{aligned} u(t, x)&= \left[ 2\sum _{i=1}^d x_i + \cos \left( \sum _{i=1}^d x_i \right) \right] \exp (a(T-t)) \\ \forall j, \quad \frac{\partial u}{\partial x_j}(t, x)&= \left[ 2-\sin \left( \sum _{i=1}^d x_i \right) \right] \exp (a(T-t)) \end{aligned}$$
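As a worked check, using only the definitions above and the shorthand $S = \sum _{i=1}^d x_i$ and $e = \exp (a(T-t))$ (notation introduced here for brevity), the exact solution can be substituted into the PDE:

$$\begin{aligned} -\partial _t u(t, x)&= a \left[ 2 S + \cos S \right] e \\ -\mathcal {L} u(t, x)&= \frac{d \sigma _0^2}{2} \cos (S) \, e - \mu _0 \left[ 2 - \sin S \right] e \\ z_i = \sigma _0 \frac{\partial u}{\partial x_j}(t, x)&= \sigma _0 \left[ 2 - \sin S \right] e \quad \Rightarrow \quad r \frac{d \, y}{\sum _{i=1}^d z_i} = r \frac{2 S + \cos S}{\sigma _0 \left[ 2 - \sin S \right] } \end{aligned}$$

Adding the first two lines gives $\phi (t, x) + r \frac{2 S + \cos S}{\sigma _0 [2 - \sin S]}$, which is exactly $f(t, x, u, \sigma ^\top \nabla u)$. Note that $2 - \sin S \ge 1$, so the denominator does not vanish along the exact solution.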

1.2 Proof of Proposition 4.6

In the sequel, C denotes a constant that depends only on u, f and g and may vary from one line to another.

First, note that due to the Lipschitz property in Assumption 4.3, the boundedness of f and g and the regularity of u due to Assumption 4.5, the solution u of (1) and Du satisfy a Feynman–Kac relation (see an adaptation of Proposition 1.7 in [42]):

$$\begin{aligned} u\left( t,x\right)&= \frac{1}{2}\mathbb {E}_{t,x} \left[ \phi \left( t, t+\tau ,X_{t+\tau }^{t,x}, u\left( t+\tau ,X_{t+\tau }^{t,x}\right) , Du\left( t+\tau ,X_{t+\tau }^{t,x}\right) \right) \right. \nonumber \\&\quad \left. +\, \phi \left( t, t+\tau ,\widehat{X}_{t+\tau }^{t,x}, u\left( t+\tau ,\widehat{X}_{t+\tau }^{t,x}\right) , Du\left( t+\tau ,\widehat{X}_{t+\tau }^{t,x}\right) \right) \right] \nonumber \\ Du\left( t,x\right)&= \mathbb {E}_{t,x} \left[ \sigma ^{-\top } \frac{W_{\left( t + \left( \tau \wedge \Delta t \right) \right) \wedge T}-W_{t}}{\tau \wedge \left( T-t\right) \wedge \Delta t} \right. \nonumber \\&\quad \times \frac{1}{2} \left( \phi \left( t,t+\tau , X_{t+\tau }^{t,x}, u\left( t+\tau , X_{t+\tau }^{t,x}\right) , Du\left( t+\tau , X_{t+\tau }^{t,x}\right) \right) \right. \nonumber \\&\quad \left. \left. -\, \phi \left( t,t+\tau , \widehat{X}_{t+\tau }^{t,x}, u\left( t+\tau , \widehat{X}_{t+\tau }^{t,x}\right) , Du\left( t+\tau , \widehat{X}_{t+\tau }^{t,x}\right) \right) \right) \right] . \end{aligned}$$

(39)

First, picking $(t,y) \in [0,T] \times \Omega _\epsilon ^{0,x}$, we have, using Eq. (39):

$$\begin{aligned} B(\theta ,t, y) = \,&| {\bar{u}}^{\epsilon }(\theta , t, y) - u(\theta ,t,y)|^2 \\ \le \,&2 | u(t,y)-u(\theta ,t,y)|^2 + 2 \left( \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } \right. \right. \\&\times \frac{1}{2}\left( \frac{f(t,X^{t,y}_{t+\tau },u(\theta , t+\tau , X^{t,y}_{t+\tau }), v(\theta ,t+\tau , X^{t,y}_{t+\tau }))}{ \rho (\tau )} \right. \\&- \frac{f(t,X^{t,y}_{t+\tau },u( t+\tau , X^{t,y}_{t+\tau }), Du(t+\tau , X^{t,y}_{t+\tau }))}{ \rho (\tau )} \\&+ \frac{f(t,{\hat{X}}^{t,y}_{t+\tau },u(\theta , t+\tau , {\hat{X}}^{t,y}_{t+\tau }), v(\theta ,t+\tau , {\hat{X}}^{t,y}_{t+\tau }))}{ \rho (\tau )} \\&\left. - \left. \frac{f(t,{\hat{X}}^{t,y}_{t+\tau },u( t+\tau , {\hat{X}} ^{t,y}_{t+\tau }), Du(t+\tau , {\hat{X}}^{t,y}_{t+\tau }))}{ \rho (\tau )} \right) \right] \\&+ \frac{1}{2}\mathbb {E}_{t,y} \big [ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_\epsilon } \big (\phi \big (t, t+\tau ,X_{t+\tau }^{t,y}, u(t+\tau ,X_{t+\tau }^{t,y}), Du(t+\tau ,X_{t+\tau }^{t,y}) \big ) \\&+ \left. \phi \big (t, t+\tau ,\widehat{X}_{t+\tau }^{t,y}, u(t+\tau ,\widehat{X}_{t+\tau }^{t,y}), Du(t+\tau ,\widehat{X}_{t+\tau }^{t,y}) \big ) \big )\big ] \right) ^2 \end{aligned}$$

Using Jensen's inequality, the boundedness of g and f, the fact that $\rho $ is bounded from below, and Assumption 4.3, we get:

$$\begin{aligned} B(\theta ,t,y) \le \,&2 | u(t,y)-u(\theta ,t,y)|^2 \\&+ C \left( K^2 \mathbb {E}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } \left( (u(t+\tau ,X^{t,y}_{t+\tau })-u(\theta ,t+\tau ,X^{t,y}_{t+\tau }))^2 \right. \right. \right. \\&+ \left. \left. \left. || Du(t+\tau ,X^{t,y}_{t+\tau }) -v(\theta ,t+\tau ,X^{t,y}_{t+\tau })||^2_2 \right) \right] + \mathbb {E}[ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_{\epsilon }}]\right) \end{aligned}$$

Then using the results in [43, 44], we know that there exists $ (u(\theta ^{*},t,x), v(\theta ^*,t,x )) \in \kappa $ such that:

$$\begin{aligned} \sup _{({\hat{t}}, y) \in [t,T]\times \Omega _\epsilon ^{0,x}} ( | u({\hat{t}}, y) - u(\theta ^*, {\hat{t}}, y)| + || Du({\hat{t}}, y)- v(\theta ^{*}, {\hat{t}},y)||_2) \le \epsilon \end{aligned}$$

(40)

Then we have that

$$\begin{aligned} B(\theta ^{*},t,y) \le&C \left( \epsilon + \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_{\epsilon }}\right] \right) \nonumber \\ \le&C \left( \epsilon + \mathbb {E}_{t,y}\left[ 1_{\exists s \in [t,T] / X^{t,y}_{s} \notin \Omega ^{0,x}_{\epsilon }}\right] \right) \end{aligned}$$

(41)

Similarly introducing ${\hat{W}}^{t}_\tau = \sigma ^{-\top } \frac{W_{(t+ (\tau \wedge \Delta t )) \wedge T}-W_{t}}{\tau \wedge (T-t) \wedge \Delta t}$,

$$\begin{aligned} C(\theta ^{*},t,y)&= || {\bar{v}}^\epsilon ( \theta ^{*} ,t, y) - v(\theta ^{*} ,t,y)||^2_2 \nonumber \\&\le 2 || Du(t,y)-v(\theta ^{*} ,t,y)||^2_2 + 2 \left( \frac{1}{2} \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } {\hat{W}}^{t}_\tau \right. \right. \nonumber \\&\qquad \times \left( \frac{f(t,X^{t,y}_{t+\tau },u(\theta ^{*} , t+\tau , X^{t,y}_{t+\tau }), v(\theta ^{*},t+\tau , X^{t,y}_{t+\tau }))}{ \rho (\tau )} \right. \nonumber \\&\qquad - \frac{f(t,X^{t,y}_{t+\tau },u( t+\tau , X^{t,y}_{t+\tau }), Du(t+\tau , X^{t,y}_{t+\tau }))}{ \rho (\tau )} \nonumber \\&\qquad - \frac{f(t,{\hat{X}}^{t,y}_{t+\tau },u(\theta ^{*} , t+\tau , {\hat{X}}^{t,y}_{t+\tau }), v(\theta ^{*},t+\tau , {\hat{X}}^{t,y}_{t+\tau }))}{ \rho (\tau )} \nonumber \\&\qquad \left. \left. + \frac{f(t,{\hat{X}}^{t,y}_{t+\tau },u( t+\tau , {\hat{X}} ^{t,y}_{t+\tau }), Du(t+\tau , {\hat{X}}^{t,y}_{t+\tau }))}{ \rho (\tau )} \right) \right] \nonumber \\&\qquad + \frac{1}{2}\mathbb {E}_{t,y} \big [ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_\epsilon } {\hat{W}}^{t}_\tau \big (\phi \big (t, t+\tau ,X_{t+\tau }^{t,y}, u(t+\tau ,X_{t+\tau }^{t,y}), Du(t+\tau ,X_{t+\tau }^{t,y}) \big ) \nonumber \\&\qquad - \left. 
\phi \big (t, t+\tau ,\widehat{X}_{t+\tau }^{t,y}, u(t+\tau ,\widehat{X}_{t+\tau }^{t,y}), Du(t+\tau ,\widehat{X}_{t+\tau }^{t,y}) \big ) \big )\Bigg ] \right) ^2 \nonumber \\&\le 2 || Du(t,y)-v(\theta ^{*} ,t,y)||^2 \nonumber \\&\qquad + \,C \mathbb {E}_{t,y}\left[ ({\hat{W}}^{t}_\tau )^\top {\hat{W}}^{t}_\tau \left( \frac{||f||_\infty ^2 }{\rho (\tau )^2} + \frac{|g(X^{t,y}_{t+\tau }) - g( {\hat{X}}^{t,y}_{t+\tau })|^2}{{\bar{F}}(T)^2}\right) \right] \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \notin \Omega ^{0,x}_{\epsilon }} \right] \nonumber \\&\qquad + C K \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } \frac{\left( {\hat{W}}^{t}_\tau \right) ^\top {\hat{W}}^{t}_\tau }{\rho (\tau )^2} \left( u(t+\tau ,X^{t,y}_{t+\tau }) - u\left( \theta ^{*},t+\tau ,X^{t,y}_{t+\tau }\right) \right) ^2 \right] \nonumber \\&\qquad + \,C K \mathbb {E}_{t,y}\left[ 1_{X^{t,y}_{t+\tau } \in \Omega ^{0,x}_\epsilon } \frac{({\hat{W}}^{t}_\tau )^\top {\hat{W}}^{t}_\tau }{\rho (\tau )^2} || Du\left( t+\tau ,X^{t,y}_{t+\tau }\right) \right. \nonumber \\&\qquad \left. - \,v(\theta ^{*},t+\tau ,X^{t,y}_{t+\tau }) ||^2_2 \right] , \end{aligned}$$

(42)

where we have used the Jensen and Cauchy–Schwarz inequalities.

Introducing $G\in \mathbb {R}^d$, a vector of independent standard Gaussian random variables, we have

$$\begin{aligned} \mathbb {E}_{t,x}\left[ \frac{({\hat{W}}^{t}_\tau )^\top {\hat{W}}^{t}_\tau }{\rho (\tau )^2}\right] = \mathbb {E}\left[ \frac{1}{\tau \rho (\tau )^2}\right] \mathbb {E}[G^\top \sigma ^{-1} \sigma ^{-\top } G] < \infty \end{aligned}$$

when $u<1$ in Eq. (11).
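The Gaussian factor in this product is simply a trace, since $\mathbb {E}[G^\top A G] = \mathrm {Tr}(A)$ for any fixed matrix $A$ when $G$ is standard Gaussian. The sketch below checks this numerically for $A = \sigma ^{-1} \sigma ^{-\top }$; the function name and the test matrix are hypothetical, and the code says nothing about the other factor $\mathbb {E}[1/(\tau \rho (\tau )^2)]$, whose finiteness depends on Eq. (11).

```python
import numpy as np


def quadratic_form_mean(sigma, n_samples=200_000, seed=0):
    """Compare a Monte Carlo estimate of E[G^T sigma^{-1} sigma^{-T} G] with its trace."""
    rng = np.random.default_rng(seed)
    sigma_inv = np.linalg.inv(sigma)
    a = sigma_inv @ sigma_inv.T                       # A = sigma^{-1} sigma^{-T}
    g = rng.standard_normal((n_samples, sigma.shape[0]))
    mc_estimate = np.einsum("ni,ij,nj->n", g, a, g).mean()
    return mc_estimate, np.trace(a)
```

For an invertible constant $\sigma $ the trace is finite, so the bound reduces to the integrability of $1/(\tau \rho (\tau )^2)$.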

Similarly using the fact that g is Lipschitz,

$$\begin{aligned} \mathbb {E}_{t,y}\left[ ({\hat{W}}^{t}_\tau )^\top {\hat{W}}^{t}_\tau \left( \frac{||f||_\infty ^2 }{\rho (\tau )^2} + \frac{|g(X^{t,y}_{t+\tau }) - g( {\hat{X}}^{t,y}_{t+\tau })|^2}{{\bar{F}}(T)^2}\right) \right]< C < \infty \end{aligned}$$

Then using Eq. (40) in Eq. (42), we get

$$\begin{aligned} C(\theta ^{*}, t,y) \le C ( \epsilon + \mathbb {E}_{t,y}[ 1_{\exists s \in [t,T] / X^{t,y}_{s} \notin \Omega ^{0,x}_{\epsilon }}]) \end{aligned}$$

(43)

Substituting (41) and (43) into Eq. (31) and using the definition of $\Omega ^{0,x}_{\epsilon }$ completes the proof.

1.3 Proof of Proposition 4.7

Let $\bar{K}$ be a compact set. It is always possible to find $n_0$ such that for $n> n_0$, $ \bar{K} \subset \Omega ^{0,x}_{\epsilon _n}$.

Let u be a function from $\mathbb {R}\times \mathbb {R}^d$ to $\mathbb {R}$ and v a function from $\mathbb {R}\times \mathbb {R}^d$ to $\mathbb {R}^d$. Let

$$\begin{aligned} \psi (t,t+\tau ,x,u) =&1_{X^{t,x}_{t+\tau } \in \Omega ^{0,x}_{\epsilon _n}} \frac{1}{2} \left[ \phi \left( t, t+\tau ,X_{t+\tau }^{t,x}, u(t+\tau ,X_{t+\tau }^{t,x}) \right) \right. \\&\left. +\, \phi \left( t, t+\tau ,\widehat{X}_{t+\tau }^{t,x}, u\left( t+\tau ,\widehat{X}_{t+\tau }^{t,x}\right) \right) \right] . \end{aligned}$$

We note that, for $\zeta $ an independent uniform random variable on [0, T],

$$\begin{aligned} D_n&:= \mathbb {E} \left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \Bigg ( \mathbb {E}_{t,X^{0,x}_t} \left[ \psi (t,t+\tau ,X^{0,x}_t,u(\theta _n, \cdot ,\cdot ),v(\theta _n, \cdot ,\cdot )) \right] \right. \\&\qquad \left. - \,u(\theta _n,t ,X^{0,x}_{t}) \Bigg )^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&= T \mathbb {E} \left[ 1_{X^{0,x}_{\zeta } \in \Omega ^{0,x}_\epsilon } \Bigg ( \mathbb {E}_{\zeta ,X^{0,x}_\zeta } \left[ \psi (\zeta ,\zeta +\tau ,X^{0,x}_\zeta ,u(\theta _n, \cdot ,\cdot ),v(\theta _n, \cdot ,\cdot )) \right] \right. \\&\qquad \left. -\, u(\theta _n, \zeta ,X^{0,x}_{\zeta }) \Bigg )^2 \right] \\&= T \ell (\theta _n) \end{aligned}$$

Using Eq. (39), the expression of $D_n$, the Lipschitz property of f, Jensen's inequality, the boundedness of f and g, and the fact that $\rho $ is bounded from below by $\hat{\rho }(T)$:

$$\begin{aligned} F_n&= \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in {\bar{K}}} (u( t,X^{0,x}_t) - u( \theta _n, t,X^{0,x}_t)) ^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\le \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} (u( t,X^{0,x}_t) - u( \theta _n, t,X^{0,x}_t)) ^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\le 3 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \left( \mathbb {E}_{t,X^{0,x}_t} (\psi (t,t+\tau ,X^{0,x}_{t+\tau },u) \right. \right. \\&\quad \left. \left. - \psi (t,t+\tau ,X^{0,x}_{t+\tau }, u(\theta _n,\cdot ,\cdot ))) \right) ^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\quad +\, 3 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \left( \mathbb {E}_{t,X^{0,x}_t} ( \psi (t,t+\tau ,X^{0,x}_{t+\tau }, u(\theta _n,\cdot ,\cdot ))) - u( \theta _n, t,X^{0,x}_t) \right) ^2 {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\quad +\, 3 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \mathbb {E}_{t,X^{0,x}_t} \left( 1_{X^{0,x}_{t+ \tau } \notin \Omega ^{0,x}_{\epsilon _n}} \frac{1}{4} \big [ \phi \big (t, t+\tau ,X_{t+\tau }^{t,x}, u(t+\tau ,X_{t+\tau }^{t,x}) \big ) \right. \right. \\&\quad + \left. \left. \phi \big (t, t+\tau ,\widehat{X}_{t+\tau }^{t,x}, u(t+\tau ,\widehat{X}_{t+\tau }^{t,x}) \big ) \big ]^2 \right) {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\le 3 K^2 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \mathbb {E}_{t,X^{0,x}_t} \left[ 1_{\tau <T -t} 1_{X^{0,x}_{t+\tau } \in \Omega ^{0,x}_{\epsilon _n}}\right. \right. \\&\quad \left. \left. 
\times \frac{(u( \theta _n, t+\tau ,X^{0,x}_{t+\tau }) - u(t+\tau ,X^{0,x}_{t+\tau }))^2}{\rho (\tau )^2 }\right] {{\,\mathrm{{\text {d}}}\,}}t \right] \\&\quad +\, 3 T \ell (\theta _n) + 3 \left( \frac{ ||f||_\infty ^2}{{\hat{\rho }}(T)^2} + \frac{ ||g||_\infty ^2}{{\bar{F}}(T)^2}\right) \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \mathbb {E}_{t,X^{0,x}_t} \left( 1_{X^{0,x}_{t+ \tau } \notin \Omega ^{0,x}_{\epsilon _n}} \right) {{\,\mathrm{{\text {d}}}\,}}t\right] \\&\le 3 K^2 \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} \mathbb {E}_{t,X^{0,x}_t} \left[ \int _t^T ds \right. \right. \\&\quad \left. \left. \times \,\left( 1_{X^{0,x}_{s} \in \Omega ^{0,x}_{\epsilon _n}} \frac{(u( \theta _n, s,X^{0,x}_{s}) - u(s,X^{0,x}_{s}))^2}{\rho (s-t) } \right) ds \right] {{\,\mathrm{{\text {d}}}\,}}t \right] + 3 T \ell (\theta _n) + C \epsilon \\&\le 3 \frac{K^2 T}{{\hat{\rho }}(T)} \mathbb {E}\left[ \int _0^T 1_{X^{0,x}_{t} \in \Omega ^{0,x}_{\epsilon _n}} (u( \theta _n, t,X^{0,x}_{t}) - u(t,X^{0,x}_{t}))^2 {{\,\mathrm{{\text {d}}}\,}}t \right] + 3T \ell (\theta _n) + C \epsilon \end{aligned}$$

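Then, if K is small enough that $3 K^2 T < {\hat{\rho }}(T)$, applying the last bound to the second line of the chain and rearranging gives

$$\begin{aligned} F_n \le {\hat{\rho }}(T) \frac{3 T \ell (\theta _n) + C \epsilon }{ {\hat{\rho }}(T) - 3 K^2 T}, \end{aligned}$$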

which completes the proof.

About this article

Cite this article

Chan-Wai-Nam, Q., Mikael, J. & Warin, X. Machine Learning for Semi Linear PDEs. J Sci Comput 79, 1667–1712 (2019). https://doi.org/10.1007/s10915-019-00908-3

Received: 21 September 2018
Revised: 24 December 2018
Accepted: 07 January 2019
Published: 12 February 2019
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s10915-019-00908-3
