The main result of this section, Lemma 3.10 below, shows that multilevel Picard approximations can be well represented by DNNs. The central tools for the proof of Lemma 3.10 are Lemmas 3.8 and 3.9, which show that DNNs are stable under composition and summation. We formulate Lemmas 3.8 and 3.9 in terms of the operators defined in (33) and (34) below, whose properties are studied in Lemmas 3.3, 3.4, and 3.5.
A mathematical framework for deep neural networks
Setting 3.1
(Artificial neural networks) Let \(\left\| \cdot \right\| , |||\cdot ||| :(\cup _{d\in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0,\infty )\) and \(\dim :(\cup _{d\in {\mathbb {N}}}{\mathbb {R}}^d) \rightarrow {\mathbb {N}}\) satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that \(\Vert x\Vert =\sqrt{\sum _{i=1}^d(x_i)^2}\), \(|||x|||=\max _{i\in [1,d]\cap {\mathbb {N}}}|x_i|\), and \(\dim \left( x\right) =d\), let \({\mathbf {A}}_{d}:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\), \(d\in {\mathbb {N}}\), satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that
$${\mathbf {A}}_{d}(x)= \left( \max \{x_1,0\},\ldots ,\max \{x_d,0\}\right) ,$$
(29)
let \({\mathbf {D}}=\cup _{H\in {\mathbb {N}}} {\mathbb {N}}^{H+2}\), let
$$\begin{aligned}{\mathbf {N}} = \bigcup _{H\in {\mathbb {N}}}\bigcup _{(k_0,k_1,\ldots ,k_{H+1})\in {\mathbb {N}}^{H+2}} \left[ \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_{n}\times k_{n-1}} \times {\mathbb {R}}^{k_{n}}\right) \right] , \end{aligned}$$
(30)
let \({\mathcal {D}}:{\mathbf {N}}\rightarrow \mathbf {D}\) and \({\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{k,l\in {\mathbb {N}}} C({\mathbb {R}}^k,{\mathbb {R}}^l))\) satisfy for all \(H\in {\mathbb {N}}\), \(k_0,k_1,\ldots ,k_H,k_{H+1}\in {\mathbb {N}}\), \(\Phi = ((W_1,B_1),\ldots ,(W_{H+1},B_{H+1}))\in \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_n\times k_{n-1}} \times {\mathbb {R}}^{k_n}\right)\), \(x_0 \in {\mathbb {R}}^{k_0},\ldots ,x_{H}\in {\mathbb {R}}^{k_{H}}\) with \(\forall \, n\in {\mathbb {N}}\cap [1,H]:x_n = \mathbf {A}_{k_n}(W_n x_{n-1}+B_n )\) that
$${\mathcal {D}}(\Phi )= (k_0,k_1,\ldots ,k_{H}, k_{H+1}),\qquad {\mathcal {R}}(\Phi )\in C({\mathbb {R}}^{k_0},{\mathbb {R}}^ {k_{H+1}}),$$
(31)
$$\text {and}\qquad ({\mathcal {R}}(\Phi )) (x_0) = W_{H+1}x_{H}+B_{H+1},$$
(32)
let \(\odot :{\mathbf {D}}\times {\mathbf {D}}\rightarrow {\mathbf {D}}\) satisfy for all \(H_1,H_2\in {\mathbb {N}}\), \(\alpha =(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1},\alpha _{H_1+1})\in {\mathbb {N}}^{H_1+2}\), \(\beta =(\beta _0,\beta _1,\ldots ,\beta _{H_2},\beta _{H_2+1})\in {\mathbb {N}}^{H_2+2}\) that
$$\alpha \odot \beta = (\beta _{0},\beta _{1},\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\in {\mathbb {N}}^{H_1+H_2+3},$$
(33)
let \({{\, \mathrm{\boxplus }\,}}:{\mathbf {D}}\times {\mathbf {D}}\rightarrow {\mathbf {D}}\) satisfy for all \(H\in {\mathbb {N}}\), \(\alpha = (\alpha _0,\alpha _1,\ldots ,\alpha _{H},\alpha _{H+1})\in {\mathbb {N}}^{H+2}\), \(\beta = (\beta _0,\beta _1,\beta _2,\ldots ,\beta _{H},\beta _{H+1})\in {\mathbb {N}}^{H+2}\) that
$$\alpha {{\,\mathrm{\boxplus }\,}}\beta =(\alpha _0,\alpha _1+\beta _1,\ldots ,\alpha _{H}+\beta _{H},\beta _{H+1})\in {\mathbb {N}}^{H+2},$$
(34)
and let \({\mathfrak {n}}_{n}\in {\mathbf {D}}\), \(n\in [3,\infty )\cap {\mathbb {N}}\), satisfy for all \(n\in [3,\infty )\cap {\mathbb {N}}\) that
$$\begin{aligned} {\mathfrak {n}}_{n}= (1,\underbrace{2,\ldots ,2}_{(n-2)\text {-times}},1)\in {\mathbb {N}}^{n}. \end{aligned}$$
(35)
Remark 3.2
The set \({\mathbf {N}}\) can be viewed as the set of all artificial neural networks. For each network \(\Phi \in {\mathbf {N}}\), the function \({\mathcal {R}}(\Phi )\) is the function represented by \(\Phi\), and the vector \({\mathcal {D}}(\Phi )\) describes the layer dimensions of \(\Phi\).
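To make the objects of Setting 3.1 concrete, the following minimal Python sketch (not part of the formal development; the names `realize` and `dims` are ours) represents a network \(\Phi \in {\mathbf {N}}\) as a list of weight–bias pairs and computes \({\mathcal {R}}(\Phi )\) and \({\mathcal {D}}(\Phi )\):

```python
# A network Phi in N is a list of (W, B) layer pairs; W is a list of rows.
def relu(v):                     # the activation A_d from (29)
    return [max(x, 0.0) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def realize(phi, x):             # the realization R(Phi) from (31)-(32)
    for W, B in phi[:-1]:        # hidden layers apply the ReLU activation
        x = relu(vadd(matvec(W, x), B))
    W, B = phi[-1]               # the output layer is affine
    return vadd(matvec(W, x), B)

def dims(phi):                   # the layer-dimension vector D(Phi)
    return [len(phi[0][0][0])] + [len(B) for _, B in phi]

# Example: a network with D(Phi) = (2, 2, 1) realizing x -> relu(x1) + relu(x2)
phi = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
```

The same list-of-pairs representation is reused in the sketches accompanying the lemmas below.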
Properties of operations associated to deep neural networks
Lemma 3.3
(\(\odot\) is associative) Assume Setting 3.1 and let \(\alpha ,\beta ,\gamma \in {\mathbf {D}}\). Then it holds that \((\alpha \odot \beta )\odot \gamma = \alpha \odot (\beta \odot \gamma )\).
Proof of Lemma 3.3
Throughout this proof let \(H_1,H_2,H_3\in {\mathbb {N}}\), \((\alpha _i)_{i\in [0,H_1+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_1+2}\), \((\beta _i)_{i\in [0,H_2+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_2+2}\), \((\gamma _i)_{i\in [0,H_3+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_3+2}\) satisfy that
$$\begin{aligned} \alpha&=(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1}),\quad \beta =(\beta _0,\beta _1,\ldots ,\beta _{H_2+1}),\quad \text {and}\\ \gamma&=(\gamma _0,\gamma _1,\ldots ,\gamma _{H_3+1}). \end{aligned}$$
(36)
The definition of \(\odot\) in (33) then shows that
$$\begin{aligned} (\alpha \odot \beta )\odot \gamma& = (\beta _{0},\beta _{1},\beta _2,\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\odot (\gamma _0,\gamma _1,\ldots ,\gamma _{H_3+1})\\&= (\gamma _0,\ldots ,\gamma _{H_3},\gamma _{H_3+1}+\beta _{0},\beta _{1},\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\\&= (\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1})\odot (\gamma _{0},\gamma _{1},\ldots ,\gamma _{H_3},\gamma _{H_3+1}+\beta _{0},\beta _{1},\beta _{2},\ldots ,\beta _{H_2+1}) \\&=\alpha \odot (\beta \odot \gamma ). \end{aligned}$$
(37)
The proof of Lemma 3.3 is thus completed. \(\square\)
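Since \(\odot\) acts on plain integer tuples, the associativity asserted in Lemma 3.3 is easy to check mechanically; a small sketch (our naming, with dimension vectors as Python tuples):

```python
def odot(alpha, beta):
    """The operation from (33): place beta's layers in front of alpha's,
    merging beta's output dimension with alpha's input dimension."""
    return beta[:-1] + (beta[-1] + alpha[0],) + alpha[1:]

# Three dimension vectors with compatible lengths (H1 = 1, H2 = 2, H3 = 1)
alpha, beta, gamma = (1, 2, 1), (1, 3, 4, 1), (2, 5, 2)
left  = odot(odot(alpha, beta), gamma)
right = odot(alpha, odot(beta, gamma))
```

Note that the result has length \(H_1+H_2+3\), i.e. `len(alpha) + len(beta) - 1`, as in (33).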
Lemma 3.4
(Well-definedness and associativity of \({{\,\mathrm{\boxplus }\,}}\)) Assume Setting 3.1, let \(H,k,l \in {\mathbb {N}}\), and let \(\alpha ,\beta ,\gamma \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\). Then
- (i)
it holds that \(\alpha {{\,\mathrm{\boxplus }\,}}\beta \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\),
- (ii)
it holds that \(\beta {{\,\mathrm{\boxplus }\,}}\gamma \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\), and
- (iii)
it holds that \((\alpha {{\,\mathrm{\boxplus }\,}}\beta ){{\,\mathrm{\boxplus }\,}}\gamma = \alpha {{\,\mathrm{\boxplus }\,}}(\beta {{\,\mathrm{\boxplus }\,}}\gamma )\).
Proof of Lemma 3.4
Throughout this proof let \(\alpha _i,\beta _i,\gamma _i\in {\mathbb {N}}\), \(i\in [1,H]\cap {\mathbb {N}}\), satisfy that \(\alpha = (k,\alpha _1,\alpha _2,\ldots ,\alpha _{H},l)\), \(\beta = (k,\beta _1,\beta _2,\ldots ,\beta _{H},l)\), and \(\gamma = (k,\gamma _1,\gamma _2,\ldots ,\gamma _{H},l).\) The definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)) then shows that
$$\begin{aligned}\alpha {{\,\mathrm{\boxplus }\,}}\beta&=(k,\alpha _1+\beta _1, \alpha _2+\beta _2, \ldots ,\alpha _{H}+\beta _{H},l)\in \{k\}\times {\mathbb {N}}^{H}\times \{l\}, \\ \beta {{\,\mathrm{\boxplus }\,}}\gamma&=(k,\beta _1+\gamma _1, \beta _2+\gamma _2, \ldots ,\beta _{H}+\gamma _{H},l)\in \{k\}\times {\mathbb {N}}^{H}\times \{l\}, \end{aligned}$$
(38)
and
$$\begin{aligned} (\alpha {{\,\mathrm{\boxplus }\,}}\beta ){{\,\mathrm{\boxplus }\,}}\gamma&=(k,(\alpha _1+\beta _1)+\gamma _1, (\alpha _2+\beta _2)+\gamma _2, \ldots ,(\alpha _{H}+\beta _{H})+\gamma _{H},l) \\&=(k,\alpha _1+(\beta _1+\gamma _1), \alpha _2+(\beta _2+\gamma _2), \ldots ,\alpha _{H}+(\beta _{H}+\gamma _{H}),l) = \alpha {{\,\mathrm{\boxplus }\,}}(\beta {{\,\mathrm{\boxplus }\,}}\gamma ).\end{aligned}$$
(39)
The proof of Lemma 3.4 is thus completed. \(\square\)
Lemma 3.5
(Triangle inequality) Assume Setting 3.1, let \(H,k,l \in {\mathbb {N}}\), and let \(\alpha ,\beta \in \{k\}\times {\mathbb {N}}^{H} \times \{l\}\). Then it holds that \(|||\alpha {{\,\mathrm{\boxplus }\,}}\beta |||\le |||\alpha |||+ |||\beta |||\).
Proof of Lemma 3.5
Throughout this proof let \(\alpha _i,\beta _i\in {\mathbb {N}}\), \(i\in [1,H]\cap {\mathbb {N}}\), satisfy that \(\alpha = (k,\alpha _1,\alpha _2,\ldots ,\alpha _{H},l)\) and \(\beta = (k,\beta _1,\beta _2,\ldots ,\beta _{H},l).\) The definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)) then shows that \(\alpha {{\,\mathrm{\boxplus }\,}}\beta =(k,\alpha _1+\beta _1, \alpha _2+\beta _2, \ldots ,\alpha _{H}+\beta _{H},l).\) This together with the triangle inequality implies that
$$\begin{aligned} |||\alpha {{\,\mathrm{\boxplus }\,}}\beta |||&=\sup \left\{ |k|,\left| \alpha _1+\beta _1\right| , \left| \alpha _2+\beta _2\right| , \ldots ,\left| \alpha _{H}+\beta _{H}\right| ,\left| l\right| \right\} \\&\le \sup \left\{ |k|,\left| \alpha _1\right| , \left| \alpha _2\right| , \ldots ,\left| \alpha _{H}\right| ,\left| l\right| \right\} +\sup \left\{ |k|,\left| \beta _1\right| , \left| \beta _2\right| , \ldots ,\left| \beta _{H}\right| ,\left| l\right| \right\} \\&= |||\alpha |||+|||\beta |||.\end{aligned}$$
(40)
This completes the proof of Lemma 3.5. \(\square\)
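The closure and associativity properties of Lemma 3.4 and the norm estimate of Lemma 3.5 can likewise be verified mechanically; a sketch under the same tuple representation (function names are ours):

```python
def boxplus(alpha, beta):
    """The operation from (34): add the hidden-layer widths of two
    dimension vectors of equal depth, keeping input/output dimensions."""
    assert len(alpha) == len(beta) and alpha[0] == beta[0] and alpha[-1] == beta[-1]
    return (alpha[0],) + tuple(a + b for a, b in zip(alpha[1:-1], beta[1:-1])) + (beta[-1],)

def sup_norm(v):                 # the norm |||.||| from Setting 3.1
    return max(abs(x) for x in v)

# Three dimension vectors in {3} x N^2 x {2}
a, b, c = (3, 4, 5, 2), (3, 1, 1, 2), (3, 2, 7, 2)
```

The assertions below mirror items (i)–(iii) of Lemma 3.4 and the triangle inequality of Lemma 3.5.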
The following result, Lemma 3.6, can, in a slightly modified variant, be found, e.g., in [20, Lemma 5.4].
Lemma 3.6
(Existence of DNNs with \(H\in {\mathbb {N}}\) hidden layers for the identity in \({\mathbb {R}}\)) Assume Setting 3.1 and let \(H\in {\mathbb {N}}\). Then it holds that \(\mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{H+2} \})\).
Proof of Lemma 3.6
Throughout this proof let \(W_1\in {\mathbb {R}}^{2\times 1}\), \(W_i\in {\mathbb {R}}^{2\times 2}\), \(\,i\in [2,H]\cap {\mathbb {N}}\), \(W_{H+1}\in {\mathbb {R}}^{1\times 2}\), \(B_i\in {\mathbb {R}}^2\), \(i\in [1,H]\cap {\mathbb {N}}\), \(B_{H+1}\in {\mathbb {R}}^1\) satisfy that
$$\begin{aligned} &W_1= \begin{pmatrix} 1\\ -1 \end{pmatrix},\quad \forall i\in [2,H]\cap {\mathbb {N}}:W_i=\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} , \quad W_{H+1}= \begin{pmatrix} 1&-1 \end{pmatrix},\\&\forall i\in [1,H]\cap {\mathbb {N}}:B_i= \begin{pmatrix} 0\\ 0 \end{pmatrix},\quad B_{H+1}=0,\end{aligned}$$
(41)
let \(\phi \in {\mathbf {N}}\) satisfy that \(\phi =((W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1}))\), for every \(a\in {\mathbb {R}}\) let \(a^+\in [0,\infty )\) be the non-negative part of \(a\), i.e., \(a^+=\max \{a,0\}\), and let \(x_0\in {\mathbb {R}}\), \(x_1,x_2,\ldots ,x_{H}\in {\mathbb {R}}^2\) satisfy for all \(n\in {\mathbb {N}}\cap [1,H]\) that
$$x_n = \mathbf {A}_{2}(W_n x_{n-1}+B_n ).$$
(42)
Note that (41) and the definition of \({\mathcal {D}}\) (see (31)) imply that \({\mathcal {D}}(\phi )={\mathfrak {n}}_{H+2}\). Furthermore, (41), (42), and an induction argument show that
$$\begin{aligned} x_1&= \mathbf {A}_{2}(W_1x_0+B_1)= \mathbf {A}_{2}\left( \begin{pmatrix} x_0\\ -x_0 \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix},\\ x_2&= \mathbf {A}_{2}(W_2x_1+B_2)= \mathbf {A}_{2}(x_1)=\mathbf {A}_{2}\left( \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix} ,\\&\quad \vdots \\ x_{H}&= \mathbf {A}_{2}(W_{H}x_{H-1}+B_{H})= \mathbf {A}_{2}(x_{H-1})=\mathbf {A}_{2}\left( \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}. \end{aligned}$$
(43)
The definition of \({\mathcal {R}}\) (see (32)) hence ensures that
$$\begin{aligned} ({\mathcal {R}}(\phi ))(x_0)&=W_{H+1}x_{H}+B_{H+1}= \begin{pmatrix} 1&-1 \end{pmatrix} \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}=x_0^{+}-(-x_0)^{+}=x_0.\end{aligned}$$
(44)
The fact that \(x_0\) was arbitrary therefore proves that \({\mathcal {R}}(\phi ) =\mathrm {Id}_{{\mathbb {R}}}\). This and the fact that \({\mathcal {D}}(\phi )={\mathfrak {n}}_{H+2}\) demonstrate that \({\mathrm {Id}}_{{\mathbb {R}}}\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{H+2} \})\). The proof of Lemma 3.6 is thus completed. \(\square\)
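The construction in the proof of Lemma 3.6 is explicit enough to transcribe directly; a sketch (ours) building the network \(\phi\) from (41) with \(\mathcal {D}(\phi )={\mathfrak {n}}_{H+2}\) and checking \(\mathcal {R}(\phi )=\mathrm {Id}_{{\mathbb {R}}}\) at a few points:

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def realize(phi, x):             # R(Phi) as in (31)-(32)
    for W, B in phi[:-1]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)])
    W, B = phi[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def identity_net(H):
    """The network from (41): x is split into (x^+, (-x)^+), the 2x2
    identity matrix is applied H-1 times, and the parts are recombined."""
    layers = [([[1.0], [-1.0]], [0.0, 0.0])]
    layers += [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]) for _ in range(H - 1)]
    layers.append(([[1.0, -1.0]], [0.0]))
    return layers

phi = identity_net(4)            # D(phi) = (1, 2, 2, 2, 2, 1) = n_6
```

The recombination in the last layer uses \(x = x^+ - (-x)^+\), exactly as in (44).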
Lemma 3.7
(DNNs for affine transformations) Assume Setting 3.1 and let \(d,m\in {\mathbb {N}}\), \(\lambda \in {\mathbb {R}}\), \(b\in {\mathbb {R}}^d\), \(a\in {\mathbb {R}}^m\), \(\Psi \in {\mathbf {N}}\) satisfy that \({\mathcal {R}}(\Psi )\in C({\mathbb {R}}^d,{\mathbb {R}}^m)\). Then it holds that
$$\lambda \left( \left( \mathcal {R}(\Psi )\right) (\cdot +b)+a\right) \in \mathcal {R}\left( \{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Psi )\}\right) .$$
(45)
Proof of Lemma 3.7
Throughout this proof let \(H,k_0,k_1,\ldots ,k_{H+1}\in {\mathbb {N}}\) satisfy that
$$H+2=\dim \left( \mathcal {D}(\Psi )\right) \quad \text {and}\quad (k_0,k_1,\ldots ,k_{H},k_{H+1}) = \mathcal {D}(\Psi ),$$
(46)
let \(((W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})) \in \prod _{n=1}^{H+1}\left( {\mathbb {R}}^{k_n\times k_{n-1}}\times {\mathbb {R}}^{k_n}\right)\) satisfy that
$$\left( (W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})\right) =\Psi ,$$
(47)
let \(\phi \in {\mathbf {N}}\) satisfy that
$$\phi =\left( (W_1,B_1+W_1b),(W_2,B_2),\ldots ,(W_H,B_H),(\lambda W_{H+1},\lambda B_{H+1}+\lambda a)\right) ,$$
(48)
and let \(x_0,y_0 \in {\mathbb {R}}^{k_0},x_1,y_1 \in {\mathbb {R}}^{k_1},\ldots ,x_{H},y_H\in {\mathbb {R}}^{k_{H}}\) satisfy for all \(n\in {\mathbb {N}}\cap [1,H]\) that
$$x_n = {\mathbf {A}}_{k_n}(W_n x_{n-1}+B_n ),\, y_n = \mathbf {A}_{k_n}(W_n y_{n-1}+B_n+\mathbb {1}_{\{1\}}(n)W_1b ), \quad \text {and} \quad x_0=y_0+b.$$
(49)
Then it holds that
$$y_1= {\mathbf {A}}_{k_1}(W_1 y_{0}+B_1+W_1b )= {\mathbf {A}}_{k_1}(W_1( y_{0}+b)+B_1 ) = {\mathbf {A}}_{k_1}(W_1x_0+B_1 )=x_1.$$
(50)
This and an induction argument prove for all \(i\in [2,H]\cap {\mathbb {N}}\) that
$$\begin{aligned} y_i=\mathbf {A}_{k_i}(W_i y_{i-1}+B_i )= \mathbf {A}_{k_i}(W_i x_{i-1}+B_i )=x_i. \end{aligned}$$
(51)
The definition of \({\mathcal {R}}\) (see (32)) hence shows that
$$\begin{aligned} ({\mathcal {R}}(\phi ))(y_0)&= \lambda W_{H+1}y_H+\lambda B_{H+1}+\lambda a=\lambda W_{H+1}x_H+\lambda B_{H+1}+\lambda a\\ {}&=\lambda (W_{H+1}x_H+ B_{H+1}+ a) =\lambda ( (\mathcal {R}(\Psi ))(x_0)+a)= \lambda ((\mathcal {R}(\Psi ))(y_0+b)+a). \end{aligned}$$
(52)
This and the fact that \(y_0\) was arbitrary prove that \({\mathcal {R}}(\phi )=\lambda ((\mathcal {R}(\Psi ))(\cdot +b)+a)\). This and the fact that \({\mathcal {D}}(\phi )=\mathcal {D}(\Psi )\) imply that \(\lambda \left( (\mathcal {R}(\Psi ))(\cdot +b)+a\right) \in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Psi )\})\). The proof of Lemma 3.7 is thus completed. \(\square\)
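The proof of Lemma 3.7 only modifies the first and last layers of \(\Psi\); a sketch (ours, reusing the list-of-pairs representation) of the network \(\phi\) from (48):

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def realize(phi, x):
    for W, B in phi[:-1]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)])
    W, B = phi[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def affine_transform(psi, lam, b, a):
    """The network phi from (48) with R(phi) = lam * (R(psi)(. + b) + a):
    the shift b is absorbed into the first bias, lam and a into the last layer."""
    (W1, B1), (WL, BL) = psi[0], psi[-1]
    new_B1 = [b1 + sum(w * bi for w, bi in zip(row, b)) for row, b1 in zip(W1, B1)]
    new_WL = [[lam * w for w in row] for row in WL]
    new_BL = [lam * (bl + ai) for bl, ai in zip(BL, a)]
    return [(W1, new_B1)] + psi[1:-1] + [(new_WL, new_BL)]

# psi realizes x -> relu(2x + 1); shift by b = (0.5), add a = (1), scale by 3
psi = [([[2.0]], [1.0]), ([[1.0]], [0.0])]
phi = affine_transform(psi, 3.0, [0.5], [1.0])
```

Since only entries of existing layers change, the dimension vector is preserved, matching (45).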
Lemma 3.8
(Composition) Assume Setting 3.1 and let \(d_1,d_2,d_3\in {\mathbb {N}}\), \(f\in C({\mathbb {R}}^{d_2},{\mathbb {R}}^{d_3})\), \(g\in C( {\mathbb {R}}^{d_1}, {\mathbb {R}}^{d_2})\), \(\alpha ,\beta \in \mathbf {D}\) satisfy that \(f\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \})\) and \(g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\beta \})\). Then it holds that \((f\circ g)\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \odot \beta \})\).
Proof of Lemma 3.8
Throughout this proof let \(H_1,H_2,\alpha _0,\ldots , \alpha _{H_1+1},\beta _0,\ldots , \beta _{H_2+1}\in {\mathbb {N}}\), \(\Phi _{f}, \Phi _{g}\in \mathbf {N}\) satisfy that
$$\begin{aligned}&(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1})=\alpha , \quad (\beta _0,\beta _1,\ldots ,\beta _{H_2+1})=\beta , \quad \mathcal {R}(\Phi _{f})=f , \\&\quad \mathcal {D}(\Phi _{f})=\alpha , \quad \mathcal {R}(\Phi _{g})=g,\quad \text {and}\quad \mathcal {D}(\Phi _{g})=\beta . \end{aligned}$$
(53)
Lemma 5.4 in [20] shows that there exists \(\mathbb {I}\in \mathbf {N}\) such that \(\mathcal {D}(\mathbb {I})=d_2{\mathfrak {n}}_{3}= (d_2,2d_2,d_2)\) and \(\mathcal {R}(\mathbb {I}) =\mathrm {Id}_{{\mathbb {R}}^{d_2}}\). Note that, since \(\beta _{H_2+1}=d_2=\alpha _0\), it holds that \(2d_2=\beta _{H_2+1}+\alpha _0\). This and [20, Proposition 5.2] (with \(\phi _1= \Phi _{f}\), \(\phi _2= \Phi _{g}\), and \(\mathbb {I}=\mathbb {I}\) in the notation of [20, Proposition 5.2]) show that there exists \(\Phi _{f\circ g}\in \mathbf {N}\) such that \(\mathcal {R}( \Phi _{f\circ g})=f\circ g\) and \(\mathcal {D}(\Phi _{f\circ g})= \mathcal {D}(\Phi _{f})\odot \mathcal {D}(\Phi _{g})=\alpha \odot \beta\). Hence, it holds that \(f\circ g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \odot \beta \})\). The proof of Lemma 3.8 is thus completed. \(\square\)
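The gluing from [20, Proposition 5.2] used in the proof of Lemma 3.8 can be made concrete: the output layer of the network for \(g\) is merged with the first layer of the identity network \(\mathbb {I}\), and the second layer of \(\mathbb {I}\) with the first layer of the network for \(f\). A hedged sketch (our naming; the resulting dimension vector is \(\alpha \odot \beta\)):

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def realize(phi, x):
    for W, B in phi[:-1]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)])
    W, B = phi[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def compose(phi_f, phi_g):
    """Network for f o g: the linking hidden layer stores (z^+, (-z)^+)
    for z = output of g, from which f's first layer recovers z = z^+ - (-z)^+."""
    Wg, Bg = phi_g[-1]
    Wf, Bf = phi_f[0]
    link_W = [row[:] for row in Wg] + [[-w for w in row] for row in Wg]
    link_B = Bg + [-b for b in Bg]
    merge_W = [row + [-w for w in row] for row in Wf]   # W_f applied to [I, -I]
    return phi_g[:-1] + [(link_W, link_B)] + [(merge_W, Bf)] + phi_f[1:]

phi_g = [([[1.0, 1.0]], [0.0]), ([[1.0]], [0.0])]   # g(x) = relu(x1 + x2)
phi_f = [([[2.0]], [1.0]), ([[1.0]], [0.0])]        # f(y) = relu(2y + 1)
phi = compose(phi_f, phi_g)
```

Here \(\alpha =(1,1,1)\), \(\beta =(2,1,1)\), and the composed dimensions are \(\alpha \odot \beta =(2,1,2,1,1)\).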
The following result, Lemma 3.9, can, roughly speaking, be found in a specialized form, e.g., in [20, Lemma 5.1].
Lemma 3.9
(Sum of DNNs of the same length) Assume Setting 3.1 and let \(M,H,p,q\in {\mathbb {N}}\), \(h_1,h_2,\ldots ,h_M\in {\mathbb {R}}\), \(k_i\in \mathbf {D}\), \(f_i\in C({\mathbb {R}}^{p},{\mathbb {R}}^{q})\), \(i\in [1,M]\cap {\mathbb {N}}\), satisfy for all \(i\in [1,M]\cap {\mathbb {N}}\) that
$$\begin{aligned} \dim \left( k_i\right) =H+2\quad \text {and}\quad f_i\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=k_i\right\} \right) . \end{aligned}$$
(54)
Then it holds that
$$\begin{aligned} \sum _{i=1}^{M}h_if_i \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i\right\} \right) . \end{aligned}$$
(55)
Proof of Lemma 3.9
Throughout this proof let \(\phi _i\in \mathbf {N}\), \(i\in [1,M]\cap {\mathbb {N}}\), and \(k_{i,n}\in {\mathbb {N}}\), \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [0,H+1]\cap {\mathbb {N}}_0\), satisfy for all \(i \in [1,M]\cap {\mathbb {N}}\) that
$$\begin{aligned} \mathcal {D}(\phi _i)=k_i= (k_{i,0},k_{i,1},k_{i,2},\ldots ,k_{i,H},k_{i,H+1}) \quad \text {and}\quad \mathcal {R}(\phi _i)=f_i, \end{aligned}$$
(56)
for every \(i\in [1,M]\cap {\mathbb {N}}\) let \(((W_{i,1}, B_{i,1}),\ldots , (W_{i,H+1}, B_{i,H+1}))\in \prod _{n=1}^{H+1}\left( {\mathbb {R}}^{k_{i,n}\times k_{i,n-1}} \times {\mathbb {R}}^{k_{i,n}}\right)\) satisfy that
$$\begin{aligned} \phi _i= \left( (W_{i,1}, B_{i,1}),\ldots , (W_{i,H+1},B_{i,H+1})\right) , \end{aligned}$$
(57)
let \(k_n^{{{\,\mathrm{\boxplus }\,}}}\in {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\), \(k^{{{\,\mathrm{\boxplus }\,}}}\in {\mathbb {N}}^ {H+2}\) satisfy for all \(n\in [1,H]\cap {\mathbb {N}}\) that
$$\begin{aligned} \begin{aligned} k_n^{{{\,\mathrm{\boxplus }\,}}}=\sum _{i=1}^{M}k_{i,n}\quad \text {and}\quad k^{{{\,\mathrm{\boxplus }\,}}}=(p,k^{{{\,\mathrm{\boxplus }\,}}}_1,k^{{{\,\mathrm{\boxplus }\,}}}_2,\ldots , k^{{{\,\mathrm{\boxplus }\,}}}_{H},q), \end{aligned} \end{aligned}$$
(58)
let \(W_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}\times p}\), \(B_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}}\) satisfy that
$$\begin{aligned} W_1= \begin{pmatrix} W_{1,1}\\ W_{2,1}\\ \vdots \\ W_{M,1} \end{pmatrix} \quad \text {and}\quad B_1= \begin{pmatrix} B_{1,1}\\ B_{2,1}\\ \vdots \\ B_{M,1} \end{pmatrix}, \end{aligned}$$
(59)
let \(W_n\in {\mathbb {R}}^{k_n^{{{\,\mathrm{\boxplus }\,}}}\times k_{n-1}^{{{\,\mathrm{\boxplus }\,}}}}\), \(B_n\in {\mathbb {R}}^{k^{{{\,\mathrm{\boxplus }\,}}}_{n}}\), \(n\in [2,H]\cap {\mathbb {N}}\), satisfy for all \(n\in [2,H]\cap {\mathbb {N}}\) that
$$\begin{aligned} \begin{aligned} W_n= \begin{pmatrix} W_{1,n} & 0 & 0 & 0 \\ 0 & W_{2,n} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & W_{M,n} \end{pmatrix} \quad \text {and}\quad B_n= \begin{pmatrix} B_{1,n}\\ B_{2,n}\\ \vdots \\ B_{M,n} \end{pmatrix},\end{aligned} \end{aligned}$$
(60)
let \(W_{H+1}\in {\mathbb {R}}^{q\times k_{H}^{{{\,\mathrm{\boxplus }\,}}}}\), \(B_{H+1}\in {\mathbb {R}}^{q}\) satisfy that
$$\begin{aligned} \begin{aligned} W_{H+1}= \begin{pmatrix} h_1W_{1,H+1}&\ldots&h_MW_{M,H+1} \end{pmatrix}\quad \text {and}\quad B_{H+1} = \sum _{i=1}^{M}h_iB_{i,H+1}, \end{aligned} \end{aligned}$$
(61)
let \(x_0\in {\mathbb {R}}^p\), \(x_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}}, x_2\in {\mathbb {R}}^{k_2^{{{\,\mathrm{\boxplus }\,}}}},\ldots ,x_H\in {\mathbb {R}}^{k_H^{{{\,\mathrm{\boxplus }\,}}}}\), let \(x_{1,0},x_{2,0},\ldots ,x_{M,0}\in {\mathbb {R}}^{p}\), \(x_{i,n}\in {\mathbb {R}}^{k_{i,n}}\), \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\), satisfy for all \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\) that
$$\begin{aligned} &x_0=x_{1,0}=x_{2,0}=\cdots =x_{M,0},\\&x_{i,n}=\mathbf {A}_{k_{i,n}}(W_{i,n}x_{i,n-1}+B_{i,n}),\\&x_n= \mathbf {A}_{k^{{{\,\mathrm{\boxplus }\,}}}_{n}}(W_{n}x_{n-1}+B_{n}), \end{aligned}$$
(62)
and let \(\psi \in {\mathbf {N}}\) satisfy that
$$\begin{aligned} \psi = \left( (W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})\right) . \end{aligned}$$
(63)
First, the definitions of \(\mathcal {D}\) and \(\mathcal {R}\) (see (31) and (32)), (56), and the fact that \(\forall \, i\in [1,M]\cap {\mathbb {N}}:f_i\in C({\mathbb {R}}^p,{\mathbb {R}}^q)\) show for all \(i\in [1,M]\cap {\mathbb {N}}\) that \(k_i= (p,k_{i,1},k_{i,2},\ldots ,k_{i,H},q).\) The definition of \(\mathcal {D}\) (see (31)), the definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)), and (58) then show that
$$\begin{aligned} \mathcal {D}(\psi )= (p,k_1^{{{\,\mathrm{\boxplus }\,}}},\ldots ,k_H^{{{\,\mathrm{\boxplus }\,}}},q)={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i. \end{aligned}$$
(64)
Next, we prove by induction on \(n\in [1,H]\cap {\mathbb {N}}\) that \(x_n=(x_{1,n},x_{2,n},\ldots ,x_{M,n})\). First, (59) shows that
$$\begin{aligned} W_1x_0+B_1= \begin{pmatrix} W_{1,1}\\ W_{2,1}\\ \vdots \\ W_{M,1} \end{pmatrix}x_0+ \begin{pmatrix} B_{1,1}\\ B_{2,1}\\ \vdots \\ B_{M,1} \end{pmatrix} = \begin{pmatrix} W_{1,1}x_0+B_{1,1}\\ W_{2,1}x_0+B_{2,1}\\ \vdots \\ W_{M,1}x_0+B_{M,1} \end{pmatrix}. \end{aligned}$$
(65)
This implies that
$$\begin{aligned} x_1= \mathbf {A}_{k_1^{{{\,\mathrm{\boxplus }\,}}}}(W_1x_0+B_1)=\begin{pmatrix} x_{1,1}\\ x_{2,1}\\ \vdots \\ x_{M,1}\end{pmatrix}. \end{aligned}$$
(66)
This proves the base case. Next, for the induction step let \(n\in [2,H]\cap {\mathbb {N}}\) and assume that \(x_{n-1}=(x_{1,n-1},x_{2,n-1},\ldots ,x_{M,n-1})\). Then (60) and the induction hypothesis ensure that
$$\begin{aligned} \begin{aligned}&W_nx_{n-1}+B_n \\&\quad = W_{n}\begin{pmatrix} x_{1,n-1}\\ x_{2,n-1}\\ \vdots \\ x_{M,n-1} \end{pmatrix}+B_{n} =\begin{pmatrix} W_{1,n} & 0 & 0 & 0 \\ 0 & W_{2,n} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & W_{M,n} \end{pmatrix} \begin{pmatrix} x_{1,n-1}\\ x_{2,n-1}\\ \vdots \\ x_{M,n-1} \end{pmatrix}+ \begin{pmatrix} B_{1,n}\\ B_{2,n}\\ \vdots \\ B_{M,n} \end{pmatrix} \\&\quad = \begin{pmatrix} W_{1,n}x_{1,n-1}+ B_{1,n}\\ W_{2,n}x_{2,n-1}+B_{2,n}\\ \vdots \\ W_{M,n}x_{M,n-1}+ B_{M,n} \end{pmatrix}.\end{aligned} \end{aligned}$$
(67)
This yields that
$$\begin{aligned} x_{n}= \mathbf {A}_{k_n^{{{\,\mathrm{\boxplus }\,}}}}(W_nx_{n-1}+B_n)=\begin{pmatrix} x_{1,n}\\ x_{2,n}\\ \vdots \\ x_{M,n} \end{pmatrix}. \end{aligned}$$
(68)
This proves the induction step. Induction now proves for all \(n\in [1,H]\cap {\mathbb {N}}\) that \(x_n=(x_{1,n},x_{2,n},\ldots ,x_{M,n})\). This, the definition of \(\mathcal {R}\) (see (32)), and (61) imply that
$$\begin{aligned} \begin{aligned}&(\mathcal {R}(\psi ))(x_0)=W_{H+1}x_H+B_{H+1}\\&\quad =W_{H+1}\begin{pmatrix} x_{1,H}\\ x_{2,H}\\ \vdots \\ x_{M,H} \end{pmatrix}+B_{H+1} =\begin{pmatrix} h_1W_{1,H+1}&\ldots&h_MW_{M,H+1} \end{pmatrix} \begin{pmatrix} x_{1,H}\\ x_{2,H}\\ \vdots \\ x_{M,H} \end{pmatrix}+\left[ \sum _{i=1}^{M}h_iB_{i,H+1}\right] \\&\quad =\left[ \sum _{i=1}^{M}h_iW_{i,H+1}x_{i,H}\right] +\left[ \sum _{i=1}^{M}h_iB_{i,H+1}\right] = \sum _{i=1}^{M}h_i\left( W_{i,H+1}x_{i,H}+B_{i,H+1}\right) \\&\quad =\sum _{i=1}^M h_i(\mathcal {R}(\phi _i))(x_0). \end{aligned} \end{aligned}$$
(69)
This, the fact that \(x_0\in {\mathbb {R}}^{p}\) was arbitrary, and (56) yield that
$$\begin{aligned} \mathcal {R}(\psi )= \sum _{i=1}^{M}h_i\mathcal {R}(\phi _i)=\sum _{i=1}^{M}h_if_i. \end{aligned}$$
(70)
This and (64) show that
$$\begin{aligned} \sum _{i=1}^{M}h_if_i \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i\right\} \right) . \end{aligned}$$
(71)
The proof of Lemma 3.9 is thus completed. \(\square\)
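The block-diagonal construction of \(\psi\) in (59)–(61) translates directly into code; a sketch (ours) building \(\psi\) from networks of equal depth and weights \(h_i\):

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def realize(phi, x):
    for W, B in phi[:-1]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)])
    W, B = phi[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def block_diag(mats):
    total = sum(len(M[0]) for M in mats)
    out, off = [], 0
    for M in mats:
        for row in M:
            out.append([0.0] * off + row + [0.0] * (total - off - len(row)))
        off += len(M[0])
    return out

def sum_nets(phis, hs):
    """The network psi from (59)-(61) with R(psi) = sum_i h_i R(phi_i):
    stacked first layer, block-diagonal middle layers, weighted last layer."""
    L = len(phis[0])                       # all phi_i have L = H + 1 layer pairs
    first = ([row[:] for phi in phis for row in phi[0][0]],
             [b for phi in phis for b in phi[0][1]])
    mids = [(block_diag([phi[n][0] for phi in phis]),
             [b for phi in phis for b in phi[n][1]])
            for n in range(1, L - 1)]
    q = len(phis[0][-1][1])
    last_W = [[h * w for phi, h in zip(phis, hs) for w in phi[-1][0][r]]
              for r in range(q)]
    last_B = [sum(h * phi[-1][1][r] for phi, h in zip(phis, hs)) for r in range(q)]
    return [first] + mids + [(last_W, last_B)]

f1 = [([[1.0]], [0.0]), ([[1.0]], [0.0])]    # realizes relu(x)
f2 = [([[-1.0]], [0.0]), ([[1.0]], [0.0])]   # realizes relu(-x)
psi = sum_nets([f1, f2], [1.0, -1.0])        # relu(x) - relu(-x) = x
```

The hidden widths of `psi` add up across the summands, matching \(\mathcal {D}(\psi )={{\,\mathrm{\boxplus }\,}}_{i=1}^{M}k_i\) in (64).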
Deep neural network representations for MLP approximations
Lemma 3.10
Assume Setting 3.1, let \(d,M\in {\mathbb {N}}\), \(T,c \in (0,\infty )\), \(f\in C({\mathbb {R}},{\mathbb {R}})\), \(g \in C( {\mathbb {R}}^d, {\mathbb {R}})\), \(\Phi _f,\Phi _g\in \mathbf {N}\) satisfy that \(\mathcal {R}(\Phi _f)= f\), \(\mathcal {R}(\Phi _g)= g\), and
$$\begin{aligned} c\ge \max \left\{ 2, |||\mathcal {D}(\Phi _{f})|||,|||\mathcal {D}(\Phi _{g})|||\right\} , \end{aligned}$$
(72)
let \((\Omega , \mathcal {F}, {\mathbb {P}})\) be a probability space, let \(\Theta = \bigcup _{ n \in {\mathbb {N}}} {\mathbb {Z}}^n\), let \({\mathfrak {u}}^\theta :\Omega \rightarrow [0,1]\), \(\theta \in \Theta\), be independent random variables which are uniformly distributed on \([0,1]\), let \(\mathcal {U}^\theta :[0,T]\times \Omega \rightarrow [0, T]\), \(\theta \in \Theta\), satisfy for all \(t\in [0,T]\), \(\theta \in \Theta\) that \(\mathcal {U}^\theta _t = t+ (T-t){\mathfrak {u}}^\theta\), let \(W^\theta :[0,T]\times \Omega \rightarrow {\mathbb {R}}^d\), \(\theta \in \Theta\), be independent standard Brownian motions with continuous sample paths, assume that \(({\mathfrak {u}}^\theta )_{\theta \in \Theta }\) and \((W^\theta )_{\theta \in \Theta }\) are independent, let \({U}_{ n,M}^{\theta } :[0, T] \times {\mathbb {R}}^d \times \Omega \rightarrow {\mathbb {R}}\), \(n\in {\mathbb {Z}}\), \(\theta \in \Theta\), satisfy for all \(n \in {\mathbb {N}}\), \(\theta \in \Theta\), \(t \in [0,T]\), \(x\in {\mathbb {R}}^d\) that \({U}_{-1,M}^{\theta }(t,x)={U}_{0,M}^{\theta }(t,x)=0\) and
$$\begin{aligned} {U}_{n,M}^{\theta }(t,x)&= \frac{1}{M^n} \sum _{i=1}^{M^n} g\left( x+W^{(\theta ,0,-i)}_{T}-W^{(\theta ,0,-i)}_{t}\right) \\&\quad + \sum _{l=0}^{n-1} \frac{(T-t)}{M^{n-l}} \left[ \sum _{i=1}^{M^{n-l}} \left( f\circ {U}_{l,M}^{(\theta ,l,i)}-\mathbb {1}_{{\mathbb {N}}}(l)\,f\circ {U}_{l-1,M}^{(\theta ,-l,i)}\right) \left( \mathcal {U}_t^{(\theta ,l,i)},x+W_{\mathcal {U}_t^{(\theta ,l,i)}}^{(\theta ,l,i)}-W_{t}^{(\theta ,l,i)}\right) \right] , \end{aligned}$$
(73)
and let \(\omega \in \Omega\). Then for all \(n\in {\mathbb {N}}_0\) there exists a family \((\Phi _{n,t}^{\theta })_{\theta \in \Theta ,t\in [0,T]}\subseteq \mathbf {N}\) such that
- (i)
it holds for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that
$${\mathcal {D}}\left( \Phi _{n,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{n,t_2}^{\theta _2}\right) ,$$
(74)
- (ii)
it holds for all \(t\in [0,T]\), \(\theta \in \Theta\) that
$$\dim \left( \mathcal {D}\left( \Phi _{n,t}^{\theta }\right) \right) = n\left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) ,$$
(75)
- (iii)
it holds for all \(t\in [0,T]\), \(\theta \in \Theta\) that
$$|||\mathcal {D}(\Phi _{n,t}^{\theta } )|||\le c(3 M)^n,$$
(76)
and
- (iv)
it holds for all \(\theta \in \Theta\), \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\) that
$${U}_{n,M}^{\theta }(t,x,\omega )=(\mathcal {R}(\Phi _{n,t}^{\theta }))(x).$$
(77)
Proof of Lemma 3.10
We prove Lemma 3.10 by induction on \(n\in {\mathbb {N}}_0\). For the base case \(n=0\) note that the fact that \(\forall \, t\in [0,T],\theta \in \Theta :U^\theta _{0,M}(t,\cdot ,\omega )=0\), the fact that the function 0 can be represented by a network with depth \(\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\), and (72) imply that there exists \((\Phi _{0,t}^{\theta })_{\theta \in \Theta , t\in [0,T]}\subseteq \mathbf {N}\) such that it holds for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that \(\mathcal {D}\left( \Phi _{0,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{0,t_2}^{\theta _2}\right)\) and such that it holds for all \(\theta \in \Theta\), \(t\in [0,T]\) that \(\dim \left( \mathcal {D}(\Phi _{0,t}^{\theta })\right) =\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\), \(|||\mathcal {D}(\Phi _{0,t}^{\theta } )|||\le |||\mathcal {D}(\Phi _{g})|||\le c\), and \({U}_{0,M}^{\theta }(t,\cdot ,\omega )= \mathcal {R}(\Phi _{0,t}^{\theta })\). This proves the base case \(n=0\).
For the induction step from \(n\in {\mathbb {N}}_0\) to \(n+1\in {\mathbb {N}}\) let \(n\in {\mathbb {N}}_0\) and assume that (i)–(iv) hold true for all \(k\in [0,n]\cap {\mathbb {N}}_0\). The assumption that \(g=\mathcal {R}(\Phi _g)\) and Lemma 3.7 (with \(d=d\), \(m=1\), \(\lambda =1\), \(a=0\), \(b=W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\), and \(\Psi =\Phi _g\) for \(\theta \in \Theta\), \(t\in [0,T]\) in the notation of Lemma 3.7) show for all \(\theta \in \Theta\), \(t\in [0,T]\) that
$$\begin{aligned} \begin{aligned} g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right)&=(\mathcal {R}(\Phi _g))\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right) \\&\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{g}) \right\} \right) . \end{aligned} \end{aligned}$$
(78)
Furthermore, Lemma 3.6 (with \(H=(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) -1\) in the notation of Lemma 3.6) ensures that
$$\begin{aligned} \mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}\right\} \right) . \end{aligned}$$
(79)
This, (78), and Lemma 3.8 (with \(d_1=d\), \(d_2=1\), \(d_3=1\), \(f=\mathrm {Id}_{{\mathbb {R}}}\), \(g=g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right)\), \(\alpha ={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}\), and \(\beta =\mathcal {D}(\Phi _g)\) for \(\theta \in \Theta\), \(t\in [0,T]\) in the notation of Lemma 3.8) show that for all \(\theta \in \Theta\), \(t\in [0,T]\) it holds that
$$\begin{aligned} \begin{aligned} g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right) \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1} \odot \mathcal {D}(\Phi _{g}) \right\} \right) . \end{aligned} \end{aligned}$$
(80)
Next, the induction hypothesis implies for all \(\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} {U}_{l,M}^{\theta }(t,\cdot ,\omega )=\mathcal {R}(\Phi _{l,t}^{\theta })\quad \text {and}\quad \mathcal {D}\left( \Phi _{l,t}^{\theta }\right) =\mathcal {D}\left( \Phi _{l,0}^{0}\right) . \end{aligned}$$
(81)
This and Lemma 3.7 (with
$$\begin{aligned} \begin{aligned}&d=d,\quad m=1,\quad a=0,\quad b=W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\quad \text {and}\\&\Psi =\Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\quad \text {for}\quad \theta ,\eta \in \Theta , \quad t\in [0,T],\quad l\in [0,n]\cap {\mathbb {N}}_0 \end{aligned} \end{aligned}$$
(82)
in the notation of Lemma 3.7) imply that for all \(\theta ,\eta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n]\cap {\mathbb {N}}_0\) it holds that
$$\begin{aligned} \begin{aligned}&U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad =\left( \mathcal {R}\left( \Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\right) \right) \left( \cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega )\right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= \mathcal {D}\left( \Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\right) \right\} \right) = \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= \mathcal {D}\left( \Phi _{l,0}^{0}\right) \right\} \right) . \end{aligned} \end{aligned}$$
(83)
Moreover, Lemma 3.6 (with \(H=(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) -1\) for \(l\in [0,n-1]\cap {\mathbb {N}}_0\) in the notation of Lemma 3.6) ensures for all \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} \mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \right\} \right) . \end{aligned}$$
(84)
This, (83), and Lemma 3.8 (with
$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=\mathrm {Id}_{{\mathbb {R}}}, \quad \alpha ={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}, \quad \\&\quad \beta =\mathcal {D}\left( \Phi _{l,0}^{0}\right) ,\quad \text {and}\quad g= U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \qquad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T],\quad l\in [0,n-1]\cap {\mathbb {N}}_0\\ \end{aligned} \end{aligned}$$
(85)
in the notation of Lemma 3.8) prove for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} &U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right\} \right) . \end{aligned}$$
(86)
This and Lemma 3.8 (with
$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=f,\quad \alpha = \mathcal {D}(\Phi _f),\quad \\&\beta ={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}),\quad \text {and} \quad g=U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \qquad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T], \quad l\in [0,n-1]\cap {\mathbb {N}}_0 \end{aligned} \end{aligned}$$
(87)
in the notation of Lemma 3.8) assure for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} \begin{aligned}&\left( f\circ U_{l,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right\} \right) . \end{aligned} \end{aligned}$$
(88)
Next, (83) (with \(l=n\)) and Lemma 3.8 (with
$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=f,\quad \alpha = \mathcal {D}(\Phi _f),\quad \beta =\mathcal {D}\left( \Phi _{n,0}^{0}\right) ,\quad \text {and}\\&\quad g=\left( U_{n,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \quad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T] \end{aligned} \end{aligned}$$
(89)
in the notation of Lemma 3.8) prove for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\) that
$$\begin{aligned} \begin{aligned}&\left( f\circ U_{n,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0}) \right\} \right) . \end{aligned} \end{aligned}$$
(90)
Furthermore, the definition of \(\odot\) in (33) and the fact that
$$\begin{aligned} \forall \, l\in [0,n]\cap {\mathbb {N}}_0:\dim \left( \mathcal {D}(\Phi _{l,0}^{0})\right) =l \left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) + \dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \end{aligned}$$
(91)
in the induction hypothesis imply that
$$\begin{aligned} \begin{aligned}&\dim \left( {\mathfrak {n}}_{{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}\odot \mathcal {D}(\Phi _{g})\right) \\&\quad =\left[ (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1\right] +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) -1\\&\quad =(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) , \end{aligned} \end{aligned}$$
(92)
that
$$\begin{aligned} \begin{aligned}&\dim \left( \mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0}) \right) = \dim \left( \mathcal {D}(\Phi _{f})\right) +\dim \left( \mathcal {D}(\Phi _{n,0}^{0})\right) -1\\&\quad = \dim \left( \mathcal {D}(\Phi _{f})\right) +\left[ n\left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \right] -1\\&\quad = (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) ,\end{aligned} \end{aligned}$$
(93)
and for all \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} \begin{aligned}&\dim \left( \mathcal {D}(\Phi _{f}) \odot {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right) \\&\quad = \dim \left( \mathcal {D}(\Phi _{f})\right) +\dim \left( {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}\right) +\dim \left( \mathcal {D}(\Phi _{l,0}^{0}) \right) -2\\&\quad =\dim \left( \mathcal {D}(\Phi _{f})\right) +\left[ (n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1 \right] \\&\qquad + \left[ l \left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) + \dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \right] -2\\&\quad =\dim \left( \mathcal {D}(\Phi _{f})\right) + n\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) -1\\&\quad = (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) . \end{aligned} \end{aligned}$$
(94)
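Since (92)–(94) are pure integer arithmetic, they can be verified mechanically. The following Python sketch (an illustration only; the sample ranges for \(n\), \(\dim (\mathcal {D}(\Phi _f))\), and \(\dim (\mathcal {D}(\Phi _g))\) are hypothetical) uses only the depth rule \(\dim (\alpha \odot \beta )=\dim (\alpha )+\dim (\beta )-1\) from (33) and the induction hypothesis (91):

```python
# Mechanical check of the depth identities (92)-(94). Only the rule
# dim(alpha . beta) = dim(alpha) + dim(beta) - 1 from (33) and the
# induction hypothesis (91) are used; the sample ranges are hypothetical.

def dim_odot(dim_a, dim_b):
    # depth of a composed network shape
    return dim_a + dim_b - 1

def depth_identities_hold(n, dim_f, dim_g):
    """dim_f, dim_g stand for dim(D(Phi_f)) and dim(D(Phi_g))."""
    target = (n + 1) * (dim_f - 1) + dim_g
    # (92): n_{(n+1)(dim_f - 1) + 1} composed with D(Phi_g)
    ok = dim_odot((n + 1) * (dim_f - 1) + 1, dim_g) == target
    # (93): D(Phi_f) composed with D(Phi_{n,0}^0); (91) gives the latter's depth
    ok &= dim_odot(dim_f, n * (dim_f - 1) + dim_g) == target
    # (94): D(Phi_f), then n_{(n-l)(dim_f - 1) + 1}, then D(Phi_{l,0}^0), l < n
    for l in range(n):
        mid = dim_odot((n - l) * (dim_f - 1) + 1, l * (dim_f - 1) + dim_g)
        ok &= dim_odot(dim_f, mid) == target
    return ok

all_ok = all(depth_identities_hold(n, df, dg)
             for n in range(1, 6) for df in range(2, 7) for dg in range(2, 7))
```

Here `all_ok` confirms that each of the three shapes has depth \((n+1)(\dim (\mathcal {D}(\Phi _f))-1)+\dim (\mathcal {D}(\Phi _g))\).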
This shows, roughly speaking, that the functions in (80), (90), and (88) can be represented by networks of the same depth (i.e., the same number of layers), namely \((n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\). Hence, Lemma 3.9 and (73) imply that there exists a family \((\Phi _{n+1,t}^{\theta })_{\theta \in \Theta , t\in [0,T]}\subseteq \mathbf {N}\) such that for all \(\theta \in \Theta\), \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\) it holds that
$$\begin{aligned} \begin{aligned}&\left( \mathcal {R}(\Phi _{n+1,t}^{\theta })\right) (x) \\&\quad = \frac{1}{M^{n+1}} \sum _{i=1}^{M^{n+1}} g\left( x+W^{(\theta ,0,-i)}_{T}(\omega )-W^{(\theta ,0,-i)}_{t}(\omega )\right) \\&\qquad + \frac{(T-t)}{M} \sum _{i=1}^{M} \left( f\circ {U}_{n,M}^{(\theta ,n,i)}\right) \left( \mathcal {U}_t^{(\theta ,n,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,n,i)}(\omega )}^{(\theta ,n,i)}(\omega )- W_{t}^{(\theta ,n,i)}(\omega ),\omega \right) \\&\qquad + \sum _{l=0}^{n-1} \frac{(T-t)}{M^{n+1-l}} \sum _{i=1}^{M^{n+1-l}} \left( f\circ {U}_{l,M}^{(\theta ,l,i)}\right) \left( \mathcal {U}_t^{(\theta ,l,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,l,i)}(\omega )}^{(\theta ,l,i)}(\omega )- W_{t}^{(\theta ,l,i)}(\omega ),\omega \right) \\&\qquad -\sum _{l=1}^{n} \frac{(T-t)}{M^{n+1-l}} \sum _{i=1}^{M^{n+1-l}} \left( f\circ {U}_{l-1,M}^{(\theta ,-l,i)} \right) \left( \mathcal {U}_t^{(\theta ,l,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,l,i)}(\omega )}^{(\theta ,l,i)}(\omega )- W_{t}^{(\theta ,l,i)}(\omega ),\omega \right) \\&\quad = {U}_{n+1,M}^{\theta }(t,x,\omega ), \end{aligned} \end{aligned}$$
(95)
that
$$\begin{aligned} \dim \left( \mathcal {D}(\Phi _{n+1,t}^{\theta })\right) = (n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) , \end{aligned}$$
(96)
and that
$$\begin{aligned} \begin{aligned} \mathcal {D}(\Phi _{n+1,t}^{\theta })& = \left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1}}}\left[ {\mathfrak {n}}_{{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{g})\right] \right) {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}} \left( \mathcal {D}\left( \Phi _{f}\right) \odot \mathcal {D}\left( \Phi _{n,0}^{0}\right) \right) \right) \\&\quad {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{l=0}^{n-1}}{\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1-l}}}\left[ \left( \mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l,0}^{0})\right) \right) \right. \\&\left. \quad {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{l=1}^{n}}{\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1-l}}} \left( \mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l-1,0}^{0})\right) \right] \right) .\end{aligned} \end{aligned}$$
(97)
This shows for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that
$$\begin{aligned} \mathcal {D}\left( \Phi _{n+1,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{n+1,t_2}^{\theta _2}\right) . \end{aligned}$$
(98)
Furthermore, (97), the triangle inequality (see Lemma 3.5), and the fact that
$$\begin{aligned} \forall \, l\in [0,n]\cap {\mathbb {N}}_0:|||\mathcal {D}(\Phi _{l,0}^{0} )|||\le c(3 M)^l \end{aligned}$$
(99)
in the induction hypothesis show for all \(\theta \in \Theta\), \(t\in [0,T]\) that
$$\begin{aligned} \begin{aligned} |||\mathcal {D}(\Phi _{n+1,t}^{\theta })|||&\le \sum _{i=1}^{M^{n+1}}||| {\mathfrak {n}}_{{(n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+1}} \odot \mathcal {D}(\Phi _{g})|||+\sum _{i=1}^{M}|||\mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0})|||\\&\quad + \sum _{l=0}^{n-1}\sum _{i=1}^{M^{n+1-l}} |||\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l,0}^{0})|||\\&\quad + \sum _{l=1}^{n}\sum _{i=1}^{M^{n+1-l}}|||\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l-1,0}^{0})|||. \end{aligned} \end{aligned}$$
(100)
Note that for all \(H_1,H_2,\alpha _0,\ldots ,\alpha _{H_1+1},\beta _0,\ldots , \beta _{H_2+1}\in {\mathbb {N}}\), \(\alpha ,\beta \in \mathbf {D}\) with \(\alpha =(\alpha _0,\ldots ,\alpha _{H_1+1})\), \(\beta =(\beta _0,\ldots , \beta _{H_2+1})\), \(\alpha _0=\beta _{H_2+1}=1\) it holds that \(|||\alpha \odot \beta |||\le \max \{|||\alpha |||,|||\beta |||,2\}\) (see (33)). This, (100), the fact that \(\forall \, H\in {\mathbb {N}}:|||{\mathfrak {n}}_{{H+2}}|||=2\) (see (35)), (72), and (99) prove for all \(\theta \in \Theta\), \(t\in [0,T]\) that
$$\begin{aligned} \begin{aligned}&|||\mathcal {D}(\Phi _{n+1,t}^{\theta })|||\\&\quad \le \left[ \sum _{i=1}^{M^{n+1}}c \right] + \left[ \sum _{i=1}^{M}c({3} M)^n\right] + \left[ \sum _{l=0}^{n-1}\sum _{i=1}^{M^{n+1-l}}c({3} M)^l\right] +\left[ \sum _{l=1}^{n}\sum _{i=1}^{M^{n+1-l}}c({3} M)^{l-1}\right] \\&\quad = M^{n+1}c+Mc(3M)^{n}+\left[ \sum _{l=0}^{n-1}M^{n+1-l}c(3M)^l\right] +\left[ \sum _{l=1}^{n}M^{n+1-l}c(3M)^{l-1}\right] \\&\quad \leq M^{n+1}c\left[ 1+3^n+\sum _{l=0}^{n-1}3^l+\sum _{l=1}^{n}3^{l-1}\right] = M^{n+1}c\left[ 1+\sum _{l=0}^{n}3^l+\sum _{l=1}^{n}3^{l-1}\right] \\&\quad \le cM^{n+1}\left[ 1+2\sum _{l=0}^{n} {3} ^l\right] = cM^{n+1}\left[ 1+2\frac{{3}^{n+1}-1}{{3}-1}\right] = c({3} M)^{n+1}. \end{aligned} \end{aligned}$$
(101)
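The elementary estimate in (101) can likewise be checked numerically. The sketch below (the ranges for \(M\), \(n\) and the values of \(c\) are hypothetical samples) sums the four bounds from (100) and compares the total with \(c(3M)^{n+1}\):

```python
# Numerical check of the counting estimate (101): the four sums in (100),
# bounded term-wise via (72) and (99), total at most c * (3M)^(n+1).
# The ranges for M, n and the values of c are hypothetical samples.

def four_sums(M, n, c):
    total = M ** (n + 1) * c                       # M^(n+1) terms of size c
    total += M * c * (3 * M) ** n                  # M terms of size c(3M)^n
    total += sum(M ** (n + 1 - l) * c * (3 * M) ** l for l in range(n))
    total += sum(M ** (n + 1 - l) * c * (3 * M) ** (l - 1)
                 for l in range(1, n + 1))
    return total

ok = all(four_sums(M, n, c) <= c * (3 * M) ** (n + 1)
         for M in range(1, 7) for n in range(6) for c in (1, 2.5, 10))
```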
Combining (95), (96), (98), and (101) completes the induction step. Induction hence establishes (i)–(iv). The proof of Lemma 3.10 is thus completed. \(\square\)
Deep neural network approximations for the PDE nonlinearity
Lemma 3.11
(DNN interpolation) Assume Setting 3.1, let \(N\in {\mathbb {N}}\), \(a_0,a_1,\ldots , a_{N-1},\xi _0,\xi _1,\ldots ,\xi _N\in {\mathbb {R}}\) satisfy that \(\xi _0<\xi _1<\ldots <\xi _N\), let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a function, assume for all \(x\in (-\infty ,\xi _0]\) that \(f(x)=f(\xi _0)\), assume for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that \(f(x)=f(\xi _n)+a_n(x-\xi _n)\), and assume for all \(x\in (\xi _N,\infty )\) that \(f(x)=f(\xi _N)\). Then it holds that
$$f\in {\mathcal {R}}(\{\Phi \in {\mathbf {N}}:{\mathcal {D}}(\Phi )=(1,N+1,1)\}).$$
(102)
Proof of Lemma 3.11
Throughout this proof let \(a_{-1}=0\) and \(a_N=0\), let \(c_n\in {\mathbb {R}}\), \(n\in [0,N]\cap {\mathbb {Z}}\), be the real numbers which satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(c_n=a_{n}-a_{n-1}\), let \(W_1\in {\mathbb {R}}^{(N+1)\times 1}\), \(B_1\in {\mathbb {R}}^{N+1}\), \(W_2\in {\mathbb {R}}^{1\times (N+1)}\), \(B_2\in {\mathbb {R}}\), \(\Phi \in \mathbf {N}\) be given by
$$\begin{aligned}&W_1 = \begin{pmatrix} 1\\ 1\\ \vdots \\ 1 \end{pmatrix} ,\quad B_1=\begin{pmatrix} -\xi _0\\ -\xi_1\\ \vdots \\ -\xi_N \end{pmatrix} ,\quad W_2= \begin{pmatrix} c_0&c_1&\ldots&c_N \end{pmatrix},\quad B_2= f(\xi _0), \end{aligned}$$
(103)
and
$$\begin{aligned} \Phi = ((W_1,B_1),(W_2,B_2)), \end{aligned}$$
(104)
and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\) that
$$\begin{aligned} g(x)=f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}. \end{aligned}$$
(105)
First, observe that the fact that \(\forall \,n\in [0,N-1]\cap {\mathbb {Z}}:\xi _n<\xi _{n+1}\) and the fact that \(\forall \, n\in [0,N]\cap {\mathbb {Z}}:a_n= \sum _{k=0}^{n}c_k\) show for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that
$$\begin{aligned} \begin{aligned} g(x)-g(\xi _n)&= \left[ \sum _{k=0}^{N}c_k\left( \max \{x-\xi _k,0\}-\max \{\xi _{n}-\xi _k,0\}\right) \right] \\&=\sum _{k=0}^{n}c_k [(x-\xi _k)-(\xi _{n}-\xi _k)]= \sum _{k=0}^{n}c_k (x-\xi _{n})=a_n(x-\xi _n). \end{aligned} \end{aligned}$$
(106)
This shows for all \(n\in [0,N-1]\cap {\mathbb {Z}}\) that \(g\) is affine linear on the interval \((\xi _n,\xi _{n+1}]\). This, the fact that for all \(n\in [0,N-1]\cap {\mathbb {Z}}\) it holds that \(f\) is affine linear on the interval \((\xi _n,\xi _{n+1}]\), the fact that \(\forall \, x\in (-\infty ,\xi _0]:f(x)= g(x)=f(\xi _0)\), and an induction argument imply for all \(x\in (-\infty ,\xi _N]\) that \(f(x)=g(x)\). Furthermore, (105), the fact that \(\forall \,n\in [0,N-1]\cap {\mathbb {Z}}:\xi _n<\xi _{n+1}\), and the fact that \(\sum _{k=0}^{N}c_k=0\) imply for all \(x\in (\xi _N,\infty )\) that
$$\begin{aligned} \begin{aligned} g(x)-g(\xi _N)&= \left[ \sum _{k=0}^{N}c_k\left( \max \{x-\xi _k,0\}-\max \{\xi _{N}-\xi _k,0\}\right) \right] \\&=\sum _{k=0}^{N}c_k [(x-\xi _k)-(\xi _{N}-\xi _k)]= \sum _{k=0}^{N}c_k (x-\xi _{N})=0.\end{aligned} \end{aligned}$$
(107)
This shows for all \(x\in (\xi _N,\infty )\) that \(g(x)=g(\xi _N)\). This, the fact that \(\forall \,x\in (\xi _N,\infty ):f(x)=f(\xi _N)\), the fact that \(\forall \,x\in (-\infty ,\xi _N]:f(x)=g(x)\), and (105) prove for all \(x\in {\mathbb {R}}\) that
$$\begin{aligned} f(x)=g(x)=f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}. \end{aligned}$$
(108)
Next, the definition of \(\mathcal {R}\) and \(\mathcal {D}\) (see (31) and (32)), (103), (104), and (108) imply that for all \(x\in {\mathbb {R}}\) it holds that \(\mathcal {D}(\Phi )=(1,N+1,1)\) and
$$\begin{aligned} \begin{aligned}&(\mathcal {R}(\Phi ))(x)= W_2( \mathbf {A}_{N+1}(W_1x+B_1))+B_2\\ {}&= \begin{pmatrix} c_0&c_1&\ldots&c_N \end{pmatrix} \begin{pmatrix} \max \{x-\xi _0,0\}\\ \max \{x-\xi _1,0\}\\ \vdots \\ \max \{x-\xi _N,0\} \end{pmatrix}+f(\xi _0)= f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}=f(x).\end{aligned} \end{aligned}$$
(109)
This establishes (102). The proof of Lemma 3.11 is thus completed. \(\square\)
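The construction (103)–(105) is fully explicit and can be reproduced numerically. The following Python sketch (the grid \(\xi _0,\ldots ,\xi _N\), the slopes \(a_0,\ldots ,a_{N-1}\), and the value \(f(\xi _0)\) are hypothetical example data) builds the network of Lemma 3.11 and checks that its realization agrees with \(f\):

```python
# Explicit one-hidden-layer ReLU realization of Lemma 3.11, following
# (103)-(105). The grid xi, the slopes a, and the value f0 = f(xi_0)
# are hypothetical example data.

xi = [-1.0, 0.0, 0.5, 2.0]        # xi_0 < xi_1 < ... < xi_N with N = 3
a = [2.0, -1.0, 0.5]              # slopes a_0, ..., a_{N-1}
f0 = 1.0                          # f(xi_0)

def f(x):
    """The target: piecewise linear on (xi_0, xi_N], constant outside."""
    if x <= xi[0]:
        return f0
    y = f0
    for n in range(len(a)):
        if x <= xi[n + 1]:
            return y + a[n] * (x - xi[n])
        y += a[n] * (xi[n + 1] - xi[n])
    return y                      # x > xi_N: constant right tail

# Parameters as in (103): c_n = a_n - a_{n-1} with a_{-1} = a_N = 0,
# W1 = (1,...,1)^T, B1 = (-xi_0,...,-xi_N), W2 = (c_0,...,c_N), B2 = f(xi_0).
a_ext = [0.0] + a + [0.0]
c = [a_ext[k + 1] - a_ext[k] for k in range(len(xi))]

def realization(x):
    """R(Phi)(x) = W2 ReLU(W1 x + B1) + B2, cf. (108) and (109)."""
    return f0 + sum(ck * max(x - xk, 0.0) for ck, xk in zip(c, xi))

max_err = max(abs(realization(x) - f(x))
              for x in [-2.0, -0.5, 0.25, 1.0, 5.0])
```

The shape of `Phi` is \((1,N+1,1)\), matching (102): one input, \(N+1\) ReLU units, one output.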
Lemma 3.12
Let \(L\in [0,\infty )\), \(N\in {\mathbb {N}}\), \(a\in {\mathbb {R}}\), \(b\in (a,\infty )\), \(\xi _0, \xi _1,\ldots , \xi _N\in {\mathbb {R}}\) satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(\xi _n=a+\frac{(b-a)n}{N}\), let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x,y\in {\mathbb {R}}\) that
$$\begin{aligned} |f(x)-f(y)|\le L|x-y|, \end{aligned}$$
(110)
and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\), \(n\in [0,N-1]\cap {\mathbb {Z}}\) that
$$g(x) = \left\{ {\begin{array}{*{20}l} {f(\xi _{0} )} \hfill & {:x \in ( - \infty ,\xi _{0} ]} \hfill \\ {\frac{{f(\xi _{n} )(\xi _{{n + 1}} - x) + f(\xi _{{n + 1}} )(x - \xi _{n} )}}{{\xi _{{n + 1}} - \xi _{n} }}} \hfill & {:x \in (\xi _{n} ,\xi _{{n + 1}} ]} \hfill \\ {f(\xi _{N} )} \hfill & {:x \in (\xi _{N} ,\infty ).} \hfill \\ \end{array} } \right.$$
(111)
Then
- (i)
it holds for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(g(\xi _n)=f(\xi _n)\),
- (ii)
it holds for all \(x,y\in {\mathbb {R}}\) that \(|g(x)-g(y)|\le L|x-y|\), and
- (iii)
it holds for all \(x\in [a,b]\) that \(|g(x)-f(x)|\le \frac{2L(b-a)}{N}\).
Proof of Lemma 3.12
Throughout this proof let \(r,\ell :{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}{\setminus}(a,b]\) that
$$\begin{aligned} r(x)=\ell (x)=x \end{aligned}$$
(112)
and for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that
$$\begin{aligned} r(x)= \xi _{n+1}\quad \text {and}\quad \ell (x)= \xi _{n}. \end{aligned}$$
(113)
Note that (111) implies (i). Next observe that for all \(x,y\in (a,b]\) with \(x\le y\) and \(\ell (y)<r(x)\) it holds that \(r(x)=r(y)\) and \(\ell (x) =\ell (y)\). This, (112), (111), and (110) show that for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) and \(\ell (y)<r(x)\) it holds that \(x,y\in (a,b]\), \(r(x)=r(y)\), \(\ell (x) =\ell (y)\), and
$$\begin{aligned} |g(x)-g(y)|= \left| \frac{f(r(x))-f(\ell (x))}{r(x)-\ell (x)} (x-y)\right| \le L |x-y|. \end{aligned}$$
(114)
Furthermore, (111), (110), and the fact that \(\forall \,x\in {\mathbb {R}}:\ell (x)\le x\le r(x)\) imply for all \(x\in (a,b]\) that
$$\begin{aligned} \begin{aligned} |g(x)-g(r(x))|&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))+f(\ell (x))-f(r(x))\right| \\&= \left| \frac{(f(\ell (x))-f(r(x)))(x-r(x))}{\ell (x)-r(x)}\right| \le L|x-r(x)|=L(r(x)-x) \end{aligned} \end{aligned}$$
(115)
and
$$\begin{aligned} \begin{aligned} |g(x)-g(\ell (x))|&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))+f(\ell (x))-f(\ell (x))\right| \\&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))\right| \le L|x-\ell (x)|=L(x-\ell (x)). \end{aligned} \end{aligned}$$
(116)
This and (112) show for all \(x\in {\mathbb {R}}\) that
$$\begin{aligned} |g(x)-g(r(x))|\le L(r(x)-x) \quad \text {and}\quad |g(x)-g(\ell (x))|\le L(x-\ell (x)) . \end{aligned}$$
(117)
The triangle inequality therefore shows for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) and \(r(x)\le \ell (y)\) that
$$\begin{aligned} \begin{aligned} |g(x)-g(y)|&\le | g(x)-g(r(x))|+|g(r(x))-g(\ell (y))|+|g(\ell (y))-g(y)|\\&\le L (r(x)-x)+ L(\ell (y)-r(x))+ L (y-\ell (y))= L(y-x)= L|y-x|.\end{aligned} \end{aligned}$$
(118)
This and (114) show for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) that \(|g(x)-g(y)|\le L|x-y|\). Symmetry hence establishes (ii). Next, the fact that \(\forall \,x\in {\mathbb {R}}:g(\ell (x))=f(\ell (x))\), the triangle inequality, (110), (117), and the fact that \(\forall \,x\in [a,b]:0\le x-\ell (x)\le (b-a)/N\) imply for all \(x\in [a,b]\) that
$$\begin{aligned} \begin{aligned} |g(x)-f(x)|&= |g(x)-f(\ell (x))+f(\ell (x))-f(x)|\\&= |g(x)-g(\ell (x))+f(\ell (x))-f(x)|\\&\le |g(x)-g(\ell (x))|+|f(\ell (x))-f(x)|\le 2L (x-\ell (x))\le 2L(b-a)/N. \end{aligned} \end{aligned}$$
(119)
This establishes (iii). The proof of Lemma 3.12 is thus completed. \(\square\)
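As a numerical illustration of Lemma 3.12, the sketch below instantiates (111) for a sample Lipschitz function (here \(f=\sin\) with \(L=1\); the grid parameters \(a\), \(b\), \(N\) are hypothetical choices) and checks properties (i)–(iii):

```python
# Numerical illustration of Lemma 3.12. The function f = sin (Lipschitz
# constant L = 1) and the parameters a, b, N are hypothetical sample choices.
import math

L, a, b, N = 1.0, -2.0, 3.0, 40
xi = [a + (b - a) * n / N for n in range(N + 1)]
f = math.sin

def g(x):
    """The piecewise-linear interpolant (111) with constant tails."""
    if x <= xi[0]:
        return f(xi[0])
    if x > xi[N]:
        return f(xi[N])
    for n in range(N):
        if x <= xi[n + 1]:
            t = (x - xi[n]) / (xi[n + 1] - xi[n])
            return f(xi[n]) * (1 - t) + f(xi[n + 1]) * t

xs = [a - 1 + 7 * k / 400 for k in range(401)]   # sample points in [a-1, b+1]
gvals = [g(x) for x in xs]
# (i): g matches f on the grid.
grid_ok = all(abs(g(x) - f(x)) < 1e-12 for x in xi)
# (ii): g is Lipschitz with the same constant L (up to rounding).
lip_ok = all(abs(gvals[i] - gvals[j]) <= L * abs(xs[i] - xs[j]) + 1e-12
             for i in range(len(xs)) for j in range(len(xs)))
# (iii): the interpolation error on [a, b] is at most 2L(b-a)/N.
err_ok = all(abs(g(x) - f(x)) <= 2 * L * (b - a) / N
             for x in xs if a <= x <= b)
```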
Corollary 3.13
Assume Setting 3.1, let \(\epsilon \in (0,1]\), \(L\in [0,\infty )\), \(q\in (1,\infty )\), and let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x,y\in {\mathbb {R}}\) that \(|f(x)-f(y)|\le L|x-y|.\) Then there exists a function \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) such that
- (i)
it holds for all\(x,y\in {\mathbb {R}}\)that\(|g(x)-g(y)|\le L|x-y|\),
- (ii)
it holds for all\(x\in {\mathbb {R}}\)that\(|f(x)-g(x)|\le \epsilon (1+|x|^q)\), and
- (iii)
it holds that
$$\begin{aligned} g\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )\in {\mathbb {N}}^3 \text { and } |||\mathcal {D}(\Phi )|||\le \frac{4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2}{\epsilon ^{\frac{q}{(q-1)}}}\right\} \right) . \end{aligned}$$
(120)
Proof of Corollary 3.13
Throughout this proof let \(R\in {\mathbb {R}}\), \(N\in {\mathbb {N}}\) satisfy that
$$\begin{aligned} R=\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) \quad \text {and}\quad N=\min \left\{ n\in {\mathbb {N}}:\frac{4LR}{n}\le \epsilon \right\} , \end{aligned}$$
(121)
let \(\xi _0, \xi _1,\ldots , \xi _N\in {\mathbb {R}}\) be the real numbers which satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(\xi _n=R(-1+\frac{2n}{N})\), and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\), \(n\in [0,N-1]\cap {\mathbb {Z}}\) that
$$g(x) = \left\{ {\begin{array}{*{20}l} {f(\xi _{0} )} \hfill & {:x \in ( - \infty ,\xi _{0} ]} \hfill \\ {\frac{{f(\xi _{n} )(\xi _{{n + 1}} - x) + f(\xi _{{n + 1}} )(x - \xi _{n} )}}{{\xi _{{n + 1}} - \xi _{n} }}} \hfill & {:x \in (\xi _{n} ,\xi _{{n + 1}} ]} \hfill \\ {f(\xi _{N} )} \hfill & {:x \in (\xi _{N} ,\infty ).} \hfill \\ \end{array} } \right.$$
(122)
By (ii) in Lemma 3.12, the function \(g\) satisfies (i). Next, it follows from (iii) in Lemma 3.12 that for all \(x\in [-R,R]\) it holds that \(|g(x)-f(x)|\le 4LR/N\). This and the fact that \(4LR/N\le \epsilon\) prove that for all \(x\in [-R,R]\) it holds that \(|g(x)-f(x)|\le \epsilon \le \epsilon (1+|x|^q)\). Next, the triangle inequality, the fact that \(f(R)=g(R)\), and the Lipschitz continuity of \(f\) and \(g\) imply for all \(x\in {\mathbb {R}}\) that
$$\begin{aligned} \begin{aligned} |f(x)-g(x)|&\le |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|g(R)|\\&= |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|f(R)|\\&\le |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|f(R)-f(0)|+|f(0)|\\&\le L|x|+2|f(0)|+L|x-R|+LR\\&\le 2L(|x|+R)+2|f(0)|. \end{aligned} \end{aligned}$$
(123)
This and (121) show for all \(x\in {\mathbb {R}}{\setminus}[-R,R]\) that
$$\begin{aligned} \begin{aligned} \frac{|f(x)-g(x)|}{1+|x|^q}&\le \frac{2L(|x|+R)+2|f(0)|}{1+|x|^q}\le \frac{4L|x|+2|f(0)|}{1+|x|^q} \\&\le \frac{4L}{|x|^{q-1}}+\frac{2|f(0)|}{|x|^q}\le \frac{4L}{R^{q-1}}+\frac{2|f(0)|}{R^q}\le \frac{4L+2|f(0)|}{R^{q-1}} \le \epsilon .\end{aligned} \end{aligned}$$
(124)
This and the fact that \(\forall \,x\in [-R,R]:|g(x)-f(x)|\le \epsilon (1+|x|^q)\) prove that for all \(x\in {\mathbb {R}}\) it holds that \(|g(x)-f(x)|\le \epsilon (1+|x|^q)\). This shows that g satisfies (ii). Next, (i) in Lemma 3.12 ensures that g satisfies for all \(x\in (-\infty ,-R]\) that \(g(x)=g(-R)\), for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that \(g(x)=g(\xi _n)+\frac{g(\xi _{n+1})-g(\xi _n)}{\xi _{n+1}-\xi _n}(x-\xi _n)\), and for all \(x\in (R,\infty )\) that \(g(x)=g(R)\). This and Lemma 3.11 (with \(N=N\), \(f=g\), \(\xi _n=\xi _n\) for \(n\in [0,N]\cap {\mathbb {Z}}\), and \(a_n= (g(\xi _{n+1})-g(\xi _n))/(\xi _{n+1}-\xi _n)\) for \(n\in [0,N-1]\cap {\mathbb {Z}}\) in the notation of Lemma 3.11) imply that
$$\begin{aligned} g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=(1,N+1,1)\}). \end{aligned}$$
(125)
Furthermore, if \(N>1\), then (121) implies that \(\frac{4LR}{N-1}>\epsilon\). Hence, if \(N>1\) it holds that \(N<\frac{4LR}{\epsilon }+1\). This and (121) ensure that
$$N \le \frac{4LR}{\epsilon }+1=\frac{4L\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) +\epsilon }{\epsilon }.$$
(126)
This and (125) imply that
$$\begin{aligned} |||\mathcal {D}(\Phi )|||&=N+1 \le \frac{4L\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) +2\epsilon }{\epsilon }\\ {}&= \frac{4L\max \left( \epsilon ^{\frac{1}{q-1}},\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2\epsilon ^{\frac{q}{(q-1)}}}{\epsilon ^{\frac{q}{(q-1)}}}\\ {}&\le \frac{4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2}{\epsilon ^{\frac{q}{(q-1)}}}. \end{aligned}$$
(127)
This establishes (iii). The proof of Corollary 3.13 is thus completed. \(\square\)
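The proof of Corollary 3.13 is constructive, and its quantitative choices (121) can be replayed numerically. The sketch below (with hypothetical sample values for \(L\), \(q\), \(\epsilon\), and the target \(f=\sin\)) builds \(g\) and checks the approximation bound in (ii) and the size bound in (120):

```python
# Replaying the construction in the proof of Corollary 3.13. The choices
# of L, q, eps and the target f = sin are hypothetical sample data; R and
# N are chosen exactly as in (121).
import math

L, q, eps = 1.0, 2.0, 0.25
f = math.sin                                 # |f(x) - f(y)| <= L |x - y|

R = max(1.0, ((4 * L + 2 * abs(f(0))) / eps) ** (1 / (q - 1)))
N = math.ceil(4 * L * R / eps)               # smallest n with 4LR/n <= eps
xi = [R * (-1 + 2 * n / N) for n in range(N + 1)]

def g(x):
    """The interpolant (122): linear on [-R, R], constant outside."""
    if x <= xi[0]:
        return f(xi[0])
    if x > xi[N]:
        return f(xi[N])
    n = min(int((x - xi[0]) / (2 * R / N)), N - 1)
    t = (x - xi[n]) / (xi[n + 1] - xi[n])
    return f(xi[n]) * (1 - t) + f(xi[n + 1]) * t

xs = [-30 + 60 * k / 600 for k in range(601)]
# (ii): |f - g| <= eps * (1 + |x|^q) on the whole real line (sampled).
approx_ok = all(abs(f(x) - g(x)) <= eps * (1 + abs(x) ** q) for x in xs)
# (iii): the width N + 1 of the shape (1, N+1, 1) obeys the bound in (120).
bound = (4 * L * (1 + (4 * L + 2 * abs(f(0))) ** (1 / (q - 1))) + 2) \
        / eps ** (q / (q - 1))
size_ok = (N + 1) <= bound
```

By Lemma 3.11, this \(g\) is realized by a network of shape \((1,N+1,1)\), so `N + 1` is the quantity bounded in (127).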