1 Introduction

Deep neural networks (DNNs) have revolutionized a number of computational problems; see, e.g., the references in Grohs et al. [13]. In 2017, deep learning-based approximation algorithms for certain parabolic partial differential equations (PDEs) were proposed in Han et al. [6, 14], and based on these works there is now a series of deep learning-based numerical approximation algorithms for a large class of different kinds of PDEs in the scientific literature; see, e.g., [1, 2, 4, 9, 10, 11, 13, 15, 21, 22, 23, 24, 25]. There is empirical evidence that deep learning-based methods work exceptionally well for approximating solutions of high-dimensional PDEs and do not suffer from the curse of dimensionality; see, e.g., the simulations in [1, 2, 6, 14]. There exist, however, only a few theoretical results which prove that DNN approximations of solutions of PDEs do not suffer from the curse of dimensionality: the recent articles [5, 10, 13, 20] prove rigorously that DNN approximations overcome the curse of dimensionality in the numerical approximation of solutions of certain linear PDEs.

The main result of this article, Theorem 4.1 below, proves for semilinear heat equations with gradient-independent nonlinearities that the number of parameters of the approximating DNN grows at most polynomially in both the PDE dimension \(d\in {\mathbb {N}}\) and the reciprocal of the prescribed accuracy \(\varepsilon >0\). We thereby establish, for the first time, that there exist DNN approximations of solutions of such PDEs which indeed overcome the curse of dimensionality. To illustrate the main result of this article we formulate in Theorem 1.1 below a special case of Theorem 4.1.

Theorem 1.1

Let \(\mathbf {A}_{d}:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\), \(d\in {\mathbb {N}}=\{1,2,\ldots \}\), and \(\left\| \cdot \right\| :(\cup _{d\in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0,\infty )\) satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that \(\mathbf {A}_{d}(x)= \left( \max \{x_1,0\},\ldots ,\max \{x_d,0\}\right)\) and \(\Vert x\Vert =[\sum _{i=1}^d(x_i)^2]^{1/2}\), let \({\mathbf {N}} = \cup _{H\in {\mathbb {N}}}\cup _{(k_0,k_1,\ldots ,k_{H+1})\in {\mathbb {N}}^{H+2}} \left[ \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_{n}\times k_{n-1}} \times {\mathbb {R}}^{k_{n}}\right) \right],\) let \({\mathcal {R}}:\mathbf {N}\rightarrow (\cup _{k,l\in {\mathbb {N}}} C({\mathbb {R}}^k,{\mathbb {R}}^l))\) and \({\mathcal {P}}:\mathbf {N}\rightarrow {\mathbb {N}}\) satisfy for all \(H\in {\mathbb {N}}\), \(k_0,k_1,\ldots ,k_H,k_{H+1}\in {\mathbb {N}}\), \(\Phi = ((W_1,B_1),\ldots ,(W_{H+1},B_{H+1}))\in \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_n\times k_{n-1}} \times {\mathbb {R}}^{k_n}\right) ,\)\(x_0 \in {\mathbb {R}}^{k_0},\ldots ,x_{H}\in {\mathbb {R}}^{k_{H}}\) with \(\forall \, n\in {\mathbb {N}}\cap [1,H]:x_n = \mathbf {A}_{k_n}(W_n x_{n-1}+B_n )\) that

$$\begin{aligned} {\mathcal {R}}(\Phi )\in C({\mathbb {R}}^{k_0},{\mathbb {R}}^ {k_{H+1}}),\quad (\mathcal {R}(\Phi )) (x_0) = W_{H+1}x_{H}+B_{H+1}, \quad \text {and}\quad \mathcal {P}(\Phi )=\textstyle {\sum \limits _{n=1}^{H+1}}k_n(k_{n-1}+1), \end{aligned}$$

let \(T,\kappa \in (0,\infty )\), \(f\in C({\mathbb {R}},{\mathbb {R}})\), \(({\mathfrak {g}}_{{d,\varepsilon }})_{d\in {\mathbb {N}},\varepsilon \in (0,1]}\subseteq \mathbf {N}\), \((c_d)_{d\in {\mathbb {N}}}\subseteq (0,\infty )\), for every \(d\in {\mathbb {N}}\) let \(g_d\in C({\mathbb {R}}^d,{\mathbb {R}})\), for every \(d\in {\mathbb {N}}\) let \(u_d\in C^{1,2}([0,T]\times {\mathbb {R}}^d,{\mathbb {R}})\), and assume for all \(d\in {\mathbb {N}}\), \(v,w\in {\mathbb {R}}\), \(x\in {\mathbb {R}}^d\), \(\varepsilon \in (0,1]\), \(t\in (0,T)\) that \(|f(v)-f(w)|\le \kappa |v-w|\), \({\mathcal {R}}({\mathfrak {g}}_{{d,\varepsilon }})\in C({\mathbb {R}}^d,{\mathbb {R}})\), \(|({\mathcal {R}}({\mathfrak {g}}_{{d,\varepsilon }}))(x)|\le \kappa d^\kappa (1+\| {x}\| ^\kappa )\), \(| g_d(x)-({\mathcal {R}}({\mathfrak {g}}_{{d,\varepsilon }}))(x)| \le \varepsilon \kappa d^\kappa (1+\| {x}\| ^\kappa )\), \({\mathcal {P}}({\mathfrak {g}}_{{d,\varepsilon }})\le \kappa d^\kappa \varepsilon ^{-\kappa }\), \(|u_d(t,x)|\le c_d(1+\| {x}\| ^{c_d})\), \(u_d(0,x)=g_d(x)\), and

$$(\tfrac{\partial }{\partial t}u_d)(t,x)=(\Delta _xu_d)(t,x)+f(u_d(t,x)).$$
(1)

Then there exist \(\eta \in (0,\infty )\) and \(({\mathfrak {u}}_{{d,\varepsilon }})_{d\in {\mathbb {N}},\varepsilon \in (0,1]}\subseteq {\mathbf {N}}\) such that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds that \({\mathcal {R}}({\mathfrak {u}}_{{d,\varepsilon }})\in C({\mathbb {R}}^d,{\mathbb {R}})\), \({\mathcal {P}}({\mathfrak {u}}_{{d,\varepsilon }})\le \eta d^\eta \varepsilon ^{-\eta }\), and

$$\left[ \int _{[0,1]^d}\left| u_d(T,x)-(\mathcal {R}({\mathfrak {u}}_{{d,\varepsilon }}))(x)\right| ^2\, dx\right] ^{\frac{1}{2}}\le \varepsilon .$$
(2)

Theorem 1.1 is an immediate consequence of Corollary 4.2 in Sect. 4.2 below (with \(T=2T\), \(u_d(t,x)=u_d(T-\frac{t}{2},x)\), \(f(v)=f(v)/2\) for \(t\in [0,2T]\), \(x\in {\mathbb {R}}^d\), \(v\in {\mathbb {R}}\) in the notation of Corollary 4.2). In the following we add a few comments on some of the mathematical objects which appear in Theorem 1.1. First, note that for all \(d\in {\mathbb {N}}\) it holds that \(\left\| \cdot \right\| |_{{\mathbb {R}}^d}:{\mathbb {R}}^d \rightarrow [0,\infty )\) in Theorem 1.1 is nothing else but the standard norm on \({\mathbb {R}}^d\). Theorem 1.1 shows under suitable assumptions that DNNs can overcome the curse of dimensionality in the numerical approximation of semilinear heat equations of the form (1) above and the functions \({\mathbf {A}}_{d}:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\), \(d\in {\mathbb {N}}\), in Theorem 1.1 above specify the activation functions which we employ in the considered DNN approximations. The set \({\mathbf {N}}\) in Theorem 1.1 above represents the set of all DNNs. Observe that for all \(\Phi \in {\mathbf {N}}\) we have that the natural number \({\mathcal {P}}(\Phi )\) specifies the number of real parameters used to describe the DNN \(\Phi\). Moreover, note that for all \(\Phi \in {\mathbf {N}}\) it holds that \({\mathcal {R}}(\Phi )\) is the realization function associated to the DNN \(\Phi\). The real number \(T\in (0,\infty )\) specifies the time horizon [0, T] of the PDEs in (1), the function \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) specifies the nonlinearity of the PDEs in (1), the functions \(g_d:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(d\in {\mathbb {N}}\), specify the initial conditions of the PDEs in (1), and the functions \(u_d:[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(d\in {\mathbb {N}}\), specify the solutions of the PDEs in (1). 
The real numbers \(\kappa \in (0,\infty )\) and \(c_d \in (0,\infty )\), \(d\in {\mathbb {N}}\), are constants which we use to specify suitable regularity assumptions on the nonlinearity \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\), the initial conditions \(g_d:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(d\in {\mathbb {N}}\), and the PDE solutions \(u_d:[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(d\in {\mathbb {N}}\). Theorem 1.1 establishes the existence of DNNs \({\mathfrak {u}}_{{d,\varepsilon }}\in \mathbf {N}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), which approximate the solutions \(u_d:[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(d\in {\mathbb {N}}\), of (1) at time T without the curse of dimensionality.
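To make the formalism above concrete, the following minimal Python (NumPy) sketch evaluates the realization map \({\mathcal {R}}\) and the parameter count \({\mathcal {P}}\) for a ReLU network given as a list of weight-bias pairs. The function names `realization` and `num_params` are ours and not part of the article.

```python
import numpy as np

def realization(phi, x0):
    """Evaluate the realization R(phi) of the DNN phi at the input x0.

    phi is a list of (W, B) pairs as in Theorem 1.1: the ReLU activation
    A_d is applied after every affine map except the last one.
    """
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = np.maximum(W @ x + B, 0.0)  # A_d: componentwise ReLU
    W, B = phi[-1]
    return W @ x + B  # no activation on the output layer

def num_params(phi):
    """P(phi) = sum_{n=1}^{H+1} k_n (k_{n-1} + 1): all weights and biases."""
    return sum(W.size + B.size for W, B in phi)

# a small random network with layer dimensions (k_0, ..., k_{H+1}) = (3, 5, 5, 1)
rng = np.random.default_rng(0)
dims = [3, 5, 5, 1]
phi = [(rng.standard_normal((dims[n + 1], dims[n])), rng.standard_normal(dims[n + 1]))
       for n in range(len(dims) - 1)]
print(num_params(phi))  # 5*(3+1) + 5*(5+1) + 1*(5+1) = 56
```

Note that \({\mathcal {P}}\) counts parameters, not arithmetic operations; polynomial growth of this count in \(d\) and \(\varepsilon ^{-1}\) is exactly the statement of Theorem 1.1.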

Next we sketch the main steps in our proof of Theorem 1.1 above. Roughly speaking, the proof can be divided into four main steps. First, we approximate the solutions of the semilinear heat equations in (1) through solutions of PDEs whose initial conditions and nonlinearities can be exactly represented by suitable DNNs (cf. Lemma 2.3 below). Next we construct a suitable artificial probability space on which we approximate the solutions of these approximating PDEs by means of the full history recursive multilevel Picard (MLP) approximations recently introduced in [7, 17] (cf. Corollary 2.4 below and see also [3, 8, 12, 16, 18, 19] for further articles on MLP approximations). Thereafter, we prove that the MLP approximations of the approximating PDEs can be exactly represented by DNNs (cf. Lemma 3.10 below). We have thus constructed suitable random DNNs which approximate the solutions of (1) without the curse of dimensionality in the probabilistically strong sense. Following the articles [5, 13, 20], we can then bring, e.g., [20, Corollary 2.4] into play to obtain the existence of a realization on the artificial probability space such that the error between the PDE solution of (1) and the realization of the constructed random DNNs is below the prescribed approximation accuracy \(\varepsilon \in (0,1]\); this allows us to complete the proof of Theorem 1.1 above. Our strategy of proof for Theorem 1.1 is inspired by the procedure in [5, 13, 20] in the sense that we also construct suitable random DNNs to approximate the solutions of (1) on a suitable artificial probability space.
The main difference of this work compared to [5, 13, 20] is that we do not construct the approximating random DNNs through the plain vanilla Monte Carlo method but through the recently introduced MLP approximation scheme, an efficient nonlinear Monte Carlo algorithm, which thereby allows us to overcome the curse of dimensionality in the case of nonlinear PDEs of the form (1).

The remainder of this article is organized as follows. In Sect. 2 we provide auxiliary results on MLP approximations ensuring that these approximations are stable against perturbations in the nonlinearity and the terminal condition of the PDE (1). In Sect. 3 we show that MLP approximations can be represented by DNNs and we provide bounds for the number of parameters of the representing DNN. In Sect. 4 we use the results of Sects. 2 and 3 to establish in Theorem 4.1 the main result of this article.

2 A stability result for full history recursive multilevel Picard (MLP) approximations

2.1 Setting

Setting 2.1

Let \(d\in {\mathbb {N}}\), \(T,L,\delta ,B\in (0,\infty )\), \(p,q\in [1,\infty )\), \(f_1,f_2\in C\left( [0,T]\times {\mathbb {R}}^{d}\times {\mathbb {R}},{\mathbb {R}}\right)\), \(g_1,g_2\in C({\mathbb {R}}^d,{\mathbb {R}})\), let \(\left\| \cdot \right\| :{\mathbb {R}}^d \rightarrow [0,\infty )\) satisfy for all \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that \(\Vert x\Vert =[\sum _{i=1}^d(x_i)^2]^{1/2}\), assume for all \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\), \(w,v\in {\mathbb {R}}\), \(i\in \{1,2\}\) that

$$\left| f_i(t,x,w)-f_i(t,x,v)\right| \le L\left| w-v\right| ,$$
(3)
$$\max \left\{ \left| f_i(t,x,0)\right| ,\left| g_i(x)\right| \right\} \le B\left( 1+\left\| {x}\right\| \right) ^p,$$
(4)

and

$$\max \left\{ \left| f_1(t,x,v)-f_2(t,x,v)\right| , \left| g_1(x)-g_2(x)\right| \right\} \le \delta \left( \left( 1+\left\| {x}\right\| \right) ^{pq}+|v|^q\right) ,$$
(5)

let \(F_i:C\left( [0,T]\times {\mathbb {R}}^d,{\mathbb {R}}\right) \rightarrow C\left( [0,T]\times {\mathbb {R}}^d,{\mathbb {R}}\right)\), \(i\in \{1,2\}\), satisfy for all \(v\in C\left( [0,T]\times {\mathbb {R}}^d,{\mathbb {R}}\right)\), \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\), \(i\in \{1,2\}\) that

$$(F_i(v))(t,x) = f_i(t,x,v(t,x)),$$
(6)

let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \({\mathbf {W}}:[0,T]\times \Omega \rightarrow {\mathbb {R}}^d\) be a standard Brownian motion with continuous sample paths, let \(u_1,u_2\in C([0,T]\times {\mathbb {R}}^d,{\mathbb {R}})\), assume for all \(i\in \{1,2\}\), \(s\in [0,T]\), \(x\in {\mathbb {R}}^d\) that

$${\mathbb {E}}\left[ \left| g_i\left( x+\mathbf {W}_{T-s}\right) \right| +\int _s^T\left| \left( F_i(u_i)\right) \left( t,x+\mathbf {W}_{t-s}\right) \right| \,dt \right] <\infty$$
(7)

and

$$u_i(s,x)={\mathbb {E}}\left[ g_i\left( x+\mathbf {W}_{T-s}\right) +\int _s^T \left( F_i(u_i)\right) \left( t,x+\mathbf {W}_{t-s}\right) \,dt\right],$$
(8)

let \(\Theta = \bigcup _{ n \in {\mathbb {N}}} {\mathbb {Z}}^n\), let \({\mathfrak {u}}^\theta :\Omega \rightarrow [0,1]\), \(\theta \in \Theta\), be independent random variables which are uniformly distributed on [0, 1], let \({\mathcal {U}}^\theta :[0,T]\times \Omega \rightarrow [0, T]\), \(\theta \in \Theta\), satisfy for all \(t\in [0,T]\), \(\theta \in \Theta\) that \({\mathcal {U}}^\theta _t = t+ (T-t){\mathfrak {u}}^\theta\), let \(W^\theta :[0,T]\times \Omega \rightarrow {\mathbb {R}}^d\), \(\theta \in \Theta\), be independent standard Brownian motions, assume that \(({\mathfrak {u}}^\theta )_{\theta \in \Theta }\), \((W^\theta )_{\theta \in \Theta }\), and \({\mathbf {W}}\) are independent, and let \({U}_{ n,M}^{\theta } :[0, T] \times {\mathbb {R}}^d \times \Omega \rightarrow {\mathbb {R}}\), \(n,M\in {\mathbb {Z}}\), \(\theta \in \Theta\), be functions which satisfy for all \(n,M \in {\mathbb {N}}\), \(\theta \in \Theta\), \(t \in [0,T]\), \(x\in {\mathbb {R}}^d\) that \({U}_{-1,M}^{\theta }(t,x)={U}_{0,M}^{\theta }(t,x)=0\) and

$$\begin{aligned}{U}_{n,M}^{\theta }(t,x) & = \frac{1}{M^n} \sum _{i=1}^{M^n} g_2\left( x+W^{(\theta ,0,-i)}_{T}-W^{(\theta ,0,-i)}_{t}\right) \\&\quad + \sum _{l=0}^{n-1} \frac{(T-t)}{M^{n-l}} \left[ \sum _{i=1}^{M^{n-l}} \left( F_2\left( {U}_{l,M}^{(\theta ,l,i)}\right) -\mathbb {1}_{{\mathbb {N}}}(l)F_2\left( {U}_{l-1,M}^{(\theta ,-l,i)}\right) \right)\right.\\&\quad \qquad\qquad\qquad\qquad \left. \left( \mathcal {U}_t^{(\theta ,l,i)},x+W_{\mathcal {U}_t^{(\theta ,l,i)}}^{(\theta ,l,i)}-W_{t}^{(\theta ,l,i)}\right) \right] . \end{aligned}$$
(9)
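For readers who prefer pseudocode, the recursion (9) can be sketched in Python for \(d=1\) as follows. Fresh Gaussian draws stand in for the independent Brownian increments of \(W^\theta\) and fresh uniform draws for \({\mathfrak {u}}^\theta\); the function name `mlp` and the concrete test case are ours, not part of the article.

```python
import math
import random

def mlp(n, M, t, x, T, g, f):
    """One sample of the MLP approximation U_{n,M}(t, x) from (9) for d = 1.

    g plays the role of g_2 and f = f(s, y, v) the role of the nonlinearity
    f_2; fresh Gaussian increments replace the independent Brownian motions
    W^theta, so each recursive call uses independent randomness.
    """
    if n <= 0:
        return 0.0
    # Monte Carlo average of g(x + W_T - W_t) over M^n samples
    val = sum(g(x + math.sqrt(T - t) * random.gauss(0.0, 1.0))
              for _ in range(M ** n)) / M ** n
    # multilevel correction terms for the levels l = 0, ..., n - 1
    for l in range(n):
        s = 0.0
        for _ in range(M ** (n - l)):
            r = t + (T - t) * random.random()   # U_t^theta, uniform on [t, T]
            y = x + math.sqrt(r - t) * random.gauss(0.0, 1.0)
            s += f(r, y, mlp(l, M, r, y, T, g, f))
            if l > 0:  # the indicator 1_N(l) in (9)
                s -= f(r, y, mlp(l - 1, M, r, y, T, g, f))
        val += (T - t) * s / M ** (n - l)
    return val

random.seed(1)
# toy example: g(y) = sin(y) and a linear nonlinearity f = v / 10
approx = mlp(3, 4, 0.0, 0.0, 1.0, g=math.sin, f=lambda s, y, v: v / 10)
```

Note how both difference terms at level \(l\) are evaluated at the same point \((\mathcal {U}_t^{(\theta ,l,i)}, x+W_{\mathcal {U}_t^{(\theta ,l,i)}}^{(\theta ,l,i)}-W_{t}^{(\theta ,l,i)})\), which is what makes the telescoping correction have small variance.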

2.2 An a priori estimate for solutions of partial differential equations (PDEs)

Lemma 2.2

(q-th moment of the exact solution) Assume Setting 2.1 and let \(x\in {\mathbb {R}}^d\), \(i\in \{1,2\}\). Then it holds that

$$\begin{aligned} \begin{aligned} \sup _{t\in [0,T]} \left( {\mathbb {E}} \Big[ \left| u_i(t,x+\mathbf {W}_{t}) \right| ^q \Big] \right) ^{\frac{1}{q}} \le e^{LT} (T+1)B\left[ \sup _{t\in [0,T]} \left( {\mathbb {E}} \left[ \left( 1+\left\| {x+\mathbf {W}_t}\right\| \right) ^{pq} \right] \right) ^{ \frac{1}{q}} \right] . \end{aligned} \end{aligned}$$
(10)

Proof of Lemma 2.2

Throughout this proof let \(\mu _{t} :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\), \(t \in [0,T]\), be the probability measures which satisfy for all \(t \in [0,T]\), \(B \in \mathcal {B}({\mathbb {R}}^d)\) that

$$\mu _t(B) = {\mathbb {P}}( x + \mathbf {W}_t \in B ).$$
(11)

The integral transformation theorem, (8), and the triangle inequality show for all \(t \in [0,T]\) that

$$\begin{aligned}\left( {\mathbb {E}} \Big[ | u_i(t,x +\mathbf {W}_t)|^q \Big] \right) ^{ \frac{1}{q}} & = \left( \int _{{\mathbb {R}}^d} | u_i(t, z)|^q \, \mu _t(dz) \right) ^{ \frac{1}{q}} \\&= \left( \int _{{\mathbb {R}}^d} \left| {\mathbb {E}} \left[ g_i(z+\mathbf {W}_{T-t}) + \int _t^{T} (F_i(u_i))(s,z+\mathbf {W}_{s-t}) \,ds \right] \right| ^q \, \mu _t(dz) \right) ^{ \frac{1}{q}} \\& \le \left( \int _{{\mathbb {R}}^d}\left| {\mathbb {E}} \left[ g_i(z+\mathbf {W}_{T-t}) \right] \right| ^q \, \mu _t(dz) \right) ^{ \frac{1}{q}} \\&\qquad +\int _{t}^T \left( \int _{{\mathbb {R}}^d}\left| {\mathbb {E}} \left[ (F_i(u_i))(s,z+\mathbf {W}_{s-t}) \right] \right| ^q \, \mu _t(dz)\right) ^{\frac{1}{q}} ds . \end{aligned}$$
(12)

Next, Jensen’s inequality, Fubini’s theorem, (11), the fact that \({\mathbf {W}}\) has independent and stationary increments, and (4) demonstrate that for all \(t \in [0,T]\) it holds that

$$\begin{aligned}&\int _{{\mathbb {R}}^d}\left| {\mathbb {E}} \left[ g_i(z+\mathbf {W}_{T-t}) \right] \right| ^q\, \mu _t(dz) \le \int _{{\mathbb {R}}^d} {\mathbb {E}} \Big[ \left| g_i(z+\mathbf {W}_{T}-\mathbf {W}_t) \right| ^q \Big] \, \mu _t(dz)\\&\quad = {\mathbb {E}} \Big[ \left| g_i\left( x +\mathbf {W}_t+ \mathbf {W}_{T}- \mathbf {W}_t\right) \right| ^q \Big] = {\mathbb {E}} \left[ \left| g_i\left( x +\mathbf {W}_{T}\right) \right| ^q \right] \le {\mathbb {E}} \left[ B^q\left( 1+ \left\| { x + \mathbf {W}_{T} }\right\| \right) ^{pq} \right] . \end{aligned}$$
(13)

Furthermore, Jensen’s inequality, Fubini’s theorem, (11), the fact that \(\mathbf {W}\) has independent and stationary increments, the triangle inequality, (3), and (4) demonstrate for all \(t \in [0,T]\) that

$$\begin{aligned}&\int _{t}^T\left( \int _{{\mathbb {R}}^d}\left| {\mathbb {E}} \left[ (F_i(u_i))(s,z+\mathbf {W}_{s-t}) \right] \right| ^q \, \mu _t(dz)\right) ^{\frac{1}{q}}ds\\&\quad \le \int _{t}^T \left( \int _{{\mathbb {R}}^d} {\mathbb {E}} \Big[ \left| (F_i(u_i))(s,z+\mathbf {W}_{s}-\mathbf {W}_{t}) \right| ^q \Big] \, \mu _t(dz)\right) ^{\frac{1}{q}} \, ds \\&\quad = \int _{t}^T \left( {\mathbb {E}} \Big[ \left| \left( F_i(u_i)\right) (s,x+\mathbf {W}_{t}+\mathbf {W}_{s}-\mathbf {W}_{t}) \right| ^q \Big] \right) ^{\frac{1}{q}} ds \\&\quad \le \int _{t}^T \left( {\mathbb {E}} \Big[ \left| (F_i(0))(s, x+\mathbf {W}_{s}) \right| ^q \Big] \right) ^{ \frac{1}{q}} ds + \int _{t}^T \left( {\mathbb {E}} \Big[ \left| (F_i(u_i) - F_i(0))(s,x+\mathbf {W}_{s}) \right| ^q \Big] \right) ^{ \frac{1}{q}} \, ds \\&\quad \le T\sup _{s\in [0,T]} \left( {\mathbb {E}} \left[ B^q\left( 1+\left\| {x+\mathbf {W}_s}\right\| \right) ^{pq} \right] \right) ^{ \frac{1}{q}} + \int _{t}^T \left( {\mathbb {E}} \Big[ L^q \left| u_i(s,x+\mathbf {W}_{s}) \right| ^q \Big] \right) ^{ \frac{1}{q}} \, ds. \end{aligned}$$
(14)

Combining this with (12) and (13) implies that for all \(t \in [0,T]\) it holds that

$$\begin{aligned}&\left( {\mathbb {E}} \Big[ \left| u_i(t,x +\mathbf {W}_t)\right| ^q \Big] \right) ^{\frac{1}{q}} \\&\quad \le (T+1)B\sup _{s\in [0,T]} \left( {\mathbb {E}} \left[ \left( 1+\left\| {x+\mathbf {W}_s}\right\| \right) ^{pq} \right] \right) ^{ \frac{1}{q}} + L\int _{t}^T \left( {\mathbb {E}} \Big[ \left| u_i(s,x+\mathbf {W}_{s}) \right| ^q \Big] \right) ^{ \frac{1}{q}} \, ds. \end{aligned}$$
(15)

Next, [17, Corollary 3.11] shows that

$$\begin{aligned} \sup _{s\in [0,T]}\sup _{y\in {\mathbb {R}}^d} \frac{\left| u_i(s,y)\right| }{\left( 1+\left\| {y}\right\| \right) ^p}\le \sup _{s\in [0,T]}\sup _{y\in {\mathbb {R}}^d} \frac{\left| u_i(s,y)\right| }{1+\left\| {y}\right\| ^p}< \infty . \end{aligned}$$
(16)

This, the triangle inequality, and the fact that \({\mathbb {E}}\left[ \left\| {\mathbf {W}_T}\right\| ^{pq}\right] <\infty\) show that

$$\begin{aligned} &\int _0^T \left( {\mathbb {E}} \Big[ \left| u_i(s,x+\mathbf {W}_{s}) \right| ^q \Big] \right) ^{ \frac{1}{q}} ds\le \left[ \sup _{s\in [0,T]}\sup _{y\in {\mathbb {R}}^d} \frac{|u_i(s,y)|}{\left( 1+\left\| {y}\right\| \right) ^p}\right] \int _{0}^{T}\left( {\mathbb {E}}\left[ \left( 1+\left\| {x+\mathbf {W}_s}\right\| \right) ^{pq}\right] \right) ^{\frac{1}{q}}ds\\&\quad \le \left[ \sup _{s\in [0,T]} \sup _{y\in {\mathbb {R}}^d}\frac{|u_i(s,y)|}{\left( 1+\left\| {y}\right\| \right) ^p}\right] T \left( 1+\left\| {x}\right\| + \left( {\mathbb {E}} \left[ \left\| {\mathbf {W}_T}\right\| ^{pq} \right] \right) ^{\frac{1}{pq}}\right) ^{p} <\infty . \end{aligned}$$
(17)

This, Gronwall’s integral inequality, and (15) establish for all \(t \in [0, T]\) that

$$\begin{aligned} \left( {\mathbb {E}} \Big[ \left| u_i(t,x+\mathbf {W}_{t}) \right| ^q \Big] \right) ^{\frac{1}{q}}\le e^{LT} (T+1)B\sup _{s\in [0,T]} \left( {\mathbb {E}} \left[ \left( 1+\left\| {x+\mathbf {W}_s}\right\| \right) ^{pq} \right] \right) ^{ \frac{1}{q}} . \end{aligned}$$
(18)

The proof of Lemma 2.2 is thus completed. \(\square\)

2.3 A stability result for solutions of PDEs

Lemma 2.3

Assume Setting 2.1. Then it holds for all \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\) that

$$\begin{aligned}&{\mathbb {E}}\Big[ \left| u_1(t,x+\mathbf {W}_t)-u_2(t,x+\mathbf {W}_t)\right| \Big] \\&\quad \le \delta \left( e^{LT} (T+1)\right) ^{q+1}\left( B^q+1\right) \left( 1+\left\| {x}\right\| + \left( {\mathbb {E}} \left[ \left\| {\mathbf {W}_T}\right\| ^{pq} \right] \right) ^{\frac{1}{pq}}\right) ^{pq}. \end{aligned}$$
(19)

Proof of Lemma 2.3

First, (8), the triangle inequality, and the fact that \({\mathbf {W}}\) has stationary increments show for all \(s\in [0,T]\), \(z\in {\mathbb {R}}^d\) that

$$\begin{aligned}&\left| u_1(s,z)-u_2(s,z)\right| \\&\quad =\left| {\mathbb {E}}\left[ (g_1-g_2)\left( z+\mathbf {W}_{T-s}\right) +\int _s^T \left( F_1(u_1)-F_1(u_2)+ F_1(u_2)- F_2(u_2) \right) \left( t,z+\mathbf {W}_{t-s}\right) \,dt\right] \right| \\&\quad \le {\mathbb {E}}\Big[ \left| (g_1-g_2)\left( z+\mathbf {W}_{T}-\mathbf {W}_{s}\right) \right| \Big] +\int _s^T{\mathbb {E}}\Big[ \left| \left( F_1(u_1)-F_1(u_2)\right) \left( t,z+\mathbf {W}_{t}-\mathbf {W}_{s}\right) \right| \Big] \,dt\\&\qquad +\int _s^T{\mathbb {E}}\Big[ \left| \left( F_1(u_2)- F_2(u_2)\right) \left( t,z+\mathbf {W}_{t}-\mathbf {W}_{s}\right) \right| \Big] \,dt. \end{aligned}$$
(20)

This, Fubini’s theorem, the fact that \(\mathbf {W}\) has independent increments, and the Lipschitz condition in (3) ensure that for all \(s\in [0,T]\), \(x\in {\mathbb {R}}^d\) it holds that

$$\begin{aligned}&{\mathbb {E}}\Big[ \left| \left( u_1-u_2\right) (s,x+\mathbf {W}_s)\right| \Big] = {\mathbb {E}}\left[ \left. \left| u_1(s,z)-u_2(s,z)\right| \right| _{z=x+\mathbf {W}_s}\right] \\&\quad \le {\mathbb {E}}\left[ {\mathbb {E}}\Big[ \left. \left| (g_1-g_2)\left( z+\mathbf {W}_{T}-\mathbf {W}_{s}\right) \right| \Big] \right| _{z=x+\mathbf {W}_s}\right] \\&\qquad +\int _s^T{\mathbb {E}}\left[ {\mathbb {E}}\Big[ \left. \left| \left( F_1(u_1)-F_1(u_2)\right) \left( t,z+\mathbf {W}_{t}-\mathbf {W}_{s}\right) \right| \Big] \right| _{z=x+\mathbf {W}_s}\right] \,dt\\&\qquad +\int _s^T{\mathbb {E}}\left[ {\mathbb {E}}\Big[ \left. \left| \left( F_1(u_2)- F_2(u_2)\right) \left( t,z+\mathbf {W}_{t}-\mathbf {W}_{s}\right) \right| \Big] \right| _{z=x+\mathbf {W}_s}\right] \,dt\\&\quad = {\mathbb {E}}\Big[ \left| (g_1-g_2)\left( x+\mathbf {W}_{T}\right) \right| \Big] +\int _s^T{\mathbb {E}}\Big[ \left| \left( F_1(u_1)-F_1(u_2)\right) \left( t,x+\mathbf {W}_{t}\right) \right| \Big] \,dt\\&\qquad +\int _s^T{\mathbb {E}}\Big[ \left| \left( F_1(u_2)- F_2(u_2)\right) \left( t,x+\mathbf {W}_{t}\right) \right| \Big] \,dt\\&\quad \le {\mathbb {E}}\Big[ \left| (g_1-g_2)\left( x+\mathbf {W}_{T}\right) \right| \Big] + \int _s^T{\mathbb {E}}\Big[ L\left| \left( u_1- u_2\right) \left( t,x+\mathbf {W}_{t}\right) \right| \Big] \,dt\\&\qquad + T\sup _{t\in [0,T]}{\mathbb {E}}\Big[ \left| \left( F_1(u_2)- F_2(u_2)\right) \left( t,x+\mathbf {W}_{t}\right) \right| \Big] . \end{aligned}$$
(21)

This, Gronwall’s lemma, and Lemma 2.2 yield for all \(x\in {\mathbb {R}}^d\) that

$$\begin{aligned}&\sup _{t\in [0,T]}{\mathbb {E}}\Big[ \left| \left( u_1-u_2\right) (t,x+\mathbf {W}_t)\right| \Big] \\&\quad \le e^{LT}(T+1) \sup _{t\in [0,T]}\max \left\{ {\mathbb {E}}\Big[ \left| (g_1-g_2)\left( x+\mathbf {W}_{T}\right) \right| \Big] , {\mathbb {E}}\Big[ \left| \left( F_1(u_2)- F_2(u_2)\right) \left( t,x+\mathbf {W}_{t}\right) \right| \Big] \right\} . \end{aligned}$$
(22)

Furthermore, (5), the triangle inequality, and Lemma 2.2 imply for all \(x\in {\mathbb {R}}^d\) that

$$\begin{aligned} \begin{aligned}&\sup _{t\in [0,T]}\max \left\{ {\mathbb {E}}\Big[ \left| (g_1-g_2)\left( x+\mathbf {W}_{T}\right) \right| \Big] , {\mathbb {E}}\Big[ \left| \left( F_1(u_2)- F_2(u_2)\right) \left( t,x+\mathbf {W}_{t}\right) \right| \Big] \right\} \\&\quad \le \delta \sup _{t\in [0,T]} {\mathbb {E}}\left[ \left( 1+\left\| {x+\mathbf {W}_t}\right\| \right) ^{pq}+\left| u_2(t,x+\mathbf {W}_t)\right| ^q\right] \\&\quad \le \delta \sup _{t\in [0,T]} {\mathbb {E}}\left[ \left( 1+\left\| {x+\mathbf {W}_t}\right\| \right) ^{pq}\right] +\delta \sup _{t\in [0,T]}{\mathbb {E}}\left[ \left| u_2(t,x+\mathbf {W}_t)\right| ^q\right] \\&\quad \le \delta \sup _{t\in [0,T]} {\mathbb {E}}\left[ \left( 1+\left\| {x+\mathbf {W}_t}\right\| \right) ^{pq}\right] +\delta (e^{LT} (T+1)B)^q\sup _{t\in [0,T]} {\mathbb {E}} \left[ \left( 1+\left\| {x+\mathbf {W}_t}\right\| \right) ^{pq} \right] \\&\quad \le \delta \left( e^{LT} (T+1)\right) ^q (B^q+1)\sup _{t\in [0,T]} {\mathbb {E}} \left[ \left( 1+\left\| {x+\mathbf {W}_t}\right\| \right) ^{pq} \right] . \end{aligned} \end{aligned}$$
(23)

This, (22), and the triangle inequality yield that

$$\begin{aligned}&\sup _{t\in [0,T]}{\mathbb {E}}\Big[ \left| \left( u_1-u_2\right) (t,x+\mathbf {W}_t)\right| \Big] \\&\quad \le \delta \left( e^{LT} (T+1)\right) ^{q+1}\left( B^q+1\right) \sup _{t\in [0,T]} {\mathbb {E}} \left[ \left( 1+\left\| {x+\mathbf {W}_t}\right\| \right) ^{pq} \right] \\&\quad \le \delta \left( e^{LT} (T+1)\right) ^{q+1}\left( B^q+1\right) \left( 1+\left\| {x}\right\| + \left( {\mathbb {E}} \Big[ \left\| {\mathbf {W}_T}\right\| ^{pq} \Big] \right) ^{\frac{1}{pq}}\right) ^{pq}. \end{aligned}$$
(24)

This completes the proof of Lemma 2.3. \(\square\)

2.4 A stability result for MLP approximations

Corollary 2.4

Assume Setting 2.1, let \(x\in {\mathbb {R}}^d\), \(N,M\in {\mathbb {N}}\), and assume that \(q\ge 2\). Then it holds that

$$\begin{aligned}&\left( {\mathbb {E}}\left[ \left| U^0_{N,M}(0,x)-u_1(0,x)\right| ^2\right] \right) ^{\frac{1}{2}}\\&\quad \le \left( e^{LT} (T+1)\right) ^{q+1}\left( B^q+1\right) \left( \delta +\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\right) \left( 1+\left\| {x}\right\| + \left( {\mathbb {E}} \Big[ \left\| {\mathbf {W}_T}\right\| ^{pq} \Big] \right) ^{\frac{1}{pq}}\right) ^{pq}. \end{aligned}$$
(25)

Proof of Corollary 2.4

First, Lemma 2.2 implies that \(\int _{0}^{T} \left( {\mathbb {E}} \left[ \left| u_2(t,x+\mathbf {W}_{t}) \right| ^2 \right] \right) ^{\frac{1}{2}}dt<\infty\). This, [17, Theorem 3.5] (with \(\xi =x\), \(F=F_2\), \(g=g_2\), and \(u=u_2\) in the notation of [17, Theorem 3.5]), (4), and the triangle inequality ensure that

$$\begin{aligned}&\left( {\mathbb {E}}\left[ \left| U^0_{N,M}(0,x)-u_2(0,x)\right| ^2\right] \right) ^{\frac{1}{2}}\\&\quad \le e^{LT} \left[ \left( {\mathbb {E}}\left[ \left| g_2(x+\mathbf {W}_T)\right| ^2\right] \right) ^{\frac{1}{2}} +T\left( \frac{1}{T}\int _0^T{\mathbb {E}}\left[ \left| (F_2(0))(t,x+\mathbf {W}_t)\right| ^2\right] dt \right) ^{\frac{1}{2}} \right] \frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\\&\quad \le e^{LT} (T+1) \sup _{t\in [0,T]} \left( {\mathbb {E}}\left[ B^2 \left( 1+\left\| {x+\mathbf {W}_t}\right\| \right) ^{2p}\right] \right) ^{\frac{1}{2}}\,\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}} \\&\quad \le e^{LT} (T+1)B\left( 1+\left\| {x}\right\| + \left( {\mathbb {E}} \left[ \left\| {\mathbf {W}_T}\right\| ^{2p} \right] \right) ^{\frac{1}{2p}}\right) ^{p} \frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}. \end{aligned}$$
(26)

Furthermore, Lemma 2.3 shows that

$$\begin{aligned}\left| u_2(0,x)-u_1(0,x)\right|&\le \delta \left( e^{LT} (T+1)\right) ^{q+1}\left( B^q+1\right) \left( 1+\left\| {x}\right\| + \left( {\mathbb {E}} \left[ \left\| {\mathbf {W}_T}\right\| ^{pq} \right] \right) ^{\frac{1}{pq}}\right) ^{pq}. \end{aligned}$$
(27)

This, the triangle inequality, (26), the fact that \(B\le B^q+1\), the assumption that \(q\ge 2\), and Jensen’s inequality show that

$$\begin{aligned}&\left( {\mathbb {E}}\left[ \left| U^0_{N,M}(0,x)-u_1(0,x)\right| ^2\right] \right) ^{\frac{1}{2}}\\&\quad \le \left( {\mathbb {E}}\left[ \left| U^0_{N,M}(0,x)-u_2(0,x)\right| ^2\right] \right) ^{\frac{1}{2}}+\left| u_2(0,x)-u_1(0,x)\right| \\&\quad \le \left( e^{LT} (T+1)\right) ^{q+1}\left( B^q+1\right) \left( \delta +\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\right) \left( 1+\left\| {x}\right\| + \left( {\mathbb {E}} \left[ \left\| {\mathbf {W}_T}\right\| ^{pq} \right] \right) ^{\frac{1}{pq}}\right) ^{pq}. \end{aligned}$$
(28)

The proof of Corollary 2.4 is thus completed. \(\square\)
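To get a feeling for the Monte Carlo factor \(e^{M/2}(1+2LT)^{N}M^{-N/2}\) in (25), the following Python sketch (names ours) evaluates it along the common choice \(N=M\), for which the factor equals \(((1+2LT)e^{1/2}M^{-1/2})^M\) and hence tends to zero as \(M\rightarrow \infty\).

```python
import math

def mlp_rate(N, M, L, T):
    """The Monte Carlo error factor e^{M/2} (1 + 2LT)^N / M^{N/2} from (25)."""
    return math.exp(M / 2) * (1 + 2 * L * T) ** N / M ** (N / 2)

# along N = M the factor equals ((1 + 2LT) e^{1/2} / sqrt(M))^M, so it first
# grows but then decays superpolynomially once sqrt(M) > (1 + 2LT) e^{1/2}
rates = {n: mlp_rate(n, n, L=1.0, T=1.0) for n in (8, 16, 32, 64)}
```

This decay in \(M=N\), combined with the fact that the cost of one MLP sample grows only polynomially in \(M\), is what drives the curse-of-dimensionality-free error bounds in Sect. 4.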

3 Deep neural network representations for MLP approximations

The main result of this section, Lemma 3.10 below, shows that multilevel Picard approximations can be represented well by DNNs. The central tools for the proof of Lemma 3.10 are Lemmas 3.8 and 3.9, which show that DNNs are stable under compositions and summations. We formulate Lemmas 3.8 and 3.9 in terms of the operators defined in (33) and (34) below, whose properties are studied in Lemmas 3.3, 3.4, and 3.5.

3.1 A mathematical framework for deep neural networks

Setting 3.1

(Artificial neural networks) Let \(\left\| \cdot \right\| , |||\cdot ||| :(\cup _{d\in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0,\infty )\) and \(\dim :(\cup _{d\in {\mathbb {N}}}{\mathbb {R}}^d) \rightarrow {\mathbb {N}}\) satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that \(\Vert x\Vert =\sqrt{\sum _{i=1}^d(x_i)^2}\), \(|||x|||=\max _{i\in [1,d]\cap {\mathbb {N}}}|x_i|\), and \(\dim \left( x\right) =d\), let \({\mathbf {A}}_{d}:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\), \(d\in {\mathbb {N}}\), satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that

$${\mathbf {A}}_{d}(x)= \left( \max \{x_1,0\},\ldots ,\max \{x_d,0\}\right) ,$$
(29)

let \({\mathbf {D}}=\cup _{H\in {\mathbb {N}}} {\mathbb {N}}^{H+2}\), let

$$\begin{aligned}{\mathbf {N}} = \bigcup _{H\in {\mathbb {N}}}\bigcup _{(k_0,k_1,\ldots ,k_{H+1})\in {\mathbb {N}}^{H+2}} \left[ \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_{n}\times k_{n-1}} \times {\mathbb {R}}^{k_{n}}\right) \right] , \end{aligned}$$
(30)

let \({\mathcal {D}}:{\mathbf {N}}\rightarrow \mathbf {D}\) and \({\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{k,l\in {\mathbb {N}}} C({\mathbb {R}}^k,{\mathbb {R}}^l))\) satisfy for all \(H\in {\mathbb {N}}\), \(k_0,k_1,\ldots ,k_H,k_{H+1}\in {\mathbb {N}}\), \(\Phi = ((W_1,B_1),\ldots ,(W_{H+1},B_{H+1}))\in \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_n\times k_{n-1}} \times {\mathbb {R}}^{k_n}\right)\), \(x_0 \in {\mathbb {R}}^{k_0},\ldots ,x_{H}\in {\mathbb {R}}^{k_{H}}\) with \(\forall \, n\in {\mathbb {N}}\cap [1,H]:x_n = \mathbf {A}_{k_n}(W_n x_{n-1}+B_n )\) that

$${\mathcal {D}}(\Phi )= (k_0,k_1,\ldots ,k_{H}, k_{H+1}),\qquad {\mathcal {R}}(\Phi )\in C({\mathbb {R}}^{k_0},{\mathbb {R}}^ {k_{H+1}}),$$
(31)
$$\text {and}\qquad ({\mathcal {R}}(\Phi )) (x_0) = W_{H+1}x_{H}+B_{H+1},$$
(32)

let \(\odot :{\mathbf {D}}\times {\mathbf {D}}\rightarrow {\mathbf {D}}\) satisfy for all \(H_1,H_2\in {\mathbb {N}}\), \(\alpha =(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1},\alpha _{H_1+1})\in {\mathbb {N}}^{H_1+2}\), \(\beta =(\beta _0,\beta _1,\ldots ,\beta _{H_2},\beta _{H_2+1})\in {\mathbb {N}}^{H_2+2}\) that

$$\alpha \odot \beta = (\beta _{0},\beta _{1},\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\in {\mathbb {N}}^{H_1+H_2+3},$$
(33)

let \({{\,\mathrm{\boxplus }\,}}:{\mathbf {D}}\times {\mathbf {D}}\rightarrow {\mathbf {D}}\) satisfy for all \(H\in {\mathbb {N}}\), \(\alpha = (\alpha _0,\alpha _1,\ldots ,\alpha _{H},\alpha _{H+1})\in {\mathbb {N}}^{H+2}\), \(\beta = (\beta _0,\beta _1,\beta _2,\ldots ,\beta _{H},\beta _{H+1})\in {\mathbb {N}}^{H+2}\) that

$$\alpha {{\,\mathrm{\boxplus }\,}}\beta =(\alpha _0,\alpha _1+\beta _1,\ldots ,\alpha _{H}+\beta _{H},\beta _{H+1})\in {\mathbb {N}}^{H+2},$$
(34)

and let \({\mathfrak {n}}_{n}\in {\mathbf {D}}\), \(n\in [3,\infty )\cap {\mathbb {N}}\), satisfy for all \(n\in [3,\infty )\cap {\mathbb {N}}\) that

$$\begin{aligned} {\mathfrak {n}}_{n}= (1,\underbrace{2,\ldots ,2}_{(n-2)\text {-times}},1)\in {\mathbb {N}}^{n}. \end{aligned}$$
(35)
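The operations \(\odot\) and \({{\,\mathrm{\boxplus }\,}}\) act on dimension vectors only. A small Python sketch (the function names `odot` and `boxplus` are ours) makes (33) and (34) concrete and checks, on examples, the associativity properties established in Lemmas 3.3 and 3.4 below.

```python
def odot(alpha, beta):
    """alpha (x) beta from (33): glue dimension vectors, beta first, then alpha.

    The output dimension of beta and the input dimension of alpha are added,
    reflecting the composition construction for the underlying networks.
    """
    return beta[:-1] + (beta[-1] + alpha[0],) + alpha[1:]

def boxplus(alpha, beta):
    """alpha [+] beta from (34) for dimension vectors of equal length.

    Hidden dimensions are added (parallelization); the input dimension is
    taken from alpha and the output dimension from beta.
    """
    assert len(alpha) == len(beta)
    return (alpha[0],) + tuple(a + b for a, b in zip(alpha[1:-1], beta[1:-1])) + (beta[-1],)

a, b, c = (1, 2, 2, 1), (1, 3, 1), (2, 4, 4, 2)
assert odot(odot(a, b), c) == odot(a, odot(b, c))              # Lemma 3.3
x, y, z = (1, 2, 5, 1), (1, 3, 3, 1), (1, 7, 2, 1)             # k = l = 1, H = 2
assert boxplus(boxplus(x, y), z) == boxplus(x, boxplus(y, z))  # Lemma 3.4 (iii)
```

For instance, `odot((1, 2, 2, 1), (1, 3, 1))` yields `(1, 3, 2, 2, 2, 1)`, matching the count \(H_1+H_2+3\) of entries in (33).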

Remark 3.2

The set \({\mathbf {N}}\) can be viewed as the set of all artificial neural networks. For each network \(\Phi \in {\mathbf {N}}\) the function \({\mathcal {R}}(\Phi )\) is the function represented by \(\Phi\) and the vector \({\mathcal {D}}(\Phi )\) describes the layer dimensions of \(\Phi\).

3.2 Properties of operations associated to deep neural networks

Lemma 3.3

(\(\odot\) is associative) Assume Setting 3.1 and let \(\alpha ,\beta ,\gamma \in {\mathbf {D}}\). Then it holds that \((\alpha \odot \beta )\odot \gamma = \alpha \odot (\beta \odot \gamma )\).

Proof of Lemma 3.3

Throughout this proof let \(H_1,H_2,H_3\in {\mathbb {N}}\), \((\alpha _i)_{i\in [0,H_1+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_1+2}\), \((\beta _i)_{i\in [0,H_2+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_2+2}\), \((\gamma _i)_{i\in [0,H_3+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_3+2}\) satisfy that

$$\begin{aligned} \alpha&=(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1}),\quad \beta =(\beta _0,\beta _1,\ldots ,\beta _{H_2+1}),\quad \text {and}\\ \gamma&=(\gamma _0,\gamma _1,\ldots ,\gamma _{H_3+1}). \end{aligned}$$
(36)

The definition of \(\odot\) in (33) then shows that

$$\begin{aligned} (\alpha \odot \beta )\odot \gamma&= (\beta _{0},\beta _{1},\beta _2,\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\odot (\gamma _0,\gamma _1,\ldots ,\gamma _{H_3+1})\\&= (\gamma _0,\ldots ,\gamma _{H_3},\gamma _{H_3+1}+\beta _{0},\beta _{1},\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\\&= (\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1})\odot (\gamma _{0},\gamma _{1},\ldots ,\gamma _{H_3},\gamma _{H_3+1}+\beta _{0},\beta _{1},\beta _{2},\ldots ,\beta _{H_2+1}) \\&=\alpha \odot (\beta \odot \gamma ). \end{aligned}$$
(37)

The proof of Lemma 3.3 is thus completed. \(\square\)

Lemma 3.4

(Closedness and associativity of \({{\,\mathrm{\boxplus }\,}}\)) Assume Setting 3.1, let \(H,k,l \in {\mathbb {N}}\), and let \(\alpha ,\beta ,\gamma \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\). Then

  1. (i)

    it holds that\(\alpha {{\,\mathrm{\boxplus }\,}}\beta \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\),

  2. (ii)

    it holds that\(\beta {{\,\mathrm{\boxplus }\,}}\gamma \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\), and

  3. (iii)

    it holds that\((\alpha {{\,\mathrm{\boxplus }\,}}\beta ){{\,\mathrm{\boxplus }\,}}\gamma = \alpha {{\,\mathrm{\boxplus }\,}}(\beta {{\,\mathrm{\boxplus }\,}}\gamma )\).

Proof of Lemma 3.4

Throughout this proof let \(\alpha _i,\beta _i,\gamma _i\in {\mathbb {N}}\), \(i\in [1,H]\cap {\mathbb {N}}\), satisfy that \(\alpha = (k,\alpha _1,\alpha _2,\ldots ,\alpha _{H},l)\), \(\beta = (k,\beta _1,\beta _2,\ldots ,\beta _{H},l)\), and \(\gamma = (k,\gamma _1,\gamma _2,\ldots ,\gamma _{H},l).\) The definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)) then shows that

$$\begin{aligned}\alpha {{\,\mathrm{\boxplus }\,}}\beta&=(k,\alpha _1+\beta _1, \alpha _2+\beta _2, \ldots ,\alpha _{H}+\beta _{H},l)\in \{k\}\times {\mathbb {N}}^{H}\times \{l\}, \\ \beta {{\,\mathrm{\boxplus }\,}}\gamma&=(k,\beta _1+\gamma _1, \beta _2+\gamma _2, \ldots ,\beta _{H}+\gamma _{H},l)\in \{k\}\times {\mathbb {N}}^{H}\times \{l\}, \end{aligned}$$
(38)

and

$$\begin{aligned} (\alpha {{\,\mathrm{\boxplus }\,}}\beta ){{\,\mathrm{\boxplus }\,}}\gamma&=(k,(\alpha _1+\beta _1)+\gamma _1, (\alpha _2+\beta _2)+\gamma _2, \ldots ,(\alpha _{H}+\beta _{H})+\gamma _{H},l) \\&=(k,\alpha _1+(\beta _1+\gamma _1), \alpha _2+(\beta _2+\gamma _2), \ldots ,\alpha _{H}+(\beta _{H}+\gamma _{H}),l) = \alpha {{\,\mathrm{\boxplus }\,}}(\beta {{\,\mathrm{\boxplus }\,}}\gamma ).\end{aligned}$$
(39)

The proof of Lemma 3.4 is thus completed. \(\square\)

Lemma 3.5

(Triangle inequality) Assume Setting 3.1, let \(H,k,l \in {\mathbb {N}}\), and let \(\alpha ,\beta \in \{k\}\times {\mathbb {N}}^{H} \times \{l\}\). Then it holds that \(|||\alpha {{\,\mathrm{\boxplus }\,}}\beta |||\le |||\alpha |||+ |||\beta |||\).

Proof of Lemma 3.5

Throughout this proof let \(\alpha _i,\beta _i\in {\mathbb {N}}\), \(i\in [1,H]\cap {\mathbb {N}}\), satisfy that \(\alpha = (k,\alpha _1,\alpha _2,\ldots ,\alpha _{H},l)\) and \(\beta = (k,\beta _1,\beta _2,\ldots ,\beta _{H},l).\) The definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)) then shows that \(\alpha {{\,\mathrm{\boxplus }\,}}\beta =(k,\alpha _1+\beta _1, \alpha _2+\beta _2, \ldots ,\alpha _{H}+\beta _{H},l).\) This together with the triangle inequality implies that

$$\begin{aligned} |||\alpha {{\,\mathrm{\boxplus }\,}}\beta |||&=\sup \left\{ |k|,\left| \alpha _1+\beta _1\right| , \left| \alpha _2+\beta _2\right| , \ldots ,\left| \alpha _{H}+\beta _{H}\right| ,\left| l\right| \right\} \\&\le \sup \left\{ |k|,\left| \alpha _1\right| , \left| \alpha _2\right| , \ldots ,\left| \alpha _{H}\right| ,\left| l\right| \right\} +\sup \left\{ |k|,\left| \beta _1\right| , \left| \beta _2\right| , \ldots ,\left| \beta _{H}\right| ,\left| l\right| \right\} \\&= |||\alpha |||+|||\beta |||.\end{aligned}$$
(40)

This completes the proof of Lemma 3.5. \(\square\)

The following result, Lemma 3.6, can be found in a slightly modified form, e.g., in [20, Lemma 5.4].

Lemma 3.6

(Existence of DNNs with \(H\in {\mathbb {N}}\) hidden layers for the identity in \({\mathbb {R}}\)) Assume Setting 3.1 and let \(H\in {\mathbb {N}}\). Then it holds that \(\mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{H+2} \})\).

Proof of Lemma 3.6

Throughout this proof let \(W_1\in {\mathbb {R}}^{2\times 1}\), \(W_i\in {\mathbb {R}}^{2\times 2}\), \(\,i\in [2,H]\cap {\mathbb {N}}\), \(W_{H+1}\in {\mathbb {R}}^{1\times 2}\), \(B_i\in {\mathbb {R}}^2\), \(i\in [1,H]\cap {\mathbb {N}}\), \(B_{H+1}\in {\mathbb {R}}^1\) satisfy that

$$\begin{aligned} &W_1= \begin{pmatrix} 1\\ -1 \end{pmatrix},\quad \forall i\in [2,H]\cap {\mathbb {N}}:W_i=\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} , \quad W_{H+1}= \begin{pmatrix} 1&-1 \end{pmatrix},\\&\forall i\in [1,H]\cap {\mathbb {N}}:B_i= \begin{pmatrix} 0\\ 0 \end{pmatrix},\quad B_{H+1}=0,\end{aligned}$$
(41)

let \(\phi \in {\mathbf {N}}\) satisfy that \(\phi =((W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1}))\), for every \(a\in {\mathbb {R}}\) let \(a^+\in [0,\infty )\) be the non-negative part of a, i.e., \(a^+=\max \{a,0\}\), and let \(x_0\in {\mathbb {R}}\), \(x_1,x_2,\ldots ,x_{H}\in {\mathbb {R}}^2\) satisfy for all \(n\in {\mathbb {N}}\cap [1,H]\) that

$$x_n = \mathbf {A}_{2}(W_n x_{n-1}+B_n ).$$
(42)

Note that (41) and the definition of \({\mathcal {D}}\) (see (31)) imply that \({\mathcal {D}}(\phi )={\mathfrak {n}}_{H+2}\). Furthermore, (41), (42), and an induction argument show that

$$\begin{aligned} x_1&= \mathbf {A}_{2}(W_1x_0+B_1)= \mathbf {A}_{2}\left( \begin{pmatrix} x_0\\ -x_0 \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix},\\ x_2&= \mathbf {A}_{2}(W_2x_1+B_2)= \mathbf {A}_{2}(x_1)=\mathbf {A}_{2}\left( \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix} ,\\&\quad \vdots \\ x_{H}&= \mathbf {A}_{2}(W_{H}x_{H-1}+B_{H})= \mathbf {A}_{2}(x_{H-1})=\mathbf {A}_{2}\left( \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}. \end{aligned}$$
(43)

The definition of \({\mathcal {R}}\) (see (32)) hence ensures that

$$\begin{aligned} ({\mathcal {R}}(\phi ))(x_0)&=W_{H+1}x_{H}+B_{H+1}= \begin{pmatrix} 1&-1 \end{pmatrix} \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}=x_0^{+}-(-x_0)^{+}=x_0.\end{aligned}$$
(44)

The fact that \(x_0\) was arbitrary therefore proves that \({\mathcal {R}}(\phi ) =\mathrm {Id}_{{\mathbb {R}}}\). This and the fact that \({\mathcal {D}}(\phi )={\mathfrak {n}}_{H+2}\) demonstrate that \({\mathrm {Id}}_{{\mathbb {R}}}\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{H+2} \})\). The proof of Lemma 3.6 is thus completed. \(\square\)
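The identity network from the proof can be written out numerically. The following numpy sketch is our own illustration of (41)-(44) (the function names are ours), using the ReLU realization map from (31)-(32):

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def realize(phi, x0):
    # Realization R(phi) of phi = [(W_1, B_1), ..., (W_{H+1}, B_{H+1})],
    # cf. (31)-(32): ReLU on every hidden layer, affine output layer.
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = relu(W @ x + B)
    W, B = phi[-1]
    return W @ x + B


def identity_network(H):
    # H hidden layers of width 2 realizing Id_R, cf. (41): the input x
    # is split into (x^+, (-x)^+), carried through unchanged, and
    # recombined as x^+ - (-x)^+ = x.
    layers = [(np.array([[1.0], [-1.0]]), np.zeros(2))]
    layers += [(np.eye(2), np.zeros(2)) for _ in range(H - 1)]
    layers.append((np.array([[1.0, -1.0]]), np.zeros(1)))
    return layers
```

For every \(H\) the resulting network has dimension vector \((1,2,\ldots ,2,1)={\mathfrak {n}}_{H+2}\) and realizes the identity exactly, e.g. `realize(identity_network(3), [-2.5])` returns `[-2.5]`.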

Lemma 3.7

(DNNs for affine transformations) Assume Setting 3.1 and let \(d,m\in {\mathbb {N}}\), \(\lambda \in {\mathbb {R}}\), \(b\in {\mathbb {R}}^d\), \(a\in {\mathbb {R}}^m\), \(\Psi \in {\mathbf {N}}\) satisfy that \({\mathcal {R}}(\Psi )\in C({\mathbb {R}}^d,{\mathbb {R}}^m)\). Then it holds that

$$\lambda \left( \left( \mathcal {R}(\Psi )\right) (\cdot +b)+a\right) \in \mathcal {R}\left( \{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Psi )\}\right) .$$
(45)

Proof of Lemma 3.7

Throughout this proof let \(H,k_0,k_1,\ldots ,k_{H+1}\in {\mathbb {N}}\) satisfy that

$$H+2=\dim \left( \mathcal {D}(\Psi )\right) \quad \text {and}\quad (k_0,k_1,\ldots ,k_{H},k_{H+1}) = \mathcal {D}(\Psi ),$$
(46)

let \(((W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})) \in \prod _{n=1}^{H+1}\left( {\mathbb {R}}^{k_n\times k_{n-1}}\times {\mathbb {R}}^{k_n}\right)\) satisfy that

$$\left( (W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})\right) =\Psi ,$$
(47)

let \(\phi \in {\mathbf {N}}\) satisfy that

$$\phi =\left( (W_1,B_1+W_1b),(W_2,B_2),\ldots ,(W_H,B_H),(\lambda W_{H+1},\lambda B_{H+1}+\lambda a)\right) ,$$
(48)

and let \(x_0,y_0 \in {\mathbb {R}}^{k_0},x_1,y_1 \in {\mathbb {R}}^{k_1},\ldots ,x_{H},y_H\in {\mathbb {R}}^{k_{H}}\) satisfy for all \(n\in {\mathbb {N}}\cap [1,H]\) that

$$x_n = {\mathbf {A}}_{k_n}(W_n x_{n-1}+B_n ),\, y_n = \mathbf {A}_{k_n}(W_n y_{n-1}+B_n+\mathbb {1}_{\{1\}}(n)W_1b ), \quad \text {and} \quad x_0=y_0+b.$$
(49)

Then it holds that

$$y_1= {\mathbf {A}}_{k_1}(W_1 y_{0}+B_1+W_1b )= {\mathbf {A}}_{k_1}(W_1( y_{0}+b)+B_1 ) = {\mathbf {A}}_{k_1}(W_1x_0+B_1 )=x_1.$$
(50)

This and an induction argument prove for all \(i\in [2,H]\cap {\mathbb {N}}\) that

$$\begin{aligned} y_i=\mathbf {A}_{k_i}(W_i y_{i-1}+B_i )= \mathbf {A}_{k_i}(W_i x_{i-1}+B_i )=x_i. \end{aligned}$$
(51)

The definition of \({\mathcal {R}}\) (see (32)) hence shows that

$$\begin{aligned} ({\mathcal {R}}(\phi ))(y_0)&= \lambda W_{H+1}y_H+\lambda B_{H+1}+\lambda a=\lambda W_{H+1}x_H+\lambda B_{H+1}+\lambda a\\ {}&=\lambda (W_{H+1}x_H+ B_{H+1}+ a) =\lambda ( (\mathcal {R}(\Psi ))(x_0)+a)= \lambda ((\mathcal {R}(\Psi ))(y_0+b)+a). \end{aligned}$$
(52)

This and the fact that \(y_0\) was arbitrary prove that \({\mathcal {R}}(\phi )=\lambda ((\mathcal {R}(\Psi ))(\cdot +b)+a)\). This and the fact that \({\mathcal {D}}(\phi )=\mathcal {D}(\Psi )\) imply that \(\lambda \left( (\mathcal {R}(\Psi ))(\cdot +b)+a\right) \in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Psi )\})\). The proof of Lemma 3.7 is thus completed. \(\square\)
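The modification of the first and last affine layers in (48) can likewise be sketched in numpy. This is our own illustration (names ours, and the example network `psi` is hypothetical), reusing the ReLU realization from (31)-(32):

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def realize(phi, x0):
    # ReLU realization of phi, cf. (31)-(32).
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = relu(W @ x + B)
    W, B = phi[-1]
    return W @ x + B


def affine_transform(psi, lam, b, a):
    # Cf. (48): absorb the input shift b into the first bias, and the
    # scaling lam and output shift a into the last affine layer. The new
    # network has the same layer dimensions as psi and realizes
    # x -> lam * ((R(psi))(x + b) + a).
    (W1, B1), (Wout, Bout) = psi[0], psi[-1]
    return [(W1, B1 + W1 @ b)] + list(psi[1:-1]) + \
        [(lam * Wout, lam * (Bout + a))]


# Hypothetical example network psi with R(psi): R^2 -> R.
psi = [
    (np.array([[1.0, -1.0], [0.5, 2.0], [1.0, 1.0]]),
     np.array([0.0, 1.0, -1.0])),
    (np.array([[1.0, 1.0, 1.0]]), np.array([0.5])),
]
phi = affine_transform(psi, lam=2.0, b=np.array([0.3, -0.7]), a=np.array([1.0]))
```

Since only biases and the final affine layer change, `phi` and `psi` share the same dimension vector, exactly as asserted in (45).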

Lemma 3.8

(Composition) Assume Setting 3.1 and let \(d_1,d_2,d_3\in {\mathbb {N}}\), \(f\in C({\mathbb {R}}^{d_2},{\mathbb {R}}^{d_3})\), \(g\in C( {\mathbb {R}}^{d_1}, {\mathbb {R}}^{d_2})\), \(\alpha ,\beta \in \mathbf {D}\) satisfy that \(f\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \})\) and \(g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\beta \})\). Then it holds that \((f\circ g)\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \odot \beta \})\).

Proof of Lemma 3.8

Throughout this proof let \(H_1,H_2,\alpha _0,\ldots , \alpha _{H_1+1},\beta _0,\ldots , \beta _{H_2+1}\in {\mathbb {N}}\), \(\Phi _{f}, \Phi _{g}\in \mathbf {N}\) satisfy that

$$\begin{aligned}&(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1})=\alpha , \quad (\beta _0,\beta _1,\ldots ,\beta _{H_2+1})=\beta , \quad \mathcal {R}(\Phi _{f})=f , \\&\quad \mathcal {D}(\Phi _{f})=\alpha , \quad \mathcal {R}(\Phi _{g})=g,\quad \text {and}\quad \mathcal {D}(\Phi _{g})=\beta . \end{aligned}$$
(53)

Lemma 5.4 in [20] shows that there exists \(\mathbb {I}\in \mathbf {N}\) such that \(\mathcal {D}(\mathbb {I})=d_2{\mathfrak {n}}_{3}= (d_2,2d_2,d_2)\) and \(\mathcal {R}(\mathbb {I}) =\mathrm {Id}_{{\mathbb {R}}^{d_2}}\). Note that the assumptions \(f\in C({\mathbb {R}}^{d_2},{\mathbb {R}}^{d_3})\) and \(g\in C( {\mathbb {R}}^{d_1}, {\mathbb {R}}^{d_2})\) ensure that \(\alpha _0=d_2=\beta _{H_2+1}\) and hence that \(2d_2=\beta _{H_2+1}+\alpha _0\). This and [20, Proposition 5.2] (with \(\phi _1= \Phi _{f}\), \(\phi _2= \Phi _{g}\), and \(\mathbb {I}=\mathbb {I}\) in the notation of [20, Proposition 5.2]) show that there exists \(\Phi _{f\circ g}\in \mathbf {N}\) such that \(\mathcal {R}( \Phi _{f\circ g})=f\circ g\) and \(\mathcal {D}(\Phi _{f\circ g})= \mathcal {D}(\Phi _{f})\odot \mathcal {D}(\Phi _{g})=\alpha \odot \beta\). Hence, it holds that \(f\circ g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \odot \beta \})\). The proof of Lemma 3.8 is thus completed. \(\square\)
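The composition cited from [20, Proposition 5.2] can be sketched concretely: the output layer of \(\Phi_g\) and the input layer of \(\Phi_f\) are fused through the width-\(2d_2\) identity trick \(y = y^+ - (-y)^+\). The following is our own sketch under that construction, not the construction of [20] verbatim; the names and the example networks are ours:

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def realize(phi, x0):
    # ReLU realization of phi, cf. (31)-(32).
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = relu(W @ x + B)
    W, B = phi[-1]
    return W @ x + B


def compose(phi_f, phi_g):
    # The last affine layer (Wg, Bg) of phi_g becomes a ReLU layer
    # computing (y^+, (-y)^+), and the first layer (Wf, Bf) of phi_f
    # recombines y = y^+ - (-y)^+. The fused hidden layer has width
    # 2*d_2 = beta_{H_2+1} + alpha_0, so the dimension vector of the
    # result is D(phi_f) composed with D(phi_g) in the sense of (33).
    Wg, Bg = phi_g[-1]
    Wf, Bf = phi_f[0]
    fused_in = (np.vstack([Wg, -Wg]), np.concatenate([Bg, -Bg]))
    fused_out = (np.hstack([Wf, -Wf]), Bf)
    return list(phi_g[:-1]) + [fused_in, fused_out] + list(phi_f[1:])


# Hypothetical example: phi_g realizes a map R -> R^2, phi_f a map R^2 -> R.
phi_g = [(np.array([[1.0], [-1.0]]), np.zeros(2)),
         (np.array([[1.0, 0.5], [-0.5, 1.0]]), np.array([0.1, -0.2]))]
phi_f = [(np.array([[1.0, 2.0], [-1.0, 1.0]]), np.zeros(2)),
         (np.array([[1.0, 1.0]]), np.array([0.0]))]
```

Here \(\mathcal {D}(\Phi_f)=(2,2,1)\), \(\mathcal {D}(\Phi_g)=(1,2,2)\), and the composed network has dimension vector \((1,2,4,2,1)=\mathcal {D}(\Phi_f)\odot \mathcal {D}(\Phi_g)\).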

The following result, Lemma 3.9, can, roughly speaking, be found in a specialized form, e.g., in [20, Lemma 5.1].

Lemma 3.9

(Sum of DNNs of the same length) Assume Setting 3.1 and let \(M,H,p,q\in {\mathbb {N}}\), \(h_1,h_2,\ldots ,h_M\in {\mathbb {R}}\), \(k_i\in \mathbf {D}\), \(f_i\in C({\mathbb {R}}^{p},{\mathbb {R}}^{q})\), \(i\in [1,M]\cap {\mathbb {N}}\), satisfy for all \(i\in [1,M]\cap {\mathbb {N}}\) that

$$\begin{aligned} \dim \left( k_i\right) =H+2\quad \text {and}\quad f_i\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=k_i\right\} \right) . \end{aligned}$$
(54)

Then it holds that

$$\begin{aligned} \sum _{i=1}^{M}h_if_i \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i\right\} \right) . \end{aligned}$$
(55)

Proof of Lemma 3.9

Throughout this proof let \(\phi _i\in \mathbf {N}\), \(i\in [1,M]\cap {\mathbb {N}}\), and \(k_{i,n}\in {\mathbb {N}}\), \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [0,H+1]\cap {\mathbb {N}}_0\), satisfy for all \(i \in [1,M]\cap {\mathbb {N}}\) that

$$\begin{aligned} \mathcal {D}(\phi _i)=k_i= (k_{i,0},k_{i,1},k_{i,2},\ldots ,k_{i,H},k_{i,H+1}) \quad \text {and}\quad \mathcal {R}(\phi _i)=f_i, \end{aligned}$$
(56)

for every \(i\in [1,M]\cap {\mathbb {N}}\) let \(((W_{i,1}, B_{i,1}),\ldots , (W_{i,H+1}, B_{i,H+1}))\in \prod _{n=1}^{H+1}\left( {\mathbb {R}}^{k_{i,n}\times k_{i,n-1}} \times {\mathbb {R}}^{k_{i,n}}\right)\) satisfy that

$$\begin{aligned} \phi _i= \left( (W_{i,1}, B_{i,1}),\ldots , (W_{i,H+1},B_{i,H+1})\right) , \end{aligned}$$
(57)

let \(k_n^{{{\,\mathrm{\boxplus }\,}}}\in {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\), \(k^{{{\,\mathrm{\boxplus }\,}}}\in {\mathbb {N}}^ {H+2}\) satisfy for all \(n\in [1,H]\cap {\mathbb {N}}\) that

$$\begin{aligned} \begin{aligned} k_n^{{{\,\mathrm{\boxplus }\,}}}=\sum _{i=1}^{M}k_{i,n}\quad \text {and}\quad k^{{{\,\mathrm{\boxplus }\,}}}=(p,k^{{{\,\mathrm{\boxplus }\,}}}_1,k^{{{\,\mathrm{\boxplus }\,}}}_2,\ldots , k^{{{\,\mathrm{\boxplus }\,}}}_{H},q), \end{aligned} \end{aligned}$$
(58)

let \(W_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}\times p}\), \(B_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}}\) satisfy that

$$\begin{aligned} W_1= \begin{pmatrix} W_{1,1}\\ W_{2,1}\\ \vdots \\ W_{M,1} \end{pmatrix} \quad \text {and}\quad B_1= \begin{pmatrix} B_{1,1}\\ B_{2,1}\\ \vdots \\ B_{M,1} \end{pmatrix}, \end{aligned}$$
(59)

let \(W_n\in {\mathbb {R}}^{k_n^{{{\,\mathrm{\boxplus }\,}}}\times k_{n-1}^{{{\,\mathrm{\boxplus }\,}}}}\), \(B_n\in {\mathbb {R}}^{k^{{{\,\mathrm{\boxplus }\,}}}_{n}}\), \(n\in [2,H]\cap {\mathbb {N}}\), satisfy for all \(n\in [2,H]\cap {\mathbb {N}}\) that

$$\begin{aligned} W_n= \begin{pmatrix} W_{1,n} & 0 & 0 & 0 \\ 0 & W_{2,n} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & W_{M,n} \end{pmatrix} \quad \text {and}\quad B_n= \begin{pmatrix} B_{1,n}\\ B_{2,n}\\ \vdots \\ B_{M,n} \end{pmatrix},\end{aligned}$$
(60)

let \(W_{H+1}\in {\mathbb {R}}^{q\times k_{H}^{{{\,\mathrm{\boxplus }\,}}}}\), \(B_{H+1}\in {\mathbb {R}}^{q}\) satisfy that

$$\begin{aligned} \begin{aligned} W_{H+1}= \begin{pmatrix} h_1W_{1,H+1}&\ldots&h_MW_{M,H+1} \end{pmatrix}\quad \text {and}\quad B_{H+1} = \sum _{i=1}^{M}h_iB_{i,H+1}, \end{aligned} \end{aligned}$$
(61)

let \(x_0\in {\mathbb {R}}^p\), \(x_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}}\), \(x_2\in {\mathbb {R}}^{k_2^{{{\,\mathrm{\boxplus }\,}}}},\ldots ,x_H\in {\mathbb {R}}^{k_H^{{{\,\mathrm{\boxplus }\,}}}}\), let \(x_{1,0},x_{2,0},\ldots ,x_{M,0}\in {\mathbb {R}}^{p}\), \(x_{i,n}\in {\mathbb {R}}^{k_{i,n}}\), \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\), satisfy for all \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\) that

$$\begin{aligned} &x_0=x_{1,0}=x_{2,0}=\cdots =x_{M,0},\\&x_{i,n}=\mathbf {A}_{k_{i,n}}(W_{i,n}x_{i,n-1}+B_{i,n}),\\&x_n= \mathbf {A}_{k^{{{\,\mathrm{\boxplus }\,}}}_{n}}(W_{n}x_{n-1}+B_{n}), \end{aligned}$$
(62)

and let \(\psi \in {\mathbf {N}}\) satisfy that

$$\begin{aligned} \psi = \left( (W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})\right) . \end{aligned}$$
(63)

First, the definitions of \(\mathcal {D}\) and \(\mathcal {R}\) (see (31) and (32)), (56), and the fact that \(\forall \, i\in [1,M]\cap {\mathbb {N}}:f_i\in C({\mathbb {R}}^p,{\mathbb {R}}^q)\) show for all \(i\in [1,M]\cap {\mathbb {N}}\) that \(k_i= (p,k_{i,1},k_{i,2},\ldots ,k_{i,H},q).\) The definition of \(\mathcal {D}\) (see (31)), the definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)), and (58) then show that

$$\begin{aligned} \mathcal {D}(\psi )= (p,k_1^{{{\,\mathrm{\boxplus }\,}}},\ldots ,k_H^{{{\,\mathrm{\boxplus }\,}}},q)={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i. \end{aligned}$$
(64)

Next, we prove by induction on \(n\in [1,H]\cap {\mathbb {N}}\) that \(x_n=(x_{1,n},x_{2,n},\ldots ,x_{M,n})\). First, (59) shows that

$$\begin{aligned} W_1x_0+B_1= \begin{pmatrix} W_{1,1}\\ W_{2,1}\\ \vdots \\ W_{M,1} \end{pmatrix}x_0+ \begin{pmatrix} B_{1,1}\\ B_{2,1}\\ \vdots \\ B_{M,1} \end{pmatrix} = \begin{pmatrix} W_{1,1}x_0+B_{1,1}\\ W_{2,1}x_0+B_{2,1}\\ \vdots \\ W_{M,1}x_0+B_{M,1} \end{pmatrix}. \end{aligned}$$
(65)

This implies that

$$\begin{aligned} x_1= \mathbf {A}_{k_1^{{{\,\mathrm{\boxplus }\,}}}}(W_1x_0+B_1)=\begin{pmatrix} x_{1,1}\\ x_{2,1}\\ \vdots \\ x_{M,1}\end{pmatrix}. \end{aligned}$$
(66)

This proves the base case. Next, for the induction step let \(n\in [2,H]\cap {\mathbb {N}}\) and assume that \(x_{n-1}=(x_{1,n-1},x_{2,n-1},\ldots ,x_{M,n-1})\). Then (60) and the induction hypothesis ensure that

$$\begin{aligned}&W_nx_{n-1}+B_n \\&\quad = W_{n}\begin{pmatrix} x_{1,n-1}\\ x_{2,n-1}\\ \vdots \\ x_{M,n-1} \end{pmatrix}+B_{n} =\begin{pmatrix} W_{1,n} & 0 & 0 & 0 \\ 0 & W_{2,n} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & W_{M,n} \end{pmatrix} \begin{pmatrix} x_{1,n-1}\\ x_{2,n-1}\\ \vdots \\ x_{M,n-1} \end{pmatrix}+ \begin{pmatrix} B_{1,n}\\ B_{2,n}\\ \vdots \\ B_{M,n} \end{pmatrix} \\&\quad = \begin{pmatrix} W_{1,n}x_{1,n-1}+ B_{1,n}\\ W_{2,n}x_{2,n-1}+B_{2,n}\\ \vdots \\ W_{M,n}x_{M,n-1}+ B_{M,n} \end{pmatrix}.\end{aligned}$$
(67)

This yields that

$$\begin{aligned} x_{n}= \mathbf {A}_{k_n^{{{\,\mathrm{\boxplus }\,}}}}(W_nx_{n-1}+B_n)=\begin{pmatrix} x_{1,n}\\ x_{2,n}\\ \vdots \\ x_{M,n} \end{pmatrix}. \end{aligned}$$
(68)

This proves the induction step. Induction now proves for all \(n\in [1,H]\cap {\mathbb {N}}\) that \(x_n=(x_{1,n},x_{2,n},\ldots ,x_{M,n})\). This, the definition of \(\mathcal {R}\) (see (32)), and (61) imply that

$$\begin{aligned} \begin{aligned}&(\mathcal {R}(\psi ))(x_0)=W_{H+1}x_H+B_{H+1}\\&\quad =W_{H+1}\begin{pmatrix} x_{1,H}\\ x_{2,H}\\ \vdots \\ x_{M,H} \end{pmatrix}+B_{H+1} =\begin{pmatrix} h_1W_{1,H+1}&\ldots&h_MW_{M,H+1} \end{pmatrix} \begin{pmatrix} x_{1,H}\\ x_{2,H}\\ \vdots \\ x_{M,H} \end{pmatrix}+\left[ \sum _{i=1}^{M}h_iB_{i,H+1}\right] \\&\quad =\left[ \sum _{i=1}^{M}h_iW_{i,H+1}x_{i,H}\right] +\left[ \sum _{i=1}^{M}h_iB_{i,H+1}\right] = \sum _{i=1}^{M}h_i\left( W_{i,H+1}x_{i,H}+B_{i,H+1}\right) \\&\quad =\sum _{i=1}^M h_i(\mathcal {R}(\phi _i))(x_0). \end{aligned} \end{aligned}$$
(69)

This, the fact that \(x_0\in {\mathbb {R}}^{p}\) was arbitrary, and (56) yield that

$$\begin{aligned} \mathcal {R}(\psi )= \sum _{i=1}^{M}h_i\mathcal {R}(\phi _i)=\sum _{i=1}^{M}h_if_i. \end{aligned}$$
(70)

This and (64) show that

$$\begin{aligned} \sum _{i=1}^{M}h_if_i \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i\right\} \right) . \end{aligned}$$
(71)

The proof of Lemma 3.9 is thus completed. \(\square\)
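The parallelization in (59)-(61) stacks the first layers vertically, makes the hidden layers block-diagonal, and concatenates the output layers with the weights \(h_i\) folded in. A numpy sketch of this construction (our own illustration, names ours):

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def realize(phi, x0):
    # ReLU realization of phi, cf. (31)-(32).
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = relu(W @ x + B)
    W, B = phi[-1]
    return W @ x + B


def block_diag(mats):
    # Block-diagonal matrix as in (60).
    rows = sum(m.shape[0] for m in mats)
    cols = sum(m.shape[1] for m in mats)
    out = np.zeros((rows, cols))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out


def sum_networks(nets, h):
    # Network realizing sum_i h[i] * R(nets[i]) for networks of equal
    # depth, cf. (59)-(61): since the hidden blocks never mix, the M
    # sub-networks run in parallel and only the output layer combines them.
    layers = [(np.vstack([net[0][0] for net in nets]),
               np.concatenate([net[0][1] for net in nets]))]
    for n in range(1, len(nets[0]) - 1):
        layers.append((block_diag([net[n][0] for net in nets]),
                       np.concatenate([net[n][1] for net in nets])))
    layers.append((np.hstack([hi * net[-1][0] for hi, net in zip(h, nets)]),
                   sum(hi * net[-1][1] for hi, net in zip(h, nets))))
    return layers
```

The dimension vector of the result is exactly the \({{\,\mathrm{\boxplus }\,}}\)-sum of the individual dimension vectors, as in (64).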

3.3 Deep neural network representations for MLP approximations

Lemma 3.10

Assume Setting 3.1, let \(d,M\in {\mathbb {N}}\), \(T,c \in (0,\infty )\), \(f\in C({\mathbb {R}},{\mathbb {R}})\), \(g \in C( {\mathbb {R}}^d, {\mathbb {R}})\), \(\Phi _f,\Phi _g\in \mathbf {N}\) satisfy that \(\mathcal {R}(\Phi _f)= f\), \(\mathcal {R}(\Phi _g)= g\), and

$$\begin{aligned} c\ge \max \left\{ 2, |||\mathcal {D}(\Phi _{f})|||,|||\mathcal {D}(\Phi _{g})|||\right\} , \end{aligned}$$
(72)

let \((\Omega , \mathcal {F}, {\mathbb {P}})\) be a probability space, let \(\Theta = \bigcup _{n \in {\mathbb {N}}} {\mathbb {Z}}^n\), let \({\mathfrak {u}}^\theta :\Omega \rightarrow [0,1]\), \(\theta \in \Theta\), be independent random variables which are uniformly distributed on [0, 1], let \(\mathcal {U}^\theta :[0,T]\times \Omega \rightarrow [0, T]\), \(\theta \in \Theta\), satisfy for all \(t\in [0,T]\), \(\theta \in \Theta\) that \(\mathcal {U}^\theta _t = t+ (T-t){\mathfrak {u}}^\theta\), let \(W^\theta :[0,T]\times \Omega \rightarrow {\mathbb {R}}^d\), \(\theta \in \Theta\), be independent standard Brownian motions with continuous sample paths, assume that \(({\mathfrak {u}}^\theta )_{\theta \in \Theta }\) and \((W^\theta )_{\theta \in \Theta }\) are independent, let \({U}_{n,M}^{\theta } :[0, T] \times {\mathbb {R}}^d \times \Omega \rightarrow {\mathbb {R}}\), \(n\in {\mathbb {Z}}\), \(\theta \in \Theta\), satisfy for all \(n \in {\mathbb {N}}\), \(\theta \in \Theta\), \(t \in [0,T]\), \(x\in {\mathbb {R}}^d\) that \({U}_{-1,M}^{\theta }(t,x)={U}_{0,M}^{\theta }(t,x)=0\) and

$$\begin{aligned} {U}_{n,M}^{\theta }(t,x) & = \frac{1}{M^n} \sum _{i=1}^{M^n} g\left( x+W^{(\theta ,0,-i)}_{T}-W^{(\theta ,0,-i)}_{t}\right) \\&\quad + \sum _{l=0}^{n-1} \frac{(T-t)}{M^{n-l}} \left[ \sum _{i=1}^{M^{n-l}} \left( f\circ {U}_{l,M}^{(\theta ,l,i)}-\mathbb {1}_{{\mathbb {N}}}(l)\,f\circ {U}_{l-1,M}^{(\theta ,-l,i)}\right)\right.\\ &\qquad\quad\qquad\qquad\qquad \left.\left( \mathcal {U}_t^{(\theta ,l,i)},x+W_{\mathcal {U}_t^{(\theta ,l,i)}}^{(\theta ,l,i)}-W_{t}^{(\theta ,l,i)}\right) \right] , \end{aligned}$$
(73)

and let \(\omega \in \Omega\). Then for all \(n\in {\mathbb {N}}_0\) there exists a family \((\Phi _{n,t}^{\theta })_{\theta \in \Theta ,t\in [0,T]}\subseteq \mathbf {N}\) such that

  1. (i)

    it holds for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that

    $${\mathcal {D}}\left( \Phi _{n,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{n,t_2}^{\theta _2}\right) ,$$
    (74)
  2. (ii)

    it holds for all \(t\in [0,T]\), \(\theta \in \Theta\) that

    $$\dim \left( \mathcal {D}\left( \Phi _{n,t}^{\theta }\right) \right) = n\left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) ,$$
    (75)
  3. (iii)

    it holds for all \(t\in [0,T]\), \(\theta \in \Theta\) that

    $$|||\mathcal {D}(\Phi _{n,t}^{\theta } )|||\le c(3 M)^n,$$
    (76)

    and

  4. (iv)

    it holds for all \(\theta \in \Theta\), \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\) that

    $${U}_{n,M}^{\theta }(t,x,\omega )=(\mathcal {R}(\Phi _{n,t}^{\theta }))(x).$$
    (77)

Proof of Lemma 3.10

We prove Lemma 3.10 by induction on \(n\in {\mathbb {N}}_0\). For the base case \(n=0\) note that the fact that \(\forall \, t\in [0,T],\theta \in \Theta :U^\theta _{0,M}(t,\cdot ,\omega )=0\), the fact that the function 0 can be represented by a network with depth \(\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\), and (72) imply that there exists \((\Phi _{0,t}^{\theta })_{\theta \in \Theta , t\in [0,T]}\subseteq \mathbf {N}\) such that it holds for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that \(\mathcal {D}\left( \Phi _{0,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{0,t_2}^{\theta _2}\right)\) and such that it holds for all \(\theta \in \Theta\), \(t\in [0,T]\) that \(\dim \left( \mathcal {D}(\Phi _{0,t}^{\theta })\right) =\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\), \(|||\mathcal {D}(\Phi _{0,t}^{\theta } )|||\le |||\mathcal {D}(\Phi _{g})|||\le c\), and \({U}_{0,M}^{\theta }(t,\cdot ,\omega )= \mathcal {R}(\Phi _{0,t}^{\theta })\). This proves the base case \(n=0\).

For the induction step from \(n\in {\mathbb {N}}_0\) to \(n+1\in {\mathbb {N}}\) let \(n\in {\mathbb {N}}_0\) and assume that (i)–(iv) hold true for all \(k\in [0,n]\cap {\mathbb {N}}_0\). The assumption that \(g=\mathcal {R}(\Phi _g)\) and Lemma 3.7 (with \(d=d\), \(m=1\), \(\lambda =1\), \(a=0\), \(b=W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\), and \(\Psi =\Phi _g\) for \(\theta \in \Theta\), \(t\in [0,T]\) in the notation of Lemma 3.7) show for all \(\theta \in \Theta\), \(t\in [0,T]\) that

$$\begin{aligned} \begin{aligned} g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right)&=(\mathcal {R}(\Phi _g))\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right) \\&\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{g}) \right\} \right) . \end{aligned} \end{aligned}$$
(78)

Furthermore, Lemma 3.6 (with \(H=(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) -1\) in the notation of Lemma 3.6) ensures that

$$\begin{aligned} \mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}\right\} \right) . \end{aligned}$$
(79)

This, (78), and Lemma 3.8 (with \(d_1=d\), \(d_2=1\), \(d_3=1\), \(f=\mathrm {Id}_{{\mathbb {R}}}\), \(g=g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right)\), \(\alpha ={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}\), and \(\beta =\mathcal {D}(\Phi _g)\) for \(\theta \in \Theta\), \(t\in [0,T]\) in the notation of Lemma 3.8) show that for all \(\theta \in \Theta\), \(t\in [0,T]\) it holds that

$$\begin{aligned} \begin{aligned} g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right) \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1} \odot \mathcal {D}(\Phi _{g}) \right\} \right) . \end{aligned} \end{aligned}$$
(80)

Next, the induction hypothesis implies for all \(\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n]\cap {\mathbb {N}}_0\) that

$$\begin{aligned} {U}_{l,M}^{\theta }(t,\cdot ,\omega )=\mathcal {R}(\Phi _{l,t}^{\theta })\quad \text {and}\quad \mathcal {D}\left( \Phi _{l,t}^{\theta }\right) =\mathcal {D}\left( \Phi _{l,0}^{0}\right) . \end{aligned}$$
(81)

This and Lemma 3.7 (with

$$\begin{aligned} \begin{aligned}&d=d,\quad m=1,\quad a=0,\quad b=W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\quad \text {and}\\&\Psi =\Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\quad \text {for}\quad \theta ,\eta \in \Theta , \quad t\in [0,T],\quad l\in [0,n]\cap {\mathbb {N}}_0 \end{aligned} \end{aligned}$$
(82)

in the notation of Lemma 3.7) imply that for all \(\theta ,\eta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n]\cap {\mathbb {N}}_0\) it holds that

$$\begin{aligned} \begin{aligned}&U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad =\left( \mathcal {R}\left( \Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\right) \right) \left( \cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega )\right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= \mathcal {D}\left( \Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\right) \right\} \right) = \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= \mathcal {D}\left( \Phi _{l,0}^{0}\right) \right\} \right) . \end{aligned} \end{aligned}$$
(83)

Moreover, Lemma 3.6 (with \(H=(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) -1\) for \(l\in [0,n-1]\cap {\mathbb {N}}_0\) in the notation of Lemma 3.6) ensures for all \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that

$$\begin{aligned} \mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \right\} \right) . \end{aligned}$$
(84)

This, (83), and Lemma 3.8 (with

$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=\mathrm {Id}_{{\mathbb {R}}}, \quad \alpha ={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}, \quad \\&\quad \beta =\mathcal {D}\left( \Phi _{l,0}^{0}\right) ,\quad \text {and}\quad g= U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \qquad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T],\quad l\in [0,n-1]\cap {\mathbb {N}}_0\\ \end{aligned} \end{aligned}$$
(85)

in the notation of Lemma 3.8) prove for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that

$$\begin{aligned} &U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right\} \right) . \end{aligned}$$
(86)

This and Lemma 3.8 (with

$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=f,\quad \alpha = \mathcal {D}(\Phi _f),\quad \\&\beta ={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}),\quad \text {and} \quad g=U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \qquad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T], \quad l\in [0,n-1]\cap {\mathbb {N}}_0 \end{aligned} \end{aligned}$$
(87)

in the notation of Lemma 3.8) assure for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that

$$\begin{aligned} \begin{aligned}&\left( f\circ U_{l,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right\} \right) . \end{aligned} \end{aligned}$$
(88)

Next, (83) (with \(l=n\)) and Lemma 3.8 (with

$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=f,\quad \alpha = \mathcal {D}(\Phi _f),\quad \beta =\mathcal {D}\left( \Phi _{n,0}^{0}\right) ,\quad \text {and}\\&\quad g=\left( U_{n,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \quad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T] \end{aligned} \end{aligned}$$
(89)

in the notation of Lemma 3.8) prove for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\) that

$$\begin{aligned} \begin{aligned}&\left( f\circ U_{n,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0}) \right\} \right) . \end{aligned} \end{aligned}$$
(90)

Furthermore, the definition of \(\odot\) in (33) and the fact that

$$\begin{aligned} \forall \, l\in [0,n]\cap {\mathbb {N}}_0:\dim \left( \mathcal {D}(\Phi _{l,0}^{0})\right) =l \left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) + \dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \end{aligned}$$
(91)

in the induction hypothesis imply that

$$\begin{aligned} \begin{aligned}&\dim \left( {\mathfrak {n}}_{{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}\odot \mathcal {D}(\Phi _{g})\right) \\&\quad =\left[ (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1\right] +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) -1\\&\quad =(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) , \end{aligned} \end{aligned}$$
(92)

that

$$\begin{aligned} \begin{aligned}&\dim \left( \mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0}) \right) = \dim \left( \mathcal {D}(\Phi _{f})\right) +\dim \left( \mathcal {D}(\Phi _{n,0}^{0})\right) -1\\&\quad = \dim \left( \mathcal {D}(\Phi _{f})\right) +\left[ n\left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \right] -1\\&\quad = (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) ,\end{aligned} \end{aligned}$$
(93)

and for all \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that

$$\begin{aligned} \begin{aligned}&\dim \left( \mathcal {D}(\Phi _{f}) \odot {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right) \\&\quad = \dim \left( \mathcal {D}(\Phi _{f})\right) +\dim \left( {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}\right) +\dim \left( \mathcal {D}(\Phi _{l,0}^{0}) \right) -2\\&\quad =\dim \left( \mathcal {D}(\Phi _{f})\right) +\left[ (n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1 \right] \\&\qquad + \left[ l \left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) + \dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \right] -2\\&\quad =\dim \left( \mathcal {D}(\Phi _{f})\right) + n\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) -1\\&\quad = (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) . \end{aligned} \end{aligned}$$
(94)

This shows, roughly speaking, that the functions in (80), (90), and (88) can be represented by networks with the same depth (i.e. number of layers): \((n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\). Hence, Lemma 3.9 and (73) imply that there exists a family \((\Phi _{n+1,t}^{\theta })_{\theta \in \Theta , t\in [0,T]}\subseteq \mathbf {N}\) such that for all \(\theta \in \Theta\), \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\) it holds that

$$\begin{aligned} \begin{aligned}&\left( \mathcal {R}(\Phi _{n+1,t}^{\theta })\right) (x) \\&\quad = \frac{1}{M^{n+1}} \sum _{i=1}^{M^{n+1}} g\left( x+W^{(\theta ,0,-i)}_{T}(\omega )-W^{(\theta ,0,-i)}_{t}(\omega )\right) \\&\qquad + \frac{(T-t)}{M} \sum _{i=1}^{M} \left( f\circ {U}_{n,M}^{(\theta ,n,i)}\right) \left( \mathcal {U}_t^{(\theta ,n,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,n,i)}(\omega )}^{(\theta ,n,i)}(\omega )- W_{t}^{(\theta ,n,i)}(\omega ),\omega \right) \\&\qquad + \sum _{l=0}^{n-1} \frac{(T-t)}{M^{n+1-l}} \sum _{i=1}^{M^{n+1-l}} \left( f\circ {U}_{l,M}^{(\theta ,l,i)}\right) \left( \mathcal {U}_t^{(\theta ,l,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,l,i)}(\omega )}^{(\theta ,l,i)}(\omega )- W_{t}^{(\theta ,l,i)}(\omega ),\omega \right) \\&\qquad -\sum _{l=1}^{n} \frac{(T-t)}{M^{n+1-l}} \sum _{i=1}^{M^{n+1-l}} \left( f\circ {U}_{l-1,M}^{(\theta ,-l,i)} \right) \left( \mathcal {U}_t^{(\theta ,l,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,l,i)}(\omega )}^{(\theta ,l,i)}(\omega )- W_{t}^{(\theta ,l,i)}(\omega ),\omega \right) \\&\quad = {U}_{n+1,M}^{\theta }(t,x,\omega ), \end{aligned} \end{aligned}$$
(95)

that

$$\begin{aligned} \dim \left( \mathcal {D}(\Phi _{n+1,t}^{\theta })\right) = (n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) , \end{aligned}$$
(96)

and that

$$\begin{aligned} \begin{aligned} \mathcal {D}(\Phi _{n+1,t}^{\theta })& = \left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1}}}\left[ {\mathfrak {n}}_{{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{g})\right] \right) {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}} \left[ \mathcal {D}\left( \Phi _{f}\right) \odot \mathcal {D}\left( \Phi _{n,0}^{0}\right) \right] \right) \\&\quad {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{l=0}^{n-1}}{\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1-l}}}\left[ \mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l,0}^{0})\right] \right) \\&\quad {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{l=1}^{n}}{\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1-l}}} \left[ \mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l-1,0}^{0})\right] \right) .\end{aligned} \end{aligned}$$
(97)

This shows for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that

$$\begin{aligned} \mathcal {D}\left( \Phi _{n+1,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{n+1,t_2}^{\theta _2}\right) . \end{aligned}$$
(98)

Furthermore, (97), the triangle inequality (see Lemma 3.5), and the fact that

$$\begin{aligned} \forall \, l\in [0,n]\cap {\mathbb {N}}_0:|||\mathcal {D}(\Phi _{l,0}^{0} )|||\le c(3 M)^l \end{aligned}$$
(99)

in the induction hypothesis show for all \(\theta \in \Theta\), \(t\in [0,T]\) that

$$\begin{aligned} \begin{aligned} |||\mathcal {D}(\Phi _{n+1,t}^{\theta })|||&\le \sum _{i=1}^{M^{n+1}}||| {\mathfrak {n}}_{{(n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+1}} \odot \mathcal {D}(\Phi _{g})|||+\sum _{i=1}^{M}|||\mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0})|||\\&\quad + \sum _{l=0}^{n-1}\sum _{i=1}^{M^{n+1-l}} |||\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l,0}^{0})|||\\&\quad + \sum _{l=1}^{n}\sum _{i=1}^{M^{n+1-l}}|||\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l-1,0}^{0})|||. \end{aligned} \end{aligned}$$
(100)

Note that for all \(H_1,H_2,\alpha _0,\ldots ,\alpha _{H_1+1},\beta _0,\ldots , \beta _{H_2+1}\in {\mathbb {N}}\), \(\alpha ,\beta \in \mathbf {D}\) with \(\alpha =(\alpha _0,\ldots ,\alpha _{H_1+1})\), \(\beta =(\beta _0,\ldots , \beta _{H_2+1})\), \(\alpha _0=\beta _{H_2+1}=1\) it holds that \(|||\alpha \odot \beta |||\le \max \{|||\alpha |||,|||\beta |||,2\}\) (see (33)). This, (100), the fact that \(\forall \, H\in {\mathbb {N}}:|||{\mathfrak {n}}_{{H+2}}|||=2\) (see (35)), (72), and (99) prove for all \(\theta\in\Theta, t\in[0,T]\) that

$$\begin{aligned} \begin{aligned}&|||\mathcal {D}(\Phi _{n+1,t}^{\theta })|||\\&\quad \le \left[ \sum _{i=1}^{M^{n+1}}c \right] + \left[ \sum _{i=1}^{M}c({3} M)^n\right] + \left[ \sum _{l=0}^{n-1}\sum _{i=1}^{M^{n+1-l}}c({3} M)^l\right] +\left[ \sum _{l=1}^{n}\sum _{i=1}^{M^{n+1-l}}c({3} M)^{l-1}\right] \\&\quad = M^{n+1}c+Mc(3M)^{n}+\left[ \sum _{l=0}^{n-1}M^{n+1-l}c(3M)^l\right] +\left[ \sum _{l=1}^{n}M^{n+1-l}c(3M)^{l-1}\right] \\&\quad \leq M^{n+1}c\left[ 1+3^n+\sum _{l=0}^{n-1}3^l+\sum _{l=1}^{n}3^{l-1}\right] = M^{n+1}c\left[ 1+\sum _{l=0}^{n}3^l+\sum _{l=1}^{n}3^{l-1}\right] \\&\quad \le cM^{n+1}\left[ 1+2\sum _{l=0}^{n} {3} ^l\right] = cM^{n+1}\left[ 1+2\frac{{3}^{n+1}-1}{{3}-1}\right] = c({3} M)^{n+1}. \end{aligned} \end{aligned}$$
(101)

Combining (95), (96), (98), and (101) completes the induction step. Induction hence establishes (i)–(iv). The proof of Lemma 3.10 is thus completed. \(\square\)
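The depth identities (92)–(94) and the width estimate (101) are pure bookkeeping and can be checked mechanically. The following sketch (with illustrative parameter ranges that are assumptions of this example, not data from the paper, writing df and dg for \(\dim (\mathcal {D}(\Phi _f))\) and \(\dim (\mathcal {D}(\Phi _g))\)) verifies both.

```python
# Sanity checks on the depth and width bookkeeping of the induction step.
# Depth: (92)-(94) all evaluate to (n+1)(df - 1) + dg.
def depth_consistent(df: int, dg: int, n: int) -> bool:
    target = (n + 1) * (df - 1) + dg
    top = df + (n * (df - 1) + dg) - 1                       # as in (93)
    mids = [df + ((n - l) * (df - 1) + 1)
            + (l * (df - 1) + dg) - 2 for l in range(n)]     # as in (94)
    return top == target and all(m == target for m in mids)

# Width: the left-hand side of (101), summed term by term, is at most c (3M)^{n+1}.
def width_bound_holds(c: int, M: int, n: int) -> bool:
    lhs = (M ** (n + 1) * c
           + M * c * (3 * M) ** n
           + sum(M ** (n + 1 - l) * c * (3 * M) ** l for l in range(n))
           + sum(M ** (n + 1 - l) * c * (3 * M) ** (l - 1) for l in range(1, n + 1)))
    return lhs <= c * (3 * M) ** (n + 1)

# Illustrative parameter ranges (assumptions of this sketch):
assert all(depth_consistent(df, dg, n)
           for df in (2, 3, 7) for dg in (2, 5) for n in range(6))
assert all(width_bound_holds(c, M, n)
           for c in (1, 2, 7) for M in (1, 2, 5) for n in range(6))
```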

3.4 Deep neural network approximations for the PDE nonlinearity

Lemma 3.11

(DNN interpolation) Assume Setting 3.1, let \(N\in {\mathbb {N}}\), \(a_0,a_1,\ldots , a_{N-1},\xi _0,\xi _1,\ldots ,\xi _N\in {\mathbb {R}}\) satisfy that \(\xi _0<\xi _1<\ldots <\xi _N\), let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a function, assume for all \(x\in (-\infty ,\xi _0]\) that \(f(x)=f(\xi _0)\), assume for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that \(f(x)=f(\xi _n)+a_n(x-\xi _n)\), and assume for all \(x\in (\xi _N,\infty )\) that \(f(x)=f(\xi _N)\). Then it holds that

$$f\in {\mathcal {R}}(\{\Phi \in {\mathbf {N}}:{\mathcal {D}}(\Phi )=(1,N+1,1)\}).$$
(102)

Proof of Lemma 3.11

Throughout this proof let \(a_{-1}=0\) and \(a_N=0\), let \(c_n \in {\mathbb {R}}\), \(n \in [0,N]\cap {\mathbb {Z}}\), be the real numbers which satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(c_n=a_{n}-a_{n-1}\), let \(W_1\in {\mathbb {R}}^{(N+1)\times 1}\), \(B_1\in {\mathbb {R}}^{N+1}\), \(W_2\in {\mathbb {R}}^{1\times (N+1)}\), \(B_2\in {\mathbb {R}}\), \(\Phi \in \mathbf {N}\) be given by

$$\begin{aligned}&W_1 = \begin{pmatrix} 1\\ 1\\ \vdots \\ 1 \end{pmatrix} ,\quad B_1=\begin{pmatrix} -\xi _0\\ -\xi_1\\ \vdots \\ -\xi_N \end{pmatrix} ,\quad W_2= \begin{pmatrix} c_0&c_1&\ldots&c_N \end{pmatrix},\quad B_2= f(\xi _0), \end{aligned}$$
(103)

and

$$\begin{aligned} \Phi = ((W_1,B_1),(W_2,B_2)), \end{aligned}$$
(104)

and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\) that

$$\begin{aligned} g(x)=f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}. \end{aligned}$$
(105)

First, observe that the fact that \(\forall \,n\in [0,N-1]\cap {\mathbb {Z}}:\xi _n<\xi _{n+1}\) and the fact that \(\forall \, n\in [0,N]\cap {\mathbb {Z}}:a_n= \sum _{k=0}^{n}c_k\) show for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that

$$\begin{aligned} \begin{aligned} g(x)-g(\xi _n)&= \left[ \sum _{k=0}^{N}c_k\left( \max \{x-\xi _k,0\}-\max \{\xi _{n}-\xi _k,0\}\right) \right] \\&=\sum _{k=0}^{n}c_k [(x-\xi _k)-(\xi _{n}-\xi _k)]= \sum _{k=0}^{n}c_k (x-\xi _{n})=a_n(x-\xi _n). \end{aligned} \end{aligned}$$
(106)

This shows for all \(n\in [0,N-1]\cap {\mathbb {Z}}\) that g is affine linear on the interval \((\xi _n,\xi _{n+1}]\). This, the fact that for all \(n\in [0,N-1]\cap {\mathbb {Z}}\) it holds that f is affine linear on the interval \((\xi _n,\xi _{n+1}]\), the fact that \(\forall \, x\in (-\infty ,\xi _0]:f(x)= g(x)=f(\xi _0)\), and an induction argument imply for all \(x\in (-\infty ,\xi _N]\) that \(f(x)=g(x)\). Furthermore, (105), the fact that \(\forall \,n\in [0,N-1]\cap {\mathbb {Z}}:\xi _n<\xi _{n+1}\), and the fact that \(\sum _{k=0}^{N}c_k=0\) imply for all \(x\in (\xi _N,\infty )\) that

$$\begin{aligned} \begin{aligned} g(x)-g(\xi _N)&= \left[ \sum _{k=0}^{N}c_k\left( \max \{x-\xi _k,0\}-\max \{\xi _{N}-\xi _k,0\}\right) \right] \\&=\sum _{k=0}^{N}c_k [(x-\xi _k)-(\xi _{N}-\xi _k)]= \sum _{k=0}^{N}c_k (x-\xi _{N})=0.\end{aligned} \end{aligned}$$
(107)

This shows for all \(x\in (\xi _N,\infty )\) that \(g(x)=g(\xi _N)\). This, the fact that \(\forall \,x\in (\xi _N,\infty ):f(x)=f(\xi _N)\), the fact that \(\forall \,x\in (-\infty ,\xi _N]:f(x)=g(x)\), and (105) prove for all \(x\in {\mathbb {R}}\) that

$$\begin{aligned} f(x)=g(x)=f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}. \end{aligned}$$
(108)

Next, the definition of \(\mathcal {R}\) and \(\mathcal {D}\) (see (31) and (32)), (103), (104), and (108) imply that for all \(x\in {\mathbb {R}}\) it holds that \(\mathcal {D}(\Phi )=(1,N+1,1)\) and

$$\begin{aligned} \begin{aligned}&(\mathcal {R}(\Phi ))(x)= W_2( \mathbf {A}_{N+1}(W_1x+B_1))+B_2\\ {}&= \begin{pmatrix} c_0&c_1&\ldots&c_N \end{pmatrix} \begin{pmatrix} \max \{x-\xi _0,0\}\\ \max \{x-\xi _1,0\}\\ \vdots \\ \max \{x-\xi _N,0\} \end{pmatrix}+f(\xi _0)= f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}=f(x).\end{aligned} \end{aligned}$$
(109)

This establishes (102). The proof of Lemma 3.11 is thus completed. \(\square\)
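The construction in (103)–(105) is fully explicit and easy to instantiate. The following sketch (the breakpoints and function values are illustrative assumptions of this example) forms the slopes \(a_n\) and coefficients \(c_n=a_n-a_{n-1}\) and evaluates the realization \(x\mapsto f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}\) from (108).

```python
# One-hidden-layer ReLU network realizing the piecewise-linear function of
# Lemma 3.11: weights as in (103), realization as in (108)/(109).
def relu_interpolant(xi, fvals):
    """Return x -> W2 @ ReLU(W1 x + B1) + B2 with W1, B1, W2, B2 as in (103)."""
    N = len(xi) - 1
    slopes = [(fvals[n + 1] - fvals[n]) / (xi[n + 1] - xi[n]) for n in range(N)]
    a = [0.0] + slopes + [0.0]                   # a_{-1} = a_N = 0
    c = [a[n + 1] - a[n] for n in range(N + 1)]  # c_n = a_n - a_{n-1}
    def realize(x):
        hidden = [max(x - xi_k, 0.0) for xi_k in xi]  # A_{N+1}(W1 x + B1)
        return fvals[0] + sum(ck * h for ck, h in zip(c, hidden))  # eq. (108)
    return realize

# Assumed breakpoints and values (illustrative only):
xi = [-1.0, 0.0, 0.5, 2.0]
fvals = [1.0, -1.0, 0.0, 3.0]
g = relu_interpolant(xi, fvals)
assert all(abs(g(x) - v) < 1e-12 for x, v in zip(xi, fvals))   # interpolates
assert abs(g(-5.0) - fvals[0]) < 1e-12                         # constant left
assert abs(g(7.0) - fvals[-1]) < 1e-12                         # constant right
assert abs(g(0.25) - (-0.5)) < 1e-12                           # linear in between
```

The hidden layer has exactly \(N+1\) neurons, matching \(\mathcal {D}(\Phi )=(1,N+1,1)\) in (102).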

Lemma 3.12

Let \(L\in [0,\infty )\), \(N\in {\mathbb {N}}\), \(a\in {\mathbb {R}}\), \(b\in (a,\infty )\), \(\xi _0, \xi _1,\ldots , \xi _N\in {\mathbb {R}}\) satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(\xi _n=a+\frac{(b-a)n}{N}\), let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x,y\in {\mathbb {R}}\) that

$$\begin{aligned} |f(x)-f(y)|\le L|x-y|, \end{aligned}$$
(110)

and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\), \(n\in [0,N-1]\cap {\mathbb {Z}}\) that

$$\begin{aligned} g(x) = \begin{cases} f(\xi _{0}) &{}:x \in (-\infty ,\xi _{0}]\\ \frac{f(\xi _{n})(\xi _{n+1}-x)+f(\xi _{n+1})(x-\xi _{n})}{\xi _{n+1}-\xi _{n}} &{}:x \in (\xi _{n},\xi _{n+1}]\\ f(\xi _{N}) &{}:x \in (\xi _{N},\infty ). \end{cases} \end{aligned}$$
(111)

Then

  1. (i)

    it holds for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(g(\xi _n)=f(\xi _n)\),

  2. (ii)

    it holds for all \(x,y\in {\mathbb {R}}\) that \(|g(x)-g(y)|\le L|x-y|\), and

  3. (iii)

    it holds for all \(x\in [a,b]\) that \(|g(x)-f(x)|\le \frac{2L(b-a)}{N}\).

Proof of Lemma 3.12

Throughout this proof let \(r,\ell :{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}{\setminus}(a,b]\) that

$$\begin{aligned} r(x)=\ell (x)=x \end{aligned}$$
(112)

and for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that

$$\begin{aligned} r(x)= \xi _{n+1}\quad \text {and}\quad \ell (x)= \xi _{n}. \end{aligned}$$
(113)

Note that (111) implies (i). Next observe that for all \(x,y\in (a,b]\) with \(x\le y\) and \(\ell (y)<r(x)\) it holds that \(r(x)=r(y)\) and \(\ell (x) =\ell (y)\). This, (112), (111), and (110) show that for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) and \(\ell (y)<r(x)\) it holds that \(x,y\in (a,b]\), \(r(x)=r(y)\), \(\ell (x) =\ell (y)\), and

$$\begin{aligned} |g(x)-g(y)|= \left| \frac{f(r(x))-f(\ell (x))}{r(x)-\ell (x)} (x-y)\right| \le L |x-y|. \end{aligned}$$
(114)

Furthermore, (111), (110), and the fact that \(\forall \,x\in {\mathbb {R}}:\ell (x)\le x\le r(x)\) imply for all \(x\in (a,b]\) that

$$\begin{aligned} \begin{aligned} |g(x)-g(r(x))|&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))+f(\ell (x))-f(r(x))\right| \\&= \left| \frac{(f(\ell (x))-f(r(x)))(x-r(x))}{\ell (x)-r(x)}\right| \le L|x-r(x)|=L(r(x)-x) \end{aligned} \end{aligned}$$
(115)

and

$$\begin{aligned} \begin{aligned} |g(x)-g(\ell (x))|&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))+f(\ell (x))-f(\ell (x))\right| \\&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))\right| \le L|x-\ell (x)|=L(x-\ell (x)). \end{aligned} \end{aligned}$$
(116)

This and (112) show for all \(x\in {\mathbb {R}}\) that

$$\begin{aligned} |g(x)-g(r(x))|\le L(r(x)-x) \quad \text {and}\quad |g(x)-g(\ell (x))|\le L(x-\ell (x)) . \end{aligned}$$
(117)

The triangle inequality therefore shows for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) and \(r(x)\le \ell (y)\) that

$$\begin{aligned} \begin{aligned} |g(x)-g(y)|&\le | g(x)-g(r(x))|+|g(r(x))-g(\ell (y))|+|g(\ell (y))-g(y)|\\&\le L (r(x)-x)+ L(\ell (y)-r(x))+ L (y-\ell (y))= L(y-x)= L|y-x|.\end{aligned} \end{aligned}$$
(118)

This and (114) show for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) that \(|g(x)-g(y)|\le L|x-y|\). Symmetry hence establishes (ii). Next, the fact that \(\forall \,x\in {\mathbb {R}}:g(\ell (x))=f(\ell (x))\), the triangle inequality, (110), (117), and the fact that \(\forall \,x\in [a,b]:0\le x-\ell (x)\le (b-a)/N\) imply for all \(x\in [a,b]\) that

$$\begin{aligned} \begin{aligned} |g(x)-f(x)|&= |g(x)-f(\ell (x))+f(\ell (x))-f(x)|\\&= |g(x)-g(\ell (x))+f(\ell (x))-f(x)|\\&\le |g(x)-g(\ell (x))|+|f(\ell (x))-f(x)|\le 2L (x-\ell (x))\le 2L(b-a)/N. \end{aligned} \end{aligned}$$
(119)

This establishes (iii). The proof of Lemma 3.12 is thus completed. \(\square\)
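The interpolant (111) and the conclusions (i)–(iii) of Lemma 3.12 lend themselves to a direct numerical check. The sketch below is illustrative only; the choice \(f=\sin\), \(L=1\), \(a=-2\), \(b=3\), \(N=40\) and the sampling grid are assumptions of this example.

```python
import math

# Piecewise-linear interpolation of a Lipschitz f on the uniform grid
# xi_n = a + (b-a) n / N, clamped to f(xi_0), f(xi_N) outside [a, b] (eq. (111)).
def clamped_interpolant(f, a, b, N):
    xi = [a + (b - a) * n / N for n in range(N + 1)]
    def g(x):
        if x <= xi[0]:
            return f(xi[0])
        if x > xi[N]:
            return f(xi[N])
        n = min(int((x - a) / ((b - a) / N)), N - 1)
        if x <= xi[n]:      # guard against floating-point edge cases
            n -= 1
        return (f(xi[n]) * (xi[n + 1] - x)
                + f(xi[n + 1]) * (x - xi[n])) / (xi[n + 1] - xi[n])
    return g, xi

f, L, a, b, N = math.sin, 1.0, -2.0, 3.0, 40   # assumed example data
g, xi = clamped_interpolant(f, a, b, N)
pts = [a - 1 + 7 * k / 500 for k in range(501)]
assert all(abs(g(t) - f(t)) < 1e-12 for t in xi)          # item (i)
assert all(abs(g(x) - g(y)) <= L * abs(x - y) + 1e-12
           for x in pts for y in pts[::25])               # item (ii), sampled
assert all(abs(g(x) - f(x)) <= 2 * L * (b - a) / N + 1e-12
           for x in pts if a <= x <= b)                   # item (iii)
```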

Corollary 3.13

Assume Setting 3.1, let \(\epsilon \in (0,1]\), \(L\in [0,\infty )\), \(q\in (1,\infty )\), and let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x,y\in {\mathbb {R}}\) that \(|f(x)-f(y)|\le L|x-y|.\) Then there exists a function \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) such that

  1. (i)

    it holds for all \(x,y\in {\mathbb {R}}\) that \(|g(x)-g(y)|\le L|x-y|\),

  2. (ii)

    it holds for all \(x\in {\mathbb {R}}\) that \(|f(x)-g(x)|\le \epsilon (1+|x|^q)\), and

  3. (iii)

    it holds that

    $$\begin{aligned} g\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )\in {\mathbb {N}}^3\text { and } |||\mathcal {D}(\Phi )|||\le \frac{4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2}{\epsilon ^{\frac{q}{(q-1)}}}\right\} \right) . \end{aligned}$$
    (120)

Proof of Corollary 3.13

Throughout this proof let \(R\in {\mathbb {R}}\), \(N\in {\mathbb {N}}\) satisfy that

$$\begin{aligned} R=\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) \quad \text {and}\quad N=\min \left\{ n\in {\mathbb {N}}:\frac{4LR}{n}\le \epsilon \right\} , \end{aligned}$$
(121)

let \(\xi _0, \xi _1,\ldots , \xi _N\in {\mathbb {R}}\) be the real numbers which satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(\xi _n=R(-1+\frac{2n}{N})\), and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\), \(n\in [0,N-1]\cap {\mathbb {Z}}\) that

$$\begin{aligned} g(x) = \begin{cases} f(\xi _{0}) &{}:x \in (-\infty ,\xi _{0}]\\ \frac{f(\xi _{n})(\xi _{n+1}-x)+f(\xi _{n+1})(x-\xi _{n})}{\xi _{n+1}-\xi _{n}} &{}:x \in (\xi _{n},\xi _{n+1}]\\ f(\xi _{N}) &{}:x \in (\xi _{N},\infty ). \end{cases} \end{aligned}$$
(122)

By (ii) in Lemma 3.12 the function g satisfies (i). Next, it follows from (iii) in Lemma 3.12 that for all \(x\in [-R,R]\) it holds that \(|g(x)-f(x)|\le 4LR/N\). This and the fact that \(4LR/N\le \epsilon\) prove that for all \(x\in [-R,R]\) it holds that \(|g(x)-f(x)|\le \epsilon \le \epsilon (1+|x|^q)\). Next, the triangle inequality, the fact that \(f(R)=g(R)\), and the Lipschitz continuity of f and g imply for all \(x\in {\mathbb {R}}\) that

$$\begin{aligned} \begin{aligned} |f(x)-g(x)|&\le |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|g(R)|\\&= |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|f(R)|\\&\le |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|f(R)-f(0)|+|f(0)|\\&\le L|x|+2|f(0)|+L|x-R|+LR\\&\le 2L(|x|+R)+2|f(0)|. \end{aligned} \end{aligned}$$
(123)

This and (121) show for all \(x\in {\mathbb {R}}{\setminus}[-R,R]\) that

$$\begin{aligned} \begin{aligned} \frac{|f(x)-g(x)|}{1+|x|^q}&\le \frac{2L(|x|+R)+2|f(0)|}{1+|x|^q}\le \frac{4L|x|+2|f(0)|}{1+|x|^q} \\&\le \frac{4L}{|x|^{q-1}}+\frac{2|f(0)|}{|x|^q}\le \frac{4L}{R^{q-1}}+\frac{2|f(0)|}{R^q}\le \frac{4L+2|f(0)|}{R^{q-1}} \le \epsilon .\end{aligned} \end{aligned}$$
(124)

This and the fact that \(\forall \,x\in [-R,R]:|g(x)-f(x)|\le \epsilon (1+|x|^q)\) prove that for all \(x\in {\mathbb {R}}\) it holds that \(|g(x)-f(x)|\le \epsilon (1+|x|^q)\). This shows that g satisfies (ii). Next, (i) in Lemma 3.12 ensures that g satisfies for all \(x\in (-\infty ,-R]\) that \(g(x)=g(-R)\), for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that \(g(x)=g(\xi _n)+\frac{g(\xi _{n+1})-g(\xi _n)}{\xi _{n+1}-\xi _n}(x-\xi _n)\), and for all \(x\in (R,\infty )\) that \(g(x)=g(R)\). This and Lemma 3.11 (with \(N=N\), \(f=g\), \(\xi _n=\xi _n\) for \(n\in [0,N]\cap {\mathbb {Z}}\), and \(a_n= (g(\xi _{n+1})-g(\xi _n))/(\xi _{n+1}-\xi _n)\) for \(n\in [0,N-1]\cap {\mathbb {Z}}\) in the notation of Lemma 3.11) imply that

$$\begin{aligned} g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=(1,N+1,1)\}). \end{aligned}$$
(125)

Furthermore, if \(N>1\), then (121) implies that \(\frac{4LR}{N-1}>\epsilon\). Hence, if \(N>1\) it holds that \(N<\frac{4LR}{\epsilon }+1\). This and (121) ensure that

$$N \le \frac{4LR}{\epsilon }+1=\frac{4L\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) +\epsilon }{\epsilon }.$$
(126)

This and (125) imply that

$$\begin{aligned} |||\mathcal {D}(\Phi )|||&=N+1 \le \frac{4L\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) +2\epsilon }{\epsilon }\\ {}&= \frac{4L\max \left( \epsilon ^{\frac{1}{q-1}},\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2\epsilon ^{\frac{q}{(q-1)}}}{\epsilon ^{\frac{q}{(q-1)}}}\\ {}&\le \frac{4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2}{\epsilon ^{\frac{q}{(q-1)}}}. \end{aligned}$$
(127)

This establishes (iii). The proof of Corollary 3.13 is thus completed. \(\square\)
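The quantitative choices in the proof, \(R\) and \(N\) from (121), are straightforward to instantiate. The following sketch (with an assumed Lipschitz f, namely \(L=1\), \(f(0)=0\), and \(q=2\), which are assumptions of this example) computes \(R\) and \(N\) and checks the hidden-layer width bound from (127).

```python
import math

# Instantiate R and N from (121) for Lipschitz constant L, exponent q, accuracy eps.
def interpolation_parameters(L, q, eps, f0):
    R = max(1.0, ((4 * L + 2 * abs(f0)) / eps) ** (1.0 / (q - 1)))
    N = max(1, math.ceil(4 * L * R / eps))      # smallest n with 4LR/n <= eps
    return R, N

# Assumed example data: L = 1, f(0) = 0, q = 2.
L, q, f0 = 1.0, 2, 0.0
for eps in (1.0, 0.5, 0.1, 0.01):
    R, N = interpolation_parameters(L, q, eps, f0)
    width_bound = (4 * L * (1 + (4 * L + 2 * abs(f0)) ** (1 / (q - 1))) + 2) \
                  * eps ** (-q / (q - 1))
    assert 4 * L * R / N <= eps                 # defining property of N in (121)
    assert N + 1 <= width_bound + 1e-9          # width bound, cf. (120) and (127)
```

Note how the width grows like \(\epsilon ^{-q/(q-1)}\), in line with (120).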

4 Deep neural network approximations for PDEs

4.1 Deep neural network approximations with specific polynomial convergence rates

Theorem 4.1

Let \(\left\| \cdot \right\| , |||\cdot ||| :(\cup _{d\in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0,\infty )\) and \(\dim :(\cup _{d\in {\mathbb {N}}}{\mathbb {R}}^d) \rightarrow {\mathbb {N}}\) satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that \(\Vert x\Vert =[\sum _{i=1}^d(x_i)^2]^{1/2}\), \(|||x|||=\max _{i\in [1,d]\cap {\mathbb {N}}}|x_i|\), and \(\dim \left( x\right) =d\), let \(T,L, B,\beta \in [0,\infty )\), \(p,{\mathfrak {p}}\in {\mathbb {N}}\), \(q\in {\mathbb {N}}\cap [2,\infty )\), \(\alpha \in [2,\infty )\), let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x,y\in {\mathbb {R}}\) that \(|f(x)-f(y)|\le L|x-y|\), for every \(d\in {\mathbb {N}}\) let \(g_d\in C({\mathbb {R}}^d,{\mathbb {R}})\), for every \(d\in {\mathbb {N}}\) let \(\nu _d:\mathcal B({\mathbb {R}}^d)\rightarrow [0,1]\) be a probability measure on \(({\mathbb {R}}^d, \mathcal B({\mathbb {R}}^d))\), for every \(d\in {\mathbb {N}}\) let \(\mathbf {A}_{d}:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) satisfy for all \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that

$$\begin{aligned} \mathbf {A}_{d}(x)= \left( \max \{x_1,0\},\ldots ,\max \{x_d,0\}\right) , \end{aligned}$$
(128)

let \(\mathbf {D}=\cup _{H\in {\mathbb {N}}} {\mathbb {N}}^{H+2}\), let

$$\begin{aligned} \begin{aligned} \mathbf {N}= \bigcup _{H\in {\mathbb {N}}}\bigcup _{(k_0,k_1,\ldots ,k_{H+1})\in {\mathbb {N}}^{H+2}} \left[ \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_{n}\times k_{n-1}} \times {\mathbb {R}}^{k_{n}}\right) \right] , \end{aligned} \end{aligned}$$
(129)

let \(\mathcal {P}:\mathbf {N}\rightarrow {\mathbb {N}}\), \(\mathcal {D}:\mathbf {N}\rightarrow \mathbf {D}\), and \(\mathcal {R}:\mathbf {N}\rightarrow (\cup _{k,l\in {\mathbb {N}}} C({\mathbb {R}}^k,{\mathbb {R}}^l))\) satisfy for all \(H\in {\mathbb {N}}\), \(k_0,k_1,\ldots ,k_H,k_{H+1}\in {\mathbb {N}}\), \(\Phi = ((W_1,B_1),\ldots ,(W_{H+1},B_{H+1}))\in \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_n\times k_{n-1}} \times {\mathbb {R}}^{k_n}\right) \), \(x_0 \in {\mathbb {R}}^{k_0},\ldots ,x_{H}\in {\mathbb {R}}^{k_{H}}\) with \(\forall \, n\in {\mathbb {N}}\cap [1,H]:x_n = \mathbf {A}_{k_n}(W_n x_{n-1}+B_n )\) that

$$\begin{aligned}&\mathcal {P}(\Phi )=\sum _{n=1}^{H+1}k_n(k_{n-1}+1), \qquad \mathcal {D}(\Phi )= (k_0,k_1,\ldots ,k_{H}, k_{H+1}), \end{aligned}$$
(130)
$$\begin{aligned}&\mathcal {R}(\Phi )\in C({\mathbb {R}}^{k_0},{\mathbb {R}}^ {k_{H+1}}),\qquad \text {and}\qquad (\mathcal {R}(\Phi )) (x_0) = W_{H+1}x_{H}+B_{H+1}, \end{aligned}$$
(131)

for every \(\varepsilon \in (0,1]\), \(d\in {\mathbb {N}}\) let \({\mathfrak {g}}_{{d,\varepsilon }} \in \mathbf {N}\), and assume for all \(d\in {\mathbb {N}}\), \(x\in {\mathbb {R}}^d\), \(\varepsilon \in (0,1]\) that \(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }})\in C({\mathbb {R}}^d,{\mathbb {R}})\), \(|(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)|\le Bd^p (1+\left\| {x}\right\| )^p\), \(\left| g_d(x)-(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)\right| \le \varepsilon Bd^p (1+\left\| {x}\right\| )^{pq}\), \(|||\mathcal {D}({\mathfrak {g}}_{{d,\varepsilon }})|||\le Bd^p\varepsilon ^{-\alpha }\), \(\dim \left( \mathcal {D}\left({\mathfrak {g}}_{{d,\varepsilon }}\right) \right) \le Bd^p\varepsilon ^{-\beta }\), and \(\left( \int _{{\mathbb {R}}^d}\left\| {y}\right\| ^{2pq} \nu _d(dy)\right) ^{1/(2pq)}\le Bd^{{\mathfrak {p}}}\). Then

  1. (i)

    there exist unique \(u_d\in C([0,T]\times {\mathbb {R}}^d,{\mathbb {R}})\), \(d\in {\mathbb {N}}\), such that for every \(d\in {\mathbb {N}}\), \(x\in {\mathbb {R}}^d\), \(s\in [0,T]\), every probability space \((\Omega , \mathcal {F}, {\mathbb {P}})\), and every standard Brownian motion \(\mathbf {W}:[0,T]\times \Omega \rightarrow {\mathbb {R}}^d\) with continuous sample paths it holds that \(\sup _{t\in [0,T]}\sup _{y\in {\mathbb {R}}^d}\left( \frac{|u_d(t,y)|}{1+\left\| {y}\right\| ^p} \right) <\infty\) and

    $$\begin{aligned} u_d(s,x)={\mathbb {E}}\left[ g_d\left( x+\mathbf {W}_{T-s}\right) +\int _s^T f\left( u_d\left( t,x+\mathbf {W}_{t-s}\right) \right) \,dt\right] \end{aligned}$$
    (132)

    and

  2. (ii)

    there exist \((\Psi _{d,\varepsilon })_{d\in {\mathbb {N}},\varepsilon \in (0,1]}\subseteq \mathbf {N}\), \(\eta \in (0,\infty )\), \(C=(C_\gamma )_{\gamma \in (0,1]}:(0,1]\rightarrow (0,\infty )\) such that for all \(d\in {\mathbb {N}}\), \(\varepsilon , \gamma \in (0,1]\) it holds that \(\mathcal {R}(\Psi _{d,\varepsilon })\in C({\mathbb {R}}^d,{\mathbb {R}})\), \(\mathcal {P}(\Psi _{d,\varepsilon })\le C_\gamma d^\eta \varepsilon ^{-(4+2\alpha +\beta +\gamma )}\), and

    $$\begin{aligned} \left[ \int _{{\mathbb {R}}^d}\left| u_d(0,x)-(\mathcal {R}(\Psi _{d,\varepsilon }))(x)\right| ^2\nu _d(dx)\right] ^{\frac{1}{2}}\le \varepsilon . \end{aligned}$$
    (133)

Proof of Theorem 4.1

Throughout this proof assume without loss of generality that

$$\begin{aligned} B\ge \max \left\{ |f(0)|+1, 4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2\right\} . \end{aligned}$$
(134)

Note that the triangle inequality, the fact that \(\forall \, d\in {\mathbb {N}}, x \in {\mathbb {R}}^d, \varepsilon \in (0,1]:|({\mathcal {R}}({\mathfrak {g}}_{{d,\varepsilon }}))(x)|\le Bd^p(1+\left\| {x}\right\| )^p\), and the fact that \(\forall \, d\in {\mathbb {N}}, x \in {\mathbb {R}}^d, \varepsilon \in (0,1]:\left| g_d(x)-(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)\right| \le \varepsilon Bd^p (1+\left\| {x}\right\| )^{pq}\) imply for all \(d\in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0,1]\) that

$$\begin{aligned} \begin{aligned} \left| g_d(x)\right|&\le \left| g_d(x)-(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)\right| +\left| (\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)\right| \le \varepsilon Bd^p (1+\left\| {x}\right\| )^{pq}+ Bd^p (1+\left\| {x}\right\| )^p. \end{aligned} \end{aligned}$$
(135)

This proves for all \(d\in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\) that

$$\begin{aligned} \left| g_d(x)\right| \le Bd^p (1+\left\| {x}\right\| )^p. \end{aligned}$$
(136)

Corollary 3.11 in [17], the fact that f is globally Lipschitz continuous, and (136) hence establish (i). It thus remains to prove (ii). To this end note that Corollary 3.13 ensures that there exist \({\mathfrak {f}}_{{\varepsilon }}\in \mathbf {N}\), \(\varepsilon \in (0,1]\), which satisfy for all \(v,w\in {\mathbb {R}}\), \(\varepsilon \in (0,1]\) that \(\mathcal {R}({\mathfrak {f}}_{{\varepsilon }})\in C({\mathbb {R}},{\mathbb {R}})\), \(\left| (\mathcal {R}({\mathfrak {f}}_{{\varepsilon }}))(w)-(\mathcal {R}({\mathfrak {f}}_{{\varepsilon }}))(v)\right| \le L\left| w-v\right|\), \(\left| f(v)-(\mathcal {R}({\mathfrak {f}}_{{\varepsilon }}))(v)\right| \le \varepsilon (1+|v|^q)\), \(\dim \left( \mathcal {D}\left( {\mathfrak {f}}_{{\varepsilon }}\right) \right) =3\), and

$$\begin{aligned} |||\mathcal {D}({\mathfrak {f}}_{{\varepsilon }})|||\le \left[ 4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2\right] \varepsilon ^{-\frac{q}{(q-1)}}. \end{aligned}$$
(137)

Note that the fact that \(B\ge 1+|f(0)|\) implies for all \(\varepsilon \in (0,1]\) that

$$\begin{aligned} \left| (\mathcal {R}({\mathfrak {f}}_{{\varepsilon }}))(0)\right| \le \left| (\mathcal {R}({\mathfrak {f}}_{{\varepsilon }}))(0)-f(0)\right| +|f(0)|\le \varepsilon +|f(0)|\le B. \end{aligned}$$
(138)

Next let \((\Omega , \mathcal {F}, {\mathbb {P}})\) be a probability space, for every \(d\in {\mathbb {N}}\) let \(\mathbf {W}^d :[0,T]\times \Omega \rightarrow {\mathbb {R}}^d\) be a standard Brownian motion with continuous sample paths, let \(\Theta = \bigcup _{ n \in {\mathbb {N}}} {\mathbb {Z}}^n\), let \({\mathfrak {u}}^\theta :\Omega \rightarrow [0,1]\), \(\theta \in \Theta\), be independent random variables which are uniformly distributed on [0, 1], let \(\mathcal {U}^\theta :[0,T]\times \Omega \rightarrow [0, T]\), \(\theta \in \Theta\), satisfy for all \(t\in [0,T]\), \(\theta \in \Theta\) that \(\mathcal {U}^\theta _t = t+ (T-t){\mathfrak {u}}^\theta\), for every \(d\in {\mathbb {N}}\) let \(W^{\theta ,d}:[0,T]\times \Omega \rightarrow {\mathbb {R}}^d\), \(\theta \in \Theta\), be independent standard Brownian motions with continuous sample paths, assume for every \(d\in {\mathbb {N}}\) that \((\mathfrak {u}^\theta )_{\theta \in \Theta }\) and \((W^{\theta ,d})_{\theta \in \Theta }\) are independent, and let \({U}_{ n,M,d,\delta }^{\theta } :[0, T] \times {\mathbb {R}}^d \times \Omega \rightarrow {\mathbb {R}}\), \(n,M\in {\mathbb {Z}}\), \(d\in {\mathbb {N}}\), \(\delta \in (0,1]\), \(\theta \in \Theta\), satisfy for all \(d,n,M \in {\mathbb {N}}\), \(\delta \in (0,1]\), \(\theta \in \Theta\), \(t \in [0,T]\), \(x\in {\mathbb {R}}^d\) that \({U}_{-1,M,d,\delta }^{\theta }(t,x)={U}_{0,M,d,\delta }^{\theta }(t,x)=0\) and

$$\begin{aligned} {U}_{n,M,d,\delta }^{\theta }(t,x) & = \frac{1}{M^n} \sum _{i=1}^{M^n} (\mathcal {R}({\mathfrak {g}}_{{d,\delta }}))\left( x+W^{(\theta ,0,-i),d}_{T}-W^{(\theta ,0,-i),d}_{t}\right) \\&\quad + \sum _{l=0}^{n-1} \frac{(T-t)}{M^{n-l}} \left[ \sum _{i=1}^{M^{n-l}} \left( ({\mathcal {R}}({\mathfrak {f}}_{{\delta }}))\circ {U}_{l,M,d,\delta }^{(\theta ,l,i)}-\mathbb {1}_{{\mathbb {N}}}(l)(\mathcal {R}({\mathfrak {f}}_{{\delta }}))\circ {U}_{l-1,M,d,\delta }^{(\theta ,-l,i)}\right) \right. \\&\quad\qquad\qquad\qquad\qquad\left. \left( {\mathcal {U}}_t^{(\theta ,l,i)},x+W_{\mathcal {U}_t^{(\theta ,l,i)}}^{(\theta ,l,i),d}-W_{t}^{(\theta ,l,i),d}\right) \right] , \end{aligned}$$
(139)

let \(c_{d}\in [1,\infty )\), \(d\in {\mathbb {N}}\), satisfy for all \(d\in {\mathbb {N}}\) that

$$\begin{aligned} c_{d}= \left( e^{LT} (T+1)\right) ^{q+1}\left( (Bd^p)^q+1\right) \left[ 1+\left( \int _{{\mathbb {R}}^d}\left\| {x}\right\| ^{2pq} \nu _d(dx)\right) ^{\frac{1}{(2pq)}} +\left( {\mathbb {E}} \left[ \left\| {\mathbf {W}^d_T}\right\| ^{pq} \right] \right) ^{\frac{1}{(pq)}} \right] ^{pq}, \end{aligned}$$
(140)

let \(k_{d,\varepsilon }\in {\mathbb {N}}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), satisfy for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) that

$$\begin{aligned} k_{d,\varepsilon }=\max \left\{ |||{\mathcal {D}}({\mathfrak {f}}_{{\varepsilon }})|||,|||{\mathcal {D}}({\mathfrak {g}}_{{d,\varepsilon }})|||,2\right\} , \end{aligned}$$
(141)

let \({\tilde{C}}=({\tilde{C}}_\gamma )_{\gamma \in (0,\infty )}:(0,\infty )\rightarrow (0,\infty ]\) satisfy for all \(\gamma \in (0,\infty )\) that

$$\begin{aligned} {\tilde{C}}_\gamma = \sup _{n\in {\mathbb {N}}\cap [2,\infty )} \left[ n (3 n)^{2n} \left( \frac{\sqrt{e}(1+2LT)}{\sqrt{n-1}} \right) ^{(n-1)(4+\gamma )} \right] , \end{aligned}$$
(142)

let \(N_{d,\varepsilon }\in {\mathbb {N}}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), satisfy for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) that

$$\begin{aligned} N_{d,\varepsilon }=\min \left\{ n \in {\mathbb {N}}\cap [2,\infty ) :\left[ c_d \left( \frac{\sqrt{e}(1+2LT)}{\sqrt{n}} \right) ^n\right] \le \frac{\varepsilon }{2}\right\} , \end{aligned}$$
(143)

and let \(\delta _{d,\varepsilon }\in (0,1]\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), satisfy for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) that \(\delta _{d,\varepsilon }=\frac{\varepsilon }{4Bd ^p c_d}\). Note that the fact that for all \(d\in {\mathbb {N}}\) the random variable \(\left\| {\frac{\mathbf {W}^d_T}{\sqrt{T}}}\right\| ^{2}\) is chi-squared distributed with d degrees of freedom and Jensen’s inequality imply that for all \(d\in {\mathbb {N}}\) it holds that

$$\begin{aligned} \left( {\mathbb {E}}\left[ \left\| {\mathbf {W}^d_T}\right\| ^{pq}\right] \right) ^2\le {\mathbb {E}}\left[ \left\| {\mathbf {W}^d_T}\right\| ^{2pq}\right] =(2T)^{pq} \left[ \frac{\Gamma \left( \frac{d}{2}+pq\right) }{\Gamma \left( \frac{d}{2}\right) }\right] =(2T)^{pq} \left[ \prod _{k=0}^{pq-1}\left( \frac{d}{2}+k \right) \right] . \end{aligned}$$
(144)

This implies for all \(d\in {\mathbb {N}}\) that

$$\begin{aligned} \left( {\mathbb {E}} \left[ \left\| {\mathbf {W}^d_T}\right\| ^{pq} \right] \right) ^{\frac{1}{(pq)}} = \left( {\mathbb {E}} \left[ \left\| {\mathbf {W}^d_T}\right\| ^{pq} \right] \right) ^{\frac{2}{(2pq)}} \le \sqrt{2T}\left( \prod _{k=0}^{pq-1}\left( \frac{d}{2}+k \right) \right) ^{\frac{1}{(2pq)}} \le \sqrt{2T\left( \frac{d}{2}+pq-1\right) }. \end{aligned}$$
(145)

This together with the fact that \(\forall \, d\in {\mathbb {N}}:\left( \int _{{\mathbb {R}}^d}\left\| {x}\right\| ^{2pq} \nu _d(dx)\right) ^{\frac{1}{(2pq)}}\le Bd^{{\mathfrak {p}}}\) implies that there exists \({\bar{C}} \in (0,\infty )\) such that for all \(d\in {\mathbb {N}}\) it holds that

$$\begin{aligned} c_d\le {\bar{C}} d^{pq}\left( \frac{1+d^{{\mathfrak {p}}}+\sqrt{d}}{3} \right) ^{pq}\le {\bar{C}} d^{({\mathfrak {p}}+1) pq}. \end{aligned}$$
(146)
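As a quick numerical sanity check of (144) and (145) (not part of the proof): the Gamma quotient telescopes into the stated product, and bounding every factor by the largest one gives the final estimate in (145). The helper names below are illustrative only.

```python
import math

def gamma_quotient(d, m):
    """Gamma(d/2 + m) / Gamma(d/2), the quotient appearing in (144)."""
    return math.gamma(d / 2 + m) / math.gamma(d / 2)

def factor_product(d, m):
    """prod_{k=0}^{m-1} (d/2 + k); equals the Gamma quotient for integer m."""
    out = 1.0
    for k in range(m):
        out *= d / 2 + k
    return out
```

For instance, \({\mathbb {E}}[\Vert \mathbf {W}^d_T\Vert ^{2pq}]=(2T)^{pq}\,\mathtt{gamma\_quotient}(d,pq)\), and `factor_product(d, m) ** (1/(2*m))` is bounded by \(\sqrt{d/2+m-1}\), which is exactly the last step of (145).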

Next note that for all \(\gamma \in (0,\infty )\) it holds that

$$\begin{aligned} {\tilde{C}}_\gamma&= \sup _{n\in {\mathbb {N}}\cap [2,\infty )} \left[ n (3 n)^{2n} \left( \frac{\sqrt{e}(1+2LT)}{\sqrt{n-1}} \right) ^{(n-1)(4+\gamma )} \right] \\&= \sup _{n\in {\mathbb {N}}\cap [2,\infty )} \left[ (\sqrt{e}(1+2LT))^{(n-1)(4+\gamma )}n^3 3^{2n} (n-1)^{-(n-1)\frac{\gamma }{2}} \left( \frac{n}{n-1} \right) ^{2(n-1)} \right] \\&\le \left[ \sup _{n\in {\mathbb {N}}\cap [2,\infty )} \left[ (\sqrt{e}(1+2LT))^{(n-1)(4+\gamma )}n^3 3^{2n} (n-1)^{-(n-1)\frac{\gamma }{2}} \right] \right] \left[ \sup _{n\in {\mathbb {N}}\cap [2,\infty )} \left( \frac{n}{n-1} \right) ^{2(n-1)} \right] \\&<\infty . \end{aligned}$$
(147)
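The supremum in (147) is finite because \((4+\gamma )/2>2\), so the factor \((n-1)^{-(n-1)(4+\gamma )/2}\) eventually dominates \((3n)^{2n}\) and the supremand tends to \(0\). A log-space evaluation (a sketch with hypothetical parameter values \(L=T=1\), \(\gamma =\tfrac{1}{2}\); the helper name is ours) confirms that the logarithm of the supremand tends to \(-\infty\):

```python
import math

def log_supremand(n, L, T, gamma):
    """Logarithm of the bracketed term in (142), i.e. of
    n * (3n)^(2n) * (sqrt(e)*(1+2LT)/sqrt(n-1))^((n-1)*(4+gamma)),
    computed in log-space to avoid floating-point overflow."""
    return (math.log(n)
            + 2 * n * math.log(3 * n)
            + (n - 1) * (4 + gamma)
            * (0.5 + math.log(1 + 2 * L * T) - 0.5 * math.log(n - 1)))
```

Note that the decay only sets in for very large \(n\) (the linear-in-\(n\) terms dominate for a long while), so \({\tilde{C}}_\gamma\) is finite but can be astronomically large.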

The fact that for all \(d\in {\mathbb {N}}\), \(v\in {\mathbb {R}}\), \(x\in {\mathbb {R}}^d\), \(\varepsilon \in (0,1]\) it holds that \(\left| f(v)-(\mathcal {R}({\mathfrak {f}}_{{\varepsilon }}))(v)\right| \le \varepsilon (1+|v|^q)\) and \(\left| g_d(x)-(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)\right| \le \varepsilon Bd^p (1+\left\| {x}\right\| )^{pq}\) implies that for all \(d\in {\mathbb {N}}\), \(v\in {\mathbb {R}}\), \(x\in {\mathbb {R}}^d\), \(\varepsilon \in (0,1]\) it holds that

$$\begin{aligned} \max \left\{ \left| f(v)-(\mathcal {R}({\mathfrak {f}}_{{\varepsilon }}))(v)\right| , \left| g_d(x)-(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)\right| \right\}&\le \max \left\{ \varepsilon (1+|v|^q), \varepsilon Bd^p (1+\left\| {x}\right\| )^{pq} \right\} \\&\le \varepsilon Bd^p ((1+ \left\| {x}\right\| )^{pq}+|v|^q). \end{aligned}$$
(148)

This, (136), (138), the fact that for all \(d\in {\mathbb {N}}\), \(v,w\in {\mathbb {R}}\), \(x\in {\mathbb {R}}^d\), \(\varepsilon \in (0,1]\) it holds that \(|f(v)-f(w)|\le L\left| v-w\right|\), \(\left| ({\mathcal {R}}({\mathfrak {f}}_{{\varepsilon }}))(v)-({\mathcal {R}}({\mathfrak {f}}_{{\varepsilon }}))(w)\right| \le L\left| v-w\right|\), \(|f(0)|\le B\), \(|({\mathcal {R}}({\mathfrak {g}}_{{d,\varepsilon }}))(x)|\le Bd^p (1+\left\| {x}\right\| )^p\), and Corollary 2.4 (with \(f_1=f\), \(f_2={\mathcal {R}}({\mathfrak {f}}_{{\delta }})\), \(g_1=g_d\), \(g_2={\mathcal {R}}({\mathfrak {g}}_{{d,\delta }})\), \(L=L\), \(\delta =\delta Bd^p\), \(B=Bd^p\), \({\mathbf {W}}={\mathbf {W}}^d\) in the notation of Corollary 2.4) imply that for all \(d,N,M\in {\mathbb {N}}\), \(\delta \in (0,1]\) it holds that

$$\begin{aligned}&\left( \int _{{\mathbb {R}}^d}{\mathbb {E}}\left[ \left| U^0_{N,M,d,\delta }(0,x)-u_d(0,x)\right| ^2\right] \nu _d(dx)\right) ^{\frac{1}{2}} \nonumber \\&\quad \le \left( e^{LT} (T+1)\right) ^{q+1}\left( (Bd^p)^q+1\right) \left( \delta Bd^p+\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\right) \nonumber \\&\qquad \cdot \left[ \int _{{\mathbb {R}}^d} \left( 1+\left\| {x}\right\| + \left( {\mathbb {E}} \left[ \left\| {\mathbf {W}^d_T}\right\| ^{pq} \right] \right) ^{\frac{1}{pq}}\right) ^{2pq} \nu _d(dx)\right] ^{\frac{1}{2}}. \end{aligned}$$
(149)

The triangle inequality hence proves for all \(d,N,M\in {\mathbb {N}}\), \(\delta \in (0,1]\) that

$$\begin{aligned} \begin{aligned}&\left( \int _{{\mathbb {R}}^d}{\mathbb {E}}\left[ \left| U^0_{N,M,d,\delta }(0,x)-u_d(0,x)\right| ^2\right] \nu _d(dx)\right) ^{\frac{1}{2}} \\&\quad \le \left( e^{LT} (T+1)\right) ^{q+1}\left( (Bd^p)^q+1\right) \left( \delta Bd^p+\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\right) \\&\qquad \cdot \left[ 1+\left( \int _{{\mathbb {R}}^d}\left\| {x}\right\| ^{2pq} \nu _d(dx)\right) ^{\frac{1}{(2pq)}} +\left( {\mathbb {E}} \left[ \left\| {\mathbf {W}^d_T}\right\| ^{pq} \right] \right) ^{\frac{1}{(pq)}} \right] ^{pq} \\&\quad = c_d\left( \delta Bd^p + \frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\right) . \end{aligned} \end{aligned}$$
(150)

This and Fubini’s theorem imply that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds that

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \int _{{\mathbb {R}}^d}\left| U^0_{N_{d,\varepsilon },N_{d,\varepsilon },d,\delta _{d,\varepsilon }}(0,x)-u_d(0,x)\right| ^2\nu _d(dx)\right] \\&\quad = \int _{{\mathbb {R}}^d}{\mathbb {E}}\left[ \left| U^0_{N_{d,\varepsilon },N_{d,\varepsilon },d,\delta _{d,\varepsilon }}(0,x)-u_d(0,x)\right| ^2\right] \nu _d(dx)\\&\quad \le \left( c_d\delta _{d,\varepsilon }Bd^p + c_d\left( \frac{\sqrt{e}(1+2LT)}{\sqrt{N_{d,\varepsilon }}}\right) ^{N_{d,\varepsilon }} \right) ^2 \le \left( \frac{\varepsilon }{4}+\frac{\varepsilon }{2}\right) ^2< \varepsilon ^2. \end{aligned} \end{aligned}$$
(151)

This implies that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) there exists \(\omega _{d,\varepsilon }\in \Omega\) such that

$$\begin{aligned} \begin{aligned} \int _{{\mathbb {R}}^d}\left| U^0_{N_{d,\varepsilon },N_{d,\varepsilon },d,\delta _{d,\varepsilon }}(0,x,\omega _{d,\varepsilon })-u_d(0,x)\right| ^2\nu _d(dx)&<\varepsilon ^2. \end{aligned} \end{aligned}$$
(152)
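For orientation (this simulation is not needed for the proof), the multilevel Picard approximations \({U}^{\theta }_{n,M,d,\delta }\) from (139) can be sampled directly. The sketch below is a minimal scalar (\(d=1\)) version in which the network realizations \(\mathcal {R}({\mathfrak {f}}_{{\delta }})\) and \(\mathcal {R}({\mathfrak {g}}_{{d,\delta }})\) are replaced by plain functions `f` and `g`; all names are illustrative.

```python
import math
import random

def mlp(n, M, t, x, f, g, T):
    """Sketch of the multilevel Picard approximation U_{n,M}(t, x) in (139) for d = 1.
    f, g stand in for the realized networks R(f_delta), R(g_{d,delta})."""
    if n == 0:
        return 0.0
    # terminal-condition term: Monte Carlo average of g(x + W_T - W_t)
    u = sum(g(x + random.gauss(0.0, math.sqrt(T - t))) for _ in range(M ** n)) / M ** n
    # multilevel correction terms
    for l in range(n):
        s = 0.0
        for _ in range(M ** (n - l)):
            r = t + (T - t) * random.random()            # uniform evaluation time in [t, T]
            y = x + random.gauss(0.0, math.sqrt(r - t))  # x + W_r - W_t
            s += f(mlp(l, M, r, y, f, g, T))
            if l > 0:                                    # the indicator 1_N(l) in (139)
                s -= f(mlp(l - 1, M, r, y, f, g, T))
        u += (T - t) * s / M ** (n - l)
    return u
```

For \(f\equiv 0\) every correction term vanishes and `mlp` reduces to plain Monte Carlo approximation of \({\mathbb {E}}[g(x+W_{T-t})]\), consistent with the Feynman–Kac representation underlying (132).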

Next, observe that Lemma 3.10 shows that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) there exists \(\Psi _{d,\varepsilon } \in \mathbf {N}\) such that for all \(x\in {\mathbb {R}}^d\) it holds that \(\mathcal {R}(\Psi _{d,\varepsilon })\in C({\mathbb {R}}^d,{\mathbb {R}})\), \((\mathcal {R}(\Psi _{d,\varepsilon }))(x)=U^0_{N_{d,\varepsilon },N_{d,\varepsilon },d,\delta _{d,\varepsilon }}(0,x,\omega _{d,\varepsilon })\), \(|||\mathcal {D}(\Psi _{d,\varepsilon } )|||\le k_{d,\delta _{d,\varepsilon }}(3 N_{d,\varepsilon })^{N_{d,\varepsilon }}\), and

$$\begin{aligned} \dim \left( \mathcal {D}\left( \Psi _{d,\varepsilon }\right) \right) = N_{d,\varepsilon }\left( \dim \left( {\mathcal {D}}\left( {\mathfrak {f}}_{{\delta _{d,\varepsilon }}}\right) \right) -1\right) +\dim \left( {\mathcal {D}}\left( {\mathfrak {g}}_{{d,\delta _{d,\varepsilon }}}\right) \right) . \end{aligned}$$
(153)

This and (152) prove (133). Moreover, (153) and (130) imply that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds that

$$\begin{aligned} \mathcal {P}(\Psi _{d,\varepsilon })&\le \sum _{j=1}^{\dim (\mathcal {D}(\Psi _{d,\varepsilon }))}k_{d,\delta _{d,\varepsilon }}(3 N_{d,\varepsilon })^{N_{d,\varepsilon }} \left( k_{d,\delta _{d,\varepsilon }}(3 N_{d,\varepsilon })^{N_{d,\varepsilon }}+1\right) \\&\le 2\dim \left( \mathcal {D}\left( \Psi _{d,\varepsilon }\right) \right) k_{d,\delta _{d,\varepsilon }}^2(3 N_{d,\varepsilon })^{2N_{d,\varepsilon }}\\&=2 \left( N_{d,\varepsilon }\left( \dim \left( \mathcal {D}\left( {\mathfrak {f}}_{{\delta _{d,\varepsilon }}}\right) \right) -1\right) +\dim \left( \mathcal {D}\left( {\mathfrak {g}}_{{d,\delta _{d,\varepsilon }}}\right) \right) \right) k_{d,\delta _{d,\varepsilon }}^2(3 N_{d,\varepsilon })^{2N_{d,\varepsilon }}.\end{aligned}$$
(154)
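The first step of (154) only uses that each of the summands \(k_n(k_{n-1}+1)\) in the definition of \(\mathcal {P}\) is at most \(|||\mathcal {D}(\Psi _{d,\varepsilon })|||\,(|||\mathcal {D}(\Psi _{d,\varepsilon })|||+1)\le 2\,|||\mathcal {D}(\Psi _{d,\varepsilon })|||^2\). This counting argument is elementary enough to check mechanically (hypothetical helper names):

```python
def param_count(dims):
    """P(Phi) = sum_{n=1}^{H+1} k_n * (k_{n-1} + 1) for dims = (k_0, ..., k_{H+1})."""
    return sum(dims[n] * (dims[n - 1] + 1) for n in range(1, len(dims)))

def param_bound(dims):
    """The crude bound used in (154): 2 * dim(D) * |||D|||^2."""
    return 2 * len(dims) * max(dims) ** 2
```
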

In addition, it follows from (137) and (134) that for all \(\varepsilon \in (0,1]\) it holds that

$$\begin{aligned} |||\mathcal {D}({\mathfrak {f}}_{{\varepsilon }})|||\le \left[ 4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2\right] \varepsilon ^{-\frac{q}{(q-1)}}\le B \varepsilon ^{-2}\le B \varepsilon ^{-\alpha }. \end{aligned}$$
(155)

Combining this with (154), the fact that \(\forall \, \varepsilon \in (0,1] :\dim \left( \mathcal {D}\left( {\mathfrak {f}}_{{\varepsilon }}\right) \right) =3\), the fact that \(\forall \, d\in {\mathbb {N}}, \varepsilon \in (0,1] :|||\mathcal {D}({\mathfrak {g}}_{{d,\varepsilon }})|||\le d^p\varepsilon ^{-\alpha }B\), and the fact that \(\forall \, d\in {\mathbb {N}}, \varepsilon \in (0,1] :\dim \left( \mathcal {D}\left( {\mathfrak {g}}_{{d,\varepsilon }}\right) \right) \le d^p\varepsilon ^{-\beta }B\) implies that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds that \(k_{d,\delta _{d,\varepsilon }}\le d^p\delta _{d,\varepsilon }^{-\alpha }B\) and that

$$\begin{aligned} \mathcal {P}(\Psi _{d,\varepsilon })&\le 2 \left( N_{d,\varepsilon }\left( \dim \left( \mathcal {D}\left( {\mathfrak {f}}_{{\delta _{d,\varepsilon }}}\right) \right) -1\right) +\dim \left( {\mathcal {D}}\left( {\mathfrak {g}}_{{d,\delta _{d,\varepsilon }}}\right) \right) \right) \left( d^{p}\delta _{d,\varepsilon }^{-\alpha }B\right) ^2 (3 N_{d,\varepsilon })^{2N_{d,\varepsilon }}\\&\le 2 \left( 2N_{d,\varepsilon }+d^p(\delta _{d,\varepsilon })^{-\beta }B\right) \left( d^{p}\delta _{d,\varepsilon }^{-\alpha }B\right) ^2 (3 N_{d,\varepsilon })^{2N_{d,\varepsilon }}\\&\le 4d^p\delta _{d,\varepsilon }^{-\beta }Bd^{2p}\delta _{d,\varepsilon }^{-2\alpha }B^2 N_{d,\varepsilon } (3 N_{d,\varepsilon })^{2N_{d,\varepsilon }}\\&= 4B^3 (4c_d Bd^p)^{2\alpha +\beta } d^{3p} \varepsilon ^{-(2\alpha +\beta )} N_{d,\varepsilon } (3 N_{d,\varepsilon })^{2N_{d,\varepsilon }}. \end{aligned}$$
(156)

Furthermore, (143) ensures that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds that

$$\begin{aligned} \varepsilon \le 2c_d \left( \frac{\sqrt{e}(1+2LT)}{\sqrt{N_{d,\varepsilon }-1}} \right) ^{N_{d,\varepsilon }-1}. \end{aligned}$$
(157)

This together with (156) implies that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), \(\gamma \in (0,1]\) it holds that

$$\begin{aligned} \begin{aligned} \mathcal {P}(\Psi_{d,\varepsilon })&\le 4B^{2\alpha +\beta +3} (4c_d)^{2\alpha +\beta } d^{(2\alpha +\beta +3)p} \varepsilon ^{-(2\alpha +\beta )} N_{d,\varepsilon } (3 N_{d,\varepsilon })^{2N_{d,\varepsilon }}\varepsilon ^{4+\gamma }\varepsilon ^{-(4+\gamma )}\\&\quad \le 4B^{2\alpha +\beta +3} (4c_d)^{4+2\alpha +\beta +\gamma } d^{(2\alpha +\beta +3)p} N_{d,\varepsilon } (3 N_{d,\varepsilon })^{2N_{d,\varepsilon }} \left( \frac{\sqrt{e}(1+2LT)}{\sqrt{N_{d,\varepsilon }-1}} \right) ^{(N_{d,\varepsilon }-1)(4+\gamma )}\varepsilon ^{-(4+2\alpha +\beta +\gamma )}\\&\quad \le 4B^{2\alpha +\beta +3} (4c_d)^{5+2\alpha +\beta } d^{(2\alpha +\beta +3)p} \sup _{n\in {\mathbb {N}}\cap [2,\infty )}\left[ n (3 n)^{2n} \left( \frac{\sqrt{e}(1+2LT)}{\sqrt{n-1}} \right) ^{(n-1)(4+\gamma )} \right] \varepsilon ^{-(4+2\alpha +\beta +\gamma )} \\&\quad = 4B^{2\alpha +\beta +3} (4c_d)^{5+2\alpha +\beta } d^{(2\alpha +\beta +3)p} {\tilde{C}}_\gamma \varepsilon ^{-(4+2\alpha +\beta +\gamma )} .\end{aligned} \end{aligned}$$
(158)

Combining this with (146) and (147) establishes that there exist \(\eta \in (0,\infty )\), \(C=(C_\gamma )_{\gamma \in (0,1]}:(0,1]\rightarrow (0,\infty )\) such that for all \(d\in {\mathbb {N}}\), \(\varepsilon , \gamma \in (0,1]\) it holds that \(\mathcal {P}(\Psi _{d,\varepsilon })\le C_\gamma d^\eta \varepsilon ^{-(4+2\alpha +\beta +\gamma )}\). The proof of Theorem 4.1 is thus completed. \(\square\)
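As an aside (not part of the proof), the quantity \(N_{d,\varepsilon }\) from (143) is well defined because \((\sqrt{e}(1+2LT)/\sqrt{n})^n\rightarrow 0\) as \(n\rightarrow \infty\), and it can be computed by direct search; the parameter values in the example are hypothetical.

```python
import math

def n_dim_eps(c_d, L, T, eps):
    """Smallest n >= 2 with c_d * (sqrt(e)*(1 + 2*L*T)/sqrt(n))**n <= eps/2, cf. (143).
    Scanning upward from n = 2 returns exactly the minimum in (143)."""
    n = 2
    while c_d * (math.sqrt(math.e) * (1 + 2 * L * T) / math.sqrt(n)) ** n > eps / 2:
        n += 1
    return n
```

Note that the scanned sequence first grows before the factor \(n^{-n/2}\) takes over, so \(N_{d,\varepsilon }\) is typically much larger than the crossing point of a monotone sequence would suggest.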

4.2 Deep neural network approximations with general polynomial convergence rates

Corollary 4.2

Let \(\left\| \cdot \right\| :(\cup _{d\in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0,\infty )\) and \(\mathbf {A}_{d}:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\), \(d\in {\mathbb {N}}\), satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that \(\Vert x\Vert =[\sum _{i=1}^d(x_i)^2]^{1/2}\) and \(\mathbf {A}_{d}(x)= \left( \max \{x_1,0\},\ldots ,\max \{x_d,0\}\right) ,\) let

$$\begin{aligned} \mathbf {N}= \bigcup _{H\in {\mathbb {N}}}\bigcup _{(k_0,k_1,\ldots ,k_{H+1})\in {\mathbb {N}}^{H+2}} \left[ \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_{n}\times k_{n-1}} \times {\mathbb {R}}^{k_{n}}\right) \right] , \end{aligned}$$
(159)

let \(\mathcal {R}:\mathbf {N}\rightarrow (\cup _{k,l\in {\mathbb {N}}} C({\mathbb {R}}^k,{\mathbb {R}}^l))\) and \(\mathcal {P}:\mathbf {N}\rightarrow {\mathbb {N}}\) satisfy for all \(H\in {\mathbb {N}}\), \(k_0,k_1,\ldots ,k_H,k_{H+1}\in {\mathbb {N}}\), \(\Phi = ((W_1,B_1),\ldots ,(W_{H+1},B_{H+1}))\in \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_n\times k_{n-1}} \times {\mathbb {R}}^{k_n}\right)\), \(x_0 \in {\mathbb {R}}^{k_0},\ldots ,x_{H}\in {\mathbb {R}}^{k_{H}}\) with \(\forall \, n\in {\mathbb {N}}\cap [1,H]:x_n = \mathbf {A}_{k_n}(W_n x_{n-1}+B_n )\) that

$$\begin{aligned} {\mathcal {R}}(\Phi )\in C({\mathbb {R}}^{k_0},{\mathbb {R}}^ {k_{H+1}}),\quad (\mathcal {R}(\Phi )) (x_0) = W_{H+1}x_{H}+B_{H+1}, \quad \text{and}\quad \mathcal {P}(\Phi )=\textstyle {\sum \limits _{n=1}^{H+1}}k_n(k_{n-1}+1), \end{aligned}$$

let \(T,\kappa \in (0,\infty )\), \(f\in C({\mathbb {R}},{\mathbb {R}})\), \(({\mathfrak {g}}_{{d,\varepsilon }})_{d\in {\mathbb {N}},\varepsilon \in (0,1]}\subseteq \mathbf {N}\), \((c_d)_{d\in {\mathbb {N}}}\subseteq (0,\infty )\), for every \(d\in {\mathbb {N}}\) let \(g_d\in C({\mathbb {R}}^d,{\mathbb {R}})\), for every \(d\in {\mathbb {N}}\) let \(u_d\in C^{1,2}([0,T]\times {\mathbb {R}}^d,{\mathbb {R}})\), and assume for all \(d\in {\mathbb {N}}\), \(v,w\in {\mathbb {R}}\), \(x\in {\mathbb {R}}^d\), \(\varepsilon \in (0,1]\), \(t\in (0,T)\) that

$$\begin{aligned}&\mathcal {P}({\mathfrak {g}}_{{d,\varepsilon }})\le \kappa d^\kappa \varepsilon ^{-\kappa }, \quad \left| g_d(x)-(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)\right| \le \varepsilon \kappa d^\kappa (1+\left\| {x}\right\| ^\kappa ), \quad \mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }})\in C({\mathbb {R}}^d,{\mathbb {R}}), \end{aligned}$$
(160)
$$\begin{aligned}&|(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)|\le \kappa d^\kappa (1+\left\| {x}\right\| ^\kappa ),\quad |f(v)-f(w)|\le \kappa |v-w|, \quad |u_d(t,x)|\le c_d(1+\left\| {x}\right\| ^{c_d}), \end{aligned}$$
(161)
$$\begin{aligned}&\begin{aligned} (\tfrac{\partial }{\partial t}u_d)(t,x)+\tfrac{1}{2}(\Delta _xu_d)(t,x)+f(u_d(t,x))=0, \qquad \text{and}\qquad u_d(T,x)=g_d(x). \end{aligned} \end{aligned}$$
(162)

Then there exist \((\Psi _{d,\varepsilon })_{d\in {\mathbb {N}},\varepsilon \in (0,1]}\subseteq \mathbf {N}\) and \(\eta \in (0,\infty )\) such that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds that \(\mathcal {R}(\Psi _{d,\varepsilon })\in C({\mathbb {R}}^d,{\mathbb {R}})\), \(\mathcal {P}(\Psi _{d,\varepsilon })\le \eta d^\eta \varepsilon ^{-\eta }\), and

$$\begin{aligned} \left[ \int _{[0,1]^d}\left| u_d(0,x)-(\mathcal {R}(\Psi _{d,\varepsilon }))(x)\right| ^2\, dx\right] ^{\frac{1}{2}}\le \varepsilon . \end{aligned}$$
(163)
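Before turning to the proof, the realization map \(\mathcal {R}\) and the parameter count \(\mathcal {P}\) from the statement above can be sketched in pure Python (illustrative helper names; weight matrices are nested lists, so this is a definition check rather than an efficient implementation):

```python
def relu(v):
    """Componentwise ReLU, i.e. the activation A_d from the statement."""
    return [max(x, 0.0) for x in v]

def affine(W, b, x):
    """W x + b for a matrix W (list of rows) and vectors b, x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def realize(phi, x):
    """R(Phi)(x): ReLU after every affine map except the last one."""
    *hidden, (W_out, b_out) = phi
    for W, b in hidden:
        x = relu(affine(W, b, x))
    return affine(W_out, b_out, x)

def n_params(phi):
    """P(Phi) = sum_n k_n * (k_{n-1} + 1)."""
    return sum(len(b) * (len(W[0]) + 1) for W, b in phi)
```

For example, the identity \(x=\max \{x,0\}-\max \{-x,0\}\) shows that a width-2 ReLU network realizes the identity on \({\mathbb {R}}\); this network has \(\mathcal {P}(\Phi )=2\cdot 2+1\cdot 3=7\) parameters.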

Proof of Corollary 4.2

Throughout this proof assume without loss of generality that \(\kappa \ge 2\), let \(|||\cdot ||| :(\cup _{d\in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0,\infty )\) and \(\dim :(\cup _{d\in {\mathbb {N}}}{\mathbb {R}}^d) \rightarrow {\mathbb {N}}\) satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that \(|||x|||=\max _{i\in [1,d]\cap {\mathbb {N}}}|x_i|\) and \(\dim \left( x\right) =d\), let \(\mathcal {D}:\mathbf {N}\rightarrow \mathbf {D}\) satisfy for all \(H\in {\mathbb {N}}\), \(k_0,k_1,\ldots ,k_H,k_{H+1}\in {\mathbb {N}}\), \(\Phi = ((W_1,B_1),\ldots ,(W_{H+1},B_{H+1}))\in \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_n\times k_{n-1}} \times {\mathbb {R}}^{k_n}\right)\) that

$$\begin{aligned} \mathcal {D}(\Phi )= (k_0,k_1,\ldots ,k_{H}, k_{H+1}), \end{aligned}$$
(164)

and let \(B=\max \left\{ 1, \kappa \left[ \sup _{r\in [0,\infty )}\frac{1+r^\kappa }{(1+r)^\kappa }\right] \right\}\). The fact that \(\forall \, d\in {\mathbb {N}}\), \(t\in [0,T]\), \(x\in {\mathbb {R}}^d :|u_d(t,x)|\le c_d(1+\left\| {x}\right\| ^{c_d})\), the fact that \(\forall \, d\in {\mathbb {N}}\), \(x\in {\mathbb {R}}^d :u_d(T,x)=g_d(x)\), the fact that \(\forall \, v,w\in {\mathbb {R}}:|f(v)-f(w)|\le \kappa |v-w|\), (162), and the Feynman–Kac formula ensure that the functions \(u_d\), \(d\in {\mathbb {N}}\), satisfy (132). Next note that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), \(x\in {\mathbb {R}}^d\) it holds that

$$\begin{aligned}&|(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)|\le \kappa d^\kappa (1+\left\| {x}\right\| ^\kappa ) \le \kappa \left[ \sup _{r\in [0,\infty )}\frac{1+r^\kappa }{(1+r)^\kappa }\right] d^\kappa (1+\left\| {x}\right\| )^\kappa \le Bd^\kappa (1+\left\| {x}\right\| )^\kappa , \end{aligned}$$
(165)
$$\begin{aligned}&\left| g_d(x)-(\mathcal {R}({\mathfrak {g}}_{{d,\varepsilon }}))(x)\right| \le \varepsilon \kappa d^\kappa (1+\left\| {x}\right\| ^\kappa )\le \varepsilon \kappa \left[ \sup _{r\in [0,\infty )}\frac{1+r^\kappa }{(1+r)^\kappa }\right] d^\kappa (1+\left\| {x}\right\| )^\kappa \le \varepsilon Bd^\kappa (1+\left\| {x}\right\| )^{2\kappa }, \end{aligned}$$
(166)
$$\begin{aligned}&|||\mathcal {D}({\mathfrak {g}}_{{d,\varepsilon }})|||\le \mathcal {P}({\mathfrak {g}}_{{d,\varepsilon }})\le \kappa d^\kappa \varepsilon ^{-\kappa }\le Bd^\kappa \varepsilon ^{-\kappa }, \end{aligned}$$
(167)

and

$$\begin{aligned} \dim \left( \mathcal {D}\left( {\mathfrak {g}}_{{d,\varepsilon }}\right) \right) \le \mathcal {P}({\mathfrak {g}}_{{d,\varepsilon }})\le \kappa d^\kappa \varepsilon ^{-\kappa } \le Bd^\kappa \varepsilon ^{-\kappa }. \end{aligned}$$
(168)

Moreover, observe that the fact that \(\forall \, d\in {\mathbb {N}}, y\in [0,1]^d:\left\| {y}\right\| \le \sqrt{d}\) ensures that for all \(d\in {\mathbb {N}}\), \(\alpha \in (0,\infty )\) it holds that

$$\begin{aligned} \left( \int _{[0,1]^d}\left\| {y}\right\| ^\alpha \, dy\right) ^{\frac{1}{\alpha }} \le \sqrt{d} \le Bd. \end{aligned}$$
(169)

Combining this with (165)–(168) and Theorem 4.1 (with \(\alpha =\kappa\), \(\beta =\kappa\), \(B=B\), \(L=\kappa\), \(p=\kappa\), \(q=2\), \({\mathfrak{p}}=1\), and \(\gamma =\frac{1}{2}\) in the notation of Theorem 4.1) ensures that there exist \((\Psi _{d,\varepsilon })_{d\in {\mathbb {N}},\varepsilon \in (0,1]}{\subseteq}\mathbf {N}\), \(\eta \in (0,\infty )\) such that for all \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds that \(\mathcal {R}(\Psi _{d,\varepsilon })\in C({\mathbb {R}}^d,{\mathbb {R}})\), \(\mathcal {P}(\Psi _{d,\varepsilon })\le \eta d^\eta \varepsilon ^{-\eta }\), and

$$\begin{aligned} \left[ \int _{[0,1]^d}\left| u_d(0,x)-(\mathcal {R}(\Psi _{d,\varepsilon }))(x)\right| ^2\, dx\right] ^{\frac{1}{2}}\le \varepsilon . \end{aligned}$$
(170)

The proof of Corollary 4.2 is thus completed. \(\square\)