Abstract
In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including, e.g., language processing, image recognition, fraud detection, and computational advertisement. Recently, it has also been proposed in the scientific literature to reformulate high-dimensional partial differential equations (PDEs) as stochastic learning problems and to employ DNNs together with stochastic gradient descent methods to approximate the solutions of such high-dimensional PDEs. There are also a few mathematical convergence results in the scientific literature which show that DNNs can approximate solutions of certain PDEs without the curse of dimensionality in the sense that the number of real parameters employed to describe the DNN grows at most polynomially both in the PDE dimension \(d \in {\mathbb {N}}\) and the reciprocal of the prescribed approximation accuracy \(\varepsilon > 0\). One key argument in most of these results is, first, to employ a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the employed approximation scheme. Having this in mind, one could aim for a general abstract result which shows under suitable assumptions that if a certain function can be approximated by any kind of (Monte Carlo) approximation scheme without the curse of dimensionality, then the function can also be approximated with DNNs without the curse of dimensionality. It is a subject of this article to make a first step in this direction. In particular, the main result of this paper, roughly speaking, shows that if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality. Moreover, for the number of real parameters used to describe such approximating DNNs we provide an explicit upper bound for the optimal exponent of the dimension \(d \in {\mathbb {N}}\) of the function under consideration as well as an explicit lower bound for the optimal exponent of the prescribed approximation accuracy \(\varepsilon >0\). As an application of this result we derive that solutions of suitable Kolmogorov PDEs can be approximated with DNNs without the curse of dimensionality.
1 Introduction
In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including, e.g., language processing (cf., e.g., [13, 23, 29, 31, 38, 57]), image recognition (cf., e.g., [32, 40, 52, 54, 56]), fraud detection (cf., e.g., [12, 51]), and computational advertisement (cf., e.g., [55, 58]).
Recently, it has also been proposed to reformulate high-dimensional partial differential equations (PDEs) as stochastic learning problems and to employ DNNs together with stochastic gradient descent methods to approximate the solutions of such high-dimensional PDEs [3, 16, 17, 20, 26, 39, 53] (cf. also, e.g., [14, 37, 42]). We refer, e.g., to [1, 2, 4, 5, 6, 9, 11, 15, 19, 22, 27, 28, 33, 35, 43, 44, 45, 48, 49] and the references mentioned therein for further developments and extensions of such deep learning based numerical approximation methods for PDEs. In particular, the references [2, 9, 17, 35, 45] deal with linear PDEs (and the stochastic differential equations (SDEs) related to them, respectively), the references [1, 11, 15, 19, 20, 28, 33] deal with semilinear PDEs (and the backward stochastic differential equations (BSDEs) related to them, respectively), the references [3, 43, 48, 49] deal with fully nonlinear PDEs (and the second-order backward stochastic differential equations (2BSDEs) related to them, respectively), the references [27, 44, 53] deal with certain specific subclasses of fully nonlinear PDEs (and the 2BSDEs related to them, respectively), and the references [5, 6, 22, 53] deal with free boundary PDEs (and the optimal stopping/option pricing problems related to them (see, e.g., [8, Chapter 1]), respectively).
In the scientific literature there are also a few rigorous mathematical convergence results for DNN based approximation methods for PDEs. For example, the references [27, 53] provide mathematical convergence results for such DNN based approximation methods for PDEs without any information on the convergence speed and, for instance, the references [10, 18, 21, 24, 25, 30, 34, 36, 41, 50] provide mathematical convergence results of such DNN based approximation methods for PDEs with dimension-independent convergence rates and error constants which are only polynomially dependent on the dimension. In particular, the latter references show that DNNs can approximate solutions of certain PDEs without the curse of dimensionality (cf. [7]) in the sense that the number of real parameters employed to describe the DNN grows at most polynomially both in the PDE dimension \(d \in {\mathbb {N}}\) and the reciprocal of the prescribed approximation accuracy \(\varepsilon > 0\) (cf., e.g., [46, Chapter 1] and [47, Chapter 9]).
One key argument in most of these articles is, first, to employ a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the employed approximation scheme (cf., e.g., [36, Section 2 and (i)–(iii) in Section 1] and [24]). Having this in mind, one could aim for a general abstract result which shows under suitable assumptions that if a certain function can be approximated by any kind of (Monte Carlo) approximation scheme without the curse of dimensionality, then the function can also be approximated with DNNs without the curse of dimensionality.
It is a subject of this article to make a first step in this direction. In particular, the main result of this paper, Theorem 2.3 below, roughly speaking, shows that if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality (cf. (2.9) in Theorem 2.3 below) and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality. Moreover, for the number of real parameters used to describe such approximating DNNs we provide in Theorem 2.3 below an explicit upper bound for the optimal exponent of the dimension \(d \in {\mathbb {N}}\) of the function under consideration as well as an explicit lower bound for the optimal exponent of the prescribed approximation accuracy \(\varepsilon >0\) (see (2.16) in Theorem 2.3 below).
In our applications of Theorem 2.3 we employ Theorem 2.3 to study in Theorem 4.5 below DNN approximations for PDEs. Theorem 4.5 can be considered as a special case of Theorem 2.3 with the function to be approximated being the solution of a suitable Kolmogorov PDE (cf. (4.42) below) at the final time \(T \in (0, \infty )\) and the approximation scheme being the Monte Carlo Euler scheme. In particular, Theorem 4.5 shows that solutions of suitable Kolmogorov PDEs can be approximated with DNNs without the curse of dimensionality. For the number of real parameters used to describe such approximating DNNs Theorem 4.5 also provides an explicit upper bound for the optimal exponent of the dimension \(d \in {\mathbb {N}}\) of the PDE under consideration as well as an explicit lower bound for the optimal exponent of the prescribed approximation accuracy \(\varepsilon >0\) (see (4.43) below). In order to illustrate the findings of Theorem 4.5 below, we now present in Theorem 1.1 below a special case of Theorem 4.5.
Theorem 1.1
Let \( \varphi _{0,d} \in C({\mathbb {R}}^d, {\mathbb {R}}) \), \( d \in {\mathbb {N}}\), and \( \varphi _{ 1, d } \in C({\mathbb {R}}^d, {\mathbb {R}}^d) \), \( d \in {\mathbb {N}}\), let \(\left\| \cdot \right\| \! :(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) and \({\mathfrak {R}}:(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow (\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d)\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, \ldots , x_d) \in {\mathbb {R}}^d\) that
$$\begin{aligned} \Vert x \Vert = \big [ \textstyle \sum _{i=1}^{d} |x_i|^2 \big ]^{\nicefrac {1}{2}} \qquad \text {and} \qquad {\mathfrak {R}}(x) = ( \max \{ x_1, 0 \}, \max \{ x_2, 0 \}, \ldots , \max \{ x_d, 0 \} ), \end{aligned}$$(1.1)
let \({\mathbf {N}}= \cup _{L \in {\mathbb {N}}} \cup _{ l_0,l_1,\ldots , l_L\in {\mathbb {N}}} ( \times _{k = 1}^L ({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k}) )\), let \({\mathcal {P}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\) and \({\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{k,l\in {\mathbb {N}}} C({\mathbb {R}}^k,{\mathbb {R}}^l))\) satisfy for all \( L\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L \in {\mathbb {N}}\), \( \Phi = ((W_1, B_1),\ldots , (W_L,B_L)) \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k})) \), \(x_0 \in {\mathbb {R}}^{l_0}, x_1 \in {\mathbb {R}}^{l_1}, \ldots , x_{L} \in {\mathbb {R}}^{l_{L}}\) with \(\forall \, k \in {\mathbb {N}}\cap (0,L) :x_k = {\mathfrak {R}}(W_k x_{k-1} + B_k)\) that \({\mathcal {P}}(\Phi ) = \sum _{k = 1}^L l_k(l_{k-1} + 1) \), \({\mathcal {R}}(\Phi ) \in C({\mathbb {R}}^{l_0},{\mathbb {R}}^{l_L})\), and
$$\begin{aligned} ({\mathcal {R}}(\Phi ))(x_0) = W_L x_{L-1} + B_L, \end{aligned}$$(1.2)
let \( T, \kappa , {\mathfrak {e}}\in (0, \infty )\), \({\mathfrak {d}}\in [4, \infty )\), \(\theta \in [1, \infty )\), \( ( \phi ^{ m, d }_{ \varepsilon } )_{ (m, d, \varepsilon ) \in \{ 0, 1 \} \times {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\), assume for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \), \( m \in \{0, 1\}\), \( x, y \in {\mathbb {R}}^d \) that
and \( \Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), and for every \( d \in {\mathbb {N}}\) let \( u_d :[0,T] \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) be an at most polynomially growing viscosity solution of
with \( u_d( 0, x ) = \varphi _{ 0, d }( x ) \) for \( ( t, x ) \in (0,T) \times {\mathbb {R}}^d \). Then for every \(p \in (0, \infty )\) there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {R}}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), \([ \int _{ [0, 1]^d } | u_d(T, x) - ( {\mathcal {R}}(\Psi _{ d, \varepsilon }) )( x ) |^p \, dx ]^{ \nicefrac { 1 }{ p } } \le \varepsilon \), and
Theorem 1.1 is an immediate consequence of Corollary 4.6 in Sect. 4 below. Corollary 4.6, in turn, is a special case of Theorem 4.5. Let us add some comments regarding the mathematical objects appearing in Theorem 1.1.
The set \( {\mathbf {N}}\) in Theorem 1.1 above is a set of tuples of pairs of real matrices and real vectors and this set represents the set of all DNNs (see also Definition 3.1 below). The function \({\mathfrak {R}}:(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow (\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d)\) in Theorem 1.1 represents multidimensional rectifier functions. Theorem 1.1 is thus an approximation result for rectified DNNs.
Moreover, for every DNN \( \Phi \in {\mathbf {N}}\) in Theorem 1.1 above \( {\mathcal {P}}( \Phi ) \in {\mathbb {N}}\) represents the number of real parameters which are used to describe the DNN \( \Phi \) (see also Definition 3.1 below). In particular, for every DNN \( \Phi \in {\mathbf {N}}\) in Theorem 1.1 one can think of \( {\mathcal {P}}( \Phi ) \in {\mathbb {N}}\) as a number proportional to the amount of memory storage needed to store the DNN \(\Phi \). Furthermore, the function \( {\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{ k, l \in {\mathbb {N}}} C( {\mathbb {R}}^k, {\mathbb {R}}^l )) \) from the set \( {\mathbf {N}}\) of “all DNNs” to the union \( \cup _{ k, l \in {\mathbb {N}}} C( {\mathbb {R}}^k, {\mathbb {R}}^l ) \) of continuous functions describes the realization functions associated to the DNNs (see also Definition 3.3 below).
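To make the functions \({\mathcal {P}}\) and \({\mathcal {R}}\) concrete, the following Python sketch implements the parameter count and the realization function for a DNN stored as a tuple of weight-bias pairs. The sketch is an illustration only and not part of the formal development; the tuple format and the rectifier activation follow the description above, while the concrete layer dimensions in the example are arbitrary choices of ours.

```python
import numpy as np

# A DNN Phi is a tuple ((W_1, B_1), ..., (W_L, B_L)) of weight-bias pairs.

def num_params(phi):
    # P(Phi) = sum_{k=1}^{L} l_k (l_{k-1} + 1)
    return sum(W.size + B.size for (W, B) in phi)

def realization(phi, x):
    # R(Phi): affine maps with the multidimensional rectifier applied
    # after every layer except the last one.
    *hidden, (W_last, B_last) = phi
    for (W, B) in hidden:
        x = np.maximum(W @ x + B, 0.0)
    return W_last @ x + B_last

# Example: a DNN with layer dimensions (l_0, l_1, l_2) = (3, 5, 1).
rng = np.random.default_rng(0)
phi = ((rng.standard_normal((5, 3)), rng.standard_normal(5)),
       (rng.standard_normal((1, 5)), rng.standard_normal(1)))
print(num_params(phi))                 # 5 * (3 + 1) + 1 * (5 + 1) = 26
print(realization(phi, np.ones(3)))
```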
The real number \( T > 0 \) in Theorem 1.1 describes the time horizon under consideration and the real numbers \( \kappa , {\mathfrak {e}}, {\mathfrak {d}}, \theta \in {\mathbb {R}}\) in Theorem 1.1 are constants used to formulate the assumptions in Theorem 1.1. The key assumption in Theorem 1.1 is the hypothesis that both the possibly nonlinear initial value functions \( \varphi _{ 0, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \( d \in {\mathbb {N}}\), and the possibly nonlinear drift coefficient functions \( \varphi _{ 1, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), of the PDEs in (1.7) can be approximated by means of DNNs without the curse of dimensionality (see (1.3)–(1.6) above for details).
Results related to Theorem 4.5 have been established in [24, Theorem 3.14], [36, Theorem 1.1], [34, Theorem 4.1], and [50, Corollary 2.2]. Theorem 3.14 in [24] proves a statement similar to (1.8) for a different class of PDEs than (1.7), that is, Theorem 3.14 in [24] deals with Black–Scholes PDEs with affine linear coefficient functions while in (1.7) the diffusion coefficient is constant and the drift coefficient may be nonlinear. Theorem 1.1 in [36] shows the existence of constants and exponents of \(d \in {\mathbb {N}}\) and \(\varepsilon >0\) such that (1.8) holds but does not provide any explicit form for these exponents. Theorem 4.1 in [34] studies a different class of PDEs than (1.7) (the diffusion coefficient is chosen so that the second order term is the Laplacian and the drift coefficient is chosen to be zero but there is a nonlinearity depending on the PDE solution in the PDE in Theorem 4.1 in [34]) and provides an explicit exponent for \(\varepsilon >0\) and the existence of constants and exponents of \(d \in {\mathbb {N}}\) such that (1.8) holds. Corollary 2.2 in [50] studies a more general class of Kolmogorov PDEs than (1.7) and shows the existence of constants and exponents of \(d \in {\mathbb {N}}\) and \(\varepsilon >0\) such that (1.8) holds. Theorem 4.5 above extends these results by providing explicit exponents for \(d \in {\mathbb {N}}\) and \(\varepsilon > 0\) in terms of the assumptions used such that (1.8) holds and, in addition, Theorem 4.5 can be considered as a special case of the general DNN approximation result in Theorem 2.3 with the functions to be approximated being the solutions of the PDEs in (1.7) at the final time \(T \in (0, \infty )\) and the approximation scheme being the Monte Carlo Euler scheme.
The remainder of this article is organized as follows. In Sect. 2 we present Theorem 2.3, which is the main result of this paper. The proof of Theorem 2.3 employs the elementary result in Lemma 2.2. Lemma 2.2 establishes suitable a priori bounds for random variables and follows from the well-known discrete Gronwall-type inequality in Lemma 2.1 below. In Sect. 3 we develop in Lemmas 3.29 and 3.30 a few elementary results on representation flexibilities of DNNs. The proofs of Lemmas 3.29 and 3.30 use results on a certain artificial neural network (ANN) calculus which we recall and extend in Sects. 3.1–3.7. In Sect. 4 in Theorem 4.5 we employ Lemmas 3.29 and 3.30 to establish the existence of DNNs which approximate solutions of suitable Kolmogorov PDEs without the curse of dimensionality. In our proof of Theorem 4.5 we also employ error estimates for the Monte Carlo Euler method which we present in Proposition 4.4 in Sect. 4. The proof of Proposition 4.4, in turn, makes use of the elementary error estimate results in Lemmas 4.1–4.3 below.
2 Deep artificial neural network (DNN) approximations
In this section we show in Theorem 2.3 below that, roughly speaking, if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality.
In our proof of Theorem 2.3 we employ the elementary a priori estimates for expectations of certain random variables in Lemma 2.2 below. Lemma 2.2, in turn, follows from the well-known discrete Gronwall-type inequality in Lemma 2.1 below. For completeness we include in this section a detailed proof of Lemma 2.1.
2.1 A priori bounds for random variables
Lemma 2.1
Let \(\alpha \in [0, \infty )\), \( \beta \in [0, \infty ]\) and let \( x :{\mathbb {N}}_0 \rightarrow {\mathbb {R}}\) satisfy for all \(n \in {\mathbb {N}}\) that \(x_n \le \alpha x_{n-1} + \beta \). Then it holds for all \(n \in {\mathbb {N}}\) that
$$\begin{aligned} x_n \le \alpha ^n x_0 + \beta \left( 1 + \alpha + \ldots + \alpha ^{n-1} \right) . \end{aligned}$$(2.1)
Proof of Lemma 2.1
We prove (2.1) by induction on \(n \in {\mathbb {N}}\). For the base case \(n=1\) note that the hypothesis that \(\forall \, k \in {\mathbb {N}}:x_k \le \alpha x_{k-1} + \beta \) ensures that
This establishes (2.1) in the base case \(n=1\). For the induction step \({\mathbb {N}}\ni (n-1) \rightarrow n \in {\mathbb {N}}\cap [2, \infty )\) observe that the hypothesis that \(\forall \, k \in {\mathbb {N}}:x_k \le \alpha x_{k-1} + \beta \) implies that for all \(n \in {\mathbb {N}}\cap [2, \infty )\) with \(x_{n-1} \le \alpha ^{n-1} x_0 + \beta (1 + \alpha + \ldots + \alpha ^{n-2})\) it holds that
Induction thus establishes (2.1). This completes the proof of Lemma 2.1. \(\square \)
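The bound in Lemma 2.1 can also be checked numerically; the following snippet (an illustration only, with arbitrarily chosen values for \(\alpha \), \(\beta \), and \(x_0\)) verifies that the worst-case recursion \(x_n = \alpha x_{n-1} + \beta \) attains the right-hand side of (2.1) with equality.

```python
alpha, beta, x0 = 1.1, 0.5, 2.0
x = x0
for n in range(1, 20):
    x = alpha * x + beta  # equality case of x_n <= alpha x_{n-1} + beta
    bound = alpha**n * x0 + beta * sum(alpha**k for k in range(n))
    assert abs(x - bound) < 1e-9
```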
Lemma 2.2
Let \(N \in {\mathbb {N}}\), \(p \in [1, \infty )\), \(\alpha , \beta , \gamma \in [0, \infty )\) and let \(X_n :\Omega \rightarrow {\mathbb {R}}\), \(n \in \{0, 1, \ldots , N\}\), and \(Z_n :\Omega \rightarrow {\mathbb {R}}\), \(n \in \{0, 1, \ldots , N-1\}\), be random variables which satisfy for all \(n \in \{1, 2, \ldots , N\}\) that
Then it holds that
Proof of Lemma 2.2
First, note that (2.4) implies for all \(n \in \{1, 2, \ldots , N\}\) that
Lemma 2.1 (with \(\alpha = \alpha \), \(\beta = \beta \, [ \gamma + \sup \nolimits _{i \in \{0, 1, \ldots , N -1 \}} ( {\mathbb {E}}[ |Z_{i} |^p ] )^{\nicefrac {1}{p}} ]\) in the notation of Lemma 2.1) hence establishes for all \(n \in \{1, 2, \ldots , N\}\) that
The proof of Lemma 2.2 is thus completed. \(\square \)
2.2 A DNN approximation result for Monte Carlo algorithms
Theorem 2.3
Let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \( {\mathfrak {n}}_0 \in (0, \infty )\), \({\mathfrak {n}}_1, {\mathfrak {n}}_2, {\mathfrak {e}}, {\mathfrak {d}}_0, {\mathfrak {d}}_1, \ldots , {\mathfrak {d}}_6 \in [0, \infty )\), \({\mathfrak {C}}, p, \theta \in [1, \infty )\), \((M_{N})_{N \in {\mathbb {N}}} \subseteq {\mathbb {N}}\), let \(Z^{N, d, m}_n :\Omega \rightarrow {\mathbb {R}}^{d} \), \(n \in \{0, 1, \ldots , N-1\}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(d, N \in {\mathbb {N}}\), be random variables, let \(f_{N, d} \in C( {\mathbb {R}}^{d} \times {\mathbb {R}}^{d}, {\mathbb {R}}^{d})\), \(d, N \in {\mathbb {N}}\), and \(Y^{N, d, x}_n = (Y^{N, d, m, x}_n)_{m \in \{1, 2, \ldots , M_{N}\}} :\Omega \rightarrow {\mathbb {R}}^{M_N d}\), \(n \in \{0, 1, \ldots , N\}\), \(x \in {\mathbb {R}}^d\), \(d, N \in {\mathbb {N}}\), satisfy for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(x \in {\mathbb {R}}^d\), \(n \in \{1, 2, \ldots , N\}\), \(\omega \in \Omega \) that \(Y^{N, d, m, x}_{0}(\omega ) = x\) and
let \(\left\| \cdot \right\| \!:(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) that \(\Vert x\Vert = ( \textstyle \sum _{i=1}^d |x_i|^2)^{\nicefrac {1}{2}}\), for every \(d \in {\mathbb {N}}\) let \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\) be a probability measure on \({\mathbb {R}}^d\), let \(g_{N, d} \in C( {\mathbb {R}}^{Nd}, {\mathbb {R}})\), \( d, N \in {\mathbb {N}}\), and \(u_d \in C({\mathbb {R}}^d, {\mathbb {R}})\), \(d \in {\mathbb {N}}\), satisfy for all \( N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(n \in \{0, 1, \ldots , N-1\}\) that
let \({\mathbf {N}}\) be a set, let \( {\mathcal {P}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {D}} :{\mathbf {N}}\rightarrow ( \cup _{L \in {\mathbb {N}}} {\mathbb {N}}^{L})\), and \( {\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{ k, l \in {\mathbb {N}}} C( {\mathbb {R}}^k, {\mathbb {R}}^l )) \) be functions, let \({\mathfrak {N}}_{d, \varepsilon } \subseteq {\mathbf {N}}\), \( \varepsilon \in (0, 1]\), \(d \in {\mathbb {N}}\), let \(({\mathbf {f}}^{N, d}_{\varepsilon , z})_{(N, d, \varepsilon , z) \in {\mathbb {N}}^2 \times (0, 1] \times {\mathbb {R}}^d } \subseteq {\mathbf {N}}\), \(({\mathbf {g}}^{N, d}_{\varepsilon })_{(N, d, \varepsilon ) \in {\mathbb {N}}^2 \times (0, 1] } \subseteq {\mathbf {N}}\), \(({\mathfrak {I}}_{d})_{d \in {\mathbb {N}}} \subseteq {\mathbf {N}}\), assume for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x, y, z \in {\mathbb {R}}^d\) that \( {\mathfrak {N}}_{d, \varepsilon } \subseteq \{\Phi \in {\mathbf {N}}:{\mathcal {R}}( \Phi ) \in C({\mathbb {R}}^d, {\mathbb {R}}^d) \}\), \({\mathfrak {I}}_d \in {\mathfrak {N}}_{d, \varepsilon }\), \(({\mathcal {R}}({\mathfrak {I}}_d))(x) = x\), \({\mathcal {P}}({\mathfrak {I}}_d) \le {\mathfrak {C}}d^{{\mathfrak {d}}_3} \), \( {\mathcal {R}}( {\mathbf {f}}^{N, d}_{\varepsilon , z}) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), \(({\mathbb {R}}^d \ni {\mathfrak {z}} \mapsto ( {\mathcal {R}}({\mathbf {f}}^{N, d}_{\varepsilon , {\mathfrak {z}}}))(x) \in {\mathbb {R}}^d)\) is \({\mathcal {B}}({\mathbb {R}}^d) /{\mathcal {B}}({\mathbb {R}}^d)\)-measurable, and
assume for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) that there exist \((\phi _z)_{z \in {\mathbb {R}}^d} \subseteq {\mathfrak {N}}_{d, \varepsilon }\) such that for all \(x, z, {\mathfrak {z}} \in {\mathbb {R}}^d\) it holds that \( ({\mathcal {R}}(\phi _z)) (x) = ( {\mathcal {R}}({\mathbf {f}}^{N, d}_{\varepsilon , z}))(({\mathcal {R}}(\Phi ))(x)) \), \({\mathcal {P}}(\phi _z) \le {\mathcal {P}}(\Phi ) + {\mathfrak {C}}N^{{\mathfrak {n}}_1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}}\), and \( {\mathcal {D}} (\phi _z) = {\mathcal {D}} (\phi _{{\mathfrak {z}}})\), assume for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\), \(y = (y_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) that \( {\mathcal {R}}({\mathbf {g}}^{N, d}_{\varepsilon }) \in C({\mathbb {R}}^{Nd}, {\mathbb {R}})\) and
and assume for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi _1, \Phi _2, \ldots , \Phi _{M_N} \in {\mathfrak {N}}_{d, \varepsilon }\) with \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \cdots = {\mathcal {D}}(\Phi _{M_N})\) that there exists \(\varphi \in {\mathbf {N}}\) such that for all \(x \in {\mathbb {R}}^d\) it holds that \( {\mathcal {R}}(\varphi ) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \(( {\mathcal {R}}(\varphi ))(x) = ( {\mathcal {R}}({\mathbf {g}}^{ M_N, d }_{ \varepsilon }) )( ({\mathcal {R}}(\Phi _1))(x), ({\mathcal {R}}(\Phi _2))(x),\) \(\ldots , ({\mathcal {R}}(\Phi _{M_N}))(x))\), and \({\mathcal {P}}(\varphi ) \le {\mathfrak {C}}N^{{\mathfrak {n}}_2} ( N^{{\mathfrak {n}}_1 +1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}} + {\mathcal {P}}(\Phi _1))\). Then there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {R}}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), \([ \int _{ {\mathbb {R}}^d } | u_d(x) - ( {\mathcal {R}}(\Psi _{ d, \varepsilon }) )( x ) |^p \, \nu _d(dx) ]^{ \nicefrac { 1 }{ p } } \le \varepsilon \), and
Theorem 2.3, roughly speaking, shows that if a function can be approximated by means of some suitable discrete (Monte Carlo) approximation scheme without the curse of dimensionality (cf. (2.9) above) and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality.
The proof of Theorem 2.3 is given below. In the following we provide some comments on the mathematical objects appearing in Theorem 2.3 above.
The triple \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) denotes the probability space on which we consider the discrete (Monte Carlo) approximation scheme. For every \(N, d \in {\mathbb {N}}\) the random variables \(Z^{N, d, m}_n :\Omega \rightarrow {\mathbb {R}}^{d} \), \(n \in \{0, 1, \ldots , N-1\}\), \(m \in \{1, 2, \ldots , M_{N}\}\), and the Lipschitz continuous function \(f_{N, d} \in C( {\mathbb {R}}^{d} \times {\mathbb {R}}^{d}, {\mathbb {R}}^{d})\) (cf. (2.13) above) are employed in the iterative construction of the discrete approximations \(Y^{N, d, x}_n = (Y^{N, d, m, x}_n)_{m \in \{1, 2, \ldots , M_{N}\}} :\Omega \rightarrow {\mathbb {R}}^{M_N d}\), \(n \in \{0, 1, \ldots , N\}\), \(x \in {\mathbb {R}}^d\) (cf. (2.8) above). We assume that these approximations composed with the functions \(g_{N, d} \in C( {\mathbb {R}}^{Nd}, {\mathbb {R}})\), \( d, N \in {\mathbb {N}}\), approximate the functions \(u_d \in C({\mathbb {R}}^d, {\mathbb {R}})\), \(d \in {\mathbb {N}}\), without the curse of dimensionality in the strong \(L^p\)-sense with respect to the probability measures \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\), \(d \in {\mathbb {N}}\) (cf. (2.9) above). We assume suitable moment bounds for the random variables \(Z^{N, d, m}_n :\Omega \rightarrow {\mathbb {R}}^{d} \), \(n \in \{0, 1, \ldots , N-1\}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(d, N \in {\mathbb {N}}\), and the probability measures \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\), \(d \in {\mathbb {N}}\) (cf. (2.10) above).
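For orientation we sketch how such a scheme looks in the application of Sect. 4, where the discrete approximations are Monte Carlo Euler approximations of suitable Kolmogorov PDEs. The drift function `phi1`, the initial value function `phi0`, the unit diffusion, and the Gaussian increments in the following Python snippet are illustrative placeholders for the objects \(\varphi _{1, d}\), \(\varphi _{0, d}\), and \(Z^{N, d, m}_n\) and are not taken from the formal development.

```python
import numpy as np

def monte_carlo_euler(phi0, phi1, x, T, N, M, rng):
    """Illustrative Monte Carlo Euler approximation of u_d(T, x) for a
    Kolmogorov PDE with drift phi1, unit diffusion, and initial value
    function phi0 (a sketch of the scheme behind Theorem 4.5)."""
    d, h = x.size, T / N
    Y = np.tile(x, (M, 1))                            # Y_0^{m, x} = x
    for _ in range(N):
        Z = np.sqrt(h) * rng.standard_normal((M, d))  # Brownian increments
        Y = Y + h * phi1(Y) + Z                       # one Euler step per sample
    return np.mean(phi0(Y))                           # Monte Carlo average

rng = np.random.default_rng(0)
# Toy choices: phi0(y) = ||y||^2 and phi1(y) = -y (Ornstein-Uhlenbeck drift).
print(monte_carlo_euler(lambda y: np.sum(y**2, axis=1), lambda y: -y,
                        np.ones(10), T=1.0, N=50, M=10_000, rng=rng))
```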
We think of \({\mathbf {N}}\) in Theorem 2.3 above as a set of DNNs (see also Definition 3.1 below) and for every \(\Phi \in {\mathbf {N}}\) we think of \({\mathcal {P}}(\Phi ) \in {\mathbb {N}}\) as the number of parameters which are used to describe \(\Phi \) (see also Definition 3.1 below). For every \(\Phi \in {\mathbf {N}}\) we think of \({\mathcal {D}}(\Phi ) \in (\cup _{L\in {\mathbb {N}}} {\mathbb {N}}^{L})\) as the vector consisting of the dimensions of all layers of \(\Phi \) and we think of \({\mathcal {R}}(\Phi )\) as the realization function associated to \(\Phi \) (see also Definition 3.3 below).
For every \(d \in {\mathbb {N}}\), \( \varepsilon \in (0, 1]\) we think of \({\mathfrak {N}}_{d, \varepsilon } \subseteq {\mathbf {N}}\) as a set of DNNs with suitable regularity properties. For every \(N, d \in {\mathbb {N}}\), \(z \in {\mathbb {R}}^d\) we think of \(({\mathbf {f}}^{N, d}_{\varepsilon , z})_{\varepsilon \in (0, 1] } \subseteq {\mathbf {N}}\) as neural networks which approximate the function \({\mathbb {R}}^d \ni x \mapsto f_{N, d} (z, x) \in {\mathbb {R}}^{d}\) without the curse of dimensionality (cf. (2.11) above) and which satisfy a suitable linear growth condition (cf. (2.12) above). For every \(N, d \in {\mathbb {N}}\) we think of \(({\mathbf {g}}^{N, d}_{\varepsilon })_{\varepsilon \in (0, 1] } \subseteq {\mathbf {N}}\) as neural networks which approximate the function \(g_{N, d} :{\mathbb {R}}^{Nd}\rightarrow {\mathbb {R}}\) without the curse of dimensionality (cf. (2.14) above) and which satisfy a suitable Lipschitz-type condition (cf. (2.15) above). For every \(d \in {\mathbb {N}}\) we think of \({\mathfrak {I}}_d \in {\mathbf {N}}\) as a neural network representing the identity function on \({\mathbb {R}}^d\) in the sense that for all \(x \in {\mathbb {R}}^d\) it holds that \(({\mathcal {R}}({\mathfrak {I}}_d))(x) = x\) (see also Definition 3.15 below).
Proof of Theorem 2.3
Throughout this proof let \(\gamma = 46 e^{{\mathfrak {C}}} {\mathfrak {C}}^2 ( 4 e^{{\mathfrak {C}}+1} {\mathfrak {C}}^3 )^{ 2\theta }\), let \(\delta = \max \{{\mathfrak {d}}_5 + \theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2), {\mathfrak {d}}_4 + {\mathfrak {d}}_6 + 2\theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2)\}\), let \(X^{N, d, x, \varepsilon }_n = (X^{N, d, m, x, \varepsilon }_n)_{m \in \{1, 2, \ldots , M_{N}\}} :\Omega \rightarrow {\mathbb {R}}^{M_N d}\), \(n \in \{0, 1, \ldots , N\}\), \(\varepsilon \in (0, 1]\), \(x \in {\mathbb {R}}^d\), \(d, N \in {\mathbb {N}}\), be the random variables which satisfy for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(n \in \{1, 2, \ldots , N\}\), \(\omega \in \Omega \) that \(X^{N, d, m, x, \varepsilon }_{0}(\omega ) = x\) and
and let \(({\mathcal {N}}_{d, \varepsilon })_{(d, \varepsilon ) \in {\mathbb {N}}\times (0, 1]} \subseteq {\mathbb {N}}\) and \(({\mathcal {E}}_{d, \varepsilon })_{(d, \varepsilon ) \in {\mathbb {N}}\times (0, 1]} \subseteq (0, 1]\) satisfy for all \(\varepsilon \in (0, 1]\), \(d \in {\mathbb {N}}\) that
Note that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(n \in \{0, 1, 2, \ldots , N\}\) it holds that
This implies that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
Next observe that (2.14) ensures for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
In addition, note that (2.11) and (2.12) assure that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x, z \in {\mathbb {R}}^d\) it holds that
This proves that for all \(N, d \in {\mathbb {N}}\), \(x, z \in {\mathbb {R}}^d\) it holds that
Hence, we obtain that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \( n \in \{1, 2, \ldots , N\}\) it holds that
Moreover, note that (2.12) assures that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \( n \in \{1, 2, \ldots , N\}\) it holds that
Lemma 2.2 (with \(N = n\), \(p = 2p\theta \), \(\alpha = 1 + \frac{{\mathfrak {C}}}{N}\), \(\beta = {\mathfrak {C}}d^{{\mathfrak {d}}_2}\), \(\gamma = d^{{\mathfrak {d}}_1}\), \(Z_i = \Vert Z^{N, d, m}_{i} \Vert \) for \(N, d \in {\mathbb {N}}\), \( n \in \{1, 2, \ldots , N\}\), \(m \in \{1, 2, \ldots , M_N\}\), \(i \in \{0, 1, \ldots , n-1\}\) in the notation of Lemma 2.2), (2.24), and (2.10) therefore demonstrate that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \( n \in \{1, 2, \ldots , N\}\) it holds that
This and the fact that \(\forall \, a, b \in {\mathbb {R}}:|a+b|^{\theta } \le 2^{\theta -1}(|a|^{\theta }+|b|^{\theta })\) prove for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \( n \in \{1, 2, \ldots , N\}\) that
This and (2.10) establish that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(\varepsilon \in (0, 1]\) it holds that
Hence, we obtain that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\) it holds that
Combining this and (2.21) demonstrates that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
In addition, observe that (2.15) ensures that for all \(N, d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\) it holds that
This ensures for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
Hölder’s inequality hence assures for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
Moreover, note that (2.28) implies that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(\varepsilon \in (0, 1]\) it holds that
Next observe that (2.13) and (2.11) prove that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(n \in \{1, 2, \ldots , N\}\) it holds that
Lemma 2.2 (with \(N = N\), \(p = 2p\), \(\alpha = {\mathfrak {C}}\), \(\beta = \varepsilon {\mathfrak {C}}d^{{\mathfrak {d}}_4}\), \(\gamma = d^{\theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2)}\), \(Z_n = \Vert X^{N, d, m, x, \varepsilon }_{n} \Vert ^{\theta } \), \(X_n = \Vert Y^{N, d, m, x}_n - X^{N, d, m, x, \varepsilon }_n \Vert \) for \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(n \in \{1, 2, \ldots , N\}\) in the notation of Lemma 2.2) and (2.27) hence ensure for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\) that
This and (2.10) demonstrate that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(\varepsilon \in (0, 1]\) it holds that
Combining this with (2.33) and (2.34) establishes that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
This, (2.9), (2.20), and (2.30) prove for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
Combining this and (2.18) assures that for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
This and, e.g., [36, Lemma 2.1] establish that there exists \({\mathfrak {w}} = ({\mathfrak {w}}_{d, \varepsilon })_{(d, \varepsilon ) \in {\mathbb {N}}\times (0, 1]} :{\mathbb {N}}\times (0, 1] \rightarrow \Omega \) which satisfies for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
Next note that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(\omega \in \Omega \) it holds that \(X^{N, d, m, x, \varepsilon }_{0}(\omega ) = ({\mathcal {R}}( {\mathfrak {I}}_d))(x)\). The assumption that for all \( d\in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that \({\mathfrak {I}}_d \in {\mathfrak {N}}_{d, \varepsilon }\) and (2.17) hence ensure that there exist \((\Phi ^{N, d, m, \varepsilon , \omega }_{n})_{m \in \{1, 2, \ldots , M_{N}\}} \subseteq {\mathfrak {N}}_{d, \varepsilon }\), \(\omega \in \Omega \), \(n \in \{0, 1, 2, \ldots , N\}\), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), which satisfy for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(n \in \{0, 1, 2, \ldots , N\}\), \(\omega \in \Omega \) , \(m \in \{1, 2, \ldots , M_{N}\}\), \(x \in {\mathbb {R}}^d\) that \({\mathcal {P}}(\Phi ^{N, d, m, \varepsilon , \omega }_{n}) \le {\mathcal {P}}({\mathfrak {I}}_d)+ n {\mathfrak {C}}N^{{\mathfrak {n}}_1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}}\), \({\mathcal {D}}(\Phi ^{N, d, m, \varepsilon , \omega }_{n}) = {\mathcal {D}}(\Phi ^{N, d, 1, \varepsilon , \omega }_{n})\), and
The assumption that for all \(d \in {\mathbb {N}}\) it holds that \({\mathcal {P}}({\mathfrak {I}}_d) \le {\mathfrak {C}}d^{{\mathfrak {d}}_3} \) therefore implies for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(\varepsilon \in (0, 1]\), \(\omega \in \Omega \) that
Therefore, we obtain that there exist \(\Psi ^{N, d, \varepsilon , \omega } \in {\mathbf {N}}\), \(\omega \in \Omega \), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), which satisfy for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\omega \in \Omega \), \(x \in {\mathbb {R}}^d\) that \({\mathcal {R}}(\Psi ^{N, d, \varepsilon , \omega }) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \({\mathcal {P}}(\Psi ^{N, d, \varepsilon , \omega }) \le {\mathfrak {C}}N^{{\mathfrak {n}}_2} (N^{{\mathfrak {n}}_1+1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}} + {\mathfrak {C}}d^{{\mathfrak {d}}_3}+ {\mathfrak {C}}N^{{\mathfrak {n}}_1+1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}} ) \), and
Hence, we obtain that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\omega \in \Omega \) it holds that
This and (2.18) demonstrate that for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
Combining this, (2.41), and (2.44) establishes (2.16). The proof of Theorem 2.3 is thus completed. \(\square \)
3 Artificial neural network (ANN) calculus
In this section we establish in Lemma 3.29 and Lemma 3.30 below a few elementary results on representation flexibilities of ANNs. In our proofs of Lemma 3.29 and Lemma 3.30 we use results from a certain ANN calculus which we recall from the scientific literature and extend in Sects. 3.1–3.7 below.
In particular, Definition 3.1 below is [25, Definition 2.1], Definition 3.2 below is [25, Definition 2.2], Definition 3.3 below is [25, Definition 2.3], Definition 3.4 below is [25, Definition 2.5], and Definition 3.5 below is [25, Definition 2.17]. Moreover, all the results in Sect. 3.5 below are well-known and/or elementary and the proofs of these results are therefore omitted.
3.1 ANNs
Definition 3.1
(ANNs) We denote by \({\mathbf {N}}\) the set given by
$$\begin{aligned} {\mathbf {N}}= {\textstyle \bigcup _{L \in {\mathbb {N}}} \bigcup _{ l_0, l_1, \ldots , l_L \in {\mathbb {N}}}} \big ( \times _{k = 1}^L ({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k}) \big ) \end{aligned}$$(3.1)
and we denote by \( {\mathcal {P}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {L}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {I}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {O}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {H}}:{\mathbf {N}}\rightarrow {\mathbb {N}}_0\), and \({\mathcal {D}}:{\mathbf {N}}\rightarrow ( \cup _{L\in {\mathbb {N}}}{\mathbb {N}}^{L})\) the functions which satisfy for all \( L\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L \in {\mathbb {N}}\), \( \Phi \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k}))\) that \({\mathcal {P}}(\Phi ) = \sum _{k = 1}^L l_k(l_{k-1} + 1) \), \({\mathcal {L}}(\Phi )=L\), \({\mathcal {I}}(\Phi )=l_0\), \({\mathcal {O}}(\Phi )=l_L\), \({\mathcal {H}}(\Phi )=L-1\), and \({\mathcal {D}}(\Phi )= (l_0,l_1,\ldots , l_L)\).
3.2 Realizations of ANNs
Definition 3.2
(Multidimensional versions) Let \(d \in {\mathbb {N}}\) and let \(\psi :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a function. Then we denote by \({\mathfrak {M}}_{\psi , d} :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) the function which satisfies for all \( x = ( x_1, \dots , x_{d} ) \in {\mathbb {R}}^{d} \) that
$$\begin{aligned} {\mathfrak {M}}_{\psi , d}( x ) = ( \psi (x_1), \psi (x_2), \ldots , \psi (x_{d}) ) \end{aligned}$$(3.2)
Definition 3.3
(Realizations associated to ANNs) Let \(a\in C({\mathbb {R}},{\mathbb {R}})\). Then we denote by \( {\mathcal {R}}_{a}:{\mathbf {N}}\rightarrow ( \cup _{k,l\in {\mathbb {N}}}C({\mathbb {R}}^k,{\mathbb {R}}^l)) \) the function which satisfies for all \( L\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L \in {\mathbb {N}}\), \( \Phi = ((W_1, B_1),(W_2, B_2),\ldots , (W_L,B_L)) \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k})) \), \(x_0 \in {\mathbb {R}}^{l_0}, x_1 \in {\mathbb {R}}^{l_1}, \ldots , x_{L-1} \in {\mathbb {R}}^{l_{L-1}}\) with \(\forall \, k \in {\mathbb {N}}\cap (0,L) :x_k ={\mathfrak {M}}_{a,l_k}(W_k x_{k-1} + B_k)\) that
$$\begin{aligned} {\mathcal {R}}_{a}(\Phi ) \in C({\mathbb {R}}^{l_0},{\mathbb {R}}^{l_L}) \qquad \text {and} \qquad ({\mathcal {R}}_{a}(\Phi ))(x_0) = W_L x_{L-1} + B_L \end{aligned}$$(3.3)
(cf. Definitions 3.1 and 3.2).
3.3 Compositions of ANNs
Definition 3.4
(Compositions of ANNs) We denote by \({(\cdot ) \bullet (\cdot )}:\{(\Phi _1,\Phi _2)\in {\mathbf {N}}\times {\mathbf {N}}:{\mathcal {I}}(\Phi _1)={\mathcal {O}}(\Phi _2)\}\rightarrow {\mathbf {N}}\) the function which satisfies for all \( L,{\mathscr {L}}\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L,{\mathfrak {l}}_0,{\mathfrak {l}}_1,\ldots , {\mathfrak {l}}_{\mathscr {L}} \in {\mathbb {N}}\), \( \Phi _1 = ((W_1, B_1),(W_2, B_2),\ldots , (W_L,B_L)) \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k})) \), \( \Phi _2 = (({\mathscr {W}}_1, {\mathscr {B}}_1),({\mathscr {W}}_2, {\mathscr {B}}_2),\ldots , ({\mathscr {W}}_{\mathscr {L}},{\mathscr {B}}_{\mathscr {L}})) \in ( \times _{k = 1}^{\mathscr {L}}({\mathbb {R}}^{{\mathfrak {l}}_k \times {\mathfrak {l}}_{k-1}} \times {\mathbb {R}}^{{\mathfrak {l}}_k})) \) with \(l_0={\mathcal {I}}(\Phi _1)={\mathcal {O}}(\Phi _2)={\mathfrak {l}}_{{\mathscr {L}}}\) that
$$\begin{aligned} \Phi _1 \bullet \Phi _2 = \big ( ({\mathscr {W}}_1, {\mathscr {B}}_1), ({\mathscr {W}}_2, {\mathscr {B}}_2), \ldots , ({\mathscr {W}}_{{\mathscr {L}}-1}, {\mathscr {B}}_{{\mathscr {L}}-1}), (W_1 {\mathscr {W}}_{{\mathscr {L}}}, W_1 {\mathscr {B}}_{{\mathscr {L}}} + B_1), (W_2, B_2), \ldots , (W_L, B_L) \big ) \end{aligned}$$(3.4)
(cf. Definition 3.1).
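In the tuple format of Definition 3.1, this composition can be realized by merging the output layer of \(\Phi _2\) with the input layer of \(\Phi _1\). The following Python sketch is an illustration of this construction only, not the formal definition.

```python
import numpy as np

def compose(phi1, phi2):
    # Phi_1 . Phi_2: merge the last affine layer of Phi_2 with the first
    # affine layer of Phi_1, so that the realizations satisfy
    # R(Phi_1 . Phi_2) = R(Phi_1) o R(Phi_2) and the number of layers is
    # L(Phi_1) + L(Phi_2) - 1.
    (W1, B1), (WL, BL) = phi1[0], phi2[-1]
    return phi2[:-1] + ((W1 @ WL, W1 @ BL + B1),) + phi1[1:]

# Quick check with two depth-one (affine) networks:
f = ((np.array([[2.0]]), np.array([1.0])),)  # x -> 2x + 1
g = ((np.array([[3.0]]), np.array([0.0])),)  # x -> 3x
W, B = compose(f, g)[0]
assert W[0, 0] == 6.0 and B[0] == 1.0        # x -> 2(3x) + 1 = 6x + 1
```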
3.4 Parallelizations of ANNs with the same length
Definition 3.5
(Parallelizations of ANNs with the same length) Let \(n\in {\mathbb {N}}\). Then we denote by
$$\begin{aligned} {\mathbf {P}}_{n} :\{ (\Phi _1, \Phi _2, \dots , \Phi _n) \in {\mathbf {N}}^n :{\mathcal {L}}(\Phi _1) = {\mathcal {L}}(\Phi _2) = \ldots = {\mathcal {L}}(\Phi _n) \} \rightarrow {\mathbf {N}}\end{aligned}$$(3.5)
the function which satisfies for all \(L\in {\mathbb {N}}\), \(l_{1,0},l_{1,1},\dots , l_{1,L}, l_{2,0},l_{2,1},\dots , l_{2,L},\dots ,l_{n,0},l_{n,1},\dots , l_{n,L}\in {\mathbb {N}}\), \(\Phi _1=((W_{1,1}, B_{1,1}),(W_{1,2}, B_{1,2}),\ldots , (W_{1,L},B_{1,L}))\in ( \times _{k = 1}^L({\mathbb {R}}^{l_{1,k} \times l_{1,k-1}} \times {\mathbb {R}}^{l_{1,k}}))\), \(\Phi _2=((W_{2,1}, B_{2,1}),(W_{2,2}, B_{2,2}),\ldots , (W_{2,L},B_{2,L}))\in ( \times _{k = 1}^L({\mathbb {R}}^{l_{2,k} \times l_{2,k-1}} \times {\mathbb {R}}^{l_{2,k}}))\), ..., \(\Phi _n=((W_{n,1}, B_{n,1}),(W_{n,2}, B_{n,2}),\ldots , (W_{n,L},B_{n,L}))\in ( \times _{k = 1}^L({\mathbb {R}}^{l_{n,k} \times l_{n,k-1}} \times {\mathbb {R}}^{l_{n,k}}))\) that
$$\begin{aligned} {\mathbf {P}}_{n}(\Phi _1,\Phi _2,\dots ,\Phi _n) = \left( \left( \begin{pmatrix} W_{1,1} & 0 & \cdots & 0 \\ 0 & W_{2,1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_{n,1} \end{pmatrix} , \begin{pmatrix} B_{1,1} \\ B_{2,1} \\ \vdots \\ B_{n,1} \end{pmatrix} \right) , \ldots , \left( \begin{pmatrix} W_{1,L} & 0 & \cdots & 0 \\ 0 & W_{2,L} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_{n,L} \end{pmatrix} , \begin{pmatrix} B_{1,L} \\ B_{2,L} \\ \vdots \\ B_{n,L} \end{pmatrix} \right) \right) \end{aligned}$$(3.6)
(cf. Definition 3.1).
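Concretely, parallelization stacks the weight matrices of each layer block-diagonally and concatenates the corresponding bias vectors, so that the realization maps \((x_1, \ldots , x_n)\) to \((({\mathcal {R}}_a(\Phi _1))(x_1), \ldots , ({\mathcal {R}}_a(\Phi _n))(x_n))\). A minimal Python sketch of this construction (an illustration only):

```python
import numpy as np

def block_diag(*mats):
    # Place the given matrices along the diagonal of a zero matrix.
    out = np.zeros((sum(m.shape[0] for m in mats),
                    sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out

def parallelize(*phis):
    # P_n(Phi_1, ..., Phi_n) for networks of the same length: block-diagonal
    # weights and concatenated biases, layer by layer.
    assert len({len(phi) for phi in phis}) == 1
    return tuple((block_diag(*(phi[k][0] for phi in phis)),
                  np.concatenate([phi[k][1] for phi in phis]))
                 for k in range(len(phis[0])))
```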
3.5 Linear transformations of ANNs
Definition 3.6
(Identity matrix) Let \(n\in {\mathbb {N}}\). Then we denote by \({\text {I}}_{n}\in {\mathbb {R}}^{n\times n}\) the identity matrix in \({\mathbb {R}}^{n\times n}\).
Definition 3.7
(ANNs with a vector input) Let \(n \in {\mathbb {N}}\), \(B \in {\mathbb {R}}^n\). Then we denote by \({\mathfrak {B}}_B \in ({\mathbb {R}}^{n \times n} \times {\mathbb {R}}^n)\) the pair given by \({\mathfrak {B}}_B = ({\text {I}}_n, B)\) (cf. Definition 3.6).
Lemma 3.8
Let \(n \in {\mathbb {N}}\), \(B \in {\mathbb {R}}^n\). Then
-
(i)
it holds that \({\mathfrak {B}}_B \in {\mathbf {N}}\),
-
(ii)
it holds that \({\mathcal {D}}({\mathfrak {B}}_{B}) = (n, n) \in {\mathbb {N}}^2\),
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\mathfrak {B}}_{B}) \in C({\mathbb {R}}^n, {\mathbb {R}}^n)\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^n\) that \(({\mathcal {R}}_{a}({\mathfrak {B}}_{B})) (x) = x + B\)
(cf. Definitions 3.1, 3.3, and 3.7).
Lemma 3.9
Let \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {O}}(\Phi )} \) that \({\mathcal {D}}({{\mathfrak {B}}_B \bullet \Phi }) = {\mathcal {D}}(\Phi )\),
-
(ii)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {O}}(\Phi )}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({{\mathfrak {B}}_B \bullet \Phi }) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \),
-
(iii)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {O}}(\Phi )} \), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({{\mathfrak {B}}_B \bullet \Phi }))(x) = ({\mathcal {R}}_{a}(\Phi ))(x) + B, \end{aligned}$$(3.7) -
(iv)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )} \) that \({\mathcal {D}}({\Phi \bullet {\mathfrak {B}}_B}) = {\mathcal {D}}(\Phi )\),
-
(v)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )} \), \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {B}}_B}) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \), and
-
(vi)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )} \), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {O}}(\Phi )}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {B}}_B}))(x) = ({\mathcal {R}}_{a}(\Phi ))(x+B) \end{aligned}$$(3.8)
(cf. Definitions 3.3, 3.4, and 3.7).
Definition 3.10
(ANNs with a matrix input) Let \(m, n \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{m \times n}\). Then we denote by \({\mathfrak {W}}_{W} \in ({\mathbb {R}}^{m \times n} \times {\mathbb {R}}^{m})\) the pair given by \({\mathfrak {W}}_{W} = (W, 0)\).
Lemma 3.11
Let \(m, n \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{m \times n}\). Then
-
(i)
it holds that \({\mathfrak {W}}_W \in {\mathbf {N}}\),
-
(ii)
it holds that \({\mathcal {D}}({\mathfrak {W}}_{W}) = (n, m) \in {\mathbb {N}}^2\),
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\mathfrak {W}}_{W}) \in C({\mathbb {R}}^n, {\mathbb {R}}^m)\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^n\) that \( ({\mathcal {R}}_{a}({\mathfrak {W}}_{W})) (x) = Wx \)
(cf. Definitions 3.1, 3.3, and 3.10).
Lemma 3.12
Let \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds for all \(m \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{m \times {\mathcal {O}}(\Phi )}\) that \({\mathcal {R}}_{a}({{\mathfrak {W}}_W \bullet \Phi }) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^m) \),
-
(ii)
it holds for all \(m \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{m \times {\mathcal {O}}(\Phi )}\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({{\mathfrak {W}}_W \bullet \Phi }))(x) = W \big (({\mathcal {R}}_{a}(\Phi ))(x)\big ), \end{aligned}$$(3.9) -
(iii)
it holds for all \(n \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{ {\mathcal {I}}(\Phi ) \times n}\) that \({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {W}}_W}) \in C({\mathbb {R}}^n, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \), and
-
(iv)
it holds for all \(n \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{ {\mathcal {I}}(\Phi ) \times n}\), \(x \in {\mathbb {R}}^n\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {W}}_W}))(x) = ({\mathcal {R}}_{a}(\Phi ))(Wx) \end{aligned}$$(3.10)
(cf. Definitions 3.3, 3.4, and 3.10).
Definition 3.13
(Scalar multiplications of ANNs) We denote by \(\left( \cdot \right) \circledast \left( \cdot \right) :{\mathbb {R}}\times {\mathbf {N}}\rightarrow {\mathbf {N}}\) the function which satisfies for all \(\lambda \in {\mathbb {R}}\), \(\Phi \in {\mathbf {N}}\) that
$$\begin{aligned} \lambda \circledast \Phi = {\mathfrak {W}}_{\lambda {\text {I}}_{{\mathcal {O}}(\Phi )}} \bullet \Phi \end{aligned}$$(3.11)
(cf. Definitions 3.1, 3.4, 3.6, and 3.10).
Lemma 3.14
Let \(\lambda \in {\mathbb {R}}\), \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {D}}(\lambda \circledast \Phi ) = {\mathcal {D}}(\Phi )\),
-
(ii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}(\lambda \circledast \Phi ) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )})\), and
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\lambda \circledast \Phi ))(x) = \lambda \big ( ({\mathcal {R}}_{a}(\Phi ))(x) \big ) \end{aligned}$$(3.12)
(cf. Definitions 3.3 and 3.13).
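In the tuple format, composing with the depth-one network \({\mathfrak {W}}_{\lambda {\text {I}}_{{\mathcal {O}}(\Phi )}}\) simply rescales the last affine layer of \(\Phi \), which also explains why \({\mathcal {D}}(\lambda \circledast \Phi ) = {\mathcal {D}}(\Phi )\). An illustrative Python sketch (not the formal definition):

```python
def scalar_mult(lam, phi):
    # lambda (*) Phi = W_{lambda I} . Phi: composing with the depth-one
    # network (lambda I, 0) rescales the last affine layer of Phi and
    # leaves all layer dimensions unchanged.
    W_last, B_last = phi[-1]
    return phi[:-1] + ((lam * W_last, lam * B_last),)
```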
3.6 Representations of the identities with rectifier functions
Definition 3.15
We denote by \({\mathfrak {I}}= ({\mathfrak {I}}_d)_{d \in {\mathbb {N}}} :{\mathbb {N}}\rightarrow {\mathbf {N}}\) the function which satisfies for all \(d \in {\mathbb {N}}\) that
$$\begin{aligned} {\mathfrak {I}}_1 = \left( \left( \begin{pmatrix} 1 \\ -1 \end{pmatrix} , \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right) , \left( \begin{pmatrix} 1&-1 \end{pmatrix} , 0 \right) \right) \in \left( ({\mathbb {R}}^{2 \times 1} \times {\mathbb {R}}^{2}) \times ({\mathbb {R}}^{1 \times 2} \times {\mathbb {R}}^{1}) \right) \end{aligned}$$(3.13)
and
$$\begin{aligned} {\mathfrak {I}}_d = {\mathbf {P}}_d \big ( {\mathfrak {I}}_1, {\mathfrak {I}}_1, \ldots , {\mathfrak {I}}_1 \big ) \end{aligned}$$(3.14)
(cf. Definitions 3.1 and 3.5).
Lemma 3.16
Let \(d \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\) satisfy for all \(x \in {\mathbb {R}}\) that \(a(x) = \max \{x, 0\}\). Then
-
(i)
it holds that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d) \in {\mathbb {N}}^3\),
-
(ii)
it holds that \( {\mathcal {R}}_{a}({\mathfrak {I}}_d) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), and
-
(iii)
it holds for all \(x \in {\mathbb {R}}^d\) that \(({\mathcal {R}}_{a}({\mathfrak {I}}_d))(x) = x\)
(cf. Definitions 3.1, 3.3, and 3.15).
Proof of Lemma 3.16
Throughout this proof let \(L =2\), \(l_0 = 1\), \(l_1 = 2\), \(l_2 =1\). Note that (3.13) ensures that
This and, e.g., [25, Lemma 2.18] prove that
(cf. Definition 3.5). Hence, we obtain that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d) \in {\mathbb {N}}^3\). This establishes item (i). Next note that (3.13) assures that for all \(x \in {\mathbb {R}}\) it holds that
Combining this and, e.g., [25, Proposition 2.19] demonstrates that for all \( x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) it holds that \({\mathcal {R}}_{a}({\mathfrak {I}}_d) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\) and
This establishes items (ii)–(iii). The proof of Lemma 3.16 is thus completed. \(\square \)
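The construction behind Lemma 3.16 can also be checked directly in code: with the rectifier one has \(\max \{x, 0\} - \max \{-x, 0\} = x\) for all \(x \in {\mathbb {R}}\). The following snippet (an illustration only) verifies this for the one-dimensional network \({\mathfrak {I}}_1\).

```python
import numpy as np

# I_1 with D(I_1) = (1, 2, 1): hidden layer computes (max(x,0), max(-x,0)),
# output layer computes max(x,0) - max(-x,0) = x.
I1 = ((np.array([[1.0], [-1.0]]), np.zeros(2)),
      (np.array([[1.0, -1.0]]), np.zeros(1)))

def relu_realization(phi, x):
    *hidden, (W_last, B_last) = phi
    for (W, B) in hidden:
        x = np.maximum(W @ x + B, 0.0)
    return W_last @ x + B_last

for x in (-3.0, 0.0, 2.5):
    assert relu_realization(I1, np.array([x]))[0] == x
```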
3.7 Sums of ANNs with the same length
Definition 3.17
Let \(m, n \in {\mathbb {N}}\). Then we denote by \({\mathfrak {S}}_{m, n} \in ({\mathbb {R}}^{m \times (nm)} \times {\mathbb {R}}^m)\) the pair given by
$$\begin{aligned} {\mathfrak {S}}_{m, n} = {\mathfrak {W}}_{ ( {\text {I}}_m \;\, {\text {I}}_m \;\, \cdots \;\, {\text {I}}_m ) } \end{aligned}$$
(cf. Definitions 3.6 and 3.10).
Lemma 3.18
Let \(m, n \in {\mathbb {N}}\). Then
-
(i)
it holds that \({\mathfrak {S}}_{m, n} \in {\mathbf {N}}\),
-
(ii)
it holds that \({\mathcal {D}}({\mathfrak {S}}_{m, n}) = (nm, m) \in {\mathbb {N}}^2\),
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n}) \in C({\mathbb {R}}^{nm}, {\mathbb {R}}^m)\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{m}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n})) (x_1, x_2, \ldots , x_n) = \textstyle \sum _{k=1}^n x_k \end{aligned}$$(3.20)
(cf. Definitions 3.1, 3.3, and 3.17).
Proof of Lemma 3.18
Note that the fact that \({\mathfrak {S}}_{m, n} \in ({\mathbb {R}}^{m \times (nm)} \times {\mathbb {R}}^m)\) ensures that \({\mathfrak {S}}_{m, n} \in {\mathbf {N}}\) and \({\mathcal {D}}({\mathfrak {S}}_{m, n}) = (nm, m) \in {\mathbb {N}}^2\). This establishes items (i)–(ii). Next observe that items (iii)–(iv) in Lemma 3.11 prove that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{m}\) it holds that \({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n}) \in C({\mathbb {R}}^{nm}, {\mathbb {R}}^m)\) and
(cf. Definitions 3.6 and 3.10). This establishes items (iii)–(iv). The proof of Lemma 3.18 is thus completed. \(\square \)
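In other words, \({\mathfrak {S}}_{m, n}\) is the affine map whose weight matrix is the horizontal concatenation of \(n\) copies of \({\text {I}}_m\), and \({\mathfrak {T}}_{m, n}\) below uses its transpose. A short numerical check (an illustration only):

```python
import numpy as np

m, n = 3, 4
S = np.hstack([np.eye(m)] * n)        # the m x (nm) matrix (I_m I_m ... I_m)
xs = np.random.default_rng(0).standard_normal((n, m))
assert np.allclose(S @ xs.reshape(-1), xs.sum(axis=0))  # (x_1,...,x_n) -> sum_k x_k
assert np.allclose(S.T @ xs[0], np.tile(xs[0], n))      # transpose: x -> (x,...,x)
```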
Lemma 3.19
Let \(m, n \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in \{\Psi \in {\mathbf {N}}:{\mathcal {O}}(\Psi ) = nm\}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {R}}_{a}({{\mathfrak {S}}_{m, n} \bullet \Phi }) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^m) \) and
-
(ii)
it holds for all \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\), \(y_1, y_2, \ldots , y_n \in {\mathbb {R}}^{m}\) with \(({\mathcal {R}}_{a}(\Phi ))(x) = (y_1, y_2, \ldots , y_n)\) that
$$\begin{aligned} \big ( {\mathcal {R}}_{a}({{\mathfrak {S}}_{m, n} \bullet \Phi }) \big )(x) = \textstyle \sum _{k=1}^n y_k \end{aligned}$$(3.22)
(cf. Definitions 3.3, 3.4, and 3.17).
Proof of Lemma 3.19
Note that Lemma 3.18 ensures that for all \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{m}\) it holds that \({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n}) \in C({\mathbb {R}}^{nm}, {\mathbb {R}}^m)\) and
Combining this and, e.g., [25, item (v) in Proposition 2.6] establishes items (i)–(ii). The proof of Lemma 3.19 is thus completed. \(\square \)
Lemma 3.20
Let \(n \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {S}}_{{\mathcal {I}}(\Phi ), n}}) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi )}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \) and
-
(ii)
it holds for all \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} \big ({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {S}}_{{\mathcal {I}}(\Phi ), n}}) \big )(x_1, x_2, \ldots , x_n) = ({\mathcal {R}}_{a}(\Phi ))(\textstyle \sum _{k=1}^n x_k) \end{aligned}$$(3.24)
(cf. Definitions 3.3, 3.4, and 3.17).
Proof of Lemma 3.20
Note that Lemma 3.18 demonstrates that for all \(m \in {\mathbb {N}}\), \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{m}\) it holds that \({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n}) \in C({\mathbb {R}}^{nm}, {\mathbb {R}}^m)\) and
Combining this and, e.g., [25, item (v) in Proposition 2.6] establishes items (i)–(ii). The proof of Lemma 3.20 is thus completed. \(\square \)
Definition 3.21
Let \(m, n \in {\mathbb {N}}\), \(A \in {\mathbb {R}}^{m \times n}\). Then we denote by \(A^* \in {\mathbb {R}}^{n \times m}\) the transpose of A.
Definition 3.22
Let \(m, n \in {\mathbb {N}}\). Then we denote by \({\mathfrak {T}}_{m, n} \in ({\mathbb {R}}^{(nm) \times m} \times {\mathbb {R}}^{nm})\) the pair given by
$$\begin{aligned} {\mathfrak {T}}_{m, n} = {\mathfrak {W}}_{ ( {\text {I}}_m \;\, {\text {I}}_m \;\, \cdots \;\, {\text {I}}_m )^* } \end{aligned}$$(3.26)
(cf. Definitions 3.6, 3.10, and 3.21).
Lemma 3.23
Let \(m, n \in {\mathbb {N}}\). Then
-
(i)
it holds that \({\mathfrak {T}}_{m, n} \in {\mathbf {N}}\),
-
(ii)
it holds that \( {\mathcal {D}}({\mathfrak {T}}_{m, n}) = (m, nm) \in {\mathbb {N}}^2\),
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{nm})\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^m\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n})) (x) = (x, x, \ldots , x) \end{aligned}$$(3.27)
(cf. Definitions 3.1, 3.3, and 3.22).
Proof of Lemma 3.23
Note that the fact that \({\mathfrak {T}}_{m, n} \in ({\mathbb {R}}^{(nm) \times m} \times {\mathbb {R}}^{nm})\) ensures that \({\mathfrak {T}}_{m, n} \in {\mathbf {N}}\) and \({\mathcal {D}}({\mathfrak {T}}_{m, n}) = (m, nm) \in {\mathbb {N}}^2\). This establishes items (i)–(ii). Next observe that items (iii)–(iv) in Lemma 3.11 prove that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^m\) it holds that \({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{nm})\) and
(cf. Definitions 3.6 and 3.10). This establishes items (iii)–(iv). The proof of Lemma 3.23 is thus completed. \(\square \)
Lemma 3.24
Let \(n \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {R}}_{a}({{\mathfrak {T}}_{{\mathcal {O}}(\Phi ), n} \bullet \Phi }) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^{n {\mathcal {O}}(\Phi )}) \) and
-
(ii)
it holds for all \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} \big ( {\mathcal {R}}_{a}({{\mathfrak {T}}_{{\mathcal {O}}(\Phi ), n} \bullet \Phi }) \big )(x) = \big (({\mathcal {R}}_{a}(\Phi ))(x), ({\mathcal {R}}_{a}(\Phi ))(x), \ldots , ({\mathcal {R}}_{a}(\Phi ))(x) \big ) \end{aligned}$$(3.29)
(cf. Definitions 3.3, 3.4, and 3.22).
Proof of Lemma 3.24
Note that Lemma 3.23 ensures that for all \(m \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^m\) it holds that \({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{nm})\) and
Combining this and, e.g., [25, item (v) in Proposition 2.6] establishes items (i)–(ii). The proof of Lemma 3.24 is thus completed. \(\square \)
Lemma 3.25
Let \(m, n \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in \{\Psi \in {\mathbf {N}}:{\mathcal {I}}(\Psi ) = nm\}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {T}}_{m, n}}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \) and
-
(ii)
it holds for all \(x \in {\mathbb {R}}^{m}\) that
$$\begin{aligned} \big ({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {T}}_{m, n}}) \big )(x) = ({\mathcal {R}}_{a}(\Phi ))(x, x, \ldots , x) \end{aligned}$$(3.31)
(cf. Definitions 3.3, 3.4, and 3.22).
Proof of Lemma 3.25
Observe that Lemma 3.23 demonstrates that for all \(x \in {\mathbb {R}}^m\) it holds that \({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{nm})\) and
Combining this and, e.g., [25, item (v) in Proposition 2.6] establishes items (i)–(ii). The proof of Lemma 3.25 is thus completed. \(\square \)
Definition 3.26
(Sums of ANNs with the same length) Let \(n \in {\mathbb {N}}\), \(\Phi _1, \Phi _2, \ldots , \Phi _n \in {\mathbf {N}}\) satisfy for all \(k \in \{1, 2, \ldots , n\}\) that \({\mathcal {L}}(\Phi _k) = {\mathcal {L}}(\Phi _1)\), \({\mathcal {I}}(\Phi _k) = {\mathcal {I}}(\Phi _1)\), and \({\mathcal {O}}(\Phi _k) = {\mathcal {O}}(\Phi _1)\). Then we denote by \(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k\) (also denoted by \(\Phi _1 \oplus \Phi _2 \oplus \ldots \oplus \Phi _n\)) the tuple given by
$$\begin{aligned} \oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k = {\mathfrak {S}}_{{\mathcal {O}}(\Phi _1), n} \bullet \big [ {\mathbf {P}}_n (\Phi _1, \Phi _2, \ldots , \Phi _n) \big ] \bullet {\mathfrak {T}}_{{\mathcal {I}}(\Phi _1), n} \end{aligned}$$(3.33)
(cf. Definitions 3.1, 3.4, 3.5, 3.17, and 3.22).
Definition 3.27
(Dimensions of ANNs) Let \(n \in {\mathbb {N}}_0\). Then we denote by \({\mathbb {D}}_n :{\mathbf {N}}\rightarrow {\mathbb {N}}_0\) the function which satisfies for all \( L\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L \in {\mathbb {N}}\), \( \Phi \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k}))\) that
$$\begin{aligned} {\mathbb {D}}_n (\Phi ) = \begin{cases} l_n &:n \le L \\ 0 &:n > L \end{cases} \end{aligned}$$(3.34)
(cf. Definition 3.1).
Lemma 3.28
Let \(n \in {\mathbb {N}}\), \(\Phi _1, \Phi _2, \ldots , \Phi _n \in {\mathbf {N}}\) satisfy for all \(k \in \{1, 2, \ldots , n\}\) that \({\mathcal {L}}(\Phi _k) = {\mathcal {L}}(\Phi _1)\), \({\mathcal {I}}(\Phi _k) = {\mathcal {I}}(\Phi _1)\), and \({\mathcal {O}}(\Phi _k) = {\mathcal {O}}(\Phi _1)\) (cf. Definition 3.1). Then
-
(i)
it holds that \( {\mathcal {L}}(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k) = {\mathcal {L}}(\Phi _1)\),
-
(ii)
it holds that
$$\begin{aligned}&{\mathcal {D}}(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k) \nonumber \\&\quad = \big ({\mathcal {I}}(\Phi _1), \textstyle \sum _{k=1}^n {\mathbb {D}}_1(\Phi _k), \textstyle \sum _{k = 1}^n {\mathbb {D}}_2(\Phi _k), \ldots , \textstyle \sum _{k=1}^n {\mathbb {D}}_{{\mathcal {L}}(\Phi _1)-1}(\Phi _k), {\mathcal {O}}(\Phi _1)\big ), \end{aligned}$$(3.35) -
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}\) that
$$\begin{aligned} \big ({\mathcal {R}}_{a} (\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k ) \big ) (x) = \sum _{k=1}^n ({\mathcal {R}}_a(\Phi _k))(x) \end{aligned}$$(3.36)
(cf. Definitions 3.3, 3.26, and 3.27).
Proof of Lemma 3.28
First, note that, e.g., [25, Lemma 2.18] proves that
(cf. Definition 3.5). Moreover, observe that item (ii) in Lemma 3.18 ensures that
(cf. Definition 3.17). This, (3.37), and, e.g., [25, item (i) in Proposition 2.6] demonstrate that
Next note that item (ii) in Lemma 3.23 assures that
(cf. Definition 3.22). Combining this, (3.39), and, e.g., [25, item (i) in Proposition 2.6] proves that
This establishes items (i)–(ii). Next observe that Lemma 3.25 and (3.37) ensure that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}\) it holds that \({\mathcal {R}}_{a}({[{\mathbf {P}}_n(\Phi _1,\Phi _2,\dots , \Phi _n)] \bullet {\mathfrak {T}}_{{\mathcal {I}}(\Phi _1), n}}) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{n {\mathcal {O}}(\Phi _1)}) \) and
Combining this with, e.g., [25, item (ii) in Proposition 2.19] proves that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}\) it holds that
Lemma 3.19, (3.38), and, e.g., [25, Lemma 2.8] therefore demonstrate that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}\) it holds that \({\mathcal {R}}_{a}(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\) and
This establishes items (iii)–(iv). The proof of Lemma 3.28 is thus completed. \(\square \)
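The construction behind Definition 3.26 and Lemma 3.28 can be carried out explicitly with weight matrices: the summands read the same input (stacked first-layer weights), run in parallel (block-diagonal hidden layers, mirroring the parallelization \({\mathbf {P}}_n\)), and their outputs are added (horizontally concatenated last-layer weights and summed biases). The following sketch is our illustration; the helper names are ours, and depth one is treated as a separate, purely affine case.

```python
import numpy as np

rng = np.random.default_rng(0)

def realize(phi, x, a=lambda z: np.maximum(z, 0.0)):
    for W, b in phi[:-1]:
        x = a(W @ x + b)
    W, b = phi[-1]
    return W @ x + b

def block_diag(*Ms):
    out = np.zeros((sum(M.shape[0] for M in Ms), sum(M.shape[1] for M in Ms)))
    i = j = 0
    for M in Ms:
        out[i:i + M.shape[0], j:j + M.shape[1]] = M
        i, j = i + M.shape[0], j + M.shape[1]
    return out

def sum_nets(nets):
    """oplus_k Phi_k: one network realizing x -> sum_k R(Phi_k)(x) (Lemma 3.28)."""
    L = len(nets[0])
    if L == 1:
        return [(sum(n[0][0] for n in nets), sum(n[0][1] for n in nets))]
    out = []
    for k in range(L):
        Ws, bs = [n[k][0] for n in nets], [n[k][1] for n in nets]
        if k == 0:
            out.append((np.vstack(Ws), np.concatenate(bs)))    # shared input
        elif k == L - 1:
            out.append((np.hstack(Ws), sum(bs)))               # outputs added
        else:
            out.append((block_diag(*Ws), np.concatenate(bs)))  # run in parallel
    return out

nets = [[(rng.standard_normal((4, 3)), rng.standard_normal(4)),
         (rng.standard_normal((2, 4)), rng.standard_normal(2))] for _ in range(3)]
x = rng.standard_normal(3)
assert np.allclose(realize(sum_nets(nets), x),
                   sum(realize(net, x) for net in nets))
```

The hidden widths of `sum_nets(nets)` are the sums of the summands' hidden widths, exactly as in item (ii).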
3.8 ANN representation results
Lemma 3.29
Let \( n \in {\mathbb {N}}\), \(h_1, h_2, \ldots , h_n \in {\mathbb {R}}\), \( \Phi _1, \Phi _2, \ldots , \Phi _n \in {\mathbf {N}}\) satisfy that \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \ldots = {\mathcal {D}}(\Phi _n)\), let \(A_k \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1) \times (n {\mathcal {I}}(\Phi _1))}\), \(k \in \{1, 2, \ldots , n\}\), satisfy for all \(k \in \{1, 2, \ldots , n\}\), \(x = (x_i)_{i \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that \(A_k x = x_k\), and let \(\Psi \in {\mathbf {N}}\) satisfy that
$$\begin{aligned} \Psi = \oplus _{k \in \{1, 2, \ldots , n\}} \big ( h_k \circledast ( \Phi _k \bullet {\mathfrak {W}}_{A_k} ) \big ) \end{aligned}$$(3.45)
(cf. Definitions 3.1, 3.10, 3.13, and 3.26). Then
-
(i)
it holds that
$$\begin{aligned} {\mathcal {D}}(\Psi ) = (n {\mathcal {I}}(\Phi _1), n{\mathbb {D}}_1(\Phi _1), n{\mathbb {D}}_2(\Phi _1), \ldots , n{\mathbb {D}}_{{\mathcal {L}}(\Phi _1)-1}(\Phi _1), {\mathcal {O}}(\Phi _1)), \end{aligned}$$(3.46) -
(ii)
it holds that \({\mathcal {P}}(\Psi ) \le n^2 {\mathcal {P}}(\Phi _1)\),
-
(iii)
it holds for all \( a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\), and
-
(iv)
it holds for all \( a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x = (x_k)_{k \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\Psi ))(x) = \sum _{k=1}^n h_k ({\mathcal {R}}_{a}(\Phi _k))(x_k) \end{aligned}$$(3.47)
(cf. Definitions 3.3 and 3.27).
Proof of Lemma 3.29
First, note that item (ii) in Lemma 3.11 ensures for all \(k \in \{1, 2, \ldots , n\}\) that
This and, e.g., [25, item (i) in Proposition 2.6] prove for all \(k \in \{1, 2, \ldots , n\}\) that
Item (i) in Lemma 3.14 therefore demonstrates for all \(k \in \{1, 2, \ldots , n\}\) that
Combining this with item (ii) in Lemma 3.28 ensures that
This establishes item (i). Hence, we obtain that
This establishes item (ii). Moreover, observe that items (iii)–(iv) in Lemma 3.12 assure for all \(k \in \{1, 2, \ldots , n\}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x = (x_i)_{i \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that \({\mathcal {R}}_{a}( {\Phi _k \bullet {\mathfrak {W}}_{A_k}}) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _k)})\) and
Combining this with items (ii)–(iii) in Lemma 3.14 proves for all \(k \in \{1, 2, \ldots , n\}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x = (x_i)_{i \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that \({\mathcal {R}}_{a}( h_k \circledast ( {\Phi _k \bullet {\mathfrak {W}}_{A_k}})) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\) and
Items (iii)–(iv) in Lemma 3.28 and (3.50) hence ensure for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x = (x_i)_{i \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that \({\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\) and
This establishes items (iii)–(iv). The proof of Lemma 3.29 is thus completed. \(\square \)
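The network \(\Psi \) of Lemma 3.29 differs from the sum of Lemma 3.28 only in that each summand reads its own slice \(x_k\) of the input (so the first layer is block-diagonal as well) and in that the output weights are scaled by \(h_k\). A sketch of ours follows; the helpers `realize` and `block_diag` are repeated from the previous block so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

def realize(phi, x, a=lambda z: np.maximum(z, 0.0)):
    for W, b in phi[:-1]:
        x = a(W @ x + b)
    W, b = phi[-1]
    return W @ x + b

def block_diag(*Ms):
    out = np.zeros((sum(M.shape[0] for M in Ms), sum(M.shape[1] for M in Ms)))
    i = j = 0
    for M in Ms:
        out[i:i + M.shape[0], j:j + M.shape[1]] = M
        i, j = i + M.shape[0], j + M.shape[1]
    return out

def weighted_sum_net(nets, h):
    """Psi with R(Psi)(x_1,...,x_n) = sum_k h_k R(Phi_k)(x_k) (Lemma 3.29)."""
    L = len(nets[0])
    out = []
    for k in range(L):
        Ws, bs = [n[k][0] for n in nets], [n[k][1] for n in nets]
        if k == L - 1:   # scale by h_k, then add the outputs up
            out.append((np.hstack([hk * Wk for hk, Wk in zip(h, Ws)]),
                        sum(hk * bk for hk, bk in zip(h, bs))))
        else:            # each summand processes its own input slice
            out.append((block_diag(*Ws), np.concatenate(bs)))
    return out

n, d = 3, 2
nets = [[(rng.standard_normal((4, d)), rng.standard_normal(4)),
         (rng.standard_normal((1, 4)), rng.standard_normal(1))] for _ in range(n)]
h = [0.5, -1.0, 2.0]
x = rng.standard_normal(n * d)
assert np.allclose(realize(weighted_sum_net(nets, h), x),
                   sum(h[k] * realize(nets[k], x[k * d:(k + 1) * d])
                       for k in range(n)))
```

Counting the (mostly zero) entries of the block-diagonal matrices gives at most \(n^2 {\mathcal {P}}(\Phi _1)\) stored parameters, in line with item (ii).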
Lemma 3.30
Let \(a\in C({\mathbb {R}},{\mathbb {R}})\), \(L_1, L_2\in {\mathbb {N}}\), \({\mathbb {I}}, \Phi _1,\Phi _2\in {\mathbf {N}}\), \(d,{\mathfrak {i}}, l_{1,0},l_{1,1},\dots ,l_{1,L_1},l_{2,0}, l_{2,1},\dots ,l_{2,L_2}\in {\mathbb {N}}\) satisfy for all \(k\in \{1,2\}\), \(x\in {\mathbb {R}}^{d}\) that \(2\le {\mathfrak {i}}\le 2d\), \(l_{2,L_2-1}\le l_{1,L_1-1}+{\mathfrak {i}}\), \({\mathcal {D}}({\mathbb {I}}) = (d,{\mathfrak {i}},d)\), \(({\mathcal {R}}_{a}({\mathbb {I}}))(x)=x\), \({\mathcal {I}}(\Phi _k)={\mathcal {O}}(\Phi _k)=d\), and \({\mathcal {D}}(\Phi _k)=(l_{k,0},l_{k,1},\dots , l_{k,L_k})\) (cf. Definitions 3.1 and 3.3). Then there exists \(\Psi \in {\mathbf {N}}\) such that
-
(i)
it holds that \({\mathcal {R}}_{a}(\Psi )\in C({\mathbb {R}}^d,{\mathbb {R}}^d)\),
-
(ii)
it holds for all \(x\in {\mathbb {R}}^d\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\Psi ))(x)=({\mathcal {R}}_{a}(\Phi _2))(x)+\big (({\mathcal {R}}_{a}(\Phi _1))\circ ({\mathcal {R}}_{a}(\Phi _2))\big )(x), \end{aligned}$$(3.56) -
(iii)
it holds that
$$\begin{aligned} {\mathbb {D}}_{{\mathcal {L}}(\Psi ) -1} (\Psi ) \le l_{1, L_1 -1} + {\mathfrak {i}}, \end{aligned}$$(3.57) and
-
(iv)
it holds that \({\mathcal {P}}(\Psi ) \le {\mathcal {P}}(\Phi _2)+\big [\tfrac{1}{2}{\mathcal {P}}({\mathbb {I}})+{\mathcal {P}}(\Phi _1)\big ]^{\!2}\)
(cf. Definitions 3.4 and 3.27).
Proof of Lemma 3.30
To prove items (i)–(iv) we distinguish between the case \(L_1=1\) and the case \(L_1 \in {\mathbb {N}}\cap [2, \infty )\). We first prove items (i)–(iv) in the case \(L_1=1\). Note that, e.g., [25, Proposition 2.30] (with \(a=a\), \(d=d\), \({\mathfrak {L}} = L_2\), \((\ell _0, \ell _1, \ldots , \ell _{{\mathfrak {L}}}) = (l_{2,0},l_{2,1}, \ldots , l_{2,L_2})\), \(\psi = \Phi _2\), \(\phi _n = \Phi _1\) for \(n \in {\mathbb {N}}_0\) in the notation of [25, Proposition 2.30]) implies that there exists \(\Psi \in {\mathbf {N}}\) such that
-
(I)
it holds that \({\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\),
-
(II)
it holds for all \(x \in {\mathbb {R}}^d\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\Psi ))(x)=({\mathcal {R}}_{a}(\Phi _2))(x)+\big (({\mathcal {R}}_{a}(\Phi _1))\circ ({\mathcal {R}}_{a}(\Phi _2))\big )(x), \end{aligned}$$(3.58) and
-
(III)
it holds that \({\mathcal {D}}(\Psi ) = {\mathcal {D}}(\Phi _2)\).
The hypothesis that \(l_{2,L_2-1}\le l_{1,L_1-1}+{\mathfrak {i}}\) hence ensures that
$$\begin{aligned} {\mathbb {D}}_{{\mathcal {L}}(\Psi ) -1}(\Psi ) = {\mathbb {D}}_{L_2 -1}(\Phi _2) = l_{2, L_2 -1} \le l_{1, L_1 -1} + {\mathfrak {i}}. \end{aligned}$$(3.59)
Moreover, note that (III) assures that
$$\begin{aligned} {\mathcal {P}}(\Psi ) = {\mathcal {P}}(\Phi _2) \le {\mathcal {P}}(\Phi _2)+\big [\tfrac{1}{2}{\mathcal {P}}({\mathbb {I}})+{\mathcal {P}}(\Phi _1)\big ]^{\!2}. \end{aligned}$$(3.60)
Combining this with (I) and (3.59) establishes items (i)–(iv) in the case \(L_1=1\). We now prove items (i)–(iv) in the case \(L_1 \in {\mathbb {N}}\cap [2, \infty )\). Observe that, e.g., [25, Proposition 2.28] (with \(a=a\), \(L_1 = L_1\), \(L_2 = L_2\), \({\mathbb {I}} = {\mathbb {I}}\), \(\Phi _1 = \Phi _1\), \(\Phi _2 = \Phi _2\), \(d=d\), \({\mathfrak {i}} = {\mathfrak {i}}\), \((l_{1, 0}, l_{1, 1}, \ldots , l_{1, L_1}) = (l_{1, 0}, l_{1, 1}, \ldots , l_{1, L_1})\), \((l_{2, 0}, l_{2, 1}, \ldots , l_{2, L_2}) = (l_{2, 0}, l_{2, 1}, \ldots , l_{2, L_2})\) in the notation of [25, Proposition 2.28]) proves that there exists \(\Psi \in {\mathbf {N}}\) such that
-
(a)
it holds that \({\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\),
-
(b)
it holds for all \(x\in {\mathbb {R}}^d\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\Psi ))(x)=({\mathcal {R}}_{a}(\Phi _2))(x)+\big (({\mathcal {R}}_{a}(\Phi _1))\circ ({\mathcal {R}}_{a}(\Phi _2))\big )(x), \end{aligned}$$(3.61) -
(c)
it holds that
$$\begin{aligned} {\mathcal {D}}(\Psi )=(l_{2,0},l_{2,1},\dots , l_{2,L_2-1},l_{1,1}+{\mathfrak {i}},l_{1,2}+{\mathfrak {i}},\dots ,l_{1,L_1-1}+{\mathfrak {i}}, l_{1, L_1}), \end{aligned}$$(3.62) and
-
(d)
it holds that \({\mathcal {P}}(\Psi ) \le {\mathcal {P}}(\Phi _2)+\big [\tfrac{1}{2}{\mathcal {P}}({\mathbb {I}})+{\mathcal {P}}(\Phi _1)\big ]^{\!2}\).
This establishes items (i)–(iv) in the case \(L_1 \in {\mathbb {N}}\cap [2, \infty )\). The proof of Lemma 3.30 is thus completed. \(\square \)
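The hypothesis on \({\mathbb {I}}\) in Lemma 3.30 is satisfiable for the ReLU activation used later: with \({\mathfrak {i}} = 2d\) one can take the network realizing \(x = \max \{x, 0\} - \max \{-x, 0\}\) coordinatewise (this is precisely the network \({\mathfrak {I}}_d\) with \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d)\) invoked in the proof of Theorem 4.5 below). A minimal sketch of ours:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def identity_net(d):
    """I with D(I) = (d, 2d, d) and R_relu(I)(x) = relu(x) - relu(-x) = x."""
    W1 = np.vstack([np.eye(d), -np.eye(d)])
    W2 = np.hstack([np.eye(d), -np.eye(d)])
    return [(W1, np.zeros(2 * d)), (W2, np.zeros(d))]

def realize(phi, x, a=relu):
    for W, b in phi[:-1]:
        x = a(W @ x + b)
    W, b = phi[-1]
    return W @ x + b

d = 3
x = np.random.default_rng(0).standard_normal(d)
assert np.allclose(realize(identity_net(d), x), x)
```

Such identity channels are what allow the output of \(\Phi _2\) to be carried in parallel through the layers of \(\Phi _1\) so that it can be added back at the end, which is the source of the width increment \({\mathfrak {i}}\) in (3.62).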
4 Kolmogorov partial differential equations (PDEs)
In this section we establish in Theorem 4.5 below the existence of DNNs which approximate solutions of suitable Kolmogorov PDEs without the curse of dimensionality. Moreover, in Corollary 4.6 below we specialize Theorem 4.5 to the case where for every \( d \in {\mathbb {N}}\) we have that the probability measure \( \nu _d \) appearing in Theorem 4.5 is the uniform distribution on the d-dimensional unit cube \( [0,1]^d \). In addition, in Corollary 4.7 below we specialize Theorem 4.5, roughly speaking, to the case where the constants \(\kappa \in (0, \infty )\), \({\mathfrak {e}}, {\mathfrak {d}}_1, {\mathfrak {d}}_2, \ldots , {\mathfrak {d}}_6 \in [0, \infty ) \), which we use to specify the regularity hypotheses in Theorem 4.5, are all equal in the sense that \(\kappa = {\mathfrak {e}}= {\mathfrak {d}}_1 = {\mathfrak {d}}_2= \ldots = {\mathfrak {d}}_6\).
Corollary 4.7 follows immediately from Theorem 4.5 and is a slight generalization of [36, Theorem 6.1] and [36, Theorem 1.1], respectively. In our proof of Theorem 4.5 we employ the DNN representation results in Lemmas 3.29–3.30 from Sect. 3 above as well as essentially well-known error estimates for the Monte Carlo Euler method which we establish in Proposition 4.4 below. The proof of Proposition 4.4, in turn, employs the elementary error estimate results in Lemmas 4.1–4.3 below.
4.1 Error analysis for the Monte Carlo Euler method
Lemma 4.1
Let \(d, m \in {\mathbb {N}}\), \(\xi \in {\mathbb {R}}^d\), \(T \in (0, \infty )\), \(L_0, L_1, l \in [0, \infty )\), \(h \in (0, T]\), \(B \in {\mathbb {R}}^{d \times m}\), let \( \left\| \cdot \right\| \! :{\mathbb {R}}^d \rightarrow [0,\infty ) \) be the d-dimensional Euclidean norm, let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \(W :[0, T] \times \Omega \rightarrow {\mathbb {R}}^m\) be a standard Brownian motion, let \(f_0 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and \(f_1 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be functions, let \(\chi :[0, T] \rightarrow [0, T]\) be a function, assume for all \(t \in [0, T]\), \(x, y \in {\mathbb {R}}^d\) that
and \(\chi (t) = \max (\{0, h, 2h, \ldots \} \cap [0, t])\), and let \( X, Y :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \) be stochastic processes with continuous sample paths which satisfy for all \( t \in [0,T] \) that \( Y_t = \xi + \int _0^t f_1\big ( Y_{ \chi ( s ) } \big ) \, ds + B W_t \) and \( X_t = \xi + \int _0^t f_1( X_s ) \, ds + B W_t \).
Then it holds that
Proof of Lemma 4.1
First, note that (4.2) proves that for all \(x \in {\mathbb {R}}^d\) it holds that
This, (4.1), (4.2), and, e.g., [36, Proposition 4.2] (with \(d = d\), \(m = m\), \(\xi = \xi \), \(T = T\), \(c = L_1\), \(C = \Vert f_1(0)\Vert \), \(\varepsilon _0 = 0\), \(\varepsilon _1 = 0\), \(\varepsilon _2 = 0\), \(\varsigma _0 = 0\), \(\varsigma _1 = 0\), \(\varsigma _2 = 0\), \(L_0 = L_0\), \(L_1 = L_1\), \(l = l\), \(h = h\), \(B = B\), \(p = 2\), \(q = 2\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W\), \(\phi _0 = f_0\), \(f_1 = f_1\), \(\phi _2 = ({\mathbb {R}}^d \ni x \mapsto x \in {\mathbb {R}}^d)\), \(\chi = \chi \), \(f_0 = f_0\), \(\phi _1 = f_1\), \(\varpi _r = ( {\mathbb {E}}[ \Vert B W_T \Vert ^r])^{ \nicefrac {1}{r} }\), \(X = X\), \(Y = Y\) for \(r \in (0, \infty )\) in the notation of [36, Proposition 4.2]) establish that
Combining this with, e.g., [36, Lemma 4.2] (with \(d = d\), \(m = m\), \(T = T\), \(p = \max \{2, 2l\}\), \(B = B\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W\) in the notation of [36, Lemma 4.2]) ensures that
The proof of Lemma 4.1 is thus completed. \(\square \)
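For intuition, the following numerical sketch (ours; the drift \(f_1(y) = -y\) is an illustrative choice satisfying the Lipschitz hypothesis) compares the process \(Y\) with drift frozen at the grid points \(\chi (s)\) against a fine-step Euler proxy for \(X\), driven by the same Brownian path; the observed \(L^2\) distance at time \(T\) shrinks roughly proportionally to \(h\), in line with the statement of Lemma 4.1.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 2, 1.0
B = np.eye(d)
f1 = lambda y: -y                 # illustrative Lipschitz drift
xi = np.ones(d)

def strong_error(h, h_fine=2 ** -10, reps=1000):
    """L^2 distance at time T between a fine-step Euler proxy for X and the
    scheme Y whose drift is frozen at chi(s) = max({0, h, 2h, ...} cap [0, s])."""
    ratio, n = int(h / h_fine), int(T / h_fine)
    X = np.tile(xi, (reps, 1))
    Y = X.copy()
    frozen = f1(Y)
    for k in range(n):
        if k % ratio == 0:        # a new coarse grid point: refresh the drift
            frozen = f1(Y)
        dW = rng.normal(0.0, np.sqrt(h_fine), (reps, d))
        X = X + h_fine * f1(X) + dW @ B.T
        Y = Y + h_fine * frozen + dW @ B.T
    return np.sqrt(np.mean(np.sum((X - Y) ** 2, axis=1)))

for h in [2 ** -2, 2 ** -4, 2 ** -6]:
    print(h, strong_error(h))
```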
Lemma 4.2
Let \(d, m \in {\mathbb {N}}\), \(T, \kappa \in (0, \infty )\), \(\theta , {\mathfrak {d}}_0, {\mathfrak {d}}_1 \in [0, \infty )\), \(h \in (0, T]\), \(B \in {\mathbb {R}}^{d \times m}\), \(p \in [1, \infty )\), let \(\nu :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1] \) be a probability measure on \({\mathbb {R}}^d\), let \( \left\| \cdot \right\| \! :{\mathbb {R}}^d \rightarrow [0,\infty ) \) be the d-dimensional Euclidean norm, let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \(W :[0, T] \times \Omega \rightarrow {\mathbb {R}}^m\) be a standard Brownian motion, let \(f_0 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and \(f_1 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be functions, let \(\chi :[0, T] \rightarrow [0, T]\) be a function, assume for all \(t \in [0, T]\), \(x, y \in {\mathbb {R}}^d\) that
and \(\chi (t) = \max (\{0, h, 2h, \ldots \} \cap [0, t])\), and let \( X^x:[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(x \in {\mathbb {R}}^d\), and \(Y^x :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(x \in {\mathbb {R}}^d\), be stochastic processes with continuous sample paths which satisfy for all \(x \in {\mathbb {R}}^d\), \( t \in [0,T] \) that \( Y_t^x = x + \int _0^t f_1\big ( Y^x_{ \chi ( s ) } \big ) \, ds + B W_t \) and
Then it holds that
Proof of Lemma 4.2
Throughout this proof let \( \iota = \max \{ \kappa , \theta , 1 \} \). Note that (4.8) proves that for all \(x, y \in {\mathbb {R}}^d\) it holds that
Lemma 4.1 (with \(d = d\), \(m = m\), \(\xi = x\), \(T =T\), \(L_0 = 2\kappa (\theta +1) d^{{\mathfrak {d}}_0}\), \(L_1 = \kappa \), \(l = \theta \), \(h = h\), \(B = B\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W\), \(f_0 = f_0\), \(f_1 = f_1\), \(\chi = \chi \), \(X = X^x\), \(Y = Y^x\) for \(x \in {\mathbb {R}}^d\) in the notation of Lemma 4.1), (4.10), and (4.9) hence ensure that for all \(x \in {\mathbb {R}}^d\) it holds that
Therefore, we obtain that for all \(x \in {\mathbb {R}}^d\) it holds that
This establishes that
Combining this and (4.10) assures that
The proof of Lemma 4.2 is thus completed. \(\square \)
Lemma 4.3
Let \(d, M, n \in {\mathbb {N}}\), \(T, \kappa , \theta \in (0, \infty )\), \({\mathfrak {d}}_0, {\mathfrak {d}}_1 \in [0, \infty )\), \(B \in {\mathbb {R}}^{d \times n}\), \(p \in [2, \infty )\), let \(\nu :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1] \) be a probability measure on \({\mathbb {R}}^d\), let \( \left\| \cdot \right\| \! :{\mathbb {R}}^d \rightarrow [0,\infty ) \) be the d-dimensional Euclidean norm, let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \(W^m :[0, T] \times \Omega \rightarrow {\mathbb {R}}^n\), \(m \in \{1, 2, \ldots , M\}\), be independent standard Brownian motions, let \(f_0 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) be \({\mathcal {B}}({\mathbb {R}}^d) /{\mathcal {B}}({\mathbb {R}})\)-measurable, let \(f_1 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be \({\mathcal {B}}({\mathbb {R}}^d) /{\mathcal {B}}({\mathbb {R}}^d)\)-measurable, let \(\chi :[0, T] \rightarrow [0, T]\) be \({\mathcal {B}}([0,T]) /{\mathcal {B}}([0, T])\)-measurable, assume for all \(t \in [0, T]\), \(x \in {\mathbb {R}}^d\) that
and \(\chi (t) \le t\), and let \( Y^{m, x} :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(m \in \{1, 2, \ldots , M\}\), \(x \in {\mathbb {R}}^d\), be stochastic processes with continuous sample paths which satisfy for all \(x \in {\mathbb {R}}^d\), \(m \in \{1, 2, \ldots , M\}\), \( t \in [0,T] \) that
$$\begin{aligned} Y^{m, x}_t = x + \int _0^t f_1\big ( Y^{m, x}_{ \chi ( s ) } \big ) \, ds + B W^m_t. \end{aligned}$$
Then it holds that
Proof of Lemma 4.3
Throughout this proof let \( \iota = \max \{ \theta , 1 \} \). Note that (4.18) and, e.g., [36, Lemma 4.1] (with \(d =d\), \(m =n\), \(\xi = x\), \(p =q\), \(c = \kappa \), \(C= \kappa d^{{\mathfrak {d}}_1}\), \(T =T\), \(B = B\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W^1\), \(\mu = f_1\), \(\chi = \chi \), \(X = Y^{1, x}\) for \(q \in [1, \infty )\), \(x \in {\mathbb {R}}^d\) in the notation of [36, Lemma 4.1]) prove that for all \(q \in [1, \infty )\), \(x \in {\mathbb {R}}^d\) it holds that
This, (4.19), and, e.g., [36, Lemma 4.2] (with \(d = d\), \(m = n\), \(T = T\), \(p = q\), \(B = B\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W^1\) for \(q \in [1, \infty )\) in the notation of [36, Lemma 4.2]) ensure that for all \(q \in [1, \infty )\), \(x \in {\mathbb {R}}^d\) it holds that
Combining this with (4.18) and Hölder’s inequality establishes for all \(x \in {\mathbb {R}}^d\) that
The fact that \( \forall \, y, z \in {\mathbb {R}}, \alpha \in [0, \infty ) :|y + z|^{\alpha } \le 2^{\alpha }(|y|^{\alpha } + |z|^{\alpha })\) hence proves for all \(x \in {\mathbb {R}}^d\) that
This implies that for all \(x \in {\mathbb {R}}^d\) it holds that
Combining this with, e.g., [24, Corollary 2.5] (with \(p=p\), \(d=1\), \(n=M\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(X_i = f_0(Y^{i, x})\) for \(i \in \{1, 2, \ldots , M\}\), \(x \in {\mathbb {R}}^d\) in the notation of [24, Corollary 2.5]) and (4.25) assures for all \(x \in {\mathbb {R}}^d\) that
This and the fact that \(\sqrt{p -1} \le \nicefrac {p}{2}\) establish that
Combining this and (4.19) demonstrates that
The proof of Lemma 4.3 is thus completed. \(\square \)
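The Monte Carlo bound [24, Corollary 2.5] used above controls the \(L^p\) norm of the deviation of the empirical mean from the expectation by a constant multiple of \(M^{-\nicefrac {1}{2}}\). The following quick numerical check (ours; the \(\chi ^2\)-distributed stand-in samples are an illustrative choice) exhibits this rate:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4.0

def lp_mc_error(M, reps=2000):
    """L^p norm of (1/M) sum_m Z_m - E[Z] for iid samples Z_m with E[Z] = 1."""
    Z = rng.standard_normal((reps, M)) ** 2
    return np.mean(np.abs(Z.mean(axis=1) - 1.0) ** p) ** (1.0 / p)

for M in [10, 100, 1000, 10000]:
    print(M, lp_mc_error(M), lp_mc_error(M) * np.sqrt(M))  # last column ~ constant
```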
Proposition 4.4
Let \(d, M, n \in {\mathbb {N}}\), \(T, \kappa , \theta \in (0, \infty )\), \({\mathfrak {d}}_0, {\mathfrak {d}}_1 \in [0, \infty )\), \(h \in (0, T]\), \(B \in {\mathbb {R}}^{d \times n}\), \(p \in [2, \infty )\), let \(\nu :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1] \) be a probability measure on \({\mathbb {R}}^d\), let \( \left\| \cdot \right\| \! :{\mathbb {R}}^d \rightarrow [0,\infty ) \) be the d-dimensional Euclidean norm, let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \(W^m :[0, T] \times \Omega \rightarrow {\mathbb {R}}^n\), \(m \in \{1, 2, \ldots , M\}\), be independent standard Brownian motions, let \(f_0 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and \(f_1 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be functions, let \(\chi :[0, T] \rightarrow [0, T]\) be a function, assume for all \(t \in [0, T]\), \(x, y \in {\mathbb {R}}^d\) that
and \(\chi (t) = \max (\{0, h, 2h, \ldots \} \cap [0, t])\), and let \( X^x :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(x \in {\mathbb {R}}^d\), and \( Y^{m, x} :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(m \in \{1, 2, \ldots , M\}\), \(x \in {\mathbb {R}}^d\), be stochastic processes with continuous sample paths which satisfy for all \(x \in {\mathbb {R}}^d\), \(m \in \{1, 2, \ldots , M\}\), \( t \in [0,T] \) that \( X^x_t = x + \int _0^t f_1( X^x_s ) \, ds + B W_t^1 \) and
$$\begin{aligned} Y^{m, x}_t = x + \int _0^t f_1\big ( Y^{m, x}_{ \chi ( s ) } \big ) \, ds + B W^m_t. \end{aligned}$$
Then it holds that
Proof of Proposition 4.4
Throughout this proof let \( \iota = \max \{ \kappa , \theta , 1 \}\). Note that the triangle inequality proves that
Next note that (4.31)–(4.33) and Lemma 4.2 (with \(d=d\), \(m=n\), \(T=T\), \(\kappa =\kappa \), \(\theta =\theta \), \({\mathfrak {d}}_0 = {\mathfrak {d}}_0\), \({\mathfrak {d}}_1 = {\mathfrak {d}}_1\), \(h=h\), \(B=B\), \(p=p\), \(\nu =\nu \), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W^1\), \(f_0 = f_0\), \(f_1 = f_1\), \(\chi =\chi \), \(X^x = X^x\), \(Y^x= Y^{1, x}\) for \(x \in {\mathbb {R}}^d\) in the notation of Lemma 4.2) demonstrates that
Moreover, observe that Hölder’s inequality and (4.33) imply that
Lemma 4.3 (with \(d=d\), \(M=M\), \(n=n\), \(T=T\), \(\kappa =\kappa \), \(\theta =\theta \), \({\mathfrak {d}}_0 = {\mathfrak {d}}_0\), \({\mathfrak {d}}_1 = {\mathfrak {d}}_1\), \(B=B\), \(p=p\), \(\nu = \nu \), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W^m=W^m\), \(f_0=f_0\), \(f_1=f_1\), \(\chi =\chi \), \(Y^{m,x} = Y^{m,x}\) for \(m \in \{1, 2, \ldots , M\}\), \(x \in {\mathbb {R}}^d\) in the notation of Lemma 4.3), (4.31), and (4.32) hence establish that
This, (4.36), and (4.37) assure that
The proof of Proposition 4.4 is thus completed. \(\square \)
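Proposition 4.4 quantifies the two error sources of the Monte Carlo Euler method: the time-discretization bias and the statistical error of the sample mean. The following sketch (ours; the Ornstein–Uhlenbeck choice \(f_1(y) = -y\), \(B = {\text {I}}_d\) is illustrative and admits a closed-form reference value) computes the estimator \(\frac{1}{M} \sum _{m=1}^M f_0(Y^{m, x}_T)\):

```python
import numpy as np

rng = np.random.default_rng(2)
d, T = 2, 1.0
f1 = lambda y: -y                        # illustrative Lipschitz drift
f0 = lambda y: np.sum(y ** 2, axis=-1)   # illustrative test functional

def mc_euler(x, N, M):
    """(1/M) sum_m f0(Y^{m,x}_T) with N Euler steps of size h = T/N."""
    h = T / N
    Y = np.tile(x, (M, 1))
    for _ in range(N):
        Y = Y + h * f1(Y) + rng.normal(0.0, np.sqrt(h), (M, d))
    return float(np.mean(f0(Y)))

x = np.ones(d)
# For the OU process dX = -X dt + dW one has
# E[||X_T||^2] = e^{-2T} ||x||^2 + d (1 - e^{-2T}) / 2.
exact = np.exp(-2 * T) * np.sum(x ** 2) + d * (1 - np.exp(-2 * T)) / 2
print(mc_euler(x, N=64, M=10 ** 5), exact)
```

The two printed values differ by the discretization bias of order \(\nicefrac {T}{N}\) plus a statistical error of order \(M^{-\nicefrac {1}{2}}\), which is exactly the splitting effected by the triangle inequality in the proof of Proposition 4.4.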
4.2 DNN approximations for Kolmogorov PDEs
Theorem 4.5
Let \( A_d = (A_{d, i, j})_{(i, j) \in \{1, \ldots , d\}^2} \in {\mathbb {R}}^{ d \times d } \), \( d \in {\mathbb {N}}\), be symmetric positive semidefinite matrices, let \(\left\| \cdot \right\| \! :(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) that \(\Vert x\Vert = ( \textstyle \sum _{i=1}^d |x_i|^2)^{\nicefrac {1}{2}}\), for every \( d \in {\mathbb {N}}\) let \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\) be a probability measure on \({\mathbb {R}}^d\), let \( \varphi _{0,d} :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \( d \in {\mathbb {N}}\), and \( \varphi _{ 1, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), be functions, let \( T, \kappa \in (0, \infty )\), \({\mathfrak {e}}, {\mathfrak {d}}_1, {\mathfrak {d}}_2, \ldots , {\mathfrak {d}}_6 \in [0, \infty )\), \(\theta \in [1, \infty )\), \(p \in [2, \infty )\), \( ( \phi ^{ m, d }_{ \varepsilon } )_{ (m, d, \varepsilon ) \in \{ 0, 1 \} \times {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\), \(a\in C({\mathbb {R}}, {\mathbb {R}})\) satisfy for all \(x \in {\mathbb {R}}\) that \(a(x) = \max \{x, 0\}\), assume for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \), \( m \in \{0, 1\}\), \( x, y \in {\mathbb {R}}^d \) that \( {\mathcal {R}}_{a}( \phi ^{ 0, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}) \), \( {\mathcal {R}}_{a}( \phi ^{ 1, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}^d ) \), \( {\text {Trace}}(A_d) \le \kappa d^{ 2 {\mathfrak {d}}_1 } \), \([ \int _{{\mathbb {R}}^d} \Vert x\Vert ^{2p \theta } \, \nu _d (dx) ]^{\nicefrac {1}{(2p \theta )}} \le \kappa d^{{\mathfrak {d}}_1 + {\mathfrak {d}}_2}\), \( {\mathcal {P}}( \phi ^{ m, d }_{ \varepsilon } ) \le \kappa d^{ 2^{(-m)} {\mathfrak {d}}_3 } \varepsilon ^{ - 2^{(-m)} {\mathfrak {e}}}\), \( |( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{{\mathfrak {d}}_6} (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \), \( \Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ {\mathfrak {d}}_1 + {\mathfrak {d}}_2 } + \Vert x \Vert ) \), \(| \varphi _{ 0, d }( x )| \le \kappa d^{ {\mathfrak {d}}_6 } ( d^{ \theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2) } + \Vert x \Vert ^{ \theta } )\), \( \Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), and
and for every \( d \in {\mathbb {N}}\) let \( u_d :[0,T] \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) be an at most polynomially growing viscosity solution of
$$\begin{aligned} \big ( \tfrac{\partial }{\partial t} u_d \big )(t, x) = \big ( \tfrac{\partial }{\partial x} u_d \big )(t, x) \, \varphi _{1, d}(x) + \textstyle \sum _{i, j = 1}^d A_{d, i, j} \, \big ( \tfrac{\partial ^2}{\partial x_i \partial x_j} u_d \big )(t, x) \end{aligned}$$
with \( u_d( 0, x ) = \varphi _{ 0, d }( x ) \) for \( ( t, x ) \in (0,T) \times {\mathbb {R}}^d \) (cf. Definitions 3.1 and 3.3). Then there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {R}}_{a}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), \([ \int _{ {\mathbb {R}}^d } | u_d(T, x) - ( {\mathcal {R}}_{a} (\Psi _{ d, \varepsilon }) )( x ) |^p \, \nu _d(dx) ]^{ \nicefrac { 1 }{ p } } \le \varepsilon \), and
Proof of Theorem 4.5
Throughout this proof let \( {\mathcal {A}}_d \in {\mathbb {R}}^{ d \times d } \), \( d \in {\mathbb {N}}\), satisfy for all \( d \in {\mathbb {N}}\) that \( {\mathcal {A}}_d = \sqrt{ 2 A_d } \), let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \( W^{ d, m } :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \( d, m \in {\mathbb {N}}\), be independent standard Brownian motions, let \(Z^{N, d, m}_n :\Omega \rightarrow {\mathbb {R}}^{d} \), \(n \in \{0, 1, \ldots , N-1\}\), \(m \in \{1, 2, \ldots , N\}\), \(d, N \in {\mathbb {N}}\), be the random variables which satisfy for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(n \in \{0, 1, \ldots , N-1\}\) that
$$\begin{aligned} Z^{N, d, m}_n = {\mathcal {A}}_d \big ( W^{d, m}_{ \nicefrac {(n+1) T}{N} } - W^{d, m}_{ \nicefrac {n T}{N} } \big ), \end{aligned}$$(4.44)
let \(f_{N, d} :{\mathbb {R}}^{d} \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}^{d}\), \(d, N \in {\mathbb {N}}\), satisfy for all \(N, d \in {\mathbb {N}}\), \(x, y \in {\mathbb {R}}^d\) that
$$\begin{aligned} f_{N, d}(x, y) = y + \tfrac{T}{N} \, \varphi _{1, d}( y ) + x, \end{aligned}$$(4.45)
let \( X^{ d, x } :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \( x \in {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), be stochastic processes with continuous sample paths which satisfy for all \( d \in {\mathbb {N}}\), \( x \in {\mathbb {R}}^d \), \( t \in [0,T] \) that
$$\begin{aligned} X^{d, x}_t = x + \int _0^t \varphi _{1, d}\big ( X^{d, x}_s \big ) \, ds + {\mathcal {A}}_d W^{d, 1}_t \end{aligned}$$(4.46)
(cf., e.g., [36, item (i) in Theorem 3.1] (with \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(T=T\), \(d=d\), \(m=d\), \(B= {\mathcal {A}}_d\), \(\mu = \varphi _{1,d}\) for \(d \in {\mathbb {N}}\) in the notation of [36, Theorem 3.1])), let \(Y^{N, d, x}_n = (Y^{N, d, m, x}_n)_{m \in \{1, 2, \ldots , N\}} :\Omega \rightarrow {\mathbb {R}}^{N d}\), \(n \in \{0, 1, \ldots , N\}\), \(x \in {\mathbb {R}}^d\), \(d, N \in {\mathbb {N}}\), satisfy for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(x \in {\mathbb {R}}^d\), \(n \in \{1, 2, \ldots , N\}\) that \(Y^{N, d, m, x}_{0} = x\) and
$$\begin{aligned} Y^{N, d, m, x}_{n} = f_{N, d}\big ( Z^{N, d, m}_{n-1}, Y^{N, d, m, x}_{n-1} \big ) = Y^{N, d, m, x}_{n-1} + \tfrac{T}{N} \, \varphi _{1, d}\big ( Y^{N, d, m, x}_{n-1} \big ) + Z^{N, d, m}_{n-1}, \end{aligned}$$(4.47)
let \(g_{N, d} :{\mathbb {R}}^{Nd} \rightarrow {\mathbb {R}}\), \(d, N \in {\mathbb {N}}\), satisfy for all \(N, d \in {\mathbb {N}}\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) that
$$\begin{aligned} g_{N, d}(x) = \frac{1}{N} \bigg [ \sum _{i=1}^{N} \varphi _{0, d}(x_i) \bigg ], \end{aligned}$$(4.48)
and let \({\mathfrak {N}}_{d, \varepsilon } \subseteq {\mathbf {N}}\), \( \varepsilon \in (0, 1]\), \(d \in {\mathbb {N}}\), satisfy for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
(cf. Definition 3.27). Note that (4.44) and, e.g., [36, Lemma 4.2] (with \(d = d\), \(m = d\), \(T = T\), \(p = 2p \theta \), \(B = {\mathcal {A}}_d\), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W^{d,m}\) for \(d, m \in {\mathbb {N}}\) in the notation of [36, Lemma 4.2]) ensure that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(n \in \{0, 1, \ldots , N-1\}\) it holds that
This and the assumption that \( \forall \, d \in {\mathbb {N}}:{\text {Trace}}(A_d) \le \kappa d^{ 2 {\mathfrak {d}}_1 } \) assure for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(n \in \{0, 1, \ldots , N-1\}\) that
Moreover, observe that Lemma 3.16 (with \(d=d\), \(a=a\) for \(d \in {\mathbb {N}}\) in the notation of Lemma 3.16) ensures that there exist \({\mathfrak {I}}_d \in {\mathbf {N}}\), \(d \in {\mathbb {N}}\), such that for all \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\) it holds that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d)\), \({\mathcal {R}}_{a}( {\mathfrak {I}}_{d}) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), and \(({\mathcal {R}}_{a}({\mathfrak {I}}_d))(x) = x\). This and (4.49) assure for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that \({\mathfrak {I}}_d \in {\mathfrak {N}}_{d, \varepsilon }\) and
Next note that Lemma 3.14 demonstrates that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that \({\mathcal {D}}(\frac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }) = {\mathcal {D}}(\phi ^{1, d}_{\varepsilon })\), \({\mathcal {R}}_{a}(\frac{T}{N} \circledast \phi ^{1, d}_{\varepsilon } ) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), and
(cf. Definition 3.13). This, the fact that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d)\), and Lemma 3.30 (with \(a=a\), \(L_1 = {\mathcal {L}}(\tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }) \), \(L_2 = 2\), \({\mathbb {I}} = {\mathfrak {I}}_d\), \(\Phi _1 = \tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }\), \(\Phi _2 = {\mathfrak {I}}_d\), \(d=d\), \({\mathfrak {i}} = 2d\), \((l_{1, 0}, l_{1, 1}, \ldots , l_{1, L_1}) = {\mathcal {D}}(\tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon })\), \((l_{2, 0}, l_{2, 1}, l_{2, L_2}) = (d, 2d, d)\) for \(d, N \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) in the notation of Lemma 3.30) establish that there exist \({\mathbf {f}}^{N, d}_{\varepsilon } \in {\mathbf {N}}\), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), such that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x \in {\mathbb {R}}^d\) it holds that \({\mathcal {R}}_{a}({\mathbf {f}}^{N, d}_{\varepsilon }) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\) and
Items (ii)–(iii) in Lemma 3.9 hence ensure that there exist \({\mathbf {f}}^{N, d}_{\varepsilon , z} \in {\mathbf {N}}\), \(z \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), which satisfy for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(z, x \in {\mathbb {R}}^d\) that \({\mathcal {R}}_{a}({\mathbf {f}}^{N, d}_{\varepsilon , z}) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\) and
This, (4.45), and (4.41) imply for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x, z \in {\mathbb {R}}^d\) that \(({\mathbb {R}}^d \ni {\mathfrak {z}} \mapsto ( {\mathcal {R}}_{a}({\mathbf {f}}^{N, d}_{\varepsilon , {\mathfrak {z}}}))(x) \in {\mathbb {R}}^d)\) is \({\mathcal {B}}({\mathbb {R}}^d) /{\mathcal {B}}({\mathbb {R}}^d)\)-measurable and
Next note that (4.55) and the assumption that \( \forall \, \varepsilon \in (0, 1], d \in {\mathbb {N}}, x \in {\mathbb {R}}^d :\Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ {\mathfrak {d}}_1 + {\mathfrak {d}}_2 } + \Vert x \Vert ) \) prove for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x, z \in {\mathbb {R}}^d\) that
In addition, observe that (4.45) and the assumption that \(\forall \, d \in {\mathbb {N}}, x, y \in {\mathbb {R}}^d :\Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \) imply that for all \(N, d \in {\mathbb {N}}\), \(x, z \in {\mathbb {R}}^d\) it holds that
Moreover, note that (4.53), the fact that \({\mathcal {D}} ({\mathfrak {I}}_d) = (d, 2d, d)\), and Lemma 3.30 (with \(a=a\), \(L_1 = {\mathcal {L}}(\tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }) \), \(L_2 = {\mathcal {L}}(\Phi )\), \({\mathbb {I}} = {\mathfrak {I}}_d\), \(\Phi _1 = \tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }\), \(\Phi _2 = \Phi \), \(d=d\), \({\mathfrak {i}} = 2d\), \((l_{1, 0}, l_{1, 1}, \ldots , l_{1, L_1}) = {\mathcal {D}}(\tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }) = {\mathcal {D}}(\phi ^{1, d}_{\varepsilon })\), \((l_{2, 0}, l_{2, 1}, \ldots , l_{2, L_2}) = {\mathcal {D}}(\Phi )\) for \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) in the notation of Lemma 3.30) prove that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) there exists \({\hat{\Phi }} \in {\mathbf {N}}\) such that for all \(x \in {\mathbb {R}}^d\) it holds that \({\mathcal {R}}_{a}({\hat{\Phi }}) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), \({\mathbb {D}}_{{\mathcal {L}}({\hat{\Phi }}) -1}({\hat{\Phi }}) \le {\mathbb {D}}_{{\mathcal {L}}(\phi ^{1,d}_{\varepsilon }) -1}(\phi ^{1,d}_{\varepsilon }) + 2d \), \({\mathcal {P}}({\hat{\Phi }}) \le {\mathcal {P}}(\Phi ) + [\frac{1}{2} {\mathcal {P}}({\mathfrak {I}}_d) + {\mathcal {P}}(\phi ^{1, d}_{\varepsilon })]^2\), and
This, (4.49), (4.52), and the fact that \(\forall \, d \in {\mathbb {N}}, \varepsilon \in (0, 1] :{\mathcal {P}}( \phi ^{ 1, d }_{ \varepsilon } ) \le \kappa d^{ 2^{(-1)} {\mathfrak {d}}_3 } \varepsilon ^{ - 2^{(-1)} {\mathfrak {e}}}\) demonstrate that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) there exists \({\hat{\Phi }} \in {\mathfrak {N}}_{d, \varepsilon }\) such that for all \(x \in {\mathbb {R}}^d\) it holds that
and
Items (i)–(iii) in Lemma 3.9 and (4.55) hence ensure that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) there exist \(({\hat{\Phi }}_{z})_{z \in {\mathbb {R}}^d} \subseteq {\mathfrak {N}}_{d, \varepsilon }\) such that for all \(x, z, {\mathfrak {z}} \in {\mathbb {R}}^d\) it holds that
and \({\mathcal {D}} ({\hat{\Phi }}_{z}) = {\mathcal {D}} ({\hat{\Phi }}_{{\mathfrak {z}}})\). In the next step we observe that Lemma 3.29 (with \(n = N\), \(h_m = \nicefrac {1}{N}\), \(\Phi _m = \phi ^{0, d}_{\varepsilon }\), \(a= a\) for \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(m \in \{1, 2, \ldots , N\}\) in the notation of Lemma 3.29) demonstrates that there exist \({\mathbf {g}}^{N, d}_{\varepsilon } \in {\mathbf {N}}\), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), which satisfy for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) that \({\mathcal {R}}_{a}({\mathbf {g}}^{N, d}_{\varepsilon }) \in C({\mathbb {R}}^{N d}, {\mathbb {R}})\) and
This, (4.48), and (4.41) ensure that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) it holds that
Moreover, note that (4.64) and the assumption that \(\forall \, \varepsilon \in (0,1], d \in {\mathbb {N}}, x, y \in {\mathbb {R}}^d :|( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{{\mathfrak {d}}_6} (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \) imply that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\), \(y = (y_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) it holds that
Next observe that the fact that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d)\) and, e.g., [25, Proposition 2.16] (with \(\Psi = {\mathfrak {I}}_d\), \(\Phi _1 = \phi ^{0, d}_{\varepsilon }\), \(\Phi _2 \in \{\Phi \in {\mathbf {N}}:{\mathcal {I}}(\Phi ) = {\mathcal {O}}(\Phi ) =d \}\), \({\mathfrak {i}} = 2d\) in the notation of [25, Proposition 2.16]) prove that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi _1, \Phi _2, \ldots , \Phi _{N} \in \{\Phi \in {\mathbf {N}}:{\mathcal {I}}(\Phi ) = {\mathcal {O}}(\Phi ) =d \}\) with \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \ldots = {\mathcal {D}}(\Phi _{N})\) there exist \(\Psi _1, \Psi _2, \ldots , \Psi _{N} \in {\mathbf {N}}\) such that for all \(i \in \{1, 2, \ldots , N\}\) it holds that \( {\mathcal {R}}_{a}(\Psi _i) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \({\mathcal {D}}(\Psi _i) = {\mathcal {D}}(\Psi _1)\), \({\mathcal {P}}(\Psi _i) \le 2 ({\mathcal {P}}(\phi ^{0, d}_{\varepsilon }) + {\mathcal {P}}(\Phi _i))\), and
This, (4.64), and Lemma 3.28 assure that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi _1, \Phi _2, \ldots , \Phi _{N} \in \{\Phi \in {\mathbf {N}}:{\mathcal {I}}(\Phi ) = {\mathcal {O}}(\Phi ) =d \}\) with \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \ldots = {\mathcal {D}}(\Phi _{N})\) there exists \(\Psi \in {\mathbf {N}}\) such that for all \(x \in {\mathbb {R}}^d\) it holds that \( {\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \({\mathcal {P}}(\Psi ) \le 2 N^2 ({\mathcal {P}}(\phi ^{0, d}_{\varepsilon }) + {\mathcal {P}}(\Phi _1))\), and
The assumption that \(\forall \, d \in {\mathbb {N}}, \varepsilon \in (0, 1] :{\mathcal {P}}( \phi ^{ 0, d }_{ \varepsilon } ) \le \kappa d^{ {\mathfrak {d}}_3 } \varepsilon ^{ - {\mathfrak {e}}}\) hence ensures that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi _1, \Phi _2, \ldots , \Phi _{N} \in \{\Phi \in {\mathbf {N}}:{\mathcal {I}}(\Phi ) = {\mathcal {O}}(\Phi ) =d \}\) with \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \ldots = {\mathcal {D}}(\Phi _{N})\) there exists \(\Psi \in {\mathbf {N}}\) such that for all \(x \in {\mathbb {R}}^d\) it holds that \( {\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \(({\mathcal {R}}_{a}(\Psi )) (x) = ( {\mathcal {R}}_{a}({\mathbf {g}}^{N, d}_{\varepsilon })) ( ({\mathcal {R}}_{a}(\Phi _1)) (x), ({\mathcal {R}}_{a}(\Phi _2)) (x),\) \( \ldots , ({\mathcal {R}}_{a}(\Phi _N)) (x))\), and
Furthermore, note that (4.41) and the assumption that \(\forall \, d\in {\mathbb {N}}, \varepsilon \in (0, 1], x, y \in {\mathbb {R}}^d :|( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{ {\mathfrak {d}}_6} (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \) demonstrate for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x \in {\mathbb {R}}^d\) that
This establishes that for all \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\) it holds that
Next observe that the assumption that \(\forall \, d\in {\mathbb {N}}, \varepsilon \in (0, 1], x \in {\mathbb {R}}^d :\Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ {\mathfrak {d}}_1 + {\mathfrak {d}}_2 } + \Vert x \Vert ) \) and (4.41) ensure for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x \in {\mathbb {R}}^d\) that
This proves that for all \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\) it holds that
In the next step we note that Hölder’s inequality, the assumption that \(\forall \, d \in {\mathbb {N}}:[ \int _{{\mathbb {R}}^d} \Vert x\Vert ^{2p \theta } \, \nu _d (dx) ]^{\nicefrac {1}{(2p\theta )}} \le \kappa d^{{\mathfrak {d}}_1 + {\mathfrak {d}}_2}\), and the assumption that \(\theta \in [1, \infty )\) assure that for all \(d \in {\mathbb {N}}\) it holds that
Next note that (4.47), (4.45), and (4.44) imply that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(x \in {\mathbb {R}}^d\), \(n \in \{1, 2, \ldots , N\}\) it holds that
The assumption that \(\forall \, d \in {\mathbb {N}}, x \in {\mathbb {R}}^d :| \varphi _{ 0, d }( x )| \le \kappa d^{ {\mathfrak {d}}_6 } ( d^{ \theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2) } + \Vert x \Vert ^{ \theta } )\), the assumption that \( \forall \, d \in {\mathbb {N}}:{\text {Trace}}(A_d) \le \kappa d^{ 2 {\mathfrak {d}}_1 } \), the assumption that \(\forall \, d \in {\mathbb {N}}, x, y \in {\mathbb {R}}^d :\Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), (4.71), (4.73), (4.74), (4.46), and Proposition 4.4 (with \(d=d\), \(M=N\), \(n=d\), \(T=T\), \(\kappa = \kappa \), \(\theta = \theta \), \({\mathfrak {d}}_0 = {\mathfrak {d}}_6\), \({\mathfrak {d}}_1 = {\mathfrak {d}}_1 + {\mathfrak {d}}_2\), \(h = \nicefrac {T}{N}\), \(B = {\mathcal {A}}_d\), \(p=p\), \(\nu = \nu _d\), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W^m = W^{d, m}\), \(f_0 = \varphi _{0,d}\), \(f_1 = \varphi _{1, d}\) for \(N, d \in {\mathbb {N}}\) in the notation of Proposition 4.4) hence establish that for all \(N, d \in {\mathbb {N}}\) it holds that
This, the fact that \(\forall \, d\in {\mathbb {N}}, x \in {\mathbb {R}}^d :| \varphi _{ 0, d }( x ) | \le \kappa d^{ {\mathfrak {d}}_6 } ( d^{\theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2) } + \Vert x \Vert ^{ \theta } )\), (4.73), (4.48), and, e.g., [36, Theorem 3.1] (with \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(T=T\), \(d=d\), \(m=d\), \(B= {\mathcal {A}}_d\), \(\varphi = \varphi _{0,d}\), \(\mu = \varphi _{1,d}\) for \(d \in {\mathbb {N}}\) in the notation of [36, Theorem 3.1]) prove for all \(N, d \in {\mathbb {N}}\) that
Combining this, (4.47), (4.51), (4.52), (4.56), (4.57), (4.58), (4.62), (4.63), (4.65), (4.66), (4.69), and Theorem 2.3 (with \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \({\mathfrak {n}}_0= \nicefrac {1}{2}\), \({\mathfrak {n}}_1 = 0\), \({\mathfrak {n}}_2 =2\), \({\mathfrak {e}}= {\mathfrak {e}}\), \({\mathfrak {d}}_0 = {\mathfrak {d}}_6 + ({\mathfrak {d}}_1 + {\mathfrak {d}}_2)(\theta +1)\), \({\mathfrak {d}}_1 = {\mathfrak {d}}_1\), \({\mathfrak {d}}_2 = {\mathfrak {d}}_2\), \({\mathfrak {d}}_3 = \max \{4, {\mathfrak {d}}_3\}\), \({\mathfrak {d}}_4 = {\mathfrak {d}}_4\), \({\mathfrak {d}}_5 = {\mathfrak {d}}_5\), \({\mathfrak {d}}_6 = {\mathfrak {d}}_6\), \( {\mathfrak {C}}=2^{4\theta +6} | \!\max \{1, T\} |^{\theta +1} |\!\max \{ \kappa , \theta \}|^{2\theta +3} e^{(6\max \{ \kappa , \theta \}+5|\!\max \{ \kappa , \theta \}|^2 T)} p (p \theta + p +1)^{\theta }\), \(p=p\), \(\theta = \theta \), \(M_N= N\), \(Z^{N, d, m}_n = Z^{N, d, m}_n\), \(f_{N, d} = f_{N, d}\), \(Y^{N, d, x}_l = Y^{N, d, x}_l\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \(\nu _d = \nu _d\), \(g_{ N, d } = g_{ N, d }\), \(u_d(x) = u_d(T,x)\), \({\mathbf {N}}= {\mathbf {N}}\), \({\mathcal {P}}= {\mathcal {P}}\), \({\mathcal {D}}= {\mathcal {D}}\), \({\mathcal {R}}= {\mathcal {R}}_{a}\), \({\mathfrak {N}}_{d, \varepsilon } = {\mathfrak {N}}_{d, \varepsilon }\), \({\mathbf {f}}^{N, d}_{\varepsilon , z} = {\mathbf {f}}^{N, d}_{\varepsilon , z}\), \({\mathbf {g}}^{N, d}_{\varepsilon } = {\mathbf {g}}^{N, d}_{\varepsilon }\), \({\mathfrak {I}}_d = {\mathfrak {I}}_d\) for \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(n \in \{0, 1, \ldots , N-1\}\), \(l \in \{0, 1, \ldots , N\}\), \(\varepsilon \in (0, 1]\), \(x, z \in {\mathbb {R}}^d\) in the notation of Theorem 2.3) establish (4.43). The proof of Theorem 4.5 is thus completed. \(\square \)
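The mechanism of the proof can be made tangible: each Euler step \(x \mapsto x + \tfrac{T}{N} ({\mathcal {R}}_{a}(\phi ^{1, d}_{\varepsilon }))(x) + z\), with the realized Brownian increment \(z\) absorbed into an output bias, is itself a ReLU network in which the skip connection is carried through the hidden layer by the identity trick of Lemma 3.16, and the \(N\) steps are then chained by the composition \(\bullet \). The sketch below is our illustration (it assumes, for simplicity, that the drift network has exactly one hidden layer) and reproduces the Euler recursion exactly:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def realize(phi, x):
    for W, b in phi[:-1]:
        x = relu(W @ x + b)
    W, b = phi[-1]
    return W @ x + b

def compose(phi2, phi1):
    (W1, b1), (W2, b2) = phi1[-1], phi2[0]
    return phi1[:-1] + [(W2 @ W1, W2 @ b1 + b2)] + phi2[1:]

def euler_step_net(f_net, h, z):
    """A ReLU net realizing x -> x + h * R(f_net)(x) + z for a one-hidden-layer
    f_net = [(W1, b1), (W2, b2)]; the identity channels relu(x) - relu(-x)
    carry the skip connection x through the hidden layer."""
    (W1, b1), (W2, b2) = f_net
    d = W1.shape[1]
    top = (np.vstack([np.eye(d), -np.eye(d), W1]),
           np.concatenate([np.zeros(2 * d), b1]))
    bottom = (np.hstack([np.eye(d), -np.eye(d), h * W2]), h * b2 + z)
    return [top, bottom]

rng = np.random.default_rng(3)
d, N, T = 2, 8, 1.0
f_net = [(rng.standard_normal((5, d)), rng.standard_normal(5)),
         (rng.standard_normal((d, 5)) / 5, rng.standard_normal(d))]
z = rng.normal(0.0, np.sqrt(T / N), (N, d))  # frozen Brownian increments
net = euler_step_net(f_net, T / N, z[0])
for k in range(1, N):
    net = compose(euler_step_net(f_net, T / N, z[k]), net)

x = rng.standard_normal(d)
y = x.copy()
for k in range(N):                            # the plain Euler recursion
    y = y + (T / N) * realize(f_net, y) + z[k]
assert np.allclose(realize(net, x), y)
```

Averaging \(M\) such chains through a construction of the type of Lemma 3.29 yields a single network realizing the Monte Carlo Euler estimator; this is the structure underlying the networks furnished by Theorem 2.3 and the reason why the parameter bounds grow only polynomially in \(d\) and \(\varepsilon ^{-1}\).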
Corollary 4.6
Let \( \varphi _{0,d} :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \( d \in {\mathbb {N}}\), and \( \varphi _{ 1, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), be functions, let \(\left\| \cdot \right\| \! :(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) that \(\Vert x\Vert = ( \textstyle \sum _{i=1}^d |x_i|^2)^{\nicefrac {1}{2}}\), let \( T, \kappa \in (0, \infty )\), \({\mathfrak {e}}, {\mathfrak {d}}_1, {\mathfrak {d}}_2, \ldots , {\mathfrak {d}}_6 \in [0, \infty )\), \(\theta \in [1, \infty )\), \(p \in [2, \infty )\), \( ( \phi ^{ m, d }_{ \varepsilon } )_{ (m, d, \varepsilon ) \in \{ 0, 1 \} \times {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\), \(a\in C({\mathbb {R}}, {\mathbb {R}})\) satisfy for all \(x \in {\mathbb {R}}\) that \(a(x) = \max \{x, 0\}\), assume for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \), \( m \in \{0, 1\}\), \( x, y \in {\mathbb {R}}^d \) that \( {\mathcal {R}}_{a}( \phi ^{ 0, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}) \), \( {\mathcal {R}}_{a}( \phi ^{ 1, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}^d ) \), \( {\mathcal {P}}( \phi ^{ m, d }_{ \varepsilon } ) \le \kappa d^{ 2^{(-m)} {\mathfrak {d}}_3 } \varepsilon ^{ - 2^{(-m)} {\mathfrak {e}}}\), \( |( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{{\mathfrak {d}}_6} (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \), \( \Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ {\mathfrak {d}}_1 + {\mathfrak {d}}_2 } + \Vert x \Vert ) \), \(| \varphi _{ 0, d }( x )| \le \kappa d^{ {\mathfrak {d}}_6 } ( d^{ \theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2) } + \Vert x \Vert ^{ \theta } )\), \( \Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), and
and for every \( d \in {\mathbb {N}}\) let \( u_d :[0,T] \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) be an at most polynomially growing viscosity solution of
$$\begin{aligned} \big ( \tfrac{\partial }{\partial t} u_d \big )(t, x) = \big ( \tfrac{\partial }{\partial x} u_d \big )(t, x) \, \varphi _{1, d}(x) + ( \Delta _x u_d )(t, x) \end{aligned}$$
with \( u_d( 0, x ) = \varphi _{ 0, d }( x ) \) for \( ( t, x ) \in (0,T) \times {\mathbb {R}}^d \) (cf. Definitions 3.1 and 3.3). Then there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {R}}_{a}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), \([ \int _{ [0, 1]^d } | u_d(T, x) - ( {\mathcal {R}}_{a} (\Psi _{ d, \varepsilon }) )( x ) |^p \, dx ]^{ \nicefrac { 1 }{ p } } \le \varepsilon \), and
Proof of Corollary 4.6
Throughout this proof for every \( d \in {\mathbb {N}}\) let \( \lambda _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0, \infty ]\) be the Lebesgue-Borel measure on \({\mathbb {R}}^d\) and let \(\nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\) be the function which satisfies for all \(B \in {\mathcal {B}}({\mathbb {R}}^d)\) that
$$\begin{aligned} \nu _d(B) = \lambda _d\big ( B \cap [0,1]^d \big ). \end{aligned}$$(4.81)
Observe that (4.81) implies that for all \(d \in {\mathbb {N}}\) it holds that \(\nu _d\) is a probability measure on \({\mathbb {R}}^d\). This and (4.81) ensure that for all \(d \in {\mathbb {N}}\), \(g \in C({\mathbb {R}}^d, {\mathbb {R}})\) it holds that
$$\begin{aligned} \int _{{\mathbb {R}}^d} g(x) \, \nu _d(dx) = \int _{[0,1]^d} g(x) \, dx. \end{aligned}$$
Combining this with, e.g., [24, Lemma 3.15] demonstrates that for all \(d \in {\mathbb {N}}\) it holds that
This assures for all \(d \in {\mathbb {N}}\) that
Moreover, note that for all \(d \in {\mathbb {N}}\) it holds that
$$\begin{aligned} {\text {Trace}}({\text {I}}_d) = d \le \max \{\kappa , 1\} \, d^{2 \max \{{\mathfrak {d}}_1, \nicefrac {1}{2}\}} \end{aligned}$$
(cf. Definition 3.6). This, (4.84), and Theorem 4.5 (with \(A_d = {\text {I}}_d\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \(\nu _d = \nu _d\), \(\varphi _{ 0, d } = \varphi _{ 0, d }\), \(\varphi _{ 1, d } = \varphi _{ 1, d }\), \(T =T\), \(\kappa = \max \{\kappa , 1\}\), \({\mathfrak {e}}= {\mathfrak {e}}\), \({\mathfrak {d}}_1 = \max \{{\mathfrak {d}}_1, \nicefrac {1}{2}\}\), \({\mathfrak {d}}_2 = {\mathfrak {d}}_2\), \({\mathfrak {d}}_3 = {\mathfrak {d}}_3\), \({\mathfrak {d}}_4 = {\mathfrak {d}}_4\), \({\mathfrak {d}}_5 = {\mathfrak {d}}_5\), \({\mathfrak {d}}_6 = {\mathfrak {d}}_6\), \(\theta = \theta \), \(p = p\), \(\phi ^{0, d}_{\varepsilon } = \phi ^{0, d}_{\varepsilon }\), \(\phi ^{1, d}_{\varepsilon } = \phi ^{1, d}_{\varepsilon }\), \(a = a\), \(u_d = u_d\) for \(d \in {\mathbb {N}}\) in the notation of Theorem 4.5) establish (4.80). The proof of Corollary 4.6 is thus completed. \(\square \)
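The moment hypothesis of Theorem 4.5 is easy to meet for the uniform distribution: on \([0,1]^d\) one has \(\Vert x\Vert \le \sqrt{d}\), so every \(\Vert \cdot \Vert \)-moment of \(\nu _d\) is at most \(d^{\nicefrac {1}{2}}\), which is why the application above may take the exponent \(\max \{{\mathfrak {d}}_1, \nicefrac {1}{2}\}\). A quick numerical confirmation (ours; plain Monte Carlo sampling):

```python
import numpy as np

rng = np.random.default_rng(4)
q = 8.0                              # plays the role of 2 p theta
for d in [1, 10, 100]:
    x = rng.random((10 ** 5, d))     # samples from nu_d = Unif([0,1]^d)
    moment = np.mean(np.linalg.norm(x, axis=1) ** q) ** (1.0 / q)
    print(d, moment, np.sqrt(d))     # moment <= sqrt(d)
```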
Corollary 4.7
Let \( A_d = ( A_{ d, i, j } )_{ (i, j) \in \{ 1, \dots , d \}^2 } \in {\mathbb {R}}^{ d \times d } \), \( d \in {\mathbb {N}}\), be symmetric positive semidefinite matrices, let \(\left\| \cdot \right\| \! :(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) that \(\Vert x\Vert = ( \textstyle \sum _{i=1}^d |x_i|^2)^{\nicefrac {1}{2}}\), for every \( d \in {\mathbb {N}}\) let \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\) be a probability measure on \({\mathbb {R}}^d\), let \( \varphi _{0,d} :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \( d \in {\mathbb {N}}\), and \( \varphi _{ 1, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), be functions, let \( T, \kappa , p \in (0, \infty )\), \(\theta \in [1, \infty )\), \( ( \phi ^{ m, d }_{ \varepsilon } )_{ (m, d, \varepsilon ) \in \{ 0, 1 \} \times {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\), \(a\in C({\mathbb {R}}, {\mathbb {R}})\) satisfy for all \(x \in {\mathbb {R}}\) that \(a(x) = \max \{x, 0\}\), assume for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \), \(m \in \{0, 1\}\), \( x, y \in {\mathbb {R}}^d \) that \( {\mathcal {R}}_{a}( \phi ^{ 0, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}) \), \( {\mathcal {R}}_{a}( \phi ^{ 1, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}^d ) \), \( | \varphi _{ 0, d }( x ) | + {\text {Trace}}(A_d) \le \kappa d^{ \kappa } ( 1 + \Vert x \Vert ^{ \theta }) \), \([ \int _{{\mathbb {R}}^d} \Vert x\Vert ^{2 \max \{p, 2\} \theta } \, \nu _d (dx) ]^{\nicefrac {1}{(2 \max \{p, 2\} \theta )}} \le \kappa d^{\kappa }\), \( {\mathcal {P}}( \phi ^{ m, d }_{ \varepsilon } ) \le \kappa d^{ \kappa } \varepsilon ^{ - \kappa } \), \( |( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{\kappa } (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \), \( \Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ \kappa } + \Vert x \Vert ) \), \( \Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), and
and for every \( d \in {\mathbb {N}}\) let \( u_d :[0,T] \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) be an at most polynomially growing viscosity solution of
$$\begin{aligned} \big ( \tfrac{\partial }{\partial t} u_d \big )(t, x) = \big ( \tfrac{\partial }{\partial x} u_d \big )(t, x) \, \varphi _{1, d}(x) + \textstyle \sum _{i, j = 1}^d A_{d, i, j} \, \big ( \tfrac{\partial ^2}{\partial x_i \partial x_j} u_d \big )(t, x) \end{aligned}$$
with \( u_d( 0, x ) = \varphi _{ 0, d }( x ) \) for \( ( t, x ) \in (0,T) \times {\mathbb {R}}^d \) (cf. Definitions 3.1 and 3.3). Then there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {P}}( \Psi _{ d, \varepsilon } ) \le c \, d^c \varepsilon ^{ - c } \), \( {\mathcal {R}}_{a}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), and
$$\begin{aligned} \Big [ \int _{ {\mathbb {R}}^d } | u_d(T, x) - ( {\mathcal {R}}_{a} (\Psi _{ d, \varepsilon }) )( x ) |^p \, \nu _d(dx) \Big ]^{ \nicefrac { 1 }{ p } } \le \varepsilon . \end{aligned}$$
References
Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A.: Deep splitting method for parabolic PDEs. SIAM J. Sci. Comput. 43(5), A3135–A3154 (2021)
Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving the Kolmogorov PDE by means of deep learning. J. Sci. Comput. 88(3), Paper No. 73 (2021)
Beck, C., E, W., Jentzen, A.: Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci. 29(4), 1563–1619 (2019)
Beck, C., Hutzenthaler, M., Jentzen, A., Kuckuck, B.: An overview on deep learning-based approximation methods for partial differential equations. Revision requested from Discrete Contin. Dyn. Syst. Ser. B., arXiv:2012.12348 (2020)
Becker, S., Cheridito, P., Jentzen, A.: Deep optimal stopping. J. Mach. Learn. Res. 20, Paper No. 74, 25 pp. (2019)
Becker, S., Cheridito, P., Jentzen, A., Welti, T.: Solving high-dimensional optimal stopping problems using deep learning. Eur. J. Appl. Math. 32(3), 470–514 (2021)
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
Bensoussan, A., Lions, J.-L.: Applications of Variational Inequalities in Stochastic Control. Studies in Mathematics and Its Applications, vol. 12. North-Holland Publishing Co., Amsterdam (1982)
Berg, J., Nyström, K.: A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317, 28–41 (2018)
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. SIAM J. Math. Data Sci. 2(3), 631–657 (2020)
Chan-Wai-Nam, Q., Mikael, J., Warin, X.: Machine learning for semi linear PDEs. J. Sci. Comput. 79(3), 1667–1712 (2019)
Chouiekh, A., Haj, E.H.I.E.: Convnets for fraud detection analysis. Proc. Comput. Sci. 127, 133–138 (2018)
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 30–42 (2012)
Dissanayake, M.W.M.G., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Meth. Eng. 10(3), 195–201 (1994)
Dockhorn, T.: A Discussion on Solving Partial Differential Equations Using Neural Networks. arXiv:1904.07200 (2019)
E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017)
E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
Elbrächter, D., Grohs, P., Jentzen, A., Schwab, C.: DNN expression rate analysis of high-dimensional PDEs: application to option pricing. Constr. Approx. 55, 3–71 (2022)
Farahmand, A.-M., Nabi, S., Nikovski, D.: Deep reinforcement learning for partial differential equation control. In: 2017 American Control Conference (ACC), pp 3120–3127 (2017)
Fujii, M., Takahashi, A., Takahashi, M.: Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pac. Financ. Markets 26(3), 391–408 (2019)
Gonon, L., Grohs, P., Jentzen, A., Kofler, D., Šiška, D.: Uniform error estimates for artificial neural network approximations for heat equations. IMA J. Numer. Anal. (2021). https://doi.org/10.1093/imanum/drab027
Goudenège, L., Molent, A., Zanette, A.: Machine Learning for Pricing American Options in High Dimension. arXiv:1903.11275 (2019)
Graves, A., Mohamed, A.-R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, ICASSP, pp. 6645–6649 (2013)
Grohs, P., Hornung, F., Jentzen, A., von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. Accepted in Mem. Am. Math. Soc. arXiv:1809.02362 (2018)
Grohs, P., Hornung, F., Jentzen, A., Zimmermann, P.: Space-time error estimates for deep neural network approximations for differential equations. Revision requested from Adv. Comput. Math. arXiv:1908.03833 (2019)
Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
Han, J., Long, J.: Convergence of the deep BSDE method for coupled FBSDEs. Probab. Uncertain. Quant. Risk 5, 5 (2020)
Henry-Labordère, P.: Deep primal-dual algorithm for BSDEs: applications of machine learning to CVA and IM. SSRN (2017). https://doi.org/10.2139/ssrn.3071506
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.-R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Hornung, F., Jentzen, A., Salimova, D.: Space-time Deep Neural Network Approximations for High-dimensional Partial Differential Equations. arXiv:2006.02199 (2020)
Hu, B., Lu, Z., Li, H., Chen, Q.: Convolutional neural network architectures for matching natural language sentences. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2 pp. 2042–2050 (2014)
Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2261–2269 (2017)
Huré, C., Pham, H., Warin, X.: Some Machine Learning Schemes for High-dimensional Nonlinear PDEs. arXiv:1902.01599 (2019)
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differ. Equ. Appl. 1(10), 1–34 (2020)
Jacquier, A., Oumgari, M.: Deep PPDEs for Rough Local Stochastic Volatility. arXiv:1906.02551 (2019)
Jentzen, A., Salimova, D., Welti, T.: A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. Commun. Math. Sci. 19(5), 1167–1205 (2021)
Jianyu, L., Siwei, L., Yingjian, Q., Yaping, H.: Numerical solution of elliptic partial differential equation using radial basis function neural networks. Neural Netw. 16(5), 729–734 (2003)
Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 655–665 (2014)
Khoo, Y., Lu, J., Ying, L.: Solving parametric PDE problems with artificial neural networks. Eur. J. Appl. Math. 32(3), 421–435 (2021)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
Kutyniok, G., Petersen, P., Raslan, M., Schneider, R.: A theoretical analysis of deep neural networks and parametric PDEs. Constr. Approx. 55, 73–125 (2022)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998)
Long, Z., Lu, Y., Ma, X., Dong, B.: PDE-Net: learning PDEs from data. In: Proceedings of the 35th International Conference on Machine Learning, pp. 3208–3216 (2018)
Lye, K.O., Mishra, S., Ray, D.: Deep learning observables in computational fluid dynamics. J. Comput. Phys. 410, 109339 (2020)
Magill, M., Qureshi, F., de Haan, H.W.: Neural networks trained to solve differential equations learn general representations. In: Advances in Neural Information Processing Systems, pp. 4071–4081 (2018)
Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems. Volume I: Linear Information, vol. 6 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich (2008)
Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems. Volume II: Standard Information for Functionals, vol. 12 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich (2010)
Pham, H., Warin, X.: Neural networks-based backward scheme for fully nonlinear PDEs. arXiv:1908.00412 (2019)
Raissi, M.: Deep hidden physics models: deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 19, 25:1–25:24 (2018)
Reisinger, C., Zhang, Y.: Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. Anal. Appl. (Singap.) 18(6), 951–999 (2020)
Roy, A., Sun, J., Mahoney, R., Alonzi, L., Adams, S., Beling, P.: Deep learning detecting fraud in credit card transactions. In: 2018 Systems and Information Engineering Design Symposium (SIEDS), pp. 129–134 (2018)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708 (2014)
Wang, R., Fu, B., Fu, G., Wang, M.: Deep & cross network for ad click predictions. In: Proceedings of the ADKDD’17 (2017)
Wang, W., Yang, J., Xiao, J., Li, S., Zhou, D.: Face recognition based on deep learning. In: Human Centered Computing, pp. 812–820 (2015)
Wu, C., Karanasou, P., Gales, M.J., Sim, K.C.: Stimulated deep neural network for speech recognition. In: Interspeech 2016, pp. 400–404 (2016)
Zhai, S., Chang, K.-h., Zhang, R., Zhang, Z.M.: DeepIntent: learning attentions for online advertising with recurrent neural networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1295–1304 (2016)
Acknowledgements
This work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2044-390685587, Mathematics Münster: Dynamics-Geometry-Structure, by the Swiss National Science Foundation (SNSF) through the research grant 200020_175699, by ETH Foundations of Data Science (ETH-FDS), and by the startup fund project of Shenzhen Research Institute of Big Data through the research grant T00120220001.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
This article is part of the topical collection “Deep learning and PDEs” edited by Arnulf Jentzen, Lin Lin, Siddhartha Mishra, and Lexing Ying.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Grohs, P., Jentzen, A. & Salimova, D. Deep neural network approximations for solutions of PDEs based on Monte Carlo algorithms. Partial Differ. Equ. Appl. 3, 45 (2022). https://doi.org/10.1007/s42985-021-00100-z
Mathematics Subject Classification
- 65C99
- 68T05