Abstract
In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including, e.g., language processing, image recognition, fraud detection, and computational advertisement. Recently, it has also been proposed in the scientific literature to reformulate high-dimensional partial differential equations (PDEs) as stochastic learning problems and to employ DNNs together with stochastic gradient descent methods to approximate the solutions of such high-dimensional PDEs. There are also a few mathematical convergence results in the scientific literature which show that DNNs can approximate solutions of certain PDEs without the curse of dimensionality in the sense that the number of real parameters employed to describe the DNN grows at most polynomially both in the PDE dimension \(d \in {\mathbb {N}}\) and the reciprocal of the prescribed approximation accuracy \(\varepsilon > 0\). One key argument in most of these results is, first, to employ a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the employed approximation scheme. Having this in mind, one could aim for a general abstract result which shows under suitable assumptions that if a certain function can be approximated by any kind of (Monte Carlo) approximation scheme without the curse of dimensionality, then the function can also be approximated with DNNs without the curse of dimensionality. It is a subject of this article to make a first step in this direction. In particular, the main result of this paper, roughly speaking, shows that if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality. Moreover, for the number of real parameters used to describe such approximating DNNs we provide an explicit upper bound for the optimal exponent of the dimension \(d \in {\mathbb {N}}\) of the function under consideration as well as an explicit lower bound for the optimal exponent of the prescribed approximation accuracy \(\varepsilon >0\). As an application of this result we derive that solutions of suitable Kolmogorov PDEs can be approximated with DNNs without the curse of dimensionality.
1 Introduction
In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including, e.g., language processing (cf., e.g., [13, 23, 29, 31, 38, 57]), image recognition (cf., e.g., [32, 40, 52, 54, 56]), fraud detection (cf., e.g., [12, 51]), and computational advertisement (cf., e.g., [55, 58]).
Recently, it has also been proposed to reformulate high-dimensional partial differential equations (PDEs) as stochastic learning problems and to employ DNNs together with stochastic gradient descent methods to approximate the solutions of such high-dimensional PDEs [3, 16, 17, 20, 26, 39, 53] (cf. also, e.g., [14, 37, 42]). We refer, e.g., to [1, 2, 4, 5, 6, 9, 11, 15, 19, 22, 27, 28, 33, 35, 43, 44, 45, 48, 49] and the references mentioned therein for further developments and extensions of such deep learning based numerical approximation methods for PDEs. In particular, the references [2, 9, 17, 35, 45] deal with linear PDEs (and the stochastic differential equations (SDEs) related to them, respectively), the references [1, 11, 15, 19, 20, 28, 33] deal with semilinear PDEs (and the backward stochastic differential equations (BSDEs) related to them, respectively), the references [3, 43, 48, 49] deal with fully nonlinear PDEs (and the second-order backward stochastic differential equations (2BSDEs) related to them, respectively), the references [27, 44, 53] deal with certain specific subclasses of fully nonlinear PDEs (and the 2BSDEs related to them, respectively), and the references [5, 6, 22, 53] deal with free boundary PDEs (and the optimal stopping/option pricing problems related to them (see, e.g., [8, Chapter 1]), respectively).
In the scientific literature there are also a few rigorous mathematical convergence results for DNN based approximation methods for PDEs. For example, the references [27, 53] provide mathematical convergence results for such DNN based approximation methods for PDEs without any information on the convergence speed and, for instance, the references [10, 18, 21, 24, 25, 30, 34, 36, 41, 50] provide mathematical convergence results of such DNN based approximation methods for PDEs with dimension-independent convergence rates and error constants which are only polynomially dependent on the dimension. In particular, the latter references show that DNNs can approximate solutions of certain PDEs without the curse of dimensionality (cf. [7]) in the sense that the number of real parameters employed to describe the DNN grows at most polynomially both in the PDE dimension \(d \in {\mathbb {N}}\) and the reciprocal of the prescribed approximation accuracy \(\varepsilon > 0\) (cf., e.g., [46, Chapter 1] and [47, Chapter 9]).
One key argument in most of these articles is, first, to employ a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the employed approximation scheme (cf., e.g., [36, Section 2 and (i)–(iii) in Section 1] and [24]). Having this in mind, one could aim for a general abstract result which shows under suitable assumptions that if a certain function can be approximated by any kind of (Monte Carlo) approximation scheme without the curse of dimensionality, then the function can also be approximated with DNNs without the curse of dimensionality.
It is a subject of this article to make a first step in this direction. In particular, the main result of this paper, Theorem 2.3 below, roughly speaking, shows that if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality (cf. (2.9) in Theorem 2.3 below) and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality. Moreover, for the number of real parameters used to describe such approximating DNNs we provide in Theorem 2.3 below an explicit upper bound for the optimal exponent of the dimension \(d \in {\mathbb {N}}\) of the function under consideration as well as an explicit lower bound for the optimal exponent of the prescribed approximation accuracy \(\varepsilon >0\) (see (2.16) in Theorem 2.3 below).
In our applications of Theorem 2.3 we employ Theorem 2.3 to study in Theorem 4.5 below DNN approximations for PDEs. Theorem 4.5 can be considered as a special case of Theorem 2.3 with the function to be approximated being the solution of a suitable Kolmogorov PDE (cf. (4.42) below) at the final time \(T \in (0, \infty )\) and the approximation scheme being the Monte Carlo Euler scheme. In particular, Theorem 4.5 shows that solutions of suitable Kolmogorov PDEs can be approximated with DNNs without the curse of dimensionality. For the number of real parameters used to describe such approximating DNNs Theorem 4.5 also provides an explicit upper bound for the optimal exponent of the dimension \(d \in {\mathbb {N}}\) of the PDE under consideration as well as an explicit lower bound for the optimal exponent of the prescribed approximation accuracy \(\varepsilon >0\) (see (4.43) below). In order to illustrate the findings of Theorem 4.5 below, we now present in Theorem 1.1 below a special case of Theorem 4.5.
Theorem 1.1
Let \( \varphi _{0,d} \in C({\mathbb {R}}^d, {\mathbb {R}}) \), \( d \in {\mathbb {N}}\), and \( \varphi _{ 1, d } \in C({\mathbb {R}}^d, {\mathbb {R}}^d) \), \( d \in {\mathbb {N}}\), let \(\left\| \cdot \right\| \! :(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) and \({\mathfrak {R}}:(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow (\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d)\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, \ldots , x_d) \in {\mathbb {R}}^d\) that
$$\begin{aligned} \Vert x \Vert = \big [ \textstyle \sum _{i=1}^{d} |x_i|^2 \big ]^{\nicefrac {1}{2}} \qquad \text {and} \qquad {\mathfrak {R}}(x) = ( \max \{ x_1, 0 \}, \max \{ x_2, 0 \}, \ldots , \max \{ x_d, 0 \} ), \end{aligned}$$(1.1)
let \({\mathbf {N}}= \cup _{L \in {\mathbb {N}}} \cup _{ l_0,l_1,\ldots , l_L\in {\mathbb {N}}} ( \times _{k = 1}^L ({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k}) )\), let \({\mathcal {P}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\) and \({\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{k,l\in {\mathbb {N}}} C({\mathbb {R}}^k,{\mathbb {R}}^l))\) satisfy for all \( L\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L \in {\mathbb {N}}\), \( \Phi = ((W_1, B_1),\ldots , (W_L,B_L)) \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k})) \), \(x_0 \in {\mathbb {R}}^{l_0}, x_1 \in {\mathbb {R}}^{l_1}, \ldots , x_{L} \in {\mathbb {R}}^{l_{L}}\) with \(\forall \, k \in {\mathbb {N}}\cap (0,L) :x_k = {\mathfrak {R}}(W_k x_{k-1} + B_k)\) that \({\mathcal {P}}(\Phi ) = \sum _{k = 1}^L l_k(l_{k-1} + 1) \), \({\mathcal {R}}(\Phi ) \in C({\mathbb {R}}^{l_0},{\mathbb {R}}^{l_L})\), and
$$\begin{aligned} ({\mathcal {R}}(\Phi ))(x_0) = W_L x_{L-1} + B_L, \end{aligned}$$(1.2)
let \( T, \kappa , {\mathfrak {e}}\in (0, \infty )\), \({\mathfrak {d}}\in [4, \infty )\), \(\theta \in [1, \infty )\), \( ( \phi ^{ m, d }_{ \varepsilon } )_{ (m, d, \varepsilon ) \in \{ 0, 1 \} \times {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\), assume for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \), \( m \in \{0, 1\}\), \( x, y \in {\mathbb {R}}^d \) that
and \( \Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), and for every \( d \in {\mathbb {N}}\) let \( u_d :[0,T] \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) be an at most polynomially growing viscosity solution of
with \( u_d( 0, x ) = \varphi _{ 0, d }( x ) \) for \( ( t, x ) \in (0,T) \times {\mathbb {R}}^d \). Then for every \(p \in (0, \infty )\) there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {R}}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), \([ \int _{ [0, 1]^d } | u_d(T, x) - ( {\mathcal {R}}(\Psi _{ d, \varepsilon }) )( x ) |^p \, dx ]^{ \nicefrac { 1 }{ p } } \le \varepsilon \), and
Theorem 1.1 is an immediate consequence of Corollary 4.6 in Sect. 4 below. Corollary 4.6, in turn, is a special case of Theorem 4.5. Let us add some comments regarding the mathematical objects appearing in Theorem 1.1.
The set \( {\mathbf {N}}\) in Theorem 1.1 above is a set of tuples of pairs of real matrices and real vectors and this set represents the set of all DNNs (see also Definition 3.1 below). The function \({\mathfrak {R}}:(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow (\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d)\) in Theorem 1.1 represents multidimensional rectifier functions. Theorem 1.1 is thus an approximation result for rectified DNNs.
Moreover, for every DNN \( \Phi \in {\mathbf {N}}\) in Theorem 1.1 above \( {\mathcal {P}}( \Phi ) \in {\mathbb {N}}\) represents the number of real parameters which are used to describe the DNN \( \Phi \) (see also Definition 3.1 below). In particular, for every DNN \( \Phi \in {\mathbf {N}}\) in Theorem 1.1 one can think of \( {\mathcal {P}}( \Phi ) \in {\mathbb {N}}\) as a number proportional to the amount of memory storage needed to store the DNN \(\Phi \). Furthermore, the function \( {\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{ k, l \in {\mathbb {N}}} C( {\mathbb {R}}^k, {\mathbb {R}}^l )) \) from the set \( {\mathbf {N}}\) of “all DNNs” to the union \( \cup _{ k, l \in {\mathbb {N}}} C( {\mathbb {R}}^k, {\mathbb {R}}^l ) \) of continuous functions describes the realization functions associated to the DNNs (see also Definition 3.3 below).
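To make the functions \({\mathcal {P}}\) and \({\mathcal {R}}\) concrete, the following Python sketch implements the parameter count and the realization function for a DNN stored as a tuple of weight-bias pairs. The sketch is an illustration only and not part of the formal development; the tuple format and the rectifier activation follow the description above, while the concrete layer dimensions in the example are arbitrary choices of ours.

```python
import numpy as np

# A DNN Phi is a tuple ((W_1, B_1), ..., (W_L, B_L)) of weight-bias pairs.

def num_params(phi):
    # P(Phi) = sum_{k=1}^{L} l_k (l_{k-1} + 1)
    return sum(W.size + B.size for (W, B) in phi)

def realization(phi, x):
    # R(Phi): affine maps with the multidimensional rectifier applied
    # after every layer except the last one.
    *hidden, (W_last, B_last) = phi
    for (W, B) in hidden:
        x = np.maximum(W @ x + B, 0.0)
    return W_last @ x + B_last

# Example: a DNN with layer dimensions (l_0, l_1, l_2) = (3, 5, 1).
rng = np.random.default_rng(0)
phi = ((rng.standard_normal((5, 3)), rng.standard_normal(5)),
       (rng.standard_normal((1, 5)), rng.standard_normal(1)))
print(num_params(phi))                 # 5 * (3 + 1) + 1 * (5 + 1) = 26
print(realization(phi, np.ones(3)))
```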
The real number \( T > 0 \) in Theorem 1.1 describes the time horizon under consideration and the real numbers \( \kappa , {\mathfrak {e}}, {\mathfrak {d}}, \theta \in {\mathbb {R}}\) in Theorem 1.1 are constants used to formulate the assumptions in Theorem 1.1. The key assumption in Theorem 1.1 is the hypothesis that both the possibly nonlinear initial value functions \( \varphi _{ 0, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \( d \in {\mathbb {N}}\), and the possibly nonlinear drift coefficient functions \( \varphi _{ 1, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), of the PDEs in (1.7) can be approximated by means of DNNs without the curse of dimensionality (see (1.3)–(1.6) above for details).
Results related to Theorem 4.5 have been established in [24, Theorem 3.14], [36, Theorem 1.1], [34, Theorem 4.1], and [50, Corollary 2.2]. Theorem 3.14 in [24] proves a statement similar to (1.8) for a different class of PDEs than (1.7), that is, Theorem 3.14 in [24] deals with Black–Scholes PDEs with affine linear coefficient functions while in (1.7) the diffusion coefficient is constant and the drift coefficient may be nonlinear. Theorem 1.1 in [36] shows the existence of constants and exponents of \(d \in {\mathbb {N}}\) and \(\varepsilon >0\) such that (1.8) holds but does not provide any explicit form for these exponents. Theorem 4.1 in [34] studies a different class of PDEs than (1.7) (the diffusion coefficient is chosen so that the second order term is the Laplacian and the drift coefficient is chosen to be zero but there is a nonlinearity depending on the PDE solution in the PDE in Theorem 4.1 in [34]) and provides an explicit exponent for \(\varepsilon >0\) and the existence of constants and exponents of \(d \in {\mathbb {N}}\) such that (1.8) holds. Corollary 2.2 in [50] studies a more general class of Kolmogorov PDEs than (1.7) and shows the existence of constants and exponents of \(d \in {\mathbb {N}}\) and \(\varepsilon >0\) such that (1.8) holds. Theorem 4.5 above extends these results by providing explicit exponents for \(d \in {\mathbb {N}}\) and \(\varepsilon > 0\) in terms of the assumptions used such that (1.8) holds and, in addition, Theorem 4.5 can be considered as a special case of the general DNN approximation result in Theorem 2.3 with the functions to be approximated being the solutions of the PDEs in (1.7) at the final time \(T \in (0, \infty )\) and the approximation scheme being the Monte Carlo Euler scheme.
The remainder of this article is organized as follows. In Sect. 2 we present Theorem 2.3, which is the main result of this paper. The proof of Theorem 2.3 employs the elementary result in Lemma 2.2. Lemma 2.2 establishes suitable a priori bounds for random variables and follows from the well-known discrete Gronwall-type inequality in Lemma 2.1 below. In Sect. 3 we develop in Lemmas 3.29 and 3.30 a few elementary results on representation flexibilities of DNNs. The proofs of Lemmas 3.29 and 3.30 use results on a certain artificial neural network (ANN) calculus which we recall and extend in Sects. 3.1–3.7. In Sect. 4 in Theorem 4.5 we employ Lemmas 3.29 and 3.30 to establish the existence of DNNs which approximate solutions of suitable Kolmogorov PDEs without the curse of dimensionality. In our proof of Theorem 4.5 we also employ error estimates for the Monte Carlo Euler method which we present in Proposition 4.4 in Sect. 4. The proof of Proposition 4.4, in turn, makes use of the elementary error estimate results in Lemmas 4.1–4.3 below.
2 Deep artificial neural network (DNN) approximations
In this section we show in Theorem 2.3 below that, roughly speaking, if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality.
In our proof of Theorem 2.3 we employ the elementary a priori estimates for expectations of certain random variables in Lemma 2.2 below. Lemma 2.2, in turn, follows from the well-known discrete Gronwall-type inequality in Lemma 2.1 below. For completeness we include in this section a detailed proof of Lemma 2.1.
2.1 A priori bounds for random variables
Lemma 2.1
Let \(\alpha \in [0, \infty )\), \( \beta \in [0, \infty ]\) and let \( x :{\mathbb {N}}_0 \rightarrow {\mathbb {R}}\) satisfy for all \(n \in {\mathbb {N}}\) that \(x_n \le \alpha x_{n-1} + \beta \). Then it holds for all \(n \in {\mathbb {N}}\) that
$$\begin{aligned} x_n \le \alpha ^n x_0 + \beta \left( 1 + \alpha + \ldots + \alpha ^{n-1} \right) . \end{aligned}$$(2.1)
Proof of Lemma 2.1
We prove (2.1) by induction on \(n \in {\mathbb {N}}\). For the base case \(n=1\) note that the hypothesis that \(\forall \, k \in {\mathbb {N}}:x_k \le \alpha x_{k-1} + \beta \) ensures that
This establishes (2.1) in the base case \(n=1\). For the induction step \({\mathbb {N}}\ni (n-1) \rightarrow n \in {\mathbb {N}}\cap [2, \infty )\) observe that the hypothesis that \(\forall \, k \in {\mathbb {N}}:x_k \le \alpha x_{k-1} + \beta \) implies that for all \(n \in {\mathbb {N}}\cap [2, \infty )\) with \(x_{n-1} \le \alpha ^{n-1} x_0 + \beta (1 + \alpha + \ldots + \alpha ^{n-2})\) it holds that
Induction thus establishes (2.1). This completes the proof of Lemma 2.1. \(\square \)
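The bound in Lemma 2.1 can also be checked numerically; the following snippet (an illustration only, with arbitrarily chosen values for \(\alpha \), \(\beta \), and \(x_0\)) verifies that the worst-case recursion \(x_n = \alpha x_{n-1} + \beta \) attains the right-hand side of (2.1) with equality.

```python
alpha, beta, x0 = 1.1, 0.5, 2.0
x = x0
for n in range(1, 20):
    x = alpha * x + beta  # equality case of x_n <= alpha x_{n-1} + beta
    bound = alpha**n * x0 + beta * sum(alpha**k for k in range(n))
    assert abs(x - bound) < 1e-9
```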
Lemma 2.2
Let \(N \in {\mathbb {N}}\), \(p \in [1, \infty )\), \(\alpha , \beta , \gamma \in [0, \infty )\) and let \(X_n :\Omega \rightarrow {\mathbb {R}}\), \(n \in \{0, 1, \ldots , N\}\), and \(Z_n :\Omega \rightarrow {\mathbb {R}}\), \(n \in \{0, 1, \ldots , N-1\}\), be random variables which satisfy for all \(n \in \{1, 2, \ldots , N\}\) that
Then it holds that
Proof of Lemma 2.2
First, note that (2.4) implies for all \(n \in \{1, 2, \ldots , N\}\) that
Lemma 2.1 (with \(\alpha = \alpha \), \(\beta = \beta \, [ \gamma + \sup \nolimits _{i \in \{0, 1, \ldots , N -1 \}} ( {\mathbb {E}}[ |Z_{i} |^p ] )^{\nicefrac {1}{p}} ]\) in the notation of Lemma 2.1) hence establishes for all \(n \in \{1, 2, \ldots , N\}\) that
The proof of Lemma 2.2 is thus completed. \(\square \)
2.2 A DNN approximation result for Monte Carlo algorithms
Theorem 2.3
Let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \( {\mathfrak {n}}_0 \in (0, \infty )\), \({\mathfrak {n}}_1, {\mathfrak {n}}_2, {\mathfrak {e}}, {\mathfrak {d}}_0, {\mathfrak {d}}_1, \ldots , {\mathfrak {d}}_6 \in [0, \infty )\), \({\mathfrak {C}}, p, \theta \in [1, \infty )\), \((M_{N})_{N \in {\mathbb {N}}} \subseteq {\mathbb {N}}\), let \(Z^{N, d, m}_n :\Omega \rightarrow {\mathbb {R}}^{d} \), \(n \in \{0, 1, \ldots , N-1\}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(d, N \in {\mathbb {N}}\), be random variables, let \(f_{N, d} \in C( {\mathbb {R}}^{d} \times {\mathbb {R}}^{d}, {\mathbb {R}}^{d})\), \(d, N \in {\mathbb {N}}\), and \(Y^{N, d, x}_n = (Y^{N, d, m, x}_n)_{m \in \{1, 2, \ldots , M_{N}\}} :\Omega \rightarrow {\mathbb {R}}^{M_N d}\), \(n \in \{0, 1, \ldots , N\}\), \(x \in {\mathbb {R}}^d\), \(d, N \in {\mathbb {N}}\), satisfy for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(x \in {\mathbb {R}}^d\), \(n \in \{1, 2, \ldots , N\}\), \(\omega \in \Omega \) that \(Y^{N, d, m, x}_{0}(\omega ) = x\) and
let \(\left\| \cdot \right\| \!:(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) that \(\Vert x\Vert = ( \textstyle \sum _{i=1}^d |x_i|^2)^{\nicefrac {1}{2}}\), for every \(d \in {\mathbb {N}}\) let \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\) be a probability measure on \({\mathbb {R}}^d\), let \(g_{N, d} \in C( {\mathbb {R}}^{Nd}, {\mathbb {R}})\), \( d, N \in {\mathbb {N}}\), and \(u_d \in C({\mathbb {R}}^d, {\mathbb {R}})\), \(d \in {\mathbb {N}}\), satisfy for all \( N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(n \in \{0, 1, \ldots , N-1\}\) that
let \({\mathbf {N}}\) be a set, let \( {\mathcal {P}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {D}} :{\mathbf {N}}\rightarrow ( \cup _{L \in {\mathbb {N}}} {\mathbb {N}}^{L})\), and \( {\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{ k, l \in {\mathbb {N}}} C( {\mathbb {R}}^k, {\mathbb {R}}^l )) \) be functions, let \({\mathfrak {N}}_{d, \varepsilon } \subseteq {\mathbf {N}}\), \( \varepsilon \in (0, 1]\), \(d \in {\mathbb {N}}\), let \(({\mathbf {f}}^{N, d}_{\varepsilon , z})_{(N, d, \varepsilon , z) \in {\mathbb {N}}^2 \times (0, 1] \times {\mathbb {R}}^d } \subseteq {\mathbf {N}}\), \(({\mathbf {g}}^{N, d}_{\varepsilon })_{(N, d, \varepsilon ) \in {\mathbb {N}}^2 \times (0, 1] } \subseteq {\mathbf {N}}\), \(({\mathfrak {I}}_{d})_{d \in {\mathbb {N}}} \subseteq {\mathbf {N}}\), assume for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x, y, z \in {\mathbb {R}}^d\) that \( {\mathfrak {N}}_{d, \varepsilon } \subseteq \{\Phi \in {\mathbf {N}}:{\mathcal {R}}( \Phi ) \in C({\mathbb {R}}^d, {\mathbb {R}}^d) \}\), \({\mathfrak {I}}_d \in {\mathfrak {N}}_{d, \varepsilon }\), \(({\mathcal {R}}({\mathfrak {I}}_d))(x) = x\), \({\mathcal {P}}({\mathfrak {I}}_d) \le {\mathfrak {C}}d^{{\mathfrak {d}}_3} \), \( {\mathcal {R}}( {\mathbf {f}}^{N, d}_{\varepsilon , z}) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), \(({\mathbb {R}}^d \ni {\mathfrak {z}} \mapsto ( {\mathcal {R}}({\mathbf {f}}^{N, d}_{\varepsilon , {\mathfrak {z}}}))(x) \in {\mathbb {R}}^d)\) is \({\mathcal {B}}({\mathbb {R}}^d) /{\mathcal {B}}({\mathbb {R}}^d)\)-measurable, and
assume for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) that there exist \((\phi _z)_{z \in {\mathbb {R}}^d} \subseteq {\mathfrak {N}}_{d, \varepsilon }\) such that for all \(x, z, {\mathfrak {z}} \in {\mathbb {R}}^d\) it holds that \( ({\mathcal {R}}(\phi _z)) (x) = ( {\mathcal {R}}({\mathbf {f}}^{N, d}_{\varepsilon , z}))(({\mathcal {R}}(\Phi ))(x)) \), \({\mathcal {P}}(\phi _z) \le {\mathcal {P}}(\Phi ) + {\mathfrak {C}}N^{{\mathfrak {n}}_1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}}\), and \( {\mathcal {D}} (\phi _z) = {\mathcal {D}} (\phi _{{\mathfrak {z}}})\), assume for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\), \(y = (y_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) that \( {\mathcal {R}}({\mathbf {g}}^{N, d}_{\varepsilon }) \in C({\mathbb {R}}^{Nd}, {\mathbb {R}})\) and
and assume for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi _1, \Phi _2, \ldots , \Phi _{M_N} \in {\mathfrak {N}}_{d, \varepsilon }\) with \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \cdots = {\mathcal {D}}(\Phi _{M_N})\) that there exists \(\varphi \in {\mathbf {N}}\) such that for all \(x \in {\mathbb {R}}^d\) it holds that \( {\mathcal {R}}(\varphi ) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \(( {\mathcal {R}}(\varphi ))(x) = ( {\mathcal {R}}({\mathbf {g}}^{ M_N, d }_{ \varepsilon }) )( ({\mathcal {R}}(\Phi _1))(x), ({\mathcal {R}}(\Phi _2))(x),\) \(\ldots , ({\mathcal {R}}(\Phi _{M_N}))(x))\), and \({\mathcal {P}}(\varphi ) \le {\mathfrak {C}}N^{{\mathfrak {n}}_2} ( N^{{\mathfrak {n}}_1 +1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}} + {\mathcal {P}}(\Phi _1))\). Then there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {R}}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), \([ \int _{ {\mathbb {R}}^d } | u_d(x) - ( {\mathcal {R}}(\Psi _{ d, \varepsilon }) )( x ) |^p \, \nu _d(dx) ]^{ \nicefrac { 1 }{ p } } \le \varepsilon \), and
Theorem 2.3, roughly speaking, shows that if a function can be approximated by means of some suitable discrete (Monte Carlo) approximation scheme without the curse of dimensionality (cf. (2.9) above) and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality.
The proof of Theorem 2.3 is given below. In the following we provide some comments on the mathematical objects appearing in Theorem 2.3 above.
The triple \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) denotes the probability space on which we consider the discrete (Monte Carlo) approximation scheme. For every \(N, d \in {\mathbb {N}}\) the random variables \(Z^{N, d, m}_n :\Omega \rightarrow {\mathbb {R}}^{d} \), \(n \in \{0, 1, \ldots , N-1\}\), \(m \in \{1, 2, \ldots , M_{N}\}\), and the Lipschitz continuous function \(f_{N, d} \in C( {\mathbb {R}}^{d} \times {\mathbb {R}}^{d}, {\mathbb {R}}^{d})\) (cf. (2.13) above) are employed in the iterative construction of the discrete approximations \(Y^{N, d, x}_n = (Y^{N, d, m, x}_n)_{m \in \{1, 2, \ldots , M_{N}\}} :\Omega \rightarrow {\mathbb {R}}^{M_N d}\), \(n \in \{0, 1, \ldots , N\}\), \(x \in {\mathbb {R}}^d\) (cf. (2.8) above). We assume that these approximations composed with the functions \(g_{N, d} \in C( {\mathbb {R}}^{Nd}, {\mathbb {R}})\), \( d, N \in {\mathbb {N}}\), approximate the functions \(u_d \in C({\mathbb {R}}^d, {\mathbb {R}})\), \(d \in {\mathbb {N}}\), without the curse of dimensionality in the strong \(L^p\)-sense with respect to the probability measures \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\), \(d \in {\mathbb {N}}\) (cf. (2.9) above). We assume suitable moment bounds for the random variables \(Z^{N, d, m}_n :\Omega \rightarrow {\mathbb {R}}^{d} \), \(n \in \{0, 1, \ldots , N-1\}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(d, N \in {\mathbb {N}}\), and the probability measures \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\), \(d \in {\mathbb {N}}\) (cf. (2.10) above).
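For orientation we sketch how such a scheme looks in the application of Sect. 4, where the discrete approximations are Monte Carlo Euler approximations of suitable Kolmogorov PDEs. The drift function `phi1`, the initial value function `phi0`, the unit diffusion, and the Gaussian increments in the following Python snippet are illustrative placeholders for the objects \(\varphi _{1, d}\), \(\varphi _{0, d}\), and \(Z^{N, d, m}_n\) and are not taken from the formal development.

```python
import numpy as np

def monte_carlo_euler(phi0, phi1, x, T, N, M, rng):
    """Illustrative Monte Carlo Euler approximation of u_d(T, x) for a
    Kolmogorov PDE with drift phi1, unit diffusion, and initial value
    function phi0 (a sketch of the scheme behind Theorem 4.5)."""
    d, h = x.size, T / N
    Y = np.tile(x, (M, 1))                            # Y_0^{m, x} = x
    for _ in range(N):
        Z = np.sqrt(h) * rng.standard_normal((M, d))  # Brownian increments
        Y = Y + h * phi1(Y) + Z                       # one Euler step per sample
    return np.mean(phi0(Y))                           # Monte Carlo average

rng = np.random.default_rng(0)
# Toy choices: phi0(y) = ||y||^2 and phi1(y) = -y (Ornstein-Uhlenbeck drift).
print(monte_carlo_euler(lambda y: np.sum(y**2, axis=1), lambda y: -y,
                        np.ones(10), T=1.0, N=50, M=10_000, rng=rng))
```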
We think of \({\mathbf {N}}\) in Theorem 2.3 above as a set of DNNs (see also Definition 3.1 below) and for every \(\Phi \in {\mathbf {N}}\) we think of \({\mathcal {P}}(\Phi ) \in {\mathbb {N}}\) as the number of parameters which are used to describe \(\Phi \) (see also Definition 3.1 below). For every \(\Phi \in {\mathbf {N}}\) we think of \({\mathcal {D}}(\Phi ) \in (\cup _{L\in {\mathbb {N}}} {\mathbb {N}}^{L})\) as the vector consisting of the dimensions of all layers of \(\Phi \) and we think of \({\mathcal {R}}(\Phi )\) as the realization function associated to \(\Phi \) (see also Definition 3.3 below).
For every \(d \in {\mathbb {N}}\), \( \varepsilon \in (0, 1]\) we think of \({\mathfrak {N}}_{d, \varepsilon } \subseteq {\mathbf {N}}\) as a set of DNNs with suitable regularity properties. For every \(N, d \in {\mathbb {N}}\), \(z \in {\mathbb {R}}^d\) we think of \(({\mathbf {f}}^{N, d}_{\varepsilon , z})_{\varepsilon \in (0, 1] } \subseteq {\mathbf {N}}\) as neural networks which approximate the function \({\mathbb {R}}^d \ni x \mapsto f_{N, d} (z, x) \in {\mathbb {R}}^{d}\) without the curse of dimensionality (cf. (2.11) above) and which satisfy a suitable linear growth condition (cf. (2.12) above). For every \(N, d \in {\mathbb {N}}\) we think of \(({\mathbf {g}}^{N, d}_{\varepsilon })_{\varepsilon \in (0, 1] } \subseteq {\mathbf {N}}\) as neural networks which approximate the function \(g_{N, d} :{\mathbb {R}}^{Nd}\rightarrow {\mathbb {R}}\) without the curse of dimensionality (cf. (2.14) above) and which satisfy a suitable Lipschitz-type condition (cf. (2.15) above). For every \(d \in {\mathbb {N}}\) we think of \({\mathfrak {I}}_d \in {\mathbf {N}}\) as a neural network representing the identity function on \({\mathbb {R}}^d\) in the sense that for all \(x \in {\mathbb {R}}^d\) it holds that \(({\mathcal {R}}({\mathfrak {I}}_d))(x) = x\) (see also Definition 3.15 below).
Proof of Theorem 2.3
Throughout this proof let \(\gamma = 46 e^{{\mathfrak {C}}} {\mathfrak {C}}^2 ( 4 e^{{\mathfrak {C}}+1} {\mathfrak {C}}^3 )^{ 2\theta }\), let \(\delta = \max \{{\mathfrak {d}}_5 + \theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2), {\mathfrak {d}}_4 + {\mathfrak {d}}_6 + 2\theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2)\}\), let \(X^{N, d, x, \varepsilon }_n = (X^{N, d, m, x, \varepsilon }_n)_{m \in \{1, 2, \ldots , M_{N}\}} :\Omega \rightarrow {\mathbb {R}}^{M_N d}\), \(n \in \{0, 1, \ldots , N\}\), \(\varepsilon \in (0, 1]\), \(x \in {\mathbb {R}}^d\), \(d, N \in {\mathbb {N}}\), be the random variables which satisfy for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(n \in \{1, 2, \ldots , N\}\), \(\omega \in \Omega \) that \(X^{N, d, m, x, \varepsilon }_{0}(\omega ) = x\) and
and let \(({\mathcal {N}}_{d, \varepsilon })_{(d, \varepsilon ) \in {\mathbb {N}}\times (0, 1]} \subseteq {\mathbb {N}}\) and \(({\mathcal {E}}_{d, \varepsilon })_{(d, \varepsilon ) \in {\mathbb {N}}\times (0, 1]} \subseteq (0, 1]\) satisfy for all \(\varepsilon \in (0, 1]\), \(d \in {\mathbb {N}}\) that
Note that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(n \in \{0, 1, 2, \ldots , N\}\) it holds that
This implies that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
Next observe that (2.14) ensures for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
In addition, note that (2.11) and (2.12) assure that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x, z \in {\mathbb {R}}^d\) it holds that
This proves that for all \(N, d \in {\mathbb {N}}\), \(x, z \in {\mathbb {R}}^d\) it holds that
Hence, we obtain that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \( n \in \{1, 2, \ldots , N\}\) it holds that
Moreover, note that (2.12) assures that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \( n \in \{1, 2, \ldots , N\}\) it holds that
Lemma 2.2 (with \(N = n\), \(p = 2p\theta \), \(\alpha = 1 + \frac{{\mathfrak {C}}}{N}\), \(\beta = {\mathfrak {C}}d^{{\mathfrak {d}}_2}\), \(\gamma = d^{{\mathfrak {d}}_1}\), \(Z_i = \Vert Z^{N, d, m}_{i} \Vert \) for \(N, d \in {\mathbb {N}}\), \( n \in \{1, 2, \ldots , N\}\), \(m \in \{1, 2, \ldots , M_N\}\), \(i \in \{0, 1, \ldots , n-1\}\) in the notation of Lemma 2.2), (2.24), and (2.10) therefore demonstrate that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \( n \in \{1, 2, \ldots , N\}\) it holds that
This and the fact that \(\forall \, a, b \in {\mathbb {R}}:|a+b|^{\theta } \le 2^{\theta -1}(|a|^{\theta }+|b|^{\theta })\) prove for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \( n \in \{1, 2, \ldots , N\}\) that
This and (2.10) establish that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(\varepsilon \in (0, 1]\) it holds that
Hence, we obtain that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\) it holds that
Combining this and (2.21) demonstrates that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
In addition, observe that (2.15) ensures that for all \(N, d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\) it holds that
This ensures for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
Hölder’s inequality hence assures for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
Moreover, note that (2.28) implies that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(\varepsilon \in (0, 1]\) it holds that
Next observe that (2.13) and (2.11) prove that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(n \in \{1, 2, \ldots , N\}\) it holds that
Lemma 2.2 (with \(N = N\), \(p = 2p\), \(\alpha = {\mathfrak {C}}\), \(\beta = \varepsilon {\mathfrak {C}}d^{{\mathfrak {d}}_4}\), \(\gamma = d^{\theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2)}\), \(Z_n = \Vert X^{N, d, m, x, \varepsilon }_{n} \Vert ^{\theta } \), \(X_n = \Vert Y^{N, d, m, x}_n - X^{N, d, m, x, \varepsilon }_n \Vert \) for \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(n \in \{1, 2, \ldots , N\}\) in the notation of Lemma 2.2) and (2.27) hence ensure for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\) that
This and (2.10) demonstrate that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_N\}\), \(\varepsilon \in (0, 1]\) it holds that
Combining this with (2.33) and (2.34) establishes that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
This, (2.9), (2.20), and (2.30) prove for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
Combining this and (2.18) assures that for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
This and, e.g., [36, Lemma 2.1] establish that there exists \({\mathfrak {w}} = ({\mathfrak {w}}_{d, \varepsilon })_{(d, \varepsilon ) \in {\mathbb {N}}\times (0, 1]} :{\mathbb {N}}\times (0, 1] \rightarrow \Omega \) which satisfies for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
Next note that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(x \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(\omega \in \Omega \) it holds that \(X^{N, d, m, x, \varepsilon }_{0}(\omega ) = ({\mathcal {R}}( {\mathfrak {I}}_d))(x)\). The assumption that for all \( d\in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that \({\mathfrak {I}}_d \in {\mathfrak {N}}_{d, \varepsilon }\) and (2.17) hence ensure that there exist \((\Phi ^{N, d, m, \varepsilon , \omega }_{n})_{m \in \{1, 2, \ldots , M_{N}\}} \subseteq {\mathfrak {N}}_{d, \varepsilon }\), \(\omega \in \Omega \), \(n \in \{0, 1, 2, \ldots , N\}\), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), which satisfy for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(n \in \{0, 1, 2, \ldots , N\}\), \(\omega \in \Omega \) , \(m \in \{1, 2, \ldots , M_{N}\}\), \(x \in {\mathbb {R}}^d\) that \({\mathcal {P}}(\Phi ^{N, d, m, \varepsilon , \omega }_{n}) \le {\mathcal {P}}({\mathfrak {I}}_d)+ n {\mathfrak {C}}N^{{\mathfrak {n}}_1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}}\), \({\mathcal {D}}(\Phi ^{N, d, m, \varepsilon , \omega }_{n}) = {\mathcal {D}}(\Phi ^{N, d, 1, \varepsilon , \omega }_{n})\), and
The assumption that for all \(d \in {\mathbb {N}}\) it holds that \({\mathcal {P}}({\mathfrak {I}}_d) \le {\mathfrak {C}}d^{{\mathfrak {d}}_3} \) therefore implies for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , M_{N}\}\), \(\varepsilon \in (0, 1]\), \(\omega \in \Omega \) that
Therefore, we obtain that there exist \(\Psi ^{N, d, \varepsilon , \omega } \in {\mathbf {N}}\), \(\omega \in \Omega \), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), which satisfy for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\omega \in \Omega \), \(x \in {\mathbb {R}}^d\) that \({\mathcal {R}}(\Psi ^{N, d, \varepsilon , \omega }) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \({\mathcal {P}}(\Psi ^{N, d, \varepsilon , \omega }) \le {\mathfrak {C}}N^{{\mathfrak {n}}_2} (N^{{\mathfrak {n}}_1+1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}} + {\mathfrak {C}}d^{{\mathfrak {d}}_3}+ {\mathfrak {C}}N^{{\mathfrak {n}}_1+1} d^{{\mathfrak {d}}_3} \varepsilon ^{-{\mathfrak {e}}} ) \), and
Hence, we obtain that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\omega \in \Omega \) it holds that
This and (2.18) demonstrate that for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that
Combining this, (2.41), and (2.44) establishes (2.16). The proof of Theorem 2.3 is thus completed. \(\square \)
3 Artificial neural network (ANN) calculus
In this section we establish in Lemma 3.29 and Lemma 3.30 below a few elementary results on representation flexibilities of ANNs. In our proofs of Lemma 3.29 and Lemma 3.30 we use results from a certain ANN calculus which we recall from the scientific literature and extend in Sects. 3.1–3.7 below.
In particular, Definition 3.1 below is [25, Definition 2.1], Definition 3.2 below is [25, Definition 2.2], Definition 3.3 below is [25, Definition 2.3], Definition 3.4 below is [25, Definition 2.5], and Definition 3.5 below is [25, Definition 2.17]. Moreover, all the results in Sect. 3.5 below are well-known and/or elementary and the proofs of these results are therefore omitted.
3.1 ANNs
Definition 3.1
(ANNs) We denote by \({\mathbf {N}}\) the set given by
$$\begin{aligned} {\mathbf {N}}= {\textstyle \bigcup _{L \in {\mathbb {N}}} \bigcup _{ l_0, l_1, \ldots , l_L \in {\mathbb {N}}}} \big ( \times _{k = 1}^L ({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k}) \big ) \end{aligned}$$(3.1)
and we denote by \( {\mathcal {P}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {L}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {I}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {O}}:{\mathbf {N}}\rightarrow {\mathbb {N}}\), \({\mathcal {H}}:{\mathbf {N}}\rightarrow {\mathbb {N}}_0\), and \({\mathcal {D}}:{\mathbf {N}}\rightarrow ( \cup _{L\in {\mathbb {N}}}{\mathbb {N}}^{L})\) the functions which satisfy for all \( L\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L \in {\mathbb {N}}\), \( \Phi \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k}))\) that \({\mathcal {P}}(\Phi ) = \sum _{k = 1}^L l_k(l_{k-1} + 1) \), \({\mathcal {L}}(\Phi )=L\), \({\mathcal {I}}(\Phi )=l_0\), \({\mathcal {O}}(\Phi )=l_L\), \({\mathcal {H}}(\Phi )=L-1\), and \({\mathcal {D}}(\Phi )= (l_0,l_1,\ldots , l_L)\).
3.2 Realizations of ANNs
Definition 3.2
(Multidimensional versions) Let \(d \in {\mathbb {N}}\) and let \(\psi :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a function. Then we denote by \({\mathfrak {M}}_{\psi , d} :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) the function which satisfies for all \( x = ( x_1, \dots , x_{d} ) \in {\mathbb {R}}^{d} \) that
$$\begin{aligned} {\mathfrak {M}}_{\psi , d}( x ) = ( \psi (x_1), \psi (x_2), \ldots , \psi (x_{d}) ) \end{aligned}$$(3.2)
Definition 3.3
(Realizations associated to ANNs) Let \(a\in C({\mathbb {R}},{\mathbb {R}})\). Then we denote by \( {\mathcal {R}}_{a}:{\mathbf {N}}\rightarrow ( \cup _{k,l\in {\mathbb {N}}}C({\mathbb {R}}^k,{\mathbb {R}}^l)) \) the function which satisfies for all \( L\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L \in {\mathbb {N}}\), \( \Phi = ((W_1, B_1),(W_2, B_2),\ldots , (W_L,B_L)) \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k})) \), \(x_0 \in {\mathbb {R}}^{l_0}, x_1 \in {\mathbb {R}}^{l_1}, \ldots , x_{L-1} \in {\mathbb {R}}^{l_{L-1}}\) with \(\forall \, k \in {\mathbb {N}}\cap (0,L) :x_k ={\mathfrak {M}}_{a,l_k}(W_k x_{k-1} + B_k)\) that
$$\begin{aligned} {\mathcal {R}}_{a}(\Phi ) \in C({\mathbb {R}}^{l_0},{\mathbb {R}}^{l_L}) \qquad \text {and} \qquad ({\mathcal {R}}_{a}(\Phi ))(x_0) = W_L x_{L-1} + B_L \end{aligned}$$(3.3)
(cf. Definitions 3.1 and 3.2).
3.3 Compositions of ANNs
Definition 3.4
(Compositions of ANNs) We denote by \({(\cdot ) \bullet (\cdot )}:\{(\Phi _1,\Phi _2)\in {\mathbf {N}}\times {\mathbf {N}}:{\mathcal {I}}(\Phi _1)={\mathcal {O}}(\Phi _2)\}\rightarrow {\mathbf {N}}\) the function which satisfies for all \( L,{\mathscr {L}}\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L,{\mathfrak {l}}_0,{\mathfrak {l}}_1,\ldots , {\mathfrak {l}}_{\mathscr {L}} \in {\mathbb {N}}\), \( \Phi _1 = ((W_1, B_1),(W_2, B_2),\ldots , (W_L,B_L)) \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k})) \), \( \Phi _2 = (({\mathscr {W}}_1, {\mathscr {B}}_1),({\mathscr {W}}_2, {\mathscr {B}}_2),\ldots , ({\mathscr {W}}_{\mathscr {L}},{\mathscr {B}}_{\mathscr {L}})) \in ( \times _{k = 1}^{\mathscr {L}}({\mathbb {R}}^{{\mathfrak {l}}_k \times {\mathfrak {l}}_{k-1}} \times {\mathbb {R}}^{{\mathfrak {l}}_k})) \) with \(l_0={\mathcal {I}}(\Phi _1)={\mathcal {O}}(\Phi _2)={\mathfrak {l}}_{{\mathscr {L}}}\) that
$$\begin{aligned} \Phi _1 \bullet \Phi _2 = \big ( ({\mathscr {W}}_1, {\mathscr {B}}_1), ({\mathscr {W}}_2, {\mathscr {B}}_2), \ldots , ({\mathscr {W}}_{{\mathscr {L}}-1}, {\mathscr {B}}_{{\mathscr {L}}-1}), (W_1 {\mathscr {W}}_{{\mathscr {L}}}, W_1 {\mathscr {B}}_{{\mathscr {L}}} + B_1), (W_2, B_2), \ldots , (W_L, B_L) \big ) \end{aligned}$$(3.4)
(cf. Definition 3.1).
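In the tuple format of Definition 3.1, this composition can be realized by merging the output layer of \(\Phi _2\) with the input layer of \(\Phi _1\). The following Python sketch is an illustration of this construction only, not the formal definition.

```python
import numpy as np

def compose(phi1, phi2):
    # Phi_1 . Phi_2: merge the last affine layer of Phi_2 with the first
    # affine layer of Phi_1, so that the realizations satisfy
    # R(Phi_1 . Phi_2) = R(Phi_1) o R(Phi_2) and the number of layers is
    # L(Phi_1) + L(Phi_2) - 1.
    (W1, B1), (WL, BL) = phi1[0], phi2[-1]
    return phi2[:-1] + ((W1 @ WL, W1 @ BL + B1),) + phi1[1:]

# Quick check with two depth-one (affine) networks:
f = ((np.array([[2.0]]), np.array([1.0])),)  # x -> 2x + 1
g = ((np.array([[3.0]]), np.array([0.0])),)  # x -> 3x
W, B = compose(f, g)[0]
assert W[0, 0] == 6.0 and B[0] == 1.0        # x -> 2(3x) + 1 = 6x + 1
```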
3.4 Parallelizations of ANNs with the same length
Definition 3.5
(Parallelizations of ANNs with the same length) Let \(n\in {\mathbb {N}}\). Then we denote by
$$\begin{aligned} {\mathbf {P}}_{n} :\{ (\Phi _1, \Phi _2, \dots , \Phi _n) \in {\mathbf {N}}^n :{\mathcal {L}}(\Phi _1) = {\mathcal {L}}(\Phi _2) = \ldots = {\mathcal {L}}(\Phi _n) \} \rightarrow {\mathbf {N}}\end{aligned}$$(3.5)
the function which satisfies for all \(L\in {\mathbb {N}}\), \(l_{1,0},l_{1,1},\dots , l_{1,L}, l_{2,0},l_{2,1},\dots , l_{2,L},\dots ,l_{n,0},l_{n,1},\dots , l_{n,L}\in {\mathbb {N}}\), \(\Phi _1=((W_{1,1}, B_{1,1}),(W_{1,2}, B_{1,2}),\ldots , (W_{1,L},B_{1,L}))\in ( \times _{k = 1}^L({\mathbb {R}}^{l_{1,k} \times l_{1,k-1}} \times {\mathbb {R}}^{l_{1,k}}))\), \(\Phi _2=((W_{2,1}, B_{2,1}),(W_{2,2}, B_{2,2}),\ldots , (W_{2,L},B_{2,L}))\in ( \times _{k = 1}^L({\mathbb {R}}^{l_{2,k} \times l_{2,k-1}} \times {\mathbb {R}}^{l_{2,k}}))\), ..., \(\Phi _n=((W_{n,1}, B_{n,1}),(W_{n,2}, B_{n,2}),\ldots , (W_{n,L},B_{n,L}))\in ( \times _{k = 1}^L({\mathbb {R}}^{l_{n,k} \times l_{n,k-1}} \times {\mathbb {R}}^{l_{n,k}}))\) that
$$\begin{aligned} {\mathbf {P}}_{n}(\Phi _1,\Phi _2,\dots ,\Phi _n) = \left( \left( \begin{pmatrix} W_{1,1} & 0 & \cdots & 0 \\ 0 & W_{2,1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_{n,1} \end{pmatrix} , \begin{pmatrix} B_{1,1} \\ B_{2,1} \\ \vdots \\ B_{n,1} \end{pmatrix} \right) , \ldots , \left( \begin{pmatrix} W_{1,L} & 0 & \cdots & 0 \\ 0 & W_{2,L} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_{n,L} \end{pmatrix} , \begin{pmatrix} B_{1,L} \\ B_{2,L} \\ \vdots \\ B_{n,L} \end{pmatrix} \right) \right) \end{aligned}$$(3.6)
(cf. Definition 3.1).
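Concretely, parallelization stacks the weight matrices of each layer block-diagonally and concatenates the corresponding bias vectors, so that the realization maps \((x_1, \ldots , x_n)\) to \((({\mathcal {R}}_a(\Phi _1))(x_1), \ldots , ({\mathcal {R}}_a(\Phi _n))(x_n))\). A minimal Python sketch of this construction (an illustration only):

```python
import numpy as np

def block_diag(*mats):
    # Place the given matrices along the diagonal of a zero matrix.
    out = np.zeros((sum(m.shape[0] for m in mats),
                    sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out

def parallelize(*phis):
    # P_n(Phi_1, ..., Phi_n) for networks of the same length: block-diagonal
    # weights and concatenated biases, layer by layer.
    assert len({len(phi) for phi in phis}) == 1
    return tuple((block_diag(*(phi[k][0] for phi in phis)),
                  np.concatenate([phi[k][1] for phi in phis]))
                 for k in range(len(phis[0])))
```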
3.5 Linear transformations of ANNs
Definition 3.6
(Identity matrix) Let \(n\in {\mathbb {N}}\). Then we denote by \({\text {I}}_{n}\in {\mathbb {R}}^{n\times n}\) the identity matrix in \({\mathbb {R}}^{n\times n}\).
Definition 3.7
(ANNs with a vector input) Let \(n \in {\mathbb {N}}\), \(B \in {\mathbb {R}}^n\). Then we denote by \({\mathfrak {B}}_B \in ({\mathbb {R}}^{n \times n} \times {\mathbb {R}}^n)\) the pair given by \({\mathfrak {B}}_B = ({\text {I}}_n, B)\) (cf. Definition 3.6).
Lemma 3.8
Let \(n \in {\mathbb {N}}\), \(B \in {\mathbb {R}}^n\). Then
-
(i)
it holds that \({\mathfrak {B}}_B \in {\mathbf {N}}\),
-
(ii)
it holds that \({\mathcal {D}}({\mathfrak {B}}_{B}) = (n, n) \in {\mathbb {N}}^2\),
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\mathfrak {B}}_{B}) \in C({\mathbb {R}}^n, {\mathbb {R}}^n)\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^n\) that \(({\mathcal {R}}_{a}({\mathfrak {B}}_{B})) (x) = x + B\)
(cf. Definitions 3.1, 3.3, and 3.7).
Lemma 3.9
Let \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {O}}(\Phi )} \) that \({\mathcal {D}}({{\mathfrak {B}}_B \bullet \Phi }) = {\mathcal {D}}(\Phi )\),
-
(ii)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {O}}(\Phi )}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({{\mathfrak {B}}_B \bullet \Phi }) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \),
-
(iii)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {O}}(\Phi )} \), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({{\mathfrak {B}}_B \bullet \Phi }))(x) = ({\mathcal {R}}_{a}(\Phi ))(x) + B, \end{aligned}$$(3.7) -
(iv)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )} \) that \({\mathcal {D}}({\Phi \bullet {\mathfrak {B}}_B}) = {\mathcal {D}}(\Phi )\),
-
(v)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )} \), \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {B}}_B}) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \), and
-
(vi)
it holds for all \(B \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )} \), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {O}}(\Phi )}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {B}}_B}))(x) = ({\mathcal {R}}_{a}(\Phi ))(x+B) \end{aligned}$$(3.8)
(cf. Definitions 3.3, 3.4, and 3.7).
Definition 3.10
(ANNs with a matrix input) Let \(m, n \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{m \times n}\). Then we denote by \({\mathfrak {W}}_{W} \in ({\mathbb {R}}^{m \times n} \times {\mathbb {R}}^{m})\) the pair given by \({\mathfrak {W}}_{W} = (W, 0)\).
Lemma 3.11
Let \(m, n \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{m \times n}\). Then
-
(i)
it holds that \({\mathfrak {W}}_W \in {\mathbf {N}}\),
-
(ii)
it holds that \({\mathcal {D}}({\mathfrak {W}}_{W}) = (n, m) \in {\mathbb {N}}^2\),
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\mathfrak {W}}_{W}) \in C({\mathbb {R}}^n, {\mathbb {R}}^m)\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^n\) that \( ({\mathcal {R}}_{a}({\mathfrak {W}}_{W})) (x) = Wx \)
(cf. Definitions 3.1, 3.3, and 3.10).
Lemma 3.12
Let \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds for all \(m \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{m \times {\mathcal {O}}(\Phi )}\) that \({\mathcal {R}}_{a}({{\mathfrak {W}}_W \bullet \Phi }) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^m) \),
-
(ii)
it holds for all \(m \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{m \times {\mathcal {O}}(\Phi )}\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({{\mathfrak {W}}_W \bullet \Phi }))(x) = W \big (({\mathcal {R}}_{a}(\Phi ))(x)\big ), \end{aligned}$$(3.9) -
(iii)
it holds for all \(n \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{ {\mathcal {I}}(\Phi ) \times n}\) that \({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {W}}_W}) \in C({\mathbb {R}}^n, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \), and
-
(iv)
it holds for all \(n \in {\mathbb {N}}\), \(W \in {\mathbb {R}}^{ {\mathcal {I}}(\Phi ) \times n}\), \(x \in {\mathbb {R}}^n\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {W}}_W}))(x) = ({\mathcal {R}}_{a}(\Phi ))(Wx) \end{aligned}$$(3.10)
(cf. Definitions 3.3, 3.4, and 3.10).
Definition 3.13
(Scalar multiplications of ANNs) We denote by \(\left( \cdot \right) \circledast \left( \cdot \right) :{\mathbb {R}}\times {\mathbf {N}}\rightarrow {\mathbf {N}}\) the function which satisfies for all \(\lambda \in {\mathbb {R}}\), \(\Phi \in {\mathbf {N}}\) that
$$\begin{aligned} \lambda \circledast \Phi = {\mathfrak {W}}_{\lambda {\text {I}}_{{\mathcal {O}}(\Phi )}} \bullet \Phi \end{aligned}$$(3.11)
(cf. Definitions 3.1, 3.4, 3.6, and 3.10).
Lemma 3.14
Let \(\lambda \in {\mathbb {R}}\), \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {D}}(\lambda \circledast \Phi ) = {\mathcal {D}}(\Phi )\),
-
(ii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}(\lambda \circledast \Phi ) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )})\), and
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\lambda \circledast \Phi ))(x) = \lambda \big ( ({\mathcal {R}}_{a}(\Phi ))(x) \big ) \end{aligned}$$(3.12)
(cf. Definitions 3.3 and 3.13).
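In the tuple format, composing with the depth-one network \({\mathfrak {W}}_{\lambda {\text {I}}_{{\mathcal {O}}(\Phi )}}\) simply rescales the last affine layer of \(\Phi \), which also explains why \({\mathcal {D}}(\lambda \circledast \Phi ) = {\mathcal {D}}(\Phi )\). An illustrative Python sketch (not the formal definition):

```python
def scalar_mult(lam, phi):
    # lambda (*) Phi = W_{lambda I} . Phi: composing with the depth-one
    # network (lambda I, 0) rescales the last affine layer of Phi and
    # leaves all layer dimensions unchanged.
    W_last, B_last = phi[-1]
    return phi[:-1] + ((lam * W_last, lam * B_last),)
```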
3.6 Representations of the identities with rectifier functions
Definition 3.15
We denote by \({\mathfrak {I}}= ({\mathfrak {I}}_d)_{d \in {\mathbb {N}}} :{\mathbb {N}}\rightarrow {\mathbf {N}}\) the function which satisfies for all \(d \in {\mathbb {N}}\) that
$$\begin{aligned} {\mathfrak {I}}_1 = \left( \left( \begin{pmatrix} 1 \\ -1 \end{pmatrix} , \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right) , \left( \begin{pmatrix} 1&-1 \end{pmatrix} , 0 \right) \right) \in \left( ({\mathbb {R}}^{2 \times 1} \times {\mathbb {R}}^{2}) \times ({\mathbb {R}}^{1 \times 2} \times {\mathbb {R}}^{1}) \right) \end{aligned}$$(3.13)
and
$$\begin{aligned} {\mathfrak {I}}_d = {\mathbf {P}}_d \big ( {\mathfrak {I}}_1, {\mathfrak {I}}_1, \ldots , {\mathfrak {I}}_1 \big ) \end{aligned}$$(3.14)
(cf. Definitions 3.1 and 3.5).
Lemma 3.16
Let \(d \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\) satisfy for all \(x \in {\mathbb {R}}\) that \(a(x) = \max \{x, 0\}\). Then
-
(i)
it holds that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d) \in {\mathbb {N}}^3\),
-
(ii)
it holds that \( {\mathcal {R}}_{a}({\mathfrak {I}}_d) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), and
-
(iii)
it holds for all \(x \in {\mathbb {R}}^d\) that \(({\mathcal {R}}_{a}({\mathfrak {I}}_d))(x) = x\)
(cf. Definitions 3.1, 3.3, and 3.15).
Proof of Lemma 3.16
Throughout this proof let \(L =2\), \(l_0 = 1\), \(l_1 = 2\), \(l_2 =1\). Note that (3.13) ensures that
This and, e.g., [25, Lemma 2.18] prove that
(cf. Definition 3.5). Hence, we obtain that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d) \in {\mathbb {N}}^3\). This establishes item (i). Next note that (3.13) assures that for all \(x \in {\mathbb {R}}\) it holds that
Combining this and, e.g., [25, Proposition 2.19] demonstrates that for all \( x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) it holds that \({\mathcal {R}}_{a}({\mathfrak {I}}_d) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\) and
This establishes items (ii)–(iii). The proof of Lemma 3.16 is thus completed. \(\square \)
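The construction behind Lemma 3.16 can also be checked directly in code: with the rectifier one has \(\max \{x, 0\} - \max \{-x, 0\} = x\) for all \(x \in {\mathbb {R}}\). The following snippet (an illustration only) verifies this for the one-dimensional network \({\mathfrak {I}}_1\).

```python
import numpy as np

# I_1 with D(I_1) = (1, 2, 1): hidden layer computes (max(x,0), max(-x,0)),
# output layer computes max(x,0) - max(-x,0) = x.
I1 = ((np.array([[1.0], [-1.0]]), np.zeros(2)),
      (np.array([[1.0, -1.0]]), np.zeros(1)))

def relu_realization(phi, x):
    *hidden, (W_last, B_last) = phi
    for (W, B) in hidden:
        x = np.maximum(W @ x + B, 0.0)
    return W_last @ x + B_last

for x in (-3.0, 0.0, 2.5):
    assert relu_realization(I1, np.array([x]))[0] == x
```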
3.7 Sums of ANNs with the same length
Definition 3.17
Let \(m, n \in {\mathbb {N}}\). Then we denote by \({\mathfrak {S}}_{m, n} \in ({\mathbb {R}}^{m \times (nm)} \times {\mathbb {R}}^m)\) the pair given by
$$\begin{aligned} {\mathfrak {S}}_{m, n} = {\mathfrak {W}}_{ ( {\text {I}}_m \;\, {\text {I}}_m \;\, \cdots \;\, {\text {I}}_m ) } \end{aligned}$$
(cf. Definitions 3.6 and 3.10).
Lemma 3.18
Let \(m, n \in {\mathbb {N}}\). Then
-
(i)
it holds that \({\mathfrak {S}}_{m, n} \in {\mathbf {N}}\),
-
(ii)
it holds that \({\mathcal {D}}({\mathfrak {S}}_{m, n}) = (nm, m) \in {\mathbb {N}}^2\),
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n}) \in C({\mathbb {R}}^{nm}, {\mathbb {R}}^m)\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{m}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n})) (x_1, x_2, \ldots , x_n) = \textstyle \sum _{k=1}^n x_k \end{aligned}$$(3.20)
(cf. Definitions 3.1, 3.3, and 3.17).
Proof of Lemma 3.18
Note that the fact that \({\mathfrak {S}}_{m, n} \in ({\mathbb {R}}^{m \times (nm)} \times {\mathbb {R}}^m)\) ensures that \({\mathfrak {S}}_{m, n} \in {\mathbf {N}}\) and \({\mathcal {D}}({\mathfrak {S}}_{m, n}) = (nm, m) \in {\mathbb {N}}^2\). This establishes items (i)–(ii). Next observe that items (iii)–(iv) in Lemma 3.11 prove that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{m}\) it holds that \({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n}) \in C({\mathbb {R}}^{nm}, {\mathbb {R}}^m)\) and
(cf. Definitions 3.6 and 3.10). This establishes items (iii)–(iv). The proof of Lemma 3.18 is thus completed. \(\square \)
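In other words, \({\mathfrak {S}}_{m, n}\) is the affine map whose weight matrix is the horizontal concatenation of \(n\) copies of \({\text {I}}_m\), and \({\mathfrak {T}}_{m, n}\) below uses its transpose. A short numerical check (an illustration only):

```python
import numpy as np

m, n = 3, 4
S = np.hstack([np.eye(m)] * n)        # the m x (nm) matrix (I_m I_m ... I_m)
xs = np.random.default_rng(0).standard_normal((n, m))
assert np.allclose(S @ xs.reshape(-1), xs.sum(axis=0))  # (x_1,...,x_n) -> sum_k x_k
assert np.allclose(S.T @ xs[0], np.tile(xs[0], n))      # transpose: x -> (x,...,x)
```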
Lemma 3.19
Let \(m, n \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in \{\Psi \in {\mathbf {N}}:{\mathcal {O}}(\Psi ) = nm\}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {R}}_{a}({{\mathfrak {S}}_{m, n} \bullet \Phi }) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^m) \) and
-
(ii)
it holds for all \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\), \(y_1, y_2, \ldots , y_n \in {\mathbb {R}}^{m}\) with \(({\mathcal {R}}_{a}(\Phi ))(x) = (y_1, y_2, \ldots , y_n)\) that
$$\begin{aligned} \big ( {\mathcal {R}}_{a}({{\mathfrak {S}}_{m, n} \bullet \Phi }) \big )(x) = \textstyle \sum _{k=1}^n y_k \end{aligned}$$(3.22)
(cf. Definitions 3.3, 3.4, and 3.17).
Proof of Lemma 3.19
Note that Lemma 3.18 ensures that for all \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{m}\) it holds that \({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n}) \in C({\mathbb {R}}^{nm}, {\mathbb {R}}^m)\) and
Combining this and, e.g., [25, item (v) in Proposition 2.6] establishes items (i)–(ii). The proof of Lemma 3.19 is thus completed. \(\square \)
Lemma 3.20
Let \(n \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {S}}_{{\mathcal {I}}(\Phi ), n}}) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi )}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \) and
-
(ii)
it holds for all \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} \big ({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {S}}_{{\mathcal {I}}(\Phi ), n}}) \big )(x_1, x_2, \ldots , x_n) = ({\mathcal {R}}_{a}(\Phi ))(\textstyle \sum _{k=1}^n x_k) \end{aligned}$$(3.24)
(cf. Definitions 3.3, 3.4, and 3.17).
Proof of Lemma 3.20
Note that Lemma 3.18 demonstrates that for all \(m \in {\mathbb {N}}\), \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}^{m}\) it holds that \({\mathcal {R}}_{a}({\mathfrak {S}}_{m, n}) \in C({\mathbb {R}}^{nm}, {\mathbb {R}}^m)\) and
Combining this and, e.g., [25, item (v) in Proposition 2.6] establishes items (i)–(ii). The proof of Lemma 3.20 is thus completed. \(\square \)
Definition 3.21
Let \(m, n \in {\mathbb {N}}\), \(A \in {\mathbb {R}}^{m \times n}\). Then we denote by \(A^* \in {\mathbb {R}}^{n \times m}\) the transpose of A.
Definition 3.22
Let \(m, n \in {\mathbb {N}}\). Then we denote by \({\mathfrak {T}}_{m, n} \in ({\mathbb {R}}^{(nm) \times m} \times {\mathbb {R}}^{nm})\) the pair given by
$$\begin{aligned} {\mathfrak {T}}_{m, n} = {\mathfrak {W}}_{ ( {\text {I}}_m \;\, {\text {I}}_m \;\, \cdots \;\, {\text {I}}_m )^* } \end{aligned}$$(3.26)
(cf. Definitions 3.6, 3.10, and 3.21).
Lemma 3.23
Let \(m, n \in {\mathbb {N}}\). Then
-
(i)
it holds that \({\mathfrak {T}}_{m, n} \in {\mathbf {N}}\),
-
(ii)
it holds that \( {\mathcal {D}}({\mathfrak {T}}_{m, n}) = (m, nm) \in {\mathbb {N}}^2\),
-
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{nm})\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^m\) that
$$\begin{aligned} ({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n})) (x) = (x, x, \ldots , x) \end{aligned}$$(3.27)
(cf. Definitions 3.1, 3.3, and 3.22).
Proof of Lemma 3.23
Note that the fact that \({\mathfrak {T}}_{m, n} \in ({\mathbb {R}}^{(nm) \times m} \times {\mathbb {R}}^{nm})\) ensures that \({\mathfrak {T}}_{m, n} \in {\mathbf {N}}\) and \({\mathcal {D}}({\mathfrak {T}}_{m, n}) = (m, nm) \in {\mathbb {N}}^2\). This establishes items (i)–(ii). Next observe that items (iii)–(iv) in Lemma 3.11 prove that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^m\) it holds that \({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{nm})\) and
(cf. Definitions 3.6 and 3.10). This establishes items (iii)–(iv). The proof of Lemma 3.23 is thus completed. \(\square \)
Lemma 3.24
Let \(n \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in {\mathbf {N}}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {R}}_{a}({{\mathfrak {T}}_{{\mathcal {O}}(\Phi ), n} \bullet \Phi }) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi )}, {\mathbb {R}}^{n {\mathcal {O}}(\Phi )}) \) and
-
(ii)
it holds for all \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi )}\) that
$$\begin{aligned} \big ( {\mathcal {R}}_{a}({{\mathfrak {T}}_{{\mathcal {O}}(\Phi ), n} \bullet \Phi }) \big )(x) = \big (({\mathcal {R}}_{a}(\Phi ))(x), ({\mathcal {R}}_{a}(\Phi ))(x), \ldots , ({\mathcal {R}}_{a}(\Phi ))(x) \big ) \end{aligned}$$(3.29)
(cf. Definitions 3.3, 3.4, and 3.22).
Proof of Lemma 3.24
Note that Lemma 3.23 ensures that for all \(m \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^m\) it holds that \({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{nm})\) and
Combining this and, e.g., [25, item (v) in Proposition 2.6] establishes items (i)–(ii). The proof of Lemma 3.24 is thus completed. \(\square \)
Lemma 3.25
Let \(m, n \in {\mathbb {N}}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(\Phi \in \{\Psi \in {\mathbf {N}}:{\mathcal {I}}(\Psi ) = nm\}\) (cf. Definition 3.1). Then
-
(i)
it holds that \({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {T}}_{m, n}}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{{\mathcal {O}}(\Phi )}) \) and
-
(ii)
it holds for all \(x \in {\mathbb {R}}^{m}\) that
$$\begin{aligned} \big ({\mathcal {R}}_{a}({\Phi \bullet {\mathfrak {T}}_{m, n}}) \big )(x) = ({\mathcal {R}}_{a}(\Phi ))(x, x, \ldots , x) \end{aligned}$$(3.31)
(cf. Definitions 3.3, 3.4, and 3.22).
Proof of Lemma 3.25
Observe that Lemma 3.23 demonstrates that for all \(x \in {\mathbb {R}}^m\) it holds that \({\mathcal {R}}_{a}({\mathfrak {T}}_{m, n}) \in C({\mathbb {R}}^{m}, {\mathbb {R}}^{nm})\) and
Combining this and, e.g., [25, item (v) in Proposition 2.6] establishes items (i)–(ii). The proof of Lemma 3.25 is thus completed. \(\square \)
Definition 3.26
(Sums of ANNs with the same length) Let \(n \in {\mathbb {N}}\), \(\Phi _1, \Phi _2, \ldots , \Phi _n \in {\mathbf {N}}\) satisfy for all \(k \in \{1, 2, \ldots , n\}\) that \({\mathcal {L}}(\Phi _k) = {\mathcal {L}}(\Phi _1)\), \({\mathcal {I}}(\Phi _k) = {\mathcal {I}}(\Phi _1)\), and \({\mathcal {O}}(\Phi _k) = {\mathcal {O}}(\Phi _1)\). Then we denote by \(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k\) (also denoted by \(\Phi _1 \oplus \Phi _2 \oplus \ldots \oplus \Phi _n\)) the tuple given by
$$\begin{aligned} \oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k = {\mathfrak {S}}_{{\mathcal {O}}(\Phi _1), n} \bullet \big [ {\mathbf {P}}_n (\Phi _1, \Phi _2, \ldots , \Phi _n) \big ] \bullet {\mathfrak {T}}_{{\mathcal {I}}(\Phi _1), n} \end{aligned}$$(3.33)
(cf. Definitions 3.1, 3.4, 3.5, 3.17, and 3.22).
Definition 3.27
(Dimensions of ANNs) Let \(n \in {\mathbb {N}}_0\). Then we denote by \({\mathbb {D}}_n :{\mathbf {N}}\rightarrow {\mathbb {N}}_0\) the function which satisfies for all \( L\in {\mathbb {N}}\), \(l_0,l_1,\ldots , l_L \in {\mathbb {N}}\), \( \Phi \in ( \times _{k = 1}^L({\mathbb {R}}^{l_k \times l_{k-1}} \times {\mathbb {R}}^{l_k}))\) that
$$\begin{aligned} {\mathbb {D}}_n (\Phi ) = \begin{cases} l_n &:n \le L \\ 0 &:n > L \end{cases} \end{aligned}$$(3.34)
(cf. Definition 3.1).
Lemma 3.28
Let \(n \in {\mathbb {N}}\), \(\Phi _1, \Phi _2, \ldots , \Phi _n \in {\mathbf {N}}\) satisfy for all \(k \in \{1, 2, \ldots , n\}\) that \({\mathcal {L}}(\Phi _k) = {\mathcal {L}}(\Phi _1)\), \({\mathcal {I}}(\Phi _k) = {\mathcal {I}}(\Phi _1)\), and \({\mathcal {O}}(\Phi _k) = {\mathcal {O}}(\Phi _1)\) (cf. Definition 3.1). Then
-
(i)
it holds that \( {\mathcal {L}}(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k) = {\mathcal {L}}(\Phi _1)\),
-
(ii)
it holds that
$$\begin{aligned}&{\mathcal {D}}(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k) \nonumber \\&\quad = \big ({\mathcal {I}}(\Phi _1), \textstyle \sum _{k=1}^n {\mathbb {D}}_1(\Phi _k), \textstyle \sum _{k = 1}^n {\mathbb {D}}_2(\Phi _k), \ldots , \textstyle \sum _{k=1}^n {\mathbb {D}}_{{\mathcal {L}}(\Phi _1)-1}(\Phi _k), {\mathcal {O}}(\Phi _1)\big ), \end{aligned}$$(3.35) -
(iii)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\), and
-
(iv)
it holds for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}\) that
$$\begin{aligned} \big ({\mathcal {R}}_{a} (\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k ) \big ) (x) = \sum _{k=1}^n ({\mathcal {R}}_a(\Phi _k))(x) \end{aligned}$$(3.36)
(cf. Definitions 3.3, 3.26, and 3.27).
Proof of Lemma 3.28
First, note that, e.g., [25, Lemma 2.18] proves that
(cf. Definition 3.5). Moreover, observe that item (ii) in Lemma 3.18 ensures that
(cf. Definition 3.17). This, (3.37), and, e.g., [25, item (i) in Proposition 2.6] demonstrate that
Next note that item (ii) in Lemma 3.23 assures that
(cf. Definition 3.22). Combining this, (3.39), and, e.g., [25, item (i) in Proposition 2.6] proves that
This establishes items (i)–(ii). Next observe that Lemma 3.25 and (3.37) ensure that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}\) it holds that \({\mathcal {R}}_{a}({[{\mathbf {P}}_n(\Phi _1,\Phi _2,\dots , \Phi _n)] \bullet {\mathfrak {T}}_{{\mathcal {I}}(\Phi _1), n}}) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{n {\mathcal {O}}(\Phi _1)}) \) and
Combining this with, e.g., [25, item (ii) in Proposition 2.19] proves that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}\) it holds that
Lemma 3.19, (3.38), and, e.g., [25, Lemma 2.8] therefore demonstrate that for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}\) it holds that \({\mathcal {R}}_{a}(\oplus _{k \in \{1, 2, \ldots , n\}} \Phi _k) \in C({\mathbb {R}}^{{\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\) and
This establishes items (iii)–(iv). The proof of Lemma 3.28 is thus completed. \(\square \)
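The construction behind Definition 3.26 and Lemma 3.28 can be carried out explicitly with weight matrices: the summands read the same input (stacked first-layer weights), run in parallel (block-diagonal hidden layers, mirroring the parallelization \({\mathbf {P}}_n\)), and their outputs are added (horizontally concatenated last-layer weights and summed biases). The following sketch is our illustration; the helper names are ours, and depth one is treated as a separate, purely affine case.

```python
import numpy as np

rng = np.random.default_rng(0)

def realize(phi, x, a=lambda z: np.maximum(z, 0.0)):
    for W, b in phi[:-1]:
        x = a(W @ x + b)
    W, b = phi[-1]
    return W @ x + b

def block_diag(*Ms):
    out = np.zeros((sum(M.shape[0] for M in Ms), sum(M.shape[1] for M in Ms)))
    i = j = 0
    for M in Ms:
        out[i:i + M.shape[0], j:j + M.shape[1]] = M
        i, j = i + M.shape[0], j + M.shape[1]
    return out

def sum_nets(nets):
    """oplus_k Phi_k: one network realizing x -> sum_k R(Phi_k)(x) (Lemma 3.28)."""
    L = len(nets[0])
    if L == 1:
        return [(sum(n[0][0] for n in nets), sum(n[0][1] for n in nets))]
    out = []
    for k in range(L):
        Ws, bs = [n[k][0] for n in nets], [n[k][1] for n in nets]
        if k == 0:
            out.append((np.vstack(Ws), np.concatenate(bs)))    # shared input
        elif k == L - 1:
            out.append((np.hstack(Ws), sum(bs)))               # outputs added
        else:
            out.append((block_diag(*Ws), np.concatenate(bs)))  # run in parallel
    return out

nets = [[(rng.standard_normal((4, 3)), rng.standard_normal(4)),
         (rng.standard_normal((2, 4)), rng.standard_normal(2))] for _ in range(3)]
x = rng.standard_normal(3)
assert np.allclose(realize(sum_nets(nets), x),
                   sum(realize(net, x) for net in nets))
```

The hidden widths of `sum_nets(nets)` are the sums of the summands' hidden widths, exactly as in item (ii).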
3.8 ANN representation results
Lemma 3.29
Let \( n \in {\mathbb {N}}\), \(h_1, h_2, \ldots , h_n \in {\mathbb {R}}\), \( \Phi _1, \Phi _2, \ldots , \Phi _n \in {\mathbf {N}}\) satisfy that \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \ldots = {\mathcal {D}}(\Phi _n)\), let \(A_k \in {\mathbb {R}}^{{\mathcal {I}}(\Phi _1) \times (n {\mathcal {I}}(\Phi _1))}\), \(k \in \{1, 2, \ldots , n\}\), satisfy for all \(k \in \{1, 2, \ldots , n\}\), \(x = (x_i)_{i \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that \(A_k x = x_k\), and let \(\Psi \in {\mathbf {N}}\) satisfy that
$$\begin{aligned} \Psi = \oplus _{k \in \{1, 2, \ldots , n\}} \big ( h_k \circledast ( \Phi _k \bullet {\mathfrak {W}}_{A_k} ) \big ) \end{aligned}$$(3.45)
(cf. Definitions 3.1, 3.10, 3.13, and 3.26). Then
-
(i)
it holds that
$$\begin{aligned} {\mathcal {D}}(\Psi ) = (n {\mathcal {I}}(\Phi _1), n{\mathbb {D}}_1(\Phi _1), n{\mathbb {D}}_2(\Phi _1), \ldots , n{\mathbb {D}}_{{\mathcal {L}}(\Phi _1)-1}(\Phi _1), {\mathcal {O}}(\Phi _1)), \end{aligned}$$(3.46) -
(ii)
it holds that \({\mathcal {P}}(\Psi ) \le n^2 {\mathcal {P}}(\Phi _1)\),
-
(iii)
it holds for all \( a \in C({\mathbb {R}}, {\mathbb {R}})\) that \({\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\), and
-
(iv)
it holds for all \( a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x = (x_k)_{k \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\Psi ))(x) = \sum _{k=1}^n h_k ({\mathcal {R}}_{a}(\Phi _k))(x_k) \end{aligned}$$(3.47)
(cf. Definitions 3.3 and 3.27).
Proof of Lemma 3.29
First, note that item (ii) in Lemma 3.11 ensures for all \(k \in \{1, 2, \ldots , n\}\) that
This and, e.g., [25, item (i) in Proposition 2.6] prove for all \(k \in \{1, 2, \ldots , n\}\) that
Item (i) in Lemma 3.14 therefore demonstrates for all \(k \in \{1, 2, \ldots , n\}\) that
Combining this with item (ii) in Lemma 3.28 ensures that
This establishes item (i). Hence, we obtain that
This establishes item (ii). Moreover, observe that items (iii)–(iv) in Lemma 3.12 assure for all \(k \in \{1, 2, \ldots , n\}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x = (x_i)_{i \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that \({\mathcal {R}}_{a}( {\Phi _k \bullet {\mathfrak {W}}_{A_k}}) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _k)})\) and
Combining this with items (ii)–(iii) in Lemma 3.14 proves for all \(k \in \{1, 2, \ldots , n\}\), \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x = (x_i)_{i \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that \({\mathcal {R}}_{a}( h_k \circledast ( {\Phi _k \bullet {\mathfrak {W}}_{A_k}})) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\) and
Items (iii)–(iv) in Lemma 3.28 and (3.50) hence ensure for all \(a \in C({\mathbb {R}}, {\mathbb {R}})\), \(x = (x_i)_{i \in \{1, 2, \ldots , n\}} \in {\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}\) that \({\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^{n {\mathcal {I}}(\Phi _1)}, {\mathbb {R}}^{{\mathcal {O}}(\Phi _1)})\) and
This establishes items (iii)–(iv). The proof of Lemma 3.29 is thus completed. \(\square \)
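The network \(\Psi \) of Lemma 3.29 differs from the sum of Lemma 3.28 only in that each summand reads its own slice \(x_k\) of the input (so the first layer is block-diagonal as well) and in that the output weights are scaled by \(h_k\). A sketch of ours follows; the helpers `realize` and `block_diag` are repeated from the previous block so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

def realize(phi, x, a=lambda z: np.maximum(z, 0.0)):
    for W, b in phi[:-1]:
        x = a(W @ x + b)
    W, b = phi[-1]
    return W @ x + b

def block_diag(*Ms):
    out = np.zeros((sum(M.shape[0] for M in Ms), sum(M.shape[1] for M in Ms)))
    i = j = 0
    for M in Ms:
        out[i:i + M.shape[0], j:j + M.shape[1]] = M
        i, j = i + M.shape[0], j + M.shape[1]
    return out

def weighted_sum_net(nets, h):
    """Psi with R(Psi)(x_1,...,x_n) = sum_k h_k R(Phi_k)(x_k) (Lemma 3.29)."""
    L = len(nets[0])
    out = []
    for k in range(L):
        Ws, bs = [n[k][0] for n in nets], [n[k][1] for n in nets]
        if k == L - 1:   # scale by h_k, then add the outputs up
            out.append((np.hstack([hk * Wk for hk, Wk in zip(h, Ws)]),
                        sum(hk * bk for hk, bk in zip(h, bs))))
        else:            # each summand processes its own input slice
            out.append((block_diag(*Ws), np.concatenate(bs)))
    return out

n, d = 3, 2
nets = [[(rng.standard_normal((4, d)), rng.standard_normal(4)),
         (rng.standard_normal((1, 4)), rng.standard_normal(1))] for _ in range(n)]
h = [0.5, -1.0, 2.0]
x = rng.standard_normal(n * d)
assert np.allclose(realize(weighted_sum_net(nets, h), x),
                   sum(h[k] * realize(nets[k], x[k * d:(k + 1) * d])
                       for k in range(n)))
```

Counting the (mostly zero) entries of the block-diagonal matrices gives at most \(n^2 {\mathcal {P}}(\Phi _1)\) stored parameters, in line with item (ii).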
Lemma 3.30
Let \(a\in C({\mathbb {R}},{\mathbb {R}})\), \(L_1, L_2\in {\mathbb {N}}\), \({\mathbb {I}}, \Phi _1,\Phi _2\in {\mathbf {N}}\), \(d,{\mathfrak {i}}, l_{1,0},l_{1,1},\dots ,l_{1,L_1},l_{2,0}, l_{2,1},\dots ,l_{2,L_2}\in {\mathbb {N}}\) satisfy for all \(k\in \{1,2\}\), \(x\in {\mathbb {R}}^{d}\) that \(2\le {\mathfrak {i}}\le 2d\), \(l_{2,L_2-1}\le l_{1,L_1-1}+{\mathfrak {i}}\), \({\mathcal {D}}({\mathbb {I}}) = (d,{\mathfrak {i}},d)\), \(({\mathcal {R}}_{a}({\mathbb {I}}))(x)=x\), \({\mathcal {I}}(\Phi _k)={\mathcal {O}}(\Phi _k)=d\), and \({\mathcal {D}}(\Phi _k)=(l_{k,0},l_{k,1},\dots , l_{k,L_k})\) (cf. Definitions 3.1 and 3.3). Then there exists \(\Psi \in {\mathbf {N}}\) such that
-
(i)
it holds that \({\mathcal {R}}_{a}(\Psi )\in C({\mathbb {R}}^d,{\mathbb {R}}^d)\),
-
(ii)
it holds for all \(x\in {\mathbb {R}}^d\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\Psi ))(x)=({\mathcal {R}}_{a}(\Phi _2))(x)+\big (({\mathcal {R}}_{a}(\Phi _1))\circ ({\mathcal {R}}_{a}(\Phi _2))\big )(x), \end{aligned}$$(3.56) -
(iii)
it holds that
$$\begin{aligned} {\mathbb {D}}_{{\mathcal {L}}(\Psi ) -1} (\Psi ) \le l_{1, L_1 -1} + {\mathfrak {i}}, \end{aligned}$$(3.57) and
-
(iv)
it holds that \({\mathcal {P}}(\Psi ) \le {\mathcal {P}}(\Phi _2)+\big [\tfrac{1}{2}{\mathcal {P}}({\mathbb {I}})+{\mathcal {P}}(\Phi _1)\big ]^{\!2}\)
(cf. Definitions 3.4 and 3.27).
Proof of Lemma 3.30
To prove items (i)–(iv) we distinguish between the case \(L_1=1\) and the case \(L_1 \in {\mathbb {N}}\cap [2, \infty )\). We first prove items (i)–(iv) in the case \(L_1=1\). Note that, e.g., [25, Proposition 2.30] (with \(a=a\), \(d=d\), \({\mathfrak {L}} = L_2\), \((\ell _0, \ell _1, \ldots , \ell _{{\mathfrak {L}}}) = (l_{2,0},l_{2,1}, \ldots , l_{2,L_2})\), \(\psi = \Phi _2\), \(\phi _n = \Phi _1\) for \(n \in {\mathbb {N}}_0\) in the notation of [25, Proposition 2.30]) implies that there exists \(\Psi \in {\mathbf {N}}\) such that
-
(I)
it holds that \({\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\),
-
(II)
it holds for all \(x \in {\mathbb {R}}^d\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\Psi ))(x)=({\mathcal {R}}_{a}(\Phi _2))(x)+\big (({\mathcal {R}}_{a}(\Phi _1))\circ ({\mathcal {R}}_{a}(\Phi _2))\big )(x), \end{aligned}$$(3.58) and
-
(III)
it holds that \({\mathcal {D}}(\Psi ) = {\mathcal {D}}(\Phi _2)\).
The hypothesis that \(l_{2,L_2-1}\le l_{1,L_1-1}+{\mathfrak {i}}\) hence ensures that
$$\begin{aligned} {\mathbb {D}}_{{\mathcal {L}}(\Psi ) -1}(\Psi ) = {\mathbb {D}}_{L_2 -1}(\Phi _2) = l_{2, L_2 -1} \le l_{1, L_1 -1} + {\mathfrak {i}}. \end{aligned}$$(3.59)
Moreover, note that (III) assures that
$$\begin{aligned} {\mathcal {P}}(\Psi ) = {\mathcal {P}}(\Phi _2) \le {\mathcal {P}}(\Phi _2)+\big [\tfrac{1}{2}{\mathcal {P}}({\mathbb {I}})+{\mathcal {P}}(\Phi _1)\big ]^{\!2}. \end{aligned}$$(3.60)
Combining this with (I) and (3.59) establishes items (i)–(iv) in the case \(L_1=1\). We now prove items (i)–(iv) in the case \(L_1 \in {\mathbb {N}}\cap [2, \infty )\). Observe that, e.g., [25, Proposition 2.28] (with \(a=a\), \(L_1 = L_1\), \(L_2 = L_2\), \({\mathbb {I}} = {\mathbb {I}}\), \(\Phi _1 = \Phi _1\), \(\Phi _2 = \Phi _2\), \(d=d\), \({\mathfrak {i}} = {\mathfrak {i}}\), \((l_{1, 0}, l_{1, 1}, \ldots , l_{1, L_1}) = (l_{1, 0}, l_{1, 1}, \ldots , l_{1, L_1})\), \((l_{2, 0}, l_{2, 1}, \ldots , l_{2, L_2}) = (l_{2, 0}, l_{2, 1}, \ldots , l_{2, L_2})\) in the notation of [25, Proposition 2.28]) proves that there exists \(\Psi \in {\mathbf {N}}\) such that
-
(a)
it holds that \({\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\),
-
(b)
it holds for all \(x\in {\mathbb {R}}^d\) that
$$\begin{aligned} ({\mathcal {R}}_{a}(\Psi ))(x)=({\mathcal {R}}_{a}(\Phi _2))(x)+\big (({\mathcal {R}}_{a}(\Phi _1))\circ ({\mathcal {R}}_{a}(\Phi _2))\big )(x), \end{aligned}$$(3.61) -
(c)
it holds that
$$\begin{aligned} {\mathcal {D}}(\Psi )=(l_{2,0},l_{2,1},\dots , l_{2,L_2-1},l_{1,1}+{\mathfrak {i}},l_{1,2}+{\mathfrak {i}},\dots ,l_{1,L_1-1}+{\mathfrak {i}}, l_{1, L_1}), \end{aligned}$$(3.62) and
-
(d)
it holds that \({\mathcal {P}}(\Psi ) \le {\mathcal {P}}(\Phi _2)+\big [\tfrac{1}{2}{\mathcal {P}}({\mathbb {I}})+{\mathcal {P}}(\Phi _1)\big ]^{\!2}\).
This establishes items (i)–(iv) in the case \(L_1 \in {\mathbb {N}}\cap [2, \infty )\). The proof of Lemma 3.30 is thus completed. \(\square \)
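The hypothesis on \({\mathbb {I}}\) in Lemma 3.30 is satisfiable for the ReLU activation used later: with \({\mathfrak {i}} = 2d\) one can take the network realizing \(x = \max \{x, 0\} - \max \{-x, 0\}\) coordinatewise (this is precisely the network \({\mathfrak {I}}_d\) with \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d)\) invoked in the proof of Theorem 4.5 below). A minimal sketch of ours:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def identity_net(d):
    """I with D(I) = (d, 2d, d) and R_relu(I)(x) = relu(x) - relu(-x) = x."""
    W1 = np.vstack([np.eye(d), -np.eye(d)])
    W2 = np.hstack([np.eye(d), -np.eye(d)])
    return [(W1, np.zeros(2 * d)), (W2, np.zeros(d))]

def realize(phi, x, a=relu):
    for W, b in phi[:-1]:
        x = a(W @ x + b)
    W, b = phi[-1]
    return W @ x + b

d = 3
x = np.random.default_rng(0).standard_normal(d)
assert np.allclose(realize(identity_net(d), x), x)
```

Such identity channels are what allow the output of \(\Phi _2\) to be carried in parallel through the layers of \(\Phi _1\) so that it can be added back at the end, which is the source of the width increment \({\mathfrak {i}}\) in (3.62).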
4 Kolmogorov partial differential equations (PDEs)
In this section we establish in Theorem 4.5 below the existence of DNNs which approximate solutions of suitable Kolmogorov PDEs without the curse of dimensionality. Moreover, in Corollary 4.6 below we specialize Theorem 4.5 to the case where for every \( d \in {\mathbb {N}}\) we have that the probability measure \( \nu _d \) appearing in Theorem 4.5 is the uniform distribution on the d-dimensional unit cube \( [0,1]^d \). In addition, in Corollary 4.7 below we specialize Theorem 4.5, roughly speaking, to the case where the constants \(\kappa \in (0, \infty )\), \({\mathfrak {e}}, {\mathfrak {d}}_1, {\mathfrak {d}}_2, \ldots , {\mathfrak {d}}_6 \in [0, \infty ) \), which we use to specify the regularity hypotheses in Theorem 4.5, are all equal in the sense that \(\kappa = {\mathfrak {e}}= {\mathfrak {d}}_1 = {\mathfrak {d}}_2= \ldots = {\mathfrak {d}}_6\).
Corollary 4.7 follows immediately from Theorem 4.5 and is a slight generalization of [36, Theorem 6.1] and [36, Theorem 1.1], respectively. In our proof of Theorem 4.5 we employ the DNN representation results in Lemmas 3.29–3.30 from Sect. 3 above as well as essentially well-known error estimates for the Monte Carlo Euler method which we establish in Proposition 4.4 below. The proof of Proposition 4.4, in turn, employs the elementary error estimate results in Lemmas 4.1–4.3 below.
4.1 Error analysis for the Monte Carlo Euler method
Lemma 4.1
Let \(d, m \in {\mathbb {N}}\), \(\xi \in {\mathbb {R}}^d\), \(T \in (0, \infty )\), \(L_0, L_1, l \in [0, \infty )\), \(h \in (0, T]\), \(B \in {\mathbb {R}}^{d \times m}\), let \( \left\| \cdot \right\| \! :{\mathbb {R}}^d \rightarrow [0,\infty ) \) be the d-dimensional Euclidean norm, let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \(W :[0, T] \times \Omega \rightarrow {\mathbb {R}}^m\) be a standard Brownian motion, let \(f_0 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and \(f_1 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be functions, let \(\chi :[0, T] \rightarrow [0, T]\) be a function, assume for all \(t \in [0, T]\), \(x, y \in {\mathbb {R}}^d\) that
and \(\chi (t) = \max (\{0, h, 2h, \ldots \} \cap [0, t])\), and let \( X, Y :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \) be stochastic processes with continuous sample paths which satisfy for all \( t \in [0,T] \) that \( Y_t = \xi + \int _0^t f_1\big ( Y_{ \chi ( s ) } \big ) \, ds + B W_t \) and \( X_t = \xi + \int _0^t f_1( X_s ) \, ds + B W_t \).
Then it holds that
Proof of Lemma 4.1
First, note that (4.2) proves that for all \(x \in {\mathbb {R}}^d\) it holds that
This, (4.1), (4.2), and, e.g., [36, Proposition 4.2] (with \(d = d\), \(m = m\), \(\xi = \xi \), \(T = T\), \(c = L_1\), \(C = \Vert f_1(0)\Vert \), \(\varepsilon _0 = 0\), \(\varepsilon _1 = 0\), \(\varepsilon _2 = 0\), \(\varsigma _0 = 0\), \(\varsigma _1 = 0\), \(\varsigma _2 = 0\), \(L_0 = L_0\), \(L_1 = L_1\), \(l = l\), \(h = h\), \(B = B\), \(p = 2\), \(q = 2\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W\), \(\phi _0 = f_0\), \(f_1 = f_1\), \(\phi _2 = ({\mathbb {R}}^d \ni x \mapsto x \in {\mathbb {R}}^d)\), \(\chi = \chi \), \(f_0 = f_0\), \(\phi _1 = f_1\), \(\varpi _r = ( {\mathbb {E}}[ \Vert B W_T \Vert ^r])^{ \nicefrac {1}{r} }\), \(X = X\), \(Y = Y\) for \(r \in (0, \infty )\) in the notation of [36, Proposition 4.2]) establish that
Combining this with, e.g., [36, Lemma 4.2] (with \(d = d\), \(m = m\), \(T = T\), \(p = \max \{2, 2l\}\), \(B = B\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W\) in the notation of [36, Lemma 4.2]) ensures that
The proof of Lemma 4.1 is thus completed. \(\square \)
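For intuition, the following numerical sketch (ours; the drift \(f_1(y) = -y\) is an illustrative choice satisfying the Lipschitz hypothesis) compares the process \(Y\) with drift frozen at the grid points \(\chi (s)\) against a fine-step Euler proxy for \(X\), driven by the same Brownian path; the observed \(L^2\) distance at time \(T\) shrinks roughly proportionally to \(h\), in line with the statement of Lemma 4.1.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 2, 1.0
B = np.eye(d)
f1 = lambda y: -y                 # illustrative Lipschitz drift
xi = np.ones(d)

def strong_error(h, h_fine=2 ** -10, reps=1000):
    """L^2 distance at time T between a fine-step Euler proxy for X and the
    scheme Y whose drift is frozen at chi(s) = max({0, h, 2h, ...} cap [0, s])."""
    ratio, n = int(h / h_fine), int(T / h_fine)
    X = np.tile(xi, (reps, 1))
    Y = X.copy()
    frozen = f1(Y)
    for k in range(n):
        if k % ratio == 0:        # a new coarse grid point: refresh the drift
            frozen = f1(Y)
        dW = rng.normal(0.0, np.sqrt(h_fine), (reps, d))
        X = X + h_fine * f1(X) + dW @ B.T
        Y = Y + h_fine * frozen + dW @ B.T
    return np.sqrt(np.mean(np.sum((X - Y) ** 2, axis=1)))

for h in [2 ** -2, 2 ** -4, 2 ** -6]:
    print(h, strong_error(h))
```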
Lemma 4.2
Let \(d, m \in {\mathbb {N}}\), \(T, \kappa \in (0, \infty )\), \(\theta , {\mathfrak {d}}_0, {\mathfrak {d}}_1 \in [0, \infty )\), \(h \in (0, T]\), \(B \in {\mathbb {R}}^{d \times m}\), \(p \in [1, \infty )\), let \(\nu :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1] \) be a probability measure on \({\mathbb {R}}^d\), let \( \left\| \cdot \right\| \! :{\mathbb {R}}^d \rightarrow [0,\infty ) \) be the d-dimensional Euclidean norm, let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \(W :[0, T] \times \Omega \rightarrow {\mathbb {R}}^m\) be a standard Brownian motion, let \(f_0 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and \(f_1 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be functions, let \(\chi :[0, T] \rightarrow [0, T]\) be a function, assume for all \(t \in [0, T]\), \(x, y \in {\mathbb {R}}^d\) that
and \(\chi (t) = \max (\{0, h, 2h, \ldots \} \cap [0, t])\), and let \( X^x:[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(x \in {\mathbb {R}}^d\), and \(Y^x :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(x \in {\mathbb {R}}^d\), be stochastic processes with continuous sample paths which satisfy for all \(x \in {\mathbb {R}}^d\), \( t \in [0,T] \) that \( Y_t^x = x + \int _0^t f_1\big ( Y^x_{ \chi ( s ) } \big ) \, ds + B W_t \) and
Then it holds that
Proof of Lemma 4.2
Throughout this proof let \( \iota = \max \{ \kappa , \theta , 1 \} \). Note that (4.8) proves that for all \(x, y \in {\mathbb {R}}^d\) it holds that
Lemma 4.1 (with \(d = d\), \(m = m\), \(\xi = x\), \(T =T\), \(L_0 = 2\kappa (\theta +1) d^{{\mathfrak {d}}_0}\), \(L_1 = \kappa \), \(l = \theta \), \(h = h\), \(B = B\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W\), \(f_0 = f_0\), \(f_1 = f_1\), \(\chi = \chi \), \(X = X^x\), \(Y = Y^x\) for \(x \in {\mathbb {R}}^d\) in the notation of Lemma 4.1), (4.10), and (4.9) hence ensure that for all \(x \in {\mathbb {R}}^d\) it holds that
Therefore, we obtain that for all \(x \in {\mathbb {R}}^d\) it holds that
This establishes that
Combining this and (4.10) assures that
The proof of Lemma 4.2 is thus completed. \(\square \)
Lemma 4.3
Let \(d, M, n \in {\mathbb {N}}\), \(T, \kappa , \theta \in (0, \infty )\), \({\mathfrak {d}}_0, {\mathfrak {d}}_1 \in [0, \infty )\), \(B \in {\mathbb {R}}^{d \times n}\), \(p \in [2, \infty )\), let \(\nu :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1] \) be a probability measure on \({\mathbb {R}}^d\), let \( \left\| \cdot \right\| \! :{\mathbb {R}}^d \rightarrow [0,\infty ) \) be the d-dimensional Euclidean norm, let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \(W^m :[0, T] \times \Omega \rightarrow {\mathbb {R}}^n\), \(m \in \{1, 2, \ldots , M\}\), be independent standard Brownian motions, let \(f_0 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) be \({\mathcal {B}}({\mathbb {R}}^d) /{\mathcal {B}}({\mathbb {R}})\)-measurable, let \(f_1 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be \({\mathcal {B}}({\mathbb {R}}^d) /{\mathcal {B}}({\mathbb {R}}^d)\)-measurable, let \(\chi :[0, T] \rightarrow [0, T]\) be \({\mathcal {B}}([0,T]) /{\mathcal {B}}([0, T])\)-measurable, assume for all \(t \in [0, T]\), \(x \in {\mathbb {R}}^d\) that
and \(\chi (t) \le t\), and let \( Y^{m, x} :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(m \in \{1, 2, \ldots , M\}\), \(x \in {\mathbb {R}}^d\), be stochastic processes with continuous sample paths which satisfy for all \(x \in {\mathbb {R}}^d\), \(m \in \{1, 2, \ldots , M\}\), \( t \in [0,T] \) that
$$\begin{aligned} Y^{m, x}_t = x + \int _0^t f_1\big ( Y^{m, x}_{ \chi ( s ) } \big ) \, ds + B W^m_t. \end{aligned}$$
Then it holds that
Proof of Lemma 4.3
Throughout this proof let \( \iota = \max \{ \theta , 1 \} \). Note that (4.18) and, e.g., [36, Lemma 4.1] (with \(d =d\), \(m =n\), \(\xi = x\), \(p =q\), \(c = \kappa \), \(C= \kappa d^{{\mathfrak {d}}_1}\), \(T =T\), \(B = B\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W^1\), \(\mu = f_1\), \(\chi = \chi \), \(X = Y^{1, x}\) for \(q \in [1, \infty )\), \(x \in {\mathbb {R}}^d\) in the notation of [36, Lemma 4.1]) prove that for all \(q \in [1, \infty )\), \(x \in {\mathbb {R}}^d\) it holds that
This, (4.19), and, e.g., [36, Lemma 4.2] (with \(d = d\), \(m = n\), \(T = T\), \(p = q\), \(B = B\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W^1\) for \(q \in [1, \infty )\) in the notation of [36, Lemma 4.2]) ensure that for all \(q \in [1, \infty )\), \(x \in {\mathbb {R}}^d\) it holds that
Combining this with (4.18) and Hölder’s inequality establishes for all \(x \in {\mathbb {R}}^d\) that
The fact that \( \forall \, y, z \in {\mathbb {R}}, \alpha \in [0, \infty ) :|y + z|^{\alpha } \le 2^{\alpha }(|y|^{\alpha } + |z|^{\alpha })\) hence proves for all \(x \in {\mathbb {R}}^d\) that
This implies that for all \(x \in {\mathbb {R}}^d\) it holds that
Combining this with, e.g., [24, Corollary 2.5] (with \(p=p\), \(d=1\), \(n=M\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(X_i = f_0(Y^{i, x})\) for \(i \in \{1, 2, \ldots , M\}\), \(x \in {\mathbb {R}}^d\) in the notation of [24, Corollary 2.5]) and (4.25) assures for all \(x \in {\mathbb {R}}^d\) that
This and the fact that \(\sqrt{p -1} \le \nicefrac {p}{2}\) establish that
Combining this and (4.19) demonstrates that
The proof of Lemma 4.3 is thus completed. \(\square \)
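The Monte Carlo bound [24, Corollary 2.5] used above controls the \(L^p\) norm of the deviation of the empirical mean from the expectation by a constant multiple of \(M^{-\nicefrac {1}{2}}\). The following quick numerical check (ours; the \(\chi ^2\)-distributed stand-in samples are an illustrative choice) exhibits this rate:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4.0

def lp_mc_error(M, reps=2000):
    """L^p norm of (1/M) sum_m Z_m - E[Z] for iid samples Z_m with E[Z] = 1."""
    Z = rng.standard_normal((reps, M)) ** 2
    return np.mean(np.abs(Z.mean(axis=1) - 1.0) ** p) ** (1.0 / p)

for M in [10, 100, 1000, 10000]:
    print(M, lp_mc_error(M), lp_mc_error(M) * np.sqrt(M))  # last column ~ constant
```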
Proposition 4.4
Let \(d, M, n \in {\mathbb {N}}\), \(T, \kappa , \theta \in (0, \infty )\), \({\mathfrak {d}}_0, {\mathfrak {d}}_1 \in [0, \infty )\), \(h \in (0, T]\), \(B \in {\mathbb {R}}^{d \times n}\), \(p \in [2, \infty )\), let \(\nu :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1] \) be a probability measure on \({\mathbb {R}}^d\), let \( \left\| \cdot \right\| \! :{\mathbb {R}}^d \rightarrow [0,\infty ) \) be the d-dimensional Euclidean norm, let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \(W^m :[0, T] \times \Omega \rightarrow {\mathbb {R}}^n\), \(m \in \{1, 2, \ldots , M\}\), be independent standard Brownian motions, let \(f_0 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and \(f_1 :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be functions, let \(\chi :[0, T] \rightarrow [0, T]\) be a function, assume for all \(t \in [0, T]\), \(x, y \in {\mathbb {R}}^d\) that
and \(\chi (t) = \max (\{0, h, 2h, \ldots \} \cap [0, t])\), and let \( X^x :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(x \in {\mathbb {R}}^d\), and \( Y^{m, x} :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \(m \in \{1, 2, \ldots , M\}\), \(x \in {\mathbb {R}}^d\), be stochastic processes with continuous sample paths which satisfy for all \(x \in {\mathbb {R}}^d\), \(m \in \{1, 2, \ldots , M\}\), \( t \in [0,T] \) that \( X^x_t = x + \int _0^t f_1( X^x_s ) \, ds + B W_t^1 \) and
$$\begin{aligned} Y^{m, x}_t = x + \int _0^t f_1\big ( Y^{m, x}_{ \chi ( s ) } \big ) \, ds + B W^m_t. \end{aligned}$$
Then it holds that
Proof of Proposition 4.4
Throughout this proof let \( \iota = \max \{ \kappa , \theta , 1 \}\). Note that the triangle inequality proves that
Next note that (4.31)–(4.33) and Lemma 4.2 (with \(d=d\), \(m=n\), \(T=T\), \(\kappa =\kappa \), \(\theta =\theta \), \({\mathfrak {d}}_0 = {\mathfrak {d}}_0\), \({\mathfrak {d}}_1 = {\mathfrak {d}}_1\), \(h=h\), \(B=B\), \(p=p\), \(\nu =\nu \), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W^1\), \(f_0 = f_0\), \(f_1 = f_1\), \(\chi =\chi \), \(X^x = X^x\), \(Y^x= Y^{1, x}\) for \(x \in {\mathbb {R}}^d\) in the notation of Lemma 4.2) demonstrates that
Moreover, observe that Hölder’s inequality and (4.33) imply that
Lemma 4.3 (with \(d=d\), \(M=M\), \(n=n\), \(T=T\), \(\kappa =\kappa \), \(\theta =\theta \), \({\mathfrak {d}}_0 = {\mathfrak {d}}_0\), \({\mathfrak {d}}_1 = {\mathfrak {d}}_1\), \(B=B\), \(p=p\), \(\nu = \nu \), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W^m=W^m\), \(f_0=f_0\), \(f_1=f_1\), \(\chi =\chi \), \(Y^{m,x} = Y^{m,x}\) for \(m \in \{1, 2, \ldots , M\}\), \(x \in {\mathbb {R}}^d\) in the notation of Lemma 4.3), (4.31), and (4.32) hence establish that
This, (4.36), and (4.37) assure that
The proof of Proposition 4.4 is thus completed. \(\square \)
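Proposition 4.4 quantifies the two error sources of the Monte Carlo Euler method: the time-discretization bias and the statistical error of the sample mean. The following sketch (ours; the Ornstein–Uhlenbeck choice \(f_1(y) = -y\), \(B = {\text {I}}_d\) is illustrative and admits a closed-form reference value) computes the estimator \(\frac{1}{M} \sum _{m=1}^M f_0(Y^{m, x}_T)\):

```python
import numpy as np

rng = np.random.default_rng(2)
d, T = 2, 1.0
f1 = lambda y: -y                        # illustrative Lipschitz drift
f0 = lambda y: np.sum(y ** 2, axis=-1)   # illustrative test functional

def mc_euler(x, N, M):
    """(1/M) sum_m f0(Y^{m,x}_T) with N Euler steps of size h = T/N."""
    h = T / N
    Y = np.tile(x, (M, 1))
    for _ in range(N):
        Y = Y + h * f1(Y) + rng.normal(0.0, np.sqrt(h), (M, d))
    return float(np.mean(f0(Y)))

x = np.ones(d)
# For the OU process dX = -X dt + dW one has
# E[||X_T||^2] = e^{-2T} ||x||^2 + d (1 - e^{-2T}) / 2.
exact = np.exp(-2 * T) * np.sum(x ** 2) + d * (1 - np.exp(-2 * T)) / 2
print(mc_euler(x, N=64, M=10 ** 5), exact)
```

The two printed values differ by the discretization bias of order \(\nicefrac {T}{N}\) plus a statistical error of order \(M^{-\nicefrac {1}{2}}\), which is exactly the splitting effected by the triangle inequality in the proof of Proposition 4.4.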
4.2 DNN approximations for Kolmogorov PDEs
Theorem 4.5
Let \( A_d = (A_{d, i, j})_{(i, j) \in \{1, \ldots , d\}^2} \in {\mathbb {R}}^{ d \times d } \), \( d \in {\mathbb {N}}\), be symmetric positive semidefinite matrices, let \(\left\| \cdot \right\| \! :(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) that \(\Vert x\Vert = ( \textstyle \sum _{i=1}^d |x_i|^2)^{\nicefrac {1}{2}}\), for every \( d \in {\mathbb {N}}\) let \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\) be a probability measure on \({\mathbb {R}}^d\), let \( \varphi _{0,d} :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \( d \in {\mathbb {N}}\), and \( \varphi _{ 1, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), be functions, let \( T, \kappa \in (0, \infty )\), \({\mathfrak {e}}, {\mathfrak {d}}_1, {\mathfrak {d}}_2, \ldots , {\mathfrak {d}}_6 \in [0, \infty )\), \(\theta \in [1, \infty )\), \(p \in [2, \infty )\), \( ( \phi ^{ m, d }_{ \varepsilon } )_{ (m, d, \varepsilon ) \in \{ 0, 1 \} \times {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\), \(a\in C({\mathbb {R}}, {\mathbb {R}})\) satisfy for all \(x \in {\mathbb {R}}\) that \(a(x) = \max \{x, 0\}\), assume for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \), \( m \in \{0, 1\}\), \( x, y \in {\mathbb {R}}^d \) that \( {\mathcal {R}}_{a}( \phi ^{ 0, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}) \), \( {\mathcal {R}}_{a}( \phi ^{ 1, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}^d ) \), \( {\text {Trace}}(A_d) \le \kappa d^{ 2 {\mathfrak {d}}_1 } \), \([ \int _{{\mathbb {R}}^d} \Vert x\Vert ^{2p \theta } \, \nu _d (dx) ]^{\nicefrac {1}{(2p \theta )}} \le \kappa d^{{\mathfrak {d}}_1 + {\mathfrak {d}}_2}\), \( {\mathcal {P}}( \phi ^{ m, d }_{ \varepsilon } ) \le \kappa d^{ 2^{(-m)} {\mathfrak {d}}_3 } \varepsilon ^{ - 2^{(-m)} {\mathfrak {e}}}\), \( |( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{{\mathfrak {d}}_6} (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \), \( \Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ {\mathfrak {d}}_1 + {\mathfrak {d}}_2 } + \Vert x \Vert ) \), \(| \varphi _{ 0, d }( x )| \le \kappa d^{ {\mathfrak {d}}_6 } ( d^{ \theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2) } + \Vert x \Vert ^{ \theta } )\), \( \Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), and
and for every \( d \in {\mathbb {N}}\) let \( u_d :[0,T] \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) be an at most polynomially growing viscosity solution of
$$\begin{aligned} \big ( \tfrac{\partial }{\partial t} u_d \big )(t, x) = \big ( \tfrac{\partial }{\partial x} u_d \big )(t, x) \, \varphi _{1, d}(x) + \textstyle \sum _{i, j = 1}^d A_{d, i, j} \, \big ( \tfrac{\partial ^2}{\partial x_i \partial x_j} u_d \big )(t, x) \end{aligned}$$
with \( u_d( 0, x ) = \varphi _{ 0, d }( x ) \) for \( ( t, x ) \in (0,T) \times {\mathbb {R}}^d \) (cf. Definitions 3.1 and 3.3). Then there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {R}}_{a}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), \([ \int _{ {\mathbb {R}}^d } | u_d(T, x) - ( {\mathcal {R}}_{a} (\Psi _{ d, \varepsilon }) )( x ) |^p \, \nu _d(dx) ]^{ \nicefrac { 1 }{ p } } \le \varepsilon \), and
Proof of Theorem 4.5
Throughout this proof let \( {\mathcal {A}}_d \in {\mathbb {R}}^{ d \times d } \), \( d \in {\mathbb {N}}\), satisfy for all \( d \in {\mathbb {N}}\) that \( {\mathcal {A}}_d = \sqrt{ 2 A_d } \), let \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) be a probability space, let \( W^{ d, m } :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \( d, m \in {\mathbb {N}}\), be independent standard Brownian motions, let \(Z^{N, d, m}_n :\Omega \rightarrow {\mathbb {R}}^{d} \), \(n \in \{0, 1, \ldots , N-1\}\), \(m \in \{1, 2, \ldots , N\}\), \(d, N \in {\mathbb {N}}\), be the random variables which satisfy for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(n \in \{0, 1, \ldots , N-1\}\) that
$$\begin{aligned} Z^{N, d, m}_n = {\mathcal {A}}_d \big ( W^{d, m}_{ \nicefrac {(n+1) T}{N} } - W^{d, m}_{ \nicefrac {n T}{N} } \big ), \end{aligned}$$(4.44)
let \(f_{N, d} :{\mathbb {R}}^{d} \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}^{d}\), \(d, N \in {\mathbb {N}}\), satisfy for all \(N, d \in {\mathbb {N}}\), \(x, y \in {\mathbb {R}}^d\) that
$$\begin{aligned} f_{N, d}(x, y) = y + \tfrac{T}{N} \, \varphi _{1, d}( y ) + x, \end{aligned}$$(4.45)
let \( X^{ d, x } :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \), \( x \in {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), be stochastic processes with continuous sample paths which satisfy for all \( d \in {\mathbb {N}}\), \( x \in {\mathbb {R}}^d \), \( t \in [0,T] \) that
$$\begin{aligned} X^{d, x}_t = x + \int _0^t \varphi _{1, d}\big ( X^{d, x}_s \big ) \, ds + {\mathcal {A}}_d W^{d, 1}_t \end{aligned}$$(4.46)
(cf., e.g., [36, item (i) in Theorem 3.1] (with \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(T=T\), \(d=d\), \(m=d\), \(B= {\mathcal {A}}_d\), \(\mu = \varphi _{1,d}\) for \(d \in {\mathbb {N}}\) in the notation of [36, Theorem 3.1])), let \(Y^{N, d, x}_n = (Y^{N, d, m, x}_n)_{m \in \{1, 2, \ldots , N\}} :\Omega \rightarrow {\mathbb {R}}^{N d}\), \(n \in \{0, 1, \ldots , N\}\), \(x \in {\mathbb {R}}^d\), \(d, N \in {\mathbb {N}}\), satisfy for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(x \in {\mathbb {R}}^d\), \(n \in \{1, 2, \ldots , N\}\) that \(Y^{N, d, m, x}_{0} = x\) and
$$\begin{aligned} Y^{N, d, m, x}_{n} = f_{N, d}\big ( Z^{N, d, m}_{n-1}, Y^{N, d, m, x}_{n-1} \big ) = Y^{N, d, m, x}_{n-1} + \tfrac{T}{N} \, \varphi _{1, d}\big ( Y^{N, d, m, x}_{n-1} \big ) + Z^{N, d, m}_{n-1}, \end{aligned}$$(4.47)
let \(g_{N, d} :{\mathbb {R}}^{Nd} \rightarrow {\mathbb {R}}\), \(d, N \in {\mathbb {N}}\), satisfy for all \(N, d \in {\mathbb {N}}\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) that
$$\begin{aligned} g_{N, d}(x) = \frac{1}{N} \bigg [ \sum _{i=1}^{N} \varphi _{0, d}(x_i) \bigg ], \end{aligned}$$(4.48)
and let \({\mathfrak {N}}_{d, \varepsilon } \subseteq {\mathbf {N}}\), \( \varepsilon \in (0, 1]\), \(d \in {\mathbb {N}}\), satisfy for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that
(cf. Definition 3.27). Note that (4.44) and, e.g., [36, Lemma 4.2] (with \(d = d\), \(m = d\), \(T = T\), \(p = 2p \theta \), \(B = {\mathcal {A}}_d\), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W = W^{d,m}\) for \(d, m \in {\mathbb {N}}\) in the notation of [36, Lemma 4.2]) ensure that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(n \in \{0, 1, \ldots , N-1\}\) it holds that
This and the assumption that \( \forall \, d \in {\mathbb {N}}:{\text {Trace}}(A_d) \le \kappa d^{ 2 {\mathfrak {d}}_1 } \) assure for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(n \in \{0, 1, \ldots , N-1\}\) that
Moreover, observe that Lemma 3.16 (with \(d=d\), \(a=a\) for \(d \in {\mathbb {N}}\) in the notation of Lemma 3.16) ensures that there exist \({\mathfrak {I}}_d \in {\mathbf {N}}\), \(d \in {\mathbb {N}}\), such that for all \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\) it holds that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d)\), \({\mathcal {R}}_{a}( {\mathfrak {I}}_{d}) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), and \(({\mathcal {R}}_{a}({\mathfrak {I}}_d))(x) = x\). This and (4.49) assure for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) that \({\mathfrak {I}}_d \in {\mathfrak {N}}_{d, \varepsilon }\) and
Next note that Lemma 3.14 demonstrates that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) it holds that \({\mathcal {D}}(\frac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }) = {\mathcal {D}}(\phi ^{1, d}_{\varepsilon })\), \({\mathcal {R}}_{a}(\frac{T}{N} \circledast \phi ^{1, d}_{\varepsilon } ) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), and
(cf. Definition 3.13). This, the fact that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d)\), and Lemma 3.30 (with \(a=a\), \(L_1 = {\mathcal {L}}(\tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }) \), \(L_2 = 2\), \({\mathbb {I}} = {\mathfrak {I}}_d\), \(\Phi _1 = \tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }\), \(\Phi _2 = {\mathfrak {I}}_d\), \(d=d\), \({\mathfrak {i}} = 2d\), \((l_{1, 0}, l_{1, 1}, \ldots , l_{1, L_1}) = {\mathcal {D}}(\tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon })\), \((l_{2, 0}, l_{2, 1}, l_{2, L_2}) = (d, 2d, d)\) for \(d, N \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\) in the notation of Lemma 3.30) establish that there exist \({\mathbf {f}}^{N, d}_{\varepsilon } \in {\mathbf {N}}\), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), such that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x \in {\mathbb {R}}^d\) it holds that \({\mathcal {R}}_{a}({\mathbf {f}}^{N, d}_{\varepsilon }) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\) and
Items (ii)–(iii) in Lemma 3.9 hence ensure that there exist \({\mathbf {f}}^{N, d}_{\varepsilon , z} \in {\mathbf {N}}\), \(z \in {\mathbb {R}}^d\), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), which satisfy for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(z, x \in {\mathbb {R}}^d\) that \({\mathcal {R}}_{a}({\mathbf {f}}^{N, d}_{\varepsilon , z}) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\) and
This, (4.45), and (4.41) imply for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x, z \in {\mathbb {R}}^d\) that \(({\mathbb {R}}^d \ni {\mathfrak {z}} \mapsto ( {\mathcal {R}}_{a}({\mathbf {f}}^{N, d}_{\varepsilon , {\mathfrak {z}}}))(x) \in {\mathbb {R}}^d)\) is \({\mathcal {B}}({\mathbb {R}}^d) /{\mathcal {B}}({\mathbb {R}}^d)\)-measurable and
Next note that (4.55) and the assumption that \( \forall \, \varepsilon \in (0, 1], d \in {\mathbb {N}}, x \in {\mathbb {R}}^d :\Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ {\mathfrak {d}}_1 + {\mathfrak {d}}_2 } + \Vert x \Vert ) \) prove for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x, z \in {\mathbb {R}}^d\) that
In addition, observe that (4.45) and the assumption that \(\forall \, d \in {\mathbb {N}}, x, y \in {\mathbb {R}}^d :\Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \) imply that for all \(N, d \in {\mathbb {N}}\), \(x, z \in {\mathbb {R}}^d\) it holds that
Moreover, note that (4.53), the fact that \({\mathcal {D}} ({\mathfrak {I}}_d) = (d, 2d, d)\), and Lemma 3.30 (with \(a=a\), \(L_1 = {\mathcal {L}}(\tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }) \), \(L_2 = {\mathcal {L}}(\Phi )\), \({\mathbb {I}} = {\mathfrak {I}}_d\), \(\Phi _1 = \tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }\), \(\Phi _2 = \Phi \), \(d=d\), \({\mathfrak {i}} = 2d\), \((l_{1, 0}, l_{1, 1}, \ldots , l_{1, L_1}) = {\mathcal {D}}(\tfrac{T}{N} \circledast \phi ^{1, d}_{\varepsilon }) = {\mathcal {D}}(\phi ^{1, d}_{\varepsilon })\), \((l_{2, 0}, l_{2, 1}, \ldots , l_{2, L_2}) = {\mathcal {D}}(\Phi )\) for \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) in the notation of Lemma 3.30) prove that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) there exists \({\hat{\Phi }} \in {\mathbf {N}}\) such that for all \(x \in {\mathbb {R}}^d\) it holds that \({\mathcal {R}}_{a}({\hat{\Phi }}) \in C({\mathbb {R}}^d, {\mathbb {R}}^d)\), \({\mathbb {D}}_{{\mathcal {L}}({\hat{\Phi }}) -1}({\hat{\Phi }}) \le {\mathbb {D}}_{{\mathcal {L}}(\phi ^{1,d}_{\varepsilon }) -1}(\phi ^{1,d}_{\varepsilon }) + 2d \), \({\mathcal {P}}({\hat{\Phi }}) \le {\mathcal {P}}(\Phi ) + [\frac{1}{2} {\mathcal {P}}({\mathfrak {I}}_d) + {\mathcal {P}}(\phi ^{1, d}_{\varepsilon })]^2\), and
This, (4.49), (4.52), and the fact that \(\forall \, d \in {\mathbb {N}}, \varepsilon \in (0, 1] :{\mathcal {P}}( \phi ^{ 1, d }_{ \varepsilon } ) \le \kappa d^{ 2^{(-1)} {\mathfrak {d}}_3 } \varepsilon ^{ - 2^{(-1)} {\mathfrak {e}}}\) demonstrate that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) there exists \({\hat{\Phi }} \in {\mathfrak {N}}_{d, \varepsilon }\) such that for all \(x \in {\mathbb {R}}^d\) it holds that
and
Items (i)–(iii) in Lemma 3.9 and (4.55) hence ensure that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi \in {\mathfrak {N}}_{d, \varepsilon }\) there exist \(({\hat{\Phi }}_{z})_{z \in {\mathbb {R}}^d} \subseteq {\mathfrak {N}}_{d, \varepsilon }\) such that for all \(x, z, {\mathfrak {z}} \in {\mathbb {R}}^d\) it holds that
and \({\mathcal {D}} ({\hat{\Phi }}_{z}) = {\mathcal {D}} ({\hat{\Phi }}_{{\mathfrak {z}}})\). In the next step we observe that Lemma 3.29 (with \(n = N\), \(h_m = \nicefrac {1}{N}\), \(\Phi _m = \phi ^{0, d}_{\varepsilon }\), \(a= a\) for \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(m \in \{1, 2, \ldots , N\}\) in the notation of Lemma 3.29) demonstrates that there exist \({\mathbf {g}}^{N, d}_{\varepsilon } \in {\mathbf {N}}\), \(\varepsilon \in (0, 1]\), \(d, N \in {\mathbb {N}}\), which satisfy for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) that \({\mathcal {R}}_{a}({\mathbf {g}}^{N, d}_{\varepsilon }) \in C({\mathbb {R}}^{N d}, {\mathbb {R}})\) and
This, (4.48), and (4.41) ensure that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) it holds that
Moreover, note that (4.64) and the assumption that \(\forall \, \varepsilon \in (0,1], d \in {\mathbb {N}}, x, y \in {\mathbb {R}}^d :|( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{{\mathfrak {d}}_6} (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \) imply that for all \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x = (x_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\), \(y = (y_i)_{i \in \{1, 2, \ldots , N\}} \in {\mathbb {R}}^{Nd}\) it holds that
Next observe that the fact that \({\mathcal {D}}({\mathfrak {I}}_d) = (d, 2d, d)\) and, e.g., [25, Proposition 2.16] (with \(\Psi = {\mathfrak {I}}_d\), \(\Phi _1 = \phi ^{0, d}_{\varepsilon }\), \(\Phi _2 \in \{\Phi \in {\mathbf {N}}:{\mathcal {I}}(\Phi ) = {\mathcal {O}}(\Phi ) =d \}\), \({\mathfrak {i}} = 2d\) in the notation of [25, Proposition 2.16]) prove that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi _1, \Phi _2, \ldots , \Phi _{N} \in \{\Phi \in {\mathbf {N}}:{\mathcal {I}}(\Phi ) = {\mathcal {O}}(\Phi ) =d \}\) with \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \ldots = {\mathcal {D}}(\Phi _{N})\) there exist \(\Psi _1, \Psi _2, \ldots , \Psi _{N} \in {\mathbf {N}}\) such that for all \(i \in \{1, 2, \ldots , N\}\) it holds that \( {\mathcal {R}}_{a}(\Psi _i) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \({\mathcal {D}}(\Psi _i) = {\mathcal {D}}(\Psi _1)\), \({\mathcal {P}}(\Psi _i) \le 2 ({\mathcal {P}}(\phi ^{0, d}_{\varepsilon }) + {\mathcal {P}}(\Phi _i))\), and
This, (4.64), and Lemma 3.28 assure that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi _1, \Phi _2, \ldots , \Phi _{N} \in \{\Phi \in {\mathbf {N}}:{\mathcal {I}}(\Phi ) = {\mathcal {O}}(\Phi ) =d \}\) with \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \ldots = {\mathcal {D}}(\Phi _{N})\) there exists \(\Psi \in {\mathbf {N}}\) such that for all \(x \in {\mathbb {R}}^d\) it holds that \( {\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \({\mathcal {P}}(\Psi ) \le 2 N^2 ({\mathcal {P}}(\phi ^{0, d}_{\varepsilon }) + {\mathcal {P}}(\Phi _1))\), and
The assumption that \(\forall \, d \in {\mathbb {N}}, \varepsilon \in (0, 1] :{\mathcal {P}}( \phi ^{ 0, d }_{ \varepsilon } ) \le \kappa d^{ {\mathfrak {d}}_3 } \varepsilon ^{ - {\mathfrak {e}}}\) hence ensures that for every \(N, d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(\Phi _1, \Phi _2, \ldots , \Phi _{N} \in \{\Phi \in {\mathbf {N}}:{\mathcal {I}}(\Phi ) = {\mathcal {O}}(\Phi ) =d \}\) with \({\mathcal {D}}(\Phi _1) = {\mathcal {D}}(\Phi _2) = \ldots = {\mathcal {D}}(\Phi _{N})\) there exists \(\Psi \in {\mathbf {N}}\) such that for all \(x \in {\mathbb {R}}^d\) it holds that \( {\mathcal {R}}_{a}(\Psi ) \in C({\mathbb {R}}^d, {\mathbb {R}})\), \(({\mathcal {R}}_{a}(\Psi )) (x) = ( {\mathcal {R}}_{a}({\mathbf {g}}^{N, d}_{\varepsilon })) ( ({\mathcal {R}}_{a}(\Phi _1)) (x), ({\mathcal {R}}_{a}(\Phi _2)) (x),\) \( \ldots , ({\mathcal {R}}_{a}(\Phi _N)) (x))\), and
Furthermore, note that (4.41) and the assumption that \(\forall \, d\in {\mathbb {N}}, \varepsilon \in (0, 1], x, y \in {\mathbb {R}}^d :|( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{ {\mathfrak {d}}_6} (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \) demonstrate for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x \in {\mathbb {R}}^d\) that
This establishes that for all \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\) it holds that
Next observe that the assumption that \(\forall \, d\in {\mathbb {N}}, \varepsilon \in (0, 1], x \in {\mathbb {R}}^d :\Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ {\mathfrak {d}}_1 + {\mathfrak {d}}_2 } + \Vert x \Vert ) \) and (4.41) ensure for all \(d \in {\mathbb {N}}\), \(\varepsilon \in (0, 1]\), \(x \in {\mathbb {R}}^d\) that
This proves that for all \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\) it holds that
In the next step we note that Hölder’s inequality, the assumption that \(\forall \, d \in {\mathbb {N}}:[ \int _{{\mathbb {R}}^d} \Vert x\Vert ^{2p \theta } \, \nu _d (dx) ]^{\nicefrac {1}{(2p\theta )}} \le \kappa d^{{\mathfrak {d}}_1 + {\mathfrak {d}}_2}\), and the assumption that \(\theta \in [1, \infty )\) assure that for all \(d \in {\mathbb {N}}\) it holds that
Next note that (4.47), (4.45), and (4.44) imply that for all \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(x \in {\mathbb {R}}^d\), \(n \in \{1, 2, \ldots , N\}\) it holds that
The assumption that \(\forall \, d \in {\mathbb {N}}, x \in {\mathbb {R}}^d :| \varphi _{ 0, d }( x )| \le \kappa d^{ {\mathfrak {d}}_6 } ( d^{ \theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2) } + \Vert x \Vert ^{ \theta } )\), the assumption that \( \forall \, d \in {\mathbb {N}}:{\text {Trace}}(A_d) \le \kappa d^{ 2 {\mathfrak {d}}_1 } \), the assumption that \(\forall \, d \in {\mathbb {N}}, x, y \in {\mathbb {R}}^d :\Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), (4.71), (4.73), (4.74), (4.46), and Proposition 4.4 (with \(d=d\), \(M=N\), \(n=d\), \(T=T\), \(\kappa = \kappa \), \(\theta = \theta \), \({\mathfrak {d}}_0 = {\mathfrak {d}}_6\), \({\mathfrak {d}}_1 = {\mathfrak {d}}_1 + {\mathfrak {d}}_2\), \(h = \nicefrac {T}{N}\), \(B = {\mathcal {A}}_d\), \(p=p\), \(\nu = \nu _d\), \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(W^m = W^{d, m}\), \(f_0 = \varphi _{0,d}\), \(f_1 = \varphi _{1, d}\) for \(N, d \in {\mathbb {N}}\) in the notation of Proposition 4.4) hence establish that for all \(N, d \in {\mathbb {N}}\) it holds that
This, the fact that \(\forall \, d\in {\mathbb {N}}, x \in {\mathbb {R}}^d :| \varphi _{ 0, d }( x ) | \le \kappa d^{ {\mathfrak {d}}_6 } ( d^{\theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2) } + \Vert x \Vert ^{ \theta } )\), (4.73), (4.48), and, e.g., [36, Theorem 3.1] (with \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \(T=T\), \(d=d\), \(m=d\), \(B= {\mathcal {A}}_d\), \(\varphi = \varphi _{0,d}\), \(\mu = \varphi _{1,d}\) for \(d \in {\mathbb {N}}\) in the notation of [36, Theorem 3.1]) prove for all \(N, d \in {\mathbb {N}}\) that
Combining this, (4.47), (4.51), (4.52), (4.56), (4.57), (4.58), (4.62), (4.63), (4.65), (4.66), (4.69), and Theorem 2.3 (with \((\Omega , {\mathcal {F}}, {\mathbb {P}}) = (\Omega , {\mathcal {F}}, {\mathbb {P}})\), \({\mathfrak {n}}_0= \nicefrac {1}{2}\), \({\mathfrak {n}}_1 = 0\), \({\mathfrak {n}}_2 =2\), \({\mathfrak {e}}= {\mathfrak {e}}\), \({\mathfrak {d}}_0 = {\mathfrak {d}}_6 + ({\mathfrak {d}}_1 + {\mathfrak {d}}_2)(\theta +1)\), \({\mathfrak {d}}_1 = {\mathfrak {d}}_1\), \({\mathfrak {d}}_2 = {\mathfrak {d}}_2\), \({\mathfrak {d}}_3 = \max \{4, {\mathfrak {d}}_3\}\), \({\mathfrak {d}}_4 = {\mathfrak {d}}_4\), \({\mathfrak {d}}_5 = {\mathfrak {d}}_5\), \({\mathfrak {d}}_6 = {\mathfrak {d}}_6\), \( {\mathfrak {C}}=2^{4\theta +6} | \!\max \{1, T\} |^{\theta +1} |\!\max \{ \kappa , \theta \}|^{2\theta +3} e^{(6\max \{ \kappa , \theta \}+5|\!\max \{ \kappa , \theta \}|^2 T)} p (p \theta + p +1)^{\theta }\), \(p=p\), \(\theta = \theta \), \(M_N= N\), \(Z^{N, d, m}_n = Z^{N, d, m}_n\), \(f_{N, d} = f_{N, d}\), \(Y^{N, d, x}_l = Y^{N, d, x}_l\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \(\nu _d = \nu _d\), \(g_{ N, d } = g_{ N, d }\), \(u_d(x) = u_d(T,x)\), \({\mathbf {N}}= {\mathbf {N}}\), \({\mathcal {P}}= {\mathcal {P}}\), \({\mathcal {D}}= {\mathcal {D}}\), \({\mathcal {R}}= {\mathcal {R}}_{a}\), \({\mathfrak {N}}_{d, \varepsilon } = {\mathfrak {N}}_{d, \varepsilon }\), \({\mathbf {f}}^{N, d}_{\varepsilon , z} = {\mathbf {f}}^{N, d}_{\varepsilon , z}\), \({\mathbf {g}}^{N, d}_{\varepsilon } = {\mathbf {g}}^{N, d}_{\varepsilon }\), \({\mathfrak {I}}_d = {\mathfrak {I}}_d\) for \(N, d \in {\mathbb {N}}\), \(m \in \{1, 2, \ldots , N\}\), \(n \in \{0, 1, \ldots , N-1\}\), \(l \in \{0, 1, \ldots , N\}\), \(\varepsilon \in (0, 1]\), \(x, z \in {\mathbb {R}}^d\) in the notation of Theorem 2.3) establish (4.43). The proof of Theorem 4.5 is thus completed. \(\square \)
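The mechanism of the proof can be made tangible: each Euler step \(x \mapsto x + \tfrac{T}{N} ({\mathcal {R}}_{a}(\phi ^{1, d}_{\varepsilon }))(x) + z\), with the realized Brownian increment \(z\) absorbed into an output bias, is itself a ReLU network in which the skip connection is carried through the hidden layer by the identity trick of Lemma 3.16, and the \(N\) steps are then chained by the composition \(\bullet \). The sketch below is our illustration (it assumes, for simplicity, that the drift network has exactly one hidden layer) and reproduces the Euler recursion exactly:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def realize(phi, x):
    for W, b in phi[:-1]:
        x = relu(W @ x + b)
    W, b = phi[-1]
    return W @ x + b

def compose(phi2, phi1):
    (W1, b1), (W2, b2) = phi1[-1], phi2[0]
    return phi1[:-1] + [(W2 @ W1, W2 @ b1 + b2)] + phi2[1:]

def euler_step_net(f_net, h, z):
    """A ReLU net realizing x -> x + h * R(f_net)(x) + z for a one-hidden-layer
    f_net = [(W1, b1), (W2, b2)]; the identity channels relu(x) - relu(-x)
    carry the skip connection x through the hidden layer."""
    (W1, b1), (W2, b2) = f_net
    d = W1.shape[1]
    top = (np.vstack([np.eye(d), -np.eye(d), W1]),
           np.concatenate([np.zeros(2 * d), b1]))
    bottom = (np.hstack([np.eye(d), -np.eye(d), h * W2]), h * b2 + z)
    return [top, bottom]

rng = np.random.default_rng(3)
d, N, T = 2, 8, 1.0
f_net = [(rng.standard_normal((5, d)), rng.standard_normal(5)),
         (rng.standard_normal((d, 5)) / 5, rng.standard_normal(d))]
z = rng.normal(0.0, np.sqrt(T / N), (N, d))  # frozen Brownian increments
net = euler_step_net(f_net, T / N, z[0])
for k in range(1, N):
    net = compose(euler_step_net(f_net, T / N, z[k]), net)

x = rng.standard_normal(d)
y = x.copy()
for k in range(N):                            # the plain Euler recursion
    y = y + (T / N) * realize(f_net, y) + z[k]
assert np.allclose(realize(net, x), y)
```

Averaging \(M\) such chains through a construction of the type of Lemma 3.29 yields a single network realizing the Monte Carlo Euler estimator; this is the structure underlying the networks furnished by Theorem 2.3 and the reason why the parameter bounds grow only polynomially in \(d\) and \(\varepsilon ^{-1}\).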
Corollary 4.6
Let \( \varphi _{0,d} :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \( d \in {\mathbb {N}}\), and \( \varphi _{ 1, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), be functions, let \(\left\| \cdot \right\| \! :(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) that \(\Vert x\Vert = ( \textstyle \sum _{i=1}^d |x_i|^2)^{\nicefrac {1}{2}}\), let \( T, \kappa \in (0, \infty )\), \({\mathfrak {e}}, {\mathfrak {d}}_1, {\mathfrak {d}}_2, \ldots , {\mathfrak {d}}_6 \in [0, \infty )\), \(\theta \in [1, \infty )\), \(p \in [2, \infty )\), \( ( \phi ^{ m, d }_{ \varepsilon } )_{ (m, d, \varepsilon ) \in \{ 0, 1 \} \times {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\), \(a\in C({\mathbb {R}}, {\mathbb {R}})\) satisfy for all \(x \in {\mathbb {R}}\) that \(a(x) = \max \{x, 0\}\), assume for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \), \( m \in \{0, 1\}\), \( x, y \in {\mathbb {R}}^d \) that \( {\mathcal {R}}_{a}( \phi ^{ 0, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}) \), \( {\mathcal {R}}_{a}( \phi ^{ 1, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}^d ) \), \( {\mathcal {P}}( \phi ^{ m, d }_{ \varepsilon } ) \le \kappa d^{ 2^{(-m)} {\mathfrak {d}}_3 } \varepsilon ^{ - 2^{(-m)} {\mathfrak {e}}}\), \( |( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{{\mathfrak {d}}_6} (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \), \( \Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ {\mathfrak {d}}_1 + {\mathfrak {d}}_2 } + \Vert x \Vert ) \), \(| \varphi _{ 0, d }( x )| \le \kappa d^{ {\mathfrak {d}}_6 } ( d^{ \theta ({\mathfrak {d}}_1 + {\mathfrak {d}}_2) } + \Vert x \Vert ^{ \theta } )\), \( \Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), and
and for every \( d \in {\mathbb {N}}\) let \( u_d :[0,T] \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) be an at most polynomially growing viscosity solution of
$$\begin{aligned} \big ( \tfrac{\partial }{\partial t} u_d \big )(t, x) = \big ( \tfrac{\partial }{\partial x} u_d \big )(t, x) \, \varphi _{1, d}(x) + ( \Delta _x u_d )(t, x) \end{aligned}$$
with \( u_d( 0, x ) = \varphi _{ 0, d }( x ) \) for \( ( t, x ) \in (0,T) \times {\mathbb {R}}^d \) (cf. Definitions 3.1 and 3.3). Then there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {R}}_{a}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), \([ \int _{ [0, 1]^d } | u_d(T, x) - ( {\mathcal {R}}_{a} (\Psi _{ d, \varepsilon }) )( x ) |^p \, dx ]^{ \nicefrac { 1 }{ p } } \le \varepsilon \), and
Proof of Corollary 4.6
Throughout this proof for every \( d \in {\mathbb {N}}\) let \( \lambda _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0, \infty ]\) be the Lebesgue-Borel measure on \({\mathbb {R}}^d\) and let \(\nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\) be the function which satisfies for all \(B \in {\mathcal {B}}({\mathbb {R}}^d)\) that
$$\begin{aligned} \nu _d(B) = \lambda _d\big ( B \cap [0,1]^d \big ). \end{aligned}$$(4.81)
Observe that (4.81) implies that for all \(d \in {\mathbb {N}}\) it holds that \(\nu _d\) is a probability measure on \({\mathbb {R}}^d\). This and (4.81) ensure that for all \(d \in {\mathbb {N}}\), \(g \in C({\mathbb {R}}^d, {\mathbb {R}})\) it holds that
$$\begin{aligned} \int _{{\mathbb {R}}^d} g(x) \, \nu _d(dx) = \int _{[0,1]^d} g(x) \, dx. \end{aligned}$$
Combining this with, e.g., [24, Lemma 3.15] demonstrates that for all \(d \in {\mathbb {N}}\) it holds that
This assures for all \(d \in {\mathbb {N}}\) that
Moreover, note that for all \(d \in {\mathbb {N}}\) it holds that
$$\begin{aligned} {\text {Trace}}({\text {I}}_d) = d \le \max \{\kappa , 1\} \, d^{2 \max \{{\mathfrak {d}}_1, \nicefrac {1}{2}\}} \end{aligned}$$
(cf. Definition 3.6). This, (4.84), and Theorem 4.5 (with \(A_d = {\text {I}}_d\), \(\left\| \cdot \right\| = \left\| \cdot \right\| \), \(\nu _d = \nu _d\), \(\varphi _{ 0, d } = \varphi _{ 0, d }\), \(\varphi _{ 1, d } = \varphi _{ 1, d }\), \(T =T\), \(\kappa = \max \{\kappa , 1\}\), \({\mathfrak {e}}= {\mathfrak {e}}\), \({\mathfrak {d}}_1 = \max \{{\mathfrak {d}}_1, \nicefrac {1}{2}\}\), \({\mathfrak {d}}_2 = {\mathfrak {d}}_2\), \({\mathfrak {d}}_3 = {\mathfrak {d}}_3\), \({\mathfrak {d}}_4 = {\mathfrak {d}}_4\), \({\mathfrak {d}}_5 = {\mathfrak {d}}_5\), \({\mathfrak {d}}_6 = {\mathfrak {d}}_6\), \(\theta = \theta \), \(p = p\), \(\phi ^{0, d}_{\varepsilon } = \phi ^{0, d}_{\varepsilon }\), \(\phi ^{1, d}_{\varepsilon } = \phi ^{1, d}_{\varepsilon }\), \(a = a\), \(u_d = u_d\) for \(d \in {\mathbb {N}}\) in the notation of Theorem 4.5) establish (4.80). The proof of Corollary 4.6 is thus completed. \(\square \)
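The moment hypothesis of Theorem 4.5 is easy to meet for the uniform distribution: on \([0,1]^d\) one has \(\Vert x\Vert \le \sqrt{d}\), so every \(\Vert \cdot \Vert \)-moment of \(\nu _d\) is at most \(d^{\nicefrac {1}{2}}\), which is why the application above may take the exponent \(\max \{{\mathfrak {d}}_1, \nicefrac {1}{2}\}\). A quick numerical confirmation (ours; plain Monte Carlo sampling):

```python
import numpy as np

rng = np.random.default_rng(4)
q = 8.0                              # plays the role of 2 p theta
for d in [1, 10, 100]:
    x = rng.random((10 ** 5, d))     # samples from nu_d = Unif([0,1]^d)
    moment = np.mean(np.linalg.norm(x, axis=1) ** q) ** (1.0 / q)
    print(d, moment, np.sqrt(d))     # moment <= sqrt(d)
```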
Corollary 4.7
Let \( A_d = ( A_{ d, i, j } )_{ (i, j) \in \{ 1, \dots , d \}^2 } \in {\mathbb {R}}^{ d \times d } \), \( d \in {\mathbb {N}}\), be symmetric positive semidefinite matrices, let \(\left\| \cdot \right\| \! :(\cup _{d \in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0, \infty )\) satisfy for all \(d \in {\mathbb {N}}\), \(x = (x_1, x_2, \ldots , x_d) \in {\mathbb {R}}^d\) that \(\Vert x\Vert = ( \textstyle \sum _{i=1}^d |x_i|^2)^{\nicefrac {1}{2}}\), for every \( d \in {\mathbb {N}}\) let \( \nu _d :{\mathcal {B}}({\mathbb {R}}^d) \rightarrow [0,1]\) be a probability measure on \({\mathbb {R}}^d\), let \( \varphi _{0,d} :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \( d \in {\mathbb {N}}\), and \( \varphi _{ 1, d } :{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \), \( d \in {\mathbb {N}}\), be functions, let \( T, \kappa , p \in (0, \infty )\), \(\theta \in [1, \infty )\), \( ( \phi ^{ m, d }_{ \varepsilon } )_{ (m, d, \varepsilon ) \in \{ 0, 1 \} \times {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\), \(a\in C({\mathbb {R}}, {\mathbb {R}})\) satisfy for all \(x \in {\mathbb {R}}\) that \(a(x) = \max \{x, 0\}\), assume for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \), \(m \in \{0, 1\}\), \( x, y \in {\mathbb {R}}^d \) that \( {\mathcal {R}}_{a}( \phi ^{ 0, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}) \), \( {\mathcal {R}}_{a}( \phi ^{ 1, d }_{ \varepsilon } ) \in C( {\mathbb {R}}^d, {\mathbb {R}}^d ) \), \( | \varphi _{ 0, d }( x ) | + {\text {Trace}}(A_d) \le \kappa d^{ \kappa } ( 1 + \Vert x \Vert ^{ \theta }) \), \([ \int _{{\mathbb {R}}^d} \Vert x\Vert ^{2 \max \{p, 2\} \theta } \, \nu _d (dx) ]^{\nicefrac {1}{(2 \max \{p, 2\} \theta )}} \le \kappa d^{\kappa }\), \( {\mathcal {P}}( \phi ^{ m, d }_{ \varepsilon } ) \le \kappa d^{ \kappa } \varepsilon ^{ - \kappa } \), \( |( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(x) - ( {\mathcal {R}}_{a}(\phi ^{ 0, d }_{ \varepsilon }) )(y)| \le \kappa d^{\kappa } (1 + \Vert x\Vert ^{\theta } + \Vert y \Vert ^{\theta })\Vert x-y\Vert \), \( \Vert ( {\mathcal {R}}_{a}(\phi ^{ 1, d }_{ \varepsilon }) )(x) \Vert \le \kappa ( d^{ \kappa } + \Vert x \Vert ) \), \( \Vert \varphi _{ 1, d }( x ) - \varphi _{ 1, d }( y ) \Vert \le \kappa \Vert x - y \Vert \), and
and for every \( d \in {\mathbb {N}}\) let \( u_d :[0,T] \times {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\) be an at most polynomially growing viscosity solution of
$$\begin{aligned} \big ( \tfrac{\partial }{\partial t} u_d \big )(t, x) = \big ( \tfrac{\partial }{\partial x} u_d \big )(t, x) \, \varphi _{1, d}(x) + \textstyle \sum _{i, j = 1}^d A_{d, i, j} \, \big ( \tfrac{\partial ^2}{\partial x_i \partial x_j} u_d \big )(t, x) \end{aligned}$$
with \( u_d( 0, x ) = \varphi _{ 0, d }( x ) \) for \( ( t, x ) \in (0,T) \times {\mathbb {R}}^d \) (cf. Definitions 3.1 and 3.3). Then there exist \( c \in {\mathbb {R}}\) and \( ( \Psi _{ d, \varepsilon } )_{ (d , \varepsilon ) \in {\mathbb {N}}\times (0,1] } \subseteq {\mathbf {N}}\) such that for all \( d \in {\mathbb {N}}\), \( \varepsilon \in (0,1] \) it holds that \( {\mathcal {P}}( \Psi _{ d, \varepsilon } ) \le c \, d^c \varepsilon ^{ - c } \), \( {\mathcal {R}}_{a}( \Psi _{ d, \varepsilon } ) \in C( {\mathbb {R}}^{ d }, {\mathbb {R}}) \), and
$$\begin{aligned} \Big [ \int _{ {\mathbb {R}}^d } | u_d(T, x) - ( {\mathcal {R}}_{a} (\Psi _{ d, \varepsilon }) )( x ) |^p \, \nu _d(dx) \Big ]^{ \nicefrac { 1 }{ p } } \le \varepsilon . \end{aligned}$$
References
Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A.: Deep splitting method for parabolic PDEs. SIAM J. Sci. Comput. 43(5), A3135–A3154 (2021)
Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving the Kolmogorov PDE by means of deep learning. J. Sci. Comput. 88(3), Paper No. 73 (2021)
Beck, C., E, W., Jentzen, A.: Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci. 29(4), 1563–1619 (2019)
Beck, C., Hutzenthaler, M., Jentzen, A., Kuckuck, B.: An overview on deep learning-based approximation methods for partial differential equations. Revision requested from Discrete Contin. Dyn. Syst. Ser. B., arXiv:2012.12348 (2020)
Becker, S., Cheridito, P., Jentzen, A.: Deep optimal stopping. J. Mach. Learn. Res. 20, Paper No. 74, 25 pp. (2019)
Becker, S., Cheridito, P., Jentzen, A., Welti, T.: Solving high-dimensional optimal stopping problems using deep learning. Eur. J. Appl. Math. 32(3), 470–514 (2021)
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
Bensoussan, A., Lions, J.-L.: Applications of Variational Inequalities in Stochastic Control. Studies in Mathematics and Its Applications, vol. 12. North-Holland Publishing Co., Amsterdam (1982)
Berg, J., Nyström, K.: A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317, 28–41 (2018)
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. SIAM J. Math. Data Sci. 2(3), 631–657 (2020)
Chan-Wai-Nam, Q., Mikael, J., Warin, X.: Machine learning for semi linear PDEs. J. Sci. Comput. 79(3), 1667–1712 (2019)
Chouiekh, A., Haj, E.H.I.E.: Convnets for fraud detection analysis. Proc. Comput. Sci. 127, 133–138 (2018)
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 30–42 (2012)
Dissanayake, M.W.M.G., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Meth. Eng. 10(3), 195–201 (1994)
Dockhorn, T.: A Discussion on Solving Partial Differential Equations Using Neural Networks. arXiv:1904.07200 (2019)
E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017)
E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
Elbrächter, D., Grohs, P., Jentzen, A., Schwab, C.: DNN expression rate analysis of high-dimensional PDEs: application to option pricing. Constr. Approx. 55, 3–71 (2022)
Farahmand, A.-M., Nabi, S., Nikovski, D.: Deep reinforcement learning for partial differential equation control. In: 2017 American Control Conference (ACC), pp 3120–3127 (2017)
Fujii, M., Takahashi, A., Takahashi, M.: Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pac. Financ. Markets 26(3), 391–408 (2019)
Gonon, L., Grohs, P., Jentzen, A., Kofler, D., Šiška, D.: Uniform error estimates for artificial neural network approximations for heat equations. IMA J. Numer. Anal. (2021). https://doi.org/10.1093/imanum/drab027
Goudenège, L., Molent, A., Zanette, A.: Machine Learning for Pricing American Options in High Dimension. arXiv:1903.11275 (2019)
Graves, A., Mohamed, A.-R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, ICASSP, pp. 6645–6649 (2013)
Grohs, P., Hornung, F., Jentzen, A., von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. Accepted in Mem. Am. Math. Soc. arXiv:1809.02362 (2018)
Grohs, P., Hornung, F., Jentzen, A., Zimmermann, P.: Space-time error estimates for deep neural network approximations for differential equations. Revision requested from Adv. Comput. Math. arXiv:1908.03833 (2019)
Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
Han, J., Long, J.: Convergence of the deep BSDE method for coupled FBSDEs. Probab. Uncertain. Quant. Risk 5, 5 (2020)
Henry-Labordère, P.: Deep primal-dual algorithm for BSDEs: applications of machine learning to CVA and IM. SSRN (2017). https://doi.org/10.2139/ssrn.3071506
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.-R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Hornung, F., Jentzen, A., Salimova, D.: Space-time Deep Neural Network Approximations for High-dimensional Partial Differential Equations. arXiv:2006.02199 (2020)
Hu, B., Lu, Z., Li, H., Chen, Q.: Convolutional neural network architectures for matching natural language sentences. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2 pp. 2042–2050 (2014)
Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2261–2269 (2017)
Huré, C., Pham, H., Warin, X.: Some Machine Learning Schemes for High-dimensional Nonlinear PDEs. arXiv:1902.01599 (2019)
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differ. Equ. Appl. 1(10), 1–34 (2020)
Jacquier, A., Oumgari, M.: Deep PPDEs for Rough Local Stochastic Volatility. arXiv:1906.02551 (2019)
Jentzen, A., Salimova, D., Welti, T.: A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. Commun. Math. Sci. 19(5), 1167–1205 (2021)
Jianyu, L., Siwei, L., Yingjian, Q., Yaping, H.: Numerical solution of elliptic partial differential equation using radial basis function neural networks. Neural Netw. 16(5), 729–734 (2003)
Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 655–665 (2014)
Khoo, Y., Lu, J., Ying, L.: Solving parametric PDE problems with artificial neural networks. Eur. J. Appl. Math. 32(3), 421–435 (2021)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
Kutyniok, G., Petersen, P., Raslan, M., Schneider, R.: A theoretical analysis of deep neural networks and parametric PDEs. Constr. Approx. 55, 73–125 (2022)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998)
Long, Z., Lu, Y., Ma, X., Dong, B.: PDE-Net: learning PDEs from data. In: Proceedings of the 35th International Conference on Machine Learning, pp. 3208–3216 (2018)
Lye, K.O., Mishra, S., Ray, D.: Deep learning observables in computational fluid dynamics. J. Comput. Phys. 410, 109339 (2020)
Magill, M., Qureshi, F., de Haan, H.W.: Neural networks trained to solve differential equations learn general representations. In: Advances in Neural Information Processing Systems, pp. 4071–4081 (2018)
Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems. Volume I: Linear Information, vol. 6 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich (2008)
Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems. Volume II: Standard Information for Functionals, vol. 12 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich (2010)
Pham, H., Warin, X.: Neural networks-based backward scheme for fully nonlinear PDEs. arXiv:1908.00412 (2019)
Raissi, M.: Deep hidden physics models: deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 19, 25:1–25:24 (2018)
Reisinger, C., Zhang, Y.: Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. Anal. Appl. (Singap.) 18(6), 951–999 (2020)
Roy, A., Sun, J., Mahoney, R., Alonzi, L., Adams, S., Beling, P.: Deep learning detecting fraud in credit card transactions. In: 2018 Systems and Information Engineering Design Symposium (SIEDS), pp. 129–134 (2018)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708 (2014)
Wang, R., Fu, B., Fu, G., Wang, M.: Deep & cross network for ad click predictions. In: Proceedings of the ADKDD’17 (2017)
Wang, W., Yang, J., Xiao, J., Li, S., Zhou, D.: Face recognition based on deep learning. In: Human Centered Computing, pp. 812–820 (2015)
Wu, C., Karanasou, P., Gales, M.J., Sim, K.C.: Stimulated deep neural network for speech recognition. In: Interspeech 2016, pp. 400–404 (2016)
Zhai, S., Chang, K.-h., Zhang, R., Zhang, Z.M.: DeepIntent: learning attentions for online advertising with recurrent neural networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1295–1304 (2016)
Acknowledgements
This work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2044-390685587, Mathematics Münster: Dynamics-Geometry-Structure, by the Swiss National Science Foundation (SNSF) through the research grant 200020_175699, by ETH Foundations of Data Science (ETH-FDS), and by the startup fund project of Shenzhen Research Institute of Big Data through the research grant T00120220001.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
This article is part of the topical collection “Deep learning and PDEs” edited by Arnulf Jentzen, Lin Lin, Siddhartha Mishra, and Lexing Ying.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Grohs, P., Jentzen, A. & Salimova, D. Deep neural network approximations for solutions of PDEs based on Monte Carlo algorithms. Partial Differ. Equ. Appl. 3, 45 (2022). https://doi.org/10.1007/s42985-021-00100-z
Mathematics Subject Classification
- 65C99
- 68T05