Abstract
We analyze approximation rates by deep ReLU networks of a class of multivariate solutions of Kolmogorov equations which arise in option pricing. Key technical devices are deep ReLU architectures capable of efficiently approximating tensor products. Combined with results concerning the approximation of well-behaved (i.e., sufficiently smooth) univariate functions, this provides insights into rates of deep ReLU approximation of multivariate functions with tensor structure. We apply this in particular to the model problem given by the price of a European maximum option on a basket of d assets within the Black–Scholes model for European option pricing. We prove that the solution to the d-variate option pricing problem can be approximated up to an \(\varepsilon \)-error by a deep ReLU network with depth \({\mathcal {O}}\big (\ln (d)\ln (\varepsilon ^{-1})+\ln (d)^2\big )\) and \({\mathcal {O}}\big (d^{2+\frac{1}{n}}\varepsilon ^{-\frac{1}{n}}\big )\) nonzero weights, where \(n\in {\mathbb {N}}\) is arbitrary (with the constant implied in \({\mathcal {O}}(\cdot )\) depending on n). The techniques developed in the constructive proof are of independent interest in the analysis of the expressive power of deep neural networks for solution manifolds of PDEs in high dimension.
1 Introduction
1.1 Motivation
The development of new classification and regression algorithms based on deep neural networks—coined “deep learning”—revolutionized the area of artificial intelligence, machine learning and data analysis [15]. More recently, these methods have been applied to the numerical solution of partial differential equations (PDEs for short) [3, 12, 21, 22, 27, 32, 39, 41, 42]. In these works, it has been empirically observed that deep learning-based methods work exceptionally well when used for the numerical solution of high-dimensional problems arising in option pricing. The numerical experiments carried out in [2, 3, 21, 42] in particular suggest that deep learning-based methods may not suffer from the curse of dimensionality for these problems, but only a few theoretical results exist which support this claim: In [38], a first theoretical result on rates of expression of infinite-variate generalized polynomial chaos expansions for solution manifolds of certain classes of parametric PDEs has been obtained. Furthermore, recent work [4, 18] shows that the algorithms introduced in [2] for the numerical solution of Kolmogorov PDEs are free of the curse of dimensionality in terms of network size and training sample complexity.
Neural networks constitute a parametrized class of functions constructed by successive applications of affine mappings and coordinatewise nonlinearities; see [35] for a mathematical introduction. As in [34], we introduce a neural network via a tuple of matrix-vector pairs
for given hyperparameters \(L\in {\mathbb {N}}\), \(N_0,N_1,\dots ,N_L\in {\mathbb {N}}\). Given an “activation function” \(\varrho \in C({\mathbb {R}},{\mathbb {R}})\), a neural network \(\Phi \) then describes a function \(R_\varrho (\Phi )\in C({\mathbb {R}}^{N_0},{\mathbb {R}}^{N_L})\) that can be evaluated by the recursion
The number of nonzero values in the matrix-vector tuples defining \(\Phi \) is called the size of \(\Phi \) and will be denoted by \({\mathcal {M}}(\Phi )\); the depth of the network \(\Phi \), i.e., its number of affine transformations, will be denoted by \({\mathcal {L}}(\Phi )\). We refer to Setting 5.1 for a more detailed description. A popular activation function \(\varrho \) is the so-called rectified linear unit \(\mathrm {ReLU}(x)=\max \{x,0\}\) [15].
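To make this bookkeeping concrete, the following minimal Python/NumPy sketch (illustrative only; the names `realize`, `depth` and `size` are not from the paper) stores a network as a list of matrix-vector pairs, evaluates its ReLU realization by the recursion above and counts the depth \({\mathcal {L}}(\Phi )\) and the number of nonzero weights \({\mathcal {M}}(\Phi )\).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(phi, x):
    """ReLU realization R_rho(Phi) of a network phi given as a list of
    (A, b) pairs: apply the activation after every affine map except the last."""
    for A, b in phi[:-1]:
        x = relu(A @ x + b)
    A, b = phi[-1]
    return A @ x + b

def depth(phi):
    """L(Phi): the number of affine transformations."""
    return len(phi)

def size(phi):
    """M(Phi): the number of nonzero entries in all matrices and vectors."""
    return sum(np.count_nonzero(A) + np.count_nonzero(b) for A, b in phi)

# Example: a network with architecture (N_0, N_1, N_2) = (2, 3, 1).
phi = [(np.array([[1.0, -1.0], [0.0, 2.0], [0.5, 0.5]]), np.zeros(3)),
       (np.array([[1.0, 1.0, -1.0]]), np.array([0.1]))]
print(realize(phi, np.array([0.3, -0.7])), depth(phi), size(phi))
```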
An increasing body of research addresses the approximation properties (or “expressive power”) of deep neural networks, where by “approximation properties” we mean the study of the optimal trade-off between the size \({\mathcal {M}}(\Phi )\) and the approximation error \(\Vert u-R_\varrho (\Phi )\Vert \) of neural networks approximating functions u from a given function class. Classical references include [1, 7, 8, 23] as well as the summary [35] and the references therein. In these works, it is shown that deep neural networks provide optimal approximation rates for classical smoothness spaces such as Sobolev spaces or Besov spaces. More recently, these results have been extended to Shearlet and Ridgelet spaces [5], modulation spaces [33], piecewise smooth functions [34] and polynomial chaos expansions [38]. All these results indicate that all classical approximation methods based on sparse expansions can be emulated by neural networks.
1.2 Contributions and Main Result
As a first main contribution of this work, we show in Proposition 6.4 that low-rank functions of the form
with \(h_j^s\in C({\mathbb {R}},{\mathbb {R}})\) sufficiently regular and \((c_s)_{s=1}^R\subseteq {\mathbb {R}}\) can be approximated to a given relative precision by deep ReLU neural networks of size scaling like \(Rd^2\). In particular, we obtain a dependence on the dimension d that is only polynomial and not exponential, i.e., we avoid the curse of dimensionality. In other words, we show that, in addition, all classical approximation methods based on sparse expansions and on more general low-rank structures can be emulated by neural networks. Since the solutions of several classes of high-dimensional PDEs are precisely of this form (see, e.g., [38]), our approximation results can be directly applied to these problems to establish approximation rates for neural network approximations that do not suffer from the curse of dimensionality. Note that approximation results for functions of the form (1.2) have previously been considered in [37] in the context of statistical bounds for nonparametric regression.
Moreover, we remark that the networks realizing the product in (1.2) itself have a connectivity scaling which is logarithmic in the accuracy \(\varepsilon ^{-1}\). While we will, for our concrete example, only obtain a spectral connectivity scaling, i.e., like \(\varepsilon ^{-{\frac{1}{n}}}\) for any \(n\in {\mathbb {N}}\) with the implicit constant depending on n, this tensor construction may be used to obtain logarithmic scaling (w.r.t. the accuracy) for d-variate functions in cases where the univariate \(h_j^s\) can be approximated with a logarithmic scaling.
As a particular application of the tools developed in the present paper, we provide a mathematical analysis of the rates of expressive power of neural networks for a particular, high-dimensional PDE which arises in mathematical finance, namely the pricing of a so-called European maximum option (see, e.g., [43]).
We consider the particular (and not quite realistic) situation that the log-returns of these d assets are uncorrelated, i.e., they evolve according to d uncorrelated drifted scalar diffusion processes.
The price of the European maximum option on this basket of d assets can then be obtained as solution of the multivariate Black–Scholes equation which reads, for the presently considered case of uncorrelated assets, as
For the European maximum option, (1.3) is completed with the terminal condition
for \( x = ( x_1, \dots , x_d ) \in (0,\infty )^d \). It is well known (see, e.g., [11, 20] and the references therein) that there exists a unique solution of (1.3)–(1.4). This solution can be expressed as a conditional expectation of the function \(\varphi (x)\) in (1.4) over suitable sample paths of a d-dimensional diffusion.
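For the reader's convenience, and since the displays (1.3)–(1.4) are not reproduced above, we record a sketch of the standard form of the uncorrelated multivariate Black–Scholes equation and of the maximum-option payoff that the surrounding text refers to; the precise formulation, constants and sign conventions are those of the displays in the published statement.

$$\begin{aligned} \tfrac{\partial u}{\partial t}(t,x)+\mu \sum _{i=1}^d x_i\tfrac{\partial u}{\partial x_i}(t,x)+\tfrac{\sigma ^2}{2}\sum _{i=1}^d x_i^2\tfrac{\partial ^2 u}{\partial x_i^2}(t,x)=0,\qquad u(T,x)=\varphi (x)=\max \Big \{\max _{1\le i\le d}(x_i-K_i),\,0\Big \}. \end{aligned}$$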
One main result of this paper is the following theorem (stated with fully detailed assumptions as Theorem 7.3) on expression rates of deep neural networks for the basket option price u(0, x) for \(x\in [a,b]^d\) with some \(0<a<b< \infty \). To render the dependence on the number d of assets in the basket explicit, we write \(u_d\) in the statement of the theorem.
Theorem 1.1
Let \(n\in {\mathbb {N}}\), \(\mu \in {\mathbb {R}}\), \(T,\sigma ,a\in (0,\infty )\), \(b\in (a,\infty )\), \((K_i)_{i\in {\mathbb {N}}}\subseteq [0,K_{\mathrm {max}})\), and let \(u_d:(0,\infty )\times [a,b]^d\rightarrow {\mathbb {R}}\), \(d\in {\mathbb {N}}\), be the functions which satisfy for every \(d\in {\mathbb {N}}\), and for every \((t,x) \in [0,T]\times (0,\infty )^d\) the equation (1.3) with terminal condition (1.4).
Then there exist neural networks \((\Gamma _{d,\varepsilon })_{\varepsilon \in (0,1],d\in {\mathbb {N}}}\) which satisfy
-
(i)
\(\displaystyle \sup _{\varepsilon \in (0,1],d\in {\mathbb {N}}}\left[ {\frac{{\mathcal {L}}(\Gamma _{d,\varepsilon })}{\max \{1,\ln (d)\}\left( {|\ln (\varepsilon )|+\ln (d)+1}\right) }}\right] <\infty \),
-
(ii)
\(\displaystyle \sup _{\varepsilon \in (0,1],d\in {\mathbb {N}}}\left[ {\frac{{\mathcal {M}}(\Gamma _{d,\varepsilon })}{d^{2+\frac{1}{n}}\varepsilon ^{-\frac{1}{n}}}}\right] <\infty \), and
-
(iii)
for every \(\varepsilon \in (0,1]\), \(d\in {\mathbb {N}}\),
$$\begin{aligned} \sup _{x\in [a,b]^d}\left| {u_d(0,x)-\left[ {R_{\mathrm {ReLU}}(\Gamma _{d,\varepsilon })}\right] \!(x)}\right| \le \varepsilon . \end{aligned}$$(1.5)
Informally speaking, the previous result states that the price of a d-dimensional European maximum option can be expressed on cubes \([a,b]^d\) by deep neural networks to pointwise accuracy \(\varepsilon >0\) with network size bounded as \({\mathcal {O}}(d^{2+1/n} \varepsilon ^{-1/n})\) for arbitrary, fixed \(n\in {\mathbb {N}}\) and with the constant implied in \({\mathcal {O}}(\cdot )\) independent of d and of \(\varepsilon \) (but depending on n). In other words, the price of a European maximum option on a basket of d assets can be approximated (or “expressed”) by deep ReLU networks with spectral accuracy and without curse of dimensionality.
The proof of this result is based on a near explicit expression for the function \(u_d(0,x)\) (see Sect. 2). It uses this expression in conjunction with regularity estimates in Sect. 3 and a neural network quadrature calculus and corresponding error estimates (which is of independent interest) in Sect. 4 to show that the function \(u_d(0,x)\) possesses an approximate low-rank representation consisting of tensor products of cumulative normal distribution functions (Lemma 4.3) to which the low-rank approximation result mentioned above can be applied.
Related results have been shown in the recent work [18] which proves (by completely different methods) that solutions to general Kolmogorov equations with affine drift and diffusion terms can be approximated by neural networks of a size that scales polynomially in the dimension and the reciprocal of the desired accuracy as measured by the \(L^p\) norm with respect to a given probability measure. The approximation estimates developed in the present paper only apply to the European maximum option pricing problem for uncorrelated assets but hold with respect to the much stronger \(L^\infty \) norm and provide spectral accuracy in \(\varepsilon \) (as opposed to a low-order polynomial rate obtained in [18]), which is a considerable improvement. In summary, compared to [18], the present paper treats a more restricted problem but achieves stronger approximation results.
In order to give some context to our approximation results, we remark that solutions to Kolmogorov PDEs may, under reasonable assumptions, be approximated by empirical risk minimization over a neural network hypothesis class. The key here is the Feynman–Kac formula which allows one to write the solution to the PDE as the expectation of an associated stochastic process. This expectation can be approximated by Monte Carlo integration, i.e., one can view it as a neural network training problem where the data are generated by Monte Carlo sampling methods which, under suitable conditions, are capable of avoiding the curse of dimensionality. For more information on this, we refer to [4].
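As an illustration of this Feynman–Kac/Monte Carlo connection, the following Python sketch (with illustrative parameter values and function names, not taken from the paper) simulates the terminal values of d uncorrelated geometric Brownian motions and averages the maximum-option payoff; such sample averages approximate \(u_d(0,x)\) and can also serve as training data for a network regression.

```python
import numpy as np

def mc_price(x, K, mu, sigma, T, n_samples=100_000, seed=0):
    """Monte Carlo estimate of u(0, x) = E[phi(X_T)] via the Feynman-Kac
    formula: the d assets follow uncorrelated geometric Brownian motions
    X_T^i = x_i * exp((mu - sigma^2 / 2) * T + sigma * W_T^i), and
    phi(x) = max(max_i(x_i - K_i), 0) is the maximum-option payoff."""
    rng = np.random.default_rng(seed)
    d = len(x)
    W_T = np.sqrt(T) * rng.standard_normal((n_samples, d))
    X_T = x * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)
    payoff = np.maximum((X_T - K).max(axis=1), 0.0)
    return payoff.mean()

# Illustrative values: a basket of d = 10 assets all started at 1.0.
d = 10
print(mc_price(np.ones(d), K=np.full(d, 1.0), mu=0.05, sigma=0.2, T=1.0))
```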
While we admit that the European maximum option pricing problem for uncorrelated assets constitutes a rather special problem, the proofs in this paper develop several novel deep neural network approximation results of independent interest that can be applied to more general settings where a low-rank structure is implicit in high-dimensional problems. For mostly numerical results on machine learning for pricing American options, we refer to [16]. Lastly, we note that after a first preprint of the present paper was submitted, a number of research articles related to this work have appeared [13, 14, 17, 19, 24, 25, 26, 28, 36].
1.3 Outline
The structure of this article is as follows. Section 2 provides a derivation of the semi-explicit formula for the price of European maximum options in a standard Black–Scholes setting. This formula consists of an integral of a tensor product function. In Sect. 3, we develop some auxiliary regularity results for the cumulative normal distribution that are of independent interest and which will be used later on. In Sect. 4, we show that the integral appearing in the formula of Sect. 2 can be efficiently approximated by numerical quadrature. Section 5 introduces some basic facts related to deep ReLU networks, and Sect. 6 develops basic results for the approximation of functions which possess a tensor product structure. Finally, in Sect. 7 we show our main result, namely a spectral rate for the approximation of European maximum options by deep ReLU networks without curse of dimensionality. In Appendix A, we collect some auxiliary proofs.
2 High-Dimensional Derivative Pricing
In this section, we briefly review the Black–Scholes differential equation (1.3) which arises, among other settings, as the Kolmogorov equation for multivariate geometric Brownian motion. For one particular type of financial contract (a so-called European maximum option on a basket of d stocks whose log-returns are, for simplicity, assumed to be mutually uncorrelated), this linear, parabolic equation is endowed with the terminal condition (1.4) and solved for \((t,x)\in [0,T]\times (0,\infty )^d\).
Proposition 2.1
Let \( d \in {\mathbb {N}}\), \( \mu \in {\mathbb {R}}\), \( \sigma , T, K_1, \dots , K_d, \xi _1, \dots , \xi _d \in (0,\infty ) \), let \( ( \Omega , {\mathcal {F}}, {\mathbb {P}}) \) be a probability space, and let \( W = ( W^{ (1) }, \dots , W^{ (d) } ) :[0,T] \times \Omega \rightarrow {\mathbb {R}}^d \) be a standard Brownian motion and let \(u\in C([0,T]\times (0,\infty )^d)\) satisfy (1.3) and (1.4). Then for \(x = (\xi _1,\dots , \xi _d)\in (0,\infty )^d\) it holds that
For the proof of this Proposition, we require the following well-known result.
Lemma 2.2
(Complementary distribution function formula) Let \( \mu :{\mathcal {B}}( [0,\infty ) ) \rightarrow [0,\infty ] \) be a sigma-finite measure. Then
We are now in a position to provide a proof of Proposition 2.1.
Proof of Proposition 2.1
The first equality follows directly from the Feynman–Kac formula [20, Corollary 4.17]. We proceed with a proof of the second equality. Throughout this proof, let \(X_i :\Omega \rightarrow {\mathbb {R}}\), \(i \in \{ 1, 2, \dots , d \}\), be random variables which satisfy for every \( i \in \{ 1, 2, \dots , d \} \)
and let \( Y :\Omega \rightarrow {\mathbb {R}}\) be the random variable given by
Observe that for every \( y \in (0,\infty ) \) it holds
Hence, we obtain that for every \( y \in (0,\infty ) \) it holds
This shows that for every \( y \in (0,\infty ) \) it holds
Combining this with Lemma 2.2 completes the proof of Proposition 2.1. \(\square \)
With Lemma 2.2 and Proposition 2.1, we may write
(“semi-explicit” formula). Let us consider the case \( \mu = \sigma ^2 / 2 \), \( T = \sigma = 1 \) and \( K_1 = \ldots = K_d = K \in (0,\infty ) \). Then for every \( x = ( x_1, \dots , x_d) \in (0,\infty )^d \)
3 Regularity of the Cumulative Normal Distribution
Now that we have derived a semi-explicit formula for the solution, we establish regularity properties of the integrand function in (2.9). This will be required in order to approximate the multivariate integrals by quadratures (which are subsequently realized by neural networks) in Sect. 4 and to apply the neural network results from Sect. 6 to our problem. To this end, we analyze the derivatives of the factors in the tensor product, which essentially are compositions of the cumulative normal distribution with the natural logarithm. As this function appears in numerous closed-form option pricing formulae (see, e.g., [29]), the (Gevrey-type) regularity estimates obtained in this section are of independent interest. (They may, for example, also be used in the analysis of deep network expression rates and of spectral methods for option pricing).
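For orientation, since the defining displays are not reproduced above: the univariate building block studied here is (up to the normalization used in Lemma 3.1 below) the cumulative standard normal distribution composed with the natural logarithm. A sketch of its definition and first derivative, consistent with (3.5) below, reads

$$\begin{aligned} f(t)=\tfrac{1}{\sqrt{2\pi }}\int _{-\infty }^{\ln (t)}e^{-\frac{1}{2}r^2}\,\mathrm {d}r,\qquad f'(t)=\tfrac{1}{\sqrt{2\pi }}\,e^{-\frac{1}{2}[\ln (t)]^2}\,t^{-1},\qquad t\in (0,\infty ). \end{aligned}$$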
Lemma 3.1
Let \(f:(0,\infty )\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(t\in (0,\infty )\) that
let \(g_{n,k}:(0,\infty )\rightarrow {\mathbb {R}}\), \(n,k\in {\mathbb {N}}_0\), be the functions which satisfy for every \(n,k\in {\mathbb {N}}_0\), \(t\in (0,\infty )\) that
and let \((\gamma _{n,k})_{n,k\in {\mathbb {Z}}}\subseteq {\mathbb {Z}}\) be the integers which satisfy for every \(n,k\in {\mathbb {Z}}\) that
Then it holds for every \(n\in {\mathbb {N}}\) that
-
(i)
we have that f is n times continuously differentiable and
-
(ii)
we have for every \(t\in (0,\infty )\) that
$$\begin{aligned} f^{(n)}(t)=\tfrac{1}{\sqrt{2\pi }}\left[ {\sum _{k=0}^{n-1}\gamma _{n,k}\,g_{n,k}(t)}\right] . \end{aligned}$$(3.4)
Proof of Lemma 3.1
We prove (i) and (ii) by induction on \(n\in {\mathbb {N}}\). For the base case \(n=1\) note that (3.1), (3.2), (3.3), the fact that the function \({\mathbb {R}}\ni r\mapsto e^{-\frac{1}{2}r^2}\in (0,\infty )\) is continuous, the fundamental theorem of calculus and the chain rule yield
-
(A)
that f is differentiable and
-
(B)
that for every \(t\in (0,\infty )\) it holds
$$\begin{aligned} f'(t)=\tfrac{1}{\sqrt{2\pi }}\, e^{-\frac{1}{2}[\ln (t)]^2}t^{ - 1 } = \tfrac{1}{\sqrt{2\pi }}\, g_{1,0}(t) = \tfrac{1}{\sqrt{2\pi }}\, \gamma _{ 1, 0 } \, g_{1,0}(t) . \end{aligned}$$(3.5)
This establishes (i) and (ii) in the base case \(n=1\). For the induction step \({{\mathbb {N}}\ni n\rightarrow n+1\in \{2,3,4,\dots \}}\), note that for every \(t\in (0,\infty )\) we have
Combining this and (3.2) with the product rule establishes for every \(n\in {\mathbb {N}}\), \(k \in \{ 0, 1, \dots , n - 1 \} \), \( t \in (0,\infty ) \) that
Hence, we obtain that for every \(n\in {\mathbb {N}}\), \(t\in (0,\infty )\) it holds
The fact that for every \(n\in {\mathbb {N}}\) it holds that \(\gamma _{n,-1}=\gamma _{n,n}=\gamma _{n,n+1}=0\) and (3.3) therefore ensure that for every \(n\in {\mathbb {N}}\), \(t\in (0,\infty )\) we have
Induction thus establishes (i) and (ii). The proof of Lemma 3.1 is thus completed. \(\square \)
Using the recursive formula derived above, we can now bound the derivatives of f. Note that the supremum of \(f^{(n)}\) is actually attained on the interval \([e^{-4n},1]\) and scales with n like \(e^{cn^2}\) for some \(c\in (0,\infty )\). This can be seen directly by calculating the maximum of the \(g_{n,k}\) from (3.2). For our purposes, however, it is sufficient to establish that all derivatives of f are bounded on \((0,\infty )\).
Lemma 3.2
Let \(f:(0,\infty )\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(t\in (0,\infty )\) that
Then it holds for every \(n\in {\mathbb {N}}\) that
Proof of Lemma 3.2
Throughout this proof, let \(g_{n,k}:(0,\infty )\rightarrow {\mathbb {R}}\), \(n,k\in {\mathbb {N}}_0\), be the functions introduced in (3.2) and let \((\gamma _{n,k})_{n,k\in {\mathbb {Z}}}\subseteq {\mathbb {Z}}\) be the integers introduced in (3.3). Then Lemma 3.1 shows for every \(n\in {\mathbb {N}}\) that
-
(a)
we have that f is n times continuously differentiable and
-
(b)
we have for every \(t\in (0,\infty )\) that
$$\begin{aligned} f^{(n)}(t)=\tfrac{1}{\sqrt{2\pi }}\left[ {\sum _{k=0}^{n-1}\gamma _{n,k}\,g_{n,k}(t)}\right] . \end{aligned}$$(3.12)
In addition, observe that for every \(m\in {\mathbb {N}}\), \(t\in (0,e^{-2m}]\) holds \({\tfrac{1}{2}\ln (t)\le -m}\). This ensures that for every \(m\in {\mathbb {N}}\), \(t\in (0,e^{-2m}]\subseteq (0,1]\) we have
Moreover, note that the fundamental theorem of calculus implies for every \(t\in (0,1]\) that
Combining (3.2), (3.12) and (3.13) therefore establishes that for every \(n\in {\mathbb {N}}\), \({t\in (0,e^{-4n})}{\subseteq (0,1]}\) it holds
In addition, observe that the fundamental theorem of calculus ensures that for every \(t\in [1,\infty )\) we have
This, (3.2), (3.12) and the fact that for every \(t\in (0,\infty )\) it holds \(|e^{-\frac{1}{2}[\ln (t)]^2}|\le 1\) imply that for every \(n\in {\mathbb {N}}\), \(t\in (1,\infty )\) we have
Moreover, observe that (a) assures that for every \(n\in {\mathbb {N}}\) it holds that the function \(f^{(n)}\) is continuous. This and the compactness of the interval \([e^{-4n},1]\) ensure that for every \(n\in {\mathbb {N}}\) we have
Combining this with (3.15) and (3.17) establishes that for every \(n\in {\mathbb {N}}\) we have
Furthermore, note that (3.3) implies that for every \(n\in \{2,3,4,\dots \}\) it holds
Combining this with the fact that for every \(n\in \{2,3,4,\dots \}\), \(k\in {\mathbb {Z}}\backslash \{0,1,\dots ,n-2\}\) we have \(\gamma _{n-1,k}=0\) implies that for every \(n\in \{2,3,4,\dots \}\) it holds
The fact that \(\gamma _{1,0}=1\) hence implies that for every \(n\in {\mathbb {N}}\) we have
Combining this and (3.19) ensures that for every \(n\in {\mathbb {N}}\) it holds
The proof of Lemma 3.2 is thus completed. \(\square \)
In the following corollary, we estimate the derivatives of the function \(x\mapsto f(\tfrac{K+c}{x})\), which are required in order to approximate this function by neural networks.
Corollary 3.3
Let \(n\in {\mathbb {N}}\), \(K\in [0,\infty )\), \(c,a\in (0,\infty )\), \(b\in (a,\infty )\), let \(f:(0,\infty )\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(t\in (0,\infty )\) that
and let \(h:[a,b]\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(x\in [a,b]\) that
Then it holds
-
(i)
that f and h are infinitely often differentiable and
-
(ii)
that
$$\begin{aligned}&\max _{k\in \{0,1,\dots ,{n}\}}\sup _{x\in [a,b]}\left| {h^{(k)}\!(x)}\right| \nonumber \\&\quad \le n2^{n-1} n! \left[ {\max _{k\in \{0,1,\dots ,{n}\}} \sup _{t\in [\frac{K+c}{b},\frac{K+c}{a}]}\left| {f^{(k)}\!(t)}\right| }\right] \max \{a^{-2n},1\}\max \{(K+c)^n,1\}.\nonumber \\ \end{aligned}$$(3.26)
Proof of Corollary 3.3
Throughout this proof, let \(\alpha _{m,j}\in {\mathbb {Z}}\), \(m,j\in {\mathbb {Z}}\), be the integers which satisfy that for every \(m,j\in {\mathbb {Z}}\) it holds
Note that Lemma 3.1 and the chain rule ensure that the functions f and h are infinitely often differentiable. Next we claim that for every \(m\in {\mathbb {N}}\), \(x\in [a,b]\) it holds
We prove (3.28) by induction on \(m\in {\mathbb {N}}\). To prove the base case \(m=1\) we note that the chain rule ensures that for every \(x\in [a,b]\) we have
This establishes (3.28) in the base case \(m=1\). For the induction step \({\mathbb {N}}\ni m \rightarrow m+1\in {\mathbb {N}}\) observe that the chain rule implies for every \(m\in {\mathbb {N}}\), \(x\in [a,b]\) that
Induction thus establishes (3.28). Next note that (3.27) ensures that for every \(m\in \{2,3,\dots \}\) it holds
Induction hence proves that for every \(m\in {\mathbb {N}}\) we have \(\max _{j\in \{1,2,\dots ,{m}\}}\left| {\alpha _{m,j}}\right| \le 2^{m-1}m!\). Combining this with (3.28) implies that for every \(m\in \{1,2,\dots ,{n}\}\), \(x\in [a,b]\) we have
Combining this with the fact that \(\sup _{x\in [a,b]}\left| {h(x)}\right| =\sup _{t\in [\frac{K+c}{b},\frac{K+c}{a}]}\left| {f(t)}\right| \) establishes that it holds
This completes the proof of Corollary 3.3. \(\square \)
Next we consider the derivatives of the functions \(c\mapsto f(\tfrac{K+c}{x_i})\), \(i\in \{1,2,\dots ,{d}\}\), and their tensor product, which will be needed in order to approximate the outer integral in (2.9) by composite Gaussian quadrature.
Corollary 3.4
Let \(n\in {\mathbb {N}}\), \(K\in [0,\infty )\), \(x\in (0,\infty )\), let \(f:(0,\infty )\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(t\in (0,\infty )\) that
and let \(g:(0,\infty )\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(t\in (0,\infty )\) that
Then it holds
-
(i)
that f and g are infinitely often differentiable and
-
(ii)
that
$$\begin{aligned} \sup _{t\in (0,\infty )}\left| {g^{(n)}(t)}\right| \le \left[ \sup _{t\in (0,\infty )}\left| {f^{(n)}(t)}\right| \right] \left| {x}\right| ^{-n}<\infty . \end{aligned}$$(3.36)
Proof of Corollary 3.4
Combining Lemma 3.2 with the chain rule implies that for every \(t\in (0,\infty )\) it holds
This completes the proof of Corollary 3.4. \(\square \)
Lemma 3.5
Let \(d,n\in {\mathbb {N}}\), \(a\in (0,\infty )\), \(b\in (a,\infty )\), \(K=(K_1,\dots ,K_d)\in [0,\infty )^d\), \(x=(x_1,\dots ,x_d)\in [a,b]^d\), let \(f:(0,\infty )\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(t\in (0,\infty )\) that
and let \(F:(0,\infty )\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(c\in (0,\infty )\) that
Then it holds
-
(i)
that f and F are infinitely often differentiable and
-
(ii)
that
$$\begin{aligned} \sup _{c\,\in (0,\infty )}\left| {F^{(n)}(c)}\right| \le \left[ \max _{k\in \{0,1,\dots ,{n}\}}\sup _{t\in (0,\infty )}\left| {f^{(k)}(t)}\right| \right] ^n d^n a^{-n}<\infty . \end{aligned}$$(3.40)
Proof of Lemma 3.5
Note that Lemma 3.1 ensures that f and F are infinitely often differentiable. Moreover, observe that (3.39) and the general Leibniz rule imply for every \(c\in (0,\infty )\) that
Next note that the fact that for every \(r\in {\mathbb {R}}\) it holds that \(e^{-\frac{1}{2}r^2}\ge 0\) ensures that
Corollary 3.4 hence establishes that for every \(c\in [0,\infty )\), \(l_1,\dots ,l_d\in {\mathbb {N}}_0\) with \(\sum _{i=1}^d l_i=n\) it holds
Moreover, note that the multinomial theorem ensures that
Combining this with (3.41), (3.43) and the assumption that \(x\in [a,b]^d\) implies that for every \(c\in (0,\infty )\) we have
This completes the proof of Lemma 3.5. \(\square \)
4 Quadrature
To approximate the function \(x\mapsto u(0,x)\) from (2.9) by a neural network, we need to evaluate, for arbitrary given x, an expression of the form \(\int _0^{\infty } F_x(c)\mathrm {d}c\) with \(F_x\) as defined in Lemma 4.2. We achieve this by proving in Lemma 4.2 that the functions \(F_x\) decay sufficiently fast for \(c\rightarrow \infty \), and then employ numerical integration to show that the definite integral \(\int _0^N F_x(c)\mathrm {d}c\) can be sufficiently well approximated by a weighted sum of \(F_x(c_j)\) for suitable quadrature points \(c_j\in (0,N)\). Such a weighted sum can then be realized by neural networks. We show in Sects. 6 and 7 how the functions \(x\mapsto F_x(c_j)\) for quadrature points \(c_j\in (0,N)\) can be realized efficiently due to their tensor product structure. We start by recalling an error bound for composite Gaussian quadrature which is explicit in the step size and quadrature order.
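As a concrete illustration of the composite Gaussian quadrature referred to here (a generic numerical sketch, not the paper's neural network construction), the following Python code splits \([0,N]\) into M subintervals and applies an n-point Gauss–Legendre rule on each; the resulting weights sum to N, matching the normalization used in Lemma 4.3 below.

```python
import numpy as np

def composite_gauss_legendre(N, M, n):
    """Nodes (c_j) and weights (w_j) of an M-piece composite n-point
    Gauss-Legendre rule on [0, N]; the weights sum to N and the rule is
    exact for polynomials of degree <= 2n - 1 on each subinterval."""
    t, w = np.polynomial.legendre.leggauss(n)   # reference nodes/weights on [-1, 1]
    h = N / M
    nodes, weights = [], []
    for k in range(M):
        a, b = k * h, (k + 1) * h
        nodes.append(0.5 * (b - a) * t + 0.5 * (a + b))   # affine map to [a, b]
        weights.append(0.5 * (b - a) * w)
    return np.concatenate(nodes), np.concatenate(weights)

# Example: approximate the integral of exp(-c) over [0, 10].
c, w = composite_gauss_legendre(10.0, M=5, n=3)
print(np.sum(w * np.exp(-c)), 1.0 - np.exp(-10.0))
```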
Lemma 4.1
Let \(n,M\in {\mathbb {N}}\), \(N\in (0,\infty )\). Then there exist real numbers \((c_j)_{j=1}^{nM}\subseteq (0,N)\) and \({(w_j)_{j=1}^{nM}\subseteq (0,\infty )}\) such that for every \(h\in C^{2n}([0,N],{\mathbb {R}})\) it holds
Proof of Lemma 4.1
Throughout this proof, let \(h\in C^{2n}([0,N],{\mathbb {R}})\) and \(\alpha _k\in [0,N]\), \(k\in \{0,1,\dots ,M\}\), such that for every \(k\in \{0,1,\dots ,M\}\) it holds \(\alpha _k=\tfrac{kN}{M}\). Observe that [30, Theorems 4.17, 6.11 and 6.12] ensure that for every \(k\in \{0,1,\dots ,M-1\}\) there exist \((\gamma ^k_i)_{i=1}^{n}\subseteq (\alpha _k,\alpha _{k+1})\), \((\omega ^k_i)_{i=1}^{n}\subseteq (0,\infty )\) and \(\xi ^k\in [\alpha _k,\alpha _{k+1}]\) such that
Next note that for every \(k\in \{0,1,\dots ,M-1\}\) it holds
Combining this with (4.2) yields that for every \(k\in \{0,1,\dots ,M\}\) we have
Hence, we obtain
Let \((c_j)_{j=1}^{nM}\subseteq (0,N)\), \((w_j)_{j=1}^{nM}\subseteq (0,\infty )\) such that for every \(i\in \{1,2,\dots ,{n}\}\), \(k\in \{0,1,\dots ,M-1\}\) it holds
Next observe that
This completes the proof of Lemma 4.1. \(\square \)
In the following, we bound the error due to truncating the domain of integration.
Lemma 4.2
Let \(d,n\in {\mathbb {N}}\), \(a\in (0,\infty )\), \(b\in (a,\infty )\), \(K=(K_1,K_2,\dots ,K_d)\in [0,\infty )^d\), let \(F_x:(0,\infty )\rightarrow {\mathbb {R}}\), \(x\in [a,b]^d\), be the functions which satisfy for every \(x=(x_1,x_2,\dots ,x_d)\in [a,b]^d\), \(c\in (0,\infty )\) that
and for every \(\varepsilon \in (0,1]\) let \(N_{\varepsilon }\in {\mathbb {R}}\) be given by \(N_{\varepsilon }=2e^{2(n+1)}(b+1)^{1+\frac{1}{n}}d^{\frac{1}{n}}\varepsilon ^{-\frac{1}{n}}\). Then it holds for every \(\varepsilon \in (0,1]\) that
Proof of Lemma 4.2
Throughout this proof, let \(g:(0,\infty )\rightarrow (0,1)\) be the function given by
Note that [6, Eq.(5)] ensures that for every \(y\in [0,\infty )\) we have \(\tfrac{2}{\sqrt{\pi }}\int _y^{\infty }e^{-r^2}\mathrm {d}r \le e^{-y^2}\). This implies for every \(t\in [1,\infty )\) that
Furthermore, observe that for every \(t\in [e^{2(n+1)},\infty )\) it holds
This, (4.11) and the fact that for every \(\varepsilon \in (0,1]\), \(c\in [N_{\varepsilon },\infty )\), \(x\in [a,b]^d\), \(i\in \{1,2,\dots ,{d}\}\) we have \(\tfrac{K_i+c}{x_i}\ge \tfrac{c}{b}\ge e^{2(n+1)}\ge 1\) imply that for every \(\varepsilon \in (0,1]\), \(c\in [N_{\varepsilon },\infty )\), \(x\in [a,b]^d\) it holds
Combining this with the binomial theorem and the fact that for every \(i\in \{1,2,\dots ,{d}\}\) we have \(\left( {\begin{array}{c}d\\ i\end{array}}\right) \le \tfrac{d^i}{i!}\le \tfrac{d^i}{\exp (i\ln (i)-i+1)}\le \tfrac{(de)^i}{i^i}\) establishes that for every \(\varepsilon \in (0,1]\), \(c\in [N_{\varepsilon },\infty )\), \(x\in [a,b]^d\) it holds
This, the geometric sum formula and the fact that for every \(\varepsilon \in (0,1]\) it holds that \(N_\varepsilon \ge 2bd^{\frac{1}{n}}\) imply that for every \(\varepsilon \in (0,1]\), \(c\in [N_{\varepsilon },\infty )\), \(x\in [a,b]^d\) we have
Hence, we obtain for every \(\varepsilon \in (0,1]\), \(x\in [a,b]^d\) that
This completes the proof of Lemma 4.2. \(\square \)
Next we combine the above result with Lemma 4.1 in order to derive the number of terms needed in order to approximate the integral by a sum to within a prescribed error bound \(\varepsilon \).
Lemma 4.3
Let \(n\in {\mathbb {N}}\), \(a\in (0,\infty )\), \(b\in (a,\infty )\), \((K_i)_{i\in {\mathbb {N}}}\subseteq [0,\infty )\), let \(F^d_x:(0,\infty )\rightarrow {\mathbb {R}}\), \(x\in [a,b]^d\), \(d\in {\mathbb {N}}\), be the functions which satisfy for every \(d\in {\mathbb {N}}\), \(x=(x_1,x_2,\dots ,x_d)\in [a,b]^d\), \(c\in (0, \infty )\) that
and for every \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) let \(N_{d,\varepsilon }\in {\mathbb {R}}\) be given by
Then there exist \(Q_{d,\varepsilon }\in {\mathbb {N}}\), \(c^d_{\varepsilon ,j}\in (0,N_{d,\varepsilon })\), \(w^d_{\varepsilon ,j}\in [0,\infty )\), \(j\in \{1,2,\dots ,{Q_{d,\varepsilon }}\}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), such
-
(i)
that
$$\begin{aligned} \sup _{\varepsilon \in (0,1], d\in {\mathbb {N}}}\left[ {\frac{Q_{d,\varepsilon }}{d^{1+\frac{2}{n}}\varepsilon ^{-\frac{2}{n}}}}\right] <\infty \end{aligned}$$(4.19) and
-
(ii)
that for every \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds \(\sum _{j=1}^{Q_{d,\varepsilon }}w^d_{\varepsilon ,j}=N_{d,\varepsilon }\) and
$$\begin{aligned} \sup _{x\in [a,b]^d}\left| {\int _0^\infty F^d_x(c)\,\mathrm {d}c-\sum _{j=1}^{Q_{d,\varepsilon }}w^d_{\varepsilon ,j}F^d_x(c^d_{\varepsilon ,j})}\right| \le \varepsilon . \end{aligned}$$(4.20)
Proof of Lemma 4.3
Note that Lemma 3.5 ensures the existence of \(S_m\in {\mathbb {R}}\), \(m\in {\mathbb {N}}\), such that for every \(d,m\in {\mathbb {N}}\), \(x\in [a,b]^d\) it holds
Let \(Q_{d,\varepsilon }\in {\mathbb {R}}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), be given by
Next observe that Lemma 4.1 (with \(N\leftrightarrow N_{d,\varepsilon }\) in the notation of Lemma 4.1) establishes the existence of \(c^d_{\varepsilon ,j}\in (0,N_{d,\varepsilon })\), \(w^d_{\varepsilon ,j}\in [0,\infty )\), \(j\in \{1,2,\dots ,{Q_{d,\varepsilon }}\}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), such that for every \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\), \(x\in [a,b]^d\) we have \(\sum _{j=1}^{Q_{d,\varepsilon }}w^d_{\varepsilon ,j}=N_{d,\varepsilon }\) and
Moreover, note that Lemma 4.2 (with \(N_{d,\frac{\varepsilon }{2}}\leftrightarrow N_{d,\varepsilon }\) in the notation of Lemma 4.2) and (4.23) imply for every \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), \(x\in [a,b]^d\) that
Furthermore, we have for every \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) that
This implies
The proof of Lemma 4.3 is thus completed. \(\square \)
5 Basic ReLU DNN Calculus
In order to talk about neural networks we will, up to some minor changes and additions, adopt the notation of P. Petersen and F. Voigtlaender from [34]. This allows us to differentiate between a neural network, defined as a structured set of weights, and its realization, which is a function on \({\mathbb {R}}^d\). Note that this is almost necessary in order to talk about the complexity of neural networks, since notions like depth, size or architecture do not make sense for general functions on \({\mathbb {R}}^d\). Even if we know that a given function “is” a neural network, i.e., can be written as a series of affine transformations and componentwise nonlinearities, there are, in general, multiple non-trivially different ways to do so.
Each of these structured sets we consider does, however, define a unique function. This enables us to explicitly and unambiguously construct complex neural networks from simple ones, and subsequently relate the approximation capability of a given network to its complexity. Further note that, since the realization of a neural network is unique, we can still speak of a neural network approximating a given function when its realization does so.
Specifically, a neural network will be given by its architecture, i.e., number of layers L and layer dimensions \(N_0,N_1,\dots ,N_L\), as well as the weights determining the affine transformations used to compute each layer from the previous one. Note that our notion of neural networks does not attach the architecture and weights to a fixed activation function, but instead considers the realization of such a neural network with respect to a given activation function. This choice is a purely technical one here, as we always consider networks with ReLU activation function.
Setting 5.1
(Neural networks) For every \(L\in {\mathbb {N}}\), \(N_0,N_1,\dots ,N_L\in {\mathbb {N}}\) let \({\mathcal {N}}_L^{N_0,N_1,\dots ,N_L}\) be the set given by
let \({\mathfrak {N}}\) be the set given by
let \({\mathcal {L}},{\mathcal {M}},{\mathcal {M}}_l,\dim _{\mathrm {in}},\dim _{\mathrm {out}}:{\mathfrak {N}}\rightarrow {\mathbb {N}}\), \(l\in \{1,2,\dots ,{L}\}\), be the functions which satisfy for every \(L\in {\mathbb {N}}\) and every \({N_0,N_1,\dots ,N_L\in {\mathbb {N}}}\), \(\Phi =(((A^1_{i,j})_{i,j=1}^{N_1,N_0},(b^1_i)_{i=1}^{N_1}),\dots ,((A^L_{i,j})_{i,j=1}^{N_L,N_{L-1}},(b^L_i)_{i=1}^{N_L}))\in {\mathcal {N}}^{ N_0, N_1,\dots , N_L }_L\), \(l\in \{1,2,\dots ,{L}\}\) \({\mathcal {L}}(\Phi )=L\), \(\dim _{\mathrm {in}}(\Phi )=N_0\), \(\dim _{\mathrm {out}}(\Phi )=N_L\),
and
For every \(\varrho \in C({\mathbb {R}},{\mathbb {R}})\) let \(\varrho ^*:\cup _{d\in {\mathbb {N}}}{\mathbb {R}}^d\rightarrow \cup _{d\in {\mathbb {N}}}{\mathbb {R}}^d\) be the function which satisfies for every \(d\in {\mathbb {N}}\), \(x=(x_1,x_2,\dots ,x_d)\in {\mathbb {R}}^d\) that \(\varrho ^*(x)=(\varrho (x_1),\varrho (x_2),\dots ,\varrho (x_d))\), and for every \(\varrho \in {\mathcal {C}}({\mathbb {R}},{\mathbb {R}})\) denote by \(R_{\varrho }:{\mathfrak {N}}\rightarrow \cup _{a,b\in {\mathbb {N}}}\,C({\mathbb {R}}^a,{\mathbb {R}}^b)\) the function which satisfies for every \(L\in {\mathbb {N}}\), \(N_0,N_1,\dots ,N_L\in {\mathbb {N}}\), \(x_0\in {\mathbb {R}}^{N_0}\), and \(\Phi =((A_1,b_1),(A_2,b_2),\dots ,(A_L,b_L))\in {\mathcal {N}}_L^{N_0,N_1,\dots ,N_L}\), with \(x_1\in {\mathbb {R}}^{N_1},\dots ,x_{L-1}\in {\mathbb {R}}^{N_{L-1}}\) given by
that
The quantity \({\mathcal {M}}(\Phi )\) simply denotes the number of nonzero entries of the network \(\Phi \), which together with its depth \({\mathcal {L}}(\Phi )\) will be how we measure the “size” of a given neural network \(\Phi \). One could instead consider the number of all weights, i.e., including zeroes, of a neural network. Note, however, that for any non-degenerate neural network \(\Phi \) the total number of weights is bounded from above by \({\mathcal {M}}(\Phi )^2+{\mathcal {M}}(\Phi )\). Here, the terminology “degenerate” refers to a neural network which has neurons that can be removed without changing the realization of the NN. This implies that for any neural network there also exists a non-degenerate one of smaller or equal size which has exactly the same realization. Since our primary goal is to approximate d-variate functions by networks the size of which only depends polynomially on the dimension, the above means that qualitatively the same results hold regardless of which notion of “size” is used.
We start by introducing two basic tools for constructing new neural networks from known ones and, in Lemma 5.3 and Lemma 5.4, consider how the properties of a derived network depend on its parts. Note that techniques like these have already been used in [34] and [37].
The first tool will be the “composition” of neural networks in (5.7), which takes two networks and provides a new network whose realization is the composition of the realizations of the two constituent functions.
The second tool will be the “parallelization” of neural networks in (5.12), which will be useful when considering linear combinations or tensor products of functions which we can already approximate. While parallelization of networks of the same depth (5.10) works with arbitrary activation functions, for the general case we use the fact that any ReLU network can easily be extended (5.11) to an arbitrary depth without changing its realization.
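Since the display (5.7) is not reproduced above, the following Python sketch shows one standard way (in the spirit of [34]) to implement such a composition for networks stored as lists of (A, b) pairs: the identity \(x=\mathrm {ReLU}(x)-\mathrm {ReLU}(-x)\) is used at the interface, which yields depth \({\mathcal {L}}(\Phi ^1)+{\mathcal {L}}(\Phi ^2)\) and at most doubles the weights of the two layers that meet there, consistent with Lemma 5.3; the precise definition in the paper may differ in details.

```python
import numpy as np

def compose(phi1, phi2):
    """ReLU composition of two networks given as lists of (A, b) pairs,
    with dim_in(phi1) == dim_out(phi2).  The last layer of phi2 is
    duplicated with opposite signs so that the subsequent ReLU passes its
    output through unchanged, since ReLU(y) - ReLU(-y) = y."""
    A2, b2 = phi2[-1]   # last affine map of phi2
    A1, b1 = phi1[0]    # first affine map of phi1
    interface_out = (np.vstack([A2, -A2]), np.concatenate([b2, -b2]))
    interface_in = (np.hstack([A1, -A1]), b1)
    return phi2[:-1] + [interface_out, interface_in] + phi1[1:]
```

Parallelization (5.12) can be sketched analogously by block-diagonal stacking of the weight matrices, after first extending the shorter networks to a common depth as in (5.11).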
Setting 5.2
Assume Setting 5.1, for every \(L_1,L_2\in {\mathbb {N}}\), \(\Phi ^i=\left( {(A_1^i,b_1^i), (A_2^i,b_2^i),\dots ,(A^i_{L_i},b^i_{L_i})}\right) \in {\mathfrak {N}}\), \(i\in \{1,2\}\), with \(\dim _{\mathrm {in}}(\Phi ^1)=\dim _{\mathrm {out}}(\Phi ^2)\) let \(\Phi ^1\odot \Phi ^2\in {\mathfrak {N}}\) be the neural network given by
for every \(d\in {\mathbb {N}}\), \(L\in {\mathbb {N}}\cap [2,\infty )\) let \(\Phi ^{\mathrm {Id}}_{d,L}\in {\mathfrak {N}}\) be the neural network given by
for every \(d\in {\mathbb {N}}\) let \(\Phi ^{\mathrm {Id}}_{d,1}\in {\mathfrak {N}}\) be the neural network given by
for every \(n,L\in {\mathbb {N}}\), \(\Phi ^j=((A^j_1,b^j_1),(A^j_2,b^j_2),\dots ,(A^j_L,b^j_L))\in {\mathfrak {N}}\), \(j\in \{1,2,\dots ,{n}\}\), let \({\mathcal {P}}_s(\Phi ^1,\Phi ^2,\dots ,\Phi ^n)\in {\mathfrak {N}}\) be the neural network which satisfies
for every \(L,d\in {\mathbb {N}}\), \(\Phi \in {\mathfrak {N}}\) with \({\mathcal {L}}(\Phi )\le L\), \(\dim _{\mathrm {out}}(\Phi )=d\), let \({\mathcal {E}}_L(\Phi )\in {\mathfrak {N}}\) be the neural network given by
and for every \(n,L\in {\mathbb {N}}\), \(\Phi ^j\in {\mathfrak {N}}\), \(j\in \{1,2,\dots ,{n}\}\) with \(\max _{j\in \{1,2,\dots ,{n}\}}{\mathcal {L}}(\Phi ^j)=L\), let \({\mathcal {P}}(\Phi ^1,\Phi ^2,\dots ,\Phi ^n)\in {\mathfrak {N}}\) denote the neural network given by
Lemma 5.3
Assume Setting 5.2, let \(\Phi ^1,\Phi ^2\in {\mathfrak {N}}\) and let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(t\in {\mathbb {R}}\) that \(\varrho (t)=\max \{0,t\}\). Then
-
(i)
for every \(x\in {\mathbb {R}}^{\dim _{\mathrm {in}}(\Phi ^2)}\) it holds
$$\begin{aligned}{}[R_{\varrho }(\Phi ^1\odot \Phi ^2)](x)=([R_{\varrho }(\Phi ^1)]\circ [R_{\varrho }(\Phi ^2)])(x)=[R_{\varrho }(\Phi ^1)]([R_{\varrho }(\Phi ^2)](x)), \end{aligned}$$(5.13)
-
(ii)
\({\mathcal {L}}(\Phi ^1\odot \Phi ^2)={\mathcal {L}}(\Phi ^1)+{\mathcal {L}}(\Phi ^2)\),
-
(iii)
\({\mathcal {M}}(\Phi ^1\odot \Phi ^2)\le {\mathcal {M}}(\Phi ^1)+{\mathcal {M}}(\Phi ^2)+{\mathcal {M}}_1(\Phi ^1)+{\mathcal {M}}_{{\mathcal {L}}(\Phi ^2)}(\Phi ^2) \le 2({\mathcal {M}}(\Phi ^1)+{\mathcal {M}}(\Phi ^2))\),
-
(iv)
\({\mathcal {M}}_1(\Phi ^1\odot \Phi ^2)={\mathcal {M}}_1(\Phi ^2)\),
-
(v)
\({\mathcal {M}}_{{\mathcal {L}}(\Phi ^1\odot \Phi ^2)}(\Phi ^1\odot \Phi ^2)={\mathcal {M}}_{{\mathcal {L}}(\Phi ^1)}(\Phi ^1)\),
-
(vi)
\(\dim _{\mathrm {in}}(\Phi ^1\odot \Phi ^2)=\dim _{\mathrm {in}}(\Phi ^2)\),
-
(vii)
\(\dim _{\mathrm {out}}(\Phi ^1\odot \Phi ^2)=\dim _{\mathrm {out}}(\Phi ^1)\),
-
(viii)
for every \(d,L\in {\mathbb {N}}\), \(x\in {\mathbb {R}}^d\) it holds that \([R_{\varrho }(\Phi ^{\mathrm {Id}}_{d,L})](x)=x\) and
-
(ix)
for every \(L\in {\mathbb {N}}\), \(\Phi \in {\mathfrak {N}}\) with \({\mathcal {L}}(\Phi )\le L\), \(x\in {\mathbb {R}}^{\dim _{\mathrm {in}}(\Phi )}\) it holds that \([R_{\varrho }({\mathcal {E}}_L(\Phi ))](x)=[R_{\varrho }(\Phi )](x)\).
Proof of Lemma 5.3
For every \(i\in \{1,2\}\) let \(L_i\in {\mathbb {N}}\), \(N^i_1,N^i_2,\dots ,N^i_{L_i}\), \((A^i_l,b^i_l)\in {\mathbb {R}}^{N^i_l\times N^i_{l-1}}\times {\mathbb {R}}^{N^i_l}\), \(l\in \{1,2,\dots ,{L_i}\}\) such that \(\Phi ^i=((A^i_1,b^i_1),\dots ,(A^i_{L_i},b^i_{L_i}))\). Furthermore, let \((A_l,b_l)\in {\mathbb {R}}^{N_l\times N_{l-1}}\times {\mathbb {R}}^{N_l}\), \(l\in \{1,2,\dots ,{L_1+L_2}\}\), be the matrix-vector tuples which satisfy \(\Phi _1\odot \Phi _2=((A_1,b_1),\dots ,(A_{L_1+L_2},b_{L_1+L_2}))\) and let \(r_l:{\mathbb {R}}^{N_0}\rightarrow {\mathbb {R}}^{N_l}\), \(l\in \{1,2,\dots ,{L_1+L_2}\}\), be the functions which satisfy for every \(x\in {\mathbb {R}}^{N_0}\) that
Observe that for every \(l\in \{1,2,\dots ,{L_2-1}\}\) holds \((A_l,b_l)=(A^2_l,b^2_l)\). This implies that for every \(x\in {\mathbb {R}}^{N_0}\) holds
Combining this with (5.7) implies for every \(x\in {\mathbb {R}}^{N_0}\) that
In addition, for every \(d\in {\mathbb {N}}\), \(y=(y_1,y_2,\dots ,y_d)\in {\mathbb {R}}^d\) holds
This, (5.7) and (5.16) ensure that for every \(x\in {\mathbb {R}}^{N_0}\) holds
Combining this with (5.14) establishes (i). Moreover, (ii)-(vii) follow directly from (5.7). Furthermore, (5.8), (5.9) and (5.17) imply (viii). Finally, (ix) follows from (5.11) and (viii). This completes the proof of Lemma 5.3. \(\square \)
Lemma 5.4
Assume Setting 5.2, let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the function which satisfies for every \(t\in {\mathbb {R}}\) that \(\varrho (t)=\max \{0,t\}\), let \(n\in {\mathbb {N}}\), let \(\varphi ^j\in {\mathfrak {N}}\), \(j\in \{1,2,\dots ,{n}\}\), let \(d_j\in {\mathbb {N}}\), \(j\in \{1,2,\dots ,{n}\}\), be given by \(d_j=\dim _{\mathrm {in}}(\varphi ^j)\), let \(D\in {\mathbb {N}}\) be given by \(D=\sum _{j=1}^n d_j\) and let \(\Phi \in {\mathfrak {N}}\) be given by \(\Phi ={\mathcal {P}}(\varphi ^1,\varphi ^2,\dots ,\varphi ^n)\). Then
-
(i)
for every \(x\in {\mathbb {R}}^D\) it holds
$$\begin{aligned} {}[R_{\varrho }(\Phi )](x)=&\left( [R_{\varrho }(\varphi ^1)](x_1,\dots ,x_{d_1}), [R_{\varrho }(\varphi ^2)](x_{d_1+1},\dots ,x_{d_1+d_2}),\dots ,\right. \nonumber \\&\left. [R_{\varrho }(\varphi ^n)](x_{D-d_n+1},\dots ,x_{D})\right) , \end{aligned}$$(5.19)
-
(ii)
\({\mathcal {L}}(\Phi )=\max _{j\in \{1,2,\dots ,{n}\}}{\mathcal {L}}(\varphi ^j)\),
-
(iii)
\({\mathcal {M}}(\Phi )\le 2\left( {\sum _{j=1}^n{\mathcal {M}}(\varphi ^j)}\right) +4\left( {\sum _{j=1}^n \dim _{\mathrm {out}}(\varphi ^j)}\right) \max _{j\in \{1,2,\dots ,{n}\}}{\mathcal {L}}(\varphi ^j)\),
-
(iv)
\({\mathcal {M}}(\Phi )=\sum _{j=1}^n{\mathcal {M}}(\varphi ^j)\) provided for every \(j,j'\in \{1,2,\dots ,{n}\}\) holds \({\mathcal {L}}(\varphi ^j)={\mathcal {L}}(\varphi ^{j'})\),
-
(v)
\({\mathcal {M}}_{{\mathcal {L}}(\Phi )}(\Phi )\le \sum _{j=1}^n \max \{2\dim _{\mathrm {out}}(\varphi ^j),{\mathcal {M}}_{{\mathcal {L}}(\varphi ^j)}(\varphi ^j)\}\),
-
(vi)
\({\mathcal {M}}_1(\Phi )=\sum _{j=1}^n{\mathcal {M}}_1(\varphi ^j)\),
-
(vii)
\(\dim _{\mathrm {in}}(\Phi )=\sum _{j=1}^n\dim _{\mathrm {in}}(\varphi ^j)\) and
-
(viii)
\(\dim _{\mathrm {out}}(\Phi )=\sum _{j=1}^n\dim _{\mathrm {out}}(\varphi ^j)\).
Proof of Lemma 5.4
Observe that Lemma 5.3 implies that for every \(j\in \{1,2,\dots ,{n}\}\) holds
Combining this with (5.10) and (5.12) establishes (i). Furthermore, note that (ii), (vi), (vii) and (viii) follow directly from (5.10) and (5.12). Moreover, (5.10) demonstrates that for every \(m\in {\mathbb {N}}\), \(\psi _i\in {\mathfrak {N}}\), \(i\in \{1,2,\dots ,{m}\}\), with \(\forall i,i'\in \{1,2,\dots ,{m}\}:{\mathcal {L}}(\psi ^i)={\mathcal {L}}(\psi ^{i'})\) holds
This establishes (iv). Next, observe that Lemma 5.3, (5.11) and the fact that for every \(d\in {\mathbb {N}}\), \(L\in {\mathbb {N}}\) holds \({\mathcal {M}}(\Phi ^{\mathrm {Id}}_{d,L})\le 2dL\) imply that for every \(j\in \{1,2,\dots ,{n}\}\) we have
Combining this with (5.21) establishes (iii). In addition, note that (5.8), (5.9) and (5.11) ensure for every \(j\in \{1,2,\dots ,{n}\}\) that
Combining this with (5.10) establishes (v). The proof of Lemma 5.4 is thus completed. \(\square \)
6 Basic Expression Rate Results
Here, we begin by establishing an expression rate result for a very simple function, namely \(x\mapsto x^2\) on [0, 1]. Our approach is based on the observation by M. Telgarsky [40] that neural networks with ReLU activation function can efficiently compute high-frequency sawtooth functions and the idea of D. Yarotsky in [44] to use this in order to approximate the function \(x\mapsto x^2\) by networks computing its linear interpolations. This can then be used to derive networks capable of efficiently approximating \((x,y)\mapsto xy\), which leads to tensor products as well as polynomials and subsequently smooth functions. Note that [44] uses a slightly different notion of neural networks, where connections between non-adjacent layers are permitted. This, however, only requires a technical modification of the proof, which does not significantly change the result. Nonetheless, the respective proofs are provided in the appendix for completeness.
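The construction referenced here can be checked numerically: composing the ReLU-expressible hat function \(g(x)=2x\) on \([0,\tfrac{1}{2}]\), \(g(x)=2(1-x)\) on \([\tfrac{1}{2},1]\) with itself produces sawtooth functions, and subtracting their scaled sum from x yields the piecewise linear interpolants of \(x\mapsto x^2\) with uniform error \(2^{-2m-2}\). The following Python sketch (illustrative only, not the construction used in the proofs below) demonstrates this.

```python
import numpy as np

def hat(x):
    """Tent map on [0, 1], written with ReLUs:
    g(x) = 2*relu(x) - 4*relu(x - 1/2) + 2*relu(x - 1)."""
    return 2 * np.maximum(x, 0) - 4 * np.maximum(x - 0.5, 0) + 2 * np.maximum(x - 1, 0)

def square_approx(x, m):
    """Approximation of x^2 on [0, 1]: subtract the s-fold compositions of
    the tent map, scaled by 4^{-s}, for s = 1, ..., m."""
    out = np.array(x, dtype=float)
    g = np.array(x, dtype=float)
    for s in range(1, m + 1):
        g = hat(g)                  # sawtooth with 2^{s-1} teeth
        out = out - g / 4**s
    return out

x = np.linspace(0.0, 1.0, 10001)
for m in (2, 4, 6):
    err = np.max(np.abs(x**2 - square_approx(x, m)))
    print(m, err, 2.0**(-2 * m - 2))   # observed error vs. theoretical bound
```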
Lemma 6.1
Assume Setting 5.1 and let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\). Then there exist neural networks \((\sigma _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) such that for every \(\varepsilon \in (0,\infty )\)
-
(i)
\({\mathcal {L}}(\sigma _{\varepsilon })\le {\left\{ \begin{array}{ll}\tfrac{1}{2}\left| {\log _2(\varepsilon )}\right| +1 &{} :\varepsilon <1\\ 1 &{} :\varepsilon \ge 1\end{array}\right. }\),
-
(ii)
\({\mathcal {M}}(\sigma _{\varepsilon })\le {\left\{ \begin{array}{ll}15(\tfrac{1}{2}\left| {\log _2(\varepsilon )}\right| +1) &{} :\varepsilon < 1\\ 0 &{} :\varepsilon \ge 1\end{array}\right. }\),
-
(iii)
\(\sup _{t\in [0,1]}\left| {t^2-\left[ {R_{\varrho }(\sigma _{\varepsilon })}\right] \!(t)}\right| \le \varepsilon \),
-
(iv)
\([R_{\varrho }(\sigma _{\varepsilon })]\!(0) = 0\).
We can now derive the following result on approximate multiplication by neural networks by observing that \(xy=2B^2\big (|(x+y)/(2B)|^2-|x/(2B)|^2-|y/(2B)|^2\big )\) for every \(B\in (0,\infty )\), \(x,y\in {\mathbb {R}}\).
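For completeness, the one-line algebra behind this identity is

$$\begin{aligned} 2B^2\Big (\big |\tfrac{x+y}{2B}\big |^2-\big |\tfrac{x}{2B}\big |^2-\big |\tfrac{y}{2B}\big |^2\Big ) =\tfrac{1}{2}\big ((x+y)^2-x^2-y^2\big )=xy. \end{aligned}$$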
Lemma 6.2
Assume Setting 5.1, let \(B\in (0,\infty )\) and let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\). Then there exist neural networks \((\mu _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) which satisfy for every \(\varepsilon \in (0,\infty )\) that
-
(i)
\({\mathcal {L}}(\mu _{\varepsilon })\le {\left\{ \begin{array}{ll}\tfrac{1}{2}\log _2(\tfrac{1}{\varepsilon })+\log _2(B)+6 &{} :\varepsilon < B^2\\ 1 &{} :\varepsilon \ge B^2\end{array}\right. }\),
-
(ii)
\({\mathcal {M}}(\mu _{\varepsilon })\le {\left\{ \begin{array}{ll}45\log _2(\tfrac{1}{\varepsilon })+90\log _2(B)+259 &{} :\varepsilon < B^2\\ 0 &{} :\varepsilon \ge B^2\end{array}\right. }\),
-
(iii)
\(\sup _{(x,y)\in [-B,B]^2}\left| {xy-\left[ {R_{\varrho }(\mu _{\varepsilon })}\right] \!(x,y)}\right| \le \varepsilon \),
-
(iv)
\({\mathcal {M}}_1(\mu _{\varepsilon })=8,\ {\mathcal {M}}_{{\mathcal {L}}(\mu _{\varepsilon })}(\mu _{\varepsilon })=3\) and
-
(v)
for every \(x\in {\mathbb {R}}\) it holds that \(R_\varrho [\mu _{\varepsilon }](0,x) = R_\varrho [\mu _{\varepsilon }](x,0)=0\).
Next we extend this result to products of any number of factors by hierarchical, pairwise multiplication.
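The exact-arithmetic skeleton of this hierarchical construction can be sketched as follows (a Python illustration with hypothetical names; in the proof of Theorem 6.3 each exact pairwise product is replaced by the approximate multiplication network of Lemma 6.2, and the padding with ones up to the next power of two is only this sketch's way of handling m that is not a power of two).

```python
import math
import numpy as np

def tree_product(xs):
    """Multiply the entries of xs by hierarchical pairwise multiplication:
    pad with ones up to length 2^l with l = ceil(log2(m)), then perform l
    rounds of adjacent pairwise products (a balanced binary tree)."""
    m = len(xs)
    l = math.ceil(math.log2(m))
    vals = np.concatenate([np.asarray(xs, dtype=float), np.ones(2**l - m)])
    for _ in range(l):
        vals = vals[0::2] * vals[1::2]   # one level of the tree
    return vals[0]

print(tree_product([2.0, 3.0, 0.5, 4.0, 5.0]))   # 2 * 3 * 0.5 * 4 * 5 = 60.0
```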
Theorem 6.3
Assume Setting 5.1, let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\), let \(m\in {\mathbb {N}}\cap [2,\infty )\) and let \(B\in [1,\infty )\). Then there exists a constant \(C\in {\mathbb {R}}\) (which is independent of m, B) and neural networks \({(\Pi _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}}\) which satisfy
-
(i)
\({\mathcal {L}}(\Pi _{\varepsilon })\le C\ln (m)\left( {\left| {\ln (\varepsilon )}\right| +m\ln (B)+\ln (m)}\right) \),
-
(ii)
\({\mathcal {M}}(\Pi _{\varepsilon })\le C m\left( {\left| {\ln (\varepsilon )}\right| +m\ln (B)+\ln (m)}\right) \),
-
(iii)
\(\displaystyle \sup _{x\in [-B,B]^m}\left| {\left[ {\prod _{j=1}^m x_j}\right] -\left[ {R_{\varrho }(\Pi _{\varepsilon })}\right] \!(x)}\right| \le \varepsilon \) and
-
(iv)
\(R_\varrho \left[ \Pi _{\varepsilon }\right] (x_1,x_2,\dots ,x_m)=0\), if there exists \(i\in \{1,2,\dots ,m\}\) with \(x_i=0\).
Proof of Theorem 6.3
Throughout this proof, assume Setting 5.2, let \(l=\lceil \log _2 m\rceil \) and let \(\theta \in {\mathcal {N}}^{1,1}_1\) be the neural network given by \(\theta =(0,0)\), let \((A,b)\in {\mathbb {R}}^{l\times m}\times {\mathbb {R}}^{l}\) be the matrix-vector tuple given by
Let further \(\omega \in {\mathcal {N}}^{m,2^l}_2\) be the neural network given by \(\omega =((A,b))\). Note that Lemma 6.2 (with \(B^m\) as B in the notation of Lemma 6.2) ensures that there exist neural networks \((\mu _{\eta })_{\eta \in (0,\infty )}\subseteq {\mathfrak {N}}\) such that for every \(\eta \in (0,\left[ {B^m}\right] ^2)\) it holds
-
(A)
\({\mathcal {L}}(\mu _{\eta })\le \tfrac{1}{2}\log _2(\tfrac{1}{\eta })+\log _2(B^m)+6\),
-
(B)
\({\mathcal {M}}(\mu _{\eta })\le 45\log _2(\tfrac{1}{\eta })+90\log _2(B^m)+259\),
-
(C)
\(\displaystyle \sup _{x,y\in [-B^m,B^m]}\left| {xy-\left[ {R_{\varrho }(\mu _{\eta })}\right] \!(x,y)}\right| \le \eta \),
-
(D)
\({\mathcal {M}}_1(\mu _{\eta })=8,\ {\mathcal {M}}_{{\mathcal {L}}(\mu _{\eta })}(\mu _{\eta })=3\) and
-
(E)
for every \(x\in {\mathbb {R}}\) it holds that \(R_\varrho [\mu _{\eta }](0,x) = R_\varrho [\mu _{\eta }](x,0)=0\).
Let \((\nu _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) be the neural networks which satisfy for every \(\varepsilon \in (0,\infty )\)
Observe that (A) implies that for every \(\varepsilon \in (0,B^m)\subseteq (0,m^2 B^{4m})\) it holds
In addition, note that (B) implies that for every \(\varepsilon \in (0,B^m)\subseteq (0,m^2 B^{4m})\)
Furthermore, (C) implies that for every \(\varepsilon \in (0,B^m)\subseteq (0,m^2 B^{4m})\) holds
Let \(\pi _{k,\varepsilon }\in {\mathfrak {N}}\), \(\varepsilon \in (0,\infty )\), \(k\in {\mathbb {N}}\), be the neural networks which satisfy for every \(\varepsilon \in (0,\infty )\), \(k\in {\mathbb {N}}\)
and let \((\Pi _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) be neural networks given by
Note that for every \(\varepsilon \in (B^m,\infty )\) it holds
We claim that for every \(k\in \{1,2,\dots ,{l}\}\), \(\varepsilon \in (0,B^m)\) it holds
-
(a)
that
$$\begin{aligned} \sup _{x\in [-B,B]^{(2^k)}}\left| {\left[ {\textstyle \prod \limits _{j=1}^{2^k}x_j}\right] -[R_{\varrho }(\pi _{k,\varepsilon })](x)}\right| \le 4^{k-1} m^{-2} B^{(2^k-2m)}\varepsilon , \end{aligned}$$(6.9)
-
(b)
that \({\mathcal {L}}(\pi _{k,\varepsilon })\le k{\mathcal {L}}(\nu _{\varepsilon })\) and
-
(c)
that \({\mathcal {M}}(\pi _{k,\varepsilon })\le (2^k-1){\mathcal {M}}(\nu _{\varepsilon })+(2^{k-1}-1)20\).
We prove (a), (b) and (c) by induction on \(k\in \{1,2,\dots ,{l}\}\). Observe that (6.5) and the fact that \(B\in [1,\infty )\) establish (a) in the base case \(k=1\). Moreover, note that (6.6) establishes (b) and (c) in the base case \(k=1\).
For the induction step \(\{1,2,\dots ,{l-1}\}\ni k\rightarrow k+1\in \{2,3,\dots ,l\}\) note that Lemma 5.3, Lemma 5.4, (6.5) and (6.6) imply that for every \(k\in \{1,2,\dots ,{l-1}\}\), \(\varepsilon \in (0,B^m)\)
Next, for every \(c,\delta \in (0,\infty )\), \(y,z\in [-c,c]\), \({\tilde{y}},{\tilde{z}}\in {\mathbb {R}}\) with \(\left| {y-{\tilde{y}}}\right| , \left| {z-{\tilde{z}}}\right| \le \delta \) it holds
Moreover, for every \(k\in \{1,2,\dots ,{l}\}\)
The fact that \(B\in [1,\infty )\) therefore ensures that for every \(k\in \{1,2,\dots ,{l-1}\}\), \(\varepsilon \in (0,B^m)\)
This and (6.11) imply that for every \(k\in \{1,2,\dots ,{l-1}\}\), \(\varepsilon \in (0,B^m)\), \(x,x'\in [-B,B]^{(2^k)}\)
Combining this, (6.10) and the fact that \(B\in [1,\infty )\) demonstrates that for every \(k\in \{1,2,\dots ,{l-1}\}\), \(\varepsilon \in (0,B^m)\)
This establishes the claim (a). Moreover, Lemma 5.3 and Lemma 5.4 imply that for every \(k\in \{1,2,\dots ,{l-1}\}\), \(\varepsilon \in (0,B^m)\) with \({\mathcal {L}}(\pi _{k,\varepsilon })\le k{\mathcal {L}}(\nu _{\varepsilon })\) it holds
This establishes the claim (b). Furthermore, Lemma 5.3, Lemma 5.4, (B) and (D) imply that for every \(k\in \{1,2,\dots ,{l-1}\}\), \(\varepsilon \in (0,B^m)\) with \({\mathcal {M}}(\pi _{k,\varepsilon })\le (2^k-1){\mathcal {M}}(\nu _{\varepsilon })+(2^{k-1}-1)20\) it holds
This establishes the claim (c).
Combining (a) with Lemma 5.3 and (6.7) implies for every \(\varepsilon \in (0,B^m)\) the bound
This and (6.8) establish that the neural networks \((\Pi _{\varepsilon })_{\varepsilon \in (0,\infty )}\) satisfy (iii). Combining (b) with Lemma 5.3, (6.3) and (6.7) ensures that for every \(\varepsilon \in (0,B^m)\)
and that for every \(\varepsilon \in (B^m,\infty )\) it holds \({\mathcal {L}}(\Pi _{\varepsilon })={\mathcal {L}}(\theta )=1\). This establishes that the neural networks \((\Pi _{\varepsilon })_{\varepsilon \in (0,\infty )}\) satisfy (i). Furthermore, note that (c), Lemma 5.3, (6.3) and (6.7) demonstrate that for every \(\varepsilon \in (0,B^m)\)
and that for every \(\varepsilon \in (B^m,\infty )\) holds \({\mathcal {M}}(\Pi _{\varepsilon })={\mathcal {M}}(\theta )=0\). This establishes that the neural networks \((\Pi _{\varepsilon })_{\varepsilon \in (0,\infty )}\) satisfy (ii). Note that (iv) follows from (E) by construction. The proof of Theorem 6.3 is thus completed. \(\square \)
With the above established, it is quite straightforward to obtain the following result for the approximation of tensor products. Note that the exponential term \(B^{m-1}\) in (iii) is unavoidable, as it results from multiplying m inaccurate values of magnitude B (see the elementary estimate recorded below). For our purposes, this will not be an issue since the functions we consider are bounded in absolute value by \(B=1\). It is further not an issue in cases where the \(h_j\) can be approximated by networks whose size scales logarithmically with \(\varepsilon \).
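The elementary estimate behind this remark is the telescoping bound: if \(\left| a_j\right| \le B\), \(\left| b_j\right| \le B\) and \(\left| a_j-b_j\right| \le \varepsilon \) for \(j\in \{1,2,\dots ,{m}\}\), then

$$\begin{aligned} \Big |\prod _{j=1}^m a_j-\prod _{j=1}^m b_j\Big | \le \sum _{j=1}^m\Big (\prod _{i<j}\left| a_i\right| \Big )\left| a_j-b_j\right| \Big (\prod _{i>j}\left| b_i\right| \Big ) \le m\,B^{m-1}\varepsilon . \end{aligned}$$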
Proposition 6.4
Assume Setting 5.2, let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\), let \(B\in [1,\infty )\), \(m\in {\mathbb {N}}\), for every \(j\in \{1,2,\dots ,{m}\}\) let \(d_j\in {\mathbb {N}}\), \(\Omega _j\subseteq {\mathbb {R}}^{d_j}\), and \(h_j:\Omega _j\rightarrow [-B,B]\), let \((\Phi ^j_{\varepsilon })_{\varepsilon \in (0,\infty )}\in {\mathfrak {N}}\), \(j\in \{1,2,\dots ,{m}\}\), be neural networks which satisfy for every \(\varepsilon \in (0,\infty )\), \(j\in \{1,2,\dots ,{m}\}\)
let \(\Phi ^{{\mathcal {P}}}_{\varepsilon }\in {\mathfrak {N}}\), \(\varepsilon \in (0,\infty )\) be given by \(\Phi ^{{\mathcal {P}}}_{\varepsilon }={\mathcal {P}}(\Phi ^1_{\varepsilon },\Phi ^2_{\varepsilon },\dots ,\Phi ^m_{\varepsilon })\), and let \(L_{\varepsilon }\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\) be given by \(L_{\varepsilon }=\max _{j\in \{1,2,\dots ,{m}\}}{\mathcal {L}}(\Phi ^j_{\varepsilon })\).
Then there exists a constant \(C\in {\mathbb {R}}\) (which is independent of \(m,B,\varepsilon \)) and neural networks \((\Psi _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) which satisfy
-
(i)
\({\mathcal {L}}(\Psi _{\varepsilon })\le C\ln (m)\left( {\left| {\ln (\varepsilon )}\right| +m\ln (B)+\ln (m)}\right) +L_{\varepsilon }\),
-
(ii)
\({\mathcal {M}}(\Psi _{\varepsilon }) \le C m\left( {\left| {\ln (\varepsilon )}\right| +m\ln (B)+\ln (m)}\right) +{\mathcal {M}}(\Phi ^{{\mathcal {P}}}_{\varepsilon })+{\mathcal {M}}_{L_{\varepsilon }}(\Phi ^{{\mathcal {P}}}_{\varepsilon })\) and
-
(iii)
\(\displaystyle \sup _{t=(t_1,t_2,\dots ,t_m)\in \times _{j=1}^m \Omega _j}\left| {\left[ {\textstyle \prod \limits _{j=1}^m h_j(t_j)}\right] -\left[ {R_{\varrho }(\Psi _{\varepsilon })}\right] \!(t)}\right| \le 3mB^{m-1}\varepsilon .\)
Proof of Proposition 6.4
In the case of \(m=1\), the neural networks \((\Phi ^1_{\varepsilon })_{\varepsilon \in (0,\infty )}\in {\mathfrak {N}}\) satisfy (i), (ii) and (iii) by assumption. Throughout the remainder of this proof, assume \(m\ge 2\), and let \(\theta \in {\mathcal {N}}^{1,1}_1\) denote the trivial neural network \(\theta =(0,0)\). Observe that Theorem 6.3 (with \(\varepsilon \leftrightarrow \eta \), \(C'\leftrightarrow C\) in the notation of Theorem 6.3) ensures that there exist \(C'\in {\mathbb {R}}\) and neural networks \((\Pi _{\eta })_{\eta \in (0,\infty )}\subseteq {\mathfrak {N}}\) which satisfy for every \(\eta \in (0,\infty )\) that
-
(a)
\({\mathcal {L}}(\Pi _{\eta })\le C'\ln (m)\left( {\left| {\ln (\eta )}\right| +m\ln (B)+\ln (m)}\right) \),
-
(b)
\({\mathcal {M}}(\Pi _{\eta })\le C' m\left( {\left| {\ln (\eta )}\right| +m\ln (B)+\ln (m)}\right) \) and
-
(c)
\(\displaystyle \sup _{x\in [-B,B]^m}\left| {\left[ {\prod _{j=1}^m x_j}\right] -\left[ {R_{\varrho }(\Pi _{\eta })}\right] \!(x)}\right| \le \eta \).
Let \((\Psi _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) be the neural networks which satisfy for every \(\varepsilon \in (0,\infty )\) that
Note that for every \(\varepsilon \in (0,\tfrac{B}{2m})\)
Combining this with Lemma 5.3, Lemma 5.4, (6.21) and (c) implies that for every \(\varepsilon \in (0,\tfrac{B}{2m})\), \(t=(t_1,t_2,\dots ,t_m)\in \Omega \) it holds
Moreover, for every \(\varepsilon \in [\tfrac{B}{2m},\infty )\), \(t=(t_1,t_2,\dots ,t_m)\in \Omega \) it holds that
This and (6.24) establish that the neural networks \((\Psi _{\varepsilon })_{\varepsilon \in (0,\infty )}\) satisfy (iii). Next observe that Lemma 5.3, Lemma 5.4 and (a) demonstrate that for every \(\varepsilon \in (0,\tfrac{B}{2m})\)
This and the fact that for every \(\varepsilon \in [\tfrac{B}{2m},\infty )\) it holds that \({\mathcal {L}}(\Psi _{\varepsilon })={\mathcal {L}}(\theta )=1\) establish that the neural networks \((\Psi _{\varepsilon })_{\varepsilon \in (0,\infty )}\) satisfy (i). Furthermore, note that Lemma 5.3, Lemma 5.4 and (b) ensure that for every \(\varepsilon \in (0,\tfrac{B}{2m})\)
This and the fact that for every \(\varepsilon \in [\tfrac{B}{2m},\infty )\) it holds that \({\mathcal {M}}(\Psi _{\varepsilon })={\mathcal {M}}(\theta )=0\) imply that the neural networks \((\Psi _{\varepsilon })_{\varepsilon \in (0,\infty )}\) satisfy (ii). The proof of Proposition 6.4 is completed. \(\square \)
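Purely as an illustration of the structure \(\Psi _{\varepsilon }=\Pi _{\eta }\odot {\mathcal {P}}(\Phi ^1_{\varepsilon },\dots ,\Phi ^m_{\varepsilon })\) used above, the following sketch composes m univariate approximators (arbitrary Python callables standing in for the networks \(\Phi ^j_{\varepsilon }\)) with a product, which here is taken exactly in place of the network \(\Pi _{\eta }\); all names are ours and nothing in it is part of the proof.

```python
import math
from typing import Callable, Sequence

def parallelize(approximators: Sequence[Callable[[float], float]]):
    """Mimics P(Phi^1, ..., Phi^m): apply the j-th approximator to the j-th input."""
    def parallel(t: Sequence[float]):
        return [phi(tj) for phi, tj in zip(approximators, t)]
    return parallel

def tensor_product_approximator(approximators):
    """Mimics Psi = Pi ∘ P(Phi^1, ..., Phi^m), with the product taken exactly."""
    parallel = parallelize(approximators)
    def psi(t):
        out = 1.0
        for value in parallel(t):
            out *= value
        return out
    return psi

if __name__ == "__main__":
    # toy univariate "networks": crude approximations of sin and exp on [0, 1]
    phis = [lambda t: t - t**3 / 6.0, lambda t: 1.0 + t + t**2 / 2.0]
    psi = tensor_product_approximator(phis)
    t = (0.3, 0.7)
    exact = math.sin(t[0]) * math.exp(t[1])
    print(f"approx {psi(t):.6f}   vs   exact {exact:.6f}")
```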
Another way to use the multiplication results is to consider the approximation of smooth functions by polynomials. This can be done for functions of arbitrary dimension using the multivariate Taylor expansion (see [44] and [31, Thm. 2.3]). Such a direct approach, however, yields networks whose size depends exponentially on the dimension of the function. As our goal is to show that high-dimensional functions with a tensor product structure can be approximated by networks with only polynomial dependence on the dimension, we only consider univariate smooth functions here. In the appendix, we present a detailed and explicit construction of this Taylor approximation by neural networks. In the following results, we employ an auxiliary parameter r so that the bounds on the depth and connectivity of the networks can be stated for all \(\varepsilon \in (0,\infty )\). Note that this parameter does not influence the construction of the networks themselves.
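To illustrate the univariate construction behind Theorem 6.5 (spelled out in the appendix), the sketch below blends local Taylor polynomials of order \(n-1\) at the grid points \(j/N\) with hat functions forming a partition of unity, and reports the uniform error, which for a \(C^n\) function decays like \(N^{-n}\); the helper names are ours and the derivatives are supplied analytically rather than realized by networks.

```python
import math

def hat(N, j, x):
    """Hat function of width 1/N centered at j/N; the hats j = 0, ..., N form
    a partition of unity on [0, 1]."""
    return max(0.0, 1.0 - abs(N * x - j))

def local_taylor(derivs_at, N, j, x):
    """Taylor polynomial of f around j/N, built from a derivative oracle."""
    center = j / N
    return sum(d * (x - center) ** k / math.factorial(k)
               for k, d in enumerate(derivs_at(center)))

def piecewise_taylor(derivs_at, N, x):
    """Blend of local Taylor polynomials: sum_j hat_{N,j}(x) * T_{f,N,j}(x)."""
    return sum(hat(N, j, x) * local_taylor(derivs_at, N, j, x) for j in range(N + 1))

if __name__ == "__main__":
    n = 3  # derivatives of order 0, ..., n-1 enter the local polynomials
    derivs_at = lambda c: [math.sin(c), math.cos(c), -math.sin(c)]
    grid = [k / 1000 for k in range(1001)]
    for N in (4, 8, 16, 32):
        err = max(abs(math.sin(x) - piecewise_taylor(derivs_at, N, x)) for x in grid)
        print(f"N = {N:3d}   sup error ~ {err:.2e}   (expected decay ~ N^-{n})")
```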
Theorem 6.5
Assume Setting 5.1, let \(n\in {\mathbb {N}}\), \(r\in (0,\infty )\), let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\) and let \(B^n_1\subseteq C^n([0,1],{\mathbb {R}})\) be the set given by
Then there exist neural networks \((\Phi _{f,\varepsilon })_{f\in B^n_1,\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) which satisfy
-
(i)
\(\displaystyle \sup _{f\in B^n_1,\varepsilon \in (0,\infty )}\left[ {\frac{{\mathcal {L}}(\Phi _{f,\varepsilon })}{\max \{r,\left| {\ln (\varepsilon )}\right| \}}}\right] <\infty \),
-
(ii)
\(\displaystyle \sup _{f\in B^n_1,\varepsilon \in (0,\infty )}\left[ {\frac{{\mathcal {M}}(\Phi _{f,\varepsilon })}{\varepsilon ^{-\frac{1}{n}}\max \{r,|\ln (\varepsilon )|\}}}\right] <\infty \) and
-
(iii)
for every \(f\in B^n_1\), \(\varepsilon \in (0,\infty )\) that
$$\begin{aligned} \sup _{t\in [0,1]}\left| {f(t)-\left[ {R_{\varrho }(\Phi _{f,\varepsilon })}\right] \!(t)}\right| \le \varepsilon . \end{aligned}$$(6.29)
For convenience of use, we also provide the following more general corollary.
Corollary 6.6
Assume Setting 5.1, let \(n\in {\mathbb {N}}\), \(r\in (0,\infty )\) and let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\). Let further the set \({\mathcal {C}}^n\) be given by \({\mathcal {C}}^n=\cup _{[a,b]\subseteq {\mathbb {R}}_+}C^n([a,b],{\mathbb {R}})\), and let \(\left\Vert \cdot \right\Vert _{n,\infty }:{\mathcal {C}}^n\rightarrow [0,\infty )\) satisfy for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\)
Then there exist neural networks \(\left( {\Phi _{f,\varepsilon }}\right) _{f\in {\mathcal {C}}^n,\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) which satisfy
-
(i)
\(\displaystyle \sup _{f\in {\mathcal {C}}^n, \varepsilon \in (0,\infty )}\left[ {\frac{{\mathcal {L}}(\Phi _{f,\varepsilon })}{\max \{r,|\ln (\frac{\varepsilon }{\max \{1,b-a\}\left\Vert f\right\Vert _{n,\infty }})|\}}}\right] <\infty \),
-
(ii)
\(\displaystyle \sup _{f\in {\mathcal {C}}^n, \varepsilon \in (0,\infty )}\left[ {\frac{{\mathcal {M}}(\Phi _{f,\varepsilon })}{\max \{1,b-a\}\left\Vert f\right\Vert _{n,\infty }^{\frac{1}{n}}\varepsilon ^{-\frac{1}{n}}\max \{r,|\ln (\frac{\varepsilon }{\max \{1,b-a\}\left\Vert f\right\Vert _{n,\infty }})|\}}}\right] <\infty \) and
-
(iii)
for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(\varepsilon \in (0,\infty )\) that
$$\begin{aligned} \sup _{t\in [a,b]}\left| {f(t)-\left[ {R_{\varrho }(\Phi _{f,\varepsilon })}\right] \!(t)}\right| \le \varepsilon . \end{aligned}$$(6.31)
7 DNN Expression Rates for High-Dimensional Basket Prices
Now that we have established a number of general expression rate results, we can apply them to our specific problem. Using the regularity result (3.3), we obtain the following.
Corollary 7.1
Assume Setting 5.1, let \(n\in {\mathbb {N}}\), \(r\in (0,\infty )\), \(a\in (0,\infty )\), \(b\in (a,\infty )\), let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\), let \(f:(0,\infty )\rightarrow {\mathbb {R}}\) be as defined in (3.1) and let \(h_{c,K}:[a,b]\rightarrow {\mathbb {R}}\), \(c\in (0,\infty )\), \(K\in [0,\infty )\), denote the functions which satisfy for every \(c\in (0,\infty )\), \(K\in [0,\infty )\), \(x\in [a,b]\) that
Then there exist neural networks \(\left( {\Phi _{\varepsilon ,c,K}}\right) _{\varepsilon ,c\,\in (0,\infty ),K\in [0,\infty )}\subseteq {\mathfrak {N}}\) which satisfy
-
(i)
\(\displaystyle \sup _{\varepsilon ,c\in (0,\infty ),K\in [0,\infty )}\left[ {\frac{{\mathcal {L}}(\Phi _{\varepsilon ,c,K})}{\max \{r,|\ln (\varepsilon )|\}+\max \{0,\ln (K+c)\}}}\right] <\infty \),
-
(ii)
\(\displaystyle \sup _{\varepsilon ,c\,\in (0,\infty ),K\in [0,\infty )}\left[ {\frac{{\mathcal {M}}(\Phi _{\varepsilon ,c,K})}{(K+c+1)^{\frac{1}{n}}\varepsilon ^{-\frac{1}{n^2}}}}\right] <\infty \) and
-
(iii)
for every \(\varepsilon ,c\in (0,\infty )\), \(K\in [0,\infty )\) that
$$\begin{aligned} \sup _{x\in [a,b]}\left| {h_{c,K}(x)-\left[ {R_{\varrho }(\Phi _{\varepsilon ,c,K})}\right] \!(x)}\right| \le \varepsilon . \end{aligned}$$(7.2)
Proof of Corollary 7.1
We observe that Corollary 3.3 ensures the existence of a constant \(C\in {\mathbb {R}}\) with
Moreover, observe that for every \(\varepsilon ,c\in (0,\infty )\), \(K\in [0,\infty )\) it holds
Furthermore, note that for every \(\varepsilon ,c\in (0,\infty )\), \(K\in [0,\infty )\) it holds
Combining this, (7.3), (7.4) with Lemma A.1 and Corollary 6.6 (with \(n\leftrightarrow 2n^2\) in the notation of Corollary 6.6) completes the proof of Corollary 7.1. \(\square \)
We can then employ Proposition 6.4 in order to approximate the required tensor product.
Corollary 7.2
Assume Setting 5.1, let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\), let \(n\in {\mathbb {N}}\), \(a\in (0,\infty )\), \(b\in (a,\infty )\), \((K_i)_{i\in {\mathbb {N}}}\subseteq [0,K_{\mathrm {max}})\), and let \(h_{c,K}:[a,b]\rightarrow {\mathbb {R}}\), \(c\in (0,\infty )\), \(K\in [0,K_{\mathrm {max}})\), denote the functions which satisfy for every \(c\in (0,\infty )\), \(K\in [0,K_{\mathrm {max}})\), \(x\in [a,b]\) that
For every \(c\in (0, \infty )\), \(d\in {\mathbb {N}}\), let the function \(F^d_c:[a,b]^d\rightarrow {\mathbb {R}}\) be given by
Then there exist neural networks \((\Psi ^d_{\varepsilon ,c})_{\varepsilon ,c\,\in (0,\infty ),d\in {\mathbb {N}}}\subseteq {\mathfrak {N}}\) which satisfy
-
(i)
\(\displaystyle \sup _{\varepsilon ,c\,\in (0,\infty ),d\in {\mathbb {N}}}\left[ {\frac{{\mathcal {L}}(\Psi ^d_{\varepsilon ,c})}{\max \{1,\ln (d)\}(\left| {\ln (\varepsilon )}\right| +\ln (d)+1)+\ln (c+1)}}\right] <\infty \),
-
(ii)
\(\displaystyle \sup _{\varepsilon ,c\,\in (0,\infty ),d\in {\mathbb {N}}}\left[ {\frac{{\mathcal {M}}(\Psi ^d_{\varepsilon ,c})}{(c+1)^{\frac{1}{n}}d^{1+\frac{1}{n}}\varepsilon ^{-\frac{1}{n}}}}\right] <\infty \) and
-
(iii)
for every \(\varepsilon ,c\,\in (0,\infty )\), \(d\in {\mathbb {N}}\) that
$$\begin{aligned} \sup _{x\in [a,b]^d}\left| {F^d_c(x)-\left[ {R_{\varrho }(\Psi ^d_{\varepsilon ,c})}\right] \!(x)}\right| \le \varepsilon . \end{aligned}$$(7.8)
Proof of Corollary 7.2
Throughout this proof, assume Setting 5.2. Corollary 7.1 ensures that there exist constants \(b_L,b_M\in (0,\infty )\) and neural networks \(\left( {\Phi ^i_{\eta ,c}}\right) _{\eta ,c\,\in (0,\infty )}\subseteq {\mathfrak {N}}\), \(i\in {\mathbb {N}}\), such that for every \(i\in {\mathbb {N}}\) it holds
-
(a)
\(\displaystyle \sup _{\eta ,c\in (0,\infty )}\left[ {\frac{{\mathcal {L}}(\Phi ^i_{\eta ,c})}{\max \{1,|\ln (\eta )|\}+\max \{0,\ln (K_{\mathrm {max}}+c)\}}}\right] <b_L\),
-
(b)
\(\displaystyle \sup _{\eta ,c\,\in (0,\infty )}\left[ {\frac{{\mathcal {M}}(\Phi ^i_{\eta ,c})}{(K_{\mathrm {max}}+c+1)^{\frac{1}{n}}\eta ^{-\frac{1}{n^2}}}}\right] <b_M\) and
-
(c)
for every \(\eta ,c\in (0,\infty )\) that
$$\begin{aligned} \sup _{x\in [a,b]}\left| {h_{c,K_i}(x)-\left[ {R_{\varrho }(\Phi ^i_{\eta ,c})}\right] \!(x)}\right| \le \eta . \end{aligned}$$(7.9)
Furthermore, for every \(c\in (0,\infty )\), \(i\in {\mathbb {N}}\), \(x\in [a,b]\) it holds that
Combining this with (a), Proposition 6.4 and Lemma 5.4 implies that there exist \(C\in {\mathbb {R}}\) and neural networks \((\psi ^d_{\eta ,c})_{\eta \in (0,\infty )}\subseteq {\mathfrak {N}}\), \(c\in (0,\infty )\), \(d\in {\mathbb {N}}\), such that for every \(c\in (0,\infty )\), \(d\in {\mathbb {N}}\) it holds
-
(A)
\(\displaystyle {\mathcal {L}}(\psi ^d_{\eta ,c})\le C\ln (d)\left( {\left| {\ln (\eta )}\right| +\ln (d)}\right) +\max _{i\in \{1,2,\dots ,{d}\}}{\mathcal {L}}(\Phi ^i_{\eta ,c})\),
-
(B)
\(\displaystyle {\mathcal {M}}(\psi ^d_{\eta ,c})\le C d\left( {\left| {\ln (\eta )}\right| +\ln (d)}\right) +4\sum _{i=1}^d {\mathcal {M}}(\Phi ^i_{\eta ,c})+8d\max _{i\in \{1,2,\dots ,{d}\}}{\mathcal {L}}(\Phi ^i_{\eta ,c})\) and
-
(C)
for every \(\eta \in (0,\infty )\) that
$$\begin{aligned} \sup _{x\in [a,b]^d}\left| {\left[ {\textstyle \prod \limits _{i=1}^d h_{c,K_i}(x_i)}\right] -\left[ {R_{\varrho }(\psi ^d_{\eta ,c})}\right] \!(x)}\right| \le 3d\eta . \end{aligned}$$(7.11)
Let \(\lambda \in {\mathcal {N}}_1^{1,1}\) be the neural network given by \(\lambda =\left( {(-1,1)}\right) \), let \(\theta \in {\mathcal {N}}^{1,1}_1\) be the neural network given by \(\theta =(0,0)\) and let \((\Psi ^d_{\varepsilon ,c})_{\varepsilon ,c\,\in (0,\infty ),d\in {\mathbb {N}}}\subseteq {\mathfrak {N}}\) be the neural networks given by
Observe that this and (B) imply for every \(\varepsilon \in (0,2]\), \(c\,\in (0,\infty )\), \(d\in {\mathbb {N}}\), \(x\in [a,b]^d\) it holds
Moreover, (7.12) and (7.10) ensure for every \(\varepsilon \in (2,\infty )\), \(c\,\in (0,\infty )\), \(d\in {\mathbb {N}}\), \(x\in [a,b]^d\) it holds
This and (7.13) establish that the neural networks \((\Psi ^d_{\varepsilon ,c})_{\varepsilon ,c\,\in (0,\infty ),d\in {\mathbb {N}}}\) satisfy (iii). Next observe that for every \(c\,\in (0,\infty )\) it holds
Hence, we obtain that for every \(\varepsilon ,c\,\in (0,\infty )\), \(d\in {\mathbb {N}}\) it holds
In addition, for every \(\varepsilon ,c\,\in (0,\infty )\), \(d\in {\mathbb {N}}\) it holds
Combining this with Lemma 5.3, (a), (A) and (7.16) yields
Moreover, (7.12) shows
This and (7.18) establish that \((\Psi ^d_{\varepsilon ,c})_{\varepsilon ,c\,\in (0,\infty ),d\in {\mathbb {N}}}\) satisfy (i). Next observe Lemma A.1 implies that
-
for every \(\varepsilon \in (0,2]\) it holds
$$\begin{aligned} |\ln (\varepsilon )|\le \left[ {\sup _{\delta \in [\exp (-2n^2),2]}\ln (\delta )}\right] \varepsilon ^{-\frac{1}{n}}=2n^2\varepsilon ^{-\frac{1}{n}}, \end{aligned}$$(7.20) -
for every \(d\in {\mathbb {N}}\) it holds
$$\begin{aligned} \ln (d)\le \left[ {\max _{k\in \{1,2,\dots {\exp (2n^2)}\}}\ln (k)}\right] d^{\frac{1}{n}}=2n^2d^{\frac{1}{n}}, \end{aligned}$$(7.21) -
and for every \(c\in (0,\infty )\) it holds
$$\begin{aligned} \ln (c+1)\le \left[ {\sup _{t\in (0,\exp (2n^2-1)]}\ln (t+1)}\right] (c+1)^{\frac{1}{n}}=2n^2(c+1)^{\frac{1}{n}}. \end{aligned}$$(7.22)
For every \(m\in {\mathbb {N}}\), \(x_i\in [1,\infty )\), \(i\in \{1,2,\dots ,{m}\}\), it holds
Combining this with (7.20), (7.21) and (7.22) shows for every \(\varepsilon \in (0,2]\), \(d\in {\mathbb {N}}\), \(c\in (0,\infty )\) it holds
Furthermore, note (7.15), (7.20), (7.21), (7.22) and (7.23) ensure for every \(\varepsilon \in (0,2]\), \(d\in {\mathbb {N}}\), \(c\in (0,\infty )\) it holds
In addition, observe that for every \(\varepsilon \in (0,2]\), \(d\in {\mathbb {N}}\), \(c\in (0,\infty )\) it holds
Combining this with Lemma 5.3, (a), (b), (B), (7.24) and (7.25) yields
Furthermore, note that (7.12) ensures
This and (7.27) establish that the neural networks \((\Psi ^d_{\varepsilon ,c})_{\varepsilon ,c\,\in (0,\infty ),d\in {\mathbb {N}}}\) satisfy (ii). Thus the proof of Corollary 7.2 is completed. \(\square \)
Finally, we add the quadrature estimates from Sect. 4 to achieve approximation with networks whose size only depends polynomially on the dimension of the problem.
Theorem 7.3
Assume Setting 5.1, let \(\varrho :{\mathbb {R}}\rightarrow {\mathbb {R}}\) be the ReLU activation function given by \(\varrho (t)=\max \{0,t\}\), let \(n\in {\mathbb {N}}\), \(a\in (0,\infty )\), \(b\in (a,\infty )\), \((K_i)_{i\in {\mathbb {N}}}\subseteq [0,K_{\mathrm {max}})\) and let \(F_d:(0,\infty )\times [a,b]^d\rightarrow {\mathbb {R}}\), \(d\in {\mathbb {N}}\), be the functions which satisfy for every \(d\in {\mathbb {N}}\), \(c\in (0, \infty )\), \(x\in [a,b]^d\)
Then there exist neural networks \((\Gamma _{d,\varepsilon })_{\varepsilon \in (0,1],d\in {\mathbb {N}}}\in {\mathfrak {N}}\) which satisfy
-
(i)
\(\displaystyle \sup _{\varepsilon \in (0,1],d\in {\mathbb {N}}}\left[ {\frac{{\mathcal {L}}(\Gamma _{d,\varepsilon })}{\max \{1,\ln (d)\}\left( {|\ln (\varepsilon )|+\ln (d)+1}\right) }}\right] <\infty \),
-
(ii)
\(\displaystyle \sup _{\varepsilon \in (0,1],d\in {\mathbb {N}}}\left[ {\frac{{\mathcal {M}}(\Gamma _{d,\varepsilon })}{d^{2+\frac{1}{n}}\varepsilon ^{-\frac{1}{n}}}}\right] <\infty \) and
-
(iii)
for every \(\varepsilon \in (0,1]\), \(d\in {\mathbb {N}}\) that
$$\begin{aligned} \sup _{x\in [a,b]^d}\left| {\int _0^{\infty }F_d(c,x)\mathrm {d}c-\left[ {R_{\varrho }(\Gamma _{d,\varepsilon })}\right] \!(x)}\right| \le \varepsilon . \end{aligned}$$(7.30)
Proof of Theorem 7.3
Throughout this proof, assume Setting 5.2, let \(S_{b,n}\in {\mathbb {R}}\) be given by
and let \(N_{d,\varepsilon }\in {\mathbb {R}}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), be given by
Note that Lemma 4.3 (with \(4n\leftrightarrow n\), \(F_x^d(c)\leftrightarrow F_d(c,x)\), \(N_{d,\frac{\varepsilon }{2}}\leftrightarrow N_{d,\varepsilon }\), \(Q_{d,\frac{\varepsilon }{2}}\leftrightarrow Q_{d,\varepsilon }\) in the notation of Lemma 4.3) ensures that there exist \(Q_{d,\varepsilon }\in {\mathbb {N}}\), \(c^d_{\varepsilon ,j}\in (0,N_{d,\varepsilon })\), \(w^d_{\varepsilon ,j}\in [0,\infty )\), \(j\in \{1,2,\dots ,{Q_{d,\varepsilon }}\}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) with
and for every \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\) it holds
and
Furthermore, Corollary 7.2 (with \(4n\leftrightarrow n\), \(F_{c^d_{\varepsilon ,j}}^d(x)\leftrightarrow F_d(c^d_{\varepsilon ,j},x)\)) ensures that there exist neural networks \((\Psi ^d_{\varepsilon ,j})_{\varepsilon \in (0,\infty ),d\in {\mathbb {N}},j\in \{1,2,\dots ,{Q_{d,\varepsilon }}\}}\subseteq {\mathfrak {N}}\) which satisfy
-
(a)
\(\displaystyle \sup _{\varepsilon \in (0,\infty ),d\in {\mathbb {N}}}\left[ {\frac{\max _{j\in \{1,2,\dots ,{Q_{d,\varepsilon }}\}}{\mathcal {L}}(\Psi ^d_{\varepsilon ,j})}{\max \{1,\ln (d)\}\left( {|\ln (\frac{\varepsilon }{2N_{d,\varepsilon }})|+\ln (d)+1}\right) +\ln (N_{d,\varepsilon }+1)}}\right] <\infty \),
-
(b)
\(\displaystyle \sup _{\varepsilon \in (0,\infty ),d\in {\mathbb {N}}}\left[ {\frac{\max _{j\in \{1,2,\dots ,{Q_{d,\varepsilon }}\}}{\mathcal {M}}(\Psi ^d_{\varepsilon ,j})}{(N_{d,\varepsilon }+1)^{\frac{1}{4n}}d^{1+\frac{1}{4n}}\left[ {\frac{\varepsilon }{2N_{d,\varepsilon }}}\right] ^{-\frac{1}{4n}}}}\right] <\infty \) and
-
(c)
for every \(\varepsilon \in (0,\infty )\), \(d\in {\mathbb {N}}\) that
$$\begin{aligned} \sup _{x\in [a,b]^d}\left| {F_d(c^d_{\varepsilon ,j},x)-\left[ {R_{\varrho }(\Psi ^d_{\varepsilon ,j})}\right] \!(x)}\right| \le \tfrac{\varepsilon }{2N_{d,\varepsilon }}. \end{aligned}$$(7.36)
Let \(\mathrm {Id}_{{\mathbb {R}}^d}\in {\mathbb {R}}^{d\times d}\), \(d\in {\mathbb {N}}\), be the matrices given by \(\mathrm {Id}_{{\mathbb {R}}^d}=\mathrm {diag}(1,1,\dots ,1)\), let \(\nabla _{d,q}\in {\mathcal {N}}_1^{d,dq}\), \(d,q\in {\mathbb {N}}\), be the neural networks given by
let \(\Sigma _{d,\varepsilon }\in {\mathcal {N}}_1^{d,1}\), \(d\in {\mathbb {N}}\), \(\varepsilon \in (0,1]\), be the neural networks given by
and let \((\Gamma _{d,\varepsilon })_{\varepsilon \in (0,1],d\in {\mathbb {N}}}\in {\mathfrak {N}}\) be the neural networks given by
Combining Lemma 5.3, Lemma 5.4, (7.34), (7.35) and (c) implies for every \(\varepsilon \in (0,\infty )\) and \(d\in {\mathbb {N}}\), \(x\in [a,b]^d\) it holds
This establishes that the neural networks \((\Gamma _{d,\varepsilon })_{\varepsilon \in (0,1],d\in {\mathbb {N}}}\) satisfy (iii). Next, observe for every \(\varepsilon \in (0,\infty )\), \(d\in {\mathbb {N}}\)
Combining this with Lemma 5.3, Lemma 5.4 and (a) implies
This establishes that the neural networks \((\Gamma _{d,\varepsilon })_{\varepsilon \in (0,1],d\in {\mathbb {N}}}\) satisfy (i). In addition, for every \(\varepsilon \in (0,\infty )\), \(d\in {\mathbb {N}}\) it holds
Combining this with Lemma 5.3, Lemma 5.4, (7.33), (b) and the fact that for every \(\psi \in {\mathfrak {N}}\) which satisfies \({\min _{l\in \{1,2,\dots ,{{\mathcal {L}}(\psi )}\}}{\mathcal {M}}_l(\psi )>0}\) it holds \({\mathcal {L}}(\psi )\le {\mathcal {M}}(\psi )\) ensures
This establishes the neural networks \((\Gamma _{d,\varepsilon })_{\varepsilon \in (0,1],d\in {\mathbb {N}}}\) satisfy (ii). The proof of Theorem 7.3 is thus completed. \(\square \)
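Schematically, \(\Gamma _{d,\varepsilon }\) realizes a quadrature-weighted sum of the tensor-product sub-networks. The sketch below reproduces only this assembly step with plain Python callables; the Gauss–Legendre nodes and the toy integrand stand in for the quadrature of Lemma 4.3 and the actual functions \(F_d\), so every specific choice in it is ours and purely illustrative.

```python
import numpy as np

def assemble_quadrature_network(sub_networks, weights):
    """Gamma(x) = sum_j w_j * Psi_j(x): a weighted sum of sub-network outputs,
    realizable as one network via parallelization plus a final linear layer."""
    def gamma(x):
        return sum(w * psi(x) for w, psi in zip(weights, sub_networks))
    return gamma

if __name__ == "__main__":
    # Toy integrand: int_0^1 prod_i exp(-c * x_i) dc, discretized in c by
    # Gauss-Legendre quadrature (an illustrative stand-in for Lemma 4.3).
    d, Q = 3, 8
    nodes, gl_weights = np.polynomial.legendre.leggauss(Q)
    c_nodes = 0.5 * (nodes + 1.0)       # map [-1, 1] -> [0, 1]
    c_weights = 0.5 * gl_weights
    sub_networks = [lambda x, c=c: float(np.exp(-c * x).prod()) for c in c_nodes]
    gamma = assemble_quadrature_network(sub_networks, c_weights)

    x = np.linspace(0.2, 0.9, d)
    s = float(x.sum())
    exact = (1.0 - np.exp(-s)) / s      # closed form of the toy integral
    print(f"assembled network: {gamma(x):.8f}   exact integral: {exact:.8f}")
```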
8 Discussion
While Theorem 7.3 formally establishes only that the solution of one specific high-dimensional PDE can be approximated by neural networks without the curse of dimensionality, the constructive approach also serves to illustrate that neural networks are capable of accomplishing the same for any PDE solution which exhibits a similar low-rank structure. Note here that the tensor product construction in Proposition 6.4 introduces only a logarithmic dependence on the approximation accuracy. That we end up with a spectral rate in this specific case is due to the insufficient regularity of the univariate functions inside the tensor product, as well as the number of terms required by the Gaussian quadrature of Lemma 4.3 used to approximate the outer integral. In particular, this means that the approach in Sect. 6 might also be used to produce approximation results with connectivity growing only logarithmically in the inverse of the approximation error, provided one has a suitably well-behaved low-rank structure.
The present result is a promising step toward the higher-order numerical solution of high-dimensional PDEs, which are notoriously difficult to handle with any of the classical approaches based on a discretization of the domain, or with randomized (that is, Monte Carlo-based) arguments. Of course, answering the question of approximability only ensures that there exist networks with a reasonable size-to-accuracy trade-off; for any practical purpose it is also necessary to establish whether and how such networks can be found.
An analysis of the generalization error for linear Kolmogorov equations can be found in [4], which concludes that, under reasonable assumptions, the number of required Monte Carlo samples is free of the curse of dimensionality. Moreover, there are a number of empirical results [2, 3, 21, 39, 42] which suggest that the solutions of various high-dimensional PDEs may be learned efficiently using standard stochastic gradient descent-based methods. However, a satisfying formal analysis of this training process does not seem to be available at present.
Lastly, we would like to point out that, even though a semi-explicit formula was available, the ReLU networks we used for the approximation were in no way adapted to exploit this knowledge. ReLU networks have been shown to exhibit excellent approximation properties for, e.g., piecewise smooth functions [34], affine and Gabor systems [10], and even fractal structures [9]. So, while a spline dictionary-based approach specifically designed for the approximation of this one PDE solution might achieve similar rates, it would most certainly lack the remarkable universality of neural networks.
Notes
Often phrased as input dimension \(N_0\) and output dimension \(N_L\) with \(N_l\), \(l\in \{1,2,\dots ,{L-1}\}\) many neurons in the lth layer.
References
Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39(3), 930–945 (1993)
Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving stochastic differential equations and Kolmogorov equations by means of deep learning. arXiv:1806.00421 (2018)
Beck, C., E, W., Jentzen, A.: Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci. (2017)
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. SIAM J. Math. Data Sci. 2, 631–657 (2020)
Bölcskei, H., Grohs, P., Kutyniok, G., Petersen, P.: Optimal approximation with sparsely connected deep neural networks. SIAM J. Math. Data Sci. 1(1), 8–45 (2019)
Chiani, M., Dardari, D., Simon, M.K.: New exponential bounds and approximations for the computation of error probability in fading channels. IEEE Trans. Wireless Commun. 2(4), 840–845 (2003)
Chui, C., Li, X., Mhaskar, H.: Neural networks for localized approximation. Math. Comput. 63(208), 607–623 (1994)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)
Dym, N., Sober, B., Daubechies, I.: Expression of fractals through neural network functions. IEEE J. Select. Areas Inf. Theory 1(1), 57–66 (2020)
Elbrächter, D., Perekrestenko, D., Grohs, P., Bölcskei, H.: Deep neural network approximation theory. arXiv:1901.02220 (2019)
Freidlin, M.: Functional Integration and Partial Differential Equations. Annals of Mathematics Studies, vol. 109. Princeton University Press, Princeton (1985)
Fujii, M., Takahashi, A., Takahashi, M.: Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pac. Finan. Mark. 29, 1563–1619 (2017)
Gonon, L., Grohs, P., Jentzen, A., Kofler, D., Šiška, D.: Uniform error estimates for artificial neural network approximations for heat equations. arXiv:1911.09647 (2019)
Gonon, L., Schwab, C.: Deep ReLU network expression rates for option prices in high-dimensional, exponential Lévy models. Tech. Rep. 2020-52, Seminar for Applied Mathematics, ETH Zürich, 2020
Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
Goudenège, L., Molent, A., Zanette, A.: Machine learning for pricing American options in high-dimensional Markovian and non-Markovian models. Quant. Finance 20(4), 573–591 (2020)
Grohs, P., Herrmann, L.: Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions. arXiv:2007.05384 (2020)
Grohs, P., Hornung, F., Jentzen, A., von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. arXiv:1809.02362. Accepted in Mem. Amer. Math. Soc. (2019)
Grohs, P., Jentzen, A., Salimova, D.: Deep neural network approximations for Monte Carlo algorithms. arXiv:1908.10828 (2019)
Hairer, M., Hutzenthaler, M., Jentzen, A.: Loss of regularity for Kolmogorov equations. Ann. Probab. 43(2), 468–527 (2015)
Han, J., Jentzen, A., Weinan, E.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
Henry-Labordere, P.: Deep Primal-Dual Algorithm for BSDEs: Applications of Machine Learning to CVA and IM. Available at SSRN: https://ssrn.com/abstract=3071506
Hornik, K., Stinchcombe, M., White, H.: Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 3(5), 551–560 (1990)
Hornung, F., Jentzen, A., Salimova, D.: Space-time deep neural network approximations for high-dimensional partial differential equations. arXiv:2006.02199 (2020)
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differ. Equ. Appl. 1, 10 (2020)
Jentzen, A., Salimova, D., Welti, T.: A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. arXiv:1809.07321 (2018)
Khoo, Y., Lu, J., Ying, L.: Solving parametric PDE problems with artificial neural networks. Eur. J. Appl. Math. (2020)
Kutyniok, G., Petersen, P., Raslan, M., Schneider, R.: A theoretical analysis of deep neural networks and parametric PDEs. arXiv:1904.00377 (2019)
Kwok, Y.-K.: Mathematical Models of Financial Derivatives, 2nd edn. Springer, Berlin (2008)
Levy, D.: Introduction to Numerical Analysis, 2010. Available: https://api.semanticscholar.org/CorpusID:123255603
Mhaskar, H.N.: Neural Networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)
Mishra, S.: A machine learning framework for data driven acceleration of computations of differential equations. Math. Eng. 1(1), 118–146 (2018)
Perekrestenko, D., Grohs, P., Elbrächter, D., Bölcskei, H.: The universal approximation power of finite-width deep ReLU networks. arXiv:1806.01528 (2018)
Petersen, P., Voigtlaender, F.: Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw. 108, 296–330 (2018)
Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numer. 8, 143–195 (1999)
Reisinger, C., Zhang, Y.: Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. arXiv:1903.06652 (2019)
Schmidt-Hieber, J.: Nonparametric regression using deep neural networks with ReLU activation function. Ann. Stat. 48(4), 1875–1897 (2020)
Schwab, C., Zech, J.: Deep learning in high dimension: Neural network expression rates for generalized polynomial chaos expansions in UQ. Anal. App. 17(01), 19–55 (2019)
Sirignano, J., Spiliopoulos, K.: DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
Telgarsky, M.: Representation benefits of deep feedforward networks. arXiv:1509.08101 (2015)
Weinan, E., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
Weinan, E., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017)
Wilmott, P.: Paul Wilmott Introduces Quantitative Finance, 2nd edn. Wiley, Hoboken (2007)
Yarotsky, D.: Error bounds for approximations with deep ReLU networks. Neural Networks 94, 103–114 (2017)
Funding
Open access funding provided by University of Vienna.
Communicated by Wolfgang Dahmen, Ronald A. Devore, and Philipp Grohs.
This work was performed during visits of PG at the Seminar for Applied Mathematics and the FIM of ETH Zürich, and completed during the thematic term “Numerical Analysis of Complex PDE Models in the Sciences” at the Erwin Schrödinger Institute, Vienna, June–August 2018. AJ acknowledges support by the Swiss National Science Foundation under grant No. 175699. DE and PhG are supported in part by the Austrian Science Fund (FWF) under project number P 30148.
Additional Proofs
1.1 Technical Lemma
Lemma A.1
It holds for every \(r\in (0,\infty )\), \(t\in (0,\exp (-2r^2)]\) that
and for every \(r\in (0,\infty )\), \(t\in [\exp (2r^2),\infty )\) that
Proof of Lemma A.1
First, observe that for every \(r\in (0,\infty )\), \(y\in [2r^2,\infty )\) it holds that
This implies that for every \(r\in (0,\infty )\), \(x\in [\exp (2r^2),\infty )\) it holds that
Hence, we obtain that for every \(r\in (0,\infty )\), \(t\in (0,\exp (-2r^2)]\subseteq (0,1]\) it holds that
This completes the proof of Lemma A.1. \(\square \)
1.2 Proof of Lemma 6.1
Proof of Lemma 6.1
The proof follows [44]. We include it in order to give explicit values for the constants in the bounds on depth and width, and to reveal the dependence on the scaling parameter B. Throughout this proof, let \(\theta \in {\mathcal {N}}^{1,1}_1\) be the neural network given by \(\theta =(0,0)\), let \(g_s:[0,1]\rightarrow [0,1]\), \(s\in {\mathbb {N}}\), be the functions which satisfy for every \(s\in {\mathbb {N}}\), \(t\in [0,1]\) that
and let \(f_m:[0,1]\rightarrow [0,1]\), \(m\in {\mathbb {N}}\), be the functions which satisfy for every \(m\in {\mathbb {N}}\), \(k\in \{0,1,\dots ,{2^m}\}\), \(x\in \left[ {\tfrac{k}{2^m},\tfrac{k+1}{2^m}}\right] \) that
We claim for every \(s\in {\mathbb {N}}\), \(k\in \{0,1,\dots ,{2^{s-1}-1}\}\) it holds
We now prove (A.8) by induction on \(s\in {\mathbb {N}}\). Equation (A.6) establishes (A.8) in the base case \(s=1\). For the induction step \({\mathbb {N}}\ni s\rightarrow s+1\in \{2,3,\dots \}\) observe that (A.6) implies for every \(s\in {\mathbb {N}}\), \(l\in \{0,1,\dots ,{2^{s-1}-1}\}\) that
-
(a)
it holds for every \(x\in \left[ {\tfrac{2l}{2^s},\tfrac{2l+(\nicefrac {1}{2})}{2^s}}\right] \)
$$\begin{aligned} \begin{aligned} g_{s+1}(x)&=g(g_s(x))=g(2^s(x-\tfrac{2l}{2^s}))=2\left[ {2^s(x-\tfrac{2l}{2^s})}\right] \\&=2^{s+1}(x-\tfrac{2l}{2^s})=2^{s+1}(x-\tfrac{2(2l)}{2^{s+1}}). \end{aligned}\end{aligned}$$(A.9) -
(b)
it holds for every \(x\in \left[ {\tfrac{2l+(\nicefrac {1}{2})}{2^s},\tfrac{2l+1}{2^s}}\right] \)
$$\begin{aligned} \begin{aligned} g_{s+1}(x)&=g(g_s(x))=g(2^s(x-\tfrac{2l}{2^s}))=2-2\left[ {2^s(x-\tfrac{2l}{2^s})}\right] \\&=2-2^{s+1}x+4l=2^{s+1}\left( \tfrac{4l+2}{2^{s+1}}-x\right) \\&=2^{s+1}\left( \tfrac{2(2l+1)}{2^{s+1}}-x\right) . \end{aligned}\end{aligned}$$(A.10) -
(c)
it holds for every \(x\in \left[ {\tfrac{2l+1}{2^s},\tfrac{2l+(\nicefrac {3}{2})}{2^s}}\right] \)
$$\begin{aligned} \begin{aligned} g_{s+1}(x)&=g(g_s(x))=g\left( 2^s\left( \tfrac{2l+2}{2^s}-x\right) \right) =2-2\left[ {2^s \left( \tfrac{2l+2}{2^s}-x\right) }\right] \\&=2-2(2l+2)+2^{s+1}x=2^{s+1}x-2(2l+1)\\&=2^{s+1}(x-\tfrac{2(2l+1)}{2^{s+1}}). \end{aligned}\end{aligned}$$(A.11) -
(d)
it holds for every \(x\in \left[ {\tfrac{2l+(\nicefrac {3}{2})}{2^s},\tfrac{2l+2}{2^s}}\right] \)
$$\begin{aligned} \begin{aligned} g_{s+1}(x)&=g(g_s(x))=g\left( 2^s\left( \tfrac{2l+2}{2^s}-x\right) \right) =2\left[ {2^s\left( \tfrac{2l+2}{2^s}-x\right) }\right] \\&=2^{s+1}\left( \tfrac{2l+2}{2^s}-x\right) =2^{s+1} \left( \tfrac{2(2l+2)}{2^{s+1}}-x\right) . \end{aligned}\end{aligned}$$(A.12)
Next observe that for every \(s\in {\mathbb {N}}\), \(k\in \{0,1,\dots ,{2^s-1}\}\) there exists \(l\in \{0,1,\dots ,{2^{s-1}-1}\}\) such that
Furthermore, for every \(s\in {\mathbb {N}}\), \(k\in \{0,1,\dots ,{2^s-1}\}\) there exists \(l\in \{0,1,\dots ,{2^{s-1}-1}\}\) such that
Combining this with (A.9), (A.10), (A.11), (A.12) and (A.13) completes the induction step \({\mathbb {N}}\ni s\rightarrow s+1\in \{2,3,\dots \}\) and thus establishes the claim (A.8).
Next, for every \(m\in {\mathbb {N}}\), \(k\in \{0,1,\dots ,{2^{m-1}}\}\) it holds
In addition, note that (A.7) implies that for every \(m\in {\mathbb {N}}\), \(k\in \{0,1,\dots ,{2^m-1}\}\) it holds
and
For every \(m\in {\mathbb {N}}\), \(k\in \{0,1,\dots ,{2^m-1}\}\) it holds
Combining this with (A.8), (A.7) and (A.15) demonstrates that for every \(m\in {\mathbb {N}}\), \(x\in [0,1]\) it holds
The fact that for every \(x\in [0,1]\) it holds that \(f_0(x)=x\) therefore implies that for every \(m\in {\mathbb {N}}_0\), \(x\in [0,1]\) it holds
We observe that \(f_m\) is the piecewise affine interpolant of the twice continuously differentiable function \([0,1]\ni x\mapsto x^2\in [0,1]\) at the points \(\tfrac{k}{2^m}\), \(k\in \{0,1,\dots ,{2^m}\}\). This establishes that for every \(m\in {\mathbb {N}}\)
Let \((A_k,b_k)\in {\mathbb {R}}^{4\times 4}\times {\mathbb {R}}^4\), \(k\in {\mathbb {N}}\), be the matrix-vector tuples which satisfy for every \(k\in {\mathbb {N}}\)
let \(\varphi _m\in {\mathfrak {N}}\), \(m\in {\mathbb {N}}\), be the neural networks which satisfy \(\varphi _1=(1,0)\) and, for every \(m\in {\mathbb {N}}\),
Let further \(r^k:{\mathbb {R}}\rightarrow {\mathbb {R}}^4\), \(k\in {\mathbb {N}}\), denote the functions which satisfy for every \(x\in {\mathbb {R}}\)
and for every \(x\in {\mathbb {R}}\), \(k\in {\mathbb {N}}\)
We claim that for every \(k\in \{1,2,\dots ,{m-1}\}\), \(x\in [0,1]\) it holds
-
(a)
$$\begin{aligned} 2r^k_1(x)-4r^k_2(x)+2r^k_3(x)=g_k(x) \end{aligned}$$(A.26)
and
-
(b)
$$\begin{aligned} r^k_4(x)=x-\sum _{j=1}^{k-1} 2^{-2j}g_j(x). \end{aligned}$$(A.27)
We prove (a) and (b) by induction over \(k\in \{1,2,\dots ,{m-1}\}\). For the base case \(k=1\) we note that for every \(x\in [0,1]\) it holds
Hence, we obtain that for every \(x\in [0,1]\) it holds
Furthermore, note that for every \(x\in [0,1]\) it holds that \(r^1_4(x)=x\). This and (A.29) establish the base case \(k=1\). For the induction step \(\{1,2,\dots ,{m-2}\}\ni k-1\rightarrow k\in \{2,3,\dots ,m-1\}\) observe that (A.28) ensures for every \(x\in [0,1]\), \(k\in \{2,3,\dots ,m-1\}\), with \(g_{k-1}(x)=2r^{k-1}_1(x)-4r^{k-1}_2(x)+2r^{k-1}_3(x)\), it holds
Induction thus establishes (a). Moreover, note that (A.7) and (A.20) imply that for every \(k\in {\mathbb {N}}\), \(x\in [0,1]\) it holds
Combining this with (A.28) implies that for every \(x\in [0,1]\), \(k\in \{2,3,\dots ,m-1\}\) with \(g_{k-1}(x)=2r^{k-1}_1(x)-4r^{k-1}_2(x)+2r^{k-1}_3(x)\) and \(r^{k-1}_4(x)=x-\sum _{j=1}^{k-2} 2^{-2j}g_j(x)\) it holds
Induction thus establishes (b). Next observe that (a) and (b) imply that for every \(m\in {\mathbb {N}}\), \(x\in [0,1]\) it holds
Combining this with (A.20) establishes that for every \(m\in {\mathbb {N}}\), \(x\in [0,1]\) it holds
This and (A.21) imply that for every \(m\in {\mathbb {N}}\) it holds
Furthermore, observe that by construction it holds for every \(m\in {\mathbb {N}}\)
Let \((\sigma _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) be the neural networks which satisfy for \(\varepsilon \in (0,1)\)
and for every \(\varepsilon \in [1,\infty )\) that \(\sigma _{\varepsilon }=\theta \). Observe that for every \(\varepsilon \in [1,\infty )\) it holds
In addition, note for every \(\varepsilon \in (0,1)\) it holds
Moreover, observe that (A.36) implies for every \(\varepsilon \in (0,1)\) it holds
and
Furthermore, for every \(\varepsilon \in [1,\infty )\) it holds \({\mathcal {L}}(\sigma _{\varepsilon })={\mathcal {L}}(\theta )=1\) and \({\mathcal {M}}(\sigma _{\varepsilon })={\mathcal {M}}(\theta )=0\). This completes the proof of Lemma 6.1. \(\square \)
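The construction just completed can be checked numerically: the s-fold composition of the tent map yields the sawtooth functions \(g_s\), and subtracting the scaled sawtooths from the identity produces the interpolant \(f_m\) of \(x\mapsto x^2\) with uniform error \(2^{-2m-2}\). The following sketch (ours, and no part of the proof) evaluates exactly this.

```python
def g(t):
    """Tent map on [0, 1]: the ReLU-realizable function 2*min(t, 1 - t)."""
    return 2 * t if t <= 0.5 else 2 * (1 - t)

def sawtooth(s, t):
    """g_s = g ∘ ... ∘ g (s-fold composition)."""
    for _ in range(s):
        t = g(t)
    return t

def f_m(m, t):
    """Piecewise linear interpolant of t^2 on the grid k/2^m:
    f_m(t) = t - sum_{s=1}^{m} 4^(-s) * g_s(t)."""
    return t - sum(4.0 ** (-s) * sawtooth(s, t) for s in range(1, m + 1))

if __name__ == "__main__":
    grid = [k / 10_000 for k in range(10_001)]
    for m in (2, 4, 6, 8):
        err = max(abs(t * t - f_m(m, t)) for t in grid)
        print(f"m = {m}   sup |t^2 - f_m(t)| ~ {err:.3e}   bound 2^(-2m-2) = {2**(-2*m-2):.3e}")
```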
1.3 Proof of Lemma 6.2
Proof of Lemma 6.2
Throughout this proof, assume Setting 5.2, let \(\theta \in {\mathcal {N}}^{1,1}_1\) be the neural network given by \(\theta =(0,0)\), let \(\alpha \in {\mathcal {N}}_2^{2,6,3}\) be the neural network given by
and let \(\Sigma \in {\mathcal {N}}^{3,1}_1\) be the neural network given by \(\Sigma =\left( {(\begin{pmatrix}2B^2&-2B^2&-2B^2\end{pmatrix},0)}\right) \). Observe that Lemma 6.1 ensures the existence of neural networks \((\sigma _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) which satisfy Lemma 6.1, (i) – (iv). Let \((\mu _{\varepsilon })_{\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) be the neural networks which satisfy for every \(\varepsilon \in (0,\infty )\)
Note first that for every \(\varepsilon \in [B^2,\infty )\) it holds
Next observe that for every \((x,y)\in {\mathbb {R}}^2\) it holds
Furthermore, for every \((x,y,z)\in {\mathbb {R}}^3\) it holds that \([R_{\varrho }(\Sigma )](x,y,z) = 2B^2x - 2B^2y - 2B^2z\). Combining this with Lemma 5.3, Lemma 5.4, (A.43) and (A.45) establishes that for every \(\varepsilon \in (0,B^2)\), \((x,y)\in [-B,B]^2\) it holds
Together with Lemma 6.1(iv), (A.46) establishes (v). In addition, note that Lemma 6.1 demonstrates that for every \(\varepsilon \in (0,\infty )\) it holds
This and (A.46) establish that for every \(\varepsilon \in (0,B^2)\) it holds
Next observe that \({\mathcal {L}}(\alpha )=2\) and \({\mathcal {L}}(\Sigma )=1\). Combining this with Lemma 5.3, Lemma 5.4 and Lemma 6.1(i) ensures for every \(\varepsilon \in (0,B^2)\)
Combining \({\mathcal {M}}(\alpha )=14\) and \({\mathcal {M}}(\Sigma )=3\) with Lemma 5.3, Lemma 5.4, Lemma 6.1(ii) and (A.42) demonstrates that for every \(\varepsilon \in (0,B^2)\) it holds
Moreover, for every \(\varepsilon \in (B^2,\infty )\) it holds \({\mathcal {L}}(\mu _{\varepsilon })=1\) and \({\mathcal {M}}(\mu _{\varepsilon })=0\). Next, observe Lemma 5.3 and Lemma 5.4 demonstrate that for every \(\varepsilon \in (0,\infty )\) it holds that \({\mathcal {M}}_1(\mu _\varepsilon )={\mathcal {M}}_1(\alpha )=8\) and \({\mathcal {M}}_{{\mathcal {L}}(\mu _{\varepsilon })}(\mu _{\varepsilon })={\mathcal {M}}(\Sigma )=3\). This completes the proof of Lemma 6.2. \(\square \)
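The multiplication network of Lemma 6.2 rests on the polarization identity \(xy=2B^2\big [\big (\tfrac{|x+y|}{2B}\big )^2-\big (\tfrac{|x|}{2B}\big )^2-\big (\tfrac{|y|}{2B}\big )^2\big ]\), with each square replaced by an approximate square on \([0,1]\). The sketch below (our own self-contained re-implementation, with all function names hypothetical) combines this identity with the sawtooth-based squaring from the previous sketch.

```python
def sawtooth(s, t):
    """s-fold composition of the tent map g(t) = 2*min(t, 1-t) on [0, 1]."""
    for _ in range(s):
        t = 2 * t if t <= 0.5 else 2 * (1 - t)
    return t

def sq_unit(t, m=10):
    """Approximate t^2 on [0, 1] (piecewise linear, error <= 2^(-2m-2))."""
    return t - sum(4.0 ** (-s) * sawtooth(s, t) for s in range(1, m + 1))

def approx_multiply(x, y, B, m=10):
    """Approximate xy on [-B, B]^2 via the polarization identity
    xy = 2B^2 * [((x+y)/(2B))^2 - (x/(2B))^2 - (y/(2B))^2],
    with each square replaced by sq_unit (cf. the weights 2B^2, -2B^2, -2B^2)."""
    u = 2 * B * B
    return (u * sq_unit(abs(x + y) / (2 * B), m)
            - u * sq_unit(abs(x) / (2 * B), m)
            - u * sq_unit(abs(y) / (2 * B), m))

if __name__ == "__main__":
    B = 3.0
    for x, y in [(2.5, -1.75), (-3.0, 3.0), (0.3, 0.4)]:
        approx = approx_multiply(x, y, B)
        print(f"x*y = {x*y: .6f}   approx = {approx: .6f}   error = {abs(x*y - approx):.2e}")
```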
1.4 Proof of Theorem 6.5
Proof of Theorem 6.5
Throughout this proof, assume Setting 5.2, let \(h_{N,j}:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,N\}\), be the functions which satisfy for every \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,N\}\), \(x\in {\mathbb {R}}\)
let \(T_{f,N,j}:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,N\}\), be the functions which satisfy for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,N\}\), \(x\in [0,1]\)
For every \(f\in B^n_1\), let \(f_N:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(N\in {\mathbb {N}}\) denote functions which satisfy for every \(N\in {\mathbb {N}}\), \(x\in [0,1]\)
Observe that Taylor’s theorem (with Lagrange remainder term) ensures that for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,N\}\), \(x\in [\max \{0,\tfrac{j-1}{N}\},\min \{1,\tfrac{j+1}{N}\}]\)
Moreover, for every \(N\in {\mathbb {N}}\), \(x\in [0,1]\), \(j\notin \{\lceil Nx\rceil -1,\lceil Nx\rceil \}\) it holds that \(h_{N,j}(x)=0\). We obtain for every \(N\in {\mathbb {N}}\) and \(x\in [0,1]\)
Furthermore, (A.51) implies for every \(N\in {\mathbb {N}}\), \(j\in \{1,\dots ,N-1\}\), \(x\in [\tfrac{j-1}{N},\tfrac{j}{N}]\) holds
Combining this with (A.53), (A.54) and (A.55) establishes that for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(x\in [0,1]\)
We now realize this local Taylor approximation using neural networks. To this end, note that Theorem 6.3 ensures that there exist \(C\in {\mathbb {R}}\) and neural networks \((\Pi _{\eta }^k)_{\eta \in (0,\infty )}\), \(k\in {\mathbb {N}}\cap [2,\infty )\) which satisfy
-
(A)
\(\displaystyle {\mathcal {L}}(\Pi _{\eta }^k)\le C\ln (k)\left( {\left| {\ln (\eta )}\right| +k\ln (3)+\ln (k)}\right) \),
-
(B)
\(\displaystyle {\mathcal {M}}(\Pi _{\eta }^k)\le C k\left( {\left| {\ln (\eta )}\right| +k\ln (3)+\ln (k)}\right) \),
-
(C)
\(\displaystyle \sup _{x\in [-3,3]^k}\left| {\left[ {\prod _{i=1}^k x_i}\right] -\left[ {R_{\varrho }(\Pi _{\eta }^k)}\right] \!(x)}\right| \le \eta \) and
-
(D)
\(R_\varrho \left[ \Pi _{\eta }^k\right] (x_1,x_2,\dots ,x_k)=0\), if there exists \(i\in \{1,2,\dots ,k\}\) with \(x_i=0\).
To complete the proof, we introduce the following neural networks:
-
\(\nabla _{N,j,k}\in {\mathcal {N}}^{k,1}_1\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,{N}\}\), \(k\in \{2,3,\dots ,n-1\}\) given by
$$\begin{aligned} \nabla _{N,j,k}=\left( {(\begin{pmatrix}1 \\ \vdots \\ 1\end{pmatrix}, \begin{pmatrix}-\tfrac{j}{N} \\ \vdots \\ -\tfrac{j}{N} \end{pmatrix})}\right) , \end{aligned}$$(A.58) -
\(\xi _{\varepsilon ,N,j}^k\in {\mathfrak {N}}\), \(\varepsilon \in (0,\infty )\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,{N}\}\), \(k\in \{1,2,\dots ,n-1\}\), given by
$$\begin{aligned} \xi _{\varepsilon ,N,j}^k={\left\{ \begin{array}{ll}(1,0) &{} :k=1\\ \Pi ^k_{\nicefrac {\varepsilon }{8e}}\odot \nabla _{N,j,k} &{} :k>1\end{array}\right. }, \end{aligned}$$(A.59) -
\(\Sigma _{f,N,j}\in {\mathcal {N}}^{1,n-1}_1\), \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,{N}\}\) given by
$$\begin{aligned} \Sigma _{f,N,j}=\left( {(\begin{pmatrix}\frac{f^{(n-1)}\left( \tfrac{j}{N}\right) }{(n-1)!}&\frac{f^{(n-2)}\left( \tfrac{j}{N}\right) }{(n-2)!}&\dots&\frac{f^{(1)}\left( \tfrac{j}{N}\right) }{(1)!}\end{pmatrix},f\left( \tfrac{j}{N}\right) )}\right) , \end{aligned}$$(A.60) -
\(\tau _{f,\varepsilon ,N,j}\in {\mathfrak {N}}\), \(f\in B^n_1\), \(\varepsilon \in (0,\infty )\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,{N}\}\) given by
$$\begin{aligned} \tau _{f,\varepsilon ,N,j}=\Sigma _{f,N,j}\odot {\mathcal {P}}(\xi _{\varepsilon ,N,j}^{n-1}, \xi _{\varepsilon ,N,j}^{n-2}, \dots , \xi _{\varepsilon ,N,j}^1)\odot \nabla _{1,0,n-1}, \end{aligned}$$(A.61) -
\(\chi _{N,j}\in {\mathcal {N}}^{1,3,1}_2\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,{N}\}\) given by
$$\begin{aligned} \chi _{N,j}=\left( {(\begin{pmatrix}1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix}-\nicefrac {(j-1)}{N} \\ -\nicefrac {j}{N} \\ -\nicefrac {(j+1)}{N}\end{pmatrix}), (\begin{pmatrix}1&-2&1\end{pmatrix},0)}\right) \end{aligned}$$(A.62) -
\(\lambda _N\in {\mathcal {N}}^{1,N+1}_1\), \(N\in {\mathbb {N}}\) given by
$$\begin{aligned} \lambda _N=\left( {(\begin{pmatrix}1&\dots&1\end{pmatrix},0)}\right) , \end{aligned}$$(A.63) -
\(\psi _{f,\varepsilon ,N,j}\in {\mathfrak {N}}\), \(f\in B^n_1\), \(\varepsilon \in (0,\infty )\), \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,{N}\}\) given by
$$\begin{aligned} \psi _{f,\varepsilon ,N,j}=\Pi ^2_{\nicefrac {\varepsilon }{8}}\odot {\mathcal {P}}(\chi _{N,j},\tau _{f,\varepsilon ,N,j}), \end{aligned}$$(A.64) -
\(\varphi _{f,\varepsilon ,N}\in {\mathfrak {N}}\), \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\) given by
$$\begin{aligned} \varphi _{f,\varepsilon ,N}=\lambda _N\odot {\mathcal {P}}\left( {\psi _{f,\varepsilon ,N,1}, \psi _{f,\varepsilon ,N,2}, \dots , \psi _{f,\varepsilon ,N,N}}\right) \odot \nabla _{1,0,2N+2}. \end{aligned}$$(A.65)
With these networks, we note Lemma 5.3, Lemma 5.4, (C), (A.58) and (A.59) ensure that for every \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\), \(j\in \{0,1,\dots ,{N}\}\), \(k\in \{2,3,\dots ,n-1\}\)
and
Moreover, Lemma 5.3, Lemma 5.4, (A.58), (A.59), (A.60) and (A.61) demonstrate that for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\), \(j\in \{0,1,\dots ,{N}\}\), \(x\in [0,1]\) it holds
Combining this with (A.52), (A.61), (A.66) and (A.67) establishes that for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\), \(j\in \{0,1,\dots ,{N}\}\), \(x\in [0,1]\) it holds
Next, (A.62) ensures for every \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,{N}\}\), \(x\in [0,1]\)
Now (A.69) and Taylor’s Theorem imply for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,1)\), \(j\in \{0,1,\dots ,{N}\}\), \(x\in [0,1]\) that
Combining this with Lemma 5.3, Lemma 5.4, (A.51), (C), (A.69) and (A.70) establishes for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,1)\), \(j\in \{0,1,\dots ,{N}\}\), \(x\in [0,1]\) the bound
Furthermore, note that for every \(N\in {\mathbb {N}}\), \(j\in \{0,1,\dots ,{N}\}\), \(x\notin [\tfrac{j-1}{N},\tfrac{j+1}{N}]\) it holds that \(h_{N,j}(x)=\chi _{N,j}(x)=0\). Thus (D) ensures that for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,1)\), \(j\in \{0,1,\dots ,{N}\}\), \(x\in [0,1]\) it holds
This, Lemma 5.3, Lemma 5.4, (A.53), (A.65) and (A.72) imply that for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,1)\), \(x\in [0,1]\) it holds
Combining this with (A.57) establishes that for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,1)\), \(x\in [0,1]\) it holds
Let \(N_{\varepsilon }\in {\mathbb {N}}\) satisfy for every \(\varepsilon \in (0,\infty )\)
let \(\theta \in {\mathcal {N}}^{1,1}_1\) be given by \(\theta =(0,0)\) and let \((\Phi _{f,\varepsilon })_{f\in B^n_1,\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) be the neural networks given by
Observe that (A.75) implies that for every \(f\in B^n_1\), \(\varepsilon \in (0,1)\), \(x\in [0,1]\)
Moreover, note that for every \(f\in B^n_1\), \(\varepsilon \in [1,\infty )\), \(x\in [0,1]\) it holds
This and (A.78) establish that the neural networks \((\Phi _{f,\varepsilon })_{f\in B^n_1,\varepsilon \in (0,\infty )}\) satisfy (iii).
Next, Lemma 5.3, Lemma 5.4, (A), (A.58) and (A.59) imply for every \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\), \(j\in \{0,1,\dots ,{N}\}\), \(k\in \{1,2,\dots ,n-1\}\)
Combining this with Lemma 5.3, Lemma 5.4, (A.58), (A.60), (A.61) shows for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\), \(j\in \{0,1,\dots ,{N}\}\) the bound
This, Lemma 5.3, Lemma 5.4, (A), (A.62), (A.63), (A.65) and (A.58) ensure for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\) it holds
With the constant C from (A.82), define the term \(T_1\) by
Observe that (A.82) implies for every \(f\in B^n_1\), \(\varepsilon \in (0,1)\)
Hence we obtain
In addition, note that (A.84) ensures that
Furthermore,
This, (A.85) and (A.86) establish that the neural networks \((\Phi _{f,\varepsilon })_{\varepsilon \in (0,\infty )}\) satisfy (i). Next, Lemma 5.3, (B), (A.58) and (A.59) imply for every \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\), \(j\in \{0,1,\dots ,{N}\}\), \(k\in \{1,2,\dots ,n-1\}\)
Combining this with Lemma 5.3, Lemma 5.4, (A.58), (A.60) and (A.61) shows for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\), \(j\in \{0,1,\dots ,{N}\}\) it holds
Let the term \(T_2\) be given by
and let the term \(T_3\) be given by
This, Lemma 5.3, Lemma 5.4, (B), (A.58), (A.62), (A.63), (A.65) and the fact that for every \(\psi \in {\mathfrak {N}}\) with \(\min _{l\in \{1,2,\dots ,{{\mathcal {L}}(\psi )}\}}{\mathcal {M}}_l(\psi )>0\) it holds that \({\mathcal {L}}(\psi )\le {\mathcal {M}}(\psi )\) ensure that for every \(f\in B^n_1\), \(N\in {\mathbb {N}}\), \(\varepsilon \in (0,\infty )\) it holds
Combining this with Lemma A.1 demonstrates that for every \(f\in B^n_1\), \(\varepsilon \in (0,\exp (-2n^2)]\) it holds
Hence we obtain
Combining (A.93) with the fact that continuous functions are bounded on compact sets ensures
In addition note
This, (A.94) and (A.95) establish that the neural networks \((\Phi _{f,\varepsilon })_{f\in B^n_1,\varepsilon \in (0,\infty )}\) satisfy (ii). The proof of Theorem 6.5 is completed. \(\square \)
1.5 Proof of Corollary 6.6
Proof of Corollary 6.6
Throughout this proof, assume Setting 5.2, let \(c_{a,b}\in {\mathbb {R}}\), \([a,b]\subseteq {\mathbb {R}}_+\), be the real numbers given by \(c_{a,b}=\min \{1,(b-a)^{-n}\}\), let \(\lambda _{a,b}\in {\mathcal {N}}^{1,1}_1\), \([a,b]\subseteq {\mathbb {R}}_+\), be the neural networks given by \(\lambda _{a,b}=(\tfrac{1}{b-a},-\tfrac{a}{b-a})\), let \(\alpha _f\in {\mathcal {N}}^{1,1}_1\), \(f\in {\mathcal {C}}^n\), be the neural networks given by \(\alpha _f=(\tfrac{1}{c_{a,b}}\left\Vert f\right\Vert _{n,\infty },0)\), let \(L_{a,b}:[0,1]\rightarrow [a,b]\), \([a,b]\subseteq {\mathbb {R}}_+\), be the functions which satisfy for every \([a,b]\subseteq {\mathbb {R}}_+\), \(t\in [0,1]\)
and for every \(f\in {\mathcal {C}}^n\) let \(f_*\in C^n([0,1],{\mathbb {R}})\) be the function which satisfies for every \(t\in [0,1]\)
We claim that for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(m\in \{1,2,\dots ,{n}\}\), \(t\in [0,1]\) it holds
We now prove (A.100) by induction on \(m\in \{1,2,\dots ,{n}\}\). For the base case \(m=1\), the chain rule implies for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(t\in [0,1]\)
This establishes (A.100) in the base case \(m=1\).
For the induction step \(\{1,2,\dots ,n-1\}\ni m\rightarrow m+1\in \{2,3,\dots ,n\}\) observe that the chain rule ensures for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(m\in {\mathbb {N}}\), \(t\in [0,1]\)
Induction thus establishes (A.100).
In addition, for every \([a,b]\subseteq {\mathbb {R}}_+\), \(k\in \{0,1,\dots ,n\}\)
Combining this with (6.30), (A.98) and (A.100) ensures for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\)
Theorem 6.5 therefore establishes that there exist neural networks \((\Phi _{g,\eta })_{g\in B^n_1,\eta \in (0,\infty )}\subseteq {\mathfrak {N}}\) which satisfy
-
(a)
\(\displaystyle \sup _{g\in B^n_1,\eta \in (0,\infty )}\left[ {\frac{{\mathcal {L}}(\Phi _{g,\eta })}{\max \{r,\left| {\ln (\eta )}\right| \}}}\right] <\infty \),
-
(b)
\(\displaystyle \sup _{g\in B^n_1,\eta \in (0,\infty )}\left[ {\frac{{\mathcal {M}}(\Phi _{g,\eta })}{\eta ^{-\frac{1}{n}}\max \{r,|\ln (\eta )|\}}}\right] <\infty \) and
-
(c)
for every \(g\in B^n_1\), \(\eta \in (0,\infty )\) that
$$\begin{aligned} \sup _{t\in [0,1]}\left| {g(t)-\left[ {R_{\varrho }(\Phi _{g,\eta })}\right] \!(t)}\right| \le \eta . \end{aligned}$$(A.105)
Let \(\left( {\Phi _{f,\varepsilon }}\right) _{f\in {\mathcal {C}}^n,\varepsilon \in (0,\infty )}\subseteq {\mathfrak {N}}\) denote neural networks which satisfy for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(\varepsilon \in (0,\infty )\)
Observe that for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(t\in [0,1]\) it holds
Lemma 5.3 therefore demonstrates for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(\varepsilon \in (0,\infty )\), \(t\in [0,1]\) it holds
Moreover, note (A.99) ensures that for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(t\in [a,b]\) it holds
Combining (c), (A.106) and (A.108) implies for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(\varepsilon \in (0,\infty )\)
This establishes that the neural networks \(\left( {\Phi _{f,\varepsilon }}\right) _{f\in {\mathcal {C}}^n, \varepsilon \in (0,\infty )}\) satisfy (iii). Furthermore, Lemma 5.3 ensures that for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(\varepsilon \in (0,\infty )\) it holds
In addition, for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(\varepsilon \in (0,\infty )\) it holds
Combining this with (a) and (A.111) implies that
This establishes that the neural networks \(\left( {\Phi _{f,\varepsilon }}\right) _{f\in {\mathcal {C}}^n, \varepsilon \in (0,\infty )}\) satisfy (i). Next, Lemma 5.3 implies that for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(\varepsilon \in (0,\infty )\)
In addition, note that (A.112) shows for every \([a,b]\subseteq {\mathbb {R}}_+\), \(f\in C^n([a,b],{\mathbb {R}})\), \(\varepsilon \in (0,\infty )\)
Combining this with (b) and (A.106) therefore ensures
This establishes that the neural networks \(\left( {\Phi _{f,\varepsilon }}\right) _{f\in {\mathcal {C}}^n, \varepsilon \in (0,\infty )}\) satisfy (ii) and completes the proof. \(\square \)
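The bookkeeping behind this rescaling argument can be illustrated directly: map the argument affinely to \([0,1]\), normalize so that the rescaled function lies in \(B^n_1\), approximate there, and undo the scaling with one final affine map. The sketch below is ours; the precise normalization is reconstructed from the definitions of \(c_{a,b}\) and \(\left\Vert \cdot \right\Vert _{n,\infty }\) above, the unit-interval approximation is replaced by an exact stand-in, and the check only confirms that the forward and backward scalings compose to the identity.

```python
import math

def rescale_to_unit(f, a, b, norm_n_inf, n):
    """Return f_*(t) = c_{a,b} * f(a + t*(b-a)) / ||f||_{n,inf} and the scale factor,
    where c_{a,b} = min(1, (b-a)^(-n)); then f_* and its first n derivatives are
    bounded by 1 in supremum norm on [0, 1]."""
    c_ab = min(1.0, (b - a) ** (-n))
    scale = c_ab / norm_n_inf
    return (lambda t: scale * f(a + t * (b - a))), scale

def approximate_on_interval(f, a, b, norm_n_inf, n, approximate_unit):
    """Mimic Phi_{f,eps} = alpha_f ∘ Phi_{f_*,...} ∘ lambda_{a,b}: rescale the input,
    approximate the normalized function on [0, 1], then undo the output scaling."""
    f_star, scale = rescale_to_unit(f, a, b, norm_n_inf, n)
    f_star_approx = approximate_unit(f_star)   # stand-in for the unit-interval network
    return lambda x: f_star_approx((x - a) / (b - a)) / scale

if __name__ == "__main__":
    a, b, n = 1.0, 3.0, 2
    f = math.log
    norm = math.log(3.0)        # max_{k<=2} sup_{[1,3]} |f^(k)| = max(log 3, 1, 1)
    exact_stand_in = lambda g: g
    f_rec = approximate_on_interval(f, a, b, norm, n, exact_stand_in)
    for x in (1.0, 1.7, 2.4, 3.0):
        print(f"x = {x:.1f}   f(x) = {f(x):.6f}   reconstructed = {f_rec(x):.6f}")
```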
Keywords
- Neural network approximation
- Low-rank approximation
- Option pricing
- High-dimensional PDEs
Mathematics Subject Classification
- 41Axx
- 35Kxx
- 65-XX
- 65D30