The main result of this section, Lemma 3.10 below, shows that multilevel Picard approximations can be well represented by DNNs. The central tools for the proof of Lemma 3.10 are Lemmas 3.8 and 3.9, which show that DNNs are stable under composition and summation. We formulate Lemmas 3.8 and 3.9 in terms of the operators defined in (33) and (34) below, whose properties are studied in Lemmas 3.3, 3.4, and 3.5.
A mathematical framework for deep neural networks
Setting 3.1
(Artificial neural networks) Let \(\left\| \cdot \right\| , |||\cdot ||| :(\cup _{d\in {\mathbb {N}}} {\mathbb {R}}^d) \rightarrow [0,\infty )\) and \(\dim :(\cup _{d\in {\mathbb {N}}}{\mathbb {R}}^d) \rightarrow {\mathbb {N}}\) satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that \(\Vert x\Vert =\sqrt{\sum _{i=1}^d(x_i)^2}\), \(|||x|||=\max _{i\in [1,d]\cap {\mathbb {N}}}|x_i|\), and \(\dim \left( x\right) =d\), let \({\mathbf {A}}_{d}:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\), \(d\in {\mathbb {N}}\), satisfy for all \(d\in {\mathbb {N}}\), \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) that
$${\mathbf {A}}_{d}(x)= \left( \max \{x_1,0\},\ldots ,\max \{x_d,0\}\right) ,$$
(29)
let \({\mathbf {D}}=\cup _{H\in {\mathbb {N}}} {\mathbb {N}}^{H+2}\), let
$$\begin{aligned}{\mathbf {N}} = \bigcup _{H\in {\mathbb {N}}}\bigcup _{(k_0,k_1,\ldots ,k_{H+1})\in {\mathbb {N}}^{H+2}} \left[ \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_{n}\times k_{n-1}} \times {\mathbb {R}}^{k_{n}}\right) \right] , \end{aligned}$$
(30)
let \({\mathcal {D}}:{\mathbf {N}}\rightarrow \mathbf {D}\) and \({\mathcal {R}}:{\mathbf {N}}\rightarrow (\cup _{k,l\in {\mathbb {N}}} C({\mathbb {R}}^k,{\mathbb {R}}^l))\) satisfy for all \(H\in {\mathbb {N}}\), \(k_0,k_1,\ldots ,k_H,k_{H+1}\in {\mathbb {N}}\), \(\Phi = ((W_1,B_1),\ldots ,(W_{H+1},B_{H+1}))\in \prod _{n=1}^{H+1} \left( {\mathbb {R}}^{k_n\times k_{n-1}} \times {\mathbb {R}}^{k_n}\right)\), \(x_0 \in {\mathbb {R}}^{k_0},\ldots ,x_{H}\in {\mathbb {R}}^{k_{H}}\) with \(\forall \, n\in {\mathbb {N}}\cap [1,H]:x_n = \mathbf {A}_{k_n}(W_n x_{n-1}+B_n )\) that
$${\mathcal {D}}(\Phi )= (k_0,k_1,\ldots ,k_{H}, k_{H+1}),\qquad {\mathcal {R}}(\Phi )\in C({\mathbb {R}}^{k_0},{\mathbb {R}}^ {k_{H+1}}),$$
(31)
$$\text {and}\qquad ({\mathcal {R}}(\Phi )) (x_0) = W_{H+1}x_{H}+B_{H+1},$$
(32)
let \(\odot :{\mathbf {D}}\times {\mathbf {D}}\rightarrow {\mathbf {D}}\) satisfy for all \(H_1,H_2\in {\mathbb {N}}\), \(\alpha =(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1},\alpha _{H_1+1})\in {\mathbb {N}}^{H_1+2}\), \(\beta =(\beta _0,\beta _1,\ldots ,\beta _{H_2},\beta _{H_2+1})\in {\mathbb {N}}^{H_2+2}\) that
$$\alpha \odot \beta = (\beta _{0},\beta _{1},\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\in {\mathbb {N}}^{H_1+H_2+3},$$
(33)
let \({{\, \mathrm{\boxplus }\,}}:{\mathbf {D}}\times {\mathbf {D}}\rightarrow {\mathbf {D}}\) satisfy for all \(H\in {\mathbb {N}}\), \(\alpha = (\alpha _0,\alpha _1,\ldots ,\alpha _{H},\alpha _{H+1})\in {\mathbb {N}}^{H+2}\), \(\beta = (\beta _0,\beta _1,\beta _2,\ldots ,\beta _{H},\beta _{H+1})\in {\mathbb {N}}^{H+2}\) that
$$\alpha {{\,\mathrm{\boxplus }\,}}\beta =(\alpha _0,\alpha _1+\beta _1,\ldots ,\alpha _{H}+\beta _{H},\beta _{H+1})\in {\mathbb {N}}^{H+2},$$
(34)
and let \({\mathfrak {n}}_{n}\in {\mathbf {D}}\), \(n\in [3,\infty )\cap {\mathbb {N}}\), satisfy for all \(n\in [3,\infty )\cap {\mathbb {N}}\) that
$$\begin{aligned} {\mathfrak {n}}_{n}= (1,\underbrace{2,\ldots ,2}_{(n-2)\text {-times}},1)\in {\mathbb {N}}^{n}. \end{aligned}$$
(35)
Remark 3.2
The set \({\mathbf {N}}\) can be viewed as the set of all artificial neural networks. For each network \(\Phi \in {\mathbf {N}}\), the function \({\mathcal {R}}(\Phi )\) is the function represented by \(\Phi\), and the vector \({\mathcal {D}}(\Phi )\) describes the layer dimensions of \(\Phi\).
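To make the objects of Setting 3.1 concrete, the following minimal Python sketch (not part of the formal development; the names `realize` and `dims` are ours) represents a network \(\Phi \in {\mathbf {N}}\) as a list of weight–bias pairs and computes \({\mathcal {R}}(\Phi )\) and \({\mathcal {D}}(\Phi )\):

```python
# A network Phi in N is a list of (W, B) layer pairs; W is a list of rows.
def relu(v):                     # the activation A_d from (29)
    return [max(x, 0.0) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def realize(phi, x):             # the realization R(Phi) from (31)-(32)
    for W, B in phi[:-1]:        # hidden layers apply the ReLU activation
        x = relu(vadd(matvec(W, x), B))
    W, B = phi[-1]               # the output layer is affine
    return vadd(matvec(W, x), B)

def dims(phi):                   # the layer-dimension vector D(Phi)
    return [len(phi[0][0][0])] + [len(B) for _, B in phi]

# Example: a network with D(Phi) = (2, 2, 1) realizing x -> relu(x1) + relu(x2)
phi = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
```

The same list-of-pairs representation is reused in the sketches accompanying the lemmas below.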
Properties of operations associated to deep neural networks
Lemma 3.3
(\(\odot\) is associative) Assume Setting 3.1 and let \(\alpha ,\beta ,\gamma \in {\mathbf {D}}\). Then it holds that \((\alpha \odot \beta )\odot \gamma = \alpha \odot (\beta \odot \gamma )\).
Proof of Lemma 3.3
Throughout this proof let \(H_1,H_2,H_3\in {\mathbb {N}}\), \((\alpha _i)_{i\in [0,H_1+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_1+2}\), \((\beta _i)_{i\in [0,H_2+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_2+2}\), \((\gamma _i)_{i\in [0,H_3+1]\cap {\mathbb {N}}_0}\in {\mathbb {N}}^{H_3+2}\) satisfy that
$$\begin{aligned} \alpha&=(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1}),\quad \beta =(\beta _0,\beta _1,\ldots ,\beta _{H_2+1}),\quad \text {and}\\ \gamma&=(\gamma _0,\gamma _1,\ldots ,\gamma _{H_3+1}). \end{aligned}$$
(36)
The definition of \(\odot\) in (33) then shows that
$$\begin{aligned} (\alpha \odot \beta )\odot \gamma& = (\beta _{0},\beta _{1},\beta _2,\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\odot (\gamma _0,\gamma _1,\ldots ,\gamma _{H_3+1})\\&= (\gamma _0,\ldots ,\gamma _{H_3},\gamma _{H_3+1}+\beta _{0},\beta _{1},\ldots ,\beta _{H_2},\beta _{H_2+1}+\alpha _{0},\alpha _{1},\alpha _{2},\ldots ,\alpha _{H_1+1})\\&= (\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1})\odot (\gamma _{0},\gamma _{1},\ldots ,\gamma _{H_3},\gamma _{H_3+1}+\beta _{0},\beta _{1},\beta _{2},\ldots ,\beta _{H_2+1}) \\&=\alpha \odot (\beta \odot \gamma ). \end{aligned}$$
(37)
The proof of Lemma 3.3 is thus completed. \(\square\)
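Since \(\odot\) acts on plain integer tuples, the associativity asserted in Lemma 3.3 is easy to check mechanically; a small sketch (our naming, with dimension vectors as Python tuples):

```python
def odot(alpha, beta):
    """The operation from (33): place beta's layers in front of alpha's,
    merging beta's output dimension with alpha's input dimension."""
    return beta[:-1] + (beta[-1] + alpha[0],) + alpha[1:]

# Three dimension vectors with compatible lengths (H1 = 1, H2 = 2, H3 = 1)
alpha, beta, gamma = (1, 2, 1), (1, 3, 4, 1), (2, 5, 2)
left  = odot(odot(alpha, beta), gamma)
right = odot(alpha, odot(beta, gamma))
```

Note that the result has length \(H_1+H_2+3\), i.e. `len(alpha) + len(beta) - 1`, as in (33).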
Lemma 3.4
(Well-definedness and associativity of \({{\,\mathrm{\boxplus }\,}}\)) Assume Setting 3.1, let \(H,k,l \in {\mathbb {N}}\), and let \(\alpha ,\beta ,\gamma \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\). Then
- (i)
it holds that \(\alpha {{\,\mathrm{\boxplus }\,}}\beta \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\),
- (ii)
it holds that \(\beta {{\,\mathrm{\boxplus }\,}}\gamma \in \left( \{k\}\times {\mathbb {N}}^{H} \times \{l\}\right)\), and
- (iii)
it holds that \((\alpha {{\,\mathrm{\boxplus }\,}}\beta ){{\,\mathrm{\boxplus }\,}}\gamma = \alpha {{\,\mathrm{\boxplus }\,}}(\beta {{\,\mathrm{\boxplus }\,}}\gamma )\).
Proof of Lemma 3.4
Throughout this proof let \(\alpha _i,\beta _i,\gamma _i\in {\mathbb {N}}\), \(i\in [1,H]\cap {\mathbb {N}}\), satisfy that \(\alpha = (k,\alpha _1,\alpha _2,\ldots ,\alpha _{H},l)\), \(\beta = (k,\beta _1,\beta _2,\ldots ,\beta _{H},l)\), and \(\gamma = (k,\gamma _1,\gamma _2,\ldots ,\gamma _{H},l).\) The definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)) then shows that
$$\begin{aligned}\alpha {{\,\mathrm{\boxplus }\,}}\beta&=(k,\alpha _1+\beta _1, \alpha _2+\beta _2, \ldots ,\alpha _{H}+\beta _{H},l)\in \{k\}\times {\mathbb {N}}^{H}\times \{l\}, \\ \beta {{\,\mathrm{\boxplus }\,}}\gamma&=(k,\beta _1+\gamma _1, \beta _2+\gamma _2, \ldots ,\beta _{H}+\gamma _{H},l)\in \{k\}\times {\mathbb {N}}^{H}\times \{l\}, \end{aligned}$$
(38)
and
$$\begin{aligned} (\alpha {{\,\mathrm{\boxplus }\,}}\beta ){{\,\mathrm{\boxplus }\,}}\gamma&=(k,(\alpha _1+\beta _1)+\gamma _1, (\alpha _2+\beta _2)+\gamma _2, \ldots ,(\alpha _{H}+\beta _{H})+\gamma _{H},l) \\&=(k,\alpha _1+(\beta _1+\gamma _1), \alpha _2+(\beta _2+\gamma _2), \ldots ,\alpha _{H}+(\beta _{H}+\gamma _{H}),l) = \alpha {{\,\mathrm{\boxplus }\,}}(\beta {{\,\mathrm{\boxplus }\,}}\gamma ).\end{aligned}$$
(39)
The proof of Lemma 3.4 is thus completed. \(\square\)
Lemma 3.5
(Triangle inequality) Assume Setting 3.1, let \(H,k,l \in {\mathbb {N}}\), and let \(\alpha ,\beta \in \{k\}\times {\mathbb {N}}^{H} \times \{l\}\). Then it holds that \(|||\alpha {{\,\mathrm{\boxplus }\,}}\beta |||\le |||\alpha |||+ |||\beta |||\).
Proof of Lemma 3.5
Throughout this proof let \(\alpha _i,\beta _i\in {\mathbb {N}}\), \(i\in [1,H]\cap {\mathbb {N}}\), satisfy that \(\alpha = (k,\alpha _1,\alpha _2,\ldots ,\alpha _{H},l)\) and \(\beta = (k,\beta _1,\beta _2,\ldots ,\beta _{H},l).\) The definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)) then shows that \(\alpha {{\,\mathrm{\boxplus }\,}}\beta =(k,\alpha _1+\beta _1, \alpha _2+\beta _2, \ldots ,\alpha _{H}+\beta _{H},l).\) This together with the triangle inequality implies that
$$\begin{aligned} |||\alpha {{\,\mathrm{\boxplus }\,}}\beta |||&=\sup \left\{ |k|,\left| \alpha _1+\beta _1\right| , \left| \alpha _2+\beta _2\right| , \ldots ,\left| \alpha _{H}+\beta _{H}\right| ,\left| l\right| \right\} \\&\le \sup \left\{ |k|,\left| \alpha _1\right| , \left| \alpha _2\right| , \ldots ,\left| \alpha _{H}\right| ,\left| l\right| \right\} +\sup \left\{ |k|,\left| \beta _1\right| , \left| \beta _2\right| , \ldots ,\left| \beta _{H}\right| ,\left| l\right| \right\} \\&= |||\alpha |||+|||\beta |||.\end{aligned}$$
(40)
This completes the proof of Lemma 3.5. \(\square\)
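The closure and associativity properties of Lemma 3.4 and the norm estimate of Lemma 3.5 can likewise be verified mechanically; a sketch under the same tuple representation (function names are ours):

```python
def boxplus(alpha, beta):
    """The operation from (34): add the hidden-layer widths of two
    dimension vectors of equal depth, keeping input/output dimensions."""
    assert len(alpha) == len(beta) and alpha[0] == beta[0] and alpha[-1] == beta[-1]
    return (alpha[0],) + tuple(a + b for a, b in zip(alpha[1:-1], beta[1:-1])) + (beta[-1],)

def sup_norm(v):                 # the norm |||.||| from Setting 3.1
    return max(abs(x) for x in v)

# Three dimension vectors in {3} x N^2 x {2}
a, b, c = (3, 4, 5, 2), (3, 1, 1, 2), (3, 2, 7, 2)
```

The assertions below mirror items (i)–(iii) of Lemma 3.4 and the triangle inequality of Lemma 3.5.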
The following result, Lemma 3.6, can, in a slightly modified variant, be found, e.g., in [20, Lemma 5.4].
Lemma 3.6
(Existence of DNNs with \(H\in {\mathbb {N}}\) hidden layers for the identity in \({\mathbb {R}}\)) Assume Setting 3.1 and let \(H\in {\mathbb {N}}\). Then it holds that \(\mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{H+2} \})\).
Proof of Lemma 3.6
Throughout this proof let \(W_1\in {\mathbb {R}}^{2\times 1}\), \(W_i\in {\mathbb {R}}^{2\times 2}\), \(\,i\in [2,H]\cap {\mathbb {N}}\), \(W_{H+1}\in {\mathbb {R}}^{1\times 2}\), \(B_i\in {\mathbb {R}}^2\), \(i\in [1,H]\cap {\mathbb {N}}\), \(B_{H+1}\in {\mathbb {R}}^1\) satisfy that
$$\begin{aligned} &W_1= \begin{pmatrix} 1\\ -1 \end{pmatrix},\quad \forall i\in [2,H]\cap {\mathbb {N}}:W_i=\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} , \quad W_{H+1}= \begin{pmatrix} 1&-1 \end{pmatrix},\\&\forall i\in [1,H]\cap {\mathbb {N}}:B_i= \begin{pmatrix} 0\\ 0 \end{pmatrix},\quad B_{H+1}=0,\end{aligned}$$
(41)
let \(\phi \in {\mathbf {N}}\) satisfy that \(\phi =((W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1}))\), for every \(a\in {\mathbb {R}}\) let \(a^+\in [0,\infty )\) be the non-negative part of \(a\), i.e., \(a^+=\max \{a,0\}\), and let \(x_0\in {\mathbb {R}}\), \(x_1,x_2,\ldots ,x_{H}\in {\mathbb {R}}^2\) satisfy for all \(n\in {\mathbb {N}}\cap [1,H]\) that
$$x_n = \mathbf {A}_{2}(W_n x_{n-1}+B_n ).$$
(42)
Note that (41) and the definition of \({\mathcal {D}}\) (see (31)) imply that \({\mathcal {D}}(\phi )={\mathfrak {n}}_{H+2}\). Furthermore, (41), (42), and an induction argument show that
$$\begin{aligned} x_1&= \mathbf {A}_{2}(W_1x_0+B_1)= \mathbf {A}_{2}\left( \begin{pmatrix} x_0\\ -x_0 \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix},\\ x_2&= \mathbf {A}_{2}(W_2x_1+B_2)= \mathbf {A}_{2}(x_1)=\mathbf {A}_{2}\left( \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix} ,\\&\quad \vdots \\ x_{H}&= \mathbf {A}_{2}(W_{H}x_{H-1}+B_{H})= \mathbf {A}_{2}(x_{H-1})=\mathbf {A}_{2}\left( \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}\right) = \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}. \end{aligned}$$
(43)
The definition of \({\mathcal {R}}\) (see (32)) hence ensures that
$$\begin{aligned} ({\mathcal {R}}(\phi ))(x_0)&=W_{H+1}x_{H}+B_{H+1}= \begin{pmatrix} 1&-1 \end{pmatrix} \begin{pmatrix} x_0^+\\ (-x_0)^{+} \end{pmatrix}=x_0^{+}-(-x_0)^{+}=x_0.\end{aligned}$$
(44)
The fact that \(x_0\) was arbitrary therefore proves that \({\mathcal {R}}(\phi ) =\mathrm {Id}_{{\mathbb {R}}}\). This and the fact that \({\mathcal {D}}(\phi )={\mathfrak {n}}_{H+2}\) demonstrate that \({\mathrm {Id}}_{{\mathbb {R}}}\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{H+2} \})\). The proof of Lemma 3.6 is thus completed. \(\square\)
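The construction in the proof of Lemma 3.6 is explicit enough to transcribe directly; a sketch (ours) building the network \(\phi\) from (41) with \(\mathcal {D}(\phi )={\mathfrak {n}}_{H+2}\) and checking \(\mathcal {R}(\phi )=\mathrm {Id}_{{\mathbb {R}}}\) at a few points:

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def realize(phi, x):             # R(Phi) as in (31)-(32)
    for W, B in phi[:-1]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)])
    W, B = phi[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def identity_net(H):
    """The network from (41): x is split into (x^+, (-x)^+), the 2x2
    identity matrix is applied H-1 times, and the parts are recombined."""
    layers = [([[1.0], [-1.0]], [0.0, 0.0])]
    layers += [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]) for _ in range(H - 1)]
    layers.append(([[1.0, -1.0]], [0.0]))
    return layers

phi = identity_net(4)            # D(phi) = (1, 2, 2, 2, 2, 1) = n_6
```

The recombination in the last layer uses \(x = x^+ - (-x)^+\), exactly as in (44).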
Lemma 3.7
(DNNs for affine transformations) Assume Setting 3.1 and let \(d,m\in {\mathbb {N}}\), \(\lambda \in {\mathbb {R}}\), \(b\in {\mathbb {R}}^d\), \(a\in {\mathbb {R}}^m\), \(\Psi \in {\mathbf {N}}\) satisfy that \({\mathcal {R}}(\Psi )\in C({\mathbb {R}}^d,{\mathbb {R}}^m)\). Then it holds that
$$\lambda \left( \left( \mathcal {R}(\Psi )\right) (\cdot +b)+a\right) \in \mathcal {R}\left( \{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Psi )\}\right) .$$
(45)
Proof of Lemma 3.7
Throughout this proof let \(H,k_0,k_1,\ldots ,k_{H+1}\in {\mathbb {N}}\) satisfy that
$$H+2=\dim \left( \mathcal {D}(\Psi )\right) \quad \text {and}\quad (k_0,k_1,\ldots ,k_{H},k_{H+1}) = \mathcal {D}(\Psi ),$$
(46)
let \(((W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})) \in \prod _{n=1}^{H+1}\left( {\mathbb {R}}^{k_n\times k_{n-1}}\times {\mathbb {R}}^{k_n}\right)\) satisfy that
$$\left( (W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})\right) =\Psi ,$$
(47)
let \(\phi \in {\mathbf {N}}\) satisfy that
$$\phi =\left( (W_1,B_1+W_1b),(W_2,B_2),\ldots ,(W_H,B_H),(\lambda W_{H+1},\lambda B_{H+1}+\lambda a)\right) ,$$
(48)
and let \(x_0,y_0 \in {\mathbb {R}}^{k_0},x_1,y_1 \in {\mathbb {R}}^{k_1},\ldots ,x_{H},y_H\in {\mathbb {R}}^{k_{H}}\) satisfy for all \(n\in {\mathbb {N}}\cap [1,H]\) that
$$x_n = {\mathbf {A}}_{k_n}(W_n x_{n-1}+B_n ),\, y_n = \mathbf {A}_{k_n}(W_n y_{n-1}+B_n+\mathbb {1}_{\{1\}}(n)W_1b ), \quad \text {and} \quad x_0=y_0+b.$$
(49)
Then it holds that
$$y_1= {\mathbf {A}}_{k_1}(W_1 y_{0}+B_1+W_1b )= {\mathbf {A}}_{k_1}(W_1( y_{0}+b)+B_1 ) = {\mathbf {A}}_{k_1}(W_1x_0+B_1 )=x_1.$$
(50)
This and an induction argument prove for all \(i\in [2,H]\cap {\mathbb {N}}\) that
$$\begin{aligned} y_i=\mathbf {A}_{k_i}(W_i y_{i-1}+B_i )= \mathbf {A}_{k_i}(W_i x_{i-1}+B_i )=x_i. \end{aligned}$$
(51)
The definition of \({\mathcal {R}}\) (see (32)) hence shows that
$$\begin{aligned} ({\mathcal {R}}(\phi ))(y_0)&= \lambda W_{H+1}y_H+\lambda B_{H+1}+\lambda a=\lambda W_{H+1}x_H+\lambda B_{H+1}+\lambda a\\ {}&=\lambda (W_{H+1}x_H+ B_{H+1}+ a) =\lambda ( (\mathcal {R}(\Psi ))(x_0)+a)= \lambda ((\mathcal {R}(\Psi ))(y_0+b)+a). \end{aligned}$$
(52)
This and the fact that \(y_0\) was arbitrary prove that \({\mathcal {R}}(\phi )=\lambda ((\mathcal {R}(\Psi ))(\cdot +b)+a)\). This and the fact that \({\mathcal {D}}(\phi )=\mathcal {D}(\Psi )\) imply that \(\lambda \left( (\mathcal {R}(\Psi ))(\cdot +b)+a\right) \in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Psi )\})\). The proof of Lemma 3.7 is thus completed. \(\square\)
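The proof of Lemma 3.7 only modifies the first and last layers of \(\Psi\); a sketch (ours, reusing the list-of-pairs representation) of the network \(\phi\) from (48):

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def realize(phi, x):
    for W, B in phi[:-1]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)])
    W, B = phi[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def affine_transform(psi, lam, b, a):
    """The network phi from (48) with R(phi) = lam * (R(psi)(. + b) + a):
    the shift b is absorbed into the first bias, lam and a into the last layer."""
    (W1, B1), (WL, BL) = psi[0], psi[-1]
    new_B1 = [b1 + sum(w * bi for w, bi in zip(row, b)) for row, b1 in zip(W1, B1)]
    new_WL = [[lam * w for w in row] for row in WL]
    new_BL = [lam * (bl + ai) for bl, ai in zip(BL, a)]
    return [(W1, new_B1)] + psi[1:-1] + [(new_WL, new_BL)]

# psi realizes x -> relu(2x + 1); shift by b = (0.5), add a = (1), scale by 3
psi = [([[2.0]], [1.0]), ([[1.0]], [0.0])]
phi = affine_transform(psi, 3.0, [0.5], [1.0])
```

Since only entries of existing layers change, the dimension vector is preserved, matching (45).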
Lemma 3.8
(Composition) Assume Setting 3.1 and let \(d_1,d_2,d_3\in {\mathbb {N}}\), \(f\in C({\mathbb {R}}^{d_2},{\mathbb {R}}^{d_3})\), \(g\in C( {\mathbb {R}}^{d_1}, {\mathbb {R}}^{d_2})\), \(\alpha ,\beta \in \mathbf {D}\) satisfy that \(f\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \})\) and \(g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\beta \})\). Then it holds that \((f\circ g)\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \odot \beta \})\).
Proof of Lemma 3.8
Throughout this proof let \(H_1,H_2,\alpha _0,\ldots , \alpha _{H_1+1},\beta _0,\ldots , \beta _{H_2+1}\in {\mathbb {N}}\), \(\Phi _{f}, \Phi _{g}\in \mathbf {N}\) satisfy that
$$\begin{aligned}&(\alpha _0,\alpha _1,\ldots ,\alpha _{H_1+1})=\alpha , \quad (\beta _0,\beta _1,\ldots ,\beta _{H_2+1})=\beta , \quad \mathcal {R}(\Phi _{f})=f , \\&\quad \mathcal {D}(\Phi _{f})=\alpha , \quad \mathcal {R}(\Phi _{g})=g,\quad \text {and}\quad \mathcal {D}(\Phi _{g})=\beta . \end{aligned}$$
(53)
Lemma 5.4 in [20] shows that there exists \(\mathbb {I}\in \mathbf {N}\) such that \(\mathcal {D}(\mathbb {I})=d_2{\mathfrak {n}}_{3}= (d_2,2d_2,d_2)\) and \(\mathcal {R}(\mathbb {I}) =\mathrm {Id}_{{\mathbb {R}}^{d_2}}\). Note that, since \(\beta _{H_2+1}=d_2=\alpha _0\), it holds that \(2d_2=\beta _{H_2+1}+\alpha _0\). This and [20, Proposition 5.2] (with \(\phi _1= \Phi _{f}\), \(\phi _2= \Phi _{g}\), and \(\mathbb {I}=\mathbb {I}\) in the notation of [20, Proposition 5.2]) show that there exists \(\Phi _{f\circ g}\in \mathbf {N}\) such that \(\mathcal {R}( \Phi _{f\circ g})=f\circ g\) and \(\mathcal {D}(\Phi _{f\circ g})= \mathcal {D}(\Phi _{f})\odot \mathcal {D}(\Phi _{g})=\alpha \odot \beta\). Hence, it holds that \(f\circ g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\alpha \odot \beta \})\). The proof of Lemma 3.8 is thus completed. \(\square\)
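The gluing from [20, Proposition 5.2] used in the proof of Lemma 3.8 can be made concrete: the output layer of the network for \(g\) is merged with the first layer of the identity network \(\mathbb {I}\), and the second layer of \(\mathbb {I}\) with the first layer of the network for \(f\). A hedged sketch (our naming; the resulting dimension vector is \(\alpha \odot \beta\)):

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def realize(phi, x):
    for W, B in phi[:-1]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)])
    W, B = phi[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def compose(phi_f, phi_g):
    """Network for f o g: the linking hidden layer stores (z^+, (-z)^+)
    for z = output of g, from which f's first layer recovers z = z^+ - (-z)^+."""
    Wg, Bg = phi_g[-1]
    Wf, Bf = phi_f[0]
    link_W = [row[:] for row in Wg] + [[-w for w in row] for row in Wg]
    link_B = Bg + [-b for b in Bg]
    merge_W = [row + [-w for w in row] for row in Wf]   # W_f applied to [I, -I]
    return phi_g[:-1] + [(link_W, link_B)] + [(merge_W, Bf)] + phi_f[1:]

phi_g = [([[1.0, 1.0]], [0.0]), ([[1.0]], [0.0])]   # g(x) = relu(x1 + x2)
phi_f = [([[2.0]], [1.0]), ([[1.0]], [0.0])]        # f(y) = relu(2y + 1)
phi = compose(phi_f, phi_g)
```

Here \(\alpha =(1,1,1)\), \(\beta =(2,1,1)\), and the composed dimensions are \(\alpha \odot \beta =(2,1,2,1,1)\).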
The following result, Lemma 3.9, can, roughly speaking, be found in a specialized form, e.g., in [20, Lemma 5.1].
Lemma 3.9
(Sum of DNNs of the same length) Assume Setting 3.1 and let \(M,H,p,q\in {\mathbb {N}}\), \(h_1,h_2,\ldots ,h_M\in {\mathbb {R}}\), \(k_i\in \mathbf {D}\), \(f_i\in C({\mathbb {R}}^{p},{\mathbb {R}}^{q})\), \(i\in [1,M]\cap {\mathbb {N}}\), satisfy for all \(i\in [1,M]\cap {\mathbb {N}}\) that
$$\begin{aligned} \dim \left( k_i\right) =H+2\quad \text {and}\quad f_i\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=k_i\right\} \right) . \end{aligned}$$
(54)
Then it holds that
$$\begin{aligned} \sum _{i=1}^{M}h_if_i \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i\right\} \right) . \end{aligned}$$
(55)
Proof of Lemma 3.9
Throughout this proof let \(\phi _i\in \mathbf {N}\), \(i\in [1,M]\cap {\mathbb {N}}\), and \(k_{i,n}\in {\mathbb {N}}\), \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [0,H+1]\cap {\mathbb {N}}_0\), satisfy for all \(i \in [1,M]\cap {\mathbb {N}}\) that
$$\begin{aligned} \mathcal {D}(\phi _i)=k_i= (k_{i,0},k_{i,1},k_{i,2},\ldots ,k_{i,H},k_{i,H+1}) \quad \text {and}\quad \mathcal {R}(\phi _i)=f_i, \end{aligned}$$
(56)
for every \(i\in [1,M]\cap {\mathbb {N}}\) let \(((W_{i,1}, B_{i,1}),\ldots , (W_{i,H+1}, B_{i,H+1}))\in \prod _{n=1}^{H+1}\left( {\mathbb {R}}^{k_{i,n}\times k_{i,n-1}} \times {\mathbb {R}}^{k_{i,n}}\right)\) satisfy that
$$\begin{aligned} \phi _i= \left( (W_{i,1}, B_{i,1}),\ldots , (W_{i,H+1},B_{i,H+1})\right) , \end{aligned}$$
(57)
let \(k_n^{{{\,\mathrm{\boxplus }\,}}}\in {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\), \(k^{{{\,\mathrm{\boxplus }\,}}}\in {\mathbb {N}}^ {H+2}\) satisfy for all \(n\in [1,H]\cap {\mathbb {N}}\) that
$$\begin{aligned} \begin{aligned} k_n^{{{\,\mathrm{\boxplus }\,}}}=\sum _{i=1}^{M}k_{i,n}\quad \text {and}\quad k^{{{\,\mathrm{\boxplus }\,}}}=(p,k^{{{\,\mathrm{\boxplus }\,}}}_1,k^{{{\,\mathrm{\boxplus }\,}}}_2,\ldots , k^{{{\,\mathrm{\boxplus }\,}}}_{H},q), \end{aligned} \end{aligned}$$
(58)
let \(W_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}\times p}\), \(B_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}}\) satisfy that
$$\begin{aligned} W_1= \begin{pmatrix} W_{1,1}\\ W_{2,1}\\ \vdots \\ W_{M,1} \end{pmatrix} \quad \text {and}\quad B_1= \begin{pmatrix} B_{1,1}\\ B_{2,1}\\ \vdots \\ B_{M,1} \end{pmatrix}, \end{aligned}$$
(59)
let \(W_n\in {\mathbb {R}}^{k_n^{{{\,\mathrm{\boxplus }\,}}}\times k_{n-1}^{{{\,\mathrm{\boxplus }\,}}}}\), \(B_n\in {\mathbb {R}}^{k^{{{\,\mathrm{\boxplus }\,}}}_{n}}\), \(n\in [2,H]\cap {\mathbb {N}}\), satisfy for all \(n\in [2,H]\cap {\mathbb {N}}\) that
$$\begin{aligned} \begin{aligned} W_n= \begin{pmatrix} W_{1,n} & 0 & 0 & 0 \\ 0 & W_{2,n} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & W_{M,n} \end{pmatrix} \quad \text {and}\quad B_n= \begin{pmatrix} B_{1,n}\\ B_{2,n}\\ \vdots \\ B_{M,n} \end{pmatrix},\end{aligned} \end{aligned}$$
(60)
let \(W_{H+1}\in {\mathbb {R}}^{q\times k_{H}^{{{\,\mathrm{\boxplus }\,}}}}\), \(B_{H+1}\in {\mathbb {R}}^{q}\) satisfy that
$$\begin{aligned} \begin{aligned} W_{H+1}= \begin{pmatrix} h_1W_{1,H+1}&\ldots&h_MW_{M,H+1} \end{pmatrix}\quad \text {and}\quad B_{H+1} = \sum _{i=1}^{M}h_iB_{i,H+1}, \end{aligned} \end{aligned}$$
(61)
let \(x_0\in {\mathbb {R}}^p\), \(x_1\in {\mathbb {R}}^{k_1^{{{\,\mathrm{\boxplus }\,}}}}, x_2\in {\mathbb {R}}^{k_2^{{{\,\mathrm{\boxplus }\,}}}},\ldots ,x_H\in {\mathbb {R}}^{k_H^{{{\,\mathrm{\boxplus }\,}}}}\), let \(x_{1,0},x_{2,0},\ldots ,x_{M,0}\in {\mathbb {R}}^{p}\), \(x_{i,n}\in {\mathbb {R}}^{k_{i,n}}\), \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\), satisfy for all \(i\in [1,M]\cap {\mathbb {N}}\), \(n\in [1,H]\cap {\mathbb {N}}\) that
$$\begin{aligned} &x_0=x_{1,0}=x_{2,0}=\cdots =x_{M,0},\\&x_{i,n}=\mathbf {A}_{k_{i,n}}(W_{i,n}x_{i,n-1}+B_{i,n}),\\&x_n= \mathbf {A}_{k^{{{\,\mathrm{\boxplus }\,}}}_{n}}(W_{n}x_{n-1}+B_{n}), \end{aligned}$$
(62)
and let \(\psi \in {\mathbf {N}}\) satisfy that
$$\begin{aligned} \psi = \left( (W_1,B_1),(W_2,B_2),\ldots ,(W_H,B_H),(W_{H+1},B_{H+1})\right) . \end{aligned}$$
(63)
First, the definitions of \(\mathcal {D}\) and \(\mathcal {R}\) (see (31) and (32)), (56), and the fact that \(\forall \, i\in [1,M]\cap {\mathbb {N}}:f_i\in C({\mathbb {R}}^p,{\mathbb {R}}^q)\) show for all \(i\in [1,M]\cap {\mathbb {N}}\) that \(k_i= (p,k_{i,1},k_{i,2},\ldots ,k_{i,H},q).\) The definition of \(\mathcal {D}\) (see (31)), the definition of \({{\,\mathrm{\boxplus }\,}}\) (see (34)), and (58) then show that
$$\begin{aligned} \mathcal {D}(\psi )= (p,k_1^{{{\,\mathrm{\boxplus }\,}}},\ldots ,k_H^{{{\,\mathrm{\boxplus }\,}}},q)={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i. \end{aligned}$$
(64)
Next, we prove by induction on \(n\in [1,H]\cap {\mathbb {N}}\) that \(x_n=(x_{1,n},x_{2,n},\ldots ,x_{M,n})\). First, (59) shows that
$$\begin{aligned} W_1x_0+B_1= \begin{pmatrix} W_{1,1}\\ W_{2,1}\\ \vdots \\ W_{M,1} \end{pmatrix}x_0+ \begin{pmatrix} B_{1,1}\\ B_{2,1}\\ \vdots \\ B_{M,1} \end{pmatrix} = \begin{pmatrix} W_{1,1}x_0+B_{1,1}\\ W_{2,1}x_0+B_{2,1}\\ \vdots \\ W_{M,1}x_0+B_{M,1} \end{pmatrix}. \end{aligned}$$
(65)
This implies that
$$\begin{aligned} x_1= \mathbf {A}_{k_1^{{{\,\mathrm{\boxplus }\,}}}}(W_1x_0+B_1)=\begin{pmatrix} x_{1,1}\\ x_{2,1}\\ \vdots \\ x_{M,1}\end{pmatrix}. \end{aligned}$$
(66)
This proves the base case. Next, for the induction step let \(n\in [2,H]\cap {\mathbb {N}}\) and assume that \(x_{n-1}=(x_{1,n-1},x_{2,n-1},\ldots ,x_{M,n-1})\). Then (60) and the induction hypothesis ensure that
$$\begin{aligned} \begin{aligned}&W_nx_{n-1}+B_n \\&\quad = W_{n}\begin{pmatrix} x_{1,n-1}\\ x_{2,n-1}\\ \vdots \\ x_{M,n-1} \end{pmatrix}+B_{n} =\begin{pmatrix} W_{1,n} & 0 & 0 & 0 \\ 0 & W_{2,n} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & W_{M,n} \end{pmatrix} \begin{pmatrix} x_{1,n-1}\\ x_{2,n-1}\\ \vdots \\ x_{M,n-1} \end{pmatrix}+ \begin{pmatrix} B_{1,n}\\ B_{2,n}\\ \vdots \\ B_{M,n} \end{pmatrix} \\&\quad = \begin{pmatrix} W_{1,n}x_{1,n-1}+ B_{1,n}\\ W_{2,n}x_{2,n-1}+B_{2,n}\\ \vdots \\ W_{M,n}x_{M,n-1}+ B_{M,n} \end{pmatrix}.\end{aligned} \end{aligned}$$
(67)
This yields that
$$\begin{aligned} x_{n}= \mathbf {A}_{k_n^{{{\,\mathrm{\boxplus }\,}}}}(W_nx_{n-1}+B_n)=\begin{pmatrix} x_{1,n}\\ x_{2,n}\\ \vdots \\ x_{M,n} \end{pmatrix}. \end{aligned}$$
(68)
This proves the induction step. Induction now proves for all \(n\in [1,H]\cap {\mathbb {N}}\) that \(x_n=(x_{1,n},x_{2,n},\ldots ,x_{M,n})\). This, the definition of \(\mathcal {R}\) (see (32)), and (61) imply that
$$\begin{aligned} \begin{aligned}&(\mathcal {R}(\psi ))(x_0)=W_{H+1}x_H+B_{H+1}\\&\quad =W_{H+1}\begin{pmatrix} x_{1,H}\\ x_{2,H}\\ \vdots \\ x_{M,H} \end{pmatrix}+B_{H+1} =\begin{pmatrix} h_1W_{1,H+1}&\ldots&h_MW_{M,H+1} \end{pmatrix} \begin{pmatrix} x_{1,H}\\ x_{2,H}\\ \vdots \\ x_{M,H} \end{pmatrix}+\left[ \sum _{i=1}^{M}h_iB_{i,H+1}\right] \\&\quad =\left[ \sum _{i=1}^{M}h_iW_{i,H+1}x_{i,H}\right] +\left[ \sum _{i=1}^{M}h_iB_{i,H+1}\right] = \sum _{i=1}^{M}h_i\left( W_{i,H+1}x_{i,H}+B_{i,H+1}\right) \\&\quad =\sum _{i=1}^M h_i(\mathcal {R}(\phi _i))(x_0). \end{aligned} \end{aligned}$$
(69)
This, the fact that \(x_0\in {\mathbb {R}}^{p}\) was arbitrary, and (56) yield that
$$\begin{aligned} \mathcal {R}(\psi )= \sum _{i=1}^{M}h_i\mathcal {R}(\phi _i)=\sum _{i=1}^{M}h_if_i. \end{aligned}$$
(70)
This and (64) show that
$$\begin{aligned} \sum _{i=1}^{M}h_if_i \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}}k_i\right\} \right) . \end{aligned}$$
(71)
The proof of Lemma 3.9 is thus completed. \(\square\)
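The block-diagonal construction of \(\psi\) in (59)–(61) translates directly into code; a sketch (ours) building \(\psi\) from networks of equal depth and weights \(h_i\):

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def realize(phi, x):
    for W, B in phi[:-1]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)])
    W, B = phi[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def block_diag(mats):
    total = sum(len(M[0]) for M in mats)
    out, off = [], 0
    for M in mats:
        for row in M:
            out.append([0.0] * off + row + [0.0] * (total - off - len(row)))
        off += len(M[0])
    return out

def sum_nets(phis, hs):
    """The network psi from (59)-(61) with R(psi) = sum_i h_i R(phi_i):
    stacked first layer, block-diagonal middle layers, weighted last layer."""
    L = len(phis[0])                       # all phi_i have L = H + 1 layer pairs
    first = ([row[:] for phi in phis for row in phi[0][0]],
             [b for phi in phis for b in phi[0][1]])
    mids = [(block_diag([phi[n][0] for phi in phis]),
             [b for phi in phis for b in phi[n][1]])
            for n in range(1, L - 1)]
    q = len(phis[0][-1][1])
    last_W = [[h * w for phi, h in zip(phis, hs) for w in phi[-1][0][r]]
              for r in range(q)]
    last_B = [sum(h * phi[-1][1][r] for phi, h in zip(phis, hs)) for r in range(q)]
    return [first] + mids + [(last_W, last_B)]

f1 = [([[1.0]], [0.0]), ([[1.0]], [0.0])]    # realizes relu(x)
f2 = [([[-1.0]], [0.0]), ([[1.0]], [0.0])]   # realizes relu(-x)
psi = sum_nets([f1, f2], [1.0, -1.0])        # relu(x) - relu(-x) = x
```

The hidden widths of `psi` add up across the summands, matching \(\mathcal {D}(\psi )={{\,\mathrm{\boxplus }\,}}_{i=1}^{M}k_i\) in (64).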
Deep neural network representations for MLP approximations
Lemma 3.10
Assume Setting 3.1, let \(d,M\in {\mathbb {N}}\), \(T,c \in (0,\infty )\), \(f\in C({\mathbb {R}},{\mathbb {R}})\), \(g \in C( {\mathbb {R}}^d, {\mathbb {R}})\), \(\Phi _f,\Phi _g\in \mathbf {N}\) satisfy that \(\mathcal {R}(\Phi _f)= f\), \(\mathcal {R}(\Phi _g)= g\), and
$$\begin{aligned} c\ge \max \left\{ 2, |||\mathcal {D}(\Phi _{f})|||,|||\mathcal {D}(\Phi _{g})|||\right\} , \end{aligned}$$
(72)
let \((\Omega , \mathcal {F}, {\mathbb {P}})\) be a probability space, let \(\Theta = \bigcup _{ n \in {\mathbb {N}}} {\mathbb {Z}}^n\), let \({\mathfrak {u}}^\theta :\Omega \rightarrow [0,1]\), \(\theta \in \Theta\), be independent random variables which are uniformly distributed on \([0,1]\), let \(\mathcal {U}^\theta :[0,T]\times \Omega \rightarrow [0, T]\), \(\theta \in \Theta\), satisfy for all \(t\in [0,T]\), \(\theta \in \Theta\) that \(\mathcal {U}^\theta _t = t+ (T-t){\mathfrak {u}}^\theta\), let \(W^\theta :[0,T]\times \Omega \rightarrow {\mathbb {R}}^d\), \(\theta \in \Theta\), be independent standard Brownian motions with continuous sample paths, assume that \(({\mathfrak {u}}^\theta )_{\theta \in \Theta }\) and \((W^\theta )_{\theta \in \Theta }\) are independent, let \({U}_{ n,M}^{\theta } :[0, T] \times {\mathbb {R}}^d \times \Omega \rightarrow {\mathbb {R}}\), \(n\in {\mathbb {Z}}\), \(\theta \in \Theta\), satisfy for all \(n \in {\mathbb {N}}\), \(\theta \in \Theta\), \(t \in [0,T]\), \(x\in {\mathbb {R}}^d\) that \({U}_{-1,M}^{\theta }(t,x)={U}_{0,M}^{\theta }(t,x)=0\) and
$$\begin{aligned} {U}_{n,M}^{\theta }(t,x)&= \frac{1}{M^n} \sum _{i=1}^{M^n} g\left( x+W^{(\theta ,0,-i)}_{T}-W^{(\theta ,0,-i)}_{t}\right) \\&\quad + \sum _{l=0}^{n-1} \frac{(T-t)}{M^{n-l}} \left[ \sum _{i=1}^{M^{n-l}} \left( f\circ {U}_{l,M}^{(\theta ,l,i)}-\mathbb {1}_{{\mathbb {N}}}(l)\,f\circ {U}_{l-1,M}^{(\theta ,-l,i)}\right) \left( \mathcal {U}_t^{(\theta ,l,i)},x+W_{\mathcal {U}_t^{(\theta ,l,i)}}^{(\theta ,l,i)}-W_{t}^{(\theta ,l,i)}\right) \right] , \end{aligned}$$
(73)
and let \(\omega \in \Omega\). Then for all \(n\in {\mathbb {N}}_0\) there exists a family \((\Phi _{n,t}^{\theta })_{\theta \in \Theta ,t\in [0,T]}\subseteq \mathbf {N}\) such that
- (i)
it holds for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that
$${\mathcal {D}}\left( \Phi _{n,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{n,t_2}^{\theta _2}\right) ,$$
(74)
- (ii)
it holds for all \(t\in [0,T]\), \(\theta \in \Theta\) that
$$\dim \left( \mathcal {D}\left( \Phi _{n,t}^{\theta }\right) \right) = n\left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) ,$$
(75)
- (iii)
it holds for all \(t\in [0,T]\), \(\theta \in \Theta\) that
$$|||\mathcal {D}(\Phi _{n,t}^{\theta } )|||\le c(3 M)^n,$$
(76)
and
- (iv)
it holds for all \(\theta \in \Theta\), \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\) that
$${U}_{n,M}^{\theta }(t,x,\omega )=(\mathcal {R}(\Phi _{n,t}^{\theta }))(x).$$
(77)
Proof of Lemma 3.10
We prove Lemma 3.10 by induction on \(n\in {\mathbb {N}}_0\). For the base case \(n=0\) note that the fact that \(\forall \, t\in [0,T],\theta \in \Theta :U^\theta _{0,M}(t,\cdot ,\omega )=0\), the fact that the function 0 can be represented by a network with depth \(\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\), and (72) imply that there exists \((\Phi _{0,t}^{\theta })_{\theta \in \Theta , t\in [0,T]}\subseteq \mathbf {N}\) such that it holds for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that \(\mathcal {D}\left( \Phi _{0,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{0,t_2}^{\theta _2}\right)\) and such that it holds for all \(\theta \in \Theta\), \(t\in [0,T]\) that \(\dim \left( \mathcal {D}(\Phi _{0,t}^{\theta })\right) =\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\), \(|||\mathcal {D}(\Phi _{0,t}^{\theta } )|||\le |||\mathcal {D}(\Phi _{g})|||\le c\), and \({U}_{0,M}^{\theta }(t,\cdot ,\omega )= \mathcal {R}(\Phi _{0,t}^{\theta })\). This proves the base case \(n=0\).
For the induction step from \(n\in {\mathbb {N}}_0\) to \(n+1\in {\mathbb {N}}\) let \(n\in {\mathbb {N}}_0\) and assume that (i)–(iv) hold true for all \(k\in [0,n]\cap {\mathbb {N}}_0\). The assumption that \(g=\mathcal {R}(\Phi _g)\) and Lemma 3.7 (with \(d=d\), \(m=1\), \(\lambda =1\), \(a=0\), \(b=W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\), and \(\Psi =\Phi _g\) for \(\theta \in \Theta\), \(t\in [0,T]\) in the notation of Lemma 3.7) show for all \(\theta \in \Theta\), \(t\in [0,T]\) that
$$\begin{aligned} \begin{aligned} g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right)&=(\mathcal {R}(\Phi _g))\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right) \\&\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{g}) \right\} \right) . \end{aligned} \end{aligned}$$
(78)
Furthermore, Lemma 3.6 (with \(H=(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) -1\) in the notation of Lemma 3.6) ensures that
$$\begin{aligned} \mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}\right\} \right) . \end{aligned}$$
(79)
This, (78), and Lemma 3.8 (with \(d_1=d\), \(d_2=1\), \(d_3=1\), \(f=\mathrm {Id}_{{\mathbb {R}}}\), \(g=g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right)\), \(\alpha ={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}\), and \(\beta =\mathcal {D}(\Phi _g)\) for \(\theta \in \Theta\), \(t\in [0,T]\) in the notation of Lemma 3.8) show that for all \(\theta \in \Theta\), \(t\in [0,T]\) it holds that
$$\begin{aligned} \begin{aligned} g\left( \cdot +W^{\theta }_{T}(\omega )-W^{\theta }_{t}(\omega )\right) \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1} \odot \mathcal {D}(\Phi _{g}) \right\} \right) . \end{aligned} \end{aligned}$$
(80)
Next, the induction hypothesis implies for all \(\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} {U}_{l,M}^{\theta }(t,\cdot ,\omega )=\mathcal {R}(\Phi _{l,t}^{\theta })\quad \text {and}\quad \mathcal {D}\left( \Phi _{l,t}^{\theta }\right) =\mathcal {D}\left( \Phi _{l,0}^{0}\right) . \end{aligned}$$
(81)
This and Lemma 3.7 (with
$$\begin{aligned} \begin{aligned}&d=d,\quad m=1,\quad a=0,\quad b=W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\quad \text {and}\\&\Psi =\Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\quad \text {for}\quad \theta ,\eta \in \Theta , \quad t\in [0,T],\quad l\in [0,n]\cap {\mathbb {N}}_0 \end{aligned} \end{aligned}$$
(82)
in the notation of Lemma 3.7) imply that for all \(\theta ,\eta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n]\cap {\mathbb {N}}_0\) it holds that
$$\begin{aligned} \begin{aligned}&U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad =\left( \mathcal {R}\left( \Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\right) \right) \left( \cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega )\right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= \mathcal {D}\left( \Phi _{l,\mathcal {U}_t^{\theta }(\omega )}^{\eta }\right) \right\} \right) = \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= \mathcal {D}\left( \Phi _{l,0}^{0}\right) \right\} \right) . \end{aligned} \end{aligned}$$
(83)
Moreover, Lemma 3.6 (with \(H=(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) -1\) for \(l\in [0,n-1]\cap {\mathbb {N}}_0\) in the notation of Lemma 3.6) ensures for all \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} \mathrm {Id}_{{\mathbb {R}}}\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \right\} \right) . \end{aligned}$$
(84)
This, (83), and Lemma 3.8 (with
$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=\mathrm {Id}_{{\mathbb {R}}}, \quad \alpha ={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}, \quad \\&\quad \beta =\mathcal {D}\left( \Phi _{l,0}^{0}\right) ,\quad \text {and}\quad g= U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \qquad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T],\quad l\in [0,n-1]\cap {\mathbb {N}}_0\\ \end{aligned} \end{aligned}$$
(85)
in the notation of Lemma 3.8) prove for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} &U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )= {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right\} \right) . \end{aligned}$$
(86)
This and Lemma 3.8 (with
$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=f,\quad \alpha = \mathcal {D}(\Phi _f),\quad \\&\beta ={\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}),\quad \text {and} \quad g=U_{l,M}^{\eta } \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \qquad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T], \quad l\in [0,n-1]\cap {\mathbb {N}}_0 \end{aligned} \end{aligned}$$
(87)
in the notation of Lemma 3.8) assure for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\), \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} \begin{aligned}&\left( f\circ U_{l,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right\} \right) . \end{aligned} \end{aligned}$$
(88)
Next, (83) (with \(l=n\)) and Lemma 3.8 (with
$$\begin{aligned} \begin{aligned}&d_1=d, \quad d_2=1, \quad d_3=1, \quad f=f,\quad \alpha = \mathcal {D}(\Phi _f),\quad \beta =\mathcal {D}\left( \Phi _{n,0}^{0}\right) ,\quad \text {and}\\&\quad g=\left( U_{n,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \quad \text {for}\quad \eta ,\theta \in \Theta ,\quad t\in [0,T] \end{aligned} \end{aligned}$$
(89)
in the notation of Lemma 3.8) prove for all \(\eta ,\theta \in \Theta\), \(t\in [0,T]\) that
$$\begin{aligned} \begin{aligned}&\left( f\circ U_{n,M}^{\eta }\right) \left( \mathcal {U}_t^{\theta }(\omega ),\cdot +W_{\mathcal {U}_t^{\theta }(\omega )}^{\theta }(\omega )- W_{t}^{\theta }(\omega ),\omega \right) \\&\quad \in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )=\mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0}) \right\} \right) . \end{aligned} \end{aligned}$$
(90)
Furthermore, the definition of \(\odot\) in (33) and the fact that
$$\begin{aligned} \forall \, l\in [0,n]\cap {\mathbb {N}}_0:\dim \left( \mathcal {D}(\Phi _{l,0}^{0})\right) =l \left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) + \dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \end{aligned}$$
(91)
in the induction hypothesis imply that
$$\begin{aligned} \begin{aligned}&\dim \left( {\mathfrak {n}}_{{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}\odot \mathcal {D}(\Phi _{g})\right) \\&\quad =\left[ (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1\right] +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) -1\\&\quad =(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) , \end{aligned} \end{aligned}$$
(92)
that
$$\begin{aligned} \begin{aligned}&\dim \left( \mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0}) \right) = \dim \left( \mathcal {D}(\Phi _{f})\right) +\dim \left( \mathcal {D}(\Phi _{n,0}^{0})\right) -1\\&\quad = \dim \left( \mathcal {D}(\Phi _{f})\right) +\left[ n\left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \right] -1\\&\quad = (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) ,\end{aligned} \end{aligned}$$
(93)
and for all \(l\in [0,n-1]\cap {\mathbb {N}}_0\) that
$$\begin{aligned} \begin{aligned}&\dim \left( \mathcal {D}(\Phi _{f}) \odot {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{l,0}^{0}) \right) \\&\quad = \dim \left( \mathcal {D}(\Phi _{f})\right) +\dim \left( {\mathfrak {n}}_{{(n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}}\right) +\dim \left( \mathcal {D}(\Phi _{l,0}^{0}) \right) -2\\&\quad =\dim \left( \mathcal {D}(\Phi _{f})\right) +\left[ (n-l)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1 \right] \\&\qquad + \left[ l \left( \dim \left( \mathcal {D}\left( \Phi _{f}\right) \right) -1\right) + \dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) \right] -2\\&\quad =\dim \left( \mathcal {D}(\Phi _{f})\right) + n\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) -1\\&\quad = (n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) . \end{aligned} \end{aligned}$$
(94)
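Since (92)–(94) are pure integer arithmetic, they can be verified mechanically. The following Python sketch (an illustration only; the sample ranges for \(n\), \(\dim (\mathcal {D}(\Phi _f))\), and \(\dim (\mathcal {D}(\Phi _g))\) are hypothetical) uses only the depth rule \(\dim (\alpha \odot \beta )=\dim (\alpha )+\dim (\beta )-1\) from (33) and the induction hypothesis (91):

```python
# Mechanical check of the depth identities (92)-(94). Only the rule
# dim(alpha . beta) = dim(alpha) + dim(beta) - 1 from (33) and the
# induction hypothesis (91) are used; the sample ranges are hypothetical.

def dim_odot(dim_a, dim_b):
    # depth of a composed network shape
    return dim_a + dim_b - 1

def depth_identities_hold(n, dim_f, dim_g):
    """dim_f, dim_g stand for dim(D(Phi_f)) and dim(D(Phi_g))."""
    target = (n + 1) * (dim_f - 1) + dim_g
    # (92): n_{(n+1)(dim_f - 1) + 1} composed with D(Phi_g)
    ok = dim_odot((n + 1) * (dim_f - 1) + 1, dim_g) == target
    # (93): D(Phi_f) composed with D(Phi_{n,0}^0); (91) gives the latter's depth
    ok &= dim_odot(dim_f, n * (dim_f - 1) + dim_g) == target
    # (94): D(Phi_f), then n_{(n-l)(dim_f - 1) + 1}, then D(Phi_{l,0}^0), l < n
    for l in range(n):
        mid = dim_odot((n - l) * (dim_f - 1) + 1, l * (dim_f - 1) + dim_g)
        ok &= dim_odot(dim_f, mid) == target
    return ok

all_ok = all(depth_identities_hold(n, df, dg)
             for n in range(1, 6) for df in range(2, 7) for dg in range(2, 7))
```

Here `all_ok` confirms that each of the three shapes has depth \((n+1)(\dim (\mathcal {D}(\Phi _f))-1)+\dim (\mathcal {D}(\Phi _g))\).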
This shows, roughly speaking, that the functions in (80), (90), and (88) can be represented by networks of the same depth (i.e., the same number of layers), namely \((n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right)\). Hence, Lemma 3.9 and (73) imply that there exists a family \((\Phi _{n+1,t}^{\theta })_{\theta \in \Theta , t\in [0,T]}\subseteq \mathbf {N}\) such that for all \(\theta \in \Theta\), \(t\in [0,T]\), \(x\in {\mathbb {R}}^d\) it holds that
$$\begin{aligned} \begin{aligned}&\left( \mathcal {R}(\Phi _{n+1,t}^{\theta })\right) (x) \\&\quad = \frac{1}{M^{n+1}} \sum _{i=1}^{M^{n+1}} g\left( x+W^{(\theta ,0,-i)}_{T}(\omega )-W^{(\theta ,0,-i)}_{t}(\omega )\right) \\&\qquad + \frac{(T-t)}{M} \sum _{i=1}^{M} \left( f\circ {U}_{n,M}^{(\theta ,n,i)}\right) \left( \mathcal {U}_t^{(\theta ,n,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,n,i)}(\omega )}^{(\theta ,n,i)}(\omega )- W_{t}^{(\theta ,n,i)}(\omega ),\omega \right) \\&\qquad + \sum _{l=0}^{n-1} \frac{(T-t)}{M^{n+1-l}} \sum _{i=1}^{M^{n+1-l}} \left( f\circ {U}_{l,M}^{(\theta ,l,i)}\right) \left( \mathcal {U}_t^{(\theta ,l,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,l,i)}(\omega )}^{(\theta ,l,i)}(\omega )- W_{t}^{(\theta ,l,i)}(\omega ),\omega \right) \\&\qquad -\sum _{l=1}^{n} \frac{(T-t)}{M^{n+1-l}} \sum _{i=1}^{M^{n+1-l}} \left( f\circ {U}_{l-1,M}^{(\theta ,-l,i)} \right) \left( \mathcal {U}_t^{(\theta ,l,i)}(\omega ),x+W_{\mathcal {U}_t^{(\theta ,l,i)}(\omega )}^{(\theta ,l,i)}(\omega )- W_{t}^{(\theta ,l,i)}(\omega ),\omega \right) \\&\quad = {U}_{n+1,M}^{\theta }(t,x,\omega ), \end{aligned} \end{aligned}$$
(95)
that
$$\begin{aligned} \dim \left( \mathcal {D}(\Phi _{n+1,t}^{\theta })\right) = (n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+\dim \left( \mathcal {D}\left( \Phi _{g}\right) \right) , \end{aligned}$$
(96)
and that
$$\begin{aligned} \begin{aligned} \mathcal {D}(\Phi _{n+1,t}^{\theta })& = \left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1}}}\left[ {\mathfrak {n}}_{{(n+1)\left( \dim \left( \mathcal {D}(\Phi _{f})\right) -1\right) +1}} \odot \mathcal {D}(\Phi _{g})\right] \right) {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M}} \left( \mathcal {D}\left( \Phi _{f}\right) \odot \mathcal {D}\left( \Phi _{n,0}^{0}\right) \right) \right) \\&\quad {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{l=0}^{n-1}}{\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1-l}}}\left[ \left( \mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l,0}^{0})\right) \right) \right. \\&\left. \quad {{\,\mathrm{\boxplus }\,}}\left( {\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{l=1}^{n}}{\mathop {{{\,\mathrm{\boxplus }\,}}}\limits _{i=1}^{M^{n+1-l}}} \left( \mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l-1,0}^{0})\right) \right] \right) .\end{aligned} \end{aligned}$$
(97)
This shows for all \(t_1,t_2\in [0,T]\), \(\theta _1,\theta _2\in \Theta\) that
$$\begin{aligned} \mathcal {D}\left( \Phi _{n+1,t_1}^{\theta _1}\right) =\mathcal {D}\left( \Phi _{n+1,t_2}^{\theta _2}\right) . \end{aligned}$$
(98)
Furthermore, (97), the triangle inequality (see Lemma 3.5), and the fact that
$$\begin{aligned} \forall \, l\in [0,n]\cap {\mathbb {N}}_0:|||\mathcal {D}(\Phi _{l,0}^{0} )|||\le c(3 M)^l \end{aligned}$$
(99)
in the induction hypothesis show for all \(\theta \in \Theta\), \(t\in [0,T]\) that
$$\begin{aligned} \begin{aligned} |||\mathcal {D}(\Phi _{n+1,t}^{\theta })|||&\le \sum _{i=1}^{M^{n+1}}||| {\mathfrak {n}}_{{(n+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1)+1}} \odot \mathcal {D}(\Phi _{g})|||+\sum _{i=1}^{M}|||\mathcal {D}(\Phi _{f}) \odot \mathcal {D}(\Phi _{n,0}^{0})|||\\&\quad + \sum _{l=0}^{n-1}\sum _{i=1}^{M^{n+1-l}} |||\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l,0}^{0})|||\\&\quad + \sum _{l=1}^{n}\sum _{i=1}^{M^{n+1-l}}|||\mathcal {D}(\Phi _{f})\odot {\mathfrak {n}}_{{(n-l+1)(\dim \left( \mathcal {D}(\Phi _{f})\right) -1) +1 }} \odot \mathcal {D}(\Phi _{l-1,0}^{0})|||. \end{aligned} \end{aligned}$$
(100)
Note that for all \(H_1,H_2,\alpha _0,\ldots ,\alpha _{H_1+1},\beta _0,\ldots , \beta _{H_2+1}\in {\mathbb {N}}\), \(\alpha ,\beta \in \mathbf {D}\) with \(\alpha =(\alpha _0,\ldots ,\alpha _{H_1+1})\), \(\beta =(\beta _0,\ldots , \beta _{H_2+1})\), \(\alpha _0=\beta _{H_2+1}=1\) it holds that \(|||\alpha \odot \beta |||\le \max \{|||\alpha |||,|||\beta |||,2\}\) (see (33)). This, (100), the fact that \(\forall \, H\in {\mathbb {N}}:|||{\mathfrak {n}}_{{H+2}}|||=2\) (see (35)), (72), and (99) prove for all \(\theta \in \Theta\), \(t\in [0,T]\) that
$$\begin{aligned} \begin{aligned}&|||\mathcal {D}(\Phi _{n+1,t}^{\theta })|||\\&\quad \le \left[ \sum _{i=1}^{M^{n+1}}c \right] + \left[ \sum _{i=1}^{M}c({3} M)^n\right] + \left[ \sum _{l=0}^{n-1}\sum _{i=1}^{M^{n+1-l}}c({3} M)^l\right] +\left[ \sum _{l=1}^{n}\sum _{i=1}^{M^{n+1-l}}c({3} M)^{l-1}\right] \\&\quad = M^{n+1}c+Mc(3M)^{n}+\left[ \sum _{l=0}^{n-1}M^{n+1-l}c(3M)^l\right] +\left[ \sum _{l=1}^{n}M^{n+1-l}c(3M)^{l-1}\right] \\&\quad \leq M^{n+1}c\left[ 1+3^n+\sum _{l=0}^{n-1}3^l+\sum _{l=1}^{n}3^{l-1}\right] = M^{n+1}c\left[ 1+\sum _{l=0}^{n}3^l+\sum _{l=1}^{n}3^{l-1}\right] \\&\quad \le cM^{n+1}\left[ 1+2\sum _{l=0}^{n} {3} ^l\right] = cM^{n+1}\left[ 1+2\frac{{3}^{n+1}-1}{{3}-1}\right] = c({3} M)^{n+1}. \end{aligned} \end{aligned}$$
(101)
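The elementary estimate in (101) can likewise be checked numerically. The sketch below (the ranges for \(M\), \(n\) and the values of \(c\) are hypothetical samples) sums the four bounds from (100) and compares the total with \(c(3M)^{n+1}\):

```python
# Numerical check of the counting estimate (101): the four sums in (100),
# bounded term-wise via (72) and (99), total at most c * (3M)^(n+1).
# The ranges for M, n and the values of c are hypothetical samples.

def four_sums(M, n, c):
    total = M ** (n + 1) * c                       # M^(n+1) terms of size c
    total += M * c * (3 * M) ** n                  # M terms of size c(3M)^n
    total += sum(M ** (n + 1 - l) * c * (3 * M) ** l for l in range(n))
    total += sum(M ** (n + 1 - l) * c * (3 * M) ** (l - 1)
                 for l in range(1, n + 1))
    return total

ok = all(four_sums(M, n, c) <= c * (3 * M) ** (n + 1)
         for M in range(1, 7) for n in range(6) for c in (1, 2.5, 10))
```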
Combining (95), (96), (98), and (101) completes the induction step. Induction hence establishes (i)–(iv). The proof of Lemma 3.10 is thus completed. \(\square\)
Deep neural network approximations for the PDE nonlinearity
Lemma 3.11
(DNN interpolation) Assume Setting 3.1, let \(N\in {\mathbb {N}}\), \(a_0,a_1,\ldots , a_{N-1},\xi _0,\xi _1,\ldots ,\xi _N\in {\mathbb {R}}\) satisfy that \(\xi _0<\xi _1<\ldots <\xi _N\), let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a function, assume for all \(x\in (-\infty ,\xi _0]\) that \(f(x)=f(\xi _0)\), assume for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that \(f(x)=f(\xi _n)+a_n(x-\xi _n)\), and assume for all \(x\in (\xi _N,\infty )\) that \(f(x)=f(\xi _N)\). Then it holds that
$$f\in {\mathcal {R}}(\{\Phi \in {\mathbf {N}}:{\mathcal {D}}(\Phi )=(1,N+1,1)\}).$$
(102)
Proof of Lemma 3.11
Throughout this proof let \(a_{-1}=0\) and \(a_N=0\), let \(c_n\in {\mathbb {R}}\), \(n\in [0,N]\cap {\mathbb {Z}}\), be the real numbers which satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(c_n=a_{n}-a_{n-1}\), let \(W_1\in {\mathbb {R}}^{(N+1)\times 1}\), \(B_1\in {\mathbb {R}}^{N+1}\), \(W_2\in {\mathbb {R}}^{1\times (N+1)}\), \(B_2\in {\mathbb {R}}\), \(\Phi \in \mathbf {N}\) be given by
$$\begin{aligned}&W_1 = \begin{pmatrix} 1\\ 1\\ \vdots \\ 1 \end{pmatrix} ,\quad B_1=\begin{pmatrix} -\xi _0\\ -\xi_1\\ \vdots \\ -\xi_N \end{pmatrix} ,\quad W_2= \begin{pmatrix} c_0&c_1&\ldots&c_N \end{pmatrix},\quad B_2= f(\xi _0), \end{aligned}$$
(103)
and
$$\begin{aligned} \Phi = ((W_1,B_1),(W_2,B_2)), \end{aligned}$$
(104)
and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\) that
$$\begin{aligned} g(x)=f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}. \end{aligned}$$
(105)
First, observe that the fact that \(\forall \,n\in [0,N-1]\cap {\mathbb {Z}}:\xi _n<\xi _{n+1}\) and the fact that \(\forall \, n\in [0,N]\cap {\mathbb {Z}}:a_n= \sum _{k=0}^{n}c_k\) show for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that
$$\begin{aligned} \begin{aligned} g(x)-g(\xi _n)&= \left[ \sum _{k=0}^{N}c_k\left( \max \{x-\xi _k,0\}-\max \{\xi _{n}-\xi _k,0\}\right) \right] \\&=\sum _{k=0}^{n}c_k [(x-\xi _k)-(\xi _{n}-\xi _k)]= \sum _{k=0}^{n}c_k (x-\xi _{n})=a_n(x-\xi _n). \end{aligned} \end{aligned}$$
(106)
This shows for all \(n\in [0,N-1]\cap {\mathbb {Z}}\) that \(g\) is affine linear on the interval \((\xi _n,\xi _{n+1}]\). This, the fact that for all \(n\in [0,N-1]\cap {\mathbb {Z}}\) it holds that \(f\) is affine linear on the interval \((\xi _n,\xi _{n+1}]\), the fact that \(\forall \, x\in (-\infty ,\xi _0]:f(x)= g(x)=f(\xi _0)\), and an induction argument imply for all \(x\in (-\infty ,\xi _N]\) that \(f(x)=g(x)\). Furthermore, (105), the fact that \(\forall \,n\in [0,N-1]\cap {\mathbb {Z}}:\xi _n<\xi _{n+1}\), and the fact that \(\sum _{k=0}^{N}c_k=0\) imply for all \(x\in (\xi _N,\infty )\) that
$$\begin{aligned} \begin{aligned} g(x)-g(\xi _N)&= \left[ \sum _{k=0}^{N}c_k\left( \max \{x-\xi _k,0\}-\max \{\xi _{N}-\xi _k,0\}\right) \right] \\&=\sum _{k=0}^{N}c_k [(x-\xi _k)-(\xi _{N}-\xi _k)]= \sum _{k=0}^{N}c_k (x-\xi _{N})=0.\end{aligned} \end{aligned}$$
(107)
This shows for all \(x\in (\xi _N,\infty )\) that \(g(x)=g(\xi _N)\). This, the fact that \(\forall \,x\in (\xi _N,\infty ):f(x)=f(\xi _N)\), the fact that \(\forall \,x\in (-\infty ,\xi _N]:f(x)=g(x)\), and (105) prove for all \(x\in {\mathbb {R}}\) that
$$\begin{aligned} f(x)=g(x)=f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}. \end{aligned}$$
(108)
Next, the definition of \(\mathcal {R}\) and \(\mathcal {D}\) (see (31) and (32)), (103), (104), and (108) imply that for all \(x\in {\mathbb {R}}\) it holds that \(\mathcal {D}(\Phi )=(1,N+1,1)\) and
$$\begin{aligned} \begin{aligned}&(\mathcal {R}(\Phi ))(x)= W_2( \mathbf {A}_{N+1}(W_1x+B_1))+B_2\\ {}&= \begin{pmatrix} c_0&c_1&\ldots&c_N \end{pmatrix} \begin{pmatrix} \max \{x-\xi _0,0\}\\ \max \{x-\xi _1,0\}\\ \vdots \\ \max \{x-\xi _N,0\} \end{pmatrix}+f(\xi _0)= f(\xi _0)+\sum _{k=0}^{N}c_k\max \{x-\xi _k,0\}=f(x).\end{aligned} \end{aligned}$$
(109)
This establishes (102). The proof of Lemma 3.11 is thus completed. \(\square\)
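The construction (103)–(105) is fully explicit and can be reproduced numerically. The following Python sketch (the grid \(\xi _0,\ldots ,\xi _N\), the slopes \(a_0,\ldots ,a_{N-1}\), and the value \(f(\xi _0)\) are hypothetical example data) builds the network of Lemma 3.11 and checks that its realization agrees with \(f\):

```python
# Explicit one-hidden-layer ReLU realization of Lemma 3.11, following
# (103)-(105). The grid xi, the slopes a, and the value f0 = f(xi_0)
# are hypothetical example data.

xi = [-1.0, 0.0, 0.5, 2.0]        # xi_0 < xi_1 < ... < xi_N with N = 3
a = [2.0, -1.0, 0.5]              # slopes a_0, ..., a_{N-1}
f0 = 1.0                          # f(xi_0)

def f(x):
    """The target: piecewise linear on (xi_0, xi_N], constant outside."""
    if x <= xi[0]:
        return f0
    y = f0
    for n in range(len(a)):
        if x <= xi[n + 1]:
            return y + a[n] * (x - xi[n])
        y += a[n] * (xi[n + 1] - xi[n])
    return y                      # x > xi_N: constant right tail

# Parameters as in (103): c_n = a_n - a_{n-1} with a_{-1} = a_N = 0,
# W1 = (1,...,1)^T, B1 = (-xi_0,...,-xi_N), W2 = (c_0,...,c_N), B2 = f(xi_0).
a_ext = [0.0] + a + [0.0]
c = [a_ext[k + 1] - a_ext[k] for k in range(len(xi))]

def realization(x):
    """R(Phi)(x) = W2 ReLU(W1 x + B1) + B2, cf. (108) and (109)."""
    return f0 + sum(ck * max(x - xk, 0.0) for ck, xk in zip(c, xi))

max_err = max(abs(realization(x) - f(x))
              for x in [-2.0, -0.5, 0.25, 1.0, 5.0])
```

The shape of `Phi` is \((1,N+1,1)\), matching (102): one input, \(N+1\) ReLU units, one output.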
Lemma 3.12
Let \(L\in [0,\infty )\), \(N\in {\mathbb {N}}\), \(a\in {\mathbb {R}}\), \(b\in (a,\infty )\), \(\xi _0, \xi _1,\ldots , \xi _N\in {\mathbb {R}}\) satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(\xi _n=a+\frac{(b-a)n}{N}\), let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x,y\in {\mathbb {R}}\) that
$$\begin{aligned} |f(x)-f(y)|\le L|x-y|, \end{aligned}$$
(110)
and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\), \(n\in [0,N-1]\cap {\mathbb {Z}}\) that
$$g(x) = \left\{ {\begin{array}{*{20}l} {f(\xi _{0} )} \hfill & {:x \in ( - \infty ,\xi _{0} ]} \hfill \\ {\frac{{f(\xi _{n} )(\xi _{{n + 1}} - x) + f(\xi _{{n + 1}} )(x - \xi _{n} )}}{{\xi _{{n + 1}} - \xi _{n} }}} \hfill & {:x \in (\xi _{n} ,\xi _{{n + 1}} ]} \hfill \\ {f(\xi _{N} )} \hfill & {:x \in (\xi _{N} ,\infty ).} \hfill \\ \end{array} } \right.$$
(111)
Then
- (i)
it holds for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(g(\xi _n)=f(\xi _n)\),
- (ii)
it holds for all \(x,y\in {\mathbb {R}}\) that \(|g(x)-g(y)|\le L|x-y|\), and
- (iii)
it holds for all \(x\in [a,b]\) that \(|g(x)-f(x)|\le \frac{2L(b-a)}{N}\).
Proof of Lemma 3.12
Throughout this proof let \(r,\ell :{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}{\setminus}(a,b]\) that
$$\begin{aligned} r(x)=\ell (x)=x \end{aligned}$$
(112)
and for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that
$$\begin{aligned} r(x)= \xi _{n+1}\quad \text {and}\quad \ell (x)= \xi _{n}. \end{aligned}$$
(113)
Note that (111) implies (i). Next observe that for all \(x,y\in (a,b]\) with \(x\le y\) and \(\ell (y)<r(x)\) it holds that \(r(x)=r(y)\) and \(\ell (x) =\ell (y)\). This, (112), (111), and (110) show that for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) and \(\ell (y)<r(x)\) it holds that \(x,y\in (a,b]\), \(r(x)=r(y)\), \(\ell (x) =\ell (y)\), and
$$\begin{aligned} |g(x)-g(y)|= \left| \frac{f(r(x))-f(\ell (x))}{r(x)-\ell (x)} (x-y)\right| \le L |x-y|. \end{aligned}$$
(114)
Furthermore, (111), (110), and the fact that \(\forall \,x\in {\mathbb {R}}:\ell (x)\le x\le r(x)\) imply for all \(x\in (a,b]\) that
$$\begin{aligned} \begin{aligned} |g(x)-g(r(x))|&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))+f(\ell (x))-f(r(x))\right| \\&= \left| \frac{(f(\ell (x))-f(r(x)))(x-r(x))}{\ell (x)-r(x)}\right| \le L|x-r(x)|=L(r(x)-x) \end{aligned} \end{aligned}$$
(115)
and
$$\begin{aligned} \begin{aligned} |g(x)-g(\ell (x))|&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))+f(\ell (x))-f(\ell (x))\right| \\&=\left| \frac{f(\ell (x))-f(r(x))}{\ell (x)-r(x)} (x-\ell (x))\right| \le L|x-\ell (x)|=L(x-\ell (x)). \end{aligned} \end{aligned}$$
(116)
This and (112) show for all \(x\in {\mathbb {R}}\) that
$$\begin{aligned} |g(x)-g(r(x))|\le L(r(x)-x) \quad \text {and}\quad |g(x)-g(\ell (x))|\le L(x-\ell (x)) . \end{aligned}$$
(117)
The triangle inequality therefore shows for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) and \(r(x)\le \ell (y)\) that
$$\begin{aligned} \begin{aligned} |g(x)-g(y)|&\le | g(x)-g(r(x))|+|g(r(x))-g(\ell (y))|+|g(\ell (y))-g(y)|\\&\le L (r(x)-x)+ L(\ell (y)-r(x))+ L (y-\ell (y))= L(y-x)= L|y-x|.\end{aligned} \end{aligned}$$
(118)
This and (114) show for all \(x,y\in {\mathbb {R}}\) with \(x\le y\) that \(|g(x)-g(y)|\le L|x-y|\). Symmetry hence establishes (ii). Next, the fact that \(\forall \,x\in {\mathbb {R}}:g(\ell (x))=f(\ell (x))\), the triangle inequality, (110), (117), and the fact that \(\forall \,x\in [a,b]:0\le x-\ell (x)\le (b-a)/N\) imply for all \(x\in [a,b]\) that
$$\begin{aligned} \begin{aligned} |g(x)-f(x)|&= |g(x)-f(\ell (x))+f(\ell (x))-f(x)|\\&= |g(x)-g(\ell (x))+f(\ell (x))-f(x)|\\&\le |g(x)-g(\ell (x))|+|f(\ell (x))-f(x)|\le 2L (x-\ell (x))\le 2L(b-a)/N. \end{aligned} \end{aligned}$$
(119)
This establishes (iii). The proof of Lemma 3.12 is thus completed. \(\square\)
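As a numerical illustration of Lemma 3.12, the sketch below instantiates (111) for a sample Lipschitz function (here \(f=\sin\) with \(L=1\); the grid parameters \(a\), \(b\), \(N\) are hypothetical choices) and checks properties (i)–(iii):

```python
# Numerical illustration of Lemma 3.12. The function f = sin (Lipschitz
# constant L = 1) and the parameters a, b, N are hypothetical sample choices.
import math

L, a, b, N = 1.0, -2.0, 3.0, 40
xi = [a + (b - a) * n / N for n in range(N + 1)]
f = math.sin

def g(x):
    """The piecewise-linear interpolant (111) with constant tails."""
    if x <= xi[0]:
        return f(xi[0])
    if x > xi[N]:
        return f(xi[N])
    for n in range(N):
        if x <= xi[n + 1]:
            t = (x - xi[n]) / (xi[n + 1] - xi[n])
            return f(xi[n]) * (1 - t) + f(xi[n + 1]) * t

xs = [a - 1 + 7 * k / 400 for k in range(401)]   # sample points in [a-1, b+1]
gvals = [g(x) for x in xs]
# (i): g matches f on the grid.
grid_ok = all(abs(g(x) - f(x)) < 1e-12 for x in xi)
# (ii): g is Lipschitz with the same constant L (up to rounding).
lip_ok = all(abs(gvals[i] - gvals[j]) <= L * abs(xs[i] - xs[j]) + 1e-12
             for i in range(len(xs)) for j in range(len(xs)))
# (iii): the interpolation error on [a, b] is at most 2L(b-a)/N.
err_ok = all(abs(g(x) - f(x)) <= 2 * L * (b - a) / N
             for x in xs if a <= x <= b)
```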
Corollary 3.13
Assume Setting 3.1, let \(\epsilon \in (0,1]\), \(L\in [0,\infty )\), \(q\in (1,\infty )\), and let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x,y\in {\mathbb {R}}\) that \(|f(x)-f(y)|\le L|x-y|.\) Then there exists a function \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) such that
- (i)
it holds for all\(x,y\in {\mathbb {R}}\)that\(|g(x)-g(y)|\le L|x-y|\),
- (ii)
it holds for all\(x\in {\mathbb {R}}\)that\(|f(x)-g(x)|\le \epsilon (1+|x|^q)\), and
- (iii)
it holds that
$$\begin{aligned} g\in \mathcal {R}\left( \left\{ \Phi \in \mathbf {N}:\mathcal {D}(\Phi )\in {\mathbb {N}}^3 \text { and } |||\mathcal {D}(\Phi )|||\le \frac{4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2}{\epsilon ^{\frac{q}{(q-1)}}}\right\} \right) . \end{aligned}$$
(120)
Proof of Corollary 3.13
Throughout this proof let \(R\in {\mathbb {R}}\), \(N\in {\mathbb {N}}\) satisfy that
$$\begin{aligned} R=\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) \quad \text {and}\quad N=\min \left\{ n\in {\mathbb {N}}:\frac{4LR}{n}\le \epsilon \right\} , \end{aligned}$$
(121)
let \(\xi _0, \xi _1,\ldots , \xi _N\in {\mathbb {R}}\) be the real numbers which satisfy for all \(n\in [0,N]\cap {\mathbb {Z}}\) that \(\xi _n=R(-1+\frac{2n}{N})\), and let \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) satisfy for all \(x\in {\mathbb {R}}\), \(n\in [0,N-1]\cap {\mathbb {Z}}\) that
$$g(x) = \left\{ {\begin{array}{*{20}l} {f(\xi _{0} )} \hfill & {:x \in ( - \infty ,\xi _{0} ]} \hfill \\ {\frac{{f(\xi _{n} )(\xi _{{n + 1}} - x) + f(\xi _{{n + 1}} )(x - \xi _{n} )}}{{\xi _{{n + 1}} - \xi _{n} }}} \hfill & {:x \in (\xi _{n} ,\xi _{{n + 1}} ]} \hfill \\ {f(\xi _{N} )} \hfill & {:x \in (\xi _{N} ,\infty ).} \hfill \\ \end{array} } \right.$$
(122)
By (ii) in Lemma 3.12, the function \(g\) satisfies (i). Next, it follows from (iii) in Lemma 3.12 that for all \(x\in [-R,R]\) it holds that \(|g(x)-f(x)|\le 4LR/N\). This and the fact that \(4LR/N\le \epsilon\) prove that for all \(x\in [-R,R]\) it holds that \(|g(x)-f(x)|\le \epsilon \le \epsilon (1+|x|^q)\). Next, the triangle inequality, the fact that \(f(R)=g(R)\), and the Lipschitz continuity of \(f\) and \(g\) imply for all \(x\in {\mathbb {R}}\) that
$$\begin{aligned} \begin{aligned} |f(x)-g(x)|&\le |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|g(R)|\\&= |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|f(R)|\\&\le |f(x)-f(0)|+|f(0)|+|g(x)-g(R)|+|f(R)-f(0)|+|f(0)|\\&\le L|x|+2|f(0)|+L|x-R|+LR\\&\le 2L(|x|+R)+2|f(0)|. \end{aligned} \end{aligned}$$
(123)
This and (121) show for all \(x\in {\mathbb {R}}{\setminus}[-R,R]\) that
$$\begin{aligned} \begin{aligned} \frac{|f(x)-g(x)|}{1+|x|^q}&\le \frac{2L(|x|+R)+2|f(0)|}{1+|x|^q}\le \frac{4L|x|+2|f(0)|}{1+|x|^q} \\&\le \frac{4L}{|x|^{q-1}}+\frac{2|f(0)|}{|x|^q}\le \frac{4L}{R^{q-1}}+\frac{2|f(0)|}{R^q}\le \frac{4L+2|f(0)|}{R^{q-1}} \le \epsilon .\end{aligned} \end{aligned}$$
(124)
This and the fact that \(\forall \,x\in [-R,R]:|g(x)-f(x)|\le \epsilon (1+|x|^q)\) prove that for all \(x\in {\mathbb {R}}\) it holds that \(|g(x)-f(x)|\le \epsilon (1+|x|^q)\). This shows that g satisfies (ii). Next, (i) in Lemma 3.12 ensures that g satisfies for all \(x\in (-\infty ,-R]\) that \(g(x)=g(-R)\), for all \(n\in [0,N-1]\cap {\mathbb {Z}}\), \(x\in (\xi _n,\xi _{n+1}]\) that \(g(x)=g(\xi _n)+\frac{g(\xi _{n+1})-g(\xi _n)}{\xi _{n+1}-\xi _n}(x-\xi _n)\), and for all \(x\in (R,\infty )\) that \(g(x)=g(R)\). This and Lemma 3.11 (with \(N=N\), \(f=g\), \(\xi _n=\xi _n\) for \(n\in [0,N]\cap {\mathbb {Z}}\), and \(a_n= (g(\xi _{n+1})-g(\xi _n))/(\xi _{n+1}-\xi _n)\) for \(n\in [0,N-1]\cap {\mathbb {Z}}\) in the notation of Lemma 3.11) imply that
$$\begin{aligned} g\in \mathcal {R}(\{\Phi \in \mathbf {N}:\mathcal {D}(\Phi )=(1,N+1,1)\}). \end{aligned}$$
(125)
Furthermore, if \(N>1\), then (121) implies that \(\frac{4LR}{N-1}>\epsilon\). Hence, if \(N>1\) it holds that \(N<\frac{4LR}{\epsilon }+1\). This and (121) ensure that
$$N \le \frac{4LR}{\epsilon }+1=\frac{4L\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) +\epsilon }{\epsilon }.$$
(126)
This and (125) imply that
$$\begin{aligned} |||\mathcal {D}(\Phi )|||&=N+1 \le \frac{4L\max \left( 1,\left( \frac{4L+2|f(0)|}{\epsilon } \right) ^{\frac{1}{q-1}} \right) +2\epsilon }{\epsilon }\\ {}&= \frac{4L\max \left( \epsilon ^{\frac{1}{q-1}},\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2\epsilon ^{\frac{q}{(q-1)}}}{\epsilon ^{\frac{q}{(q-1)}}}\\ {}&\le \frac{4L\left( 1+\left( 4L+2|f(0)| \right) ^{\frac{1}{q-1}} \right) +2}{\epsilon ^{\frac{q}{(q-1)}}}. \end{aligned}$$
(127)
This establishes (iii). The proof of Corollary 3.13 is thus completed. \(\square\)
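The proof of Corollary 3.13 is constructive, and its quantitative choices (121) can be replayed numerically. The sketch below (with hypothetical sample values for \(L\), \(q\), \(\epsilon\), and the target \(f=\sin\)) builds \(g\) and checks the approximation bound in (ii) and the size bound in (120):

```python
# Replaying the construction in the proof of Corollary 3.13. The choices
# of L, q, eps and the target f = sin are hypothetical sample data; R and
# N are chosen exactly as in (121).
import math

L, q, eps = 1.0, 2.0, 0.25
f = math.sin                                 # |f(x) - f(y)| <= L |x - y|

R = max(1.0, ((4 * L + 2 * abs(f(0))) / eps) ** (1 / (q - 1)))
N = math.ceil(4 * L * R / eps)               # smallest n with 4LR/n <= eps
xi = [R * (-1 + 2 * n / N) for n in range(N + 1)]

def g(x):
    """The interpolant (122): linear on [-R, R], constant outside."""
    if x <= xi[0]:
        return f(xi[0])
    if x > xi[N]:
        return f(xi[N])
    n = min(int((x - xi[0]) / (2 * R / N)), N - 1)
    t = (x - xi[n]) / (xi[n + 1] - xi[n])
    return f(xi[n]) * (1 - t) + f(xi[n + 1]) * t

xs = [-30 + 60 * k / 600 for k in range(601)]
# (ii): |f - g| <= eps * (1 + |x|^q) on the whole real line (sampled).
approx_ok = all(abs(f(x) - g(x)) <= eps * (1 + abs(x) ** q) for x in xs)
# (iii): the width N + 1 of the shape (1, N+1, 1) obeys the bound in (120).
bound = (4 * L * (1 + (4 * L + 2 * abs(f(0))) ** (1 / (q - 1))) + 2) \
        / eps ** (q / (q - 1))
size_ok = (N + 1) <= bound
```

By Lemma 3.11, this \(g\) is realized by a network of shape \((1,N+1,1)\), so `N + 1` is the quantity bounded in (127).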