Constructive Deep ReLU Neural Network Approximation

Journal of Scientific Computing

Abstract

We propose an efficient, deterministic algorithm for constructing exponentially convergent deep neural network (DNN) approximations of multivariate, analytic maps \(f:[-1,1]^{K}\rightarrow {\mathbb {R}}\). We address in particular networks with the rectified linear unit (ReLU) activation function; similar results and proofs apply to many other popular activation functions. The algorithm is based on collocating f in deterministic families of grid points with small Lebesgue constants, and on a-priori (i.e., “offline”) emulation of a spectral basis with DNNs to prescribed fidelity. Assuming availability of N function values of a possibly corrupted, numerical approximation \(\breve{f}\) of f in \([-1,1]^{K}\) and a bound on \(\Vert f-\breve{f} \Vert _{L^\infty ({[-1,1]^K})}\), we provide an explicit, computational construction of a ReLU DNN which attains accuracy \(\varepsilon \) (depending on N and \(\Vert f-\breve{f} \Vert _{L^\infty ({[-1,1]^K})}\)) uniformly with respect to the inputs. For analytic maps \(f: [-1,1]^{K}\rightarrow {\mathbb {R}}\), we prove exponential convergence of the expression and generalization errors of the constructed ReLU DNNs. Specifically, for every target accuracy \(\varepsilon \in (0,1)\), there exists N, depending also on f, such that the error of the construction algorithm with N evaluations of \(\breve{f}\) as input, measured in the norm of \(L^\infty ([-1,1]^{K};{\mathbb {R}})\), is smaller than \(\varepsilon \) up to an additive data-corruption bound \(\Vert f-\breve{f} \Vert _{L^\infty ({[-1,1]^K})}\) multiplied by a factor growing slowly with \(1/\varepsilon \), and the number of non-zero DNN weights grows polylogarithmically with respect to \(1/\varepsilon \). The algorithmic construction of the ReLU DNNs which realize the approximations is explicit and deterministic in terms of the function values of \(\breve{f}\) in tensorized Clenshaw–Curtis grids in \([-1,1]^K\). We illustrate the proposed methodology by a constructive algorithm for (offline) computation of posterior expectations in Bayesian PDE inversion.

Notes

  1. The error bounds have been derived under the assumption that the affine transformations in the DNNs are evaluated in exact arithmetic, without rounding.

References

  1. Adcock, B., Dexter, N.: The gap between theory and practice in function approximation with deep neural networks. SIAM J. Math. Data Sci. 3(2), 624–655 (2021)

  2. Arridge, S., Maass, P., Öktem, O., Schönlieb, C.-B.: Solving inverse problems using data-driven models. Acta Numer. 28, 1–174 (2019)

  3. Bölcskei, H., Grohs, P., Kutyniok, G., Petersen, P.: Optimal approximation with sparsely connected deep neural networks. SIAM J. Math. Data Sci. 1(1), 8–45 (2019)

  4. Boullé, N., Nakatsukasa, Y., Townsend, A.: Rational neural networks. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada (2020)

  5. Cheridito, P., Jentzen, A., Rossmannek, F.: Non-convergence of stochastic gradient descent in the training of deep neural networks. J. Complex. 64, 101540 (2021)

  6. Cohen, A., Schwab, C., Zech, J.: Shape holomorphy of the stationary Navier-Stokes Equations. SIAM J. Math. Anal. 50(2), 1720–1752 (2018)

  7. Dashti, M., Stuart, A.M.: The Bayesian approach to inverse problems. In: Handbook of Uncertainty Quantification, pp. 311–428. Springer, Cham (2017)

  8. Daws, J., Webster, C.: Analysis of deep neural networks with quasi-optimal polynomial approximation rates (2019). arXiv:1912.02302

  9. Dick, J., Gantner, R.N., Gia, Q.T.L., Schwab, C.: Multilevel higher-order quasi-Monte Carlo Bayesian estimation. Math. Models Methods Appl. Sci. 27(5), 953–995 (2017)

  10. Dick, J., Gantner, R.N., Gia, Q.T.L., Schwab, C.: Higher order quasi-Monte Carlo integration for Bayesian PDE inversion. Comput. Math. Appl. 77(1), 144–172 (2019)

  11. Dũng, D., Nguyen, V.K.: Deep ReLU neural networks in high-dimensional approximation. Neural Netw. 142, 619–635 (2021)

  12. E, W., Wang, Q.: Exponential convergence of the deep neural network approximation for analytic functions. Sci. China Math. 61(10), 1733–1740 (2018)

  13. Ehlich, H., Zeller, K.: Auswertung der Normen von Interpolationsoperatoren. Math. Ann. 164, 105–112 (1966)

  14. Elbrächter, D., Grohs, P., Jentzen, A., Schwab, C.: DNN expression rate analysis of high-dimensional PDEs: application to option pricing. Constr. Approx. (2021). Published online 6 May 2021

  15. Gaß, M., Glau, K., Mahlstedt, M., Mair, M.: Chebyshev interpolation for parametric option pricing. Finance Stoch. 22(3), 701–731 (2018)

  16. Grohs, P., Voigtlaender, F.: Proof of the theory-to-practice gap in deep learning via sampling complexity bounds for neural network approximation spaces. Technical report (2021). arXiv:2104.02746

  17. Henríquez, F., Schwab, C.: Shape holomorphy of the Calderón projector for the Laplacian in \({\mathbb{R}}^2\). Integral Equ. Oper. Theory 93(4), 43 (2021)

  18. Herrmann, L., Schwab, C., Zech, J.: Deep neural network expression of posterior expectations in Bayesian PDE inversion. Inverse Probl. 36(12), 125011 (2020)

  19. Herrmann, L., Schwab, C.: Multilevel quasi-Monte Carlo uncertainty quantification for advection-diffusion-reaction. In: Monte Carlo and Quasi-Monte Carlo Methods, Springer Proc. Math. Stat., vol. 324, pp. 31–67. Springer, Cham (2020)

  20. Hosseini, B., Nigam, N.: Well-posed Bayesian inverse problems: priors with exponential tails. SIAM/ASA J. Uncertain. Quantif. 5(1), 436–465 (2017)

  21. Jerez-Hanckes, C., Schwab, C., Zech, J.: Electromagnetic wave scattering by random surfaces: shape holomorphy. Math. Mod. Meth. Appl. Sci. 27(12), 2229–2259 (2017)

  22. Li, B., Tang, S., Yu, H.: Better approximations of high dimensional smooth functions by deep neural networks with rectified power units. Commun. Comput. Phys. 27(2), 379–411 (2019)

  23. Liang, S., Srikant, R.: Why deep neural networks for function approximation? In: Proc. of ICLR 2017, pp. 1–17 (2017). arXiv:1610.04161

  24. Lu, L., Jin, P., Karniadakis, G.E.: DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators (2020). arXiv:1910.03193

  25. Lye, K.O., Mishra, S., Ray, D.: Deep learning observables in computational fluid dynamics. J. Comput. Phys. 410, 109339 (2020)

  26. Mhaskar, H.N.: Approximation properties of a multilayered feedforward artificial neural network. Adv. Comput. Math. 1(1), 61–80 (1993)

  27. Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8(1), 164–177 (1996)

  28. Opschoor, J.A.A.: Ph.D. thesis, ETH Zürich (in preparation)

  29. Opschoor, J.A.A., Petersen, P.C., Schwab, C.: Deep ReLU networks and high-order finite element methods. Anal. Appl. 18(05), 715–770 (2020)

  30. Opschoor, J.A.A., Schwab, C., Zech, J.: Exponential ReLU DNN expression of holomorphic maps in high dimension. Constr. Approx. (2021). Published online 23 April 2021

  31. Petersen, P., Voigtlaender, F.: Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw. 108, 296–330 (2018)

  32. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)

  33. Rivlin, T.J.: The Chebyshev Polynomials. Wiley-Interscience [John Wiley & Sons], New York-London-Sydney (1974)

  34. Rolnick, D., Tegmark, M.: The power of deeper networks for expressing natural functions. In: International Conference on Learning Representations, (2018)

  35. Schwab, C., Zech, J.: Deep learning in high dimension: neural network expression rates for generalized polynomial chaos expansions in UQ. Anal. Appl. Singap. 17(1), 19–55 (2019)

  36. Tang, S., Li, B., Yu, H.: ChebNet: Efficient and stable constructions of deep neural networks with rectified power units using Chebyshev approximations. Technical report (2019). arXiv:1911.05467

  37. Trefethen, L.N.: Approximation Theory and Approximation Practice. Society for Industrial and Applied Mathematics, Philadelphia (2019)

  38. Yang, L., Meng, X., Karniadakis, G.E.: B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 425, 109913 (2021)

  39. Yarotsky, D.: Error bounds for approximations with deep ReLU networks. Neural Netw. 94, 103–114 (2017)

  40. Zech, J., Schwab, C.: Convergence rates of high dimensional Smolyak quadrature. ESAIM Math. Model. Numer. Anal. 54(4), 1259–1307 (2020)

Author information

Corresponding author

Correspondence to Lukas Herrmann.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Christoph Schwab acknowledges stimulating exchanges with participants in the Isaac Newton Institute’s Mathematics of Deep Learning (MDL) term, 1 July 2021 to 17 December 2021.

Appendix A: Constructive ReLU DNN Approximation of \(T_n\)

We present an emulation of the univariate Chebyšev polynomials \(T_n(x)\) of arbitrary degree by ReLU DNNs, to be developed in detail in [28], which closely follows the construction in [29] for univariate monomials. Specifically, we construct a DNN that approximates the Chebyšev polynomials of the first kind, denoted by \(\{T_\ell \}_{\ell \in {\mathbb {N}}_0}\). As was shown in [36] for DNNs with the RePU activation function \(\sigma _r(x) = (\max \{x,0\})^r\), \(r\in {\mathbb {N}}\), \(r\ge 2\), these polynomials can be approximated efficiently also by ReLU DNNs by exploiting the three-term recursion

$$\begin{aligned} \forall m,n\in {\mathbb {N}}_0: \quad T_{m+n} = 2T_{m}T_{n}-T_{|m-n|}, \qquad T_0(x) = 1, \quad T_1(x) = x, \quad \forall x\in {\mathbb {R}}. \end{aligned}$$
(A.1)

This recursion is specific to the \(T_n\) and follows from the addition rule for cosines.
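For illustration only (this is not part of the DNN construction), the recursion (A.1), applied with \(m = \lceil \ell /2 \rceil \) and \(n = \lfloor \ell /2 \rfloor \), already yields a doubling scheme for evaluating \(T_\ell \) in exact arithmetic. The following minimal Python sketch (with a hypothetical helper name, using NumPy) verifies this against the trigonometric definition \(T_\ell (x)=\cos (\ell \arccos x)\).

```python
import numpy as np

def cheb_by_recursion(ell, x):
    """Evaluate T_ell(x) via T_{m+n} = 2*T_m*T_n - T_{|m-n|} (cf. (A.1)),
    splitting ell as m = ceil(ell/2), n = floor(ell/2)."""
    if ell == 0:
        return np.ones_like(x)
    if ell == 1:
        return np.asarray(x, dtype=float)
    m, n = (ell + 1) // 2, ell // 2
    return (2.0 * cheb_by_recursion(m, x) * cheb_by_recursion(n, x)
            - cheb_by_recursion(m - n, x))

x = np.linspace(-1.0, 1.0, 1001)
for ell in range(33):
    exact = np.cos(ell * np.arccos(x))   # T_ell(x) = cos(ell * arccos(x)) on [-1, 1]
    assert np.max(np.abs(cheb_by_recursion(ell, x) - exact)) < 1e-10
```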

To construct the DNNs that approximate all Chebyšev polynomials of degree \(1,\ldots ,n\) on \({\hat{I}}{:}= (-1,1)\), we first construct inductively, for all \(k\in {\mathbb {N}}\), DNNs \(\{\varPsi ^{k}_{\delta }\}_{\delta \in (0,1)}\) with input dimension one and output dimension \(2^{k-1}+2\) with the following properties: denoting all components of the output, except for the first one, by \({\tilde{T}}_{\ell ,\delta } :=(\varPsi ^{k}_{\delta })_{2+\ell -2^{k-1}}\) for \(\ell \in \{2^{k-1},\ldots ,2^{k}\}\), it holds that

$$\begin{aligned} \varPsi ^{k}_{\delta }(x)&= \, \big ( x, {\tilde{T}}_{2^{k-1},\delta }(x), \ldots , {\tilde{T}}_{2^k,\delta }(x) \big ), x \in {\hat{I}}, \nonumber \\ \left\| T_\ell (x)-{\tilde{T}}_{\ell ,\delta }(x) \right\| _{W^{1,\infty }({\hat{I}})}&\le \, \delta , \ell \in \{2^{k-1},\ldots ,2^k\}. \end{aligned}$$
(A.2)

We only provide the DNN constructions; for proofs of the error bound and of the estimates on the network depth and size in Lemma 3.1, we refer to [28].

Induction basis. Let \(\delta \in (0,1)\) be arbitrary and define \(L_1 :={{\,\mathrm{depth}\,}}({{\tilde{\prod }}}_{\delta /4,1}^2)\). Also, define the matrix \(A :=[1,1]^\top \in {\mathbb {R}}^{2\times 1}\) and the vector \(b' :=[-1]\in {\mathbb {R}}^1\), and let \(A_i\), \(b_i\), \(i=1,\ldots ,L_1 +1\) denote the weights and biases of \({{\tilde{\prod }}}_{\delta /4,1}^2\) as in Proposition 3.2. Then we define

$$\begin{aligned} \varPsi ^{1}_{\delta } :=\left( \varPhi ^{\mathrm{Id}}_{1,L_1} , \varPhi ^{\mathrm{Id}}_{1,L_1} , \varPhi \right) , \end{aligned}$$

where the weights and biases of \(\varPhi \) are \(A_1A, A_2,\ldots ,A_{L_1},2A_{L_1+1}\) resp. \(b_1,\ldots ,b_{L_1},2b_{L_1+1}+b'\). It follows that \((\varPsi ^{1}_{\delta }(x))_{1} = x\), \({\tilde{T}}_{1,\delta }(x) :=(\varPsi ^{1}_{\delta }(x))_{2} = x = T_1(x)\) and \({\tilde{T}}_{2,\delta }(x) :=(\varPsi ^{1}_{\delta }(x))_{3} = 2({{\tilde{\prod }}}_{\delta /4,1}^2)(x,x)-1\) for all \(x\in {\hat{I}}{:}= (-1,1)\).
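As a plain-arithmetic sanity check of the basis step (an idealization made only for this illustration: the approximate product network \({{\tilde{\prod }}}_{\delta /4,1}^2\) is replaced by exact multiplication, so the accuracy \(\delta \) plays no role), the third output component reduces to \(2x\cdot x-1=T_2(x)\):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 101)
psi1 = np.stack([
    x,                     # (Psi^1_delta(x))_1 = x
    x,                     # T~_{1,delta}(x) = T_1(x) = x
    2.0 * (x * x) - 1.0,   # T~_{2,delta}(x): 2 * product(x, x) - 1, here with exact product
])
assert np.allclose(psi1[2], np.cos(2.0 * np.arccos(x)))   # equals T_2(x)
```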

Induction hypothesis (IH). For all \(\delta \in (0,1)\) and \(k\in {\mathbb {N}}\), let \(\theta :=2^{-2k-4} \delta \), and assume that there exists a DNN \(\varPsi ^{k}_{\theta }\) which satisfies Eq. (A.2) with \(\theta \) instead of \(\delta \).

Induction step. For \(\delta \) and k as in (IH), we show that (A.2) holds with \(k+1\) instead of k and with accuracy \(\delta \). We define, for \(\varPhi ^{1,k}\) and \(\varPhi ^{2,k}_{\delta }\) introduced below,

$$\begin{aligned} \varPsi ^{k+1}_{\delta } :=\varPhi ^{2,k}_{\delta } \circ \varPhi ^{1,k} \circ \varPsi ^{k}_{\theta }. \end{aligned}$$
(A.3)

For a sketch of the network structure, see Fig. 4. The DNN \(\varPhi ^{1,k}\) of depth 0 implements the linear map

$$\begin{aligned} {\mathbb {R}}^{2^{k-1}+2}\rightarrow {\mathbb {R}}^{2^{k+1}+2}:(z_1,\ldots ,z_{2^{k-1}+2}) \mapsto (z_1,z_{2^{k-1}+2},z_{2},z_{3},z_3,z_3,z_3,z_4,z_4,z_4,z_4,z_5, \\ \ldots ,z_{2^{k-1}+1},z_{2^{k-1}+2},z_{2^{k-1}+2},z_{2^{k-1}+2} ). \end{aligned}$$

Denoting its weights and biases by \(A^{1,k},b^{1,k}\), it holds that \(b^{1,k} = 0\) and

$$\begin{aligned} (A^{1,k})_{m,i} = {\left\{ \begin{array}{ll} 1 &{} \text {if } m = 1, i = 1, \\ 1 &{} \text {if } m = 2, i = 2^{k-1} + 2, \\ 1 &{} \text {if } m \in \{3,\ldots ,2^{k+1}+2\}, i = \lceil \tfrac{m+5}{4} \rceil , \\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$
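The case formula for \(A^{1,k}\) can be checked mechanically. The following sketch (a hypothetical NumPy helper, not part of the paper) assembles \(A^{1,k}\) and confirms that multiplication by it reproduces the fan-out map displayed above.

```python
import math
import numpy as np

def fanout_matrix(k):
    """Assemble A^{1,k} of size (2^{k+1}+2) x (2^{k-1}+2) from the case formula."""
    rows, cols = 2 ** (k + 1) + 2, 2 ** (k - 1) + 2
    A = np.zeros((rows, cols))
    A[0, 0] = 1.0                                    # m = 1, i = 1
    A[1, cols - 1] = 1.0                             # m = 2, i = 2^{k-1}+2
    for m in range(3, rows + 1):
        A[m - 1, math.ceil((m + 5) / 4) - 1] = 1.0   # i = ceil((m+5)/4)
    return A

k = 3
z = np.arange(1.0, 2 ** (k - 1) + 3)                 # placeholder inputs z_1, ..., z_{2^{k-1}+2}
out = fanout_matrix(k) @ z
# Expected pattern: z_1, z_{2^{k-1}+2}, z_2, then z_3, ..., z_{2^{k-1}+1} four times each,
# then z_{2^{k-1}+2} three more times.
expected = np.concatenate(([z[0], z[-1], z[1]], np.repeat(z[2:-1], 4), [z[-1]] * 3))
assert np.array_equal(out, expected)
```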

With \(L_{\theta } :={{\,\mathrm{depth}\,}}({{\tilde{\prod }}}_{\theta ,2}^2)\) we define

$$\begin{aligned} \varPhi ^{2,k}_{\delta } :=\varPhi \circ \left( \varPhi ^{\mathrm{Id}}_{2,L_{\theta }} , {{\tilde{\prod }}}_{\theta ,2}^2 , \ldots , {{\tilde{\prod }}}_{\theta ,2}^2 \right) _{\mathrm {d}}, \end{aligned}$$

containing \(2^{k}\) \({{\tilde{\prod }}}_{\theta ,2}^2\)-networks, with \(\varPhi \) denoting the depth 0 network with weights and biases \(A^{2,k}\in {\mathbb {R}}^{(2^k+2)\times (2^k+2)}\) and \(b^{2,k}\in {\mathbb {R}}^{2^k+2}\) defined as

$$\begin{aligned} (A^{2,k})_{m,i} :={\left\{ \begin{array}{ll} 1 &{} \text {if } m = i \le 2, \\ 2 &{} \text {if } m = i \ge 3, \\ -1 &{} \text {if } m\ge 3 \text { is odd}, i=1, \\ 0 &{} \text {else}, \end{array}\right. } \qquad \qquad \qquad (b^{2,k})_m = {\left\{ \begin{array}{ll} -1 &{} \text {if } m\ge 3 \text { is even}, \\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$

The network \(\varPsi ^{k+1}_{\delta }\) defined in Eq. (A.3) realizes

$$\begin{aligned} (\varPsi ^{k+1}_{\delta }(x))_{1}&= x, \qquad \text { for } x\in {\hat{I}}, \end{aligned}$$
(A.4)
$$\begin{aligned} (\varPsi ^{k+1}_{\delta }(x))_{2}&= {\tilde{T}}_{2^k,\theta }(x), \qquad \text { for } x\in {\hat{I}}, \end{aligned}$$
(A.5)
$$\begin{aligned} (\varPsi ^{k+1}_{\delta }(x))_{\ell +2-2^k}&= 2{{\tilde{\prod }}}_{\theta ,2}^2\left( {\tilde{T}}_{\lceil \ell /2 \rceil ,\theta }(x), {\tilde{T}}_{\lfloor \ell /2 \rfloor ,\theta }(x)\right) - x^{\lceil \ell /2 \rceil - \lfloor \ell /2 \rfloor }, \\&\qquad \qquad \qquad \qquad \text { for } x\in {\hat{I}}\text { and } \ell \in \{2^k+1,\ldots ,2^{k+1}\},\nonumber \end{aligned}$$
(A.6)

where \(x^{\lceil \ell /2 \rceil - \lfloor \ell /2 \rfloor }=x=T_1(x)\) for odd \(\ell \) and \(x^{\lceil \ell /2 \rceil - \lfloor \ell /2 \rfloor }=1=T_0(x)\) for even \(\ell \). For \(\ell \in \{2^k+1,\ldots ,2^{k+1}\}\) and \(x\in {\hat{I}}\), the right-hand side of (A.6) will be denoted by

$$\begin{aligned} {\tilde{T}}_{\ell ,\delta }(x) :=&\, (\varPsi ^{k+1}_{\delta }(x))_{\ell +2-2^k}. \end{aligned}$$

This finishes the construction of \(\varPsi ^{k+1}_{\delta }\).
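To make the induction step concrete, the following sketch performs the passage from level k to level \(k+1\) in exact arithmetic, i.e. with the product networks \({{\tilde{\prod }}}_{\theta ,2}^2\) replaced by exact multiplication. This is an idealization for illustration only (the helper name is hypothetical); the DNN construction uses the approximate products from Proposition 3.2 and therefore carries the tolerance \(\delta \).

```python
import numpy as np

def level_up(psi_k, x):
    """One induction step (A.3)-(A.6) with exact products:
    map (x, T_{2^{k-1}}, ..., T_{2^k}) to (x, T_{2^k}, ..., T_{2^{k+1}})."""
    prev = psi_k[1:]                       # T_{2^{k-1}}, ..., T_{2^k}
    half = len(prev) - 1                   # half = 2^{k-1}
    out = [x, prev[-1]]                    # components (A.4) and (A.5)
    for ell in range(2 * half + 1, 4 * half + 1):        # ell = 2^k + 1, ..., 2^{k+1}
        hi, lo = (ell + 1) // 2, ell // 2
        prod = prev[hi - half] * prev[lo - half]          # T_{ceil(ell/2)} * T_{floor(ell/2)}
        correction = x if ell % 2 == 1 else 1.0           # T_1(x) for odd ell, T_0(x) for even
        out.append(2.0 * prod - correction)               # (A.6)
    return out

x = np.linspace(-1.0, 1.0, 201)
psi = [x, x, 2.0 * x * x - 1.0]            # level k = 1: (x, T_1, T_2)
for k in range(1, 5):                      # build levels k + 1 = 2, ..., 5
    psi = level_up(psi, x)
    for d, t in zip(range(2 ** k, 2 ** (k + 1) + 1), psi[1:]):
        assert np.allclose(t, np.cos(d * np.arccos(x)))   # matches T_d up to rounding
```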

Fig. 4

Sketch of \(\varPsi ^{k+1}_\delta \) for some \(k\in {\mathbb {N}}\) and \(\delta \in (0,1)\), inductively constructed from \(\varPsi ^{k}_{\theta }\) with \(\theta = 2^{-2k-4} \delta \). The subnetwork \(\varPhi ^{1,k}\) realizes a linear map, correctly coupling the output of \(\varPsi ^{k}_\theta \) to the input of \(\varPhi ^{2,k}_\delta \). The subnetwork \(\varPhi ^{2,k}_\delta \) acts as the identity on the first two inputs, and as an approximate multiplication from Proposition 3.2 on pairs of the remaining inputs. The outputs are the input x and approximations of the Chebyšev polynomials of degree \(2^k, \ldots , 2^{k+1}\), with accuracy \(\delta \)

Next, we construct \(\varPhi ^{{\text {Cheb}},n}_{\delta }\) as in Lemma 3.1.

If \(n=1\), for all \(\delta \in (0,1)\) we define \(\varPhi ^{{\text {Cheb}},n}_{\delta }{:}= ((A,b))\), where \(A {:}= [1] \in {\mathbb {R}}^{1\times 1}\) and \(b {:}= [0] \in {\mathbb {R}}^1\).

If \(n\ge 2\), let \(k{:}=\lceil \log _2(n)\rceil \). See Fig. 5 for a sketch of the DNN construction below. We use the networks \(\{\varPsi ^{j}_{\delta }\}_{\delta \in (0,1),j\in \{1,\ldots ,k\}}\) constructed above and take \(\{\ell _j\}_{j=1}^k\subset {\mathbb {N}}\) such that \({{\,\mathrm{depth}\,}}\left( \varPsi ^k_{\delta } \right) +1 = {{\,\mathrm{depth}\,}}\left( \varPsi ^j_{\delta } \right) + \ell _j\) for \(j=1,\ldots ,k\), and thus \(\ell _j\le \max _{j=1}^k {{\,\mathrm{depth}\,}}\left( \varPsi ^j_{\delta } \right) = {{\,\mathrm{depth}\,}}\left( \varPsi ^k_{\delta } \right) \). We define

$$\begin{aligned} \varPhi ^{{\text {Cheb}},n}_{\delta } :=\varPhi ^{3,n} \circ \left( \varPsi ^1_{\delta }\circ \varPhi ^{\mathrm{Id}}_{1,\ell _1},\ldots , \varPsi ^k_{\delta }\circ \varPhi ^{\mathrm{Id}}_{1,\ell _k}\right) , \end{aligned}$$

where the DNN \(\varPhi ^{3,n}\) of depth 0 emulates the linear map \({\mathbb {R}}^{2^k+2k-1}\rightarrow {\mathbb {R}}^n\) satisfying

$$\begin{aligned} \varPhi ^{3,n}(z_1,\ldots ,z_{2^k+2k-1})_1 = z_2, \qquad \varPhi ^{3,n}(z_1,\ldots ,z_{2^k+2k-1})_2 = z_3, \\ \text { and for all}~ \ell =3,\ldots ,n, \text { with } j {:}= \lceil \log _2(\ell )\rceil : \qquad \varPhi ^{3,n}(z_1,\ldots ,z_{2^k+2k-1})_\ell = z_{\ell +2j-1}. \end{aligned}$$

The realization satisfies

$$\begin{aligned} (\varPhi ^{{\text {Cheb}},n}_{\delta }(x) )_\ell = {\tilde{T}}_{\ell ,\delta }(x), \qquad x \in {\hat{I}}, \, \ell \in \{1,\ldots ,n\}. \end{aligned}$$
Fig. 5

Sketch of \(\varPhi ^{{\text {Cheb}},n}_\delta \) for some \(n\in {\mathbb {N}}\) and \(\delta \in (0,1)\), constructed from identity networks and the previously described \(\varPsi ^1_\delta ,\ldots ,\varPsi ^k_\delta \), with \(k {:}= \lceil \log _2(n) \rceil \). The subnetwork \(\varPhi ^{3,n}\) realizes a linear map that selects the desired approximations of univariate Chebyšev polynomials from the outputs of the preceding layer
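For completeness, the index bookkeeping realized by \(\varPhi ^{3,n}\) can also be spelled out in code. In the self-contained NumPy sketch below (exact Chebyšev values \(\cos (\ell \arccos x)\) stand in for the DNN outputs, and the helper names are hypothetical), the concatenated outputs of \(\varPsi ^1_\delta ,\ldots ,\varPsi ^k_\delta \) are assembled and component \(\ell +2j-1\) with \(j=\lceil \log _2(\ell )\rceil \) is selected, recovering \(T_1,\ldots ,T_n\).

```python
import math
import numpy as np

n = 11
k = math.ceil(math.log2(n))
x = np.linspace(-1.0, 1.0, 101)

def T(d):                                  # exact T_d, standing in for the DNN approximations
    return np.cos(d * np.arccos(x))

# Concatenated outputs of Psi^1, ..., Psi^k: block j = (x, T_{2^{j-1}}, ..., T_{2^j}),
# giving 2^k + 2k - 1 components in total.
z = []
for j in range(1, k + 1):
    z.append(x)
    z.extend(T(d) for d in range(2 ** (j - 1), 2 ** j + 1))
assert len(z) == 2 ** k + 2 * k - 1

def select(ell):
    """Phi^{3,n}: pick T_ell from position ell + 2j - 1 with j = ceil(log2(ell))
    (for ell = 1, 2 this yields positions 2 and 3, as in the text)."""
    j = max(math.ceil(math.log2(ell)), 1)
    return z[(ell + 2 * j - 1) - 1]        # shift to 0-based indexing

for ell in range(1, n + 1):
    assert np.allclose(select(ell), T(ell))
```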

Remark A.1

The subnetwork \(\varPsi ^k_\delta \) of \(\varPhi ^{{\text {Cheb}},n}_\delta \) approximates all univariate Chebyšev polynomials of degree up to \(2^k\), even when \(n<2^k\). This causes the “step-like” behavior of the network size in Figs. 2 and 3. This step-wise growth can easily be prevented by removing from \(\varPsi ^k_\delta \) the product networks \({{\tilde{\prod }}}_{\theta ,2}^2\) that compute \({\tilde{T}}_{\ell ,\delta }(x)\) for \(\ell >n\), and by modifying \(\varPhi ^{3,n}\) accordingly.

Cite this article

Herrmann, L., Opschoor, J.A.A. & Schwab, C. Constructive Deep ReLU Neural Network Approximation. J Sci Comput 90, 75 (2022). https://doi.org/10.1007/s10915-021-01718-2
