Abstract
We study the approximation properties of the variation spaces corresponding to shallow neural networks with respect to the uniform norm. Specifically, we consider the spectral Barron space, which consists of the convex hull of decaying Fourier modes, and the convex hull of indicator functions of half-spaces, which corresponds to shallow neural networks with a sigmoidal activation function. Up to logarithmic factors, we determine the metric entropy and nonlinear dictionary approximation rates of these spaces with respect to the uniform norm. Combined with previous results for the \(L^2\)-norm, this also yields the metric entropy, up to logarithmic factors, with respect to any \(L^p\)-norm with \(1\le p\le \infty\). In addition, we study the approximation rates for high-order spectral Barron spaces using shallow neural networks with the ReLU\(^k\) activation function. Specifically, we show that for a sufficiently high-order spectral Barron space, ReLU\(^k\) networks achieve an approximation rate of \(n^{-(k+1)}\) with respect to the uniform norm.
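For concreteness, the final claim can be restated schematically as follows; the notation \(\Sigma_n^k\) for the class of shallow ReLU\(^k\) networks with \(n\) neurons and \(\mathcal{B}^s\) for the spectral Barron space of order \(s\) is introduced here only for illustration, and the constant \(C\) and the precise threshold on \(s\) are left unspecified:
\[
\inf_{f_n \in \Sigma_n^k} \| f - f_n \|_{L^\infty(\Omega)} \le C\,\| f \|_{\mathcal{B}^s}\, n^{-(k+1)},
\qquad
\Sigma_n^k := \Big\{ \sum_{i=1}^n a_i\,\sigma_k(\omega_i \cdot x + b_i) \Big\},\quad \sigma_k(t) = \max(0,t)^k,
\]
for \(f \in \mathcal{B}^s\) with \(s\) sufficiently large.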
Data availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
We would like to thank Professors Russel Caflisch, Ronald DeVore, Weinan E, Albert Cohen, Stephan Wojtowytsch, and Jason Klusowski for helpful discussions. This work was supported by the Verne M. Willaman Chair Fund at the Pennsylvania State University and the National Science Foundation (Grant Nos. DMS-1819157 and DMS-2111387).
Cite this article
Ma, L., Siegel, J.W. & Xu, J. Uniform approximation rates and metric entropy of shallow neural networks. Res Math Sci 9, 46 (2022). https://doi.org/10.1007/s40687-022-00346-y