
Uniform approximation rates and metric entropy of shallow neural networks

Research article · Published in Research in the Mathematical Sciences

Abstract

We study the approximation properties, with respect to the uniform norm, of the variation spaces corresponding to shallow neural networks. Specifically, we consider the spectral Barron space, which is the convex hull of decaying Fourier modes, and the convex hull of indicator functions of half-spaces, which corresponds to shallow neural networks with a sigmoidal activation function. Up to logarithmic factors, we determine the metric entropy and the nonlinear dictionary approximation rates of these spaces with respect to the uniform norm. Combined with previous results for the \(L^2\)-norm, this also determines the metric entropy, up to logarithmic factors, with respect to any \(L^p\)-norm with \(1\le p\le \infty \). In addition, we study approximation rates for high-order spectral Barron spaces by shallow neural networks with the ReLU\(^k\) activation function. Specifically, we show that for a spectral Barron space of sufficiently high order, ReLU\(^k\) networks achieve an approximation rate of \(n^{-(k+1)}\) with respect to the uniform norm.
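For context, the following records the standard conventions behind these statements; the paper's precise normalization may differ, so read this as a sketch of the usual definitions rather than the paper's exact formulation. The spectral Barron space of order \(s\) on a domain \(\Omega \subset \mathbb{R}^d\) is commonly defined via the Fourier transform of an extension \(g\) of \(f\),

\[ \|f\|_{\mathcal{B}^s(\Omega)} := \inf_{g|_\Omega = f} \int_{\mathbb{R}^d} (1+|\xi|)^s \, |\hat{g}(\xi)| \, d\xi, \]

and a shallow ReLU\(^k\) network with \(n\) neurons takes the form

\[ f_n(x) = \sum_{i=1}^n a_i \max(0, \omega_i \cdot x + b_i)^k. \]

In this notation, the final claim of the abstract says that if \(s\) is sufficiently large relative to \(k\), then every \(f\) with \(\|f\|_{\mathcal{B}^s(\Omega)} < \infty\) admits networks \(f_n\) of this form satisfying \(\|f - f_n\|_{L^\infty(\Omega)} \lesssim \|f\|_{\mathcal{B}^s(\Omega)} \, n^{-(k+1)}\).

As a purely numerical illustration, not the construction analyzed in the paper, the short script below fits a random-feature shallow ReLU\(^k\) network to a single Fourier mode by least squares and measures the error in the uniform norm on a grid. All names and parameter choices here (relu_k, the target frequency, the weight distribution) are our own for this sketch, and this fitting procedure is not guaranteed to realize the \(n^{-(k+1)}\) rate.

import numpy as np

def relu_k(t, k):
    # ReLU^k activation: max(t, 0)^k
    return np.maximum(t, 0.0) ** k

# Target: a single Fourier mode, which lies in every spectral Barron space.
def f(x):
    return np.cos(3 * np.pi * x)

k = 2
x = np.linspace(-1.0, 1.0, 2000)
rng = np.random.default_rng(0)
for n in (10, 20, 40, 80):
    # Random inner weights and biases: a random-feature sketch, not the
    # paper's dictionary construction, so any observed rate is illustrative
    # at best.
    w = rng.choice([-1.0, 1.0], size=n)
    b = rng.uniform(-1.0, 1.0, size=n)
    Phi = relu_k(np.outer(x, w) + b, k)             # design matrix, shape (2000, n)
    a, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)  # outer weights by least squares
    err = np.max(np.abs(Phi @ a - f(x)))            # uniform (sup-norm) error on the grid
    print(f"n = {n:3d}, uniform error = {err:.2e}")

Running the loop for increasing n shows the grid sup-norm error decreasing; this sup-norm error is exactly the quantity the paper's uniform approximation rates control.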


Data availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.


Acknowledgements

We would like to thank Professors Russel Caflisch, Ronald DeVore, Weinan E, Albert Cohen, Stephan Wojtowytsch, and Jason Klusowski for helpful discussions. This work was supported by the Verne M. Willaman Chair Fund at the Pennsylvania State University and by the National Science Foundation (Grant Nos. DMS-1819157 and DMS-2111387).

Author information

Correspondence to Limin Ma, Jonathan W. Siegel or Jinchao Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Cite this article

Ma, L., Siegel, J.W. & Xu, J. Uniform approximation rates and metric entropy of shallow neural networks. Res Math Sci 9, 46 (2022). https://doi.org/10.1007/s40687-022-00346-y

