Abstract
We study the variation space corresponding to a dictionary of functions in \(L^2(\Omega )\) for a bounded domain \(\Omega \subset {\mathbb {R}}^d\). Specifically, we compare the variation space, which is defined in terms of a convex hull, with related notions based on integral representations. This allows us to show that three spaces important in the approximation theory of shallow neural networks, namely the Barron space, the spectral Barron space, and the Radon BV space, are in fact variation spaces with respect to certain natural dictionaries.
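For concreteness, the following is a standard way to make the convex-hull definition precise; the notation \(\mathcal {K}_1({\mathbb {D}})\) and the generic dictionary \({\mathbb {D}}\) are used here for illustration and may differ from the paper's own conventions. Given a dictionary \({\mathbb {D}}\subset L^2(\Omega )\) with uniformly bounded norms, the variation norm is the gauge (Minkowski functional) of the closed, symmetric convex hull of \({\mathbb {D}}\):
\[
\Vert f\Vert _{\mathcal {K}_1({\mathbb {D}})} = \inf \bigl \{t>0 : f\in t\,\overline{\mathrm {conv}}(\pm {\mathbb {D}})\bigr \},\qquad \mathcal {K}_1({\mathbb {D}})=\bigl \{f\in L^2(\Omega ): \Vert f\Vert _{\mathcal {K}_1({\mathbb {D}})}<\infty \bigr \},
\]
where the closure is taken in \(L^2(\Omega )\). For shallow networks with activation \(\sigma \), a natural dictionary consists of the single neurons \(x\mapsto \sigma (\omega \cdot x+b)\) (suitably normalized so that the dictionary is bounded), and the abstract's claim is that the Barron, spectral Barron, and Radon BV spaces arise as variation spaces for such natural choices.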
Acknowledgements
We would like to thank Professors Russel Caflisch, Ronald DeVore, Weinan E, Albert Cohen, Stephan Wojtowytsch, and Jason Klusowski for helpful discussions. We would also like to thank the anonymous reviewers for their helpful comments. This work was supported by the Verne M. Willaman Chair Fund at the Pennsylvania State University and by the National Science Foundation (Grant Nos. DMS-1819157 and DMS-2111387).
Additional information
Communicated by Zuowei Shen.
About this article
Cite this article
Siegel, J.W., Xu, J. Characterization of the Variation Spaces Corresponding to Shallow Neural Networks. Constr Approx 57, 1109–1132 (2023). https://doi.org/10.1007/s00365-023-09626-4