Machine Learning, Volume 14, Issue 1, pp 115–133

Approximation and Estimation Bounds for Artificial Neural Networks

  • Andrew R. Barron


For a common class of artificial neural networks, the mean integrated squared error between the estimated network and a target function f is shown to be bounded by \(O\left( \frac{C_f^2}{n} \right) + O\left( \frac{nd}{N}\log N \right)\), where n is the number of nodes, d is the input dimension of the function, N is the number of training observations, and \(C_f\) is the first absolute moment of the Fourier magnitude distribution of f. The two contributions to this total risk are the approximation error and the estimation error. Approximation error refers to the distance between the target function and the closest neural network function of a given architecture, and estimation error refers to the distance between this ideal network function and an estimated network function. With \(n \sim C_f \left( N/(d \log N) \right)^{1/2}\) nodes, the order of the bound on the mean integrated squared error is optimized to be \(O\left( C_f \left( (d/N)\log N \right)^{1/2} \right)\). The bound demonstrates surprisingly favorable properties of network estimation compared to traditional series and nonparametric curve estimation techniques in the case that d is moderately large. Similar bounds are obtained when the number of nodes n is not preselected as a function of \(C_f\) (which is generally not known a priori), but rather the number of nodes is optimized from the observed data by the use of a complexity regularization or minimum description length criterion. The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.
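The trade-off stated in the abstract can be made concrete with a short numerical sketch. The functions below evaluate the two terms of the risk bound, \(C_f^2/n\) and \((nd/N)\log N\), with all implied constants set to 1, and compute the balancing node count \(n \sim C_f (N/(d\log N))^{1/2}\). The particular values of \(C_f\), d, and N are hypothetical, chosen only for illustration.

```python
import math

def risk_bound(c_f, n, d, N):
    """Sum of the two terms in the abstract's bound, with constants taken as 1:
    approximation error C_f^2 / n  +  estimation error (n d / N) log N."""
    approx_error = c_f**2 / n                 # distance to best n-node network
    estim_error = (n * d / N) * math.log(N)   # cost of fitting n nodes from N samples
    return approx_error + estim_error

def optimal_nodes(c_f, d, N):
    """Node count n ~ C_f (N / (d log N))^{1/2} at which the two terms balance."""
    return c_f * math.sqrt(N / (d * math.log(N)))

# Hypothetical example: C_f = 10, input dimension d = 8, N = 10,000 observations.
c_f, d, N = 10.0, 8, 10_000
n_star = optimal_nodes(c_f, d, N)
print(f"balancing node count: {n_star:.1f}")
print(f"total risk bound at n*: {risk_bound(c_f, n_star, d, N):.4f}")
# At n*, each term equals C_f ((d/N) log N)^{1/2}, the optimized order in the abstract.
print(f"C_f ((d/N) log N)^(1/2) = {c_f * math.sqrt((d / N) * math.log(N)):.4f}")
```

Note how the estimation term grows with n while the approximation term shrinks, which is why an intermediate network size minimizes the total bound.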

Keywords: neural nets, approximation theory, estimation theory, complexity regularization, statistical risk



Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • Andrew R. Barron
    Department of Statistics, Yale University, New Haven
