
Nonasymptotic Bounds on the L2 Error of Neural Network Regression Estimates

Annals of the Institute of Statistical Mathematics

Abstract

The estimation of multivariate regression functions from bounded i.i.d. data is considered. The L2 error with integration with respect to the design measure is used as an error criterion. The distribution of the design is assumed to be concentrated on a finite set. Neural network estimates are defined by minimizing the empirical L2 risk over various sets of feedforward neural networks. Nonasymptotic bounds on the L2 error of these estimates are presented. The results imply that neural networks are able to adapt to additive regression functions and to regression functions which are a sum of ridge functions, and hence are able to circumvent the curse of dimensionality in these cases.
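For orientation, here is a minimal sketch of the setup the abstract refers to, written in the notation commonly used for nonparametric least squares regression; the exact definitions, network classes and constants used in the paper may differ.

```latex
% Assumed standard notation (not quoted from the paper): (X, Y) is an
% R^d x R valued random vector with |Y| bounded, m(x) = E{Y | X = x} is the
% regression function, and mu is the design measure (the distribution of X),
% here assumed to be concentrated on a finite set.
\[
  m(x) = \mathbf{E}\{\, Y \mid X = x \,\},
  \qquad
  \int |m_n(x) - m(x)|^2 \, \mu(dx)
  \quad \text{($L_2$ error of an estimate $m_n$)}.
\]
% The neural network estimate is the least squares (empirical L2 risk)
% minimizer over a class F_n of feedforward networks:
\[
  m_n(\cdot) = \arg\min_{f \in \mathcal{F}_n}
  \frac{1}{n} \sum_{i=1}^{n} \bigl| f(X_i) - Y_i \bigr|^2 .
\]
```

The short Python sketch below is only an illustration of this recipe, not the estimator analysed in the paper: it fits a one-hidden-layer sigmoid network by gradient descent on the empirical L2 risk (a practical stand-in for the exact minimizer the theory assumes) on data whose design is concentrated on a finite set, and then evaluates the L2 error with respect to that discrete design measure. All names, sizes and constants are illustrative choices.

```python
# Illustrative only: least squares fitting of a one-hidden-layer sigmoid
# network on data with a design concentrated on a finite set, followed by
# evaluation of the L2 error with respect to the discrete design measure mu.
import numpy as np

rng = np.random.default_rng(0)

# Finite design set in R^d and a (uniform) design measure mu on it.
d, m_points, n = 2, 25, 500
design_points = rng.uniform(-1.0, 1.0, size=(m_points, d))
mu = np.full(m_points, 1.0 / m_points)

def m_true(x):
    # An additive regression function, the kind of structure the bounds adapt to.
    return np.sin(np.pi * x[:, 0]) + x[:, 1] ** 2

# Bounded i.i.d. sample: X_i drawn from mu, Y_i = m(X_i) + bounded noise.
idx = rng.choice(m_points, size=n, p=mu)
X = design_points[idx]
Y = m_true(X) + rng.uniform(-0.1, 0.1, size=n)

# One-hidden-layer feedforward network with k sigmoid hidden units.
k = 10
W = rng.normal(scale=0.5, size=(d, k))   # input-to-hidden weights
b = np.zeros(k)                          # hidden biases
c = np.zeros(k)                          # hidden-to-output weights
c0 = 0.0                                 # output bias

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def predict(x, W, b, c, c0):
    return sigmoid(x @ W + b) @ c + c0

# Gradient descent on (1/2n) * sum of squared residuals, i.e. on the
# empirical L2 risk up to a constant factor.
lr = 0.05
for _ in range(5000):
    H = sigmoid(X @ W + b)                     # n x k hidden activations
    resid = H @ c + c0 - Y                     # residuals f(X_i) - Y_i
    grad_c = H.T @ resid / n
    grad_c0 = resid.mean()
    dH = (resid[:, None] * c) * H * (1.0 - H)  # backprop through the sigmoid
    grad_W = X.T @ dH / n
    grad_b = dH.mean(axis=0)
    W -= lr * grad_W
    b -= lr * grad_b
    c -= lr * grad_c
    c0 -= lr * grad_c0

# L2 error with integration with respect to the design measure:
# sum over the finite design set of |m_n(x) - m(x)|^2 * mu({x}).
err = np.sum((predict(design_points, W, b, c, c0) - m_true(design_points)) ** 2 * mu)
print(f"L2 error w.r.t. the design measure: {err:.4f}")
```

The bounds in the paper concern the exact minimizer over suitably constrained network classes; a gradient descent fit like the one above need not attain that minimum, so it should be read as a numerical illustration of the error criterion rather than of the theorems.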



Author information

Corresponding author

Correspondence to Michael Hamers.

About this article

Cite this article

Hamers, M., Kohler, M. Nonasymptotic Bounds on the L2 Error of Neural Network Regression Estimates. Ann Inst Stat Math 58, 131–151 (2006). https://doi.org/10.1007/s10463-005-0005-9

