Abstract
We consider the problem of approximating functions from scattered data using linear superpositions of non-linearly parameterized functions. We show how the total error (generalization error) can be decomposed into two parts: an approximation part, due to the finite number of parameters of the approximation scheme used, and an estimation part, due to the finite number of data available. We bound each of these two parts under certain assumptions and prove a general bound for a class of approximation schemes that includes radial basis functions and multilayer perceptrons.
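The two-part decomposition stated above can be sketched with a triangle inequality (the notation below is ours, introduced only for illustration, and is not the paper's exact statement: $f$ is the target function, $f_n$ its best approximant with $n$ parameters, and $\hat f_{n,N}$ the estimator computed from $N$ data points):

```latex
% Schematic error decomposition (illustrative notation, not the paper's):
%   f            : target function
%   f_n          : best n-parameter approximant
%   \hat f_{n,N} : estimator obtained from N scattered data points
\[
  \underbrace{\| f - \hat f_{n,N} \|}_{\text{generalization error}}
  \;\le\;
  \underbrace{\| f - f_n \|}_{\substack{\text{approximation error}\\ (n \text{ parameters})}}
  \;+\;
  \underbrace{\| f_n - \hat f_{n,N} \|}_{\substack{\text{estimation error}\\ (N \text{ data points})}}
\]
```

The first term shrinks as the approximation scheme grows ($n \to \infty$), while the second shrinks as more data become available ($N \to \infty$); the bounds in the paper control each term separately.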
Niyogi, P., Girosi, F. Generalization bounds for function approximation from scattered noisy data. Advances in Computational Mathematics 10, 51–80 (1999). https://doi.org/10.1023/A:1018966213079