A result relating convex n-widths to covering numbers with some applications to neural networks

  • Jonathan Baxter
  • Peter Bartlett
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1208)


In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or “features” is known to be hard. Typically, the worst-case error of the best basis set decays only as fast as Θ(n^{−1/d}), where n is the number of basis functions and d is the input dimension. However, there are many examples of high-dimensional pattern recognition problems (such as face recognition) where linear combinations of small sets of features do solve the problem well. Hence these function classes do not suffer from the “curse of dimensionality” associated with more general classes. It is natural, then, to look for characterizations of high-dimensional function classes that are nevertheless approximated well by linear combinations of small sets of features. In this paper we give a general result relating the approximation error of a function class to the covering number of its “convex core”. For one-hidden-layer neural networks, covering numbers of the class of functions computed by a single hidden node upper-bound the covering numbers of the convex core. Hence, using standard results, we obtain upper bounds on the approximation rate of neural network classes.
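As a point of reference for the quantities named in the abstract, the following is a minimal sketch of the standard definitions of the ε-covering number and the Kolmogorov n-width; the symbols F, ρ, X and V are generic placeholders rather than the paper's own notation.

\[
  \mathcal{N}(\varepsilon, F, \rho)
  \;=\;
  \min\Bigl\{\, m \;:\; \exists\, f_1,\dots,f_m \ \text{with}\ \sup_{f \in F}\, \min_{1 \le i \le m} \rho(f, f_i) \le \varepsilon \,\Bigr\}
  \qquad \text{($\varepsilon$-covering number of $F$)}
\]

\[
  d_n(F)
  \;=\;
  \inf_{\substack{V \subseteq X \\ \dim V \le n}} \ \sup_{f \in F} \ \inf_{g \in V} \lVert f - g \rVert
  \qquad \text{(Kolmogorov $n$-width of $F$)}
\]

For example, when F is the unit ball of Lipschitz functions on [0,1]^d under the sup norm, d_n(F) decays at the rate n^{−1/d}, which is the curse-of-dimensionality rate quoted in the abstract.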




Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Jonathan Baxter (1)
  • Peter Bartlett (1)

  1. Department of Systems Engineering, Research School of Information Sciences and Engineering, Australian National University, Canberra, Australia
