Minimization of empirical error over perceptron networks

  • Věra Kůrková
Conference paper


Supervised learning by perceptron networks is investigated as minimization of the empirical error functional. The input/output functions minimizing this functional require as many hidden units m as there are elements in the training set. Upper bounds on the rates of convergence to zero of infima over networks with n hidden units (where n is smaller than m) are derived in terms of a variational norm. It is shown that fast rates are guaranteed when the sample of data defining the empirical error can be interpolated by a function which may have a rather large Sobolev-type seminorm. Fast convergence is possible even when this seminorm depends exponentially on the input dimension.
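The claim that minimizers of the empirical error need as many hidden units as training samples can be illustrated with a small numerical sketch (not the paper's construction; the network architecture, activation, and random parameters below are illustrative assumptions): with m hidden perceptron units whose inner parameters are chosen generically, the m×m hidden-layer activation matrix is invertible, so the output weights solving a linear system interpolate the m data points exactly and drive the empirical error to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 20, 3                      # sample size and input dimension (illustrative)
X = rng.standard_normal((m, d))   # inputs x_1, ..., x_m
y = rng.standard_normal(m)        # target outputs y_1, ..., y_m

# One-hidden-layer perceptron network with m hidden units:
# f(x) = sum_i w_i * sigma(a_i . x + b_i), here with sigma = tanh.
A = rng.standard_normal((d, m))   # random inner weights a_1, ..., a_m
b = rng.standard_normal(m)        # random biases b_1, ..., b_m
H = np.tanh(X @ A + b)            # m x m matrix of hidden-unit activations

# For generic random inner parameters H is invertible, so solving
# H w = y for the output weights yields an exact interpolant.
w = np.linalg.solve(H, y)

def f(x):
    """Network output at input x."""
    return np.tanh(x @ A + b) @ w

# Empirical error functional (mean squared error over the sample).
empirical_error = np.mean((np.tanh(X @ A + b) @ w - y) ** 2)
print(empirical_error)            # zero up to numerical precision
```

With fewer than m hidden units such exact interpolation is generally impossible, which is where the paper's upper bounds on infima over n-unit networks, expressed via a variational norm, come into play.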


Hidden Unit · Reproducing Kernel Hilbert Space · Normed Linear Space · Supremum Norm · Suboptimal Solution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag/Wien 2005

Authors and Affiliations

  • Věra Kůrková
  1. Institute of Computer Science, Academy of Sciences of the Czech Republic, Czech Republic
