A Few Notes on Statistical Learning Theory

  • Shahar Mendelson
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2600)

Abstract

In these notes our aim is to survey recent (and not so recent) results regarding the mathematical foundations of learning theory. The focus of this article is on the theoretical side rather than the applicative one; hence, we shall not present examples which may be interesting from the practical point of view but have little theoretical significance. This survey is far from complete, and it focuses on problems the author finds interesting (an opinion not necessarily shared by the majority of the learning community). Relevant books which present a more evenly balanced approach are, for example, [1], [4], [34], [35].


References

  1. M. Anthony, P.L. Bartlett: Neural Network Learning: Theoretical Foundations, Cambridge University Press, 1999.
  2. N. Alon, S. Ben-David, N. Cesa-Bianchi, D. Haussler: Scale-sensitive dimensions, uniform convergence and learnability, Journal of the ACM 44(4), 615–631, 1997.
  3. O. Bousquet: A Bennett concentration inequality and its application to suprema of empirical processes, preprint.
  4. L. Devroye, L. Györfi, G. Lugosi: A Probabilistic Theory of Pattern Recognition, Springer, 1996.
  5. R.M. Dudley: Real Analysis and Probability, Chapman and Hall, 1993.
  6. R.M. Dudley: The sizes of compact subsets of Hilbert space and continuity of Gaussian processes, Journal of Functional Analysis 1, 290–330, 1967.
  7. R.M. Dudley: Central limit theorems for empirical measures, Annals of Probability 6(6), 899–929, 1978.
  8. R.M. Dudley: Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics 63, Cambridge University Press, 1999.
  9. E. Giné, J. Zinn: Some limit theorems for empirical processes, Annals of Probability 12(4), 929–989, 1984.
  10. D. Haussler: Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension, Journal of Combinatorial Theory (A) 69, 217–232, 1995.
  11. W. Hoeffding: Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58, 13–30, 1963.
  12. V. Koltchinskii, D. Panchenko: Rademacher processes and bounding the risk of function learning, in High Dimensional Probability II (Seattle, WA, 1999), 443–457, Progress in Probability 47, Birkhäuser.
  13. R. Latała, K. Oleszkiewicz: On the best constant in the Khintchine-Kahane inequality, Studia Mathematica 109(1), 101–104, 1994.
  14. M. Ledoux: The Concentration of Measure Phenomenon, Mathematical Surveys and Monographs 89, AMS, 2001.
  15. M. Ledoux, M. Talagrand: Probability in Banach Spaces: Isoperimetry and Processes, Springer, 1991.
  16. W.S. Lee, P.L. Bartlett, R.C. Williamson: The importance of convexity in learning with squared loss, IEEE Transactions on Information Theory 44(5), 1974–1980, 1998.
  17. P. Massart: About the constants in Talagrand's concentration inequality for empirical processes, Annals of Probability 28(2), 863–884, 2000.
  18. S. Mendelson: Rademacher averages and phase transitions in Glivenko-Cantelli classes, IEEE Transactions on Information Theory 48(1), 251–263, 2002.
  19. S. Mendelson: Improving the sample complexity using global data, IEEE Transactions on Information Theory 48(7), 1977–1991, 2002.
  20. S. Mendelson: Geometric parameters of kernel machines, in Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), J. Kivinen and R.H. Sloan (Eds.), Lecture Notes in Computer Science 2375, Springer, 29–43, 2002.
  21. S. Mendelson, R. Vershynin: Entropy, combinatorial dimensions and random averages, in Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), J. Kivinen and R.H. Sloan (Eds.), Lecture Notes in Computer Science 2375, Springer, 14–28, 2002.
  22. S. Mendelson, R. Vershynin: Entropy and the combinatorial dimension, Inventiones Mathematicae, to appear.
  23. S. Mendelson, R.C. Williamson: Agnostic learning nonconvex classes of functions, in Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), J. Kivinen and R.H. Sloan (Eds.), Lecture Notes in Computer Science 2375, Springer, 1–13, 2002.
  24. V.D. Milman, G. Schechtman: Asymptotic Theory of Finite Dimensional Normed Spaces, Lecture Notes in Mathematics 1200, Springer, 1986.
  25. A. Pajor: Sous-espaces l^n_1 des espaces de Banach, Hermann, Paris, 1985.
  26. G. Pisier: The Volume of Convex Bodies and Banach Space Geometry, Cambridge University Press, 1989.
  27. E. Rio: Une inégalité de Bennett pour les maxima de processus empiriques, preprint.
  28. N. Sauer: On the density of families of sets, Journal of Combinatorial Theory (A) 13, 145–147, 1972.
  29. S. Shelah: A combinatorial problem: stability and orders for models and theories in infinitary languages, Pacific Journal of Mathematics 41, 247–261, 1972.
  30. V.N. Sudakov: Gaussian processes and measures of solid angles in Hilbert space, Soviet Mathematics Doklady 12, 412–415, 1971.
  31. M. Talagrand: Type, infratype and the Elton-Pajor theorem, Inventiones Mathematicae 107, 41–59, 1992.
  32. M. Talagrand: Sharper bounds for Gaussian and empirical processes, Annals of Probability 22(1), 28–76, 1994.
  33. A.W. van der Vaart, J.A. Wellner: Weak Convergence and Empirical Processes, Springer-Verlag, 1996.
  34. V. Vapnik: Statistical Learning Theory, Wiley, 1998.
  35. M. Vidyasagar: A Theory of Learning and Generalization, Springer-Verlag, 1997.
  36. V. Vapnik, A. Chervonenkis: Necessary and sufficient conditions for uniform convergence of means to mathematical expectations, Theory of Probability and its Applications 26(3), 532–553, 1981.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Shahar Mendelson, RSISE, The Australian National University, Canberra, Australia
