
Machine Learning, Volume 25, Issue 2–3, pp 195–236

Rigorous Learning Curve Bounds from Statistical Mechanics

  • David Haussler
  • Michael Kearns
  • H. Sebastian Seung
  • Naftali Tishby

Abstract

In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes.

We illustrate our results with many concrete examples of learning curve bounds derived from our theory.
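The paper's own bounds depend on the input distribution and are developed in the body of the article. As a rough, distribution-free point of reference for the tightness claim above, the following minimal Python sketch compares the classical finite-class bound for a consistent learner, ε ≤ (ln|F| + ln(1/δ))/m, against a VC-style bound of order (d/m) ln(m/d). The function names, constants, and the example values for |F|, d, m, and δ are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not the paper's bounds): compare the classical
# finite-class generalization bound with a VC-style bound.
import math

def finite_class_bound(m, card, delta):
    """Realizable case: with probability >= 1 - delta, any hypothesis
    consistent with m examples, drawn from a class of size `card`,
    has true error at most this value. Decays as a 1/m power law."""
    return (math.log(card) + math.log(1.0 / delta)) / m

def vc_style_bound(m, d, delta):
    """A VC-type bound of order (d log(m/d) + log(1/delta)) / m.
    The constants here are illustrative, not the sharpest known."""
    return (d * math.log(2.0 * math.e * m / d) + math.log(1.0 / delta)) / m

if __name__ == "__main__":
    card, d, delta = 2 ** 20, 20, 0.05  # assumed example values
    for m in (100, 1000, 10000):
        print(f"m={m:6d}  finite-class: {finite_class_bound(m, card, delta):.4f}"
              f"  VC-style: {vc_style_bound(m, d, delta):.4f}")
```

For a class of 2^20 functions with VC dimension around 20, the finite-class bound is several times smaller than the VC-style bound at every sample size, which is the flavor of gap the abstract alludes to; the paper's distribution-dependent analysis tightens this further and captures behavior, such as phase transitions, that neither of these worst-case forms exhibits.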

Keywords: learning curves, statistical mechanics, phase transitions, VC dimension


Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • David Haussler (1)
  • Michael Kearns (2)
  • H. Sebastian Seung (3)
  • Naftali Tishby (4)

  1. U.C. Santa Cruz, Santa Cruz, California
  2. AT&T Laboratories Research, New Jersey
  3. Bell Laboratories, Lucent Technologies, New Jersey
  4. Hebrew University, Jerusalem, Israel
