Abstract
In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior of learning curves. This behavior often exhibits dramatic properties, such as phase transitions and power-law asymptotics, that the VC theory does not explain. The disadvantages of our theory are that its application requires knowledge of the input distribution, and that it is so far limited to function classes of finite cardinality. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.
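To make the contrast concrete, here is a minimal sketch in LaTeX of the two scaling behaviors the abstract describes; the first is the standard distribution-free VC bound for consistent learners, and the power-law form is a generic illustration with an assumed problem-dependent constant c, not an equation quoted from this paper:

% Distribution-free VC bound: with probability at least 1 - \delta over
% m random examples, every hypothesis from a class of VC dimension d
% that is consistent with the sample has generalization error
\epsilon(m) = O\!\left( \frac{d \ln(m/d) + \ln(1/\delta)}{m} \right)

% Learning curves observed in practice and in the statistical-mechanics
% literature, by contrast, often follow a clean power law,
\epsilon(m) \approx \frac{c}{m}
% sometimes punctuated by phase transitions (sudden drops in \epsilon(m)
% at critical sample sizes), behavior the distribution-free bound above
% does not predict.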
Cite this article
Haussler, D., Kearns, M., Seung, H.S. et al. Rigorous learning curve bounds from statistical mechanics. Mach Learn 25, 195–236 (1996). https://doi.org/10.1007/BF00114010