Abstract
In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior of learning curves. This behavior often exhibits dramatic properties, such as phase transitions and power-law asymptotics, that the VC theory does not explain. The disadvantages of our theory are that its application requires knowledge of the input distribution, and that it is so far limited to function classes of finite cardinality. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.
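To make the contrast concrete, here is a minimal sketch in LaTeX of the two scaling behaviors the abstract describes; the first is the standard distribution-free VC bound for consistent learners, and the power-law form is a generic illustration with an assumed problem-dependent constant c, not an equation quoted from this paper:

% Distribution-free VC bound: with probability at least 1 - \delta over
% m random examples, every hypothesis from a class of VC dimension d
% that is consistent with the sample has generalization error
\epsilon(m) = O\!\left( \frac{d \ln(m/d) + \ln(1/\delta)}{m} \right)

% Learning curves observed in practice and in the statistical-mechanics
% literature, by contrast, often follow a clean power law,
\epsilon(m) \approx \frac{c}{m}
% sometimes punctuated by phase transitions (sudden drops in \epsilon(m)
% at critical sample sizes), behavior the distribution-free bound above
% does not predict.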
Cite this article
Haussler, D., Kearns, M., Seung, H.S. et al. Rigorous learning curve bounds from statistical mechanics. Mach Learn 25, 195–236 (1996). https://doi.org/10.1007/BF00114010