Abstract
Within the framework of pac-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class \(C \subseteq 2^X\) consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C, the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is pac-learnable.
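As a concrete illustration (an example chosen here, not quoted from the paper), consider closed intervals on the real line, a class of VC dimension 2 that admits a compression scheme of size 2: the compression function keeps only the leftmost and rightmost positive examples, and the reconstruction function returns the interval they span. A minimal Python sketch, with all names invented for this illustration:

# Illustrative sample compression scheme of size 2 for closed intervals on the real line.
# A sample is a list of (point, label) pairs assumed consistent with some interval concept [a, b].

def compress(sample):
    """Keep at most two examples: the leftmost and rightmost positive points."""
    positives = [x for x, label in sample if label]
    if not positives:
        return []                      # empty compression set encodes "no positives seen"
    return [(min(positives), True), (max(positives), True)]

def reconstruct(compression_set):
    """Rebuild a hypothesis (a membership predicate) from the compression set alone."""
    if not compression_set:
        return lambda x: False         # hypothesis: the empty interval
    lo = min(x for x, _ in compression_set)
    hi = max(x for x, _ in compression_set)
    return lambda x: lo <= x <= hi     # hypothesis: the interval [lo, hi]

# Any sample consistent with an interval concept is classified correctly by the
# reconstructed hypothesis: positives lie inside [lo, hi], negatives lie outside it.
sample = [(0.5, False), (1.2, True), (2.0, True), (3.7, True), (4.1, False)]
h = reconstruct(compress(sample))
assert all(h(x) == label for x, label in sample)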
Previous work has shown that a class is pac-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class is finite. In the second half of this paper we explore the relationship between sample compression schemes and the VC dimension. We define maximum and maximal classes of VC dimension d. For every maximum class of VC dimension d, there is a sample compression scheme of size d, and for sufficiently large maximum classes there is no sample compression scheme of size less than d. We briefly discuss classes of VC dimension d that are maximal but not maximum. It is an open question whether every class of VC dimension d has a sample compression scheme of size O(d).
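For context (standard definitions from the VC theory literature, stated here rather than quoted from the abstract): Sauer's lemma bounds the number of distinct labelings a class of VC dimension d can induce on a finite set of points,
\[
\bigl| C|_Y \bigr| \;\le\; \Phi_d(m) \;=\; \sum_{i=0}^{d} \binom{m}{i}
\qquad \text{for every finite } Y \subseteq X \text{ with } |Y| = m .
\]
A class of VC dimension d is called maximum when this bound holds with equality on every finite subset of the domain, and maximal when no concept can be added to it without increasing its VC dimension.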
Cite this article
Floyd, S., Warmuth, M. Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension. Machine Learning 21, 269–304 (1995). https://doi.org/10.1023/A:1022660318680