Abstract
Within the framework of pac-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class \(C \subseteq 2^X\) consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C, the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is pac-learnable.
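As a concrete illustration (an example chosen here, not quoted from the paper), consider closed intervals on the real line, a class of VC dimension 2 that admits a compression scheme of size 2: the compression function keeps only the leftmost and rightmost positive examples, and the reconstruction function returns the interval they span. A minimal Python sketch, with all names invented for this illustration:

# Illustrative sample compression scheme of size 2 for closed intervals on the real line.
# A sample is a list of (point, label) pairs assumed consistent with some interval concept [a, b].

def compress(sample):
    """Keep at most two examples: the leftmost and rightmost positive points."""
    positives = [x for x, label in sample if label]
    if not positives:
        return []                      # empty compression set encodes "no positives seen"
    return [(min(positives), True), (max(positives), True)]

def reconstruct(compression_set):
    """Rebuild a hypothesis (a membership predicate) from the compression set alone."""
    if not compression_set:
        return lambda x: False         # hypothesis: the empty interval
    lo = min(x for x, _ in compression_set)
    hi = max(x for x, _ in compression_set)
    return lambda x: lo <= x <= hi     # hypothesis: the interval [lo, hi]

# Any sample consistent with an interval concept is classified correctly by the
# reconstructed hypothesis: positives lie inside [lo, hi], negatives lie outside it.
sample = [(0.5, False), (1.2, True), (2.0, True), (3.7, True), (4.1, False)]
h = reconstruct(compress(sample))
assert all(h(x) == label for x, label in sample)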
Previous work has shown that a class is pac-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class is finite. In the second half of this paper we explore the relationship between sample compression schemes and the VC dimension. We define maximum and maximal classes of VC dimension d. For every maximum class of VC dimension d, there is a sample compression scheme of size d, and for sufficiently large maximum classes there is no sample compression scheme of size less than d. We briefly discuss classes of VC dimension d that are maximal but not maximum. It is an open question whether every class of VC dimension d has a sample compression scheme of size O(d).
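For context (standard definitions from the VC theory literature, stated here rather than quoted from the abstract): Sauer's lemma bounds the number of distinct labelings a class of VC dimension d can induce on a finite set of points,
\[
\bigl| C|_Y \bigr| \;\le\; \Phi_d(m) \;=\; \sum_{i=0}^{d} \binom{m}{i}
\qquad \text{for every finite } Y \subseteq X \text{ with } |Y| = m .
\]
A class of VC dimension d is called maximum when this bound holds with equality on every finite subset of the domain, and maximal when no concept can be added to it without increasing its VC dimension.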
Cite this article
Floyd, S., Warmuth, M. Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension. Machine Learning 21, 269–304 (1995). https://doi.org/10.1023/A:1022660318680