Teaching and Compressing for Low VC-Dimension

  • Shay Moran
  • Amir Shpilka
  • Avi Wigderson
  • Amir Yehudayoff


Abstract

In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching: the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let C be a binary concept class of size m and VC-dimension d. Prior to this work, the best known upper bounds for both parameters were log(m), while the best known lower bounds are linear in d. We present significantly better upper bounds on both, as follows. Set k = O(d · 2^d · log log |C|).
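
To make the setting concrete, here is a small brute-force Python sketch (ours, purely illustrative and not from the paper) that computes the VC-dimension of a finite concept class, where each concept is given as a dict mapping domain points to 0/1 labels; the name vc_dimension is our own.

    from itertools import combinations

    def vc_dimension(concepts, domain):
        # A set S is shattered if the concepts realize all 2^|S| labelings of S.
        # Shattering is closed under subsets, so we can stop at the first size
        # for which no set of that size is shattered.
        d = 0
        for size in range(1, len(domain) + 1):
            shattered = False
            for S in combinations(domain, size):
                patterns = {tuple(c[x] for x in S) for c in concepts}
                if len(patterns) == 2 ** size:
                    shattered = True
                    break
            if not shattered:
                break
            d = size
        return d

    # Example: singletons over {0, 1, 2} plus the all-zeros concept; VC-dimension 1.
    domain = [0, 1, 2]
    concepts = [{x: int(x == i) for x in domain} for i in domain] + [{x: 0 for x in domain}]
    print(vc_dimension(concepts, domain))  # -> 1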

We show that there always exists a concept c in C with a teaching set (i.e. a list of c-labeled examples uniquely identifying c in C) of size k. This problem was studied by Kuhlmann (On teaching and learning intersection-closed concept classes. In: EuroCOLT, pp 168–182, 1999). Our construction implies that the recursive teaching (RT) dimension of C is at most k as well. The RT-dimension was suggested by Zilles et al. (J Mach Learn Res 12:349–384, 2011) and Doliwa et al. (Recursive teaching dimension, learning complexity, and maximum classes. In: ALT, pp 209–223, 2010). The same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (Population recovery and partial identification. In: FOCS, pp 390–399, 2012). An upper bound on this parameter that depends only on d is known only for the very simple case d = 1; the question is open even for d = 2. We also make small progress towards this seemingly modest goal.
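
For intuition, the following brute-force Python sketch (again ours, exponential in the domain size and meant only to illustrate the definition) finds a smallest teaching set of a target concept: a minimum-size set of labeled examples on which every other concept in the class disagrees with the target somewhere.

    from itertools import combinations

    def smallest_teaching_set(target, concepts, domain):
        others = [c for c in concepts if c != target]
        for size in range(len(domain) + 1):
            for S in combinations(domain, size):
                # S teaches the target if every other concept disagrees with it on S.
                if all(any(c[x] != target[x] for x in S) for c in others):
                    return [(x, target[x]) for x in S]

    # Singletons over {0, 1, 2}: the concept "only 0 is positive" is taught
    # by the single positive example (0, 1).
    domain = [0, 1, 2]
    concepts = [{x: int(x == i) for x in domain} for i in domain]
    print(smallest_teaching_set(concepts[0], concepts, domain))  # -> [(0, 1)]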

We further construct sample compression schemes of size k for C, with additional information of k log(k) bits. Roughly speaking, given any list of C-labeled examples of arbitrary length, we can retain only k labeled examples in a way that allows recovering the labels of all the other examples in the list, using k log(k) additional information bits. This problem was first suggested by Littlestone and Warmuth (Relating data compression and learnability. Unpublished, 1986).
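
As a toy illustration of the notion (a standard textbook example, not the paper's construction), here is a size-1 compression scheme in Python for threshold functions on the real line, a class of VC-dimension 1: the compressor keeps a single labeled example, from which the reconstructor recovers the labels of the entire sample.

    def compress(sample):
        # sample: nonempty list of (x, label) pairs realizable by some
        # threshold h_t with h_t(x) = 1 iff x >= t.
        positives = [(x, y) for x, y in sample if y == 1]
        if positives:
            return min(positives)  # the smallest positive point
        return sample[0]           # a negative example signals "all negative"

    def reconstruct(kept, x):
        x0, y0 = kept
        if y0 == 1:
            return int(x >= x0)  # place the threshold at the kept positive point
        return 0                 # no positives were kept: label everything 0

    sample = [(0.2, 0), (1.5, 1), (0.9, 0), (2.7, 1)]
    kept = compress(sample)  # -> (1.5, 1)
    assert all(reconstruct(kept, x) == y for x, y in sample)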



Acknowledgments

We thank Noga Alon and Gillat Kol for helpful discussions at various stages of this work.


References

  1. N. Alon, S. Moran, A. Yehudayoff, Sign rank, VC dimension and spectral gaps. Electronic Colloquium on Computational Complexity (ECCC), vol. 21, no. 135 (2014)
  2. D. Angluin, M. Krikis, Learning from different teachers. Mach. Learn. 51(2), 137–163 (2003)
  3. M. Anthony, G. Brightwell, D.A. Cohen, J. Shawe-Taylor, On exact specification by examples, in COLT, 1992, pp. 311–318
  4. P. Assouad, Densité et dimension. Ann. Inst. Fourier 33(3), 233–282 (1983)
  5. F. Balbach, Models for algorithmic teaching. PhD thesis, University of Lübeck, 2007
  6. S. Ben-David, A. Litman, Combinatorial variability of Vapnik–Chervonenkis classes with applications to sample compression schemes. Discret. Appl. Math. 86(1), 3–25 (1998)
  7. A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, Occam’s razor. Inf. Process. Lett. 24(6), 377–380 (1987)
  8. A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, Learnability and the Vapnik–Chervonenkis dimension. J. Assoc. Comput. Mach. 36(4), 929–965 (1989)
  9. X. Chen, Y. Cheng, B. Tang, A note on teaching for VC classes. Electronic Colloquium on Computational Complexity (ECCC), vol. 23, no. 65 (2016)
  10. A. Chernikov, P. Simon, Externally definable sets and dependent pairs. Isr. J. Math. 194(1), 409–425 (2013)
  11. N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge University Press, Cambridge, 2000)
  12. T. Doliwa, H.-U. Simon, S. Zilles, Recursive teaching dimension, learning complexity, and maximum classes, in ALT, 2010, pp. 209–223
  13. P. Domingos, The role of Occam’s razor in knowledge discovery. Data Min. Knowl. Discov. 3(4), 409–425 (1999)
  14. R.M. Dudley, Central limit theorems for empirical measures. Ann. Probab. 6, 899–929 (1978)
  15. Z. Dvir, A. Rao, A. Wigderson, A. Yehudayoff, Restriction access, in Innovations in Theoretical Computer Science, Cambridge, 8–10 Jan 2012, pp. 19–33
  16. S. Floyd, Space-bounded learning and the Vapnik–Chervonenkis dimension, in COLT, 1989, pp. 349–364
  17. S. Floyd, M.K. Warmuth, Sample compression, learnability, and the Vapnik–Chervonenkis dimension. Mach. Learn. 21(3), 269–304 (1995)
  18. Y. Freund, Boosting a weak learning algorithm by majority. Inf. Comput. 121(2), 256–285 (1995)
  19. S.A. Goldman, M. Kearns, On the complexity of teaching. J. Comput. Syst. Sci. 50(1), 20–31 (1995)
  20. S.A. Goldman, H.D. Mathias, Teaching a smarter learner. J. Comput. Syst. Sci. 52(2), 255–267 (1996)
  21. S.A. Goldman, R.L. Rivest, R.E. Schapire, Learning binary relations and total orders. SIAM J. Comput. 22(5), 1006–1034 (1993)
  22. S. Hanneke, Teaching dimension and the complexity of active learning, in COLT, 2007, pp. 66–81
  23. D. Haussler, Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik–Chervonenkis dimension. J. Comb. Theory Ser. A 69(2), 217–232 (1995)
  24. D. Haussler, E. Welzl, ε-nets and simplex range queries. Discret. Comput. Geom. 2, 127–151 (1987)
  25. D.P. Helmbold, R.H. Sloan, M.K. Warmuth, Learning integer lattices. SIAM J. Comput. 21(2), 240–266 (1992)
  26. D.P. Helmbold, M.K. Warmuth, On weak learning. J. Comput. Syst. Sci. 50(3), 551–573 (1995)
  27. J.C. Jackson, A. Tomkins, A computational model of teaching, in COLT, 1992, pp. 319–326
  28. M. Kearns, U.V. Vazirani, An Introduction to Computational Learning Theory (MIT Press, Cambridge, 1994)
  29. H. Kobayashi, A. Shinohara, Complexity of teaching by a restricted number of examples, in COLT, 2009
  30. C. Kuhlmann, On teaching and learning intersection-closed concept classes, in EuroCOLT, 1999, pp. 168–182
  31. D. Kuzmin, M.K. Warmuth, Unlabeled compression schemes for maximum classes. J. Mach. Learn. Res. 8, 2047–2081 (2007)
  32. R. Livni, P. Simon, Honest compressions and their application to compression schemes, in COLT, 2013, pp. 77–92
  33. M. Marchand, J. Shawe-Taylor, The set covering machine. J. Mach. Learn. Res. 3, 723–746 (2002)
  34. S. Moran, A. Yehudayoff, Sample compression for VC classes. Electronic Colloquium on Computational Complexity (ECCC), vol. 22, no. 40 (2015)
  35. J. von Neumann, Zur Theorie der Gesellschaftsspiele. Math. Ann. 100, 295–320 (1928)
  36. J.R. Quinlan, R.L. Rivest, Inferring decision trees using the minimum description length principle. Inf. Comput. 80(3), 227–248 (1989)
  37. B.I.P. Rubinstein, P.L. Bartlett, J.H. Rubinstein, Shifting: one-inclusion mistake bounds and sample compression. J. Comput. Syst. Sci. 75(1), 37–59 (2009)
  38. B.I.P. Rubinstein, J.H. Rubinstein, A geometric approach to sample compression. J. Mach. Learn. Res. 13, 1221–1261 (2012)
  39. R. Samei, P. Semukhin, B. Yang, S. Zilles, Algebraic methods proving Sauer’s bound for teaching complexity. Theor. Comput. Sci. 558, 35–50 (2014)
  40. R. Samei, P. Semukhin, B. Yang, S. Zilles, Sample compression for multi-label concept classes, in COLT, vol. 35, 2014, pp. 371–393
  41. N. Sauer, On the density of families of sets. J. Comb. Theory Ser. A 13, 145–147 (1972)
  42. A. Shinohara, S. Miyano, Teachability in computational learning, in ALT, 1990, pp. 247–255
  43. L.G. Valiant, A theory of the learnable. Commun. ACM 27, 1134–1142 (1984)
  44. V.N. Vapnik, A.Ya. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16, 264–280 (1971)
  45. M.K. Warmuth, Compressing to VC dimension many points, in COLT/Kernel, 2003, pp. 743–744
  46. A. Wigderson, A. Yehudayoff, Population recovery and partial identification, in FOCS, 2012, pp. 390–399
  47. S. Zilles, S. Lange, R. Holte, M. Zinkevich, Models of cooperative teaching and learning. J. Mach. Learn. Res. 12, 349–384 (2011)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Shay Moran (1, 2)
  • Amir Shpilka (3)
  • Avi Wigderson (4)
  • Amir Yehudayoff (5)
  1. Department of Computer Science, Technion-IIT, Haifa, Israel
  2. Max Planck Institute for Informatics, Saarbrücken, Germany
  3. Department of Computer Science, Tel Aviv University, Tel Aviv-Yafo, Israel
  4. School of Mathematics, Institute for Advanced Study, Princeton, USA
  5. Department of Mathematics, Technion-IIT, Haifa, Israel
