Margin-Sparsity Trade-Off for the Set Covering Machine

  • François Laviolette
  • Mario Marchand
  • Mohak Shah
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)


We propose a new learning algorithm for the set covering machine, together with a tight data-compression risk bound that the learner can use to choose the appropriate trade-off between the sparsity of a classifier and the magnitude of its separating margin.
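The set covering machine builds a sparse conjunction of data-dependent features (such as balls centred on training examples) by greedy set cover. The following sketch illustrates only that greedy covering idea — it is not the new algorithm or the risk bound proposed in this paper. The function names (`build_balls`, `scm_fit`, `scm_predict`), the choice of balls centred on positive examples, and the fixed 0.99 radius shrink (a crude stand-in for a margin parameter) are all invented for illustration:

```python
import numpy as np

def build_balls(X, y):
    """Candidate features: one ball per positive example, with radius just
    inside the distance to its nearest negative example. The 0.99 shrink is
    an arbitrary stand-in for a margin parameter."""
    balls = []
    negatives = X[y == 0]
    for centre in X[y == 1]:
        r = np.linalg.norm(negatives - centre, axis=1).min() * 0.99
        balls.append((centre, r))
    return balls

def scm_fit(X, y, max_balls=5):
    """Greedy set cover: repeatedly pick the ball that excludes the most
    still-uncovered negative examples; stop when every negative is excluded
    or the sparsity budget (max_balls) is spent."""
    balls = build_balls(X, y)
    negatives = X[y == 0]
    uncovered = np.ones(len(negatives), dtype=bool)
    chosen = []
    while uncovered.any() and len(chosen) < max_balls:
        best, best_gain = None, 0
        for centre, r in balls:
            outside = np.linalg.norm(negatives - centre, axis=1) > r
            gain = np.sum(uncovered & outside)
            if gain > best_gain:
                best, best_gain = (centre, r), gain
        if best is None:
            break  # no remaining ball excludes a new negative
        chosen.append(best)
        centre, r = best
        uncovered &= np.linalg.norm(negatives - centre, axis=1) <= r
    return chosen

def scm_predict(balls, X):
    """Conjunction of features: classify a point as positive only if it lies
    inside every chosen ball."""
    inside_all = np.ones(len(X), dtype=bool)
    for centre, r in balls:
        inside_all &= np.linalg.norm(X - centre, axis=1) <= r
    return inside_all.astype(int)
```

On a toy separable set — three positives clustered near the origin and three distant negatives — a single ball suffices, illustrating how the greedy cover yields a very sparse classifier:

```python
X = np.array([[0., 0.], [0.1, 0.], [0., 0.1], [3., 0.], [0., 3.], [-3., 0.]])
y = np.array([1, 1, 1, 0, 0, 0])
model = scm_fit(X, y)          # one ball covers all negatives
scm_predict(model, X)          # recovers the training labels
```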





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • François Laviolette (1)
  • Mario Marchand (1)
  • Mohak Shah (2)

  1. IFT-GLO, Université Laval, Sainte-Foy, Canada
  2. SITE, University of Ottawa, Ottawa, Canada