Machine Learning, Volume 76, Issue 2–3, pp 179–193

Sparse kernel SVMs via cutting-plane training

Abstract

We explore an algorithm for training SVMs with kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This added flexibility yields two benefits. First, it makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g. text classification). This has the potential to make training of kernel SVMs tractable for large training sets, where conventional methods scale quadratically because the number of SVs grows linearly with the training-set size. In addition to a theoretical analysis of the algorithm, we also present an empirical evaluation.
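To make the form of the learned rule concrete: it is f(x) = sum_j beta_j K(b_j, x) + b, where the basis vectors b_j can be arbitrary points rather than training examples. The sketch below is an illustration under assumed choices, not the paper's training algorithm; the RBF kernel, the function names, and the random weights are all hypothetical stand-ins. It shows why prediction cost depends only on the number of basis vectors k:

import numpy as np

def rbf_kernel(X, B, gamma=0.1):
    # Gaussian kernel matrix: K[i, j] = exp(-gamma * ||X_i - B_j||^2)
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * X @ B.T)
    return np.exp(-gamma * sq)

def predict(X, basis, beta, bias=0.0, gamma=0.1):
    # f(x) = sum_j beta_j * K(b_j, x) + bias.
    # Cost per example scales with the number of basis vectors k,
    # not with the size of the training set.
    return rbf_kernel(X, basis, gamma) @ beta + bias

# Usage: 5 basis vectors stand in for a potentially much larger SV set.
X_test = np.random.randn(100, 20)
basis = np.random.randn(5, 20)   # arbitrary basis vectors, not training points
beta = np.random.randn(5)        # their (here, randomly faked) learned weights
scores = predict(X_test, basis, beta)

With k fixed and small, evaluating f costs O(k) kernel computations per example no matter how many training points were used, which is the source of the prediction speed-up described above.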

Keywords

Support vector machines · Kernel methods · Sparse kernel methods · Cutting plane algorithm · Basis pursuit

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

Dept. of Computer Science, Cornell University, Ithaca, USA