The Minimum Redundancy – Maximum Relevance Approach to Building Sparse Support Vector Machines

  • Xiaoxing Yang
  • Ke Tang
  • Xin Yao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5788)


Building sparse SVMs has recently become an active research topic because of its potential applications in large-scale data mining tasks. One of the most popular approaches to building sparse SVMs is to select a small subset of training samples and employ them as the support vectors. In this paper, we explain that selecting the support vectors is equivalent to selecting a number of columns from the kernel matrix, which in turn is equivalent to selecting a subset of features in the feature selection domain. Hence, we propose to use an effective feature selection algorithm, namely the Minimum Redundancy – Maximum Relevance (MRMR) algorithm, to solve the support vector selection problem. The MRMR algorithm was then compared with two existing methods, namely the back-fitting (BF) and pre-fitting (PF) algorithms. Preliminary results showed that, in terms of generalization performance, MRMR generally outperformed the BF algorithm but was inferior to the PF algorithm. However, the MRMR approach was extremely efficient and significantly faster than the two compared algorithms.
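
The idea that choosing support vectors amounts to choosing columns of the kernel matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rbf_kernel` and `mrmr_select_columns`, the Gaussian kernel, the toy data, and the ridge-regularised least-squares fit are all assumptions made for illustration. In particular, absolute Pearson correlation is swapped in for the mutual information that the MRMR criterion normally uses.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def mrmr_select_columns(K, y, m):
    """Greedily pick m column indices of K (candidate support vectors).

    Assumption: absolute Pearson correlation is used as a stand-in for
    mutual information when scoring relevance and redundancy.
    """
    n = K.shape[1]
    # Centre and normalise so that dot products are correlations.
    Kc = K - K.mean(axis=0)
    Kc /= np.linalg.norm(Kc, axis=0) + 1e-12
    yc = y - y.mean()
    yc = yc / (np.linalg.norm(yc) + 1e-12)

    relevance = np.abs(Kc.T @ yc)            # column-to-label relevance
    selected = [int(np.argmax(relevance))]   # start with the most relevant column
    redundancy_sum = np.zeros(n)

    while len(selected) < m:
        # Accumulate redundancy with the most recently selected column.
        redundancy_sum += np.abs(Kc.T @ Kc[:, selected[-1]])
        # Relevance minus mean redundancy with the already selected columns.
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf            # never pick a column twice
        selected.append(int(np.argmax(score)))
    return selected

# Toy two-class problem (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)

K = rbf_kernel(X, gamma=1.0)
idx = mrmr_select_columns(K, y, m=10)

# Fit a sparse model that uses only the selected columns of K.
K_sub = K[:, idx]
lam = 1e-3
beta = np.linalg.solve(K_sub.T @ K_sub + lam * np.eye(len(idx)), K_sub.T @ y)
pred = np.sign(K_sub @ beta)
```

Here each selected index in `idx` names one training sample kept as a support vector, so prediction needs only `len(idx)` kernel evaluations per test point instead of one per training sample.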


Relevance, Redundancy, Sparse design, SVMs, Machine learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Xiaoxing Yang (1)
  • Ke Tang (1)
  • Xin Yao (1, 2)
  1. Nature Inspired Computation and Applications Laboratory (NICAL), School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
  2. The Center of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, The University of Birmingham, Birmingham, U.K.
