Evaluating Feature Selection for SVMs in High Dimensions

  • Roland Nilsson
  • José M. Peña
  • Johan Björkegren
  • Jesper Tegnér
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


We perform a systematic evaluation of feature selection (FS) methods for support vector machines (SVMs) using simulated high-dimensional data (up to 5000 dimensions). Several findings previously reported at low dimensions do not apply in high dimensions. For example, none of the FS methods investigated improved SVM accuracy, indicating that the SVM built-in regularization is sufficient. These results were also validated using microarray data. Moreover, all FS methods tend to discard many relevant features. This is a problem for applications such as microarray data analysis, where identifying all biologically important features is a major objective.
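One way to make the claim about discarded relevant features concrete is to simulate data in which the relevant features are known by construction, then measure what fraction of them a selection method retains. The following is a minimal sketch under stated assumptions and is not the authors' protocol: the Gaussian two-class simulation, the t-statistic filter (a stand-in for the FS methods, which the abstract does not name), and all function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate(n_samples, n_features, n_relevant, shift, seed=0):
    """Two-class Gaussian data (assumed design, not from the paper).

    Only the first n_relevant features have class-dependent means
    (+shift for class +1, -shift for class -1); the rest are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [
            rng.gauss(label * shift if j < n_relevant else 0.0, 1.0)
            for j in range(n_features)
        ]
        X.append(row)
        y.append(label)
    return X, y


def t_scores(X, y):
    """Absolute Welch-style t-statistic for each feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == -1]
        se = (statistics.variance(pos) / len(pos)
              + statistics.variance(neg) / len(neg)) ** 0.5
        scores.append(abs(statistics.mean(pos) - statistics.mean(neg)) / se)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features."""
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return set(ranked[:k])


def relevant_recall(selected, n_relevant):
    """Fraction of the truly relevant features (indices < n_relevant) kept."""
    return sum(1 for j in selected if j < n_relevant) / n_relevant


# Hypothetical setting: 40 samples, 500 features, 50 weakly relevant ones.
X, y = simulate(n_samples=40, n_features=500, n_relevant=50, shift=0.3, seed=0)
selected = select_top_k(t_scores(X, y), k=20)
recall = relevant_recall(selected, n_relevant=50)
print(f"kept {len(selected)} features, recall of relevant features: {recall:.2f}")
```

In this setting, keeping 20 features when 50 are relevant caps the recall at 0.4 regardless of how well the ranking works, which is the kind of loss that matters when the goal is to identify all relevant features rather than only to classify accurately.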


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Roland Nilsson (1)
  • José M. Peña (1)
  • Johan Björkegren (2)
  • Jesper Tegnér (1)
  1. IFM Computational Biology, Linköping University, Linköping, Sweden
  2. Gustav V Research Institute, Karolinska Institute, Stockholm, Sweden
