Knowledge and Information Systems, Volume 12, Issue 1, pp 95–116

Stability of feature selection algorithms: a study on high-dimensional spaces

  • Alexandros Kalousis
  • Julien Prados
  • Melanie Hilario
Regular Paper


With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights-scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.
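The idea of estimating stability empirically can be illustrated with a minimal sketch. It assumes, purely for illustration, that preferences are expressed as a selected feature subset, that subset similarity is the Jaccard index, and that training-set variation is simulated by random subsampling; none of these choices is stated in the abstract. The selector `select_top_k`, the helper `estimate_stability`, and all parameter names are hypothetical.

```python
import itertools
import random


def jaccard(a, b):
    """Similarity of two feature subsets: |a ∩ b| / |a ∪ b| (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def select_top_k(X, y, k):
    """Hypothetical selector: keep the k features with the largest
    absolute difference between the two class means (labels 0 and 1)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            scores.append(0.0)  # cannot score a feature without both classes
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def estimate_stability(X, y, selector, k, n_resamples=10, fraction=0.9, seed=0):
    """Average pairwise similarity of the subsets selected on random
    subsamples of the training set (assumed resampling scheme)."""
    if n_resamples < 2:
        raise ValueError("need at least two resamples to compare")
    rng = random.Random(seed)
    n = len(X)
    size = max(1, int(fraction * n))
    subsets = []
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)  # subsample without replacement
        subsets.append(selector([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(itertools.combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 means the selector returns almost the same subset on every subsample, while a value near 0 means the selected features change almost entirely with the training data. Preferences expressed as weights or ranks would need a different similarity function, for example a correlation coefficient, in place of `jaccard`.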


  • Feature selection
  • High dimensionality
  • Feature stability





Copyright information

© Springer-Verlag London Limited 2006

Authors and Affiliations

  • Alexandros Kalousis (1)
  • Julien Prados (1)
  • Melanie Hilario (1)
  1. Computer Science Department, University of Geneva, Geneva, Switzerland
