
Stability of feature selection algorithms: a study on high-dimensional spaces

  • Regular Paper

Knowledge and Information Systems

Abstract

With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Surprisingly, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study attempts to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences they express, whether in the form of weights/scores, ranks, or a selected feature subset. We examine a number of measures for quantifying the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. These experiments allow us to explore the merits of each stability measure and to build stability profiles of the feature selection algorithms. Finally, we show how such stability profiles can support the choice of a feature selection algorithm.
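The subset-based notion of stability sketched in the abstract — comparing the feature subsets an algorithm selects on different subsamples of the training data — can be illustrated with a short example. This is a minimal, assumption-laden sketch, not the authors' exact experimental protocol: it measures agreement between two selections with a Tanimoto-style (Jaccard) similarity and averages it over all pairs of runs, and the feature indices below are purely hypothetical.

```python
from itertools import combinations


def tanimoto(s1, s2):
    # Similarity of two selected-feature subsets:
    # |intersection| / |union|; 1.0 means identical selections.
    s1, s2 = set(s1), set(s2)
    return len(s1 & s2) / len(s1 | s2)


def subset_stability(selections):
    # Average pairwise Tanimoto similarity over all runs,
    # i.e. over all subsamples of the training set.
    pairs = list(combinations(selections, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical output of a feature selection algorithm on
# three subsamples of the same training set:
runs = [[0, 1, 2, 5], [0, 1, 3, 5], [0, 2, 3, 5]]
print(round(subset_stability(runs), 3))  # → 0.6
```

A stability score near 1 indicates that the algorithm keeps selecting essentially the same features as the training set varies; scores computed this way for several algorithms on the same data are one plausible ingredient of the "stability profiles" the abstract describes.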



Author information

Corresponding author

Correspondence to Alexandros Kalousis.

Additional information

Alexandros Kalousis received the B.Sc. degree in computer science in 1994 and the M.Sc. degree in advanced information systems in 1997, both from the University of Athens, Greece. He received the Ph.D. degree in meta-learning for classification algorithm selection from the Department of Computer Science, University of Geneva, in 2002, and has since been a Senior Researcher at the same university. His research interests include relational learning with kernels and distances, stability of feature selection algorithms, and feature extraction from spectral data.

Julien Prados is a Ph.D. student at the University of Geneva, Switzerland. He received the B.Sc. and M.Sc. degrees in computer science from the University Joseph Fourier (Grenoble, France) in 1999 and 2001, respectively. After a year of work in industry, he joined the Geneva Artificial Intelligence Laboratory, where he works on bioinformatics and data mining tools for mass spectrometry data analysis.

Melanie Hilario has a Ph.D. in computer science from the University of Paris VI and currently works at the University of Geneva’s Artificial Intelligence Laboratory. She has initiated and participated in several European research projects on neuro-symbolic integration, meta-learning, and biological text mining. She has served on the program committees of many conferences and workshops in machine learning, data mining, and artificial intelligence. She is currently an Associate Editor of the International Journal on Artificial Intelligence Tools and a member of the Editorial Board of the Intelligent Data Analysis journal.


About this article

Cite this article

Kalousis, A., Prados, J. & Hilario, M. Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl Inf Syst 12, 95–116 (2007). https://doi.org/10.1007/s10115-006-0040-8

