Stability of feature selection algorithms: a study on high-dimensional spaces
With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Surprisingly, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study attempts to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weight scores, ranks, or a selected feature subset. We examine a number of measures that quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and to create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.
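The empirical estimation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a toy univariate scorer (absolute mean difference between classes) and measures subset stability as the average pairwise Jaccard/Tanimoto similarity of the feature subsets selected on random subsamples of the training set; the function names and parameter choices (`n_draws`, `frac`) are hypothetical.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard/Tanimoto similarity between two feature subsets (1 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def select_top_k(X, y, k):
    """Toy filter-style selector: rank features by the absolute difference
    of their class-conditional means and keep the k best."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:          # degenerate subsample: one class missing
            scores.append(0.0)
            continue
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return ranked[:k]

def subset_stability(X, y, k, n_draws=10, frac=0.8, seed=0):
    """Empirical stability estimate: run the selector on n_draws random
    subsamples of the data and average the pairwise Jaccard similarity
    of the selected subsets."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(n_draws):
        idx = rng.sample(range(n), int(frac * n))
        subsets.append(select_top_k([X[i] for i in idx],
                                    [y[i] for i in idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Analogous estimates can be built for the other two kinds of feature preference, e.g. by replacing the Jaccard similarity with a rank correlation between the full rankings or a correlation between the weight vectors produced on each subsample.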
Keywords: Feature selection · High dimensionality · Feature stability