Measuring the Stability of Feature Selection with Applications to Ensemble Methods

Nogueira, Sarah; Brown, Gavin

doi:10.1007/978-3-319-20248-8_12


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9132)

Included in the following conference series:

International Workshop on Multiple Classifier Systems


Abstract

Ensemble methods are often used to decide on a good selection of features for later processing by a classifier. Examples include the Random Forest variable importance proposed by Breiman, and feature selection ensembles, in which the outputs of multiple feature selectors are combined to yield more robust results. All of these methods rely critically on the concept of feature selection stability, which is similar to, but distinct from, the concept of diversity in classifier ensembles. We conduct a systematic study of the literature, identify desirable and undesirable properties, and identify a weakness in existing measures. A simple correction is proposed, and empirical studies are conducted to illustrate its utility.
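As a rough illustration of what "feature selection stability" means in practice, the sketch below scores how much the feature subsets chosen on different resamples of the data agree with one another, using average pairwise Jaccard similarity. This is a generic similarity-based illustration only, and it is not the corrected measure proposed in the paper. The function names and the toy subsets are assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (as index collections)."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention assumed here: two empty selections agree perfectly.
        return 1.0
    return len(a & b) / len(a | b)


def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all unordered pairs of feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical output of a feature selector run on three resamples.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
stability = average_pairwise_stability(runs)
print(stability)
```

A value of 1 means every run selected exactly the same features; values near 0 mean the selections barely overlap.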


Notes

1. Proofs of the theorems are available at http://www.cs.man.ac.uk/~nogueirs.

References

Abeel, T., Helleputte, T., Van de Peer, Y., Dupont, P., Saeys, Y.: Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26, 392–398 (2010)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Ditzler, G., Polikar, R., Rosen, G.: A bootstrap based Neyman-Pearson test for identifying variable importance. IEEE Trans. Neural Netw. Learn. Syst. 26, 880–886 (2014)
Dunne, K., Cunningham, P., Azuaje, F.: Solutions to instability problems with sequential wrapper-based approaches to feature selection. Technical report, Journal of Machine Learning Research (2002)
He, Z., Yu, W.: Stable feature selection for biomarker discovery. Comput. Biol. Chem. 34, 215–225 (2010)
Kalousis, A., Prados, J., Hilario, M.: Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl. Inf. Syst. 12, 95–116 (2007)
Křížek, P., Kittler, J., Hlaváč, V.: Improving stability of feature selection methods. In: Kropatsch, W.G., Kampel, M., Hanbury, A. (eds.) CAIP 2007. LNCS, vol. 4673, pp. 929–936. Springer, Heidelberg (2007)
Kuncheva, L.I.: A stability index for feature selection. In: Artificial Intelligence and Applications (2007)
Lustgarten, J.L., Gopalakrishnan, V., Visweswaran, S.: Measuring stability of feature selection in biomedical datasets. In: Proceedings of the AMIA Annual Symposium (2009)
Saeys, Y., Abeel, T., Van de Peer, Y.: Robust feature selection using ensemble feature selection techniques. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 313–325. Springer, Heidelberg (2008)
Somol, P., Novovičová, J.: Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1921–1939 (2010)
Wald, R., Khoshgoftaar, T.M., Napolitano, A.: Stability of filter- and wrapper-based feature subset selection. In: International Conference on Tools with Artificial Intelligence. IEEE Computer Society (2013)
Yu, L., Ding, C.H.Q., Loscalzo, S.: Stable feature selection via dense feature groups. In: KDD (2008)
Zhang, M., Zhang, L., Zou, J., Yao, C., Xiao, H., Liu, Q., Wang, J., Wang, D., Wang, C., Guo, Z.: Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes. Bioinformatics 25, 1662–1668 (2009)

Acknowledgements

This work was supported by the EPSRC grant [EP/I028099/1].

Author information

Authors and Affiliations

School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
Sarah Nogueira & Gavin Brown


Corresponding author

Correspondence to Sarah Nogueira.

Editor information

Editors and Affiliations

Ulm University, Ulm, Germany
Friedhelm Schwenker
University of Cagliari, Cagliari, Italy
Fabio Roli
University of Surrey, Guildford, United Kingdom
Josef Kittler

About this paper

Cite this paper

Nogueira, S., Brown, G. (2015). Measuring the Stability of Feature Selection with Applications to Ensemble Methods. In: Schwenker, F., Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2015. Lecture Notes in Computer Science, vol 9132. Springer, Cham. https://doi.org/10.1007/978-3-319-20248-8_12


DOI: https://doi.org/10.1007/978-3-319-20248-8_12
Published: 03 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20247-1
Online ISBN: 978-3-319-20248-8
eBook Packages: Computer Science, Computer Science (R0)
