Abstract
In addition to accuracy, stability is a measure of success for a feature selection algorithm. Stability is a particular concern when a data set has few samples and high dimensionality. In this study, we introduce a stability measure and evaluate both the accuracy and the stability of the MRMR (Minimum Redundancy Maximum Relevance) feature selection algorithm on different data sets. The two feature evaluation criteria used by MRMR, MID (Mutual Information Difference) and MIQ (Mutual Information Quotient), yield similar accuracies, but MID is more stable. We also introduce a new feature selection criterion, MIDα, in which the balance between the redundancy and the relevance of the selected features is controlled by a parameter α.
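To make the criteria concrete: MRMR greedily selects features, scoring each candidate by its mutual-information relevance to the class minus (MID) or divided by (MIQ) its average mutual information with the already-selected features; the abstract's MIDα weights the redundancy term by α. Below is a minimal, hedged sketch of greedy selection with an MIDα-style score — it is not the authors' code, and the function names and the discrete plug-in MI estimator are my own assumptions:

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Empirical (plug-in) mutual information, in bits, between two
    discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def mrmr_mid_alpha(features, labels, k, alpha=1.0):
    """Greedy MRMR with an MID_alpha-style score (illustrative sketch):
        score(f) = I(f; c) - alpha * mean_{s in S} I(f; s)
    alpha = 1 recovers plain MID; alpha = 0 ranks by relevance alone.
    `features` maps feature names to discrete value columns."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    # Seed the selected set with the single most relevant feature.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for name in features:
            if name in selected:
                continue
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            score = relevance[name] - alpha * redundancy
            if score > best_score:
                best, best_score = name, score
        selected.append(best)
    return selected
```

A duplicated feature illustrates the effect of α: with α = 0 the score ignores redundancy, so an exact copy of the best feature can be selected second, whereas larger α penalizes it.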
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Gulgezen, G., Cataltepe, Z., Yu, L. (2009). Stable and Accurate Feature Selection. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8