Stable and Accurate Feature Selection

  • Gokhan Gulgezen
  • Zehra Cataltepe
  • Lei Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

In addition to accuracy, stability is also a measure of success for a feature selection algorithm. Stability can be a particular concern when the number of samples in a data set is small and the dimensionality is high. In this study, we introduce a stability measure and evaluate both the accuracy and the stability of the MRMR (Minimum Redundancy Maximum Relevance) feature selection algorithm on different data sets. The two feature evaluation criteria used by MRMR, MID (Mutual Information Difference) and MIQ (Mutual Information Quotient), result in similar accuracies, but MID is more stable. We also introduce a new feature selection criterion, MID_α, in which the balance between redundancy and relevance of the selected features is controlled by the parameter α.
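The criteria named in the abstract can be illustrated with a minimal sketch of greedy MRMR selection over discrete features. MID and MIQ follow the usual formulation (relevance minus redundancy, and relevance divided by redundancy, with both terms measured by mutual information). The abstract does not give the formula for MID_α, so the weighted difference used below, α · relevance − (1 − α) · redundancy, is an assumption made only for illustration. The function names, the `eps` guard, and the toy data are likewise hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(features, labels, k, criterion="MID", alpha=0.5, eps=1e-12):
    """Greedily pick k feature names from a dict of name -> discrete values."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], list(features)

    def score(f):
        if not selected:
            # First pick: no redundancy term yet, so rank by relevance alone.
            return relevance[f]
        # Redundancy: mean mutual information with the already selected features.
        redundancy = sum(
            mutual_information(features[f], features[s]) for s in selected
        ) / len(selected)
        if criterion == "MID":
            return relevance[f] - redundancy
        if criterion == "MIQ":
            # eps avoids division by zero; it is an implementation convenience.
            return relevance[f] / (redundancy + eps)
        if criterion == "MID_alpha":
            # ASSUMPTION: this weighting is illustrative, not taken from the paper.
            return alpha * relevance[f] - (1 - alpha) * redundancy
        raise ValueError(f"unknown criterion: {criterion}")

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" duplicates "a" (fully redundant); "c" is weakly relevant
# but nearly independent of "a".
labels = [0] * 6 + [1] * 6
features = {
    "a": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "c": [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
}
mid_pick = mrmr(features, labels, 2, "MID")
miq_pick = mrmr(features, labels, 2, "MIQ")
relevance_only_pick = mrmr(features, labels, 2, "MID_alpha", alpha=1.0)
```

On this toy data, MID and MIQ both skip the duplicate feature in favour of the less redundant one, while setting α to 1 in the assumed MID_α form ignores redundancy and ranks by relevance alone.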

Keywords

Feature Selection; Stable Feature Selection; Stability; MRMR (Minimum Redundancy Maximum Relevance)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Gokhan Gulgezen (1)
  • Zehra Cataltepe (1)
  • Lei Yu (2)

  1. Computer Engineering Department, Istanbul Technical University, Istanbul, Turkey
  2. Computer Science Department, Binghamton University, Binghamton, USA
