A filter-dominating hybrid sequential forward floating search method for feature subset selection in high-dimensional space

  • John Q. Gan
  • Bashar Awwad Shiekh Hasan
  • Chun Sing Louis Tsui
Original Article

Abstract

Sequential forward floating search (SFFS) has been well recognized as one of the best feature selection methods. This paper proposes a filter-dominating hybrid SFFS method, aiming at high efficiency with negligible loss of accuracy in high-dimensional feature subset selection. Experiments with this new hybrid approach have been conducted on five feature data sets, with different combinations of classifier and separability index as alternative criteria for evaluating the performance of candidate feature subsets. The classifiers under consideration include the linear discriminant analysis classifier, support vector machine, and K-nearest neighbors classifier, and the separability indices include the Davies-Bouldin index and a mutual information based index. Experimental results demonstrate the advantages and usefulness of the proposed method in high-dimensional feature subset selection.
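To make the idea concrete, the sketch below illustrates one way a filter-dominating hybrid SFFS can be organized: a cheap filter criterion (here the Davies-Bouldin separability index) drives every forward and floating (conditional backward) step, while the expensive wrapper criterion (a cross-validated classifier) is reserved for validating the final subset. This is a minimal sketch under stated assumptions, not the authors' exact algorithm: the helper names, the K-nearest neighbors wrapper, and the end-only wrapper policy are illustrative choices.

# Minimal sketch of a filter-dominating hybrid SFFS (an illustration,
# not the paper's exact method). A cheap filter criterion drives the
# floating search; the costly wrapper only validates the final subset.
# Helper names and the end-only wrapper policy are assumptions.

import numpy as np
from sklearn.metrics import davies_bouldin_score
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def filter_score(X, y, subset):
    # Davies-Bouldin is "lower is better", so negate it to maximize.
    return -davies_bouldin_score(X[:, subset], y)

def wrapper_score(X, y, subset):
    # Costly wrapper criterion: cross-validated KNN accuracy.
    return cross_val_score(KNeighborsClassifier(3), X[:, subset], y, cv=3).mean()

def hybrid_sffs(X, y, target_size):
    subset, best = [], {}  # best[k] = (score, best subset of size k seen so far)
    while len(subset) < target_size:
        # Forward step: add the feature that maximizes the filter criterion.
        pool = [f for f in range(X.shape[1]) if f not in subset]
        subset.append(max(pool, key=lambda f: filter_score(X, y, subset + [f])))
        s = filter_score(X, y, subset)
        if s > best.get(len(subset), (-np.inf,))[0]:
            best[len(subset)] = (s, list(subset))
        # Floating (conditional backward) steps: drop the least significant
        # feature as long as that beats the best recorded smaller subset.
        while len(subset) > 2:
            drop = max(subset, key=lambda f:
                       filter_score(X, y, [g for g in subset if g != f]))
            reduced = [g for g in subset if g != drop]
            s = filter_score(X, y, reduced)
            if s > best.get(len(reduced), (-np.inf,))[0]:
                best[len(reduced)] = (s, list(reduced))
                subset = reduced
            else:
                break
    return subset, wrapper_score(X, y, subset)

Because the classifier is trained only when the search finishes, the cost of the search scales with the cheap separability index rather than with repeated classifier training, which reflects the efficiency argument made in the abstract for high-dimensional feature subset selection.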

Keywords

Data mining · Feature selection · High-dimensional data analysis · Performance evaluation · Search algorithm


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • John Q. Gan (1)
  • Bashar Awwad Shiekh Hasan (1)
  • Chun Sing Louis Tsui (1)
  1. School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
