Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm

  • Claudio De Stefano
  • Francesco Fontanella
  • Alessandra Scotto di Freca
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10199)


In classification and clustering problems, feature selection techniques can be used to reduce the dimensionality of the data and increase the performances. However, feature selection is a challenging task, especially when hundred or thousands of features are involved. In this framework, we present a new approach for improving the performance of a filter-based genetic algorithm. The proposed approach consists of two steps: first, the available features are ranked according to a univariate evaluation function; then the search space represented by the first M features in the ranking is searched using a filter-based genetic algorithm for finding feature subsets with a high discriminative power.

Experimental results demonstrated the effectiveness of our approach in dealing with high dimensional data, both in terms of recognition rate and feature number reduction.


  1. 1.
    Nips 2003 workshop on feature extraction and feature selection challenge (2003).
  2. 2.
    Bermejo, P., Gámez, J.A., Puerta, J.M.: Improving incremental wrapper-based subset selection via replacement and early stopping. IJPRAI 25(5), 605–625 (2011)MathSciNetGoogle Scholar
  3. 3.
    Cordella, L.P., De Stefano, C., Fontanella, F., Marrocco, C., Scotto di Freca, A.: Combining single class features for improving performance of a two stage classifier. In: 20th International Conference on Pattern Recognition (ICPR 2010), pp. 4352–4355. IEEE Computer Society (2010)Google Scholar
  4. 4.
    Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(1–4), 131–156 (1997)CrossRefGoogle Scholar
  5. 5.
    De Stefano, C., Fontanella, F., Marrocco, C.: A GA-based feature selection algorithm for remote sensing images. In: Giacobini, M., et al. (eds.) EvoWorkshops 2008. LNCS, vol. 4974, pp. 285–294. Springer, Heidelberg (2008). doi: 10.1007/978-3-540-78761-7_29CrossRefGoogle Scholar
  6. 6.
    De Stefano, C., Fontanella, F., Maniaci, M., Scotto di Freca, A.: A method for scribe distinction in medieval manuscripts using page layout features. In: Maino, G., Foresti, G.L. (eds.) ICIAP 2011. LNCS, vol. 6978, pp. 393–402. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-24085-0_41CrossRefGoogle Scholar
  7. 7.
    Gütlein, M., Frank, E., Hall, M., Karwath, A.: Large scale attribute selection using wrappers. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2009) (2009)Google Scholar
  8. 8.
    Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)zbMATHGoogle Scholar
  9. 9.
    Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 359–366. Morgan Kaufmann Publishers Inc., San Francisco (2000)Google Scholar
  10. 10.
    Huang, J., Cai, Y., Xu, X.: A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn. Lett. 28(13), 1825–1844 (2007)CrossRefGoogle Scholar
  11. 11.
    Lanzi, P.: Fast feature selection with genetic algorithms: a filter approach. In: IEEE International Conference on Evolutionary Computation, pp. 537–540, April 1997Google Scholar
  12. 12.
    Lee, J.S., Oh, I.S., Moon, B.R.: Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1424–1437 (2004)CrossRefGoogle Scholar
  13. 13.
    Li, R., Lu, J., Zhang, Y., Zhao, T.: Dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation. Knowl. Based Syst. 23(3), 195–201 (2010)CrossRefGoogle Scholar
  14. 14.
    Lichman, M.: UCI machine learning repository (2013).
  15. 15.
    Liu, H., Setiono, R.: Chi2: Feature selection and discretization of numeric attributes. In: ICTAI, pp. 88–91. IEEE Computer Society, Washington, DC (1995)Google Scholar
  16. 16.
    Manimala, K., Selvi, K., Ahila, R.: Hybrid soft computing techniques for feature selection and parameter optimization in power quality data mining. Appl. Soft Comput. 11(8), 5485–5497 (2011). Scholar
  17. 17.
    Ochoa, G.: Error thresholds in genetic algorithms. Evol. Comput. 14(2), 157–182 (2006)CrossRefGoogle Scholar
  18. 18.
    Oreski, S., Oreski, G.: Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Syst. Appl. 41(4, Part 2), 2052–2064 (2014)CrossRefGoogle Scholar
  19. 19.
    Spolaôr, N., Lorena, A.C., Lee, H.D.: Multi-objective genetic algorithm evaluation in feature selection. In: Takahashi, R.H.C., Deb, K., Wanner, E.F., Greco, S. (eds.) EMO 2011. LNCS, vol. 6576, pp. 462–476. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-19893-9_32CrossRefGoogle Scholar
  20. 20.
    Tan, F., Fu, X., Zhang, Y., Bourgeois, A.G.: A genetic algorithm-based method for feature subset selection. Soft Comput. 12(2), 111–120 (2007)CrossRefGoogle Scholar
  21. 21.
    Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)CrossRefGoogle Scholar
  22. 22.
    Yusta, S.C.: Different metaheuristic strategies to solve the feature selection problem. Pattern Recogn. Lett. 30(5), 525–534 (2009)CrossRefGoogle Scholar
  23. 23.
    Zhai, Y., Ong, Y.S., Tsang, I.: The emerging “big dimensionality”. IEEE Comput. Intell. Mag. 9(3), 14–26 (2014)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Claudio De Stefano
    • 1
  • Francesco Fontanella
    • 1
  • Alessandra Scotto di Freca
    • 1
  1. 1.Dipartimento di Ingegneria Elettrica e dell’Informazione (DIEI)Università di Cassino e del Lazio meridionaleCassinoItaly

Personalised recommendations