
Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm

  • Conference paper
Applications of Evolutionary Computation (EvoApplications 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10199)

Abstract

In classification and clustering problems, feature selection techniques can be used to reduce the dimensionality of the data and improve performance. However, feature selection is a challenging task, especially when hundreds or thousands of features are involved. In this framework, we present a new approach for improving the performance of a filter-based genetic algorithm. The proposed approach consists of two steps: first, the available features are ranked according to a univariate evaluation function; then, the search space represented by the first M features in the ranking is searched using a filter-based genetic algorithm to find feature subsets with high discriminative power.

Experimental results demonstrated the effectiveness of our approach in dealing with high dimensional data, both in terms of recognition rate and feature number reduction.
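
The two-step procedure summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the univariate evaluation function or the filter fitness, so both are assumptions here: features are ranked by absolute Pearson correlation with the class label, and subsets are scored with a CFS-style merit (feature-class correlation against feature-feature correlation). The function names, the GA operators (binary tournament, one-point crossover, bit-flip mutation, elitism) and all parameter values are hypothetical choices for illustration, not the settings used in the paper.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov / math.sqrt(vx * vy))


def rank_features(columns, labels):
    """Step 1: rank feature indices by a univariate score, best first.

    Assumption: the score is |Pearson correlation| with the class label.
    """
    scores = [pearson(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


def merit(subset, columns, labels):
    """Assumed filter fitness: CFS-style merit of a feature subset."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(pearson(columns[j], labels) for j in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_select(columns, labels, m, pop_size=20, generations=30,
              p_mut=0.05, seed=0):
    """Step 2: search bit masks over the top-m ranked features with a GA."""
    rng = random.Random(seed)
    top = rank_features(columns, labels)[:m]

    def decode(mask):
        return [top[i] for i, bit in enumerate(mask) if bit]

    def fitness(mask):
        return merit(decode(mask), columns, labels)

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, m) if m > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return sorted(decode(best)), fitness(best)
```

Here `columns` is a list of feature columns (one list of values per feature) and `labels` holds numeric class labels; calling `ga_select(columns, labels, m=50)` would restrict the genetic search to the 50 top-ranked features, which is the point of the ranking step: the chromosome length drops from the full feature count to M.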

Notes

  1. Note that the same holds also for the feature-class correlation.

Author information

Corresponding author

Correspondence to Francesco Fontanella.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

De Stefano, C., Fontanella, F., Scotto di Freca, A. (2017). Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science, vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_33

  • DOI: https://doi.org/10.1007/978-3-319-55849-3_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55848-6

  • Online ISBN: 978-3-319-55849-3

  • eBook Packages: Computer Science (R0)
