Variable-Length Representation for EC-Based Feature Selection in High-Dimensional Data

  • N. D. Cilia
  • C. De Stefano
  • F. Fontanella (corresponding author)
  • A. Scotto di Freca
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11454)

Abstract

Feature selection is a challenging problem, especially when hundreds or thousands of features are involved. Evolutionary computation (EC) techniques, and genetic algorithms in particular, have proven effective on such problems because of their ability to explore large and complex search spaces. Although the binary strings used by genetic algorithms provide a natural way to represent feature subsets, several other representation schemes have been proposed to improve performance, and most of them require the number of features to be set a priori. In this paper, we propose a novel variable-length representation in which feature subsets are represented by lists of integers. We also devise a crossover operator to cope with the variable-length representation. The proposed approach was tested on several datasets, and the results were compared with those achieved by a standard genetic algorithm. The comparison shows that the proposed approach improves on the performance obtainable with a standard genetic algorithm when thousands of features are involved.
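
To make the contrast between the two encodings concrete, the following is a minimal sketch, not the method of the paper. It shows a feature subset stored as a fixed-length binary mask and as a variable-length list of integer indices, together with a generic cut-and-splice recombination for variable-length lists. The abstract does not describe the authors' crossover operator, so `cut_and_splice`, `dedupe`, and the conversion helpers are hypothetical names introduced here for illustration only.

```python
import random


def mask_to_indices(mask):
    """Convert a fixed-length binary mask into a list of selected feature indices."""
    return [i for i, bit in enumerate(mask) if bit]


def indices_to_mask(indices, n_features):
    """Convert a list of feature indices back into a binary mask of length n_features."""
    selected = set(indices)
    return [1 if i in selected else 0 for i in range(n_features)]


def dedupe(indices):
    """Drop repeated feature indices while preserving first-seen order."""
    return list(dict.fromkeys(indices))


def cut_and_splice(parent_a, parent_b, rng):
    """Generic variable-length crossover (an assumption, NOT the paper's operator).

    Each parent is cut at its own random point; the head of one parent is
    joined to the tail of the other, so children may differ in length from
    both parents. A child can come out empty, which a real GA would have to
    repair or reject.
    """
    cut_a = rng.randint(0, len(parent_a))  # randint is inclusive on both ends
    cut_b = rng.randint(0, len(parent_b))
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return dedupe(child_1), dedupe(child_2)


if __name__ == "__main__":
    rng = random.Random(0)
    a = [3, 17, 42, 256]   # subset of 4 features out of, say, thousands
    b = [5, 17, 900]       # subset of 3 features
    print(cut_and_splice(a, b, rng))
```

With thousands of features, a binary mask always has one gene per feature, whereas the integer list grows only with the number of selected features. This is the practical motivation suggested by the abstract for a variable-length encoding.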

Keywords

Feature selection, Evolutionary algorithms, Variable length representation

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • N. D. Cilia (1)
  • C. De Stefano (1)
  • F. Fontanella (1), corresponding author
  • A. Scotto di Freca (1)

  1. Dipartimento di Ingegneria Elettrica e dell’Informazione (DIEI), Università di Cassino e del Lazio meridionale, Cassino, Italy
