A Novel Outlook on Feature Selection as a Multi-objective Problem

  • Pietro Barbiero
  • Evelyne Lutton
  • Giovanni Squillero
  • Alberto Tonda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12052)


Feature selection is the process of choosing, or removing, features to obtain the most informative feature subset of minimal size. Such subsets are used to improve the performance of machine learning algorithms and to make their results easier for humans to interpret. Approaches to feature selection in the literature exploit several optimization algorithms. Multi-objective methods have also been proposed, which simultaneously minimize the number of features and the error. While most approaches assess error using the average of a stochastic K-fold cross-validation, comparing averages can be misleading. In this paper, we show how feature subsets with different average errors may in fact be non-separable when compared using a statistical test. Following this idea, clusters of non-separable optimal feature subsets are identified. Feature selection performance can thus be evaluated by counting how many of these optimal feature subsets an algorithm is able to identify. We therefore propose a multi-objective optimization approach to feature selection, EvoFS, whose objectives are to (i) minimize the feature subset size, (ii) minimize the test error of a 10-fold cross-validation with a given classifier, and (iii) maximize the analysis-of-variance (ANOVA) F-value of the lowest-performing feature in the subset. Experiments on classification datasets whose feature subsets can be exhaustively evaluated show that our approach always finds the best feature subsets. Further experiments on a high-dimensional classification dataset, which cannot be analyzed exhaustively, show that our approach finds more optimal feature subsets than state-of-the-art feature selection algorithms.
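The three EvoFS objectives, and the statistical comparison of per-fold errors, can be sketched with scikit-learn and SciPy. This is a minimal illustration under assumptions, not the authors' implementation: the dataset (wine), classifier (logistic regression with standardization), candidate subsets, and the function `evaluate_subset` are all hypothetical choices made for the example.

```python
# Illustrative sketch of the three objectives described in the abstract.
# All names here (evaluate_subset, the chosen dataset, classifier and
# subsets) are assumptions for the example, not the EvoFS code.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import load_wine
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_subset(X, y, subset, classifier):
    """Return the three objective values for a candidate feature subset."""
    Xs = X[:, subset]
    # Objective (i): subset size, to be minimized.
    size = len(subset)
    # Objective (ii): per-fold test error of a 10-fold cross-validation,
    # to be minimized (kept as an array so subsets can be compared
    # statistically rather than by their averages alone).
    fold_errors = 1.0 - cross_val_score(classifier, Xs, y, cv=10)
    # Objective (iii): ANOVA F-value of the lowest-performing feature
    # in the subset, to be maximized.
    f_values, _ = f_classif(Xs, y)
    return size, fold_errors, f_values.min()


X, y = load_wine(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

size_a, err_a, fmin_a = evaluate_subset(X, y, [0, 6, 9], clf)
size_b, err_b, fmin_b = evaluate_subset(X, y, [0, 6, 9, 12], clf)

# Two subsets with different average errors may still be non-separable:
# compare the per-fold errors with Welch's t-test (unequal variances).
_, p_value = ttest_ind(err_a, err_b, equal_var=False)
print(f"mean errors: {err_a.mean():.3f} vs {err_b.mean():.3f}, "
      f"Welch p = {p_value:.3f}")
```

A large p-value here would indicate that the two subsets belong to the same cluster of non-separable solutions, which is the situation the paper argues a plain comparison of averages would miss.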


Keywords: Feature selection · Machine learning · Multi-objective optimization · Evolutionary algorithms · Multi-objective evolutionary algorithms



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Politecnico di Torino, Torino, Italy
  2. UMR 782, Université Paris-Saclay, INRA, AgroParisTech, Thiverval-Grignon, France
