
Testing Different Ensemble Configurations for Feature Selection


Abstract

In recent years, ensemble learning has become a prolific area of study in pattern recognition, based on the premise that combining different learning models on the same problem can yield better performance than using a single model. Ensemble learning has traditionally been applied to classification tasks, but has more recently been adapted to other machine learning tasks such as clustering and feature selection. We propose several feature selection ensemble configurations that combine the feature rankings produced by individual rankers, varying the combination method and the threshold value used. The performance of each proposed ensemble configuration was tested on synthetic datasets (to assess the adequacy of the selection), classical real datasets (with more samples than features), and DNA microarray datasets (with more features than samples). Five different classifiers were used to test the suitability of the proposed ensemble configurations and assess the results.
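To make the ranking-combination idea concrete, the following is a minimal sketch of one possible ensemble configuration. It assumes three scikit-learn filter scores as the individual rankers, mean rank as the combination method, and retention of the top 25% of features as the threshold; these specific choices are illustrative assumptions, not the exact configurations evaluated in the paper.

```python
# Minimal sketch of a feature selection ensemble combining the rankings
# of several individual rankers. The three filter scores, the mean-rank
# combiner, and the 25% retention threshold are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=200, n_features=50, n_informative=8,
                           random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs

# Each ranker maps (X, y) to one relevance score per feature
# (higher score = more relevant).
rankers = [
    lambda X, y: chi2(X, y)[0],
    lambda X, y: f_classif(X, y)[0],
    mutual_info_classif,
]

# Convert each ranker's scores into rank positions (0 = best) so that
# rankers operating on different scales can be combined.
ranks = []
for rank_fn in rankers:
    scores = rank_fn(X, y)
    order = np.argsort(-scores)             # feature indices, best first
    rank = np.empty_like(order)
    rank[order] = np.arange(len(scores))    # rank position of each feature
    ranks.append(rank)
ranks = np.array(ranks)

# Combination method: mean rank across rankers
# (alternatives include median or minimum rank).
combined = ranks.mean(axis=0)

# Threshold: retain the top 25% of features in the combined ranking.
k = max(1, int(0.25 * X.shape[1]))
selected = np.argsort(combined)[:k]
print(f"Selected {k} features: {sorted(selected.tolist())}")
```

Swapping the aggregation line or the threshold constant yields the other configurations the abstract alludes to, with the selected subset then evaluated by the downstream classifiers.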




Acknowledgements

This research was financially supported in part by the Spanish Ministerio de Economía y Competitividad (Research Project TIN 2015-65069-C2-1R), partially funded by FEDER funds of the European Union, and by the Consellería de Industria of the Xunta de Galicia (Research Project GRC2014/035). V. Bolón-Canedo acknowledges Xunta de Galicia postdoctoral funding (ED481B 2014/164-0). Financial support from the Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2016–2019) and the European Union (European Regional Development Fund, ERDF) (Research Project ED431G/01) is gratefully acknowledged.

Author information


Corresponding author

Correspondence to Borja Seijo-Pardo.


About this article


Cite this article

Seijo-Pardo, B., Bolón-Canedo, V. & Alonso-Betanzos, A. Testing Different Ensemble Configurations for Feature Selection. Neural Process Lett 46, 857–880 (2017). https://doi.org/10.1007/s11063-017-9619-1

