Ensemble Feature Subset Selection: Integration of Symmetric Uncertainty and Chi-Square techniques with RReliefF

  • Original Contribution
Journal of The Institution of Engineers (India): Series B

Abstract

The emergence of high-dimensional data brings severe problems and challenges alongside its apparent benefits, and analyzing such data demands substantial processing resources. In this paper, we propose two multistage feature ranking methods: ChS-R (Chi-square integrated with RReliefF) and SU-R (Symmetric Uncertainty integrated with RReliefF). Each method combines different statistical techniques to select an appropriate feature subset, so that the benefits of one technique offset the issues of the other. In ChS-R, the Chi-square (ChS) test is first applied to select the top n features, followed by RReliefF. The RReliefF algorithm selects attributes according to their suitability for the target function, which gives a global view of attribute quality for further dimensionality reduction. In addition, RReliefF deals with noisy, incomplete, and multi-class data. Similarly, SU-R integrates Symmetric Uncertainty (SU) with RReliefF. The results are validated with random forest (RF), K-nearest neighbor (KNN), and support vector machine (SVM) classifiers. The proposed methods are compared with SU, ChS, Relief, and Ensemble Feature Selection with Mutual Information (EFS-MI). The proposed approach achieves 89.48% dimensionality reduction.
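The first (filter) stage of SU-R can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete features, uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and omits the RReliefF stage entirely. The function names, the toy data, and the choice of n = 2 are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy       # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def top_n_by_su(columns, target, n):
    """Names of the n features with the highest SU against the target."""
    scores = {name: symmetric_uncertainty(col, target)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


def reduction_percent(n_original, n_selected):
    """Share of features removed, as a percentage."""
    return 100.0 * (1 - n_selected / n_original)


# Hypothetical toy data: four discrete features and a binary target.
target = [0, 0, 1, 1, 0, 0, 1, 1]
columns = {
    "f_copy":    [0, 0, 1, 1, 0, 0, 1, 1],  # identical to the target
    "f_partial": [0, 0, 1, 1, 0, 1, 1, 1],  # differs from target in one row
    "f_noise":   [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the target
    "f_const":   [1, 1, 1, 1, 1, 1, 1, 1],  # carries no information
}

selected = top_n_by_su(columns, target, n=2)
print(selected, reduction_percent(len(columns), len(selected)))
```

In the scheme the abstract describes, the features kept by this stage would then be passed to RReliefF. A ChS-R sketch would have the same shape, with a Chi-square statistic in place of `symmetric_uncertainty`.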

Funding

There was no outside funding or grant that assisted in this research.

Author information

Corresponding author

Correspondence to Archana Shivdas Sumant.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sumant, A.S., Patil, D. Ensemble Feature Subset Selection: Integration of Symmetric Uncertainty and Chi-Square techniques with RReliefF. J. Inst. Eng. India Ser. B 103, 831–844 (2022). https://doi.org/10.1007/s40031-021-00684-5

  • DOI: https://doi.org/10.1007/s40031-021-00684-5
