Abstract
The emergence of high-dimensional data brings severe problems and challenges alongside its apparent benefits. In particular, analyzing such data demands substantial processing resources. In this paper, we propose two multistage feature-ranking methods: ChS-R (Chi-square integrated with RReliefF) and SU-R (Symmetric Uncertainty integrated with RReliefF). Each method chains different statistical measures to select an appropriate feature subset, so that the strengths of one measure compensate for the weaknesses of the other. In ChS-R, the Chi-square (ChS) test is first applied to select the top n features, which are then ranked by RReliefF. RReliefF scores attributes according to their suitability for the target function and gives a global view of attribute quality for further dimensionality reduction. In addition, RReliefF handles noisy, incomplete, and multi-class data. SU-R follows the same scheme, with Symmetric Uncertainty (SU) in place of Chi-square as the first stage. The results are validated with random forest (RF), k-nearest neighbor (KNN), and support vector machine (SVM) classifiers. The proposed methods are compared with the SU, ChS, Relief, and Ensemble Feature Selection with Mutual Information (EFS-MI) methods. The proposed approach achieves 89.48% dimensionality reduction.
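The abstract describes the ChS-R and SU-R pipelines only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. A first-stage filter (the Chi-square statistic of a feature/class contingency table, or Symmetric Uncertainty, SU = 2·IG / (H(X) + H(Y))) keeps the top n features, and RReliefF then re-scores only that subset. Assumptions: features are discrete (integer-coded); the class label uses a 0/1 difference function; every neighbour gets an equal weight 1/k instead of the rank-based weighting of the original RReliefF; and the function names `chi_square`, `symmetric_uncertainty`, `rrelieff`, and `two_stage_rank` are hypothetical.

```python
import numpy as np


def chi_square(x, y):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1.0)  # count co-occurrences
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def _entropy(counts):
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def symmetric_uncertainty(x, y):
    """SU = 2 * IG(X; Y) / (H(X) + H(Y)), in [0, 1] for discrete X and Y."""
    h_x = _entropy(np.unique(x, return_counts=True)[1])
    h_y = _entropy(np.unique(y, return_counts=True)[1])
    h_xy = _entropy(np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)[1])
    denom = h_x + h_y
    return 0.0 if denom == 0 else 2.0 * (h_x + h_y - h_xy) / denom


def rrelieff(X, y, k=5, m=None, seed=0):
    """RReliefF weights with equal neighbour weights 1/k (a simplifying assumption)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, a = X.shape
    m = n if m is None else m
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant attributes: avoid division by zero
    Xn = (X - lo) / span  # diff(A, I1, I2) = |v1 - v2| / (max - min)
    n_dc, n_da, n_dcda = 0.0, np.zeros(a), np.zeros(a)
    for i in rng.choice(n, size=m, replace=m > n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance to instance i
        dist[i] = np.inf  # never pick the instance itself as a neighbour
        nn = np.argsort(dist)[:k]
        w = 1.0 / k
        d_tau = (y[nn] != y[i]).astype(float)  # 0/1 difference on the class
        d_attr = np.abs(Xn[nn] - Xn[i])  # shape (k, a)
        n_dc += w * d_tau.sum()
        n_da += w * d_attr.sum(axis=0)
        n_dcda += w * (d_tau[:, None] * d_attr).sum(axis=0)
    eps = 1e-12  # guards the degenerate cases n_dc == 0 and n_dc == m
    return n_dcda / max(n_dc, eps) - (n_da - n_dcda) / max(m - n_dc, eps)


def two_stage_rank(X, y, first_stage, n_top, k=5):
    """Keep the n_top features by first_stage score, then rank them by RReliefF."""
    scores = np.array([first_stage(X[:, j], y) for j in range(X.shape[1])])
    top = np.argsort(-scores)[:n_top]
    weights = rrelieff(X[:, top], y, k=k)
    return top[np.argsort(-weights)]  # original column indices, best first


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y = rng.integers(0, 3, size=200)
    X = rng.integers(0, 4, size=(200, 8)).astype(float)
    X[:, 0] = y  # perfectly informative feature
    X[:, 1] = np.where(rng.random(200) < 0.2, rng.integers(0, 3, size=200), y)  # noisy copy
    print("ChS-R-style ranking:", two_stage_rank(X, y, chi_square, n_top=4))
    print("SU-R-style ranking: ", two_stage_rank(X, y, symmetric_uncertainty, n_top=4))
```

Passing `chi_square` gives a ChS-R-style ranking and passing `symmetric_uncertainty` gives an SU-R-style ranking; `n_top` plays the role of n in the abstract. Filtering first means RReliefF's neighbour search, which costs O(m·n·a) for m sampled instances in this sketch, runs over far fewer attributes a.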
Funding
There was no outside funding or grant that assisted in this research.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Sumant, A.S., Patil, D. Ensemble Feature Subset Selection: Integration of Symmetric Uncertainty and Chi-Square techniques with RReliefF. J. Inst. Eng. India Ser. B 103, 831–844 (2022). https://doi.org/10.1007/s40031-021-00684-5
DOI: https://doi.org/10.1007/s40031-021-00684-5