Abstract
Feature selection algorithms are widely used in critical applications of machine learning, pattern recognition, disease diagnosis and fraud identification. Redundant and irrelevant attributes degrade the underlying model, increasing its time complexity and lowering its accuracy. Feature selection has therefore become one of the most widely used pre-processing techniques, as it improves both running time and accuracy. Several methods exist for identifying an optimal subset of features from a large feature set. However, no single feature selection technique has proved effective for all types of training set; each is effective with specific classifiers and specific datasets. This paper uses a maximum relevancy and minimum redundancy based ensemble feature selection model for effective classification with maximized accuracy. The proposed model collects the ranks produced by different filtering techniques and converts them to scores using the rank-sum approach. The accumulated scores signify feature importance; for each feature selected on that basis, the mutual information with the class variable and with the already selected optimal features is computed. The model has been validated in experiments with different classifiers and is shown to improve classification performance more than the existing single-rank techniques.
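The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical: `rank_sum_scores` assumes the common normalized rank-sum weight 2(n + 1 - r) / (n(n + 1)) for rank r among n features, `mutual_information` assumes discrete feature values, and `mrmr_select` assumes the standard greedy "relevance minus mean redundancy" criterion, which may differ from the exact rule used in the paper. The data are toy values made up for illustration.

```python
from collections import Counter
from math import log2


def rank_sum_scores(rankings):
    """Combine several best-to-worst feature rankings into one score per feature."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking, start=1):
            # Assumed rank-sum weight: rank 1 gets the largest share,
            # and the weights of one ranking sum to 1.
            scores[feature] += 2 * (n + 1 - position) / (n * (n + 1))
    return dict(scores)


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, target, scores, k):
    """Greedily pick k features with high relevance and low redundancy."""
    # Visit candidates in descending accumulated score so that ties in the
    # criterion are resolved in favour of the higher-scored feature.
    candidates = sorted(columns, key=lambda f: -scores.get(f, 0.0))
    selected = []
    while candidates and len(selected) < k:
        def criterion(f):
            relevance = mutual_information(columns[f], target)
            redundancy = (
                sum(mutual_information(columns[f], columns[s]) for s in selected)
                / len(selected)
                if selected
                else 0.0
            )
            return relevance - redundancy

        best = max(candidates, key=criterion)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a", "c" is weaker but less redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 0, 1, 0, 0, 1, 1, 1],
}
# Three hypothetical filter rankings, best feature first.
rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]

scores = rank_sum_scores(rankings)
selected = mrmr_select(columns, target, scores, k=2)
print(scores)
print(selected)
```

On this toy data the duplicate column `b` is passed over in favour of `c`, which is the behaviour the minimum-redundancy term is meant to produce. Continuous features would need discretization or a different mutual information estimator before a sketch like this could be applied.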
Keywords
- Ensemble model
- Feature selection
- Mutual information
- Minimum redundancy
- Maximum relevancy
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saravanan, A., Felix, C.S., Umarani, M. (2022). Maximum Relevancy and Minimum Redundancy Based Ensemble Feature Selection Model for Effective Classification. In: Shaw, R.N., Das, S., Piuri, V., Bianchini, M. (eds) Advanced Computing and Intelligent Technologies. Lecture Notes in Electrical Engineering, vol 914. Springer, Singapore. https://doi.org/10.1007/978-981-19-2980-9_11
DOI: https://doi.org/10.1007/978-981-19-2980-9_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2979-3
Online ISBN: 978-981-19-2980-9
eBook Packages: Computer Science, Computer Science (R0)