Maximum Relevancy and Minimum Redundancy Based Ensemble Feature Selection Model for Effective Classification

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 914)

Abstract

Feature selection algorithms are widely used in critical applications of machine learning, pattern recognition, disease diagnosis, and fraud identification. Redundant and irrelevant attributes degrade the performance of the underlying model, raising its time complexity and lowering its accuracy. Feature selection has therefore become one of the most widely used pre-processing techniques, as it improves both running time and accuracy. Several methods exist for identifying an optimal subset of features from a large feature set. However, no single feature selection technique has proved effective for all types of training sets; each method is effective with specific classifiers and on specific datasets. This paper uses a maximum relevancy and minimum redundancy based ensemble feature selection model for effective classification with maximized accuracy. The proposed model collects the ranks from different filtering techniques and converts them to scores using the rank-sum approach. The accumulated scores signify feature importance; mutual information is then computed between each selected feature and the class variable, as well as between that feature and the other selected optimal features. The model has been validated in experiments with different classifiers and proved more effective at improving classification results than the existing single-rank techniques.
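The two ideas named in the abstract, rank-sum aggregation and mutual-information-based relevancy and redundancy, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring convention or say how the aggregated scores feed the selection step. The sketch assumes that position p (0-based) in a ranking of n features earns n - p points, that candidates are visited in descending score order so the score breaks ties, and that the selection criterion is relevancy minus mean redundancy. The names `rank_sum_scores`, `mutual_information`, and `mrmr_select`, and the toy data, are hypothetical.

```python
from collections import Counter
from math import log2


def rank_sum_scores(rankings):
    """Sum rank-derived points per feature (assumed: n - position points)."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position
    return dict(scores)


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, target, candidates, k):
    """Greedily pick k features by relevancy minus mean redundancy."""
    selected, remaining = [], list(candidates)

    def criterion(feature):
        relevancy = mutual_information(columns[feature], target)
        if not selected:
            return relevancy
        redundancy = sum(
            mutual_information(columns[feature], columns[s]) for s in selected
        ) / len(selected)
        return relevancy - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=criterion)  # first candidate wins ties
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): "b" duplicates "a"; "c" is independent of "a".
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [2 * a + c for a, c in zip(columns["a"], columns["c"])]

# Hypothetical rankings from three filter methods, best feature first.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
scores = rank_sum_scores(rankings)
ordered = sorted(scores, key=scores.get, reverse=True)
selected = mrmr_select(columns, target, ordered, k=2)
print(scores, selected)
```

In this toy case the duplicate feature "b" outranks "c" in the aggregated scores, yet the redundancy term leads the greedy step to prefer "c" once "a" has been chosen.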

Keywords

  • Ensemble model
  • Feature selection
  • Mutual information
  • Minimum redundancy
  • Maximum relevancy

Author information

Corresponding author

Correspondence to A. Saravanan.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Saravanan, A., Felix, C.S., Umarani, M. (2022). Maximum Relevancy and Minimum Redundancy Based Ensemble Feature Selection Model for Effective Classification. In: Shaw, R.N., Das, S., Piuri, V., Bianchini, M. (eds) Advanced Computing and Intelligent Technologies. Lecture Notes in Electrical Engineering, vol 914. Springer, Singapore. https://doi.org/10.1007/978-981-19-2980-9_11

  • DOI: https://doi.org/10.1007/978-981-19-2980-9_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-2979-3

  • Online ISBN: 978-981-19-2980-9

  • eBook Packages: Computer Science, Computer Science (R0)