Efficient feature selection for inconsistent heterogeneous information systems based on a grey wolf optimizer and rough set theory

Abstract

Inconsistent heterogeneous information systems (IHISs) are now widespread, yet feature selection (FS) for such systems remains a challenge that calls for further research. In the present article, we introduce a novel FS algorithm, GWNO, to tackle this challenge. The novelty of GWNO stems from combining grey wolf optimization (GWO) with rough set theory (RST). GWO is used to search for a minimal feature subset that is just sufficient to describe the IHIS, whereas RST is used to design the fitness function that guides the search. For validation, GWNO was implemented and extensively tested with a kNN classifier, using seven publicly available IHISs whose dimensionality ranges from tens of features to more than 2000 features. For each IHIS, GWNO first selected the important features and then submitted those features to kNN for classification. In these tests, FS ended in fewer than 10 iterations and classification accuracy reached 99%. For performance evaluation, GWNO was compared with eight recently published algorithms of its category on the same seven IHISs. It outperformed them all in terms of FS speed, number of features selected, and classification accuracy. Specifically, it completed FS first, selected up to 77% fewer features, and achieved, with those fewer features, higher classification accuracy than the competitors. For rigorous and credible results, tenfold cross-validation was used throughout the experiments.
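To make the role of a rough-set fitness function more concrete, the following is a minimal sketch and not the GWNO fitness function itself, which is not given in the abstract. It assumes a neighborhood relation that treats two samples as indistinguishable when their selected categorical values are equal and their selected numeric values differ by at most a threshold `delta`, and it assumes the weighted form `alpha * dependency + (1 - alpha) * (1 - |S|/|F|)` that is common in rough-set FS work. The names `is_neighbor`, `dependency`, `fitness`, `alpha`, and `delta`, as well as the toy data, are all hypothetical.

```python
from typing import List, Sequence, Union

Value = Union[float, str]  # heterogeneous data: numeric or categorical


def is_neighbor(a: Sequence[Value], b: Sequence[Value],
                selected: Sequence[int], delta: float) -> bool:
    """True if a and b are indistinguishable on the selected features."""
    for j in selected:
        x, y = a[j], b[j]
        if isinstance(x, str) or isinstance(y, str):
            if x != y:                 # categorical: must match exactly
                return False
        elif abs(x - y) > delta:       # numeric: must lie within delta
            return False
    return True


def dependency(rows: Sequence[Sequence[Value]], labels: Sequence[str],
               selected: Sequence[int], delta: float = 0.1) -> float:
    """Fraction of samples whose whole neighborhood shares one class label.

    Samples that look alike but carry different labels (inconsistent
    samples) fall outside this positive region and lower the score.
    """
    if not selected:                   # assumption: empty subset scores 0
        return 0.0
    consistent = 0
    for i, a in enumerate(rows):
        if all(labels[k] == labels[i]
               for k, b in enumerate(rows)
               if is_neighbor(a, b, selected, delta)):
            consistent += 1
    return consistent / len(rows)


def fitness(rows: Sequence[Sequence[Value]], labels: Sequence[str],
            mask: Sequence[int], alpha: float = 0.9,
            delta: float = 0.1) -> float:
    """Reward high dependency and, with weight 1 - alpha, small subsets."""
    selected: List[int] = [j for j, bit in enumerate(mask) if bit]
    gamma = dependency(rows, labels, selected, delta)
    return alpha * gamma + (1 - alpha) * (1 - len(selected) / len(mask))


# Hypothetical toy data: two numeric features and one categorical feature.
rows = [(0.10, "red", 0.9), (0.12, "red", 0.1),
        (0.80, "blue", 0.5), (0.85, "red", 0.4)]
labels = ["a", "a", "b", "b"]

print(dependency(rows, labels, [0]))  # 1.0: feature 0 separates the classes
print(dependency(rows, labels, [1]))  # 0.25: "red" occurs in both classes
print(fitness(rows, labels, [1, 0, 0]) > fitness(rows, labels, [1, 1, 1]))
```

In a wrapper of this kind, a search procedure such as a binary GWO would propose candidate masks and keep those with the highest fitness; on the toy data, the single-feature mask `[1, 0, 0]` scores higher than the full mask because both reach full dependency and the former uses fewer features.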

Availability of data and material

Available upon request.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Ahmed Hamed.

Ethics declarations

Conflict of interest

Not applicable.

Code availability

Available upon request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hamed, A., Nassar, H. Efficient feature selection for inconsistent heterogeneous information systems based on a grey wolf optimizer and rough set theory. Soft Comput 25, 15115–15130 (2021). https://doi.org/10.1007/s00500-021-06375-z

Keywords

  • Grey wolf
  • Neighborhood rough set
  • Heterogeneous data
  • Inconsistent data
  • Feature selection