
Feature selection with clustering probabilistic particle swarm optimization

  • Original Article
  • Published: International Journal of Machine Learning and Cybernetics

Abstract

Dealing with high-dimensional data is a significant challenge in machine learning, and feature selection is a widely used remedy. Because the search space of feature selection is combinatorially large, swarm intelligence algorithms have gained popularity for their strong search capabilities. This study introduces Clustering Probabilistic Particle Swarm Optimization (CPPSO), which modifies the traditional particle swarm optimization approach by representing velocity with probabilities and adding an elitism mechanism. Furthermore, CPPSO employs a clustering strategy based on the K-means algorithm with the Hamming distance, dividing the population into two sub-populations to improve search performance. To assess CPPSO's performance, a comparative analysis is conducted against seven existing algorithms on twenty diverse real-world datasets: fifteen are frequently used in feature selection research, and the remaining five comprise imbalanced datasets as well as multi-label datasets. The experimental results demonstrate the superiority of CPPSO across a range of evaluation criteria, surpassing the comparative algorithms on the majority of the datasets.
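As the abstract describes, CPPSO splits the binary population into two sub-populations by clustering particles under the Hamming distance. A minimal K-means-style sketch of that splitting step follows; the function names, the toy population, the farthest-point initialization, and the majority-vote centroid update are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def hamming(a, b):
    """Number of positions where two binary masks differ."""
    return int(np.sum(a != b))

def split_population(pop, iters=10):
    """Partition a binary population into two sub-populations, K-means
    style, under the Hamming distance.  Centroids are updated by per-bit
    majority vote (a common binary analogue of the K-means mean step).
    Illustrative sketch only, not the paper's exact clustering strategy."""
    # Seed the two centroids with the first particle and the particle
    # farthest from it, so the initial clusters are well separated.
    d0 = np.array([hamming(pop[0], p) for p in pop])
    centroids = np.stack([pop[0], pop[d0.argmax()]])
    for _ in range(iters):
        # Assignment step: each particle joins its nearest centroid.
        labels = np.array([
            0 if hamming(p, centroids[0]) <= hamming(p, centroids[1]) else 1
            for p in pop
        ])
        # Update step: per-bit majority vote within each cluster.
        for k in (0, 1):
            members = pop[labels == k]
            if len(members):
                centroids[k] = (members.mean(axis=0) >= 0.5).astype(int)
    return labels

# Toy population: six particles (binary feature masks) over four features.
pop = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
])
labels = split_population(pop)  # two sub-populations of three particles each
```

Each sub-population can then be evolved with its own search emphasis (e.g. exploration versus exploitation), which is the usual motivation for population splitting in swarm-based feature selection.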


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

This research was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI under Grant JP22H03643, Japan Science and Technology Agency (JST) Support for Pioneering Research Initiated by the Next Generation (SPRING) under Grant JPMJSP2145, and JST through the Establishment of University Fellowships towards the Creation of Science Technology Innovation under Grant JPMJFS2115.

Author information

Authors and Affiliations

Authors

Contributions

JG: Conceptualization, Writing - original draft, Methodology, Software. ZW: Methodology, Software, Supervision, Writing - review & editing. ZL: Conceptualization, Supervision, Writing - review & editing. R-LW: Conceptualization, Supervision, Writing - review & editing. ZW: Conceptualization, Supervision, Writing - review & editing. SG: Conceptualization, Supervision, Writing - review & editing.

Corresponding author

Correspondence to Shangce Gao.

Ethics declarations

Conflict of interest

The authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements) or non-financial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript. The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Gao, J., Wang, Z., Lei, Z. et al. Feature selection with clustering probabilistic particle swarm optimization. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02111-9
