Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets

  • Original Article
Neural Computing and Applications

Abstract

Cancer is a leading cause of mortality in both developed and developing countries. Cancer classification based on microarray datasets has provided insight into possible treatment strategies. Microarray datasets are characterized by a very large number of genes (high dimensionality) and a small number of samples. Gene selection is therefore a challenging but necessary step in analyzing microarray expression data, and the selected genes may reveal insight into the mechanisms underlying a particular biological phenomenon. Several researchers have recently developed feature selection methods that use metaheuristic algorithms to interpret and analyze microarray data. Nevertheless, because microarray data contain few samples relative to their dimensionality, several data mining approaches have failed to select the most relevant and informative genes. To address this, combining several classifiers can improve both feature selection and classification performance. This study proposes a cancer classification method based on ensemble learning, in which particle swarm optimization and an ensemble learning method collaborate for feature selection and cancer classification. The analysis indicates that the proposed method is effective for cancer classification on microarray datasets: it achieves accuracies of 100%, 92.86%, 86.36%, 100%, and 85.71% on the leukemia, colon, breast cancer, ovarian, and central nervous system datasets, respectively, outperforming most state-of-the-art methods and improving on the baseline ensemble method by 12%.
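The abstract does not spell out how particle swarm optimization and the ensemble cooperate, so the following is only a rough sketch of one common wrapper design, not the authors' method. A binary PSO with the usual sigmoid transfer function searches over gene subsets, and each subset is scored by the hold-out accuracy of a small majority-vote ensemble, minus a small penalty on subset size. Everything here is an assumption for illustration: the toy dataset (`make_data`), the three base learners (1-NN, 3-NN, nearest centroid), the penalty weight `alpha`, and all PSO parameters (`w`, `c1`, `c2`, `vmax`, swarm size, iterations). The abbreviations list mentions classifiers such as SVM, KNN, DT, NB and RF, but the exact ensemble is not described in the abstract.

```python
import math
import random
from collections import Counter


def make_data(n_samples=30, n_features=60, n_informative=3, seed=0):
    """Hypothetical toy 'microarray': few samples, many genes, few informative ones."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 3.0 if label else -3.0  # only genes 0..n_informative-1 depend on the class
        X.append(row)
        y.append(label)
    return X, y


def dist(a, b, mask):
    """Squared Euclidean distance restricted to the selected genes."""
    return sum((a[j] - b[j]) ** 2 for j in mask)


def ensemble_predict(train_X, train_y, x, mask, centroids):
    """Majority vote of three simple base learners (an illustrative choice)."""
    d = sorted((dist(r, x, mask), t) for r, t in zip(train_X, train_y))
    votes = [
        d[0][1],                                                    # 1-NN
        Counter(t for _, t in d[:3]).most_common(1)[0][0],          # 3-NN
        min(centroids, key=lambda c: dist(centroids[c], x, mask)),  # nearest centroid
    ]
    return Counter(votes).most_common(1)[0][0]


def fitness(bits, Xtr, ytr, Xva, yva, alpha=0.02):
    """Hold-out accuracy of the ensemble minus a small penalty on subset size."""
    mask = [j for j, b in enumerate(bits) if b]
    if not mask:
        return 0.0
    centroids = {}
    for c in set(ytr):
        rows = [r for r, t in zip(Xtr, ytr) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = sum(ensemble_predict(Xtr, ytr, x, mask, centroids) == t
                  for x, t in zip(Xva, yva))
    return correct / len(yva) - alpha * len(mask) / len(bits)


def binary_pso(n_features, fit, n_particles=10, iters=20,
               w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=1):
    """Binary PSO: each bit says whether a gene is selected."""
    rng = random.Random(seed)
    pos = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fit(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-vmax, min(vmax, v))  # clamp velocity
                # Sigmoid transfer: velocity -> probability that gene j is selected.
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            f = fit(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:  # global best is updated as soon as it improves
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


X, y = make_data()
Xtr, ytr, Xva, yva = X[:15], y[:15], X[15:], y[15:]
best, score = binary_pso(len(X[0]), lambda b: fitness(b, Xtr, ytr, Xva, yva))
selected = [j for j, b in enumerate(best) if b]
print("selected genes:", selected, "fitness:", round(score, 3))
```

Because the toy data plant the class signal in genes 0 to 2, the selected subset can be compared against them; real microarray data offer no such ground truth. The sketch also uses the same hold-out split both to guide the search and to report the score. With only a few dozen samples, a careful evaluation would typically use cross-validation inside the search and keep a separate test set out of the PSO loop, to avoid optimistic selection bias.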

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Availability of data and materials

Data are available upon request.

Code availability

Code is available upon request.

Abbreviations

PSO: Particle swarm optimization
SVM: Support vector machines
KNN: K-nearest neighbors
DT: Decision trees
NB: Naïve Bayes
RF: Random forest
CNN: Convolutional neural network
PCA: Principal component analysis
GSP: Gene selection programming
GEP: Gene expression programming
GA: Genetic algorithm
CNS: Central nervous system
AML: Acute myeloblastic leukemia
ALL: Acute lymphoblastic leukemia
GWO: Grey wolf optimizer
WOA: Whale optimization algorithm
BAT: Bat algorithm
MFO: Moth-flame optimization
FFA: Firefly algorithm
MVO: Multi-verse optimizer
ROC: Receiver operating characteristic curve

Funding

This study did not receive external or internal funding.

Author information

Corresponding author

Correspondence to Nashat Alrefai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest that are relevant to the content of this article.

Ethics approval

All information and data sources used in this study are cited in the article and are publicly available for research purposes.

Consent for publication

This study used publicly available datasets, which are cited appropriately.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alrefai, N., Ibrahim, O. Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets. Neural Comput & Applic 34, 13513–13528 (2022). https://doi.org/10.1007/s00521-022-07147-y

  • DOI: https://doi.org/10.1007/s00521-022-07147-y
