A new multi-objective binary Harris Hawks optimization for gene selection in microarray data

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Cancer classification is one of the main applications of gene expression (microarray) data and is essential for comprehensive cancer diagnosis and treatment. Bio-inspired algorithms are among the most effective methods applied to gene selection in this domain. Harris Hawks optimization (HHO) is a recent algorithm that offers an excellent balance between exploration and exploitation. This paper presents the first study on multi-objective binary Harris Hawks optimization (MOBHHO) for gene selection. We define gene selection as a problem with two main conflicting objectives: minimizing the number of selected genes and maximizing classification accuracy. MOBHHO uses two fitness functions to handle these competing objectives. The first is based on an SVM classifier evaluated with leave-one-out cross-validation (LOOCV), and the second on a KNN classifier evaluated with K-fold cross-validation; both also take into account the percentage of selected genes. MOBHHO searches for Pareto-optimal solutions, i.e. gene subsets that combine a minimal number of selected genes with high classification accuracy. We also integrate several filter-based ranking methods into the proposal. To assess its performance, we compare MOBHHO with other recently published algorithms from the literature. In experiments on eight benchmark datasets (binary-class and multi-class), MOBHHO provides a minimal number of genes while obtaining the highest classification accuracy. The proposed method exceeds 98% classification accuracy on six benchmark datasets and reaches a maximum accuracy of 100%.
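The Pareto-optimality idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `dominates` and `pareto_front` and the candidate values are hypothetical, chosen only to show how two conflicting objectives (fewer genes, higher accuracy) lead to a set of non-dominated gene subsets rather than a single best one.

```python
from typing import List, Tuple

# A candidate gene subset summarised by its two objective values:
# (number of selected genes, classification accuracy in [0, 1]).
Solution = Tuple[int, float]


def dominates(a: Solution, b: Solution) -> bool:
    """Return True if `a` is no worse than `b` on both objectives
    (fewer or equal genes, higher or equal accuracy) and strictly
    better on at least one of them."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Illustrative values only; these are not results from the paper.
candidates = [(5, 0.96), (12, 1.00), (20, 1.00), (5, 0.90), (8, 0.97)]
front = pareto_front(candidates)
print(sorted(front))
```

In this toy example, `(20, 1.00)` is dropped because `(12, 1.00)` reaches the same accuracy with fewer genes, and `(5, 0.90)` is dropped because `(5, 0.96)` is more accurate with the same number of genes. The remaining subsets are mutually non-dominated trade-offs.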

Author information

Corresponding author

Correspondence to Ali Dabba.

About this article

Cite this article

Dabba, A., Tari, A. & Meftali, S. A new multi-objective binary Harris Hawks optimization for gene selection in microarray data. J Ambient Intell Human Comput 14, 3157–3176 (2023). https://doi.org/10.1007/s12652-021-03441-0

  • DOI: https://doi.org/10.1007/s12652-021-03441-0
