Abstract
In DNA microarray applications, many techniques have been proposed for cancer classification, either to distinguish normal from cancerous samples or to classify different types of cancer. Gene selection is usually required as a preliminary step: it aims to select the most informative genes from a very large number of candidates, which is an important problem in its own right. Although many studies have addressed this problem, they fall short of obtaining the smallest set of highly informative genes with the highest accuracy and little effort, given the high dimensionality of microarray datasets. Manta ray foraging optimization (MRFO) is a recent meta-heuristic algorithm that mimics the food-foraging behavior of manta rays. MRFO has achieved promising results in other fields, such as solar generating units. Because of its high accuracy, the support vector machine (SVM) is the most commonly used classification algorithm in cancer studies, especially with microarray data. To exploit the strengths of both MRFO and SVM, this paper proposes a hybrid algorithm that selects the most predictive and informative genes for cancer classification. The accuracy of the proposed technique is evaluated on binary microarray datasets (colon and leukemia1) and multi-class microarray datasets (SRBCT, lymphoma, and leukemia2). Like other optimization techniques, MRFO suffers from problems related to the high dimensionality and complexity of microarray data. To address these problems and improve performance, the minimum redundancy maximum relevance (mRMR) method is used as a preprocessing stage. The proposed technique is compared with the most common cancer classification algorithms. The experimental results show that it achieves the highest accuracy with the fewest informative genes and little effort.
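The filter-then-wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. To stay dependency-free it uses absolute Pearson correlation in place of the mutual information normally used by mRMR, and the `fitness` function with its weight `alpha` is a hypothetical example of the kind of objective a wrapper search such as MRFO could maximize with an SVM supplying the accuracy term. The MRFO search and the SVM are not implemented here, and all function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mrmr_filter(genes, labels, k):
    """Greedily pick up to k gene indices with high relevance and low redundancy.

    genes: list of per-gene expression vectors (one value per sample).
    Relevance is |corr(gene, labels)|; redundancy is the mean |corr| with the
    genes already picked. Correlation is an assumption standing in for
    mutual information.
    """
    relevance = [abs(pearson(g, labels)) for g in genes]
    selected = []
    remaining = list(range(len(genes)))
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for i in remaining:
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    abs(pearson(genes[i], genes[j])) for j in selected
                ) / len(selected)
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


def fitness(accuracy, n_selected, n_total, alpha=0.9):
    """Hypothetical wrapper objective: reward accuracy, penalize subset size."""
    return alpha * accuracy + (1 - alpha) * (1 - n_selected / n_total)


# Toy data: gene 1 is a near-duplicate of gene 0, gene 2 is weakly relevant.
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 13],
    [0, 1, 0, 1, 0, 1],
]
picked = mrmr_filter(genes, labels, 2)
```

On this toy data the filter keeps gene 0 and then prefers the weakly relevant gene 2 over the near-duplicate gene 1, which is the redundancy penalty at work. In a full pipeline, a wrapper search would then explore subsets of the filtered genes and score each subset with something like `fitness`, using cross-validated classifier accuracy.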
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Houssein, E.H., Hassan, H.N., Al-Sayed, M.M. et al. Gene Selection for Microarray Cancer Classification based on Manta Rays Foraging Optimization and Support Vector Machines. Arab J Sci Eng 47, 2555–2572 (2022). https://doi.org/10.1007/s13369-021-06102-8
DOI: https://doi.org/10.1007/s13369-021-06102-8