Abstract
Selecting disease-causing genes from gene expression and DNA methylation data covering hundreds of thousands of loci is of great value for cancer diagnosis and treatment. It is also technically challenging, because such data combine small sample sizes with ultrahigh-dimensional genetic markers. To speed up the search, this paper proposes a new gene selection algorithm, the Membrane Computing with Harmony Search Algorithm (MC-HSA), which builds on membrane computing theory to quickly select a subset of potential disease-causing genes. In MC-HSA, an active-membrane P system with membrane dissolution is designed to balance global exploration against local exploitation when detecting gene combinations that are strongly associated with disease status. The harmony search algorithm is embedded in the P system to search for gene subsets in both gene expression and DNA methylation data. An enhanced classifier that combines four general-purpose classifiers is used to improve classification accuracy (CA) and reduce overfitting, and a penalty function is developed to screen out redundant genes. Experiments on six real datasets show that our method is competitive with ten existing gene selection methods (HybridGA, QSFS, RMA, WOA-CM, ME-BPSO, CDNC, ABCD, HAMS, mRMR, and ImRMR). On the gene expression and DNA methylation data of prostate cancer, our method selects fewer genes than four state-of-the-art algorithms while achieving high CA (> 99%) and stable performance. Finally, we analyze the representative genes and validate them using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO) terms.
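The abstract combines two ideas that are easy to illustrate in isolation: harmony search over candidate gene subsets, and a penalty that discourages large (redundant) subsets. The sketch below shows only a plain binary harmony search on a 0/1 gene mask with a size penalty. It omits the P-system membrane structure and the four-classifier ensemble entirely. The parameter names (`hms`, `hmcr`, `par`, `p_select`, `penalty`), the linear size penalty, and the toy accuracy function are assumptions made for illustration, not the MC-HSA formulation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # one bit per gene: 1 = selected, 0 = discarded


def penalized_fitness(mask: Sequence[int],
                      accuracy_fn: Callable[[Sequence[int]], float],
                      penalty: float = 0.1) -> float:
    """Accuracy of a gene subset minus a penalty on the fraction of genes kept.

    ASSUMPTION: a simple linear size penalty stands in for the paper's
    penalty function; an empty subset scores 0.
    """
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0
    return accuracy_fn(mask) - penalty * n_selected / len(mask)


def binary_harmony_search(n_genes: int,
                          accuracy_fn: Callable[[Sequence[int]], float],
                          hms: int = 10,          # harmony memory size
                          hmcr: float = 0.9,      # memory consideration rate
                          par: float = 0.1,       # pitch adjustment rate
                          p_select: float = 0.1,  # chance a random bit is 1
                          penalty: float = 0.1,
                          iterations: int = 2000,
                          seed: int = 0) -> Tuple[Mask, float]:
    """Plain binary harmony search over gene masks (no P-system layer)."""
    rng = random.Random(seed)

    def random_bit() -> int:
        # Biased toward 0 so that random subsets stay sparse.
        return 1 if rng.random() < p_select else 0

    memory = [[random_bit() for _ in range(n_genes)] for _ in range(hms)]
    scores = [penalized_fitness(m, accuracy_fn, penalty) for m in memory]

    for _ in range(iterations):
        new: Mask = []
        for j in range(n_genes):
            if rng.random() < hmcr:
                # Memory consideration: reuse bit j of a random stored harmony.
                bit = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment, binary version: flip the bit.
                    bit = 1 - bit
            else:
                bit = random_bit()
            new.append(bit)
        score = penalized_fitness(new, accuracy_fn, penalty)
        # Replace the worst harmony only if the new one is strictly better.
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:
            memory[worst], scores[worst] = new, score

    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical toy problem: 20 genes, of which genes 2, 5 and 7 are informative.
INFORMATIVE = {2, 5, 7}


def toy_accuracy(mask: Sequence[int]) -> float:
    # Stand-in for a classifier's CA: share of informative genes included.
    return sum(mask[g] for g in INFORMATIVE) / len(INFORMATIVE)


best_mask, best_score = binary_harmony_search(20, toy_accuracy, seed=42)
print([g for g, bit in enumerate(best_mask) if bit], round(best_score, 3))
```

In a real pipeline the accuracy term would presumably come from classifiers evaluated on expression or methylation data; the toy function only lets the sketch run end to end. Because the worst harmony is replaced only by a strictly better one, the best score in memory never decreases across iterations.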
Data availability
The gene expression and DNA methylation data of prostate cancer are openly available from the NCBI Gene Expression Omnibus (GEO) under accession number GSE55599. The other five microarray datasets are freely available at https://file.biolab.si/biolab/supp/bicancer/projections/info/leukemia.html.
References
Siegel, R. L., Miller, K. D., & Jemal, A. (2019). Cancer statistics, 2019. CA: A Cancer Journal for Clinicians, 69, 7–34.
Xi, Y., & Xu, P. (2021). Global colorectal cancer burden in 2020 and projections to 2040. Translational Oncology, 14(10), 101174.
Ma, F., Wu, J., Fu, L., et al. (2021). Interpretation of specification for breast cancer screening, early diagnosis and treatment management in Chinese women. Journal of the National Cancer Center, 1(3), 97–100.
Sung, H., Ferlay, J., Siegel, R. L., et al. (2021). Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 71(3), 209–249.
Sayed, S., Nassef, M., Badr, A., et al. (2019). A nested genetic algorithm for feature selection in high-dimensional cancer microarray datasets. Expert Systems with Applications, 121, 233–243. https://doi.org/10.1016/j.eswa.2018.12.022
Dai, X., Xiang, L., Li, T., et al. (2016). Cancer hallmarks, biomarkers and breast cancer molecular subtypes. Journal of Cancer, 7(10), 1281.
Tollis, M., Schneider-Utaka, A. K., & Maley, C. C. (2020). The evolution of human cancer gene duplications across mammals. Molecular Biology and Evolution, 37(10), 2875–2886. https://doi.org/10.1093/molbev/msaa128
Rostami, M., Forouzandeh, S., Berahmand, K., Soltani, M., Shahsavari, M., & Oussalah, M. (2022). Gene selection for microarray data classification via multi-objective graph theoretic-based method. Artificial Intelligence in Medicine, 123, 102228. https://doi.org/10.1016/j.artmed.2021.102228
Singhal, S. K., Usmani, N., Michiels, S., et al. (2016). Towards understanding the breast cancer epigenome: a comparison of genome-wide DNA methylation and gene expression data. Oncotarget, 7(3), 3002. https://doi.org/10.18632/oncotarget.6503
Lee, J., Choi, I. Y., & Jun, C. H. (2021). An efficient multivariate feature ranking method for gene selection in high-dimensional microarray data. Expert Systems with Applications, 166, 113971. https://doi.org/10.1016/j.eswa.2020.113971
Venkataramana, L., Jacob, S. G., Ramadoss, R., et al. (2019). Improving classification accuracy of cancer types using parallel hybrid feature selection on microarray gene expression data. Genes & Genomics, 41(11), 1301–1313. https://doi.org/10.1007/s13258-019-00859-x
Liu, S., Wang, H., Peng, W., et al. (2022). A surrogate-assisted evolutionary feature selection algorithm with parallel random grouping for high-dimensional classification. IEEE Transactions on Evolutionary Computation. https://doi.org/10.1109/TEVC.2022.3149601
Vanjimalar, S., Ramyachitra, D., & Manikandan, P. (2018). A review on feature selection techniques for gene expression data. In 2018 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) (pp. 1–4). IEEE.
Bose, S., Das, C., Banerjee, A., et al. (2020). An ensemble filtering and supervised clustering based informative gene selection algorithm in microarray gene expression data. In 2020 4th International Conference on Computational Intelligence and Networks (CINE) (pp. 1–7). IEEE.
Leung, Y., & Hung, Y. (2008). A multiple-filter-multiple-wrapper approach to gene selection and microarray data classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(1), 108–117.
Nouri-Moghaddam, B., Ghazanfari, M., & Fathian, M. (2021). A novel bioinspired hybrid multifilter wrapper gene selection method with ensemble classifier for microarray data. Neural Computing and Applications. https://doi.org/10.48550/arXiv.2101.00819
Al-Obeidat, F., Tubaishat, A., Shah, B., et al. (2020). Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data. Neural Computing and Applications, 1–23.
Shah, S. H., Iqbal, M. J., Ahmad, I., Khan, S., & Rodrigues, J. J. (2020). Optimized gene selection and classification of cancer from microarray gene expression data using deep learning. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-05367-8
Dabba, A., Tari, A., & Meftali, S. (2021). A new multi-objective binary Harris Hawks optimization for gene selection in microarray data. Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-021-03441-0
Tian, Y., Lu, C., Zhang, X., et al. (2020). Solving large-scale multiobjective optimization problems with sparse optimal solutions via unsupervised neural networks. IEEE Transactions on Cybernetics, 51(6), 3115–3128. https://doi.org/10.1109/TCYB.2020.2979930
Hira, Z. M., & Gillies, D. F. (2015). A review of feature selection and feature extraction methods applied on microarray data. Advances in Bioinformatics. https://doi.org/10.1155/2015/198363
Črepinšek, M., Liu, S. H., & Mernik, M. (2013). Exploration and exploitation in evolutionary algorithms: a survey. ACM Computing Surveys (CSUR), 45(3), 1–33.
Zhang, L., Liu, L., Yang, X. S., et al. (2016). A novel hybrid firefly algorithm for global optimization. PLoS ONE, 11(9), e0163230. https://doi.org/10.1371/journal.pone.0163230
Păun, G. (2010). Membrane computing. Scholarpedia, 5(1), 9259.
Zhang, G., Pérez-Jiménez, M. J., Riscos-Núñez, A., et al. (2021). Membrane computing models: implementations. Berlin: Springer.
Zhang, G., Pérez-Jiménez, M. J., & Gheorghe, M. (2017). Real-life applications with membrane computing. Berlin: Springer.
Busi, N. (2007). Using well-structured transition systems to decide divergence for catalytic P systems. Theoretical Computer Science, 372(2–3), 125–135. https://doi.org/10.1016/j.tcs.2006.11.021
Păun, G. (2002). Introduction: Membrane computing – what it is and what it is not. In Membrane Computing (pp. 1–6). Berlin: Springer.
Gakii, C., & Rimiru, R. (2021). Identification of cancer related genes using feature selection and association rule mining. Informatics in Medicine Unlocked, 24, 100595. https://doi.org/10.1016/j.imu.2021.100595
Zhang, G., Peng, Z., Li, X., et al. (2021). TABBA: a novel feature selection method based on binary bat algorithm and t test. In 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA). IEEE. https://doi.org/10.1109/ICCCBDA51879.2021.9442565
Houssein, E. H., Hussien, A. G., & Hassanien, A. E. (2018). A binary whale optimization algorithm with hyperbolic tangent fitness function for feature selection. In IEEE 8th International Conference on Intelligent Computing and Information Systems (ICICIS 2017) (pp. 166–172). IEEE.
Jeong, Y. S., Shin, K. S., & Jeong, M. K. (2015). An evolutionary algorithm with the partial sequential forward floating search mutation for large-scale feature selection problems. Journal of the Operational Research Society, 66(4), 529–538. https://doi.org/10.1057/jors.2013.72
Han, F., Yang, C., Wu, Y. Q., et al. (2015). A gene selection method for microarray data based on binary PSO encoding gene-to-class sensitivity information. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(1), 85–96.
Othman, M. S., Kumaran, S. R., & Yusuf, L. M. (2020). Gene selection using hybrid multi-objective cuckoo search algorithm with evolutionary operators for cancer microarray data. IEEE Access, 8, 186348–186361. https://doi.org/10.1109/ACCESS.2020.3029890
Alomari, O. A., Makhadmeh, S. N., Al-Betar, M. A., et al. (2021). Gene selection for microarray data classification based on Gray Wolf Optimizer enhanced with TRIZ-inspired operators. Knowledge-Based Systems, 223, 107034. https://doi.org/10.1016/j.knosys.2021.107034
Azadifar, S., & Ahmadi, A. (2021). A graph-based gene selection method for medical diagnosis problems using a many-objective PSO algorithm. BMC Medical Informatics and Decision Making, 21(1), 1–16.
Bae, J. H., Kim, M., Lim, J. S., et al. (2021). Feature selection for colon cancer detection using k-means clustering and modified harmony search algorithm. Mathematics, 9, 570. https://doi.org/10.3390/math9050570
Ghosh, M., Begum, S., Sarkar, R., et al. (2019). Recursive memetic algorithm for gene selection in microarray data. Expert Systems with Applications, 116, 172–185.
Zhang, G., Shang, Z., Verlan, S., et al. (2020). An overview of hardware implementation of membrane computing models. ACM Computing Surveys (CSUR), 53(4), 1–38.
Zhang, G., Rong, H., Paul, P., et al. (2021). A complete arithmetic calculator constructed from spiking neural P systems and its application to information fusion. International Journal of Neural Systems, 31(1), 1–17. https://doi.org/10.1142/S0129065720500550
Zhu, M., Yang, Q., Dong, J., et al. (2021). An adaptive optimization spiking neural P system for binary problems. International Journal of Neural Systems, 31(1), 1–17.
Rong, H., Duan, Y., & Zhang, G. (2022). A bibliometric analysis of membrane computing (1998–2019). Journal of Membrane Computing. https://doi.org/10.1007/s41965-022-00098-2
Dong, J., Zhang, G., Luo, B., et al. (2022). A distributed adaptive optimization spiking neural P system for approximately solving combinatorial optimization problems. Information Sciences, 596, 1014. https://doi.org/10.1016/j.ins.2022.03.007
Zhang, G., Zhang, X., Rong, H., et al. (2022). A layered spiking neural system for classification problems. International Journal of Neural Systems, 32, 2250023. https://doi.org/10.1142/S012906572250023X
Chan, K. G., Chin, P. S., Tee, K. K., Chang, C. Y., Yin, W. F., & Sheng, K. Y. (2015). Draft genome sequence of Aeromonas caviae strain L12, a quorum-sensing strain isolated from a freshwater lake in Malaysia. Genome Announcements, 5(2), 1–15. https://doi.org/10.1128/genomeA.00079-15
Gheorghe, M., Ipate, F., Dragomir, C., et al. (2013). Kernel P systems – version I. In Eleventh Brainstorming Week on Membrane Computing (11BWMC) (pp. 97–124).
Elkhani, N., & Muniyandi, R. C. (2015). Membrane computing to model feature selection of microarray cancer data. In Proceedings of the ASE BigData & SocialInformatics 2015 (pp. 1–9).
Geem, Z. W., Kim, J. H., & Loganathan, G. V. (2001). A new heuristic optimization algorithm: harmony search. Simulation, 76(2), 60–68.
Geem, Z. W. (2007). Novel derivative of harmony search algorithm for discrete design variables. Applied Mathematics and Computation, 199, 223–230.
Kim, Y.-H., Yoon, Y., & Geem, Z. W. (2019). A comparison study of harmony search and genetic algorithm for the max-cut problem. Swarm and Evolutionary Computation, 44, 130–135. https://doi.org/10.1016/j.swevo.2018.01.004
Yu, N., Wu, M. J., Liu, J. X., et al. (2020). Correntropy-based hypergraph regularized NMF for clustering and feature selection on multicancer integrated data. IEEE Transactions on Cybernetics, 51(8), 3952–3963.
Feng, C. M., Xu, Y., Liu, J. X., et al. (2019). Supervised discriminative sparse PCA for com-characteristic gene selection and tumor classification on multiview biological data. IEEE Transactions on Neural Networks and Learning Systems, 30(10), 2926–2937.
Su, Y., Li, S., Zheng, C., et al. (2019). A heuristic algorithm for identifying molecular signatures in cancer. IEEE Transactions on NanoBioscience, 19(1), 132–141.
Costa, A., & Fernandez-Viagas, V. (2022). A modified harmony search for the T-single machine scheduling problem with variable and flexible maintenance. Expert Systems with Applications. https://doi.org/10.1016/j.eswa.2022.116897
Tuo, S., Zhang, J., Yuan, X., et al. (2016). FHSA-SED: two-locus model detection for genome-wide association study with harmony search algorithm. PLoS ONE, 11(3), e0150669. https://doi.org/10.1371/journal.pone.0150669
Tuo, S., Zhang, J., Yuan, X., He, Z., Liu, Y., & Liu, Z. (2017). Niche harmony search algorithm for detecting complex disease associated high-order SNP combinations. Scientific Reports, 7, 11529. https://doi.org/10.1038/s41598-017-11064-9
Tuo, S., Liu, H., & Chen, H. (2020). Multipopulation harmony search algorithm for the detection of high-order SNP interactions. Bioinformatics, 36, 4389–4398. https://doi.org/10.1093/bioinformatics/btaa215
Paziewska, A., Dabrowska, M., Goryca, K., et al. (2014). DNA methylation status is more reliable than gene expression at detecting cancer in prostate biopsy. British Journal of Cancer, 111(4), 781–789.
Irizarry, R. A., Bolstad, B. M., Collin, F., et al. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research, 31(4), e15.
El-Naqa, I., Yang, Y., Wernick, M. N., et al. (2002). A support vector machine approach for detection of microcalcifications. IEEE Transactions on Medical Imaging, 21(12), 1552–1563. https://doi.org/10.1109/TMI.2002.806569
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27. https://doi.org/10.1109/TIT.1967.1053964
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A (General), 135(3), 370–384. https://doi.org/10.2307/2344614
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/BF00058655
Ghosh, M., Sen, S., Sarkar, R., et al. (2021). Quantum squirrel inspired algorithm for gene selection in methylation and expression data of prostate cancer. Applied Soft Computing, 105, 107221. https://doi.org/10.1016/j.asoc.2021.107221
Mafarja, M., & Mirjalili, S. (2018). Whale optimization approaches for wrapper feature selection. Applied Soft Computing, 62, 441–453. https://doi.org/10.1016/j.asoc.2017.11.006
Wei, J., Zhang, R., Yu, Z., et al. (2017). A BPSO-SVM algorithm based on memory renewal and enhanced mutation mechanisms for feature selection. Applied Soft Computing, 58, 176–192. https://doi.org/10.1016/j.asoc.2017.04.061
Yu, T., Liu, D., Zhang, T., et al. (2019). Inhibition of Tet1- and Tet2-mediated DNA demethylation promotes immunomodulation of periodontal ligament stem cells. Cell Death & Disease, 10(10), 1–11. https://doi.org/10.1038/s41419-019-2025-z
Huang, Y. H., Zhang, C. Z., Huang, Q. S., et al. (2021). Clinicopathologic features, tumor immune microenvironment and genomic landscape of Epstein-Barr virus-associated intrahepatic cholangiocarcinoma. Journal of Hepatology, 74(4), 838–849. https://doi.org/10.1016/j.jhep.2020.10.037
Wang, S., Kollipara, R. K., Humphries, C. G., et al. (2016). The ubiquitin ligase TRIM25 targets ERG for degradation in prostate cancer. Oncotarget, 7(40), 64921. https://doi.org/10.18632/oncotarget.11915
Zhao, X., Hu, D., Li, J., et al. (2020). Database mining of genes of prognostic value for the prostate adenocarcinoma microenvironment using the cancer gene atlas. BioMed Research International. https://doi.org/10.1155/2020/5019793
Hsiao, C. P., Wang, D., Kaushal, A., et al. (2014). Differential expression of genes related to mitochondrial biogenesis and bioenergetics in fatigued prostate cancer men receiving external beam radiation therapy. Journal of Pain and Symptom Management, 48(6), 1080–1090. https://doi.org/10.1016/j.jpainsymman.2014.03.010
Zhang, L., Meng, X., Pan, C., et al. (2020). piR-31470 epigenetically suppresses the expression of glutathione S-transferase pi 1 in prostate cancer via DNA methylation. Cellular Signalling, 67, 109501. https://doi.org/10.1016/j.cellsig.2019.109501
Zhang, J., Gao, K., Xie, H., et al. (2021). SPOP mutation induces DNA methylation by stabilizing GLP/G9a. Nature Communications, 12(1), 1–17. https://doi.org/10.1038/s41467-021-25951-3
Dias, A., Kote-Jarai, Z., Mikropoulos, C., et al. (2018). Prostate cancer germline variations and implications for screening and treatment. Cold Spring Harbor Perspectives in Medicine, 8(9), a030379.
Tenge, V. R., Knowles, J., & Johnson, J. L. (2014). The ribosomal biogenesis protein Utp21 interacts with Hsp90 and has differing requirements for Hsp90-associated proteins. PLoS ONE, 9(3), e92569. https://doi.org/10.1371/journal.pone.0092569
Li, S., Wu, X., & Tan, M. (2008). Gene selection using hybrid particle swarm optimization and genetic algorithm. Soft Computing, 9(12), 1039–1048. https://doi.org/10.1007/s00500-007-0272-x
Coleto-Alcudia, V., & Vega-Rodríguez, M. A. (2020). Artificial Bee Colony algorithm based on Dominance (ABCD) for a hybrid gene selection method. Knowledge-Based Systems, 205, 106323. https://doi.org/10.1016/j.knosys.2020.106323
Mandal, M., & Mukhopadhyay, A. (2013). An improved minimum redundancy maximum relevance approach for feature selection in gene expression data. Procedia Technology, 10, 20–27.
Tuo, S., Li, C., Liu, F., et al. (2022). A novel multitasking ant colony optimization method for detecting multiorder SNP interactions. Interdisciplinary Sciences: Computational Life Sciences, 14, 814–832. https://doi.org/10.1007/s12539-022-00530-2
Tuo, S., Li, C., Liu, F., et al. (2022). MTHSA-DHEI: Multitasking harmony search algorithm for detecting high-order SNP epistatic interactions. Complex & Intelligent Systems. https://doi.org/10.1007/s40747-022-00813-7
Acknowledgements
The authors would like to thank all the editors, reviewers, and referees for their constructive comments.
Funding
This work was supported by the General Project of the Shaanxi Provincial Education Department (No. 18JK0165), the Natural Science Foundation of China under Grant 62002289, the Natural Science Basic Research Program of Shaanxi (No. 2021JM-347), and the Postgraduate Innovation Fund Project of Xi'an University of Posts and Telecommunications (No. CXJJYL2021032).
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available with the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tuo, S., Liu, F., Feng, Z. et al. Membrane computing with harmony search algorithm for gene selection from expression and methylation data. J Membr Comput 4, 293–313 (2022). https://doi.org/10.1007/s41965-022-00111-8
DOI: https://doi.org/10.1007/s41965-022-00111-8