Abstract
Cancer is a genetic disease and ranks among the most lethal and aggressive diseases. Staging the disease early can reduce its high mortality rate. Advances in high-throughput sequencing technology and the application of Machine Learning algorithms have led to significant progress in Oncogenomics over the past few decades. Oncogenomics uses RNA sequencing and gene expression profiling to identify cancer-related genes. The high dimensionality of RNA sequencing data makes the selection of relevant genes a complex, large-scale optimization problem. CDRGI presents a Discrete Filtering technique, based on a Binary Artificial Bee Colony coupled with a Support Vector Machine, together with a two-stage cascading classifier, to identify relevant genes and detect cancer from RNA-seq data. The proposed approach was tested on seven cancers: Breast Cancer, Stomach Cancer (STAD), Colon Cancer (COAD), Liver Cancer, Lung Cancer (LUSC), Kidney Cancer (KIRC), and Skin Cancer. The results show that CDRGI performs better at feature reduction while achieving better classification accuracy for the STAD, COAD, LUSC and KIRC cancer types.
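The abstract names the search strategy but does not describe its steps. As a rough illustration of how a binary bee-colony search can select a gene subset, the following is a minimal sketch and not the CDRGI implementation. It assumes a one-bit-flip neighbour move, and the names `binary_abc`, `toy_fitness` and `INFORMATIVE`, as well as all parameter values, are hypothetical. In particular, `toy_fitness` is only a stand-in for the Support Vector Machine score that the paper couples with the colony.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = gene kept, 0 = gene dropped


def flip_neighbor(mask: Mask, rng: random.Random) -> Mask:
    """Return a copy of mask with one randomly chosen bit toggled."""
    neighbor = mask[:]
    idx = rng.randrange(len(mask))
    neighbor[idx] = 1 - neighbor[idx]
    return neighbor


def binary_abc(fitness: Callable[[Mask], float], n_features: int,
               n_sources: int = 10, n_iters: int = 50,
               limit: int = 5, seed: int = 0) -> Tuple[Mask, float]:
    """Simplified binary artificial-bee-colony search over feature masks."""
    rng = random.Random(seed)

    def random_mask() -> Mask:
        return [rng.randint(0, 1) for _ in range(n_features)]

    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(s) for s in sources]
    trials = [0] * n_sources  # failed improvement attempts per source
    top = max(range(n_sources), key=scores.__getitem__)
    best, best_score = sources[top][:], scores[top]

    def try_improve(i: int) -> None:
        """Greedy local move: keep the neighbour only if it scores higher."""
        nonlocal best, best_score
        candidate = flip_neighbor(sources[i], rng)
        cand_score = fitness(candidate)
        if cand_score > scores[i]:
            sources[i], scores[i], trials[i] = candidate, cand_score, 0
            if cand_score > best_score:
                best, best_score = candidate[:], cand_score
        else:
            trials[i] += 1

    for _ in range(n_iters):
        # Employed bees: one local move per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: revisit sources with probability tied to their score.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_improve(i)
        # Scout bees: abandon sources that stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i])
                trials[i] = 0
                if scores[i] > best_score:
                    best, best_score = sources[i][:], scores[i]
    return best, best_score


# Hypothetical toy problem: genes 0, 3 and 7 are "informative".
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: Mask) -> float:
    """Stand-in for a classifier score: reward hits, penalise subset size."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best, best_score = binary_abc(toy_fitness, n_features=12)
print(best, best_score)
```

To move closer to the setting in the abstract, `toy_fitness` would be replaced by a function that trains and validates a classifier on the genes selected by the mask. The search loop itself does not depend on how the score is computed.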
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Al-Obeidat, F., Rocha, Á., Akram, M. et al. (CDRGI)-Cancer detection through relevant genes identification. Neural Comput & Applic 34, 8447–8454 (2022). https://doi.org/10.1007/s00521-021-05739-8
DOI: https://doi.org/10.1007/s00521-021-05739-8