Abstract
In recent years, precision medicine has become a boon to the medical field, supporting early disease detection, monitoring of disease progression, and the development of new drugs. Biomarker identification is the first step toward deploying precision medicine successfully, and DNA microarray technology has been a powerful tool for this purpose in recent decades. DNA microarray data allow tens of thousands of genes to be analyzed simultaneously, but they also have limitations: the data are noisy and contain irrelevant genes, and they suffer from a dimension imbalance problem (far more genes than samples) that degrades the overall performance of gene selection and disease classification accuracy. This paper emphasizes the role of pre-processing, using several widely used statistical methods, in overcoming these drawbacks. To demonstrate its importance, two approaches are compared on two real-life datasets, Breast Cancer and Leukemia: gene selection without pre-processing and gene selection with pre-processing. Gene selection is treated as an optimization problem, which is solved with the proposed PSO-SVM (particle swarm optimization with support vector machine) gene selection model. The second approach (with pre-processing) gives better results than the first. A comparison of the pre-processing methods by gene selection performance (maximum accuracy achieved with the minimum number of genes) shows that SNR (signal-to-noise ratio) and Fisher Score are competitive with each other and better than the other methods.
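The two filter scores singled out in the abstract, SNR and Fisher Score, can be computed per gene to rank genes before a wrapper search such as PSO-SVM. The following is a minimal sketch, not the chapter's implementation. It assumes a two-class problem with labels coded 1 and 0, an expression matrix stored as genes (rows) by samples (columns), and population standard deviations. For SNR it uses the definition of Golub et al. (1999), (μ1 − μ0)/(σ1 + σ0). For Fisher Score it uses the common two-class form (μ1 − μ0)²/(σ1² + σ0²). The abstract does not give the chapter's exact formulas, normalization, or cut-off. The function names (`snr_score`, `fisher_score`, `rank_genes`) and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def _split_by_class(values, labels):
    """Split one gene's expression values into class-1 and class-0 groups."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes (1 and 0) must be present")
    return pos, neg


def snr_score(values, labels):
    """Signal-to-noise ratio: (mu1 - mu0) / (sd1 + sd0); the sign gives direction."""
    pos, neg = _split_by_class(values, labels)
    diff = mean(pos) - mean(neg)
    denom = pstdev(pos) + pstdev(neg)
    if denom == 0:  # gene is constant within both classes
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom


def fisher_score(values, labels):
    """Two-class Fisher score: (mu1 - mu0)^2 / (sd1^2 + sd0^2), always >= 0."""
    pos, neg = _split_by_class(values, labels)
    diff = mean(pos) - mean(neg)
    denom = pstdev(pos) ** 2 + pstdev(neg) ** 2
    if denom == 0:  # gene is constant within both classes
        return 0.0 if diff == 0 else math.inf
    return diff ** 2 / denom


def rank_genes(matrix, labels, score, top_k):
    """Return indices of the top_k genes (rows) by absolute score, best first."""
    order = sorted(range(len(matrix)),
                   key=lambda g: abs(score(matrix[g], labels)),
                   reverse=True)
    return order[:top_k]


# Hypothetical toy data: 3 genes (rows) x 6 samples (columns).
labels = [1, 1, 1, 0, 0, 0]
expr = [
    [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],  # strongly differential
    [2.0, 3.0, 1.0, 2.0, 1.0, 3.0],  # equal class means: uninformative
    [3.0, 3.5, 2.5, 2.0, 2.5, 1.5],  # moderately differential
]

print(rank_genes(expr, labels, snr_score, top_k=2))     # → [0, 2]
print(rank_genes(expr, labels, fisher_score, top_k=3))  # → [0, 2, 1]
```

In a pipeline like the one described, only the top-ranked genes would be handed to the optimizer, which shrinks the search space it has to explore. Ranking by the absolute SNR keeps genes that are strongly up- or down-regulated in either class.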
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ghosh, T., Acharyya, S. (2024). Role of Pre-processing in Gene Selection Using DNA Microarray Gene Expression Data. In: Tavares, J.M.R.S., Rodrigues, J.J.P.C., Misra, D., Bhattacherjee, D. (eds) Data Science and Communication. ICTDsC 2023. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-99-5435-3_7
DOI: https://doi.org/10.1007/978-981-99-5435-3_7
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5434-6
Online ISBN: 978-981-99-5435-3
eBook Packages: Intelligent Technologies and Robotics (R0)