Role of Pre-processing in Gene Selection Using DNA Microarray Gene Expression Data

  • Conference paper
Data Science and Communication (ICTDsC 2023)

Abstract

Precision medicine has recently become a boon to the medical field, supporting early disease detection, monitoring of disease progression, and drug development. Biomarker identification is the first step in deploying precision medicine successfully, and DNA microarray technology has served as a powerful tool for it in recent decades. DNA microarray data allow tens of thousands of genes to be analyzed simultaneously, but they also have limitations: the data are quite noisy and contain irrelevant genes. Furthermore, they suffer from a dimension imbalance problem, which degrades the performance of gene selection and the accuracy of disease classification. This paper emphasizes the role of pre-processing with widely used statistical methods in overcoming these drawbacks. To show the importance of pre-processing, two approaches are compared on two real-life datasets, Breast Cancer and Leukemia: gene selection without pre-processing and gene selection with pre-processing. Gene selection is treated as an optimization problem and solved with the proposed PSO-SVM gene selection model. The second approach (with pre-processing) gives better results than the first. Comparing the pre-processing methods by gene selection performance (maximum accuracy achieved with the minimum number of genes), it can be inferred that SNR and Fisher Score are competitive and outperform the other methods.
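
As a rough illustration of the kind of statistical pre-processing the abstract refers to, the sketch below ranks genes with two filter scores. It assumes the commonly used two-class definitions, SNR = |μ₁ − μ₂| / (σ₁ + σ₂) and Fisher Score = (μ₁ − μ₂)² / (σ₁² + σ₂²), which may differ from the exact formulas used in the paper. The function names, the synthetic data, and the choice of k are hypothetical, and the PSO-SVM stage is not reproduced here.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for constant genes


def snr_scores(X, y):
    """Absolute signal-to-noise ratio per gene (assumed two-class form)."""
    a, b = X[y == 0], X[y == 1]
    diff = np.abs(a.mean(axis=0) - b.mean(axis=0))
    return diff / (a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1) + EPS)


def fisher_scores(X, y):
    """Fisher score per gene (assumed two-class form)."""
    a, b = X[y == 0], X[y == 1]
    diff_sq = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    return diff_sq / (a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1) + EPS)


def top_k_genes(scores, k):
    """Indices of the k highest-scoring genes, best first."""
    return np.argsort(scores)[::-1][:k]


# Synthetic stand-in for a microarray matrix: 40 samples, 500 genes.
# Only genes 0-4 carry a class signal; this is illustrative data,
# not the Breast Cancer or Leukemia datasets.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 500))
X[y == 1, :5] += 4.0

snr_top = top_k_genes(snr_scores(X, y), 5)
fisher_top = top_k_genes(fisher_scores(X, y), 5)
print("SNR top genes:   ", sorted(snr_top.tolist()))
print("Fisher top genes:", sorted(fisher_top.tolist()))
```

In a pipeline like the one described, the reduced gene set returned by `top_k_genes` would be passed on to the wrapper stage (here, the PSO-SVM model) instead of the full gene list.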

Author information

Corresponding author

Correspondence to Sriyankar Acharyya.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ghosh, T., Acharyya, S. (2024). Role of Pre-processing in Gene Selection Using DNA Microarray Gene Expression Data. In: Tavares, J.M.R.S., Rodrigues, J.J.P.C., Misra, D., Bhattacherjee, D. (eds) Data Science and Communication. ICTDsC 2023. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-99-5435-3_7
