
Establishment of Three Gene Prognostic Markers in Pancreatic Ductal Adenocarcinoma Using Machine Learning Approach

  • Original Article
Journal of Medical and Biological Engineering



Pancreatic ductal adenocarcinoma (PDAC) is the most prevalent form of pancreatic cancer, accounting for about 85% of all cases. PDAC is highly challenging to treat because of its extreme aggressiveness and the limited therapeutic options available. Identifying new gene markers can help in the design of novel targeted therapeutics.


In this study, we identified three prognostic marker genes in PDAC using a machine learning approach. First, the gene expression dataset with accession number GSE183795 was downloaded from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI); it comprises the expression profiles of 244 samples (139 pancreatic tumors, 102 adjacent non-tumor tissues, and 3 normal tissues). The dataset was then preprocessed in R using the GEOquery, affy, and limma packages. Next, differentially expressed genes (DEGs) were identified using two machine learning algorithms, random forest (RF) and extreme gradient boosting (XGBoost). Finally, survival analysis of the identified DEGs was performed using GEPIA software (TCGA database).
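The feature-selection step can be illustrated with a minimal sketch, assuming each model exposes one importance score per gene (for example, the `feature_importances_` attribute of a fitted scikit-learn `RandomForestClassifier` or XGBoost `XGBClassifier`). The abstract does not state which implementation or cutoff the authors used, so the function names, the cutoff `k`, and the gene labels and scores below are hypothetical placeholders, not values from the study.

```python
def top_k(importance, k):
    """Return the k gene names with the largest importance scores."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def shared_top_features(importance_a, importance_b, k):
    """Genes that appear in the top-k list of both models, in model A's order."""
    top_a = top_k(importance_a, k)
    top_b = set(top_k(importance_b, k))
    return [name for name in top_a if name in top_b]


# Hypothetical importance scores (placeholders, not results from the paper).
rf_scores = {"gene_1": 0.50, "gene_2": 0.30, "gene_3": 0.10, "gene_4": 0.05}
xgb_scores = {"gene_3": 0.60, "gene_1": 0.20, "gene_4": 0.15, "gene_2": 0.01}

print(shared_top_features(rf_scores, xgb_scores, k=2))
```

Intersecting the two rankings keeps only genes that both models consider informative, which is one simple way to reduce the influence of a single algorithm's biases.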


Our results showed that 6 of the 25 DEGs identified by the RF and XGBoost algorithms (ERCC3, ACY3, ATP2A3, MW-TW1879, MW-TW3829, and ZBTB7A) were common to both, supporting their feature importance. Moreover, three genes, ATP2A3 (p = 0.029), NRL (p = 0.012), and FBXO45 (p = 0.013), were statistically significant in survival analysis and may serve as prognostic marker genes for PDAC.
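Survival p-values of this kind typically come from a log-rank test comparing patients with high and low expression of a gene. The sketch below is a plain-Python illustration of that test under stated assumptions: a median expression cutoff, a two-group comparison, and a chi-square approximation with one degree of freedom. It is not GEPIA's code, and the cohort values are hypothetical toy data rather than TCGA records.

```python
import math
from statistics import median


def median_split(expression, times, events):
    """Split patients into high/low groups at the median expression value."""
    cutoff = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cutoff]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cutoff]
    return high, low


def logrank_test(group_a, group_b):
    """Two-group log-rank test on lists of (time, event) pairs.

    event is 1 for an observed death and 0 for a censored observation.
    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    records = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in records if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in records if ti >= t and g == 0)
        at_risk = sum(1 for ti, _, _ in records if ti >= t)
        deaths_a = sum(1 for ti, e, g in records if ti == t and e and g == 0)
        deaths = sum(1 for ti, e, _ in records if ti == t and e)
        frac_a = at_risk_a / at_risk
        # Observed minus expected deaths in group A at this event time.
        obs_minus_exp += deaths_a - deaths * frac_a
        if at_risk > 1:
            # Hypergeometric variance of the death count in group A.
            variance += (deaths * frac_a * (1 - frac_a)
                         * (at_risk - deaths) / (at_risk - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy cohort: expression, follow-up time (months), event flag.
expression = [2.1, 3.4, 5.6, 7.8, 1.2, 6.5, 4.3, 8.9]
times = [30, 25, 10, 6, 36, 8, 20, 4]
events = [0, 1, 1, 1, 0, 1, 1, 1]

high, low = median_split(expression, times, events)
chi2, p = logrank_test(high, low)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A gene would be flagged as a candidate prognostic marker when the resulting p-value falls below the chosen significance threshold, conventionally 0.05.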


These findings provide insights into the molecular characteristics of PDAC and may guide future research on cancer theranostic interventions for this devastating disease.


Data Availability

Data will be made available on request.



Abbreviations

PDAC: Pancreatic ductal adenocarcinoma
DEGs: Differentially expressed genes
NCBI: National Center for Biotechnology Information
RF: Random forest
XGBoost: Extreme gradient boosting
qPCR: Quantitative polymerase chain reaction
RNA seq: RNA sequencing
ML: Machine learning
SVM: Support vector machine
ROC: Receiver operating characteristic curve
AUC: Area under the curve
GEPIA: Gene Expression Profiling Interactive Analysis
TCGA: The Cancer Genome Atlas
GTEx: Genotype-Tissue Expression
ERCC3: Excision repair cross-complementation group 3
ACY3: Aminoacylase 3
Squamous-cell lung cancer
NRL: Neural retina-specific leucine






This study did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information




Pragya: Data curation, Formal analysis, Investigation, Methodology, Resources, Validation, Visualization, Writing - original draft, Writing - review & editing. Praveen Kumar Govarthan: Software, Investigation, Validation, Conceptualization. Malay Nayak: Methodology, Software, Validation, Visualization, Writing - original draft. Sudip Mukherjee: Conceptualization, Supervision, Writing - review & editing. Jac Fredo Agastinose Ronickom: Conceptualization, Supervision, Writing - review & editing.

Corresponding author

Correspondence to Jac Fredo Agastinose Ronickom.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information


Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary Material 1


About this article


Cite this article

Pragya, P., Govarthan, P.K., Nayak, M. et al. Establishment of Three Gene Prognostic Markers in Pancreatic Ductal Adenocarcinoma Using Machine Learning Approach. J. Med. Biol. Eng. (2024).
