Identification of gene-level methylation for disease prediction

  • Original research article
  • Interdisciplinary Sciences: Computational Life Sciences

Abstract

DNA methylation is an epigenetic modification that plays a fundamental role in gene regulation. It attaches methyl groups to specific cytosine residues, which influences chromatin architecture. Multiple studies have shown that the regulatory effect of DNA methylation on genes is linked to the onset and progression of several disorders. Epigenome-wide association studies (EWAS) have recently uncovered thousands of phenotype-related methylation sites. However, combining the methylation levels of several sites within a gene to determine gene-level DNA methylation remains challenging. In this study, we propose the supervised UMAP Assisted Gene-level Methylation method (sUAGM) for disease prediction. It is based on supervised UMAP (Uniform Manifold Approximation and Projection), a manifold learning method for dimensionality reduction. The gene-level methylation values generated by the proposed method are evaluated with various feature selection and classification algorithms on three distinct DNA methylation datasets derived from blood samples. Performance is assessed using classification accuracy, F1-score, Matthews Correlation Coefficient (MCC), Kappa, Classification Success Index (CSI) and Jaccard Index. The Support Vector Machine with linear kernel (SVML) classifier combined with Recursive Feature Elimination (RFE) performs best across all three datasets. In a comparative analysis, our method outperformed existing gene-level and site-level approaches, achieving 100% accuracy and F1-score with fewer genes. Functional analysis of the top 28 genes selected from the Parkinson's disease dataset revealed a significant association with the disease.
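The evaluation measures named above can all be derived from a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, and CSI is taken here as precision + recall − 1 for the positive class, which is one common definition; the abstract does not state which form the paper uses.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification measures from confusion-matrix counts.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives (hypothetical inputs for illustration).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1-score: harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    # Matthews Correlation Coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Jaccard index of the positive class.
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0

    # Assumption: CSI as precision + recall - 1 for the positive class.
    csi = precision + recall - 1

    return {
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
        "jaccard": jaccard,
        "csi": csi,
    }


if __name__ == "__main__":
    # Hypothetical counts: 40 TP, 10 FP, 5 FN, 45 TN.
    for name, value in binary_metrics(40, 10, 5, 45).items():
        print(f"{name}: {value:.4f}")
```

With a perfect classifier (no false positives or false negatives and both classes present), every measure in this sketch equals 1, which is the situation the reported 100% accuracy and F1-score correspond to.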

Availability of data and material

The datasets analysed during the current study are available in the Gene Expression Omnibus (GEO): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111629, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE156994 and https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128235.

Code availability

Code is available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Jisha Augustine.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent for publication

Not applicable.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Augustine, J., Jereesh, A.S. Identification of gene-level methylation for disease prediction. Interdiscip Sci Comput Life Sci 15, 678–695 (2023). https://doi.org/10.1007/s12539-023-00584-w

  • DOI: https://doi.org/10.1007/s12539-023-00584-w
