Skip to main content

Imputing DNA Methylation by Transferred Learning Based Neural Network


DNA methylation is one important epigenetic type to play a vital role in many diseases including cancers. With the development of the high-throughput sequencing technology, there is much progress to disclose the relations of DNA methylation with diseases. However, the analyses of DNA methylation data are challenging due to the missing values caused by the limitations of current techniques. While many methods have been developed to impute the missing values, these methods are mostly based on the correlations between individual samples, and thus are limited for the abnormal samples in cancers. In this study, we present a novel transfer learning based neural network to impute missing DNA methylation data, namely the TDimpute-DNAmeth method. The method learns common relations between DNA methylation from pan-cancer samples, and then fine-tunes the learned relations over each specific cancer type for imputing the missing data. Tested on 16 cancer datasets, our method was shown to outperform other commonly-used methods. Further analyses indicated that DNA methylation is related to cancer survival and thus can be used as a biomarker of cancer prognosis.

This is a preview of subscription content, access via your institution.


  1. Francis R C. Epigenetics: The Ultimate Mystery of Inheritance. WW Norton & Company, 2011.

  2. Ye P, Luan Y, Chen K, Liu Y, Xiao C, Xie Z. MethSMRT: An integrative database for DNA N6-methyladenine and N4-methylcytosine generated by single-molecular real-time sequencing. Nucleic Acids Research, 2016, 45(D1): D85-D89. DOI:

    Article  Google Scholar 

  3. Kulis M, Esteller M. DNA methylation and cancer. Advances in Genetics, 2010, 70(22): 27-56. DOI:

    Article  Google Scholar 

  4. Gerd P. Defining driver DNA methylation changes in human cancer. International Journal of Molecular Sciences, 2018, 19(4): Article No. 1166. DOI: 10.3390/ijms19041166.

  5. Jouinot A, Assie G, Libe R et al. DNA methylation is an independent prognostic marker of survival in adrenocortical cancer. The Journal of Clinical Endocrinology & Metabolism, 2016, 102(3): 923-932. DOI:

    Article  Google Scholar 

  6. Zhang G, Huang K C, Xu Z et al. Across-platform imputation of DNA methylation levels incorporating nonlocal information using penalized functional regression. Genetic Epidemiology, 2016, 40(4): 333-340. DOI:

    Article  Google Scholar 

  7. Troyanskaya O, Cantor M, Sherlock G et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 2001, 17(6): 520-525. DOI:

    Article  Google Scholar 

  8. Guttorp P, Fuentes M, Sampson P. Using transforms to analyze space-time processes. In Statistical Methods for Spatio-Temporal Systems, Finkenstadt B, Held L, Isham V (eds.), CRC/Chapman, 2006, pp.77-150.

  9. Josse J, Husson F. Handling missing values in exploratory multivariate data analysis methods. Journal de la Société Française de Statistique, 2012, 153(2): 77-99.

    MathSciNet  MATH  Google Scholar 

  10. Di Lena P, Sala C, Prodi A, Nardini C. Missing value estimation methods for DNA methylation data. Bioinformatics, 2019, 35(19): 3786-3793. DOI:

    Article  Google Scholar 

  11. Stekhoven D J, Bühlmann P. MissForest-Non-Parametric missing value imputation for mixed-type data. Bioinformatics, 2012, 28(1): 112-118. DOI:

    Article  Google Scholar 

  12. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444. DOI:

    Article  Google Scholar 

  13. Heffernan R, Paliwal K, Lyons J et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Scientific Reports, 2015, 5: Article No. 11476. DOI: 10.1038/srep11476.

  14. Chen J, Zheng S, Zhao H, Yang Y. Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map. Journal of Cheminformatics, 2021, 13(1): Article No. 7. DOI: 10.1186/s13321-021-00488-1.

  15. Senior A W, Evans R, Jumper J et al. Improved protein structure prediction using potentials from deep learning. Nature, 2020, 577(7792): 706-710. DOI:

    Article  Google Scholar 

  16. Ching T, Himmelstein D S, Beaulieu-Jones B K et al. Opportunities and obstacles for deep learning in biology and medicine. Journal of the Royal Society Interface, 2018, 15(141): Article No. 20170387. DOI: 10.1098/rsif.2017.0387.

  17. Zheng S, Li Y, Chen S, Xu J, Yang Y. Predicting drugprotein interaction using quasi-visual question answering system. Nature Machine Intelligence, 2020, 2(2): 134-140. DOI:

    Article  Google Scholar 

  18. Zheng S, Rao J, Zhang Z, Xu J, Yang Y. Predicting retrosynthetic reactions using self-corrected transformer neural networks. Journal of Chemical Information and Modeling, 2019, 60(1): 47-55. DOI:

    Article  Google Scholar 

  19. Way G P, Greene C S. Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. Pac Symp Biocomput, 2018, 23: 80-91. DOI:

    Article  Google Scholar 

  20. Titus A J, Wilkins O M, Bobak C A, Christensen B C. Unsupervised deep learning with variational autoencoders applied to breast tumor genome-wide DNA methylation data with biologic feature extraction., Dec. 2021. DOI: 10.1101/433763.

  21. Lv X, Chen Z, Lu Y, Yang Y. An end-to-end Oxford Nanopore basecaller using convolution-augmented transformer. In Proc. the 2020 IEEE International Conference on Bioinformatics and Biomedicine, Dec. 2020, pp.337-342. DOI: 10.1109/BIBM49941.2020.9313290.

  22. Tian T, Wan J, Song Q, Wei Z. Clustering single-cell RNA-seq data with a model-based deep learning approach. Nature Machine Intelligence, 2019, 1(4): 191-198. DOI:

    Article  Google Scholar 

  23. Lopez R, Regier J, Cole M B, Jordan M I, Yosef N. Deep generative modeling for single-cell transcriptomics. Nature Methods, 2018, 15(12): 1053-1058. DOI:

    Article  Google Scholar 

  24. Zeng Y, Zhou X, Rao J, Lu Y, Yang Y. Accurately clustering single-cell RNA-seq data by capturing structural relations between cells through graph convolutional network. In Proc. the 2020 IEEE International Conference on Bioinformatics and Biomedicine, Dec. 2020, pp.519-522. DOI: 10.1109/BIBM49941.2020.9313569.

  25. Zhou X, Chai H, Zeng Y, Zhao H, Luo C H, Yang Y. scAdapt: Virtual adversarial domain adaptation network for single cell RNA-seq data classification across platforms and species. Briefings in Bioinformatics, 2021, 22(6): Article No. bbab281. DOI: 10.1093/bib/bbab281.

  26. Zhang Z, Zhao Y, Liao X et al. Deep learning in omics: A survey and guideline. Briefings in Functional Genomics, 2019, 18(1): 41-57. DOI:

    Article  Google Scholar 

  27. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature, 2020, 578(7793): 82-93. DOI:

  28. Li Y, Wang L, Wang J, Ye J, Reddy C K. Transfer learning for survival analysis via efficient L2, 1-Norm regularized cox regression. In Proc. the 2016 IEEE International Conference on Data Mining, Dec. 2016, pp.231-240. DOI:

  29. Yousefi S, Amrollahi F, Amgad M et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Scientific Reports, 2017, 7(1): Article No. 11707. DOI: 10.1038/s41598-017-11817-6.

  30. Yang X, Gao L, Zhang S. Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns. Briefings in Bioinformatics, 2016, 18(5): 761-773. DOI:

    Article  Google Scholar 

  31. Hoadley K A, Yau C, Wolf D M et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell, 2014, 158(4): 929-944. DOI:

    Article  Google Scholar 

  32. Zhou X, Chai H, Zhao H, Luo C H, Yang Y. Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning-based neural network. Giga-Science, 2020, 9(7): Article No. giaa076. DOI: 10.1093/gigascience/giaa076.

  33. Wei L, Jin Z, Yang S, Xu Y, Zhu Y, Ji Y. TCGAassembler 2: Software pipeline for retrieval and processing of TCGA/CPTAC data. Bioinformatics, 2017, 34(9): 1615-1617. DOI:

    Article  Google Scholar 

  34. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 2010, 33(1): 1-22.

    Article  Google Scholar 

  35. Van Belle V, Pelckmans K, Van Huffel S, Suykens J A. Support vector methods for survival analysis: A comparison between ranking and regression approaches. Artificial Intelligence in Medicine, 2011, 53(2): 107-118. DOI:

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations


Corresponding author

Correspondence to Yue-Dong Yang.

Supplementary Information


(PDF 175 kb)

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Wang, XF., Zhou, X., Rao, JH. et al. Imputing DNA Methylation by Transferred Learning Based Neural Network. J. Comput. Sci. Technol. 37, 320–329 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI:


  • neural network
  • transfer learning
  • DNA methylation
  • data imputation
  • survival analysis