Decoding whole-genome mutational signatures in 37 human pan-cancers by denoising sparse autoencoder neural network

  • Article
Oncogene

Abstract

Millions of somatic mutations have recently been discovered in cancer genomes. These mutations arise from internal and external mutagenic forces. Decoding mutational processes by examining their characteristic patterns has revealed many known and novel signatures from whole-exome data, but many remain undiscovered. Here, we developed a deep learning approach, DeepMS, to decompose mutational signatures using 52,671,908 somatic mutations from 2780 highly curated cancer genomes with whole-genome sequencing (WGS) across 37 cancer types/subtypes. With rigorous model training and comparison, we characterized 54 signatures for single base substitutions (SBSs), 11 for doublet base substitutions (DBSs), and 16 for small insertions and deletions (Indels). Compared with previous methods, DeepMS discovered 37 new SBS, 5 new DBS, and 9 new Indel signatures, many of which are associated with DNA mismatch or base excision repair and with cisplatin therapy mechanisms. We further developed a regression-based model to estimate the correlation between signatures and clinical and demographic phenotypes. As the first deep learning model applied to WGS somatic mutational profiles, DeepMS enables us to identify more comprehensive context-based mutational signatures than traditional non-negative matrix factorization (NMF) approaches. Our work substantially expands the landscape of naturally occurring mutational signatures in cancer genomes and provides new insights into cancer biology.
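To make the idea of a denoising sparse autoencoder concrete, the following is a minimal NumPy sketch, not the DeepMS implementation. It assumes a single hidden layer, masking noise on the input, a ReLU code with an L1 sparsity penalty, and plain gradient descent; the function name `train_dsae`, all hyperparameters, and the synthetic 96-column input (standing in for trinucleotide-context SBS classes) are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def train_dsae(X, n_latent=4, noise=0.1, l1=1e-3, lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer denoising sparse autoencoder (illustrative only).

    X is an (n_samples, n_features) matrix of per-sample mutation-type
    frequencies. Returns the parameters and the per-epoch loss values.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_latent))
    b_enc = np.zeros(n_latent)
    W_dec = rng.normal(0.0, 0.1, (n_latent, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Denoising step: randomly zero out a fraction of the input entries.
        keep = rng.random(X.shape) >= noise
        X_noisy = X * keep
        # Forward pass: ReLU code, linear reconstruction.
        Z = X_noisy @ W_enc + b_enc
        H = np.maximum(Z, 0.0)
        X_hat = H @ W_dec + b_dec
        # The target is the clean input, not the corrupted one.
        err = X_hat - X
        # Squared reconstruction error plus L1 penalty on the (non-negative) code.
        losses.append(float((err ** 2).sum() / n + l1 * H.sum() / n))
        # Backward pass (gradients of the loss above).
        g_out = 2.0 * err / n
        g_H = g_out @ W_dec.T + l1 / n
        g_Z = g_H * (Z > 0)
        W_dec -= lr * (H.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X_noisy.T @ g_Z)
        b_enc -= lr * g_Z.sum(axis=0)
    return (W_enc, b_enc, W_dec, b_dec), losses


# Synthetic stand-in data: 50 samples mixed from 3 hypothetical signatures.
rng = np.random.default_rng(1)
signatures = rng.dirichlet(np.full(96, 0.2), size=3)
exposures = rng.dirichlet(np.ones(3), size=50)
X = exposures @ signatures  # each row sums to 1

params, losses = train_dsae(X)
W_enc, b_enc, _, _ = params
codes = np.maximum(X @ W_enc + b_enc, 0.0)  # latent vector per sample
```

Here `codes` holds one latent vector per sample. How DeepMS derives signatures from its trained network is not described in this abstract, so the sketch stops at the latent representation.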

Fig. 1: Framework of Denoising Sparse Auto-Encoder (DSAE) model.
Fig. 2: t-SNE plots for the signatures of three somatic mutational classes.
Fig. 3: Comparison of somatic mutational signatures with Alexandrov et al. study.
Fig. 4: Association between latent vector and patient environmental exposure based on regression models.
Fig. 5: Association between mutational signatures and 37 cancer types/subtypes through logistic regression models.
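The logistic-regression association named in the Fig. 5 caption can be illustrated with a small sketch. It assumes a single standardized predictor (one signature's activity per sample), a binary label (membership in one cancer type), and gradient descent on the mean log-loss; the function name `fit_logistic` and the synthetic data are hypothetical, and the paper's actual covariates, fitting procedure, and significance testing are not given here.

```python
import numpy as np


def fit_logistic(x, y, lr=0.5, epochs=2000):
    """Fit y ~ sigmoid(b0 + b1 * x_std) by gradient descent on the mean log-loss.

    x is standardized internally, so b1 is the log-odds change per one
    standard deviation of signature activity.
    """
    x_std = (x - x.mean()) / x.std()
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x_std)))
        b0 -= lr * np.mean(p - y)
        b1 -= lr * np.mean((p - y) * x_std)
    return b0, b1


# Synthetic stand-in data: 100 samples of the cancer type of interest, 100 others.
rng = np.random.default_rng(0)
y = np.concatenate([np.ones(100), np.zeros(100)])
# A signature whose activity is shifted upward in the cancer type of interest.
activity_assoc = np.where(y == 1, rng.normal(1.0, 0.5, 200), rng.normal(0.0, 0.5, 200))
# A signature whose activity is unrelated to the cancer type.
activity_null = rng.normal(0.0, 0.5, 200)

_, b1_assoc = fit_logistic(activity_assoc, y)
_, b1_null = fit_logistic(activity_null, y)
```

A clearly positive `b1_assoc` alongside a near-zero `b1_null` is the kind of contrast such a model is meant to expose. With many signatures and cancer types, some multiple-testing correction would normally follow.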

Data and code availability

All data and code used in the paper are available in the Supplementary materials and on GitHub (https://github.com/bsml320/DeepMS).

Acknowledgements

We thank Drs Chen Wang and Wei Xie for insightful discussion.

Funding

This work was supported by the Cancer Prevention and Research Institute of Texas (CPRIT RP180734) and the National Institutes of Health (R01LM012806).

Author information

Contributions

PJ, GP, and ZZ conceived the study. GP performed data analysis. GP and RH constructed the models. GP, YD, PJ, and ZZ interpreted the results. GP, PJ, and ZZ wrote the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Zhongming Zhao or Peilin Jia.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pei, G., Hu, R., Dai, Y. et al. Decoding whole-genome mutational signatures in 37 human pan-cancers by denoising sparse autoencoder neural network. Oncogene 39, 5031–5041 (2020). https://doi.org/10.1038/s41388-020-1343-z

  • DOI: https://doi.org/10.1038/s41388-020-1343-z

  • Springer Nature Limited
