Abstract
Genetic alterations under positive selection in healthy tissues have implications for cancer risk. However, total levels of positive selection across the genome remain unknown. Passenger mutations are influenced by all driver mutations, regardless of type or location in the genome. Therefore, the total number of passengers can be used to estimate the total number of drivers—including unidentified drivers outside of cancer genes that are traditionally missed. Here we analyze the variant allele frequency spectrum of synonymous mutations from healthy blood and esophagus to quantify levels of missing positive selection. In blood, we find that only 30% of passengers can be explained by single-nucleotide variants in driver genes, suggesting high levels of positive selection for mutations elsewhere in the genome. In contrast, more than half of all passengers in the esophagus can be explained by just the two driver genes NOTCH1 and TP53, suggesting little positive selection elsewhere.
Data availability
The principal dataset from Bolton et al. can be downloaded using the link https://raw.githubusercontent.com/papaemmelab/bolton_NG_CH/master/M_long.txt. The dataset from Razavi et al. can be downloaded from the European Genome-Phenome Archive (EGA) under accession no. EGAS00001003755. All synonymous variants analyzed in this manuscript are listed in Supplementary Tables 1–3. The sequencing data for healthy esophagus were originally reported by Martincorena et al.; they may be found in the EGA under accession codes EGAD00001004158 and EGAD00001004159 and can be downloaded directly from https://www.science.org/doi/suppl/10.1126/science.aau3879/suppl_file/aau3879_tables2.xlsx.
Code availability
All code used in this study will be available on the Blundell laboratory GitHub page: https://github.com/blundelllab/Genetic-hitchhiking.
References
Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017).
Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).
Young, A. L., Tong, R. S., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis and risk of acute myeloid leukemia. Haematologica https://doi.org/10.3324/haematol.2018.215269 (2019).
Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
Loh, P.-R. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).
Loh, P.-R., Genovese, G. & McCarroll, S. A. Monogenic and polygenic inheritance become instruments for clonal selection. Nature https://doi.org/10.1038/s41586-020-2430-6 (2020).
Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).
Abelson, S. et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature 559, 400–404 (2018).
Desai, P. et al. Somatic mutations precede acute myeloid leukemia years before diagnosis. Nat. Med. 24, 1015–1023 (2018).
Bolton, K. L. et al. Cancer therapy shapes the fitness landscape of clonal hematopoiesis. Nat. Genet. https://doi.org/10.1038/s41588-020-00710-0 (2020).
Razavi, P. et al. High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat. Med. 25, 1928–1937 (2019).
Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
Watson, C. J. et al. The evolutionary dynamics and fitness landscape of clonal hematopoiesis. Science 367, 1449–1454 (2020).
Williams, M. J. et al. Measuring the distribution of fitness effects in somatic evolution by combining clonal dynamics with dN/dS ratios. eLife 9, e48714 (2020).
Hess, J. M. et al. Passenger hotspot mutations in cancer. Preprint at bioRxiv https://doi.org/10.1101/675801 (2019).
Luria, S. E. & Delbrück, M. Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28, 491–511 (1943).
Desai, M. M. & Fisher, D. S. Beneficial mutation–selection balance and the effect of linkage on positive selection. Genetics 176, 1759–1798 (2007).
Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238–244 (2016).
Loeb, L. A. et al. Extensive subclonal mutational diversity in human colorectal cancer and its significance. Proc. Natl Acad. Sci. USA 116, 26863–26872 (2019).
Blundell, J. R. et al. The dynamics of adaptive genetic diversity during the early stages of clonal evolution. Nat. Ecol. Evol. 3, 293–301 (2019).
Fusco, D., Gralka, M., Kayser, J., Anderson, A. & Hallatschek, O. Excess of mutational jackpot events in expanding populations revealed by spatial Luria–Delbrück experiments. Nat. Commun. 7, 12760 (2016).
Schreck, C. F. et al. Impact of crowding on the diversity of expanding populations. Preprint at bioRxiv https://doi.org/10.1101/743534 (2019).
Lohmueller, K. E. et al. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet. 7, e1002326 (2011).
Simons, B. D. Deep sequencing as a probe of normal stem cell fate and preneoplasia in human epidermis. Proc. Natl Acad. Sci. USA 113, 128–133 (2016).
Chapman, M. S. et al. Lineage tracing of human embryonic development and foetal haematopoiesis through somatic mutations. Preprint at bioRxiv https://doi.org/10.1101/2020.05.29.088765 (2020).
Gao, T. et al. Interplay between chromosomal alterations and gene mutations shapes the evolutionary trajectory of clonal hematopoiesis. Nat. Commun. 12, 338 (2021).
Danielsson, M. et al. Longitudinal changes in the frequency of mosaic chromosome Y loss in peripheral blood cells of aging men varies profoundly between individuals. Eur. J. Hum. Genet. 28, 349–357 (2020).
Thompson, D. J. et al. Genetic predisposition to mosaic Y chromosome loss in blood. Nature 575, 652–657 (2019).
Miyamoto, T., Weissman, I. L. & Akashi, K. AML1/ETO-expressing nonleukemic stem cells in acute myelogenous leukemia with 8;21 chromosomal translocation. Proc. Natl Acad. Sci. USA 97, 7521–7526 (2000).
Corces-Zimmerman, M. R. & Majeti, R. Pre-leukemic evolution of hematopoietic stem cells: the importance of early mutations in leukemogenesis. Leukemia 28, 2276–2282 (2014).
Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Khurana, E. et al. Role of non-coding sequence variants in cancer. Nat. Rev. Genet. 17, 93–108 (2016).
Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).
Kumar, S. et al. Passenger mutations in more than 2,500 cancer genomes: overall molecular functional impact and consequences. Cell https://doi.org/10.1016/j.cell.2020.01.032 (2020).
Li, S. et al. Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia. Nat. Med. 22, 792–799 (2016).
Gebhard, C. et al. Profiling of aberrant DNA methylation in acute myeloid leukemia reveals subclasses of CG-rich regions with epigenetic or genetic association. Leukemia 33, 26–36 (2019).
Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).
Liu, X. et al. Genetic alterations in esophageal tissues from squamous dysplasia to carcinoma. Gastroenterology 153, 166–177 (2017).
Colom, B. et al. Spatial competition shapes the dynamic mutational landscape of normal esophageal epithelium. Nat. Genet. https://doi.org/10.1038/s41588-020-0624-3 (2020).
Supek, F., Miñana, B., Valcárcel, J., Gabaldón, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324–1335 (2014).
Sharma, Y. et al. A pan-cancer analysis of synonymous mutations. Nat. Commun. 10, 2569 (2019).
Supek, F., Skunca, N., Repar, J., Vlahovicek, K. & Smuc, T. Translational selection is ubiquitous in prokaryotes. PLoS Genet. 6, e1001004 (2010).
Drummond, D. A. & Wilke, C. O. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134, 341–352 (2008).
Acknowledgements
We thank K. Bolton, A. Zehir and E. Papaemmanuil for sharing unpublished data. We also thank D. Solit, P. Razavi, D. Brown and J. Reis-Filho for sharing data and I. Martincorena for sharing data and for discussions. G.Y.P.P., C.J.W. and J.R.B. are funded by the CRUK Cambridge Centre and CRUK Early Detection Programme. J.R.B. is supported by a UKRI Future Leaders Fellowship. D.S.F. and J.R.B. are supported by the Stand Up to Cancer Foundation and the National Science Foundation via grant no. PHY-1545840.
Author information
Contributions
J.R.B. conceived the project. G.Y.P.P. developed the theory with input from J.R.B. and D.S.F. Data analysis methods, plotting and numerical simulations were all developed by G.Y.P.P. with input from J.R.B. and C.J.W. The manuscript was written by G.Y.P.P. and J.R.B., with input from C.J.W. All authors provided comments and edits on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks Ruben van Boxtel, Benjamin Werner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Model performance in recovering driver mutation rates in simulations.
(a) Our method accurately recovers driver mutation rates across a range of mutation rates (5 × 10⁵ simulation runs were performed). At higher driver mutation rates, it is mainly limited by clonal interference, which causes clones to reach sizes smaller than those predicted by our theory. Best-fit values are presented with their 95% confidence intervals. (b) The simulation (run no. = 15000) corresponding to a driver mutation rate of μb = 3 × 10⁻⁶ (τ = 1 year). The neutral mutation frequency spectrum above Ψ = 3 × 10⁻³ was fitted with our passenger prediction to infer the underlying driver mutation rate driving the expansions. Simulated data are presented as mean values ± sampling error. (c) The likelihood plot shows the fit for the driver mutation rate and fitness obtained by examining the ‘nonsynonymous’ variant (that is, driver mutation) allele frequency spectrum only. It is overlaid with the maximum-likelihood value (white cross) and the best-fit value found by the Nelder–Mead optimization algorithm (green cross). (d) The likelihood plot shows the best-fit value and the 95% confidence interval for the total driver mutation rate inferred from the ‘synonymous’ variant (neutral mutation) allele frequency spectrum, based on the fitness inferred from the ‘nonsynonymous’ variant allele frequency spectrum.
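The caption reports best-fit values with 95% confidence intervals read off a likelihood surface, but does not say how the intervals were computed. As a minimal sketch of one common approach, the Python below scans a one-parameter Poisson likelihood on a grid and takes the interval where the log-likelihood lies within 1.92 of its maximum (the likelihood-ratio threshold for one parameter). The binned counts, the per-bin `exposures` and the linear model `rate * exposure` are hypothetical placeholders, not the authors' passenger prediction.

```python
import math


def poisson_log_likelihood(rate, counts, exposures):
    """Log-likelihood of binned counts when bin i expects rate * exposures[i] variants."""
    total = 0.0
    for k, exposure in zip(counts, exposures):
        lam = rate * exposure
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total


def scan_rate(counts, exposures, grid):
    """Return (best_rate, ci_low, ci_high) from a grid scan of the likelihood."""
    lls = [poisson_log_likelihood(r, counts, exposures) for r in grid]
    best_ll = max(lls)
    best_rate = grid[lls.index(best_ll)]
    # Likelihood-ratio interval: keep rates whose log-likelihood is within
    # 1.92 (half of 3.84, the 95% chi-square quantile for 1 d.f.) of the maximum.
    inside = [r for r, ll in zip(grid, lls) if best_ll - ll <= 1.92]
    return best_rate, min(inside), max(inside)


# Hypothetical binned variant counts and exposures (illustrative values only).
counts = [40, 22, 9, 3]
exposures = [1.0e7, 5.0e6, 2.5e6, 1.0e6]
grid = [i * 1e-8 for i in range(100, 801)]  # candidate rates from 1e-6 to 8e-6

best, ci_low, ci_high = scan_rate(counts, exposures, grid)
print(f"best-fit rate = {best:.2e} (95% CI {ci_low:.2e} to {ci_high:.2e})")
```

Because the toy model is linear in the rate, the grid maximum can be checked against the closed-form estimate `sum(counts) / sum(exposures)`. A simplex optimizer such as Nelder–Mead would replace the grid scan when more than one parameter is fitted at once.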
Extended Data Fig. 2 Developmental mutation rate averages 2–4 SNVs across the entire genome per cell doubling.
(a) SNV VAFs in HSPC single-cell colonies in an 8-week foetus³⁰, where coverage is 22.6× per colony. SNVs with VAFs between 35% and 65% (within the dashed lines) are likely clonal in the colony. (b) SNV VAFs in HSPC single-cell colonies in an 18-week foetus³⁰, where coverage is 12.2× per colony. SNVs with VAFs between 30% and 70% (within the dashed lines) are likely clonal in the colony. (c) The best fit to the reverse cumulative distribution gives the number of mutations per cell doubling per haploid genome as 1.86 (95% CI = 1.6–2.1) for the Lee-Six et al. data (green line and data points), 1.0 (95% CI = 1.0–1.1) for the Chapman et al. 8-week foetus (purple line and data points) and 1.0 (95% CI = 0.9–1.1) for the Chapman et al. 18-week foetus (orange line and data points).
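The figure title quotes a genome-wide rate, whereas panel (c) quotes rates per haploid genome. The short sketch below shows how the two can be reconciled, assuming that "entire genome" means both haploid copies, so that each per-haploid value is simply doubled. This reading is an assumption and is not stated in the caption.

```python
# Per-haploid best-fit rates and 95% CIs quoted in panel (c),
# in mutations per cell doubling: (best fit, CI low, CI high).
per_haploid = {
    "Lee-Six et al.": (1.86, 1.6, 2.1),
    "Chapman et al., 8-week foetus": (1.0, 1.0, 1.1),
    "Chapman et al., 18-week foetus": (1.0, 0.9, 1.1),
}

PLOIDY = 2  # assumption: the genome-wide rate counts both haploid copies


def to_diploid(estimate):
    """Scale a (best, low, high) per-haploid estimate to the diploid genome."""
    return tuple(round(PLOIDY * value, 2) for value in estimate)


per_diploid = {name: to_diploid(est) for name, est in per_haploid.items()}

for name, (best, low, high) in per_diploid.items():
    print(f"{name}: {best} (95% CI {low}-{high}) SNVs per cell doubling")
```

Under this assumption the best-fit values span 2.0 to 3.72 SNVs per cell doubling, which agrees with the 2–4 range in the title.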
Extended Data Fig. 3 Inferring the unobserved driver mutation rate using nonsynonymous VAF spectra in Bolton et al.
The best-fit nonsynonymous VAF spectrum, based on the distribution of ages in the cohort (n = 4,160), includes the nonsynonymous developmental contribution estimated from the sizes of the genomic regions (light purple line, Supplementary note 3B) and possible nonsynonymous passengers (orange dashed lines). (a) The best-fit haploid driver rate of the most commonly mutated gene (DNMT3A) is 2.9 × 10⁻⁶ per year, based on the DFE defined by equation 18 (Supplementary note 3C). (b) The best-fit haploid driver rate of the top 5 genes (DNMT3A, TET2, PPM1D, SF3B1, ATM) is 4.1 × 10⁻⁶ per year. (c) The best-fit haploid driver rate of the top 10 genes (DNMT3A, TET2, PPM1D, SF3B1, ATM, ASXL1, JAK2, TP53, SRSF2, CHEK2) is 4.8 × 10⁻⁶ per year. Data are presented as mean values ± sampling error.
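Because the three gene sets in panels (a) to (c) are nested, the quoted rates can be differenced to see how much each added group of genes contributes. The sketch below does only this arithmetic on the caption's numbers. It assumes the three fits are directly comparable, so the differences are rough indications rather than separately fitted per-gene rates.

```python
# Cumulative best-fit haploid driver rates (per year) quoted in panels (a)-(c):
# (label, number of genes included, rate).
cumulative = [
    ("DNMT3A only", 1, 2.9e-6),
    ("top 5 genes", 5, 4.1e-6),
    ("top 10 genes", 10, 4.8e-6),
]


def marginal_rates(rows):
    """Rate added by each successive group of genes and the mean per added gene."""
    out = []
    prev_genes, prev_rate = 0, 0.0
    for label, n_genes, rate in rows:
        added_genes = n_genes - prev_genes
        added_rate = rate - prev_rate
        out.append((label, added_genes, added_rate, added_rate / added_genes))
        prev_genes, prev_rate = n_genes, rate
    return out


marginals = marginal_rates(cumulative)
for label, added_genes, added_rate, per_gene in marginals:
    print(f"{label}: +{added_rate:.1e} per year from {added_genes} gene(s), "
          f"{per_gene:.1e} per added gene")

# Share of the top-10 rate that comes from DNMT3A alone.
dnmt3a_share = cumulative[0][2] / cumulative[-1][2]
print(f"DNMT3A share of the top-10 rate: {dnmt3a_share:.0%}")
```

On these numbers DNMT3A alone accounts for about 60% of the top-10 rate, and each further group of genes adds progressively less per gene.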
Extended Data Fig. 4 Mutation rates of missing drivers assuming different fitness effects.
(a) The higher the fitness effects of the unobserved drivers, the lower the mutation rate needed to explain the discrepancy in the synonymous VAF density. Inset: pie chart showing the contributions to the observed synonymous VAF spectra from positive selection explained by observed drivers (all nonsynonymous SNVs on the panel¹⁵), from unexplained positive selection and from developmental mutations. (b) The observed synonymous VAF spectra (data points, variant number = 344) compared with the density predicted from observed drivers and developmental mutations (dashed orange line) and the density predicted when unobserved drivers with different fitness effects are also included (solid orange lines). Data are presented as mean values ± sampling error.
Extended Data Fig. 5 Contribution from different parts of the DFE to the predicted passenger spectrum.
(a) The age distribution of the 4,160 individuals in Bolton et al.¹⁵. (b) The predicted passenger spectrum in healthy blood according to the inferred distribution of fitness effects in healthy blood (Supplementary note 3C, ‘p = 3’) and best-fit total driver mutation rate from the synonymous VAF spectrum in blood (Supplementary note 3E) for the age distribution of the 4,160 individuals. (c) The relative contribution to the passenger spectrum of driver mutations with different fitness effects changes as the individual ages. The total (grey line) represents the passenger VAF spectrum contributed by all driver mutations whose fitness s > 3.5%, below which contribution to the passenger spectrum is very small.
Extended Data Fig. 6 Nonsynonymous VAF spectra in Martincorena et al.
(a) The nonsynonymous VAF spectra of the top 10 genes (ranked by nonsynonymous SNV occurrence) were analyzed based on Nτ = 7,800 (Supplementary note 4C) to estimate their respective fitness and mutation rates. The analysis treats the distribution of fitness effects as delta functions, each with a single-valued mutation rate and fitness, taking into account developmental contribution and possible passengers among nonsynonymous SNVs. (b) The nonsynonymous VAF spectra of genes beyond the top 10 (ranked by nonsynonymous SNV occurrence) were analyzed based on the chosen DFE (Supplementary note 3C). Similarly, developmental contribution and possible passengers among nonsynonymous SNVs were taken into account. Data are presented as mean values ± sampling error.
Supplementary information
Supplementary Information
Supplementary Notes 1–4 and Figs. 1–13.
Supplementary Tables
Table 1. Synonymous SNVs from Bolton et al. that were analyzed. Table 2. Synonymous SNVs from Razavi et al. included. Table 3. Synonymous SNVs from two studies by Young et al. included.
Cite this article
Poon, G.Y.P., Watson, C.J., Fisher, D.S. et al. Synonymous mutations reveal genome-wide levels of positive selection in healthy tissues. Nat Genet 53, 1597–1605 (2021). https://doi.org/10.1038/s41588-021-00957-1
DOI: https://doi.org/10.1038/s41588-021-00957-1
- Springer Nature America, Inc.