Abstract
Phylogenetic tests of adaptive evolution, such as the widely used branch-site test (BST), assume that nucleotide substitutions occur singly and independently. Recent research has shown that errors at adjacent sites often occur during DNA replication, and the resulting multinucleotide mutations (MNMs) are overwhelmingly likely to be non-synonymous. To evaluate whether the BST misinterprets sequence patterns produced by MNMs as false support for positive selection, we analysed two genome-scale datasets—one from mammals and one from flies. We found that codons with multiple differences account for virtually all the support for lineage-specific positive selection in the BST. Simulations under conditions derived from these alignments but without positive selection show that realistic rates of MNMs cause a strong and systematic bias towards false inferences of selection. This bias is sufficient under empirically derived conditions to produce false positive inferences as often as the BST infers positive selection from the empirical data. Although some genes with BST-positive results may have evolved adaptively, the test cannot distinguish sequence patterns produced by authentic positive selection from those caused by neutral fixation of MNMs. Many published inferences of adaptive evolution using this technique may therefore be artefacts of model violation caused by unincorporated neutral mutational processes. We introduce a model that incorporates MNMs and may help to ameliorate this bias.
Similar content being viewed by others
References
Goldman, N. & Yang, Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11, 725–736 (1994).
Murrell, B. et al. Gene-wide identification of episodic selection. Mol. Biol. Evol. 32, 1365–1371 (2015).
Murrell, B. et al. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 8, e1002764 (2012).
Smith, M. D. et al. Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection. Mol. Biol. Evol. 32, 1342–1353 (2015).
Yang, Z. & Nielsen, R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19, 908–917 (2002).
Zhang, J., Nielsen, R. & Yang, Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22, 2472–2479 (2005).
Pond, S. L., Frost, S. D. & Muse, S. V. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).
Kosiol, C., Holmes, I. & Goldman, N. An empirical codon model for protein sequence evolution. Mol. Biol. Evol. 24, 1464–1479 (2007).
Whelan, S. & Goldman, N. Estimating the frequency of events that cause multiple-nucleotide changes. Genetics 167, 2027–2043 (2004).
Muse, S. V. & Gaut, B. S. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11, 715–724 (1994).
Han, M. V., Demuth, J. P., McGrath, C. L., Casola, C. & Hahn, M. W. Adaptive evolution of young gene duplicates in mammals. Genome Res. 19, 859–867 (2009).
Drosophila 12 Genomes Consortium et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature 450, 203–218 (2007).
Foote, A. D. et al. Convergent evolution of the genomes of marine mammals. Nat. Genet. 47, 272–275 (2015).
Kosiol, C. et al. Patterns of positive selection in six mammalian genomes. PLoS Genet. 4, e1000144 (2008).
Roux, J. et al. Patterns of positive selection in seven ant genomes. Mol. Biol. Evol. 31, 1661–1685 (2014).
Yang, Z. & dos Reis, M. Statistical properties of the branch-site test of positive selection. Mol. Biol. Evol. 28, 1217–1228 (2011).
Zhang, J. Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models. Mol. Biol. Evol. 16, 868–875 (1999).
Gharib, W. H. & Robinson-Rechavi, M. The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC. Mol. Biol. Evol. 30, 1675–1686 (2013).
Zhai, W., Nielsen, R., Goldman, N. & Yang, Z. Looking for Darwin in genomic sequences—validity and success of statistical methods. Mol. Biol. Evol. 29, 2889–2893 (2012).
Nozawa, M., Suzuki, Y. & Nei, M. Reliabilities of identifying positive selection by the branch-site and the site-prediction methods. Proc. Natl Acad. Sci. USA 106, 6700–6705 (2009).
Casola, C. & Hahn, M. W. Gene conversion among paralogs results in moderate false detection of positive selection using likelihood methods. J. Mol. Evol. 68, 679–687 (2009).
Anisimova, M. & Yang, Z. Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites. Mol. Biol. Evol. 24, 1219–1228 (2007).
Kosakovsky Pond, S. L. et al. A random effects branch-site model for detecting episodic diversifying selection. Mol. Biol. Evol. 28, 3033–3043 (2011).
Zhang, J. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21, 1332–1339 (2004).
Schrider, D. R., Hourmozdi, J. N. & Hahn, M. W. Pervasive multinucleotide mutational events in eukaryotes. Curr. Biol. 21, 1051–1054 (2011).
Saribasak, H. et al. DNA polymerase ζ generates tandem mutations in immunoglobulin variable regions. J. Exp. Med. 209, 1075–1081 (2012).
Loeb, L. A. & Monnat, R. J. DNA polymerases and human disease. Nat. Rev. Genet. 9, 594–604 (2008).
Matsuda, T., Bebenek, K., Masutani, C., Hanaoka, F. & Kunkel, T. A. Low fidelity DNA synthesis by human DNA polymerase-η. Nature 404, 1011–1013 (2000).
Seplyarskiy, V. B., Bazykin, G. A. & Soldatov, R. A. Polymerase ζ activity is linked to replication timing in humans: evidence from mutational signatures. Mol. Biol. Evol. 32, 3158–3172 (2015).
Stone, J. E., Lujan, S. A., Kunkel, T. A. & Kunkel, T. A. DNA polymerase zeta generates clustered mutations during bypass of endogenous DNA lesions in Saccharomyces cerevisiae. Environ. Mol. Mutagen. 53, 777–786 (2012).
Arana, M. E., Seki, M., Wood, R. D., Rogozin, I. B. & Kunkel, T. A. Low-fidelity DNA synthesis by human DNA polymerase theta. Nucleic Acids Res. 36, 3847–3856 (2008).
Besenbacher, S. et al. Multi-nucleotide de novo mutations in humans. PLoS Genet. 12, e1006315 (2016).
Chen, J. M., Férec, C. & Cooper, D. N. Complex multiple-nucleotide substitution mutations causing human inherited disease reveal novel insights into the action of translesion synthesis DNA polymerases. Hum. Mutat. 36, 1034–1038 (2015).
Chen, J. M., Cooper, D. N. & Férec, C. A new and more accurate estimate of the rate of concurrent tandem-base substitution mutations in the human germline: ∼0.4% of the single-nucleotide substitution mutation rate. Hum. Mutat. 35, 392–394 (2014).
Harris, K. & Nielsen, R. Error-prone polymerase activity causes multinucleotide mutations in humans. Genome Res. 24, 1445–1454 (2014).
Hodgkinson, A. & Eyre-Walker, A. Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12, 756–766 (2011).
Assaf, Z. J., Tilk, S., Park, J., Siegal, M. L. & Petrov, D. A. Deep sequencing of natural and experimental populations of Drosophila melanogaster reveals biases in the spectrum of new mutations. Genome Res. 27, 1988–2000 (2017).
Francioli, L. C. et al. Genome-wide patterns and properties of de novo mutations in humans. Nat. Genet. 47, 822–826 (2015).
Zhu, W. et al. Concurrent nucleotide substitution mutations in the human genome are characterized by a significantly decreased transition/transversion ratio. Hum. Mutat. 36, 333–341 (2015).
Averof, M., Rokas, A., Wolfe, K. H. & Sharp, P. M. Evidence for a high frequency of simultaneous double-nucleotide substitutions. Science 287, 1283–1286 (2000).
Bazykin, G. A., Kondrashov, F. A., Ogurtsov, A. Y., Sunyaev, S. & Kondrashov, A. S. Positive selection at sites of multiple amino acid replacements since rat–mouse divergence. Nature 429, 558–562 (2004).
Rogozin, I. B. et al. Evolutionary switches between two serine codon sets are driven by selection. Proc. Natl Acad. Sci. USA 113, 13109–13113 (2016).
De Maio, N., Holmes, I., Schlötterer, C. & Kosiol, C. Estimating empirical codon hidden Markov models. Mol. Biol. Evol. 30, 725–736 (2013).
Suzuki, Y. False-positive results obtained from the branch-site test of positive selection. Genes Genet. Syst. 83, 331–338 (2008).
Larracuente, A. M. et al. Evolution of protein-coding genes in Drosophila. Trends Genet. 24, 114–123 (2008).
Sironi, M., Cagliani, R., Forni, D. & Clerici, M. Evolutionary insights into host–pathogen interactions from mammalian sequence data. Nat. Rev. Genet. 16, 224–236 (2015).
Elde, N. C., Child, S. J., Geballe, A. P. & Malik, H. S. Protein kinase R reveals an evolutionary model for defeating viral mimicry. Nature 457, 485–489 (2009).
Patel, M. R., Loo, Y. M., Horner, S. M., Gale, M. & Malik, H. S. Convergent evolution of escape from hepaciviral antagonism in primates. PLoS Biol. 10, e1001282 (2012).
Demogines, A., Abraham, J., Choe, H., Farzan, M. & Sawyer, S. L. Dual host–virus arms races shape an essential housekeeping protein. PLoS Biol. 11, e1001571 (2013).
Barber, M. F. & Elde, N. C. Nutritional immunity. Escape from bacterial iron piracy through rapid evolution of transferrin. Science 346, 1362–1366 (2014).
Machkovech, H. M., Bedford, T., Suchard, M. A. & Bloom, J. D. Positive selection in CD8+ T-cell epitopes of influenza virus nucleoprotein revealed by a comparative analysis of human and swine viral lineages. J. Virol. 89, 11275–11283 (2015).
Field, S. F., Bulina, M. Y., Kelmanson, I. V., Bielawski, J. P. & Matz, M. V. Adaptive evolution of multicolored fluorescent proteins in reef-building corals. J. Mol. Evol. 62, 332–339 (2006).
Yokoyama, S., Tada, T., Zhang, H. & Britt, L. Elucidation of phenotypic adaptations: molecular analyses of dim-light vision proteins in vertebrates. Proc. Natl Acad. Sci. USA 105, 13480–13485 (2008).
Zhuang, H., Chien, M. S. & Matsunami, H. Dynamic functional evolution of an odorant receptor for sex-steroid-derived odors in primates. Proc. Natl Acad. Sci. USA 106, 21247–21251 (2009).
Bloom, J. D. An experimentally determined evolutionary model dramatically improves phylogenetic fit. Mol. Biol. Evol. 31, 1956–1978 (2014).
Lopez, P., Casane, D. & Philippe, H. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19, 1–7 (2002).
Pond, S. K. & Muse, S. V. Site-to-site variation of synonymous substitution rates. Mol. Biol. Evol. 22, 2375–2385 (2005).
Chan, Y. F. et al. Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science 327, 302–305 (2010).
Barrett, R. D. & Hoekstra, H. E. Molecular spandrels: tests of adaptation at the genetic level. Nat. Rev. Genet. 12, 767–780 (2011).
Siddiq, M. A., Loehlin, D. W., Montooth, K. L. & Thornton, J. W. Experimental test and refutation of a classic case of molecular adaptation in Drosophila melanogaster. Nat. Ecol. Evol. 1, 0025 (2017).
Acknowledgements
We are grateful to the members of the Thornton laboratory for discussion and helpful comments. We thank the Beagle2, Midway2 and Tarbell supercomputing clusters at the University of Chicago. We also thank the developers of HyPhy for presenting an open source platform that allows customization of standard analyses. Funding was provided by NIH R01GM104397 and R01GM121931 (to J.W.T.), NSF DEB-1601781 (to J.W.T. and A.V.), NSF DBI-1564611 (to M.W.H.), and the Precision Health Initiative of Indiana University (to M.W.H.).
Author information
Authors and Affiliations
Contributions
The analyses were designed by all authors, performed by A.V. and interpreted by all authors. The manuscript was written by A.V. and J.W.T. with contributions from M.W.H.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Tables 1–6 and Supplementary Figures 1–8
README for HyPhy batch files
Instructions for running simulations and BS+MNM test using batch files and Hyphy
Rights and permissions
About this article
Cite this article
Venkat, A., Hahn, M.W. & Thornton, J.W. Multinucleotide mutations cause false inferences of lineage-specific positive selection. Nat Ecol Evol 2, 1280–1288 (2018). https://doi.org/10.1038/s41559-018-0584-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41559-018-0584-5
- Springer Nature Limited
This article is cited by
-
The genome of the pygmy right whale illuminates the evolution of rorquals
BMC Biology (2023)
-
Co-expression network analyses of anthocyanin biosynthesis genes in Ruellia (Wild Petunias; Acanthaceae)
BMC Ecology and Evolution (2022)
-
Genome-wide macroevolutionary signatures of key innovations in butterflies colonizing new host plants
Nature Communications (2021)
-
Protein innovation through template switching in the Saccharomyces cerevisiae lineage
Scientific Reports (2021)
-
Improved inference of site-specific positive selection under a generalized parametric codon model when there are multinucleotide mutations and multiple nonsynonymous rates
BMC Evolutionary Biology (2019)