Abstract
While the inference of species trees from molecular sequences has become a common type of analysis in studies of species diversification, few programs so far allow for the use of single-nucleotide polymorphisms (SNPs) for the same purpose. In this book chapter, I discuss the use of the Bayesian program SNAPP, which infers the species tree by mathematically integrating over all possible genealogies at each SNP. In particular, I focus on a molecular clock model developed for SNAPP, allowing the inference of divergence times together with the species tree topology and the population size, directly from SNP datasets in variant call format. With the growing availability of SNP datasets for multiple closely related species, this approach is becoming increasingly relevant for the reconstruction of the temporal framework of recent species diversification.
Key words
- Genomics
- Phylogeny
- Species tree
- SNPs
- Divergence times
- SNAPP
- BEAST
Acknowledgments
I thank Julie Lee-Yaw, Amanda Haponski, Livia Loureiro, Sue Sherman-Broyles, Bohao Fang, Yayan Kusuma, Daniel Poveda-Martínez, Xiaoxi Yang, Cecilia Fiorini, Kristen Finch, Armel Donkpegan, Marta Liber, Jie Gao, and Julia Canitz for testing the snapp_prep.rb script. Funding was provided by the Research Council of Norway (FRIPRO 275869).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Matschiner, M. (2022). Species Tree Inference with SNP Data. In: Pereira-Santana, A., Gamboa-Tuz, S.D., Rodríguez-Zapata, L.C. (eds) Plant Comparative Genomics. Methods in Molecular Biology, vol 2512. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2429-6_2
DOI: https://doi.org/10.1007/978-1-0716-2429-6_2
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2428-9
Online ISBN: 978-1-0716-2429-6
eBook Packages: Springer Protocols