Algorithmic improvements to species delimitation and phylogeny estimation under the multispecies coalescent
This article presents a Bayesian method for inferring both species delimitations and species trees under the multispecies coalescent model, using molecular sequences from multiple loci. The species delimitation requires no a priori assignment of individuals to species and no guide tree. The method is implemented in a package called STACEY for BEAST2, and is an extension of the author’s DISSECT package. Here we demonstrate considerable efficiency improvements from two sources: three new operators for sampling from the posterior using the Markov chain Monte Carlo algorithm, and a model for the population size parameters along the branches of the species tree which allows these parameters to be integrated out. The correctness of the moves is demonstrated by tests of the implementation. The practice of using a pipeline approach to species delimitation under the multispecies coalescent has been shown to have major problems on simulated data (Olave et al. in Syst Biol 63:263–271. doi: 10.1093/sysbio/syt106, 2014). The same simulated data set is used to demonstrate the accuracy and improved convergence of the present method. We also compare performance with *BEAST for a fixed delimitation analysis on a large data set, and again show improved convergence.
Keywords: Species delimitation, Multispecies coalescent, Bayesian analysis, Markov chain Monte Carlo
Mathematics Subject Classification: 92B10, 62P10
I thank the developers of BEAST for making this work feasible, and Remco Bouckaert in particular for helpful advice on writing the STACEY package. I thank the authors of Olave et al. (2014) and Giarla and Esselstyn (2015) for making their simulated data readily available, and for supplying extra details about their simulations. I thank two anonymous reviewers for valuable comments on an earlier version of this paper.