Bulletin of Mathematical Biology, Volume 80, Issue 1, pp 64–103

Split Probabilities and Species Tree Inference Under the Multispecies Coalescent Model

  • Elizabeth S. Allman
  • James H. Degnan
  • John A. Rhodes
Original Article


Using topological summaries of gene trees as a basis for species tree inference is a promising approach to obtain acceptable speed on genomic-scale datasets, and to avoid some undesirable modeling assumptions. Here we study the probabilities of splits on gene trees under the multispecies coalescent model, and how their features might inform species tree inference. After investigating the behavior of split consensus methods, we investigate split invariants—that is, polynomial relationships between split probabilities. These invariants are then used to show that, even though a split is an unrooted notion, split probabilities retain enough information to identify the rooted species tree topology for trees of 5 or more taxa, with one possible 6-taxon exception.
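To make the notion of a split concrete, the following minimal Python sketch extracts the nontrivial splits displayed by each tree in a small sample, tabulates how often each split appears, and keeps those appearing in more than half of the trees (a majority-rule form of split consensus). It is an illustration only: the nested-tuple tree encoding, the function names, and the three example trees are assumptions made here and are not taken from the article, and the sample frequencies it computes are empirical stand-ins for the split probabilities studied under the multispecies coalescent model.

```python
from collections import Counter


def leaves(tree):
    """Return the set of taxon labels below a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def nontrivial_splits(tree):
    """Collect the nontrivial splits displayed by a tree, ignoring its root.

    Each split A|B is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same representation.
    """
    taxa = leaves(tree)
    ref = min(taxa)  # hypothetical convention: alphabetically first taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = leaves(node)
        side = taxa - below if ref in below else below
        # A split is nontrivial when both sides have at least two taxa.
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def split_frequencies(gene_trees):
    """Fraction of the sampled gene trees that display each split."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(nontrivial_splits(tree))
    return {split: n / len(gene_trees) for split, n in counts.items()}


def majority_splits(gene_trees):
    """Splits displayed by more than half of the sampled gene trees."""
    freqs = split_frequencies(gene_trees)
    return {split for split, f in freqs.items() if f > 0.5}


# Three made-up gene trees on taxa A..E (illustrative data, not real results).
gene_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
]
freqs = split_frequencies(gene_trees)
consensus = majority_splits(gene_trees)
```

In this toy sample the split DE|ABC is displayed by every tree, AB|CDE by two of the three, and AC|BDE by one, so the majority-rule consensus retains the first two. Because each split is stored without reference to a root, two trees that differ only in root placement yield the same set of splits.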


Multispecies coalescent model; Split probability; Species tree identifiability

Mathematics Subject Classification




This work was begun while ESA and JAR were Short-term Visitors and JHD was a Sabbatical Fellow at the National Institute for Mathematical and Biological Synthesis, an institute sponsored by the National Science Foundation, the US Department of Homeland Security, and the US Department of Agriculture through NSF Award #EF-0832858, with additional support from the University of Tennessee, Knoxville. It was further supported by the National Institutes of Health Grant R01 GM117590, awarded under the Joint DMS/NIGMS Initiative to Support Research at the Interface of the Biological and Mathematical Sciences.


Copyright information

© Society for Mathematical Biology 2017

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, USA
  2. Department of Mathematics and Statistics, The University of New Mexico, Albuquerque, USA
