# Split Probabilities and Species Tree Inference Under the Multispecies Coalescent Model

## Abstract

Using topological summaries of gene trees as a basis for species tree inference is a promising approach to obtain acceptable speed on genomic-scale datasets, and to avoid some undesirable modeling assumptions. Here we study the probabilities of splits on gene trees under the multispecies coalescent model, and how their features might inform species tree inference. After investigating the behavior of split consensus methods, we investigate split invariants—that is, polynomial relationships between split probabilities. These invariants are then used to show that, even though a split is an unrooted notion, split probabilities retain enough information to identify the rooted species tree topology for trees of 5 or more taxa, with one possible 6-taxon exception.
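As an illustration of the quantities the abstract refers to, the following minimal sketch estimates split probabilities by the empirical frequencies of nontrivial splits in a sample of gene trees. It is not the authors' method. The nested-tuple tree representation and the function names `leaf_set`, `nontrivial_splits`, and `split_frequencies` are assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction


def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def nontrivial_splits(tree):
    """Return the nontrivial splits of the tree, ignoring the position of the root.

    Each split is a frozenset holding the two blocks of the bipartition,
    so A|B and B|A are the same object and the root location does not matter.
    """
    taxa = leaf_set(tree)
    splits = set()

    def visit(node):
        # Collect the leaves below this node; the edge above it induces a split.
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = taxa - below
        # Keep only splits with at least two taxa on each side.
        if len(below) >= 2 and len(other) >= 2:
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def split_frequencies(gene_trees):
    """Return the fraction of gene trees that display each nontrivial split."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(nontrivial_splits(tree))
    total = len(gene_trees)
    return {split: Fraction(count, total) for split, count in counts.items()}


# Hypothetical sample of three 4-taxon gene trees (illustrative data only).
sample = [
    (("a", "b"), ("c", "d")),
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
]
frequencies = split_frequencies(sample)
```

Because a split is stored as an unordered pair of blocks, the same split is recorded regardless of where a gene tree happens to be rooted, which matches the abstract's point that a split is an unrooted notion.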

## Keywords

Multispecies coalescent model, Split probability, Species tree identifiability

## Mathematics Subject Classification

92D15

## Notes

### Acknowledgements

This work was begun while ESA and JAR were Short-term Visitors and JHD was a Sabbatical Fellow at the National Institute for Mathematical and Biological Synthesis, an institute sponsored by the National Science Foundation, the US Department of Homeland Security, and the US Department of Agriculture through NSF Award #EF-0832858, with additional support from the University of Tennessee, Knoxville. It was further supported by the National Institutes of Health Grant R01 GM117590, awarded under the Joint DMS/NIGMS Initiative to Support Research at the Interface of the Biological and Mathematical Sciences.
