Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent


Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models the ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for five or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
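The four-species statement can be made concrete with a minimal Python sketch. All function names and the nested-tuple tree encoding below are hypothetical and only for illustration. The sketch assumes branch lengths in coalescent units and one gene per species. It also assumes the standard four-taxon multispecies coalescent result: an unrooted gene tree matches the split of the unrooted species quartet with probability 1 - (2/3)e^(-t), and each of the other two splits has probability (1/3)e^(-t), where t is the length of the internal edge of the unrooted species tree. A small Monte Carlo simulator of the coalescent within species-tree branches checks this formula. A balanced rooted tree ((a,b):0.3,(c,d):0.5) and a caterpillar (((a,b):0.8,c):2.0,d) share the unrooted internal length 0.8, so they give the same unrooted gene-tree distribution. Consequently, the root position and the way t splits between rooted edges cannot be recovered from these probabilities, while t itself can.

```python
import math
import random
from collections import Counter

INF = math.inf
SPLITS = ("ab|cd", "ac|bd", "ad|bc")  # the three unrooted four-taxon topologies


def quartet_probs(split, t):
    """Closed-form unrooted gene-tree probabilities (assumed standard result)
    for an unrooted species quartet with `split` and internal edge length t."""
    minor = math.exp(-t) / 3.0
    return {s: 1.0 - 2.0 * minor if s == split else minor for s in SPLITS}


def internal_length(p_major):
    """Invert quartet_probs: recover t from the probability of the matching split."""
    return -math.log(1.5 * (1.0 - p_major))


def leaf(name):
    # Hypothetical encoding: (name, pendant length). With one gene per species
    # the pendant length never matters, so it is set to 0.
    return (name, 0.0)


def node(length, *children):
    # Hypothetical encoding: (tuple of child nodes, branch length above the node).
    return (children, length)


def _coalesce(lineages, length, rng, clusters):
    """Kingman coalescent on `lineages` for `length` time units; records merges."""
    lineages = list(lineages)
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2.0)  # waiting time to the next merge
        if t > length:  # no further merge before this branch ends
            break
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] | lineages[j]
        clusters.append(merged)
        lineages = [x for n, x in enumerate(lineages) if n not in (i, j)] + [merged]
    return lineages


def _simulate(tree, rng, clusters):
    """Pass surviving lineages up the species tree, coalescing along each branch."""
    label, length = tree
    if isinstance(label, str):
        lineages = [frozenset([label])]
    else:
        lineages = [x for child in label for x in _simulate(child, rng, clusters)]
    return _coalesce(lineages, length, rng, clusters)


def simulate_unrooted(species_tree, n, seed=1):
    """Empirical frequencies of unrooted gene-tree splits, one gene per species."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        clusters = []
        _simulate(species_tree, rng, clusters)
        # The first recorded merge joins two single genes (a cherry), and any
        # cherry of a four-leaf rooted tree determines its unrooted split.
        pair = clusters[0]
        a = "".join(sorted(pair))
        b = "".join(sorted(set("abcd") - pair))
        counts[min(a, b) + "|" + max(a, b)] += 1
    return {s: counts[s] / n for s in SPLITS}


# Hypothetical rooted species trees whose unrooted internal edge is 0.3 + 0.5 = 0.8
# (balanced) and 0.8 (caterpillar; its upper edge of 2.0 becomes part of d's
# pendant edge once the tree is unrooted).
balanced = node(INF, node(0.3, leaf("a"), leaf("b")), node(0.5, leaf("c"), leaf("d")))
caterpillar = node(INF, node(2.0, node(0.8, leaf("a"), leaf("b")), leaf("c")), leaf("d"))

if __name__ == "__main__":
    exact = quartet_probs("ab|cd", 0.8)
    print("closed form:", exact, "recovered t:", internal_length(exact["ab|cd"]))
    print("balanced:   ", simulate_unrooted(balanced, 10000))
    print("caterpillar:", simulate_unrooted(caterpillar, 10000))
```

Here `internal_length` inverts the closed form, so an estimated frequency of the dominant split yields an estimate of t. Changing the caterpillar's upper internal edge (2.0 above) leaves both outputs unchanged, which illustrates the four-species non-identifiability described in the abstract.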

Author information

Correspondence to James H. Degnan.

About this article

Cite this article

Allman, E.S., Degnan, J.H. & Rhodes, J.A. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J. Math. Biol. 62, 833–862 (2011) doi:10.1007/s00285-010-0355-7

Keywords

  • Multispecies coalescent
  • Phylogenetics
  • Invariants
  • Polytomy

Mathematics Subject Classification (2000)

  • 62P10
  • 92D15