Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent

Journal of Mathematical Biology

Abstract

Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for five or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
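The four-species claim can be illustrated with a short calculation. The sketch below is not taken from the article text shown here; it assumes the standard four-taxon coalescent argument, in which the gene tree matches the species quartet ab|cd with probability 1 - (2/3)e^{-t}, where t is the internal edge length of the unrooted species tree in coalescent units, and each of the other two topologies has probability (1/3)e^{-t}. The function names and the labels `x` and `y` for rooted internal branch lengths are hypothetical, chosen for illustration only.

```python
import math


def unrooted_quartet_probs(t):
    """Probabilities of the three unrooted gene tree topologies on taxa
    a, b, c, d when the unrooted species tree is ab|cd with internal
    edge length t (coalescent units). Assumes one gene per species."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        "ab|cd": 1.0 - 2.0 * discordant,  # matches the species quartet
        "ac|bd": discordant,
        "ad|bc": discordant,
    }


def caterpillar_probs(x, y):
    """Rooted species tree (((a,b):x,c):y,d). After unrooting, the internal
    edge has length x; y is absorbed into the pendant edge leading to d,
    so it does not affect the distribution."""
    return unrooted_quartet_probs(x)


def balanced_probs(x, y):
    """Rooted species tree ((a,b):x,(c,d):y). After unrooting, the two
    internal branches merge into one internal edge of length x + y."""
    return unrooted_quartet_probs(x + y)


if __name__ == "__main__":
    # Two differently rooted species trees with the same unrooted
    # internal edge length give the same distribution.
    print(caterpillar_probs(1.0, 0.3))
    print(balanced_probs(0.4, 0.6))
```

Under these assumptions the distribution depends on the rooted species tree only through the single number t, so a caterpillar tree and a balanced tree with matching t cannot be told apart, and neither can different ways of splitting t between `x` and `y`. This is consistent with the abstract's statement that the root location and some branch-length information are not identifiable from four species.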



Author information

Corresponding author

Correspondence to James H. Degnan.

About this article

Cite this article

Allman, E.S., Degnan, J.H. & Rhodes, J.A. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J. Math. Biol. 62, 833–862 (2011). https://doi.org/10.1007/s00285-010-0355-7

  • DOI: https://doi.org/10.1007/s00285-010-0355-7