Journal of Mathematical Biology

Volume 62, Issue 6, pp 833–862

Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent

  • Elizabeth S. Allman
  • James H. Degnan
  • John A. Rhodes

Abstract

Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals—each with many genes—splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
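The four-species claim can be checked numerically. The sketch below is not taken from the paper's text. It assumes branch lengths in coalescent units and one gene sampled per species, and uses the standard coalescent result that, for four lineages, the unrooted gene tree topology is fixed by the first coalescent event. Under those assumptions the unrooted topology matching the species tree has probability 1 - (2/3)e^(-t), where t is the length of the internal edge of the unrooted species tree, and each of the other two topologies has probability (1/3)e^(-t). The function and variable names are illustrative.

```python
import math


def quartet_probs(internal_length):
    """Probabilities of the three unrooted gene tree topologies when the
    unrooted species tree is ab|cd.

    internal_length: length, in coalescent units, of the single internal
    edge of the unrooted species tree (an assumption of this sketch).
    """
    discordant = math.exp(-internal_length) / 3.0
    return {
        "ab|cd": 1.0 - 2.0 * discordant,  # matches the species tree
        "ac|bd": discordant,
        "ad|bc": discordant,
    }


def caterpillar(x, y):
    # Rooted species tree (((a,b):x,c):y,d). After unrooting, branch y is
    # absorbed into the pendant edge leading to d, so only x separates
    # {a,b} from {c,d}.
    return quartet_probs(x)


def balanced(x, y):
    # Rooted species tree ((a,b):x,(c,d):y). The two internal branches
    # join into one unrooted internal edge of length x + y.
    return quartet_probs(x + y)


# Three different rooted species trees with the same unrooted internal edge.
p_cat_short = caterpillar(1.0, 0.3)
p_cat_long = caterpillar(1.0, 2.5)
p_bal = balanced(0.5, 0.5)

for name, dist in [("caterpillar, y=0.3", p_cat_short),
                   ("caterpillar, y=2.5", p_cat_long),
                   ("balanced, x=y=0.5", p_bal)]:
    print(name, {k: round(v, 4) for k, v in dist.items()})
```

All three rooted trees give the same distribution, so the root location and the individual rooted branch lengths cannot be recovered from unrooted quartet frequencies alone. The unrooted topology (the most probable quartet) and the unrooted internal edge length can be.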

Keywords

Multispecies coalescent, Phylogenetics, Invariants, Polytomy

Mathematics Subject Classification (2000)

62P10, 92D15

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Elizabeth S. Allman (1)
  • James H. Degnan (2)
  • John A. Rhodes (1)
  1. Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, USA
  2. Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
