Abstract
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for five or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
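The four-species statement can be illustrated numerically. The sketch below is not taken from the article itself. It assumes the standard multispecies coalescent with one gene per species and branch lengths in coalescent units, under which the concordant unrooted quartet has probability 1 - (2/3)e^{-t} and each discordant quartet has probability (1/3)e^{-t}, where t is the length of the internal edge of the unrooted species tree. All function names are hypothetical, chosen for illustration only.

```python
import math


def unrooted_quartet_distribution(t):
    """Unrooted gene tree probabilities for an unrooted species tree ab|cd.

    t is the internal edge length of the unrooted species tree in
    coalescent units (an assumption of this sketch).
    """
    discordant = math.exp(-t) / 3.0
    return {"ab|cd": 1.0 - 2.0 * discordant,
            "ac|bd": discordant,
            "ad|bc": discordant}


def caterpillar(x, y):
    """Rooted species tree (((a,b):x,c):y,d).

    Only x lies on the internal edge of the unrooted tree; y merges
    with the pendant edge leading to d, so it does not affect the result.
    """
    return unrooted_quartet_distribution(x)


def balanced(x, y):
    """Rooted species tree ((a,b):x,(c,d):y).

    Unrooting joins the two internal branches into one edge of length x + y.
    """
    return unrooted_quartet_distribution(x + y)


def internal_length_from_concordance(p):
    """Invert p = 1 - (2/3) * exp(-t) to recover t, valid for 1/3 < p < 1."""
    return -math.log(1.5 * (1.0 - p))


if __name__ == "__main__":
    # Two differently rooted species trees with the same unrooted internal length.
    print(caterpillar(1.0, 0.3))
    print(balanced(0.4, 0.6))
    print(internal_length_from_concordance(caterpillar(1.0, 0.3)["ab|cd"]))
```

Under these assumptions the caterpillar and balanced trees give the same distribution whenever their unrooted internal edge lengths agree, which is one way to see why the root location and the individual values of x and y cannot be recovered from four-taxon unrooted topologies alone.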
Cite this article
Allman, E.S., Degnan, J.H. & Rhodes, J.A. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J. Math. Biol. 62, 833–862 (2011). https://doi.org/10.1007/s00285-010-0355-7
DOI: https://doi.org/10.1007/s00285-010-0355-7