Abstract
Phylogenies based on different genes can conflict with one another; methods that resolve such conflicts are becoming more popular and offer a number of advantages for phylogenetic analysis. We review so-called species tree methods and the biological forces that can undermine them by violating important aspects of the underlying models. Such forces include horizontal gene transfer, gene duplication, and natural selection. We review ways of detecting loci influenced by such forces and offer suggestions for identifying or accommodating them. The way forward involves either identifying outlier loci, as is done in population genetic analysis of neutral and selected loci, and removing them from further analysis, or developing more complex species tree models that can accommodate such loci.
References
Hillis DM (1987) Molecular Versus Morphological Approaches to Systematics. Annu Rev Ecol Syst 18:23–42
Kocher TD, Thomas WK, Meyer A et al (1989) Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers. Proc Natl Acad Sci USA 86:6196–6200
Miyamoto MM, Cracraft J (1991) Phylogeny inference, DNA sequence analysis, and the future of molecular systematics. In: Miyamoto MM, Cracraft J (eds) Phylogenetic Analysis of DNA Sequences. Oxford Univ. Press, New York
Swofford DL, Olsen GJ, Waddell PJ et al (1996) Phylogenetic inference. In: Hillis DM, Moritz C, Mable BK (eds) Molecular Systematics. Sinauer Associates, Sunderland MA
Nei M (1987) Molecular Evolutionary Genetics, Columbia University Press, New York
Nei M, Kumar S (2000) Molecular Evolution and Phylogenetics, Oxford University Press, New York
Rosenberg NA (2002) The Probability of Topological Concordance of Gene Trees and Species Trees. Theor Popul Biol 61:225–247
Cavalli-Sforza LL (1964) Population structure and human evolution. Proc R Soc Lond, Ser B: Biol Sci 164:362–379
Avise JC, Arnold J, Ball RM et al (1987) Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu Rev Ecol Syst 18:489–522
Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460
Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Mol Biol Evol 5:568–583
Takahata N (1989) Gene genealogy in three related populations: consistency probability between gene and population trees. Genetics 122:957–966
Avise JC (1994) Molecular markers, natural history and evolution, Chapman and Hall, New York
Wollenberg K, Avise JC (1998) Sampling properties of genealogical pathways underlying population pedigrees. Evolution 52:957–966
Gould SJ (2001) The Book of Life: An illustrated history of the evolution of life on earth, W. W. Norton & Co., New York
Maddison WP (1997) Gene trees in species trees. Syst Biol 46:523–536
Jennings WB, Edwards SV (2005) Speciational history of Australian grass finches (Poephila) inferred from thirty gene trees. Evolution 59:2033–2047
Carstens BC, Knowles LL (2007) Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: An example from Melanoplus grasshoppers. Syst Biol 56(3):400–411
Wong A, Jensen JD, Pool JE et al (2007) Phylogenetic incongruence in the Drosophila melanogaster species group. Mol Phylogen Evol 43:1138–1150
Edwards SV (2009) Is a new and general theory of molecular systematics emerging? Evolution 63:1–19
Neigel JE, Avise JC (1986) Phylogenetic relationships of mitochondrial DNA under various demographic models of speciation. In: Karlin S, Nevo E (eds) Evolutionary processes and theory. Academic Press, New York
Satta Y, Klein J, Takahata N (2000) DNA Archives and Our Nearest Relative: The Trichotomy Problem Revisited. Mol Phylogen Evol 14(2):259–275
Degnan JH, Rosenberg NA (2006) Discordance of Species Trees with Their Most Likely Gene Trees. PLoS Genet 2(5):e68
Rosenberg NA, Tao R (2008) Discordance of species trees with their most likely gene trees: the case of five taxa. Syst Biol 57:131–140
Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24:332–340
Huang H, Knowles LL (2009) What Is the Danger of the Anomaly Zone for Empirical Phylogenetics? Syst Biol 58(5):527–536
Bryant D (2003) A Classification of Consensus Methods for Phylogenetics. In: Janowitz MF, Lapointe F-J, McMorris FR, Mirkin B, Roberts FS (eds) Bioconsensus. American Mathematical Society, Providence RI
Felsenstein J (2004) Inferring Phylogenies, Sinauer Associates, Sunderland MA
Ewing GB, Ebersberger I, Schmidt HA et al (2008) Rooted triple consensus and anomalous gene trees. BMC Evol Biol 8:118
Degnan JH, DeGiorgio M, Bryant D et al (2009) Properties of Consensus Methods for Inferring Species Trees from Gene Trees. Syst Biol
Steel M, Rodrigo A (2008) Maximum Likelihood Supertrees. Syst Biol 57(2):243–250
Ranwez V, Criscuolo A, Douzery EJP (2010) SUPERTRIPLETS: a triplet-based supertree approach to phylogenomics. Bioinformatics 26(12):i115-i123
Ané C, Larget B, Baum DA et al (2007) Bayesian Estimation of Concordance among Gene Trees. Mol Biol Evol 24:412–426
Larget BR, Kotha SK, Dewey CN et al (2010) BUCKy: Gene tree/species tree reconciliation with Bayesian concordance analysis. Bioinformatics 26:2910–2911
Wiens JJ (2003) Missing data, incomplete taxa, and phylogenetic accuracy. Syst Biol 52:528–538
Gadagkar SR, Rosenberg MS, Kumar S (2005) Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. Journal of Experimental Zoology B 304(1):64–74
Bull JJ, Huelsenbeck JP, Cunningham CW et al (1993) Partitioning and Combining Data in Phylogenetic Analysis. Syst Biol 42:384–397
Rokas A, Williams BL, King N, Carroll SB (2003) Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804
Driskell AC, Ané C, Burleigh JG et al (2004) Prospects for Building the Tree of Life from Large Sequence Databases. Science 306:1172–1174
Rokas A (2006) Genomics and the Tree of Life. Science 313:1897–1899
Kubatko LS, Degnan JH (2007) Inconsistency of Phylogenetic Estimates from Concatenated Data under Coalescence. Syst Biol 56(1):17–24
Wu M, Eisen JA (2008) A simple, fast, and accurate method of phylogenomic inference. Genome Biology 9:R151
Degnan JH, Salter LA (2005) Gene tree distributions under the coalescent process. Evolution 59:24–37
Liu L (2008) BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics 24(21):2542–2543
Liu L, Yu L, Kubatko LS et al (2009) Coalescent methods for estimating phylogenetic trees. Mol Phylogen Evol 53:320–328
Castillo-Ramirez S, Liu L, Pearl DK et al (2010) Bayesian estimation of species trees: a practical guide to optimal sampling and analysis. In: Knowles LL, Kubatko LS (eds) Estimating species trees: Practical and theoretical aspects. Hoboken NJ, John Wiley and Sons
Gillespie JH (2004) Population Genetics: A Concise Guide, 2nd edn. The Johns Hopkins University Press, Baltimore, MD
Wakeley J (2009) Coalescent Theory: An Introduction, Roberts & Co. Publishers, Greenwood Village, CO
Hartl DL, Clark AG (2006) Principles of Population Genetics, 4th edn. Sinauer Associates, Inc., Sunderland, MA
Wilson IJ, Weale ME, Balding DJ (2003) Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities. Journal of the Royal Statistical Society: Series A 166:155–158
Maddison WP, Knowles LL (2006) Inferring phylogeny despite incomplete lineage sorting. Syst Biol 55:21–30
Kubatko LS, Carstens BC, Knowles LL (2009) STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25(7):971–973
O’Meara BC (2010) New Heuristic Methods for Joint Species Delimitation and Species Tree Inference. Syst Biol 59(1):59–73
O’Meara BC (2008) Using trees: Myrmecocystus phylogeny and character evolution and new methods for investigating trait evolution and species delimitation
Mossel E, Roch S (2007) Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci. [mss]
Rannala B, Yang Z (2003) Bayes Estimation of Species Divergence Times and Ancestral Population Sizes Using DNA Sequences From Multiple Loci. Genetics 164:1645–1656
Yang Z, Rannala B (2010) Bayesian species delimitation using multilocus sequence data. Proc Natl Acad Sci USA 107:9264–9269
Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol 10:302
Oliver JC (2008) AUGIST: inferring species trees while accommodating gene tree uncertainty. Bioinformatics 24:2932–2933
Liu L, Pearl DK (2007) Species Trees from Gene Trees: Reconstructing Bayesian Posterior Distributions of a Species Phylogeny Using Estimated Gene Tree Distributions. Syst Biol 56(3):504–514
Heled J, Drummond AJ (2010) Bayesian Inference of Species Trees from Multilocus Data. Mol Biol Evol 27:570–580
Chung Y, Ané C (2011) Comparing Two Bayesian Methods for Gene Tree/Species Tree Reconstruction: Simulations with Incomplete Lineage Sorting and Horizontal Gene Transfer. Syst Biol 60:261–275
Leaché AD, Rannala B The Accuracy of Species Tree Estimation under Simulation: A Comparison of Methods. Syst Biol
Edwards SV, Liu L, Pearl DK (2007) High-resolution species trees without concatenation. Proc Natl Acad Sci USA 104:5936–5941
Liu L, Edwards SV (2009) Phylogenetic Analysis in the Anomaly Zone. Syst Biol 58:452–460
Huang H, He Q, Kubatko LS et al (2010) Sources of Error Inherent in Species-Tree Estimation: Impact of Mutational and Coalescent Effects on Accuracy and Implications for Choosing among Different Methods. Syst Biol 59(5):573–583
Suzuki Y, Glazko GV, Nei M (2002) Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci USA 99:16138–16143
Avise JC, Ball RM (1990) Principles of genealogical concordance in species concepts and biological taxonomy. Oxford Surveys in Evolutionary Biology 7:45–67
He Y, Wu J, Dressman DC et al (2010) Heteroplasmic mitochondrial DNA mutations in normal and tumour cells. Nature 464:610–614
Leaché AD (2009) Species Tree Discordance Traces to Phylogeographic Clade Boundaries in North American Fence Lizards (Sceloporus). Syst Biol 58:547–559
De Queiroz K (2007) Species Concepts and Species Delimitation. Syst Biol 56:879–886
Hudson RR, Coyne JA (2002) Mathematical consequences of the genealogical species concept. Evolution 56:1557–1565
Tobias JA, Seddon N, Spottiswoode CN et al (2010) Quantitative criteria for species delimitation. Ibis 152(4):724–746
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Huelsenbeck JP, Andolfatto P (2007) Inference of Population Structure Under a Dirichlet Process Model. Genetics 175:1787–1802
Leaché AD, Fujita MK (2010) Bayesian species delimitation in West African forest geckos (Hemidactylus fasciatus). Proc Roy Soc Lond B 277:3071–3077
Knowles LL, Carstens BC (2007) Delimiting Species without Monophyletic Gene Trees. Syst Biol 56(6):887–895
Carstens BC, Dewey TA (2010) Species Delimitation Using a Combined Coalescent and Information-Theoretic Approach: An Example from North American Myotis Bats. Syst Biol 59:400–414
Wakeley J (2000) The effects of subdivision on the genetic divergence of populations and species. Evolution 54:1092–1101
Eckert AJ, Carstens BC (2008) Does gene flow destroy phylogenetic signal? The performance of three methods for estimating species phylogenies in the presence of gene flow. Mol Phylogen Evol 49:832–842
Doolittle WF, Bapteste E (2007) Pattern pluralism and the Tree of Life hypothesis. Proc Natl Acad Sci USA 104:2043–2049
Boto L (2010) Horizontal gene transfer in evolution: facts and challenges. Proc Roy Soc Lond B 277:819–827
Rivera MC, Lake JA (2004) The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature 431:152–155
Kurland CG, Canback B, Berg OG (2003) Horizontal gene transfer: A critical view. Proc Natl Acad Sci USA 100:9658–9662
Hodkinson TR, Parnell JAN (2006) Introduction to the Systematics of Species Rich Groups. In: Hodkinson TR, Parnell JAN (eds) Reconstructing the tree of life: taxonomy and systematics of species rich taxa. CRC Press, Boca Raton, FL
Eisen JA (2000) Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Curr Opin Genet Dev 10:606–611
Jain R, Rivera MC, Lake JA (1999) Horizontal gene transfer among genomes: The complexity hypothesis. Proc Natl Acad Sci USA 96:3801–3806
Galtier N, Daubin V (2008) Dealing with incongruence in phylogenomic analyses. Philosophical Transactions of the Royal Society B: Biological Sciences 363:4023–4029
Andersson JO (2005) Lateral gene transfer in eukaryotes. Cell Mol Life Sci 62:1182–1197
Hotopp JCD, Clark ME, Oliveira DCSG et al (2007) Widespread Lateral Gene Transfer from Intracellular Bacteria to Multicellular Eukaryotes. Science 317:1753–1756
Thomas J, Schaack S, Pritham EJ (2010) Pervasive Horizontal Transfer of Rolling-Circle Transposons among Animals. Genome Biology and Evolution 2:656–664
Keeling PJ, Palmer JD (2008) Horizontal gene transfer in eukaryotic evolution. Nature Reviews Genetics 9:605–618
Blair JE (2009) Animals: Metazoa. In: Hedges SB, Kumar S (eds) The Timetree of Life. Oxford University Press, New York
Huang J, Gogarten JP (2006) Ancient horizontal gene transfer can benefit phylogenetic reconstruction. Trends Genet 22:361–366
Linz S, Radtke A, von Haeseler A et al (2007) A Likelihood Framework to Measure Horizontal Gene Transfer. Mol Biol Evol 24:1312–1319
Rasmussen MD, Kellis M (2007) Accurate gene-tree reconstruction by learning gene- and species-specific substitution rates across multiple complete genomes. Genome Res 17:1932–1942
Rasmussen MD, Kellis M (2011) A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction. Mol Biol Evol 28:273–290
Sanderson MJ, McMahon MM (2007) Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evol Biol 7:S1-S3
Edwards SV (2009) Natural selection and phylogenetic analysis. Proc Natl Acad Sci USA 106:8799–8800
Ray N, Excoffier L (2009) Inferring Past Demography Using Spatially Explicit Population Genetic Models. Human Biology 81:141–157
Castoe TA, de Koning APJ, Kim H-M et al (2009) Evidence for an ancient adaptive episode of convergent molecular evolution. Proc Natl Acad Sci USA 106:8986–8991
Swofford DL (1991) When are phylogeny estimates from molecular and morphological data incongruent? Pp. 295–333 In: Miyamoto MM, Cracraft J (eds) Phylogenetic analysis of DNA sequences. Oxford Univ. Press, New York
Roettger M, Martin W, Dagan T (2009) A Machine-Learning Approach Reveals That Alignment Properties Alone Can Accurately Predict Inference of Lateral Gene Transfer from Discordant Phylogenies. Mol Biol Evol 26:1931–1939
Beaumont MA, Balding DJ (2004) Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol 13:969–980
Waddington CH (1942) Canalization of development and the inheritance of acquired characters. Nature 150:563–565
Burke MK, Dunham JP, Shahrestani P et al (2010) Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590
Medrano-Soto A, Moreno-Hagelsieb G, Vinuesa P et al (2004) Successful lateral transfer requires codon usage compatibility between foreign genes and recipient genomes. Mol Biol Evol 21:1884–1894
Dufraigne C, Fertil B, Lespinats S et al (2005) Detection and characterization of horizontal transfers in prokaryotes using genomic signature. Nucleic Acids Res 33:e6
Lockhart PJ, Steel MA, Hendy MD et al (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol 11:605–612
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
Marjoram P, Molitor J, Plagnol V et al (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100:15324–15328
Galtier N (2007) A Model of Horizontal Gene Transfer and the Bacterial Phylogeny Problem. Syst Biol 56:633–642
Koslowski T, Zehender F (2005) Towards a quantitative understanding of horizontal gene transfer: A kinetic model. J Theor Biol 237:23–29
Suchard MA (2005) Stochastic Models for Horizontal Gene Transfer: Taking a Random Walk Through Tree Space. Genetics 170:419–431
Huson DH, Bryant D (2006) Application of Phylogenetic Networks in Evolutionary Studies. Mol Biol Evol 23:254–267
Lake JA, Rivera MC (2004) Deriving the Genomic Tree of Life in the Presence of Horizontal Gene Transfer: Conditioned Reconstruction. Mol Biol Evol 21:681–690
Ané C (2010) Reconstructing concordance trees and testing the coalescent model from genome-wide data sets. In: Knowles LL, Kubatko LS (eds) Estimating Species Trees: Practical and Theoretical Aspects. Wiley-Blackwell, Hoboken, NJ
Excoffier L, Novembre J, Schneider S (2000) SIMCOAL: a general coalescent program for simulation of molecular data in interconnected populations with arbitrary demography. J Hered 91:506–509
Anderson CNK, Ramakrishnan U, Chan YL et al (2005) Serial SimCoal: A population genetics model for data from multiple populations and points in time. Bioinformatics 21:1733–1734
Schneider S, Roessli D, Excoffier L (2005) Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evolutionary Bioinformatics 1:47–50
Liu L, Yu L (2010) Phybase: an R package for species tree analysis. Bioinformatics 26:962–963
Kosiol C, Anisimova M (2012) Selection on the protein coding genome. In: Anisimova M (ed) Evolutionary genomics: statistical and computational methods (volume 2). Methods in Molecular Biology, Springer Science+Business Media New York
Appendices
Appendix A: Simulating Gene Trees in Species Trees
Many researchers have found it useful to simulate the evolution of genes over a species tree topology. This can be done to test mathematical models, to get a feel for the amount of divergence expected in real data, or (as described below) to rigorously compare the ability of alternative species histories to account for the data in hand. Coalescent simulation yields the amount of isolation expected from drift alone and, in the context of Bayesian analysis, can be used to infer other parameters of the demographic processes occurring at scales finer than the species group. A simple example of how this could be accomplished in Bayesian Serial SimCoal (BayeSSC) (118, 119) is described below. The suite of tools available through Arlequin (120) and the R scripts in Phybase (121) can be used to further analyze the output of BayeSSC.
Although species trees can be simulated from a birth and death process using the R package TreeSim (http://cran.r-project.org/web/packages/TreeSim/index.html), researchers often adopt a fixed species tree on which to simulate gene trees. Imagine a species tree with ten individuals, four species (with 4, 2, 3, and 1 representatives, respectively), and with known (or previously inferred) split times among taxa. In addition, we will assume for this example that the effective population size N_e of each contemporary species is 1,000, and that the size of each ancestral population is the sum of the sizes of its descendant populations. This situation is analogous to that depicted in Fig. 5. The corresponding species tree, in the Newick notation used within NEXUS tree blocks, is:
(D:1500,(C:800,(B:500,A:500):300):700);
Here, branch lengths are in units of generations, which is commensurate with using units of individuals for the population sizes (other simulation methods use units of τ = μt and θ = 4Nμ, in units of substitutions per site, instead of t and N_e, respectively).
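The relationship between the two sets of units can be illustrated with a short Python sketch. It reads the species tree given above (written without a thousands separator so that it parses), recovers the split time of each clade in generations, and converts to τ and θ. The per-site mutation rate is not given in the text; the value used here is an assumption made purely for illustration, and the function names are our own.

```python
import re

def parse_newick(text):
    """Parse a minimal Newick string into nested (name, length, children) tuples."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                      # consume ","
                children.append(node())
            pos += 1                          # consume ")"
        name = ""
        if tokens[pos] not in {",", ")", ":", ";"}:
            name = tokens[pos]
            pos += 1
        length = 0.0
        if tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return name, length, children

    return node()

def leaf_names(node):
    """Return the sorted tip labels below a node."""
    name, _, children = node
    if not children:
        return [name]
    return sorted(n for child in children for n in leaf_names(child))

def split_times(node, out):
    """Return the height of a node above the tips; record each clade's split time."""
    _, _, children = node
    if not children:
        return 0.0
    heights = [split_times(child, out) + child[1] for child in children]
    if max(heights) - min(heights) > 1e-9:
        raise ValueError("tree is not ultrametric")
    out["".join(leaf_names(node))] = heights[0]
    return heights[0]

tree = parse_newick("(D:1500,(C:800,(B:500,A:500):300):700);")
times = {}
split_times(tree, times)

N_E = 1000          # effective size of each contemporary species (from the text)
MU_PER_SITE = 1e-7  # ASSUMPTION: 0.0001 per sequence over a hypothetical 1,000 sites

theta = 4 * N_E * MU_PER_SITE                       # theta = 4 * N * mu
tau = {clade: MU_PER_SITE * t for clade, t in times.items()}  # tau = mu * t

for clade, t in sorted(times.items(), key=lambda kv: kv[1]):
    print(f"{clade}: t = {t:g} generations, tau = {tau[clade]:.1e}")
print(f"theta = {theta:.1e}")
```

The ultrametric check is a convenience: if the tips are not all equidistant from the root, the branch lengths cannot be read as split times in generations.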
A simple simulation can be run in any version of SimCoal using the following .par file:
//Species tree input file; 10 taxa, 4 sp
4 demes
//Deme sizes (arbitrary in this case)
1000
1000
1000
1000
//Number of samples per deme
4
2
3
1
//Growth rates
0
0
0
0
//Number of migration matrices
0
//Historical event: date, from, to, %mig, new_N, new_r, migmat
3 events
500 1 0 1 2.00 0 0
800 2 0 1 1.50 0 0
1500 3 0 1 1.33 0 0
//Mutations per generation for the whole sequence
0.0001
//Number of loci
10
//Data type: DNA, RFLP, or MICROSAT
DNA
//Mutation rates: Gamma parameters, theta and k
0 0
In this case, the tree was perfectly ordered, so all populations could simply fuse with deme 0, readjusting the population size each time. Of course, there is no need to assume that all populations have the same effective size, nor that the N_e of each ancestral population was the sum of the N_e values of its descendants. If we wished to infer the size of clade AB at the time of the split, for example, we could replace the 2.00 in the first historical event with {U:0.5,3.0}, which would allow the program to infer the posterior probabilities of clade AB having an N_e from 500 to 3,000 individuals. Similarly, if the mutation rate of the gene in question was unknown, or if a range of mutation rates was desired, then the mutation rate constant, set in the example above at 0.0001, could be replaced with {E:0.0001}, creating an exponential distribution of mutation rates whose mean was 0.0001. Full documentation on the parameter files, and on Bayesian inference using priors instead of constants, can be found at the BayeSSC Web site: http://www.stanford.edu/group/hadlylab/ssc/.
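To get a feel for what these two priors imply, one can draw from the corresponding distributions directly. The Python sketch below does not call BayeSSC; it assumes that U denotes a uniform prior on the size multiplier of deme 0 and that E denotes an exponential prior parameterized by its mean, as the description above suggests. The seed, the number of draws, and the variable names are arbitrary choices of ours.

```python
import random

rng = random.Random(42)   # fixed seed so the sketch is repeatable
DEME0_SIZE = 1000         # size of deme 0 before the first event (from the .par file)
DRAWS = 10_000            # arbitrary number of prior draws

# Fixed value used in the example file: clade AB is the sum of its two descendants.
fixed_factor = (1000 + 1000) / DEME0_SIZE

# {U:0.5,3.0}: ASSUMED to be a uniform prior on the multiplier applied to deme 0.
ab_sizes = [rng.uniform(0.5, 3.0) * DEME0_SIZE for _ in range(DRAWS)]

# {E:0.0001}: exponential prior with mean 0.0001; expovariate takes the rate 1/mean.
mutation_rates = [rng.expovariate(1 / 0.0001) for _ in range(DRAWS)]
mean_rate = sum(mutation_rates) / DRAWS

print(f"fixed multiplier: {fixed_factor:.2f}")
print(f"N_e of AB under the uniform prior: {min(ab_sizes):.0f} to {max(ab_sizes):.0f}")
print(f"mean sampled mutation rate: {mean_rate:.2e}")
```

The uniform draws stay within 500 to 3,000 individuals, and the sample mean of the exponential draws should fall close to 0.0001, which is a quick sanity check before committing a prior to a long run.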
Note that the suite of Bayesian tools available at the Web site can be used to evaluate the relative strength of different species topologies. For example, the fit between real data and the output from the parameter file above, with its perfectly ordered tree (((AB)C)D), can be compared quantitatively with the fit obtained from a second file in which the tree is balanced, say with topology ((AB)(CD)).
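The two parameter files would differ only in their historical events. As a minimal sketch, the Python function below derives the event rows from a species tree, assuming (as the example file suggests) that new_N is the factor by which the receiving deme's size is multiplied and that each ancestor is the sum of its descendants. The tuple encoding of the tree, the function names, and the 800-generation split time for clade CD are hypothetical choices made for illustration; demes 0 to 3 are assumed to hold species A to D.

```python
def tree_to_events(node, sizes, events):
    """Walk the tree tips-first; return (surviving deme, its current size).

    A tip is an int deme index. An internal node is (left, right, split_time):
    going back in time, the right-hand lineage fuses into the left-hand one.
    """
    if isinstance(node, int):
        return node, sizes[node]
    left, right, time = node
    sink, sink_size = tree_to_events(left, sizes, events)
    source, source_size = tree_to_events(right, sizes, events)
    merged = sink_size + source_size          # ancestor = sum of its descendants
    # Row layout: date, from, to, %mig, new_N, new_r, migmat
    events.append((time, source, sink, 1, merged / sink_size, 0, 0))
    return sink, merged

def event_rows(tree, sizes):
    """Return the historical-event lines for a tree, most recent event first."""
    events = []
    tree_to_events(tree, sizes, events)
    events.sort()
    return ["{} {} {} {} {:.2f} {} {}".format(*e) for e in events]

sizes = [1000, 1000, 1000, 1000]              # demes 0-3, ASSUMED to hold A-D

ordered = (((0, 1, 500), 2, 800), 3, 1500)    # (((AB)C)D), split times from the text
balanced = ((0, 1, 500), (2, 3, 800), 1500)   # ((AB)(CD)); the CD time is HYPOTHETICAL

for label, tree in (("ordered", ordered), ("balanced", balanced)):
    print(label)
    for row in event_rows(tree, sizes):
        print("  " + row)
```

For the perfectly ordered tree this reproduces the three event lines of the file above. For the balanced tree, deme 3 fuses into deme 2 before deme 2 fuses into deme 0, and each fusion doubles the receiving deme.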
© 2012 Springer Science+Business Media, LLC
Cite this protocol
Anderson, C.N.K., Liu, L., Pearl, D., Edwards, S.V. (2012). Tangled Trees: The Challenge of Inferring Species Trees from Coalescent and Noncoalescent Genes. In: Anisimova, M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 856. Humana Press. https://doi.org/10.1007/978-1-61779-585-5_1
DOI: https://doi.org/10.1007/978-1-61779-585-5_1
Publisher Name: Humana Press
Print ISBN: 978-1-61779-584-8
Online ISBN: 978-1-61779-585-5
eBook Packages: Springer Protocols