Tangled Trees: The Challenge of Inferring Species Trees from Coalescent and Noncoalescent Genes

Part of the book series: Methods in Molecular Biology (MIMB, volume 856)

Abstract

Phylogenies based on different genes can conflict with one another; methods that resolve such conflicts are becoming more popular and offer a number of advantages for phylogenetic analysis. We review so-called species tree methods and the biological forces that can undermine them by violating important aspects of the underlying models. Such forces include horizontal gene transfer, gene duplication, and natural selection. We review ways of detecting loci influenced by such forces and offer suggestions for identifying or accommodating them. The way forward involves either identifying outlier loci, as is done in population genetic analysis of neutral and selected loci, and removing them from further analysis, or developing more complex species tree models that can accommodate such loci.

References

  1. Hillis DM (1987) Molecular Versus Morphological Approaches to Systematics. Annu Rev Ecol Syst 18:23–42

  2. Kocher TD, Thomas WK, Meyer A et al (1989) Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers. Proc Natl Acad Sci USA 86:6196–6200

  3. Miyamoto MM, Cracraft J (1991) Phylogeny inference, DNA sequence analysis, and the future of molecular systematics. In: Miyamoto MM, Cracraft J (eds) Phylogenetic Analysis of DNA Sequences. Oxford Univ. Press, New York

  4. Swofford DL, Olsen GJ, Waddell PJ et al (1996) Phylogenetic inference. In: Hillis DM, Moritz C, Mable BK (eds) Molecular Systematics. Sinauer Associates, Sunderland MA

  5. Nei M (1987) Molecular Evolutionary Genetics, Columbia University Press, New York

  6. Nei M, Kumar S (2000) Molecular Evolution and Phylogenetics, Oxford University Press, New York

  7. Rosenberg NA (2002) The Probability of Topological Concordance of Gene Trees and Species Trees. Theor Popul Biol 61:225–247

  8. Cavalli-Sforza LL (1964) Population structure and human evolution. Proc R Soc Lond, Ser B: Biol Sci 164:362–379

  9. Avise JC, Arnold J, Ball RM et al (1987) Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu Rev Ecol Syst 18:489–522

  10. Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460

  11. Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Mol Biol Evol 5:568–583

  12. Takahata N (1989) Gene genealogy in three related populations: consistency probability between gene and population trees. Genetics 122:957–966

  13. Avise JC (1994) Molecular markers, natural history and evolution, Chapman and Hall, New York

  14. Wollenberg K, Avise JC (1998) Sampling properties of genealogical pathways underlying population pedigrees. Evolution 52:957–966

  15. Gould SJ (2001) The Book of Life: An illustrated history of the evolution of life on earth, W. W. Norton & Co., New York

  16. Maddison WP (1997) Gene trees in species trees. Syst Biol 46:523–536

  17. Jennings WB, Edwards SV (2005) Speciational history of Australian grass finches (Poephila) inferred from thirty gene trees. Evolution 59:2033–2047

  18. Carstens BC, Knowles LL (2007) Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: An example from Melanoplus grasshoppers. Syst Biol 56(3):400–411

  19. Wong A, Jensen JD, Pool JE et al (2007) Phylogenetic incongruence in the Drosophila melanogaster species group. Mol Phylogen Evol 43:1138–1150

  20. Edwards SV (2009) Is a new and general theory of molecular systematics emerging? Evolution 63:1–19

  21. Neigel JE, Avise JC (1986) Phylogenetic relationships of mitochondrial DNA under various demographic models of speciation. In: Karlin S, Nevo E (eds) Evolutionary processes and theory. Academic Press, New York

  22. Satta Y, Klein J, Takahata N (2000) DNA Archives and Our Nearest Relative: The Trichotomy Problem Revisited. Mol Phylogen Evol 14(2):259–275

  23. Degnan JH, Rosenberg NA (2006) Discordance of Species Trees with Their Most Likely Gene Trees. PLoS Genet 2(5):e68

  24. Rosenberg NA, Tao R (2008) Discordance of species trees with their most likely gene trees: the case of five taxa. Syst Biol 57:131–140

  25. Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24:332–340

  26. Huang H, Knowles LL (2009) What Is the Danger of the Anomaly Zone for Empirical Phylogenetics? Syst Biol 58(5):527–536

  27. Bryant D (2003) A Classification of Consensus Methods for Phylogenetics. In: Janowitz MF, Lapointe F-J, McMorris FR, Mirkin B, Roberts FS (eds) Bioconsensus. American Mathematical Society, Providence RI

  28. Felsenstein J (2004) Inferring Phylogenies, Sinauer Associates, Sunderland MA

  29. Ewing GB, Ebersberger I, Schmidt HA et al (2008) Rooted triple consensus and anomalous gene trees. BMC Evol Biol 8:118

  30. Degnan JH, DeGiorgio M, Bryant D et al (2009) Properties of Consensus Methods for Inferring Species Trees from Gene Trees. Syst Biol

  31. Steel M, Rodrigo A (2008) Maximum Likelihood Supertrees. Syst Biol 57(2):243–250

  32. Ranwez V, Criscuolo A, Douzery EJP (2010) SUPERTRIPLETS: a triplet-based supertree approach to phylogenomics. Bioinformatics 26(12):i115-i123

  33. Ané C, Larget B, Baum DA et al (2007) Bayesian Estimation of Concordance among Gene Trees. Mol Biol Evol 24:412–426

  34. Larget BR, Kotha SK, Dewey CN et al (2010) BUCKy: Gene tree/species tree reconciliation with Bayesian concordance analysis. Bioinformatics 26:2910–2911

  35. Wiens JJ (2003) Missing data, incomplete taxa, and phylogenetic accuracy. Syst Biol 52:528–538

  36. Gadagkar SR, Rosenberg MS, Kumar S (2005) Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. Journal of Experimental Zoology B 304(1):64–74

  37. Bull JJ, Huelsenbeck JP, Cunningham CW et al (1993) Partitioning and Combining Data in Phylogenetic Analysis. Syst Biol 42:384–397

  38. Rokas A, Williams BL, King N et al (2003) Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804

  39. Driskell AC, Ané C, Burleigh JG et al (2004) Prospects for Building the Tree of Life from Large Sequence Databases. Science 306:1172–1174

  40. Rokas A (2006) Genomics and the Tree of Life. Science 313:1897–1899

  41. Kubatko LS, Degnan JH (2007) Inconsistency of Phylogenetic Estimates from Concatenated Data under Coalescence. Syst Biol 56(1):17–24

  42. Wu M, Eisen JA (2008) A simple, fast, and accurate method of phylogenomic inference. Genome Biology 9:R151

  43. Degnan JH, Salter LA (2005) Gene tree distributions under the coalescent process. Evolution 59:24–37

  44. Liu L (2008) BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics 24(21):2542–2543

  45. Liu L, Yu L, Kubatko LS et al (2009) Coalescent methods for estimating phylogenetic trees. Mol Phylogen Evol 53:320–328

  46. Castillo-Ramirez S, Liu L, Pearl DK et al (2010) Bayesian estimation of species trees: a practical guide to optimal sampling and analysis. In: Knowles LL, Kubatko LS (eds) Estimating Species Trees: Practical and Theoretical Aspects. Wiley-Blackwell, Hoboken, NJ

  47. Gillespie JH (2004) Population Genetics: A Concise Guide, 2nd edn. The Johns Hopkins University Press, Baltimore, MD

  48. Wakeley J (2009) Coalescent Theory: An Introduction, Roberts & Co. Publishers, Greenwood Village, CO

  49. Hartl DL, Clark AG (2006) Principles of Population Genetics, 4th edn. Sinauer Associates, Inc., Sunderland, MA

  50. Wilson IJ, Weale ME, Balding DJ (2003) Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities. Journal of the Royal Statistical Society: Series A 166:155–158

  51. Maddison WP, Knowles LL (2006) Inferring phylogeny despite incomplete lineage sorting. Syst Biol 55:21–30

  52. Kubatko LS, Carstens BC, Knowles LL (2009) STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25(7):971–973

  53. O’Meara BC (2010) New Heuristic Methods for Joint Species Delimitation and Species Tree Inference. Syst Biol 59(1):59–73

  54. O’Meara BC (2008) Using trees: Myrmecocystus phylogeny and character evolution and new methods for investigating trait evolution and species delimitation

  55. Mossel E, Roch S (2007) Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci. [mss]

  56. Rannala B, Yang Z (2003) Bayes Estimation of Species Divergence Times and Ancestral Population Sizes Using DNA Sequences From Multiple Loci. Genetics 164:1645–1656

  57. Yang Z, Rannala B (2010) Bayesian species delimitation using multilocus sequence data. Proc Natl Acad Sci USA 107:9264–9269

  58. Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol 10:302

  59. Oliver JC (2008) AUGIST: inferring species trees while accommodating gene tree uncertainty. Bioinformatics 24:2932–2933

  60. Liu L, Pearl DK (2007) Species Trees from Gene Trees: Reconstructing Bayesian Posterior Distributions of a Species Phylogeny Using Estimated Gene Tree Distributions. Syst Biol 56(3):504–514

  61. Heled J, Drummond AJ (2010) Bayesian Inference of Species Trees from Multilocus Data. Mol Biol Evol 27:570–580

  62. Chung Y, Ané C (2011) Comparing Two Bayesian Methods for Gene Tree/Species Tree Reconstruction: Simulations with Incomplete Lineage Sorting and Horizontal Gene Transfer. Syst Biol 60:261–275

  63. Leaché AD, Rannala B The Accuracy of Species Tree Estimation under Simulation: A Comparison of Methods. Syst Biol

  64. Edwards SV, Liu L, Pearl DK (2007) High-resolution species trees without concatenation. Proc Natl Acad Sci USA 104:5936–5941

  65. Liu L, Edwards SV (2009) Phylogenetic Analysis in the Anomaly Zone. Syst Biol 58:452–460

  66. Huang H, He Q, Kubatko LS et al (2010) Sources of Error Inherent in Species-Tree Estimation: Impact of Mutational and Coalescent Effects on Accuracy and Implications for Choosing among Different Methods. Syst Biol 59(5):573–583

  67. Suzuki Y, Glazko GV, Nei M (2002) Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci USA 99:16138–16143

  68. Avise JC, Ball RM (1990) Principles of genealogical concordance in species concepts and biological taxonomy. Oxford Surveys in Evolutionary Biology 7:45–67

  69. He Y, Wu J, Dressman DC et al (2010) Heteroplasmic mitochondrial DNA mutations in normal and tumour cells. Nature 464:610–614

  70. Leaché AD (2009) Species Tree Discordance Traces to Phylogeographic Clade Boundaries in North American Fence Lizards (Sceloporus). Syst Biol 58:547–559

  71. De Queiroz K (2007) Species Concepts and Species Delimitation. Syst Biol 56:879–886

  72. Hudson RR, Coyne JA (2002) Mathematical consequences of the genealogical species concept. Evolution 56:1557–1565

  73. Tobias JA, Seddon N, Spottiswoode CN et al (2010) Quantitative criteria for species delimitation. Ibis 152(4):724–746

  74. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959

  75. Huelsenbeck JP, Andolfatto P (2007) Inference of Population Structure Under a Dirichlet Process Model. Genetics 175:1787–1802

  76. Leaché AD, Fujita MK (2010) Bayesian species delimitation in West African forest geckos (Hemidactylus fasciatus). Proc Roy Soc Lond B 277:3071–3077

  77. Knowles LL, Carstens BC (2007) Delimiting Species without Monophyletic Gene Trees. Syst Biol 56(6):887–895

  78. Carstens BC, Dewey TA (2010) Species Delimitation Using a Combined Coalescent and Information-Theoretic Approach: An Example from North American Myotis Bats. Syst Biol 59:400–414

  79. Wakeley J (2000) The effects of subdivision on the genetic divergence of populations and species. Evolution 54:1092–1101

  80. Eckert AJ, Carstens BC (2008) Does gene flow destroy phylogenetic signal? The performance of three methods for estimating species phylogenies in the presence of gene flow. Mol Phylogen Evol 49:832–842

  81. Doolittle WF, Bapteste E (2007) Pattern pluralism and the Tree of Life hypothesis. Proc Natl Acad Sci USA 104:2043–2049

  82. Boto L (2010) Horizontal gene transfer in evolution: facts and challenges. Proc Roy Soc Lond B 277:819–827

  83. Rivera MC, Lake JA (2004) The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature 431:152–155

  84. Kurland CG, Canback B, Berg OG (2003) Horizontal gene transfer: A critical view. Proc Natl Acad Sci USA 100:9658–9662

  85. Hodkinson TR, Parnell JAN (2006) Introduction to the Systematics of Species Rich Groups. In: Hodkinson TR, Parnell JAN (eds) Reconstructing the tree of life: taxonomy and systematics of species rich taxa. CRC Press, Boca Raton, FL

  86. Eisen JA (2000) Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Curr Opin Genet Dev 10:606–611

  87. Jain R, Rivera MC, Lake JA (1999) Horizontal gene transfer among genomes: The complexity hypothesis. Proc Natl Acad Sci USA 96:3801–3806

  88. Galtier N, Daubin V (2008) Dealing with incongruence in phylogenomic analyses. Philosophical Transactions of the Royal Society B: Biological Sciences 363:4023–4029

  89. Andersson JO (2005) Lateral gene transfer in eukaryotes. Cell Mol Life Sci 62:1182–1197

  90. Hotopp JCD, Clark ME, Oliveira DCSG et al (2007) Widespread Lateral Gene Transfer from Intracellular Bacteria to Multicellular Eukaryotes. Science 317:1753–1756

  91. Thomas J, Schaack S, Pritham EJ (2010) Pervasive Horizontal Transfer of Rolling-Circle Transposons among Animals. Genome Biology and Evolution 2:656–664

  92. Keeling PJ, Palmer JD (2008) Horizontal gene transfer in eukaryotic evolution. Nature Reviews Genetics 9:605–618

  93. Blair JE (2009) Animals: Metazoa. In: Hedges SB, Kumar S (eds) The Timetree of Life. Oxford University Press, New York

  94. Huang J, Gogarten JP (2006) Ancient horizontal gene transfer can benefit phylogenetic reconstruction. Trends Genet 22:361–366

  95. Linz S, Radtke A, von Haeseler A et al (2007) A Likelihood Framework to Measure Horizontal Gene Transfer. Mol Biol Evol 24:1312–1319

  96. Rasmussen MD, Kellis M (2007) Accurate gene-tree reconstruction by learning gene- and species-specific substitution rates across multiple complete genomes. Genome Res 17:1932–1942

  97. Rasmussen MD, Kellis M (2011) A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction. Mol Biol Evol 28:273–290

  98. Sanderson MJ, McMahon MM (2007) Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evol Biol 7:S1-S3

  99. Edwards SV (2009) Natural selection and phylogenetic analysis. Proc Natl Acad Sci USA 106:8799–8800

  100. Ray N, Excoffier L (2009) Inferring Past Demography Using Spatially Explicit Population Genetic Models. Human Biology 81:141–157

  101. Castoe TA, de Koning APJ, Kim H-M et al (2009) Evidence for an ancient adaptive episode of convergent molecular evolution. Proc Natl Acad Sci USA 106:8986–8991

  102. Swofford DL (1991) When are phylogeny estimates from molecular and morphological data incongruent? Pp. 295–333 In: Miyamoto MM, Cracraft J (eds) Phylogenetic analysis of DNA sequences. Oxford Univ. Press, New York

  103. Roettger M, Martin W, Dagan T (2009) A Machine-Learning Approach Reveals That Alignment Properties Alone Can Accurately Predict Inference of Lateral Gene Transfer from Discordant Phylogenies. Mol Biol Evol 26:1931–1939

  104. Beaumont MA, Balding DJ (2004) Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol 13:969–980

  105. Waddington CH (1942) Canalization of development and the inheritance of acquired characters. Nature 150:563–565

  106. Burke MK, Dunham JP, Shahrestani P et al (2010) Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590

  107. Medrano-Soto A, Moreno-Hagelsieb G, Vinuesa P et al (2004) Successful lateral transfer requires codon usage compatibility between foreign genes and recipient genomes. Mol Biol Evol 21:1884–1894

  108. Dufraigne C, Fertil B, Lespinats S et al (2005) Detection and characterization of horizontal transfers in prokaryotes using genomic signature. Nucleic Acids Res 33:e6

  109. Lockhart PJ, Steel MA, Hendy MD et al (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol 11:605–612

  110. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376

  111. Marjoram P, Molitor J, Plagnol V et al (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100:15324–15328

  112. Galtier N (2007) A Model of Horizontal Gene Transfer and the Bacterial Phylogeny Problem. Syst Biol 56:633–642

  113. Koslowski T, Zehender F (2005) Towards a quantitative understanding of horizontal gene transfer: A kinetic model. J Theor Biol 237:23–29

  114. Suchard MA (2005) Stochastic Models for Horizontal Gene Transfer: Taking a Random Walk Through Tree Space. Genetics 170:419–431

  115. Huson DH, Bryant D (2006) Application of Phylogenetic Networks in Evolutionary Studies. Mol Biol Evol 23:254–267

  116. Lake JA, Rivera MC (2004) Deriving the Genomic Tree of Life in the Presence of Horizontal Gene Transfer: Conditioned Reconstruction. Mol Biol Evol 21:681–690

  117. Ané C (2010) Reconstructing concordance trees and testing the coalescent model from genome-wide data sets. In: Knowles LL, Kubatko LS (eds) Estimating Species Trees: Practical and Theoretical Aspects. Wiley-Blackwell, Hoboken, NJ

  118. Excoffier L, Novembre J, Schneider S (2000) SIMCOAL: a general coalescent program for simulation of molecular data in interconnected populations with arbitrary demography. J Hered 91:506–509

  119. Anderson CNK, Ramakrishnan U, Chan YL et al (2005) Serial SimCoal: A population genetics model for data from multiple populations and points in time. Bioinformatics 21:1733–1734

  120. Schneider S, Roessli D, Excoffier L (2005) Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evolutionary Bioinformatics 1:47–50

  121. Liu L, Yu L (2010) Phybase: an R package for species tree analysis. Bioinformatics 26:962–963

  122. Kosiol C, Anisimova M (2012) Selection on the protein coding genome. In: Anisimova M (ed) Evolutionary genomics: statistical and computational methods (volume 2). Methods in Molecular Biology, Springer Science+Business Media New York


Author information

Corresponding author

Correspondence to Scott V. Edwards.

Appendices

Appendix A: Simulating Gene Trees in Species Trees

Many researchers have found it useful to simulate the evolution of genes over a species tree topology. Such simulations can be used to test mathematical models, to get a feel for the amount of divergence expected in real data, or (as described below) to rigorously compare the ability of alternative species histories to account for the data in hand. Bayesian Serial SimCoal (BayeSSC) (118, 119) simulates the amounts of isolation expected under drift and, in the context of Bayesian analysis, can be used to infer other parameters of the demographic processes occurring at scales finer than the species group. A simple example of how this can be done in BayeSSC is described below. The suite of tools available through Arlequin (120) and the R scripts in Phybase (121) can be used to further analyze the output of BayeSSC.

Although species trees can be simulated from a birth and death process using the R package TreeSim (http://cran.r-project.org/web/packages/TreeSim/index.html), researchers often adopt a fixed species tree on which to simulate gene trees. Imagine a species tree with ten individuals sampled from four species (with 4, 2, 3, and 1 representatives, respectively) and with known (or previously inferred) split times among taxa. In addition, we will assume for this example that the effective population size N_e of each contemporary species is 1,000, and that the size of each ancestral population is the sum of the sizes of its descendant populations. This situation is analogous to that depicted in Fig. 5. The corresponding NEXUS-formatted species tree is:

(D:1500,(C:800,(B:500,A:500):300):700);

Fig. 5. The species tree simulated in the Appendix. Branch lengths are in units of generations, and branch widths (population sizes) are in units of individuals. This particular tree has the constraint that ancestral population sizes are the sum of the population sizes of descendant lineages, but of course one can simulate without this constraint using either Serial SimCoal or Phybase.

Here, branch lengths are in units of generations, which is commensurate with using units of individuals for the population sizes (other simulation methods use τ = μt and θ = 4N_eμ, in units of substitutions per site, instead of t and N_e, respectively).
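The split times implied by the tree can be checked directly from its branch lengths. The following is a minimal Python sketch, not part of the original protocol; the function names (parse_newick, node_heights, to_mutation_units) are hypothetical, the tree is written without a thousands separator, and the per-site mutation rate of 1e-8 used in the unit conversion is an arbitrary illustrative value.

```python
import re

def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    punctuation = {"(", ")", ",", ":", ";"}
    pos = 0

    def node():
        nonlocal pos
        children, name, length = [], "", 0.0
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                      # consume ","
                children.append(node())
            pos += 1                          # consume ")"
        if tokens[pos] not in punctuation:    # optional node label
            name = tokens[pos]
            pos += 1
        if tokens[pos] == ":":                # optional branch length
            length = float(tokens[pos + 1])
            pos += 2
        return (name, length, children)

    return node()

def node_heights(node, heights):
    """Record the split time of every clade; raise if the tree is not ultrametric."""
    name, _, children = node
    if not children:
        return 0.0, [name]
    info = [(child[1],) + node_heights(child, heights) for child in children]
    tops = [length + height for length, height, _ in info]
    if max(tops) - min(tops) > 1e-9:
        raise ValueError("tree is not ultrametric")
    leaves = sorted(leaf for _, _, sub in info for leaf in sub)
    heights["".join(leaves)] = tops[0]
    return tops[0], leaves

def to_mutation_units(t, ne, mu):
    """Convert generations and N_e to tau = mu*t and theta = 4*N_e*mu."""
    return mu * t, 4 * ne * mu

tree = parse_newick("(D:1500,(C:800,(B:500,A:500):300):700);")
heights = {}
node_heights(tree, heights)
print(heights)   # {'AB': 500.0, 'ABC': 800.0, 'ABCD': 1500.0}
print(to_mutation_units(500, 1000, 1e-8))
```

The three split times recovered here (500, 800, and 1,500 generations) are the dates that the historical events of a SimCoal parameter file for this tree would need to carry.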

A simple simulation can be run in any version of SimCoal using the following .par file:

Species tree input file; 10 taxa, 4 sp
4 demes
Deme sizes (arbitrary in this case)
1000
1000
1000
1000
Number of samples per deme
4
2
3
1
Growth rates
0
0
0
0
Number of migration matrices
0
Historical event: Date from to %mig new_N new_r migmat
3 events
500 1 0 1 2.00 0 0
800 2 0 1 1.50 0 0
1500 3 0 1 1.33 0 0
Mutations per generation for the whole sequence
0.0001
Number of loci
10
Data type: DNA, RFLP, or MICROSAT
DNA
//Mutation rates: Gamma parameters, theta and k
0 0
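The size multipliers in the three historical events follow from the deme sizes. As a minimal Python sketch (not part of the original protocol), assuming that the new_N field is a multiplier applied to the current size of the sink deme, as the values 2.00, 1.50, and 1.33 in the file suggest, the event lines can be regenerated as follows. The function name historical_events and the variable names are hypothetical.

```python
def historical_events(deme_sizes, merges, sink=0):
    """Build event lines 'date from to %mig new_N new_r migmat' for demes fusing into `sink`.

    Assumes each ancestral population is the sum of its descendants and that
    new_N is expressed relative to the sink deme's size just before the event.
    """
    lines = []
    current = deme_sizes[sink]
    for time, source in merges:
        merged = current + deme_sizes[source]
        ratio = merged / current              # relative size change of the sink deme
        lines.append(f"{time} {source} {sink} 1 {ratio:.2f} 0 0")
        current = merged
    return lines

deme_sizes = [1000, 1000, 1000, 1000]         # demes 0..3 hold species A, B, C, D
merges = [(500, 1), (800, 2), (1500, 3)]      # (generations ago, deme fused into deme 0)
event_lines = historical_events(deme_sizes, merges)
for line in event_lines:
    print(line)
```

Under these assumptions the sink deme grows from 1,000 to 2,000, then to 3,000, and finally to 4,000 individuals, which reproduces the multipliers 2.00, 1.50, and 1.33 used in the file.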

In this case, the tree was perfectly ordered, so all populations could simply fuse with deme 0, with the population size readjusted each time. Of course, there is no need to assume that all populations have the same effective size, nor that the N_e of an ancestral population was the sum of the N_e values of its descendants. If we wished to infer the size of clade AB at the time of the split, for example, we could replace the 2.00 in the first historical event with, for example, {U:0.5,3.0}, which would allow the program to infer the posterior distribution of the N_e of clade AB over a range of 500 to 3,000 individuals. Similarly, if the mutation rate of the gene in question were unknown, or if a range of mutation rates better suited the purpose of the simulation, then the mutation rate constant, set in the example above at 0.0001, could be replaced with {E:0.0001}, creating an exponential distribution of mutation rates with a mean of 0.0001. Full documentation on the parameter files, and on Bayesian inference using priors instead of constants, can be found at the BayeSSC Web site: http://www.stanford.edu/group/hadlylab/ssc/.
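To get a feel for what the two priors mentioned above imply, one can draw from the corresponding distributions. This Python sketch only illustrates the distributions themselves and is not BayeSSC's own sampler; the seed, the numbers of draws, and the variable names are arbitrary choices made for illustration.

```python
import random

rng = random.Random(42)                # fixed seed so the sketch is repeatable
current_size = 1000                    # size of deme 0 just before the first event

# {U:0.5,3.0}: size multiplier drawn uniformly between 0.5 and 3.0
multipliers = [rng.uniform(0.5, 3.0) for _ in range(10000)]
ancestral_sizes = [m * current_size for m in multipliers]

# {E:0.0001}: exponential with mean 0.0001 (expovariate takes the rate, 1/mean)
mean_rate = 0.0001
mutation_rates = [rng.expovariate(1.0 / mean_rate) for _ in range(100000)]
sample_mean = sum(mutation_rates) / len(mutation_rates)

print(min(ancestral_sizes), max(ancestral_sizes))
print(sample_mean)
```

The uniform draws stay within 500 to 3,000 individuals, whereas the exponential prior has no upper bound: it puts most of its weight on rates below the mean while still allowing occasional much larger values.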

Note that the suite of Bayesian tools available at the Web site can be used to evaluate the relative support for different species tree topologies. For example, the correspondence between real data and the output from the parameter file above, with its perfectly ordered tree (((AB)C)D), can be compared mathematically with the correspondence obtained from a second file in which the tree is balanced, with, say, topology ((AB)(CD)) instead.
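To make concrete the process that such a parameter file describes, the following Python sketch simulates a single gene genealogy backward in time through the four demes of the example. It is a simplified continuous-time illustration under stated assumptions, not SimCoal's implementation: deme sizes are treated as numbers of gene copies, each pair of lineages in a deme coalesces at rate 1/N per generation, mutations are not simulated, and the names simulate_locus and leaf_names are hypothetical.

```python
import random

def simulate_locus(samples, sizes, events, rng):
    """Simulate one gene genealogy backward in time.

    samples: gene copies sampled per deme; sizes: deme sizes in gene copies;
    events: (time, source, sink, size_multiplier) tuples sorted by time.
    Returns (genealogy as nested tuples of sample labels, TMRCA in generations).
    """
    demes = [[f"{chr(65 + d)}{i + 1}" for i in range(n)] for d, n in enumerate(samples)]
    sizes = list(sizes)
    pending = list(events)
    now = 0.0
    while sum(len(d) for d in demes) > 1:
        # Each deme with k lineages has k(k-1)/2 pairs, each coalescing at rate 1/N.
        rates = [len(d) * (len(d) - 1) / (2.0 * n) for d, n in zip(demes, sizes)]
        total = sum(rates)
        if total == 0 and not pending:
            raise ValueError("lineages in separate demes can never coalesce")
        wait = rng.expovariate(total) if total > 0 else float("inf")
        if pending and now + wait >= pending[0][0]:
            # The next historical event comes first: fuse source into sink and
            # rescale the sink; memorylessness lets us discard the unused wait.
            now, source, sink, multiplier = pending.pop(0)
            demes[sink].extend(demes[source])
            demes[source] = []
            sizes[sink] *= multiplier
            continue
        now += wait
        d = rng.choices(range(len(demes)), weights=rates)[0]
        i, j = sorted(rng.sample(range(len(demes[d])), 2))
        right = demes[d].pop(j)               # remove the larger index first
        left = demes[d].pop(i)
        demes[d].append((left, right))        # the two lineages coalesce
    root = next(lineage for d in demes for lineage in d)
    return root, now

def leaf_names(tree):
    """Flatten a nested-tuple genealogy into its list of sample labels."""
    if isinstance(tree, str):
        return [tree]
    return [name for subtree in tree for name in leaf_names(subtree)]

# Events taken from the example file: (date, source deme, sink deme, new_N multiplier)
events = [(500, 1, 0, 2.00), (800, 2, 0, 1.50), (1500, 3, 0, 1.33)]
rng = random.Random(1)
genealogy, tmrca = simulate_locus([4, 2, 3, 1], [1000] * 4, events, rng)
print(genealogy)
print(tmrca)
```

Because the single copy sampled from species D cannot coalesce with anything until its deme fuses with deme 0, every simulated genealogy has a time to most recent common ancestor greater than 1,500 generations. Repeating the simulation over many loci, and again with the events rewritten for a balanced topology, gives the kind of simulated distributions against which real data can be compared.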

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Anderson, C.N.K., Liu, L., Pearl, D., Edwards, S.V. (2012). Tangled Trees: The Challenge of Inferring Species Trees from Coalescent and Noncoalescent Genes. In: Anisimova, M. (ed) Evolutionary Genomics. Methods in Molecular Biology, vol 856. Humana Press. https://doi.org/10.1007/978-1-61779-585-5_1

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-584-8

  • Online ISBN: 978-1-61779-585-5

  • eBook Packages: Springer Protocols
