Inferring Trees

Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1525)

Abstract

Molecular evolution can reveal the relationships between sets of homologous sequences and the patterns of change that occur during their evolution. An important aspect of these studies is the inference of a phylogenetic tree, which explicitly describes the evolutionary relationships between homologous sequences. This chapter provides an introduction to evolutionary trees and how to infer them from sequence data using commonly used inferential methods. It focuses on statistical methods for inferring trees and on how to assess the confidence one should have in any resulting tree, with particular emphasis on the underlying assumptions of the methods and how they might affect the tree estimate. There is also some discussion of the algorithms used to perform tree search, with recommendations regarding the performance of different algorithms. Finally, there are a few practical guidelines, including how to combine multiple software packages to improve inference, and a comparison between Bayesian and maximum likelihood phylogenetics.

Acknowledgments

SW is funded by Uppsala University. DAM is funded by Akademikernas A-kassa and Trygghetsstiftelsen.

Author information

Corresponding author

Correspondence to Simon Whelan.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Whelan, S., Morrison, D.A. (2017). Inferring Trees. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1525. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6622-6_14

  • DOI: https://doi.org/10.1007/978-1-4939-6622-6_14

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6620-2

  • Online ISBN: 978-1-4939-6622-6

  • eBook Packages: Springer Protocols
