Skip to main content

Phylogenomics

  • Protocol
  • First Online:
Comparative Genomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1704))

Abstract

Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. The abundance of genomic data for an enormous variety of organisms has enabled phylogenomic inference of many groups, and this has motivated the development of many computer programs implementing the associated methods. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

References

  1. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2013) Genbank. Nucleic Acids Res 41(D1):D36–D42

    Article  CAS  PubMed  Google Scholar 

  2. O’Brien SJ, Menotti-Raymond M, Murphy WJ, Nash WG, Wienberg J, Stanyon R, Copeland NG, Jenkins NA, Womack JE, Graves JAM (1999) The promise of comparative genomics in mammals. Science 286(5439):458–481

    Article  PubMed  Google Scholar 

  3. Delsuc F, Brinkmann H, Philippe H (2005) Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6(5):361–375

    Article  CAS  PubMed  Google Scholar 

  4. Eisen JA, Kaiser D, Myers RM (1997) Gastrogenomic delights: a movable feast. Nat Med 3(10):1076

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Eisen JA (1998) Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res 8(3):163–167

    Article  CAS  PubMed  Google Scholar 

  6. Fan H, Ives AR, Surget-Groba Y, Cannon CH (2015) An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data. BMC Genomics 16(1):522

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  7. Darling AE, Mau Bob, Perna NT (2010) progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PloS One 5(6):e11147

    Google Scholar 

  8. Minkin I, Patel A, Kolmogorov M, Vyahhi N, Pham S (2013) Sibelia: a scalable and comprehensive synteny block generation tool for closely related microbial genomes. In: International workshop on algorithms in bioinformatics. Springer, Berlin, pp 215–229

    Chapter  Google Scholar 

  9. Gardner SN, Slezak T, Hall BG (2015) kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome. Bioinformatics 31:2877–2878

    Article  CAS  PubMed  Google Scholar 

  10. Contreras-Moreira B, Vinuesa P (2013) Get_homologues, a versatile software package for scalable and robust microbial pangenome analysis. Appl Environ Microbiol 79(24):7696–7701

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Li L, Stoeckert CJ, Roos DS (2003) Orthomcl: identification of ortholog groups for eukaryotic genomes. Genome Res 13(9):2178–2189

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Kristensen DM, Kannan L, Coleman MK, Wolf YI, Sorokin A, Koonin EV, Mushegian A (2010) A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches. Bioinformatics 26(12):1481–1487

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Treangen TJ, Ondov BD, Koren S, Phillippy AM (2014) The harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol 15(11):524

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  14. Galtier N, Tourasse N, Gouy M (1999) A nonhyperthermophilic common ancestor to extant life forms. Science 283(5399):220–221

    Article  CAS  PubMed  Google Scholar 

  15. Bragg JG, Potter S, Bi K, Moritz C (2015) Exon capture phylogenomics: efficacy across scales of divergence. Mol Ecol Resour

    Google Scholar 

  16. Folk RA, Mandel JR, Freudenstein JV (2015) A protocol for targeted enrichment of intron-containing sequence markers for recent radiations: a phylogenomic example from heuchera (saxifragaceae). Appl Plant Sci 3(8):1500039

    Article  Google Scholar 

  17. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D (2004) Ultraconserved elements in the human genome. Science 304(5675):1321–1325

    Article  CAS  PubMed  Google Scholar 

  18. Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC (2012) Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Syst Biol 61:717–726

    Article  PubMed  Google Scholar 

  19. Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC (2012) More than 1000 ultraconserved elements provide evidence that turtles are the sister group of archosaurs. Biol Lett 8(5):783–786

    Article  PubMed  PubMed Central  Google Scholar 

  20. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S et al (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8):1034–1050

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Glazov EA, Pheasant M, McGraw EA, Bejerano G, Mattick JS (2005) Ultraconserved elements in insect genomes: a highly conserved intronic sequence implicated in the control of homothorax mrna splicing. Genome Res 15(6):800–808

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Zheng W-X, Zhang C-T (2008) Ultraconserved elements between the genomes of the plants arabidopsis thaliana and rice. J Biomol Struct Dyn 26(1):1–8

    Article  PubMed  Google Scholar 

  23. Smith BT, Harvey MG, Faircloth BC, Glenn TC, Brumfield RT (2013) Target capture and massively parallel sequencing of ultraconserved elements (uces) for comparative studies at shallow evolutionary time scales. Syst Biol 63:83–95

    Article  PubMed  Google Scholar 

  24. Faircloth BC (2015) PHYLUCE is a software package for the analysis of conserved genomic loci. Bioinformatics 32:786–788

    Article  PubMed  CAS  Google Scholar 

  25. Pearson T, Busch JD, Ravel J, Read TD, Rhoton SD, U’ren JM, Simonson TS, Kachur SM, Leadem RR, Cardon ML et al (2004) Phylogenetic discovery bias in bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing. Proc Natl Acad Sci USA 101(37):13536–13541

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Pearson T, Okinaka RT, Foster JT, Keim P (2009) Phylogenetic understanding of clonal populations in an era of whole genome sequencing. Infect Genet Evol 9(5):1010–1019

    Article  CAS  PubMed  Google Scholar 

  27. Leaché AD, Banbury BL, Felsenstein J, Nieto-Montes de Oca A, Stamatakis A (2015) Short tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring snp phylogenies. Syst Biol 64:1032–1047

    Article  PubMed  PubMed Central  Google Scholar 

  28. Lewis PO (2001) A likelihood approach to estimating phylogeny from discrete morphological character data. Syst Biol 50(6):913–925

    Article  CAS  PubMed  Google Scholar 

  29. Bertels F, Silander OK, Pachkov M, Rainey PB, van Nimwegen E (2014) Automated reconstruction of whole-genome phylogenies from short-sequence reads. Mol Biol Evol 31(5):1077–1088

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Stamatakis A (2014) Raxml version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ (2015) Iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32(1):268–274

    Article  CAS  PubMed  Google Scholar 

  32. Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP (2012) Mrbayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61(3):539–542

    Article  PubMed  PubMed Central  Google Scholar 

  33. Nielsen R, Paul JS, Albrechtsen A, Song YS (2011) Genotype and snp calling from next-generation sequencing data. Nat Rev Genet 12(6):443–451

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Rokas A, Holland PWH (2000) Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol 15(11):454–459

    Article  CAS  PubMed  Google Scholar 

  35. Boore JL, Lavrov DV, Brown WM (1998) Gene translocation links insects and crustaceans. Nature 392(6677):667

    Article  CAS  PubMed  Google Scholar 

  36. Regier JC, Shultz JW, Zwick A, Hussey A, Ball B, Wetzer R, Martin JW, Cunningham CW (2010) Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences. Nature 463(7284):1079–1083

    Article  CAS  PubMed  Google Scholar 

  37. Yue F, Cui L, Moret BME, Tang J et al (2008) Gene rearrangement analysis and ancestral order inference from chloroplast genomes with inverted repeat. BMC Genomics 9(1):S25

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  38. Hu F, Lin Y, Tang J (2014) Mlgo: phylogeny reconstruction and ancestral inference from gene-order data. BMC Bioinf 15(1):354

    Article  Google Scholar 

  39. Moret BME, Wyman S, Bader DA, Warnow T, Yan M (2001) A new implementation and detailed study of breakpoint analysis. In: Pacific symposium on biocomputing, vol 6, pp 583–594

    Google Scholar 

  40. Tang J, Moret BME (2003) Scaling up accurate phylogenetic reconstruction from gene-order data. Bioinformatics 19(suppl 1):i305–i312

    Article  PubMed  Google Scholar 

  41. Kang S, Tang J, Schaeffer SW, Bader DA (2011) Rec-DCM-Eigen: reconstructing a less parsimonious but more accurate tree in shorter time. PloS One 6(8):e22483

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Hilker R, Sickinger C, Pedersen CNS, Stoye J (2012) Unimog—a unifying framework for genomic distance calculation and sorting based on DCJ. Bioinformatics 28(19):2509–2511

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Hu F, Lin Y, Tang J (2014) MLGO: phylogeny reconstruction and ancestral inference from gene-order data. BMC Bioinf 15(354)

    Google Scholar 

  44. Mostowy S, Behr MA (2005) The origin and evolution of mycobacterium tuberculosis. Clin Chest Med 26(2):207–216

    Article  PubMed  Google Scholar 

  45. Belinky F, Cohen O, Huchon D (2010) Large-scale parsimony analysis of metazoan indels in protein-coding genes. Mol Biol Evol 27(2):441–451

    Google Scholar 

  46. Müller K (2005) Seqstate. Appl Bioinf 4(1):65–69

    Article  Google Scholar 

  47. Rosenfeld JA, Oppenheim S, DeSalle R (2017) A whole genome gene content phylogenetic analysis of anopheline mosquitoes. Mol Phylogenet Evol 107:266–269

    Article  CAS  PubMed  Google Scholar 

  48. Lake JA, Rivera MC (2004) Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction. Mol Biol Evol 21(4):681–690

    Article  CAS  PubMed  Google Scholar 

  49. Vos P, Hogers R, Bleeker M, Reijans M, Van de Lee T, Hornes M, Friters A, Pot J, Paleman J, Kuiper M et al (1995) Aflp: a new technique for dna fingerprinting. Nucleic Acids Res 23(21):4407–4414

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Koopman WJM, Wissemann V, De Cock K, Van Huylenbroeck J, De Riek J, Sabatino GJH, Visser D, Vosman B, Ritz CM, Maes B et al (2008) Aflp markers as a tool to reconstruct complex relationships: a case study in rosa (rosaceae). Am J Bot 95(3):353–366

    Article  CAS  PubMed  Google Scholar 

  51. Murata S, Takasaki N, Saitoh M, Okada N (1993) Determination of the phylogenetic relationships among pacific salmonids by using short interspersed elements (sines) as temporal landmarks of evolution. Proc Natl Acad Sci 90(15):6995–6999

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Verneau O, Catzeflis F, Furano AV (1998) Determining and dating recent rodent speciation events by using l1 (line-1) retrotransposons. Proc Natl Acad Sci 95(19):11284–11289

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Gibson A, Brown T, Baker L, Drobniewski F (2005) Can 15-locus mycobacterial interspersed repetitive unit-variable-number tandem repeat analysis provide insight into the evolution of mycobacterium tuberculosis? Appl Environ Microbiol 71(12):8207–8213

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Asher RJ (2007) A web-database of mammalian morphology and a reanalysis of placental phylogeny. BMC Evol Biol 7(1):108

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  55. Livezey BC, Zusi RL (2007) Higher-order phylogeny of modern birds (theropoda, aves: Neornithes) based on comparative anatomy. ii. analysis and discussion. Zool J Linnean Soc 149(1):1–95

    Article  Google Scholar 

  56. Murray GGR, Weinert LA, Rhule EL, Welch JJ (2016) The phylogeny of rickettsia using different evolutionary signatures: how tree-like is bacterial evolution? Syst Biol 65(2):265–279

    Article  PubMed  Google Scholar 

  57. Liu F-GR, Miyamoto MM, Freire NP, Ong PQ, Tennant MR, Young TS, Gugel KF (2001) Molecular and morphological supertrees for Eutherian (placental) mammals. Science 291(5509):1786–1789

    Article  CAS  PubMed  Google Scholar 

  58. Wheeler WC, Lucaroni N, Hong L, Crowley LM, Varón A (2015) Poy version 5: phylogenetic analysis using dynamic homologies under multiple optimality criteria. Cladistics 31(2):189–196

    Article  Google Scholar 

  59. Edgar RC (2004) Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Katoh K, Standley DM (2013) Mafft multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30(4):772–780

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Sela I, Ashkenazy H, Katoh K Pupko T (2015) Guidance2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res 43(W1):W7–W14

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Mirarab S, Nguyen N, Guo S, Wang L-S, Kim J, Warnow T (2015) Pasta: ultra-large multiple sequence alignment for nucleotide and amino-acid sequences. J Comput Biol 22(5):377–386

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Nguyen N-PD, Mirarab S, Kumar K, Warnow T (2015) Ultra-large alignments using phylogeny-aware profiles. Genome Biol 16(1):124

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  64. Eddy SR (2011) Accelerated profile HMM searches. PLoS Comput Biol 7(10):e1002195

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Larsson A (2014) Aliview: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30(22):3276–3278

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Farris JS (1970) Methods for computing Wagner trees. Syst Biol 19(1):83–92

    Article  Google Scholar 

  67. Camin JH, Sokal RR (1965) A method for deducing branching sequences in phylogeny. Evolution 311–326

    Google Scholar 

  68. Le Quesne WJ (1974) The uniquely evolved character concept and its cladistic application. Syst Biol 23(4):513–517

    Google Scholar 

  69. Farris JS (1977) Phylogenetic analysis under Dollo’s law. Syst Biol 26(1):77–88

    Article  Google Scholar 

  70. Platt RN, Zhang Y, Witherspoon DJ, Xing J, Suh A, Keith MS, Jorde LB, Stevens RD, Ray DA (2015) Targeted capture of phylogenetically informative ves sine insertions in genus Myotis. Genome Biol Evol 7(6):1664–1675

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Swofford DA, Olsen GJ (1990) Phylogeny reconstruction. In: Hillis DM, Moritz C (eds) Molecular systematics. Sinauer Associates, Sunderland, MA, pp 411–501

    Google Scholar 

  72. Sankoff D, Rousseau P (1975) Locating the vertices of a steiner tree in an arbitrary space. Math Program 9:240–246

    Article  Google Scholar 

  73. Goloboff PA, Farris JS, Nixon KC (2008) Tnt, a free program for phylogenetic analysis. Cladistics 24(5):774–786

    Article  Google Scholar 

  74. Müllner D (2011) fastcluster: Fast hierarchical clustering routines for R and Python

    Google Scholar 

  75. Khan MA, Elias I, Sjölund E, Nylander K, Guimera RV, Schobesberger E, Schmitzberger P, Lagergren J, Arvestad L (2013) Fastphylo: fast tools for phylogenetics. BMC Bioinf 14(1):334

    Article  Google Scholar 

  76. Criscuolo A, Gascuel O (2008) Fast NJ-like algorithms to deal with incomplete distance matrices. BMC Bioinf 9(1):166

    Article  CAS  Google Scholar 

  77. Lefort V, Desper R, Gascuel P(2015) Fastme 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol Biol Evol 32(10):2798–2800

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Felsenstein J (2016) {PHYLIP}: phylogenetic inference package, version 3.5 c

    Google Scholar 

  79. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ (2015) Iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32(1):268–274

    Article  CAS  PubMed  Google Scholar 

  80. Price MN, Dehal PS, Arkin AP (2010) Fasttree 2–approximately maximum-likelihood trees for large alignments. PloS One 5(3):e9490

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  81. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml 3.0. Syst Biol 59(3):307–321

    Article  CAS  PubMed  Google Scholar 

  82. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092

    Article  CAS  Google Scholar 

  83. Hastings WE (1970) Monte carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109

    Article  Google Scholar 

  84. Drummond AJ, Suchard MA, Xie D, Rambaut A (2012) Bayesian phylogenetics with beauti and the beast 1.7. Mol Biol Evol 29(8):1969–1973

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Lewis PO, Holder MT, Swofford DL (2015) Phycas: software for Bayesian phylogenetic analysis. Syst Biol 64(3):525–531

    Article  CAS  PubMed  Google Scholar 

  86. Felsenstein J (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 401–410

    Google Scholar 

  87. Kolaczkowski B, Thornton JW (2004) Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431(7011):980–984

    Google Scholar 

  88. Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F (2005) Heterotachy and long-branch attraction in phylogenetics. BMC Evol Biol 5(1):50

    Google Scholar 

  89. Gadagkar SR, Kumar S (2005) Maximum likelihood outperforms maximum parsimony even when evolutionary rates are heterotachous. Mol Biol Evol 22(11):2139–2141

    Article  CAS  PubMed  Google Scholar 

  90. Spencer M, Susko E, Roger AJ (2005) Likelihood, parsimony, and heterogeneous evolution. Mol Biol Evol 22(5):1161–1164

    Article  CAS  PubMed  Google Scholar 

  91. Ripplinger J, Sullivan J (2008) Does choice in model selection affect maximum likelihood analysis? Syst Biol 57(1):76–85

    Article  PubMed  Google Scholar 

  92. Warnow T (2012) Standard maximum likelihood analyses of alignments with gaps can be statistically inconsistent. PLOS Curr Tree Life 4:RRN1308

    Google Scholar 

  93. Simmons MP, Pickett KM, Miya M (2004) How meaningful are Bayesian support values? Mol Biol Evol 21(1):188–199

    Article  CAS  PubMed  Google Scholar 

  94. Rannala B, Zhu T, Yang Z (2012) Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference. Mol Biol Evol 29(1):325–335

    Article  CAS  PubMed  Google Scholar 

  95. Hendy MD, Penny D (1982) Branch and bound algorithms to determine minimal evolutionary trees. Math Biosci 59(2):277–290

    Article  Google Scholar 

  96. Nixon KC (1999) The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15(4):407–414

    Article  Google Scholar 

  97. Bazinet AL, Zwickl DJ, Cummings MP (2014) A gateway for phylogenetic analysis powered by grid computing featuring garli 2.0. Syst Biol 63(5):812–818

    Article  PubMed  PubMed Central  Google Scholar 

  98. Helaers R, Milinkovitch MC (2010) Metapiga v2. 0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics. BMC Bioinf 11(1):379

    Google Scholar 

  99. Goloboff PA (1999) Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15(4):415–428

    Article  Google Scholar 

  100. Roshan UW, Warnow T, Moret BME, Williams TL (2004) Rec-i-dcm3: a fast algorithmic technique for reconstructing phylogenetic trees. In: Proceedings of 2004 I.E. computational systems bioinformatics conference, 2004. CSB 2004. IEEE, New York, pp 98–109

    Google Scholar 

  101. Swofford DL (2003) Paup*. phylogenetic analysis using parsimony (* and other methods). version 4.

    Google Scholar 

  102. Yang Z (1994) Estimating the pattern of nucleotide substitution. J Mol Evol 39(1):105–111

    Article  PubMed  Google Scholar 

  103. Tavaré S (1986) Some probabilistic and statistical problems in the analysis of dna sequences. Lect Math Life Sci 17:57–86

    Google Scholar 

  104. Jukes TH, Cantor CR (1969) Evolution of protein molecules. Mamm Protein Metab 3(21):132

    Google Scholar 

  105. Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16(2):111–120

    Article  CAS  PubMed  Google Scholar 

  106. Hasegawa M, Kishino H, Yano T-A (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial dna. J Mol Evol 22(2):160–174

    Article  CAS  PubMed  Google Scholar 

  107. Yang Z (1996) Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol 11(9):367–372

    Article  CAS  PubMed  Google Scholar 

  108. Mayrose I, Friedman N, Pupko T (2005) A gamma mixture model better accounts for among site rate heterogeneity. Bioinformatics 21(suppl 2):ii151–ii158

    Google Scholar 

  109. Lartillot N, Philippe H (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21(6):1095–1109

    Article  CAS  PubMed  Google Scholar 

  110. Le SQ, Lartillot N, Gascuel O (2008) Phylogenetic mixture models for proteins. Philos Trans R Soc B 363(1512):3965–3976

    Article  CAS  Google Scholar 

  111. Felsenstein J, Churchill GA (1996) A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13(1):93–104

    Article  CAS  PubMed  Google Scholar 

  112. McGuire G, Wright F, Prentice MJ (2000) A Bayesian model for detecting past recombination events in dna multiple alignments. J Comput Biol 7(1–2):159–170

    Article  CAS  PubMed  Google Scholar 

  113. Boussau B, Guéguen L, Gouy M (2009) A mixture model and a hidden Markov model to simultaneously detect recombination breakpoints and reconstruct phylogenies. Evol Bioinf 5:67

    Article  CAS  Google Scholar 

  114. Lopez P, Casane D, Philippe H (2002) Heterotachy, an important process of protein evolution. Mol Biol Evol 19(1):1–7

    Article  CAS  PubMed  Google Scholar 

  115. Galtier N, Gouy M (1998) Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of dna sequence evolution for phylogenetic analysis. Mol Biol Evol 15(7):871–879

    Article  CAS  PubMed  Google Scholar 

  116. Schöniger M, Von Haeseler A (1994) A stochastic model for the evolution of autocorrelated dna sequences. Mol Phylogenet Evol 3(3):240–247

    Article  PubMed  Google Scholar 

  117. Muse SV (1995) Evolutionary analyses of dna sequences subject to constraints of secondary structure. Genetics 139(3):1429–1439

    CAS  PubMed  PubMed Central  Google Scholar 

  118. Rzhetsky A (1995) Estimating substitution rates in ribosomal RNA genes. Genetics 141(2):771–783

    CAS  PubMed  PubMed Central  Google Scholar 

  119. Savill NJ, Hoyle DC, Higgs PG (2001) Rna sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum-likelihood methods. Genetics 157(1):399–411

    CAS  PubMed  PubMed Central  Google Scholar 

  120. Renée E, Tillier M (1994) Maximum likelihood with multiparameter models of substitution. J Mol Evol 39(4):409–417

    Article  Google Scholar 

  121. Higgs PG (2000) RNA secondary structure: physical and computational aspects. Q Rev Biophys 33(3):199–253

    Article  CAS  PubMed  Google Scholar 

  122. Tillier ERM, Collins RA (1998) High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal rna. Genetics 148(4):1993–2002

    CAS  PubMed  PubMed Central  Google Scholar 

  123. Allen JE, Whelan S (2014) Assessing the state of substitution models describing noncoding RNA evolution. Genome Biol Evol 6(1):65–75

    Article  PubMed  PubMed Central  Google Scholar 

  124. Dayhoff MO, Schwartz RM, Orcutt BC (1978) 22 a model of evolutionary change in proteins. In: Atlas of protein sequence and structure, vol 5. National Biomedical Research Foundation, Silver Spring, MD, pp 345–352

    Google Scholar 

  125. Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci 89(22):10915–10919

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8(3):275–282

    CAS  PubMed  Google Scholar 

  127. Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18(5):691–699

    Article  CAS  PubMed  Google Scholar 

  128. Le SQ, Gascuel O (2008) An improved general amino acid replacement matrix. Mol Biol Evol 25(7):1307–1320

    Article  CAS  PubMed  Google Scholar 

  129. Yang Z, Nielsen R, Hasegawa M (1998) Models of amino acid substitution and applications to mitochondrial protein evolution. Mol Biol Evol 15(12):1600–1611

    Article  CAS  PubMed  Google Scholar 

  130. Dang CC, Le QS, Gascuel O, Le VS (2010) Flu, an amino acid substitution model for influenza proteins. BMC Evol Biol 10(1):99

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  131. Le SQ, Dang CC, Gascuel O (2012) Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol Biol Evol 29:2921–2936

    Article  CAS  PubMed  Google Scholar 

  132. Muse SV, Gaut BS (1994) A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol 11(5):715–724

    CAS  PubMed  Google Scholar 

  133. Goldman N, Yang Z (1994) A codon-based model of nucleotide substitution for protein-coding dna sequences. Mol Biol Evol 11(5):725–736

    CAS  PubMed  Google Scholar 

  134. Yang Z, Nielsen R (1998) Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J Mol Evol 46(4):409–418

    Article  CAS  PubMed  Google Scholar 

  135. Whelan S, Goldman N (2004) Estimating the frequency of events that cause multiple-nucleotide changes. Genetics 167(4):2027–2043

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  136. Kosiol C, Holmes I, Goldman N (2007) An empirical codon model for protein sequence evolution. Mol Biol Evol 24(7):1464–1479

    Article  CAS  PubMed  Google Scholar 

  137. Gil M, Zanetti MS, Zoller S, Anisimova M (2013) CodonPhyML: fast maximum likelihood phylogeny estimation under codon substitution models. Mol Biol Evol, page mst034

    Google Scholar 

  138. Wright AM, Hillis DM (2014) Bayesian analysis using a simple likelihood model outperforms parsimony for estimation of phylogeny from discrete morphological data. PLoS One 9(10):e109210

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  139. Ho SYW, Jermiin LS (2004) Tracing the decay of the historical signal in biological sequence data. Syst Biol 53(4):623–637

    Article  PubMed  Google Scholar 

  140. Lemmon AR, Moriarty EC (2004) The importance of proper model assumption in Bayesian phylogenetics. Syst Biol 53(2):265–277

    Article  PubMed  Google Scholar 

  141. Sullivan J, Swofford DL (1997) Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J Mamm Evol 4(2):77–86

    Article  Google Scholar 

  142. Sullivan J, Joyce P (2005) Model selection in phylogenetics. Annu Rev Ecol Evol Syst 36:445–466

    Article  Google Scholar 

  143. Posada D, Crandall KA (2001) Selecting the best-fit model of nucleotide substitution. Syst Biol 50(4):580–601

    Article  CAS  PubMed  Google Scholar 

  144. Abdo Z, Minin VN, Joyce P, Sullivan J (2005) Accounting for uncertainty in the tree topology has little effect on the decision-theoretic approach to model selection in phylogeny estimation. Mol Biol Evol 22(3):691–703

    Article  CAS  PubMed  Google Scholar 

  145. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

    Article  Google Scholar 

  146. Anderson DR, Burnham KP (2002) Avoiding pitfalls when using information-theoretic methods. J Wildl Manag 66:912–918

    Article  Google Scholar 

  147. Schwarz G et al (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

    Article  Google Scholar 

  148. Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90(430):773–795

    Article  Google Scholar 

  149. Minin V, Abdo Z, Joyce P, Sullivan J (2003) Performance-based selection of likelihood models for phylogeny estimation. Syst Biol 52(5):674–683

    Article  PubMed  Google Scholar 

  150. Darriba D, Taboada GL, Doallo R, Posada D (2012) jModelTest 2: more models, new heuristics and parallel computing. Nature methods 9(8):772–772

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  151. Posada D, Buckley TR (2004) Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 53(5):793–808

    Article  PubMed  Google Scholar 

  152. Hoff M, Orf S, Riehm B, Darriba D, Stamatakis A (2016) Does the choice of nucleotide substitution models matter topologically? BMC Bioinf 17(1):143

    Article  CAS  Google Scholar 

  153. Luo A, Qiao H, Zhang Y, Shi W, Ho SYW, Xu W, Zhang A, Zhu C (2010) Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets. BMC Evol Biol 10(1):242

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  154. Duchêne S, Duchêne DA, Di Giallonardo F, Eden J-S, Geoghegan JL, Holt KE, Ho SYW, Holmes EC (2016) Cross-validation to select Bayesian hierarchical models in phylogenetics. BMC Evol Biol 16(1):115

    Article  PubMed  PubMed Central  Google Scholar 

  155. Lartillot N, Brinkmann H, Philippe H (2007) Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 7(1):S4

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  156. Whelan S, Allen JE, Blackburne BP, Talavera D (2015) Modelomatic: fast and automated model selection between RY, nucleotide, amino acid, and codon substitution models. Syst Biol 64(1):42–55

    Article  PubMed  Google Scholar 

  157. Lartillot N, Philippe H (2006) Computing bayes factors using thermodynamic integration. Syst Biol 55(2):195–207

    Article  PubMed  Google Scholar 

  158. Baele G, Lemey P, Bedford T, Rambaut A, Suchard MA, Alekseyenko AV (2012) Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol Biol Evol 29(9):2157–2167

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  159. Fan Y, Wu R, Chen M-H, Kuo L, Lewis PO (2011) Choosing among partition models in Bayesian phylogenetics. Mol Biol Evol 28(1):523–532

    Article  CAS  PubMed  Google Scholar 

  160. Huelsenbeck JP, Larget B, Alfaro ME (2004) Bayesian phylogenetic model selection using reversible jump Markov chain monte carlo. Mol Biol Evol 21(6):1123–1133

    Article  CAS  PubMed  Google Scholar 

  161. Brandley MC, Schmitz A, Reeder TW (2005) Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards. Syst Biol 54(3):373–390

    Article  PubMed  Google Scholar 

  162. Li C, Lu G, Orti G (2008) Optimal data partitioning and a test case for ray-finned fishes (actinopterygii) based on ten nuclear loci. Syst Biol 57(4):519–539

    Article  PubMed  Google Scholar 

  163. Lanfear R, Calcott B, Ho SYW, Guindon S (2012) Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol 29(6):1695–1701

    Article  CAS  PubMed  Google Scholar 

  164. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5(2):R12

    Article  PubMed  PubMed Central  Google Scholar 

  165. Roure B, Rodriguez-Ezpeleta N, Philippe H (2007) SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol 7(1):S2

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  166. Wiens JJ (2003) Missing data, incomplete taxa, and phylogenetic accuracy. Syst Biol 52(4):528–538

    Article  PubMed  Google Scholar 

  167. Wiens JJ (2006) Missing data and the design of phylogenetic analyses. J Biomed Inform 39(1):34–42

    Article  CAS  PubMed  Google Scholar 

  168. Jeffroy O, Brinkmann H, Delsuc F, Philippe H (2006) Phylogenomics: the beginning of incongruence? Trends Genet 22(4):225–231

    Article  CAS  PubMed  Google Scholar 

  169. Simmons MP (2012) Misleading results of likelihood-based phylogenetic analyses in the presence of missing data. Cladistics 28(2):208–222

    Article  Google Scholar 

  170. Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM (2009) The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol 58(1):130–145

    Article  CAS  PubMed  Google Scholar 

  171. Foster PG (2004) Modeling compositional heterogeneity. Syst Biol 53(3):485–495

    Article  PubMed  Google Scholar 

  172. Kapralov MV, Filatov DA (2007) Widespread positive selection in the photosynthetic rubisco enzyme. BMC Evol Biol 7(1):73

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  173. Yang Z, Rannala B (2005) Branch-length prior influences Bayesian posterior probability of phylogeny. Syst Biol 54(3):455–470

    Article  PubMed  Google Scholar 

  174. Lewis PO, Holder MT, Holsinger KE (2005) Polytomies and Bayesian phylogenetic inference. Syst Biol 54(2):241–253

    Article  PubMed  Google Scholar 

  175. Aberer AJ, Stamatakis A (2011) A simple and accurate method for rogue taxon identification. In: 2011 I.E. international conference on bioinformatics and biomedicine (BIBM). IEEE, New York, pp 118–122

    Chapter  Google Scholar 

  176. Bergsten J (2005) A review of long-branch attraction. Cladistics 21(2):163–193

    Article  Google Scholar 

  177. Fourment M, Gibbs MJ (2006) Patristic: a program for calculating patristic distances and graphically comparing the components of genetic change. BMC Evol Biol 6(1):1

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  178. Xia X, Xie Z, Salemi M, Chen L, Wang Y (2003) An index of substitution saturation and its application. Mol Phylogenet Evol 26(1):1–7

    Article  CAS  PubMed  Google Scholar 

  179. Xia X, Xie Z (2001) DAMBE: software package for data analysis in molecular biology and evolution. J Hered 92(4):371–373

    Article  CAS  PubMed  Google Scholar 

  180. Goremykin VV, Nikiforova SV, Bininda-Emonds ORP (2010) Automated removal of noisy data in phylogenomic analyses. J Mol Evol 71(5-6):319–331

    Article  CAS  PubMed  Google Scholar 

  181. Cummins CA, McInerney JO (2011) A method for inferring the rate of evolution of homologous characters that can potentially improve phylogenetic inference, resolve deep divergence and correct systematic biases. Syst Biol 60(6):833–844

    Article  PubMed  Google Scholar 

  182. Simmons MP, Gatesy J (2016) Biases of tree-independent-character-subsampling methods. Mol Phylogenet Evol 100:424–443

    Article  PubMed  Google Scholar 

  183. Chang BSW, Campbell DL (2000) Bias in phylogenetic reconstruction of vertebrate rhodopsin sequences. Mol Biol Evol 17(8):1220–1231

    Article  CAS  PubMed  Google Scholar 

  184. Simmons MP, Zhang L-B, Webb CT, Reeves A (2006) How can third codon positions outperform first and second codon positions in phylogenetic inference? an empirical example from the seed plants. Syst Biol 55(2):245–258

    Article  PubMed  Google Scholar 

  185. Bradley RD, Durish ND, Rogers DS, Miller JR, Engstrom MD, Kilpatrick CW (2007) Toward a molecular phylogeny for Peromyscus: evidence from mitochondrial cytochrome-b sequences. J Mammal 88(5):1146–1159

    Article  PubMed  PubMed Central  Google Scholar 

  186. Cox CJ, Foster PG, Hirt RP, Harris SR, and Embley TM (2008) The archaebacterial origin of eukaryotes. Proc Natl Acad Sci 105(51):20356–20361

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  187. Benoit Nabholz, Axel Künstner, Rui Wang, Erich D Jarvis, and Hans Ellegren (2011) Dynamic evolution of base composition: causes and consequences in avian phylogenomics. Mol Biol Evol 28(8):2197–2210

    Google Scholar 

  188. Jermiin LS, Ho JWK, Lau KW, Jayaswal V (2009) SeqVis: a tool for detecting compositional heterogeneity among aligned nucleotide sequences. Bioinf DNA Seq Anal 65–91

    Google Scholar 

  189. Sheffield NC, Song H, Cameron SL, Whiting MF (2009) Nonstationary evolution and compositional heterogeneity in beetle mitochondrial phylogenomics. Syst Biol 58(4):381–394

    Article  PubMed  Google Scholar 

  190. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25(15):1972–1973

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  191. Aberer AJ, Krompaß D, Stamatakis A (2011) RogueNaRok: an efficient and exact algorithm for rogue taxon identification. Heidelberg Institute for Theoretical Studies: Exelixis-RRDR-2011–10

    Google Scholar 

  192. Trautwein MD, Wiegmann BM, Yeates DK (2011) Overcoming the effects of rogue taxa: evolutionary relationships of the bee flies. PLOS Currents Tree of Life

    Google Scholar 

  193. Aberer AJ, Krompass D, Stamatakis A (2013) Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice. Syst Biol 62(1):162–166

    Article  PubMed  Google Scholar 

  194. Pattengale N, Aberer A, Swenson K, Stamatakis A, Moret B (2011) Uncovering hidden phylogenetic consensus in large data sets. IEEE/ACM Trans Comput Biol Bioinf 8(4):902–911

    Article  Google Scholar 

  195. Heath TA, Hedtke SM, Hillis DM (2008) Taxon sampling and the accuracy of phylogenetic analyses. J Syst Evol 46(3):239–257

    Google Scholar 

  196. Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791

    Article  PubMed  Google Scholar 

  197. Minh BQ, Nguyen MAT, von Haeseler A (2013) Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol 30:1188–1195

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  198. Felsenstein J, Felenstein J (2004) Inferring phylogenies, vol 2. Sinauer Associates, Sunderland

    Google Scholar 

  199. Farris JS, Albert VA, Källersjö M, Lipscomb D, Kluge AG (1996) Parsimony jackknifing outperforms neighbor-joining. Cladistics 12(2):99–124

    Article  Google Scholar 

  200. Yang Y, Smith SA (2014) Orthology inference in nonmodel organisms using transcriptomes and low-coverage genomes: improving accuracy and matrix occupancy for phylogenomics. Mol Biol Evol 31(11):3081–3092

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  201. Chaudhary R, Fernández-Baca D, Burleigh JG (2014) Mulrf: a software package for phylogenetic analysis using multi-copy gene trees. Bioinformatics 31:432–433

    Article  PubMed  CAS  Google Scholar 

  202. Anisimova M, Gascuel O (2006) Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Syst Biol 55(4):539–552

    Article  PubMed  Google Scholar 

  203. Shimodaira H, Hasegawa M (1999) Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol 16:1114–1116

    Article  CAS  Google Scholar 

  204. Anisimova M, Gil M, Dufayard J-F, Dessimoz C, Gascuel O (2011) Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst Biol 60:681–699

    Article  Google Scholar 

  205. Salichos L, Stamatakis A, Rokas A (2014) Novel information theory-based measures for quantifying incongruence among phylogenetic trees. Mol Biol Evol 31:1261–1271

    Article  CAS  PubMed  Google Scholar 

  206. Kobert K, Salichos L, Rokas A, Stamatakis A (2016) Computing the internode certainty and related measures from partial gene trees. Mol Biol Evol 33:1606–1617

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  207. Bremer K et al. (1994) Branch support and tree stability. Cladistics 10(3):295–304

    Article  Google Scholar 

  208. Wilkinson M, Thorley JL, Upchurch P (2000) A chain is no stronger than its weakest link: double decay analysis of phylogenetic hypotheses. Syst Biol 49(4):754–776

    Article  CAS  PubMed  Google Scholar 

  209. Thorley JL, Page RDM (2000) RadCon: phylogenetic tree comparison and consensus. Bioinformatics 16(5):486–487

    Article  CAS  PubMed  Google Scholar 

  210. Geisler JH, McGowen MR, Yang G, Gatesy J (2011) A supermatrix analysis of genomic, morphological, and paleontological data from crown cetacea. BMC Evol Biol 11(1):112

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  211. Hillis DM, Bull JJ (1993) An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol 42(2):182–192

    Article  Google Scholar 

  212. Scannell DR, Byrne KP, Gordon JL, Wong S, Wolfe KH (2006) Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts. Nature 440(7082):341–345

    Article  CAS  PubMed  Google Scholar 

  213. Robinson DF, Foulds LR (1981) Comparison of phylogenetic trees. Math Biosci 53(1-2):131–147

    Article  Google Scholar 

  214. Williams WT, Clifford HT (1971) On the comparison of two classifications of the same set of elements. Taxon 519–522

    Google Scholar 

  215. Billera LJ, Holmes SP, Vogtmann K (2001) Geometry of the space of phylogenetic trees. Adv Appl Math 27(4):733–767

    Google Scholar 

  216. Owen M, Provan JS (2011) A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans Comput Biol Bioinf 8(1):2–13

    Google Scholar 

  217. Amenta N, Godwin M, Postarnakevich N, John KS (2007) Approximating geodesic tree distance. Information Processing Letters 103(2):61–65

    Google Scholar 

  218. Estabrook GF, McMorris FR, Meacham CA (1985) Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst Biol 34(2):193–200

    Article  Google Scholar 

  219. Critchlow DE, Pearl DK, Qian C (1996) The triples distance for rooted bifurcating phylogenetic trees. Syst Biol 45(3):323–334

    Article  Google Scholar 

  220. Gordon AD (1983) On the assessment and comparison of classifications. University of St. Andrews. Department of Statistics

    Google Scholar 

  221. Kuhner MK, Yamato J (2015) Practical performance of tree comparison metrics. Syst Biol 64(2):205–214

    Article  CAS  PubMed  Google Scholar 

  222. Gori K, Suchan T, Alvarez N, Goldman N, Dessimoz C (2016) Clustering genes of common evolutionary history. Mol Biol Evol 33:1590–1605

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  223. Templeton AR (1983) Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221–244

    Article  CAS  PubMed  Google Scholar 

  224. Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from dna sequence data, and the branching order in hominoidea. J Mol Evol 29(2):170–179

    Article  CAS  PubMed  Google Scholar 

  225. Susko E (2014) Tests for two trees using likelihood methods. Mol Biol Evol 31:1029–1039

    Article  CAS  PubMed  Google Scholar 

  226. Karin EL, Susko E, Pupko T (2014) Alignment errors strongly impact likelihood-based tests for comparing topologies. Mol Biol Evol 31(11):3057–3067

    Article  CAS  Google Scholar 

  227. Buckley TR (2002) Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Syst Biol 51(3):509–523

    Article  PubMed  Google Scholar 

  228. Shimodaira H (2002) An approximately unbiased test of phylogenetic tree selection. Syst Biol 51(3):492–508

    Article  PubMed  Google Scholar 

  229. Goldman N, Anderson JP, Rodrigo AG (2000) Likelihood-based tests of topologies in phylogenetics. Syst Biol 49(4):652–670

    Article  CAS  PubMed  Google Scholar 

  230. Strimmer K, Rambaut A (2002) Inferring confidence sets of possibly misspecified gene trees. Proc R Soc Lond B Biol Sci 269(1487):137–142

    Article  Google Scholar 

  231. Shimodaira H, Hasegawa M (2001) Consel: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17(12):1246–1247

    Article  CAS  PubMed  Google Scholar 

  232. Church SH, Ryan JF, Dunn CW (2015) Automation and evaluation of the SOWH test with SOWHAT. Syst Biol 64(6):1048–1058

    Article  PubMed  PubMed Central  Google Scholar 

  233. Madison WP (1997) Gene trees in species trees. Syst Biol 46(3):523–536

    Article  Google Scholar 

  234. Nakhleh L (2013) Computational approaches to species phylogeny inference and gene tree reconciliation. Trends Ecol Evol 28(12):719–728

    Article  PubMed  Google Scholar 

  235. Szöllősi GJ, Tannier E, Daubin V, Boussau B (2014) The inference of gene trees with species trees. Syst Biol 64:e42–e62

    Article  PubMed  PubMed Central  Google Scholar 

  236. Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24(6):332–340

    Article  PubMed  Google Scholar 

  237. Rannala B, Yang Z (2003) Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164(4):1645–1656

    CAS  PubMed  PubMed Central  Google Scholar 

  238. Arnold ML (1997) Natural hybridization and evolution. Oxford University Press, Oxford

    Google Scholar 

  239. Mallet J (2007) Hybrid speciation. Nature 446(7133):279

    Article  CAS  PubMed  Google Scholar 

  240. Lewis-Rogers N, Crandall KA, Posada D (2004) Evolutionary analyses of genetic recombination. Dyn Genet 408:49–78

    Google Scholar 

  241. Riley SPD, Shaffer HB, Voss SR, Fitzpatrick BM (2003) Hybridization between a rare, native tiger salamander (ambystoma californiense) and its introduced congener. Ecol. Appl.13(5):1263–1275

    Article  Google Scholar 

  242. Sheppard SK, Didelot X, Jolley KA, Darling AE, Pascoe B, Meric G, Kelly DJ, Cody A, Colles FM, Strachan NJC et al (2013) Progressive genome-wide introgression in agricultural campylobacter coli. Mol Ecol 22(4):1051–1064

    Article  CAS  PubMed  Google Scholar 

  243. Storfer A, Mech SG, Reudink MW, Ziemba RE, Warren J, Collins JP, Wood RM (2004) Evidence for introgression in the endangered sonora tiger salamander, ambystoma tigrinum stebbinsi (lowe). Copeia 2004(4):783–796

    Article  Google Scholar 

  244. Goloboff PA, Catalano SA, Mirande JM, Szumik CA, Arias JS, Källersjö M, Farris JS (2009) Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groups. Cladistics 25(3):211–230

    Article  Google Scholar 

  245. Sullivan GM, Feinn R (2012) Using effect size—or why the p value is not enough. J Grad Med Educ 4(3):279–282

    Article  PubMed  PubMed Central  Google Scholar 

  246. Rokas A, Carroll SB (2006) Bushes in the tree of life. PLoS Biol 4(11):e352

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  247. Phillips MJ, Delsuc F, Penny D (2004) Genome-scale phylogeny and the detection of systematic biases. Mol Biol Evol 21(7):1455–1458

    Article  CAS  PubMed  Google Scholar 

  248. Gatesy J, O’Grady P, Baker RH (1999) Corroboration among data sets in simultaneous analysis: hidden support for phylogenetic relationships among higher level artiodactyl taxa. Cladistics 15(3):271–313

    Article  Google Scholar 

  249. Mirarab S, Reaz R, Bayzid MS, Zimmermann T, Swenson MS, Warnow T (2014) Astral: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17):i541–i548

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  250. Degnan JH, Rosenberg NA (2006) Discordance of species trees with their most likely gene trees. PLoS Genet 2(5):e68

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  251. Warnow T (2011) Concatenation analyses in the presence of incomplete lineage sorting. PLoS Currents 7

    Google Scholar 

  252. Baum BR (1992) Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon 3–10

    Google Scholar 

  253. Ragan MA (1992) Phylogenetic inference based on matrix representation of trees. Mol Phylogenet Evol 1(1):53–58

    Google Scholar 

  254. Beck RMD, Bininda-Emonds ORP, Cardillo M, Liu F-GR, Purvis A (2006) A higher-level mrp supertree of placental mammals. BMC Evol Biol 6(1):93

    Google Scholar 

  255. Kupczok A, Schmidt HA, von Haeseler A (2010) Accuracy of phylogeny reconstruction methods combining overlapping gene data sets. Algorithms Mol Biol 5(1):37

    Google Scholar 

  256. Swenson MS, Suri R, Linder CR, Warnow T (2011) An experimental study of quartets maxcut and other supertree methods. Algorithms Mol. Biol. 6(1):7

    Google Scholar 

  257. Swenson MS, Suri R, Linder CR, Warnow T (2012) Superfine: fast and accurate supertree estimation. Syst Biol 61(2):214–227

    Google Scholar 

  258. Nguyen N, Mirarab S, Warnow T (2012) MRL and SuperFine+ MRL: new supertree methods. Algorithms for Molecular Biology 7(1):3

    Google Scholar 

  259. Creevey CJ, McInerney JO (2005) Clann: investigating phylogenetic information through supertree analyses. Bioinformatics 21(3):390–392

    Google Scholar 

  260. Scornavacca C, Berry V, Lefort V, Douzery EJP, Ranwez V (2008) Physic_ist: cleaning source trees to infer more informative supertrees. BMC Bioinf 9(1):413

    Google Scholar 

  261. Binet M, Gascuel O, Scornavacca C, Douzery EJP, Pardi F (2016) Fast and accurate branch lengths estimation for phylogenomic trees. BMC Bioinf 17(1):23

    Article  CAS  Google Scholar 

  262. Vachaspati P, Warnow T (2016) FastRFs: fast and accurate Robinson-Foulds supertrees using constrained exact optimization. Bioinformatics 33:631–639

    Google Scholar 

  263. Edwards SV, Xi Z, Janke A, Faircloth BC, McCormack JE, Glenn TC, Zhong B, Wu S, Lemmon EM, Lemmon AR et al (2016) Implementing and testing the multispecies coalescent model: a valuable paradigm for phylogenomics. Mol Phylogenet Evol 94:447–462

    Article  PubMed  Google Scholar 

  264. Bayzid SM, Warnow T (2012) Estimating optimal species trees from incomplete gene trees under deep coalescence. J Comput Biol 19(6):591–605

    Article  CAS  PubMed  Google Scholar 

  265. Davis KE, Page RD (2014) Reweaving the tapestry: a supertree of birds. PLoS Curr 6. https://doi.org/10.1371/currents.tol.c1af68dda7c999ed9f1e4b2d2df7a08e

  266. Chaudhary R, Bansal MS, Wehe A, Fernández-Baca D, Eulenstein O (2010) iGTP: a software package for large-scale gene tree parsimony analysis. BMC Bioinf 11(1):574

    Article  Google Scholar 

  267. Yu Y, Dong J, Liu KJ, Nakhleh L (2014) Maximum likelihood inference of reticulate evolutionary histories. Proc Natl Acad Sci 111(46):16448–16453

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  268. Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C-H, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) Beast 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10(4):e1003537

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  269. Edwards SV, Liu L, Pearl DK (2007) High-resolution species trees without concatenation. Proc Natl Acad Sci 104(14):5936–5941

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  270. Mossel E, Roch S (2010) Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. IEEE/ACM Trans Comput Biol Bioinf 7(1):166–171

    Article  CAS  Google Scholar 

  271. Liu L, Yu L, Pearl DK, Edwards SV (2009) Estimating species phylogenies using coalescence times among sequences. Syst Biol 58(5):468–477

    Article  CAS  PubMed  Google Scholar 

  272. Liu L, Yu L, Kubatko L, Pearl DK, Edwards SV (2009) Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol 53(1):320–328

    Article  CAS  PubMed  Google Scholar 

  273. Kubatko LS, Carstens BC, Knowles LL (2009) Stem: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25(7):971–973

    Article  CAS  PubMed  Google Scholar 

  274. Ané C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of concordance among gene trees. Mol Biol Evol 24(2):412–426

    Article  PubMed  CAS  Google Scholar 

  275. Larget BR, Kotha SK, Dewey CN, Ané C (2010) Bucky: gene tree/species tree reconciliation with Bayesian concordance analysis. Bioinformatics 26(22):2910–2911

    Article  CAS  PubMed  Google Scholar 

  276. Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol 10(1):302

    Article  PubMed  PubMed Central  Google Scholar 

  277. Mirarab S, Warnow T (2015) ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12):i44–i52

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  278. Vachaspati P, Warnow T (2015) Astrid: accurate species trees from internode distances. BMC Genomics 16(10):S3

    Article  PubMed  PubMed Central  Google Scholar 

  279. Zimmermann T, Mirarab S, Warnow T (2014) Bbca: improving the scalability of* beast using random binning. BMC Genomics 15(6):S11

    Article  PubMed  PubMed Central  Google Scholar 

  280. Bryant D, Bouckaert R, Felsenstein J, Rosenberg NA, RoyChoudhury A (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol Biol Evol 29(8):1917–1932

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  281. Chifman J, Kubatko L (2014) Quartet inference from SNP data under the coalescent model. Bioinformatics 30(23):3317–3324

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  282. Chifman J, Kubatko L (2015) Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, site-specific rate variation, and invariable sites. J Theor Biol 374:35–47

    Article  PubMed  Google Scholar 

  283. Degnan JH, DeGiorgio M, Bryant D, Rosenberg NA (2009) Properties of consensus methods for inferring species trees from gene trees. Syst Biol 58(1):35–54

    Article  PubMed  PubMed Central  Google Scholar 

  284. Allman ES, Degnan JH, Rhodes JA (2011) Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. Journal of mathematical biology 62(6):833–862

    Article  PubMed  Google Scholar 

  285. Lefort V, Desper R, Gascuel O (2015) FastME 2.0: a comprehensive, accurate and fast distance-based phylogeny inference program. Mol Biol Evol 32(10):2798–2800

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  286. Springer MS, Gatesy J (2016) The gene tree delusion. Mol Phylogenet Evol 94:1–33

    Article  PubMed  Google Scholar 

  287. Bayzid MS, Mirarab S, Warnow TJ (2013) Inferring optimal species trees under gene duplication and loss. In: Pacific symposium on biocomputing, vol 18, pp 250–261

    Google Scholar 

  288. Boussau B, Szöllősi GJ, Duret L, Gouy M, Tannier E, Daubin V (2013) Genome-scale coestimation of species and gene trees. Genome Res 23(2):323–330

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  289. Lang JM, Darling AE, Eisen JA (2013) Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PloS One 8(4):e62510

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  290. Pride DT, Meinersmann RJ, Wassenaar TM, Blaser MJ (2003) Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res 13(2):145–158

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  291. Davidson R, Vachaspati P, Mirarab S, Warnow T (2015) Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC Genomics 16(10):S1

    Article  PubMed  PubMed Central  Google Scholar 

  292. Tonini J, Moore A, Stern D, Shcheglovitova M, Ortí G (2015) Concatenation and species tree methods exhibit statistically indistinguishable accuracy under a range of simulated conditions. PLOS Curr Tree Life

    Google Scholar 

  293. Daubin V, Gouy M, Perriere G (2002) A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Res 12(7):1080–1090

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  294. Bevan RB, Lang BF, Bryant D (2005) Calculating the evolutionary rates of different genes: a fast, accurate estimator with applications to maximum likelihood phylogenetic analysis. Syst Biol 54(6):900–915

    Article  PubMed  Google Scholar 

  295. Manthey JD, Campillo LC, Burns KJ, Moyle RG (2016) Comparison of target-capture and restriction-site associated dna sequencing for phylogenomics: a test in cardinalid tanagers (aves, genus: Piranga). Syst Biol 65:640–650

    Article  PubMed  Google Scholar 

  296. de Vienne DM, Ollier S, Aguileta G (2012) Phylo-MCOA: a fast and efficient method to detect outlier genes and species in phylogenomics using multiple co-inertia analysis. Mol Biol Evol 29(6):1587–1598

    Article  PubMed  CAS  Google Scholar 

  297. Mirarab S, Bayzid MS, Boussau B, Warnow T (2014) Statistical binning improves species tree estimation in the presence of gene tree incongruence. Science 346:1250463

    Article  PubMed  CAS  Google Scholar 

  298. Bayzid MS, Mirarab S, Boussau B, Warnow T (2015) Weighted statistical binning: enabling statistically consistent genome-scale phylogenetic analyses. PLoS One 10(6):e0129183

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  299. Narechania A, Baker RH, Sit R, Kolokotronis S-O, DeSalle R, Planet PJ (2012) Random addition concatenation analysis: a novel approach to the exploration of phylogenomic signal reveals strong agreement between core and shell genomic partitions in the cyanobacteria. Genome Biol Evol 4(1):30–43

    Article  CAS  PubMed  Google Scholar 

  300. Edwards SV (2016) Phylogenomic subsampling: a brief review. Zool Scr 45(S1):63–74

    Article  Google Scholar 

  301. Simmons MP, Sloan DB, Gatesy J (2016) The effects of subsampling gene trees on coalescent methods applied to ancient divergences. Mol Phylogenet Evol 97:76–89

    Article  PubMed  Google Scholar 

  302. Strimmer K, Von Haeseler A (1997) Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci 94(13):6815–6819

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  303. Dell’Ampio E, Meusemann K, Szucsich NU, Peters RS, Meyer B, Borner J, Petersen M, Aberer AJ, Stamatakis A, Walzl MG et al (2014) Decisive data sets in phylogenomics: lessons from studies on the phylogenetic relationships of primarily wingless insects. Mol Biol Evol 31(1):239–249

    Article  PubMed  CAS  Google Scholar 

  304. Arcila D, Ortí G, Vari R, Armbruster JW, Stiassny MLJ, Ko KD, Sabaj MH, Lundberg J, Revell LJ, Betancur-R R (2017) Genome-wide interrogation advances resolution of recalcitrant groups in the tree of life. Nat Ecol Evol 1:0020

    Article  Google Scholar 

  305. Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23(2):254–267

    Article  CAS  PubMed  Google Scholar 

  306. Bryant D, Moulton V (2004) Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol 21(2):255–265

    Article  CAS  PubMed  Google Scholar 

  307. Boc A, Makarenkov V et al (2012) T-rex: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucleic Acids Res 40(W1):W573–W579

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  308. Legendre P, Makarenkov V (2002) Reconstruction of biogeographic and evolutionary networks using reticulograms. Syst Biol 51(2):199–216

    Article  PubMed  Google Scholar 

  309. Solís-Lemus C, Ané C (2016) Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet 12(3):e1005896

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  310. Hejase HA, Liu KJ (2016) A scalability study of phylogenetic network inference methods using empirical datasets and simulations involving a single reticulation. BMC Bioinf 17(1):422

    Article  Google Scholar 

  311. Didelot X, Falush D (2007) Inference of bacterial microevolution using multilocus sequence data. Genetics 175(3):1251–1266

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  312. Didelot X, Lawson D, Darling A, Falush D (2010) Inference of homologous recombination in bacteria using whole-genome sequences. Genetics 186(4):1435–1449

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  313. Wollenberg MS, Ruby EG (2012) Phylogeny and fitness of Vibrio fischeri from the light organs of euprymna scolopes in two Oahu, Hawaii populations. ISME J 6(2):352–362

    Article  CAS  PubMed  Google Scholar 

  314. Suh A (2016) The phylogenomic forest of bird trees contains a hard polytomy at the root of neoaves. Zool Scr 45(S1):50–62

    Article  Google Scholar 

  315. Contreras-Moreira B, Vinuesa P. Get_homologues, a versatile software package for scalable and robust microbial pangenome analysis. Appl Environ Microbiol 79(24):7696–7701 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  316. Li L, Stoeckert CJ, Roos DS (2003) Orthomcl: identification of ortholog groups for eukaryotic genomes. Genome Res 13(9):2178–2189

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  317. Penn O, Privman E, Landan G, Graur D, Pupko T (2010) An alignment confidence score capturing robustness to guide tree uncertainty. Mol Biol Evol 27(8):1759–1767

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  318. Gupta RS (1998) Protein phylogenies and signature sequences: a reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes. Microbiol Mol Biol Rev 62(4):1435–1491

    CAS  PubMed  PubMed Central  Google Scholar 

  319. Ajawatanawong P, Baldauf SL (2013) Evolution of protein indels in plants, animals and fungi. BMC Evol Biol 13(1):1

    Article  CAS  Google Scholar 

  320. Rodriguez-R LM, Grajales A, Arrieta-Ortiz ML, Salazar C, Restrepo S, Bernal A (2012) Genomes-based phylogeny of the genus Xanthomonas. BMC Microbiol 12(1):1

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to João C. Setubal .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Science+Business Media LLC

About this protocol

Check for updates. Verify currency and authenticity via CrossMark

Cite this protocol

Patané, J.S.L., Martins, J., Setubal, J.C. (2018). Phylogenomics. In: Setubal, J., Stoye, J., Stadler, P. (eds) Comparative Genomics. Methods in Molecular Biology, vol 1704. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7463-4_5

Download citation

  • DOI: https://doi.org/10.1007/978-1-4939-7463-4_5

  • Published:

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7461-0

  • Online ISBN: 978-1-4939-7463-4

  • eBook Packages: Springer Protocols

Publish with us

Policies and ethics