Working with the Tree of Life in Comparative Studies: How to Build and Tailor Phylogenies to Interspecific Datasets

  • László Zsolt Garamszegi
  • Alejandro Gonzalez-Voyer


All comparative analyses rely on at least one phylogenetic hypothesis. However, reconstructing the evolutionary history of species is not the primary aim of these studies, and a well-resolved phylogeny that fully matches the interspecific trait data at hand is rarely available. Phylogenetic information therefore usually needs to be combined across sources that often rely on different approaches and different markers for the phylogenetic reconstruction. Building hypotheses about the evolutionary history of species is a challenging task, as it requires knowledge of the underlying methodology and an ability to manipulate data flexibly in diverse formats. Although most practitioners are not experts in phylogenetics, the appropriate handling of phylogenetic information is crucial for making evolutionary inferences in a comparative study, because the reliability of the results depends on the underlying phylogeny. In this chapter, we provide an overview of how to interpret and combine phylogenetic information from different sources, and review the various tree-tailoring techniques, touching upon issues that are crucial for understanding other chapters in this book. We conclude that whichever method is used to generate trees, the phylogenetic hypotheses will always include some uncertainty that should be taken into account in a comparative study.



Additive tree/phylogeny

A phylogeny is termed additive when the tips are not all equidistant from the root. In an additive phylogeny, branch lengths represent the number of expected substitutions; differences among taxa in the rate of molecular evolution therefore lead to differences in branch lengths.


Branch

A continuous line that connects two nodes or a node to a tip in the phylogeny.

Branch length

Represents the “distance” between the two nodes or the node and tip connected by the branch. The “distance” can be measured in number of evolutionary transitions (if the phylogeny is reconstructed using maximum parsimony methods), number of expected substitutions, which is an estimate of the rate of molecular evolution, or divergence times.

Gene duplication

When a second copy of an existing gene emerges within a single genome. Gene duplication is a major mechanism by which new genetic material is generated.


Homology

Shared similarity between taxa that is due to inheritance from a common ancestor.


Homoplasy

Similarity between taxa that results from convergent evolution, for example due to similar selection pressures.

Horizontal gene transfer

The transfer of genetic material between individuals of different species that is not the result of inheritance from a common ancestor.


Hybridization

Mating between individuals of two distinct species of plants or animals resulting in viable offspring.

Incomplete lineage sorting

Occurs when coalescence times of alleles are within the time span of speciation events or shorter. Incomplete lineage sorting results in gene genealogies that are not concordant with the species phylogeny.


Nodes

Represent the putative ancestors of the taxa represented in the phylogeny.

Orthologous genes

Genes originating from a common ancestor (i.e. homologous genes) that have undergone independent evolution following a speciation event.

Parallel or convergent evolution

Evolution of phenotypes or sequences under similar selective regimes leading to higher similarities than would be expected based on the degree of shared ancestry.

Paralogous genes

Genes originating from a duplication event recent enough to reveal their common ancestry.


Polytomy

When more than two branches originate from a single node in the phylogeny. Polytomies reflect uncertainty in the timing of speciation events, either because there are insufficient data to determine the order of events with confidence (so-called “soft polytomies”) or because the speciation events were so rapid that there was insufficient time for the substitutions needed to discriminate between their timings to accumulate (so-called “hard polytomies”).
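
As a minimal illustration of this definition, the sketch below counts nodes with more than two descendant branches. It assumes a hypothetical toy encoding that is not taken from the chapter: a tip is a string and an internal node is a tuple of its children.

```python
# Hypothetical toy encoding (an assumption for illustration only):
# a tip is a string, an internal node is a tuple of its children.

def count_polytomies(node):
    """Count internal nodes from which more than two branches originate."""
    if isinstance(node, str):  # a tip has no descendant branches
        return 0
    here = 1 if len(node) > 2 else 0
    return here + sum(count_polytomies(child) for child in node)


fully_resolved = (("A", "B"), ("C", "D"))  # every node is bifurcating
with_polytomy = (("A", "B", "C"), "D")     # A, B and C leave the same node

print(count_polytomies(fully_resolved))  # 0
print(count_polytomies(with_polytomy))   # 1
```

Topology alone cannot tell a soft polytomy from a hard one; that distinction depends on why the node is unresolved, not on how it is drawn.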


Root

Represents the most recent common ancestor of all the tips (taxa) in the phylogeny. All branches of the phylogeny lead to the root and the root connects all nodes.


Saturation

Occurs when two aligned, presumably orthologous, sequences have accumulated so many repeated substitutions that the observed differences provide a poor estimate of their time of divergence. Saturation occurs because the probability of reverse mutations (changes back to a nucleotide present in the past) increases with time since divergence, and hence the apparent differences between orthologous sequences become lower than expected based on the time of divergence.
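
The plateau described above can be sketched numerically. The example below uses the Jukes-Cantor (JC69) substitution model, which is an assumption introduced here for illustration and not a model named in this glossary. Under JC69, the expected proportion of sites that differ between two sequences is p = 3/4 (1 - exp(-4d/3)), where d is the actual number of substitutions per site separating them.

```python
import math


def jc69_observed_difference(d):
    """Expected proportion of differing sites under JC69.

    d is the actual number of substitutions per site separating two sequences.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# As d grows, the observed difference lags further behind d and levels off
# below 0.75, because repeated and reverse changes hide earlier substitutions.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = jc69_observed_difference(d)
    print(f"actual substitutions/site = {d:3.1f}   observed difference = {p:.3f}")
```

Once the observed difference approaches this ceiling, very different divergence times produce nearly the same apparent distance, which is why saturated markers are poor estimators of deep divergences.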

Substitution rate

Also referred to as the rate of molecular evolution, it is the rate at which organisms accumulate genetic differences over time. It is usually calculated as the number of substitutions per site per unit time. Synonymous and non-synonymous substitutions can be distinguished depending on whether changes in the nucleotide sequence leave the translated amino acid sequence unchanged or alter it, respectively.
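
A worked example may make the units concrete. The sketch below assumes the common pairwise convention in which the distance between two sequences is divided by twice the divergence time, because both lineages have accumulated changes since they split. The function name and all numbers are hypothetical, and the calculation ignores multiple substitutions at the same site.

```python
def substitution_rate(n_differences, n_sites, divergence_time):
    """Per-lineage rate in substitutions per site per unit of time.

    Assumes a simple pairwise comparison with no correction for multiple hits.
    """
    distance = n_differences / n_sites        # substitutions per site
    return distance / (2.0 * divergence_time)  # two lineages since the split


# Hypothetical values: 30 differences over 1000 aligned sites,
# with the two species having diverged 5 million years ago.
rate = substitution_rate(n_differences=30, n_sites=1000, divergence_time=5.0)
print(f"{rate:.4f} substitutions per site per million years")
```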


Tips

Also called leaves (following the tree analogy for phylogenies), they are the taxa whose relationships are being estimated with the phylogeny.

Ultrametric tree/phylogeny

A phylogeny is termed ultrametric when all the tips are equidistant from the root. In other words, the distance between any two species in the tree is the same as long as the path between them crosses the root of the tree. In ultrametric trees the branch lengths usually represent divergence times. Ultrametric trees can also be estimated under the assumption of a constant rate of substitution that is the same for all taxa, also called a molecular clock. However, recent studies of diverse species have called the molecular clock into question, showing that the rate of molecular evolution varies even among closely related species and is correlated with species-specific traits and even environmental variables.
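
The contrast between ultrametric and additive trees can be checked mechanically by summing branch lengths from the root to each tip. The sketch below is a minimal illustration under a hypothetical toy encoding that is not taken from the chapter: a tip is a string and an internal node is a list of (child, branch_length) pairs. The branch lengths are invented.

```python
import math

# Hypothetical toy encoding (an assumption for illustration only):
# a tip is a string, an internal node is a list of (child, branch_length) pairs.


def root_to_tip_distances(node, dist_from_root=0.0):
    """Return {tip_name: summed branch length from the root to that tip}."""
    if isinstance(node, str):
        return {node: dist_from_root}
    distances = {}
    for child, branch_length in node:
        distances.update(
            root_to_tip_distances(child, dist_from_root + branch_length)
        )
    return distances


def is_ultrametric(tree, rel_tol=1e-9):
    """True when every tip is the same distance from the root."""
    depths = list(root_to_tip_distances(tree).values())
    return all(math.isclose(d, depths[0], rel_tol=rel_tol) for d in depths)


# ((A:1,B:1):2,C:3): all tips are 3 units from the root.
clock_like_tree = [([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)]
# ((A:0.5,B:1.5):2,C:3): tips differ in their distance from the root.
additive_tree = [([("A", 0.5), ("B", 1.5)], 2.0), ("C", 3.0)]

print(is_ultrametric(clock_like_tree))  # True
print(is_ultrametric(additive_tree))    # False
```

A tolerance is used in the comparison because branch lengths read from real tree files are floating-point numbers and rarely sum to exactly equal values.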



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • László Zsolt Garamszegi (1)
  • Alejandro Gonzalez-Voyer (2)

  1. Department of Evolutionary Ecology, Estación Biológica de Doñana-CSIC, Sevilla, Spain
  2. Conservation and Evolutionary Genetics Group, Estación Biológica de Doñana (EBD-CSIC), Sevilla, Spain
