
Working with the Tree of Life in Comparative Studies: How to Build and Tailor Phylogenies to Interspecific Datasets

Chapter in: Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology

Abstract

All comparative analyses rely on at least one phylogenetic hypothesis. However, reconstructing the evolutionary history of species is not the primary aim of these studies. In fact, a well-resolved phylogeny that fully matches the interspecific trait data at hand is rarely available. Therefore, phylogenetic information usually needs to be combined across various sources, which often rely on different approaches and different markers for phylogenetic reconstruction. Building hypotheses about the evolutionary history of species is a challenging task, as it requires knowledge of the underlying methodology and the ability to manipulate data flexibly in diverse formats. Although most practitioners are not experts in phylogenetics, the appropriate handling of phylogenetic information is crucial for making evolutionary inferences in a comparative study, because the results depend directly on the underlying phylogeny. In this chapter, we provide an overview of how to interpret and combine phylogenetic information from different sources, and we review various tree-tailoring techniques, touching upon issues that are crucial for understanding other chapters in this book. We conclude that whichever method is used to generate trees, phylogenetic hypotheses will always include some uncertainty that should be taken into account in a comparative study.
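Tailoring a tree to a dataset typically starts by pruning it to the species that have trait data. The sketch below is a minimal Python illustration under stated assumptions: the `Node` class, `prune` and `to_newick` are hypothetical helpers written for this example (not taken from any package), trees are assumed to be rooted, and each node stores the length of the branch leading to it. In R, comparable functionality is provided by established tools such as `drop.tip` in the ape package.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch leading to it."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def prune(node: Node, keep: set[str]) -> Node | None:
    """Return a pruned copy of `node` containing only tips named in `keep`.

    Internal nodes left with a single child are collapsed and their branch
    length is added to the child's, so path lengths between retained tips
    are unchanged. The input tree is not modified.
    """
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [p for p in (prune(c, keep) for c in node.children) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        only = kept[0]  # a fresh copy, so mutating it does not touch the input
        only.length += node.length
        return only
    return Node(node.name, node.length, kept)


def to_newick(node: Node, is_root: bool = True) -> str:
    """Serialise a tree as Newick; the root's own branch length is omitted."""
    label = node.name
    if node.children:
        inner = ",".join(to_newick(c, is_root=False) for c in node.children)
        label = f"({inner}){label}"
    return f"{label};" if is_root else f"{label}:{node.length:g}"


if __name__ == "__main__":
    # Invented example tree ((A:1,B:1):2,(C:2,D:2):1); with no trait data for B.
    tree = Node(children=[
        Node(length=2, children=[Node("A", 1), Node("B", 1)]),
        Node(length=1, children=[Node("C", 2), Node("D", 2)]),
    ])
    print(to_newick(prune(tree, {"A", "C", "D"})))  # (A:3,(C:2,D:2):1);
```

Summing branch lengths when a single-child node is collapsed is the design choice that matters here: dropping B leaves A on one long branch of length 3 rather than silently shortening its distance to the other species.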

The original version of this chapter was revised: the Online Practical Material website has been updated. The erratum to this chapter is available at https://doi.org/10.1007/978-3-662-43550-2_23


Notes

1. See also the Glossary at the end of the chapter.


Author information


Corresponding author

Correspondence to László Zsolt Garamszegi.


Glossary

Additive tree/phylogeny

A phylogeny is termed additive when the tips are not all equidistant from the root. In an additive phylogeny, branch lengths represent the expected number of substitutions; therefore, differences among taxa in the rate of molecular evolution lead to differences in branch lengths.
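Whether a tree is additive can be seen by summing branch lengths from the root to each tip. A minimal Python sketch, assuming a hypothetical `Node` class (not from any package) in which `length` holds the branch leading to each node; the branch lengths in the usage example are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch leading to it."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def root_to_tip(node: Node, depth: float = 0.0) -> dict[str, float]:
    """Map each tip name to the sum of branch lengths from the root to that tip.

    The root's own branch length is ignored, so the root sits at depth 0.
    """
    if not node.children:
        return {node.name: depth}
    depths: dict[str, float] = {}
    for child in node.children:
        depths.update(root_to_tip(child, depth + child.length))
    return depths


if __name__ == "__main__":
    # Invented branch lengths in expected substitutions per site;
    # lineage A has evolved faster than its sister B.
    tree = Node(children=[
        Node(length=0.25, children=[Node("A", 0.5), Node("B", 0.125)]),
        Node("C", 0.375),
    ])
    print(root_to_tip(tree))  # {'A': 0.75, 'B': 0.375, 'C': 0.375}
```

A and B diverged at the same moment, yet A sits twice as far from the root as B because its lineage accumulated substitutions faster, which is exactly what makes the tree additive rather than ultrametric.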

Branch

A continuous line that connects two nodes or a node to a tip in the phylogeny.

Branch length

Represents the “distance” between the two nodes, or between the node and the tip, connected by the branch. The “distance” can be measured as the number of evolutionary transitions (if the phylogeny is reconstructed using maximum parsimony), the expected number of substitutions (which reflects the rate of molecular evolution), or divergence time.
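The distance between two tips is the sum of the branch lengths on the path joining them, and its meaning follows the units of those branch lengths (steps, expected substitutions, or time). A minimal Python sketch with a hypothetical `Node` class and invented branch lengths; in R, ape's `cophenetic` returns all such pairwise distances for a tree at once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical minimal tree node; `length` is the branch leading to it."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def tip_paths(node: Node, trail: tuple[Node, ...] = ()) -> dict[str, tuple[Node, ...]]:
    """Map each tip name to the nodes on the path from the root to that tip."""
    trail = trail + (node,)
    if not node.children:
        return {node.name: trail}
    paths: dict[str, tuple[Node, ...]] = {}
    for child in node.children:
        paths.update(tip_paths(child, trail))
    return paths


def patristic_distance(root: Node, a: str, b: str) -> float:
    """Sum of branch lengths on the path joining tips `a` and `b`."""
    paths = tip_paths(root)
    pa, pb = paths[a], paths[b]
    shared = 0  # number of leading nodes the two paths share (down to the MRCA)
    while shared < min(len(pa), len(pb)) and pa[shared] is pb[shared]:
        shared += 1
    # Every node below the most recent common ancestor contributes the branch above it.
    return sum(n.length for n in pa[shared:]) + sum(n.length for n in pb[shared:])


if __name__ == "__main__":
    tree = Node(children=[
        Node(length=0.25, children=[Node("A", 0.5), Node("B", 0.125)]),
        Node("C", 0.375),
    ])
    print(patristic_distance(tree, "A", "B"))  # 0.625
    print(patristic_distance(tree, "A", "C"))  # 1.125
```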

Gene duplication

The emergence of a second copy of an existing gene within a single genome. Gene duplication is a major mechanism by which new genetic material is generated.

Homology

Shared similarity between taxa that is due to inheritance from a common ancestor.

Homoplasy

Similarity between taxa that results from convergent evolution, for example due to similar selection pressures.

Horizontal gene transfer

The transfer of genetic material between individuals of different species that is not the result of inheritance from a common ancestor.

Hybridization

Mating between individuals of two distinct species of plants or animals resulting in viable offspring.

Incomplete lineage sorting

Occurs when ancestral polymorphisms persist across successive speciation events, that is, when the time between speciation events is short relative to the time required for alleles to coalesce. Incomplete lineage sorting results in gene genealogies that are not concordant with the species phylogeny.
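The multispecies coalescent, a model assumed here for illustration rather than one described in this glossary, gives a simple closed form for three species. With species tree ((A,B),C) and an internal branch of t coalescent units (generations divided by 2N for diploids), each of the two discordant gene trees has probability e^(-t)/3. A short Python sketch, assuming one sampled gene copy per species and a constant effective population size:

```python
import math


def gene_tree_probabilities(t: float) -> dict[str, float]:
    """Probabilities of the three rooted gene trees for species tree ((A,B),C).

    `t` is the internal branch length in coalescent units (generations / 2N
    for a diploid population of effective size N), assuming one gene copy
    per species and constant N.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # The A and B lineages fail to coalesce within the internal branch with
    # probability exp(-t); all three lineages are then equally likely to
    # coalesce first, so each topology receives one third of that probability.
    each_discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * each_discordant,
        "((A,C),B)": each_discordant,
        "((B,C),A)": each_discordant,
    }


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        probs = gene_tree_probabilities(t)
        print(f"t = {t}: P(discordant gene tree) = {1 - probs['((A,B),C)']:.3f}")
```

As t shrinks towards zero, all three gene trees approach a probability of one third, which is the situation in which single-gene phylogenies are most likely to disagree with the species tree.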

Nodes

Represent the putative ancestors of the taxa represented in the phylogeny.

Orthologous genes

Genes originating from a common ancestor (i.e. homologous genes) that have undergone independent evolution following a speciation event.

Parallel or convergent evolution

Evolution of phenotypes or sequences under similar selective regimes leading to higher similarities than would be expected based on the degree of shared ancestry.

Paralogous genes

Homologous genes that originated from a gene duplication event within a genome, rather than from a speciation event.

Polytomy

When more than two branches originate from a single node in the phylogeny. Polytomies reflect uncertainty about the order and timing of speciation events, either because there are insufficient data to resolve the branching order with confidence (so-called “soft polytomies”) or because the speciation events occurred so rapidly that too few substitutions accumulated between them to discriminate among their timings (so-called “hard polytomies”).
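One common way to handle polytomies in comparative analyses is to resolve them at random into bifurcations joined by zero-length branches, repeating the process to obtain a set of alternative trees. The Python sketch below assumes a hypothetical `Node` class; the random pairing it uses is one simple scheme and does not sample all possible resolutions with equal probability. In R, ape's `multi2di` provides a comparable operation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch leading to it."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def count_polytomies(node: Node) -> int:
    """Count internal nodes from which more than two branches originate."""
    here = 1 if len(node.children) > 2 else 0
    return here + sum(count_polytomies(c) for c in node.children)


def tip_names(node: Node) -> list[str]:
    """List tip names in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for c in node.children for name in tip_names(c)]


def resolve_randomly(node: Node, rng: random.Random) -> None:
    """Resolve every polytomy into bifurcations, modifying the tree in place.

    Two branches are joined at random under a new node with a zero-length
    stem until only two remain, so root-to-tip path lengths are unchanged.
    """
    for child in node.children:
        resolve_randomly(child, rng)
    while len(node.children) > 2:
        i, j = sorted(rng.sample(range(len(node.children)), 2))
        b = node.children.pop(j)  # pop the larger index first so i stays valid
        a = node.children.pop(i)
        node.children.append(Node("", 0.0, [a, b]))


if __name__ == "__main__":
    # A "star" polytomy of five species; each seed gives one random resolution.
    for seed in range(3):
        star = Node(children=[Node(n, 1.0) for n in "ABCDE"])
        resolve_randomly(star, random.Random(seed))
        print(count_polytomies(star), sorted(tip_names(star)))
```

Because the added branches have zero length, the resolved trees keep the original distances between species; running a comparative analysis over many such resolutions is one way to gauge how much a soft polytomy matters.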

Root

Represents the most recent common ancestor of all the tips (taxa) in the phylogeny. All branches of the phylogeny lead to the root and the root connects all nodes.

Saturation

Occurs when two aligned, presumably orthologous, sequences have accumulated so many repeated substitutions at the same sites that the observed differences provide a poor estimate of their time of divergence. Because the probability of reverse mutations (changes back to a nucleotide present in the past) increases with time since divergence, the apparent differences between orthologous sequences become smaller than expected from the time of divergence.
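Saturation can be illustrated with the Jukes-Cantor (JC69) substitution model, chosen here only because it is the simplest case. Under JC69 the expected proportion of differing sites after d substitutions per site is p = 3/4 (1 - e^(-4d/3)), which levels off at 0.75 however large d becomes, and the corrected distance d = -3/4 ln(1 - 4p/3) is undefined once p reaches 0.75. A minimal Python sketch:

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def expected_p(d: float) -> float:
    """Expected observed p-distance after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """JC69-corrected distance (substitutions per site) from an observed p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); p >= 0.75 indicates saturation under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Observed differences grow ever more slowly as true divergence increases.
    for d in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(f"true d = {d:>3}: expected p = {expected_p(d):.3f}")
```

The printed values creep towards 0.75, so beyond a certain depth additional substitutions barely change the observed differences; richer models behave similarly, with a plateau whose level depends on base composition and rate heterogeneity.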

Substitution rate

Also referred to as the rate of molecular evolution, it is the rate at which organisms accumulate genetic differences over time, usually expressed as the number of substitutions per site per unit time. Synonymous and non-synonymous substitutions can be distinguished depending on whether a change in the nucleotide sequence leaves the translated amino acid sequence unchanged or alters it, respectively.
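The synonymous/non-synonymous distinction can be checked codon by codon against the genetic code, and a pairwise distance can be converted into a rate per site per unit time. A minimal Python sketch, assuming the standard nuclear genetic code (mitochondrial codes differ) and hypothetical helper names; the factor of 2 in the rate reflects that differences accumulate along both lineages since their split, and the numbers in the usage example are invented.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_change(codon: str, position: int, new_base: str) -> str:
    """Classify a single-nucleotide change as synonymous or non-synonymous.

    `position` is 0, 1 or 2 within the codon. Stop codons are coded '*', so a
    change between a sense codon and a stop codon counts as non-synonymous.
    """
    codon, new_base = codon.upper(), new_base.upper()
    if codon not in CODE or position not in (0, 1, 2) or new_base not in BASES:
        raise ValueError("expected a DNA codon, a position 0-2 and a base in TCAG")
    if codon[position] == new_base:
        raise ValueError("the new base must differ from the original one")
    mutant = codon[:position] + new_base + codon[position + 1:]
    return "synonymous" if CODE[codon] == CODE[mutant] else "non-synonymous"


def rate_per_site_per_time(distance: float, divergence_time: float) -> float:
    """Substitution rate from a pairwise distance (substitutions per site).

    Both lineages have accumulated changes since their split, hence the 2.
    """
    return distance / (2.0 * divergence_time)


if __name__ == "__main__":
    print(classify_change("CTT", 2, "C"))  # synonymous (Leu -> Leu)
    print(classify_change("GAT", 2, "A"))  # non-synonymous (Asp -> Glu)
    # Invented numbers: 0.02 substitutions/site between species split 10 My ago.
    print(rate_per_site_per_time(0.02, 10.0))  # about 0.001 per site per My
```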

Tips

Also called leaves (following the tree analogy for phylogenies), tips are the taxa whose relationships are being estimated with the phylogeny.

Ultrametric tree/phylogeny

A phylogeny is termed ultrametric when all the tips are equidistant from the root. In other words, the path length between any two species is the same whenever that path passes through the root of the tree. In ultrametric trees, branch lengths usually represent divergence times. Ultrametric trees can also be estimated under the assumption of a constant rate of substitution shared by all taxa, also called a molecular clock. However, recent studies of diverse species have called the molecular clock into question, showing that the rate of molecular evolution varies even among closely related species and is correlated with species-specific traits and even environmental variables.
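Checking ultrametricity amounts to comparing root-to-tip path lengths, with a small tolerance because dated trees often carry floating-point rounding. A minimal Python sketch, assuming a hypothetical `Node` class and invented branch lengths; ape's `is.ultrametric` performs a similar check in R.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch leading to it."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def is_ultrametric(root: Node, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """True if every tip lies at the same summed branch length from the root.

    The tolerance absorbs floating-point rounding; the root's own branch
    length is ignored.
    """
    depths: list[float] = []
    stack = [(child, child.length) for child in root.children]
    if not stack:
        return True  # a single tip is trivially ultrametric
    while stack:
        node, depth = stack.pop()
        if node.children:
            stack.extend((c, depth + c.length) for c in node.children)
        else:
            depths.append(depth)
    return all(math.isclose(d, depths[0], rel_tol=rel_tol, abs_tol=abs_tol) for d in depths)


if __name__ == "__main__":
    dated = Node(children=[Node(length=2, children=[Node("A", 1), Node("B", 1)]), Node("C", 3)])
    rates_vary = Node(children=[Node(length=0.25, children=[Node("A", 0.5), Node("B", 0.125)]),
                                Node("C", 0.375)])
    print(is_ultrametric(dated), is_ultrametric(rates_vary))  # True False
```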


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Garamszegi, L.Z., Gonzalez-Voyer, A. (2014). Working with the Tree of Life in Comparative Studies: How to Build and Tailor Phylogenies to Interspecific Datasets. In: Garamszegi, L. (eds) Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43550-2_2

