Working with the Tree of Life in Comparative Studies: How to Build and Tailor Phylogenies to Interspecific Datasets
All comparative analyses rely on at least one phylogenetic hypothesis. However, reconstructing the evolutionary history of species is not the primary aim of these studies. In fact, a well-resolved phylogeny that fully matches the interspecific trait data at hand is rarely available. Phylogenetic information therefore usually needs to be combined across various sources, which often rely on different approaches and different markers for phylogenetic reconstruction. Building hypotheses about the evolutionary history of species is a challenging task: it requires knowledge of the underlying methodology and the ability to manipulate data flexibly in diverse formats. Although most practitioners are not experts in phylogenetics, appropriate handling of phylogenetic information is crucial for making evolutionary inferences in a comparative study, because the results are contingent on the underlying phylogeny. In this chapter, we provide an overview of how to interpret and combine phylogenetic information from different sources, and we review the various tree-tailoring techniques, touching upon issues that are crucial for understanding other chapters in this book. We conclude that, whichever method is used to generate trees, phylogenetic hypotheses always include some uncertainty that should be taken into account in a comparative study.
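One common tailoring step is pruning a published tree down to the species for which trait data exist. The sketch below illustrates only that step, assuming a simple in-memory tree; the `Node` class and the `prune` and `tip_names` functions are hypothetical helpers written for this example, not part of any library. In practice, dedicated tools (for example `drop.tip` in the R package ape) would normally be used.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Rooted tree node; `length` is the branch leading to this node (hypothetical helper)."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Copy of the subtree restricted to tips whose names are in `keep`.

    Internal nodes left with a single child are collapsed into that child and
    their branch lengths are added, so root-to-tip path lengths are preserved.
    """
    if not node.children:  # tip: keep it only if it has trait data
        return Node(node.name, node.length) if node.name in keep else None
    kept = [p for p in (prune(c, keep) for c in node.children) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:  # unary node: merge with its surviving child
        only = kept[0]
        return Node(only.name, only.length + node.length, only.children)
    return Node(node.name, node.length, kept)


def tip_names(node: Node) -> List[str]:
    """Tip names in left-to-right order."""
    if not node.children:
        return [node.name]
    return [t for c in node.children for t in tip_names(c)]


# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1); trait data exist for A, B and C only.
tree = Node(children=[
    Node(length=2.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node(length=1.0, children=[Node("C", 2.0), Node("D", 2.0)]),
])
pruned = prune(tree, {"A", "B", "C"})
print(tip_names(pruned))           # ['A', 'B', 'C']
print(pruned.children[1].length)   # 3.0: C's branch absorbed its collapsed parent
```

Summing branch lengths when a node is collapsed keeps the distance from the root to every remaining tip unchanged, which matters if the tree was ultrametric before pruning.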
A phylogeny is termed additive when the tips are not all equidistant from the root. In an additive phylogeny, branch lengths represent the number of expected substitutions per site; therefore, differences among taxa in the rate of molecular evolution lead to differences in branch lengths.
A branch is a continuous line that connects two nodes, or a node to a tip, in the phylogeny.
Branch length represents the “distance” between the two nodes, or the node and tip, connected by a branch. This “distance” can be measured as the number of evolutionary transitions (if the phylogeny is reconstructed using maximum parsimony), the number of expected substitutions (which reflects both the rate of molecular evolution and elapsed time), or divergence time.
Gene duplication occurs when a second copy of an existing gene arises within a single genome. It is a major mechanism by which new genetic material is generated.
Homology is shared similarity between taxa that is due to inheritance from a common ancestor.
Homoplasy is similarity between taxa that results from convergent evolution (for example, due to similar selection pressures) rather than from common ancestry.
Horizontal gene transfer is the transfer of genetic material between individuals of different species by means other than inheritance from a common ancestor.
Hybridization is mating between individuals of two distinct species of plants or animals that results in viable offspring.
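Incomplete lineage sorting occurs when the interval between successive speciation events is shorter than the time needed for alleles to coalesce, so that ancestral polymorphisms persist through more than one speciation event. It results in gene genealogies that are not concordant with the species phylogeny.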
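How strongly the length of the interval between speciation events drives discordance can be made concrete with the standard three-taxon coalescent result (a textbook model swapped in here for illustration, not a method described in this chapter). For a species tree ((A,B),C) whose internal branch lasts `internode` coalescent units (generations divided by 2Ne for a diploid population), the A and B lineages fail to coalesce on that branch with probability exp(-internode), and each of the two discordant gene-tree topologies then has probability exp(-internode)/3. The function name below is hypothetical.

```python
import math
from typing import Dict


def gene_tree_probabilities(internode: float) -> Dict[str, float]:
    """Gene-tree topology probabilities for species tree ((A,B),C).

    `internode` is the internal branch length in coalescent units. If A and B
    do not coalesce on it (probability exp(-internode)), the three lineages
    meet in the ancestral population and each topology is equally likely.
    """
    if internode < 0:
        raise ValueError("internode length must be non-negative")
    discordant = math.exp(-internode) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


for t in (0.01, 0.1, 0.5, 1.0, 3.0):
    p = gene_tree_probabilities(t)["((A,B),C)"]
    print(f"internode = {t:4.2f} coalescent units -> P(concordant gene tree) = {p:.3f}")
```

As the internode shrinks toward zero, all three topologies approach a probability of 1/3, which is why rapid successive speciation events are where incomplete lineage sorting matters most.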
Internal nodes represent the putative ancestors of the taxa included in the phylogeny.
Orthologs are genes originating from a common ancestor (i.e. homologous genes) that have evolved independently following a speciation event.
Evolution of phenotypes or sequences under similar selective regimes leading to higher similarities than would be expected based on the degree of shared ancestry.
Paralogs are genes originating from a gene duplication event that is recent enough for their common ancestry to be recognizable.
A polytomy occurs when more than two branches originate from a single node in the phylogeny. Polytomies reflect uncertainty in the order of speciation events, either because there are insufficient data to determine the order of events with confidence (so-called “soft polytomies”) or because the speciation events followed each other so rapidly that too few substitutions accumulated between them to discriminate their order (so-called “hard polytomies”).
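Polytomies are easy to locate in a tree, and counting their possible bifurcating resolutions shows how much topological uncertainty each one carries: a node with k children can be resolved into (2k - 3)!! rooted binary arrangements. The sketch below assumes a minimal nested-tuple tree (tips are names, internal nodes are tuples of children); the helper names are hypothetical. Note that the topology alone cannot tell a soft polytomy from a hard one. Tools such as `multi2di` in the R package ape instead resolve polytomies randomly into zero-length branches.

```python
import math
from typing import List, Tuple, Union

# A tip is a taxon name; an internal node is a tuple of its child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def tip_names(tree: Tree) -> List[str]:
    """Tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tip_names(child)]


def find_polytomies(tree: Tree) -> List[Tuple[List[str], int]]:
    """(subtended tips, number of children) for every node with more than two children."""
    if isinstance(tree, str):
        return []
    found = [(tip_names(tree), len(tree))] if len(tree) > 2 else []
    for child in tree:
        found.extend(find_polytomies(child))
    return found


def n_binary_resolutions(k: int) -> int:
    """Rooted bifurcating arrangements of k subtrees: (2k - 3)!! = 1 * 3 * ... * (2k - 3)."""
    return math.prod(range(1, 2 * k - 2, 2))


# Hypothetical tree with one four-way polytomy: ((A,B,C,D),(E,F))
tree = (("A", "B", "C", "D"), ("E", "F"))
for tips, k in find_polytomies(tree):
    print(tips, "->", n_binary_resolutions(k), "possible bifurcating resolutions")
```

Here the single polytomy subtending A, B, C and D admits 15 resolutions; with several polytomies, the numbers multiply, because each node can be resolved independently.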
The root represents the most recent common ancestor of all the tips (taxa) in the phylogeny. All branches of the phylogeny ultimately trace back to the root, which connects all nodes.
Saturation occurs when two aligned, presumably orthologous, sequences have accumulated so many repeated substitutions at the same sites that their observed differences provide a poor estimate of their time of divergence. Because the probability of multiple substitutions at a site, including reversals to a previously present nucleotide, increases with time since divergence, the observed differences between orthologous sequences become smaller than expected from the time of divergence and eventually stop increasing.
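The plateau that saturation produces can be shown with the Jukes–Cantor (JC69) model, the simplest nucleotide substitution model, used here only as an illustration. Under JC69 the expected proportion of differing sites is p = 3/4 (1 - exp(-4d/3)) for d substitutions per site, so p can never exceed 0.75 however much time has passed. The function names are hypothetical.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected proportion of differing sites given d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """Substitutions per site inferred from an observed p-distance under JC69."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed differences flatten out as the true number of substitutions grows.
for d in (0.05, 0.25, 0.5, 1.0, 2.0, 5.0):
    print(f"true d = {d:4.2f} -> observed p = {expected_p_distance(d):.3f}")
```

Near the plateau, a small change in the observed p corresponds to a large change in the corrected distance, so saturated sequences give divergence estimates with very wide uncertainty.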
The substitution rate, also referred to as the rate of molecular evolution, is the rate at which organisms accumulate genetic differences over time; it is usually calculated as the number of substitutions per site per unit time. Non-synonymous and synonymous substitutions are distinguished according to whether or not changes in the nucleotide sequence alter the translated amino acid sequence, respectively.
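As a worked example of the “substitutions per site per unit time” definition: a pairwise distance between two species accumulates along both lineages since their split, so the per-lineage rate is r = d / (2T). Inverting the same relation under a strict molecular clock gives T = d / (2r). The numbers and function names below are hypothetical, and `distance` is assumed to be already corrected for multiple substitutions.

```python
def substitution_rate(distance: float, divergence_time: float) -> float:
    """Per-lineage rate (substitutions/site/time unit) from a pairwise distance.

    The distance accumulated along both lineages since the split, hence the 2.
    """
    if divergence_time <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * divergence_time)


def clock_divergence_time(distance: float, rate: float) -> float:
    """Divergence time implied by a distance, assuming a strict molecular clock."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical calibration: 0.1 substitutions/site between species that split 5 Myr ago.
rate = substitution_rate(0.1, 5.0)
print(f"rate = {rate:.3f} substitutions/site/Myr")            # rate = 0.010 substitutions/site/Myr
# Applying that rate to another (hypothetical) pair separated by 0.04 substitutions/site:
print(f"divergence = {clock_divergence_time(0.04, rate):.1f} Myr")  # divergence = 2.0 Myr
```

The second calculation is only as good as the clock assumption: if the two pairs of species evolve at different rates, the inferred date is biased.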
Tips, also called leaves (following the tree analogy for phylogenies), are the taxa whose relationships are being estimated with the phylogeny.
A phylogeny is termed ultrametric when all the tips are equidistant from the root; in other words, the sum of branch lengths along the path from the root to any tip is the same for every tip. In ultrametric trees, branch lengths usually represent divergence times. Ultrametric trees can also be estimated under the assumption of a molecular clock, that is, a constant rate of substitution that is the same for all taxa. However, recent studies of diverse species have called the molecular clock into question, showing that the rate of molecular evolution varies even among closely related species and is correlated with species-specific traits and even environmental variables.
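The difference between ultrametric and additive trees reduces to a single check: sum the branch lengths from the root to each tip and compare the totals. The sketch below assumes a minimal nested-tuple tree in which an internal node is a tuple of `(subtree, branch_length)` pairs and a tip is a name; the helper names are hypothetical (the R package ape offers `is.ultrametric` for real analyses).

```python
from typing import Dict, Tuple, Union

# A tip is a taxon name; an internal node is a tuple of (child subtree, branch length) pairs.
Tree = Union[str, Tuple[Tuple["Tree", float], ...]]


def root_to_tip(tree: Tree, depth: float = 0.0) -> Dict[str, float]:
    """Map each tip to the summed branch lengths on its path from the root."""
    if isinstance(tree, str):
        return {tree: depth}
    out: Dict[str, float] = {}
    for child, length in tree:
        out.update(root_to_tip(child, depth + length))
    return out


def is_ultrametric(tree: Tree, tol: float = 1e-9) -> bool:
    """True if all tips lie at the same distance from the root (within `tol`)."""
    depths = root_to_tip(tree).values()
    return max(depths) - min(depths) <= tol


# Hypothetical time tree ((A:1,B:1):2,C:3): every tip sits 3 units from the root.
time_tree = (((("A", 1.0), ("B", 1.0)), 2.0), ("C", 3.0))
# Same topology with branch lengths in substitutions, where B's lineage evolves faster.
additive_tree = (((("A", 1.0), ("B", 1.6)), 2.0), ("C", 3.0))
print(is_ultrametric(time_tree))      # True
print(is_ultrametric(additive_tree))  # False
```

A tolerance is used because branch lengths read from files are rounded, so a tree that is ultrametric in principle may show tiny numerical differences in root-to-tip distances.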