Journal of Molecular Evolution, Volume 64, Issue 1, pp 80–89

Using Confidence Set Heuristics During Topology Search Improves the Robustness of Phylogenetic Inference

  • Shirley L. Pepke
  • Davin Butt
  • Isabelle Nadeau
  • Andrew J. Roger
  • Christian Blouin

Abstract

We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods.
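The idea of reading node support off a confidence set, and of forming a majority rule consensus from it, can be illustrated with a minimal sketch. This is not the covSEARCH implementation: the tree representation (each unrooted tree as a set of nontrivial splits), the helper names `normalize`, `split_support`, `majority_splits`, and `tree`, and the three-tree example set are all assumptions made for illustration.

```python
from collections import Counter

def normalize(side, taxa, ref):
    """Name a split by the side that does not contain the reference taxon."""
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side

def split_support(trees):
    """Fraction of trees in the set that contain each split."""
    counts = Counter()
    for splits in trees:
        counts.update(set(splits))
    return {s: c / len(trees) for s, c in counts.items()}

def majority_splits(trees, threshold=0.5):
    """Splits present in more than `threshold` of the trees."""
    return {s for s, f in split_support(trees).items() if f > threshold}

# Hypothetical five-taxon example; each tree lists one side of each
# nontrivial split, normalized against reference taxon "A".
TAXA = "ABCDE"

def tree(*sides):
    return {normalize(s, TAXA, "A") for s in sides}

confidence_set = [
    tree("AB", "DE"),  # ((A,B),C,(D,E))
    tree("AB", "CE"),  # ((A,B),D,(C,E))
    tree("AC", "DE"),  # ((A,C),B,(D,E))
]

support = split_support(confidence_set)
consensus = majority_splits(confidence_set)
```

In this toy set the splits AB|CDE and DE|ABC each appear in two of the three trees, so they form the majority rule consensus even though no single tree is favored. This mirrors the point that well-supported bipartitions can be recovered when the whole-tree topology cannot.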

Keywords

Phylogenetics, Maximum likelihood, Confidence sets, Robustness, Majority consensus

Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Shirley L. Pepke (1, 2)
  • Davin Butt (2)
  • Isabelle Nadeau (1, 2)
  • Andrew J. Roger (1)
  • Christian Blouin (1, 2)

  1. Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
  2. Department of Computer Science, Dalhousie University, Halifax, Canada
