Using Confidence Set Heuristics During Topology Search Improves the Robustness of Phylogenetic Inference
We examine the impact of likelihood surface characteristics on phylogenetic inference. We analyze amino acid data sets simulated from topologies whose branch length features were chosen to represent varying degrees of difficulty for likelihood maximization. We present situations where the tree found to achieve the global maximum in likelihood often differs from the true tree. We use the program covSEARCH to demonstrate that adaptively sized pools of candidate trees, updated using confidence tests, yield solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods; hence, covSEARCH is best suited to small to medium-sized alignments or to large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained from nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods.
Keywords: Phylogenetics, Maximum likelihood, Confidence sets, Robustness, Majority consensus
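The abstract refers to node support within a confidence set and to a majority rule consensus computed from that set. As a rough illustration only, the sketch below counts how often each non-trivial bipartition appears across a small set of trees and keeps those found in more than half of them. It is not the covSEARCH implementation: the nested-tuple tree encoding, the function names (`leaves`, `splits`, `split_support`, `majority_splits`), and the three example topologies are all assumptions made for this example.

```python
from collections import Counter


def leaves(node):
    """Return the frozenset of leaf names under a nested-tuple node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree given as nested tuples."""
    taxa = leaves(tree)
    anchor = min(taxa)  # canonical form: store the side that lacks this taxon
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = taxa - clade if anchor in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def split_support(trees):
    """Fraction of trees in the set that contain each bipartition."""
    counts = Counter(s for tree in trees for s in splits(tree))
    return {s: n / len(trees) for s, n in counts.items()}


def majority_splits(trees, threshold=0.5):
    """Bipartitions present in more than `threshold` of the trees."""
    return {s for s, f in split_support(trees).items() if f > threshold}


# Hypothetical confidence set of three 5-taxon topologies (illustration only).
confidence_set = [
    (("A", "B"), "C", ("D", "E")),
    (("A", "B"), "D", ("C", "E")),
    (("A", "C"), "B", ("D", "E")),
]

for split, freq in sorted(split_support(confidence_set).items(),
                          key=lambda item: sorted(item[0])):
    print(sorted(split), round(freq, 2))
```

In this toy set, the bipartitions separating {D, E} and {C, D, E} from the remaining taxa each occur in two of the three trees, so they would be retained by the majority rule even though no single topology dominates the set.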
The authors wish to thank Ed Susko, Associate Editor Nicholas Galtier, and the anonymous reviewers for their suggested improvements to the manuscript and, also, Matthew Spencer for helpful discussion. Thanks go to J. Leigh for providing the tree in Fig. 3a. Shirley Pepke was supported by a Genome Atlantic postdoctoral fellowship. This work was supported by NSERC discovery grant 298397-04 (CB).