Abstract
It is well known among phylogeneticists that adding an extra taxon (e.g. species) to a data set can alter the structure of the optimal phylogenetic tree in surprising ways. However, little is known about this “rogue taxon” effect. In this paper we characterize the behavior of balanced minimum evolution (BME) phylogenetics on data sets of this type using tools from polyhedral geometry. First we show that for any distance matrix there exist distances to a “rogue taxon” such that the BME-optimal tree for the data set with the new taxon does not contain any nontrivial splits (bipartitions) of the optimal tree for the original data. Second, we prove a theorem which restricts the topology of BME-optimal trees for data sets of this type, thus showing that a rogue taxon cannot have an arbitrary effect on the optimal tree. Third, we computationally construct polyhedral cones that give complete answers for BME rogue taxon behavior when our original data fits a tree on four, five, and six taxa. We use these cones to derive sufficient conditions for rogue taxon behavior for four taxa, and to understand the frequency of the rogue taxon effect via simulation.
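To make the BME criterion concrete, here is a minimal sketch for the smallest case, four taxa. It is not the authors' code. It assumes the standard balanced (Pauplin) tree-length formula, in which each pairwise distance is weighted by 2^(1 - e), where e is the number of edges on the path between the two leaves. For a quartet tree the two cherry pairs get weight 1/2 and the four cross pairs get weight 1/4. The function names and the example matrix are hypothetical and chosen only for illustration.

```python
# Sketch only: balanced minimum evolution (BME) on four taxa, assuming
# Pauplin's formula: length(T) = sum over pairs i<j of 2^(1 - e_ij) * d_ij,
# where e_ij is the number of edges on the leaf-to-leaf path in T.

# The three possible nontrivial splits (quartet topologies) on taxa 0..3.
QUARTET_SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def bme_length_quartet(d, split):
    """Balanced length of the quartet tree with the given split.

    d is a symmetric 4x4 matrix (nested lists) of pairwise distances;
    split is a pair of pairs such as ((0, 1), (2, 3)).
    """
    (a, b), (c, e) = split
    # Cherry pairs are joined by 2 edges, so each has weight 2^(1-2) = 1/2.
    cherries = d[a][b] + d[c][e]
    # Cross pairs are joined by 3 edges, so each has weight 2^(1-3) = 1/4.
    cross = d[a][c] + d[a][e] + d[b][c] + d[b][e]
    return cherries / 2 + cross / 4


def bme_optimal_quartet(d):
    """Return the split whose quartet tree has minimal balanced length."""
    return min(QUARTET_SPLITS, key=lambda s: bme_length_quartet(d, s))


# Hypothetical example: a tree metric for the tree 01|23 with all five
# edge lengths equal to 1 (cherry distance 2, cross distance 3).
example = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
best = bme_optimal_quartet(example)
```

On this example the split ((0, 1), (2, 3)) has balanced length 5.0, the sum of the five unit edge lengths, while the other two splits have length 5.5, so the generating tree is recovered. The rogue taxon question studied in the paper asks how this optimum can change once a fifth row and column of distances is appended; the sketch above does not attempt that.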
Additional information
The first author was supported by a UC Berkeley Chancellor’s Fellowship. The second author was supported by the Miller Institute for Basic Research at UC Berkeley.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Cueto, M.A., Matsen, F.A. Polyhedral Geometry of Phylogenetic Rogue Taxa. Bull Math Biol 73, 1202–1226 (2011). https://doi.org/10.1007/s11538-010-9556-x