Bulletin of Mathematical Biology, Volume 73, Issue 6, pp 1202–1226

Polyhedral Geometry of Phylogenetic Rogue Taxa

Open Access
Original Article

Abstract

It is well known among phylogeneticists that adding an extra taxon (e.g. species) to a data set can alter the structure of the optimal phylogenetic tree in surprising ways. However, little is known about this “rogue taxon” effect. In this paper we characterize the behavior of balanced minimum evolution (BME) phylogenetics on data sets of this type using tools from polyhedral geometry. First we show that for any distance matrix there exist distances to a “rogue taxon” such that the BME-optimal tree for the data set with the new taxon does not contain any nontrivial splits (bipartitions) of the optimal tree for the original data. Second, we prove a theorem which restricts the topology of BME-optimal trees for data sets of this type, thus showing that a rogue taxon cannot have an arbitrary effect on the optimal tree. Third, we computationally construct polyhedral cones that give complete answers for BME rogue taxon behavior when our original data fits a tree on four, five, and six taxa. We use these cones to derive sufficient conditions for rogue taxon behavior for four taxa, and to understand the frequency of the rogue taxon effect via simulation.
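
As a rough illustration of what "BME-optimal tree" means in the smallest case, the sketch below scores the three unrooted binary topologies on four taxa and picks the shortest. The abstract does not spell out the criterion; this sketch assumes the standard balanced tree length (Pauplin's formula), in which the distance between taxa i and j is weighted by 2^(1 - tau_ij), where tau_ij is the number of edges on the path between them. The names `QUARTETS`, `bme_length`, and `bme_optimal`, and the example distance matrix, are hypothetical and not taken from the paper.

```python
from itertools import combinations

# The three unrooted binary topologies on taxa 0..3, each labelled by its
# single nontrivial split and stored as its two cherries (sibling pairs).
# These names are illustrative only.
QUARTETS = {
    "01|23": ((0, 1), (2, 3)),
    "02|13": ((0, 2), (1, 3)),
    "03|12": ((0, 3), (1, 2)),
}


def bme_length(d, cherries):
    """Balanced length of a quartet tree, assuming Pauplin's formula:
    sum over pairs i < j of 2**(1 - tau_ij) * d[i][j]."""
    total = 0.0
    for i, j in combinations(range(4), 2):
        # In a quartet tree, two taxa in the same cherry are 2 edges apart;
        # taxa in different cherries are 3 edges apart.
        tau = 2 if (i, j) in cherries else 3
        total += 2.0 ** (1 - tau) * d[i][j]
    return total


def bme_optimal(d):
    """Return the split label of the shortest topology and all three lengths."""
    lengths = {name: bme_length(d, ch) for name, ch in QUARTETS.items()}
    return min(lengths, key=lengths.get), lengths


# Example input: distances from the tree 01|23 with every edge of length 1,
# so cherry pairs are at distance 2 and all other pairs at distance 3.
d = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
best, lengths = bme_optimal(d)
```

On a distance matrix that fits a tree exactly, as in this example, the balanced length of the true topology equals the sum of its edge lengths, which is a quick sanity check on the weights. The rogue taxon question studied in the paper concerns how the optimum changes when a further row and column are appended to `d`; that requires scoring trees on more taxa and is not covered by this four-taxon sketch.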

Keywords

Minimum evolution; Distance-based phylogenetic inference; Linear programming; Polytope; Normal fan

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Department of Mathematics, University of California, Berkeley, USA
  2. Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, USA
