Algorithmica, Volume 74, Issue 4, pp 1386–1403

A ‘Stochastic Safety Radius’ for Distance-Based Tree Reconstruction

Abstract

A variety of algorithms have been proposed for reconstructing trees that show the evolutionary relationships between species by comparing differences in genetic data across present-day species. If the leaf-to-leaf distances in a tree can be accurately estimated, then the tree can be reconstructed from these estimated distances using polynomial-time methods such as the popular ‘Neighbor-Joining’ algorithm. There is a precise combinatorial condition under which distance-based methods are guaranteed to return a correct tree (in full or in part), based on the requirement that the input distances all lie within some ‘safety radius’ of the true distances. Here, we explore a stochastic analogue of this condition, and mathematically establish upper and lower bounds on this ‘stochastic safety radius’ for distance-based tree reconstruction methods. Using simulations, we show how this notion provides a new way to compare the performance of distance-based tree reconstruction methods. This may help explain why Neighbor-Joining performs so well, as its stochastic safety radius appears close to optimal (while its more classical safety radius is the same as that of many other, less accurate methods).
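To make the classical (deterministic) safety radius condition concrete, the following minimal sketch works on the smallest possible case, a four-leaf tree ab|cd. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the reconstruction method is the simple four-point rule rather than Neighbor-Joining, and the radius of 1/2 (as a fraction of the shortest internal edge) is assumed because it is the value for which the four-point rule is guaranteed to succeed on a quartet.

```python
from itertools import combinations


def quartet_distances(pendant, internal):
    """Leaf-to-leaf path lengths on the quartet tree ab|cd.

    pendant: dict mapping each leaf to its pendant edge length.
    internal: length of the single internal edge.
    """
    same_side = {frozenset("ab"), frozenset("cd")}
    d = {}
    for x, y in combinations("abcd", 2):
        length = pendant[x] + pendant[y]
        if frozenset((x, y)) not in same_side:
            length += internal  # the path crosses the internal edge
        d[frozenset((x, y))] = length
    return d


def within_safety_radius(true_d, est_d, min_internal_edge, radius=0.5):
    """True if every estimate is within radius * min_internal_edge of the truth."""
    max_err = max(abs(est_d[k] - true_d[k]) for k in true_d)
    return max_err < radius * min_internal_edge


def four_point_split(d):
    """Return the split whose two within-pair distances have the smallest sum."""
    splits = [("ab", "cd"), ("ac", "bd"), ("ad", "bc")]
    return min(splits, key=lambda s: d[frozenset(s[0])] + d[frozenset(s[1])])


def perturb(true_d, eps):
    """Worst-case style error: inflate ab and cd, shrink the other four by eps."""
    inflated = {frozenset("ab"), frozenset("cd")}
    return {k: v + (eps if k in inflated else -eps) for k, v in true_d.items()}


true_d = quartet_distances({"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}, internal=0.4)
safe = perturb(true_d, 0.15)    # error 0.15 is below 0.5 * 0.4
unsafe = perturb(true_d, 0.25)  # error 0.25 is above 0.5 * 0.4
```

With the error inside the radius, `four_point_split(safe)` recovers ab|cd; with the larger error the guarantee no longer applies, and in this particular example the rule returns a different split. The stochastic analogue studied in the article replaces the worst-case bound on the error with a condition on random errors.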

Keywords

Tree Reconstruction Robustness to random error 

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Institut de Biologie Computationnelle (IBC) – LIRMM (UMR 5506), CNRS and Université de Montpellier, Montpellier, France
  2. Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
