A ‘Stochastic Safety Radius’ for Distance-Based Tree Reconstruction

Algorithmica


Abstract

A variety of algorithms have been proposed for reconstructing trees that show the evolutionary relationships between species by comparing differences in genetic data across present-day species. If the leaf-to-leaf distances in a tree can be accurately estimated, then it is possible to reconstruct this tree from these estimated distances, using polynomial-time methods such as the popular ‘Neighbor-Joining’ algorithm. There is a precise combinatorial condition under which distance-based methods are guaranteed to return a correct tree (in full or in part) based on the requirement that the input distances all lie within some ‘safety radius’ of the true distances. Here, we explore a stochastic analogue of this condition, and mathematically establish upper and lower bounds on this ‘stochastic safety radius’ for distance-based tree reconstruction methods. Using simulations, we show how this notion provides a new way to compare the performance of distance-based tree reconstruction methods. This may help explain why Neighbor-Joining performs so well, as its stochastic safety radius appears close to optimal (while its more classical safety radius is the same as that of many other, less accurate methods).
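To make the classical ‘safety radius’ condition concrete, the following minimal sketch uses the four-point method on a single quartet tree ab|cd rather than Neighbor-Joining itself. It assumes a radius of one half of the shortest edge length: if every estimated distance lies within that radius of its true value, each pairwise sum moves by less than the internal edge length, so the pairing with the smallest sum is still the true split. The function names, edge lengths, and the uniform noise model are illustrative assumptions and are not taken from the article.

```python
import itertools
import random

# The three possible ways to pair up four leaves; the true tree is ab|cd.
PAIRINGS = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "d"), ("b", "c")),
]


def true_distances(pendant, internal):
    """Leaf-to-leaf path lengths on the quartet ab|cd (hypothetical helper)."""
    d = {}
    for x, y in itertools.combinations("abcd", 2):
        same_side = {x, y} in ({"a", "b"}, {"c", "d"})
        d[frozenset((x, y))] = pendant[x] + pendant[y] + (0.0 if same_side else internal)
    return d


def perturb(d, radius, rng):
    """Add independent noise in [-radius, radius] to every distance (assumed noise model)."""
    return {pair: value + rng.uniform(-radius, radius) for pair, value in d.items()}


def quartet_split(d):
    """Four-point method: choose the pairing with the smallest sum of within-pair distances."""
    return min(PAIRINGS, key=lambda p: d[frozenset(p[0])] + d[frozenset(p[1])])


if __name__ == "__main__":
    rng = random.Random(1)
    pendant = {"a": 0.3, "b": 0.4, "c": 0.5, "d": 0.6}
    internal = 0.2  # shortest edge of this example tree
    d = true_distances(pendant, internal)
    for factor in (0.49, 1.5):
        radius = factor * internal
        correct = sum(
            quartet_split(perturb(d, radius, rng)) == PAIRINGS[0] for _ in range(10_000)
        )
        print(f"radius = {factor} x shortest edge: {correct} of 10000 quartets correct")
```

Inside the radius the split is recovered in every trial by construction. Outside it, recovery becomes a matter of probability, which is the regime that a stochastic analogue of the safety radius is meant to describe.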

Acknowledgments

MS thanks the Allan Wilson Centre and the NZ Marsden Fund for supporting this work. We thank the two anonymous reviewers for a number of helpful suggestions.

Author information

Corresponding author

Correspondence to Mike Steel.

Appendix: Proof of (2)

Substituting \(t=x+u, u \ge 0\) in \({\mathbb P}(N(0,1)>x) = \int _x^\infty \frac{1}{\sqrt{2\pi }} e^{-t^2/2} dt\) gives:

$$\begin{aligned} {\mathbb P}(N(0,1)>x) = e^{-x^2/2}\int _0^\infty \frac{1}{\sqrt{2\pi }} e^{-xu}e^{-u^2/2} du < e^{-x^2/2}\int _0^\infty \frac{1}{\sqrt{2\pi }} e^{-u^2/2} du, \end{aligned}$$

where the inequality follows from \(e^{-xu}< 1\) for all \(x,u>0\). Since the last integral on the right equals \(\frac{1}{2}\), we obtain the inequality in (2). Turning to the asymptotic relationship, consider:

$$\begin{aligned} \lim _{x \rightarrow \infty } \frac{\frac{1}{\sqrt{2\pi }}\int _x^\infty e^{-t^2/2} dt}{\frac{1}{x\sqrt{2\pi }} e^{-x^2/2}}. \end{aligned} \tag{13}$$

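Since the numerator and denominator both tend to zero as \(x \rightarrow \infty \), we can apply L’Hôpital’s rule. By the fundamental theorem of calculus, the derivative of the numerator is \(-\frac{1}{\sqrt{2\pi }} e^{-x^2/2}\), while the derivative of the denominator is \(-\frac{1}{\sqrt{2\pi }} e^{-x^2/2}\left( 1+\frac{1}{x^2}\right) \). The ratio of these derivatives is \(\left( 1+\frac{1}{x^2}\right) ^{-1}\), which tends to 1, so the limit in (13) equals 1. \(\square \)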
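As a numerical sanity check on the two claims proved above, the following sketch evaluates the standard normal tail through the complementary error function and compares it with both expressions. It assumes, as the derivation suggests, that (2) consists of the bound \({\mathbb P}(N(0,1)>x) < \frac{1}{2}e^{-x^2/2}\) for \(x>0\) together with the asymptotic equivalence to \(\frac{1}{x\sqrt{2\pi }} e^{-x^2/2}\); the function names are illustrative.

```python
import math


def normal_tail(x):
    """P(N(0,1) > x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def half_gaussian_bound(x):
    """Assumed upper bound from (2): (1/2) * exp(-x^2 / 2)."""
    return 0.5 * math.exp(-x * x / 2.0)


def asymptotic_form(x):
    """Denominator of the limit considered in the proof: exp(-x^2 / 2) / (x * sqrt(2 * pi))."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 5.0, 10.0):
        tail = normal_tail(x)
        print(
            f"x = {x:5.1f}  tail = {tail:.4e}  "
            f"below bound: {tail < half_gaussian_bound(x)}  "
            f"ratio to asymptotic form = {tail / asymptotic_form(x):.4f}"
        )
```

The ratio in the last column should approach 1 from below as \(x\) grows, in line with the limit established by L’Hôpital’s rule.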

About this article

Cite this article

Gascuel, O., Steel, M. A ‘Stochastic Safety Radius’ for Distance-Based Tree Reconstruction. Algorithmica 74, 1386–1403 (2016). https://doi.org/10.1007/s00453-015-0005-y

  • DOI: https://doi.org/10.1007/s00453-015-0005-y
