Abstract
A variety of algorithms have been proposed for reconstructing trees that show the evolutionary relationships between species by comparing differences in genetic data across present-day species. If the leaf-to-leaf distances in a tree can be accurately estimated, then the tree can be reconstructed from these estimated distances using polynomial-time methods such as the popular ‘Neighbor-Joining’ algorithm. There is a precise combinatorial condition under which distance-based methods are guaranteed to return a correct tree (in full or in part), based on the requirement that the input distances all lie within some ‘safety radius’ of the true distances. Here, we explore a stochastic analogue of this condition, and mathematically establish upper and lower bounds on this ‘stochastic safety radius’ for distance-based tree reconstruction methods. Using simulations, we show how this notion provides a new way to compare the performance of distance-based tree reconstruction methods. This may help explain why Neighbor-Joining performs so well, as its stochastic safety radius appears close to optimal (while its more classical safety radius is the same as that of many other, less accurate methods).
Acknowledgments
MS thanks the Allan Wilson Centre and the NZ Marsden Fund for supporting this work. We thank the two anonymous reviewers for a number of helpful suggestions.
Appendix: Proof of (2)
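Substituting \(t=x+u, u \ge 0\) in \({\mathbb P}(N(0,1)>x) = \int _x^\infty \frac{1}{\sqrt{2\pi }} e^{-t^2/2} dt\) gives, for \(x>0\):
$$ {\mathbb P}(N(0,1)>x) = e^{-x^2/2}\int _0^\infty \frac{1}{\sqrt{2\pi }} e^{-xu} e^{-u^2/2} du < e^{-x^2/2}\int _0^\infty \frac{1}{\sqrt{2\pi }} e^{-u^2/2} du, $$
where the inequality follows from \(e^{-xu}< 1\) for all \(x,u>0\). Since the last integral on the right equals \(\frac{1}{2}\) (half the total mass of the standard normal density), we get the inequality in (2). Turning to the asymptotic relationship, consider:
$$ \lim _{x \rightarrow \infty } \frac{\int _x^\infty \frac{1}{\sqrt{2\pi }} e^{-t^2/2} dt}{\frac{1}{x\sqrt{2\pi }} e^{-x^2/2}}. \tag{13} $$
Since the numerator and denominator both tend to zero as \(x \rightarrow \infty \), we can apply L’Hôpital’s rule. By the fundamental theorem of calculus, the derivative of the numerator is \(-\frac{1}{\sqrt{2\pi }} e^{-x^2/2}\), while the derivative of the denominator is \(-\frac{1}{\sqrt{2\pi }} e^{-x^2/2}(1+x^{-2})\); their ratio is \(1/(1+x^{-2})\), so the limit in (13) equals 1. \(\square \)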
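Both parts of the argument can be sanity-checked numerically. The sketch below assumes that (2) consists of the bound \({\mathbb P}(N(0,1)>x) \le \frac{1}{2} e^{-x^2/2}\) for \(x>0\) together with the asymptotic equivalence \({\mathbb P}(N(0,1)>x) \sim \frac{1}{x\sqrt{2\pi }} e^{-x^2/2}\); the statement of (2) is not reproduced in this appendix, so these forms are a reading of the proof rather than a quotation. The function names are illustrative only.

```python
import math


def normal_tail(x: float) -> float:
    """P(N(0,1) > x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_bound(x: float) -> float:
    """Assumed upper bound in (2): (1/2) * exp(-x^2 / 2), for x > 0."""
    return 0.5 * math.exp(-x * x / 2.0)


def tail_asymptote(x: float) -> float:
    """Assumed asymptotic form in (2): exp(-x^2 / 2) / (x * sqrt(2 * pi))."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    # The tail should stay below the bound, and the ratio of the tail to the
    # asymptotic form should approach 1 as x grows.
    for x in (0.5, 1.0, 2.0, 4.0, 8.0, 20.0):
        p = normal_tail(x)
        ratio = p / tail_asymptote(x)
        print(f"x={x:5.1f}  tail={p:.3e}  bound={tail_bound(x):.3e}  ratio={ratio:.4f}")
```

The printed ratio tracks \(1/(1+x^{-2})\) only loosely for small \(x\), but it increases towards 1, which is consistent with the limit obtained from L’Hôpital’s rule.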
Cite this article
Gascuel, O., Steel, M. A ‘Stochastic Safety Radius’ for Distance-Based Tree Reconstruction. Algorithmica 74, 1386–1403 (2016). https://doi.org/10.1007/s00453-015-0005-y
DOI: https://doi.org/10.1007/s00453-015-0005-y