A ‘Stochastic Safety Radius’ for Distance-Based Tree Reconstruction
A variety of algorithms have been proposed for reconstructing trees that show the evolutionary relationships between species by comparing differences in genetic data across present-day species. If the leaf-to-leaf distances in a tree can be accurately estimated, then it is possible to reconstruct this tree from these estimated distances, using polynomial-time methods such as the popular ‘Neighbor-Joining’ algorithm. There is a precise combinatorial condition under which distance-based methods are guaranteed to return a correct tree (in full or in part), based on the requirement that the input distances all lie within some ‘safety radius’ of the true distances. Here, we explore a stochastic analogue of this condition, and mathematically establish upper and lower bounds on this ‘stochastic safety radius’ for distance-based tree reconstruction methods. Using simulations, we show how this notion provides a new way to compare the performance of distance-based tree reconstruction methods. This may help explain why Neighbor-Joining performs so well: its stochastic safety radius appears close to optimal, while its more classical safety radius is the same as that of many other, less accurate methods.
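To make the classical ‘safety radius’ condition concrete, the following is a minimal Python sketch on a four-leaf tree. It is not taken from the paper: the function names, the example edge lengths, and the noise values are all illustrative assumptions. The default value s = 0.5 is the well-known l-infinity safety radius of Neighbor-Joining, which the abstract does not state explicitly. The sketch checks whether every estimated distance lies within s times the shortest internal edge of its true value, and then applies the four-point method (a simple stand-in for a full distance-based method) to see whether the correct split is recovered.

```python
def within_safety_radius(true_d, est_d, min_internal_edge, s=0.5):
    """Return True if max |est - true| < s * min_internal_edge.

    s = 0.5 is an assumption here (the classical l-infinity radius of NJ).
    """
    max_err = max(abs(est_d[pair] - true_d[pair]) for pair in true_d)
    return max_err < s * min_internal_edge


def quartet_split(d):
    """Four-point method: choose the pairing with the smallest distance sum."""
    sums = {
        "ab|cd": d[("a", "b")] + d[("c", "d")],
        "ac|bd": d[("a", "c")] + d[("b", "d")],
        "ad|bc": d[("a", "d")] + d[("b", "c")],
    }
    return min(sums, key=sums.get)


# Hypothetical quartet tree ab|cd: pendant edges of length 1.0,
# one internal edge of length 0.4.
INTERNAL_EDGE = 0.4
true_d = {
    ("a", "b"): 2.0, ("c", "d"): 2.0,
    ("a", "c"): 2.4, ("a", "d"): 2.4,
    ("b", "c"): 2.4, ("b", "d"): 2.4,
}


def perturb(eps):
    """Worst-case style noise: inflate within-cherry distances by eps,
    deflate across-cherry distances by eps."""
    cherries = {("a", "b"), ("c", "d")}
    return {p: v + (eps if p in cherries else -eps) for p, v in true_d.items()}


small = perturb(0.15)   # error below 0.5 * 0.4 = 0.2
large = perturb(0.25)   # error above 0.2

print(within_safety_radius(true_d, small, INTERNAL_EDGE), quartet_split(small))
print(within_safety_radius(true_d, large, INTERNAL_EDGE), quartet_split(large))
```

In this toy setting, the perturbation that stays inside the radius still yields the split ab|cd, whereas the larger perturbation falls outside the radius and the four-point method no longer selects the true split. A stochastic analogue would replace the fixed worst-case noise with random errors and ask how often the correct tree is returned.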
Keywords: Tree Reconstruction; Robustness to random error
MS thanks the Allan Wilson Centre and the NZ Marsden Fund for supporting this work. We thank the two anonymous reviewers for a number of helpful suggestions.