Abstract
We present the first sub-quadratic time algorithm that, with high probability, correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are necessary. Other fast heuristics have been developed for building trees from large alignments [18,1], but they lack theoretical performance guarantees. Our new algorithm runs in \(O(n^{1+\gamma(g)} \log^2 n)\) time, where γ is an increasing function of g, an upper bound on the branch lengths in the phylogeny; g must be below \(1/2-\sqrt{1/8} \approx 0.15\), and γ(g) < 1 for all g. For phylogenies with very short branches, the running time of our algorithm is near-linear. For example, if all branches have mutation probability less than 0.02, the running time of our algorithm is roughly \(O(n^{1.2} \log^2 n)\). Our preliminary experiments show that many large phylogenies can be reconstructed more accurately than allowed by current methods, in comparable running times.
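To give a feel for the bounds quoted above, the following minimal sketch evaluates the stated limit on g and compares the growth of n^1.2 log² n against a quadratic n² for a few input sizes. It is only an illustration of the arithmetic, not part of the authors' algorithm: the function names are hypothetical, constant factors hidden by the O-notation are ignored, and base-2 logarithms are an assumption (the base does not affect the asymptotic bound).

```python
import math

# Limit on the branch-length bound g, as stated in the abstract: 1/2 - sqrt(1/8).
G_LIMIT = 0.5 - math.sqrt(1.0 / 8.0)


def subquadratic_cost(n: int, exponent: float) -> float:
    """Return n**exponent * (log2 n)**2, ignoring constant factors (assumption)."""
    return n ** exponent * math.log2(n) ** 2


def speedup_over_quadratic(n: int, exponent: float = 1.2) -> float:
    """Ratio of a plain n**2 cost to the sub-quadratic cost above.

    The default exponent 1.2 is the abstract's example for branches with
    mutation probability below 0.02.
    """
    return n ** 2 / subquadratic_cost(n, exponent)


if __name__ == "__main__":
    print(f"g must be below {G_LIMIT:.4f}")
    for n in (10**3, 10**4, 10**5, 10**6):
        print(f"n = {n:>8}: n^2 / (n^1.2 log^2 n) = {speedup_over_quadratic(n):.1f}")
```

Because the hidden constants are unknown, these ratios indicate only how the gap widens with n, not actual running times.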
References
Brown, D.G., Truszkowski, J.: Towards a Practical O(n log n) Phylogeny Algorithm. In: Przytycka, T.M., Sagot, M.-F. (eds.) WABI 2011. LNCS (LNBI), vol. 6833, pp. 14–25. Springer, Heidelberg (2011)
Brown, D.G., Truszkowski, J.: Towards a practical O(n log n) phylogeny algorithm. Algorithms for Molecular Biology (special issue on selected papers from WABI 2011) (submitted, 2012)
Buhler, J., Tompa, M.: Finding motifs using random projections. J. Comp. Biol. 9(2), 225–242 (2002)
Csűrös, M.: Fast recovery of evolutionary trees with thousands of nodes. J. Comp. Biol. 9(2), 277–297 (2002)
Daskalakis, C., Mossel, E., Roch, S.: Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep. In: Batzoglou, S. (ed.) RECOMB 2009. LNCS, vol. 5541, pp. 451–465. Springer, Heidelberg (2009)
Daskalakis, C., Mossel, E., Roch, S.: Evolutionary trees and the Ising model on the Bethe lattice: a proof of Steel’s conjecture (July 27, 2005), http://arxiv.org/abs/math/0509575
Elias, I., Lagergren, J.: Fast Neighbor Joining. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1263–1274. Springer, Heidelberg (2005)
Erdös, P.L., Steel, M.A., Székely, L.A., Warnow, T.: A few logs suffice to build (almost) all trees: Part II. Theor. Comput. Sci. 221(1-2), 77–118 (1999)
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., Andersen, G.L.: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006)
Evans, W., Kenyon, C., Peres, Y., Schulman, L.J.: Broadcasting on trees and the Ising model. The Annals of Applied Probability 10(2), 410–433 (2000)
Felsenstein, J.: Inferring Phylogenies. Sinauer (2001)
Gronau, I., Moran, S., Snir, S.: Fast and reliable reconstruction of phylogenetic trees with very short edges. In: Proceedings of SODA 2008, pp. 379–388 (2008)
Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In: Proceedings of STOC 1998, New York, pp. 604–613 (1998)
King, V., Zhang, L., Zhou, Y.: On the complexity of distance-based evolutionary tree reconstruction. In: Proceedings of SODA 2003, pp. 444–453 (2003)
Liu, K., Raghavan, S., Nelesen, S., Linder, C.R., Warnow, T.: Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324(5934), 1561–1564 (2009)
Mihaescu, R., Hill, C., Rao, S.: Fast phylogeny reconstruction through learning of ancestral sequences (December 08, 2008), http://arxiv.org/abs/0812.1587
Mossel, E.: Phase transitions in phylogeny. Trans. Amer. Math. Soc. 356, 2379–2404 (2004)
Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree: Computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26(7), 1641–1650 (2009)
Zhang, L., Shen, J., Yang, J., Li, G.: Analyzing the Fitch method for reconstructing ancestral states on ultrametric phylogenetic trees. Bulletin of Mathematical Biology 72, 1760–1782 (2010)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brown, D.G., Truszkowski, J. (2012). Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science, vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_2
DOI: https://doi.org/10.1007/978-3-642-33122-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33121-3
Online ISBN: 978-3-642-33122-0
eBook Packages: Computer Science, Computer Science (R0)