Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7534)

Abstract

We present the first sub-quadratic time algorithm that, with high probability, correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are necessary. Other fast heuristics have been developed for building trees from large alignments [18,1], but they lack theoretical performance guarantees. Our new algorithm runs in \(O(n^{1+\gamma(g)}\log^2 n)\) time, where γ is an increasing function of g, an upper bound on the branch lengths in the phylogeny. The bound g must be below \(1/2-\sqrt{1/8} \approx 0.15\), and \(\gamma(g) < 1\) for all such g. For phylogenies with very short branches, the running time of our algorithm is near-linear. For example, if all branches have mutation probability less than 0.02, the running time of our algorithm is roughly \(O(n^{1.2}\log^2 n)\). Our preliminary experiments show that many large phylogenies can be reconstructed more accurately than allowed by current methods, in comparable running times.
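The abstract does not describe the algorithm itself. As a rough illustration of the general idea behind locality-sensitive hashing for Hamming distance (bit sampling, in the style of [13]), and not of the authors' method, the sketch below hashes each sequence by the characters at a few randomly sampled positions, so that similar sequences tend to land in the same bucket without comparing all pairs. The function name `lsh_candidate_pairs` and its parameters `k`, `num_tables`, and `seed` are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


def lsh_candidate_pairs(seqs, k, num_tables, seed=0):
    """Return index pairs (i, j), i < j, that share a bucket in some table.

    Hypothetical helper for illustration only. Each table keys a sequence
    by its characters at k randomly sampled positions; sequences that
    agree on all k positions fall into the same bucket.
    """
    rng = random.Random(seed)
    length = len(seqs[0])  # assumes equal-length (aligned) sequences
    candidates = set()
    for _ in range(num_tables):
        positions = rng.sample(range(length), k)
        buckets = defaultdict(list)
        for idx, s in enumerate(seqs):
            key = "".join(s[p] for p in positions)
            buckets[key].append(idx)
        # Only sequences inside the same bucket are ever paired up.
        for members in buckets.values():
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    candidates.add((members[a], members[b]))
    return candidates


# Sequences 0 and 1 are identical; sequence 2 differs from them at every
# position, so it can never share a bucket with either.
seqs = ["ACGT" * 5, "ACGT" * 5, "TGCA" * 5]
print(lsh_candidate_pairs(seqs, k=3, num_tables=4))  # prints {(0, 1)}
```

In a sketch like this, a larger `k` makes collisions between dissimilar sequences rarer, while more tables raise the chance that genuinely close sequences collide at least once. How the paper itself tunes or uses such hashing is not stated in the abstract.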

References

  1. Brown, D.G., Truszkowski, J.: Towards a Practical O(n log n) Phylogeny Algorithm. In: Przytycka, T.M., Sagot, M.-F. (eds.) WABI 2011. LNCS (LNBI), vol. 6833, pp. 14–25. Springer, Heidelberg (2011)

  2. Brown, D.G., Truszkowski, J.: Towards a practical O(n log n) phylogeny algorithm. Algorithms for Molecular Biology (special issue on selected papers from WABI 2011) (submitted, 2012)

  3. Buhler, J., Tompa, M.: Finding motifs using random projections. J. Comp. Biol. 9(2), 225–242 (2002)

  4. Csűrös, M.: Fast recovery of evolutionary trees with thousands of nodes. J. Comp. Biol. 9(2), 277–297 (2002)

  5. Daskalakis, C., Mossel, E., Roch, S.: Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep. In: Batzoglou, S. (ed.) RECOMB 2009. LNCS, vol. 5541, pp. 451–465. Springer, Heidelberg (2009)

  6. Daskalakis, C., Mossel, E., Roch, S.: Evolutionary trees and the Ising model on the Bethe lattice: a proof of Steel’s conjecture (July 27, 2005), http://arxiv.org/abs/math/0509575

  7. Elias, I., Lagergren, J.: Fast Neighbor Joining. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1263–1274. Springer, Heidelberg (2005)

  8. Erdös, P.L., Steel, M.A., Székely, L.A., Warnow, T.: A few logs suffice to build (almost) all trees: Part II. Theor. Comput. Sci. 221(1-2), 77–118 (1999)

  9. DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., Andersen, G.L.: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006)

  10. Evans, W., Kenyon, C., Peres, Y., Schulman, L.J.: Broadcasting on trees and the Ising model. The Annals of Applied Probability 10(2), 410–433 (2000)

  11. Felsenstein, J.: Inferring Phylogenies. Sinauer (2001)

  12. Gronau, I., Moran, S., Snir, S.: Fast and reliable reconstruction of phylogenetic trees with very short edges. In: Proceedings of SODA 2008, pp. 379–388 (2008)

  13. Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In: Proceedings of STOC 1998, New York, pp. 604–613 (1998)

  14. King, V., Zhang, L., Zhou, Y.: On the complexity of distance-based evolutionary tree reconstruction. In: Proceedings of SODA 2003, pp. 444–453 (2003)

  15. Liu, K., Raghavan, S., Nelesen, S., Linder, C.R., Warnow, T.: Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324(5934), 1561–1564 (2009)

  16. Mihaescu, R., Hill, C., Rao, S.: Fast phylogeny reconstruction through learning of ancestral sequences (December 08, 2008), http://arxiv.org/abs/0812.1587

  17. Mossel, E.: Phase transitions in phylogeny. Trans. Amer. Math. Soc. 356, 2379–2404 (2004)

  18. Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree: Computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26(7), 1641–1650 (2009)

  19. Zhang, L., Shen, J., Yang, J., Li, G.: Analyzing the Fitch method for reconstructing ancestral states on ultrametric phylogenetic trees. Bulletin of Mathematical Biology 72, 1760–1782 (2010)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brown, D.G., Truszkowski, J. (2012). Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science, vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_2

  • DOI: https://doi.org/10.1007/978-3-642-33122-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33121-3

  • Online ISBN: 978-3-642-33122-0

  • eBook Packages: Computer Science, Computer Science (R0)
