Accelerating the Neighbor-Joining Algorithm Using the Adaptive Bucket Data Structure

  • Leonid Zaslavsky
  • Tatiana A. Tatusova
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4983)

Abstract

The complexity of the neighbor-joining method is determined by the complexity of the search for an optimal pair ("neighbors to join"), which is performed globally at each iteration. Accelerating the method therefore requires a smarter search for an optimal pair of neighbors, one that avoids re-evaluating all possible pairs of points at each iteration.
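The cost of the global search can be seen in a minimal Python sketch of the exhaustive baseline. It is an illustration only, not the authors' code. It assumes the selection criterion in its Studier-Keppler form, δ_ij = u_i + u_j − d_ij, with u_i taken to be the sum of distances from node i to the other active nodes divided by (n − 2), and it assumes the pair with the largest δ_ij is the one joined. The function name `best_pair_exhaustive` and the arguments `d` (a symmetric distance matrix) and `active` (the current node labels) are hypothetical.

```python
from itertools import combinations

def best_pair_exhaustive(d, active):
    """Scan every active pair; return (delta, i, j) for the largest delta.

    Assumed criterion: delta_ij = u_i + u_j - d_ij, to be maximized.
    Requires at least three active nodes.
    """
    n = len(active)
    # Assumed definition: u_i = (sum of distances from i to the others) / (n - 2)
    u = {i: sum(d[i][k] for k in active if k != i) / (n - 2) for i in active}
    # All n(n-1)/2 pairs are re-evaluated, which is the quadratic step
    # that an accelerated search tries to avoid repeating at every iteration.
    return max((u[i] + u[j] - d[i][j], i, j) for i, j in combinations(active, 2))
```

Calling this once per iteration over n − 3 joins gives the cubic overall cost usually attributed to the basic method.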

We developed an acceleration technique for the neighbor-joining method that significantly decreases complexity for important applications without any change to the method itself. The technique uses the bucket data structure. Pairs of nodes are arranged in buckets according to the values of the goal function δ_ij = u_i + u_j − d_ij, and the buckets are adaptively re-arranged after each neighbor-joining step. Pairs in the top bucket are re-evaluated at every iteration, whereas pairs in lower buckets are accessed less often, only when the algorithm determines from the new values of δ_ij that the elements of a bucket need to be re-evaluated. As a result, only a small portion of the candidate pairs is examined at each iteration.
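The bucket arrangement described above can be sketched roughly as follows in Python. This is a simplified, single-iteration illustration under stated assumptions, not the authors' algorithm. It uses fixed-width buckets over the current range of δ_ij, takes u_i to be the sum of distances from node i divided by (n − 2), and treats the largest δ_ij as optimal. The adaptive re-arrangement and the rule for deciding when a lower bucket must be re-evaluated are not specified in the abstract and are not reproduced here. The names `build_buckets`, `best_pair_from_top_bucket`, and `num_buckets` are hypothetical.

```python
from itertools import combinations

def build_buckets(d, active, num_buckets=4):
    """Group node pairs into buckets by delta_ij; bucket 0 holds the largest values."""
    n = len(active)
    # Assumed definition: u_i = (sum of distances from i to the others) / (n - 2)
    u = {i: sum(d[i][k] for k in active if k != i) / (n - 2) for i in active}
    deltas = {(i, j): u[i] + u[j] - d[i][j] for i, j in combinations(active, 2)}
    lo, hi = min(deltas.values()), max(deltas.values())
    # Fall back to a width of 1.0 when all deltas are equal (zero range).
    width = (hi - lo) / num_buckets or 1.0
    buckets = [[] for _ in range(num_buckets)]
    for pair, delta in deltas.items():
        # Clamp so the smallest delta lands in the last bucket.
        idx = min(int((hi - delta) / width), num_buckets - 1)
        buckets[idx].append((delta, pair))
    return buckets

def best_pair_from_top_bucket(buckets):
    """Examine only the first non-empty bucket to find the best pair."""
    top = next(b for b in buckets if b)
    return max(top)
```

In this sketch the search touches only the pairs in the top bucket. A full implementation would also have to update u_i and δ_ij after each join and move pairs between buckets, which is where the adaptive strategy of the paper comes in.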

The algorithm is cache efficient, since the bucket data structures are able to exploit locality and adjust to cache properties.

Keywords

neighbor-joining algorithm, bucket data structure, adaptive, cache-efficient

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Leonid Zaslavsky (1)
  • Tatiana A. Tatusova (1)
  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
