Large-Scale Neighbor-Joining with NINJA

  • Conference paper
Algorithms in Bioinformatics (WABI 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5724)

Abstract

Neighbor-joining is a well-established hierarchical clustering algorithm for inferring phylogenies. It begins with observed distances between pairs of sequences, and the clustering order depends on a metric related to those distances. The canonical algorithm requires O(n^3) time and O(n^2) space for n sequences, which precludes its application to very large sequence families, e.g. those containing 100,000 sequences. Datasets of this size are available today, and such phylogenies will play an increasingly important role in comparative biology studies. Recent algorithmic advances have greatly sped up neighbor-joining for inputs of thousands of sequences, but are limited to fewer than 13,000 sequences on a system with 4 GB of RAM. In this paper, I describe an algorithm that speeds up neighbor-joining by dramatically reducing the number of distance values that are viewed in each iteration of the clustering procedure, while still computing a correct neighbor-joining tree. This algorithm can scale to inputs larger than 100,000 sequences because of external-memory-efficient data structures. A free implementation may be obtained from http://nimbletwist.com/software/ninja
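
For orientation, the canonical procedure that the abstract takes as its baseline can be sketched as below. This is a minimal illustration of the textbook algorithm, using the standard selection criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j. It is not NINJA's reduced search or its external-memory data structures, which this page does not describe. The function name `neighbor_joining`, the input layout, and the `node0`, `node1`, ... labels are assumptions made for the example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Canonical O(n^3) neighbor-joining on a symmetric distance matrix.

    names: list of leaf labels; dist: list of lists of pairwise distances.
    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    # Dict-of-dicts so clusters can be added and retired by label (O(n^2) space).
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    edges = []
    next_id = 0  # counter for hypothetical internal-node labels

    while len(active) > 2:
        n = len(active)
        # Row sums r_i = sum over active k of d(i, k).
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Full scan of all pairs for the minimum Q(i, j); this O(n^2) scan,
        # repeated over n - 2 iterations, is the source of the O(n^3) cost.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the joined pair to the new internal node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"node{next_id}"
        next_id += 1
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to every remaining cluster.
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]

    # Connect the last two clusters with a single edge.
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges
```

Calling `neighbor_joining(["a", "b", "c"], [[0, 2, 3], [2, 0, 3], [3, 3, 0]])` returns the three edges of the only possible unrooted tree on three leaves. The sketch keeps the whole matrix in memory: for 100,000 sequences that is 10^10 entries, or roughly 40 GB assuming 4-byte values, which illustrates why the abstract points to external-memory data structures for inputs of that size.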

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wheeler, T.J. (2009). Large-Scale Neighbor-Joining with NINJA. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_31

  • DOI: https://doi.org/10.1007/978-3-642-04241-6_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04240-9

  • Online ISBN: 978-3-642-04241-6

  • eBook Packages: Computer Science, Computer Science (R0)
