Large-Scale Neighbor-Joining with NINJA

  • Travis J. Wheeler
Conference paper

DOI: 10.1007/978-3-642-04241-6_31

Volume 5724 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Wheeler T.J. (2009) Large-Scale Neighbor-Joining with NINJA. In: Salzberg S.L., Warnow T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg

Abstract

Neighbor-joining is a well-established hierarchical clustering algorithm for inferring phylogenies. It begins with observed distances between pairs of sequences, and clustering order depends on a metric related to those distances. The canonical algorithm requires O(n^3) time and O(n^2) space for n sequences, which precludes application to very large sequence families, e.g. those containing 100,000 sequences. Datasets of this size are available today, and such phylogenies will play an increasingly important role in comparative biology studies. Recent algorithmic advances have greatly sped up neighbor-joining for inputs of thousands of sequences, but are limited to fewer than 13,000 sequences on a system with 4 GB RAM. In this paper, I describe an algorithm that speeds up neighbor-joining by dramatically reducing the number of distance values that are viewed in each iteration of the clustering procedure, while still computing a correct neighbor-joining tree. This algorithm can scale to inputs larger than 100,000 sequences because of external-memory-efficient data structures. A free implementation may be obtained from http://nimbletwist.com/software/ninja
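
For orientation, the canonical procedure that the abstract contrasts against can be sketched as follows. This is a minimal illustration of textbook neighbor-joining (topology only, no branch lengths), not NINJA's filtered search or its external-memory data structures. The function name `neighbor_joining`, the `frozenset`-keyed distance dictionary, and the four-taxon example distances are assumptions made for this sketch. Each iteration scans every pair of active nodes, which is the source of the O(n^3) running time.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Canonical neighbor-joining; returns the topology as nested tuples.

    names: list of leaf labels (illustrative input format)
    dist:  dict mapping frozenset({a, b}) -> pairwise distance
    """
    nodes = list(names)
    d = dict(dist)  # work on a copy so the caller's distances are untouched
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: sum of distances from node i to every other active node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Full scan over all pairs for the minimum of
        # Q(i, j) = (n - 2) * d(i, j) - r[i] - r[j].
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = (i, j)  # the merged cluster is represented as a tuple
        # Distance from the merged node to each remaining node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))]
                    + d[frozenset((j, k))]
                    - d[frozenset((i, j))]
                )
        nodes = [k for k in nodes if k != i and k != j] + [new]
    return tuple(nodes)


# Hypothetical additive distances for a four-taxon tree ((A,B),(C,D)).
example = {
    frozenset(("A", "B")): 3, frozenset(("A", "C")): 8,
    frozenset(("A", "D")): 9, frozenset(("B", "C")): 9,
    frozenset(("B", "D")): 10, frozenset(("C", "D")): 9,
}
tree = neighbor_joining(["A", "B", "C", "D"], example)
```

The full pairwise scan in each iteration is the step the abstract describes as being reduced: fewer distance values are examined per iteration while the same tree is produced.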

Keywords

Phylogeny inference, Neighbor joining, External memory

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Travis J. Wheeler (1)
  1. Department of Computer Science, The University of Arizona, Tucson, USA