Fast and sensitive protein alignment using DIAMOND

  • Brief Communication

From Nature Methods

Abstract

The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.

Figure 1: Comparison of DIAMOND and RAPSearch2 against BLASTX for four sequencing technologies and for ORFs predicted from a bacterial assembly.

Accession codes

Accessions

Sequence Read Archive

Acknowledgements

This research was partially supported by the National Research Foundation and Ministry of Education Singapore under its Research Centre of Excellence Programme, and by the A*STAR Computational Resource Centre through the use of its high-performance computing facilities.

Author information

Contributions

B.B. designed and implemented the algorithm. C.X. performed the experimental study. C.X. and D.H.H. initiated and guided the project. D.H.H. and B.B. wrote the manuscript.

Corresponding authors

Correspondence to Benjamin Buchfink or Daniel H Huson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Spaced seeds.

(a) The four seed shapes of weight 12 that DIAMOND uses by default. Ones and zeros indicate positions to use and ignore, respectively. (b) Illustration of the application of a spaced seed to match letters between a reference and a query sequence.
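The way a spaced seed is applied, as illustrated in panel (b), can be sketched in a few lines of Python. This is a minimal sketch, not DIAMOND's implementation: the shape `EXAMPLE_SHAPE` and the function names are hypothetical, and the shape is only an arbitrary pattern of weight 12, not one of the four default shapes shown in panel (a).

```python
# Hypothetical seed shape of weight 12 (twelve '1' positions, three '0's).
# Chosen for illustration only; not claimed to be one of DIAMOND's defaults.
EXAMPLE_SHAPE = "111101011101111"


def seed_weight(shape: str) -> int:
    """Number of positions the shape actually compares."""
    return shape.count("1")


def apply_shape(seq: str, pos: int, shape: str) -> str:
    """Return the letters of seq at the '1' positions of shape, starting at pos."""
    window = seq[pos:pos + len(shape)]
    if len(window) < len(shape):
        raise ValueError("shape does not fit at this position")
    return "".join(letter for letter, bit in zip(window, shape) if bit == "1")


def spaced_seed_hit(query: str, q_pos: int, ref: str, r_pos: int, shape: str) -> bool:
    """True if query and reference agree at every '1' position of the shape."""
    return apply_shape(query, q_pos, shape) == apply_shape(ref, r_pos, shape)


# Toy sequences that differ only at the three ignored ('0') positions.
query = "MKVLAAGIVGLSQAR"
ref = "MKVLTASIVGMSQAR"
hit = spaced_seed_hit(query, 0, ref, 0, EXAMPLE_SHAPE)
```

In this toy case the two sequences share no exact run of 12 letters, yet the spaced seed still reports a hit because all mismatches fall on ignored positions.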

Supplementary Figure 2 Ratio of main memory accesses.

The ratio K/K’ as a function of the total length of the query sequences, for different seed lengths. The variables K and K’ represent the approximate number of main memory accesses required when using a single index or double index, respectively.
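To make the single-index versus double-index contrast concrete, the following is a minimal sketch under a stated assumption: that double indexing means seeds are extracted from both the queries and the references, both seed lists are sorted, and the two lists are then traversed together in one linear pass. The function names are hypothetical, contiguous k-mers stand in for spaced seeds, and the sketch does not compute K or K’.

```python
from typing import Dict, List, Tuple

SeedEntry = Tuple[str, str, int]      # (seed, sequence name, offset)
Hit = Tuple[str, int, str, int]       # (query name, offset, reference name, offset)


def build_index(seqs: Dict[str, str], k: int) -> List[SeedEntry]:
    """List every length-k seed with its location, sorted by seed."""
    entries = [
        (seq[i:i + k], name, i)
        for name, seq in seqs.items()
        for i in range(len(seq) - k + 1)
    ]
    entries.sort()
    return entries


def merge_join(q_index: List[SeedEntry], r_index: List[SeedEntry]) -> List[Hit]:
    """Walk both sorted seed lists in step and report all matching seed pairs."""
    hits: List[Hit] = []
    i = j = 0
    while i < len(q_index) and j < len(r_index):
        q_seed, r_seed = q_index[i][0], r_index[j][0]
        if q_seed < r_seed:
            i += 1
        elif q_seed > r_seed:
            j += 1
        else:
            # Find the run of equal seeds in each list, then pair them up.
            i_end = i
            while i_end < len(q_index) and q_index[i_end][0] == q_seed:
                i_end += 1
            j_end = j
            while j_end < len(r_index) and r_index[j_end][0] == q_seed:
                j_end += 1
            for _, q_name, q_off in q_index[i:i_end]:
                for _, r_name, r_off in r_index[j:j_end]:
                    hits.append((q_name, q_off, r_name, r_off))
            i, j = i_end, j_end
    return hits


# Toy data: one query and two references, seed length 3.
queries = {"q1": "MKVLA"}
references = {"r1": "AMKVL", "r2": "KVLAG"}
seed_hits = merge_join(build_index(queries, 3), build_index(references, 3))
```

Because both lists are sorted, matching seeds sit next to each other and are read sequentially, whereas a single reference index is typically probed once per query seed at an unpredictable location.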

Supplementary Figure 3 PCoA analysis of 12 permafrost samples based on a subset of 6 million reads.

BLASTX results are shown on the left, (a) and (c). DIAMOND-fast results are shown on the right, (b) and (d). The upper two panels show the first and second principal coordinates, whereas the lower two panels show the first and third principal coordinates.

Source data

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–3 (PDF 523 kb)

Supplementary Software

DIAMOND v0.4.7 source code (ZIP 2737 kb)

About this article

Cite this article

Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015). https://doi.org/10.1038/nmeth.3176

  • DOI: https://doi.org/10.1038/nmeth.3176

  • Springer Nature America, Inc.
