Fast and sensitive protein alignment using DIAMOND

Buchfink, Benjamin; Xie, Chao; Huson, Daniel H

doi:10.1038/nmeth.3176

Brief Communication
Published: 17 November 2014

Volume 12, pages 59–60 (2015)
Benjamin Buchfink¹, Chao Xie²,³ & Daniel H Huson¹,²

Abstract

The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.

**Figure 1: Comparison of DIAMOND and RAPSearch2 against BLASTX for four sequencing technologies and for ORFs predicted from a bacterial assembly.**

Accession codes

Accessions

Sequence Read Archive

Acknowledgements

This research was partially supported by the National Research Foundation and Ministry of Education Singapore under its Research Centre of Excellence Programme, and by the A*STAR Computational Resource Centre through the use of its high-performance computing facilities.

Author information

Authors and Affiliations

Department of Computer Science and Center for Bioinformatics, University of Tübingen, Tübingen, Germany
Benjamin Buchfink & Daniel H Huson
Singapore Centre on Environmental Life Sciences Engineering, School of Biological Sciences, Nanyang Technological University, Singapore
Chao Xie & Daniel H Huson
Life Sciences Institute, National University of Singapore, Singapore
Chao Xie

Contributions

B.B. designed and implemented the algorithm. C.X. performed the experimental study. C.X. and D.H.H. initiated and guided the project. D.H.H. and B.B. wrote the manuscript.

Corresponding authors

Correspondence to Benjamin Buchfink or Daniel H Huson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Spaced seeds.

(a) The four seed shapes of weight 12 that DIAMOND uses by default. Ones and zeros indicate positions to use and ignore, respectively. (b) Illustration of the application of a spaced seed to match letters between a reference and a query sequence.
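
The way a spaced seed selects letters, as described in the caption above, can be sketched in a few lines of Python. This is a minimal illustration only, not DIAMOND's implementation: the shape `1101011` (weight 5) and the two short sequences are hypothetical examples chosen for readability, not one of the four weight-12 shapes shown in the figure, and the helper names `apply_seed` and `seed_hits` are invented here.

```python
from typing import List, Optional, Tuple


def apply_seed(shape: str, seq: str, pos: int) -> Optional[str]:
    """Return the letters of seq picked out by shape when placed at offset pos.

    A '1' in the shape means "use this position", a '0' means "ignore it".
    Returns None if the shape does not fit inside the sequence at pos.
    """
    if pos < 0 or pos + len(shape) > len(seq):
        return None
    return "".join(seq[pos + i] for i, bit in enumerate(shape) if bit == "1")


def seed_hits(shape: str, reference: str, query: str) -> List[Tuple[int, int]]:
    """List (reference offset, query offset) pairs whose selected letters agree."""
    # Index every placement of the shape on the reference by its selected letters.
    index = {}
    for r in range(len(reference) - len(shape) + 1):
        index.setdefault(apply_seed(shape, reference, r), []).append(r)
    # Look up every placement of the shape on the query.
    hits = []
    for q in range(len(query) - len(shape) + 1):
        for r in index.get(apply_seed(shape, query, q), []):
            hits.append((r, q))
    return hits


# Hypothetical example: the two sequences differ at offsets 2 and 4,
# which are exactly the positions the shape ignores.
shape = "1101011"
reference = "MKVLAAGIT"
query = "MKALGAGIT"
print(shape.count("1"))                     # weight of the seed
print(seed_hits(shape, reference, query))   # spaced seed placements that match
print(seed_hits("11111", reference, query)) # contiguous seed of the same weight
```

In this toy case the spaced seed finds a match at offset 0 of both sequences, whereas a contiguous seed of the same weight finds none, because every window of five consecutive letters contains one of the two mismatches.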

Supplementary Figure 2 Ratio of main memory accesses.

The ratio K/K’ as a function of the total length of the query sequences, for different seed lengths. The variables K and K’ represent the approximate number of main memory accesses required when using a single index or double index, respectively.
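
To make the single-index versus double-index distinction concrete, the following Python sketch shows one common reading of double indexing: seeds are extracted from both the queries and the references, both seed lists are sorted, and the two lists are then traversed together in one linear pass. This reading, the use of contiguous k-mers instead of spaced seeds, and the function names `seed_index` and `double_index_hits` are assumptions made for illustration. The sketch does not model the quantities K and K’ and is not DIAMOND's implementation.

```python
from typing import List, Tuple

Seed = Tuple[str, int, int]  # (seed letters, sequence id, offset)


def seed_index(seqs: List[str], k: int) -> List[Seed]:
    """Sorted list of every length-k seed in seqs with its location."""
    return sorted(
        (s[i:i + k], sid, i)
        for sid, s in enumerate(seqs)
        for i in range(len(s) - k + 1)
    )


def double_index_hits(queries: List[str], references: List[str], k: int):
    """Find seed matches by walking two sorted seed lists in step."""
    q_idx = seed_index(queries, k)
    r_idx = seed_index(references, k)
    hits = []
    i = j = 0
    while i < len(q_idx) and j < len(r_idx):
        q_seed, r_seed = q_idx[i][0], r_idx[j][0]
        if q_seed < r_seed:
            i += 1
        elif q_seed > r_seed:
            j += 1
        else:
            # Collect the run of identical seeds in each list, then pair them up.
            i_end = i
            while i_end < len(q_idx) and q_idx[i_end][0] == q_seed:
                i_end += 1
            j_end = j
            while j_end < len(r_idx) and r_idx[j_end][0] == q_seed:
                j_end += 1
            for _, qid, qpos in q_idx[i:i_end]:
                for _, rid, rpos in r_idx[j:j_end]:
                    hits.append((qid, qpos, rid, rpos))
            i, j = i_end, j_end
    return hits


# Hypothetical toy input: one query, two references, seeds of length 3.
for hit in double_index_hits(["ACDEF"], ["XACDE", "CDEFG"], 3):
    print(hit)  # (query id, query offset, reference id, reference offset)
```

Because both lists are visited in sorted order, all locations that share a seed are handled together, which is the access pattern that a comparison of memory accesses between the two indexing schemes is concerned with.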

Supplementary Figure 3 PCoA analysis of 12 permafrost samples based on a subset of 6 million reads.

BLASTX results are shown on the left, (a) and (c). DIAMOND-fast results are shown on the right, (b) and (d). The upper two panels show the first and second principal coordinates, whereas the lower two panels show the first and third principal coordinates.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–3 (PDF 523 kb)

Supplementary Software

DIAMOND v0.4.7 source code (ZIP 2737 kb)

Source data

Source data to Fig. 1

Source data to Supplementary Fig. 2

About this article

Cite this article

Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015). https://doi.org/10.1038/nmeth.3176

Received: 29 April 2014
Accepted: 20 October 2014
Published: 17 November 2014
Issue Date: January 2015
DOI: https://doi.org/10.1038/nmeth.3176
Springer Nature America, Inc.
