AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2016)

Abstract

Speeding up the alignment of DNA reads or assembled contigs against a protein database remains a challenge. The recent tool DIAMOND is significantly faster than BLASTX and RAPSearch while giving a similar degree of sensitivity. Yet for applications such as metagenomics, where large amounts of data are involved, DIAMOND still takes a long time. This paper introduces an even faster protein alignment tool, AC-DIAMOND, which aims to speed up DIAMOND via better SIMD parallelization and more space-efficient indexing of the reference database; the latter allows more queries to be loaded into memory and processed together. Experimental results show that AC-DIAMOND is about 4 times faster than DIAMOND at aligning DNA reads or contigs, while retaining the same sensitivity. For example, the latest assembly of the Iowa prairie soil metagenomic dataset generates over 9 million contigs, with a total size of about 7 Gbp; when aligning these contigs to the protein database NCBI-nr, DIAMOND takes 4 to 5 days, whereas AC-DIAMOND takes about 1 day. AC-DIAMOND is available for testing at http://ac-diamond.sourceforge.net.
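The abstract does not spell out how DNA queries are compared with protein sequences. BLASTX-style search conventionally translates each DNA query in all six reading frames before matching it against the protein database. The following is a minimal sketch of that standard preprocessing step only, not code from AC-DIAMOND or DIAMOND; the function names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each translated frame would then be searched against the protein reference; how AC-DIAMOND indexes the reference and vectorizes that search is described in the full paper, not in this abstract.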

Notes

  1. ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz

  2. http://www.ncbi.nlm.nih.gov/sra/SRX1000158[accn]

References

  1. Buchfink, B., Xie, C., Huson, D.H.: Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12(1), 59–60 (2015)

  2. Ye, Y., Choi, J.H., Tang, H.: RAPSearch: a fast protein similarity search tool for short reads. BMC Bioinform. 12(1), 159 (2011)

  3. Zhao, Y., Tang, H., Ye, Y.: RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics 28(1), 125–126 (2012)

  4. Huson, D.H., Xie, C.: A poor man’s BLASTX-high-throughput metagenomic protein database search using PAUDA. Bioinformatics, btt254 (2013)

  5. Suzuki, S., Kakuta, M., Ishida, T., Akiyama, Y.: Faster sequence homology searches by clustering subsequences. Bioinformatics, btu780 (2015)

  6. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

  7. Li, D., Liu, C.M., Luo, R., Sadakane, K., Lam, T.W.: MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, btv033 (2015)

  8. Howe, A.C., Jansson, J.K., Malfatti, S.A., Tringe, S.G., Tiedje, J.M., Brown, C.T.: Tackling soil diversity with the assembly of large, complex metagenomes. Proc. Nat. Acad. Sci. 111(13), 4904–4909 (2014)

  9. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)

Author information

Corresponding author

Correspondence to Hing-Fung Ting.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mai, H. et al. (2016). AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science, vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_38

  • DOI: https://doi.org/10.1007/978-3-319-31744-1_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31743-4

  • Online ISBN: 978-3-319-31744-1

  • eBook Packages: Computer Science, Computer Science (R0)
