
AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing

  • Huijun Mai
  • Dinghua Li
  • Yifan Zhang
  • Henry Chi-Ming Leung
  • Ruibang Luo
  • Hing-Fung Ting
  • Tak-Wah Lam
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9656)

Abstract

Speeding up the alignment of DNA reads or assembled contigs against a protein database has long been a challenge. The recent tool DIAMOND is significantly faster than BLASTX and RAPSearch while giving a similar degree of sensitivity. Yet for applications like metagenomics, where a large amount of data is involved, DIAMOND still takes a lot of time. This paper introduces an even faster protein alignment tool, called AC-DIAMOND, which attempts to speed up DIAMOND via better SIMD parallelization and more space-efficient indexing of the reference database; the latter allows more queries to be loaded into memory and processed together. Experimental results show that AC-DIAMOND is about 4 times faster than DIAMOND on aligning DNA reads or contigs, while retaining the same sensitivity as DIAMOND. For example, the latest assembly of the Iowa prairie soil metagenomic dataset generates over 9 million contigs, with a total size of about 7 Gbp; when aligning these contigs to the protein database NCBI-nr, DIAMOND takes 4 to 5 days, and AC-DIAMOND takes about 1 day. AC-DIAMOND is available for testing at http://ac-diamond.sourceforge.net.

Keywords

DNA-protein alignment, SIMD, Dynamic programming, Compressed indexing

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Huijun Mai (1)
  • Dinghua Li (1)
  • Yifan Zhang (1)
  • Henry Chi-Ming Leung (1)
  • Ruibang Luo (1, 2, 3)
  • Hing-Fung Ting (1)
  • Tak-Wah Lam (1, 2)

  1. HKU-BGI Bioinformatics Algorithms and Core Technology Laboratory, Department of Computer Science, University of Hong Kong, Hong Kong, China
  2. L3 Bioinformatics Limited, Hong Kong, China
  3. United Electronics Co., Ltd, Beijing, China
