Journal of Signal Processing Systems, Volume 77, Issue 1–2, pp 131–149

Large-Scale Pairwise Alignments on GPU Clusters: Exploring the Implementation Space

  • Huan Truong
  • Da Li
  • Kittisak Sajjapongse
  • Gavin Conant
  • Michela Becchi (Email author)


Several problems in computational biology require all-against-all pairwise comparison of tens of thousands of individual biological sequences. Each such comparison can be performed with the well-known Needleman-Wunsch alignment algorithm. However, with the rapid growth of biological databases, performing all possible comparisons serially with this algorithm becomes extremely time-consuming. The massive computational power of graphics processing units (GPUs) makes them an appealing choice for accelerating these computations, and CPU-GPU clusters can enable all-against-all comparisons on large datasets. In this work, we present four GPU implementations for large-scale pairwise sequence alignment: TiledDScan-mNW, DScan-mNW, RScan-mNW and LazyRScan-mNW. The proposed GPU kernels exhibit different parallelization patterns: we discuss how each parallelization strategy affects memory accesses and the utilization of the underlying GPU hardware. We evaluate our implementations on a variety of low- and high-end GPUs with different compute capabilities. Our results show that all the proposed solutions outperform the existing open-source implementation from the Rodinia Benchmark Suite, and that LazyRScan-mNW is the preferred solution for applications that require the trace-back operation only on a subset of the considered sequence pairs (for example, the pairs whose alignment score exceeds a predefined threshold). Finally, we discuss the integration of the proposed GPU kernels into a hybrid MPI-CUDA framework for deployment on CPU-GPU clusters. In particular, our proposed distributed design targets both homogeneous clusters and heterogeneous clusters whose nodes differ in their hardware configuration.
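To make the workload concrete, the following is a minimal serial sketch of all-against-all Needleman-Wunsch scoring with a score threshold. It is not the paper's GPU code: the function names `nw_score` and `pairs_above_threshold`, the linear gap penalty, and the match/mismatch values are all assumptions chosen for illustration. It shows only the score computation that the proposed kernels parallelize, and how a threshold can select the pairs for which a trace-back would later be needed.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (illustrative scoring values).

    Uses two rows of the dynamic-programming matrix, so only the score
    is returned; no trace-back information is kept.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: gaps only
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)  # column 0: gaps only
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
        prev = curr
    return prev[-1]


def pairs_above_threshold(seqs, threshold):
    """Score every unordered pair; keep those whose score exceeds threshold."""
    hits = []
    for i, j in combinations(range(len(seqs)), 2):
        score = nw_score(seqs[i], seqs[j])
        if score > threshold:
            hits.append((i, j, score))
    return hits
```

For n sequences this loop performs n(n-1)/2 independent alignments, which is why the serial version becomes slow on large datasets and why the pairs lend themselves to parallel execution.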


Keywords: Heterogeneous system; Sequence alignment; GPU



We thank the reviewers for their feedback. This work has been supported by NSF award CNS-1216756 and by equipment donations from Nvidia Corporation.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Huan Truong (1)
  • Da Li (2)
  • Kittisak Sajjapongse (2)
  • Gavin Conant (1, 3)
  • Michela Becchi (1, 2), Email author

  1. MU Informatics Institute, University of Missouri, Columbia, USA
  2. Department of Electrical and Computer Engineering, University of Missouri, Columbia, USA
  3. Division of Animal Sciences, University of Missouri, Columbia, USA
