Large-Scale Pairwise Alignments on GPU Clusters: Exploring the Implementation Space
Several problems in computational biology require all-against-all pairwise comparison of tens of thousands of biological sequences. Each comparison can be performed with the well-known Needleman-Wunsch alignment algorithm. However, with the rapid growth of biological databases, performing all of these comparisons serially with this algorithm becomes extremely time-consuming. The massive computational power of graphics processing units (GPUs) makes them an appealing choice for accelerating these computations, and clusters of CPU-GPU nodes can therefore make all-against-all comparisons feasible on large datasets. In this work, we present four GPU implementations for large-scale pairwise sequence alignment: TiledDScan-mNW, DScan-mNW, RScan-mNW and LazyRScan-mNW. The proposed GPU kernels follow different parallelization patterns, and we discuss how each parallelization strategy affects memory accesses and the utilization of the underlying GPU hardware. We evaluate our implementations on a variety of low- and high-end GPUs with different compute capabilities. Our results show that all the proposed solutions outperform the existing open-source implementation from the Rodinia Benchmark Suite, and that LazyRScan-mNW is the preferred solution for applications that require the trace-back operation only on a subset of the sequence pairs (for example, the pairs whose alignment score exceeds a predefined threshold). Finally, we discuss the integration of the proposed GPU kernels into a hybrid MPI-CUDA framework for deployment on CPU-GPU clusters. In particular, our distributed design targets both homogeneous clusters and heterogeneous clusters whose nodes differ in their hardware configuration.
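To make the parallelization and lazy trace-back ideas concrete, here is a minimal CPU-side sketch: a Needleman-Wunsch score fill in anti-diagonal (wavefront) order, which exposes the cells that could be computed concurrently, followed by a two-pass scheme that traces back only the pairs whose score exceeds a threshold. This is not the authors' CUDA code; the scoring values (`MATCH`, `MISMATCH`, `GAP`), the function names and the two-pass structure are illustrative assumptions, and the actual work split used by the DScan, RScan and LazyRScan kernels may differ.

```python
from itertools import combinations

# Assumed scoring scheme: +1 match, -1 mismatch, -1 per gap (linear gap penalty).
MATCH, MISMATCH, GAP = 1, -1, -1


def nw_matrix(a, b):
    """Fill the Needleman-Wunsch score matrix H in anti-diagonal (wavefront) order.

    H[i][j] depends only on H[i-1][j-1], H[i-1][j] and H[i][j-1], so all cells
    with the same d = i + j are independent and could be computed in parallel.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * GAP
    for j in range(1, m + 1):
        H[0][j] = j * GAP
    for d in range(2, n + m + 1):
        # Inner loop: the cells of one anti-diagonal (e.g. one GPU thread each).
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
    return H


def traceback(a, b, H):
    """Walk back from H[n][m] to H[0][0] to recover one optimal alignment."""
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        diag = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def lazy_all_pairs(seqs, threshold):
    """Score every pair, then trace back only pairs scoring above threshold."""
    # Pass 1: scores only. (This sketch still builds each full matrix; a
    # score-only kernel would need to keep just the last two anti-diagonals.)
    scores = {
        (p, q): nw_matrix(seqs[p], seqs[q])[-1][-1]
        for p, q in combinations(range(len(seqs)), 2)
    }
    # Pass 2: recompute the matrix for the selected pairs and trace back.
    alignments = {}
    for (p, q), score in scores.items():
        if score > threshold:
            H = nw_matrix(seqs[p], seqs[q])
            alignments[(p, q)] = traceback(seqs[p], seqs[q], H)
    return scores, alignments
```

For example, `lazy_all_pairs(["GATTACA", "GATTACA", "TTTT"], threshold=3)` scores all three pairs but stores a trace-back only for the identical pair. Recomputing the matrix in the second pass, instead of keeping every matrix from the first, trades extra computation for a much smaller memory footprint when few pairs pass the threshold.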
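For the heterogeneous-cluster case, one simple way to divide the all-against-all workload is a static split of the pair index space in proportion to each node's relative speed. The sketch below assumes exactly that; the integer node weights, the function names and the static policy are illustrative assumptions, and the paper's MPI-CUDA framework may distribute work differently (for instance, dynamically).

```python
def partition_pairs(num_seqs, node_weights):
    """Split the linear pair indices [0, P) into one contiguous chunk per node.

    P = num_seqs * (num_seqs - 1) / 2, and chunk sizes are proportional to
    node_weights (hypothetical integer throughput ratings, one per node).
    Integer arithmetic guarantees the last chunk ends exactly at P.
    """
    total_pairs = num_seqs * (num_seqs - 1) // 2
    total_weight = sum(node_weights)
    bounds, prefix = [0], 0
    for w in node_weights:
        prefix += w
        bounds.append(total_pairs * prefix // total_weight)
    return [(bounds[k], bounds[k + 1]) for k in range(len(node_weights))]


def pair_from_index(k, n):
    """Map a linear index k to the pair (p, q), p < q, in the order
    (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1)."""
    p = 0
    while k >= n - 1 - p:
        k -= n - 1 - p
        p += 1
    return p, p + 1 + k
```

With two nodes of equal weight and a third node with twice that weight, `partition_pairs(100, [1, 1, 2])` assigns 1237, 1238 and 2475 of the 4950 pairs; each MPI rank would then iterate over its own index range, using `pair_from_index` to recover the sequence pair. A static split is the simplest option, but it relies on the weights being accurate, which is why a dynamic scheme may suit clusters whose per-node throughput is hard to predict.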
Keywords: Heterogeneous system, Sequence alignment, GPU
We thank the reviewers for their feedback. This work has been supported by NSF award CNS-1216756 and by equipment donations from Nvidia Corporation.