State-of-the-Art in Smith–Waterman Protein Database Search on HPC Platforms

  • Enzo Rucci
  • Carlos García
  • Guillermo Botella
  • Armando De Giusti
  • Marcelo Naiouf
  • Manuel Prieto-Matías


Searching biological sequence databases is a common and repeated task in bioinformatics and molecular biology. The Smith–Waterman algorithm is the most accurate method for this kind of search. Unfortunately, the algorithm is computationally demanding, and the situation has worsened with the exponential growth of biological data in recent years. For that reason, the scientific community has made great efforts to accelerate Smith–Waterman biological database searches on a wide variety of hardware platforms. We survey the state of the art in Smith–Waterman protein database search, focusing on four hardware architectures: central processing units, graphics processing units, field-programmable gate arrays and Xeon Phi coprocessors. After briefly describing each hardware platform, we analyse the temporal evolution, contributions, limitations, experimental work and results of each implementation. Additionally, as energy efficiency is becoming increasingly important, we also survey work on performance and power consumption. Finally, we give our view on the future of Smith–Waterman protein searches, considering the next generations of hardware architectures and their upcoming technologies.


Bioinformatics, Computational acceleration, Database search, Smith–Waterman algorithm, Protein sequence



Enzo Rucci holds a PhD CONICET Fellowship from the Argentinian Government, and this work has been partially supported by Spanish research project TIN 2012-32180.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Enzo Rucci (1)
  • Carlos García (3)
  • Guillermo Botella (3)
  • Armando De Giusti (1)
  • Marcelo Naiouf (2)
  • Manuel Prieto-Matías (3)

  1. Instituto de Investigación en Informática LIDI (III-LIDI), CONICET, Universidad Nacional de La Plata, Buenos Aires, Argentina
  2. Instituto de Investigación en Informática LIDI (III-LIDI), CONICET, Facultad de Informática, Universidad Nacional de La Plata, Buenos Aires, Argentina
  3. Dpto. Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, Spain
