International Journal of Parallel Programming, Volume 47, Issue 2, pp 296–316

SWIMM 2.0: Enhanced Smith–Waterman on Intel’s Multicore and Manycore Architectures Based on AVX-512 Vector Extensions

  • Enzo Rucci
  • Carlos Garcia Sanchez
  • Guillermo Botella Juan
  • Armando De Giusti
  • Marcelo Naiouf
  • Manuel Prieto-Matias


The well-known Smith–Waterman (SW) algorithm is the most commonly used method for local sequence alignment, but its adoption is limited by its computational requirements when searching large protein databases. Although the acceleration of SW has already been studied on many parallel platforms, hardly any studies take advantage of the latest Intel architectures based on AVX-512 vector extensions. This SIMD instruction set is currently supported by Intel’s Knights Landing (KNL) accelerator and Intel’s Skylake (SKL) general-purpose processors. In this paper, we present an SW version that is optimized for both architectures: the renewed SWIMM 2.0. The novelty of this vector instruction set requires revisiting previous programming and optimization techniques. SWIMM 2.0 is based on massive multi-threading and SIMD exploitation. It is competitive in performance with other state-of-the-art implementations, reaching 511 GCUPS on a single KNL node and 734 GCUPS on a server equipped with dual SKL processors. Moreover, these performance rates make SWIMM 2.0 the most energy-efficient implementation in this study, achieving 2.94 GCUPS/Watt on the SKL processor.
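For readers unfamiliar with the GCUPS metric, the following is a minimal scalar sketch of what is being counted: each cell of the SW dynamic-programming matrix is one "cell update", and GCUPS is billions of such updates per second. The function names `sw_score` and `gcups` and the scoring values (match, mismatch, gap penalties) are illustrative assumptions; this is not the vectorized, multi-threaded code of SWIMM 2.0, which scores proteins with substitution matrices.

```python
def sw_score(query, subject, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Scalar Smith-Waterman local alignment score with affine gaps.

    Assumed (hypothetical) scoring: a gap of length k costs
    gap_open + k * gap_extend. Returns the best local score only.
    """
    n = len(subject)
    neg = float("-inf")
    h_prev = [0] * (n + 1)    # H values of the previous row
    f_prev = [neg] * (n + 1)  # best scores ending in a vertical gap
    best = 0
    for q in query:
        h_cur = [0] * (n + 1)
        f_cur = [neg] * (n + 1)
        e = neg               # best score ending in a horizontal gap
        for j in range(1, n + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f_cur[j] = max(f_prev[j] - gap_extend,
                           h_prev[j] - gap_open - gap_extend)
            sub = match if q == subject[j - 1] else mismatch
            # Local alignment: scores are clamped at zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def gcups(query_len, db_residues, seconds):
    """Billions of cell updates per second for one database search."""
    return query_len * db_residues / (seconds * 1e9)
```

Under these assumptions, `sw_score("ACGT", "ACGT")` evaluates to 8, and a query of 1000 residues searched against a database of 2 × 10^9 residues in 4 seconds corresponds to `gcups(1000, 2_000_000_000, 4)`, that is, 500 GCUPS.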


Bioinformatics, Smith–Waterman, Xeon-Phi, Intel-KNL, SIMD, Intel-AVX512



This work has been supported by the EU (FEDER) and the Spanish MINECO, under Grant TIN2015-65277-R and the CAPAP-H6 network (TIN2016-81840-REDT).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. III-LIDI, CONICET, Universidad Nacional de La Plata, Buenos Aires, Argentina
  2. Universidad Complutense de Madrid, Madrid, Spain
  3. III-LIDI, Universidad Nacional de La Plata, Buenos Aires, Argentina
