Using SIMD Instructions to Accelerate Sequence Similarity Searches Inside a Database System

  • Sidath Randeni Kadupitige
  • Uwe Röhm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10837)

Abstract

Database systems are optimised for managing large data sets, but they face difficulties making an impact in the life sciences, where typical use cases involve much more complex analytical algorithms than those found in traditional OLTP or OLAP scenarios. Although many database management systems (DBMS) are extensible via stored procedures to implement transactions or complex algorithms, these stored procedures are usually unable to leverage the inbuilt optimizations provided by the query engine, so other optimization avenues must be explored.

In this paper, we investigate how sequence alignment algorithms, one of the most common operations carried out on a bioinformatics or genomics database, can be efficiently implemented close to the data within an extensible database system. We investigate the use of single instruction, multiple data (SIMD) extensions to accelerate logic inside a DBMS. We also compare this approach to implementations of the same logic outside the DBMS.

Our implementation of an SIMD-accelerated Smith-Waterman sequence-alignment algorithm shows an order of magnitude improvement over a non-accelerated version while running inside a DBMS. Our SIMD-accelerated version also performs with little to no overhead inside the DBMS compared to the same logic running outside the DBMS.

Keywords

Sequence databases, Stored procedures, SIMD acceleration

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. The University of Sydney, Sydney, Australia
