Skip to main content

Using SIMD Instructions to Accelerate Sequence Similarity Searches Inside a Database System

  • Conference paper
  • First Online:
Databases Theory and Applications (ADC 2018)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10837))

Included in the following conference series:

  • 1090 Accesses

Abstract

Database systems are optimised for managing large data sets, but they face difficulties making an impact to life sciences where the typical use cases involve much more complex analytical algorithms than found in traditional OLTP or OLAP scenarios. Although many database management systems (DBMS) are extensible via stored procedures to implement transactions or complex algorithms, these stored procedures are usually unable to leverage the inbuilt optimizations provided by the query engine, so other optimization avenues must be explored.

In this paper, we investigate how sequence alignment algorithms, one of the most common operations carried out on a bioinformatics or genomics database, can be efficiently implemented close to the data within an extensible database system. We investigate the use of single instruction, multiple data (SIMD) extensions to accelerate logic inside an DBMS. We also compare it to implementations of the same logic outside the DBMS.

Our implementation of an SIMD-accelerated Smith Waterman sequence-alignment algorithm shows an order of magnitude improvement on a non-accelerated version while running inside a DBMS. Our SIMD accelerated version also performs with little to no overhead inside the DBMS compared to the same logic running outside the DBMS.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Daily, J.: Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinform. 17(1), 81 (2016)

    Article  MathSciNet  Google Scholar 

  2. Delaney, K., Beauchemin, B., Cunningham, C., Kehayias, J., Randal, P.S., Nevarez, B.: Microsoft SQL Server 2012 Internals. Microsoft Press, Redmond (2013)

    Google Scholar 

  3. Dorr, R.: How It Works: SQL Server 2016 SSE/AVX Support (2016)

    Google Scholar 

  4. Farrar, M.: Striped smith-waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23(2), 156–161 (2006)

    Article  Google Scholar 

  5. Héman, S.: Updating compressed column stores. Ph.D. thesis, Informatics Institute (IVI) (2009)

    Google Scholar 

  6. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. PNAS 89(22), 10915–10919 (1992)

    Article  Google Scholar 

  7. IHGRC: Finishing the euchromatic sequence of the human genome. Nature 431(7011), 931–945 (2004)

    Article  Google Scholar 

  8. Larson, P., Birka, A., Hanson, E.N., Huang, W., Nowakiewicz, M., Papadimos, V.: Real-time analytical processing with SQL server. PVLDB 8(12), 1740–1751 (2015)

    Google Scholar 

  9. Leturgez, L.: SIMD outside and inside Oracle 12c (2015)

    Google Scholar 

  10. Manegold, S., Boncz, P.A., Kersten, M.L.: Optimizing database architecture for the new bottleneck: memory access. VLDB J. 9(3), 231–246 (2000)

    Article  Google Scholar 

  11. Polychroniou, O., Raghavan, A., Ross, K.A.: Rethinking SIMD vectorization for in-memory databases. In: ACM SIGMOD, SIGMOD 2015, pp. 1493–1508. ACM, New York (2015)

    Google Scholar 

  12. Rognes, T.: Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation. BMC Bioinform. 12, 221 (2011)

    Article  Google Scholar 

  13. Rognes, T., Seeberg, E.: Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 16(8), 699–706 (2000)

    Article  Google Scholar 

  14. Röhm, U., Blakeley, J.A.: Data management for high-throughput genomics. In: Fourth Biennial Conference on Innovative Data Systems Research, CIDR 2009, Asilomar, CA, USA, 4–7 January 2009, Online Proceedings (2009)

    Google Scholar 

  15. Röhm, U., Diep, T.-M.: How to BLAST your database — a study of stored procedures for BLAST searches. In: Li Lee, M., Tan, K.-L., Wuwongse, V. (eds.) DASFAA 2006. LNCS, vol. 3882, pp. 807–816. Springer, Heidelberg (2006). https://doi.org/10.1007/11733836_58

    Chapter  Google Scholar 

  16. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)

    Article  Google Scholar 

  17. Sosic, M.: An SIMD dynamic programming C/C++ library. Master’s thesis, University of Zagreb (2015)

    Google Scholar 

  18. Stonebraker, M., Brown, P., Zhang, D., Becla, J.: SciDB: a database management system for applications with complex analytics. Comput. Sci. Eng. 15(3), 54–62 (2013)

    Article  Google Scholar 

  19. Wozniak, A.: Using video-oriented instructions to speed up sequence comparison. Comput. Appl. Biosci. 13(2), 145–150 (1997)

    Google Scholar 

  20. Zhao, M., Lee, W.P., Garrison, E.P., Marth, G.T.: SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS ONE 8(12), e82138 (2013)

    Article  Google Scholar 

  21. Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, 3–6 June 2002, pp. 145–156 (2002)

    Google Scholar 

  22. Żukowski, M.: Balancing vectorized query execution with bandwidth-optimized storage. Ph.D. thesis, Informatics Institute (IVI) (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sidath Randeni Kadupitige .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Randeni Kadupitige, S., Röhm, U. (2018). Using SIMD Instructions to Accelerate Sequence Similarity Searches Inside a Database System. In: Wang, J., Cong, G., Chen, J., Qi, J. (eds) Databases Theory and Applications. ADC 2018. Lecture Notes in Computer Science(), vol 10837. Springer, Cham. https://doi.org/10.1007/978-3-319-92013-9_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-92013-9_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92012-2

  • Online ISBN: 978-3-319-92013-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics