Using SIMD Instructions to Accelerate Sequence Similarity Searches Inside a Database System

Randeni Kadupitige, Sidath; Röhm, Uwe

doi:10.1007/978-3-319-92013-9_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10837)

Included in the following conference series:

Australasian Database Conference

Abstract

Database systems are optimised for managing large data sets, but they have struggled to make an impact in the life sciences, where typical use cases involve much more complex analytical algorithms than those found in traditional OLTP or OLAP scenarios. Although many database management systems (DBMS) are extensible via stored procedures to implement transactions or complex algorithms, these stored procedures usually cannot leverage the built-in optimisations provided by the query engine, so other optimisation avenues must be explored.

In this paper, we investigate how sequence alignment algorithms, among the most common operations carried out on a bioinformatics or genomics database, can be efficiently implemented close to the data within an extensible database system. We investigate the use of single instruction, multiple data (SIMD) extensions to accelerate logic inside a DBMS. We also compare this approach with implementations of the same logic outside the DBMS.

Our implementation of an SIMD-accelerated Smith-Waterman sequence-alignment algorithm shows an order-of-magnitude improvement over a non-accelerated version while running inside a DBMS. Our SIMD-accelerated version also performs with little to no overhead inside the DBMS compared to the same logic running outside the DBMS.
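
For orientation, the following is a minimal scalar sketch of the Smith-Waterman local alignment score, the kind of non-accelerated baseline that the abstract contrasts with SIMD versions. It is not the authors' implementation: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration. SIMD variants compute many of the inner-loop cells with a single instruction instead of one at a time.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Illustrative scalar version with a linear gap penalty; the scoring
    parameters are assumed values, not taken from the paper.
    """
    # prev holds the previous row of the dynamic programming matrix.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            # Clamping at 0 lets an alignment restart anywhere.
            cell = max(0, diag, up, left)
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best
```

Called as `smith_waterman_score("GGACGTCC", "TTACGTAA")`, the function scores the shared substring `ACGT`. Keeping only two rows in memory is a design choice that suits score-only database searches, where the full alignment is rarely needed for every candidate sequence.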

Author information

Authors and Affiliations

The University of Sydney, Sydney, NSW, 2006, Australia
Sidath Randeni Kadupitige & Uwe Röhm

Corresponding author

Correspondence to Sidath Randeni Kadupitige.

Editor information

Editors and Affiliations

ICT, Griffith University, Southport, Queensland, Australia
Junhu Wang
Nanyang Technological University, Singapore, Singapore
Gao Cong
Faculty of Information and Communication Technologies, Swinburne University of Technology, Hawthorn, Victoria, Australia
Jinjun Chen
The University of Melbourne, Melbourne, Victoria, Australia
Jianzhong Qi

About this paper

Cite this paper

Randeni Kadupitige, S., Röhm, U. (2018). Using SIMD Instructions to Accelerate Sequence Similarity Searches Inside a Database System. In: Wang, J., Cong, G., Chen, J., Qi, J. (eds) Databases Theory and Applications. ADC 2018. Lecture Notes in Computer Science, vol 10837. Springer, Cham. https://doi.org/10.1007/978-3-319-92013-9_7

DOI: https://doi.org/10.1007/978-3-319-92013-9_7
Published: 18 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92012-2
Online ISBN: 978-3-319-92013-9
eBook Packages: Computer Science, Computer Science (R0)
