Abstract
Grouped aggregation is a commonly used analytical function. Common implementations of it rely on hashing and suffer from reduced throughput because inserted keys collide. On a collision, the hashing technique must search for an alternative slot for the key, which increases the processing time per key and thereby degrades overall throughput. In this work, we use Single Instruction Multiple Data (SIMD) vectorization to search multiple slots at once, followed by direct aggregation of the results. We present experimental results for our vectorized grouped aggregation with various open-addressing hashing techniques on several data distributions, together with our inferences from them. Among our findings, vectorization affects these techniques differently: linear probing and two-choice hashing improve their performance with vectorization, whereas cuckoo and hopscotch hashing show a negative impact. Overall, this work provides the basic structure of a dedicated SIMD-accelerated grouped-aggregation framework that can be adapted to different hashing techniques.
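The core idea of comparing several hash-table slots in one step and aggregating directly into the matching slot can be sketched for linear probing as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: NumPy's element-wise comparison over a window of `WIDTH` slots stands in for a SIMD compare instruction, and the hash function `slot_hash`, the `EMPTY` sentinel, the window width, and the function `grouped_sum` are all hypothetical choices.

```python
import numpy as np

EMPTY = -1  # hypothetical sentinel for an unused slot; assumes all group keys are non-negative
WIDTH = 4   # slots compared per "vector" step, e.g. four 32-bit lanes of a 128-bit register


def slot_hash(key: int, mask: int) -> int:
    """Hypothetical multiplicative hash (Knuth's constant) reduced to a slot index."""
    return ((key * 2654435761) & 0xFFFFFFFF) & mask


def grouped_sum(keys, values, capacity=16):
    """SUM(value) GROUP BY key with linear probing, comparing WIDTH slots at a time."""
    assert capacity & (capacity - 1) == 0 and capacity % WIDTH == 0
    mask = capacity - 1
    table_keys = np.full(capacity, EMPTY, dtype=np.int64)
    table_sums = np.zeros(capacity, dtype=np.int64)
    lanes = np.arange(WIDTH)

    for key, value in zip(keys, values):
        start = slot_hash(int(key), mask)
        for _ in range(capacity // WIDTH):
            idx = (start + lanes) & mask          # window of WIDTH consecutive slots (wraps around)
            window = table_keys[idx]              # emulated "vector load" of the window
            hits = np.flatnonzero(window == key)  # one comparison across all lanes
            if hits.size:                         # key already present: aggregate in place
                table_sums[idx[hits[0]]] += value
                break
            free = np.flatnonzero(window == EMPTY)
            if free.size:                         # first free slot in probe order starts a new group
                slot = idx[free[0]]
                table_keys[slot] = key
                table_sums[slot] = value
                break
            start = (start + WIDTH) & mask        # window holds only other keys: probe next window
        else:
            raise RuntimeError("hash table is full; increase capacity")

    used = table_keys != EMPTY
    return dict(zip(table_keys[used].tolist(), table_sums[used].tolist()))
```

For example, `grouped_sum([3, 7, 3, 9, 7, 3], [1, 2, 3, 4, 5, 6])` yields the groups `{3: 10, 7: 7, 9: 4}`. Checking for a matching key before checking for an empty slot is safe in this sketch because keys are only ever inserted into the first free slot of their probe sequence and never deleted, so an existing key always precedes any empty slot along that sequence.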
Acknowledgments
This work was partially funded by the DFG (grant no.: SA 465/51-1 and SA 465/50-1).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Gurumurthy, B., Broneske, D., Pinnecke, M., Campero, G., Saake, G. (2018). SIMD Vectorized Hashing for Grouped Aggregation. In: Benczúr, A., Thalheim, B., Horváth, T. (eds) Advances in Databases and Information Systems. ADBIS 2018. Lecture Notes in Computer Science, vol 11019. Springer, Cham. https://doi.org/10.1007/978-3-319-98398-1_8
DOI: https://doi.org/10.1007/978-3-319-98398-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98397-4
Online ISBN: 978-3-319-98398-1
eBook Packages: Computer Science, Computer Science (R0)