SIMD Vectorized Hashing for Grouped Aggregation

  • Bala Gurumurthy
  • David Broneske
  • Marcus Pinnecke
  • Gabriel Campero
  • Gunter Saake
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11019)

Abstract

Grouped aggregation is a commonly used analytical function. Its common hash-based implementation suffers from reduced throughput because insert keys collide in the hash table. On a collision, the underlying hashing technique searches for an alternative location for the key. This search increases the processing time of an individual key and thereby degrades the overall throughput. In this work, we use Single Instruction Multiple Data (SIMD) vectorization to search multiple slots at once, followed by direct aggregation of the results. We present experimental results for our vectorized grouped aggregation with various open-addressing hashing techniques on several dataset distributions, together with our inferences from them. Among our findings, we observe that vectorization affects these techniques differently: linear probing and two-choice hashing improve their performance with vectorization, whereas cuckoo and hopscotch hashing are negatively affected. Overall, we provide the basic structure of a dedicated SIMD-accelerated grouped aggregation framework that can be adapted to different hashing techniques.
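
To illustrate the idea of searching several slots at once and then aggregating directly, the following is a minimal sketch of a grouped SUM over a linear-probing table. It is not the implementation evaluated in the paper. The names `GroupTable`, `match_mask`, `first_set`, `aggregate`, and `group_sum`, the four-lane window, the multiplicative hash constant, and the reserved `EMPTY_KEY` value are all assumptions made for this example. The SSE2 path uses `_mm_cmpeq_epi32` to compare four slots with one instruction; a scalar loop with the same result is used where SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Assumption: this value never occurs as a real group key.
constexpr int32_t EMPTY_KEY = INT32_MIN;
constexpr std::size_t LANES = 4;  // four 32-bit keys per 128-bit register

// Hypothetical table layout: keys and running sums in parallel arrays,
// organized as windows of LANES consecutive slots.
struct GroupTable {
    std::vector<int32_t> keys;
    std::vector<int64_t> sums;
    explicit GroupTable(std::size_t windows)
        : keys(windows * LANES, EMPTY_KEY), sums(windows * LANES, 0) {}
};

// Returns a 4-bit mask; bit i is set if slots[i] == key.
inline unsigned match_mask(const int32_t* slots, int32_t key) {
#if defined(__SSE2__)
    __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(slots));
    __m128i eq = _mm_cmpeq_epi32(v, _mm_set1_epi32(key));
    return static_cast<unsigned>(_mm_movemask_ps(_mm_castsi128_ps(eq)));
#else
    unsigned mask = 0;  // scalar fallback with identical semantics
    for (std::size_t i = 0; i < LANES; ++i)
        mask |= (slots[i] == key ? 1u : 0u) << i;
    return mask;
#endif
}

// Index of the lowest set bit; the mask must be non-zero.
inline std::size_t first_set(unsigned mask) {
    assert(mask != 0);
    std::size_t i = 0;
    while ((mask & 1u) == 0) { mask >>= 1; ++i; }
    return i;
}

// Adds value to the sum of key's group. Returns false if the table is full.
bool aggregate(GroupTable& t, int32_t key, int64_t value) {
    const std::size_t windows = t.keys.size() / LANES;
    std::size_t w = (static_cast<uint32_t>(key) * 2654435761u) % windows;
    for (std::size_t probes = 0; probes < windows; ++probes) {
        const int32_t* slots = &t.keys[w * LANES];
        // Search all slots of the window for the key in one comparison.
        if (unsigned hit = match_mask(slots, key)) {
            t.sums[w * LANES + first_set(hit)] += value;  // direct aggregation
            return true;
        }
        // No match: if the window has a free slot, the key is new.
        if (unsigned vacant = match_mask(slots, EMPTY_KEY)) {
            const std::size_t s = w * LANES + first_set(vacant);
            t.keys[s] = key;
            t.sums[s] = value;
            return true;
        }
        w = (w + 1) % windows;  // linear probing: move to the next window
    }
    return false;
}

// Reads back a group's sum by scanning the table; returns 0 if absent.
int64_t group_sum(const GroupTable& t, int32_t key) {
    for (std::size_t i = 0; i < t.keys.size(); ++i)
        if (t.keys[i] == key) return t.sums[i];
    return 0;
}
```

In this sketch a caller would construct `GroupTable t(n)` with `n` windows and call `aggregate(t, key, value)` once per input row. Because keys are never deleted, a window that contains a free slot but no match proves that the key is not yet in the table, so one pair of comparisons per window is enough to decide between updating and inserting.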

Keywords

SIMD, Hashing techniques, Hash based grouping, Grouped aggregation, Direct aggregation, Open addressing

Notes

Acknowledgments

This work was partially funded by the DFG (grant no.: SA 465/51-1 and SA 465/50-1).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Bala Gurumurthy (1)
  • David Broneske (1)
  • Marcus Pinnecke (1)
  • Gabriel Campero (1)
  • Gunter Saake (1)

  1. Otto-von-Guericke-Universität, Magdeburg, Germany