Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10195)

Abstract

Previous work [1] has claimed that the best-performing implementation of in-memory hash joins is based on (radix-)partitioning of the build-side input. Indeed, despite the overhead of partitioning, the benefits from increased cache locality and synchronization-free parallelism in the build phase outweigh the costs when the input data is randomly ordered. However, many datasets already exhibit significant spatial locality (i.e., non-randomness) due to the way data items enter the database: through periodic ETL or trickle-loaded in the form of transactions. In such cases, the first benefit of partitioning, increased locality, is largely irrelevant. In this paper, we demonstrate how hardware transactional memory (HTM) can render the other benefit, freedom from synchronization, irrelevant as well.

Specifically, using careful analysis and engineering, we develop an adaptive hash join implementation that outperforms parallel radix-partitioned hash joins as well as sort-merge joins on data with high spatial locality. In addition, we show how, through lightweight (less than 1% overhead) runtime monitoring of the transaction abort rate, our implementation can detect inputs with low spatial locality and dynamically fall back to radix partitioning of the build-side input. The result is a hash join implementation that is more than 3 times faster than the state of the art on high-locality data and never more than 1% slower.
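
The fallback decision described above can be illustrated with a minimal sketch. It is not the paper's implementation: the stream of booleans stands in for the commit/abort status that real HTM hardware would report, and the names `AbortMonitor`, `choose_build_strategy`, `threshold`, and `min_samples`, along with their default values, are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AbortMonitor:
    """Counts transaction outcomes and signals when the abort rate is too high."""

    threshold: float = 0.2   # hypothetical abort-rate cutoff, not taken from the paper
    min_samples: int = 64    # hypothetical warm-up before the rate is trusted
    started: int = 0
    aborted: int = 0

    def record(self, committed: bool) -> None:
        """Register one finished transaction attempt."""
        self.started += 1
        if not committed:
            self.aborted += 1

    def abort_rate(self) -> float:
        """Fraction of attempts that aborted so far (0.0 before any attempt)."""
        return self.aborted / self.started if self.started else 0.0

    def should_fall_back(self) -> bool:
        """True once enough samples exist and the abort rate exceeds the cutoff."""
        return self.started >= self.min_samples and self.abort_rate() > self.threshold


def choose_build_strategy(outcomes, monitor=None):
    """Replay simulated commit (True) / abort (False) outcomes.

    Returns "radix" as soon as the monitor asks for a fallback,
    otherwise "htm" once the stream is exhausted.
    """
    if monitor is None:
        monitor = AbortMonitor()
    for committed in outcomes:
        monitor.record(committed)
        if monitor.should_fall_back():
            return "radix"
    return "htm"
```

In this sketch, a stream in which every transaction commits keeps the HTM path, while a stream dominated by aborts switches to radix partitioning once the warm-up window has passed. The monitor only increments two counters per transaction, which is in the spirit of the lightweight monitoring the abstract describes, though the actual mechanism and thresholds used by the authors are not given here.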

References

  1. Balkesen, C., et al.: Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. In: ICDE (2013)
  2. Blanas, S., Li, Y., Patel, J.M.: Design and evaluation of main memory hash join algorithms for multi-core CPUs. In: SIGMOD (2011)
  3. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach. Elsevier, Amsterdam (2011)
  4. Herlihy, M., Moss, J.E.B.: Transactional memory: architectural support for lock-free data structures. ACM (1993)
  5. Jacobi, C., Slegel, T., Greiner, D.: Transactional memory architecture and implementation for IBM system Z. In: MICRO (2012)
  6. Kim, C., et al.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. In: VLDB (2009)
  7. Leis, V., Kemper, A., Neumann, T.: Exploiting hardware transactional memory in main-memory databases. In: ICDE (2014)
  8. Makreshanski, D., Levandoski, J., Stutsman, R.: To lock, swap, or elide: on the interplay of hardware transactional memory and lock-free indexing. Proc. VLDB Endow. 8(11), 1298–1309 (2015)
  9. Manegold, S., Boncz, P., Kersten, M.: Optimizing main-memory join on modern hardware. In: TKDE (2002)
  10. Peters, T.: Description of timsort. http://bugs.python.org/file4451/timsort.txt
  11. Rogaway, P., Shrimpton, T.: Cryptographic hash-function basics: definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance. In: Roy, B., Meier, W. (eds.) FSE 2004. LNCS, vol. 3017, pp. 371–388. Springer, Heidelberg (2004). doi:10.1007/978-3-540-25937-4_24
  12. Shavit, N., Touitou, D.: Software transactional memory. Distrib. Comput. 10, 99–116 (1997)
  13. Tran, K.Q., Blanas, S., Naughton, J.F.: On transactional memory, spinlocks, and database transactions. In: ADMS (2010)
  14. Yoo, R.M., et al.: Performance evaluation of Intel transactional synchronization extensions for high-performance computing. In: SC (2013)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. MIT CSAIL, Cambridge, USA
