Many-query join: efficient shared execution of relational joins on modern hardware



Database architectures typically process queries one at a time, executing concurrent queries in independent execution contexts. Such a design often leads to unpredictable performance and poor scalability. One way to circumvent the problem is to exploit sharing opportunities across concurrently running queries. In this paper, we propose many-query join (MQJoin), a novel method for sharing the execution of a join that can efficiently handle hundreds of concurrent queries. It achieves this by minimizing redundant work and making efficient use of main-memory bandwidth and multi-core architectures. Compared to existing proposals, MQJoin can efficiently process larger workloads regardless of the schema because it exploits more sharing opportunities. We also compare MQJoin to two commercial main-memory column-store databases. For a TPC-H-based workload, we show that MQJoin provides 2–5× higher throughput with significantly more stable response times.
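To make the idea of sharing one join across many queries concrete, here is a minimal Python sketch of the general query-set technique used by shared-execution engines such as SharedDB. It is not MQJoin's actual algorithm, whose details are not reproduced on this page. Each input row is tagged with a bitset of the concurrent queries whose selection predicates it satisfies, a single hash table is built and probed once for all queries, and a joined pair is kept for exactly those queries whose bits are set on both sides (a bitwise AND). The functions `annotate`, `shared_hash_join` and `split_by_query`, the `(join_key, payload)` row layout and the use of Python integers as bitsets are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row layout: (join_key, payload). Each query is modeled as one
# selection predicate per input relation (illustrative assumption, not MQJoin's format).
Row = Tuple[int, str]
Predicate = Callable[[Row], bool]


def annotate(rows: Iterable[Row], predicates: List[Predicate]) -> List[Tuple[Row, int]]:
    """Shared selection: tag each row with a bitset of the queries it satisfies.

    Bit i is set when predicates[i] accepts the row; rows no query wants are dropped.
    """
    tagged = []
    for row in rows:
        qset = 0
        for i, pred in enumerate(predicates):
            if pred(row):
                qset |= 1 << i
        if qset:
            tagged.append((row, qset))
    return tagged


def shared_hash_join(
    build: List[Tuple[Row, int]], probe: List[Tuple[Row, int]]
) -> List[Tuple[Row, Row, int]]:
    """One build pass and one probe pass serve every query at once.

    A joined pair belongs to query i only if bit i is set on both inputs,
    so its query set is the bitwise AND of the two input query sets.
    """
    table: Dict[int, List[Tuple[Row, int]]] = defaultdict(list)
    for row, qset in build:
        table[row[0]].append((row, qset))

    out = []
    for row, qset in probe:
        # .get() does not insert missing keys into the defaultdict
        for brow, bqset in table.get(row[0], ()):
            common = qset & bqset
            if common:  # at least one query wants this pair
                out.append((brow, row, common))
    return out


def split_by_query(
    results: List[Tuple[Row, Row, int]], n_queries: int
) -> List[List[Tuple[Row, Row]]]:
    """Route each shared result pair to every query whose bit is set."""
    per_query: List[List[Tuple[Row, Row]]] = [[] for _ in range(n_queries)]
    for brow, prow, qset in results:
        for i in range(n_queries):
            if (qset >> i) & 1:
                per_query[i].append((brow, prow))
    return per_query
```

For instance, if query 0 selects European customers with 1995 orders and query 1 selects all customers with 1996 orders, a European customer row carries bits `0b11`, a 1995 order carries `0b01`, and their joined pair is routed only to query 0. Because the per-pair test is a bitwise AND, adding queries widens the bitsets instead of adding separate build and probe passes; how MQJoin organizes this work for main-memory bandwidth and multi-core CPUs is the subject of the paper itself.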




  1. Code can be downloaded from:



Author information



Corresponding author

Correspondence to Darko Makreshanski.

Appendix A: Modified TPC-H queries




About this article


Cite this article

Makreshanski, D., Giannikis, G., Alonso, G. et al. Many-query join: efficient shared execution of relational joins on modern hardware. The VLDB Journal 27, 669–692 (2018).



  • OLAP
  • Analytics
  • Join
  • MQJoin
  • Shared join
  • Main memory
  • TPC-H
  • Xeon Phi