Interleaving with coroutines: a systematic and practical approach to hide memory latency in index joins

  • Regular Paper
  • The VLDB Journal

Abstract

Index joins present a case of pointer-chasing code that causes data cache misses. In principle, we can hide these cache misses by overlapping them with computation: the lookups involved in an index join are parallel tasks whose execution can be interleaved, so that, when a cache miss occurs in one task, the processor executes independent instructions from another one. Yet, the literature provides no concrete performance model for such interleaved execution and, more importantly, production systems still waste processor cycles on cache misses because (a) hardware and compiler limitations prohibit automatic task interleaving and (b) existing techniques that involve the programmer produce unmaintainable code and are thus avoided in practice. In this paper, we address these shortcomings: we model interleaved execution, explaining how to estimate the speedup of any interleaving technique, and we propose interleaving with coroutines, i.e., functions that suspend their execution for later resumption. We deploy coroutines on index joins running in SAP HANA and show that interleaving with coroutines performs like other state-of-the-art techniques, retains close resemblance to the original code, and supports both interleaved and non-interleaved execution in the same implementation. Thus, we establish the first systematic and practical approach for interleaving index joins of any type.

Notes

  1. Retrieved from a profiling session of 60 s.

  2. We have not yet investigated how to form a pipeline with variable size, so we do not provide a corresponding SPP version.

  3. http://www.boost.org/doc/libs.

  4. https://en.wikipedia.org/wiki/Duff%27s_device.

  5. At the time of writing, Visual C++ and Clang are the only compilers that implement the technical specification for coroutines.

  6. An interested reader can find more information about C++ coroutines in the technical specification [6] and at Lewis Baker’s excellent blog: https://lewissbaker.github.io/.

  7. The observed speedups differ from the expected ones mainly due to rounding, since group sizes must be integers.

References

  1. Laudon, J., Gupta, A., Horowitz, M.: Interleaving: A multithreading technique targeting multiprocessors and workstations. In: Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS VI, pp. 308–318. ACM, New York, NY, USA (1994)

  2. Mowry, T.C., Lam, M.S., Gupta, A.: Design and evaluation of a compiler algorithm for prefetching. In: Proceedings of the fifth international conference on architectural support for programming languages and operating systems, ASPLOS V, pp. 62–73. ACM, New York, NY, USA (1992)

  3. Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM Trans. Database Syst. 32(3), 1–32 (2007)

  4. Kocberber, O., Falsafi, B., Grot, B.: Asynchronous memory access chaining. PVLDB 9(4), 252–263 (2015)

  5. Färber, F., May, N., Lehner, W., Große, P., Müller, I., Rauhe, H., Dees, J.: The SAP HANA database: an architecture overview. IEEE Data Eng. Bull. 35(1), 28–33 (2012)

  6. Programming Languages – C++ Extensions for Coroutines. Proposed Draft Technical Specification ISO/IEC DTS 22277 (E). http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4680.pdf. Accessed 16 May 2018

  7. Psaropoulos, G., Legler, T., May, N., Ailamaki, A.: Interleaving with coroutines: a practical approach for robust index joins. Proc. VLDB Endow. 11(2), 230–242 (2017)

  8. Lang, H., Mühlbauer, T., Funke, F., Boncz, P.A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pp. 311–326. ACM, New York, NY, USA (2016)

  9. Larson, P.-A., Clinciu, C., Hanson, E.N., Oks, A., Price, S.L., Rangarajan, S., Surna, A., Zhou, Q.: SQL Server column store indexes. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, pp. 1177–1184. ACM, New York, NY, USA (2011)

  10. Poess, M., Potapov, D.: Data compression in Oracle. In: Proceedings of VLDB, pp. 937–947 (2003)

  11. Raman, V., Attaluri, G., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G.M., Malkemus, T., Mueller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A., Zhang, L.: DB2 with BLU Acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)

  12. Melton, J.: Advanced SQL 1999: Understanding Object-Relational, and Other Advanced Features. Elsevier Science Inc., Amsterdam (2002)

  13. Colgan, M.: Oracle database in-memory. Technical report, Oracle Corporation, 2015. http://www.oracle.com/technetwork/database/in-memory/overview/twp-oracle-database-in-memory-2245633.pdf. Accessed 16 May 2018

  14. Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, S., Kersten, M.: MonetDB: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012)

  15. Kemper, A., Neumann, T.: HyPer: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In: Proceedings of ICDE, pp. 195–206 (2011)

  16. Larson, P.-A., Clinciu, C., Fraser, C., Hanson, E.N., Mokhtar, M., Nowakiewicz, M., Papadimos, V., Price, S.L., Rangarajan, S., Rusanu, R., Saubhasik, M.: Enhancements to SQL Server column stores. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pp. 1159–1168. ACM, New York, NY, USA (2013)

  17. Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture optimized for the new bottleneck: Memory access. In: Proceedings of the 25th international conference on very large data bases, VLDB ’99, pp. 54–65. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)

  18. Müller, I., Ratsch, C., Färber, F.: Adaptive string dictionary compression in in-memory column-store database systems. In: Proceedings of EDBT, pp. 283–294 (2014)

  19. Rao, J., Ross, K.A.: Making B+-trees cache conscious in main memory. In: Proceedings of ACM SIGMOD, pp. 475–486 (2000)

  20. Transaction Processing Performance Council. TPC-DS Benchmark Version 2.3.0. http://www.tpc.org/tpcds/. Accessed 14 August 2017

  21. Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual (2016)

  22. Kocberber, O., Grot, B., Picorel, J., Falsafi, B., Lim, K., Ranganathan, P.: Meet the walkers: accelerating index traversals for in-memory databases. In: Proceedings of the 46th annual IEEE/ACM international symposium on microarchitecture, MICRO-46, pp. 468–479. ACM, New York, NY, USA (2013)

  23. Neumann, T.: Trying to Speed Up Binary Search, 2015. http://databasearchitects.blogspot.ch/2015/09/trying-to-speed-up-binary-search.html. Accessed 16 May 2018

  24. Lam, M.D., Rothberg, E.E., Wolf, M.E.: The cache performance and optimizations of blocked algorithms. In: Proceedings of ASPLOS (1991)

  25. Kim, C., Chhugani, J., Satish, N., Sedlar, E., Nguyen, A.D., Kaldewey, T., Lee, V.W., Brandt, S.A., Dubey, P.: FAST: fast architecture sensitive tree search on modern CPUs and GPUs. In: Proceedings of SIGMOD, pp. 339–350 (2010)

  26. Zhou, J., Ross, K.A.: Buffering accesses to memory-resident index structures. In: Proceedings of VLDB, pp. 405–416 (2003)

  27. Zhou, J., Cieslewicz, J., Ross, K.A., Shah, M.: Improving database performance on simultaneous multithreading processors. In: Proceedings of VLDB, pp. 49–60 (2005)

  28. Moura, A.L.D., Ierusalimschy, R.: Revisiting coroutines. ACM Trans. Program. Lang. Syst. 31(2), 6:1–6:31 (2009)

  29. Conway, M.E.: Design of a separable transition-diagram compiler. Commun. ACM 6(7), 396–408 (1963)

  30. C# Reference. https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/yield. Accessed 16 May 2018

  31. Generators. Python Wiki. https://wiki.python.org/moin/Generators. Accessed 16 May 2018

  32. Russinovich, M.E., Solomon, D.A., Ionescu, A.: Windows Internals, Part 1: Covering Windows Server 2008 R2 and Windows 7, 6th edn. Microsoft Press, USA (2012)

  33. Kerrisk, M.: The Linux Programming Interface: A Linux and UNIX System Programming Handbook, 1st edn. No Starch Press, San Francisco (2010)

  34. RethinkDB Team. Improving a large C++ project with coroutines, 2010. https://www.rethinkdb.com/blog/improving-a-large-c-project-with-coroutines/. Accessed 16 May 2018

  35. Dybvig, R.K.: The Scheme Programming Language, 4th edn. The MIT Press, Cambridge (2009)

  36. Kowalke, O., Goodspeed, N.: Call/cc (call-with-current-continuation): a low-level api for stackful context switching. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0534r3.pdf. Accessed 16 May 2018

  37. Marusarz, J.: Understanding how general exploration works in Intel® VTune™ Amplifier XE, 2015. https://software.intel.com/en-us/articles/understanding-how-general-exploration-works-in-intel-vtune-amplifier-xe. Accessed 16 May 2018

  38. McCabe, T.J.: A complexity measure. IEEE Trans. Softw. Eng. 2(4), 308–320 (1976)

  39. Jonathan, C., Minhas, U.F., Hunter, J., Levandoski, J., Nishanov, G.: Exploiting coroutines to attack the “killer nanoseconds”. PVLDB 11(11), 1702–1714 (2018)

  40. Kiriansky, V., Xu, H., Rinard, M., Amarasinghe, S.: Cimple: instruction and memory level parallelism. arXiv e-prints (2018)

Author information

Correspondence to Georgios Psaropoulos.

About this article

Cite this article

Psaropoulos, G., Legler, T., May, N. et al. Interleaving with coroutines: a systematic and practical approach to hide memory latency in index joins. The VLDB Journal 28, 451–471 (2019). https://doi.org/10.1007/s00778-018-0533-6

  • DOI: https://doi.org/10.1007/s00778-018-0533-6
