Interleaving with coroutines: a systematic and practical approach to hide memory latency in index joins

Psaropoulos, Georgios; Legler, Thomas; May, Norman; Ailamaki, Anastasia

doi:10.1007/s00778-018-0533-6

Regular Paper
Published: 14 December 2018

Volume 28, pages 451–471, (2019)
The VLDB Journal

Georgios Psaropoulos^1,2 (ORCID: orcid.org/0000-0003-4173-2059), Thomas Legler^2, Norman May^2 & Anastasia Ailamaki^1,3

Abstract

Index joins present a case of pointer-chasing code that causes data cache misses. In principle, we can hide these cache misses by overlapping them with computation: The lookups involved in an index join are parallel tasks whose execution can be interleaved, so that, when a cache miss occurs in one task, the processor executes independent instructions from another one. Yet, the literature provides no concrete performance model for such interleaved execution and, more importantly, production systems still waste processor cycles on cache misses because (a) hardware and compiler limitations prohibit automatic task interleaving and (b) existing techniques that involve the programmer produce unmaintainable code and are thus avoided in practice. In this paper, we address these shortcomings: we model interleaved execution explaining how to estimate the speedup of any interleaving technique, and we propose interleaving with coroutines, i.e., functions that suspend their execution for later resumption. We deploy coroutines on index joins running in SAP HANA and show that interleaving with coroutines performs like other state-of-the-art techniques, retains close resemblance to the original code, and supports both interleaved and non-interleaved execution in the same implementation. Thus, we establish the first systematic and practical approach for interleaving index joins of any type.
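
The control flow described in the abstract can be sketched with Python generators, one of the coroutine forms the paper's references mention. This is a minimal illustration under stated assumptions, not the authors' implementation (which runs inside SAP HANA): the names `lookup`, `run_interleaved`, and `group_size` are hypothetical, the index is assumed to be a sorted array searched by binary search, and Python cannot issue prefetches, so each `yield` only marks the point where native code would prefetch and suspend.

```python
from typing import Generator, List, Optional, Sequence, Tuple


def lookup(table: Sequence[int], key: int) -> Generator[None, None, Optional[int]]:
    """Binary search over a sorted table, written as a coroutine (hypothetical helper).

    Returns the position of `key`, or None if the key is absent.
    """
    low, high = 0, len(table)
    while low < high:
        mid = (low + high) // 2
        # Assumption: native code would prefetch table[mid] here and suspend,
        # letting another lookup run while the cache line is fetched.
        # In Python this yield only models the suspension point.
        yield
        if table[mid] < key:
            low = mid + 1
        else:
            high = mid
    if low < len(table) and table[low] == key:
        return low
    return None


def run_interleaved(table: Sequence[int], keys: Sequence[int],
                    group_size: int) -> List[Optional[int]]:
    """Run all lookups, keeping at most `group_size` of them in flight."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    results: List[Optional[int]] = [None] * len(keys)
    active: List[Tuple[int, Generator[None, None, Optional[int]]]] = []
    next_key = 0
    while next_key < len(keys) or active:
        # Refill the group with fresh lookups while input keys remain.
        while len(active) < group_size and next_key < len(keys):
            active.append((next_key, lookup(table, keys[next_key])))
            next_key += 1
        still_running = []
        for index, coroutine in active:
            try:
                next(coroutine)  # resume until the next suspension point
                still_running.append((index, coroutine))
            except StopIteration as finished:
                results[index] = finished.value  # the lookup has returned
        active = still_running
    return results
```

With `group_size=1` the same code runs the lookups one after another, which mirrors the abstract's point that a single implementation can support both interleaved and non-interleaved execution. For example, `run_interleaved([2, 3, 5, 7, 11, 13], [7, 4, 2, 13], group_size=3)` interleaves up to three binary searches at a time.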

Notes

Retrieved from a profiling session of 60 s.
We have not yet investigated how to form a pipeline with variable size, so we do not provide a corresponding SPP version.
http://www.boost.org/doc/libs.
https://en.wikipedia.org/wiki/Duff%27s_device.
At the time of writing, Visual C++ and Clang are the only compilers that implement the technical specification for coroutines.
An interested reader can find more information about C++ coroutines in the technical specification [6] and at Lewis Baker’s excellent blog: https://lewissbaker.github.io/.
The observed speedups differ from the expected ones mainly due to rounding; group sizes cannot be floats.

References

Laudon, J., Gupta, A., Horowitz, M.: Interleaving: A multithreading technique targeting multiprocessors and workstations. In: Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS VI, pp. 308–318. ACM, New York, NY, USA (1994)
Mowry, T.C., Lam, M.S., Gupta, A.: Design and evaluation of a compiler algorithm for prefetching. In: Proceedings of the fifth international conference on architectural support for programming languages and operating systems, ASPLOS V, pp. 62–73. ACM, New York, NY, USA (1992)
Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM Trans. Database Syst. 32(3), 1–32 (2007)
Kocberber, O., Falsafi, B., Grot, B.: Asynchronous memory access chaining. PVLDB 9(4), 252–263 (2015)
Färber, F., May, N., Lehner, W., Große, P., Müller, I., Rauhe, H., Dees, J.: The SAP HANA database: an architecture overview. IEEE Data Eng. Bull. 35(1), 28–33 (2012)
Programming Languages – C++ Extensions for Coroutines. Proposed Draft Technical Specification ISO/IEC DTS 22277 (E). http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4680.pdf. Accessed 16 May 2018
Psaropoulos, G., Legler, T., May, N., Ailamaki, A.: Interleaving with coroutines: a practical approach for robust index joins. Proc. VLDB Endow. 11(2), 230–242 (2017)
Lang, H., Mühlbauer, T., Funke, F., Boncz, P.A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pp. 311–326. ACM, New York, NY, USA (2016)
Larson, P.-A., Clinciu, C., Hanson, E.N., Oks, A., Price, S.L., Rangarajan, S., Surna, A., Zhou, Q.: SQL Server column store indexes. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, pp. 1177–1184. ACM, New York, NY, USA (2011)
Poess, M., Potapov, D.: Data compression in Oracle. In: Proceedings of VLDB, pp. 937–947. (2003)
Raman, V., Attaluri, G., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G.M., Malkemus, T., Mueller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A., Zhang, L.: DB2 with BLU Acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)
Melton, J.: Advanced SQL 1999: Understanding Object-Relational, and Other Advanced Features. Elsevier Science Inc., Amsterdam (2002)
Colgan, M.: Oracle database in-memory. Technical report, Oracle Corporation, 2015. http://www.oracle.com/technetwork/database/in-memory/overview/twp-oracle-database-in-memory-2245633.pdf. Accessed 16 May 2018
Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, S., Kersten, M.: MonetDB: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012)
Kemper, A., Neumann, T.: HyPer: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In: Proceedings of ICDE, pp. 195–206 (2011)
Larson, P.-A., Clinciu, C., Fraser, C., Hanson, E.N., Mokhtar, M., Nowakiewicz, M., Papadimos, V., Price, S.L., Rangarajan, S., Rusanu, R., Saubhasik, M.: Enhancements to SQL Server column stores. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pp. 1159–1168. ACM, New York, NY, USA (2013)
Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture optimized for the new bottleneck: Memory access. In: Proceedings of the 25th international conference on very large data bases, VLDB ’99, pp. 54–65. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. (1999)
Müller, I., Ratsch, C., Färber, F.: Adaptive string dictionary compression in in-memory column-store database systems. In: Proceedings of EDBT, pp. 283–294 (2014)
Rao, J., Ross, K.A.: Making B+-trees cache conscious in main memory. In: Proceedings of ACM SIGMOD, pp. 475–486. (2000)
Transaction Processing Performance Council. TPC-DS Benchmark Version 2.3.0. http://www.tpc.org/tpcds/. Accessed 14 August 2017
Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual (2016)
Kocberber, O., Grot, B., Picorel, J., Falsafi, B., Lim, K., Ranganathan, P.: Meet the walkers: accelerating index traversals for in-memory databases. In: Proceedings of the 46th annual IEEE/ACM international symposium on microarchitecture, MICRO-46, pp. 468–479. ACM, New York (2013)
Neumann, T.: Trying to Speed Up Binary Search, 2015. http://databasearchitects.blogspot.ch/2015/09/trying-to-speed-up-binary-search.html. Accessed 16 May 2018
Lam, M.S., Rothberg, E.E., Wolf, M.E.: The cache performance and optimizations of blocked algorithms. In: Proceedings of the ASPLOS (1991)
Kim, C., Chhugani, J., Satish, N., Sedlar, E., Nguyen, A.D., Kaldewey, T., Lee, V.W., Brandt, S.A., Dubey, P.: FAST: fast architecture sensitive tree search on modern CPUs and GPUs. In: Proceedings of SIGMOD, pp. 339–350 (2010)
Zhou, J., Ross, K.A.: Buffering accesses to memory-resident index structures. In: Proceedings of VLDB, pp. 405–416. (2003)
Zhou, J., Cieslewicz, J., Ross, K.A., Shah, M.: Improving database performance on simultaneous multithreading processors. In: Proceedings of VLDB, pp. 49–60. (2005)
Moura, A.L.D., Ierusalimschy, R.: Revisiting coroutines. ACM Trans. Program. Lang. Syst. 31(2), 6:1–6:31 (2009)
Conway, M.E.: Design of a separable transition-diagram compiler. Commun. ACM 6(7), 396–408 (1963)
C# Reference. https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/yield. Accessed 16 May 2018
Generators. Python Wiki. https://wiki.python.org/moin/Generators. Accessed 16 May 2018
Russinovich, M.E., Solomon, D.A., Ionescu, A.: Windows Internals, Part 1: Covering Windows Server 2008 R2 and Windows 7, 6th edn. Microsoft Press, USA (2012)
Kerrisk, M.: The Linux Programming Interface: A Linux and UNIX System Programming Handbook, 1st edn. No Starch Press, San Francisco (2010)
RethinkDB Team. Improving a large C++ project with coroutines, 2010. https://www.rethinkdb.com/blog/improving-a-large-c-project-with-coroutines/. Accessed 16 May 2018
Dybvig, R.K.: The Scheme Programming Language, 4th edn. The MIT Press, Cambridge (2009)
Kowalke, O., Goodspeed, N.: Call/cc (call-with-current-continuation): a low-level api for stackful context switching. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0534r3.pdf. Accessed 16 May 2018
Marusarz, J.: Understanding how general exploration works in Intel® VTune™ Amplifier XE, 2015. https://software.intel.com/en-us/articles/understanding-how-general-exploration-works-in-intel-vtune-amplifier-xe. Accessed 16 May 2018
McCabe, T.J.: A complexity measure. IEEE Trans. Softw. Eng. 2(4), 308–320 (1976)
Jonathan, C., Minhas, U.F., Hunter, J., Levandoski, J., Nishanov, G.: Exploiting coroutines to attack the “killer nanoseconds”. PVLDB 11(11), 1702–1714 (2018)
Kiriansky, V., Xu, H., Rinard, M., Amarasinghe, S.: Cimple: instruction and memory level parallelism. ArXiv e-prints (2018)

Author information

Authors and Affiliations

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, VD, Switzerland
Georgios Psaropoulos & Anastasia Ailamaki
SAP SE, Walldorf, BW, Germany
Georgios Psaropoulos, Thomas Legler & Norman May
RAW Labs SA, Lausanne, VD, Switzerland
Anastasia Ailamaki

Corresponding author

Correspondence to Georgios Psaropoulos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Psaropoulos, G., Legler, T., May, N. et al. Interleaving with coroutines: a systematic and practical approach to hide memory latency in index joins. The VLDB Journal 28, 451–471 (2019). https://doi.org/10.1007/s00778-018-0533-6

Received: 20 May 2018
Revised: 11 October 2018
Accepted: 05 December 2018
Published: 14 December 2018
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s00778-018-0533-6
