An experimental evaluation and analysis of database cracking

Abstract

Database cracking has been an area of active research in recent years. The core idea of database cracking is to create indexes adaptively and incrementally as a side product of query processing. Several works have proposed different cracking techniques for different aspects including updates, tuple reconstruction, convergence, concurrency control, and robustness. Our 2014 VLDB paper “The Uncracked Pieces in Database Cracking” (PVLDB 7:97–108, 2013/VLDB 2014) was the first comparative study of these different methods by an independent group. In this article, we extend our published experimental study on database cracking and bring it up to date. Our goal is to critically review several aspects, identify the potential, and propose promising directions in database cracking. With this study, we hope to expand the scope of database cracking and possibly leverage cracking in database engines other than MonetDB. We repeat several prior database cracking works including the core cracking algorithms as well as three other works on convergence (hybrid cracking), tuple reconstruction (sideways cracking), and robustness (stochastic cracking), respectively. In addition to our conference paper, we now also look at a recently published study about CPU efficiency (predication cracking). We evaluate these works and show possible directions to do even better. As a further extension, we evaluate the whole class of parallel cracking algorithms that were proposed in three recent works. Altogether, in this work we revisit 8 papers on database cracking and evaluate in total 18 cracking methods, 6 sorting algorithms, and 3 full index structures. Additionally, we test cracking under a variety of experimental settings, including high selectivity queries, low selectivity queries, varying selectivity, and multiple query access patterns. Here, low selectivity means that many entries qualify; consequently, high selectivity means that only few entries qualify.
Finally, we compare cracking against different sorting algorithms as well as against different main memory optimized indexes, including the recently proposed adaptive radix tree (ART). Our results show that: (1) the previously proposed cracking algorithms are repeatable, (2) there is still enough room to significantly improve the previously proposed cracking algorithms, (3) parallelizing cracking algorithms efficiently is a hard task, (4) cracking depends heavily on query selectivity, (5) cracking needs to catch up with modern indexing trends, and (6) different indexing algorithms have different indexing signatures.
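The core idea stated in the abstract, building an index incrementally as a side product of range queries, can be illustrated with a minimal sketch. This is not the code evaluated in the article: the names `crack_in_two` and `CrackerColumn`, the half-open predicate `low <= v < high`, and the use of two sorted Python lists as the cracker index are assumptions made for illustration only.

```python
import bisect


def crack_in_two(col, lo, hi, pivot):
    """Partition col[lo:hi] in place around pivot.

    Returns p such that every value in col[lo:p] is < pivot and
    every value in col[p:hi] is >= pivot.
    """
    i, j = lo, hi - 1
    while i <= j:
        if col[i] < pivot:
            i += 1
        else:
            col[i], col[j] = col[j], col[i]
            j -= 1
    return i


class CrackerColumn:
    """Hypothetical cracker column with a sorted-list cracker index."""

    def __init__(self, values):
        self.col = list(values)  # copy of the base column, reorganized by queries
        self.keys = []           # pivots seen so far, kept sorted
        self.pos = []            # pos[k] = first index holding a value >= keys[k]

    def _crack(self, pivot):
        k = bisect.bisect_left(self.keys, pivot)
        if k < len(self.keys) and self.keys[k] == pivot:
            return self.pos[k]   # this boundary already exists
        # Only the single piece that contains the pivot is touched.
        lo = self.pos[k - 1] if k > 0 else 0
        hi = self.pos[k] if k < len(self.keys) else len(self.col)
        p = crack_in_two(self.col, lo, hi, pivot)
        self.keys.insert(k, pivot)
        self.pos.insert(k, p)
        return p

    def range_query(self, low, high):
        """Return all values v with low <= v < high (in arbitrary order)."""
        a = self._crack(low)
        b = self._crack(high)
        return self.col[a:b]
```

Each call to `range_query` adds at most two boundaries to the index, so later queries partition ever smaller pieces; a query whose bounds were seen before touches no data beyond the result range itself.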


Notes

  1. Note that the query time of full scan varies by as much as 4 times. This is because of lazy evaluation in the filtering depending on the position of low key and high key in the value domain.
  2. Measured with Intel VTune Amplifier 2015.
  3. After the first few queries, cracking mostly performs a pair of crack-in-two operations as the likelihood of two splits falling in two different partitions increases with the number of applied queries.
  4. Please note that our current implementation relies on a uniform key distribution to create equal-sized partitions. Handling skewed distributions would require the generation of equi-depth partitions.
  5. The available ART implementation does not support bulk loading.
  6. In contrast to [23], we do not merge the chunks after each query as this results in overhead.
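The distinction drawn in Note 4 can be made concrete with a small sketch. It assumes integer keys, and the function names (`equi_width_partition`, `equi_depth_bounds`, `equi_depth_partition`) are hypothetical; it illustrates the two partitioning schemes in general and is not the implementation used in the article.

```python
import bisect


def equi_width_partition(key, key_min, key_max, num_partitions):
    """Map a key to a partition by splitting the key range into equal-width slices.

    Partitions end up equal-sized only if the keys are uniformly distributed.
    """
    width = (key_max - key_min + 1) / num_partitions
    return min(int((key - key_min) / width), num_partitions - 1)


def equi_depth_bounds(sample, num_partitions):
    """Pick partition boundaries at the quantiles of a sample of the keys."""
    s = sorted(sample)
    return [s[(i * len(s)) // num_partitions] for i in range(1, num_partitions)]


def equi_depth_partition(key, bounds):
    """Map a key to a partition using precomputed quantile boundaries."""
    return bisect.bisect_right(bounds, key)


def partition_sizes(assignments, num_partitions):
    """Count how many keys were assigned to each partition."""
    sizes = [0] * num_partitions
    for p in assignments:
        sizes[p] += 1
    return sizes
```

On a skewed key set such as the squares of 0 to 999, equal-width slicing places half of all keys into the first of four partitions, whereas boundaries taken from the quantiles yield four partitions of equal size.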

References

  1. Adelson-Velsky, G., et al.: An algorithm for the organization of information. In: USSR Academy of Sciences, pp. 263–266 (1962)
  2. Alvarez, V., Schuhknecht, F.M., Dittrich, J., Richter, S.: Main memory adaptive indexing for multi-core systems. In: DaMoN, Snowbird, UT, USA, pp. 3:1–3:10 (2014)
  3. Bayer, R., McCreight, E.M.: Organization and maintenance of large ordered indices. Acta Inf. 1, 173–189 (1972)
  4. Birkeland, O.R.: Searching large data volumes with MISD processing. Ph.D. Thesis (2008)
  5. DeWitt, D.J., Naughton, J.F., et al.: Practical skew handling in parallel joins. In: VLDB, Proceedings, pp. 27–40 (1992)
  6. Finch, T.: Incremental Calculation of Weighted Mean and Variance. University of Cambridge Computing Service, Cambridge (2009)
  7. Generalized Heap Impl. https://github.com/valyala/gheap
  8. Graefe, G., Halim, F., Idreos, S., et al.: Concurrency control for adaptive indexing. PVLDB 5, 656–667 (2012)
  9. Graefe, G., Halim, F., Idreos, S., et al.: Transactional support for adaptive indexing. VLDB J. 23(2), 303–328 (2014)
  10. Graefe, G., Kuno, H.: Self-selecting, self-tuning, incrementally optimized indexes. In: EDBT, pp. 371–381 (2010)
  11. Halim, F., Idreos, S., et al.: Stochastic database cracking: towards robust adaptive indexing in main-memory column-stores. PVLDB 5, 502–513 (2012)
  12. Hildebrandt, P., Isbitz, H.: Radix exchange: an internal sorting method for digital computers. J. ACM 6(2), 156–163 (1959)
  13. Hoare, C.A.R.: Quicksort. Commun. ACM 4(7), 321 (1961)
  14. Idreos, S., Kersten, M., Manegold, S.: Updating a cracked database. In: SIGMOD, pp. 413–424 (2007)
  15. Idreos, S., Kersten, M., Manegold, S.: Self-organizing tuple reconstruction in column-stores. In: SIGMOD, pp. 297–308 (2009)
  16. Idreos, S., Manegold, S., et al.: Merging what’s cracked, cracking what’s merged. PVLDB 4, 586–597 (2011)
  17. Idreos, S., et al.: Database cracking. In: CIDR, pp. 68–78 (2007)
  18. Kersten, M., et al.: Cracking the database store. In: CIDR, pp. 213–224 (2005)
  19. Kim, C., et al.: FAST: Fast architecture sensitive tree search on modern CPUs and GPUs. In: SIGMOD, pp. 339–350 (2010)
  20. Leis, V., et al.: The adaptive radix tree: ARTful indexing for main-memory databases. In: ICDE, pp. 38–49 (2013)
  21. Martinez-Palau, X., Dominguez-Sal, D., et al.: Two-way replacement selection. PVLDB 3, 871–881 (2010)
  22. McCalpin, J.D.: STREAM benchmark, version from January 17. https://www.cs.virginia.edu/stream/FTP/Code/stream.c (2013)
  23. Pirk, H., Petraki, E., Idreos, S., Manegold, S., Kersten, M.L.: Database cracking: fancy scan, not poor man’s sort! In: DaMoN, Snowbird, UT, USA, pp. 4:1–4:8 (2014)
  24. Rao, J., Ross, K.A.: Making B+-trees cache conscious in main memory. In: SIGMOD, pp. 475–486 (2000)
  25. Schuhknecht, F.M., Jindal, A., Dittrich, J.: The uncracked pieces in database cracking. PVLDB 7, 97–108 (2013)
  26. Schuhknecht, F.M., Khanchandani, P., Dittrich, J.: On the surprising difficulty of simple things: the case of radix partitioning. PVLDB 8, 934–937 (2015)


Acknowledgments

Special thanks to Stratos Idreos for helping us understand the hybrid methods. This work was partially supported by BMBF.

Author information

Corresponding author

Correspondence to Felix Martin Schuhknecht.

Ethics declarations

Competing interests

As we re-evaluate research, there are potential competing interests with CWI Amsterdam and the authors of [8].


About this article


Cite this article

Schuhknecht, F.M., Jindal, A. & Dittrich, J. An experimental evaluation and analysis of database cracking. The VLDB Journal 25, 27–52 (2016). https://doi.org/10.1007/s00778-015-0397-y


Keywords

  • Adaptive indexing
  • Database cracking
  • Sorting
  • Multi-threaded algorithms