Volume 14, Issue 3, pp 199–209

The Design and Implementation of CoGaDB: A Column-oriented GPU-accelerated DBMS

  • Sebastian Breß


Today, processor performance is primarily bound by a fixed energy budget, the power wall. This forces hardware vendors to optimize processors for specific tasks, which leads to an increasingly heterogeneous hardware landscape. Although efficient algorithms for modern processors such as GPUs are heavily investigated, the database optimizer must also be prepared to handle computations on heterogeneous processors. GPUs are an interesting basis for case studies because they already exhibit many of the difficulties we will face tomorrow.

In this paper, we present CoGaDB, a main-memory DBMS with built-in GPU acceleration that is optimized for OLAP workloads. CoGaDB uses the self-tuning optimizer framework HyPE to build a hardware-oblivious optimizer, which learns cost models for database operators and efficiently distributes a workload across the available processors. Furthermore, CoGaDB implements efficient algorithms on CPU and GPU and efficiently supports star joins. We show how these novel techniques interact with each other in a single system. Our evaluation shows that CoGaDB quickly adapts to the underlying hardware by increasing the accuracy of its cost models at runtime.
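The idea of an optimizer that learns cost models at runtime and uses them to place operators can be illustrated with a minimal sketch. It is not the HyPE or CoGaDB API: the names `LearnedCostModel`, `observe`, `estimate`, and `choose_processor` are hypothetical, and the assumption that execution time is linear in input size, fitted by least squares, is made here only for illustration.

```python
from collections import defaultdict


class LearnedCostModel:
    """Hypothetical per-(operator, processor) model: time ~ a + b * input_size."""

    def __init__(self):
        # Observed (input_size, seconds) pairs, keyed by (operator, processor).
        self.samples = defaultdict(list)

    def observe(self, operator, processor, input_size, seconds):
        """Record one measured execution so later estimates become more accurate."""
        self.samples[(operator, processor)].append((input_size, seconds))

    def estimate(self, operator, processor, input_size):
        """Least-squares estimate of the run time, or None if not yet trained."""
        points = self.samples.get((operator, processor), [])
        if len(points) < 2:
            return None
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in points)
        if var_x == 0:
            return mean_y  # all samples share one input size
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slope = cov_xy / var_x
        return mean_y + slope * (input_size - mean_x)


def choose_processor(model, operator, input_size, processors=("CPU", "GPU")):
    """Pick the processor with the lowest estimated time.

    An untrained processor is chosen first so that the model can gather
    observations for it (an assumed, very simple exploration rule).
    """
    estimates = {}
    for processor in processors:
        estimate = model.estimate(operator, processor, input_size)
        if estimate is None:
            return processor
        estimates[processor] = estimate
    return min(estimates, key=estimates.get)


model = LearnedCostModel()
# Made-up measurements: the GPU has a fixed overhead but a lower per-row cost.
model.observe("selection", "CPU", 1000, 0.0010)
model.observe("selection", "CPU", 2000, 0.0020)
model.observe("selection", "GPU", 1000, 0.0015)
model.observe("selection", "GPU", 2000, 0.0020)

small = choose_processor(model, "selection", 500)
large = choose_processor(model, "selection", 10000)
```

With these made-up measurements, the small input is placed on the CPU and the large input on the GPU, because the fixed overhead of the GPU only pays off once enough rows are processed. Feeding further measurements through `observe` refines the estimates, which mirrors the runtime adaptation described above in a very simplified form.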


DBMS architecture · GPU acceleration · Co-processing · Main-memory DBMS



We thank Jens Teubner from TU Dortmund University and Theo Härder from the University of Kaiserslautern for their helpful feedback.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. TU Dortmund University, Dortmund, Germany
  2. University of Magdeburg, Magdeburg, Germany
