Automatic Selection of Processing Units for Coprocessing in Databases

  • Sebastian Breß
  • Felix Beier
  • Hannes Rauhe
  • Eike Schallehn
  • Kai-Uwe Sattler
  • Gunter Saake
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7503)


Specialized processing units such as GPUs or FPGAs provide great opportunities to speed up database operations by exploiting parallelism and relieving the CPU. But utilizing coprocessors efficiently poses major challenges to developers. Besides finding fine-grained data-parallel algorithms and tuning them for the available hardware, the system must decide at runtime which (co)processor should execute a specific task. Depending on the input parameters, wrong decisions can lead to severe performance degradation, since involving coprocessors introduces significant overhead, e.g., for data transfers. In this paper, we present a framework that automatically learns and adapts execution models for arbitrary algorithms on any (co)processor to find break-even points and support scheduling decisions. We demonstrate its applicability for three common use cases in modern database systems and show how their performance can be improved with well-informed scheduling decisions.
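The idea of learning execution models and locating break-even points can be illustrated with a minimal sketch. It is not the framework from the paper: the class name `ProcessorSelector`, its methods, and the choice of a simple linear model (time ≈ a + b · input size, fitted by ordinary least squares) are assumptions made here for illustration only, since the abstract does not specify the model or the learning method.

```python
from collections import defaultdict


class ProcessorSelector:
    """Hypothetical sketch: one linear cost model per processor.

    Assumption (not from the paper): execution time is modeled as
    time = a + b * size, where a captures fixed overhead such as
    data transfer setup and b the per-element cost.
    """

    def __init__(self):
        # processor name -> list of (input size, measured seconds)
        self.samples = defaultdict(list)

    def observe(self, processor, size, seconds):
        """Record one measured execution so the model can adapt."""
        self.samples[processor].append((float(size), float(seconds)))

    def _fit(self, processor):
        """Ordinary least squares fit; returns (intercept, slope)."""
        pts = self.samples[processor]
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pts)
        if sxx == 0.0:
            # All observations share one size: fall back to a constant model.
            return mean_y, 0.0
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
        slope = sxy / sxx
        return mean_y - slope * mean_x, slope

    def predict(self, processor, size):
        """Estimated execution time for the given input size."""
        intercept, slope = self._fit(processor)
        return intercept + slope * size

    def choose(self, size):
        """Pick the processor with the smallest predicted time."""
        return min(self.samples, key=lambda p: self.predict(p, size))

    def break_even(self, p, q):
        """Input size at which both models predict the same time.

        Returns None when the fitted lines are parallel.
        """
        a1, b1 = self._fit(p)
        a2, b2 = self._fit(q)
        if b1 == b2:
            return None
        return (a2 - a1) / (b1 - b2)
```

In this sketch, each processor needs at least one call to `observe` before `choose` or `break_even` is used. Refitting on every prediction keeps the code short; a real system would more likely update its models incrementally.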


Execution Time, Child Node, Batch Size, Schedule Decision, Automatic Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sebastian Breß (3)
  • Felix Beier (1)
  • Hannes Rauhe (1, 2)
  • Eike Schallehn (3)
  • Kai-Uwe Sattler (1)
  • Gunter Saake (3)

  1. Ilmenau University of Technology, Germany
  2. SAP AG, Germany
  3. Otto-von-Guericke University Magdeburg, Germany
