An Operator-Stream-Based Scheduling Engine for Effective GPU Coprocessing

  • Sebastian Breß
  • Norbert Siegmund
  • Ladjel Bellatreche
  • Gunter Saake
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8133)


For a decade, the database community has been researching ways to exploit graphics processing units (GPUs) to accelerate query processing. While the developed GPU algorithms often outperform their CPU counterparts, it is not beneficial to keep some processing devices idle while overutilizing others. Therefore, an approach is needed that effectively distributes a workload across the available (co-)processors while providing accurate performance estimations for the query optimizer. In this paper, we extend our hybrid query-processing engine with heuristics that optimize query processing for response time and throughput simultaneously via inter-device parallelism. Our empirical evaluation reveals that the new approach doubles the throughput compared to our previous solution and state-of-the-art approaches, owing to nearly equal device utilization, while preserving accurate performance estimations.
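The abstract states the goal (spread operators over the available (co-)processors using performance estimations so that no device sits idle) but does not spell out the heuristics. As a rough illustration only, the Python sketch below assumes a simple greedy rule: each operator is placed on the device whose estimated completion time, meaning the estimated runtime of the operators already waiting in that device's queue plus the operator's own estimated runtime, is smallest. The names `Device` and `schedule` and the per-device cost estimates are hypothetical inputs for this sketch; it is not presented as the engine's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Device:
    """Hypothetical processing device (e.g. CPU or GPU) with a work queue."""
    name: str
    # Sum of the estimated execution times of operators already queued here.
    queued_time: float = 0.0
    assigned: List[str] = field(default_factory=list)


def schedule(operators: List[Tuple[str, Dict[str, float]]],
             devices: List[Device]) -> Dict[str, str]:
    """Greedily assign each operator to the device that would finish it first.

    operators: list of (operator_id, {device_name: estimated_runtime}).
    The estimates are assumed to come from some cost model; here they are
    simply passed in.
    """
    placement: Dict[str, str] = {}
    for op_id, estimates in operators:
        # Estimated completion time = queued work + this operator's runtime.
        # On ties, min() keeps the first device in the list.
        best = min(devices, key=lambda d: d.queued_time + estimates[d.name])
        best.queued_time += estimates[best.name]
        best.assigned.append(op_id)
        placement[op_id] = best.name
    return placement


if __name__ == "__main__":
    devs = [Device("CPU"), Device("GPU")]
    # Five identical operators: 4 time units on the CPU, 1 on the GPU.
    ops = [(f"sel{i}", {"CPU": 4.0, "GPU": 1.0}) for i in range(1, 6)]
    print(schedule(ops, devs))
```

Always picking the fastest device would queue all five operators on the GPU, which finishes after 5 units while the CPU stays idle. Under the waiting-time-aware rule, the fourth operator goes to the CPU (at that point both devices would finish at time 4), so both devices finish after 4 units. Accounting for queued work is what turns per-operator estimates into inter-device parallelism in this sketch.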

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sebastian Breß (1)
  • Norbert Siegmund (1)
  • Ladjel Bellatreche (2)
  • Gunter Saake (1)

  1. University of Magdeburg, Germany
  2. LIAS/ISAE-ENSMA, Futuroscope, France