A Scalable Software Framework for Stateful Stream Data Processing on Multiple GPUs and Applications

  • Farhoosh Alghabi
  • Ulrich Schipper
  • Andreas Kolb


Over the past few years, increases in computational power have been achieved by adding processors with multiple cores and specialized processing units such as graphics processing units (GPUs). The introduction of programming languages such as CUDA and OpenCL has also made it easier, even for non-graphics programmers, to exploit the computational power of the massively parallel processors in current GPUs. Although CUDA and OpenCL relieve programmers of many low-level details of parallel programming across the cores of a single GPU, comparable support at the higher level of parallelization across multiple GPUs is still a subject of research. In particular, the programmer must still deal directly with fundamental issues of memory management and synchronization. In this chapter, we introduce concepts for CUDA-based frameworks designed for stateful stream data processing with graph-like arrangements of processing modules on two or more GPUs in a single compute node. We evaluate these concepts and elaborate on the approach we chose, which relieves the programmer of the error-prone chores of memory management and synchronization. The chapter presents detailed evaluation results that demonstrate the scalability of the proposed framework. To demonstrate its usability, we apply the framework to demanding online processing tasks in crystallographic structure detection and video decryption.


Keywords: GPGPU, Software framework, Multi-GPU, Stream data processing



This research was partially funded by the German Ministry for Research and Education (BMBF) under grant No. 05k10PSB.



Copyright information

© Springer Science+Business Media Singapore 2015

Authors and Affiliations

  • Farhoosh Alghabi (1)
  • Ulrich Schipper (1)
  • Andreas Kolb (1)
  1. Institute for Vision and Graphics, University of Siegen, Siegen, Germany
