A Multi-GPU Programming Library for Real-Time Applications

  • Sebastian Schaetz
  • Martin Uecker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7439)


We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine very high floating-point performance with high data locality and are thus well suited to implementing real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as for signal-processing applications in general.


Keywords: GPGPU, multi-GPU, hardware-aware algorithm, real-time, signal-processing, MRI, iterative image reconstruction




Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sebastian Schaetz (1)
  • Martin Uecker (2)
  1. BiomedNMR Forschungs GmbH, Max Planck Institute for Biophysical Chemistry, Goettingen, Germany
  2. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
