Skip to main content

A Multi-GPU Programming Library for Real-Time Applications

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7439))

Abstract

We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Austern, M.H.: Segmented Iterators and Hierarchical Algorithms. In: Jazayeri, M., Musser, D.R., Loos, R.G.K. (eds.) Dagstuhl Seminar 1998. LNCS, vol. 1766, p. 80. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  2. Bakushinskiĭ, A., Kokurin, M.: Iterative Methods for Approximate Solution of Inverse Problems, vol. 577. Kluwer Academic Pub. (2004)

    Google Scholar 

  3. Barak, A., Ben-Nun, T., Levy, E., Shiloh, A.: A package for OpenCL based heterogeneous computing on clusters with many GPU devices. In: 2010 IEEE International Conference on Cluster Computing Workshops and Posters (Cluster Workshops), pp. 1–7. IEEE (2010)

    Google Scholar 

  4. Bergstrom, L.: Measuring NUMA effects with the Stream benchmark. CoRR abs/1103.3225 (2011)

    Google Scholar 

  5. Block, K., Uecker, M., Frahm, J.: Undersampled radial MRI with multiple coils. Iterative image reconstruction using a total variation constraint. Magnetic Resonance in Medicine 57, 1086–1098 (2007)

    Google Scholar 

  6. Chilingaryan, S., Mirone, A., Hammersley, A., Ferrero, C., Helfen, L., Kopmann, A., dos Santos Rolo, T., Vagovic, P.: A GPU-Based Architecture for Real-Time Data Assessment at Synchrotron Experiments. IEEE Transactions on Nuclear Science (99), 1–1 (2011)

    Google Scholar 

  7. Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. Pitman (1989)

    Google Scholar 

  8. Dawes, B., Abrahams, D., Rivera, R.: Boost C++ libraries, http://www.boost.org

  9. Enmyren, J., Kessler, C.: SkePU: A Multi-Backend Skeleton Programming Library for Multi-GPU Systems. In: Proceedings of the Fourth International Workshop on High-Level Parallel Programming and Applications, pp. 5–14. ACM (2010)

    Google Scholar 

  10. Hansen, M., Atkinson, D., Sorensen, T.: Cartesian SENSE and k-t SENSE reconstruction using commodity graphics hardware. Magnetic Resonance in Medicine 59(3), 463–468 (2008)

    Article  Google Scholar 

  11. Hoberock, J., Bell, N.: Thrust: C++ Template Library for CUDA (2009)

    Google Scholar 

  12. Huang, F., Vijayakumar, S., Li, Y., Hertel, S., Duensing, G.: A software channel compression technique for faster reconstruction with many channels. Magnetic Resonance Imaging 26, 133–141 (2007)

    Article  Google Scholar 

  13. Jang, B., Kaeli, D., Do, S., Pien, H.: Multi GPU Implementation of Iterative Tomographic Reconstruction Algorithms. In: IEEE International Symposium on Biomedical Imaging: From Nano to Macro, ISBI 2009, pp. 185–188. IEEE (2009)

    Google Scholar 

  14. Jung, H., Sung, K., Nayak, K., Kim, E., Ye, J.: k-t focuss: A general compressed sensing framework for high resolution dynamic mri. Magnetic Resonance in Medicine 61(1), 103–116 (2009)

    Article  Google Scholar 

  15. Kim, D., Trzasko, J., Smelyanskiy, M., Haider, C., Dubey, P., Manduca, A.: High-performance 3D compressive sensing MRI reconstruction using many-core architectures. Journal of Biomedical Imaging 2 (2011)

    Google Scholar 

  16. Knoll, F., Freiberger, M., Bredies, K., Stollberger, R.: AGILE: An open source library for image reconstruction using graphics card hardware acceleration. Proc. Intl. Soc. Mag. Reson. Med. 19, 2554 (2011)

    Google Scholar 

  17. Lustig, M., Donoho, D., Pauly, J.: Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58, 1182–1195 (2007)

    Article  Google Scholar 

  18. Murphy, M., Alley, M., Demmel, J., Keutzer, K., Vasanawala, S., Lustig, M.: Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime. IEEE Transactions on Medical Imaging 1, 99 (2012)

    Google Scholar 

  19. Rupp, K., Rudolf, F., Weinbub, J.: ViennaCL - A High Level Linear Algebra Library for GPUs and Multi-Core CPUs. In: Proc. GPUScA, pp. 51–56 (2010)

    Google Scholar 

  20. Tsao, J., Boesiger, P., Pruessmann, K.: k-t blast and k-t sense: Dynamic mri with high frame rate exploiting spatiotemporal correlations. Magnetic Resonance in Medicine 50(5), 1031–1042 (2003)

    Article  Google Scholar 

  21. Uecker, M., Hohage, T., Block, K., Frahm, J.: Image Reconstruction by Regularized Nonlinear Inversion - Joint Estimation of Coil Sensitivities and Image Content. Magnetic Resonance in Medicine 60(3), 674–682 (2008)

    Article  Google Scholar 

  22. Uecker, M., Zhang, S., Frahm, J.: Nonlinear inverse reconstruction for real-time MRI of the human heart using undersampled radial FLASH. Magnetic Resonance in Medicine 63, 1456–1462 (2010)

    Article  Google Scholar 

  23. Uecker, M., Zhang, S., Voit, D., Karaus, A., Merboldt, K.D., Frahm, J.: Real-time MRI at a resolution of 20 ms. NMR in Biomedicine 23, 986–994 (2010)

    Article  Google Scholar 

  24. Verner, U., Schuster, A., Silberstein, M.: Processing Data Streams with Hard Real-time Constraints on Heterogeneous Systems. In: Proceedings of the International Conference on Supercomputing, pp. 120–129. ACM (2011)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schaetz, S., Uecker, M. (2012). A Multi-GPU Programming Library for Real-Time Applications. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33078-0_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-33078-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33077-3

  • Online ISBN: 978-3-642-33078-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics