A Multi-GPU Programming Library for Real-Time Applications

Schaetz, Sebastian; Uecker, Martin

doi:10.1007/978-3-642-33078-0_9


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7439)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing


Abstract

We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionately high floating-point performance with high data locality and are thus well suited to implementing real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as for signal-processing applications in general.
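The abstract names two core concepts, a container that spans several devices and MPI-like communication between them, without describing their interfaces. The following is a minimal, hypothetical sketch of those two ideas, not the MGPU API: the names `segmented_vector`, `scatter`, `gather` and `all_reduce_sum` are illustrative assumptions, and each "device" is simulated by a host `std::vector` so the code compiles and runs without a GPU. In a real multi-GPU implementation each segment would live in a separate device allocation and the copies would be host-to-device or peer-to-peer transfers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch, NOT the MGPU API: one logical vector whose elements
// are split into contiguous segments, one segment per (simulated) device.
template <typename T>
class segmented_vector {
public:
  // Split `size` elements into `num_devices` nearly equal segments;
  // the first (size % num_devices) segments receive one extra element.
  segmented_vector(std::size_t size, std::size_t num_devices)
      : size_(size), segments_(num_devices) {
    if (num_devices == 0) throw std::invalid_argument("need at least one device");
    const std::size_t base = size / num_devices;
    const std::size_t rest = size % num_devices;
    for (std::size_t d = 0; d < num_devices; ++d)
      segments_[d].resize(base + (d < rest ? 1 : 0));
  }

  std::size_t size() const { return size_; }
  std::size_t num_segments() const { return segments_.size(); }
  std::vector<T>& segment(std::size_t d) { return segments_[d]; }
  const std::vector<T>& segment(std::size_t d) const { return segments_[d]; }

  // Scatter: distribute contiguous host data across the device segments.
  // (On real hardware this would be one host-to-device copy per device.)
  void scatter(const std::vector<T>& host) {
    if (host.size() != size_) throw std::invalid_argument("size mismatch");
    auto it = host.begin();
    for (auto& seg : segments_) {
      std::copy_n(it, seg.size(), seg.begin());
      it += static_cast<std::ptrdiff_t>(seg.size());
    }
  }

  // Gather: concatenate all segments back into one contiguous host vector.
  std::vector<T> gather() const {
    std::vector<T> host;
    host.reserve(size_);
    for (const auto& seg : segments_) host.insert(host.end(), seg.begin(), seg.end());
    return host;
  }

private:
  std::size_t size_;
  std::vector<std::vector<T>> segments_;
};

// MPI-style all-reduce (sum) over equally sized per-device buffers:
// afterwards every device holds the element-wise sum of all buffers.
template <typename T>
void all_reduce_sum(std::vector<std::vector<T>>& per_device) {
  if (per_device.empty()) return;
  std::vector<T> acc(per_device[0].size(), T{});
  for (const auto& buf : per_device) {
    assert(buf.size() == acc.size() && "all buffers must have equal size");
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += buf[i];
  }
  for (auto& buf : per_device) buf = acc;  // "broadcast" the result back
}
```

Splitting ten elements over four simulated devices yields segments of 3, 3, 2 and 2 elements; each device can then work on its own segment independently, and a collective such as `all_reduce_sum` combines partial results (for example per-device partial sums in an iterative solver) so that every device continues with the same data.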



Author information

Authors and Affiliations

BiomedNMR Forschungs GmbH, Max Planck Institute for Biophysical Chemistry, Goettingen, Germany
Sebastian Schaetz
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
Martin Uecker


Editor information

Editors and Affiliations

School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Yang Xiang
SEECS, University of Ottawa, 8, King Edward Ave, K1N 6N5, Ottawa, ON, Canada
Ivan Stojmenovic
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, Higashi-ku, 813-8503, Fukuoka, Japan
Bernady O. Apduhan
School of Information Science and Engineering, Central South University, 410083, Changsha, Hunan Province, P.R. China
Guojun Wang
Department of Information Engineering, Hiroshima University, 1-4-1, Kagamiyama, 739-8527, Higashi-Hiroshima, Japan
Koji Nakano
School of Information Technologies, University of Sydney, Building J12, 2006, Sydney, NSW, Australia
Albert Zomaya


About this paper

Cite this paper

Schaetz, S., Uecker, M. (2012). A Multi-GPU Programming Library for Real-Time Applications. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33078-0_9


DOI: https://doi.org/10.1007/978-3-642-33078-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33077-3
Online ISBN: 978-3-642-33078-0
eBook Packages: Computer Science, Computer Science (R0)
