An Efficient Implementation of GPU Virtualization in High Performance Clusters

Duato, José; Igual, Francisco D.; Mayo, Rafael; Peña, Antonio J.; Quintana-Ortí, Enrique S.; Silla, Federico

doi:10.1007/978-3-642-14122-5_44

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6043)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Current high performance clusters are equipped with high-bandwidth, low-latency networks, large numbers of processors and nodes, and very fast storage systems. However, economic and power-related constraints generally make it infeasible to provide an accelerating co-processor, such as a graphics processor (GPU), in every node. To overcome this limitation, we present a GPU virtualization middleware that makes remote CUDA-compatible GPUs available to all cluster nodes. The software is implemented on top of the sockets application programming interface, which ensures portability over commodity networks, and it can also be easily adapted to high performance networks.
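
As an illustration only (the abstract gives no protocol details), the general idea of forwarding a computation to a remote node through the sockets API might be sketched in Python as follows. The length-prefixed message framing and the names send_msg, recv_msg, serve_one, and remote_scale are all hypothetical, and the scaling loop merely stands in for a real CUDA kernel launch on the node that owns the GPU. A local socket pair replaces the cluster network so that the sketch runs on a single machine.

```python
import socket
import struct
import threading

# Hypothetical wire format (not taken from the paper): every message is a
# 4-byte big-endian length followed by that many payload bytes.
def send_msg(sock, payload: bytes) -> None:
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def serve_one(sock) -> None:
    """Stand-in for the GPU-equipped node: handle a single request."""
    request = recv_msg(sock)
    factor, count = struct.unpack("!dI", request[:12])
    values = struct.unpack(f"!{count}d", request[12:])
    result = [factor * v for v in values]  # placeholder for a kernel launch
    send_msg(sock, struct.pack(f"!{count}d", *result))

def remote_scale(sock, factor, values):
    """Client-side stub: called like a local function, executed remotely."""
    payload = struct.pack("!dI", factor, len(values))
    payload += struct.pack(f"!{len(values)}d", *values)
    send_msg(sock, payload)
    reply = recv_msg(sock)
    return list(struct.unpack(f"!{len(values)}d", reply))

def demo():
    # socketpair() gives two connected endpoints without needing a port.
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    try:
        return remote_scale(client, 2.0, [1.0, 2.5, -4.0])
    finally:
        worker.join()
        client.close()
        server.close()
```

Calling demo() sends three values through the stub and returns them scaled by 2.0. Over a real network the same stub would use a TCP connection between a GPU-less node and a GPU-equipped one.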

Author information

Authors and Affiliations

Departamento de Informática de Sistemas y Computadores, Universidad Politécnica de Valencia (UPV), 46022, Valencia, Spain
José Duato, Antonio J. Peña & Federico Silla
Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I (UJI), 12071, Castellón, Spain
Francisco D. Igual, Rafael Mayo & Enrique S. Quintana-Ortí

Editor information

Editors and Affiliations

Institute for Applied Mathematics, Delft University of Technology, 2628, Delft, The Netherlands
Hai-Xiang Lin
Scaledinfra technologies GmbH, Köllnerhofgasse 3/15A, 1010, Vienna, Austria
Michael Alexander
VTT, Kaitovayla 1, 90570, Oulu, Finland
Martti Forsell
Technische Universität Dresden, 01069, Dresden, Germany
Andreas Knüpfer
Institute for Computer Science, Technical University of Innsbruck, 6020, Innsbruck, Austria
Radu Prodan
Instituto Superior Técnico/INESC-ID., Rua Alves Redol 9, 1000-029, Lisbon, Portugal
Leonel Sousa
Jülich Supercomputing Centre, 52425, Jülich, Germany
Achim Streit

About this paper

Cite this paper

Duato, J., Igual, F.D., Mayo, R., Peña, A.J., Quintana-Ortí, E.S., Silla, F. (2010). An Efficient Implementation of GPU Virtualization in High Performance Clusters. In: Lin, HX., et al. Euro-Par 2009 – Parallel Processing Workshops. Euro-Par 2009. Lecture Notes in Computer Science, vol 6043. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14122-5_44

DOI: https://doi.org/10.1007/978-3-642-14122-5_44
Published: 17 June 2010
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14121-8
Online ISBN: 978-3-642-14122-5
eBook Packages: Computer Science, Computer Science (R0)
