Abstract
Current high performance clusters are equipped with high-bandwidth, low-latency networks, many processors and nodes, and very fast storage systems. However, because of economic and/or power constraints, it is generally not feasible to provide an accelerating co-processor, such as a graphics processor (GPU), in every node. To overcome this limitation, this paper presents a GPU virtualization middleware that makes remote CUDA-compatible GPUs available to all the cluster nodes. The software is implemented on top of the sockets application programming interface, which ensures portability over commodity networks, and it can also be easily adapted to high performance networks.
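The general idea of forwarding an accelerator call over the sockets interface can be illustrated with a minimal sketch. It is not the middleware described in the paper: the wire format, the function identifier `FUNC_SCALE`, and the helpers `remote_scale` and `serve_one_request` are all hypothetical, and a plain Python loop stands in for the kernel that a GPU-equipped node would launch. A local socket pair replaces the cluster network so that the example runs on a single machine.

```python
import socket
import struct
import threading

# Hypothetical wire format (an assumption, not the paper's protocol):
#   request  = 4-byte function id, 4-byte payload length, payload
#   response = 4-byte status code, 4-byte payload length, payload
HEADER = struct.Struct("!II")
FUNC_SCALE = 1  # hypothetical id for a "multiply every element by 2" kernel


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock, code, payload):
    """Send a header (code, payload length) followed by the payload."""
    sock.sendall(HEADER.pack(code, len(payload)) + payload)


def recv_message(sock):
    """Receive one header and the payload whose length it announces."""
    code, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return code, recv_exact(sock, length)


def serve_one_request(sock):
    """Server side: would run on the node that owns the GPU."""
    func_id, payload = recv_message(sock)
    if func_id == FUNC_SCALE:
        count = len(payload) // 4
        values = struct.unpack(f"!{count}f", payload)
        # Stand-in for copying data to the local GPU and launching a kernel.
        result = struct.pack(f"!{count}f", *(2.0 * v for v in values))
        send_message(sock, 0, result)
    else:
        send_message(sock, 1, b"")  # status 1: unknown function id


def remote_scale(sock, values):
    """Client stub: looks like a local call but forwards over the socket."""
    send_message(sock, FUNC_SCALE, struct.pack(f"!{len(values)}f", *values))
    status, payload = recv_message(sock)
    if status != 0:
        raise RuntimeError(f"remote call failed with status {status}")
    return list(struct.unpack(f"!{len(payload) // 4}f", payload))


def demo():
    """Run one client call against one server thread over a socket pair."""
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(target=serve_one_request, args=(server_sock,))
    server.start()
    try:
        return remote_scale(client_sock, [1.0, 2.5, -4.0])
    finally:
        client_sock.close()
        server.join()
        server_sock.close()
```

Calling `demo()` sends three single-precision values to the server thread and returns the doubled values. In a real deployment the socket pair would be replaced by a TCP connection between a GPU-less node and a GPU-equipped node, and the stand-in loop by actual CUDA calls on the server.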
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Duato, J., Igual, F.D., Mayo, R., Peña, A.J., Quintana-Ortí, E.S., Silla, F. (2010). An Efficient Implementation of GPU Virtualization in High Performance Clusters. In: Lin, HX., et al. Euro-Par 2009 – Parallel Processing Workshops. Euro-Par 2009. Lecture Notes in Computer Science, vol 6043. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14122-5_44
DOI: https://doi.org/10.1007/978-3-642-14122-5_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14121-8
Online ISBN: 978-3-642-14122-5
eBook Packages: Computer Science, Computer Science (R0)