Euro-Par 2009 – Parallel Processing Workshops, pp. 385–394
An Efficient Implementation of GPU Virtualization in High Performance Clusters
Abstract
Current high performance clusters are equipped with high bandwidth/low latency networks, many processors and nodes, very fast storage systems, etc. However, due to economic and/or power-related constraints, it is generally not feasible to provide an accelerating co-processor, such as a graphics processor (GPU), in every node. To overcome this, in this paper we present a GPU virtualization middleware that makes remote CUDA-compatible GPUs available to all the cluster nodes. The software is implemented on top of the sockets application programming interface, which ensures portability over commodity networks, but it can also be easily adapted to high performance networks.
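The core idea of the middleware is call forwarding: a node without a GPU sends each device request over a socket to a node that has one, which executes it and returns the result. Below is a minimal Python sketch of that forwarding pattern only. It assumes a made-up wire format (a fixed header of opcode and payload length) and a simulated "device" that is just a list of floats held by the server. The opcodes, `RemoteGPU`, `server_loop` and the other names are hypothetical and are not the middleware's actual protocol or the CUDA API.

```python
import socket
import struct
import threading

# Hypothetical opcodes for this sketch only (not the paper's or CUDA's real protocol).
OP_QUIT, OP_MEMCPY_H2D, OP_SCALE, OP_MEMCPY_D2H = 0, 1, 2, 3
HEADER = struct.Struct("!II")  # (opcode, payload length), network byte order


def recv_exact(sock, n):
    """Read exactly n bytes, since a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf.extend(chunk)
    return bytes(buf)


def send_msg(sock, op, payload=b""):
    sock.sendall(HEADER.pack(op, len(payload)) + payload)


def recv_msg(sock):
    op, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return op, recv_exact(sock, length)


def server_loop(sock):
    """Server node: owns the simulated device memory and executes forwarded requests."""
    device_mem = []
    while True:
        op, payload = recv_msg(sock)
        if op == OP_QUIT:
            return
        if op == OP_MEMCPY_H2D:  # host-to-device copy
            device_mem = list(struct.unpack(f"!{len(payload) // 4}f", payload))
            send_msg(sock, op)  # empty acknowledgement
        elif op == OP_SCALE:  # stand-in for a kernel launch
            (factor,) = struct.unpack("!f", payload)
            device_mem = [x * factor for x in device_mem]
            send_msg(sock, op)
        elif op == OP_MEMCPY_D2H:  # device-to-host copy
            send_msg(sock, op, struct.pack(f"!{len(device_mem)}f", *device_mem))


class RemoteGPU:
    """Client-side stub: each method forwards one request and blocks for the reply."""

    def __init__(self, sock):
        self.sock = sock

    def _call(self, op, payload=b""):
        send_msg(self.sock, op, payload)
        reply_op, reply = recv_msg(self.sock)
        if reply_op != op:
            raise RuntimeError("unexpected reply opcode")
        return reply

    def memcpy_h2d(self, values):
        self._call(OP_MEMCPY_H2D, struct.pack(f"!{len(values)}f", *values))

    def scale(self, factor):
        self._call(OP_SCALE, struct.pack("!f", factor))

    def memcpy_d2h(self):
        reply = self._call(OP_MEMCPY_D2H)
        return list(struct.unpack(f"!{len(reply) // 4}f", reply))

    def close(self):
        send_msg(self.sock, OP_QUIT)


if __name__ == "__main__":
    # A connected socket pair stands in for the network link between two nodes.
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=server_loop, args=(server_end,))
    server.start()
    gpu = RemoteGPU(client_end)
    gpu.memcpy_h2d([1.0, 2.0, 3.0])
    gpu.scale(2.0)
    print(gpu.memcpy_d2h())  # [2.0, 4.0, 6.0]
    gpu.close()
    server.join()
    client_end.close()
    server_end.close()
```

Because every forwarded call is a small request followed by a wait for the reply, latency matters more than bandwidth for most calls. On a real TCP connection between nodes, such small messages are typically sent with Nagle's algorithm disabled (the `TCP_NODELAY` socket option) so the kernel does not hold them back waiting for more data to coalesce.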
Keywords
Graphics processors (GPUs), virtualization, high performance computing, clusters, Grid