Providing CUDA Acceleration to KVM Virtual Machines in InfiniBand Clusters with rCUDA

  • Ferran Pérez
  • Carlos Reaño
  • Federico Silla
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9687)


There is a trend towards using graphics processing units (GPUs) not only for graphics visualization but also for accelerating scientific applications. However, using GPUs for this purpose has drawbacks: GPUs increase costs and energy consumption, and they are generally underutilized. Virtual machines may help address these problems; however, current solutions for providing GPU acceleration to virtual machine environments such as KVM or Xen have some limitations. In this paper we propose using remote GPUs to accelerate scientific applications running inside KVM virtual machines. Our analysis shows that this approach may be a viable solution, with low overhead when used over InfiniBand networks.


Keywords: CUDA, KVM, Virtualization, InfiniBand, HPC



This work was funded by Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. The authors are grateful for the generous support provided by Mellanox Technologies and the equipment donated by NVIDIA Corporation.



Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  1. DISCA, Universitat Politècnica de València, Valencia, Spain
