
An Open-Source Virtualization Layer for CUDA Applications

Conference paper in Euro-Par 2020: Parallel Processing Workshops (Euro-Par 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12480)



GPUs have achieved widespread adoption in High-Performance Computing (HPC) and cloud applications. However, the closed-source nature of CUDA has hindered the development of virtualization techniques that are otherwise in common use. In this paper, we evaluate the feasibility of building a GPU virtualization layer that isolates the GPU and CPU parts of CUDA applications to achieve better control over the interactions between applications and the CUDA libraries. We present our open-source tool, which transparently intercepts CUDA library calls and executes them in a separate process using remote procedure calls. This allows CUDA applications to run on machines without a GPU and provides a basis for developing tools that require fine-grained control of GPU resources, such as checkpoint/restore and job schedulers.
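The split described in the abstract, a client-side stub that keeps the shape of an intercepted library call and forwards it to a separate process that owns the GPU, can be illustrated with a minimal sketch. It is not the authors' tool and makes several assumptions: the "GPU" is simulated by a pointer counter, the names `RemoteGpu`, `gpu_server`, and the `PROC_*` numbers are hypothetical, and a `multiprocessing` pipe carrying pickled tuples stands in for a real RPC protocol such as ONC RPC. Interception itself (for example, exporting these stubs under the CUDA function names from a preloaded shared library) is not shown. The sketch uses the POSIX-only `fork` start method.

```python
import multiprocessing as mp

# Hypothetical procedure numbers for this sketch; a real layer would derive
# them from an interface definition (for example an ONC RPC .x file).
PROC_MALLOC = 1
PROC_FREE = 2
PROC_SHUTDOWN = 3

ALIGNMENT = 256  # simulated allocation granularity in bytes


def gpu_server(conn):
    """Server loop: runs in its own process and owns all (simulated) GPU state."""
    next_ptr = 0x7F0000000000  # start of a fake device address range
    allocations = {}           # device pointer -> requested size in bytes
    while True:
        proc, arg = conn.recv()  # block until the client sends a request
        if proc == PROC_MALLOC:
            ptr = next_ptr
            allocations[ptr] = arg
            # Round the size up so every handle stays ALIGNMENT-aligned.
            next_ptr += (arg + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT
            conn.send((0, ptr))  # 0 = success, mirroring cudaSuccess == 0
        elif proc == PROC_FREE:
            status = 0 if allocations.pop(arg, None) is not None else 1
            conn.send((status, None))  # nonzero: pointer was never allocated
        elif proc == PROC_SHUTDOWN:
            conn.send((0, None))
            break
        else:
            conn.send((-1, None))  # unknown procedure number
    conn.close()


class RemoteGpu:
    """Client-side stubs that forward each call to the server process."""

    def __init__(self):
        # The "fork" start method (POSIX only) avoids re-importing this
        # module in the child process.
        ctx = mp.get_context("fork")
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=gpu_server, args=(child_conn,))
        self._proc.start()
        child_conn.close()  # the parent keeps only its own end of the pipe

    def _call(self, proc, arg=None):
        """Send one request and wait synchronously for its reply."""
        self._conn.send((proc, arg))
        return self._conn.recv()

    def malloc(self, size):
        """Return (status, device_handle); the handle is opaque to the client."""
        return self._call(PROC_MALLOC, size)

    def free(self, ptr):
        """Return 0 on success, nonzero if ptr is unknown to the server."""
        status, _ = self._call(PROC_FREE, ptr)
        return status

    def close(self):
        self._call(PROC_SHUTDOWN)
        self._conn.close()
        self._proc.join()


if __name__ == "__main__":
    gpu = RemoteGpu()
    _, a = gpu.malloc(1000)
    _, b = gpu.malloc(10)
    print(hex(a), hex(b))  # 0x7f0000000000 0x7f0000000400
    gpu.free(a)
    gpu.close()
```

Because all allocation state lives in the server process, the application side only ever holds opaque handles. In principle, this separation is what lets such a layer run the application on a machine without a GPU, or snapshot the server-side state for checkpoint/restore, as the abstract describes.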



Notes

  1. The Top500 list from November 2019, which ranks the fastest HPC clusters, contains no cluster that uses GPUs from different vendors.

  2. The code is available at

  3. As of this writing, the latest GPU generation and CUDA version are Turing and CUDA 10.2.




This research and development was supported by the German Federal Ministry of Education and Research under Grant 01IH16010C (Project ENVELOPE).

Author information



Corresponding author

Correspondence to Niklas Eiling.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Eiling, N., Lankes, S., Monti, A. (2021). An Open-Source Virtualization Layer for CUDA Applications. In: Balis, B., et al. (eds.) Euro-Par 2020: Parallel Processing Workshops. Euro-Par 2020. Lecture Notes in Computer Science, vol. 12480. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71592-2

  • Online ISBN: 978-3-030-71593-9

  • eBook Packages: Computer Science, Computer Science (R0)
