What Is the Right Balance for Performance and Isolation with Virtualization in HPC?

  • Thomas Naughton
  • Garry Smith
  • Christian Engelmann
  • Geoffroy Vallée
  • Ferrol Aderholdt
  • Stephen L. Scott
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8805)


The use of virtualization in high-performance computing (HPC) has been suggested as a means to provide tailored services and added functionality that many users expect from full-featured Linux cluster environments. While the use of virtual machines in HPC can offer several benefits, maintaining performance is a crucial factor. In some instances, performance criteria are placed above isolation properties, and the selective relaxation of isolation for performance is an important consideration when evaluating resilience in HPC environments that employ virtualization.

In this paper, we consider some of the factors associated with balancing performance and isolation in configurations that employ virtual machines. In this context, we propose a classification of errors based on the concept of “error zones”, as well as a detailed analysis of the trade-offs between resilience and performance based on the level of isolation provided by virtualization solutions. Finally, we present results from a set of experiments using different virtualization solutions, which further elucidate the topic.


Keywords: Virtual Machine, High Performance Computing, Fault Injection, Virtual Machine Migration, Virtual Machine Monitor





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Thomas Naughton (1, 2)
  • Garry Smith (2)
  • Christian Engelmann (1)
  • Geoffroy Vallée (1)
  • Ferrol Aderholdt (3)
  • Stephen L. Scott (1, 3)
  1. Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, USA
  2. The University of Reading, Reading, UK
  3. Computer Science, Tennessee Tech University, Cookeville, USA
