Performance comparison of multi-container deployment schemes for HPC workloads: an empirical study


The high-performance computing (HPC) community has recently started to use containerization to obtain fast, customized, portable, flexible, and reproducible deployments of their workloads. Previous work showed that deploying an HPC workload into a single container can preserve bare-metal performance. However, there is a lack of research on multi-container deployments that partition the processes belonging to each application into different containers. Partitioning HPC applications has been shown to improve their performance on virtual machines, because it allows the affinity of each partition to be set to a non-uniform memory access (NUMA) domain. Consequently, it is essential to understand the performance implications of distinct multi-container deployment schemes for HPC workloads, focusing on the impact of container granularity and its combination with processor and memory affinity. This paper presents a systematic performance comparison and analysis of multi-container deployment schemes for HPC workloads on a single-node platform, considering different containerization technologies (including Docker and Singularity), two platform architectures (UMA and NUMA), and two application subscription modes (exact subscription and over-subscription). Our results indicate that finer-grained multi-container deployments can, on the one hand, benefit the performance of some applications with low interprocess communication, especially in over-subscribed scenarios and when combined with affinity. On the other hand, they can incur some performance degradation for communication-intensive applications when using containerization technologies that deploy isolated network namespaces.
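To make "container granularity combined with affinity" concrete, the following Python sketch shows one possible way to split the ranks of an MPI application evenly across several containers and, optionally, confine each container to the cores and memory of a single NUMA node. It is a hypothetical illustration, not the deployment tooling used in the study. The node topology, the even-split policy, and the names `plan_deployment`, `ContainerPlan`, `to_cpuset`, and `docker_flags` are assumptions. Only `--cpuset-cpus` and `--cpuset-mems` are real `docker run` options.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerPlan:
    name: str                  # hypothetical container name
    ranks: List[int]           # MPI ranks placed in this container
    cpus: Optional[List[int]]  # cores it may use (None = no CPU affinity)
    mem_node: Optional[int]    # NUMA node for its memory (None = no memory affinity)


def to_cpuset(cpus: List[int]) -> str:
    """Compress a sorted, non-empty core list such as [0, 1, 2, 5] into '0-2,5'."""
    parts = []
    start = prev = cpus[0]
    for c in cpus[1:] + [None]:  # None is a sentinel that flushes the last range
        if c is not None and c == prev + 1:
            prev = c
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        if c is not None:
            start = prev = c
    return ",".join(parts)


def plan_deployment(num_procs: int, num_containers: int,
                    numa_nodes: List[List[int]],
                    affinity: bool = True) -> List[ContainerPlan]:
    """Split num_procs ranks evenly across num_containers containers.

    numa_nodes lists the core IDs of each NUMA node, e.g. [[0, 1], [2, 3]].
    With affinity, every container is confined to the cores and memory of one
    NUMA node; this sketch assumes num_containers is a multiple of the node count.
    """
    if num_procs % num_containers:
        raise ValueError("ranks must divide evenly among containers")
    if affinity and num_containers % len(numa_nodes):
        raise ValueError("with affinity, use a multiple of the NUMA node count")
    per_container = num_procs // num_containers
    per_node = num_containers // len(numa_nodes)  # containers sharing one NUMA node
    plans = []
    for i in range(num_containers):
        ranks = list(range(i * per_container, (i + 1) * per_container))
        cpus, mem_node = None, None
        if affinity:
            node = i // per_node
            cores = sorted(numa_nodes[node])
            share = len(cores) // per_node  # leftover cores, if any, stay unused
            if share == 0:
                raise ValueError("more containers than cores on a NUMA node")
            k = i % per_node
            cpus = cores[k * share:(k + 1) * share]
            mem_node = node
        plans.append(ContainerPlan(f"rank-group-{i}", ranks, cpus, mem_node))
    return plans


def docker_flags(plan: ContainerPlan) -> List[str]:
    """Build the `docker run` options that would apply this plan's affinity."""
    flags = []
    if plan.cpus is not None:
        flags.append(f"--cpuset-cpus={to_cpuset(plan.cpus)}")
    if plan.mem_node is not None:
        flags.append(f"--cpuset-mems={plan.mem_node}")
    return flags


if __name__ == "__main__":
    # Hypothetical two-socket node: cores 0-7 on NUMA node 0, cores 8-15 on node 1.
    topology = [list(range(0, 8)), list(range(8, 16))]
    for p in plan_deployment(16, 4, topology, affinity=True):
        print(p.name, p.ranks, docker_flags(p))
```

With 16 ranks and 4 containers on the assumed topology, each container receives four ranks and four cores of one NUMA node (for example, `--cpuset-cpus=0-3 --cpuset-mems=0` for the first). Passing 32 ranks instead keeps the same core sets but places eight ranks on four cores per container, which mirrors the over-subscribed mode. `affinity=False` yields the unpinned variant. How the same pinning would be applied with Singularity depends on its version and configuration and is not shown here.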


Figs. 1–40







We thank Lenovo for providing the technical infrastructure to run the experiments in this paper. This work was partially supported by Lenovo as part of the Lenovo-BSC collaboration agreement, by the Spanish Government under contract PID2019-107255GB-C22, and by the Generalitat de Catalunya under contract 2017-SGR-1414 and under grant 2020 FI-B 00257.

Author information



Corresponding author

Correspondence to Peini Liu.



About this article


Cite this article

Liu, P., Guitart, J. Performance comparison of multi-container deployment schemes for HPC workloads: an empirical study. J Supercomput 77, 6273–6312 (2021).



  • Docker
  • Singularity
  • Performance analysis
  • Deployment schemes
  • Multi-container
  • HPC workloads