Operational Concepts of GPU Systems in HPC Centers: TCO and Productivity

  • Fabian P. Schneider
  • Sandra Wienke
  • Matthias S. Müller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10659)

Abstract

Nowadays, numerous supercomputers comprise GPUs due to their promising high performance and memory bandwidth at low power consumption. With GPUs attached to a host system, applications can improve their runtime by utilizing both devices. However, this comes at the cost of increased development effort and system power consumption. In this paper, we compare the total cost of ownership (TCO) and productivity of different operational concepts of GPU systems in HPC centers, covering various (heterogeneous) program execution models and CPU-GPU setups. Our investigations include runtime, power consumption, development effort, and hardware purchase costs, and are exemplified with two application case studies.
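The cost factors listed above can be combined into a simple additive TCO model and a value-per-cost productivity figure. The sketch below is illustrative only: the function names, parameters, and the additive structure are assumptions for exposition, not the paper's actual model.

```python
def tco(hw_purchase_eur, power_kw, energy_price_eur_per_kwh,
        lifetime_hours, dev_hours, labor_eur_per_hour):
    """Hypothetical additive TCO: hardware purchase + energy + development effort.

    All parameter names and the model structure are illustrative assumptions,
    not the cost model defined in the paper.
    """
    energy_cost = power_kw * lifetime_hours * energy_price_eur_per_kwh
    dev_cost = dev_hours * labor_eur_per_hour
    return hw_purchase_eur + energy_cost + dev_cost

def productivity(app_runs_over_lifetime, total_cost_eur):
    """Productivity as scientific output (application runs) per unit cost."""
    return app_runs_over_lifetime / total_cost_eur

# Example: a GPU node bought for 10,000 EUR, drawing 0.5 kW over a
# 5-year lifetime at 0.30 EUR/kWh, plus 100 hours of porting effort
# at 60 EUR/h.
cost = tco(10_000, 0.5, 0.30, 5 * 365 * 24, 100, 60)
```

Comparing operational concepts then amounts to evaluating such a productivity figure per setup, where a faster but more power-hungry or harder-to-program configuration may or may not come out ahead.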


Keywords: TCO · Productivity · Multi-GPU · Operation · Procurement



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. IT Center, RWTH Aachen University, Aachen, Germany
  2. JARA – High-Performance Computing, Aachen, Germany
