A study of the effect of process malleability on the energy efficiency of GPU-based clusters

Abstract

The adoption of graphics processing units (GPUs) in high-performance computing (HPC) infrastructures largely determines, in many cases, the energy consumption of those facilities. For this reason, efficient management and administration of GPU-enabled clusters is crucial for their optimal operation. The main aim of this work is to study and design efficient mechanisms for job scheduling on GPU-enabled clusters by leveraging process malleability techniques, which can reconfigure running jobs depending on the cluster status. This paper presents a model that improves energy efficiency when processing a batch of jobs in an HPC cluster. The model is validated through the MPDATA algorithm, a representative example of the stencil computations used in numerical weather prediction. The proposed solution applies the obtained efficiency metrics in a new reconfiguration policy aimed at job arrays. Compared with traditional job management, where jobs are not reconfigured during their execution, this solution reduces the processing time of workloads by up to 4.8 times and the energy consumption of the cluster by up to 2.4 times.
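The core idea of the abstract, resizing running jobs according to the cluster status, can be illustrated with a minimal scheduling sketch. The `Job` abstraction and the `reconfigure` policy below are hypothetical and greatly simplified; they are not the paper's actual API or policy, only an illustration of the expand/shrink decision that process malleability enables: grow jobs onto idle GPUs when nothing is waiting, and shrink them to admit pending jobs when the cluster is full.

```python
# Illustrative sketch of a malleability-aware policy (hypothetical names):
# expand running jobs onto idle GPUs, shrink them when other jobs wait.
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int       # GPUs currently assigned
    min_gpus: int   # smallest configuration the job supports
    max_gpus: int   # largest configuration the job supports


def reconfigure(running, pending, total_gpus):
    """Return {job name: new GPU count} for jobs that should be resized."""
    idle = total_gpus - sum(j.gpus for j in running)
    actions = {}
    if pending and idle == 0:
        # Cluster is full and jobs are waiting: shrink one shrinkable job
        # to free GPUs for the queue.
        for j in running:
            if j.gpus > j.min_gpus:
                actions[j.name] = j.min_gpus
                break
    else:
        # Spare GPUs available: expand running jobs up to their maximum.
        for j in running:
            grow = min(idle, j.max_gpus - j.gpus)
            if grow > 0:
                actions[j.name] = j.gpus + grow
                idle -= grow
    return actions
```

For example, on an 8-GPU cluster with a single running job holding 2 GPUs (and able to use up to 8) and an empty queue, this toy policy expands the job to all 8 GPUs; with the cluster saturated and a job waiting, it shrinks a running job to its minimum configuration.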



Notes

  1. https://slurm.schedmd.com/job_array.html.
  2. http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto.
  3. http://docs.adaptivecomputing.com/mwm/7-0/Content/topics/jobAdministration/jobarrays.html.


Acknowledgements

The researcher from Universitat Jaume I (UJI) was supported by project TIN2017-82972-R from MINECO and FEDER. The Polish National Science Centre supported the researcher from the Czestochowa University of Technology under Grants No. UMO-2015/17/D/ST6/04059 and No. UMO-2017/26/D/ST6/00687. This work was partially performed during a short-term scientific mission (STSM) by Krzysztof Rojek to UJI, supported by EU COST Action IC1305. The authors are also grateful to the BSC for letting them use its HPC facilities. Finally, the authors thank Prof. Enrique S. Quintana-Ortí for his invaluable insights and comments, as well as the anonymous reviewers, whose suggestions significantly improved the manuscript.

Author information

Corresponding author

Correspondence to Sergio Iserte.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Iserte, S., Rojek, K. A study of the effect of process malleability on the energy efficiency of GPU-based clusters. J Supercomput 76, 255–274 (2020). https://doi.org/10.1007/s11227-019-03034-x


Keywords

  • Dynamic resource management
  • Job reconfiguration
  • MPI malleability
  • Job array-aware scheduling
  • MPDATA algorithm
  • Heterogeneous programming model