Abstract
The adoption of graphics processing units (GPUs) in high-performance computing (HPC) infrastructures determines, in many cases, the energy consumption of those facilities. For this reason, efficient management and administration of GPU-enabled clusters is crucial for their optimal operation. The main aim of this work is to study and design efficient job scheduling mechanisms for GPU-enabled clusters by leveraging process malleability techniques, which are able to reconfigure running jobs depending on the cluster status. This paper presents a model that improves energy efficiency when processing a batch of jobs in an HPC cluster. The model is validated with the MPDATA algorithm, a representative example of the stencil computations used in numerical weather prediction. The proposed solution applies the obtained efficiency metrics in a new reconfiguration policy aimed at job arrays. Compared with traditional job management, where jobs are not reconfigured during their execution, this solution reduces the processing time of workloads by up to 4.8 times and the energy consumption of the cluster by up to 2.4 times.
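The abstract does not spell out the reconfiguration policy itself. As a purely illustrative sketch, and not the authors' policy, the Python snippet below shows one way profiled efficiency metrics could drive a reconfiguration decision: for each candidate GPU count of a job it computes energy as average power multiplied by execution time and, among the configurations that fit in the currently free GPUs, selects the one with the lowest energy. The `Profile` class, the `pick_configuration` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Profile:
    """Hypothetical profiling result for one job configuration."""
    gpus: int        # number of GPUs assigned to the job
    time_s: float    # measured execution time with this many GPUs (s)
    power_w: float   # average power drawn by the allocated GPUs (W)

    @property
    def energy_j(self) -> float:
        # Energy (J) = average power (W) x execution time (s)
        return self.power_w * self.time_s


def pick_configuration(profiles: Sequence[Profile],
                       free_gpus: int) -> Optional[Profile]:
    """Return the feasible configuration with the lowest energy.

    A configuration is feasible if it needs no more GPUs than are free.
    Ties in energy are broken by the shorter execution time.
    Returns None if no configuration fits.
    """
    feasible = [p for p in profiles if p.gpus <= free_gpus]
    if not feasible:
        return None
    return min(feasible, key=lambda p: (p.energy_j, p.time_s))


if __name__ == "__main__":
    # Illustrative numbers only; these are not measurements from the paper.
    profiles = [
        Profile(gpus=1, time_s=100.0, power_w=200.0),  # 20000 J
        Profile(gpus=2, time_s=48.0, power_w=380.0),   # 18240 J
        Profile(gpus=4, time_s=32.0, power_w=720.0),   # 23040 J
    ]
    print(pick_configuration(profiles, free_gpus=4).gpus)  # prints 2
    print(pick_configuration(profiles, free_gpus=1).gpus)  # prints 1
```

Choosing the lowest-energy configuration rather than the fastest one mirrors the energy-efficiency focus of the abstract; a real scheduler would also have to weigh queue waiting times and the cost of the reconfiguration itself.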
Acknowledgements
The researcher from Universitat Jaume I (UJI) was supported by the project TIN2017-82972-R from MINECO and FEDER. The National Polish Science Centre supported the researcher from Czestochowa University of Technology under Grant No. UMO-2015/17/D/ST6/04059 and Grant No. UMO-2017/26/D/ST6/00687. This work was partially performed during a short-term scientific mission (STSM) by Krzysztof Rojek to UJI, supported by EU COST Action IC1305. The authors are also grateful to the Barcelona Supercomputing Center (BSC) for letting them use its HPC facilities. Finally, the authors thank Prof. Enrique S. Quintana-Ortí for his invaluable insights and comments, as well as the anonymous reviewers, whose suggestions significantly improved the manuscript.
Cite this article
Iserte, S., Rojek, K. An study of the effect of process malleability in the energy efficiency on GPU-based clusters. J Supercomput 76, 255–274 (2020). https://doi.org/10.1007/s11227-019-03034-x