Abstract
The use of heterogeneous systems in supercomputing is on the rise, as they improve both performance and energy efficiency. However, programming these machines requires considerable effort to get the best results in massively data-parallel applications. Maat is a library that enables OpenCL programmers to efficiently execute single data-parallel kernels using all the available devices on a heterogeneous system. It offers a set of load balancing methods, which partition the data and distribute it among the devices, exploiting more of the system's performance and consequently reducing execution time. Until now, however, the implications of these methods for energy consumption have not been studied. Therefore, this paper analyses the energy efficiency of the different load balancing methods compared to a baseline system that uses just a single GPU. To evaluate the impact of the heterogeneity of the system, the GPUs were set to different frequencies. The results show that in all the studied cases there is at least one load balancing method that improves energy efficiency.
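As a rough illustration of the idea summarised above, the sketch below splits a kernel's work-items among devices in proportion to their relative speeds (one simple form of static load balancing) and compares total energy against a single-device baseline. This is not Maat's API: the function names, the proportional-split rule, and the use of energy = average power × time as the comparison are all assumptions made for illustration, and the abstract does not state which metric the paper uses.

```python
def static_partition(total_items, speeds):
    """Split total_items into per-device chunk sizes proportional to speeds.

    `speeds` holds hypothetical relative throughputs, one per device.
    """
    if total_items < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need a non-negative item count and positive speeds")
    total_speed = sum(speeds)
    # Round each proportional share down to a whole number of work-items.
    sizes = [int(total_items * s / total_speed) for s in speeds]
    # Assign whatever rounding left over to the fastest device.
    leftover = total_items - sum(sizes)
    sizes[speeds.index(max(speeds))] += leftover
    return sizes


def energy_joules(avg_power_watts, seconds):
    """Energy consumed at a given average power over a given time."""
    return avg_power_watts * seconds


def energy_saving(baseline_power, baseline_time, system_power, system_time):
    """Fraction of energy saved relative to the baseline (negative if worse)."""
    baseline = energy_joules(baseline_power, baseline_time)
    system = energy_joules(system_power, system_time)
    return 1.0 - system / baseline
```

With this comparison, co-executing on several devices saves energy only if the reduction in execution time outweighs the extra power drawn by the additional devices.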
Cite this article
Pérez, B., Stafford, E., Bosque, J.L. et al. Energy efficiency of load balancing for data-parallel applications in heterogeneous systems. J Supercomput 73, 330–342 (2017). https://doi.org/10.1007/s11227-016-1864-y
DOI: https://doi.org/10.1007/s11227-016-1864-y