
Energy efficiency of load balancing for data-parallel applications in heterogeneous systems


The use of heterogeneous systems in supercomputing is on the rise because they improve both performance and energy efficiency. However, programming these machines requires considerable effort to get the best results in massively data-parallel applications. Maat is a library that enables OpenCL programmers to efficiently execute single data-parallel kernels using all the available devices on a heterogeneous system. It offers a set of load balancing methods that partition the data and distribute it among the devices, exploiting more of the system's performance and consequently reducing execution time. Until now, however, the implications of these methods for energy consumption have not been studied. Therefore, this paper analyses the energy efficiency of the different load balancing methods compared to a baseline system that uses just a single GPU. To evaluate the impact of the heterogeneity of the system, the GPUs were set to different frequencies. The results show that in all the studied cases there is at least one load balancing method that improves energy efficiency.
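To make the idea of data partitioning among devices concrete, the following is a minimal sketch of a static, proportional split of a kernel's work-items across devices. It is not Maat's API or any of its actual load balancing methods: the function name `static_partition` and the `relative_speeds` parameter are hypothetical, and the sketch assumes that the relative computing power of each device is known in advance.

```python
def static_partition(total_items, relative_speeds):
    """Split total_items work-items among devices in proportion to their speed.

    Hypothetical illustration: relative_speeds holds one positive number per
    device (for example, measured throughput). Returns one chunk size per
    device; the chunk sizes always add up to total_items.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not relative_speeds or any(s <= 0 for s in relative_speeds):
        raise ValueError("relative_speeds must contain only positive values")

    total_speed = sum(relative_speeds)
    # Compute cumulative chunk boundaries so that rounding errors cannot
    # accumulate: each boundary is rounded independently.
    bounds = [0]
    cumulative = 0.0
    for speed in relative_speeds:
        cumulative += speed
        boundary = round(total_items * cumulative / total_speed)
        bounds.append(min(total_items, boundary))
    bounds[-1] = total_items  # the last chunk absorbs any floating-point slack

    return [high - low for low, high in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Assumed example: two GPUs, the first three times faster than the second.
    print(static_partition(1000, [3.0, 1.0]))
```

In a system where the GPUs run at different frequencies, the speeds passed in would differ accordingly, so the faster device receives a proportionally larger chunk. Whether such a split also reduces energy depends on the power drawn by each device, which this sketch does not model.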





Author information

Correspondence to José Luis Bosque.


Cite this article

Pérez, B., Stafford, E., Bosque, J.L. et al. Energy efficiency of load balancing for data-parallel applications in heterogeneous systems. J Supercomput 73, 330–342 (2017).



  • Heterogeneous systems
  • Load balancing
  • Energy efficiency
  • OpenCL