Energy efficiency of load balancing for data-parallel applications in heterogeneous systems

Pérez, Borja; Stafford, Esteban; Bosque, José Luis; Beivide, Ramón

doi:10.1007/s11227-016-1864-y

Published: 08 September 2016

Volume 73, pages 330–342 (2017)

The Journal of Supercomputing

Abstract

The use of heterogeneous systems in supercomputing is on the rise as they improve both performance and energy efficiency. However, programming these machines requires considerable effort to get the best results in massively data-parallel applications. Maat is a library that enables OpenCL programmers to efficiently execute single data-parallel kernels using all the available devices on a heterogeneous system. It offers a set of load balancing methods, which perform the data partitioning and distribution among the devices, exploiting more of the performance of the system and consequently reducing execution time. Until now, however, the implications of these methods for energy consumption have not been studied. Therefore, this paper analyses the energy efficiency of the different load balancing methods compared to a baseline system that uses just a single GPU. To evaluate the impact of the heterogeneity of the system, the GPUs were set to different frequencies. The results show that in all the studied cases there is at least one load balancing method that improves energy efficiency.
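
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not Maat's implementation or API, which the abstract does not show. It assumes a simple static split in which each device receives a share of the work-items proportional to its throughput, and a simple energy model in which a device draws a fixed active power while computing and a fixed idle power while waiting for the slowest device. The `Device` class, the function names, and all numeric figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    throughput: float    # work-items per second (hypothetical figure)
    active_power: float  # watts while computing (hypothetical figure)
    idle_power: float    # watts while waiting (hypothetical figure)


def static_partition(total_items, devices):
    """Split total_items proportionally to each device's throughput."""
    total_throughput = sum(d.throughput for d in devices)
    shares = [int(total_items * d.throughput / total_throughput) for d in devices]
    # Hand any rounding remainder to the fastest device.
    fastest = max(range(len(devices)), key=lambda i: devices[i].throughput)
    shares[fastest] += total_items - sum(shares)
    return shares


def run_cost(shares, devices):
    """Return (makespan in seconds, energy in joules) for a given split."""
    times = [s / d.throughput for s, d in zip(shares, devices)]
    makespan = max(times)
    # Each device is active for its own time and idle until the slowest one ends.
    energy = sum(
        d.active_power * t + d.idle_power * (makespan - t)
        for d, t in zip(devices, times)
    )
    return makespan, energy


if __name__ == "__main__":
    # Two GPUs at different (assumed) frequencies plus a CPU.
    devices = [
        Device("gpu0", throughput=1000, active_power=200, idle_power=50),
        Device("gpu1", throughput=500, active_power=120, idle_power=50),
        Device("cpu", throughput=100, active_power=80, idle_power=40),
    ]
    total = 1600

    balanced = static_partition(total, devices)
    baseline = [total, 0, 0]  # single-GPU baseline: other devices stay idle

    for label, shares in (("balanced", balanced), ("baseline", baseline)):
        makespan, energy = run_cost(shares, devices)
        print(f"{label}: shares={shares} time={makespan:.2f}s energy={energy:.1f}J")
```

In this model the outcome depends heavily on the assumed idle power: when idle devices draw little power, the single-GPU baseline can consume less energy even though it takes longer, which is why execution time alone does not settle the energy question.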

Author information

Authors and Affiliations

Computer Engineering and Electronics Department, University of Cantabria, Santander, Spain
Borja Pérez, Esteban Stafford, José Luis Bosque & Ramón Beivide

Corresponding author

Correspondence to José Luis Bosque.

About this article

Cite this article

Pérez, B., Stafford, E., Bosque, J.L. et al. Energy efficiency of load balancing for data-parallel applications in heterogeneous systems. J Supercomput 73, 330–342 (2017). https://doi.org/10.1007/s11227-016-1864-y

Issue Date: January 2017
DOI: https://doi.org/10.1007/s11227-016-1864-y
