The Journal of Supercomputing, Volume 75, Issue 3, pp 1610–1624

A heuristic technique to improve energy efficiency with dynamic load balancing

  • Alberto Cabrera
  • Alejandro Acosta
  • Francisco Almeida
  • Vicente Blanco


Heterogeneous computers require a well-distributed workload to operate efficiently. When possible, the load balancing procedure should redistribute the workload with minimal knowledge of the system architecture, to reduce overhead. We propose a generic dynamic load balancing technique for iterative problems that is independent of the resource being optimized. The generality of the technique is shown through its formalization. Based on this formalization, we define a heuristic algorithm whose structure accommodates different objective functions, so that swapping the objective function requires relatively little effort. We implement this heuristic to minimize energy consumption in an application that solves three different dynamic programming problems on multiple GPUs. We describe the implementation and compare it against two other workload distributions: the homogeneous distribution and the one produced by another dynamic load balancing technique. Our experiments show good results in minimizing the overall energy consumption, with low overhead.
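The idea of a balancing loop with a swappable objective function can be illustrated with a small sketch. This is not the heuristic defined in the paper, which the abstract does not specify. It is a plain hill-climbing search written under stated assumptions. The names `balance`, `rebalance`, `make_makespan` and `make_energy`, the per-unit times, and the power figures are all hypothetical. The objective here is an analytic model, whereas a real dynamic balancer would presumably measure time or energy at run time after each iteration.

```python
from typing import Callable, List, Sequence

# An objective maps a workload distribution (work units per device) to a cost.
Objective = Callable[[Sequence[int]], float]


def rebalance(dist: List[int], objective: Objective, step: int) -> List[int]:
    """Try moving `step` work units between every ordered pair of devices.

    Returns the cheapest distribution found, or a copy of the input if no
    single move lowers the objective.
    """
    best, best_cost = list(dist), objective(dist)
    for src in range(len(dist)):
        if dist[src] < step:
            continue  # this device cannot give away `step` units
        for dst in range(len(dist)):
            if src == dst:
                continue
            cand = list(dist)
            cand[src] -= step
            cand[dst] += step
            cost = objective(cand)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best


def balance(total: int, devices: int, objective: Objective,
            iterations: int = 50, step: int = 64) -> List[int]:
    """Start from the homogeneous distribution and refine it iteratively."""
    base = total // devices
    dist = [base] * devices
    dist[0] += total - base * devices  # give the remainder to device 0
    for _ in range(iterations):
        new = rebalance(dist, objective, step)
        if new == dist:
            step //= 2  # no improving move: search with a finer step
            if step == 0:
                break
        dist = new
    return dist


def make_makespan(unit_time: Sequence[float]) -> Objective:
    """Hypothetical time objective: the slowest device sets the iteration time."""
    def makespan(dist: Sequence[int]) -> float:
        return max(w * t for w, t in zip(dist, unit_time))
    return makespan


def make_energy(unit_time: Sequence[float], active_w: Sequence[float],
                idle_w: Sequence[float]) -> Objective:
    """Hypothetical energy objective: active power while busy, idle power
    while waiting for the slowest device to finish."""
    def energy(dist: Sequence[int]) -> float:
        busy = [w * t for w, t in zip(dist, unit_time)]
        span = max(busy)
        return sum(a * b + i * (span - b)
                   for a, i, b in zip(active_w, idle_w, busy))
    return energy
```

Only the `objective` argument changes between the time and energy variants, for example `balance(1000, 2, make_makespan([1.0, 2.0]))` versus `balance(1000, 2, make_energy([1.0, 2.0], [200.0, 100.0], [20.0, 10.0]))`. The search code is the same in both cases.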


Dynamic load balancing, Iterative algorithms, Parallel computing, Energy efficiency



This work was supported by the Spanish Ministry of Science, Innovation and Universities through the TIN2016-78919-R project; the Government of the Canary Islands, with the project ProID2017010130 and the grant TESIS2017010134, which is co-financed by the Ministry of Economy, Industry, Commerce and Knowledge of Canary Islands and the European Social Funds (ESF), operative program integrated of Canary Islands 2014–2020, Strategy Aim 3, Priority Topic 74 (85%); the Spanish network CAPAP-H; and the European COST Action CHIPSET.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. HPC Group, Escuela Superior de Ingeniería y Tecnología, Universidad de La Laguna, San Cristóbal de La Laguna, Spain
