The Journal of Supercomputing, Volume 73, Issue 1, pp 114–125

An approach to optimise the energy efficiency of iterative computation on integrated GPU–CPU systems

  • E. M. Garzón
  • J. J. Moreno
  • J. A. Martínez


Energy efficiency is now a primary concern in computational systems. This work proposes an approach for improving the energy efficiency of iterative computation on integrated GPU–CPU systems. The proposal, referred to as E-ADITHE, combines iterative procedures with (1) a heuristic scheme that selects processing units according to their estimated energy efficiency and (2) load balancing across heterogeneous processors. A wide variety of iterative algorithms in science and engineering can take advantage of E-ADITHE. The Beltrami filter has been selected as a representative example of such procedures, and its OpenCL version has been used to validate E-ADITHE. The analysis of the results shows that E-ADITHE automatically improves the energy efficiency of parallel iterative algorithms on modern heterogeneous processors.
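The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the E-ADITHE implementation. It assumes that a few warm-up iterations have already produced a time and an average power reading for each candidate set of processing units, and that energy per iteration (power times time) is an acceptable stand-in for the energy-efficiency estimate. All names here (`Sample`, `select_configuration`, `balance_load`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Warm-up measurements for one candidate configuration (hypothetical)."""
    name: str                # e.g. "CPU", "GPU", "CPU+GPU"
    seconds_per_iter: float  # measured time of one iteration
    avg_power_watts: float   # measured average power during that iteration


def energy_per_iteration(sample: Sample) -> float:
    # Energy in joules = average power (W) * elapsed time (s).
    return sample.avg_power_watts * sample.seconds_per_iter


def select_configuration(samples: list[Sample]) -> Sample:
    # Assumed heuristic: keep the configuration with the lowest
    # estimated energy per iteration.
    return min(samples, key=energy_per_iteration)


def balance_load(total_rows: int, seconds_per_row: dict[str, float]) -> dict[str, int]:
    """Split rows among devices in proportion to their measured speed."""
    speeds = {dev: 1.0 / t for dev, t in seconds_per_row.items()}
    total_speed = sum(speeds.values())
    shares = {dev: int(total_rows * v / total_speed) for dev, v in speeds.items()}
    # Rows lost to integer truncation go to the fastest device.
    fastest = max(speeds, key=speeds.get)
    shares[fastest] += total_rows - sum(shares.values())
    return shares


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    samples = [
        Sample("CPU", 0.10, 40.0),
        Sample("GPU", 0.04, 60.0),
        Sample("CPU+GPU", 0.03, 90.0),
    ]
    print(select_configuration(samples).name)
    print(balance_load(1000, {"cpu": 0.003, "gpu": 0.001}))
```

In an iterative code, both functions would be re-evaluated periodically so that the selection and the partition follow changes in the measured time and power. How often to do so, and how power is actually measured, is left open here.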


Energy efficiency · Heterogeneous processors · Parallel iterative algorithms · Integrated CPU–GPU



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • E. M. Garzón (1)
  • J. J. Moreno (1)
  • J. A. Martínez (1)
  1. Department of Informatics, University of Almería, ceiA3, Almería, Spain
