Feedback Control Optimization for Performance and Energy Efficiency on CPU-GPU Heterogeneous Systems

  • Feng-Sheng Lin
  • Po-Ting Liu
  • Ming-Hua Li
  • Pao-Ann Hsiung
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10048)


Owing to the rising awareness of environmental protection, high performance is no longer the only aim in system design; energy efficiency has become an increasingly important goal. In accordance with this goal, heterogeneous systems, which are more efficient than CPU-based homogeneous systems, occupy a growing proportion of the Top500 and Green500 lists. Nevertheless, heterogeneous systems are more complex to design, which makes it harder to achieve a good tradeoff between performance and energy efficiency for the applications running on them. To address the performance-energy tradeoff in CPU-GPU heterogeneous systems, we propose a novel feedback control optimization (FCO) method that alternates between frequency scaling of the devices and division of the kernel workload between the CPU and the GPU. Given a kernel and a workload division, frequency scaling finds near-optimal core frequencies for the CPU and the GPU. An iterative algorithm then finds a near-optimal workload division that balances the workload between the CPU and the GPU at the frequencies that were optimal for the previous workload division. The frequency scaling phase and the workload division phase are performed alternately until the FCO method converges on a configuration consisting of the CPU core frequency, the GPU core frequency, and the workload division. Experiments show that, compared with the state-of-the-art GreenGPU method, performance can be improved by 7.9% while energy consumption can be reduced by 4.16%.
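The alternation between the two phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frequency levels, the toy time and energy model in `run_kernel`, the energy-delay objective in `cost`, the exhaustive frequency search, and the damped rebalancing rule with its `gain` parameter are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

# Hypothetical discrete frequency levels (GHz); real values come from the hardware.
CPU_FREQS = (1.2, 1.6, 2.0, 2.4)
GPU_FREQS = (0.4, 0.6, 0.8, 1.0)


def run_kernel(cpu_share, f_cpu, f_gpu):
    """Toy stand-in for one measured kernel run (all constants are made up).

    Returns (cpu_time, gpu_time, energy).
    """
    cpu_time = cpu_share * 40.0 / f_cpu          # CPU alone: 40 s at 1 GHz
    gpu_time = (1.0 - cpu_share) * 8.0 / f_gpu   # GPU alone: 8 s at 1 GHz
    makespan = max(cpu_time, gpu_time)
    energy = (cpu_time * 4.0 * f_cpu ** 3        # dynamic power grows as f^3
              + gpu_time * 60.0 * f_gpu ** 3
              + makespan * 30.0)                 # constant idle power
    return cpu_time, gpu_time, energy


def cost(cpu_share, f_cpu, f_gpu):
    """Assumed objective: energy-delay product (the paper's metric may differ)."""
    cpu_time, gpu_time, energy = run_kernel(cpu_share, f_cpu, f_gpu)
    return energy * max(cpu_time, gpu_time)


def scale_frequencies(cpu_share):
    """Frequency scaling phase: best frequency pair for a fixed division."""
    return min(product(CPU_FREQS, GPU_FREQS),
               key=lambda f: cost(cpu_share, f[0], f[1]))


def rebalance(cpu_share, f_cpu, f_gpu, gain=0.5):
    """Workload division phase: move the CPU share toward the balanced split."""
    cpu_time, gpu_time, _ = run_kernel(cpu_share, f_cpu, f_gpu)
    cpu_rate = cpu_share / cpu_time              # fraction of work per second
    gpu_rate = (1.0 - cpu_share) / gpu_time
    balanced = cpu_rate / (cpu_rate + gpu_rate)  # split with equal finish times
    return cpu_share + gain * (balanced - cpu_share)


def fco(cpu_share=0.5, tol=1e-3, max_iters=100):
    """Alternate both phases until the configuration stops changing.

    cpu_share must lie strictly between 0 and 1.
    """
    freqs = None
    for _ in range(max_iters):
        new_freqs = scale_frequencies(cpu_share)
        new_share = rebalance(cpu_share, *new_freqs)
        if new_freqs == freqs and abs(new_share - cpu_share) < tol:
            break
        freqs, cpu_share = new_freqs, new_share
    return cpu_share, freqs[0], freqs[1]
```

Calling `fco()` returns a tuple of CPU share, CPU frequency, and GPU frequency. On a real system, `run_kernel` would be replaced by timed and power-metered executions of the kernel, and the exhaustive frequency search by whatever near-optimal search the method actually uses.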


CPU, GPU, Heterogeneous system, Frequency scaling, Workload division, Performance, Energy efficiency



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Feng-Sheng Lin (1)
  • Po-Ting Liu (2)
  • Ming-Hua Li (1)
  • Pao-Ann Hsiung (2)
  1. Information and Communication Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
  2. Department of Computer Science and Information Technology, National Chung Cheng University, Chiayi, Taiwan
