The Visual Computer, Volume 31, Issue 6–8, pp 1045–1054

GPGPU-Perf: efficient, interval-based DVFS algorithm for mobile GPGPU applications

  • SeongKi Kim
  • Young J. Kim
Original Article


Although general-purpose computation on graphics processing units (GPGPU) is now available even on mobile GPUs, its performance is seriously affected by the GPU's underlying dynamic voltage and frequency scaling (DVFS) mechanism. To save energy and thereby prolong battery life, DVFS adjusts the GPU's frequency according to past utilization. When the GPU processes only graphics tasks, it is enough to finish them within a fixed time budget (typically 30–60 frames per second), so the DVFS parameters can be set conservatively. GPGPU tasks, however, should be processed at much higher rates, depending on the application. Although modifying the DVFS parameters may improve GPGPU performance, doing so sacrifices energy efficiency and affects the performance of graphics tasks, because the parameters are shared by graphics and GPGPU tasks alike. To improve GPGPU performance without influencing graphics performance, we devise a new algorithm, GPGPU-Perf, that adjusts DVFS parameters such as the thresholds and the interval. The new algorithm controls the frequency more intelligently for mobile GPGPU applications; as a result, the performance-to-energy ratio increases by a factor of 1.44, with no influence on graphics tasks and without any modification of the GPGPU algorithms. To the best of our knowledge, this paper is the first work to propose a GPU-DVFS algorithm for GPGPU applications.
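To make the roles of the thresholds and the interval concrete, the following is a minimal sketch of a generic utilization-threshold DVFS policy. It is not the GPGPU-Perf algorithm itself, whose details are not given in this abstract. The frequency table, the threshold values, and the names `ThresholdGovernor`, `simulate`, and `ramp_up_time_ms` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical frequency steps in MHz; a real GPU exposes its own table.
FREQS_MHZ = (100, 200, 300, 400, 500)


@dataclass
class ThresholdGovernor:
    """Generic threshold-based DVFS policy (illustrative, not GPGPU-Perf)."""
    up_threshold: float = 0.90    # assumed: step up above 90% utilization
    down_threshold: float = 0.60  # assumed: step down below 60% utilization
    level: int = 0                # index into FREQS_MHZ

    def step(self, utilization: float) -> int:
        """Run once per sampling interval with the utilization (0.0-1.0)
        measured over the interval that just ended; return the new frequency."""
        if utilization > self.up_threshold and self.level < len(FREQS_MHZ) - 1:
            self.level += 1
        elif utilization < self.down_threshold and self.level > 0:
            self.level -= 1
        return FREQS_MHZ[self.level]


def simulate(governor, utilizations):
    """Feed one utilization sample per interval and collect the frequencies."""
    return [governor.step(u) for u in utilizations]


def ramp_up_time_ms(governor, interval_ms):
    """Time for a fully loaded GPU (utilization 1.0) to reach the top frequency.
    Assumes up_threshold < 1.0, otherwise the governor never steps up."""
    intervals = 0
    while governor.level < len(FREQS_MHZ) - 1:
        governor.step(1.0)
        intervals += 1
    return intervals * interval_ms


if __name__ == "__main__":
    load = [0.95, 0.95, 0.5, 0.7]
    print(simulate(ThresholdGovernor(), load))          # conservative thresholds
    print(simulate(ThresholdGovernor(0.5, 0.2), load))  # lower thresholds
    print(ramp_up_time_ms(ThresholdGovernor(), 100))    # long interval
    print(ramp_up_time_ms(ThresholdGovernor(), 20))     # short interval
```

In this toy model, lowering the thresholds or shortening the interval lets the frequency climb sooner under a compute-heavy load, but the same settings would also apply to graphics workloads. That coupling is the trade-off between GPGPU performance, energy efficiency, and graphics behavior that the abstract describes.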


DVFS, GPGPU, Mobile device, OpenCL, OpenGL ES



This work was supported in part by NRF in Korea (2012R1A2A2A01046246, 2012R1A2A2A06047007, 2014K1A3A1A17073365) and MCST/KOCCA in the CT R&D program 2014 (R2014060011). Young J. Kim is the corresponding author.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, Ewha Womans University, Seoul, Korea
