
Impacts of optimization strategies on performance, power/energy consumption of a GPU based parallel reduction

Journal of Central South University

Abstract

In modern high performance computing, GPUs are regarded as excellent accelerators for general-purpose, data-intensive parallel applications. Many performance-oriented optimization techniques have been proposed to obtain application speedup on GPUs. However, given the growing importance of power and energy constraints, power- and energy-aware optimization of GPUs also needs to be investigated in detail alongside performance-oriented optimization. In this work, to explore the impact of various optimization strategies on GPU performance, power and energy consumption, we evaluate a well-known application, parallel reduction, running on different commercial GPU devices with different optimization strategies. To observe more general performance and power consumption patterns of GPU-based acceleration, our evaluations cover three NVIDIA GPU generations (Fermi, Kepler and Maxwell architectures) and various core and memory clock frequencies. We analyze how GPU kernel execution is affected by each optimization and which GPU architectural factors most strongly affect performance and power/energy consumption. This paper also categorizes which optimization technique primarily improves which metric (i.e., performance, power or energy efficiency). Furthermore, voltage frequency scaling (VFS) is applied to examine the effect of changing the clock frequency on these metrics. In general, our work shows that effective GPU optimization strategies can improve application performance significantly without increasing power and energy consumption.
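The three metrics named in the abstract are linked by the identity energy = average power × execution time, so an optimization can raise power and still lower energy if it shortens the run enough. The following is a minimal sketch of that bookkeeping in Python. The `KernelRun` class, the `compare` helper and all numeric values are hypothetical illustrations, not measurements or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class KernelRun:
    """One measured kernel execution (all values here are hypothetical)."""
    name: str
    runtime_s: float    # kernel execution time in seconds
    avg_power_w: float  # average power drawn during the kernel, in watts

    @property
    def energy_j(self) -> float:
        # Energy in joules is average power multiplied by execution time.
        return self.avg_power_w * self.runtime_s


def compare(baseline: KernelRun, optimized: KernelRun) -> dict:
    """Return speedup, power ratio and energy ratio of optimized vs. baseline."""
    return {
        "speedup": baseline.runtime_s / optimized.runtime_s,
        "power_ratio": optimized.avg_power_w / baseline.avg_power_w,
        "energy_ratio": optimized.energy_j / baseline.energy_j,
    }


# Made-up numbers chosen only to illustrate the relationship.
baseline = KernelRun("baseline reduction", runtime_s=2.0, avg_power_w=100.0)
optimized = KernelRun("optimized reduction", runtime_s=0.5, avg_power_w=120.0)

result = compare(baseline, optimized)
for metric, value in result.items():
    print(f"{metric}: {value:.2f}")
```

In this made-up case the optimized run draws more power than the baseline but finishes four times sooner, so its total energy is lower. Keeping the three ratios separate is one way to record which metric a given optimization primarily improves.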



Corresponding author

Correspondence to Jeong-Gun Lee.


Cite this article

Phuong, T.Y., Lee, DY. & Lee, JG. Impacts of optimization strategies on performance, power/energy consumption of a GPU based parallel reduction. J. Cent. South Univ. 24, 2624–2637 (2017). https://doi.org/10.1007/s11771-017-3676-5


  • DOI: https://doi.org/10.1007/s11771-017-3676-5
