Abstract
GPGPUs have become a consequential component of supercomputing architectures, and a growing share of HPC application developers no longer refrain from using them. CUDA has remained a successful computing platform for these architectures: over the past few years, thousands of scientific applications from domains such as bioinformatics and high-energy physics have been accelerated with CUDA. However, energy consumption remains a serious challenge for the HPC and GPGPU communities. This paper proposes energy prediction approaches based on dynamic regression models: parallel dynamic random forest modeling (P-DynRFM), dynamic random forest modeling (DynRFM), dynamic support vector machines (DynSVM), and dynamic linear regression modeling (DynLRM). These models identify energy-efficient CUDA application instances while considering block size, grid size, and other tunable parameters such as problem size. Predictions are obtained by executing only a few CUDA application instances and inferring the energy of the remaining instances from application performance metrics, such as instruction counts and memory behavior. The proposed energy prediction mechanisms were evaluated with CUDA applications, such as N-body and particle simulations, on two GPGPU machines. The dynamic prediction mechanisms improved energy/performance prediction accuracy by 50.26 to 61.23 percent compared to classical prediction models, and the parallel implementation of the dynamic RFM (P-DynRFM) reduced prediction time by over 83 percent.
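The workflow the abstract describes — execute a few CUDA application instances, then predict the energy of the remaining instances from their tunable parameters and performance metrics — can be sketched with an off-the-shelf random forest regressor. This is a minimal illustration in the spirit of DynRFM, not the authors' implementation: the feature names (block size, grid size, problem size, instruction count) and the synthetic energy function are assumptions made for the example.

```python
# Sketch of random-forest-based energy prediction for CUDA application
# instances, in the spirit of the DynRFM approach described above.
# Features and the synthetic energy model are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic instances: [block_size, grid_size, problem_size, instructions]
X = rng.uniform([32, 1, 1e4, 1e6], [1024, 256, 1e6, 1e9], size=(200, 4))
# Synthetic energy measurements (joules): a noisy nonlinear dependence
# on instruction count and on problem size per thread block.
y = 0.5e-6 * X[:, 3] + 0.01 * X[:, 2] / X[:, 0] + rng.normal(0, 5, 200)

# "Execute" only a few instances (training set); predict all the rest.
train, test = slice(0, 40), slice(40, None)
model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
model.fit(X[train], y[train])

pred = model.predict(X[test])
print(f"Predicted energy for {len(pred)} unseen instances")
```

Setting `n_jobs=-1` trains the forest's trees in parallel, which loosely mirrors the motivation behind the parallel P-DynRFM variant: the prediction step itself can dominate tuning time, so parallelizing the model matters.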
Acknowledgements
This work was funded by the Department of Science and Technology (DST) of India under the Indo-Austrian PPP Scheme, Engineering Sciences division. The authors thank DST (India) and FWF (Austria) for funding this research, and the reviewers of this paper for their constructive comments.
Additional information
This work is partially funded by the Indo-Austrian project scheme of FWF-DST (India): DST No. INT/AUA/FWF/P-02/2013.
Cite this article
Rejitha, R.S., Benedict, S., Alex, S.A. et al. Energy prediction of CUDA application instances using dynamic regression models. Computing 99, 765–790 (2017). https://doi.org/10.1007/s00607-016-0534-5