
Energy prediction of CUDA application instances using dynamic regression models


GPGPUs are no longer an inconsequential component of supercomputing architectures, and a growing section of HPC application developers now embrace them. CUDA, in particular, has remained a successful computing platform for these architectures: thousands of scientific applications from various domains, such as bio-informatics, HEP, and so forth, have been accelerated using CUDA in the past few years. However, energy consumption remains a serious challenge for the HPC and GPGPU communities. This paper proposes energy prediction approaches using dynamic regression models: parallel dynamic random forest modeling (P-DynRFM), dynamic random forest modeling (DynRFM), dynamic support vector machines (DynSVM), and dynamic linear regression modeling (DynLRM). These models identify energy-efficient CUDA application instances while considering the block size, grid size, and other tunable parameters, such as problem size. Predictions are obtained by executing a few CUDA application instances and predicting the remaining instances from application performance metrics, such as the number of instructions, memory issues, and so forth. The proposed energy prediction mechanisms were evaluated with CUDA applications such as N-body and particle simulations on two GPGPU machines. The proposed dynamic prediction mechanisms achieved 50.26 to 61.23 percent improvements in energy/performance prediction compared to the classical prediction models, and the parallel implementation of the dynamic RFM (P-DynRFM) reduced prediction time by over 83 percent.
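The core idea above — execute a few instances, fit a regression model on their measured energy, then predict the energy of the remaining instances from their performance metrics — can be sketched as follows. This is a minimal illustration of the dynamic-linear-regression (DynLRM-style) variant only; the data values, feature choice, and function names are hypothetical and do not reproduce the authors' implementation.

```python
# Sketch of "execute a few, predict the rest": fit a simple linear
# regression on measured (metric, energy) pairs, then predict energy
# for instances that have not been executed. All numbers are illustrative.

def fit_ols(xs, ys):
    """Closed-form simple linear regression: y ~ a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var          # slope: energy per unit of the metric
    a = my - b * mx        # intercept
    return a, b

# Hypothetical executed CUDA instances:
# (instruction count in millions, measured energy in joules).
executed = [(10.0, 52.0), (20.0, 101.0), (40.0, 205.0)]

# Instances not yet executed; only the performance metric is known.
pending = [15.0, 30.0]

a, b = fit_ols([x for x, _ in executed], [y for _, y in executed])
predictions = {x: a + b * x for x in pending}
for x, e in sorted(predictions.items()):
    print(f"instructions={x}M -> predicted energy ~ {e:.1f} J")
```

In the dynamic setting described in the paper, this loop would be repeated: newly executed instances are folded into the training set and the model is refit, with the random-forest and SVM variants (DynRFM, DynSVM) replacing the least-squares fit shown here.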






This work is funded by the Department of Science and Technology of India under the Indo-Austrian PPP Scheme - Engineering Sciences division. The authors thank DST (India) and FWF (Austria) for funding this research work. In addition, they thank the reviewers of this paper for providing constructive comments.


Corresponding author

Correspondence to R. S. Rejitha.

Additional information

This work is partially funded by the Indo-Austrian project scheme of FWF-DST (India): DST No. INT/AUA/FWF/P-02/2013.


Cite this article

Rejitha, R.S., Benedict, S., Alex, S.A. et al. Energy prediction of CUDA application instances using dynamic regression models. Computing 99, 765–790 (2017).



Keywords

  • Applications
  • CUDA
  • Energy prediction
  • Performance analysis
  • Tools

Mathematics Subject Classification

  • 68M20