Abstract
GPGPUs have become a consequential component of supercomputing architectures, and a growing share of HPC application developers no longer refrain from using them. CUDA has remained a successful computing platform for these architectures: over the past few years, thousands of scientific applications from domains such as bioinformatics and high-energy physics have been accelerated with CUDA. However, energy consumption remains a serious challenge for the HPC and GPGPU communities. This paper proposes energy prediction approaches based on dynamic regression models: parallel dynamic random forest modeling (P-DynRFM), dynamic random forest modeling (DynRFM), dynamic support vector machines (DynSVM), and dynamic linear regression modeling (DynLRM). These models identify energy-efficient CUDA application instances while considering block size, grid size, and other tunable parameters such as problem size. Predictions are obtained by executing only a few CUDA application instances and inferring the energy of the remaining instances from application performance metrics, such as instruction counts and memory behavior. The proposed energy prediction mechanisms were evaluated with CUDA applications, such as N-body and particle simulations, on two GPGPU machines. The dynamic prediction mechanisms improved energy/performance prediction accuracy by 50.26 to 61.23 percent compared to classical prediction models, and the parallel implementation of the dynamic RFM (P-DynRFM) reduced prediction time by over 83 percent.
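The workflow the abstract describes — execute a few CUDA application instances, then predict the energy of the remaining instances from their tunable parameters and performance metrics — can be sketched with an off-the-shelf random forest regressor. This is a minimal illustration in the spirit of DynRFM, not the authors' implementation: the feature names (block size, grid size, problem size, instruction count) and the synthetic energy function are assumptions made for the example.

```python
# Sketch of random-forest-based energy prediction for CUDA application
# instances, in the spirit of the DynRFM approach described above.
# Features and the synthetic energy model are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic instances: [block_size, grid_size, problem_size, instructions]
X = rng.uniform([32, 1, 1e4, 1e6], [1024, 256, 1e6, 1e9], size=(200, 4))
# Synthetic energy measurements (joules): a noisy nonlinear dependence
# on instruction count and on problem size per thread block.
y = 0.5e-6 * X[:, 3] + 0.01 * X[:, 2] / X[:, 0] + rng.normal(0, 5, 200)

# "Execute" only a few instances (training set); predict all the rest.
train, test = slice(0, 40), slice(40, None)
model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
model.fit(X[train], y[train])

pred = model.predict(X[test])
print(f"Predicted energy for {len(pred)} unseen instances")
```

Setting `n_jobs=-1` trains the forest's trees in parallel, which loosely mirrors the motivation behind the parallel P-DynRFM variant: the prediction step itself can dominate tuning time, so parallelizing the model matters.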
Acknowledgements
This work was funded by the Department of Science and Technology (DST) of India under the Indo-Austrian PPP Scheme, Engineering Sciences division. The authors thank DST (India) and FWF (Austria) for funding this research, and the reviewers of this paper for their constructive comments.
Additional information
This work is partially funded by the Indo-Austrian project scheme of FWF-DST (India): DST No. INT/AUA/FWF/P-02/2013.
Cite this article
Rejitha, R.S., Benedict, S., Alex, S.A. et al. Energy prediction of CUDA application instances using dynamic regression models. Computing 99, 765–790 (2017). https://doi.org/10.1007/s00607-016-0534-5