Abstract
Predictive models enable a better understanding of the performance characteristics of applications on multicore systems. Previous work has used performance counters in a system-centered approach to model the power consumption of the system, CPU, and memory components. These approaches often use the same group of counters across different applications. In contrast, we develop application-centric models, based on performance counters, for the runtime and the power consumption of the system, CPU, and memory components. Our work analyzes four hybrid (MPI/OpenMP) applications: the NAS Parallel Multizone Benchmarks (BT-MZ, SP-MZ, LU-MZ) and the Gyrokinetic Toroidal Code (GTC). Our models show that cache utilization (L1/L2), branch instructions, TLB data misses, and system resource stalls affect each application and each performance component differently. We also show that the L2 total cache hits counter affects performance across all applications. The models are validated against the system and component power measurements with an error rate of less than 3%.
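To make the validation figure concrete, the following is a minimal sketch of how an average percent error between predicted and measured power could be computed. It assumes the error rate is the absolute difference divided by the measured value, which the abstract does not state explicitly. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def percent_error(predicted, measured):
    """Absolute prediction error as a percentage of the measured value."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - measured) / abs(measured) * 100.0


def mean_percent_error(predicted_values, measured_values):
    """Average percent error over paired predictions and measurements."""
    if len(predicted_values) != len(measured_values):
        raise ValueError("inputs must have the same length")
    if not predicted_values:
        raise ValueError("inputs must not be empty")
    errors = [
        percent_error(p, m)
        for p, m in zip(predicted_values, measured_values)
    ]
    return sum(errors) / len(errors)


# Hypothetical power values in watts, for illustration only.
measured_power = [200.0, 250.0, 400.0]
predicted_power = [198.0, 255.0, 396.0]

error = mean_percent_error(predicted_power, measured_power)
within_threshold = error < 3.0  # compare against the 3% bound from the abstract
```

Under this assumed definition, a model would meet the reported bound for a component if the averaged value stays below 3 for that component's measurements.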
Cite this article
Lively, C., Wu, X., Taylor, V. et al. Power-aware predictive models of hybrid (MPI/OpenMP) scientific applications on multicore systems. Comput Sci Res Dev 27, 245–253 (2012). https://doi.org/10.1007/s00450-011-0190-0
DOI: https://doi.org/10.1007/s00450-011-0190-0