The Journal of Supercomputing, Volume 74, Issue 11, pp 5643–5658

MeterPU: a generic measurement abstraction API

Enabling energy-tuned skeleton backend selection
  • Lu Li
  • Christoph Kessler


We present MeterPU, an easy-to-use, generic and low-overhead abstraction API for measuring various metrics (e.g., time, energy) on different hardware components (e.g., CPU, DRAM, GPU) of a heterogeneous computer system. It places pluggable, platform-specific measurement implementations behind a common C++ interface. We show that with MeterPU, legacy time-optimization frameworks, such as autotuned skeleton back-end selection, can easily be retargeted for energy optimization, and that switching between measurement metrics or techniques for arbitrary code sections becomes trivial. We apply MeterPU to implement the first energy-tunable skeleton programming framework, based on the SkePU skeleton programming library.
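The idea of pluggable measurement implementations behind a common C++ interface can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from the MeterPU sources: the names `Meter`, `WallTimeBackend`, `AssumedPowerEnergyBackend` and `measure_sum` are invented for illustration, and the "energy" backend merely multiplies elapsed time by an assumed constant power instead of reading a real sensor.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical backend: wall-clock time in seconds via std::chrono.
struct WallTimeBackend {
    using clock = std::chrono::steady_clock;
    clock::time_point begin{}, end{};
    void start() { begin = clock::now(); }
    void stop()  { end = clock::now(); }
    double value() const {
        return std::chrono::duration<double>(end - begin).count();
    }
    static const char* unit() { return "s"; }
};

// Hypothetical stand-in for an energy backend. A real one would query a
// hardware counter or sensor; this one ASSUMES a constant 50 W draw and
// reports assumed power times elapsed time, in joules.
struct AssumedPowerEnergyBackend {
    static constexpr double assumed_watts = 50.0;  // assumption, not measured
    WallTimeBackend timer;
    void start() { timer.start(); }
    void stop()  { timer.stop(); }
    double value() const { return assumed_watts * timer.value(); }
    static const char* unit() { return "J"; }
};

// Common interface: the metric is selected by a template argument, so the
// measured code section stays the same when the metric changes.
template <typename Backend>
class Meter {
public:
    void start() { backend_.start(); }
    void stop()  { backend_.stop(); }
    double value() const { return backend_.value(); }
    const char* unit() const { return Backend::unit(); }
private:
    Backend backend_;
};

// Measures a simple summation loop with whichever backend is requested.
// Returns the meter reading; the computed sum is written to sum_out.
template <typename Backend>
double measure_sum(const std::vector<double>& data, double& sum_out) {
    Meter<Backend> meter;
    meter.start();
    double sum = 0.0;
    for (double x : data) sum += x;
    meter.stop();
    sum_out = sum;
    return meter.value();
}
```

In this sketch, switching from time to (assumed) energy measurement means changing only the template argument, for example from `measure_sum<WallTimeBackend>(data, sum)` to `measure_sum<AssumedPowerEnergyBackend>(data, sum)`. A compile-time policy parameter is one way to keep the abstraction overhead low, since no virtual dispatch is involved.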


Keywords: MeterPU, Measurement abstraction API, GPU, Performance measurement, Energy measurement, Auto-tuning, Skeleton programming



Research partially funded by EU FP7 project EXCESS and SeRC project OpCoReS. We thank Oleg Sysoev from Linköping University for suggestions on statistical data handling. We thank Dennis Hoppe from HLRS Stuttgart, Erik Hansson from Linköping University, Paul Renaud-Goud from Chalmers and all other EXCESS project members for comments on this work.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. IDA, Linköping University, Linköping, Sweden
