Power Management and Event Verification in PAPI

  • Heike Jagode
  • Asim YarKhan
  • Anthony Danalis
  • Jack Dongarra
Conference paper


For more than a decade, the PAPI performance monitoring library has helped to implement the familiar maxim attributed to Lord Kelvin: “If you cannot measure it, you cannot improve it.” Widely deployed and widely used, PAPI provides a generic, portable interface for the hardware performance counters available on all modern CPUs and some other components of interest that are scattered across the chip and system. Recent and radical changes in processor and system design—systems that combine multicore CPUs and accelerators, shared and distributed memory, PCI-express and other interconnects—as well as the emergence of power efficiency as a primary design constraint, and reduced data movement as a primary programming goal, pose new challenges and bring new opportunities to PAPI. We discuss new developments of PAPI that allow for multiple sources of performance data to be measured simultaneously via a common software interface. Specifically, a new PAPI component that controls power is discussed. We explore the challenges of shared hardware counters that include system-wide measurements in existing multicore architectures. We conclude with an exploration of future directions for the PAPI interface.


Hardware Performance Counters; Paper Components; Primary Program Goals; Running Average Power Limit (RAPL); Model-Specific Registers (MSRs)
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We thank the anonymous reviewers for their improvement suggestions.

This material is based upon work supported in part by the DOE Office of Science, Advanced Scientific Computing Research, under award No. DE-SC0006733 “SUPER—Institute for Sustained Performance, Energy and Resilience,” and by the National Science Foundation under award No. 1450429 “PAPI-EX.”



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Heike Jagode (1)
  • Asim YarKhan (1)
  • Anthony Danalis (1)
  • Jack Dongarra (1)
  1. Innovative Computing Laboratory, University of Tennessee, Knoxville, USA
