Collecting Performance Data with PAPI-C

  • Dan Terpstra
  • Heike Jagode
  • Haihang You
  • Jack Dongarra
Conference paper

Abstract

Modern high-performance computer systems continue to increase in size and complexity. Tools that measure application performance in these environments must provide correspondingly richer measurements to give insight into the increasingly intricate ways in which software and hardware interact. PAPI (the Performance API) has provided consistent platform- and operating-system-independent access to CPU hardware performance counters for nearly a decade. Recent trends toward massively parallel multi-core systems, often with heterogeneous architectures, present new challenges for the measurement of hardware performance information, which is now available not only on the CPU core itself but scattered across the chip and system. We discuss the evolution of PAPI into Component PAPI, or PAPI-C, in which multiple sources of performance data can be measured simultaneously via a common software interface. Several examples of components and component data measurements are discussed. We explore the challenges to hardware performance measurement in existing multi-core architectures. We conclude with an exploration of future directions for the PAPI interface.

Keywords

Direct Numerical Simulation; Performance Counter; Hardware Performance; Float Point Operation; Hardware Counter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Dan Terpstra (1)
  • Heike Jagode
  • Haihang You
  • Jack Dongarra
  1. The University of Tennessee, Knoxville, USA
