Abstract
The PAPI performance library is a widely used tool for gathering performance data from running applications. Modern processors support advanced sampling interfaces, such as Intel’s Precise Event Based Sampling (PEBS) and AMD’s Instruction Based Sampling (IBS). The current PAPI sampling interface predates these hardware interfaces and provides only simple instruction-pointer-based samples.
We propose a new, improved sampling interface that supports the extended sampling information available on modern hardware. We extend PAPI with a new PAPI_sample_init call that uses the Linux perf_event interface to access the extra sample information. A pointer to these samples is returned to the user, who can either decode them on the fly or write them to disk for later analysis.
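To make the "write them to disk for later analysis" path concrete, the following is a minimal sketch of an offline decoder, not the interface proposed here. The abstract does not specify the buffer layout that PAPI_sample_init returns, so the sketch assumes the samples are raw Linux perf_event records (a `perf_event_header` followed by the fields selected by `sample_type`), stored little-endian and already copied out of the ring buffer so that no record wraps around. The names `decode_samples` and `FIELDS` are hypothetical, and only a fixed-size subset of sample fields is handled.

```python
import struct

# Bit values mirror the PERF_SAMPLE_* constants in <linux/perf_event.h>.
PERF_SAMPLE_IP = 1 << 0
PERF_SAMPLE_TID = 1 << 1
PERF_SAMPLE_TIME = 1 << 2
PERF_SAMPLE_ADDR = 1 << 3
PERF_SAMPLE_CPU = 1 << 7
PERF_SAMPLE_PERIOD = 1 << 8
PERF_SAMPLE_WEIGHT = 1 << 14
PERF_SAMPLE_DATA_SRC = 1 << 15

PERF_RECORD_SAMPLE = 9  # record type of a sample in the perf_event stream

# Supported fields, listed in the order the kernel writes them into a record.
# Variable-length fields (callchain, raw data, branch stack, ...) are left out.
FIELDS = [
    (PERF_SAMPLE_IP, "<Q", ("ip",)),
    (PERF_SAMPLE_TID, "<II", ("pid", "tid")),
    (PERF_SAMPLE_TIME, "<Q", ("time",)),
    (PERF_SAMPLE_ADDR, "<Q", ("addr",)),
    (PERF_SAMPLE_CPU, "<II", ("cpu", "res")),
    (PERF_SAMPLE_PERIOD, "<Q", ("period",)),
    (PERF_SAMPLE_WEIGHT, "<Q", ("weight",)),
    (PERF_SAMPLE_DATA_SRC, "<Q", ("data_src",)),
]
SUPPORTED = 0
for _bit, _fmt, _names in FIELDS:
    SUPPORTED |= _bit

# struct perf_event_header: u32 type, u16 misc, u16 size (size includes header).
HEADER = struct.Struct("<IHH")


def decode_samples(buf, sample_type):
    """Return one dict per PERF_RECORD_SAMPLE found in buf (hypothetical helper)."""
    if sample_type & ~SUPPORTED:
        raise ValueError("sample_type selects fields this sketch does not decode")
    active = [(fmt, names) for bit, fmt, names in FIELDS if sample_type & bit]
    body_size = sum(struct.calcsize(fmt) for fmt, _ in active)

    samples = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        rec_type, _misc, size = HEADER.unpack_from(buf, offset)
        if size < HEADER.size or offset + size > len(buf):
            break  # truncated or corrupt trailing record: stop decoding
        if rec_type == PERF_RECORD_SAMPLE:
            if HEADER.size + body_size > size:
                raise ValueError("record is smaller than sample_type implies")
            pos = offset + HEADER.size
            sample = {}
            for fmt, names in active:
                sample.update(zip(names, struct.unpack_from(fmt, buf, pos)))
                pos += struct.calcsize(fmt)
            samples.append(sample)
        offset += size  # non-sample records (mmap, comm, ...) are skipped
    return samples


if __name__ == "__main__":
    # Synthetic record for demonstration only: instruction pointer, data
    # address and weight, as a PEBS- or IBS-style memory sample might carry.
    st = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR | PERF_SAMPLE_WEIGHT
    body = struct.pack("<QQQ", 0x400123, 0x7FFD0000, 42)
    record = HEADER.pack(PERF_RECORD_SAMPLE, 0, HEADER.size + len(body)) + body
    for s in decode_samples(record, st):
        print({k: hex(v) for k, v in s.items()})
```

The decoder must be given the same `sample_type` mask that was used when the event was opened, because the records themselves do not describe which fields they contain. Rejecting unsupported bits up front, rather than guessing, keeps the field offsets correct for the subset that is handled.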
By providing extended sampling information, this new PAPI interface allows advanced performance analysis and optimization that was previously not possible. This will greatly enhance the ability to optimize software in modern extreme-scale programming environments.
Acknowledgment
This work was supported by the National Science Foundation under Grant No. SSI-1450122.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Smith, F., Weaver, V.M. (2019). Advanced Event-Sampling Support for PAPI. In: Bhatele, A., Boehme, D., Levine, J., Malony, A., Schulz, M. (eds) Programming and Performance Visualization Tools. ESPT 2017, ESPT 2018, VPA 2017, VPA 2018. Lecture Notes in Computer Science, vol 11027. Springer, Cham. https://doi.org/10.1007/978-3-030-17872-7_9
DOI: https://doi.org/10.1007/978-3-030-17872-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17871-0
Online ISBN: 978-3-030-17872-7
eBook Packages: Computer Science (R0)