Validation of Hardware Events for Successful Performance Pattern Identification in High Performance Computing

  • Thomas Röhl
  • Jan Eitzinger
  • Georg Hager
  • Gerhard Wellein
Conference paper


Hardware performance monitoring (HPM) is a crucial ingredient of performance analysis tools. While interfaces like LIKWID, PAPI, or the kernel interface perf_event provide HPM access with some additional features, many higher-level tools combine event counts with results retrieved from other sources, such as function call traces, to derive (semi-)automatic performance advice. However, although HPM has been available on x86 systems since the early 1990s, only a small subset of the HPM features is used in practice. Performance patterns provide a more comprehensive approach, enabling the identification of various performance-limiting effects. Patterns address issues like bandwidth saturation, load imbalance, non-local data access in ccNUMA systems, or false sharing of cache lines. This work defines HPM event sets that are best suited to identify a selection of performance patterns on the Intel Haswell processor. We validate the chosen event sets for accuracy in order to arrive at a reliable pattern detection mechanism, and we point out shortcomings that cannot be easily circumvented due to bugs or limitations in the hardware.
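To illustrate how raw HPM event counts can be turned into pattern indicators such as bandwidth saturation or load imbalance, the following is a minimal sketch. The event names referenced in the comments (CAS_COUNT.RD/WR on the Haswell integrated memory controller) are real Intel uncore events, but the helper functions and the sample numbers are purely illustrative and not part of LIKWID, PAPI, or perf_event.

```python
# Sketch: deriving performance-pattern metrics from raw HPM event counts.
# CAS_COUNT.RD / CAS_COUNT.WR are Intel Haswell uncore events counted on the
# integrated memory controller; each CAS operation transfers one full cache
# line. The functions below are illustrative helpers, not a tool's real API.

CACHE_LINE_BYTES = 64  # one DRAM CAS transfers a 64-byte cache line

def memory_bandwidth_gbs(cas_rd: int, cas_wr: int, runtime_s: float) -> float:
    """Memory bandwidth in GB/s derived from memory-controller CAS counts."""
    return (cas_rd + cas_wr) * CACHE_LINE_BYTES / runtime_s / 1e9

def load_imbalance(per_core_counts: list) -> float:
    """Ratio of the busiest core's event count to the mean; 1.0 is balanced."""
    mean = sum(per_core_counts) / len(per_core_counts)
    return max(per_core_counts) / mean

# Hypothetical measurement: 5e8 read and 2.5e8 write CAS operations in 1 s
bw = memory_bandwidth_gbs(500_000_000, 250_000_000, 1.0)   # -> 48.0 GB/s
imb = load_imbalance([1_000_000, 1_000_000, 1_000_000, 2_000_000])  # -> 1.6
```

A bandwidth close to the machine's measured memory limit would indicate the bandwidth-saturation pattern; an imbalance ratio well above 1.0 points to the load-imbalance pattern.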


Keywords: Home Agent · Cache Line · Event Count · Performance Pattern · Load Imbalance
(These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.)



Parts of this work were funded by the German Federal Ministry of Education and Research (BMBF) under Grant Number 01IH13009.


References

  1. Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., Tallent, N.R.: HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurr. Comput.: Pract. Exp. 22(6), 685–701 (2010)
  2. Eranian, S.: Perfmon2: A flexible performance monitoring interface for Linux. In: Ottawa Linux Symposium, pp. 269–288. Citeseer (2006)
  3. Geimer, M., Wolf, F., Wylie, B.J., Ábrahám, E., Becker, D., Mohr, B.: The Scalasca performance toolset architecture. Concurr. Comput.: Pract. Exp. 22(6), 702–719 (2010)
  4. Gleixner, T., Molnar, I.: Linux 2.6.32: perf_event.h (2008)
  5. Guillen, C.: Knowledge-based performance monitoring for large scale HPC architectures. Dissertation (2015)
  6. Intel: Intel 64 and IA-32 Architectures Software Developer Manuals (2015)
  7. Intel: Intel Open Source Technology Center for PerfMon (2015)
  8. Intel: Intel Xeon Processor E3-1200 v3 Product Family Specification Update (2015)
  9. Intel: Intel Xeon Processor E5 v3 Family Uncore Performance Monitoring (2015)
  10. Kufrin, R.: PerfSuite: An accessible, open source performance analysis environment for Linux. In: 6th International Conference on Linux Clusters: The HPC Revolution, vol. 151, p. 05. Citeseer (2005)
  11. Luk, C.K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: Building customized program analysis tools with dynamic instrumentation. SIGPLAN Not. 40(6), 190–200 (2005)
  12. Mucci, P.J., Browne, S., Deane, C., Ho, G.: PAPI: A portable interface to hardware performance counters. In: Proceedings of the Department of Defense HPCMP Users Group Conference, pp. 7–10 (1999)
  13. Pettersson, M.: Linux x86 performance-monitoring counters driver (2003)
  14. Roehl, T.: Performance patterns for the Intel Haswell EP/EN/EX architecture (2015)
  15. Ryan, B.: Inside the Pentium. BYTE Mag. 18(6), 102–104 (1993)
  16. Schulz, M., Galarowicz, J., Maghrak, D., Hachfeld, W., Montoya, D., Cranford, S.: Open|SpeedShop: An open source infrastructure for parallel performance analysis. Sci. Prog. 16(2–3), 105–121 (2008)
  17. Treibig, J., Hager, G., Wellein, G.: LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In: Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures. San Diego, CA (2010)
  18. Treibig, J., Hager, G., Wellein, G.: Pattern driven node level performance engineering. SC13 poster (2013)
  19. Treibig, J., Hager, G., Wellein, G.: Performance patterns and hardware metrics on modern multicore processors: Best practices for performance engineering. In: Euro-Par 2012: Parallel Processing Workshops. Lecture Notes in Computer Science, vol. 7640, pp. 451–460. Springer, Berlin (2013)
  20. Weaver, V., Terpstra, D., Moore, S.: Non-determinism and overcount on modern hardware performance counter implementations. In: 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 215–224 (2013)
  21. Zaparanuks, D., Jovic, M., Hauswirth, M.: Accuracy of performance counter measurements. In: IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2009), pp. 23–32 (2009)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Thomas Röhl (1)
  • Jan Eitzinger (1)
  • Georg Hager (1)
  • Gerhard Wellein (1)

  1. Erlangen Regional Computing Center (RRZE), University of Erlangen-Nuremberg, Erlangen, Germany
