Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering

  • Jan Treibig
  • Georg Hager
  • Gerhard Wellein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7640)


Many tools and libraries employ hardware performance monitoring (HPM) on modern processors, and using this data for performance assessment and as a starting point for code optimizations is very popular. However, such data is only useful if it is interpreted with care, and if the right metrics are chosen for the right purpose. We demonstrate the sensible use of hardware performance counters in the context of a structured performance engineering approach for applications in computational science. Typical performance patterns and their respective metric signatures are defined, and some of them are illustrated using case studies. Although these generic concepts do not depend on specific tools or environments, we restrict ourselves to modern x86-based multicore processors and use the likwid-perfctr tool under the Linux OS.


Keywords: Loop Nest, Memory Bandwidth, Cache Line, Multicore Processor, Performance Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jan Treibig (1)
  • Georg Hager (1)
  • Gerhard Wellein (1)
  1. Erlangen Regional Computing Center (RRZE), Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
