LIKWID: Lightweight Performance Tools

  • Jan Treibig
  • Georg Hager
  • Gerhard Wellein
Conference paper

Abstract

Exploiting the performance of today’s microprocessors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command line utilities that addresses four key problems: Probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and microbenchmarking for reliable upper performance bounds. Moreover, it includes an mpirun wrapper allowing for portable thread-core affinity in MPI and hybrid MPI/threaded applications. To demonstrate the capabilities of the tool set we show the influence of thread affinity on performance using the well-known OpenMP STREAM triad benchmark, use hardware counter tools to study the performance of a stencil code, and finally show how to detect bandwidth problems on ccNUMA-based compute nodes.
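
For context, the STREAM triad mentioned above is a simple bandwidth-bound kernel. The sketch below is a minimal OpenMP version of our own, not the authors' benchmark code: the array size, the parallel first-touch initialization, and the bandwidth formula (which ignores write-allocate traffic) are illustrative assumptions. It shows why thread-core affinity matters on ccNUMA nodes: memory pages are mapped into the NUMA domain of the thread that first touches them, so threads must keep running where their data resides.

  /* Minimal OpenMP STREAM triad sketch (illustrative, not the paper's code). */
  #include <stdio.h>
  #include <stdlib.h>
  #include <omp.h>

  #define N (40 * 1000 * 1000)   /* assumed size, large enough to exceed all caches */

  int main(void)
  {
      double *a = malloc(N * sizeof(double));
      double *b = malloc(N * sizeof(double));
      double *c = malloc(N * sizeof(double));
      const double s = 3.0;

      /* Parallel first-touch initialization: each thread touches the pages it
         will later stream over, so they land in its local NUMA domain. */
      #pragma omp parallel for schedule(static)
      for (long i = 0; i < N; ++i) {
          a[i] = 0.0; b[i] = 1.0; c[i] = 2.0;
      }

      double t0 = omp_get_wtime();

      /* STREAM triad: two loads and one store per iteration. */
      #pragma omp parallel for schedule(static)
      for (long i = 0; i < N; ++i)
          a[i] = b[i] + s * c[i];

      double t = omp_get_wtime() - t0;
      /* Three arrays of 8-byte doubles; write-allocate traffic is ignored here. */
      printf("Triad bandwidth: %.1f MB/s\n", 3.0 * N * sizeof(double) / t / 1e6);

      free(a); free(b); free(c);
      return 0;
  }

A pinned run could then be launched along the lines of likwid-pin -c 0-3 ./triad, and the resulting memory traffic inspected with likwid-perfctr -g MEM -C 0-3 ./triad; the available event group names vary by microarchitecture, so they should be checked with likwid-perfctr -a on the machine at hand.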

Keywords

Memory Bandwidth, NUMA Domain, Performance Counter, Command Line Tool, Thread Count

Acknowledgment

We are indebted to Intel Germany for providing test systems and early access hardware for benchmarking. A special acknowledgment goes to Michael Meier, who had the basic idea for likwid-pin, implemented the prototype, and provided many useful thoughts in discussions. This work was supported by the Competence Network for Scientific and Technical High Performance Computing in Bavaria (KONWIHR) under the project “OMI4papps.”


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Erlangen Regional Computing Center (RRZE), Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany