
Language-Centric Performance Analysis of OpenMP Programs with Aftermath

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 9903)


We present a new set of tools for the language-centric performance analysis and debugging of OpenMP programs that allows programmers to relate dynamic information from parallel execution to OpenMP constructs. Users can visualize execution traces, examine aggregate metrics on parallel loops and tasks, such as load imbalance or synchronization overhead, and obtain detailed information on specific events, such as the partitioning of a loop’s iteration space, its distribution to workers according to the scheduling policy and fine-grain synchronization. Our work is based on the Aftermath performance analysis tool and a ready-to-use, instrumented version of the LLVM/clang OpenMP run-time with negligible overhead for tracing. By analyzing the performance of the MG application of the NPB suite, we show that language-centric performance analysis in general and our tools in particular can help improve the performance of large-scale OpenMP applications significantly.
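To make two of the notions in the abstract concrete, here is a minimal sketch of how a loop's iteration space might be partitioned among workers and how a load-imbalance figure could be derived from the result. It is an illustration only, not Aftermath's implementation or the LLVM/clang run-time's algorithm. The function names `static_partition` and `load_imbalance`, the "one contiguous near-equal chunk per worker" policy, the `max / mean - 1` metric, and the per-iteration costs are all assumptions chosen for the example.

```python
def static_partition(n_iters, n_workers):
    """Split [0, n_iters) into contiguous, near-equal chunks, one per worker.

    Assumption: this mimics a static schedule without a chunk size;
    real run-times may distribute the remainder differently.
    """
    base, extra = divmod(n_iters, n_workers)
    chunks = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers each take one additional iteration.
        size = base + (1 if w < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def load_imbalance(work_per_worker):
    """Return max/mean - 1; 0.0 means all workers did the same amount of work."""
    mean = sum(work_per_worker) / len(work_per_worker)
    return max(work_per_worker) / mean - 1.0


# Hypothetical workload: iteration i costs i + 1 units (a triangular loop).
costs = [i + 1 for i in range(8)]
chunks = static_partition(len(costs), 4)
work = [sum(costs[a:b]) for a, b in chunks]
imbalance = load_imbalance(work)
```

With these made-up costs, the equal-sized chunks carry unequal work, so the last worker finishes well after the first. This is the kind of effect that a per-loop imbalance metric is meant to expose.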


  • OpenMP
  • Performance analysis
  • Tracing



  1. As reported by the numactl command line tool of libnuma, invoked with the --hardware option.

  2.

  3.

  4. -with-intel-vtune-amplifier-xe.




Our work was partly supported by the grants EU FET-HPC ExaNoDe H2020-671578, Eurolab-4-HPC H2020-671610, UK EPSRC EP/M004880/1, and France Nano 2017 DEMA. A. Pop is funded by a Royal Academy of Engineering University Research Fellowship.

Author information



Corresponding author

Correspondence to Andi Drebes.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Drebes, A., Bréjon, JB., Pop, A., Heydemann, K., Cohen, A. (2016). Language-Centric Performance Analysis of OpenMP Programs with Aftermath. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45549-5

  • Online ISBN: 978-3-319-45550-1

  • eBook Packages: Computer Science, Computer Science (R0)