Language-Centric Performance Analysis of OpenMP Programs with Aftermath

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9903)


We present a new set of tools for the language-centric performance analysis and debugging of OpenMP programs that allows programmers to relate dynamic information from parallel execution to OpenMP constructs. Users can visualize execution traces, examine aggregate metrics on parallel loops and tasks, such as load imbalance or synchronization overhead, and obtain detailed information on specific events, such as the partitioning of a loop’s iteration space, its distribution to workers according to the scheduling policy, and fine-grain synchronization. Our work is based on the Aftermath performance analysis tool and a ready-to-use, instrumented version of the LLVM/clang OpenMP run-time with negligible overhead for tracing. By analyzing the performance of the MG application of the NPB suite, we show that language-centric performance analysis in general, and our tools in particular, can help improve the performance of large-scale OpenMP applications significantly.


Keywords: OpenMP, Performance analysis, Tracing



Our work was partly supported by the grants EU FET-HPC ExaNoDe H2020-671578, Eurolab-4-HPC H2020-671610, UK EPSRC EP/M004880/1, and France Nano 2017 DEMA. A. Pop is funded by a Royal Academy of Engineering University Research Fellowship.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Computer Science, The University of Manchester, Manchester, UK
  2. Sorbonne Universités, UPMC Paris 06, CNRS, UMR 7606, LIP6, Paris, France
  3. Inria, Paris, France
  4. École Normale Supérieure, Paris, France
