Abstract
We present a new set of tools for the language-centric performance analysis and debugging of OpenMP programs. These tools let programmers relate dynamic information from parallel execution to OpenMP constructs. Users can visualize execution traces and examine aggregate metrics on parallel loops and tasks, such as load imbalance or synchronization overhead. They can also obtain detailed information on specific events, such as the partitioning of a loop's iteration space, the distribution of the resulting chunks to workers according to the scheduling policy, and fine-grain synchronization. Our work is based on the Aftermath performance analysis tool and a ready-to-use, instrumented version of the LLVM/clang OpenMP run-time that incurs negligible tracing overhead. By analyzing the performance of the MG application of the NPB suite, we show that language-centric performance analysis in general, and our tools in particular, can significantly improve the performance of large-scale OpenMP applications.
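The loop-level information mentioned above (how an iteration space is cut into chunks, which worker runs each chunk, and the resulting load imbalance) can be illustrated with a small model. The Python sketch below is not Aftermath's code and not the LLVM run-time's algorithm. It assumes the OpenMP rule for `schedule(static, chunk)` (chunks handed out round-robin in thread-number order), models `schedule(dynamic, chunk)` as "the first idle thread takes the next chunk", and uses one common load-imbalance metric, (max - mean) / max. The per-iteration costs, the tie-breaking rule and the metric definition are illustrative assumptions; Aftermath's own metrics may be defined differently.

```python
import heapq


def chunks(n_iters, chunk):
    """Split iterations 0..n_iters-1 into contiguous chunks of `chunk`
    iterations; only the last chunk may be shorter."""
    return [range(lo, min(lo + chunk, n_iters)) for lo in range(0, n_iters, chunk)]


def static_schedule(n_iters, n_threads, chunk):
    """schedule(static, chunk): chunk k goes to thread k mod n_threads."""
    assignment = {t: [] for t in range(n_threads)}
    for k, c in enumerate(chunks(n_iters, chunk)):
        assignment[k % n_threads].append(c)
    return assignment


def dynamic_schedule(costs, n_threads, chunk):
    """Model of schedule(dynamic, chunk): the thread that becomes idle first
    takes the next chunk. Ties go to the lowest thread id (a modelling
    assumption; real run-times make no such guarantee)."""
    assignment = {t: [] for t in range(n_threads)}
    ready = [(0.0, t) for t in range(n_threads)]  # (time thread is free, thread id)
    heapq.heapify(ready)
    for c in chunks(len(costs), chunk):
        free_at, t = heapq.heappop(ready)
        assignment[t].append(c)
        heapq.heappush(ready, (free_at + sum(costs[i] for i in c), t))
    return assignment


def busy_times(assignment, costs):
    """Total cost of the iterations executed by each thread, in thread order."""
    return [sum(costs[i] for c in cs for i in c) for _, cs in sorted(assignment.items())]


def load_imbalance(times):
    """(max - mean) / max: 0 for perfect balance, closer to 1 when one
    thread does most of the work (one common definition, assumed here)."""
    worst = max(times)
    if worst == 0:
        return 0.0
    avg = sum(times) / len(times)
    return (worst - avg) / worst


if __name__ == "__main__":
    costs = list(range(16, 0, -1))  # hypothetical per-iteration costs, decreasing
    for name, a in [("static,4", static_schedule(16, 4, 4)),
                    ("dynamic,1", dynamic_schedule(costs, 4, 1))]:
        t = busy_times(a, costs)
        print(name, t, round(load_imbalance(t), 3))
```

With the decreasing cost profile above and 4 threads, `schedule(static, 4)` gives busy times `[58, 42, 26, 10]` (imbalance about 0.414), while the dynamic model with chunk size 1 gives `[34, 34, 34, 34]` (imbalance 0.0). In a real execution the dynamic assignment depends on timing, which is why a trace of the actual chunk-to-worker distribution is useful.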
Keywords
- OpenMP
- Performance analysis
- Tracing
Notes
- 1.
As reported by the numactl command-line tool of libnuma, invoked with the --hardware option.
- 2.
- 3.
- 4.
Acknowledgments
Our work was partly supported by the grants EU FET-HPC ExaNoDe H2020-671578, Eurolab-4-HPC H2020-671610, UK EPSRC EP/M004880/1, and France Nano 2017 DEMA. A. Pop is funded by a Royal Academy of Engineering University Research Fellowship.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Drebes, A., Bréjon, JB., Pop, A., Heydemann, K., Cohen, A. (2016). Language-Centric Performance Analysis of OpenMP Programs with Aftermath. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_17
DOI: https://doi.org/10.1007/978-3-319-45550-1_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science, Computer Science (R0)