Performance Measurement for the OpenMP 4.0 Offloading Model

  • Robert Dietrich
  • Felix Schmitt
  • Alexander Grund
  • Dirk Schmidl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8806)


OpenMP is one of the most widely used standards for enabling thread-level parallelism in high-performance computing codes. The recently released version 4.0 of the specification introduces directives that enable application developers to offload portions of the computation to massively parallel target devices. However, utilizing these devices efficiently requires sophisticated performance analysis tools. The emerging OpenMP Tools Interface (OMPT) aids the development of portable tools, but currently lacks support for OpenMP 4.0 target directives. This paper presents a novel approach to measuring the performance of applications that use OpenMP offloading. It introduces libmpti, an OMPT-based measurement library for Intel MIC target devices. For host-side analysis, we extended the OPARI2 instrumenter and prototypically integrated the complete approach into the state-of-the-art tool infrastructure Score-P. We demonstrate the effectiveness of the presented method and implementation with a conjugate gradient (CG) kernel on an Intel Xeon Phi coprocessor. Finally, we visualize the obtained performance data with Vampir.


Keywords: performance analysis, offloading, OpenMP 4.0, Intel MIC, Score-P





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Robert Dietrich (1)
  • Felix Schmitt (1)
  • Alexander Grund (1)
  • Dirk Schmidl (2)
  1. Center for Information Services and High Performance Computing, Technische Universität Dresden, Dresden, Germany
  2. IT Center, RWTH Aachen University, Aachen, Germany
