How to Reconcile Event-Based Performance Analysis with Tasking in OpenMP

  • Daniel Lorenz
  • Bernd Mohr
  • Christian Rössel
  • Dirk Schmidl
  • Felix Wolf
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6132)


With version 3.0, the OpenMP specification introduced a task construct and with it an additional dimension of concurrency. While offering a convenient means to express task parallelism, the new construct presents a serious challenge to event-based performance analysis. Since tasking may disrupt the classic sequence of region entry and exit events, essential analysis procedures such as reconstructing dynamic call paths or correctly attributing performance metrics to individual task region instances may become impossible. To overcome this limitation, we describe a portable method to distinguish individual task instances and to track their suspension and resumption with event-based instrumentation. Implemented as an extension of the OPARI source-code instrumenter, our portable solution supports C/C++ programs with tied tasks and with untied tasks that are suspended only at implied scheduling points, while introducing only negligible measurement overhead. Finally, we discuss possible extensions of the OpenMP specification to provide general support for task identifiers with untied tasks.
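The attribution problem described above can be illustrated with a toy model. The following C sketch is not the OPARI extension from the paper, and it does not use the POMP interface. It assumes a hypothetical per-thread trace of task begin, suspend, resume and end events, each tagged with a unique task-instance ID. It then shows how such IDs let a tool charge elapsed time to individual task instances even when their execution intervals do not nest. The event names, the ID numbering, `MAX_TASKS` and `attribute_time` are all illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds recorded by instrumentation on one thread
   (an assumption for illustration, not the POMP or OPARI event set). */
enum event_kind { TASK_BEGIN, TASK_SUSPEND, TASK_RESUME, TASK_END };

struct event {
    enum event_kind kind;
    int task_id;      /* unique ID of the task instance (hypothetical numbering) */
    double timestamp; /* seconds since the start of the trace */
};

#define MAX_TASKS 16

/* Charge the time between consecutive events to the task instance that
   was running during that interval. Returns 0 on success and -1 if the
   trace is inconsistent (unknown ID, a second instance starting while one
   is running, or suspend/end of an instance that is not running). */
static int attribute_time(const struct event *trace, int n,
                          double time_per_task[MAX_TASKS])
{
    int current = -1;  /* instance currently executing, -1 if none */
    double last = 0.0; /* timestamp of the previous event */

    assert(n >= 0 && (n == 0 || trace != NULL));
    for (int t = 0; t < MAX_TASKS; ++t)
        time_per_task[t] = 0.0;

    for (int i = 0; i < n; ++i) {
        const struct event *e = &trace[i];
        if (e->task_id < 0 || e->task_id >= MAX_TASKS)
            return -1;
        /* Charge the interval since the last event to the running instance. */
        if (current >= 0)
            time_per_task[current] += e->timestamp - last;
        last = e->timestamp;

        switch (e->kind) {
        case TASK_BEGIN:
        case TASK_RESUME:
            if (current >= 0)
                return -1; /* another instance is still marked as running */
            current = e->task_id;
            break;
        case TASK_SUSPEND:
        case TASK_END:
            if (current != e->task_id)
                return -1;
            current = -1;
            break;
        }
    }
    return 0;
}

/* Example trace in which task instances 1 and 2 interleave without
   nesting: 1 starts, 2 starts, 1 ends, 2 ends. */
static const struct event example_trace[] = {
    {TASK_BEGIN,   1, 0.0},
    {TASK_SUSPEND, 1, 2.0},
    {TASK_BEGIN,   2, 2.0},
    {TASK_SUSPEND, 2, 5.0},
    {TASK_RESUME,  1, 5.0},
    {TASK_END,     1, 6.0},
    {TASK_RESUME,  2, 6.0},
    {TASK_END,     2, 9.0},
};
#define EXAMPLE_LEN ((int)(sizeof example_trace / sizeof example_trace[0]))
```

In the example trace, the entry/exit sequence is enter 1, enter 2, exit 1, exit 2. A simple region stack cannot match that sequence. Keying each interval on the instance ID still yields 3.0 s for task 1 and 6.0 s for task 2.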







Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Daniel Lorenz (1)
  • Bernd Mohr (1)
  • Christian Rössel (1)
  • Dirk Schmidl (2)
  • Felix Wolf (1, 2, 3)

  1. Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany
  2. Dept. of Computer Science, RWTH Aachen University, Germany
  3. German Research School for Simulation Sciences, Aachen, Germany
