
Suitability of Performance Tools for OpenMP Task-Parallel Programs

  • Dirk Schmidl
  • Christian Terboven
  • Dieter an Mey
  • Matthias S. Müller
Conference paper

Abstract

In 2008, task-based parallelism was added to OpenMP as the major update of version 3.0. Tasks provide an easy way to express dynamic parallelism in OpenMP applications. However, achieving good performance with task-parallel OpenMP programs is challenging: OpenMP runtime systems are free to schedule, interrupt, and resume tasks in many different ways, which makes the program's behavior hard for the programmer to predict. Hence, programmers need support from performance tools to understand the performance characteristics of their applications. Different performance tools follow different approaches to collecting this information and presenting it to the programmer; important differences are the amount of information that is gathered and stored and the amount of overhead that is introduced. We identify typical usage patterns of OpenMP tasks in application codes and then compare the usability of several performance tools for task-parallel applications. We concentrate our investigation on two topics: the amount and usefulness of the measured data, and the overhead introduced by the performance tool.

Keywords

Performance tool, task region, source code level, Intel compiler, performance analysis tool

Notes

Acknowledgements

Parts of this work were funded by the German Federal Ministry of Research and Education (BMBF) under Grant No. 01IH11006 (LMAC).


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Dirk Schmidl (1)
  • Christian Terboven (1)
  • Dieter an Mey (1)
  • Matthias S. Müller (1, 2)

  1. Chair for High Performance Computing, IT Center, RWTH Aachen University, Aachen, Germany
  2. JARA High-Performance Computing, Aachen, Germany
