Integrated Measurement for Cross-Platform OpenMP Performance Analysis

  • Kevin A. Huck
  • Allen D. Malony
  • Sameer Shende
  • Doug W. Jacobsen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8766)

Abstract

Measuring the performance of OpenMP programs portably across shared-memory platforms and across OpenMP compilers is a challenge due to the lack of a widely implemented performance interface standard. While the OpenMP community is currently evaluating a tools interface specification called OMPT, at present different instrumentation methods are possible at different levels of observation, each with its own system and compiler dependencies. This paper describes how four mechanisms for OpenMP measurement have been integrated into the TAU performance system: source-level instrumentation (Opari), a runtime “collector” API (called ORA) built into an OpenMP compiler (OpenUH), a wrapped OpenMP runtime library (GOMP using ORA), and an OpenMP runtime library supporting an OMPT prototype (Intel). The capabilities of these approaches are evaluated with respect to observation visibility, portability, and measurement overhead on OpenMP benchmarks from the NAS Parallel Benchmarks, the Barcelona OpenMP Tasks Suite, and SPEC 2012. The integrated OpenMP measurement support is also demonstrated on a scientific application, MPAS-Ocean.
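
As a concrete illustration of the runtime-level observation that the OMPT approach provides, the sketch below registers a parallel-region callback with an OpenMP runtime and counts the regions it observes. It is a minimal sketch using the OMPT interface as later standardized in OpenMP 5.0 (header omp-tools.h), which differs in detail from the 2013/2014 OMPT prototype evaluated in this paper; the callback and counter are purely illustrative and do not reflect TAU's actual implementation.

    /* Minimal OMPT tool: counts parallel regions started by the runtime.
       Build as a shared library and point OMP_TOOL_LIBRARIES at it, or
       link it directly into the application. */
    #include <omp-tools.h>   /* named <ompt.h> in pre-5.0 runtimes */
    #include <stdio.h>

    static long parallel_regions = 0;   /* illustrative metric only */

    /* Invoked by the runtime each time a parallel region begins. */
    static void on_parallel_begin(ompt_data_t *encountering_task_data,
                                  const ompt_frame_t *encountering_task_frame,
                                  ompt_data_t *parallel_data,
                                  unsigned int requested_parallelism,
                                  int flags, const void *codeptr_ra) {
      __sync_fetch_and_add(&parallel_regions, 1);
    }

    static int tool_initialize(ompt_function_lookup_t lookup,
                               int initial_device_num,
                               ompt_data_t *tool_data) {
      ompt_set_callback_t set_callback =
          (ompt_set_callback_t) lookup("ompt_set_callback");
      set_callback(ompt_callback_parallel_begin,
                   (ompt_callback_t) on_parallel_begin);
      return 1;   /* non-zero keeps the tool active */
    }

    static void tool_finalize(ompt_data_t *tool_data) {
      printf("parallel regions observed: %ld\n", parallel_regions);
    }

    /* Entry point the OpenMP runtime looks for at startup. */
    ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version,
                                              const char *runtime_version) {
      static ompt_start_tool_result_t result = {
          &tool_initialize, &tool_finalize, {0}};
      return &result;
    }

By contrast, the Opari and ORA paths described above expose similar events through source rewriting and a runtime collector interface, respectively, rather than through callbacks registered at tool startup.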

Keywords

Parallel Region, Task Scheduler, Thread Count, OpenMP Program, Hardware Counter

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Kevin A. Huck (1)
  • Allen D. Malony (1)
  • Sameer Shende (1)
  • Doug W. Jacobsen (2)

  1. University of Oregon, Eugene, USA
  2. Los Alamos National Laboratory, Los Alamos, USA