Towards Parallel Performance Analysis Tools for the OpenSHMEM Standard

  • Sebastian Oeste
  • Andreas Knüpfer
  • Thomas Ilsche
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8356)


This paper discusses theoretical and practical aspects of extending performance analysis tools to support the OpenSHMEM standard for parallel programming. The theoretical part covers the mapping of OpenSHMEM’s communication primitives to a generic event record scheme that is compatible with a range of PGAS libraries, as well as the visualization of the recorded events. The practical part demonstrates an experimental extension for Cray-SHMEM in VampirTrace and Vampir and presents first results with a parallel example application. Since Cray-SHMEM is similar to OpenSHMEM in many respects, this serves as a realistic preview. Finally, an outlook on native support for OpenSHMEM is given, together with some recommendations for future revisions of the OpenSHMEM standard from the perspective of performance tools.


Keywords: OpenSHMEM, Performance Analysis, Tracing, Tools Infrastructure





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Sebastian Oeste (1)
  • Andreas Knüpfer (1)
  • Thomas Ilsche (1)
  1. Center for Information Services and HPC (ZIH), Technische Universität Dresden, Germany
