OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools

Volume 8356 of the series Lecture Notes in Computer Science, pp. 105-119

Profiling Non-numeric OpenSHMEM Applications with the TAU Performance System

  • John Linford, ParaTools Inc.
  • Tyler A. Simon, University of Maryland Baltimore County; ParaTools Inc.
  • Sameer Shende, ParaTools Inc.; University of Oregon
  • Allen D. Malony, ParaTools Inc.; University of Oregon

The recent development of a unified SHMEM framework, OpenSHMEM, has enabled further study of porting and scaling applications that can benefit from the SHMEM programming model. This paper focuses on non-numerical graph algorithms, which typically have a low FLOPS/byte ratio. An overview of the space and time complexity of Kruskal’s and Prim’s algorithms for generating a minimum spanning tree (MST) is presented, along with an implementation of Kruskal’s algorithm that uses OpenSHMEM to generate the MST in parallel without intermediate communication. Additionally, a procedure is presented for applying the TAU Performance System to OpenSHMEM applications to produce in-depth performance profiles showing time spent in code regions, memory access patterns, and network load. Performance evaluations from the Cray XK7 “Titan” system at Oak Ridge National Laboratory and a 48-core shared-memory system at the University of Maryland, Baltimore County are provided.