Chapter

OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools

Volume 8356 of the series Lecture Notes in Computer Science, pp. 1-13

Designing a High Performance OpenSHMEM Implementation Using Universal Common Communication Substrate as a Communication Middleware

  • Pavel Shamis, Extreme Scale Systems Center (ESSC), Oak Ridge National Laboratory (ORNL)
  • Manjunath Gorentla Venkata, Extreme Scale Systems Center (ESSC), Oak Ridge National Laboratory (ORNL)
  • Stephen Poole, Extreme Scale Systems Center (ESSC), Oak Ridge National Laboratory (ORNL)
  • Aaron Welch, Computer Science Department, University of Houston (UH)
  • Tony Curtis, Computer Science Department, University of Houston (UH)

Abstract

OpenSHMEM is an effort to standardize the well-known SHMEM parallel programming library. The project, led by ORNL and UH, aims to produce an open-source and portable SHMEM API. In this paper, we optimize the current OpenSHMEM reference implementation, which is based on GASNet, to achieve higher performance. To that end, we have redesigned an important component of the OpenSHMEM implementation, the network layer, to leverage UCCS (Universal Common Communication Substrate), a low-level communication library designed for implementing parallel programming models. In particular, UCCS provides an interface and semantics, such as native atomic operations and remote memory operations, that better support PGAS programming models, including OpenSHMEM. Using microbenchmarks, we evaluate this new OpenSHMEM implementation on various network metrics, including the latency of point-to-point and collective operations. Furthermore, we compare the performance of our OpenSHMEM implementation with the state-of-the-art SGI SHMEM. Our results show that the atomic operations of our OpenSHMEM implementation outperform SGI’s SHMEM implementation by 3%. Its RMA operations outperform SGI’s SHMEM and the original OpenSHMEM reference implementation by as much as 18% and 12%, respectively, for gets, and by as much as 83% and 53%, respectively, for puts.