OpenSHMEM Extensions and a Vision for Its Future Direction

  • Stephen Poole
  • Pavel Shamis
  • Aaron Welch
  • Swaroop Pophale
  • Manjunath Gorentla Venkata
  • Oscar Hernandez
  • Gregory Koenig
  • Tony Curtis
  • Chung-Hsing Hsu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8356)

Abstract

The Extreme Scale Systems Center (ESSC) at Oak Ridge National Laboratory (ORNL), together with the University of Houston, led the effort to standardize the SHMEM API with input from vendors and the user community. In 2012, OpenSHMEM specification 1.0 was finalized and released to the OpenSHMEM community for comments. As we move to future HPC systems, the current specification has several shortcomings that we need to address to ensure scalability, higher degrees of concurrency, locality, thread safety, fault tolerance, and parallel I/O capabilities, among others. In this paper we discuss an immediate set of extensions that we propose to the current API and our vision for a future API, OpenSHMEM Next-Generation (NG), that targets future Exascale systems. We also explain our rationale for the proposed extensions and highlight the lessons learned from other PGAS languages and communication libraries.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Stephen Poole (1)
  • Pavel Shamis (1)
  • Aaron Welch (2)
  • Swaroop Pophale (2)
  • Manjunath Gorentla Venkata (1)
  • Oscar Hernandez (1)
  • Gregory Koenig (1)
  • Tony Curtis (2)
  • Chung-Hsing Hsu (1)

  1. Extreme Scale Systems Center, Oak Ridge National Laboratory, USA
  2. Computer Science Department, University of Houston, USA
