Evaluating OpenSHMEM Explicit Remote Memory Access Operations and Merged Requests

Boehm, Swen; Pophale, Swaroop; Venkata, Manjunath Gorentla

doi:10.1007/978-3-319-50995-2_2

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10007)

Included in the following conference series:

Workshop on OpenSHMEM and Related Technologies

Abstract

The OpenSHMEM Library Specification has evolved considerably since version 1.0. Most recently, OpenSHMEM 1.3 introduced implicit non-blocking Remote Memory Access (RMA) operations, which allow better overlap between communication and computation. However, the implicit non-blocking operations do not provide a separate handle to track and complete individual RMA operations. They are guaranteed to be complete only after a call to shmem_quiet(), shmem_barrier(), or shmem_barrier_all(), all of which are global completion and synchronization operations. Although these semantics are expected to achieve a higher message rate for applications, the drawback is that they do not allow fine-grained control over the completion of RMA operations.
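The global-completion behaviour described above can be illustrated with a small toy model. This is only a sketch in plain Python, not OpenSHMEM code: `ToyPE`, `put_nbi`, and `quiet` are hypothetical names chosen to echo the implicit non-blocking puts and shmem_quiet(), and the model deliberately delays every put until `quiet` so that the guarantee is visible (a real implementation may complete operations earlier).

```python
class ToyPE:
    """Toy stand-in for one PE issuing implicit non-blocking puts (hypothetical)."""

    def __init__(self):
        self.remote = {}    # simulated remote memory: key -> value
        self.pending = []   # puts issued but not yet guaranteed complete

    def put_nbi(self, key, value):
        # Returns nothing: the caller gets no handle for this individual put.
        self.pending.append((key, value))

    def quiet(self):
        # Global completion: every outstanding put is completed together.
        for key, value in self.pending:
            self.remote[key] = value
        completed = len(self.pending)
        self.pending.clear()
        return completed


pe = ToyPE()
pe.put_nbi("a", 1)
pe.put_nbi("b", 2)
pe.put_nbi("c", 3)
visible_before_quiet = dict(pe.remote)  # nothing is guaranteed visible yet
completed = pe.quiet()                  # completes all three puts at once
```

In this model there is no way to complete only the put to "a": the caller can only ask for everything outstanding, which is the lack of fine-grained control the abstract refers to.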

In this paper, first, we introduce non-blocking RMA operations with requests, where each operation has an explicit request to track and complete the operation. Second, we introduce interfaces to merge multiple requests into a single request handle. The merged request tracks multiple user-selected RMA operations, which provides the flexibility of tracking related communication operations with one request handle. Lastly, we explore the implications in terms of performance, productivity, usability, and the possibility of defining different patterns of communication via merging of requests. Our experimental results show that a well-designed and well-implemented OpenSHMEM stack can hide the overhead of allocating and managing the requests. The latency of RMA operations with requests is similar to that of blocking and implicit non-blocking RMA operations. We test our implementation with the Scalable Synthetic Compact Applications (SSCA #1) benchmark and observe that using RMA operations with requests and merging these requests outperforms the implementations using blocking RMA operations and implicit non-blocking operations by 49% and 74%, respectively.
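The idea of per-operation requests and merged requests can be sketched with a toy model. This is a plain Python illustration under stated assumptions, not the interface proposed in the paper: the names `Request`, `put_req`, `merge`, and `wait` are hypothetical, and the model completes an operation only when its request is waited on.

```python
class Request:
    """Hypothetical handle tracking one or more pending toy RMA operations."""

    def __init__(self, ops):
        self.ops = list(ops)


class ToyPE:
    """Toy stand-in for one PE issuing puts that return request handles."""

    def __init__(self):
        self.remote = {}  # simulated remote memory: key -> value

    def put_req(self, key, value):
        # Each operation returns its own request handle.
        return Request([(key, value)])

    def merge(self, *requests):
        # Fold several requests into one handle; ownership of the pending
        # operations moves to the merged handle so each is tracked once.
        ops = []
        for request in requests:
            ops.extend(request.ops)
            request.ops = []
        return Request(ops)

    def wait(self, request):
        # Complete only the operations tracked by this request.
        for key, value in request.ops:
            self.remote[key] = value
        request.ops = []


pe = ToyPE()
halo = [pe.put_req(("halo", i), i * 10) for i in range(3)]
other = pe.put_req("unrelated", 99)

group = pe.merge(*halo)        # one handle for the three related puts
pe.wait(group)                 # completes the halo puts only
after_group = dict(pe.remote)
pe.wait(other)                 # the unrelated put is completed separately
```

The point of the sketch is the contrast with global completion: waiting on `group` completes the related operations with a single call while leaving `other` outstanding.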

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Acknowledgments

This work is supported by the United States Department of Defense and used resources of the Extreme Scale Systems Center located at the Oak Ridge National Laboratory.

Author information

Authors and Affiliations

Oak Ridge National Laboratory, Oak Ridge, Tennessee, 37831, USA
Swen Boehm, Swaroop Pophale & Manjunath Gorentla Venkata

Corresponding author

Correspondence to Swen Boehm.

Editor information

Editors and Affiliations

Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Manjunath Gorentla Venkata
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Neena Imam
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Swaroop Pophale
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Tiffany M. Mintz

About this paper

Cite this paper

Boehm, S., Pophale, S., Venkata, M.G. (2016). Evaluating OpenSHMEM Explicit Remote Memory Access Operations and Merged Requests. In: Gorentla Venkata, M., Imam, N., Pophale, S., Mintz, T. (eds) OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments. OpenSHMEM 2016. Lecture Notes in Computer Science, vol 10007. Springer, Cham. https://doi.org/10.1007/978-3-319-50995-2_2

DOI: https://doi.org/10.1007/978-3-319-50995-2_2
Published: 15 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50994-5
Online ISBN: 978-3-319-50995-2
eBook Packages: Computer Science (R0)
