Abstract
The Message Passing Interface (MPI) has been widely used for programming parallel scientific applications. As multi-core architectures have become prevalent, a major question has emerged about the use of MPI within a compute node and its impact on communication costs. The one-sided communication interface in MPI provides a mechanism to reduce communication costs by removing the matching requirements of the send/receive model. The MPI standard also provides the flexibility to allocate memory windows backed by shared memory. However, state-of-the-art open-source MPI libraries do not leverage this optimization opportunity for commodity clusters. In this paper, we present the design and implementation of an intra-node MPI one-sided interface using shared memory backed windows on multi-core clusters. We use the MVAPICH2 MPI library for design, implementation, and evaluation. Micro-benchmark evaluation shows that the new design improves Put, Get, and Accumulate latencies by up to 85% in passive synchronization mode. The bandwidth of Put and Get improves by 64% and 42%, respectively. The Splash LU benchmark shows an improvement of up to 55% with the new design on a 32-core Magny-Cours node, and a similar improvement on a 12-core Westmere node. The mean BFS time in Graph500 decreases by 39% and 77% on the Magny-Cours and Westmere nodes, respectively.
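To make the idea of a shared memory backed window concrete, the following is a minimal sketch in plain POSIX C. It is not the MVAPICH2 design and it does not use the MPI API: the helper names `window_alloc_shared`, `window_put`, and `shared_window_put_demo` are hypothetical, and waiting for the child process to exit stands in, very loosely, for the end of an access epoch. It only illustrates why a Put within a node can reduce to a direct memory copy once the target's window lives in memory that both processes map.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an MPI or MVAPICH2 call): map a region that
 * remains shared across fork(), standing in for a window backed by
 * shared memory. Returns NULL on failure. */
static void *window_alloc_shared(size_t bytes)
{
    void *base = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Hypothetical "put": the origin copies straight into the target's
 * window. No message matching and no action by the target is needed. */
static void window_put(void *window_base, size_t offset,
                       const void *src, size_t bytes)
{
    memcpy((char *)window_base + offset, src, bytes);
}

/* Forks an "origin" process that puts `value` into the window owned by
 * the calling "target" process, then returns what the target reads.
 * Returns -1 if any system call fails. */
int shared_window_put_demo(int value)
{
    int *window = window_alloc_shared(sizeof *window);
    if (window == NULL)
        return -1;
    *window = 0;

    pid_t origin = fork();
    if (origin < 0) {
        munmap(window, sizeof *window);
        return -1;
    }
    if (origin == 0) {
        /* Origin process: one-sided write into the shared window. */
        window_put(window, 0, &value, sizeof value);
        _exit(0);
    }

    /* Target process: waiting for the origin to exit is this sketch's
     * stand-in for synchronization; it guarantees the write is done. */
    int status = 0;
    if (waitpid(origin, &status, 0) < 0) {
        munmap(window, sizeof *window);
        return -1;
    }

    int result = *window;
    munmap(window, sizeof *window);
    return result;
}
```

Calling `shared_window_put_demo(42)` from a small `main` returns the value the origin wrote. A real MPI library must additionally provide the synchronization semantics of the one-sided interface (for example, the passive mode evaluated in the paper), which this sketch deliberately leaves out.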
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Potluri, S., Wang, H., Dhanraj, V., Sur, S., Panda, D.K. (2011). Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows. In: Cotronis, Y., Danalis, A., Nikolopoulos, D.S., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2011. Lecture Notes in Computer Science, vol 6960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24449-0_13
DOI: https://doi.org/10.1007/978-3-642-24449-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24448-3
Online ISBN: 978-3-642-24449-0
eBook Packages: Computer Science, Computer Science (R0)