Supporting MPI-2 One Sided Communication on Multi-rail InfiniBand Clusters: Design Challenges and Performance Benefits

  • Abhinav Vishnu
  • Gopal Santhanaraman
  • Wei Huang
  • Hyun-Wook Jin
  • Dhabaleswar K. Panda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3769)


In cluster computing, InfiniBand has emerged as a popular high-performance interconnect, with MPI as the de facto programming model. However, even with InfiniBand, bandwidth can become a bottleneck for clusters executing communication-intensive applications. Multi-rail cluster configurations with MPI-1 have been proposed to alleviate this problem. More recently, MPI-2, with its support for one-sided communication, has been gaining significance. In this paper, we take on the challenge of designing high-performance MPI-2 one-sided communication on multi-rail InfiniBand clusters. We propose a unified MPI-2 design for different configurations of multi-rail networks (multiple ports, multiple HCAs, and combinations of the two). We present various issues associated with one-sided communication, such as multiple synchronization messages, scheduling of RDMA (Read, Write) operations, and ordering relaxation, and discuss their implications for our design. Our performance results show that multi-rail networks can significantly improve MPI-2 one-sided communication performance. Using PCI-Express with two ports, we achieve a peak MPI_Put bidirectional bandwidth of 2620 million bytes/s (MB/s), compared to 1910 MB/s for a single-rail implementation. For PCI-X with two HCAs, we can almost double the throughput and halve the latency for large messages.
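The abstract mentions scheduling RDMA operations across rails but does not spell out a policy. As a minimal sketch of one possible approach, assuming small messages stay whole on a single rail and large messages are striped evenly across all rails, the split could be computed as below. The function name `stripe_message`, the 16 KB threshold, and the even-split rule are illustrative assumptions, not details taken from the paper.

```python
def stripe_message(size, num_rails, threshold=16384):
    """Split one RDMA operation of `size` bytes into per-rail chunks.

    Hypothetical policy (an assumption, not the paper's design):
    - messages smaller than `threshold` are sent whole on rail 0;
    - larger messages are split as evenly as possible across all rails,
      with any remainder bytes given to the lowest-numbered rails.

    Returns a list of (rail, offset, length) tuples covering the message.
    """
    if size < 0 or num_rails < 1:
        raise ValueError("size must be >= 0 and num_rails must be >= 1")

    # Small messages (or a single rail): no striping.
    if size < threshold or num_rails == 1:
        return [(0, 0, size)]

    base, extra = divmod(size, num_rails)
    chunks = []
    offset = 0
    for rail in range(num_rails):
        # The first `extra` rails carry one additional byte each.
        length = base + (1 if rail < extra else 0)
        chunks.append((rail, offset, length))
        offset += length
    return chunks


if __name__ == "__main__":
    # A 1 MiB put over two rails becomes two 512 KiB RDMA writes.
    for rail, offset, length in stripe_message(1 << 20, 2):
        print(f"rail {rail}: offset={offset} length={length}")
```

Under this assumed policy each rail carries roughly half of a large message, which is the intuition behind the near-doubled throughput reported for two HCAs. A real design would also have to handle the synchronization and ordering issues the abstract lists.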


Message Passing Interface, Virtual Channel, Message Size, Large Message, Small Message
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Abhinav Vishnu (1)
  • Gopal Santhanaraman (1)
  • Wei Huang (1)
  • Hyun-Wook Jin (1)
  • Dhabaleswar K. Panda (1)
  1. Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
