Abstract
Recent studies show that MPI processes in real applications can arrive at an MPI collective operation at different times. Such imbalanced process arrival patterns can significantly degrade the performance of the collective. MPI_Alltoall() and MPI_Allgather() are communication-intensive collective operations used in many scientific applications, so their efficient implementation under different process arrival patterns is critical to the performance of scientific applications on modern clusters. In this paper, we propose novel RDMA-based, process-arrival-pattern-aware MPI_Alltoall() and MPI_Allgather() algorithms for different message sizes on InfiniBand clusters. We also extend the algorithms to be shared-memory aware for small to medium messages under process arrival patterns. The performance results indicate that the proposed algorithms outperform the native MVAPICH implementations as well as other non-arrival-pattern-aware algorithms when processes arrive at different times. Specifically, the RDMA-based process-arrival-pattern-aware MPI_Alltoall() and MPI_Allgather() are 3.1 times faster than MVAPICH for 8 KB messages. On average, the applications studied in this paper (FT, RADIX, and N-BODY) achieve a speedup of 1.44 with the proposed algorithms.
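To make the two collectives concrete, the following is a minimal pure-Python sketch of their *semantics* only: in MPI_Alltoall() every rank sends a distinct block to every other rank (a block transpose across ranks), while in MPI_Allgather() every rank's single contribution ends up replicated on all ranks. The function names `alltoall` and `allgather` are illustrative; this sketch does not model the paper's RDMA, shared-memory, or arrival-pattern-aware mechanisms.

```python
# Illustrative sketch of MPI_Alltoall/MPI_Allgather data movement,
# with ranks represented as list indices. Not an MPI implementation.

def alltoall(send):
    """All-to-all personalized exchange (MPI_Alltoall semantics).

    send[i][j] is the block rank i sends to rank j. The result recv
    satisfies recv[i][j] == send[j][i]: rank i ends up holding the
    block destined for it from every rank j."""
    nprocs = len(send)
    return [[send[j][i] for j in range(nprocs)] for i in range(nprocs)]


def allgather(contrib):
    """All-gather (MPI_Allgather semantics).

    contrib[i] is rank i's single block; afterwards every rank holds
    the concatenation of all blocks in rank order."""
    gathered = list(contrib)
    return [gathered[:] for _ in contrib]  # one identical copy per rank
```

For example, with two ranks, `alltoall([["a0", "a1"], ["b0", "b1"]])` yields `[["a0", "b0"], ["a1", "b1"]]`: rank 0 receives block 0 from both ranks. The communication volume of both collectives grows with the number of ranks, which is why a single late-arriving process can stall the whole exchange.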
Cite this article
Qian, Y., Afsahi, A. Process Arrival Pattern Aware Alltoall and Allgather on InfiniBand Clusters. Int J Parallel Prog 39, 473–493 (2011). https://doi.org/10.1007/s10766-010-0152-3