Improved MPI All-to-all Communication on a Giganet SMP Cluster

  • Jesper Larsson Träff
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2474)


We present the implementation of an improved, almost optimal algorithm for regular, personalized all-to-all communication on hierarchical multiprocessors, such as clusters of SMP nodes. In MPI this communication primitive is realized by the MPI_Alltoall collective. The algorithm is a natural generalization of a well-known factorization-based algorithm for non-hierarchical systems. A specific contribution of the paper is a completely contention-free scheme for the exchange of messages between SMP nodes that does not rely on token passing.
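To make the factorization idea concrete, the following is a minimal sketch of the round-robin 1-factorization schedule that flat factor algorithms for all-to-all exchange are built on. This is not the paper's hierarchical algorithm, only the non-hierarchical building block it generalizes; the function name `factor_partner` is our own illustration. For an even number of processes p, the complete exchange decomposes into p-1 rounds, each a perfect matching, so every process has exactly one communication partner per round and no two exchanges contend for the same endpoint.

```python
def factor_partner(i: int, r: int, p: int) -> int:
    """Partner of process i in round r (0 <= r < p-1), for even p.

    Classical 1-factorization of the complete graph K_p: process p-1
    is fixed, the others rotate. Each round is a perfect matching.
    """
    if i == p - 1:
        # The partner j of the fixed process solves 2j = r (mod p-1);
        # since p-1 is odd, p//2 is the inverse of 2 modulo p-1.
        return (r * (p // 2)) % (p - 1)
    j = (r - i) % (p - 1)
    # If the rotation maps i to itself, i is paired with the fixed process.
    return p - 1 if j == i else j

if __name__ == "__main__":
    p = 6  # e.g. one process per node of a 6-node cluster
    for r in range(p - 1):
        pairs = sorted({tuple(sorted((i, factor_partner(i, r, p))))
                        for i in range(p)})
        print(f"round {r}: {pairs}")
```

Over the p-1 rounds, each process meets every other process exactly once, which is why such schedules are attractive as a basis for contention-free all-to-all communication.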

We describe a dedicated implementation for a small Giganet SMP cluster with 6 SMP nodes of 4 processors each. We present simple experiments to validate the assumptions underlying the design of the algorithm; the results were used to guide the detailed implementation of a crucial part of the algorithm. Finally, we compare the improved MPI_Alltoall collective to a trivial (but widely used) implementation and show improvements in average completion time of sometimes more than 10%. While this may not seem like much, we have reason to believe that the improvements will be more substantial for larger systems.







Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Jesper Larsson Träff
  1. C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany
