Improved MPI All-to-all Communication on a Giganet SMP Cluster
We present the implementation of an improved, almost optimal algorithm for regular, personalized all-to-all communication on hierarchical multiprocessors, such as clusters of SMP nodes. In MPI this communication primitive is realized by the MPI_Alltoall collective. The algorithm is a natural generalization of a well-known algorithm for non-hierarchical systems based on factorization. A specific contribution of the paper is a completely contention-free scheme for exchanging messages between SMP nodes that does not rely on token passing.
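As a rough illustration of the factorization idea on a flat (non-hierarchical) machine, the sketch below simulates the classic XOR-based exchange schedule for a power-of-two number of processes. This is an assumption-laden simplification, not the paper's hierarchical algorithm: the function name `xor_schedule` and the restriction to power-of-two p are ours. In round k, rank r exchanges data with rank r XOR k; the p-1 rounds form a 1-factorization of the complete graph K_p, so each round is a perfect matching and every pair of ranks meets exactly once.

```python
# Sketch only: flat XOR exchange schedule, assuming p is a power of two.
# The paper generalizes factorization schedules like this one to
# hierarchical SMP clusters; this is the non-hierarchical starting point.

def xor_schedule(p):
    """Return, for each of the p-1 rounds, the list of (rank, partner) pairs."""
    assert p > 1 and p & (p - 1) == 0, "this variant assumes p is a power of two"
    return [[(r, r ^ k) for r in range(p)] for k in range(1, p)]

if __name__ == "__main__":
    p = 8
    rounds = xor_schedule(p)
    for rnd in rounds:
        # Each round is a perfect matching: partnerships are symmetric
        # (r's partner is q exactly when q's partner is r), so no rank
        # is involved in two exchanges at once.
        partners = {r: q for r, q in rnd}
        assert all(partners[q] == r for r, q in rnd)
    # Over all p-1 rounds, every ordered pair (r, q) with r != q
    # meets exactly once -- a 1-factorization of K_p.
    met = {(r, q) for rnd in rounds for r, q in rnd}
    assert len(met) == p * (p - 1)
```

The matching property is what makes the schedule attractive for all-to-all: within each round every process is busy with exactly one pairwise exchange, so no process idles and no link is oversubscribed.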
We describe a dedicated implementation for a small Giganet SMP cluster with 6 SMP nodes of 4 processors each. We present simple experiments to validate the assumptions underlying the design of the algorithm; the results were used to guide the detailed implementation of a crucial part of the algorithm. Finally, we compare the improved MPI_Alltoall collective to a trivial (but widely used) implementation, and show improvements in average completion time sometimes exceeding 10%. While this may not seem like much, we have reason to believe that the improvements will be more substantial for larger systems.
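To give a flavor of why scheduling matters for contention, the following hypothetical comparison (the schedule names and helper are ours, not the paper's) contrasts a naive send loop, in which every rank sends its k-th block to rank k in the same fixed order, with a staggered schedule in which rank r targets (r + k) mod p at step k. Under the naive ordering all p-1 messages of a step converge on a single receiver; under the staggered schedule every rank receives at most one message per step.

```python
# Hypothetical contention model: count, per communication step, how many
# messages each rank receives under two all-to-all send schedules.

def receivers_per_step(p, schedule):
    """For each of p steps, count messages arriving at each rank."""
    counts = []
    for k in range(p):
        c = [0] * p
        for r in range(p):
            dest = schedule(r, k, p)
            if dest != r:          # a rank does not send over the network to itself
                c[dest] += 1
        counts.append(c)
    return counts

naive = lambda r, k, p: k                # at step k, everyone targets rank k
staggered = lambda r, k, p: (r + k) % p  # targets are spread across all ranks

if __name__ == "__main__":
    p = 6
    # Naive loop: at each step a single rank receives p-1 messages at once.
    assert max(max(c) for c in receivers_per_step(p, naive)) == p - 1
    # Staggered schedule: no rank ever receives more than one message per step.
    assert max(max(c) for c in receivers_per_step(p, staggered)) == 1
```

The staggered pattern is the standard contention-avoidance device for flat systems; the paper's contribution lies in achieving the analogous contention-free property across SMP nodes without resorting to token passing.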
Keywords: Active Node, Message Passing Interface, Local Index, Distributed Processing Symposium, Average Completion Time