Bandwidth Efficient All-to-All Broadcast on Switched Clusters

Faraj, Ahmad; Patarasuk, Pitch; Yuan, Xin

doi:10.1007/s10766-007-0047-0

Published: 07 August 2007

Volume 36, pages 426–453, (2008)

International Journal of Parallel Programming

Ahmad Faraj¹,
Pitch Patarasuk² &
Xin Yuan²

Clusters of workstations employ flexible topologies: regular, irregular, and hierarchical topologies have been used in such systems. The flexibility poses challenges for developing efficient collective communication algorithms since the network topology can potentially have a strong impact on the communication performance. In this paper, we consider the all-to-all broadcast operation on clusters with cut-through and store-and-forward switches. We show that near-optimal all-to-all broadcast on a cluster with any topology can be achieved by only using the links in a spanning tree of the topology when the message size is sufficiently large. The result implies that increasing network connectivity beyond the minimum tree connectivity does not improve the performance of the all-to-all broadcast operation when the most efficient topology specific algorithm is used. All-to-all broadcast algorithms that achieve near-optimal performance are developed for clusters with cut-through and clusters with store-and-forward switches. We evaluate the algorithms through experiments and simulations. The empirical results confirm our theoretical finding.
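To make the operation in the abstract concrete: in an all-to-all broadcast (the pattern behind MPI_Allgather), each of P nodes starts with one block and must end up holding all P blocks. The sketch below is a generic illustration, not the algorithms developed in the paper. It simulates a logical-ring schedule in which every node forwards one block to its successor per step, and it computes a simple bandwidth bound under the assumption that each node has a single incoming link of bandwidth B and must receive (P - 1) blocks. The function names and parameters are hypothetical, chosen for illustration only.

```python
def ring_allgather(blocks):
    """Simulate a logical-ring all-to-all broadcast (all-gather).

    blocks[i] is the block initially owned by node i.
    Returns (buffers, received): buffers[i] maps origin node -> block
    held by node i after P - 1 steps; received[i] counts blocks node i
    received over its single incoming ring link.
    """
    p = len(blocks)
    buffers = [{i: blocks[i]} for i in range(p)]
    received = [0] * p
    for step in range(p - 1):
        # In step s, node i forwards the block that originated at
        # node (i - s) mod p to its ring successor (i + 1) mod p.
        # All sends are collected first so the step is simultaneous.
        sends = []
        for i in range(p):
            origin = (i - step) % p
            sends.append(((i + 1) % p, origin, buffers[i][origin]))
        for dst, origin, data in sends:
            buffers[dst][origin] = data
            received[dst] += 1
    return buffers, received


def bandwidth_lower_bound(p, msize, bandwidth):
    """Time lower bound, assuming each node receives (p - 1) blocks of
    msize bytes over one incoming link of the given bandwidth (bytes/s).
    Startup latency is ignored (an assumption suited to large messages).
    """
    return (p - 1) * msize / bandwidth


if __name__ == "__main__":
    bufs, recv = ring_allgather(["a", "b", "c", "d"])
    print(bufs[0], recv)
    print(bandwidth_lower_bound(4, 1_000_000, 12_500_000))
```

In this simulation every node receives exactly P - 1 blocks, one per step, which matches the number of blocks counted in the bound. Whether a ring schedule is free of link contention on a real cluster depends on how the logical ring is mapped onto the physical switches and links, which this sketch does not model.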

Author information

Authors and Affiliations

Blue Gene Software Development, IBM Corporation, Rochester, MN, 55901, USA
Ahmad Faraj
Department of Computer Science, Florida State University, Tallahassee, FL, 32306, USA
Pitch Patarasuk & Xin Yuan

Corresponding author

Correspondence to Xin Yuan.

About this article

Cite this article

Faraj, A., Patarasuk, P. & Yuan, X. Bandwidth Efficient All-to-All Broadcast on Switched Clusters. Int J Parallel Prog 36, 426–453 (2008). https://doi.org/10.1007/s10766-007-0047-0

Received: 09 February 2007
Accepted: 03 May 2007
Published: 07 August 2007
Issue Date: August 2008
DOI: https://doi.org/10.1007/s10766-007-0047-0
