Abstract

We show how to adapt and extend a well-known allgather (all-to-all broadcast) algorithm to parallel systems with hierarchical communication systems, such as clusters of SMP nodes. For small problem sizes, the new algorithm requires a number of communication rounds that is logarithmic in the number of SMP nodes, and it gracefully degrades towards a linear algorithm as the problem size increases. The algorithm has been used to implement the MPI_Allgather collective operation in the MPI/SX library. Performance measurements on a 72-node SX-8 system show that graceful degradation provides a smooth transition from logarithmic to linear behavior, and that the new algorithm significantly outperforms a standard, linear algorithm. The performance of the linear algorithm is, furthermore, highly sensitive to the distribution of MPI processes over the physical processors.
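
The paper's algorithm itself is not reproduced on this page. For orientation, the following is a minimal sketch of the generic three-phase hierarchical allgather scheme that such node-aware algorithms build on: gather on each node, allgather among node leaders, broadcast within each node. It is illustrative only, not the paper's method (in particular, it does not show the graceful logarithmic-to-linear degradation that is the paper's contribution). The helper name hier_allgather is hypothetical, and the sketch uses MPI-3's MPI_Comm_split_type, a convenience that postdates the paper. It assumes equal-sized contributions, equally populated nodes, and ranks numbered consecutively per node, so the block order produced matches the rank order MPI_Allgather requires.

```c
/*
 * Illustrative sketch only, NOT the paper's algorithm: a generic
 * three-phase hierarchical allgather for clusters of SMP nodes.
 * Assumptions (for illustration): all processes contribute count
 * elements; every node hosts the same number of processes; ranks in
 * comm are consecutive per node.
 */
#include <mpi.h>
#include <stddef.h>

int hier_allgather(const void *sendbuf, int count, MPI_Datatype type,
                   void *recvbuf, MPI_Comm comm)
{
    MPI_Comm node, leaders = MPI_COMM_NULL;
    int size, noderank, nodesize, lrank = 0;
    MPI_Aint lb, extent;

    MPI_Comm_size(comm, &size);

    /* Per-node (shared-memory) communicator; node-local rank 0 leads. */
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node);
    MPI_Comm_rank(node, &noderank);
    MPI_Comm_size(node, &nodesize);
    MPI_Comm_split(comm, noderank == 0 ? 0 : MPI_UNDEFINED, 0, &leaders);
    if (leaders != MPI_COMM_NULL)
        MPI_Comm_rank(leaders, &lrank);

    /* Phase 1: gather the node's blocks into the leader's own slot of
       recvbuf (the slot matching the leader's rank among leaders). */
    MPI_Type_get_extent(type, &lb, &extent);
    char *slot = (char *)recvbuf
               + (MPI_Aint)lrank * nodesize * count * extent;
    MPI_Gather(sendbuf, count, type,
               noderank == 0 ? slot : NULL, count, type, 0, node);

    /* Phase 2: leaders exchange whole node blocks; in place, since each
       leader's data already sits in its own slot of recvbuf. */
    if (leaders != MPI_COMM_NULL) {
        MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                      recvbuf, nodesize * count, type, leaders);
        MPI_Comm_free(&leaders);
    }

    /* Phase 3: each leader broadcasts the complete result on its node. */
    MPI_Bcast(recvbuf, size * count, type, 0, node);
    MPI_Comm_free(&node);
    return MPI_SUCCESS;
}
```

With this decomposition, only the leader phase crosses node boundaries; the paper's contribution concerns how that inter-node phase can run in logarithmically many rounds for small data while degrading smoothly to a linear scheme as data grows.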

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jesper Larsson Träff, C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany
