Full Bandwidth Broadcast, Reduction and Scan with Only Two Trees

  • Peter Sanders
  • Jochen Speck
  • Jesper Larsson Träff
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4757)

Abstract

We present a new, simple algorithmic idea that exploits the bidirectional communication capability of many modern interconnects for the collective MPI operations broadcast, reduction and scan. Our algorithms achieve up to twice the bandwidth of most previous and commonly used algorithms. In particular, our algorithms for reduction and scan are the best currently known. Experiments on clusters with Myrinet and InfiniBand interconnects show significant reductions in running time for broadcast and reduction; for reduction, the improvement comes close to the best possible factor of two.
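The abstract does not spell out the construction, so the following is only a minimal sketch of why a second tree can recover up to a factor of two in bandwidth. It assumes, purely for illustration, a complete binary tree labelled in in-order and a second copy whose labels are shifted by one, so that no node is interior in both trees; the helper names (`inorder_tree`, `shifted`, `max_bytes_sent`, `compare`) are hypothetical and not taken from the paper. The data source that feeds the two tree roots is left out of the count.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[int, List[int]]  # node label -> labels of its children


def inorder_tree(height: int) -> Tree:
    """Complete binary tree on 2**height - 1 nodes, labelled 0.. in in-order."""
    children: Tree = {}

    def build(lo: int, hi: int) -> Optional[int]:
        # Build the subtree holding labels lo..hi and return its root label.
        if lo > hi:
            return None
        mid = (lo + hi) // 2
        kids = [build(lo, mid - 1), build(mid + 1, hi)]
        children[mid] = [k for k in kids if k is not None]
        return mid

    build(0, 2 ** height - 2)
    return children


def shifted(tree: Tree, n: int) -> Tree:
    """Same tree shape with every label moved by one (mod n). Illustrative assumption."""
    return {(v + 1) % n: [(c + 1) % n for c in kids] for v, kids in tree.items()}


def max_bytes_sent(trees: List[Tree], bytes_per_tree: float) -> float:
    """Largest number of bytes any single node forwards to its children."""
    nodes = trees[0].keys()
    return max(sum(len(t[v]) for t in trees) * bytes_per_tree for v in nodes)


def compare(height: int, message_bytes: float) -> Tuple[float, float]:
    """Return (single-tree load, two-tree load) for the busiest node."""
    n = 2 ** height - 1
    t1 = inorder_tree(height)
    t2 = shifted(t1, n)
    interior1 = {v for v, kids in t1.items() if kids}
    interior2 = {v for v, kids in t2.items() if kids}
    # In this sketch, no node has children in both trees.
    assert not (interior1 & interior2)
    single = max_bytes_sent([t1], message_bytes)          # whole message down one tree
    double = max_bytes_sent([t1, t2], message_bytes / 2)  # half the message per tree
    return single, double


if __name__ == "__main__":
    print(compare(3, 1000.0))
```

With a single binary tree, an interior node forwards the whole message to two children while the leaves send nothing. When the message is split across two trees with disjoint interior nodes, the busiest node forwards only as many bytes as the message contains, which is where the factor of two in this toy count comes from.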

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Peter Sanders (1)
  • Jochen Speck (1)
  • Jesper Larsson Träff (2)
  1. Universität Karlsruhe, Am Fasanengarten 5, D-76131 Karlsruhe, Germany
  2. NEC Laboratories Europe, NEC Europe Ltd., Rathausallee 10, D-53757 Sankt Augustin, Germany
