Efficient Implementation of Allreduce on BlueGene/L Collective Network

  • George Almási
  • Gábor Dózsa
  • C. Chris Erway
  • Burkhardt Steinmacher-Burow
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3666)


BlueGene/L is currently in the pole position on the Top500 list [4]. In its full configuration the system will leverage 65,536 compute nodes. Application scalability is a crucial issue for a system of this size. On BlueGene/L, scalability is made possible through the efficient exploitation of special communication networks. The BlueGene/L system software provides its own optimized versions of collective communication routines in addition to the general-purpose MPICH2 implementation. The collective network is a natural platform for reduction operations because of its built-in arithmetic units. Unfortunately, the ALUs of the collective network can handle only fixed-point operands, so efficiently exploiting that network for floating-point reductions is a challenging task. In this paper we present our experiences with implementing an efficient collective network algorithm for Allreduce sums of floating-point numbers.
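
To illustrate why fixed-point-only ALUs make floating-point sums awkward, the following minimal sketch simulates one possible two-phase approach: an integer max-reduction to agree on a common exponent, followed by an integer sum-reduction over values scaled to that exponent. It is an assumption for illustration and is not claimed to be the algorithm presented in the paper. The names `FRAC_BITS`, `to_fixed`, and `fixed_point_allreduce_sum` are hypothetical, and the network reductions are merely simulated with Python's `reduce` over a list holding one value per node.

```python
import math
from functools import reduce

# Hypothetical number of fractional bits kept in the fixed-point format.
FRAC_BITS = 52


def to_fixed(x, e_max):
    """Scale x relative to the shared exponent e_max and round to an integer."""
    return round(math.ldexp(x, FRAC_BITS - e_max))


def fixed_point_allreduce_sum(values):
    """Sum one float per simulated node using only integer max and add."""
    # Phase 1: integer max-reduction over the binary exponents of all operands.
    exponents = [math.frexp(v)[1] for v in values]
    e_max = reduce(max, exponents)

    # Phase 2: integer sum-reduction over operands aligned to e_max.
    total = reduce(lambda a, b: a + b, (to_fixed(v, e_max) for v in values))

    # Every node converts the integer result back to floating point locally.
    return math.ldexp(total, e_max - FRAC_BITS)


if __name__ == "__main__":
    print(fixed_point_allreduce_sum([1.5, 2.25, -0.75, 8.0]))
```

Operands much smaller than the largest one lose low-order bits when aligned to the shared exponent, which is one reason the choice of fixed-point width matters in a scheme of this kind.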


Reduction Operation, Virtual Channel, Interprocessor Communication, Collective Network, Scratchpad Memory
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Adiga, N.R., et al.: An overview of the BlueGene/L supercomputer. In: SC 2002 – High Performance Networking and Computing, Baltimore, MD (November 2002)
  2. Almási, G., Bellofatto, R., Brunheroto, J., Caşcaval, C., Castaños, J.G., Ceze, L., Crumley, P., Erway, C., Gagliano, J., Lieber, D., Martorell, X., Moreira, J.E., Sanomiya, A., Strauss, K.: An overview of the BlueGene/L system software organization. In: Proceedings of Euro-Par 2003 Conference, Klagenfurt, Austria, August 2003. LNCS. Springer, Heidelberg (2003)
  3. Almási, G., et al.: Cellular supercomputing with system-on-a-chip. In: IEEE International Solid-state Circuits Conference ISSCC (2001)
  4. Dongarra, J., Meuer, H.-W., Strohmaier, E.: TOP500 Supercomputer Sites. Available in Web page at,
  5. Shuler, L., Riesen, R., Jong, C., van Dresser, D., Maccabe, A.B., Fisk, L.A., Stallcup, T.M.: The PUMA operating system for massively parallel computers. In: Proceedings of the Intel Supercomputer Users’ Group. 1995 Annual North America Users’ Conference (June 1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • George Almási (1)
  • Gábor Dózsa (1)
  • C. Chris Erway (2)
  • Burkhardt Steinmacher-Burow (3)

  1. IBM T. J. Watson Research Center, Yorktown Heights, USA
  2. Dept. of Comp. Sci., Brown University, Providence, USA
  3. IBM Germany, Boeblingen, Germany
