Chapter

Euro-Par 2004 Parallel Processing

Volume 3149 of the series Lecture Notes in Computer Science, pp. 833–845

Implementing MPI on the BlueGene/L Supercomputer

  • George Almási (IBM Thomas J. Watson Research Center)
  • Charles Archer (IBM Systems Group)
  • José G. Castaños (IBM Thomas J. Watson Research Center)
  • C. Chris Erway (IBM Thomas J. Watson Research Center)
  • Philip Heidelberger (IBM Thomas J. Watson Research Center)
  • Xavier Martorell (IBM Thomas J. Watson Research Center)
  • José E. Moreira (IBM Thomas J. Watson Research Center)
  • Kurt Pinnow (IBM Systems Group)
  • Joe Ratterman (IBM Systems Group)
  • Nils Smeds (IBM Thomas J. Watson Research Center)
  • Burkhard Steinmacher-Burow (IBM Thomas J. Watson Research Center)
  • William Gropp (Mathematics and Computer Science Division, Argonne National Laboratory)
  • Brian Toonen (Mathematics and Computer Science Division, Argonne National Laboratory)

Abstract

The BlueGene/L supercomputer will consist of 65,536 dual-processor compute nodes interconnected by two high-speed networks: a three-dimensional torus network and a tree topology network. Each compute node can address only its own local memory, making message passing the natural programming model for BlueGene/L. In this paper we present our implementation of MPI for BlueGene/L. In particular, we discuss how we leveraged the architectural features of BlueGene/L to arrive at an efficient MPI implementation for this machine. We validate our approach by comparing MPI performance against the hardware limits and by comparing the relative performance of the different modes of operation of BlueGene/L. We show that dedicating one of the processors of a node to communication functions greatly improves the bandwidth achieved by MPI operations, whereas running two MPI tasks per compute node can have a positive impact on application performance.
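
The validation described in the abstract rests on comparing measured MPI bandwidth against the hardware limits of the torus and tree networks. As an illustration of the kind of measurement involved, the sketch below is a minimal two-rank MPI ping-pong bandwidth test; the message size, repetition count, and reported metric are illustrative assumptions and are not taken from the paper.

/* Minimal MPI ping-pong bandwidth sketch (illustrative only; message
 * size and iteration count are arbitrary, not from the paper). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int reps = 100;
    const int nbytes = 1 << 20;          /* 1 MiB messages */
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "need at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* Each round trip moves nbytes in each direction. */
        double secs = (t1 - t0) / reps;
        printf("avg round trip: %g s, bandwidth: %g MB/s\n",
               secs, (2.0 * nbytes) / secs / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Run with two ranks placed on neighboring nodes, such a test gives a single point-to-point bandwidth figure that can be set against the raw link bandwidth of the torus, and repeated under the machine's different modes of operation (one MPI task per node with a communication coprocessor, or two MPI tasks per node) to expose the trade-offs the paper discusses.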