Recent Advances in Parallel Virtual Machine and Message Passing Interface

Volume 5759 of the series Lecture Notes in Computer Science pp 20-30

MPI on a Million Processors

  • Pavan Balaji, Argonne National Laboratory
  • Darius Buntinas, Argonne National Laboratory
  • David Goodell, Argonne National Laboratory
  • William Gropp, University of Illinois
  • Sameer Kumar, IBM T.J. Watson Research Center, Yorktown Heights
  • Ewing Lusk, Argonne National Laboratory
  • Rajeev Thakur, Argonne National Laboratory
  • Jesper Larsson Träff, NEC Laboratories Europe



Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine the issue of MPI scalability. We first examine the MPI specification itself, discuss areas with scalability concerns, and describe how they can be overcome. We then investigate issues that an MPI implementation must address to be scalable. We ran experiments to measure MPI memory consumption at scale on up to 131,072 processes, or 80% of the IBM Blue Gene/P system at Argonne National Laboratory. Based on the results, we tuned the MPI implementation to reduce its memory footprint. We also discuss issues in the algorithmic scalability of applications to large process counts, and features of MPI that enable other techniques for overcoming scalability limitations in applications.