Upcoming exascale-capable systems are expected to comprise more than a million processing elements.
As researchers continue to work toward architecting these systems, it is becoming increasingly clear that
they will rely on a significant amount of hardware shared among processing units, including shared caches,
memory, and network components. Understanding how effectively current message-passing and communication
infrastructure ties these processing elements together is therefore critical to making educated projections
about what we can expect from such future machines. In this paper, we characterize the communication
performance of the Message Passing Interface (MPI) implementation on 32 racks (131,072 cores) of the largest
Blue Gene/P (BG/P) system in the United States (80% of the total system size) and present several insights
into its behavior at this scale.
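As a concrete illustration of the kind of measurement such a characterization involves, the following is a minimal sketch of a ping-pong latency microbenchmark between two MPI ranks. The message size, iteration count, and overall structure are illustrative assumptions, not the benchmark suite used in this work.

```c
/* Minimal ping-pong latency sketch between ranks 0 and 1.
 * Illustrative only; parameters are assumptions, not the
 * paper's actual benchmark configuration.
 * Run with at least two ranks, e.g.: mpiexec -n 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>

#define ITERS    1000   /* assumed iteration count */
#define MSG_SIZE 1024   /* assumed message size in bytes */

int main(int argc, char **argv)
{
    int rank;
    char buf[MSG_SIZE] = {0};
    double start, elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();

    /* Ranks 0 and 1 bounce a message back and forth; half the
     * averaged round-trip time estimates the one-way latency. */
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    elapsed = MPI_Wtime() - start;
    if (rank == 0)
        printf("avg one-way latency: %.3f us\n",
               elapsed / (2.0 * ITERS) * 1e6);

    MPI_Finalize();
    return 0;
}
```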