Computer Science - Research and Development, Volume 28, Issue 2, pp 147–155

The design of ultra scalable MPI collective communication on the K computer


  • T. Adachi
    • Fujitsu Limited
  • Naoyuki Shida
    • Fujitsu Limited
  • Kenichi Miura
    • Fujitsu Limited
  • Shinji Sumimoto
    • Fujitsu Limited
  • Atsuya Uno
    • RIKEN
  • Motoyoshi Kurokawa
    • RIKEN
  • Fumiyoshi Shoji
    • RIKEN
  • Mitsuo Yokokawa
    • RIKEN
Special Issue Paper

DOI: 10.1007/s00450-012-0211-7

Cite this article as:
Adachi, T., Shida, N., Miura, K. et al. Comput Sci Res Dev (2013) 28: 147. doi:10.1007/s00450-012-0211-7


This paper proposes a design for ultra-scalable MPI collective communication on the K computer, which consists of 82,944 computing nodes and is the world’s first system to exceed 10 PFLOPS. The nodes are connected by the Tofu interconnect, which uses a six-dimensional mesh/torus topology. Existing MPI libraries, however, perform poorly on such a direct network because they assume typical cluster environments. We therefore design collective algorithms optimized for the K computer.

In designing the algorithms, we prioritize collision-freeness for long messages and low latency for short messages. The long-message algorithms use multiple RDMA network interfaces and consist of neighbor communication, which yields high bandwidth and avoids message collisions. The short-message algorithms, in contrast, are designed to reduce software overhead, which stems from the number of relaying nodes. Evaluation results on up to 55,296 nodes of the K computer show that the new implementation outperforms the existing one for long messages by a factor of 4 to 11. They also show that the short-message algorithms complement the long-message ones.
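
The trade-off described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the algorithms proposed in the paper: it models a single torus dimension as a ring, uses a textbook ring allgather to stand in for "neighbor communication", and compares its step count with that of textbook recursive doubling as a stand-in for a schedule with fewer relaying steps. The function names `ring_allgather` and `step_counts` are hypothetical.

```python
from math import ceil, log2


def ring_allgather(blocks):
    """Simulate an allgather on a 1-D ring (one torus dimension).

    blocks[i] is the data initially owned by node i. In every step each
    node sends one block to its right-hand neighbor only, so every link
    carries exactly one message per step and no two messages collide.
    """
    p = len(blocks)
    # gathered[i] maps origin node -> block, as currently known by node i.
    gathered = [{i: blocks[i]} for i in range(p)]
    # Origin of the block each node forwards next (initially its own).
    in_flight = list(range(p))
    for _ in range(p - 1):
        # All sends in a step are simultaneous: node i -> node (i + 1) % p.
        arriving = [in_flight[(i - 1) % p] for i in range(p)]
        for i, origin in enumerate(arriving):
            gathered[i][origin] = blocks[origin]
        in_flight = arriving
    return [[g[k] for k in range(p)] for g in gathered]


def step_counts(p):
    """Communication steps for p nodes under the two textbook schedules.

    Assumption: the ring needs p - 1 neighbor-only steps, while recursive
    doubling needs ceil(log2(p)) steps but pairs up non-adjacent nodes.
    """
    return {"ring": p - 1, "recursive_doubling": ceil(log2(p))}


if __name__ == "__main__":
    print(ring_allgather(["a", "b", "c", "d"])[0])
    print(step_counts(8))
```

In this simplified model, the ring schedule keeps every transfer on a dedicated neighbor link, which suits long messages where bandwidth dominates, while the logarithmic schedule takes far fewer steps, which suits short messages where per-step software overhead dominates.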


K computer, MPI collective communication, Torus network

Copyright information

© Springer-Verlag 2012