Matrix Multiplication on Multidimensional Torus Networks

  • Edgar Solomonik
  • James Demmel
Conference paper

DOI: 10.1007/978-3-642-38718-0_21

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7851)
Cite this paper as:
Solomonik E., Demmel J. (2013) Matrix Multiplication on Multidimensional Torus Networks. In: Daydé M., Marques O., Nakajima K. (eds) High Performance Computing for Computational Science - VECPAR 2012. VECPAR 2012. Lecture Notes in Computer Science, vol 7851. Springer, Berlin, Heidelberg

Abstract

Blocked matrix multiplication algorithms such as Cannon’s algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized ‘Split-Dimensional’ version of Cannon’s algorithm (SD-Cannon) with a higher-dimensional and bidirectional communication structure. This algorithm is useful for torus interconnects that can achieve more injection bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon can lower the algorithmic bandwidth cost by a factor of up to d. With rectangular collectives, SUMMA also achieves the lower bandwidth cost but has a higher latency cost. We use Charm++ virtualization to efficiently map SD-Cannon onto unbalanced and odd-dimensional torus network partitions. Our performance study on Blue Gene/P demonstrates that an MPI version of SD-Cannon can exploit multiple communication links and improve performance.
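
For readers unfamiliar with the 2-dimensional communication structure mentioned in the abstract, the following is a minimal serial sketch of the classic Cannon's algorithm, not of SD-Cannon and not the authors' implementation. It simulates a q × q processor grid in pure Python: after an initial skew, each step multiplies the locally held blocks and then shifts A blocks one position left and B blocks one position up, with wraparound as on a torus. The function names (`cannon`, `split_blocks`, `matmul`, `add_into`) are hypothetical and chosen only for illustration.

```python
def matmul(X, Y):
    """Plain triple-loop product of two dense blocks (lists of lists)."""
    m = len(Y)
    return [[sum(X[i][k] * Y[k][j] for k in range(m))
             for j in range(len(Y[0]))] for i in range(len(X))]


def add_into(C, D):
    """Accumulate block D into block C in place."""
    for i in range(len(C)):
        for j in range(len(C[0])):
            C[i][j] += D[i][j]


def split_blocks(M, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    b = len(M) // q
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(q)] for i in range(q)]


def cannon(A, B, q):
    """Simulate Cannon's algorithm on a q x q grid (assumes q divides n)."""
    n = len(A)
    assert n % q == 0, "sketch assumes the grid size divides the matrix size"
    b = n // q
    Ab, Bb = split_blocks(A, q), split_blocks(B, q)

    # Initial skew: row i of A shifts left by i, column j of B shifts up by j,
    # so processor (i, j) holds A block (i, i+j) and B block (i+j, j).
    Ab = [[Ab[i][(j + i) % q] for j in range(q)] for i in range(q)]
    Bb = [[Bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
    Cb = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]

    for _ in range(q):
        # Local multiply-accumulate on every simulated processor.
        for i in range(q):
            for j in range(q):
                add_into(Cb[i][j], matmul(Ab[i][j], Bb[i][j]))
        # Nearest-neighbour shifts with wraparound: A left by one, B up by one.
        Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    # Reassemble the block grid into a full n x n result.
    return [[Cb[i // b][j // b][i % b][j % b] for j in range(n)]
            for i in range(n)]
```

In this sketch each simulated processor sends one A block and one B block per step, each in a single direction along one grid dimension. That one-directional, two-dimensional pattern is the structure the abstract says SD-Cannon generalizes to more dimensions and both directions.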

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Edgar Solomonik (1)
  • James Demmel (1)
  1. Division of Computer Science, University of California at Berkeley, USA
