Block-cyclic array redistribution on networks of workstations
This article compares the run-time performance of several algorithms (including the MPI_Alltoallv() function call) for redistributing arrays that are distributed in a block-cyclic fashion over a multidimensional processor grid. The generation of the communication messages to be exchanged by the processors involved in the redistribution is not taken into account. Rather, we focus on the scheduling of those messages: how to organize the message exchanges into “structured” communication steps that minimize communication overhead.
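To make the scheduling problem concrete, the sketch below (not the paper's algorithm; all names and parameters are illustrative assumptions) builds the communication pattern for a one-dimensional redistribution from a CYCLIC(r) distribution on P processors to a CYCLIC(s) distribution on Q processors, then greedily organizes the messages into steps in which each processor sends and receives at most one message. This amounts to a greedy edge coloring of the bipartite sender/receiver graph.

```python
def comm_pattern(n, p, r, q, s):
    """Return the set of (src, dst) pairs that must exchange at least one
    element when an n-element array moves from CYCLIC(r) on p processors
    to CYCLIC(s) on q processors."""
    pairs = set()
    for i in range(n):
        src = (i // r) % p   # owner of element i under CYCLIC(r) on p procs
        dst = (i // s) % q   # owner of element i under CYCLIC(s) on q procs
        pairs.add((src, dst))
    return pairs

def schedule(pairs):
    """Greedily pack messages into communication steps such that, in each
    step, every processor sends at most one message and receives at most
    one message (a greedy edge coloring of the bipartite graph)."""
    steps = []
    remaining = sorted(pairs)
    while remaining:
        busy_src, busy_dst, step, rest = set(), set(), [], []
        for (src, dst) in remaining:
            if src not in busy_src and dst not in busy_dst:
                step.append((src, dst))
                busy_src.add(src)
                busy_dst.add(dst)
            else:
                rest.append((src, dst))
        steps.append(step)
        remaining = rest
    return steps

if __name__ == "__main__":
    # Example: redistribute 24 elements from CYCLIC(2) on 4 processors
    # to CYCLIC(4) on 3 processors.
    pairs = comm_pattern(n=24, p=4, r=2, q=3, s=4)
    for k, step in enumerate(schedule(pairs)):
        print(f"step {k}: {step}")
```

In this example every source processor must talk to every destination processor (a complete bipartite pattern), so the schedule needs at least as many steps as the maximum processor degree; more sophisticated schedules, such as those compared in the article, additionally balance message sizes within each step.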
Key-words: distributed arrays, redistribution, block-cyclic distribution, scheduling, MPI, HPF, network of workstations