Performance evaluation and modeling of reduction operations on the IBM RS/6000 SP parallel computer
We discuss algorithms for global reduction (or combine) operations (e.g., global sums) for numbers of processors that need not be a power of 2, and implement these using standard message-passing techniques on distributed-memory parallel computers. We present performance results measured on an IBM RS/6000 SP parallel computer at UNI•C. Significant performance improvements are obtained by using a recursive doubling method with a vector splice/gather approach.
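As an illustration of the kind of algorithm discussed, the following is a minimal single-process sketch of a recursive doubling global sum for a processor count that need not be a power of 2. It is a generic textbook-style variant, not the implementation measured in the paper: the function name `allreduce_recursive_doubling`, the fold-in/fold-out handling of the surplus processors, and the use of a Python list to stand in for per-processor data are all assumptions made for illustration. It involves no message passing and does not attempt the vector-splitting optimization mentioned above.

```python
def allreduce_recursive_doubling(values, op=lambda a, b: a + b):
    """Simulate a global reduction in which every virtual processor ends
    up with the combined result. values[r] is the local datum of rank r.
    Hypothetical helper for illustration only."""
    p = len(values)
    if p == 0:
        return []
    data = list(values)

    # Largest power of 2 not exceeding p, and the number of surplus ranks.
    p2 = 1 << (p.bit_length() - 1)
    extra = p - p2

    # Fold-in: each surplus rank p2+i hands its datum to rank i.
    for i in range(extra):
        data[i] = op(data[i], data[p2 + i])

    # Recursive doubling among ranks 0..p2-1: in each round, rank r
    # exchanges with partner r XOR mask and both combine the two values.
    mask = 1
    while mask < p2:
        data[:p2] = [op(data[r], data[r ^ mask]) for r in range(p2)]
        mask <<= 1

    # Fold-out: surplus ranks receive the finished result.
    for i in range(extra):
        data[p2 + i] = data[i]
    return data
```

With six virtual processors holding 1 through 6, the sketch takes one fold-in step, two doubling rounds among four processors, and one fold-out step, after which every processor holds the global sum. In a real message-passing code each list update would correspond to a send/receive pair between the two ranks involved.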