A comparison of MPI performance on different MPPs
Since MPI became a standard for message passing on distributed-memory machines, a number of implementations have evolved. Today an MPI implementation is available for all relevant MPP systems, and a number of these implementations are based on MPICH. In this paper we present a performance comparison of several MPI implementations on different MPPs. Results are presented for the Cray T3E, the IBM RS/6000 SP, the Hitachi SR2201 and the Intel Paragon. In addition, we compare those results to the NEC SX-4, a shared-memory PVP.
The results show latency and bandwidth for point-to-point communication, together with measurements for global communication and synchronization. This covers a wide range of MPI features used by typical numerical simulation codes. Finally, we investigate a core operation of a conjugate gradient solver to show the behaviour of latency-hiding techniques on different platforms.
- 1. Message Passing Interface Forum: MPI: A Message Passing Interface Standard. University of Tennessee, Knoxville, USA, 1995.
- 3. Zhiwei Xu, Kai Hwang, “Modeling Communication Overhead: MPI and MPL Performance on the IBM SP2”, IEEE Parallel & Distributed Technology, Spring 1996, 9–23.
- 5. Shahid H. Bokhari, “Multiphase Complete Exchange on Paragon, SP2, and CS-2”, IEEE Parallel & Distributed Technology, Fall 1996, 45–59.
- 7. José Miguel, Augustin Arruabarrena, Ramón Beivide and José Angel Gregorio, “Assessing the Performance of the New IBM SP2 Communication Subsystem”, IEEE Parallel & Distributed Technology, Winter 1996, 12–22.
- 10. Resch, M., Berger, H., Rabenseifner, R., Boenisch, T.: MPI Performance on the Cray T3E, BI, RUS, 1997.