rMPI: Message Passing on Multicore Processors with On-Chip Interconnect

  • James Psota
  • Anant Agarwal
Conference paper

DOI: 10.1007/978-3-540-77560-7_3

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4917)
Cite this paper as:
Psota J., Agarwal A. (2008) rMPI: Message Passing on Multicore Processors with On-Chip Interconnect. In: Stenström P., Dubois M., Katevenis M., Gupta R., Ungerer T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg

Abstract

With multicore processors becoming the standard architecture, programmers are faced with the challenge of developing applications that capitalize on multicore’s advantages. This paper presents rMPI, which leverages the on-chip networks of multicore processors to build a powerful abstraction with which many programmers are familiar: the MPI programming interface. To our knowledge, rMPI is the first MPI implementation for multicore processors that have on-chip networks. This study uses the MIT Raw processor as an experimentation and validation vehicle, although the findings presented are applicable to multicore processors with on-chip networks in general. Likewise, this study uses the MPI API as a general interface which allows parallel tasks to communicate, but the results shown in this paper are generally applicable to message passing communication. Overall, rMPI’s design constitutes the marriage of message passing communication and on-chip networks, allowing programmers to employ a well-understood programming model to a high performance multicore processor architecture.

This work assesses the applicability of the MPI API to multicore processors with on-chip interconnect, and carefully analyzes overheads associated with common MPI operations. This paper contrasts MPI to lower-overhead network interface abstractions that the on-chip networks provide. The evaluation also compares rMPI to hand-coded applications running directly on one of the processor’s low-level on-chip networks, as well as to a commercial-quality MPI implementation running on a cluster of Ethernet-connected workstations. Results show speedups of 4x to 15x for 16 processor cores relative to one core, depending on the application, which equal or exceed performance scalability of the MPI cluster system. However, this paper ultimately argues that while MPI offers reasonable performance on multicores when, for instance, legacy applications must be run, its large overheads squander the multicore opportunity. Performance of multicores could be significantly improved by replacing MPI with a lighter-weight communications API with a smaller memory footprint.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • James Psota
    • 1
  • Anant Agarwal
    • 1
  1. 1.Massachusetts Institute of TechnologyCambridgeUSA

Personalised recommendations