Flattening on the Fly: efficient handling of MPI derived datatypes

  • Jesper Larsson Träff
  • Rolf Hempel
  • Hubert Ritzdorf
  • Falk Zimmermann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1697)

Abstract

The Message Passing Interface (MPI) incorporates a mechanism for describing structured, non-contiguous memory layouts for use as communication buffers in MPI communication functions. The rationale behind the derived-datatype mechanism is to relieve the user of the tedious packing and unpacking of non-consecutive data into contiguous communication buffers. Furthermore, the mechanism makes it possible to improve performance by saving on internal buffering. However, current MPI implementations entail considerable performance penalties when working with derived datatypes. We describe a new method called flattening on the fly for the efficient handling of derived datatypes in MPI. The method aims at exploiting regularities in the memory layout described by the datatype as far as possible, and in addition considerably reduces the overhead for parsing the datatype. Flattening on the fly has been implemented and evaluated on an NEC SX-4 vector supercomputer, where it performs significantly better than previous methods, resulting in performance comparable to what the user can at best achieve by packing and unpacking data manually. On a PC cluster, too, the method gives worthwhile improvements in cases that are not handled well by the conventional implementation.



Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Jesper Larsson Träff (1)
  • Rolf Hempel (1)
  • Hubert Ritzdorf (1)
  • Falk Zimmermann (1)
  1. C & C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany
