Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient Using MPI Datatypes

  • Torsten Hoefler
  • Steven Gottlieb
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6305)

Abstract

Many parallel applications need to communicate non-contiguous data. Most applications manually pack and unpack the data around each communication, even though MPI offers a zero-copy alternative in the form of derived datatypes. In this work, we study two complex use cases: (1) a Fast Fourier Transform, where we express a local memory transpose as part of the datatype, and (2) a conjugate gradient solver with a checkerboard layout, which requires multiple nested datatypes. We demonstrate significant speedups, up to a factor of 3.8 for the FFT and up to 18% for the conjugate gradient solver. Our work can serve as a template for application developers who want to exploit datatypes. For MPI implementers, we identify two practically relevant access patterns that deserve special optimization.
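
To make the zero-copy idea concrete, the sketch below shows how a derived datatype can describe the transpose of a row-major N x N matrix, so that MPI streams the transposed data directly out of the source buffer with no manual pack/unpack step. This is a minimal illustration in the spirit of the paper, not the authors' actual datatype construction (their FFT datatype folds the local transpose into the FFT's distributed layout); the matrix size N and the two-rank send/receive setup are assumptions made for this example.

    /* Zero-copy matrix transpose via MPI derived datatypes.
     * Illustrative sketch only; N and the two-rank setup are
     * assumptions, not taken from the paper. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 4                     /* matrix dimension (example value) */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);  /* needs 2+ ranks */

        double a[N][N], b[N][N];

        /* One column of a row-major N x N matrix: N doubles, stride N. */
        MPI_Datatype col, col_resized, xpose;
        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col);
        /* Shrink the extent to one double so consecutive columns
         * start one element apart. */
        MPI_Type_create_resized(col, 0, sizeof(double), &col_resized);
        /* N consecutive columns == the whole matrix, read in
         * transposed order. */
        MPI_Type_contiguous(N, col_resized, &xpose);
        MPI_Type_commit(&xpose);

        if (rank == 0) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    a[i][j] = i * N + j;
            /* Send straight from a; the datatype performs the
             * transpose, so no pack buffer is needed. */
            MPI_Send(a, 1, xpose, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* The receiver sees a plain contiguous stream of doubles. */
            MPI_Recv(b, N * N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            int ok = 1;
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    if (b[i][j] != (double)(j * N + i)) ok = 0;
            printf("transpose %s\n", ok ? "correct" : "WRONG");
        }

        MPI_Type_free(&xpose);
        MPI_Type_free(&col_resized);
        MPI_Type_free(&col);
        MPI_Finalize();
        return 0;
    }

The key step is MPI_Type_create_resized, which sets the column type's extent to a single double so that the N concatenated columns in the contiguous type address the matrix at the right offsets. The checkerboard case in the paper builds on the same principle, with several such constructors nested inside one another.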

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Torsten Hoefler (1)
  • Steven Gottlieb (1)

  1. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, USA
