
Automatic Memory Optimizations for Improving MPI Derived Datatype Performance

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2006)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 4192)

Abstract

MPI derived datatypes allow users to describe noncontiguous memory layouts and to communicate noncontiguous data with a single communication function. This powerful feature enables an MPI implementation to optimize the transfer of noncontiguous data. In practice, however, many implementations of MPI derived datatypes perform poorly, which discourages application developers from using the feature. In this paper, we present a technique that automatically selects templates optimized for memory performance based on the access pattern of derived datatypes. We implement this mechanism in the MPICH2 source code. We compare the performance of our implementation with that of well-written manual packing/unpacking routines and of the original MPICH2 implementation, and we show that the performance of various derived datatypes improves significantly and is comparable to that of the optimized manual routines.
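
As a concrete illustration of the feature the abstract describes, the following C/MPI sketch sends one column of a row-major matrix with a single call by building an MPI_Type_vector. The example is not taken from the paper; the matrix dimensions and the two-rank send/receive pattern are assumptions for illustration. How efficiently the library packs the strided elements behind this call is exactly the performance issue the paper addresses.

```c
/* Minimal sketch (not from the paper): sending one column of a row-major
 * matrix with a single call via an MPI derived datatype, instead of
 * packing the column into a contiguous buffer by hand.
 * Run with at least two ranks, e.g. mpiexec -n 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

#define ROWS 4
#define COLS 5

int main(int argc, char **argv)
{
    int rank;
    double matrix[ROWS][COLS];   /* row-major: column elements are COLS doubles apart */
    double column[ROWS];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One element per row, stride of COLS doubles between elements:
     * this describes the noncontiguous layout of a single matrix column. */
    MPI_Datatype col_type;
    MPI_Type_vector(ROWS, 1, COLS, MPI_DOUBLE, &col_type);
    MPI_Type_commit(&col_type);

    if (rank == 0) {
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                matrix[i][j] = i * COLS + j;
        /* The MPI library gathers the strided elements itself; how well it
         * does so is the packing performance studied in the paper. */
        MPI_Send(&matrix[0][2], 1, col_type, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(column, ROWS, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        for (int i = 0; i < ROWS; i++)
            printf("column[%d] = %g\n", i, column[i]);
    }

    MPI_Type_free(&col_type);
    MPI_Finalize();
    return 0;
}
```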

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Byna, S., Sun, X.-H., Thakur, R., Gropp, W. (2006). Automatic Memory Optimizations for Improving MPI Derived Datatype Performance. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2006. Lecture Notes in Computer Science, vol 4192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846802_36

  • DOI: https://doi.org/10.1007/11846802_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39110-4

  • Online ISBN: 978-3-540-39112-8
