Exploiting Shared Memory to Improve Parallel I/O Performance

  • Andrew B. Hastings
  • Alok Choudhary
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4192)


We explore several methods utilizing system-wide shared memory to improve the performance of MPI-IO, particularly for non-contiguous file access. We introduce an abstraction called the datatype iterator that permits efficient, dynamic generation of (offset, length) pairs for a given MPI derived datatype. Combining datatype iterators with overlapped I/O and computation, we demonstrate how a shared memory MPI implementation can utilize more than 90% of the available disk bandwidth (in some cases representing a 5× performance improvement over existing methods) even for extreme cases of non-contiguous datatypes. We generalize our results to suggest possible parallel I/O performance improvements on systems without global shared memory.
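To make the datatype iterator idea concrete, here is a minimal sketch of lazily generating (offset, length) pairs for a strided layout. It is not the authors' implementation. The function name `vector_iter` and its parameters are hypothetical; they are modeled on the arguments of MPI_Type_vector (count, blocklength, stride), and the element size in bytes is assumed to be supplied by the caller.

```python
from typing import Iterator, Tuple


def vector_iter(count: int, blocklength: int, stride: int,
                elem_size: int, base: int = 0) -> Iterator[Tuple[int, int]]:
    """Lazily yield (offset, length) byte pairs for a vector-like layout.

    Hypothetical helper: `count` blocks of `blocklength` elements each,
    with consecutive block starts `stride` elements apart.
    """
    if count <= 0 or blocklength <= 0:
        return  # empty layout: nothing to access
    if blocklength == stride:
        # Adjacent blocks form one contiguous region, so emit a single pair.
        yield (base, count * blocklength * elem_size)
        return
    for i in range(count):
        # Pairs are produced one at a time; no full list is held in memory.
        yield (base + i * stride * elem_size, blocklength * elem_size)


# Three blocks of two 8-byte elements, block starts four elements apart.
pairs = list(vector_iter(count=3, blocklength=2, stride=4, elem_size=8))
print(pairs)  # [(0, 16), (32, 16), (64, 16)]
```

Because the pairs are generated on demand, a consumer could in principle issue I/O for one pair while the next is being computed, which is the kind of overlap the abstract describes.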


Parallel I/O, shared memory, datatype iterator, non-contiguous access, MPI-IO





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Andrew B. Hastings (1)
  • Alok Choudhary (2)
  1. Sun Microsystems, Inc.
  2. Northwestern University
