
Hierarchical Read–Write Optimizations for Scientific Applications with Multi-variable Structured Datasets


Large-scale scientific applications spend a significant amount of time reading and writing data. These simulations run on supercomputers architected with high-bandwidth, low-latency interconnects that have complex topologies. Yet few efforts fully exploit the interconnect features for I/O. MPI-IO optimizations suffer from significant network contention at large core counts, making I/O a critical bottleneck at extreme scales. We propose HieRO, which leverages the fast interconnect and performs hierarchical optimizations for I/O in scientific applications with structured datasets. HieRO performs reads and writes in multiple stages, using carefully chosen leader processes that invoke the MPI-IO calls. Additionally, HieRO takes into account the application’s domain decomposition and access patterns, and fully utilizes the on-chip interconnect at each multicore node. We evaluate the efficacy of our optimizations with two scientific applications, WRF and S3D, whose I/O access patterns are common to a wide gamut of applications. We evaluate our approaches on two supercomputers, the Edison Cray XC30 and the Mira Blue Gene/Q, which represent systems with diverse interconnects and parallel filesystems. We demonstrate that algorithmic changes can lead to significant improvements in parallel read/write performance. HieRO achieves more than \(40\times \) improvement in read time for WRF, and up to \(40\times \) improvement in read time and \(13\times \) improvement in write time for S3D on 524,288 cores.
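The multi-stage, leader-based idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch in plain Python rather than MPI, and it is not HieRO's actual implementation. It assumes a 1-D dataset in which ranks own consecutive file regions, and it assumes the leader of each node is simply the lowest rank on that node. The names `node_leader` and `hierarchical_write` are hypothetical and chosen only for this illustration.

```python
from io import BytesIO


def node_leader(rank, ranks_per_node):
    """Return the lowest rank on the same node (an assumed leader rule)."""
    return (rank // ranks_per_node) * ranks_per_node


def hierarchical_write(blocks, ranks_per_node, out):
    """Simulate a two-stage write and return the number of write calls.

    blocks[r] holds the bytes owned by rank r. Ranks are assumed to own
    consecutive regions of the output file, in rank order.
    """
    # File offset of each rank's block: a prefix sum over block sizes.
    offsets = []
    position = 0
    for block in blocks:
        offsets.append(position)
        position += len(block)

    # Stage 1: each node leader gathers the blocks of the ranks on its node.
    # In a real code this would be intra-node communication.
    gathered = {}
    for rank, block in enumerate(blocks):
        leader = node_leader(rank, ranks_per_node)
        gathered.setdefault(leader, []).append(block)

    # Stage 2: only the leaders touch the file, one contiguous write each.
    writes = 0
    for leader, parts in gathered.items():
        out.seek(offsets[leader])
        out.write(b"".join(parts))
        writes += 1
    return writes


# Example: 8 ranks, 4 ranks per node, each rank owning 4 bytes.
blocks = [bytes([rank]) * 4 for rank in range(8)]
out = BytesIO()
num_writes = hierarchical_write(blocks, ranks_per_node=4, out=out)
print(num_writes, out.getvalue() == b"".join(blocks))
```

In this toy setting the file contents match what every rank writing its own block would produce, but the number of processes issuing I/O calls drops from the number of ranks to the number of nodes. In an MPI program, the second stage would correspond to the leaders invoking the MPI-IO calls.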







This research was funded in part by, and used resources of, the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.


Corresponding author

Correspondence to Preeti Malakar.


Cite this article

Malakar, P., Vishwanath, V. Hierarchical Read–Write Optimizations for Scientific Applications with Multi-variable Structured Datasets. Int J Parallel Prog 45, 94–108 (2017).



Keywords

  • Read
  • Write
  • Multi-variable dataset
  • Scientific applications