The Journal of Supercomputing, Volume 71, Issue 6, pp 2066–2090

Performance model-directed data sieving for high-performance I/O

  • Yong Chen
  • Yin Lu
  • Prathamesh Amritkar
  • Rajeev Thakur
  • Yu Zhuang


Many scientific computing applications and engineering simulations exhibit noncontiguous I/O access patterns. Data sieving is an important technique for improving the performance of noncontiguous I/O: it combines small, noncontiguous requests into one large, contiguous request. It has proven effective even though it may access more data than the application requested. In this study, we propose a new data sieving approach, namely performance model-directed data sieving, or PMD data sieving for short. It improves on the existing data sieving approach in two respects: (1) it dynamically determines when data sieving is beneficial; and (2) it dynamically determines how to perform data sieving when it is. As verified by both theoretical analysis and experimental results, it considerably improves on the performance of the existing approach and reduces memory consumption. Given the importance of supporting noncontiguous accesses effectively and of reducing memory pressure in large-scale systems, the proposed PMD data sieving approach holds great promise for high-performance I/O systems.
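The idea of sieving only when a performance model predicts a benefit can be illustrated with a minimal Python sketch. It is not the authors' implementation: the latency-plus-bandwidth cost model, its default parameter values, and all function names (`sieve_read`, `direct_read`, `should_sieve`, `read_noncontiguous`) are assumptions made for illustration, and the sketch covers only the "when to sieve" decision, not how to partition a sieving operation.

```python
from typing import BinaryIO, List, Tuple

Request = Tuple[int, int]  # (offset, length) in bytes


def sieve_read(f: BinaryIO, requests: List[Request]) -> List[bytes]:
    """Classic data sieving: one contiguous read spanning all requests,
    then extraction of the wanted pieces from the in-memory buffer."""
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    f.seek(start)
    buf = f.read(end - start)  # may include unwanted "holes"
    return [buf[off - start: off - start + length] for off, length in requests]


def direct_read(f: BinaryIO, requests: List[Request]) -> List[bytes]:
    """Baseline: one small read per noncontiguous request."""
    pieces = []
    for off, length in requests:
        f.seek(off)
        pieces.append(f.read(length))
    return pieces


def estimated_cost(num_ops: int, num_bytes: int,
                   latency_s: float, bandwidth_bps: float) -> float:
    """Hypothetical cost model: per-operation latency plus transfer time."""
    return num_ops * latency_s + num_bytes / bandwidth_bps


def should_sieve(requests: List[Request],
                 latency_s: float = 5e-3,
                 bandwidth_bps: float = 100e6) -> bool:
    """Sieve only if the modeled cost of one large read is lower than
    the modeled cost of many small reads (assumed parameter values)."""
    wanted = sum(length for _, length in requests)
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    direct = estimated_cost(len(requests), wanted, latency_s, bandwidth_bps)
    sieved = estimated_cost(1, end - start, latency_s, bandwidth_bps)
    return sieved < direct


def read_noncontiguous(f: BinaryIO, requests: List[Request]) -> List[bytes]:
    """Model-directed choice between sieving and direct access."""
    if should_sieve(requests):
        return sieve_read(f, requests)
    return direct_read(f, requests)
```

Under this assumed model, densely packed small requests favor sieving because the saved per-operation latency outweighs the extra bytes transferred, while widely separated requests favor direct access because the hole between them dominates the transfer time and the buffer size.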


Data sieving · Runtime systems · Parallel I/O · Libraries · Parallel file systems · High-performance computing



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Yong Chen (1)
  • Yin Lu (1)
  • Prathamesh Amritkar (1)
  • Rajeev Thakur (2)
  • Yu Zhuang (1)

  1. Computer Science Department, Texas Tech University, Lubbock, USA
  2. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA
