Abstract
Many scientific computing applications and engineering simulations exhibit noncontiguous I/O access patterns. Data sieving is an important technique for improving the performance of noncontiguous I/O: it combines many small, noncontiguous requests into one large, contiguous request. It has proven effective even though it may access more data than was requested. In this study, we propose a new data sieving approach called Performance Model Directed Data Sieving, or PMD data sieving for short. It improves on the existing data sieving approach in two respects: (1) it dynamically determines when data sieving is beneficial, and (2) when it is beneficial, it dynamically determines how to perform it. Experimental results verify that PMD data sieving improves on the performance of the existing approach and reduces its memory consumption. Given the importance of supporting noncontiguous accesses effectively and of reducing memory pressure in large-scale systems, the proposed PMD data sieving approach holds promise for high performance I/O systems.
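To make the idea concrete, the following is a minimal sketch of data sieving guarded by a simple cost model. It is not the PMD model from the chapter, which the abstract does not specify. The linear cost formula (a fixed per-request latency plus bytes divided by bandwidth), the default parameter values, and all function names are assumptions chosen for illustration only.

```python
import io
from typing import BinaryIO, List, Tuple

Request = Tuple[int, int]  # (offset, length) in bytes


def read_direct(f: BinaryIO, requests: List[Request]) -> List[bytes]:
    """Issue one seek + read per noncontiguous request."""
    out = []
    for offset, length in requests:
        f.seek(offset)
        out.append(f.read(length))
    return out


def read_sieved(f: BinaryIO, requests: List[Request]) -> List[bytes]:
    """Read one contiguous extent covering all requests, then pick out the pieces."""
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    f.seek(start)
    buf = f.read(end - start)  # also reads the unwanted "holes" between requests
    return [buf[off - start: off - start + length] for off, length in requests]


def estimated_cost(num_ops: int, num_bytes: int,
                   latency_s: float, bandwidth_bps: float) -> float:
    """Assumed linear model: per-operation latency plus transfer time."""
    return num_ops * latency_s + num_bytes / bandwidth_bps


def should_sieve(requests: List[Request],
                 latency_s: float = 1e-3,
                 bandwidth_bps: float = 100e6) -> bool:
    """Sieve only when one large read is predicted to beat many small ones."""
    wanted = sum(length for _, length in requests)
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    direct = estimated_cost(len(requests), wanted, latency_s, bandwidth_bps)
    sieved = estimated_cost(1, end - start, latency_s, bandwidth_bps)
    return sieved < direct


def read_noncontiguous(f: BinaryIO, requests: List[Request]) -> List[bytes]:
    """Choose between sieved and direct reads using the assumed cost model."""
    if not requests:
        return []
    if should_sieve(requests):
        return read_sieved(f, requests)
    return read_direct(f, requests)


if __name__ == "__main__":
    data = bytes(range(256)) * 16
    dense = [(0, 4), (100, 4), (200, 4)]
    print(should_sieve(dense))
    print(read_noncontiguous(io.BytesIO(data), dense))
```

Under this assumed model, closely spaced requests favor sieving because the saved per-request latency outweighs the cost of reading the holes, while widely separated requests favor direct reads. A sieved read also needs a buffer as large as the whole extent, which is the memory cost the abstract refers to.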
© 2012 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Lu, Y., Chen, Y., Amritkar, P., Thakur, R., Zhuang, Y. (2012). A New Data Sieving Approach for High Performance I/O. In: Park, J.J. (Jong Hyuk), Leung, V., Wang, CL., Shon, T. (eds) Future Information Technology, Application, and Service. Lecture Notes in Electrical Engineering, vol 164. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-4516-2_12
DOI: https://doi.org/10.1007/978-94-007-4516-2_12
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-4515-5
Online ISBN: 978-94-007-4516-2
eBook Packages: Engineering, Engineering (R0)