A New Data Sieving Approach for High Performance I/O

Lu, Yin; Chen, Yong; Amritkar, Prathamesh; Thakur, Rajeev; Zhuang, Yu

doi:10.1007/978-94-007-4516-2_12


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 164)


Abstract

Many scientific computing applications and engineering simulations exhibit noncontiguous I/O access patterns. Data sieving is an important technique for improving the performance of noncontiguous I/O: it combines many small, noncontiguous requests into one large, contiguous request. It has proven effective even though it may access more data than was requested. In this study, we propose a new data sieving approach called Performance Model Directed (PMD) data sieving. It improves on the existing data sieving approach in two respects: it dynamically determines (1) when performing data sieving is beneficial and (2) how to perform data sieving when it is. Experimental results verify that PMD data sieving improves performance over the existing data sieving approach and reduces memory consumption. Given the importance of supporting noncontiguous accesses efficiently and of reducing memory pressure in large-scale systems, the proposed PMD data sieving approach holds promise for high performance I/O systems.
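The abstract does not give the PMD performance model, so the following is not the authors' method. It is a minimal sketch, assuming a hypothetical linear cost model (a fixed startup latency per I/O call plus transfer time of size divided by bandwidth), of the two decisions the abstract names: whether sieving pays off, and how to group requests into contiguous sieving windows. `CostModel`, `plan_sieving`, `sieved_read`, and the parameters `latency_s`, `bandwidth_bps`, and `max_window_bytes` are illustrative names, not part of ROMIO or any other library.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, List, Optional, Tuple

Request = Tuple[int, int]                # (offset, length) in bytes
Window = Tuple[int, int, List[Request]]  # (start, end, member requests)


@dataclass
class CostModel:
    """Hypothetical linear cost model: every I/O call pays a fixed startup
    latency plus a transfer time proportional to the bytes it moves."""
    latency_s: float      # assumed per-call startup cost, seconds
    bandwidth_bps: float  # assumed sustained bandwidth, bytes/second

    def cost(self, nbytes: int) -> float:
        return self.latency_s + nbytes / self.bandwidth_bps


def plan_sieving(reqs: List[Request], model: CostModel,
                 max_window_bytes: Optional[int] = None) -> List[Window]:
    """Greedily merge sorted requests into contiguous sieving windows.

    A request joins the current window only if reading the unwanted hole
    in front of it is cheaper than issuing a separate call
    (gap / bandwidth < latency) and the window stays within the optional
    memory cap. Widely spaced requests end up in windows of their own,
    i.e. they are accessed directly without sieving.
    """
    windows: List[Window] = []
    for off, length in sorted(reqs):
        if windows:
            start, end, members = windows[-1]
            gap = off - end
            new_end = max(end, off + length)
            fits = max_window_bytes is None or new_end - start <= max_window_bytes
            if gap / model.bandwidth_bps < model.latency_s and fits:
                windows[-1] = (start, new_end, members + [(off, length)])
                continue
        windows.append((off, off + length, [(off, length)]))
    return windows


def plan_cost(windows: List[Window], model: CostModel) -> float:
    """Estimated time with one call per window, holes included."""
    return sum(model.cost(end - start) for start, end, _ in windows)


def direct_cost(reqs: List[Request], model: CostModel) -> float:
    """Estimated time with one call per requested piece, no extra bytes."""
    return sum(model.cost(length) for _, length in reqs)


def sieved_read(f: BinaryIO, reqs: List[Request], model: CostModel,
                max_window_bytes: Optional[int] = None) -> Dict[Request, bytes]:
    """Read each window with one seek+read, then copy out the wanted pieces."""
    out: Dict[Request, bytes] = {}
    for start, end, members in plan_sieving(reqs, model, max_window_bytes):
        f.seek(start)
        buf = f.read(end - start)  # single contiguous read per window
        for off, length in members:
            out[(off, length)] = buf[off - start:off - start + length]
    return out


if __name__ == "__main__":
    model = CostModel(latency_s=1e-3, bandwidth_bps=1e6)
    reqs = [(100, 10), (0, 10), (5000, 10)]
    print([(s, e) for s, e, _ in plan_sieving(reqs, model)])
    # → [(0, 110), (5000, 5010)]
    f = io.BytesIO(bytes(range(256)) * 40)
    print(sieved_read(f, reqs, model)[(100, 10)] == bytes(range(100, 110)))  # True
```

With these assumed parameters the break-even hole is 1000 bytes (latency times bandwidth), so the 90-byte hole between the first two requests is read and discarded, while the 4890-byte hole before the third is skipped. The optional `max_window_bytes` cap illustrates how bounding the sieving buffer limits memory use; how PMD actually sizes its buffers is not stated in the abstract.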



Author information

Authors and Affiliations

Computer Science Department, Texas Tech University, Lubbock, TX, USA
Yin Lu, Yong Chen, Prathamesh Amritkar & Yu Zhuang
Mathematics and Computer Science Division, Argonne National Lab, Argonne, IL, USA
Rajeev Thakur


Corresponding author

Correspondence to Yong Chen.

Editor information

Editors and Affiliations

SeoulTech, Computer Science and Engineering, Seoul University of Science & Technology, Gongreung 2-dong 172, Seoul, 139-742, Korea, Republic of (South Korea)
James J. (Jong Hyuk) Park
Electrical and Computer Engineering, The University of British Columbia, Room 4013, Kaiser Building, Main Mall 2332, Vancouver, V6T 1Z4, British Columbia, Canada
Victor C.M. Leung
The University of Hong Kong, Hong Kong, 1, China, People's Republic
Cho-Li Wang
Division of Information and Computer Eng, Ajou University, San 5, Suwon, Gyeonggido, 443-749, Korea, Republic of (South Korea)
Taeshik Shon

About this paper

Cite this paper

Lu, Y., Chen, Y., Amritkar, P., Thakur, R., Zhuang, Y. (2012). A New Data Sieving Approach for High Performance I/O. In: Park, J.J. (Jong Hyuk), Leung, V., Wang, C.-L., Shon, T. (eds) Future Information Technology, Application, and Service. Lecture Notes in Electrical Engineering, vol 164. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-4516-2_12


DOI: https://doi.org/10.1007/978-94-007-4516-2_12
Published: 05 June 2012
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-4515-5
Online ISBN: 978-94-007-4516-2
eBook Packages: Engineering, Engineering (R0)
