Pre-execution data prefetching with I/O scheduling

Zhao, Yue; Yoshigoe, Kenji; Xie, Mengjun

doi:10.1007/s11227-013-1060-2

Published: 25 December 2013

Volume 68, pages 733–752, (2014)

The Journal of Supercomputing

Abstract

Parallel applications suffer from I/O latency. Pre-execution I/O prefetching hides this latency effectively: a dedicated pre-execution prefetching thread is created to fetch data for the main thread in advance. However, existing pre-execution prefetching approaches pay no attention to the relationship between the main thread and the prefetching thread. They simply pre-execute I/O accesses on the prefetching thread as soon as possible, without coordinating them with the operations of the main thread. This drawback has several adverse effects: it reduces the degree of parallelism between computation and I/O, delays the I/O accesses of the main threads, and aggravates competition for I/O resources across the whole system. In this paper, we propose a new method that overcomes this drawback by scheduling I/O operations among the main threads and the pre-execution prefetching threads. Extensive experiments on four popular parallel I/O benchmarks demonstrate the benefits of the proposed approach.
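
The general idea can be illustrated with a minimal sketch. This is not the authors' implementation, which this page does not show; it is one simple way to coordinate a prefetching thread with a main thread, assuming a "demand reads go first" policy. All names (`IOScheduler`, `demand_read`, `prefetch_read`, `run_with_prefetch`) are hypothetical, and the caller supplies the read and compute functions.

```python
import threading


class IOScheduler:
    """Hypothetical scheduler: prefetch reads yield to main-thread reads."""

    def __init__(self):
        self._cond = threading.Condition()
        self._demand_pending = 0  # main-thread reads currently in flight

    def demand_read(self, read_fn, block):
        # Announce the demand read so new prefetch reads hold back.
        with self._cond:
            self._demand_pending += 1
        try:
            return read_fn(block)
        finally:
            with self._cond:
                self._demand_pending -= 1
                self._cond.notify_all()

    def prefetch_read(self, read_fn, block):
        # Assumed policy: start a prefetch only when no demand read is active.
        with self._cond:
            self._cond.wait_for(lambda: self._demand_pending == 0)
        return read_fn(block)


def run_with_prefetch(blocks, read_fn, compute_fn):
    """Process blocks in order while a helper thread reads ahead."""
    blocks = list(blocks)
    sched = IOScheduler()
    cache = {}
    cache_lock = threading.Lock()

    def prefetcher():
        # Walk the same access sequence as the main thread, I/O only.
        for b in blocks:
            with cache_lock:
                if b in cache:
                    continue
            data = sched.prefetch_read(read_fn, b)
            with cache_lock:
                cache.setdefault(b, data)

    helper = threading.Thread(target=prefetcher, daemon=True)
    helper.start()

    results, hits = [], 0
    for b in blocks:
        with cache_lock:
            cached = b in cache
            data = cache.get(b)
        if cached:
            hits += 1  # the prefetcher got there first
        else:
            data = sched.demand_read(read_fn, b)
            with cache_lock:
                cache.setdefault(b, data)
        results.append(compute_fn(data))

    helper.join()
    return results, hits
```

A call such as `run_with_prefetch(range(8), read_block, process)` returns the computed results in order, plus the number of blocks served from the prefetch cache. The hit count depends on thread timing, so it varies from run to run.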

Acknowledgments

This work was supported in part by the National Science Foundation under Grant CRI CNS-0855248, Grant EPS-0701890, Grant EPS-0918970, and Grant MRI CNS-0619069.

Author information

Authors and Affiliations

Department of Computer Science, University of Arkansas at Little Rock, 2801 S. University Avenue, Little Rock, AR, 72204, USA
Yue Zhao, Kenji Yoshigoe & Mengjun Xie

Corresponding author

Correspondence to Yue Zhao.

About this article

Cite this article

Zhao, Y., Yoshigoe, K. & Xie, M. Pre-execution data prefetching with I/O scheduling. J Supercomput 68, 733–752 (2014). https://doi.org/10.1007/s11227-013-1060-2

Issue Date: May 2014
DOI: https://doi.org/10.1007/s11227-013-1060-2
