APS: adaptable prefetching scheme to different running environments for concurrent read streams in distributed file systems

The Journal of Supercomputing

Abstract

Distributed file systems (DFSs) are widely used in many areas. A key issue for their applications is high performance for concurrent read streams (i.e., multiple series of sequential reads issued by concurrent processes). Although local file systems (LFSs) have been studied extensively, little research has addressed concurrent read streams in DFSs under different running environments (i.e., different types of storage devices and various network delays). Furthermore, most existing DFSs perform far worse than an LFS (e.g., EXT4). To achieve high performance for concurrent read streams, this study introduces a populating effect, which keeps sending subsequent reads to a storage server, and then proposes an adaptable prefetching scheme (APS) that obtains this effect even in different running environments. APS thereby resolves all the problems that we identified as dramatically degrading performance in existing DFSs. Across three types of storage devices and various network delays, the evaluation results show that our prefetching scheme (1) achieves almost the same performance as an LFS on an individual server and (2) minimizes the performance degradation of random reads.
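To give a rough intuition for why keeping subsequent reads in flight matters when network delay varies, the following sketch uses a simple pipelining model. It is not the APS algorithm from the paper; the function names, the parameters `service_time_ms` and `network_rtt_ms`, and the closed-loop model itself are assumptions made purely for illustration.

```python
import math


def min_outstanding_reads(service_time_ms: float, network_rtt_ms: float) -> int:
    """Smallest number of in-flight reads that keeps the server busy.

    Assumed model (not from the paper): each read costs one network
    round trip plus one device service time, and the server handles
    one read at a time.
    """
    if service_time_ms <= 0:
        raise ValueError("service_time_ms must be positive")
    if network_rtt_ms < 0:
        raise ValueError("network_rtt_ms must be non-negative")
    cycle_ms = service_time_ms + network_rtt_ms
    return math.ceil(cycle_ms / service_time_ms)


def server_utilization(outstanding: int, service_time_ms: float,
                       network_rtt_ms: float) -> float:
    """Fraction of time the storage device is busy under the same model."""
    if outstanding < 1:
        raise ValueError("outstanding must be at least 1")
    cycle_ms = service_time_ms + network_rtt_ms
    return min(1.0, outstanding * service_time_ms / cycle_ms)


if __name__ == "__main__":
    # Hypothetical numbers: 1 ms per read on the device, 3 ms round trip.
    depth = min_outstanding_reads(1.0, 3.0)
    print("reads to keep in flight:", depth)
    print("utilization with 1 read in flight:", server_utilization(1, 1.0, 3.0))
    print("utilization at that depth:", server_utilization(depth, 1.0, 3.0))
```

Under this model the required depth grows with network delay and shrinks with slower devices, which is one way to see why a fixed prefetch setting may not suit every running environment.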




Acknowledgements

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0126-15-1082, Management of Developing ICBMS (IoT, Cloud, Bigdata, Mobile, Security) Core Technologies and Development of Exascale Cloud Storage Technology).

Author information

Correspondence to Sangmin Lee.

Cite this article

Lee, S., Hyun, S.J., Kim, HY. et al. APS: adaptable prefetching scheme to different running environments for concurrent read streams in distributed file systems. J Supercomput 74, 2870–2902 (2018). https://doi.org/10.1007/s11227-018-2333-6

  • DOI: https://doi.org/10.1007/s11227-018-2333-6
