A self-tuning client-side metadata prefetching scheme for wide area network file systems

Abstract

Client-side metadata prefetching is commonly used in wide area network (WAN) file systems because it can effectively hide network latency. However, most existing prefetching approaches cannot meet the differing prefetching requirements of multiple workloads: they are usually optimized for one specific workload and have no effect, or even a harmful effect, on others. In this paper, we present a new self-tuning client-side metadata prefetching scheme that combines two prefetching strategies and dynamically adapts to workload changes. It uses a directory-directed strategy to prefetch related file metadata within the same directory, and a correlation-directed strategy to prefetch related file metadata accessed across directories. A novel self-tuning mechanism efficiently switches between directory-directed and correlation-directed prefetching. Experimental results using real system traces show that our self-tuning client-side prefetching significantly improves the hit ratio of the client-side cache. In the multi-workload concurrency scenario, our approach improves the hit ratio over no prefetching, directory-directed prefetching, the variant probability graph algorithm, the variant apriori algorithm, and the variant semantic distance algorithm by up to 15.22%, 6.32%, 10.08%, 11.65%, and 10.73%, corresponding to reductions of 25.24%, 18.11%, 23.53%, 24.94%, and 24.19% in the average access time, respectively.
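
The abstract does not spell out how the self-tuning mechanism decides between the two strategies, so the following is only a minimal illustrative sketch, not the authors' algorithm. It assumes a simple rule: both strategies make shadow predictions on every access, each is scored over a short sliding window by whether its last prediction contained the next accessed path, and the client follows whichever scored higher. The class name `SelfTuningPrefetcher`, the `window` parameter, and the scoring rule are all hypothetical.

```python
from collections import defaultdict, deque
import posixpath


class SelfTuningPrefetcher:
    """Toy model (hypothetical): follow whichever strategy predicted better recently."""

    def __init__(self, window=4):
        # Directory-directed state: directory -> metadata paths seen in it.
        self.by_dir = defaultdict(set)
        # Correlation-directed state: path -> {successor path: count}.
        self.successors = defaultdict(lambda: defaultdict(int))
        self.prev = None
        # Last prediction made by each strategy (kept even when not active).
        self.pending = {"directory": set(), "correlation": set()}
        # Sliding window of 0/1 outcomes per strategy (assumed scoring rule).
        self.score = {name: deque(maxlen=window) for name in self.pending}
        self.mode = "directory"

    def access(self, path):
        """Record one metadata access; return the set of paths to prefetch."""
        # 1. Score each strategy's previous prediction against this access.
        for name, predicted in self.pending.items():
            self.score[name].append(1 if path in predicted else 0)

        # 2. Update what both strategies have learned.
        self.by_dir[posixpath.dirname(path)].add(path)
        if self.prev is not None:
            self.successors[self.prev][path] += 1
        self.prev = path

        # 3. Self-tune: switch to the strategy with more recent hits.
        #    A tie keeps the current mode to avoid flapping.
        d = sum(self.score["directory"])
        c = sum(self.score["correlation"])
        if c > d:
            self.mode = "correlation"
        elif d > c:
            self.mode = "directory"

        # 4. Both strategies predict; only the active one's result is fetched.
        self.pending["directory"] = self.by_dir[posixpath.dirname(path)] - {path}
        self.pending["correlation"] = set(self.successors[path])
        return set(self.pending[self.mode])
```

In this sketch, a trace that repeatedly hops across directories (for example `/a/x`, `/b/y`, `/c/z`) gives the correlation-directed predictor the higher score, while repeated accesses inside one directory pull the mode back to directory-directed. A real client would also need to bound the prefetch set and account for cache capacity and network cost, which are left out here.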

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2018YFB0203901), the National Natural Science Foundation of China (Grant No. 61772053), the Fund of the State Key Laboratory of Software Development Environment (Grant No. SKLSDE-2018ZX-10), and the Science Challenge Project (Grant No. TZ2016002).

Author information

Corresponding author

Correspondence to Limin Xiao.

About this article

Cite this article

Wei, B., Xiao, L., Song, Y. et al. A self-tuning client-side metadata prefetching scheme for wide area network file systems. Sci. China Inf. Sci. 65, 132101 (2022). https://doi.org/10.1007/s11432-019-2833-1

  • DOI: https://doi.org/10.1007/s11432-019-2833-1

Keywords

  • wide area network file systems
  • multiple workloads
  • metadata prefetching
  • correlation-directed prefetching
  • directory-directed prefetching
  • self-tuning prefetching