Journal of Signal Processing Systems, Volume 90, Issue 4, pp 619–640

Informed Prefetching for Distributed Multi-Level Storage Systems

  • Maen M. Al Assaf
  • Xunfei Jiang
  • Xiao Qin
  • Mohamed Riduan Abid
  • Meikang Qiu
  • Jifu Zhang

Abstract

In this paper, we present an informed prefetching technique called IPODS that makes use of application-disclosed access patterns to prefetch hinted blocks in distributed multi-level storage systems. We develop a prefetching pipeline in IPODS, where an informed prefetching process is divided into a set of independent prefetching steps that are distributed across multiple storage levels in a distributed system. In the IPODS system, while data blocks are prefetched from hard disks to memory buffers in remote storage servers, data blocks already buffered in the servers are prefetched through the network to the clients’ local caches. We show that these two prefetching steps can be handled in a pipelined manner to improve the I/O performance of distributed storage systems. Our IPODS technique differs from existing prefetching schemes in two ways. First, it reduces applications’ I/O stalls by keeping hinted data in clients’ local caches and storage servers’ fast buffers (e.g., solid state disks). Second, in a prefetching pipeline, multiple informed prefetching mechanisms coordinate semi-dependently to fetch blocks (1) from low-level (slow) to high-level (fast) storage devices in servers and (2) from high-level devices in servers to the clients’ local caches. The prefetching pipeline in IPODS judiciously hides network latency in distributed storage systems, thereby reducing the overall I/O access time. Using a wide range of real-world I/O traces, our experiments show that IPODS can noticeably improve the I/O performance of distributed storage systems, by 6%.
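
The two-step pipeline described in the abstract can be illustrated with a small timing model. The sketch below is not the authors' implementation. It assumes that each hinted block needs one disk-to-server-buffer step and one server-to-client network step, that each step handles one block at a time, and that buffers and caches never fill up. The function names and the per-block times are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def sequential_time(disk_ms: Sequence[float], net_ms: Sequence[float]) -> float:
    """Total time when each block finishes both steps before the next block starts."""
    return sum(d + n for d, n in zip(disk_ms, net_ms))


def pipelined_time(disk_ms: Sequence[float], net_ms: Sequence[float]) -> float:
    """Total time when the disk step of the next block overlaps the network step
    of the current block (a two-stage pipeline)."""
    disk_done = 0.0  # when the server-side step has buffered the current block
    net_done = 0.0   # when the current block has reached the client cache
    for d, n in zip(disk_ms, net_ms):
        disk_done += d
        # A block can cross the network only after it is buffered on the server
        # and after the previous block has finished its transfer.
        net_done = max(disk_done, net_done) + n
    return net_done


if __name__ == "__main__":
    # Hypothetical per-block costs in milliseconds (assumed, not from the paper).
    disk = [10.0] * 4
    net = [4.0] * 4
    print("sequential:", sequential_time(disk, net))
    print("pipelined: ", pipelined_time(disk, net))
```

In this simplified model, with constant per-block costs, the pipelined time equals one full disk-plus-network pass followed by one slower-stage time for each remaining block, so the faster stage is hidden behind the slower one. The model ignores hint accuracy, cache capacity, and contention, all of which a real system has to handle.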

Keywords

  • Informed prefetching
  • Pipelining
  • Parallel storage systems
  • Distributed multi-level storage system

Notes

Acknowledgments

Xiao Qin’s work is supported by the U.S. National Science Foundation under Grants IIS-1618669, CCF-0845257 (CAREER), CNS-0917137, CNS-0757778, CCF-0742187, CNS-0831502, CNS-0855251, and OCI-0753305. Jifu Zhang’s study is supported by the National Natural Science Foundation of P.R. China under grant No.61572343.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Maen M. Al Assaf (1)
  • Xunfei Jiang (2)
  • Xiao Qin (3)
  • Mohamed Riduan Abid (4)
  • Meikang Qiu (5)
  • Jifu Zhang (6)

  1. King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
  2. Computer Science Department, Earlham College, Richmond, USA
  3. Department of Computer Science and Software Engineering, Auburn University, Auburn, USA
  4. Department of Computer Science, Al Akhawayn University, Ifrane, Morocco
  5. Department of Computer Science, Pace University, New York, USA
  6. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, China