HDCache: A Distributed Cache System for Real-Time Cloud Services

Journal of Grid Computing

Abstract

Providing a real-time cloud service requires retrieving a large amount of data simultaneously, so improving file-access performance is a major challenge. This paper first examines the preconditions for addressing this problem, considering the requirements of applications, hardware, software, and network environments in the cloud. It then proposes a novel distributed layered cache system named HDCache. HDCache is built on top of the Hadoop Distributed File System (HDFS). Applications integrate the HDCache client library to access multiple cache services. Each cache service is built from three access layers: an in-memory cache, a snapshot on the local disk, and a network disk provided by HDFS. Files loaded from HDFS are cached in shared memory that the client library can access directly. To improve robustness and alleviate workload, the cache services are organized in a peer-to-peer style using a distributed hash table, and every cached file has three replicas spread across different cache service nodes. Experimental results show that HDCache can store files across a wide range of sizes and delivers millisecond-level access performance in highly concurrent environments. The hit ratio measured on a real-world cloud service is higher than 95%.
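The layered read path and the replica placement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `HashRing` and `LayeredCache`, the use of MD5 for ring positions, the plain dictionary standing in for shared memory, and the `fetch_from_backing_store` callable standing in for an HDFS read are all assumptions made for illustration.

```python
import bisect
import hashlib
import os
import tempfile


def ring_position(key: str) -> int:
    """Map a string to a point on the hash ring (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy hash ring that picks distinct cache nodes for each file (hypothetical)."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Each node occupies exactly one point on the ring in this sketch.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    def nodes_for(self, path):
        # Walk clockwise from the file's position and take the next distinct nodes.
        start = bisect.bisect_right(self.points, ring_position(path))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


class LayeredCache:
    """Read path: memory first, then local disk snapshot, then the backing store."""

    def __init__(self, snapshot_dir, fetch_from_backing_store):
        self.memory = {}  # stand-in for the shared-memory layer
        self.snapshot_dir = snapshot_dir
        self.fetch = fetch_from_backing_store  # stand-in for an HDFS read

    def _snapshot_path(self, path):
        name = hashlib.md5(path.encode("utf-8")).hexdigest()
        return os.path.join(self.snapshot_dir, name)

    def get(self, path):
        """Return (data, layer), where layer names the layer that served the read."""
        if path in self.memory:
            return self.memory[path], "memory"
        snap = self._snapshot_path(path)
        if os.path.exists(snap):
            with open(snap, "rb") as f:
                data = f.read()
            self.memory[path] = data  # promote to the memory layer
            return data, "disk"
        data = self.fetch(path)
        with open(snap, "wb") as f:  # persist a local snapshot for later reads
            f.write(data)
        self.memory[path] = data
        return data, "backing_store"


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
    print(ring.nodes_for("/data/tile_42.png"))  # three distinct node names
    with tempfile.TemporaryDirectory() as d:
        cache = LayeredCache(d, lambda p: b"contents of " + p.encode("utf-8"))
        print(cache.get("/data/tile_42.png")[1])  # backing_store
        print(cache.get("/data/tile_42.png")[1])  # memory
```

In this sketch a miss in memory falls through to the disk snapshot and then to the backing store, and each slower layer repopulates the faster ones on the way back. The ring assigns every file to three distinct nodes, so a read can still be served when one of the nodes holding a replica is unavailable.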

Author information

Corresponding author

Correspondence to Jing Zhang.

About this article

Cite this article

Zhang, J., Li, Q. & Zhou, W. HDCache: A Distributed Cache System for Real-Time Cloud Services. J Grid Computing 14, 407–428 (2016). https://doi.org/10.1007/s10723-015-9360-9

  • DOI: https://doi.org/10.1007/s10723-015-9360-9