HDCache: A Distributed Cache System for Real-Time Cloud Services

Zhang, Jing; Li, Qianmu; Zhou, Wei

doi:10.1007/s10723-015-9360-9

Published: 06 February 2016

Volume 14, pages 407–428 (2016)

Journal of Grid Computing

Abstract

Providing a real-time cloud service requires retrieving large amounts of data simultaneously, so improving file-access performance becomes a major challenge. This paper first identifies the preconditions for addressing this problem, considering the requirements of applications, hardware, software, and network environments in the cloud. It then proposes HDCache, a novel distributed layered cache system built on top of the Hadoop Distributed File System (HDFS). Applications integrate the HDCache client library to access multiple cache services. The cache services are built from three access layers: an in-memory cache, a snapshot of the local disk, and a network disk provided by HDFS. Files loaded from HDFS are cached in shared memory that the client library can access directly. To improve robustness and alleviate workload, the cache services are organized in a peer-to-peer style using a distributed hash table, and every cached file has three replicas scattered across different cache service nodes. Experimental results show that HDCache can store files across a wide range of sizes and achieves millisecond-level access performance in highly concurrent environments. The hit ratio measured on a real-world cloud service is higher than 95%.
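The abstract combines two mechanisms: files are placed on cache nodes through a distributed hash table with three replicas each, and every node serves reads from memory first, then from a local-disk snapshot, and finally from HDFS. The following is a minimal, single-process sketch of those two ideas under stated assumptions, not the authors' implementation. The consistent-hash ring with virtual nodes, the 64-bit FNV-1a hash, the LRU policy for the memory tier, and the names `HashRing`, `LayeredCache`, `fnv1a_64`, and `hdfs_read` are all hypothetical; plain Python dicts stand in for the shared-memory segment and the local-disk snapshot, and `hdfs_read` is a caller-supplied stand-in for a real HDFS client.

```python
import bisect
from collections import OrderedDict
from typing import Callable, Dict, List, Tuple

# 64-bit FNV-1a parameters (an assumed hash choice, not confirmed by the paper).
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of `data`."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


class HashRing:
    """Hypothetical consistent-hash ring mapping a file path to distinct cache nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        # Each physical node is placed at `vnodes` ring points to even out the load.
        self._ring: List[Tuple[int, str]] = sorted(
            (fnv1a_64(f"{node}#{i}".encode()), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]
        self._num_nodes = len(set(nodes))

    def replicas_for(self, path: str, replicas: int = 3) -> List[str]:
        """Walk clockwise from the path's hash, collecting distinct physical nodes."""
        want = min(replicas, self._num_nodes)
        start = bisect.bisect(self._keys, fnv1a_64(path.encode()))
        chosen: List[str] = []
        for step in range(len(self._ring)):
            if len(chosen) == want:
                break
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in chosen:  # skip virtual nodes of an already chosen machine
                chosen.append(node)
        return chosen


class LayeredCache:
    """Hypothetical single cache node: memory, then local-disk snapshot, then HDFS."""

    def __init__(self, hdfs_read: Callable[[str], bytes], mem_capacity: int = 128) -> None:
        self._memory: "OrderedDict[str, bytes]" = OrderedDict()  # LRU order, oldest first
        self._disk: Dict[str, bytes] = {}  # stands in for the local-disk snapshot
        self._hdfs_read = hdfs_read  # stands in for an HDFS client read
        self._capacity = mem_capacity

    def _put_in_memory(self, path: str, data: bytes) -> None:
        self._memory[path] = data
        self._memory.move_to_end(path)
        if len(self._memory) > self._capacity:
            self._memory.popitem(last=False)  # evict least recently used; disk copy stays

    def get(self, path: str) -> Tuple[bytes, str]:
        """Return (data, layer), where layer names the tier that served the read."""
        if path in self._memory:
            self._memory.move_to_end(path)
            return self._memory[path], "memory"
        if path in self._disk:
            data = self._disk[path]
            self._put_in_memory(path, data)
            return data, "disk"
        data = self._hdfs_read(path)
        self._disk[path] = data  # persist to the local snapshot
        self._put_in_memory(path, data)
        return data, "hdfs"
```

For example, `HashRing(["cache-a", "cache-b", "cache-c", "cache-d"]).replicas_for("/videos/clip.mp4")` returns three distinct node names, and a client could try them in order until one answers. Walking the ring clockwise while skipping repeated physical nodes keeps the three replicas on different machines, which is the property the abstract relies on for robustness; repeated reads of the same path on one node are then served from the memory tier rather than from HDFS.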

Author information

Authors and Affiliations

Department of Software Engineering, School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei Street, Nanjing, 210094, People’s Republic of China
Jing Zhang & Qianmu Li
National Pilot School of Software, Yunnan University, Yunnan, China
Wei Zhou

Corresponding author

Correspondence to Jing Zhang.

About this article

Cite this article

Zhang, J., Li, Q. & Zhou, W. HDCache: A Distributed Cache System for Real-Time Cloud Services. J Grid Computing 14, 407–428 (2016). https://doi.org/10.1007/s10723-015-9360-9

Received: 25 April 2015
Accepted: 02 December 2015
Published: 06 February 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10723-015-9360-9
