Abstract
With the rapid growth of social networks, advances in Internet technology, and the advent of cloud computing, new tools and algorithms are needed to handle the challenges of big data. One key advance in addressing these challenges is the introduction of scalable storage systems. NoSQL databases are regarded as efficient big-data storage management systems that provide horizontal scalability. To ensure scalability, data partitioning strategies must be implemented in these databases. In this paper, an Adaptive Rendezvous Hashing Partitioning Module (ARHPM) is proposed for Cassandra NoSQL databases. The main goal of this module is to partition data in Cassandra using rendezvous hashing; a Load Balancing based Rendezvous Hashing (LBRH) algorithm is proposed to guarantee load balancing during partitioning. To evaluate the proposed module, Cassandra is modified by embedding the ARHPM partitioning module in it, and a number of experiments are conducted with the Yahoo Cloud Serving Benchmark to validate the module's load balancing.
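For readers unfamiliar with rendezvous (highest random weight) hashing, the following is a minimal sketch of the generic technique, not the LBRH algorithm or the ARHPM module proposed in the paper. Each node is scored against a row key with a hash, and the key is placed on the node with the highest score. The function names (`score`, `pick_node`, `partition`), the node labels, and the choice of SHA-256 are assumptions made for illustration only; Cassandra's actual partitioner interface is not shown.

```python
import hashlib


def score(node: str, row_key: str) -> int:
    """Hypothetical scoring function: hash the (node, key) pair to a 64-bit integer."""
    digest = hashlib.sha256(f"{node}|{row_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(nodes, row_key):
    """Return the node with the highest score for row_key (ties broken by node name)."""
    if not nodes:
        raise ValueError("at least one node is required")
    return max(nodes, key=lambda node: (score(node, row_key), node))


def partition(nodes, row_keys):
    """Group row keys by the node that rendezvous hashing assigns them to."""
    placement = {node: [] for node in nodes}
    for row_key in row_keys:
        placement[pick_node(nodes, row_key)].append(row_key)
    return placement


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]  # illustrative node labels
    keys = [f"user{i}" for i in range(1000)]
    for node, assigned in partition(nodes, keys).items():
        print(node, len(assigned))
```

A useful property of this scheme is that removing a node only relocates the keys that were stored on that node; every other key keeps its placement, because its highest-scoring node is unchanged. Plain rendezvous hashing does not by itself account for differing node capacity or load, which is the gap a load-balancing variant is meant to address.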
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Elghamrawy, S.M. (2017). An Adaptive Load-Balanced Partitioning Module in Cassandra Using Rendezvous Hashing. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_56
DOI: https://doi.org/10.1007/978-3-319-48308-5_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48307-8
Online ISBN: 978-3-319-48308-5
eBook Packages: Engineering, Engineering (R0)