Benchmarking Replication and Consistency Strategies in Cloud Serving Databases: HBase and Cassandra

  • Conference paper
Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8807)

Abstract

Databases that serve OLTP operations generated by cloud applications have been widely researched and deployed. Such cloud serving databases, including BigTable, HBase, Cassandra, Azure, and many others, are designed to handle a large number of concurrent requests issued on the cloud side. They can scale out elastically to thousands of commodity machines by using a shared-nothing distributed architecture. This implies a strong need for data replication to guarantee service availability and data access performance. Data replication can improve system availability by redirecting operations on failed data blocks to their replicas, and it can improve performance by rebalancing load across multiple replicas. However, according to the PACELC model, as soon as a distributed database replicates data, another tradeoff arises, between consistency and latency. This tradeoff motivates us to measure how latency changes as we adjust the replication factor and the consistency level. The replication factor determines how many replicas of a data block are maintained, and the consistency level specifies how read and write requests on replicas are handled. We run several benchmarks with YCSB to measure this, and we report results for two widely used systems: HBase and Cassandra.
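To make the interplay between replication factor and consistency level concrete, the following is a minimal sketch, not the paper's benchmark code. It assumes Cassandra-style level names (ONE, QUORUM, ALL), which the abstract does not spell out, and the function names are hypothetical. It computes how many replicas must acknowledge a request and checks the standard overlap condition R + W > N, under which a read is guaranteed to contact at least one replica that holds the latest acknowledged write.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Replicas that must respond before a request is acknowledged.

    Assumes Cassandra-style level names; other systems name these differently.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    key = level.upper()
    if key not in levels:
        raise ValueError(f"unknown consistency level: {level}")
    return levels[key]


def read_overlaps_write(read_level: str, write_level: str,
                        replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for rf in (1, 3, 5):
        for level in ("ONE", "QUORUM", "ALL"):
            print(f"RF={rf} {level}: wait for {required_acks(level, rf)} replica(s)")
```

Raising either the replication factor or the consistency level increases the number of replicas a request waits on, which is the latency side of the tradeoff the benchmarks measure.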

Notes

  1. HBase also supports replicating data across datacenters, for disaster recovery purposes only.

Author information

Corresponding author

Correspondence to Jianhui Li.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, H., Li, J., Zhang, H., Zhou, Y. (2014). Benchmarking Replication and Consistency Strategies in Cloud Serving Databases: HBase and Cassandra. In: Zhan, J., Han, R., Weng, C. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2014. Lecture Notes in Computer Science, vol 8807. Springer, Cham. https://doi.org/10.1007/978-3-319-13021-7_6

  • DOI: https://doi.org/10.1007/978-3-319-13021-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13020-0

  • Online ISBN: 978-3-319-13021-7

  • eBook Packages: Computer Science, Computer Science (R0)
