Evaluating Cassandra Scalability with YCSB
NoSQL data stores emerged to fill a gap in the database market: highly scalable storage for simple storage and retrieval of key-indexed data, with easy distribution of the data over a possibly large number of servers. Cassandra has been identified as one of the most efficient and scalable of the existing NoSQL engines. For these engines, scalability means that adding nodes should either allow more requests to be served with the same performance or reduce the execution time of requests. However, we will see that adding nodes does not always result in a performance increase, and we investigate how the workload, the database size, and the level of concurrency relate to the scaling level achieved. We give an overview of the Cassandra data store engine and then evaluate experimentally how it behaves with respect to scaling and request-time speedup. We use the Yahoo! Cloud Serving Benchmark (YCSB) for these experiments.
Keywords: NoSQL, YCSB, Cassandra, Scalability