Abstract
Many contemporary applications must cope with sudden, unforeseen spikes in demand for specific data objects, so-called hotspot objects. In social networks, for example, individual media items can go viral quickly and unexpectedly, and provisioning for such behaviour in advance is therefore non-trivial.
NoSQL databases are specifically designed for scalability, high availability, and elasticity in the face of growing data volumes. Although existing benchmarking systems such as the Yahoo! Cloud Serving Benchmark (YCSB) support testing the performance properties of different databases under identical workloads, they lack support for testing how well these databases cope with such unexpected hotspot object behaviour.
To address this shortcoming, we present the design and implementation of a new YCSB workload rooted in a formal characterization of hotspot-based spikes. The proposed workload implements the Pitman-Yor distribution and is configurable through a number of parameters, such as spike probability and data locality. As such, it allows for more extensive experimental validation of database systems.
Our functional validation illustrates how the workload can be used to effectively stress-test different types of databases, and we present comparative results of benchmarking two popular NoSQL databases, Cassandra and MongoDB, in terms of their response to spiked workloads.
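To make the role of the Pitman-Yor distribution concrete, the following is a minimal sketch of a two-parameter Pitman-Yor key chooser, implemented as a Chinese restaurant process. It is illustrative only: the class name, fields, and method are our own and not taken from the paper's SpikesWorkload implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Illustrative two-parameter Pitman-Yor key chooser, modelled as a
 * Chinese restaurant process. NOT the paper's SpikesWorkload code.
 */
public class PitmanYorKeyChooser {
  private final double discount;      // d in [0, 1): larger d gives heavier tails
  private final double concentration; // theta > -d: controls number of distinct keys
  private final List<Integer> counts = new ArrayList<>(); // accesses per key so far
  private int total = 0;              // total accesses so far
  private final Random rng;

  public PitmanYorKeyChooser(double discount, double concentration, long seed) {
    this.discount = discount;
    this.concentration = concentration;
    this.rng = new Random(seed);
  }

  /** Returns the index of the next key to access; may create a new key. */
  public int nextKey() {
    int k = counts.size();
    double u = rng.nextDouble() * (total + concentration);
    double acc = 0.0;
    for (int i = 0; i < k; i++) {
      acc += counts.get(i) - discount; // P(existing key i) proportional to c_i - d
      if (u < acc) {
        counts.set(i, counts.get(i) + 1);
        total++;
        return i;
      }
    }
    counts.add(1); // remaining mass (theta + k*d) starts a new key
    total++;
    return k;
  }
}
```

Larger discount values produce heavier-tailed key popularity, i.e. a few keys absorb most of the accesses, which is precisely the hotspot pattern such a workload targets.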
Notes
1. In this paper, we mainly focus on YCSB; a more extensive discussion of other benchmark systems follows in Sect. 5.
2. In the YCSB configuration, the workload is selected by setting the workload parameter to site.ycsb.workloads.SpikesWorkload (see the configuration sketch after these notes).
3. Experiments with a multi-node setup are left for future work.
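For illustration, a hypothetical YCSB properties file for this workload might look as follows. Only the workload class name is taken from note 2 above; the spike-related property names are invented placeholders for the parameters mentioned in the abstract (spike probability and data locality), and the remaining keys are standard YCSB core properties.

```
# Standard YCSB core properties
workload=site.ycsb.workloads.SpikesWorkload
recordcount=1000000
operationcount=1000000
readproportion=0.95
updateproportion=0.05

# Hypothetical SpikesWorkload parameters (placeholder names)
spike.probability=0.05
spike.datalocality=0.8
```

Such a file could then be passed to a YCSB run in the usual way, e.g. bin/ycsb run cassandra-cql -P path/to/spikes.properties for the Cassandra binding.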
References
Arasu, A., et al.: Linear Road: A Stream Data Management Benchmark (2004). https://doi.org/10.1016/B978-012088469-8/50044-9
Armstrong, T., Ponnekanti, V., Borthakur, D., Callaghan, M.: LinkBench: a database benchmark based on the Facebook social graph, pp. 1185–1196 (2013)
Barahmand, S., Ghandeharizadeh, S.: BG: a benchmark to evaluate interactive social networking actions. Citeseer (2013)
Bodik, P., Fox, A., Franklin, M., Jordan, M., Patterson, D.: Characterizing, modeling, and generating workload spikes for stateful services, pp. 241–252 (2010). https://doi.org/10.1145/1807128.1807166
Chen, J., et al.: HotRing: a hotspot-aware in-memory key-value store. In: 18th USENIX Conference on File and Storage Technologies (FAST 20), Santa Clara, CA, pp. 239–252. USENIX Association, February 2020. https://www.usenix.org/conference/fast20/presentation/chen-jiqiang
Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 143–154 (2010)
Dayarathna, M., Suzumura, T.: XGDBench: a benchmarking platform for graph stores in exascale clouds, pp. 363–370 (2012)
Dayarathna, M., Suzumura, T.: Benchmarking Graph Data Management and Processing Systems: A Survey. arXiv preprint arXiv:2005.12873 (2020)
Dey, A., Fekete, A., Nambiar, R., Rohm, U.: YCSB+T: benchmarking web-scale transactional databases, pp. 223–230 (2014)
Difallah, D.E., Pavlo, A., Curino, C., Cudre-Mauroux, P.: OLTP-Bench: an extensible testbed for benchmarking relational databases. Proc. VLDB Endow. 7(4), 277–288 (2013)
Gao, W., et al.: BigDataBench: a scalable and unified big data and AI benchmark suite. arXiv preprint arXiv:1802.08254 (2018)
Ghazal, A., et al.: BigBench V2: the new and improved BigBench. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1225–1236. IEEE (2017)
Ghazal, A., et al.: BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1197–1208 (2013)
Gupta, P., Carey, M.J., Mehrotra, S., Yus, R.: SmartBench: a benchmark for data management in smart spaces. Proc. VLDB Endow. 13(12), 1807–1820 (2020)
Kumar, S.P., Lefebvre, S., Chiky, R., Soudan, E.G.: Evaluating consistency on the fly using YCSB. In: 2014 International Workshop on Computational Intelligence for Multimedia Understanding (IWCIM), pp. 1–6, November 2014. https://doi.org/10.1109/IWCIM.2014.7008801
Leutenegger, S.T., Dias, D.: A modeling study of the TPC-C benchmark. ACM SIGMOD Rec. 22(2), 22–31 (1993)
Lu, P., Yuan, L., Zhang, Y., Cao, H., Li, K.: AutoFlow: Hotspot-Aware, Dynamic Load Balancing for Distributed Stream Processing. arXiv preprint arXiv:2103.08888 (2021)
Nambiar, R.O., Poess, M.: The making of TPC-DS. In: VLDB, vol. 6, pp. 1049–1058 (2006)
Patil, S., et al.: YCSB++: benchmarking and performance debugging advanced features in scalable table stores, pp. 1–14 (2011)
PilHo, K.: Transaction Processing Performance Council (TPC). Installation guide (2014)
Pirzadeh, P., Carey, M.J., Westmann, T.: BigFUN: a performance study of big data management system functionality. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 507–514. IEEE (2015)
Pitman, J., Yor, M.: The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25(2), 855–900 (1997)
Poess, M., Smith, B., Kollar, L., Larson, P.: TPC-DS, taking decision support benchmarking to the next level (2002)
Sidhanta, S., Mukhopadhyay, S., Golab, W.: DYN-YCSB: benchmarking adaptive frameworks. In: 2019 IEEE World Congress on Services (SERVICES), vol. 2642–939X, pp. 392–393, July 2019. https://doi.org/10.1109/SERVICES.2019.00119
TPC: Transaction Processing Performance Council. http://www.tpc.org/. Accessed 14 Feb 2020
TPC-E: An On-Line Transaction Processing Benchmark. http://www.tpc.org/tpce/ (2020). Accessed 20 Feb 2021
Waudby, J., Steer, B.A., Karimov, K., Marton, J., Boncz, P., Szárnyas, G.: Towards testing ACID compliance in the LDBC social network benchmark. In: Nambiar, R., Poess, M. (eds.) TPCTC 2020. LNCS, vol. 12752, pp. 1–17. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-84924-5_1
Wu, Z., Butkiewicz, M., Perkins, D., Katz-Bassett, E., Madhyastha, H.V.: SPANStore: cost-effective geo-replicated storage spanning multiple cloud services. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 292–308 (2013)
Xia, F., Li, Y., Yu, C., Ma, H., Qian, W.: BSMA: a benchmark for analytical queries over social media data. Proc. VLDB Endow. 7(13), 1573–1576 (2014)
Acknowledgements
This research is partially funded by the Research Fund KU Leuven and the Cybersecurity Initiative Flanders (CIF) project.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Claesen, C., Rafique, A., Van Landuyt, D., Joosen, W. (2022). A YCSB Workload for Benchmarking Hotspot Object Behaviour in NoSQL Databases. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation and Benchmarking. TPCTC 2021. Lecture Notes in Computer Science, vol. 13169. Springer, Cham. https://doi.org/10.1007/978-3-030-94437-7_1
DOI: https://doi.org/10.1007/978-3-030-94437-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94436-0
Online ISBN: 978-3-030-94437-7
eBook Packages: Computer Science (R0)