Abstract
Long-running service workloads (e.g. web search engines) and short-term data analysis workloads (e.g. Hadoop MapReduce jobs) are co-located in today’s data centers. Developing realistic benchmarks that reflect this practical scenario of mixed workloads is key to producing trustworthy results when evaluating and comparing data center systems. Doing so requires using actual workloads and guaranteeing that their submissions follow the patterns hidden in real-world traces. However, existing benchmarks either generate actual workloads based on probability models or replay real-world workload traces using basic I/O operations. To fill this gap, we propose a benchmark tool that is a first step towards generating a mix of actual service and data analysis workloads on the basis of real workload traces. Our tool includes a combiner that replays actual workloads according to the workload traces, and a multi-tenant generator that flexibly scales the workloads up and down according to users’ requirements. Building on these two components, our demo illustrates the workload customization and generation process through a visual interface. The proposed tool, called BigDataBench-MT, is a multi-tenant version of our comprehensive benchmark suite BigDataBench, and it is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion/.
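The idea of trace-driven replay with scaling can be illustrated with a minimal sketch. It is not the BigDataBench-MT implementation: the `TraceEvent` schema, the function names `scale_trace` and `build_schedule`, and the choice to scale by compressing or stretching inter-arrival times are all assumptions made here for illustration. The sketch only builds a time-ordered submission schedule from two traces; launching the actual workloads is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TraceEvent:
    """One submission recorded in a workload trace (hypothetical schema)."""
    timestamp: float  # seconds since the start of the trace
    kind: str         # "service" or "analysis"
    name: str         # label of the actual workload to launch


def scale_trace(events: List[TraceEvent], factor: float) -> List[TraceEvent]:
    """Scale the submission rate by `factor`, keeping the arrival pattern.

    Timestamps are divided by `factor`, so factor=2.0 submits the same
    events in half the time and factor=0.5 spreads them over twice the time.
    This is one possible reading of "scaling up and down" (an assumption).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [TraceEvent(e.timestamp / factor, e.kind, e.name) for e in events]


def build_schedule(service: List[TraceEvent],
                   analysis: List[TraceEvent]) -> List[TraceEvent]:
    """Merge a service trace and an analysis trace into one schedule by time."""
    return sorted(service + analysis, key=lambda e: e.timestamp)


# Toy traces: two search requests and one MapReduce job.
service = [TraceEvent(0.0, "service", "search"),
           TraceEvent(4.0, "service", "search")]
analysis = [TraceEvent(2.0, "analysis", "wordcount")]

# Double the service request rate, keep the analysis trace unchanged.
schedule = build_schedule(scale_trace(service, 2.0), analysis)
for event in schedule:
    print(f"{event.timestamp:6.1f}s  {event.kind:9s} {event.name}")
```

A replayer would walk this schedule and submit each named workload at its timestamp. Scaling the two traces independently, as above, is what lets the mix of service and analysis load be tuned separately.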
Acknowledgements
This work is supported by the National High Technology Research and Development Program of China (Grant No. 2015AA015308), the National Natural Science Foundation of China (Grant No. 61502451), and the Key Technology Research and Development Programs of Guangdong Province, China (Grant No. 2015B010108006).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Han, R. et al. (2016). BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. In: Zhan, J., Han, R., Zicari, R. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2015. Lecture Notes in Computer Science, vol 9495. Springer, Cham. https://doi.org/10.1007/978-3-319-29006-5_2
DOI: https://doi.org/10.1007/978-3-319-29006-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29005-8
Online ISBN: 978-3-319-29006-5
eBook Packages: Computer Science (R0)