BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads

Han, Rui; Zhan, Shulin; Shao, Chenrong; Wang, Junwei; John, Lizy K.; Xu, Jiangtao; Lu, Gang; Wang, Lei

doi:10.1007/978-3-319-29006-5_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9495)

Included in the following conference series: BPOE

Abstract

Long-running service workloads (e.g. web search engines) and short-term data analysis workloads (e.g. Hadoop MapReduce jobs) are co-located in today’s data centers. Developing realistic benchmarks that reflect this practical scenario of mixed workloads is key to producing trustworthy results when evaluating and comparing data center systems. This requires using actual workloads and guaranteeing that their submissions follow the patterns hidden in real-world traces. However, existing benchmarks either generate actual workloads based on probability models, or replay real-world workload traces using basic I/O operations. To fill this gap, we propose a benchmark tool that is a first step towards generating a mix of actual service and data analysis workloads on the basis of real workload traces. Our tool includes a combiner that enables the replaying of actual workloads according to the workload traces, and a multi-tenant generator that flexibly scales the workloads up and down according to users’ requirements. Based on this, our demo illustrates the workload customization and generation process using a visual interface. The proposed tool, called BigDataBench-MT, is a multi-tenant version of our comprehensive benchmark suite BigDataBench, and it is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion/.
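
As an illustration only, the idea of scaling a trace-driven submission pattern up or down can be sketched as follows. This is not the BigDataBench-MT implementation: the `TraceEvent` record, the `scale_trace` function, and the workload labels are hypothetical names introduced here, and the replication/subsampling rule is an assumed simplification of what a multi-tenant generator might do.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TraceEvent:
    """One submission in a hypothetical trace: arrival time and workload label."""
    time_s: float
    workload: str


def scale_trace(events: List[TraceEvent], factor: float) -> List[TraceEvent]:
    """Scale the submission intensity of a trace by `factor`.

    Assumed rule (not taken from the paper): each event earns `factor`
    credits; whenever the running credit reaches a whole number, that many
    copies of the event are emitted. A factor above 1 replicates
    submissions (scale up), a factor below 1 keeps an evenly spaced
    subset (scale down). Arrival times are left unchanged, so the shape
    of the original submission pattern is preserved.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    ordered = sorted(events, key=lambda e: e.time_s)
    scaled: List[TraceEvent] = []
    credit = 0.0
    for event in ordered:
        credit += factor
        copies = int(credit)  # whole submissions to emit for this event
        credit -= copies      # carry the fractional remainder forward
        scaled.extend([event] * copies)
    return scaled


if __name__ == "__main__":
    # Hypothetical four-event trace mixing a service and analysis jobs.
    trace = [
        TraceEvent(0.0, "search"),
        TraceEvent(1.0, "sort"),
        TraceEvent(2.0, "search"),
        TraceEvent(3.0, "wordcount"),
    ]
    for e in scale_trace(trace, 0.5):
        print(e.time_s, e.workload)
```

A real generator would then hand each scaled event to the corresponding actual workload (a search request or a Hadoop job, for example) at its arrival time; that replay step is omitted here.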

Acknowledgements

This work is supported by the National High Technology Research and Development Program of China (Grant No. 2015AA015308), the National Natural Science Foundation of China (Grant No. 61502451), and the Key Technology Research and Development Programs of Guangdong Province, China (Grant No. 2015B010108006).

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Rui Han, Jiangtao Xu, Gang Lu & Lei Wang
University of Chinese Academy of Sciences, Beijing, China
Gang Lu
ICarsclub, Beijing, China
Shulin Zhan
Xi’an Jiaotong University, Xi’an, China
Chenrong Shao
Kingsoft Cloud, Beijing, China
Junwei Wang
Department of Electrical and Computer Engineering, The University of Texas, Austin, TX, USA
Lizy K. John

Corresponding author

Correspondence to Rui Han.

Editor information

Editors and Affiliations

Institute of Computing, Chinese Academy of Sciences, Beijing, China
Jianfeng Zhan
ICT, Chinese Academy of Sciences, Beijing, China
Rui Han
FB12 - DBIS (5. Stock), Goethe Universität Frankfurt, Frankfurt, Hessen, Germany
Roberto V. Zicari

About this paper

Cite this paper

Han, R. et al. (2016). BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. In: Zhan, J., Han, R., Zicari, R. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2015. Lecture Notes in Computer Science, vol 9495. Springer, Cham. https://doi.org/10.1007/978-3-319-29006-5_2

DOI: https://doi.org/10.1007/978-3-319-29006-5_2
Published: 09 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29005-8
Online ISBN: 978-3-319-29006-5
eBook Packages: Computer Science, Computer Science (R0)
