BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads

  • Conference paper
Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9495)

Abstract

Long-running service workloads (e.g., web search engines) and short-term data analysis workloads (e.g., Hadoop MapReduce jobs) are co-located in today’s data centers. Developing realistic benchmarks that reflect this practical scenario of mixed workloads is key to producing trustworthy results when evaluating and comparing data center systems. This requires using actual workloads and guaranteeing that their submissions follow the patterns hidden in real-world traces. However, existing benchmarks either generate actual workloads based on probability models or replay real-world workload traces using only basic I/O operations. To fill this gap, we propose a benchmark tool that is a first step towards generating a mix of actual service and data analysis workloads on the basis of real workload traces. Our tool includes a combiner that enables the replaying of actual workloads according to the workload traces, and a multi-tenant generator that flexibly scales the workloads up and down according to users’ requirements. Based on this, our demo illustrates the workload customization and generation process using a visual interface. The proposed tool, called BigDataBench-MT, is a multi-tenant version of our comprehensive benchmark suite BigDataBench, and it is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion/.
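The two ideas named in the abstract, replaying submissions in trace order and scaling the workload up or down while keeping the trace's temporal pattern, can be illustrated with a minimal sketch. This is not the BigDataBench-MT implementation: the trace format (a list of arrival offsets in seconds paired with a workload name), the function names scale_trace and replay, the per-window resampling strategy, and the 60-second window are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical trace format: (arrival offset in seconds, workload name).
# None of these names come from BigDataBench-MT itself.

def scale_trace(trace, factor, window=60, seed=0):
    """Scale the number of submissions in each time window by `factor`.

    Events are resampled (with replacement) inside their own window, so the
    relative load across windows, i.e. the arrival pattern, is preserved.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for offset, name in trace:
        buckets[int(offset // window)].append((offset, name))

    scaled = []
    for w in sorted(buckets):
        events = buckets[w]
        target = round(len(events) * factor)
        scaled.extend(rng.choices(events, k=target))
    return sorted(scaled)

def replay(trace, submit):
    """Call `submit` for every event in arrival order.

    A real runner would wait until each offset before submitting; this
    sketch only preserves the ordering so it can run instantly.
    """
    for offset, name in sorted(trace):
        submit(offset, name)

if __name__ == "__main__":
    trace = [(0, "search"), (10, "sort"), (70, "search")]
    doubled = scale_trace(trace, factor=2)
    replay(doubled, lambda offset, name: print(f"t={offset:>3}s submit {name}"))
```

Resampling within each window is only one possible way to scale; it keeps the shape of the load over time but does not model dependencies between jobs or tenants.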

Acknowledgements

This work is supported by the National High Technology Research and Development Program of China (Grant No. 2015AA015308), the National Natural Science Foundation of China (Grant No. 61502451), and the Key Technology Research and Development Programs of Guangdong Province, China (Grant No. 2015B010108006).

Author information

Corresponding author

Correspondence to Rui Han.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Han, R. et al. (2016). BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. In: Zhan, J., Han, R., Zicari, R. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2015. Lecture Notes in Computer Science, vol 9495. Springer, Cham. https://doi.org/10.1007/978-3-319-29006-5_2

  • DOI: https://doi.org/10.1007/978-3-319-29006-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29005-8

  • Online ISBN: 978-3-319-29006-5

  • eBook Packages: Computer Science, Computer Science (R0)
