MiDBench: Multimodel Industrial Big Data Benchmark

Cheng, Yijian; Cheng, Mengqian; Ge, Hao; Guo, Yuhe; Hao, Yuanzhe; Sun, Xiaoguang; Qin, Xiongpai; Lu, Wei; Chen, Yueguo; Du, Xiaoyong

doi:10.1007/978-3-030-32813-9_15

Conference paper
First Online: 08 October 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11459)

The original version of this chapter was revised: the url link was corrected in reference 3. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-32813-9_21

Abstract

Driven by decades of growth in industrial data, big data systems have evolved rapidly. The diversity and complexity of industrial applications make it challenging for companies to choose an appropriate big data system, so benchmarking big data systems has become a research hotspot. Most state-of-the-art benchmarks focus on a specific domain or data format.

This paper presents our work on a multimodel industrial big data benchmark called MiDBench. MiDBench targets big data systems in three scenarios: crane assembly, wind turbine monitoring, and simulation results management, which correspond to bill of materials (a.k.a. BoM) data, time series data, and unstructured data, respectively. Currently, we have chosen and developed eleven typical workloads across these three application domains for our benchmark suite, and we generate synthetic data by scaling the sample data. For the sake of fairness, we chose the widely accepted metrics of throughput and response time. Together, these form a credible set of benchmarks applicable to high-end manufacturing. Overall, experimental results show that Neo4j (representing graph databases) performs better than Oracle (representing relational databases) for processing BoM data, that IoTDB performs better than InfluxDB on time series data in query and stress tests, and that MongoDB performs better than Elasticsearch in the simulation results management domain.
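As a rough illustration of the two metrics named above, the following minimal sketch times a sequential workload and reports throughput (operations per second) and mean response time. It is not MiDBench's own harness: the function names `summarize` and `run_workload` and the placeholder operation are assumptions introduced only for this example.

```python
import time
from statistics import mean


def summarize(latencies_s, elapsed_s):
    """Return (throughput in ops/s, mean response time in s) for one run."""
    if not latencies_s or elapsed_s <= 0:
        raise ValueError("need at least one request and a positive elapsed time")
    return len(latencies_s) / elapsed_s, mean(latencies_s)


def run_workload(operation, num_requests):
    """Call operation(i) num_requests times sequentially and time each call."""
    latencies = []
    start = time.perf_counter()
    for i in range(num_requests):
        t0 = time.perf_counter()
        operation(i)  # hypothetical stand-in for a query against the system under test
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return summarize(latencies, elapsed)


if __name__ == "__main__":
    # Placeholder operation; a real run would issue a database request here.
    throughput, response_time = run_workload(lambda i: sum(range(1000)), 100)
    print(f"throughput: {throughput:.1f} ops/s, "
          f"mean response time: {response_time * 1e6:.1f} us")
```

A stress test would typically issue requests from many concurrent clients rather than one sequential loop; the single-threaded form is kept here only to make the two definitions easy to read.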

Change history

08 October 2019
In the version of this paper that was originally published, reference 3 linked to the wrong website. This has been corrected.

References

Elasticsearch. https://www.elastic.co/
InfluxDB. https://www.influxdata.com/
IoTDB. https://iotdb.apache.org/
MongoDB. https://www.mongodb.com/
MySQL. https://www.mysql.com
Neo4j. https://neo4j.com/
Oracle. https://www.oracle.com
Time series benchmark suite (TSBS). https://github.com/timescale/tsbs
TPC: TPC-A, June 1994. http://www.tpc.org/tpca/spec/tpca_current.pdf
TPC: TPC-C, February 2010. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf
TPC: TPC-DS, November 2015. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf
TPC: TPC-E, April 2015. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-e_v1.14.0.pdf
TPC: TPC-H, November 2014. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf
Anderson, T.L., Berre, A.J., Mallison, M., Porter, H.H., Schneider, B.: The HyperModel benchmark. In: Bancilhon, F., Thanos, C., Tsichritzis, D. (eds.) EDBT 1990. LNCS, vol. 416, pp. 317–331. Springer, Heidelberg (1990). https://doi.org/10.1007/BFb0022180
Arasu, A., et al.: Linear road: a stream data management benchmark. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, vol. 30, pp. 480–491. VLDB Endowment (2004). http://dl.acm.org/citation.cfm?id=1316689.1316732
Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: LinkBench: a database benchmark based on the Facebook social graph. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, pp. 1185–1196. ACM, New York (2013). https://doi.org/10.1145/2463676.2465296
Böhme, T., Rahm, E.: Multi-user evaluation of XML data management systems with XMach-1. In: Bressan, S., Lee, M.L., Chaudhri, A.B., Yu, J.X., Lacroix, Z. (eds.) Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web. LNCS, vol. 2590, pp. 148–159. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36556-7_12
Jin, C.-Q., Qian, W.-N., Zhou, M.-Q., Zhou, A.-Y.: Benchmarking data management systems: from traditional database to emergent big data. Chin. J. Comput. (2014). http://cjc.ict.ac.cn/online/bfpub/jcq-2014430143239.pdf
Ferdman, M., et al.: Clearing the clouds: a study of emerging scale-out workloads on modern hardware, pp. 37–48 (2012). https://www.industry-academia.org/download/ASPLOS12_Clearing_the_Clouds.pdf
Ghazal, A., et al.: BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, pp. 1197–1208. ACM, New York (2013). https://doi.org/10.1145/2463676.2463712
Huang, S., Huang, J., Dai, J., Xie, T., Huang, B.: The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 41–51, March 2010. https://doi.org/10.1109/ICDEW.2010.5452747
Jia, Z., Wang, L., Zhan, J., Zhang, L., Luo, C.: Characterizing data analysis workloads in data centers. In: 2013 IEEE International Symposium on Workload Characterization (IISWC), pp. 66–76, September 2013. https://doi.org/10.1109/IISWC.2013.6704671
Li, Y.G., et al.: XOO7: applying OO7 benchmark to xml query processing tool. In: Proceedings of the Tenth International Conference on Information and Knowledge Management, CIKM 2001, pp. 167–174. ACM, New York (2001). https://doi.org/10.1145/502585.502614
Ming, Z., et al.: BDGS: a scalable big data generator suite in big data benchmarking. In: Rabl, T., Jacobsen, H.-A., Raghunath, N., Poess, M., Bhandarkar, M., Baru, C. (eds.) WBDB 2013. LNCS, vol. 8585, pp. 138–154. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10596-3_11
Myllymaki, J., Kaufman, J.: DynaMark: a benchmark for dynamic spatial indexing. In: Chen, M.-S., Chrysanthis, P.K., Sloman, M., Zaslavsky, A. (eds.) MDM 2003. LNCS, vol. 2574, pp. 92–105. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36389-0_7
Nicola, M., Kogan, I., Schiefer, B.: An XML transaction processing benchmark. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD 2007, pp. 937–948. ACM, New York (2007). https://doi.org/10.1145/1247480.1247590
O’Neil, P.E.: The set query benchmark. In: The Benchmark Handbook (1991)
Schmidt, A., Waas, F., Kersten, M., Carey, M.J., Manolescu, I., Busse, R.: XMark: a benchmark for XML data management. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 974–985. VLDB Endowment (2002). http://dl.acm.org/citation.cfm?id=1287369.1287455
Wang, L., et al.: BigDataBench: a big data benchmark suite from internet services. CoRR abs/1401.1406 (2014). http://arxiv.org/abs/1401.1406
Yao, B.B., Özsu, M.T., Khandelwal, N.: XBench benchmark and performance testing of XML DBMSs. In: Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, pp. 621–632. IEEE Computer Society, Washington, DC (2004). http://dl.acm.org/citation.cfm?id=977401.978145
Zhu, Y., et al.: BigOP: generating comprehensive big data workloads as a benchmarking framework. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds.) DASFAA 2014. LNCS, vol. 8422, pp. 483–492. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05813-9_32

Acknowledgment

This work is partially supported by the Ministry of Science and Technology of China, National Key Research and Development Program (No. 2016YFB1000702), and the NSF China under grant No. 61432006. MiDBench is available at https://github.com/dbiir/MiDBench.

Author information

Authors and Affiliations

Key Laboratory of Data Engineering and Knowledge Engineering (MOE), Beijing, China
Yijian Cheng, Mengqian Cheng, Hao Ge, Yuhe Guo, Yuanzhe Hao, Xiaoguang Sun, Xiongpai Qin, Wei Lu, Yueguo Chen & Xiaoyong Du
School of Information, Renmin University of China, Beijing, China
Yijian Cheng, Mengqian Cheng, Hao Ge, Yuhe Guo, Yuanzhe Hao, Xiaoguang Sun, Xiongpai Qin, Wei Lu, Yueguo Chen & Xiaoyong Du

Corresponding author

Correspondence to Xiongpai Qin.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Chen Zheng
Chinese Academy of Sciences, Beijing, China
Jianfeng Zhan

About this paper

Cite this paper

Cheng, Y. et al. (2019). MiDBench: Multimodel Industrial Big Data Benchmark. In: Zheng, C., Zhan, J. (eds) Benchmarking, Measuring, and Optimizing. Bench 2018. Lecture Notes in Computer Science, vol 11459. Springer, Cham. https://doi.org/10.1007/978-3-030-32813-9_15

DOI: https://doi.org/10.1007/978-3-030-32813-9_15
Published: 08 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32812-2
Online ISBN: 978-3-030-32813-9
eBook Packages: Computer Science, Computer Science (R0)
