A Survey on Benchmarks for Big Data and Some More Considerations

Qin, Xiongpai; Zhou, Xiaoyun

doi:10.1007/978-3-642-41278-3_75

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8206)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Customers, industry, and academia have recently shown an urgent need for a big data benchmark suite. This paper reviews a number of prominent works from the last several years, introducing their characteristics and analyzing their shortcomings. The authors also offer suggestions for building the expected benchmark: component-based and end-to-end benchmarks should be used together, so that distinct tools can be tested individually and the system can be tested as a whole; workloads should be enriched with complex analytics to cover different application scenarios; and metrics other than performance should also be considered.

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, 100872, China
Xiongpai Qin
Computer Science Department, Jiangsu Normal University, Xuzhou, Jiangsu, 221116, China
Xiaoyun Zhou

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin
University of Science and Technology of China, Hefei, China
Ke Tang
Nanjing University, Nanjing, China
Yang Gao
Ostfalia University of Applied Sciences, 38302, Wolfenbüttel, Germany
Frank Klawonn
Kyungpook National University, 702-701, Buk-Gu, Daegu, Korea
Minho Lee
Nature Inspired Computational and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, 230027, Hefei, China
Thomas Weise
University of Science and Technology of China, 230017, Hefei, China
Bin Li
CERCIA, School of Computer Science, University of Birmingham, B15 2TT, Edgbaston, Birmingham, UK
Xin Yao

About this paper

Cite this paper

Qin, X., Zhou, X. (2013). A Survey on Benchmarks for Big Data and Some More Considerations. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science, vol 8206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41278-3_75

DOI: https://doi.org/10.1007/978-3-642-41278-3_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41277-6
Online ISBN: 978-3-642-41278-3
eBook Packages: Computer Science, Computer Science (R0)
