Abstract
Customers, industry, and academia have recently shown an urgent need for a big data benchmark suite. This paper reviews a number of prominent benchmarking efforts from the last several years, introduces their characteristics, and analyzes their shortcomings. The authors also offer suggestions for building the expected benchmark: component-based benchmarks and end-to-end benchmarks should be used together, so that individual tools can be tested as well as the system as a whole; workloads should be enriched with complex analytics to cover different application scenarios; and metrics other than performance should also be considered.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qin, X., Zhou, X. (2013). A Survey on Benchmarks for Big Data and Some More Considerations. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science, vol 8206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41278-3_75
DOI: https://doi.org/10.1007/978-3-642-41278-3_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41277-6
Online ISBN: 978-3-642-41278-3
eBook Packages: Computer Science, Computer Science (R0)