
A Survey on Benchmarks for Big Data and Some More Considerations

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2013 (IDEAL 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8206)

Abstract

Customers, industry, and academia have recently expressed an urgent need for a big data benchmark suite. This paper reviews a number of prominent benchmarking works from the last several years, introduces their characteristics, and analyzes their shortcomings. The authors also offer suggestions for building the desired benchmark: component-based benchmarks and end-to-end benchmarks should be used together, the former to test distinct tools and the latter to test the system as a whole; workloads should be enriched with complex analytics to cover different application scenarios; and metrics beyond performance should also be considered.




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qin, X., Zhou, X. (2013). A Survey on Benchmarks for Big Data and Some More Considerations. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science, vol 8206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41278-3_75


  • DOI: https://doi.org/10.1007/978-3-642-41278-3_75

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41277-6

  • Online ISBN: 978-3-642-41278-3

  • eBook Packages: Computer Science, Computer Science (R0)
