Big Data Benchmark - Big DS

  • Conference paper
Advancing Big Data Benchmarks (WBDB 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8585)

Abstract

Performance and scalability in clusters of heterogeneous, complex Big Data analytics environments are often unpredictable. In this paper, we address this problem with a benchmark named “Big DS”. The benchmark adopts ideas from established industry benchmarks such as TPC-H [1], TPC-DS [1], SPECvirt_sc2010 [2] and SPECjbb2005 [2], as well as from non-standard benchmarks such as TeraSort [3] and SWIM [4]. By defining a configurable workload for different big data analytics environments, Big DS can be used to measure the performance and scalability of a big data analytics platform or environment for different businesses.

References

  1. TPC. TPC is a trademark of the Transaction Processing Performance Council. TPC-H and TPC-DS are the decision support benchmarks of the TPC organization. http://www.tpc.org

  2. SPEC. SPEC is a trademark of the Standard Performance Evaluation Corporation 1995–2014. SPECjbb2005 is the server-side Java benchmark of SPEC.org. SPECjbb2013 is the evaluation version of SPECjbb2005. SPECvirt_sc2010 is the server consolidation virtualization benchmark of SPEC.org. http://www.spec.org

  3. TeraSort. Refer to the Apache TeraSort benchmark, which is a MapReduce version of the Sort benchmark

  4. SWIM. SWIM stands for Statistical Workload Injector for MapReduce. The synthesis methodology is adopted in BigDS and its supporting toolset

  5. Apache Hadoop and its related projects. Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0

  6. Apache Hive. Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. It was initially developed by Facebook

  7. Cloudera Impala. Cloudera Impala is Cloudera’s open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html

  8. Google BigTable. Refer to Google’s BigTable paper. http://research.google.com/archive/bigtable-osdi06.pdf

  9. Huppler, K.: Chairman TPC. The author of “The art of building a good benchmark” (2009). http://www.tpc.org/tpctc/tpctc2009/tpctc2009-03.pdf

  10. WBDB, Workshop on Big Data Benchmarking, San Jose. http://clds.ucsd.edu/wbdb2012

  11. BigBench. Extends the TPC-DS specification to include unstructured and semi-structured data, and modifies TPC-DS. A data model for BigBench was proposed at the First WBDB Workshop by Ghazal (2012)

  12. Deep Analytic Pipeline. A Benchmark Proposal by Milind Bhandarkar (Pivotal Chief Scientist), (2013). http://clds.sdsc.edu/sites/clds.sdsc.edu/files/2013-03-07-DeepAnalyticsPipeline.pdf

  13. Apache Drill Project. Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open source version of Google’s Dremel system which is available as an IaaS service called Google BigQuery. http://incubator.apache.org/drill/

  14. Google Dremel. http://research.google.com/pubs/pub36632.html

  15. Google Big Query. https://developers.google.com/bigquery/

Author information

Corresponding author

Correspondence to Jun-Ming Zhao.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhao, JM., Wang, WS., Liu, X., Chen, YF. (2014). Big Data Benchmark - Big DS. In: Rabl, T., Raghunath, N., Poess, M., Bhandarkar, M., Jacobsen, HA., Baru, C. (eds) Advancing Big Data Benchmarks. WBDB 2013. Lecture Notes in Computer Science, vol 8585. Springer, Cham. https://doi.org/10.1007/978-3-319-10596-3_5

  • DOI: https://doi.org/10.1007/978-3-319-10596-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10595-6

  • Online ISBN: 978-3-319-10596-3

  • eBook Packages: Computer Science, Computer Science (R0)
