Big Data Benchmark - Big DS

Zhao, Jun-Ming; Wang, Wen-Shuan; Liu, Xian; Chen, You-Fu

doi:10.1007/978-3-319-10596-3_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8585)

Abstract

Performance and scalability in clusters of heterogeneous and complex big data analytics environments are always unpredictable. In this paper, we address this problem with a benchmark named “Big DS”. The benchmark adopts ideas from well-known industry benchmarks such as TPC-H [1], TPC-DS [1], SPECvirt_sc2010 [2] and SPECjbb2005 [2], as well as from non-standard benchmarks such as TeraSort [3] and SWIM [4]. By defining a configurable workload for different big data analytics environments, Big DS can be used to measure the performance and scalability of a big data analytics platform or environment for different businesses.

References

1. TPC. TPC is a trademark of the Transaction Processing Performance Council. TPC-H and TPC-DS are the decision support benchmarks of the TPC organization. http://www.tpc.org
2. SPEC. SPEC is a trademark of the Standard Performance Evaluation Corporation 1995–2014. SPECjbb2005 is the server-side Java benchmark of SPEC.org. SPECjbb2013 is the evaluation version of SPECjbb2005. SPECvirt_sc2010 is the server consolidation virtualization benchmark of SPEC.org. http://www.spec.org
3. TeraSort. Refers to the Apache TeraSort benchmark, which is a MapReduce version of the Sort benchmark.
4. SWIM. SWIM stands for Statistical Workload Injector for MapReduce. Its synthesis methodology is adopted in Big DS and its supporting toolset.
5. Apache Hadoop and its related projects. Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.
6. Apache Hive. Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. It was initially developed by Facebook.
7. Cloudera Impala. Cloudera Impala is Cloudera’s open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html
8. Google BigTable. Refers to Google’s BigTable paper. http://research.google.com/archive/bigtable-osdi06.pdf
9. Huppler, K.: Chairman, TPC. Author of “The art of building a good benchmark” (2009). http://www.tpc.org/tpctc/tpctc2009/tpctc2009-03.pdf
10. WBDB. Workshop on Big Data Benchmarking, San Jose. http://clds.ucsd.edu/wbdb2012
11. BigBench. Extends the TPC-DS specification to include unstructured and semi-structured data, and modifies TPC-DS. A data model for BigBench was proposed at the First WBDB Workshop by Ghazal (2012).
12. Deep Analytic Pipeline. A benchmark proposal by Milind Bhandarkar (Pivotal Chief Scientist) (2013). http://clds.sdsc.edu/sites/clds.sdsc.edu/files/2013-03-07-DeepAnalyticsPipeline.pdf
13. Apache Drill Project. Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open source version of Google’s Dremel system, which is available as an IaaS service called Google BigQuery. http://incubator.apache.org/drill/
14. Google Dremel. http://research.google.com/pubs/pub36632.html
15. Google BigQuery. https://developers.google.com/bigquery/

Author information

Authors and Affiliations

HP Building, No 112, Jianguo Road Chaoyang District, Beijing, 100022, China
Jun-Ming Zhao, Wen-Shuan Wang, Xian Liu & You-Fu Chen

Corresponding author

Correspondence to Jun-Ming Zhao.

Editor information

Editors and Affiliations

University of Toronto, Toronto, Ontario, Canada
Tilmann Rabl
Cisco Systems, Inc., San José, USA
Nambiar Raghunath
Oracle Corporation, Redwood Shores, USA
Meikel Poess
Pivotal Software, Inc., Palo Alto, USA
Milind Bhandarkar
University of Toronto, Toronto, Canada
Hans-Arno Jacobsen
University of California at San Diego, La Jolla, USA
Chaitanya Baru

About this paper

Cite this paper

Zhao, JM., Wang, WS., Liu, X., Chen, YF. (2014). Big Data Benchmark - Big DS. In: Rabl, T., Raghunath, N., Poess, M., Bhandarkar, M., Jacobsen, HA., Baru, C. (eds) Advancing Big Data Benchmarks. WBDB 2013. Lecture Notes in Computer Science, vol 8585. Springer, Cham. https://doi.org/10.1007/978-3-319-10596-3_5

DOI: https://doi.org/10.1007/978-3-319-10596-3_5
Published: 09 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10595-6
Online ISBN: 978-3-319-10596-3
eBook Packages: Computer Science, Computer Science (R0)
