Introducing TPCx-HS: The First Industry Standard for Benchmarking Big Data Systems

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 8904)

Abstract

The term Big Data has become a mainstream buzz phrase across many industries as well as research circles. Today many companies make performance claims that are not easily verifiable or comparable in the absence of a neutral industry benchmark. In its place, one of the test suites used to compare the performance of Hadoop-based Big Data systems is TeraSort. While it nicely defines the data set and tasks for measuring Big Data Hadoop systems, it lacks a formal specification and the enforcement rules that would enable comparison of results across systems. In this paper we introduce TPCx-HS, the first industry standard benchmark for Big Data systems, designed to stress both the hardware and the software of systems based on Apache HDFS API compatible distributions. TPCx-HS extends the workload defined in TeraSort with formal rules for implementation, execution, metrics, result verification, publication and pricing. It can be used to assess a broad range of system topologies and implementation methodologies of Big Data Hadoop systems in a technically rigorous, directly comparable and vendor-neutral manner.

Keywords

  • TPC
  • Big Data
  • Industry standard
  • Benchmark


Notes

  1. There is no inherent scale limitation in the benchmark. Larger datasets can be added (and smaller ones retired) based on industry trends over time.

Acknowledgement

Developing an industry standard benchmark for a new environment like Big Data has taken the dedicated efforts of experts across many companies. The authors acknowledge the contributions of Andrew Bond (Red Hat), Andrew Masland (NEC), Avik Dey (Intel), Brian Caufield (IBM), Chaitanya Baru (SDSC), Da Qi Ren (Huawei), Dileep Kumar (Cloudera), Jamie Reding (Microsoft), John Fowler (Oracle), John Poelman (IBM), Karthik Kulkarni (Cisco), Meikel Poess (Oracle), Mike Brey (Oracle), Mike Crocker (SAP), Paul Cao (HP), Raghunath Nambiar (Cisco), Reza Taheri (VMware), Simon Harris (IBM), Tariq Magdon-Ismail (VMware), Wayne Smith (Intel), Yanpei Chen (Cloudera), Michael Majdalany (L&M), Forrest Carman (Owen Media) and Andreas Hotea (Hotea Solutions).

Author information

Corresponding author

Correspondence to Tariq Magdon-Ismail.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nambiar, R. et al. (2015). Introducing TPCx-HS: The First Industry Standard for Benchmarking Big Data Systems. In: Nambiar, R., Poess, M. (eds) Performance Characterization and Benchmarking. Traditional to Big Data. TPCTC 2014. Lecture Notes in Computer Science, vol 8904. Springer, Cham. https://doi.org/10.1007/978-3-319-15350-6_1

  • DOI: https://doi.org/10.1007/978-3-319-15350-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15349-0

  • Online ISBN: 978-3-319-15350-6

  • eBook Packages: Computer Science, Computer Science (R0)