Introducing TPCx-HS: The First Industry Standard for Benchmarking Big Data Systems

  • Raghunath Nambiar
  • Meikel Poess
  • Akon Dey
  • Paul Cao
  • Tariq Magdon-Ismail
  • Da Qi Ren
  • Andrew Bond
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8904)

Abstract

The term Big Data has become a mainstream buzz phrase across many industries and in research circles. Today many companies make performance claims that are not easily verifiable or comparable in the absence of a neutral industry benchmark. Instead, one of the test suites used to compare the performance of Hadoop-based Big Data systems is TeraSort. While it nicely defines the data set and tasks for measuring Big Data Hadoop systems, it lacks a formal specification and enforcement rules that would enable the comparison of results across systems. In this paper we introduce TPCx-HS, the first industry standard benchmark for Big Data systems, designed to stress both the hardware and the software of systems based on Apache HDFS API compatible distributions. TPCx-HS extends the workload defined in TeraSort with formal rules for implementation, execution, metric, result verification, publication and pricing. It can be used to assess a broad range of system topologies and implementation methodologies of Big Data Hadoop systems in a technically rigorous, directly comparable and vendor-neutral manner.

Keywords

TPC, Big Data, Industry standard, Benchmark

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Raghunath Nambiar (1)
  • Meikel Poess (2)
  • Akon Dey (3)
  • Paul Cao (4)
  • Tariq Magdon-Ismail (5)
  • Da Qi Ren (6)
  • Andrew Bond (7)

  1. Cisco Systems, Inc., San Jose, USA
  2. Oracle Corporation, Redwood Shores, USA
  3. School of Information Technologies, University of Sydney, Sydney, Australia
  4. Hewlett-Packard, Houston, USA
  5. VMware, Inc., Palo Alto, USA
  6. Futurewei Technologies, Santa Clara, USA
  7. Red Hat, Raleigh, USA
