TPCx-HS on the Cloud!

  • Nicholas Wakou
  • Michael Woodside
  • Arkady Kanevsky
  • Fazal E Rehman Khan
  • Mofassir ul Islam Arif
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10080)

Abstract

The web-scale operations required by social media, coupled with easy access to the internet from mobile devices, have exponentially increased the amount of data generated every day. By conservative estimates the world generates close to 50,000 GB of data every second, 90% of which is unstructured, and this growth is accelerating. Apache Hadoop originated as a web log processing system at Yahoo; its open source nature and efficient processing have made it the industry standard for Big Data processing.

TPCx-HS was the first benchmark standard for the Big Data space from a major industry-standard performance consortium. It is derived from the Apache Hadoop workloads TeraGen, TeraSort and TeraValidate. Since its release by the TPC in August 2014, all 18 published results (as of August 2016) have been based on on-premise, bare-metal hardware configurations.

This paper shows how Hadoop can be deployed on an OpenStack cloud using the OpenStack Sahara project and how TPCx-HS can be used to measure and evaluate the performance of the Cloud under Test (CuT). It also shows how an OpenStack cloud can be optimized so that TPCx-HS performance on the cloud matches that of a bare-metal configuration as closely as possible. Lastly, it shares results and experiences from a Hadoop-on-Cloud Proof-of-Concept (POC), a study undertaken by the Dell Open Source Solutions team.

Keywords

Apache Hadoop, OpenStack, Big data, Cloud, TPCx-HS, Benchmark

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Nicholas Wakou (1)
  • Michael Woodside (1)
  • Arkady Kanevsky (1)
  • Fazal E Rehman Khan (2)
  • Mofassir ul Islam Arif (2)

  1. Dell Inc., Round Rock, USA
  2. xFlow Research Inc., Islamabad, Pakistan
