Setting the Direction for Big Data Benchmark Standards

  • Chaitanya Baru
  • Milind Bhandarkar
  • Raghunath Nambiar
  • Meikel Poess
  • Tilmann Rabl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7755)

Abstract

The Workshop on Big Data Benchmarking (WBDB2012), held on May 8-9, 2012 in San Jose, CA, served as an incubator for several promising approaches to defining a big data benchmark standard for industry. Through an open forum for discussion of issues related to big data benchmarking, including definitions of big data terms, benchmark processes, and auditing, the attendees were able to extend their own views of big data benchmarking as well as communicate their own ideas, which ultimately led to the formation of small working groups to continue collaborative work in this area. In this paper, we summarize the discussions and outcomes from this first workshop, which was attended by about 60 invitees representing 45 different organizations from both industry and academia. Workshop attendees were selected based on their experience and expertise in the management of big data, database systems, performance benchmarking, and big data applications. There was consensus among participants about both the need and the opportunity for defining benchmarks that capture the end-to-end aspects of big data applications. Following the model of TPC benchmarks, it was felt that big data benchmarks should include metrics not only for performance but also for price/performance, along with a sound foundation for fair comparison through audit mechanisms. Additionally, the benchmarks should consider several costs relevant to big data systems, including the total cost of acquisition, setup cost, and the total cost of ownership, including energy cost. The second Workshop on Big Data Benchmarking will be held in December 2012 in Pune, India, and the third meeting is being planned for July 2013 in Xi’an, China.

Keywords

Big Data, Benchmarking, Industry Standards

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Chaitanya Baru (1)
  • Milind Bhandarkar (2)
  • Raghunath Nambiar (3)
  • Meikel Poess (4)
  • Tilmann Rabl (5)

  1. San Diego Supercomputer Center, UC San Diego, USA
  2. Greenplum/EMC, USA
  3. Cisco Systems, Inc., USA
  4. Oracle Corporation, USA
  5. University of Toronto, Canada
