Abstract
The Workshop on Big Data Benchmarking (WBDB2012), held on May 8-9, 2012 in San Jose, CA, served as an incubator for several promising approaches to defining an industry standard for big data benchmarks. The workshop provided an open forum for discussing a range of issues related to big data benchmarking, including definitions of big data terms, benchmark processes, and auditing. Through these discussions, attendees broadened their views of big data benchmarking and shared their own ideas, which ultimately led to the formation of small working groups to continue collaborative work in this area. In this paper, we summarize the discussions and outcomes of this first workshop, which was attended by about 60 invitees representing 45 organizations from industry and academia. Attendees were selected based on their experience and expertise in big data management, database systems, performance benchmarking, and big data applications. Participants reached consensus on both the need and the opportunity to define benchmarks that capture the end-to-end aspects of big data applications. Following the model of TPC benchmarks, participants felt that big data benchmarks should include metrics not only for performance but also for price/performance, along with a sound foundation for fair comparison through audit mechanisms. Additionally, the benchmarks should account for several costs relevant to big data systems: the total cost of acquisition, the setup cost, and the total cost of ownership, including energy costs. The second Workshop on Big Data Benchmarking will be held in December 2012 in Pune, India, and the third is being planned for July 2013 in Xi’an, China.
Keywords
- Big Data
- Benchmarking
- Industry Standards
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baru, C., Bhandarkar, M., Nambiar, R., Poess, M., Rabl, T. (2013). Setting the Direction for Big Data Benchmark Standards. In: Nambiar, R., Poess, M. (eds) Selected Topics in Performance Evaluation and Benchmarking. TPCTC 2012. Lecture Notes in Computer Science, vol 7755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36727-4_14
DOI: https://doi.org/10.1007/978-3-642-36727-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36726-7
Online ISBN: 978-3-642-36727-4
eBook Packages: Computer Science; Computer Science (R0)