Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data

  • Chaitanya Baru
  • Milind Bhandarkar
  • Carlo Curino
  • Manuel Danisch
  • Michael Frank
  • Bhaskar Gowda
  • Hans-Arno Jacobsen
  • Huang Jie
  • Dileep Kumar
  • Raghunath Nambiar
  • Meikel Poess
  • Francois Raab
  • Tilmann Rabl
  • Nishkam Ravi
  • Kai Sachs
  • Saptak Sen
  • Lan Yi
  • Choonhan Youn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8904)

Abstract

Enterprises perceive a huge opportunity in mining information that can be found in big data. New storage systems and processing paradigms are allowing for ever larger data sets to be collected and analyzed. The high demand for data analytics and the rapid development of technologies have led to a sizable ecosystem of big data processing systems. However, the lack of established, standardized benchmarks makes it difficult for users to choose the systems that suit their requirements. To address this problem, we have developed the BigBench benchmark specification. BigBench is the first end-to-end big data analytics benchmark suite. In this paper, we present the BigBench benchmark and analyze the workload from a technical as well as a business point of view. We characterize the queries in the workload along different dimensions, according to their functional characteristics, and also analyze their runtime behavior. Finally, we evaluate the suitability and relevance of the workload from the point of view of enterprise applications, and discuss potential extensions to the proposed specification in order to cover typical big data processing use cases.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Chaitanya Baru (11)
  • Milind Bhandarkar (10)
  • Carlo Curino (7)
  • Manuel Danisch (1)
  • Michael Frank (1)
  • Bhaskar Gowda (6)
  • Hans-Arno Jacobsen (8)
  • Huang Jie (6)
  • Dileep Kumar (3)
  • Raghunath Nambiar (2)
  • Meikel Poess (9)
  • Francois Raab (5)
  • Tilmann Rabl (1, 8)
  • Nishkam Ravi (3)
  • Kai Sachs (12)
  • Saptak Sen (4)
  • Lan Yi (6)
  • Choonhan Youn (11)
  1. Bankmark, Passau, Germany
  2. Cisco Systems, San Jose, USA
  3. Cloudera, Palo Alto, USA
  4. Hortonworks, Santa Clara, USA
  5. Infosizing, Manitou Springs, USA
  6. Intel Corporation, Santa Clara, USA
  7. Microsoft Corporation, Redmond, USA
  8. Middleware Systems Research Group, Toronto, Canada
  9. Oracle Corporation, Redwood City, USA
  10. Pivotal, Vancouver, Canada
  11. San Diego Supercomputer Center, La Jolla, USA
  12. SPEC Research Group, Gainesville, USA
