SparkBench – A Spark Performance Testing Suite

  • Dakshi Agrawal
  • Ali Butt
  • Kshitij Doshi
  • Josep-L. Larriba-Pey
  • Min Li (corresponding author)
  • Frederick R. Reiss
  • Francois Raab
  • Berni Schiefer
  • Toyotaro Suzumura
  • Yinglong Xia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9508)

Abstract

Spark has emerged as an easy-to-use, scalable, robust, and fast system for analytics, with a rapidly growing and vibrant community of users and contributors. It is multipurpose, with extensive and modular infrastructure for machine learning, graph processing, SQL, streaming, statistical processing, and more. Its rapid adoption therefore calls for a performance assessment suite that supports agile development, measurement, validation, optimization, configuration, and deployment decisions across a broad range of platform environments and test cases.

Recognizing the need for such comprehensive and agile testing, this paper proposes going beyond existing performance tests for Spark and creating an expanded Spark performance testing suite. This proposal describes several desirable properties flowing from the larger scale, greater and evolving variety, and nuanced requirements of different applications of Spark. The paper identifies the major areas of performance characterization, and the key methodological aspects that should be factored into the design of the proposed suite. The objective is to capture insights from industry and academia on how to best characterize capabilities of Spark-based analytic platforms and provide cost-effective assessment of optimization opportunities in a timely manner.

Keywords

Testing Suite · Conditional Random Field · Graph Computation · Reference Implementation · Audit Rule
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

The authors would like to acknowledge all those who contributed with suggestions, ideas and provided valuable feedback during earlier drafts of this document. In particular we would like to thank Alan Bivens, Michael Hind, David Grove, Steve Rees, Shankar Venkataraman, Randy Swanberg, Ching-Yung Lin, and John Poelman.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Dakshi Agrawal (1)
  • Ali Butt (2)
  • Kshitij Doshi (5)
  • Josep-L. Larriba-Pey (3)
  • Min Li (1, corresponding author)
  • Frederick R. Reiss (1)
  • Francois Raab (4)
  • Berni Schiefer (1)
  • Toyotaro Suzumura (1)
  • Yinglong Xia (1)
  1. IBM Research, San Jose, USA
  2. Virginia Tech, Blacksburg, USA
  3. Universitat Politècnica de Catalunya BarcelonaTech, Barcelona, Spain
  4. InfoSizing, Manitou Springs, USA
  5. Intel, Mountain View, USA
