Abstract

Spark has emerged as an easy-to-use, scalable, robust, and fast system for analytics, with a rapidly growing and vibrant community of users and contributors. It is multipurpose, with extensive and modular infrastructure for machine learning, graph processing, SQL, streaming, statistical processing, and more. Its rapid adoption therefore calls for a performance assessment suite that supports agile development, measurement, validation, optimization, configuration, and deployment decisions across a broad range of platform environments and test cases.

Recognizing the need for such comprehensive and agile testing, this paper proposes going beyond existing performance tests for Spark to create an expanded Spark performance testing suite. The proposal describes several desirable properties that follow from the larger scale, greater and evolving variety, and nuanced requirements of Spark's different applications. The paper identifies the major areas of performance characterization and the key methodological aspects that should factor into the design of the proposed suite. The objective is to capture insights from industry and academia on how best to characterize the capabilities of Spark-based analytic platforms and to provide cost-effective, timely assessment of optimization opportunities.

Notes

  1. Current Spark Streaming is not recommended for sub-second response times; however, we discuss it here in anticipation of future improvements.


Acknowledgements

The authors would like to acknowledge all those who contributed suggestions and ideas and provided valuable feedback on earlier drafts of this document. In particular, we would like to thank Alan Bivens, Michael Hind, David Grove, Steve Rees, Shankar Venkataraman, Randy Swanberg, Ching-Yung Lin, and John Poelman.

Author information

Corresponding author

Correspondence to Min Li.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Agrawal, D. et al. (2016). SparkBench – A Spark Performance Testing Suite. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation and Benchmarking: Traditional to Big Data to Internet of Things. TPCTC 2015. Lecture Notes in Computer Science, vol. 9508. Springer, Cham. https://doi.org/10.1007/978-3-319-31409-9_3

  • DOI: https://doi.org/10.1007/978-3-319-31409-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31408-2

  • Online ISBN: 978-3-319-31409-9

  • eBook Packages: Computer Science (R0)
