SPBench: a framework for creating benchmarks of stream processing applications

  • Special Issue Article
  • Published in: Computing

Abstract

In a fast-changing, data-driven world, real-time data processing systems are becoming ubiquitous in everyday applications. The growing volume of data we produce, such as audio, video, image, and text, demands fast and efficient computation. Stream parallelism can accelerate this computation for real-time processing, but exploiting it remains a challenging task that is mostly reserved for experts. In this paper, we present SPBench, a framework for benchmarking stream processing applications. It provides users with a set of real-world stream processing applications, which are accessible through an Application Programming Interface (API) and executable via a Command Line Interface (CLI), so that users can create custom benchmarks. We tested SPBench by implementing parallel benchmarks with Intel Threading Building Blocks (TBB), FastFlow, and SPar. This evaluation provided useful insights and showed the feasibility of the proposed framework in terms of usage, customization, and performance analysis. SPBench proved to be a high-level, reusable, extensible, and easy-to-use abstraction for building parallel stream processing benchmarks on multi-core architectures.

Notes

  1. https://github.com/GMAP/SPBench.

  2. https://github.com/gmap/spbench.

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, FAPERGS 05/2019-PQG project ParAS (No 19/2551-0001895-9), FAPERGS 10/2020-ARD project SPar4.0 (No 21/2551-0000725-7), and Universal MCTIC/CNPq No 28/2018 project SParCloud (No 437693/2018-0). The authors acknowledge LAD-IDEIA/PUCRS for computing resources.

Author information

Corresponding author

Correspondence to Adriano Marques Garcia.

About this article

Cite this article

Garcia, A.M., Griebler, D., Schepke, C. et al. SPBench: a framework for creating benchmarks of stream processing applications. Computing 105, 1077–1099 (2023). https://doi.org/10.1007/s00607-021-01025-6

  • DOI: https://doi.org/10.1007/s00607-021-01025-6
