Rafiki: Task-Level Capacity Planning in Distributed Stream Processing Systems

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13098)

Abstract

Distributed Stream Processing is a valuable paradigm for reliably processing vast amounts of data at high throughput rates with low end-to-end latencies. Most systems of this type offer fine-grained control over how the computation of individual tasks within a streaming job is parallelized. Adjusting the parallelism of tasks directly affects both the throughput a job can sustain and the amount of resources required to provide an adequate level of service. However, finding optimal parallelism configurations that meet the expected Quality of Service requirements is difficult.

In this paper, we present Rafiki, an approach to automatically determine optimal parallelism configurations for Distributed Stream Processing jobs. To this end, we conduct a number of proactive profiling runs to gather information about the processing capacities of individual tasks, which makes it possible to select specific utilization targets. This capacity information enables users to provision resources adequately, so that streaming jobs deliver the desired level of service at a reduced operational cost and with predictable recovery times. We implemented a prototype of Rafiki for Apache Flink and demonstrate its usefulness experimentally.
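To make the idea of turning profiled task capacities and a utilization target into a parallelism configuration concrete, the following is a minimal sketch. It is not Rafiki's actual algorithm, which this preview does not describe. It assumes a linear pipeline, a profiled per-instance capacity for every task, and capacity that scales linearly with parallelism. The names `TaskProfile`, `plan_parallelism`, `selectivity`, and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical result of profiling one task of a streaming job."""
    name: str
    capacity_per_instance: float  # records/s one parallel instance sustained
    selectivity: float = 1.0      # output records emitted per input record


def plan_parallelism(tasks, source_rate, target_utilization):
    """Pick the smallest parallelism per task that keeps each instance at or
    below the target utilization.

    Assumptions (not taken from the paper): the tasks form a linear pipeline
    in the given order, and capacity scales linearly with parallelism.
    """
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")

    plan = {}
    rate = source_rate  # records/s arriving at the current task
    for task in tasks:
        usable = task.capacity_per_instance * target_utilization
        plan[task.name] = max(1, math.ceil(rate / usable))
        rate *= task.selectivity  # input rate of the next task downstream
    return plan


# Hypothetical profiling results for a three-task job.
profiles = [
    TaskProfile("parse", capacity_per_instance=20_000),
    TaskProfile("filter", capacity_per_instance=50_000, selectivity=0.5),
    TaskProfile("aggregate", capacity_per_instance=5_000),
]
plan = plan_parallelism(profiles, source_rate=60_000, target_utilization=0.75)
print(plan)
```

A lower utilization target leaves more headroom per instance, which in this simplified model is what lets a job catch up after a failure, at the price of more parallel instances.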

Keywords

  • Distributed Stream Processing
  • Capacity planning
  • Resource optimization
  • Quality of Service
  • Parallelization
  • Profiling
  • Performance modeling

Notes

  1. https://github.com/ciklista/rafiki

  2. https://cloud.google.com/

  3. https://cloud.google.com/kubernetes-engine

  4. https://github.com/ciklista/rafiki

Acknowledgment

This work has been supported through grants by the German Ministry for Education and Research (BMBF) as BIFOLD (funding mark 01IS18025A) and WaterGridSense 4.0 (funding mark 02WIK1475D).

Author information

Corresponding author

Correspondence to Benjamin J. J. Pfister.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Pfister, B.J.J. et al. (2022). Rafiki: Task-Level Capacity Planning in Distributed Stream Processing Systems. In: , et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_28

  • DOI: https://doi.org/10.1007/978-3-031-06156-1_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06155-4

  • Online ISBN: 978-3-031-06156-1

  • eBook Packages: Computer Science, Computer Science (R0)