DABS-Storm: A Data-Aware Approach for Elastic Stream Processing

  • Roland Kotto KombiEmail author
  • Nicolas Lumineau
  • Philippe Lamarre
  • Nicolo Rivetti
  • Yann Busnel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11360)


In the last decade, stream processing has become a very active research domain motivated by the growing number of stream-based applications. These applications make use of continuous queries, which are processed by a stream processing engine (SPE) to generate timely results given the ephemeral input data. Variations of input data streams, in terms of both volume and distribution of values, have a large impact on computational resource requirements. Dynamic and Automatic Balanced Scaling for Storm (DABS-Storm) is an original solution for handling dynamic adaptation of continuous queries processing according to evolution of input stream properties, while controlling the system stability. Both fluctuations in data volume and distribution of values within data streams are handled by DABS-Storm to adjust the resources usage that best meets processing needs. To achieve this goal, the DABS-Storm holistic approach combines a proactive auto-parallelization algorithm with a latency-aware load balancing strategy.


  1. 1.
    Abadi, D.J., et al.: The design of the borealis stream processing engine. In: CIDR 2005, Second Biennial Conference on Innovative Data Systems Research, Online Proceedings, Asilomar, CA, USA, 4–7 January 2005, pp. 277–289. (2005).
  2. 2.
    Abadi, D.J., et al.: Aurora: a new model and architecture for data stream management. VLDB J. 12(2), 120–139 (2003). Scholar
  3. 3.
    Aniello, L., Baldoni, R., Querzoni, L.: Adaptive online scheduling in storm. In: Chakravarthy, S., Urban, S.D., Pietzuch, P.R., Rundensteiner, E.A. (eds.) The 7th ACM International Conference on Distributed Event-Based Systems, DEBS 2013, Arlington, TX, USA, 29 June–03 July 2013, pp. 207–218. ACM (2013).
  4. 4.
  5. 5.
  6. 6.
    Arasu, A., et al.: STREAM: the Stanford data stream management system. In: Garofalakis, M.N., Gehrke, J., Rastogi, R. (eds.) Data Stream Management. DSA, pp. 317–336. Springer, Heidelberg (2016). Scholar
  7. 7.
    Arasu, A., Babu, S., Widom, J.: The CQL continuous query language: semantic foundations and query execution. VLDB J. 15(2), 121–142 (2006). Scholar
  8. 8.
    Balazinska, M., Balakrishnan, H., Stonebraker, M.: Load management and high availability in the medusa distributed stream processing system. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 929–930. ACM (2004)Google Scholar
  9. 9.
    Biem, A., et al.: IBM infosphere streams for scalable, real-time, intelligent transportation services. In: Elmagarmid, A.K., Agrawal, D. (eds.) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, 6–10 June 2010, pp. 1093–1104. ACM (2010).
  10. 10.
    Box, G.: Box and Jenkins. In: Time Series Analysis, Forecasting and Control, pp. 161–215. Palgrave Macmillan, London (2013).
  11. 11.
    Carter, L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979). Scholar
  12. 12.
    Chandrasekaran, S., et al.: TelegraphCQ: continuous dataflow processing. In: Halevy, A.Y., Ives, Z.G., Doan, A. (eds.) Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, 9–12 June 2003, p. 668. ACM (2003).
  13. 13.
    Cherniack, M., et al.: Scalable distributed stream processing. In: CIDR 2003, First Biennial Conference on Innovative Data Systems Research, Online Proceedings, Asilomar, CA, USA, 5–8 January 2003. (2003).
  14. 14.
    Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005). Scholar
  15. 15.
    Das, R., Tesauro, G., Walsh, W.E.: Model-based and model-free approaches to autonomic resource allocation. Technical report, RC23802, IBM Research Report, November 2005.!OpenDocument&Highlight=0,tesauro
  16. 16.
    Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Brewer, E.A., Chen, P. (eds.) 6th Symposium on Operating System Design and Implementation (OSDI 2004), San Francisco, California, USA, 6–8 December 2004, pp. 137–150. USENIX Association (2004).
  17. 17.
    Gedik, B.: Partitioning functions for stateful data parallelism in stream processing. VLDB J. 23(4), 517–539 (2014). Scholar
  18. 18.
    Gedik, B., Schneider, S., Hirzel, M., Wu, K.: Elastic scaling for data stream processing. IEEE Trans. Parallel Distrib. Syst. 25(6), 1447–1463 (2014). Scholar
  19. 19.
    Golab, L., Garg, S., Özsu, M.T.: On indexing sliding windows over online data streams. In: Bertino, E., et al. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 712–729. Springer, Heidelberg (2004). Scholar
  20. 20.
    Google Cloud Dataflow.
  21. 21.
    Heinze, T., Pappalardo, V., Jerzak, Z., Fetzer, C.: Auto-scaling techniques for elastic data stream processing. In: Bellur, U., Kothari, R. (eds.) The 8th ACM International Conference on Distributed Event-Based Systems, DEBS 2014, Mumbai, India, 26–29 May 2014, pp. 318–321. ACM (2014).
  22. 22.
    Hirzel, M., Soulé, R., Schneider, S., Gedik, B., Grimm, R.: A catalog of stream processing optimizations. ACM Comput. Surv. 46(4), 46:1–46:34 (2013). Scholar
  23. 23.
    Kang, J., Naughton, J.F., Viglas, S.: Evaluating window joins over unbounded streams. In: Dayal, U., Ramamritham, K., Vijayaraman, T.M. (eds.) Proceedings of the 19th International Conference on Data Engineering, 5–8 March 2003, Bangalore, India, pp. 341–352. IEEE Computer Society (2003).
  24. 24.
    Kombi, R.K., Lumineau, N., Lamarre, P.: A preventive auto-parallelization approach for elastic stream processing. In: Lee, K., Liu, L. (eds.) 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, Atlanta, GA, USA, 5–8 June 2017, pp. 1532–1542. IEEE Computer Society (2017).
  25. 25.
    Mukhopadhyay, A., Mazumdar, R.R.: Analysis of randomized join-the-shortest-queue (JSQ) schemes in large heterogeneous processor-sharing systems. IEEE Trans. Control Netw. Syst. 3(2), 116–126 (2016). Scholar
  26. 26.
    Nasir, M.A.U., Morales, G.D.F., García-Soriano, D., Kourtellis, N., Serafini, M.: The power of both choices: practical load balancing for distributed stream processing engines. In: Gehrke, J., Lehner, W., Shim, K., Cha, S.K., Lohman, G.M. (eds.) 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, 13–17 April 2015, pp. 137–148. IEEE Computer Society (2015).
  27. 27.
    Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing platform. In: Fan, W., et al. (eds.) ICDMW 2010, The 10th IEEE International Conference on Data Mining Workshops, Sydney, Australia, 13 December 2010, pp. 170–177. IEEE Computer Society (2010).
  28. 28.
    Peng, B., Hosseini, M., Hong, Z., Farivar, R., Campbell, R.H.: R-storm: resource-aware scheduling in storm. In: Lea, R., Gopalakrishnan, S., Tilevich, E., Murphy, A.L., Blackstock, M. (eds.) Proceedings of the 16th Annual Middleware Conference, Vancouver, BC, Canada, 07–11 December 2015, pp. 149–161. ACM (2015).
  29. 29.
    Rivetti, N., Anceaume, E., Busnel, Y., Querzoni, L., Sericola, B.: Proactive online scheduling for shuffle grouping in distributed stream processing systems. In: Proceedings of the 17th ACM/IFIP/USENIX International Middleware Conference, Middleware (2016)Google Scholar
  30. 30.
    Rivetti, N., Busnel, Y., Querzoni, L.: Load-aware shedding in stream processing systems. In: Gal, A., Weidlich, M., Kalogeraki, V., Venkasubramanian, N. (eds.) Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, DEBS 2016, Irvine, CA, USA, 20–24 June 2016, pp. 61–68. ACM (2016).
  31. 31.
    Rivetti, N., Querzoni, L., Anceaume, E., Busnel, Y., Sericola, B.: Efficient key grouping for near-optimal load balancing in stream processing systems. In: Eliassen, F., Vitenberg, R. (eds.) Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015, Oslo, Norway, 29 June–3 July 2015, pp. 80–91. ACM (2015).
  32. 32.
    Sattler, K., Beier, F.: Towards elastic stream processing: patterns and infrastructure. In: Cormode, G., Yi, K., Deligiannakis, A., Garofalakis, M.N. (eds.) Proceedings of the First International Workshop on Big Dynamic Distributed Data, CEUR Workshop Proceedings, Riva del Garda, Italy, 30 August 2013, vol. 1018, pp. 49–54. (2013).
  33. 33.
    Schneider, S., Andrade, H., Gedik, B., Biem, A., Wu, K.: Elastic scaling of data parallel operators in stream processing. In: 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, 23–29 May 2009, pp. 1–12. IEEE (2009).
  34. 34.
    Senderovich, A., Weidlich, M., Gal, A., Mandelbaum, A.: Queue mining – predicting delays in service processes. In: Jarke, M., et al. (eds.) CAiSE 2014. LNCS, vol. 8484, pp. 42–57. Springer, Cham (2014). Scholar
  35. 35.
    Stonebraker, M., Çetintemel, U., Zdonik, S.B.: The 8 requirements of real-time stream processing. SIGMOD Rec. 34(4), 42–47 (2005). Scholar
  36. 36.
    Sullivan, M., Heybey, A.: Tribeca: a system for managing large databases of network traffic. In: Douglis, F. (ed.) 1998 USENIX Annual Technical Conference, New Orleans, Louisiana, USA, 15–19 June 1998. USENIX Association (1998).
  37. 37.
    Vengerov, D., Menck, A.C., Zaït, M., Chakkappen, S.: Join size estimation subject to filter conditions. PVLDB 8(12), 1530–1541 (2015). Scholar
  38. 38.
    Wu, Y., Tan, K.: ChronoStream: elastic stateful stream computation in the cloud. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 723–734, April 2015.
  39. 39.
    Xu, J., Chen, Z., Tang, J., Su, S.: T-storm: traffic-aware online scheduling in storm. In: IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014, Madrid, Spain, 30 June–3 July 2014, pp. 535–544. IEEE Computer Society (2014).
  40. 40.
    Xu, L., Peng, B., Gupta, I.: Stela: enabling stream processing systems to scale-in and scale-out on-demand. In: 2016 IEEE International Conference on Cloud Engineering, IC2E 2016, Berlin, Germany, 4–8 April 2016, pp. 22–31. IEEE Computer Society (2016).
  41. 41.
    Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., Stoica, I.: Discretized streams: a fault-tolerant model for scalable stream processing. Technical report UCB/EECS-2012-259. Department of Electrical Engineering and Computer Science, California University, Berkeley (2012).

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Roland Kotto Kombi
    • 1
    Email author
  • Nicolas Lumineau
    • 2
  • Philippe Lamarre
    • 1
  • Nicolo Rivetti
    • 3
  • Yann Busnel
    • 4
  1. 1.Univ Lyon, INSA de Lyon, LIRIS UMR 5205VilleurbanneFrance
  2. 2.Univ Lyon, University Claude Bernard Lyon 1, LIRIS UMR 5205VilleurbanneFrance
  3. 3.Technion, Israel Institute of TechnologyHaifaIsrael
  4. 4.IMT AtlantiqueRennesFrance

Personalised recommendations