Abstract
In the last decade, stream processing has become a very active research domain motivated by the growing number of stream-based applications. These applications make use of continuous queries, which are processed by a stream processing engine (SPE) to generate timely results given the ephemeral input data. Variations of input data streams, in terms of both volume and distribution of values, have a large impact on computational resource requirements. Dynamic and Automatic Balanced Scaling for Storm (DABS-Storm) is an original solution for handling dynamic adaptation of continuous queries processing according to evolution of input stream properties, while controlling the system stability. Both fluctuations in data volume and distribution of values within data streams are handled by DABS-Storm to adjust the resources usage that best meets processing needs. To achieve this goal, the DABS-Storm holistic approach combines a proactive auto-parallelization algorithm with a latency-aware load balancing strategy.
This work has been partially supported by the project SocioPLug (ANR-13-INFR-0003) funded by the French National Research Agency, the Association Nationale Recherche Technologie (ANRt) http://socioplug.univ-nantes.fr/index.php/SocioPlug_Project.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
- 2.
The experimental evaluation relaxes the model by taking into account processing latency variance.
- 3.
At each scale-in or scale-out, system monitoring is disabled while the system stabilizes. Indeed, the data acquired during this transition period do not provide any information about the nominal behavior of the new configuration of the system.
- 4.
- 5.
- 6.
This choice is motivated by previous results on OSG detailed in [31].
References
Abadi, D.J., et al.: The design of the borealis stream processing engine. In: CIDR 2005, Second Biennial Conference on Innovative Data Systems Research, Online Proceedings, Asilomar, CA, USA, 4–7 January 2005, pp. 277–289. www.cidrdb.org (2005). http://cidrdb.org/cidr2005/papers/P23.pdf
Abadi, D.J., et al.: Aurora: a new model and architecture for data stream management. VLDB J. 12(2), 120–139 (2003). https://doi.org/10.1007/s00778-003-0095-z
Aniello, L., Baldoni, R., Querzoni, L.: Adaptive online scheduling in storm. In: Chakravarthy, S., Urban, S.D., Pietzuch, P.R., Rundensteiner, E.A. (eds.) The 7th ACM International Conference on Distributed Event-Based Systems, DEBS 2013, Arlington, TX, USA, 29 June–03 July 2013, pp. 207–218. ACM (2013). http://doi.acm.org/10.1145/2488222.2488267
Apache Flink. https://flink.apache.org/
Apache Storm. https://storm.apache.org/
Arasu, A., et al.: STREAM: the Stanford data stream management system. In: Garofalakis, M.N., Gehrke, J., Rastogi, R. (eds.) Data Stream Management. DSA, pp. 317–336. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-540-28608-0_16
Arasu, A., Babu, S., Widom, J.: The CQL continuous query language: semantic foundations and query execution. VLDB J. 15(2), 121–142 (2006). https://doi.org/10.1007/s00778-004-0147-z
Balazinska, M., Balakrishnan, H., Stonebraker, M.: Load management and high availability in the medusa distributed stream processing system. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 929–930. ACM (2004)
Biem, A., et al.: IBM infosphere streams for scalable, real-time, intelligent transportation services. In: Elmagarmid, A.K., Agrawal, D. (eds.) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, 6–10 June 2010, pp. 1093–1104. ACM (2010). http://doi.acm.org/10.1145/1807167.1807291
Box, G.: Box and Jenkins. In: Time Series Analysis, Forecasting and Control, pp. 161–215. Palgrave Macmillan, London (2013). https://doi.org/10.1057/9781137291264_6
Carter, L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979). https://doi.org/10.1016/0022-0000(79)90044-8
Chandrasekaran, S., et al.: TelegraphCQ: continuous dataflow processing. In: Halevy, A.Y., Ives, Z.G., Doan, A. (eds.) Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, 9–12 June 2003, p. 668. ACM (2003). http://doi.acm.org/10.1145/872757.872857
Cherniack, M., et al.: Scalable distributed stream processing. In: CIDR 2003, First Biennial Conference on Innovative Data Systems Research, Online Proceedings, Asilomar, CA, USA, 5–8 January 2003. www.cidrdb.org (2003). http://www-db.cs.wisc.edu/cidr/cidr2003/program/p23.pdf
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005). https://doi.org/10.1016/j.jalgor.2003.12.001
Das, R., Tesauro, G., Walsh, W.E.: Model-based and model-free approaches to autonomic resource allocation. Technical report, RC23802, IBM Research Report, November 2005. http://domino.watson.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/f5e3b7f574b24bad852570c1005e35a9!OpenDocument&Highlight=0,tesauro
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Brewer, E.A., Chen, P. (eds.) 6th Symposium on Operating System Design and Implementation (OSDI 2004), San Francisco, California, USA, 6–8 December 2004, pp. 137–150. USENIX Association (2004). http://www.usenix.org/events/osdi04/tech/dean.html
Gedik, B.: Partitioning functions for stateful data parallelism in stream processing. VLDB J. 23(4), 517–539 (2014). https://doi.org/10.1007/s00778-013-0335-9
Gedik, B., Schneider, S., Hirzel, M., Wu, K.: Elastic scaling for data stream processing. IEEE Trans. Parallel Distrib. Syst. 25(6), 1447–1463 (2014). https://doi.org/10.1109/TPDS.2013.295
Golab, L., Garg, S., Özsu, M.T.: On indexing sliding windows over online data streams. In: Bertino, E., et al. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 712–729. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24741-8_41
Google Cloud Dataflow. https://cloud.google.com/dataflow/
Heinze, T., Pappalardo, V., Jerzak, Z., Fetzer, C.: Auto-scaling techniques for elastic data stream processing. In: Bellur, U., Kothari, R. (eds.) The 8th ACM International Conference on Distributed Event-Based Systems, DEBS 2014, Mumbai, India, 26–29 May 2014, pp. 318–321. ACM (2014). http://doi.acm.org/10.1145/2611286.2611314
Hirzel, M., Soulé, R., Schneider, S., Gedik, B., Grimm, R.: A catalog of stream processing optimizations. ACM Comput. Surv. 46(4), 46:1–46:34 (2013). http://doi.acm.org/10.1145/2528412
Kang, J., Naughton, J.F., Viglas, S.: Evaluating window joins over unbounded streams. In: Dayal, U., Ramamritham, K., Vijayaraman, T.M. (eds.) Proceedings of the 19th International Conference on Data Engineering, 5–8 March 2003, Bangalore, India, pp. 341–352. IEEE Computer Society (2003). https://doi.org/10.1109/ICDE.2003.1260804
Kombi, R.K., Lumineau, N., Lamarre, P.: A preventive auto-parallelization approach for elastic stream processing. In: Lee, K., Liu, L. (eds.) 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, Atlanta, GA, USA, 5–8 June 2017, pp. 1532–1542. IEEE Computer Society (2017). https://doi.org/10.1109/ICDCS.2017.253
Mukhopadhyay, A., Mazumdar, R.R.: Analysis of randomized join-the-shortest-queue (JSQ) schemes in large heterogeneous processor-sharing systems. IEEE Trans. Control Netw. Syst. 3(2), 116–126 (2016). https://doi.org/10.1109/TCNS.2015.2428331
Nasir, M.A.U., Morales, G.D.F., García-Soriano, D., Kourtellis, N., Serafini, M.: The power of both choices: practical load balancing for distributed stream processing engines. In: Gehrke, J., Lehner, W., Shim, K., Cha, S.K., Lohman, G.M. (eds.) 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, 13–17 April 2015, pp. 137–148. IEEE Computer Society (2015). https://doi.org/10.1109/ICDE.2015.7113279
Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing platform. In: Fan, W., et al. (eds.) ICDMW 2010, The 10th IEEE International Conference on Data Mining Workshops, Sydney, Australia, 13 December 2010, pp. 170–177. IEEE Computer Society (2010). https://doi.org/10.1109/ICDMW.2010.172
Peng, B., Hosseini, M., Hong, Z., Farivar, R., Campbell, R.H.: R-storm: resource-aware scheduling in storm. In: Lea, R., Gopalakrishnan, S., Tilevich, E., Murphy, A.L., Blackstock, M. (eds.) Proceedings of the 16th Annual Middleware Conference, Vancouver, BC, Canada, 07–11 December 2015, pp. 149–161. ACM (2015). http://doi.acm.org/10.1145/2814576.2814808
Rivetti, N., Anceaume, E., Busnel, Y., Querzoni, L., Sericola, B.: Proactive online scheduling for shuffle grouping in distributed stream processing systems. In: Proceedings of the 17th ACM/IFIP/USENIX International Middleware Conference, Middleware (2016)
Rivetti, N., Busnel, Y., Querzoni, L.: Load-aware shedding in stream processing systems. In: Gal, A., Weidlich, M., Kalogeraki, V., Venkasubramanian, N. (eds.) Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, DEBS 2016, Irvine, CA, USA, 20–24 June 2016, pp. 61–68. ACM (2016). http://doi.acm.org/10.1145/2933267.2933311
Rivetti, N., Querzoni, L., Anceaume, E., Busnel, Y., Sericola, B.: Efficient key grouping for near-optimal load balancing in stream processing systems. In: Eliassen, F., Vitenberg, R. (eds.) Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015, Oslo, Norway, 29 June–3 July 2015, pp. 80–91. ACM (2015). http://doi.acm.org/10.1145/2675743.2771827
Sattler, K., Beier, F.: Towards elastic stream processing: patterns and infrastructure. In: Cormode, G., Yi, K., Deligiannakis, A., Garofalakis, M.N. (eds.) Proceedings of the First International Workshop on Big Dynamic Distributed Data, CEUR Workshop Proceedings, Riva del Garda, Italy, 30 August 2013, vol. 1018, pp. 49–54. CEUR-WS.org (2013). http://ceur-ws.org/Vol-1018/paper9.pdf
Schneider, S., Andrade, H., Gedik, B., Biem, A., Wu, K.: Elastic scaling of data parallel operators in stream processing. In: 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, 23–29 May 2009, pp. 1–12. IEEE (2009). https://doi.org/10.1109/IPDPS.2009.5161036
Senderovich, A., Weidlich, M., Gal, A., Mandelbaum, A.: Queue mining – predicting delays in service processes. In: Jarke, M., et al. (eds.) CAiSE 2014. LNCS, vol. 8484, pp. 42–57. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07881-6_4
Stonebraker, M., Çetintemel, U., Zdonik, S.B.: The 8 requirements of real-time stream processing. SIGMOD Rec. 34(4), 42–47 (2005). https://doi.org/10.1145/1107499.1107504
Sullivan, M., Heybey, A.: Tribeca: a system for managing large databases of network traffic. In: Douglis, F. (ed.) 1998 USENIX Annual Technical Conference, New Orleans, Louisiana, USA, 15–19 June 1998. USENIX Association (1998). https://www.usenix.org/conference/1998-usenix-annual-technical-conference/tribeca-system-managing-large-databases-network
Vengerov, D., Menck, A.C., Zaït, M., Chakkappen, S.: Join size estimation subject to filter conditions. PVLDB 8(12), 1530–1541 (2015). http://www.vldb.org/pvldb/vol8/p1530-vengerov.pdf
Wu, Y., Tan, K.: ChronoStream: elastic stateful stream computation in the cloud. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 723–734, April 2015. https://doi.org/10.1109/ICDE.2015.7113328
Xu, J., Chen, Z., Tang, J., Su, S.: T-storm: traffic-aware online scheduling in storm. In: IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014, Madrid, Spain, 30 June–3 July 2014, pp. 535–544. IEEE Computer Society (2014). https://doi.org/10.1109/ICDCS.2014.61
Xu, L., Peng, B., Gupta, I.: Stela: enabling stream processing systems to scale-in and scale-out on-demand. In: 2016 IEEE International Conference on Cloud Engineering, IC2E 2016, Berlin, Germany, 4–8 April 2016, pp. 22–31. IEEE Computer Society (2016). https://doi.org/10.1109/IC2E.2016.38
Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., Stoica, I.: Discretized streams: a fault-tolerant model for scalable stream processing. Technical report UCB/EECS-2012-259. Department of Electrical Engineering and Computer Science, California University, Berkeley (2012). http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.html
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Kombi, R.K., Lumineau, N., Lamarre, P., Rivetti, N., Busnel, Y. (2019). DABS-Storm: A Data-Aware Approach for Elastic Stream Processing. In: Hameurlain, A., Wagner, R., Morvan, F., Tamine, L. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XL. Lecture Notes in Computer Science(), vol 11360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58664-8_3
Download citation
DOI: https://doi.org/10.1007/978-3-662-58664-8_3
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58663-1
Online ISBN: 978-3-662-58664-8
eBook Packages: Computer ScienceComputer Science (R0)