Online Nonstop Task Management for Storm-Based Distributed Stream Processing Engines

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Most distributed stream processing engines (DSPEs) do not support online task management and therefore cannot adapt to time-varying data flows. Recent studies have proposed online task deployment algorithms to address this problem. However, these approaches do not guarantee Quality of Service (QoS) when the task deployment changes at runtime, because the task migrations triggered by a deployment change impose an excessive cost. We study one of the most popular DSPEs, Apache Storm, and find that when a task needs to be migrated, Storm has to stop the resource (implemented in Storm as a Worker process) on which the task is deployed. This stops and restarts every task in that resource, which makes task migration slow. To solve this problem, we propose N-Storm (Nonstop Storm), a DSPE that decouples tasks from resources. N-Storm allows the tasks allocated to a resource to change at runtime through a thread-level task-migration scheme. In particular, we add a local shared key/value store on each node so that resources become aware of changes in the allocation plan; each resource can thus manage its own tasks at runtime. Based on N-Storm, we further propose Online Task Deployment (OTD). Unlike traditional task deployment algorithms, which deploy all tasks at once without considering the cost of the task migrations that a re-deployment causes, OTD gradually adjusts the current task deployment toward an optimized one based on the communication cost and the runtime states of resources. We show that OTD adapts to different kinds of applications, including computation-intensive and communication-intensive ones. Experimental results on a real DSPE cluster show that, compared with Apache Storm and other state-of-the-art approaches, N-Storm avoids system stops and reduces performance-degradation time by up to 87%. In addition, OTD can increase average CPU usage by 51% for computation-intensive applications and reduce network communication cost by 88% for communication-intensive applications.
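To make the thread-level mechanism more concrete, the following Python sketch shows one way a Worker could reconcile its running task threads against a per-node allocation plan. It is an illustration under stated assumptions, not N-Storm's implementation (which runs inside Storm's JVM runtime): the in-memory `LocalKVStore`, the polling-style `Worker.sync()` step, and all other names are hypothetical, and the sketch ignores task state, in-flight tuples, and routing updates between nodes.

```python
import threading
import time


class LocalKVStore:
    """Hypothetical stand-in for the per-node shared key/value store.

    Maps a worker id to the set of task ids currently assigned to it.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._plan = {}

    def put(self, worker_id, task_ids):
        with self._lock:
            self._plan[worker_id] = frozenset(task_ids)

    def get(self, worker_id):
        with self._lock:
            return self._plan.get(worker_id, frozenset())


class Worker:
    """Hypothetical Worker process in which every task runs in its own thread."""

    def __init__(self, worker_id, store):
        self.worker_id = worker_id
        self.store = store
        self.tasks = {}  # task id -> (thread, stop event)

    def _run_task(self, task_id, stop):
        # Placeholder for the executor loop of a spout/bolt task.
        while not stop.is_set():
            time.sleep(0.01)

    def sync(self):
        """Reconcile running task threads with the allocation plan.

        Only tasks whose assignment changed are stopped or started; all
        other task threads keep running, so the Worker never restarts.
        """
        wanted = self.store.get(self.worker_id)
        running = set(self.tasks)
        for task_id in running - wanted:  # migrated away: stop only this thread
            thread, stop = self.tasks.pop(task_id)
            stop.set()
            thread.join()
        for task_id in wanted - running:  # migrated in: start a new thread
            stop = threading.Event()
            thread = threading.Thread(
                target=self._run_task, args=(task_id, stop), daemon=True
            )
            thread.start()
            self.tasks[task_id] = (thread, stop)

    def shutdown(self):
        """Stop every task thread (used only when the Worker really exits)."""
        for thread, stop in self.tasks.values():
            stop.set()
            thread.join()
        self.tasks.clear()
```

For example, migrating task 3 from `w1` to `w2` amounts to two `put` calls on the store followed by `sync()` on each Worker. The thread running task 1 on `w1` keeps running throughout, which is the property that lets a migration avoid stopping the whole Worker process.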
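The abstract does not spell out OTD's algorithm. As a rough illustration of gradual adjustment, the Python sketch below moves a bounded number of tasks per round from the current deployment toward a given target deployment, picking first the moves that most reduce traffic between Workers. Everything here is an assumption rather than the paper's method: the target deployment is taken as given, cost is the total traffic between tasks placed on different Workers, a per-Worker task-count cap (`capacity`) stands in for the runtime state of resources, and `otd_step`, `comm_cost`, and `budget` are hypothetical names.

```python
from collections import Counter


def comm_cost(assign, traffic):
    """Total traffic rate between tasks placed on different workers.

    `assign` maps task -> worker; `traffic` maps (task_a, task_b) -> rate.
    """
    return sum(rate for (a, b), rate in traffic.items() if assign[a] != assign[b])


def otd_step(current, target, traffic, capacity, budget):
    """Hypothetical sketch: move at most `budget` tasks toward `target`.

    Greedy: repeatedly apply the pending move with the largest reduction in
    inter-worker traffic, skipping moves into workers already at `capacity`.
    Returns (new_assignment, list of (task, from_worker, to_worker)).
    """
    assign = dict(current)  # do not mutate the caller's deployment
    load = Counter(assign.values())
    moves = []
    for _ in range(budget):
        base = comm_cost(assign, traffic)
        best = None
        for task, dest in target.items():
            if assign[task] == dest or load[dest] >= capacity:
                continue
            trial = dict(assign)
            trial[task] = dest
            gain = base - comm_cost(trial, traffic)
            if best is None or gain > best[0]:
                best = (gain, task, dest)
        if best is None:
            break  # converged, or every pending move is blocked by capacity
        _, task, dest = best
        load[assign[task]] -= 1
        load[dest] += 1
        moves.append((task, assign[task], dest))
        assign[task] = dest
    return assign, moves
```

Calling `otd_step` once per scheduling round, and carrying out each round's moves as thread-level migrations, spreads the migration cost over time instead of paying it all at once. The capacity check is deliberately crude: if two full Workers need to exchange tasks, the sketch stops without moving anything, a case a real scheduler would have to handle.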

Author information

Corresponding author

Correspondence to Pei-Quan Jin.

About this article

Cite this article

Zhang, Z., Jin, PQ., Xie, XK. et al. Online Nonstop Task Management for Storm-Based Distributed Stream Processing Engines. J. Comput. Sci. Technol. 39, 116–138 (2024). https://doi.org/10.1007/s11390-021-1629-9

  • DOI: https://doi.org/10.1007/s11390-021-1629-9
