Network-Aware Grouping in Distributed Stream Processing Systems

  • Fei Chen
  • Song Wu
  • Hai Jin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11334)

Abstract

Distributed Stream Processing (DSP) systems have recently attracted much attention because of their ability to process huge volumes of real-time stream data with very low latency on clusters of commodity hardware. Existing workload grouping strategies in a DSP system can be classified into four categories: raw and blind, data skewness, cluster heterogeneity, and dynamic load-aware. However, these traditional stream grouping strategies do not consider the network distance between two communicating operators. In fact, traffic over different network channels has a significant impact on performance. How to group tuples according to network distance in order to improve performance has therefore become a critical problem.

In this paper, we propose a network-aware grouping framework called Squirrel to improve performance under different network distances. By identifying the network locations of two communicating operators, Squirrel sets a weight and a priority for each network channel. It introduces Weight Grouping to assign different numbers of tuples to each network channel according to the channel’s weight and priority. To adapt to changes in network conditions, input load, resources, and other factors, Squirrel uses Dynamic Weight Control to adjust each network channel’s weight and priority online by analyzing runtime information. Experimental results demonstrate Squirrel’s effectiveness and show that Squirrel can achieve a 1.67x improvement in throughput and reduce latency by 47%.
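The abstract does not give the details of Weight Grouping or Dynamic Weight Control, so the following is only a minimal sketch of the general idea, not Squirrel's actual algorithm. It assumes a smooth weighted round-robin policy as a stand-in technique. The channel names (`local`, `rack`, `remote`), the `Channel` and `WeightedGrouper` classes, and the `set_weight` hook are all hypothetical, and priority is ignored except that ties go to the channel listed first.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str         # hypothetical label for a network channel
    weight: int       # relative share of tuples this channel should receive
    current: int = 0  # running credit used by the selection loop


class WeightedGrouper:
    """Smooth weighted round-robin over channels (illustrative assumption)."""

    def __init__(self, channels):
        self.channels = list(channels)

    def next_channel(self):
        # Give every channel credit equal to its weight, pick the channel
        # with the most credit, then charge it the total weight.
        total = sum(c.weight for c in self.channels)
        for c in self.channels:
            c.current += c.weight
        best = max(self.channels, key=lambda c: c.current)
        best.current -= total
        return best.name

    def set_weight(self, name, weight):
        # Stand-in for an online weight adjustment: a controller could call
        # this after analyzing runtime information.
        for c in self.channels:
            if c.name == name:
                c.weight = weight


def count_assignments(grouper, n):
    """Route n tuples and count how many each channel received."""
    counts = {}
    for _ in range(n):
        name = grouper.next_channel()
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    grouper = WeightedGrouper(
        [Channel("local", 5), Channel("rack", 3), Channel("remote", 2)]
    )
    print(count_assignments(grouper, 10))
```

With weights 5, 3, and 2, every run of ten consecutive tuples is split 5/3/2 across the three channels, and the picks are interleaved rather than sent in bursts. Lowering a weight through `set_weight` shifts the split on subsequent tuples, which is the kind of effect an online weight controller would aim for.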

Keywords

Stream processing, Load balancing, Grouping, Network distance

Notes

Acknowledgment

This work was supported by National Key Research and Development Program under grant 2018YFB1003600 and Pre-research Project of Beifang under grant FFZ-1601.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
