A Topology and Traffic Aware Two-Level Scheduler for Stream Processing Systems in a Heterogeneous Cluster

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10659)


To handle large volumes of data efficiently, scheduling algorithms in stream processing systems need to minimise data movement between communicating tasks in order to improve system throughput. However, finding an optimal schedule for these systems is NP-hard. In this paper, we propose T3-Scheduler, a heuristic scheduling algorithm for heterogeneous clusters that efficiently identifies communicating tasks and assigns them to the same node, up to a specified level of utilisation for that node. Using three common micro-benchmarks and a real-world application, we demonstrate that T3-Scheduler outperforms current state-of-the-art scheduling algorithms, such as Aniello et al.’s popular ‘Online scheduler’, improving throughput by 20–72% for the micro-benchmarks and by 60% for the real-world application.
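The core idea in the abstract, co-locating heavily communicating tasks on one node until that node reaches a utilisation cap, can be pictured with a small greedy sketch. This is not the T3-Scheduler algorithm from the paper, whose details are not given here. The function name `greedy_colocate`, its inputs (`task_load`, `traffic`, `node_capacity`, `max_utilisation`) and the fallback rule of choosing the node with the most remaining budget are all assumptions made purely for illustration.

```python
from collections import defaultdict


def greedy_colocate(task_load, traffic, node_capacity, max_utilisation=0.8):
    """Illustrative greedy placement (hypothetical, not the paper's method).

    task_load:       {task: load units}
    traffic:         {(task_a, task_b): tuples per second between the pair}
    node_capacity:   {node: capacity units}; nodes may differ (heterogeneous)
    max_utilisation: fraction of each node's capacity that may be used
    """
    budget = {n: cap * max_utilisation for n, cap in node_capacity.items()}
    used = defaultdict(float)
    placement = {}

    def fits(node, task):
        return used[node] + task_load[task] <= budget[node]

    def place(task, node):
        placement[task] = node
        used[node] += task_load[task]

    def fallback_node(task):
        # Assumed rule: pick the node with the most remaining budget.
        candidates = [n for n in budget if fits(n, task)]
        if not candidates:
            raise ValueError(f"no node can host task {task!r}")
        return max(candidates, key=lambda n: budget[n] - used[n])

    # Visit task pairs from heaviest to lightest traffic.
    pairs = sorted(traffic.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _rate in pairs:
        for task, partner in ((a, b), (b, a)):
            if task in placement:
                continue
            preferred = placement.get(partner)
            if preferred is not None and fits(preferred, task):
                place(task, preferred)  # co-locate with its partner
            else:
                place(task, fallback_node(task))

    # Tasks that exchange no traffic still need a node.
    for task in task_load:
        if task not in placement:
            place(task, fallback_node(task))
    return placement


if __name__ == "__main__":
    print(greedy_colocate(
        {"spout": 20, "split": 30, "count": 30, "sink": 10},
        {("spout", "split"): 100, ("split", "count"): 80, ("count", "sink"): 5},
        {"big": 100, "small": 50},
    ))
```

In this toy setup the three tasks joined by the heaviest streams fill the larger node up to its 80% budget, and the lightly connected `sink` task spills over to the smaller node. A real scheduler would also need to measure traffic and load at run time, which this sketch takes as given.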


Keywords: Stream processing, Scheduling, Big data, Heterogeneous cluster


  4. Aniello, L., Baldoni, R., Querzoni, L.: Adaptive online scheduling in Storm. In: Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems, pp. 207–218 (2013)
  5. Cardellini, V., Grassi, V., Lo Presti, F., Nardelli, M.: Optimal operator placement for distributed stream processing applications. In: Proceedings of the 10th ACM International Conference on Distributed and Event-Based Systems, pp. 69–80. ACM (2016)
  6. Chakravarthy, S., Jiang, Q.: Stream Data Processing: A Quality of Service Perspective: Modeling, Scheduling, Load Shedding, and Complex Event Processing, vol. 36. Springer Science & Business Media, Berlin (2009)
  7. Chatzistergiou, A., Viglas, S.D.: Fast heuristics for near-optimal task allocation in data stream processing over clusters. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 1579–1588. ACM (2014)
  8. Eskandari, L., Huang, Z., Eyers, D.: P-scheduler: adaptive hierarchical scheduling in Apache Storm. In: Proceedings of the Australasian Computer Science Week Multiconference, p. 26. ACM (2016)
  9. Fu, T.Z., Ding, J., Ma, R.T., Winslett, M., Yang, Y., Zhang, Z.: DRS: dynamic resource scheduling for real-time analytics over fast streams. In: Proceedings of the 35th International Conference on Distributed Computing Systems (ICDCS), pp. 411–420. IEEE (2015)
  10. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. WH Freeman and Company, New York (1979)
  11. Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing platform. In: Proceedings of 2010 International Conference on Data Mining Workshops (ICDMW), pp. 170–177. IEEE (2010)
  12. Peng, B., Hosseini, M., Hong, Z., Farivar, R., Campbell, R.: R-Storm: resource-aware scheduling in Storm. In: Proceedings of the 16th Annual Middleware Conference, pp. 149–161. ACM (2015)
  13. Shan, A.: Heterogeneous processing: a strategy for augmenting Moore’s law. Linux J. 2006(142), 7 (2006)
  14. Xu, J., Chen, Z., Tang, J., Su, S.: T-Storm: traffic-aware online scheduling in Storm. In: Proceedings of the 34th International Conference on Distributed Computing Systems (ICDCS), pp. 535–544. IEEE (2014)
  15. Xu, L., Peng, B., Gupta, I.: Stela: enabling stream processing systems to scale-in and scale-out on-demand. In: Proceedings of 2016 IEEE International Conference on Cloud Engineering (IC2E), pp. 22–31. IEEE (2016)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. University of Otago, Dunedin, New Zealand
