
Two-stage scheduling for a fluctuant big data stream on heterogeneous servers with multicores in a data center

Cluster Computing

Abstract

Rapid processing with low latency and high throughput is a critical requirement for big data stream applications. However, interference among stream processing tasks in a data center reduces the utilization of computational resources and prolongs task latency. We therefore study an optimal scheduling method for processing a big data stream on heterogeneous multicore servers in a data center. We model big data stream processing and the scheduling problem in terms of four entities: streaming data items, processing tasks, computational nodes, and the cores inside each computational node. We present an interference model based on regression analysis and a prediction model based on the autoregressive integrated moving average (ARIMA) model. We then propose a two-stage scheduling method consisting of fine-grained core scheduling and coarse-grained node scheduling. In the core scheduling stage, we design a core scheduling algorithm named CS_TDF. In the node scheduling stage, we design a node scheduling algorithm named NS_ITF for a single time window and a continuous scheduling algorithm named PS_UIM for the entire data stream across all time windows. The experimental results show that our scheduling method achieves low interference and high computational resource utilization.
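The abstract names a regression-based interference model without giving its form. As a minimal sketch, assuming that a task's slowdown is linear in two hypothetical co-runner features (for example, a co-runner's cache-miss rate and memory bandwidth), ordinary least squares via the normal equations could fit such a model. The names `fit_interference` and `predict_slowdown`, and the choice of features, are illustrative assumptions rather than the authors' model.

```python
"""Sketch of a regression-based cross-core interference model.

Assumption (not from the paper): slowdown is linear in hypothetical
co-runner features and is fitted by ordinary least squares.
"""
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_interference(features: Sequence[Sequence[float]],
                     slowdowns: Sequence[float]) -> List[float]:
    """Return OLS coefficients [b0, b1, ..., bk] from X^T X b = X^T y."""
    X = [[1.0, *row] for row in features]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, slowdowns)) for i in range(k)]
    return solve(xtx, xty)


def predict_slowdown(coef: Sequence[float], feature_row: Sequence[float]) -> float:
    """Predicted slowdown factor (1.0 means no interference)."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], feature_row))
```

With synthetic (not measured) data generated exactly from slowdown = 1.0 + 0.5 * x1 + 0.2 * x2, `fit_interference` recovers coefficients of approximately [1.0, 0.5, 0.2].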
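The ARIMA-based prediction model is likewise only named in the abstract. One simple instance is a one-step ARIMA(1,1,0) forecast of the number of data items arriving per time window: difference the series once, fit an AR(1) coefficient by least squares, and undo the differencing. The order (1,1,0), the absence of a constant term, and the estimation method are assumptions for illustration; a real deployment would more likely rely on a statistics library.

```python
"""Sketch of a one-step ARIMA(1,1,0) forecast of per-window arrival counts.

Assumptions (not from the paper): order (p, d, q) = (1, 1, 0), no constant,
phi estimated by conditional least squares on the differenced series.
"""
from typing import Sequence


def fit_ar1(series: Sequence[float]) -> float:
    """Least-squares estimate of phi in z_t = phi * z_{t-1} + e_t."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den if den else 0.0  # flat series: no autoregressive signal


def forecast_arima110(counts: Sequence[float]) -> float:
    """Forecast the next window's arrival count with ARIMA(1,1,0)."""
    if len(counts) < 3:
        raise ValueError("need at least three observations")
    diffs = [counts[t] - counts[t - 1] for t in range(1, len(counts))]  # d = 1
    phi = fit_ar1(diffs)
    next_diff = phi * diffs[-1]  # AR(1) step on the differenced series
    return counts[-1] + next_diff  # integrate back to the original scale
```

For the hypothetical counts [100, 110, 115, 117.5], the differences are [10, 5, 2.5], the fitted phi is 0.5, and the forecast for the next window is 118.75.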
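The two-stage structure, a coarse-grained choice of node followed by a fine-grained choice of core, can be illustrated with a greedy scheduler for a single time window. This is a hypothetical sketch, not CS_TDF, NS_ITF or PS_UIM, whose details are not given here. It assumes that each task has a work amount and a memory intensity, that each node has a speed factor and a number of cores, and that placing a task on a core already hosting tasks with total intensity I inflates its work by (1 + ALPHA * I), where ALPHA is an assumed coefficient standing in for an interference model.

```python
"""Hypothetical two-stage greedy scheduler: pick a node, then pick a core.

Not the paper's CS_TDF / NS_ITF / PS_UIM; all parameters are assumptions.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ALPHA = 0.5  # assumed interference coefficient


@dataclass
class Task:
    name: str
    work: float       # work units at speed 1.0
    intensity: float  # assumed memory intensity in [0, 1]


@dataclass
class Node:
    name: str
    speed: float  # relative speed of this (heterogeneous) server
    cores: int
    core_load: List[float] = field(init=False)       # predicted busy time per core
    core_intensity: List[float] = field(init=False)  # summed task intensity per core

    def __post_init__(self) -> None:
        self.core_load = [0.0] * self.cores
        self.core_intensity = [0.0] * self.cores

    def avg_load_after(self, task: Task) -> float:
        """Coarse metric: mean per-core load if the task were added (ignores interference)."""
        return (sum(self.core_load) + task.work / self.speed) / self.cores

    def finish_time(self, core: int, task: Task) -> float:
        """Predicted finish time of the task if appended to the given core."""
        inflated = task.work * (1 + ALPHA * self.core_intensity[core])
        return self.core_load[core] + inflated / self.speed


def schedule(tasks: List[Task], nodes: List[Node]) -> Dict[str, Tuple[str, int]]:
    """Place tasks largest-first; return task name -> (node name, core index)."""
    placement: Dict[str, Tuple[str, int]] = {}
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        # Stage 1 (coarse-grained): node with the lowest average load after placement.
        node = min(nodes, key=lambda n: n.avg_load_after(task))
        # Stage 2 (fine-grained): core on that node with the earliest predicted finish.
        core = min(range(node.cores), key=lambda c: node.finish_time(c, task))
        node.core_load[core] = node.finish_time(core, task)  # uses intensity before update
        node.core_intensity[core] += task.intensity
        placement[task.name] = (node.name, core)
    return placement
```

In this sketch the node stage looks only at average load, while the core stage accounts for predicted interference, so a memory-intensive task is steered away from cores that already host intensive co-runners. The scoring rules are assumptions chosen to make the coarse/fine split visible.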


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Data availability

Enquiries about data availability should be directed to the authors.

Notes

  1. https://storm.apache.org.

  2. http://open.weibo.com.


Funding

This work was supported by the National Key R&D Program of China (Grant No. 2019YFB1704100), the National Natural Science Foundation of China (Grant No. 62072337), the National Social Science Foundation of China (Grant No. 17BTQ086), and a Subproject of the National Seafloor Observatory System of China (Grant No. 2970000001/001/016).

Author information


Contributions

SW proposed the idea of this paper, collected a large amount of information, and designed and conducted the experiments. SW was also responsible for checking the experimental results. GZ put forward constructive suggestions for this research. SW wrote the paper, and GZ checked it.

Corresponding author

Correspondence to Guo-sun Zeng.

Ethics declarations

Competing interests

The authors have not disclosed any competing interests.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, S., Zeng, Gs. Two-stage scheduling for a fluctuant big data stream on heterogeneous servers with multicores in a data center. Cluster Comput 27, 1581–1597 (2024). https://doi.org/10.1007/s10586-023-04044-4


  • DOI: https://doi.org/10.1007/s10586-023-04044-4
