The Journal of Supercomputing, Volume 74, Issue 2, pp 615–636

Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams

  • Dawei Sun
  • Hongbin Yan
  • Shang Gao
  • Xunyun Liu
  • Rajkumar Buyya

Abstract

Online scheduling plays a key role for big data streaming applications in a big data stream computing environment, because the arrival rate of a high-velocity continuous data stream may fluctuate over time. This paper proposes E-Stream, an elastic online scheduling framework for big data streaming applications, with the following features. (1) It profiles the mathematical relationships between system response time, fairness among multiple applications, and the online features of high-velocity continuous streams. (2) It scales a data stream graph out or in by quantifying computation and communication cost and the vertex semantics with respect to the arrival rate of the data stream, and adjusts the degree of parallelism of the vertices in the graph. Subgraphs are then constructed so as to minimize the data dependencies among them. (3) It elastically schedules a single graph with a priority-based earliest-finish-time-first online scheduling strategy, and schedules multiple graphs with a max–min fairness strategy. (4) It is evaluated against the objectives of low system response time and acceptable application fairness in a real-world big data stream computing environment. Experimental results demonstrate that E-Stream provides better system response time and application fairness than the existing Storm framework.
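
The abstract names a max–min fairness strategy for scheduling multiple graphs but does not give its formulation. As a rough illustration of the general idea only, the sketch below applies the textbook progressive-filling form of max–min fairness to a single shared resource. The function name `max_min_fair_share`, the `capacity` and `demands` parameters, and the single-resource model are assumptions made for this example and are not taken from E-Stream.

```python
def max_min_fair_share(capacity, demands):
    """Textbook max-min fair allocation of one resource (illustrative only).

    capacity: total amount of the shared resource (hypothetical unit).
    demands:  dict mapping an application name to its requested amount.
    Returns a dict mapping each application to its allocated amount.
    """
    allocation = {app: 0.0 for app in demands}
    remaining = float(capacity)
    # Visit applications from the smallest demand to the largest.
    pending = sorted(demands, key=lambda app: demands[app])
    while pending:
        fair_share = remaining / len(pending)
        smallest = pending[0]
        if demands[smallest] <= fair_share:
            # The smallest demand fits under the equal share: satisfy it fully
            # and redistribute what is left among the others.
            allocation[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            pending.pop(0)
        else:
            # Every remaining demand exceeds the equal share: split evenly.
            for app in pending:
                allocation[app] = fair_share
            break
    return allocation


if __name__ == "__main__":
    print(max_min_fair_share(10, {"graph_a": 2, "graph_b": 4, "graph_c": 8}))
```

In this toy case the two smaller requests are met in full and the largest one receives whatever is left, so no application can gain without reducing the share of one that already has less.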

Keywords

Elastic scheduling, Data stream graph, Streaming application, High-velocity stream, Big data computing

Notes

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61602428; the Fundamental Research Funds for the Central Universities under Grant No. 2652015338; and Melbourne-Chindia Cloud Computing (MC3) Research Network. We are grateful to Prof. Satish Srirama for his comments on improving the paper.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Information Engineering, China University of Geosciences, Beijing, People’s Republic of China
  2. Cloud Computing and Distributed Systems (CLOUDS) Laboratory, School of Computing and Information Systems, The University of Melbourne, Parkville, Australia
  3. School of Information Technology, Deakin University, Burwood, Australia
