Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams
Abstract
Online scheduling plays a key role for big data streaming applications in a big data stream computing environment, because the arrival rate of a high-velocity continuous data stream can fluctuate over time. In this paper, an elastic online scheduling framework for big data streaming applications (E-Stream) is proposed, with the following features. (1) It profiles the mathematical relationships between system response time, fairness among multiple applications, and the online features of high-velocity continuous streams. (2) It scales a data stream graph out or in by quantifying computation and communication cost and the vertex semantics for the arrival rate of the data stream, and adjusts the degree of parallelism of vertices in the graph. Subgraphs are further constructed to minimize data dependencies among them. (3) It elastically schedules a graph with a priority-based earliest-finish-time-first online scheduling strategy, and schedules multiple graphs with a max–min fairness strategy. (4) It evaluates the objectives of low system response time and acceptable application fairness in a real-world big data stream computing environment. Experimental results demonstrate that the proposed E-Stream provides better system response time and application fairness than the existing Storm framework.
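To make the max–min fairness idea mentioned in feature (3) concrete, the following is a minimal sketch of the textbook progressive-filling allocation, not the E-Stream implementation itself. It assumes a single divisible resource shared by several data stream graphs; the function name `max_min_fair_share` and the parameters `capacity` and `demands` are hypothetical and do not come from the paper.

```python
def max_min_fair_share(capacity, demands):
    """Generic max-min fair split of `capacity` among `demands`.

    Illustrative assumption: one divisible resource (for example, worker
    slots) shared by several graphs. Not taken from the E-Stream paper.
    """
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit requesters from the smallest demand to the largest.
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        # Equal share of what is left among unsatisfied requesters.
        share = remaining / len(pending)
        smallest = pending[0]
        if demands[smallest] <= share:
            # The smallest demand fits: grant it fully and redistribute the rest.
            allocation[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            pending.pop(0)
        else:
            # Nobody left can be fully satisfied: split the remainder equally.
            for i in pending:
                allocation[i] = share
            break
    return allocation


if __name__ == "__main__":
    print(max_min_fair_share(10, [2, 2.6, 4, 5]))
```

In this sketch, small demands are met in full and any capacity they leave unused is shared equally among the larger demands, so no application can gain resources without reducing the allocation of one that already has less.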
Keywords
Elastic scheduling, Data stream graph, Streaming application, High-velocity stream, Big data computing
Notes
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant No. 61602428; the Fundamental Research Funds for the Central Universities under Grant No. 2652015338; and Melbourne-Chindia Cloud Computing (MC3) Research Network. We are grateful to Prof. Satish Srirama for his comments on improving the paper.