With the development of cloud computing, more and more data-intensive workflows are being deployed on virtualized datacenters. As a result, the energy spent on massive data access is growing rapidly. This paper proposes an energy-aware scheduling algorithm that introduces a novel heuristic, called Minimal Data-Accessing Energy Path, for scheduling data-intensive workflows with the aim of reducing the energy consumed by intensive data access. Extensive experiments on both synthetic and real workloads are conducted to investigate the effectiveness and performance of the proposed scheduling approach. The experimental results show that the proposed heuristic can significantly reduce the energy consumed in storing and retrieving the intermediate data generated during the execution of data-intensive workflows. In addition, it is more robust than existing algorithms when cloud systems are subject to I/O-intensive workloads.
This work was supported by the National Natural Science Foundation of China under Grant Nos. 60970038 and 61272148, the Science and Technology Plan Project of Hunan Province of China under Grant No. 2012GK3075, and the Scientific Research Fund of Hunan Provincial Education Department of China under Grant No. 13B015.
Xiao, P., Hu, Z. & Zhang, Y. An Energy-Aware Heuristic Scheduling for Data-Intensive Workflows in Virtualized Datacenters. J. Comput. Sci. Technol. 28, 948–961 (2013). https://doi.org/10.1007/s11390-013-1390-9
Keywords
- cloud computing
- energy efficient
- heuristic scheduling
- data-intensive workflow