Improving the energy efficiency and performance of data-intensive workflows in virtualized clouds

The Journal of Supercomputing

Abstract

In recent years, deploying and running data-intensive workflows on cloud platforms has become increasingly popular in many areas. Unlike computation-intensive applications, a data-intensive workflow typically has to handle bulk data transfers between different resource sites, which makes some traditional energy-efficiency optimization techniques difficult to apply. In this paper, we first formulate a power model for data-intensive workflows that takes into account the power consumed by data transfers. Based on this power model, we introduce a novel metric called Shortest Path in terms of Energy Consumption and design an energy-efficient heuristic scheduling algorithm that aims to reduce the extra energy consumption caused by delays in bulk data transfers. Extensive experiments and performance evaluations show that the proposed scheduling algorithm can significantly reduce the overall energy consumption of running data-intensive workflows compared with several existing algorithms. In addition, the proposed algorithm exhibits better adaptiveness and robustness when a cloud system faces intensive and unpredictable workloads.



Acknowledgements

This work was supported by the Research Project of the Education Department of Hunan Province (No. 17K015).

Author information

Corresponding author

Correspondence to Xilong Qu.

About this article

Cite this article

Qu, X., Xiao, P. & Huang, L. Improving the energy efficiency and performance of data-intensive workflows in virtualized clouds. J Supercomput 74, 2935–2955 (2018). https://doi.org/10.1007/s11227-018-2344-3

  • DOI: https://doi.org/10.1007/s11227-018-2344-3
