The Journal of Supercomputing, Volume 74, Issue 7, pp 2935–2955

Improving the energy efficiency and performance of data-intensive workflows in virtualized clouds

  • Xilong Qu
  • Peng Xiao
  • Lirong Huang


In recent years, deploying and running data-intensive workflows on cloud platforms has become increasingly popular in many areas. Unlike computation-intensive applications, a data-intensive workflow typically has to handle bulk data transfers between different resource sites, which makes some traditional energy-efficiency optimization techniques difficult to apply. In this paper, we first formulate a power model for data-intensive workflows that takes into account the power consumption caused by data transfer. Based on this power model, we introduce a novel metric called Shortest Path in terms of Energy Consumption and design an energy-efficient heuristic scheduling algorithm that aims to reduce the extra energy consumption caused by delays in bulk data transfer. Extensive experiments and performance evaluations show that the proposed scheduling algorithm can significantly reduce the overall energy consumption of running data-intensive workflows compared with several existing algorithms. In addition, the proposed algorithm exhibits better adaptiveness and robustness when a cloud system faces intensive and unpredictable workloads.


Cloud computing, Data-intensive workflow, Quality of service, Makespan, Energy consumption



This work was supported by the Research Project of the Education Department of Hunan Province (No. 17K015).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Information Technology and Management, Hunan University of Finance and Economics, Changsha City, China
  2. School of Computer and Communication, Hunan Institute of Engineering, Xiangtan, China
