Cost-aware job scheduling for cloud instances using deep reinforcement learning

Cheng, Feng; Huang, Yifeng; Tanpure, Bhavana; Sawalani, Pawan; Cheng, Long; Liu, Cong

doi:10.1007/s10586-021-03436-8

Published: 16 October 2021

Cluster Computing, Volume 25, pages 619–631 (2022)

Feng Cheng¹,
Yifeng Huang²,
Bhavana Tanpure³,
Pawan Sawalani³,
Long Cheng²,⁴ (ORCID: orcid.org/0000-0003-1638-059X) &
Cong Liu⁵

Abstract

As cloud vendors offer better performance, auto-scaling, load balancing, and low infrastructure maintenance, more and more companies are migrating their services to the cloud. Because cloud workloads are dynamic and complex, scheduling the jobs submitted by users effectively is a challenging task. Although many advanced job scheduling approaches have been proposed in recent years, almost all of them are designed to handle batch jobs rather than real-time workloads, in which user requests can arrive at any time and in any number. To tackle this problem, we propose a Deep Reinforcement Learning (DRL) based job scheduler that dispatches jobs in real time. Specifically, we focus on scheduling user requests so as to meet the quality of service (QoS) expected by end users while significantly reducing the cost of executing jobs on virtual instances. We implement our method with a Deep Q-Network (DQN) model, and our experimental results demonstrate that our approach can significantly outperform commonly used real-time scheduling algorithms.
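To make the cost-versus-QoS trade-off concrete, the following is a minimal sketch, not the authors' implementation. It uses tabular Q-learning as a simplified stand-in for the DQN named in the abstract; a DQN would replace the table with a neural network trained toward the same update target. The instance types, prices, deadline penalty, reward shape, and state encoding are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical instance types: (speed in work units/second, price per second).
INSTANCES = [(1.0, 1.0), (2.0, 3.0), (4.0, 8.0)]
MISS_PENALTY = 10.0  # assumed penalty for violating a job's QoS deadline


def reward(job_size, deadline, wait, instance):
    """Assumed reward: negative execution cost, minus a penalty on a deadline miss."""
    speed, price = instance
    run_time = job_size / speed
    cost = run_time * price
    missed = wait + run_time > deadline
    return -cost - (MISS_PENALTY if missed else 0.0)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random instance with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; a DQN would fit a network to the same target."""
    target = r + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


if __name__ == "__main__":
    rng = random.Random(0)
    # State: job size bucket (0 = small, 1 = large); one Q-value per instance type.
    q = {s: [0.0] * len(INSTANCES) for s in (0, 1)}
    state = 0
    for _ in range(5000):
        job_size = 2.0 if state == 0 else 8.0
        action = epsilon_greedy(q[state], 0.1, rng)
        r = reward(job_size, deadline=5.0, wait=0.0, instance=INSTANCES[action])
        next_state = rng.randrange(2)  # next job arrives with a random size bucket
        q_update(q, state, action, r, next_state)
        state = next_state
    # Greedy instance choice learned for each job-size bucket.
    print({s: max(range(len(INSTANCES)), key=lambda a: q[s][a]) for s in q})
```

In this toy setting the scheduler is rewarded for choosing the cheapest instance that still finishes a job before its deadline, which is the general kind of trade-off the abstract describes; the actual state, action, and reward definitions used in the paper are not given in this excerpt.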

Acknowledgements

Part of this work was supported by the Undergraduate Education Research and Reform Project of Southwest Jiaotong University in 2020 (No. 20201035-07), the Fundamental Research Funds for the Central Universities (2021MS017), the National Science Foundation of China (61902222), and the Taishan Scholar Youth Program of Shandong Province (tsqn201909109).

Author information

Authors and Affiliations

School of Mathematics, Southwest Jiaotong University, Chengdu, China
Feng Cheng
School of Control and Computer Engineering, North China Electric Power University in Beijing, Beijing, China
Yifeng Huang & Long Cheng
School of Computing, Dublin City University, Dublin, Ireland
Bhavana Tanpure & Pawan Sawalani
The Insight SFI Research Centre for Data Analytics, Dublin City University, Dublin, Ireland
Long Cheng
School of Computer Science and Technology, Shandong University of Technology, Zibo, China
Cong Liu

Corresponding author

Correspondence to Long Cheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cheng, F., Huang, Y., Tanpure, B. et al. Cost-aware job scheduling for cloud instances using deep reinforcement learning. Cluster Comput 25, 619–631 (2022). https://doi.org/10.1007/s10586-021-03436-8

Received: 16 February 2021
Revised: 31 July 2021
Accepted: 24 September 2021
Published: 16 October 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10586-021-03436-8
