Continuous-Action Reinforcement Learning for Memory Allocation in Virtualized Servers
In a virtualized computing server (node) with multiple Virtual Machines (VMs), it is necessary to dynamically allocate memory among the VMs. In many cases, this is done considering only the memory demand of each VM, without a node-wide view. There are many solutions to the dynamic memory allocation problem, some of which use machine learning in some form.
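To illustrate the distinction, the sketch below contrasts a demand-only policy with a node-wide one that respects total capacity. The function names and the proportional-scaling policy are illustrative assumptions, not the paper's mechanism:

```python
def allocate_demand_only(demands, capacity):
    """Give each VM its raw demand, ignoring the node total.
    Over-commits the node whenever demands exceed capacity."""
    return list(demands)

def allocate_node_wide(demands, capacity):
    """Scale demands proportionally so the node-wide total fits capacity."""
    total = sum(demands)
    if total <= capacity:
        return list(demands)
    return [d * capacity / total for d in demands]
```

For example, with VM demands of 4, 4, and 8 GB on an 8 GB node, the demand-only policy over-commits by 8 GB, while the node-wide policy scales every VM to half its demand.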
This paper introduces CAVMem (Continuous-Action Algorithm for Virtualized Memory Management), a proof-of-concept mechanism for a decentralized dynamic memory allocation solution in virtualized nodes that applies a continuous-action reinforcement learning (RL) algorithm called Deep Deterministic Policy Gradient (DDPG). CAVMem with DDPG is compared with other RL algorithms such as Q-Learning (QL) and Deep Q-Learning (DQL) in an environment that models a virtualized node.
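The key property of DDPG that CAVMem exploits is its deterministic, continuous-action policy, trained by following the gradient of the critic's value estimate through the actor. The toy sketch below shows only that core update rule on a 1-D problem with a linear actor and an analytically known critic; the state/action semantics (memory pressure in, bid out) and all names are illustrative assumptions, not CAVMem's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D toy: state s is a VM's memory pressure,
# action a is a continuous bid in [-1, 1].
theta = np.array([0.0])  # linear deterministic actor: a = tanh(theta * s)

def actor(s):
    return np.tanh(theta * s)

# Toy "critic": here the action-value is known analytically,
# Q(s, a) = -(a - 0.5 * s) ** 2, so the optimal action is a* = 0.5 * s.
def dQ_da(s, a):
    return -2.0 * (a - 0.5 * s)

# Deterministic policy gradient ascent: dJ/dtheta = dQ/da * da/dtheta
lr = 0.05
for _ in range(500):
    s = rng.uniform(0.2, 1.0)        # sample a state (memory pressure)
    a = actor(s)
    da_dtheta = (1 - a ** 2) * s     # derivative of tanh(theta * s) w.r.t. theta
    theta += lr * dQ_da(s, a) * da_dtheta

# After training, the actor bids roughly half the observed pressure.
```

Full DDPG adds a learned critic, target networks, and a replay buffer on top of this update; the point here is only that the action is continuous, so no discretization of memory bid sizes is needed (unlike QL/DQL).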
In order to obtain linear scaling and to be able to dynamically add and remove VMs, CAVMem has one agent per VM, connected via a lightweight coordination mechanism. The agents learn how much memory to bid for or return in a given state, so that each VM obtains a fair level of performance subject to the available memory resources. Our results show that CAVMem with DDPG performs better than QL and a static allocation case, and is competitive with DQL. However, CAVMem incurs significantly lower training overhead than DQL, making the continuous-action approach a more cost-effective solution.
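A lightweight coordination step of this kind could look like the sketch below: each per-VM agent emits one continuous bid (fraction of its current allocation to request or return), and a coordinator admits requests only up to the memory actually available. The bid semantics and the rescaling policy are illustrative assumptions, not CAVMem's actual protocol:

```python
def coordinate(bids, allocations, capacity):
    """Apply continuous per-VM bids and keep the node within capacity.

    bids        : one value in [-1, 1] per VM; positive requests memory,
                  negative returns it (as a fraction of current allocation).
    allocations : current memory allocation per VM.
    capacity    : total node memory.
    """
    deltas = [b * m for b, m in zip(bids, allocations)]
    free = capacity - sum(allocations)
    requested = sum(d for d in deltas if d > 0)
    released = -sum(d for d in deltas if d < 0)
    # Memory available to grant: what is free plus what is being returned.
    avail = free + released
    scale = min(1.0, avail / requested) if requested > 0 else 1.0
    # Grants are scaled down when oversubscribed; returns are always applied.
    return [m + (d * scale if d > 0 else d)
            for m, d in zip(allocations, deltas)]
```

Because the coordinator only sums scalar bids, its cost grows linearly with the number of VMs, which matches the scaling goal stated above.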
Keywords: Reinforcement learning · Memory · Virtualization
This research is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA) and the European Union’s 7th Framework Programme under grant agreement number 610456 (Euroserver). It also received funding from the Spanish Ministry of Science and Technology (project TIN2015-65316-P), Generalitat de Catalunya (contract 2014-SGR-1272), and the Severo Ochoa Programme (SEV-2015-0493) of the Spanish Government.