Continuous-Action Reinforcement Learning for Memory Allocation in Virtualized Servers

  • Luis A. Garrido
  • Rajiv Nishtala
  • Paul Carpenter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11887)


In a virtualized computing server (node) with multiple Virtual Machines (VMs), memory must be dynamically allocated among the VMs. In many cases, this is done by considering only the memory demand of each VM, without a node-wide view. Many solutions exist for the dynamic memory allocation problem, some of which use machine learning in some form.

This paper introduces CAVMem (Continuous-Action Algorithm for Virtualized Memory Management), a proof-of-concept mechanism for a decentralized dynamic memory allocation solution in virtualized nodes that applies a continuous-action reinforcement learning (RL) algorithm called Deep Deterministic Policy Gradient (DDPG). CAVMem with DDPG is compared with other RL algorithms such as Q-Learning (QL) and Deep Q-Learning (DQL) in an environment that models a virtualized node.

To obtain linear scaling and to allow VMs to be added and removed dynamically, CAVMem has one agent per VM, and the agents are connected via a lightweight coordination mechanism. The agents learn how much memory to bid for or return, in a given state, so that each VM obtains a fair level of performance subject to the available memory resources. Our results show that CAVMem with DDPG performs better than QL and a static allocation case, and is competitive with DQL. CAVMem nevertheless incurs significantly less training overhead than DQL, making the continuous-action approach a more cost-effective solution.
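The bid-or-return idea described above can be illustrated with a minimal sketch. It is not the CAVMem implementation: the `VM` class, the `coordinate` function, and the proportional scaling rule are all hypothetical, chosen only to show how continuous per-VM requests (positive to bid, negative to return) might be reconciled against a shared free-memory pool. In CAVMem the request values would come from each VM's learned policy; here they are supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical record of one VM's current memory allocation."""
    name: str
    allocated_mb: float


def coordinate(vms, requests_mb, free_mb):
    """Apply one round of continuous memory requests.

    requests_mb maps VM name -> requested change in MB
    (positive = bid for memory, negative = return memory).
    Assumed rule: returns are honoured first, then bids are scaled
    down proportionally if they exceed the free pool.
    Returns (grants, remaining_free_mb).
    """
    grants = {vm.name: 0.0 for vm in vms}
    pool = free_mb

    # Step 1: returned memory joins the free pool.
    for vm in vms:
        req = requests_mb.get(vm.name, 0.0)
        if req < 0:
            give_back = min(-req, vm.allocated_mb)
            grants[vm.name] = -give_back
            pool += give_back

    # Step 2: scale bids so that their sum never exceeds the pool.
    bids = {vm.name: requests_mb.get(vm.name, 0.0) for vm in vms
            if requests_mb.get(vm.name, 0.0) > 0}
    total_bid = sum(bids.values())
    scale = min(1.0, pool / total_bid) if total_bid > 0 else 0.0
    for name, bid in bids.items():
        grants[name] = bid * scale
        pool -= grants[name]

    # Step 3: commit the grants to the VM allocations.
    for vm in vms:
        vm.allocated_mb += grants[vm.name]
    return grants, pool


vms = [VM("vm_a", 1024.0), VM("vm_b", 1024.0), VM("vm_c", 2048.0)]
grants, free_left = coordinate(
    vms, {"vm_a": 512.0, "vm_b": 512.0, "vm_c": -256.0}, free_mb=256.0
)
print(grants, free_left)
```

Because each request is a real number rather than one of a fixed set of step sizes, an agent can ask for exactly the amount it wants in a single round, which is the property that distinguishes a continuous-action method such as DDPG from the discrete-action QL and DQL baselines.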


Reinforcement learning, Memory, Virtualization



This research is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA) and the European Union’s 7th Framework Programme under grant agreement number 610456 (Euroserver). It also received funding from the Spanish Ministry of Science and Technology (project TIN2015-65316-P), Generalitat de Catalunya (contract 2014-SGR-1272), and the Severo Ochoa Programme (SEV-2015-0493) of the Spanish Government.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Luis A. Garrido (1)
  • Rajiv Nishtala (2)
  • Paul Carpenter (1)
  1. Barcelona Supercomputing Center, Barcelona, Spain
  2. Norwegian University of Science and Technology, Trondheim, Norway
