Improvement of Systems Management Policies Using Hybrid Reinforcement Learning

  • Gerald Tesauro
  • Nicholas K. Jong
  • Rajarshi Das
  • Mohamed N. Bennani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


Reinforcement Learning (RL) holds particular promise in the emerging application domain of performance management of computing systems. In recent work, online RL yielded effective server allocation policies in a prototype Data Center without explicit system models or built-in domain knowledge. This paper presents a substantially improved and more practical “hybrid” approach in which RL trains offline on data collected while a queuing-theoretic policy controls the system. This approach avoids the potentially poor performance of live online training. Additionally, we use nonlinear function approximators instead of tabular value functions; this greatly improves scalability and, surprisingly, eliminates the need for exploratory actions. In experiments using both open-loop and closed-loop traffic, as well as large switching delays, our results show significant performance improvement over state-of-the-art queuing model policies.
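The hybrid scheme described in the abstract can be illustrated as batch (offline) temporal-difference learning over a fixed log of transitions. The sketch below is a minimal illustration under stated assumptions and is not the authors' implementation. It assumes a SARSA(0)-style update, a one-hidden-layer tanh network as the nonlinear function approximator, and a one-dimensional normalized demand state. A hypothetical allocation rule and utility inside `make_log` stand in for the queuing-model policy and its service-level rewards. All names and parameters (`TinyMLP`, `make_log`, `offline_sarsa`, `greedy_action`, `gamma`, `lr`, `sweeps`) are hypothetical.

```python
import math
import random


class TinyMLP:
    """One-hidden-layer tanh network used as a nonlinear Q(state, action) approximator."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def sgd_step(self, x, target, lr):
        """One gradient step on 0.5 * (target - Q(x))**2; returns the pre-update error."""
        h = self._hidden(x)
        err = target - (sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        for j, hj in enumerate(h):
            # Backpropagate through tanh (derivative 1 - tanh^2) using the old output weight.
            g = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] += lr * err * hj
            self.b1[j] += lr * g
            for i, xi in enumerate(x):
                self.w1[j][i] += lr * g * xi
        self.b2 += lr * err
        return err


def features(state, action):
    # Hypothetical encoding: normalized demand plus the allocated fraction of servers.
    return [state[0], action]


def make_log(n=50, seed=1):
    """Hypothetical (s, a, r, s', a') transitions logged while a fixed rule controls the system."""
    rng = random.Random(seed)

    def rule(d):
        # Stand-in for the queuing-model policy: allocate in quarter steps of capacity.
        return round(d * 4) / 4

    log, demand = [], rng.random()
    for _ in range(n):
        action = rule(demand)
        reward = 1.0 - abs(demand - action) - 0.2 * action  # hypothetical utility minus server cost
        nxt = min(1.0, max(0.0, demand + rng.uniform(-0.2, 0.2)))
        log.append(((demand,), action, reward, (nxt,), rule(nxt)))
        demand = nxt
    return log


def mean_sq_td_error(log, net, gamma):
    total = 0.0
    for s, a, r, s2, a2 in log:
        td = r + gamma * net.predict(features(s2, a2)) - net.predict(features(s, a))
        total += td * td
    return total / len(log)


def offline_sarsa(log, net, gamma=0.5, lr=0.05, sweeps=300):
    """Batch SARSA(0): repeatedly sweep the fixed log; no new actions are taken."""
    for _ in range(sweeps):
        for s, a, r, s2, a2 in log:
            target = r + gamma * net.predict(features(s2, a2))  # bootstrapped TD target
            net.sgd_step(features(s, a), target, lr)
    return net


def greedy_action(net, state, actions):
    """Policy improvement step: pick the action with the highest learned Q."""
    return max(actions, key=lambda a: net.predict(features(state, a)))


if __name__ == "__main__":
    log = make_log()
    net = offline_sarsa(log, TinyMLP(n_in=2, n_hidden=8))
    print(greedy_action(net, (0.6,), [0.0, 0.25, 0.5, 0.75, 1.0]))
```

Every update uses only logged transitions, so the controlled system is never driven by a partially trained policy. In the idealized case, SARSA on data from a fixed policy estimates that policy's Q-function, and acting greedily on it (as in `greedy_action`) is one step of policy improvement. Actions the logged rule rarely or never took are scored only through the approximator's generalization. This is one plausible reading of why a function approximator could reduce the need for exploration.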





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gerald Tesauro (1)
  • Nicholas K. Jong (2)
  • Rajarshi Das (1)
  • Mohamed N. Bennani (3)

  1. IBM TJ Watson Research Center, Hawthorne, USA
  2. Dept. of Computer Sciences, Univ. of Texas, Austin, USA
  3. Dept. of Computer Science, George Mason Univ., Fairfax, USA
