On the use of hybrid reinforcement learning for autonomic resource allocation

Tesauro, Gerald; Jong, Nicholas K.; Das, Rajarshi; Bennani, Mohamed N.

doi:10.1007/s10586-007-0035-6

Original Paper
Published: 28 June 2007

Cluster Computing, Volume 10, pages 287–299 (2007)

Gerald Tesauro¹, Nicholas K. Jong², Rajarshi Das¹ & Mohamed N. Bennani³

Abstract

Reinforcement Learning (RL) provides a promising new approach to systems performance management that differs radically from standard queuing-theoretic approaches, which rely on explicit system performance models. In principle, RL can automatically learn high-quality management policies without an explicit performance model or traffic model, and with little or no built-in system-specific knowledge. In our original work (Das, R., Tesauro, G., Walsh, W.E.: IBM Research, Tech. Rep. RC23802 (2005); Tesauro, G.: In: Proc. of AAAI-05, pp. 886–891 (2005); Tesauro, G., Das, R., Walsh, W.E., Kephart, J.O.: In: Proc. of ICAC-05, pp. 342–343 (2005)) we showed the feasibility of using online RL to learn resource valuation estimates (in lookup table form) that can be used to make high-quality server allocation decisions in a multi-application prototype Data Center scenario. The present work shows how to combine the strengths of both RL and queuing models in a hybrid approach, in which RL trains offline on data collected while a queuing model policy controls the system. By training offline we avoid the potentially poor performance of live online training. We also now use RL to train nonlinear function approximators (e.g., multi-layer perceptrons) instead of lookup tables; this enables scaling to substantially larger state spaces. Our results now show that, in both open-loop and closed-loop traffic, hybrid RL training can achieve significant performance improvements over a variety of initial model-based policies. We also find that, as expected, RL can deal effectively with both transients and switching delays, which lie outside the scope of traditional steady-state queuing theory.
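
The hybrid idea in the abstract (log data while a model-based policy is in control, then train a value function offline on that log) can be illustrated with a minimal sketch. The abstract does not specify the learning rule, reward, state features, or network size, so everything below is an assumption made for illustration: a SARSA-style temporal-difference update, a toy one-application demand process, a hypothetical `model_policy` standing in for the queuing-model policy, a hypothetical pool size `MAX_SERVERS`, and a small hand-written network `TinyMLP`. It is not the authors' implementation.

```python
import math
import random

MAX_SERVERS = 4  # hypothetical size of the toy server pool


def model_policy(demand, rng):
    """Stand-in for a queuing-model policy (assumption): servers proportional
    to demand, occasionally perturbed so the log covers neighbouring actions."""
    servers = round(MAX_SERVERS * demand)
    if rng.random() < 0.2:
        servers += rng.choice([-1, 1])
    return min(MAX_SERVERS, max(0, servers))


def collect_log(policy, steps, rng):
    """Log (state, action, reward, next_state, next_action) while `policy` is in control."""
    log = []
    demand = rng.random()
    servers = policy(demand, rng)
    for _ in range(steps):
        reward = -abs(demand - servers / MAX_SERVERS)  # toy utility: penalize mismatch
        next_demand = min(1.0, max(0.0, demand + rng.gauss(0.0, 0.1)))
        next_servers = policy(next_demand, rng)
        log.append((demand, servers, reward, next_demand, next_servers))
        demand, servers = next_demand, next_servers
    return log


class TinyMLP:
    """One tanh hidden layer mapping (demand, servers) to a scalar value estimate."""

    def __init__(self, hidden, rng):
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(self.w1, self.b1)]

    def value(self, demand, servers):
        x = (demand, servers / MAX_SERVERS)
        return sum(w * h for w, h in zip(self.w2, self._hidden(x))) + self.b2

    def sgd_step(self, demand, servers, target, lr):
        """One gradient step on 0.5 * (value - target)^2; returns the error."""
        x = (demand, servers / MAX_SERVERS)
        hid = self._hidden(x)
        err = sum(w * h for w, h in zip(self.w2, hid)) + self.b2 - target
        for j, h in enumerate(hid):
            grad_pre = err * self.w2[j] * (1.0 - h * h)  # uses w2[j] before its update
            self.w2[j] -= lr * err * h
            self.b1[j] -= lr * grad_pre
            self.w1[j][0] -= lr * grad_pre * x[0]
            self.w1[j][1] -= lr * grad_pre * x[1]
        self.b2 -= lr * err
        return err


def train_offline(log, net, gamma=0.5, lr=0.05, sweeps=20):
    """Repeated SARSA-style sweeps over the fixed log; returns mean squared TD error per sweep."""
    history = []
    for _ in range(sweeps):
        total = 0.0
        for d, a, r, d2, a2 in log:
            target = r + gamma * net.value(d2, a2)  # bootstrap target, held fixed for this step
            total += net.sgd_step(d, a, target, lr) ** 2
        history.append(total / len(log))
    return history


def greedy_servers(net, demand):
    """Allocation that the trained value function prefers for a given demand level."""
    return max(range(MAX_SERVERS + 1), key=lambda a: net.value(demand, a))


rng = random.Random(0)
log = collect_log(model_policy, 500, rng)  # data gathered under the model-based policy
net = TinyMLP(8, rng)
history = train_offline(log, net)          # no live system is touched during training
print(f"mean squared TD error: first sweep {history[0]:.4f}, last sweep {history[-1]:.4f}")
print("preferred servers at demand 0.5:", greedy_servers(net, 0.5))
```

In this sketch the learner never controls the toy system while it trains, which mirrors the abstract's point about avoiding poor performance during live online training. The trained network could then replace the initial policy through `greedy_servers`. The random perturbation in `model_policy` is a convenience of the toy setup so that the log contains more than one action per demand level.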

References

Das, R., Tesauro, G., Walsh, W.E.: Model-based and model-free approaches to autonomic resource allocation. IBM Research, Tech. Rep. RC23802 (2005)
Tesauro, G.: Online resource allocation using decompositional reinforcement learning. In: Proc. of AAAI-05, pp. 886–891 (2005)
Tesauro, G., Das, R., Walsh, W.E., Kephart, J.O.: Utility-function-driven resource allocation in autonomic systems. In: Proc. of ICAC-05, pp. 342–343 (2005)
Hellerstein, J.L., Diao, Y., Parekh, S., Tilbury, D.M.: Feedback Control of Computing Systems. Wiley (2004)
Menascé, D.A., Almeida, V.A.F., Dowdy, L.W.: Performance by Design: Computer Capacity Planning by Example. Prentice Hall, Upper Saddle River (2004)
Urgaonkar, B., Pacifici, G., Shenoy, P., Spreitzer, M., Tantawi, A.: An analytical model for multi-tier internet services and its applications. In: Proc. of SIGMETRICS-05, pp. 291–302 (2005)
Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Computer 36(1), 41–52 (2003)
Tesauro, G., Jong, N.K., Das, R., Bennani, M.N.: A hybrid reinforcement learning approach to autonomic resource allocation. In: Proc. of ICAC-06, pp. 65–73 (2006)
Vengerov, D., Iakovlev, N.: A reinforcement learning framework for dynamic resource allocation: First results. In: Proc. of ICAC-05, pp. 339–340 (2005)
Vengerov, D.: A reinforcement learning framework for utility-based scheduling in resource-constrained systems. Sun Microsystems, Tech. Rep. TR-2005-141 (2005)
Whiteson, S., Stone, P.: Adaptive job routing and scheduling. Eng. Appl. Artif. Intell. 17(7), 855–869 (2004)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT, Cambridge (1998)
Tesauro, G.: Temporal difference learning and TD-Gammon. Commun. ACM 38(3), 58–68 (1995)
Moody, J., Saffell, M.: Learning to trade via direct reinforcement. IEEE Trans. Neural Netw. 12(4), 875–889 (2001)
Ng, A.Y., et al.: Inverted autonomous helicopter flight via reinforcement learning. In: Intl. Symposium on Experimental Robotics (2004)
Bellman, R.E.: Dynamic Programming. Princeton University Press (1957)
Walsh, W.E., Tesauro, G., Kephart, J.O., Das, R.: Utility functions in autonomic systems. In: Proc. of ICAC-04, pp. 70–77 (2004)
IBM: WebSphere benchmark sample, http://www-306.ibm.com/software/webservers/appserv/benchmark3.html (2004)
Squillante, M.S., Yao, D.D., Zhang, L.: Internet traffic: Periodicity, tail behavior and performance implications. In: Gelenbe, E. (ed.) System Performance Evaluation: Methodologies and Applications, pp. 23–37. CRC (1999)
Singh, S., Cohn, D.: How to dynamically merge Markov decision processes. In: Jordan, M.I., Kearns, M.J., Solla, S.A. (eds.) Advances in Neural Information Processing Systems, vol. 10, pp. 1057–1063. MIT (1998)
Schaal, S.: Learning from demonstration. In: Mozer, M.C., et al. (eds.) Advances in Neural Information Processing Systems, vol. 9, pp. 1040–1046. MIT (1997)
Smart, W.D., Kaelbling, L.P.: Effective reinforcement learning for mobile robots. In: Proc. of Intl. Conf. on Robotics and Automation (ICRA-02) (2002)
Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. J. AI Res. 19, 569–629 (2003)
Barron, A.R.: Complexity regularization with application to artificial neural networks. In: Roussas, G. (ed.) Nonparametric Functional Estimation and Related Topics (1991)
Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. J. Mach. Learn. Res. 4, 1107–1149 (2003)
Sridharan, M., Tesauro, G.: Multi-agent Q-learning and regression trees for automated pricing decisions. In: Proc. 17th Intl. Conf. on Machine Learning, pp. 927–934. Kaufmann, San Francisco (2000)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. In: Rumelhart, D.E., McClelland, J.L., et al. (eds.) Foundations. Parallel Distributed Processing, vol. 1, pp. 318–362. MIT, Cambridge (1987)
Baird, L.: Residual algorithms: Reinforcement learning with function approximation. In: Proc. of ICML-95 (1995)
Watkins, C.: Learning from delayed rewards. Ph.D. dissertation, Cambridge University (1989)
Pradhan, P., Tewari, R., Sahu, S., Chandra, C., Shenoy, P.: An observation-based approach towards self-managing web servers. In: Proc. of Intl. Workshop on Quality of Service, pp. 13–22 (2002)
Chandra, A., Gong, W., Shenoy, P.: Dynamic resource allocation for shared data centers using online measurements. In: Proc. of ACM/IEEE Intl. Workshop on Quality of Service (IWQoS), pp. 381–400 (2003)
Bennani, M.N., Menascé, D.A.: Assessing the robustness of self-managing computer systems under variable workloads. In: Proc. of ICAC-04, pp. 62–69 (2004)
Bennani, M.N., Menascé, D.A.: Resource allocation for autonomic data centers using analytic performance models. In: Proc. of ICAC-05, pp. 229–240 (2005)
IBM: WebSphere Extended Deployment, www.ibm.com/software/webservers/appserv/extend/ (2006)
IBM: Tivoli Intelligent Orchestrator product overview, http://www.ibm.com/software/tivoli/products/intell-orch (2005)
IBM: PowerExecutive, www.ibm.com/systems/management/director/extensions/powerexec.html (2006)

Author information

Authors and Affiliations

IBM TJ Watson Research Center, 19 Skyline Drive, Hawthorne, NY, 10532, USA
Gerald Tesauro & Rajarshi Das
Dept. of Computer Sciences, Univ. of Texas, Austin, TX, 78712, USA
Nicholas K. Jong
Oracle Inc., 1211 SW Fifth Ave., Portland, OR, 97204, USA
Mohamed N. Bennani

Corresponding author

Correspondence to Gerald Tesauro.

About this article

Cite this article

Tesauro, G., Jong, N.K., Das, R. et al. On the use of hybrid reinforcement learning for autonomic resource allocation. Cluster Comput 10, 287–299 (2007). https://doi.org/10.1007/s10586-007-0035-6

Published: 28 June 2007
Issue Date: September 2007
DOI: https://doi.org/10.1007/s10586-007-0035-6
