Simple Model-Based Exploration and Exploitation of Markov Decision Processes Using the Elimination Algorithm

  • Conference paper
MICAI 2007: Advances in Artificial Intelligence (MICAI 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4827)

Abstract

The fundamental problem in learning and planning of Markov Decision Processes is how the agent should explore and exploit an uncertain environment. The classical solutions to this problem are essentially heuristics that lack proper theoretical justification. As a result, principled solutions based on Bayesian estimation, though intractable in all but the smallest cases, have recently been investigated. The common approach is to approximate Bayesian estimation with sophisticated methods that cope with the intractability of computing the Bayesian posterior. However, we observe that the complexity of these approximations still prevents their use, since the gain in long-term reward appears to be diminished by the difficulties of implementation. In this work, we propose a deliberately simplistic model-based algorithm to show the benefits of Bayesian estimation over classical model-free solutions. In particular, our agent combines several Markov chains from its belief state and uses the matrix-based Elimination Algorithm to find the best action to take. We test our agent on the three standard problems Chain, Loop, and Maze, and find that it outperforms classical Q-Learning with the ε-Greedy, Boltzmann, and Interval Estimation action-selection heuristics.
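To make the approach concrete, below is a minimal, illustrative Python sketch of a Bayesian model-based loop of the kind the abstract describes, run on the standard 5-state Chain benchmark. Everything specific in it is an assumption of the sketch rather than the paper's method: the Dirichlet(1) priors and reward averaging, the posterior-sampling step, and especially the planner, where plain value iteration stands in for the paper's matrix-based Elimination Algorithm.

```python
import numpy as np

# Hypothetical constants for the sketch: 5-state Chain, two actions,
# discount 0.95, and a 0.2 chance that the chosen action "slips".
N_STATES, N_ACTIONS, GAMMA, SLIP = 5, 2, 0.95, 0.2

def chain_step(s, a, rng):
    """One step of the Chain benchmark: action 0 advances along the chain
    (reward 10 for staying at the far end), action 1 resets to the start
    for a reward of 2. The intended action slips with probability SLIP."""
    if rng.random() < SLIP:
        a = 1 - a
    if a == 0:
        return min(s + 1, N_STATES - 1), (10.0 if s == N_STATES - 1 else 0.0)
    return 0, 2.0

def plan(P, R, n_iter=200):
    """Greedy Q-values for a sampled model, via plain value iteration;
    this is only a stand-in for the paper's Elimination Algorithm planner."""
    V = np.zeros(N_STATES)
    for _ in range(n_iter):
        Q = R + GAMMA * P @ V          # Q has shape (states, actions)
        V = Q.max(axis=1)
    return Q

rng = np.random.default_rng(0)
counts = np.ones((N_STATES, N_ACTIONS, N_STATES))  # Dirichlet(1) transition prior
r_sum = np.zeros((N_STATES, N_ACTIONS))            # running reward statistics
r_cnt = np.ones((N_STATES, N_ACTIONS))

s, total = 0, 0.0
for t in range(1000):
    # Draw one transition model from the posterior: for each state-action
    # pair, a Markov chain row sampled from the Dirichlet belief state.
    P = np.array([[rng.dirichlet(counts[i, a]) for a in range(N_ACTIONS)]
                  for i in range(N_STATES)])       # shape (S, A, S)
    Q = plan(P, r_sum / r_cnt)
    a = int(Q[s].argmax())                         # act greedily in the sample
    s2, r = chain_step(s, a, rng)
    counts[s, a, s2] += 1                          # posterior update
    r_sum[s, a] += r
    r_cnt[s, a] += 1
    total, s = total + r, s2

print(f"total reward over 1000 steps: {total:.0f}")
```

In sketches like this, acting greedily in a model drawn from the belief state makes poorly explored transitions occasionally look promising, so exploration emerges from the posterior itself rather than from an ε-Greedy or Boltzmann perturbation of a point estimate.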

Editor information

Alexander Gelbukh, Ángel Fernando Kuri Morales

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Novoa, E. (2007). Simple Model-Based Exploration and Exploitation of Markov Decision Processes Using the Elimination Algorithm. In: Gelbukh, A., Kuri Morales, Á.F. (eds) MICAI 2007: Advances in Artificial Intelligence. MICAI 2007. Lecture Notes in Computer Science (LNAI), vol 4827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76631-5_31

  • DOI: https://doi.org/10.1007/978-3-540-76631-5_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76630-8

  • Online ISBN: 978-3-540-76631-5

  • eBook Packages: Computer Science (R0)
