Simple Model-Based Exploration and Exploitation of Markov Decision Processes Using the Elimination Algorithm

  • Conference paper
MICAI 2007: Advances in Artificial Intelligence (MICAI 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4827)

Abstract

The fundamental problem in learning and planning of Markov Decision Processes is how the agent should explore and exploit an uncertain environment. The classical solutions to this problem are essentially heuristics that lack proper theoretical justification. As a result, principled solutions based on Bayesian estimation, though intractable in all but the smallest cases, have recently been investigated. The common approach is to approximate Bayesian estimation with sophisticated methods that cope with the intractability of computing the Bayesian posterior. However, we observe that the complexity of these approximations still prevents their use, since the gain in long-term reward appears to be diminished by the difficulties of implementation. In this work, we propose a deliberately simplistic model-based algorithm to show the benefits of Bayesian estimation over classical model-free solutions. In particular, our agent combines several Markov chains from its belief state and uses the matrix-based Elimination Algorithm to find the best action to take. We test our agent on the three standard problems Chain, Loop, and Maze, and find that it outperforms classical Q-Learning with the ε-Greedy, Boltzmann, and Interval Estimation action-selection heuristics.
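To make the approach concrete, below is a minimal, illustrative Python sketch of a Bayesian model-based loop of the kind the abstract describes, run on the standard 5-state Chain benchmark. Everything specific in it is an assumption of the sketch rather than the paper's method: the Dirichlet(1) priors and reward averaging, the posterior-sampling step, and especially the planner, where plain value iteration stands in for the paper's matrix-based Elimination Algorithm.

```python
import numpy as np

# Hypothetical constants for the sketch: 5-state Chain, two actions,
# discount 0.95, and a 0.2 chance that the chosen action "slips".
N_STATES, N_ACTIONS, GAMMA, SLIP = 5, 2, 0.95, 0.2

def chain_step(s, a, rng):
    """One step of the Chain benchmark: action 0 advances along the chain
    (reward 10 for staying at the far end), action 1 resets to the start
    for a reward of 2. The intended action slips with probability SLIP."""
    if rng.random() < SLIP:
        a = 1 - a
    if a == 0:
        return min(s + 1, N_STATES - 1), (10.0 if s == N_STATES - 1 else 0.0)
    return 0, 2.0

def plan(P, R, n_iter=200):
    """Greedy Q-values for a sampled model, via plain value iteration;
    this is only a stand-in for the paper's Elimination Algorithm planner."""
    V = np.zeros(N_STATES)
    for _ in range(n_iter):
        Q = R + GAMMA * P @ V          # Q has shape (states, actions)
        V = Q.max(axis=1)
    return Q

rng = np.random.default_rng(0)
counts = np.ones((N_STATES, N_ACTIONS, N_STATES))  # Dirichlet(1) transition prior
r_sum = np.zeros((N_STATES, N_ACTIONS))            # running reward statistics
r_cnt = np.ones((N_STATES, N_ACTIONS))

s, total = 0, 0.0
for t in range(1000):
    # Draw one transition model from the posterior: for each state-action
    # pair, a Markov chain row sampled from the Dirichlet belief state.
    P = np.array([[rng.dirichlet(counts[i, a]) for a in range(N_ACTIONS)]
                  for i in range(N_STATES)])       # shape (S, A, S)
    Q = plan(P, r_sum / r_cnt)
    a = int(Q[s].argmax())                         # act greedily in the sample
    s2, r = chain_step(s, a, rng)
    counts[s, a, s2] += 1                          # posterior update
    r_sum[s, a] += r
    r_cnt[s, a] += 1
    total, s = total + r, s2

print(f"total reward over 1000 steps: {total:.0f}")
```

In sketches like this, acting greedily in a model drawn from the belief state makes poorly explored transitions occasionally look promising, so exploration emerges from the posterior itself rather than from an ε-Greedy or Boltzmann perturbation of a point estimate.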

Editor information

Alexander Gelbukh, Ángel Fernando Kuri Morales

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Novoa, E. (2007). Simple Model-Based Exploration and Exploitation of Markov Decision Processes Using the Elimination Algorithm. In: Gelbukh, A., Kuri Morales, Á.F. (eds) MICAI 2007: Advances in Artificial Intelligence. MICAI 2007. Lecture Notes in Computer Science (LNAI), vol 4827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76631-5_31

  • DOI: https://doi.org/10.1007/978-3-540-76631-5_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76630-8

  • Online ISBN: 978-3-540-76631-5

  • eBook Packages: Computer Science (R0)
