Solving Hidden-Semi-Markov-Mode Markov Decision Problems

  • Conference paper
Scalable Uncertainty Management (SUM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8720)

Abstract

Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. In this paper, we introduce Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments evolving according to a semi-Markov chain. Like HM-MDPs, HS3MDPs form a subclass of Partially Observable Markov Decision Processes. Large instances of HS3MDPs (and HM-MDPs) can therefore be solved with an online algorithm, Partially Observable Monte Carlo Planning (POMCP), which is based on Monte Carlo Tree Search and uses particle filters to approximate belief states. We propose a first adaptation of POMCP that solves HS3MDPs more efficiently by exploiting their structure. Our empirical results show that this adapted POMCP reaches higher cumulative rewards than the original algorithm. On larger instances, however, POMCP may run out of particles. To address this issue, we propose a second adaptation of POMCP that replaces particle filters with exact representations of beliefs. Our empirical results indicate that this version reaches high cumulative rewards faster than the first adaptation and remains efficient even on large problems.
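
The second adaptation described above replaces POMCP's particle filter with an exact belief representation. As a minimal sketch of what such an update can look like (not the paper's implementation; the belief layout over (mode, remaining-duration) pairs and the names update_belief, T, C, D are assumptions made for illustration), one exact Bayes update in Python:

    import numpy as np

    def update_belief(b, s, a, s_next, T, C, D):
        """One exact Bayes update of a joint belief b[m, h] over pairs
        (mode m, remaining duration h) after observing the transition
        (s, a, s_next). Illustrative model, not the paper's code:
          T[m, a, s, s'] : P(s' | s, a) under mode m
          C[m, m']       : mode transition matrix of the semi-Markov chain
          D[m', k]       : P(new mode m' lasts k further steps)
        Returns the normalized posterior belief."""
        n_modes, max_dur = b.shape
        post = np.zeros_like(b)
        for m in range(n_modes):
            # Likelihood of the observed state transition under mode m.
            like = T[m, a, s, s_next]
            if like == 0.0:
                continue
            # Remaining duration h > 0: the mode persists, h ticks down.
            post[m, : max_dur - 1] += like * b[m, 1:]
            # Remaining duration h = 0: the semi-Markov chain switches to
            # a new mode m' and draws a fresh duration for it.
            post += like * b[m, 0] * C[m][:, None] * D
        z = post.sum()
        if z == 0.0:
            raise ValueError("transition has zero probability under every mode")
        return post / z

Tracking the pair (mode, remaining duration) is what keeps the semi-Markov case tractable: conditioned on that pair the dynamics are Markovian, so the update above is a standard Bayes filter, and the belief cannot degenerate the way a finite particle set can when observations become unlikely.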

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hadoux, E., Beynier, A., Weng, P. (2014). Solving Hidden-Semi-Markov-Mode Markov Decision Problems. In: Straccia, U., Calì, A. (eds) Scalable Uncertainty Management. SUM 2014. Lecture Notes in Computer Science, vol 8720. Springer, Cham. https://doi.org/10.1007/978-3-319-11508-5_15

  • DOI: https://doi.org/10.1007/978-3-319-11508-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11507-8

  • Online ISBN: 978-3-319-11508-5

  • eBook Packages: Computer Science (R0)
