Solving Hidden-Semi-Markov-Mode Markov Decision Problems

  • Emmanuel Hadoux
  • Aurélie Beynier
  • Paul Weng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8720)

Abstract

Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. In this paper, we introduce Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments evolving according to a semi-Markov chain. Like HM-MDPs, HS3MDPs form a subclass of Partially Observable Markov Decision Processes. Large instances of HS3MDPs (and HM-MDPs) can therefore be solved with an online algorithm, Partially Observable Monte Carlo Planning (POMCP), which is based on Monte Carlo Tree Search and uses particle filters to approximate belief states. We propose a first adaptation of POMCP that solves HS3MDPs more efficiently by exploiting their structure. Our empirical results show that this adapted POMCP reaches higher cumulative rewards than the original algorithm. On larger instances, however, POMCP may run out of particles. To address this issue, we propose a second adaptation of POMCP that replaces particle filters with exact representations of beliefs. Our empirical results indicate that this version reaches high cumulative rewards faster than the first adaptation and remains efficient even on large problems.
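
The second adaptation described above replaces POMCP's particle filter with an exact belief over the hidden mode and its remaining duration. The sketch below illustrates what such an exact update could look like; it is a minimal reconstruction, not the authors' code, and the array layouts, the names (`T`, `C`, `D`, `h_max`), and the switch-timing convention are all assumptions made for illustration.

```python
# Minimal illustrative sketch (not the authors' code) of an exact belief
# update over the hidden (mode, remaining-duration) pairs of an HS3MDP,
# the kind of representation the second POMCP adaptation substitutes for
# particle filters. Array layouts and the switch-timing convention below
# are assumptions made for this example.

import numpy as np

def update_belief(b, s, a, s_next, T, C, D):
    """One exact Bayesian filtering step for the hidden mode of an HS3MDP.

    b      : (n_modes, h_max) array, b[m, h] = belief in mode m with
             h remaining steps before a mode switch
    s, a, s_next : observed state, action taken, observed next state
    T      : (n_modes, n_states, n_actions, n_states) array,
             T[m, s, a, s'] = P(s' | s, a) under mode m
    C      : (n_modes, n_modes) mode transition matrix, C[m, m'] = P(m' | m)
    D      : (n_modes, n_modes, h_max) duration model,
             D[m, m', h'] = P(h' | switch from m to m')

    Assumed convention: the current mode m generates the observed state
    transition; then, if its remaining duration h > 0, h decrements,
    otherwise a new mode m' and duration h' are drawn from C and D.
    """
    n_modes, h_max = b.shape
    b_next = np.zeros_like(b)

    for m in range(n_modes):
        lik = T[m, s, a, s_next]          # evidence from the observed step
        # h > 0: the mode persists and its remaining duration decreases.
        b_next[m, : h_max - 1] += b[m, 1:] * lik
        # h = 0: the mode ends; its mass is redistributed over (m', h').
        for m2 in range(n_modes):
            b_next[m2, :] += b[m, 0] * lik * C[m, m2] * D[m, m2, :]

    z = b_next.sum()
    return b_next / z if z > 0 else b_next   # normalize the posterior
```

Because this belief lives on a fixed grid of (mode, duration) pairs rather than a particle set, it cannot degenerate over long runs, which is consistent with the abstract's claim that the second adaptation remains efficient even on large problems.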

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Emmanuel Hadoux¹
  • Aurélie Beynier¹
  • Paul Weng¹

  1. Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, Paris, France
