Solving Hidden-Semi-Markov-Mode Markov Decision Problems

  • Conference paper
Scalable Uncertainty Management (SUM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8720)

Abstract

Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. In this paper, we introduce Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments evolving according to a semi-Markov chain. Like HM-MDPs, HS3MDPs form a subclass of Partially Observable Markov Decision Processes. Large instances of HS3MDPs (and HM-MDPs) can therefore be solved with an online algorithm, Partially Observable Monte Carlo Planning (POMCP), which is based on Monte Carlo Tree Search and uses particle filters to approximate belief states. We propose a first adaptation of POMCP that solves HS3MDPs more efficiently by exploiting their structure. Our empirical results show that this adapted POMCP reaches higher cumulative rewards than the original algorithm. On larger instances, however, POMCP may run out of particles. To address this issue, we propose a second adaptation of POMCP that replaces particle filters with exact representations of beliefs. Our empirical results indicate that this version reaches high cumulative rewards faster than the first adaptation and remains efficient even on large problems.
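
The second adaptation described above replaces POMCP's particle filter with an exact belief representation. As a minimal sketch of what such an update can look like (not the paper's implementation; the belief layout over (mode, remaining-duration) pairs and the names update_belief, T, C, D are assumptions made for illustration), one exact Bayes update in Python:

    import numpy as np

    def update_belief(b, s, a, s_next, T, C, D):
        """One exact Bayes update of a joint belief b[m, h] over pairs
        (mode m, remaining duration h) after observing the transition
        (s, a, s_next). Illustrative model, not the paper's code:
          T[m, a, s, s'] : P(s' | s, a) under mode m
          C[m, m']       : mode transition matrix of the semi-Markov chain
          D[m', k]       : P(new mode m' lasts k further steps)
        Returns the normalized posterior belief."""
        n_modes, max_dur = b.shape
        post = np.zeros_like(b)
        for m in range(n_modes):
            # Likelihood of the observed state transition under mode m.
            like = T[m, a, s, s_next]
            if like == 0.0:
                continue
            # Remaining duration h > 0: the mode persists, h ticks down.
            post[m, : max_dur - 1] += like * b[m, 1:]
            # Remaining duration h = 0: the semi-Markov chain switches to
            # a new mode m' and draws a fresh duration for it.
            post += like * b[m, 0] * C[m][:, None] * D
        z = post.sum()
        if z == 0.0:
            raise ValueError("transition has zero probability under every mode")
        return post / z

Tracking the pair (mode, remaining duration) is what keeps the semi-Markov case tractable: conditioned on that pair the dynamics are Markovian, so the update above is a standard Bayes filter, and the belief cannot degenerate the way a finite particle set can when observations become unlikely.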

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hadoux, E., Beynier, A., Weng, P. (2014). Solving Hidden-Semi-Markov-Mode Markov Decision Problems. In: Straccia, U., Calì, A. (eds) Scalable Uncertainty Management. SUM 2014. Lecture Notes in Computer Science, vol 8720. Springer, Cham. https://doi.org/10.1007/978-3-319-11508-5_15

  • DOI: https://doi.org/10.1007/978-3-319-11508-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11507-8

  • Online ISBN: 978-3-319-11508-5

  • eBook Packages: Computer Science (R0)
