Abstract
We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this problem, which is undecidable in general, by computing under-approximations of the optimal expected total rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple cut-off technique that uses a good policy on the POMDP, and a more advanced technique, belief clipping, that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.
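The abstract's two ingredients, the belief MDP and the cut-off under-approximation, can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the 2-state model, its probabilities, and the per-state lower bounds `V_under` are all hypothetical.

```python
import numpy as np

# P[a][s, s'] : transition probabilities for action a (hypothetical model)
P = {0: np.array([[0.9, 0.1],
                  [0.0, 1.0]])}
# O[s', z] : probability of observing z in successor state s'
O = np.array([[0.8, 0.2],   # state 0 mostly emits observation 0
              [0.3, 0.7]])  # state 1 mostly emits observation 1

def belief_update(b, a, z):
    """Bayes update: b'(s') is proportional to O(z|s') * sum_s b(s) * P(s'|s,a)."""
    unnormalized = O[:, z] * (b @ P[a])
    total = unnormalized.sum()
    if total == 0.0:
        return None  # observation z is impossible under belief b and action a
    return unnormalized / total

b = np.array([0.5, 0.5])          # initial belief: uniform over both states
b_next = belief_update(b, a=0, z=1)

# Cut-off idea: at a belief where the unfolding stops, under-approximate its
# value by the reward a fixed (possibly suboptimal) POMDP policy achieves.
# With per-state lower bounds V_under, the belief-weighted sum is a valid
# lower bound on the optimal value at b_next.
V_under = np.array([4.0, 0.0])           # hypothetical lower bounds per state
cutoff_value = float(b_next @ V_under)   # lower bound on the value at b_next
```

Exploring only a finite unfolding of the (generally infinite) belief MDP and plugging such cut-off values into the frontier beliefs yields an under-approximation of the expected total reward.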
This work is funded by the DFG RTG 2236 “UnRAVeL”.
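The belief-clipping step mentioned in the abstract can also be sketched. The paper's MILP additionally selects the candidate belief; with the candidate `b_tilde` fixed, finding the minimal probability shift reduces to a plain LP, which we solve here with SciPy. All concrete numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def clip_belief(b, b_tilde):
    """Minimize the clipping value Delta with shifts delta(s) >= 0 such that
    b(s) - delta(s) = (1 - Delta) * b_tilde(s) for every state s."""
    n = len(b)
    # Variables: [delta_0, ..., delta_{n-1}, Delta]; objective: minimize Delta.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    # Rearranged constraint: delta_s - Delta * b_tilde(s) = b(s) - b_tilde(s).
    A_eq = np.hstack([np.eye(n), -b_tilde.reshape(-1, 1)])
    b_eq = b - b_tilde
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * (n + 1))
    return res.x[:n], res.x[-1]

b = np.array([0.3, 0.7])        # belief to be clipped (hypothetical)
b_tilde = np.array([0.5, 0.5])  # fixed candidate belief, e.g. a grid point
delta, Delta = clip_belief(b, b_tilde)
# Here Delta = 0.4: shifting mass 0.4 away from state 1 turns b into a
# (1 - Delta)-scaled copy of b_tilde, so a value bound at b_tilde plus a
# correction for the clipped mass yields a lower bound at b.
```

Keeping the shift Delta minimal keeps the induced loss in the value bound small, which is why the paper searches for minimal probability shifts.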
Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
© 2022 The Author(s)
Cite this paper
Bork, A., Katoen, JP., Quatmann, T. (2022). Under-Approximating Expected Total Rewards in POMDPs. In: Fisman, D., Rosu, G. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2022. Lecture Notes in Computer Science, vol 13244. Springer, Cham. https://doi.org/10.1007/978-3-030-99527-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99526-3
Online ISBN: 978-3-030-99527-0
Published in cooperation with ETAPS (http://www.etaps.org/).