International Conference on Tools and Algorithms for the Construction and Analysis of Systems

TACAS 2022: Tools and Algorithms for the Construction and Analysis of Systems, pp. 22–40

Under-Approximating Expected Total Rewards in POMDPs

  • Alexander Bork (ORCID: orcid.org/0000-0002-7026-228X),
  • Joost-Pieter Katoen (ORCID: orcid.org/0000-0002-6143-1926) &
  • Tim Quatmann (ORCID: orcid.org/0000-0002-2843-5511)
  • Conference paper
  • Open Access
  • First Online: 30 March 2022

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13244)

Abstract

We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this—generally undecidable—problem by computing under-approximations on these total expected rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.

This work is funded by the DFG RTG 2236 “UnRAVeL”.
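To make the cut-off idea from the abstract concrete, below is a minimal sketch on a hypothetical toy POMDP. The model, all names, and the numbers are illustrative assumptions, not taken from the paper or its implementation. The sketch unfolds the belief MDP to a finite depth and substitutes the trivially sound cut-off value 0 at the frontier, which is a valid under-approximation because all rewards are non-negative; the paper instead derives tighter cut-off values from a fixed POMDP policy and additionally refines beliefs via MILP-based clipping, neither of which is shown here.

```python
# Minimal sketch (hypothetical toy POMDP, not the authors' implementation) of
# under-approximating the maximal expected total reward by unfolding the belief
# MDP to a finite depth and cutting off with the sound lower bound 0.

# Toy POMDP: state 1 is the goal state.
ACTIONS = ["a", "b"]
OBSERVATIONS = ["o0", "o1"]
GOAL = {1}

# P[s][act] = list of (successor state, probability)
P = {
    0: {"a": [(0, 0.6), (1, 0.4)], "b": [(0, 0.2), (1, 0.8)]},
    1: {"a": [(1, 1.0)], "b": [(1, 1.0)]},
}
# O[s'] = list of (observation, probability) emitted when entering s'
O = {0: [("o0", 0.9), ("o1", 0.1)], 1: [("o0", 0.3), ("o1", 0.7)]}
# R[s][act] = immediate reward for taking act in s (0 once the goal is reached)
R = {0: {"a": 1.0, "b": 4.0}, 1: {"a": 0.0, "b": 0.0}}


def belief_update(belief, act, obs):
    """Bayes update of a belief (dict: state -> probability); returns (new belief, Pr[obs])."""
    new = {}
    for s, p_s in belief.items():
        for s2, p_t in P[s][act]:
            w = p_s * p_t * dict(O[s2]).get(obs, 0.0)
            if w > 0.0:
                new[s2] = new.get(s2, 0.0) + w
    norm = sum(new.values())
    if norm == 0.0:
        return None, 0.0
    return {s: p / norm for s, p in new.items()}, norm


def lower_bound(belief, depth):
    """Under-approximate the maximal expected total reward collected before reaching
    the goal by unfolding the belief MDP for `depth` steps and cutting off with 0."""
    if all(s in GOAL for s in belief):  # belief fully inside the goal: nothing left to collect
        return 0.0
    if depth == 0:                      # cut-off belief: substitute the sound lower bound 0
        return 0.0
    best = 0.0
    for act in ACTIONS:
        value = sum(p * R[s][act] for s, p in belief.items())  # expected immediate reward
        for obs in OBSERVATIONS:
            next_belief, p_obs = belief_update(belief, act, obs)
            if p_obs > 0.0:
                value += p_obs * lower_bound(next_belief, depth - 1)
        best = max(best, value)
    return best


if __name__ == "__main__":
    initial_belief = {0: 1.0}
    # Deeper unfoldings only add non-negative terms, so the bounds grow monotonically.
    for depth in (1, 2, 4, 8):
        print(f"depth {depth}: lower bound {lower_bound(initial_belief, depth):.4f}")
```

In this toy instance the printed bounds increase with the unfolding depth toward the true optimum, illustrating why coarse cut-off values still yield sound, and with more effort increasingly tight, lower bounds.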


Author information

Authors and Affiliations

  1. RWTH Aachen University, Aachen, Germany

    Alexander Bork, Joost-Pieter Katoen & Tim Quatmann

Corresponding author

Correspondence to Alexander Bork.

Editor information

Editors and Affiliations

  1. Ben-Gurion University of the Negev, Be’er Sheva, Israel

    Dana Fisman

  2. University of Illinois Urbana-Champaign, Urbana, IL, USA

    Grigore Rosu

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


Copyright information

© 2022 The Author(s)

About this paper


Cite this paper

Bork, A., Katoen, JP., Quatmann, T. (2022). Under-Approximating Expected Total Rewards in POMDPs. In: Fisman, D., Rosu, G. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2022. Lecture Notes in Computer Science, vol 13244. Springer, Cham. https://doi.org/10.1007/978-3-030-99527-0_2

  • DOI: https://doi.org/10.1007/978-3-030-99527-0_2

  • Published: 30 March 2022

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99526-3

  • Online ISBN: 978-3-030-99527-0

  • eBook Packages: Computer Science; Computer Science (R0)


  • Published in cooperation with the European Joint Conferences on Theory and Practice of Software: http://www.etaps.org/
