Abstract
The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that provably adheres to one or more specifications. Yet the general problem is undecidable, and optimal policies may require the full (and thus potentially unbounded) execution history. To approximate such policies well, POMDP agents often employ randomization over action choices. We consider the problem of computing simpler policies for POMDPs and provide several approaches that still ensure their expressiveness. Key aspects are (1) the combination of an arbitrary number of specifications the policies need to adhere to, (2) a restricted form of randomization, and (3) a lightweight preprocessing of the POMDP model to encode memory. As a baseline for solving the underlying problems, we provide a novel encoding as a mixed-integer linear program. Our experiments demonstrate that the policies we obtain are more robust, smaller, and easier for an engineer to implement than those obtained from state-of-the-art POMDP solvers.
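To give a concrete sense of why observation-based policies are hard to compute, the following sketch enumerates all deterministic observation-based policies for a toy POMDP with two perceptually aliased states and evaluates each policy's probability of reaching a goal state. The model, state names, and probabilities are invented for illustration; this is not the article's benchmarks or its MILP encoding, only a minimal picture of the combinatorial search space such an encoding navigates.

```python
# Toy POMDP: states s0 and s1 emit the same observation "x" (aliasing),
# while "goal" and "trap" are absorbing and fully observable.
# All names and probabilities are hypothetical, for illustration only.
TRANS = {  # (state, action) -> {successor: probability}
    ("s0", "a"): {"goal": 0.9, "trap": 0.1},
    ("s0", "b"): {"trap": 1.0},
    ("s1", "a"): {"trap": 1.0},
    ("s1", "b"): {"goal": 0.9, "trap": 0.1},
}
INIT = {"s0": 0.6, "s1": 0.4}  # initial state distribution
OBS = {"s0": "x", "s1": "x"}   # both initial states look identical

def reach_goal(policy):
    """Probability of reaching 'goal' under a deterministic
    observation-based policy (a single step suffices in this toy model)."""
    total = 0.0
    for state, prob in INIT.items():
        action = policy[OBS[state]]
        total += prob * TRANS[(state, action)].get("goal", 0.0)
    return total

# Enumerate every deterministic observation-based policy: since the agent
# cannot distinguish s0 from s1, one action must serve both states.
best = max(({"x": a} for a in ("a", "b")), key=reach_goal)
print(best, round(reach_goal(best), 3))  # -> {'x': 'a'} 0.54
```

Because the number of deterministic observation-based policies grows exponentially with the number of observations, such brute-force enumeration does not scale; this is the kind of discrete choice that a mixed-integer linear program can encode with binary action-selection variables instead.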
Acknowledgements
This work was partially supported by the ERC Starting Grant 101077178 (DEUCE).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Winterer, L., Wimmer, R., Becker, B. et al. Strong Simple Policies for POMDPs. Int J Softw Tools Technol Transfer 26, 269–299 (2024). https://doi.org/10.1007/s10009-024-00747-0