Abstract
The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that provably adheres to one or more specifications. Yet the general problem is undecidable, and optimal policies may require the full (and thus potentially unbounded) execution history. To approximate such policies well, POMDP agents often employ randomization over action choices. We consider the problem of computing simpler policies for POMDPs and provide several approaches that still ensure their expressiveness. Key aspects are (1) the combination of an arbitrary number of specifications the policies need to adhere to, (2) a restricted form of randomization, and (3) a lightweight preprocessing of the POMDP model to encode memory. As a baseline for solving the underlying problems, we provide a novel encoding as a mixed-integer linear program. Our experiments demonstrate that the policies we obtain are more robust, smaller, and easier for an engineer to implement than those obtained from state-of-the-art POMDP solvers.
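To give a concrete sense of why observation-based policies are hard to compute, the following sketch enumerates all deterministic observation-based policies for a toy POMDP with two perceptually aliased states and evaluates each policy's probability of reaching a goal state. The model, state names, and probabilities are invented for illustration; this is not the article's benchmarks or its MILP encoding, only a minimal picture of the combinatorial search space such an encoding navigates.

```python
# Toy POMDP: states s0 and s1 emit the same observation "x" (aliasing),
# while "goal" and "trap" are absorbing and fully observable.
# All names and probabilities are hypothetical, for illustration only.
TRANS = {  # (state, action) -> {successor: probability}
    ("s0", "a"): {"goal": 0.9, "trap": 0.1},
    ("s0", "b"): {"trap": 1.0},
    ("s1", "a"): {"trap": 1.0},
    ("s1", "b"): {"goal": 0.9, "trap": 0.1},
}
INIT = {"s0": 0.6, "s1": 0.4}  # initial state distribution
OBS = {"s0": "x", "s1": "x"}   # both initial states look identical

def reach_goal(policy):
    """Probability of reaching 'goal' under a deterministic
    observation-based policy (a single step suffices in this toy model)."""
    total = 0.0
    for state, prob in INIT.items():
        action = policy[OBS[state]]
        total += prob * TRANS[(state, action)].get("goal", 0.0)
    return total

# Enumerate every deterministic observation-based policy: since the agent
# cannot distinguish s0 from s1, one action must serve both states.
best = max(({"x": a} for a in ("a", "b")), key=reach_goal)
print(best, round(reach_goal(best), 3))  # -> {'x': 'a'} 0.54
```

Because the number of deterministic observation-based policies grows exponentially with the number of observations, such brute-force enumeration does not scale; this is the kind of discrete choice that a mixed-integer linear program can encode with binary action-selection variables instead.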
Acknowledgements
This work was partially supported by the ERC Starting Grant 101077178 (DEUCE).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Winterer, L., Wimmer, R., Becker, B. et al. Strong Simple Policies for POMDPs. Int J Softw Tools Technol Transfer 26, 269–299 (2024). https://doi.org/10.1007/s10009-024-00747-0