Abstract
This position paper reflects on the state of the art in decision-making under uncertainty. A classical assumption is that probabilities suffice to capture all uncertainty in a system. This paper focuses on uncertainty that goes beyond this classical interpretation, in particular by drawing a clear distinction between aleatoric and epistemic uncertainty. The paper gives an overview of Markov decision processes (MDPs) and extensions that account for partial observability and adversarial behavior. These models capture aleatoric uncertainty well, but fail to account for epistemic uncertainty in a robust manner. Consequently, we present a thorough overview of so-called uncertainty models that represent uncertainty in a more robust interpretation. We show several solution techniques for both discrete and continuous models, ranging from formal verification through control-based abstractions to reinforcement learning. As an integral part of this paper, we list and discuss several key challenges that arise when dealing with rich types of uncertainty in a model-based fashion.
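The distinction the abstract draws can be made concrete with a small sketch. Below, a standard MDP fixes all transition probabilities exactly (aleatoric uncertainty only), while an interval MDP, one common uncertainty model, knows them only up to intervals (epistemic uncertainty) and is solved against a worst-case choice within those intervals. All state names, numbers, and function names are illustrative, not taken from the paper.

```python
def value_iteration(P, R, gamma=0.9, eps=1e-8):
    """Standard value iteration. P[s][a] maps successor state -> probability."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {}
        for s in P:
            V_new[s] = max(
                R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                for a in P[s]
            )
        if max(abs(V_new[s] - V[s]) for s in P) < eps:
            return V_new
        V = V_new


def robust_value_iteration(P_int, R, gamma=0.9, eps=1e-8):
    """Robust value iteration on an interval MDP. P_int[s][a] maps
    successor state -> (lower, upper) probability bounds; the epistemic
    adversary shifts as much mass as allowed onto low-value successors."""
    V = {s: 0.0 for s in P_int}
    while True:
        V_new = {}
        for s in P_int:
            vals = []
            for a in P_int[s]:
                bounds = P_int[s][a]
                # Worst-case distribution: start from the lower bounds,
                # then give the remaining mass to successors in
                # increasing order of their current value.
                dist = {t: lo for t, (lo, hi) in bounds.items()}
                slack = 1.0 - sum(dist.values())
                for t in sorted(bounds, key=lambda t: V[t]):
                    add = min(slack, bounds[t][1] - dist[t])
                    dist[t] += add
                    slack -= add
                vals.append(R[s][a] + gamma * sum(p * V[t] for t, p in dist.items()))
            V_new[s] = max(vals)
        if max(abs(V_new[s] - V[s]) for s in P_int) < eps:
            return V_new
        V = V_new
```

Whenever the nominal probabilities lie inside the intervals, the robust value is at most the nominal value; the gap is the price paid for hedging against epistemic uncertainty.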
Funding
This work was funded by the ERC Starting Grant 101077178 (DEUCE), and the NWO grants NWA.1160.18.238 (PrimaVera) and OCENW.KLEIN.187 (Provably Correct Policies for Uncertain POMDPs).
About this article
Cite this article
Badings, T., Simão, T.D., Suilen, M. et al. Decision-making under uncertainty: beyond probabilities. Int J Softw Tools Technol Transfer 25, 375–391 (2023). https://doi.org/10.1007/s10009-023-00704-3