Abstract
This position paper reflects on the state of the art in decision-making under uncertainty. A classical assumption is that probabilities suffice to capture all uncertainty in a system. This paper focuses on uncertainty that goes beyond this classical interpretation, in particular by drawing a clear distinction between aleatoric and epistemic uncertainty. The paper gives an overview of Markov decision processes (MDPs) and extensions that account for partial observability and adversarial behavior. These models capture aleatoric uncertainty well, but fail to account for epistemic uncertainty in a robust manner. Consequently, we present a thorough overview of so-called uncertainty models that represent uncertainty in a more robust interpretation. We show several solution techniques for both discrete and continuous models, ranging from formal verification through control-based abstractions to reinforcement learning. As an integral part of this paper, we list and discuss several key challenges that arise when dealing with rich types of uncertainty in a model-based fashion.
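The distinction the abstract draws can be made concrete with a small sketch. Below, a standard MDP fixes all transition probabilities exactly (aleatoric uncertainty only), while an interval MDP, one common uncertainty model, knows them only up to intervals (epistemic uncertainty) and is solved against a worst-case choice within those intervals. All state names, numbers, and function names are illustrative, not taken from the paper.

```python
def value_iteration(P, R, gamma=0.9, eps=1e-8):
    """Standard value iteration. P[s][a] maps successor state -> probability."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {}
        for s in P:
            V_new[s] = max(
                R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                for a in P[s]
            )
        if max(abs(V_new[s] - V[s]) for s in P) < eps:
            return V_new
        V = V_new


def robust_value_iteration(P_int, R, gamma=0.9, eps=1e-8):
    """Robust value iteration on an interval MDP. P_int[s][a] maps
    successor state -> (lower, upper) probability bounds; the epistemic
    adversary shifts as much mass as allowed onto low-value successors."""
    V = {s: 0.0 for s in P_int}
    while True:
        V_new = {}
        for s in P_int:
            vals = []
            for a in P_int[s]:
                bounds = P_int[s][a]
                # Worst-case distribution: start from the lower bounds,
                # then give the remaining mass to successors in
                # increasing order of their current value.
                dist = {t: lo for t, (lo, hi) in bounds.items()}
                slack = 1.0 - sum(dist.values())
                for t in sorted(bounds, key=lambda t: V[t]):
                    add = min(slack, bounds[t][1] - dist[t])
                    dist[t] += add
                    slack -= add
                vals.append(R[s][a] + gamma * sum(p * V[t] for t, p in dist.items()))
            V_new[s] = max(vals)
        if max(abs(V_new[s] - V[s]) for s in P_int) < eps:
            return V_new
        V = V_new
```

Whenever the nominal probabilities lie inside the intervals, the robust value is at most the nominal value; the gap is the price paid for hedging against epistemic uncertainty.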
Funding
This work was funded by the ERC Starting Grant 101077178 (DEUCE), and the NWO grants NWA.1160.18.238 (PrimaVera) and OCENW.KLEIN.187 (Provably Correct Policies for Uncertain POMDPs).
About this article
Cite this article
Badings, T., Simão, T.D., Suilen, M. et al. Decision-making under uncertainty: beyond probabilities. Int J Softw Tools Technol Transfer 25, 375–391 (2023). https://doi.org/10.1007/s10009-023-00704-3