Abstract
Stochastic Shortest Path (SSP) is one of the most popular frameworks for modeling sequential decision-making problems under uncertainty. However, decisions for real problems should be risk-sensitive, providing robust choices that take bad scenarios into account. SSPs that deal with risk are called Risk-Sensitive SSPs (RSSSPs), and a framework that is attractive from a theoretical perspective applies Expected Utility Theory with an exponential utility function. From a practical perspective, however, the exponential utility function causes overflow or underflow in computer implementations, even for small state spaces. In this paper, we use the LogSumExp technique to solve RSSSPs under exponential utility in practice, within Value Iteration, Policy Iteration, and Linear Programming algorithms. Experiments on a toy problem show the scalability of the proposed algorithms.
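To illustrate the idea behind the abstract, here is a minimal sketch of exponential-utility value iteration carried out in the log domain via LogSumExp. This is not the paper's implementation: the toy problem, function names, and array layout are assumptions chosen for the example. Instead of iterating \(V(s) = \min_a e^{\lambda c(s,a)} \sum_{s'} P(s'|s,a)\, V(s')\), which over- or underflows for large \(\lambda\) or long horizons, one iterates \(\log V(s) = \min_a [\lambda c(s,a) + \mathrm{LSE}_{s'}(\log P(s'|s,a) + \log V(s'))]\).

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x))): shift by the maximum first."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def log_value_iteration(P, C, lam, goal, tol=1e-8, max_iter=10_000):
    """Exponential-utility value iteration in the log domain (sketch).

    P[a, s, s'] -- transition probabilities, C[a, s] -- costs,
    lam > 0     -- risk factor, goal -- absorbing zero-cost state.
    Returns the certainty-equivalent cost (1/lam) * log V per state.
    """
    n = P.shape[1]
    logV = np.zeros(n)  # V = 1 everywhere initially, so log V = 0
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        logP = np.log(P)
    for _ in range(max_iter):
        # Q[a, s] = lam * C[a, s] + LSE over successors s'
        Q = np.array([[lam * C[a, s] + logsumexp(logP[a, s] + logV)
                       for s in range(n)] for a in range(P.shape[0])])
        new = Q.min(axis=0)
        new[goal] = 0.0  # the goal remains absorbing with V = 1
        if np.max(np.abs(new - logV)) < tol:
            logV = new
            break
        logV = new
    return logV / lam
```

In a two-state chain where the single action pays cost 1 and reaches the goal with probability 1, the certainty-equivalent cost of the start state is 1 for any \(\lambda\), while a naive implementation would have to represent \(e^{\lambda}\) explicitly.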
Notes
- 1.
Expected Utility Theory is usually stated in terms of a value function to be maximized, the (positive) expected utility. Because we consider cost functions, however, and the SSP literature minimizes expected cost, we follow the SSP literature to avoid any misunderstanding.
- 2.
For example, a policy that pays \(M\) for sure, with \(M\) arbitrarily large, may be chosen over a policy that pays \(M+\varepsilon \) with probability \(\alpha \) and \(\varepsilon \) with probability \(1-\alpha \).
© 2020 Springer Nature Switzerland AG
Cite this paper
de Freitas, E.M., Freire, V., Delgado, K.V. (2020). Risk Sensitive Stochastic Shortest Path and LogSumExp: From Theory to Practice. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science(), vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_9
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8