Abstract
Stochastic Shortest Path (SSP) is one of the most popular frameworks for modeling sequential decision-making problems under uncertainty. However, decisions for real problems should be risk-sensitive, providing robust choices that take bad scenarios into account. SSPs that deal with risk are called Risk-Sensitive SSPs (RSSSPs), and a framework that is attractive from a theoretical perspective applies Expected Utility Theory with an exponential utility function. From a practical perspective, however, the exponential utility function causes overflow or underflow in computer implementations, even for small state spaces. In this paper, we use the LogSumExp technique to solve RSSSPs under exponential utility in practice, within Value Iteration, Policy Iteration, and Linear Programming algorithms. Experiments on a toy problem show the scalability of the proposed algorithms.
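To illustrate the idea behind the abstract, here is a minimal sketch of exponential-utility value iteration carried out in the log domain via LogSumExp. This is not the paper's implementation: the toy problem, function names, and array layout are assumptions chosen for the example. Instead of iterating \(V(s) = \min_a e^{\lambda c(s,a)} \sum_{s'} P(s'|s,a)\, V(s')\), which over- or underflows for large \(\lambda\) or long horizons, one iterates \(\log V(s) = \min_a [\lambda c(s,a) + \mathrm{LSE}_{s'}(\log P(s'|s,a) + \log V(s'))]\).

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x))): shift by the maximum first."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def log_value_iteration(P, C, lam, goal, tol=1e-8, max_iter=10_000):
    """Exponential-utility value iteration in the log domain (sketch).

    P[a, s, s'] -- transition probabilities, C[a, s] -- costs,
    lam > 0     -- risk factor, goal -- absorbing zero-cost state.
    Returns the certainty-equivalent cost (1/lam) * log V per state.
    """
    n = P.shape[1]
    logV = np.zeros(n)  # V = 1 everywhere initially, so log V = 0
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        logP = np.log(P)
    for _ in range(max_iter):
        # Q[a, s] = lam * C[a, s] + LSE over successors s'
        Q = np.array([[lam * C[a, s] + logsumexp(logP[a, s] + logV)
                       for s in range(n)] for a in range(P.shape[0])])
        new = Q.min(axis=0)
        new[goal] = 0.0  # the goal remains absorbing with V = 1
        if np.max(np.abs(new - logV)) < tol:
            logV = new
            break
        logV = new
    return logV / lam
```

In a two-state chain where the single action pays cost 1 and reaches the goal with probability 1, the certainty-equivalent cost of the start state is 1 for any \(\lambda\), while a naive implementation would have to represent \(e^{\lambda}\) explicitly.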
Notes
- 1.
Expected Utility Theory is usually stated in terms of a value function to be maximized, the (positive) expected utility. Because we consider cost functions, however, and the SSP literature minimizes expected cost, we follow the SSP literature to avoid any misunderstanding.
- 2.
For example, a policy that pays \(M\) for sure, with \(M\) arbitrarily large, may be chosen over a policy that pays \(M+\varepsilon \) with probability \(\alpha \) and \(\varepsilon \) with probability \(1-\alpha \).
© 2020 Springer Nature Switzerland AG
Cite this paper
de Freitas, E.M., Freire, V., Delgado, K.V. (2020). Risk Sensitive Stochastic Shortest Path and LogSumExp: From Theory to Practice. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science(), vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_9
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8