
Risk Sensitive Stochastic Shortest Path and LogSumExp: From Theory to Practice

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12320)

Abstract

Stochastic Shortest Path (SSP) is the most popular framework for modeling sequential decision-making problems under stochasticity. However, decisions for real problems should be risk sensitive, providing robust choices that take bad scenarios into account. SSPs that deal with risk are called Risk-Sensitive SSPs (RSSSPs), and a framework that is interesting from a theoretical perspective applies Expected Utility Theory with an exponential utility function. From a practical perspective, however, the exponential utility function causes overflow or underflow in computer implementations, even on small state spaces. In this paper, we use the LogSumExp technique to solve RSSSPs under exponential utility in practice, within Value Iteration, Policy Iteration, and Linear Programming algorithms. Experiments on a toy problem show the scalability of the proposed algorithms.
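The numerical issue the abstract describes, and the log-space fix, can be sketched as follows. This is a minimal illustration, not the paper's algorithms: the risk factor `lam`, the three-state chain, and all numbers are assumptions chosen for the sketch. The exponential-utility value recursion multiplies `exp(lam * cost)` terms, which overflow a float after a short horizon; iterating on log-values via LogSumExp keeps every intermediate quantity on the order of `lam` times the accumulated cost.

```python
import math

def logsumexp(xs):
    # Stable log(sum(exp(x))): subtract the max so every exponent is <= 0,
    # avoiding overflow for large x and underflow when all x are very negative.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Hypothetical 3-state chain; state 2 is the goal. All numbers are illustrative.
# With exponential utility the value recursion is
#   V(s) = exp(lam * c(s)) * sum_t P(t|s) * V(t),   with V(goal) = 1,
# whose terms overflow quickly in exp space; iterating on log V instead keeps
# every quantity on the order of lam times the accumulated cost.
lam = 0.1                 # risk-aversion factor (assumed small enough to converge)
cost = [2.0, 5.0]         # cost of the single available action in states 0 and 1
P = [[0.0, 0.8, 0.2],     # transition probabilities from state 0
     [0.0, 0.5, 0.5]]     # transition probabilities from state 1

log_v = [0.0, 0.0, 0.0]   # log V; the goal's log-value stays 0 (V = 1)
for _ in range(200):      # value-iteration sweeps, entirely in log space
    log_v = [lam * cost[s]
             + logsumexp([math.log(P[s][t]) + log_v[t]
                          for t in range(3) if P[s][t] > 0])
             for s in range(2)] + [0.0]
```

The same substitution (replace sums of exponentials with LogSumExp over log-terms) carries over to the policy-evaluation step of Policy Iteration and to the constraints of the Linear Programming formulation.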


Notes

  1. Usually, Expected Utility Theory considers a value function to be maximized, the positive expected utility. However, because we consider cost functions and the SSP literature minimizes expected cost, we follow the SSP literature to avoid misunderstanding.

  2. For example, a policy that pays \(M\) for sure, with \(M\) arbitrarily large, may be chosen over a policy that pays \(M+\varepsilon \) with probability \(\alpha \) and \(\varepsilon \) with probability \(1-\alpha \).
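This preference reversal can be checked numerically. The sketch below uses illustrative values for \(M\), \(\varepsilon \), \(\alpha \), and the risk factor (all assumptions, not taken from the paper), comparing policies by the log of their expected exponential disutility, log E[exp(lam * cost)]. Computing in log space via LogSumExp is also what makes a huge sure cost \(M\) representable at all.

```python
import math

def log_disutility(lam, outcomes):
    # log E[exp(lam * cost)] for a discrete lottery [(prob, cost), ...],
    # computed stably with LogSumExp; smaller is better for the agent.
    xs = [math.log(p) + lam * c for p, c in outcomes]
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

M, eps = 1000.0, 1.0                    # illustrative; M deliberately huge
sure = [(1.0, M)]                       # pays M for certain
lottery = [(0.5, M + eps), (0.5, eps)]  # far lower expected cost

# Strongly risk-averse agent (lam = 2): the certain huge cost M is preferred,
# because the lottery's 50% chance of M + eps dominates its disutility.
assert log_disutility(2.0, sure) < log_disutility(2.0, lottery)
# Nearly risk-neutral agent (lam = 1e-4): the lottery is preferred, as its
# expected cost (M + 2 * eps) / 2 is roughly half of M.
assert log_disutility(1e-4, sure) > log_disutility(1e-4, lottery)
```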


Author information

Correspondence to Karina Valdivia Delgado.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

de Freitas, E.M., Freire, V., Delgado, K.V. (2020). Risk Sensitive Stochastic Shortest Path and LogSumExp: From Theory to Practice. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_9


  • DOI: https://doi.org/10.1007/978-3-030-61380-8_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61379-2

  • Online ISBN: 978-3-030-61380-8

  • eBook Packages: Computer Science, Computer Science (R0)
