
Risk-Sensitive Reinforcement Learning

  • Published: November 2002
  • Volume 49, pages 267–290 (2002)
  • Oliver Mihatsch1 & Ralph Neuneier1

Abstract

Most reinforcement learning algorithms optimize the expected return of a Markov decision problem. Practice has taught us that this criterion is not always the most suitable, because many applications require robust control strategies that also take into account the variance of the return. The classical control literature provides several techniques for risk-sensitive optimization goals, such as the worst-case optimality criterion, which focuses exclusively on risk-avoiding policies, and classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm.
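
For concreteness, the exponential-utility criterion referred to above is usually written as follows. This is the standard formulation from the classical risk-sensitive control literature, not notation taken from this paper:

```latex
% Exponential-utility criterion (classical risk-sensitive control):
% for a risk parameter \beta \neq 0 and return R, maximize
J_\beta(\pi) \;=\; \frac{1}{\beta}\,\log \mathbb{E}_\pi\!\left[e^{\beta R}\right].
% A second-order Taylor expansion around \beta = 0 gives
J_\beta(\pi) \;\approx\; \mathbb{E}_\pi[R] \;+\; \frac{\beta}{2}\,\operatorname{Var}_\pi(R),
% so \beta < 0 penalizes return variance (risk aversion), \beta > 0
% rewards it (risk seeking), and \beta \to 0 recovers the ordinary
% risk-neutral expected-return criterion.
```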

Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy: instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, it avoids that framework's serious drawbacks for learning. Based on an extended set of optimality equations, we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms, which converge with probability one under the usual conditions.
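
To make the idea concrete, here is a minimal tabular sketch of transforming temporal differences: a standard Q-learning loop in which only the TD error is passed through an asymmetric weighting before the update. The particular piecewise-linear transform, the parameter names, and the Gym-style `env` interface are illustrative assumptions for this sketch, not code from the paper.

```python
import numpy as np

def transform_td(delta, kappa):
    """Asymmetrically weight the TD error.

    For kappa in (-1, 1): kappa > 0 damps positive surprises and
    amplifies negative ones (risk-averse updates), kappa < 0 does the
    opposite, and kappa = 0 leaves the TD error unchanged.
    """
    return (1.0 - kappa) * delta if delta > 0.0 else (1.0 + kappa) * delta

def risk_sensitive_q_learning(env, n_states, n_actions, kappa=0.5,
                              alpha=0.1, gamma=0.95, epsilon=0.1,
                              episodes=500, seed=0):
    """Tabular Q-learning with transformed temporal differences.

    `env` is assumed to provide `reset() -> state` and
    `step(action) -> (next_state, reward, done)` over integer states
    and actions (a hypothetical Gym-style interface for this sketch).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            target = reward if done else reward + gamma * np.max(Q[next_state])
            delta = target - Q[state, action]
            # The only change relative to standard Q-learning: update
            # with the transformed rather than the raw TD error.
            Q[state, action] += alpha * transform_td(delta, kappa)
            state = next_state
    return Q
```

With kappa = 0 the update reduces exactly to the ordinary Q-learning rule, so the risk-neutral expected-return criterion is recovered as a special case.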



Author information

Authors and Affiliations

  1. Corporate Technology, Information and Communications 4, Siemens AG, D-81730 Munich, Germany

    Oliver Mihatsch & Ralph Neuneier



About this article

Cite this article

Mihatsch, O., Neuneier, R. Risk-Sensitive Reinforcement Learning. Machine Learning 49, 267–290 (2002). https://doi.org/10.1023/A:1017940631555


  • Issue Date: November 2002

  • DOI: https://doi.org/10.1023/A:1017940631555


Keywords

  • reinforcement learning
  • risk-sensitive control
  • temporal differences
  • dynamic programming
  • Bellman's equation