Abstract
This paper studies the duality between risk-sensitive control and the large deviation control problem of maximizing the probability of out-performing a target for Markov decision processes. To derive the desired duality, we apply a non-linear extension of the Krein-Rutman theorem to characterize the optimal risk-sensitive value and prove that an optimal policy exists that is stationary and deterministic. The right-hand derivative of this value function is used to characterize the targets for which the duality holds. It is proved that the optimal policy for the out-performance probability can be approximated by the optimal policy for the risk-sensitive control. The range of the one-sided (right-hand and left-hand) derivatives of the optimal risk-sensitive value function plays an important role. Some essential differences between the two types of optimal control problems are also presented.
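The duality described above can be sketched informally as follows. The notation here (reward r, state-action process (X_t, A_t), policy π, target c) is illustrative and not taken from the paper itself; the precise conditions on the target c, given via the one-sided derivatives of the risk-sensitive value, are the paper's contribution.

```latex
% Risk-sensitive value for risk parameter \theta > 0 (illustrative notation):
\Lambda(\theta) \;=\; \sup_{\pi}\ \limsup_{n\to\infty}
  \frac{1}{n}\log \mathbb{E}^{\pi}\!\Big[\exp\Big(\theta\sum_{t=0}^{n-1} r(X_t,A_t)\Big)\Big].

% Out-performance (large deviation) value for a target level c:
J(c) \;=\; \sup_{\pi}\ \limsup_{n\to\infty}
  \frac{1}{n}\log \mathbb{P}^{\pi}\!\Big(\sum_{t=0}^{n-1} r(X_t,A_t) \;\ge\; n c\Big).

% Duality, of the Legendre-transform type, holding for suitable targets c:
J(c) \;=\; -\sup_{\theta>0}\,\bigl\{\theta c - \Lambda(\theta)\bigr\}.
```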
Acknowledgements
The authors are grateful to the referee for the very careful review of the first version of this paper and for the many helpful comments and suggestions for improvement. This paper is part of the first author's dissertation. This work was supported by NSFC Grant 11671226.
Cite this article
Huang, T., Dai, Y. & Chen, J. On maximizing probabilities for over-performing a target for Markov decision processes. Optim Eng (2023). https://doi.org/10.1007/s11081-023-09870-4