Risk-Sensitive Optimal Control in Communicating Average Markov Decision Chains

  • Rolando Cavazos-Cadena
  • Emmanuel Fernández-Gaucherand
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 46)


This work concerns discrete-time Markov decision processes with a denumerable state space and bounded costs per stage. The performance of a control policy is measured by a long-run risk-sensitive average cost criterion associated with a utility function having constant risk sensitivity coefficient λ, and the main objective of the paper is to study the existence of bounded solutions to the risk-sensitive average cost optimality equation for arbitrary values of λ. The main results are as follows. When the state space is finite and the transition law is communicating, in the sense that under every stationary policy transitions are possible between each pair of states, the optimality equation has a bounded solution for every non-null λ. However, when the state space is infinite and denumerable, the communication requirement together with a strong form of the simultaneous Doeblin condition does not, in general, yield a bounded solution to the optimality equation when the risk sensitivity coefficient has a sufficiently large absolute value.
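For orientation, the criterion and the optimality equation referred to in the abstract take the following standard form in the risk-sensitive control literature; the notation here is illustrative rather than the authors' own, with c the cost per stage, λ the risk sensitivity coefficient, g the optimal risk-sensitive average cost, and h a relative value function:

```latex
% Risk-sensitive (exponential-utility) average cost of policy \pi from state x:
J_\lambda(\pi, x) \;=\; \limsup_{n\to\infty} \frac{1}{\lambda n}
  \log \mathbb{E}^{\pi}_{x}\!\left[ \exp\!\Big( \lambda \sum_{t=0}^{n-1} c(X_t, A_t) \Big) \right]

% Risk-sensitive average cost optimality equation (multiplicative form):
e^{\lambda\,(g + h(x))} \;=\; \min_{a \in A(x)}
  \left[ e^{\lambda\, c(x,a)} \sum_{y} p(y \mid x, a)\, e^{\lambda\, h(y)} \right],
  \qquad x \in S.
```

A bounded solution (g, h) of the second equation, with h bounded on the state space S, is what the paper shows exists for finite communicating models with any non-null λ, and can fail to exist for denumerable models when |λ| is large.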


Markov decision processes · Exponential utility function · Constant risk sensitivity · Constant average cost · Communication condition · Simultaneous Doeblin condition · Bounded solutions to the risk-sensitive optimality equation




Copyright information

© Springer Science + Business Media, Inc. 2002

Authors and Affiliations

  • Rolando Cavazos-Cadena (1)
  • Emmanuel Fernández-Gaucherand (2)
  1. Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Saltillo, México
  2. Department of Electrical & Computer Engineering & Computer Science, University of Cincinnati, Cincinnati, USA
