Computing semi-stationary optimal policies for multichain semi-Markov decision processes

Prasenjit Mondal
Game theory and optimization


We consider semi-Markov decision processes (SMDPs) with finite state and action spaces and a general multichain structure. A form of limiting ratio average (undiscounted) reward is the criterion for comparing different policies. The main result is that the value vector and a pure optimal semi-stationary policy (i.e., a policy that depends only on the initial state and the current state) for such an SMDP can be computed directly from an optimal solution of a finite set of linear programming (LP) problems, whose cardinality equals the number of states. More precisely, we prove that the single LP associated with a fixed initial state provides the value and an optimal pure stationary policy of the corresponding SMDP. The relation between the set of feasible solutions of each LP and the set of stationary policies is also analyzed. Worked examples illustrate the algorithm.
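To give a concrete flavour of the LP approach sketched in the abstract, the following is a minimal, illustrative example of the classical occupation-measure LP for average-reward SMDPs, solved numerically for a toy two-state model. This is an assumption-laden sketch, not the paper's exact formulation: the data (rewards, sojourn times, transitions) are invented, and this classical LP alone does not resolve the multichain/initial-state difficulty that the paper's state-indexed family of LPs addresses.

```python
# Illustrative sketch (NOT the paper's exact LP): the classical
# occupation-measure linear program for average-reward SMDPs,
# solved with SciPy on an invented two-state, two-action model.
import numpy as np
from scipy.optimize import linprog

# State-action pairs, in order:
# (0,0) stay at state 0, (0,1) move 0 -> 1, (1,0) stay at 1, (1,1) move 1 -> 0
r = np.array([1.0, 0.0, 4.0, 0.0])    # expected one-step rewards (illustrative)
tau = np.array([1.0, 1.0, 2.0, 1.0])  # expected sojourn times (illustrative)
# P[k, j] = probability that pair k sends the chain to state j
P = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
# S[j, k] = 1 iff pair k belongs to state j
S = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)

# Decision variable x(k) >= 0 is the occupation measure of pair k.
# Constraints: stationarity S x = P^T x, and time normalization tau . x = 1.
A_eq = np.vstack([S - P.T, tau])
b_eq = np.append(np.zeros(2), 1.0)

# Maximize r . x  <=>  minimize -r . x
res = linprog(-r, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
gain = -res.fun  # optimal long-run reward rate; here staying in state 1 yields 4/2 = 2
```

In this toy instance the optimum concentrates all occupation mass on the pair "stay at state 1" (reward 4 per expected sojourn time 2), giving a long-run rate of 2; the positive coordinates of an optimal basic solution identify a pure stationary policy, which is the general mechanism the paper exploits per initial state.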


Semi-Markov decision processes · Limiting ratio average reward · Multichain structure · Pure optimal semi-stationary policies · Linear programming

Mathematics Subject Classification

60K15 · 60K20



I am grateful to Prof. T. Parthasarathy of CMI & ISI Chennai. Some ideas presented in this paper result from a fruitful discussion with him during the International Conference & Workshop on “Game Theory and Optimization”, June 6–10, 2016, at IIT Madras. This paper is dedicated to the celebration of the 75th birthday of Prof. T. Parthasarathy, who has made significant contributions to the theory of games and linear complementarity problems. I am also thankful to Prof. S. Sinha of Jadavpur University, Kolkata, for many valuable suggestions. I would like to thank the two anonymous referees for their valuable and detailed comments, which have helped structure this paper better.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

Mathematics Department, Government General Degree College, Bankura, India
