Abstract
We consider finite (state and action space) semi-Markov decision processes (SMDPs) with absorbing states under the limiting ratio average (undiscounted) reward criterion, where all but one of the states are absorbing. We propose a realistic inspection model that fits naturally into this class of undiscounted SMDPs. We prove the existence of an optimal semi-stationary policy (i.e., a semi-Markov policy that does not depend on the decision epoch count) and provide a linear programming algorithm to compute such a policy.
Acknowledgments
The author is thankful to Dr. S. Sinha of Jadavpur University for many helpful suggestions. The author wishes to thank the unknown referees who have patiently gone through this paper and whose suggestions have improved its presentation and readability considerably.
Cite this article
Mondal, P. On undiscounted semi-Markov decision processes with absorbing states. Math Meth Oper Res 83, 161–177 (2016). https://doi.org/10.1007/s00186-015-0524-y
DOI: https://doi.org/10.1007/s00186-015-0524-y
Keywords
- Semi-Markov decision processes
- Limiting ratio average reward criterion
- Optimal semi-stationary policies
- LP algorithms