Markov ratio decision processes
This paper considers a finite-state Markov decision process in which each action in each state carries two rewards. The objective is to optimize the ratio of the two rewards over an infinite horizon. For the discounted version of this decision problem, it is shown that the optimal value is unique and that the optimal strategy is pure and stationary; both, however, depend on the starting state. A finite algorithm for computing the solution is also given.
Key Words: Markov decision processes, ratio rewards, discounting, optimal solutions, algorithms
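To make the discounted ratio objective concrete, the following is a minimal sketch and not the algorithm of the paper, which the abstract does not spell out. It assumes the ratio is to be maximized, that the second (denominator) reward is strictly positive, and that a discount factor `beta` in (0, 1) applies to both reward streams. Because the abstract states that a pure stationary strategy is optimal, the sketch simply enumerates all pure stationary policies, which is finite but exponential in the number of states. All names (`solve_linear`, `discounted_value`, `best_ratio_policy`) and the two-state data are hypothetical and chosen only for illustration.

```python
from itertools import product


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def discounted_value(P, r, policy, beta):
    """Discounted reward vector v solving (I - beta * P_pi) v = r_pi."""
    n = len(policy)
    A = [[(1.0 if i == j else 0.0) - beta * P[i][policy[i]][j]
          for j in range(n)] for i in range(n)]
    b = [r[i][policy[i]] for i in range(n)]
    return solve_linear(A, b)


def best_ratio_policy(P, r1, r2, beta, start):
    """Enumerate pure stationary policies; return (best ratio, policy).

    Assumes maximization and a strictly positive denominator reward r2.
    """
    best = None
    for policy in product(*(range(len(P[s])) for s in range(len(P)))):
        num = discounted_value(P, r1, policy, beta)[start]
        den = discounted_value(P, r2, policy, beta)[start]
        ratio = num / den
        if best is None or ratio > best[0]:
            best = (ratio, policy)
    return best


# Hypothetical instance: P[s][a][t] is the probability of moving s -> t under a.
P = [[[0.0, 1.0]],                 # state 0: one action, moves to state 1
     [[0.0, 1.0], [0.0, 1.0]]]     # state 1: two actions, both stay in state 1
r1 = [[0.0], [1.0, 4.0]]           # numerator rewards r1[s][a]
r2 = [[10.0], [1.0, 5.0]]          # denominator rewards r2[s][a]
beta = 0.5

from_state_0 = best_ratio_policy(P, r1, r2, beta, start=0)
from_state_1 = best_ratio_policy(P, r1, r2, beta, start=1)
print(from_state_0, from_state_1)
```

In this toy instance the best action in state 1 differs depending on whether the process starts in state 0 or in state 1, which illustrates the dependence on the starting state noted in the abstract.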