An algorithm to estimate sub-optimal present values for unichain Markov processes with alternative reward structures
A new algorithm is presented that obtains a sub-optimal policy and an estimate of the present values for a class of discounted discrete Markov processes with alternative reward structures over an infinite horizon. Starting from initial estimates of the steady-state probabilities and the gain, the algorithm determines a policy and estimates the present-value vector, which can be used either as is or in conjunction with the FR-algorithm or the H-algorithm, depending on the accuracy required. In both of the latter cases it generally accelerates the computation.
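The paper's exact construction is not reproduced in this abstract, but the general idea it describes — seeding a discounted present-value computation with steady-state-probability and gain estimates, then refining by successive approximation — can be sketched as follows. This is a minimal illustration in Python/NumPy, not the author's algorithm: the function names, the particular bias correction in `initial_present_values`, and the plain value-iteration refinement step are all assumptions made for the example.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi @ P = pi with sum(pi) = 1 for a unichain transition matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def initial_present_values(P, r, beta):
    """Crude present-value estimate from the gain g = pi @ r.

    The long-run average reward discounted forever contributes g / (1 - beta);
    the (r - g) term is a hypothetical one-step bias correction, added here
    only to differentiate the states in the initial estimate.
    """
    pi = stationary_distribution(P)
    g = pi @ r
    v0 = np.full(P.shape[0], g / (1.0 - beta))
    return v0 + (r - g)

def refine(P, r, beta, v, iters=500):
    """Standard successive approximation v <- r + beta * P @ v."""
    for _ in range(iters):
        v = r + beta * (P @ v)
    return v
```

Because the initial estimate already carries the discounted-gain term, the refinement loop typically starts much closer to the fixed point `v = (I - beta P)^{-1} r` than a zero start would, which is the sense in which such a seed "accelerates the computation".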
- 1. Das Gupta, S., Int. J. Control, Vol. 14, No. 6, pp. 1031–1040 (1971)
- 2. Finkbeiner, B., and Runggaldier, W., in Computing Methods in Optimization Problems, Vol. 2, ed. by Zadeh, L. A., and Balakrishnan, A. V. (1965)
- 3. Howard, R. A., Dynamic Programming and Markov Processes, Technology Press and Wiley (1965)
- 4. Liusternik, L. A., and Sobolev, V. J., Elements of Functional Analysis, Frederick Ungar, New York (1961)