Abstract
The functional equations of infinite-horizon discounted Markov renewal programming are v* = Tv*, where T is a monotone contraction operator. This paper shows how to accelerate convergence of the value-iteration scheme v^(n+1) = Tv^(n) by a block-scaling step, whereby all states in a given group have their values v_i^(n) scaled by a common factor. A similar method exists when the relative values v*_i − v*_1 are computed iteratively. In both cases, the block-scaling factors solve a set of functional equations whose structure resembles that of a discounted Markov renewal program, and this set can itself be solved by successive approximation, policy iteration, or linear programming.
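To make the scheme concrete, here is a minimal sketch of value iteration for a single fixed policy, v ← r + βPv, with a block-scaling step in which every state in a group shares one scale factor. The factors below are chosen so that the scaled iterate satisfies the fixed-point equation summed over each group (an iterative-aggregation heuristic); this choice is an illustrative stand-in and is not the paper's functional equations for the optimal factors.

```python
import numpy as np

def block_scaled_value_iteration(P, r, beta, groups, tol=1e-10, max_iter=1000):
    """Value iteration v <- r + beta*P@v, accelerated by block scaling.

    groups: list of index lists partitioning the states; each group's
    values are multiplied by a single common factor per iteration.
    The factors solve a small G x G linear system requiring the scaled
    iterate to satisfy the fixed-point equation aggregated over each
    group (a heuristic stand-in for the paper's optimal factors).
    """
    v = np.ones(len(r))                      # positive start keeps sums nonzero
    for _ in range(max_iter):
        tv = r + beta * (P @ v)              # one sweep of the operator T
        # Aggregated quantities: S_g = sum of tv over group g,
        # R_g = sum of rewards, A[g,h] = sum_{i in g, j in h} P[i,j]*tv[j].
        S = np.array([tv[g].sum() for g in groups])
        R = np.array([r[g].sum() for g in groups])
        A = np.array([[(P[np.ix_(g, h)] @ tv[h]).sum() for h in groups]
                      for g in groups])
        # Solve (diag(S) - beta*A) c = R for the per-group factors c.
        c = np.linalg.solve(np.diag(S) - beta * A, R)
        w = tv.copy()
        for k, g in enumerate(groups):
            w[g] = c[k] * tv[g]              # common scale factor per block
        if np.max(np.abs(w - v)) < tol:
            return w
        v = w
    return v
```

Note that if the fixed point v* is reached, the factors c = 1 solve the aggregated system, so v* remains a fixed point of the scaled iteration; with singleton groups the correction step recovers v* exactly in one solve.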
Cite this article
Schweitzer, P.J. Block-scaling of value-iteration for discounted Markov renewal programming. Ann Oper Res 29, 603–630 (1991). https://doi.org/10.1007/BF02283616