Abstract
This paper considers the solution of Markov decision problems whose parameters can be obtained only via approximating schemes, or for which it is computationally preferable to approximate the parameters rather than compute them with exact algorithms.
Various models in which this situation occurs are presented. Furthermore, it is shown that a modified value-iteration method can be used, for both the discounted and the undiscounted versions of the model, to solve the optimality equation and to find optimal policies. In both cases, the convergence rate is determined.
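The abstract does not spell out how the value-iteration method is modified. One natural reading, assumed here and not taken from the paper, is that iteration n plugs the n-th approximation (r_n, P_n) of the rewards and transition probabilities into the ordinary discounted value-iteration step. The sketch below illustrates that reading for a finite discounted model; `approx_params`, the two-state toy model, and the geometric approximation error `0.5 ** (n + 1)` are all hypothetical.

```python
from typing import Callable, List, Tuple

Rewards = List[List[float]]            # r[s][a]: one-step reward in state s under action a
Transitions = List[List[List[float]]]  # P[s][a][t]: probability of moving s -> t under a


def modified_value_iteration(
    approx_params: Callable[[int], Tuple[Rewards, Transitions]],
    beta: float,
    n_iter: int,
) -> Tuple[List[float], List[int]]:
    """Discounted value iteration in which step n uses the n-th parameter
    approximation (r_n, P_n) instead of the exact (r, P). Hypothetical sketch."""
    r0, _ = approx_params(0)
    v = [0.0] * len(r0)
    policy = [0] * len(r0)
    for n in range(n_iter):
        r, P = approx_params(n)
        new_v = []
        for s in range(len(v)):
            # Action values under the current approximation of the parameters.
            q = [r[s][a] + beta * sum(p * vt for p, vt in zip(P[s][a], v))
                 for a in range(len(r[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            policy[s] = best
            new_v.append(q[best])
        v = new_v
    return v, policy


# Hypothetical "true" model: action 0 moves to state 0, action 1 moves to state 1.
TRUE_R = [[1.0, 0.5], [0.0, 2.0]]
TRUE_P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]


def approx_params(n: int) -> Tuple[Rewards, Transitions]:
    """n-th approximation: rewards shifted and transitions mixed with the uniform
    distribution, both by an error that shrinks geometrically in n (assumed)."""
    eps = 0.5 ** (n + 1)
    r = [[x + eps for x in row] for row in TRUE_R]
    P = [[[(1 - eps) * p + eps / 2 for p in dist] for dist in row] for row in TRUE_P]
    return r, P


v, policy = modified_value_iteration(approx_params, beta=0.9, n_iter=300)
print(v, policy)  # v is close to [18.5, 20.0]; policy == [1, 1]
```

For this toy model the exact optimal values with beta = 0.9 are 20 in state 1 (stay and collect 2 forever) and 0.5 + 0.9 * 20 = 18.5 in state 0, so the iterates computed from the approximate parameters approach the solution of the exact optimality equation.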
As a side result, we characterize the asymptotic behavior of backward products of a geometrically convergent sequence of Markov matrices.
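A backward product of Markov matrices P_1, ..., P_n is P_n P_{n-1} ... P_1, with the newest factor on the left; for example, unrolling v_n = r_n + beta P_n v_{n-1} produces such products. The sketch below only computes one numerically for a hypothetical sequence P_n = (1 - 2^{-n}) P + 2^{-n} I that converges geometrically to an irreducible, aperiodic P; it does not reproduce the paper's characterization. In this particular example the rows of the product become numerically identical, i.e. the product approaches a rank-one stochastic matrix. The names `perturbed` and `P_LIMIT` are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def backward_product(mats: List[Matrix]) -> Matrix:
    """Return P_n P_{n-1} ... P_1 for mats = [P_1, ..., P_n]."""
    size = len(mats[0])
    U = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for P in mats:
        U = matmul(P, U)  # left-multiply by the next (newer) matrix
    return U


def row_spread(U: Matrix) -> float:
    """Largest column-wise gap between rows; 0 means all rows are identical."""
    return max(max(col) - min(col) for col in zip(*U))


# Hypothetical limit matrix (irreducible, aperiodic) and perturbation.
P_LIMIT = [[0.5, 0.5], [0.2, 0.8]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def perturbed(n: int) -> Matrix:
    """P_n = (1 - 2^-n) P_LIMIT + 2^-n I, which converges geometrically to P_LIMIT."""
    eps = 0.5 ** n
    return [[(1 - eps) * p + eps * q for p, q in zip(prow, qrow)]
            for prow, qrow in zip(P_LIMIT, IDENTITY)]


U = backward_product([perturbed(n) for n in range(1, 61)])
print(row_spread(U), U[0])  # spread is essentially 0; U[0] is the common row
```

Because each factor is a stochastic matrix whose rows overlap, every left-multiplication shrinks the gap between the rows of the running product, which is why the spread collapses here.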
Additional information
Communicated by R. A. Howard
Cite this article
Federgruen, A., Schweitzer, P.J. Nonstationary Markov decision problems with converging parameters. J Optim Theory Appl 34, 207–241 (1981). https://doi.org/10.1007/BF00935474