Nonstationary Markov decision problems with converging parameters

  • Contributed Papers
Journal of Optimization Theory and Applications

Abstract

This paper considers the solution of Markov decision problems whose parameters can be obtained only via approximating schemes, or where it is computationally preferable to approximate the parameters, rather than employing exact algorithms for their computation.

Various models are presented in which this situation occurs. Furthermore, it is shown that a modified value-iteration method may be employed, both for the discounted version and for the undiscounted version of the model, in order to solve the optimality equation and to find optimal policies. In both cases, the convergence rate is determined.
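As a rough illustration of the setting described above, the following sketch runs discounted value iteration in which the rewards and transition probabilities used at iteration n are only approximations whose error shrinks geometrically. It is not the paper's modified method or its rate analysis, which the abstract does not spell out. The two-state example data, the perturbation scheme in `perturbed`, the error ratio `rho`, and all function names are assumptions made for illustration only.

```python
# Hypothetical 2-state, 2-action example: REWARDS[i][a] and TRANSITIONS[i][a][j].
REWARDS = [[1.0, 0.0], [0.0, 2.0]]
TRANSITIONS = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.5, 0.5], [0.2, 0.8]],
]


def bellman_backup(v, rewards, transitions, beta):
    """One discounted backup: for each state, max over actions of r + beta * P v."""
    new_v = []
    for i in range(len(v)):
        best = max(
            rewards[i][a]
            + beta * sum(p * v[j] for j, p in enumerate(transitions[i][a]))
            for a in range(len(rewards[i]))
        )
        new_v.append(best)
    return new_v


def perturbed(rewards, transitions, eps):
    """Assumed approximation scheme: parameters that are off by an error of size eps."""
    n = len(transitions)
    r_n = [[r + eps for r in row] for row in rewards]
    # Mixing with the uniform distribution keeps every row a probability vector.
    p_n = [
        [[(1 - eps) * p + eps / n for p in dist] for dist in row]
        for row in transitions
    ]
    return r_n, p_n


def nonstationary_value_iteration(rewards, transitions, beta, rho, n_iters):
    """Value iteration that uses the n-th approximate parameters at iteration n."""
    v = [0.0] * len(rewards)
    for n in range(n_iters):
        r_n, p_n = perturbed(rewards, transitions, rho ** n)
        v = bellman_backup(v, r_n, p_n, beta)
    return v


def exact_value_iteration(rewards, transitions, beta, n_iters):
    """Ordinary value iteration with the exact (limiting) parameters."""
    v = [0.0] * len(rewards)
    for _ in range(n_iters):
        v = bellman_backup(v, rewards, transitions, beta)
    return v


if __name__ == "__main__":
    print(nonstationary_value_iteration(REWARDS, TRANSITIONS, 0.9, 0.5, 300))
    print(exact_value_iteration(REWARDS, TRANSITIONS, 0.9, 300))
```

In this toy example the two runs end at practically the same value vector, because the parameter errors vanish geometrically while each backup contracts errors by the discount factor.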

As a side result, we characterize the asymptotic behavior of backward products of a geometrically convergent sequence of Markov matrices.
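To make the object in this side result concrete, the sketch below forms a backward product P_n P_(n-1) ... P_1 of stochastic matrices that converge geometrically to a limit matrix, with each new factor multiplied on the left. It only illustrates numerically what such a product looks like and does not reproduce the paper's characterization. The limit matrix `P_LIMIT`, the perturbation in `approx_matrix`, the ratio `rho`, and the helper names are assumptions.

```python
# Hypothetical limit matrix: a 2-state Markov matrix with strictly positive entries.
P_LIMIT = [[0.9, 0.1], [0.2, 0.8]]


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def approx_matrix(p, eps):
    """Assumed approximation: mix p with the uniform matrix; rows still sum to 1."""
    n = len(p)
    return [[(1 - eps) * x + eps / n for x in row] for row in p]


def backward_product(p, rho, n_factors):
    """Return P_n P_(n-1) ... P_1, where P_k = approx_matrix(p, rho ** k)."""
    n = len(p)
    u = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(1, n_factors + 1):
        u = matmul(approx_matrix(p, rho ** k), u)  # new factor goes on the left
    return u


def row_spread(u):
    """Largest gap between entries of the same column; zero when all rows agree."""
    return max(max(col) - min(col) for col in zip(*u))


if __name__ == "__main__":
    u = backward_product(P_LIMIT, 0.5, 100)
    print(u)
    print(row_spread(u))
```

For this positive example matrix the rows of the product become numerically identical after enough factors. The common row depends on the whole sequence of factors and need not equal the stationary distribution of the limit matrix.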

Communicated by R. A. Howard

Cite this article

Federgruen, A., Schweitzer, P.J. Nonstationary Markov decision problems with converging parameters. J Optim Theory Appl 34, 207–241 (1981). https://doi.org/10.1007/BF00935474

  • DOI: https://doi.org/10.1007/BF00935474
