Abstract
This paper is concerned with the links between the Value Iteration algorithm and the Rolling Horizon procedure for solving stochastic optimal control problems under the long-run average criterion, in Markov Decision Processes with finite state and action spaces. We review conditions from the literature that imply the geometric convergence of Value Iteration to the optimal value; aperiodicity is an essential prerequisite for convergence. We prove that the convergence of Value Iteration generally implies that of Rolling Horizon. We also present a modified Rolling Horizon procedure that can be applied to models without analyzing periodicity, and discuss the impact of this transformation on convergence. We illustrate the different results with numerous examples. Finally, we discuss rules for stopping Value Iteration or choosing the length of a Rolling Horizon, and provide an example that demonstrates the difficulty of the question, disproving in particular a conjectured rule proposed by Puterman.
Notes
Observe the discrepancy with the general notion of convergence of an algorithm in Computer Science, which requires that the algorithm stop and return the correct result.
References
Alden, J. M., & Smith, R. L. (1992). Rolling horizon procedures in nonhomogeneous Markov decision processes. Operations Research, 40(2), S183–S194.
Bertsekas, D. P. (1987). Dynamic programming: deterministic and stochastic models. Englewood Cliffs: Prentice Hall.
Derman, C. (1970). Finite state Markovian decision processes. New York: Academic Press.
Çinlar, E. (1975). Introduction to stochastic processes. Englewood Cliffs: Prentice Hall.
Federgruen, A., Schweitzer, P. J., & Tijms, H. C. (1978). Contraction mappings underlying undiscounted Markov decision problems. Journal of Mathematical Analysis and Applications, 65, 711–730.
Guo, X., & Shi, P. (2001). Limiting average criteria for nonstationary Markov decision processes. SIAM Journal on Optimization, 11(4), 1037–1053.
Hernández-Lerma, O., & Lasserre, J. B. (1990). Error bounds for rolling horizon policies in discrete-time Markov control processes. IEEE Transactions on Automatic Control, 35(10), 1118–1124.
Kallenberg, L. (2002). Finite state and action MDPs. In E. Feinberg & A. Shwartz (Eds.), Handbook of Markov decision processes: methods and applications. Boston: Kluwer Academic Publishers.
Kallenberg, L. (2009). Markov decision processes. Lecture notes, University of Leiden. www.math.leidenuniv.nl/~kallenberg/Lecture-notes-MDP.pdf.
Lanery, E. (1967). Etude asymptotique des systèmes Markoviens à commande. Revue Française d’Informatique et de Recherche Opérationnelle, 1, 3–56.
Meyn, S. P., & Tweedie, R. L. (2009). Markov chains and stochastic stability (2nd ed.). Cambridge: Cambridge University Press.
Puterman, M. L. (1994). Markov decision processes: discrete stochastic dynamic programming. New York: Wiley.
Ross, S. M. (1970). Applied probability models with optimization applications. Oakland: Holden-Day.
Schweitzer, P. J. (1971). Iterative solution of the functional equation of undiscounted Markov renewal programming. Journal of Mathematical Analysis and Applications, 34, 495–501.
Schweitzer, P. J., & Federgruen, A. (1977). The asymptotic behavior of undiscounted value iteration in Markov decision problems. Mathematics of Operations Research, 2(4), 360–381.
Schweitzer, P. J., & Federgruen, A. (1979). Geometric convergence of the value iteration in multichain Markov decision problems. Advances in Applied Probability, 11, 188–217.
Tijms, H. C. (1986). Stochastic modelling and analysis, a computational approach. New York: Wiley.
White, D. J. (1993). Markov decision processes. New York: Wiley.
Appendix
Each month, an individual must decide how to allocate his wealth between different consumption and investment options. Each state represents the individual's wealth level at the start of a month. Each wealth level gives access to two investment profiles, prudent or risky. Choosing a profile at each level determines the transition probabilities to the next wealth level, as well as an instantaneous gain. The individual's objective is to maximize the average gain.
There are five wealth levels, ordered from smallest to largest. At the medium level, the risky profile entails a positive probability of falling to the next lower level. It is also possible to cycle between the two lower levels, but no action gives access to the three upper levels from the lower ones. Moreover, from the poorest level, some external help brings the individual back to level 2. There is a common action space A={a_1,a_2}, where a_1 represents the prudent investment profile and a_2 the risky one. The data are shown below: \(P_{a_{k}}(s,j)\) is the transition probability from state s to state j when action a_k is used, i.e. \(P_{a_{k}}(s,j)=p(j|s,a_{k})\).
The gains are summarized as follows: r(s,a_k) in the matrix below is the gain obtained when action a_k is chosen in state s.
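The data tables themselves are not reproduced in this excerpt. As an illustration only, the convention \(P_{a_{k}}(s,j)=p(j|s,a_{k})\) maps naturally to a three-dimensional array layout; the values below are uniform placeholders, not the paper's tables:

```python
import numpy as np

# Placeholder encoding of the convention P_{a_k}(s, j) = p(j | s, a_k)
# for 5 states and 2 actions. The numbers are uniform dummies,
# NOT the tables from the paper.
n_states, n_actions = 5, 2
P = np.full((n_actions, n_states, n_states), 1.0 / n_states)
r = np.zeros((n_states, n_actions))  # r[s, k] = r(s, a_k)

# Sanity check: every row of every transition matrix sums to 1.
assert np.allclose(P.sum(axis=2), 1.0)
```

Any concrete model, including the one of this appendix, can be entered by filling these two arrays with its actual probabilities and gains.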
Through the implementation of the MRH procedure, the optimal average wealth can be computed: g^* = (2,2,4,4,4). It is attained by the stationary policy associated with the decision rule d=(a_2,a_2,a_1,a_2,a_1), whose transition matrix is shown below.
Clearly, this is a multichain, periodic model. When the RH procedure is applied directly, it does not converge: it returns two policies, (a_2,a_2,a_1,a_1,a_1) and (a_2,a_2,a_1,a_2,a_1), infinitely (and periodically) often. The first yields a gain g=(2,2,3,3,3) and is therefore not optimal.
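This oscillation of the first-stage decision rule can be reproduced on a small hypothetical model (not the paper's five-state example): a transient state feeding a deterministic two-cycle. The sketch below assumes that RH with horizon N applies the first-stage maximizer of backward induction, and that the model can be made aperiodic with a Schweitzer-style data transformation P_a → (1−τ)I + τP_a; the specific model, function names, and τ = 0.5 are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical 3-state MDP (NOT the paper's five-state example):
# state 0 chooses between two actions; states 1 and 2 form a
# deterministic cycle of period 2 with rewards 0 and 4.
# Convention: P[k][s, j] = p(j | s, a_k), r[s, k] = r(s, a_k).
P = np.array([
    [[0., 1., 0.],   # a_1: 0 -> 1 (enter the cycle at state 1)
     [0., 0., 1.],   #      1 -> 2
     [0., 1., 0.]],  #      2 -> 1
    [[0., 0., 1.],   # a_2: 0 -> 2 (enter the cycle at state 2)
     [0., 0., 1.],   # states 1 and 2 behave identically under a_2
     [0., 1., 0.]],
])
r = np.array([
    [0.5, 0.0],  # small immediate bonus for the a_1 entry point
    [0.0, 0.0],
    [4.0, 4.0],
])

def rolling_horizon_rule(P, r, N):
    """Backward induction over horizon N; returns the first-stage
    decision rule (one action index per state) and the value v_N."""
    v = np.zeros(P.shape[1])
    rule = np.zeros(P.shape[1], dtype=int)
    for _ in range(N):
        # Q-values: q(s, a) = r(s, a) + sum_j p(j | s, a) v(j)
        q = r + np.einsum('asj,j->sa', P, v)
        rule, v = q.argmax(axis=1), q.max(axis=1)
    return rule, v

def aperiodicity_transform(P, tau=0.5):
    """Replace every P_a by (1 - tau) I + tau P_a, 0 < tau < 1: each
    state acquires a self-loop, removing all periodicities, while
    stationary distributions (hence average gains, the rewards being
    left unchanged) are preserved."""
    return (1.0 - tau) * np.eye(P.shape[-1]) + tau * P

# On the periodic model, the first-stage action chosen in state 0
# oscillates with the horizon length N ...
for N in (1, 2, 3, 4):
    print(N, rolling_horizon_rule(P, r, N)[0][0])
# ... while on the transformed, aperiodic model it stabilizes.
P_t = aperiodicity_transform(P)
for N in (1, 2, 3, 4):
    print(N, rolling_horizon_rule(P_t, r, N)[0][0])
```

The transformation changes neither the chain structure nor the average gains, which is what allows a modified procedure to be applied "without analyzing periodicity"; only the relative values are rescaled.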
Della Vecchia, E., Di Marco, S. & Jean-Marie, A. Illustrated review of convergence conditions of the value iteration algorithm and the rolling horizon procedure for average-cost MDPs. Ann Oper Res 199, 193–214 (2012). https://doi.org/10.1007/s10479-012-1070-0