On the convergence of successive approximations in dynamic programming with non-zero terminal reward

Zeitschrift für Operations Research

Abstract

This paper considers the convergence of the finite-horizon optimal value functions of dynamic programming to the infinite-horizon optimal value function when there is a non-zero terminal-reward function. The model and methods closely follow those used by Schäl in a recent paper, in which a terminal reward of zero was assumed. We first present convergence conditions that are direct extensions of Schäl's, then related conditions in which the terminal-reward function is an upper or lower bound for the infinite-horizon optimal value function. Some applications to problems in queueing control are mentioned briefly. We also comment on the relation between our conditions and the more restrictive conditions of strongly convergent and contractive models, and present a very general result concerning uniqueness of the solution to the infinite-horizon optimality equation.

Summary

The paper considers the convergence of the value functions of dynamic optimization problems with a finite planning horizon to the value functions for an infinite planning horizon, where the terminal payoff is non-zero. The model and approach follow corresponding results of Schäl for the case in which the terminal payoff is zero. First, convergence conditions are given that are direct extensions of Schäl's results, followed by conditions in which the terminal payoff is an upper or lower bound for the value function for an infinite planning horizon. Some applications to problems in the control of queues are mentioned. In addition, the relation between our conditions and the more restrictive conditions for strongly convergent and contraction models is discussed, and a very general result on the uniqueness of the solution to the optimality equation for an infinite planning horizon is given.
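As a rough illustration of what the abstract means by successive approximations with a non-zero terminal reward, the following Python sketch runs value iteration from two different terminal-reward vectors. It assumes a small finite, discounted model (the contractive special case the abstract calls more restrictive), not the general model treated in the paper. The function name `finite_horizon_values`, the arrays `P` and `r`, the discount factor `beta`, and all numbers are hypothetical and chosen only for this example.

```python
import numpy as np

def finite_horizon_values(P, r, beta, terminal, n_stages):
    """Optimal n-stage values for a finite discounted model (illustrative only).

    P[a] is the transition matrix under action a, shape (A, S, S).
    r[a] is the one-stage reward vector under action a, shape (A, S).
    `terminal` is the terminal-reward vector collected after the last stage.
    """
    v = np.asarray(terminal, dtype=float)
    for _ in range(n_stages):
        # One successive-approximation step: v(s) <- max_a [ r(a, s) + beta * sum_s' P(a, s, s') v(s') ]
        v = np.max(r + beta * (P @ v), axis=0)
    return v

# Hypothetical two-state, two-action model (assumed data, not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
beta = 0.9

# Same model, two different terminal rewards.
v_zero = finite_horizon_values(P, r, beta, [0.0, 0.0], 200)
v_nonzero = finite_horizon_values(P, r, beta, [10.0, -5.0], 200)

# Apply the optimality operator once more to check that the limit is a fixed point.
v_next = np.max(r + beta * (P @ v_zero), axis=0)

print(v_zero, v_nonzero)
```

In this discounted setting the influence of the terminal reward shrinks by a factor of `beta` at each stage, so both runs approach the same vector, which also satisfies the optimality equation up to numerical tolerance. The conditions discussed in the paper concern models where no such contraction is available.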

References

  • Blackwell, D.: Discounted dynamic programming. Ann. Math. Statist. 36, 1965, 226–235.

  • Denardo, E.V.: Contraction mappings in the theory underlying dynamic programming. SIAM Rev. 9, 1967, 165–177.

  • Hee, K.M. van, A. Hordijk, and J. van der Wal: Successive approximations for convergent dynamic programming. Markov Decision Theory. Ed. by H.C. Tijms and J. Wessels. Mathematical Centre Tract No. 93, Amsterdam 1977.

  • Hinderer, K.: Foundations of non-stationary dynamic programming with discrete time parameter. Lecture Notes in Operations Research and Mathematical Systems, vol. 33. Berlin-Heidelberg-New York 1970.

  • Hordijk, A.: Dynamic programming and Markov potential theory. Mathematical Centre Tracts 51, Amsterdam 1974.

  • Nunen, J.A.E.E. van: Contracting Markov decision processes. Mathematical Centre Tracts 71, Amsterdam 1976.

  • Nunen, J.A.E.E. van, and J. Wessels: Markov decision processes with unbounded rewards. Markov Decision Theory. Ed. by H.C. Tijms and J. Wessels. Mathematical Centre Tract No. 93, Amsterdam 1977.

  • Schäl, M.: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrscheinlichkeitstheorie verw. Geb. 32, 1975, 179–196.

  • Stidham, S.: Socially and individually optimal control of arrivals to a GI/M/1 queue. Management Science 24, 1978, 1598–1610.

  • Strauch, R.E.: Negative dynamic programming. Ann. Math. Statist. 37, 1966, 871–890.

  • Wessels, J.: Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58, 1977, 326–335.

  • Whittle, P.: A simple condition for regularity in negative programming. J. Appl. Prob. 16, 1979, 305–318.

Additional information

An earlier draft of this paper appeared as: “On the Convergence of Successive Approximations and Uniqueness of the Solution to the Functional Equation of Dynamic Programming.” IMSOR Report 2190, The Institute of Mathematical Statistics and Operations Research, The Technical University of Denmark, May 1977 (revised, July 1977).

This research was partially supported by NATO Research Grant No. SRG.SS.5, administered by the NATO Special Programme Panel on Systems Science, and was begun while the author was guest professor at The Institute of Mathematical Statistics and Operations Research at The Technical University of Denmark, January to July, 1977. Further support was provided by the National Science Foundation under Grant No. ENG78-24420.

About this article

Cite this article

Stidham, S. On the convergence of successive approximations in dynamic programming with non-zero terminal reward. Zeitschrift für Operations Research 25, 57–77 (1981). https://doi.org/10.1007/BF01920049

  • DOI: https://doi.org/10.1007/BF01920049