On the convergence of successive approximations in dynamic programming with non-zero terminal reward

Stidham, S.

doi:10.1007/BF01920049


Published: May 1981

Volume 25, pages 57–77, (1981)

Zeitschrift für Operations Research

S. Stidham Jr.


Abstract

This paper considers the convergence of the finite-horizon optimal value functions of dynamic programming to the infinite-horizon optimal value function, when there is a non-zero terminal-reward function. The model and methods follow closely those used by Schäl in a recent paper, in which a terminal reward of zero was assumed. We first present convergence conditions that are direct extensions of Schäl's, then related conditions in which the terminal-reward function is an upper or lower bound for the infinite-horizon optimal value function. Some applications to problems in queueing control are mentioned briefly. We also comment on the relation between our conditions and the more restrictive conditions of strongly convergent and contractive models, and present a very general result concerning uniqueness of the solution to the infinite-horizon optimality equation.
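As a rough illustration of the question the abstract poses, the following sketch runs successive approximations on a small discounted problem, once with a zero terminal reward and once with a non-zero one. It covers only the contractive special case mentioned above, where convergence holds for any bounded terminal reward; it is not the paper's general model. The two-state problem, its numbers, and the names `BETA`, `REWARD`, `TRANS`, `bellman` and `finite_horizon_value` are all assumptions made up for this example.

```python
# Hypothetical 2-state, 2-action discounted problem (illustrative numbers,
# not taken from the paper).
BETA = 0.9  # discount factor < 1, which makes the model contractive

# REWARD[s][a]: one-step reward in state s under action a
REWARD = [[1.0, 0.0], [0.0, 2.0]]
# TRANS[s][a][t]: probability of moving from state s to state t under action a
TRANS = [
    [[0.5, 0.5], [0.0, 1.0]],
    [[1.0, 0.0], [0.5, 0.5]],
]


def bellman(v):
    """Apply the one-step optimality operator to the value vector v."""
    return [
        max(
            REWARD[s][a] + BETA * sum(p * v[t] for t, p in enumerate(TRANS[s][a]))
            for a in range(2)
        )
        for s in range(2)
    ]


def finite_horizon_value(terminal, n):
    """Return the n-stage optimal value function for a given terminal reward."""
    v = list(terminal)
    for _ in range(n):
        v = bellman(v)
    return v


# Same horizon, two different terminal-reward functions.
v_zero = finite_horizon_value([0.0, 0.0], 300)
v_nonzero = finite_horizon_value([50.0, -20.0], 300)

# Largest difference between the two approximations.
gap = max(abs(x - y) for x, y in zip(v_zero, v_nonzero))
# How far v_nonzero is from satisfying the optimality equation v = bellman(v).
residual = max(abs(x - y) for x, y in zip(bellman(v_nonzero), v_nonzero))
print(gap, residual)
```

In this discounted setting the effect of the terminal reward shrinks by a factor of `BETA` at each stage, so both runs approach the same fixed point. Without discounting or a comparable condition that need not happen, which is the situation the paper's convergence conditions address.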

Summary

This paper considers the convergence of the value functions of dynamic optimization problems with a finite planning horizon to the value functions for an infinite planning horizon, where the terminal reward is non-zero. The model and approach follow corresponding results of Schäl for the case in which the terminal reward is zero. First, convergence conditions are given that are direct extensions of Schäl's results, followed by conditions in which the terminal reward is an upper or lower bound for the value function with an infinite planning horizon. Some applications to problems in the control of queues are mentioned. In addition, the relation between our conditions and the more restrictive conditions of strongly convergent and contraction models is discussed, and a very general result on the uniqueness of the solution to the optimality equation with an infinite planning horizon is given.


References

Blackwell, D.: Discounted dynamic programming. Ann. Math. Statist. 36, 1965, 226–235.
Denardo, E.V.: Contraction mappings in the theory underlying dynamic programming. SIAM Rev. 9, 1967, 165–177.
Hee, K.M. van, A. Hordijk, and J. van der Wal: Successive approximations for convergent dynamic programming. Markov Decision Theory. Ed. by H.C. Tijms and J. Wessels. Mathematical Centre Tract No. 93, Amsterdam 1977.
Hinderer, K.: Foundations of non-stationary dynamic programming with discrete time parameter. Lecture Notes in Operations Research and Mathematical Systems, vol. 33. Berlin-Heidelberg-New York 1970.
Hordijk, A.: Dynamic programming and Markov potential theory. Mathematical Centre Tracts 51, Amsterdam 1974.
Nunen, J.A.E.E. van: Contracting Markov decision processes. Mathematical Centre Tracts 71, Amsterdam 1976.
Nunen, J.A.E.E. van, and J. Wessels: Markov decision processes with unbounded rewards. Markov Decision Theory. Ed. by H.C. Tijms and J. Wessels. Mathematical Centre Tract No. 93, Amsterdam 1977.
Schäl, M.: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrscheinlichkeitstheorie verw. Geb. 32, 1975, 179–196.
Stidham, S.: Socially and individually optimal control of arrivals to a GI/M/1 queue. Management Science 24, 1978, 1598–1610.
Strauch, R.E.: Negative dynamic programming. Ann. Math. Statist. 37, 1966, 871–890.
Wessels, J.: Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58, 1977, 326–335.
Whittle, P.: A simple condition for regularity in negative programming. J. Appl. Prob. 16, 1979, 305–318.


Author information

Authors and Affiliations

Department of Industrial Engineering, North Carolina State University, P.O. Box 5511, Raleigh, NC 27650, USA
S. Stidham Jr.


Additional information

An earlier draft of this paper appeared as: “On the Convergence of Successive Approximations and Uniqueness of the Solution to the Functional Equation of Dynamic Programming.” IMSOR Report 2190, The Institute of Mathematical Statistics and Operations Research, The Technical University of Denmark, May 1977 (revised, July 1977).

This research was partially supported by NATO Research Grant No. SRG.SS.5, administered by the NATO Special Programme Panel on Systems Science, and was begun while the author was guest professor at The Institute of Mathematical Statistics and Operations Research at The Technical University of Denmark, January to July, 1977. Further support was provided by the National Science Foundation under Grant No. ENG78-24420.


About this article

Cite this article

Stidham, S. On the convergence of successive approximations in dynamic programming with non-zero terminal reward. Zeitschrift für Operations Research 25, 57–77 (1981). https://doi.org/10.1007/BF01920049


Received: 15 February 1979
Revised: 15 October 1980
Issue Date: May 1981
DOI: https://doi.org/10.1007/BF01920049
