Mathematical Programming

, Volume 30, Issue 1, pp 46–70 | Cite as

Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints

  • A. Hordijk
  • L. C. M. Kallenberg
Article

Abstract

This paper investigates the computation of transient-optimal policies in discrete dynamic programming. The model, is quite general: it may contain transient as well as nontransient policies. and the transition matrices are not necessarily substochastic.

A functional equation for the so-called transient-value-vector is derived and the concept of superharmonicity is introduced. This concept provides the linear program to compute the transientvalue-vector and a transient-optimal policy.

We also discuss the elimination of suboptimal actions, the solution of problems with additional constraints, and the computation of an efficient policy for a multiple objective dynamic programming problem.

Key words

Transient Dynamic Programming Markov Decision Chains Linear Programming Suboptimal Actions Additional Constraints Multiple Objective Dynamic Programming 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. D. Blackwell, “Discrete dynamic programming”,Annals of Mathematical Statistics 33 (1962) 719–726.MathSciNetMATHGoogle Scholar
  2. E. V. Denardo, “Contraction mappings in the theory underlying dynamic programming”.SIAM Review 9 (1967) 165–177.MATHCrossRefMathSciNetGoogle Scholar
  3. E.V. Denardo and U.G. Rothblum, “Optimal stopping, exponential utility and linear programming”,Mathematical Programming 16 (1979) 228–244.CrossRefMathSciNetGoogle Scholar
  4. F. D'Epenoux “Sur un problème de production et de stockage dans l'aléatoire”,Revue Française de Recherche Operationelle 14 (1960) 3–16.Google Scholar
  5. C Derman,Finite state Markovian decision processes (Academic Press, New York, 1970).MATHGoogle Scholar
  6. D. K. Faddeev and V.N. Faddeeva,Computational methods of linear algebra (Freeman, San Francisco, 1963).Google Scholar
  7. K. M. van Hee, A. Hordijk and J. van der Wal, “Successive approximations for convergent dynamic programming”, in: H.C. Tijms and J. Wessels, eds.,Markov decision theory (Mathematical Centre Tract no. 93) (Mathematical Centre. Amsterdam, 1977) pp. 183–211.Google Scholar
  8. A. Hordijk,Dynamic programming and Markov potential theory (Mathematical Centre Tract. 51, Mathematical Centre, Amsterdam, 1974).MATHGoogle Scholar
  9. A. Hordijk and L.C.M. Kallenberg, “Linear programming and Markov decision chains.Management Science 25 (1979) 352–362.MATHMathSciNetGoogle Scholar
  10. A. Hordijk and L.C.M. Kallenberg, “Linear programming and Markov games”, in O. Moeschlin and D. Pallaschke, eds,Game theory and mathematical economics (North-Holland, Amsterdam, 1981) pi. 291–320.Google Scholar
  11. L.C.M. Kalenberg, “Linear programming and finite Markovian control problems”, Thesis, University of Leiden (Mathematical Centre, Amsterdam, 1980).Google Scholar
  12. S. Karlin,Mathematical methods and theory in games, programming and economics, volume I (Addison-Wesley, Reading, Massachusetts, 1959).Google Scholar
  13. T. Kato,Perturbartion theory for linear operators (Springer, New York, 1966).Google Scholar
  14. M. Loève,Probability theory (Van Nostrand, New York, 1955).MATHGoogle Scholar
  15. J MacQueen, “A test for suboptimal actions in Markovian decision problems”,Operations Research 15 (1967) 559–561.MATHGoogle Scholar
  16. B.L. Miller and A.F. Veinott Jr., “Discrete dynamic programming with a small interest rate”,Annals of Mathematical Statistics 40 (1969) 366–370.MathSciNetMATHGoogle Scholar
  17. J.A.E.E. van Nunen and J. Wessels. “Markov decision processes with unbounded rewards”, in: H.C. Tijms and J. Wessels, eds.,Markov decision theory (Mathematical Centre Tract no. 93) (Mathematical Centre, Amsterdam, 1977) pp. 1–24.Google Scholar
  18. U.G. Rothblum, “Normalized Markov decision chains. I: Sensitive discount optimality”.Operations Research 23 (1975) 785–795.MATHMathSciNetCrossRefGoogle Scholar
  19. U.G. Rothblum, “Normalized Markov decision chains. II: Optimality of nonstationary policies”,SIAM Journal of Control and Optimization 15 (1977) 221–232.MATHCrossRefMathSciNetGoogle Scholar
  20. A.F. Veinott Jr., Discrete dynamic programming with sensitive discount optimality criteria”,Annals of Mathematical Statistics 40 (1969) 1636–1660.Google Scholar

Copyright information

© The Mathematical Programming Society, Inc. 1984

Authors and Affiliations

  • A. Hordijk
    • 1
  • L. C. M. Kallenberg
    • 1
  1. 1.Department of MathematicsUniversity of LeidenThe Netherlands

Personalised recommendations