Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints
Article
Received:
Revised:
- 64 Downloads
- 6 Citations
Abstract
This paper investigates the computation of transient-optimal policies in discrete dynamic programming. The model, is quite general: it may contain transient as well as nontransient policies. and the transition matrices are not necessarily substochastic.
A functional equation for the so-called transient-value-vector is derived and the concept of superharmonicity is introduced. This concept provides the linear program to compute the transientvalue-vector and a transient-optimal policy.
We also discuss the elimination of suboptimal actions, the solution of problems with additional constraints, and the computation of an efficient policy for a multiple objective dynamic programming problem.
Key words
Transient Dynamic Programming Markov Decision Chains Linear Programming Suboptimal Actions Additional Constraints Multiple Objective Dynamic ProgrammingPreview
Unable to display preview. Download preview PDF.
References
- D. Blackwell, “Discrete dynamic programming”,Annals of Mathematical Statistics 33 (1962) 719–726.MathSciNetMATHGoogle Scholar
- E. V. Denardo, “Contraction mappings in the theory underlying dynamic programming”.SIAM Review 9 (1967) 165–177.MATHCrossRefMathSciNetGoogle Scholar
- E.V. Denardo and U.G. Rothblum, “Optimal stopping, exponential utility and linear programming”,Mathematical Programming 16 (1979) 228–244.CrossRefMathSciNetGoogle Scholar
- F. D'Epenoux “Sur un problème de production et de stockage dans l'aléatoire”,Revue Française de Recherche Operationelle 14 (1960) 3–16.Google Scholar
- C Derman,Finite state Markovian decision processes (Academic Press, New York, 1970).MATHGoogle Scholar
- D. K. Faddeev and V.N. Faddeeva,Computational methods of linear algebra (Freeman, San Francisco, 1963).Google Scholar
- K. M. van Hee, A. Hordijk and J. van der Wal, “Successive approximations for convergent dynamic programming”, in: H.C. Tijms and J. Wessels, eds.,Markov decision theory (Mathematical Centre Tract no. 93) (Mathematical Centre. Amsterdam, 1977) pp. 183–211.Google Scholar
- A. Hordijk,Dynamic programming and Markov potential theory (Mathematical Centre Tract. 51, Mathematical Centre, Amsterdam, 1974).MATHGoogle Scholar
- A. Hordijk and L.C.M. Kallenberg, “Linear programming and Markov decision chains.Management Science 25 (1979) 352–362.MATHMathSciNetGoogle Scholar
- A. Hordijk and L.C.M. Kallenberg, “Linear programming and Markov games”, in O. Moeschlin and D. Pallaschke, eds,Game theory and mathematical economics (North-Holland, Amsterdam, 1981) pi. 291–320.Google Scholar
- L.C.M. Kalenberg, “Linear programming and finite Markovian control problems”, Thesis, University of Leiden (Mathematical Centre, Amsterdam, 1980).Google Scholar
- S. Karlin,Mathematical methods and theory in games, programming and economics, volume I (Addison-Wesley, Reading, Massachusetts, 1959).Google Scholar
- T. Kato,Perturbartion theory for linear operators (Springer, New York, 1966).Google Scholar
- M. Loève,Probability theory (Van Nostrand, New York, 1955).MATHGoogle Scholar
- J MacQueen, “A test for suboptimal actions in Markovian decision problems”,Operations Research 15 (1967) 559–561.MATHGoogle Scholar
- B.L. Miller and A.F. Veinott Jr., “Discrete dynamic programming with a small interest rate”,Annals of Mathematical Statistics 40 (1969) 366–370.MathSciNetMATHGoogle Scholar
- J.A.E.E. van Nunen and J. Wessels. “Markov decision processes with unbounded rewards”, in: H.C. Tijms and J. Wessels, eds.,Markov decision theory (Mathematical Centre Tract no. 93) (Mathematical Centre, Amsterdam, 1977) pp. 1–24.Google Scholar
- U.G. Rothblum, “Normalized Markov decision chains. I: Sensitive discount optimality”.Operations Research 23 (1975) 785–795.MATHMathSciNetCrossRefGoogle Scholar
- U.G. Rothblum, “Normalized Markov decision chains. II: Optimality of nonstationary policies”,SIAM Journal of Control and Optimization 15 (1977) 221–232.MATHCrossRefMathSciNetGoogle Scholar
- A.F. Veinott Jr., Discrete dynamic programming with sensitive discount optimality criteria”,Annals of Mathematical Statistics 40 (1969) 1636–1660.Google Scholar
Copyright information
© The Mathematical Programming Society, Inc. 1984