Summary
We consider a discounted Markovian Decision Process (MDP) with finite state and action space. For a fixed discount factor we derive a bound on the number of steps taken by Howard's policy improvement algorithm (PIA) to determine an optimal policy for the MDP; this bound is essentially polynomial in the number of states and actions of the MDP. The main tools are the contraction properties of the PIA and a lower bound for the difference of the value functions of an MDP with rational data.
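The algorithm analyzed in the paper can be sketched as follows: alternate policy evaluation (solving a linear system) with greedy policy improvement until the policy no longer changes. This is a minimal illustration for a finite discounted MDP, not the paper's own code; the array shapes and names are assumptions for the sketch.

```python
import numpy as np

def policy_iteration(P, r, beta):
    """Howard's policy improvement for a finite discounted MDP (sketch).

    P    : array (A, S, S); P[a, s, t] = transition prob. s -> t under action a
    r    : array (S, A); one-step rewards
    beta : discount factor, 0 <= beta < 1
    Returns the optimal policy, its value function, and the step count.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)  # arbitrary initial policy
    steps = 0
    while True:
        steps += 1
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi
        P_pi = P[policy, np.arange(S), :]   # (S, S) transition matrix of policy
        r_pi = r[np.arange(S), policy]      # (S,) reward vector of policy
        v = np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v
        q = r + beta * np.einsum('ast,t->sa', P, v)   # (S, A) action values
        new_policy = np.argmax(q, axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v, steps          # fixed point reached: optimal
        policy = new_policy
```

The paper's result bounds `steps` by a quantity that is essentially polynomial in `S` and `A` for fixed `beta`.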
Zusammenfassung
(Translation) We consider a Markov decision process with finite state and action space. For a fixed discount factor we determine a bound on the number of steps of Howard's policy improvement algorithm that is essentially polynomial in the number of states and actions. The main tools are the contraction property of the algorithm and a lower bound for the difference of the value functions of an MDP with rational data.
Cite this article
Meister, U., Holzbaur, U. A polynomial time bound for Howard's policy improvement algorithm. OR Spektrum 8, 37–40 (1986). https://doi.org/10.1007/BF01720771