Lower Bounds for Howard’s Algorithm for Finding Minimum Mean-Cost Cycles
Howard’s policy iteration algorithm is one of the most widely used algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to weighted directed graphs, which may be viewed as Deterministic MDPs (DMDPs), Howard’s algorithm can be used to find Minimum Mean-Cost cycles (MMCC). Experimental studies suggest that Howard’s algorithm works extremely well in this context. The theoretical complexity of Howard’s algorithm for finding MMCCs is a mystery. No polynomial time bound is known on its running time. Prior to this work, there were only linear lower bounds on the number of iterations performed by Howard’s algorithm. We provide the first weighted graphs on which Howard’s algorithm performs Ω(n²) iterations, where n is the number of vertices in the graph.
Keywords: Markov Decision Process; Weighted Directed Graph; Improvement Step; Markov Decision Problem; Policy Iteration Algorithm
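To make the object of study concrete, the following is a minimal sketch of Howard-style policy iteration on a weighted directed graph. It is an illustration under stated assumptions, not necessarily the exact formulation analysed in the paper: every vertex is assumed to have at least one outgoing edge, the initial policy simply picks the first listed edge, all improvable vertices switch simultaneously, and the names `evaluate`, `howard_mmcc`, and the example graph `adj` are hypothetical. Exact rational arithmetic is used so that comparisons are not affected by rounding.

```python
from fractions import Fraction

def evaluate(policy):
    """Evaluate a policy, given as policy[u] = (v, cost): one outgoing edge per vertex.
    Returns (gain, bias): gain[u] is the mean cost of the cycle reached from u,
    bias[u] is a potential measured relative to an anchor vertex on that cycle."""
    n = len(policy)
    gain, bias = [None] * n, [None] * n
    for start in range(n):
        path, pos = [], {}
        u = start
        # Follow the policy until reaching an evaluated vertex or closing a cycle.
        while gain[u] is None and u not in pos:
            pos[u] = len(path)
            path.append(u)
            u = policy[u][0]
        tail = path
        if gain[u] is None:  # a new cycle was closed at u
            k = pos[u]
            cycle, tail = path[k:], path[:k]
            g = Fraction(sum(policy[w][1] for w in cycle), len(cycle))
            i = cycle.index(min(cycle))      # assumption: anchor = smallest index
            order = cycle[i:] + cycle[:i]    # cycle in forward order from the anchor
            gain[order[0]], bias[order[0]] = g, Fraction(0)
            for w in reversed(order[1:]):
                v, c = policy[w]
                gain[w], bias[w] = g, c - g + bias[v]
        # Vertices on the path leading into the cycle (or into an evaluated vertex).
        for w in reversed(tail):
            v, c = policy[w]
            gain[w], bias[w] = gain[v], c - gain[v] + bias[v]
    return gain, bias

def howard_mmcc(adj):
    """adj[u] is a non-empty list of (v, cost) edges.
    Returns (minimum mean cycle cost, final policy, number of improvement rounds)."""
    policy = [edges[0] for edges in adj]     # assumption: arbitrary initial policy
    rounds = 0
    while True:
        gain, bias = evaluate(policy)
        new_policy, changed = list(policy), False
        for u, edges in enumerate(adj):
            best = (gain[u], bias[u])
            for v, c in edges:
                # Lexicographic comparison: first the gain, then the bias.
                cand = (gain[v], c - gain[v] + bias[v])
                if cand < best:
                    best, new_policy[u], changed = cand, (v, c), True
        if not changed:
            return min(gain), policy, rounds
        policy, rounds = new_policy, rounds + 1

# Hypothetical example: cycle 0->1->0 has mean cost 2, cycle 2->3->2 has mean cost 1/2.
adj = [[(1, 1)], [(0, 3), (2, 1)], [(3, 1), (0, 5)], [(2, 0)]]
value, policy, rounds = howard_mmcc(adj)
```

On this small graph the sketch switches vertex 1 from the edge to vertex 0 to the edge to vertex 2, after which no vertex can improve. The quantity counted by `rounds` is the kind of iteration count that the lower bound in the abstract refers to, though the exact counting convention here is an assumption.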