Lower Bounds for Howard’s Algorithm for Finding Minimum Mean-Cost Cycles

  • Thomas Dueholm Hansen
  • Uri Zwick
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6506)

Abstract

Howard’s policy iteration algorithm is one of the most widely used algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to weighted directed graphs, which may be viewed as Deterministic MDPs (DMDPs), Howard’s algorithm can be used to find Minimum Mean-Cost cycles (MMCC). Experimental studies suggest that Howard’s algorithm works extremely well in this context. The theoretical complexity of Howard’s algorithm for finding MMCCs is a mystery. No polynomial time bound is known on its running time. Prior to this work, there were only linear lower bounds on the number of iterations performed by Howard’s algorithm. We provide the first weighted graphs on which Howard’s algorithm performs Ω(n²) iterations, where n is the number of vertices in the graph.
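
To make the setting concrete, here is a minimal Python sketch of Howard-style policy iteration on a weighted directed graph. It is an illustration under stated assumptions, not the authors' implementation: it assumes every vertex has at least one outgoing edge, uses exact rational arithmetic, values each vertex by the pair (mean cost of the cycle it reaches, potential relative to the smallest-index vertex on that cycle), and switches every vertex to its lexicographically best edge in each round. The names howard_min_mean_cycle and evaluate are hypothetical.

```python
from fractions import Fraction


def evaluate(n, policy):
    """Return (mu, d): cycle mean and potential of each vertex under a policy.

    policy[u] = (v, cost) is the single edge chosen at vertex u.
    """
    mu = [None] * n
    d = [None] * n

    def assign(x):
        # Value of x follows from the already evaluated successor of x.
        succ, cost = policy[x]
        mu[x] = mu[succ]
        d[x] = cost - mu[succ] + d[succ]

    for start in range(n):
        if mu[start] is not None:
            continue
        path, pos, u = [], {}, start
        # Follow the policy until we hit an evaluated vertex or close a cycle.
        while mu[u] is None and u not in pos:
            pos[u] = len(path)
            path.append(u)
            u = policy[u][0]
        tail_len = len(path)
        if mu[u] is None:
            # A new cycle was closed at u.
            tail_len = pos[u]
            cycle = path[tail_len:]
            head = min(cycle)  # assumed convention: smallest index is the head
            mu[head] = sum(policy[x][1] for x in cycle) / Fraction(len(cycle))
            d[head] = Fraction(0)
            k = cycle.index(head)
            # Vertices after the head along the cycle, processed back to front.
            for x in reversed(cycle[k + 1:] + cycle[:k]):
                assign(x)
        for x in reversed(path[:tail_len]):
            assign(x)
    return mu, d


def howard_min_mean_cycle(n, edges):
    """Return (minimum mean cycle cost, number of policy evaluations).

    edges is a list of (u, v, cost) with vertices numbered 0..n-1.
    """
    out = [[] for _ in range(n)]
    for u, v, c in edges:
        out[u].append((v, Fraction(c)))
    # Initial policy (an arbitrary choice): the cheapest outgoing edge.
    policy = [min(out[u], key=lambda e: e[1]) for u in range(n)]
    rounds = 0
    while True:
        rounds += 1
        mu, d = evaluate(n, policy)
        new_policy = list(policy)
        changed = False
        for u in range(n):
            best = (mu[u], d[u])
            for v, c in out[u]:
                appraisal = (mu[v], c - mu[v] + d[v])
                if appraisal < best:  # strict improvement only
                    best = appraisal
                    new_policy[u] = (v, c)
                    changed = True
        if not changed:
            return min(mu), rounds
        policy = new_policy


# Small example: the cycle 0 -> 1 -> 0 has mean cost (1 + 0) / 2.
value, rounds = howard_min_mean_cycle(
    3, [(0, 1, 1), (1, 2, 1), (2, 0, 1), (1, 0, 0), (2, 2, 5)]
)
print(value, rounds)  # prints: 1/2 1
```

The returned round count is the quantity the lower bound in the abstract is about: the number of policy evaluations performed before no improving edge remains.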

Keywords

  • Markov Decision Process
  • Weighted Directed Graph
  • Improvement Step
  • Markov Decision Problem
  • Policy Iteration Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Thomas Dueholm Hansen (1)
  • Uri Zwick (2)
  1. Department of Computer Science, Aarhus University, Denmark
  2. School of Computer Science, Tel Aviv University, Tel Aviv, Israel
