Exponential Lower Bounds for Policy Iteration
We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown that policy-iteration-style algorithms have exponential lower bounds in a two-player game setting. We extend these lower bounds to Markov decision processes with the total-reward and average-reward optimality criteria.