Strong polynomiality of the Gass-Saaty shadow-vertex pivoting rule for controlled random walks
We consider the subclass of linear programs that formulate Markov decision processes (MDPs). We show that the Simplex algorithm with the Gass-Saaty shadow-vertex pivoting rule is strongly polynomial for a subclass of MDPs called controlled random walks (CRWs); the running time is O(|S|^3 · |U|^2), where |S| denotes the number of states and |U| denotes the number of actions per state. This result improves the running time of the algorithm of Zadorojniy et al. (Mathematics of Operations Research 34(4):992–1007, 2009) by a factor of |S|. In particular, the number of iterations needed by the Simplex algorithm for CRWs is linear in the number of states and does not depend on the discount factor.
Keywords: Markov decision process; Controlled queues; Controlled random walks; Simplex algorithm; Gass-Saaty shadow-vertex pivoting rule
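For orientation, the linear program that formulates a discounted MDP is commonly written in occupation-measure form, as sketched below. This is the standard textbook formulation (see, e.g., Puterman 1994), not necessarily the exact program used in the paper. The symbols are assumptions introduced here for illustration: c(s,u) is a one-step cost, P(s'|s,u) a transition probability, γ in [0,1) the discount factor, μ an initial state distribution, and x(s,u) the discounted occupation measure.

```latex
\begin{align*}
\min_{x}\quad & \sum_{s \in S} \sum_{u \in U} c(s,u)\, x(s,u) \\
\text{s.t.}\quad & \sum_{u \in U} x(s',u)
   \;-\; \gamma \sum_{s \in S} \sum_{u \in U} P(s' \mid s,u)\, x(s,u)
   \;=\; \mu(s') \qquad \text{for all } s' \in S, \\
& x(s,u) \ge 0 \qquad \text{for all } s \in S,\ u \in U .
\end{align*}
```

In this form the program has |S| equality constraints and |S|·|U| variables, and a basic feasible solution with one positive variable per state corresponds to a deterministic policy, which is why Simplex pivots can be read as policy changes.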
- Amenta, N., & Ziegler, G. M. (1996). Advances in discrete and computational geometry. In Contemporary mathematics: Vol. 223. Deformed products and maximal shadows of polytopes. Providence: Am. Math. Soc.
- Barasz, M., & Vempala, S. (2010). A new approach to strongly polynomial linear programming. In Innovations in computer science (pp. 42–48).
- de Ghellinck, G. (1960). Les problemes de decisions sequentielles. Cahiers Du Centre D’études de Recherche Opérationnelle, 2, 161–179.
- Kitaev, M. Y., & Rykov, V. V. (1995). Controlled queueing systems. Boca Raton: CRC Press.
- Kleinrock, L. (1975). Queueing systems, Vol. I: Theory. New York: Wiley.
- Matoušek, J., & Gärtner, B. (2007). Understanding and using linear programming. Berlin: Springer.
- Meyn, S. P. (2008). Control techniques for complex networks. Cambridge: Cambridge University Press.
- Puterman, M. L. (1994). Markov decision processes: discrete stochastic dynamic programming. New York: Wiley.
- Schrijver, A. (1998). Theory of linear and integer programming. New York: Wiley.
- Ye, Y. (2010). The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Seminar, talk.
- Zadorojniy, A., & Even, G. Hyperbolic behavior of occupation measures between neighboring policies in CMDPs. http://www.eng.tau.ac.il/~sasha/.