Long-Term Values in Markov Decision Processes, (Co)Algebraically
This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
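As a concrete illustration of the fixpoint view mentioned above, the following minimal sketch computes the long-term value of a fixed policy under discounted sums by iterating the map v ↦ r + γ·P_π v. For a discount factor γ < 1 this map is a contraction in the sup norm, so Banach's Fixpoint Theorem gives a unique fixpoint and the iteration converges to it. The data layout (`transitions`, `rewards`, `policy`), the function name `policy_value`, and the two-state example are assumptions made for illustration; they are not taken from the paper.

```python
def policy_value(transitions, rewards, policy, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Approximate the long-term value of `policy` by fixpoint iteration.

    transitions[s][a] is a dict {next_state: probability} (hypothetical layout),
    rewards[s] is the reward collected in state s,
    policy[s] is the action chosen in state s.
    """
    n = len(rewards)
    v = [0.0] * n  # any starting point works, since the map is a contraction
    for _ in range(max_iter):
        new_v = [
            rewards[s]
            + gamma * sum(p * v[t] for t, p in transitions[s][policy[s]].items())
            for s in range(n)
        ]
        # Stop once successive iterates are close in the sup norm.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state MDP: state 0 pays reward 1, state 1 pays 0.
transitions = [
    {"stay": {0: 1.0}, "move": {1: 1.0}},
    {"stay": {1: 1.0}, "move": {0: 1.0}},
]
rewards = [1.0, 0.0]

lazy = ["stay", "stay"]    # never leaves the current state
better = ["stay", "move"]  # moves from state 1 to the rewarding state 0

v_lazy = policy_value(transitions, rewards, lazy)
v_better = policy_value(transitions, rewards, better)
print(v_lazy, v_better)
```

In this example the second policy differs from the first only in state 1, and its value is at least as large in every state, which is the kind of pointwise comparison that policy improvement is concerned with.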
Keywords: Markov decision process, Long-term value, Discounted sum, Coalgebra, Algebra, Corecursive algebra, Fixpoint, Metric space
We would like to thank Tarmo Uustalu for pointing us to [5, Theorem 19], thereby improving the paper. We also thank Jasmine Blanchette, Wan Fokkink and Ana Sokolova for useful comments.
- 2. Baldan, P., Bonchi, F., Kerstan, H., König, B.: Behavioral metrics via functor lifting. In: 34th International Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2014, pp. 403–415 (2014)
- 3. Bartels, F.: On Generalised Coinduction and Probabilistic Specification Formats. Ph.D. thesis, Vrije Universiteit Amsterdam (2004)
- 9. Ferns, N., Panangaden, P., Precup, D.: Metrics for finite Markov decision processes. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI 2004, pp. 162–169. AUAI Press, Arlington (2004). http://dl.acm.org/citation.cfm?id=1036843.1036863
- 20. Moore, A.W.: Markov Systems, Markov Decision Processes, and Dynamic Programming (2002). Lecture slides available at https://www.autonlab.org/tutorials
- 21. Ó’Searcóid, M.: Metric Spaces. Springer, London (2006)
- 24. Ruozzi, N., Kozen, D.: Applications of metric coinduction. Logical Methods in Computer Science, 5 (2009)
- 29. Villani, C.: Optimal Transport, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338. Springer, Heidelberg (2009)