Long-Term Values in Markov Decision Processes, (Co)Algebraically

  • Frank M. V. Feys
  • Helle Hvid Hansen
  • Lawrence S. Moss
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11202)

Abstract

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
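As an illustration of the long-term value of a policy with respect to discounted sums, the following is a minimal sketch and not the paper's construction. It assumes a finite MDP with state-based rewards and a discount factor strictly between 0 and 1, and approximates the value function by repeatedly applying the policy's one-step update; under these assumptions the update is a contraction, so by Banach's Fixpoint Theorem the iterates converge to its unique fixpoint. The function name `policy_value`, the dictionary encoding, and the two-state example are all hypothetical.

```python
# Hypothetical encoding (an assumption, not taken from the paper):
#   rewards[s]            -> immediate reward of state s
#   transitions[s][a][t]  -> probability of moving from s to t under action a
#   policy[s]             -> action chosen in state s


def policy_value(transitions, rewards, policy, gamma, tol=1e-10, max_iter=10_000):
    """Approximate the discounted long-term value of a fixed policy.

    Repeatedly applies v(s) := rewards[s] + gamma * E[v(next state)] and stops
    once two successive iterates differ by less than tol in every state.
    """
    if not 0 <= gamma < 1:
        raise ValueError("gamma must lie in [0, 1) for the update to be a contraction")
    states = list(rewards)
    v = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_v = {
            s: rewards[s]
            + gamma * sum(p * v[t] for t, p in transitions[s][policy[s]].items())
            for s in states
        }
        if max(abs(new_v[s] - v[s]) for s in states) < tol:
            return new_v
        v = new_v
    return v


# Toy two-state example (hypothetical data).
rewards = {"a": 0.0, "b": 1.0}
transitions = {
    "a": {"stay": {"a": 1.0}, "move": {"b": 1.0}},
    "b": {"stay": {"b": 1.0}, "move": {"a": 1.0}},
}
policy = {"a": "move", "b": "stay"}

values = policy_value(transitions, rewards, policy, gamma=0.5)
print(values)
```

In this toy example the fixpoint can be checked by hand: state b satisfies v(b) = 1 + 0.5 · v(b), so v(b) = 2, and then v(a) = 0 + 0.5 · v(b) = 1.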

Keywords

Markov decision process, Long-term value, Discounted sum, Coalgebra, Algebra, Corecursive algebra, Fixpoint, Metric space

Notes

Acknowledgements

We would like to thank Tarmo Uustalu for pointing us to [5, Theorem 19], thereby improving the paper. We also thank Jasmine Blanchette, Wan Fokkink and Ana Sokolova for useful comments.

References

  1. Abramsky, S., Winschel, V.: Coalgebraic analysis of subgame-perfect equilibria in infinite games without discounting. Math. Struct. Comput. Sci. 27(5), 751–761 (2017)
  2. Baldan, P., Bonchi, F., Kerstan, H., König, B.: Behavioral metrics via functor lifting. In: 34th International Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS 2014, pp. 403–415 (2014)
  3. Bartels, F.: On Generalised Coinduction and Probabilistic Specification Formats. Ph.D. thesis, Vrije Universiteit Amsterdam (2004)
  4. Bellman, R.: Dynamic Programming, 1st edn. Princeton University Press, Princeton (1957)
  5. Capretta, V., Uustalu, T., Vene, V.: Recursive coalgebras from comonads. Inf. Comput. 204, 437–468 (2006)
  6. Capretta, V., Uustalu, T., Vene, V.: Corecursive algebras: a study of general structured corecursion. In: Oliveira, M.V.M., Woodcock, J. (eds.) SBMF 2009. LNCS, vol. 5902, pp. 84–100. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10452-7_7
  7. Denardo, E.V.: Contraction mappings in the theory underlying dynamic programming. SIAM Rev. 9(2), 165–177 (1967)
  8. Desharnais, J., Edalat, A., Panangaden, P.: Bisimulation for labelled Markov processes. Inf. Comput. 179(2), 163–193 (2002)
  9. Ferns, N., Panangaden, P., Precup, D.: Metrics for finite Markov decision processes. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI 2004, pp. 162–169. AUAI Press, Arlington (2004). http://dl.acm.org/citation.cfm?id=1036843.1036863
  10. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev./Revue Internationale de Statistique 70(3), 419–435 (2002)
  11. Giry, M.: A categorical approach to probability theory. In: Banaschewski, B. (ed.) Categorical Aspects of Topology and Analysis. LNM, vol. 915, pp. 68–85. Springer, Heidelberg (1982). https://doi.org/10.1007/BFb0092872
  12. Howard, R.A.: Dynamic Programming and Markov Processes. The M.I.T. Press, Cambridge, MA (1960)
  13. Jacobs, B.: Distributive laws for the coinductive solution of recursive equations. Inf. Comput. 204(4), 561–587 (2006)
  14. Jacobs, B., Silva, A., Sokolova, A.: Trace semantics via determinization. J. Comput. Syst. Sci. 81(5), 859–879 (2015). 11th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2012 (Selected Papers)
  15. Johnstone, P.T.: Adjoint lifting theorems for categories of algebras. Bull. Lond. Math. Soc. 7, 294–297 (1975)
  16. Klin, B.: Bialgebras for structural operational semantics: an introduction. Theor. Comput. Sci. 412(38), 5043–5069 (2011)
  17. Kozen, D.: Coinductive proof principles for stochastic processes. Log. Methods Comput. Sci. 5, 1–19 (2009)
  18. Mac Lane, S.: Categories for the Working Mathematician. GTM, vol. 5. Springer, New York (1978). https://doi.org/10.1007/978-1-4757-4721-8
  19. Milius, S.: Completely iterative algebras and completely iterative monads. Inf. Comput. 196(1), 1–41 (2005)
  20. Moore, A.W.: Markov Systems, Markov Decision Processes, and Dynamic Programming (2002). Lecture slides available at https://www.autonlab.org/tutorials
  21. Ó’Searcóid, M.: Metric Spaces. Springer, London (2006)
  22. Pavlovic, D.: A semantical approach to equilibria and rationality. In: Kurz, A., Lenisa, M., Tarlecki, A. (eds.) CALCO 2009. LNCS, vol. 5728, pp. 317–334. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03741-2_22
  23. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (2014)
  24. Ruozzi, N., Kozen, D.: Applications of metric coinduction. Log. Methods Comput. Sci. 5 (2009)
  25. Rutten, J.: Universal coalgebra: a theory of systems. Theor. Comput. Sci. 249(1), 3–80 (2000)
  26. Silva, A., Bonchi, F., Bonsangue, M., Rutten, J.: Generalizing determinization from automata to coalgebras. Log. Methods Comput. Sci. 9, 1–27 (2013)
  27. Silva, A., Sokolova, A.: Sound and complete axiomatization of trace semantics for probabilistic systems. Electron. Notes Theor. Comput. Sci. 276, 291–311 (2011). https://doi.org/10.1016/j.entcs.2011.09.027
  28. Sokolova, A.: Probabilistic systems coalgebraically. Theor. Comput. Sci. 412(38), 5095–5110 (2011). https://doi.org/10.1016/j.tcs.2011.05.008
  29. Villani, C.: Optimal Transport, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338. Springer, Heidelberg (2009)

Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  • Frank M. V. Feys (1)
  • Helle Hvid Hansen (1)
  • Lawrence S. Moss (2)

  1. Department of Engineering Systems and Services, TPM, Delft University of Technology, Delft, The Netherlands
  2. Department of Mathematics, Indiana University, Bloomington, USA
