Bisimulation for Markov Decision Processes through Families of Functional Expressions

  • Norm Ferns
  • Doina Precup
  • Sophia Knight

Abstract

We transfer a notion of quantitative bisimilarity for labelled Markov processes [1] to Markov decision processes (MDPs) with continuous state spaces. This notion takes the form of a pseudometric on the system states, cast in terms of the equivalence of a family of functional expressions evaluated on those states and interpreted as a real-valued modal logic. Our proof is a slight modification of previous techniques [2,3], which were used to prove equivalence with a fixed-point pseudometric on the state space of a labelled Markov process and which make heavy use of the Kantorovich probability metric. Indeed, we again demonstrate equivalence with a fixed-point pseudometric defined on Markov decision processes [4]; what is novel is that we recast this proof in terms of integral probability metrics [5] defined through the family of functional expressions, shifting emphasis back to properties of such families. The hope is that a judicious choice of family might lead to something more computationally tractable than bisimilarity whilst maintaining its pleasing theoretical guarantees. Moreover, we use a trick from descriptive set theory to extend our results to MDPs with bounded measurable reward functions, dropping a previous continuity constraint on rewards and Markov kernels.
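To make the notion of an integral probability metric [5] concrete, the following is a minimal sketch, not the construction used in the paper. It assumes a finite state set and a finite family of functions, so the supremum of |E_p[f] - E_q[f]| over the family reduces to a maximum. The function name, the three-state distributions, and the family are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def integral_probability_metric(
    p: Sequence[float],
    q: Sequence[float],
    family: Sequence[Callable[[int], float]],
) -> float:
    """Return max over f in family of |E_p[f] - E_q[f]|.

    p and q are probability vectors indexed by state 0..n-1 (assumed finite);
    family is a finite, non-empty collection of real-valued functions on states.
    """

    def expectation(dist: Sequence[float], f: Callable[[int], float]) -> float:
        # Expected value of f under a discrete distribution.
        return sum(prob * f(state) for state, prob in enumerate(dist))

    return max(abs(expectation(p, f) - expectation(q, f)) for f in family)


# Hypothetical example: two distributions over three states.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]

# Hypothetical generating family: an indicator, a scaled identity, a constant.
family = [
    lambda s: 1.0 if s == 0 else 0.0,
    lambda s: s / 2.0,
    lambda s: 1.0,
]

distance = integral_probability_metric(p, q, family)
```

Enlarging the family can only increase the distance, and a family that cannot separate two distributions assigns them distance zero, which is why the result is in general a pseudometric whose properties depend on the chosen family.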

References

  1. Desharnais, J., Jagadeesan, R., Gupta, V., Panangaden, P.: The Metric Analogue of Weak Bisimulation for Probabilistic Processes. In: LICS 2002: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science, July 22-25, pp. 413–422. IEEE Computer Society, Washington, DC (2002)
  2. van Breugel, F., Worrell, J.: Towards Quantitative Verification of Probabilistic Transition Systems. In: Orejas, F., Spirakis, P.G., van Leeuwen, J. (eds.) ICALP 2001. LNCS, vol. 2076, pp. 421–432. Springer, Heidelberg (2001a)
  3. van Breugel, F., Worrell, J.: An Algorithm for Quantitative Verification of Probabilistic Transition Systems. In: Larsen, K.G., Nielsen, M. (eds.) CONCUR 2001. LNCS, vol. 2154, pp. 336–350. Springer, Heidelberg (2001b)
  4. Ferns, N., Panangaden, P., Precup, D.: Bisimulation Metrics for Continuous Markov Decision Processes. SIAM Journal on Computing 40(6), 1662–1714 (2011)
  5. Müller, A.: Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability 29, 429–443 (1997)
  6. Larsen, K.G., Skou, A.: Bisimulation Through Probabilistic Testing. Information and Computation 94(1), 1–28 (1991)
  7. Milner, R.: A Calculus of Communicating Systems. LNCS, vol. 92. Springer, New York (1980)
  8. Park, D.: Concurrency and Automata on Infinite Sequences. In: Proceedings of the 5th GI-Conference on Theoretical Computer Science, pp. 167–183. Springer, London (1981)
  9. Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for Labeled Markov Systems. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR 1999. LNCS, vol. 1664, pp. 258–273. Springer, Heidelberg (1999)
  10. Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for Labelled Markov Processes. Theoretical Computer Science 318(3), 323–354 (2004)
  11. van Breugel, F., Hermida, C., Makkai, M., Worrell, J.: Recursively Defined Metric Spaces Without Contraction. Theoretical Computer Science 380(1-2), 143–163 (2007)
  12. Kozen, D.: A Probabilistic PDL. In: STOC 1983: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pp. 291–297. ACM, New York (1983)
  13. van Breugel, F., Sharma, B., Worrell, J.B.: Approximating a Behavioural Pseudometric Without Discount for Probabilistic Systems. In: Seidl, H. (ed.) FOSSACS 2007. LNCS, vol. 4423, pp. 123–137. Springer, Heidelberg (2007)
  14. Ferns, N., Panangaden, P., Precup, D.: Metrics for Finite Markov Decision Processes. In: AUAI 2004: Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence, Arlington, Virginia, United States, pp. 162–169. AUAI Press (2004)
  15. Ferns, N., Panangaden, P., Precup, D.: Metrics for Markov Decision Processes with Infinite State Spaces. In: Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI 2005), Arlington, Virginia, pp. 201–208. AUAI Press (2005)
  16. Ferns, N., Castro, P.S., Precup, D., Panangaden, P.: Methods for Computing State Similarity in Markov Decision Processes. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI 2006), Arlington, Virginia. AUAI Press, Arlington (2006)
  17. Castronovo, M., Maes, F., Fonteneau, R., Ernst, D.: Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning. In: Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012), Edinburgh, Scotland, June 30-July 1, vol. 24, pp. 1–10 (2012)
  18. Panangaden, P.: Labelled Markov Processes. Imperial College Press (2009)
  19. Giry, M.: A Categorical Approach to Probability Theory. Categorical Aspects of Topology and Analysis, pp. 68–85 (1982)
  20. Billingsley, P.: Convergence of Probability Measures. Wiley (1968)
  21. Dudley, R.M.: Real Analysis and Probability. Cambridge University Press (2002)
  22. Desharnais, J.: Labelled Markov Processes. PhD thesis, McGill University (2000)
  23. Desharnais, J., Edalat, A., Panangaden, P.: Bisimulation for Labeled Markov Processes. Information and Computation 179(2), 163–193 (2002)
  24. Gibbs, A.L., Su, F.E.: On Choosing and Bounding Probability Metrics. International Statistical Review 70, 419–435 (2002)
  25. Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58. American Mathematical Society (2003)
  26. Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Applications of Mathematics. Springer, New York (1999)
  27. Srivastava, S.M.: A Course on Borel Sets. Graduate Texts in Mathematics, vol. 180. Springer (2008)
  28. Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete-Time Case. Athena Scientific (2007)
  29. Parthasarathy, K.R.: Probability Measures on Metric Spaces. Academic Press, New York (1967)
  30. Chen, D., van Breugel, F., Worrell, J.: On the Complexity of Computing Probabilistic Bisimilarity. In: Birkedal, L. (ed.) FOSSACS 2012. LNCS, vol. 7213, pp. 437–451. Springer, Heidelberg (2012)
  31. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York (1994)
  32. Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.G.: On the Empirical Estimation of Integral Probability Metrics. Electronic Journal of Statistics 6, 1550–1599 (2012)
  33. Bouchard-Côté, A., Ferns, N., Panangaden, P., Precup, D.: An Approximation Algorithm for Labelled Markov Processes: Towards Realistic Approximation. In: QEST 2005: Proceedings of the Second International Conference on the Quantitative Evaluation of Systems, pp. 54–61. IEEE Computer Society, Washington, DC (2005)
  34. Müller, A.: Stochastic Orders Generated by Integrals: A Unified Study. Advances in Applied Probability 29, 414–428 (1997)
  35. Chaput, P., Danos, V., Panangaden, P., Plotkin, G.: Approximating Markov Processes by Averaging. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part II. LNCS, vol. 5556, pp. 127–138. Springer, Heidelberg (2009)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Norm Ferns (1)
  • Doina Precup (2)
  • Sophia Knight (3)

  1. Département d’Informatique, École Normale Supérieure, Paris Cedex 05, France
  2. School of Computer Science, McGill University, Montréal, Canada
  3. CNRS, LORIA, Université de Lorraine, Nancy, France
