Computing Behavioral Distances, Compositionally

  • Giorgio Bacci
  • Giovanni Bacci
  • Kim G. Larsen
  • Radu Mardare
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8087)

Abstract

We propose a general definition of a composition operator on Markov Decision Processes with rewards (MDPs) and identify a well-behaved class of operators, called safe, that are guaranteed to be non-extensive with respect to the bisimilarity pseudometrics of Ferns et al. [10], which measure behavioral similarities between MDPs. For MDPs built using safe/non-extensive operators, we present the first method that exploits the structure of the system to compute the bisimilarity distance on MDPs exactly. Experimental results show significant improvements over the non-compositional technique.
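
For orientation, a bisimilarity pseudometric of the kind the abstract refers to is commonly presented as the least fixed point of an operator on pseudometrics over states. The block below is a sketch of that standard discounted form, not a quotation from the paper. All symbols are notation assumed here for illustration: S (states), A (actions), \rho (reward function), \tau (transition probability function), \lambda (discount factor in (0,1)), and \mathcal{T}_d (the Kantorovich lifting of d to distributions).

```latex
% Assumed notation (not taken from the page): S, A, \rho, \tau, \lambda, \mathcal{T}_d.
% One application of the operator to a candidate pseudometric d on states:
\[
  F_\lambda(d)(s,t) \;=\; \max_{a \in A}
    \Big\{ \lvert \rho(s,a) - \rho(t,a) \rvert
           \;+\; \lambda \cdot \mathcal{T}_d\big(\tau(s,a), \tau(t,a)\big) \Big\}
\]
% Kantorovich lifting: cheapest coupling \omega of the two distributions,
% where \Omega(\mu,\nu) is the set of joint distributions on S \times S
% with marginals \mu and \nu.
\[
  \mathcal{T}_d(\mu,\nu) \;=\; \min_{\omega \in \Omega(\mu,\nu)}
    \sum_{u,v \in S} \omega(u,v)\, d(u,v)
\]
% The bisimilarity distance is the least fixed point of F_\lambda.
\[
  \delta_\lambda \;=\; \text{least } d \text{ such that } d = F_\lambda(d)
\]
```

Under this reading, each evaluation of \mathcal{T}_d is a transportation problem of the kind treated in [8], so the cost of computing the distance grows with the number of state pairs. That is the cost a method exploiting the structure of a composed system aims to reduce.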

Keywords

Discount Factor, Multiagent System, Composition Operator, Markov Decision Process, Parallel Composition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: On-the-Fly Exact Computation of Bisimilarity Distances. In: Piterman, N., Smolka, S.A. (eds.) TACAS 2013. LNCS, vol. 7795, pp. 1–15. Springer, Heidelberg (2013)
  2. Castro, P.S., Precup, D.: Using bisimulation for policy transfer in MDPs. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2010, Richland, SC, vol. 1, pp. 1399–1400. International Foundation for Autonomous Agents and Multiagent Systems (2010)
  3. Castro, P.S., Precup, D.: Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics. In: Sanner, S., Hutter, M. (eds.) EWRL 2011. LNCS, vol. 7188, pp. 140–152. Springer, Heidelberg (2012)
  4. Chatterjee, K., de Alfaro, L., Majumdar, R., Raman, V.: Algorithms for Game Metrics. Logical Methods in Computer Science 6(3) (2010)
  5. Chen, D., van Breugel, F., Worrell, J.: On the Complexity of Computing Probabilistic Bisimilarity. In: Birkedal, L. (ed.) FOSSACS 2012. LNCS, vol. 7213, pp. 437–451. Springer, Heidelberg (2012)
  6. Comanici, G., Panangaden, P., Precup, D.: On-the-Fly Algorithms for Bisimulation Metrics. In: International Conference on Quantitative Evaluation of Systems, pp. 94–103 (2012)
  7. Comanici, G., Precup, D.: Basis function discovery using spectral clustering and bisimulation metrics. In: AAMAS 2011, Richland, SC, vol. 3, pp. 1079–1080. International Foundation for Autonomous Agents and Multiagent Systems (2011)
  8. Dantzig, G.B.: Application of the Simplex method to a transportation problem. In: Koopmans, T. (ed.) Activity Analysis of Production and Allocation, pp. 359–373. J. Wiley, New York (1951)
  9. Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for labelled Markov processes. Theoretical Computer Science 318(3), 323–354 (2004)
  10. Ferns, N., Panangaden, P., Precup, D.: Metrics for finite Markov Decision Processes. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI, pp. 162–169. AUAI Press (2004)
  11. Giacalone, A., Jou, C., Smolka, S.A.: Algebraic reasoning for probabilistic concurrent systems. In: Proc. IFIP TC2 Working Conference on Programming Concepts and Methods, pp. 443–458. North-Holland (1990)
  12. Givan, R., Dean, T., Greig, M.: Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence 147(1-2), 163–223 (2003)
  13. Larsen, K.G., Skou, A.: Bisimulation through probabilistic testing. Information and Computation 94(1), 1–28 (1991)
  14. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. John Wiley & Sons, Inc., New York (1994)
  15. Thorsley, D., Klavins, E.: Approximating stochastic biochemical processes with Wasserstein pseudometrics. IET Systems Biology 4(3), 193–211 (2010)
  16. van Breugel, F., Sharma, B., Worrell, J.: Approximating a Behavioural Pseudometric without Discount for Probabilistic Systems. Logical Methods in Computer Science 4(2), 1–23 (2008)
  17. van Breugel, F., Worrell, J.: Approximating and computing behavioural distances in probabilistic transition systems. Theoretical Computer Science 360(1-3), 373–385 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Giorgio Bacci (1)
  • Giovanni Bacci (1)
  • Kim G. Larsen (1)
  • Radu Mardare (1)
  1. Department of Computer Science, Aalborg University, Denmark
