Testing Probabilistic Equivalence Through Reinforcement Learning

  • Josée Desharnais
  • François Laviolette
  • Sami Zhioua
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4337)

Abstract

We propose a new approach to the verification of probabilistic processes for which the model may not be available. We use a technique from Reinforcement Learning to approximate how far apart two processes are by solving a Markov Decision Process. If the two processes are equivalent, the algorithm returns zero; otherwise, it provides a number and a test that witness the non-equivalence. We suggest a new family of equivalences, called K-moment, for which it is possible to do so. The weakest, 1-moment equivalence, is trace-equivalence. The others are weaker than bisimulation but stronger than trace-equivalence.
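To make the "zero, or a number and a witnessing test" idea concrete for trace-equivalence, here is a minimal sketch. It is not the reinforcement-learning algorithm of the paper: it assumes both models are fully known and simply enumerates all traces up to a bounded length, whereas the paper targets processes whose model may be unavailable. The dictionary encoding of processes and the names `trace_probability`, `largest_trace_gap`, `P`, `Q` and `R` are hypothetical, chosen only for this illustration.

```python
from itertools import product

def trace_probability(process, state, trace):
    """Probability that `process`, started in `state`, accepts every action of `trace` in order."""
    if not trace:
        return 1.0
    # A missing action means the process refuses it in this state.
    successors = process.get(state, {}).get(trace[0], [])
    return sum(p * trace_probability(process, nxt, trace[1:]) for p, nxt in successors)

def largest_trace_gap(proc_a, proc_b, actions, max_len, start=0):
    """Return (gap, trace): the largest difference in trace probability and a trace achieving it.

    A gap of 0.0 with the empty trace means no trace of length <= max_len tells the processes apart.
    """
    best_gap, best_trace = 0.0, ()
    for n in range(1, max_len + 1):
        for trace in product(actions, repeat=n):
            gap = abs(trace_probability(proc_a, start, trace)
                      - trace_probability(proc_b, start, trace))
            if gap > best_gap + 1e-12:  # small tolerance against rounding noise
                best_gap, best_trace = gap, trace
    return best_gap, best_trace

# Encoding (an assumption of this sketch): state -> action -> list of (probability, next state).
# P chooses probabilistically at the first step; Q defers the choice to the second step.
P = {0: {"a": [(0.5, 1), (0.5, 2)]}, 1: {"b": [(1.0, 3)]}, 2: {"c": [(1.0, 3)]}, 3: {}}
Q = {0: {"a": [(1.0, 1)]}, 1: {"b": [(0.5, 2)], "c": [(0.5, 2)]}, 2: {}}
# R is like Q, but accepts b after a with probability 0.25 instead of 0.5.
R = {0: {"a": [(1.0, 1)]}, 1: {"b": [(0.25, 2)], "c": [(0.5, 2)]}, 2: {}}

ACTIONS = ("a", "b", "c")
gap_pq = largest_trace_gap(P, Q, ACTIONS, 3)
gap_pr = largest_trace_gap(P, R, ACTIONS, 3)
```

In this example `P` and `Q` assign the same probability to every trace, so the reported gap is zero even though they differ in where the probabilistic choice is made; `P` and `R` are separated by the trace `a` followed by `b`, which plays the role of the witnessing test.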

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Josée Desharnais (1)
  • François Laviolette (1)
  • Sami Zhioua (1)
  1. IFT-GLO, Université Laval, Québec, Canada
