Testing Probabilistic Equivalence Through Reinforcement Learning

  • Josée Desharnais
  • François Laviolette
  • Sami Zhioua
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4337)


We propose a new approach to the verification of probabilistic processes for which the model may not be available. We use a technique from Reinforcement Learning to approximate how far apart two processes are by solving a Markov Decision Process. If two processes are equivalent, the algorithm returns zero; otherwise it provides a number and a test that witness the non-equivalence. We suggest a new family of equivalences, called K-moment, for which it is possible to do so. The weakest, 1-moment equivalence, is trace equivalence. The others are weaker than bisimulation but stronger than trace equivalence.
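To make the idea of "a number and a test that witness" a difference concrete, here is a minimal Python sketch for the trace-equivalence case only. It is not the algorithm of the paper: it assumes both models are fully known and enumerates traces exhaustively, whereas the paper targets processes whose model may be unavailable and estimates the gap through Reinforcement Learning. It also assumes that two processes are trace equivalent when they accept every sequence of actions with the same probability. The process encoding, the function names, and the example processes `P` and `Q` are hypothetical.

```python
from itertools import product

# Assumed encoding: state -> action -> {next_state: probability}.
# The probabilities for one (state, action) pair may sum to less than 1;
# the missing mass means the action is refused.

def trace_probability(process, state, trace):
    """Probability that `process`, started in `state`, accepts every action of `trace` in order."""
    if not trace:
        return 1.0
    successors = process.get(state, {}).get(trace[0], {})
    return sum(p * trace_probability(process, nxt, trace[1:])
               for nxt, p in successors.items())

def trace_divergence(proc_a, start_a, proc_b, start_b, actions, max_len):
    """Return (largest probability gap, witnessing trace) over all traces of length 1..max_len."""
    best_gap, best_trace = 0.0, ()
    for n in range(1, max_len + 1):
        for trace in product(actions, repeat=n):
            gap = abs(trace_probability(proc_a, start_a, trace)
                      - trace_probability(proc_b, start_b, trace))
            if gap > best_gap:
                best_gap, best_trace = gap, trace
    return best_gap, best_trace

# Hypothetical example processes: they agree on all traces of length <= 2
# but assign different probabilities to the trace a, b, b.
P = {"s0": {"a": {"s1": 1.0}}, "s1": {"b": {"s1": 0.5}}}
Q = {"t0": {"a": {"t1": 0.5, "t2": 0.5}}, "t1": {"b": {"t1": 1.0}}, "t2": {}}

print(trace_divergence(P, "s0", Q, "t0", ["a", "b"], 2))  # (0.0, ())
print(trace_divergence(P, "s0", Q, "t0", ["a", "b"], 3))  # (0.25, ('a', 'b', 'b'))
```

A result of zero up to a given length only shows agreement on the traces examined. The exhaustive loop grows exponentially with `max_len`, which is one reason a sampling-based estimate is attractive when the model is large or unknown.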


Reinforcement Learning; Markov Decision Process; Labelled Transition System; Test Language; Reinforcement Learning Algorithm
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Josée Desharnais (1)
  • François Laviolette (1)
  • Sami Zhioua (1)
  1. IFT-GLO, Université Laval, Québec, Canada