Trace Equivalence Characterization Through Reinforcement Learning

  • Josée Desharnais
  • François Laviolette
  • Krishna Priya Darsini Moturu
  • Sami Zhioua
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4013)

Abstract

In the context of probabilistic verification, we provide a new notion of trace-equivalence divergence between pairs of Labelled Markov processes. This divergence corresponds to the optimal value of a particular derived Markov Decision Process. It can therefore be estimated by Reinforcement Learning methods. Moreover, we provide PAC guarantees on this estimation.
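
The derived Markov Decision Process itself is not shown in this preview, so the following is only a minimal sketch of the estimation idea, assuming a generic finite episodic MDP: the transition table, the state and action names, and the hyperparameters below are illustrative assumptions, not taken from the paper. The sketch uses tabular Q-learning, a standard Reinforcement Learning method of the kind the abstract refers to, to estimate the optimal value of the start state; in the paper's setting, the analogous optimal value would be the trace-equivalence divergence.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP: a table mapping (state, action) to a list of
# (probability, next_state, reward) triples. In the paper, the MDP would be
# derived from the two Labelled Markov processes under comparison; this
# table is purely illustrative.
TRANSITIONS = {
    ("s0", "a"): [(0.7, "s1", 0.0), (0.3, "s2", 1.0)],
    ("s0", "b"): [(1.0, "s2", 0.0)],
    ("s1", "a"): [(1.0, "terminal", 1.0)],
    ("s1", "b"): [(1.0, "terminal", 0.0)],
    ("s2", "a"): [(1.0, "terminal", 0.0)],
    ("s2", "b"): [(1.0, "terminal", 0.5)],
}
ACTIONS = ["a", "b"]

def sample_step(state, action):
    """Sample a (next_state, reward) pair from the transition table."""
    r = random.random()
    acc = 0.0
    for prob, nxt, rew in TRANSITIONS[(state, action)]:
        acc += prob
        if r <= acc:
            return nxt, rew
    return TRANSITIONS[(state, action)][-1][1:]

def q_learning(episodes=20000, alpha=0.1, gamma=1.0, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    q = defaultdict(float)
    for _ in range(episodes):
        state = "s0"
        while state != "terminal":
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = sample_step(state, action)
            target = reward
            if nxt != "terminal":
                target += gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

if __name__ == "__main__":
    q = q_learning()
    # Estimated optimal value of the start state; in the paper's setting the
    # analogous quantity would estimate the trace-equivalence divergence.
    v0 = max(q[("s0", a)] for a in ACTIONS)
    print(f"Estimated optimal value of s0: {v0:.3f}")
```

For an episodic MDP like this one, the max-action Q-value at the start state converges to the optimal value under the usual Q-learning conditions, and sample-based estimates of that value are the kind of quantity to which PAC-style guarantees of the sort mentioned in the abstract would apply.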

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Josée Desharnais (1)
  • François Laviolette (1)
  • Krishna Priya Darsini Moturu (1)
  • Sami Zhioua (1)

  1. IFT-GLO, Université Laval, Québec (QC), Canada
