Trace Equivalence Characterization Through Reinforcement Learning

  • Josée Desharnais
  • François Laviolette
  • Krishna Priya Darsini Moturu
  • Sami Zhioua
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4013)

Abstract

In the context of probabilistic verification, we provide a new notion of trace-equivalence divergence between pairs of labelled Markov processes. This divergence corresponds to the optimal value of a particular derived Markov decision process, and it can therefore be estimated by reinforcement learning methods. Moreover, we provide PAC guarantees on this estimate.
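
The abstract's central algorithmic point is that the divergence equals the optimal value of a derived Markov decision process, so any standard reinforcement learning value estimator applies. The sketch below is not taken from the paper: it only illustrates the general idea with tabular Q-learning on a purely hypothetical toy MDP standing in for the derived one; the state names, transition table, and hyperparameters are illustrative assumptions.

# Minimal sketch (not the authors' construction): estimating the optimal value
# of a small Markov decision process with tabular Q-learning, i.e. the kind of
# quantity the abstract says the trace-equivalence divergence reduces to.
# The toy MDP below is purely hypothetical.
import random
from collections import defaultdict

# Hypothetical derived MDP: transitions[s][a] = list of (probability, next_state, reward)
transitions = {
    "s0": {"a": [(0.7, "s1", 0.0), (0.3, "s2", 1.0)],
           "b": [(1.0, "s2", 0.5)]},
    "s1": {"a": [(1.0, "s2", 1.0)]},
    "s2": {},  # terminal state
}

def step(state, action):
    """Sample a (next_state, reward) pair from the toy MDP."""
    r, acc = random.random(), 0.0
    for p, nxt, rew in transitions[state][action]:
        acc += p
        if r <= acc:
            return nxt, rew
    return transitions[state][action][-1][1:]

def q_learning(episodes=20000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning; returns the estimated optimal value of state s0."""
    q = defaultdict(float)
    for _ in range(episodes):
        state = "s0"
        while transitions[state]:          # run until a terminal state is reached
            actions = list(transitions[state])
            if random.random() < epsilon:  # epsilon-greedy exploration
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            best_next = max((q[(nxt, a)] for a in transitions[nxt]), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return max(q[("s0", a)] for a in transitions["s0"])

if __name__ == "__main__":
    print("Estimated optimal value of s0:", round(q_learning(), 3))

In the paper's setting, the derived MDP encodes how far two labelled Markov processes can be driven apart along traces, so the learned optimal value plays the role of the divergence; the toy dynamics above merely show the estimation machinery.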

Keywords

Reinforcement Learning · Markov Decision Process · Reward Function · Labelled Transition System · Iterative Dynamic Programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Josée Desharnais (1)
  • François Laviolette (1)
  • Krishna Priya Darsini Moturu (1)
  • Sami Zhioua (1)

  1. IFT-GLO, Université Laval, Québec (QC), Canada