Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes

Ashok, Pranav; Brázdil, Tomáš; Křetínský, Jan; Slámečka, Ondřej

doi:10.1007/978-3-030-03421-4_21

Pranav Ashok²²,
Tomáš Brázdil²³,
Jan Křetínský²² &
…
Ondřej Slámečka²³

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11245))

Included in the following conference series:

International Symposium on Leveraging Applications of Formal Methods

963 Accesses
4 Citations

Abstract

The maximum reachability probabilities in a Markov decision process can be computed using value iteration (VI). Recently, simulation-based heuristic extensions of VI have been introduced, such as bounded real-time dynamic programming (BRTDP), which often manage to avoid explicit analysis of the whole state space while preserving guarantees on the computed result. In this paper, we introduce a new class of such heuristics, based on Monte Carlo tree search (MCTS), a technique celebrated in various machine-learning settings. We provide a spectrum of algorithms ranging from MCTS to BRTDP. We evaluate these techniques and show that for larger examples, where VI is no more applicable, our techniques are more broadly applicable than BRTDP with only a minor additional overhead.

This research was supported in part by Deutsche Forschungsgemeinschaft (DFG) through the TUM International Graduate School of Science and Engineering (IGSSE) project 10.06 PARSEC, the Czech Science Foundation grant No. 18-11193S, and the DFG project 383882557 Statistical Unbounded Verification.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Verification of Markov Decision Processes Using Learning Algorithms

Model Checking Finite-Horizon Markov Chains with Probabilistic Inference

Optimistic Value Iteration

References

Ashok, P., Brázdil, T., Křetínský, J., Slámečka, O.: Monte Carlo tree search for verifying reachability in Markov decision processes. CoRR abs/1809.03299 (2018)
Google Scholar
Ashok, P., Chatterjee, K., Daca, P., Křetínský, J., Meggendorfer, T.: Value iteration for long-run average reward in Markov decision processes. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 201–221. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_10
Chapter Google Scholar
Abramson, B., Korf, R.E.: A model of two-player evaluation functions. In: Proceedings of the 6th National Conference on Artificial Intelligence (AAAI 1987), pp. 90–94. Morgan Kaufmann (1987)
Google Scholar
Brázdil, T., et al.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_8
Chapter Google Scholar
Bellman, R.: A Markovian decision process. 6:15 (1957)
Google Scholar
Browne, C., et al.: A survey of monte carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4, 1–43 (2012)
Article Google Scholar
Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M.J. (eds.) CG 2006. LNCS, vol. 4630, pp. 72–83. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75538-8_7
Chapter Google Scholar
de Alfaro, L.: Formal verification of probabilistic systems. Ph.D. thesis, Stanford University (1997)
Google Scholar
Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A storm is coming: a modern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) Computer Aided Verification. CAV 2017. Lecture Notes in Computer Science, vol. 10427, pp. 592–600. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_31
D’Argenio, P., Legay, A., Sedwards, S., Traonouez, L.-M.: Smart sampling for lightweight verification of Markov decision processes. Int. J. Softw. Tools Technol. Transf. 17(4), 469–484 (2015)
Article Google Scholar
Gelly, S., Wang, Y.: Exploration exploitation in Go: UCT for Monte-Carlo Go. In: NIPS: Neural Information Processing Systems Conference On-line Trading of Exploration and Exploitation Workshop, Canada, December 2006
Google Scholar
Haddad, S., Monmege, B.: Interval iteration algorithm for MDPs and IMDPs. Theor. Comput. Sci. 735, 111–131 (2018)
Article MathSciNet Google Scholar
Henriques, D., Martins, J., Zuliani, P., Platzer, A., Clarke, E.M.: Statistical model checking for Markov decision processes. In: QEST, pp. 84–93 (2012)
Google Scholar
James, S., Konidaris, G., Rosman, B.: An analysis of Monte Carlo tree search. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 4–9 February 2017, San Francisco, California, USA, pp. 3576–3582 (2017)
Google Scholar
Kelmedi, E., Krämer, J., Kretínský, J., Weininger, M.: Value iteration for simple stochastic games: stopping criterion and learning algorithm. In: CAV (1), pp. 623–642. Springer (2018)
Google Scholar
Křetínský, J., Meggendorfer, T.: Efficient strategy iteration for mean payoff in Markov decision processes. In: ATVA, pp. 380–399 (2017)
Google Scholar
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Chapter Google Scholar
Kwiatkowska, M., Norman, G., Parker, D.: The PRISM benchmark suite. In: Proceedings of 9th International Conference on Quantitative Evaluation of SysTems (QEST 2012), pp. 203–204. IEEE CS Press (2012)
Google Scholar
Křetínský, J.: Survey of statistical verification of linear unbounded properties: model checking and distances. In: Margaria, T., Steffen, B. (eds.) ISoLA 2016. LNCS, vol. 9952, pp. 27–45. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47166-2_3
Chapter Google Scholar
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29
Chapter Google Scholar
Kocsis, L., Szepesvári, C., Willemson, J.: Improved Monte-Carlo search (2006)
Google Scholar
Brendan McMahan, H., Likhachev, M., Gordon, G.J.: Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In: ICML 2005 (2005)
Google Scholar
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (2014)
MATH Google Scholar
Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Article Google Scholar
Silver, D., et al.: Mastering chess and Shogi by self-play with a general reinforcement learning algorithm. CoRR, abs/1712.01815 (2017)
Google Scholar
Strehl, A.L., Li, L., Wiewiora, E., Langford, J., Littman, M.L.: PAC model-free reinforcement learning. In: ICML, pp. 881–888 (2006)
Google Scholar
Silver, D., Sutton, R.S., Mueller, M.: Temporal-difference search in computer Go. Mach. Learn. 87, 183–219 (2012)
Article MathSciNet Google Scholar
Silver, D., Tesauro, G.: Monte-Carlo simulation balancing. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pp. 945–952. ACM (2009)
Google Scholar
Vodopivec, T., Samothrakis, S., Ster, B.: On Monte Carlo tree search and reinforcement learning. J. Artif. Intell. Res. 60, 881–936 (2017)
Article MathSciNet Google Scholar
Younes, H.L.S., Simmons, R.G.: Probabilistic verification of discrete event systems using acceptance sampling. In: Brinksma, E., Larsen, K.G. (eds.) CAV 2002. LNCS, vol. 2404, pp. 223–235. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45657-0_17
Chapter MATH Google Scholar

Download references

Author information

Authors and Affiliations

Technical University of Munich, Munich, Germany
Pranav Ashok & Jan Křetínský
Masaryk University, Brno, Czech Republic
Tomáš Brázdil & Ondřej Slámečka

Authors

Pranav Ashok
View author publications
You can also search for this author in PubMed Google Scholar
Tomáš Brázdil
View author publications
You can also search for this author in PubMed Google Scholar
Jan Křetínský
View author publications
You can also search for this author in PubMed Google Scholar
Ondřej Slámečka
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jan Křetínský .

Editor information

Editors and Affiliations

University of Limerick, Limerick, Ireland
Tiziana Margaria
TU Dortmund, Dortmund, Germany
Bernhard Steffen

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ashok, P., Brázdil, T., Křetínský, J., Slámečka, O. (2018). Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. Verification. ISoLA 2018. Lecture Notes in Computer Science(), vol 11245. Springer, Cham. https://doi.org/10.1007/978-3-030-03421-4_21

Download citation

DOI: https://doi.org/10.1007/978-3-030-03421-4_21
Published: 30 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03420-7
Online ISBN: 978-3-030-03421-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes

Abstract

Access this chapter

Similar content being viewed by others

Verification of Markov Decision Processes Using Learning Algorithms

Model Checking Finite-Horizon Markov Chains with Probabilistic Inference

Optimistic Value Iteration

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes

Abstract

Access this chapter

Similar content being viewed by others

Verification of Markov Decision Processes Using Learning Algorithms

Model Checking Finite-Horizon Markov Chains with Probabilistic Inference

Optimistic Value Iteration

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation