Counterexample Explanation by Learning Small Strategies in Markov Decision Processes

  • Tomáš Brázdil
  • Krishnendu Chatterjee
  • Martin Chmelík
  • Andreas Fellner
  • Jan Křetínský
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9206)


For deterministic systems, a counterexample to a property can simply be an error trace, whereas counterexamples in probabilistic systems are necessarily more complex. For instance, a set of erroneous traces with a sufficient cumulative probability mass can be used. Since such sets are too large to understand and manipulate, compact representations such as subchains have been considered. In the case of probabilistic systems with non-determinism, the situation is even more complex. While a subchain for a given strategy (or scheduler, resolving non-determinism) is a straightforward choice, we take a different approach: we focus on the strategy itself, extract the most important decisions it makes, and present a succinct representation of it.
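The trace-set notion of a counterexample mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the four-state chain `CHAIN`, the function name `trace_counterexample`, and the threshold value are all hypothetical, chosen only to show how error traces are collected in order of decreasing probability until their cumulative mass reaches a bound.

```python
import heapq

# Hypothetical Markov chain (an assumption for illustration):
# state -> list of (successor, probability). "ok" and "err" are absorbing.
CHAIN = {
    "init": [("try", 1.0)],
    "try": [("ok", 0.7), ("err", 0.2), ("init", 0.1)],
    "ok": [],
    "err": [],
}


def trace_counterexample(chain, start, bad, threshold, max_steps=10_000):
    """Collect the most probable paths from `start` to `bad` until their
    cumulative probability reaches `threshold` (best-first path search)."""
    heap = [(-1.0, [start])]  # negated probability turns heapq into a max-heap
    traces, mass = [], 0.0
    for _ in range(max_steps):  # step budget guards against unreachable thresholds
        if not heap or mass >= threshold:
            break
        neg_prob, path = heapq.heappop(heap)
        state = path[-1]
        if state == bad:
            traces.append((path, -neg_prob))
            mass += -neg_prob
            continue
        for successor, prob in chain[state]:
            heapq.heappush(heap, (neg_prob * prob, path + [successor]))
    return traces, mass


# The property "err is reached with probability below 0.21" is refuted
# by the traces collected here.
traces, mass = trace_counterexample(CHAIN, "init", "err", threshold=0.21)
```

Even in this tiny chain the counterexample is a set of traces rather than a single one, and the set grows quickly as the threshold approaches the true error probability, which is the motivation for more compact representations.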

The key tools we employ to achieve this are (1) a notion of the importance of a state w.r.t. the strategy, and (2) learning using decision trees. Our approach has three main advantages. Firstly, it exploits the quantitative information on states, stressing the more important decisions. Secondly, it leads to greater variability and more degrees of freedom in representing the strategies. Thirdly, the representation uses a self-explanatory data structure. In summary, our approach produces more succinct and more explainable strategies than, e.g., binary decision diagrams. Finally, our experimental results show that we can extract several rules describing the strategy even for very large systems that do not fit in memory and, based on these rules, explain the erroneous behaviour.
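As a rough illustration of combining the two tools, the sketch below learns a small decision tree from (state, action) pairs of a strategy, weighting each pair by an importance value and discarding pairs below a cut-off. Everything here is an assumption made for illustration: the abstract does not define importance or name a learning algorithm, so the weights, the cut-off `MIN_IMPORTANCE`, the boolean state variables `fail` and `retry`, and the entropy-based learner are hypothetical stand-ins rather than the authors' method.

```python
import math
from collections import defaultdict


def action_weights(samples):
    """Total importance per action over samples (state, action, importance)."""
    totals = defaultdict(float)
    for _, action, importance in samples:
        totals[action] += importance
    return totals


def weighted_entropy(samples):
    """Entropy of the importance-weighted action distribution."""
    totals = action_weights(samples)
    mass = sum(totals.values())
    return -sum(w / mass * math.log2(w / mass) for w in totals.values())


def learn_tree(samples, variables, max_depth=3):
    """Return an action (leaf) or a tuple (variable, subtree_if_0, subtree_if_1)."""
    totals = action_weights(samples)
    if max_depth == 0 or len(totals) == 1:
        return max(totals, key=totals.get)
    mass = sum(totals.values())
    best = None
    for var in variables:
        lo = [s for s in samples if s[0][var] == 0]
        hi = [s for s in samples if s[0][var] == 1]
        if not lo or not hi:
            continue  # this variable does not separate the samples
        score = sum(
            sum(s[2] for s in part) / mass * weighted_entropy(part)
            for part in (lo, hi)
        )
        if best is None or score < best[0]:
            best = (score, var, lo, hi)
    if best is None:
        return max(totals, key=totals.get)
    _, var, lo, hi = best
    rest = [v for v in variables if v != var]
    return (var, learn_tree(lo, rest, max_depth - 1), learn_tree(hi, rest, max_depth - 1))


# Hypothetical strategy samples: (state valuation, chosen action, importance).
SAMPLES = [
    ({"fail": 0, "retry": 0}, "send", 0.9),
    ({"fail": 0, "retry": 1}, "send", 0.6),
    ({"fail": 1, "retry": 0}, "abort", 0.8),
    ({"fail": 1, "retry": 1}, "send", 0.01),  # rarely relevant decision
]
MIN_IMPORTANCE = 0.05  # assumed cut-off; must be positive

important = [s for s in SAMPLES if s[2] >= MIN_IMPORTANCE]
tree = learn_tree(important, ["fail", "retry"])
```

With the unimportant decision filtered out, the learned tree reduces to a single rule on `fail`. That is the kind of succinctness the approach aims for, at the cost of misrepresenting the strategy in states that matter little.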


Decision Tree, Markov Decision Process, Training Sequence, Compact Representation, Binary Decision Diagram



This research was funded in part by Austrian Science Fund (FWF) Grant No P 23499-N23, FWF NFN Grant No S11407-N23 (RiSE) and Z211-N23 (Wittgenstein Award), European Research Council (ERC) Grant No 279307 (Graph Games), ERC Grant No 267989 (QUAREM), the Czech Science Foundation Grant No P202/12/G061, and People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007–2013) REA Grant No 291734.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Tomáš Brázdil (1)
  • Krishnendu Chatterjee (2)
  • Martin Chmelík (2)
  • Andreas Fellner (2)
  • Jan Křetínský (2)

  1. Masaryk University, Brno, Czech Republic
  2. IST, Klosterneuburg, Austria
