
A Modest Approach to Dynamic Heuristic Search in Probabilistic Model Checking

Quantitative Evaluation of Systems (QEST 2021)

Abstract

This paper presents Modysh, a probabilistic model checker which harvests and extends non-exhaustive exploration methods originally developed in the AI planning context. Its core functionality is based on enhancements of the heuristic search methods labeled real-time dynamic programming and find-revise-eliminate-traps, and it is capable of efficiently handling maximal and minimal reachability properties, expected reward properties, as well as bounded properties on general MDPs. Modysh is integrated in the infrastructure of the Modest Toolset and extends the property types supported by it. We discuss the algorithmic particularities in detail and evaluate the competitiveness of Modysh in comparison to state-of-the-art model checkers in a large case study rooted in the well-established Quantitative Verification Benchmark Set. This study demonstrates that Modysh is especially attractive to use on very large benchmark instances which are not solvable by any other tool.

This work has received support by the ERC Advanced Investigators Grant 695614 POWVER, by the DFG Grant 389792660 as part of TRR 248 CPEC, and by the Key-Area Research and Development Grant 2018B010107004 of Guangdong Province.


References

  1. Aljazzar, H., Leue, S.: Generation of counterexamples for model checking of Markov decision processes. In: QEST 2009, Sixth International Conference on the Quantitative Evaluation of Systems, Budapest, Hungary, 13-16 September 2009. pp. 197–206. IEEE Computer Society (2009). https://doi.org/10.1109/QEST.2009.10

  2. Ashok, P., Brázdil, T., Kretínský, J., Slámecka, O.: Monte Carlo tree search for verifying reachability in Markov decision processes. In: Margaria, T., Steffen, B. (eds.) Leveraging Applications of Formal Methods, Verification and Validation. Verification - 8th International Symposium, ISoLA 2018, Limassol, Cyprus, November 5-9, 2018, Proceedings, Part II. Lecture Notes in Computer Science, vol. 11245, pp. 322–335. Springer (2018). https://doi.org/10.1007/978-3-030-03421-4_21

  3. Ashok, P., Kretínský, J., Weininger, M.: PAC statistical model checking for Markov decision processes and stochastic games. In: Dillig and Tasiran [15], pp. 497–519. https://doi.org/10.1007/978-3-030-25540-4_29

  4. Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to act using real-time dynamic programming. Artif. Intell. 72(1-2), 81–138 (1995). https://doi.org/10.1016/0004-3702(94)00011-O

  5. Bellman, R.: A Markovian decision process. Journal of Mathematics and Mechanics 6(5), 679–684 (1957)


  6. Bertsekas, D.P.: Dynamic Programming and Optimal Control, Vol. 1. Athena Scientific (1995)


  7. Bertsekas, D.P.: Dynamic Programming and Optimal Control, Vol. 2. Athena Scientific (1995)


  8. Bonet, B., Geffner, H.: Labeled RTDP: improving the convergence of real-time dynamic programming. In: Giunchiglia, E., Muscettola, N., Nau, D.S. (eds.) Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling (ICAPS 2003), June 9-13, 2003, Trento, Italy. pp. 12–21. AAAI (2003), http://www.aaai.org/Library/ICAPS/2003/icaps03-002.php

  9. Bonet, B., Geffner, H.: Learning depth-first search: A unified approach to heuristic search in deterministic and non-deterministic settings, and its application to MDPs. In: Long, D., Smith, S.F., Borrajo, D., McCluskey, L. (eds.) Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling, ICAPS 2006, Cumbria, UK, June 6-10, 2006. pp. 142–151. AAAI (2006), http://www.aaai.org/Library/ICAPS/2006/icaps06-015.php

  10. Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J. (eds.) Automated Technology for Verification and Analysis - 12th International Symposium, ATVA 2014, Sydney, NSW, Australia, November 3-7, 2014, Proceedings. Lecture Notes in Computer Science, vol. 8837, pp. 98–114. Springer (2014). https://doi.org/10.1007/978-3-319-11936-6_8

  11. Budde, C.E., D’Argenio, P.R., Hartmanns, A., Sedwards, S.: An efficient statistical model checker for nondeterminism and rare events. Int. J. Softw. Tools Technol. Transf. 22(6), 759–780 (2020). https://doi.org/10.1007/s10009-020-00563-2

  12. Budde, C.E., Dehnert, C., Hahn, E.M., Hartmanns, A., Junges, S., Turrini, A.: JANI: Quantitative model and tool interaction. In: Legay, A., Margaria, T. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 23rd International Conference, TACAS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10206, pp. 151–168 (2017). https://doi.org/10.1007/978-3-662-54580-5_9

  13. Budde, C.E., Hartmanns, A., Klauck, M., Kretinsky, J., Parker, D., Quatmann, T., Turrini, A., Zhang, Z.: On Correctness, Precision, and Performance in Quantitative Verification (QComp 2020 Competition Report). In: Proceedings of the 9th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation. Software Verification Tools (2020). https://doi.org/10.1007/978-3-030-83723-5_15

  14. Dehnert, C., Junges, S., Katoen, J., Volk, M.: A storm is coming: A modern probabilistic model checker. In: Majumdar, R., Kuncak, V. (eds.) Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10427, pp. 592–600. Springer (2017). https://doi.org/10.1007/978-3-319-63390-9_31

  15. Dillig, I., Tasiran, S. (eds.): Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I, Lecture Notes in Computer Science, vol. 11561. Springer (2019). https://doi.org/10.1007/978-3-030-25540-4

  16. Hahn, E.M., Hartmanns, A.: A comparison of time- and reward-bounded probabilistic model checking techniques. In: Fränzle, M., Kapur, D., Zhan, N. (eds.) Dependable Software Engineering: Theories, Tools, and Applications - Second International Symposium, SETTA 2016, Beijing, China, November 9-11, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9984, pp. 85–100 (2016). https://doi.org/10.1007/978-3-319-47677-3_6

  17. Hahn, E.M., Hartmanns, A.: Efficient algorithms for time- and cost-bounded probabilistic model checking. CoRR abs/1605.05551 (2016), http://arxiv.org/abs/1605.05551

  18. Hahn, E.M., Hartmanns, A., Hensel, C., Klauck, M., Klein, J., Kretínský, J., Parker, D., Quatmann, T., Ruijters, E., Steinmetz, M.: The 2019 comparison of tools for the analysis of quantitative formal models - (QComp 2019 competition report). In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 25 Years of TACAS: TOOLympics, Held as Part of ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, Part III. Lecture Notes in Computer Science, vol. 11429, pp. 69–92. Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_5

  19. Hahn, E.M., Hartmanns, A., Hermanns, H.: Reachability and reward checking for stochastic timed automata. Electron. Commun. Eur. Assoc. Softw. Sci. Technol. 70 (2014). https://doi.org/10.14279/tuj.eceasst.70.968

  20. Hahn, E.M., Li, Y., Schewe, S., Turrini, A., Zhang, L.: iscasMc: A web-based probabilistic model checker. In: Jones, C.B., Pihlajasaari, P., Sun, J. (eds.) FM 2014: Formal Methods - 19th International Symposium, Singapore, May 12-16, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8442, pp. 312–317. Springer (2014). https://doi.org/10.1007/978-3-319-06410-9_22

  21. Hansen, E.A., Zilberstein, S.: LAO\(^{*}\): A heuristic search algorithm that finds solutions with loops. Artif. Intell. 129(1-2), 35–62 (2001). https://doi.org/10.1016/S0004-3702(01)00106-0

  22. Hartmanns, A., Hermanns, H.: The Modest Toolset: An integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8413, pp. 593–598. Springer (2014). https://doi.org/10.1007/978-3-642-54862-8_51

  23. Hartmanns, A., Hermanns, H.: Explicit model checking of very large MDP using partitioning and secondary storage. In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) Automated Technology for Verification and Analysis - 13th International Symposium, ATVA 2015, Shanghai, China, October 12-15, 2015, Proceedings. Lecture Notes in Computer Science, vol. 9364, pp. 131–147. Springer (2015). https://doi.org/10.1007/978-3-319-24953-7_10

  24. Hartmanns, A., Klauck, M., Parker, D., Quatmann, T., Ruijters, E.: The Quantitative Verification Benchmark Set. In: Vojnar, T., Zhang, L. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 25th International Conference, TACAS 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, Part I. Lecture Notes in Computer Science, vol. 11427, pp. 344–350. Springer (2019). https://doi.org/10.1007/978-3-030-17462-0_20

  25. Hatefi-Ardakani, H.: Finite horizon analysis of Markov automata. Ph.D. thesis, Saarland University, Germany (2017), http://scidok.sulb.uni-saarland.de/volltexte/2017/6743/

  26. Helmert, M.: The Fast Downward planning system. CoRR abs/1109.6051 (2011), http://arxiv.org/abs/1109.6051

  27. Izadi, M.T.: Sequential decision making under uncertainty. In: Zucker, J., Saitta, L. (eds.) Abstraction, Reformulation and Approximation, 6th International Symposium, SARA 2005, Airth Castle, Scotland, UK, July 26-29, 2005, Proceedings. Lecture Notes in Computer Science, vol. 3607, pp. 360–361. Springer (2005). https://doi.org/10.1007/11527862_33

  28. The JANI specification. http://www.jani-spec.org/, accessed on 25/06/2021

  29. Klauck, M., Hermanns, H.: Artifact accompanying the paper "A Modest Approach to Dynamic Heuristic Search in Probabilistic Model Checking" (2021), available at http://doi.org/10.5281/zenodo.4922360

  30. Kolobov, A.: Scalable methods and expressive models for planning under uncertainty. Ph.D. thesis, University of Washington (2013)


  31. Kolobov, A., Mausam, Weld, D.S., Geffner, H.: Heuristic search for generalized stochastic shortest path MDPs. In: Bacchus, F., Domshlak, C., Edelkamp, S., Helmert, M. (eds.) Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany, June 11-16, 2011. AAAI (2011), http://aaai.org/ocs/index.php/ICAPS/ICAPS11/paper/view/2682

  32. Kretínský, J., Meggendorfer, T.: Efficient strategy iteration for mean payoff in Markov decision processes. In: D’Souza, D., Kumar, K.N. (eds.) Automated Technology for Verification and Analysis - 15th International Symposium, ATVA 2017, Pune, India, October 3-6, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10482, pp. 380–399. Springer (2017). https://doi.org/10.1007/978-3-319-68167-2_25

  33. Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings. Lecture Notes in Computer Science, vol. 6806, pp. 585–591. Springer (2011). https://doi.org/10.1007/978-3-642-22110-1_47

  34. Neupane, T., Myers, C.J., Madsen, C., Zheng, H., Zhang, Z.: STAMINA: stochastic approximate model-checker for infinite-state analysis. In: Dillig and Tasiran [15], pp. 540–549. https://doi.org/10.1007/978-3-030-25540-4_31

  35. Ruijters, E., Reijsbergen, D., de Boer, P., Stoelinga, M.: Rare event simulation for dynamic fault trees. Reliab. Eng. Syst. Saf. 186, 220–231 (2019). https://doi.org/10.1016/j.ress.2019.02.004

  36. Steinmetz, M., Hoffmann, J., Buffet, O.: Goal probability analysis in probabilistic planning: Exploring and enhancing the state of the art. J. Artif. Intell. Res. 57, 229–271 (2016). https://doi.org/10.1613/jair.5153

  37. Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972). https://doi.org/10.1137/0201010


Author information


Correspondence to Michaela Klauck.


Appendices

A Proof for \(\textit{MinProb}\)

As announced in Sect. 3.1, this appendix provides a proof that G\(\text {LRTDP}\) solves \(\textit{MinProb}\) properties on general MDP structures correctly by converging to the optimal fixpoint.

To show convergence to the optimal value function from below in case of an admissible initialization, we can argue along the invariant

$$\begin{aligned} \forall k, s: V_k(s) \le P^{\sigma }_s(\diamond G), \text { where } \sigma \text { s.t. } P^{\sigma }_s(\diamond G) = V^*(s) \end{aligned}$$

stating that the value function in every iteration is always at most the value under the optimal policy. This means that an initially admissible value function always stays admissible. This is true for the admissible initialization when \(k=0\), because then \(V_0(s) = 1\) if \(s \in G\) and 0 otherwise. For all other iterations it holds that \(V_{k+1}(s) = \sum _{s'} P(s, a, s') \cdot V_k(s')\) for some action a, and we can derive that

$$\begin{aligned}\begin{gathered} \sum _{s'} P(s, a, s') \cdot V_k(s') \le \sum _{s'} P(s, a, s') \cdot \min _{\sigma }P^{\sigma }_{s'}(\diamond G) \\ \le \sum _{s'} \min _{\sigma }(P(s, a, s') \cdot P^{\sigma }_{s'}(\diamond G)) \le \min _{\sigma } \sum _{s'} P(s, a, s') \cdot P^{\sigma }_{s'}(\diamond G)\text {.} \end{gathered}\end{aligned}$$

The second inequality holds because \(\sigma _{opt}\) is memoryless and independent of \(s'\).

Now assume \(\sigma _{opt}\) is such that \(P^{\sigma _{opt}}_s(\diamond G)\) is minimal for all s. Then for action \(a = greedy (s, V_k)\) we have for any action b, and in particular for \(b=\sigma _{opt}(s)\),

$$\begin{aligned} \sum _{s'}P(s, a, s') \cdot V_k(s') \le \sum _{s'}P(s, b, s')\cdot V_k(s'). \end{aligned}$$

Moreover \(V_k(s') \le P^{\sigma _{opt}}_{s'}(\diamond G)\), which allows us to derive

$$\begin{aligned} \sum _{s'}P(s, a, s') \cdot V_k(s') \le \sum _{s'}P(s, \sigma _{opt}(s), s') \cdot P^{\sigma _{opt}}_{s'}(\diamond G) = P^{\sigma _{opt}}_{s}(\diamond G). \end{aligned}$$
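
The following Python sketch makes the update used in this argument concrete: the admissible initialization from below and one asynchronous backup \(V(s) := \min _a \sum _{s'} P(s, a, s') \cdot V(s')\) that also returns a greedy action. It is an illustration only, not the Modysh implementation; the helpers `actions` and `successors` are assumed interfaces to the MDP.

```python
def init_value_minprob(s, goal_states):
    # Admissible initialization from below: V_0(s) = 1 if s is a goal, 0 otherwise,
    # so V_0(s) <= V*(s) and the invariant above holds for k = 0.
    return 1.0 if s in goal_states else 0.0


def bellman_min_backup(V, s, actions, successors):
    # One asynchronous backup V(s) := min_a sum_{s'} P(s, a, s') * V(s'),
    # returning the new value together with a greedy action a = greedy(s, V).
    # `actions(s)` yields the enabled actions of s and `successors(s, a)` yields
    # (s', P(s, a, s')) pairs; both are assumed interfaces, not part of the paper.
    best_val, best_act = float("inf"), None
    for a in actions(s):
        q = sum(p * V[t] for t, p in successors(s, a))
        if q < best_val:
            best_val, best_act = q, a
    return best_val, best_act
```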

Claim: \(\text {If } V_k \text { is a fixpoint for } k \rightarrow \infty \text { then } P^\sigma (\diamond G) = V_\infty (s_0) \forall \sigma \text { greedy in } V.~(5)\)

This means \(V^*(s_0) \le V_\infty (s_0)\), and together with the result from above (\(\forall k: V_k \le V^*\)) we can conclude \(V^*(s_0) = V_\infty (s_0)\).

It remains to show that (5) holds: Let \(\sigma _k\) with \(\sigma _k(s) = greedy (s, V_k)\), i. e., a greedy policy with respect to the value function \(V_k\), and let \(S_k = \{s | P^{\sigma _k}_{s_0}(\diamond s)>0\}\), i. e., all states reachable with this greedy policy. Then \(\max ( residual (S_k)) \le \delta _k\) and for \(k \rightarrow \infty \) it holds that \(\delta _k \rightarrow 0\).

To show that \(\delta _k\) will approach 0 it is enough to argue about the states which will be updated an infinite number of times, i. e., in the end, about the states on optimal policies. These are the states in \(S_\infty = \bigcap _{i \ge 0} \bigcup _{k \ge i} S_k\).

Let K be such that \(\forall k \ge K: \bigcup _{i \ge k} S_i = S_\infty \), i. e., a step from which on we only consider states that are visited infinitely often when running G\(\text {LRTDP}\) forever. Assume we are in a step \(j+1 \ge K\) and let \(s \in S_\infty \). We have to distinguish two cases:

  • If s has not been updated then \(V_{j+1}(s) = V_j(s)\).

  • If s is the updated state then \(V_{j+1}(s) = \min _{\alpha } \sum _{s'} P(s, \alpha , s') \cdot V_j(s')\)

But this is the same as for simple synchronous value iteration, for which convergence to the optimal fixpoint is proven. For our asynchronous case in G\(\text {LRTDP}\) we nevertheless have to guarantee fairness among the states in \(S_\infty \), i. e., we have to make sure that they are updated infinitely often. This is the case because each possible trial through \(S_\infty \) (there are finitely many such trials) appears infinitely often, i. e., the states in this trial are updated infinitely often (by construction of G\(\text {LRTDP}\) when choosing the next greedy action). All other states not in \(S_\infty \) can be ignored because they will not influence the greedy policy or the optimal values, as they are already too large:

For any \(s \in S\setminus S_{\infty }\) it holds that \(V_{\infty }(s) = V_K(s) \le V^*(s)\). For any \(s \in S_{\infty }\), by definition of \(S_{\infty }\) and K we know that an action leading again to a state in \(S_{\infty }\) will be chosen, i. e., an \(a \in \sigma _{\infty }\) with \(V_{\infty }(s) \le \sum _{s' \in S_{\infty }} P(s, \sigma _{\infty }, s') \cdot V_{\infty }(s')\). But in every step we choose the greedy action, and for any \(k \ge K\) it holds that \(V_k(s) \le \sum _{s' \in S} P(s, a, s') \cdot V_k(s') \le V_{\infty }(s)\), i. e., the action in \(\sigma _{\infty }\) must have been the greedy action not leading to \(S\setminus S_{\infty }\). This means that \(V_{\infty }\) defines an optimal strategy on \(S_{\infty }\) for \(s_0 \in S_{\infty }\), which is also an optimal strategy on S because no state \(s' \in S \setminus S_{\infty }\) is visited, even with \(V_{\infty }(s')<V^*(s')\). In addition, the initial state lies in \(S_\infty \) by construction, i. e., \(P_{\min }(\diamond G) = V^{\sigma _{opt}}(s_0)\) reaches the fixpoint and is updated infinitely often.

In summary, when running G\(\text {LRTDP}\) for an infinite number of iterations, the value function for states in \(S_\infty \) approaches the optimal values of the minimal probability to reach the goal from below, never gets larger than the optimal value, and the difference between V and \(V^*\) strictly decreases for these states. In addition, we can at some point stop updating the value function for parts of the state space because these values do not influence the correct optimal result for the initial state. In our implementation, G\(\text {LRTDP}\) is designed in such a way that it stops when the values on the optimal policy change by less than \(\varepsilon \), which is the same convergence criterion as for simple value iteration.
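
To illustrate this stopping criterion, the following sketch shows a minimal trial-based loop in the spirit of G\(\text {LRTDP}\) for \(\textit{MinProb}\), reusing `bellman_min_backup` from the sketch above. It deliberately omits the labeling of solved states and the trap handling of the full algorithm and is not the Modysh code; `eps` plays the role of \(\varepsilon \), and `actions` and `successors` are the same assumed MDP interfaces as before.

```python
import random

def trial_based_minprob(s0, goal_states, actions, successors, eps=1e-6, max_trials=100000):
    # Illustrative trial-based asynchronous value iteration for MinProb:
    # each trial follows greedy actions from s0 and backs up the visited states;
    # the loop stops once the maximal residual of a trial is below eps.
    V = {}

    def value(s):
        # lazy admissible initialization from below (cf. init_value_minprob)
        return V.setdefault(s, 1.0 if s in goal_states else 0.0)

    for _ in range(max_trials):
        s, seen, residual = s0, set(), 0.0
        while s not in goal_states and s not in seen:
            seen.add(s)
            value(s)
            for a in actions(s):                # make sure all successors are initialized
                for t, _ in successors(s, a):
                    value(t)
            new_v, a = bellman_min_backup(V, s, actions, successors)
            residual = max(residual, abs(new_v - V[s]))
            V[s] = new_v
            states, probs = zip(*successors(s, a))   # sample next state from P(s, a, .)
            s = random.choices(states, weights=probs)[0]
        if residual < eps:                      # values along the greedy trial converged
            break
    return value(s0)
```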

B Proof for \(\textit{MaxProb}\)

Taking up our promise from Sect. 3.1, in the following we first give an intuition for why the presented combination of G\(\text {LRTDP}\) and \(\text {FRET}\) solves \(\textit{MaxProb}\) properties correctly on general MDP structures, i. e., not only on problems having at least one almost-sure policy as proven in [31], by converging to the optimal fixpoint. Afterwards we sketch a more formal proof.

All greedy policies inspected by G\(\text {LRTDP}\) at some point end in a goal state or a dead-end state. The latter can either be a real dead-end, i. e., a sink state with only a self-loop, or a permanent trap which has been transformed into a dead-end by the cycle elimination of \(\text {FRET}\). If it is a permanent trap identified by \(\text {FRET}\), the values of all states in it are set to 0. Otherwise, when the sink state is discovered for the first time, its value is also directly set to 0. This means we tag these states, do not explore them further and propagate their value back through the graph. Cycling forever is not possible because \(\text {FRET}\) eventually eliminates all such cycles in greedy policies. With this, we can state that at some point no more states are left to explore in the current G\(\text {LRTDP}\) trial because all relevant traps have been eliminated or a goal or a sink has been found. Then G\(\text {LRTDP}\) runs until the state values of the current greedy policies have converged up to \(\varepsilon \). Even if the greedy policy is not the same in every iteration, at some point it will stay within a set of greedy states which are part of finitely many greedy policies. The values of these states will have converged close enough to the optimal ones such that the algorithm concentrates on these optimal policies. The value function used in G\(\text {LRTDP}\) is initialized admissibly and therefore can only monotonically decrease and approach the optimal fixpoint from above. When this point is reached (up to \(\varepsilon \)), the entire procedure (G\(\text {LRTDP}\) + \(\text {FRET}\)) terminates. This fixpoint must be the optimal one because the Bellman equation only admits a single fixpoint [7].
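
The trap elimination described above can be sketched as follows, assuming the same `successors` interface as in Appendix A and a hypothetical helper `greedy_action(s)` returning the currently greedy action of a non-goal state s. The sketch builds the graph induced by the greedy policy, computes its strongly connected components with Tarjan's algorithm [37], and reports every component that contains no goal state and has no greedy transition leaving it; the states of such a permanent trap can then be assigned value 0 and be treated as dead-ends. This is only an illustration of the idea, not the \(\text {FRET}\) implementation inside Modysh.

```python
def find_permanent_traps(states, greedy_action, successors, goal_states):
    # `states` are the non-goal states currently explored under the greedy policy.
    # Graph induced by the greedy policy, keeping edges with positive probability only.
    graph = {s: [t for t, p in successors(s, greedy_action(s)) if p > 0.0]
             for s in states}

    # Tarjan's SCC algorithm [37] on that graph (recursive, for brevity of the sketch).
    index, low, on_stack, stack, sccs, counter = {}, {}, set(), [], [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in graph:
                continue                     # edge leaving the considered state set
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:               # v is the root of a new SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            strongconnect(v)

    # A permanent trap is an SCC without goal states from which no greedy edge leaves.
    traps = []
    for comp in sccs:
        leaves = any(t not in comp for v in comp for t in graph[v])
        if not (comp & set(goal_states)) and not leaves:
            traps.append(comp)
    return traps
```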

To show convergence to the optimal value function from above in case of an admissible initialization, we can argue along the invariant

$$\begin{aligned} \forall k, s: V_k(s) \ge P^{\sigma }_s(\diamond G), \text { where } \sigma \text { s.t. } P^{\sigma }_s(\diamond G) = V^*(s) \end{aligned}$$

stating that the value function in every iteration is always at least the value under the optimal policy. This means that an initially admissible value function always stays admissible. This is true for the admissible initialization when \(k=0\), because then \(V_0(s) = 0\) if \(s \in \mathcal S_\bot \) and 1 otherwise. For all other iterations it holds that \(V_{k+1}(s) = \sum _{s'} P(s, a, s') \cdot V_k(s')\) for some action a, and we can derive that

$$\begin{aligned}\begin{gathered} \sum _{s'} P(s, a, s') \cdot V_k(s') \ge \sum _{s'} P(s, a, s') \cdot \max _{\sigma }P^{\sigma }_{s'}(\diamond G) \\ \ge \sum _{s'} \max _{\sigma }(P(s, a, s') \cdot P^{\sigma }_{s'}(\diamond G)) \ge \max _{\sigma } \sum _{s'} P(s, a, s') \cdot P^{\sigma }_{s'}(\diamond G) \end{gathered}\end{aligned}$$

The second inequality holds because \(\sigma _{opt}\) is memoryless and independent of \(s'\).

Now assume \(\sigma _{opt}\) is such that \(P^{\sigma _{opt}}_s(\diamond G)\) is maximal for all s. Then for action \(a = greedy (s, V_k)\) we have for any action b, and in particular for \(b=\sigma _{opt}(s)\),

$$\begin{aligned} \sum _{s'}P(s, a, s') \cdot V_k(s') \ge \sum _{s'}P(s, b, s')\cdot V_k(s'). \end{aligned}$$

Moreover \(V_k(s') \ge P^{\sigma _{opt}}_{s'}(\diamond G)\) and hence

$$\begin{aligned} \sum _{s'}P(s, a, s') \cdot V_k(s') \ge \sum _{s'}P(s, \sigma _{opt}(s), s') \cdot P^{\sigma _{opt}}_{s'}(\diamond G) = P^{\sigma _{opt}}_{s}(\diamond G). \end{aligned}$$
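
Mirroring the \(\textit{MinProb}\) sketch in Appendix A, the maximizing backup together with the admissible initialization from above could look as follows; again this is an illustration only, with the same assumed `actions` and `successors` interfaces, where `dead_ends` stands for \(\mathcal S_\bot \).

```python
def init_value_maxprob(s, dead_ends):
    # Admissible initialization from above: V_0(s) = 0 for dead-end states, 1 otherwise,
    # so V_0(s) >= V*(s) and the invariant above holds for k = 0.
    return 0.0 if s in dead_ends else 1.0


def bellman_max_backup(V, s, actions, successors):
    # One asynchronous backup V(s) := max_a sum_{s'} P(s, a, s') * V(s'),
    # returning the new value together with a greedy action.
    best_val, best_act = -1.0, None
    for a in actions(s):
        q = sum(p * V[t] for t, p in successors(s, a))
        if q > best_val:
            best_val, best_act = q, a
    return best_val, best_act
```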

Claim: \(\text {If } V_k \text { is a fixpoint for } k \rightarrow \infty \text { then } P^\sigma (\diamond G) = V_\infty (s_0)~ \forall \sigma \text { greedy in } V.~(6)\)

This means \(V^*(s_0) \ge V_\infty (s_0)\), and together with the result from above (\(\forall k: V_k \ge V^*\)) we can conclude \(V^*(s_0) = V_\infty (s_0)\).

It remains to show that (6) holds: Let \(\sigma _k\) with \(\sigma _k(s) = greedy (s, V_k)\), i. e., a greedy policy with respect to the value function \(V_k\), and let \(S_k = \{s | P^{\sigma _k}_{s_0}(\diamond s)>0\}\), i. e., all states reachable with this greedy policy. Then \(\max ( residual (S_k)) \le \delta _k\) and for \(k \rightarrow \infty \) it holds that \(\delta _k \rightarrow 0\).

To show that \(\delta _k\) will approach 0 it is enough to argue about the states which will be updated an infinite number of times, i. e., in the end, about the states on optimal policies. These are the states in \(S_\infty = \bigcap _{i \ge 0} \bigcup _{k \ge i} S_k\).

Let K be such that \(\forall k \ge K: \bigcup _{i \ge k} S_i = S_\infty \), i. e., a step from which on we only consider states that are visited infinitely often when running \(\text {FRET}\)-\(\text {LRTDP}\) forever. Assume we are in a step \(j+1 \ge K\) and let \(s \in S_\infty \). We have to distinguish two cases:

  • If s has not been updated then \(V_{j+1}(s) = V_j(s)\).

  • If s is the updated state then \(V_{j+1}(s) = \max _{\alpha } \sum _{s'} P(s, \alpha , s') \cdot V_j(s')\)

This is the same as for simple synchronous value iteration, for which convergence to the optimal fixpoint is proven. For our asynchronous case in G\(\text {LRTDP}\) we still have to guarantee fairness among the states in \(S_\infty \), i. e., we have to make sure that they are updated infinitely often. This is the case because each possible trial through \(S_\infty \) (there are finitely many such trials) appears infinitely often, i. e., the states in this trial are updated infinitely often (by construction of G\(\text {LRTDP}\) when choosing the next greedy action). All other states not in \(S_\infty \) can be ignored because they will not influence the greedy policy or the optimal values, as they are already too large:

For any \(s \in S\setminus S_{\infty }\) it holds that \(V_{\infty }(s) = V_K(s) \ge V^*(s)\). For any \(s \in S_{\infty }\), by definition of \(S_{\infty }\) and K we know that an action leading again to a state in \(S_{\infty }\) will be chosen, i. e., an \(a \in \sigma _{\infty }\) with \(V_{\infty }(s) \ge \sum _{s' \in S_{\infty }} P(s, \sigma _{\infty }, s') \cdot V_{\infty }(s')\). But in every step we choose the greedy action, and for any \(k \ge K\) it holds that \(V_k(s) \ge \sum _{s' \in S} P(s, a, s') \cdot V_k(s') \ge V_{\infty }(s)\), i. e., the action in \(\sigma _{\infty }\) must have been the greedy action not leading to \(S\setminus S_{\infty }\).

This means that \(V_{\infty }\) defines an optimal strategy on \(S_{\infty }\) for \(s_0 \in S_{\infty }\) which is also an optimal strategy on S because no state \(s' \in S \setminus S_{\infty }\) is visited even with \(V_{\infty }(s')>V^*(s')\).

In addition the initial state lies in \(S_\infty \) by construction, i. e., \(P_{\max }(\diamond G) = V^{\sigma _{opt}}(s_0)\) reaches the fixpoint and is updated infinitely often.

Altogether, this shows that when running \(\text {FRET}\)-\(\text {LRTDP}\) over an infinite number of iterations, the value function for states in \(S_\infty \) approaches the optimal values of the maximal probability to reach the goal from above, never gets smaller than the optimal value, and the difference between V and \(V^*\) strictly decreases for these states. In addition, we can at some point stop updating the value function for parts of the state space because these values do not influence the correct optimal result for the initial state. In our implementation, \(\text {FRET}\)-\(\text {LRTDP}\) is designed in such a way that it stops when the values on the optimal policy change by less than \(\varepsilon \), which is the same convergence criterion as for simple value iteration.
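
Putting the pieces together, the overall procedure alternates value updates along the current greedy policy with the trap elimination of \(\text {FRET}\) until no trap is left and the residual is below \(\varepsilon \). The sketch below combines `bellman_max_backup` and `find_permanent_traps` from the previous sketches; for brevity it performs full sweeps over the greedy-reachable states instead of sampled trials, so it is a simplified stand-in for \(\text {FRET}\)-\(\text {LRTDP}\) and not the Modysh implementation.

```python
def fret_like_maxprob(s0, goal_states, actions, successors, eps=1e-6):
    V, dead_ends = {}, set()

    def value(s):
        # lazy admissible initialization from above (cf. init_value_maxprob)
        return V.setdefault(s, 0.0 if s in dead_ends else 1.0)

    def greedy_act(s):
        return bellman_max_backup(V, s, actions, successors)[1]

    def greedy_reachable():
        # Non-goal, non-dead-end states reachable from s0 under the greedy policy.
        seen, frontier = set(), [s0]
        while frontier:
            s = frontier.pop()
            if s in seen or s in goal_states or s in dead_ends:
                continue
            seen.add(s)
            value(s)
            for a in actions(s):              # make sure all successors are initialized
                for t, _ in successors(s, a):
                    value(t)
            frontier.extend(t for t, p in successors(s, greedy_act(s)) if p > 0.0)
        return seen

    while True:
        # (1) Back up until the residual on the greedy-reachable states is below eps.
        while True:
            states, residual = greedy_reachable(), 0.0
            for s in states:
                new_v, _ = bellman_max_backup(V, s, actions, successors)
                residual = max(residual, abs(new_v - V[s]))
                V[s] = new_v
            if residual < eps:
                break
        # (2) FRET step: eliminate permanent traps of the current greedy policy.
        traps = find_permanent_traps(states, greedy_act, successors, goal_states)
        if not traps:
            return value(s0)                  # no traps left and residual below eps
        for comp in traps:
            for s in comp:
                V[s] = 0.0                    # permanent traps behave like dead-ends
                dead_ends.add(s)
```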


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Klauck, M., Hermanns, H. (2021). A Modest Approach to Dynamic Heuristic Search in Probabilistic Model Checking. In: Abate, A., Marin, A. (eds) Quantitative Evaluation of Systems. QEST 2021. Lecture Notes in Computer Science, vol 12846. Springer, Cham. https://doi.org/10.1007/978-3-030-85172-9_2


  • DOI: https://doi.org/10.1007/978-3-030-85172-9_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85171-2

  • Online ISBN: 978-3-030-85172-9

