Abstract
This paper presents Modysh, a probabilistic model checker which harvests and extends non-exhaustive exploration methods originally developed in the context of AI planning. Its core functionality is based on enhancements of the heuristic search methods known as real-time dynamic programming and find-revise-eliminate-traps, and it efficiently handles maximal and minimal reachability properties, expected reward properties, as well as bounded properties on general MDPs. Modysh is integrated into the infrastructure of the Modest Toolset and extends the property types supported by it. We discuss the algorithmic particularities in detail and evaluate the competitiveness of Modysh in comparison to state-of-the-art model checkers in a large case study rooted in the well-established Quantitative Verification Benchmark Set. This study demonstrates that Modysh is especially attractive for very large benchmark instances that are not solvable by any other tool.
This work has received support by the ERC Advanced Investigators Grant 695614 POWVER, by the DFG Grant 389792660 as part of TRR 248 CPEC, and by the Key-Area Research and Development Grant 2018B010107004 of Guangdong Province.
References
Aljazzar, H., Leue, S.: Generation of counterexamples for model checking of Markov decision processes. In: QEST 2009, Sixth International Conference on the Quantitative Evaluation of Systems, Budapest, Hungary, 13-16 September 2009, pp. 197–206. IEEE Computer Society (2009). https://doi.org/10.1109/QEST.2009.10
Ashok, P., Brázdil, T., Kretínský, J., Slámecka, O.: Monte Carlo tree search for verifying reachability in Markov decision processes. In: Margaria, T., Steffen, B. (eds.) Leveraging Applications of Formal Methods, Verification and Validation. Verification - 8th International Symposium, ISoLA 2018, Limassol, Cyprus, November 5-9, 2018, Proceedings, Part II. Lecture Notes in Computer Science, vol. 11245, pp. 322–335. Springer (2018). https://doi.org/10.1007/978-3-030-03421-4_21
Ashok, P., Kretínský, J., Weininger, M.: PAC statistical model checking for Markov decision processes and stochastic games. In: Dillig and Tasiran [15], pp. 497–519. https://doi.org/10.1007/978-3-030-25540-4_29
Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to act using real-time dynamic programming. Artif. Intell. 72(1-2), 81–138 (1995). https://doi.org/10.1016/0004-3702(94)00011-O
Bellman, R.: A Markovian decision process. Journal of Mathematics and Mechanics 6(5), 679–684 (1957)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, Vol. 1. Athena Scientific (1995)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, Vol. 2. Athena Scientific (1995)
Bonet, B., Geffner, H.: Labeled RTDP: improving the convergence of real-time dynamic programming. In: Giunchiglia, E., Muscettola, N., Nau, D.S. (eds.) Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling (ICAPS 2003), June 9-13, 2003, Trento, Italy, pp. 12–21. AAAI (2003). http://www.aaai.org/Library/ICAPS/2003/icaps03-002.php
Bonet, B., Geffner, H.: Learning depth-first search: A unified approach to heuristic search in deterministic and nondeterministic settings, and its application to MDPs. In: Long, D., Smith, S.F., Borrajo, D., McCluskey, L. (eds.) Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling, ICAPS 2006, Cumbria, UK, June 6-10, 2006, pp. 142–151. AAAI (2006). http://www.aaai.org/Library/ICAPS/2006/icaps06-015.php
Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J. (eds.) Automated Technology for Verification and Analysis - 12th International Symposium, ATVA 2014, Sydney, NSW, Australia, November 3-7, 2014, Proceedings. Lecture Notes in Computer Science, vol. 8837, pp. 98–114. Springer (2014). https://doi.org/10.1007/978-3-319-11936-6_8
Budde, C.E., D’Argenio, P.R., Hartmanns, A., Sedwards, S.: An efficient statistical model checker for nondeterminism and rare events. Int. J. Softw. Tools Technol. Transf. 22(6), 759–780 (2020). https://doi.org/10.1007/s10009-020-00563-2
Budde, C.E., Dehnert, C., Hahn, E.M., Hartmanns, A., Junges, S., Turrini, A.: JANI: Quantitative model and tool interaction. In: Legay, A., Margaria, T. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 23rd International Conference, TACAS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10206, pp. 151–168 (2017). https://doi.org/10.1007/978-3-662-54580-5_9
Budde, C.E., Hartmanns, A., Klauck, M., Kretinsky, J., Parker, D., Quatmann, T., Turrini, A., Zhang, Z.: On Correctness, Precision, and Performance in Quantitative Verification (QComp 2020 Competition Report). In: Proceedings of the 9th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation. Software Verification Tools (2020). https://doi.org/10.1007/978-3-030-83723-5_15
Dehnert, C., Junges, S., Katoen, J., Volk, M.: A storm is coming: A modern probabilistic model checker. In: Majumdar, R., Kuncak, V. (eds.) Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10427, pp. 592–600. Springer (2017). https://doi.org/10.1007/978-3-319-63390-9_31
Dillig, I., Tasiran, S. (eds.): Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I. Lecture Notes in Computer Science, vol. 11561. Springer (2019). https://doi.org/10.1007/978-3-030-25540-4
Hahn, E.M., Hartmanns, A.: A comparison of time- and reward-bounded probabilistic model checking techniques. In: Fränzle, M., Kapur, D., Zhan, N. (eds.) Dependable Software Engineering: Theories, Tools, and Applications - Second International Symposium, SETTA 2016, Beijing, China, November 9-11, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9984, pp. 85–100 (2016). https://doi.org/10.1007/978-3-319-47677-3_6
Hahn, E.M., Hartmanns, A.: Efficient algorithms for time- and cost-bounded probabilistic model checking. CoRR abs/1605.05551 (2016). http://arxiv.org/abs/1605.05551
Hahn, E.M., Hartmanns, A., Hensel, C., Klauck, M., Klein, J., Kretínský, J., Parker, D., Quatmann, T., Ruijters, E., Steinmetz, M.: The 2019 comparison of tools for the analysis of quantitative formal models (QComp 2019 competition report). In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 25 Years of TACAS: TOOLympics, Held as Part of ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, Part III. Lecture Notes in Computer Science, vol. 11429, pp. 69–92. Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_5
Hahn, E.M., Hartmanns, A., Hermanns, H.: Reachability and reward checking for stochastic timed automata. Electron. Commun. Eur. Assoc. Softw. Sci. Technol. 70 (2014). https://doi.org/10.14279/tuj.eceasst.70.968
Hahn, E.M., Li, Y., Schewe, S., Turrini, A., Zhang, L.: iscasMc: A web-based probabilistic model checker. In: Jones, C.B., Pihlajasaari, P., Sun, J. (eds.) FM 2014: Formal Methods - 19th International Symposium, Singapore, May 12-16, 2014, Proceedings. Lecture Notes in Computer Science, vol. 8442, pp. 312–317. Springer (2014). https://doi.org/10.1007/978-3-319-06410-9_22
Hansen, E.A., Zilberstein, S.: LAO\(^*\): A heuristic search algorithm that finds solutions with loops. Artif. Intell. 129(1-2), 35–62 (2001). https://doi.org/10.1016/S0004-3702(01)00106-0
Hartmanns, A., Hermanns, H.: The Modest Toolset: An integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014, Proceedings. Lecture Notes in Computer Science, vol. 8413, pp. 593–598. Springer (2014). https://doi.org/10.1007/978-3-642-54862-8_51
Hartmanns, A., Hermanns, H.: Explicit model checking of very large MDP using partitioning and secondary storage. In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) Automated Technology for Verification and Analysis - 13th International Symposium, ATVA 2015, Shanghai, China, October 12-15, 2015, Proceedings. Lecture Notes in Computer Science, vol. 9364, pp. 131–147. Springer (2015). https://doi.org/10.1007/978-3-319-24953-7_10
Hartmanns, A., Klauck, M., Parker, D., Quatmann, T., Ruijters, E.: The Quantitative Verification Benchmark Set. In: Vojnar, T., Zhang, L. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 25th International Conference, TACAS 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, Part I. Lecture Notes in Computer Science, vol. 11427, pp. 344–350. Springer (2019). https://doi.org/10.1007/978-3-030-17462-0_20
Hatefi-Ardakani, H.: Finite horizon analysis of Markov automata. Ph.D. thesis, Saarland University, Germany (2017). http://scidok.sulb.uni-saarland.de/volltexte/2017/6743/
Helmert, M.: The fast downward planning system. CoRR abs/1109.6051 (2011), http://arxiv.org/abs/1109.6051
Izadi, M.T.: Sequential decision making under uncertainty. In: Zucker, J., Saitta, L. (eds.) Abstraction, Reformulation and Approximation, 6th International Symposium, SARA 2005, Airth Castle, Scotland, UK, July 26-29, 2005, Proceedings. Lecture Notes in Computer Science, vol. 3607, pp. 360–361. Springer (2005). https://doi.org/10.1007/11527862_33
The JANI specification. http://www.jani-spec.org/, accessed on 25/06/2021
Klauck, M., Hermanns, H.: Artifact accompanying the paper "A Modest Approach to Dynamic Heuristic Search in Probabilistic Model Checking" (2021), available at http://doi.org/10.5281/zenodo.4922360
Kolobov, A.: Scalable methods and expressive models for planning under uncertainty. Ph.D. thesis, University of Washington (2013)
Kolobov, A., Mausam, Weld, D.S., Geffner, H.: Heuristic search for generalized stochastic shortest path MDPs. In: Bacchus, F., Domshlak, C., Edelkamp, S., Helmert, M. (eds.) Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany, June 11-16, 2011. AAAI (2011). http://aaai.org/ocs/index.php/ICAPS/ICAPS11/paper/view/2682
Kretínský, J., Meggendorfer, T.: Efficient strategy iteration for mean payoff in Markov decision processes. In: D’Souza, D., Kumar, K.N. (eds.) Automated Technology for Verification and Analysis - 15th International Symposium, ATVA 2017, Pune, India, October 3-6, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10482, pp. 380–399. Springer (2017). https://doi.org/10.1007/978-3-319-68167-2_25
Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011, Proceedings. Lecture Notes in Computer Science, vol. 6806, pp. 585–591. Springer (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Neupane, T., Myers, C.J., Madsen, C., Zheng, H., Zhang, Z.: STAMINA: Stochastic approximate model checker for infinite-state analysis. In: Dillig and Tasiran [15], pp. 540–549. https://doi.org/10.1007/978-3-030-25540-4_31
Ruijters, E., Reijsbergen, D., de Boer, P., Stoelinga, M.: Rare event simulation for dynamic fault trees. Reliab. Eng. Syst. Saf. 186, 220–231 (2019). https://doi.org/10.1016/j.ress.2019.02.004
Steinmetz, M., Hoffmann, J., Buffet, O.: Goal probability analysis in probabilistic planning: Exploring and enhancing the state of the art. J. Artif. Intell. Res. 57, 229–271 (2016). https://doi.org/10.1613/jair.5153
Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972). https://doi.org/10.1137/0201010
Appendices
A Proof for \(\textit{MinProb}\)
As announced in Sect. 3.1, this appendix provides a proof that GLRTDP solves \(\textit{MinProb}\) properties on general MDP structures correctly by converging to the optimal fixpoint.
To show convergence to the optimal value function from below in the case of an admissible initialization, we can argue along the invariant
\[ \forall k\colon\; V_k(s) \le P^{\sigma_{opt}}_s(\diamond G) = V^*(s) \;\text{ for all } s, \]
stating that the value function in every iteration is always at most the value under the optimal policy. This means that an initially admissible value function always stays admissible. The invariant holds for the admissible initialization when \(k=0\), because then \(V_0(s) = 1\) if \(s \in G\) and 0 otherwise. For all other iterations it holds that
\[ V_{k+1}(s) = \sum_{s'} P(s, a, s') \cdot V_k(s') \]
for some action \(a\), and we can derive that
\[ V_{k+1}(s) \le \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot V_k(s') \le \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot P^{\sigma_{opt}}_{s'}(\diamond G) = P^{\sigma_{opt}}_s(\diamond G). \]
The first inequality holds because \(a\) is the greedy (minimizing) action, the second inequality holds by the invariant for \(V_k\), and the final equality because \(\sigma_{opt}\) is memoryless and independent of \(s'\).
Now assume \(\sigma_{opt}\) is such that \(P^{\sigma_{opt}}_s(\diamond G)\) is minimal for all \(s\). Then for the action \(a = greedy(s, V_k)\) we have
\[ \sum_{s'} P(s, a, s') \cdot V_k(s') \le \sum_{s'} P(s, b, s') \cdot V_k(s') \]
for any action \(b\), and in particular for \(b = \sigma_{opt}(s)\). Moreover \(V_k(s') \le P^{\sigma_{opt}}_{s'}(\diamond G)\), which allows us to derive
\[ V_{k+1}(s) \le \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot P^{\sigma_{opt}}_{s'}(\diamond G) = P^{\sigma_{opt}}_s(\diamond G) = V^*(s). \]
Claim: If \(V_k\) is a fixpoint for \(k \rightarrow \infty\), then \(P^\sigma(\diamond G) = V_\infty(s_0)\) for all \(\sigma\) greedy in \(V\). (5)
Since this means \(V^*(s_0) \le V_\infty (s_0)\) and with the result from above (\(\forall k: V_k \le V^*\)) we can conclude \(V^*(s_0) = V_\infty (s_0)\).
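The derivation above can be made concrete with a small sketch. The MDP, its state names, and its probabilities below are invented purely for illustration (they are not from the paper): the minimizing Bellman operator, started from the admissible lower bound \(V_0\), never exceeds the optimal values and converges to them from below.

```python
# Hypothetical toy MDP: transitions[s][a] = {successor: probability}.
transitions = {
    "s0": {"a": {"s1": 1.0}, "b": {"g": 0.5, "d": 0.5}},
    "s1": {"a": {"g": 1.0}},
}
GOAL, SINK = "g", "d"

def bellman_min(V):
    """One synchronous Bellman step: V'(s) = min_a sum_s' P(s,a,s') * V(s')."""
    return {s: min(sum(p * V[t] for t, p in dist.items())
                   for dist in acts.values())
            for s, acts in transitions.items()}

# Admissible initialization from below: V_0(s) = 1 iff s is a goal state.
V = {"s0": 0.0, "s1": 0.0, GOAL: 1.0, SINK: 0.0}
for _ in range(50):
    V.update(bellman_min(V))   # goal and sink values stay fixed
print(V["s0"])  # -> 0.5: the minimizing policy takes action "b" in s0
```

Along the way, \(V_k(\texttt{s0})\) takes the values 0, 0, 0.5, 0.5, ...: it increases monotonically and never exceeds \(V^*(\texttt{s0}) = 0.5\), matching the invariant.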
It remains to show that (5) holds: Let \(\sigma_k = greedy(V_k)\), i.e., a greedy policy with respect to the value function \(V_k\), and let \(S_k = \{s \mid P^{\sigma_k}_{s_0}(\diamond s) > 0\}\), i.e., the set of all states reachable under this greedy policy. Then \(\max(residual(S_k)) \le \delta_k\), and for \(k \rightarrow \infty\) it holds that \(\delta_k \rightarrow 0\).
To show that \(\delta _k\) will approach 0 it is enough to argue about the states which will be updated an infinite number of times, i. e., in the end, about the states on optimal policies. These are the states in \(S_\infty = \bigcap _{i \ge 0} \bigcup _{k \ge i} S_k\).
Let K be such that \(\forall k \ge K: \bigcup _{i \ge k} S_i = S_\infty \), i.e., a step from which on we only consider states that are visited infinitely often when running GLRTDP infinitely long. Assume we are in a step \(j+1 \ge K\) and let \(s \in S_\infty \). We have to distinguish two cases:

- If s has not been updated, then \(V_{j+1}(s) = V_j(s)\).
- If s is the updated state, then \(V_{j+1}(s) = \min _{\alpha } \sum _{s'} P(s, \alpha , s') \cdot V_j(s')\).
But this is the same update as in simple synchronous value iteration, for which convergence to the optimal fixpoint is proven. For our asynchronous case in GLRTDP we nevertheless have to guarantee fairness among the states in \(S_\infty \), i.e., we have to make sure that they are updated infinitely often. This is the case because each possible trial through \(S_\infty \) (there are finitely many trials) appears infinitely often, i.e., the states in this trial are updated infinitely often (by construction of GLRTDP when choosing the next greedy action). All other states not in \(S_\infty \) can be ignored: they influence neither the greedy policy nor the optimal values because they are already too large:
For any \(s \in S\setminus S_{\infty }\) it holds that \(V_{\infty }(s) = V_K(s) \le V^*(s)\). For any \(s \in S_{\infty }\), by definition of \(S_{\infty }\) and K, we know that an action leading again to a state in \(S_{\infty }\) will be chosen, i.e., an \(a \in \sigma _{\infty }\) with \(V_{\infty }(s) \le \sum _{s' \in S_{\infty }} P(s, \sigma _{\infty }, s') \cdot V_{\infty }(s')\). But in every state we choose the greedy action, and for any \(k \ge K\) it holds that \(V_k(s) \le \sum _{s' \in S} P(s, a, s') \cdot V_k(s') \le V_{\infty }(s)\); hence the action in \(\sigma _{\infty }\) must have been the greedy action, not leading to \(S\setminus S_{\infty }\). This means that \(V_{\infty }\) defines an optimal strategy on \(S_{\infty }\) for \(s_0 \in S_{\infty }\), which is also an optimal strategy on S because no state \(s' \in S \setminus S_{\infty }\) is visited, even with \(V_{\infty }(s')<V^*(s')\). In addition, the initial state lies in \(S_\infty \) by construction, i.e., \(P_{\min }(\diamond G) = V^{\sigma _{opt}}(s_0)\) reaches the fixpoint and is updated infinitely often.
In summary, when running GLRTDP for an infinite number of iterations, the value function for states in \(S_\infty \) approaches the optimal values of the minimal probability to reach the goal from below, never exceeds the optimal value, and the difference between V and \(V^*\) strictly decreases for these states. In addition, we can at some point stop updating the value function for parts of the state space, because these values have no influence on the correct optimal result for the initial state. In our implementation, GLRTDP is designed to stop when the values on the optimal policy change by less than \(\varepsilon \), which is the same convergence criterion as for simple value iteration.
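The trial-based, asynchronous flavor of the updates and the \(\varepsilon \)-stopping criterion can be sketched roughly as follows. This is a strong simplification, not the tool's implementation: the toy MDP is invented, and the sketch silently assumes the greedy policy is acyclic, whereas the real GLRTDP additionally needs labeling and trap handling to guarantee termination.

```python
import random

# Invented toy MDP: T[s][a] = {successor: probability}.
T = {"s0": {"a": {"s1": 1.0}, "b": {"g": 0.5, "d": 0.5}},
     "s1": {"a": {"g": 1.0}}}
TERMINAL = {"g": 1.0, "d": 0.0}   # goal has value 1, dead-end has value 0
EPS = 1e-6

def q(V, s, a):
    """Expected value of taking action a in s under the current V."""
    return sum(p * V[t] for t, p in T[s][a].items())

def trial(V, rng):
    """One trial: follow greedy actions from s0, backing up each visited
    state asynchronously; returns the largest residual seen on the trial."""
    s, residual = "s0", 0.0
    while s not in TERMINAL:
        a = min(T[s], key=lambda act: q(V, s, act))   # greedy for MinProb
        new = q(V, s, a)
        residual = max(residual, abs(new - V[s]))
        V[s] = new                                    # asynchronous backup
        s = rng.choices(list(T[s][a]), weights=list(T[s][a].values()))[0]
    return residual

rng = random.Random(0)
V = {"s0": 0.0, "s1": 0.0, **TERMINAL}   # admissible lower bound
while trial(V, rng) > EPS:               # stop once values change < epsilon
    pass
```

After a few trials the residual along the greedy policy drops below \(\varepsilon \) and the loop stops with \(V(\texttt{s0})\) at the minimal reachability probability 0.5.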
B Proof for \(\textit{MaxProb}\)
Taking up our promise from Sect. 3.1, we first give an intuition why the presented combination of GLRTDP and FRET solves \(\textit{MaxProb}\) properties correctly on general MDP structures by converging to the optimal fixpoint, and not only on problems having at least one almost-sure policy as proven in [31]. Afterwards we sketch a more formal proof.
All greedy policies inspected by GLRTDP at some point end in a goal state or a dead-end state. The latter could be a real dead-end, i.e., a sink state with only a self-loop, or a permanent trap which has been transformed into a dead-end by the cycle elimination of FRET. If it is a permanent trap identified by FRET, the values of all states in it are set to 0. Otherwise, when the sink state is discovered for the first time, its value is also directly set to 0. This means we tag these states, do not explore them further, and propagate their value back through the graph. Cycling forever is not possible because FRET eventually eliminates all such cycles in greedy policies. Hence, at some point no more states are left to explore in the current GLRTDP trial, because all relevant traps have been eliminated or a goal or sink has been found. Then GLRTDP runs until the state values of the current greedy policies have converged up to \(\varepsilon \). Even if the greedy policy is not the same in every iteration, at some point it stays within a set of greedy states which are part of finitely many greedy policies. The values of these states will have converged close enough to the optimal ones that the algorithm concentrates on these optimal policies. The value function used in GLRTDP is initialized admissibly and can therefore only monotonically decrease, approaching the optimal fixpoint from above. When this point is reached (up to \(\varepsilon \)), the entire procedure (GLRTDP + FRET) terminates. This fixpoint must be the optimal one because the Bellman equation only admits a single fixpoint [7].
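The trap-detection step can be illustrated with a rough sketch. The graph, state names, and the bare-bones SCC check below are invented for illustration and deliberately ignore the probabilistic structure: a strongly connected component of the greedy-policy graph that contains no goal state and has no transition leaving it is a permanent trap, and all of its states get value 0. Tarjan's algorithm [45] finds the SCCs.

```python
def tarjan_sccs(graph):
    """Tarjan's algorithm: return the strongly connected components."""
    index, low, on_stack, stack, comps, counter = {}, {}, set(), [], [], [0]
    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            comps.append(comp)
    for v in list(graph):
        if v not in index:
            visit(v)
    return comps

# Invented greedy-policy graph: s1 and s2 form a cycle that cannot reach a goal.
graph = {"s0": ["s1"], "s1": ["s2"], "s2": ["s1"]}
goals, V = set(), {s: 0.5 for s in graph}

for comp in tarjan_sccs(graph):
    leaves = any(t not in comp for s in comp for t in graph.get(s, []))
    if not leaves and not (comp & goals):   # permanent trap found
        for s in comp:
            V[s] = 0.0                      # eliminate: its value is 0
```

Here the cycle \(\{s_1, s_2\}\) is detected as a permanent trap and its values are set to 0, while \(s_0\) keeps its current value and the 0 is later propagated back to it through the usual backups.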
To show convergence to the optimal value function from above in the case of an admissible initialization, we can argue along the invariant
\[ \forall k\colon\; V_k(s) \ge P^{\sigma_{opt}}_s(\diamond G) = V^*(s) \;\text{ for all } s, \]
stating that the value function in every iteration is always at least the value under the optimal policy. This means that an initially admissible value function always stays admissible. The invariant holds for the admissible initialization when \(k=0\), because then \(V_0(s) = 0\) if \(s \in \mathcal S_\bot \) and 1 otherwise. For all other iterations it holds that
\[ V_{k+1}(s) = \sum_{s'} P(s, a, s') \cdot V_k(s') \]
for some action \(a\), and we can derive that
\[ V_{k+1}(s) \ge \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot V_k(s') \ge \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot P^{\sigma_{opt}}_{s'}(\diamond G) = P^{\sigma_{opt}}_s(\diamond G). \]
The first inequality holds because \(a\) is the greedy (maximizing) action, the second inequality holds by the invariant for \(V_k\), and the final equality because \(\sigma_{opt}\) is memoryless and independent of \(s'\).
Now assume \(\sigma_{opt}\) is such that \(P^{\sigma_{opt}}_s(\diamond G)\) is maximal for all \(s\). Then for the action \(a = greedy(s, V_k)\) we have
\[ \sum_{s'} P(s, a, s') \cdot V_k(s') \ge \sum_{s'} P(s, b, s') \cdot V_k(s') \]
for any action \(b\), and in particular for \(b = \sigma_{opt}(s)\). Moreover \(V_k(s') \ge P^{\sigma_{opt}}_{s'}(\diamond G)\), and hence
\[ V_{k+1}(s) \ge \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot P^{\sigma_{opt}}_{s'}(\diamond G) = P^{\sigma_{opt}}_s(\diamond G) = V^*(s). \]
Claim: If \(V_k\) is a fixpoint for \(k \rightarrow \infty\), then \(P^\sigma(\diamond G) = V_\infty(s_0)\) for all \(\sigma\) greedy in \(V\). (6)
Since this means \(V^*(s_0) \ge V_\infty (s_0)\) and with the result from above (\(\forall k: V_k \ge V^*\)) we can conclude \(V^*(s_0) = V_\infty (s_0)\).
It remains to show that (6) holds: Let \(\sigma_k = greedy(V_k)\), i.e., a greedy policy with respect to the value function \(V_k\), and let \(S_k = \{s \mid P^{\sigma_k}_{s_0}(\diamond s) > 0\}\), i.e., the set of all states reachable under this greedy policy. Then \(\max(residual(S_k)) \le \delta_k\), and for \(k \rightarrow \infty\) it holds that \(\delta_k \rightarrow 0\).
To show that \(\delta _k\) will approach 0 it is enough to argue about the states which will be updated an infinite number of times, i. e., in the end, about the states on optimal policies. These are the states in \(S_\infty = \bigcap _{i \ge 0} \bigcup _{k \ge i} S_k\).
Let K be such that \(\forall k \ge K: \bigcup _{i \ge k} S_i = S_\infty \), i.e., a step from which on we only consider states that are visited infinitely often when running FRET-LRTDP infinitely long. Assume we are in a step \(j+1 \ge K\) and let \(s \in S_\infty \). We have to distinguish two cases:

- If s has not been updated, then \(V_{j+1}(s) = V_j(s)\).
- If s is the updated state, then \(V_{j+1}(s) = \max _{\alpha } \sum _{s'} P(s, \alpha , s') \cdot V_j(s')\).
This is the same update as in simple synchronous value iteration, for which convergence to the optimal fixpoint is proven. For our asynchronous case in GLRTDP we still have to guarantee fairness among the states in \(S_\infty \), i.e., we have to make sure that they are updated infinitely often. This is the case because each possible trial through \(S_\infty \) (there are finitely many trials) appears infinitely often, i.e., the states in this trial are updated infinitely often (by construction of GLRTDP when choosing the next greedy action). All other states not in \(S_\infty \) can be ignored: they influence neither the greedy policy nor the optimal values because they are already too large:
For any \(s \in S\setminus S_{\infty }\) it holds that \(V_{\infty }(s) = V_K(s) \ge V^*(s)\). For any \(s \in S_{\infty }\), by definition of \(S_{\infty }\) and K, we know that an action leading again to a state in \(S_{\infty }\) will be chosen, i.e., an \(a \in \sigma _{\infty }\) with \(V_{\infty }(s) \ge \sum _{s' \in S_{\infty }} P(s, \sigma _{\infty }, s') \cdot V_{\infty }(s')\). But in every state we choose the greedy action, and for any \(k \ge K\) it holds that \(V_k(s) \ge \sum _{s' \in S} P(s, a, s') \cdot V_k(s') \ge V_{\infty }(s)\); hence the action in \(\sigma _{\infty }\) must have been the greedy action, not leading to \(S\setminus S_{\infty }\).
This means that \(V_{\infty }\) defines an optimal strategy on \(S_{\infty }\) for \(s_0 \in S_{\infty }\), which is also an optimal strategy on S because no state \(s' \in S \setminus S_{\infty }\) is visited, even with \(V_{\infty }(s')>V^*(s')\).
In addition, the initial state lies in \(S_\infty \) by construction, i.e., \(P_{\max }(\diamond G) = V^{\sigma _{opt}}(s_0)\) reaches the fixpoint and is updated infinitely often.
Altogether, this shows that when running FRET-LRTDP over an infinite number of iterations, the value function for states in \(S_\infty \) approaches the optimal values of the maximal probability to reach the goal from above, never drops below the optimal value, and the difference between V and \(V^*\) strictly decreases for these states. In addition, we can at some point stop updating the value function for parts of the state space, because these values have no influence on the correct optimal result for the initial state. In our implementation, FRET-LRTDP is designed to stop when the values on the optimal policy change by less than \(\varepsilon \), which is the same convergence criterion as for simple value iteration.
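Analogously to the MinProb case, the convergence from above can be made concrete on a small sketch. The toy MDP below is invented for illustration: the maximizing Bellman operator, started from the admissible upper bound, produces a monotonically decreasing value sequence that never drops below the optimal values.

```python
# Hypothetical toy MDP: transitions[s][a] = {successor: probability}.
transitions = {
    "s0": {"a": {"s1": 1.0}},
    "s1": {"a": {"s1": 0.5, "d": 0.5},   # self-loop that leaks to the dead-end
           "b": {"g": 0.1, "d": 0.9}},
}
GOAL, SINK = "g", "d"

def bellman_max(V):
    """One synchronous Bellman step: V'(s) = max_a sum_s' P(s,a,s') * V(s')."""
    return {s: max(sum(p * V[t] for t, p in dist.items())
                   for dist in acts.values())
            for s, acts in transitions.items()}

# Admissible initialization from above: V_0 = 0 on recognized sinks, 1 elsewhere.
V = {"s0": 1.0, "s1": 1.0, GOAL: 1.0, SINK: 0.0}
for _ in range(100):
    V.update(bellman_max(V))   # goal and sink values stay fixed
print(V["s0"])  # -> 0.1: looping in s1 is not worth it, action "b" is optimal
```

Here \(V_k(\texttt{s1})\) decreases as 1, 0.5, 0.25, 0.125, 0.1, 0.1, ...: the overestimate caused by the self-loop is driven down monotonically until the maximizing choice settles on the action reaching the goal, mirroring the convergence from above argued for FRET-LRTDP.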
© 2021 Springer Nature Switzerland AG
Klauck, M., Hermanns, H. (2021). A Modest Approach to Dynamic Heuristic Search in Probabilistic Model Checking. In: Abate, A., Marin, A. (eds) Quantitative Evaluation of Systems. QEST 2021. Lecture Notes in Computer Science, vol. 12846. Springer, Cham. https://doi.org/10.1007/978-3-030-85172-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85171-2
Online ISBN: 978-3-030-85172-9