Abstract
This paper presents Modysh, a probabilistic model checker which harvests and extends non-exhaustive exploration methods originally developed in the context of AI planning. Its core functionality is based on enhancements of the heuristic search methods known as real-time dynamic programming and find-revise-eliminate-traps, and it efficiently handles maximal and minimal reachability properties, expected reward properties, as well as bounded properties on general MDPs. Modysh is integrated into the infrastructure of the Modest Toolset and extends the property types supported by it. We discuss the algorithmic particularities in detail and evaluate the competitiveness of Modysh in comparison to state-of-the-art model checkers in a large case study rooted in the well-established Quantitative Verification Benchmark Set. This study demonstrates that Modysh is especially attractive for very large benchmark instances that are not solvable by any other tool.
This work has received support by the ERC Advanced Investigators Grant 695614 POWVER, by the DFG Grant 389792660 as part of TRR 248 CPEC, and by the Key-Area Research and Development Grant 2018B010107004 of Guangdong Province.
References
Aljazzar, H., Leue, S.: Generation of counterexamples for model checking of Markov decision processes. In: QEST 2009, Sixth International Conference on the Quantitative Evaluation of Systems, Budapest, Hungary, 13-16 September 2009, pp. 197–206. IEEE Computer Society (2009). https://doi.org/10.1109/QEST.2009.10
Ashok, P., Brázdil, T., Kretínský, J., Slámecka, O.: Monte Carlo tree search for verifying reachability in Markov decision processes. In: Margaria, T., Steffen, B. (eds.) Leveraging Applications of Formal Methods, Verification and Validation. Verification - 8th International Symposium, ISoLA 2018, Limassol, Cyprus, November 5-9, 2018, Proceedings, Part II. Lecture Notes in Computer Science, vol. 11245, pp. 322–335. Springer (2018). https://doi.org/10.1007/978-3-030-03421-4_21
Ashok, P., Kretínský, J., Weininger, M.: PAC statistical model checking for Markov decision processes and stochastic games. In: Dillig and Tasiran [15], pp. 497–519. https://doi.org/10.1007/978-3-030-25540-4_29
Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to act using real-time dynamic programming. Artif. Intell. 72(1-2), 81–138 (1995). https://doi.org/10.1016/0004-3702(94)00011-O
Bellman, R.: A Markovian decision process. Journal of Mathematics and Mechanics 6(5), 679–684 (1957)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, Vol. 1. Athena Scientific (1995)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, Vol. 2. Athena Scientific (1995)
Bonet, B., Geffner, H.: Labeled RTDP: improving the convergence of real-time dynamic programming. In: Giunchiglia, E., Muscettola, N., Nau, D.S. (eds.) Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling (ICAPS 2003), June 9-13, 2003, Trento, Italy, pp. 12–21. AAAI (2003). http://www.aaai.org/Library/ICAPS/2003/icaps03-002.php
Bonet, B., Geffner, H.: Learning depth-first search: A unified approach to heuristic search in deterministic and nondeterministic settings, and its application to MDPs. In: Long, D., Smith, S.F., Borrajo, D., McCluskey, L. (eds.) Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling, ICAPS 2006, Cumbria, UK, June 6-10, 2006, pp. 142–151. AAAI (2006). http://www.aaai.org/Library/ICAPS/2006/icaps06-015.php
Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J. (eds.) Automated Technology for Verification and Analysis - 12th International Symposium, ATVA 2014, Sydney, NSW, Australia, November 3-7, 2014, Proceedings. Lecture Notes in Computer Science, vol. 8837, pp. 98–114. Springer (2014). https://doi.org/10.1007/978-3-319-11936-6_8
Budde, C.E., D’Argenio, P.R., Hartmanns, A., Sedwards, S.: An efficient statistical model checker for nondeterminism and rare events. Int. J. Softw. Tools Technol. Transf. 22(6), 759–780 (2020). https://doi.org/10.1007/s10009-020-00563-2
Budde, C.E., Dehnert, C., Hahn, E.M., Hartmanns, A., Junges, S., Turrini, A.: JANI: Quantitative model and tool interaction. In: Legay, A., Margaria, T. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 23rd International Conference, TACAS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10206, pp. 151–168 (2017). https://doi.org/10.1007/978-3-662-54580-5_9
Budde, C.E., Hartmanns, A., Klauck, M., Kretinsky, J., Parker, D., Quatmann, T., Turrini, A., Zhang, Z.: On Correctness, Precision, and Performance in Quantitative Verification (QComp 2020 Competition Report). In: Proceedings of the 9th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation. Software Verification Tools (2020). https://doi.org/10.1007/978-3-030-83723-5_15
Dehnert, C., Junges, S., Katoen, J., Volk, M.: A storm is coming: A modern probabilistic model checker. In: Majumdar, R., Kuncak, V. (eds.) Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10427, pp. 592–600. Springer (2017). https://doi.org/10.1007/978-3-319-63390-9_31
Dillig, I., Tasiran, S. (eds.): Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I. Lecture Notes in Computer Science, vol. 11561. Springer (2019). https://doi.org/10.1007/978-3-030-25540-4
Hahn, E.M., Hartmanns, A.: A comparison of time- and reward-bounded probabilistic model checking techniques. In: Fränzle, M., Kapur, D., Zhan, N. (eds.) Dependable Software Engineering: Theories, Tools, and Applications - Second International Symposium, SETTA 2016, Beijing, China, November 9-11, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9984, pp. 85–100 (2016). https://doi.org/10.1007/978-3-319-47677-3_6
Hahn, E.M., Hartmanns, A.: Efficient algorithms for time- and cost-bounded probabilistic model checking. CoRR abs/1605.05551 (2016). http://arxiv.org/abs/1605.05551
Hahn, E.M., Hartmanns, A., Hensel, C., Klauck, M., Klein, J., Kretínský, J., Parker, D., Quatmann, T., Ruijters, E., Steinmetz, M.: The 2019 comparison of tools for the analysis of quantitative formal models (QComp 2019 competition report). In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 25 Years of TACAS: TOOLympics, Held as Part of ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, Part III. Lecture Notes in Computer Science, vol. 11429, pp. 69–92. Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_5
Hahn, E.M., Hartmanns, A., Hermanns, H.: Reachability and reward checking for stochastic timed automata. Electron. Commun. Eur. Assoc. Softw. Sci. Technol. 70 (2014). https://doi.org/10.14279/tuj.eceasst.70.968
Hahn, E.M., Li, Y., Schewe, S., Turrini, A., Zhang, L.: iscasMc: A web-based probabilistic model checker. In: Jones, C.B., Pihlajasaari, P., Sun, J. (eds.) FM 2014: Formal Methods - 19th International Symposium, Singapore, May 12-16, 2014, Proceedings. Lecture Notes in Computer Science, vol. 8442, pp. 312–317. Springer (2014). https://doi.org/10.1007/978-3-319-06410-9_22
Hansen, E.A., Zilberstein, S.: LAO\(^*\): A heuristic search algorithm that finds solutions with loops. Artif. Intell. 129(1-2), 35–62 (2001). https://doi.org/10.1016/S0004-3702(01)00106-0
Hartmanns, A., Hermanns, H.: The Modest Toolset: An integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014, Proceedings. Lecture Notes in Computer Science, vol. 8413, pp. 593–598. Springer (2014). https://doi.org/10.1007/978-3-642-54862-8_51
Hartmanns, A., Hermanns, H.: Explicit model checking of very large MDP using partitioning and secondary storage. In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) Automated Technology for Verification and Analysis - 13th International Symposium, ATVA 2015, Shanghai, China, October 12-15, 2015, Proceedings. Lecture Notes in Computer Science, vol. 9364, pp. 131–147. Springer (2015). https://doi.org/10.1007/978-3-319-24953-7_10
Hartmanns, A., Klauck, M., Parker, D., Quatmann, T., Ruijters, E.: The Quantitative Verification Benchmark Set. In: Vojnar, T., Zhang, L. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 25th International Conference, TACAS 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, Part I. Lecture Notes in Computer Science, vol. 11427, pp. 344–350. Springer (2019). https://doi.org/10.1007/978-3-030-17462-0_20
Hatefi-Ardakani, H.: Finite horizon analysis of Markov automata. Ph.D. thesis, Saarland University, Germany (2017). http://scidok.sulb.uni-saarland.de/volltexte/2017/6743/
Helmert, M.: The fast downward planning system. CoRR abs/1109.6051 (2011), http://arxiv.org/abs/1109.6051
Izadi, M.T.: Sequential decision making under uncertainty. In: Zucker, J., Saitta, L. (eds.) Abstraction, Reformulation and Approximation, 6th International Symposium, SARA 2005, Airth Castle, Scotland, UK, July 26-29, 2005, Proceedings. Lecture Notes in Computer Science, vol. 3607, pp. 360–361. Springer (2005). https://doi.org/10.1007/11527862_33
The JANI specification. http://www.jani-spec.org/, accessed on 25/06/2021
Klauck, M., Hermanns, H.: Artifact accompanying the paper "A Modest Approach to Dynamic Heuristic Search in Probabilistic Model Checking" (2021), available at http://doi.org/10.5281/zenodo.4922360
Kolobov, A.: Scalable methods and expressive models for planning under uncertainty. Ph.D. thesis, University of Washington (2013)
Kolobov, A., Mausam, Weld, D.S., Geffner, H.: Heuristic search for generalized stochastic shortest path MDPs. In: Bacchus, F., Domshlak, C., Edelkamp, S., Helmert, M. (eds.) Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany, June 11-16, 2011. AAAI (2011). http://aaai.org/ocs/index.php/ICAPS/ICAPS11/paper/view/2682
Kretínský, J., Meggendorfer, T.: Efficient strategy iteration for mean payoff in Markov decision processes. In: D’Souza, D., Kumar, K.N. (eds.) Automated Technology for Verification and Analysis - 15th International Symposium, ATVA 2017, Pune, India, October 3-6, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10482, pp. 380–399. Springer (2017). https://doi.org/10.1007/978-3-319-68167-2_25
Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011, Proceedings. Lecture Notes in Computer Science, vol. 6806, pp. 585–591. Springer (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Neupane, T., Myers, C.J., Madsen, C., Zheng, H., Zhang, Z.: STAMINA: Stochastic approximate model checker for infinite-state analysis. In: Dillig and Tasiran [15], pp. 540–549. https://doi.org/10.1007/978-3-030-25540-4_31
Ruijters, E., Reijsbergen, D., de Boer, P., Stoelinga, M.: Rare event simulation for dynamic fault trees. Reliab. Eng. Syst. Saf. 186, 220–231 (2019). https://doi.org/10.1016/j.ress.2019.02.004
Steinmetz, M., Hoffmann, J., Buffet, O.: Goal probability analysis in probabilistic planning: Exploring and enhancing the state of the art. J. Artif. Intell. Res. 57, 229–271 (2016). https://doi.org/10.1613/jair.5153
Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972). https://doi.org/10.1137/0201010
Appendices
A Proof for \(\textit{MinProb}\)
As announced in Sect. 3.1, this appendix provides a proof that GLRTDP solves \(\textit{MinProb}\) properties on general MDP structures correctly by converging to the optimal fixpoint.
To show convergence to the optimal value function from below in the case of an admissible initialization, we can argue along the invariant
\[ \forall k\colon\; V_k(s) \le P^{\sigma_{opt}}_s(\diamond G) = V^*(s) \;\text{ for all } s, \]
stating that the value function in every iteration is always at most the value under the optimal policy. This means that an initially admissible value function always stays admissible. The invariant holds for the admissible initialization when \(k=0\), because then \(V_0(s) = 1\) if \(s \in G\) and 0 otherwise. For all other iterations it holds that
\[ V_{k+1}(s) = \sum_{s'} P(s, a, s') \cdot V_k(s') \]
for some action \(a\), and we can derive that
\[ V_{k+1}(s) \le \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot V_k(s') \le \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot P^{\sigma_{opt}}_{s'}(\diamond G) = P^{\sigma_{opt}}_s(\diamond G). \]
The first inequality holds because \(a\) is the greedy (minimizing) action, the second inequality holds by the invariant for \(V_k\), and the final equality because \(\sigma_{opt}\) is memoryless and independent of \(s'\).
Now assume \(\sigma_{opt}\) is such that \(P^{\sigma_{opt}}_s(\diamond G)\) is minimal for all \(s\). Then for the action \(a = greedy(s, V_k)\) we have
\[ \sum_{s'} P(s, a, s') \cdot V_k(s') \le \sum_{s'} P(s, b, s') \cdot V_k(s') \]
for any action \(b\), and in particular for \(b = \sigma_{opt}(s)\). Moreover \(V_k(s') \le P^{\sigma_{opt}}_{s'}(\diamond G)\), which allows us to derive
\[ V_{k+1}(s) \le \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot P^{\sigma_{opt}}_{s'}(\diamond G) = P^{\sigma_{opt}}_s(\diamond G) = V^*(s). \]
Claim: If \(V_k\) is a fixpoint for \(k \rightarrow \infty\), then \(P^\sigma(\diamond G) = V_\infty(s_0)\) for all \(\sigma\) greedy in \(V\). (5)
Since this means \(V^*(s_0) \le V_\infty (s_0)\) and with the result from above (\(\forall k: V_k \le V^*\)) we can conclude \(V^*(s_0) = V_\infty (s_0)\).
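The derivation above can be made concrete with a small sketch. The MDP, its state names, and its probabilities below are invented purely for illustration (they are not from the paper): the minimizing Bellman operator, started from the admissible lower bound \(V_0\), never exceeds the optimal values and converges to them from below.

```python
# Hypothetical toy MDP: transitions[s][a] = {successor: probability}.
transitions = {
    "s0": {"a": {"s1": 1.0}, "b": {"g": 0.5, "d": 0.5}},
    "s1": {"a": {"g": 1.0}},
}
GOAL, SINK = "g", "d"

def bellman_min(V):
    """One synchronous Bellman step: V'(s) = min_a sum_s' P(s,a,s') * V(s')."""
    return {s: min(sum(p * V[t] for t, p in dist.items())
                   for dist in acts.values())
            for s, acts in transitions.items()}

# Admissible initialization from below: V_0(s) = 1 iff s is a goal state.
V = {"s0": 0.0, "s1": 0.0, GOAL: 1.0, SINK: 0.0}
for _ in range(50):
    V.update(bellman_min(V))   # goal and sink values stay fixed
print(V["s0"])  # -> 0.5: the minimizing policy takes action "b" in s0
```

Along the way, \(V_k(\texttt{s0})\) takes the values 0, 0, 0.5, 0.5, ...: it increases monotonically and never exceeds \(V^*(\texttt{s0}) = 0.5\), matching the invariant.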
It remains to show that (5) holds: Let \(\sigma_k = greedy(V_k)\), i.e., a greedy policy with respect to the value function \(V_k\), and let \(S_k = \{s \mid P^{\sigma_k}_{s_0}(\diamond s) > 0\}\), i.e., the set of all states reachable under this greedy policy. Then \(\max(residual(S_k)) \le \delta_k\), and for \(k \rightarrow \infty\) it holds that \(\delta_k \rightarrow 0\).
To show that \(\delta _k\) will approach 0 it is enough to argue about the states which will be updated an infinite number of times, i. e., in the end, about the states on optimal policies. These are the states in \(S_\infty = \bigcap _{i \ge 0} \bigcup _{k \ge i} S_k\).
Let K be such that \(\forall k \ge K: \bigcup _{i \ge k} S_i = S_\infty \), i.e., a step from which on we only consider states that are visited infinitely often when running GLRTDP infinitely long. Assume we are in a step \(j+1 \ge K\) and let \(s \in S_\infty \). We have to distinguish two cases:

- If s has not been updated, then \(V_{j+1}(s) = V_j(s)\).
- If s is the updated state, then \(V_{j+1}(s) = \min _{\alpha } \sum _{s'} P(s, \alpha , s') \cdot V_j(s')\).
But this is the same update as in simple synchronous value iteration, for which convergence to the optimal fixpoint is proven. For our asynchronous case in GLRTDP we nevertheless have to guarantee fairness among the states in \(S_\infty \), i.e., we have to make sure that they are updated infinitely often. This is the case because each possible trial through \(S_\infty \) (there are finitely many trials) appears infinitely often, i.e., the states in this trial are updated infinitely often (by construction of GLRTDP when choosing the next greedy action). All other states not in \(S_\infty \) can be ignored: they influence neither the greedy policy nor the optimal values because they are already too large:
For any \(s \in S\setminus S_{\infty }\) it holds that \(V_{\infty }(s) = V_K(s) \le V^*(s)\). For any \(s \in S_{\infty }\), by definition of \(S_{\infty }\) and K, we know that an action leading again to a state in \(S_{\infty }\) will be chosen, i.e., an \(a \in \sigma _{\infty }\) with \(V_{\infty }(s) \le \sum _{s' \in S_{\infty }} P(s, \sigma _{\infty }, s') \cdot V_{\infty }(s')\). But in every state we choose the greedy action, and for any \(k \ge K\) it holds that \(V_k(s) \le \sum _{s' \in S} P(s, a, s') \cdot V_k(s') \le V_{\infty }(s)\); hence the action in \(\sigma _{\infty }\) must have been the greedy action, not leading to \(S\setminus S_{\infty }\). This means that \(V_{\infty }\) defines an optimal strategy on \(S_{\infty }\) for \(s_0 \in S_{\infty }\), which is also an optimal strategy on S because no state \(s' \in S \setminus S_{\infty }\) is visited, even with \(V_{\infty }(s')<V^*(s')\). In addition, the initial state lies in \(S_\infty \) by construction, i.e., \(P_{\min }(\diamond G) = V^{\sigma _{opt}}(s_0)\) reaches the fixpoint and is updated infinitely often.
In summary, when running GLRTDP for an infinite number of iterations, the value function for states in \(S_\infty \) approaches the optimal values of the minimal probability to reach the goal from below, never exceeds the optimal value, and the difference between V and \(V^*\) strictly decreases for these states. In addition, we can at some point stop updating the value function for parts of the state space, because these values have no influence on the correct optimal result for the initial state. In our implementation, GLRTDP is designed to stop when the values on the optimal policy change by less than \(\varepsilon \), which is the same convergence criterion as for simple value iteration.
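The trial-based, asynchronous flavor of the updates and the \(\varepsilon \)-stopping criterion can be sketched roughly as follows. This is a strong simplification, not the tool's implementation: the toy MDP is invented, and the sketch silently assumes the greedy policy is acyclic, whereas the real GLRTDP additionally needs labeling and trap handling to guarantee termination.

```python
import random

# Invented toy MDP: T[s][a] = {successor: probability}.
T = {"s0": {"a": {"s1": 1.0}, "b": {"g": 0.5, "d": 0.5}},
     "s1": {"a": {"g": 1.0}}}
TERMINAL = {"g": 1.0, "d": 0.0}   # goal has value 1, dead-end has value 0
EPS = 1e-6

def q(V, s, a):
    """Expected value of taking action a in s under the current V."""
    return sum(p * V[t] for t, p in T[s][a].items())

def trial(V, rng):
    """One trial: follow greedy actions from s0, backing up each visited
    state asynchronously; returns the largest residual seen on the trial."""
    s, residual = "s0", 0.0
    while s not in TERMINAL:
        a = min(T[s], key=lambda act: q(V, s, act))   # greedy for MinProb
        new = q(V, s, a)
        residual = max(residual, abs(new - V[s]))
        V[s] = new                                    # asynchronous backup
        s = rng.choices(list(T[s][a]), weights=list(T[s][a].values()))[0]
    return residual

rng = random.Random(0)
V = {"s0": 0.0, "s1": 0.0, **TERMINAL}   # admissible lower bound
while trial(V, rng) > EPS:               # stop once values change < epsilon
    pass
```

After a few trials the residual along the greedy policy drops below \(\varepsilon \) and the loop stops with \(V(\texttt{s0})\) at the minimal reachability probability 0.5.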
B Proof for \(\textit{MaxProb}\)
Taking up our promise from Sect. 3.1, we first give an intuition why the presented combination of GLRTDP and FRET solves \(\textit{MaxProb}\) properties correctly on general MDP structures by converging to the optimal fixpoint, and not only on problems having at least one almost-sure policy as proven in [31]. Afterwards we sketch a more formal proof.
All greedy policies inspected by GLRTDP at some point end in a goal state or a dead-end state. The latter could be a real dead-end, i.e., a sink state with only a self-loop, or a permanent trap which has been transformed into a dead-end by the cycle elimination of FRET. If it is a permanent trap identified by FRET, the values of all states in it are set to 0. Otherwise, when the sink state is discovered for the first time, its value is also directly set to 0. This means we tag these states, do not explore them further, and propagate their value back through the graph. Cycling forever is not possible because FRET eventually eliminates all such cycles in greedy policies. Hence, at some point no more states are left to explore in the current GLRTDP trial, because all relevant traps have been eliminated or a goal or sink has been found. Then GLRTDP runs until the state values of the current greedy policies have converged up to \(\varepsilon \). Even if the greedy policy is not the same in every iteration, at some point it stays within a set of greedy states which are part of finitely many greedy policies. The values of these states will have converged close enough to the optimal ones that the algorithm concentrates on these optimal policies. The value function used in GLRTDP is initialized admissibly and can therefore only monotonically decrease, approaching the optimal fixpoint from above. When this point is reached (up to \(\varepsilon \)), the entire procedure (GLRTDP + FRET) terminates. This fixpoint must be the optimal one because the Bellman equation only admits a single fixpoint [7].
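The trap-detection step can be illustrated with a rough sketch. The graph, state names, and the bare-bones SCC check below are invented for illustration and deliberately ignore the probabilistic structure: a strongly connected component of the greedy-policy graph that contains no goal state and has no transition leaving it is a permanent trap, and all of its states get value 0. Tarjan's algorithm [45] finds the SCCs.

```python
def tarjan_sccs(graph):
    """Tarjan's algorithm: return the strongly connected components."""
    index, low, on_stack, stack, comps, counter = {}, {}, set(), [], [], [0]
    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            comps.append(comp)
    for v in list(graph):
        if v not in index:
            visit(v)
    return comps

# Invented greedy-policy graph: s1 and s2 form a cycle that cannot reach a goal.
graph = {"s0": ["s1"], "s1": ["s2"], "s2": ["s1"]}
goals, V = set(), {s: 0.5 for s in graph}

for comp in tarjan_sccs(graph):
    leaves = any(t not in comp for s in comp for t in graph.get(s, []))
    if not leaves and not (comp & goals):   # permanent trap found
        for s in comp:
            V[s] = 0.0                      # eliminate: its value is 0
```

Here the cycle \(\{s_1, s_2\}\) is detected as a permanent trap and its values are set to 0, while \(s_0\) keeps its current value and the 0 is later propagated back to it through the usual backups.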
To show convergence to the optimal value function from above in the case of an admissible initialization, we can argue along the invariant
\[ \forall k\colon\; V_k(s) \ge P^{\sigma_{opt}}_s(\diamond G) = V^*(s) \;\text{ for all } s, \]
stating that the value function in every iteration is always at least the value under the optimal policy. This means that an initially admissible value function always stays admissible. The invariant holds for the admissible initialization when \(k=0\), because then \(V_0(s) = 0\) if \(s \in \mathcal S_\bot \) and 1 otherwise. For all other iterations it holds that
\[ V_{k+1}(s) = \sum_{s'} P(s, a, s') \cdot V_k(s') \]
for some action \(a\), and we can derive that
\[ V_{k+1}(s) \ge \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot V_k(s') \ge \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot P^{\sigma_{opt}}_{s'}(\diamond G) = P^{\sigma_{opt}}_s(\diamond G). \]
The first inequality holds because \(a\) is the greedy (maximizing) action, the second inequality holds by the invariant for \(V_k\), and the final equality because \(\sigma_{opt}\) is memoryless and independent of \(s'\).
Now assume \(\sigma_{opt}\) is such that \(P^{\sigma_{opt}}_s(\diamond G)\) is maximal for all \(s\). Then for the action \(a = greedy(s, V_k)\) we have
\[ \sum_{s'} P(s, a, s') \cdot V_k(s') \ge \sum_{s'} P(s, b, s') \cdot V_k(s') \]
for any action \(b\), and in particular for \(b = \sigma_{opt}(s)\). Moreover \(V_k(s') \ge P^{\sigma_{opt}}_{s'}(\diamond G)\), and hence
\[ V_{k+1}(s) \ge \sum_{s'} P(s, \sigma_{opt}(s), s') \cdot P^{\sigma_{opt}}_{s'}(\diamond G) = P^{\sigma_{opt}}_s(\diamond G) = V^*(s). \]
Claim: If \(V_k\) is a fixpoint for \(k \rightarrow \infty\), then \(P^\sigma(\diamond G) = V_\infty(s_0)\) for all \(\sigma\) greedy in \(V\). (6)
Since this means \(V^*(s_0) \ge V_\infty (s_0)\) and with the result from above (\(\forall k: V_k \ge V^*\)) we can conclude \(V^*(s_0) = V_\infty (s_0)\).
It remains to show that (6) holds: Let \(\sigma_k = greedy(V_k)\), i.e., a greedy policy with respect to the value function \(V_k\), and let \(S_k = \{s \mid P^{\sigma_k}_{s_0}(\diamond s) > 0\}\), i.e., the set of all states reachable under this greedy policy. Then \(\max(residual(S_k)) \le \delta_k\), and for \(k \rightarrow \infty\) it holds that \(\delta_k \rightarrow 0\).
To show that \(\delta _k\) will approach 0 it is enough to argue about the states which will be updated an infinite number of times, i. e., in the end, about the states on optimal policies. These are the states in \(S_\infty = \bigcap _{i \ge 0} \bigcup _{k \ge i} S_k\).
Let K be such that \(\forall k \ge K: \bigcup _{i \ge k} S_i = S_\infty \), i.e., a step from which on we only consider states that are visited infinitely often when running FRET-LRTDP infinitely long. Assume we are in a step \(j+1 \ge K\) and let \(s \in S_\infty \). We have to distinguish two cases:

- If s has not been updated, then \(V_{j+1}(s) = V_j(s)\).
- If s is the updated state, then \(V_{j+1}(s) = \max _{\alpha } \sum _{s'} P(s, \alpha , s') \cdot V_j(s')\).
This is the same update as in simple synchronous value iteration, for which convergence to the optimal fixpoint is proven. For our asynchronous case in GLRTDP we still have to guarantee fairness among the states in \(S_\infty \), i.e., we have to make sure that they are updated infinitely often. This is the case because each possible trial through \(S_\infty \) (there are finitely many trials) appears infinitely often, i.e., the states in this trial are updated infinitely often (by construction of GLRTDP when choosing the next greedy action). All other states not in \(S_\infty \) can be ignored: they influence neither the greedy policy nor the optimal values because they are already too large:
For any \(s \in S\setminus S_{\infty }\) it holds that \(V_{\infty }(s) = V_K(s) \ge V^*(s)\). For any \(s \in S_{\infty }\), by definition of \(S_{\infty }\) and K, we know that an action leading again to a state in \(S_{\infty }\) will be chosen, i.e., an \(a \in \sigma _{\infty }\) with \(V_{\infty }(s) \ge \sum _{s' \in S_{\infty }} P(s, \sigma _{\infty }, s') \cdot V_{\infty }(s')\). But in every state we choose the greedy action, and for any \(k \ge K\) it holds that \(V_k(s) \ge \sum _{s' \in S} P(s, a, s') \cdot V_k(s') \ge V_{\infty }(s)\); hence the action in \(\sigma _{\infty }\) must have been the greedy action, not leading to \(S\setminus S_{\infty }\).
This means that \(V_{\infty }\) defines an optimal strategy on \(S_{\infty }\) for \(s_0 \in S_{\infty }\), which is also an optimal strategy on S because no state \(s' \in S \setminus S_{\infty }\) is visited, even with \(V_{\infty }(s')>V^*(s')\).
In addition, the initial state lies in \(S_\infty \) by construction, i.e., \(P_{\max }(\diamond G) = V^{\sigma _{opt}}(s_0)\) reaches the fixpoint and is updated infinitely often.
Altogether, this shows that when running FRET-LRTDP over an infinite number of iterations, the value function for states in \(S_\infty \) approaches the optimal values of the maximal probability to reach the goal from above, never drops below the optimal value, and the difference between V and \(V^*\) strictly decreases for these states. In addition, we can at some point stop updating the value function for parts of the state space, because these values have no influence on the correct optimal result for the initial state. In our implementation, FRET-LRTDP is designed to stop when the values on the optimal policy change by less than \(\varepsilon \), which is the same convergence criterion as for simple value iteration.
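Analogously to the MinProb case, the convergence from above can be made concrete on a small sketch. The toy MDP below is invented for illustration: the maximizing Bellman operator, started from the admissible upper bound, produces a monotonically decreasing value sequence that never drops below the optimal values.

```python
# Hypothetical toy MDP: transitions[s][a] = {successor: probability}.
transitions = {
    "s0": {"a": {"s1": 1.0}},
    "s1": {"a": {"s1": 0.5, "d": 0.5},   # self-loop that leaks to the dead-end
           "b": {"g": 0.1, "d": 0.9}},
}
GOAL, SINK = "g", "d"

def bellman_max(V):
    """One synchronous Bellman step: V'(s) = max_a sum_s' P(s,a,s') * V(s')."""
    return {s: max(sum(p * V[t] for t, p in dist.items())
                   for dist in acts.values())
            for s, acts in transitions.items()}

# Admissible initialization from above: V_0 = 0 on recognized sinks, 1 elsewhere.
V = {"s0": 1.0, "s1": 1.0, GOAL: 1.0, SINK: 0.0}
for _ in range(100):
    V.update(bellman_max(V))   # goal and sink values stay fixed
print(V["s0"])  # -> 0.1: looping in s1 is not worth it, action "b" is optimal
```

Here \(V_k(\texttt{s1})\) decreases as 1, 0.5, 0.25, 0.125, 0.1, 0.1, ...: the overestimate caused by the self-loop is driven down monotonically until the maximizing choice settles on the action reaching the goal, mirroring the convergence from above argued for FRET-LRTDP.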
© 2021 Springer Nature Switzerland AG
Klauck, M., Hermanns, H. (2021). A Modest Approach to Dynamic Heuristic Search in Probabilistic Model Checking. In: Abate, A., Marin, A. (eds) Quantitative Evaluation of Systems. QEST 2021. Lecture Notes in Computer Science, vol. 12846. Springer, Cham. https://doi.org/10.1007/978-3-030-85172-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85171-2
Online ISBN: 978-3-030-85172-9