1 Introduction

Model checking [12] is a verification technique that determines the validity of properties specified as temporal logic formulae on formal models of systems ranging from hardware circuits [6, 13] and concurrent programs [21] to cyber-physical systems [15, 45]. The model’s semantics is traditionally some form of transition system [3]. Extended model checking approaches deal with, for example, real-time systems using a timed automata semantics [1, 7], or probabilistic systems [2] using Markov chains or Markov decision processes (MDP) [5, 47]. Given the often safety- or mission-critical nature of the systems being model-checked, the correctness of the model checker is of utmost importance.

As of today, however, few model checkers themselves are formally verified, and none of those is widely used. The Cava LTL model checker [8, 18], for example, is fully verified, from algorithmic correctness all the way down to a correct implementation. Yet, for the same purpose, Spin [31] remains the tool of choice for practitioners despite being unverified. This is because Cava supports only a fragment of the Promela input language [44], and is much slower due to its purely functional-programming implementation, while Spin’s algorithms and code have been highly optimised. Similarly, the fully verified Munta model checker for timed automata [55] is significantly slower than the de-facto standard tool UPPAAL [4], despite Munta’s refinement resulting in Standard ML code that uses imperative elements such as arrays to obtain better performance.

While initiatives like Cava and Munta constitute major achievements in interactive theorem proving (ITP) research, they have not managed to bring the benefits of ITP into verification practice. Their approach towards the goal of a fully-verified model checker is top-down: Create a new tool from scratch, necessarily starting (and ultimately remaining) with a limited scope that prevents practical adoption. In addition, they are limited by the technology available in their time for refining abstract algorithms into executable code.

We instead propose a bottom-up approach: Starting from an existing model checker that is competitive and has an established user base, replace its unverified code by provably correct implementations component-by-component. In this way, the tool is not immediately fully verified, but the trusted code base is reduced step-by-step. Crucially, by exploiting recent advances in refinement technology [39, 41] that deliver highly-efficient LLVM bytecode, our verified replacement components perform similarly to the unverified originals implemented in e.g. C or C#. The incremental approach is thus “invisible” to the users, leading to an immediate adoption of the benefits of ITP in verification practice.


A MEC is a subset of the states of an MDP for which a strategy exists that remains within the MEC with probability 1. In an MDP with nontrivial MECs, the Bellman operator used in sound numeric algorithms for probabilistic model checking (PMC) for indefinite-horizon properties [22, 26, 48] has multiple fixed points, leading to divergence [22] and/or breaking the algorithm’s correctness proof [26]. Eliminating or later deflating [17] the MECs of an MDP is thus a necessary step in PMC. To the best of our knowledge, ours is the first mechanical formalisation and correctness proof of MEC decomposition. We use the Isabelle Refinement Framework [42] to refine our algorithm down to LLVM code which we integrate into an existing model checker. We target mcsta of the Modest Toolset [25]. Its performance is competitive [9], and it has been used for various case studies by different teams of researchers [24, 50, 53]. Our verification and refinement of MEC decomposition constitutes a critical step on the long-term bottom-up path towards a fully-verified probabilistic model checker, laying the foundation for verifying the actual numeric algorithm as the next step. MEC decomposition is also used in probabilistic planning [57] as part of the FRET approach [33, 51], and can be generalised from MDP to stochastic games, where it is equally necessary for sound algorithms [17]. Our work can thus be transferred to tools in these areas.

Our MEC decomposition algorithm, or MEC algorithm for short, follows the standard approach [3, Algorithm 47]: (i) find all strongly connected components (SCCs) of the MDP’s graph, (ii) identify all bottom SCCs as MECs and remove them, (iii) delete all transitions with nonzero probability to leave an SCC, and (iv) repeat until no more states remain. After defining MDPs and MECs in Sect. 2, we present the algorithm, our formalisation in Isabelle/HOL, and our correctness proof in Sect. 3. We introduce the efficient data structures for the implementation in Sect. 4. We had earlier verified Gabow’s SCC-finding algorithm and refined it into efficient LLVM code for mcsta [28]. We were able to integrate the SCC algorithm’s high-level correctness proof into our MEC algorithm formalisation with minor technical adaptations. However, the SCC algorithm could assume the graph to be static, whereas the MEC algorithm iteratively changes the MDP graph. We thus need a new data structure that allows deleting states and transitions, which we describe together with the corresponding refinement proofs in Sect. 4. There, we also extend the proofs and refinement relations for the SCC-finding aspect to accommodate the extended data structure. In Sect. 5, we describe the LLVM code generation and integration into mcsta. By adopting mcsta’s existing MDP representation, we minimise costly glue code and transformations or copies of the data. This is important for the scalability and performance of our end result, which we experimentally show in Sect. 6.

Related Work. Certification is an alternative to verification: A formally verified certifier checks the results of an unverified tool. This requires a practical certification mechanism and the support of the unverified model checker. Formally verified certification tools that work on significant problem sizes exist for e.g. timed automata model checking [54, 56] and SAT solving [29, 40].

Probabilistic models have been the subject of ITP work before. Notably, there are some formalisations of MDPs and the value iteration algorithm in Isabelle/HOL [30] and Coq [52], but executable code does not appear to have been extracted from these proofs. Additionally, there is a formalisation of value iteration for discounted expected rewards [43] which extracts Standard ML code from the proof. We note that MEC decomposition is not necessary in the discounted case, thus [43] and the many current works in machine learning/artificial intelligence based on reinforcement learning typically avoid the problem.

The standard MEC decomposition approach computes SCCs. SCC-finding algorithms have been formalised with various tools, including Isabelle/HOL [38], Coq [46], and Why3 [11]. Of these, only [38] extracted executable code, which however performed poorly. Our earlier verification and high-performance refinement of Gabow’s SCC-finding algorithm [28] built upon ideas from [38]. An asymptotically faster MEC algorithm has been proposed [10]. It combines SCC-finding with a lock-step depth-first search. The algorithm has not been adopted by PMC tools so far, likely due to its implementation complexity.

2 Background

We introduce MDPs and MECs in the context of probabilistic model checking, then explain the refinement-based approach to program verification that we use.

Probabilistic Model Checking. Let \([0, 1] \subseteq \mathbb {R} \) be the interval of real numbers from 0 to 1 and \(2^X\) the power set of X. A (discrete) probability distribution over X is a function \(\mu :X \rightarrow [0,1]\) with countable support \( Sp (\mu ) = \{\,v \mid \mu (v) > 0\,\}\) such that \(\sum _{x \in X} \mu (x) = 1\). \( Dist (X)\) is the set of probability distributions over X.

Definition 1

A Markov decision process (MDP) is a pair \((S, K)\) where S is a finite set of states and K is the kernel of type \(S \rightarrow 2^{ Dist (S)}\).

An MDP models the interaction of an agent with a random environment: In current state u, the agent makes a decision, i.e. non-deterministically chooses a distribution \(\mu \in K(u)\). The environment then updates the current state by sampling \(\mu \). By repeating this process, we trace a path with a certain probability. A strategy represents an agent’s decisions of which distribution to pick next based on the path traced so far. Combining an MDP and a strategy removes all non-determinism, resulting in a Markov chain on which a probability measure over paths can be defined in the standard way [3]. We characterise interesting sets of paths via properties; for this work, we are particularly interested in reachability:

Definition 2

Given sets \(A, T \subseteq S\), a reachability property is an LTL formula \(\lnot A \mathbin {\textsf{U}} T\), characterising the set of paths that eventually visit a target state in T without visiting an avoid state in A before. Under a given strategy, the probability of satisfying a property is the probability mass of the (measurable) set of paths satisfying that property.

Fig. 1. MDP \((S, K)\)

There is a strategy that minimises and one that maximises the probability of satisfying \(\lnot A \mathbin {\textsf{U}} T\) [3], which induce the minimum/maximum reachability probabilities.

Example 1

Figure 1 shows an MDP with \(S=\{\,0,1,2,3\,\}\). The edges represent K, where \(\alpha \), \(\beta \) and \(\gamma \) label the non-deterministic choices, followed by the probability mass of each successor state. The minimum probability to satisfy \(\lnot \{\,1\,\} \mathbin {\textsf{U}} \{\,3\,\}\) is 0 for the strategy that always chooses \(\beta \) and \(\gamma \). The maximum probability is 0.5, achieved by choosing \(\alpha \) twice: after this, we are either in target state 3 or in avoid state 1.

The edges of an MDP kernel are \( Edges (K) = \{\,(u,v) \mid \exists \,\mu \in K(u):\mu (v) > 0\,\}\). A sub-MDP of \((S, K)\) is a pair \((C, D)\) where \(C \subseteq S\) and \(D(u) \subseteq K(u)\) for all \(u \in C\).

Definition 3

Given an MDP \((S, K)\), an end component (EC) [14] is a sub-MDP \((C, D)\) such that \(C \times C \subseteq Edges(D)^*\) (it is strongly connected) and \((u,v) \in Edges (D) \wedge u \in C \Rightarrow v \in C\) (it is closed). A maximal end component (MEC) is an EC that is not a sub-MDP of another EC.

SCCs are weaker than MECs: They are maximal strongly connected subsets of states rather than closed sub-MDPs. In other words, for every state, there exists a strategy such that the next state is in the SCC with probability \(>0\), while for a MEC the probability is 1. MECs play an essential role in sound algorithms for evaluating reachability probabilities: Collapsing the MECs (i.e. replacing every MEC by a single state that collects all edges out of the MEC) guarantees a single fixed point for these algorithms. We find MECs through a graph analysis that requires the computation of SCCs. Graph analysis means that we only need to know whether probabilities are non-zero, i.e. we work with the MDP structure that maps state u to its set of supports \(\{\,Sp(\mu ) \mid \mu \in K(u)\,\} \subseteq 2^S\). We call elements of the outer set transitions and elements of the inner sets branches.

Example 2

In the MDP structure for Fig. 1, state 0 is mapped to \(\{\, \{\,0,1\,\}, \{\,2\,\} \,\}\). The MDP has two SCCs: \(\{\,0,1,2\,\}\) and \(\{\,3\,\}\). Set \(\{\,0,1\,\}\) is not an SCC as it is not maximal. There are three MECs: \(\{\,0,1\,\}\), \(\{\,2\,\}\), and \(\{\,3\,\}\). While state 2 has an edge to 1, it is not in the same MEC as it cannot go back with probability 1.
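To make the notion of supports concrete, the following Python sketch derives the MDP structure from a kernel given as explicit distributions. The probability values below are our own assumptions for illustration (the figure fixes only the supports); the resulting support sets match Example 2.

```python
# Hypothetical kernel for the MDP of Fig. 1: each state maps to a list of
# distributions (dicts from successor state to probability). The exact
# probabilities are assumptions; only the supports are given in the text.
K = {
    0: [{2: 1.0},            # alpha
        {0: 0.5, 1: 0.5}],   # beta
    1: [{0: 1.0}],
    2: [{3: 0.5, 1: 0.5}],   # gamma
    3: [{3: 1.0}],           # self-loop
}

def structure(K):
    """Map each state u to its set of transition supports Sp(mu)."""
    return {u: {frozenset(v for v, p in mu.items() if p > 0)
                for mu in K[u]}
            for u in K}
```

As in Example 2, `structure(K)[0]` yields the two supports \(\{0,1\}\) and \(\{2\}\); the probabilities themselves play no role in the graph analysis.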

We also use models that are Markov automata (MA) [16] and probabilistic timed automata (PTA) [37]. Untimed reachability on an MA can be checked on its embedded MDP, while PTA can be converted to MDP using e.g. digital clocks [36].

Verification by Refinement. We aim for efficient verified executable code. This requires reasoning about the high-level behaviour of algorithms as well as about lower-level concepts like efficient data structures. To keep these independent concerns separate, we use an iterative refinement approach:

We represent the algorithm with the nondeterministic result (nres) monad of the Isabelle Refinement Framework (IRF) [42]. It has two possible states: result and fail. The former captures the set of outputs of all non-deterministic behaviours (e.g. picking an element of a set) of a program while the latter occurs if any behaviour of the program fails (e.g. non-termination). For abstract program A and concrete program C, the refinement relation \(C \le \Downarrow R\,A\) holds iff each result of C relates to a result of A via relation R. If A fails, then C always refines it. We use predefined relations like \(R_{ size }\) and \(R_{ bool }\) to relate natural numbers and booleans to 64-bit and 1-bit words, respectively, or to build a relation from abstraction function \(\alpha \) which converts concrete data to abstract data and invariant I that holds if the data is in valid form. We use notation for

where P is a precondition over the abstract program. To refine e.g. addition of natural numbers (\(a + b\)) to addition of 64-bit words, we need the precondition \(a + b \le 2^{63} - 1\), the maximum value of a signed 64-bit word.

As they are transitive, we can compose refinements. The final step is an automatic refinement to a model of LLVM using the sepref tool [39]. It uses assertions of separation logic [49] to map data structures to concrete memory contents; e.g. and map 64-bit and 1-bit words to memory, respectively, and \(A_{ list }\) maps a list to memory using a heap. We combine relations and assertions through composition; e.g. maps natural numbers to memory.

Example 3

We show an example of a bitset, abstractly represented as a . We implement operation sget, which tests whether a value is in a bitset.

[Isabelle listing: bitset implementation and refinement]

Here, we (1) define an abstract function , which is a membership test, and an implementation over (i.e. a list of 64-bit binary words) and an index . This function obtains the i-th bit in the sequence: obtains the j-th word and the k-th bit in word x. We then (2) relate to a using \(R_{ bs }\); we provide as abstraction function to convert s to s, and as invariant that makes sure that n values fit in our bitset. Next, (3) we prove refinement of to . maps all values up to n to themselves. (4) Function is an LLVM program automatically generated by sepref and refines . The precondition guarantees that the index is in bounds. Finally (5) through composition we obtain that maps a to a bitset on the heap. This allows sepref to generate LLVM code for every occurrence of . Note that we simplified the notation e.g. to match the relation refinement and we omitted notation for (non-)destructive heap access.
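The word/bit indexing that sget performs can be sketched in a few lines of Python. The helper names `sget` and `sset` are ours; this mirrors only the abstract behaviour, not the verified LLVM code.

```python
WORD = 64  # width of one machine word in the bitset

def sget(bs, i):
    """Test membership of i in a bitset stored as a list of 64-bit words.

    Mirrors the abstract 'obtain the j-th word, then its k-th bit':
    j = i div 64 selects the word, k = i mod 64 selects the bit.
    """
    j, k = divmod(i, WORD)
    return (bs[j] >> k) & 1 == 1

def sset(bs, i):
    """Insert i into the bitset (illustrative companion operation)."""
    j, k = divmod(i, WORD)
    bs[j] |= 1 << k
```

The in-bounds precondition of the verified version corresponds here to requiring `i < len(bs) * WORD`.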

Fig. 2. An execution of the MEC algorithm using 3 iterations.

3 Correctness of the MEC Algorithm

The standard MEC algorithm iteratively culls the MDP as follows: (1) Calculate the SCC decomposition of the current MDP, (2) find the SCCs that are MECs, and (3) remove the found MECs as well as all transitions with branches to a different SCC. Figure 2 shows 3 iterations of this algorithm on the MDP of Fig. 1. SCCs are marked by dotted lines, states of SCCs that are MECs are coloured, and branches culled in the current/previous iteration are red/gray. This algorithm is loosely based on those of [3, 10]: [10] uses an attractor computation to remove more states per iteration, for which we were unable to find an efficient implementation, while [3] excludes step 2 and thus does not identify MECs early, which means computations may be repeated on them. Our approach is based on the existing code in mcsta, which includes step 2 and omits the attractor computation.
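For readers who prefer runnable pseudocode, the three-step loop can be sketched in Python over MDP structures (each state mapped to a list of support sets). This is an illustrative re-implementation under our own naming, not the verified algorithm; the SCC helper is a textbook recursive Tarjan.

```python
def sccs_of(S, succ):
    """Tarjan's SCC algorithm on the subgraph induced by S (recursive sketch)."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []
    def dfs(u):
        index[u] = low[u] = len(index)
        stack.append(u); on_stack.add(u)
        for v in succ(u):
            if v not in index:
                dfs(v); low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:          # u is the root of an SCC
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == u: break
            comps.append(comp)
    for u in S:
        if u not in index:
            dfs(u)
    return comps

def mec_decomposition(S0, K0):
    """Standard MEC algorithm on an MDP structure (state -> list of supports)."""
    S = set(S0)
    K = {u: [set(t) for t in K0[u]] for u in S}
    mecs = []
    while S:
        # (1) SCC decomposition of the current graph
        comps = sccs_of(S, lambda u: {v for t in K[u] for v in t if v in S})
        comp_of = {u: i for i, c in enumerate(comps) for u in c}
        # (2) bottom SCCs (all transitions stay inside) are MECs; remove them
        for c in comps:
            if all(t <= c for u in c for t in K[u]):
                mecs.append(frozenset(c)); S -= c
        # (3) cull transitions with a branch leaving their SCC
        for u in S:
            K[u] = [t for t in K[u] if t <= comps[comp_of[u]]]
    return mecs
```

On the MDP of Fig. 1 this needs exactly the 3 iterations of Fig. 2, yielding the MECs \(\{3\}\), \(\{2\}\) and \(\{0,1\}\). A production implementation would use an iterative SCC algorithm (such as the Gabow variant of [28]) to avoid recursion-depth limits.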

3.1 Abstract MDP Structure

We represent the MDP structure as mapping each state to a list of lists of states, i.e. it is of type . We chose a list-based representation over a set-based one for straightforward compatibility with our earlier SCC implementation [28] while still being abstract enough for our purposes. The MEC algorithm takes the states ( ) and the MDP kernel ( ) as parameters. We use Isabelle/HOL’s locale mechanism for general constructs. A locale creates a block in which user-specified assumptions hold. We define an MDP locale with some natural well-formedness assumptions:

[Isabelle listing: the MDP locale]

This locale states that transitions have at least one branch, the state space is finite, and the MDP is closed, i.e. all transitions starting in S end in S.

3.2 Specification

Let denote that the MDP is strongly connected. Given :

[Isabelle listing: definitions of ECs and MECs]

where is the proper . Here, holds if is a sub-MDP of . We allow reorderings and (de)duplications of the transitions as they do not alter the MDP structure. An EC is a strongly connected, closed sub-MDP with at least one state. A MEC is an EC that is not a proper sub-MDP of another EC. With these definitions, we specify MEC algorithms as those that return a list with the MECs of the input MDP structure:

[Isabelle listing: specification of MEC algorithms]

3.3 Abstract Algorithm

We now define the MEC algorithm, focusing on its abstract, high-level behaviour; we refine this to concrete data structures in Sect. 4. The definition in Isabelle is:

[Isabelle listing: the abstract MEC algorithm]

We initialise the loop state in line 2 as an empty list M to store the MECs and \(S=S_0\) and \(K=K_0\). We bundle this data into one tuple so that we can refine them through a single assertion in Sect. 4.2. We iterate as long as there are states for which we have not found a MEC in line 3. We then perform the three-step process described earlier: We (1) compute in line 5 such that holds (i.e. is a distinct list of all SCCs of the graph structure of S and K). We then (2) obtain list V which contains all SCCs of C that are also MECs in line 6. We finally (3) remove MECs and transitions between different SCCs in line 7. At the end of the program, we extract M which contains the MECs.

These operations are defined by high-level behaviour; e.g. for :

[Isabelle listing: high-level operation definitions]

We elided the definition of predicate which holds if only contains the transitions in whose branches all remain within the same SCC as their source. Also, adds the identified MECs to and removes them from  .

These definitions are still far from an efficient implementation. We first refine each operation to a control flow (definitions elided). The operations of that control flow are implemented in the respective data structures in Sect. 4. The SCC algorithm has been refined separately in [28].

Invariant. We define the following invariant for the main loop of the algorithm:

[Isabelle listing: the loop invariant]

It states that (1) the states ( ) and MECs ( ) are disjoint and (2) cover the original state space. Also, (3,4) is pairwise disjoint and contains no duplicates. The current graph structure is (5) a sub-MDP of the input and (6) an MDP itself. Further, (7) all MECs are either in or in the current graph structure, (8) a MEC of the current graph structure is a MEC of the original one, (9) each element of is a MEC, and (10) transitions in the original graph structure within one SCC are preserved in the current one. We have proven that the invariant is preserved throughout the while-loop and that the specification holds once no states remain.

Termination is guaranteed as every non-empty graph has at least one bottom SCC (BSCC), i.e. an SCC with no outgoing edges. Our algorithm finds MECs by identifying BSCCs; we find at least one MEC per iteration and remove it from the state space. Since the state space is finite, we necessarily terminate.

4 Data Structures and Refinement

The next step is to define the data structures to efficiently implement the abstract operations specified in Sect. 3.3. For input and output, we formalise the data structures that mcsta uses, so that we can integrate our implementation without costly conversions. Using the IRF, the refinement is done modularly and in multiple steps to structure the correctness proof and keep it manageable.

4.1 Supplementary Data Structures

We introduce auxiliary data structures that are part of mcsta’s data structure:

Intervals. In mcsta, intervals of natural numbers are represented as a single 64-bit word, where the 20 most significant bits encode the length, and the 44 remaining bits encode the starting point l. Like in [28], we express this refinement in two levels: the relation relates a 64-bit word to a pair \((n, i)\) of type , and the functions and represent these as set and list, respectively.
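The bit-packing can be sketched as follows; `pack` and `unpack` are hypothetical helper names that illustrate the 20/44-bit split, not mcsta's actual API.

```python
LEN_BITS, START_BITS = 20, 44  # 20 + 44 = 64 bits in one word

def pack(n, l):
    """Pack interval (length n, start l) into one 64-bit word:
    the 20 most significant bits hold the length, the 44 least
    significant bits the starting point."""
    assert 0 <= n < (1 << LEN_BITS) and 0 <= l < (1 << START_BITS)
    return (n << START_BITS) | l

def unpack(w):
    """Recover the (length, start) pair from a packed word."""
    return w >> START_BITS, w & ((1 << START_BITS) - 1)
```

The 44-bit start field caps the addressable index range at \(2^{44}\), comfortably above the state counts considered in Sect. 6.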

Disjoint Nat Set List. Our implementation requires a map from states to indices of MECs or SCCs. Low-valued indices are MECs while high-valued ones are SCCs. Abstractly, we represent this as two lists of sets of states such that each state occurs at most once. We highlight some operations here:

[Isabelle listing: operations on the disjoint nat set list]

Operation constructs a tuple of empty lists, returns the length of the first list, and moves state v into index i of the first list (removing it from anywhere else if necessary). Every operation on the first list (with suffix 1) has a corresponding operation on the second one (with suffix 2). We omit some further operations for this data structure.

We implement the data structure as an array map that maps values to the index of the set that they are in. This means that we flatten the two lists into one map. We introduce a bound L which is the maximal size of the first list. Indices \(i < L\) represent indices to sets in the first list; indices \(i \ge L\) represent index \(i-L\) in the second list. Values that are not in any set get a \(-1\) entry. We capture this mapping in assertion \(A_{ dslt }\).

Example 4

Let \(L=3\) and \(N=5\). Then \(A_{ dslt }\) maps the abstract to array \(a=[3,0,2,-1,3]\): We have \(a[1] < L\) so value 1 must be in the first list; since \(a[1]=0\), we find value 1 in the set at index 0. We also have \(a[4] \ge L\) so value 4 is in the second list. Since \(a[4]=3\), we find it at index \(3-L=0\). Lastly, we have \(a[3]=-1\), which means that value 3 is not in any of the sets.
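The array-map implementation behind \(A_{ dslt }\) can be sketched in Python; the class and operation names below are illustrative (the formalisation's names differ).

```python
class Dslt:
    """Array-map sketch of the disjoint nat set list.

    a[v] = i with i < L : value v is in set i of the first list
    a[v] = L + j        : value v is in set j of the second list
    a[v] = -1           : value v is in no set
    """
    def __init__(self, L, N):
        self.L, self.N = L, N
        self.a = [-1] * N          # 'empty': no value is in any set

    def move1(self, v, i):
        """Move value v into set i of the first list (overwriting any
        previous membership, so each value occurs at most once)."""
        assert 0 <= i < self.L
        self.a[v] = i

    def move2(self, v, j):
        """Move value v into set j of the second list."""
        self.a[v] = self.L + j

    def abstraction(self):
        """Recover the abstract pair of lists of sets."""
        first = [set() for _ in range(self.L)]
        second = [set() for _ in range(self.N)]
        for v, i in enumerate(self.a):
            if 0 <= i < self.L:
                first[i].add(v)
            elif i >= self.L:
                second[i - self.L].add(v)
        return first, second
```

Replaying Example 4 (L = 3, N = 5) with these operations reproduces the array \([3,0,2,-1,3]\).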

4.2 The mcsta Data Structure

The mcsta data structure is a tuple . , and represent the states, transitions, and branches of the MDP structure, respectively. Additionally, and are sets representing the avoid and target states of the reachability property being verified (corresponding to sets A and T of Def. 2). We define a relation \(R_{ Mdi }\) that relates our model of the mcsta data structure to an MDP structure:

[Isabelle listing: the relation \(R_{ Mdi }\) and its invariant]

The states of an MDP in mcsta are numbered from 0 to \(N-1\). derives the kernel from the data structure as follows: and are lists of intervals (represented as tuples, see Sect. 4.1) and is a list of state indices. If , the next n transitions starting at index i in belong to state v. This means that a transition is an index . Similarly, is a tuple pointing to an interval of indices on . A branch is thus an index and is the target state of the branch. Furthermore, if , we ignore all outgoing edges. The invariant states the following: (1) it fixes the number of states to N for the bounds calculations in sepref. It states that (2,3) our target and avoid states are a subset of \(S_0\). It also states that (4,5,6) points to valid indices on , points to valid indices on and points to valid indices on , (7,8) and do not contain empty intervals, (9) transitions cannot overlap, and (10,11) the input MDP structure remains constant. The relation relates the input MDP structure to the concrete data structure. Using sepref and composition, we obtain the according assertion . We refine the concrete data structure to LLVM using the IRF standard library and the supplementary data structures from Sect. 4.1: We implement and as lists of bit-packed intervals, as a list of 64-bit values, and and as bitsets.

Example 5

One possibility to represent Fig. 1 is \( St = [(2,0),(1,2),(1,3),(1,4)]\), \( Tr = [(1,0),(2,1),(1,3),(2,4),(1,6)]\), \( Br = [2,0,1,0,3,1,3]\). For state 0 we have \( St ! 0 = (2,0)\), i.e. it has 2 successors (\(\alpha \) and \(\beta \)) starting at index 0. Similarly, for transition 1 (corresponding to \(\beta \) in this case) we have \( Tr ! 1 = (2,1)\), which means that this transition has 2 branches starting at index 1 in \( Br \) (i.e. state \( Br ! 1 = 0\) and \( Br ! 2 = 1\)). Note that we have not defined \( Av \) and \( Ta \) yet as these are dependent on the property. If we assume the property of Example 1 then \( Av = \{1\}\) and \( Ta = \{3\}\). These translate to the bitsets ...0010 and ...1000 respectively, which removes the outgoing transitions from those states (not visualised).
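The decoding described above can be sketched in Python. The function name `kernel` is ours, and we let state 3 point at transition 4 (its self-loop via branch index 6), so that every entry of \( Tr \) and \( Br \) is reachable.

```python
def kernel(St, Tr, Br, Av, Ta):
    """Derive the MDP structure (supports) from mcsta's packed layout.

    St[v] = (n, i): state v owns the n transitions Tr[i .. i+n-1];
    Tr[t] = (m, j): transition t owns the m branches Br[j .. j+m-1];
    states in Av or Ta have their outgoing transitions ignored.
    """
    K = {}
    for v, (n, i) in enumerate(St):
        if v in Av or v in Ta:
            K[v] = []          # avoid/target states become absorbing
        else:
            K[v] = [set(Br[j:j + m]) for (m, j) in Tr[i:i + n]]
    return K

# Data of Fig. 1 in the layout discussed above.
St = [(2, 0), (1, 2), (1, 3), (1, 4)]
Tr = [(1, 0), (2, 1), (1, 3), (2, 4), (1, 6)]
Br = [2, 0, 1, 0, 3, 1, 3]
```

With empty \( Av \) and \( Ta \), state 0 decodes to the supports \(\{2\}\) and \(\{0,1\}\) of Example 2; with \( Av = \{1\}\) and \( Ta = \{3\}\), those two states lose their outgoing transitions.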

mcsta directly passes this data to our implementation. However, as we have seen in Sect. 3.3, our algorithm needs to be able to efficiently remove states and transitions. The data structure that we have presented so far cannot implement this functionality efficiently.

Cullable MDP Structure. The implementation of from Sect. 3.3 supplements the input data structure with the data structure from Sect. 4.1, which is a tuple of lists of sets of states. States in the first list of the tuple are removed while states in the second one are not. Furthermore, a transition starting in some state v is “activated” if all branches of that transition are within the same set. If any branch connects different sets, the transition is deactivated. With this approach, we place states of the same SCC into the same set, disabling transitions between SCCs in the process. Additionally, we use the tuple structure to distinguish between MECs in the first list and SCCs in the second, which means that it eventually stores the MEC decomposition.

[Isabelle listing: the cullable MDP structure]

With we test if a transition is activated by checking that all branches are in the same set ( which is a operation) as the source state. We use this to derive the culled kernel which contains exactly the activated transitions of . omits the states that have been identified as a MEC. The MECs are stored in the first list of directly. Variable is the number of remaining states, i.e. for which no MEC has been identified. We use this for implementing the termination criterion. We omit the definition of the invariant which mainly concerns well-formedness of . This data structure allows us to efficiently implement from Sect. 3.3 by putting states from the same SCC into the same set. This update is straightforward to implement as it merely involves updating the value of unfinished states in the map to the corresponding index of the SCC, which is also stored in a map for the SCC algorithm.

Example 6

Assuming the middle situation in Fig. 2, consider the input data from Example 5 and additionally \( Mm = ([\{3\},\{2\}],[\{0,1\}])\). For state 0, \(\beta \) is activated since its branches (to 0 and 1) are in the same set as the source (0). However, \(\alpha \) branches to 2 which is in another set, so the transition was deleted.
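The activation test can be sketched as follows. This is a naive Python version with our own names; the verified implementation looks up set indices in the flattened array map of Sect. 4.1 instead of searching the sets.

```python
def set_index(Mm, v):
    """Index of the set containing v in the flattened view of
    Mm = (mecs, sccs), or None if v is in neither list."""
    mecs, sccs = Mm
    for i, c in enumerate(mecs):
        if v in c:
            return i
    for j, c in enumerate(sccs):
        if v in c:
            return len(mecs) + j
    return None

def active(Mm, source, branches):
    """A transition is activated iff all its branches lie in the
    same set as its source state."""
    s = set_index(Mm, source)
    return s is not None and all(set_index(Mm, b) == s for b in branches)
```

On the data of Example 6, \(\beta\) (branches to 0 and 1) is activated for state 0, while \(\alpha\) (branch to 2) is not.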

4.3 Filter List

Given the number of states N and the number of MECs M, we have \(M \le N\). This is essential for our bounds calculation: Since the ID of a MEC is represented as a 64-bit value, we need to bound M. We require a “dense” indexing for the MECs, i.e. they must be numbered consecutively from 0 to \(M - 1\). This way, we can do our bounds calculation solely using N, which we know a priori. We iterate over all states, and if we find any transitions leaving the SCC, the SCC is not a MEC and we filter it. We implement a filter set and a filter list to realise this in a memory- and time-efficient way.

The filter list is an extension of any data structure representing a list. On the abstract layer, it is of type where the first list of the pair is the original list that we want to filter and the second one is the filtered variant. Concretely, it is of type . The first list ( ) is the original list, the second list ( ) is a map containing indices, and the natural number is a counter representing the length of the filtered list c. The list is the core of this data structure. It maps an index i of the unfiltered list to if is filtered or to if is at index j in the filtered list. The filter set is similar to the filter list but either maps to (entry is filtered) or (entry is unfiltered). We then convert the filter set to a list by assigning a unique index to each unfiltered entry.

Example 7

Assume unfiltered list out of which we want to filter a and c. Abstractly, we have . Its concrete implementation is the triple . Since a (index 0) and c (index 2) are filtered, we have . Similarly, we find b (index 1) at index 0 in the filtered list. Therefore .
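A Python sketch of the filter list, assuming the elided unfiltered list of Example 7 is the three elements a, b, c (an assumption; the class and field names are ours):

```python
class FilterList:
    """Filter list: (original list, index map, filtered length).

    m[i] = -1 if element i is filtered out, otherwise the dense index
    of element i in the filtered list. One pass assigns the indices,
    which yields a dense 0 .. count-1 numbering of the kept elements.
    """
    def __init__(self, xs, keep):
        self.xs = list(xs)
        self.m = []
        self.count = 0                 # length of the filtered list
        for i in range(len(self.xs)):
            if keep(i):
                self.m.append(self.count)
                self.count += 1
            else:
                self.m.append(-1)

    def filtered(self):
        """Recover the abstract filtered list."""
        out = [None] * self.count
        for i, j in enumerate(self.m):
            if j != -1:
                out[j] = self.xs[i]
        return out
```

Filtering a (index 0) and c (index 2) from [a, b, c] yields the map \([-1, 0, -1]\) with count 1, matching Example 7.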

5 Code Generation and Integration

Using the algorithm of Sect. 3 and the data structures of Sect. 4, we derive an LLVM program using sepref. Through transitivity of the refinement relation, we can show that this program refines the specification from Sect. 3.2. The IRF provides the setup to extract a separation logic Hoare triple from our correctness proof. Let \(\star \) be the separation conjunction. Then we obtain:

[Separation logic Hoare triple for the generated program]

where \(A_{ Mdo }\) is derived from \(A_{ dslt }\) mapping only its first list to memory given that the second list is empty. The precondition consists of several parts: (1) The input consists of a value N representing the number of states with its 64 bit representation \( ni \) and an input MDP structure \((S_0,K_0)\) which is represented in memory by \( mdpi \). (2) We are provided a pointer to the MDP structure (\( p\_mdpi \)) and one to an address where we can store our output. (3) \((S_0,K_0)\) is an MDP structure that has fewer than \(2^{62}\) states and a dense numbering S. Given these preconditions we (4) run our program with the specified input parameters. We then get (5) a MEC decomposition M and its representation in memory such that (6) the input parameters are preserved and we additionally obtain the MEC decomposition, (7) the provided pointer ( ) points to that decomposition and (8) M is the set of MECs.

The IRF has built-in functionality to translate the refined program to LLVM code together with a header file, so that it can be called as an external function from mcsta. Note that we use indirection through pointers to avoid problems with different ABIs when passing structures as parameters or return values. It is invoked as:

[Invocation of the generated function]

5.1 Compatibility with mcsta

We refer to the verified LLVM code as the verified implementation and to the pre-existing implementation in mcsta as the integrated implementation. While the data format of the verified implementation is compatible with mcsta, there are some important differences that we bridge with post-processing glue code:

First, collapsing the MECs for interval iteration, which is currently not verified, requires the MECs to be sorted in exploration order. The algorithm we formalised does not do that out of the box, and we are not aware of an algorithm that preserves this order. We therefore reorder the MEC indices in a post-processing step.

Second, the integrated algorithm groups all target states into one collapsed target state and does the same for avoid states. The verified algorithm puts each target and avoid state in its own MEC. Both approaches are correct, but the verified algorithm therefore calculates at least as many MECs as the integrated one. We considered formalising this collapsing of states in our proofs, but decided against it as it would complicate them. Since we already post-process to reorder the MEC indices, we perform the collapsing in the same step.

Fig. 3. Comparison of runtime to complete the MEC decomposition routine.

6 Experimental Evaluation

We have embedded the verified implementation into the mcsta tool of the Modest Toolset. Since it uses mcsta’s regular input and output data structures, we do not need any expensive conversions, and only minimal glue code with negligible runtime. Furthermore, we wrote a manually optimised reference implementation in C++. We now compare the performance of these two and the integrated implementation.

6.1 Experimental Setup

We use all applicable benchmarks (i.e. all MDP models, PTA models transformed into their digital clocks MDP, and MA transformed into the embedded MDP for untimed properties) from the Quantitative Verification Benchmark Set (QVBS) [27], which however rarely contain any nontrivial MECs. MEC decomposition is still necessary since we do not know a priori whether nontrivial MECs exist in a model, and the algorithm may still require multiple iterations to obtain this result. To study the performance when nontrivial MECs exist, we adapt a benchmark set for long-run average rewards (LRA) from [20]: we test one reachability property per model to trigger the MEC algorithm and inflate the parameters to challenge the implementations. This benchmark set contains the mer Mars rover case study from [19], the sensors case study from [34], and other models from the PRISM benchmarks. We add parameters where sensible to allow scaling the models. We also created MDP adaptations of the stochastic games originally hand-crafted to contain interesting MEC structures for the evaluation of [35]. This gave us 61 benchmark instances to test our implementation on. We aimed to benchmark models with between 500,000 and 100 million states: smaller models terminate too quickly to benchmark meaningfully, while larger models run out of memory. We ran all benchmarks on an Intel Core i7-12700H system with 32 GB of RAM running Linux Mint 21.3.

6.2 Results

We ran each benchmark three times and report the averages of those runs. Figure 3 compares the wall-clock runtime, with the left scatter plot comparing the verified to the integrated implementation and the right comparing the verified to the reference implementation. Each dot is a pair of runtime values for one benchmark instance. We distinguish benchmarks for PTA, MA and MDP from the QVBS (Q) or the LRA benchmarks (L). With our setup, we found that our verified implementation performs on par with the reference and integrated implementations, with a slight edge for the integrated implementation, leaving little optimisation potential for the verified one. This pattern also appears to hold independently of the type of model. One noteworthy outcome is that the integrated implementation crashes for one instance whereas the reference and verified implementations do not; this is caused by the integrated implementation requiring more memory.

We also compared the peak memory usage (working set) of the verified and integrated implementations. While this measure may be influenced by external factors like garbage collection, it still provides a useful indication of relative memory consumption. Peak memory was higher for the integrated implementation in 42 out of the 61 instances. On average, the integrated implementation used about \(8.2\%\) more memory than the verified implementation, reaching up to \(36.4\%\) more in isolated instances. In comparison, the verified implementation used at most \(28.4\%\) more than the integrated implementation in isolated instances. The instance that crashed (tireworld with \(n=45\)) lies on the verge of what a laptop with 32 GB of RAM can process: peak memory reached almost 31 GB for this instance using the verified implementation.

7 Conclusion

We have formally verified a MEC decomposition algorithm in Isabelle/HOL; as far as we know, this is the first such formalisation. We have refined this algorithm down to LLVM and generated efficient executable code, which we embedded into the mcsta probabilistic model checker of the Modest Toolset. This is a step towards a fully verified model checking toolchain: we aim to replace algorithms in the toolchain piece by piece, monitoring the performance impact in each step. Where previous attempts at formally verified model checkers have not been competitive in terms of performance and functionality, our approach yields performance comparable to manual implementations. Additionally, if desired, cross-usage with other (unverified) functionality is possible. While the performance of our verified implementation is comparable to that of the integrated implementation, it is a clear improvement in terms of memory usage.

Future Work. Comparisons with the manual implementations suggest that the verified implementation does not leave much optimisation potential, so we consider it more useful to focus on other algorithms at this point. One candidate is the improved MEC algorithm by Chatterjee et al. [10], which has a better theoretical complexity than our implementation; deriving a competitive implementation from it would be highly relevant. Another candidate is the interval iteration algorithm [22], which uses the MEC algorithm as a pre-processing step. An efficient implementation of interval iteration requires a representation of real or rational numbers with low overhead. Unverified implementations rely on IEEE floating-point values (floats), which are suitable for high-performance computations but come with rounding errors [23]. Supporting them requires an extension of the IRF in order to refine real numbers to floats and reason about rounding.

Data availability. The proofs and benchmarks presented in this paper are archived and available at https://doi.org/10.4121/3f2a4539-e69b-4d16-b665-530c1abddfbc [32].