
1 Introduction

The design of efficient algorithms for model checking has been a major research challenge for over three decades. Following the SAT breakthrough in the late 90s [22, 25], many novel SAT-based techniques have been proposed, which have tremendously increased the efficiency and scalability of (symbolic) model checking and its applicability to real-world systems (e.g., [6, 8, 15, 17, 18, 20, 21, 24, 27]). Although the vast majority of such approaches have focused on safety properties, their benefits have extended also to liveness model checking, thanks to the development of liveness verification algorithms that work by exploiting efficient safety checkers, either via a monolithic reduction from liveness to safety [4], or via more sophisticated strategies that use safety checkers incrementally [13], exploiting also the inductive invariants generated when the verification is successful [9, 16].

In this paper, we present a novel SAT-based liveness checking algorithm, which we call rlive, that also takes advantage of efficient safety model checkers and their capability of producing inductive invariants for verified properties. Like all other SAT-based approaches to liveness checking, rlive works on properties of the form FGq, stating that q has to eventually stabilize to true in all traces of the system, relying on standard procedures (e.g., [12, 14]) for transforming a model checking problem for an arbitrary LTL property into this form.

Similar to the FAIR algorithm of [9], rlive then proceeds by refuting candidate counterexamples to the property, i.e. traces in which \(\lnot q\) holds infinitely often, using a sequence of calls to a safety checker, and exploiting the inductive invariants generated by such safety checks to prune the set of reachable \(\lnot q\)-states, until either a real (lasso-shaped) counterexample for FGq is found, or no \(\lnot q\)-states are reachable, implying that the property holds. However, in contrast to FAIR, which directly searches for lasso-shaped traces where \(\lnot q\) holds in at least one state of the loop, rlive searches for counterexamples incrementally, via a recursive chain of safety checks, each of which tries to determine whether it is possible to reach a \(\lnot q\)-state starting from the successors of a previously-reached \(\lnot q\)-state, in a manner conceptually similar to k-Liveness  [13]. If a \(\lnot q\)-state is found for the second time during this recursive chain, a (lasso-shaped) counterexample witnessing the violation of FGq is constructed, and the algorithm terminates. Otherwise, eventually one of the recursive safety checks will generate an inductive invariant C proving that no other \(\lnot q\)-state can be reached from (the successors of) a given \(\lnot q\)-state s. rlive then uses C to derive constraints that exclude s from the reachable states of the system, forcing it to (recursively) consider a different \(\lnot q\)-state to continue the current candidate counterexample trace. Specifically, C is used to strengthen the target states to reach, by asking the safety checker to ignore \(\lnot q\)-states whose successors are all contained in C (since all such states in C cannot visit \(\lnot q\) infinitely-often); furthermore, C can also be used to strengthen the transition relation of the input system, since no state in C can be part of a counterexample. To convey this intuition, we refer to states in C as shoals, as they represent regions of the state space that must be avoided in order not to “get stuck” in the search for a counterexample. Eventually, the shoals (recursively) produced will either exclude all \(\lnot q\)-states, thus proving that the input property holds, or compel rlive to find a lasso-shaped counterexample for it.

Intuitively, rlive effectively identifies counterexamples by searching, in a depth-first manner, for traces that contain as many \(\lnot q\)-states as possible. Performing the search incrementally, by a sequence of simple reachability checks, turns out to be computationally cheaper than searching directly for loops in practice. Moreover, whenever the current candidate counterexample trace cannot be completed, the shoals obtained from the safety checks can be used globally to strengthen the transition system and reduce the search space that needs to be explored, thus accelerating the convergence of the algorithm.

We have implemented rlive on top of the nuXmv model checker [10], which has a mature, state-of-the-art IC3 implementation, and compared it against state-of-the-art implementations of other SAT-based liveness checking algorithms, including FAIR, k-Liveness, and their recent combination called k-FAIR  [16]. Our experimental results, conducted on a wide range of benchmarks taken from recent hardware model checking competitions [1, 2], demonstrate the strengths of our algorithm: rlive solves more benchmarks than any other competitor within the given resource bounds, and very often in significantly less time.

Paper Structure. The rest of the paper is structured as follows. After the introduction of the necessary background in Sect. 2, we describe rlive in Sect. 3 and prove its soundness and correctness. We compare rlive with related work in Sect. 4, and experimentally evaluate its performance in Sect. 5. Finally, we conclude in Sect. 6 outlining also directions for future work.

2 Preliminaries

2.1 Boolean Satisfiability

A literal is a Boolean variable or its negation. If l is a literal, we denote its corresponding variable with var(l). A cube (resp. clause) is a conjunction (resp. disjunction) of literals. The negation of a clause is a cube and vice versa. A formula in Conjunctive Normal Form (CNF) is a conjunction of clauses. For simplicity, we also treat a CNF formula \(\phi \) as a set of clauses and make no distinction between the formula and its set representation. Similarly, a cube or a clause c can be treated as a set of literals or as a Boolean formula, depending on the context.

We say a CNF formula \(\phi \) is satisfiable if there exists an assignment of its Boolean variables, called a model, that makes \(\phi \) true; otherwise, \(\phi \) is unsatisfiable. A SAT solver is a tool that can decide the satisfiability of a CNF formula \(\phi \). In addition to providing a yes/no answer, modern SAT solvers can also produce models for satisfiable formulas, and unsatisfiable cores (UC), i.e. a reason for unsatisfiability, for unsatisfiable ones. More precisely, in the following we shall assume the availability of a SAT solver that supports the following API (which is standard in state-of-the-art SAT solvers based on the CDCL algorithm [19]):

  • is-SAT \((\phi , \mathcal {A})\) checks the satisfiability of \(\phi \) under the given assumptions \(\mathcal {A}\), which is a list of literals. This is logically equivalent to checking the satisfiability of \(\phi \wedge \bigwedge \mathcal {A}\), but is typically more efficient;

  • get-UC() retrieves a UC of the assumption literals of the previous SAT call when the formula \(\phi \wedge \bigwedge \mathcal {A}\) is unsatisfiable. That is, the result is a set \(uc \subseteq \mathcal {A}\) such that \(\phi \wedge \bigwedge uc\) is unsatisfiable;

  • get-model() retrieves the model of the formula \(\phi \wedge \bigwedge \mathcal {A}\) of the previous SAT call, if the formula is satisfiable.
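
For concreteness, the following minimal sketch shows how such an API looks in an existing assumption-based SAT solver. The choice of the PySAT library is purely ours, for illustration: is-SAT, get-model and get-UC correspond to solve with assumptions, get_model() and get_core(), respectively.

```python
# Illustration of the assumed SAT API using PySAT (not tied to the paper's tools).
from pysat.solvers import Glucose3

# phi = (x1 | x2) & (!x1 | x2) & (!x2 | x3), with variables encoded as integers
solver = Glucose3()
solver.add_clause([1, 2])
solver.add_clause([-1, 2])
solver.add_clause([-2, 3])

# is-SAT(phi, A) with assumptions A = [x1, !x3]
if solver.solve(assumptions=[1, -3]):
    print("model:", solver.get_model())   # get-model()
else:
    print("UC:", solver.get_core())       # get-UC(): a subset of the assumptions
```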

2.2 Boolean Transition Systems

A Boolean transition system \( Sys \) is a tuple \(\langle X, I, T\rangle \), where X is a set of variables, and I and T are formulae. The state space of \( Sys \) is the set of possible assignments to X. I(X) is a Boolean formula corresponding to the set of initial states, and \(T(X, X')\) is a Boolean formula representing the transition relation, where \(X' = \{ x' \mid x \in X \}\) are the next-state variables. In the following, we extend the prime notation to states and formulae in the natural way. A state \(s_{2}\) is a successor of a state \(s_{1}\) iff \(s_{1} \wedge s_{2}' \models T \), which is also denoted by \((s_1,s_2)\in T\). A finite path of length k is a finite state sequence \( s_{1}, s_{2}, \dots , s_{k} \), where \((s_{i}, s_{i+1})\in T \) holds for \(1\le i \le k-1\). An infinite path is an infinite state sequence \( s_{1}, s_{2}, \dots \), where \((s_{i}, s_{i+1})\in T \) holds for \(i\ge 1\). The number of states is finite for any (Boolean) transition system. An infinite path is lasso-shaped if it can be presented as \(\alpha \cdot \beta ^\omega \), where \(\alpha \) is the finite prefix, e.g. \(s_{1}, s_{2}, \dots , s_{l-1}\), and \(\beta \) is an infinitely-repeating suffix, e.g. \(s_{l}, s_{l+1}, \dots , s_{k}\). A state t is reachable from s in k steps if there is a path of length k from s to t. Let S be a set of states in \( Sys \). We overload T and denote the set of successors of states in S as \(T(S) = \{t \mid (s,t) \in T, s \in S\}\). Conversely, we define the set of predecessors of states in S as \(T^{-1}(S) = \{s \mid (s,t) \in T, t \in S\}\). Recursively, we define \(T^{0}(S) = S\) and \(T^{i+1}(S) = T(T^{i}(S))\) where \(i \ge 0\); the notation \(T^{-i}(S)\) is defined analogously. In short, \(T^{i}(S)\) denotes the states that are reachable from S in i steps, and \(T^{-i}(S)\) denotes the states that can reach S in i steps.

2.3 Invariant Checking

Let a Boolean transition system \( Sys =\langle X, I, T\rangle \) be given. A Boolean formula P over X is an invariant iff it holds in all the reachable states of Sys. An invariant checker either proves that P holds for any state reachable from an initial state in I, or disproves P by producing a counterexample. In the former case, we say that the property is proven in the system, while in the latter case, the property is disproved. A counterexample is a finite path from an initial state s to a state t violating P, i.e., \(t \in \lnot P\); such a state is also called a bad state.

Invariant checking, also referred to as safety checking, is reduced to symbolic reachability analysis. Reachability analysis can be performed in a forward or backward search. Forward search starts from initial states I and searches for bad states by computing \(T^{i}(I)\) with increasing values of i, while backward search begins with states in \(\lnot P\) and searches for initial states by computing \(T^{-i}(\lnot P)\) with increasing values of i.
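
As a toy illustration of the two search directions (explicit-state rather than symbolic, and entirely our own simplification), the following sketch iterates \(T^{i}(I)\) and \(T^{-i}(\lnot P)\) until a fixpoint or an intersection is found.

```python
# Explicit-state illustration of forward/backward reachability (our toy example;
# real safety checkers work symbolically). States are integers, T is a relation.
I = {0}
T = {(0, 1), (1, 2), (2, 0), (2, 3)}
bad = {3}  # states violating the candidate invariant P

def post(S):  # T(S): successors of the states in S
    return {t for (s, t) in T if s in S}

def pre(S):   # T^{-1}(S): predecessors of the states in S
    return {s for (s, t) in T if t in S}

def forward_reach(I, bad):
    reached, frontier = set(I), set(I)
    while frontier:                        # iterate T^i(I)
        if frontier & bad:
            return "unsafe"
        frontier = post(frontier) - reached
        reached |= frontier
    return "safe"

def backward_reach(I, bad):
    reached, frontier = set(bad), set(bad)
    while frontier:                        # iterate T^{-i}(bad)
        if frontier & I:
            return "unsafe"
        frontier = pre(frontier) - reached
        reached |= frontier
    return "safe"

print(forward_reach(I, bad), backward_reach(I, bad))  # both print "unsafe" here
```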

State-of-the-art safety checking algorithms utilize SAT techniques to explore the state space so as to improve the overall performance dramatically. Representative approaches include IC3/PDR [8, 15], interpolation-based model checking [20], combinations of IC3 with interpolation [27] or k-induction [17], and (forward and backward) CAR  [18]. In the following, we abstract from specific invariant checking algorithms, and assume that they implement the following API:

  • check-reachable \((I, T, \lnot P)\) denotes a generic procedure for safety checking. It takes as input a set of initial states I, the transition relation T, and the negation of the candidate invariant P. check-reachable returns unsafe if P is not an invariant. Otherwise, it returns safe.

  • get-invariant() retrieves an inductive invariant proving that the bad states are unreachable, i.e. a set \(\iota \) of states closed under T, containing the states reachable from I, and not intersecting \(\lnot P\). More formally, \(\iota \) is such that \(I \models \iota \), \(\iota \wedge T \models \iota '\), and \(\iota \models P\).

  • get-cex-trace() retrieves, if the property is violated, the counterexample found by the safety checker, i.e. a finite path from I to \(\lnot P\).
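
The following sketch expresses the assumed interface as a small Python protocol; the class name and Python-level types are our own illustrative choices, while the three operations mirror the API listed above. The rlive algorithm of Sect. 3 relies only on these three operations.

```python
# Sketch of the assumed safety-checker API (illustrative typing, not a real tool).
from typing import List, Protocol

Formula = object  # placeholder for a symbolic Boolean formula
State = object    # placeholder for a total assignment to X

class SafetyChecker(Protocol):
    def check_reachable(self, I: Formula, T: Formula, bad: Formula) -> str:
        """Return 'unsafe' if some state in `bad` is reachable from I via T,
        'safe' otherwise."""

    def get_invariant(self) -> Formula:
        """After 'safe': an inductive invariant iota with I |= iota,
        iota & T |= iota', and iota |= !bad."""

    def get_cex_trace(self) -> List[State]:
        """After 'unsafe': a finite path from an initial state to `bad`."""
```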

2.4 Liveness Checking

We now consider the general model checking problem, denoted \(Sys\models \phi \), where \(\phi \) is a formula in Linear Temporal Logic (LTL) [23]. Following the standard automata-theoretic approach [26], the problem can be reduced to checking \(Sys\times \mathcal{A}_{\lnot \phi }\models FG q\), where \(\lnot q\) can be seen as the Büchi acceptance condition of \(\mathcal{A}_{\lnot \phi }\). (Symbolic techniques such as [12, 14] can be used in practice to encode such a reduction.) FGq intuitively means that, in any satisfying trace, q eventually holds in all the future states, so that the acceptance condition \(\lnot q\) can only be visited a finite number of times. Dually, a counterexample is an infinite path where \(\lnot q\) is visited an infinite number of times, i.e. a trace satisfying \(G F \lnot q\).

In the following, we focus on the (simplified) \(Sys \models F G q\) problem, referred to as liveness checking. If the property is violated, there always exists a lasso-shaped counterexample, i.e., an infinite path \(\alpha \cdot \beta ^\omega \) where (i) the prefix \(\alpha \) is a finite trace of Sys whose last state t violates q, i.e., \(t \in \lnot q\), and (ii) the infinitely-repeating suffix \(\beta \) is a path in Sys from a successor of t to t. We refer to a state \(t\in \lnot q\) as a \(\lnot q\)-state.
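
As a small illustration (ours, with an arbitrary state naming), the following sketch splits a finite run whose last state repeats an earlier one into the prefix \(\alpha \) and the loop \(\beta \) of the corresponding lasso; it is only meant to make the \(\alpha \cdot \beta ^\omega \) shape concrete.

```python
# Illustrative only: given a finite run whose last state repeats an earlier one,
# split it into the prefix alpha and the infinitely-repeating loop beta.
from typing import List, Tuple

def split_lasso(run: List[str]) -> Tuple[List[str], List[str]]:
    last = run[-1]
    i = run.index(last)           # first occurrence of the repeated state
    return run[:i], run[i:-1]     # the counterexample is alpha . beta^omega

# e.g. the run s1 s2 s3 s4 s2 yields alpha = [s1] and beta = [s2, s3, s4]
print(split_lasso(["s1", "s2", "s3", "s4", "s2"]))
```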

Algorithm 1. k-FAIR = k-Liveness + FAIR

The algorithms for liveness checking are more complicated than those for invariant checking. In order to show that a candidate invariant does not hold, it is sufficient to find a finite path. Liveness checking, on the other hand, requires finding an infinite (lasso-shaped) counterexample (or proving that none exists). The most effective solutions to liveness checking are based on invariant checking. The most relevant to our work are the following.

  • The L2S  [4] (Liveness-to-Safety) construction introduces a copy of the state variables in Sys, to record the first state of the loop, and a fresh variable inLoop, to record that the loop has started. The state vector copy is non-deterministically assigned a state violating q, i.e. the start of the loop, and can never change after that. The search tries to reach a state where each state variable has the same value as its copy and \(inLoop = true\), which implies that a violating lasso is detected. This translation is sound and complete.

  • FAIR  [9] tries to construct a lasso-shaped counterexample as follows: first, it searches for a candidate prefix (\(\alpha \)); then, starting from the last (bad) state t of \(\alpha \), it searches for a suffix (\(\beta \)) that ends with t. Both steps are based on invariant checking. If the loop cannot be found, this bad state will be pruned. Fundamental optimizations include state generalization and, more importantly, extraction of walls (where, intuitively, states in a loop can only exist on one side of the wall). Then, FAIR iterates trying to find another candidate prefix for the lasso. The procedure terminates as soon as no prefix can be detected, in which case the property is proved.

  • k-Liveness  [13] tries to prove FGq based on the following intuition: if FGq holds, then there is a (finite) maximum number of times in which q can be violated in any path. The k-Liveness construction introduces a counter of the number of times q is violated and calls a safety checker to prove that the counter cannot exceed the given limit k. In case of failure, the limit is increased. k-Liveness proves the property if a k is found such that no path visits \(\lnot q\) more than k times. In general, k-Liveness is considered effective in proving the property. Notice, however, that k-Liveness – as described above – is incomplete, and will diverge if the property does not hold. On the other hand, it is possible to find counterexamples by checking for repeated bad states in the path returned by the safety-checking call. As already suggested in [13], k-Liveness can be run in parallel with bounded model checking [6], which is complete in the case of violation. (A high-level sketch of the k-Liveness loop is given after this list.)

  • k-FAIR  [16] is a more recent approach, designed to inherit advantages from FAIR and k-Liveness. k-FAIR utilizes k-Liveness for proving correctness while leveraging FAIR for finding counterexamples. The k-FAIR algorithm is shown in Algorithm 1. We see that FAIR and k-Liveness can both be considered special cases of k-FAIR. If line 17 is removed, the algorithm becomes FAIR. If line 5 and lines 9–16 are removed, it becomes k-Liveness.
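
The following sketch shows the high-level k-Liveness loop mentioned above, on top of the safety-checking API of Sect. 2.3. The helper add_counter, which instruments the system with the violation counter and returns the safety target “counter > k”, is an assumption of the sketch; counterexample detection (repeated bad states, parallel BMC) is omitted.

```python
# High-level sketch of the k-Liveness proof loop (illustrative, proof side only).
# add_counter(T, q, k) is an assumed helper: it returns an instrumented transition
# relation T_k, with a counter incremented on !q-states, and the target "counter > k".
def k_liveness(checker, I, T, q, max_k=1000):
    for k in range(max_k + 1):
        T_k, counter_exceeds_k = add_counter(T, q, k)
        if checker.check_reachable(I, T_k, counter_exceeds_k) == "safe":
            return "safe"     # no path visits !q more than k times, so FGq holds
    return "unknown"          # incomplete: without a counterexample check it diverges
```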

3 Liveness Checking with rlive

In this section we informally describe rlive, then present the pseudo code and some optimizations, and finally characterize its formal properties.

3.1 Overview

Fig. 1. Forward expansion and shoal construction (left); rollback (right).

Fig. 2. Terminating conditions: counterexample found, unsafe (left); \(\lnot q\) no longer reachable from I, safe (right).

rlive is a new algorithm for liveness checking (\(Sys\models F G q\)). At a high level, rlive can be seen as a depth-first search with chronological backtracking and learning. rlive incrementally tries to build a counterexample to FGq, progressively extending it with more states in \(\lnot q\). In the forward expansion phase, rlive first looks for a finite path \(\pi _1\) from I to \(\lnot q\), with \(s_1\) being the last state of \(\pi _1\). Then, rlive looks for another path \(\pi _2\) from \(T(s_1)\) to \(\lnot q\), and so on. See Fig. 1, left. The forward expansion proceeds until one of two conditions holds.

  1. If \(s_n\) is equal to \(s_i\), with \(i < n\), then a lasso-shaped counterexample exists, and the search terminates with unsafe (Fig. 2, left). The counterexample can be constructed by concatenating the previously found \(\pi _i\).

  2. If \(s_{n+1}\) cannot reach \(\lnot q\), then a shoal is built, i.e. a set of states closed under T and containing \(T(s_{n+1})\), that can reach no target state (\(shoal_{n+1}\) in Fig. 1, left). Clearly, no state in a shoal can belong to the counterexample; hence, shoals are learned and used to block the subsequent forward expansions.

In the second case, the algorithm rolls back to the previous level, and restarts the forward search, looking for a new way to enter \(\lnot q\). However, to avoid entering the shoals again, the target \(\lnot q\)-state must have successors outside the shoals (e.g., \(s'_{n+1}\) in Fig. 1, right). The algorithm terminates with safe whenever it rolls back to level 0, and finds no way to reach, from the initial states, the remaining subset of \(\lnot q\) while avoiding the shoal constraints (Fig. 2, right).

We remark that, upon backtracking, the forward search space is restricted to avoid the shoal constraints as well as the states in \(\lnot q\) that do not belong to the counterexample. Hence, the navigation toward the target is increasingly restricted because of the discovered shoal constraints and also because the target is progressively shrunk.

The algorithm described above is naturally implemented with primitives provided by the safety checker, such as deciding reachability and constructing the counterexamples and the invariants. A further practical optimization, called dead-state pruning, trades calls to the safety checker for calls to the SAT solver, enlarging the shoals with a cheap form of lookahead to further prune the target set.

3.2 Algorithm

Algorithm 2. Implementation of rlive

Algorithm 2 describes how rlive is implemented using a generic invariant-checking engine implementing the API introduced in Sect. 2.3. To prove or falsify the liveness property FGq, rlive will maintain a global state set C at line 2, representing the shoals (i.e. states from which \(\lnot q\) can be reached only a finite number of times) discovered so far.

The algorithm starts from line 4, checking whether \(\lnot q\) is reachable from the initial states, using check-reachable. If it is not reachable, Gq is proved, and so FGq is verified. Otherwise, from the counterexample trace returned by check-reachable, we get a reachable \(\lnot q\)-state s. Then the search-cex function is called to search for the next \(\lnot q\)-state from s.

When C is not empty, we block the states in C from the transition system by adding the constraint \(\lnot C \wedge \lnot C'\) to T (lines 5 and 17). At the same time, the target states to be searched become \(\lnot q\cap T^{-1}(\lnot C)\), which ensures that the \(\lnot q\)-states we search for have at least one successor outside C; this excludes the \(\lnot q\)-states that have been proved not to be part of any counterexample.

In the search-cex(s, B) function of line 11, the parameter s is the newly reached \(\lnot q\)-state, and the parameter B contains the \(\lnot q\)-states that have been previously reached along the current trace. Therefore, in lines 12–13, when s has already appeared in B, a lasso-shaped counterexample has been found, so the function returns True (a counterexample has been detected). Line 15 is the implementation of an important heuristic called dead-pruning, which we describe in detail in the next subsection. A new call to check-reachable is performed on line 17 to find the next \(\lnot q\cap T^{-1}(\lnot C)\)-state starting from the successors of s. The reason for searching from the T(s)-states is that s itself is a state that meets \(\lnot q\cap T^{-1}(\lnot C)\). However, calculating the exact set T(s) might be quite expensive, so we use an overapproximation of T(s), which we describe below in Sect. 3.3. If a state t can be reached, then the function is called recursively, with t as the new starting state and s added to B. Otherwise, check-reachable returns an inductive invariant D on line 22. This invariant is an overapproximation of the reachable states starting from T(s), and none of these states can reach \(\lnot q\cap T^{-1}(\lnot C)\). Therefore, states in D are shoals, so they can be added to C, and then the function returns False.
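
To complement the textual description, the following Python sketch reconstructs the control flow of Algorithm 2 as described above; it is an illustrative reconstruction, not the authors' implementation. The safety checker is accessed through the API of Sect. 2.3, and the helpers strengthen, target, post_approx, last_state, and prune_dead, which encapsulate the symbolic details discussed in Sect. 3.3, are assumptions of the sketch. Line numbers mentioned in the text refer to Algorithm 2, not to this sketch.

```python
# Illustrative reconstruction of the rlive control flow (not the authors' code).
# Assumed helpers: strengthen(T, C) builds T & !C & !C'; target(q, T, C) builds
# !q & T^-1(!C); post_approx(s, T) over-approximates T(s) (Sect. 3.3);
# last_state(trace) returns the last state of a counterexample trace;
# prune_dead implements the optional dead-pruning optimization (Sect. 3.3).
def rlive(checker, I, T, q):
    C = set()                                    # shoals discovered so far
    while True:
        if checker.check_reachable(I, strengthen(T, C), target(q, T, C)) == "safe":
            return "safe"                        # no relevant !q-state is reachable
        s = last_state(checker.get_cex_trace())  # a reachable !q-state
        if search_cex(checker, s, set(), T, q, C):
            return "unsafe"                      # lasso-shaped counterexample found

def search_cex(checker, s, B, T, q, C):
    if s in B:                                   # s reached for the second time:
        return True                              # the recursion stack gives the lasso
    if prune_dead(s, C):                         # every successor of s is in C
        return False
    while True:
        res = checker.check_reachable(post_approx(s, T), strengthen(T, C),
                                      target(q, T, C))
        if res == "safe":
            C.add(checker.get_invariant())       # new shoal: blocked globally via C
            return False
        t = last_state(checker.get_cex_trace())  # next !q-state on the current trace
        if search_cex(checker, t, B | {s}, T, q, C):
            return True
```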

3.3 Optimizations

Avoiding the Explicit Computation of \(\boldsymbol{T^{-1}(\lnot C)}\). When asking for the next \(\lnot q \cap T^{-1}(\lnot C)\)-state in the current trace, we can avoid the explicit computation of \(T^{-1}(\lnot C)\) by exploiting some additional knowledge about how the reachability engine check-reachable works. For example, if check-reachable is based on IC3  [8], we can simply add a constraint \(T \wedge \lnot C'\) to the SAT solver when asking for a \(\lnot q\)-state.

Efficiently Over-Approximating \(\boldsymbol{T(s)}\). Using IC3 as an implementation of check-reachable allows us also to efficiently overapproximate the states T(s) in the (recursive) searches for the next \(\lnot q\)-states in the current trace (line 17). To do so, we slightly modify IC3, and in particular the query that checks whether a given predecessor b of a bad (\(\lnot q\)-)state intersects the initial states of the system. Rather than checking whether \(T(s) \wedge b\) is satisfiable, we check the satisfiability of \(s \wedge T\) under the assumption of \(b'\). If the formula is unsat, we add the cube \(c \subseteq b\) corresponding to the unsat core produced by the SAT solver (i.e. such that \(c' = \textsc {get}\hbox {-}\textsc {UC}{()}\)) to the 0-th frame of IC3. In this way, the 0-th frame of IC3 will effectively be our desired over-approximation of T(s).
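
A minimal sketch of this modified query, using the assumption-based SAT API of Sect. 2.1, is shown below. The integer encoding of primed variables and the representation of frame 0 as a list of clauses are simplifying assumptions of ours, and “adding the cube c to the 0-th frame” is rendered in the usual IC3 sense of blocking it, i.e. adding the clause \(\lnot c\).

```python
# Sketch of the modified IC3 query (illustrative; variable/frame encodings are
# simplifying assumptions). Boolean variables are integers 1..n and the primed
# copy of x is x + n. `solver` is assumed to be preloaded with the clauses of
# s /\ T over X and X'.
def prime(cube, n):
    return [l + n if l > 0 else l - n for l in cube]

def unprime(cube, n):
    return [l - n if l > 0 else l + n for l in cube]

def check_predecessor(solver, b_cube, n, frame0):
    """Replace IC3's `does b intersect the initial states (here T(s))?' check."""
    if solver.solve(assumptions=prime(b_cube, n)):
        return True                       # some successor of s satisfies b
    c = unprime(solver.get_core(), n)     # c' = get-UC(), so c is a sub-cube of b
    # Block c in frame 0 (i.e. add the clause !c): no successor of s is in c,
    # hence frame 0 remains an over-approximation of T(s).
    frame0.append([-l for l in c])
    return False
```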

Dead States Pruning. During rlive, many dead states, i.e. states that do not have any successors, arise due to the strengthening of T and \(\lnot q\) using the discovered shoals. To prove that \(\lnot q\) cannot be reached from such a dead state, check-reachable needs to search for the predecessor states of \(\lnot q\) and describe the overapproximation of the reachable set from the dead state with the literals in the predecessors, which might require a large number of SAT queries.

Dead-pruning is a simple and effective optimization (though probably not the only possible one) used to detect and quickly block the dead states. It is applied before calling check-reachable, to check whether a successor of the starting bad state is a dead state. If it is, then it can be excluded from the search and used to strengthen the shoals C.

Line 26 in Algorithm 2 is the implementation of the dead-pruning heuristic. A successor d of s is computed on lines 27–29. If d is determined to be a dead state (line 30), then it can be added to C (after being generalized using the unsat core produced by the SAT solver). The function returns False once it finds a successor of s with successors outside of C. If all the successors of s are blocked as dead states, the function returns True.
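
For concreteness, the following sketch reconstructs the dead-pruning loop from the description above. The helpers succ_outside_C and generalize, and the representation of C as a mutable set of blocked cubes, are assumptions of the sketch rather than the authors' implementation.

```python
# Illustrative reconstruction of the dead-pruning check (not the authors' code).
# succ_outside_C(x, C) is an assumed SAT query for x /\ T /\ !C': it returns
# (True, d) with d a successor of x outside C, or (False, core) with the unsat
# core of the query. generalize(d, core) drops the literals of d not in the core.
def prune_dead(s, C):
    while True:
        found, d = succ_outside_C(s, C)
        if not found:
            return True                    # every successor of s is in C: prune s
        found_d, info = succ_outside_C(d, C)
        if found_d:
            return False                   # d has a successor outside C: keep s
        C.add(generalize(d, info))         # d is a dead state: enlarge the shoals
```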

3.4 Correctness Proof

This section presents the proofs for the correctness of rlive (Algorithm 2). We first show the following lemmas which are crucial for the proof.

Lemma 1

Every state \(t\in C\) can only reach a \(\lnot q\)-state a finite number of times.

Proof

According to Algorithm 2, C can be updated in either the search-cex or prune-dead procedure. Since the latter one is optional (it is an optimization), we first consider the proof without the prune-dead procedure.

In the search-cex procedure, C is updated with the union of the inductive invariants returned by check-reachable (line 23), each obtained from a call whose initial states are an over-approximation of the successors of some \(\lnot q\)-state s. From the correctness of check-reachable, every state t in such an inductive invariant satisfies: (1) it may be reachable from the initial states of the call (and thus from the \(\lnot q\)-state s), due to the over-approximation, and (2) it cannot reach the states in \(\lnot q \cap T^{-1}(\lnot C)\) (line 17). By construction, assume \(C = C_1\cup C_2\cup \ldots \cup C_n\) where \(C_k\) (\(1\le k\le n\)) is the k-th inductive invariant added into C. We prove the lemma by induction over n. Obviously, every state \(t\in C_1\) cannot reach states in \(\lnot q\) (and \(C_1\cap \lnot q = \emptyset \)). So the lemma holds in the base case. For the inductive step (when \(k>1\)), since every state \(t\in C_k\) cannot reach \(\lnot q \cap T^{-1} (\lnot (\bigcup _{1\le i\le k-1}C_i))\), we distinguish two cases for a state \(\tilde{s}\in \lnot q\). If \(\tilde{s}\in T^{-1}(\lnot (\bigcup _{1\le i\le k-1}C_i))\), then t cannot reach \(\tilde{s}\); otherwise, \(\tilde{s}\not \in T^{-1}(\lnot (\bigcup _{1\le i\le k-1}C_i))\) implies that \(T(\tilde{s})\subseteq (\bigcup _{1\le i\le k-1}C_i)\), i.e., every successor of \(\tilde{s}\) is in \((\bigcup _{1\le i\le k-1}C_i)\). From the inductive hypothesis, every state in \((\bigcup _{1\le i\le k-1}C_i)\) can only reach a \(\lnot q\)-state a finite number of times. Therefore, t can only reach a \(\lnot q\)-state finitely-many times as well.

Taking the prune-dead procedure into consideration, only states whose successors are all in C are added into C (line 32). Since, by the argument above, every state in C can only visit a \(\lnot q\)-state a finite number of times, the same holds for those newly added states, all of whose successors are in C.    \(\square \)

Lemma 2

Given \(s \models \lnot C\), when the prune-dead(s) procedure returns, it returns True if and only if every successor of s, if any, is in C.

Proof

(\(\Rightarrow \)) If the procedure returns True, then either the SAT call at line 27 returns unsat, which indicates that every successor of s is in C, or there is some successor d of s that is not in C. In the latter case, since the procedure returns True, the SAT call at line 30 must return unsat, which indicates that every successor of d, if any, is in C. Then d is added into C according to lines 31–32, so \(d\in C\) becomes true. This process repeats inside the while loop at line 27 until every successor of s is in C.

(\(\Leftarrow \)) If every successor of s is in C, the SAT call at line 27 will return unsat. Therefore, the while loop directly stops and the procedure returns True at line 35.    \(\square \)

Lemma 3

  1. search-cex(s, B) returns True if and only if there is a lasso starting from s whose loop part contains a \(\lnot q\)-state.

  2. search-cex(s, B) always terminates.

Proof

  1. (\(\Rightarrow \)) The procedure is recursively implemented and it returns True as soon as a \(\lnot q\)-state t (which can be the same as s) is already in B, indicating that a loop is detected. Moreover, t is reachable from the input state s, since t is detected from the successors of s by check-reachable. Therefore, a lasso starting from s and looping with t is found when the procedure returns True. (\(\Leftarrow \)) Assume the lasso is \(s, \ldots , t_1, \ldots , (t_i, \ldots , t_j)\) in which \(t_j = t_i\) (\(1\le i\le j\)) and every \(t_k\) (\(1\le k\le j\)) is a \(\lnot q\)-state. First of all, we can prove that for each \(t_k\), it is true that \(t_k\in T^{-1}(\lnot C)\), i.e., there is some successor of \(t_k\) that is not in C; otherwise, from \(t_k\) there cannot be a lasso looping with a \(\lnot q\)-state, since, based on Lemma 1, all successors of \(t_k\) being in C implies they can only visit a \(\lnot q\)-state a finite number of times. Therefore, \(t_k\) can be found by the check-reachable call at line 17, and prune-dead\((t_k)\) cannot return True according to Lemma 2, implying that search-cex(s, B) will not return False at line 16. As a result, search-cex(s, B) will finally return True at line 13 once it finds \(t_j\) for the second time.

  2. We prove that the while loop of line 17 of search-cex(s, B) is terminating. The point is that the size of the state set \(\lnot q\cap T^{-1}(\lnot C)\) keeps shrinking after each iteration of the loop, because the \(\lnot q\)-state t at line 18 will be removed from \(\lnot q\cap T^{-1}(\lnot C)\). The reason is that when the recursive search-cex call at line 19 returns False, the proof of Item 1 above guarantees that there is no lasso starting from t and looping with a \(\lnot q\)-state. So C will be updated either by the inductive invariant (line 23) or by the unsat core in the prune-dead procedure (line 32), such that \(t\not \in T^{-1}(\lnot C)\) becomes true, according to Lemmas 1 and 2. Therefore, t is successfully removed from \(\lnot q\cap T^{-1}(\lnot C)\). In the worst case, the state set will become empty and check-reachable can terminate with safe as no bad state can be found at line 5.

   \(\square \)

Lemma 4

  1. rlive(I, T, q) always terminates.

  2. rlive(I, T, q) returns safe if and only if the system (I, T) satisfies the property FGq.

Proof

  1. The proof is analogous to that of Item 2 of Lemma 3, so it is omitted.

  2. (\(\Rightarrow \)) Assume by contradiction that rlive returns safe, but the property does not hold. Therefore, there exists a lasso-shaped trace \(\pi \) of the form \(s, \ldots , t_1\), \(\ldots , (t_i, \ldots , t_j)\) in which \(t_j = t_i\) (\(1 \le i \le j\)) and every \(t_k (1 \le k \le j)\) is a \(\lnot q\)-state. By Lemma 1, none of the states in \(\pi \) is in C, and moreover every \(t_k\), in particular \(t_1\), is in \(\lnot q \cap T^{-1}(\lnot C)\). Therefore, \(s, \ldots , t_1\) is a trace reaching the bad state \(\lnot q \cap T^{-1}(\lnot C)\) in the system \(\langle X, I, T \wedge (\lnot C \wedge \lnot C') \rangle \), which is found by the check-reachable call at line 5. But then, search-cex \((t_1, \emptyset )\) at line 7 returns True by Lemma 3, and so rlive returns unsafe, which is a contradiction.

    (\(\Leftarrow \)) If the system satisfies the property, then every \(\lnot q\)-state that is reachable from the initial states, if any, can only be visited finitely-many times. Assume the number of such reachable \(\lnot q\)-states is k (\(k<+\infty \)). If \(k=0\), the check-reachable procedure in the while loop of rlive (line 5) will directly return safe and thus rlive returns safe. When \(k>0\), assume the reachable \(\lnot q\)-states are \(s_1,\dots , s_k\). So there are at most k iterations of the while loop, since each \(s_i\) (\(1\le i\le k\)) can be found at most once by the check-reachable call on line 5 (the argument is similar to the one used in the proof of Item 2 of Lemma 3). Moreover, search-cex \((s_i, \emptyset )\) will return False, because \(s_i\) can be visited only a finite number of times and thus no lasso can be detected. As a result, rlive cannot return unsafe inside the loop, and it finally returns safe, in the worst case after every \(s_i\) has been found and blocked in the while loop.

   \(\square \)

Theorem 1

(Correctness). rlive always terminates, and it terminates with the correct result.

Proof

Directly from Lemma 4.    \(\square \)

4 Related Work

We have already introduced the main SAT-based liveness checking algorithms in Sect. 2.4. Here, we discuss their relation with rlive, highlighting both similarities and differences with our approach.

rlive vs L2S  [4]. The original liveness-to-safety transformation is conceptually very simple, and it can be applied with any off-the-shelf safety model checking algorithm, not necessarily based on SAT. The eager L2S transformation can however be inefficient, as it requires a duplication of the state variables, which might lead to significant performance penalties. In contrast, rlive follows a lazier approach, using an incremental reduction to safety, designed to exploit the invariant generation capability of modern SAT-based safety checking engines, which does not require duplicating the state variables and can be more efficient in practice.

rlive vs FAIR  [9]. At a high level, rlive and FAIR follow the same principle of incrementally strengthening the input problem by exploiting the inductive invariants generated when refuting candidate counterexamples with a safety model checker. The main difference is in how the candidate counterexamples are identified and blocked: while FAIR does that by checking directly for looping paths that start from a given reachable \(\lnot q\)-state, rlive follows a more incremental approach, in which repeated (and recursive) safety checks are used to build a bad loop incrementally. As our experimental results show (see Sect. 5), this difference turns out to be crucial for performance in practice. A second difference regards the nature of the information extracted from the inductive invariants produced by the safety checker: in general, the walls of FAIR are regions that cannot be crossed to find a counterexample (i.e., all states of a counterexample to FGq are on one side of the wall), whereas shoals are regions that must be avoided completely (i.e., no state in a counterexample can be part of a shoal).

rlive vs k-Liveness  [13]. The incremental approach used by rlive for constructing counterexamples is inspired by the k-Liveness algorithm; in some sense, rlive can in fact be seen as a depth-first (DFS) variant of k-Liveness, which performs instead a breadth-first (BFS) search (relative to the number k of times in which \(\lnot q\) can occur in the traces of the system). Thanks to its DFS approach, rlive does not need to maintain a global k value, but uses a different k for each trace; as such, it can sometimes reach values of k which are beyond the capabilities of k-Liveness (see our results in Sect. 5). Another difference between the two approaches is in the capability of finding counterexamples: although in principle complete, k-Liveness is more effective at proving properties than at disproving them, and already in the original paper [13] the authors recommend complementing it with BMC for finding counterexamples; on the other hand, rlive is effective both for safe and unsafe properties.

rlive vs k-FAIR  [16]. k-FAIR is a parametric combination of FAIR and k-Liveness, in which each candidate counterexample to FGq either is analyzed using FAIR, or causes an increase in the k counter of k-Liveness (see Algorithm 1). As such, the comparisons made above between rlive and FAIR or k-Liveness apply also to k-FAIR. Like k-FAIR, rlive can also be seen as trying to combine the strengths of the two techniques in a single algorithm; however, the two approaches differ significantly in how such integration is performed.

5 Evaluation

We have implemented rlive inside the nuXmv model checker [10]. Our implementation can use three different safety-checking engines, namely IC3, fCAR (Forward CAR), and bCAR (Backward CAR), relying on the latest version of CaDiCaL [7] as backend SAT solver. In this section, we experimentally evaluate rlive by comparing it with different state-of-the-art SAT-based liveness checking algorithms.

5.1 Experimental Setup

We include in our evaluation nuXmv  [10] and IIMC  [3], two state-of-the-art tools implementing SAT-based liveness-checking algorithms which are among the best-performing ones in the most recent liveness-checking tracks of the Hardware Model Checking Competition (HWMCC) [1, 2]. nuXmv implements L2S and k-Liveness, using a configuration that runs k-Liveness in lockstep with BMC as suggested in [13] for the latter (which we refer to as k-Liveness + BMC below). In addition to rlive, we also implemented three other liveness-checking algorithms on top of nuXmv, namely k-Liveness, FAIR, and k-FAIR. FAIR and k-FAIR are implemented according to Algorithm 1, and our k-Liveness is extended with the ability to find counterexamples by checking for repeated \(\lnot q\)-states in the violated traces (before increasing the value of k). IIMC implements FAIR and “plain” k-Liveness instead (without BMC). Table 1 summarizes the tested tools, algorithms, and their engines. Regarding rlive, the ‘-d’ flag is used to enable the dead-pruning optimization; otherwise rlive ignores lines 15–16 of Algorithm 2.

Table 1. Tools and algorithms evaluated in the experiments.

We evaluate all the configurations on 223 benchmarks, in aiger [5] format, from the liveness property track of HWMCC 2015 and 2017 [1, 2]. We ran the experiments on a cluster consisting of 240 nodes with Gold 6132 2.6 GHz CPUs, running RedHat 4.8.5, with a total of 96 GB RAM. For each test, we set the memory limit to 8 GB and the time limit to 1 h. During the experiments, each model-checking run has exclusive access to a dedicated node. For correctness checking, we compared the results from different solvers and found no discrepancies.

Table 2. Summary of overall results among different tools/configurations.

5.2 Experimental Results

Overview. The main results of the experiment are summarized in Table 2, in which the different tools/configurations are ordered by the total number of successfully solved instances within the given resource budget. From the table, we can see that rlive is the algorithm with the overall best performance in terms of the number of solved cases. More explicitly, rlive with the dead-pruning optimization and using IC3 as the backend solves the largest number of instances (159), and it is also the configuration that verifies the most cases (66). rlive is also the algorithm that finds the largest number of counterexamples, and this is true for all configurations that we tested (with ‘rlive -d’ using bCAR being the best one).

Regarding other tools/algorithms, the best performing one is the k-Liveness + BMC implementation in nuXmv, solving a total number of 146 cases, which is 11% less than the best configuration of rlive (i.e., ‘rlive -d’). All the other configurations solve significantly fewer instances than rlive.

The results in Table 2 also show that using different engines to run the rlive algorithm preserves good performance. Under the same implementation platform, their overall performance is better than that of k-Liveness using IC3: ‘rlive -d (fCAR)/(bCAR)’ solves 158/142 instances in total, while k-Liveness only solves 124. Using fCAR results in much better performance than bCAR on verifying properties (62 vs. 45). However, applying the bCAR engine seems to be an advantage in finding counterexamples, although the gap with the other engines is modest (and rlive in general performs very well on finding counterexamples).

Finally, the results also show the importance of the dead-pruning optimization. Without it, the performance of rlive is similar to that of k-Liveness + BMC from nuXmv (145 vs. 146 solved instances). Dead-pruning improves rlive (using IC3) by verifying 12 more instances and finding 2 additional violations.

Fig. 3. Comparisons among the implementations under different configurations. (Note that for better readability the y-axis starts from a value of 100.)

Fig. 4. Time comparison between rlive (with dead-pruning) and other implementations/configurations. rlive is always on the x-axis. Points above the diagonal indicate better performance of rlive. Points on the borders indicate timeouts (3600 s).

Runtime Efficiency. In order to evaluate the runtime efficiency of rlive, we show in Fig. 3 a plot of the number of solved instances (y-axis) within a given time limit (x-axis) for a subset of the tested configurations (all using IC3 as a backend). From the plot, it is evident that ‘rlive -d’ is significantly more efficient than the other competitors, always solving the largest number of instances for any timeout ranging from 600 s to 3600 s.

A more detailed comparison between rlive and other algorithms is shown in Fig. 4. From the plots, we can see that rlive outperforms other algorithms in a large number of cases, especially in the case of violated properties. An interesting exception is IIMC-FAIR, which shows strengths that are complementary to those of rlive, particularly for verified properties.

Portfolio Configurations. We analyze the behaviour of rlive in “portfolio” configurations, a technique often used in practice to improve performance when multiple CPU cores are available. For this, we performed two (virtual) experiments. In the first experiment, we consider a (virtual) portfolio consisting of the algorithms using IC3 as the backend, and compare it with (virtual) portfolios obtained by excluding a single algorithm at a time, in order to analyze the contribution of the excluded algorithm to the virtual best. The results are shown in Table 3. From the table, we can see that rlive contributes significantly to the performance of the virtual best, particularly for violated properties. Moreover, when multiple engines solve the same property, rlive is the fastest in the vast majority of cases (81 out of the 183 verified by the virtual best, with the 2nd-best performer being the fastest in only 26 cases).

Table 3. Virtual Best results among implementations by IC3 engine. VBS \(\setminus \) (Algorithm a) refers to the removal of a from the portfolio, so the reduction in the number of solutions represents the contribution of a to the portfolio. #Fastest Solution represents the number of times algorithm a solves a case fastest in the full VBS portfolio.
Table 4. Top 10 combinations of 2 algorithm implementations into one portfolio.

In the second experiment, we compose (virtual) portfolios in a “bottom up” way, by considering only configurations running two different algorithms in parallel. Also in this case, the results in Table 4 clearly show the impact of rlive.

Fig. 5. Comparison of the k values (maximum recursion depths reached) at termination, on verified cases, between k-Liveness and rlive. Points on the borders indicate timeouts.

Analysis of rlive Behaviour. We explore the reasons for the excellent performance of rlive through Fig. 5, which compares the k value of k-Liveness to the corresponding maximum recursion depth of rlive on verified properties. Both values indicate that the algorithm can find a path containing at most k \(\lnot q\)-states before terminating with safe. When both algorithms terminate within the time limit, the k value of rlive is always less than (or equal to) the value of k-Liveness. Since k-Liveness performs a breadth-first search (in terms of k), it always needs to find the path that contains the most \(\lnot q\)-states before it can terminate. On the other hand, the shoals generated by rlive during the search process help in blocking other \(\lnot q\)-states, allowing rlive to converge at a smaller depth. In addition, rlive is better at solving cases where there is a path containing a large number of \(\lnot q\)-states in the system, for which k-Liveness needs to reach a very large k value to converge. These cases are located on the right border of Fig. 5. The recursion depths of rlive on these cases reach far over 100, with the deepest one reaching 4095. However, the maximum k value of k-Liveness is only around 100. Figure 5 also shows some cases (located on the upper border of the plot) which could be solved by k-Liveness but not by rlive. We investigated them and found that dead states caused the rollback steps of rlive to be slow. The current dead-pruning optimization, which only performs a one-step lookahead to discover dead states, is not effective for such instances (though in most cases this simple strategy works), suggesting future directions for improvement.

6 Conclusions

We presented rlive, a novel algorithm for the liveness checking problem FGq. The idea is to search for a lasso-shaped counterexample by repeatedly calling a safety checker to re-enter the set of \(\lnot q\)-states. The search proceeds depth-first, backtracking when a state in \(\lnot q\) can be excluded by proving that \(\lnot q\) can only be reached finitely-many times from its successors, so that the state cannot be part of a counterexample. The invariants returned by the underlying safety checker progressively restrict the search. We called such invariants shoals, as intuitively they represent states that must be avoided when searching for a counterexample. A thorough experimental evaluation clearly demonstrates that rlive is superior to the other liveness checkers, both in terms of benchmarks solved and run time.

Regarding future research, we plan to extend this work in several directions. First, we will investigate heuristics to control the exploration order of bad states and the counterexamples produced by the safety checker. Second, we will consider the extraction of proofs from rlive. Third, we will consider extensions of rlive to the infinite-state case, in combination with algorithms for finding non-lasso-shaped counterexamples such as [11].