1 Introduction

First Order Logic (FOL) is a powerful formalism that naturally captures many interesting decision (and optimization) problems. In recent years, there has been tremendous progress in automated logical reasoning tools, such as Boolean SATisfiability (SAT) solvers and Satisfiability Modulo Theory (SMT) solvers. This has enabled the use of logic and logic satisfiability solvers as a universal solution to many problems in Computer Science, in general, and in Program Analysis, in particular. Most new program analysis techniques formalize the desired analysis task in a fragment of FOL, and delegate the analysis to a SAT or an SMT solver. Examples include deductive verification tools such as Dafny [30] and Why3 [13], symbolic execution engines such as KLEE [7], Bounded Model Checking engines such as CBMC [10] and SMACK [9], and many others.

In this paper, we focus on a fragment of FOL called Constrained Horn Clauses (CHC). CHCs arise in many applications of automated verification. They naturally capture such problems as discovery and verification of inductive invariants [4, 18]; Model Checking of safety properties of finite- and infinite-state systems [2, 23]; safety verification of push-down systems (and their extensions) [4, 28]; modular verification of distributed and parameterized systems [17, 19, 33]; type inference [35, 36]; and many others.

Using CHC, developers of program analysis tools can separate the process of developing a proof methodology (also known as generation of Verification Conditions (VCs)) from the algorithmic details of deciding whether the VC is correct. Such a flexible design simplifies supporting multiple proof methodologies, multiple languages, and multiple verification tasks with a single framework. Today, there are multiple effective program verification tools based on the CHC methodology, including the C/C++ verification framework SeaHorn [18], the Java verification framework JayHorn [25], the Android information flow verification tool HornDroid [8], the Rust verification framework RustHorn [31], and the Solidity verification tools SmartACE [37] and Solidity Compiler Model Checker [1]. Many more approaches utilize CHC as part of a more general verification solution.

The idea of reducing program verification (and model checking) to FOL satisfiability is well researched. Notable examples are the use of Constraint Logic Programming (CLP) [24] in program verification and the use of Datalog for pointer analysis [34]. What is unique here is the application of SMT solvers in the decision procedure and the lifting of techniques that have been developed in the Model Checking and Program Verification communities to the uniform setting of satisfiability of CHC formulas. In the rest of this paper, we show how verification problems can be represented in CHCs (Sect. 2), and describe key algorithms behind Spacer [27], a CHC engine of the SMT solver Z3 [32] that is used to solve them (Sect. 3).

2 Logic of Constrained Horn Clauses

In this section, we give a brief overview of Constrained Horn Clauses (CHC). We illustrate an application of CHC to verification of a simple imperative program with a loop.

The logic of Constrained Horn Clauses is a fragment of FOL. We assume that the reader is familiar with the basic concepts of FOL, including signatures, theories, and models. For the purpose of this presentation, let \(\varSigma \) be some fixed FOL signature and \(\mathcal {A}\) be an FOL theory over \(\varSigma \). For example, \(\varSigma \) is a signature for arithmetic, including constants 0 and 1 and a binary function \(\cdot + \cdot \), and \(\mathcal {A}\) is the theory of Presburger arithmetic. A Constrained Horn Clause (CHC) is an FOL sentence of the form:

$$\begin{aligned} \forall V \cdot (\varphi \wedge p_1(X_1) \wedge \cdots \wedge p_k(X_k) \implies h(X)) \end{aligned}$$
(1)

where V is the set of all free variables in the body of the sentence, \(\varphi \) is a constraint (a formula over \(\varSigma \) interpreted in \(\mathcal {A}\)), \(\{p_i\}_{i=1}^{k}\) and h are uninterpreted predicate symbols (in the signature), \(\{X_i\}_{i=1}^{k}\) and X are lists of first-order terms, and p(X) stands for the application of predicate p to a list of terms X.

A CHC in Eq. (1) can be equivalently written as the following clause:

$$\begin{aligned} (\lnot \varphi \vee \lnot p_1(X_1) \vee \cdots \vee \lnot p_k(X_k) \vee h(X)) \end{aligned}$$
(2)

where all free variables are implicitly universally quantified. Note that in this case only h appears positively, which explains why these are called Horn clauses. We write \(\mathrm {CHC}(\mathcal {A})\) to denote the set of all sentences in FOL modulo theory \(\mathcal {A}\) that can be written as a set of Constrained Horn Clauses. A sentence \(\varPhi \) is in \(\mathrm {CHC}(\mathcal {A})\) if it can be written as a conjunction of clauses of the form of Eq. (1).
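As a concrete instance of Eqs. (1) and (2), take \(k = 1\) with a hypothetical unary predicate \( Inv \) in the role of both \(p_1\) and h, and the arithmetic constraint \(x < 5 \wedge x' = x + 1\) in the role of \(\varphi \). The predicate and the constraint are chosen for illustration only. The implication form and the clause form then read:

$$\begin{aligned} \forall x, x' \cdot (x < 5 \wedge x' = x + 1 \wedge Inv (x)&\implies Inv (x')) \\ (\lnot (x < 5 \wedge x' = x + 1) \vee \lnot Inv (x)&\vee Inv (x')) \end{aligned}$$

The only positive literal is \( Inv (x')\), as required of a Horn clause.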

A \(\mathrm {CHC}(\mathcal {A})\) sentence \(\varPhi \) is satisfiable if there exists a model \(\mathcal {M}\) of \(\mathcal {A}\), extended with an interpretation for all of the uninterpreted predicates in \(\varPhi \), such that \(\mathcal {M}\) satisfies \(\varPhi \), written \(\mathcal {M} \models \varPhi \). In practice, we are often interested not in an arbitrary model, but in a model that can be described concisely in some target fragment of FOL. We call such models solutions. Given an FOL fragment \(\mathcal {F}\), an \(\mathcal {F}\)-solution to a \(\mathrm {CHC}(\mathcal {A})\) formula \(\varPhi \) is a model \(\mathcal {M}\) such that \(\mathcal {M} \models \varPhi \) and the interpretation of every uninterpreted predicate in \(\mathcal {M}\) is definable in \(\mathcal {F}\). Most commonly, \(\mathcal {F}\) is taken to be either the quantifier-free or the universally quantified fragment of arithmetic \(\mathcal {A}\), often further extended with arrays.

Fig. 1. A program and its verification conditions in CHC.

Example 1

To illustrate the definitions above, consider a C program of a simple counter shown in Fig. 1. The goal is to verify that the assertion at the end of the program holds on every execution. To verify the assertion using the principle of inductive invariants, we need to show that there exists a formula Inv(x) over program variable x such that (a) it is true before the loop, (b) it is stable at every iteration of the loop, and (c) it guarantees the assertion when the loop terminates. Since we are interested in partial correctness, we are not concerned with the case when the loop does not terminate. This principle is naturally encoded as three Constrained Horn Clauses, shown in Fig. 1. The uninterpreted predicate \( Inv \) represents the inductive invariant. The program is correct, hence the CHCs are satisfiable. The satisfying model extends the theory of arithmetic with the following definition of \( Inv \):

$$\begin{aligned} Inv ^{\mathcal {M}} = \{z \mid z \le 5\} \end{aligned}$$
(3)

The CHCs also have a solution in the quantifier free theory of Linear Integer Arithmetic. In particular, \( Inv \) can be defined as follows:

$$\begin{aligned} Inv = \lambda z \cdot z \le 5 \end{aligned}$$
(4)

where the notation \(\lambda x \cdot \varphi \) denotes a function with argument x and body \(\varphi \).

The CHCs in this example can be expressed as an SMT-LIB script, shown in Fig. 2, and solved by the Spacer engine of Z3. Note that the script uses some Z3-specific extensions, including the logic HORN and several options that disable pre-processing (which is not necessary for such a simple example).

   \(\square \)
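The claim that Eq. (4) is a solution can be sanity-checked by brute force. The sketch below is a bounded check over a finite integer range, not a proof. Fig. 1 is not reproduced here, so the three clauses in the comments are an assumption about a counter that starts at 0, is incremented while it is below 5, and is asserted to equal 5 afterwards. The function names are hypothetical.

```python
def inv(z):
    # Candidate interpretation from Eq. (4): Inv = lambda z . z <= 5
    return z <= 5


def violations(inv, lo=-50, hi=50):
    """List the assumed clauses that `inv` falsifies for x in [lo, hi]."""
    bad = []
    for x in range(lo, hi + 1):
        # Initiation (assumed): x = 0 ==> Inv(x)
        if x == 0 and not inv(x):
            bad.append(("init", x))
        # Consecution (assumed): Inv(x) /\ x < 5 /\ x' = x + 1 ==> Inv(x')
        if inv(x) and x < 5 and not inv(x + 1):
            bad.append(("step", x))
        # Safety (assumed): Inv(x) /\ x >= 5 ==> x = 5
        if inv(x) and x >= 5 and x != 5:
            bad.append(("safe", x))
    return bad


print(violations(inv))                # no clause is falsified
print(violations(lambda z: z <= 4))   # too strong: not closed under the step
```

A candidate that is too weak, such as `lambda z: True`, fails the safety clause instead, which is why the invariant has to sit between the reachable states and the states permitted by the assertion.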

Fig. 2. CHCs from Fig. 1 in SMT-LIB format.

Fig. 3. A program with a function and its verification conditions in CHC.

Example 2

Figure 3 shows a similar program, but with a function that abstracts away the increment operation. The corresponding CHCs are also shown in Fig. 3. There are two unknowns: \( Inv \), which represents the desired inductive invariant, and \( Inc \), which represents the summary (i.e., pre- and post-conditions, or an over-approximation) of the function. Since the program still satisfies the assertion, the CHCs are satisfiable, and have the following solution:

$$\begin{aligned} Inv ^{\mathcal {M}}&= \{z \mid z \le 5 \} = \lambda z \cdot z \le 5 \end{aligned}$$
(5)
$$\begin{aligned} Inc ^{\mathcal {M}}&= \{(z, r) \mid r = z + 1\} = \lambda z, r \cdot r = z + 1 \end{aligned}$$
(6)

The corresponding SMT-LIB script is shown in Fig. 4.    \(\square \)
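With two unknowns, a solution must interpret both predicates at once. The following sketch checks candidate pairs by brute force on a finite range. Fig. 3 is not reproduced here, so the four clauses in the comments are assumed, and the names `violated`, `exact`, `loose`, and `trivial` are hypothetical.

```python
def violated(inv, inc, lo=-20, hi=20):
    """Names of the assumed clauses falsified by (inv, inc) on a finite range."""
    failed = set()
    for x in range(lo, hi + 1):
        # Initiation (assumed): x = 0 ==> Inv(x)
        if x == 0 and not inv(x):
            failed.add("init")
        # Safety (assumed): Inv(x) /\ x >= 5 ==> x = 5
        if inv(x) and x >= 5 and x != 5:
            failed.add("safe")
        for r in range(lo, hi + 2):
            # Summary (assumed): r = x + 1 ==> Inc(x, r)
            if r == x + 1 and not inc(x, r):
                failed.add("summary")
            # Consecution (assumed): Inv(x) /\ x < 5 /\ Inc(x, r) ==> Inv(r)
            if inv(x) and x < 5 and inc(x, r) and not inv(r):
                failed.add("step")
    return failed


inv = lambda z: z <= 5
exact = lambda z, r: r == z + 1      # the function's exact behaviour
loose = lambda z, r: r <= z + 1      # an over-approximating summary
trivial = lambda z, r: True          # too coarse to prove the assertion

print(violated(inv, exact), violated(inv, loose), violated(inv, trivial))
```

Under these assumed clauses, the summary only needs to over-approximate the function, so `loose` passes as well as `exact`, while `trivial` loses too much information for the consecution clause to hold.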

Example 3

In this last example, consider the set of CHCs shown in Fig. 5. They are similar to the CHCs in Fig. 1, with one exception: these CHCs are unsatisfiable. There is no interpretation of \( Inv \) that satisfies them. This is witnessed by a refutation (a resolution proof) shown in Fig. 6. The corresponding SMT-LIB script is shown in Fig. 7.    \(\square \)

3 Solving CHC Modulo Theories

The logic of CHC can be seen as a convenient modelling language. That is, it does not restrict or impose a preference on the decision procedure used to solve the problem. In fact, a variety of solvers and techniques are widely available, including Spacer [28] (which is available as part of Z3), FreqHorn [12], and ELDARICA [22]. There is also an annual competition, CHC-COMP, to evaluate state-of-the-art solvers. In the rest of this section, we give a brief overview of the algorithm underlying Spacer.

Fig. 4. CHCs from Fig. 3 in SMT-LIB format.

Fig. 5. An example of unsatisfiable CHCs.

Spacer is an extension and generalization of SAT-based Model Checking algorithms to CHC modulo SMT-supported theories. On propositional transition systems, Spacer behaves similarly to IC3 [6] and PDR [11], and can be seen as an adaptation of these algorithms. For other first-order theories, Spacer extends Generalized PDR of Hoder and Bjørner [21].

Given a CHC system \(\varPhi \), Spacer works by iteratively looking for a bounded derivation of \(\mathrm {false}\) from \(\varPhi \). It explores \(\varPhi \) in a top-down (or backwards) direction. Each time Spacer fails to find a derivation within a fixed bound N, the reasons for failure are analyzed to derive consequences of \(\varPhi \) that explain why a derivation of \(\mathrm {false}\) must have at least \(N+1\) steps. This process is repeated until either (a) \(\mathrm {false}\) is derived and \(\varPhi \) is shown to be unsatisfiable, (b) the consequences form a solution to \(\varPhi \), thus showing that \(\varPhi \) is satisfiable, or (c) the process continues indefinitely, ruling out the possibility of longer and longer refutations. Thus, even though the problem is in general undecidable, Spacer always makes progress, either showing that \(\varPhi \) is unsatisfiable or showing that there is no short proof of unsatisfiability.

Spacer is a procedure for solving linear and non-linear CHCs. For convenience of the presentation, we restrict ourselves to a special case of non-linear CHCs that consists of the following three clauses:

$$\begin{aligned} Init (X)&\Rightarrow P(X) \end{aligned}$$
(7)
$$\begin{aligned} P(X)&\Rightarrow \lnot Bad (X)\end{aligned}$$
(8)
$$\begin{aligned} P(X) \wedge P(X^o) \wedge Tr (X,X^o,X')&\Rightarrow P(X') \end{aligned}$$
(9)

where X is a set of free variables, \(X' = \{x' \mid x \in X\}\) and \(X^o = \{x^o \mid x \in X\}\) are auxiliary free variables, \( Init \), \( Bad \), and \( Tr \) are FOL formulas over the free variables (as indicated), and P is an uninterpreted predicate. Recall that all free variables in each clause are implicitly universally quantified. Thus, the only unknown to solve for is the uninterpreted predicate P. We call these three clauses a safety problem, and write \(\langle Init (X), Tr (X, X^o, X'), Bad (X) \rangle \) as a shorthand to represent them. It is not hard to show that satisfiability of arbitrary CHCs is reducible to a safety problem. Thus, this simplification does not lose generality. In practice, Spacer directly supports more complex CHCs with multiple unknown uninterpreted predicates.
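To make the shape of a safety problem concrete, the sketch below decides it over a small finite domain by enumerating the facts about P that have a derivation of bounded height. This is a naive bottom-up enumeration for illustration only: it is not Spacer, which works top-down and over SMT-supported theories. The function `solve_safety` and the two toy problems are hypothetical.

```python
def solve_safety(states, init, tr, bad, max_bound=100):
    """Bounded search for a derivation of false in <init, tr, bad>.

    `derived` under-approximates the unknown P with the states that have a
    derivation of height at most n. Returns ("unsat", n) once a derivable
    state is bad, ("sat", P) once `derived` is closed under the clauses and
    contains no bad state, and ("unknown", None) if max_bound is exhausted.
    """
    derived = {s for s in states if init(s)}           # Init(X) => P(X)
    for n in range(max_bound + 1):
        if any(bad(s) for s in derived):               # P and Bad meet: false
            return ("unsat", n)
        # P(X) /\ P(X^o) /\ Tr(X, X^o, X') => P(X')
        new = {t for a in derived for b in derived for t in states
               if tr(a, b, t)} - derived
        if not new:                                    # fixpoint: a solution
            return ("sat", derived)
        derived |= new
    return ("unknown", None)


dom = range(16)
# Non-linear step: two derived states combine into their sum.
print(solve_safety(dom, lambda s: s == 1, lambda a, b, t: t == a + b,
                   lambda s: s == 7))
# Sums modulo 16 starting from 2 stay even, so 7 is never derived.
print(solve_safety(dom, lambda s: s == 2, lambda a, b, t: t == (a + b) % 16,
                   lambda s: s == 7))
```

In the first problem the bound at which 7 appears is the height of a derivation tree (7 from 3 and 4, which come from 1 and 2), which is where the two occurrences of P in the body of the third clause matter.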

Fig. 6. Refutation proof for CHCs in Fig. 5.

Fig. 7. CHCs from Fig. 5 in SMT-LIB format.


Before presenting the algorithm, we need to introduce two concepts from logic: Craig Interpolation and Model Based Projection.

Craig Interpolation. Given two formulas \(A[\vec {x},\vec {z}]\) and \(B[\vec {y},\vec {z}]\) such that \(A \wedge B\) is unsatisfiable, a Craig interpolant \(I[\vec {z}] = \textsc {Itp}(A[\vec {x},\vec {z}], B[\vec {y}, \vec {z}])\), is a formula \(I[\vec {z}]\) such that \(A[\vec {x}, \vec {z}] \Rightarrow I[\vec {z}]\) and \(I[\vec {z}] \Rightarrow \lnot B[\vec {y},\vec {z}]\). We further require that the interpolant is a clause. Intuitively, the interpolant I captures the consequences of A that are inconsistent with B. If A is a conjunction of literals, the interpolant can be seen as a semantic variant of an UNSAT core.
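The two implications in this definition are easy to test on a small example. The sketch below checks them by brute force over a finite integer range. The formulas A and B and the helper `is_interpolant` are hypothetical, and the check says nothing about how an SMT solver actually computes interpolants. The candidate takes only the shared variable z as an argument, which enforces the vocabulary restriction.

```python
from itertools import product


def is_interpolant(a, b, itp, domain):
    """Check A(x, z) => I(z) and I(z) => not B(y, z) over a finite domain."""
    a_implies_i = all(itp(z) for x, z in product(domain, repeat=2) if a(x, z))
    i_refutes_b = all(not b(y, z) for y, z in product(domain, repeat=2) if itp(z))
    return a_implies_i and i_refutes_b


dom = range(-10, 11)
A = lambda x, z: x <= 0 and z <= x    # A[x, z]; x is local to A
B = lambda y, z: y >= 1 and z >= y    # B[y, z]; y is local to B

print(is_interpolant(A, B, lambda z: z <= 0, dom))    # a valid interpolant
print(is_interpolant(A, B, lambda z: z <= 5, dom))    # too weak: overlaps B
print(is_interpolant(A, B, lambda z: z <= -3, dom))   # too strong: A does not imply it
```

Here \(z \le 0\) is a single literal, so it also meets the requirement that the interpolant be a clause.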

Model Based Projection. Let \(\varphi \) be a formula, \(U \subseteq Vars (\varphi )\) a subset of variables of \(\varphi \), and P a model of \(\varphi \). Then, \(\psi = \textsc {Mbp}(U, P, \varphi )\) is a model based projection if (a) \(\psi \) is a monomial (i.e., a conjunction of literals), (b) \( Vars (\psi ) \subseteq Vars (\varphi ) \setminus U\), (c) \(P \models \psi \), (d) \(\psi \Rightarrow \exists U \cdot \varphi \). Intuitively, an MBP is an under-approximation of existential quantifier elimination, where the choice of the under-approximation is guided by the model.
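A small example shows how the model steers the projection. The sketch below does not implement an MBP procedure. It only checks, by brute force on a finite range, that a hand-picked monomial holds in the given model and implies the existential projection. The formula `phi`, the helper `is_mbp`, and the models are hypothetical, and the single eliminated variable is named u.

```python
def phi(u, a, b):
    # phi[u, a, b] = (u = a \/ u = b) /\ u > 0
    return (u == a or u == b) and u > 0


def is_mbp(phi, psi, model, domain):
    """Check that psi(a, b) holds in the model and implies exists u . phi."""
    u, a, b = model
    assert phi(u, a, b), "the model must satisfy phi"
    holds_in_model = psi(a, b)
    under_approx = all(any(phi(w, p, q) for w in domain)
                       for p in domain for q in domain if psi(p, q))
    return holds_in_model and under_approx


dom = range(-5, 6)
m1 = (1, 1, -1)    # u = a holds in this model, so project to a > 0
m2 = (2, -1, 2)    # u = b holds in this model, so project to b > 0

print(is_mbp(phi, lambda a, b: a > 0, m1, dom))
print(is_mbp(phi, lambda a, b: b > 0, m2, dom))
print(is_mbp(phi, lambda a, b: a > 0, m2, dom))   # fails: false in the model
```

Exact quantifier elimination of u would give the disjunction \(a > 0 \vee b > 0\). Each model selects one disjunct, which keeps the result a monomial at the price of under-approximating.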

We present Spacer [27] as a set of rules shown in Algorithm 1. While the algorithm is sound under any order of application of the rules, it is easy to see that only some orders lead to progress. Since solving CHCs even over LIA is undecidable, we are only concerned with soundness and progress, and do not discuss termination. The algorithm is based on the core principles of IC3 [5]; however, it differs significantly in the details. The rules Unreachable and Reachable detect termination, either by discovering an inductive solution, or by discovering the existence of a refutation, respectively. Unfold increases the exploration depth, and Candidate constructs a new proof obligation based on the current depth and the set \( Bad \) of bad states. Successor computes additional reachable states, that is, an under-approximation of the model of the implicit predicate P. Note that it uses Model Based Projection to under-approximate the forward predicate transformer. The rules MustPredecessor and MayPredecessor compute a new proof obligation that precedes an existing one. MustPredecessor does the computation based on existing reachable states, while MayPredecessor makes a guess based on the existing over-approximation of P. In this case, MBP is used again, but now to under-approximate a backward predicate transformer. The rule NewLemma computes a new over-approximation, called a lemma, of what is derivable about P at level \(i+1\) by blocking a proof obligation. This is very similar to the corresponding step in IC3. Note, however, that interpolation is used to generalize the learned lemma beyond the literals of the proof obligation. ReQueue allows pushing blocked proof obligations to a higher level, and Push allows pushing and inductively generalizing lemmas.

Spacer was introduced in [27]. An extension for convex linear arithmetic (i.e., discovering convex and co-convex solutions) is described in [3]. Support for quantifier-free solutions for CHC over the combined theories of arrays and arithmetic is described in [26]. An extension for quantified solutions, which are necessary for establishing interesting properties when arrays are involved, is described in [20]. More recently, the interpolation for lemma generalization has been replaced by more global guidance [14]. This made Spacer competitive with other data-driven approaches that infer new lemmas based on numerical values of blocked counterexamples. Machine Learning-based inductive generalization has been suggested in [29]. The solver has also been extended to support Algebraic Data Types and Recursive Functions [16]. Work on improving support for bit-vectors [15] and experimenting with support for uninterpreted functions is ongoing.