
1 Introduction

Overview. Program specifications are often written in expressive, high-level languages: for instance, in temporal logic [14], in first-order logic with quantifiers [28], in separation logic [40], or in specification languages that provide extended quantifiers for computing the sum or maximum value of array elements [7, 33]. Specifications commonly also use a rich set of theories; for instance, specifications could be written using full Peano arithmetic, as opposed to bit-vectors or linear arithmetic used in the program. Rich specification languages make it possible to express intended program behaviour in a succinct form, and as a result reduce the likelihood of mistakes being introduced in specifications.

There is a gap, however, between the languages used in specifications and the input languages of automatic verification tools. Software model checkers, in particular, usually require specifications to be expressed using program assertions and Boolean program expressions, and do not directly support any of the more sophisticated language features mentioned. In fact, rich specification languages are challenging to handle in automatic verification, since satisfiability checks can become undecidable (i.e., it is no longer decidable whether assertion failures can occur on a program path), and techniques for inferring program invariants usually focus on simple specifications only.

To bridge this gap, it is common practice to encode high-level specifications in the low-level assertion languages understood by the tools. For instance, temporal properties can be translated to Büchi automata, and added to programs using ghost variables and assertions [14]; quantified properties can be replaced with non-determinism, ghost variables, or loops [13, 37]; sets used to specify the absence of data-races can be represented using non-deterministically initialized variables [18]. By adding ghost variables and bespoke ghost code to programs [22], many specifications can be made effectively checkable.

The translation of specifications to assertions or ghost code is today largely designed, or even carried out, by hand. This is an error-prone process, and for complex specifications and programs it is very hard to ensure that the low-level encoding of a specification faithfully models the original high-level properties to be checked. Mistakes have been found even in industrial, very carefully developed specifications [39], and can result in assertions that are vacuously satisfied by any program. Naturally, the manual translation of specifications also tends to be an ad-hoc process that does not easily generalise to other specifications.

This paper proposes the first general framework to automate the translation of rich program specifications to simpler program assertions, using a process called instrumentation. Our approach models the semantics of specific complex operations using program-independent instrumentation operators, consisting of (manually designed) rewriting rules that define how the evaluation of the operator can be achieved using simpler program statements and ghost variables. The instrumentation approach is flexible enough to cover a wide range of different operators, including operators that are best handled by weaving their evaluation into the program to be analysed. While instrumentation operators are manually written, their application to programs can be performed in a fully automatic way by means of a search procedure. The soundness of an instrumentation operator is shown formally, once and for all, by providing an instrumentation invariant that ensures that the operator can never be used to show correctness of an incorrect program.

Additional instrumentation operator definitions, correctness proofs, and detailed evaluation results can be found in the accompanying extended report [4].

Fig. 1. Program computing triangular numbers, and its instrumented counterpart

Motivating Example. We illustrate our approach on the computation of triangular numbers \(s_N = (N^2 + N) / 2\); see the left-hand side of Fig. 1. For reasons of presentation, the program has been normalised by representing the square \(\texttt{N*N}\) using an auxiliary variable \(\texttt{x}\). While mathematically simple, verifying the post-condition of the program turns out to be challenging even for state-of-the-art model checkers, as such tools are usually thrown off course by the non-linear term \(\texttt{N*N}\). Computing the value of \(\texttt{N*N}\) by adding a loop in line 16 is not sufficient for most tools either, since the program in any case requires a non-linear invariant to be derived for the loop in lines 4–12.

The insight needed to elegantly verify the program is that the value \(\texttt{i} \cdot \texttt{i}\) can be tracked during the program execution using a ghost variable \(\mathtt{x\_sq}\). For this, the program is instrumented to maintain the relationship \(\mathtt{x\_sq} = \texttt{i} \cdot \texttt{i}\): initially, both \(\texttt{i}\) and \(\mathtt{x\_sq}\) are 0, and each time the value of \(\texttt{i}\) is modified, the variable \(\mathtt{x\_sq}\) is updated accordingly. With the value \(\texttt{i} \cdot \texttt{i}\) available, both the loop invariant and the post-condition turn into formulas over linear arithmetic, and program verification becomes largely straightforward. The challenge, of course, is to discover this program transformation automatically, and to guarantee the soundness of the process. For the example, the transformed program is shown on the right-hand side of Fig. 1, and discussed in the next paragraphs.
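To make the transformation concrete, the following executable sketch (in Python rather than the core language of Sect. 2) mirrors the instrumented program; the ghost-variable names follow Fig. 2, while the concrete guard and post-condition are plausible reconstructions rather than the literal code of Fig. 1:

```python
# Sketch of the instrumented triangular-number program, maintaining the
# invariant x_sq == x_shad * x_shad using purely linear updates.
def triangular(N):
    assert N >= 1
    i, s = 0, 0
    x_sq, x_shad = 0, 0                # ghost variables, initial value 0
    while i < N:                       # loop of lines 4-12
        i = i + 1
        x_sq = x_sq + 2 * x_shad + 1   # (v+1)^2 == v^2 + 2v + 1
        x_shad = x_shad + 1            # shadow of the tracked variable i
        s = s + i
    assert x_shad == N                 # guard detecting missed updates
    x = x_sq                           # replaces the non-linear x = N*N
    assert 2 * s == x + N              # post-condition s == (N*N + N)/2
    return s
```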

Fig. 2. Definition of an instrumentation operator \(\varOmega _{square}\) for tracking squares

Our method splits the process of program instrumentation into two parts: (i) choosing an instrumentation operator, which is defined manually, designed to be program-independent, and induces a space of possible program transformations; and (ii) carrying out an automatic application strategy to find, among the possible program transformations, one that enables verification of a program.

An instrumentation operator for tracking squares is shown in Fig. 2. It consists of the declaration of two ghost variables (x_sq, x_shad), each with initial value 0; four rules for rewriting program statements; and the instrumentation invariant witnessing correctness of the operator. The rewrite rules use formal meta-variables \(\texttt{x}\), \(\texttt{y}\), which can represent arbitrary variables in the program. An application of the operator to a program declares the ghost variables as global variables, and then rewrites some chosen set of program statements using the provided rules. Since the statements to be rewritten can be chosen arbitrarily, and since moreover multiple rewrite rules might apply to some statements, rewriting can result in many different variants of a program. In the example, we rewrite the assignments C, D of the left-hand side program using rewrite rules (R2) and (R4), respectively, resulting in the instrumented and correct program on the right-hand side.

Instrumentation operators are designed to be sound, which means that rewriting a wrong selection of program statements might lead to an instrumented program that cannot be verified, i.e., in which assertions might fail, but instrumentation can never turn an incorrect source program into a correct instrumented program. This opens up the possibility to systematically search for the right program instrumentation. We propose a counterexample-guided algorithm for this purpose, which starts from some arbitrarily chosen instrumentation, checks whether the instrumented program can be verified, and otherwise attempts to fix the instrumentation using a refinement loop. As soon as a verifiable instrumented program has been found, the search can stop and the correctness of the original program has been shown.

The concept of instrumentation invariants is essential for guaranteeing soundness of an operator. Instrumentation invariants are formulas that can (only) refer to the ghost variables introduced by an instrumentation operator, and are formulated in such a way that they hold in every reachable state of every instrumented program. To maintain their invariants, instrumentation operators use shadow variables that duplicate the values of program variables. In the operator in Fig. 2, the purpose of the shadow variable \(\mathtt{x\_shad}\) is to reproduce the value of the program variable whose square is tracked (\(\texttt{i}\) in Fig. 1). The rewriting rules introduce guards to detect incorrect instrumentation (the assertions in (R2), (R3), (R4)), covering the cases in which some update of a relevant variable was missed and not correctly instrumented. The use of shadow variables and guards makes instrumentation operators very flexible; in our example, note that the instrumentation tracks the square of the value of \(\texttt{i}\) during the loop, but is also used later to simplify the expression \(\texttt{N*N}\). This is possible because of the instrumentation invariant, and because \(\texttt{i} = \texttt{N}\) holds after termination of the loop, which is verified through the assertion introduced in line 14.

Contributions and Outline. The operator shown in Fig. 2 is simple, and does not apply to all programs, but it can easily be generalised to other arithmetic operators and program statements. The framework presented in this paper provides the foundation for developing an (extendable) library of formally verified instrumentation operators. In the scope of this paper, we focus on two specification constructs that have been identified as particularly challenging in the literature: existential and universal quantifiers over arrays, and aggregation (or extended quantifiers), which includes computing the sum or maximum value of the elements of an array. Our experiments on benchmarks taken from SV-COMP [8] show that even relatively simple instrumentation operators can significantly extend the capabilities of a software model checker, and often make the automatic verification of otherwise hard specifications easy.

The contributions of the paper are: (i) a general framework for program instrumentation, which defines a space of program transformations that work by rewriting individual statements (Sect. 2); (ii) an algorithm for searching this space for an application strategy that enables verification of a given program (Sect. 3); (iii) two instantiations of the framework: one for instrumentation operators to handle specifications with quantifiers (Sect. 4.1), and one for extended quantifiers (Sect. 4.2); (iv) machine-checked proofs of the correctness of the instrumentation operators for the quantifier \(\forall \) and the extended quantifier \max; (v) a new verification tool, MonoCera, that is tailored to the verification of programs with aggregation; and (vi) an evaluation of our method and tool on a set of examples, including benchmarks from SV-COMP [8] (Sect. 5).

2 Instrumentation Framework

The next two sections formally introduce the instrumentation framework. Later, we instantiate the framework for quantification and aggregation over arrays. We split the instrumentation process into two parts:

  1. An instrumentation operator, which defines how to rewrite program statements with the purpose of eliminating language constructs that are difficult to reason about automatically, but leaves the choice of which occurrences of these statements to rewrite to the second part (this section).

  2. An application strategy for the instrumentation operator, which can be implemented using heuristics or systematic search, among others. The strategy is responsible for selecting the right (if any) program instrumentation from the many possible ones (Sect. 3).

Even though instrumentation operators are non-deterministic, we shall guarantee their soundness: if the original program has a failing assertion, so will any instrumented program, regardless of the chosen application strategy; that is, instrumentation of an incorrect program will never yield a correct program.

We shall also guarantee a weak form of completeness, to the effect that if an assertion that has not been added to the program by the instrumentation fails in the instrumented program, then it will also fail in the original program. As a result, any counterexample (for such an assertion) produced when verifying the instrumented program can be transformed into a counterexample for the original program.

Table 1. Syntax of the core language.

2.1 The Core Language

While our implementation works on programs represented as constrained Horn clauses [12], i.e., is language-agnostic, for readability purposes we present our approach in the setting of an imperative core programming language with data-types for unbounded integers, Booleans, and arrays, and assert and assume statements. The language is deliberately kept simple, but is still close to standard C. The main exception is the semantics of arrays: they are defined here to be functional and therefore represent a value type. Arrays have integers as index type and are unbounded, and their signature and semantics are otherwise borrowed from the SMT-LIB theory of extensional arrays [6]:

  • Reading the value of an array a at index i:      select(a, i);

  • Updating an array a at index i with a new value x:      store(a, i, x).

The complete syntax of the core language is given in Table 1. Programs are written using a vocabulary \(\mathcal {X}\) of typed program variables; the typing rules of the language are given in [4]. As syntactic sugar, we sometimes write a[i] instead of select(a, i), and a[i] = x instead of a = store(a, i, x).

We denote by \(D_{\sigma }\) the domain of a program type \(\sigma \). The domain of an array type \(\texttt{Array}~\sigma \) is the set of functions \(f: \mathbbm {Z} \rightarrow D_{\sigma }\).
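The following sketch illustrates this value semantics (Python; the default value 0 for unwritten indices is an assumption made only for executability, since the formal semantics leaves such entries unconstrained):

```python
# Functional arrays: store returns a *new* array value.
def select(a, i):
    return a.get(i, 0)

def store(a, i, x):
    b = dict(a)          # copy; the original array value is unchanged
    b[i] = x
    return b

a0 = {}
a1 = store(a0, 3, 42)
assert select(a1, 3) == 42 and select(a0, 3) == 0   # a0 is unaffected
```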

Semantics. We assume the Flanagan-Saxe extended execution model of programs with assume and assert statements (see, e.g., [23]), in which executing an assert statement with an argument that evaluates to false fails, i.e., terminates abnormally. An assume statement with an argument that evaluates to false has the same semantics as a non-terminating loop. Partial correctness properties of programs are expressed using Hoare triples \(\{ Pre \} \;P\; \{ Post \}\), which state that an execution of P, starting in a state satisfying \( Pre \), never fails, and may only terminate in states that satisfy \( Post \). As usual, a program P is considered (partially) correct if the Hoare triple \(\{ true \} \;P\; \{ true \}\) holds.
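A toy rendering of this execution model (a sketch; the two outcomes are modelled as distinct Python exceptions) may be helpful:

```python
class Failure(Exception): pass   # abnormal termination of an execution
class Blocked(Exception): pass   # stands in for divergence

def assert_(e):
    if not e: raise Failure      # a failing assert terminates abnormally

def assume(e):
    if not e: raise Blocked      # like a non-terminating loop: the run
                                 # neither fails nor reaches a final state
```

Under partial correctness, blocked runs satisfy every Hoare triple vacuously, which is exactly the intended reading of assume.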

The evaluation of program expressions is modelled using a function \(\llbracket \cdot \rrbracket \!\!_s\) that maps program expressions t of type \(\sigma \) to their value \(\llbracket t \rrbracket \!\!_s \in D_{\sigma }\) in the state s.

2.2 Instrumentation Operators

An instrumentation operator defines schemes to rewrite programs while preserving the meaning of the existing program assertions. Without loss of generality, we restrict program rewriting to assignment statements. Instrumentation can introduce ghost state by adding arbitrary fresh variables to the program. The main part of an instrumentation consists of rewrite rules, which are schematic rules \(r \;\texttt {=}\; t \leadsto s\), where the meta-variable r ranges over program variables, t is an expression that can contain further meta-variables, and s is a schematic program in which the meta-variables from \(r \;\texttt {=}\; t\) might occur. Any assignment that matches \(r \;\texttt {=}\; t\) can be rewritten to s.

Definition 1

(Instrumentation Operator). An instrumentation operator is a tuple \(\varOmega = (G, R, I)\), where:

  (i) \(G = \langle (\texttt{x}_1, init _1), \ldots , (\texttt{x}_k, init _k) \rangle \) is a tuple of pairs of ghost variables and their initial values;

  (ii) R is a set of rewrite rules \(r \;\texttt {=}\; t \leadsto s\), where s is a program operating on the ghost variables \(\texttt{x}_1, \ldots , \texttt{x}_k\) (and containing meta-variables from \(r \;\texttt {=}\; t\));

  (iii) I is a formula over the ghost variables \(\texttt{x}_1, \ldots , \texttt{x}_k\), called the instrumentation invariant.

The rewrite rules R and the invariant I must adhere to the following constraints:

  1. The instrumentation invariant I is satisfied by the initial ghost values, i.e., it holds in the state \(\{\texttt{x}_1 \mapsto init _1, \ldots , \texttt{x}_k \mapsto init _k\}\).

  2. For all rewrite rules \(r \;\texttt {=}\;t \leadsto s \in R\), the following hold:

     (a) s terminates (normally or abnormally) for pre-states satisfying I, assuming that all meta-variables are ordinary program variables.

     (b) s does not assign to variables other than r or the ghost variables \(\texttt{x}_1, \ldots , \texttt{x}_k\).

     (c) s preserves the instrumentation invariant: \(\{ I \}\ s'\ \{ I \}\) holds, where \(s'\) is s with every \(\texttt {assert(}e\texttt {)}\) statement replaced by an \(\texttt {assume(}e\texttt {)}\) statement.

     (d) s preserves the semantics of the assignment \(r \;\texttt {=}\;t\): the Hoare triple \(\{ I \} \;\texttt {z} \;\texttt {=}\;t\texttt {;}\;s' \; \{ \texttt {z} = r \}\) holds, where \(\texttt {z}\) is a fresh variable.

The conditions imposed in the definition ensure that all instrumentations are correct, in the sense that they are sound and weakly complete, as we show below. In particular, the instrumentation invariant guarantees that the rewrites of program statements are semantics-preserving w.r.t. the original program, and thus, the execution of any statement of the original program has the same effect before and after instrumentation. Observe that the conditions can themselves be deductively verified to hold for each concrete instrumentation operator, and that this check is independent of the programs to be instrumented, so that an instrumentation operator can be proven correct once and for all.

An instrumentation operator \(\varOmega \) does not itself define which occurrences of program statements are to be rewritten, but only how they are rewritten. Given a program P and the operator \(\varOmega \), an instrumented program \(P'\) is derived by carrying out the following two steps: (i) variables \(\texttt{x}_1, \ldots , \texttt{x}_k\) and the assignments \(\texttt{x}_1 \;\texttt {=}\; init _1\texttt {;}\;\ldots \texttt {;}\;\texttt{x}_k \;\texttt {=}\; init _k\) are added at the beginning of the program, and (ii) some of the assignments in P, to which a rewrite rule \(r \;\texttt {=}\;t \leadsto s\) in \(\varOmega \) is applicable, are replaced by s, substituting meta-variables with the actual terms occurring in the assignment. We denote by \(\varOmega (P)\) the set of all instrumented programs \(P'\) that can be derived in this way. An example of an instrumentation operator and its application was shown in Fig. 1 and Fig. 2.
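The two steps can be rendered as a small Python sketch (illustrative data structures only, not the CHC-based implementation of Sect. 5); a rule is modelled as a pair of a match predicate and a rewrite function returning the replacement statements s:

```python
from itertools import product

class Operator:
    """An instrumentation operator (G, R, I) as in Definition 1."""
    def __init__(self, ghosts, rules, invariant):
        self.ghosts, self.rules, self.invariant = ghosts, rules, invariant

def instrumentations(op, program):
    """Enumerate Omega(P) for a program given as a statement list."""
    options = []
    for stmt in program:
        opts = [[stmt]]                                    # leave unchanged
        opts += [rw(stmt) for (m, rw) in op.rules if m(stmt)]
        options.append(opts)
    prologue = [f"{x} = {v};" for (x, v) in op.ghosts]     # step (i)
    for choice in product(*options):                       # step (ii)
        yield prologue + [s for block in choice for s in block]

# An R4-like rule of the square operator (illustrative concrete syntax):
rules = [(lambda s: s == "x = N * N;",
          lambda s: ["assert(x_shad == N);", "x = x_sq;"])]
op = Operator([("x_sq", 0), ("x_shad", 0)], rules, "x_sq == x_shad^2")
for P in instrumentations(op, ["s = s + i;", "x = N * N;"]):
    print(P)   # prints the unchanged and the rewritten variant
```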

2.3 Instrumentation Correctness

Verification of an instrumented program produces one of two possible results: a witness if verification is successful, or a counterexample otherwise. A witness consists of the inductive invariants needed to verify the program, and is presented in the context of the programming language: it is translated back from the back-end theory used by the verification tool, and is a formula over the program variables and the ghost variables added during instrumentation. A counterexample is an execution trace leading to a failing assertion.

Definition 2

(Soundness). An instrumentation operator \(\varOmega \) is called sound if for every program P and instrumented program \(P' \in \varOmega (P)\), whenever there is an execution of P where some assert statement fails, then there also is an execution of \(P'\) where some assert statement fails.

Equivalently, the existence of a witness for an instrumented program entails the existence of a witness for the original program, in the form of a set of inductive invariants solely over the program variables. Notably, because the rewrites are semantics-preserving under the instrumentation invariant, a witness for the original program can be derived from one for the instrumented program. One such back-translation is to conjoin the instrumentation invariant to the witness obtained for the instrumented program, and to existentially quantify over the ghost variables.

Example. To illustrate the back-translation, we return to the instrumentation operator from Fig. 2 and the example program from Fig. 1. The witness produced by our verification tool in this case is the formula:

$$\begin{aligned} \texttt{i} = \mathtt {x\_shad} \wedge \mathtt {x\_sq} + \mathtt {x\_shad} = 2\texttt{s} \wedge \texttt{N} \ge \texttt{i} \wedge \texttt{N} \ge 1 \wedge 2\texttt{s} \ge \texttt{i} \wedge \texttt{i} \ge 0 \end{aligned}$$

After conjoining the instrumentation invariant \(\mathtt {x\_sq} = \mathtt {x\_shad}^2\) and existentially quantifying over the involved ghost variables, we obtain an inductive invariant that is sufficient to verify the original program:

$$\begin{aligned} \exists x_\textrm{sq}, x_\textrm{shad}.\; \big ( \texttt{i} = x_\textrm{shad} \wedge x_\textrm{sq} + x_\textrm{shad} = 2\texttt{s} \wedge {} \\ \texttt{N} \ge \texttt{i} \wedge \texttt{N} \ge 1 \wedge 2\texttt{s} \ge \texttt{i} \wedge \texttt{i} \ge 0 \wedge x_\textrm{sq} = x_\textrm{shad}^2 \big ) \end{aligned}$$

Definition 3

(Weak Completeness). The operator \(\varOmega \) is called weakly complete if for every program P and instrumented program \(P' \in \varOmega (P)\), whenever an assert statement that has not been added to the program by the instrumentation fails in the instrumented program \(P'\), then it also fails in the original program P.

Similarly to the back-translation of invariants, when verification fails, counterexamples for assertions of the original program, found during verification of the instrumented program, can be translated back to counterexamples for the original program. We thus obtain the following result.

Theorem 1

(Soundness and weak completeness). Every instrumentation operator \(\varOmega \) is sound and weakly complete.

Proof

Let \(\varOmega = (G, R, I)\) be an instrumentation operator. Since I is a formula over ghost variables only, which holds initially and is preserved by all rewrites, I is an invariant of the fully instrumented program. This entails that rewrites of assignments are semantics-preserving. Furthermore, since instrumentation code only assigns to ghost variables or to r (i.e., the left-hand side of the original statement), program variables have the same valuation in the instrumented program as in the original one. Finally, since all rewrites are terminating under I, the instrumented program terminates if and only if the original program does.

In the case when verification succeeds, and a witness is produced, weak completeness follows vacuously. A witness consists of the inductive invariants sufficient to verify the instrumented program. Thus, they are also sufficient to verify the assertions existing in the original program, since assertions are not rewritten and all program variables have the same valuation in the original and the instrumented programs. Since a witness for the instrumented program can be back-translated to a witness for the original program, any failing assertion in the original program must also fail after instrumentation, and \(\varOmega \) is therefore sound.

In the case when verification fails, soundness follows vacuously, and if the failing assertion was added during instrumentation, weak completeness follows as well. If the assertion existed in the original program, then, since such assertions are not rewritten, and since program variables have the same valuation in the instrumented program as in the original program, any counterexample for the instrumented program is also a counterexample for the original program, when projected onto the program variables. \(\square \)


3 Instrumentation Application Strategies

We will now define a counterexample-guided search procedure to discover applications of instrumentation operators that make it possible to verify a program.

For our algorithm, we assume that we are given an oracle \( IsCorrect \) that is able to check the correctness of programs after instrumentation. Such an oracle could be approximated, for instance, using a software model checker. The oracle is free to ignore the complex functions we are trying to eliminate by instrumentation; for instance, in Fig. 1, the oracle can over-approximate the term \(\texttt{N*N}\) by assuming that it can have any value. We further assume that C is the set of control points of a program P corresponding to the statements to which a given set of instrumentation operators can be applied. For each control point \(p \in C\), let Q(p) be the set of rewrite rules applicable to the statement at p, including a distinguished value \(\bot \) that expresses that p is not modified. For the program in Fig. 1, for instance, the choices could be defined by \(Q(\texttt{A}) = Q(\texttt{B}) = \{ \mathrm {(R1)} , \bot \}\), \(Q(\texttt{C}) = \{ \mathrm {(R2)} , \bot \}\), and \(Q(\texttt{D}) = \{ \mathrm {(R4)} , \bot \}\), referring to the rules in Fig. 2. Any function \(r : C \rightarrow \bigcup _{p \in C} Q(p)\) with \(r(p) \in Q(p)\) then defines one possible program instrumentation. We denote the set of such well-typed functions \(C \rightarrow \bigcup _{p \in C} Q(p)\) by R, and the program obtained by rewriting P according to \(r \in R\) by \(P_r\). We further denote the control point in \(P_r\) corresponding to some \(p \in C\) in P by \( ins _r (p)\).

Algorithm 1 presents our algorithm to search for instrumentations that are sufficient to verify a program P. The algorithm maintains a set \( Cand \subseteq R\) of remaining ways to instrument P, and in each iteration considers one of the remaining elements \(r \in Cand \) (line 4). If the oracle manages to verify \(P_r\) in line 5, the correctness of P has been shown, due to the soundness of instrumentation (line 6); if \(P_r\) is incorrect, there has to be a counterexample ending with a failing assertion (line 8). There are two possible causes of assertion failures: if the failing assertion in \(P_r\) already existed in P, then, due to the weak completeness of instrumentation, P has to be incorrect as well (line 10). Otherwise, the program instrumentation has to be refined, and for this we remove from \( Cand \) all instrumentations \(r'\) that agree with r regarding the instrumentation of the statements occurring in the counterexample (line 13).
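Since the pseudo-code listing of Algorithm 1 is not reproduced here, the following Python sketch captures its structure; the oracle is_correct, the function instrument, and the fields of the returned counterexample are assumed interfaces, not part of the formal development:

```python
from itertools import product

def search(P, C, Q, is_correct, instrument):
    """Counterexample-guided search for an instrumentation r : C -> Q(p)."""
    cand = [dict(zip(C, ch)) for ch in product(*(Q[p] for p in C))]
    while cand:
        r = cand.pop()                          # pick a candidate (line 4)
        res = is_correct(instrument(P, r))      # oracle call (line 5)
        if res.verified:
            return ("Correct", r)               # soundness (line 6)
        cex = res.counterexample                # failing trace (line 8)
        if cex.fails_original_assertion:
            return ("Incorrect", cex)           # weak completeness (line 10)
        # Refine: drop all candidates that agree with r on the rewritten
        # statements occurring on the counterexample trace (line 13).
        pts = [p for p in C if p in cex.points]
        cand = [r2 for r2 in cand if any(r2[p] != r[p] for p in pts)]
    return ("Inconclusive", None)
```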

Since R is finite, and at least one element of \( Cand \) is eliminated in each iteration, the refinement loop terminates. The set \( Cand \) can be exponentially large, however, and should therefore be represented symbolically (using BDDs, or using an SMT solver managing the set of blocking constraints from line 13).

We can observe soundness and completeness of the algorithm w.r.t. the considered instrumentation operators (proof in [4]):

Lemma 1

(Correctness of Algorithm 1). If Algorithm 1 returns an instrumentation \(r \in R\), then \(P_r\) and P are correct. If Algorithm 1 returns \( Incorrect \), then P is incorrect. If there is \(r \in R\) such that \(P_r\) is correct, then Algorithm 1 will return \(r'\) such that \(P_{r'}\) is correct.

4 Instrumentation Operators for Arrays

4.1 Instrumentation Operators for Quantification over Arrays

Table 2. Extension of the core language with quantified expressions.

To handle quantifiers in a programming setting, we extend the language defined in Table 1 with quantified expressions over arrays, as shown in Table 2. We also extend the language with a lambda expression over two variables: many quantified properties can be expressed as a binary predicate whose first argument corresponds to the value of an array element and whose second argument corresponds to its index. This allows us to express properties over both the value of an element and its index; for example, that each element should be equal to its index, as in the example program in Fig. 3. In that program, each element of the array is assigned the value of its index, after which it is asserted that this property indeed holds.

Using \(\texttt {P(x}_0\texttt {,i}_0\texttt {)}\) as shorthand for \(\texttt {(}\mathtt {\lambda }{} \texttt {(x,i).P)(x}_0\texttt {,i}_0\texttt {)}\), the new expressions can be defined formally as:

$$\begin{aligned} \llbracket \texttt {forall}{} \texttt {(a, l, u, }\lambda \texttt {(x,i).P}{} \texttt {)} \rrbracket \!\!_s&~=~ \forall \mathtt {i \in [l, u).}\; \llbracket \texttt {P(a[i],i)} \rrbracket \!\!_s \\ \llbracket \texttt {exists}{} \texttt {(a, l, u, }\lambda \texttt {(x,i).P}{} \texttt {)} \rrbracket \!\!_s&~=~ \exists \mathtt {i \in [l, u).}\; \llbracket \texttt {P(a[i],i)} \rrbracket \!\!_s \end{aligned}$$

Note that the types of x and a must be compatible, and that P must be a Boolean-valued expression.
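A direct executable reading of these definitions (Python, with arrays as lists):

```python
def forall(a, l, u, P):
    return all(P(a[i], i) for i in range(l, u))

def exists(a, l, u, P):
    return any(P(a[i], i) for i in range(l, u))

a = [0, 1, 2]
assert forall(a, 0, 3, lambda x, i: x == i)   # each element equals its index
assert exists(a, 0, 3, lambda x, i: x == 2)
assert forall(a, 3, 3, lambda x, i: False)    # empty interval: trivially true
```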

Fig. 3. Example of a program to be verified using a quantified assert statement

Fig. 4. Definition of an instrumentation operator for universal quantification

To handle programs such as the one in Fig. 3, we turn to the instrumentation framework outlined in Sect. 2.2, which we use here to define an instrumentation operator for universal quantification. The general idea is to instrument programs with a ghost variable tracking whether some predicate holds for all elements in an interval of the array, with shadow variables representing the tracked array and the bounds of the interval. Naturally, an instrumentation operator for existential quantification can be defined in a similar fashion. For simplicity, we shall assume a normal form of programs, into which every program can be rewritten by introducing additional variables. In the normal form, select, store, and quantified expressions can only occur in simple assignment statements; for example, stores are restricted to occur in statements of the form \(\texttt{a = store(a, i, x)}\).

Over such normalised programs, and for a universally quantified expression \(\texttt {forall(a, l, u, }\lambda \texttt {(x,i).P)}\), we define the instrumentation operator \(\varOmega _{\forall , P} = (G_{ \forall , P }, R_{ \forall , P }, I_{ \forall , P })\) as shown in Fig. 4, over four ghost variables. The array over which quantification occurs is tracked by \(\mathtt {qu\_ar}\), and the variables \(\mathtt {qu\_lo}\), \(\mathtt {qu\_hi}\) represent the bounds of the currently tracked interval. The result of the quantified expression is tracked by \(\mathtt {qu\_P}\), whose value is \( true \) iff \(\texttt{P}\) holds for all elements of \(\texttt{a}\) in the interval \([\mathtt {qu\_lo, qu\_hi})\). The rewrite rules for stores, selects, and assignments of universally quantified expressions are then defined as follows. For stores, the first if-branch resets the tracking to the one-element interval \([\texttt{i}, \mathtt {i+1})\) when accessing elements far outside of the currently tracked interval, or if we are tracking the empty interval (as is the case at initialisation). If an access occurs immediately adjacent to the currently tracked interval (e.g., if \(\texttt{i} = \mathtt {qu\_lo-1}\)), then that element is added to the tracked interval, and the value of \(\mathtt {qu\_P}\) is updated to also account for the value of \(\texttt{P}\) at index \(\texttt{i}\). If instead the access is within the tracked interval, then we either reset the interval (if \(\mathtt {qu\_P}\) is \(\texttt{false}\)) or keep the interval unchanged (if \(\mathtt {qu\_P}\) is \(\texttt{true}\)). Rewrites of selects are similar to stores, except that tracking does not need to be reset when reading inside the tracked interval. For rewrites of quantified expressions, if the quantified interval is empty, \(\texttt{b}\) is assigned \(\texttt{true}\). Otherwise, assertions check that the tracked interval matches the quantified interval, before assigning the value of \(\mathtt {qu\_P}\) to \(\texttt{b}\). If \(\mathtt {qu\_P}\) is \(\texttt{true}\), then it is sufficient that quantification occurs over a sub-interval of the tracked interval, and vice versa if \(\mathtt {qu\_P}\) is \(\texttt{false}\).
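Since Fig. 4 is not reproduced here, the case analysis for stores can be sketched as follows (Python; the ghost state is a dictionary g, the shadow array \(\mathtt{qu\_ar}\) and the corresponding guards are omitted, and the exact branch structure is illustrative):

```python
def store_forall(a, i, x, P, g):
    """Instrumented a = store(a, i, x) for forall-tracking."""
    a = dict(a); a[i] = x                              # the original store
    lo, hi = g["qu_lo"], g["qu_hi"]
    if hi <= lo or i < lo - 1 or i > hi:               # empty or far outside
        g["qu_lo"], g["qu_hi"], g["qu_P"] = i, i + 1, P(x, i)
    elif i == lo - 1 or i == hi:                       # adjacent: grow
        g["qu_lo"], g["qu_hi"] = min(lo, i), max(hi, i + 1)
        g["qu_P"] = g["qu_P"] and P(x, i)
    elif g["qu_P"]:                                    # inside, qu_P true:
        g["qu_P"] = P(x, i)                            # others are unchanged
    else:                                              # inside, qu_P false:
        g["qu_lo"], g["qu_hi"], g["qu_P"] = i, i + 1, P(x, i)  # reset
    return a
```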

The result of applying \(\varOmega _{\forall , P}\) to the program in Fig. 3 is shown in [4]. As exhibited by the experiments in Sect. 5, the resulting program is in many cases easier to verify by state-of-the-art verification tools. Note that the instrumentation operator defined is only one possibility among many. For example, one could track several ranges simultaneously over the array in question, or also track the index of some element in the array over which P holds, or make different choices on stores outside of the tracked interval.

The following lemma establishes correctness of the instrumentation operator. The proof can be found in [4].

Lemma 2

(Correctness of \(\varOmega _{\forall , P}\)). \(\varOmega _{\forall , P}\) is an instrumentation operator, i.e., it adheres to the constraints imposed in Definition 1.

4.2 Instrumentation Operators for Aggregation over Arrays

We now turn to the verification of safety properties with aggregation. As examples of aggregation, we consider in particular the operators \sum and \max, calculating the sum and the maximum value of an array, respectively. Aggregation is supported in the form of extended quantifiers in the specification languages JML [33] and ACSL [7], and is frequently needed for the specification of functional correctness properties. Although aggregation is commonly used in specifications, most verification tools do not support it, so that properties involving aggregation have to be manually rewritten using standard quantifiers, pure recursive functions, or ghost code involving loops. This reduction step is error-prone and represents an additional complication for automatic verification approaches, but it can be handled elegantly using the instrumentation framework. For generality, we formalise aggregation over arrays with the help of monoid homomorphisms.

Definition 4

(Monoid). A monoid is a structure \((M, \circ , e)\) consisting of a non-empty set M, a binary associative operation \(\circ \) on M, and a neutral element \(e \in M\). A monoid is commutative if \(\circ \) is commutative. A monoid is cancellative if \(x \circ y = x \circ z\) implies \(y = z\), and \(y \circ x = z \circ x\) implies \(y = z\), for all \(x, y, z \in M\).

For aggregation, we model finite intervals of arrays using the cancellative monoid \((D^*, \cdot , \epsilon )\) of finite sequences over some data domain D. The concatenation operator \(\cdot \) is non-commutative.

Definition 5

(Monoid Homomorphism). A monoid homomorphism is a function \(h : M_1 \rightarrow M_2\) between monoids \((M_1, \circ _1, e_1)\) and \((M_2, \circ _2, e_2)\) with the properties \(h(x \circ _1 y) = h(x) \circ _2 h(y)\) and \(h(e_1) = e_2\).

Ordinary quantifiers can be modelled as homomorphisms \(D^* \rightarrow \mathbbm {B}\), so that the instrumentation in this section strictly generalises Sect. 4.1. A second classical example is the computation of the maximum (similarly, minimum) value in a sequence. For the domain of integers, the natural monoid to use is the algebra \((\mathbbm {Z}_{-\infty }, \max , -\infty )\) of integers extended with \(-\infty \), and the homomorphism \(h_{\max }\) is generated by mapping singleton sequences \(\langle n \rangle \) to the value n. A third example is the computation of the element sum of an integer sequence, corresponding to the monoid \((\mathbbm {Z}, +, 0)\) and the homomorphism \(h_{{\text {sum}}}\). Similarly, the number of occurrences of some element can be computed. The monoids considered in the last two cases of aggregation are even cancellative.

Programming Language with Aggregation. We extend our core programming language with expressions \(\texttt {aggregate}_{M, h}{} \texttt {(}\langle Expr \rangle {} \texttt {,} \langle Expr \rangle {} \texttt {,} \langle Expr \rangle {} \texttt {)}\), and use monoid homomorphisms to formalise them. Recall that we denote by \(D_{\sigma }\) the domain of a program type \(\sigma \).

Definition 6

Let \(\texttt {Array}~\sigma \) be an array type, \(\sigma _M\) a program type, M a commutative monoid that is a subset of \(D_{\sigma _M}\), and \(h : D_{\sigma }^* \rightarrow M\) a monoid homomorphism. Let furthermore \( ar \) be an expression of type \(\texttt {Array}~\sigma \), and l and u integer expressions. Then, \(\texttt {aggregate}_{M, h}{} \texttt {(} ar {} \texttt {,} l\texttt {,} u\texttt {)}\) is an expression of type \(\sigma _M\), with semantics defined by:

$$\begin{aligned} \llbracket \texttt {aggregate}_{M, h}{} \texttt {(} ar {} \texttt {,} l\texttt {,} u\texttt {)} \rrbracket \!\!_s ~=~ h(\langle \llbracket ar \rrbracket \!\!_s (\llbracket l \rrbracket \!\!_s), \llbracket ar \rrbracket \!\!_s (\llbracket l \rrbracket \!\!_s + 1), \ldots , \llbracket ar \rrbracket \!\!_s (\llbracket u \rrbracket \!\!_s - 1) \rangle ) \end{aligned}$$

Intuitively, the expression \(\texttt {aggregate}_{M, h}{} \texttt {(} ar {} \texttt {,} l\texttt {,} u\texttt {)}\) denotes the result of applying the homomorphism h to the slice \( ar [l \;..\; u-1]\) of the array \( ar \). As a convention, in case \(u < l\) we define the result of aggregate to be \(h(\langle \rangle )\). As with array accesses, we assume that aggregate expressions only occur in normalised statements of the form \(\texttt {t = aggregate}_{M, h}{} \texttt {(} ar {} \texttt {,} l\texttt {,} u\texttt {)}\).
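An executable reading of this semantics (Python; monoids as pairs of operation and neutral element, with h generated by its action h1 on singleton sequences):

```python
from functools import reduce

MAX = (max, float("-inf"))        # the monoid (Z_-inf, max, -inf)
SUM = (lambda x, y: x + y, 0)     # the monoid (Z, +, 0)

def aggregate(monoid, h1, ar, l, u):
    """h applied to the slice ar[l .. u-1]."""
    op, e = monoid
    if u < l:                     # convention for u < l: the empty slice
        return e
    return reduce(op, (h1(ar[i]) for i in range(l, u)), e)

a = [3, 1, 4, 1, 5]
assert aggregate(MAX, lambda n: n, a, 0, 5) == 5
assert aggregate(SUM, lambda n: n, a, 1, 4) == 6
```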

In our examples, we use derived operations as found in ACSL: \max as short-hand notation for \(\texttt {aggregate}_{(\mathbbm {Z}_{-\infty }, \max , -\infty ), h_{\max }}\), and \sum as short-hand notation for \(\texttt {aggregate}_{(\mathbbm {Z}, +, 0), h_{{\text {sum}}}}\).

An Instrumentation Operator for Maximum. For \max, an operator \(\varOmega _{max} = (G_{ max }, R_{ max }, I_{ max })\) can be defined similarly to the operator \(\varOmega _{\forall , P}\) from Sect. 4.1, in that the maximum value in a particular interval of the array is tracked. One key difference is that an extra ghost variable is added to track an array index at which the maximum value of the array interval is stored, so that tracking does not have to be reset on every store inside the tracked interval. A complete definition is given in [4].
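The role of the extra index variable can be sketched as follows (Python; only stores inside the tracked interval are shown, and the reset behaviour is one plausible choice, not the definition from [4]):

```python
def store_max(a, i, x, g):
    """Instrumented store for max-tracking over [qu_lo, qu_hi)."""
    a[i] = x
    if g["qu_lo"] <= i < g["qu_hi"]:
        if x >= g["qu_max"]:
            g["qu_max"], g["qu_idx"] = x, i    # new maximum at index i
        elif i == g["qu_idx"]:
            # the position holding the maximum was overwritten with a
            # smaller value; this sketch resets tracking to [i, i+1)
            g["qu_lo"], g["qu_hi"] = i, i + 1
            g["qu_max"], g["qu_idx"] = x, i
        # otherwise: the maximum sits at another index and remains valid
    return a
```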

An Instrumentation Operator for Sum. Cancellative aggregation is aggregation based on a cancellative monoid. It makes it possible to track aggregate values faithfully even when storing inside the tracked interval, unlike for \max and universal quantification. An example of a cancellative aggregation operator is \sum.

Fig. 5. Definition of an instrumentation operator \(\varOmega _{sum}\) for \sum

The instrumentation operator \(\varOmega _{sum} = (G_{ sum }, R_{ sum }, I_{ sum })\) is defined in Fig. 5. The instrumentation code tracks the sum of the values in the interval, and when increasing the bounds of the tracked interval, the new values are simply added to the tracked sum. Since addition is cancellative, when storing inside the tracked interval, the previous value at the index being written to is first subtracted from the sum before the new value is added, ensuring that the correct aggregate value is computed. The following correctness result is proved in [4].
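The cancellative update for stores can be sketched as follows (Python; ghost state as a dictionary g, with guards and the shadow array omitted):

```python
def store_sum(a, i, x, g):
    """Instrumented store for sum-tracking over [qu_lo, qu_hi)."""
    if g["qu_lo"] <= i < g["qu_hi"]:
        # subtract the overwritten value, then add the new one
        g["qu_sum"] = g["qu_sum"] - a[i] + x
    elif i == g["qu_hi"]:                      # grow at the upper end
        g["qu_hi"], g["qu_sum"] = i + 1, g["qu_sum"] + x
    elif i == g["qu_lo"] - 1:                  # grow at the lower end
        g["qu_lo"], g["qu_sum"] = i, g["qu_sum"] + x
    else:                                      # far outside: reset to [i, i+1)
        g["qu_lo"], g["qu_hi"], g["qu_sum"] = i, i + 1, x
    a[i] = x
    return a
```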

Lemma 3

(Correctness of \(\varOmega _{sum}\)). \(\varOmega _{sum}\) is an instrumentation operator, i.e., it adheres to the constraints imposed in Definition 1.

Deductive Verification of Instrumentation Operators. As stated in Sect. 2.2, instrumentation operators may be verified independently of the programs to be instrumented. The operators described in this paper, i.e., the operators for squares, universal quantification, maximum, and sum, have been verified in the verification tool Frama-C [15]. The verified instrumentations are adaptations for the C language semantics and execution model; more specifically, the adapted operators assume native C arrays, rather than functional ones.

5 Evaluation

5.1 Implementation

To evaluate our instrumentation framework, we have implemented the instrumentation operators for quantifiers and aggregation over arrays. The implementation is done over constrained Horn clauses (CHCs), by adding the rewrite rules defined in Sect. 4 to Eldarica [30], an open-source solver for CHCs. We also implemented the automatic application of the instrumentation operators, largely following Algorithm 1 but with a few minor changes due to the CHC setting. The CHC setting makes our implementation available to various CHC-based verification tools, for instance JayHorn (Java) [32], Korn (C) [19], RustHorn (Rust) [36], SeaHorn (C/LLVM) [26] and TriCera (C) [20].

In order to evaluate our approach at the level of C programs, we extended TriCera, an open-source assertion-based model checker that translates C programs into a set of CHCs and relies on Eldarica as back-end solver. TriCera is extended to parse quantifiers and aggregation operators in its input C programs and to encode them as part of the translation into CHCs. We call the resulting toolchain MonoCera. An artefact that includes MonoCera and the benchmarks is available online [5].

To handle complicated access patterns, for instance a program processing an array from the beginning and end at the same time, the implementation can apply multiple instrumentation operators simultaneously; the number of operators is incremented when Algorithm 1 returns Inconclusive.

5.2 Experiments and Comparisons

To assess our implementation, we assembled a test suite and carried out experiments comparing MonoCera with the state-of-the-art C model checkers CPAchecker 2.1.1 [11], SeaHorn 10.0.0 [26] and TriCera 0.2. It should be noted that deductive verification frameworks, such as Dafny and Frama-C, can handle, for example, the program in Fig. 3 if they are provided with a manually written loop invariant; however, since MonoCera  relies on automatic techniques for invariant inference, we only benchmark against tools using similar automatic techniques. We also excluded VeriAbs [1], since its licence does not permit its use for scientific evaluation.

The tools were set up, as far as possible, with equivalent configurations; for instance, to use the SMT-LIB theory of arrays [6] in order to model C arrays, and a mathematical (as opposed to machine) semantics of integers. CPAchecker was configured to use k-induction [10], which was the only configuration that worked in our tests using mathematical integers. SeaHorn was run using the default settings. All tests were run on a Linux machine with AMD Opteron 2220 SE @ 2.8 GHz and 6 GB RAM with a timeout of 300 s.

Test Suite. The comparison includes a set of programs exercising quantification and aggregation properties over arrays. The benchmarks and verification results are summarised in Table 3. The benchmark suite contains programs ranging from 16 to 117 LOC and comprises two parts: (i) 117 programs taken from the SV-COMP repository [9], and (ii) 26 programs crafted by the authors (min: 6, max: 8, sum: 9, forall: 3).

Table 3. Results for MonoCera (Mono), TriCera (Tri), SeaHorn (Sea), and CPAchecker (CPA). For MonoCera, statistics are also given for verification time (s), the size of the instrumentation search space, and the number of search iterations.

To construct the SV-COMP benchmark set for MonoCera, we gathered the test files from the relevant benchmark directories and singled out programs containing some assert statement that could be rewritten using a quantifier or an aggregation operator over a single array. For example, a loop asserting a property of every array element can be rewritten using the \(\texttt{forall}\) operator, and a loop comparing every element against a computed maximum can be rewritten using the \(\texttt{max}\) operator. We created a benchmark for each possible rewriting. The original benchmarks were used for the evaluation of the other tools, none of which supported (extended) quantifiers.
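A hypothetical instance of this rewriting (shown in Python for consistency with the earlier sketches, although the benchmarks are C programs):

```python
a = [2, 7, 5]
N = len(a)

# Original loop-based check, as found in the benchmark:
m = a[0]
for i in range(1, N):
    if a[i] > m:
        m = a[i]
for i in range(N):
    assert a[i] <= m

# Rewritten assertion using an extended quantifier (pseudo-syntax):
#   assert(m == \max(a, 0, N));
```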

In (ii), we crafted 9 programs that make use of aggregation or quantifiers, and derived further benchmarks by considering different array sizes (10, 100 and unbounded size); one combination (unbounded array inside a struct) had to be excluded, as it is not valid C. In order to evaluate other tools on our crafted benchmarks, we reversed the process described for the SV-COMP benchmarks and translated the operators into corresponding loop constructs.

Results. In Table 3, we present the number of verified programs per instrumentation operator for each tool, as well as further statistics for MonoCera regarding verification times and the instrumentation search space. The “Inst. space” column indicates the size of the instrumentation search space (i.e., the number of instrumentations producible by applying the non-deterministic instrumentation operator). The “Inst. steps” column indicates the number of attempted instrumentations, i.e., the number of iterations of the while-loop in Algorithm 1. In our implementation, the check in line 5 of Algorithm 1 can time out, causing the check to be repeated at a later time with a greater timeout, which can lead to more iterations than the size of the search space. In [4], we list results per benchmark for each tool.

For the SV-COMP benchmarks, CPAchecker managed to verify 1 program, while SeaHorn and TriCera could not verify any programs. MonoCera verified in total 42 programs from SV-COMP. Regarding the crafted benchmarks, several tools could verify the examples with array size 10. However, when the array size was 100 or unbounded, only MonoCera succeeded.

6 Related Work

It is common practice, in both model checking and deductive verification, to translate high-level specifications to low-level specifications prior to verification (e.g., [13, 14, 18, 37]). Such translations often make use of ghost variables and ghost code, although relatively little systematic research has been done on the required properties of ghost code [22]. The addition of ghost variables to a program for tracking the value of complex expressions also has similarities with the concept of term abstraction in Horn solving [3]. To the best of our knowledge, we are presenting the first general framework for automatic program instrumentation.

Much research in software model checking has considered the handling of the standard quantifiers \(\forall , \exists \) over arrays. In the setting of constrained Horn clauses, properties with universal quantifiers can sometimes be reduced to quantifier-free reasoning over non-linear Horn clauses [13, 37]. Our approach follows the same philosophy of applying an up-front program transformation, but in a more general setting. Various direct approaches to infer quantified array invariants have been proposed as well: e.g., by extending the IC3 algorithm [27], syntax-guided synthesis [21], learning [24], solving recurrence equations [29], backward reachability [3], or superposition [25]. To the best of our knowledge, such methods have not been extended to aggregation.

Deductive verification tools usually have rich support for quantified specifications, but rely on auxiliary assertions like loop invariants provided by the user, and on SMT solvers or automated theorem provers for quantifier reasoning. Although several deductive verification tools can parse extended quantifiers, few offer support for reasoning about them. Our work is closest to the method for handling comprehension operators in Spec\(\#\) [35], which relies on code annotations provided by the user, but provides heuristics to automatically verify such annotations. The code instrumentation presented in this paper has similarities with the proof rules in Spec\(\#\); the main differences are that our method is based on an upfront program transformation, and that we aim at automatically finding required program invariants, as opposed to only verifying their correctness. The KeY tool provides proof rules similar to the ones in Spec\(\#\) for some of the JML extended quantifiers [2]; those proof rules can be applied manually to verify human-written invariants. The Frama-C system [15] can parse ACSL extended quantifiers [7], but, to the best of our knowledge, none of the Frama-C plugins can automatically process such quantifiers. Other systems, e.g., Dafny [34], require users to manually define aggregation operators as recursive functions.

In the theory of algebraic data-types, several transformation-based approaches have been proposed to verify properties that involve recursive functions or catamorphisms [17, 31]. Aggregation over arrays resembles the evaluation of recursive functions over data-types; a major difference is that data-types are more restricted with respect to accessing and updating data than arrays.

Array folds logic (AFL) [16] is a decidable logic in which properties on arrays beyond standard quantification can be expressed: for instance, counting the number of elements with some property. Similar properties can be expressed using automata on data words [41], or in variants of monadic second-order logic [38]. Such languages can be seen as alternative formalisms to aggregation or extended quantifiers; they do not cover, however, all kinds of aggregation we are interested in. Array sums cannot be expressed in AFL or data automata, for instance.

7 Conclusion

We have presented a framework for automatic and provably correct program instrumentation, allowing the automatic verification of programs containing certain expressive language constructs that are not directly supported by existing automatic verification tools. Our experiments with a prototypical implementation, in the tool MonoCera, show that our method is able to automatically verify a significant number of benchmark programs involving quantification and aggregation over arrays that are beyond the scope of other tools.

There are still various other benchmarks that MonoCera (as well as other tools) cannot verify. We believe that many of those benchmarks are in reach of our method, because of the generality of our approach. Ghost code is known to be a powerful specification mechanism; similarly, in our setting, more powerful instrumentation operators can be easily formulated for specific kinds of programs. In future work, we therefore plan to develop a library of instrumentation operators for different language constructs (including arithmetic operators), non-linear arithmetic, other types of structures with regular access patterns such as binary heaps, and general linked-data structures.

We also plan to refine our method for showing incorrectness of programs more efficiently, as the approach is currently applicable mainly for verifying correctness (experiments in [4]). Another line of work is the establishment of stronger completeness results than the weak completeness result presented here, for specific programming language fragments.