figure a

1 Introduction

Formally verifying programs against safety specifications is difficult. This problem worsens in the presence of data structures like lists, arrays, and maps, which are ubiquitous in real-world applications. For instance, proving an array-handling program safe often requires discovering an inductive invariant that is universally quantified over ranges of array elements. Such invariants help to prove the unreachability of error states independently of the size of the array. However, the majority of invariant synthesis approaches are limited to quantifier-free numerical invariants. The approach presented in this paper advances the knowledge by an effective technique to discover quantified invariants over arrays and linear integer arithmetic.

Syntax-guided techniques [3] have recently been applied to synthesize quantifier-free numerical invariants [15,16,17, 34] in the approach called Freq-Horn. In a nutshell, FreqHorn collects various statistics from the syntactical patterns occurring in the program’s source code and uses them to construct a set of formal grammars that specify a search space for invariants. It is often sufficient to perform an enumerative search over the formulas produced from these grammars and identify a set of suitable inductive invariants among them using an off-the-shelf solver for Satisfiability Modulo Theories (SMT). The presence of arrays complicates this reasoning in a few respects: it is hard to find suitable candidates and difficult to prove them inductive.

In this paper, we present a novel technique that extends the approach of enumerative search in general, and its instantiation in FreqHorn in particular, to reason about quantifiers. It discovers invariants over arrays in multiple stages. First, by exploiting the program syntax, it identifies ranges of elements accessed in each loop. Second, it identifies potentially useful facts about individual elements and generalizes them to hypotheses about entire ranges. The SMT-based validation of candidates, which are quantified formulas, is often inexpensive as they are constructed using the same syntactic patterns that appear in the source code. Furthermore, for supporting certain corner cases, our approach allows specifying additional rules that help in generalizing learned properties. The combination of properties proven inductive by an SMT solver is often enough to prove that the program meets a safety specification.

We show that FreqHorn advances state-of-the-art on a selection of array-handling programs from SVCOMPFootnote 1 and literature. For instance, it can prove completely automatically that an array is monotone after applying a sorting algorithm. Furthermore, FreqHorn is able to discover quantifier-free invariants over integer variables in the program, use them as inductive relatives while checking inductiveness of quantified candidates over arrays; and vice versa.

While a detailed discussion of the related work comes later in the paper (Sect. 6), it is noteworthy that being syntax-guided crucially helps us overcome several limitations of other techniques to verify array-handling programs [2, 9, 11, 35]. Most of them avoid inferring quantified invariants explicitly and thus do not produce checkable proofs. As a result, tools are fragile and in practice often output false positives (see Sect. 5 for concrete results). By comparison, our approach never produces false positives, and its results can be validated by existing SMT solvers.

The core contributions made through this work are:

  • a novel syntax-guided approach to generate universally quantified invariants for programs manipulating arrays;

  • an algorithm and its fully automated implementation; and

  • a thorough experimental evaluation comparing our technique with state-of-the-art in verification of array-handling programs.

The rest of the paper is structured as follows. In Sect. 2, we give background and notation and illustrate our approach on an example. Our main contributions are then presented in Sect. 3 (main algorithm) and Sect. 4 (important design choices). In Sect. 5, we show the evaluation and comparison with state-of-the-art. Finally, the related work and conclusion complete the paper in Sects. 6 and 7, respectively.

2 Background

The Satisfiability Modulo Theories (SMT) task is to decide whether there is an assignment m of values to variables in a first-order logic formula \(\varphi \) that makes it true. We write \(\varphi \implies \psi \), if every satisfying assignment to \(\varphi \) is also a satisfying assignment to some formula \(\psi \). By \( Expr \) we denote the space of all possible quantifier-free formulas in our background theory and by \({ Vars }\) a range of possible variables.

2.1 Programs as Constrained Horn Clauses

To guarantee expected behaviors, programs require proofs, such as inductive invariants, ranking functions, or recurrence sets. It is becoming increasingly popular to consider a verification task as a proof synthesis task which is formulated as a system of SMT formulas involving unknown predicates, also known as constrained Horn clauses (CHC). The synthesis goal is to discover a suitable interpretation of all unknown predicates that make all CHCs true. CHCs offer the advantages of flexibility and modularity in designing verifiers for various systems and languages. CHCs can be constructed in a way that captures the operational semantics of a language in question, and an off-the-shelf CHC solver can be used for solving the resulting formulas.

Definition 1

A linear constrained Horn clause (CHC) over a set of uninterpreted relation symbols is a formula in first-order logic that has the form of one of three implications (called respectively a fact, an inductive clause, and a query):

$$\begin{aligned}&\varphi (\vec {x_1}) \implies \varvec{inv}_1(\vec {x_1}) \\\nonumber&\varvec{inv}_1(\vec {x_1}) \wedge \varphi (\vec {x_1},\vec {x_2}) \implies \varvec{inv}_2(\vec {x_2})\\\nonumber&\varvec{inv}_1(\vec {x_1}) \wedge \varphi (\vec {x_1}) \implies \bot \\ \end{aligned}$$

where are uninterpreted symbols, \(\vec {x_1}, \vec {x_2}\) are vectors of variables, and \(\varphi \), called a body, is a fully interpreted formula (i.e., \(\varphi \) does not have applications of \(\varvec{inv}_1\) or \(\varvec{inv}_2\)).

For a CHC C, by \( src (C)\) we denote an application of in the premise of C (if C is a fact, we write ). Similarly, by \( dst (C)\) we denote an application of in the conclusion of C (if C is a query, we write ). We define functions \( rel \) and \( args \), such that for each \(\varvec{inv}(\vec {x})\), and . For a CHC C, by \( body (C)\) we denote the body (i.e., \(\varphi \)) of C.

Fig. 1.
figure 1

Example program: source code in C.

Fig. 2.
figure 2

Example program: CHC encoding.

Example 1

Figure 1 gives a program in the C programming language that handles two integer arrays, A and B, both of an unknown size N. The A array has unknown content, and the program first identifies a value m which is smaller or equal to all elements of A (it might be either a minimal element among the content of A or 0). Then, the program populates B by values of A with m subtracted. Interestingly, the order of elements A and B is not preserved, e.g., A[0] - m gets written to B[N - 1], and so on. Finally, the program computes the sum s of all elements in B and requires us to prove that s is never negative.

Figure 2 gives a CHC encoding of the program. The system has three uninterpreted predicates, \(\varvec{inv}_1\), \(\varvec{inv}_2\), and \(\varvec{inv}_3\) corresponding to invariants at heads of the three loops. The primed variables correspond to modified variables. Rules B, D, and F encode the loop bodies, and the remaining rules encode the fragments of code before, after, or between the loops. In particular, rule G ensures that after the third loop has terminated, a program state with a negative value of s is unreachable. Before we describe how our technique solves this CHC system (see Sect. 2.2), we briefly introduce the notion of satisfiability of CHCs.

Definition 2

Given a set of uninterpreted relation symbols and a set S of CHCs over , we say that S is satisfiable if there exists an interpretation that assigns to each n-ary symbol a relation over n-tuples and makes all implications in S valid.

In the paper, we assume that a relation assigned by an interpretation is represented by a formula \(\psi \) over at most n free variables.

We call a CHC C inductive when \( rel ( src (C)) = rel ( dst (C)) = \varvec{inv}\) for some \(\varvec{inv}\). While accessing an array in a loop, we assume the existence of an integer counter variable. More formally:

Definition 3

Let C be an inductive CHC, \(\vec {x}= args ( src (C))\), and \(\vec {x}' = args ( dst (C))\). We say that C is array-handling if there exist numbers c and a, such that (1) \(1\le c \le |\vec {x}|\) and \(1\le a \le |\vec {x}|\); (2) \(\vec {x}[c]\) (and consequently, its “primed copy” \(\vec {x}'[c]\)) has type integer, (3) either of these implications holds:

$$\begin{aligned} body (C) \implies \vec {x}[c] < \vec {x}'[c] \end{aligned}$$
(1)
$$\begin{aligned} body (C) \implies \vec {x}[c] > \vec {x}'[c] \end{aligned}$$
(2)

(4) \(\vec {x}[a]\) (and consequently \(\vec {x}'[a]\)) has type array, and (5) there is an access function f that identifies a relationship between an access to \(\vec {x}[a]\) in \( body (C)\) and \(\vec {x}[c]\).

2.2 Illustrating Example

The CHC system in Fig. 2 has a solution, indicating that the program meets its specification. In particular:

$$\begin{aligned} \varvec{inv}_1 \mapsto&\,\forall j \,.\,i< j< N \implies m \le A[j] \\\varvec{inv}_2 \mapsto&\,\forall j \,.\,0 \le j< N \implies m \le A[j] \, \wedge \\&\,\forall j \,.\,0 \le j< i \implies B[N - j - 1] = A [j] - m \\\varvec{inv}_3 \mapsto&\,\forall j \,.\,0 \le j< N \implies m \le A[j] \, \wedge \\&\,\forall j \,.\,0 \le j < N \implies B[N - j - 1] = A [j] - m \\&\wedge \, s \ge 0 \end{aligned}$$

The interpretation of \(\varvec{inv}_1\) means that as the first loop progresses (i.e, all elements \(A[N-1]\), \(A[N-2]\), ..., \(A[i + 1]\) are sequentially considered), the value of m is always smaller than all the considered elements. Thus, we refer to the interpretation of \(\varvec{inv}_1\) as a progress lemma. When the first loop has terminated, clearly, this property holds for all elements from A[0] to \(A[N-1]\). Because A leaks through the second loop without any changes, the interpretation of \(\varvec{inv}_1\) gets finalized (thus, it becomes a finalized lemma) and added to an interpretation of \(\varvec{inv}_2\).

Additionally, the interpretation of \(\varvec{inv}_2\) gets a relational fact about pairs of elements A[0] and \(B[N - 1]\), A[1] and \(B[N - 2]\), ..., \(A[i-1]\) and \(B[N - i - 2]\), which again appears as a progress lemma and then gets finalized in an interpretation of \(\varvec{inv}_3\). With these two quantified invariants about all elements of A, and relation about pairs of elements of A and B, it is possible to derive the remaining lemma in the interpretation of \(\varvec{inv}_3\), namely, \(s\ge {0}\); which concludes the proof.

3 Invariants via Enumerative Search

In this work, we aim at discovering a solution for a CHC system S over a set of uninterpreted symbols enumeratively, i.e., by guessing a candidate formula for each , substituting it for all CHCs \(C \in S\) and checking their validity.

3.1 Quantifier-Free Invariants

We build on top of an algorithm, called FreqHorn, recently proposed in [17]. Its key insight is an automatic construction of a set of formal grammars \(G(\varvec{inv})\) for each based on either source code, program behaviors, or both. Importantly, these grammars are conjunction-free: they cannot be used to produce a conjunction of clauses and can give rise to only a finite number of formulas, potentially related to invariants (otherwise, the approach does not guarantee strong convergence). Since invariants are often represented by a conjunction of lemmas, FreqHorn attempts to sample (i.e., recursively apply production rules) each lemma from a grammar in separation, until a combination of them is sufficient for the inductiveness and safety, or a search space is exhausted. Freq-Horn relies on an SMT solver to filter out unsuccessfully sampled lemmas.

The construction of formal grammars is biased by the syntax of CHC encoding. First, FreqHorn collects a set of \( Seeds \) by converting the body of each CHC to a Conjunctive Normal Form, extracting, and normalizing each conjunct. Then, the set of seeds could be optionally replenished by a set of behavioral seeds and bounded proofs. They are constructed respectively from the concrete values of variables obtained from actual program runs, and Craig interpolants from unsatisfiable finite unrollings of the CHC systems. Finally, the production rules are created in a way to enable producing seeds and also their \( mutants \) (i.e., syntactically similar formulas to seeds). In general, no specific restriction on a grammar-construction method is imposed; so in practice, the grammars are allowed to be more (or less) general to enable a broader (or more focused) search space for invariants.

3.2 Quantified Candidates from Quantifier-Free Grammars

The main obstacle for applying the enumerative search to generate array invariants is that the grammars do not allow quantifiers. Because grammars are constructed automatically from syntactic patterns which appear in the original programs, in the presence of arrays, we can expect expressions involving only particular elements of arrays (such as ones accessed via a loop counter). However, since each loop repeats certain operations over a range of array elements, we have to generalize the extracted expressions about individual elements to expressions about entire ranges.

Let a set of variables associated with a relation symbol \(\varvec{inv}\) be , where \({ IntVars }(\varvec{inv})\) and \({ ArrVars }(\varvec{inv})\) are disjoint and contain integer variables and array variables, respectively. A candidate quantified invariant over arrays consists of three parts:

  • a set of quantified integer variables \({ QVars }(\varvec{inv})\), which are introduced by our algorithm and do not appear in \({ Vars }(\varvec{inv})\);

  • a range formula over \({ QVars }(\varvec{inv}) \cup { IntVars }(\varvec{inv})\); and

  • a quantifier-free cell property over \({ QVars }(\varvec{inv}) \cup { Vars }(\varvec{inv})\).

figure b
figure c

A naive idea for getting a range formula and a cell property is to sample them separately, and then to bind them together using some \({ QVars }(\varvec{inv})\). But it would result in a large search space. Algorithm 1 gives a more tailored procedure on the matter. The central role in this process is taken by an analysis of the loop counters which are used to access array elements (line 3). This analysis is performed once for each loop before the main verification process, and thus its results are reused in all iterations of the verification process.

Our algorithm identifies \({ QVars }(\varvec{inv})\) by creating a fresh variable for each counter, including counters of nested loops (line 5). It then generates range formulas based on the results of the analysis (line 6) such that: (1) the range formula itself is an inductive invariant for \(\varvec{inv}\), and (2) the range formula is expressed over the initial values of counters of \(\varvec{inv}\) and the counters themselves. Finally, only a cell property is going to be produced from the grammar \(G(\varvec{inv})\), constructed from the seeds (recall Sect. 3.1), in which all counters are replaced by the corresponding variables from \({ QVars }(\varvec{inv})\) (line 7). Thus, the only part of the candidate formula where the counter can appear is the range formula.

figure d
figure e

Once grammars, \({ QVars }\), and ranges are detected, our approach proceeds to sample candidates and to check them with an SMT solver. The general flow of this algorithm is illustrated in Algorithm 2. For each , it initiates a set \( Lemmas (\varvec{inv})\) (line 2). Then it iteratively guesses lemmas until a combination of them is inductive and safe, or a search space is exhausted (lines 3–4).

Compared to the baseline approach from [17], our new algorithm fixes a shape for the candidates for arrays. At the same time, it permits to sample quantifier-free candidates (line 6): they could be either formulas over counters or any other variables in the loop, or even formulas over isolated array elements (if, e.g., accessed by a constant). Then (line 8), Algorithm 2 propagates candidates through all available implications in CHCs using quantifier elimination and identifies lemmas among the candidates. This step is similar to the baseline approach from [17], but for completeness of presentation, we provide the pseudocode in Algorithms 3 and 4. The only differences are (1) in the implementation of the candidate propagation for array candidates and (2) in the weakening of failed candidates (both in Algorithm 3, to be discussed in Sects. 4.3 and 4.4, respectively).

Both successful and unsuccessful candidates are “blocked” from their grammars to avoid re-sampling them in the next iterations. This fact together with the property of grammars being conjunction-free gives the main hint for proving the following theorem.

Theorem 1

Algorithm 2 always makes a finite number of iterations, and if it returns with SAT then the CHC system is satisfiable.

Next section discusses a particular instantiation of important subroutines that make our invariant synthesizer effective in practice.

4 Design Choices

Our main contribution is a completely automated algorithm for finding quantified invariants for array-handling loops. In this section, we first show how by exploiting the program syntax we can identify ranges of elements accessed in each loop (Sect. 4.1). Second, we present an intuitive justification to why our candidates can often be proved as lemmas by an off-the-shelf SMT solver (Sect. 4.2). Finally, we extend our algorithm to handle more complicated cases of multiple loops (Sects. 4.34.4), and benchmarks of the tiling [9] technique, which are adapted from the industrial code of battery controllers (Sect. 4.5).

4.1 Discovery of Progress Lemmas

We start with the simplest scenario of a single loop handling just one array. Let S be a system of CHCs over a set of uninterpreted relation symbols . Let correspond to a loop, in which arrays are accessed using some counter variable i (counters are automatically identified by posing and solving queries of forms (1) and (2)).

Recall that we do not necessarily require the array elements to be accessed directly by i, and we allow an access function f to identify relationships between i and an index of the accessed element. However, we assume that the counter is unique in the loop because it is the case in most of the practical applications. In principle, our algorithm can be extended to loops handling several independent counters (although it is rare in practice), with the help of additionally discovered lemmas that describe relationships among counters. We leave a discussion about this to future work.

Definition 4

A range of \(\varvec{inv}\) and a counter i is a formula over \({ IntVars }(\varvec{inv})\) and a free variable v having form \(L< v \wedge v < U\), such that either of formulas \(L < i\) or \(i < U\) is a lemma for \(\varvec{inv}\). A progress lemma is either a formula \(L< v \wedge v < i\) (if \(L < i\) is a lemma), or a formula \(i< v \wedge v < U\) (if \(i < U\) is a lemma).

Both ranges and progress ranges can be identified statically. Let \(C_1\) and \(C_2\) be two CHCs, such that \(\varvec{inv}= rel ( dst (C_1)) = rel ( src (C_2)) = rel ( dst (C_2))\) and \(\varvec{inv}\ne rel ( src (C_1))\). It is common in practice that \(body(C_1)\) identifies a symbolic bound b on the initial value of i: it could be either a lower bound (if i increments in \(body (C_2)\)) or an upper bound (if i decrements). In this case, a progress range of \(\varvec{inv}\) is simply computed as a lemma for \(\varvec{inv}\) over i and b. A range of \(\varvec{inv}\) can often be constructed as a conjunction of the progress range with the negation of the termination condition of \(body(C_2)\).Footnote 2

Example 2

For the CHC-encoding of the program is shown in Fig. 2, the ranges of \(\varvec{inv}_1, \varvec{inv}_2\) and \(\varvec{inv}_3\) are all equal to \(-1< v < N\). The progress range of \(\varvec{inv}_1\) is \(i< v < N\), and the progress ranges of \(\varvec{inv}_2\) and \(\varvec{inv}_3\) are \(-1< v < i\).

We call candidates, that use progress ranges in their left sides, progress candidates:

$$\begin{aligned} \forall \vec {q}\,.\,{ progressRange }(\varvec{inv})(\vec {q}) \implies cand \end{aligned}$$

where \(\vec {q}= { QVars }(\varvec{inv})\) and cand is a quantifier-free formula over \({ QVars }(\varvec{inv}) \cup { IntVars }(\varvec{inv})\). As can be seen from Algorithm 1, all sampled candidates are progress candidates. However, during the next steps of the algorithm (i.e., propagation and weakening) we will use other kind of candidates (namely, regress and finalized, see Sects. 4.3 and 4.4 respectively).

If a progress candidate is proven inductive, we call it a progress lemma.

4.2 SMT-Based Inductiveness Checking

We rely on recent advances of SMT solving to identify successful candidates, a conjunction of which is directly used to prove the desired safety specification. In general, solving quantified formulas for validity is a hard task, however, in certain cases, the initiation and inductiveness queries can be simplified and reduced to a sequence of (sometimes even quantifier-free) formulas over integer arithmetic. We illustrate such proving strategy, inspired by the tiling approach [9], on the following example.

Example 3

Recall the CHC system from Fig. 2. Consider a progress candidate \(\forall j\,.\,i< j < N \implies m \le A[j]\) for \(\varvec{inv}_1\). Checking its initiation (i.e., for CHC A) requires deciding validity of the following quantified formula:

$$\begin{aligned} i' = N'-1 \wedge m' = 0 \implies \Big (\forall j\,.\,i'< j < N' \implies m' \le A'[j] \Big ) \end{aligned}$$
(3)

The range formula \(i'< j < N'\) simplifies to \(N'-1< j < N'\), which is always false, making formula (3) always valid.

Checking the inductiveness of the candidate (i.e., for CHC B) boils down to solving a more complicated formula:

$$\begin{aligned}&\Big (\forall j\,.\,i< j< N \implies m \le A[j] \Big )\nonumber \\&\quad \wedge i \ge 0 \wedge m' = ite (m > A[i], A[i], m) \wedge i' = i - 1 \implies \nonumber \\&\qquad \qquad \qquad \qquad \,\,\, \Big (\forall j\,.\,i'< j < N \implies m' \le A[j]\Big ) \end{aligned}$$
(4)

Although quantifiers are present on both sides of (4), proving its validity is not hard. Indeed, the query is reducible to two implications:

$$\begin{aligned} \Big (\forall j\,.\,i< j < N \implies m \le A[j]\Big ) \wedge m' = ite (m > A[i], A[i], m) \implies m' \le A[i] \end{aligned}$$
$$\begin{aligned} \begin{array}{l} \!\!\!\!\Big (\forall j\,.\,i< j< N \implies m \le A[j] \Big ) \wedge \\ \quad \quad \quad \,\,\,\,\, m' = ite (m > A[i], A[i], m) \implies \Big (\forall j\,.\,i< j < N \implies m' \le A[j]\Big ) \end{array} \end{aligned}$$

The former does not require any information about \(A[i+1],\ldots , A[N-1]\), so the entire quantified conjunction is ignored, and A[i] could be replaced by a fresh integer variable. The latter is trickier: it requires to prove that if all elements in a range are greater or equal than m, then they are also greater or equal to \(ite (m > A[i], A[i], m)\). This again is reduced to a quantifier-free formula over integer arithmetic:

$$\begin{aligned} m \le A[j] \wedge m' = ite (m > A[i], A[i], m) \implies m' \le A[j] \end{aligned}$$

Thus, because formulas (3) and (4) are valid, the progress candidate is proved a progress lemma.

In general, we cannot always conduct proofs that easily. Often, the prerequisite for success is the commonality of an access function f in the candidate and the body of the CHC. Fortunately, our algorithm ensures that all access functions used in the candidates are borrowed directly from bodies of CHCs. Thus, in many cases, FreqHorn is able to check large amounts of candidates quickly.

4.3 Strategy of Lemma Propagation

In this subsection, we identify a useful strategy for propagation of quantified lemmas through adjacent CHCs in the given system, inspired by [17]. Let some have the following lemma:

$$\begin{aligned} \forall \vec {q}\,.\,\rho (\vec {q}) \implies \ell \end{aligned}$$

where \(\vec {q}= { QVars }(\varvec{inv}_1)\), formula \(\rho \) over \(\vec {q}\cup { IntVars }(\varvec{inv}_1)\) is either a range or a progress range, and \(\ell \) is over \(\vec {q}\cup { Vars }(\varvec{inv}_1)\). Let then a CHC C be such that \( rel ( src (C)) = \varvec{inv}_1\) and \( rel ( dst (C)) = \varvec{inv}_2\), and its body be \(\varphi (\vec {x}_1,\vec {x}_2)\).

Definition 5

Forward propagation of lemma \(\forall \vec {q}\,.\,\rho (\vec {q}) \implies \ell \) through C gives a formula of the following form:

$$\begin{aligned} \forall \vec {q}\,.\,(\exists \vec {x}_1 \,.\,\rho (\vec {q})(\vec {x}_1) \wedge \varphi (\vec {x}_1,\vec {x}_2)) \implies (\exists \vec {x}_1(\vec {x}_1, \vec {q}) \,.\,\ell \wedge \varphi (\vec {x}_1,\vec {x}_2)) \end{aligned}$$

Example 4

Recall the example from Fig. 2 and the following lemma for \(\varvec{inv}_1\):

$$\begin{aligned} \forall j \,.\,i< j < N \implies m \le A[j] \end{aligned}$$

The body of C is \(i < 0 \wedge i' = 0\), thus the forward propagation gives the following formula:

$$\begin{aligned} \forall j \,.\,(\exists i \,.\,i< j< N \wedge i< 0 \wedge i' = 0) \implies (\exists i \,.\,m \le A[j] \wedge i < 0 \wedge i' = 0) \end{aligned}$$

Applying quantifier elimination to both sides of the implication, we get the following formula:

$$\begin{aligned} \forall j \,.\,0 \le j < N \implies m \le A[j] \end{aligned}$$

Note that this formula is not going to be immediately learned as a lemma, but instead should be checked by the solver for inductiveness. Intuitively, such a candidate represents some facts about array elements that were accessed during a loop that has terminated. If after the propagation it appeared that the candidate uses the entire range then we refer to such candidate to as a finalized candidate.

4.4 Weakening Strategy

Whenever a finalized candidate cannot be proven inductive, we often do not want to withdraw it completely. Instead, our algorithm runs weakening and proposes regress candidates. The main idea is to calculate a range of elements which have not been touched by the loop yet. This is an inverse of the procedure outlined in Sect. 4.1.

Definition 6

Given , its \({ Range }(\varvec{inv})\) and \({ progressRange }(\varvec{inv})\) formulas, we call a regress range a formula of the following kind:

We call candidates that use regress ranges in their left sides as regress candidates. Clearly, a regress candidate is weaker than the corresponding finalized candidate. Thus, from the failure to prove inductiveness of the finalized candidate it does not follow that the regress candidate is not inductive; and it makes sense to try proving it in the next iteration.

4.5 Learning from Sub-ranges

In complicated scenarios of loops with multiple iterators, multiple array variables or multiple access functions, the iterative process of lemma discovery, might end up in a large number of quantified formulas and get lost while checking a candidate for inductiveness (recall Sect. 4.2). To overcome current limitations in existing SMT solvers, it appeared to be useful to help the solver while generalizing learned lemmas. In particular, a property could be learned for two subranges of an array, and then combined in the following way:

Fig. 3.
figure 3

Learning from sub-ranges.

Lemma 1

Let for some two lemmas be of the following kind:

$$\begin{aligned}&\forall \vec {q}\,.\,\rho _1(\vec {q}) \implies \ell&\forall \vec {q}\,.\,\rho _2(\vec {q}) \implies \ell \end{aligned}$$
(5)

Then, the following is also a lemma for \(\varvec{inv}\):

$$\begin{aligned} \forall \vec {q}\,.\,\rho _1(\vec {q}) \vee \rho _2(\vec {q}) \implies \ell \end{aligned}$$

Example 5

Figure 3 shows a program from the tiling benchmark suite [9]. If lemmas \(\forall j \,.\,0< j < N \implies A[2*j-1] = 0 \vee A[2*j-1] \le m\) and \(\forall j \,.\,0< j < N \implies A[2*j-2] = 0 \vee A[2*j-2] \le m\) are discovered, then formula \(\forall j \,.\,0 \le j < 2*N-1 \implies A[j] = 0 \vee A[j] \le m\) is also a lemma.

5 Evaluation

We have implemented our algorithm on top of the FreqHornFootnote 3 tool. It takes a system of CHCs with arrays as input and performs an enumerative search as presented in Sect. 4. The tool uses Z3 [12] to solve SMT queries.

We have evaluated FreqHorn on 137 satisfiable CHC-translations of publicly available C programs (whose assertions are safe) taken from the SVCOMP ReachSafety Array subcategory and literature. These programs include variations of standard array copying, initializing, maximum, minimum, sorting, and tiling benchmarks. Among these 137 benchmarks, 79 have a single loop, and 58 have multiple loops, including 7 that have nested loops. These programs are encoded using the theories of Arrays, Linear (LIA) and Non-linear Integer Arithmetic (NIA). Our experiments have been performed on an Ubuntu 18.04 machine running at 2.5 GHz and having 16 GB memory, with a timeout of 100 s for every benchmark. FreqHorn solved 129 benchmarks within the timeout, of which 73 solved benchmarks had a single loop and 56 had multiple loops.

We have compared our tool with Spacer (Z3 v4.8.3) [26], that implements a recent QUIC3 [22] algorithm, Booster (v0.2) [2], VIAP (v1.0) [35], and VeriAbs (v1.3.10) [11]. The last two tools performed well in the ReachSafety Array subcategory at SVCOMP 2019Footnote 4. Figure 4 gives a comparison of FreqHorn timings against timings of these tools.Footnote 5

Fig. 4.
figure 4

FreqHorn vs competitors. Each point in a plot represents a pair of the run times (sec \(\times \) sec) of FreqHorn (x-axis) and a competitor (y-axis). Timeouts are placed on the inner dashed lines; false alarms, unsupported cases, and crashes are on the outer dashed lines.

Compare to 129 benchmarks solved by FreqHorn, only 81 were solved by Spacer, 108 – by VeriAbs, 70 – by VIAP, and 48 – by Booster.

FreqHorn solved 54 benchmarks on which Spacer diverged. Our intuition is that Spacer works poorly on programs with non-deterministic assignments and NIA operations, which our tool can handle.

FreqHorn solved 27 benchmarks on which VeriAbs diverged. VeriAbs failed to solve programs with nested loops and when array values were dependent on access indices. Furthermore, it decided one of the programs as unsafe, Time-wise, FreqHorn significantly outperformed VeriAbs on all benchmarks. Importantly, the short time taken by FreqHorn includes the time for generating a checkable witness – quantified invariant – an essence that VeriAbs cannot produce by design. On the other side, VeriAbs solved several benchmarks after merging loops. No quantified invariant satisfying the FreqHorn’s restrictions exists for these benchmarks before this program transformation.

FreqHorn solved 60 programs on which VIAP diverged. VIAP decided one program as unsafe. There were no programs on which FreqHorn took more time than VIAP. Finally, FreqHorn solved 83 programs on which Booster diverged. And again, Booster decided two programs as unsafe.

6 Related Work

Our algorithm for quantified invariant synthesis extends the prior work on checking satisfiability of CHCs [15,16,17], where solutions do not permit quantifiers. It works in a similar – enumerate-and-check – manner, but there are two crucial changes: (1) introduction of quantifiers, to formulate hypotheses over a subset of array indices, and (2) a generalization mechanism, to derive properties that may hold over the entire range of array indices.

Many existing approaches for verifying programs over arrays are extensions of well-known techniques for programs over scalar variables to quantified invariants. For example, by extending predicates with Skolem variables in predicate abstraction [30], by exploiting the MCMT [19] framework in lazy abstraction with interpolants [1] and its integration with acceleration [2], and, recently, QUIC3 [22], that extends IC3 [8, 14] to universally quantified invariants. Apart from the skeletal similarity, however, these approaches rely on orthogonal techniques.

Partitioning of arrays has also been used to infer invariants in many different ways. It refers to splitting an array into symbolic segments, and may be based on syntax [20, 23, 25] or semantics [10, 31]. Invariants may be inferred for each segment separately and generalized for the entire array. The partitioning need not be explicit, as in [13]. However, most of these techniques (except [13, 31]) are restricted to contiguous array segments, and work well when different loop iterations write to disjoint array locations or when the segments are non-overlapping. Tiling [9], a property-driven verification technique, overcomes these limitations for a class of programs by inferring array access patterns in loops. But identifying tiles of array accesses is itself a difficult problem, and the approach is currently based on heuristics developed by observing interesting patterns.

There are a number of approaches that verify array programs without inferring quantified invariants explicitly. A straightforward way is to smash all array elements into a single memory location [4], but it is quite imprecise. Every array element might also be considered a separate variable, but it is not possible with unknown array sizes. There are also techniques that abstract an array to a fixed number of elements, e.g. k-distinguished cell abstraction [32, 33] and k-shrinkability [24, 29]. Such abstractions usually reduce array modifying loops with unknown bounds to a known, small bound. It may even be possible to get rid of such loops altogether, by accelerating (computing transitive closures of) transition relations involving array updates in that loop [7]. Along similar lines, VIAP [35] resorts to reasoning with recurrences instead of loops. It translates the input program, including loops, to a set of first-order axioms, and checks if they derive the property. But all these techniques do not obtain quantified invariants explicitly, unlike ours. Besides, many of these transformations produce an abstraction of the original program, i.e., they do not preserve safety.

Alternatively, there are approaches that use sufficiently expressive templates to infer quantified invariants over arrays [5, 21, 27]. However, the templates need to be supplied manually. For instance, [6] uses a template space of quantified invariants and reduces the problem to quantifier-free invariant generation. Thus, universally quantified solutions for unknown predicates in a CHC system may be obtained by extending a generic CHC solver to handle quantified predicates. Learning need not be limited to user-supplied templates; one may do away with the templates entirely and learn only from examples and counterexamples [18]. Alternatively, [36] chooses a template upfront and refurbishes it with constants or coefficients appearing in the program source. Similarly, [28] proposes to infer array invariants without any user guidance or any user-defined templates or predicates. Their method is based on automatic analysis of predicates that update an array and allows one to generate first-order invariants, including those that contain alternations of quantifiers. But it does not work for nested loops. By comparison, our technique supports multiple as well as nested loops, enables candidate propagation between loops and, more importantly, generates the grammar automatically from the syntactical constructions appearing in the program’s source.

7 Conclusion

We have presented a new algorithm to synthesize quantified invariants over array variables, systematically accessed in loops. Our algorithm implements an enumerative search that guesses invariants based on syntactic constructions which appear in the code and checks their initiation, inductiveness, and safety with an off-the-shelf SMT solver. Key insights behind our approach are that individual accesses to array elements performed in the loop can be generalized to hypotheses about entire ranges, and the existing SMT solvers can be used to validate these hypotheses efficiently. Our implementation on top of a CHC solver FreqHorn confirmed that such strategy is effective on a variety of practical examples. In a vast majority of cases, our tool outperformed competitors and provided checkable guarantees that prevented from reporting false positives.