via Syntax-Guided Synthesis

Abstract. Programs with arrays are ubiquitous. Automated reasoning about arrays necessitates discovering properties about ranges of elements at certain program points. Such properties are formally specified by universally quantified formulas, which are difficult to find and difficult to prove inductive. In this paper, we propose an algorithm based on an enumerative search that discovers quantified invariants in stages. First, by exploiting the program syntax, it identifies ranges of elements accessed in each loop. Second, it identifies potentially useful facts about individual elements and generalizes them to hypotheses about entire ranges. Finally, by applying recent advances in SMT solving, the algorithm filters out wrong hypotheses. The combination of properties is often enough to prove that the program meets a safety specification. The algorithm has been implemented in a solver for Constrained Horn Clauses, FreqHorn, and extended to deal with multiple (possibly nested) loops. We show that FreqHorn advances the state of the art on a wide range of public array-handling programs.


Introduction
Formally verifying programs against safety specifications is difficult. This problem worsens in the presence of data structures like lists, arrays, and maps, which are ubiquitous in real-world applications. For instance, proving an array-handling program safe often requires discovering an inductive invariant that is universally quantified over ranges of array elements. Such invariants help to prove the unreachability of error states independently of the size of the array. However, the majority of invariant synthesis approaches are limited to quantifier-free numerical invariants. The approach presented in this paper contributes an effective technique to discover quantified invariants over arrays and linear integer arithmetic.
Syntax-guided techniques [3] have recently been applied to synthesize quantifier-free numerical invariants [15, 16, 17, 34] in the approach called FreqHorn. In a nutshell, FreqHorn collects various statistics from the syntactical patterns occurring in the program's source code and uses them to construct a set of formal grammars that specify a search space for invariants. It is often sufficient to perform an enumerative search over the formulas produced from these grammars and identify a set of suitable inductive invariants among them using an off-the-shelf solver for Satisfiability Modulo Theories (SMT). The presence of arrays complicates this reasoning in two respects: it is hard to find suitable candidates, and it is difficult to prove them inductive.
In this paper, we present a novel technique that extends the approach of enumerative search in general, and its instantiation in FreqHorn in particular, to reason about quantifiers. It discovers invariants over arrays in multiple stages. First, by exploiting the program syntax, it identifies ranges of elements accessed in each loop. Second, it identifies potentially useful facts about individual elements and generalizes them to hypotheses about entire ranges. The SMT-based validation of candidates, which are quantified formulas, is often inexpensive as they are constructed using the same syntactic patterns that appear in the source code. Furthermore, to support certain corner cases, our approach allows specifying additional rules that help in generalizing learned properties. The combination of properties proven inductive by an SMT solver is often enough to prove that the program meets a safety specification.
We show that FreqHorn advances the state of the art on a selection of array-handling programs from SVCOMP and the literature. For instance, it can prove completely automatically that an array is monotone after applying a sorting algorithm. Furthermore, FreqHorn is able to discover quantifier-free invariants over integer variables in the program and use them as inductive relatives while checking inductiveness of quantified candidates over arrays, and vice versa.
While a detailed discussion of the related work comes later in the paper (Sect. 6), it is noteworthy that being syntax-guided crucially helps us overcome several limitations of other techniques to verify array-handling programs [2, 9, 11, 35]. Most of them avoid inferring quantified invariants explicitly and thus do not produce checkable proofs. As a result, tools are fragile and in practice often output false positives (see Sect. 5 for concrete results). By comparison, our approach never produces false positives, and its results can be validated by existing SMT solvers.
The core contributions made through this work are:
- a novel syntax-guided approach to generate universally quantified invariants for programs manipulating arrays;
- an algorithm and its fully automated implementation; and
- a thorough experimental evaluation comparing our technique with the state of the art in verification of array-handling programs.
The rest of the paper is structured as follows. In Sect. 2, we give background and notation and illustrate our approach on an example. Our main contributions are then presented in Sect. 3 (main algorithm) and Sect. 4 (important design choices). In Sect. 5, we show the evaluation and comparison with the state of the art. Finally, the related work and conclusion complete the paper in Sects. 6 and 7, respectively.

Background
The Satisfiability Modulo Theories (SMT) task is to decide whether there is an assignment m of values to variables in a first-order logic formula ϕ that makes it true. We write ϕ ⟹ ψ if every satisfying assignment to ϕ is also a satisfying assignment to some formula ψ. By Expr we denote the space of all possible quantifier-free formulas in our background theory and by Vars a range of possible variables.

Programs as Constrained Horn Clauses
To guarantee expected behaviors, programs require proofs, such as inductive invariants, ranking functions, or recurrence sets. It is becoming increasingly popular to consider a verification task as a proof synthesis task, formulated as a system of SMT formulas involving unknown predicates, also known as constrained Horn clauses (CHC). The synthesis goal is to discover a suitable interpretation of all unknown predicates that makes all CHCs true. CHCs offer the advantages of flexibility and modularity in designing verifiers for various systems and languages. CHCs can be constructed in a way that captures the operational semantics of the language in question, and an off-the-shelf CHC solver can be used for solving the resulting formulas.

Definition 1. A linear constrained Horn clause (CHC) over a set of uninterpreted relation symbols R is a formula in first-order logic that has the form of one of three implications (called respectively a fact, an inductive clause, and a query):

ϕ(x₁) ⟹ inv₁(x₁)        inv₁(x₁) ∧ ϕ(x₁, x₂) ⟹ inv₂(x₂)        inv₁(x₁) ∧ ϕ(x₁) ⟹ ⊥

where inv₁, inv₂ ∈ R are uninterpreted symbols, x₁, x₂ are vectors of variables, and ϕ, called a body, is a fully interpreted formula (i.e., ϕ does not have applications of inv₁ or inv₂).

In the program of Example 1 (Fig. 1), the value m is smaller than or equal to all elements of A (it is either a minimal element of A or 0). The program then populates B with the values of A with m subtracted. Interestingly, the order of elements of A and B is not preserved, e.g., A[0] − m gets written to B[N − 1], and so on. Finally, the program computes the sum s of all elements in B and requires us to prove that s is never negative.
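The program described in Example 1 can be sketched in Python as follows. This is a hedged transliteration from the prose, not the listing of Fig. 1: the loop directions and the update of m are inferred from the ranges and the `ite` expression discussed later in the paper, and the function name `run_program` is hypothetical.

```python
def run_program(A):
    """Three loops as described in Example 1 (assumed shape, not Fig. 1 verbatim)."""
    N = len(A)

    # Loop 1: scan A from the last element down; m never exceeds a visited element.
    m = 0
    i = N - 1
    while i >= 0:
        m = A[i] if m > A[i] else m   # ite(m > A[i], A[i], m)
        i -= 1

    # Loop 2: B receives the values of A with m subtracted, in reversed order,
    # so A[0] - m is written to B[N - 1].
    B = [0] * N
    i = 0
    while i < N:
        B[N - i - 1] = A[i] - m
        i += 1

    # Loop 3: sum of all elements of B.
    s = 0
    i = 0
    while i < N:
        s += B[i]
        i += 1

    # Safety specification: s is never negative.
    assert s >= 0
    return m, B, s
```

Running the function on concrete arrays only tests the specification for those inputs; the point of the paper is to prove it for arrays of every size.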
Figure 2 gives a CHC encoding of the program. The system has three uninterpreted predicates, inv₁, inv₂, and inv₃, corresponding to invariants at the heads of the three loops. The primed variables correspond to modified variables. Rules B, D, and F encode the loop bodies, and the remaining rules encode the fragments of code before, after, or between the loops. In particular, rule G ensures that after the third loop has terminated, a program state with a negative value of s is unreachable. Before we describe how our technique solves this CHC system (see Sect. 2.2), we briefly introduce the notion of satisfiability of CHCs.

Definition 2. Given a set of uninterpreted relation symbols R and a set S of CHCs over R, we say that S is satisfiable if there exists an interpretation that assigns to each n-ary symbol inv ∈ R a relation over n-tuples and makes all implications in S valid.
In the paper, we assume that a relation assigned by an interpretation is represented by a formula ψ over at most n free variables.
We call a CHC C inductive when rel(src(C)) = rel(dst(C)) = inv for some inv. While accessing an array in a loop, we assume the existence of an integer counter variable. More formally:

Definition 3. Let C be an inductive CHC, x = args(src(C)), and x′ = args(dst(C)). We say that C is array-handling if there exist numbers c and a such that (1) 1 ≤ c ≤ |x| and 1 ≤ a ≤ |x|; (2) x[c] (and consequently its "primed copy" x′[c]) has type integer; (3) either of these implications holds:

body(C) ⟹ x[c] < x′[c]    (1)
body(C) ⟹ x[c] > x′[c]    (2)

(4) x[a] (and consequently x′[a]) has type array; and (5) there is an access function f that identifies a relationship between an access to x[a] in body(C) and x[c].

Illustrating Example
The CHC system in Fig. 2 has a solution, indicating that the program meets its specification. In particular, the interpretation of inv₁ means that as the first loop progresses (i.e., all elements A[N − 1], A[N − 2], ..., A[i + 1] are sequentially considered), the value of m never exceeds any of the considered elements. Thus, we refer to the interpretation of inv₁ as a progress lemma. When the first loop has terminated, this property holds for all elements from A[0] to A[N − 1]. Because A passes through the second loop without any changes, the interpretation of inv₁ gets finalized (thus, it becomes a finalized lemma) and is added to the interpretation of inv₂.
Additionally, the interpretation of inv₂ gets a relational fact about pairs of elements of A and B, which again appears as a progress lemma and then gets finalized in the interpretation of inv₃. With these two quantified invariants, one about all elements of A and one relating pairs of elements of A and B, it is possible to derive the remaining lemma in the interpretation of inv₃, namely s ≥ 0, which concludes the proof.
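The role of progress and finalized lemmas can be observed on concrete runs. The sketch below instruments an assumed Python rendering of the example program and asserts the lemmas at each loop head. The exact shape of the relational lemma, B[N − j − 1] = A[j] − m, is an assumption derived from the prose, and the helper names `forall` and `check_invariants` are hypothetical.

```python
def forall(lo, hi, pred):
    """Bounded check of: forall j. lo < j < hi ==> pred(j)."""
    return all(pred(j) for j in range(lo + 1, hi))


def check_invariants(A):
    N = len(A)

    m, i = 0, N - 1
    while True:
        # Progress lemma for inv1: every element visited so far is >= m.
        assert forall(i, N, lambda j: m <= A[j])
        if i < 0:
            break
        m = A[i] if m > A[i] else m
        i -= 1

    B, i = [0] * N, 0
    while True:
        # Finalized lemma inherited from inv1: it now covers the entire range.
        assert forall(-1, N, lambda j: m <= A[j])
        # Assumed relational progress lemma for inv2.
        assert forall(-1, i, lambda j: B[N - j - 1] == A[j] - m)
        if i >= N:
            break
        B[N - i - 1] = A[i] - m
        i += 1

    s, i = 0, 0
    while True:
        # Finalized relational lemma, plus the lemma needed for the query.
        assert forall(-1, N, lambda j: B[N - j - 1] == A[j] - m)
        assert s >= 0
        if i >= N:
            break
        s += B[i]
        i += 1
    return s
```

Passing assertions on sample inputs only suggest that the formulas are plausible candidates; establishing them as lemmas requires the inductiveness checks described in the following sections.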

Invariants via Enumerative Search
In this work, we aim to discover a solution for a CHC system S over a set of uninterpreted symbols R enumeratively, i.e., by guessing a candidate formula for each inv ∈ R, substituting it into all CHCs C ∈ S, and checking their validity.

Quantifier-Free Invariants
We build on top of an algorithm, called FreqHorn, recently proposed in [17]. Its key insight is an automatic construction of a set of formal grammars G(inv) for each inv ∈ R based on either source code, program behaviors, or both. Importantly, these grammars are conjunction-free: they cannot be used to produce a conjunction of clauses and can give rise to only a finite number of formulas, potentially related to invariants (otherwise, the approach does not guarantee strong convergence). Since invariants are often represented by a conjunction of lemmas, FreqHorn attempts to sample each lemma from a grammar separately (i.e., by recursively applying production rules), until a combination of them is sufficient for inductiveness and safety, or the search space is exhausted. FreqHorn relies on an SMT solver to filter out unsuccessfully sampled lemmas.
The construction of formal grammars is biased by the syntax of the CHC encoding. First, FreqHorn collects a set of Seeds by converting the body of each CHC to Conjunctive Normal Form and then extracting and normalizing each conjunct. Then, the set of seeds can optionally be extended by a set of behavioral seeds and bounded proofs. They are constructed respectively from the concrete values of variables obtained from actual program runs, and from Craig interpolants from unsatisfiable finite unrollings of the CHC systems. Finally, the production rules are created in a way that enables producing seeds and also their mutants (i.e., formulas syntactically similar to seeds). In general, no specific restriction on the grammar-construction method is imposed; so in practice, the grammars are allowed to be more (or less) general to enable a broader (or more focused) search space for invariants.
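The idea of seeds and mutants can be illustrated with a toy grammar. The sketch below is not FreqHorn's grammar construction: it assumes that each normalized seed is a linear atom stored as a triple (coefficients, operator, constant), and it simply recombines the ingredients of the seeds, ordered by how often they occur. The names `grammar_from_seeds` and `enumerate_formulas` are hypothetical.

```python
from collections import Counter
from itertools import product

# A linear atom  c1*x1 + ... + cn*xn  op  k  is stored as (coeffs, op, k),
# where coeffs is a tuple of (variable, coefficient) pairs.
SEEDS = [
    ((("i", 1),), "<", 0),               # i < 0
    ((("i", 1), ("N", -1)), "<", 0),     # i - N < 0
    ((("i", 1),), ">=", 0),              # i >= 0
]


def grammar_from_seeds(seeds):
    """Count how often each ingredient occurs among the seeds."""
    coeffs = Counter(s[0] for s in seeds)
    ops = Counter(s[1] for s in seeds)
    consts = Counter(s[2] for s in seeds)
    return coeffs, ops, consts


def enumerate_formulas(grammar):
    """List all atoms of the grammar, most frequent ingredients first.

    The space is finite and conjunction-free: every result is a single atom.
    """
    def by_frequency(counter):
        return [item for item, _ in counter.most_common()]

    coeffs, ops, consts = grammar
    return list(product(by_frequency(coeffs), by_frequency(ops), by_frequency(consts)))
```

In this toy space, the atom i − N ≥ 0 is a mutant: it is not a seed itself but is built only from ingredients that appear in seeds.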

Quantified Candidates from Quantifier-Free Grammars
The main obstacle for applying the enumerative search to generate array invariants is that the grammars do not allow quantifiers. Because grammars are constructed automatically from syntactic patterns that appear in the original programs, in the presence of arrays we can expect expressions involving only particular elements of arrays (such as the ones accessed via a loop counter). However, since each loop repeats certain operations over a range of array elements, we have to generalize the extracted expressions about individual elements to expressions about entire ranges.
Let a set of variables associated with a relation symbol inv be Vars(inv) ≝ IntVars(inv) ∪ ArrVars(inv), where IntVars(inv) and ArrVars(inv) are disjoint and contain integer variables and array variables, respectively. A candidate quantified invariant over arrays consists of three parts:
- a set of quantified integer variables QVars(inv), which are introduced by our algorithm and do not appear in Vars(inv);
- a range formula over QVars(inv) ∪ IntVars(inv); and
- a quantifier-free cell property over QVars(inv) ∪ Vars(inv).
A naive idea for getting a range formula and a cell property is to sample them separately and then to bind them together using some QVars(inv). But this would result in a large search space. Algorithm 1 gives a more tailored procedure. The central role in this process is taken by an analysis of the loop counters that are used to access array elements (line 3). This analysis is performed once for each loop before the main verification process, and thus its results are reused in all iterations of the verification process.
Our algorithm identifies QVars(inv) by creating a fresh variable for each counter, including counters of nested loops (line 5). It then generates range formulas based on the results of the analysis (line 6) such that: (1) the range formula itself is an inductive invariant for inv, and (2) the range formula is expressed over the initial values of counters of inv and the counters themselves. Finally, only a cell property is going to be produced from the grammar G(inv), constructed from the seeds (recall Sect. 3.1), in which all counters are replaced by the corresponding variables from QVars(inv) (line 7). Thus, the only part of the candidate formula where the counter can appear is the range formula.

Algorithm 4. extend(S, R, Cand, Lemmas), cf. [17]. Output: extended Cand. Cand ← weaken(S, R, Cand, Lemmas); for all C ∈ S s.t. rel(src(C)) ∈ R do
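The construction of a quantified candidate from a quantifier-free seed can be sketched as follows. This is an illustrative encoding, not the tool's data structure: formulas are modeled as Python predicates over a state dictionary, the class `QuantifiedCandidate` and the function `generalize` are hypothetical names, and the quantifier is evaluated over a finite universe of indices instead of being handed to an SMT solver.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class QuantifiedCandidate:
    qvar: str                                  # fresh variable, not in Vars(inv)
    range_: Callable[[int, Dict], bool]        # range formula over qvar and IntVars
    cell: Callable[[int, Dict], bool]          # quantifier-free cell property

    def holds(self, state: Dict, universe: Iterable[int]) -> bool:
        """Evaluate  forall q. range(q) ==> cell(q)  over a finite universe."""
        return all(self.cell(q, state) for q in universe if self.range_(q, state))


def generalize(seed, counter, range_):
    """Replace the loop counter in a seed by a fresh quantified variable.

    The counter may then occur only in the range formula, never in the cell.
    """
    def cell(q, state):
        return seed({**state, counter: q})

    return QuantifiedCandidate("q_" + counter, range_, cell)


# Seed taken from the loop body of the running example: m <= A[i].
seed = lambda st: st["m"] <= st["A"][st["i"]]
# Progress range of the first loop: i < q < N.
candidate = generalize(seed, "i", lambda q, st: st["i"] < q < st["N"])
```

For the state with A = [3, −2, 5], N = 3, i = 0, and m = −2, the candidate holds because both A[1] and A[2] are at least m; with m = 0 it fails on A[1].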
Once grammars, QVars, and ranges are detected, our approach proceeds to sample candidates and to check them with an SMT solver. The general flow of this algorithm is illustrated in Algorithm 2. For each inv ∈ R, it initiates a set Lemmas(inv) (line 2). Then it iteratively guesses lemmas until a combination of them is inductive and safe, or the search space is exhausted (lines 3-4).
Compared to the baseline approach from [17], our new algorithm fixes a shape for the candidates for arrays. At the same time, it permits sampling quantifier-free candidates (line 6): they could be either formulas over counters or any other variables in the loop, or even formulas over isolated array elements (if, e.g., accessed by a constant). Then (line 8), Algorithm 2 propagates candidates through all available implications in CHCs using quantifier elimination and identifies lemmas among the candidates. This step is similar to the baseline approach from [17], but for completeness of presentation, we provide the pseudocode in Algorithms 3 and 4. The only differences are (1) in the implementation of the candidate propagation for array candidates and (2) in the weakening of failed candidates (both in Algorithm 3, to be discussed in Sects. 4.3 and 4.4, respectively).
Both successful and unsuccessful candidates are "blocked" from their grammars to avoid re-sampling them in the next iterations. This fact, together with the property of grammars being conjunction-free, gives the main hint for proving the following theorem.
Theorem 1. Algorithm 2 always makes a finite number of iterations, and if it returns with SAT then the CHC system is satisfiable.
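The guess-and-check loop with blocking can be sketched on a toy system. This sketch makes several simplifying assumptions that are not part of Algorithm 2: it handles a single loop (`i = 0; while i < N: i += 1`, with the query that i = N on exit), candidates come from a fixed finite list instead of a grammar, and a brute-force check over a bounded set of states stands in for the SMT solver. All function names are hypothetical.

```python
from itertools import product


def init(st):
    return st["i"] == 0


def step(st):
    """One loop iteration, or None when the loop guard i < N fails."""
    return {**st, "i": st["i"] + 1} if st["i"] < st["N"] else None


def violates_query(st):
    """The loop has exited (i >= N) but i != N."""
    return st["i"] > st["N"]


def is_lemma(cand, lemmas, universe):
    # Initiation: every initial state satisfies the candidate.
    if any(init(s) and not cand(s) for s in universe):
        return False
    # Inductiveness relative to the lemmas learned so far.
    for s in universe:
        if cand(s) and all(lemma(s) for lemma in lemmas):
            t = step(s)
            if t is not None and not cand(t):
                return False
    return True


def solve(candidates, universe):
    lemmas, names, blocked = [], [], set()
    for name, cand in candidates:
        if name in blocked:
            continue
        blocked.add(name)          # block successes and failures alike
        if is_lemma(cand, lemmas, universe):
            lemmas.append(cand)
            names.append(name)
        if not any(violates_query(s) and all(lemma(s) for lemma in lemmas)
                   for s in universe):
            return "SAT", names
    return "UNKNOWN", names


CANDIDATES = [
    ("i < N", lambda s: s["i"] < s["N"]),
    ("i >= 0", lambda s: s["i"] >= 0),
    ("i <= N", lambda s: s["i"] <= s["N"]),
]
UNIVERSE = [{"i": i, "N": n} for i, n in product(range(-1, 7), range(0, 6))]
```

Because every candidate is visited at most once and the list is finite, the loop terminates, which mirrors the first part of Theorem 1. A "SAT" answer here is only bounded evidence, since the check ranges over finitely many states.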
The next section discusses a particular instantiation of important subroutines that make our invariant synthesizer effective in practice.

Design Choices
Our main contribution is a completely automated algorithm for finding quantified invariants for array-handling loops. In this section, we first show how, by exploiting the program syntax, we can identify ranges of elements accessed in each loop (Sect. 4.1). Second, we present an intuitive justification of why our candidates can often be proved as lemmas by an off-the-shelf SMT solver (Sect. 4.2). Finally, we extend our algorithm to handle more complicated cases of multiple loops (Sects. 4.3-4.4), and benchmarks of the tiling [9] technique, which are adapted from the industrial code of battery controllers (Sect. 4.5).

Discovery of Progress Lemmas
We start with the simplest scenario of a single loop handling just one array. Let S be a system of CHCs over a set of uninterpreted relation symbols R. Let inv ∈ R correspond to a loop in which arrays are accessed using some counter variable i (counters are automatically identified by posing and solving queries of forms (1) and (2)).
Recall that we do not necessarily require the array elements to be accessed directly by i, and we allow an access function f to identify relationships between i and an index of the accessed element. However, we assume that the counter is unique in the loop because this is the case in most practical applications. In principle, our algorithm can be extended to loops handling several independent counters (although this is rare in practice), with the help of additionally discovered lemmas that describe relationships among counters. We leave a discussion about this to future work.

Definition 4. A range of inv and a counter i is a formula over IntVars(inv) and a free variable v having the form L < v ∧ v < U, such that either of the formulas L < i or i < U is a lemma for inv. A progress range is either a formula L < v ∧ v < i or a formula i < v ∧ v < U.
Both ranges and progress ranges can be identified statically. Let C₁ and C₂ be two CHCs such that inv = rel(dst(C₁)) = rel(src(C₂)) = rel(dst(C₂)) and inv ≠ rel(src(C₁)). It is common in practice that body(C₁) identifies a symbolic bound b on the initial value of i: it could be either a lower bound (if i increments in body(C₂)) or an upper bound (if i decrements). In this case, a progress range of inv is simply computed as a lemma for inv over i and b. A range of inv can often be constructed as a conjunction of the progress range with the negation of the termination condition of body(C₂).

Example 2. For the CHC encoding of the program shown in Fig. 2, the ranges of inv₁, inv₂, and inv₃ are all equal to −1 < v < N. The progress range of inv₁ is i < v < N, and the progress ranges of inv₂ and inv₃ are −1 < v < i.
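The ranges of Example 2 can be reproduced with a small helper. The sketch below assumes that the static analysis has already produced the exclusive symbolic bounds and the direction of the counter; the function `make_ranges` and its parameters are hypothetical and only illustrate how a range and a progress range relate.

```python
def make_ranges(lower, upper, increments, counter="i"):
    """Build the range and the progress range of a loop as predicates.

    lower, upper: exclusive symbolic bounds, given as functions of the state.
    increments:   True if the counter grows in the loop body, False if it shrinks.
    """
    def range_(v, st):
        return lower(st) < v < upper(st)

    def progress(v, st):
        if increments:
            # Elements visited so far lie between the lower bound and the counter.
            return lower(st) < v < st[counter]
        # For a decrementing counter they lie between the counter and the upper bound.
        return st[counter] < v < upper(st)

    return range_, progress


# inv1: i decrements from N - 1, so the progress range is i < v < N.
range1, progress1 = make_ranges(lambda st: -1, lambda st: st["N"], increments=False)
# inv2 and inv3: i increments from 0, so the progress range is -1 < v < i.
range2, progress2 = make_ranges(lambda st: -1, lambda st: st["N"], increments=True)
```

With i = 1 and N = 4, the range covers indices 0 to 3, the progress range of inv₁ covers 2 and 3, and the progress range of inv₂ covers only 0.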
We call candidates that use progress ranges in their left sides progress candidates: ∀q . progressRange(inv)(q) ⟹ cand, where q = QVars(inv) and cand is a quantifier-free formula over QVars(inv) ∪ IntVars(inv). As can be seen from Algorithm 1, all sampled candidates are progress candidates. However, during the next steps of the algorithm (i.e., propagation and weakening) we will use other kinds of candidates (namely, regress and finalized, see Sects. 4.3 and 4.4, respectively).
If a progress candidate is proven inductive, we call it a progress lemma.

SMT-Based Inductiveness Checking
We rely on recent advances in SMT solving to identify successful candidates, a conjunction of which is directly used to prove the desired safety specification. In general, solving quantified formulas for validity is a hard task. However, in certain cases, the initiation and inductiveness queries can be simplified and reduced to a sequence of (sometimes even quantifier-free) formulas over integer arithmetic. We illustrate such a proving strategy, inspired by the tiling approach [9], on the following example.
Example 3. Recall the CHC system from Fig. 2. Consider a progress candidate ∀j . i < j < N ⟹ m ≤ A[j] for inv₁. Checking its initiation (i.e., for CHC A) requires deciding the validity of a quantified formula, referred to as (3). In it, the range formula i < j < N simplifies to N − 1 < j < N, which is always false, making formula (3) always valid.
Checking the inductiveness of the candidate (i.e., for CHC B) boils down to solving a more complicated formula, referred to as (4). Although quantifiers are present on both sides of (4), proving its validity is not hard. Indeed, the query is reducible to two implications. The former does not depend on the quantified premise, so the entire quantified conjunct is ignored, and A[i] can be replaced by a fresh integer variable. The latter is trickier: it requires proving that if all elements in a range are greater than or equal to m, then they are also greater than or equal to ite(m > A[i], A[i], m). This again is reduced to a quantifier-free formula over integer arithmetic. Thus, because formulas (3) and (4) are valid, the progress candidate is proved to be a progress lemma.
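The two quantifier-free obligations of Example 3 can be checked exhaustively over a small integer domain. The formulas below are an assumed reading of the two implications, whose exact text is not reproduced here: A[i] is replaced by a fresh integer `a`, an arbitrary already-visited element is represented by `x`, and the updated value of m is ite(m > a, a, m). Bounded enumeration is used only as a sanity check and is not a substitute for an SMT proof.

```python
from itertools import product


def ite(cond, then_value, else_value):
    return then_value if cond else else_value


def new_element_ok(m, a):
    """First obligation: the updated m does not exceed the new element A[i]."""
    return ite(m > a, a, m) <= a


def old_elements_ok(m, a, x):
    """Second obligation: if m <= x held before, the updated m still satisfies it."""
    return not (m <= x) or ite(m > a, a, m) <= x


def check_all(domain):
    """Exhaustively test both obligations over a finite integer domain."""
    first = all(new_element_ok(m, a) for m, a in product(domain, repeat=2))
    second = all(old_elements_ok(m, a, x) for m, a, x in product(domain, repeat=3))
    return first and second
```

Both obligations mention no arrays and no quantifiers, which is why a solver for linear integer arithmetic can discharge them quickly.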
In general, we cannot always conduct proofs that easily. Often, the prerequisite for success is the commonality of an access function f in the candidate and the body of the CHC. Fortunately, our algorithm ensures that all access functions used in the candidates are borrowed directly from bodies of CHCs. Thus, in many cases, FreqHorn is able to check large numbers of candidates quickly.

Strategy of Lemma Propagation
In this subsection, we identify a useful strategy for propagation of quantified lemmas through adjacent CHCs in the given system, inspired by [17]. Let some inv₁ ∈ R have a lemma of the form ∀q . ρ(q) ⟹ ℓ(q), where q = QVars(inv₁), the formula ρ over q ∪ IntVars(inv₁) is either a range or a progress range, and the cell property ℓ is over q ∪ Vars(inv₁). Let then a CHC C be such that rel(src(C)) = inv₁ and rel(dst(C)) = inv₂, and let its body be ϕ(x₁, x₂).

Definition 5. Forward propagation of the lemma ∀q . ρ(q) ⟹ ℓ(q) through C gives a formula of the following form:

Example 4. Recall the example from Fig. 2 and the following lemma for inv₁:

The body of C is i < 0 ∧ i′ = 0, thus the forward propagation gives the following formula:

Applying quantifier elimination to both sides of the implication, we get the following formula:

Note that this formula is not going to be immediately learned as a lemma, but instead should be checked by the solver for inductiveness. Intuitively, such a candidate represents some facts about array elements that were accessed during a loop that has terminated. If after the propagation the candidate uses the entire range, then we refer to it as a finalized candidate.

Weakening Strategy
Whenever a finalized candidate cannot be proven inductive, we often do not want to withdraw it completely. Instead, our algorithm runs weakening and proposes regress candidates. The main idea is to calculate a range of elements that have not been touched by the loop yet. This is an inverse of the procedure outlined in Sect. 4.1.

Definition 6. Given inv ∈ R and its Range(inv) and progressRange(inv) formulas, we call a regress range a formula of the following kind:

We call candidates that use regress ranges in their left sides regress candidates. Clearly, a regress candidate is weaker than the corresponding finalized candidate. Thus, from the failure to prove inductiveness of the finalized candidate it does not follow that the regress candidate is not inductive, and it makes sense to try proving it in the next iteration.
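The relationship between the three kinds of ranges can be sketched as follows. The formula of Definition 6 is not reproduced here, so the sketch assumes one natural reading of the prose: the regress range consists of the indices in the range that are not in the progress range, i.e., the elements the loop has not touched yet. The function name `regress_range` is hypothetical.

```python
def regress_range(range_, progress):
    """Assumed form of a regress range: in the range, but not yet visited."""
    return lambda v, st: range_(v, st) and not progress(v, st)


# Ranges of inv2 from Example 2: the counter i increments from 0 up to N.
def full_range(v, st):
    return -1 < v < st["N"]


def progress_range(v, st):
    return -1 < v < st["i"]


untouched = regress_range(full_range, progress_range)


def partition(st, indices):
    """Split the indices of the range into visited and untouched ones."""
    visited = [v for v in indices if progress_range(v, st)]
    remaining = [v for v in indices if untouched(v, st)]
    return visited, remaining
```

Under this reading, a regress candidate constrains only the untouched indices, so it is implied by the corresponding finalized candidate over the entire range and is therefore weaker.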

Learning from Sub-ranges
In complicated scenarios of loops with multiple iterators, multiple array variables, or multiple access functions, the iterative process of lemma discovery might end up with a large number of quantified formulas, and the solver might get lost while checking a candidate for inductiveness (recall Sect. 4.2). To overcome current limitations in existing SMT solvers, it proved useful to help the solver while generalizing learned lemmas. In particular, a property could be learned for two subranges of an array and then combined in the following way:

Lemma 1. Let for some inv ∈ R two lemmas be of the following kind:

Then, the following is also a lemma for inv:

Figure 3 shows a program from the tiling benchmark suite [9].
≤ m is also a lemma.

Evaluation
We have implemented our algorithm on top of the FreqHorn tool. It takes a system of CHCs with arrays as input and performs an enumerative search as presented in Sect. 4. The tool uses Z3 [12] to solve SMT queries.
We have evaluated FreqHorn on 137 satisfiable CHC-translations of publicly available C programs (whose assertions are safe) taken from the SVCOMP ReachSafety Array subcategory and the literature. These programs include variations of standard array copying, initializing, maximum, minimum, sorting, and tiling benchmarks. Among these 137 benchmarks, 79 have a single loop, and 58 have multiple loops, including 7 that have nested loops. These programs are encoded using the theories of Arrays, Linear Integer Arithmetic (LIA), and Non-linear Integer Arithmetic (NIA). Our experiments have been performed on an Ubuntu 18.04 machine running at 2.5 GHz and having 16 GB memory, with a timeout of 100 s for every benchmark. FreqHorn solved 129 benchmarks within the timeout, of which 73 had a single loop and 56 had multiple loops.
FreqHorn solved 54 benchmarks on which Spacer diverged. Our intuition is that Spacer works poorly on programs with non-deterministic assignments and NIA operations, which our tool can handle.
FreqHorn solved 27 benchmarks on which VeriAbs diverged. VeriAbs failed to solve programs with nested loops and programs in which array values depended on access indices. Furthermore, it reported one of the programs as unsafe. Time-wise, FreqHorn significantly outperformed VeriAbs on all benchmarks.
Importantly, the short time taken by FreqHorn includes the time for generating a checkable witness (a quantified invariant), which VeriAbs cannot produce by design. On the other hand, VeriAbs solved several benchmarks after merging loops. No quantified invariant satisfying FreqHorn's restrictions exists for these benchmarks before this program transformation.
FreqHorn solved 60 programs on which VIAP diverged. VIAP reported one program as unsafe. There were no programs on which FreqHorn took more time than VIAP. Finally, FreqHorn solved 83 programs on which Booster diverged. Again, Booster reported two programs as unsafe.

Related Work
Our algorithm for quantified invariant synthesis extends the prior work on checking satisfiability of CHCs [15, 16, 17], where solutions do not permit quantifiers. It works in a similar (enumerate-and-check) manner, but there are two crucial changes: (1) introduction of quantifiers, to formulate hypotheses over a subset of array indices, and (2) a generalization mechanism, to derive properties that may hold over the entire range of array indices.
Many existing approaches for verifying programs over arrays are extensions of well-known techniques for programs over scalar variables to quantified invariants. Examples include extending predicates with Skolem variables in predicate abstraction [30], exploiting the MCMT [19] framework in lazy abstraction with interpolants [1] and its integration with acceleration [2], and, recently, QUIC3 [22], which extends IC3 [8, 14] to universally quantified invariants. Apart from the skeletal similarity, however, these approaches rely on orthogonal techniques.
Partitioning of arrays has also been used to infer invariants in many different ways. It refers to splitting an array into symbolic segments, and may be based on syntax [20, 23, 25] or semantics [10, 31]. Invariants may be inferred for each segment separately and generalized for the entire array. The partitioning need not be explicit, as in [13]. However, most of these techniques (except [13, 31]) are restricted to contiguous array segments, and work well when different loop iterations write to disjoint array locations or when the segments are non-overlapping. Tiling [9], a property-driven verification technique, overcomes these limitations for a class of programs by inferring array access patterns in loops. But identifying tiles of array accesses is itself a difficult problem, and the approach is currently based on heuristics developed by observing interesting patterns.
There are a number of approaches that verify array programs without inferring quantified invariants explicitly. A straightforward way is to smash all array elements into a single memory location [4], but it is quite imprecise. Every array element might also be considered a separate variable, but this is not possible with unknown array sizes. There are also techniques that abstract an array to a fixed number of elements, e.g., k-distinguished cell abstraction [32, 33] and k-shrinkability [24, 29]. Such abstractions usually reduce array-modifying loops with unknown bounds to a known, small bound. It may even be possible to get rid of such loops altogether, by accelerating (computing transitive closures of) transition relations involving array updates in that loop [7]. Along similar lines, VIAP [35] resorts to reasoning with recurrences instead of loops. It translates the input program, including loops, to a set of first-order axioms, and checks if they derive the property. However, none of these techniques obtains quantified invariants explicitly, unlike ours. Besides, many of these transformations produce an abstraction of the original program, i.e., they do not preserve safety.
Alternatively, there are approaches that use sufficiently expressive templates to infer quantified invariants over arrays [5, 21, 27]. However, the templates need to be supplied manually. For instance, [6] uses a template space of quantified invariants and reduces the problem to quantifier-free invariant generation. Thus, universally quantified solutions for unknown predicates in a CHC system may be obtained by extending a generic CHC solver to handle quantified predicates. Learning need not be limited to user-supplied templates; one may do away with the templates entirely and learn only from examples and counterexamples [18]. Alternatively, [36] chooses a template upfront and refurbishes it with constants or coefficients appearing in the program source. Similarly, [28] proposes to infer array invariants without any user guidance or any user-defined templates or predicates. Their method is based on automatic analysis of predicates that update an array and allows one to generate first-order invariants, including those that contain alternations of quantifiers. But it does not work for nested loops. By comparison, our technique supports multiple as well as nested loops, enables candidate propagation between loops and, more importantly, generates the grammar automatically from the syntactical constructions appearing in the program's source.

Conclusion
We have presented a new algorithm to synthesize quantified invariants over array variables that are systematically accessed in loops. Our algorithm implements an enumerative search that guesses invariants based on syntactic constructions that appear in the code and checks their initiation, inductiveness, and safety with an off-the-shelf SMT solver. Key insights behind our approach are that individual accesses to array elements performed in the loop can be generalized to hypotheses about entire ranges, and that existing SMT solvers can be used to validate these hypotheses efficiently. Our implementation on top of the CHC solver FreqHorn confirmed that such a strategy is effective on a variety of practical examples. In the vast majority of cases, our tool outperformed competitors and provided checkable guarantees that prevented it from reporting false positives.
For a CHC C, by src(C) we denote an application of inv ∈ R in the premise of C (if C is a fact, we write src(C) ≝ ⊤). Similarly, by dst(C) we denote an application of inv ∈ R in the conclusion of C (if C is a query, we write dst(C) ≝ ⊥). We define functions rel and args, such that for each inv(x), rel(inv(x)) ≝ inv and args(inv(x)) ≝ x. For a CHC C, by body(C) we denote the body (i.e., ϕ) of C.

Example 1. Figure 1 gives a program in the C programming language that handles two integer arrays, A and B, both of an unknown size N. The A array has unknown content, and the program first identifies a value m which is smaller or

int N = nondetInt(); int *A = nondetArray(N); int m = 0; for

Fig. 4. FreqHorn vs competitors. Each point in a plot represents a pair of the run times (sec × sec) of FreqHorn (x-axis) and a competitor (y-axis). Timeouts are placed on the inner dashed lines; false alarms, unsupported cases, and crashes are on the outer dashed lines.