Accelerating Syntax-Guided Invariant Synthesis
Abstract
We present a fast algorithm for syntax-guided synthesis of inductive invariants which combines enumerative learning with inductive-subset extraction and leverages counterexamples-to-induction and interpolation-based bounded proofs. It is a variant of a recently proposed probabilistic method, called FreqHorn, which is however less dependent on heuristics than its predecessor. We present an evaluation of the new algorithm on a large set of benchmarks and show that it exhibits a more predictable behavior than its predecessor, and that it is competitive with the state-of-the-art invariant synthesizers based on Property Directed Reachability.
1 Introduction
Syntax-guided techniques [1] have recently earned significant success in the field of synthesis of inductive invariants [13] for a given program and a given safety specification. Invariants are needed to represent overapproximations of the set of reachable program states, such that from their empty intersection with the set of error states one could conclude that the program is safe. While searching for invariants, it is intuitive to collect various statistics from the syntactic constructions that appear in the program’s source code and to use them as guidance.
This work continues the track of FreqHorn, a completely automatic approach for (1) construction of the formal grammar based on the symbolic program encoding, and (2) probabilistic search through the candidate formulas belonging to that grammar. FreqHorn utilizes an SMT solver for checking inductiveness of each generated formula and iteratively constructs a suitable invariant based on the successful attempts (those formulas are called lemmas). Since it is based on a finite number of expressions, the formal grammar is sufficiently small, and thus the candidate formulas can be enumerated relatively quickly. We distinguish two types of candidates: (1) formulas directly extracted from the program’s encoding (called seeds) and (2) formulas which are syntactically close to seeds (called mutants).
The conceptual novelty of FreqHorn is believed to be in the combined use of seeds and mutants, but the original paper [13] is largely silent on the matter. Furthermore, it turns a blind eye to some algorithmic and practical details which are required for making the approach actually efficient. Among the downsides are (1) the equal treatment of all syntactic expressions, with no regard for whether the candidates have any relevance to the given safety specification; and (2) the inability to predict a more-or-less appropriate order in which candidates should be sampled and checked.
Luckily, elements of Property Directed Reachability (PDR) [4, 10] can be adapted in various stages of FreqHorn’s workflow and can mitigate the downsides of the original algorithm. In particular, we propose to check candidates in batches, and we show that in practice it helps to discover more lemmas. Additionally, we propose to keep a history of counterexamples-to-induction (CTIs) which blocked FreqHorn from learning a lemma. With some periodicity, our new algorithm checks if there is a CTI which is invalidated by the currently learned lemmas, and this triggers the recheck of the failed candidate.
Last but not least, we integrate our new algorithm with the classic techniques based on Bounded Model Checking [3]. We propose to compute additional candidates by Craig interpolation [6] from proofs of bounded safety. We show that it is often sufficient to obtain some fixed amount of candidates from interpolants in the beginning of the synthesis process, and further to bootstrap the initial set of learned lemmas by the inductive subset extracted from the combination of the syntactic seeds and interpolants. In contrast to the entirely randomized workflow of the original version of FreqHorn, the behavior of our revised implementation at the bootstrapping is predictable. The randomized search is used by the new algorithm only for discovering mutants; and in our experiments, it was required in about one third of cases only.

The contributions of this paper are as follows:

A new revision and a new implementation of the FreqHorn algorithm which is split into the bootstrapping and the sampling stages. In the first stage, it deterministically exploits seeds only. In the second stage, it keeps generating and checking only mutants, and it is by design nondeterministic.

In the bootstrapping stage, interpolation-based proofs of bounded safety that replenish the set of seeds by the candidates that likely reflect the nature of the error unreachability and consequently affect the grammar-based generation of mutants.

In the sampling stage, the routine to extract inductive subsets which mitigates the effect of an unpredictably chosen sampling order.

A more accurate strategy for the search space pruning and an efficient counterexample-guided approach to give some failed candidates a second chance.
The rest of the paper is structured as follows. In Sect. 2, we briefly formulate the inductive synthesis problem, and in Sect. 3 we sketch the basic FreqHorn algorithm that attempts to solve it. With the help of techniques from Sect. 4, in Sect. 5 the FreqHorn algorithm gets augmented and reformulated. In Sect. 6, we show the experimental evidence that it indeed outperforms its predecessor and is competitive with the state of the art. Finally, the related work, conclusion, and acknowledgments complete the paper in Sects. 7 and 8.
2 Background and Notation
A first-order theory \(\mathcal {T}\) consists of a signature \(\varSigma \), which gathers variables, function and predicate symbols, and a set \( Expr \) of \(\varSigma \)-formulas. Formula \(\varphi \in Expr \) is called \(\mathcal {T}\)-satisfiable if there exists an interpretation m of each element (i.e., a variable, a function or a predicate symbol), under which \(\varphi \) evaluates to \(\top \) (denoted \(m\models \varphi \)); otherwise \(\varphi \) is called \(\mathcal {T}\)-unsatisfiable (denoted \(\varphi \!\implies \! \bot \)). The Satisfiability Modulo Theory (SMT) problem [8] for a given theory \(\mathcal {T}\) and a formula \(\varphi \) aims at determining whether \(\varphi \) is \(\mathcal {T}\)-satisfiable. In this work, we formulate the tasks arising in program verification by encoding them as SMT problems.
Definition 1
A transition system P is a tuple \(\langle { V }\cup { V' }, Init , Tr \rangle \), where \({ V' }\) is a primed copy of a set of variables \({ V }\); \( Init \) and \( Tr \) are \(\mathcal {T}\)-encodings of respectively the initial states and the transition relation.
We view programs as transition systems and throughout the paper use both terms interchangeably. A verification task is a pair \(\langle P, Bad \rangle \), where \(P = \langle { V }\cup { V' }, Init , Tr \rangle \) is a program, and \( Bad \) is a \(\mathcal {T}\)-encoding of the error states. A verification task has a solution if the set of error states is unreachable. A solution to the verification task is represented by a safe inductive invariant, a formula that covers every initial state, is closed under the transition relation, and does not cover any of the error states.
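Definition 2
Let \(\langle P, Bad \rangle \) be a verification task. A safe inductive invariant is a formula \( Inv \) over \({ V }\) that satisfies the following three conditions:
(1) \( Init ({ V }) \implies Inv ({ V })\) (initiation),
(2) \( Inv ({ V }) \wedge Tr ({ V }, { V' }) \implies Inv ({ V' })\) (consecution),
(3) \( Inv ({ V }) \wedge Bad ({ V }) \implies \bot \) (safety).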
To simplify reading, in the rest of the paper safe inductive invariants are referred to as just invariants. We assume that an invariant \( Inv \) has the form of conjunction, i.e., \( Inv = \ell _0 \wedge \ldots \wedge \ell _n\), and each \(\ell _i\) is called a lemma.
The validity of each implication (1) and (2) is equivalent to the unsatisfiability of the negation of the corresponding formula. Suppose a formula \( Inv \) makes (1) valid, but does not make (2) valid. Then there exists an interpretation m satisfying \( Inv ({ V }) \wedge Tr ({ V }, { V' }) \wedge \lnot { Inv ({ V' })}\), which we refer to as a counterexample-to-induction (CTI).
Example 1
The loop of the program in Fig. 1a iterates N times, and in each iteration it nondeterministically picks a value M, adds it to x (conditionally) and to c, and assigns the sum of x and c to k. We wish to prove that after the loop terminates, \(\mathtt {x \ge N}\). An invariant for the program is not unique, e.g., both the conjunction \((k\,\texttt {mod}\,2 = 0 \wedge x = c)\) and the conjunction \((k = x + c \wedge x \ge c)\) are solutions for this verification task. \(\square \)
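The three requirements on an invariant (it covers the initial states, is closed under the transition relation, and excludes the error states) can be tested by brute force on a small box of integer values. The sketch below is only illustrative: Fig. 1a is not reproduced in this text, so the loop body in `step`, and in particular the guard `k % 2 == 0` on the update of x, is an assumption chosen to agree with the textual description of the program. The names `VALS`, `check`, `inv_full`, and `inv_weak` are hypothetical, and passing the bounded check is evidence, not a proof.

```python
from itertools import product

VALS = range(-3, 4)  # small search box; passing here is evidence, not a proof


def init(s):
    x, k, c, n = s
    return x == 0 and k == 0 and c == 0


def bad(s):
    x, k, c, n = s
    return c >= n and not x >= n


def step(s, m):
    """One iteration of the ASSUMED loop body; None when the guard c < N is false."""
    x, k, c, n = s
    if not c < n:
        return None
    x2 = x + m if k % 2 == 0 else x  # assumed condition on the update of x
    c2 = c + m
    return (x2, x2 + c2, c2, n)


def check(inv):
    """Return ("ok", None) or the first violated condition with a witness."""
    for s in product(VALS, repeat=4):
        if init(s) and not inv(s):
            return ("initiation", s)
        if inv(s) and bad(s):
            return ("safety", s)
        if inv(s):
            for m in VALS:
                t = step(s, m)
                if t is not None and not inv(t):
                    return ("consecution", (s, m, t))  # a CTI
    return ("ok", None)


def inv_full(s):  # k mod 2 = 0 and x = c
    return s[1] % 2 == 0 and s[0] == s[2]


def inv_weak(s):  # x = c alone
    return s[0] == s[2]


print(check(inv_full))
print(check(inv_weak))
```

Under these assumptions the first conjunction of Example 1 passes all three checks inside the box, while \(x = c\) on its own fails closure under the transition relation: the witness is a state with an odd k, in which x is not updated but c is.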
3 Syntax-Guided Invariant Synthesis
In this work, we aim at discovering invariants in an enumerative way, i.e., by guessing a candidate formula, substituting it into conditions (1), (2), and (3), and checking their validity. Here we present a moderately reformulated and simplified view of an algorithm recently proposed in [13].^{1} The pseudocode of the algorithm, called FreqHorn, is shown in Algorithm 1. The key insight behind the algorithm is the automatic construction of a grammar G (line 2) based on a fixed set \( Seeds \) of expressions obtained by traversing parse trees of \( Init \), \( Tr \), and \( Bad \) (line 1).
The algorithm uses G for generating the candidate formulas (line 5) and populates the set of lemmas until their conjunction is an invariant. The algorithm learns from each positive and negative attempt (line 9). That is, G gets adjusted, such that the candidate (and some of its close relatives) is not going to be sampled in any of the following iterations.
Example 2
The verification condition for the program in Fig. 1a is represented by three implications in Fig. 1d. They are syntactically split into the set \( Seeds \) of expressions over \({ V }\) (shown in Fig. 1b): only the pieces that contained exclusively primed or exclusively unprimed variables are kept, and the primes are dropped. In particular, equalities \(x' = x + M\) and \(c' = c + M\) are excluded from \( Seeds \), and equality \(k'=x'+c'\) is rewritten to \(k=x+c\). The grammar containing the normalized expressions from \( Seeds \) is shown in Fig. 1c. It is easy to see that all lemmas constituting both invariants from Example 1, \((k\,\texttt {mod}\,2 = 0 \wedge x = c)\) and \((k = x + c \wedge x \ge c)\), can be generated by applying the grammar’s production rules recursively. \(\square \)
Definition 3
Each formula contained in set \( Seeds \), which is used for constructing grammar G (in line 2), is called a seed. Formula \( cand \) produced by G is called a mutant if \( cand \not \in Seeds \).
The main downside of Algorithm 1 is that it is hard to choose the order in which the individual lemmas of the final invariant should be sampled. Suppose \( cand = (x = c)\) is sampled and checked in the first iteration of Algorithm 1. Condition (2) is then not fulfilled, and it is witnessed by the following CTI: \([x \leftarrow 0; k \leftarrow 1; c \leftarrow 0; N \leftarrow 10; x' \leftarrow 0; k' \leftarrow 7; c' \leftarrow 7; M \leftarrow 7]\). The grammar is then adjusted, such that \(x = c\) (and some syntactically relevant, stronger or equivalent formulas, e.g. \(x = c\)) do not belong to the grammar anymore.
Suppose, in the second iteration of Algorithm 1, \( cand = (k\,\texttt {mod}\,2 = 0)\). It passes checks (1) and (2), gets inserted to set \( Lemmas \), and thus it is going to be taken into account in the following iterations (see implications in lines 4 and 6). The grammar is then adjusted again, such that \(k\,\texttt {mod}\,2 = 0\) (and some weaker or equivalent formulas, e.g. \(k\,\texttt {mod}\,2 \ge 0\)) do not belong to the grammar anymore. Note that if in the third iteration \( cand = (x = c)\) was sampled again, the algorithm would terminate. However, it is impossible since the sampling grammar was adjusted after both negative and positive attempts.
The opposite sampling order (i.e., \( cand = (k\,\texttt {mod}\,2 = 0)\) first, and \( cand = (x = c)\) then) would lead to a faster convergence of the algorithm. Since it is hard to decide which order to choose, the production rules are equipped with probability distributions that allow both orders under certain probabilities. In this paper, we propose to use a strategy which is less dependent on an order – to check candidates in batches – and we describe it in Sect. 5 in more detail.
4 Old Friends Are Best
In this section, we rehash two ideas widely used in symbolic model checking that can be adapted to accelerate syntax-guided invariant synthesis.
4.1 Interpolation-Based Proofs of Bounded Safety
Bounded Model Checking (BMC) [3] unrolls the transition relation a fixed number k of times and asks whether an error state is reachable in exactly k steps. The corresponding BMC formula is \( Init ({ V }_0) \wedge Tr ({ V }_0, { V }_1) \wedge \ldots \wedge Tr ({ V }_{k-1}, { V }_k) \wedge Bad ({ V }_k)\), where each \({ V }_i\) is a fresh copy of \({ V }\).
Lemma 1
If a BMC formula for program P and some k is satisfiable then no invariant exists.
A proof of bounded safety is an overapproximation I of the set of initial states, such that any path of length k, that starts in a state satisfying I, does not end in a state satisfying \( Bad \). Extraction of proofs is typically done with the help of Craig interpolation [6].
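A bounded check for a fixed k asks whether some path that starts in an initial state and takes exactly k transitions ends in an error state. The following sketch enumerates such paths explicitly over a small box of values instead of calling an SMT solver. The loop body in `step` is an assumed reconstruction of the program in Fig. 1a (the figure is not shown in this text), and the alternative error condition `x > 0` used at the end is purely hypothetical, introduced only to show what a counterexample looks like.

```python
from itertools import product

VALS = range(-2, 3)  # values tried for x, k, c, N and for the nondeterministic M


def init(s):
    x, k, c, n = s
    return x == 0 and k == 0 and c == 0


def bad(s):
    x, k, c, n = s
    return c >= n and not x >= n


def step(s, m):
    """One iteration of the ASSUMED loop body; None when the guard c < N is false."""
    x, k, c, n = s
    if not c < n:
        return None
    x2 = x + m if k % 2 == 0 else x
    c2 = c + m
    return (x2, x2 + c2, c2, n)


def bmc(k, is_bad=bad):
    """Return a path of exactly k steps from Init to is_bad, or None if there is none."""
    paths = [[s] for s in product(VALS, repeat=4) if init(s)]
    for _ in range(k):
        extended = []
        for p in paths:
            for m in VALS:
                t = step(p[-1], m)
                if t is not None:
                    extended.append(p + [t])
        paths = extended
    for p in paths:
        if is_bad(p[-1]):
            return p
    return None


print([bmc(k) for k in range(4)])           # original error condition
print(bmc(1, is_bad=lambda s: s[0] > 0))    # hypothetical error condition x > 0
```

With the original error condition no path is found for bounds 0 to 3, whereas the hypothetical condition is reached after one step, which by Lemma 1 rules out an invariant for that modified task.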
Definition 4
Given two formulas A and B, such that \(A \wedge B \!\implies \! \bot \), an interpolant I is a formula satisfying three conditions: (1) \(A\!\implies \!I\), (2) \(I\wedge {B}\!\implies \! \bot \), and (3) I is expressed over the alphabet common to A and B.
For an invocation of a procedure of generating an interpolant I for A and B and splitting it into a set of conjunction-free clauses (i.e., \(I = \ell _0 \wedge \dots \wedge \ell _n\)), we write \(\{\ell _i\} \leftarrow \textsc {getItp}(A,B)\). Algorithm 2 shows an algorithm to generate interpolation-based proofs of bounded safety for BMC formulas. It iteratively unrolls the transition relation and applies the interpolation to the entire BMC formula. In addition, in the spirit of Lazy Annotation [25], while decrementing i, the algorithm applies backward reasoning and checks if an error state is reachable by \((k-i)\) steps from an empty state (line 4). It triggers the interpolation to be applied to smaller formulas, and in some cases speeds up the proof search (line 5).
Example 3
If the program in Fig. 1a is unrolled 0 times, then its BMC formula is constructed as follows: \(\underbrace{x = 0 \wedge k = 0 \wedge c = 0}_{ Init } \wedge \underbrace{c \ge N \wedge \lnot (x \ge N)}_{ Bad }\). It is unsatisfiable, and since interpolants are not unique, function getItp(\( Init , Bad )\) could return \( proof _1 = \{x \ge 0, c \le 0\}\), \( proof _2 = \{x = c\}\), or \( proof _3 = \{x \ge c\}\). \(\square \)
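The three candidate proofs of Example 3 can be checked against conditions (1) and (2) of Definition 4 by enumeration over a small box of values. This is a sanity check rather than an interpolation procedure: the function name `is_interpolant` and the box `VALS` are hypothetical, and condition (3) is enforced by construction, since each candidate is written as a function of x and c only, the variables shared by \( Init \) and \( Bad \).

```python
from itertools import product

VALS = range(-3, 4)
STATES = list(product(VALS, repeat=4))  # tuples (x, k, c, N)


def A(x, k, c, n):  # Init
    return x == 0 and k == 0 and c == 0


def B(x, k, c, n):  # Bad
    return c >= n and not x >= n


def is_interpolant(I):
    """Check A => I and unsatisfiability of I and B on the box; I takes (x, c) only."""
    a_implies_i = all(I(x, c) for (x, k, c, n) in STATES if A(x, k, c, n))
    i_and_b_unsat = not any(I(x, c) and B(x, k, c, n) for (x, k, c, n) in STATES)
    return a_implies_i and i_and_b_unsat


proofs = {
    "proof_1": lambda x, c: x >= 0 and c <= 0,
    "proof_2": lambda x, c: x == c,
    "proof_3": lambda x, c: x >= c,
}
results = {name: is_interpolant(I) for name, I in proofs.items()}
print(results)
```

All three candidates pass, which illustrates the non-uniqueness of interpolants. A formula such as \(\top \) fails condition (2), and \(x > 0\) fails condition (1).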
4.2 Inductive Subset Extraction
When checking the consecution of a set of candidate formulas “one-by-one” (i.e., like in Algorithm 1), the order of checks is crucial, and the chance to miss some important lemma is high. It can be overcome by checking all candidate formulas at once, identifying which ones break the validity of implication (2), removing them from the set, and repeating the “all-at-once” check. Algorithm 3 shows a simple implementation of this iterative algorithm, which is extensively used in PDR and also known as Houdini [14]. Note that Houdini is only meaningful for the candidate formulas which are already implied by the initial states.
Example 4
The conjunction of formulas from set \( Seeds \) in Fig. 1b is unsatisfiable, and its minimal unsatisfiable core is \(c < N \wedge c \ge N\). Thus, Algorithm 3 would immediately return the entire set \( Seeds \). Let a set \( Cands \) be constructed from \( Seeds \) by removing all elements for which condition (1) does not hold. The conjunction of the elements in \( Cands \) is satisfiable: \(\{x = 0, c = 0, k = 0, k\,\texttt {mod}\,2 = 0, k = x + c\}\). Applying Algorithm 3 to \( Cands \) gives the inductive subset \(\{k\,\texttt {mod}\,2 = 0, k = x + c \}\). \(\square \)
Note that we extended Algorithm 3 with a routine to extract a counterexample-to-induction \(\pi \) for each element dropped from \( Cands \) (lines 3–6). We restrict each \(\pi \) to only assignments to variables from \({ V }\) and group all non-inductive formulas from \( Cands \) by the particular \(\pi \) that killed them. This routine is important for optimizing the syntax-guided invariant synthesis algorithm, and it is discussed in more detail in Sect. 5.
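The all-at-once check with CTI bookkeeping can be sketched as follows. This is not the pseudocode of Algorithm 3 (which is not reproduced in this text) but a brute-force stand-in: instead of an SMT query it searches a small box of values for a transition that breaks some candidate. The loop body in `step` is an assumed reconstruction of the program in Fig. 1a, and the names `find_cti`, `houdini`, and `CANDS` are hypothetical.

```python
from itertools import product

VALS = range(-3, 4)


def step(s, m):
    """One iteration of the ASSUMED loop body on s = (x, k, c, N); None if c < N fails."""
    x, k, c, n = s
    if not c < n:
        return None
    x2 = x + m if k % 2 == 0 else x
    c2 = c + m
    return (x2, x2 + c2, c2, n)


CANDS = {  # the set Cands of Example 4
    "x = 0": lambda s: s[0] == 0,
    "c = 0": lambda s: s[2] == 0,
    "k = 0": lambda s: s[1] == 0,
    "k mod 2 = 0": lambda s: s[1] % 2 == 0,
    "k = x + c": lambda s: s[1] == s[0] + s[2],
}


def find_cti(cands):
    """First (s, t): s satisfies every candidate, its successor t violates at least one."""
    for s in product(VALS, repeat=4):
        if all(f(s) for f in cands.values()):
            for m in VALS:
                t = step(s, m)
                if t is not None and not all(f(t) for f in cands.values()):
                    return s, t
    return None


def houdini(cands):
    """Drop candidates until the rest is inductive; also return {CTI over V: killed names}."""
    cands = dict(cands)
    cti_map = {}
    while True:
        cti = find_cti(cands)
        if cti is None:
            return cands, cti_map
        s, t = cti
        killed = [name for name, f in cands.items() if not f(t)]
        cti_map.setdefault(s, []).extend(killed)  # key: the CTI restricted to V
        for name in killed:
            del cands[name]


kept, cti_map = houdini(CANDS)
print(sorted(kept), cti_map)
```

On the candidates of Example 4 the sketch keeps \(k\,\texttt {mod}\,2 = 0\) and \(k = x + c\) and records a single CTI that killed the three equalities with zero. Each round removes at least one candidate, so the loop terminates after at most as many rounds as there are candidates.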
5 Reconsidering Syntax-Guided Invariant Synthesis
The lesson we learned when running the FreqHorn algorithm is that the program encoding gives many hints on what the lemmas should look like. However, the encoding itself can barely give any information about the sampling order. Our main idea to revise the FreqHorn algorithm is to treat seeds and mutants separately. Indeed, as we have seen in Example 2, both seeds and mutants are needed for constructing an invariant, but seeds do not actually need to be resampled – these candidates are ready to be checked prior to any sampling.
5.1 Overview
The pseudocode of the new revision of FreqHorn is shown in Algorithm 4. In the bootstrapping stage, the algorithm relies on Algorithm 2 to replenish the set of seeds by semantically meaningful candidates, and in the sampling stage, it relies on Algorithm 3 to mitigate the effect of an unpredictably chosen sampling order. Another algorithmic advantage over Algorithm 1 (to be explained in Sect. 5.2) lies in a more accurate strategy for the search space pruning and the efficient counterexample-guided method to give some failed candidates a second chance.
The algorithm takes as input a verification task and values of important configuration parameters N, M, and K (to be explained further). Like Algorithm 1, it starts with obtaining a set of expressions \( Seeds \) from \( Init \), \( Tr \), and \( Bad \) (line 1). Then, \( Seeds \) gets merged with sets of formulas obtained by Craig interpolation from proofs of bounded safety for a range of bounds \(0,\ldots , N\). Note that if there is a counterexample of length \(k < N\) discoverable by the BMC engine then an invariant does not exist (recall Lemma 1), and Algorithm 4 terminates (line 5).
The bootstrapping ends when the merged set \( Seeds \) is taken as input by Algorithm 3, and it extracts an inductive subset (line 19). However, prior to it, the algorithm checks the initiation condition for all elements of the merged set, and the set is filtered accordingly (lines 15–18).
Example 5
Let set \( Seeds \) be as in Fig. 1b, and set \( Cands \) be constructed from \( Seeds \) by removing all elements, for which condition (1) does not hold. Assume that a proof of bounded safety for \(k=0\) is \(\{x = c\}\) (as one of the options in Example 3). Applying Algorithm 3 to \( Cands \cup \{x = c\}\), we get the inductive subset \(\{k\,\texttt {mod}\,2 = 0, k = x + c, x = c\}\). Since the conjunction of these lemmas is an invariant, the algorithm terminates just after the bootstrapping. \(\square \)
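The bootstrapping steps described above (merge seeds with interpolants, filter by initiation, extract an inductive subset, test safety) can be replayed on a small box of values. Everything concrete in the sketch is an assumption made for illustration: Fig. 1a and Fig. 1b are not reproduced in this text, so the loop body in `step` and the eight entries of `SEEDS` are guesses that agree with Example 4, and `bootstrap` and `find_dead` are hypothetical names, not functions of the FreqHorn implementation.

```python
from itertools import product

VALS = range(-3, 4)
STATES = list(product(VALS, repeat=4))  # tuples (x, k, c, N)


def init(s):
    return s[0] == 0 and s[1] == 0 and s[2] == 0


def bad(s):
    return s[2] >= s[3] and not s[0] >= s[3]


def step(s, m):
    """One iteration of the ASSUMED loop body; None when the guard c < N is false."""
    x, k, c, n = s
    if not c < n:
        return None
    x2 = x + m if k % 2 == 0 else x
    c2 = c + m
    return (x2, x2 + c2, c2, n)


SEEDS = {  # assumed content of Fig. 1b
    "x = 0": lambda s: s[0] == 0,
    "k = 0": lambda s: s[1] == 0,
    "c = 0": lambda s: s[2] == 0,
    "k mod 2 = 0": lambda s: s[1] % 2 == 0,
    "k = x + c": lambda s: s[1] == s[0] + s[2],
    "c < N": lambda s: s[2] < s[3],
    "c >= N": lambda s: s[2] >= s[3],
    "x >= N": lambda s: s[0] >= s[3],
}


def find_dead(cands):
    """Names of candidates broken by a successor of a state that satisfies all of them."""
    for s in STATES:
        if all(f(s) for f in cands.values()):
            for m in VALS:
                t = step(s, m)
                if t is not None:
                    dead = [name for name, f in cands.items() if not f(t)]
                    if dead:
                        return dead
    return []


def bootstrap(seeds, interpolants):
    merged = {**seeds, **interpolants}
    # initiation filter: keep formulas that hold in every initial state
    cands = {name: f for name, f in merged.items()
             if all(f(s) for s in STATES if init(s))}
    # inductive-subset extraction
    dead = find_dead(cands)
    while dead:
        for name in dead:
            del cands[name]
        dead = find_dead(cands)
    # final safety check on the conjunction of the surviving lemmas
    safe = not any(bad(s) and all(f(s) for f in cands.values()) for s in STATES)
    return set(cands), safe


print(bootstrap(SEEDS, {"x = c": lambda s: s[0] == s[2]}))
print(bootstrap(SEEDS, {}))
```

With the interpolant \(x = c\) the surviving lemmas exclude all error states in the box, matching Example 5. Without it only \(k\,\texttt {mod}\,2 = 0\) and \(k = x + c\) survive, the safety check fails, and a full implementation would continue with sampling mutants.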
Checking the candidate formulas in batches is an important improvement over Algorithm 1. This way, the algorithm becomes less dependent on the heuristics for prioritizing the search-space traversal. The size of the batch M is configurable, and if the size of set \( Cands \) is less than M, then the set gets additional mutants (lines 13–14). Mutants are sampled from the grammar, which is powered by both the program’s encoding (similar to Algorithm 1) and the proofs of bounded safety (new in Algorithm 4). This enlarges the search space for the further mutants.
If the initial batch of candidates still misses some lemmas necessary for an invariant, then Algorithm 4 proceeds to a new iteration. In particular, the extracted inductive subset gets merged with the set of lemmas (line 24), and the assembly of a new batch of candidates starts from scratch (line 25).
Example 6
Assume that a proof of bounded safety is \( proof _1 = \{x \ge 0, c \le 0\}\) (as in Example 3). However, the consecution condition is fulfilled for none of the elements of \( proof _1\), so none of them is contained in the inductive subset returned by Algorithm 3. Thus, \( proof _1\) does not bring any additional value to the set of seeds, and (contrary to the case in Example 5) the algorithm does not terminate after the bootstrapping. Instead, it proceeds to sampling fresh mutants. \(\square \)
Theorem 1
If Algorithm 4 terminates, then either an actual bug is found (line 5), or an invariant is synthesized (after the while-loop).
5.2 Learning Strategy
A substantial distinction between the two revisions of FreqHorn is how they react to the positive and negative attempts. In Algorithm 1, the search space gets adjusted after each individual check (recall the example after Definition 3). The grammar adjustments are performed by changing the probabilities assigned to the production rules. In addition to zeroing the probability of sampling a candidate \( cand \) itself, after each positive check, Algorithm 1 zeroes the probabilities of sampling some formulas which are weaker than \( cand \), and after each negative check – the probabilities of sampling some formulas which are stronger than \( cand \) (see [13] for more details).
In contrast, Algorithm 4 reacts just to the failed candidates after the initiation check (line 17) and to the successful candidates after the consecution check (line 21). Otherwise, if the consecution check failed for a candidate \( cand \) (inside Algorithm 3), Algorithm 4 does not disqualify \( cand \) from being checked again in the future, and this is done by keeping \( cand \) locally and periodically seeking an opportunity to give \( cand \) a second chance.
To efficiently exploit the second-chance candidates, we rely on the extension of Algorithm 3 by the routine to extract counterexamples-to-induction. That is, for each failed \( cand \) there exists \(m\in CTI \) that killed it. To maintain this information, every application of Algorithm 3 updates the map \( CTImap \) from \( CTI \) to failed candidates. In Algorithm 4, it remains to periodically check whether some m is eliminated (line 29), which would increase the chances of all candidates killed by m (line 31) to pass the consecution check in the next iteration. On the other hand, if some m still models the conjunction of learned lemmas then it is guaranteed that candidates in \( CTImap (m)\) will fail the consecution check again.
Finally, to ensure that the CTI-check does not happen too often, we run it only when at least K new lemmas are learned. To make this happen, Algorithm 4 performs a redundancy check (line 22) for all lemmas \(\ell \) that have passed the initiation and the consecution checks: \(\ell \) gets learned only when the conjunction of all lemmas learned so far does not imply \(\ell \). Obviously, when no new lemmas (after the redundancy check) are added, it does not make sense to run the CTI-check since all CTIs are still valid.
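The second-chance bookkeeping described in this subsection amounts to a map from CTIs to the candidates they killed, plus a periodic test of each CTI against the learned lemmas. The sketch below is a minimal illustration under stated assumptions: CTIs are stored as tuples (x, k, c, N) restricted to the unprimed variables, lemmas are plain Python predicates, and the function name `second_chance` and its parameters are hypothetical rather than taken from the implementation. The first CTI is the one given in Sect. 3 for \(x = c\); the second entry is made up for contrast.

```python
def second_chance(cti_map, lemmas, new_lemmas, K):
    """Return candidates whose CTI is blocked by the learned lemmas.

    The check is skipped when fewer than K lemmas were learned since the last
    check. Entries of cti_map whose CTI is blocked are removed from the map.
    """
    if new_lemmas < K:
        return []
    revived = []
    for cti in list(cti_map):
        if not all(lemma(cti) for lemma in lemmas):  # some lemma excludes this CTI
            revived.extend(cti_map.pop(cti))
    return revived


def k_even(s):  # the lemma k mod 2 = 0 on s = (x, k, c, N)
    return s[1] % 2 == 0


cti_map = {
    (0, 1, 0, 10): ["x = c"],   # CTI from Sect. 3: x = 0, k = 1, c = 0, N = 10
    (0, 0, 0, 1): ["x = 0"],    # illustrative entry; k is even here
}

print(second_chance(cti_map, [], new_lemmas=0, K=1))        # nothing learned yet
print(second_chance(cti_map, [k_even], new_lemmas=1, K=1))  # k mod 2 = 0 learned
print(cti_map)
```

Once \(k\,\texttt {mod}\,2 = 0\) is learned, the CTI with \(k = 1\) no longer models the lemmas, so \(x = c\) is queued again, while the entry whose CTI still satisfies the lemmas stays in the map because its candidate would fail again for the same reason.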
5.3 Optimizations
The following tricks are omitted from the algorithm’s pseudocode to simplify reading, but they are important for the algorithm’s efficiency.

As a consequence of calculating frequencies, in the original FreqHorn algorithm, seeds were given priorities, but mutants were considered with a relatively small probability. In contrast, the new FreqHorn’s revision forces seeds to be checked in the bootstrapping. So while doing sampling, it gives priorities to mutants, and for that it ignores frequencies.

The initiation checks (lines 15–18) for proofs of bounded safety are omitted since by definition of interpolant (Definition 4) they are already fulfilled. The initiation checks for the second-chance candidates are omitted as well.

In case a candidate fails the consecution check, and it is queued for a second chance, it is still possible that Algorithm 4 samples it again in the next iterations. Resampling is avoided by additional adjustments to the probabilities of the sampling grammar in line 6 of Algorithm 3.

Algorithm 3 could be optimized if solved with assumptions. However, in our experience, it may lead to dropping more candidates than needed. Ideas for getting a maximal inductive subset from [23] could be applied here as well.

For getting proofs of bounded safety for various bounds, an incremental SMT solver could be used. That is, it could reuse parts of a BMC formula for bound k to encode the BMC formula for bound \(k+1\). Potentially, other tricks (e.g., [5, 30]) could also be applied here. Finally, interpolation could be replaced by the weakest precondition computation.
6 Evaluation
We implemented FreqHorn2 on top of our prior implementation FreqHorn^{2}. The tool takes as input a verification task in the form of linear constrained Horn clauses (CHC), automatically performs its unrolling, searches for counterexamples, generates proofs of bounded safety, and performs the Houdini-style extraction of inductive subsets. All the symbolic reasoning is performed by the Z3 SMT solver [7].
We evaluated FreqHorn2 on various safe and buggy programs taken from SV-COMP^{3} and literature (e.g., [9, 15]). Since most of the benchmarks proposed by [9] appeared to be solvable during the bootstrapping of FreqHorn2 (more details in Sect. 6.1) within (fractions of) seconds, we crafted additional harder benchmarks ourselves.
All the programs were encoded using the theories of linear (LIA) and nonlinear integer arithmetic (NIA). We did run FreqHorn2 on unsafe instances for the testing purposes only. It was able to detect a counterexample, but since no invariant exists in these cases, we do not discuss this experience here.
6.1 The Bootstrapping Experiment
In total, we considered 171 safe programs. For 103 of them, the seeds, generated by breaking the symbolic encoding to pieces, did already contain all lemmas needed for invariants. However, when we checked the seeds one-by-one, we revealed invariants for only 63, whereas the inductive subset extraction helped to reveal all 103. Each set of seeds contained on average 9 formulas.
For our BMC implementation, we considered bounds 1, 2, and 3. Generated interpolants already contained all lemmas for invariants for 70 programs.^{4} Each set of bounded proofs contained on average 2 formulas. In all these cases, the output of Algorithm 2 was taken as input by Algorithm 3, and the final safety check was performed afterwards. Our most promising results were achieved while running Algorithm 3 for the merged sets of seeds and proofs of bounded safety (i.e., both sets as in the two prior runs together). The merged sets already contained all lemmas for invariants for 114 programs.
Table 1. Exact timings.
6.2 Overall Statistics
Since technically FreqHorn2 is a CHC-solver, we compared it against other CHC-solvers, namely \(\mu \mathtt{Z}\) [17, 22] and Spacer3 [22]. All the tools were provided with the same CHC-encodings of verification problems (and thus, the results do not directly depend on a process of encoding a C program to a CHC file). Both \(\mu \mathtt{Z}\) and Spacer3 are PDR-based, and although the latter is faster than the former and can solve more benchmarks, there are 26 instances for which the former outperforms the latter.
Table 1 shows the precise running times of FreqHorn2, FreqHorn, \(\mu \mathtt{Z}\), and Spacer3. To simplify reading, we removed non-representative “noise” runs for which all tools took less than 1 s or all tools exceeded a timeout of 60 s. In the table, \(\epsilon \) denotes an insignificant amount of time (\(\le \)1 s), and \(\infty \) denotes the timeout. The numbers of FreqHorn and FreqHorn2 are the means of three individual runs. In total, the table contains 128 instances. Additionally, Fig. 2 shows three scatter plots comparing running times of FreqHorn2 vs FreqHorn, \(\mu \mathtt{Z}\), and Spacer3 respectively. Each point in a plot represents a pair of the FreqHorn2 run (x-axis) and the competing tool run (y-axis).
FreqHorn2 outperformed its predecessor in 90 out of 128 cases. We witnessed a speedup of up to 233X, and on average FreqHorn2 was four times faster than FreqHorn. In 40 cases FreqHorn2 outperformed Spacer3, and in 38 cases Spacer3 outperformed FreqHorn2. In 51 cases FreqHorn2 outperformed \(\mu \mathtt{Z}\), and in 34 cases \(\mu \mathtt{Z}\) outperformed FreqHorn2. Unfortunately, FreqHorn2 still has some performance anomalies, which we believe are connected to the often blind grammar-construction mechanism, inability to generate large disjunctions, and possible inefficiencies of the black-box interpolation engine.
7 Related Work
In this work we exploit a range of techniques originating from symbolic model checking, and in particular from IC3/PDR [4, 10], e.g., the idea of keeping CTIs and analyzing them to push previously considered lemmas [29]. Various strategies could be applied for making the lemma pushing more or less eager, i.e., as soon as a newly added lemma invalidates some CTI. In some IC3 implementations (e.g., [16]), eager pushing does not pay off, but avoiding pushing certain lemmas during the regular pushing stage of IC3 results in an improvement. Since we do not have many lemmas, eager pushing does not pay off in our setting either.
The idea of applying Houdini to extract invariants from proofs of bounded safety was fundamental for the first version of Spacer [23]. They, however, keep obtaining proofs along the entire verification process. In contrast, we use proofs mainly for the bootstrapping, while the remaining progress of the algorithm is entirely dictated by the success of sampling.
Most of the successful verification tools today use various combinations of different techniques. In particular, approaches [2, 28] use invariants from abstract interpretation to force convergence of k-induction. Recently, k-induction has benefited from lemmas obtained from PDR [21]. A promising idea to exploit the data from traces [12, 15] while creating and manipulating the candidates for invariants could also be used in our syntax-guided approach: at least we could add more constants to the grammar. However, we are currently unaware of a strategy to find meaningful constants and to avoid overpopulation of the grammar by too many constants. Our preliminary experiments have so far resulted in a performance decrease.
Techniques for automatic construction of grammars have also been applied outside of formal verification, in the domains of security analysis and dynamic test generation [18, 19]. Indeed, mutations of the input data for some program can in fact be used as new input data and therefore can increase the testing coverage.
Finally, syntaxguided techniques [1] keep being used in program synthesis more frequently than in the inductive invariant synthesis. For instance, in applications [11, 20, 26, 27] a formal grammar is additionally provided, and it is considered a part of specification. In contrast, in our application, the verification condition contains the encoding of the entire program and the safety specification, which together are enough for construction of formal grammars completely automatically. This is in fact the main driving idea behind FreqHorn, and it leaves us a spacious room for its further adaptations, e.g., in proving and disproving program termination, automated repair of software regressions, and security analysis.
8 Conclusion
We have presented the new revision of the FreqHorn algorithm to synthesize safe inductive invariants based on syntactic features of the source code and the proofs of bounded safety. The new algorithm contains the deterministic bootstrapping stage and the nondeterministic sampling stage, which make it more predictable than its predecessor and allow it to converge more frequently and on average four times faster. Similarly to most of the state-of-the-art verification techniques, our approach enjoys a tight integration with well-renowned formal methods and should be treated as an example of successful interchange of ideas across application domains.
Footnotes
1. The original description [13] focuses on the probabilistic routines. In the interest of this work, we do not discuss them here but restrict our attention to describing and exemplifying the preprocessing steps.
2. The source code and benchmarks are available at https://github.com/grigoryfedyukovich/aeval/tree/rnd.
3. Software Verification Competition, http://sv-comp.sosy-lab.org/, loop-* categories.
4. Currently interpolation in FreqHorn2 is limited to LIA, so we had to skip interpolation for 17 benchmarks over NIA.
Acknowledgments
It is hard to overestimate the value of discussions with Alexander Ivrii, Arie Gurfinkel, Michael W. Whalen, and other attendees of the International Conference on Formal Methods in Computer-Aided Design (FMCAD 2017) which gave rise to many interesting ideas and inspired this work.
References
 1. Alur, R., Bodík, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-guided synthesis. In: FMCAD, pp. 1–17. IEEE (2013)
 2. Beyer, D., Dangl, M., Wendler, P.: Boosting k-induction with continuously-refined invariants. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 622–640. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_42
 3. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs. In: Cleaveland, W.R. (ed.) TACAS 1999. LNCS, vol. 1579, pp. 193–207. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49059-0_14
 4. Bradley, A.R.: SAT-based model checking without unrolling. In: Jhala, R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 70–87. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18275-4_7
 5. Cabodi, G., Camurati, P., Palena, M., Pasini, P., Vendraminetto, D.: Interpolation-based learning as a mean to speed-up bounded model checking (short paper). In: Cimatti, A., Sirjani, M. (eds.) SEFM 2017. LNCS, vol. 10469, pp. 382–387. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66197-1_25
 6. Craig, W.: Three uses of the Herbrand-Gentzen theorem in relating model theory and proof theory. J. Symbolic Logic, 269–285 (1957)
 7. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
 8. Detlefs, D., Nelson, G., Saxe, J.B.: Simplify: a theorem prover for program checking. J. ACM 52(3), 365–473 (2005)
 9. Dillig, I., Dillig, T., Li, B., McMillan, K.L.: Inductive invariant generation via abductive inference. In: OOPSLA, pp. 443–456. ACM (2013)
 10. Eén, N., Mishchenko, A., Brayton, R.K.: Efficient implementation of property directed reachability. In: FMCAD, pp. 125–134. IEEE (2011)
 11. Fedyukovich, G., Ahmad, M.B.S., Bodík, R.: Gradual synthesis for static parallelization of single-pass array-processing programs. In: PLDI, pp. 572–585. ACM (2017)
 12. Fedyukovich, G., D’Iddio, A.C., Hyvärinen, A.E.J., Sharygina, N.: Symbolic detection of assertion dependencies for bounded model checking. In: Egyed, A., Schaefer, I. (eds.) FASE 2015. LNCS, vol. 9033, pp. 186–201. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46675-9_13
 13. Fedyukovich, G., Kaufman, S., Bodík, R.: Sampling invariants from frequency distributions. In: FMCAD, pp. 100–107. IEEE (2017)
 14. Flanagan, C., Leino, K.R.M.: Houdini, an annotation assistant for ESC/Java. In: Oliveira, J.N., Zave, P. (eds.) FME 2001. LNCS, vol. 2021, pp. 500–517. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45251-6_29
 15. Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: POPL, pp. 499–512. ACM (2016)
 16. Gurfinkel, A., Ivrii, A.: Pushing to the top. In: FMCAD, pp. 65–72. IEEE (2015)
 17. Hoder, K., Bjørner, N.: Generalized property directed reachability. In: Cimatti, A., Sebastiani, R. (eds.) SAT 2012. LNCS, vol. 7317, pp. 157–171. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31612-8_13
 18. Höschele, M., Zeller, A.: Mining input grammars from dynamic taints. In: ASE, pp. 720–725. ACM (2016)
 19. Höschele, M., Zeller, A.: Mining input grammars with AUTOGRAM. In: ICSE - Companion Volume, pp. 31–34. IEEE Computer Society (2017)
 20. Inala, J.P., Polikarpova, N., Qiu, X., Lerner, B.S., Solar-Lezama, A.: Synthesis of recursive ADT transformations from reusable templates. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 247–263. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_14
 21. Jovanovic, D., Dutertre, B.: Property-directed k-induction. In: FMCAD, pp. 85–92. IEEE (2016)
 22. Komuravelli, A., Gurfinkel, A., Chaki, S.: SMT-based model checking for recursive programs. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 17–34. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_2. https://bitbucket.org/spacer/code/branch/spacer3
 23. Komuravelli, A., Gurfinkel, A., Chaki, S., Clarke, E.M.: Automatic abstraction in SMT-based unbounded software model checking. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 846–862. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_59
 24. McMillan, K.L.: Interpolation and SAT-based model checking. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 1–13. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_1
 25. McMillan, K.L.: Lazy annotation revisited. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 243–259. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_16
 26. Phothilimthana, P.M., Jelvis, T., Shah, R., Totla, N., Chasins, S., Bodík, R.: Chlorophyll: synthesis-aided compiler for low-power spatial architectures. In: PLDI, pp. 396–407. ACM (2014)
 27. Pu, Y., Bodík, R., Srivastava, S.: Synthesis of first-order dynamic programming algorithms. In: OOPSLA, pp. 83–98. ACM (2011)
 28. Roux, P., Delmas, R., Garoche, P.: SMT-AI: an abstract interpreter as oracle for k-induction. Electr. Notes Theor. Comput. Sci. 267(2), 55–68 (2010)
 29. Suda, M.: Triggered clause pushing for IC3. CoRR, abs/1307.4966 (2013)
 30. Vizel, Y., Gurfinkel, A., Malik, S.: Fast interpolating BMC. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 641–657. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_43
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.