1 Introduction

Syntax-guided synthesis (SyGuS) [2], which originated in the field of program synthesis, has recently been applied [14, 16] to the verification of program safety. In general, a SyGuS-based method walks through a set of candidates, restricted by a formal grammar, and searches for a candidate that meets the predetermined specification. The distinguishing insight of [14, 16], in which SyGuS discovers inductive invariants, is that the formal grammar need not be provided by the user (as in applications to program synthesis) but can instead be constructed automatically, on the fly, from the symbolic encoding of the program being analyzed. Despite being incomplete, the approach shows remarkable practical success due to its ability to discover various facts about program behaviors whose syntactic representations are compact and resemble the actual program statements.

Problems of proving and disproving program termination have a known connection to safety verification, e.g., [7, 19, 28, 39, 40]. In particular, to prove termination, a program can be augmented by a counter (or a set of counters) that is initially assigned a reasonably large value and monotonically decreases at each iteration [38]. It then remains to solve a safety verification task: to prove that the counter never goes negative. On the other hand, to prove that a program has only infinite traces, one can prove that the negation of the loop guard is never reachable, which boils down to another safety verification task. This knowledge motivates us not only to exploit safety verification as a subroutine in our techniques, but also to adapt successful methods across application domains.

We present a set of SyGuS-based algorithms for proving and disproving termination. For the former, our algorithm LinRank adds a decrementing counter to a loop and iteratively guesses lower bounds on its initial value (using syntactic patterns obtained from the code); each guess leads to a safety verification task solved by an off-the-shelf Horn solver. The existence of an inductive invariant guarantees termination, and the algorithm converges. Otherwise, LinRank proceeds to strengthen the lower bounds by adding another guess. Similarly, our algorithm LexRank deals with a system of extra counters ordered lexicographically, and thus enables termination analysis for a wider class of programs.

For proving non-termination, we present a novel algorithm NontermRef that iteratively searches for a restriction on the loop guard that might lead to infinite traces. Since safety verification cannot in general answer such queries, we build NontermRef on top of a solver for the validity of \(\forall \exists \)-formulas. In particular, we prove that if the desired restriction is fulfilled at the beginning of any iteration, then there exists a sequence of states from the beginning to the end of that iteration, and the desired restriction is fulfilled at the end of that iteration as well. Recent symbolic techniques [15] for handling quantifier alternation enabled us to prove non-termination of a large class of programs for which a reduction to safety verification is not effective.

These three algorithms are independent of each other, but they all rely on a generator of constraints that are applied in different contexts. This distinguishes our work from most related approaches [7, 18, 20, 23, 30, 32, 36, 39, 40]. The key insight, adapted from [14, 16], is that the syntactic structures appearing in the program give rise to a formal grammar from which many candidates can be sampled. Because the grammar is composed of a finite number of numeric constants, operators, and variable combinations, the number of sampled constraints is always finite. Furthermore, since our samples are syntactically close to the actual constructs appearing in the code, they often provide practical guidance towards the proof of the task. Thus, in the majority of cases, the algorithms converge with a successful result.

We have implemented our algorithms in a tool called FreqTerm, which utilizes solvers for Satisfiability Modulo Theories (SMT) [11, 15] and satisfiability of constrained Horn clauses [16, 24, 26]. These automatic provers become more robust and powerful every day, which only benefits the performance of FreqTerm. We have evaluated FreqTerm on a range of terminating and non-terminating programs taken from SVCOMP and on large-scale benchmarks arising from Event-Condition-Action (ECA) systems. Compared to state-of-the-art termination analyzers [18, 22, 30], FreqTerm exhibits competitive runtimes and achieves several orders of magnitude performance improvement when proving non-termination of ECAs.

In the rest of the paper, we give background on automated verification (Sect. 2) and on SyGuS (Sect. 3); then we describe the application of SyGuS for proving termination (Sect. 4) and non-termination (Sect. 5). Finally, after reporting experimental results (Sect. 6), we overview related work (Sect. 7) and conclude the paper (Sect. 8).

2 Background and Notation

In this work, we formulate tasks arising in automated program analysis as instances of the SMT problem [12]: given a first-order formula \(\varphi \) and a background theory, decide whether there is an assignment m of values from the theory to the variables in \(\varphi \) that makes \(\varphi \) true (denoted \(m\models \varphi \)). If every assignment satisfying \(\varphi \) also satisfies a formula \(\psi \), we write \(\varphi \implies \psi \).
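
To make the two notions concrete, here is a minimal illustration using the Z3 Python API (one of the SMT solvers referenced later); the formulas are ours, chosen only for the example.

```python
from z3 import Ints, Solver, Not, sat, unsat

x, y = Ints('x y')
phi = x + y > 0

# m |= phi: ask the solver for a satisfying assignment m.
s = Solver()
s.add(phi)
assert s.check() == sat
m = s.model()                     # e.g., m may assign x = 1, y = 0

# phi ==> psi holds iff phi /\ not(psi) is unsatisfiable.
psi = x > -y
s = Solver()
s.add(phi, Not(psi))
assert s.check() == unsat
```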

Definition 1

A transition system P is a tuple \(\langle { V }\cup { V' }, Init , Tr \rangle \), where \({ V }\) is a vector of variables; \({ V' }\) is its primed copy; formulas \( Init \) and \( Tr \) encode the initial states and the transition relation respectively.

We view programs as transition systems and throughout the paper use both terms interchangeably. An assignment s of values to all variables in \({ V }\) (or any copy of \({ V }\) such as \({ V' }\)) is called a state. A trace is a (possibly infinite) sequence of states \(s,s',\dots \), such that (1) \(s\models Init \), and (2) for each i, \(s^{(i)},s^{(i+1)}\models Tr \).

We assume, without loss of generality, that the transition-relation formula \( Tr ({ V }, { V' })\) is in Conjunctive Normal Form, and we split \( Tr ({ V }, { V' })\) into a conjunction \( Guard ({ V }) \wedge Body ({ V }, { V' })\), where \( Guard ({ V })\) is the maximal subset of conjuncts of \( Tr \) expressed over variables just from \({ V }\), and every conjunct of \( Body ({ V }, { V' })\) can have appearances of variables from \({ V }\) and \({ V' }\).

Intuitively, formula \( Guard ({ V })\) encodes the loop guard of the program, whose loop body is encoded in \( Body ({ V }, { V' })\). For example, for the program shown in Fig. 1a, \({ V }= \{x,y,K\}\), \( Guard = y < K \vee y > K\), and the entire encoding of the transition relation is shown in Fig. 1b.
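
The split can be computed mechanically. Below is a sketch in Python with the Z3 API, assuming \( Tr \) is already given as a list of conjuncts; the loop-body conjuncts used here are illustrative, not the exact encoding of Fig. 1b.

```python
from z3 import Ints, Or, is_const, Z3_OP_UNINTERPRETED

x, y, K, x1, y1 = Ints("x y K x' y'")
V = {x, y, K}                        # unprimed variables

def consts_of(f):
    # Free 0-ary uninterpreted constants (program variables) occurring in f.
    if is_const(f) and f.decl().kind() == Z3_OP_UNINTERPRETED:
        return {f}
    out = set()
    for i in range(f.num_args()):
        out |= consts_of(f.arg(i))
    return out

def split_tr(conjuncts):
    # Guard: maximal subset of conjuncts over V only; Body: the rest.
    guard = [c for c in conjuncts if consts_of(c) <= V]
    body  = [c for c in conjuncts if not consts_of(c) <= V]
    return guard, body

guard, body = split_tr([Or(y < K, y > K), x1 == x + 1, y1 == y + 1])
```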

Definition 2

If each program trace contains a state s, such that \(s\models \lnot Guard \), then the program is called terminating (otherwise, it is called non-terminating).

Tasks of proving termination and non-termination are often reduced to tasks of proving program safety. A safety verification task is a pair \(\langle P, Err \rangle \), where \(P = \langle { V }\cup { V' }, Init , Tr \rangle \) is a program, and \( Err \) is an encoding of the error states. It has a solution if there exists a formula, called a safe inductive invariant, that is implied by \( Init \), is closed under \( Tr \), and is inconsistent with \( Err \).

Definition 3

Let \(P = \langle { V }\cup { V' }, Init , Tr \rangle \); a formula \( Inv \) is a safe inductive invariant if the following conditions hold: (1) \( Init ({ V }) \implies Inv ({ V })\), (2) \( Inv ({ V }) \wedge Tr ({ V }, { V' }) \implies Inv ({ V' })\), and (3) \( Inv ({ V }) \wedge Err ({ V }) \implies \bot \).

If there exists a trace c (called a counterexample) that contains a state s, such that \(s\models Err \), then the safety verification task does not have a solution.
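
As a small sanity check of Definition 3, all three conditions can be discharged with an SMT solver. The following sketch uses the Z3 Python API on a toy system of our own choosing (a counter that starts at 0 and increments), not on any program from the paper.

```python
from z3 import Int, Implies, And, Not, Solver, unsat, substitute

x, x1 = Int('x'), Int("x'")
Init = x == 0
Tr   = x1 == x + 1
Err  = x < 0
Inv  = x >= 0                     # candidate safe inductive invariant

def valid(f):
    # f is valid iff its negation is unsatisfiable
    s = Solver()
    s.add(Not(f))
    return s.check() == unsat

assert valid(Implies(Init, Inv))                               # condition (1)
assert valid(Implies(And(Inv, Tr), substitute(Inv, (x, x1))))  # condition (2)
assert valid(Not(And(Inv, Err)))                               # condition (3)
```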

Fig. 1. (a): C-code; (b): transition relation \( Tr \) (in the framebox – \( Guard \)); (c): formulas S extracted from \(\textit{Tr}\) and normalized; (d): grammar that generalizes S.

3 Exploiting Program Syntax

The key driver of our termination and non-termination provers is a generator of constraints that help to analyze the given program in different ways. The source code often gives useful information, e.g., occurrences of variables, constants, and arithmetic and comparison operators, that can bootstrap the formula generator. We rely on the SyGuS-based algorithm [16] introduced for verifying program safety. It automatically constructs the grammar G based on a fixed set of formulas S obtained by traversing parse trees of \( Init \), \( Tr \), and \( Err \). In our case, \( Err \) is not given, so G is based only on \( Init \) and \( Tr \).

For simplicity, we require formulas in S to have the form of inequalities composed of a linear combination over either \({ V }\) or \({ V' }\) and a constant (e.g., \(x' < y' + 1\) is included, but \(x' = x + 1\) is excluded). Then, if needed, variables are deprimed (e.g., \(x' < y' + 1\) is replaced by \(x < y + 1\)), and formulas are normalized so that all terms are moved to the left side (e.g., \(x < y + 1\) is replaced by \(x - y - 1 < 0\)), subtraction is rewritten as addition, < is rewritten as >, and respectively \(\le \) as \(\ge \) (e.g., \(x - y - 1 < 0\) is replaced by \((-1) \cdot x + y+ 1 > 0\)).
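
A sketch of this normalization, operating on coefficient maps rather than on solver terms (our simplification for presentation):

```python
# Normalize an inequality given as sum(coeffs) + const `op` 0:
# variables are assumed already deprimed; < becomes > and <= becomes >=.
def normalize(coeffs, const, op):
    if op in ('<', '<='):
        coeffs = {v: -c for v, c in coeffs.items()}   # negate both sides
        const = -const
        op = '>' if op == '<' else '>='
    return coeffs, const, op

# x < y + 1  ~~>  x - y - 1 < 0  ~~>  (-1)*x + y + 1 > 0
assert normalize({'x': 1, 'y': -1}, -1, '<') == ({'x': -1, 'y': 1}, 1, '>')
```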

The entire process of creating G is exemplified in Fig. 1. Production rules of G are constructed as follows: (1) the production rule for normalized inequalities (denoted ineq) consists of choices corresponding to distinct types of inequalities in S, (2) the production rule for linear combinations (denoted sum) consists of choices corresponding to distinct arities of inequalities in S, (3) production rules for variables, coefficients, and constants (denoted respectively var, coef, and const) consist of choices corresponding respectively to distinct variables, coefficients, and constants that occur in inequalities in S. Note that the method of creating G naturally extends to considering disjunctions and nonlinear arithmetic [16].

Choices in production rules of grammar G can be further assigned probabilities based on frequencies of certain syntactic features (e.g., frequencies of particular constants or combinations of variables) that belong to the program’s symbolic encoding. In the interest of saving space, we do not discuss this here and refer the reader to [16]. The generation of formulas from G is performed recursively by sampling from probability distributions assigned to rules. Note that the choice of distributions affects only the order in which formulas are sampled and does not affect which formulas can or cannot be sampled in principle (because the grammar is fixed). Thus, without loss of generality, it is sound to assume that all distributions are uniform. In the context of termination analysis, we are interested in formulas produced by the rules ineq and sum.
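
The sampling procedure can be pictured as follows. The sketch below uses a hand-written toy grammar in the spirit of Fig. 1d and Python's weighted choice; the rule names follow the text, while the concrete alternatives are only illustrative.

```python
import random

# Toy grammar in the spirit of Fig. 1d; each rule maps to its alternatives.
gramm = {
    'ineq':  [['sum', '>', '0'], ['sum', '>=', '0']],
    'sum':   [['term'], ['term', '+', 'term', '+', 'const']],
    'term':  [['coef', '*', 'var']],
    'var':   [['x'], ['y'], ['K']],
    'coef':  [['-1'], ['1']],
    'const': [['0'], ['1']],
}
# Uniform weights are sound; lowering a weight discourages re-sampling
# (the actual adjustment in [16] is more involved).
weights = {r: [1.0] * len(alts) for r, alts in gramm.items()}

def sample(rule='ineq'):
    # Pick an alternative by weight, then expand nonterminals recursively.
    alt = random.choices(gramm[rule], weights=weights[rule])[0]
    return ' '.join(sample(s) if s in gramm else s for s in alt)

print(sample())   # e.g., "-1 * x + 1 * y + 1 > 0"
```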

Fig. 2. (a): The worst-case dynamics of the program from Fig. 1a; (b): the termination-argument validity check (in the frameboxes – lower bounds \(\{\ell _j\}\) for i).

4 Proving Termination

We start this section with a motivating example and then proceed to presenting the general-purpose algorithms for proving program termination.

Example 1

The program shown in Fig. 1a terminates. It operates on three integer variables, x, y, and K: in each iteration y gets closer to x, and x gets closer to K. Thus, the total number of values taken by y before it equals K is no bigger than the maximal distance among x, y, and K (in the following, denoted \( Max \)). The worst-case dynamics happens when initially \(x<y<K\) (shown in Fig. 2a); in other cases the program terminates even faster. To formally prove this, the program can be augmented by a so-called termination argument. For this example, it is simply a fresh variable i which is initially assigned \( Max \) (or any other value greater than \( Max \)) and which gets decremented by one in each iteration. The goal now is to prove that i never gets negative. Fig. 2b shows the encoding of this safety verification task (recall Definition 3). The existence of a solution to this task guarantees the safety of the augmented program, and thus, the termination of the original program. Most state-of-the-art Horn solvers are able to find a solution immediately.    \(\square \)

Algorithm 1: LinRank (pseudocode listing)

The main challenge in preparing the termination-argument validity check is the generation of lower bounds \(\{\ell _j\}\) for i in \( Init \) (e.g., conjunctions of the form \(i\!>\!\ell _j\) in Fig. 2b). We build on the insight that each \(\ell _j\) can be constructed independently of the others, and an inequality \(i\!>\!\ell _j\) can then be conjoined with \( Init \), giving rise to a new safety verification task. To generate candidate inequalities, we utilize the algorithm from Sect. 3: all \(\{\ell _j\}\) can be sampled from the grammar G, which is obtained in advance from \( Init \) and \( Tr \).

For example, all six formulas in Fig. 2b, namely \(x - K, K - x , y - K , K - y , x - y \), and \(y - x\), belong to the grammar shown in Fig. 1d. Note that for proving termination it is not necessary to have the most precise lower bounds. Intuitively, the larger the initial value of i, the more iterations it stays non-negative. Thus, it is sound to try formulas that are not related to actual lower bounds at all and keep them conjoined with \( Init \).

4.1 Synthesizing Linear Termination Arguments

Algorithm 1 shows an “enumerate-and-try” procedure that searches for a linear termination argument proving termination of a program P. To initialize the search, the algorithm introduces an extra counter variable i and adds it to \({ V }\) (respectively, its primed copy \(i'\) gets added to \({ V' }\)) (line 1). Then the transition-relation formula \( Tr \) is augmented by \(i' = i - 1\), the decrement of the counter in the loop body. To specify the set of error states, Algorithm 1 introduces a formula \( Err \) (line 2): the states in which the loop guard is satisfied and the value of counter i is negative. Algorithm 1 then starts searching for large enough lower bounds for i (i.e., a set of constraints over \({ V }\cup \{i\}\) to be added to \( Init \)), such that no error state is ever reachable.

Before the main loop of our synthesis procedure starts, various formulas are extracted from the symbolic encoding of P and generalized to a formal grammar (line 3). The grammar is used for an iterative probabilistic sampling of candidate formulas (line 5) that are further added to the validity check of the current termination argument (line 8). In particular, each new constraint over i has the form \(i \!>\! cand \), where \( cand \) is produced by the sum production rule described in Sect. 3. Once \( Init \) is strengthened by this constraint, a new safety verification condition is compiled and checked (line 9) by an off-the-shelf Horn solver.

As a result of each safety check, either a formula satisfying Definition 3 or a counterexample \( cex \) witnessing reachability of an error state is generated. Existence of an inductive invariant guarantees that the conjunction of all synthesized lower bounds for i is large enough to prove termination, and thus Algorithm 1 converges. Otherwise, if grammar G still contains a formula that has not been considered yet, the synthesis loop iterates.

For the progress of the algorithm, it must keep track of the strength of each new candidate \( cand \); that is, \( cand \) should add more restrictions on i in \( Init \). Otherwise, the outcome of the validity check (line 9) would be the same as in the previous iteration. For this reason, Algorithm 1 includes an important routine [16]: after each sampled candidate \( cand \), it adjusts the probability distributions associated with the grammar so that \( cand \) cannot be sampled again in future iterations (line 6). Additionally, it checks (line 7) that a new constraint adds some value over the already accepted constraints. Consequently, our algorithm does not require explicit handling of counterexamples: if in each iteration \( Init \) only gets stronger, then the current \( cex \) is invalidated. In principle, the algorithm could explicitly store \( cex \) and check its consistency with each new \( cand \); however, in our experiments this did not lead to significant performance gains.
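
To make the loop of Algorithm 1 concrete, below is a runnable mini-instance for the illustrative loop `while (x < N) x++`, using Z3's Spacer engine as the Horn solver. The two-element candidate pool stands in for the grammar-based sampler of Sect. 3, and the strengthen-and-retry loop loosely mirrors the description above; this is our sketch rather than FreqTerm's actual code.

```python
from z3 import (Ints, And, Implies, Function, IntSort, BoolSort,
                Fixedpoint, unsat)

x, N, i, xp, ip = Ints('x N i xp ip')
inv = Function('inv', IntSort(), IntSort(), IntSort(), BoolSort())

bounds = []                              # accepted lower-bound constraints on i
for cand in [N, N - x]:                  # stand-ins for sampled `sum` terms
    bounds.append(i > cand)              # strengthen Init with i > cand
    fp = Fixedpoint()
    fp.set(engine='spacer')
    fp.declare_var(x, N, i, xp, ip)
    fp.register_relation(inv)
    Init = And(x < N, *bounds)
    Tr   = And(x < N, xp == x + 1, ip == i - 1)   # Guard /\ Body /\ i' = i - 1
    Err  = And(x < N, i < 0)
    fp.add_rule(Implies(Init, inv(x, N, i)))
    fp.add_rule(Implies(And(inv(x, N, i), Tr), inv(xp, N, ip)))
    if fp.query(And(inv(x, N, i), Err)) == unsat:  # error states unreachable
        print('terminates; accepted lower bounds:', bounds)
        break
```

Here the first candidate (i > N) alone is insufficient (a safety counterexample exists when x starts far below N), so the loop conjoins the second candidate, after which Spacer finds a safe inductive invariant.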

Theorem 1

If Algorithm 1 returns terminates for program P, then P terminates.

Indeed, the verification condition proven safe in the last iteration of Algorithm 1 corresponds to some program \(P'\) that differs from P by the presence of variable i. The set of traces of P has a one-to-one correspondence with the set of traces of \(P'\), such that each state reachable in P can be extended by a valuation of i to become a reachable state in \(P'\). That is, P terminates iff \(P'\) terminates, and \(P'\) terminates by construction: i is initially assigned a reasonably large value, monotonically decreases at each iteration, and never goes negative.

Algorithm 2: LexRank (pseudocode listing)

We note that the loop in Algorithm 1 always executes only a finite number of iterations, since G is constructed from a finite number of components and in each iteration it is adjusted to avoid re-sampling the same candidates. However, the off-the-shelf Horn solver that checks the validity of each candidate might not converge, because safety verification is undecidable in general. To mitigate this obstacle, our implementation supports several state-of-the-art solvers and provides the flexibility to specify which one to use.

4.2 Synthesizing Lexicographic Termination Arguments

There is a wide class of terminating programs for which no linear termination argument exists. A commonly used approach to handle them is via a search for a so-called lexicographic termination argument that requires introducing two or more extra counters. A SyGuS-based instantiation of such a procedure for two counters is shown in Algorithm 2 (more counters could be handled similarly). Algorithm 2 has a similar structure to Algorithm 1: the initial program gets augmented by counters, formula \( Err \) is introduced, lower bounds for counters are iteratively sampled and added to \( Init \) and \( Tr \), and the verification condition is checked for safety.

The differences in Algorithm 2 lie in how it handles the two counters i and j, between which an implicit order is fixed. In particular, \( Err \) is still expressed over i only, but i gets decremented by one only when j equals zero (line 14). At the same time, j gets updated in each iteration: if it was equal to zero, it gets assigned a value satisfying the conjunction of constraints in an auxiliary set \( jBounds \); otherwise it gets decremented by one. Algorithm 2 synthesizes \( jBounds \) as well as lower bounds for the initial conditions over i and j. The sampling proceeds separately from three different grammars (lines 6, 9, and 12), and the samples are used in three different contexts (lines 7, 10, and 13 respectively). Optionally, Algorithm 2 can be parametrized by a synthesis strategy that gives an interpretation to each of the nondet() calls (lines 5, 8, and 11 respectively). In the simplest case, each nondet() call is replaced by \(\top \), which means that in each iteration Algorithm 2 needs to sample from all three grammars. Alternatively, nondet() could be replaced by a method that identifies only one grammar per iteration to be sampled from.
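
The counter updates that Algorithm 2 conjoins to \( Tr \) can be encoded as follows; this is a sketch in the Z3 Python API, and the single bound in jBounds is an illustrative stand-in for the synthesized set.

```python
from z3 import Ints, And, If

i, j, ip, jp, x, y = Ints("i j i' j' x y")
jBounds = [jp > x - y]                  # illustrative sampled bound for the reset

counter_updates = And(
    ip == If(j == 0, i - 1, i),         # i is decremented only when j hits zero
    # j is reset nondeterministically (only bounded by jBounds) when it hits
    # zero, and decremented by one otherwise.
    If(j == 0, And(*jBounds), jp == j - 1),
)
```

Leaving \(j'\) only bounded, rather than assigned, is what makes the reset nondeterministic: any value satisfying jBounds is an admissible successor.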

Theorem 2

If Algorithm 2 returns terminates for program P, then P terminates.

The proof sketch for Theorem 2 is similar to the one for Theorem 1: an augmented program \(P'\) terminates by construction (due to a mapping of values of \(\langle i,j \rangle \) into ordinals), and its set of traces has a one-to-one correspondence with the set of traces of P.

5 Proving Non-termination

In this section, we aim at solving the task opposite to the one in Sect. 4: we wish to witness infinite program traces and thus prove program non-termination. However, in contrast to the traditional search for a single infinite trace, it is often easier to search for groups of infinite traces.

Fig. 3. (a): A variant of the program from Fig. 1a; (b): the valid \(\forall \exists \)-formula for its non-terminating refinement (in frameboxes – refined \( Guard \)-s); (c): an example of non-terminating dynamics, where the value of x (and eventually, y) never changes.

Lemma 1

Program \(P= \langle { V }\cup { V' }, Init , Tr \rangle \) where \( Tr = Guard \wedge Body \) does not terminate if:

  1. there exists a state s, such that \(s\models Init \) and \(s\models Guard \),

  2. for every state s, such that \(s\models Guard \), there exists a state \(s'\), such that \(s,s'\models Tr \) and \(s'\models Guard \).

The lemma identifies a class of programs for which the following holds. First, the loop guard is reachable from the set of initial states. Second, whenever the loop guard is satisfied, there exists a transition to a state in which the loop guard is satisfied again. Therefore, each initial state s, from which the loop guard is reachable, gives rise to at least one infinite trace that starts with s.

Note that for programs with deterministic transition relations (like, e.g., in Fig. 1a), the check of the second condition of Lemma 1 reduces to deciding the satisfiability of a quantifier-free formula, since each state can transition to exactly one state. But if the transition relation is non-deterministic, the check reduces to deciding the validity of a \(\forall \exists \)-formula. Although handling quantifiers is in general hard, some recent approaches [15] are particularly tailored to solve this type of queries efficiently.
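
For a toy non-deterministic loop, the second condition of Lemma 1 can be phrased directly as a \(\forall \exists \)-formula. The sketch below uses plain Z3, which suffices for this query, although the paper relies on AE-VAL [15] for such checks; the loop itself is our illustrative example.

```python
from z3 import Ints, And, Or, Exists, ForAll, Implies, Solver

# Illustrative loop: while (x < N) x += nondet({0, 1});
x, N, xp = Ints("x N x'")
Guard = x < N
Body  = Or(xp == x, xp == x + 1)    # nondeterministic: keep x or increment it

# Condition 2 of Lemma 1: every Guard-state has a successor satisfying Guard.
cond2 = ForAll([x, N], Implies(Guard, Exists([xp], And(Body, xp < N))))

s = Solver()
s.add(cond2)
print(s.check())   # sat: the closed formula is true, i.e., valid, since the
                   # successor x' = x always stays below N
```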

In practice, the conditions of Lemma 1 are too strict to hold for an arbitrary program. However, to prove non-termination, it is sufficient to constrain the transition relation, as long as the constrained relation preserves at least one original transition, and only then apply Lemma 1.

Definition 4

Given programs \(P = \langle { V }\cup { V' }, Init , Tr \rangle \), and \(P'= \langle { V }\cup { V' }, Init , Tr ' \rangle \), we say that \(P'\) is a refinement of P if \( Tr ' \implies Tr \).

Intuitively, Definition 4 requires P and \(P'\) to operate over the same sets of variables and to start from the same initial states. Furthermore, each transition allowed by \( Tr '\) is also allowed by \( Tr \). One way to refine P is to restrict \( Tr = Guard \wedge Body \) by conjoining \( Guard \), \( Body \), or both with extra constraints (called refinement constraints). In this work, we propose to sample them from our automatically constructed formal grammar (recall Sect. 3).

Example 2

Consider the program shown in Fig. 3a. It differs from the one shown in Fig. 1a by a non-deterministic choice in the second ite-statement: y still moves towards x, but x moves towards K only when \(x > K\), and otherwise x may keep its initial value forever. The formal grammar generated for this program is the same as the one shown in Fig. 1d, and it contains the constraints \(x < K\) and \(y < K\). Lemma 1 does not apply to the program as is, but it does after refining \( Guard \) with those constraints. In particular, the \(\forall \exists \)-formula in Fig. 3b is valid, and a witness to its validity is depicted in Fig. 3c: eventually x and y become equal and always remain smaller than K. Thus, the program does not terminate.   \(\square \)

Algorithm 3: NontermRef (pseudocode listing)

5.1 Synthesizing Non-terminating Refinements

The algorithm for proving program’s non-termination is shown in Algorithm 3. It starts with a simple satisfiability check (line 1) which filters out programs that never reach the loop body (thus they immediately terminate). Then, the transition relation \( Tr \) gets strengthened by auxiliary inductive invariants obtained with the help of the initial states \( Init \) (line 2). The algorithm does not impose any specific requirements on the invariants (and it is sound even for a trivial invariant \(\top \)) and on a method that detects them. In many cases, auxiliary invariants make the algorithm converge faster. Similar to Algorithms 1–2, Algorithm 3 splits \( Init \) and \( Tr \) to a set of formulas and generalizes them to a grammar. The difference lies in the type of formulas sampled from the grammar (ineq vs sum) and their use in the synthesis loop: Algorithm 3 treats sampled candidates as refinement constraints and attempts to apply Lemma 1 (line 6).

The algorithm maintains a stack of refinement constraints \( Refs \). In the first iteration, \( Refs \) is empty, and thus the algorithm tries to apply Lemma 1 to the original program. For that application, a \(\forall \exists \)-formula is constructed and checked for validity. Intuitively, the formula expresses the ability of \( Body \) to transition each state satisfying \( Guard \) to a state that satisfies \( Guard \) as well. If the validity of the \(\forall \exists \)-formula is proven, the algorithm converges (line 7). Otherwise, a refinement of P needs to be guessed. Thus, the algorithm samples a new formula (line 16) using the production rule ineq described in Sect. 3, pushes it to \( Refs \), and iterates. Note that G permits formulas over \({ V }\) only (i.e., restricting \( Guard \)); however, in principle it can be extended to sample formulas over \({ V }\cup { V' }\) (and thus restrict \( Body \) as well).

For the progress of the algorithm, it must keep track of how each new candidate \( cand \) relates to the constraints already in \( Refs \). That is, \( cand \) should not be implied by \( Guard \wedge \bigwedge \limits _{r \in Refs } r\), since otherwise the \(\forall \exists \)-formula in the next iteration would not change. Also, \( cand \) should not over-constrain the loop guard; thus it is important to check that after adding \( cand \) to the constraints from \( Guard \) and \( Refs \), the loop guard is still reachable from the initial states. Both checks are performed before the sampling (line 9). After the sampling, the necessary adjustments to the probability distributions assigned to the production rules of the grammar [16] are applied to ensure the same refinement candidates are not re-sampled (line 17).

Because by construction G cannot generate conjunctions of constraints, the algorithm handles conjunctions externally. This is useful when a single constraint is not enough for the application of Lemma 1 and must be strengthened by another constraint. On the other hand, some sampled candidates might need to be withdrawn before converging. For this reason, Algorithm 3 maintains a stack \( Gramms \) of grammars and handles it synchronously with the stack \( Refs \) (lines 12–14 and 18–19). When all candidates from a grammar have been considered unsuccessfully, the algorithm pops the latest candidate from \( Refs \) and rolls back to the grammar used in the previous iteration. Additionally, a maximum size of \( Refs \) can be specified to avoid considering too-deep refinements.
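
Putting the pieces together, the following pseudocode-style sketch mirrors the structure of Algorithm 3; is_sat, ae_valid, implied, grammar_of, sample, adjust, and exhausted are placeholders for the components described in the text (the SMT checks, AE-VAL, and the Sect. 3 sampler), not actual FreqTerm APIs.

```python
def nonterm_ref(Init, Guard, Body, max_depth=3):
    if not is_sat(And(Init, Guard)):          # line 1: loop body never reached
        return 'terminates'
    Refs, Gramms = [], [grammar_of(Init, Guard, Body)]
    while Gramms:
        R = And(Guard, *Refs)                 # currently refined guard
        if ae_valid(R, Body):                 # the forall-exists check (line 6)
            return 'does not terminate'       # Lemma 1 applies (line 7)
        G = Gramms[-1]
        while not exhausted(G):
            cand = sample(G, 'ineq')          # line 16
            adjust(G, cand)                   # line 17: never re-sample cand
            if implied(R, cand) or not is_sat(And(Init, R, cand)):
                continue                      # line 9: no progress, or the
                                              # guard becomes unreachable
            if len(Refs) < max_depth:
                Refs.append(cand)             # push refinement and retry
                Gramms.append(grammar_of(Init, Guard, Body))
                break
        else:                                 # grammar exhausted: roll back
            Gramms.pop()                      # lines 12-14 and 18-19
            if Refs:
                Refs.pop()
    return 'unknown'
```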

Theorem 3

If Algorithm 3 returns does not terminate for program P, then P does not terminate.

Indeed, the constraints belonging to \( Refs \) in the last iteration of the algorithm give rise to a refinement \(P'\) of P, where \(P' = \langle { V }\cup { V' }, Init , Tr \wedge \bigwedge \limits _{r \in Refs } r \rangle \). The satisfiability check (line 9) and the validity check (line 6) passed, corresponding to the conditions of Lemma 1. Thus, \(P'\) does not terminate, and consequently it has an infinite trace. Finally, since \(P'\) refines P, all traces of \(P'\) (including infinite ones) belong to P, so P does not terminate as well.

5.2 Integrating Algorithms Together

With a few exceptions [30, 39], existing algorithms address either the task of proving or the task of disproving termination. The goal of this paper is to show that both tasks benefit from syntax-guided techniques. While an algorithmic integration of several orthogonal techniques is itself a challenging problem, it is not the focus of our paper. Still, we use a straightforward idea here: since each presented algorithm has one big loop, an iteration of Algorithm 1 can be followed by an iteration of Algorithm 2 and, in turn, by an iteration of Algorithm 3 (i.e., in a lockstep fashion). A positive result obtained by any algorithm forces the remaining algorithms to terminate. Based on our experiments, detailed in Sect. 6, the majority of benchmarks were proven either terminating or non-terminating by one of the algorithms within seconds. This justifies why the lockstep execution of all algorithms would not bring significant overhead in practice.
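
A minimal sketch of such a lockstep driver, assuming each algorithm is exposed as a Python generator that performs one iteration per step (the generator names in the usage comment are hypothetical):

```python
def lockstep(*makers):
    # `makers` construct generators that perform one iteration of an
    # algorithm per next() call, yielding a verdict string or None.
    gens = [m() for m in makers]
    while gens:
        for g in list(gens):
            try:
                verdict = next(g)       # one iteration of that algorithm
            except StopIteration:
                gens.remove(g)          # that algorithm gave up
                continue
            if verdict is not None:
                return verdict          # first definite answer wins
    return 'unknown'

# e.g., lockstep(lin_rank_gen, lex_rank_gen, nonterm_ref_gen)
```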

6 Evaluation

We have implemented the algorithms for proving termination and non-termination in a tool called FreqTerm. It is developed on top of FreqHorn [16], uses it for Horn solving, and also supports the Horn solvers Spacer3 [26] and \(\mu \)Z [24]. To solve \(\forall \exists \)-formulas, FreqTerm uses the AE-VAL tool [15]. All the symbolic reasoning in the end is performed by the Z3 SMT solver [11].

FreqTerm takes as input a program encoded as a system of linear constrained Horn clauses (CHC). It supports any programming language, as long as a translator from it to CHCs exists. For encoding benchmarks to CHCs, we used SeaHorn v.0.1.0-rc3. To the best of our knowledge, FreqTerm is the only (non-)termination prover that supports a selection of Horn solvers in the backend. This allows the prover to easily leverage advancements in Horn solving.

We have compared FreqTerm against AProVE rev. c181f40 [18], Ultimate Automizer v.0.1.23 [22], and HipTNT+ v.1.0 [30]. The rest of the section summarizes three sets of experiments. Sections 6.1 and 6.2 discuss the comparison on small but tricky programs, respectively terminating and non-terminating, which shows that our approach is applicable to a wide range of conceptually challenging problems. In Sect. 6.3, we target several large-scale benchmarks and show that FreqTerm is capable of significantly pushing the boundaries of termination and non-termination proving. In total, we considered 856 benchmarks of various size and complexity. All experiments were conducted on a Linux SMP machine, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz, 56 CPUs, 377 GB RAM.

Fig. 4. FreqTerm vs, respectively, Ultimate Automizer, AProVE, and HipTNT+.

6.1 Performance on Terminating Benchmarks

We considered 171 terminating programs from the Termination category of SVCOMP and programs crafted by ourselves. Altogether, the four tools in our experiment were able to prove termination of 168 of them within a timeout of 60 s and left only three programs without a verdict. AProVE verified 76 benchmarks, HipTNT+ 90 (including 3 that no other tool solved), and Ultimate Automizer 105 (including 4 that no other tool solved). FreqTerm, implementing Algorithms 1–2 and relying on different solvers, verified in total 155 (including 30 that no other tool solved). In particular, Algorithm 1 instantiated with Spacer3 proved termination of 88 programs, with \(\mu \)Z 79, and with FreqHorn 80. Algorithm 2 instantiated with Spacer3 proved termination of 92 programs, with \(\mu \)Z 109, and with FreqHorn 74.

A scatterplot with logarithmic scales on the axes in Fig. 4(a) compares the best running times of FreqTerm with the running times of the competing tools. Each point represents a pair of a FreqTerm run (x-axis) and a competing tool run (y-axis). Intuitively, green points represent cases where FreqTerm outperforms the competitor. On average, for programs solved by both FreqTerm and Ultimate Automizer, FreqTerm is 29 times faster (speedup calculated as the ratio of geometric means of the corresponding runs). In a similar setting, FreqTerm is 32 times faster than AProVE. However, FreqTerm is 2 times slower than HipTNT+. The evaluation further revealed (in Sect. 6.3) that the latter tool is efficient only on small programs (around 10 lines of code each) and exceeds the timeout on large-scale benchmarks.

6.2 Performance on Non-terminating Benchmarks

We considered 176 non-terminating programs from the Termination category of SVCOMP and programs crafted by ourselves. Altogether, the four tools proved non-termination of 172 of them: AProVE 35, HipTNT+ 92, Ultimate Automizer 123, and Algorithm 3 implemented in FreqTerm 152. Additionally, we evaluated the effect of \(\forall \exists \)-solving in FreqTerm. For that purpose, we implemented a version of Algorithm 3 in which non-termination is reduced to safety, while the conceptual SyGuS-based refinement generator remained the same. This implementation used Spacer3 to prove that the candidate refinement can never exit the loop. Among the 176 benchmarks, this routine solved only 105, which is 30% fewer than Algorithm 3. However, it managed to verify 8 benchmarks that Algorithm 3 could not (we believe, because Spacer3 was able to add an auxiliary inductive invariant).

The logarithmic scatterplot in Fig. 4(b) compares FreqTerm with the running times of the competing tools. On average, FreqTerm is 41 times faster than Ultimate Automizer, 73 times faster than AProVE, and exhibits roughly similar runtimes to HipTNT+ (again, considering only programs solved by both tools). Based on these experiments, we conclude that currently FreqTerm is more effective and more efficient at synthesizing non-terminating program refinements than at synthesizing termination arguments.

6.3 Large-Scale Benchmarks

We considered large-scale benchmarks arising from Event-Condition-Action (ECA) systems that describe reactive behavior [1]. We considered various modifications of five challenging ECAs. Each ECA consists of one large loop, where each iteration reads an input and modifies its internal state. If an unexpected input is read, the ECA terminates.

In our first case study, we aimed to prove non-termination of the given ECAs, i.e., that for any reachable internal state there exists an input value that keeps the ECA alive. The main challenges appeared to be the size of the benchmarks (up to 10000 lines of C code per loop) and the reliance on an auxiliary inductive invariant. With the extra support of Spacer3 to provide the invariant, FreqTerm was able to prove non-termination of a wide range of programs. Among all the competing tools, only Ultimate Automizer was able to handle these benchmarks, but it verified only a small fraction of them within a 2-hour timeout. In contrast, FreqTerm solved 301 out of 302 tasks and outperformed Ultimate Automizer by up to several orders of magnitude (i.e., from seconds to hours). Table 1 contains a brief summary of our experimental evaluation.

In our second case study, we instrumented the ECAs by adding extra conditions to the loop guards, thus imposing an implicit upper bound on the number of loop iterations, and applied the tools to prove termination (shown in Table 2). Again, only Ultimate Automizer was able to compete with FreqTerm, and interestingly it was more successful here than in the first case study. Encouragingly, FreqTerm solved all but one instance and was consistently faster.

Table 1. FreqTerm vs Ultimate Automizer on non-terminating ECAs (302).
Table 2. FreqTerm vs Ultimate Automizer on terminating ECAs (207).

7 Related Work

Proving Termination. A wide range of state-of-the-art methods are based on iterative reasoning driven by counterexamples [4, 5, 9, 10, 19, 21, 23, 27, 29, 36], whose goal is to show that transitions cannot be executed forever. These approaches typically combine termination arguments proven independently, but none of them leverages the syntax of programs during the analysis.

A smaller class of termination analyzers is based on various types of learning. In particular, [39] discovers a termination argument from attempts to prove that no program state is terminating; [34] exploits information derived from tests; and [37] guesses and checks transition invariants (over-approximations of the reachable transitive closure of the transition relation) from libraries of templates. Closest to our approach, [31] guesses and checks transition invariants using loop guards and branch conditions. In contrast, our algorithms guess lower bounds for auxiliary program counters and extensively use all available source code for guessing candidates.

Proving Non-termination. Traditional algorithms, e.g., [3, 6, 8, 20, 22], are based on a search for lasso-shaped traces and a discovery of recurrence sets, i.e., sets of states that are visited infinitely often. For instance, [32] searches for a geometric series in lasso-shaped traces. Our algorithm discovers existential recurrence sets but does not deal with traces at all: it handles their abstraction via a \(\forall \exists \)-formula.

A reduction to safety attracts significant attention here as well. In particular, [40] relies only on invariant generation to show that the loop guard is always satisfied; [19] infers weakest preconditions over inputs under which the program is non-terminating; and [7, 28] iteratively eliminate terminating traces through a loop by adding extra assumptions. In contrast, our approach does not reduce to safety and thus does not necessarily require invariants. However, we observed that, when provided, in practice they often accelerate our verification process.

Syntax-Guided Synthesis. SyGuS [2] is applied to various tasks related to program synthesis, e.g., [13, 17, 25, 33, 35, 41]. However, in those applications the formal grammar is typically given or constructed from user-provided examples. To the best of our knowledge, the only application of SyGuS to automatic program analysis was proposed in [14, 16], and it inspired our approach. There, the formal grammar, constructed from the verification condition, was iteratively used to guess and check only inductive invariants. In this paper, we showed that similar reasoning is practical and easily transferable across applications.

8 Conclusion

We have presented new algorithms for the synthesis of termination arguments and non-terminating program refinements. Driven by SyGuS, they iteratively generate candidate formulas that tend to follow syntactic patterns obtained from the source code. By construction, the number of possible candidates is always finite, so the search space is always relatively small. The algorithms rely on recent advances in constraint solving and do not depend on a particular backend engine; thus, the performance of checking the validity of a candidate can be improved by advancements in solvers. Our implementation FreqTerm was evaluated on a wide range of terminating and non-terminating benchmarks. It is competitive with the state of the art, and it significantly outperforms other tools when proving non-termination of large-scale Event-Condition-Action systems.

In future work, it would be interesting to investigate synergistic ways of integrating the proposed algorithms, as well as exploiting the strengths of different backend Horn solvers for different verification tasks.