TracerX: Dynamic Symbolic Execution with Interpolation (Competition Contribution)

Dynamic Symbolic Execution (DSE) is an important method for testing programs. An important DSE system is KLEE [1], which takes as input a C/C++ program annotated with symbolic variables, compiles it to LLVM bitcode, and then emulates the execution paths of the bitcode using a specified backtracking strategy. The major challenge in symbolic execution is path explosion. The method of abstraction learning [7] has been used to address this; its key step is the computation of an interpolant to represent the learned abstraction. Our tool TracerX is built on top of KLEE and implements abstraction learning. The core feature of abstraction learning is the subsumption of paths whose traversal is deemed no longer necessary due to similarity with already-traversed paths. Despite the overhead of computing interpolants, the pruning of the symbolic execution tree that interpolants provide often brings significant overall benefit. In particular, TracerX can fully explore many programs that would be impossible for any non-pruning system like KLEE to explore.


INTRODUCTION
Symbolic execution (SE) has emerged as an important method to reason about programs, in both verification and testing. By reasoning about inputs as symbolic entities, its fundamental advantage over traditional black-box testing, which uses concrete inputs, is simply that it has better coverage of program paths. In particular, dynamic symbolic execution (DSE), where the execution space is explored path-by-path, has been shown effective in systems such as DART [Godefroid et al. 2005], CUTE [Sen et al. 2005] and KLEE [Cadar et al. 2008a].
A key advantage of DSE is that by examining a single path, the analysis can be both precise (for example, capturing intricate details such as the state of the cache micro-architecture), and efficient (for example, the constraint solver often needs to deal only with path constraints aggregated into a single conjunction). Another advantage is the possibility of reasoning about system or library functions which we can execute but not analyze, as in the method of concolic testing CUTE [Sen et al. 2005]. Yet another advantage is the ability to realize a search strategy in the path exploration, such as in a random, depth-first, or breadth-first manner, or in a manner determined by the program structure. However, the key disadvantage of DSE is that the number of program paths is in general exponential in the program size, and most available implementations of DSE do not employ a general technique to prune away some paths. Indeed, a recent paper [Avgerinos et al. 2016] describes that DSE "traditionally forks off two executors at the same line, which remain subsequently forever independent", clearly suggesting that the DSE processing of different paths has no symbiosis.
A variant of symbolic execution is that of Static Symbolic Execution (SSE), see e.g. [Avgerinos et al. 2016; Khurshid et al. 2003]. The general idea is that the symbolic execution tree is encoded as a single logic formula whose treatment can be outsourced to an SMT solver [De Moura et al. 2002]. The solver then deals with what is essentially a huge disjunctive formula. Clearly, there are some limitations to this approach as compared with DSE; for example, the loop bounds, including nested loops, must be pre-specified. However, SSE has a huge advantage over (non-pruning) DSE: its SMT solver can use the optimization method of conflict directed clause learning (CDCL) [Marques-Silva and Sakallah 1999]. See e.g. Section 3.4 of [de Moura and Bjørner 2008] on how the SMT solver Z3 exploits CDCL. Essentially, CDCL enables "pruning" in the exploration process of the solver.
In this paper, our primary objective is to address the path explosion problem in DSE. More specifically, we wish to perform a path-by-path exploration of DSE to enjoy its benefits, but we include a pruning mechanism so that path generation can be eliminated if the path generated so far is guaranteed not to violate the stated safety conditions. Toward this goal, we employ the method of abstraction learning [Jaffar et al. 2009], which is more popularly known as lazy annotations (LA) [McMillan 2010, 2014]. The core feature of this method is the use of interpolation, which serves to generalize the context of a node in the symbolic execution tree with an approximation of the weakest precondition of the node. This method has been implemented in the tracer system [Jaffar et al. 2012, 2011], which was the first system to demonstrate DSE with pruning. While tracer was able to perform bounded verification and testing on many examples, it could not accommodate industrial programs that often dynamically manipulate the heap memory. Instead, tracer was primarily used to evaluate new algorithms in verification, analysis, and testing, e.g., [Chu and Jaffar 2012; Chu et al. 2016; Jaffar et al. 2013].
The main contribution of this paper is the design and implementation of a new interpolation algorithm, and its integration into the KLEE system. In our primary experiment, we compare against KLEE. In a secondary experiment, we consider the related area of Static Symbolic Execution (SSE). The reason is that, while SSE is generally considered significantly different from DSE, it is a competitor to DSE because both address many common analysis problems.
Our main experimental result is that our algorithm leads in code penetration: given a target, which is essentially a designated program point, can one prove that the target is reachable, or prove that it is unreachable? Thus our algorithm is more aligned with verification than with testing. We suggest two driving applications for such verification. One is to confirm or deny an "alarm" from a static analysis; an alarm is a target for which there is a plausible reason for it to be a true bug. Another application is dead code detection. Penetration addresses both these questions.
Given that our algorithm is relatively heavy-weight, it is expected that for some examples the overhead is not worth it. We show firstly that our implemented system performs well on a large benchmark suite. We then considered a subset of the original targets called hard targets. These are obtained by filtering out targets that can be proved easily by state-of-the-art methods: vanilla symbolic execution for reachable targets, and static analysis for unreachable targets. We then show that for the remaining (hard) targets, the performance gap widens.
We then performed a secondary experimental evaluation for testing. Here we followed the setup of the TEST-COMP competition by evaluating on bug finding (given a set of targets, find one) and code coverage (where all program blocks are targets, and the goal is to find as many as possible). In this secondary experiment, we show that our implementation is competitive.

A MOTIVATING EXAMPLE
Consider the shortest path problem in graphs. See Fig. 1, where we assume the graph has N nodes and weighted edges, each of which has a "distance". The edges point from a lower-numbered vertex to a higher one. The variable d accumulates the distance between nodes 1 and N along the traversed path. In the end, we want to know if d ≥ W for some given constant W. Symbolic execution on this program (where the variable node is symbolic) will traverse all paths, and therefore will be able to check the bound. Then, by iteratively running symbolic execution with various values of W, we can solve the shortest path problem.
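To make the setup concrete, the following is a hypothetical Python model of the kind of program sketched in Fig. 1 (the figure's actual code is not reproduced here): starting from node 1, the symbolic choice of the next edge forks one path per outgoing edge, the distance accumulates in `d`, and the assertion `d >= W` is checked at the destination. The edge weights and the names `EDGES`, `explore` are illustrative assumptions, not taken from the paper.

```python
# Hypothetical model of the Fig. 1 shortest-path program: enumerate every
# path from node 1 to node N, mimicking how DSE forks at each symbolic
# choice of the next edge, and check the assertion d >= W at the end.
EDGES = {1: {2: 20, 3: 35}, 2: {3: 40, 4: 90}, 3: {4: 60}}  # illustrative graph
N = 4  # destination node

def explore(node, d, W, dists):
    """Enumerate all 1-to-N paths; each recursive call is one DSE fork."""
    if node == N:
        assert d >= W, f"assertion d >= {W} violated: d = {d}"
        dists.append(d)
        return
    for nxt, w in EDGES[node].items():   # DSE forks one path per edge
        explore(nxt, d + w, W, dists)

dists = []
explore(1, 0, 90, dists)   # every path satisfies d >= 90
print(min(dists))          # the shortest 1-to-N distance, here 95
```

Iterating with larger values of `W` until the assertion fails recovers the shortest path distance, as described above.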
Example 2.1. Consider the example graph in Fig. 2(a) and its respective adjacency matrix in Table 1. The execution tree of the program from Fig. 1 is presented in Fig. 2(b). The program points ⟨1⟩ and ⟨4⟩ represent the source and destination nodes. Program points using the same number, e.g. ⟨3a⟩ and ⟨3b⟩, identify different visits to the same program point. Now assume we want to ensure that the distance between nodes 1 and 4 is at least W = 90 at the end. A basic DSE tool with no pruning will traverse all the paths in the execution tree, which is a problem for this example because the number of paths is exponential in N. We now demonstrate how our approach can prune the execution tree. Since our approach is built on top of DSE, corresponding locations in different loop iterations share the same program point. Hence, we can compare the states in different iterations and prune them when possible. Consider the leftmost path ⟨1⟩, ⟨2⟩, ⟨3a⟩, and ⟨4a⟩. At the end of this path, since the distance d is 120, the assertion is not violated and an interpolant is stored at ⟨3⟩: d ≥ 30. This interpolant represents the weakest precondition at ⟨3⟩ which satisfies the assertion. It is computed by updating the safety property d ≥ 90 with the update on d between nodes ⟨3a⟩ and ⟨4a⟩: d = d + 60. Next, this interpolant is passed to the parent node ⟨2⟩, considering the update on the variable d between ⟨2⟩ and ⟨3a⟩: d ≥ −10.
Backtracking to ⟨2⟩, we now consider the second path ⟨1⟩, ⟨2⟩, ⟨4b⟩. Here again, the assertion is not violated and an interpolant is passed to node ⟨2⟩, considering the update on d between ⟨2⟩ and ⟨4b⟩: d ≥ 0. Now the intersection of the two interpolants received from the successor nodes of ⟨2⟩ is stored as the interpolant of ⟨2⟩: d ≥ 0.
Moving on, this interpolant is updated and sent to the parent node ⟨1⟩: d ≥ −20. Next, the path ⟨1⟩, ⟨3b⟩, and ⟨4c⟩ is traversed. Here, at node ⟨3b⟩ the distance is d = 35, and the interpolant stored at ⟨3a⟩ (d ≥ 30) can be used to prune this node. Note importantly that node ⟨3a⟩ was visited in the third iteration of the loop (in the program from Fig. 1) and node ⟨3b⟩ is visited in the second iteration. Here, the pruning is sound since both program points are on the same node in the graph (node 3) and the state at ⟨3b⟩ satisfies the respective weakest precondition interpolant.
The interpolant at the root infers that all values of d greater than or equal to −5 will not violate the assertion. From this, we can infer that if the condition in the assertion had been d ≥ 95, it would still not have been violated. That is, we can in fact conclude that the shortest path between nodes 1 and 4 in the graph is 95, a tighter bound than what we started with. □ For this example, our implementation can deal with 1000 nodes, while CBMC and KLEE top out at 100 and 25 nodes respectively.

BACKGROUND: SYMBOLIC EXECUTION
We formalize dynamic symbolic execution (DSE) for a toy programming language. The variables in a program are denoted Vars. Other than the program variables Vars_p, there are also symbolic variables Vars_s. A basic statement is the assignment x = e, where e is some arithmetic or boolean expression. Another basic statement is assume(c), where c is a boolean expression. Note that "assertions" can be modeled by assume(c) statements coupled with a distinguished statement representing error.
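The desugaring of assertions described above can be sketched as follows. This is a toy model under our own assumptions (statements as tagged tuples interpreted over a state dictionary), not the paper's representation: `assert(c)` becomes two branches, one guarded by `assume(c)` that continues normally, and one guarded by `assume(!c)` that reaches the distinguished error statement.

```python
# Sketch: modeling assert(c) via assume() plus a distinguished error
# statement.  A branch whose assume-guard fails is an infeasible path.
ERROR = "error"

def desugar_assert(cond):
    """Rewrite assert(cond) into two assume-guarded branches."""
    return [("assume", cond), ("assume_not_then_error", cond)]

def run_branch(stmt, state):
    kind, cond = stmt
    if kind == "assume":
        # the path continues only when the condition holds
        return "continue" if cond(state) else "infeasible"
    if kind == "assume_not_then_error":
        # assume(!cond); error  -- reached exactly by violating states
        return ERROR if not cond(state) else "infeasible"

branches = desugar_assert(lambda s: s["x"] >= 0)
print([run_branch(b, {"x": -1}) for b in branches])   # the error branch fires
```

On a state satisfying the condition, only the `assume(c)` branch remains feasible, so the error statement is unreachable, which is exactly how safety is phrased below.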
For brevity, we omit other statements, e.g. functions and memory operations such as arrays and malloc. Extension to cover these instructions would be routine. In general, we follow the semantics used by KLEE for these instructions. We will further discuss some implementation details for them in Section 5.2.
We model a program P by a transition system: a tuple ⟨Σ, ℓ_start, −→⟩ where Σ is the set of program points, ℓ_start ∈ Σ is the start point, and −→ is the transition relation, whose transitions are labeled by statements. A symbolic state is a pair ⟨ℓ, Π⟩ of a program point ℓ and a constraint store Π, and its evaluation is defined by the conjunction of the constraints in Π. (Note that our expressions have no side-effects.) The notion of evaluation is extended to a set of constraints in an intuitive way. The evaluation of the constraint store of a state s is denoted ⟦s⟧.
A symbolic state s ≡ ⟨ℓ, Π⟩ is called infeasible if ⟦s⟧ is unsatisfiable. Otherwise, the state is called feasible; symbolic execution is possible from a feasible state only. Definition 3.2 (Transition Step). Given a feasible symbolic state s_0 ≡ ⟨ℓ, Π⟩ and a transition system ⟨Σ, ℓ_start, −→⟩, a symbolic execution is a sequence of symbolic states s_0, …, s_n such that ∀1 ≤ i ≤ n, s_i is a successor of s_{i−1}. We can now define symbolic exploration as the process of constructing a Symbolic Execution Tree (SET) rooted at s_0. Following some search strategy, the order in which the nodes are constructed can be different.
For bounded verification and testing, we assume that the tree depth is bounded. Note that only one of our program statements, assume(c), is a "branch". This means that in a SET, a state has at most two successors.
The reachability of an error statement indicates a bug.Symbolic execution typically stops the path and generates a failed test case witnessing that bug.On the other hand, a path safely terminates if we reach a halt statement, and we also generate a passed test case.We prove a program is safe by showing that no error statement is reached.A subtree is called safe if no error statement is reached from its root.

Symbolic Execution with Interpolation
We now present a formulation of the method of dynamic symbolic execution with interpolation (DSEI) [Jaffar et al. 2009; McMillan 2010]. The essential idea, in brief, is this. In exploring the SET, an interpolant Ψ of a state s is an abstraction of s which ensures the safety of the subtree rooted at s. In other words, if we continue the execution with Ψ instead of s, we will not reach any error. Thus, upon encountering a state s′ at the same program point as s, i.e., s and s′ have the same set of emanating transitions, if s′ |= Ψ, then continuing the execution from s′ will not lead to an error. Consequently, we can prune the subtree rooted at s′.
DSEI is a top-down method, because it traverses the SET from the root, and is also bottom-up, because it propagates formulas backward from the end of each path in the SET. Before describing DSEI, let us briefly argue that a purely bottom-up approach based on classic weakest preconditions, which operate on program fragments rather than paths, is not practical. The main disadvantage is that, being entirely bottom-up, the computed precondition at a program point is agnostic to the context of the states which reach that program point.
In contrast, DSEI performs a top-down depth-first search of the state space, path by path. For each path, it computes a path interpolant; this is where DSEI is bottom-up. For each subtree, it computes a tree interpolant, being the conjunction of all of the path interpolants from within the subtree. We now describe how to compute a path interpolant. See Fig. 3. Rules (1a) and (2a) are the base cases and their process is well-understood; we simply present an example. Suppose the statement were x = x + 5 and Ψ were x < 7; then the weakest precondition is x < 2. In general, for an assignment x = e of a variable x to an expression e, the weakest precondition wrt. Ψ is Ψ[e/x], i.e. the formula obtained from Ψ by simultaneously replacing all its occurrences of x with e.
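The substitution rule for assignments can be illustrated with a small executable sketch. Formulas are modeled here as Python predicates over a state dictionary and substitution as function composition; this representation is our assumption for illustration, not the paper's.

```python
# Rule (2a) in miniature: wp(x = e, Psi) = Psi[e/x], realized semantically
# by evaluating Psi on the state updated with the assignment's effect.
def wp_assign(var, expr, psi):
    """Weakest precondition of the assignment var = expr wrt. psi."""
    def pre(state):
        updated = dict(state)
        updated[var] = expr(state)      # effect of the assignment
        return psi(updated)             # postcondition on the updated state
    return pre

psi = lambda s: s["x"] < 7                           # postcondition  x < 7
pre = wp_assign("x", lambda s: s["x"] + 5, psi)      # statement  x = x + 5

# semantically, pre is exactly x < 2:
print(all(pre({"x": v}) == (v < 2) for v in range(-10, 10)))  # True
```

The check at the end confirms, over a sample of states, that the computed precondition agrees with the formula x < 2 obtained by the textual substitution Ψ[x+5/x].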
Next, rule (3a) addresses the case that a node is infeasible in the SET. Finally, we should highlight that in rule (4a), the interpolant c =⇒ Ψ still generates a disjunction. Consequently, the path-based weakest precondition, in general, may be a very large disjunction, exponential in the program length.
Example 3.4. Consider the example in Fig. 4, where b[i] is a symbolic bit-vector, P is an unspecified Boolean condition on the bit-vector representing the precondition, and Q is an unspecified Boolean condition on any variables, representing the postcondition.
Suppose N = 3. The classic weakest precondition of the program with postcondition Q is a disjunction of 8 distinct formulas, each of which is (a) an assignment of specific bit values to the variables b[i] conjoined with (b) a formula representing the propagation of the postcondition Q through the assignments corresponding to the b[i] values in (a). By way of comparison, in the path-based weakest precondition, 6 of these 8 formulas would be unsatisfiable. Thus, the path-based weakest precondition, after simplification, would consist of the remaining 2 formulas at ⟨1⟩.
The idea here is that the classic weakest precondition, as it is computed, is agnostic to the precondition, and therefore it essentially traverses every path through the program. Only at the end is it known that only 2 of the 8 generated formulas cover the precondition.
We finally comment that the path-based weakest precondition, though it has only two formulas, is still a disjunction. In the next two sections, we present our algorithm, which approximates the path-based weakest precondition with a single conjunction.

THE MAIN ALGORITHM
The overall structure of our idealized algorithm is in Fig. 5, where it processes a symbolic execution tree (SET) via DSEI. The function DSEI receives a symbolic state s which represents a node of the SET. It then processes the subtree beneath the node and returns an interpolant Ψ. If s is safe, i.e. no path from s violates the assertion, then Ψ is an interpolant storing a generalization of the state s. If, on the other hand, s is unsafe, i.e. there is a path from s that violates the assertion, then Ψ is error.
The DSEI() function first checks if a state is infeasible (line ⟨1⟩). If so, the return value is simply false (as the interpolant).
At lines ⟨3⟩–⟨6⟩ (when s is a terminal node, i.e. it has no successors), it is checked whether s is safe; if so, Φ is returned as the interpolant for the safe terminal node. Otherwise, a counter-example is found and error is returned.
Next, in lines ⟨8⟩–⟨12⟩, the DSEI function is recursively called on s′, a successor node of s. The key element of the algorithm is in line ⟨10⟩, where the BackProp(s, stmt, Ψ) function returns Ψ′, a practical approximation of the weakest precondition of the statement stmt wrt. the postcondition Ψ. Note importantly that here we cannot use the path-based weakest precondition from Section 3.1, since it generates a disjunctive interpolant. In Section 5, we present the interpolation algorithm of the BackProp function.
Finally, the tree interpolant, the conjunction of Ψ′ and Ψ′′, is returned in line ⟨20⟩. Note that Ψ′ ∧ Ψ′′ is a condition which, when satisfied, ensures that all paths within the subtree beneath s will not lead to a violation of the safety property Φ.
The key to performance is the subsumption step in line ⟨2⟩. Here Ψ contains the interpolant generated from processing a similar node in the SET, i.e. a node with the same program point. It is checked whether s |= Ψ. If so, s is subsumed by Ψ, meaning that no path beneath the node s can lead to a violation of the safety property Φ.
This, in turn, means that the key is the quality of the interpolant Ψ generated. Relying on state subsumption alone, without interpolation, is not good enough. In Section 5, we present the interpolation algorithm of BackProp, which is the main technical contribution of this paper. The BackProp function computes a conjunctive approximation of the weakest precondition as an interpolant.
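The recursion and the subsumption step above can be sketched on a deliberately simplified toy: a program with N two-way branches that each add 0 or 1 to x, followed by the assertion x ≤ N. Interpolants here degenerate to one numeric bound per level ("x ≤ bound"); this representation, and the function `dsei` itself, are simplifying assumptions for illustration, not the TracerX implementation.

```python
# Toy DSEI in the shape of Fig. 5: recurse over successors, back-propagate
# a bound through the "x = x + 1" branch, conjoin child interpolants, and
# prune on subsumption.  counter[0] counts explored SET nodes.
def dsei(level, x, n, table, counter):
    counter[0] += 1
    bound = table.get(level)
    if bound is not None and x <= bound:  # subsumption check (line <2>)
        return bound                      # prune: reuse stored interpolant
    if level == n:                        # terminal node (lines <3>-<6>)
        assert x <= n                     # the safety property holds
        wp = n                            # interpolant: x <= n
    else:
        # explore both successors; conjoin their interpolants (line <20>)
        wp0 = dsei(level + 1, x,     n, table, counter)      # "+0" branch
        wp1 = dsei(level + 1, x + 1, n, table, counter) - 1  # BackProp of x = x+1
        wp = min(wp0, wp1)                # conjunction of x<=wp0 and x<=wp1
    table[level] = wp                     # store the interpolant for reuse
    return wp

counter = [0]
dsei(0, 0, 20, {}, counter)
print(counter[0])   # linear in N, versus ~2**N nodes without pruning
```

After the leftmost path is explored, every other node at each level satisfies the stored bound and is subsumed immediately, so the explored tree is linear in N instead of exponential.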
Remark on Depth First vs. Random Strategy: The algorithm presented in this section is based on the depth-first traversal (DFS) of the SET.However, if the SET is so big that full coverage is implausible, then a DFS strategy in a non-pruning DSE (like KLEE) is known to have poor coverage [Cadar et al. 2008b].For such large programs, a random search strategy can be used to avoid the exploration of the SET from being stuck in some part of the SET.KLEE's "random" strategy presented in [Cadar et al. 2008a] addresses this issue and attempts to maximize coverage.
We can utilize a random strategy in our algorithm too.This strategy would be similar to KLEE's random strategy [Cadar et al. 2008a].The difference between our random strategy and KLEE's random strategy is two-fold: 1) Our approach still attempts to generate path and tree-interpolants to prune the SET.2) Our random strategy adds one more step to the two steps in KLEE's random strategy, which are executed in a round-robin fashion.This new step will pick a state which has a high chance to create a tree interpolant.
For our algorithm, the choice between DFS and random search strategies can matter greatly. The DFS strategy is ideal for our algorithm since it maximizes the generation of tree interpolants, which in turn increases the chance of subsumption. In general, under the random strategy, tree interpolants are formed more slowly and memory usage is higher. On the other hand, it can reach higher coverage when the SET is not fully traversed. We follow the commonsense belief that for full exploration DFS is as good as any other strategy, whereas for incomplete exploration random is better. In Section 6, we experiment with both the DFS and random strategies for our algorithm and show that each strategy can perform better on different programs.

A PRACTICAL INTERPOLATION ALGORITHM
The essential idea behind our new interpolation algorithm is to have it be a conjunction. We have seen in Section 3.1 that the path-based weakest precondition is in general a disjunction, and hence still not practical. So our proposed approach entails a conjunctive approximation of the path-based weakest precondition. Clearly, the context itself is a first candidate. We now show that we can, by using the context as a guide, compute an effective abstraction of it.
Before we proceed, we mention abstraction learning via interpolation [Jaffar et al. 2009; McMillan 2010, 2014], which has demonstrated significant speedup in verification and testing. Although in all of these previous efforts the interpolants implemented are conjunctive, they were not as general as the weakest precondition. In fact, all implementations ensured that an interpolant was in the form of a conjunction, which could then be dealt with efficiently by an SMT solver.
We now present the BackProp function in Fig. 6, which generates a conjunctive approximation of the path-based weakest precondition. As before, we use the notation ⃗s to denote a sequence of states.
Rules (1b)–(3b) are similar to the respective rules in Fig. 3. The problem at hand is to determine an interpolant which (a) can replace c =⇒ Ψ from Fig. 3, and (b) is a conjunction. The major difference compared to the path-based weakest precondition (from Fig. 3) is that rule (4a) is now replaced by the two new rules (4b) and (5b).
We first dispense with the easy case in rule (4b), where Π |= c holds. Clearly, Ψ ∧ c is the right interpolant.
Next, we discuss the difficult case in rule (5b). We know that the best interpolant is provided by the concept of the weakest precondition, that is: ¬c ∨ Ψ. However, this is a disjunction, and so is not suitable for us. What we require is a general method for generalizing the constraints that attempts to be as powerful as the weakest precondition method, while being restricted to produce only a conjunction of constraints.
Note that in rule (5b), we invoke a function abduction(Π, c, Ψ). We already have that Π ∧ c |= Ψ holds; this is by virtue of the top-down computation that brought us to this point. Now, we want a generalization φ of Π such that φ ∧ c |= Ψ. This means we have an abduction problem: given a conclusion Ψ and a partial contribution c to that conclusion, what is the most general constraint that needs to be added to c? This is a classic problem [Abductive reasoning 2020]. Unfortunately, we are not aware of any general abduction algorithm that is practical for our purposes. Next, we present our own abduction algorithm.

The Abduction Algorithm
Consider the abduction algorithm in Fig. 7. It uses two main functions. The first is core(φ, Ψ), whose algorithm is also presented in Fig. 7. It is called with the arguments Π ∧ c and Ψ. Note that, as explained in the previous section, we have Π ∧ c |= Ψ. The core function eliminates the constraints in Π ∧ c that are not needed for implying Ψ; the result is stored in φ.
The second function is separate(φ, V). The function partitions φ into two formulas φ_V and φ_V̄. The algorithm stores in φ_V any constraint which either contains a variable from V or contains a variable appearing in another constraint already in φ_V. The remaining constraints are stored in φ_V̄.
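The transitive-closure flavor of this partitioning can be sketched directly. In this sketch (an assumption of ours, not TracerX's data structures), each constraint is modeled simply as the set of variable names it mentions:

```python
# Sketch of separate(phi, V): split the constraints of phi into those
# transitively connected, via shared variables, to some variable in V,
# and the remainder.  Constraints are modeled as frozensets of variables.
def separate(constraints, V):
    connected, rest = [], list(constraints)
    frontier = set(V)
    changed = True
    while changed:                       # transitive closure over shared vars
        changed = False
        for c in rest[:]:
            if c & frontier:             # c mentions a connected variable
                rest.remove(c)
                connected.append(c)
                frontier |= c            # its variables become connected too
                changed = True
    return connected, rest

phi = [frozenset({"x"}), frozenset({"x", "y"}), frozenset({"a", "b"})]
print(separate(phi, {"x"}))   # {x} and {x,y} are connected to x; {a,b} is not
```

By construction, no constraint left in the second part shares a variable with the first part, which is the disjointness property the abduction function relies on below.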
The important property that is required concerns a notion of separation.We first define this property in general.
Definition 5.1 (Separation). Consider a first-order formula Ψ1 ∧ Ψ2 where the variables of Ψ1 and Ψ2 are V1 and V2 respectively, and V1 and V2 are disjoint. Let Ψ|V1 denote the projection of Ψ onto V1. We say that Ψ1 and Ψ2 are separate, written Ψ1 ★ Ψ2, if (Ψ1 ∧ Ψ2)|V1 is equivalent to Ψ1. The separation property that we require from the separate(φ, V) function, which returns φ_V and φ_V̄, is that φ_V ★ φ_V̄ holds. We now return to the abduction function. Note that it calls separate() twice. In the first call, φ is partitioned into φ_V and φ_V̄. Since separate() is provided with the variables of c, the critical property we obtain is that the sets of variables in (φ_V ∧ c) and φ_V̄ are disjoint. Moreover, (φ_V ∧ c) ★ φ_V̄ also holds.
In the second call, Ψ is partitioned into Ψ_V and Ψ_V̄. The difference here is that separate() is provided with the variables of (φ_V ∧ c). The critical property we obtain is that the sets of variables in Ψ_V and Ψ_V̄ are disjoint, and moreover that the sets of variables in (φ_V ∧ c) and Ψ_V̄ are disjoint too. In the end, after the two calls to separate(), both separation properties hold.
Finally, φ_V ∧ Ψ_V̄ is returned as a generalization of φ such that Ψ_V̄ ∧ φ_V ∧ c |= Ψ holds. The special case is when Ψ_V contains no constraints, i.e. it is true. In this case, Ψ_V̄ contains all of the constraints in Ψ, and since Ψ ∧ c |= Ψ holds trivially, Ψ is returned as a generalization of φ. We now outline a proof that the abduction algorithm is correct, first stating a helper result. Corollary 5.2 (Frame Rule). Consider three first-order formulas φ, ψ, and c.
Proof. We proceed by contradiction. Assume φ is true while ψ is false for some valuation σ on the variables of φ and ψ. We can extend σ to include the variables of c; call this valuation σ′. Then (φ ∧ c)σ′ implies (ψ ∧ c)σ′. In case (φ ∧ c)σ′ is true, ψσ′ must be true; this contradicts the assumption that ψ is false, because σ and σ′ agree on the variables of ψ. Similarly, in case (φ ∧ c)σ′ is false, this contradicts the assumption that φ is true, because σ and σ′ agree on the variables of φ. □ Proof. Let φ be core(Π ∧ c, Ψ); we prove the general case. Step (a) contributes constraints from the context in a top-down manner. Step (b), on the other hand, provides a mechanism for backward reasoning, computing an approximation of the weakest precondition. That is, the interpolant is (partly) composed of constraints that come from the postcondition in a bottom-up manner, and not only from the constraints of the current symbolic state, which come in a top-down manner.
A naive implementation of the abduction algorithm will not be sufficiently practical.We will describe how we implement the algorithm in Section 5.2.
We now exemplify the concepts behind the BackProp algorithm on a synthetic example. We attempt to show, first, that the BackProp algorithm can generate a non-trivial interpolant; and second, that the interpolant generated by the unsatisfiability-core method is not an ideal solution, i.e. it is less general than that of the BackProp algorithm.
Example 5.5. In Fig. 8, we depict the full SET of a program explored by DSEI. Program points are denoted by numbers, e.g. ⟨2⟩, and we attach small letters to distinguish different encounters of the same program point, e.g. ⟨2a⟩. Assume a and b are symbolic variables and x and y are program variables. We are attempting to prove the postcondition −3 < a < 6 ∧ x < b.
The leftmost path is traversed to ⟨4a⟩ and, since it is safe, an interpolant Ψ3a is generated at ⟨3a⟩ using rule (2b): −4 < a < 5 ∧ x < b + 33. Moving now to the path through ⟨3b⟩ and ⟨4b⟩, an interpolant Ψ3b is generated at ⟨3b⟩ using rule (2b) in the same way. We now show how to propagate Ψ3a and Ψ3b to obtain Ψ2a. Consider first Ψ3a. At ⟨2a⟩, the guard in question is (a > 0). We apply rule (5b) and the abduction function. First, φ would be −1 < a < 2 ∧ x = 0 ∧ y = 1, which still implies Ψ3a. Next, since the guard has just the variable a, φ_a is −1 < a < 2, and φ_ā is x = 0 ∧ y = 1. Finally, Ψ_a would be −4 < a < 5 and Ψ_ā is x < b + 33. In other words, the first formula φ_a is "in the frame of a", while φ_ā is in the "anti-frame". Thus the abduced formula is −1 < a < 2 ∧ x < b + 33, obtained as a combination of the incoming context from the top (which implies −1 < a < 2) and the formula Ψ3a which comes from the bottom. Repeating this method for Ψ3b, our abduction algorithm produces another interpolant: −1 < a < 2 ∧ x > b − 2. Conjoining these two interpolants finally gives the interpolant Ψ2a. Now, moving to ⟨2b⟩, the constraint store implies the interpolant generated at ⟨2a⟩. Hence, node ⟨2b⟩ is subsumed (a.k.a. pruned) and its safety is inferred from the computed interpolant. Finally, we note that a classic unsat-core interpolant would be −1 < a < 2 ∧ x = 0 ∧ y = 1, which obviously would not be able to subsume node ⟨2b⟩. □
We now reconsider the example in Fig. 4 and show how the above interpolation algorithm deals with it effectively. This time we do not constrain the precondition P (i.e. the b[i] can freely take any binary values), and we keep the postcondition Q. The results are in Table 2, where we compare our algorithm against CBMC [Clarke et al. 2004] and LLBMC [Falke et al. 2013]. (KLEE only manages N = 24 within the timeout.) The reason for the vast superiority of our algorithm is that at any level i in the traversal, we compute just one interpolant, which subsumes all states at this level that are encountered later. In other words, we have "perfect" subsumption, and our search tree size is linear in N.

Remarks on Implementation
We call our implementation tracer-x. It is implemented on top of KLEE [Cadar et al. 2008a]. The main addition to KLEE is the implementation of interpolation. DSE with interpolation was implemented before in tracer [Jaffar et al. 2012, 2011]. tracer-x improves over tracer by building on top of KLEE and by an enhanced interpolation algorithm, which makes it more efficient and able to handle LLVM [LLVM 2018] programs, including those compiled from C/C++.
tracer-x maintains a new data structure called the subsumption table. This persistent structure is where the interpolants, which contain a subset of the path condition as well as a subset of the memory regions, are stored. When a new symbolic state is encountered, this table is consulted to check if the new state is subsumed by a record in the table, in which case its traversal need not continue. An entry in the table is created whenever KLEE removes a state from its worklist, which we take to mean that KLEE has finished traversing the subtree originating from that state. Now we explain how we implement the core() algorithm of Fig. 7. In Section 1 we explained how an SMT solver can use the optimization method of CDCL [Marques-Silva and Sakallah 1999]. More specifically, a core step in SMT solving is to invoke a "theory solver" to solve a conjunction of constraints written in a particular theory. The CDCL method requires that the solver not only decide the satisfiability of the given conjunction: in case the result is "unsatisfiable", the solver also indicates which portion of the conjunction is required in its unsatisfiability proof. This is known as the unsatisfiability core of the conjunction. In its pure form, the core only contains constraints that were already encountered. Essentially, we employ the unsatisfiability-core technology of SMT solvers for a more efficient implementation of the core() function.
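The core() computation described above can be illustrated with a toy deletion-based sketch. Here entailment is checked by brute force over a tiny finite domain, which is purely an assumption of this sketch; as explained above, the real implementation obtains the core from the SMT solver's unsatisfiability-core facility instead.

```python
# Toy core(phi, Psi): drop conjuncts of phi that are not needed for
# phi |= Psi, checking entailment by enumerating a small finite domain.
import itertools

VARS = ("x", "y")
DOMAIN = range(-5, 6)

def entails(conjuncts, psi):
    """Brute-force check that every state satisfying all conjuncts satisfies psi."""
    states = (dict(zip(VARS, v)) for v in itertools.product(DOMAIN, repeat=2))
    return all(psi(s) for s in states if all(c(s) for c in conjuncts))

def core(conjuncts, psi):
    """Deletion-based minimization: keep only conjuncts needed for psi."""
    kept = list(conjuncts)
    for c in list(kept):
        trial = [k for k in kept if k is not c]
        if entails(trial, psi):    # c was not needed for the implication
            kept = trial
    return kept

phi = [lambda s: s["x"] > 0, lambda s: s["y"] == 3]
psi = lambda s: s["x"] > -1
print(len(core(phi, psi)))   # only x > 0 is needed to imply x > -1
```

The constraint y = 3 is discarded because the implication survives without it, which is exactly the minimization the unsatisfiability core provides at scale.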
Next, we explain how we efficiently implement the separate() algorithm of Fig. 7. We have employed a light-weight syntactic partitioning to approximate the algorithm of the separate() function.
Now, we briefly explain how we extend the interpolation algorithm to other LLVM instructions. For this, we keep some extra information in the interpolant or the context. First, we elaborate on the operational semantics of the malloc and free instructions. The key difference in how we deal with the malloc instruction, compared to KLEE, is that instead of using a concrete address returned by a system call to malloc, we use a fresh symbolic variable. We also add into the path condition constraints specifying that the newly-allocated region is separated from the domain of the old heap store, and that the domain of the new heap store includes both the old domain and the new region.
We also have special treatment for array operations and the GEP instruction. As an example, suppose the transition were a[i] = 5 and Ψ was the formula *p = 5. We have extended rule (2b) from Fig. 6 to return <M, i, 5>[p] = 5 as the interpolant. This formula is to be understood in the theory of arrays. That is, M is a distinguished array variable representing the (entire) heap, <M, i, 5> is an array expression representing the array obtained from M after storing 5 at index i, and <M, i, 5>[p] refers to the p-th element of this array expression.
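The array-theory reading of <M, i, 5>[p] is the standard select-over-store semantics: selecting at p case-splits on whether p equals the stored index i. The sketch below models arrays as Python dicts with concrete indices, purely to illustrate the semantics an SMT array solver applies symbolically.

```python
# Sketch of select-over-store: store(M, i, v) yields an array equal to M
# except at index i; select on the result reads v when p == i, and reads
# the original array otherwise. Dicts stand in for symbolic arrays.

def store(m, i, v):
    m2 = dict(m)   # arrays are values: store returns a new array
    m2[i] = v
    return m2

def select(m, p):
    return m.get(p)

M = {0: 7, 1: 9}
print(select(store(M, 1, 5), 1))  # 5: p == i, the stored value is read
print(select(store(M, 1, 5), 0))  # 7: p != i, the old array M is read
```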
In order to extend the interpolation algorithm to perform sound inter-procedural subsumption, we store the call stack, in an efficient representation, with each interpolant. This stored call stack is later compared with the call stack at the subsumption point, and subsumption is only allowed if the two stacks are identical.
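This extra check can be sketched as a simple guard on the subsumption test. The sketch is ours; TracerX stores stacks in a compact shared form, but the soundness condition is the same: logical implication alone is not enough across procedure boundaries.

```python
# Sketch of the inter-procedural soundness check: an interpolant stored with
# call stack [main, f] may only subsume a state whose current call stack is
# identical, in addition to the usual implication check.

def may_subsume(entry_stack, state_stack, constraints_implied):
    # Subsumption requires both logical implication and identical call stacks.
    return constraints_implied and tuple(entry_stack) == tuple(state_stack)

print(may_subsume(["main", "f"], ["main", "f"], True))   # True
print(may_subsume(["main", "f"], ["main", "g"], True))   # False: stacks differ
```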

EXPERIMENTAL EVALUATION
We used an Intel Core i7-6700 at 3.40 GHz Linux box (Ubuntu 16.04) with 32 GB RAM. The programs in Tables 3, 4, 5, and 6 (47 programs) are from SV-COMP verification tasks [Psyco 2017] and from the RERS Challenge [RERS 2012]; the RERS programs are prefixed with "P" and "m" in the tables. They are from the RERS Challenge competitions of 2012, 2017, and 2019 (identified by '-R12' to '-R19' respectively). The programs identified with 'P' are from the three different categories of the 2012 competition [RERS 2012]: 1) easy/small, containing plain assignments; 2) medium/moderate, containing arithmetic operations; and 3) large/hard, containing array and data structure manipulation.
The programs 'P3-R17*', 'P2-T-R17*', and 'P11-R17*' are from the LTL and Reachability problems of RERS 2017 [RERS 2017]. These programs are from the small and moderate size groups and the easy to hard categories. Similarly, the 'P*-R19' problems are from the Sequential Training Problems of RERS 2019 [RERS 2019]. The programs 'm34*' and 'm217*' are from the Industrial Training Problems of RERS 2019, which are divided into LTL, CTL, and Reachability Training Problems. Since we have tested LTL problems from other tracks, here we focused on the CTL and Reachability groups. These programs were the most difficult and complex programs in our experiment. We tagged the CTL and Reachability groups with '-C' and '-R', and the Arithmetic and Data Structure groups with '-A' and '-D'. Most of the programs are originally unbounded, and we have tested them with different bounds (the program name is suffixed by the bound, e.g., -100 means the loop bound used was 100).
We performed two experiments.
• The main experiment is on penetration/verification. This experiment runs each program using one target at a time. We then considered a subset of the original targets, called hard targets. These are obtained by filtering out targets that can be proved easily by state-of-the-art methods: vanilla symbolic execution for reachable targets, and static analysis for unreachable targets. We then reran the main experiment on hard targets only.
• The supplementary experiment is on testing/coverage.
It is modeled after the TEST-COMP competition, which has a "bug finding" component and a "coverage" component. In the first part, bug finding, the task is to detect one target among all the targets injected in a program (performed for both all targets and hard targets).
In the second part, the overall objective is to measure code coverage. More precisely, we measured the coverage of basic blocks. Each program is run with the purpose of full exploration (timeout 1 hour), reporting any memory or assertion error detected along the way. (This is the default analysis of KLEE.) We report the block coverage for the 47 programs from SV-COMP, and we also extend this experiment to the GNU Coreutils benchmarks [Coreutils-6.11 2008].
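The coverage metric is straightforward: the percentage of a program's basic blocks that were visited at least once during exploration. A minimal sketch (names are ours, not the tool's):

```python
# Sketch of the basic-block coverage metric: the percentage of a program's
# LLVM basic blocks visited during exploration. Duplicate visits count once.

def block_coverage(total_blocks, visited_blocks):
    return 100.0 * len(set(visited_blocks)) / total_blocks

# e.g. 3 distinct blocks visited out of 8 total
print(block_coverage(8, ["bb0", "bb1", "bb1", "bb4"]))  # 37.5
```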
In both experiments, our baselines are KLEE [Cadar et al. 2008a] and CBMC [Clarke et al. 2004], as the state-of-the-art DSE and SSE tools. In general, CBMC is not appropriate for the second experiment on coverage, because the programs there interact with an external environment. Hence, there we only compare with KLEE, and use the Coreutils benchmark.

Main Experiment (Penetration)
The main purpose of this experiment is to detect, individually, each target (bug) injected in the program. Some of these targets are easy to reach and some are very difficult. Also, some of the targets are located in unreachable parts of the program. We compare tracer-x and the baseline approaches on their capability to detect easy as well as hard targets.
We determine a subset of all targets as hard targets via a filtering phase. These are obtained by filtering out targets that can be proved easily by state-of-the-art methods. We first filtered out all the targets that are detected by KLEE within 5 minutes. Moreover, in a second step, we filtered out targets that are easily proved unreachable by static analysis. Table 3 presents, per program, the targets that have been proved reachable or unreachable by KLEE/CBMC and tracer-x. When a row is marked with "-", it means one of the tools hit the timeout on all targets in that program. The value denotes the relative speedup: for example, 0.5 means tracer-x was half as fast, and 2.0 means tracer-x was twice as fast. Fig. 9 shows the aggregated results of the all targets experiment. Fig. 9a shows the total number of targets that each tool has been able to prove reachable or unreachable. The remaining targets are the ones where the tools time out. Moreover, in Fig. 9b, we present the aggregate of the relative speedup of tracer-x over KLEE and CBMC. Finally, note that the detailed information on each target can be seen in [All-Targets 2020].
Table 4 presents the results of our experiment on hard targets. The columns are the same as in Table 3, except for the #HT column, which reports the total number of hard targets. Since in the previous experiments the timeout was set to 5 minutes, to give the tools a higher chance of finding hard targets we extended the timeout to 10 minutes. Fig. 10 shows the aggregated results of the hard targets experiment. Fig. 10a shows the total number of targets that each tool has been able to prove reachable or unreachable. The remaining targets are the ones where the tools time out. Moreover, in Fig. 10b, we present the aggregate of the relative speedup of tracer-x over KLEE and CBMC. Finally, we report the detailed information on each hard target, run for all the programs, in [Hard-Targets 2020].
Table 5 shows the results of bug finding. tracer-x-D is our system running under a depth-first search (DFS) strategy, and tracer-x-R under a random (KLEE-like) strategy. The column Time is in seconds; the column 1/0 shows whether the target was proved reachable ("1") or unreachable ("0"), provided there was no timeout. For this experiment, the timeout was set to 1 hour.
Fig. 11 shows aggregated results for Table 5. The height of each bar denotes the number of "wins" for each system; a system wins when it proves a target faster than the others. We consider LLVM basic block coverage (BB) as our coverage metric. Therefore we shall only compare against KLEE in this sub-experiment.
Table 6 shows the results of coverage achieved on the 47 SV-COMP programs. The column BB shows the block coverage percentage. Columns 1 and 5 show the benchmark names; columns 2 to 4 and 6 to 8 show KLEE, tracer-x-D, and tracer-x-R. The Time columns are in seconds; the timeout was set at 1 hour (∞ in the table).
We have also experimented with the Coreutils benchmark, for which KLEE is well known for achieving good coverage. For space reasons, we relegate the detailed results to the appendix, in Table 7. Instead, Fig. 12 gives an overall picture of the comparison with KLEE, and also of the comparison between using a DFS or a random strategy.
Finally, see the aggregate results for coverage on both the SV-COMP and Coreutils benchmarks in Fig. 13.

Main Experiment
Fig. 9a considered all targets. Clearly tracer-x has superior results in terms of proving both reachable and unreachable targets; it times out less. Note that KLEE was relatively poor at proving unreachable targets, while CBMC was relatively poor for reachable targets. In the end, tracer-x wins in 1339 (26.57%) targets, while it loses in only 112 (2.21%) targets. Moving to hard targets, Fig. 10a, the gap widens: tracer-x wins in 796 (54.15%) targets, while it loses in only 64 (4.35%) targets.
In summary for the main experiment, we consider the net advantage of tracer-x in Tables 3 and 4: the percentage of targets on which tracer-x wins minus the percentage on which it loses. For all targets this is 26.57% − 2.21% = 24.36%; for hard targets it is 54.15% − 4.35% = 49.80%. Clearly, tracer-x is more effective as the targets become harder.
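As a quick check, the net advantage implied by the win/loss percentages reported above can be computed directly; the metric form here (win percentage minus loss percentage) is our reading of the summary, using only the numbers given in the text.

```python
# Net-win advantage: percentage of targets won minus percentage lost.
# The percentages are those reported for the all-targets and hard-targets
# experiments; the metric form is our reading of the summary.

def net_win(win_pct, loss_pct):
    return round(win_pct - loss_pct, 2)

print(net_win(26.57, 2.21))  # 24.36 (all targets)
print(net_win(54.15, 4.35))  # 49.8  (hard targets)
```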
In a second comparison, we consider the relative speed of the tools. Before proceeding, we note the total time, in minutes, used by the three tools: for all targets, tracer-x, KLEE, and CBMC took 7782, 19648, and 20105 respectively. For hard targets, the numbers are 2634, 14630, and 10630.
In Fig. 9b, we aggregate the relative speedup of tracer-x over KLEE and CBMC. Recall that we are considering targets for which the tools terminate. Over all targets, tracer-x is 38.55× faster than KLEE and 137.56× faster than CBMC. It can also be observed that KLEE and CBMC have nearly the same total time. Regarding the speed comparison, tracer-x has, in total, 33 winning programs and 10 losing programs compared to KLEE. Also, tracer-x has, in total, 20 winning programs and 0 losing programs compared to CBMC. When considering hard targets, Fig. 10b, the numbers are as follows: tracer-x is 490.26× faster than KLEE and 37.50× faster than CBMC. tracer-x wins in 7 programs and loses in none compared to KLEE. Also, tracer-x wins in 20 programs and loses in 4 programs compared to CBMC.

Supplementary Experiment
We first discuss the bug-finding results. We can observe in Table 5 that KLEE found the targets easily for 44 programs out of 47. KLEE times out on the remaining 3 programs, since all their targets are unreachable. But when we ran the same set of programs with hard targets, KLEE times out on 32 programs, proves that 4 programs have unreachable targets, and proves a first target reachable for 11 programs. We can observe that KLEE struggles to prove hard targets. On the other hand, the performance of CBMC in the all targets and hard targets experiments is almost the same, except for a few cases.
While KLEE outperforms CBMC in the all targets experiment, CBMC has better performance on the hard targets programs. Some programs are a draw, where two of the tools have the same performance.
Finally, we consider tracer-x. Here, we observe that tracer-x-D performs well on nearly all the easy targets. However, in some cases, it fails to reach the performance of KLEE. On these programs, we notice that tracer-x-R is competitive with KLEE. Moving to the hard targets, we observe that tracer-x-D performs better than tracer-x-R. Fig. 11 presents the aggregated number of programs that each tool was able to prove in the all targets and hard targets experiments. Here, we separately compare tracer-x-D and tracer-x-R with the baseline tools. tracer-x-D clearly outperforms KLEE and CBMC in both the all targets and hard targets experiments. In the all targets experiment, we notice that tracer-x-R wins on nearly as many programs as KLEE. If we combine tracer-x-D and tracer-x-R for these experiments, i.e., tracer-x-D + tracer-x-R, then tracer-x has more winning cases and outperforms KLEE and CBMC significantly.
In summary, we conclude that tracer-x-D performs well in both the all targets and hard targets categories. Also, we have noted that tracer-x-R is competitive with KLEE, and when included it can improve the overall performance of tracer-x.
We now discuss the coverage experiments, where we compare with KLEE only, and the set of targets is defined by the basic blocks. Fig. 13a shows the aggregated results for the SV-COMP programs. It can be observed that KLEE terminated on only 5 programs and timed out on 42, whereas tracer-x-D terminated on 31 programs and tracer-x-R on 13. Among the 47 programs, KLEE wins on 4 programs. There is a group of 15 programs on which none of the systems terminated within the timeout, so the achieved block coverage (BB) is used to compare them; here, tracer-x-D wins on 6 of the 15 programs. There is 1 program on which tracer-x-R won, and on 2 further programs tracer-x-R has better coverage than tracer-x-D but the same as KLEE. If we consider tracer-x-D and tracer-x-R together, then our system wins on 38 programs.
We finally discuss the performance of KLEE, tracer-x-D, and tracer-x-R on the Coreutils benchmarks. From Table 7 in the appendix, we can observe that tracer-x-D terminates and is faster on 12 programs compared to KLEE. Next, in Fig. 12, on the Coreutils programs where neither KLEE nor tracer-x terminates, we observe that KLEE has better coverage on 13 programs. Here, tracer-x-D does not perform well because of the huge symbolic execution tree (SET); it has better coverage on only 3 programs. However, tracer-x-R is competitive with KLEE, with better coverage on 15 programs. Moreover, in Fig. 13b we report the aggregated result on the 75 Coreutils programs. Overall, KLEE wins on 16 programs, and the combined tracer-x-D + tracer-x-R wins on 30 programs.
In summary, consider first the 47 SV-COMP programs. In bug finding, tracer-x wins on 25 programs considering all targets, and on 32 programs considering hard targets. In coverage, the win count is 38. Finally, for the Coreutils programs, the win count is 30 out of 75, with 16 losses.
The overall conclusion of these sets of experiments is that our algorithm significantly improves the path coverage of DSE by means of its interpolation algorithm. Clear evidence is given by the many targets on which tracer-x completes its search while other systems cannot, or are significantly slower. When faced with an incomplete search, the result is less clear. This may be because the link between path coverage and code coverage/bug finding is not clear. Nevertheless, our experiments do show that our algorithm is competitive or better for this purpose too.

RELATED WORK
Abstraction learning in symbolic execution has its origin in [Jaffar et al. 2009], and is also implemented in the TRACER system [Jaffar et al. 2011, 2012]. TRACER implements two interpolation techniques: using the unsatisfiability core, and using the weakest precondition (termed postconditioned symbolic execution in [Yi et al. 2015]). Systems that use the unsatisfiability core and the weakest precondition, respectively, include Ultimate Automizer [Heizmann et al. 2014] and a KLEE modification reported in [Yi et al. 2015]. The use of the unsatisfiability core results in an interpolant that is conjunctive for a given program point and therefore incurs a lower performance penalty in handling. In contrast, the weakest precondition may be more expensive to compute, yet is logically the weakest interpolant, hence its use may result in more subsumptions.
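The weakest-precondition technique mentioned above can be illustrated on the simplest case, assignment, where wp(x := e, Q) is Q with every occurrence of x replaced by e. The sketch below uses string constraints purely for illustration; a real implementation substitutes inside SMT expressions.

```python
# Sketch of weakest-precondition computation for an assignment:
# wp(x := e, Q) = Q[e/x]. The postcondition Q is a set of conjunctive
# constraints over string-named variables; illustrative only.

import re

def wp_assign(var, expr, postcondition):
    # Replace whole-word occurrences of `var` with `expr` in each constraint.
    pat = re.compile(rf"\b{re.escape(var)}\b")
    return {pat.sub(f"({expr})", c) for c in postcondition}

# wp(x := y + 1, {x > 0}) = {(y + 1) > 0}
print(wp_assign("x", "y + 1", {"x > 0"}))
```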
Abstraction learning is also popularly known as lazy annotations (LA) [McMillan 2010, 2014]. In [McMillan 2014], McMillan reported experiments comparing abstraction learning with various other approaches, including property-directed reachability (PDR) and bounded model checking (BMC). He observed that PDR, as implemented in Z3, produced less effective learned annotations. On the other hand, BMC technology, e.g., [Clarke et al. 2005; Cordeiro et al. 2012; Holzer et al. 2008; LLBMC 2012], employs a SAT or SMT solver as a backend, hence it employs learning; however, its learning is unstructured, in that a learned clause may come from the entire formula [McMillan 2014]. In contrast, learning in LA is structured, where a learned interpolant is a set of facts describing a single program point.
Recently, Veritesting [Avgerinos et al. 2016] leveraged modern SMT solvers to enhance symbolic execution for bug finding. Basically, a program is partitioned into difficult and easy fragments: the former are explored in DSE mode (i.e., KLEE mode), while the latter are explored in SSE mode with some pruning power (i.e., BMC mode). Though this paper and Veritesting share the same motivation, the distinction is clear. First, our learning is structured and has customizable interpolation techniques. Second, we directly address the problem of pruning in DSE mode via the use of symbolic addresses. In contrast, there are program fragments, e.g., our motivating examples, where Veritesting's performance degrades to naive DSE. In summary, we believe that our proposed algorithm can also be used to enhance Veritesting.
Our approach is also somewhat related to various state merging techniques in symbolic execution, in the sense that both state merging and abstraction learning terminate a symbolic execution path prematurely while ensuring precision. State merging encodes multiple symbolic paths using ite expressions (disjunctions) fed into the solver. The article [Hansen et al. 2009] shows that state merging may result in significant degradation of performance, which hints that complete reliance on the constraint solver for path exploration, as with the bounded model checkers (e.g., CBMC, LLBMC), may not always be the most efficient approach for symbolic execution.
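State merging can be sketched concretely: two symbolic paths that reconverge are encoded as one state whose variable values are ite (if-then-else) expressions over the branch condition, pushing the disjunction into the solver. The string encoding below is illustrative only.

```python
# Sketch of state merging: combine the variable bindings of a then-branch and
# an else-branch into one state using ite expressions over the branch
# condition. Variables equal on both branches need no ite.

def merge(cond, state_then, state_else):
    merged = {}
    for var in set(state_then) | set(state_else):
        t = state_then.get(var, var)
        e = state_else.get(var, var)
        merged[var] = t if t == e else f"ite({cond}, {t}, {e})"
    return merged

s1 = {"x": "1", "y": "a"}   # bindings after the then-branch
s2 = {"x": "2", "y": "a"}   # bindings after the else-branch
m = merge("c > 0", s1, s2)
print(m["x"])  # ite(c > 0, 1, 2)
print(m["y"])  # a
```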
Finally, there is very recent work on KLEE [Trabish et al. 2018] that exploits a dependency analysis to identify redundant code fragments that may be ignored during symbolic execution. More specifically, it executes some user-chosen functions only on demand, using program slicing to reduce the demand. This work is somewhat orthogonal to ours because of the manual input, and because the slicing is a static process. In contrast, our algorithm is completely general and dynamic.

CONCLUSION
We presented a new interpolation algorithm and an implementation, tracer-x, that extends KLEE with pruning. The main objective is to address the path explosion problem in pursuit of code penetration: proving that a target program point is either reachable or unreachable. That is, our focus is verification. We showed via a comprehensive experimental evaluation that, while computing interpolants carries a very expensive overhead, the pruning it provides often far outweighs the expense and brings significant advantages. In the experiments, we compared against KLEE, a dynamic symbolic execution system with no pruning, and CBMC, a static symbolic execution system which does have pruning. We showed that our system outperforms both in the penetration experiments. In fact, the performance gap widens as the verification target becomes harder to prove. We finally demonstrated that our system is also competitive in testing.

A APPENDIX
In this appendix, we present the detailed results of our experiments on the GNU Coreutils benchmark. Table 7 has 5 major columns. Columns 1 and 2 show the benchmark name and #TB, i.e., the total number of basic blocks in the LLVM IR, respectively. Columns 3 to 5 show the results of KLEE, tracer-x-D, and tracer-x-R respectively. Each of these columns is further split into four sub-columns: #Inst, #T, #VB, and #err. #Inst (in millions) shows the total number of LLVM instructions covered during the exploration of the SET. #T (in seconds) shows the total execution time consumed; the timeout we set for this experiment was 1 hour. #VB shows the total number of uniquely visited basic blocks. #err is the number of error paths traversed during the execution.

Fig. 1. Motivating Example 1.

Definition 3.1 (Symbolic State). Let Stmts be the set of program statements, and let the transition relation relate a state to its (possible) successors obtained by executing the statements; we write ℓ --stmt--> ℓ′ to denote a transition from ℓ ∈ Σ to ℓ′ ∈ Σ executing the statement stmt ∈ Stmts. A symbolic state is a tuple ⟨ℓ, Π⟩, where ℓ ∈ Σ is the current program point, and the constraint store (or "context") Π is a first-order formula over symbolic variables and program variables. The evaluation of an expression under the constraint store Π is defined in the standard way by substitution.

Fig. 6. Conjunctive Path-Based Weakest Precondition.

Although in all of these previous efforts the interpolants implemented are conjunctive, they were not as general as the weakest precondition. In fact, all implementations ensured that an interpolant was in the form of a conjunction, which can then be handled efficiently by an SMT solver. We now present the BackProp function in Fig. 6, which generates a conjunctive approximation of the path-based weakest precondition.

Fig. 8. Example with Interpolation.

In summary, rule (5b) from Fig. 6 performs either (a) removing a constraint that already existed, or (b) transforming an existing constraint with another existing constraint. Step (a) is implemented via an unsat-core method. Step (b), on the other hand, provides a mechanism for backward reasoning, computing an approximation of the weakest precondition. That is, the interpolant is (partly) composed of constraints that come from the postcondition in a bottom-up manner, rather than from the constraints of the current symbolic state, which come in a top-down manner. A naive implementation of the abduction algorithm would not be sufficiently practical; we describe how we implement the algorithm in Section 5.2.
The programs in Tables 3, 4, 5, and 6 (47 programs) are from SV-COMP verification tasks [Psyco 2017] and the Rigorous Examination of Reactive Systems Challenge (RERS) [RERS 2012]. A large subset of the test programs are industrial programs or have been used in testing and verification competitions. The raw experimental results can be accessed at [Artifacts 2020]. Benchmarks tested: our first set (psyco1 to psyco7) is from SV-COMP verification tasks [Psyco 2017]. These programs are generated by the PSYCO tool [Psycotool 2017], which produces interfaces using symbolic execution and active automata learning. They contain complicated loops and are hard to analyze.

Table 5. Results of the supplementary bug-finding experiment (all targets and hard targets).

Table 6. Results of the supplementary LLVM block coverage experiment.