Difference Verification with Conditions



Introduction
Software changes frequently during its life-cycle: developers fix bugs, adapt existing features, or add new features. In agile development, software construction is an intrinsically incremental process. Every change to a working system carries the risk of introducing a new defect. Since software failures are often costly and may even endanger human lives, finding potential failures and ensuring their absence is an integral part of software development.
However, running a full verification after each change is inadequate: changes rarely affect the complete program behavior. For example, consider program absSum (Fig. 1, middle). If the assignment to program variable r is changed in the else-branch at location ℓ5 (absSum_mod, Fig. 1, right), only program executions that take that else-branch show different behavior. Program executions that take the if-branch (highlighted in gray) are not affected by the change. This is typical for program changes: a modified program P' exhibits some new or changed program executions compared to the original program P, but some executions also stay the same (Fig. 1, left). To ensure the safety of P', it is sufficient to inspect only the changed behavior ex(P') \ ex(P). Many incremental verification approaches [39,40] use this insight: regression-test selection [62] tries to execute only those tests in a test suite that are relevant w.r.t. the change, and incremental formal verification techniques adapt existing proofs [33,49,53,54], reuse intermediate results [16,59], or skip the exploration of unchanged behavior [21,47,60,61]. However, they (a) all focus on one fixed verification approach, (b) require a strong coupling between the original verification approach and the incremental technique, and (c) require an initial, full verification run. Often, this inflexibility makes an approach prohibitive.
As an alternative, we define the concept of difference verification with conditions: given the original and the changed software, difference verification with conditions first identifies all executions that are affected by changes and encodes them in a condition, an exchange format already known from conditional model checking [10]; we call this first part diffCond. Then, a conditional verifier uses that condition to verify only the changed program behavior. For this step, any existing off-the-shelf verifier can be turned into a conditional verifier with the reducer-based approach [13].
Difference verification with conditions allows us to (a) use varying verification approaches for incremental verification, (b) automatically turn any existing verifier into an incremental verifier, and (c) skip an initial, costly verification run.

Contributions. We make the following contributions:
- We propose difference verification with conditions, an incremental verification approach that combines existing tools and approaches.
- We provide the algorithm diffCond, an integral part of difference verification with conditions, which outputs a description of the modified execution paths in an exchangeable condition format. We also prove its correctness.
- We implemented diffCond in the verification framework CPAchecker and combined it with existing verifiers to construct difference verifiers.
- To study the effectiveness and efficiency of difference verification with conditions, we performed an extensive evaluation on more than 10 000 C programs.
- diffCond and all our data are available for replication and to construct further difference verifiers (see Sect. 7).

Fig. 2: CFA of absSum (Fig. 1), CFA of absSum_mod, and a condition that describes the common executions of both programs, as created by our approach

Background
Programs. For ease of presentation, we consider imperative programs with deterministic control flow, which execute statements from a set Ops. Our implementation supports C programs. Following the literature [8,9,30], we model a program as a control-flow automaton (CFA) P = (L, ℓ0, G), which consists of a set L of program locations, an initial location ℓ0 ∈ L, and a set G ⊆ L × Ops × L of control-flow edges. Figure 2 shows the CFA of the example program absSum from Fig. 1. We rely on standard operational semantics and model a program state by a pair of (1) the program counter, whose value refers to a program location in the CFA, and (2) a concrete data state c, whose shape we do not further specify [8]. We denote the set of all concrete data states by C. The function sp_op : C → 2^C describes the possible effects of operation op ∈ Ops on a concrete data state c ∈ C. Based on this, a sequence (ℓ0, c0) -op1→ (ℓ1, c1) -op2→ ... -opn→ (ℓn, cn) is a program path of P if it starts at the initial location ℓ0 and for each step there exists an edge (ℓi−1, opi, ℓi) ∈ G with ci ∈ sp_opi(ci−1). We denote the set of all program paths by paths(P). Program executions are derived from program paths: if p = (ℓ0, c0) -op1→ ... -opn→ (ℓn, cn) is a program path, then ex(p) := c0 -op1→ c1 -op2→ ... -opn→ cn is the corresponding execution. The executions of a program P are defined as ex(P) := {ex(p) | p ∈ paths(P)}.

Conditions. A condition describes which program executions were already verified, e.g., in a previous verification run. We use automata to represent conditions and use accepting states to identify already verified executions [13].
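To make these definitions concrete, the following Python sketch models a CFA and enumerates program paths up to a bounded length. All names (CFA, Edge, paths, and the absSum fragment) are illustrative and not part of the paper's implementation; concrete data states and the sp_op semantics are omitted for brevity.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source location, operation op in Ops, target location)

class CFA:
    """A control-flow automaton; the set of locations is implicit in the edges."""
    def __init__(self, initial: str, edges: List[Edge]):
        self.initial = initial
        self.edges = edges

    def paths(self, max_len: int) -> List[List[Edge]]:
        """Enumerate all program paths (edge sequences starting at the
        initial location) up to a bounded length; paths(P) itself may be infinite."""
        result: List[List[Edge]] = []
        work = [[e] for e in self.edges if e[0] == self.initial]
        while work:
            p = work.pop()
            result.append(p)
            if len(p) < max_len:
                # extend the path along edges leaving its last location
                work += [p + [e] for e in self.edges if e[0] == p[-1][2]]
        return result

# A fragment of absSum: the branch on a < 0 and the if-branch assignment.
absSum = CFA("l0", [("l0", "a<0", "l1"), ("l0", "!(a<0)", "l2"),
                    ("l1", "r=-a", "l3")])
```

This keeps only the syntactic part of the model; a full semantics would additionally thread concrete data states through sp_op along each path.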
A condition is an automaton (Q, δ, q0, F) that consists of a finite set Q of states, a transition relation δ ⊆ Q × Ops × Q ensuring ∀(q, op, q') ∈ δ : q ∈ F ⇒ q' ∈ F, an initial state q0 ∈ Q, and a set F ⊆ Q of accepting states.

The goal of absSum (left program in Fig. 2) is to compute r = Σ_{i=0}^{|a|} i. However, the original program is buggy: in location ℓ5, it must compute the product of a and a + 1, not the sum. The fixed program is shown in the middle of Fig. 2; the fix is highlighted in blue. The original and modified version of the program differ only in the else-branch. If we assume that the original program was already verified, we know that program executions passing through the if-branch have already been verified and need not be considered during a reverification. In contrast, executions that pass through the else-branch and reach the modified statement must be verified. The condition shown on the right of Fig. 2 encodes this insight. Program executions that pass through the if-branch (a < 0) lead to the accepting state q2; we say they are covered by the condition. In contrast, program executions that pass through the else-branch (¬(a < 0)) never reach q2; they are not covered by the condition and must be analyzed.
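As an illustration of this definition, the sketch below (hypothetical names, deterministic δ for simplicity) implements a condition automaton with the required closure of accepting states and the coverage check, instantiated with the condition for absSum from Fig. 2:

```python
from typing import Dict, Iterable, Set, Tuple

class Condition:
    """A condition (Q, delta, q0, F); Q is implicit in delta and F."""
    def __init__(self, delta: Dict[Tuple[str, str], str],
                 q0: str, accepting: Set[str]):
        self.delta = delta            # (state, op) -> successor state
        self.q0 = q0
        self.accepting = accepting
        # well-formedness: for all (q, op, q') in delta, q in F implies q' in F
        assert all(q2 in accepting
                   for (q1, _op), q2 in delta.items()
                   if q1 in accepting), "accepting states must stay accepting"

    def covers(self, ops: Iterable[str]) -> bool:
        """An execution is covered iff its operation sequence reaches F."""
        q = self.q0
        for op in ops:
            if q in self.accepting:
                return True
            q = self.delta.get((q, op))
            if q is None:             # the condition does not track this path
                return False
        return q in self.accepting

# Condition for absSum: executions through the if-branch reach accepting q2.
cond = Condition({("q0", "a<0"): "q1", ("q1", "r=-a"): "q2"}, "q0", {"q2"})
```

For example, `cond.covers(["a<0", "r=-a"])` holds (the execution is covered), whereas any sequence starting with `"!(a<0)"` is not covered and must be analyzed.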
Next, we introduce a simple and efficient way to systematically compute a condition that covers the common executions of an original and a modified program.

Component diffCond for Modular Construction
The ultimate goal of difference verification with conditions is to speed up reverification of modified programs. To achieve this goal, we aim at ignoring unmodified program behavior during verification. Conditions are a well-fitting format to describe the unmodified program behavior. However, to benefit from difference verification with conditions, the construction of such conditions must be efficient, i.e., consume only a small portion of the overall execution time of the verification. Therefore, we use a syntactic approach to compute the condition, diffCond (Alg. 1), which runs in time linear in the size of the modified program.
diffCond gets as input the original program P and the modified program P'. In lines 1 to 11, diffCond traverses the modified and the original program in parallel, stops traversal where the original and the modified program differ, and remembers the differing edge of the modified program.
To compute the condition, we first determine the condition's states. Lines 12 to 18 compute all nodes that can reach a successor of a difference edge. Figure 3 highlights these nodes in green. Nodes that are not discovered in lines 12-18 cannot lead to a difference edge and, thus, not to different program behavior. Consequently, undiscovered nodes that are successors of nodes discovered in lines 12-18 become final states (line 23). Figure 3 highlights these nodes in gray (only node (ℓ2, ℓ2)). The union of discovered and final states forms the set of condition states. To complete the construction, we use the pair of initial program locations (ℓ0, ℓ0) as the initial state and add to the transition relation all transitions from E and D that connect condition states. Figure 2c shows the condition created from Fig. 3.
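The construction just described can be sketched as follows. This is a simplified Python rendition of Alg. 1, not the CPAchecker implementation; in particular, the target of a difference edge is modeled as a fresh node ("•", ℓ'), since traversal stops at a difference.

```python
def diff_cond(edges_p, edges_pp, init_p, init_pp):
    """edges_*: iterable of (loc, op, loc'); returns (states, delta, q0, F)."""
    succ_p = {(l, op): m for (l, op, m) in edges_p}
    E, D = set(), set()                      # matched edges / difference edges
    q0 = (init_p, init_pp)
    work, seen = [q0], {q0}
    while work:                              # lines 1-11: parallel traversal
        (l, lp) = work.pop()
        for (src, op, tgt) in edges_pp:
            if src != lp:
                continue
            m = succ_p.get((l, op))
            if m is None:                    # programs differ: difference edge
                D.add(((l, lp), op, ("•", tgt)))
                continue                     # traversal stops at the difference
            E.add(((l, lp), op, (m, tgt)))
            if (m, tgt) not in seen:
                seen.add((m, tgt))
                work.append((m, tgt))
    # lines 12-18: all nodes that can reach a successor of a difference edge
    Q = {q for (_, _, q) in D}
    changed = True
    while changed:
        changed = False
        for (q1, _, q2) in E | D:
            if q2 in Q and q1 not in Q:
                Q.add(q1)
                changed = True
    if not Q:                                # lines 19-21: no difference found
        return {q0}, set(), q0, {q0}         # condition covers all executions
    F = {q2 for (q1, _, q2) in E
         if q1 in Q and q2 not in Q}         # line 23: final (accepting) states
    states = Q | F
    delta = {(q1, op, q2) for (q1, op, q2) in E | D
             if q1 in states and q2 in states}
    return states, delta, q0, F
```

On an absSum-like pair of CFAs that differ only in the else-branch assignment, the if-branch successor becomes the single accepting state, mirroring Fig. 2c.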
Finally, note that lines 19-21 handle the special case that the set D of difference edges is empty, which results in Q = ∅ in line 19. The set D is empty if the original and the modified program differ only in the names of their program locations or if the modified program is empty ((ℓ0, •, •) ∉ G'). In both cases, all executions of the modified program are covered by the executions of the original program. As a result, the condition covers all executions: its only state is both the initial and an accepting state, and the condition has no transitions.
The purpose of algorithm diffCond is to compute a condition that supports skipping unchanged behavior during reverification of a modified program.
For reverification to remain sound, the produced condition must not cover executions that do not occur in the original program. The following theorem states this property of algorithm diffCond.

Theorem 1. Let P be an original and P' a modified program. Then no execution in ex(P') \ ex(P) is covered by the condition produced by diffCond(P, P').
Proof (sketch). Assume for contradiction that some execution π ∈ ex(P') \ ex(P) is covered, i.e., π reaches an accepting state of the condition. By construction, all transitions on the run to that accepting state stem from matched edges in E, i.e., from edges that occur in both P and P', so the corresponding prefix of π is also executable in P. Moreover, an accepting state cannot reach a successor of a difference edge, so every composite edge reachable from it is matched as well. Due to program semantics and P being deterministic, the remainder of π can therefore also be executed in P. Hence π ∈ ex(P), a contradiction.

Theoretical Limitations. The effectiveness of difference verification with conditions depends on the amount of program code potentially affected by a change, which is determined by the diffCond component. diffCond only excludes program parts that cannot be syntactically reached from a program change. Therefore, difference verification is ineffective if some initial variable assignments at the very beginning of the program or some global declarations change. Moreover, the structure of a program strongly influences the effectiveness of difference verification. For example, programs like absSum_∞ (Fig. 4) that mainly consist of a loop are problematic. Program absSum_∞ is similar to absSum, but has an additional, outer loop that dominates the program. So when location ℓ7 is changed in absSum_∞, difference verification with conditions can only exclude the if-branch for the very first iteration of the outer loop. Thereafter, the change in location ℓ7 may propagate into the if-branch.
In contrast, difference verification with conditions can be effective on programs that allow the exclusion of program parts, e.g., if the program is modular and thus consists of multiple, loosely coupled parts. Examples of modularity are the strategy design pattern, object-oriented software, or software applications with multiple program features. When designing our experiments, we will consider these limitations of difference verification with conditions. Before we get to our experiments, we describe the modular composition of the diffCond component with a verifier, which yields the difference verifier.

Modular Combinations with Existing Verifiers
The diffCond algorithm can be combined with any off-the-shelf conditional verifier [10] to produce a difference verifier in a modular way. The goal of a difference verifier is to verify only modified program paths. To this end, it first uses diffCond to discover potentially modified program paths and then runs a conditional verifier to explore only those paths identified by diffCond. Figure 5 shows the construction template for difference verification with conditions. diffCond gets the original and modified program as input and encodes the modified paths in a condition. The constructed condition is forwarded to a conditional verifier, which uses the condition to restrict its analysis of the modified program to those paths that are not covered by the condition (i.e., the modified paths). Based on this template, we can construct difference verifiers from arbitrary conditional verifiers. Moreover, we can construct difference verifiers from non-conditional verifiers by using the concept of reducer-based conditional verifiers [13]. The idea of a reducer-based conditional verifier is shown on the right of Fig. 5. In this paper, we transform three verifiers into difference verifiers: CPA-Seq, UAutomizer, and Predicate. The first two are the best verifiers from SV-COMP 2020 [5], and the third is a predicate-abstraction approach. We use the off-the-shelf verifiers CPA-Seq and UAutomizer as non-conditional verifiers and thus add a reducer, while we use Predicate as a conditional verifier. Since a difference verifier can now be built from any off-the-shelf verifier, we can also combine difference verification with other incremental verification techniques. As an example, we can use precision reuse [16]. This technique is implemented in CPAchecker [16] and UAutomizer [49] and can be used with the previously mentioned approaches. Next, we explain the technologies of the selected verifiers.
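The construction template can be summarized in a few lines. The sketch below is schematic: `diff_cond`, `reducer`, and the verifiers are injected placeholders, not the actual components; the reducer stands for the reducer of [13], which builds a residual program containing exactly the uncovered paths.

```python
def difference_verifier(original, modified, diff_cond, verifier, reducer=None):
    """Template of Fig. 5: diffCond encodes the changed paths in a condition,
    which a conditional verifier (or reducer + classic verifier) consumes."""
    condition = diff_cond(original, modified)
    if reducer is None:
        # the verifier is already conditional (e.g., Predicate)
        return verifier(modified, condition)
    # reducer-based conditional verifier: build a residual program that
    # contains exactly the paths of `modified` not covered by the condition
    residual = reducer(modified, condition)
    return verifier(residual, None)

# Toy instantiation with stub components:
result = difference_verifier(
    "P", "P'",
    diff_cond=lambda p, pp: f"cond({p},{pp})",
    verifier=lambda prog, cond: (prog, cond),
    reducer=lambda prog, cond: f"residual({prog},{cond})")
```

The stubs only trace the data flow: the condition is computed once and either consumed directly by a conditional verifier or compiled away by the reducer.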
CPA-Seq uses several different strategies from the CPAchecker verification framework [6,11,14]. CPA-Seq first analyzes different features of the program under verification. The program features considered are: recursion, concurrency, occurrence of loops, and occurrence of complex data types like pointers and structs. Based on these features, CPA-Seq uses one of five different verification techniques (cf. [6]). For non-recursive, non-concurrent programs with non-trivial control flow, CPA-Seq uses a sequential combination of four different analyses: value analysis with and without counterexample-guided abstraction refinement (CEGAR) [24], a predicate analysis similar to Predicate, and k-induction with invariant generation [7]. Invariants are generated by numerical and predicate analyses and are forwarded to the k-induction analysis.
UAutomizer is the automata-based approach from the Ultimate verification framework [29,31]. It uses a CEGAR approach to successively refine an over-approximation of the error paths, which is given in the form of automata. In each refinement step, a generalization of an infeasible error path is excluded from the over-approximation. The generalization of the error path is described by a Floyd-Hoare automaton [31], which assigns Boolean formulas over predicates to its states. The predicates are obtained via interpolation along the infeasible error path [43].
Predicate is the predicate-abstraction approach from the CPAchecker framework [14] with adjustable-block encoding (ABE) [15]. ABE is instructed to abstract at loop heads only. CEGAR together with lazy refinement [34] and interpolation [32] determines the necessary set of predicates.
PrecisionReuse is a competitive incremental approach that avoids recomputing the required abstraction level [16]. The idea is to start with the abstraction level determined in a previous verification run. To this end, it stores and reuses the precision, which describes the abstraction level, e.g., the set of predicates to be tracked. We use the version implemented in CPAchecker.

Evaluation
We systematically evaluate our proposed approach along the following claims:

Claim 1. Difference verification with conditions can be more effective than a full verification. Evaluation Plan: For all verifiers, we compare the number of tasks solved by difference verification with conditions and by the pure verifier.

Claim 2. Difference verification with conditions is more effective when using multiple verifiers. Evaluation Plan: We compare the number of tasks solved by each difference verifier with the union of tasks solved by all difference verifiers.

Claim 3. Difference verification with conditions can be more efficient than a full verification. Evaluation Plan: For all verifiers, we compare the run time of difference verification with conditions and of the pure verifier.

Claim 4. The run time of difference verification with conditions is dominated by the run time of the verifier. Evaluation Plan: We relate the time for verification to the time required by the diffCond algorithm and the reducer.

Claim 5. Difference verification with conditions can complement existing incremental verification approaches. Evaluation Plan: We compare the results of difference verification with conditions with the results of precision reuse [16], a competitive incremental verification approach.

Claim 6. Combining difference verification with conditions with existing incremental verification approaches can be beneficial. Evaluation Plan: We compare the results of difference verification with the results of a combination of difference verification with conditions and precision reuse.

Experiment Setup
Computing Environment.We performed all experiments on machines with an Intel Xeon E3-1230 v5 CPU, 3.4 GHz, with 8 cores each, and 33 GB of memory, running Ubuntu 18.04 with Linux kernel 4.15.We limited each analysis run to 15 GB of memory, a time limit of 900 s, and 4 CPU cores.To enforce these limits, we ran our experiments with BenchExec [17], version 2.3.

Verifiers.
For our experiments, we use the software verifiers CPA-Seq [6,14] and UAutomizer [29,31] as submitted to SV-COMP 2020, and CPAchecker [14,15] in revision 32864. CPA-Seq and UAutomizer are used as verifiers. CPAchecker provides the verifier Predicate, but also the new diffCond component and the Reducer component for reducer-based conditional verification. The difference verifier based on Predicate is realized as a single run. In contrast, the difference verifiers based on CPA-Seq and UAutomizer are realized as a composition of two separate runs. The first run executes the diffCond algorithm followed by the reducer to generate the residual program. It is executed only once per task, i.e., the same residual programs are given to CPA-Seq and UAutomizer. In a second run, CPA-Seq and UAutomizer, respectively, verify the residual program. To deal with residual programs, we increased the Java stack size for CPA-Seq and UAutomizer.
Existing Incremental Verifier.We use Predicate with precision reuse [16].
Verification Tasks. We use verification tasks from the public repository sv-benchmarks (tag svcomp20), which is the largest, most diverse, and best-established collection of verification tasks. Since difference verification with conditions is an incremental verification approach, we require different program versions. We searched the benchmark repository for programs that come with multiple versions and for which at least one version is hard to solve, i.e., at least one of the three considered verifiers takes more than 100 s for verification of that version, but is successful. From these programs, we arbitrarily picked the following: eca05 and eca12 (event-condition-action systems, 10 versions each), gcd (greatest-common-divisor computation, 4 versions), newton (approximation of sine, 24 versions), pals (leader election, 26 versions), sfifo (second-chance FIFO replacement, 5 versions), softflt (a software implementation of floats, 5 versions), square (square-root computation, 8 versions), and token (a communication protocol, 28 versions). Unfortunately, all of these programs are specialized implementations with a single purpose. Thus, their implementation is strongly coupled and any reasonable program change affects the complete program. As explained before, this prohibits effective difference verification with conditions.
To get benchmark tasks that instead contain independent program parts, we create new combinations from the selected programs. We choose two programs, e.g., eca05 and token. We then combine these two programs according to the following scheme: We create a new program with all declarations and definitions of both original programs, but a new main function. This new main function randomly calls the main function of one of the two original programs. Name clashes are resolved via renaming. Figure 6 shows the conceptual structure of each program created through this combination. For our experiments, we consider the following combinations of programs: (1) eca05+token, (2) gcd+newton, (3) pals+eca12, (4) sfifo+token, (5) square+softflt. To create different versions of our combinations, we replace one of the two program parts with a different version of that part. For example, to get a different version of eca05+token, we replace the eca05 part by another version of eca05. With this procedure, we get a large number of different versions of our program combinations. For our evaluation, we consider each pair (O, N) of versions O and N of program combinations that fulfills the following two conditions: (1) N reflects a change, i.e., the two programs are different. (2) Version O, version N, or both versions are bug-free. This ensures that verification and difference verification can only find the same bugs. With this construction of benchmark tasks for incremental verification, we get a total of 10 426 tasks that we use in our experiments.
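The combination scheme can be illustrated with a small generator. The sketch below makes two assumptions not stated in the text: each original main has already been renamed to main_<name>, and the nondeterministic choice uses __VERIFIER_nondet_int, the SV-COMP convention for nondeterministic input.

```python
TEMPLATE = """\
/* declarations and definitions of both programs (name clashes resolved) */
{body_a}
{body_b}

int main() {{
  if (__VERIFIER_nondet_int()) {{
    return main_{name_a}();  /* behave like the first original program */
  }} else {{
    return main_{name_b}();  /* behave like the second original program */
  }}
}}
"""

def combine(name_a: str, body_a: str, name_b: str, body_b: str) -> str:
    """Create the combined C program of Fig. 6 from two prepared parts."""
    return TEMPLATE.format(name_a=name_a, body_a=body_a,
                           name_b=name_b, body_b=body_b)

# Toy instantiation with trivial stand-in bodies:
combined = combine("eca05", "int main_eca05(void) { return 0; }",
                   "token", "int main_token(void) { return 0; }")
```

A change to only one part (e.g., swapping in another version of eca05) then leaves the other part syntactically untouched, which is exactly the situation diffCond can exploit.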

Experimental Results
Claim 1 (Difference verification with conditions more effective). Table 1 gives an overview of our experimental results. Each column represents one task set. The rows refer to verifiers, i.e., pure verifiers (X) and difference verifiers (X∆). The last two rows are the union of the results of all three verifiers. For each task set and verifier, the table provides the number of tasks for which the verifier finds a proof (✓), finds a bug (!), and for which only the difference verifier gives a conclusive answer (#). It also shows the number of tasks (N) that cannot be solved. Neither the pure nor the difference verifiers reported incorrect results.
The table shows that for each verifier there exist task sets on which the number of solved correct tasks (✓) is higher for the difference verifier. Looking at column #, we observe that typically there exist tasks that only the difference verifier can solve. This shows that our new difference verification with conditions can be more effective.
Table 1: Experimental results for Predicate, CPA-Seq, and UAutomizer, as pure verifiers (X) and difference verifiers (X∆), showing how many correct tasks (✓) and tasks with a bug (!) are solved, how many tasks are solved only by the difference verifier (#), and which are too hard to solve (N)

Difference verification with conditions is not always more effective. In particular, CPA-Seq∆ and UAutomizer∆ sometimes perform worse. For example, CPA-Seq∆ finds significantly fewer bugs than CPA-Seq for eca05+token. The reason for this is the residual program constructed by the reducer, which is necessary to turn CPA-Seq into the required conditional verifier. The created residual programs, on which the off-the-shelf verifiers run, have a different structure than the original program. They make heavy use of goto statements and deeply nested branching structures. While semantically equivalent, this can have unexpected effects on analyses: in the case of the tasks in eca05+token, CPA-Seq was not able to detect required information about loops and thus aborted its verification. Note that this is not a direct issue of difference verification with conditions, but an orthogonal one. To fix the problem, verification tools must be improved to better deal with the generated residual programs, or the structure of the residual programs must be improved. Despite the problem with residual programs, difference verification can solve many tasks that a full verification run cannot solve. Since Predicate is already a conditional model checker, Predicate∆ does not suffer from the residual-program problem. Thus, the effectiveness of difference verification with conditions becomes even more obvious when comparing Predicate with Predicate∆. For the first three task sets, Predicate∆ solves all tasks that Predicate solves plus a significant number of additional tasks that Predicate cannot solve. For the last two task sets, Predicate∆ fails to solve a few tasks that Predicate can solve. However, Predicate∆
still solves more tasks in total. One reason for this is that the predicate abstraction used by Predicate may compute different predicates (due to a slightly different exploration of the state space), which may result in a more expensive abstraction. For some tasks, these different predicates may be less suited to solve the task and thus require more time, which results in the analysis hitting the time limit. Typically, we observe this phenomenon when Predicate is already expensive (in our experiments, when it takes at least 700 s). While for complicated tasks with large changes difference verification may produce worse results, Predicate∆ is still more effective than Predicate in all categories.
Claim 2 (Better with several verifiers). To study the usefulness of using several verifiers in difference verification, we look at the tasks solved by the three difference verifiers together. We observe that Predicate∆ solves the most tasks in all task sets except for pals+eca12, in which CPA-Seq∆ is better. Moreover, when looking at All∆, which takes the union of all results, we observe that for eca05+token there exist multiple tasks without a property violation that cannot be solved by the best difference verifier of this task set (Predicate∆). Thus, difference verification is more effective when using several verifiers.
Claim 3 (Difference verification with conditions more efficient). We compare the run times of the verifiers with the run times of the difference verifiers. For all three verifiers, the scatter plots in Fig. 7 show the CPU time required to check a task without (x-axis) and with difference verification (y-axis). If a task was not solved, because the verifier either ran out of resources or encountered an error, we assume the maximum CPU time of 900 s. Figures 7a and 7b compare the two non-conditional verifiers CPA-Seq and UAutomizer, for which we use the reducer-based conditional-verifier approach. For a significant number of tasks (below the diagonal), the difference verifier is faster than the respective verifier CPA-Seq or UAutomizer, and the tasks on the right edge can only be solved by the difference verifier. There are tasks for which difference verification is slower (above the diagonal). Note that the problem is the residual program, not our approach. For example, many tasks located at the upper edge do not represent timeouts of the difference verification, but failures of the verifier caused by the structure of the residual program. Figure 7c considers the conditional verifier Predicate. For the majority of tasks, the CPU time required by Predicate∆ is equal to or less than the time required by Predicate (tasks below the line). Moreover, there are only a few tasks for which Predicate∆ is slower than Predicate (tasks above the line). The reason for this slow-down is most likely the computation of worse predicates (see Claim 1). To sum up, difference verification with conditions can successfully increase efficiency.
Claim 4 (Verifier dominates run time). We aim to show that the diffCond component and the residual-program construction (in the reducer-based approach to construct conditional verifiers) require a negligible run time compared to the complete verification run time. Figure 8a shows, for each task verified with CPA-Seq∆ and UAutomizer∆, the CPU time required by the full verification run (x-axis) and the CPU time of that run spent on diffCond plus the reducer (y-axis). The time required by diffCond + reducer does not depend on the run time of the verifier, and it is below 60 s for all tasks.

Claim 5 (Difference verification with conditions complementary). To show that difference verification with conditions complements existing incremental verification, we need to compare difference verification with conditions against an existing incremental approach. Looking at existing approaches that are (1) available as a replication artifact and (2) able to run on verification tasks from sv-benchmarks, we identified two, both based on precision reuse: one implemented in CPAchecker [16] and one in Ultimate [49]. We use the one in CPAchecker. Figure 8b shows the CPU time of precision reuse with Predicate (x-axis) against our difference verification with Predicate, called Predicate∆ (y-axis). Many tasks are solved efficiently by both techniques (large cluster in the lower left). For the remaining hard tasks, difference verification is often faster than precision reuse, or precision reuse cannot even solve the task (points below the diagonal and on the right edge). This shows that difference verification with conditions can improve on precision reuse for a significant number of tasks. It can thus complement existing incremental techniques.
Claim 6 (Combinations sometimes beneficial). We combined difference verification with conditions with precision reuse. Figure 8c shows that this combination rarely becomes faster than difference verification Predicate∆ alone. In the worst case, the combination even slows down, because precision reuse tracks previously used predicates from the beginning, while difference verification would only detect the necessary ones lazily. This more precise abstraction leads to more, sometimes unnecessary, computations. Nevertheless, the combination can solve 29 tasks that neither Predicate, its difference verifier, nor precision reuse can solve alone. Thus, while a combination of the two incremental techniques is not beneficial in general, it can be.

Threats to Validity
External Validity. (1) Our benchmark tasks might not represent real program changes, and thus our results might not transfer to reality. However, we built our tasks from a well-established collection of software-verification problems, which are considered relevant in the verification community. Moreover, many of the combined programs implement known algorithms (greatest common divisor, Newton approximation of the sine function, Taylor expansion of a square root) or are derived from real applications (OpenSSL, SystemC design, leader election). Also, our combination is not uncommon in practice: such combination patterns result, e.g., from implementing the strategy pattern. Finally, our task set contains pairs of programs whose only difference is a bug fix to eliminate the reachability of the __VERIFIER_error() call. We believe that similar fixes are done in practice to eliminate bugs. (2) We compared our approach only with a single existing approach for incremental verification, and this comparison is restricted to a single verifier. Our observations may not apply to different incremental verification approaches or different verifiers. The same holds for the combination of difference verification with orthogonal, incremental verification approaches.

Internal Validity.
(3) The implementation of the diffCond algorithm may contain bugs and thus produce conditions that also exclude modified paths. We would expect that such a bug also excludes error paths. Since we never observed false proofs, we assume this is unlikely. (4) Difference verification with CPA-Seq and UAutomizer could appear improved simply because we separated verification from the execution of diffCond + Reducer and granted both runs a limit of 900 s. But the sum of the two times is always below 900 s for all correctly solved tasks.

Related Work
Equivalence Checking. Regression verification [27,28,55,56], SymDiff [23], UC-Klee [48], and other approaches [4,26] check whether the input-output behavior of the original and modified method or program is the same. Differential assertion checking [38] inspects whether the original and modified program trigger the same assertions when given the same inputs. Equivalence checking need not be restricted to a simple yes-or-no answer. Semantic Diff [35] reports all dependencies between variables and input values that occur either in the original or the modified program. Conditional equivalence [37] infers under which input assumption the original and modified program produce the same output. Over-approximation of the differences between the original and modified program has also been investigated [45]. Differential symbolic execution [46] compares function summaries and constructs a delta that describes the input values on which the summaries are unequal. Partition-based regression verification [19] splits the program input space into inputs on which the original and modified program behave equivalently and those on which the two programs differ. Equivalence checking is not directly tailored to property verification, but determining when the original and modified programs may behave differently is similar to the goal of the diffCond algorithm.
Result Adaption. Incremental data-flow analysis [51], Reviser [3], and IncA [57,58] adapt an existing data-flow solution to program modifications. Similarly, incremental abstract interpretation [52] adapts the solution of the abstract interpreter. Incremental model checking in the modal µ-calculus [54] adapts a previous fixed point and restarts the fixed-point iteration. Other approaches [18,20] model data-flow analysis and verification as the computation of attributed parse trees; a change results in an update of the attributed parse tree. Extreme model checking [33] reuses valid parts of the abstract reachability graph (ARG) and resumes the state-space exploration from those nodes with invalid successors. Incremental state-space exploration [41] reuses a previous state-space graph to prune the current exploration. HiFrog [1] and eVolCheck [25] implement an approach that reuses function summaries and recomputes invalid summaries [53]. UAutomizer adapts a previous trace abstraction [49], a set of Floyd-Hoare automata that describe infeasible error paths, for reuse on the modified program. While result adaption uses the same verification technique for the original and the modified program, our approach may use different techniques.
Reusing Intermediate Results. Green [59], GreenTrie [36], and Recal [2] support the reuse of constraint proofs. Similarly, iSaturn [44] supports the reuse of SAT results for Boolean constraints that are identical. Precision reuse [16] reuses the precision of an abstraction, e.g., which variables or predicates to track, from a previous verification run. These approaches are orthogonal to ours; in the experiments, we even combined precision reuse [16] with our approach.
Skipping Unaffected Verification Steps. Regression model checking [60] stops the exploration of a state as soon as no program change can be reached from that state. Directed incremental [47,50] and memoized [61] symbolic execution restrict the exploration to paths that may be affected by the program change. Additionally, memoized symbolic execution does not check constraints as long as the path prefix is unchanged. The Dafny verifier rechecks methods affected by a change, reusing unchanged verification conditions [42]. iCoq [21,22] detects and rechecks only those Coq proofs that are affected by a change in the Coq project. These ideas are similar to ours but are tailored to specific techniques.

Conclusion
Software is frequently changed during development, so verification techniques must deal with repeatedly verifying nearly the same software. To construct efficient incremental verifiers from off-the-shelf components, we introduce difference verification with conditions, which steers an arbitrary existing verifier to reverify only the changed parts of a program. Compared to existing techniques, our approach is tool-agnostic and can be used with arbitrary algorithms for change analysis. We provide an implementation of a change analysis as a reusable component, which we combined with three existing verifiers. In a thorough evaluation on more than 10 000 tasks, we showed the effectiveness and efficiency of difference verification with conditions.
Data Availability Statement. diffCond and all our data are available for replication and for constructing further difference verifiers on our supplementary web page and in a replication package on Zenodo [12].

Fig. 1: Relation between program executions of the original and the modified program (left) and an example: program absSum (middle) and its modified version absSum mod (right). The modification at location 5 is shown in blue. Program parts unaffected by the modification are highlighted in gray.

Definition 1.
A control-flow automaton (CFA) P = (L, ℓ0, G) consists of a set L of program locations with initial location ℓ0 ∈ L, and a set G ⊆ L × Ops × L of control-flow edges. CFA P is deterministic if (ℓ, op, ℓ'), (ℓ, op, ℓ'') ∈ G implies ℓ' = ℓ''.
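A minimal sketch of Definition 1 as a Python data structure may help make the notation concrete; all names here are illustrative and not part of the paper's implementation:

```python
from typing import NamedTuple

class CFA(NamedTuple):
    """Control-flow automaton P = (L, l0, G) from Definition 1."""
    locations: frozenset  # L: program locations
    initial: int          # l0, the initial location, l0 in L
    edges: frozenset      # G subset of L x Ops x L, as (l, op, l') triples

def is_deterministic(cfa: CFA) -> bool:
    """P is deterministic if two edges with the same source location and
    operation always lead to the same successor location."""
    seen = {}
    for (loc, op, succ) in cfa.edges:
        if (loc, op) in seen and seen[(loc, op)] != succ:
            return False
        seen[(loc, op)] = succ
    return True

# Example: the branching of absSum, with operations given as strings.
# The two branch edges carry complementary guards, so the CFA stays
# deterministic even though location 1 has two successors.
cfa = CFA(
    locations=frozenset({0, 1, 2, 3}),
    initial=0,
    edges=frozenset({
        (0, "int r = 0;", 1),
        (1, "i < 0", 2),
        (1, "!(i < 0)", 3),
    }),
)
```

Note that determinism is per (location, operation) pair: multiple outgoing edges are fine as long as their operations differ.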

Fig. 7: CPU time (in s) of full verification vs. difference verification, per task

The diffCond algorithm maintains three data structures: a set E of CFA edges that are equal in the original and the modified program, a set D ⊆ L × L × Ops × L of CFA edges that differ in the modified program, and a set waitlist ⊆ L × L of program-location pairs of the original and the modified program for which the outgoing edges still have to be compared. Equal edges are stored in the composite form ((ℓ1, ℓ1'), op, (ℓ2, ℓ2')). Set D stores all edges (ℓ1', op, ℓ2') of the modified program P' that represent a change from the original program P at ℓ1'; these are called difference edges.

To this end, diffCond explores the syntactical composition of the original and the modified program: In each iteration, it takes a pair (ℓ1, ℓ1') from waitlist and considers all outgoing edges (ℓ1', op, ℓ2') of ℓ1' in the modified program. If the same operation op does not occur on any outgoing edge of ℓ1, the edge is considered changed, and the difference edge ((ℓ1, ℓ1'), op, ℓ2') is stored in D before diffCond continues with the next pair in waitlist. If the same operation op occurs on an outgoing edge (ℓ1, op, ℓ2), the edge is considered equal, and the composite edge ((ℓ1, ℓ1'), op, (ℓ2, ℓ2')) is stored in E; in addition, if the pair (ℓ2, ℓ2') has not been explored yet, it is added to waitlist.

To turn an arbitrary verifier into a conditional one, a reducer-based conditional verifier puts a preprocessor (called reducer) in front of the verifier. The reducer gets a program and a condition and outputs a new, residual program that represents the program paths not covered by the condition.

Fig. 8: CPU time (in s) of (a) full difference-verification runs and the time spent for the two difference components diffCond + reducer, (b) Predicate with precision reuse vs. Predicate with difference verification (Predicate∆), and (c) Predicate∆ vs. Predicate∆ with precision reuse
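The diffCond exploration can be sketched as a standard worklist algorithm over the composed location pairs. The following Python sketch is illustrative only (edge sets are plain triples, operations are compared as strings) and omits the subsequent construction of the condition from D:

```python
def diff_cond(orig_edges, mod_edges, l0=0, l0p=0):
    """Sketch of the diffCond exploration over the syntactical composition
    of an original CFA (orig_edges) and a modified CFA (mod_edges), each
    given as a set of (location, op, location) triples."""
    # Index outgoing edges by source location for both programs.
    out_orig, out_mod = {}, {}
    for (loc, op, succ) in orig_edges:
        out_orig.setdefault(loc, []).append((op, succ))
    for (loc, op, succ) in mod_edges:
        out_mod.setdefault(loc, []).append((op, succ))

    E, D = set(), set()       # equal (composite) edges, difference edges
    waitlist = [(l0, l0p)]    # location pairs whose edges must be compared
    visited = {(l0, l0p)}
    while waitlist:
        l1, l1p = waitlist.pop()
        for (op, l2p) in out_mod.get(l1p, []):
            # Look for an outgoing edge of l1 in the original with the same op.
            match = next((s for (o, s) in out_orig.get(l1, []) if o == op), None)
            if match is None:
                D.add(((l1, l1p), op, l2p))           # changed: difference edge
            else:
                E.add(((l1, l1p), op, (match, l2p)))  # equal: composite edge
                if (match, l2p) not in visited:       # explore successor pair
                    visited.add((match, l2p))
                    waitlist.append((match, l2p))
    return E, D
```

On the absSum example, the changed assignment at location 5 would end up in D as a difference edge, while the edges of the unmodified if-branch would end up in E as composite edges.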