Counterexample-Guided Prophecy for Model Checking Modulo the Theory of Arrays

We develop a framework for model checking infinite-state systems by automatically augmenting them with auxiliary variables, enabling quantifier-free induction proofs for systems that would otherwise require quantified invariants. We combine this mechanism with a counterexample-guided abstraction refinement scheme for the theory of arrays. Our framework can thus, in many cases, reduce inductive reasoning with quantifiers and arrays to quantifier-free and array-free reasoning. We evaluate the approach on a wide set of benchmarks from the literature. The results show that our implementation often outperforms state-of-the-art tools, demonstrating its practical potential.


Introduction
Model checking is a widely used and highly effective technique for automated property checking. While model checking finite-state systems is a well-established technique for hardware and software systems, model checking infinite-state systems is more challenging. One challenge, for example, is that proving properties by induction over infinite-state systems often requires the use of universally quantified invariants. While some automated reasoning tools can reason about quantified formulas, such reasoning is typically not very robust. Furthermore, just discovering these quantified invariants remains very challenging.
Previous work (e.g., [McM18]) has shown that prophecy variables can sometimes play the same role as universally quantified variables, making it possible to transform a system that would require quantified reasoning into one that does not. However, to the best of our knowledge, there has been no automatic method for applying such transformations. In this paper, we introduce a technique we call counterexample-guided prophecy. During the refinement step of an abstraction-refinement loop, our technique automatically introduces prophecy variables, which both help with the refinement step and may also reduce the need for quantified reasoning. We demonstrate the technique in the context of model checking for infinite-state systems with arrays, a domain which is known for requiring quantified reasoning. We show how a standard abstraction for arrays can be augmented with counterexample-guided prophecy to obtain an algorithm that reduces the model checking problem to quantifier-free, array-free reasoning.
The paper makes the following contributions: i) we introduce an algorithm called Prophecize that uses history and prophecy variables to target a specific term at a specific time step of an execution, producing a new transition system that can effectively reason universally about that term; ii) we develop an automatic abstraction-refinement procedure for arrays, which leverages the Prophecize algorithm during the refinement step, and show that it is sound and produces no false positives; iii) we develop a prototype implementation of our technique; and iv) we evaluate our technique on four sets of model checking benchmarks containing arrays and show that our implementation outperforms state-of-the-art tools on a majority of the benchmark sets.
The rest of the paper is organized as follows. We start by providing relevant background information in Section 2. We then motivate the use of prophecy variables with an example and introduce the Prophecize algorithm in Section 3. We describe our abstraction-refinement framework for arrays in Section 4 and discuss its expressiveness and limitations in Section 5. In Section 6, we describe our prototype along with some implementation details. We evaluate our approach empirically in Section 7, cover related work in Section 8, and finally conclude in Section 9.
This paper is an extended version of work presented in [MIG+21]. Notable changes include the proofs, a self-comparison with different options in Section 7, and Section 6, which discusses implementation details of the prototype.

Background
We assume the standard many-sorted first-order logical setting with the usual notions of signature, term, formula, and interpretation. A theory is a pair T = (Σ, I) where Σ is a signature and I is a class of Σ-interpretations, the models of T. A Σ-formula φ is satisfiable (resp., unsatisfiable) in T if it is satisfied by some (resp., no) interpretation in I. Given an interpretation M, a variable assignment s over a set of variables X is a mapping that assigns each variable x ∈ X of sort σ to an element of σ^M, denoted x^s. We write M[s] for the interpretation that is equivalent to M except that each variable x ∈ X is mapped to x^s. Let x be a variable, t a term, and φ a formula. We denote with φ{x → t} the formula obtained by replacing every free occurrence of x in φ with t. We extend this notation to sets of variables and terms in the usual way. If f and g are two functions, we write f • g to mean functional composition, i.e., (f • g)(x) = f(g(x)). We use ∪ to refer to set union.
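Let T_A be the standard theory of arrays [Mcc62] with extensionality, extended with constant arrays. Concretely, we assume sorts for arrays, indices, and elements, and function symbols read, write, and constarr. Here and below, we use a and b to refer to arrays, i and j to refer to array indices, and e and c to refer to array elements, where c is also restricted to be an interpreted constant. The theory contains the class of all interpretations satisfying the following axioms:

\[
\begin{aligned}
&\forall a, i, j, e.\ \big(i = j \implies \mathit{read}(\mathit{write}(a, j, e), i) = e\big) \land \big(i \neq j \implies \mathit{read}(\mathit{write}(a, j, e), i) = \mathit{read}(a, i)\big) && \text{(write)}\\
&\forall a, b.\ \big(\forall i.\ \mathit{read}(a, i) = \mathit{read}(b, i)\big) \implies a = b && \text{(ext)}\\
&\forall i.\ \mathit{read}(\mathit{constarr}(c), i) = c && \text{(const)}
\end{aligned}
\]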
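To make the intended semantics of read, write, and constarr concrete, the following is a minimal Python sketch of one model of these operations over integer indices and elements. The representation (a default element plus a canonical tuple of overrides) is an assumption made for illustration only and is not taken from the paper.

```python
from typing import Tuple

# Assumed representation: (default element, sorted overrides that differ from it).
Array = Tuple[int, Tuple[Tuple[int, int], ...]]

def constarr(c: int) -> Array:
    """Constant array: every index maps to c."""
    return (c, ())

def read(a: Array, i: int) -> int:
    """Look up index i, falling back to the default element."""
    default, overrides = a
    return dict(overrides).get(i, default)

def write(a: Array, j: int, e: int) -> Array:
    """Return a new array that agrees with a everywhere except at index j."""
    default, overrides = a
    updated = dict(overrides)
    updated[j] = e
    # Dropping overrides equal to the default keeps the representation canonical,
    # so arrays that agree on every index compare equal (extensionality).
    canonical = tuple(sorted((k, v) for k, v in updated.items() if v != default))
    return (default, canonical)
```

With this representation, structural equality of two values coincides with pointwise equality of the arrays they denote, which is the content of extensionality for an infinite index domain.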
2.1. Symbolic Transition Systems and Model Checking. For generality, assume a background theory T with signature Σ. We will assume that all terms and formulas are Σ-terms and Σ-formulas, that entailment is entailment modulo T, and that interpretations are T-interpretations. A symbolic transition system (STS) S is a tuple S := ⟨X, I, T⟩, where X is a finite set of state variables, I(X) is a formula denoting the initial states of the system, and T(X, X′) is a formula expressing a transition relation. Here, X′ is the set obtained by replacing each variable x ∈ X with a new variable x′ of the same sort. Let prime(x) = x′ be the bijection corresponding to this replacement. We say that a variable x is frozen if T ⊨ x′ = x. When the state variables are obvious, we will often drop X.
A state s of S is a variable assignment over X. An execution of S of length k is a pair ⟨M, π⟩, where M is an interpretation and π := s_0, s_1, ..., s_{k−1} is a path of length k, a sequence of states such that M[s_0] ⊨ I(X) and M[s_i][s_{i+1} • prime⁻¹] ⊨ T(X, X′) for all 0 ≤ i < k − 1. When reasoning about paths, it is often convenient to have multiple copies of the state variables X. We use X@n to denote the set of variables obtained by replacing each variable x ∈ X with a new variable called x@n of the same sort. We refer to these as timed variables. A state s is reachable in S if it appears in a path of some execution of S. We say that a formula P(X) is an invariant of S, denoted by S ⊨ P(X), if P(X) is satisfied in every reachable state of S (i.e., for every execution ⟨M, π⟩, M[s] ⊨ P(X) for each s in π). The invariant checking problem is, given S and P(X), to determine if S ⊨ P(X). A counterexample is an execution ⟨M, π⟩ of S of length k such that M[s_{k−1}] ⊭ P(X). If I(X) ⊨ φ(X) and φ(X) ∧ T(X, X′) ⊨ φ(X′), then φ(X) is an inductive invariant. Every inductive invariant is an invariant (by induction over path length). In this paper we focus on model checking problems where I, T, and P are quantifier-free. However, a quantified inductive invariant might still be necessary to prove a property of the system.
Bounded Model Checking (BMC) is a bug-finding technique which attempts to find a counterexample of length k for a property P(X), for some finite k [BCCZ99]. A single BMC query at bound k for an invariant property uses a constraint solver to check the satisfiability of the following formula:

\[
\mathit{BMC}(S, P, k) := I(X@0) \land \Big( \bigwedge_{i=0}^{k-2} T(X@i, X@(i+1)) \Big) \land \neg P(X@(k-1)).
\]

If the query is satisfiable, there is a bug.
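As a rough illustration of what a BMC query asks, the following Python sketch performs the same search explicitly over a small finite-state system instead of handing a symbolic formula to a constraint solver. The function `bmc`, its parameters, and the toy counter system are hypothetical and only mirror the shape of the formula above: an initial state, k − 1 transitions, and a violation in the last state.

```python
def bmc(init_states, successors, prop, k):
    """Return a path s_0..s_{k-1} whose last state violates prop, or None.

    Explicit-state stand-in for the symbolic query: s_0 must be initial,
    consecutive states must be related by the transition relation, and the
    final state must falsify the property.
    """
    frontier = [[s] for s in init_states]
    for _ in range(k - 1):
        frontier = [path + [t] for path in frontier for t in successors(path[-1])]
    for path in frontier:
        if not prop(path[-1]):
            return path
    return None

# Toy system (an assumption for this example): a counter modulo 8 that may
# either stutter or increment; the property says the counter is never 3.
def counter_successors(s):
    return [s, (s + 1) % 8]

def counter_prop(s):
    return s != 3
```

For this toy system, the query has no solution at k = 3 because three states are not enough to reach the value 3, while at k = 4 the path 0, 1, 2, 3 is found.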

Counterexample-Guided Abstraction Refinement (CEGAR)
CEGAR is a general technique in which a difficult conjecture is tackled iteratively [Cla03]. Algorithm 1 shows a simple CEGAR loop for checking an invariant P for an STS S. It is parameterized by three functions. The Abstract function produces an initial abstraction of the problem. It must satisfy the contract that if ⟨Ŝ, P̂⟩ = Abstract(S, P), then Ŝ ⊨ P̂ ⟹ S ⊨ P. The next function is the Prove function. This can be any (unbounded) model-checking algorithm that can return counterexamples. It checks whether a given property P is an invariant of a given STS S. If it is, it returns with proven set to true. Otherwise, it returns a bound k at which a counterexample exists. The final function is Refine. It takes the abstracted STS and property together with a bound k at which a known counterexample for the abstract STS exists. Its job is to refine the abstraction until there is no longer a counterexample of size k. If it succeeds, it returns the new STS and property. It fails if there is an actual counterexample of size k for the concrete system. In this case, it sets the return value refined to false.

Algorithm 1 STS-CEGAR(S := ⟨X, I, T⟩, P)
1: ⟨X̂, Î, T̂⟩, P̂ ← Abstract(S, P)
2: while true do
3:    k, proven ← Prove(⟨X̂, Î, T̂⟩, P̂) // try to prove
4:    if proven then return true // property proved
5:    ⟨X̂, Î, T̂⟩, P̂, refined ← Refine(⟨X̂, Î, T̂⟩, P̂, k) // try to refine
6:    if ¬refined then return false // found counterexample
7: end while
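The control flow of Algorithm 1 can be sketched in Python as a higher-order function. Only the loop structure follows the text; the toy instantiation in `demo`, in which an abstraction is just a set of lemma names, is a hypothetical stand-in used to exercise both exits of the loop.

```python
def sts_cegar(system, prop, abstract, prove, refine):
    """Generic CEGAR loop: returns True if proved, False on a real counterexample."""
    abs_system, abs_prop = abstract(system, prop)
    while True:
        k, proven = prove(abs_system, abs_prop)          # try to prove
        if proven:
            return True                                  # property proved
        abs_system, abs_prop, refined = refine(abs_system, abs_prop, k)
        if not refined:
            return False                                 # found counterexample

def demo(needed_lemmas, is_safe):
    """Toy instantiation (assumption): the abstraction is the set of lemmas added so far."""
    def abstract(system, prop):
        return frozenset(), prop

    def prove(abs_system, prop):
        # Succeeds only once every needed lemma is present and the system is safe.
        return 1, is_safe and needed_lemmas <= abs_system

    def refine(abs_system, prop, k):
        missing = needed_lemmas - abs_system
        if not missing:
            return abs_system, prop, False               # nothing left to refine
        return abs_system | {min(missing)}, prop, True   # add one lemma

    return sts_cegar(None, "P", abstract, prove, refine)
```

In the toy instantiation, a safe system is proved after one refinement per needed lemma, while an unsafe system exhausts the lemmas and then reports a counterexample.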

Auxiliary variables.
We finish this section with relevant background on auxiliary variables, a crucial part of the refinement step described in Section 4. Auxiliary variables are new variables added to the system which do not influence its behavior (i.e., a state is reachable in the old system iff it is a reduct [Hod93] to the old set of variables of a reachable state in the new system). There are two main categories of auxiliary variables we consider: history and prophecy. History variables, also known as ghost state, preserve a value, making its past value available in future states [OG76]. Prophecy variables are the dual of history variables and provide a way to refer to a value that occurs in a future state. Abadi and Lamport formally characterized soundness conditions for the introduction of history and prophecy variables [AL88]. Here, we consider a simple, structured form of history variables.
Definition 2.1. Let S = ⟨X, I, T⟩ be an STS, t a term whose free variables are in X, and n > 0. Then Delay(S, t, n) returns a tuple ⟨⟨X_h, I_h, T_h⟩, h_t^n⟩ containing a new STS and history variable, where X_h = X ∪ {h_t^1, ..., h_t^n}, I_h = I, and

\[
T_h = T \land (h_t^1)' = t \land \bigwedge_{i=2}^{n} (h_t^i)' = h_t^{i-1}.
\]

The Delay operator makes the current value of a term t available for the next n states in a path. This is accomplished by adding n new history variables and creating an assignment chain that passes the value to the next history variable at each state. Thus, h_t^k contains the value that t had k states ago. The initial value of each history variable is unconstrained.
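The assignment chain created by Delay can be illustrated by simulating it along a concrete sequence of values of t. The following Python sketch is only an illustration: the function name `delay_trace` and the use of `None` as a placeholder for the unconstrained initial values are assumptions.

```python
def delay_trace(t_values, n, initial=None):
    """Simulate the history chain along a trace of the term t.

    Returns one dict per step mapping k (1..n) to the value of the k-th
    history variable in that state. History variables are unconstrained
    initially; `initial` is just a placeholder for that unknown value.
    """
    history = {k: initial for k in range(1, n + 1)}
    rows = []
    for t in t_values:
        rows.append(dict(history))
        # Next-state update: the first history variable takes the current t,
        # every other one takes the value of its predecessor in the chain.
        history = {k: (t if k == 1 else history[k - 1]) for k in range(1, n + 1)}
    return rows
```

Once at least k steps have elapsed, the entry for k in a row holds the value that t had k states earlier, matching the informal description above.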
Theorem 2.2 [AL88]. Let S = ⟨X, I, T⟩ be an STS, P a property, and Delay(S, v, n) = ⟨S_h, h_v^n⟩. Then S ⊨ P iff S_h ⊨ P. We refer to [AL88] for a general proof that subsumes Theorem 2.2. In contrast to the general approach for history variables, we use a version of prophecy that only requires a single frozen variable. The motivation for this is that a frozen variable can be used in place of a universal quantifier, as the following theorem adapted from [McM18] shows.
→: Suppose there is a counterexample trace demonstrating that S ⊭ ∀ y. P(X ∪ {y}). In the last step of the trace, the property is violated. There is a specific value of y for which P(X ∪ {y}) does not hold. We can reconstruct the same counterexample trace for S_p and P(X ∪ {y}){y → v} by assigning v this same value in every step of the trace. ←: If there is a counterexample trace showing that S_p ⊭ P(X ∪ {y}){y → v}, we can construct a counterexample for S and ∀ y. P(X ∪ {y}) by using the same trace but dropping the assignment to the frozen variable v. Finally, the value of y that violates the property in the last step of the trace will be the same value v was assigned.
Theorem 2.3 shows that a universally quantified variable in a property can be replaced with a fresh symbol in a process similar to Skolemization. The intuition is as follows. The frozen variable has the same value in all states, but it is uninitialized by I. Thus, for each path in S, there is a corresponding path (i.e., identical except at v) in S_p for every possible value of v. This proliferation of paths plays the same role as the quantified variable in P. We mention here one more theorem from [McM18]. This one allows us to introduce a universal quantifier.
Theorem 2.4 [McM18]. Let S = ⟨X, I, T⟩ be an STS, P(X) a formula, and t a term. Then, S ⊨ P(X) iff S ⊨ ∀ y. (y = t ⟹ P(X)), where y is not free in P(X).
Proof Sketch. P(X) and ∀ y. (y = t ⟹ P(X)) for y ∉ X are equivalent in first-order logic. Intuitively, when y = t, it simplifies to P(X), and all other values of y render the formula trivially true. Theorems 2.3 and 2.4 are special cases of Theorems 3 and 4 of [McM18], which handle temporal logic [Pnu77] formulas. Another notable difference is that Theorem 3 of [McM18] uses a fresh background symbol to replace the universally quantified variable. Note that it does not change as the system evolves because it is not a state variable of the transition system. Rather than allowing background symbols, we simulate this with a frozen variable that maintains its value in Theorem 2.3.

Using Auxiliary Variables to Assist Induction
We can use Theorem 2.4 followed by Theorem 2.3 to introduce frozen prophecy variables that predict the value of a term t when the property P is being checked. We refer to t as the prophecy target and the process as universal prophecy. If we also use Delay, we can target a term at some finite number of steps before the property is checked. This is captured by Algorithm 2, which takes a transition system, property P(X), term t, and n ≥ 0. If n = 0, it introduces a universal prophecy variable for t. Otherwise, it first introduces history variables for t and then applies universal prophecy to the delayed t. In either case it returns the augmented system, augmented property, and the prophecy variable.
Algorithm 2 Prophecize(⟨X, I, T⟩, P(X), t, n)
1: if n = 0 then
2:    return ⟨X ∪ {p_t}, I, T ∧ p_t′ = p_t⟩, p_t = t ⟹ P(X), p_t
3: else
4:    ⟨X_h, I_h, T_h⟩, h_t^n := Delay(⟨X, I, T⟩, t, n)
5:    return ⟨X_h ∪ {p_t^n}, I_h, T_h ∧ (p_t^n)′ = p_t^n⟩, p_t^n = h_t^n ⟹ P(X), p_t^n
6: end if

We will use the STS shown in Fig. 1(a) as a running example throughout the paper (it is inspired by the hardware example from [Bje08]). We assume the background theory T includes integer arithmetic and arrays of integers indexed by integers. The variables in this STS include an array and four integer variables, representing the read index (i_r), write index (i_w), read data (d_r), and write data (d_w), respectively. The system starts with an array of all zeros. At every step, if the write data is less than 200, it writes that data to the array at the write index. Otherwise, the array stays the same. Additionally, the read data is updated with the current value of a at i_r. This effectively introduces a one-step delay between when the value is read from a and when the value is present in d_r. The property is that d_r < 200. This property is clearly true, but it is not straightforward to prove with standard model checking techniques because it is not inductive. Note that it is also not k-inductive for any k [SSS00]. The primary issue is that the property does not constrain the value of a at all, so in an inductive proof, the value of a could be anything in the induction hypothesis.
One way to prove the property is to strengthen it with the quantified invariant: ∀ i. read(a, i) < 200. Remarkably, observe that by augmenting the system using Prophecize, it is possible to prove the property using only a quantifier-free invariant. In this case, the relevant prophecy target is the value of i_r one step before checking the property. We run Prophecize(⟨X, I, T⟩, P, i_r, 1) and it returns the system and property shown in Fig. 1(b), along with the prophecy variable p_{i_r}^1. This augmented system has a simple, quantifier-free invariant which can be used to strengthen the property, making it inductive: read(a, p_{i_r}^1) < 200. This formula holds in the initial state because of the constant array, and if we start in a state where it holds, it still holds after a transition.
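The difference between the original property and the strengthened one can be explored with a small random-testing sketch in Python. It encodes the running example as described in the text, together with one history variable `h` and one frozen prophecy variable `p`. The state layout, the value ranges, and the helper names are assumptions made for illustration, and random testing only searches for counterexamples to induction; it does not prove inductiveness.

```python
import random

def read(a, i):
    default, table = a
    return table.get(i, default)

def write(a, i, e):
    default, table = a
    return (default, {**table, i: e})

def step(s, next_inputs):
    """One transition of the augmented running example, as described in the text."""
    i_r, i_w, d_w = next_inputs                      # inputs are unconstrained
    a_next = write(s["a"], s["i_w"], s["d_w"]) if s["d_w"] < 200 else s["a"]
    return {"a": a_next, "d_r": read(s["a"], s["i_r"]),
            "h": s["i_r"],                           # history: previous read index
            "p": s["p"],                             # prophecy variable is frozen
            "i_r": i_r, "i_w": i_w, "d_w": d_w}

def prop(s):
    return s["d_r"] < 200

def inv(s):
    aug_prop = s["p"] != s["h"] or s["d_r"] < 200    # p = h implies d_r < 200
    return read(s["a"], s["p"]) < 200 and aug_prop

def random_state(rng):
    """Arbitrary (not necessarily reachable) state over a small index range."""
    idx = lambda: rng.randrange(0, 4)
    val = lambda: rng.randrange(0, 400)
    return {"a": (val(), {i: val() for i in range(4)}), "d_r": val(),
            "h": idx(), "p": idx(), "i_r": idx(), "i_w": idx(), "d_w": val()}

def consecution_violations(pred, trials=20000, seed=0):
    """Count sampled states where pred holds before a step but not after it."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        s = random_state(rng)
        if pred(s):
            inputs = (rng.randrange(0, 4), rng.randrange(0, 4), rng.randrange(0, 400))
            if not pred(step(s, inputs)):
                bad += 1
    return bad
```

Under these assumptions, sampling finds many states where d_r < 200 holds but fails after one step, because the array contents are unconstrained, whereas no sampled state violates consecution for the strengthened formula over the prophecy variable.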
Notice that the invariant learned over the prophecy variable has the same form as the original quantified invariant. However, we have instantiated that universal quantifier with a fresh, frozen prophecy variable. Intuitively, the prophecy variable captures a proof by contradiction: assume the property does not hold, consider the value of i_r one step before the first failure of the property, and then use this value to show the property holds. This example shows that auxiliary variables can be used to transform an STS without a quantifier-free inductive invariant into an STS with one. However, it is not yet clear how to identify good targets for history and prophecy variables. In the next section, we show how this can be done as part of an abstraction refinement scheme for symbolic transition systems over the theory of arrays.

Abstraction Refinement for Arrays
We now introduce our main contribution. Given a background theory T_B and a model checking algorithm for STSs over T_B, we use an instantiation of the CEGAR loop in Algorithm 1 to check properties of STSs over the theory that combines T_B and the theory of arrays, T_A. The key idea is to abstract all array operators and then add array lemmas as needed during refinement.

4.1. Abstract and Prove. We use a standard abstraction for the theory of arrays, which we denote Abstract-Arrays. Every array sort is replaced with an uninterpreted sort, and the array variables are abstracted accordingly. Each constant array is replaced by a fresh abstract array variable, which is then constrained to be frozen (because constant arrays do not change over time). Additionally, we replace the read and write array operations with uninterpreted functions. Note that if the system contains multiple array sorts, we need to introduce separate read and write functions for each uninterpreted abstract array sort. Using uninterpreted sorts and functions for abstracting arrays is a common technique in Satisfiability Modulo Theories (SMT) [BT18] solvers [GKF08]. Intuitively, our initial abstraction starts with memoryless arrays, i.e., the array axioms are not initially enforced on the abstraction. We then incrementally refine the arrays' memory as needed by adding prophecy variables to be used in array axioms. Intuitively, a prophecy variable keeps track of an index of the array that will faithfully store values. Fig. 2 shows the result of running Abstract-Arrays on the example from Fig. 1(a). Prove can be instantiated with any (unbounded) model checker that can accept expressions over the background theory T_B combined with the theory of uninterpreted functions. In particular, due to our abstraction, the model checker does not need to support the theory of arrays.

4.2. Refine. Here, we explain the refinement approach for our array abstraction. At a high level, the algorithm solves a BMC problem over the abstract STS at bound k. It identifies violations of array axioms in the returned abstract counterexample, and instantiates each violated axiom (this is essentially the same as the lazy array axiom instantiation approach used in SMT solvers [BMS06, BB08, CH15, dB09]). We then lift these axioms to the STS level by modifying the STS. It is this step that may require introducing auxiliary variables. The details are shown in Algorithm 3.
Algorithm 3 Refine-Arrays(Ŝ := ⟨X̂, Î, T̂⟩, P̂, k)
1: ℐ ← ComputeIndices(Ŝ, P̂, k)
2: loop
3:    ρ ← BMC(Ŝ, P̂, k)
4:    if ρ = ⊥ then return ⟨X̂, Î, T̂⟩, P̂, true // Property holds up to bound k
5:    ⟨ca, nca⟩ ← CheckArrayAxioms(ρ, ℐ)
6:    if ca = ∅ ∧ nca = ∅ then return ⟨X̂, Î, T̂⟩, P̂, false // True counterexample
7:    // Go through non-consecutive array axiom instantiations
8:    for ⟨ax, i@n_i⟩ ∈ nca do
9:       let n_min := min(τ(ax) \ {n_i})
10:      p ← prophecy variable p_i^{k−n_i} returned by Prophecize(⟨X̂, Î, T̂⟩, P̂, i, k − n_i)
11:      ax_c ← ax{i@n_i → p@n_min} // consecutive formula equivalent to ax
12:      ca ← ca ∪ {ax_c} // add consecutive version of axiom
13:      add the timed copies of p (one for each time-step) to ℐ
14:      update ⟨X̂, Î, T̂⟩ and P̂ to the system and property returned by Prophecize
15:   end for
16:   // Go through consecutive array axiom instantiations
17:   for ax ∈ ca do
18:      let n_min := min(τ(ax)), n_max := max(τ(ax))
19:      assert n_max − n_min ≤ 1
20:      if k = 0 then
21:         Î ← Î ∧ ax{X̂@n_min → X̂}
22:      else if n_min = n_max then
23:         T̂ ← T̂ ∧ ax{X̂@n_min → X̂} ∧ ax{X̂@n_min → X̂′}
24:      else
25:         T̂ ← T̂ ∧ ax{X̂@n_min → X̂}{X̂@(n_min + 1) → X̂′}
26:      end if
27:   end for
28: end loop

Line 1 computes an index set ℐ of index terms with ComputeIndices; this set is used in the lazy axiom instantiation step below. The procedure adds to ℐ every term that appears as an index in a read or write operation (recall that these appear as uninterpreted functions in the abstracted STS and property) in BMC(Ŝ, P̂, k). Furthermore, it adds a witness index for every array equality; the witness corresponds to a Skolemized existential variable in the contrapositive of axiom (ext). For soundness, it must also add an extra variable λ_σ for each index sort σ and constrain it to be different from all the other index variables of the same sort (this is based on the approach in [BMS06]). Intuitively, this variable represents an arbitrary index different from those mentioned in the STS. We assume that the index sorts are from an infinite domain so that a distinct element is guaranteed. For simplicity of presentation, we also assume from now on that there is only a single index sort (e.g., integers). Otherwise, ℐ must be partitioned by sort. For the abstract STS in Fig. 2, with k = 1, the index set would be ℐ := {i_r@0, i_w@0, w_0@0, w_1@0, λ_Int@0, i_r@1, i_w@1, w_0@1, w_1@1, λ_Int@1}, where w_0 and w_1 are witness indices.
After computing indices, the algorithm enters the main loop. Line 3 dispatches a bounded model checking query, BMC(Ŝ, P̂, k). The result ρ is either a counterexample, or the distinguished value ⊥, indicating that the query is unsatisfiable. If it is the latter, then it returns the refined STS and property, as the property now holds on the STS up to bound k. Otherwise, the algorithm continues. The next step (line 5) is to find violations of array axioms in the execution ρ based on the index set ℐ.
CheckArrayAxioms takes two arguments, a counterexample and an index set, and returns instantiated array axioms that do not hold over the counterexample. This works as follows. It first looks for occurrences of write in the BMC formula. For each such occurrence, it instantiates the (write) axiom so that the write term in the axiom matches the term in the formula (i.e., we use the write term as a trigger). This instantiates all quantified variables except for i. Next, it instantiates i once for each variable in the index set. Each of the instantiated axioms is evaluated using the values from the counterexample, and all instantiations that reduce to false are saved. The procedure does the same thing for the (const) axiom, using each constant array term in the BMC formula as a trigger. Finally, for each array equality a@m = b@n in the BMC formula, it checks an instantiation of the contrapositive of (ext): a@m ≠ b@n → read(a@m, w_i@n) ≠ read(b@n, w_i@n). The saved instantiated formulas that do not hold in ρ are added to the set of violated axioms.
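The part of this check that concerns the (write) axiom can be sketched in Python as follows. This is a simplified illustration and not the paper's implementation: abstract arrays are opaque identifiers, the abstract read function is passed in as a callable that plays the role of the uninterpreted function's interpretation in the counterexample, and the names `violated_write_axioms`, `abstract_read`, and `lam@0` are hypothetical.

```python
def violated_write_axioms(read, write_terms, index_values):
    """Evaluate (write) instantiations on an abstract counterexample.

    read:         callable (array id, index value) -> element value
    write_terms:  list of (result id, source id, j, e), one per write term,
                  where result stands for write(source, j, e)
    index_values: dict from index-set term name to its value in the model
    Returns the (write term, index term name) pairs whose instance is false.
    """
    violated = []
    for term in write_terms:
        result, source, j, e = term
        for name, i in index_values.items():
            if i == j:
                holds = read(result, i) == e
            else:
                holds = read(result, i) == read(source, i)
            if not holds:
                violated.append((term, name))
    return violated

# Hypothetical abstract counterexample: a1 stands for write(a0, 2, 7), but the
# uninterpreted read function does not return the written value at index 2.
read_table = {("a1", 2): 0}

def abstract_read(arr, idx):
    return read_table.get((arr, idx), 0)

demo = violated_write_axioms(
    abstract_read,
    [("a1", "a0", 2, 7)],
    {"i_r@1": 2, "i_w@0": 2, "lam@0": 5},
)
```

In this example the instances for `i_r@1` and `i_w@0` are violated because both index terms evaluate to the written index, while the instance for `lam@0` holds.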
CheckArrayAxioms sorts the collected, violated axiom instantiations into two sets based on which timed variables they contain. The consecutive set contains formulas with timed variables whose timing differs by at most one, whereas the timed variables in the formulas contained in the non-consecutive set may differ by more. Formally, let τ be a function which takes a single timed variable and returns its time (e.g., τ(i@2) = 2). We lift this to formulas by having τ(φ) return the set of all time-steps for variables in φ. A formula φ is consecutive iff max(τ(φ)) − min(τ(φ)) ≤ 1. Note that instantiations of (ext) are consecutive by construction. Additionally, because constant arrays have the same value in all time steps, the algorithm can always choose a representative time-step for instantiations of (const) that results in a consecutive formula. However, instantiations of (write) may be non-consecutive, because the variable from the index set may be from a time-step that is different from that of the trigger term. CheckArrayAxioms returns the pair ⟨ca, nca⟩, where ca is a set of consecutive axiom instantiations and nca is a set of pairs, each of which contains a non-consecutive axiom instantiation and the index-set term that was used to create that instantiation. We assume that the index-set term used in a non-consecutive axiom is not an auxiliary variable. Since auxiliary variables only record or predict the value of another index, it does not make sense to target one of these for prophecy.
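The classification into consecutive and non-consecutive formulas can be illustrated with a short Python sketch that works on formulas written as strings with timed variables of the form `name@n`. The string encoding and the regular expression are assumptions made for this example; an actual implementation would traverse term data structures instead.

```python
import re

# A timed variable is an identifier followed by @ and a time-step.
TIMED_VAR = re.compile(r"[A-Za-z_]\w*@(\d+)")

def tau(formula: str) -> set:
    """Set of all time-steps of the timed variables occurring in the formula."""
    return {int(n) for n in TIMED_VAR.findall(formula)}

def is_consecutive(formula: str) -> bool:
    """True iff the time-steps in the formula span at most one step."""
    times = tau(formula)
    return not times or max(times) - min(times) <= 1
```

For instance, an instantiation of (write) whose trigger lives at time-step 0 and whose index term is `i_r@2` is non-consecutive, while the same instantiation with `i_r@1` is consecutive.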
Line 6 checks if the returned sets are empty. If so, then there are no array axiom violations and ρ is a concrete counterexample. In this case, the system, property, and false are returned. Otherwise, the non-consecutive formulas are processed in lines 8-15. Given a non-consecutive formula ax together with its index-set variable i@n_i, line 9 computes the minimum time-step of the axiom's other variables, n_min. Then line 10 calls the Prophecize method to create a prophecy variable p_i^{k−n_i}, which is effectively a way to refer to i@n_i at time-step n_min. This allows the algorithm to create a consecutive formula ax_c that is semantically equivalent to ax (line 11). This new consecutive formula is added to ca in line 12, and in line 13 the introduced prophecy variables (one for each time-step) are added to the index set. Then, line 14 updates the abstraction.
At line 17, the algorithm is left with a set of consecutive formulas to process. For each consecutive formula ax, line 18 computes the minimum and maximum time-step of its variables, which must differ by no more than 1 (line 19). There are three cases to consider: i) when k = 0, the counterexample consists of only the initial state, so the initial state is refined by adding the untimed version of ax to Î (line 21); ii) if ax contains only variables from a single time step, then the untimed version of ax is added as a constraint for both X̂ and X̂′, ensuring that it will hold in every state (line 23); iii) finally, if ax contains variables from two adjacent time steps, it can be translated directly into a transition formula to be added to T̂ (line 25). The loop then repeats with the newly refined STS.
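The three cases for lifting a consecutive axiom can be sketched in Python on string formulas, where an untimed variable is written `x` and its primed copy `x'`. The helper names `rename` and `lift`, the string encoding, and the representation of the initial-state and transition constraints as lists of conjuncts are all assumptions for illustration.

```python
import re

TIMED = re.compile(r"([A-Za-z_]\w*)@(\d+)")

def rename(ax, suffix_of_time):
    """Replace each timed variable x@n by x followed by the suffix chosen for n."""
    return TIMED.sub(lambda m: m.group(1) + suffix_of_time[int(m.group(2))], ax)

def lift(ax, k, init, trans):
    """Return updated (init, trans) conjunct lists after lifting the axiom ax."""
    times = {int(m.group(2)) for m in TIMED.finditer(ax)}
    n_min, n_max = min(times), max(times)
    assert n_max - n_min <= 1, "axiom must be consecutive"
    if k == 0:
        # Only the initial state exists: constrain the initial states.
        return init + [rename(ax, {n_min: ""})], trans
    if n_min == n_max:
        # Single time-step: enforce on both current and next state.
        return init, trans + [rename(ax, {n_min: ""}), rename(ax, {n_min: "'"})]
    # Two adjacent time-steps: becomes a transition constraint.
    return init, trans + [rename(ax, {n_min: "", n_max: "'"})]
```

The functions return new lists rather than mutating their arguments, which keeps each refinement step easy to inspect in isolation.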
Example. Consider again the example from Fig. 2, and suppose Refine-Arrays is called on Ŝ and P̂ with k = 3. At this unrolling, one possible abstract counterexample violates the following nonconsecutive axiom instantiation:

\[
\begin{aligned}
&(i_r@2 = i_w@0 \implies \mathit{read}(\mathit{write}(a@0, i_w@0, d_w@0), i_r@2) = d_w@0)\ \land\\
&(i_r@2 \neq i_w@0 \implies \mathit{read}(\mathit{write}(a@0, i_w@0, d_w@0), i_r@2) = \mathit{read}(a@0, i_r@2))
\end{aligned}
\]

To make this nonconsecutive axiom consecutive, it introduces a prophecy variable. The target will be the instantiated index, i_r@2. The relevant value to predict is one step before a possible property violation (i.e., the end of a finite path), because k = 3 and τ(i_r@2) = 2, thus k − τ(i_r@2) = 1. This corresponds to the k − n_i at line 10 of Algorithm 3. Calling Prophecize(Ŝ, P̂, i_r, 1) returns the new STS ⟨X̂ ∪ {h_{i_r}^1, p_{i_r}^1}, Î, T̂ ∧ (h_{i_r}^1)′ = i_r ∧ (p_{i_r}^1)′ = p_{i_r}^1⟩ and the new property p_{i_r}^1 = h_{i_r}^1 ⟹ d_r < 200. The history variable h_{i_r}^1 makes the previous value of i_r available at each time-step, and the prophecy variable p_{i_r}^1 predicts the value of i_r one step before a possible property violation. The axiom will be updated by replacing i_r@2 with the prophecy variable, which has the same value. Since the prophecy variable is frozen, it is the same at every step. Thus, the algorithm can choose the prophecy variable at a time-step that makes the axiom consecutive. In this case, the algorithm substitutes p_{i_r}^1@0 for i_r@2. This results in the following consecutive axiom:

\[
\begin{aligned}
&(p_{i_r}^1@0 = i_w@0 \implies \mathit{read}(\mathit{write}(a@0, i_w@0, d_w@0), p_{i_r}^1@0) = d_w@0)\ \land\\
&(p_{i_r}^1@0 \neq i_w@0 \implies \mathit{read}(\mathit{write}(a@0, i_w@0, d_w@0), p_{i_r}^1@0) = \mathit{read}(a@0, p_{i_r}^1@0))
\end{aligned}
\]

The untimed version (and a primed version) of this consecutive axiom would be added to the transition relation at line 23 of Algorithm 3.
We stress that processing nonconsecutive axioms using Prophecize is how Algorithm 3 automatically discovers the universal prophecy variable p_{i_r}^1, and it is exactly the universal prophecy variable that was needed in Section 3 to prove correctness of the running example. An alternative approach could avoid nonconsecutive axioms using Craig interpolants [Cra57] so that only consecutive axioms are found [BCS20]. However, quantifier-free interpolants are not guaranteed to exist for the standard theory of arrays [KMZ06, BGR12], and the auxiliary variables found using nonconsecutive axioms are needed to improve the chances of finding a quantifier-free inductive invariant. It is thus extremely important to start with a weak abstraction that allows us to examine spurious counterexamples in the BMC unrolling and find nonconsecutive axiom instantiations, which are then used to identify good prophecy targets.

4.3. Correctness. We now state two important correctness theorems.
Theorem 4.1. Algorithm 1, instantiated with Abstract-Arrays, a sound model checker Prove as described above, and Refine-Arrays, is sound, i.e., if it returns true then the property does hold.
Proof Sketch. Algorithm 1 only returns true if Prove succeeds in proving the property. Our initial abstraction only removes the array theory semantics, but leaves every other theory intact, so it is a sound abstraction. The refinement performed by Refine-Arrays is also sound. Prophecize first optionally applies Delay depending on the input arguments, then introduces a prophecy variable. Theorem 2.2 guarantees that Delay preserves the invariance of the property. Furthermore, introducing a prophecy variable is accomplished by directly applying Theorem 2.4, followed by Theorem 2.3, which additionally guarantee that the resulting system and property are invariant if and only if the original property is invariant on the original system. Thus, the entire Prophecize procedure produces a new system and property that preserve invariance with respect to the initial query. Finally, each axiom instantiation is, by definition, valid in the theory of arrays, and lifting them simply requires them to hold in every state of a path. Furthermore, this does not rule out any true counterexamples, as the interpretations in true counterexamples must be T_A-interpretations. Therefore, if at any point Prove is able to prove the property, it follows that the original property holds on the original concrete system, S.
Theorem 4.2. If Algorithm 1, instantiated with Abstract-Arrays, Prove as described above, and Refine-Arrays, returns false, there is a counterexample in the concrete transition system.
Proof Sketch. Theorems 2.2, 2.3, and 2.4 ensure that invariance of the property is preserved when adding auxiliary variables. Algorithm 1 returns false only when Refine-Arrays returns false in line 6 of Algorithm 3. This occurs if the refinement procedure is unable to find any array axioms that are violated in the BMC formula (ρ from line 3 of Algorithm 3). It suffices to prove that if all enumerated axioms hold, then the BMC formula is satisfiable in T_A and there is a length k counterexample.
The array property fragment [BMS06, BM07, KS16] is a fragment of the theory of arrays that allows some universal quantification while staying decidable. It is defined for an extensional theory of arrays with read and write functions with the semantics given by (write) and (ext). The fragment is of the form ∀ ī. φ_I(ī) → φ_V(ī), for a vector of bound index variables ī, index guard φ_I, and value constraint φ_V. Both φ_I and φ_V are constrained by a grammar. Our BMC queries are quantifier-free and thus fall within the array property fragment. The only universal quantifiers are hidden by the (const) axiom (because constant arrays are not explicitly included in the array property fragment theory of arrays). However, this simple form of universal quantification is contained in the fragment. Thus, our queries are a strict subset of the more general array property fragment.
Our axiom enumeration is based on a reduction technique [BMS06, BM07, KS16] for the array property fragment that is sound and complete. Because the technique is complete, if the abstract formula is satisfiable and all enumerated axioms are true, then the original $T_A$ formula is satisfiable.
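As a rough illustration of checking enumerated axiom instances against an abstract counterexample, the following Python sketch represents the abstract read and write functions as finite lookup tables, as they might appear in a model of the array-free formula, and reports which instances of the (write) axiom those tables falsify. The table layout, the default read value, the function name `violated_write_axioms`, and the restriction to the (write) axiom are assumptions made for illustration; this is not the paper's enumeration procedure.

```python
def violated_write_axioms(reads, writes, index_set, default=0):
    """Return the (write)-axiom instances falsified by an abstract model.

    reads:     dict (array_id, index) -> value, the abstract read table
    writes:    dict (array_id, index, value) -> array_id, the abstract write table
    index_set: indices at which the axiom is instantiated
    default:   value assumed for reads missing from the table (an assumption)
    """
    def read(arr, idx):
        return reads.get((arr, idx), default)

    violations = []
    for (a, i, e), b in sorted(writes.items()):   # b stands for write(a, i, e)
        for j in sorted(index_set):
            # (write): read(write(a, i, e), j) = e if i = j else read(a, j)
            expected = e if i == j else read(a, j)
            if read(b, j) != expected:
                violations.append((a, i, e, j))
    return violations


# A memoryless abstract model: a1 = write(a0, 3, 7), yet reading a1 at 3 gives -1.
writes = {("a0", 3, 7): "a1"}
reads = {("a1", 3): -1, ("a0", 5): 2, ("a1", 5): 2}
violations = violated_write_axioms(reads, writes, {3, 5})
```

Each reported tuple identifies one axiom instance that a refinement step could add to rule out the abstract counterexample.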

Expressiveness and Limitations
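5.1. Expressiveness. We now address the expressiveness of counterexample-guided prophecy with regard to the introduction of auxiliary variables. For simplicity, we ignore the array abstraction, relying on the correctness theorems. An inductive invariant using auxiliary variables can be converted to one without auxiliary variables by first universally quantifying over the prophecy variables, then existentially quantifying over the history variables. The details are captured by this theorem:
Theorem 5.1. Let $S := \langle X, I, T \rangle$ be an STS, and $P(X)$ be a property such that $S \models P(X)$. Let $\mathcal{H}$ be the set of history variables, and $\mathcal{P}$ be the set of prophecy variables introduced by Refine-Arrays. Let $\tilde{S} := \langle X \cup \mathcal{H} \cup \mathcal{P}, I, \tilde{T} \rangle$ and $\tilde{P} := (\bigwedge_{p \in \mathcal{P}} p = t(p)) \implies P(X)$ be the system and property with auxiliary variables. The function $t$ maps prophecy variables to their target term from Prophecize. If $Inv(X, \mathcal{H}, \mathcal{P})$ is an inductive invariant for $\tilde{S}$ and entails $\tilde{P}$, then $\exists \mathcal{H}\, \forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P})$ is an inductive invariant for $S$ and entails $P$, where $\exists \mathcal{H}$ and $\forall \mathcal{P}$ bind each variable in the set with the corresponding quantifier.
Proof. We assume that $Inv(X, \mathcal{H}, \mathcal{P})$ is an inductive invariant that guarantees $\tilde{P}$. Equivalently, it meets the following conditions:
$I \models Inv(X, \mathcal{H}, \mathcal{P})$ (initiation)
$Inv(X, \mathcal{H}, \mathcal{P}) \wedge \tilde{T} \models Inv(X', \mathcal{H}', \mathcal{P}')$ (consecution)
$Inv(X, \mathcal{H}, \mathcal{P}) \models \tilde{P}$ (safety)
We must show that $\exists \mathcal{H}\, \forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P})$ is an inductive invariant of $S$ and entails $P$. We accomplish this by demonstrating that each of the three conditions must hold.
Initiation: $I \models \exists \mathcal{H}\, \forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P})$. This holds trivially because $I$ is unchanged, i.e., no auxiliary variables appear in the initial state constraint, so (initiation) holds for every valuation of the auxiliary variables.
Consecution: $\exists \mathcal{H}\, \forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P}) \wedge T \models \exists \mathcal{H}'\, \forall \mathcal{P}'.\ Inv(X', \mathcal{H}', \mathcal{P}')$. This is equivalent to the following formula (manipulated into negation normal form) being unsatisfiable:
$\exists \mathcal{H}\, \forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P}) \wedge T \wedge \forall \mathcal{H}'\, \exists \mathcal{P}'.\ \neg Inv(X', \mathcal{H}', \mathcal{P}')$ (5.1)
To complete this part of the proof, we introduce the function $\sigma$, which maps primed history variables to their next state update term from Delay. For example, suppose we called Delay(S, x, 2) for state variable $x$; then $\sigma({h_x^1}') = x$ and $\sigma({h_x^2}') = h_x^1$. Crucially, the terms in the range of $\sigma$ do not contain variables from $\mathcal{P}$, because prophecy variables are not targeted by Prophecize. With this notation, the consecution of $Inv(X, \mathcal{H}, \mathcal{P})$ for $\tilde{T}$ means that the following formula is unsatisfiable:
$Inv(X, \mathcal{H}, \mathcal{P}) \wedge T \wedge \bigwedge_{h \in \mathcal{H}} h' = \sigma(h') \wedge \bigwedge_{p \in \mathcal{P}} p' = p \wedge \neg Inv(X', \mathcal{H}', \mathcal{P}')$ (5.2)
We now show that the fact that (5.2) is unsatisfiable entails that (5.1) is unsatisfiable. First, observe that (5.2) is equisatisfiable with
$Inv(X, \mathcal{H}, \mathcal{P}) \wedge T \wedge \neg Inv(X', \sigma(\mathcal{H}'), \mathcal{P})$ (5.3)
Next, if (5.1) is satisfiable, then the following formula is satisfiable, where we drop the existential quantifiers over $\mathcal{H}$ and treat the resulting free variables as fresh uninterpreted constants: $\forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P}) \wedge T \wedge \forall \mathcal{H}'\, \exists \mathcal{P}'.\ \neg Inv(X', \mathcal{H}', \mathcal{P}')$. If this formula is satisfiable, then the following formula is also satisfiable, where we instantiate the universal quantifier over $\mathcal{H}'$ with $\sigma(\mathcal{H}')$: $\forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P}) \wedge T \wedge \exists \mathcal{P}'.\ \neg Inv(X', \sigma(\mathcal{H}'), \mathcal{P}')$. Finally, if this formula is satisfiable, then we can drop the existential quantifiers for $\mathcal{P}'$ and instantiate the universal quantifier for $\mathcal{P}$ with $\mathcal{P}'$, which gives that (5.3) is satisfiable (up to renaming $\mathcal{P}$ to $\mathcal{P}'$). Since (5.3) is unsatisfiable, (5.1) must be unsatisfiable as well.
Safety: $\exists \mathcal{H}\, \forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P}) \models P(X)$. This holds when $(\exists \mathcal{H}\, \forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P})) \wedge \neg P(X)$ is unsatisfiable. To show this, we construct quantifier instantiations such that the resulting formula must be unsatisfiable. We first instantiate all the variables in $\mathcal{H}$ with fresh constants by dropping the existential quantification. Next, we instantiate each of the universally quantified $p \in \mathcal{P}$ with their target term $t(p)$. Note that this target term might be a history variable from $\mathcal{H}$, which is now instantiated. We allow $t$ to be applied to sets in the straightforward way. Under this instantiation the antecedent of $\tilde{P}$ is true, so a model for the resulting formula, $Inv(X, \mathcal{H}, \mathcal{P})\{\mathcal{P} \mapsto t(\mathcal{P})\} \wedge \neg P(X)$, would be a counterexample for assumption (safety). Thus it must be unsatisfiable.
We have shown that initiation holds trivially, and that consecution and safety hold using quantifier instantiations. Thus, $\exists \mathcal{H}\, \forall \mathcal{P}.\ Inv(X, \mathcal{H}, \mathcal{P})$ must be an inductive invariant for $S$ that entails $P(X)$.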
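The update map σ used in the consecution argument can be pictured as a shift register. The Python sketch below simulates the history variables that a call such as Delay(S, x, 2) would add: across each step the first history variable takes the previous value of x and the second takes the previous value of the first. The function name `run_with_history`, the list-based encoding, and the placeholder initial value `init` for the (unconstrained) history variables are illustrative assumptions, not part of the paper's algorithms.

```python
def run_with_history(xs, k, init=None):
    """Simulate k history variables along a trace xs of the state variable x.

    Returns a list of states (x, [h1, ..., hk]) such that, across each step,
    h1' = x and h(j+1)' = hj, mirroring sigma(h1') = x and sigma(h2') = h1.
    """
    hist = [init] * k            # history variables start unconstrained
    states = []
    for x in xs:
        states.append((x, list(hist)))
        hist = ([x] + hist)[:k]  # shift: the current x enters h1, the oldest value drops out
    return states


trace = run_with_history([10, 11, 12, 13], k=2)
# At the last step, h2 holds the value x had two steps earlier.
x_now, (h1, h2) = trace[3]
```

Because each history variable is updated only from x or from another history variable, the terms in the range of σ never mention prophecy variables, which is the property the proof relies on.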
Although the invariants found using counterexample-guided prophecy correspond to ∃∀ invariants over the unmodified system, we must acknowledge that the existential power is very weak. The existential quantifier is only used to remove history variables. While history variables can certainly be employed for existential power in an invariant [PHM+21], these specific history variables are introduced solely to target a term for prophecy and only save a term for some fixed, finite number of steps. Thus, we do not expect to gain much existential power in finding invariants on practical problems. This use of history and prophecy variables can be thought of as quantifier instantiation at the model checking level, where the instantiation semantically uses a term appearing in an execution of the system. Consequently, our technique performs well on systems where only a small number of instantiations are needed, over terms that are not too distant in time from a potential property violation that must be disproved (i.e., not many history variables are required). This appears to be a common situation for invariant-finding benchmarks, as we show empirically in Section 7.
5.2. Limitations. If our CEGAR loop terminates, it either terminates with a proof or with a true counterexample. However, it is possible that the procedure may not terminate. In particular, while we can always refine the abstraction for a given bound k, there is no guarantee that this will eventually result in a refinement that rules out all spurious counterexamples (of any length).
This failure mode occurs, for instance, when no finite number of calls to Prophecize can capture all the relevant indices of the array. Consider an example system with $I := a = constarr(0)$, $T := a' = write(a, i_0, read(a, i_1) + 1)$, and $P := read(a, i_r) \geq 0$. The array a is initialized with 0 at every index, and at every step, a is updated at a single index by reading from an arbitrary index of a and adding 1 to the result. (An even simpler system that does not add 1 in the update would already be problematic; however, for that case, it is straightforward to extend our algorithm to have it learn that the array does not change.) Note that the index variables are unconstrained: they can range over the integers freely at each time step. The property is that reading from a at $i_r$ returns a non-negative value. This property holds because of a quantified invariant maintained by the system: $\forall i.\ read(a, i) \geq 0$.
However, the initial abstraction is a memoryless array, which can easily violate the property by returning negative values from reads. Since the array is updated in each step at an arbitrary index based on a read from another arbitrary index, no finite number of prophecy variables (of the form used in Prophecize) can capture all the relevant indices. The refinement will successively rule out longer finite spurious counterexamples, but the abstraction will never be refined enough to prove the property unboundedly. Note that this is related to our abstraction and our choice to limit prophecy to predicting values a fixed, finite number of steps before a potential property violation. Another form of prophecy variable could be used to prove this property. For example, a prophecy variable that predicts the first index value that stores a negative value in a could be used to show that this cannot happen.
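To make the example system concrete, the following Python sketch simulates it with randomly chosen indices and checks the property at every step, then checks the quantified invariant on the default value and on every index that was written. The dictionary-with-default representation of the array and the names `simulate`, `steps`, `seed`, and `index_range` are assumptions for illustration; random simulation only samples executions and proves nothing about the unbounded system.

```python
import random


def simulate(steps, seed=0, index_range=1000):
    """Run the example system for a number of steps with random indices."""
    rng = random.Random(seed)
    default, stored = 0, {}              # a = constarr(0)

    def read(i):
        return stored.get(i, default)

    for _ in range(steps):
        i0 = rng.randint(-index_range, index_range)
        i1 = rng.randint(-index_range, index_range)
        ir = rng.randint(-index_range, index_range)
        assert read(ir) >= 0             # property P in the current state
        stored[i0] = read(i1) + 1        # a' = write(a, i0, read(a, i1) + 1)
    return default, stored


default, stored = simulate(steps=500)
# forall i. read(a, i) >= 0 reduces to the default value plus the written indices.
invariant_holds = default >= 0 and all(v >= 0 for v in stored.values())
```

The number of distinct written indices grows with the length of the run, which is the intuition for why a fixed, finite set of predicted indices does not suffice here.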
We believe that this issue can be circumvented in an automated fashion with future work. In fact, an approach introduced since the conference version [MIG+21] of this paper uses prophecy variables with a different refinement loop for verifying parameterized protocols, which cannot be handled by our technique due to this limitation [CGR21].
A related, but less fundamental, issue is that the index set might not contain the best choice of targets for prophecy. While the index set is sufficient for ruling out bounded counterexamples, it is possible that there is a better target for universal prophecy that does not appear in the index set. However, based on the evaluation in Section 7, it appears that the index set does work well in practice.

Implementation Details
We will now describe our prototype of counterexample-guided prophecy along with some practical implementation details. Recall that Algorithm 1 can use any unbounded model checking technique for Prove. In our prototype, we choose to instantiate it with ic3ia [Gri20] (downloaded Apr 27, 2020), an open-source C++ implementation of IC3 via Implicit Predicate Abstraction (IC3IA) [CGMT16], which is itself a CEGAR loop that uses implicit predicate abstraction to perform IC3 [Bra11, BBW14] on infinite-state systems and uses interpolants to find new predicates. ic3ia uses MathSAT [CGSS13] (version 5.6.3) as the backend SMT solver and interpolant producer. We call our prototype prophic3 [MI].

Engineering Heuristics and Options.
Weak and Strong Abstraction. It is important to have enough prophecy variables to assist in constructing inductive invariants. We found that we could often obtain a larger, richer set of prophecy variables by weakening our array abstraction. We do this by replacing equality between arrays by an uninterpreted predicate, and also checking the congruence axiom, the converse of (ext). Since more axioms are checked, there are more opportunities to introduce auxiliary variables. We call this weak abstraction (WA) as opposed to strong abstraction (SA), which uses regular equality between abstract arrays and guarantees congruence through UF axioms. Our default configuration uses weak abstraction.
Lemma and Auxiliary Variable Filtering. Although the algorithm depends on introducing auxiliary variables, an excessive number of unnecessary auxiliary variables could overwhelm the Prove step. Thus, an improvement not shown in Algorithm 3 is to check consecutive axioms first and only add nonconsecutive ones when necessary. This is the motivation behind the custom array solver implementation CheckArrayAxioms based on [BMS06]. In principle, we could have used an SMT solver to find array axioms, but it would give no preference to consecutive axioms. Even when enumerating consecutive axioms first, we can still end up with more auxiliary variables than necessary. We use an unsat-core based procedure to prune nonconsecutive refinement axioms. In particular, we attempt to remove nonconsecutive axioms that target indices at times further from the end of the trace, because they would introduce more history variables. In practice, this can substantially reduce the number of added auxiliary variables.
Similarly, we could overwhelm the algorithm with unnecessary consecutive axioms. CheckArrayAxioms can still produce hundreds or even thousands of (consecutive) axiom instantiations. Once these are lifted to the transition system, some may be redundant. To mitigate this issue, when the BMC check returns ⊥ and we are about to return (line 4 of Algorithm 3), we keep only axioms that appear in the unsat core of the BMC formula [CGS11].
Abstract Values Refinement Loop. In our implementation, we also include a simple abstraction-refinement wrapper which abstracts large constant integers and refines them with the actual values if that fails. This is especially useful for verifying software benchmarks with large constant loop bounds. Otherwise, the system might need to be unrolled to a very large bound to reach an abstract counterexample. This was only necessary for a handful of benchmarks in the first benchmark set.
Assume Property in Pre-State. As long as we are only interested in the first violation of a property, we can assume that the property was not violated in the past. This observation is formalized in Theorem 5 of [McM18]. It is common to achieve this for invariant checking by assuming the property over current state variables in the transition relation, so that every transition starts in a state satisfying the property.
In the context of counterexample-guided prophecy, this strategy may prove useful because the property is weakened with each call to Algorithm 2 (the original property becomes the consequent of an implication). We can assume the original (stronger) property in all previous states, which can help the algorithm converge.
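The effect of assuming the property in the pre-state can be seen on a tiny explicit-state system. The Python sketch below computes reachable states with and without the assumption: with it, a state that violates the property has no outgoing transitions, yet a first violation is still reached. The toy counter system and the names `reachable`, `succ`, and `prop` are hypothetical and chosen only for illustration.

```python
def reachable(init, succ, assume=None):
    """Explicit-state reachability.

    If `assume` is given, only states satisfying it may take a transition,
    which corresponds to conjoining P(X) to the transition relation.
    """
    seen, frontier = set(init), list(init)
    while frontier:
        s = frontier.pop()
        if assume is not None and not assume(s):
            continue                 # a violating state has no successors
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


# Hypothetical toy system: a counter that increments until it reaches 6.
succ = lambda s: {s + 1} if s < 6 else set()
prop = lambda s: s != 4              # the property fails exactly at state 4

plain = reachable({0}, succ)
constrained = reachable({0}, succ, assume=prop)
violated_plain = any(not prop(s) for s in plain)
violated_constrained = any(not prop(s) for s in constrained)
```

Both searches find the violation at state 4, but the constrained search never explores states 5 and 6, which lie beyond the first violation.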
Consider the C program shown in Fig. 3, based on one of the benchmarks evaluated in Section 7. The example populates an array of arbitrary size N with an arbitrary, fixed value c. Fig. 4 shows one possible encoding of this program as an STS. This encoding is carefully chosen to illustrate a case where assuming the property in the pre-state is needed for counterexample-guided prophecy to converge. Other possible encodings could avoid this issue entirely. In this encoding, if we do not assume the original property in the pre-state, we observed that prophic3 would diverge and introduce an increasing number of prophecy variables. Consider a case where N is assigned a specific value, 5. Since the array abstraction starts memoryless, the algorithm needs to add 5 prophecy variables to refine the memory. However, since N is arbitrary, this results in an infinite chain of new prophecy variables as longer traces are considered. Furthermore, each time a prophecy variable is introduced, the underlying IC3IA algorithm is restarted with a weaker property. This means that the original property, ¬err, cannot be assumed in the pre-state. Note that the new property produced by Prophecize is an implication that will be trivially true in most of the path. The most important part of the transition relation in Fig. 4 to consider is the second-to-last line of T. Let that be the error rule. Since we do not assume ¬err, the error rule might be trivially satisfied. Intuitively, without this assumption, we need to justify the assertion for the value of index j at every time-step.
However, we know it is safe to assume the original property in the pre-state for counterexample-guided prophecy. If we do so, the algorithm converges. This is because that assumption, coupled with predicting a single j one step before a potential property violation, is sufficient to enforce the error rule, which ensures ¬err holds in the next state.
Important Variables. In many implementations of IC3IA, including ic3ia, new predicates are obtained by mining interpolants from an unsatisfiable spurious counterexample trace conjoined with the concrete unrolled transitions. Typically not all of these predicates are necessary, so they are often reduced using unsatisfiable cores. However, in the context of counterexample-guided prophecy, we might prefer certain predicates. In particular, predicates involving prophecy variables are good candidates, since we know the prophecy variable was necessary to rule out a spurious counterexample. Note that there are two levels of abstraction refinement in this context: the array abstraction and refinement for counterexample-guided prophecy, and the predicate abstraction in IC3IA. Here, we are focused on the latter. One heuristic we tried is always keeping predicates that use a prophecy variable.
Finite-domain Indices. In this paper, we have assumed that the index sort has an infinite domain. This fits our domain of problems that require quantified invariants. If the sort is finite, a universal quantifier could in principle be enumerated by instantiating it with every possible value, although it is certainly possible that a quantifier is much more efficient than this enumeration.
The restriction to infinite domain indices is also a technical limitation of the array solving technique of [BMS06], which our approach is based on. This restriction is shared by many SMT solvers, particularly when there are chains of equalities between writes on constant arrays with different bases, e.g., $a = constarr(0_8) \wedge b = constarr(1_8) \wedge write(a, i_2, e_8) = write(b, j_2, d_8)$, where $c_w$ is a bitvector variable or value $c$ with width $w$. An infinite domain allows us to assume that there is always an index value that has not been used in the array formula. This is crucial for the λ index, which has the primary goal of referring to indices initialized by the constant array axiom that were never overwritten. In a finite domain, we cannot make this assumption. See [BMS06, BM07] for more information on the λ index.
This limitation is only with regard to the array axiom enumeration. The other contributions of this paper, including using prophecy variables in place of universal quantification, are entirely applicable over finite domains. One low-effort approach for applying counterexample-guided prophecy over finite-domain indices is to give up completeness. By simply not including axioms over a λ index for finite-domain sorts, the array solving procedure might conclude the query is satisfiable when it is actually unsatisfiable. Thus, the overall algorithm could return spurious counterexamples, but would still soundly return proofs. A given spurious counterexample is finite and would be straightforward to analyze, either with a dedicated checker or an SMT solver without this limitation.

Experiments
7.1. Setup. We evaluate our tool against three state-of-the-art tools for inferring universally quantified invariants over linear arithmetic and arrays: freqhorn, quic3, and gspacer. All these tools are Constrained Horn Clause (CHC) solvers built on Z3 [dMB08]. The algorithm implemented in this version of freqhorn [Fedb] is a syntax-guided synthesis [ABD+15] approach for inferring universally quantified invariants over arrays [FPMG19]. quic3 is built on Spacer [KGC14], the default CHC engine in Z3, and extends IC3 over linear arithmetic and arrays to allow universally quantified frames (frames are candidates for inductive invariants maintained by the IC3 algorithm) [GSV18]. It also maintains a set of quantifier instantiations which are provided to the underlying SMT solver. quic3 was recently incorporated into Z3. We used Z3 version 4.8.9 with parameters suggested by the quic3 authors (fp.spacer.q3.use_qgen=true fp.spacer.ground_pobs=false fp.spacer.mbqi=false fp.spacer.use_euf_gen=true). Finally, gspacer is an extension of Spacer which adds three new inference rules for improving local generalizations with global guidance [KCSG20]. While this last technique does not specifically target universally quantified invariants, it can be used along with the quic3 options in Spacer and potentially executes a much different search. The gspacer submission [KG20] won the arrays category in CHC-COMP 2020 [R 20b]. We use the same configuration entered in the competition. We also include ic3ia and the default configuration of Spacer in our results, neither of which can produce universally quantified invariants. Our default configuration of prophic3 uses weak abstraction. We chose to build our prototype on ic3ia instead of Spacer, in part because we needed uninterpreted functions for our array abstraction, and Spacer does not handle them in a straightforward way, due to the semantics of CHC [BGMR15].
We compare these solvers on four benchmark sets: i) freqhorn: all benchmarks containing arrays in [Feda] from the freqhorn paper [FPMG19]; ii) quic3: benchmarks from the quic3 paper [GSV18] (these were C programs from SV-COMP [Bey17] that were modified to require universally quantified invariants); iii) vizel: additional benchmarks provided to us by the authors of [GSV18]; and iv) chc-comp-2020: the array category benchmarks of CHC-COMP 2020 [R 20a] (as explained below, these contain a translation of the quic3 benchmarks). Additionally, we sort the benchmarks into four categories: 1) Q: safe benchmarks solved by some tool supporting quantified invariants but by none of the solvers that do not; 2) QF: safe benchmarks solved by at least one of the tools that do not support quantified invariants; 3) US: unsafe benchmarks; and 4) UK: unknown (i.e., unsolved) benchmarks. Because not all of the benchmark sets were guaranteed to require quantifiers, this is an approximation of which benchmarks required quantified reasoning to prove safe.
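The categorization can be read as a small decision procedure. The Python sketch below assigns a benchmark to Q, QF, US, or UK from per-tool results. The result encoding ("safe", "unsafe", "unknown"), the grouping of tools into the two sets, the function name `categorize`, and the choice to test for unsafe results first are assumptions made for illustration, not a description of the scripts used in the evaluation.

```python
# Assumed grouping of the evaluated tools (an illustration, not taken from the artifact).
QUANTIFIED_TOOLS = {"prophic3", "freqhorn", "quic3", "gspacer"}
QUANTIFIER_FREE_TOOLS = {"ic3ia", "spacer"}


def categorize(results):
    """Map per-tool results to a category.

    results: dict tool name -> 'safe' | 'unsafe' | 'unknown'
    """
    if any(r == "unsafe" for r in results.values()):
        return "US"                   # unsafe benchmarks
    if any(results.get(t) == "safe" for t in QUANTIFIER_FREE_TOOLS):
        return "QF"                   # provable without quantified invariants
    if any(results.get(t) == "safe" for t in QUANTIFIED_TOOLS):
        return "Q"                    # only solved by quantifier-capable tools
    return "UK"                       # unsolved by every tool


example = categorize({"prophic3": "safe", "ic3ia": "unknown", "spacer": "unknown"})
```

A benchmark lands in Q only when every quantifier-free tool fails on it, which is why the text describes the categories as an approximation of where quantified reasoning is needed.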
Both prophic3 and ic3ia take a transition system and property specified in the Verification Modulo Theories (VMT) format [CRGI11], which is a transition system format built on SMT-LIB [BFT16]. All other solvers read the CHC format. We translated benchmark sets freqhorn and chc-comp-2020 from CHC to VMT using the horn2vmt program which is distributed with ic3ia. For benchmark sets quic3 and vizel, we started with the C programs and generated both VMT and CHC using Kratos2 (an updated version of Kratos [CGM+11]). We note that chc-comp-2020 includes another translation of the quic3 benchmarks to CHC by SeaHorn [GKKN15]. We ran all experiments on a 3.5GHz Intel Xeon E5-2637 v4 CPU with a timeout of 2 hours and a memory limit of 32GB.
7.2. Results. The results are shown in Fig. 6 as a table. Fig. 5 shows cactus plots demonstrating the number of solved benchmarks over time. We first observe that prophic3 solves the most benchmarks in the freqhorn, quic3, and vizel benchmark sets, both overall and in category Q. The quic3 (and most of the freqhorn) benchmarks require quantified invariants; thus, ic3ia and Spacer cannot solve any of them. On solved instances in the Q category, prophic3 introduced an average of 1.2 prophecy variables and a median of 1. This makes sense because, upon inspection, most benchmarks only require one quantifier and we are careful to only introduce prophecy variables when needed. On benchmarks it cannot solve, ic3ia either times out or fails to compute an interpolant. This is expected because quantifier-free interpolants are not guaranteed over the standard theory of arrays. Even without arrays, it is also possible for prophic3 to fail to compute an interpolant, because MathSAT's interpolation procedure is incomplete for combinations with non-convex theories such as integers. However, this was rarely observed in practice.
We further observe that prophic3 does not perform as well on unsafe benchmarks. This is expected, because our array solving procedure is enumeration-based and is likely to be slower than the array theory solvers within an SMT solver. However, we believe that a dedicated array solving procedure is important for the performance of the overall algorithm, especially on safe benchmarks. We tried minimal experiments with obtaining array lemmas directly from the SMT solver and did not achieve comparable performance. This is likely because our array solver is aware of the ultimate goal to run Prophecize with a small delay and can enumerate array axioms in a corresponding order, starting with index instantiations that would require the fewest history variables.
There was one discrepancy in our experiments. On chc-LIA-lin-arrays_381, gspacer disagrees with quic3, Spacer, and prophic3. This is the same discrepancy mentioned in the CHC-COMP 2020 report [R 20b]. prophic3 proved this benchmark safe without introducing any auxiliary variables and we used both CVC4 [BCD+11] and MathSAT to verify that the solution was indeed an inductive invariant for the concrete system. We are confident that this benchmark is safe and thus do not count it as a solved instance for gspacer.
Some of the tools are sensitive to the encoding. Since it is syntax-guided, freqhorn is sensitive to the encoding syntax. The freqhorn benchmarks were hand-written in CHC to be syntactically simple; this simplicity is maintained by horn2vmt and also benefits prophic3. However, prophic3 can be sensitive to other encodings. For example, the quic3 benchmarks translated by SeaHorn and included in chc-comp-2020 are much harder for prophic3 to solve (after translation by horn2vmt) compared to the direct C to VMT translation using Kratos2. We found that prophic3 solves 6 benchmarks when translated by horn2vmt ∘ SeaHorn, versus 41 when translated directly by Kratos2. We stress that the CHC solvers performed similarly on both encodings: our experiments showed that quic3 and freqhorn solved exactly the same number in both translations, and gspacer solved 27 when translated with Kratos2 and 34 when translated with SeaHorn. Importantly, prophic3 on the Kratos2 encoding solved more benchmarks than any other tool and encoding pair.
There are two main reasons why prophic3 fails on the SeaHorn encodings. First, due to the LLVM-based encoding, some of the SeaHorn translations have index sets which are insufficient for finding the right prophecy variable. This has to do with the memory encoding and the way that fresh variables and guards are used. SeaHorn also splits memories into ranges, which is problematic for our technique. Second, the SeaHorn translation is optimized for CHC, not for transition systems. For example, it introduces many new variables, and the argument order between different predicates may not match. In the transition system, this essentially has the effect of interchanging the values of variables between each loop.
SeaHorn has options that address some of these issues, and these helped prophic3 solve more benchmarks, but none of these options produce encodings that work as well as the Kratos2 encodings. The difference between good CHC and transition system encodings could also explain the overall difference in performance on chc-comp-2020 benchmarks, most of which were translated by SeaHorn. Both of these issues are practical, not fundamental, and we believe they can be resolved with additional engineering effort.
7.3. Self Comparison. Next, we run a self-comparison using different options in prophic3. We accomplish this by starting with the configuration used above, and dropping a single feature to obtain a new configuration. This serves as a metric of how important each heuristic is to the overall performance of prophic3. Each configuration has a unique string identifying it as follows:
(1) sa: with strong abstraction;
(2) nav: no outer CEGAR loop that abstracts large values;
(3) na: no assuming property in pre-state;
(4) npr: no attempting to reduce the number of prophecy variables introduced;
(5) ntp: no tracking prophecy variables as important variables to guide IC3IA to useful predicates;
(6) nur: no unsat-core based reduction when enumerating timed axioms;
(7) nhp: no seeding IC3IA with predicates obtained from equalities between history variables and targets over current state variables, e.g., if ${h_t^1}' = t$ is in the transition relation, we would add $h_t^1 = t$ as a predicate;
(8) nar: no additional reduction of consecutive axioms (differs from nur in that the consecutive axioms are lifted first);
(9) noheur: a combination of 2-8.
Fig. 7 shows the results in a table and Fig. 8 plots the number of solved benchmarks over time. We observe that prophic3 sa solves fewer benchmarks in the freqhorn, quic3, and vizel sets. However, it is faster on commonly solved instances. This makes sense because it needs to check fewer axioms (it uses built-in equality and thus does not check equality axioms). We suspect that it solves fewer benchmarks in the first three sets because it was unable to find the right prophecy variable. For example, for the standard_find_true-unreach-call_ground benchmark in the quic3 set, a prophecy variable is needed to find a quantifier-free invariant. However, because of the stronger reasoning power of SA, the system can be sufficiently refined without introducing auxiliary variables. ic3ia is then unable to prove the property on the resulting system without the prophecy variable, instead timing out. Interestingly, notice that prophic3 sa solves the most benchmarks in the QF category overall, suggesting that there are practical performance benefits of the CEGAR approach even when quantified reasoning is not needed.
Configuration nav disables the additional CEGAR loop for abstracting large values. The results show that this primarily affects the freqhorn benchmarks. This is expected because those contained several examples with large, constant loop bounds. This means a quantifier was not strictly necessary, but was needed in practice. Without abstracting the loop bound, Algorithm 3 would take far too long to reach spurious counterexamples due to the large unrolling bound before an error state is reached.
Based on these results, none of the other heuristics alone makes a big difference for these benchmarks. However, the noheur experiment demonstrates that dropping all of them simultaneously does negatively impact performance. It is slower overall in the cactus plots of Fig. 8, and solves markedly fewer benchmarks in the vizel and chc-comp-2020 benchmark sets. The core algorithm performs well alone, but the heuristics interact to further improve performance.

Related Work
We refer often to McMillan's work in [McM18]. In that paper, McMillan reduces infinite-state model checking problems to finite-state problems that can be checked with a SAT-based model checking algorithm by eagerly instantiating axioms. Not all possible axioms are instantiated, which is why this is an eager abstraction. This process requires introducing auxiliary variables. We use several of the same theorems, but for a different goal. Rather than reducing infinite-state to finite-state systems, we are interested in reducing problems with quantified inductive invariants to ones with quantifier-free ones. Furthermore, while the approach of [McM18] is a very general framework that is primarily applied manually, we focused on infinite arrays and provided a fully automated algorithm.
There are two important related approaches for abstracting arrays in Horn clauses [MG16] and memories in hardware [Bje08]. Both make a similar observation that arrays can be abstracted by modifying the property to maintain values at only a finite set of symbolic indices. We differ from the former by using a refinement loop that automatically adjusts the precision and targets relevant indices. The latter is also a refinement loop that adjusts precision, but differs in the domain and the refinement approach, which uses a multiplexor tree. Although neither paper uses the term prophecy variable, their refinement approaches can be viewed as prophecy-variable based. We differ from both approaches in our use of array axioms to automatically find and add auxiliary variables.
A similar lazy array axiom instantiation technique is proposed in [BCS20]. However, their technique utilizes interpolants for finding violated axioms and cannot infer universally quantified invariants. The work of [CGI+18] also uses lazy axiom-based refinement, abstracting non-linear arithmetic with uninterpreted functions. We differ in the domain and the use of auxiliary variables. In [PHM+21], prophecy variables defined by temporal logic formulas are used for liveness and temporal proofs, with the primary goal of increasing the power of a temporal proof system. In contrast, we use prophecy variables here for a different purpose, and we also find them automatically. The work of [CN09] includes an approach for synthesizing auxiliary variables for modular verification of concurrent programs. Our approach differs significantly in the domain and details.
There is a substantial body of work on automated quantified invariant generation for arrays using first-order theorem provers [KV13, CKR16, KV09, McM08]. These include extensions to saturation-based theorem proving to analyze specific kinds of predicates, and an extension to paramodulation-based theorem proving to produce universally quantified interpolants. In [LTZZ16], the authors propose an abstract interpretation approach to synthesize universally quantified array invariants. Our method also uses abstraction, but in a CEGAR framework.
Two other notable approaches capable of proving properties over arrays that require invariants with alternating quantifiers are [GGK20, PS20]. The former proposes trace logic for extending first-order theorem provers to software verification, and the latter takes a counterexample-guided inductive synthesis approach. Our approach takes a model checking perspective and differs significantly in the details. While these approaches are more general, we compared against state-of-the-art tools that focus specifically on universally quantified invariants.
MCMT [GR10, GKLT12, CGK+17] and its derivatives [ABG+12, AGS14] are backward-reachability algorithms for proving properties over "array-based systems," which are typically used to model parameterized protocols. These approaches target syntactically restricted functional transition systems with universally quantified properties, whereas our approach targets general transition systems. Two other approaches for solving parameterized systems modeled with arrays are [GSM16] and [MGJ+19]. The former iteratively fixes the number of expected universal quantifiers, then eagerly instantiates them and encodes the invariant search as nonlinear CHCs. The latter first uses a finite-state model checker to discover an inductive invariant for a specific parameterization and then applies a heuristic generalization process. We differ from all these techniques in the domain and the use of auxiliary variables. Due to the limitations explained in Section 5, we do not expect our approach to work well for parameterized protocol verification without improvements.
In [LB04], heuristics are proposed for finding predicates with free indices that can be universally quantified in a predicate abstraction-based inductive invariant search. Our approach is counterexample-guided and does not utilize predicate abstraction directly (although IC3IA does). The authors of [KKRS17] propose a technique for Java programs that associates heap memory with the program location where it was allocated and generates CHC verification conditions. This enables the discovery of invariants over all heap memory allocated at that location, which implicitly provides quantified invariants. This is similar to our approach in that it gives quantification power without explicitly using quantifiers and in that their encoding removes arrays. However, we differ in that we focus on transition systems and utilize a different paradigm to obtain this implicit quantification.
Prophecy variables have also been proposed for Hoare-style reasoning about concurrent programs. In [ZFF+12], the authors formalize "structural" prophecy variables for Hoare logic, which can only predict state within their own thread. The authors of [JLP+20] generalize this approach for separation logic to allow predicting values between different threads. Our work differs in the domain and level of automation.

Conclusion
We presented a novel approach for model checking transition systems containing arrays. We observed that history and prophecy variables can be extremely useful for reducing quantified invariants to quantifier-free invariants. We demonstrated that an initially weak abstraction in our CEGAR loop can help us to automatically introduce relevant auxiliary variables. Finally, we evaluated our approach on four sets of interesting array-manipulating benchmarks. In future work, we hope to improve performance, explore a tighter integration with the underlying model checker, address the limitations described in Section 5, and investigate applications of counterexample-guided prophecy to other theories.

Figure 2. Result of calling Abstract on the example from Fig. 1(a)

Figure 5. Number of solved benchmarks over time (sorted).

Figure 8. Number of solved benchmarks in self-comparison over time (sorted).

Figure 6. Experimental results. They are reported as #Q / #QF / #US.