Property Directed Self Composition

We address the problem of verifying k-safety properties: properties that refer to k interacting executions of a program. A prominent way to verify k-safety properties is by self composition. In this approach, the problem of checking k-safety over the original program is reduced to checking an "ordinary" safety property over a program that executes k copies of the original program in some order. The way in which the copies are composed determines how complicated it is to verify the composed program. We view this composition as provided by a semantic self composition function that maps each state of the composed program to the copies that make a move. Since the "quality" of a self composition function is measured by the ability to verify the safety of the composed program, we formulate the problem of inferring a self composition function together with the inductive invariant needed to verify safety of the composed program, where both are restricted to a given language. We develop a property-directed inference algorithm that, given a set of predicates, infers composition-invariant pairs expressed by Boolean combinations of the given predicates, or determines that no such pair exists. We implemented our algorithm and demonstrate that it is able to find self compositions that are beyond the reach of existing tools.


Introduction
Many relational properties, such as noninterference [13], determinism [22], service level agreements [9], and more, can be reduced to the problem of k-safety, namely, reasoning about k different traces of a program simultaneously. A common approach to verifying k-safety properties is by means of self composition, where the program is composed with k copies of itself [4,31]. A state of the composed program consists of the states of each copy, and a trace naturally corresponds to k traces of the original program. Therefore, k-safety properties of the original program become ordinary safety properties of the composition, which reduces k-safety verification to ordinary safety verification. This enables reasoning about k-safety properties using any of the existing techniques for safety verification such as Hoare logic [21] or model checking [7].
While self composition is sound and complete for k-safety, its applicability is questionable for two main reasons: (i) considering several copies of the program greatly increases the state space; and (ii) the way in which the different copies are composed when reducing the problem to safety verification affects the complexity of the resulting self composed program, and as such affects the complexity of verifying it. Improving the applicability of self composition has been the topic of many works [2,29,15,19,26,32].
However, most efforts are focused on compositions that are pre-defined, or only depend on syntactic similarities.
In this paper, we take a different approach; we build upon the observation that by choosing the "right" composition, the verification can be greatly simplified by leveraging "simple" correlations between the executions. To that end, we propose an algorithm, called PDSC, for inferring a property directed self composition. Our approach uses a dynamic composition, where the composition of the different copies can change during verification, directed at simplifying the verification of the composed program.
Compositions considered in previous work differ in the order in which the copies of the program execute: either synchronously, asynchronously, or in some mix of the two [33,3,15]. To allow general compositions, we define a composition function that maps every state of the composed program to the set of copies that are scheduled in the next step. This determines the order of execution for the different copies, and thus induces the self composed program. Unlike most previous works where the composition is pre-defined based on syntactic rules only, our composition is semantic as it is defined over the state of the composed program.
To capture the difficulty of verifying the composed program, we consider verification by means of inferring an inductive invariant, parameterized by a language for expressing the inductive invariant. Intuitively, the more expressive the language needs to be, the more difficult the verification task is. We then define the problem of inferring a composition function together with an inductive invariant for verifying the safety of the composed program, where both are restricted to a given language. Note that for a fixed language L, an inductive invariant may exist for some composition function but not for another. Thus, the restriction to L defines a target for the inference algorithm, which is now directed at finding a composition that admits an inductive invariant in L.
Example 1. To demonstrate our approach, consider the program in Figure 1. The program inserts a new value into an array. We assume that the array A and its length len are "low"-security variables, while the inserted value h is "high"-security. The first loop finds the location in which h will be inserted. Note that the number of iterations depends on the value of h. Due to that, the second loop executes to ensure that the output i (which corresponds to the number of iterations) does not leak sensitive data. As an example, we emphasize that without the second loop, i could leak the location of h in A. To express the property that i does not leak sensitive data, we use the 2-safety property that in any two executions, if the inputs A and len are the same, so is the output i.
To verify the 2-safety property, consider two copies of the program. Let the language L for verifying the self composition be defined by the predicates depicted in Figure 1. The most natural self composition to consider is a lock-step composition, where the copies execute synchronously. However, for such a composition the composed program may reach a state where, for example, $i_1 = i_2 + 1$. This occurs when the first copy exits the first loop while the second copy is still executing it. Since the language cannot express this correlation between the two copies, no inductive invariant in L suffices to verify that $i_1 = i_2$ when the program terminates.
[Figure 1 (fragment): the inferred composition function.] composition: if (pc1 < 3 && (pc2 > 0 || !cond1) && (pc2 == 3 || (pc2 == 0 && cond2))) step(1); else if (pc2 < 3 && (pc1 > 0 || !cond2) && (pc1 == 3 || (pc1 == 0 && cond1))) step(2); else step(1,2);
In contrast, when verifying the 2-safety property, PDSC directs its search towards a composition function for which an inductive invariant in L does exist. As such, it infers the composition function depicted in Figure 1, as well as an inductive invariant in L. The invariant for this composition implies that $i_1 = i_2$ at every state.
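The composition function quoted from Figure 1 can be read as a scheduler over the joint state of the two copies. The following is a minimal Python transliteration of that condition, offered as a sketch only: the program of Figure 1 is not reproduced here, so `pc1`, `pc2` (presumably program counters) and `cond1`, `cond2` (presumably loop guards of the respective copies) are treated as opaque inputs, and the function name `schedule` is hypothetical.

```python
def schedule(pc1: int, pc2: int, cond1: bool, cond2: bool) -> frozenset:
    """Return the set of copies (1 and/or 2) that take the next step.

    Direct transliteration of the condition quoted from Figure 1; the
    meaning of pc/cond is an assumption, they are plain inputs here.
    """
    if pc1 < 3 and (pc2 > 0 or not cond1) and (pc2 == 3 or (pc2 == 0 and cond2)):
        return frozenset({1})  # only the first copy moves
    if pc2 < 3 and (pc1 > 0 or not cond2) and (pc1 == 3 or (pc1 == 0 and cond1)):
        return frozenset({2})  # only the second copy moves
    return frozenset({1, 2})   # both copies move in lock-step
```

Note that the decision depends on the values of `cond1` and `cond2`, not only on the control locations, which is what makes the composition semantic rather than syntactic.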
As demonstrated by the example, PDSC focuses on logical languages based on predicate abstraction [18], where inductive invariants can be inferred by model checking. In order to infer a composition function that admits an inductive invariant in L, PDSC starts from a default composition function, and modifies its definition based on the reasoning performed by the model checker during verification. As the composition function is part of the verified model (recall that it is defined over the program state), different compositions are part of the state space explored by the model checker. As a result, a key ingredient of PDSC is identifying "bad" compositions that prevent it from finding an inductive invariant in L. It is important to note that a naive algorithm that tries all possible composition functions has time complexity $O(2^{2^{|P|}})$, where P is the set of predicates considered. However, integrating the search for a composition function into the model checking algorithm allows us to reduce the time complexity of the algorithm to $O(2^{|P|})$. We also show that the problem is in fact PSPACE-hard.
We implemented PDSC using SEAHORN [20], Z3 [12] and SPACER [23] and evaluated it on examples that demonstrate the need for nontrivial semantic compositions. Our results clearly show that PDSC can solve complex examples by inferring the required composition, while other tools cannot verify these examples. We emphasize that for these particular examples, lock-step composition is not sufficient. We also evaluated PDSC on the examples from [29,26] that are proven with the trivial lock-step composition. On these examples, PDSC is comparable to state of the art tools.
Related work. This paper addresses the problem of verifying k-safety properties (also called hyperproperties [8]) by means of self composition. Other approaches tackle the problem without self composition, and often focus on more specific properties, most notably the 2-safety noninterference property (e.g. [1,32]). Below we focus on works that use self composition.
Previous work such as [4,2,3,16,31,15] considered self composition (also called product programs) where the composition function is constant and set a-priori, using syntax-based hints. While useful in general, such self compositions may sometimes result in programs that are too complex to verify. This is in contrast to our approach, where the composition function is evolving during verification, and is adapted to the capabilities of the model checker.
The work most closely related to ours is [29] which introduces Cartesian Hoare Logic (CHL) for verification of k-safety properties, and designs a verification framework for this logic. This work is further improved in [26]. These works search for a proof in CHL, and in doing so, implicitly modify the composition. Our work infers the composition explicitly and can use off-the-shelf model checking tools. More importantly, when loops are involved both [29] and [26] use lock-step composition and align loops syntactically. Our algorithm, in contrast, does not rely on syntactic similarities, and can handle loops that cannot be aligned trivially.
There have been several results in the context of harnessing Constrained Horn Clauses (CHC) solvers for verification of relational properties [11,25]. Given several copies of a CHC system, a product CHC system that synchronizes the different copies is created by a syntactic analysis of the rules in the CHC system. These works restrict the synchronization points to CHC predicates (i.e., program locations), and consider only one synchronization (obtained via transformations of the system of CHCs). In contrast, our algorithm iteratively searches for a good synchronization (composition), and considers synchronizations that depend on the program state.
Equivalence checking and regression verification. Equivalence checking is another closely related research field, where a composition of several programs is considered. As an example, equivalence checking is applied to verify the correctness of compiler optimizations [33,28,10,19]. In [28] the composition is determined by a brute-force search for possible synchronization points. While this brute-force search resembles our approach for finding the correct composition, it is not guided by the verification process. The works in [10,19] identify possible synchronization points syntactically, and try to match them during the construction of a simulation relation between programs.
Regression verification also requires the ability to show equivalence between different versions of a program [16,17,30]. The problem of synchronizing unbalanced loops appears in [30] in the form of unbalanced recursive function calls. To allow synchronization in such cases, the user can specify different unrolling parameters for the different copies. In contrast, our approach relies only on user supplied predicates that are needed to establish correctness, while synchronization is handled automatically.

Preliminaries
In this paper we reason about programs by means of the transition systems defining their semantics. A transition system is a tuple $T = (S, R, F)$, where $S$ is a set of states, $R \subseteq S \times S$ is a transition relation that specifies the steps in an execution of the program, and $F \subseteq S$ is a set of terminal states such that every terminal state $s \in F$ has an outgoing transition to itself and no additional transitions (terminal states allow us to reason about pre/post specifications of programs). An execution or trace $\pi = s_0, s_1, \ldots$ is a (finite or infinite) sequence of states such that for every $i \geq 0$, $(s_i, s_{i+1}) \in R$. The execution is terminating if there exists $0 \leq i \leq |\pi|$ such that $s_i \in F$. In this case, the suffix of the execution is of the form $s_i, s_i, \ldots$ and we say that $\pi$ ends at $s_i$.
As usual, we represent transition systems using logical formulas over a set of variables, corresponding to the program variables. We denote the set of variables by $V$. The set of terminal states is represented by a formula over $V$ and the transition relation is represented by a formula over $V \uplus V'$, where $V$ represents the pre-state of a transition and $V' = \{v' \mid v \in V\}$ represents its post-state. In the sequel, we use sets of states and their symbolic representation via formulas interchangeably.
Safety and inductive invariants. We consider safety properties defined via pre/post conditions. A safety property is a pair $(pre, post)$ where $pre$, $post$ are formulas over $V$, representing subsets of $S$, denoting the pre- and post-condition, respectively. $T$ satisfies $(pre, post)$, denoted $T \models (pre, post)$, if every terminating execution $\pi$ of $T$ that starts in a state $s_0$ such that $s_0 \models pre$ ends in a state $s$ such that $s \models post$. In other words, for every state $s$ that is reachable in $T$ from a state in $pre$ we have that $s \models F \to post$.
A prominent way to verify safety properties is by finding an inductive invariant. An inductive invariant for a transition system $T$ and a safety property $(pre, post)$ is a formula $Inv$ such that (1) $pre \Rightarrow Inv$ (initiation), (2) $Inv \wedge R \Rightarrow Inv'$ (consecution), and (3) $Inv \Rightarrow (F \to post)$ (safety), where $\varphi \Rightarrow \psi$ denotes the validity of $\varphi \to \psi$, and $\varphi'$ denotes $\varphi(V')$, i.e., the formula obtained after substituting every $v \in V$ by the corresponding $v' \in V'$. If there exists such an inductive invariant, then $T \models (pre, post)$.
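For a finite, explicitly enumerated transition system, the three conditions can be checked by brute force. The sketch below is only illustrative: the function name, the explicit-state encoding (sets of states and transition pairs, predicates as Python callables) and the toy system are assumptions made for the example, not part of the paper.

```python
def is_inductive_invariant(states, R, F, pre, post, inv):
    """Brute-force check of initiation, consecution and safety."""
    initiation = all(inv(s) for s in states if pre(s))           # pre => Inv
    consecution = all(inv(t) for (s, t) in R if inv(s))          # Inv /\ R => Inv'
    safety = all(post(s) for s in states if inv(s) and s in F)   # Inv => (F -> post)
    return initiation and consecution and safety

# Hypothetical toy system: even states move towards 4, odd states towards 3.
states = range(5)
R = {(0, 2), (2, 4), (4, 4), (1, 3), (3, 3)}
F = {3, 4}                       # terminal states, each with only a self-loop
pre = lambda s: s == 0
post = lambda s: s % 2 == 0

is_even = lambda s: s % 2 == 0
print(is_inductive_invariant(states, R, F, pre, post, is_even))         # True
print(is_inductive_invariant(states, R, F, pre, post, lambda s: True))  # False
```

The trivial candidate `true` fails only the safety condition: it contains the terminal state 3, which violates the post-condition even though it is unreachable from the pre-condition.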
k-safety. A k-safety property refers to k interacting executions of $T$. Similarly to an ordinary property, it is defined by $(pre, post)$, except that $pre$ and $post$ are defined over $V^1 \uplus \cdots \uplus V^k$, where $V^i = \{v^i \mid v \in V\}$ denotes the ith copy of the program variables. As such, $pre$ and $post$ represent sets of k-tuples of program states (k-states for short): for a k-tuple $(s_1, \ldots, s_k)$ of states and a formula $\varphi$ over $V^1 \uplus \cdots \uplus V^k$, we say that $(s_1, \ldots, s_k) \models \varphi$ if $\varphi$ is satisfied when for each $i$, the assignment of $V^i$ is determined by $s_i$. We say that $T$ satisfies $(pre, post)$, denoted $T \models^k (pre, post)$, if for every k terminating executions $\pi_1, \ldots, \pi_k$ of $T$ that start in states $r_1, \ldots, r_k$, respectively, such that $(r_1, \ldots, r_k) \models pre$, it holds that they end in states $t_1, \ldots, t_k$, respectively, such that $(t_1, \ldots, t_k) \models post$.
For example, the noninterference property may be specified by the following 2-safety property: $pre = \bigwedge_{v \in LowIn} v^1 = v^2$ and $post = \bigwedge_{v \in LowOut} v^1 = v^2$, where LowIn and LowOut denote subsets of the program inputs, resp. outputs, that are considered "low security" and the rest are classified as "high security". This property asserts that every two terminating executions that start in states that agree on the "low security" inputs end in states that agree on the "low security" outputs, i.e., the outcome does not depend on any "high security" input and, hence, does not leak secure information.
Checking k-safety properties reduces to checking ordinary safety properties by creating a self composed program that consists of k copies of the transition system, each with its own copy of the variables, that run in parallel in some way. Thus, the self composed program is defined over variables $V^{\|k} = V^1 \uplus \cdots \uplus V^k$, where $V^i = \{v^i \mid v \in V\}$ denotes the variables associated with the ith copy. For example, a common composition is a lock-step composition in which the copies execute simultaneously. The resulting composed transition system $T^{\|k} = (S^{\|k}, R^{\|k}, F^{\|k})$ over $V^{\|k}$ is defined by $S^{\|k} = S \times \cdots \times S$, $F^{\|k} = \bigwedge_{i=1}^{k} F(V^i)$ and $R^{\|k} = \bigwedge_{i=1}^{k} R(V^i, V^{i\prime})$. Then, the k-safety property $(pre, post)$ is satisfied by $T$ if and only if the ordinary safety property $(pre, post)$ is satisfied by $T^{\|k}$. More general notions of self composition are investigated in Section 3.

Inferring Self Compositions for Restricted Languages of Inductive Invariants
Any self-composition is sufficient for reducing k-safety to safety, e.g., lock-step, sequential, synchronous, asynchronous, etc. However, the choice of the self-composition used determines the difficulty of the resulting safety problem. Different self composed programs would require different inductive invariants, some of which cannot be expressed in a given logical language.
In this section, we formulate the problem of inferring a self composition function such that the obtained self composed program may be verified with a given language of inductive invariants. We are, therefore, interested in inferring both the self composition function and the inductive invariant for verifying the resulting self composed program. We start by formulating the kind of self compositions that we consider.
In the sequel, we fix a transition system T = (S, R, F ) with a set of variables V.

Semantic Self Composition
Roughly speaking, a k self composition of $T$ consists of k copies of $T$ that execute together in some order, where steps may interleave or be performed simultaneously. The order is determined by a self composition function, which may also be viewed as a scheduler that is responsible for scheduling a subset of the copies in each step. We consider semantic compositions in which the order may depend on the states of the different copies, as well as the correlations between them (as opposed to syntactic compositions that only depend on the control locations of the copies, but may not depend on the values of other variables):
Definition 1 (Semantic Self Composition Function). A semantic k self composition function (k-composition function for short) is a function $f : S^k \to \mathcal{P}(\{1..k\})$, mapping each k-state to a nonempty set of copies that are to participate in the next step of the self composed program.
We represent a k-composition function $f$ by a set of logical conditions, with a condition $C_M$ for every nonempty subset $M \subseteq \{1..k\}$ of the copies. For each such $M \subseteq \{1..k\}$, the condition $C_M$ is defined over $V^{\|k} = V^1 \uplus \cdots \uplus V^k$, and hence it represents a set of k-states, with the meaning that all the k-states that satisfy $C_M$ are mapped to $M$ by $f$: $f(s_1, \ldots, s_k) = M$ if and only if $(s_1, \ldots, s_k) \models C_M$. To ensure that the function is well defined, we require that $(\bigvee_M C_M) \equiv true$, which ensures that every k-state satisfies at least one of the conditions. We also require that for every $M_1 \neq M_2$, $C_{M_1} \wedge C_{M_2} \equiv false$, hence every k-state satisfies at most one condition. Together these requirements ensure that the conditions induce a partition of the set of all k-states. In the sequel, we identify a k-composition function $f$ with its symbolic representation via conditions $\{C_M\}_M$ and use them interchangeably.
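Definition 2 (Composed Program). Given a k-composition function $f$, represented via conditions $C_M$ for every nonempty set $M \subseteq \{1..k\}$, the k self composition of $T$ induced by $f$ is the transition system $T^f = (S^{\|k}, R^f, F^{\|k})$ over $V^{\|k}$, where $S^{\|k} = S \times \cdots \times S$, $F^{\|k} = \bigwedge_{i=1}^{k} F(V^i)$, and $R^f = \bigvee_{\emptyset \neq M \subseteq \{1..k\}} \big( C_M \wedge \bigwedge_{i \in M} R(V^i, V^{i\prime}) \wedge \bigwedge_{i \notin M} V^i = V^{i\prime} \big)$.
That is, every transition of $T^f$ corresponds to a simultaneous transition of a subset $M$ of the k copies of $T$, where the subset is determined by the self composition function $f$, and the copies outside $M$ keep their state. If $f(s_1, \ldots, s_k) = M$, then for every $i \in M$ we say that $i$ is scheduled in $(s_1, \ldots, s_k)$.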
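To make the induced transition relation concrete, the following sketch computes the successors of a k-state in the composed system for an explicit-state encoding. It assumes that copies not selected by the composition function keep their current state; `composed_successors`, `succ` and the toy counter are hypothetical names introduced for illustration.

```python
from itertools import product

def composed_successors(kstate, f, succ):
    """Successors of a k-state in the composed system induced by f.

    f(kstate) returns the nonempty set M of scheduled copies (1-based);
    succ(s) returns the list of successors of state s in the original system.
    Copies outside M are assumed to stay where they are.
    """
    M = f(kstate)
    options = [succ(s) if i in M else [s]
               for i, s in enumerate(kstate, start=1)]
    return [tuple(choice) for choice in product(*options)]

# Hypothetical toy system: a counter that stops at the terminal state 2.
succ = lambda s: [min(s + 1, 2)]
lock_step = lambda kstate: {1, 2}   # both copies always move
first_only = lambda kstate: {1}     # only copy 1 ever moves

print(composed_successors((0, 1), lock_step, succ))   # [(1, 2)]
print(composed_successors((0, 1), first_only, succ))  # [(1, 1)]
```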

Example 2.
A k self composition that runs the k copies of $T$ sequentially, one after the other, corresponds to a k-composition function $f$ defined by $f(s_1, \ldots, s_k) = \{i\}$ where $i \in \{1..k\}$ is the minimal index of a non-terminal state in $\{s_1, \ldots, s_k\}$. If all states in $\{s_1, \ldots, s_k\}$ are terminal then $i = k$ (or any other index). This is encoded as follows: $C_{\{i\}} = \neg F^i \wedge \bigwedge_{j < i} F^j$ for every $1 \leq i < k$, $C_{\{k\}} = \bigwedge_{j < k} F^j$, and $C_M = false$ for every other $M$, where $F^j = F(V^j)$.
In order to ensure soundness of a reduction of k-safety to safety via self composition, one has to require that the self composition function does not "starve" any copy of the transition system that is about to terminate if it continues to execute. We refer to this requirement as fairness.

Definition 3 (Fairness).
A k self composition function $f$ is fair if for every k terminating executions $\pi_1, \ldots, \pi_k$ of $T$ there exists an execution $\pi$ of $T^f$ such that for every copy $i \in \{1..k\}$, the projection of $\pi$ to $i$ is $\pi_i$.
Note that by the definition of the terminal states of $T^f$, $\pi$ as above is guaranteed to be terminating. We say that the ith copy terminates in $\pi$ if $\pi$ contains a k-state $(s_1, \ldots, s_k)$ such that $s_i \in F$. Fairness may be enforced in a straightforward way by requiring that whenever $f(s_1, \ldots, s_k) = M$, the set $M$ includes no index $i$ for which $s_i \in F$, unless all copies have terminated. Since we assume that terminal states may only transition to themselves, a weaker requirement that suffices to ensure fairness is that $M$ includes at least one index $i$ for which $s_i \notin F$, unless there is no such index.
The following claim is now straightforward:
Lemma 1. Let $T$ be a transition system, $(pre, post)$ a k-safety property, and $f$ a fair k-composition function for $T$ and $(pre, post)$. Then $T \models^k (pre, post)$ if and only if $T^f \models (pre, post)$.
Proof (sketch). Every terminating execution of $T^f$ corresponds to k terminating executions of $T$. Fairness of $f$ ensures that the converse also holds.
To demonstrate the necessity of the fairness requirement, consider a (non-fair) self composition function $f$ that maps every state to $\{1\}$. Then, regardless of what the actual transition system $T$ does, the resulting self composition $T^f$ satisfies every pre-post specification vacuously, as it never reaches a terminal state.
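The sequential composition of Example 2 and the starving composition just described can be compared against the weaker state-based fairness requirement (schedule at least one non-terminated copy whenever one exists). The sketch below checks that requirement by enumeration over a finite state set; all names and the toy state space are assumptions made for illustration.

```python
from itertools import product

def sequential(kstate, is_terminal):
    """Schedule the minimal non-terminated copy; copy k if all have terminated."""
    for i, s in enumerate(kstate, start=1):
        if not is_terminal(s):
            return frozenset({i})
    return frozenset({len(kstate)})

def always_first(kstate, is_terminal):
    """The non-fair composition that maps every k-state to {1}."""
    return frozenset({1})

def satisfies_weak_fairness(f, states, k, is_terminal):
    """True iff f schedules some non-terminated copy whenever one exists."""
    for kstate in product(states, repeat=k):
        active = {i for i, s in enumerate(kstate, start=1) if not is_terminal(s)}
        if active and not (f(kstate, is_terminal) & active):
            return False
    return True

# Hypothetical toy state space: states 0, 1, 2 with 2 terminal.
states = [0, 1, 2]
is_terminal = lambda s: s == 2

print(satisfies_weak_fairness(sequential, states, 2, is_terminal))    # True
print(satisfies_weak_fairness(always_first, states, 2, is_terminal))  # False
```

The second check fails on the k-state (2, 0): copy 1 has terminated, copy 2 has not, yet only copy 1 is scheduled, so copy 2 is starved.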
Remark 1. While we require the conditions $\{C_M\}_M$ defining a self composition function $f$ to induce a partition of $S^k$ in order to ensure that $f$ is well defined as a (total) function, the requirement may be relaxed in two ways. First, we may allow $C_{M_1}$ and $C_{M_2}$ to overlap for $M_1 \neq M_2$. This will add more transitions and may make the task of verifying the composed program more difficult, but it maintains the soundness of the reduction. Second, it suffices that the conditions cover the set of reachable states of the composed program rather than the entire state space. These relaxations do not damage soundness. Technically, they mean that $f$ represented by the conditions is a relation rather than a function. We still refer to it as a function and write $f(s_1, \ldots, s_k) = M$ to indicate that $(s_1, \ldots, s_k) \models C_M$, not excluding the possibility that $(s_1, \ldots, s_k) \models C_{M'}$ for some $M' \neq M$ as well. We note that as long as the language used to describe compositions is closed under Boolean operations, we can always extract from the conditions $\{C_M\}_M$ a function $f'$, given by conditions $\{C'_M\}_M$. It is easy to verify that $f'$ defined by $\{C'_M\}_M$ is a total self composition function and that if $f$ is fair, then so is $f'$.
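One possible way to turn overlapping, partial conditions into a total function is sketched below. This is an assumption-laden illustration rather than the construction used in the paper: overlaps are resolved by a fixed priority order on the conditions, and k-states that satisfy no condition are mapped to the set of all copies (a default that schedules every non-terminated copy). The name `totalize` is hypothetical.

```python
def totalize(conditions, k):
    """Build a total composition function from (M, predicate) pairs.

    conditions: list of (M, cond) where M is a set of copies and cond is a
    predicate over k-states; the pairs may overlap or leave states uncovered.
    """
    everyone = frozenset(range(1, k + 1))

    def f(kstate):
        for M, cond in conditions:   # first match wins: resolves overlaps
            if cond(kstate):
                return frozenset(M)
        return everyone              # assumed default for uncovered k-states

    return f

# Hypothetical overlapping and partial conditions over pairs of integers.
conditions = [
    ({1}, lambda ks: ks[0] < ks[1]),
    ({2}, lambda ks: ks[0] <= ks[1]),   # overlaps with the first condition
]
f_total = totalize(conditions, k=2)
print(f_total((0, 1)), f_total((1, 1)), f_total((2, 1)))
```

Expressed symbolically, the first-match rule corresponds to conjoining each condition with the negations of the conditions that precede it, which stays within the language whenever it is closed under Boolean operations.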

The Problem of Inferring Self Composition with Inductive Invariant
Lemma 1 states the soundness of the reduction of k-safety to ordinary safety. Together with the ability to verify safety by means of an inductive invariant, this leads to a verification procedure. However, while soundness of the reduction holds for any self composition, an inductive invariant in a given language may exist for the composed program resulting from some compositions but not from others. We therefore consider the self composition function and the inductive invariant together, as a pair, leading to the following definition.
As commented in Remark 1, we relax the requirement that $(\bigvee_M C_M) \equiv true$ to $Inv \Rightarrow \bigvee_M C_M$, thus ensuring that the conditions cover all the reachable states. Since the reachable states of $T^f$ are determined by $\{C_M\}_M$ (which define $f$), this reveals the interplay between the self composition function and the inductive invariant. Furthermore, we do not require that $C_{M_1} \wedge C_{M_2} \equiv false$ for $M_1 \neq M_2$, hence a k-state may satisfy multiple conditions. As explained earlier, these relaxations do not damage soundness. Furthermore, if we construct from $f$ a self composition function $f'$ as described in Remark 1, $Inv$ would be an inductive invariant for $T^{f'}$ as well.
Lemma 2. If there exists a composition-invariant pair $(f, Inv)$ for $T$ and $(pre, post)$, then $T \models^k (pre, post)$.
Proof (sketch). If $(f, Inv)$ is a composition-invariant pair, then $Inv$ is an inductive invariant for $T^{f'}$, where $f'$ is a fair composition function defined as in Remark 1. From Lemma 1 we conclude that $T \models^k (pre, post)$.
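The interplay between the composition function and the invariant can be illustrated on a small explicit-state example. The checker below is a sketch under stated assumptions: it tests the five requirements that the text names for a composition-invariant pair (initiation, consecution, safety, covering of the invariant by the conditions, and the weak state-based fairness requirement), paraphrased for finite systems; the function and variable names are hypothetical.

```python
from itertools import product

def is_composition_invariant_pair(kstates, succ, is_terminal, pre, post,
                                  conditions, inv):
    """Brute-force check on a finite system; conditions is a list of (M, cond)."""
    def step(ks, M):
        opts = [succ(s) if i in M else [s] for i, s in enumerate(ks, start=1)]
        return product(*opts)

    for ks in kstates:
        if pre(ks) and not inv(ks):
            return False                                  # initiation
        if not inv(ks):
            continue
        all_done = all(is_terminal(s) for s in ks)
        if all_done and not post(ks):
            return False                                  # safety
        enabled = [M for M, cond in conditions if cond(ks)]
        if not enabled:
            return False                                  # covering
        for M in enabled:
            if not all_done and all(is_terminal(ks[i - 1]) for i in M):
                return False                              # fairness
            if any(not inv(t) for t in step(ks, M)):
                return False                              # consecution
    return True

# Hypothetical example: two copies of a counter that stops at 2.
kstates = list(product(range(3), repeat=2))
succ = lambda s: [min(s + 1, 2)]
is_terminal = lambda s: s == 2
pre = lambda ks: ks == (0, 0)
post = lambda ks: ks[0] == ks[1]
equal = lambda ks: ks[0] == ks[1]      # candidate invariant

lock_step = [(frozenset({1, 2}), lambda ks: True)]
sequential = [(frozenset({1}), lambda ks: ks[0] != 2),
              (frozenset({2}), lambda ks: ks[0] == 2)]

print(is_composition_invariant_pair(kstates, succ, is_terminal, pre, post,
                                    lock_step, equal))    # True
print(is_composition_invariant_pair(kstates, succ, is_terminal, pre, post,
                                    sequential, equal))   # False
```

The same candidate invariant is inductive under the lock-step composition but not under the sequential one, where the first step already leads from (0, 0) to (1, 0). This is the sense in which the composition and the invariant have to be chosen together.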
If we do not restrict the language in which $f$ and $Inv$ are specified, then the converse also holds. However, in the sequel we are interested in the ability to verify k-safety with a given language, e.g., one for which the conditions of Definition 4 belong to a decidable fragment of logic and hence can be discharged automatically. When no solution exists, it does not necessarily mean that $T \not\models^k (pre, post)$. Instead, it may be that the language $L$ is simply not expressive enough. Unfortunately, for expressive languages (e.g., quantified formulas or even quantifier-free linear integer arithmetic), the problem of inferring an inductive invariant alone is already undecidable, making the problem of inferring a composition-invariant pair undecidable as well:
Lemma 3. Let $L$ be closed under Boolean operations and under substitution of a variable with a value, and include equalities of the form $v = a$, where $v$ is a variable and $a$ is a value (of the same sort). If the problem of inferring an inductive invariant in $L$ is undecidable, then so is the problem of inferring a composition-invariant pair in $L$.
Proof. We show a reduction from the ordinary invariant inference problem in $L$ to the problem of inferring a composition-invariant pair in $L$. Given a transition system $T$ and an ordinary safety property $(pre, post)$, the reduction constructs a transition system $T^* = (S^*, R^*, F^*)$ over $V^* = V \uplus \{b\}$, where $b$ is a new Boolean variable, such that when $b = true$ the original transitions are taken and when $b = false$ the system remains in the same state, which is also added to the set of terminal states. Formally, for every $v \in V$, let $a_v$ be an arbitrary fixed value in the domain of $v$. For example, if $v$ is Boolean, $a_v = false$. The reduction constructs $T^*$ together with a 2-safety property in which the first copy is "initialized" with $b = true$ and with the original pre-condition and is required to terminate in a state that satisfies the original post-condition, while the second copy is initialized with $b = false$, and with the value $a_v$ for each original variable, and is required to terminate in the same state. Clearly, if $T$ has an inductive invariant $Inv$ for $(pre, post)$, ...
For example, linear integer arithmetic satisfies the conditions of the lemma. This motivates us to restrict the languages of inductive invariants. Specifically, we consider languages defined by a finite set of predicates. We consider relational predicates, defined over $V^{\|k} = V^1 \uplus \cdots \uplus V^k$. For a finite set of predicates $P$, we define $L_P$ to be the set of all formulas obtained by Boolean combinations of the predicates in $P$.
Definition 6 (Inference using predicate abstraction). The problem of inferring a predicate-based composition-invariant pair is defined as follows. The input is a transition system $T$, a k-safety property $(pre, post)$, and a finite set of predicates $P$. The output is the solution to the problem of inferring a composition-invariant pair for $T$ and $(pre, post)$ in $L_P$.

Remark 2.
It is possible to decouple the language used for expressing the self composition function from the language used to express the inductive invariant. Clearly, different sets of predicates (and hence languages) can be assigned to the self composition function and to the inductive invariant. However, since inductiveness is defined with respect to the transitions of the composed system, which are in turn defined by the self composition function, if the language defining $f$ is not included in the language defining $Inv$, the conditions $C_M$ themselves would be over-approximated when checking the requirements of Definition 4 and therefore would incur a precision loss. For this reason, we use the same language for both.
Since the problem of invariant inference in $L_P$ is PSPACE-hard [24], a reduction from the problem of inferring inductive invariants to the problem of inferring composition-invariant pairs (similar to the one used in the proof of Lemma 3) shows that composition-invariant inference in $L_P$ is also PSPACE-hard:
Theorem 1. Inferring a predicate-based composition-invariant pair is PSPACE-hard.

Algorithm for Inferring Composition-Invariant Pairs
In this section, we present Property Directed Self-Composition (PDSC for short), our algorithm for tackling the composition-invariant inference problem for languages of predicates (Definition 6). Namely, given a transition system $T$, a k-safety property $(pre, post)$ and a finite set of predicates $P$, we address the problem of finding a pair $(f, Inv)$, where $f$ is a self composition function and $Inv$ is an inductive invariant for the composed transition system $T^f$ obtained from $f$, and both of them are in $L_P$, i.e., defined by Boolean combinations of the predicates in $P$.
We rely on the property that a transition system (in our case $T^f$) has an inductive invariant in $L_P$ if and only if its abstraction obtained using $P$ is safe. This is because the set of reachable abstract states is the strongest set expressible in $L_P$ that satisfies initiation and consecution. Given $T^f$, this allows us to use predicate abstraction to either obtain an inductive invariant in $L_P$ for $T^f$ (if the abstraction of $T^f$ is safe) or determine that no such inductive invariant exists (if an abstract counterexample trace is obtained). The latter indicates that a different self composition function needs to be considered. A naive realization of this idea gives rise to an iterative algorithm that starts from an arbitrary initial composition function and in each iteration computes a new composition function. In the worst case such an algorithm enumerates all self composition functions defined in $L_P$, i.e., has time complexity $O(2^{2^{|P|}})$. Importantly, we observe that, when no inductive invariant exists for some composition function, we can use the abstract counterexample trace returned in this case to (i) generalize and eliminate multiple composition functions, and (ii) identify that some abstract states must be unreachable if there is to be a composition-invariant pair, i.e., we "block" states in the spirit of property directed reachability [5,14]. This leads to the algorithm depicted in Algorithm 1, whose worst case time complexity is $O(2^{|P|})$. Next, we explain the algorithm in detail.
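A quick way to get a feeling for the gap between the two bounds is to count the objects involved: a set of $|P|$ predicates induces $2^{|P|}$ abstract states, and hence at most $2^{2^{|P|}}$ semantically distinct formulas in $L_P$. The helper below is purely illustrative arithmetic (the name `search_space_sizes` is hypothetical), not a statement about the running time of any implementation.

```python
def search_space_sizes(num_predicates: int) -> tuple:
    """Return (#abstract states, #distinct Boolean combinations) for |P| predicates."""
    abstract_states = 2 ** num_predicates      # valuations of the predicates
    distinct_formulas = 2 ** abstract_states   # subsets of abstract states
    return abstract_states, distinct_formulas

for n in (2, 3, 4):
    print(n, search_space_sizes(n))
# 2 (4, 16)
# 3 (8, 256)
# 4 (16, 65536)
```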
Finding an inductive invariant for a given composition function using predicate abstraction. We use predicate abstraction [18,27] to check if a given candidate composition function has a corresponding inductive invariant. This is done as follows. The abstraction of $T^f$ using $P$, denoted $A_P(T^f)$, is a transition system $(\hat{S}, \hat{R})$ defined over variables $B$, where $B = \{b_p \mid p \in P\}$ (we omit the terminal states). $\hat{S} = \{0, 1\}^B$, i.e., each abstract state corresponds to a valuation of the Boolean variables representing $P$. An abstract state $\hat{s} \in \hat{S}$ represents the following set of states of $T^f$: $\gamma(\hat{s}) = \{ s \in S^{\|k} \mid \forall p \in P.\ s \models p \Leftrightarrow \hat{s}(b_p) = 1 \}$. We extend $\gamma$ to sets of states and to formulas representing sets of states in the usual way. The abstract transition relation is defined as usual: $\hat{R} = \{ (\hat{s}_1, \hat{s}_2) \mid \exists s_1 \in \gamma(\hat{s}_1)\ \exists s_2 \in \gamma(\hat{s}_2).\ (s_1, s_2) \in R^f \}$. Note that the set of abstract states in $A_P(T^f)$ does not depend on $f$.
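For a finite system given explicitly, the abstraction and its reachable states can be computed by enumeration. The sketch below is a simplified illustration: it only considers abstract states with a non-empty concretization, it takes the abstract initial states to be the abstractions of the concrete states satisfying the pre-condition, and the names `alpha` and `abstract_reachable` as well as the toy system are assumptions made for the example.

```python
def alpha(s, predicates):
    """Abstract state of s: the valuation of the predicates on s."""
    return tuple(int(p(s)) for p in predicates)

def abstract_reachable(states, R, pre, predicates):
    """Abstract states reachable from the abstraction of the pre-condition."""
    # An abstract transition exists iff some pair of concrete states in the
    # respective concretizations is related by R.
    abs_R = {(alpha(s, predicates), alpha(t, predicates)) for (s, t) in R}
    reached = {alpha(s, predicates) for s in states if pre(s)}
    frontier = list(reached)
    while frontier:
        a = frontier.pop()
        for (x, y) in abs_R:
            if x == a and y not in reached:
                reached.add(y)
                frontier.append(y)
    return reached

# Hypothetical toy system: a counter from 0 to 5 with a self-loop at 5.
states = range(6)
R = {(s, min(s + 1, 5)) for s in states}
pre = lambda s: s == 0
predicates = [lambda s: s == 0, lambda s: s >= 3]

print(sorted(abstract_reachable(states, R, pre, predicates)))
# [(0, 0), (0, 1), (1, 0)]
```

Reading each reachable valuation back as a conjunction of (possibly negated) predicates and taking the disjunction yields a candidate invariant over the predicates, which is the idea behind using the reachable abstract states as the invariant.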
Notation. We sometimes refer to an abstract state $\hat{s} \in \hat{S}$ as the formula $\bigwedge_{\hat{s}(b_p) = 1} b_p \wedge \bigwedge_{\hat{s}(b_p) = 0} \neg b_p$. For a formula $\psi \in L_P$, we denote by $\psi(B)$ the result of substituting each $p \in P$ in $\psi$ by the corresponding Boolean variable $b_p$. For the opposite direction, given a formula $\psi$ over $B$, we denote by $\psi(P)$ the formula in $L_P$ resulting from substituting each $b_p \in B$ in $\psi$ by $p$. Therefore, $\psi(P)$ is a symbolic representation of $\gamma(\psi)$.
Every set defined by a formula $\psi \in L_P$ is precisely represented by $\psi(B)$ in the sense that $\gamma(\psi(B))$ is equal to the set of states defined by $\psi$, i.e., $\psi(B)$ is a precise abstraction of $\psi$. For simplicity, we assume that the termination conditions as well as the pre/post specification can be expressed precisely using the abstraction, in the following sense:
Definition 7. $P$ is adequate for $T$ and $(pre, post)$ if there exist $\varphi_{pre}, \varphi_{post}, \varphi_{F^i} \in L_P$ such that $\varphi_{pre} \equiv pre$, $\varphi_{post} \equiv post$ and $\varphi_{F^i} \equiv F^i$ (for every copy $i \in \{1..k\}$).
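The following lemma provides the foundation for our algorithm:

Lemma 4. Let $T$ be a transition system, $(pre, post)$ a k-safety property, and $P$ a finite set of predicates adequate for $T$ and $(pre, post)$. For a self composition function $f$ defined via conditions $\{C_M\}_M$ in $L_P$, there exists an inductive invariant $Inv$ in $L_P$ such that $(f, Inv)$ is a composition-invariant pair for $T$ and $(pre, post)$ if and only if the following three conditions hold:

S1 All reachable states of $A_P(T_f)$ from $\varphi_{pre}(B)$ satisfy $(\bigwedge_{j=1}^{k} \varphi_{F_j}(B)) \rightarrow \varphi_{post}(B)$.

S2 All reachable states of $A_P(T_f)$ from $\varphi_{pre}(B)$ satisfy $\bigvee_M C_M(B)$.

S3 $f$ is fair in the sense of Definition 4, i.e., it never schedules only terminated copies while some copy has not terminated.

Furthermore, if the conditions hold, then the symbolic representation of the set of abstract states of $A_P(T_f)$ reachable from $\varphi_{pre}(B)$ is a formula $Inv$ over $B$ such that $(f, Inv(P))$ is a composition-invariant pair for $T$ and $(pre, post)$.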
Proof. The proof relies on the following statement, denoted by $(*)$: for a formula $\varphi$ in $L_P$ and an abstract state $\hat{s}$, for every $s \in \gamma(\hat{s})$ it holds that $s \models \varphi \Leftrightarrow \hat{s} \models \varphi(B)$ (which follows by induction on the structure of a formula in $L_P$, relying on the definition of $\gamma(\hat{s})$). In particular, this implies that for a formula $\psi$ over $B$, it holds that $s \models \psi(P) \Leftrightarrow \hat{s} \models \psi$ whenever $s \in \gamma(\hat{s})$.
($\Rightarrow$) Let $T$, $(pre, post)$ and $P$ be as described, and let $(f, Inv)$ be a composition-invariant pair for $T$ and $(pre, post)$ in $L_P$. We first show that every (abstract) state that is reachable from $\varphi_{pre}(B)$ in $A_P(T_f)$ satisfies $Inv(B)$. Let $\hat{s}$ be such a reachable state. Then there exists an abstract trace $\hat{s}_1, \ldots, \hat{s}_m$ such that $\hat{s}_1 \models \varphi_{pre}(B)$, $\hat{s}_m = \hat{s}$ and $(\hat{s}_i, \hat{s}_{i+1}) \in \hat{R}$ for every $1 \le i < m$. Consider a concrete state $s_1$ of $T_f$ such that $s_1 \in \gamma(\hat{s}_1)$. Since $\hat{s}_1 \models \varphi_{pre}(B)$, from $(*)$ we get $s_1 \models \varphi_{pre}$. From the definition of a composition-invariant pair (Definition 4) we get that $s_1 \models Inv$ (initiation). Since $Inv$ is in $L_P$, we get from $(*)$ that also $\hat{s}_1 \models Inv(B)$. For $\hat{s}_2$, the next state in the abstract trace, it also holds that $\hat{s}_2 \models Inv(B)$: since $(\hat{s}_1, \hat{s}_2) \in \hat{R}$, we know that there exist some $s_a \in \gamma(\hat{s}_1)$ and $s_b \in \gamma(\hat{s}_2)$ such that $(s_a, s_b) \in R_f$. Using $(*)$ we get that $s_a \models Inv$, the consecution of $Inv$ implies $s_b \models Inv$, and from $(*)$ we get $\hat{s}_2 \models Inv(B)$. By induction over the length of the abstract trace we get that $\hat{s} \models Inv(B)$.

We now turn to show that conditions S1-S3 hold. First, the safety of $Inv$ for $T_f$ together with adequacy of $P$ and $(*)$ imply that $Inv(B) \Rightarrow \big((\bigwedge_{j=1}^{k} \varphi_{F_j}(B)) \rightarrow \varphi_{post}(B)\big)$, and since all the reachable states of $A_P(T_f)$ satisfy $Inv(B)$, S1 follows. Similarly, the covering requirement of $f$, together with the property that $C_M$ is in $L_P$ for every $M$ and together with $(*)$, implies S2. Finally, S3 is implied directly by the fairness of $f$ (Definition 4).
($\Leftarrow$) Assume that for $T$, $(pre, post)$, $P$ and some composition function $f$ as described, conditions S1-S3 hold. Condition S1 ensures that $A_P(T_f)$ satisfies the safety property $(\varphi_{pre}(B), \varphi_{post}(B))$, when we augment $A_P(T_f)$ with a set of terminal states given by the formula $\bigwedge_{j=1}^{k} \varphi_{F_j}(B)$. Hence, there exists an inductive invariant $Inv$ over $B$ for $A_P(T_f)$ and $(\varphi_{pre}(B), \varphi_{post}(B))$. Furthermore, condition S2 ensures that there exists such $Inv$ for which $Inv \Rightarrow \bigvee_M C_M(B)$ (for example, such $Inv$ may be obtained by conjoining the inductive invariant ensured by S1 with another inductive invariant that establishes S2). To conclude the proof we show that $(f, Inv(P))$ is a composition-invariant pair for $T$ and $(pre, post)$, as defined in Definition 4. First, initiation and safety of $Inv$ with respect to $A_P(T_f)$ and $(\varphi_{pre}(B), \varphi_{post}(B))$ imply initiation and safety (respectively) of $Inv(P)$ with respect to $T$ and $(\varphi_{pre}, \varphi_{post})$ due to $(*)$ and adequacy of $P$. As for consecution of $Inv(P)$: for a pair of states $s_1, s_2$ in $T_f$ such that $(s_1, s_2) \in R_f$, if $s_1 \in \gamma(\hat{s}_1)$ and $s_2 \in \gamma(\hat{s}_2)$, then $(\hat{s}_1, \hat{s}_2) \in \hat{R}$. Therefore, if $s_1 \models Inv(P)$ then $\hat{s}_1 \models Inv$ (according to $(*)$), and from consecution of $Inv$ in $A_P(T_f)$ also $\hat{s}_2 \models Inv$, and from $(*)$ we get $s_2 \models Inv(P)$ and conclude the consecution of $Inv(P)$ in $T_f$. Similarly, for covering of $f$: recall that $Inv \Rightarrow \bigvee_M C_M(B)$, hence by $(*)$, $Inv(P) \Rightarrow \bigvee_M C_M$, i.e., $f$ covers the states satisfying $Inv(P)$. Finally, the fairness of $f$ follows from S3.

Algorithm 1 starts from the lock-step self composition function (Line 1), which is fair, and constructs the next candidate $f$ such that condition S3 in Lemma 4 always holds (see discussion of Modify_SC). Thus, condition S3 need not be checked explicitly.
Algorithm 1 checks whether conditions S1 and S2 hold for a given candidate composition function $f$ by calling Abs_Reach (Line 3). Both checks are performed via a (non-)reachability check in $A_P(T_f)$, checking whether a state violating $(\bigwedge_{j=1}^{k} \varphi_{F_j}(B)) \rightarrow \varphi_{post}(B)$ or a state violating $\bigvee_M C_M(B)$ is reachable from $\varphi_{pre}(B)$. Algorithm 1 maintains the abstract states that are not in $\bigvee_M C_M(B)$ by the formula Unreach defined over $B$, which is initialized to false (as the lock-step composition function is defined for every state) and is updated in each iteration of Algorithm 1 to include the abstract states violating $\bigvee_M C_M(B)$. If no abstract state violating S1 or S2 is reachable, i.e., the conditions hold, then Abs_Reach returns the (potentially overapproximated) set of reachable abstract states, represented by a formula $Inv$ over $B$. In this case, by Lemma 4, $(f, Inv(P))$ is a composition-invariant pair (Line 4). Otherwise, an abstract counterexample trace is obtained. (We can of course apply bounded model checking to check if the counterexample is real; we omit this check as our focus is on the case where the system is safe.)

Remark 3. In practice, we do not construct $A_P(T_f)$ explicitly. Instead, we use the implicit predicate abstraction approach [6].
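To make the shape of this check concrete, the following is a minimal explicit-state sketch of a reachability query that returns either the reachable set or a counterexample trace. It is an illustration under simplifying assumptions, not the Abs_Reach procedure of the implementation (which uses implicit predicate abstraction and a CHC solver): the abstract graph is given explicitly, and the names `abs_reach`, `successors` and `violates` are hypothetical.

```python
from collections import deque


def abs_reach(abs_init, successors, violates):
    """Breadth-first search over an explicit abstract graph.

    Returns ("safe", reached_states), or ("cex", trace) where trace is a
    shortest path from an initial state to a state for which violates() holds.
    """
    parent = {s: None for s in abs_init}
    queue = deque(abs_init)
    while queue:
        cur = queue.popleft()
        if violates(cur):
            trace = []
            while cur is not None:          # walk parent pointers back to the start
                trace.append(cur)
                cur = parent[cur]
            return "cex", trace[::-1]
        for nxt in successors(cur):
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return "safe", set(parent)


# Hypothetical abstract graph with three states.
graph = {"a": ["b"], "b": ["c"], "c": []}
bad = set()          # states violating S1: all copies terminated, post fails
unreach = {"c"}      # states on which f is undefined (violating S2)

result = abs_reach(["a"], lambda s: graph[s], lambda s: s in bad or s in unreach)
result_safe = abs_reach(["a"], lambda s: graph[s], lambda s: False)
```

A single `violates` predicate covers both conditions, mirroring the fact that S1 and S2 are checked by one reachability query.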
Eliminating self composition candidates based on abstract counterexamples. An abstract counterexample to conditions S1 or S2 indicates that the candidate composition function f has no corresponding Inv . Violation of S1 can only be resolved by changing f such that the abstract trace is no longer feasible. Violation of S2 may, in principle, also be resolved by extending the definition of f such that it is defined for all the abstract states in the counterexample trace.
However, to prevent the need to explore both options, our algorithm maintains the following invariant for every candidate self composition function $f$ that it constructs:

Claim. Every abstract state that is not in $\bigvee_M C_M(B)$ is not reachable w.r.t. the abstract composed program of any composition function that is part of a composition-invariant pair for $T$ and $(pre, post)$.

This property clearly holds for the lock-step composition function, which the algorithm starts with, since for this composition, $\bigvee_M C_M(B) \equiv true$. As we explain in Corollary 2, it continues to hold throughout the algorithm.

As a result of this property, whenever a candidate composition function $f$ does not satisfy condition S1 or S2, it is never the case that $\bigvee_M C_M(B)$ needs to be extended to allow the abstract states in $cex$ to be reachable. Instead, the abstract counterexample obtained in violation of the conditions needs to be eliminated by modifying $f$.
Let $cex = \hat{s}_1, \ldots, \hat{s}_{m+1}$ be an abstract counterexample to S1 or S2 in $A_P(T_f)$. Any self composition $f'$ that agrees with $f$ on the states in $\gamma(\hat{s}_i)$ for every $\hat{s}_i$ that appears in $cex$ has the same transitions in $R_{f'}$ and, hence, the same transitions in $\hat{R}$. It, therefore, exhibits the same abstract counterexample in $A_P(T_{f'})$. Hence, it violates S1 or S2 and is not part of any composition-invariant pair.
Notation. Recall that $f$ is defined via conditions $C_M \in L_P$. This ensures that for every abstract state $\hat{s}$, $f$ is defined in the same way for all the states in $\gamma(\hat{s})$. We denote the value of $f$ on the states in $\gamma(\hat{s})$ by $f(\hat{s})$ (in particular, $f(\hat{s})$ may be undefined). We get that $f(\hat{s}) = M$ if and only if $\hat{s} \models C_M(B)$.
Using this notation, to eliminate the abstract counterexample $cex$, one needs to eliminate at least one of the transitions in $cex$ by changing the definition of $f(\hat{s}_i)$ for some $1 \le i \le m$. For a new candidate function $f'$ this may be encoded by the disjunctive constraint

$$\bigvee_{i=1}^{m} f'(\hat{s}_i) \neq f(\hat{s}_i).$$

However, we observe that a stronger requirement may be derived from $cex$ based on the following lemma:

Lemma 5. Let $f$ be a self composition function and $cex = \hat{s}_1, \ldots, \hat{s}_{m+1}$ a counterexample trace in $A_P(T_f)$ such that $\hat{s}_1 \models \varphi_{pre}(B)$ but $\hat{s}_{m+1} \models (\bigwedge_{i=1}^{k} \varphi_{F_i}(B)) \wedge \neg\varphi_{post}(B)$ or $\hat{s}_{m+1} \models Unreach$. Then for any self composition function $f'$ such that $f'(\hat{s}_m) = f(\hat{s}_m)$, if $\hat{s}_m$ is reachable in $A_P(T_{f'})$ from $\varphi_{pre}(B)$, then a counterexample trace to S1 or S2 exists.
Proof. Suppose that $\hat{s}_m$ is reachable in $A_P(T_{f'})$ from $\varphi_{pre}(B)$. Then there exists a trace $\hat{s}'_1, \ldots, \hat{s}'_m$ in $A_P(T_{f'})$ such that $\hat{s}'_1 \models \varphi_{pre}(B)$ and $\hat{s}'_m = \hat{s}_m$. Since $f'(\hat{s}_m) = f(\hat{s}_m)$, the outgoing transitions of $\hat{s}_m$ are the same in both $A_P(T_f)$ and $A_P(T_{f'})$. In particular, the transition $(\hat{s}_m, \hat{s}_{m+1})$ from $A_P(T_f)$ also exists in $A_P(T_{f'})$. Therefore, $cex' = \hat{s}'_1, \ldots, \hat{s}'_m, \hat{s}_{m+1}$ is a trace to $\hat{s}_{m+1}$ in $A_P(T_{f'})$. If $\hat{s}_{m+1} \models (\bigwedge_{i=1}^{k} \varphi_{F_i}(B)) \wedge \neg\varphi_{post}(B)$, then $cex'$ is a counterexample to S1 in $A_P(T_{f'})$ as well. Consider the case where $\hat{s}_{m+1} \models Unreach$. By the construction of Unreach, this indicates that $\hat{s}_{m+1}$ has an outgoing abstract trace that leads to violation of S1 or S2 with every non-starving self composition function, and in particular in $A_P(T_{f'})$.

Corollary 1. If there exists a composition-invariant pair $(f', Inv')$, then there exists one in which $f'(\hat{s}_m) \neq f(\hat{s}_m)$.

Proof. If $f'(\hat{s}_m) = f(\hat{s}_m)$, then by Lemma 5, $\hat{s}_m$ is necessarily unreachable in $A_P(T_{f'})$ from $\varphi_{pre}(B)$. Therefore, if we change $f'(\hat{s}_m)$, all the requirements of Lemma 4 will still hold. If no alternative value that admits the fairness requirement exists, then $f'(\hat{s}_m)$ can remain undefined.
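Therefore, we require that in the next self composition candidates the abstract state $\hat{s}_m$ must not be mapped to its current value in $f$, i.e., $f'(\hat{s}_m) \neq M$, where $f(\hat{s}_m) = M$. This requirement is recorded as the constraint $(\hat{s}_m, M)$ in the set $E$.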
Identifying abstract states that must be unreachable. A new candidate self composition is constructed such that it satisfies all the constraints in $E$ (thus ensuring that no abstract counterexample will re-appear). In the construction, we make sure to satisfy S3 (fairness). Therefore, for every abstract state $\hat{s}$, we choose a value $f(\hat{s})$ that satisfies the constraints in $E$ and is non-starving: a value $M$ is starving for $\hat{s}$ if $\hat{s} \models \bigvee_{j=1}^{k} \neg\varphi_{F_j}(B)$ but $\hat{s} \not\models \bigvee_{j \in M} \neg\varphi_{F_j}(B)$, i.e., some of the copies have not terminated in $\hat{s}$ but none of the non-terminating copies is scheduled. (Due to adequacy, a value $M$ is starving for $\hat{s}$ if and only if it is starving for every $s \in \gamma(\hat{s})$.) If for some abstract state $\hat{s}$, all the non-starving values have already been excluded (i.e., $(\hat{s}, M) \in E$ for every non-starving $M$), we conclude that there is no $f'$ such that $\hat{s}$ is reachable in $A_P(T_{f'})$ and $f'$ is part of a composition-invariant pair:

Lemma 6. Let $\hat{s} \in \hat{S}$ be an abstract state such that for every $\emptyset \neq M \subseteq \{1..k\}$ either $M$ is starving for $\hat{s}$ or $(\hat{s}, M) \in E$. Then, for every $f'$ that satisfies S3, if $A_P(T_{f'})$ satisfies S1 and S2, then $\hat{s}$ is unreachable in $A_P(T_{f'})$.
Proof. If $f'$ satisfies S3 and $A_P(T_{f'})$ satisfies S1 and S2, then according to Lemma 4, $f'$ is a part of some composition-invariant pair $(f', Inv')$ for $T$. Furthermore, as shown in the proof of Lemma 4, every (abstract) state that is reachable from $\varphi_{pre}(B)$ in $A_P(T_{f'})$ satisfies $Inv'(B)$. Assume to the contrary that $\hat{s}$ is reachable in $A_P(T_{f'})$. Then $\hat{s} \models Inv'(B)$. According to Definition 4, $f'$ must be defined for $\hat{s}$, thus $f'(\hat{s}) = M$ for some $\emptyset \neq M \subseteq \{1 \ldots k\}$. Since $f'$ is fair (satisfies S3), $M$ is not starving for $\hat{s}$, so it must be the case that $(\hat{s}, M) \in E$. According to the algorithm, at some iteration there was a composition $f$ with $f(\hat{s}) = M$ that caused adding $(\hat{s}, M)$ to $E$, i.e., there was a counterexample to S1 or S2 in $A_P(T_f)$ in the form of a trace through $\hat{s}$. Then Lemma 5 implies that there is also a counterexample to S1 or S2 in $A_P(T_{f'})$ because $f'(\hat{s}) = f(\hat{s}) = M$. This contradicts the assumption that $A_P(T_{f'})$ satisfies S1 and S2.

Corollary 2. If there exists a composition-invariant pair $(f', Inv')$, then $\hat{s}$ is unreachable in $A_P(T_{f'})$.

This is because no matter how the self composition function $f'$ would be defined, $\hat{s}$ is guaranteed to have an outgoing abstract counterexample trace in $A_P(T_{f'})$.
We, therefore, turn $f(\hat{s})$ to be undefined. As a result, condition S2 of Lemma 4 requires that $\hat{s}$ will be unreachable in $A_P(T_f)$. In Algorithm 1, this is enforced by adding $\hat{s}$ to Unreach (Line 8).
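The bookkeeping behind this step can be sketched on explicit sets. The following is an illustration only, assuming that abstract states are opaque identifiers and that the termination status of each copy in an abstract state is given as a dictionary; the names `nonempty_subsets`, `is_starving` and `remaining_values` are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(k):
    """All nonempty subsets M of {1..k}, as frozensets."""
    copies = range(1, k + 1)
    return [frozenset(c) for r in range(1, k + 1) for c in combinations(copies, r)]


def is_starving(M, terminated):
    """M is starving if some copy is still running but no running copy is in M."""
    some_running = not all(terminated.values())
    return some_running and all(terminated[j] for j in M)


def remaining_values(s_hat, terminated, E, k):
    """Non-starving values for s_hat that are not yet excluded by E."""
    return [M for M in nonempty_subsets(k)
            if not is_starving(M, terminated) and (s_hat, M) not in E]


# Hypothetical abstract state in which copy 1 has terminated and copy 2 has not.
s_hat = "s"
terminated = {1: True, 2: False}
E = {(s_hat, frozenset({2}))}        # the value {2} was excluded earlier
Unreach = set()

options = remaining_values(s_hat, terminated, E, k=2)   # only {1, 2} is left
E.add((s_hat, frozenset({1, 2})))                       # now exclude it as well
if not remaining_values(s_hat, terminated, E, k=2):
    Unreach.add(s_hat)               # every non-starving value is excluded
```

The value `{1}` never appears among the options because scheduling only the terminated copy would starve copy 2.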
Every abstract state $\hat{s}$ that is added to Unreach is a strengthening of the safety property by an additional constraint that needs to be obeyed in any composition-invariant pair, where obtaining a composition-invariant pair is the target of the algorithm. This makes our algorithm property directed.

If an abstract state that satisfies $\varphi_{pre}(B)$ is added to Unreach, then Algorithm 1 determines that no solution exists (Line 9). Otherwise, it generates a new constraint for $E$ based on the abstract state preceding $\hat{s}$ in the abstract counterexample (Line 12).
Constructing the next candidate self composition function. Given the set of constraints in $E$ and the formula Unreach, Modify_SC (Line 13) generates the next candidate composition function by (i) taking a constraint $(\hat{s}, M)$ such that $\hat{s} \not\models Unreach$ (typically the one that was added last), (ii) selecting a non-starving value $M_{new}$ for $\hat{s}$ (such a value must exist, otherwise $\hat{s}$ would have been added to Unreach), and (iii) updating the conditions defining $f$ as follows:

$$C_M := C_M \wedge \neg\hat{s}(P), \qquad C_{M_{new}} := C_{M_{new}} \vee \hat{s}(P).$$

The conditions of other values remain as before. This definition is facilitated by the fact that the same set of predicates is used both for defining $f$ and for defining the abstract states $\hat{s} \in \hat{S}$ (by which $Inv$ is obtained). Note that in practice we do not explicitly turn $f$ to be undefined for $\gamma(Unreach)$; the definitions of $f$ on these states are simply ignored. The definition ensures that $f$ is non-starving (satisfying condition S3) and that no two conditions $C_{M_1} \neq C_{M_2}$ overlap. While the latter is not required, it also does not restrict the generality of the approach (since the language we consider is closed under Boolean operations).
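On an explicit representation, step (iii) amounts to moving one abstract state from one condition to another. The sketch below assumes that each condition $C_M$ is stored as the set of abstract states on which $f$ returns $M$; the function name `modify_sc` and this table representation are illustrative assumptions, not the symbolic encoding used in the implementation.

```python
def modify_sc(conditions, s_hat, M_old, M_new):
    """Move abstract state s_hat from C_{M_old} to C_{M_new}.

    conditions maps each value M (a frozenset of copy indices) to the set of
    abstract states on which f returns M; other conditions are left unchanged.
    The input dictionary is not modified.
    """
    updated = {M: set(states) for M, states in conditions.items()}
    updated[M_old].discard(s_hat)                 # C_M    := C_M and not s_hat
    updated.setdefault(M_new, set()).add(s_hat)   # C_Mnew := C_Mnew or s_hat
    return updated


# Start from lock-step over three hypothetical abstract states.
lock_step = frozenset({1, 2})
only_2 = frozenset({2})
conditions = {lock_step: {"s0", "s1", "s2"}}

new_conditions = modify_sc(conditions, "s1", lock_step, only_2)
```

Because a state is removed from its old condition before being added to the new one, the conditions remain pairwise disjoint, which matches the non-overlap property noted above.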
Theorem 2. Let $T$ be a transition system, $(pre, post)$ a k-safety property and $P$ a set of predicates over $V^k$. If Algorithm 1 returns "no solution" then there is no composition-invariant pair for $T$ and $(pre, post)$ in $L_P$. Otherwise, $(f, Inv(P))$ returned by Algorithm 1 is a composition-invariant pair in $L_P$, and thus $T \models^k (pre, post)$.
Proof. Algorithm 1 returns "no solution" when $Unreach \wedge \varphi_{pre}(B)$ is satisfiable. This means that there is an abstract state $\hat{s}$ that satisfies $\varphi_{pre}(B)$ but also satisfies Unreach. By the construction of Unreach, this means that $\hat{s}$ must be unreachable from $\varphi_{pre}(B)$ in any $A_P(T_{f'})$ such that $(f', Inv')$ is a composition-invariant pair in $L_P$ (see Corollary 2). Since $\hat{s}$ satisfies $\varphi_{pre}(B)$, it is trivially reachable; hence, no such $(f', Inv')$ exists. Conversely, Algorithm 1 returns $(f, Inv(P))$ when all the conditions listed in Lemma 4 are met, thus $(f, Inv(P))$ is a composition-invariant pair.
Complexity. Each iteration of Algorithm 1 adds at least one constraint to $E$, excluding a potential value for $f$ over some abstract state $\hat{s}$. An excluded value is never re-used. Hence, the number of iterations is at most the number of abstract states, $2^{|P|}$, multiplied by the number of potential values for each abstract state, $n = 2^k$. Altogether, the number of iterations is at most $O(2^{|P|} \cdot 2^k)$. Each iteration makes one call to Abs_Reach, which checks reachability via predicate abstraction; hence, assuming that satisfiability checks in the original logic are at most exponential, its complexity is also $O(2^{|P|})$. Therefore, the overall complexity of the algorithm is $O(2^{|P|} \cdot 2^k)$. Typically, $k$ is a small constant, hence the complexity is dominated by $O(2^{|P|})$.
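To give a feel for the gap between the two bounds quoted in this section, the following short calculation compares the $2^{2^{|P|}}$ figure given for naive enumeration with the $2^{|P|} \cdot 2^k$ bound on the number of iterations. The function names are hypothetical and the numbers are plain arithmetic, not measurements.

```python
def naive_candidates(num_predicates):
    """The 2^(2^|P|) figure quoted for naive enumeration of candidates."""
    return 2 ** (2 ** num_predicates)


def pdsc_iteration_bound(num_predicates, k):
    """Abstract states times potential values per state: 2^|P| * 2^k."""
    return (2 ** num_predicates) * (2 ** k)


# With |P| = 4 predicates and k = 2 copies.
naive = naive_candidates(4)
bounded = pdsc_iteration_bound(4, 2)
```

For four predicates and two copies, this gives 65536 versus 64.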

Evaluation and Conclusion
Implementation. We implemented PDSC (Algorithm 1) in Python on top of Z3 [12]. The input is a C program encoded (by SEAHORN [20]) as a transition system using Constrained Horn Clauses (CHC) in SMT2 format, a k-safety property and a set of predicates. The abstraction is implicitly encoded using the approach of [6], and it is parameterized by a composition function that is modified in each iteration. For reachability checks (Abs_Reach) we use SPACER [23], which supports LRA and arrays. For the set of predicates used by PDSC, we implemented an automatic procedure that mines these predicates from the CHC. Additional predicates may be added manually.
Experiments. To evaluate PDSC, we compare it to SYNONYM [26], the current state of the art in k-safety verification.
To show the effectiveness of PDSC, we consider examples that require a nontrivial composition (these examples are detailed in Appendix A). We emphasize that these examples are motivated by real-life scenarios. For example, Figure 1 follows a pattern of constant-time execution. The results of these experiments are summarized in Table 1. PDSC is able to find the right composition function and prove all of the examples, while SYNONYM cannot verify any of them. We emphasize that for these examples, lock-step composition is not sufficient. However, PDSC infers a composition that depends on the programs' state (variable values), rather than just program locations.
Next we consider Java programs from [26,29], which we manually converted to C. For all but 3 examples, only 2 types of predicates, which we mined automatically, were sufficient for verification: (i) relational predicates derived from the pre- and postconditions, e.g., for anti-symmetry, a predicate for each equality expression in the property, and (ii) for simple loops that have an index variable (e.g., for iterating over an array), an equality predicate between the copies of the indices. These predicates were sufficient since we used a large-step encoding of the transition relation, hence the abstraction via predicates takes effect only at cut-points. For the remaining 3 examples, we manually added 2-4 predicates. All but 1 of these examples are solved with a lock-step composition function. Yet, we include them to show that on examples with simple compositions PDSC performs similarly to SYNONYM. This can be seen in Figure 2.
Conclusion and Future Work. This work formulates the problem of inferring a self composition function together with an inductive invariant for the composed program, thus capturing the interplay between the self composition and the difficulty of verifying the resulting composed program. To address this problem we present PDSC, an algorithm for inferring a semantic self composition, directed at verifying the composed program with a given language of predicates. We show that PDSC manages to find nontrivial self compositions that are beyond reach of existing tools. In future work, we are interested in further improving PDSC by extending it with additional (possibly lazy) predicate discovery abilities. This has the potential to both improve performance and verify properties over a wider range of programs. Additionally, we consider exploring further generalization techniques during the inference procedure.

A Benchmarks Used in the Evaluation
In this section, we elaborate on the examples from Table 1.

Fig. 3: A program that computes $2x^2$; the computation depends on a secret bit $h$ while $x$ is the low input.

Figure 3 depicts a non-interference problem (a 2-safety problem) where $x$ is the low input and $h$ is the high input. Taint analysis methods cannot prove non-interference for this program, and no proof exists when the product program presented in [15] is applied (see Appendix B). However, using the language of predicates presented (also in Figure 3), our algorithm infers a composition-invariant pair that proves non-interference for the program.
In the program presented in Figure 4 we consider the non-interference property, with pre-condition $low_1 = low_2$ (low input) and post-condition $y_1 = y_2$ (non-interference). The high input $h$ has no constraints, as implied by the pre-condition. Intuitively, the difficulty of proving non-interference for this program is the need to "skip" the statement between the two loops in order to keep the outputs of the copies equal in every composed state along the execution. The suggested composition aligns the computations such that they proceed simultaneously only when both are at either loop, which makes $i_1 = i_2 \wedge y_1 = y_2$ true for every state of the self composed program.

The example in Figure 5 is a comparator based on a Java comparator from the evaluation comparator programs. The comparator was modified to have a loop that might perform two steps in a single iteration. The 2-safety property to prove for the comparator is antisymmetry, i.e., the pre-condition is $o1_1 = o2_2 \wedge o1_2 = o2_1$ and the post-condition is $sign(compare(o1_1, o2_1)) = -sign(compare(o1_2, o2_2))$. The figure also describes a composition that aligns the loops according to the value of $flag$. This yields a composed program that has an invariant that proves the desired property in the predicate language from Figure 5.

For the program described in Figure 6 we consider the monotonicity property, a 2-safety property with pre-condition $[a_2, b_2] \subset [a_1, b_1]$ and post-condition $c_2 < c_1$. Considering a composition that aligns the computations to start together and run simultaneously, it is easy to see that $c_1 < c_2$ for an unbounded number of iterations. However, in Figure 6 we see a composition that eases the task of finding a proof by scheduling the copies such that $c_2 < c_1$ holds from the first iteration of copy 2 to the end of both computations.

A.5 ArrayInsert
The program, together with a detailed explanation of its proof using a composition-invariant pair, is presented in Section 1.

B Demonstrating the Interplay Between Self Composition and Inductive Invariants
We illustrate the effect of the self composition function on the difficulty of verifying the obtained composed program, as well as the need for a semantic self composition function, on the simple example depicted in Figure 3. The program receives as input an integer $x$ and a secret bit $h$, and outputs $y = 2x^2$. The desired specification is that the output does not depend on $h$, which is indeed the case. Formally, this is a 2-safety property with pre-condition $x_1 = x_2$ and post-condition $y_1 = y_2$, requiring that in any two terminating executions that start with the same values for $x$, the final value of $y$ is the same.
As explained earlier, any fair self composition function can be soundly used to reduce the 2-safety problem to an ordinary safety problem. This is because the variables of the two copies of the program are completely disjoint, making the states completely independent. Therefore, the output of each copy does not depend on the actual interleaving of the two copies. As a result, if some interleaving (a fair self composition function) violates the postcondition, all of them will. That is, the actual interleaving does not affect the soundness of the reduction to traditional safety. However, when we turn to verifying the safety of the composed program by finding an inductive invariant in a given language, the specific self composition function used plays a significant role. For example, consider a composition that "synchronizes" the two copies in each control structure (e.g. [15]). Such a composition runs the two copies of the loop in parallel until one copy exits the loop, and then continues to run the other copy. The self composed program obtained by this composition is displayed in Figure 7.

Fig. 7: The self composed program of Figure 3, based on [15].
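The independence of the outputs from the interleaving can be observed on a small simulation. The sketch below is an assumption-laden illustration: the loop in `step` is a hypothetical program that computes $2x^2$ for a non-negative $x$ and is not the program of Figure 3, and the two schedulers `lock_step` and `sequential` are just two examples of fair compositions.

```python
def step(copy):
    """One step of a hypothetical loop computing y = 2*x*x; no-op once terminated."""
    x, i, y = copy
    return (x, i + 1, y + 2 * x) if i < x else copy


def terminated(copy):
    x, i, _ = copy
    return i >= x


def run(copies, schedule):
    """Run the composed program; schedule(copies) gives the indices that move."""
    copies = list(copies)
    while not all(terminated(c) for c in copies):
        for j in schedule(copies):
            copies[j] = step(copies[j])
    return [y for (_, _, y) in copies]


def lock_step(copies):
    # Every copy moves in every composed step.
    return range(len(copies))


def sequential(copies):
    # Fair: always schedules the first copy that has not terminated.
    return [next(j for j, c in enumerate(copies) if not terminated(c))]


start = [(3, 0, 0), (3, 0, 0)]          # pre-condition: x1 = x2
out_lock_step = run(start, lock_step)
out_sequential = run(start, sequential)
```

Both schedules produce the same final outputs; what differs between them is only the set of intermediate composed states, which is exactly what an inductive invariant has to describe.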
We show that for this composition, there exists no inductive invariant in quantifier free linear integer arithmetic (QFLIA) that is sufficient for establishing safety of the composed program.
$(n, 2n^2, 0, n^2, 0)$

(We omit the second copy of $x$ since both copies are equal in all the reachable states, a fact that is also expressible in QFLIA; similarly, we omit $h_1$ and $h_2$.) Clearly, an inductive invariant must be satisfied by all of these states, since all of them are reachable. However, we show that any QFLIA formula that is satisfied by all of these states is also satisfied by a state that reaches a bad state (i.e., a state where $y_1 \neq y_2$). Thus, if the formula is safe, it necessarily violates the consecution requirement, which means it is not an inductive invariant.
Let $\varphi = \varphi_1 \vee \ldots \vee \varphi_r$ be a QFLIA formula, written in DNF, where each $\varphi_i$ is a cube (conjunction of literals). Define $R_1, \ldots, R_r \subseteq R$ such that $R_i = \{s \in R \mid s \models \varphi_i\}$ includes all states in $R$ that satisfy $\varphi_i$. We show that there exists $i$ such that $\varphi_i$ is also satisfied by a state that reaches a bad state.
$R$ includes infinitely many "points" of the form $(n, n^2, n, n^2, 0)$ where $n$ is an even number. Therefore, since there are finitely many $R_i$'s that together cover $R$, there exists $i$ such that $R_i$ also includes infinitely many such points. Take two such points $(n, n^2, n, n^2, 0)$ and $(m, m^2, m, m^2, 0)$ in $R_i$ where $n \neq m$. Then $(\frac{1}{2}(n+m), \frac{1}{2}(n^2+m^2), \frac{1}{2}(n+m), \frac{1}{2}(n^2+m^2), 0)$ is a state (all values are integers, since $n$ and $m$ are even) in the convex hull of $R_i$. In particular, it must satisfy $\varphi_i$ ($\varphi_i$ is a cube in LIA that is satisfied by all states in $R_i$, hence it is also satisfied by all states in its convex hull).
This means that ϕ is not an inductive invariant strong enough to establish safety of the composed program, in contradiction.
In contrast, with the composition function inferred by PDSC (see Figure 3), the composed program has an inductive invariant in QFLIA.
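The convexity step used in the argument above can be checked numerically on a two-dimensional slice. The following sketch assumes that every literal of the cube is a non-strict linear inequality; the sample cube and the helper names `satisfies_cube` and `midpoint` are hypothetical and serve only to illustrate why a cube that holds on two points of the parabola also holds on their midpoint, which lies off the parabola.

```python
from fractions import Fraction


def satisfies_cube(point, cube):
    """cube is a list of (coeffs, bound) pairs, each meaning sum(c * v) <= bound."""
    return all(sum(c * v for c, v in zip(coeffs, point)) <= bound
               for coeffs, bound in cube)


def midpoint(p, q):
    return tuple(Fraction(a + b, 2) for a, b in zip(p, q))


# Two points on the parabola y = x^2 with even x.
p, q = (2, 4), (4, 16)
mid = midpoint(p, q)

# A sample cube: 2 <= x, x <= 4 and 2x <= y, each written in the form "<=".
cube = [((-1, 0), -2), ((1, 0), 4), ((2, -1), 0)]

both_endpoints_ok = satisfies_cube(p, cube) and satisfies_cube(q, cube)
midpoint_ok = satisfies_cube(mid, cube)
midpoint_on_parabola = (mid[1] == mid[0] ** 2)
```

Here the midpoint is $(3, 10)$: it satisfies the cube, yet $10 \neq 3^2$, so no such cube can separate the points of the parabola from the points between them.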