1 Introduction

Abstract Execution (AE) generalizes symbolic execution to programs with “holes,” i.e., abstract or schematic programs: Abstract programs may contain symbols that stand for arbitrary statements or expressions. Symbolic execution of such abstract elements is achieved by approximating them with abstract specifications of conditions on normal or exceptional behavior, frames and footprints, etc. The symbolic execution “flavor” we consider here is complete symbolic execution. This refers to the logic-based variant used in deductive verification [5, 9, 37] that comes with first-order and invariant reasoning. We do not consider the incomplete (often dynamic) symbolic execution variants employed in test case generation [25].

Abstract Execution makes it possible to prove universal second-order properties of program behavior by implicitly quantifying over all permissible instances of abstract elements. However, given that schematic elements and their specifications are abstract, it is generally not possible to prove interesting functional properties of a single abstract program. The power of AE derives from being able to compare the execution of two related abstract programs with the same abstract elements. For loop-free abstract programs this tends to be fully automatic, and even in the presence of loops it is usually much easier to find coupling invariants than functional ones [2, 10].

Second-order program properties involving the comparison of the behavior of two programs occur in any area of programming where the relative correctness of two program schemata is of concern: rule-based compilation [48, 81] and optimization [46, 50], code refactoring [24], program synthesis [75], and Correctness-by-Construction [45], to name a few.

Mechanized proofs of such properties are traditionally performed with interactive proof assistants [56, 59, 82]; an example is the work on verified compilers [48, 81]. This approach permits specifying arbitrarily complex properties, but substantial effort is required to manually write proof scripts. Existing automatic approaches, on the other hand, target specific applications (e.g., regression verification [26], “peephole” optimizations [50], symbolic execution rules [6]) and lack expressiveness. AE occupies a “sweet spot” between these extremes, combining considerable expressiveness and generality with a high degree of automation.

1.1 The Setting of Abstract Execution

Most areas mentioned above involve the transformation of schematic programs. Proving the correctness of program transformation rules can be understood as a relational verification [12] problem over programs with placeholders. For example, the pair of schematic programs “p q” and “q p” (where \(p,q\) represent arbitrary statements) describes a program transformation swapping two statements. If we can prove that, under certain assumptions, all instances of the schematic programs before and after the transformation behave equivalently, the transformation is safe.

AE is implemented on top of KeY [5], a highly automatic deductive verification framework for Java programs based on symbolic execution. Our setting of AE extends the Java language by Abstract Statements (ASs), written “\abstract_statement P;”, and Abstract Expressions (AExps), written “\abstract_expression T e;”, where P and e are the identifiers of an abstract statement and expression, respectively, and \(T\) is the type of the abstract expression e. ASs and AExps are called Abstract Program Elements (APEs); programs containing APEs are called abstract (or schematic) programs. AE universally closes over APEs in programs.

Without additional constraints, an APE represents all of its well-formed concrete instances. This is insufficient to express meaningful properties. For instance, the above-mentioned transformation “p q \(\rightarrow \) q p,” which is a refactoring technique called Slide Statements [22], is generally unsound: if we instantiate \(p\) and \(q\) with two different assignments to the same variable, the final value of that variable will generally differ between executions of the original and the transformed code. Therefore, AE provides a specification language to constrain the behavior of concrete instantiations represented by APEs. An APE is the declaration of a placeholder symbol (e.g., P) together with all specification clauses constraining it. It represents all concrete programs satisfying the specification; if multiple APEs with the same identifier are declared in a program, they represent the same programs (if applicable, modulo renaming of input/output locations).
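For instance (an illustrative instantiation; the variable name x and the assigned values are hypothetical):

    // Original program:
    x = 1;
    x = 2;
    // final value of x: 2

    // After Slide Statements:
    x = 2;
    x = 1;
    // final value of x: 1

Since the final value of x differs, the transformation cannot be safe without further constraints.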

1.2 Specifying Abstract Programs by Example

In the following, we call the memory locations that an APE may write to its frame, and the locations it may read from its footprint.

Remark 1

(Wording: Frames and Footprints) In everyday language, the notion of a “footprint,” as in “carbon footprint,” is used for effects on the outside world. Here, we adhere to the meaning coined in the context of “dynamic frames,” where frames are regarded as “the part of the world which the operation has license to change” [40]. Footprint, on the other hand, is a standard term for accessible location sets in the context of dependency contracts [88], a research area closely related to AE.

Slide Statements is safe, i.e., retains the external behavior of the affected code, under the following conditions: (1) The frames of \(p\) and \(q\) must be disjoint, (2) the frame of \(p\) and footprint of \(q\) must be disjoint, (3) the frame of \(q\) and footprint of \(p\) must be disjoint, (4) if \(p\) completes abruptly (e.g., by throwing an exception), \(q\) must complete normally (and vice versa), and (5) if either \(p\) or \(q\) completes abruptly, the other may not have relevant side effects. Conditions (1) to (3) ensure that \(p\) and \(q\) are “independent,” i.e., do not interfere; Condition (4) establishes that the reason for (abrupt) completion of the program is the same before and after the transformation. For example, it cannot happen that before, the program completes because of an exception thrown by \(p\), while afterward, it completes due to a return by \(q\). Condition (5) is required because if one statement completes abruptly, the other one can only change the state either before or after the transformation, which is why any changes must be confined to locations we are not interested in.

To impose constraints on frames and footprints of abstract elements, we have to define which locations APEs may read and write. However, no further constraints other than the given ones should be enforced: frames and footprints should apply to all programs satisfying Conditions (1) to (3) and (5). We achieve this by using abstract, set-valued specification variables inspired by the theory of dynamic frames [40]. Specifically, we introduce constants \(\textit{frP}\), \(\textit{fpP}\), \(\textit{frQ}\), \(\textit{fpQ}\), etc., each representing an abstract set of program variables or heap locations that can be used to refer to the same frame or footprint in multiple specifications.

Fig. 1: Abstract Program Model for Slide Statements

The abstract program model for Slide Statements is shown in Fig. 1 (to simplify the example, we only consider normal completion as well as completion due to a thrown exception or returned value, disregarding, e.g., abrupt completion due to a break statement). Our specification language extends the Java Modeling Language (JML) [47]. Constraints on ASs are imposed inside specification comments starting with “/*@”; the keyword “ae_constraint” initiates the declaration of a constraint. In lines 25/26 and 33/34, we assign the newly introduced dynamic frame specification variables to the ASs, where the keyword assignable specifies a frame, and accessible a footprint of an AS or AExp. Conditions (1) to (3) are encoded in lines 2–4. To realize mutual exclusion of abrupt completion (Condition (4)), we first bind abrupt completion of ASs P and Q to abstract predicates \(\textit{throwsExcP}\), \(\textit{returnsP}\), \(\textit{throwsExcQ}\) and \(\textit{returnsQ}\), resp., with “exceptional_behavior requires” and “return_behavior requires” in lines 27–30 and 35–38. These predicates represent unknown conditions in the same way that dynamic frames represent unknown location sets, with the intention of giving them a name for future reference. The binding via “requires” is both necessary and sufficient: the respective behavior is always demanded when the specified conditions hold, and only then. The function “\value” maps a location set to its (abstract) value at the point in the program where it is used. It is needed since the locations represented by an abstract location set like \(\textit{fpP}\) do not change during program execution, while their values can change. Because, furthermore, the same program may or may not throw an exception (return, etc.) depending on the evaluation of its footprint in the current environment, the abstract predicates are defined parametrically in the values of the footprints. Now, we can stipulate that at most one of the predicates holds (i.e., at most one AS completes abruptly) in lines 5–16 using the “\mutex” keyword.
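Schematically, the binding for AS P might look as follows (a sketch assembled from the keywords named above; the exact layout in Fig. 1 may differ):

    /*@ ae_constraint \disjoint(frP, frQ);    // Condition (1)
      @*/
    /*@ assignable frP;
      @ accessible fpP;
      @ exceptional_behavior requires throwsExcP(\value(fpP));
      @ return_behavior requires returnsP(\value(fpP));
      @*/
    \abstract_statement P;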

To encode Condition (5) in the model, we employ a further dynamic frame specification variable \(\textit{rel}\) representing an underspecified set of relevant locations. We use it to specify the property that the program performs equivalently before and after the transformation. If \result_1 represents the value of \(\textit{rel}\) before and \result_2 its value after the transformation, this property is specified as \result_1 \(\doteq\) \result_2. Without further constraints, the model has to be proven under the assumption that all locations are in \(\textit{rel}\), i.e., we prove full equivalence. In lines 17–22 of the model, we relax the proof goal by declaring the frame of Q disjoint from \(\textit{rel}\) if AS P completes abruptly, and vice versa. This is more liberal than preventing the normally completing statement from changing any part of the state.

Our AE tool proves correctness of the transformation specified in Fig. 1 fully automatically in less than 20 seconds. Safety conditions on program transformations like the ones shown above are hard to find: nearly all conditions presented in this paper had not been mentioned in the literature before. We discovered them through a feedback loop of interpreting failed proof attempts. This process is supported by our implementation of AE in a semi-automatic program prover that permits proof inspection.

1.3 Organization of This Paper

Our formalization of AE is based on a dynamic program logic and an abstract, formal definition of Symbolic Execution, which we expound in Sect. 2. Sect. 3 presents the concrete and abstract syntax of our AE framework and defines the semantics of abstract programs. The core of the framework consists of our rules for executing APEs and simplifying abstract stores, which we present in Sect. 4. In Sect. 5, we explain details about the feedback loop for extracting preconditions for safe transformations and our approach to proving loop transformations. Furthermore, we provide an overview of the applications of AE to correct code refactoring, cost impact of transformation rules, and parallelization of sequential code. The implementation of AE in the program verification framework KeY is described in Sect. 6. Sect. 7 describes related work, and Sect. 8 concludes the paper and outlines ideas for future applications and extensions.

Novelty

This work is a heavily revised and much-extended version of a conference paper [79]. In contrast to the latter, it is fully based on dynamic frames (Sect. 3.1 presents the extended specification language), which enables the specification of general transformations. Furthermore, we introduce abstract expressions (AExps), whereas [79] relied on an “abstract expression idiom” using ASs. The brief, informal semantics definition of [79] is made precise by translating it into dynamic logic (Sect. 3.3). This reduction makes it possible to mechanically check whether a given concrete program instantiates an abstract one. The symbolic execution rules for APEs and all simplification rules for abstract stores (Sect. 4) have been replaced by wholly revised versions, including rules for better support of heap-related properties. We use a new technique for proving loop transformations based on “abstract strongest loop invariants” (Sect. 5.1.2) that no longer requires manual loop coupling. For the application to correct refactoring (Sect. 5.1.3), we derived more precise safety preconditions for the analyzed refactoring techniques and added a transformation not considered in [79]. Furthermore, we found several previously unreported bugs in the refactoring engines of major Java IDEs, which we also discuss in Sect. 5.1.3. Finally, we give an overview of other applications of AE conducted since the publication of [79] (included in Sect. 5).

Most of the technical content in this paper is included in a Ph.D. thesis [76]. Here, we give a more condensed account and formalize AE rules in an abstract symbolic execution framework to make the theory independent of KeY’s program logic.

2 JavaDL and Symbolic Execution

This section introduces the program logic, including theories of heaps and location sets on which we rely. We also define a general theory of symbolic execution wherein we later express abstract execution rules.

2.1 Program Logic

Our framework is based on Java Dynamic Logic (JavaDL), a first-order dynamic logic for sequential programs. In this section, we provide the essential parts of the logic needed to keep the paper self-contained and refer to [5] for a full account. JavaDL extends typed first-order logic by three modal operators: the modalities \([p]\varphi\) (box) and \(\langle{}p\rangle\varphi\) (diamond), as well as updates \(\{\mathcal{U}\}\). The box modality expresses that if the program \(p\) terminates, then it terminates in a state where postcondition \(\varphi \) holds; the diamond modality additionally requires \(p\) to terminate. Updates denote certain limited state changes. In particular, they always terminate. The empty update \(\textit{skip}\) represents an empty state change, an elementary update \(\texttt{x} := t\) the transition where variable \(\texttt{x}\) is assigned the value of term \(t\). Two updates \(\mathcal {U}_1\) and \(\mathcal {U}_2\) can be combined into a parallel update \(\mathcal{U}_1 \,\|\, \mathcal{U}_2\), where both state changes are executed simultaneously. In case of conflicting assignments to the same variable, the syntactically later one “wins.” For example, the parallel composition \(\texttt{x} := 1 \,\|\, \texttt{x} := 2\) is equivalent to the elementary update \(\texttt{x} := 2\). Updates are applied to terms and formulas: \(\{\mathcal{U}\}t\) and \(\{\mathcal{U}\}\varphi\) represent the value of term \(t\) and the truth value of formula \(\varphi \) after the state change effected by \(\mathcal {U}\), respectively. In a sequential update \(\mathcal{U}_1 \mathbin{;} \mathcal{U}_2\), the right-hand sides of \(\mathcal {U}_2\) are interpreted in the state after the transition described by \(\mathcal {U}_1\), while in parallel compositions, they are interpreted in the same pre-state. The formula \(\{\mathcal {U}_1\}\{\mathcal {U}_2\}\varphi \) is equivalent to \(\{\mathcal{U}_1 \mathbin{;} \mathcal{U}_2\}\varphi\). We use the notation \(\mathcal{U}_1 \mathbin{;} \mathcal{U}_2\) for \(\{\mathcal{U}_1\}\mathcal{U}_2\) and write \(\textit{Upd}\) for the set of all updates.
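For example (a standard JavaDL illustration; x and y are hypothetical program variables), the parallel update \(\texttt{x} := \texttt{y} \,\|\, \texttt{y} := \texttt{x}\) swaps the values of the two variables, since both right-hand sides are evaluated in the same pre-state:

\[
\textit{val}_{\mathcal{K},\sigma}(\{\texttt{x} := \texttt{y} \,\|\, \texttt{y} := \texttt{x}\}\varphi)
= \textit{val}_{\mathcal{K},\sigma'}(\varphi)
\quad\text{with } \sigma' = \sigma[\texttt{x}\mapsto\sigma(\texttt{y}),\ \texttt{y}\mapsto\sigma(\texttt{x})] .
\]

A sequential composition \(\texttt{x} := \texttt{y} \mathbin{;} \texttt{y} := \texttt{x}\), in contrast, leaves both variables with the value of \(\texttt{y}\).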

Terms and formulas are standard; we denote by \(\textit{Fml}\) and \(\textit{Trm}_T\) the sets of formulas and terms of type \(T\). We write \(\textit{PV}\) for the set of program variables and \(\textit{LV}\) for the set of logic variables \(v\). The semantics of JavaDL is based on first-order Kripke structures \(\mathcal{K}\) consisting of a domain \(D\), an interpretation function \(\mathcal{I}\) of function and predicate symbols, a set \(S\) of states \(\sigma \) mapping program variables to domain values, and a program transition relation \(\rho\) associating with legal program fragments \(p\) a transition relation \(\rho(p)\) such that \((\sigma_1,\sigma_2)\in\rho(p)\) iff \(p\), when started in \(\sigma_1\), completes normally (without throwing an exception, breaking, or returning, etc.) in \(\sigma_2\). This is sufficient, because programs that might terminate abnormally are locally transformed into normally terminating programs by the rules of the JavaDL calculus, see Example 2 below. Full details are in [5, Chapter 3].

A legal program fragment \(p\) for a context program \(\textit{Prg}\) is a sequence of Java statements which may appear legally (according to the rules of the Java Language Specification [28]) in the extension of \(\textit{Prg}\) by an additional class \(C\) with a suitable method \(m\) into which \(p\) is embedded as a body. Updates, terms, and formulas are evaluated using an overloaded valuation function \(\textit{val}_{\mathcal{K},\sigma,\beta}\), where \(\beta\) is a (logic) variable assignment. It assigns to updates a state transformer \(S\to{}S\), to terms a domain value of type \(T\), and to formulas a truth value \(\textit{tt}\) or \(\textit{ff}\). For closed formulas (without free logic variables), we omit \(\beta\). For example, the valuation of the formula \(\{\texttt{x} := \texttt{y}\}(\texttt{x} > 0)\), which expresses that in the state where \(\texttt{x}\) was updated to the value of \(\texttt{y}\), \(\texttt{x}\) is strictly positive, is computed as follows:

\[
\textit{val}_{\mathcal{K},\sigma}(\{\texttt{x} := \texttt{y}\}(\texttt{x} > 0))
= \textit{val}_{\mathcal{K},\sigma'}(\texttt{x} > 0)
\quad\text{where } \sigma' = \sigma[\texttt{x} \mapsto \sigma(\texttt{y})] ,
\]
which is \(\textit{tt}\) if, and only if, \(\sigma(\texttt{y}) > 0\).

We write \(\sigma\models\varphi\) for \(\textit{val}_{\mathcal{K},\sigma}(\varphi)=\textit{tt}\). If \(\sigma\models\varphi\) for all \(\mathcal{K}\) and \(\sigma\), we write \(\models \varphi \) and say that \(\varphi \) is valid. JavaDL has a sound sequent calculus [5] in which the derivability of a judgment \(\Gamma\vdash\Delta\) establishes the validity of the semantic entailment \(\Gamma \models \Delta \).

JavaDL implements a heap theory based on the theory of arrays [53]. A heap is a sequence of mappings from pairs of objects and fields to values. Writing to the heap is accomplished by a function \(\textit{store}\), which takes a heap, an object, a field, and a value, and returns an updated heap. Reading values is done by the function \(\textit{select}\), taking a heap, an object, and a field, and returning the field’s value. For example, the configuration \(\textit{store}(\texttt{h}, \texttt{p}, \texttt{age}, 42)\) represents a heap identical to \(\texttt{h}\), but where the value of the \(\texttt{age}\) field of a \(\texttt{Person}\) object \(\texttt{p}\) is 42. To evaluate the expression \(\texttt{p.age}\) in it, we compute \(\textit{select}(\textit{store}(\texttt{h}, \texttt{p}, \texttt{age}, 42), \texttt{p}, \texttt{age})\), which equals 42. A sequence of mappings is modeled using nested \(\textit{store}\) expressions, as in \(\textit{store}(\textit{store}(\texttt{h}, \texttt{o}, \texttt{f}, v_1), \texttt{u}, \texttt{g}, v_2)\).
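The interaction of reading and writing follows the usual read-over-write axioms of the theory of arrays (shown here schematically; \(o\), \(f\), \(u\), \(g\) are hypothetical objects and fields):

\[
\begin{aligned}
\textit{select}(\textit{store}(h, o, f, v), o, f) &= v \\
\textit{select}(\textit{store}(h, o, f, v), u, g) &= \textit{select}(h, u, g) \qquad \text{if } (o,f)\ne(u,g) .
\end{aligned}
\]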

Semantically, the current heap configuration in a state \(\sigma\) is stored in \(\sigma(\texttt{heap})\), for a designated variable \(\texttt{heap}\) of type \(\textit{Heap}\). The value of location \((o,f)\) in a state \(\sigma\) is \(\textit{select}(\sigma(\texttt{heap}), o, f)\). The pair \((o,f)\) is an element of \(\textit{LocSet}\), JavaDL’s type for location sets. Its domain consists of pairs of objects and fields; in Sect. 3.2, we extend this by program variable locations. The \(\textit{Heap}\) and \(\textit{LocSet}\) theories in JavaDL are closely related: For instance, the function \(\textit{anon}(h_1, s, h_2)\) anonymizes the fields in the location set \(s\) of the first heap argument \(h_1\); when accessing those, the values in the second heap \(h_2\) are used instead. See [5] for details on the heap and location set models.

2.2 Symbolic Execution

Symbolic Execution (SE) [8, 91] is a popular program analysis technique introduced in the 1970s [15, 18, 41] for exploring a large number of execution paths of a program. The key idea is to treat inputs to a program as abstract symbols. Whenever the execution depends on the concrete value of a symbolic variable, SE follows the branching execution paths in parallel. Symbolic Execution engines maintain for each explored path (1) a path condition describing the conditions satisfied by the branches taken along that path, (2) a symbolic store mapping variables to (symbolic) values, and (3) a program counter pointing to the next instruction to execute. Branch execution updates the path condition, while assignments update the symbolic store [8]. The triple consisting of these elements is called a Symbolic Execution State (SES).
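For instance (an illustrative fragment; the variable names are hypothetical), symbolically executing

    if (x > 0) {
        y = 1;
    } else {
        y = -1;
    }

from an empty initial state yields two paths: one with path condition \(\texttt{x} > 0\) and symbolic store \(\texttt{y} := 1\), and one with path condition \(\texttt{x} \le 0\) and symbolic store \(\texttt{y} := -1\).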

Semantically, a symbolic execution state represents a (potentially infinite) set of concrete execution states, in the same way as a symbolic parameter represents a potentially infinite set of concrete parameters. We call this set of concrete execution states the concretizations of the symbolic state. Based on the notion of concretization, we develop two desirable properties of SE transition relations: exhaustiveness, satisfied by overapproximating SE, and precision, satisfied by underapproximating SE. The definitions in this section are a digest of [76, Chapt. 3]. Specifically, the definitions of exhaustiveness and precision and their implications on the correctness of SE do not appear in previous publications. In Sect. 3, we extend this framework to abstract SESs, and define SE rules for SESs with abstract program counters in Sect. 4.

We formally define our notion of SESs. We represent path conditions by closed formulas and symbolic stores by JavaDL updates. Updates have the advantage that we can evaluate, for example, a formula in a symbolic store by simply applying the update to the formula. A program counter in our framework is the whole remaining program (instead of a pointer to the next instruction).

Definition 1

(Symbolic Execution State) A Symbolic Execution State (SES) is a triple \((C, \mathcal{U}, p)\) of (1) a set of closed formulas \(C\), the path condition, (2) an update \(\mathcal{U}\), the symbolic store, and (3) a legal program fragment \(p\), the program counter. We omit \(p\) for empty program counters and denote the set of all SESs by \(\textit{SEStates}\).

Based on the valuation function of JavaDL, we define the concretization function \(\textit{concr}\) which, given an initial concrete state \(\sigma\), concretizes a symbolic state \(s\) to a concrete state relative to a given structure \(\mathcal{K}\). The union of \(\textit{concr}(s,\sigma)\) for all initial states \(\sigma\) represents the set of concretizations. We begin with a “\(\mathcal{K}\)-indexed” version \(\textit{concr}_{\mathcal{K}}\) and then define \(\textit{concr}\) as the union for all structures \(\mathcal{K}\). The idea is that all different interpretations of uninterpreted function and predicate symbols are captured in the concretizations. If, for instance, new Skolem symbols are introduced after a loop invariant application, the represented concrete state space is extended, which must be reflected in the definition.

Definition 2

((\(\mathcal{K}\)-indexed) Concretization Function) The \(\mathcal{K}\)-indexed concretization function \(\textit{concr}_{\mathcal{K}}\) maps an SES \(s=(C,\mathcal{U},p)\) and a concrete state \(\sigma\) (1) to the empty set \(\emptyset \) if either \(\sigma\not\models{}C\), or, where \(\sigma'=\textit{val}_{\mathcal{K},\sigma}(\mathcal{U})(\sigma)\), there is no \(\sigma''\) such that \((\sigma',\sigma'')\in\rho(p)\), or otherwise (2) to the singleton set \(\{\sigma''\}\) such that \((\sigma',\sigma'')\in\rho(p)\), where \(\sigma'\) is as before. The concretization function \(\textit{concr}\) is defined as \(\textit{concr}(s,\sigma)=\bigcup_{\mathcal{K}}\textit{concr}_{\mathcal{K}}(s,\sigma)\).

Definition 3

(Semantics of SES) The semantics of an SES \(s\) is defined as the union of its concretizations: \(\llbracket{}s\rrbracket = \bigcup_{\sigma\in{}S}\textit{concr}(s,\sigma)\).

The following example demonstrates the application of Def. 3 along a program containing a loop, which is abstracted using a loop invariant.

Example 1

(Concretization of SES) We consider a program \(p\) which decrements a nonnegative variable \(\texttt{x}\) inside a loop until it reaches 0 and adds 2 afterward (a minimal sketch; we assume the variable is named \(\texttt{x}\), as used below):
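    // assume x >= 0 (precondition)
    while (x > 0) {
        x--;
    }
    x += 2;
    // to show: x == 2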

Assume we want to show that \(\texttt{x}\) always has the value 2 after \(p\) terminates. Since the initial value of \(\texttt{x}\), and therefore the number of loop iterations, is unknown, we abstract the loop with an invariant. SE starts with the initial SES \((\{\texttt{x} \ge 0\}, \textit{skip}, p)\), where the path condition contains the precondition that \(\texttt{x}\) is nonnegative. A suitable loop invariant for the loop in \(p\) is \(\texttt{x} \ge 0\). This loop invariant is inductive and strong enough to imply the postcondition: together with the negated loop guard \(\texttt{x} \le 0\), which holds after termination of the loop, it is sufficiently strong to infer that \(\texttt{x}\) is 0 after loop termination. After the application of a typical loop invariant rule, one has to show, as a side condition, that \(\texttt{x} \ge 0\) is an inductive loop invariant. Hence, we obtain the successor state \((\{c \ge 0, c \le 0\}, \texttt{x} := c, \texttt{x += 2;})\), where \(\texttt{x} := c\) is an anonymizing update with a Skolem constant \(c\), \(c\ge 0\) the loop invariant, and \(c\le 0\) the branch condition signifying that the loop has been exited. Semantics-preserving simplification of the path condition yields the state \(s = (\{c \doteq 0\}, \texttt{x} := c, \texttt{x += 2;})\). Its semantics is computed as follows:

\[
\begin{aligned}
\llbracket{}s\rrbracket
&= \bigcup_{\mathcal{K}}\bigcup_{\sigma\in{}S} \textit{concr}_{\mathcal{K}}((\{c \doteq 0\}, \texttt{x} := c, \texttt{x += 2;}), \sigma) \\
&\overset{(*)}{=} \bigcup_{\sigma\in{}S} \textit{concr}((\emptyset, \texttt{x} := 0, \texttt{x += 2;}), \sigma)
= \{\,\sigma[\texttt{x} \mapsto 2] \mid \sigma\in{}S\,\}
\end{aligned}
\]

Consequently, \(s\) represents all concrete states where \(\texttt{x}\) attains the value 2. Step (\(*\)) results from the following considerations: if all structures \(\mathcal{K}\) in the specified set are such that \(\mathcal{I}(c)=0\), then the transformers created for \(\texttt{x} := c\) are equivalent to those created for \(\texttt{x} := 0\). After this simplification, there remain no more uninterpreted function symbols and the union over all \(\mathcal{K}\) can be omitted.

For any number \(k\le 0\), the formula \(\texttt{x} \ge k\) is also a valid (though not sufficiently strong) loop invariant. If \(k\) is strictly negative, the SES resulting after executing the loop has more than one concretization. If we choose \(k\text {:}{=}-1\), for example, \(\texttt{x}\) can attain the values \(-1\) or 0 after the loop. Consequently, the concretizations for the final SES after the program comprise some states where \(\texttt{x}\) attains the value 2, and some where it attains only the value 1: we are in the realm of overapproximating SE.

In this paper, an SE transition relation maps an SES to a non-empty set of successor SESs. The big-step extension of an SE transition relation is its reflexive-transitive closure.

Most practical SE transition relations can be defined as a set of schematic SE rules, where each such rule represents a family of SE transitions (one transition for each consistent instantiation of the contained schematic placeholders). We use sequent calculus notation: let \(i\) and \(o_1, \dots , o_n\), for \(n\ge 0\), be SESs. The SE rule

\[
\textit{ruleName}\ \ \frac{o_1 \quad \cdots \quad o_n}{i}
\]

represents all instances of the SE transitions resulting from consistent replacement of schematic placeholders in the input and output states. The rule is read bottom-up, has a name (here “ruleName”) written on the left, and may have conditions written on the right. In the following, we show two example SE rules.

Fig. 2: Example SE Rules

Example 2

(SE Rules) Fig. 2 shows two SE rules, for assignments and conditional statements. Schematic placeholders are \(C\), \(\mathcal {U}\), \(\textit{se}\), \(\pi \), \(\omega \), etc. The schema variable \(\mathcal {U}\) can be instantiated to any update, \(\pi \omega \) to a Java context, \(\textit{se}\) to a side effect-free expression, and so on. In the Java context \(\pi \omega \), \(\pi \) is an inactive prefix containing opening braces, labels, and the opening of various scoping frames for methods, exceptions, etc. Scopes are created by SE rules on the fly to ensure that only normally terminating programs ever need to be symbolically executed. For example, inlined method bodies are scoped inside a “method-frame(...): {”. Dually, \(\omega \) consists of closing braces, the remaining program to execute symbolically, and closings of scopes such as catch clauses. Together, \(\pi \omega \) constitute a valid Java program. Let, for instance,

\[
i = (C,\ \mathcal{U},\ \pi\ \texttt{x = se;}\ \omega)
\qquad
o = (C,\ \mathcal{U} \mathbin{;} (\texttt{x} := \textit{se}),\ \pi\,\omega)
\]

be two SESs. Then, the assignment rule of Fig. 2 covers the SE transition \((i, \{o\})\).
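Under the stated notation, the assignment rule could be rendered as follows (a sketch; the precise rule in Fig. 2 may carry additional side conditions, e.g., on \(\textit{se}\) being side effect-free):

\[
\textit{assignment}\ \ \frac{(C,\ \mathcal{U} \mathbin{;} (\texttt{x} := \textit{se}),\ \pi\,\omega)}{(C,\ \mathcal{U},\ \pi\ \texttt{x = se;}\ \omega)}
\]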

We define two aspects of the correctness of symbolic transition relations: exhaustiveness and precision. These properties are comparable to “recall” and “precision” in binary classification. Exhaustiveness is the property that during a symbolic transition, the set of concrete states represented by an input state is not decreased, whereas precision is the property that this set is not increased. The definitions (called “strong” exhaustiveness and precision in [76]) fix the interpretations of uninterpreted function and predicate symbols across transitions. For exhaustiveness, fresh symbols may be created by a transition. This is, for instance, needed in loop invariant rules where assigned locations in loop bodies are anonymized using fresh constants, see Example 1.

Definition 4

(Exhaustive SE Transition Relation) An SE transition relation \(\delta\) is called exhaustive iff for each transition \((i,O)\in\delta\), structure \(\mathcal{K}\) and concrete states \(\sigma, \sigma'\), it holds that \(\sigma'\in\textit{concr}_{\mathcal{K}}(i,\sigma)\) implies that there is (1) a “conservative extension” \(\mathcal{K}'\) of \(\mathcal{K}\) interpreting all function and predicate symbols occurring in \(i\) in the same way as \(\mathcal{K}\) (in particular, \(\mathcal{K}'\) may interpret fresh symbols not occurring in \(i\)), (2) an SES \(o\in {}O\) and (3) a concrete state \(\sigma''\in\textit{concr}_{\mathcal{K}'}(o,\sigma)\) s.t. \(\sigma''=\sigma'\).

Definition 5

(Precise SE Transition Relation) An SE transition relation \(\delta\) is called precise iff for each transition \((i,O)\in\delta\), \(o\in {}O\), structure \(\mathcal{K}\) and concrete states \(\sigma, \sigma'\), it holds that \(\sigma'\in\textit{concr}_{\mathcal{K}}(o,\sigma)\) implies that there is a concrete state \(\sigma''\in\textit{concr}_{\mathcal{K}}(i,\sigma)\) s.t. \(\sigma''=\sigma'\).

Exhaustiveness is a crucial property for SE used in program verification, whereas precision is important for uncovering bugs and creating feasible test cases. Lems. 1 and 2 (proven in [76]) below formalize this intuition. To express them, we first define the concept of a labeled SES. A labeled SES \(s^\varphi\) denotes the weakest precondition of \(s\) relative to postcondition \(\varphi \). If this precondition is valid, the labeled SES is called valid.

Definition 6

(Labeled Symbolic Execution State) We write \(s^\varphi\) for the Labeled Symbolic Execution State with postcondition \(\varphi\in\textit{Fml}\). Its semantics \(\llbracket{}s^\varphi\rrbracket\) is defined such that for all formulas \(\psi \), \(\llbracket{}s^\varphi\rrbracket\) implies \(\psi\) if, and only if, it holds that \(\sigma\models\psi\) for all \(\mathcal{K}\) and states \(\sigma\) such that \(\sigma'\models\varphi\) for all \(\sigma'\in\textit{concr}_{\mathcal{K}}(s,\sigma)\).

For example, the weakest precondition of the SES \((\{c \doteq 0\}, \texttt{x} := c, \texttt{x += 2;})\) from Example 1 relative to postcondition \(\texttt{x} \doteq 2\) is \(c \doteq 0 \rightarrow \{\texttt{x} := c\}[\texttt{x += 2;}](\texttt{x} \doteq 2)\). Consequently, the labeled SES is valid iff \(c \doteq 0 \rightarrow c + 2 \doteq 2\) is valid, which is the case.

Lemma 1

(Bugs discovered by precise SE are feasible) Let \(\delta\) be a precise SE transition relation and \((i,O)\in\delta\). If a postcondition \(\varphi\) is not true for a state \(o\in {}O\), i.e., \(\not \models {}o^\varphi \), it follows that \(\not\models{}i^\varphi\).

Lemma 2

(A property proven by exhaustive SE holds for the inputs) Let \(\delta\) be an exhaustive SE transition relation and \((i,O)\in\delta\). If a postcondition \(\varphi\), which only contains rigid symbols already present in \(i\), holds for all states \(o\in {}O\), i.e., \(\models {}o^\varphi \), it follows that \(\models{}i^\varphi\) holds.

In Sect. 4, we devise SE rules for AE that are both precise and exhaustive with respect to the semantics of AE defined in the following section.

3 Syntax and Semantics of Abstract Execution

In Sect. 1.2, we introduced the essentials of the concrete syntax of our specification framework for abstract programs. Here, we explain features of the language that we omitted so far, define its abstract syntax and give it a semantics. Finally, we introduce abstract updates, a syntactic concept for second-order symbolic state changes.

3.1 Specification Language

Listing 3: An abstract example program demonstrating additional specification language features

We demonstrate additional specification language features along the abstract example program in Listing 3, which contains almost all features not yet mentioned in Sect. 1.2. Our specification syntax admits writing a single location where a location set is expected; it is then interpreted as the singleton set containing that location. Intuitively, instantiations of this abstract program complete normally without changing the state, or else they complete because of an exception thrown by AS P. In the resulting state after the thrown exception, the constrained program variable attains the final value \(-1\). Additionally, the locations to which the abstract location set \(\textit{frameP}\) is instantiated might be changed. One possible valid instance of the abstract program is a concrete program instantiating \(\textit{frameP}\) and \(\textit{footprintP}\) each with a singleton program variable location.

Enforcing Assignments and Specifying “All” or “Nothing”

As a default, locations listed in assignable clauses are upper bounds: the set of represented concrete programs includes programs not assigning anything. Yet, sometimes one wants to enforce the assignment of a specific location. This can be done using the \hasTo keyword, which is a JML extension specific to AE. For example, line 16 in Listing 3 imposes that all instantiations of the AS assign the specified variable. Additional keywords that can be used in assignable and accessible clauses are “\nothing” (no location may be assigned, used in line 9) and “\everything” (any location may be assigned, the default).
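Schematically, such a specification could read as follows (a sketch; the identifier P and the variable x are placeholders):

    /*@ assignable \hasTo(x);   // every instantiation must assign x
      @ accessible \nothing;    // ...and may not read any location
      @*/
    \abstract_statement P;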

Notation

We occasionally use a simplified syntax for APEs instead of the concrete syntax with specification comments: the notations \(\texttt{P}(\textit{fr} \mathbin{:\approx} \textit{fp})\) and \(\texttt{e}(\textit{fr} \mathbin{:\approx} \textit{fp})\) represent an AS and an AExp with identifier symbols \(\texttt{P}\) and \(\texttt{e}\), respectively, both with frame \(\textit{fr}\), footprint \(\textit{fp}\) and, unless otherwise stated, expected to complete normally. We generally use capital letters \(\texttt{P}\), \(\texttt{Q}\), ... for AS identifiers and lower case letters \(\texttt{e}\), \(\texttt{f}\), ... for AExp identifiers. We distinguish has-to locations by a superscript exclamation mark, as in \(\texttt{x}^!\).

The \hasTo specifier enables simplification steps that would not be possible otherwise. Consider, for example, the following program, where an AS should operate on a (not “relevant”) temporary variable \(\texttt{tmp}\) instead of the variable \(\texttt{x}\) and is therefore surrounded by a set and a reset statement as follows: “\(\texttt{tmp = x;}\ \texttt{P}(\texttt{tmp}^! \mathbin{:\approx} \texttt{tmp})\texttt{;}\ \texttt{x = tmp;}\)”. Since \(\texttt{P}\) has to assign \(\texttt{tmp}\), we can drop the set statement after accounting for the assignment of \(\texttt{x}\) in the footprint of \(\texttt{P}\), without changing the semantics of the program, resulting in “\(\texttt{P}(\texttt{tmp}^! \mathbin{:\approx} \texttt{x})\texttt{;}\ \texttt{x = tmp;}\)”. Assuming that \(\texttt{tmp}\) is not read after this program fragment, and again using the information that it has to be assigned by \(\texttt{P}\), we can merge the remaining statements and obtain “\(\texttt{P}(\texttt{x}^! \mathbin{:\approx} \texttt{x})\texttt{;}\)”. These simplifications would not have been possible without \hasTo. For instance, the set statement could not have been dropped in the first simplification step: without \hasTo, \(\texttt{P}\) also represented the empty statement, and the set statement was still effective.

The order of frame and footprint specifications of APEs matters. The program “\(\texttt{P}(\texttt{x}, \texttt{y} \mathbin{:\approx} \texttt{z})\texttt{;}\)” has the same effect on \((\texttt{x}, \texttt{y})\) that “\(\texttt{P}(\texttt{v}, \texttt{w} \mathbin{:\approx} \texttt{z})\texttt{;}\)” has on \((\texttt{v}, \texttt{w})\) (since both AS declarations have the same identifier); however, the effect of “\(\texttt{P}(\texttt{y}, \texttt{x} \mathbin{:\approx} \texttt{z})\texttt{;}\)” (with frame elements swapped) on \((\texttt{x}, \texttt{y})\) will be different.

In addition to \disjoint, which we used in Sect. 1.2 for declaring the disjointness of location sets, the specification language supports the JML set operators \intersect (intersection), \set_minus (set difference), \set_union (set union), and \subset (subset) (see also [5, Sect. 9.3]). In Listing 3, we declare in line 3 that \(\textit{frameP}\) is a subset of the set \(\textit{rel}\) of all relevant locations.
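For instance, the subset declaration from line 3 might be written as follows (a sketch using the operators named above):

    //@ ae_constraint \subset(frameP, rel);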

In Sect. 1.2, we explained how to couple the abrupt completion behavior of APEs to expressions built from abstract predicates. Listing 3 likewise couples the exceptional behavior of AS P to an abstract predicate over its footprint. In addition to such preconditions, we can also specify functional postconditions, i.e., guarantees on the state after execution of the APE. The specification in line 18 imposes that if the AS completes normally, the constrained variable has to be nonnegative afterward; similarly, if it completes due to a thrown exception, the variable has to equal \(-1\) (line 21).

A noteworthy construction is used in the functional postcondition of the AExp in line 11: this boolean expression has to evaluate to true (i.e., its “\result” is true) iff the condition of the enclosing if statement also holds. Consequently, AS P always throws an exception, since otherwise, the body of the if statement would not be executed. The AExp itself always completes normally, since line 12 stipulates that it completes exceptionally iff false holds—that is, never. The condition asserted in line 25 of the listing is true (even though the statement is unreachable, as P always completes exceptionally). This is because we defined in lines 2 and 5 that all frame elements of P are disjoint from \(\textit{footprintP}\), which thus retains its original value after execution of P.

AE supports additional specification cases for abrupt completion due to a (labeled) continue or (labeled) break from a block or loop. In the implementation, these cases are only considered if the specified APE occurs inside a loop (or labeled block). To specify them, one uses the keywords “continue_behavior,” “break_behavior,” “continue_behavior (lbl),” or “break_behavior (lbl)” (where \(\textit{lbl}\) is a label occurring in the context) similarly to the “return_behavior” and “exceptional_behavior” cases used before.

3.2 Abstract Syntax

Abstract Execution analyzes abstract program fragments. A program fragment is a sequence of statements that could occur inside a method body. An abstract program fragment contains at least one APE, as well as declarations of abstract location sets, function and predicate symbols, and constraints on these elements.

We first extend the \(\textit{LocSet}\) theory to accommodate the subsequent definitions, and then formally define APEs and abstract program fragments.

3.2.1 Program Variable Locations

The \(\textit{LocSet}\) theory in JavaDL has been designed to represent heap locations \((o,f)\). We extend it to also represent program variable locations. Please keep in mind that these refer to program variable locations of a given name, not to the value of that variable in some state. We extend the vocabulary of \(\textit{LocSet}\) by the functions described below.

The function \(\textit{pv}\) is a constructor for program variable locations from a program variable symbol. If, for example, \(\texttt{x}\) is a program variable, \(\textit{pv}(\texttt{x})\) maps it to the corresponding location. The difference between \(\textit{pv}(\texttt{x})\) and \(\texttt{x}\) is that the latter is affected by state changes, while the former is not: we have \(\{\texttt{x} := 17\}\textit{pv}(\texttt{x}) = \textit{pv}(\texttt{x})\), but \(\{\texttt{x} := 17\}\texttt{x} = 17\). An expression \(\textit{hasTo}(\textit{set})\) represents the same locations as \(\textit{set}\); its purpose is to mark locations that have to be overwritten in assignable specifications. The semantics of a term \(\textit{value}(\textit{set})\) are the values attained by the locations represented by the location set \(\textit{set}\). For instance, the meaning of \(\textit{value}(\textit{pv}(\texttt{x}))\) is the value of program variable \(\texttt{x}\) in the current state. Two further functions are filters restricting a location set to its heap and its program variable locations, respectively. A term for conditional anonymization of program variables evaluates to the program variable location \(\textit{pv}(\texttt{x})\) if it is contained in \(\textit{set}\), and to the empty location set otherwise.

For simplicity, we write \(\texttt{x}\) for \(\textit{pv}(\texttt{x})\) where the intention is clear from the context, and use standard set notation for location sets, e.g., \(\textit{loc}\in \textit{set}\) expresses that the location \(\textit{loc}\) is in \(\textit{set}\). Dynamic frame specification variables are encoded as uninterpreted constant symbols of type \(\textit{LocSet}\).

3.2.2 Abstract Program Elements and Fragments

APEs are tuples of (1) an identifier, (2) a type (ASs have a designated pseudo-type for statements), (3) a frame and (4) a footprint specification, (5) a termination specifier, stating either that the APE has to terminate or that it may diverge, and (6) a set of specifications, especially for sufficient and necessary preconditions of abrupt completion behavior. Specifications also comprise postconditions. In relational verification, the postcondition is frequently omitted and thus logically equals \(\textit{true}\). Normal completion does not have a precondition in AE; an APE completes normally iff it does not complete abruptly. We continue writing “APE P” short for “the APE with identifier symbol P.” Subsequently, we formally define the abstract syntax of APEs.

Definition 7

(Abstract Program Element) An Abstract Program Element is a tuple

\[
(\textit{id},\ \textit{type},\ \textit{frame},\ \textit{footprint},\ \textit{termSpec},\ \textit{specs})
\]

of an identifier \(\textit{id}\), a type \(\textit{type}\) (the designated statement pseudo-type for ASs), a frame specification \(\textit{frame}\) and a footprint specification \(\textit{footprint}\) (both tuples of terms of type \(\textit{LocSet}\)), a termination specifier \(\textit{termSpec}\) and behavioral specifications \(\textit{specs}\). The latter is a tuple of the form

\[
(\textit{normalPost},\ \textit{returnsSpec},\ \textit{excSpec},\ \textit{continuesSpec},\ \textit{breaksSpec},\ \textit{continuesSpecLbl},\ \textit{breaksSpecLbl})
\]

where (1) \(\textit{normalPost}\) is the postcondition for normal completion, (2) \(\textit{returnsSpec}\), \(\textit{excSpec}\), \(\textit{continuesSpec}\), \(\textit{breaksSpec}\) are pairs of formulas defining pre- and postconditions for abrupt completion of the APE due to a return, a thrown exception, a continue, and a break, respectively, (3) \(\textit{continuesSpecLbl}\), \(\textit{breaksSpecLbl}\) are partial functions from Java labels to pairs of pre- and postconditions for abrupt completion due to a labeled continue or labeled break, (4) all preconditions are mutually exclusive, and (5) pre- and postconditions may contain local variables of the context, and the special program variables \(\texttt{heap}\), to access heap locations, \(\texttt{heapBefore}\), to access heap locations in the state before the APE was executed, \(\texttt{exc}\), to refer to the exception in the case that the APE completes abruptly due to a thrown exception (postcondition of \(\textit{excSpec}\) only), and \(\texttt{result}\), to refer to the result value returned by the AS (postcondition of \(\textit{returnsSpec}\) only).

Abstract Program Fragments (APFs) contain at least one APE, along with global declarations of AE specification variables and constraints on them (imposed via ae_constraint). We distinguish two types of specification variables: abstract location sets (for dynamic frames and footprints), and abstract function and predicate symbols used in the abstract specification of the behavior of APEs. The APF defines the domains of the specification elements \(\textit{continuesSpecLbl}\) and \(\textit{breaksSpecLbl}\): APEs must supply pre- and postconditions for exactly the labels in the context of their appearance in the APF. Constraints can also be declared locally within an APF to, e.g., refer to globally unavailable locations such as the exception variable of a catch clause. They are w.l.o.g. treated globally in the following definition: local constraints can be converted to global ones by interpreting them in the symbolic state of their occurrence.

Definition 8

(Abstract Program Fragments) An Abstract Program Fragment is a tuple \((p,\textit{APEs},\textit{locSpecVars}, \textit{funcAndPredSymbols},\textit{constraints})\), where (1) \(p\) is a sequence of statements containing exactly the APEs in the non-empty set \(\textit{APEs}\), (2) \(\textit{locSpecVars}\) is a set of dynamic frame specification variables (\(\textit{LocSet}\) constants), comprising the symbols used in \(\textit{APEs}\), (3) \(\textit{funcAndPredSymbols}\) is a set of abstract function and predicate symbols used in pre- and postconditions of the APEs, and (4) \(\textit{constraints}\) is a set of formulas constraining the behavior of the APEs.

3.3 Semantics of Abstract Program Elements and Fragments

We define the semantics of the AE framework indirectly by reduction to JavaDL: A statement or expression is an instance of an AS or AExp if it satisfies a JavaDL formula that serves as its logical representation. This approach results in longer definitions than a direct model-theoretic semantics, but has notable advantages:

(1) The translation to a formal program logic enforces the precise description of legal instances and does not permit omitting important details.

(2) The semantics is constructive in the sense that it gives rise to a directly implementable approach to verify that a given program fragment instantiates an APF.

(3) The previous two points facilitate validation, which, given the complex definition, is an important aspect.

The logical representation of an APF needs to cover the following aspects: (1) the frame specification, including (2) the specific semantics of \hasTo (motivated in Sect. 3.1), (3) the footprint specification, (4) the termination condition, (5) the contract for normal completion, including \(\textit{normalPost}\), and (6) the contract for the various cases of abrupt completion. The conjunction \(\textit{represents}(\textit{ape},p)\) of formulas evaluates to \(\textit{true}\) iff the program \(p\) is a legal instance of (is represented by) the APE \(\textit{ape}\). The first four conjuncts correspond to cases (1)–(4) above. The next two conjuncts relate to case (5), and the remaining conjuncts to case (6), where the two formulas \(\textit{breaksForLbl}(\textit{breaksSpecLbl},\textit{lb},p)\) and \(\textit{continuesForLbl}(\textit{continuesSpecLbl},\textit{lb},p)\) for labeled breaks and continues, respectively, receive an additional parameter \(\textit{lb}\) representing the specific label to be considered.


For the sake of readability, we moved the formal definitions to Appendix A. Based on \(\textit{represents}\), we define the semantics of a single APE as follows:

Definition 9

(Semantics of APE) Let \(\textit{abstrStmt}\) be an AS. Its semantics is the set of all concrete statements represented by it, formally: \(\llbracket\textit{abstrStmt}\rrbracket := \{\,p \mid {}\models\textit{represents}(\textit{abstrStmt}, p)\,\}\).

The definition works accordingly for AExps.

Legal instantiations of Abstract Program Fragments first have to provide instantiations of the APE specification variables (i.e., of abstract location sets, and function and predicate symbols) satisfying the global constraints; second, they have to provide legal and consistent instantiations of the APEs s.t. the resulting program is a legal concrete program fragment. An instantiation of a set of APEs is consistent if any two APEs with the same identifier are instantiated by statements or expressions that are equal up to the renaming of used locations, where renaming applies if the frame and footprint definitions of the APE occurrences they are instantiating differ.

In our subsequent definition of the semantics of APFs, the notation \(S[\textit{subst}]\) denotes the result of applying the substitution \(\textit{subst}\) to all elements of the set \(S\); similarly for programs (or program elements) \(p\) instead of sets.

Definition 10

(Semantics of Abstract Program Fragment) Let \(\mathcal{F}=(p,\textit{APEs},\textit{locSpecVars},\textit{funcAndPredSymbols},\textit{constraints})\) be an APF. A program fragment \(p^0\) is a legal instantiation of \(\mathcal {F}\) if it arises from a substitution \(\textit{subst}_\textit{locSpecVars}\) of concrete locations for specification variables, a substitution \(\textit{subst}_\textit{funcAndPredSymbols}\) for abstract function and predicate symbols, as well as an instantiation \(\textit{subst}_\textit{APEs}\) of concrete statements and expressions for APEs such that:

(1) \(\textit{subst}_\textit{locSpecVars}\) substitutes concrete heap locations or program variables for elements of \(\textit{locSpecVars}\).

(2) \(\textit{subst}_\textit{APEs}\) substitutes statements or expressions for elements of \(\textit{APEs}\).

(3) \(\textit{subst}_\textit{funcAndPredSymbols}\) substitutes, for elements of \(\textit{funcAndPredSymbols}\), JavaDL terms or formulas containing at most locations corresponding to the arguments passed to the substituted symbol after applying \(\textit{subst}_\textit{locSpecVars}\).

(4) The formulas \(\textit{constraints}[\textit{subst}_\textit{funcAndPredSymbols}][\textit{subst}_\textit{locSpecVars}]\) are valid (i.e., the global constraints on AE specification variables are satisfied).

(5) \(p^0=p[\textit{subst}_\textit{APEs}]\).

(6) Each instantiation in \(\textit{subst}_\textit{APEs}\) is represented by the APE it instantiates, respecting the instantiations of specification variables: For all \(\textit{ape}\in\textit{APEs}\), it holds that \(\models\textit{represents}(\textit{ape}[\textit{subst}_\textit{funcAndPredSymbols}][\textit{subst}_\textit{locSpecVars}],\ \textit{ape}[\textit{subst}_\textit{APEs}])\).

(7) Each instantiation in \(\textit{subst}_\textit{APEs}\) is consistent: For all APEs \(\textit{ape}_1\), \(\textit{ape}_2\) with the same identifier symbol, it holds that \(\textit{ape}_1[\textit{subst}_\textit{APEs}]\) and \(\textit{ape}_2[\textit{subst}_\textit{APEs}]\) are equal modulo renaming of elements in the frame and footprint specifications in \(\textit{ape}_1\) and \(\textit{ape}_2\) (after applying \(\textit{subst}_\textit{locSpecVars}\)).

The semantics of \(\mathcal {F}\) is then defined as the set of its legal instantiations.

\[
\llbracket\mathcal{F}\rrbracket := \{\,p^0 \mid p^0 \text{ is a legal instantiation of } \mathcal{F}\,\}
\]

Example 3

(Instantiating APFs) The abstract program model in Listing 4, a simplified version of Listing 1 from the introduction, represents a transformation rule swapping two statements that are independent, i.e., cannot overwrite state changes nor interfere with the footprint of the other statement. In addition, at most one statement may complete abruptly (we only consider exceptions and returns in this example). We show that a concrete program fragment \(p^0\) consisting of three assignments is an instance of the abstract model by constructing substitutions as required by Def. 10 that yield \(p^0\). Intuitively, the first assignment instantiates AS P, and the latter two AS Q. First, we instantiate \(\textit{frameP}\), \(\textit{fpP}\), \(\textit{frameQ}\), and \(\textit{fpQ}\) with the program variable locations written and read by the respective assignments. This instantiation satisfies the constraints specified in lines 2–4 in the listing (required by Condition (4) in Def. 10).
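A fragment of this shape could, for instance, look as follows (a hypothetical instance; the original concrete fragment is not fixed here). Under this choice, \(\textit{frameP}\mapsto\texttt{x}\), \(\textit{fpP}\mapsto\texttt{y}\), \(\textit{frameQ}\mapsto\{\texttt{z},\texttt{w}\}\), and \(\textit{fpQ}\mapsto\texttt{w}\):

    x = 100 / y;   // instantiates AS P; throws an ArithmeticException iff y == 0
    z = w;         // instantiates AS Q ...
    w = 42;        // ... together with this statement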

Our instantiation of AS P throws an ArithmeticException if the divisor (\(\texttt{y}\) in the sketch above) is zero; consequently, we instantiate \(\textit{throwsExcP}\) to \(\texttt{y} \doteq 0\). Condition (3) requires that this instantiation uses at most locations corresponding to the “footprint” of the occurrences of the symbol \(\textit{throwsExcP}\) in the abstract program, i.e., \(\textit{fpP}\), after substituting abstract location sets. Since we instantiate \(\textit{fpP}\) to \(\texttt{y}\), the term \(\texttt{y} \doteq 0\), which accesses only \(\texttt{y}\), satisfies this condition. All other abstract predicates occurring in the program are instantiated to \(\textit{false}\), which trivially satisfies the requirement on accessed locations.

With this instantiation of the abstract predicate symbols, Condition (4) is satisfied: since only one abstract predicate is not instantiated to \(\textit{false}\), it is easy to see that mutual exclusion of the abrupt completion conditions (lines 6–9 in the listing) is ensured.

By substituting AS P and AS Q with the respective assignments, we obtain \(p^0\) and the validity of Condition (5). Condition (6) refers to Def. 9. For example, we have to show that the two assignments instantiating Q are represented by the following result of instantiating abstract location set and predicate symbols in AS Q:

\[
\texttt{Q}(\texttt{z}, \texttt{w} \mathbin{:\approx} \texttt{w})
\]

We refrain from discussing the details of this condition for brevity. Intuitively, this partially instantiated AS represents all normally completing statements assigning at most \(\texttt{z}\) and \(\texttt{w}\) while accessing at most \(\texttt{w}\), which comprises the two assignments instantiating Q.

The instantiation trivially satisfies Condition (7) since there are no two APE occurrences with the same identifier symbol.

3.4 Syntax and Semantics of Abstract Updates

Abstract updates are the main building block of the AE calculus. They represent syntactically unboundedly many concrete state changes. Abstract Execution turns APEs into abstract updates. While a concrete update \(\texttt{x} := t\) assigns the value of a term \(t\) to a concrete variable \(\texttt{x}\), an abstract update \(U(\textit{lhs}_1,\dots,\textit{lhs}_n \mathbin{:\approx} \textit{rhs}_1,\dots,\textit{rhs}_m)\) has multiple left-hand and right-hand slots. It represents all state changes writing to any subset of the left-hand side locations, where the assigned values may only be built from combinations of constants and the memory locations specified on the right. Both left- and right-hand sides may be empty; also, the \(\textit{lhs}_i\) / \(\textit{rhs}_j\) may be abstract location sets instead of concrete program variables. Like APEs, abstract updates have an identifier symbol such as \(U\) above, with the same semantic implication: abstract updates with the same identifier represent the same state changes, parametric in the arguments they are passed. We syntactically connect abstract updates and APEs by their names, as in \(U_{\texttt{P}}\) for AS P, but do not enforce a semantic connection.

We define the syntactic category of abstract update symbols. An abstract update symbol is an operator with a name (such as \(U_{\texttt{P}}\)), a list of parameters (the assignable locations), and an arity. Abstract updates are created from the application of an abstract update symbol to a list of terms (the right-hand sides, or “accessibles”). The length of the list has to match the arity of the symbol. Abstract updates can be used in the construction of sequential and parallel updates and update applications.

Definition 11

(Abstract Update Symbol) We define an abstract update symbol as an identifier \(U\), an \(n\)-tuple of parameter terms \((\textit{lhs}_1,\dots,\textit{lhs}_n)\), and an arity \(m\) (with \(n,m\ge 0\)). Each abstract update symbol with the same identifier has (1) the same number \(n\) of assignable locations, and (2) the same arity \(m\). The set of all abstract update symbols is denoted by \(\textit{AbsUpdSym}\). To the set \(\textit{Upd}\) of updates we add, for each abstract update symbol, abstract updates \(U(\textit{lhs}_1,\dots,\textit{lhs}_n \mathbin{:\approx} t_1,\dots,t_m)\), which may occur in compound update constructions. The right-hand side is an \(m\)-tuple of argument terms \(t_1,\dots,t_m\), where \(m\) is the arity of the abstract update symbol.

To define the semantics of abstract updates, we extend the interpretation function \(\mathcal{I}\) of Kripke structures such that \(\mathcal{I}(U)\) returns a function that, depending on the values of the right-hand side of an abstract update, returns a state transformer. We then extend the valuation function of dynamic logic accordingly. The interpretation of an abstract update symbol has to respect its “frame” (i.e., its left-hand sides \(\textit{lhs}_1,\dots,\textit{lhs}_n\)). Furthermore, we have to ensure that the interpretation of abstract updates with the same identifier is equivalent “modulo frame changes.” For instance, the abstract update \(U(\texttt{x} \mathbin{:\approx} t)\) should have the same effect on \(\texttt{x}\) that \(U(\texttt{y} \mathbin{:\approx} t)\) has on \(\texttt{y}\): it has to hold that \(\models \texttt{x} \doteq \texttt{y} \rightarrow \{U(\texttt{x} \mathbin{:\approx} t)\}\texttt{x} \doteq \{U(\texttt{y} \mathbin{:\approx} t)\}\texttt{y}\). We need the premise \(\texttt{x} \doteq \texttt{y}\) because the left-hand sides in the abstract updates are not declared as “has-to:” they do not have to be written, in which case the variables have to be equal in the pre-state for the equality to hold. Omitting the premise yields the constraint on the semantics of has-to left-hand sides.

Definition 12

(Semantics of Abstract Update) An interpretation function \(\mathcal{I}\) of a Kripke structure assigns to a symbol \(U(\textit{lhs}_1,\dots,\textit{lhs}_n)\) with arity \(m\) a function \(\mathcal{I}(U): D^m \to (S \to S)\), such that:

(1) Frame Condition: Let \(\overline{d}\in{}D^m\) and \(\sigma' = \mathcal{I}(U)(\overline{d})(\sigma)\). For all locations \(\textit{loc}\), it holds that either \(\textit{loc}\in\textit{lhs}_1\cup\dots\cup\textit{lhs}_n\), or \(\sigma'(\textit{loc}) = \sigma(\textit{loc})\).

(2) State Transformers for Same Identifier Are Equivalent: Let, for any \(i=1,\dots ,n\), \(s_i\) be the parameter terms of \(U(s_1,\dots,s_n)\), and \(\overline{d}\in{}D^m\). For all location set terms \(s_i'\) representing the same number of concrete locations as \(s_i\) (i.e., \(|s_i'| = |s_i|\)), there has to be a bijective mapping \(\iota \) between the locations of \(s_1,\dots,s_n\) and those of \(s_1',\dots,s_n'\), such that for all locations \(\textit{loc}\) and states \(\sigma\) with \(\sigma(\textit{loc}) = \sigma(\iota(\textit{loc}))\), it holds that \(\sigma_1(\textit{loc}) = \sigma_2(\iota(\textit{loc}))\), where \(\sigma_1 = \mathcal{I}(U(s_1,\dots,s_n))(\overline{d})(\sigma)\) and \(\sigma_2 = \mathcal{I}(U(s_1',\dots,s_n'))(\overline{d})(\sigma)\).

(3) Has-To Condition: For has-to locations \(\textit{loc}\) and \(\iota(\textit{loc})\), the requirement of Condition (2) has to hold without the premise \(\sigma(\textit{loc}) = \sigma(\iota(\textit{loc}))\).

The mapping \(\iota \) in Condition (2) of Def. 12 is required because a single element of the parameter tuple of an abstract update symbol can be an abstract location set and therefore represent many concrete locations. The definition requires that the state transformers created for two abstract updates with the same identifier and equal arguments transform a pre-state \(\sigma\), where a location \(\textit{loc}\) has the same value as its corresponding location \(\iota(\textit{loc})\), to a state where they still have the same value (though potentially different from the value in \(\sigma\)). If \(\textit{loc}\) is a “has-to” location, the value in the resulting state will be equal independent of the value in the pre-state.

Extending the valuation function is straightforward.

Definition 13

(Valuation of Abstract Update) We extend the JavaDL valuation function as follows, for an abstract update symbol \(U\) with arity \(m\):

\[
\textit{val}_{\mathcal{K},\sigma,\beta}\big(U(\overline{\textit{lhs}} \mathbin{:\approx} t_1,\dots,t_m)\big)
= \mathcal{I}(U)\big(\textit{val}_{\mathcal{K},\sigma,\beta}(t_1),\dots,\textit{val}_{\mathcal{K},\sigma,\beta}(t_m)\big)
\]

Subsequently, we conclude the section on the syntax and semantics of Abstract Execution by considering Abstract Symbolic Execution States.

3.5 Syntax and Semantics of Abstract Symbolic Execution States

We generalize the notion of SES from Sect. 2.2. The only changes are that due to Def. 11, symbolic stores can also include abstract updates, and we use APFs instead of concrete program fragments as program counters.

Definition 14

(Abstract Symbolic Execution State) An Abstract SES is a triple \((C, \mathcal{U}, \mathcal{F})\) of (1) a set of closed formulas \(C\), the path condition, (2) a (potentially abstract) update \(\mathcal{U}\), the symbolic store, and (3) an Abstract Program Fragment \(\mathcal {F}\), the program counter. We write \(\textit{SEStates}^{\textit{abs}}\) for the set of all abstract SESs.

As in the concrete case, the semantics of abstract SESs is based on the concept of concretization functions. The concretization function for abstract SESs takes a concrete program fragment as additional argument: if a given concrete program is represented by the abstract program counter, the concretization for this program is part of the semantics of the abstract SES; otherwise, the result is the empty set. The semantics of the abstract SES is obtained by constructing the union over the set of all concrete program fragments.

Definition 15

(Semantics of Abstract SES) The \(\mathcal{K}\)-indexed abstract concretization function maps an abstract SES \(s = (C, \mathcal{U}, \mathcal{F})\), a concrete state \(\sigma\) and a concrete program element \(p\) either (1) to the empty set \(\emptyset \) if \(p\notin\llbracket\mathcal{F}\rrbracket\), or (2) to the set \(\textit{concr}_{\mathcal{K}}((C, \mathcal{U}, p), \sigma)\) otherwise. The abstract concretization function is defined as \(\textit{concr}(s, \sigma, p) = \bigcup_{\mathcal{K}}\textit{concr}_{\mathcal{K}}(s, \sigma, p)\). The semantics of an abstract SES is defined as \(\llbracket{}s\rrbracket = \bigcup_{p}\bigcup_{\sigma\in{}S}\textit{concr}(s, \sigma, p)\).

4 Rules for Abstract Execution and Abstract Store Simplification

The fundamental idea of Abstract Execution is to perform second-order reasoning about universal properties of program behavior by Symbolic Execution: Abstract Statements and Abstract Expressions are translated into abstract updates; abrupt completion is taken into account by explicit branches in the symbolic execution tree. Thus, the core constituents of our reasoning system, presented in this section, are SE rules for APEs and simplification rules for abstract stores containing abstract updates.

4.1 Symbolic Execution Rules for Abstract Program Elements

AE is necessarily less expressive and complete than full structural induction over program syntax in higher-order logic, because it approximates induction with the fixed set of descriptive elements contained in abstract updates obtained from APEs. The advantage is that the resulting symbolic execution rules can be instantiated by matching, without having to guess an induction hypothesis. This has a dramatic impact on the efficiency and automation of AE. Nevertheless, because of the complications due to abrupt termination, the full AE rule for ASs without any abbreviations is rather lengthy. Therefore, we begin with a concise AE rule for the case of ASs (with frame \(\textit{fr}\) and footprint \(\textit{fp}\)) that complete normally:

\[
\frac{\big(C \cup \{\{\mathcal{U} \mathbin{;} U_{\texttt{P}}(\textit{fr} \mathbin{:\approx} \textit{value}(\textit{fp}))\}\textit{normalPost}\},\ \ \mathcal{U} \mathbin{;} U_{\texttt{P}}(\textit{fr} \mathbin{:\approx} \textit{value}(\textit{fp})),\ \ \pi\,\omega\big)}{\big(C,\ \mathcal{U},\ \pi\ \texttt{\textbackslash{}abstract\_statement P;}\ \omega\big)}
\]

The rule removes the AS and appends to the symbolic store an abstract update. The abstract update symbol \(U_{\texttt{P}}\) is created fresh when first executing an AS with identifier P, but is reused for further executions of ASs with the same identifier to ensure that ASs with the same identifier behave equivalently when executed with the same footprint values. The path condition is extended by the postcondition of normal completion, \(\textit{normalPost}\), evaluated in the state after execution of the AS. A precondition is not added, since none is allowed for normal completion. This rule is exhaustive and precise, since the semantics of the abstract update symbol is aligned with the semantics of a normally completing AS with identifier P. Only for ASs with non-trivial postconditions does the path condition have to be extended to achieve precision.

Fig. 3: Symbolic Execution Rule for AE of Abstract Expressions

Complexity is added by considering abrupt completion. We first discuss the complete AE rule for AExps, depicted in Fig. 3. The statement in the program counter of the rule’s conclusion is the assignment of an AExp \(\texttt{e}\) to a variable. The execution of \(\texttt{e}\) can either complete normally, in which case an abstract value is assigned to the variable, or else complete because of a thrown exception. In the latter case, the assignment must not happen. Instead, the exception is thrown. Consequently, the active statement of the conclusion is not simply removed from the program counter (as in the rule above), but replaced with a conditional throw of an abstract exception object if a symbolic flag is true, and an assignment of a symbolic value otherwise.

The generated symbolic store is more complex than in the rule above, because the values of the pre-state heap, the exception flag, and the exception object have to be suitably initialized. A dedicated variable refers to the pre-state before executing \(\texttt{e}\); the corresponding state update is therefore added to the symbolic store before the abstract update \(U_{\texttt{e}}\). The exception flag and the exception object are interpreted in the post-state, and are set to applications of abstract functions to the footprint values. These abstract functions, of boolean type, exception type, and result type \(T\), are, similar to the abstract update symbol \(U_{\texttt{e}}\), created fresh when first executing an AExp with identifier \(\texttt{e}\), but are reused for every further execution of an AExp with the same identifier.

The path condition is extended by two formulas. First, the value of the expression is bound to the evaluation of the precondition for exceptional completion, \(\textit{pre}(\textit{excSpec})\), in the pre-state. Second, the assumptions concerning the postconditions for both completion modes are evaluated in the whole new symbolic store, i.e., the post-state. If an exception is thrown, i.e., evaluates to , the exception object is assumed to be non-null, and the postcondition \(\textit{post}(\textit{excSpec})\) is assumed to hold. In the converse case for normal completion, \(\textit{normalPost}\) is assumed. These postconditions may contain the variables (for exceptional completion) and (for normal completion), see Def. 7.

If \(\textit{pre}(\textit{excSpec})\) is satisfiable in the pre-state, subsequent symbolic execution after an application of will result in two SE branches, one for normal completion, and one for completion due to a thrown exception.

Fig. 4
figure 4

Symbolic Execution Rule for AE of Abstract Statements (abbreviations and label symbols are explained in the text)

Figure 4 shows the AE rule for ASs. It is, in essence, an extension of with additional cases for the ways a statement, unlike an expression, can complete abruptly (returns, (labeled) breaks, and (labeled) continues). In each case, a conditional in the program counter yields a separate SE branch. Since statements generally do not evaluate to a value (with the exception of “expression statements”), the program counter does not contain an assignment.

The labels \(\textit{lb}_{b_1},\ldots ,\textit{lb}_{b_n}\) are (distinct) loop or block labels declared in the prefix \(\pi \); \(\textit{lb}_{c_1},\ldots ,\textit{lb}_{c_m}\) are loop labels only. The rule is only applicable in the context of a non-void method (since we return a value in the program counter) and within a loop (since we break and continue). For different contexts, we provide dedicated variants of rule , which we do not detail here for brevity.

For readability, we use abbreviations in Fig. 4. The update \(\mathcal {U}_\textit{init}\) initializes all boolean flags such as and , as in the case of :

figure rp

The formula \(\textit{mutualExclusionFor}\) declares mutual exclusion of all flags that appear in \(\mathcal {U}_\textit{init}\) as left-hand sides, such that at most one of them can evaluate to true. An AS completes normally iff it does not complete abruptly, i.e., iff all abrupt completion flags evaluate to false. This is captured in the formula \(\textit{notAbruptly}\) defined as . The formula \(\textit{behavioralPreconds}\) binds the values of the flags to the corresponding preconditions defined by the AS. Since the specifications for labeled breaks and continues are parametric in the label, the corresponding formulas are also passed the label as a parameter:

figure rv

Finally, \(\textit{behavioralPostconds}\) adds the assumptions about all postconditions:

figure rw

Observe that according to Def. 7, preconditions are mutually exclusive, so we can connect them to the Boolean flags with equivalence “\(\leftrightarrow \)” in \(\textit{behavioralPreconds}\). This requirement does not have to, and usually will not, hold for postconditions, which is why we use implication “\(\rightarrow \)” in \(\textit{behavioralPostconds}\).
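As an illustration, the following spells out plausible shapes of \(\textit{notAbruptly}\) and \(\textit{behavioralPreconds}\) (a sketch of ours; the flag and specification names \(\textit{returns}\), \(\textit{throws}\), \(\textit{breaks}\), \(\textit{continues}\), \(\textit{retSpec}\) are assumptions, since the original symbols are not reproduced here):

$$\begin{aligned} \textit{notAbruptly} \equiv {}& \textit{returns} \doteq \textit{FALSE} \wedge \textit{throws} \doteq \textit{FALSE} \wedge \bigwedge _{i=1}^{n} \textit{breaks}(\textit{lb}_{b_i}) \doteq \textit{FALSE} \wedge \bigwedge _{j=1}^{m} \textit{continues}(\textit{lb}_{c_j}) \doteq \textit{FALSE}\\ \textit{behavioralPreconds} \equiv {}& \big (\textit{returns} \doteq \textit{TRUE} \leftrightarrow \textit{pre}(\textit{retSpec})\big ) \wedge \big (\textit{throws} \doteq \textit{TRUE} \leftrightarrow \textit{pre}(\textit{excSpec})\big )\\ &{}\wedge \bigwedge _{i=1}^{n} \big (\textit{breaks}(\textit{lb}_{b_i}) \doteq \textit{TRUE} \leftrightarrow \textit{pre}(\textit{breakSpec}(\textit{lb}_{b_i}))\big ) \wedge \ldots \end{aligned}$$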

The most important feature of our AE rules is that they are exhaustive, because by Lem. 2 this implies that we can soundly prove abstract program properties using them. In addition, they should be precise, such that they allow proving (modulo the inherent incompleteness of underlying theories like arithmetic) everything that is logically valid. Indeed, our rules satisfy both properties. Subsequently, we state the corresponding theorems and provide proof sketches. For full proofs, we refer to [76].

Theorem 1

The rule (Fig. 4) is exhaustive.

Proof Sketch

We have to prove that for all instantiations of the conclusion SES in , each concretization (concrete state represented by the SES) is also a concretization of the premise SES. The core insights used in the proof are:

(1) We perform a case distinction over the reasons for (normal or abrupt) completion of the AS instantiation. This is in the spirit of AE, which reasons about programs based on their effect rather than their syntactic structure.

(2) We defined the semantics of APEs as a conjunction of JavaDL formulas. For a fixed, but arbitrary instantiation of the AS in the conclusion, we can assume the validity of this conjunction, and exploit the fact that it shares common elements with the premise SES to perform strong, semantics-preserving simplifications.

(3) Re-using abstract updates and first-order symbols such as for APEs with the same identifier is soundness-critical; however, it is admissible since this only happens in the AE rules, and for equivalent APEs (modulo frame changes). Using truly fresh first-order symbols every time would be sound, but either incomplete or require non-trivial postconditions in the presence of multiple APEs with the same identifier symbol. The usage of fresh abstract updates would even require specifying the whole framed post-state for completeness. Note that all terms with such re-used symbols depend on the current value of the relevant context (the footprint of the APE). The contrary would be unsound. \(\square \)

Introducing abstract updates and first-order symbols freshly upon the first encounter with an APE with a given identifier symbol, but re-using them later for APEs with this identifier, is soundness-critical, but it greatly simplifies both symbolic reasoning and the required specification effort in the presence of multiple APEs with the same identifier. While this calls for a discussion as in Item (3) for exhaustiveness, it simplifies the argument for precision.

Theorem 2

The rule (Fig. 4) is precise.

Proof Sketch

We have to prove that for all instantiations of the premise SES in , each concretization (concrete state represented by the SES) is also a concretization of the conclusion SES. For a given structure and initial concrete state, we may assume the validity of the path condition of the premise SES, since otherwise no concretization is produced. It follows that the formula imposing mutual exclusion of abrupt completion is valid, which is why we can, as for exhaustiveness, proceed by case distinction over behavior. Because, due to the semantics of APEs, APEs with the same identifiers are behaviorally isomorphic, we have to use the fact that logic symbols introduced for ASs with the same identifier symbols are re-used (fresh symbols would yield more concretizations). This argument is non-standard: there is no formal connection between interpretations of abstract update symbols and the ASs they have been introduced for, but since there is only one rule for executing ASs, and this rule always uses the same symbols for ASs with the same identifier symbol, we narrow down the interpretations of the introduced logic symbols to the feasible ones. \(\square \)

The proofs for the rule for AExps work analogously.

Theorem 3

The rule (Fig. 3) is exhaustive.

Theorem 4

The rule (Fig. 3) is precise.

The AE rules and transform APEs into second-order abstract updates in the symbolic store. To facilitate further reasoning with the resulting SESs, we have to provide sufficiently strong simplification rules for abstract updates, which we present in the following section.

4.2 Update Simplification Rules

We first provide an intuition of the mechanics of update simplification by discussing how concrete updates are simplified. Afterward, we introduce our simplification rules for abstract updates.

4.2.1 Concrete Update Simplification Rules

Symbolic execution transforms assignments into changes in the symbolic store. To evaluate a postcondition in a symbolic store after execution has terminated, the store has to be applied to the postcondition. Consider the program “ .” The resulting symbolic store for this program is . To evaluate the postcondition , we first have to simplify the store to a parallel normal form with distinct left-hand sides and terms \(t_i\) without updates. This is achieved by several update simplification rules. provides suitable rules for concrete updates. For the example above, we first apply the rule twice to transform the sequential update into a parallel update . Next, we turn into using one of the rules. Inside the right-hand side of the resulting update, we can drop the update application using the rule , since the variable does not occur in the term 42. Formally, this is captured by the condition , where the function returns the “free” program variables in the term \(t\). We continue by applying the update , which leads to the update . The second application of can be dropped as before; the term is simplified to \(17\) by applying the update to the variable using rule . This results in the simplified update . Using the rule , we drop the update since it is overwritten by a later update in the parallel construction, leading to the update , which is in parallel normal form. Applying that update to our postcondition yields the true formula \(17>0\).
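Since the program and store of the example above are only partially reproduced, here is a self-contained miniature of our own that exercises the same rules, for the (hypothetical) program “x = 42; x = 17;” and postcondition \(x>0\):

$$\begin{aligned} &\{x := 42\}\{x := 17\}(x>0)\\ \rightsquigarrow {}\ & \{x := 42 \,\Vert \, \{x := 42\}(x := 17)\}(x>0) \quad \text {(sequential to parallel)}\\ \rightsquigarrow {}\ & \{x := 42 \,\Vert \, x := 17\}(x>0) \quad \text {(drop inner application: } x \text { does not occur in } 17)\\ \rightsquigarrow {}\ & \{x := 17\}(x>0) \quad \text {(drop update overwritten later in the parallel composition)}\\ \rightsquigarrow {}\ & 17 > 0 \end{aligned}$$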

4.2.2 Abstract Update Simplification Rules

For abstract updates, we can reuse most of the existing machinery. One must strengthen the rule , because the condition is not sufficient if \(t\) contains terms depending on dynamic frame specification variables. Consider, for instance, the formula , where \(\textit{locs}\) is a dynamic frame variable of type . This formula is only true, i.e., the update application can only be dropped, if we know from the current execution context that is not in \(\textit{locs}\). Consequently, in addition to , we have to ensure by checking the path condition that holds for each dynamic frame variable \(\textit{locset}\) occurring in \(t\), to be able to apply .
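A plausible instance of the problem (our sketch; we write \(\mathcal {U}_P(\textit{locs} := \textit{fp})\) for an abstract update with identifier \(P\), frame \(\textit{locs}\), and footprint \(\textit{fp}\), since the original symbols are not shown):

$$\begin{aligned} \{\mathcal {U}_P(\textit{locs} := \textit{fp})\}\, x \doteq x \end{aligned}$$

The update application on the left-hand side of \(\doteq \) can only be dropped, making the equation trivially true, if the path condition implies \(\textit{locs} \cap \{x\} \doteq \emptyset \); otherwise, the abstract update may assign \(x\).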

The replacement for , as well as the other simplification rules for abstract updates, therefore not only depends on the term in focus, but also on the execution context captured by the path condition \(C\) of a symbolic state. In analogy to the condition , expressing that is irrelevant for the term \(t\), we formalize a predicate \(\textit{irrelevant}(C,\textit{locset},t)\) expressing that the location set \(\textit{locset}\) is not relevant for the target term \(t\). It holds if the path condition implies \(\textit{locset}\cap \{s\}\doteq \emptyset \) for each constant \(s\) such that is a subterm of \(t\). Additionally, there are some special cases: The simplification rule for abstract updates discussed below introduces placeholders “ ” into frames; these are by definition “irrelevant” and are treated accordingly. For program variables, we assert that the variable does not occur freely in the target; for heap locations, that there is no free occurrence of the variable. The latter is a safe overapproximation; fine-grained heap-related simplifications require dedicated rules (we explain an example further below). We define on tuples of locations.

Definition 16

(Location Set Irrelevance Checking) Let be a path condition, \(\textit{locs}\) a tuple of locations (program variables, heap locations, and dynamic frame variables) and . We define as

In addition to , which tells us that assigning a location set has no effect on the valuation of a target, we need a predicate expressing that assignment of all locations in the location tuple \(\textit{locs}_1\) will at least assign all locations in the location tuple \(\textit{locs}_2\). Depending on the type of a location in \(\textit{locs}_2\), there are several ways to conclude that the location is overwritten. In the simplest case, a location literally occurs in \(\textit{locs}_1\), as in or ; the judgment is then independent of the path condition. If the location tuple is a singleton (either a heap or program variable location), we check whether a suitable expression \((o,f)\in {}s\) occurs in the context. Otherwise, we have to find a combination of locations in such that the union of these locations covers \(\textit{locs}_2\).

Definition 17

(Location Set Overwrite Checking) Let be a path condition, and , tuples of location set terms, where . The predicate is defined as

Fig. 5
figure 5

Abstract Update Simplification Rules

We subsequently discuss the “core” abstract update simplification rules in Fig. 5. In addition to those, we provide a second rule set dedicated to simplifying heap-related terms that frequently arise in the context of AE proofs. Understanding these rules requires no new insights or techniques. To make the paper self-contained, we include the heap-related abstract update simplification rules in Appendix B.

In the rules we use the following tuple notation: If is an \(n\)-tuple, we write instead of and similarly for application of updates, etc.

Rule was already discussed. Three further rules, to , serve to drop updates. Rules and correspond to the existing rule in , dropping an earlier update within a parallel composition if a later one dominates it. The first of these rules replaces an earlier concrete update by if is overwritten by the frame of a later abstract update. The second rule treats the case where an earlier abstract update is dropped. This case is more complex due to the nature of abstract updates. We can drop the abstract update from a parallel update if there is a series of updates occurring later in the parallel scope that, together, overwrite . A simple case would be to replace by in , but for more complicated expressions, it is not required that a single update overwrites all contained locations at once. The rule corresponds to , handling the case of an (abstract) update that is dropped since the locations it assigns are irrelevant for the target term.

If only some of an abstract update’s left-hand sides are ineffective and the rules and are not applicable, we have to perform a more fine-grained simplification step than dropping the whole update. The formula is valid, but not provable with the rules discussed so far. In this situation, the rule is applicable. It replaces ineffective parts of an abstract update’s left-hand side with the “irrelevant” location “ .” For the example above, this results in , which is trivially provable. The symbol “ ” receives special treatment in the definitions of the relations and , as it is always considered to be irrelevant or overwritten, respectively. An abstract update with only “ ” left-hand sides can be dropped by rules and independently of its context.

In contrast to concrete updates, abstract updates cannot be applied to a target term by performing a simple substitution. Generally, some abstract update applications cannot be simplified away and remain in the final states resulting from symbolic execution. Especially in correctness proofs of transformations, where we compare the execution results of two abstract programs, we need to establish a normal form to be able to compare the results. Consider, for example, the Slide Statements refactoring from the introduction. In the resulting symbolic store, the abstract updates occur in a different order and have to be normalized to show equivalence. This normal form is established by the rules and . Abstract updates are moved to the front of a parallel update as long as this does not change the semantics. However, an abstract update may only be pushed past another abstract update if it has a lexicographically smaller identifier symbol. If there are no conflicts between the elementary abstract and concrete updates within a parallel update, it is normalized to a block of abstract updates ordered according to the lexicographic order of their identifiers, followed by a block of concrete elementary updates.
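For instance (our sketch, with assumed identifiers \(P\) and \(Q\)): if the frames and footprints involved do not conflict, the normal form rules reorder

$$\begin{aligned} \mathcal {U}_Q(\ldots ) \,\Vert \, x := 17 \,\Vert \, \mathcal {U}_P(\ldots ) \ \rightsquigarrow \ \mathcal {U}_P(\ldots ) \,\Vert \, \mathcal {U}_Q(\ldots ) \,\Vert \, x := 17, \end{aligned}$$

so that two stores arising from executing the same ASs in different orders become syntactically comparable.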

Even though it is generally impossible to apply abstract updates by performing a substitution in the target, we can (1) apply updates on an abstract update, and (2) perform an effective simplification for a special case, namely for program variables marked as in the left-hand side of an abstract update. The corresponding rules are and . The rule belongs to a class of simplification rules pushing update applications down into the term structure. It specifies that the application of an update \(\mathcal {U}\) to an abstract update is equal to the abstract update with the same left-hand side, but with \(\mathcal {U}\) applied to the footprint. For situation (2), consider the formula . Since the update has to change the value of (based on the value of the term ), the formula is equivalent to for a suitably chosen function symbol \(f\). “Suitably chosen,” in this case, means that the function has to be chosen dependently fresh for the identifier symbol of the abstract update to conform to the semantics of abstract updates (Def. 12). This is generalized to abstract updates with multiple left-hand sides by using function symbols indexed not only with the identifier, but also with the index \(k\) of the respective left-hand side. It is not always feasible to completely convert an abstract update into a concrete one, as in the following example:

The assignable program variable location “ ” is not marked as , and the abstract location set \(\textit{locset}\) cannot be converted to a concrete update. Therefore, we extract program variable locations individually and replace their positions in the left-hand side of the abstract update with the irrelevant location “ .” Our simplification rule incorporates these considerations. Applying it to the example above yields .
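To illustrate situation (2) in its simplest form, consider the following sketch under assumed notation, writing \(x!\) for a program variable marked as has-to in the frame and \(f_P\) for the dependently fresh function symbol:

$$\begin{aligned} \{\mathcal {U}_P(x! := \textit{fp})\}\,\varphi \ \leftrightarrow \ \{x := f_P(\textit{fp})\}\,\varphi \end{aligned}$$

Since the abstract update must assign to \(x\) a value depending only on the footprint \(\textit{fp}\), the application can be replaced by a concrete update with an uninterpreted function; re-using \(f_P\) for every occurrence of the identifier \(P\) preserves the behavioral equivalence of equally named ASs.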

5 Applications of Abstract Execution

Abstract Execution has been applied in a variety of different scenarios: (1) deriving preconditions for the safe application of refactoring rules [76, 80], (2) analyzing the cost impact of program transformation rules [7], (3) the parallelization of sequential code [33], (4) “Correct-by-Construction” program development [89], (5) modular verification of software product lines with delta-oriented programming [66], and (6) the correctness of rule-based compilation [78]. Subsequently, we briefly survey all of these applications, discussing motivation, approach, and results for each.

5.1 Safe Refactoring

Refactoring is the process of changing code in such a way that it does not alter its external behavior, yet improves its internal structure [22]. Careful refactoring can contribute to the maintainability and reusability of code. Consequently, many actions performed during software development are refactorings (ca. 30% as reported in [72]). While programmers still frequently refactor their code manually [57], most mainstream IDEs implement (semi-)automated refactoring techniques.

Code refactoring is generally a complex activity, and it is easy to break the refactored code accidentally [20]. The reason is that refactoring techniques come with preconditions and constraints that have to be satisfied to ensure the preservation of program semantics. If those are violated, the resulting program may not compile, or—which is worse—compile, but expose a different behavior. IDEs automatically check some of these preconditions, but not all [74]: Even refactoring with tool support does not exclude the possibility of unexpected changes to a program’s behavior [17, 73]. With the help of AE we could show in our work on correct refactoring [76, 79] that documentation of crucial preconditions in existing standard literature (e.g., [21, 22]) is vastly incomplete for many refactoring techniques.

We used AE to model nine statement-level refactorings. We extracted sufficient preconditions ensuring their safety in a feedback loop driven by the interpretation of failed proof goals, ultimately leading to a proof certificate for these preconditions. We chose six refactorings from Fowler’s original book [21] and three from the second edition [22], including two techniques with loops. For each technique, we created a model consisting of two abstract programs, one representing the starting point and one the result of the refactoring. Our proof goal is behavioral equivalence; thus, we obtain preconditions for, e.g., Extract Method at the same time as for its inverse Inline Method. For all refactoring techniques, we discovered new preconditions that had not been mentioned in the literature. Subsequently, we explain how we created proof obligations for proving behavioral equivalence (Sect. 5.1.1), discuss how to prove transformations with loops (Sect. 5.1.2), and provide an overview of the discovered preconditions (Sect. 5.1.3), including a description of four bugs we discovered in the implementations of Extract Method in IntelliJ IDEA and Eclipse. For an extensive discussion including full models for all refactoring techniques, we refer to [76, Chapter 6].

5.1.1 Proof Obligation for Behavioral Equivalence

A refactoring model includes (1) two abstract programs “\(\textit{left}\)” and “\(\textit{right}\),” (2) a relational precondition “\(\textit{Pre}\),” (3) a set of relevant locations “\(\textit{relevant}\),” and (4) a postcondition “\(\textit{Post}(s_1, s_2)\),” where \(s_1\) is a sequence consisting of a possibly returned value, a possibly thrown exception, and the values of the relevant locations for the left program, and similarly \(s_2\) for the right program. From these constituents, we create a proof obligation collecting the outcomes of the left and right program in two uninterpreted predicates \(P\) and \(Q\). As a default, we use a single abstract location set “\(\textit{relevant}\)” for the relevant locations. Since this location set may represent any set of locations (unless the model imposes more specific constraints), both abstract programs have to coincide in their effects on the full program state. The resulting proof goal follows the syntactic pattern

(1)

where formulas \(\varphi _\textit{left}\) and \(\varphi _\textit{right}\) collect the results from executing the left and right programs in the predicates \(P\) and \(Q\), respectively. Formulas \(\varphi _\textit{left}\) and \(\varphi _\textit{right}\) follow the pattern “ ” and “ ”, respectively. The double negations are a technical necessity to make use of the diamond modality (enforcing termination of the left and right programs) in proof assumptions. The formulas \(\varphi _\textit{left}\) and \(\varphi _\textit{right}\) are defined as

figure wz

where is a Java class consisting of two methods and containing the left and right abstract program of our model, respectively. After the successful execution of both abstract programs, there will be exactly one instance of each of the uninterpreted predicates \(\textit{P}\) and \(\textit{Q}\), which makes instantiating the existential quantifier in the succedent of this proof obligation trivial. This encoding is more efficient than the alternative of expressing the problem using an equivalence “ ,” since the proof does not have to split, and, consequently, about 50% of the proof steps are saved.
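Reconstructed from this description (a sketch of ours, not necessarily the exact shape of formula (1)), the proof obligation plausibly reads

$$\begin{aligned} \textit{Pre} \wedge \varphi _\textit{left} \wedge \varphi _\textit{right} \ \rightarrow \ \exists s_1\, \exists s_2.\ \big (P(s_1) \wedge Q(s_2) \wedge \textit{Post}(s_1,s_2)\big ), \qquad \varphi _\textit{left} \equiv \lnot \langle \texttt {left();}\rangle \lnot P(\ldots ), \end{aligned}$$

and analogously for \(\varphi _\textit{right}\) with \(Q\); after symbolic execution of both method calls, instances of \(P(s_1)\) and \(Q(s_2)\) become available for instantiating the existential quantifier in the succedent.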

Example 4

(Proof Obligations) We instantiate proof obligation schema (1) to the model from Example 3. Listing 4 shows the contents of the method except the “ ” statement at the end. The method contains the ASs in reverse order. The precondition \(\textit{Pre}\) is instantiated to the formula from Listing 4. We instantiate the postcondition \(\textit{Post}(s_1,s_2)\) to \(s_1[2]\doteq {}s_2[2]\), where \(s_i[2]\), the third component of sequence \(s_i\), contains the final value of the abstract location set \(\textit{relevant}\) after symbolic execution of and . We obtain:

The formulas \(\varphi _\textit{left}\) and \(\varphi _\textit{right}\) have exactly the shape given above, if is the class that declares methods and . A proof of this obligation splits into multiple subgoals. For example, there is one case, where \(P\) throws an exception and \(Q\) returns. Subgoals of this kind are immediately discarded, because the precondition (part of the s in Listing 4) requires those cases to be mutually exclusive. We focus now on the case where both ASs complete normally. The resulting proof obligation has the following shape:

The predicates \(P\) and \(Q\) are uninterpreted, so the only promising way to instantiate \(P(s_1)\) and \(Q(s_2)\) existentially in the conclusion is to use the arguments of \(P\)’s and \(Q\)’s occurrences in the premise. After eliminating the quantifier in this way, the \(P\) and \(Q\) terms in the conclusion are identical to the instances in the premise and can be discharged. It remains to prove the instantiated postcondition \(s_1[2]\doteq {}s_2[2]\):

The disjointness assumptions in the precondition (for example, ) permit the (abstract) update simplification rules to close this proof goal.

Our graphical workbench  [77] automates the construction of such proof obligations. We discuss it in Sect. 6.

5.1.2 Proving Transformations with Loops

Symbolic execution of loops requires advanced techniques: When loop guards are symbolic, we cannot know the number of iterations after which the loop will terminate. Frequently, loop invariants (see also Sect. 3), which are specifications respected by every loop iteration, are employed to abstract loop behavior regardless of the number of iterations. Finding good loop invariants is generally hard; indeed, coming up with sufficiently precise specifications was identified as the “new bottleneck” of formal verification (for example, [5]). In program equivalence proofs based on functional verification techniques, one even needs the strongest possible invariant for each occurring loop ([12, 76, Sec. 5.4.2]). This notwithstanding, we discovered a possibility to generically specify abstract strongest loop invariants for abstract programs.

Consider a loop with guard operating on a single variable . The formula is a strongest loop invariant when it is (1) preserved by every loop iteration and (2) there is exactly one value \(v\) such that \(\textit{Inv}(v)\) holds and \(g(v)\) does not hold. Condition (2) means that there remains no degree of freedom in the choice of the value of after loop termination: \(\textit{Inv}\) describes the exact final value. We can formalize condition (2) as \(\exists v.\big (\textit{Inv}(v)\wedge \lnot g(v)\wedge \forall w.((\textit{Inv}(w)\wedge \lnot g(w))\rightarrow w\doteq v)\big )\).

Generalizing this to a loop with an abstract expression as guard and dynamic frame specification variables as frame and footprint yields a condition constraining instantiations of abstract invariant formulas to abstract strongest ones:

This assumes that \(\textit{fr}\) and \(\textit{fp}\) are the loop frame and footprint, and \(\textit{guardIsTrue}\) is a predicate that holds if the loop guard evaluates to true. We add this as a global precondition and use \(\textit{guardIsTrue}\) and \(\textit{Inv}\) for the specifications inside our program.

As such, this only allows reasoning about normally completing loop bodies. Loop invariants are generally only required to hold before each further loop iteration; in particular, they need not hold after abrupt completion due to a or . By generalizing abstract strongest loop invariants to what we call strongest abstract strongest loop invariants—strongest loop invariants that also have to be respected after abrupt completion—we can also reason about abruptly completing loops. Assuming that \(\textit{breaksBody}\) is a predicate that holds if, and only if, the abstract statement in the loop body completes abruptly because of a , our condition for abstract invariants \(\textit{Inv}\) becomes

Listing 5 shows an example of a fully abstract loop with specifications for abrupt completion. In lines 1–7, the strongest abstract strongest loop invariant condition is stated. In lines 9–10 we assume the invariant initially. The postconditions for the abstract statement ensure that the invariant is preserved and bind the predicates for abrupt completion. Finally, \(\textit{Inv}\) is used as a loop invariant in line 11.

figure ya

Proving Validity of Instantiation for Models with Loops

Abstract loop invariants have a significant advantage over concrete ones: They are easy to come up with. In Listing 5, annotating a loop with a strongest abstract strongest invariant is a matter of introducing a fresh symbol (e.g., \(\textit{Inv}\)) and using it at suitable positions. This approach is a generic recipe that can be applied to various models.

But, of course, there is no free lunch: The complexity avoided by using generic abstract functional invariants instead of concrete coupling invariants [12] returns when one wants to prove that a given concrete program with loops is a valid instance of a given abstract program. In this case, not only does one need to discover a meaningful invariant for the loop of the concrete program, which is difficult enough, but one has to discover the strongest loop invariant to serve as an instance of its abstract counterpart. Even though strongest invariants always exist for deterministic programs, they might be impossible to specify in a given specification language (for example, JML).

On the other hand, for many applications of AE it is sufficient to stay at the abstract level and avoid concrete strongest loop invariants altogether: For example, we show in the subsequent Sect. 5.1.3 that the mechanics for the Remove Control Flag refactoring as described in Fowler’s book [21] likely yields incorrect results. Furthermore, we show how this can be mitigated. Such insights for schematic transformations can be obtained without ever considering concrete instances.

5.1.3 Preconditions for Safe Refactoring

Slide Statements

The idea behind the Slide Statements refactoring technique (see Sect. 1.2) is to reorder statements to keep those together that have a common purpose [22]. Under the assumption that both statements complete normally, the preconditions documented in [22] are complete: Neither statement may write to the locations read by the other one, and they must also not write to the same locations. However, no preconditions are mentioned for abrupt completion. We inferred in addition that (1) at most one of and may complete abruptly in any state and (2) if one statement completes abruptly, the other one must not write to the relevant state.
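A minimal concrete illustration of ours (plain Java, not taken from the refactoring catalog):

    int a = compute();  // first statement: writes a
    int b = a + 1;      // second statement: reads a

    // Sliding the second statement above the first is unsafe: it would
    // read the old value of a, because the write set of the first
    // statement overlaps the read set of the second.

The abrupt-completion preconditions matter, e.g., when the first statement may throw an exception: sliding the second statement above it would then execute a write that previously never happened.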

The Consolidate Duplicate Conditional Fragments [21] technique is a special case of Slide Statements for moving common statements in all branches of an or statement to before or after that statement. For extracting a prefix of an , the same preconditions as for Slide Statements apply. Extracting a postfix from an comes without preconditions. For statements, the moved postfix must not throw an exception or access the caught exception object. If the postfix is moved to a block, the remaining statement in the block must not return.

Slide Statements is implemented as a lightweight refactoring in IntelliJ IDEA 2021.1 (Eclipse 4.19 does not support it). The IntelliJ implementation permits moving single statements (not separated by a “ ”) one position up or down. No preconditions are checked, which makes it easy to, for example, move a variable occurrence to a position before its definition. We did not file a bug report for IntelliJ since we formed the impression that the lightweight realization does not aim for correct results under all circumstances.

Consolidate Conditional Expression

For the case of sequential or nested conditionals with “the same result” [21], this refactoring proposes to merge these conditionals into a single check to improve clarity. There are two variants of this technique: (1) Transforming a sequence of statements to a single one with a disjunction as the guard, and (2) transforming a nested statement to a single one with a conjunction as the guard. Schematically:

figure yn

The crucial part of modeling this refactoring is the interpretation of having “the same result.” In our opinion, supported by the examples supplied in [21], \(\textit{P}\) should always either return or throw an exception: it is never executed twice. Our analysis shows that under this assumption, both variants of the refactoring can be applied without additional preconditions. This is notable, since Fowler mentions that conditionals must not have any side effects, which is, however, only necessary if one uses Boolean connectives without short-circuit evaluation (“&” and “|”). In the case of variant (2), \(P\) can furthermore complete arbitrarily (i.e., also normally).
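In the spirit of Fowler's examples, the two variants can be pictured as follows (our own minimal Java sketch):

    // Variant (1): a sequence of conditionals with the same result
    if (isDead)    return 0;
    if (isRetired) return 0;
    // becomes a single check with a short-circuit disjunction:
    if (isDead || isRetired) return 0;

    // Variant (2): nested conditionals
    if (onVacation) {
        if (longServiceEmployee) return 1;
    }
    // become a single check with a short-circuit conjunction:
    if (onVacation && longServiceEmployee) return 1;

Short-circuit evaluation guarantees that each guard is evaluated exactly as often as before, which is why no side-effect freeness of the guards has to be assumed.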

Extract Method, Decompose Conditional, and Move Statements to Callers

Method extraction is a well-known refactoring technique implemented in many IDEs. Fowler [21] names as a precondition that the extracted code may not assign more than one local variable referenced in the outside context. We discovered two additional constraints: (1) The extracted fragment must not return, since this changes the control flow. (2) If the extracted method assigns a local variable from the outside context, then it must not throw an exception after that variable has been assigned a value. For the latter additional precondition, consider the example in Fig. 6. For an empty , the division in line 4 completes abruptly because of a 0 divisor. Then, the final value of is 0 before, but after the transformation. While this can even be considered an improvement for the example scenario, it changes the program’s semantics; besides, in the reverse direction corresponding to the Inline Method refactoring, one would introduce a more obvious bug. Decompose Conditional and Move Statements to Callers are variants of Extract Method. We modeled and verified these, too. There were no additional insights compared to Extract Method.

Fig. 6
figure 6

Wrong Extraction of a Query Method
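Since Fig. 6 is only sketched here, the following hypothetical example of ours illustrates precondition (2):

    // Before extraction: v is assigned in place, so the partial effect
    // survives the exception (d == 0 raises an ArithmeticException).
    int v = 0;
    try {
        v = 42;            // assignment to the outside-context variable ...
        int x = 100 / d;   // ... followed by a possibly thrown exception
    } catch (ArithmeticException e) {
        // here, v == 42
    }

    // After extracting both statements into m(), where
    // int m(int d) { int v = 42; int x = 100 / d; return v; },
    // the assignment to v only happens if m() completes normally:
    int v = 0;
    try {
        v = m(d);
    } catch (ArithmeticException e) {
        // here, v == 0, so the observable behavior changed
    }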

Discovered Bugs for Extract Method in IntelliJ and Eclipse

Considering the implementation in IDEs, Eclipse 4.19 issues a warning when extracting a fragment containing a statement, while IntelliJ IDEA 2021.1 tries to work around this problem: The extracted method returns null if no return occurred, which is checked at the call site. If a non-null value is returned, the caller returns that value; otherwise, the caller proceeds normally. We suspected that this approach would not produce correct results for all input programs, and indeed quickly discovered a counterexample where IntelliJ produced uncompilable code without a prior warning. Consider the method in Listing 6 of Fig. 7. Applying Extract Method in IntelliJ IDEA 2021.1 to the highlighted lines yields the code in Listing 7. This code does not compile, since the second statement in the refactored version of becomes unreachable and, furthermore, the method misses a statement. Getting around this issue is not trivial. One option is to return a null value at the end of , to return from if a non-null value was returned, and to resume execution in otherwise (this is implemented in IntelliJ for different situations, for example, statements without branches). It is, however, not the ultimate solution, for instance, when the extracted fragment returns non-trivial types. We reported a bug to the IntelliJ developers, who fixed the problem in later IDEA versions.

Precondition (2) in the previous paragraph is unchecked by either IDE for the Extract Method direction. However, both IntelliJ and Eclipse produce a correct result for the inverse Inline Method direction by replacing with a temporary variable that is only assigned at the end of the inlined method body. For the Extract Method direction, we figured out a workaround for the problem related to exceptions and suggested it in a bug report to the IntelliJ developers.

Fig. 7
figure 7

Wrong Application of “Extract Method” by IntelliJ IDEA

Eclipse allows factoring out and statements from within loops, which immediately yields uncompilable code. We reported this bug to the Eclipse community. IntelliJ is more considerate in handling and statements. In some cases, it produces correct results. However, it is still easy to come up with examples where IntelliJ produces uncompilable code or semantically incorrect results when factoring out conditionals containing s or s from within loops. Furthermore, the implementation is inconsistent and produces correct or wrong results for the same input depending on the surrounding code in the class. We filed another bug report for this issue.

Decompose Conditional [21] is a variant where the condition and both branches of an statement are extracted to individual methods. For the branches, this is identical to Extract Method; there is no precondition for the extracted condition.

Move Statements to Callers [22] is a variant of Inline Method where a prefix (and not the whole body) of a method is moved to the callers. Conversely, Move Statements into Method moves statements before an invocation to inside the called method. The same restrictions as for Extract Method / Inline Method apply.

Replace Exception with Test

In our example in Fig. 6, we anticipated that a division by zero would raise an and used a statement to react accordingly. Fowler motivates the Replace Exception with Test refactoring [21] by declaring this procedure a code smell that should be avoided. Rather, we should check the problematic condition (in the case of the example, ) beforehand in an / statement replacing the / . Note that method extraction in Fig. 6 would then have been safe. Yet, this also demonstrates that Replace Exception with Test is generally unsafe, even though no preconditions are mentioned in [21]. Schematically, this refactoring can be written as

figure zv

where \(\textit{cond}\) is the condition under which \(P\) will throw an exception. The general problem with this refactoring is that whenever \(P\) throws an exception, it might have changed the relevant state before completing abruptly. After the refactoring, \(P\) is not executed at all. The refactoring can be safely applied if \(P\) neither assigns any “relevant” location nor the locations assigned by \(Q\), or if both \(P\) and \(Q\) do not assign any relevant location, and, furthermore, \(Q\) always completes normally.

Since these situations are unlikely in practice, we came up with a workaround that always ensures the safety of the refactoring technique: If \(Q\) contains a prefix resetting all locations assigned by \(P\) to default values that are independent of \(P\)’s assignments, then intermediate changes by are neutralized, resulting in the same effect before and after the refactoring. This situation can always be achieved by adding suitable reset statements to the clause directly before \(Q\).
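The following Java sketch of ours shows the workaround; prelim() and the guard d == 0 are hypothetical stand-ins for \(P\)'s first effect and the throwing condition \(\textit{cond}\):

    // Before the refactoring, with the reset prefix added to Q:
    try {
        count = prelim();       // P assigns count ...
        result = 100 / d;       // ... and then may throw (d == 0)
    } catch (ArithmeticException e) {
        count = 0;              // reset prefix: default independent of P
        result = -1;            // Q
    }

    // After Replace Exception with Test:
    if (d == 0) {
        count = 0;              // same reset prefix neutralizes P's partial effects
        result = -1;            // Q
    } else {
        count = prelim();
        result = 100 / d;
    }

With the reset prefix, both versions leave count and result in the same state in the exceptional case, assuming prelim() completes normally and assigns nothing but count.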

To the best of our knowledge, Replace Exception with Test is not implemented in any major Java IDE.

Split Loop

Splitting a loop, where this is possible, contributes to readability by dividing a loop that addresses separate concerns into dedicated loops. It can also make sense to split a loop to prepare for code parallelization (see Sect. 5.3). As we showed in our work on the cost impact of transformations (Sect. 5.2), the performance impact of this transformation is minor. This might be counterintuitive, but the only overhead introduced by dividing a loop is the double evaluation of the loop guard, which is usually insignificant. The schematic representation of Split Loop is

figure zy
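A valid instance in plain Java (our own sketch, chosen to satisfy the preconditions listed below):

    // Before: one loop mixing two independent concerns
    int sum = 0, max = Integer.MIN_VALUE;
    for (int i = 0; i < a.length; i++) {
        sum += a[i];                // P: writes sum, reads sum and a[i]
        max = Math.max(max, a[i]);  // Q: writes max, reads max and a[i]
    }

    // After Split Loop: init, guard, and update are duplicated
    int sum = 0, max = Integer.MIN_VALUE;
    for (int i = 0; i < a.length; i++) {
        sum += a[i];
    }
    for (int i = 0; i < a.length; i++) {
        max = Math.max(max, a[i]);
    }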

We extracted the following preconditions: (1) The guard \(g\) must not have any side effects and \(P\), \(Q\) must not write to the footprint of \(g\), (2) the initialization statement \(\textit{Init}\) and the loop update statement \(\textit{Upd}\) must write (initialize/update) \(g\)’s footprint and must complete normally, \(\textit{Init}\)’s footprint must be empty, and \(\textit{Upd}\)’s footprint equals \(g\)’s footprint, (3) the frames and footprints of \(P\) and \(Q\) must be independent in the sense of Slide Statements, i.e., not overwrite each other and not influence each other’s evaluations, (4) \(P\) must not complete abruptly, (5) \(Q\) must not complete abruptly before \(P\) committed its final result (i.e., established its invariant). All these preconditions are undocumented in [21, 22].

Observe that loops over an iterator (with a guard like “ ”) do not satisfy these preconditions: a call to “ ” in \(P\) or \(Q\) changes the state on which the evaluation of \(g\) depends, which is not allowed due to precondition (1). Therefore, it is unsafe to apply Split Loop to such loops.

Remove Control Flag

A “control flag” in a loop determines when the loop should terminate. The Remove Control Flag refactoring [21] suggests resorting to or statements instead to better communicate the control flow. Schematically:

figure aad

We found that the shortcut introduced by the abrupt completion, however, generally breaks semantic equivalence. Any code that would have been executed after setting the control flag (\(Q\) in the schema) is now skipped by the shortcut, and must thus not have effects visible outside of the loop. Otherwise, we have to duplicate \(Q\):

figure aae

Following the mechanics described in [21] likely yields incorrect results.
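A minimal Java sketch of ours; Q() stands for the trailing code that the naive transformation skips:

    // Before: flag-controlled loop; Q() also runs in the iteration
    // that sets the flag
    boolean found = false;
    for (int i = 0; i < n && !found; i++) {
        if (a[i] == key) { found = true; }
        Q();
    }

    // Naive Remove Control Flag: Q() is skipped in the final iteration,
    // which changes the semantics if Q() has externally visible effects
    for (int i = 0; i < n; i++) {
        if (a[i] == key) { break; }
        Q();
    }

    // Safe variant: duplicate Q() before the break
    for (int i = 0; i < n; i++) {
        if (a[i] == key) { Q(); break; }
        Q();
    }

(The sketch assumes that Q() neither accesses the flag nor completes abruptly.)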

For the proofs of Split Loop and Remove Control Flag, we used strongest abstract strongest loop invariants, as described in Sect. 5.1.2.

Discussion: Inadequate Refactoring Support in IDEs

Most statement-level refactoring techniques discussed by Fowler [21, 22] are not supported in mainstream IDEs. Indeed, apart from renaming and class-level refactorings, only variable or method extraction and inlining are implemented in IntelliJ IDEA and Eclipse. We consider this problematic, as automated code refactoring can prevent many coding errors. In addition, even an imperfect implementation is a step forward, enabling users to submit bug reports and gradually improve the implementation.

5.2 Performance Impact of Transformations

Refactoring, as described above, changes a program without affecting its functionality to optimize “soft” properties, such as readability or maintainability. When refactoring, programmers are encouraged to disregard the manipulated program’s performance (e.g., runtime), this being an orthogonal aspect. Still, it is certainly worthwhile to know that, for example, Split Loop does not affect performance up to a constant factor.

In [7], we apply Quantitative Abstract Execution (QAE) to statically derive the cost effect of program transformation schemas. In contrast to existing relational cost analysis approaches (e.g., [63, 64]) that work by first applying a transformation—say, Split Loop—and analyzing performance after the fact, QAE obtains abstract cost bounds of transformations before they are even applied. Using these bounds, one can reason about the cost effect of a transformation in general, or obtain a concrete cost effect for a given target program by instantiating an abstract bound for that program.

Fig. 8
figure 8

QAE model for Code Motion. Some specification lines are manually annotated; the remaining lines are inferred automatically

Compared to standard AE models, their quantitative counterparts contain specifications of the accessible locations of an APE that are relevant for its execution cost, the so-called cost footprint. The total execution cost of an abstract program is tracked in a special variable (cf. the variable for a method’s returned result). Loop invariants are split into functional invariants, which, for the purpose of cost analysis, only need to be strong enough to prove termination (the clause), and cost invariants capturing the value of the variable throughout loop execution.

Fig. 8 shows the QAE models for Code Motion, a standard optimization implemented in compilers [4], where a loop-invariant statement is moved to a position before a loop. QAE allows proving that the execution cost is not increased by the transformation (and is decreased if the cost of AS is nonzero and the loop is executed at least once).

Frame, footprint, and cost footprint of the involved ASs are manually specified; the loop invariants and decreases terms are automatically inferred. The QAE toolchain consists of a cost analyzer, which infers these additional annotations, and a verifier, which proves them correct. The approach is parametric in a cost model: In particular, one can analyze (abstract) execution time and memory consumption in the heap. Thanks to the automatic inference of loop invariants and optimized proof strategies, the whole inference and certification process works fully automatically and does not require auxiliary specifications. The analysis results in abstract cost bounds parametric in the execution cost of the involved APEs, and a proof certificate for their correctness.

QAE has been evaluated with seven optimization techniques, including Split Loop and Loop Tiling. All models contain loops, for which the cost analyzer was able to automatically infer sufficiently strong invariants and postconditions.

5.3 Safe Code Parallelization

Parallelization of sequential code is one of the most important approaches, sometimes the only available one, to improve runtime performance. For this reason, code parallelization is one of the central research topics in the area of high-performance computing (HPC). In this community, design patterns are an established and powerful method to parallelize sequential programs [35, 51]. It makes sense to start with sequential programs, because these already serve their intended purpose; moreover, one avoids the loss of domain knowledge, documentation, and previous investments. In addition, patterns embody best practices, as well as correct and efficient usage of parallelization interfaces—knowledge that many programmers lack. Therefore, a pattern-based approach to parallelization constitutes a safe, efficient, and even semi-automatic [60] migration path from sequential to parallel code.

Unfortunately, pattern-based parallelization suffers from a severe practical limitation: sequential legacy code typically does not have exactly the form that allows the immediate application of a pattern. Hence, a certain amount of code restructuring is unavoidable in most cases before pattern-based parallelization becomes applicable. The developers of the DiscoPoP parallelization framework [60] devised a small number of code transformation schemata [52] that in many cases are sufficient to bring sequential code into the form required for pattern-based parallelization to succeed, i.e., these restructuring schemata prepare code for parallelization, but they still work on sequential code.

Consider, for example, the -loop in Listing 10, where \(\textit{stmt}_2\) depends on the result of \(\textit{stmt}_1\) and \(\textit{stmt}_1\) depends on the result of \(\textit{stmt}_3\) (across iterations). At first sight, the code might not seem parallelizable because of a forward dependency among loop iterations. However, an astute programmer might find a case where it is possible to successfully parallelize the code by just reordering the statements, placing \(\textit{stmt}_3\) before \(\textit{stmt}_1\), as depicted in Listing 11. Such a transformation preserves the semantics of the original code and makes it parallelizable using the pipeline pattern, which achieves functional parallelism, similar to an assembly line. The pipeline consists of various stages that process data concurrently as it passes through the pipeline. It can be used to parallelize the body of a loop if the loop cannot be parallelized in the conventional way by simply dividing the iteration space (the do-all pattern). The pipeline pattern assigns each computational unit to a processor and provides a mechanism for passing on data elements to the next unit. Then it runs the computational units in parallel. In the example, the execution of different loop iterations can overlap as long as \(\textit{stmt}_3\) is completed in iteration \(i\) before \(\textit{stmt}_1\) and \(\textit{stmt}_2\) start in iteration \(i+1\). This was not possible before because \(\textit{stmt}_3\) came last.

Fig. 9
figure 9

A rudimentary sample code parallelizable using the pipeline pattern
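Since Listings 10 and 11 are only referenced here, the following hypothetical instance of ours shows the reordering; the arrays and functions are stand-ins:

    // Before (Listing 10, schematically): stmt3 comes last
    for (int i = 1; i < n; i++) {
        a[i] = g(d[i - 1]);    // stmt1: reads d[i-1], written by stmt3 in iteration i-1
        b[i] = h(a[i]);        // stmt2: reads the result of stmt1
        d[i] = k(c[i]);        // stmt3: writes d[i]
    }

    // After placing stmt3 first (Listing 11, schematically):
    for (int i = 1; i < n; i++) {
        d[i] = k(c[i]);        // iteration i+1 can start its stmt1
        a[i] = g(d[i - 1]);    // as soon as d[i] is available
        b[i] = h(a[i]);
    }

The reordering is semantics-preserving because, within one iteration, stmt3 neither reads nor writes locations used by stmt1 and stmt2 (stmt1 reads d[i-1], not d[i]); it only shortens the cross-iteration distance between the write of d[i] and its read in the next iteration, which lets the pipeline stages overlap.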

The code transformation required to make the pipeline pattern applicable is called Computational Unit Repositioning in [52]. It is an instance of the Slide Statements refactoring technique mentioned in Sect. 5.1.3. Consequently, the preconditions for a safe application of the transformation are the same as for Slide Statements: Neither statement depends on the output of the other one, and neither can overwrite state changes of the other. The fully automatic proof in for this simple transformation technique consists of ca. 1,000 nodes and takes less than 7 seconds.

In [33], we additionally formalized and verified two more complex transformation schemata from [52], called Loop Splitting and Geometric Decomposition. These proofs were not fully automatic and required a number of user interactions. The formalization also required the concept of strongest loop invariants introduced in Sect. 5.1.2.

In addition, it was necessary to extend the AE framework with the possibility to specify and reason about families of abstract location sets. This extension allows for versatile specifications, but is a challenge for the prover. Still, we reached a degree of automation of 99.7% even for the most complex loop transformation.

Our formal models cover not merely the criteria necessary for proving the correctness of (sequential) program transformations, but also stronger constraints for the subsequent addition of parallelization directives. Crucial preconditions on memory access should be automatically checkable by parallelization tools, or can at least be closely approximated. By stating these requirements precisely and explicitly, we hope to have cleared the way to safer parallelization.

One obvious future work direction is the generalization of AE to parallel programs. This would allow us to go one step further: To mechanically prove that the constraints in our models are sufficiently strong to ensure the preservation of the sequential program semantics after parallelization.

5.4 Correctness-by-Construction

A posteriori verification (also called post hoc verification) designates the approach to deductive verification, whereby a program is verified after it has been constructed and, possibly, even deployed. This is by far the most common approach today, even though it is acknowledged that developing specifications post hoc for a program that was not designed with verification in mind makes the task of specification and verification considerably harder than necessary [11, 30].

Interestingly, early work in formal software development often argued for a different, a constructive approach, often termed correctness-by-construction [19, 29, 90]. Its starting point is not the program code, but a mathematical formalization of a program’s intended behavior (a specification), from which, in a series of refinement steps, a correctness proof together with executable code is gradually developed. The success of the B method [1, 2] notwithstanding, correctness-by-construction was never realized for an industrial programming language. One of the problems was the lack of tool support, but in the past years there has been a renewed interest [44, 45].

In traditional correctness-by-construction the refinement rules are directly derived from the axiomatic program semantics. For example, Hoare’s rule for sequential composition [34]

$$\begin{aligned} \frac{\{P\}\, S_1\, \{I\} \qquad \{I\}\, S_2\, \{Q\}}{\{P\}\, S_1\text {;}\, S_2\, \{Q\}} \end{aligned}$$

gives rise to a refinement rule, whereby a specification \(\{P\} S \{Q\}\) of an as yet unknown program \(S\) is refined into a program in the shape of a sequential composition. Its specification is extended with an intermediate assertion \(I\) that must hold in between \(S_1\) and \(S_2\). There are two main issues with this refinement approach: (1) the number of refinement rules is small and fixed, which results in a large number of fine-grained refinement steps; (2) refinement rules for complex language constructs (for example, aliasing, exception handling, dynamic dispatch, etc.) are complicated and need to be proven correct.

Based on the observation that S, \(S_1\), and \(S_2\) are APEs, it is possible to re-formulate a refinement rule as a program transformation rule (Sect. 5.1) in the AE framework. Therefore, AE has the potential to overcome the mentioned limitations: since AE was developed in the context of a deductive verification tool for Java, complex language constructs are supported, specifically, exceptions and framing. A tool such as (Sect. 6) lets one prove problem-specific refinement rules on the fly and could even provide immediate feedback on incorrect refinement attempts. A first proof-of-concept has been given in [89], but the full potential of AE for correctness-by-construction remains to be explored.

5.5 Sound Rule-Based Compilation

The principles underlying Abstract Execution, that is, the translation of APEs into abstract updates and the distinction of completion modes in separate symbolic execution branches, do not only apply to Java. If we permit abstract programs in the source and target languages of a compiler, we can reason about the compiler’s correctness. We investigated this idea based on the example of a rule-based compiler from Java to LLVM IR [78]. The compiler consists of translation rules for each syntactic element of the Java source language. We express these rules as schematic SE rules, except that we use dual SE states over program pairs. A dual SE state has the shape , where \(C\) and \(\mathcal {U}\) are a path condition and an update, respectively, \(p_1\), \(p_2\) are programs in the source and target language of the rule-based compiler, and is a set of observable locations. Intuitively, a dual SE state expresses the judgment that executing the two programs \(p_i\) in the state determined by \(C\) and \(\mathcal {U}\) has the same effect on the locations in . We restrict the set of locations to the “observable” ones to enable the introduction of intermediate assignments to registers by the compiler.

Fig. 10
figure 10

Dual SE rule for translating an statement to LLVM IR

Based on this formalism, we can express the compilation of a Java statement to LLVM IR as the dual SE rule depicted in Fig. 10. In the rule, \(P_1\) and \(P_1'\) are abstract Java statements, and \(P_2\) and \(P_2'\) are abstract LLVM IR statements. An LLVM IR statement is the statement arising from inserting the statement \(q\) into the statement \(p\) at position \(n\) such that in the resulting statement, the temporary registers , , etc., are assigned in sequential order as required by LLVM IR.

The attractive feature of this AE-based compiler is that we can automatically reason about the correctness of the translation rules. To that end, we define the validity of a dual SE state as the validity of the following justifying formula:

As usual in sequent calculi, a dual SE rule is sound if the conclusion’s validity follows from the premises’ validity. In other words, we can prove the correctness of the compiler by discharging proof obligations for each translation rule in our AE framework. As a trust anchor, we assume that the implementations of AE for the source and target languages are sound. This requirement corresponds to formalizing the semantics of those languages in proven-correct compilers written (and proved) in interactive proof assistants (e.g., the CompCert [48], Jinja [42], or CakeML [81] verified compilers). The difference is that our proofs are highly automated, as we can rely on the AE framework. In contrast, interactive proofs require substantial manual work: The CompCert code, for example, consists of 44% proof scripts [48].

5.6 Verification of Software Product Lines

A software product line (SPL, also called product family) [62] is a set of related programs, so-called product variants, that exhibit commonality as well as variability. Commonality typically manifests itself in terms of common core functionality and a code base shared by all variants. Variability derives from the need to support a possibly large number of different feature combinations (or products). The main argument for family-based software development is the possibility to factor out the commonalities and thus avoid having to develop and maintain a large number of variants in isolation. It is also much faster to realize a new product as a variant than to develop it from scratch. Family-based development is most productive in variant-rich application areas, such as consumer products, embedded (IoT) devices, but also operating systems.

Regarding analysis, specification, and verification of software, the quest to lift single product-based approaches to feature- and family-oriented [83] approaches resulted in various proposals [31, 32, 43, 84–86]. The fundamental design space of SPL verification is demarcated by two extremes: In the ideal scenario for feature-oriented verification, one verifies the core code and family-specific code separately for each feature. A suitable composition mechanism then guarantees the correctness of each valid variant. The main drawback is that compositionality requires serious constraints on the admissibility of contracts and feature implementation. For example, Hähnle & Schaefer [31] proposed an adaptation of Liskov’s Substitution Principle (LSP) [49] in the context of delta-oriented programming [67]. Further contract composition principles are discussed by Thüm et al. [84]. The problem with constraints on contract admissibility is that they often impose too severe restrictions on software design to be of practical use. On the other end of the design space lies product-based verification, where each valid product is specified and verified in isolation. This is usually prohibitive in cost, particularly with respect to specification [11]. Besides, it excludes systematic reuse, the purported main advantage of software product lines.

It turns out that AE enables an interesting trade-off [66] between the two extremes: a fully compositional verification approach whose constraints are not so restrictive as to render it impractical. The underlying variability principle is delta-oriented programming (DOP) [67], where each feature is implemented by one or more delta modules (deltas, for short) that are applied successively to a core variant. In DOP, code deltas specify incremental code transformations at the granularity of a method declaration. This aligns well with contract-based specification. Hence, each modification of a method in a delta is assumed to be specified with a contract. Moreover, a delta for a method can be declared relative to a previous version of that method, which is called using the keyword original in the delta’s code.
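
As a minimal sketch (DeltaJ-like syntax; the keywords are approximate and the class and method names are ours), a delta might modify a method of an existing variant and delegate to the previous version via original:

    // Delta module adding logging to an existing withdraw method.
    delta DLogging {
        modifies class Account {
            modifies void withdraw(int amount) {
                log("withdraw " + amount); // feature-specific behavior
                original(amount);          // invoke the previous version
            }
        }
    }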

The new idea of [66], compared to the cited earlier work, is to impose relatively liberal constraints on deltas and contracts that permit overriding of behavior and are not compositional in the general case. Compositionality is regained by imposing a normal form on the code declared in a delta. The obvious drawback is that legacy code generally does not follow this normal form, but [66] showed that a small number of behavior-preserving program transformations suffice to achieve the normal form in practice: in several case studies involving legacy code, only one instance required limited remodeling. It was possible to use the Safe Refactoring approach described in Sect. 5.1 with minor modifications.

Because the LSP is broken by overriding, unlike in [31, 84], the correctness of calls to original is no longer guaranteed by a general composition principle (plus the correctness of original-free methods). Instead, the correctness of contracts with calls to original must be established with the help of the constraints implied by the normal form. However, since a call to original can be seen as an AS, this can be directly modeled and proven with AE. For the case studies in [66], all necessary transformation schemata and contracts were proven fully automatically.

6 Implementation

We implemented Abstract Execution on top of the SE engine of the deductive program verifier KeY [5]. Deviating from the uniform representation in Sect. 4, there exist dedicated AS execution rules tailored to different contexts. For instance, if an AS is executed outside of a loop, continues and unlabeled breaks are omitted. This saves having to explicitly exclude behavior that cannot occur.

AE Rules as Taclets

Most SE rules in KeY are implemented as so-called taclets [65]. Taclets provide a textbook-like notation for schematic sequent calculus rules with side conditions and, in particular, for KeY’s calculus. The taclet syntax is easy to learn and even easier to read. The taclet semantics is formally defined relative to possible external application-specific conditions and transformers. Derivable taclets, which are not part of the axiomatic basis of KeY, can be automatically proven correct within KeY itself [14].
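
For illustration, consider the classic andLeft taclet from KeY’s propositional rule base (quoted from memory; the exact form may differ slightly between KeY versions). Its \find clause matches a conjunction on the left-hand side of a sequent, which \replacewith splits into the two conjuncts:

    andLeft {
        \find(b & c ==>)
        \replacewith(b, c ==>)
        \heuristics(alpha)
    }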

Except for a few complex SE rules that require intricate program transformations and are implemented directly in Java, all of KeY’s rules are expressed in taclet notation. To extend rule coverage as much as possible, the taclet language offers two extension points: variable conditions, which answer complex queries about the proof environment and initialize custom data structures, and transformers, which create terms and program elements or perform complex transformations on existing ones. A rule set based solely on extended taclets imposes some overhead compared to a pure Java implementation, because of a necessarily more fine-grained decomposition into conditions and transformers, as well as a larger amount of parsing. Extension points also hide part of the taclet semantics and hinder fully automatic proofs of derived taclets. Nevertheless, we decided to implement the abstract execution rules as extended taclets, because the advantages far outweigh the problems: (1) A taclet specification, even with extensions backed by custom Java code, is still less opaque and better maintainable than a pure Java implementation. (2) The AE rules are considered axioms, which is why they are not derivable anyway.

We defined four rules for ASs (for different contexts) and one rule for AExps (only one context is needed) as extended taclets, making use of 11 new variable conditions and transformers. Each of these is realized as a simple, stand-alone Java class with a clear responsibility, exposing as many details as possible in the textual representation of the taclet itself. As a representative example, one of our extensions concerns a “for-each” construct for iterating over schema variables of list type. The following taclet code handles the case that an AS completes abruptly due to a labeled break. It occurs in the SE rules for ASs, specifically in the part describing the shape of the code resulting from the execution:

[Figure: taclet code for the labeled-break case, using the for-each construct]

Implementing the for-each construct is more difficult than simply writing a single dedicated Java transformer that could replace the code above, but the result is more transparent and reusable, and comes close to a textbook-style description of the rule.

Due to the complexity inherent in abstract statements, including completion modes and framing, the AE taclets implementing ASs are, to the best of our knowledge, the most complex ones ever implemented in KeY. The longest taclet for ASs has 19 variable conditions; the clause specifying the premises of the rule spans 68 lines, and the taclet extends to 79 lines of code in total.

Appendix C shows the AE taclet for AExps. The AS taclets have a similar shape but are even lengthier, since more abrupt completion possibilities must be considered.

Built-In Rules for Abstract Update Simplification

We had to realize a number of abstract update simplification rules as “built-in” rules directly in Java instead of as taclets. The main reason is that these rules depend on a variable number of premises in a context that is initially unknown. For instance, to implement the rule in Fig. 11, we found no straightforward way to extend the taclet mechanism to allow for a more flexible specification of premises without the risk of breaking the existing implementation.

Realizing the AS implementation as a pure extension of KeY, without modifying the system itself, was an important design constraint that greatly improves the maintainability of the AS functionality in future versions of KeY.

7 Related Work

Schematic programs are a natural way to describe program transformations declaratively and modularly: One describes how to transform, for example, an if statement, while delegating the transformation of the then and else clauses to separate rules. The contents of these clauses are represented as placeholders.

One of the first applications of AE, and its original motivation, was the design of a modular, rule-based compiler with automatically proven-correct transformation rules [78] (see Sect. 5.5). Compilers are program transformers; within the area of proven-correct program transformations, mechanically verified compilers have gained significant interest. Compilers such as CompCert [48], CakeML [81], and Jinja [42] provide strong correctness guarantees. They all have in common that the source and target languages, correctness properties, and proofs are mechanized in interactive proof assistants such as Coq [82], Isabelle [59], or Lean [56]. These systems rely on expressive logical frameworks. In contrast, the scope of AE is restricted to universal, behavioral program properties: AE abstracts away from the inner structure of the programs being proven. Conversely, this restriction is also a selling point of AE. We demonstrated that our framework is expressive enough to be applied to refactoring, cost analysis of transformations, code parallelization, correctness-by-construction, software product line engineering, and rule-based compilation (see Sect. 5). In almost all of these cases, the AE extension of KeY found correctness proofs fully automatically. In contrast, interactive proof assistants require the user to write proof scripts manually: The CompCert code consists of 44% proof scripts [48].

In the context of the KeY system, Ahrendt et al. [6] and Bubel et al. [14] addressed automatic correctness proofs of program transformation rules. The former work projects rules onto an executable semantics in the rewriting logic of the Maude system [55]; the latter validates the correctness of derived SE rules within KeY itself. Both approaches are less expressive than AE. For example, Ahrendt et al. support only schematic expressions and Bubel et al. only schematic statements; we support both. The rewriting logic of Ahrendt et al. does not model abrupt completion. Neither approach admits constraints on frames and footprints, nor the specification of pre- and postconditions for APEs.

Abstract Execution completely decouples reasoning over abstract programs from the problem of checking whether a given concrete instance of an AS is valid. We discuss two approaches that, vice versa, check “eagerly” for instance validity. The first is the calculus for the differential dynamic logic \(\textsf {dL}\) of the KeYmaera X system [61]. It is based on uniform substitution of function, predicate, and program symbols. The calculus’ axiomatic core is a set of concrete formulas with uninterpreted function/predicate/program symbols that may be instantiated to concrete functions/predicates/programs via uniform substitution. Substitutions are sound provided they do not clash; for example, substituting a term with a free variable for another term at a position where that variable is bound is forbidden. Uninterpreted symbols can be viewed as abstract, hence it is possible to express and derive formulas over ASs in the \(\textsf {dL}\) calculus. Consider the \(\textsf {dL}\) formula

\([\,\texttt {if}\,(q)\ \{a;\ c\}\ \texttt {else}\ \{b;\ c\}\,]\,\varphi \leftrightarrow [\,\texttt {if}\,(q)\ \{a\}\ \texttt {else}\ \{b\};\ c\,]\,\varphi \)

that represents the Consolidate Duplicate Conditional Fragments refactoring (see Sect. 5.1.3), where \(a\), \(b\), \(c\), \(q\), and \(\varphi \) are arbitrary programs, conditions, and formulas, respectively. Its validity is provable in \(\textsf {dL}\). The drawback is that the conditions under which substitutions are sound, i.e., under which no clash occurs, become rather complex. For example, one must implement the uniform substitution operator such that it avoids the case that an instance of \(q\) uses a variable assigned by an instance of \(a\), etc. While manageable for the modeling language of \(\textsf {dL}\), this becomes extremely complex as soon as programming language features such as reference types, scopes, visibility rules, and exceptions are introduced.

The most ambitious attempt at a substitution-centric approach so far is partial evaluation [23]. Given a program \(p\) with parameters, one considers a subset \(\overline{\texttt {e}}\) of its parameters to have static values \(\overline{e}\) at compile time. Viewed from an AE perspective, partial evaluation instantiates every occurrence of \(\overline{\texttt {e}}\) in the abstract program \(p(\overline{\texttt {e}})\) with the concrete \(\overline{e}\), followed by a simplification of the resulting concrete program using \(\overline{e}\). The idea is to obtain a program that is more efficient than the original \(p\) under the assumption that part of the input is static. As in uniform substitution, the main technical challenge is to find syntactic conditions that avoid clashes admitting invalid substitutions [70]. For realistic functional or imperative programming languages this is so complex that it became a main research direction in partial evaluation, known as binding-time analysis [38]. As far as we know, it was never fully axiomatized in a calculus.
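
For intuition, consider the following textbook-style sketch (our own example, not taken from [23]):

    // Generic program: x is dynamic, n is static at specialization time.
    static int power(int x, int n) {
        int r = 1;
        for (int i = 0; i < n; i++) {
            r *= x;
        }
        return r;
    }

    // Residual program obtained by specializing power for n = 3: the
    // loop is unrolled, and all occurrences of n disappear.
    static int power3(int x) {
        return x * x * x;
    }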

In a program repair scenario, Mechtaev et al. [54] use abstract programs with parametric schematic expressions. Their goal is to synthesize witnesses for these expressions satisfying a postcondition. Compared to our work, Mechtaev et al. (1) address existential second-order program proofs, (2) do not consider abrupt completion, and (3) have no concept of abstract statement.

Godlin and Strichman [26] perform regression verification of closely related program versions. To automate proofs in the presence of recursive functions, they replace recursive calls with uninterpreted function symbols; loops are transformed into recursive functions beforehand. Although AExps are related to uninterpreted functions, the latter are pure. The approach neither needs nor supports ASs.
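
A schematic illustration of this reduction (our own sketch; all names are hypothetical):

    // Two closely related versions of a recursive function. For the
    // equivalence check, the recursive call in each version is replaced
    // by the same uninterpreted function symbol uf, leaving bodies that
    // are recursion-free and can be compared automatically.
    static int sumV1(int n) {
        return (n <= 0) ? 0 : n + uf(n - 1);
    }
    static int sumV2(int n) {
        if (n <= 0) return 0;
        return n + uf(n - 1);
    }
    // uf stands for an arbitrary pure function: the equivalence of
    // sumV1 and sumV2 must hold for every interpretation of uf.
    static int uf(int n) { throw new UnsupportedOperationException(); }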

The PEC system [46] uses meta variables for expressions, variables, and statements to prove compiler optimizations correct. Like the other approaches discussed, PEC does not support additional specifications constraining the behavior of placeholders, and its “meta statements” can only complete normally. Alive [50] has a more restricted scope: It automatically proves the correctness of peephole optimizations for LLVM, such as rewriting a multiplication by a power of two into a left shift. These optimizations are expressed in a restricted DSL that is less general than the AE framework: only register names can be abstract, and programs cannot contain loops.

Several works address the correctness of refactoring. We distinguish methods that statically verify refactoring techniques, including the extraction of preconditions using formal methods and the static enforcement of safe refactoring, from dynamic techniques based on testing and runtime assertions.

Garrido and Meseguer [24] formalized the Java refactoring techniques Pull Up / Push Down Field, Pull Up / Push Down Method, and Rename Temporary in Maude’s rewriting logic. They prove the correctness of two refactoring techniques using a mixture of Maude evaluation and pen-and-paper proofs. Our AE-based proofs, in contrast, are fully mechanized and were found by automatic proof search.

Using dynamic frames, AE abstracts away from concrete variable and field names. Schäfer et al. [68] address the prevention of naming and accessibility problems during code refactoring. For example, their framework ensures that a moved reference is still bound to the same declaration. For a safe application of refactoring techniques in practice, the behavioral guarantees of AE should be combined with a framework aware of names and bindings. Silva et al. [71] use Alloy [36] models to verify the type correctness of Java code transformations. They claim to cover everything except behavioral issues, which is precisely what AE handles.

The design of the REFINITY system (see Sect. 5.1) favors the formalization of statement-level refactoring techniques. Recent work by Abusdal et al. [3] demonstrates that class-level refactoring techniques can be modeled and proven in our framework as well: the authors verified Hide Delegate, a technique involving multiple classes.

Regarding dynamic techniques, Soares et al. [74] use static analysis to automatically generate test suites that detect behavioral changes caused by code refactoring. Eilertsen et al. [20] add correctness assertions in the course of applying refactorings. Relatedly, Namjoshi and Zuck [58] generate witnesses during program transformation that guarantee the equivalence of the source and target programs.

8 Conclusion and Future Work

We presented the theory, implementation, and known applications of Abstract Execution (AE), a specification and semi-automated verification framework for behavioral, universal second-order properties of schematic programs. In AE, programs may contain schematic placeholders for both statements and expressions. The behavior of instantiations of these placeholders can be constrained by fine-grained, yet abstract, specifications. Compared to our previous conference publication [79], we (1) extended our framework with abstract expressions and set-valued specification variables for assigned and used locations, (2) provided a precise, formal semantics of abstract programs, (3) detailed our extended set of abstract update simplification rules, (4) described our implementation, and (5) discussed recent applications of the AE framework. In the context of our flagship application to verified code refactoring, we report on bugs we discovered in the refactoring engines of popular Java IDEs.

There are many ways to build on this work. Our present work demonstrates that AE is useful in diverse application scenarios. For example, one might analyze compiler optimizations beyond the “peephole” level targeted by frameworks such as Alive [50], or equivalence-preserving transformations for metamorphic testing [16] of compilers.

Going in a different direction, one could generalize concrete programs causing a specific behavior in program transformers (e.g., a crash or non-compiling output) to abstract versions describing the class of behavior-triggering inputs. Such an abstraction represents a hypothesis about the origins of the failure for debugging, in the spirit of Delta Debugging [92], DDSet [27], and the Alhazen tool [39]. This gives rise to another idea: Given an abstract program that forms a hypothesis about a class of behavior-inducing programs, one might help developers by creating concrete instances from the schematic one for further testing and validation. In our bug report to the IntelliJ developers, for instance, we could then have submitted not only a single failing example but also an abstract program from which the developers could generate myriads of further tests fully automatically.

Similar in spirit is instance checking: Is a given concrete program fragment likely to trigger a known bug described by an abstract one? Or is it an admissible input to an Extract Method refactoring? Such an instance checker, for which we have developed an early prototype, is the first ingredient for deriving, from an input-output pair of schematic programs, a transformer that detects possible input fragments in a larger context and transforms them into an optimized version.

Static instance checking is, in general, expensive: We might have to come up with strong loop invariants where the models use abstract invariants; instantiating concrete, precise frames is another non-trivial task. Instead, we could follow the approach of Eilertsen et al. [20] and derive, from transformation models, safety assertions checking runtime-enforceable properties of the models. Combined with test generation and the value of derived safety preconditions as a means of documentation, we can get the most out of an existing model even without static instance proofs.

While one can define separate, precise pre- and postconditions for all completion modes in an APE specification (for example, normal completion versus abrupt completion), the AE framework currently allows only a single frame and footprint specification. The possibility to devise separate frames and footprints for each completion mode would enable more precise specifications and thus a broader set of representable concrete instances. Relatedly, we extended the specification language to support parametric location sets in our application to code parallelization (Sect. 5.3). As a result, however, this application is the only one where the proof search required human interaction. Improving automation for such language extensions is essential for the attractiveness and acceptance of AE.

Automation can also be improved for our relational transformation proofs in general. Our approach reduces a relational problem to the functional verification of individual programs, which works reasonably well with strong, but abstract, loop invariants. Using techniques from relational verification, such as relational invariants [12], it might be possible to avoid the need for strongest invariants, simplifying static instance checking and making AE ready for larger transformation proofs. Complicated, realistic transformation proofs might also require stronger support for heap-related properties. Compared to our earlier work [79], we significantly improved our abstract update simplification rules for heaps (Sect. 4.2.2); yet, more complex case studies will likely bring up situations demanding additional strategies for completeness and automation.

Abstract Execution is a promising and practically useful technique for proving properties of infinitely many programs in a single proof, with most applications residing in the area of program transformations. In this work, we have provided a complete account of the current state of the theory behind AE and its applications so far. We believe that this overview will provide a fruitful basis for a plethora of interesting follow-up work on the rigorous verification of program transformations.