
1 Introduction

Developing static analyses is a laborious and complicated task due to the complexity of modern programming languages. A significant part of the complication pertains to ensuring that static analyses are sound, i.e., over-approximate the runtime behavior of analyzed programs. Unfortunately, even well-established static analyses have been shown to be unsound, e.g., since 2010, more than 80 soundness bugs have been found in different analyses used in the LLVM compiler [46]. Testing helps find soundness bugs but cannot prove their absence, leaving the trustworthiness of these analyses in question.

Mathematical soundness proofs ensure the absence of soundness bugs. However, such proofs are difficult for two reasons: First, soundness proofs relate two program semantics, the static semantics and the dynamic semantics [12], and each of them can be complex in its own right. Especially modern programming language features such as reflection [30], concurrency [29], or native code [1] are notoriously difficult to analyze and hard to reason about. Second, the style of static and dynamic semantics can differ significantly, e.g., the static semantics of Doop [7], which is described in Datalog, differs significantly from dynamic semantics described with small-step rules [6]. This impedance mismatch makes soundness proofs monolithic, i.e., it is difficult to determine which parts of the static semantics relate to which parts of the dynamic semantics, requiring the soundness proofs to reason about both semantics as a whole. These problems complicate soundness proofs such that only leading experts with multiple years of experience can conduct them [13, 26].

To deal with the complexity of soundness proofs, existing works modularize static and dynamic semantics [5, 14, 28]. This modularization makes it possible to compose a soundness proof for the entire analysis from soundness lemmas about small parts of the analysis, so that developers can reason about one small part at a time. These existing works require that both the static and dynamic semantics are derived from the same artifact, often called a generic interpreter. A generic interpreter describes the operational semantics of a language without referring to details of the dynamic or static semantics, and provides a common structure along which a soundness proof can be composed. However, generic interpreters restrict what types of analyses can be derived. In particular, generic interpreters derive analyses that follow the program execution order, specifically, forward whole-program abstract interpreters. But it is unclear how to derive other types of analyses that do not follow the program execution order, such as backward, demand-driven/lazy, or summary-based analyses.

The work presented in this paper lifts this restriction by developing a soundness theory for the blackboard analysis architecture. The architecture is the foundation of the OPAL framework [21], which has been used to develop different kinds of analyses, including backward analyses [17], on-demand/lazy analyses [19, 41], and summary-based analyses [21]. In the architecture, complex static analyses are modularly composed from smaller, simpler static modules that handle individual language features, e.g., reflection, or program properties, e.g., immutability. These modules are decoupled—they are not allowed to call each other directly; instead, they communicate by exchanging information via a central data store called a blackboard [39], orchestrated by a fixpoint solver.

To develop a soundness theory for the blackboard analysis architecture, we define a dynamic semantics that follows the same style as the static semantics and thus avoids the impedance mismatch problem. Specifically, the dynamic semantics is composed of dynamic modules that communicate with each other via a store. Our soundness theory is compositional, which means that each static module can be proven sound individually and soundness of the compound analysis follows from a meta theorem. Furthermore, we extend the theory to make soundness proofs of existing static modules reusable across different analyses. In particular, we prove that the soundness proof of a static module remains valid, even if (a) the compound analysis processes source code elements unknown to the module and (b) the store contains other types of analysis information unknown to the module. Furthermore, our proofs are polymorphic in the lattices on which static modules operate, i.e., the lattices can be changed without affecting soundness. For instance, we can reuse a static pointer-analysis module, which typically depends on an allocation-site lattice, in a reflection analysis to propagate string information by extending this lattice, without invalidating the pointer module's soundness proof.

We demonstrate the applicability of our theory by implementing four different analyses and their dynamic semantics in the blackboard analysis architecture: (1) a pointer and call-graph analysis, (2) an analysis for reflection, (3) an immutability analysis, and (4) a demand-driven reaching-definitions analysis. Our choice of analyses is inspired by existing state-of-the-art analyses for Java implemented in the OPAL framework [21, 41]. We implemented and tested each analysis and dynamic semantics in Scala to ensure they are executable. Furthermore, we used our theory to prove each analysis sound, where each analysis exercises a different aspect of our theory: (1) static modules can be proven sound independently despite mutually depending on each other, (2) soundness of modules remains valid even though the lattice changes, (3) soundness of a module remains valid even though different source code elements are analyzed, and (4) our theory applies to analyses which do not follow the program execution order.

In summary, we make the following contributions:

  • We give the first formalization of the blackboard analysis architecture (Section 2).

  • We develop a theory of compositional soundness proofs for the formal model of the blackboard analysis architecture. We prove that soundness of an analysis follows from independent soundness proofs for each of its modules (Section 3).

  • We show how to make soundness proofs reusable by extending our theory (Section 4).

  • We demonstrate the applicability of our theory on four different types of analyses (Section 5).

All proofs of theorems, lemmas, and case studies are provided in the paper’s supplementary material.

2 Blackboard Analysis Architecture

In this section, we introduce and formalize the static and dynamic semantics of the blackboard analysis architecture used in the OPAL framework [21].

2.1 Static Semantics

Static analyses in the blackboard analysis architecture consist of multiple static modules exchanging information via a central data store called a blackboard [39]. This avoids coupling between modules, as they are not allowed to call each other directly: Modules store analysis results in the blackboard using keys. These keys allow other modules to retrieve results without needing to know their producer.

Definition 1

(Static Semantics). We define basic notions and datatypes of the static semantics of the blackboard analysis architecture:

  1. Entities (\(\smash {\widehat{e}} \in \smash {\widehat{\mathtt {\textsf{Entity}}}}\)) are parts of programs an analysis can compute information for. For example, entities could be classes, methods, statements, fields, variables, or allocation sites of objects. Entities are ordered discretely: \(\smash {\widehat{e}}_1 \sqsubseteq \smash {\widehat{e}}_2 \text { iff } \smash {\widehat{e}}_1 = \smash {\widehat{e}}_2.\)

  2. Kinds (\(\kappa \in \textsf{Kind}\)) identify analysis information that can be computed for an entity. For example, a class entity could have kinds for its immutability or thread safety, a variable entity could have kinds for its definition site or approximations of its value. Kinds are also ordered discretely.

  3. Properties (\(\smash {\widehat{p}} \in \smash {\widehat{\mathtt {\textsf{Property}}}}[\kappa ] \text { where } \smash {\widehat{\mathtt {\textsf{Property}}}}: \textsf{Kind}\rightarrow \textsf{Lattice}\)) denote analysis information which is identified by a kind \(\kappa \). For instance, a class entity could have an immutability property “mutable” or “immutable”. Properties of a kind are partially ordered and form a lattice.

  4. A central store (\(\smash {\widehat{\mathtt {\sigma }}} \in \smash {\widehat{\mathtt {\textsf{Store}}}} \subseteq \smash {\widehat{\mathtt {\textsf{Entity}}}} \times (\kappa : \textsf{Kind}) \rightharpoonup \smash {\widehat{\mathtt {\textsf{Property}}}}[\kappa ]\)) contains all properties for each entity and kind. We use the notation \(\smash {\widehat{\sigma }}(\smash {\widehat{e}},\kappa )\) for a store lookup of an entity \(\smash {\widehat{e}}\) and kind \(\kappa \), which results in the bottom element \(\bot \) in case the property is not present. Furthermore, we use the notation \(\smash {\widehat{\sigma }} \sqcup [\smash {\widehat{e}},\kappa \mapsto \smash {\widehat{p}}]\) for writing a new property \(\smash {\widehat{p}}\) to the store. If a property for the entity \(\smash {\widehat{e}}\) and kind \(\kappa \) already exists in the store, the old property is joined with the new one. Stores are ordered point-wise.

  5. Static modules (\(\smash {\widehat{f}} \in \smash {\widehat{\mathtt {\textsf{Module}}}} = \smash {\widehat{\mathtt {\textsf{Entity}}}} \times \smash {\widehat{\mathtt {\textsf{Store}}}} \rightarrow \smash {\widehat{\mathtt {\textsf{Store}}}}\)) are monotone functions that compute properties of a given entity. The store allows multiple static modules to communicate and exchange information without having to call each other directly. Each static module has access to the entire store and can contribute to one or more properties.

  6. The fixpoint algorithm (\(\textsf{fix}: \mathcal {P}(\smash {\widehat{\mathtt {\textsf{Module}}}}) \times \smash {\widehat{\mathtt {\textsf{Store}}}} \rightarrow \smash {\widehat{\mathtt {\textsf{Store}}}}\)) computes a fixpoint of a compound analysis \(\smash {\widehat{F}} \in \mathcal {P}(\smash {\widehat{\mathtt {\textsf{Module}}}})\) for an initial store \(\smash {\widehat{\sigma }}_1\). More specifically, the fixpoint \(\textsf{fix} (\smash {\widehat{F}},\smash {\widehat{\sigma }}_1)\) is a store \(\smash {\widehat{\sigma }}_n \sqsupseteq \smash {\widehat{\sigma }}_1\) such that static modules \(\smash {\widehat{f}} \in \smash {\widehat{F}}\) do not add new information, i.e., \(\smash {\widehat{f}}(\smash {\widehat{e}},\smash {\widehat{\sigma }}_n) = \smash {\widehat{\sigma }}_n\) for all \(\smash {\widehat{e}} \in \mathop {dom}(\smash {\widehat{\sigma }}_n)\). The fixpoint is unique and guaranteed to exist when all properties are lattices of finite height [10].

The types \(\smash {\widehat{\mathtt {\textsf{Entity}}}}\), \(\textsf{Kind}\), and \(\smash {\widehat{\mathtt {\textsf{Property}}}}\) are defined by analysis developers, whereas the other types and functions are fixed by this definition.    \(\square \)
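To make items 4–6 of Definition 1 concrete, the following is a minimal executable sketch in Python rather than the paper's Scala-like pseudocode. All identifiers are ours; properties of every kind are modeled as finite sets ordered by inclusion, so \(\bot\) is the empty set and join is set union, and the fixpoint loop is the naive round-robin iteration rather than OPAL's solver.

```python
# Illustrative sketch of Definition 1, items 4-6 (names are ours, not the
# paper's). Properties are finite sets ordered by inclusion, so bottom is
# the empty set and join is set union.

def lookup(store, entity, kind):
    # Store lookup defaults to the bottom element when the property is absent.
    return store.get((entity, kind), frozenset())

def joined(store, entity, kind, prop):
    # store ⊔ [e, κ ↦ p]: join the new property with any existing one.
    new = dict(store)
    new[(entity, kind)] = lookup(store, entity, kind) | prop
    return new

def fix(modules, store):
    # Naive fixpoint: apply every module to every entity in the store's
    # domain until no module adds new information.
    changed = True
    while changed:
        changed = False
        for f in modules:
            for (entity, _kind) in list(store):
                new_store = f(entity, store)
                if new_store != store:
                    store, changed = new_store, True
    return store
```

Because every write goes through `joined`, modules built this way are monotone by construction, so the iteration terminates whenever the property lattices have finite height, as the definition requires.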

We illustrate Definition 1 using the example of a text-book reaching-definitions analysis [38] for an imperative language with labeled assignments and expressions:

figure c

The static module \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) is implemented with Scala-like pseudo code. Module \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) computes for every statement of the program which variable definitions reach it. Therefore, entities are statements and the module's property is a mapping from variables to the assignments that may have defined them. Module \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) joins the reaching definitions of all control-flow predecessors and then updates them on variable assignments. Note that module \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) neither computes the control-flow predecessors directly nor does it call another module which computes this information. Instead, it retrieves this information from the store \(\smash {\widehat{\sigma }}\). This decoupling avoids dependencies between static modules and enables compositional soundness proofs.
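Since the module above is given only in Scala-like pseudocode, a hypothetical Python version may help. The program representation (`program[label]` as a pair of statement kind and assigned variable), the kind names `"cfPred"` and `"reachingDefs"`, and all other identifiers are our own illustrative assumptions.

```python
# Hypothetical sketch of the reaching-definitions static module: statements
# are entities, and the property at kind "reachingDefs" maps each variable
# to the set of assignment labels that may define it.

def join_defs(d1, d2):
    # Point-wise union of two variable -> set-of-labels maps.
    out = dict(d1)
    for var, labels in d2.items():
        out[var] = out.get(var, frozenset()) | labels
    return out

def reaching_defs(stmt, store, program):
    # `program[label]` is (kind_of_statement, assigned_var_or_None).
    # Join the reaching definitions of all control-flow predecessors,
    # read from the store rather than computed here...
    preds = store.get((stmt, "cfPred"), frozenset())
    incoming = {}
    for p in preds:
        incoming = join_defs(incoming, store.get((p, "reachingDefs"), {}))
    # ...then update on assignments: the assigned variable is defined here.
    _, var = program[stmt]
    if var is not None:
        incoming = dict(incoming, **{var: frozenset({stmt})})
    new = dict(store)
    new[(stmt, "reachingDefs")] = join_defs(
        store.get((stmt, "reachingDefs"), {}), incoming)
    return new
```

Note that the module only reads `"cfPred"` from the store; nothing in its code refers to whichever module produced that information.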

2.2 Dynamic Semantics

Static analyses in the blackboard analysis architecture are proven sound with respect to a dynamic semantics in the same style, which we define formally in this subsection:

Definition 2

(Dynamic Semantics). We define the dynamic semantics used to prove soundness of analyses in the blackboard analysis architecture:

  1. The dynamic semantics depends on concrete versions of entities (\(e \in \textsf{Entity}\)), properties (\(p \in \textsf{Property}[\kappa ]\) where \(\textsf{Property}: \textsf{Kind}\rightarrow \textsf{Set}\)), and stores (\(\sigma \in \textsf{Store}\subseteq \textsf{Entity}\times (\kappa :\textsf{Kind}) \rightharpoonup \textsf{Property}[\kappa ]\)). The kinds are the same as for static modules.

  2. Dynamic modules (\(f \in \textsf{Module}= \textsf{Entity}\times \textsf{Store}\rightharpoonup \textsf{Store}\)) are partial functions which may only be defined for a subset of entities. Furthermore, a dynamic module is undefined in case it tries to look up an element from the store that is not present.

  3. Static analyses are proven sound with respect to a dynamic reachability semantics (\(\textsf{reachable}: \mathcal {P}(\textsf{Module}) \times \textsf{Store}\rightarrow \mathcal {P}(\textsf{Store})\)). The reachability semantics returns the set of all stores reachable by iteratively applying a set of dynamic modules. More specifically, the set \(\textsf{reachable} (F,\sigma _1)\) contains the store \(\sigma _1\) and, for every \(f \in F\), every reachable store \(\sigma \), and every entity \(e \in \mathop {dom}(\sigma )\), it also contains \(f(e,\sigma )\) whenever it is defined.   \(\square \)

We illustrate these definitions again using the example of the reaching-definitions analysis introduced in the previous subsection:

figure d

Dynamic module reachingDefs is analogous to its static counterpart \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\), but computes the most recent definition of a variable instead of all possible definitions. The dynamic module depends on the control-flow predecessor, which is the most recently executed statement. The control-flow predecessors are computed by module controlFlow, which is based on a small-step operational semantics \(\texttt {step}: \texttt{Stmt} \times \texttt{ProgramState} \rightharpoonup \texttt{Stmt} \times \texttt{ProgramState}\). Module controlFlow demonstrates that the blackboard architecture is capable of integrating existing dynamic operational semantics, such as those for Java [6] or WebAssembly [18].
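A toy instance of such a controlFlow module might look as follows. The straight-line `step` function, the store key `("prog", "state")`, and the kind name `"cfPred"` are purely illustrative assumptions of ours, standing in for a real operational semantics.

```python
# Sketch of a controlFlow dynamic module derived from a small-step semantics.
# `step` maps a statement and program state to a successor; the module records,
# for the successor, its most recently executed predecessor.

def step(stmt, state):
    # Tiny stand-in for an operational semantics: fall through to the next
    # numbered statement, undefined (None) after the last one.
    nxt = stmt + 1
    return (nxt, state) if nxt <= state["last"] else None

def control_flow(stmt, store):
    res = step(stmt, store[("prog", "state")])
    if res is None:
        return None  # partial: no successor, the module is undefined here
    nxt, state = res
    new = dict(store)
    new[("prog", "state")] = state
    new[(nxt, "cfPred")] = stmt  # concrete: the one most recent predecessor
    return new
```

Swapping `step` for a full small-step semantics (e.g., for Java or WebAssembly) would leave `control_flow` itself unchanged, which is the integration point the paragraph above describes.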

The blackboard analysis architecture not only modularizes the static semantics but also the dynamic semantics, which is crucial for enabling compositional and reusable soundness proofs. In particular, each static module is proven sound with respect to exactly one dynamic module, which limits the proof scope and guarantees proof independence. Furthermore, for analyses that approximate non-standard dynamic semantics, the standard dynamic semantics can be modularly extended with further modules (e.g., Section 5.1).

To summarize, in this section we formally defined the blackboard analysis architecture, which allows static analyses to be implemented modularly. Furthermore, we defined a dynamic semantics in the same style against which analyses are proven sound.

3 Compositional Soundness Proofs

In this section, we develop a theory of compositional soundness proofs for analyses in the blackboard style: Soundness of a compound analysis follows directly from soundness of the individual static modules. This soundness theory simplifies the soundness proof, because it allows analysis developers to focus on soundness of individual static modules, instead of having to reason about soundness of the interaction of all static modules with each other. Furthermore, the soundness theory makes the proofs more maintainable, as a change to a module only affects the proof of that module and nothing else.

We start the section by defining soundness of static modules and then work up to soundness of whole analyses. The definitions of soundness are standard and build upon the theory of abstract interpretation [12]:

Definition 3

(Soundness of Static Modules). A static module \(\smash {\widehat{ f }} \in \smash {\widehat{\mathtt {\textsf{Module}}}}\) is sound if it overapproximates its dynamic counterpart \( f \in \textsf{Module}\):

$$\begin{aligned} & \textsf{sound}( f ,\smash {\widehat{ f }})\ \text {iff}\ \forall \smash {\widehat{e}} \in \smash {\widehat{\mathtt {\textsf{Entity}}}}, \smash {\widehat{\sigma }} \in \smash {\widehat{\mathtt {\textsf{Store}}}}, e \in \gamma _{\textsf{Entity}}(\smash {\widehat{e}}), \sigma \in \gamma _{\textsf{Store}}(\smash {\widehat{\mathtt {\sigma }}}) . \\ & \qquad \qquad \qquad f (e, \sigma ) \in \gamma _{\textsf{Store}}(\smash {\widehat{ f }}(\smash {\widehat{e}},\smash {\widehat{\mathtt {\sigma }}})) \end{aligned}$$

   \(\square \)

The expression \(x \in \gamma (\smash {\widehat{y}})\) reads as “element \(\smash {\widehat{y}}\) soundly overapproximates the concrete element x.” Function \(\gamma : \smash {\widehat{L}} \rightarrow \mathcal {P}(L)\) is a monotone function from an abstract domain \(\smash {\widehat{L}}\) to the powerset of a concrete domain L, called the concretization function. We require neither that an abstraction function \(\alpha : \mathcal {P}(L) \rightarrow \smash {\widehat{L}}\) in the opposite direction exists nor that \(\gamma \) and \(\alpha \) form a Galois connection, as neither is necessary for soundness proofs.

The soundness definition above requires that analysis developers define concretizations for entities (\(\gamma _{\textsf{Entity}}: \smash {\widehat{\mathtt {\textsf{Entity}}}} \rightarrow \mathcal {P}(\textsf{Entity})\)) and properties (\(\gamma _{\textsf{Property}}: \smash {\widehat{\mathtt {\textsf{Property}}}}[\kappa ] \rightarrow \mathcal {P}(\textsf{Property}[\kappa ])\)). Often the abstract and concrete entities are of the same type (\(\smash {\widehat{\mathtt {\textsf{Entity}}}} = \textsf{Entity}\)). In this case, the concretization functions map to singleton sets (\(\gamma _\textsf{Entity}(e) = \{e\}\)). Based on concretization functions for entities, kinds, and properties, we define a point-wise concretization on stores. The definition can be found in the supplementary material.
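The point-wise store concretization can be sketched as a membership predicate. Since the paper defers the formal definition to the supplementary material, the following Python reading is our assumption: a concrete store lies in \(\gamma_{\textsf{Store}}(\smash{\widehat{\sigma}})\) iff each of its bindings is covered by a matching abstract binding.

```python
# Sketch (our reading) of the point-wise store concretization induced by
# entity and property concretizations. The set γ_Store(σ̂) may be infinite,
# so it is represented as a membership predicate.

def gamma_store(abstract_store, gamma_entity, gamma_property):
    def contains(concrete_store):
        for (e, kind), p in concrete_store.items():
            covered = any(
                e in gamma_entity(e_hat) and p in gamma_property(kind, p_hat)
                for (e_hat, k_hat), p_hat in abstract_store.items()
                if k_hat == kind)
            if not covered:
                return False
        return True
    return contains
```

This matches the two conditions used in the proof of Theorem 1: the concrete domain is covered by the concretized abstract domain, and each concrete property lies in the concretization of the corresponding abstract property.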

In the following, we define soundness of compound analyses.

Definition 4

(Soundness of a Compound Analysis). Let \(\varPhi \subseteq \textsf{Module}\times \smash {\widehat{\mathtt {\textsf{Module}}}}\) be a set of static modules paired with corresponding dynamic modules. A compound analysis is sound if the fixpoint of all of its static modules overapproximates the reachability semantics of the corresponding dynamic modules:

$$\begin{aligned} & \textsf{sound}(\varPhi )\ \text { iff }\ \forall \smash {\widehat{\sigma }} \in \smash {\widehat{\textsf{Store}}}.\ \textsf{reachable} (F, \gamma _{\textsf{Store}}(\smash {\widehat{\sigma }})) \subseteq \gamma _{\textsf{Store}}(\textsf{fix} (\smash {\widehat{F}}, \smash {\widehat{\sigma }})) \\ & \quad \textrm{where}\,\, F = \{ f \mid (f,\_) \in \varPhi \} \,\,\textrm{and }\,\, \smash {\widehat{F}} = \{ \smash {\widehat{f}} \mid (\_,\smash {\widehat{f}}) \in \varPhi \}. \end{aligned}$$

   \(\square \)

The compound analysis approximates the dynamic reachability semantics (Definition 2.3), which collects the set of all stores reachable by applying dynamic modules. The dynamic reachability semantics is a collecting semantics, commonly used to prove soundness of abstract interpreters [12].

We are now ready to state the main theorem of this work:

Theorem 1

(Soundness Composition). Let \(\varPhi \subseteq \textsf{Module}\times \smash {\widehat{\mathtt {\textsf{Module}}}}\) be a set of static modules paired with corresponding dynamic modules. Soundness of a compound analysis follows from soundness of all of its static modules:

$$\begin{aligned} \textrm{If }\ \textsf{sound}( f ,\smash {\widehat{ f }}) \ \textrm{for all } \ (f,\smash {\widehat{f}}) \in \varPhi \ \textrm{then } \ \textsf{sound}(\varPhi ). \end{aligned}$$

Proof

We show \(\textsf{reachable} (F,\gamma _{\textsf{Store}}(\smash {\widehat{\sigma }}_1)) \subseteq \gamma _{\textsf{Store}}(\textsf{fix} (\smash {\widehat{F}}, \smash {\widehat{\sigma }}_1))\) by well-founded induction on \(X \preceq \textsf{reachable} (F,X)\).

  • Base case: \(\textsf{reachable} (F,\varnothing ) = \varnothing \subseteq \gamma _{\textsf{Store}}(\textsf{fix} (\smash {\widehat{F}}, \smash {\widehat{\sigma }}_1))\)

  • Inductive case: Suppose \(X \subseteq \gamma _{\textsf{Store}}(\textsf{fix} (\smash {\widehat{F}}, \smash {\widehat{\sigma }}_1))\) and \(\smash {\widehat{\sigma }}_n = \textsf{fix} (\smash {\widehat{F}}, \smash {\widehat{\sigma }}_1)\). Then for all \(\sigma \in X \subseteq \gamma _{\textsf{Store}}(\textsf{fix} (\smash {\widehat{F}}, \smash {\widehat{\sigma }}_1))\), we get \(\mathop {dom}(\sigma ) \subseteq \gamma _{\textsf{Entity}\times \textsf{Kind}}(\mathop {dom}(\smash {\widehat{\sigma }}_n))\) and \(\sigma (e,k) \in \gamma _{\textsf{Property}}(\smash {\widehat{\mathtt {\sigma }}}_n(\smash {\widehat{e}}, \kappa ))\) for all \((\smash {\widehat{e}},\kappa ) \in \mathop {dom}(\smash {\widehat{\mathtt {\sigma }}}_n)\) and \(e \in \gamma _{\textsf{Entity}}(\smash {\widehat{e}})\). Furthermore, since \(\smash {\widehat{\sigma }}_n\) is a fixpoint, it holds that \(\smash {\widehat{f}}(\smash {\widehat{e}},\smash {\widehat{\sigma }}_n) \sqsubseteq \smash {\widehat{\sigma }}_n\) for all \(\smash {\widehat{f}} \in \smash {\widehat{F}}\) and \(\smash {\widehat{e}} \in \mathop {dom}(\smash {\widehat{\sigma }}_n)\). From \(\textsf{sound}(f,\smash {\widehat{f}})\) we conclude \(f(e,\sigma ) \in \gamma _{\textsf{Store}}(\smash {\widehat{f}}(\smash {\widehat{e}},\smash {\widehat{\sigma }}_n)) \subseteq \gamma _{\textsf{Store}}(\smash {\widehat{\sigma }}_n) = \gamma _{\textsf{Store}}(\textsf{fix} (\smash {\widehat{F}}, \smash {\widehat{\sigma }}_1))\) for all \((f,\smash {\widehat{f}}) \in \varPhi \), \((e,\_) \in \mathop {dom}(\sigma )\), \((\smash {\widehat{e}},\_) \in \mathop {dom}(\smash {\widehat{\sigma }}_n)\) with \(e \in \gamma _{\textsf{Entity}}(\smash {\widehat{e}})\). It follows that \(\textsf{reachable} (F,X) \subseteq \gamma _{\textsf{Store}}(\textsf{fix} (\smash {\widehat{F}}, \smash {\widehat{\sigma }}_1))\).    \(\square \)

We illustrate this theorem by applying it to the reaching-definitions analysis from Section 2.1. Specifically, soundness of the compound analysis follows from soundness of module \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) and module \(\smash {\widehat{\mathtt {\textsf{controlFlow}}}}\) by Theorem 1:

figure e

This means \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) can be proven sound independently from \(\smash {\widehat{\mathtt {\textsf{controlFlow}}}}\), even though the modules interact with each other in the compound analysis. The proof independence is possible because neither module reachingDefs nor \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) calls the control-flow modules directly. Instead, both the static and the dynamic module read the control-flow information from the stores, which are guaranteed to be a sound overapproximation initially (assumption \(\sigma \in \gamma _{\textsf{Store}}(\smash {\widehat{\mathtt {\sigma }}})\)). Furthermore, only properties that the reaching-definitions modules themselves wrote to the store need to be sound overapproximations. Properties that other modules wrote to the store are not the subject of the soundness proof of the reaching-definitions modules. The soundness proof of module \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) can be found in the supplementary material.
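The shape of Theorem 1 can be replayed on a self-contained toy example (ours, not the paper's): a dynamic module counts an entity "x" up to 3, its static counterpart over-approximates the value of "x" with a set of integers, and every reachable concrete store is covered by the analysis fixpoint.

```python
# Self-contained toy check of the statement of Theorem 1 on one entity "x".

def dyn(sigma):                     # dynamic module (partial: None = undefined)
    return {"x": sigma["x"] + 1} if sigma["x"] < 3 else None

def stat(sigma_hat):                # static module (monotone on sets of ints)
    return {"x": sigma_hat["x"] | {v + 1 for v in sigma_hat["x"] if v < 3}}

def reachable(sigma0):              # all stores reachable from sigma0 via dyn
    seen, work = {sigma0["x"]}, [sigma0]
    while work:
        nxt = dyn(work.pop())
        if nxt is not None and nxt["x"] not in seen:
            seen.add(nxt["x"])
            work.append(nxt)
    return [{"x": v} for v in seen]

def fix(sigma_hat):                 # iterate the static module to a fixpoint
    while stat(sigma_hat) != sigma_hat:
        sigma_hat = stat(sigma_hat)
    return sigma_hat

def in_gamma(sigma, sigma_hat):     # point-wise concretization on this store
    return sigma["x"] in sigma_hat["x"]
```

Here `sound(dyn, stat)` holds step-wise, and checking that `reachable` stays inside the concretization of `fix` is exactly the conclusion the theorem delivers without a whole-analysis argument.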

To summarize, in this section we developed a theory of compositional soundness proofs for analyses described in the blackboard architectural style. Each static module can be proven sound independently from other modules. Furthermore, soundness of a whole analysis follows directly from soundness of each module. In particular, no reasoning about the analysis as a whole is required.

4 Reusable Soundness Proofs

As of now, static modules refer to a specific type of entities, kinds, properties, and stores. However, adding new modules to an analysis may require extending these types. This invalidates the soundness proofs of existing modules, which then need to be re-established. In this section, we extend our theory to make static modules and their soundness proofs reusable.

4.1 Extending the Type of Entities and Kinds

We start by explaining how entities and kinds can be extended without invalidating existing soundness proofs.

For example, if we were to add a taint static module to an existing analysis over types \(\smash {\widehat{\mathtt {\textsf{Entity}}}}\), \(\textsf{Kind}\), and \(\smash {\widehat{\mathtt {\textsf{Store}}}}\), we would need to extend these types to hold the new analysis information:

$$\begin{aligned} \smash {\widehat{\mathtt {\textsf{Entity}}}}' = \smash {\widehat{\mathtt {\textsf{Entity}}}} \mid \textsf{Var} {} & {} \textsf{Kind}' = \textsf{Kind}\mid \kappa _{\texttt{Taint}}\end{aligned}$$

But this invalidates the proofs of existing modules that depend on the subsets \(\smash {\widehat{\mathtt {\textsf{Entity}}}}\) and \(\textsf{Kind}\). To solve this problem, we first parameterize the type of modules to make explicit what types of entities and kinds they depend on:

Definition 5

(Parameterized Modules (Preliminary)). We define a type of module that is parameterized by the types of entities E, kinds K, and store S:

$$\begin{aligned} & f \in \textsf{Module}[E,K] = \forall S: \textsf{Store}[E,K] .\ E \times S \rightarrow S \end{aligned}$$

   \(\square \)

Interface \(\textsf{Store}[E,K]\) defines read and write operations for an abstract store type S that restricts access to entities of type E and kinds of type K. The store interface allows us to call parameterized modules with stores containing supersets of the types of entities and kinds.

For these parameterized modules, we define a sound lifting to supersets of entities and kinds:

figure f

The lifting calls module f on all entities of type E on which f is defined and simply ignores all other entities, returning the store unchanged. For example, the lifted reaching-definitions module \(\textsf{lift} [\textsf{Stmt} \mid \textsf{Var},\ \kappa _{\texttt{ReachingDefs}} \mid \kappa _{\texttt{ControlFlowPred}} \mid \kappa _{\texttt{Taint}}](\smash {\widehat{\mathtt {\textsf{reachingDefs}}}})\) operates on the entities \(\textsf{Stmt}\) and the kinds \(\kappa _{\texttt{ReachingDefs}} \mid \kappa _{\texttt{ControlFlowPred}} \), but ignores entities \(\textsf{Var}\) and kinds \(\kappa _{\texttt{Taint}}\).
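In Python, the lifting can be sketched as a wrapper (our formulation of the definition above): it applies the underlying module on the entities it understands and returns the store unchanged on all others.

```python
# Sketch of lift[E', K'](f): apply f on entities in its domain E, and
# ignore all other entities by returning the store unchanged.

def lift(E, f):
    # E: the set of entities the underlying module f is defined on.
    def lifted(entity, store):
        return f(entity, store) if entity in E else store
    return lifted
```

For a dynamic module the same wrapper applies; on the new entities the lifted module is defined and simply leaves the store untouched, which is what makes Lemma 1's second case go through.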

The lifting preserves soundness of the lifted modules for disjoint extensions of entities.

Definition 6

(Disjoint Extension). Entities \(\smash {\widehat{E}}' \supseteq \smash {\widehat{E}}\) and \(E' \supseteq E\) are a disjoint extension iff \(\gamma _{\textsf{Entity}}(\smash {\widehat{E}}) \subseteq E\) and \(\gamma _{\textsf{Entity}}(\smash {\widehat{E}}' \setminus \smash {\widehat{E}}) \subseteq E' \setminus E\).    \(\square \)

In other words, the concretization function \(\gamma _{\textsf{Entity}}\) does not mix up entities in \(\smash {\widehat{E}}\) and \(\smash {\widehat{E}}' \setminus \smash {\widehat{E}}\).

Lemma 1

(Lifting preserves Soundness). Let \(\smash {\widehat{f}} \in \textsf{Module}[\smash {\widehat{E}},K]\) and \(f \in \textsf{Module}[E,K]\) be a parameterized static module and dynamic module, \(\smash {\widehat{E}}' \supseteq \smash {\widehat{E}}\) and \(E' \supseteq E\) be a disjoint extension of entities, and \(K' \supseteq K\) a superset of kinds.

$$\begin{aligned} \textrm{If } \ \textsf{sound}(f,\smash {\widehat{f}}) \ \textrm{then } \ \textsf{sound}(\textsf{lift} [E',K'](f),\textsf{lift} [\smash {\widehat{E}}',K'](\smash {\widehat{f}})). \end{aligned}$$

Proof

Let \(\smash {\widehat{f}}: \textsf{Module}[\smash {\widehat{E}},K]\) and \(f: \textsf{Module}[E,K]\) be a static module and a dynamic module. Furthermore, let \(\smash {\widehat{e}}:\smash {\widehat{E}}'\) and \(e \in \gamma _{\textsf{Entity}}(\smash {\widehat{e}})\) be an abstract and a concrete entity, and \(\smash {\widehat{\mathtt {\sigma }}}: \textsf{Store}[\smash {\widehat{E}}',K']\) and \(\sigma \in \gamma _{\textsf{Store}}(\smash {\widehat{\sigma }})\) be an abstract and a concrete store.

  • In case \(\smash {\widehat{e}} \in \smash {\widehat{E}}\) then also \(e \in E\). Hence, \(\textsf{lift} (\smash {\widehat{f}})(\smash {\widehat{e}},\smash {\widehat{\sigma }}) = \smash {\widehat{f}}(\smash {\widehat{e}},\smash {\widehat{\sigma }})\) and \(\textsf{lift} (f)(e,\sigma ) = f(e,\sigma )\). Soundness follows by \(\textsf{sound}(f,\smash {\widehat{f}})\).

  • In case \(\smash {\widehat{e}} \in \smash {\widehat{E}}' \setminus \smash {\widehat{E}}\) then also \(e \in E' \setminus E\) for all \(e \in \gamma _{\textsf{Entity}}(\smash {\widehat{e}})\). Hence \(\textsf{lift} (\smash {\widehat{f}})(\smash {\widehat{e}},\smash {\widehat{\sigma }}) = \smash {\widehat{\sigma }}\) and \(\textsf{lift} (f)(e,\sigma ) = \sigma \), and soundness follows from the assumption \(\sigma \in \gamma _{\textsf{Store}}(\smash {\widehat{\sigma }})\).    \(\square \)

This lemma means that we can prove the soundness of static modules once for specific types of entities and kinds. Later, we can reuse the modules in a compound analysis with extended entities and kinds without having to prove soundness again.

4.2 Changing the Type of Properties

Next, we extend our theory to allow changing the type of properties without invalidating the soundness proofs of existing modules that use them.

For example, suppose we already have a static pointer-analysis module that propagates object allocation information \(\smash {\widehat{\mathtt {\textsf{Property}}}}[\kappa _{\texttt{Val}}] = \smash {\widehat{\mathtt {\textsf{Obj}}}}\). We may want to track string information as well. This could be done with an independent string-tracking static module with its own lattice. However, since tracking strings is mostly identical to tracking pointer information, such an additional module would duplicate significant amounts of code and require a new proof from scratch.

Instead, we can reuse the same static pointer-analysis module to propagate string information \(\smash {\widehat{\mathtt {\textsf{Str}}}}\) by changing its lattice to \(\smash {\widehat{\mathtt {\textsf{Property}}}}'[\kappa _{\texttt{Val}}] = \smash {\widehat{\mathtt {\textsf{Obj}}}} \times \smash {\widehat{\mathtt {\textsf{Str}}}}\). However, this invalidates the soundness proof of the pointer module, as it depends on the type \(\smash {\widehat{\mathtt {\textsf{Property}}}}[\kappa _{\texttt{Val}}]\).

To solve this problem, we generalize the type of static modules again to be polymorphic over the type \(\smash {\widehat{\mathtt {\textsf{Property}}}}\):

Definition 7

(Parameterized Modules (Final)). We define a type of module that is parameterized by the type of entities E, kinds K, properties P, and stores S:

$$\begin{aligned} & f \in \textsf{Module}[E, K, I] = \forall P: I, S: \textsf{Store}[E,K,P] .\ E \times S \rightarrow S \end{aligned}$$

   \(\square \)

Interface \(\textsf{Store}[E,K,P]\) restricts access to entities of type E and kinds of type K and contains properties of type P. Interface I defines operations on properties P.

For example, a static pointer-analysis module may depend on the Scala-like interface Objects in Listing 1.1. Interface Objects depends on a type variable Value, which refers to possible values of variables. Function \(\textsf{newObj}\) creates a new object of a certain class and context. Function \(\textsf{forObj}\) iterates over all such objects, applying continuation \(\texttt{f}\). Continuation \(\texttt{f}\) takes a class name, context, and store and returns a modified store. Interface Objects can be instantiated to support different value abstractions. For example, instance \(\smash {\widehat{\mathtt {\textsf{AllocationSite}}}}\) implements the interface with an allocation-site abstraction \(\smash {\widehat{\mathtt {\textsf{Obj}}}} = \smash {\widehat{\mathtt {\textsf{Obj}}}}(\mathcal {P}(\textsf{Class} \times \textsf{Context}))\), which abstracts object allocations by their class names and a call string to their allocation site. Instance \(\smash {\widehat{\mathtt {\textsf{AllocationSiteAndStrings}}}}\) implements a reduced product [9] of objects \(\smash {\widehat{\mathtt {\textsf{Obj}}}}\) and strings \(\smash {\widehat{\mathtt {\textsf{Str}}}} = \texttt{Constant}[\texttt{String}]\), which abstracts the value of strings with a constant abstraction. This allows us to reuse the same static pointer-analysis module to propagate string information.

figure g
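Based on the description above, a hypothetical Scala sketch of interface Objects and an allocation-site instance might look as follows; the signatures, the representation of contexts as call strings (`List[Int]`), and all names are assumptions.

```scala
// Hypothetical sketch of interface Objects as described in the text.
trait Objects[Value, Store] {
  // Create a new abstract object of class `cls` in context `ctx`.
  def newObj(cls: String, ctx: List[Int]): Value
  // Iterate over all objects in `v`, applying continuation f, which
  // receives a class name, a context, and a store, and returns a store.
  def forObj(v: Value)(f: (String, List[Int], Store) => Store): Store => Store
}

// An allocation-site instance: a value is a set of (class, context) pairs.
class AllocationSite[S] extends Objects[Set[(String, List[Int])], S] {
  def newObj(cls: String, ctx: List[Int]): Set[(String, List[Int])] =
    Set((cls, ctx))
  def forObj(v: Set[(String, List[Int])])(
      f: (String, List[Int], S) => S): S => S =
    s => v.foldLeft(s) { case (st, (c, ctx)) => f(c, ctx, st) }
}
```

A reduced-product instance such as AllocationSiteAndStrings would use a pair of an object set and a constant string abstraction as `Value` while keeping the same interface.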

Note that certain interfaces may restrict which instances can be implemented. For example, an abstract domain that approximates only strings but not objects cannot soundly implement the operations of interface Objects. In this case, the interface needs to be generalized to allow a wider range of instances.

4.3 Soundness of Parameterized Modules

In this subsection, we define soundness of parameterized static modules and prove a generalized soundness composition theorem.

Definition 8

(Soundness of Parameterized Static Modules). A parameterized static module \(\smash {\widehat{f}}: \smash {\widehat{\mathtt {\textsf{Module}}}}[\smash {\widehat{E}},K,I]\) is sound w.r.t. a parameterized dynamic module \(f: \textsf{Module}[E,K,I]\) iff all their instances are sound:

$$\begin{aligned} & \textsf{sound}(f,\smash {\widehat{f}})\ \text {iff} \ \forall P:I, \smash {\widehat{P}}: I, S: \textsf{Store}[E,K,P], \smash {\widehat{S}}: \textsf{Store}[\smash {\widehat{E}},K,\smash {\widehat{P}}].\ \\ & \qquad \qquad \qquad \textsf{sound}(P,\smash {\widehat{P}}) \implies \textsf{sound}(f[P,S],\smash {\widehat{f}}[\smash {\widehat{P}},\smash {\widehat{S}}]). \end{aligned}$$

   \(\square \)

Parameterized static modules are proven sound for all sound instances of property interface I. A static instance \(\smash {\widehat{P}}: I\) is sound w.r.t. a dynamic instance P : I if all of its operations are sound. Soundness for dynamic and static instances of interface Objects in Listing 1.1 is defined as follows:

$$\begin{aligned} & \textsf{sound}(\textsf{newObj},\smash {\widehat{\mathtt {\textsf{newObj}}}})\ \text { iff }\ \forall c , \smash {\widehat{\texttt{ h }}}, h \in \gamma (\smash {\widehat{\texttt{ h }}}), \textsf{newObj}( c , h ) \in \gamma _{\textsf{Obj}}(\smash {\widehat{\mathtt {\textsf{newObj}}}}( c , \smash {\widehat{\texttt{ h }}})) \\ & \textsf{sound}(\textsf{forObj},\smash {\widehat{\mathtt {\textsf{forObj}}}})\ \text { iff }\ \forall f, \smash {\widehat{f}}, \textsf{sound}(f,\smash {\widehat{f}}) \implies \textsf{sound}(\textsf{forObj}(f),\smash {\widehat{\textsf{forObj}}}(\smash {\widehat{f}})) \end{aligned}$$

Soundness of first-order operations like \(\smash {\widehat{\mathtt {\textsf{newObj}}}}\) is similar to that of static modules (Definition 3). Soundness of higher-order operations like \(\smash {\widehat{\mathtt {\textsf{forObj}}}}\) is proven w.r.t. all sound functions \(\smash {\widehat{f}}\).

Finally, we generalize the soundness composition Theorem 1 to parameterized static modules. In particular, an analysis composed of parameterized static modules is sound if all of its modules are sound and the instance of its property interface is sound.

Theorem 2

(Soundness Composition for Parameterized Static Modules). Let \(\varPhi \) be parameterized static modules paired with corresponding dynamic modules over families of entities \(\smash {\widehat{E}}' = \bigcup _i \smash {\widehat{E}}_i, E' = \bigcup _i E_i\), kinds \(K' = \bigcup _i K_i\), properties \(\smash {\widehat{P}}\), P.

$$\begin{aligned} & \textrm{If } \ \textsf{sound}(f,\smash {\widehat{f}}) \ \textrm{for all } \ (f,\smash {\widehat{f}}) \in \varPhi \ \textrm{and } \ \textsf{sound}(P,\smash {\widehat{P}}) \ \textrm{then } \ \textsf{sound}(\varPhi '), \\ & \qquad \textrm{where}\ \varPhi ' = \{ (\textsf{lift} [E',K'](f), \textsf{lift} [\smash {\widehat{E}}',K'](\smash {\widehat{f}})) \mid (f,\smash {\widehat{f}}) \in \varPhi \} \end{aligned}$$

Proof

We instantiate the polymorphic modules \(f, \smash {\widehat{f}}\) with the compound types to obtain \(\textsf{sound}(\textsf{lift} [E',K'](f), \textsf{lift} [E',K'](\smash {\widehat{f}}))\). Then the soundness composition theorem for monomorphic modules (Theorem 1) applies.    \(\square \)

To summarize, in this section we explained how the type of entities, kinds, and properties can be changed without invalidating the soundness proofs of existing modules. To this end, we generalized the type of modules to be parametric over the type of entities, kinds, and properties. The parameterized modules access properties via an interface. The instances of this interface are specific to certain types of properties and require a soundness proof.

5 Applicability of the Theory

In this section, we demonstrate the applicability of our theory by first developing four analyses in the blackboard architecture and then proving them sound compositionally.

5.1 Case Studies

We developed four different analyses in the blackboard architecture (Section 2) together with their dynamic semantics (Section 2.2). We proved each analysis sound and discuss the proofs in Section 5.2. Each analysis exercises a specific part of our soundness theory:

  • A pointer analysis which mutually depends on a call-graph analysis (exercises the part of our theory presented in Section 3).

  • A reflection analysis which reuses the pointer analysis to propagate string information (exercises Section 4.2).

  • A field and object immutability analysis depending on all above analyses (exercises Section 4.1).

  • A demand-driven reaching-definitions analysis which demonstrates that our theory applies to this type of analysis.

Our choice of analyses was inspired by similar but more complex analyses for JVM-bytecode implemented in OPAL, which scale to real-world applications [21, 41]. Our analyses operate on a simpler object-oriented language with the following abstract syntax:

figure j

The language features inheritance, mutable memory, class fields, virtual method calls, and Java-like reflection [35]. Reflection is modeled as virtual calls to native methods. We also deliberately added features such as control-flow constructs and boolean operations. These are ignored by the analyses but need to be modeled by the dynamic semantics, complicating the soundness proof of the analyses.

We implemented and tested each analysis in Scala to ensure they are executable. Furthermore, we implemented and tested the corresponding dynamic semantics to ensure they are sensible. The code of analyses and dynamic semantics can be found in the supplementary material accompanying this paper. In the following, we discuss the implementation of each analysis in more detail.

Pointer and Call-Graph Analysis A pointer analysis for an object-oriented language computes which objects a variable or field may point to. A call-graph analysis determines which methods may be called at specific call sites. Pointer and call-graph analyses are the foundation upon which many other analyses build.

The analyses are composed from four static modules, whose dependencies are visualized in Figure 1. An arrow from a store entry to a module represents a read; an arrow in the other direction represents a write. Even though all modules implicitly depend on each other, they can be proven sound independently of each other (Section 3). This is possible because they do not call other modules directly; instead, all communication happens via the store.

Fig. 1.
figure 1

Points-To and Call-Graph Static Modules

Module \(\smash {\widehat{\mathtt {\textsf{method}}}}\) registers each statement of a method in the store to trigger other modules. It disregards control flow as the analysis is flow-insensitive and hence also registers statements that can never be executed. Flow-insensitive analyses can be more performant than flow-sensitive ones, but traditional approaches using generic abstract interpreters do not allow for flow-insensitive analyses. Module \(\smash {\widehat{\mathtt {\textsf{pointsTo}}}}\) analyzes New expressions and assignments of variable and field references. Module \(\smash {\widehat{\mathtt {\textsf{virtualCall}}}}\) resolves target methods of virtual calls based on the receiver object. Once a call is resolved, module \(\smash {\widehat{\mathtt {\textsf{invokeReturn}}}}\) extends the call context and assigns the method parameters and return value. Finally, it registers the called method as an entity in the store, triggering module \(\smash {\widehat{\mathtt {\textsf{method}}}}\).

The entities of the analyses are fields, statements, expressions, methods, and calls:

figure k

Each entity is paired with a call context or heap context, which allows tuning the precision of the analysis. The static modules communicate via two kinds: Kind \(\kappa _{\textsf{Val}}\) refers to possible values of expressions and fields and to the return values of methods. Values are abstract objects containing information about where objects were allocated. Kind \(\kappa _{\textsf{CallTarget}} \) refers to possible targets of method calls. Call targets are sets of receiver objects paired with the target method and their arguments.
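A possible Scala encoding of these entities and kinds, with identifiers and contexts simplified to strings, integers, and call strings, might look as follows; all names are illustrative assumptions.

```scala
// Hypothetical Scala encoding of the entity and kind types sketched above.
sealed trait Entity
case class FieldE(name: String, heapCtx: List[Int])  extends Entity
case class StmtE(id: Int, callCtx: List[Int])        extends Entity
case class ExprE(id: Int, callCtx: List[Int])        extends Entity
case class MethodE(name: String, callCtx: List[Int]) extends Entity
case class CallE(site: Int, callCtx: List[Int])      extends Entity

sealed trait Kind
// Possible values of expressions, fields, and method return values.
case object KVal extends Kind
// Possible targets of method calls.
case object KCallTarget extends Kind
```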

To illustrate the analysis, Listing 1.2 shows the code of modules \(\smash {\widehat{\mathtt {\textsf{virtualCall}}}}\) and \(\smash {\widehat{\mathtt {\textsf{invokeReturn}}}}\). They implicitly communicate with each other via the store but do not call each other directly. Module \(\smash {\widehat{\mathtt {\textsf{virtualCall}}}}\) resolves virtual method calls by first fetching the points-to set of the receiver reference from the store. Afterwards, it iterates over all possible receivers and fetches possible target methods from the class table. Finally, it writes the new call target to the store. Storing the receiver object and argument expressions as part of the call target allows module \(\smash {\widehat{\mathtt {\textsf{invokeReturn}}}}\) to be reused for different types of calls. If the entity is a Call expression, module \(\smash {\widehat{\mathtt {\textsf{invokeReturn}}}}\) first fetches the targets of the call from the store. Then, it iterates over all targets, extends the call context with function \(\smash {\widehat{\mathtt {\textsf{extendCtx}}}}\), and binds the parameters to the values of the arguments and the variable this to the receiver object. Furthermore, it registers the called method as an entity in the store, which in turn triggers module \(\smash {\widehat{\texttt{method}}}\) to process the statements of the called method. Lastly, module \(\smash {\widehat{\mathtt {\textsf{invokeReturn}}}}\) writes the return value of a method to the method entity in the store and copies it to the call entities of this method.

figure l
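Based on the description of module virtualCall above, the following simplified, hypothetical Scala sketch captures its core logic, with the store flattened to a plain map and entities, kinds, and the class table encoded ad hoc; all names are assumptions.

```scala
// Simplified, hypothetical sketch of the virtualCall module's core logic:
// fetch the points-to set of the receiver from the store, resolve targets
// through a class table, and write call targets back to the store.
def virtualCall(
    recv: String,                               // receiver expression entity
    call: String,                               // call entity
    method: String,                             // invoked method name
    classTable: Map[(String, String), String],  // (class, method) -> target
    s: Map[(String, String), Set[(String, List[Int])]]
): Map[(String, String), Set[(String, List[Int])]] = {
  val pointsTo = s.getOrElse((recv, "Val"), Set.empty)
  pointsTo.foldLeft(s) { case (st, (cls, heapCtx)) =>
    classTable.get((cls, method)) match {
      case Some(target) =>
        // A call target pairs the resolved method with the receiver's context.
        val old = st.getOrElse((call, "CallTarget"), Set.empty[(String, List[Int])])
        st.updated((call, "CallTarget"), old + ((target, heapCtx)))
      case None => st // no such method on this class: nothing to add
    }
  }
}
```

Note that the sketch only writes to the store and never calls another module, reflecting the store-mediated communication described in the text.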

The modules depend on interface Objects shown in Listing 1.1 and an analogous interface for call targets. Operations \(\smash {\widehat{\mathtt {\textsf{newObj}}}}\) and \(\smash {\widehat{\mathtt {\textsf{newCallTarget}}}}\) create new abstract objects and call targets. Operations \(\smash {\widehat{\mathtt {\textsf{forObj}}}}\) and \(\smash {\widehat{\mathtt {\textsf{forCallTarget}}}}\) iterate over all objects and call targets. Interface Objects also includes an operation \(\smash {\widehat{\mathtt {\textsf{nullPointer}}}}\) not shown in the listing, which returns an empty set of object allocation-sites (\(\smash {\widehat{\mathtt {\textsf{Obj}}}}(\varnothing )\)). The dynamic instances are analogous except that they operate on singleton types.

The dynamic modules compute a program’s heap and describe its changes during execution. They are analogous to their static counterparts except that they operate on singleton types \(\textsf{Obj} (\textsf{Class} \times \textsf{HeapCtx})\) and \(\textsf{CallTarget} (\textsf{Class} \times \textsf{HeapCtx} \times \textsf{Method} \times \textsf{Expr}^*)\).

All dynamic modules combined do not cover the entire language. In particular, there are no dynamic modules that cover reflective calls. This means, as of now, the dynamic semantics of reflection is undefined, and the soundness proof only covers programs without reflective calls. We address this point with the following case study.

Reflection Analysis Reflection is a language feature that allows querying information about classes and methods at runtime [35]. Our language supports three reflective methods: Methods Class.forName and Class.getMethod retrieve classes and methods by a string, respectively. Method.invoke invokes a method, where the target method is determined at runtime. Reflection is notoriously difficult to statically analyze soundly and precisely [30]: analyses need to approximate the content of the string passed into a reflective call. If the analysis cannot determine the string precisely, it needs to overapproximate or risk unsoundness. In this case study, we chose the former so that the analysis can be proven sound.

This case study demonstrates two important features of our formalization: First, the reflection analysis reuses all pointer and call-graph modules of the previous section (\(\smash {\widehat{\texttt{pointsTo}}}\), \(\smash {\widehat{\texttt{method}}}\), \(\smash {\widehat{\texttt{virtualCall}}}\), and \(\smash {\widehat{\texttt{invokeReturn}}}\)). It extends the value lattice to propagate new types of analysis information about strings. Even though the pointer analysis propagates new information, it does not require any changes and its soundness proof remains valid (Section 4.2). Second, the reflection analysis cooperates with the call-graph static module \(\smash {\widehat{\texttt{virtualCall}}}\) as reflective calls are regular virtual calls. For example, a call m.invoke(...) where variable m is of type Method is first resolved by virtual call resolution and its target Method.invoke is then resolved by reflective call resolution. Thus, both analyses add elements to the same set of call targets but can be proven sound independently from each other (Section 3).

Fig. 2.
figure 2

Reflection Static Modules

The reflection analysis extends the \(\smash {\widehat{\texttt{Obj}}}\) values of the pointer analysis with three new types of values—\(\smash {\widehat{\mathtt {\textsf{Str}}}}\), \(\smash {\widehat{\mathtt {\textsf{Class}}}}\), and \(\smash {\widehat{\mathtt {\textsf{Method}}}}\)—as a reduced product [9]:

$$\begin{aligned} & \smash {\widehat{\mathtt {\textsf{Property}}}}[{\kappa _{\textsf{Val}}}] = \bot \mid (\smash {\widehat{\mathtt {\textsf{Obj}}}} \times \smash {\widehat{\mathtt {\textsf{Str}}}} \times \smash {\widehat{\mathtt {\textsf{Class}}}} \times \smash {\widehat{\mathtt {\textsf{Method}}}}) & \\ & \smash {\widehat{\mathtt {\textsf{Str}}}} = \bot \mid \textsf{String} \mid \top & \\ & \smash {\widehat{\mathtt {\textsf{Class}}}} = \mathcal {P}(\textsf{Class}) \mid \top & \\ & \smash {\widehat{\mathtt {\textsf{Method}}}} = \mathcal {P}(\textsf{Method}) \mid \top \end{aligned}$$

String values are approximated with a constant lattice. Class and method values are approximated with a finite set of classes/methods or \(\top \). We reuse the modules of the pointer and call-graph analysis by implementing a new instance of interface Objects in Listing 1.1 for the new values. The new instance is similar to \(\smash {\widehat{\mathtt {\textsf{AllocationSiteAndStrings}}}}\) and iterates over all allocation-site information in strings, class/method values, and other objects.
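The constant-string lattice \(\smash {\widehat{\mathtt {\textsf{Str}}}}\) and its join can be sketched in Scala as follows; the constructor names are our own assumptions.

```scala
// Hypothetical sketch of the constant-string lattice Str = ⊥ | String | ⊤.
sealed trait Str
case object StrBot              extends Str // no string value yet
case class  StrConst(s: String) extends Str // exactly this string
case object StrTop              extends Str // any string

// Join: two different constants are over-approximated by ⊤.
def joinStr(a: Str, b: Str): Str = (a, b) match {
  case (StrBot, x)                          => x
  case (x, StrBot)                          => x
  case (StrConst(x), StrConst(y)) if x == y => StrConst(x)
  case _                                    => StrTop
}
```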

The reflection analysis adds two new modules to the existing analysis in Figure 1. The new modules and their dependencies are visualized in Figure 2. Module \(\smash {\widehat{\mathtt {\textsf{reflection}}}}\) analyzes reflective calls to Class.forName, Class.getMethod, and Method.invoke. Module \(\smash {\widehat{\mathtt {\textsf{string}}}}\) analyzes string literals and concatenation. Listing 1.3 shows an excerpt of module \(\smash {\widehat{\mathtt {\textsf{reflection}}}}\) for Method.invoke. Module \(\smash {\widehat{\mathtt {\textsf{reflection}}}}\) first fetches the targets of a call resolved by module \(\smash {\widehat{\mathtt {\textsf{virtualCall}}}}\). If the call target is the native method invoke, module \(\smash {\widehat{\mathtt {\textsf{reflection}}}}\) matches on the arguments of the virtual call to extract the receiver and arguments of the reflective call target. Finally, it calls operation \(\smash {\widehat{\mathtt {\textsf{methodInvoke}}}}\) which returns the set of call targets.

Operation \(\smash {\widehat{\mathtt {\textsf{methodInvoke}}}}\) is part of an interface for reflective calls. The interface contains two other operations for retrieving class names and methods. \(\smash {\widehat{\mathtt {\textsf{methodInvoke}}}}\) matches on the call receiver and the method value. If the method value contains a finite set of methods, the operation checks if the receiver class has these methods and adds them as call targets. If the method value contains \(\top \), the operation adds all methods of the receiver class to the set of call targets. This over-approximates the dynamic module \(\textsf{reflection}\) where only one method is added as a call target.

figure m
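The case distinction of operation \(\smash {\widehat{\mathtt {\textsf{methodInvoke}}}}\) described above can be sketched in Scala as follows, with methods represented as plain strings; the class-table representation and all names are assumptions.

```scala
// Hypothetical sketch of methodInvoke: resolve reflective call targets
// from an abstract method value that is either a finite set or ⊤.
sealed trait MethodVal
case class  Methods(ms: Set[String]) extends MethodVal // finite set of methods
case object TopMethods               extends MethodVal // any method

def methodInvoke(
    receiverClass: String,
    methodVal: MethodVal,
    classTable: Map[String, Set[String]]  // class -> methods it defines
): Set[String] = {
  val available = classTable.getOrElse(receiverClass, Set.empty)
  methodVal match {
    // Finite set: keep only the methods the receiver class actually has.
    case Methods(ms) => ms intersect available
    // ⊤: over-approximate with all methods of the receiver class.
    case TopMethods  => available
  }
}
```

The ⊤ case is what makes the operation over-approximate the dynamic semantics, where exactly one method is invoked.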

The dynamic reflection modules are analogous except that the different types of values are alternatives rather than a product: a dynamic value is exactly one of an object, a string, a class, or a method. In contrast to the previous case study, the dynamic pointer and call-graph modules combined with the reflection and string modules now cover the entire language. Thus, the analysis is sound for all programs, even those using reflection.

Field and Object Immutability Analysis The analysis of this case study computes the immutability of objects and their fields inspired by a class and field immutability analysis by Roth et al. [41]. This information is useful for assessing the thread safety of programs, where multiple threads have access to the same objects.

This case study highlights two important features of our formalization. First, the core dynamic semantics of our language does not describe the immutability property. Therefore, we need to prove the static immutability analysis sound with respect to a dynamic immutability analysis. The case study demonstrates that the immutability concern can be encapsulated in static and dynamic modules, added modularly to the existing analysis and dynamic semantics, and reasoned about independently (Section 3). It is unclear how this can be achieved with a non-modular, monolithic analysis implementation. Second, the immutability analysis adds new types of entities and kinds to the store and reuses all modules of the pointer, call-graph, and reflection analysis. Even though the reused modules can be called with the new entities and have access to new kinds in the store, their soundness proofs remain valid (Section 4.1).

The immutability analysis adds objects (\(\textsf{Class} \times \textsf{HeapCtx}\)) to the types of entities and adds kinds \(\kappa _{\textsf{Mut}} \) and \(\kappa _{\textsf{Assign}} \) for their immutability and the assignability of their fields:

$$\begin{aligned} & \smash {\widehat{\mathtt {\textsf{Entity}}}}' = \smash {\widehat{\mathtt {\textsf{Entity}}}} \mid (\textsf{Class} \times \textsf{HeapCtx}) \\ & \smash {\widehat{\mathtt {\textsf{Property}}}}[\kappa _{\textsf{Mut}} ] = \smash {\widehat{\mathtt {\textsf{TransitivelyImmutable}}}} \mid \smash {\widehat{\mathtt {\textsf{NonTransitivelyImmutable}}}} \mid \smash {\widehat{\mathtt {\textsf{Mutable}}}} \\ & \smash {\widehat{\mathtt {\textsf{Property}}}}[\kappa _{\textsf{Assign}} ] = \smash {\widehat{\mathtt {\textsf{Assignable}}}} \mid \smash {\widehat{\mathtt {\textsf{NonAssignable}}}} \end{aligned}$$

\(\smash {\widehat{\mathtt {\textsf{Mutable}}}}\) describes objects whose fields are reassigned. \(\smash {\widehat{\mathtt {\textsf{NonTransitivelyImmutable}}}}\) describes objects whose fields are not reassigned, but some objects transitively reachable via fields are mutated. \(\smash {\widehat{\mathtt {\textsf{TransitivelyImmutable}}}}\) describes objects whose fields are not reassigned and no transitively reachable objects are mutated. Kind \(\kappa _{\textsf{Assign}} \) has two elements, marking fields as reassigned or not reassigned.

The immutability analysis consists of three modules shown in Figure 3. Module \(\smash {\widehat{\mathtt {\textsf{fieldAssign}}}}\) sets fields f of objects o to \(\smash {\widehat{\mathtt {\textsf{Assignable}}}}\) for every assignment of the form x.f = e, where x may point to o. Module \(\smash {\widehat{\mathtt {\textsf{fieldMutability}}}}\) sets a field to \(\smash {\widehat{\mathtt {\textsf{Mutable}}}}\) if the field is assignable, to \(\smash {\widehat{\mathtt {\textsf{NonTransitivelyImmutable}}}}\) if it is non-assignable but one of the pointed-to objects is mutable, and to \(\smash {\widehat{\mathtt {\textsf{TransitivelyImmutable}}}}\) otherwise. Lastly, module \(\smash {\widehat{\mathtt {\textsf{objectMutability}}}}\) sets an object’s immutability to the least upper bound of the immutability of all of its fields.

Fig. 3.
figure 3

Immutability Static Modules
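The least-upper-bound computation used by module \(\smash {\widehat{\mathtt {\textsf{objectMutability}}}}\) over the three-element mutability lattice can be sketched in Scala as follows; the names and the rank-based encoding are our own assumptions.

```scala
// Hypothetical sketch of the mutability lattice, ordered
// TransitivelyImmutable ⊑ NonTransitivelyImmutable ⊑ Mutable.
sealed trait Mut
case object TransitivelyImmutable    extends Mut
case object NonTransitivelyImmutable extends Mut
case object Mutable                  extends Mut

def rank(m: Mut): Int = m match {
  case TransitivelyImmutable    => 0
  case NonTransitivelyImmutable => 1
  case Mutable                  => 2
}

// Least upper bound: the less precise (more mutable) of the two.
def joinMut(a: Mut, b: Mut): Mut = if (rank(a) >= rank(b)) a else b
```

An object's immutability is then the fold of `joinMut` over the immutability of all of its fields.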

The dynamic modules are analogous except that they operate on concrete objects instead of abstract objects.

Demand-Driven Reaching-Definitions Analysis As a final case study, we developed a demand-driven intra-procedural reaching-definitions analysis for our object-oriented language. This case study demonstrates that our theory lifts a restriction of existing soundness theories for generic interpreters. In particular, our theory also applies to analyses that do not follow the program execution order.

The analysis computes which definitions of variables and fields reach a statement without being overwritten. The analysis is demand-driven, as it performs the minimum amount of work to compute the reaching definitions of a query statement: the analysis only computes the reaching definitions of the query statement and its predecessors. Also, the analysis does not compute the entire control-flow graph, but only the query statement’s predecessors.

The analysis consists of two modules \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\) and \(\smash {\widehat{\mathtt {\textsf{controlFlow}}}}\) similar to those discussed in Section 2. Module \(\smash {\widehat{\mathtt {\textsf{controlFlow}}}}\) calculates the set of control-flow predecessors of a given statement by computing the set of control-flow exits of the preceding statement within the abstract syntax tree. For example, the control-flow exits of an if statement are the exits of the last statements of both branches. The dynamic module controlFlow computes the predecessor immediately executed before the given statement. To this end, the module remembers the most recently executed statement in a mutable variable and only updates it if the given statement is the control-flow successor.
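The bookkeeping of the dynamic controlFlow module can be sketched in Scala as follows; for brevity the sketch omits the successor check mentioned above, represents statements as plain Ints, and uses assumed names throughout.

```scala
// Hypothetical sketch of the dynamic controlFlow module: remember the
// most recently executed statement in a mutable variable and report it
// as the dynamic predecessor of the next executed statement.
class DynControlFlow {
  private var last: Option[Int] = None

  // Called for each executed statement; returns its dynamic predecessor.
  def step(stmt: Int): Option[Int] = {
    val pred = last
    last = Some(stmt)
    pred
  }
}
```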

The main challenge in this case study was to find a dynamic module controlFlow that closely corresponds to the static module and still computes the correct control-flow predecessor. With a suitable dynamic module, the soundness proof of the static module became easier. Furthermore, we validated the correctness of the dynamic module with several unit tests.

5.2 Soundness Proofs of the Case Studies

We apply our theory to compositionally prove the analyses from the previous section sound. The proofs can be found in the supplementary material accompanying this paper. They are pen-and-paper proofs without mechanization, but due to the modularization they are small and easy to verify.

Proving each analysis sound includes (a) proving each of its modules sound (Definition 8), (b) proving the instances of the property interface sound, and (c) verifying that Theorem 2 applies. To ensure the latter, we checked that there are no direct dependencies between modules and that all communication between them happens via the store (Definition 1). This can be easily checked by inspecting the code of the modules. Furthermore, we verified that modules do not make any assumptions about abstract domains and are polymorphic in the store (Definition 7). This can be easily checked by inspecting the polymorphic type of the modules.

To prove the individual modules of an analysis sound, step (a) in the overall soundness proof, we use two techniques. The first uses the observation that static modules and their corresponding dynamic modules are often very similar, except for the types of entities and properties. We can abstract over these differences with a generic module, from which we derive both a dynamic and a static module. Then, soundness follows immediately as a free theorem from parametricity [28]. In cases where abstracting with a generic module is not possible or desirable, we resort to a manual proof. We were able to use the first technique for all modules, except for \(\smash {\widehat{\mathtt {\textsf{method}}}}\), \(\smash {\widehat{\mathtt {\textsf{reachingDefs}}}}\), and \(\smash {\widehat{\mathtt {\textsf{controlFlow}}}}\). To illustrate a case where a manual proof is needed, consider the flow-insensitive static module \(\smash {\widehat{\mathtt {\textsf{method}}}}\) of the pointer analysis and its corresponding dynamic module method. While we could potentially derive them from the same generic module, the derived static module would be less performant, because it would trigger the analysis of parts of the code, e.g., if conditions, which our current flow-insensitive module does not. This is an example where our approach leads to more freedom in the design of static analyses than the existing approach based on a generic interpreter (Section 6.1).
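The generic-module technique can be illustrated with a hypothetical Scala sketch: a single module, polymorphic in the value type, is instantiated once with concrete and once with abstract values, so that soundness follows from parametricity. `ValueOps` and `pointsToNew` are our own illustrative names, not the paper's code.

```scala
// Hypothetical sketch of the "generic module" proof technique: the module
// only accesses values through the ValueOps type class, so it can be
// instantiated with either concrete or abstract values unchanged.
trait ValueOps[V] {
  def newObj(cls: String): V
  def join(a: V, b: V): V
}

// Generic handling of `x = new C`: create an object value for class `cls`
// and join it into the store entry of variable x.
def pointsToNew[V](x: String, cls: String, s: Map[String, V])(
    implicit ops: ValueOps[V]): Map[String, V] =
  s.updated(x, s.get(x).fold(ops.newObj(cls))(ops.join(_, ops.newObj(cls))))
```

Instantiating `V` with singleton concrete objects yields a dynamic module, while instantiating it with sets of allocation sites yields the static module; parametricity guarantees both behave uniformly in `V`.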

The soundness proofs of the static modules are reusable across different analyses, because the modules can be soundly lifted to supersets of entities and kinds (Lemma 1). For example, the immutability analysis adds class entities, requiring the modules of the pointer and reflection analysis to be lifted. Furthermore, the soundness proofs of static modules can be reused because the proofs are independent of the lattices used (Definition 8). For example, the reflection analysis reuses all modules of the pointer analysis, extending the value lattice with string, class, and method information. The soundness proofs of the pointer static modules remain valid because they do not depend on a specific value lattice. Instead, the proofs of the pointer modules depend on soundness lemmas of the newObj and forObj operations of the Objects interface.

Finally, we consider step (b) in the overall soundness proof – the soundness proof of the instances of the property interface. These instances need to be proven sound manually, as the proof cannot be decomposed any further. To prove them sound, we proved each of their operations sound. For the pointer analysis we needed to prove 7 operations sound, for the reflection analysis 6 operations, for the immutability analysis 6 operations, and for the reaching-definitions analysis 0 operations. Of these 19 operations, 13 could be proven sound trivially, requiring only a single proof step after unfolding the definitions. The remaining 6 operations required more elaborate proofs with multiple steps and case distinctions. These include \(\smash {\widehat{\mathtt {\textsf{forObj}}}}\) from the pointer analysis, \(\smash {\widehat{\mathtt {\textsf{classForName}}}}\), \(\smash {\widehat{\mathtt {\textsf{getMethod}}}}\), and \(\smash {\widehat{\mathtt {\textsf{methodInvoke}}}}\) from the reflection analysis, and \(\smash {\widehat{\mathtt {\textsf{getFieldMutability}}}}\) and \(\smash {\widehat{\mathtt {\textsf{joinMutability}}}}\) from the immutability analysis.

6 Related Work

In this section, we discuss work related to compositional and reusable soundness proofs as well as to modular analysis architectures.

6.1 Theories for Compositional and Reusable Soundness Proofs

All works discussed in this subsection, including our own, build upon the theory of abstract interpretation. Abstract interpretation is a formal theory of sound static analyses, first conceived by Cousot et al. [12] and since then widely adopted in academia and industry [13, 16, 22, 25, 33, 44]. Abstract interpretation defines soundness of static analyses but does not explain how soundness can be proved. As we elaborate in the introduction, soundness proofs of practical analyses for real-world languages are difficult because they relate two complicated semantics often described in different styles. Proof attempts of such analyses often fail due to high proof complexity and effort. Furthermore, existing proofs are prone to become invalid if the static or dynamic semantics change, and reestablishing the proofs is laborious and complicated.

Domain constructions, such as reduced products and reduced cardinal powers [12], combine multiple existing abstract domains to improve their precision. They can be used to compose the soundness proof of operations on the abstract domain, e.g., primitive arithmetic, boolean, or string operations. However, they cannot be used to compose the soundness proof of the analysis of statements, e.g., assignments, loops, or procedure calls. In contrast, the blackboard architecture can compose soundness proofs of both of these types of operations.

Darais et al. [14] developed a theory for soundness proofs, in which the static and dynamic semantics are derived from a small-step generic interpreter that describes the operational semantics of the language without mentioning details of static or dynamic semantics. The small-step generic interpreter is instantiated with reusable Galois transformers that capture aspects such as flow- or path-sensitivity and allow to change an existing analysis while preserving soundness. Galois transformers can be proven sound once and for all and their soundness proofs are reusable across different analyses. However, the approach does not compose soundness proofs of static semantics derived from the generic interpreter.

Keidel et al. [28] developed a theory for big-step abstract interpreters, deriving both the static and dynamic semantics from a generic big-step interpreter. The theory enables soundness composition [28, Theorems 4 and 5] if the generic interpreter is implemented with arrows [23] or in a meta-language which enjoys parametricity. But there is no theory of how parts of soundness proofs can be reused between different analyses. Keidel et al. [27] later refined the theory by introducing reusable analysis components that capture different aspects of the language such as values, mutable state, or exceptions and are described with arrow transformers [23]. While components can be proven sound independently of each other, their composition requires glue code, which needs to be proven sound. Furthermore, the composition creates large arrow transformer stacks that, unless optimized away by the compiler, may lead to inefficient analysis code. For example, a taint analysis for WebAssembly developed using this approach depends on a stack of 18 arrow transformers. Eliminating the overhead of an arrow transformer stack of this size requires aggressive inlining and optimizations, causing binary bloat and excessive compile times.

Bodin et al. [5] developed a theory of compositional soundness proofs for a style of semantics called skeletal semantics, which consists of hooks (recursive calls to the interpreter), filters (tests if variables satisfy a condition), and branches. The dynamic and static semantics are derived from the same skeleton. Also, soundness of the instantiated skeleton follows from soundness of the dynamic and static instance [5, Lemma 3.4 and 3.5]. However, their work does not describe how proofs can be reused across different analyses.

To recap, in all theories above the static and dynamic semantics must be derived from the same generic interpreter. This restricts what types of analyses can be derived. In particular, the static analysis must closely follow the program execution order dictated by the generic interpreter, and it is unclear how static analyses can be derived that do not closely follow the program execution order. For example, backward analyses process programs in reverse order, flow-insensitive analyses may process statements in any order, and summary-based analyses construct summaries in bottom-up order. Our work lifts the restriction that static and dynamic semantics must be derived from the same artifact. Static modules and corresponding dynamic modules must follow the blackboard architecture style but otherwise need not share any commonalities. This gives greater freedom as to which types of analyses can be implemented. For example, the blackboard analysis architecture has been used in prior work to develop backward analyses [17], on-demand/lazy analyses [19, 41], and summary-based analyses [21]. We also demonstrated in Section 5.1 that our theory applies to a demand-driven reaching-definitions analysis. It is unclear how such an analysis can be derived from a generic interpreter.

6.2 Modular Analysis Architectures

These architectures describe how to implement static analyses modularly. Modular analysis architectures are a prerequisite for developing theories of compositional and reusable soundness proofs. The theories give formal guarantees about proof independence, composition, and reuse.

Our work formally defines the blackboard analysis architecture used in the OPAL framework [15, 21]. In the past, OPAL has been used to implement state-of-the-art analyses for method purity [19], class- and field-immutability [41], and call-graphs [40] for Java Virtual Machine bytecode. Furthermore, OPAL features escape analyses and a solver for IFDS analyses [21] as well as a fixpoint algorithm that parallelizes the analysis execution [20].

Prior to the work presented in this paper, no formalization of the blackboard analysis architecture and no theory for its soundness existed. Our formalization captures the core of the OPAL framework, while deliberately ignoring implementation details. For example, our formalization does not describe the fixpoint algorithm and the order in which it executes static modules to resolve their dependencies. Proving the fixpoint algorithm correct is a separate concern from proving analyses sound, which is the focus of our formalization. That said, our formalization covers a variety of OPAL’s features described by Helm et al. [21]. For example, OPAL supports default and fallback properties for missing properties in the store. Fallback properties can be described by our formalization by adding them to the initial store passed to the fixpoint algorithm. We deliberately leave out default properties, an edge case in OPAL used to mark properties that were not computed, e.g., because of dead code. They could be added to our formalization by extending analyses with a second set of static modules to be executed after the fixpoint is reached. Furthermore, OPAL supports optimistic analyses, which ascend the lattice, and pessimistic analyses, which descend the lattice during fixpoint iteration. Both are covered by our formalization, which describes analyses as monotone functions that ascend or descend the lattice. However, we deliberately do not cover OPAL’s mechanisms for interaction between optimistic and pessimistic analyses, another edge case.

Configurable program analysis (CPA) [4] is a modular analysis architecture that describes analyses with transfer relations between control-flow nodes. CPAs can be systematically composed with reduced products. Furthermore, soundness of a component-wise transfer relation follows directly from soundness of its constituents. However, it is unclear how soundness proofs of primitive CPAs can be composed or how proof parts can be reused across analyses.

Doop [7] is a framework which describes analyses with relations in Datalog. Each relation is defined as a set of rules. These rules can be modularly added or replaced, without requiring changes to other rules. While individual analyses in Doop have been proven sound [43], the proofs are not compositional or reusable. In particular, if one rule changes, the proof becomes invalid and needs to be reestablished. This is because the proof reasons about soundness of all rules at once instead of individual rules or relations. The IncA framework [45] also describes analyses in Datalog, but allows relations over lattices instead of only sets. However, no soundness theory for its analyses exists. Similar to IncA, the Flix framework [37] describes analyses with lattice-based Datalog relations and functions. Flix proves individual functions sound with an automated theorem prover [36]. While an automated theorem prover reduces the proof effort and increases proof trustworthiness, there is no guarantee that the prover succeeds in conducting a proof. Furthermore, the automated theorem prover does not establish a soundness proof of the Datalog relations.

Verasco [26] is a modular static analyzer for C#minor [32], an intermediate language used by the CompCert C compiler [33]. Verasco is proven sound with the Coq proof assistant [3]. The soundness proof of the abstract C#minor semantics is independent of the abstract domain, which makes the proof reusable for other abstract domains. However, the abstract semantics is proven sound w.r.t. the standard concrete semantics. Thus, the proof cannot be reused for abstract semantics which approximate non-standard concrete semantics, such as information flow analyses [2] or liveness analyses [11].

Several other modular analysis architectures [24, 31, 42] do not have formal theories for soundness.

6.3 Monolithic Soundness Proofs

In this subsection, we compare compositional and reusable soundness proof theories to ad-hoc monolithic proofs and discuss their trade-offs.

Monolithic soundness proofs consider the entire analysis and dynamic semantics as a whole. This complicates the proof because there is no separation of concerns to manage the complexity of modern programming languages. Furthermore, monolithic soundness proofs are harder to maintain. In particular, whenever the analysis needs to be updated to support a new version of the language, or whenever the analysis is fine-tuned to improve precision and scalability, the soundness proof becomes invalid and needs to be reestablished. However, reestablishing the soundness proof is difficult because it is unclear which parts of the proof have become invalid and need to be updated. In contrast, compositional soundness proofs narrow the proof scope to individual modules, which decreases the proofs’ complexity. Furthermore, compositional soundness proofs are easier to maintain, as changes to individual modules only invalidate their particular soundness proof, while the proofs of other modules remain valid.

The main benefit of monolithic soundness proofs over compositional proofs is that analyses may be proven sound with respect to existing formal dynamic semantics. However, often no suitable formal dynamic semantics exists and analyses still have to be proven sound with respect to custom or modified dynamic semantics. For example, HornDroid [8] is proven sound with respect to a custom instrumented JVM small-step semantics, and Jaam is proven sound with respect to a custom JVM semantics in the form of an abstract machine [22]. Furthermore, analyses of properties not present in standard language semantics need to be proven sound with respect to instrumented dynamic semantics. For example, a static taint analysis needs to be proven sound with respect to a dynamic semantics instrumented with taint information. In contrast, compositional soundness proofs require a one-time cost of formalizing a modular dynamic semantics for a language. Once this is done, several analyses can be proven sound with respect to this dynamic semantics. Furthermore, the dynamic semantics can be modularly extended to describe new aspects such as taint information.
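For instance, an instrumented dynamic semantics for taint tracking might pair every value with a taint bit. The following minimal Python sketch, under hypothetical names and not a semantics from the literature, shows the idea: sources produce tainted values, operations propagate taint, and sanitization clears it.

```python
# A value in the instrumented semantics is (payload, tainted).
def run(term, env):
    tag = term[0]
    if tag == "lit":
        return (term[1], False)                 # literals are untainted
    if tag == "var":
        return env[term[1]]
    if tag == "source":
        return (term[1], True)                  # e.g. user input: tainted
    if tag == "add":
        a, ta = run(term[1], env)
        b, tb = run(term[2], env)
        return (a + b, ta or tb)                # taint propagates through operations
    if tag == "sanitize":
        v, _ = run(term[1], env)
        return (v, False)                       # sanitization clears the taint bit
    raise ValueError(tag)
```

A static taint analysis would then be proven sound against this instrumented `run`, rather than against the standard semantics, which carries no taint information at all.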

7 Future Work

In this section, we discuss limitations of our work and how these limitations can be addressed in the future.

First, our soundness theory requires that static analyses and dynamic semantics are described in the blackboard analysis architecture. It is unclear how easily existing analyses and dynamic semantics can be adapted to the architecture. In Section 2.2, we showed how existing small-step dynamic semantics can be described as a module, and Helm et al. [21] implemented a wide range of static analyses in the architecture. In the future, we want to investigate how other styles of static and dynamic semantics can be adapted to the architecture.

Second, our soundness theory requires that all static modules are sound. However, in practice static analyses are deliberately unsound due to complicated language features [34]. In the future, we want to investigate how the blackboard analysis architecture can be used to localize unsoundness. Specifically, unsound analysis results could be tagged with the name of the module that produced them. All results derived from unsound results then propagate the tags. This way, it is always clear which results are potentially unsound and which modules caused unsoundness.
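The tagging scheme sketched above could look as follows. This is a hypothetical design for the future work just described, not a mechanism implemented in OPAL: each result carries the set of modules whose unsound output it depends on, and tags are unioned whenever results are combined.

```python
from functools import reduce

def tag(value, unsound_module=None):
    """Wrap a result; record the module name if the module is known unsound."""
    tags = frozenset({unsound_module}) if unsound_module else frozenset()
    return (value, tags)

def combine(f, *tagged):
    """Derive a new result from tagged inputs; propagate all their tags,
    so it stays visible which modules the result transitively depends on."""
    values = [v for v, _ in tagged]
    tags = reduce(lambda a, b: a | b, (t for _, t in tagged), frozenset())
    return (f(*values), tags)
```

A derived result is then potentially unsound exactly when its tag set is non-empty, and the tags name the modules that caused the unsoundness.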

Lastly, our work has focused on soundness, i.e., analyses do not produce false-negative results. A complementary property to soundness is completeness, i.e., analyses do not produce false-positive results. The absence of false positives is especially important when analyses produce warnings that are to be inspected by developers. In the future, we want to investigate whether our theory can be extended to prove completeness of static analyses.

8 Conclusion

In this work, we developed a theory for compositional and reusable soundness proofs for static analyses in the blackboard analysis architecture. The blackboard analysis architecture modularizes the implementation of static analyses, with analyses composed of independent static modules. We proved that soundness of an analysis follows directly from independent soundness proofs of each module. Furthermore, we extended our theory to enable the reuse of soundness proofs of existing modules across different analyses. We evaluated our approach by implementing five analyses and proving them sound: a pointer, a call-graph, a reflection, an immutability, and a demand-driven reaching definitions analysis.