
1 Introduction

Tracking relationships between program variables is indispensable for proving properties of programs or verifying the absence of certain programming errors [14, 16, 33]. Inferring relational properties is particularly challenging for multi-threaded programs, as all interferences by other threads that may happen in parallel must be taken into account. In such an environment, only relational properties between globals protected by common mutexes are likely to persist throughout program execution. Generally, relations on clusters consisting of fewer variables are less brittle than those on larger clusters. Moreover, monolithic relational analyses employing, e.g., the polyhedral abstract domain are known to be notoriously expensive [36, 54]. Tracking smaller clusters may even be more precise than tracking larger clusters [19].

Example 1

Consider the following program. All accesses to globals g, h, and i are protected by the mutex a.

[Figure omitted: program of Example 1.]

In this program, the main thread creates two new threads, starting at \(t_1\) and \(t_2\), respectively. Then it locks the mutex a to set all globals non-deterministically to some value and unlocks a again. After having joined the thread \(t_2\), it locks a again, sets all globals to the same unknown value, and unlocks a again. Thread \(t_1\) sets i to the value of h. Thread \(t_2\) sets g and h to (potentially different) unknown values. Assume we are interested in equalities between globals. To show assertion (1), it is necessary to detect that the main thread is unique and thus cannot read its past writes, since these have been overwritten. Additionally, the analysis needs to certify that thread \(t_2\) is also unique, has been joined before the assertion, and that its writes must also have been overwritten.

For an analysis to prove assertion (2), propagating a joint abstraction of the values of all globals protected by a does not suffice: At the unlock of a in \(t_1\), \(g{=}h\) need not hold. If this monolithic relation is propagated to the last lock of a in main, (2) cannot be shown — despite \(t_1\) modifying neither g nor h.   \(\square \)

Here, we show that the loss of precision indicated in the example can be remedied by replacing the monolithic abstraction of all globals protected by a mutex with suitably chosen subclusters. In the example, we propose to instead consider the subclusters \(\{g,h\}\) and \(\{h,i\}\) separately. As \(t_1\) does not write any values to the cluster \(\{g,h\}\), the imprecise relation \(\top \) is not propagated to the main thread and assertion (2) can be shown.

To fine-tune the analysis, we rely on weakly relational domains. A variety of weakly relational domains have been proposed in the literature, such as Two Variables Per Inequality [53], Octagons [36, 37], or simplifications thereof [33, 35]. The technical property of interest which all these domains have in common is that each abstract relation can be reconstructed from its projections onto subclusters of variables of size at most 2. We call such domains 2-decomposable. Beyond numerical 2-decomposable domains, non-numerical ones can also be constructed, such as a domain relating string names and function pointers.

Based on 2-decomposable domains, we design thread-modular relational analyses of globals which may attain additional precision by taking local knowledge of threads into account. Therefore, we do not rely on a global trace semantics, but on a local trace semantics which formalizes for each thread that part of the computational past it can observe [48]. Abstract values for program points describe the set of all reaching local traces. Likewise, values recorded for observable actions are abstractions of all local traces ending in the corresponding action. Such observable actions are, e.g., unlock operations for mutexes. The abstract values are then refined by taking finite abstractions of local traces into account. To this end, we propose a generic framework that re-uses the components of any base analysis as black boxes. Our contributions can be summarized as follows:

  • We provide new relational analyses of globals as abstractions of the local trace semantics based on overlapping variable clusters (Sections 3, 4, and 8).

  • Our analysis deals with dynamically created and joined threads, whose thread ids may, e.g., be communicated to other threads via variables and which may synchronize via mutexes (Section 3).

  • We provide a generic scheme to incorporate history-based arguments into the analysis by taking finite abstractions of local traces into account (Section 5).

  • We give an analysis of dynamically created thread ids as an instance of our generic scheme. We apply this to exclude self-influences or reads from threads that cannot possibly run in parallel (Sections 6 and 7).

  • We prove that some loss of precision of relational analyses can be avoided by tracking all subclusters of variables. For the class of 2-decomposable relational domains, we prove that tracking variable clusters of size greater than 2 can be abandoned without precision loss (Section 8).

All analyses in this paper have been implemented; a practical evaluation is reported in Section 9, and Section 10 discusses related work.

2 Relational Domains

First, we define the notion of relational domain employed in the description of our analysis. Let \({\mathcal{V}\! ars }\) be a set of variables, potentially of different types. We assume all configurations and assignments to be well-typed, i.e., the type of the (abstract) value matches the one specified for a variable. For each type \(\tau \) of values, we assume a complete lattice \(\mathcal{V}_\tau ^\sharp \) of abstract values abstracting the respective concrete values from \(\mathcal{V}_\tau \). Let \({\mathcal{V}^\sharp }\) denote the collection of these lattices, and \({\mathcal{V}\! ars }\rightarrow _\bot {\mathcal{V}^\sharp }\) denote the set of all type-consistent assignments \(\sigma \) from variables to non-\(\bot \) abstract values, extended with a dedicated least element (also denoted by \(\bot \)), and equipped with the induced ordering. A relational domain \(\mathcal{R}\) is then a complete lattice that provides the following operations:

[Figure omitted: operations provided by the relational domain \(\mathcal{R}\).]

The operations to the left provide the abstract state transformers for the basic operations of programs (with non-deterministic assignments expressed as restrictions), while \(\textsf {lift}\) and \(\textsf {unlift}\) allow casting from abstract variable assignments to the relational domain as well as extracting single-variable information. We assume that \(\textsf {lift}\,\bot = \bot \) and \(\textsf {unlift}\,\bot = \bot \), and require that \(\textsf {unlift}\circ \textsf {lift}\sqsupseteq \textsf {id}\) where \(\sqsupseteq \) refers to the ordering of \(({\mathcal{V}\! ars }\rightarrow _\bot {\mathcal{V}^\sharp })\). Moreover, we require that the meet operations \(\sqcap \) of \({\mathcal{V}^\sharp }\) and \(\mathcal{R}\) safely approximate the intersection of the concretizations of the respective arguments. Restricting a relation r to a subset Y of variables amounts to forgetting all information about variables not in Y. Thus, we demand , , when \(Y_1 \subseteq Y_2\), , and

(1)

Restriction is thus idempotent. For convenience, we also define a shorthand for the assignment of abstract values (Footnote 1): . In order to construct an abstract interpretation, we further require monotonic concretization functions \(\gamma _{\mathcal{V}^\sharp }:{\mathcal{V}^\sharp }\rightarrow 2^\mathcal{V}\) and \(\gamma _\mathcal{R}:\mathcal{R}\rightarrow 2^{{\mathcal{V}\! ars }\rightarrow \mathcal{V}}\) satisfying the requirements presented in Fig. 1.

Fig. 1. Required properties for \(\gamma _{\mathcal{V}^\sharp }:{\mathcal{V}^\sharp }\rightarrow 2^\mathcal{V}\) and \(\gamma _\mathcal{R}:\mathcal{R}\rightarrow 2^{{\mathcal{V}\! ars }\rightarrow \mathcal{V}}\).

Example 2

As a value domain \(\mathcal{V}^\sharp _\tau \), consider the flat lattice over the sets of values of appropriate type \(\tau \). A relational domain \(\mathcal{R}_1\) is obtained by collecting satisfiable conjunctions of equalities between variables or between variables and constants, where the ordering is logical implication, extended with \({\textsf {False}}\) as least element. The greatest element in this complete lattice is given by \({\textsf {True}}\). The operations \(\textsf {lift}\) and \(\textsf {unlift}\) for non-\(\bot \) arguments can then be defined as

$$ \begin{array}{lllllll} \textsf {lift}\,\sigma = \bigwedge \{x=\sigma \,x\mid x\in {\mathcal{V}\! ars }, \sigma \,x\ne \top \} \qquad \,\, \textsf {unlift}\,r\,x = {\left\{ \begin{array}{ll} c &{} \text { if } r\implies (x=c) \\ \top &{} \text {otherwise} \end{array}\right. } \end{array} $$

The restriction of r to a subset Y of variables is given by the conjunction of all equalities implied by r which only contain variables from Y or constants.    \(\square \)
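To make the operations of Example 2 concrete, the following Python sketch implements \(\textsf {lift}\), \(\textsf {unlift}\), and restriction for \(\mathcal{R}_1\). The encoding is our own assumption, not taken from the paper: variables are strings, constants are integers, a conjunction of equalities is a set of equivalence classes, the empty set encodes \({\textsf {True}}\), and `None` encodes \({\textsf {False}}\) (i.e., \(\bot \)). All function names are hypothetical.

```python
# Hypothetical encoding of the equality domain R_1 from Example 2:
#   - variables are strings, constants are ints
#   - a relation is a frozenset of equivalence classes (frozensets)
#   - frozenset() encodes True (top), None encodes False (bottom)
TOP = "TOP"  # top element of the flat value lattice


def normalize(classes):
    """Canonical form: drop trivial classes with fewer than two members."""
    return frozenset(frozenset(c) for c in classes if len(c) >= 2)


def lift(sigma):
    """Conjunction of x = sigma(x) for every x with sigma(x) != TOP."""
    if sigma is None:
        return None  # lift bottom = bottom
    by_const = {}
    for x, v in sigma.items():
        if v != TOP:
            # all variables bound to the same constant share one class
            by_const.setdefault(v, {v}).add(x)
    return normalize(by_const.values())


def unlift(r, x):
    """The constant c with r => (x = c), or TOP if r implies no such equality."""
    if r is None:
        return None  # unlift bottom = bottom
    for cls in r:
        if x in cls:
            consts = [e for e in cls if isinstance(e, int)]
            return consts[0] if consts else TOP
    return TOP


def restrict(r, ys):
    """All equalities implied by r that mention only variables in ys or constants."""
    if r is None:
        return None
    return normalize({e for e in cls if isinstance(e, int) or e in ys} for cls in r)
```

For instance, `lift({"x": 3, "y": 3, "z": TOP})` yields the single class `{3, "x", "y"}`, i.e., \(x=3\wedge y=3\); `unlift` then recovers 3 for x and ⊤ for z, consistent with \(\textsf {unlift}\circ \textsf {lift}\sqsupseteq \textsf {id}\).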

Along the lines of Example 2, non-numerical relational domains can also be constructed.

A variable clustering \(\mathcal{S}\subseteq 2^{{\mathcal{V}\! ars }}\) is a set of subsets (clusters) of variables. For any cluster \(Y\subseteq {\mathcal{V}\! ars }\), let ; this set collects all abstract values from \(\mathcal{R}\) containing information on variables in Y only. Given an arbitrary clustering \(\mathcal{S}\subseteq 2^{{\mathcal{V}\! ars }}\), any relation \(r\in \mathcal{R}\) can be approximated by a meet of relations from \(\mathcal{R}^Y\) (\(Y\in \mathcal S\)) since for every \(r\in \mathcal{R}\), holds.

Some relational domains, however, can be fully recovered from their restrictions to specific sets of clusters. For \(k\ge 1\), we consider the set \(\mathcal{S}_k\) of all non-empty subsets \(Y\subseteq {\mathcal{V}\! ars }\) of cardinality at most k. We call a relational domain \(\mathcal{R}\) k-decomposable if each abstract value from \(\mathcal{R}\) can be precisely expressed as the meet of its restrictions to clusters of \(\mathcal{S}_k\) and if all least upper bounds can be recovered by computing with clusters of \(\mathcal{S}_k\) only; that is,

(2)

holds for each abstract relation \(r\in \mathcal{R}\) and each set of abstract relations \(R\subseteq \mathcal{R}\).
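The first half of (2) can be checked on small instances of \(\mathcal{R}_1\). The following Python sketch uses a hypothetical encoding of equality conjunctions as sets of equivalence classes (variables are strings, constants are integers, `None` is \({\textsf {False}}\)). It meets the restrictions of r to all clusters of \(\mathcal{S}_k\) over the variables occurring in r (restricting to clusters with other variables adds no information) and compares the result with r. The part of (2) concerning least upper bounds is not checked here.

```python
from itertools import combinations


def restrict(r, ys):
    """Equalities implied by r over variables in ys and constants only."""
    kept = ({e for e in c if isinstance(e, int) or e in ys} for c in r)
    return frozenset(frozenset(k) for k in kept if len(k) >= 2)


def meet(rs):
    """Conjunction of equality relations: merge classes sharing a member."""
    classes = []
    for r in rs:
        if r is None:
            return None
        for c in r:
            merged, rest = set(c), []
            for k in classes:
                if k & merged:
                    merged |= k  # overlapping classes collapse into one
                else:
                    rest.append(k)
            classes = rest + [merged]
    if any(len({e for e in k if isinstance(e, int)}) > 1 for k in classes):
        return None  # two distinct constants in one class: unsatisfiable
    return frozenset(frozenset(k) for k in classes if len(k) >= 2)


def meet_of_restrictions(r, k):
    """Meet of the restrictions of r to all clusters of S_k over r's variables."""
    vs = sorted({e for c in r for e in c if not isinstance(e, int)})
    clusters = (set(y) for n in range(1, k + 1) for y in combinations(vs, n))
    return meet(restrict(r, y) for y in clusters)
```

With \(r = (x=y=z)\wedge (w=5)\), the meet over clusters of size at most 2 returns r itself, whereas clusters of size 1 only recover \(w=5\).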

Example 3

The domain \(\mathcal{R}_1\) from the previous example is 2-decomposable. This also holds for the octagon domain [36] and many other weakly relational numeric domains (pentagons [33], weighted hexagons [21], logahedra [28], TVPI [53], dDBM [46], and AVO [11]). The affine equalities or affine inequalities domains [16, 30], however, are not. The relational string domains proposed by Arceri et al. [6, Sec. 5.1 - 5.3] are examples of non-numeric 2-decomposable domains.

3 A Local Trace Semantics

We build upon the semantic framework for local traces, introduced by Schwarz et al. [48]. A local trace records all past events that have affected the present configuration of a specific thread, referred to as the ego thread. In [48], the local trace semantics is proven equivalent to the global trace semantics which itself is equivalent to a global interleaving semantics. In particular, any analysis that is sound w.r.t. the local trace semantics is also sound w.r.t. the interleaving semantics.

While the framework of Schwarz et al. [48] allows for different formalizations of traces, thread synchronization happens only via locking/unlocking and thread creation. Generalizing their semantics, we identify certain actions as observable by other threads when executing corresponding observing actions (see Table 1). When the ego thread executes an observing action, a local trace ending in the corresponding observable action is incorporated. Here, we consider as observable/observing actions locking/unlocking mutexes and creating/joining threads.

Table 1. Observable and observing actions and which concurrency primitive they relate to. The primitives targeted by this paper are in bold font.

Consider, e.g., the program in Fig. 2a and a corresponding local trace (Fig. 2b). This trace consists of one swim lane for each thread representing the sequence of steps it executed, where each node in the graph represents a configuration attained by it. Additionally, the trace records the create and join orders as well as, for each mutex a, the locking order for a (\(\rightarrow _c,\rightarrow _j\), and \(\rightarrow _a\), respectively). These orders introduce extra relationships between thread configurations. The unique start node of each local trace is an initial configuration of the main thread.

We distinguish between the sets \(\mathcal{X}\) and \(\mathcal{G}\) of local and global variables. We assume that \(\mathcal{X}\) contains a special variable \(\textsf {self}\) within which the thread id of the current thread, drawn from the set \(\mathcal{I}\), is maintained. A (local) thread configuration is a pair \((u,\sigma )\) where u is a program point and the type-consistent map \(\sigma : \mathcal{X}\rightarrow \mathcal{V}\) provides values for the local variables. The values of globals are not explicitly represented in a thread configuration, but can be recovered by consulting the (unique) last write to this global within the local trace. To model weak memory effects, weaker notions of last writes are conceivable. As in [48], we consider a set of actions \(\mathcal {A} ct \) that consists of locking and unlocking a (non-reentrant) mutex from a set \(\textsf {M}\), copying values of globals into locals and vice-versa, creating a new thread, as well as assignments with and branching on local variables. We extend \(\mathcal {A} ct \) with actions for returning from and joining with threads. We assume that writes to and reads from globals are atomic (or more precisely, we assume copying values of integral type to be atomic). This is enforced for each global g by a dedicated mutex \(m_g\) acquired just before accessing g and released immediately after. For simplicity, we associate traces corresponding to a write of g to this dedicated mutex \(m_g\), and thus do not need to consider writing and reading of globals as observable/observing actions. In examples, we omit explicitly locking and unlocking these mutexes. By convention, at program start all globals have value 0, while local variables may initially have any value.

Fig. 2. An example program and a corresponding local trace.

Each thread is represented by a control-flow graph with edges \(e\in \mathcal{E}\) of the form \(e=(u,\textsf {act},u')\) for some action \(\textsf {act}\in \mathcal {A} ct \) and program points u and \(u'\), where the start point of the main thread is \(u_0\). Let \(\mathcal {T}\) denote the set of all local traces of a given program. A formalism for local traces must, for each edge e of the control-flow graph, provide a transformation \(\llbracket e\rrbracket :\mathcal {T}^k\rightarrow 2^\mathcal {T}\) so that \(\llbracket e\rrbracket (t_0,\ldots ,t_{k-1})\) extends the local trace \(t_0\), possibly incorporating other local traces. For the operations \(\textsf {lock}(a),a\in \textsf {M}\), and \(x{=}\textsf {join}(x'), x,x'\in \mathcal{X}\), the arity of \(\llbracket e\rrbracket \) is two: another local trace, namely one whose last operation is \(\textsf {unlock}(a)\) or \(\textsf {return}\,x''\), respectively, is incorporated. The remaining edge transformations have arity one. In all cases, the set of resulting local traces may be empty when the operation is not applicable to its argument(s). We write \(\llbracket e\rrbracket (T_0,\ldots ,T_{k-1})\) for the set \(\bigcup _{t_0\in T_0,\ldots ,t_{k-1}\in T_{k-1}}\llbracket e\rrbracket (t_0,\ldots ,t_{k-1})\).
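The lifting of \(\llbracket e\rrbracket \) from tuples of local traces to tuples of sets of local traces is a union over a Cartesian product. A minimal Python sketch follows. Local traces are replaced by toy tuples of event names, and the toy `lock_a` transformer (which simply concatenates traces) is a hypothetical stand-in for the actual semantics of Schwarz et al. [48]; it is only meant to show that a non-applicable combination contributes the empty set.

```python
from itertools import product


def lift_to_sets(transfer):
    """Lift transfer(t_0, ..., t_{k-1}) -> set of traces to sets of traces:
    the result is the union over all argument tuples in T_0 x ... x T_{k-1}."""
    def lifted(*trace_sets):
        result = set()
        for args in product(*trace_sets):
            result |= transfer(*args)
        return result
    return lifted


# Toy stand-ins for local traces: tuples of event names (hypothetical encoding).
step = lift_to_sets(lambda t: {t + ("x=1",)})  # unary action
lock_a = lift_to_sets(                          # binary action
    lambda t0, t1: {t0 + t1 + ("lock(a)",)} if t1[-1:] == ("unlock(a)",) else set()
)
```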

Fig. 3. Right-hand sides for the side-effecting formulation of the concrete semantics; t(y) extracts the value of local variable y from the terminal configuration of trace t.

Given definitions of \(\llbracket e\rrbracket \), the set \(\mathcal {T}\) can be inductively defined starting from a set \(\textsf {init}\) of initial local traces consisting of initial configurations of the main thread. To develop efficient thread-modular abstractions, we are interested in subsets \(\mathcal {T}[u], \mathcal {T}[a],\mathcal {T}[i]\) of local traces ending at some program point u, ending with an unlock operation for mutexes a (or from \(\textsf {init}\)), or ending with a return statement of thread i, respectively. Schwarz et al. [48] showed that such subsets can be described as the least solution of a side-effecting constraint system [5]. There, each right-hand side may, besides its contribution to the unknown on the left, also provide contributions to other unknowns (the side-effects). This allows expressing analyses that accumulate flow-insensitive information about globals during a flow-sensitive analysis of local states with dynamic control flow [51]. Here, in the presence of dynamic thread creation, we use side-effects to express that an observable action, unlock or return, should also contribute to the sets \(\mathcal {T}[a]\) or \(\mathcal {T}[i]\), such that they can be incorporated at the corresponding observing action. The side-effecting formulation of our concrete semantics takes the form:

$$\begin{aligned} \begin{array}{lllllll} (\eta ,\eta \,[u_0]) \sqsupseteq (\{ [a] \mapsto \textsf {init}\mid a {\in }\textsf {M}\}, \textsf {init})\quad (\eta ,\eta \,[u']) \sqsupseteq \llbracket u,\textsf {act}\rrbracket \eta \;\; (u,\textsf {act},u'){\in }\mathcal{E}\end{array} \end{aligned}$$
(3)

where the ordering \(\sqsupseteq \) is induced by the superset ordering and right-hand sides are defined in Fig. 3. A right-hand side takes an assignment \(\eta \) of the unknowns of the system and returns a pair \((\eta ',T)\) where T is the contribution to the unknown occurring on the left (as in ordinary constraint systems). The first component collects the side-effects as the assignment \(\eta '\). If the right-hand sides are monotonic, Eq. (3) has a unique least solution.
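To illustrate how side-effects contribute to unknowns other than the left-hand side, the following Python sketch computes the least solution of a small side-effecting constraint system over powersets by round-robin iteration. It is a generic solver written under our own assumptions (monotonic right-hand sides, only finitely many values arising), not the solver of the implementation. Traces are toy strings in which `U` marks an unlock of a and `|` marks incorporating a trace at a lock; all names are hypothetical.

```python
def solve(constraints, init):
    """Least solution of a side-effecting constraint system over powersets.
    constraints: list of (lhs, rhs); rhs(get) returns (side_effects, value),
    where side_effects maps unknowns to sets. init gives initial contributions.
    Unknowns that never receive a contribution are implicitly empty."""
    eta = {}

    def get(x):
        return eta.get(x, frozenset())

    def contribute(x, values):
        new = get(x) | frozenset(values)
        if new == get(x):
            return False
        eta[x] = new
        return True

    for x, values in init.items():
        contribute(x, values)
    changed = True
    while changed:  # iterate until no unknown grows any more
        changed = False
        for lhs, rhs in constraints:
            side_effects, value = rhs(get)
            for x, values in side_effects.items():
                changed |= contribute(x, values)
            changed |= contribute(lhs, value)
    return eta


# Toy system: u0 --unlock(a)--> u1 --lock(a)--> u2.
def unlock_rhs(get):
    out = {t + "U" for t in get("u0")}
    return {"[a]": out}, out  # side-effect to [a] and contribution to u1


def lock_rhs(get):
    return {}, {t + "|" + s for t in get("u1") for s in get("[a]")}


eta = solve([("u1", unlock_rhs), ("u2", lock_rhs)], {"u0": {"i"}, "[a]": {"i"}})
```

Here the unknown `[a]` receives both the initial trace and the side-effect of the unlock, so the lock at u1 incorporates both.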

We only detail the right-hand sides for the creation of threads as well as the new actions join and return; the remaining ones are as defined by Schwarz et al. [48]. For thread creation, they provide the action \(x {=} \textsf {create}(u_1)\). Here, \(u_1\) is the program point at which the created thread should start. We assume that all locals from the creator are passed to the created thread, except for the variable \(\textsf {self}\). The variables \(\textsf {self}\) in the created thread and x in the creating thread receive a fresh thread id. Here, \(\textsf {new}\,u\,u_1\,t\) computes the local trace at the start point \(u_1\) from the local trace t of the creating thread. To handle returning and joining of threads, we introduce the following two actions:

  • return x; – terminating a thread and returning the value of the local variable x to a thread waiting for the given thread to terminate.

  • \(x {=} \textsf {join}(x');\) where \(x'\) is a local variable holding a thread id – blocks the ego thread until the thread with the given thread id has terminated. As in pthreads, at most one thread may call join for a given thread id. The value provided to return by the joined thread is assigned to the local variable x.

For returning results and realization of join, we employ the unknown [i] for the thread id i of the returning thread, as shown in Fig. 3.

4 Relational Analyses as Abstractions of Local Traces

Subsequently, we give relational analyses of the values of globals which we base on the local trace semantics. They are generic in the relational domain \(\mathcal{R}\), with 2-decomposable domains being particularly well-suited, as the concept of clusters is central to the analyses. We focus on relations between globals that are jointly write-protected by some mutex. We assume that, for each global g, we are given a set \(\mathcal {M}[g]\) of (write) protecting mutexes, i.e., mutexes that are always held when g is written. Let \(\mathcal {G}[a]=\{g\in \mathcal{G}\mid a\in \mathcal {M}[g]\}\) denote the set of globals protected by a mutex a. Let \(\emptyset \ne \mathcal {Q}_a \subseteq 2^{\mathcal {G}[a]}\) be the set of clusters of these globals we associate with a. For technical reasons, we require at least one cluster per mutex a, which may be the empty cluster \(\emptyset \), thus not associating any information with a.
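Computing \(\mathcal {G}[a]\) from \(\mathcal {M}[g]\) amounts to inverting a relation. The Python sketch below does this and checks the requirement that every mutex has at least one cluster, each a subset of \(\mathcal {G}[a]\). The data layout (dictionaries of sets) and the function names are assumptions; the sample values are those of Example 4.

```python
def protected_globals(M):
    """G[a] = {g | a in M[g]}: invert the map from globals to protecting mutexes."""
    G = {}
    for g, mutexes in M.items():
        for a in mutexes:
            G.setdefault(a, set()).add(g)
    return G


def clusters_ok(Q, G):
    """Every mutex has at least one cluster, and each cluster is a subset of G[a]."""
    return all(
        len(clusters) >= 1 and all(c <= G.get(a, set()) for c in clusters)
        for a, clusters in Q.items()
    )


# Usage with the protecting mutexes of Example 4.
M = {"g": {"a", "b", "m_g"}, "h": {"a", "b", "m_h"}}
G = protected_globals(M)  # G["a"] == {"g", "h"}, G["m_g"] == {"g"}
Q = {"a": [frozenset({"g", "h"})], "b": [frozenset({"g", "h"})]}
```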

Our basic idea is to store at the unknown \([a,Q]\) (for each mutex a and cluster \(Q\in \mathcal {Q}_a\)) an abstraction of the relations only between globals in \(Q\). By construction, all globals in \(Q\) are protected by a. Whenever a is locked, the relational information stored at all \([a,Q]\) is incorporated into the local state by the lattice operation meet, i.e., the local state now maintains relations between locals as well as globals which no other thread can access at this program point. Whenever a is unlocked, the new relation between globals in all corresponding clusters \(Q\in \mathcal {Q}_a\) is side-effected to the respective unknowns \([a,Q]\). Simultaneously, all information on globals that are no longer protected is forgotten to obtain the new local state. In this way, the analysis is fully relational in the local state, while only keeping relations within clusters of globals jointly protected by some mutex.

For clarity of presentation, we perform control-point splitting on the set of held mutexes when reaching program points. Apart from this, the constraint system and right-hand sides for the analysis closely follow those of the concrete semantics (Section 3) — with the exception that unknowns now take values from \(\mathcal{R}\) and that unknowns [a] are replaced with unknowns \([a,Q]\) for \(Q\in \mathcal {Q}_a\).

Fig. 4. Right-hand sides for the basic analysis. All functions are strict in \(\bot \) (describing the empty set of local traces); we only display definitions for non-\(\bot \) abstract values here. \(\llbracket \{ g \leftarrow 0 \mid g\in Q\}\rrbracket ^\sharp _\mathcal{R}\) is shorthand for the abstract transformer corresponding to the assignment of 0 to all variables in \(Q\) one-by-one.

All right-hand sides are given in detail in Fig. 4. For the start point of the program and the empty lockset, the right-hand side \(\textsf {init}^\sharp \) returns the \(\top \) relation updated such that the variable \(\textsf {self}\) holds the abstract thread id \(i_{0}\) of the main thread. Additionally, \(\textsf {init}^\sharp \) produces a side-effect for each mutex a and cluster \(Q\) that initializes all globals from the cluster with the value 0.

For a thread creating edge starting in program point u with lockset S, the right-hand side \(\llbracket [u,S],x {=} \textsf {create}(u_1)\rrbracket ^\sharp \) first generates a new abstract thread id, which we assume can be computed using function \(\nu ^\sharp \). The new id is assigned to the variable x in the local state of the current thread. Additionally, the start state \(r'\) for the newly created thread is constructed and side-effected to the thread’s start point with empty lockset \([u_1,\emptyset ]\). Since threads start with empty lockset, the state \(r'\) is obtained by removing all information about globals from the local state of the creator and assigning the new abstract thread id to the variable \(\textsf {self}\).

When locking a mutex a, the states stored at unknowns \([a,Q]\) with \(Q\in \mathcal {Q}_a\) are combined with the local state by meet. This is sound because the value stored at any \([a,Q]\) only maintains relationships between variables write-protected by a, and these values soundly account for the program state at every \(\textsf {unlock}(a)\) and at program start. When unlocking a, on the other hand, the local state restricted to the appropriate clusters \(Q\in \mathcal {Q}_a\) is side-effected to the respective unknowns \([a,Q]\), so that the changes made to variables in the cluster become visible to other threads. Also, the local state is restricted to the local variables and only those globals for which at least one protecting mutex is still held.
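The following Python sketch mimics the lock and unlock right-hand sides. As a stand-in for an arbitrary relational domain \(\mathcal{R}\) it uses, by assumption, sets of concrete valuations over a tiny finite value range: meet is intersection, join is union, and restriction forgets the values of all other variables. The variable names, the value range {0, 1}, and all function names are hypothetical.

```python
from itertools import product

VALUES = (0, 1)         # hypothetical finite value range
VARS = ("x", "g", "h")  # hypothetical: local x, globals g and h
LOCALS = {"x"}


def top():
    """All valuations over VARS, each a tuple of values in VARS order."""
    return frozenset(product(VALUES, repeat=len(VARS)))


def restrict(r, keep):
    """Forget everything about variables outside keep."""
    idx = [i for i, v in enumerate(VARS) if v in keep]
    kept = {tuple(val[i] for i in idx) for val in r}
    return frozenset(val for val in top() if tuple(val[i] for i in idx) in kept)


def assign(r, var, value):
    """Set var to a constant in every valuation."""
    i = VARS.index(var)
    return frozenset(val[:i] + (value,) + val[i + 1:] for val in r)


def lock(local, store, a, clusters):
    """Meet the local state with the value stored at [a, Q] for every cluster Q."""
    for q in clusters[a]:
        local = local & store.get((a, q), frozenset())  # missing entry = bottom
    return local


def unlock(local, store, a, clusters, still_protected):
    """Side-effect the restriction to each cluster Q to [a, Q], then forget
    all globals for which no protecting mutex is held any more."""
    for q in clusters[a]:
        store[(a, q)] = store.get((a, q), frozenset()) | restrict(local, q)
    return restrict(local, LOCALS | still_protected)


# Usage: initialization side-effects g = h = 0 to [a, {g, h}].
Q = {"a": [frozenset({"g", "h"})]}
cluster = Q["a"][0]
store = {("a", cluster): restrict(assign(assign(top(), "g", 0), "h", 0), cluster)}
s1 = lock(top(), store, "a", Q)           # one thread sees g = h = 0
s1 = assign(assign(s1, "g", 1), "h", 1)   # it writes g = h = 1
after = unlock(s1, store, "a", Q, set())  # globals are forgotten locally
s2 = lock(top(), store, "a", Q)           # a later lock: g = h in every valuation
```

In this run, the relation \(g=h\) survives the hand-over between threads, even though each side-effect only contributes values restricted to the cluster \(\{g,h\}\).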

As special mutexes \(m_g\) immediately surrounding accesses to g are used to ensure atomicity, and information about g is associated with them, all reads and writes refer to the local copy of g. Guards and assignments (which may only involve local variables) are defined analogously. For a return edge, the abstract value to be returned is looked up in the local state and then side-effected to the abstract thread id of the current thread (as the value of the dedicated variable \(\textsf {ret}\)). For join, the least upper bound of all return values of all possibly joined threads is assigned to the left-hand side of the join statement in the local state.
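Return and join can be sketched in the same spirit. For brevity, this Python sketch assumes a non-relational local state (a dictionary from variables to intervals) and intervals as the domain of returned values; in the analysis itself, the local state is an element of \(\mathcal{R}\). The unknown [i] is modeled as a dictionary entry keyed by the (abstract) thread id; all names are hypothetical.

```python
def lub(i1, i2):
    """Least upper bound of two intervals (lo, hi); None is bottom."""
    if i1 is None:
        return i2
    if i2 is None:
        return i1
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))


def do_return(returns, tid, value):
    """Side-effect the returned value to the unknown [tid] of the returning thread."""
    returns[tid] = lub(returns.get(tid), value)


def do_join(local, returns, x, possible_tids):
    """Assign to x the lub of the return values of all possibly joined threads.
    The result stays None (bottom) if none of these threads has returned."""
    result = None
    for tid in possible_tids:
        result = lub(result, returns.get(tid))
    return {**local, x: result}
```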

Example 4

Consider the program (Footnote 2), where \(\mathcal {M}[g] = \{a,b,m_g\}\), \(\mathcal {M}[h] = \{a,b,m_h\}\), \(\mathcal {Q}_a = \{\{g,h\}\}\), \(\mathcal {Q}_b = \{\{g,h\}\}\).

[Figure omitted: program of Example 4.]

Our analysis succeeds in proving all assertions here. Thread \(t_2\) is of particular interest: when locking b, only \(g\le h\) is known to hold, and locking the additional mutex a makes the better information \(g=h\) available. The analysis by Mukherjee et al. [42], on the other hand, only succeeds in proving assertion (2), even when all globals are put in the same region. It cannot establish (1) because all correlations between locals and globals are forgotten when the mix operation is applied at the second \(\textsf {lock}(b)\) in the main thread. (3) cannot be established because, at \(\textsf {lock}(b)\) in \(t_1\), the mix operation also incorporates the state after the first \(\textsf {unlock}(b)\) in the main thread, where \(g=h\) does not hold. The same holds for (4). The same reasoning applies to assertion (3) for the analysis using lock invariants proposed by Miné [39]. This analysis also falls short of showing (1), as at the \(\textsf {lock}(b)\) in the main thread, the lock invariant associated with b is joined into the local state. (4) is similarly out of reach. The same reasoning also applies to [39, 42, 48] after equipping the analyses with thread ids.   \(\square \)

Theorem 1

Any solution of the constraint system is sound w.r.t. the local trace semantics.

Proof

The proof is by fixpoint induction, the details are given in Appendix B of the extended version [49] of this paper.

We remark that, instead of relying on \(\mathcal {M}[g]\) being pre-computed, an analysis can also infer this information on the fly [58].

The analysis, however, still has some deficiencies. All writes to a global are accumulated regardless of the writing thread. As a consequence, a thread reads, e.g., not only its latest local writes but also all its earlier local writes, even if those have definitely been overwritten. Excluding some threads' writes is an instance of the more general idea of excluding writes that cannot be last writes. Instead of giving ad hoc remedies for this specific shortcoming, we propose a general mechanism to improve the precision of any thread-modular analysis in the next section, and later instantiate it to the issue highlighted here.

5 Refinement via Finite Abstractions of Local Traces

To improve precision of thread-modular analyses we take additional abstractions of local traces into account. Our approach is generic, building on the right-hand sides of a base analysis and using them as black boxes. We will instantiate this framework to exclude writes based on thread ids from the analysis in Section 4. Other instantiations are conceivable as well. To make it widely applicable, the framework allows base analyses that already perform some splitting of unknowns at program points (e.g., locksets in Section 4). We denote by \([\hat{u}]\) such (possibly) extended unknowns for a program point u. A (base) analysis is defined by its right-hand sides, and a collection of domains: (1) \(\mathcal{D}_S\) for abstract values stored at unknowns for program points; (2) \(\mathcal{D}_\textsf {act}\) for abstract values stored at observable actions \(\textsf {act}\) (e.g., in Section 4, \(\mathcal{D}_M\) for unlocks and \(\mathcal{D}_T\) for thread returns).

Let \(\mathcal {A}\) be a set of finite information that can be extracted from a local trace by a function \(\alpha _\mathcal {A}{:}\mathcal {T}{\rightarrow } \mathcal {A}\). We call \(\alpha _\mathcal {A}\,t{\in }\mathcal {A}\) the digest of some local trace t. Let \(\llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}{:} \mathcal {A}^k{\rightarrow }2^\mathcal {A}\) be the effect on the digest when performing a k-ary action \(\textsf {act}\in \mathcal {A} ct \) for a control flow edge originating at u. We require for \(e{=}(u,\textsf {act},v){\in }\mathcal{E}\),

$$\begin{aligned} \begin{array}{lll} \forall A_0,\ldots ,A_{k-1} \in \mathcal {A}&{}:&{}\; |\llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}(A_0,\ldots ,A_{k-1})|\le 1 \\ \forall t_0,\ldots ,t_{k-1} \in \mathcal {T}&{}:&{} \alpha _\mathcal {A}(\llbracket e\rrbracket (t_0,\ldots ,t_{k-1})) \subseteq \llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}(\alpha _\mathcal {A}\,t_0,\ldots ,\alpha _\mathcal {A}\,t_{k-1}) \end{array} \end{aligned}$$
(4)

where \(\alpha _\mathcal {A}\) is lifted element-wise to sets. While the first restriction ensures determinism, the second intuitively ensures that \(\llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}\) soundly abstracts \(\llbracket e\rrbracket \).

For thread creation, we additionally require a helper function \(\textsf {new}^\sharp _\mathcal {A}: \mathcal{N}\rightarrow \mathcal{N}\rightarrow \mathcal {A}\rightarrow \mathcal {A}\) that returns for a thread created at an edge originating from u and starting execution at program point \(u_1\) the new digest. The same requirements are imposed for edges \((u,x {=}\textsf {create}(u_1),v)\in \mathcal{E}\),

$$\begin{aligned} \begin{array}{llllll} \forall A_0 {\in } \mathcal {A}&:&|\textsf {new}^\sharp _\mathcal {A}\,u\,u_1\,A_0|\le 1 \quad \; \forall t_0 {\in } \mathcal {T}&:&\alpha _\mathcal {A}(\textsf {new}\,u\,u_1\,t_0) \subseteq \textsf {new}^\sharp _\mathcal {A}\,u\,u_1\,(\alpha _\mathcal {A}\,t_0) \end{array} \end{aligned}$$
(5)

We also define the set of initial digests at the start of the program:

$$\begin{aligned} \begin{array}{lll} \textsf {init}^\sharp _\mathcal {A}= & {} \{\alpha _\mathcal {A}\,t \mid t \in \textsf {init}\} \end{array} \end{aligned}$$
(6)

Under these assumptions, we can perform control-point splitting according to \(\mathcal {A}\). This means that unknowns \([\hat{u}]\) for program points u are replaced with new unknowns \([\hat{u},A]\), \(A\in \mathcal {A}\). Analogously, unknowns for observable actions \([\textsf {act}]\) are replaced with unknowns \([\textsf {act},A]\) for \(A\in \mathcal {A}\). Consider a single constraint from an abstract constraint system of the last section, which soundly abstracts the collecting local trace semantics of a program.

$$ \begin{array}{llllllrrr} (\eta ,\eta \,[\hat{v}])\sqsupseteq & {} \llbracket [\hat{u}],\textsf {act}\rrbracket ^\sharp \,\eta \quad \,\, \end{array} $$

The corresponding constraints of the refined system with control-point splitting differ based on whether the action \(\textsf {act}\) is observing, observable, or neither.

  • When \(\textsf {act}\) is observing, the new right-hand side additionally gets the digest \(A_1\) associated with the local traces that are to be incorporated:

    $$ \begin{array}{lll} (\eta , \eta \,\left[ \hat{v}, A'\right] )\sqsupseteq & {} \llbracket [\hat{u},A_0],\textsf {act},A_1\rrbracket ^\sharp \,\eta \qquad \text {for } A_0,A_1\in \mathcal {A}, A'\in \llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}\,(A_0,A_1) \end{array} $$
  • When \(\textsf {act}\) is observable, the digest \(A'\) of the resulting local trace is passed so that the side-effect can be redirected to the appropriate unknown:

    $$ \begin{array}{lll} (\eta , \eta \,\left[ \hat{v}, A'\right] )\sqsupseteq & {} \llbracket [\hat{u},A_0],\textsf {act},A'\rrbracket ^\sharp \,\eta \qquad \text {for } A_0\in \mathcal {A}, A'\in \llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}\,(A_0) \end{array} $$
  • When \(\textsf {act}\) is neither, no additional digest is passed:

    $$ \begin{array}{lll} (\eta , \eta \,\left[ \hat{v}, A'\right] )\sqsupseteq & {} \llbracket [\hat{u},A_0],\textsf {act}\rrbracket ^\sharp \,\eta \qquad \;\;\;\;\; \text {for } A_0\in \mathcal {A}, A'\in \llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}\,(A_0) \end{array} $$
Fig. 5. Right-hand sides for an observing action \(\textsf {act}\), an observable action \(\textsf {act}'\), a create action, and an action \(\textsf {act}''\) that is neither, for the refined analyses, defined as wrappers around the right-hand sides of a base analysis.

The new right-hand sides are defined in terms of the right-hand sides of the base analysis, which are used as black boxes (Fig. 5). They act as wrappers, mapping any unknown consulted or side-effected to by the original analysis to the appropriate unknown of the refined system. Thus, the refined analysis automatically benefits from the extra information the digests provide. It may, e.g., exploit that \(\llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}(A_0,A_1) = \emptyset \), meaning that no local traces with digests \(A_0,A_1\) can be combined into a valid local trace ending with action \(\textsf {act}\). The complete definition of the refined constraint system instantiated to the actions considered here and unknowns for program points enriched with locksets is given in [49, Fig. 14].
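The wrapping can be pictured with a minimal Python sketch. It assumes, purely for illustration, that a base right-hand side is a function receiving the local value together with a `get` function for consulting unknowns and a `side` function for side-effecting to them; the helper `wrap`, the toy powerset domain, and the right-hand sides `base_unlock_a` and `base_lock_a` are hypothetical and not part of the formalization. The wrapper never inspects the base analysis; it only renames the unknowns the base touches by pairing them with a digest.

```python
from typing import Callable, Dict, Hashable, Optional

Value = frozenset                      # toy powerset domain: join is union
Get = Callable[[Hashable], Value]
Side = Callable[[Hashable, Value], None]
Rhs = Callable[[Value, Get, Side], Value]


def wrap(base: Rhs, read_digest: Optional[Hashable],
         write_digest: Optional[Hashable]) -> Rhs:
    """Lift a base right-hand side to the refined system without looking inside it.

    Every unknown x consulted by `base` becomes (x, read_digest); every unknown
    it side-effects to becomes (x, write_digest).
    """
    def refined(local: Value, get: Get, side: Side) -> Value:
        return base(local,
                    lambda x: get((x, read_digest)),
                    lambda x, v: side((x, write_digest), v))
    return refined


# Hypothetical base right-hand sides for one mutex "a".
def base_unlock_a(local: Value, get: Get, side: Side) -> Value:
    side("a", local)                   # publish the local state at mutex a
    return local


def base_lock_a(local: Value, get: Get, side: Side) -> Value:
    return local | get("a")            # incorporate what was published at a


# A tiny solver environment: side-effects are joined into the unknown.
env: Dict[Hashable, Value] = {}


def get(x: Hashable) -> Value:
    return env.get(x, frozenset())


def side(x: Hashable, v: Value) -> None:
    env[x] = get(x) | v


# Two unlocks whose resulting local traces have different digests.
wrap(base_unlock_a, None, "t1")(frozenset({"g=1"}), get, side)
wrap(base_unlock_a, None, "t2")(frozenset({"g=2"}), get, side)
# A lock that incorporates only local traces with digest "t1".
after_lock = wrap(base_lock_a, "t1", None)(frozenset(), get, side)
print(sorted(after_lock))              # ['g=1']
```

The lock told to incorporate traces with digest `"t1"` only sees the contribution published under that digest; this is how a refined analysis can ignore combinations for which the digest function returns the empty set.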

Fig. 6. Right-hand sides for expressing locksets as a refinement.

Enriching program points with locksets can in fact be seen as a first application of this framework. The right-hand sides are given in Fig. 6.

Example 5

As a further instance, consider tracking which mutexes have been locked at least once in the local trace. At \(\textsf {lock}(a)\), traces in which a thread has performed a \(\textsf {lock}(a)\) cannot be combined with traces that contain no \(\textsf {lock}(a)\). The corresponding right-hand sides are given in Fig. 7. When refining the analysis from Section 4 accordingly (assuming a protects g and h), it succeeds in proving the assertion in this program, as the initial values of 0 for g and h can be excluded.

figure m

This naturally generalizes to counting how often some action (e.g., a write to a global g) occurred, stopping exact bookkeeping at a constant (1 in this case).    \(\square \)
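A minimal Python sketch of such a digest follows. Since Fig. 7 is not reproduced here, the concrete rules are an assumption reconstructed from the description above: the digest is the set of mutexes locked at least once, a created thread inherits the digest of its creator, and at \(\textsf{lock}(a)\) an ego trace that has already locked a is not combined with an incorporated trace that never locked a. All function names are hypothetical; `count_step` illustrates the bounded counting just mentioned.

```python
from typing import FrozenSet, Set

Digest = FrozenSet[str]          # mutexes locked at least once in the local trace

# Assumed rules: a hypothetical reconstruction, not Fig. 7 verbatim.
INIT: Set[Digest] = {frozenset()}    # no mutex has been locked at program start


def lock_digest(a: str, ego: Digest, incoming: Digest) -> Set[Digest]:
    """Digests after lock(a), combining the ego trace with an incorporated trace."""
    if a in ego and a not in incoming:
        # The ego thread already locked a, so the trace it incorporates now
        # cannot be one that never contained lock(a): no valid combination.
        return set()
    return {ego | incoming | {a}}


def create_digest(parent: Digest) -> Set[Digest]:
    """A created thread starts from its parent's local trace."""
    return {parent}


def count_step(count: int, bound: int = 1) -> int:
    """Counting occurrences of an action, saturating at `bound`."""
    return min(count + 1, bound)
```

Under these assumptions, a second \(\textsf{lock}(a)\) by a thread can never be combined with the initial trace, which is exactly what rules out the initial values of g and h.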

Fig. 7. Right-hand sides for refining according to encountered \(\textsf {lock}\) operations.

To prove soundness of local-trace-based refinement of our analysis from Section 4, we first construct a corresponding refined collecting local trace semantics. Then we verify that the refined analysis is sound w.r.t. this refined semantics – which, in turn, is proven sound w.r.t. the original collecting local trace semantics.

Theorem 2

Assume that \(\alpha _\mathcal {A}\), \(\textsf {new}^\sharp _\mathcal {A}\), and \(\llbracket u,\textsf {act}\rrbracket ^\sharp _\mathcal {A}\) fulfill requirements (4), (5), and (6). Then any solution of the refined constraint system is sound relative to the collecting local trace semantics.

Proof

A proof sketch instantiated with the actions considered here and unknowns enriched with locksets is provided in [49, Appendix D].

6 Analysis of Thread Ids and Uniqueness

Fig. 8. Program with multiple thread creations.

We instantiate the scheme from the previous section to compute abstract thread ids and their uniqueness. This refinement of the base analysis enhances its precision by excluding, e.g., reads from threads that have not yet been started. For that, we identify threads by their thread creation history, i.e., by sequences of create edges. As these sequences may grow arbitrarily long, we collect all creates occurring after the first repetition into a set to obtain finite abstractions.

Example 6

In the program from Fig. 8, the first thread created by main receives the abstract thread id \((\textsf {main}\cdot \langle u_1,t_1\rangle ,\emptyset )\). It creates a thread with abstract thread id \((\textsf {main}\cdot \langle u_1,t_1\rangle \cdot \langle u_3,t_1\rangle ,\emptyset )\). At program point \(u_3\), the latter creates a thread starting at \(t_1\) and receiving the abstract thread id \((\textsf {main}\cdot \langle u_1,t_1\rangle ,\{\langle u_3,t_1\rangle \})\) – as do all threads subsequently created at this edge.    \(\square \)

Create edges, however, may also be encountered repeatedly within the creating thread, e.g., in a loop. To deal with this, we track, for each thread, the set C of possibly already encountered create edges. As soon as a create edge is encountered again, the created thread receives a non-unique thread id.

Example 7

The first time the main thread reaches program point \(u_2\) in the program from Fig. 8, the created thread is assigned the unique abstract thread id \((\textsf {main}\cdot \langle u_2,t_1\rangle ,\emptyset )\). In subsequent loop iterations, the created threads are no longer kept separate, and thus receive the non-unique id \((\textsf {main},\{\langle u_2,t_1\rangle \})\).    \(\square \)

Formally, let \(\mathcal{N}_C,\mathcal{N}_S\) denote the subsets of program points with an outgoing edge labeled \(x {=} \textsf {create}(...)\) and of starting points of threads, respectively. Let \(\mathcal P \subseteq \mathcal{N}_C \times \mathcal{N}_S\) denote the set of pairs relating thread creation nodes with the starting points of the created threads. The set \(\mathcal{I}^\sharp \) of abstract thread ids then consists of all pairs \((i,s)\in (\textsf {main}\cdot \mathcal{P}^*)\times 2^\mathcal{P}\) in which each pair \(\langle u,f\rangle \) occurs at most once. Given the set \(\mathcal{I}^\sharp \), we require that there is a concretization \(\gamma :\mathcal{I}^\sharp \rightarrow 2^\mathcal{I}\) and a function \(\textsf {single}:\mathcal{I}^\sharp \rightarrow \mathcal{V}^\sharp _\mathcal{I}\) with \(\gamma \,i^\sharp \subseteq \gamma _{\mathcal{V}^\sharp }\,(\textsf {single}\,i^\sharp )\). The abstract thread id of the main thread is given by \((\textsf {main},\emptyset )\). Therein, the elements of \((\textsf {main}\cdot \mathcal{P}^*)\times \{\emptyset \}\) are the unique thread ids, each representing at most one concrete thread id, while the elements \((i,s)\) with \(s\ne \emptyset \) are ambiguous, i.e., may represent multiple concrete thread ids. Moreover, we maintain the understanding that the concretizations of distinct abstract thread ids from \(\mathcal{I}^\sharp \) are disjoint.

As refining information \(\mathcal {A}\), we consider not only abstract thread ids but additionally track the set of thread creations executed within the current thread. Accordingly, we set \(\mathcal {A}= \mathcal{I}^\sharp \times 2^\mathcal{P}\) and define the right-hand sides as shown in Fig. 9, where \(\bar{i}\) denotes the set of pairs occurring in the sequence i.

Fig. 9. Right-hand sides for thread ids.

Example 8

Consider again the program from Fig. 8 with right-hand sides from Fig. 9, and assume that the missing right-hand side for join returns its first argument. The initial thread has the abstract thread id \(i_0 = (\textsf {main},\emptyset )\). At its start point, the digest thus is \((i_0,\emptyset )\). At the create edge originating at \(u_1\), a new thread with id \((\textsf {main}\cdot \langle u_1,t_1\rangle ,\emptyset )\) is created. The digest for this thread then is \(((\textsf {main}\cdot \langle u_1,t_1\rangle ,\emptyset ),\emptyset )\). For the main thread, the encountered create edge \(\langle u_1,t_1\rangle \) is added to the second component of the digest, making it \((i_0,\{\langle u_1,t_1\rangle \})\).

When \(u_2\) is reached with \((i_0,\{\langle u_1,t_1\rangle \})\), a unique thread with id \((\textsf {main}\cdot \langle u_2,t_1\rangle ,\emptyset )\) is created. The new digest of the creating thread then is \((i_0,\{\langle u_1,t_1\rangle ,\langle u_2,t_1\rangle \})\). In subsequent iterations of the loop, for which \(u_2\) is reached with \((i_0,\{\langle u_1,t_1\rangle , \langle u_2,t_1\rangle \})\), a non-unique thread with id \((\textsf {main},\{\langle u_2,t_1\rangle \})\) is created.

When a thread with id \((\textsf {main},\{\langle u_2,t_1\rangle \})\) reaches \(u_3\), a thread with id \((\textsf {main},\{\langle u_2,t_1\rangle ,\langle u_3,t_1\rangle \})\) is created, as the id of the creating thread is already non-unique. When the thread with id \((\textsf {main}\cdot \langle u_1,t_1\rangle ,\emptyset )\) reaches \(u_3\), a new thread with id \((\textsf {main}\cdot \langle u_1,t_1\rangle \cdot \langle u_3,t_1\rangle ,\emptyset )\) is created. When this newly created thread reaches the same program point, the threads created there have the non-unique id \((\textsf {main}\cdot \langle u_1,t_1\rangle ,\{\langle u_3,t_1\rangle \})\), as \(\langle u_3,t_1\rangle \) already appears in the id of the creating thread.   \(\square \)
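The creation of abstract thread ids in Examples 6 to 8 can be replayed with the following Python sketch. Fig. 9 is not reproduced here, so the case distinction in the hypothetical function `create` is an assumption reconstructed from these examples. The `main` prefix is left implicit: a thread id is a pair of a tuple of create edges and a frozenset of create edges, and a digest additionally carries the set C of create edges the thread has already passed.

```python
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]                            # <u, f>: create node, start point
Tid = Tuple[Tuple[Edge, ...], FrozenSet[Edge]]    # (i, s); the prefix "main" is implicit
Digest = Tuple[Tid, FrozenSet[Edge]]              # (thread id, C)

MAIN: Digest = (((), frozenset()), frozenset())   # ((main, {}), {})


def create(edge: Edge, digest: Digest) -> Tuple[Digest, Digest]:
    """Return (digest of the created thread, new digest of the creating thread)."""
    (i, s), c = digest
    if s:                                 # creator is already non-unique
        child: Tid = (i, s | {edge})
    elif edge in i:                       # edge repeats in the creation history:
        k = i.index(edge)                 # fold everything from its first
        child = (i[:k], frozenset(i[k:])) # occurrence onwards into the set
    elif edge in c:                       # edge passed before by this thread (loop)
        child = (i, frozenset({edge}))
    else:                                 # first occurrence: unique, longer history
        child = (i + (edge,), frozenset())
    return (child, frozenset()), ((i, s), c | {edge})


e1, e2, e3 = ("u1", "t1"), ("u2", "t1"), ("u3", "t1")
t_a, main1 = create(e1, MAIN)       # (main.<u1,t1>, {})
t_b, main2 = create(e2, main1)      # (main.<u2,t1>, {})
t_c, main3 = create(e2, main2)      # (main, {<u2,t1>}) in later loop iterations
t_d, _ = create(e3, t_a)            # (main.<u1,t1>.<u3,t1>, {})
t_e, _ = create(e3, t_d)            # (main.<u1,t1>, {<u3,t1>})
t_f, _ = create(e3, t_c)            # (main, {<u2,t1>, <u3,t1>})
```

Each call returns both digests, mirroring how the creating thread records the edge in its C component while the new thread starts with an empty C.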

Abstract thread ids should provide us with the following functions:

  • \(\textsf {unique}: \mathcal{I}^\sharp {\rightarrow } {\textbf {bool}}\) tells whether a thread id is unique.

  • \(\textsf {lcu\_anc}: \mathcal{I}^\sharp {\rightarrow } \mathcal{I}^\sharp {\rightarrow } \mathcal{I}^\sharp \) returns the last common unique ancestor of two threads.

  • \(\textsf {may\_create}: \mathcal{I}^\sharp {\rightarrow } \mathcal{I}^\sharp {\rightarrow } {\textbf {bool}}\) checks whether a thread may (transitively) create another.

For our domain \(\mathcal{I}^\sharp \), these can be defined as \(\textsf {unique}\,(i,s) = (s = \emptyset )\) and

$$ \begin{array}{lll} \textsf {lcu\_anc}\,(i,s)\,(i',s') &{}=&{} (\textsf {longest common prefix } i\,i', \emptyset )\\ \textsf {may\_create}\,(i,s)\,(i',s') &{}=&{} (\bar{i} \cup s) \subseteq (\bar{i'} \cup s') \end{array} $$
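These three functions translate directly into code. The Python sketch below assumes, for illustration only, that an abstract thread id \((i,s)\) is encoded as a pair of a tuple of create edges (with the `main` prefix left implicit) and a frozenset of create edges; it illustrates the formulas and is not the Goblint implementation.

```python
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]
Tid = Tuple[Tuple[Edge, ...], FrozenSet[Edge]]    # (i, s), "main" prefix implicit


def unique(t: Tid) -> bool:
    _, s = t
    return not s                                   # unique iff s is empty


def lcu_anc(t1: Tid, t2: Tid) -> Tid:
    (i1, _), (i2, _) = t1, t2
    n = 0
    while n < min(len(i1), len(i2)) and i1[n] == i2[n]:
        n += 1
    return (i1[:n], frozenset())                   # longest common prefix, unique


def may_create(t1: Tid, t2: Tid) -> bool:
    (i1, s1), (i2, s2) = t1, t2
    # (set of pairs in i1) union s1 must be a subset of (pairs in i2) union s2
    return (set(i1) | s1) <= (set(i2) | s2)
```

Since `main` is implicit, the empty tuple plays the role of the main thread's history, so `lcu_anc` of two unrelated threads returns the main thread's id.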

We use this extra information to enhance the definitions of \(\llbracket u,\textsf {lock}(a)\rrbracket ^\sharp _\mathcal {A}\) and \(\llbracket u,x' {=} \textsf {join}(x)\rrbracket ^\sharp _\mathcal {A}\) to take into account that the ego thread cannot acquire a mutex from another thread or join a thread that has definitely not yet been created. This is the case for a thread \(t'\)

  (1) that is directly created by the unique ego thread, but the ego thread has not yet reached the program point where \(t'\) is created; or

  (2) whose thread id indicates that a thread that has not yet been created according to (1) is part of the creation history of \(t'\).

Accordingly, we introduce the predicate \(\textsf {may\_run}\,(i,C)\,(i',C')\) defined as

$$ \begin{array}{lll} (\textsf {lcu\_anc}\,i\,i' = i) \implies \exists \langle u,u'\rangle \in C: (i {\circ } \langle u,u'\rangle = i' \vee \textsf {may\_create}\,(i {\circ } \langle u,u'\rangle )\,i') \end{array} $$

which is false whenever thread \(i'\) is definitely not yet started. We then set

$$ \begin{array}{lll} \llbracket u,\textsf {lock}(a)\rrbracket ^\sharp _\mathcal {A}\,(i,C)\,(i',C') &{}=&{} \llbracket u,x' {=} \textsf {join}(x)\rrbracket ^\sharp _\mathcal {A}\,(i,C)\,(i',C')\\ &{}=&{} {\left\{ \begin{array}{ll} \{(i,C)\} &{} \text {if } \textsf {may\_run}\,(i,C)\,(i',C')\\ \emptyset &{} \text {otherwise} \end{array}\right. } \end{array} $$
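The predicate and the shared right-hand side for lock and join can be sketched in Python as follows. The encoding of thread ids (a tuple of create edges with implicit `main` prefix, plus a frozenset) and the reading of \(i\circ\langle u,u'\rangle\) as appending the edge to the sequence of a unique id are assumptions made for this illustration; `may_run` is a literal transcription of the formula above, and the helper names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]
Tid = Tuple[Tuple[Edge, ...], FrozenSet[Edge]]
Digest = Tuple[Tid, FrozenSet[Edge]]              # (thread id, C)


def lcu_anc(t1: Tid, t2: Tid) -> Tid:
    (i1, _), (i2, _) = t1, t2
    n = 0
    while n < min(len(i1), len(i2)) and i1[n] == i2[n]:
        n += 1
    return (i1[:n], frozenset())


def may_create(t1: Tid, t2: Tid) -> bool:
    (i1, s1), (i2, s2) = t1, t2
    return (set(i1) | s1) <= (set(i2) | s2)


def compose(t: Tid, edge: Edge) -> Tid:
    """Assumed reading of i o <u,u'>: the id of the thread t creates at `edge`."""
    i, s = t
    return (i + (edge,), s)


def may_run(ego: Digest, other: Digest) -> bool:
    (i, c), (i2, _) = ego, other
    if lcu_anc(i, i2) != i:
        return True                  # premise of the implication is false
    return any(compose(i, e) == i2 or may_create(compose(i, e), i2) for e in c)


def lock_or_join(ego: Digest, other: Digest) -> Set[Digest]:
    """Shared right-hand side of lock(a) and x' = join(x) on digests."""
    return {ego} if may_run(ego, other) else set()


main_id = ((), frozenset())
t1_id = ((("u1", "t1"),), frozenset())
before = (main_id, frozenset())                   # main has not yet passed <u1,t1>
after = (main_id, frozenset({("u1", "t1")}))      # main has passed <u1,t1>
```

Before the main thread has passed the create edge, combining with a contribution of the thread created there yields no digest at all, so the corresponding constraint contributes nothing.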

This analysis of thread ids and uniqueness can be considered a May-Happen-In-Parallel (or, more precisely, Must-Not-Happen-In-Parallel) analysis. MHP information is useful in a variety of scenarios: for example, a thread-modular analysis of data races or deadlocks that does not consider thread ids and joining can be refined with this analysis to exclude more data races or deadlocks. In the following, we outline how the analysis from Section 4 may benefit from MHP information.

7 Exploiting Thread IDs to Improve Relational Analyses

We now exploit abstract thread ids and their uniqueness to limit the amount of reading performed by the analysis from Section 4. Specifically, the analysis should no longer read

I1: writes from other threads that have not yet been created;

I2: the ego thread’s past writes, if its thread id is unique;

I3: past writes from threads that have already been joined.

Improvements I1 and I3 have been realized, e.g., in a setting where thread ids, and which thread is joined where, can be read off from control-flow graphs [31]. Here, however, this information is computed during the analysis. In our framework, I1 is already achieved by refining the base analysis according to Section 6.

Example 9

Consider the program below where \(\mathcal {M}[g] = \{a,b,m_g\}\), \(\mathcal {M}[h] = \{a,b,m_h\}\), \(\mathcal {M}[i] = \{m_i\}\) and assume \(\mathcal {Q}_a =\{\{g,h\}\}\).

figure n

The refined analysis succeeds in proving (1), as the thread (starting at) \(t_3\) that breaks the invariant \(g{=}h\) has definitely not been started yet at this program point. Without refinement, the analysis from Section 4 could not prove (1). However, this does not suffice to prove (2). At this program point, \(t_2\) may already have been started. At the \(\textsf {lock}(a)\) in \(t_2\), \(t_3\) may also have been started; thus, the violation of the invariant \(g{=}h\) by \(t_3\) is incorporated into the local state of \(t_2\) at the lock. At \(\textsf {unlock}(a)\), although \(t_2\) only reads g, the imprecise abstract relation violating \(g{=}h\) is side-effected to \([a,\{g,h\},t_2]\) and is incorporated at the second \(\textsf {lock}(a)\) of the main thread. The final shortcoming is that each thread reads all its own past (and future!) writes, even when it is known to be unique. This means that (3) cannot be proven.    \(\square \)

To achieve I2, some effort is required, as our analysis, in contrast to, e.g., [39, 42], forgets the values of globals when they become unprotected. We thus restrict side-effecting to mutexes to cases where the ego thread may have written a protected global since acquiring the mutex. This differs from Section 4, where a side-effect is performed at every unlock, i.e., everything a thread reads is treated as if it had been written by that thread.

Technically, we locally track a map \(L: (\textsf {M}\times \mathcal {Q}) \rightarrow \mathcal{R}\), where \(L\,(a,Q)\) maintains, for a mutex a, an abstract relation between the globals in cluster \(Q\in \mathcal {Q}_a\). More specifically, the abstract relation on the globals from \(Q\) recorded in \(L\,(a,Q)\) is the one that held when a was unlocked join-locally for the first time after the last join-local write to a global in \(\mathcal {G}\,[a]\). If there is no such \(\textsf {unlock}(a)\), the relation at program start is recorded. We call an operation in a local trace join-local to the ego thread if it is (a) thread-local, i.e., performed by the ego thread, (b) executed by a thread that is (transitively) joined into the ego thread, or (c) join-local to the parent thread at the node at which the ego thread is created. This notion will also be crucial for realizing I3. Join-locality is illustrated in Fig. 10, where the join-local part of a local trace is highlighted.

For join-local contributions, it suffices to consult \(L\,(a,Q)\) instead of the unknowns \([a,Q,i]\); we call such contributions accounted for. To check whether a contribution from some thread id is accounted for, we introduce a function \(\textsf {acc}: (\mathcal {A}\times \mathcal{D}_S) {\rightarrow } \mathcal {A}{\rightarrow } \textsf {bool}\) (see definition (7) below). Besides an abstract value from \(\mathcal{R}\), the local state \(\mathcal{D}_S\) now contains two additional components:

  • The map \(L: (\textsf {M}\times \mathcal {Q}) \rightarrow \mathcal{R}\) for which the join is given component-wise;

  • The set \(W: 2^\mathcal{G}\) (ordered by \(\subseteq \)) of globals that may have been written since one of its protecting mutexes has been locked, and not all protecting mutexes have been unlocked since.

Just like r, L and W are abstractions of the reaching local traces. \(\mathcal{D}_T\) is also enhanced with an L component, while \(\mathcal{D}_M\) remains unmodified. We sketch the right-hand sides here; their definitions are given in Fig. 11. For program start \(\textsf {init}^\sharp \), in contrast to the analysis from Section 4, there is no initial side-effect to the unknowns for mutexes. The initial values of globals are join-local and thus accounted for in the L component, which is also passed to any subsequently created thread.

Fig. 10. Illustration highlighting the join-local part of a local trace of the program from Fig. 2a, and which writes are thus accounted for by L.

The right-hand sides for thread creation and return differ from those of the analysis from Section 4 enhanced with thread ids only in the handling of the additional components L and W. As the thread ids are tracked precisely in the \(\mathcal {A}\) component, this information is used directly to determine which unknown to side-effect to, and unknowns \([(i,C)]\) replace unknowns \([i',(i,C)]\).

For join, if the return value of the joined thread is not accounted for, it is assigned to the variable on the left-hand side, and the L components of the ego thread and the joined thread are joined. If, on the other hand, it is accounted for, the thread has already been joined and cannot be joined again. There is a separate constraint for each \((i',C')\), so that all threads that could be joined are considered.

When locking a mutex a, if \((i',C')\) is not accounted for, its information on the globals protected by a is joined with the join-local information for a maintained in \(L\,(a,Q)\), \(Q\in \mathcal {Q}_a\). This information about the globals protected by a is then incorporated into the local state by \(\sqcap \). When unlocking a, if there may have been a write to a protected global since the mutex was locked (according to W), the join-local information is updated and the local state restricted to \(Q\) is side-effected to the appropriate unknown \([a,Q,(i,C)]\) for each \(Q\in \mathcal {Q}_a\). Just like in Section 4, r is then restricted to only maintain relationships between locals and those globals for which at least one protecting mutex is still held. Reading from and writing to globals are, once more, purely local operations. To exclude self writes, we set

$$\begin{aligned} \begin{array}{lll} \textsf {acc}\,((i,C),\_)\,(i',C') = \textsf {unique}\,i \wedge i = i' \end{array} \end{aligned}$$
(7)

The resulting analysis thus takes I1 (via \(\llbracket ...\rrbracket ^\sharp _\mathcal {A}\) defined in Section 6), as well as I2 (via \(\textsf {acc}\)) into account. In Example 9, it is now able to show all assertions.
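The interplay of \(\textsf{acc}\), L, and W at lock and unlock can be illustrated with a deliberately small Python sketch. As assumptions for this illustration only: abstract relations are conjunctions of equalities between globals (join keeps the common equalities, meet takes all of them, `None` is \(\bot\), the empty conjunction is \(\top\)), the local state contains no locals, the final restriction of r after unlock is omitted, and all names and parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Eq = FrozenSet[str]                     # an equality between two globals
Val = Optional[FrozenSet[Eq]]           # conjunction of equalities; None = bottom
Tid = Tuple[tuple, FrozenSet]           # abstract thread id (i, s)
TOP: Val = frozenset()


def join(x: Val, y: Val) -> Val:
    if x is None:
        return y
    if y is None:
        return x
    return x & y                        # keep the equalities valid on both sides


def meet(x: Val, y: Val) -> Val:
    return None if x is None or y is None else x | y


def restrict(x: Val, q: FrozenSet[str]) -> Val:
    return None if x is None else frozenset(e for e in x if e <= q)


def acc(ego: Tid, other: Tid) -> bool:
    """Definition (7): only the ego thread's own writes, if it is unique."""
    return not ego[1] and ego == other


def lock(a: str, ego: Tid, other: Tid, r: Val, L: Dict, eta: Dict,
         clusters: Dict[str, List[FrozenSet[str]]]) -> Val:
    for q in clusters[a]:
        v = L[(a, q)]                   # join-local information
        if not acc(ego, other):         # contribution of `other` not accounted for
            v = join(v, eta.get((a, q, other)))
        r = meet(r, v)
    return r


def unlock(a: str, ego: Tid, r: Val, L: Dict, W: Set[str], eta: Dict,
           held: Set[str], G: Dict[str, Set[str]], M: Dict[str, Set[str]],
           clusters: Dict[str, List[FrozenSet[str]]]) -> Tuple[Set[str], Set[str]]:
    if W & G[a]:                        # a global protected by a may have been written
        for q in clusters[a]:
            part = restrict(r, q)
            L[(a, q)] = part            # update join-local information
            key = (a, q, ego)
            eta[key] = join(eta.get(key), part)   # side-effect to [a, Q, (i, C)]
    held = held - {a}
    W = {g for g in W if M[g] & held}   # keep globals still protected by a held mutex
    return held, W


Q = frozenset({"g", "h"})
g_eq_h: Val = frozenset({frozenset({"g", "h"})})  # the equality g = h
G, M, clusters = {"a": {"g", "h"}}, {"g": {"a"}, "h": {"a"}}, {"a": [Q]}
ego: Tid = ((("u1", "t1"),), frozenset())         # a unique thread
other: Tid = ((("u2", "t1"),), frozenset())
L = {("a", Q): g_eq_h}                  # g = h holds at program start
eta: Dict = {("a", Q, other): TOP}      # `other` published a state without g = h

# ego first breaks g = h, then restores it, each time in its own critical section
unlock("a", ego, TOP, L, {"g"}, eta, {"a"}, G, M, clusters)
held, W = unlock("a", ego, g_eq_h, L, {"g", "h"}, eta, {"a"}, G, M, clusters)

own = lock("a", ego, ego, TOP, L, eta, clusters)        # stale own write excluded
foreign = lock("a", ego, other, TOP, L, eta, clusters)  # must account for `other`

L_snapshot, eta_snapshot = dict(L), dict(eta)
unlock("a", ego, TOP, L, set(), eta, {"a"}, G, M, clusters)  # no write: nothing published
```

Although the unknown for the ego thread still records the stale violation of \(g{=}h\), the unique ego thread reads only its join-local L and recovers \(g{=}h\), while a contribution from another thread is still incorporated.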

Fig. 11. Right-hand sides for the improved (I1, I2) analysis using thread ids.

Theorem 3

This analysis is sound w.r.t. the local trace semantics.

Proof

The proof relies on the following observations:

  • When \(\mathcal {G}[a]\cap W = \emptyset \), no side-effect is required.

  • Exclusions based on \(\textsf {acc}\) are sound, i.e., \(\textsf {acc}\) only excludes join-local writes.

The detailed proof is a simplification of a proof for an enhanced analysis from the extended version [49, Appendix F], which we outline in Appendix G there.   \(\square \)

The analysis does not make use of the components C at unknowns \([a,Q,(i,C)]\) and \([i,C]\). In [49, Appendix E], we detail how this information can be exploited to exclude a further class of writes, namely those performed by an ancestor of the ego thread before the ego thread was created. Alternatively, an implementation may abandon control-point splitting according to C at mutexes and thread ids, replacing \([a,Q,(i,C)]\) and \([i,C]\) with \([a,Q,i]\) and [i], respectively.

When turning to improvement I3, we observe that after a thread t with a unique thread id has been joined, t cannot perform further writes. As all writes of joined threads are join-local to the ego thread, it is not necessary to read from the corresponding global unknowns. We therefore enhance the analysis to also track, in the local state, the set J of thread ids for which join has definitely been called in the join-local part of the local trace, and refine \(\textsf {acc}\) to take J into account:

$$ \begin{array}{lll} \textsf {acc}\,((i,C),(J,L,W,r))\,(i',C')=\textsf {unique}\,i'\wedge (i=i' \vee i'\in J) \end{array} $$

The extended version [49, Appendix F] gives details on this enhancement.
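The refined predicate can be sketched in Python as follows. The encoding of thread ids and the parameter `joined`, standing for J, are assumptions of this illustration; the remaining components L, W, and r of the local state play no role in \(\textsf{acc}\) and are omitted. A non-unique id is never accounted for, even if it occurs in J, since further threads sharing that id may still write.

```python
from typing import FrozenSet, Tuple

Tid = Tuple[tuple, FrozenSet]        # abstract thread id (i, s)


def unique(t: Tid) -> bool:
    return not t[1]


def acc(ego: Tid, joined: FrozenSet[Tid], other: Tid) -> bool:
    """Refined acc: unique i' and (i = i' or i' in J)."""
    return unique(other) and (ego == other or other in joined)


main: Tid = ((), frozenset())
t2: Tid = ((("u2", "t2"),), frozenset())          # hypothetical unique thread, joined by main
loop: Tid = ((), frozenset({("u3", "t3")}))       # hypothetical non-unique thread id
J = frozenset({t2, loop})
```

For J empty, this coincides with definition (7), since \(\textsf{unique}\,i' \wedge i = i'\) is equivalent to \(\textsf{unique}\,i \wedge i = i'\).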

8 Exploiting Clustered Relational Domains

Naïvely, one might assume that tracking relations among a larger set of globals is necessarily more precise than tracking relations among smaller sets. Interestingly, this is no longer true for our analyses, e.g., in the presence of thread ids. A similar effect, where relating more globals can deteriorate precision, has also been observed for an analysis using a data-flow graph to model interferences [19].

Example 10

Consider again Example 1 from the introduction with \(\mathcal {Q}_a = \{\{g,h,i\}\}\). For this program, the constraint system of the analysis has a unique least solution. It verifies that assertion (1) holds. It establishes \(h{=}i\) for \([a,\{g,h,i\},t_1]\). For the main thread, \(L\,(a,\{g,h,i\}) = \{ g{=}h, h{=}i \}\) holds at the program point before each assertion, whereas for \([a,\{g,h,i\},\textsf {main}]\) and \([a,\{g,h,i\},t_2]\) only \(\top \) is recorded, as it is for any relation associated with \(m_g\), \(m_h\), or \(m_i\). Assertion (2), however, cannot be verified: the side-effect from \(t_1\) causes the older values from the first write in the main thread to be propagated to the assertions as well. Hence, while \(h{=}i\) is proven, \(g{=}h\) is not.   \(\square \)

Intuitively, the analysis loses precision because, at an unlock of mutex a, the current relationships for all clusters protected by a are side-effected. As soon as one global is written to, the analysis behaves as if all protected globals had been written. By limiting publishing to those clusters in which at least one global has been written, more precise information may be retained for the others.

In the improved analysis, when unlocking a mutex a, side-effects are only produced to clusters \(Q\in \mathcal {Q}_a\) containing at least one global that was written to since the last \(\textsf {lock}(a)\). Definitions for locking and unlocking are given in Fig. 12.

For locking the mutex a, the abstract value to be incorporated into the local state is assembled from the contributions of different threads to the clusters. For that, the separate constraints for each admitted digest from Section 5 are combined into one for the set \(\textbf{I} = \{ (i',C') \mid (i,C) \in \llbracket u,\textsf {lock}(a)\rrbracket ^\sharp _\mathcal {A}\,(i,C)\,(i',C') \}\) of all admitted digests. This is necessary because side-effects to unaffected clusters at \(\textsf {unlock}(a)\) have been abandoned, and thus taking the meet with the values for the clusters of one thread at a time would be unsound. For each \(Q\), the join-local information \(L\,(a,Q)\) is joined with all contributions to \(Q\) by threads that are not yet accounted for but admitted by the digests. Here, the contributions of threads that do not write to \(Q\) are \(\bot \) and thus do not affect the value for \(Q\). Finally, the resulting value is used to improve the local state by meet. The right-hand side for \(\textsf {lock}(a)\) thus exploits the fine-grained, per-cluster MHP information provided by the digests and the predicate \(\textsf {acc}\). We obtain:

Theorem 4

Given domains \(\mathcal{R}\) and \({\mathcal{V}^\sharp }\) fulfilling the requirements from Fig. 1, any solution of the constraint system is sound w.r.t. the local trace semantics. Maximum precision is obtained with \(\mathcal {Q}_a = 2^{\mathcal {G}[a]}\).    \(\square \)

For Example 1, with \(\mathcal {Q}_a = 2^{\mathcal {G}[a]}\), both assertions are verified. Performing the analysis with all subclusters simultaneously can be expensive when the sets \(\mathcal {G}[a]\) are large, as the number of subclusters grows exponentially with \(|\mathcal {G}[a]|\). The choice of subclusters thus generally involves a trade-off between precision and runtime. This is different for k-decomposable relational domains:

Theorem 5

Provided the relational domain is k-decomposable (Equation (2)), the clustered analysis using only the subclusters of size at most k is as precise as the clustered analysis using all subclusters \(\mathcal {Q}_a = 2^{\mathcal {G}[a]}\) at mutexes a.

Proof

Consider a solution \(\eta \) of the constraint system with \(\mathcal {Q}_a = 2^{\mathcal {G}[a]}\). Then, for unknowns \([a,Q,(i,C)]\) and \([a,Q',(i,C)]\) with \(Q\subseteq Q'\) and \(|Q| \le k\), and values \(r{=}\eta \,[a,Q,(i,C)]\), \(r'{=}\eta \,[a,Q',(i,C)]\), we have \(r \sqsubseteq r'|_Q\), since whenever the smaller cluster receives a side-effect, so does the larger one. Thus, by k-decomposability, the additional larger clusters \(Q'\) do not improve the meet over the clusters of size at most k, neither for individual thread ids nor for the meet of their joins over all thread ids. The same also applies to the clustered information stored in L.    \(\square \)
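For intuition, the toy domain of conjunctions of equalities between globals is 2-decomposable in the sense assumed here, namely that a value equals the meet of its restrictions to all clusters of size at most k (our reading of Equation (2), which is not reproduced in this section). The Python sketch below checks this for one transitively closed value and compares it with the meet over all subclusters; the domain and the function names are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

Eq = FrozenSet[str]
Val = FrozenSet[Eq]                    # conjunction of equalities (assumed closed)


def restrict(x: Val, q: FrozenSet[str]) -> Val:
    return frozenset(e for e in x if e <= q)


def meet_all(values: Iterable[Val]) -> Val:
    result: Val = frozenset()          # top: no equalities
    for v in values:
        # Union suffices here: all inputs are restrictions of one closed value.
        result |= v
    return result


def clusters(variables: List[str], max_size: int) -> List[FrozenSet[str]]:
    return [frozenset(c) for n in range(1, max_size + 1)
            for c in combinations(variables, n)]


gs = ["g", "h", "i", "j"]
r: Val = frozenset(frozenset(p) for p in combinations(["g", "h", "i"], 2))  # g = h = i
small = meet_all(restrict(r, q) for q in clusters(gs, 2))
everything = meet_all(restrict(r, q) for q in clusters(gs, len(gs)))
```

Both meets recover r, so the 5 additional clusters of size 3 and 4 add nothing over the 10 clusters of size at most 2.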

Example 11

Consider again Example 1. If the analysis is performed with clusters \(\mathcal {Q}_a = \{\{h,i\},\{g,h\},\{g,i\},\{g\},\{i\},\{h\}\}\) both assertions can be proven.    \(\square \)
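The precision gain from Examples 10 and 11 can be replayed with a toy Python sketch of the per-cluster combination at \(\textsf{lock}(a)\): for every cluster Q, \(L\,(a,Q)\) is joined with the contributions to Q of admitted threads that are not accounted for, missing contributions count as \(\bot \), and the results are met into the local state. As assumptions of this illustration, relations are transitively closed conjunctions of equalities, main and \(t_2\) are accounted for, and the contributions of \(t_1\) follow the description of Example 10: at its unlock only \(h{=}i\) is guaranteed, and in the clustered setting only clusters containing i are written.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional

Eq = FrozenSet[str]
Val = Optional[FrozenSet[Eq]]          # closed conjunction of equalities; None = bottom
TOP: Val = frozenset()


def eq(x: str, y: str) -> Eq:
    return frozenset({x, y})


def close(eqs: Iterable[Eq]) -> FrozenSet[Eq]:
    """Transitive closure: merge equivalence classes, then list all pairs."""
    classes: List[set] = []
    for e in eqs:
        merged, rest = set(e), []
        for c in classes:
            if c & merged:
                merged |= c
            else:
                rest.append(c)
        classes = rest + [merged]
    return frozenset(frozenset(p) for c in classes for p in combinations(sorted(c), 2))


def join(x: Val, y: Val) -> Val:
    if x is None:
        return y
    if y is None:
        return x
    return x & y                       # intersection of closed sets stays closed


def meet(x: Val, y: Val) -> Val:
    return None if x is None or y is None else close(x | y)


def restrict(x: Val, q: FrozenSet[str]) -> Val:
    return None if x is None else frozenset(e for e in x if e <= q)


def lock_clustered(r: Val, L: Dict[FrozenSet[str], Val],
                   contrib: Dict[FrozenSet[str], List[Val]]) -> Val:
    """Meet r with L(a,Q) joined with all not-accounted contributions to each Q."""
    for q, local in L.items():
        v = local
        for c in contrib.get(q, []):   # threads not writing Q contribute nothing
            v = join(v, c)
        r = meet(r, v)
    return r


main_L = close({eq("g", "h"), eq("h", "i")})     # L(a, .) of main before the assertions
GHI = frozenset({"g", "h", "i"})
GH, HI, GI = frozenset({"g", "h"}), frozenset({"h", "i"}), frozenset({"g", "i"})

# Monolithic cluster (Example 10): t1 publishes only h = i for {g, h, i}.
mono = lock_clustered(TOP, {GHI: main_L}, {GHI: [close({eq("h", "i")})]})
# Clusters of size 2 (Example 11): t1 writes only i, so {g, h} receives nothing.
clus = lock_clustered(TOP, {q: restrict(main_L, q) for q in (GH, HI, GI)},
                      {HI: [close({eq("h", "i")})], GI: [TOP]})
```

With the single large cluster, joining with the contribution of \(t_1\) discards \(g{=}h\); with the smaller clusters, \(\{g,h\}\) receives no contribution from \(t_1\), so \(g{=}h\) survives and, together with \(h{=}i\), closure also yields \(g{=}i\).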

Fig. 12. Right-hand sides for unlocking and locking when limiting side-effecting to potentially written clusters.

The one-element clusters, on the other hand, cannot be abandoned, as indicated by the example from Appendix H in the extended version [49].

9 Experimental Evaluation

We implemented [50] the analyses as an extension of the context-sensitive static analyzer Goblint, which provides the set of protecting mutexes for each global. The implementation tracks information about integral variables using either the Interval or the Octagon domain from Apron [29]. A comparison with other tools is difficult; for details, see [49, Appendix I]:

  • Duet [19] — Its benchmarks are only available as binary goto-programs which neither its current version nor any other tool considered here can consume. Since Duet does not support function calls, it could only be run on some of the benchmarks considered here.

  • AstréeA [39] — A public version is available but not licensed for evaluation.

  • Watts [31] — Since we were unable to run the tool on any program, we compared with the numbers reported by the authors.

  • NR-Goblint [48] — Goblint with the non-relational analyses from [48].

We considered four different configurations, namely, Interval: the analysis from Section 4 with Intervals; Octagon: the same analysis with Octagons; TIDs: the analysis from Section 7 with the enhancement from [49, Appendix F], using Octagons; Clusters: TIDs using only clusters of size at most 2. All benchmarks were run in a virtual machine on an AMD EPYC 7742 64-Core processor running Ubuntu 20.04. The results of our evaluation are summarized in Table 2.

Table 2. Summary of evaluation results, with individual programs grouped together. For each group the number of programs and the total number of assertions are given. ( ) indicates that all (no) assertions are proven, otherwise the number of proven assertions is given. (—) indicates invalid results produced.

Our benchmarks. To capture particular challenges for multi-threaded relational analysis, we collected a set of small benchmarks (including the examples from this paper) and added assertions. On these, we evaluated our analyzer, NR-Goblint, and Duet. Our analysis in the Clusters configuration is capable of verifying all the programs. The other tools could only prove a handful of relational assertions.

Goblint benchmarks [48]. These benchmarks do not contain assertions. To still relate the precision of our analyzer to the non-relational NR-Goblint and to Duet, we used our tool in the Clusters setting to automatically derive invariants at each locking operation. Perhaps surprisingly, NR-Goblint could verify 95% of the invariants despite being non-relational and not using thread ids.

Watts benchmarks [31]. These programs were instrumented with asserts and significantly changed by the authors. Our analyses can verify all but 7 out of over 1000 assertions. Due to necessary fixes to programs and our inability to run their tool, numbers are not directly comparable. Nevertheless, for their scalability tests, reported runtimes for Watts are up to two orders of magnitude worse than ours. See [49, Appendix I] for a more detailed discussion.

Ratcop benchmarks [42]. These are Java programs. After manual translation to C, our analyzer succeeded, using Octagons, in proving all assertions that any configuration of Ratcop could prove, whereas Ratcop required polyhedra in one case.

Fig. 13. Precision and performance evaluation on the Goblint benchmark set.

Internal comparison. We evaluated our analyses in more detail on the Goblint benchmark set [48]. Fig. 13a shows the sizes of the programs (in Logical LoC) and the number of thread ids found by the analysis from Section 6. The high number of threads identified as unique is encouraging. To evaluate precision, we compared the abstract values at each program point (joined over contexts). Fig. 13a shows for what proportion of program points tracking thread ids increases precision. There were no program points where precision decreased or values became incomparable, while for some programs gains of over 50% were observed. Fig. 13b illustrates runtimes. In 9 of 12 cases, performance differences between our relational analyses are negligible. In all cases, using clusters incurs no additional cost. Thus, the more precise analysis with clusters of size \(\le 2\) seems to be the method of choice for thread-modular relational abstract interpretation.

10 Related Work

Since its introduction by Miné [36, 37], the weakly relational numerical domain of Octagons has found widespread application for the analysis and verification of programs [8, 14]. Since tracking relations between all variables may be expensive, pre-analyses have been suggested to identify clusters of numerical variables whose relationships may be of interest [8, 14, 26, 45]. A dynamic approach to decompose relational domains into non-overlapping clusters based on learning is proposed by Singh et al. [55]. While these approaches trade (unnecessary) precision for efficiency, others try to partition the variables into clusters without compromising precision [15, 23, 24, 44, 54, 56]. These types of clustering are orthogonal to our approach and could, perhaps, be combined with it.

The integration of relational domains into thread-modular abstract interpretation was pioneered by Miné [39]. His analysis is based on lock invariants determining for each mutex a relation which holds whenever the mutex is not held. Weak interferences are used to account for asynchronous variable accesses. For practical analyses, a relational abstraction only for lock invariants is proposed, while using a coarse, non-relational abstraction for the weak interferences. This framework closely follows the framework for non-relational analysis [38], while abandoning background locksets. Our relational analysis, on the other hand, maintains at each mutex a only relations between variables write-protected by a. For these relations more precise results can be obtained, since they are incorporated into the local state at locks by meet (while [39] uses join).

Miné [40] present an analysis framework which is orthogonal to our approach. It is tailored to the verification of algorithms that do not rely on explicit synchronization via mutexes such as the Bakery algorithm. Miné [57] extend [40] to handle weak memory effects (PSO, TSO) by incorporating memory buffers into the thread-local semantics. The notion of interferences is also used by Sharma and Sharma [52] for the analysis of programs under the Release/Acquire Memory Model of C11 by additionally tracking abstractions of modification sequences for global variables. They consider fixed finite sets of threads only, and do not deal with thread creation or joining.

Earlier work on thread-modular relational analysis relies on Datalog rules to model interferences in the sense of Miné, combined with abstract interpretation of either the data-flow graph [19] or the control-flow graph [31] (later extended to weak memory [32]). Botbol et al. [10] give a non-thread-modular analysis of multi-threaded programs with message-passing concurrency by encoding the program semantics as a symbolic transducer.

In all these approaches, clusters of variables, if any, are predefined and not treated specially by the analysis. This is different in the thread-modular analysis proposed by Mukherjee et al. [42]. It propagates information from unlocks to locks. It is relational for the locals of each thread and within disjoint subsets of globals, called regions. These regions must be determined beforehand and must satisfy region-race freedom. In contrast, the only extra a priori information required by our analysis is the set of (write-)protecting mutexes of each global, which can be computed during the analysis itself. The closest concept within our approach to a region is the set of globals jointly protected by mutexes. These sets may overlap, which the analysis explicitly exploits. Like ours, their proof of correctness refers to a thread-local semantics. Unlike ours, it is based on interleavings and is thus overly detailed. The concrete semantics on which our analyses are based is a collecting local trace semantics that extends the semantics of Schwarz et al. [48] by additionally taking thread termination and joins into account. The analyses in [48], however, are non-relational, and no refinement via further finite abstractions of local traces, such as thread ids, is provided.

Perhaps the thread id analysis most closely related to ours is that of Feret [20], who computes ids for agents in the \(\pi \)-calculus as abstractions of sequences of encountered create edges. Another line of analysis of concurrent programs deals with determining which critical events may happen in parallel (MHP) [1,2,3,4, 7, 17, 43, 59], in order to detect programming errors such as data races or to identify opportunities for optimization. Most MHP analyses are obtained as abstractions of a global trace semantics [18]. We apply related techniques to improve thread-modular analyses, but based on a local trace semantics. Like MHP analyses, we take thread creation and joining histories as well as sets of held mutexes into account. Additionally, we consider crucial aspects of the modification history of globals and provide a general framework for further refinements.

In a sequential setting, splitting control locations according to some abstraction of reaching traces is a common technique for improving the precision of dataflow analyses [9, 27] or abstract interpretation [25, 34, 41, 47]. Control point splitting can be understood as an instance of the reduced cardinal power domain [12, 13, 22]. For the analysis of multi-threaded programs, Miné [39] applies the techniques of Mauborgne and Rival [34] to single threads, i.e., independently of the actions of all other threads. Our approach, on the other hand, can take arbitrary properties of local traces into account and is thus more general.
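Control point splitting can be pictured as keeping, at each program point, a map from a finite abstraction of the reaching traces to an abstract value, and joining only values that share the same key. The following Python sketch assumes a single global g tracked by intervals and two hypothetical tokens recording whether some thread t2 has been joined; all names are illustrative. It shows the general idea of a reduced cardinal power over a finite token set, not the construction used by any of the cited works or by our analysis.

```python
from functools import reduce
from typing import Dict, Tuple

Interval = Tuple[int, int]
# Split abstract state: hypothetical trace-abstraction token -> interval for g.
Split = Dict[str, Interval]


def hull(a: Interval, b: Interval) -> Interval:
    """Least interval containing both arguments (interval join)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def join_split(s: Split, t: Split) -> Split:
    """Join two split states token by token; values under different tokens never mix."""
    out = dict(s)
    for token, iv in t.items():
        out[token] = hull(out[token], iv) if token in out else iv
    return out


def collapse(s: Split) -> Interval:
    """Forget the split by joining over all tokens (s must be non-empty)."""
    return reduce(hull, s.values())


# Two control-flow paths meet: on one, thread t2 has been joined and g = 0;
# on the other, t2 may still be running and g lies in [1, 5].
path_a = {"t2_joined": (0, 0)}
path_b = {"t2_running": (1, 5)}
merged = join_split(path_a, path_b)
print(merged)               # {'t2_joined': (0, 0), 't2_running': (1, 5)}
print(merged["t2_joined"])  # (0, 0): g = 0 whenever t2 has been joined
print(collapse(merged))     # (0, 5): the unsplit result loses this fact
```

Keeping the token set finite is what keeps the split domain tractable: the number of abstract values per program point is bounded by the number of tokens.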

11 Conclusion and Future Work

We have presented thread-modular relational analyses of global variables tailored to decomposable domains. In some cases, more precise results can be obtained by considering smaller clusters. For k-decomposable domains, however, we proved that the optimal result can already be obtained by considering clusters of size at most k. We have provided a framework to incorporate finite abstractions of local traces into the analysis. Here, we have applied this framework to take thread creation and joining into account; we believe that it paves the way for seamlessly enhancing the precision of thread-modular abstract interpretation. The evaluation of our analyses on benchmarks proposed in the literature indicates that our implementation is competitive with respect to both precision and efficiency. In future work, we would like to experiment with further abstractions of local traces, perhaps tailored to particular programming idioms, and also explore the potential of non-numerical 2-decomposable domains.