1 Introduction

Dynamic analyses have emerged as a popular class of techniques for ensuring the reliability of large-scale software, owing to their scalability and soundness (no false positives). At a high level, such techniques solve the membership problem — given an execution \(\sigma \), typically modelled as a sequence of events, does \(\sigma \) belong to \(L_{\text {bug}}\), a chosen set of executions that exhibit some undesired behaviour? In the context of concurrent software, however, such a naive testing paradigm suffers from poor coverage: even under the ideal input, the execution \(\sigma \) observed at the time of testing may not reveal the presence of a bug (membership in \(L_{\text {bug}}\)), because of the non-determinism due to thread scheduling. Runtime prediction, the subject of this work, has emerged as a systematic approach to enhance vanilla dynamic analyses [15, 21, 40]. Instead of solving the vanilla membership problem (\(\sigma \in L_{\text {bug}}\)), runtime predictive techniques solve the predictive membership or predictive monitoring problem — they generalize the observed execution \(\sigma \) to a larger set of executions \(S_\sigma \) and check if some execution in \(S_\sigma \) belongs to \(L_{\text {bug}}\).

The predictive power (how often real bugs are identified) and the speed of a runtime predictive analysis (or predictive monitoring), often conflicting goals, crucially depend upon the space \(S_\sigma \) that the analysis reasons about. In the most general case, \(S_\sigma \) can be the set of all executions that preserve the control and data flow of \(\sigma \), namely the correct reorderings [38] of \(\sigma \). Analyses that exhaustively reason about the entire space of correct reorderings have the highest predictive power in theory [35], but quickly become intractable even for very simple classes of bugs [21, 25]. On the other extreme is the class of trivial analyses which consider \(S_\sigma = \{\sigma \}\) but offer no predictive power. Analyses based on Mazurkiewicz’s trace equivalence theory [28] opt for a middle ground, balancing predictive power with the moderate computational complexity of the predictive monitoring question.

In the framework of trace theory, one fixes a concurrent alphabet \((\Sigma , \mathbb {D})\) consisting of a finite set of labels \(\Sigma \) and a symmetric, reflexive dependence relation \(\mathbb {D}\subseteq \Sigma \times \Sigma \). Now, each string \(w \in \Sigma ^*\) can be generalized to its equivalence class \([\![w]\!]_{\mathbb {D}}\), comprising all those strings \(w'\) which can be obtained from w by repeatedly swapping neighbouring events whose labels are not dependent. The corresponding predictive monitoring question under trace equivalence then translates to the disjointness check \([\![\sigma ]\!]_{\mathbb {D}} \cap L_{\text {bug}} \ne \varnothing \). Consider the sound dependence relation \(\mathbb {D}_\textsf{RWL}\) that marks as dependent pairs of events of the same thread, pairs of conflicting accesses (at least one of which is a write) to the same memory location, and pairs of operations on the same lock. Here, we say \(\mathbb {D}\) is sound if one can only infer correct reorderings from \(\mathbb {D}\), i.e., for every well-formed execution \(\sigma \), \([\![\sigma ]\!]_{\mathbb {D}} \subseteq \textsf {CReorderings}(\sigma )\). Then consider, for example, the execution \(\sigma _1\) in Fig. 1a consisting of 6 events \(\{e_i\}_{i \le 6}\) performed by threads \(t_1\) and \(t_2\). It is easy to conclude that \(\sigma _1\) is equivalent to the reordering \(\rho _1 = e_1e_4e_2e_3e_5e_6\), i.e., \(\rho _1 \in [\![\sigma _1]\!]_{\mathbb {D}_\textsf{RWL}}\), and thus \([\![\sigma _1]\!]_{\mathbb {D}_\textsf{RWL}}\) is not disjoint from the set of executions where two \(\texttt{w}(x)\) events are consecutive. For a large class of languages \(L_{\text {bug}}\) [30], this question can, in fact, be answered by a one-pass streaming constant-space algorithm, the holy grail of runtime monitoring, and such reasoning has been instrumental in the success of industrial-strength concurrency bug detectors [17, 29, 36].

Despite the simplicity and algorithmic efficiency of reasoning with commutativity of individual events, trace theory falls short in accurately reasoning about commutativity of atomic blocks of events in executions of concurrent programs. Consider, for example, the execution \(\sigma _2\) from Fig. 1b. Here, under the dependence \(\mathbb {D}_\textsf{RWL}\) described above, the events \(e_1 = {\texttt{w}(x)}^{t_1}\) and \(e_6 = {\texttt{w}(x)}^{t_2}\) are ordered through the chain of dependencies \((e_1, e_3), (e_3, e_4), (e_4, e_6)\). However, the reordering \(\rho '_2 = e_4e_5e_1e_6e_2e_3\) is a correct reordering of \(\sigma _2\) and also witnesses the two write events consecutively. In other words, the equivalence induced by a dependence relation can be conservative, since commutativity of individual events may be insufficient to determine when two blocks commute. Observe that simply relaxing the dependence relation \(\mathbb {D}_\textsf{RWL}\) to a smaller set (say, by removing dependence on locks) may be detrimental to soundness, as one may infer that the ill-formed execution \(\rho ''_2 = e_1e_2e_4e_3e_5e_6\) is equivalent to \(\sigma _2\). Indeed, one can show that \(\mathbb {D}_\textsf{RWL}\) is the most relaxed sound dependence relation. At the same time, we remark that the efficiency of the algorithms based on trace equivalence [9, 11, 13, 16, 27] crucially stems from reasoning about commutativity of individual events (instead of blocks of events).

Fig. 1. Execution \(\sigma _1\) has a predictable data race, and can be exposed with trace equivalence. Execution \(\sigma _2\) has a predictable data race (witnessed by \(\rho _2\)) which cannot be exposed under trace equivalence, but can be exposed by strong trace prefixes.

In this work, we propose taking a different route for enhancing the predictive power of trace-based reasoning. Instead of allowing flexibility for commuting individual blocks of events, we observe that we can nevertheless enhance predictive power by sticking to commutativity of events but allowing for greater flexibility in selecting events that participate in these reorderings. Consider, for example, the execution \(\rho _2\) in Fig. 1c, which is a correct reordering of \(\sigma _2\) and also witnesses that the two write events are consecutive. Intuitively, one can obtain \(\rho _2\) from \(\sigma _2\) by first dropping the earlier critical section in \(t_1\), thereby unblocking the critical section in \(t_2\) so that it can commute to the beginning of the execution using event based commutativity.

In this work, we argue that the above style of reasoning can be formalized as a simple extension of classic trace theory, and that no sophisticated algebraic formulation is required. The dependence relation \(\mathbb {D}\) plays a dual role — (downward-closure) for each event e in some reordering, all earlier events dependent with e must be present in the reordering, and (order-preservation) amongst the set of events present in the reordering, the relative order of all dependent events must be preserved. Towards this, we propose to make this distinction explicit. Reflecting on the example above, a key tool we employ here is to stratify the dependencies in \(\mathbb {D}_\textsf{RWL}\) based on their strength. On one hand, we have strong dependencies, such as program order, for which both roles, (downward-closure) and (order-preservation), must be respected and cannot be relaxed. On the other hand, we have dependencies between lock events, for which (order-preservation) must be kept intact, but the first role, (downward-closure), can be relaxed. We formalize this notion, in Sect. 3, using two dependence relations, a strong dependence \(\mathbb {S}\) and a weak one \(\mathbb {W}\), and the resulting notion of a strong trace prefix of an execution, whose set of events is downward closed with respect to \(\mathbb {S}\) and, further, whose relative order on the residual events respects the order induced by \(\mathbb {S}\cup \mathbb {W}\).

Our generalization of traces to strong trace prefixes has important advantages. First, and most obvious, is the enhanced predictive power when monitoring against a language \(L_{\text {bug}}\), as illustrated above. Second, thanks to the explicit stratification of the dependence relation, we can predict against new, previously out-of-reach languages, such as those for deadlock prediction [40]. Third, the simplicity of our strong trace prefixes framework and its proximity to the original trace-theoretic framework imply that the predictive monitoring question in this new setting is solvable in essentially the same time and space complexity as in the trace-theory setting, despite the enhanced predictive power it unveils. We present a unified scheme, in Sect. 4, to translate any predictive algorithm that works under trace equivalence against some language \(L_{\text {bug}}\) to one that works under strong trace prefixes (for the same language \(L_{\text {bug}}\)) with additional non-determinism (but similar time and space usage), or alternatively with a polynomial multiplicative blowup in time. Thus, when the predictive question can be answered in constant space for Mazurkiewicz traces (as with data races [1]), it continues to be solvable in constant space for strong trace prefixes.

In Sect. 5 we further narrow the gap between commutativity-style reasoning (aka strong trace prefixes) and the full semantic space (aka correct reorderings). In particular, we show that we can further relax the dependence on conflicting memory accesses (\((\texttt{r}, \texttt{w})\), \((\texttt{w}, \texttt{r})\)) that otherwise ensures soundness, and regain soundness by baking extra reads-from constraints into the prefixes. We define strong reads-from prefixes to formalize the resulting space of reorderings, and show that predictive monitoring under them can also be done with the same time and space complexity as with strong trace prefixes. Next, in Sect. 6, we draw an interesting connection between strong reads-from prefixes and the classes of synchronization-preserving data races [26] and deadlocks [40], which underlie the fastest and most predictive practical algorithms for detecting these concurrency bugs. We show that while synchronization-preserving reorderings are a larger class of reorderings, strong reads-from prefixes are nevertheless sufficient to capture the corresponding classes of data races and deadlocks. As a consequence, we obtain constant-space algorithms for predicting these classes of bugs, improving upon the previously known linear-space algorithms.

We put the new formalism to the test by implementing the algorithms that follow from our results, for various specifications such as data races, deadlocks, and pattern languages [1]. We evaluate them on benchmark program traces derived from real-world software applications and demonstrate the effectiveness of our formalism through its enhanced predictive power.

2 Predictive Monitoring and Trace Theory

Here we discuss some preliminary background on the predictive monitoring problem and trace theory, and some limitations of applying the latter in the context of the former.

Events, Executions and Monitoring. We model an execution as a finite sequence \(\sigma = e_1, e_2, \ldots , e_k\) of events, where each event \(e_i\) is labelled with a letter \(a_i = \textsf{lab}(e_i) \in \Sigma \) from a fixed alphabet \(\Sigma \). We will use \(\textsf {Events}_{\sigma }\) to denote the set of events of \(\sigma \) and the notation \(e_1 <_{\sigma } e_2\) to denote that the event \(e_1\) appears before \(e_2\) in the sequence \(\sigma \). We will often use the custom alphabet \(\Sigma _\textsf{RWL}\) to label events of shared-memory multithreaded programs. For this, we fix sets \(\mathcal {T}\), \(\mathcal {L}\) and \(\mathcal {X}\) of thread, lock, and memory location identifiers. Then, \(\Sigma _\textsf{RWL}= \{{\textsf{op}(d)}^{t} \,|\, t \in \mathcal {T}, \textsf{op}(d) \in \{\texttt{r}(x), \texttt{w}(x), \texttt{acq}(\ell ), \texttt{rel}(\ell )\}_{x \in \mathcal {X}, \ell \in \mathcal {L}}\}\) consists of labels denoting reads/writes of memory locations \(\mathcal {X}\) or acquires/releases of locks \(\mathcal {L}\), each performed by some thread \(t \in \mathcal {T}\). Executions of multithreaded programs are assumed to be well-formed, i.e., to belong to the regular language \(L_\textsf{WF}\subseteq \Sigma _\textsf{RWL}^*\) that contains all strings where each release event e has a unique matching acquire event \(e'\) on the same lock and the same thread, and no two critical sections on the same lock overlap. In addition, we only consider the sequentially consistent memory model in this paper. Our focus here is the runtime monitoring problem against a property \(L \subseteq \Sigma ^*\) — ‘given an execution \(\sigma \), does \(\sigma \in L\)?’
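As a concrete illustration, membership in \(L_\textsf{WF}\) can be decided in a single pass over the execution. The sketch below is ours, not from the paper: it assumes events are encoded as (thread, operation, operand) triples, and the function name is illustrative.

```python
def well_formed(events):
    """Single-pass check for membership in L_WF: every rel(l) must match an
    acquire of l held by the same thread, and no two critical sections on the
    same lock may overlap (at most one thread holds each lock at a time)."""
    holder = {}  # lock -> thread currently inside a critical section on it
    for (t, op, d) in events:
        if op == "acq":
            if d in holder:          # lock already held: overlapping sections
                return False
            holder[d] = t
        elif op == "rel":
            if holder.get(d) != t:   # release without a matching acquire
                return False
            del holder[d]
    return True                      # incomplete critical sections are fine
```

Note that an execution ending with an unreleased lock is still well-formed; only releases are required to have matching acquires.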

Predictive Monitoring and Correct Reorderings. Vanilla dynamic analyses that answer the membership question ‘\(\sigma \in L\)?’ often miss bugs owing to non-deterministic thread interleaving. Nevertheless, even when an execution \(\sigma \) does not belong to the target language L, it may still be possible to predict bugs in alternate executions that can be inferred from \(\sigma \). Here, one first defines the set \(\textsf {CReorderings}(\sigma )\) of correct reorderings [35, 38] of \(\sigma \), comprising executions similar to \(\sigma \) in the following precise sense — every program P that can generate \(\sigma \) will also generate all executions in \(\textsf {CReorderings}(\sigma )\). For an execution \(\sigma \in L_\textsf{WF}\subseteq \Sigma ^*_\textsf{RWL}\) of a multithreaded program, \(\textsf {CReorderings}(\sigma )\) can be defined to be the set of all executions \(\rho \) such that (1) \(\textsf {Events}_{\rho } \subseteq \textsf {Events}_{\sigma }\), (2) \(\rho \) is well-formed, i.e., \(\rho \in L_\textsf{WF}\), (3) \(\rho \) is downward-closed with respect to the program order of \(\sigma \), i.e., for any two events \(e_1, e_2\) performed by the same thread with \(e_1 <_{\sigma } e_2\), if \(e_2 \in \textsf {Events}_{\rho }\), then \(e_1 \in \textsf {Events}_{\rho }\) and \(e_1 <_{\rho } e_2\), and (4) for any read event \(e_{\texttt{r}}\) labelled \({\texttt{r}(x)}^{t}\), the write event \(e_{\texttt{w}}\) that \(e_{\texttt{r}}\) reads from in \(\sigma \) (\(e_{\texttt{r}} \in \textsf {rf}_{\sigma }(e_{\texttt{w}})\)) must also be in \(\rho \), and must be the write that \(e_{\texttt{r}}\) reads from in \(\rho \). Here, we say that \(e_{\texttt{r}} \in \textsf {rf}_{\sigma }(e_{\texttt{w}})\) if \(e_{\texttt{w}}\) is a write event on the same memory location x as \(e_{\texttt{r}}\) with \(e_{\texttt{w}} <_{\sigma } e_{\texttt{r}}\), and there is no other write event \(e'_{\texttt{w}}\) on x such that \(e_{\texttt{w}} <_{\sigma } e'_{\texttt{w}} <_{\sigma } e_\texttt{r}\).
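Conditions (1)–(4) are directly mechanizable. The sketch below is our illustration under an assumed (id, thread, op, operand) event encoding: condition (3) is checked as a per-thread prefix test (which also captures preservation of program order), condition (4) as full preservation of the reads-from map, and well-formedness (2) is assumed to be checked separately.

```python
def reads_from(ex):
    """Map each read event's id to the id of the write it reads from."""
    last_write, rf = {}, {}
    for (eid, t, op, d) in ex:
        if op == "w":
            last_write[d] = eid
        elif op == "r" and d in last_write:
            rf[eid] = last_write[d]
    return rf

def is_correct_reordering(rho, sigma):
    """Check conditions (1), (3), (4) of CReorderings for candidate rho."""
    # (1) every event of rho comes from sigma
    if not {e[0] for e in rho} <= {e[0] for e in sigma}:
        return False
    # (3) per thread, rho's events form a prefix of sigma's thread events
    for t in {e[1] for e in sigma}:
        s_t = [e[0] for e in sigma if e[1] == t]
        r_t = [e[0] for e in rho if e[1] == t]
        if r_t != s_t[:len(r_t)]:
            return False
    # (4) every read in rho observes the same write as in sigma
    rf_s, rf_r = reads_from(sigma), reads_from(rho)
    return all(rf_r.get(e[0]) == rf_s.get(e[0]) for e in rho if e[2] == "r")
```

For example, moving an independent thread's write before another thread's events passes the check, while keeping a read without the write it reads from fails it.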
The predictive monitoring question against a language L can now be formalized as ‘given an execution \(\sigma \), is \(\textsf {CReorderings}(\sigma ) \cap L \ne \varnothing \)?’. Observe that any witness \(\rho \in \textsf {CReorderings}(\sigma ) \cap L\) is a true positive, since every execution in \(\textsf {CReorderings}(\sigma )\) follows the same control flow as \(\sigma \) and thus can be generated by any program that generates \(\sigma \). In general, this predictive monitoring question does not admit a tractable solution, even for the simplest classes of (regular) languages, such as the class of executions that contain a data race [25], and has been shown to admit super-linear-space hardness even for 2 threads [12]. Practical and sound algorithms for solving the predictive monitoring problem [21, 24, 26, 31, 38] often trade predictive power for tractability, while retaining soundness, by considering a smaller space \(S_\sigma \) of reorderings. A set \(S_\sigma \subseteq \Sigma ^*_\textsf{RWL}\) is said to be sound for a given execution \(\sigma \in L_\textsf{WF}\) if \(S_\sigma \subseteq \textsf {CReorderings}(\sigma )\); an algorithm that restricts its search for reorderings to a sound \(S_\sigma \) will not report false positives.

Mazurkiewicz Traces. Trace theory, proposed by Antoni Mazurkiewicz [28], offers a tractable approach to the otherwise intractable predictive monitoring problem, by characterizing a simpler subclass of reorderings. Here, one identifies a reflexive and symmetric dependence relation \(\mathbb {D}\subseteq \Sigma \times \Sigma \), and deems an execution \(\rho \) equivalent to \(\sigma \) if one can obtain \(\rho \) from \(\sigma \) by repeatedly swapping neighbouring events that are not dependent. Together, \((\Sigma , \mathbb {D})\) constitutes a concurrent alphabet. Formally, the trace equivalence \(\sim _{\mathbb {D}}\) of the concurrent alphabet \((\Sigma , \mathbb {D})\) is the smallest equivalence on \(\Sigma ^*\) such that for every \(w_1, w_2 \in \Sigma ^*\) and for every \((a, b) \in \Sigma \times \Sigma \setminus \mathbb {D}\), we have \(w_1 \cdot a \cdot b \cdot w_2 \sim _{\mathbb {D}} w_1 \cdot b \cdot a \cdot w_2.\) We use \([\![w]\!]_{\mathbb {D}} = \{w' \,|\, w \sim _{\mathbb {D}} w'\}\) to denote the equivalence class of \(w \in \Sigma ^*\).
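For small strings, the equivalence class \([\![w]\!]_{\mathbb {D}}\) can be enumerated directly from this definition, by closing a worklist under adjacent swaps of independent letters. This brute-force sketch (ours, for illustration only; practical monitors never materialize the class) represents strings over single-character letters and the dependence as a set of pairs:

```python
def equiv_class(w, dep):
    """Enumerate [[w]]_D: close {w} under swapping adjacent letters (a, b)
    whenever neither (a, b) nor (b, a) belongs to the dependence dep."""
    seen, stack = {w}, [w]
    while stack:
        u = stack.pop()
        for i in range(len(u) - 1):
            a, b = u[i], u[i + 1]
            if (a, b) not in dep and (b, a) not in dep:
                v = u[:i] + b + a + u[i + 2:]  # swap the independent pair
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen
```

With a dependence that leaves only the pair (a, c) independent, the string abacba has exactly one non-trivial equivalent string, abcaba, obtained by a single swap.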

Modelling Shared-Memory Concurrency Using Traces. Let us see how traces can (conservatively) model a class of correct reorderings, with an appropriate choice of dependence over \(\Sigma _\textsf{RWL}\). The dependence \(\mathbb {D}_{\mathcal {L}} = \{({\textsf{op}_1(\ell )}^{t_1}, {\textsf{op}_2(\ell )}^{t_2}) \,|\, \ell \in \mathcal {L}, \textsf{op}_1, \textsf{op}_2 \in \{\texttt{acq}, \texttt{rel}\}, t_1, t_2 \in \mathcal {T}\}\) can be used to enforce mutual exclusion of critical sections — for every \(\rho \in [\![\sigma ]\!]_{\mathbb {D}_{\mathcal {L}}}\), the order of locking events is the same as in \(\sigma \), and thus if \(\sigma \) is well-formed, then so is \(\rho \). Likewise, the dependence \(\mathbb {D}_{\mathcal {T}} = \{({\textsf{op}_1(d_1)}^{t}, {\textsf{op}_2(d_2)}^{t}) \,|\, t \in \mathcal {T}\}\) is such that every \(\rho \in [\![\sigma ]\!]_{\mathbb {D}_{\mathcal {T}}}\) preserves the program order of \(\sigma \). Indeed, the dependence \(\mathbb {D}_{\textsf{HB}} = \mathbb {D}_{\mathcal {T}} \cup \mathbb {D}_{\mathcal {L}}\) is the classic happens-before dependence employed in modern data race detectors such as ThreadSanitizer [36]. Finally, the dependence \(\mathbb {D}_{\textsf{RWL}} = \mathbb {D}_{\mathcal {T}} \cup \mathbb {D}_{\mathcal {L}} \cup \mathbb {D}_{\textsf{conf}}\), where \(\mathbb {D}_{\textsf{conf}} = \{({\textsf{op}_1(x)}^{t_1}, {\textsf{op}_2(x)}^{t_2}) \,|\, x \in \mathcal {X}, (\textsf{op}_1, \textsf{op}_2) \in \{(\texttt{w}, \texttt{r}), (\texttt{r}, \texttt{w}), (\texttt{w}, \texttt{w})\} \}\), ordering all conflicting memory accesses, ensures that for a well-formed execution \(\sigma \), we have \([\![\sigma ]\!]_{\mathbb {D}_{\textsf{RWL}}} \subseteq \textsf {CReorderings}(\sigma )\). The inclusion of \(\mathbb {D}_{\mathcal {T}}\) ensures that program order is preserved, \(\mathbb {D}_{\mathcal {L}}\) ensures well-formedness, while the remaining dependencies preserve the order of all conflicting pairs of events, and thus the reads-from relation.
Indeed, \(\mathbb {D}_\textsf{RWL}\) is the smallest dependence that ensures soundness. Here, we say that \(\mathbb {D}\subseteq \Sigma _\textsf{RWL}\times \Sigma _\textsf{RWL}\) is sound if for every well-formed execution \(\sigma \in L_\textsf{WF}\), \([\![\sigma ]\!]_{\mathbb {D}} \subseteq \textsf {CReorderings}(\sigma )\).
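Membership in \(\mathbb {D}_\textsf{RWL}\) for a pair of labels decomposes into the three cases above. The sketch below is our illustration under the same assumed (thread, op, operand) label encoding; the function name is ours.

```python
LOCK_OPS = ("acq", "rel")

def dependent_rwl(a, b):
    """D_RWL membership for labels a, b: D_T (same thread), D_L (same lock),
    or D_conf (same memory location with at least one write)."""
    (t1, op1, d1), (t2, op2, d2) = a, b
    if t1 == t2:
        return True                        # D_T: program order
    if op1 in LOCK_OPS and op2 in LOCK_OPS and d1 == d2:
        return True                        # D_L: operations on the same lock
    if op1 in ("r", "w") and op2 in ("r", "w") and d1 == d2:
        return (op1, op2) != ("r", "r")    # D_conf: conflicting accesses
    return False
```

Note that two reads of the same location are the only same-location access pair left independent, exactly as \(\mathbb {D}_{\textsf{conf}}\) prescribes.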

Predictive Monitoring with Traces. The predictive monitoring question under the trace equivalence induced by a generic concurrent alphabet \((\Sigma , \mathbb {D})\) becomes — ‘given an execution \(\sigma \), is \([\![\sigma ]\!]_{\mathbb {D}} \cap L \ne \varnothing \)?’. In general, even when L is regular, this problem cannot be solved faster than \(O(|\sigma |^\alpha )\), where \(\alpha \) is the degree of concurrency of \(\mathbb {D}\), i.e., the size of the largest subset of \(\Sigma \) containing no pair of dependent letters [1]. For the subclass of star-connected regular languages [30], this problem can be solved using a constant-space linear-time algorithm. Star-connected languages include the class of languages that can encode data races [9, 16], and the class of pattern languages [1] that capture other temporal bugs.

Example 1

Let \(L_{\textsf {race}} = \Sigma _{\textsf{RWL}}^* {\texttt{w}(x)}^{t_1} {\texttt{w}(x)}^{t_2} \Sigma _{\textsf{RWL}}^*\) be the set of executions that witness a race between two write accesses on memory location x between threads \(t_1\) and \(t_2\). Consider the execution \(\sigma _1\) illustrated in Fig. 1a and recall from Sect. 1 that \([\![\sigma _1]\!]_{\mathbb {D}_\textsf{RWL}} \cap L_{\textsf {race}} \ne \varnothing \). Further, recall that for the trace \(\sigma _2\) from Fig. 1b, we have \([\![\sigma _2]\!]_{\mathbb {D}_\textsf{RWL}} \cap L_{\textsf {race}} = \varnothing \), even though \(\textsf {CReorderings}(\sigma _2) \cap L_{\textsf {race}} \ne \varnothing \). In other words, data race prediction based on trace equivalence may have strictly less predictive power than prediction based on correct reorderings.

Fig. 2. Execution \(\sigma _3\) has a predictable deadlock, as witnessed by the correct reordering \(\rho _3\), which cannot be exposed without violating \(\mathbb {D}_\mathcal {L}\).

Example 2

While trace equivalence can expose some data races (as with \(\sigma _1\) from Fig. 1a), it fundamentally cannot model deadlock prediction. Consider the execution \(\sigma _3\) in Fig. 2a. It consists of two nested critical sections acquired in inverted order. Any program that generates \(\sigma _3\) is prone to a deadlock, as witnessed by the correct reordering \(\rho _3\) in Fig. 2b, which acquires \(\ell _1\) in \(t_1\) and then immediately switches context to \(t_2\), in which lock \(\ell _2\) is acquired. Clearly, the underlying program is deadlocked at this point. Since \((\textsf{lab}(e_3), \textsf{lab}(e_5)) \in \mathbb {D}_\mathcal {L}\) and \((\textsf{lab}(e_4), \textsf{lab}(e_6)) \in \mathbb {D}_\mathcal {L}\), trace equivalence cannot predict this deadlock. Indeed, nested critical sections, acquired in a cyclic order, can never be reordered to actually expose the deadlock without violating the dependence between earlier release events and later acquire events induced by \(\mathbb {D}_\mathcal {L}\).

3 Strong Trace Prefixes

Observe that for both the executions \(\sigma _2\) (Example 1) and \(\sigma _3\) (Example 2), the correct reordering that exposes the bug in question can be obtained by relaxing the order of two events that were otherwise ordered by the dependence relation, in particular \(\mathbb {D}_\mathcal {L}\). Since the dependence \(\mathbb {D}_\mathcal {L}\) enforces mutual exclusion, it cannot be ignored altogether without compromising soundness. For example, setting \(\mathbb {D}_\mathcal {L}= \varnothing \) would deem \(\rho ''_2 = {\texttt{w}(x)}^{t_1} {\texttt{acq}(\ell )}^{t_1} {\texttt{acq}(\ell )}^{t_2} {\texttt{rel}(\ell )}^{t_1} {\texttt{rel}(\ell )}^{t_2} {\texttt{w}(x)}^{t_2}\) to be equivalent to \(\sigma _2\), even though \(\rho ''_2 \not \in \textsf {CReorderings}(\sigma _2)\). Nevertheless, both these examples illustrate a key insight behind how we generalize the trace-theoretic framework — the dependence due to locks is weak. That is, let \(e_1 = {\texttt{rel}(\ell )}^{t_1} <_{\sigma } e_2 = {\texttt{acq}(\ell )}^{t_2}\) be events of an execution \(\sigma \). If they both appear in a reordering \(\rho \) of \(\sigma \), then, under commutativity-style reasoning, we demand that their relative order be \(e_1 <_{\rho } e_2\). However, reorderings that drop the entire critical section of \(e_1\) may nevertheless be allowed without compromising well-formedness. This is in contrast with strong dependencies, such as those induced by \(\mathbb {D}_\mathcal {T}\) or reads-from — any reordering must be downward closed with respect to them.

Building on these insights, we formalize strong trace prefixes by distinguishing dependencies that are absolutely necessary, i.e., strong dependencies, from weak dependencies, which do not affect causality but only offer convenience for modelling constructs like mutual exclusion in a swap-based equivalence like trace equivalence. We present the formal definition of strong trace prefixes next.

Definition 1

(Dual Concurrent Alphabet). A dual concurrent alphabet is a tuple \((\Sigma , \mathbb {S}, \mathbb {W})\), where \(\Sigma \) is a finite alphabet, \(\mathbb {S}\subseteq \Sigma \times \Sigma \) is a reflexive and symmetric strong dependence relation, and \(\mathbb {W}\subseteq \Sigma \times \Sigma \) is an irreflexive symmetric weak dependence relation.

Definition 2

(Strong Trace Prefix). The strong trace prefix order induced by the dual alphabet \((\Sigma , \mathbb {S}, \mathbb {W})\), denoted \(\preccurlyeq ^{\mathbb {W}}_{\mathbb {S}}\), is the smallest reflexive and transitive binary relation on \(\Sigma ^*\) that satisfies:

  1. \(\sim _{\mathbb {S}\cup \mathbb {W}} \; \subseteq \; \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}}\), and

  2. for every \(u, v \in \Sigma ^*\) and every \(a \in \Sigma \), if \((a, b) \not \in \mathbb {S}\) for every letter \(b\) occurring in \(v\), then \(u \cdot v \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} u \cdot a \cdot v\).

We say that \(w' \in \Sigma ^*\) is a strong trace prefix of w if \(w' \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} w\). We use \(\langle \langle {w}||^{\mathbb {W}}_{\mathbb {S}} = \{w'\in \Sigma ^* \,|\, w' \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} w\}\) to denote the strong downward closure of w.

Let us also recall the classical notion of ideal-based prefixes using the above. For a reflexive and symmetric dependence relation \(\mathbb {D}\subseteq \Sigma \times \Sigma \), we use the notation \(\sqsubseteq _{\mathbb {D}}\) to denote the ideal prefix relation \(\preccurlyeq ^{\varnothing }_{\mathbb {D}}\), and call \(w_1\) an ideal prefix of \(w_2\) if \(w_1 \sqsubseteq _{\mathbb {D}} w_2\). We use \([\![{w}||_{\mathbb {D}} = \{w' \in \Sigma ^* \,|\, w' \sqsubseteq _{\mathbb {D}} w\}\) to denote the ideal downward closure of w.

A few observations about Definition 2 are in order. First, the relations \(\preccurlyeq ^{\mathbb {W}}_{\mathbb {S}}\) and \(\sqsubseteq _{\mathbb {D}}\) defined here are not equivalence relations (unlike \(\sim _{\mathbb {D}}\)) but only quasi-orders, and relate executions of different lengths (namely strong (or ideal) prefixes). Second, in the case \(\mathbb {W}\subseteq \mathbb {S}\), the strong trace prefix order coincides with the ideal prefix order \(\sqsubseteq _{\mathbb {S}\cup \mathbb {W}}\). Third, in general, strong prefixes are more permissive than ideal prefixes, i.e., \(\preccurlyeq ^{\varnothing }_{\mathbb {S}\cup \mathbb {W}} \subseteq \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}}\); this permissiveness is key to enhancing the predictive power of commutativity-style reasoning.

Example 3

Consider the alphabet \(\Sigma = \{a, b, c\}\). Fix the strong dependence relation \(\mathbb {S}= \{(a, a), (b, b), (c, c), (b, c), (c, b)\}\) and the weak dependence relation \(\mathbb {W}= \{(a, b), (b, a)\}\). Let \(\mathbb {D}= \mathbb {S}\cup \mathbb {W}\) be a traditional Mazurkiewicz-style dependence. Now, consider the string \(w = abacba\). First, observe the simple equivalence \(w \sim _{\mathbb {D}} w' = abcaba\). Indeed, no other strings in \(\Sigma ^*\) are \(\sim _{\mathbb {D}}\)-equivalent to w. The set of ideal prefixes of w, \([\![{w}||_{\mathbb {D}} = \{\epsilon , a, ab, aba, abac, abacb, abacba, abc, abca, abcab, abcaba\}\), is precisely the set of (string) prefixes of the two strings w and \(w'\). The set of strong trace prefixes induced by \((\Sigma , \mathbb {S}, \mathbb {W})\) is larger, though. First, consider the string \(w_1 = abcb\) and observe that \(w_1 \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} w\). This follows because (1) \(w' = abcab\cdot a\cdot \epsilon \), and thus \(abcab \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} w\), (2) \(abcab = abc\cdot a\cdot b\) and \((a, b) \not \in \mathbb {S}\), and thus \(abcb \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} abcab\), and finally (3) by transitivity, we have \(w_1 \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} w\). Consider now the string \(w_2 = bcb\), and observe that \(abcb = \epsilon \cdot a\cdot bcb\), giving us \(w_2 \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} w_1\) since \(\{(a,b), (a,c)\} \cap \mathbb {S}= \varnothing \). Thus, \(w_2 \preccurlyeq ^{\mathbb {W}}_{\mathbb {S}} w\). On the other hand, observe that \(w_1 \not \!\sqsubseteq _{\mathbb {D}} w\) and \(w_2 \not \!\sqsubseteq _{\mathbb {D}} w\).
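The example above can be replayed mechanically. The brute-force sketch below (ours, for illustration only) enumerates the strong downward closure by closing a worklist under the two rules of Definition 2: adjacent swaps of \((\mathbb {S}\cup \mathbb {W})\)-independent letters, and deletion of a letter that is not \(\mathbb {S}\)-dependent on any letter to its right.

```python
def strong_prefixes(w, S, W):
    """Enumerate all strong trace prefixes of w under (S, W): close {w}
    under (i) swaps of adjacent (S | W)-independent letters and
    (ii) deletion of a letter not S-dependent on anything after it."""
    dep = S | W
    seen, stack = {w}, [w]
    while stack:
        u = stack.pop()
        succs = set()
        for i in range(len(u) - 1):               # rule (i): commute
            a, b = u[i], u[i + 1]
            if (a, b) not in dep and (b, a) not in dep:
                succs.add(u[:i] + b + a + u[i + 2:])
        for i in range(len(u)):                   # rule (ii): drop a letter
            if all((u[i], b) not in S and (b, u[i]) not in S
                   for b in u[i + 1:]):
                succs.add(u[:i] + u[i + 1:])
        for v in succs - seen:
            seen.add(v)
            stack.append(v)
    return seen
```

Running it with \(\mathbb {W}= \varnothing \) and dependence \(\mathbb {S}\cup \mathbb {W}\) instead yields exactly the ideal prefixes, so both sets of the example can be compared directly.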

3.1 Modelling Correct Reorderings with Strong Trace Prefixes

Recall that \(\mathbb {D}_\textsf{RWL}\) orders events of the same thread, events on the same lock, and conflicting accesses to the same memory location, allowing us to soundly represent a class of correct reorderings of an execution \(\sigma \) as the equivalence class \([\![\sigma ]\!]_{\mathbb {D}_\textsf{RWL}}\). Here, we identify a finer gradation of \(\mathbb {D}_\textsf{RWL}\) to allow for a larger subset of correct reorderings. Specifically, we define the strong and weak dependences on \(\Sigma _\textsf{RWL}\) as:

$$\begin{aligned} \begin{array}{c} \mathbb {W}_{\texttt{w}} = \{({\texttt{w}(x)}^{t_1}, {\texttt{w}(x)}^{t_2}) \,|\, x \in \mathcal {X}, t_1\ne t_2 \in \mathcal {T}\}, \quad \mathbb {W}_\mathcal {L}= \{(a^{t_1}, b^{t_2}) \in \mathbb {D}_\mathcal {L} \,|\, t_1\ne t_2\}\\ \mathbb {W}_{\textsf{RWL}} = \mathbb {W}_\mathcal {L}\cup \mathbb {W}_{\texttt{w}}, \quad \quad \mathbb {S}_\textsf{RWL}= \mathbb {D}_\textsf{RWL}\setminus \mathbb {W}_\textsf{RWL}\end{array} \end{aligned}$$
(1)

In other words, the dual concurrent alphabet \((\Sigma _\textsf{RWL}, \mathbb {S}_\textsf{RWL}, \mathbb {W}_\textsf{RWL})\) relaxes the ‘hard ordering’ between writes to the same memory location (i.e., ‘conflicting writes’) as well as that between critical sections of the same lock (i.e., ‘conflicting lock events’). We next explain the intuition behind the above relaxations.

Weakening Dependence on Writes. Let us begin by arguing about \(\mathbb {W}_{\texttt{w}}\). When an execution contains two consecutive write events \(e_1, e_2\) with \(\textsf{lab}(e_1) = {\texttt{w}(x)}^{t_1}\) and \(\textsf{lab}(e_2) ={\texttt{w}(x)}^{t_2}\) on the same memory location \(x \in \mathcal {X}\), then, clearly, no event reads from the first write event \(e_1\), since it is immediately overwritten by \(e_2\). In this case, while flipping the order of \(e_1\) and \(e_2\) may violate the reads-from relation of a read event reading from \(e_2\), observe that \(e_1\) can be dropped entirely (in the absence of later \(\mathbb {S}_\textsf{RWL}\)-dependent events after \(e_1\)) without dropping \(e_2\) and without affecting any control flow. In other words, the presence of \(e_2\) does not mandate the presence of \(e_1\), but when both are present, the conservative choice of placing \(e_1\) before \(e_2\) ensures that the reads-from relation is preserved.

Weakening Dependence on Lock Events. Recall that the primary role of the dependence \(\mathbb {D}_\mathcal {L}\) was to ensure mutual exclusion, i.e., that two critical sections of the same lock do not overlap in any execution obtained by repeatedly swapping neighbouring independent events. We identify that this is not a strong dependence, in that one can possibly drop an earlier critical section entirely, while retaining a later critical section on the same lock, in a candidate correct reordering. The correct reordering \(\rho _2\) of \(\sigma _2\) in Fig. 1c can be obtained by leveraging this insight. Indeed, \(\rho _2 \in \langle \langle {\sigma _2}||^{\mathbb {W}_\textsf{RWL}}_{\mathbb {S}_\textsf{RWL}}\) because \(({\textsf{op}_1(\ell )}^{t_1}, {\textsf{op}_2(\ell )}^{t_2}) \in \mathbb {W}_\textsf{RWL}\) for \(\textsf{op}_1, \textsf{op}_2 \in \{\texttt{acq}, \texttt{rel}\}\). Moreover, in the deadlock example (Fig. 2), \(\rho _3 \in \langle \langle {\sigma _3}||^{\mathbb {W}_\textsf{RWL}}_{\mathbb {S}_\textsf{RWL}}\), since the critical section of \(\ell _2\) in thread \(t_1\) can be dropped entirely without affecting the presence of \({\texttt{acq}(\ell _2)}^{t_2}\).
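The stratification of Eq. (1) described in the last two paragraphs can be sketched as a small classifier over label pairs. This is our illustration, again under the assumed (thread, op, operand) label encoding; the function name is ours.

```python
def classify_rwl(a, b):
    """Return 'strong', 'weak', or None (independent) for a label pair
    under the stratification S_RWL / W_RWL of D_RWL."""
    (t1, op1, d1), (t2, op2, d2) = a, b
    locks = ("acq", "rel")
    if t1 == t2:
        return "strong"                    # program order stays strong
    if op1 in locks and op2 in locks and d1 == d2:
        return "weak"                      # W_L: cross-thread lock pairs
    if d1 == d2 and (op1, op2) == ("w", "w"):
        return "weak"                      # W_w: conflicting writes
    if d1 == d2 and (op1, op2) in (("w", "r"), ("r", "w")):
        return "strong"                    # reads-from ordering stays strong
    return None
```

Note that read-write conflicts remain strong (they determine the reads-from relation), while cross-thread write-write and lock pairs drop to weak, exactly the two relaxations motivated above.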

Well-Formedness. The weak dependence \(\mathbb {W}_\mathcal {L}\) ensures that no two complete critical sections on the same lock overlap in a strong trace prefix (provided they did not overlap in the original execution). However, simply marking lock dependencies as weak still does not forbid strong trace prefixes where an earlier incomplete critical section overlaps with a later complete critical section. Consider, for example, the (ill-formed) execution \(\rho '_{2} = {\texttt{w}(x)}^{t_1}{\texttt{acq}(\ell )}^{t_1}{\texttt{acq}(\ell )}^{t_2}{\texttt{rel}(\ell )}^{t_2}{\texttt{w}(x)}^{t_2}\). Observe that \(\rho '_2\) is a strong trace prefix of \(\sigma _2\) under \(\mathbb {S}_\textsf{RWL}\) and \(\mathbb {W}_\textsf{RWL}\). As we will show in Sect. 4.3, we can remedy this mild peculiarity in the predictive monitoring algorithm.

Soundness and Precision Power. Strong trace prefixes retain soundness (as long as they are well-formed) while enjoying higher predictive power:

Theorem 1

(Soundness and Precision Power). For each well-formed execution \(\sigma \in L_\textsf{WF}\), we have:

$$ [\![\sigma ]\!]_{\mathbb {S}_\textsf{RWL}\cup \mathbb {W}_\textsf{RWL}} \subseteq \langle \langle {\sigma }\rangle \rangle _{\mathbb {S}_\textsf{RWL}\cup \mathbb {W}_\textsf{RWL}} \subseteq \langle \langle {\sigma }\rangle \rangle ^{\mathbb {W}_\textsf{RWL}}_{\mathbb {S}_\textsf{RWL}} \cap L_\textsf{WF}\subseteq \textsf {CReorderings}(\sigma ). $$

Moreover, there is a \(\sigma \in L_\textsf{WF}\) for which each of the subset relationships is strict.

Maximality. Our choice of the dual concurrent alphabet \((\Sigma _\textsf{RWL}, \mathbb {S}_\textsf{RWL}, \mathbb {W}_\textsf{RWL})\) is also the best one amongst the space of sound dual concurrent alphabets obtained by stratifying \(\mathbb {D}_\textsf{RWL}\). Formally,

Theorem 2

(Maximality). Let \((\Sigma _\textsf{RWL}, \mathbb {S}, \mathbb {W})\) be a dual concurrent alphabet such that \(\mathbb {D}_\textsf{RWL}\subseteq \mathbb {S}\cup \mathbb {W}\) and, for every \(\sigma \in \Sigma ^*_\textsf{RWL}\), \(\langle \langle {\sigma }\rangle \rangle ^{\mathbb {W}}_{\mathbb {S}} \cap L_\textsf{WF}\subseteq \textsf {CReorderings}(\sigma )\). Then \(\mathbb {S}_\textsf{RWL}\subseteq \mathbb {S}\), and thus, for every \(\sigma \), \(\langle \langle {\sigma }\rangle \rangle ^{\mathbb {W}}_{\mathbb {S}} \subseteq \langle \langle {\sigma }\rangle \rangle ^{\mathbb {W}_\textsf{RWL}}_{\mathbb {S}_\textsf{RWL}}\).

Formal proofs of the theorems in this paper can be found in the extended version of our paper [2].

4 Complexity of Predictive Monitoring

In this section we investigate the impact of generalizing Mazurkiewicz traces to strong trace prefixes on the predictive monitoring question. We present two schemes that translate an arbitrary Turing machine for predictive monitoring under trace equivalence against a language L into one for predictive monitoring under strong trace prefixes against the same language L. The first scheme (Sect. 4.1) uses additional non-determinism (but the same time and space usage), and the second (Sect. 4.2) incurs a polynomial multiplicative blow-up in time complexity.

4.1 Non-deterministic Predictive Monitoring

We first show that an algorithm that solves the vanilla predictive monitoring problem (\([\![\sigma ]\!]_{\mathbb {D}} \cap L \ne \varnothing \)) can be transformed into an algorithm for predictive monitoring against strong trace prefixes with similar resource (time and space) usage, albeit with use of non-determinism.

Theorem 3

Let \(L \subseteq \Sigma ^*\) and let M be a deterministic Turing machine that uses time T(|w|) and space S(|w|), such that \(L(M) = \{w \,|\, [\![w]\!]_{\mathbb {D}}\cap L \ne \varnothing \}\). There is a nondeterministic Turing machine \(M'\) that uses time \(T(|w|) + O(|w|)\) and space \(S(|w|) + O(|w|)\), such that \(L(M') = \{w \,|\, \langle \langle {w}\rangle \rangle ^{\mathbb {W}}_{\mathbb {S}} \cap L \ne \varnothing \}\). Moreover, if M runs in one pass, then \(M'\) uses space \(S(|w|) + c\) (for some constant c).

Observe that, in the above, we have \(S(|w|) + c \in O(S(|w|))\). Further, when \(T(|w|) \in \Omega (|w|)\), then \(T(|w|) + O(|w|) \in O(T(|w|))\). Thus, the time and space usage of the non-deterministic machine \(M'\) in Theorem 3 are essentially the same as those of M.

The proof of Theorem 3 relies on the observation that any strong prefix u of a string w is equivalent (according to trace equivalence using \(\sim _{\mathbb {S}\cup \mathbb {W}}\)) to a subsequence \(w'\) of w, such that \(w'\) is downward closed with respect to strong dependencies. The non-deterministic Turing machine \(M'\) first non-deterministically guesses a subsequence \(w'\) of the input execution w, then, using constant space and an additional forward streaming pass, ensures that \(w'\) is downward closed with respect to \(\mathbb {S}\), and finally invokes the Turing machine M on the string \(w'\).
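The verification step of \(M'\) can be sketched as a single forward pass. The Python below is a minimal illustration (names like `is_strong_downward_closed` and the encoding of the guess as a keep/drop bit per position are ours, not the paper's); note that the working state is only a set of letters, so it is bounded by the alphabet size, matching the constant-space claim:

```python
# One-pass check of M''s verification step: given the input word w and a
# guessed keep/drop bit per position, verify that the kept subsequence is
# downward closed w.r.t. the strong dependence S (a symmetric relation on
# letters, given here as a set of ordered pairs).

def is_strong_downward_closed(w, keep, S):
    dropped = set()                  # letters of dropped events seen so far
    for letter, k in zip(w, keep):
        if k:
            # a kept event must not strongly depend on any dropped earlier event
            if any((d, letter) in S for d in dropped):
                return False
        else:
            dropped.add(letter)
    return True

# Toy alphabet: 'a' and 'b' strongly dependent, 'c' independent of both.
S = {("a", "b"), ("b", "a")}
assert is_strong_downward_closed("acb", [False, True, False], S)  # keep only c
assert not is_strong_downward_closed("ab", [False, True], S)      # b kept, a dropped
```

On an accepting guess, \(M'\) then feeds the kept subsequence to M; the set `dropped` is the only extra state, giving the \(S(|w|) + c\) space bound for one-pass M.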

It follows from Theorem 3 that when the language of M is regular, so is the language of \(M'\). This means that when a language L can be predictively monitored in constant space under trace equivalence (for example, data races, deadlocks, or pattern languages [1]), it can also be predictively monitored in constant space, yet with higher predictive power, under strong trace prefixes!

4.2 Deterministic Predictive Monitoring

While Theorem 3 shows that the predictive monitoring question for strong trace prefixes is decidable (assuming the analogous problem for Mazurkiewicz traces is), the use of non-determinism may lead to an exponential blow-up in time and space when translating \(M'\) to a deterministic machine that can be used in a practical predictive testing setup. In this section, we establish that one can tactfully avoid this blow-up. In fact, we show that a polynomial multiplicative blow-up in time suffices to perform predictive monitoring under strong prefixes, starting from a deterministic Turing machine that works under trace equivalence.

Our result is inspired by prior work on predictive monitoring under trace languages [3]. Here, one identifies strong ideals (Footnote 1), i.e., sets of events that are downward closed with respect to the strong dependence relation, and checks whether some linearization of one of them respects both strong and weak dependence and also belongs to the target language L. A parameter that crucially determines the time complexity is the width of the concurrent alphabet: in our setting, the width \(\alpha _\mathbb {S}\) is the size of the largest subset of \(\Sigma \) that contains no two letters which are dependent according to \(\mathbb {S}\).

Theorem 4

Fix a language L. Let M be a deterministic Turing machine that uses time T(|w|) and space S(|w|), such that \(L(M) = \{w \,|\, [\![w]\!]_{\mathbb {D}}\cap L \ne \varnothing \}\). Then, there exists a deterministic Turing machine \(M'\) that runs in time \(O((|w|+T(|w|)) \cdot |w|^{\alpha _\mathbb {S}})\) and uses space \(S(|w|) + O(|w|)\), such that \(L(M') = \{w \,|\, \langle \langle {w}\rangle \rangle ^{\mathbb {W}}_{\mathbb {S}} \cap L \ne \varnothing \}\).

The above complexity bounds follow because one can systematically enumerate those subsequences of the input w that are downward closed with respect to \(\mathbb {S}\), by enumerating the space of strong ideals, whose number is, in turn, bounded by \(|w|^{\alpha _\mathbb {S}}\).
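On a toy input, the space being enumerated can be made concrete by brute force. The sketch below (our own illustration, not the paper's enumeration procedure, which walks strong ideals directly) lists all \(\mathbb {S}\)-downward-closed subsequences of a short word:

```python
from itertools import product

# Brute-force sketch: enumerate every subsequence of w and keep those that
# are downward closed w.r.t. the strong dependence S. The paper's point is
# that these correspond to strong ideals, of which there are at most
# |w| ** alpha_S; brute force here only makes the space concrete.

def downward_closed_subsequences(w, S):
    out = set()
    for keep in product([False, True], repeat=len(w)):
        dropped, ok = set(), True
        for letter, k in zip(w, keep):
            if k and any((d, letter) in S for d in dropped):
                ok = False
                break
            if not k:
                dropped.add(letter)
        if ok:
            out.add("".join(c for c, k in zip(w, keep) if k))
    return out

# 'a' and 'b' strongly dependent; 'c' independent of both.
S = {("a", "b"), ("b", "a")}
subs = downward_closed_subsequences("abc", S)
assert "ac" in subs and "bc" not in subs   # dropping a forces dropping b
```

The deterministic machine \(M''\) replaces the non-deterministic guess of Theorem 3 with this enumeration, invoking M on each candidate.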

4.3 Ensuring Well-Formedness and Soundness

Recall that the dual concurrent alphabet \((\Sigma _\textsf{RWL}, \mathbb {S}_\textsf{RWL}, \mathbb {W}_\textsf{RWL})\) is not by itself sufficient for ensuring that the strong trace prefixes of an execution \(\sigma \in L_\textsf{WF}\) are also well-formed. Well-formedness can nevertheless be retrofitted into the predictive monitoring algorithm with the same time, space, and non-determinism. Theorem 5 formalizes this; it follows from the observation that the set \(L_\textsf{WF}\) is (a) regular, and (b) closed under trace equivalence, i.e., for every \(\sigma \in L_\textsf{WF}\), we have \([\![\sigma ]\!]_{\mathbb {D}_{\textsf{RWL}}} \subseteq L_\textsf{WF}\), so that algorithms for predictive monitoring can easily be augmented to reason about the set \(\langle \langle {\sigma }\rangle \rangle ^{\mathbb {W}_\textsf{RWL}}_{\mathbb {S}_\textsf{RWL}}\cap L_\textsf{WF}\).

Theorem 5

Let \(L \subseteq \Sigma _\textsf{RWL}^*\) and let M be a deterministic Turing machine that uses time T(|w|) and space S(|w|), such that \(L(M) = \{\sigma \in L_\textsf{WF} \,|\, [\![\sigma ]\!]_{\mathbb {D}_\textsf{RWL}} \cap L_\textsf{WF}\cap L \ne \varnothing \}\). There is a nondeterministic Turing machine \(M'\) (resp. deterministic Turing machine \(M''\)) that uses time \(T(|w|) + O(|w|)\) (resp. \(O((|w|+T(|w|)) \cdot |w|^{\alpha _{\mathbb {S}_\textsf{RWL}}})\)) and space \(S(|w|) + O(|w|)\), such that \(L(M') ( = L(M'')) = \{\sigma \in L_\textsf{WF} \,|\, \langle \langle {\sigma }\rangle \rangle ^{\mathbb {W}_\textsf{RWL}}_{\mathbb {S}_\textsf{RWL}} \cap L_\textsf{WF}\cap L \ne \varnothing \}\). Moreover, if M runs in one pass, \(M'\) uses space \(S(|w|) + c\) (for some constant c).

5 Strong Reads-From Prefixes

Strong trace prefixes generalize Mazurkiewicz traces and can enhance the precision of predictive monitoring algorithms. In this section, we propose a further generalization in the context of \(\Sigma _\textsf{RWL}\), bringing the power of trace-based reasoning even closer to correct reorderings. Towards this, we observe that the only constraints that correct reorderings must satisfy are the thread order and the reads-from relation, and thus \(\mathbb {S}_\textsf{RWL}\) may be relaxed further by removing the dependence between writes and reads.

Fig. 3. Execution \(\sigma _4\) has a predictable data race, which can be exposed with strong reads-from prefixes (via \(\rho '_4\)) but not with strong trace prefixes.

Consider the trace \(\sigma _4\) in Fig. 3. Here, the only strong prefixes of \(\sigma _4\) (under \(\mathbb {S}_\textsf{RWL}\) and \(\mathbb {W}_\textsf{RWL}\)) are its own (string) prefixes. That is, strong trace prefixes cannot be used to argue that there is a reordering (namely \(\rho '_4\) in Fig. 3c) in which \(\texttt{w}(y)\) and \(\texttt{r}(y)\) are next to each other. Intuitively, one can first obtain the intermediate \(\rho _4\) (Fig. 3b) from \(\sigma _4\) by dropping the block of events containing \(e_2\), labelled \({\texttt{w}(x)}^{t_1}\), together with all its read events \(\textsf {rf}_{\sigma _4}(e_2) = \{e_3, e_4\}\), and then obtain \(\rho '_4\) from \(\rho _4\) by Mazurkiewicz-style swaps of neighboring independent events. We remark, however, that neither \(\rho _4\) nor \(\rho '_4\) is a strong trace prefix of \(\sigma _4\), because \(({\texttt{r}(x)}^{t_3}, {\texttt{w}(x)}^{t_2}) \in \mathbb {S}_\textsf{RWL}\); in the presence of \(\mathbb {S}_\textsf{RWL}\), the reordering \(\rho _4\) simply cannot be obtained.

The above example illustrates the possibility of relaxing \(\mathbb {S}_\textsf{RWL}\) by removing the dependencies between reads and writes. However, an incautious relaxation (such as removing \(({\texttt{w}(x)}^{t_1}, {\texttt{r}(x)}^{t_3})\) from \(\mathbb {S}_\textsf{RWL}\)) may result in a prefix like \(\rho ''_4 = {\texttt{w}(y)}^{t_1}{\texttt{r}(x)}^{t_3}{\texttt{w}(x)}^{t_2}{\texttt{r}(x)}^{t_2}{\texttt{r}(y)}^{t_2}\) which is not a correct reordering of \(\sigma _4\). In other words, while \((\texttt{r}, \texttt{w})\) and \((\texttt{w}, \texttt{r})\) dependencies can be relaxed, the stronger semantic dependence due to reads-from must still be retained. Such a relaxation cannot be accurately modelled with strong prefixes alone, since \((\Sigma _\textsf{RWL}, \mathbb {S}_\textsf{RWL}, \mathbb {W}_\textsf{RWL})\) is already the weakest alphabet (Theorem 2). We instead model it via strong reads-from prefixes, defined below:

Definition 3

(Strong Reads-from Prefix). The strong reads-from prefix order induced by \((\Sigma _\textsf{RWL}, \mathbb {S}_\textsf{RWL}, \mathbb {W}_\textsf{RWL})\), denoted \(\trianglelefteq _{\textsf{rf}}\), is the smallest reflexive and transitive binary relation on \(\Sigma _\textsf{RWL}^*\) that satisfies:

  1.
 
     \(\preccurlyeq ^{\mathbb {W}_\textsf{RWL}}_{\mathbb {S}_\textsf{RWL}} \; \subseteq \; \trianglelefteq _{\textsf{rf}}\), and
 
  2.
 
     let \(\sigma = \sigma _1 \cdot e \cdot \sigma _2\); if for every \(e' \in \sigma _2\), we have \((e, e')\not \in \mathbb {D}_\mathcal {T}\) and \(e'\not \in \textsf {rf}_{\sigma }(e)\), then \(\sigma _1\cdot \sigma _2 \trianglelefteq _{\textsf{rf}}\sigma _1 \cdot e \cdot \sigma _2\).

We say \(w' \in \Sigma _\textsf{RWL}^*\) is a strong reads-from prefix of w if \(w' \trianglelefteq _{\textsf{rf}}w\). We use \(\langle \langle {w}\rangle \rangle _{\textsf{rf}} = \{w'\in \Sigma _\textsf{RWL}^* \,|\, w' \trianglelefteq _{\textsf{rf}}w\}\) to denote the strong reads-from downward closure of w.
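Rule 2 of the definition can be illustrated operationally. The sketch below (our own, with a hypothetical `(thread, op, loc)` event encoding) checks whether a single event may be dropped: no later event may be in the same thread (thread dependence \(\mathbb {D}_\mathcal {T}\)), and no later read may read from it:

```python
# Sketch of the drop rule in Definition 3 (rule 2 only; rule 1 is inherited
# from the strong-prefix order). Event at index i may be dropped if no later
# event is thread-dependent with it and no later read reads from it.

def can_drop(trace, i):
    t_i, op_i, loc_i = trace[i]
    overwritten = False
    for t, op, loc in trace[i + 1:]:
        if t == t_i:
            return False             # later event in the same thread
        if op_i == "w" and loc == loc_i:
            if op == "r" and not overwritten:
                return False         # a later read observes event i
            if op == "w":
                overwritten = True   # event i no longer visible to reads
    return True

# sigma_4-style toy: a write observed by another thread cannot be dropped
# alone, while the final event is trivially droppable.
trace = [("t1", "w", "x"), ("t2", "r", "x"), ("t3", "w", "y")]
assert not can_drop(trace, 0)        # t2's read reads from it
assert can_drop(trace, 2)            # no later events at all
```

Repeatedly dropping droppable events (together with rule 1 swaps) generates exactly the strong reads-from prefixes in this toy model.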

In the above example, \(\rho _4\) and \(\rho '_4\) can now be modelled as strong reads-from prefixes of \(\sigma _4\), i.e., \(\rho _4, \rho '_4\in \langle \langle {\sigma _4}\rangle \rangle _{\textsf{rf}}\), since the dropped events (\({\texttt{w}(x)}^{t_1}\) and its readers) are neither thread-dependent with, nor read from by, any subsequent events. The soundness and precision power of strong reads-from prefixes are clear:

Theorem 6

(Soundness and Precision Power). For each well-formed execution \(\sigma \in L_\textsf{WF}\), we have:

$$ [\![\sigma ]\!]_{\mathbb {S}_\textsf{RWL}\cup \mathbb {W}_\textsf{RWL}} \subseteq \langle \langle {\sigma }\rangle \rangle _{\mathbb {S}_\textsf{RWL}\cup \mathbb {W}_\textsf{RWL}} \subseteq \langle \langle {\sigma }\rangle \rangle ^{\mathbb {W}_\textsf{RWL}}_{\mathbb {S}_\textsf{RWL}} \cap L_\textsf{WF}\subseteq \langle \langle {\sigma }\rangle \rangle _{\textsf{rf}}\cap L_\textsf{WF}\subseteq \textsf {CReorderings}(\sigma ). $$

Moreover, there is a \(\sigma \in L_\textsf{WF}\) for which each of the subset relationships is strict.

We now discuss the algorithmic impact of this further relaxation to strong reads-from prefixes. We obtain results analogous to Theorem 3 and Theorem 4; these follow because one can guess a prefix and check that it preserves the thread order and the reads-from relation.

Theorem 7

Let \(L \subseteq \Sigma ^*\) and let M be a deterministic Turing machine that uses time T(n) and space S(n), such that \(L(M) = \{w \,|\, [\![w]\!]_{\mathbb {D}}\cap L \ne \varnothing \}\). There is a nondeterministic Turing machine \(M'\) (resp. deterministic Turing machine \(M''\)) that uses time \(T(n) + O(n)\) (resp. \(O((n+T(n)) \cdot n^{\alpha _{\mathbb {S}_\textsf{RWL}}})\)) and space \(S(n) + O(n)\), such that \(L(M') (= L(M'')) = \{w \,|\, \langle \langle {w}\rangle \rangle _{\textsf{rf}} \cap L \ne \varnothing \}\). Moreover, if M runs in one pass, then \(M'\) uses space \(S(n) + c\) (for some constant c).

In the next section, we show that this relaxation allows us to capture a previously known class of data races and deadlocks, namely the synchronization-preserving ones.

6 Strong Prefixes Versus Synchronization Preservation

Recall the execution \(\sigma _2\) in Fig. 1, where \((e_1, e_6)\) is a data race, but one that cannot be detected using a happens-before style detector, i.e., using the dependence \(\mathbb {D}_\mathcal {L}\). On the other hand, trace \(\rho _2\) demonstrates that this race can be captured using strong prefixes (i.e., under \(\mathbb {S}_\textsf{RWL}\) and \(\mathbb {W}_\textsf{RWL}\)). Indeed, this is a classic example of a synchronization-preserving data race, a notion proposed in [26] that characterizes a large class of predictable data races detectable in linear time. The analogous notion of synchronization-preserving deadlocks captures a large class of predictable deadlocks [40] and can likewise be detected efficiently. Both classes of bugs can be predicted by searching for synchronization-preserving correct reorderings, and in this section we investigate the relationship between these and strong reads-from prefixes.

Synchronization-Preserving Reorderings, Data Races and Deadlocks. An execution \(\rho \in \Sigma ^*_\textsf{RWL}\) is a synchronization-preserving correct reordering of execution \(\sigma \in \Sigma ^*_\textsf{RWL}\) if (a) \(\rho \) is a correct reordering of \(\sigma \), and (b) for each pair of acquire events \(a_1 \ne a_2\) (alternatively, critical sections) of \(\sigma \) on the same lock \(\ell \), such that both \(a_1\) and \(a_2\) are present in \(\rho \), we have \(a_1 <_{\rho } a_2\) iff \(a_1 <_{\sigma } a_2\). We use \(\texttt {SyncP}(\sigma )\) to denote the set of all synchronization-preserving correct reorderings of \(\sigma \). A sync(hronization)-preserving data race is a pair of conflicting events \((e_1, e_2)\) such that there is a synchronization-preserving correct reordering \(\rho \) in which \(e_1\) and \(e_2\) are \(\sigma \)-enabled. Likewise, a sync-preserving deadlock of length k is a deadlock pattern (Footnote 2) \((e_1,\dots ,e_k)\) such that there is a synchronization-preserving correct reordering \(\rho \) in which \(e_1, \dots , e_k\) are \(\sigma \)-enabled. Both classes can be detected in linear time and space [26, 40].

We observe that the algorithmic efficiency in predicting sync-preserving data races (resp. deadlocks) stems from the fact that whenever \((e_1, e_2)\) is a sync-preserving data race (resp. \((e_1, \dots , e_k)\) is a sync-preserving deadlock), it can be witnessed by a reordering which is not only synchronization-preserving, but also preserves the order of conflicting read and write events, i.e., by a conflict-preserving reordering:

Definition 4

(Conflict-Preserving Correct Reordering). A reordering \(\rho \) of an execution \(\sigma \) is a conflict-preserving correct reordering if (a) \(\rho \) is a correct reordering of \(\sigma \), (b) for every lock \(\ell \) and any two acquire events \(a_1, a_2\) labelled \(\texttt{acq}(\ell )\) in \(\rho \), \(a_1 <_{\rho } a_2\) iff \(a_1 <_{\sigma } a_2\), and (c) for every two conflicting events \(e_1\) and \(e_2\) in \(\rho \), \(e_1<_{\rho } e_2\) iff \(e_1 <_{\sigma } e_2\).

Here, we say \((e_1, e_2)\) is a conflicting pair of events if \((\textsf{lab}(e_1), \textsf{lab}(e_2)) \in \mathbb {D}_{\textsf{conf}}\). We use \(\texttt {ConfP}(\sigma )\) to denote all conflict-preserving correct reorderings of \(\sigma \). Observe that every conflict-preserving correct reordering of \(\sigma \) is also a synchronization-preserving correct reordering of \(\sigma \).

Proposition 1

For any execution \(\sigma \in \Sigma _\textsf{RWL}^*\), we have \(\texttt {ConfP}(\sigma )\subseteq \texttt {SyncP}(\sigma )\).
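Clauses (b) and (c) of Definition 4 amount to order checks on pairs of retained events, which the quadratic sketch below makes explicit (our own illustration; events carry hypothetical unique ids, and conflicts are approximated as same-location accesses with at least one write):

```python
# Verify clauses (b) and (c) of Definition 4 for a candidate reordering rho
# of sigma. Events are (id, thread, op, x) with x a location or a lock name.
# Clause (a), being a correct reordering, is assumed separately.

def preserves_conflicts_and_locks(sigma, rho):
    pos = {e[0]: i for i, e in enumerate(rho)}   # event id -> position in rho
    for i, (id1, t1, op1, x1) in enumerate(sigma):
        for id2, t2, op2, x2 in sigma[i + 1:]:
            if id1 not in pos or id2 not in pos:
                continue                         # a dropped event imposes nothing
            conflict = x1 == x2 and "w" in (op1, op2)
            same_lock_acq = op1 == op2 == "acq" and x1 == x2
            if (conflict or same_lock_acq) and pos[id1] > pos[id2]:
                return False                     # sigma-order violated in rho
    return True

sigma = [(1, "t1", "w", "x"), (2, "t2", "w", "x"), (3, "t2", "r", "y")]
assert preserves_conflicts_and_locks(sigma, [sigma[0], sigma[2], sigma[1]])
assert not preserves_conflicts_and_locks(sigma, [sigma[1], sigma[0]])
```

Since every same-lock acquire pair is also checked, any execution passing this check satisfies the synchronization-preservation clause too, mirroring Proposition 1.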

We now formalize our observation: every synchronization-preserving data race (resp. deadlock) is, in fact, also a conflict-preserving data race (resp. deadlock).

Lemma 1

Let \(\sigma \in \Sigma _\textsf{RWL}^*\) be an execution. A sequence of events \((e_1, \dots , e_k)\) is \(\sigma \)-enabled in some synchronization-preserving reordering of \(\sigma \) iff it is \(\sigma \)-enabled in some conflict-preserving reordering of \(\sigma \). Thus, sync-preserving data races and deadlocks can also be witnessed using conflict-preserving reorderings.

The connection between synchronization-preserving reorderings and strong reads-from prefixes is now straightforward, because the class of conflict-preserving races and deadlocks can be accurately modelled in our framework:

Lemma 2

Let \(\sigma \in \Sigma _\textsf{RWL}^*\) be an execution. We have \(\langle \langle {\sigma }\rangle \rangle _{\textsf{rf}}\cap L_\textsf{WF}= \texttt {ConfP}(\sigma )\).

Lemmas 1 and 2 together connect sync-preserving data races (deadlocks) with our strong reads-from prefixes: a sequence of events \((e_1, \dots , e_k)\) is \(\sigma \)-enabled in some synchronization-preserving reordering of \(\sigma \) iff it is \(\sigma \)-enabled in a well-formed strong reads-from prefix. Consequently, we obtain algorithms for detecting sync-preserving data races and deadlocks with an improved space bound and the same time complexity:

Theorem 8

Synchronization-preserving races and deadlocks can be detected in linear time and constant space.

Even though, in the context of data races and deadlocks, it suffices to look at conflict-preserving correct reorderings, in general the class of synchronization-preserving reorderings is much more expressive. As a consequence, when one goes beyond data races and deadlocks to a slightly different class of specifications, the predictive monitoring question under synchronization-preserving reorderings quickly becomes hard. We demonstrate this in the context of predicting whether two events can be reordered in a given order. Under Mazurkiewicz's trace equivalence this problem can be decided in linear time and constant space, and hence also under strong trace prefixes (Theorem 3). In contrast, under synchronization-preserving reorderings, we show that any streaming algorithm for this problem requires linear space.

Theorem 9

Let \(\sigma \in \Sigma _\textsf{RWL}^*\) be an execution, and \(e_1, e_2\in \textsf {Events}_{\sigma }\) be two events. Any streaming algorithm that checks if there is an execution \(\rho \in \texttt {SyncP}(\sigma )\) such that \(e_1<_{\rho }e_2\) uses linear space.

Indeed, the above problem (checking if two events can be flipped) is an instance of level 1/2 of the Straubing-Thérien hierarchy (Footnote 3) [32], i.e., of pattern languages (Footnote 4) [1], whose predictive monitoring can be solved in linear time and constant space under Mazurkiewicz traces, and thus also under strong prefixes. Theorem 9, however, shows that any streaming algorithm deciding this problem against pattern languages under synchronization-preserving reorderings has a linear-space lower bound. We therefore remark that strong prefixes lie at the horizon of tractability in the context of predictive monitoring.

7 Experimental Evaluation

We evaluate the effectiveness of strong prefixes and strong reads-from prefixes for predictive monitoring of executions (over \(\Sigma _\textsf{RWL}^*\)) of shared-memory multi-threaded programs. The goal of our evaluation is twofold. First, we want to empirically gauge the enhanced predictive power of strong reads-from prefixes over prediction based on trace equivalence. We demonstrate this using prediction against the pattern languages proposed in [1]. For data races, strong reads-from prefixes capture sync-preserving races, which have already been shown to have higher empirical predictive power than trace-based reasoning [26]. Second, we want to evaluate how our not-so-customized but constant-space algorithm for synchronization-preserving data races and deadlocks (Theorem 8) performs against the linear-space algorithms due to [26, 40].

Implementation and Setup. We implemented our predictive monitoring algorithms for data races, deadlocks, and pattern languages in Java, obtained by determinizing the non-deterministic monitors from Theorem 7. We evaluate against benchmarks derived from [1, 26, 40], consisting of concurrent programs from a variety of suites: (a) the IBM Contest suite [10], (b) the DaCapo suite [5], (c) the Java Grande suite [39], (d) the Software Infrastructure Repository suite [8], and (e) others [6, 18,19,20, 23]. For each benchmark program, we generated an execution log using RV-Predict [33] and evaluated all competing algorithms on the same execution. Our experiments were conducted on a 64-bit Linux machine with Java 19 and 400 GB heap space. Throughout, we set a timeout of 3 hours for each run of every algorithm. We present a brief summary of our results here; full results can be found in [2].

Table 1. Predictive monitoring against pattern languages, grouped by pattern length. Column 3 (Column 6) reports the total time taken under trace equivalence (strong reads-from prefixes). Column 2 (Column 4) reports the number of successful matches under trace equivalence (strong reads-from prefixes). Column 5 reports the number of times prediction based on strong reads-from prefixes reports an earlier match.

7.1 Enhanced Predictive Power of Strong Prefixes

We demonstrate the enhanced predictive power due to our proposed formalism in the context of predictive monitoring against pattern languages [1]. Pattern language specifications take the form \(\Sigma ^*{a}_1\Sigma ^*\dots \Sigma ^*{a}_{d}\Sigma ^*\), and thus include all executions that contain \({a}_1, \dots , {a}_{d}\) as a sub-sequence. Predictive monitoring against pattern languages can be performed in constant space and linear time under trace equivalence [1].
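The vanilla (non-predictive) membership test for such a pattern language is a one-pass subsequence check with a single counter, which is what makes constant-space monitoring possible. A minimal sketch (assuming single-letter events; names are illustrative):

```python
# Streaming membership sketch for the pattern language
# Sigma* a1 Sigma* ... Sigma* ad Sigma*: accept iff a1..ad occur as a
# subsequence of the stream. State is one counter, i.e., constant space.

def matches_pattern(stream, pattern):
    i = 0                                 # next pattern position to match
    for letter in stream:
        if i < len(pattern) and letter == pattern[i]:
            i += 1
    return i == len(pattern)

assert matches_pattern("xaybzc", "abc")   # a..b..c appear in order
assert not matches_pattern("xcyba", "abc")
```

The predictive monitors of [1] (and ours, via Theorem 7) wrap such a matcher with a search over equivalent reorderings or prefixes; the sketch above is only the underlying membership check.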

Implementation and Methodology. To perform predictive monitoring under strong reads-from prefixes, our algorithm \(M'\) guesses an appropriate prefix and invokes the predictive monitor M (under trace equivalence) proposed in [1]. Since M consumes constant space, in theory simulating \(M'\) also requires constant space (see also Theorem 7). The resulting space usage, however, can be prohibitive in practice. For scalability, we employ randomization to select a subset of prefixes, and only inspect these. Our results show that, despite this compromise, the predictive power under strong reads-from prefixes is higher than under trace equivalence. We use 30 benchmark executions, and for each execution, we isolate 20 patterns (of sizes 3 and 5) from randomly chosen sub-executions of length 5000, following [1]. For each pair of benchmark and pattern, we run the two streaming algorithms (trace equivalence v/s strong reads-from prefixes) on the sub-execution from which the pattern is extracted, allowing us to optimize memory usage. Both algorithms terminate as soon as the pattern is matched; otherwise they process the entire sub-execution. We use the publicly available implementation of [1].

Evaluation Results. Our results are summarized in Table 1. First, all matches reported under trace equivalence were also reported under strong reads-from prefixes, as expected from Theorem 6. Second, out of the \(30\times 20\) combinations of executions and patterns, trace-equivalence-based prediction reports 33 fewer matches than prediction based on strong reads-from prefixes (466 vs 499). The enhancement in prediction power spans patterns of both sizes: 15 extra matches were found for patterns of size 3, and 18 for patterns of size 5. Third, we also collect more fine-grained information: amongst the 466 combinations reported by both, 18 were reported earlier (in a shorter execution prefix) under strong reads-from prefixes. Finally, the total time taken for prediction under strong reads-from prefixes is higher, as expected, but only by \(6\%\). In summary, strong reads-from prefixes offer higher prediction power in practice, with moderate additional overhead.

Table 2. Synchronization-preserving v/s conflict-preserving data races. \(\mathcal {N}\) and \(\mathcal {T}\) denote the number of events and threads in the executions.

7.2 Strong Reads-From Prefixes v/s Sync-Preservation

We implemented the constant-space linear-time algorithm for sync-preserving data races and deadlocks (Theorem 8) and compare it against the linear-space algorithms due to [26, 40], which solve the same problem.

Table 3. Synchronization-preserving v/s conflict-preserving deadlocks. \(\mathcal {N}\) and \(\mathcal {T}\) denote the number of events and threads in the executions.

Implementation and Methodology. The algorithm guesses strong reads-from prefixes and checks whether the events enabled in them constitute a data race or a deadlock. Following [26, 40], we filter out thread-local events to reduce the space usage of all algorithms. We compare our predictive monitoring algorithm under strong reads-from prefixes (conflict-preserving reorderings) with SyncP [26] and SyncPD [40], which work with synchronization-preserving reorderings. We use the publicly available implementations of [26, 40]. We run all algorithms on the entire executions. For data races, we report the number of events \(e_2\) for which there is an earlier event \(e_1\) such that \((e_1, e_2)\) is a sync-preserving data race. For deadlocks, we report the number of tuples of program locations corresponding to events reported to be in deadlock.

Evaluation Results. We present our results in Table 2 and Table 3. First, observe that the precision of data race and deadlock prediction based on strong reads-from prefixes is exactly the same as that based on synchronization preservation (compare columns 4 and 6 in both tables). Next, we observe that our implementation (even though it runs in constant space) is slower than the optimized algorithms proposed in [26, 40]. We conjecture this is because the constants appearing after determinization are large (of the order of \(O(2^{\textsf {poly}(|\mathcal {X}| + |\mathcal {T}| + |\mathcal {L}|)})\)), also resulting in out-of-memory exceptions on some large benchmarks.

8 Related Work and Conclusions

Our work is inspired by runtime predictive analysis for testing concurrent programs, where the task is to enhance the coverage of naive dynamic analysis techniques to a larger space of correct reorderings [15, 34, 35]. A key focus here is to improve the scalability of prediction techniques for concurrency bugs such as data races [21, 24, 26, 31, 37, 38], deadlocks [20, 40], atomicity violations [4, 11, 27], and more general properties [1, 14], for an otherwise intractable problem [25]. The theme of our work is to develop efficient algorithms for predictive concurrency bug detection. We start with the setting of trace theory [28], where questions such as checking whether two events can be flipped, which are intractable in general [12, 22, 25], can be answered in constant space. The problem of relaxing Mazurkiewicz traces has been studied before [7], and recent work [12] has focused on reasoning about commutativity of grains of events.

In contrast, our work takes an orthogonal angle and proposes that, for co-safety properties, one can perform relaxations by exploiting semantic properties of programming constructs in multithreaded shared-memory programs. To this end, we propose strong trace prefixes and, over the concrete alphabet \(\Sigma _\textsf{RWL}\), their extension to strong reads-from prefixes, in which the commutativity between events is stratified into strong and weak dependencies. This simple relaxation allows us to capture a larger class of concurrency bugs, while still retaining the algorithmic simplicity that event-based commutativity offers. We also show connections between prior algorithms for (sync-preserving) data race and deadlock prediction and our formalism, and arrive at asymptotically more space-efficient algorithms for them. We envision that combining commutativity based on groups of events [12] with prefix-based prediction may be an interesting avenue for future research.