
1 Introduction

Multi-threaded programming is notoriously prone to subtle software glitches that are difficult to identify and reproduce [29]. In addition, for the C language, the specifications represent another factor of complexity [2, 43, 68]. Indeed, to leave room for improving compiler efficiency and hardware support, so-called undefined behaviour [22, 40] is deliberately introduced at many points of the specifications. Such loose ends place a further burden on the programmer, who is assumed to have a very good knowledge of the specific compiler and target architecture.

A data race is a rather insidious case of undefined behaviour in C. Such an undesirable situation, triggered by conflicting accesses from multiple threads to overlapping memory locations, can be seen as a specific class of safety violations.

figure b

Let us consider two parallel threads respectively executing functions f and g on the left. Assuming that shared variable x is initially 0, one might be tempted to conclude that the value of x will eventually be either 1 or 2, depending on which thread is executed first. However, this reasoning incorrectly implies that the two threads are executed in sequence. In fact, f and g may interleave and interfere with each other: if f is pre-empted right after its read access to x, then g increases x to 1, and finally f multiplies the previously stored value of x by two, the final value of x will be 0.

Many techniques for static checking of generic safety properties are available, e.g. traditional symbolic execution and testing [10, 44], well-known under- and over-approximated analyses [5, 12], and more recent inductive methods [9]. Mature off-the-shelf static analysers typically accommodate such techniques within modular workflows, in the form of mechanised encodings for efficient general-purpose decision procedures. Concurrency, as well as specific aspects thereof, can similarly be handled separately, as e.g. in context-bounded analysis [47, 59] and in the emulation of weak memory models under sequential consistency [1].

Driven by the same modularity principle, in this paper we focus on static detection of data races in multi-threaded C programs [39]. Much like dynamic detection, we wish to (i) monitor shared-memory access to keep track of the operating thread along with the relevant locations, and at the same time (ii) check for interfering operations from other threads. Remarkably, unlike dynamic detection, we cannot rely on low-level facilities offered by the operating system to inspect memory access. Conducive to static detection is thus the embedding of the whole detection mechanism within the program of interest. Intuitively, we instrument each relevant statement with a few operations on auxiliary variables and assertions; by construction, a feasible violation of any such assertion will indicate a feasible data race at the corresponding point of the program.

The above encoding yields a reduction to reachability. The in-program detection system hinges on a diligent bookkeeping of the relevant memory locations. Our instrumentation introduces no spurious or missed data races w.r.t. the feasible behaviour of the program, while avoiding any explicit representation or direct manipulation of memory locations. This transparently delegates all complexities (e.g. pointer aliasing, complex data structures, etc.) to the technology chosen for reachability analysis, while retaining maximum accuracy of detection.

We implemented a prototype data race detector, CSeq-DR, by integrating our encoding within an existing sequentialisation-based workflow [37] for context-bounded analysis. We compared CSeq-DR against four state-of-the-art data race detectors, including the best-performing tools at SV-COMP 2022 and 2023 [3, 4]. CSeq-DR proves competitive on the SV-COMP23 benchmarks; most notably, it discovers new issues in the implementation of well-known lock-free data structures [25, 34]. Guided by a detailed static analysis of the SV-COMP23 benchmarks, we designed a second set of benchmarks, EDR, to improve the coverage of specific features that are particularly relevant to data race detection, e.g. complex synchronisation, shared composite data types, and pointers. CSeq-DR shows a superior precision in the analysis on this second set of benchmarks.

Structure of the Paper. Section 2 introduces the syntax, semantics, and execution model of C programs with POSIX threads. Section 3 illustrates our main technical contribution, i.e. our reduction from data race detection to reachability. Section 4 sketches our prototype implementation and presents the experimental results. Sections 5 and 6 discuss related work and report final considerations, respectively.

2 Multi-threaded C Programs

A multi-threaded C program with POSIX threads [39] consists of multiple threads that can perform local computations, interact through the shared memory, and invoke pthread routines for thread creation, synchronisation, etc. At any point during the execution of the program, only the active thread can perform computations. Initially the main thread is active, and it is the only existing thread. New threads are spawned from the active thread, and added to the pool of inactive threads. On a context switch the active thread is pre-empted and becomes inactive, and one from the pool of inactive threads is activated. When a thread becomes active for the first time, its execution starts from the beginning; otherwise the thread continues from where it was last pre-empted.

Fig. 1. Running example

Let us now refer to the example program of Fig. 1 to informally introduce the syntax of multi-threaded C programs. A program consists of a sequence of declarations of global variables (in this case, only x) shared by all threads, followed by a sequence of function definitions (in this case g, f, and main, without input and output parameters for simplicity). The body of a function is composed of the declaration of local variables (like tid1 and tid2 in the main function) and the statements to be executed upon invocation. A compound statement or block is a sequence of statements enclosed in curly brackets. A statement (or an expression) involving only operations on the local memory without calls to a pthread routine is non-visible, otherwise visible (as all the statements in the example). The pthread routines include pthread_create to spawn a thread from a function (in the example with a simplified call) and pthread_join to wait for a specific thread to terminate. Other routines, e.g. for synchronisation via locks, conditional waiting, barriers, etc. are supported but not relevant here; it is also possible to explicitly declare atomic compound statements, whose execution cannot be pre-empted (as in compare-and-swap operations [32], GCC built-in atomics, and so on). We finally add the usual primitives for program verification, namely assume to discard all executions not satisfying a given condition, assert to express safety properties of interest, and nondet to non-deterministically assign to a variable any value allowed by its data type.

In the example program of Fig. 1 there are three threads. The main thread of the program, corresponding to its main function, is spawned at the beginning. The main thread in turn spawns two threads (from functions f and g respectively) and waits for them to terminate; it then checks whether the value of x is unchanged. The two threads update the value of x concurrently as shown.

The state of a multi-threaded program consists of the identifier of the active thread, a snapshot of the shared memory (i.e. an evaluation of the variables stored therein), and the local state of each thread (i.e. active or not); the local state of a thread consists of a local memory snapshot, the thread’s program counter pointing to the statement being executed, and a stack to handle procedure calls. In the initial state, the identifier of the active thread corresponds to the main thread, the program counter of the only thread points to the first statement of the main function, the call stack is empty, and each variable is assigned its init expression, if any, or either 0 or nondet, respectively for global and local variables. A transition is a change of state in the program resulting from the execution of a statement. An execution is a sequence of consecutive transitions from the initial state. An execution context is a sequence of transitions performed by a thread between its activation and the following pre-emption (or termination). A round-robin execution is an execution where the threads are activated in a round-robin fashion (or rounds) according to their static order of creation in the program. A context-bounded execution is an execution with a given number of context switches. Considering the example program, an execution invoking main, f, g, main, g, and f takes 3 rounds, or 5 context switches.

Throughout the paper we assume sequential consistency [48]; it is worth noticing that this does not inherently limit the applicability of our technique, since so-called weak memory models for modern hardware can be soundly simulated under sequential consistency with extra computations and nondeterminism [1]. Without loss of generality, we also assume that each non-compound statement involves at most either one global variable or a pointer, without side effects. We call such statements simple, observing that any complex (i.e. non-simple) statement can be transformed into an equivalent sequence of simple statements with temporary variables [13, 53]. Similarly, we assume that branch and loop conditions only refer to a single local scalar variable, and that the input parameters and return values of function calls are passed through local variables.

3 Encoding Data Race Checking as Reachability

In this section, we define a program transformation that encodes data race checking as reachability. We say that a multi-threaded program contains a data race if it can execute two conflicting actions (i.e. one thread writes to a memory location and another one reads from or writes to the same location), at least one of which is not atomic, and neither happens before the other [40]. In the rest of the paper we refer to a program as unsafe or safe depending on whether or not that program contains a data race.

We initially sketch our program transformation for simple cases and then progressively generalise it, elaborating a correctness argument as we go along. The key idea of our technique is to decorate each visible statement of the program under analysis with guarded assertions and operations on auxiliary variables. Such variables are synchronously updated to keep track of the threads and memory locations potentially involved in conflicting actions, while the guarded assertions combine extracted fragments of the visible statement in question to predicate on them. By construction, a violation of any of the assertions will indicate a feasible data race at the corresponding point of the initial program.

Auxiliary Variables. We initially add to the program under analysis the auxiliary global variables waddr and wtid to store the target address of the current shared-memory write operation and the identifier of the writing thread, respectively. Both variables are initialised to 0, indicating that no shared memory location is being written and no thread is writing to the shared memory.

Fig. 2. Basic encoding for a read operation on a shared variable.

Basic Operations.

We transform a simple read operation from a shared variable as shown in Fig. 2. The program fragment being transformed is l = g, where the value of a global variable g is assigned to a local variable l (line 5). Right before such operation, we check that the thread wtid (if any) currently about to write to the shared memory and the current thread pthread_self are not the same. If so, we further check whether the read address &g of g and the write address waddr match: if they do, the assertion fails; otherwise, the access is completed. Observe that the above check and the statement being encoded are wrapped into a single atomic statement to prevent in-between context switching.

Fig. 3. Basic encoding for a write operation on a shared variable.

Let us dissect the transformation for a simple write operation g = 3, where a shared variable is assigned a constant value (Fig. 3). It consists of two atomic blocks. The first one is similar to the encoding of a read operation, except that right before the actual assignment (line 7) we set the writing thread wtid to the current thread and waddr to the address &g of g (lines 5–6). As in the case of a read operation, the guarded assertion checks upfront that no other thread is currently trying to write to the same address (lines 2–3). If so, we update g as originally intended (line 7). In the second block we simply re-set waddr and wtid.

Proof Sketch (Reduction to Reachability). Intuitively, the race detection mechanism exploits the possible pre-emption of an encoded write operation right before the auxiliary variables waddr and wtid are re-set (line 9 in Fig. 3): at that point, another thread competing for a read or write operation can become active and reach an assertion violation. More concretely, suppose that the program under analysis is composed of a reader thread and a writer thread respectively executing l = g and g = 3 without synchronisation. Clearly, this program is unsafe according to the definition at the beginning of the section. The transformed program with the two threads encoded as in Figs. 2 and 3 must therefore contain a reachable assertion failure. Indeed, the writer thread can become active first, and then it can be pre-empted right before the second atomic block (line 9 in Fig. 3), so that the reader will become active, failing the assertion (line 3 in Fig. 2). Conversely, suppose the two threads are properly synchronised, e.g. via a shared lock. If the reader becomes active first, the assertion in there cannot fail as wtid and waddr are initialised to zero; since the reader does not modify such variables, the assertion checked by the writer thread activated subsequently will not fail either. If the writer becomes active first, wtid and waddr are both set and re-set within the same execution context, therefore the reader will not be able to fail the assertion check later. Observe that the argument for two writer threads would be similar as above. Finally two readers cannot trigger any assertion failure, because both wtid and waddr will be always 0.

Multiple Access. The encoding seen so far covers the basic case of a single access to a shared variable. In practice, multiple accesses to possibly different shared variables may occur within a statement. Since non-compound complex statements are assumed to be transformed into simple statements upfront (Sect. 2), this circumstance is limited to compound statements. For a regular block, we just encode the statements therein one by one. For an atomic block, however, this would not work because the pre-emption of encoded writes (Fig. 3) necessary for race detection would be disallowed. We thus encode atomic blocks in one go, as follows.

Fig. 4. Encoding multiple shared-memory access

Let us consider the statement \(\texttt{atomic}\ \{\texttt{stmt}_1; \texttt{stmt}_2; \dots\}\), where every \(\texttt{stmt}_i\) is simple. The encoding template for such a statement (Fig. 4) generalises the previous simple cases (Figs. 2 and 3). The different \(\texttt{x}_i\) and \(\texttt{w}_j\) are placeholders to be replaced with syntactic fragments of the statement in question that involve access to the shared memory. We refer to every such fragment as a target expression. Let us denote with \(X = \{\texttt{x}_1, \dots, \texttt{x}_n\}\) the set of target expressions for either a read or a write operation, and with \(W = \{\texttt{w}_1, \dots, \texttt{w}_m\}\) the set of target expressions for write operations. For example, we would have \(X = \{\texttt{\&g}\}\) and \(W = \{\}\) for the read operation l = g considered in Fig. 2, while \(X = W = \{\texttt{\&g}\}\) for the write operation g = 3 of Fig. 3. The guarded assertion for race detection is now expanded into multiple assertions (one per target in X, lines 3–5 in Fig. 4), whereas waddr is non-deterministically assigned to any of the write targets in W (lines 8–11). The non-deterministic assignment to waddr keeps the encoding compact; in particular, it avoids having to store the different target addresses for write operations separately (for instance by representing waddr as an array of m elements), which would in turn result in \(m \cdot n\) assertions at lines 3–5. We finally omit the second atomic block (lines 14–16) when \(W = \{\}\).

Proof Sketch (Over-Approximation of Target Expressions). In order to build the sets X and W for a given statement, it is crucial to categorise its visible expressions as read-or-write or write-only target expressions. While this is relatively straightforward, deciding whether an expression entails shared-memory access is generally undecidable in the presence of pointers. In that respect, a convenient feature of our encoding is that non-visible expressions can be added to X and W without detriment to soundness. To see why, let us suppose that some non-visible expressions \(\texttt{x}_i\) and \(\texttt{w}_j\) are added to X and W, causing a violation of one of the assertions (lines 3–5 of Fig. 4). Observe that both elements will result in additional checks at those lines; in the case of \(\texttt{w}_j\) indirectly, through a preceding non-deterministic assignment to waddr (lines 9–11) from another thread wtid. If only one of the expressions (i.e. only \(\texttt{x}_i\) or \(\texttt{w}_j\)) in the failing assertion is non-visible, a match with the other (visible) expression would not be possible, since the local storage of a thread and the shared memory cannot overlap. If instead the failing assertion compares \(\texttt{w}_j\) to \(\texttt{x}_i\), these would necessarily refer to the local storage space of two distinct threads (respectively wtid and pthread_self, as enforced by the guard at line 2), and therefore no match would be possible either. Given this argument, one could dispense with the detection of visible expressions and just populate X and W as if every expression were visible, without having to worry about false positives; this can be particularly useful for an actual implementation.

Composite Data Types and Pointers. Conflicting access to composite data types, possibly via pointers, requires some further ingenuity to achieve a precise representation of memory interference and avoid unsoundness.

Fig. 5. Byte-precise tracking of memory locations

In the diagram of Fig. 5 (left), an array A of short integers (two bytes each element) is concurrently accessed at different positions. No data race is actually taking place as the memory locations being accessed are disjoint. However, an imprecise analysis based on a simple match of the base address of the shared data structure being accessed can raise false alarms. A similar situation can arise in the case of concurrent access to different fields of a shared struct (but not for a union, since all fields of a union have the same base address). Handling such cases requires taking into account the memory offsets for the different indexes of the array. Since our technique does not represent the target memory locations explicitly, but only through extracted program fragments that are pasted verbatim where appropriate, this entails no extra effort.

In the diagram of Fig. 5 (right), a producer and a consumer thread operate a shared buffer by respectively writing blocks of 8 bytes using long integers, and reading from the buffer byte by byte (e.g. to compute some low-level operation like byte-wise CRC [56]) into a char as soon as new data becomes available. The two threads access the buffer via local pointers of different types, while a shared index signposts available data not yet consumed. Due to a programming glitch in the handling of the shared index, the two operations may end up targeting different base addresses within the buffer, yet overlapping memory locations. Without taking into account the byte-width of the data being accessed, such conflicting access would be unsoundly marked as safe. We accommodate this in our encoding with an additional auxiliary variable wlen to be updated along with the others right before each write operation, and amend the guarded assertions accordingly.

Fig. 6. Encoding for data race checking, general case

We can finally define a general template for our encoding for data race detection. The memory locations of interest span from waddr to waddr+wlen and from \(\texttt{x}_i\) to \(\texttt{x}_i\)+\(\texttt{xlen}_i\), respectively for the write pending from the competing thread wtid and for the i-th access operation in the statement being encoded. The encoding is shown in Fig. 6. The amended checks (assertions at lines 3–7) detect overlaps in the above intervals. The non-deterministic assignment of waddr to any \(\texttt{w}_j\) in the set W of write target expressions (lines 10–13) is unchanged, and the subsequent assignment of wlen accounts for the size of the appropriate write target expression (line 14).

4 Experimental Evaluation

Prototype. We have developed a prototype tool, CSeq-DR, that can detect data races in multi-threaded programs with POSIX threads in (a representative fragment of) C99 extended with atomic compound statements (Sect. 2).

Fig. 7. Prototype verification flow for data race detection

The overall verification flow is shown in Fig. 7. The three leftmost boxes integrate our encoding for data race detection (Sect. 3) within CSeq-Lazy [36], a sequentialisation-based tool for context-bounded analysis. Program P is unfolded into a bounded program \(P_u\), equivalent to P up to the given unwinding bound u. Program \(P_u\) is then instrumented for data-race checking, obtaining program \(P'_u\). Observe that \(P_u\) is instrumented, not P: the simpler structure of \(P_u\) makes it easier to build the sets X and W of targets (Sect. 3). To identify potentially-visible statements we distinguish between local and global variables, pointers and non-pointers, and structures and non-structures, possibly following structure fields recursively, and conservatively considering pointers as global variables (Footnote 1). Finally, \(P'_u\) is turned into a sequential program \(Q'_{u,r}\) that simulates all executions of P up to u loop iterations and r rounds, and fails an assertion if and only if an execution of P can lead to a data race within the given bounds. At this point, different tools can be plugged in to analyse \(Q'_{u,r}\) [37]. We use CBMC [12], which reduces reachability in \(Q'_{u,r}\) to propositional satisfiability of a formula \(\phi \), and in turn invokes MiniSat [19] to find a satisfying assignment for \(\phi \), if any.

Benchmarks. We adopted as a first benchmark set the programs from the ConcurrencySafety track of the software verification competition (SV-COMP23) [4]. This widely used set yields a good coverage of the core features of the C programming language as well as of the basic concurrency mechanisms. All the tools we compare against have been fine-tuned on this set for the competition, which includes different elements of complexity related to program analysis, such as complex control flow, deep loops, use of pointers, non-determinism, large numbers of threads, and so on. However, the set is not specific to data race checking.

In addition, we prepared an extended data race (EDR) benchmark set to specifically improve the coverage of a variety of cases that are particularly relevant to data race analysis. The benchmarks are organised into different subcategories: arrays-ptrs for operations on shared arrays and pointers, referencing and dereferencing, and type casting; structs-unions for other shared composite data structures (and combinations thereof); mixed-structs for different combinations of the first two subcategories; nested-locks for synchronisation with nested locks and atomic sections; multiple-rw for multiple read-write access to the shared memory; prod-cons for variants of the traditional producer-consumer example with shared-memory access via pointers of mixed types.

Table 1. Summary of benchmarks features

The complementarity of the SV-COMP23 and EDR benchmarks can be observed in Table 1, which compares them in terms of different complexity metrics and feature coverage. The two groups of rows refer to SV-COMP23 (top) and EDR (bottom). The two groups of columns refer to common sources of complexity for program analysis in general (left) and features that are of particular interest for data-race detection (right). The reported measures are the average number of lines of code (LOC), cyclomatic complexity (CC), and number of threads (T), with starred values computed excluding instances with an infinite number of threads. The vertical bars represent the percentage of instances with specific characteristics, namely non-determinism (Nondet), pointers (Ptr), arrays (Arr), other composite data types such as structs or unions (Struct), non-trivial synchronisation (Sync), multiple shared-memory write operations (Multi), and pointer arithmetic (Ptr+).

As shown in the table, the SV-COMP23 benchmarks are not very representative of the sources of complexity specifically related to data race checking (top-right part of the table), and these always occur, when at all, together with generic elements of complexity (top-left part). The EDR set effectively counterbalances that by limiting generic sources of complexity (bottom-left) to focus on instances that are more interesting for race detection (bottom-right).

Setup. We evaluated CSeq-DR against a selection of four state-of-the-art data race checkers. Dartagnan [26, 50] is an SMT-based bounded model checker that leverages common LLVM [49] compiler optimisations to simplify the input program. Deagle [33, 65] is a SAT-based bounded model checker built on top of CBMC [12] with an efficient handling of concurrency and a tailored SAT decision procedure; it was the winner in the ConcurrencySafety category at SV-COMP 2023 [4], which subsumes the NoDataRace demo category of the previous edition of the competition. Ultimate GemCutter [45] is based on counterexample-guided abstraction refinement; it ranked first at SV-COMP 2022 [3] for the NoDataRace demo category. Goblint [61, 66] is a static analyser for data race checking based on thread-modular abstract interpretation. We used the following versions of the selected verifiers: Dartagnan 3.1.1 [15], Deagle 2.1 [16], GemCutter 0.2.2 [27], Goblint 1.8.2 [28].

We ran the experiments on an otherwise idle workstation equipped with a dual Xeon E5-2687W 8-core 3.10 GHz processor and 128 GB of memory, running 64-bit GNU/Linux 5.10.27, with a memory limit of 16 GB and a timeout of 15 min for each instance (as in SV-COMP). In terms of parameters, bounded model checking requires a default unwinding bound to be used whenever a precise number of iterations for a loop cannot be computed upfront. We set an unwinding bound of 3 for Dartagnan and CSeq-DR, observing that our tool fully unwinds a loop whenever a bound can be statically computed; Deagle does not allow setting the unwinding bound but hardcodes a specific unwinding strategy which is fine-tuned for the SV-COMP benchmarks. Our prototype also requires another bound for context-bounded analysis, which we set to 3 rounds. GemCutter and Goblint implement over-approximate analyses which require no bounds; for these two tools we adopted their default configurations.

Table 2. Verification verdicts (SV-COMP23)

Experimental Results (SV-COMP23). The experimental results on the SV-COMP23 benchmarks are summarised in Table 2. Here, the columns left to right report the subcategory, the total number of instances (Count), correct results (races found or confirmed race freedom) (Correct), incorrect results (races missed or false alarms) (Wrong), internal errors (i.e. the tool crashed, threw an error, or was unable to answer) (Error), and resource limit hits (Unknown). The maximum values for each subcategory are boxed. Our prototype CSeq-DR provides 665 correct verification verdicts and 0 incorrect verdicts, fails to produce an answer in 35 instances, and hits the resource limits on 83 instances (Footnote 2).

In the goblint-regression subset, CSeq-DR fails to analyse 10 programs due to unsupported pthread library functions, parsing issues, and other internal errors. The analysis turns out to be too expensive on 57 instances; 50 of these are specifically crafted examples with ten thousand threads on which all verifiers struggle, except Goblint itself (also see relevant entry in Table 1). CSeq-DR is unable to handle any of the 6 ldv-linux instances due to embedded assembly code, 1 instance of pthread-divine causing an internal error, and all the 18 instances of pthread-drv-races due to function pointers causing the function inlining module to crash. In pthread and pthread-C-DAC, our tool hits the resource limits on a total of 10 programs with large loops (up to one thousand iterations); the loop unfolding module is able to statically determine the loop bounds and fully unwind these loops, but the unfolded encoding ends up being too large to be analysed within the given resource limits. The pthread-complex subcategory is a small collection of programs with complex implementations of lock-free data structures whose analysis is notoriously difficult [38], and our tool does indeed struggle in 2 out of 4 instances. Interestingly, CSeq-DR is able to discover new issues in the remaining two instances, elimination_backoff_stack and workstealqueue_mutex-2, respectively containing well-known implementations of a stack [34] and a queue [25] (Footnote 3). Finally, CSeq-DR hits the resource limits in 1 instance of pthread-ext (Footnote 4), and on 13 weaver instances containing loops with non-deterministic exit conditions and dynamic allocation of blocks of non-deterministic size.

Dartagnan categorises 530 programs correctly, rejects 201 programs due to unsupported features and internal errors, times out on 48 instances, and incorrectly classifies 4 instances. Deagle produces 651 correct results, fails to provide a verdict in 121 cases due to unsupported syntax and internal errors, and times out on 11 instances. GemCutter correctly categorises 430 instances, with internal errors on 54 instances, and 299 timeouts. Goblint achieves 741 correct verification verdicts. However, it reports an incorrect verification verdict for 41 instances due to over-approximation (Footnote 5). The tool times out on a single instance.

Experimental Results (EDR). Table 3 reports the verification verdicts on the EDR benchmarks, divided by sub-category. For each sub-category, the results are split in two separate rows for unsafe (top) and safe instances (bottom). CSeq-DR correctly verifies all benchmarks.

Table 3. Verification verdicts (EDR)

Dartagnan misses 1 data race in arrays-ptrs due to type casting. It also misses races on struct-to-struct assignments, generating 4 incorrect results on structs-unions. Pointers cause 3 missed races on mixed-structs, and non-synchronised read-write access causes 7 incorrect results on nested-locks and 5 on multiple-rw. The tool hits the resource limits on 10 instances of prod-cons.

Deagle misses 12 races on arrays-ptrs due to pointer operations, type casting, aliasing, and arrays. It misses 9 races on structs-unions. On mixed-structs, it generates 3 false alarms and misses 8 races, totalling 11 errors. On nested-locks, it misses 2 races due to multiple shared-memory access, and rejects 4 programs due to use of locks occurring within atomic blocks. On multiple-rw, it misses 1 data race involving multiple writes to composite structures. Lastly, Deagle misses 6 races in prod-cons where the shared memory is accessed via pointers.

GemCutter generates 2 false alarms on arrays-ptrs caused by dereferenced null pointers. Although dereferencing a null pointer is in fact undefined behaviour, it does not strictly cause a data race, as null pointers are guaranteed to compare unequal to a pointer to any object or function [40]. The tool also misses 3 races on structs-unions and 7 on mixed-structs, and times out on 8 instances of prod-cons.

Goblint incorrectly classifies 10 programs in arrays-ptrs, missing 3 races involving pointer arithmetic, aliasing, and type casting, and generating 7 false alarms on array operations. On structs-unions, Goblint misses 7 races. On mixed-structs, it generates 8 false alarms and misses 3 races. On nested-locks, it generates 1 false alarm. On multiple-rw, it misses 1 race and generates 1 false alarm. Finally, it generates 6 false alarms on prod-cons.

Summary. The experiments demonstrate the superiority of our prototype in terms of data-race detection accuracy. In particular, CSeq-DR is the only tool that produces no false positives or negatives (Tables 2 and 3). The accuracy is particularly evident in the presence of sources of complexity that stress the memory representation, where all competitors struggle in many cases (Table 3).

On the SV-COMP23 benchmarks (Table 2), our approach proves to be competitive against the considered state-of-the-art tools. On programs with a large number of threads and complex control flow (e.g. some instances of goblint-regression), CSeq-DR hits the resource limits; however, it does spot two previously undetected data races in complex lock-free data structures. Additionally, CSeq-DR rejects or crashes on considerably fewer instances than the other tools, outperforming Deagle (winner in the ConcurrencySafety category of SV-COMP 2023), GemCutter (which ranked first in the NoDataRace demo category of SV-COMP 2022), and Dartagnan in terms of correct results.

Fig. 8. Analysis runtime comparison (SV-COMP23, EDR)

As for speed (Fig. 8), CSeq-DR outperforms GemCutter and Dartagnan. Goblint proves comparatively fast, but its overly conservative approximation yields numerous false alarms on both benchmarks, resulting in the highest overall number of incorrect verification verdicts (Tables 2 and 3). Deagle proves capable of fast analysis too, thanks in part to an unwinding strategy fine-tuned for SV-COMP23, but loses considerable precision on EDR (Table 3).

5 Related Work

As a recent trend in the development of programming languages and memory models, considerable effort has been devoted to balancing the conflicting desiderata of programmers, compiler developers, and hardware vendors by moving towards stricter semantics that limit the possibility of data races upfront. For instance, in data race freedom semantics, all data-race-free parts of a program are guaranteed to have sequential semantics [18]; other approaches let the compiler synchronise shared-memory access where races are likely [42], certify that some compiler optimisations will not introduce incorrectness [42, 57], or even disallow some of them [18]. Nevertheless, such efforts are hardly effective with, e.g., legacy code, low-level device drivers, and existing codebases in programming languages and platforms that are still in widespread use.

Program transformation to handle concurrency (or specific aspects thereof) is relied upon, among others, by preprocessors in the style of Rek [11] and early versions of CSeq [24], both implementing so-called eager sequentialisation to reduce to sequential reachability [47], and in the aforementioned semantics-preserving encodings from weak memory models to sequential consistency [1]. An early proof-of-concept implementation [14] of CSeq-DR could only handle basic memory access, achieving modest results (5th place with 6 false positives) at SV-COMP 2022 [3]. GemCutter also relies on program transformation for detecting races [17], but needs one auxiliary variable per global variable in the program, while we only introduce three variables for the whole program; similarly to [14], its analysis beyond basic memory access can be inaccurate. An extension of lazy sequentialisation for deadlock checking is proposed in [35].

Besides the ones considered in this paper (Sect. 4), static techniques for race detection usually rely on locksets to determine safe synchronisation of memory access [20, 21, 41, 58, 62, 63, 67, 69]. Known tools include Locksmith [58] and RELAY [67], which introduce relative locksets for scalability; these tools may return incorrect verdicts in the presence of pointers. Lockset-based analysis is usually over-approximated: it can prove the absence of races, but otherwise reports only potential races. Possible ways to reduce spurious warnings are considered in [41]. Static tools for other languages include LLOV [8] for OpenMP programs in C, C++, and FORTRAN [55], and RacerD [6] and Chord [54] for Java.
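To make the lockset idea concrete, the following is a minimal sketch of lockset refinement, not the algorithm of any of the tools cited above. It assumes that locks are numbered 0 to 31 and encoded as bits of a mask; the names lockset_t, shared_var_t, var_init and record_access are hypothetical and introduced only for illustration. Each shared variable keeps the set of locks that were held on every access seen so far; when that set becomes empty, no single lock consistently protects the variable and a potential race is reported.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit i is set when lock i is held. */
typedef uint32_t lockset_t;

/* Per-variable state: locks that protected every access so far. */
typedef struct {
    lockset_t candidates;
} shared_var_t;

/* Before any access, every lock is still a candidate protector. */
void var_init(shared_var_t *v) {
    v->candidates = ~(lockset_t)0;
}

/* Refine the candidate set with the locks held at this access.
   Returns true when no common lock remains, i.e. a potential
   (not necessarily feasible) race is flagged. */
bool record_access(shared_var_t *v, lockset_t held) {
    v->candidates &= held;
    return v->candidates == 0;
}
```

For instance, two accesses guarded by lock 0 followed by one guarded only by lock 1 are flagged, even if the third access is in fact ordered after the others by a thread join. The check does not reason about such orderings, which is one source of the spurious warnings mentioned above.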

Dynamic data race detection looks for conflicting memory access at runtime. Known tools include Pacer [7], which uses sampling strategies for performance improvement, ThreadSanitizer [63] for C++ and Go, ROMP [31] for parallel OpenMP applications, Nondeterminator [23] for the Cilk language, and TSVD [51], a thread-safety violation detector that injects delays into the program to expose races. Dynamic analysis can spot potential races in real software projects but, because thread interleaving is non-deterministic, it does so without a measurable coverage of the feasible behaviours of the system under analysis; on particularly critical software components, static analyses such as the one proposed in this paper can complement it with systematic coverage and greater accuracy, when feasible.

6 Conclusion

C programs are particularly vulnerable to subtle data races. We have addressed this problem with a technique that automatically annotates a program and, combined with lazy sequentialisation and bounded model checking, yields effective under-approximate data race detection.

Our prototype implementation has proved competitive with state-of-the-art technology, showing an unmatched precision of analysis in the presence of complex synchronisation patterns and particularly relevant language features such as shared composite data types and pointers. The approach can, in general, yield great detection accuracy at the cost of additional computational effort, which may be beneficial in the analysis of particularly critical software components. At the same time, our specific implementation has shown that context-bounded analysis can effectively mitigate the overhead introduced with our encoding.

Our program instrumentation makes it possible to build the set of target expressions via relatively inexpensive yet conservative static analysis, at the cost of additional overhead but with no detriment to detection accuracy. Our prototype refines the sets of visible expressions by recursively inspecting composite data structures, but stops short of performing any pointer analysis. More sophisticated static analyses can of course be plugged in to calculate the target expressions. We leave for future work the investigation of different trade-offs between a more precise static analysis for working out the target expressions and the overall performance of race detection. We also plan to explore the combination of our encoding with dynamic partial order reduction [46] for potential efficiency gains.

As is commonplace for under-approximated analyses, our approach can miss bugs if the bounds are insufficiently large. Nonetheless, out of 665 correct verification verdicts on the SV-COMP23 benchmarks, our prototype was able to compute static loop bounds and fully unfold 431 safe instances, basically failing to do so only on unbounded or non-deterministic loops. Also, it is empirically known that concurrency errors in real software typically occur within a few context switches [60] or a few memory operations [52]. In the future, we plan to experiment with alternative techniques to handle loops, such as k-induction [64], sequentialisation without unfolding [24, 47], and context-unbounded sequentialisation on top of modern back ends for unbounded analysis such as Kratos2 [30].
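The effect of an insufficient bound can be illustrated with a small hand-written sketch. This is not the output of our loop unfolding module; the function names sum_loop and sum_unwound3 and the complete flag are hypothetical, and the bound of 3 is an arbitrary choice. The loop is replaced by three nested copies of its body, and the flag records whether the unwinding covered every iteration.

```c
#include <stdbool.h>

/* Original loop: sums 0 + 1 + ... + (n - 1). */
int sum_loop(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

/* The same loop unwound by hand with bound 3 (illustrative only).
   *complete is set to true only if the loop would have exited
   within 3 iterations, i.e. the unwinding lost no behaviour. */
int sum_unwound3(int n, bool *complete) {
    int s = 0, i = 0;
    if (i < n) { s += i; i++;
        if (i < n) { s += i; i++;
            if (i < n) { s += i; i++; }
        }
    }
    *complete = !(i < n);
    return s;
}
```

With n = 3 the unwound version agrees with the loop and reports completeness, so a safe verdict is conclusive. With n = 5 the last two iterations are never explored, and any error reachable only there would be missed under this bound.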