BMC for Weak Memory Models

. We present Dartagnan , a bounded model checker (BMC) for concurrent programs under weak memory models. Its distinguishing feature is that the memory model is not implemented inside the tool but taken as part of the input. Dartagnan reads CAT , the standard language for memory models, which allows to deﬁne x86/ TSO , ARM v7, ARM v8, Power , C/C++, and Linux kernel concurrency primitives. BMC with memory models as inputs is challenging. One has to encode into SMT not only the program but also its semantics as deﬁned by the memory model. What makes Dartagnan scale is its relation analysis, a novel static analysis that signiﬁcantly reduces the size of the encoding. Dartagnan matches or even exceeds the performance of the model-speciﬁc veriﬁcation tools Nidhugg and CBMC , as well as the performance of Herd , a CAT -compatible litmus testing tool. Compared to the unoptimized encoding, the speed-up is often more than two orders of magnitude.


Introduction
When developing concurrency libraries or operating system kernels, performance and scalability of the concurrency primitives is of paramount importance.These primitives rely on the synchronization guarantees of the underlying hardware and the programming language runtime environment.The formal semantics of these guarantees are often defined in terms of weak memory models.There is considerable interest in verification tools that take memory models into account [5,9,13,22].
A successful approach to formalizing weak memory models is CAT [11,12,16], a flexible specification language in which all memory models considered so far can be expressed succinctly.CAT, together with its accompanying tool Herd [4], has been used to formalize the semantics not only of assembly for x86/TSO, Power, ARMv7 and ARMv8, but also high-level programming languages, such as C/C++, transactional memory extensions, and recently the Linux kernel concurrency primitives [11,15,16,18,20,24,29].This success indicates the need for universal verification tools that are not limited to a specific memory model.
We present Dartagnan [3], a bounded model checker that takes memory models as inputs.Dartagnan expects a concurrent program annotated with an assertion and a memory model for which the verification should be conducted.It verifies the assertion on those executions of the program that are valid under the given memory model and returns a counterexample execution if the verification fails.As is typical of BMC, the verification results hold relative to an unrolling bound [21].The encoding phase, however, is new.Not only the program but also its semantics as defined by the CAT model are translated into an SMT formula.
Having to take into account the semantics quickly leads to large encodings.To overcome this problem, Dartagnan implements a novel relation analysis, which can be understood as a static analysis of the program semantics as defined by the memory model.More precisely, CAT defines the program semantics in terms of relations between the events that may occur in an execution.Depending on constraints over these relations, an execution is considered valid or invalid.Relation analysis determines the pairs of events that may influence a constraint of the memory model.Any remaining pair can be dropped from the encoding.The analysis is compatible with optimized fixpoint encodings presented in [27,28].
The second novelty is the support for advanced programming constructs.We redesigned Dartagnan's heap model, which now has pointers and arrays.Furthermore, we enriched the set of synchronization primitives, including readmodify-write and read-copy-update (RCU) instructions [26].One motivation for this richer set of programming constructs is the Linux kernel memory model [15] that has recently been added to the kernel documentation [2].This model has already been used by kernel developers to find bugs in and clarify details of the concurrency primitives.Since the model is expected to be refined with further development of the kernel, verification tools will need to quickly accommodate updates in the specification.So far, only Herd [4] has satisfied this requirement.Unfortunately, it is limited to fairly small programs (litmus tests).The present version of Dartagnan offers an alternative with substantially better performance.
We present experiments on a series of benchmarks consisting of 4751 Linux litmus tests and 7 mutual exclusion algorithms executed on TSO, ARM, and Linux.Despite the flexibility of taking memory models as inputs, Dartagnan's performance is comparable to CBMC [13] and considerably better than that of Nidhugg [5,9].Both are model-specific tools.Compared to the previous version of Dartagnan [28] and compared to Herd [4], we gain a speed-up of more than two orders of magnitude, thanks to the relation analysis.

Related Work.
In terms of the verification task to be solved, the following tools are the closest to ours.CBMC [13] is a scalable bounded model checker supporting TSO, but not ARM.An earlier version also supported Power.
Nidhugg [5,9] is a stateless model checker supporting TSO, Power, and a subset of ARMv7.It is excellent for programs with a small number of executions.RCMC [22] implements a stateless model checking algorithm targeting C11.We cannot directly benchmark against it because the source code of the tool is not yet publicly available, nor do we fully support C11. Herd [4] is the only tool aside from ours that takes a CAT memory model as input.Herd does not scale well to programs with a large number of executions, including some of the Linux kernel tests.Other verification tasks (e.g., fence insertion to restore sequential consistency) are tackled by Memorax [6][7][8], offence [14], Fender [23], DFence [25], and trencher [19].
Relation Analysis on an Example.Consider the program (in the .litmusformat) given to the left in the figure below.The assertion asks whether there is a reachable state with final values EBX = 1, ECX = 0. We analyze the program under the x86-TSO memory model shown below the program.The semantics of the program under TSO is a set of executions.An execution is a graph, similar to the one given below, where the nodes are events and the edges correspond to the relations defined by the memory model.Events are instances of instructions that access the shared memory: R (loads), W (stores, including initial stores), and M (the union of both).The atomic exchange instruction xchg [x], EAX gives rise to a pair of read and write events related by a (dashed) rmw edge.Such reads and writes belong to the set A of atomic read-modify-write events.
The relations rf, co, and fr model the communication of instructions via the shared memory (reading from a write, coherence, overwriting a read).Their restrictions rfe, coe, and fre denote (external) communication between instructions from different threads.Relation po is the program order within the same thread and po-loc is its restriction to events addressing the same memory location.Edges of mfence relate events separated by a fence.Further relations are derived from these base relations.To belong to the TSO semantics of the program, an execution has to satisfy the constraints of the memory model: empty rmw ∩ (fre ; coe), which enforces atomicity of read-modify-write events, and the two acyclicity constraints.
Dartagnan encodes the semantics of the given program under the given memory model into an SMT formula.The problem is that each edge (a, b) that may be present in a relation r gives rise to a variable r(a, b).The goal of our relation analysis is to reduce the number of edges that need to be encoded.We illustrate this on the constraint acyclic ghb-tso.The graph next to the program shows the 14 (dotted and solid) edges which may contribute to the relation ghbtso.Of those, only the 6 solid edges can occur in a cycle.The dotted edges can be dropped from the SMT encoding.Our relation analysis determines the solid edges-edges that may have an influence on a constraint of the memory model.Additionally, ghb-tso is a composition of various subrelations (e.g., po-tso or co ∪ fr) that also require encoding into SMT.Relation analysis applies to subrelations as well.Applied to all constraints, it reduces the number of encoded edges for all (sub)relations from 221 to 58.

Input, Functionality, and Implementation
Dartagnan has the ambition of being widely applicable, from assembly over operating system code written in C/C++ to lock-free data structures.The tool accepts programs in PPC, x86, AArch64 assembly, and a subset of C11, all limited to the subsets supported by Herd's .litmusformat.It also reads our own .ptsformat with C11-like syntax [28].We refer to global variables as memory locations and to local variables as registers.We support pointers, i.e., a register may hold the address of a location.Addresses and values are integers, and we allow the same arithmetic operations for addresses as for regular integer values.Different synchronization mechanisms are available, including variants of readmodify-write, various fences, and RCU instructions [26].
We support the assertion language of Herd.Assertions define inequalities over the values of registers and locations.They come with quantifiers over the reachable states that should satisfy the inequalities.
We use the CAT language [11,12,16] to define memory models.A memory model consists of named relations between events that may occur in an execution.Whether or not an execution is valid is defined by constraints over these relations: CAT has a rich relational language, and we only show an excerpt above.Socalled base relations b model the control flow, data flow, and synchronization constraints.The language provides intuitive operators to derive further relations.One may define relations recursively by referencing named relations.Their semantics is the least fixpoint.
Dartagnan is invoked with two inputs: the program, annotated with an assertion over the final states, and the memory model.There are two optional parameters related to the verification.The SMT encoding technique for recursive relations is defined by mode chosen between knastertarski (default) and idl (see below).The parameter alias, chosen between none and andersen (default), defines whether to use an alias analysis for our relation analysis (cf.Sect.3).
Being a bounded model checker, Dartagnan computes an unrolled program with conditionals but no loops.It encodes this acyclic program together with the memory model into an SMT formula and passes it to the Z3 solver.The formula has the form ψ prog ∧ ψ assert ∧ ψ mm , where ψ prog encodes the program, ψ assert the assertion, and ψ mm the memory model.We elaborate on the encoding of the program and the memory model.The assertion is already given as a formula.
We model the heap by encoding a new memory location for each variable and a set of locations for each memory allocation of an array.Every location has an address encoded as an integer variable whose value is chosen by the solver.In an array, the locations are required to have consecutive addresses.Instances of instructions are modeled as events, most notably stores (to the shared memory) and loads (from the shared memory).
We encode relations by associating pairs of events with Boolean variables.Whether the pair (e 1 , e 2 ) is contained in relation r is indicated by the variable r(e 1 , e 2 ).Encoding the relations r 1 ∩ r 2 , r 1 ∪ r 2 , r 1 ; r 2 , r 1 \ r 2 and r −1 is straightforward [27].For recursively defined and (reflexive and) transitive relations, Dartagnan lets the user choose between two methods for computing fixed points by setting the appropriate parameter.The integer-difference logic (IDL) method encodes a Kleene iteration by means of integer variables (one for each pair of events) representing the step in which the pair was added to the relation [27].The Knaster-Tarski encoding simply looks for a post fixpoint.We have shown in [28] that this is sufficient for reachability analysis.

Relation Analysis
To optimize the size of the encoding (and the solving times), we found it essential to reduce the domains of the relations.We determine for each relation a static over-approximation of the pairs of events that may be in this relation.Even more, we restrict the relation to the set of pairs that may influence a constraint of the given memory model.These restricted sets are the relation analysis information (of the program relative to the memory model).Technically, we compute, for each relation r, two sets of event pairs, M (r) and A(r).The former contains socalled may pairs, pairs of events that may be in relation r.This does not yet take into account whether the may pairs occur in some constraint of the memory model.The active pairs A(r) incorporate this information, and hence restrict the set of may pairs.As a consequence of the relation analysis, we only introduce Boolean variables r(e 1 , e 2 ) for the pairs (e 1 , e 2 ) ∈ A(r) to the SMT encoding.
The algorithm for constructing the may set and the active set is a fixpoint computation.What is unconventional is that the two sets propagate their information in different directions.For A(r), the computation proceeds from the constraints and propagates information down the syntax tree of the CAT memory model.The sets M (r) are computed bottom-up the syntax tree.Interestingly, in our implementation, we do not compute the full fixpoint but let the top-down process trigger the required bottom-up computation.
Both sets are computed as least solutions to a common system of inequalities.As we work over powerset lattices (relations are sets after all), the order of the system will be inclusion.We understand each set M (r) and A(r) as a variable, thereby identifying it with its least solution.To begin with, we give the definition for A(r).In the base case, we have a relation r that occurs in a constraint of the memory model.The inequality is defined based on the shape of the constraint: For the emptiness constraint, all pairs of events that may be contained in the relation are relevant.If the constraint requires irreflexivity, what matters are the pairs (e, e).If the constraint requires acyclicity, we concentrate on the pairs (e 1 , e 2 ), where (e 1 , e 2 ) may be in relation r and (e 2 , e 1 ) may be in relation r + .Note how the definition of active pairs triggers the computation of may pairs.If the relation in the constraint is a composed one, the following inequalities propagate the information about the active pairs down the syntax tree of the CAT memory model: The definition maintains the invariant A(r) ⊆ M (r).If a pair (e 1 , e 2 ) is relevant to relation r = r −1 1 , then (e 2 , e 1 ) will be relevant to r 1 .We do not have to intersect We can avoid the intersection with the may pairs for the next case as well.There, A(r) ⊆ M (r) holds by the invariant and M (r) = M (r 1 ) ∩ M (r 2 ) by definition (see below).For union and the other case of subtraction, the intersection with M (r 1 ) is necessary.There are symmetric definitions for union and intersection for r 2 .For a relation r 1 that occurs in a relational composition r = r 1 ; r 2 , the pairs (e 1 , e 3 ) become relevant if they may be composed with a pair (e 3 , e 2 ) in r 2 to obtain a pair (e 1 , e 2 ) relevant to r.Note that for r 2 we again need the may pairs.The definition for r 2 is similar.The definition for the (reflexive and) transitive closure follows the ideas for relational composition.
The definition of the may sets follows the syntax of the CAT memory model bottom-up.With ⊕ ∈ {∪, ∩, ; } and ⊗ ∈ {+, * , −1}, we have: This simply executes the operator of the relation on the corresponding may sets.Subtraction (r 1 \ r 2 ) is the exception, it is not sound to over-approximate r 2 .At the bottom level, the may sets are determined by the base relations.They depend on the shape of the relations and the positions of the events in the control flow.The relations loc, co and rf are concerned with memory accesses.What makes it difficult to approximate these relations is our support for pointers and pointer arithmetic.Without further information, we have to conservatively assume that a memory event may access any address.To improve the precision of the may sets for loc, co, and rf, our fixpoint computation incorporates a may-alias analysis.We use a control-flow insensitive Andersen-style analysis [17].It incurs only a small overhead and produces a close over-approximation of the may sets.The analysis returns1 a set of pairs of memory events PTS ⊆ (W ∪ R) × (W ∪ R) such that every pair of events outside PTS definitely accesses different addresses.Here, W are the store events in the program and R are the loads.Note that the analysis has to be control-flow insensitive as the given memory model may be very weak [10].We have M (loc) ⊇ PTS.Similarly, M (co) and M (rf) are defined by PTS restricted to (W × W) and (W × R), respectively.
We stress the importance of the alias analysis for our relation analysis: loc, co, and rf are frequently used as building blocks of composite relations.Excessive may sets will therefore negatively affect the over-approximations of virtually all relations in a memory model, and keep the overall encoding unnecessarily large.
Illustration.We illustrate the relation analysis on the example from the introduction.Consider constraint acyclic ghb-tso.The computation of the active set for the relation ghb-tso triggers the calculation of the may set, following the inequality A(ghb-tso) ⊇ M (ghb-tso) ∩ M (ghb-tso + ) −1 .The may set is the union of the may sets for the subrelations, shown by colored (dotted and solid) edges.number of instructions.The executions in both memory models coincide, making the performance on ARM comparable to that on TSO for Nidhugg.The difference is due to the optimal exploration in TSO, but not in ARM.Relation analysis has some impact on the performance (see FMCAD-2018 vs. Dartagnan), but the encoding size still grows faster than the number of executions.
The benchmarks Burns, Bakery, and Lamport demonstrate the opposite trend: the number of executions grows much faster than the size of the encoding.Here, CBMC and Dartagnan outperform Nidhugg.Notice that for Burns, Nidhugg performs better on ARM than on TSO with unrolling bound 5.This is counter-intuitive since one expects more executions on ARM.Although the number of executions coincide, the exploration time is higher on TSO due to a different search algorithm.For Szymanski, similar results hold except for Dartagnan-ARM where the encoding grows exponentially.
Figure 2 (left) shows the verification times for the current version of Dartagnan with and without alias analysis.The alias analysis results in a speed-up of more than two orders of magnitude in benchmarks with several threads accessing up to 18 locations.Figure 2 (right) compares the performance of Dartagnan against Herd.We used the Knaster-Tarski encoding and alias analysis since they yield the best performance.Herd outperforms Dartagnan on small test instances (less than 1 s execution time).This is due to the JVM startup time and the preprocessing costs of Dartagnan.However, on large benchmarks, Herd times out while Dartagnan takes less than 10 s.