
1 Introduction

Denotational semantics defines the meaning of programs compositionally, where the meaning of a program term is a function of the meanings assigned to its immediate syntactic constituents. This key feature makes denotational semantics instrumental in understanding the meaning of a piece of code independently of the context under which the code will run. This style of semantics contrasts with standard operational semantics, which only executes closed/whole programs. A basic requirement of such a denotation function \(\llbracket {{-}}\rrbracket \) is for it to be adequate w.r.t. a given operational semantics: plugging program terms \(M\) and \(N\) with equal denotations—i.e. \(\llbracket {M}\rrbracket = \llbracket {N}\rrbracket \)—into some program context \(\Xi \left[ {{-}}\right] \) that closes over their variables results in observationally indistinguishable closed programs in the given operational semantics. Moreover, assuming that denotations have a defined order (\(\le \)), a “directed” version of adequacy ensures that \(\llbracket {M}\rrbracket \le \llbracket {N}\rrbracket \) implies that all behaviors exhibited by \(\Xi \left[ {M}\right] \) under the operational semantics are also exhibited by \(\Xi \left[ {N}\right] \).

For shared-memory concurrent programming, Brookes’s seminal work [13] defined a denotational semantics, where the denotation \(\llbracket {M}\rrbracket \) is a set of totally ordered traces of \(M\) closed under certain operations, called stutter and mumble. Traces consist of sequences of memory snapshots that \(M\) guarantees to provide, while relying on its environment for other memory snapshots. Brookes [12] used the insights behind this semantics to develop a semantic model for separation logic, and Turon and Wand [46] used them to design a separation logic for refinement. Additionally, Xu et al. [48] used traces as a foundation for the Rely/Guarantee approach for verification of concurrent programs, and Liang et al. [34, 35] used a trace-based program logic for refinement.

A memory model decides what outcomes are possible from the execution of a program. Brookes established the adequacy of the trace-based denotational semantics w.r.t. the operational semantics of the strongest model, known as sequential consistency (SC), where every memory access happens instantaneously and immediately affects all concurrent threads. However, SC is too strong to model real-world shared memory, whether it be of modern hardware, such as x86-TSO [40, 44] and ARM, or of programming languages such as C/C++ and Java [4, 37]. These runtimes follow weak memory models that allow performant implementations, but admit more behaviors than SC.

Do weak memory models admit adequate Brookes-style denotational semantics? This question has been answered affirmatively once, by Jagadeesan et al. [25], who closely followed Brookes to define denotational semantics for x86-TSO. Other weak memory models, in particular, models of programming languages, and non-multi-copy-atomic models, where writes can be observed by different threads in different orders, have so far been out of reach of Brookes’s totally ordered traces, and were only captured by much more sophisticated models based on partial orders [15, 19, 24, 26, 28, 41].

In this paper we target the Release/Acquire memory model (RA, for short). This model, obtained by restricting the C/C++11 memory model to Release/Acquire atomics, is a well-studied fundamental memory model weaker than x86-TSO, which, roughly speaking, ensures “causal consistency” together with “per-location-SC” and “RMW (read-modify-write) atomicity” [29, 30]. These assurances make RA sufficiently strong for implementing common synchronization idioms. RA allows more performant implementations than SC, since, in particular, it allows the reordering of a write followed by a read from a different location, which is commonly performed by hardware, and it is non-multi-copy-atomic, thus allowing less centralized architectures like POWER [45].

Our first contribution is a Brookes-style denotational semantics for RA. As Brookes’s traces are totally ordered, this result may seem counterintuitive. The standard semantics for RA is a declarative (a.k.a. axiomatic) memory model, in the form of acyclicity consistency constraints over partially ordered candidate execution graphs. Since these graphs are not totally ordered, one might expect that Brookes’s traces are insufficient. Nevertheless, our first key observation is that an operational presentation of RA as an interleaving semantics of a weak memory system lends itself to Brookes-style semantics. To this end, we develop a notion of traces compatible with Kang et al.’s “view-based” machine [27], an operational semantics that is equivalent to RA’s declarative formulation. Our main technical result is the (directed) adequacy of the proposed Brookes-style semantics w.r.t. that operational semantics of RA.

A main challenge when developing a denotational semantics lies in making it sufficiently abstract. While full abstraction is often out of reach, as a yardstick, we want our semantics to be able to justify various compiler transformations/optimizations that are known to be sound under RA [47]. Indeed, an immediate practical application of a denotational semantics is the ability to provide local formal justifications of program transformations, such as those performed by optimizing compilers. In this setting, to show that an optimization \(N\twoheadrightarrow M\) is valid amounts to showing that replacing \(N\) by \(M\) anywhere in a larger program does not introduce new behaviors, which follows from \(\llbracket {M}\rrbracket \le \llbracket {N}\rrbracket \) given a directionally adequate denotation function \(\llbracket {{-}}\rrbracket \).

To support various compiler transformations, we close our denotations under certain operations, including analogs to Brookes’s stutter and mumble, but also several RA-specific operations, that allow us to relate programs which would naively correspond to rather different sets of traces. Given these closure operations, our semantics validates standard program transformations, including structural transformations, algebraic laws of parallel programming, and all known thread-local RA-valid compiler optimizations. Thus, the denotational semantics is instrumental in formally establishing validity of transformations under RA, which is a non-trivial task [19, 47].

Our second contribution is to connect the core semantics of parallel programming languages exhibiting weak behaviors to the more standard semantic account for programming languages with effects. Brookes presented his semantics for a simple imperative WHILE language, but Benton et al. [6] and Dvir et al. [20] later recast it atop Moggi’s monad-based approach [38], which uses a functional, higher-order core language. In this approach the core language is modularly extended with effect constructs to denote program effects. In particular, we define parallel composition as a first-class operator. This is in contrast to most research on weak memory models, which employs imperative languages and assumes a single top-level parallel composition.

A denotational semantics given in this monadic style comes ready-made with a rich semantic toolkit for program denotation [7], transformations [5, 8,9,10, 23], reasoning [2, 36], etc. We challenge and reuse this diverse toolkit throughout the development. We follow a standard approach and develop specialized logical relations to establish the compositionality property of our proposed semantics; its soundness, which allows one to use the denotational semantics to show that certain outcomes are impossible under RA; and adequacy. This development puts weak memory models, which often require bespoke and highly specialized presentations, on a similar footing to many other programming effects.

Outline. In §2 we lay the groundwork for the rest of the paper by introducing the programming language that we will use (§2.1), the main ideas that underpin Brookes’s trace-based denotational semantics (§2.2), and the operational RA model (§2.3). In §3 we present the core aspects of our denotational semantics. First, we discuss our extension of RA’s operational semantics with first-class parallelism, which enables denotations to be defined for concurrent composition (§3.1). We then present RA traces (§3.2) and use them to define the denotations of key program constructs (§3.3). Next, we show how the restriction of traces within denotations (§3.4) and the addition of closure operations (§3.5) make our denotational semantics more abstract. The denotational semantics extends to the entire programming language standardly using Moggi’s monad-based approach (§3.6). With the denotational semantics in place, we present our main results in §4. Finally, we conclude and discuss related work in §5. More details are available in the extended version of this paper [21].

2 Preliminaries

We first introduce the language and its operational semantics under the Sequential Consistency (SC) memory model (§2.1). We then outline Brookes’s denotational semantics for SC (§2.2). Finally, we introduce Kang et al.’s operational presentation of Release/Acquire (RA) (§2.3).

2.1 Language and Operational Semantics

The programming language we use is an extension of a functional language with shared-state constructs. Program terms \(M\) and \(N\) can be composed sequentially explicitly as \(M\mathbin {\boldsymbol{;}}N\) or implicitly by left-to-right evaluation in the pairing construct \(\langle {M,N}\rangle \). They can be composed in parallel as \(M\boldsymbol{\parallel }N\). We assume preemptive scheduling, thus imposing no restrictions on the interleaving of execution steps between parallel threads. To introduce the memory-access constructs, we present the well-known message passing litmus test, adapted to the functional setting:

$$\begin{aligned} {\left( {\texttt{x}\mathbin {\boldsymbol{:=}}1 \mathbin {\boldsymbol{;}}\texttt{y}\mathbin {\boldsymbol{:=}}1}\right) \boldsymbol{\parallel }\langle {\texttt{y}\boldsymbol{?},\texttt{x}\boldsymbol{?}}\rangle } \end{aligned}$$
(MP)

Here, \(\texttt{x}\) and \(\texttt{y}\) refer to distinct shared memory locations. Assignment \(\ell \mathbin {\boldsymbol{:=}}v\) stores the value \(v\) at location \(\ell \) in memory, and dereference \(\ell \boldsymbol{?}\) loads a value from \(\ell \). The language also includes atomic read-modify-write (RMW) constructs. For example, assuming integer storable values, \(\textrm{FAA}\left( {\ell ,v}\right) \) (Fetch-And-Add) atomically adds \(v\) to the value stored in \(\ell \). In contrast, interleaving is permitted between the dereferencing, adding, and storing in \(\ell \mathbin {\boldsymbol{:=}}\left( {\ell \boldsymbol{?} + v}\right) \). The precise behavior of the memory-access constructs is dictated by the underlying memory model.

In the functional setting, execution results in a returned value: \(\ell \mathbin {\boldsymbol{:=}}v\) returns the unit value \(\langle {}\rangle \), i.e. the empty tuple; \(\ell \boldsymbol{?}\), and the RMW constructs such as \(\textrm{FAA}\left( {\ell ,v}\right) \), return the loaded value; \(M\mathbin {\boldsymbol{;}}N\) returns what \(N\) returns; and \(\langle {M,N}\rangle \), as well as \(M\boldsymbol{\parallel }N\), return the pair consisting of the return value of \(M\) and the return value of \(N\). We assume left-to-right execution of pairs, so in the (MP) example \(\langle {\texttt{y}\boldsymbol{?},\texttt{x}\boldsymbol{?}}\rangle \) steps to \(\langle {v,\texttt{x}\boldsymbol{?}}\rangle \) for a value \(v\) that can be loaded from \(\texttt{y}\), and \(\langle {v,\texttt{x}\boldsymbol{?}}\rangle \) steps to \(\langle {v,w}\rangle \) for a value \(w\) that can be loaded from \(\texttt{x}\). In between, the left side of the parallel composition \((\boldsymbol{\parallel })\) can take steps.

We can use intermediate results in subsequent computations via let binding: \(\textbf{let}\,a=M\,\textbf{in}\, N\) binds the result of \(M\) to \(a\) in \(N\). Thus, we execute \(M\) first, and substitute the resulting value \(V\) for \(a\) in \(N\) before executing \({N}[a \mapsto V]\). Similarly, we deconstruct pairs by matching: \(\mathop {\textbf{match}}M\mathbin {\textbf{with}} \langle {a,b}\rangle \!.\, N\) binds the components of the pair that \(M\) returns to \(a\) and \(b\) respectively in \(N\). The first and second projections \(\textbf{fst}\) and \(\textbf{snd}\), as well as the operation \(\textbf{swap}\) that swaps the pair constituents, are defined using \(\textbf{match}\) standardly.

Sequential consistency. In the strongest memory model of Sequential Consistency (SC), every value stored is immediately made available to every thread, and every dereference must load the latest stored value. Thus the underlying memory model uses maps from locations to values for the memory state that evolves during program execution. Given an initial state, the behavior of a program in SC depends only on the choice of interleaving of steps. Though any such map can serve as an initial state, litmus tests are traditionally designed with the memory that initializes all locations to 0 in mind. In (MP) the order of the two stores and the two loads ensures that executions under SC may return \(\langle {\langle {}\rangle ,\langle {0,0}\rangle }\rangle \), \(\langle {\langle {}\rangle ,\langle {0,1}\rangle }\rangle \), and \(\langle {\langle {}\rangle ,\langle {1,1}\rangle }\rangle \), but not \(\langle {\langle {}\rangle ,\langle {1,0}\rangle }\rangle \).
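As a sanity check, the SC reading of (MP) can be simulated in a few lines. The following Haskell sketch (ours, not the paper’s formalism; all names are our own) enumerates every interleaving of the two threads and collects the pairs the right thread may load:

```haskell
import qualified Data.Map as M
import Data.List (nub)

type Loc = String
type Mem = M.Map Loc Int

data Act = Store Loc Int | Load Loc

-- All interleavings of two threads' action sequences, keeping each thread's order.
interleavings :: [a] -> [a] -> [[a]]
interleavings [] ys = [ys]
interleavings xs [] = [xs]
interleavings (x:xs) (y:ys) =
  map (x:) (interleavings xs (y:ys)) ++ map (y:) (interleavings (x:xs) ys)

-- Execute one interleaving under SC, recording each loaded value in order.
run :: Mem -> [Act] -> [Int]
run _ []               = []
run m (Store l v : as) = run (M.insert l v m) as
run m (Load l    : as) = (m M.! l) : run m as

-- Outcomes of (MP): the values the right thread loads from y and then x.
mp :: [[Int]]
mp = nub [ run m0 s | s <- interleavings [Store "x" 1, Store "y" 1]
                                         [Load "y", Load "x"] ]
  where m0 = M.fromList [("x", 0), ("y", 0)]
-- ghci> mp  -- yields [0,0], [0,1], and [1,1], but never [1,0]
```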

Observations. An observable behavior of an entire program is a value it may evaluate to from given initial memory values. While programs may internally interact with and observe the memory, we do not take the memory itself to be directly observable.

2.2 Overview of Brookes’s Trace-based Semantics

Observable behavior as defined for whole programs is too crude to study program terms that can interact with the program context within which they run. Indeed, compare \(M_1\) defined as \(\texttt{x}\mathbin {\boldsymbol{:=}}1 \mathbin {\boldsymbol{;}}1\) versus \(M_2\) defined as \(\texttt{x}\mathbin {\boldsymbol{:=}}1 \mathbin {\boldsymbol{;}}\texttt{x}\boldsymbol{?}\). Under SC, the difference between them as whole programs is unobservable: starting from any initial state both return 1. Now consider them within the program context \({-}\boldsymbol{\parallel }\texttt{x}\mathbin {\boldsymbol{:=}}2\). That is, compare \(M_1 \boldsymbol{\parallel }\texttt{x}\mathbin {\boldsymbol{:=}}2\) versus \(M_2 \boldsymbol{\parallel }\texttt{x}\mathbin {\boldsymbol{:=}}2\). In the first, \(M_1\) still always returns 1; but in the second, \(M_2\) can also return 2 by interleaving the store of 2 in \(\texttt{x}\) immediately after the store of 1 in \(\texttt{x}\). Thus, if \(\llbracket {M}\rrbracket \), i.e. \(M\)’s denotation, were to simply map initial states to possible results according to executions of \(M\), we could not define \(\llbracket {M\boldsymbol{\parallel }N}\rrbracket \) in terms of \(\llbracket {M}\rrbracket \) and \(\llbracket {N}\rrbracket \) alone, because we would have \(\llbracket {M_1}\rrbracket = \llbracket {M_2}\rrbracket \) but also \(\llbracket {M_1 \boldsymbol{\parallel }\texttt{x}\mathbin {\boldsymbol{:=}}2}\rrbracket \ne \llbracket {M_2 \boldsymbol{\parallel }\texttt{x}\mathbin {\boldsymbol{:=}}2}\rrbracket \). We conclude that \(\llbracket {M}\rrbracket \) must contain more information on \(M\) than an “input-output” relation; it must account for interference by the environment.

Adequacy in SC. A prominent approach to define compositional semantics for concurrent programs is due to Brookes [13], who defined a denotational semantics for SC by taking \(\llbracket {M}\rrbracket \) to be a set of traces of \(M\) closed under certain rewrite rules as we detail below. Brookes established a (directional) adequacy theorem: if \(\llbracket {M}\rrbracket \supseteq \llbracket {N}\rrbracket \) then the transformation \(M\twoheadrightarrow N\) is valid under SC. The latter means that, when assuming SC-based operational semantics, \(M\) can be replaced by \(N\) within a program without introducing new observable behaviors for it. Thus, adequacy formally grounds the intuition that the denotational semantics soundly captures behavior of program terms.

As a particular practical benefit, formal and informal simulation arguments which are used to justify transformations in operational semantics can be replaced by cleaner and simpler proofs based on the denotational semantics. For example, a simple argument shows that \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \supseteq \llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) holds in Brookes’s semantics. Thanks to adequacy, this justifies Write-Write Elimination (WW-Elim) \(\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w\twoheadrightarrow \texttt{x}\mathbin {\boldsymbol{:=}}w\) in SC.

Traces in SC. In Brookes’s semantics, a program term is denoted by the set of traces, each trace consisting of a sequence of transitions. Each transition is of the form \(\langle {{\mu }, {\rho }}\rangle \), where \(\mu \) and \(\rho \) are memories, i.e. maps from locations to values. A transition describes a program term’s execution relying on a memory state \(\mu \) in order to guarantee the memory state \(\rho \).

For example, \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) includes all traces of the form \(\langle {{\rho }, {\rho \left[ {\texttt{x} := w}\right] }}\rangle \), where \(\rho \left[ {\texttt{x} := w}\right] \) is equal to \(\rho \) except for mapping \(\texttt{x}\) to \(w\). The definition is compositional: the traces in \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) are obtained from sequential compositions of traces from \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v}\rrbracket \) with traces from \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \), obtaining all traces of the form \(\langle {{\mu }, {\mu \left[ {\texttt{x} := v}\right] }}\rangle \langle {{\rho }, {\rho \left[ {\texttt{x} := w}\right] }}\rangle \). Such a trace relies on \(\mu \) in order to guarantee \(\mu \left[ {\texttt{x} := v}\right] \), and then relies on \(\rho \) in order to guarantee \(\rho \left[ {\texttt{x} := w}\right] \). Allowing \(\rho \ne \mu \left[ {\texttt{x} := v}\right] \) reflects the possibility of environment interference between the two store instructions. Indeed, when denoting parallel composition \(\llbracket {M\boldsymbol{\parallel }N}\rrbracket \) we include all traces obtained by interleaving transitions from a trace from \(\llbracket {M}\rrbracket \) with transitions from a trace from \(\llbracket {N}\rrbracket \). By sequencing and interleaving, one subterm’s guarantee can fulfill the requirement which another subterm relies on. They may also delegate reliances and guarantees to their mutual context.

In the functional setting, executions not only modify the state but also return values. In this setting, traces are pairs, which we write as \(\left( {\xi ,r}\right) \), where \(\xi \) is the sequence of transitions and \(r\) represents the final value that the program term guarantees to return [6]. For example, the semantics of dereference \(\llbracket {\texttt{x}\boldsymbol{?}}\rrbracket \) includes all traces of the form \(\left( {\langle {{\mu }, {\mu }}\rangle ,\mu \left( {\texttt{x}}\right) }\right) \). Indeed, the execution of \(\texttt{x}\boldsymbol{?}\) does not change the memory and returns the value loaded from \(\texttt{x}\). In the semantics of assignment \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v}\rrbracket \), instead of \(\langle {{\rho }, {\rho \left[ {\texttt{x} := v}\right] }}\rangle \) we have \(\left( {\langle {{\rho }, {\rho \left[ {\texttt{x} := v}\right] }}\rangle ,\langle {}\rangle }\right) \).
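To fix intuitions, here is a minimal Haskell sketch (our rendering; plain pairs stand in for the paper’s trace notation) of functional-setting traces and their sequential composition:

```haskell
import qualified Data.Map as M

type Mem        = M.Map String Int
type Transition = (Mem, Mem)        -- rely on the first, guarantee the second
type Trace r    = ([Transition], r) -- transition sequence xi and return value r

-- Sequential composition concatenates the sequences; the gap between them
-- already models possible environment interference, as in the two-transition
-- traces of [[x := v ; x := w]].
seqT :: Trace a -> Trace b -> Trace (a, b)
seqT (xi, a) (eta, b) = (xi ++ eta, (a, b))

-- Generating traces of [[x := w]]: rely on any rho, guarantee rho[x := w].
assignT :: String -> Int -> Mem -> Trace ()
assignT x w rho = ([(rho, M.insert x w rho)], ())
```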

Rewrite rules in SC. Were denotations in Brookes’s semantics defined to only include the traces explicitly mentioned above, the semantics would not be abstract enough to justify (WW-Elim), which eliminates redundant writes. Indeed, we only saw traces with two transitions in \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \), but in \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) we saw traces with one. The semantics would still be adequate, but it would lack abstraction. This is where Brookes’s second main idea comes into play: making the denotations more abstract by closing them under two operations that rewrite traces:

  • Stutter adds a transition of the form \(\langle {{\mu }, {\mu }}\rangle \) anywhere in the trace. Intuitively, a program term can always guarantee what it relies on.

  • Mumble combines two consecutive transitions of the form \(\langle {{\mu }, {\rho }}\rangle \langle {{\rho }, {\theta }}\rangle \) into a single transition \(\langle {{\mu }, {\theta }}\rangle \) anywhere in the trace. Intuitively, a program term can always omit a guarantee to the environment, and rely on its own omitted guarantee instead of relying on the environment.

Denotations in Brookes’s semantics are defined to be sets of traces closed under rewrite rules: applying a rewrite to a trace in the set results in a trace that is also in the set. For example, \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) is the least closed set with all traces of the form \(\left( {\langle {{\rho }, {\rho \left[ {\texttt{x} := w}\right] }}\rangle ,\langle {}\rangle }\right) \), and \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) is the least closed set with all sequential compositions of traces from \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v}\rrbracket \) with traces from \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \).
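Continuing the sketch above, the two rewrites can be rendered as one-step operations on transition sequences; a denotation is then the least set containing its generating traces and closed under them. The helper names are ours:

```haskell
-- Stutter: insert a <mu,mu> transition at position i.
stutter :: Mem -> Int -> [Transition] -> [Transition]
stutter mu i xi = let (pre, post) = splitAt i xi in pre ++ [(mu, mu)] ++ post

-- Mumble: merge adjacent <mu,rho><rho,theta> into <mu,theta>, anywhere.
mumbles :: [Transition] -> [[Transition]]
mumbles xi =
  [ pre ++ [(mu, theta)] ++ post
  | i <- [0 .. length xi - 2]
  , let (pre, (mu, rho) : (rho', theta) : post) = splitAt i xi
  , rho == rho' ]
```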

Closure under these rules makes traces in \(\llbracket {M}\rrbracket \) correspond precisely to interrupted executions of \(M\), which are executions of \(M\) in which the memory can arbitrarily change between steps of execution. Each transition \(\langle {{\mu }, {\rho }}\rangle \) in a trace in \(\llbracket {M}\rrbracket \) corresponds to multiple execution steps of \(M\) that transition \(\mu \) into \(\rho \), and each gap between transitions accounts for possible environment interruption. The rewrite rules maintain this correspondence: stutter corresponds to taking 0 steps, and mumble corresponds to taking \(n+m\) steps instead of taking \(n\) steps and then \(m\) steps when the environment did not change the memory in between. Brookes’s adequacy proof is based on this precise correspondence. In particular, the single-pair traces in \(\llbracket {M}\rrbracket \) correspond to the (uninterrupted) executions, the “input-output” relation, of \(M\).

Abstraction in SC. Brookes’s semantics is fully abstract, meaning that the converse to adequacy also holds: if \(N\twoheadrightarrow M\) is valid under SC, then \(\llbracket {N}\rrbracket \supseteq \llbracket {M}\rrbracket \). However, Brookes’s proof relies on an artificial program construct, \(\textbf{await}\), that permits waiting for a specified memory snapshot and then stepping (atomically) to a second specified memory snapshot. Thus, in realistic languages, where this construct is unavailable, Brookes’s full abstraction proof does not apply.

Nevertheless, even without full abstraction, one can still provide evidence that an adequate semantics is abstract by ensuring that it supports known transformations. As an example, we show directly that \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \supseteq \llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) holds in Brookes’s semantics. Since \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) is closed, it suffices to show that \(\left( {\langle {{\mu }, {\mu \left[ {\texttt{x} := w}\right] }}\rangle ,\langle {}\rangle }\right) \in \llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) for every memory \(\mu \). For a memory \(\mu \), we have \(\left( {\langle {{\mu }, {\mu \left[ {\texttt{x} := v}\right] }}\rangle \langle {{\rho }, {\rho \left[ {\texttt{x} := w}\right] }}\rangle ,\langle {}\rangle }\right) \in \llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) for every memory \(\rho \), in particular when \(\rho = \mu \left[ {\texttt{x} := v}\right] \). Since \(\rho \left[ {\texttt{x} := w}\right] = \mu \left[ {\texttt{x} := v}\right] \left[ {\texttt{x} := w}\right] = \mu \left[ {\texttt{x} := w}\right] \), we have \(\left( {\langle {{\mu }, {\mu \left[ {\texttt{x} := v}\right] }}\rangle \langle {{\mu \left[ {\texttt{x} := v}\right] }, {\mu \left[ {\texttt{x} := w}\right] }}\rangle ,\langle {}\rangle }\right) \in \llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \). After applying mumble, we have \(\left( {\langle {{\mu }, {\mu \left[ {\texttt{x} := w}\right] }}\rangle ,\langle {}\rangle }\right) \in \llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \), as required.

2.3 Overview of Release/Acquire Operational Semantics

Memory accesses in RA are more subtle than in SC. To address this we adopt Kang et al.’s “view-based” machine [27], an operational presentation of RA proven to be equivalent to the original declarative formulation of RA [e.g. 30]. In this model, rather than the memory holding only the latest value written to each location, the memory accumulates a set of memory update messages for each location. Each thread maintains its own view that captures which messages the thread can observe, and is used to constrain the messages that the thread may read and write. The messages in the memory carry views as well, which are inherited from the thread that wrote the message, and passed to any thread that reads the message. Thus views indirectly maintain a causal relationship between messages in memory throughout the evolution of the system.

More concretely, causality is enforced by timestamping messages, thus placing them on their location’s timeline. To capture the atomicity of RMWs, each message occupies a half-open segment \((q,t]\) on its location’s timeline, where \(t\) is the message’s timestamp. Such a message dovetails with a message at the same location whose timestamp is \(q\). An RMW “modifies” a message by dovetailing with it.

A view \(\kappa \) associates a timestamp \(\kappa (\ell )\) to each location \(\ell \), obscuring the portion of \(\ell \)’s timeline before \(\kappa (\ell )\). The view points to a message at \(\ell \) with timestamp \(\kappa (\ell )\). A view \(\omega \) dominates a view \(\alpha \), written \(\alpha \le \omega \), if \(\alpha (\ell ) \le \omega (\ell )\) for every \(\ell \).

Messages point to messages via the view they carry, and must point to themselves. So when specifying a message, the value its view takes at its location may be omitted. For example, assuming two locations, \(\texttt{x}\) and \(\texttt{y}\), consider the message at location \(\texttt{x}\) that carries the value 1, occupies the segment \((.5,1.7]\) on \(\texttt{x}\)’s timeline, and carries the view \(\kappa \) such that \(\kappa (\texttt{x}) = 1.7\) and \(\kappa (\texttt{y}) = 3.5\). An example memory is depicted on the top of Figure 1.
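The following Haskell sketch (our names; `Rational` timestamps stand in for the dense timeline) records the data involved: messages with segments and views, view domination, and dovetailing.

```haskell
import qualified Data.Map as M

type Loc  = String
type Val  = Int
type Time = Rational               -- dense, per-location timeline

type View = M.Map Loc Time         -- obscures everything strictly below it

data Msg = Msg
  { mLoc  :: Loc
  , mVal  :: Val
  , mSeg  :: (Time, Time)          -- half-open segment (q, t]; t is the timestamp
  , mView :: View                  -- must point to the message itself at mLoc
  } deriving (Eq, Show)

type Memory = [Msg]

timestamp :: Msg -> Time
timestamp = snd . mSeg

-- omega dominates alpha when it is pointwise at least as large.
dominates :: View -> View -> Bool
dominates omega alpha =
  and [ t <= M.findWithDefault 0 l omega | (l, t) <- M.toList alpha ]

-- nu dovetails with eps when nu's segment starts at eps's timestamp.
dovetails :: Msg -> Msg -> Bool
dovetails nu eps = mLoc nu == mLoc eps && fst (mSeg nu) == timestamp eps
```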

Fig. 1. Illustrations of a memory (top) and a trace (bottom), in the setting of two memory locations, \(\texttt{x}\) and \(\texttt{y}\). Top: A memory holding six messages. The timelines are purposefully misaligned and not to scale to emphasize that timestamps for different locations are incomparable and that only the order between them is relevant. The graph structure that the views impose is illustrated by arrows pointing between messages. Messages that are not dovetailed are set apart, e.g. \(\nu _3\) dovetails with \(\nu _2\), which does not dovetail with \(\nu _1\). Bottom: A trace with two transitions. The memory illustrated on top is \(\rho _2\). Messages and edges that are not part of a previous memory are highlighted. The local messages are \(\nu _2\) and \(\nu _3\), and the rest are environment messages.

When a thread writes to \(\ell \), it must increase the timestamp its view associates with \(\ell \) and use its new view as the message’s view. The message’s segment must not overlap with any other segment on \(\ell \)’s timeline. In particular, only one message can ever dovetail with a given message. A thread can only read from revealed messages, and when it reads, its view increases as needed to dominate the view of the loaded message. This may obscure messages at other locations.
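Continuing the sketch, here is a hypothetical rendering of the thread-local load and store steps; `lub` (pointwise maximum of views) is our helper, and the overlap test mirrors half-open segments.

```haskell
lub :: View -> View -> View
lub = M.unionWith max

-- Load any message at l that the view does not obscure; the thread's view
-- grows to dominate the view the message carries.
loadSteps :: Memory -> View -> Loc -> [(Val, View)]
loadSteps mem kappa l =
  [ (mVal m, kappa `lub` mView m)
  | m <- mem, mLoc m == l
  , timestamp m >= M.findWithDefault 0 l kappa ]

-- Store at l with a chosen segment (q,t]: t must advance the thread's view,
-- the segment must not overlap existing ones on l's timeline, and the new
-- message carries the thread's updated view.
storeStep :: Memory -> View -> Loc -> Val -> (Time, Time)
          -> Maybe (Memory, View)
storeStep mem kappa l v (q, t)
  | t <= M.findWithDefault 0 l kappa                       = Nothing
  | or [ overlap (mSeg m) (q, t) | m <- mem, mLoc m == l ] = Nothing
  | otherwise = Just (Msg l v (q, t) kappa' : mem, kappa')
  where
    kappa' = M.insert l t kappa
    overlap (q1, t1) (q2, t2) = q1 < t2 && q2 < t1
```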

Revisiting the (MP) litmus test, starting with a memory with a single message holding 0 at each location, and with all views pointing to the timestamps of these messages, suppose the right thread loaded 1 from \(\texttt{y}\), as depicted on the left side of Figure 2. Such a message can only be available if the left thread stored it. Before storing 1 to \(\texttt{y}\), the left thread stored 1 to \(\texttt{x}\), obscuring the initial \(\texttt{x}\) message. The right thread inherits this limitation through the causal relationship, so it will not be able to load 0 from \(\texttt{x}\). Therefore, RA forbids the outcome \(\langle {\langle {}\rangle ,\langle {1,0}\rangle }\rangle \).

Fig. 2. Depictions of a step during an execution of a litmus test, with the view of the right thread changing from \(\sigma \) to \(\sigma '\). The value each message carries is in its bottom-right corner. Views are illustrated implicitly in the graph structure that they impose. Obscured messages are faded. Left: As the right thread in (MP) loads 1 from \(\texttt{y}\), it inherits the view of \(\epsilon _1\), obscuring \(\nu _0\). Right: The right thread in (SB) loading 0 from \(\texttt{x}\). Storing \(\epsilon _1\) did not obscure \(\nu _0\).

In contrast, consider the litmus test known as store buffering:

$$\begin{aligned} {\left( {\texttt{x}\mathbin {\boldsymbol{:=}}1 \mathbin {\boldsymbol{;}}\texttt{y}\boldsymbol{?}}\right) \boldsymbol{\parallel }\left( {\texttt{y}\mathbin {\boldsymbol{:=}}1 \mathbin {\boldsymbol{;}}\texttt{x}\boldsymbol{?}}\right) } \end{aligned}$$
(SB)

By considering the possible interleavings, one can check that no execution in SC returns \(\langle {0,0}\rangle \). However, in RA some do. Indeed, even if the left thread stores to \(\texttt{x}\) before the right thread loads from \(\texttt{x}\), the right thread’s view allows it to load 0, as depicted on the right side of Figure 2.

We can recover the SC behavior by interspersing fences between sequenced memory accesses, which we model with \(\textrm{FAA}\left( {\texttt{z},0}\right) \) to a fresh location \(\texttt{z}\). Thus, compare (SB) to the store buffering with fences litmus test:

$$\begin{aligned} {\left( {\texttt{x}\mathbin {\boldsymbol{:=}}1 \mathbin {\boldsymbol{;}}\textrm{FAA}\left( {\texttt{z},0}\right) \mathbin {\boldsymbol{;}}\texttt{y}\boldsymbol{?}}\right) \boldsymbol{\parallel }\left( {\texttt{y}\mathbin {\boldsymbol{:=}}1 \mathbin {\boldsymbol{;}}\textrm{FAA}\left( {\texttt{z},0}\right) \mathbin {\boldsymbol{;}}\texttt{x}\boldsymbol{?}}\right) } \end{aligned}$$
(SB+F)

Both of the \(\textrm{FAA}\left( {\texttt{z},0}\right) \) instructions store messages that must dovetail with the message that they load from, and in doing so also inherit its view. They cannot both dovetail with the same message because their segments cannot intersect. Thus, one of them—say, the one on the right—will have to dovetail with the other. In this scenario, the view of the message that the left thread stores at \(\texttt{z}\) points to the message it previously stored at \(\texttt{x}\). When the right thread loads the message from \(\texttt{z}\) it inherits this view, obscuring the initial message to \(\texttt{x}\). Therefore, when it later loads from \(\texttt{x}\), it must load what the left thread stored. Thus, like in SC, no execution in RA returns \(\langle {0,0}\rangle \).

3 Denotational Semantics for Release/Acquire

We start this section by explaining how we support first-class concurrent composition (\(\boldsymbol{\parallel }\)) in the operational semantics of Release/Acquire (§3.1). In the rest of the section we present the core of our denotational semantics. First, we present our notion of a trace, adapted to RA, along with four basic rewrite rules that our denotations are closed under (§3.2). Next, we define the denotations of the key program constructs (§3.3). We then present further aspects of the denotational semantics that make it more abstract: restrictions that traces in denotations must uphold (§3.4), and three more rewrite rules under which denotations are closed (§3.5). For completeness, we show how to give denotations to the whole language standardly, using Moggi’s approach (§3.6).

3.1 First-class Concurrent Composition

Kang et al.’s presentation assumes top-level parallelism, a common practice in studies of weak-memory models. This comes at the cost of uniformity and compositionality. In particular, the denotation \(\llbracket {M\boldsymbol{\parallel }N}\rrbracket \) cannot be defined. We resolve this by extending Kang et al.’s operational semantics to support first-class parallelism, organizing thread views in an evolving view-tree, a binary tree with view-labelled leaves, rather than in a fixed flat mapping. Thus, states that accompany executing terms consist of a memory and a view-tree. In the text, we do not distinguish between a view-leaf and its label.

An initial state consists of a memory with a single message at each location, and a view which points to these messages’ timestamps. The example below shows how threads inherit their parent’s view upon activation and combine their views as they synchronize:

Example

In the following, \(\mathrel {\rightsquigarrow }\) is the execution step relation, \(\mathrel {\rightsquigarrow ^*}\) is its reflexive-transitive closure, \(\mu _0\) is an initial memory, \(\dot{\kappa }\) is the \(\kappa \)-labelled view-leaf, a node with two subtrees \(T\) and \(R\) forms a view-tree, and \(\omega \) is the least view that dominates both \(\omega _1\) and \(\omega _2\). Consider an execution of \(M\mathbin {\boldsymbol{;}}\left( {N_1 \boldsymbol{\parallel }N_2}\right) \) from an initial state.

First, \(M\) runs until it returns a value, which is discarded by the sequencing construct. Next, the parallel composition \(N_1 \boldsymbol{\parallel }N_2\) activates. The threads then interleave executions, each with its associated side of the view-tree. Finally, once both threads return a value, they synchronize.
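A sketch of view-trees and their fork/join dynamics, continuing the earlier Haskell rendering (`lub` as before); both steps are our paraphrase of the rules just described.

```haskell
data ViewTree = Leaf View | Node ViewTree ViewTree

-- Upon activation, both threads inherit the parent's view.
fork :: View -> ViewTree
fork kappa = Node (Leaf kappa) (Leaf kappa)

-- Once both sides have returned, they synchronize: the node collapses to a
-- leaf labelled with the least view dominating both.
join :: ViewTree -> Maybe ViewTree
join (Node (Leaf w1) (Leaf w2)) = Just (Leaf (w1 `lub` w2))
join _                          = Nothing
```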

Handling parallel composition as a first-class construct allows us to decompose Write-Read Reordering (WR-Reord) \({\left( {\texttt{x}\mathbin {\boldsymbol{:=}}v}\right) } \mathbin {\boldsymbol{;}}{\texttt{y}\boldsymbol{?}} \twoheadrightarrow \textbf{fst}\, \langle {\texttt{y}\boldsymbol{?},\left( {\texttt{x}\mathbin {\boldsymbol{:=}}v}\right) }\rangle \), a crucial reordering of memory accesses valid under RA but not under SC, into a combination of Write-Read Deorder (WR-Deord) \(\langle {\left( {\texttt{x}\mathbin {\boldsymbol{:=}}v}\right) ,\texttt{y}\boldsymbol{?}}\rangle \twoheadrightarrow \left( {\texttt{x}\mathbin {\boldsymbol{:=}}v}\right) \boldsymbol{\parallel }\texttt{y}\boldsymbol{?}\) together with structural transformations and laws of parallel programming.

This provides a separation of concerns: the components of this decomposition are supported by our semantics using independent arguments. It also sheds light on the interesting part, as all of the components are valid under SC except for (WR-Deord).

3.2 Traces for Release/Acquire

Adapting Brookes’s SC-traces, our RA-traces also include a sequence of transitions \(\xi \), each transition a pair of RA memories; and a return value \(r\). Intuitively, these play a similar role here, formally grounded in analogs to the stutter and mumble rewrite rules. Seeing that the operational semantics only adds messages and never modifies them, we require that every memory snapshot in the sequence \(\xi \) be contained in the subsequent one, whether it be within or across transitions. A message added within a transition is a local message; otherwise it is an environment message. We call the first memory in \(\xi \)’s first transition its opening memory, and the second memory in \(\xi \)’s last transition its closing memory.

In addition, RA-traces include an initial view \(\alpha \), declaring which messages are relied upon to be revealed in \(\xi \)’s opening memory; and a final view \(\omega \), declaring which messages are guaranteed to be revealed in \(\xi \)’s closing memory. We ground these intuitions formally in the rewind and forward rewrite rules below.

A trace thus comprises the initial view \(\alpha \), the sequence of transitions \(\xi \), the final view \(\omega \), and the return value \(r\). See an illustration on the bottom of Figure 1.
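In code (ours, continuing the earlier types), an RA-trace packages the two views around the transition sequence and the return value; the containment requirement on memory snapshots can then be checked directly.

```haskell
data RATrace r = RATrace
  { initView    :: View                -- messages relied upon at the opening
  , transitions :: [(Memory, Memory)]
  , finalView   :: View                -- messages guaranteed at the closing
  , retVal      :: r
  }

-- Every memory snapshot must be contained in the subsequent one, whether
-- within or across transitions.
accumulating :: RATrace r -> Bool
accumulating tr = and (zipWith contained mems (drop 1 mems))
  where
    mems = concat [ [mu, rho] | (mu, rho) <- transitions tr ]
    contained m m' = all (`elem` m') m
```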

Stutter & Mumble. We define the stutter (\(\textsf{St}\)) and mumble (\(\textsf{Mu}\)) rewrite rules in analogy with Brookes’s: stutter inserts a transition of the form \(\langle {{\mu }, {\mu }}\rangle \) anywhere in the sequence, and mumble merges two consecutive transitions \(\langle {{\mu }, {\rho }}\rangle \langle {{\rho }, {\theta }}\rangle \) into a single transition \(\langle {{\mu }, {\theta }}\rangle \).

As in Brookes’s semantics, their role is to make the semantics more abstract by divorcing the length of the sequence from the individual steps taken in the operational semantics, while maintaining the transitions’ Rely/Guarantee character.

Rewind & Forward. The rewind (\(\textsf{Rw}\)) rewrite rule establishes the fact that the term only relies on certain messages being revealed, not on messages being obscured. The rewind rule modifies the initial view, making it point to earlier messages on the timelines. Thus, relied-upon messages will remain available after the rewrite. Similarly, the forward (\(\textsf{Fw}\)) rewrite rule establishes the fact that the term only guarantees that certain messages are revealed. The forward rule modifies the final view, making it point to later messages on the timelines. Thus, any message guaranteed to be available was already guaranteed beforehand. The rules are schematically depicted in Figure 3.

Fig. 3. Schematic depictions of the rewind and forward rewrite rules, focusing on a single location, where the initial/final view points to \(\nu \) before and points to \(\epsilon \) after. The messages \(\nu \) and \(\epsilon \) may coincide, dovetail, or be separated. Left: The initial view \(\alpha \) is “rewound” to \(\alpha '\). Right: The final view \(\omega \) is “forwarded” to \(\omega '\).

3.3 Introducing Denotations for RA

We present denotations of key constructs of the programming language. By referring to the notion of a closed set below, we mean a set that is closed under certain rewrite rules, such as stutter, mumble, rewind, and forward from §3.2.

Pure. A pure (i.e. effect-free) computation guarantees a returned value, and otherwise can only guarantee what it relies on. For example, we define \(\llbracket {2 + 3}\rrbracket \) as the least closed set with all traces that consist of a single transition \(\langle {{\mu }, {\mu }}\rangle \), equal initial and final views, and the return value \(5\).

Sequence. In denoting sequential composition we must make sure that the first component does not obscure any message that the second component relies on. Thus, we define \(\llbracket {\langle {M,N}\rangle }\rrbracket \) as the least closed set with all traces whose transition sequence is a concatenation \(\xi \eta \) and whose return value is a pair \(\langle {r,s}\rangle \), where for some view \(\kappa \), the denotation \(\llbracket {M}\rrbracket \) contains the trace with sequence \(\xi \), return value \(r\), initial view \(\alpha \), and final view \(\kappa \), and \(\llbracket {N}\rrbracket \) contains the trace with sequence \(\eta \), return value \(s\), initial view \(\kappa \), and final view \(\omega \). The existence of the revealed messages is implicit: \(\xi \)’s closing memory must be contained in the memory that follows it, which is \(\eta \)’s opening memory. The definition of \(\llbracket {M\mathbin {\boldsymbol{;}}N}\rrbracket \) is the same, except that the first component of the returned pair is discarded; that is, its traces return \(s\) alone.

Parallel. Threads composed in parallel rely on the same preceding sequential environment and guarantee to the same succeeding sequential environment. Thus, we define \(\llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket \) as the least closed set with all traces whose return value is a pair \(\langle {r_1,r_2}\rangle \), where there exist sequences \(\xi _1\) and \(\xi _2\) such that the trace’s sequence \(\xi \) is obtained by interleaving their transitions, and \(\llbracket {M_i}\rrbracket \) contains the trace with sequence \(\xi _i\), return value \(r_i\), initial view \(\alpha \), and final view \(\omega \) (for \(i \in \left\{ 1,2\right\} \)).
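Continuing the sketch, the combination underlying the parallel denotation interleaves the two transition sequences and pairs the return values, requiring both traces to share their initial and final views:

```haskell
interleave2 :: [a] -> [a] -> [[a]]
interleave2 [] ys = [ys]
interleave2 xs [] = [xs]
interleave2 (x:xs) (y:ys) =
  map (x:) (interleave2 xs (y:ys)) ++ map (y:) (interleave2 (x:xs) ys)

-- All parallel combinations of two RA-traces (before closure).
parT :: RATrace a -> RATrace b -> [RATrace (a, b)]
parT t1 t2
  | initView t1 == initView t2 && finalView t1 == finalView t2 =
      [ RATrace (initView t1) xi (finalView t1) (retVal t1, retVal t2)
      | xi <- interleave2 (transitions t1) (transitions t2) ]
  | otherwise = []
```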

Dereference. We define \({\llbracket {\ell \boldsymbol{?}}\rrbracket }\) to be the least closed set with all traces that consist of a single transition \(\langle {{\mu }, {\mu }}\rangle \) and the return value \(v\), where \(\mu \) holds a message at \(\ell \) that carries the value \(v\) and a view \(\kappa \) and occupies the segment \(\left( {q,\omega \left( {\ell }\right) }\right] \) for some timestamp \(q\), and both \(\alpha \le \omega \) and \(\kappa \le \omega \).

Assignment. We define \({\llbracket {{\ell } \mathbin {\boldsymbol{:=}}{v}}\rrbracket }\) as the least closed set with all traces that consist of a single transition \(\langle {{\mu }, {\rho }}\rangle \) and the return value \(\langle {}\rangle \), where \(\rho \) is obtained by adding to \(\mu \) a message at \(\ell \) that carries the value \(v\) and the view \(\omega \) and occupies the segment \(\left( {q,\omega \left( {\ell }\right) }\right] \) for some timestamp \(q\), and \(\alpha \le \omega \).

Read-modify-write. The definition of \({\llbracket {\textrm{FAA}\left( {\ell ,w}\right) }\rrbracket }\) combines the two above, along with a dovetailing requirement. Specifically, it is the least closed set with all traces that consist of a single transition \(\langle {{\mu }, {\rho }}\rangle \) and the return value \(v\), where \(\mu \) holds a message at \(\ell \) that carries the value \(v\) and a view \(\kappa \) and whose segment ends at some timestamp \(q\); both \(\alpha \le \omega \) and \(\kappa \le \omega \); and \(\rho \) is obtained by adding to \(\mu \) a message at \(\ell \) that carries the value \(v+w\) and the view \(\omega \) and occupies the segment \(\left( {q,\omega \left( {\ell }\right) }\right] \), dovetailing with the loaded message. The semantics of other RMWs is defined similarly.

Example

We show that \({\llbracket {\ell \mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}v}\rrbracket } \subseteq {\llbracket {\ell \mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\ell \boldsymbol{?}}\rrbracket }\). When sequencing two traces, the final view of the first must match the initial view of the second, so traces in \({\llbracket {\ell \mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}v}\rrbracket }\) consist of a transition \(\langle {{\mu }, {\rho }}\rangle \) followed by a transition \(\langle {{\theta }, {\theta }}\rangle \), with the return value \(v\), where \(\rho \) is obtained by adding to \(\mu \) a message at \(\ell \) carrying the value \(v\) and the view \(\omega \) for some timestamp \(q\), and \(\alpha \le \omega \). Since \(\omega \) points to this added message, and since \(\rho \subseteq \theta \) as memories along a trace’s sequence, the transition \(\langle {{\theta }, {\theta }}\rangle \) with the return value \(v\) and with \(\omega \) as both initial and final view forms a trace in \({\llbracket {\ell \boldsymbol{?}}\rrbracket }\). By sequencing it after the trace of \({\llbracket {\ell \mathbin {\boldsymbol{:=}}v}\rrbracket }\), we conclude that the original trace is in \({\llbracket {\ell \mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\ell \boldsymbol{?}}\rrbracket }\).

3.4 Correspondence to the Operational Semantics

Traces in denotations, if unconstrained, may represent behaviors that include operationally unreachable states. Forbidding such redundant traces eliminates a source of differentiation between denotations, thus increasing their abstraction.

Reachable states. Consider the transformation \(\texttt{x}\boldsymbol{?} \mathbin {\boldsymbol{;}}\texttt{y}\boldsymbol{?} \twoheadrightarrow \texttt{y}\boldsymbol{?}\), a consequence of the RA-valid Irrelevant Read Elimination (R-Elim) \(\texttt{x}\boldsymbol{?} \mathbin {\boldsymbol{;}}\langle {}\rangle \twoheadrightarrow \langle {}\rangle \) and structural equivalences. Consider the state \(S\) that consists of the memory at the top of Figure 1 and the view that points to \(\nu _3\) and \(\epsilon _2\). The only step \(\texttt{x}\boldsymbol{?} \mathbin {\boldsymbol{;}}\texttt{y}\boldsymbol{?}\) can take from the state \(S\) is to load \(\nu _3\), inheriting the view that \(\nu _3\) carries, which changes the thread’s view to point to \(\epsilon _3\). Only \(\epsilon _3\) is available in the following step, which means the term returns 3. In contrast, starting from \(S\), the term \(\texttt{y}\boldsymbol{?}\) can load from \(\epsilon _2\) to return 7. This analysis does not invalidate the transformation because the state \(S\) is unreachable by an execution starting from an initial state, and should therefore be ignored when determining observable behaviors.

Internalizing invariants. Just as we ignore unreachable states in the operational semantics, we discard “unreachable” traces to refine our denotational semantics. We consider a state to be valid if it adheres to the following invariants (a code sketch checking the first three follows the list).

  • Scattering: segments in memory never overlap.

  • Pointing: views always point to messages.

  • Dominating: views always dominate the views of the messages to which they point. This invalidates the state \(S\) above, because the view of the thread does not dominate the view of \(\nu _3\) even though it points to it.

  • Descending: a path from a message along the view-induced graph structure cannot end in another message with a greater timestamp at the same location. This is demonstrated both positively and negatively in Figure 4.

  • Acyclicity: a cycle along the view-induced graph structure consists solely of messages which have the smallest timestamp on their timeline.
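Continuing the Haskell sketch, the first three invariants check mechanically over a memory and a view; the path-based invariants (descending and acyclicity) quantify over the view-induced graph and are omitted here.

```haskell
-- Scattering: segments on the same timeline never overlap.
scattering :: Memory -> Bool
scattering mem = and
  [ not (overlap (mSeg m1) (mSeg m2))
  | (i, m1) <- zip [0 :: Int ..] mem, (j, m2) <- zip [0 ..] mem
  , i < j, mLoc m1 == mLoc m2 ]
  where overlap (q1, t1) (q2, t2) = q1 < t2 && q2 < t1

-- Pointing: the view points to an actual message at every location it maps.
pointing :: Memory -> View -> Bool
pointing mem kappa = and
  [ any (\m -> mLoc m == l && timestamp m == t) mem
  | (l, t) <- M.toList kappa ]

-- Dominating: the view dominates the view of each message it points to.
dominating :: Memory -> View -> Bool
dominating mem kappa = and
  [ kappa `dominates` mView m
  | m <- mem, M.lookup (mLoc m) kappa == Just (timestamp m) ]
```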

Memory snapshots in traces are required to obey each of the invariants above. The initial and final view must point to and dominate the opening and closing memory respectively. This means that there must be a message to load that allows the initial and final view to be equal, and we obtain \({\llbracket {\texttt{x}\boldsymbol{?} \mathbin {\boldsymbol{;}}\langle {}\rangle }\rrbracket } \supseteq {\llbracket {\langle {}\rangle }\rrbracket }\).

We also uphold requirements that correspond to the relation between the states across a possibly-interrupted series of steps in the operational semantics:

  • Accumulating: the memory after contains the memory before. We require that every memory snapshot contains the one before it.

  • Delimiting: if the view-trees before and after are leaves, then the view after dominates the view before, and the view of any written message dominates the view before and is dominated by the view after. We impose the analogous requirement on the initial and final views, and on the local messages.

The trace in Figure 1 adheres to the invariants and relationships we have listed.

Fig. 4. Two variations on the memory illustrated in Figure 1. Top: This can function as a memory snapshot in a trace. It demonstrates that the views of messages along a timeline do not have to be ordered: \(\epsilon _2\) appears earlier than \(\epsilon _3\) on \(\texttt{y}\)’s timeline but points to a later message on \(\texttt{x}\)’s timeline. Bottom: This cannot function as a memory snapshot in a trace, because it contains an ascending path. Intuitively, no thread could have written \(\epsilon _2\) because the view that \(\epsilon _2\) carries indicates that the thread would have already “known” about \(\nu _3\) and therefore, following the causality chain, about \(\epsilon _3\) as well. Thus, the thread would have been forbidden from picking \(\epsilon _2\)’s timestamp.

Concrete operational correspondence. We call the rewrite rules that were defined in §3.2 concrete because they maintain a certain concrete interpretation of traces. To see this, consider the operational semantics for RA augmented with an additional kind of step, which any term can take. The only change along this step is that a view in the view-tree inherits the view from a message that is available to it. This addition does not change the observable behaviors of whole programs, and maintains the above invariants.

Each trace in the denotations of §3.3, if closed only under the concrete rewrite rules, corresponds to an interrupted execution in the augmented operational semantics. The correspondence is similar to that from Brookes’s semantics in terms of the sequence of transitions and return value. The initial and final views determine the views at the beginning and the end of the interrupted execution.

The introduction of the rewrite rules in §3.5 will mean that traces do not have such a clear operational interpretation. The key to our proof of adequacy is to partially recover this operational correspondence in terms of the overall observable behaviors (§4).

3.5 Abstract Rewrite Rules

Transitions in RA traces consist of sets of messages, which record much more information about the operational execution than the mappings from locations to values we had in SC. This makes the trace-based semantics too concrete. We resolve the memory-concreteness issue by introducing three abstract rewrite rules that obfuscate information about local messages. This makes the denotations more abstract by blurring the distinctions that denotations can make.

Tighten. Recall the transformation (WR-Deord) that we wish to support. Let \(\tau _1 \in \llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v}\rrbracket \) and \(\tau _2 \in \llbracket {\texttt{y}\boldsymbol{?}}\rrbracket \) be such that they compose sequentially to form a trace from \(\llbracket {\langle {\left( {\texttt{x}\mathbin {\boldsymbol{:=}}v}\right) ,\texttt{y}\boldsymbol{?}}\rangle }\rrbracket \). Then \(\tau _1\)’s final view \(\kappa \) must equal \(\tau _2\)’s initial view. The view \(\kappa \) dominates the view \(\sigma \) of the local message \(\nu _1\) stored by \(\tau _1\), and \(\kappa \) cannot obscure the message \(\nu _2\) from which \(\tau _2\) loaded its value. Thus, \(\sigma \) cannot obscure \(\nu _2\). In contrast, consider \(\tau _1\) and \(\tau _2\) that compose in parallel to form a trace from \(\llbracket {\left( {\texttt{x}\mathbin {\boldsymbol{:=}}v}\right) \boldsymbol{\parallel }\texttt{y}\boldsymbol{?}}\rrbracket \). Here, the view of the local message may very well obscure the loaded message. Indeed, the final view of \(\tau _1\) may dominate the initial view of \(\tau _2\).

To resolve this, observe that the purpose of recording views in messages is to encumber their loaders. Under this perspective, the view of a local message guarantees to the environment that loading the local message will keep certain messages revealed. Therefore, making the view larger only weakens the guarantee. Thus, we introduce the tighten (\(\textsf{Ti}\)) rewrite rule that makes the view of a local message larger. The rule is depicted in Figure 5, and Figure 6 provides a concrete example. Using tighten, we can show that \(\llbracket {\langle {\left( {\texttt{x}\mathbin {\boldsymbol{:=}}v}\right) ,\texttt{y}\boldsymbol{?}}\rangle }\rrbracket \supseteq \llbracket {\left( {\texttt{x}\mathbin {\boldsymbol{:=}}v}\right) \boldsymbol{\parallel }\texttt{y}\boldsymbol{?}}\rrbracket \).

Fig. 5. Schematic depiction of the tighten rewrite rule, focusing on a particular memory snapshot within the trace, in the setting of \(k{+}1\) locations. The message \(\nu \) is “tightened” to \(\nu '\), such that for each \(i\) it points to \(\beta _i\) instead of \(\epsilon _i\). This includes the case that \(\beta _i\) and \(\epsilon _i\) are the same message in some locations.

Fig. 6. A possible result from rewriting the trace from Figure 1 using tighten. Since \(\nu _2\) is local in the trace from Figure 1, tighten can advance its view to point to \(\epsilon _3\) instead of \(\epsilon _1\). The same replacement is applied throughout the trace’s sequence, not just the closing memory.

Fig. 7. Schematic depictions of the absorb (left) and dilute (right) rewrite rules, focusing on the segment of the dovetailed messages together with all pointers into and out of them, within a particular memory snapshot. The circular cloud represents the subset of the memory that the messages in focus are pointing to, showing that they all have the same view. The elliptical clouds represent views—including the initial and final view, as well as other messages—that point to each of the dovetailing messages. Left: The message \(\nu \) is “absorbed” into the message \(\epsilon \) to become \(\epsilon '\). No view may point to \(\nu \). Right: The message \(\nu '\) “dilutes” into \(\nu \) and \(\epsilon \). While \(\epsilon \) must be a local message, \(\nu \) and \(\nu '\) can appear anywhere in the trace’s sequence, as long as they appear in the same places in the sequence and \(\epsilon \) does not appear before. The views that point to \(\nu '\) before diluting can point either to \(\nu \) or to \(\epsilon \) after diluting.

Absorb. Recall the transformation (WW-Elim) that we wish to support. To show this we aim to replicate, as far as we can, the reasoning we have used to show \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \supseteq \llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) in Brookes’s semantics. Recall that, to use mumble, we made the memories match across the two transitions of \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \). Doing so here, we end up with two local messages, whereas traces from \(\llbracket {\texttt{x}\mathbin {\boldsymbol{:=}}w}\rrbracket \) only have a single local message. Roughly speaking, the equality concerning SC memories \(\mu \left[ {\texttt{x} := v}\right] \left[ {\texttt{x} := w}\right] = \mu \left[ {\texttt{x} := w}\right] \) does not transfer to RA where memory, by accumulating messages, is more concrete. We resolve this by adding the absorb (\(\textsf{Ab}\)) rewrite rule, which replaces two dovetailed local messages with one that carries the second message’s value. The rule is depicted in Figure 7, and Figure 8 provides a specific example.

Fig. 8. A possible result of rewriting the trace from Figure 6 using absorb. The dovetailed messages \(\nu _2\) and \(\nu _3\) are local in the trace from Figure 1, added within the same transition, so absorb can replace them with \(\nu _3'\), obtained by stretching \(\nu _3\)’s segment to cover \(\nu _2\)’s segment.

Dilute. There is another known family of transformations that are valid under RA, yet cannot be justified with the rules presented so far. These introduce non-modifying atomic updates, such as Read to FAA (R-FAA) \(\ell \boldsymbol{?} \twoheadrightarrow \textrm{FAA}\left( {\ell ,0}\right) \).

Running within some context, \(\textrm{FAA}\left( {\ell ,0}\right) \) reads a message \(\nu \), to which it dovetails another message \(\epsilon \) with the same value. It is possible that some \(\beta \) dovetails with \(\epsilon \) later in the execution. In the same context, we can simulate this behavior with \(\ell \boldsymbol{?}\) instead, by having the context provide \(\nu '\) instead of \(\nu \), with the difference that it takes up the same segment that \(\nu \) and \(\epsilon \) have taken up combined. If there is a \(\beta \) as mentioned, it can now dovetail with \(\nu '\) to the same effect. In this scenario, \(\nu \) is an environment message, but we must also account for the case that it is local to allow for composition, such as in \(\ell \mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\ell \boldsymbol{?} \twoheadrightarrow \ell \mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}\textrm{FAA}\left( {\ell ,0}\right) \).

We internalize the idea behind this argument as the dilute (\(\textsf{Di}\)) rewrite rule, in which a message is replaced by two messages that together occupy the same segment, the second being a local message that cannot appear before the first in the trace and must carry the same value. With dilute, \(\llbracket {\ell \boldsymbol{?}}\rrbracket \supseteq \llbracket {\textrm{FAA}\left( {\ell ,0}\right) }\rrbracket \). The rule is depicted in Figure 7, and Figure 9 provides a specific example.

Fig. 9. A possible result of rewriting the trace from Figure 1 using dilute. The message \(\epsilon _1\) from Figure 1 was replaced with \(\epsilon _1'\), with the same value 1. The local message \(\beta \)—which takes up the rest of the space left behind by \(\epsilon _1\)—always appears with \(\epsilon _1'\), dovetailing with it and carrying the same value. The message \(\epsilon _2\), which used to dovetail with \(\epsilon _1\), now dovetails with \(\beta \).

3.6 Monadic Presentation

One of the contributions of this work is to bridge research on weak-memory models with Moggi’s monad-based approach [38] to denotational semantics. In this approach, one starts by defining a monad, which has three components. The first associates with every set \(X\), which we think of as representing returned values, a set \(\underline{\mathcal {T}}X\) representing computations that return values from \(X\). In our case, \(\underline{\mathcal {T}}X\) consists of countable sets of traces closed under rewrite rules.

Denotations are then defined according to their typing judgments. For example, \(a, b: \textsf{Loc}\vdash \langle {a,b\boldsymbol{?}}\rangle : \left( {\textsf{Loc}\times \textsf{Val}}\right) \) means that in the context that the free variables \(a\) and \(b\) are locations, the term \(\langle {a,b\boldsymbol{?}}\rangle \) is a location-value pair. Given a function \(\gamma \) that maps \(a\) and \(b\) to locations, \(\llbracket {\langle {a,b\boldsymbol{?}}\rangle }\rrbracket \gamma \in \underline{\mathcal {T}}\left( {\textsf{Loc}\times \textsf{Val}}\right) \). For \(\varGamma \vdash M: A\) and \(\varGamma \vdash N: A\), we generalize containment \(\llbracket {N}\rrbracket \supseteq \llbracket {M}\rrbracket \) pointwise: if \(\gamma \) maps variables in \(\varGamma \) appropriately by their type, then \(\llbracket {N}\rrbracket \gamma \supseteq \llbracket {M}\rrbracket \gamma \). This degenerates to plain containment when \(\varGamma \) is empty, i.e. when \(M\) and \(N\) are closed terms.

The second monad component is a function \({\text {{return}}}^{\mathcal {T}}_{X} : X\rightarrow \underline{\mathcal {T}}X\) that maps values to pure computations that return that value. The third component sequences computations, such that the latter depends on the value returned by the former. Omitting the indices, the monad components must satisfy axioms that formalize the stated intuition: sequencing a pure computation with a function amounts to applying the function to the returned value; sequencing a computation with \({\text {{return}}}\) leaves it unchanged; and sequencing is associative.

In our case, we define \({\text {{return}}}r\) as the least closed set with all traces consisting of a single transition \(\langle {{\mu }, {\mu }}\rangle \), equal initial and final views, and the return value \(r\), as for pure computations above; and we define the sequencing of \(P\) with \(f\) as the least closed set with all traces obtained by concatenating a trace of \(P\) that returns \(r\) with a trace of \(f\left( {r}\right) \), where the final view \(\kappa \) of the former matches the initial view of the latter, as in the denotation of sequential composition.
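A Haskell sketch of this monad structure, with lists standing in for countable sets; `close` and `pureStates` are placeholders for the closure operator and the valid pure states, which the sketch leaves abstract.

```haskell
newtype T x = T { traces :: [RATrace x] }

-- Placeholder: close a set of traces under the rewrite rules of §3.2 and §3.5.
close :: [RATrace x] -> [RATrace x]
close = id

returnT :: x -> T x
returnT r = T (close [ RATrace a [(m, m)] a r | (a, m) <- pureStates ])
  where
    pureStates :: [(View, Memory)]
    pureStates = []  -- placeholder: all valid view/memory pairs (Section 3.4)

bindT :: T x -> (x -> T y) -> T y
bindT (T ps) f = T (close
  [ RATrace (initView p) (transitions p ++ transitions q)
            (finalView q) (retVal q)
  | p <- ps, q <- traces (f (retVal p))
  , finalView p == initView q ])   -- views must match, as in sequencing
```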

Denotations. This approach comes ready-made with denotations for standard language constructs. For example, \(\llbracket {\textbf{let}\,a=M\,\textbf{in}\, N}\rrbracket \gamma \) sequences \(\llbracket {M}\rrbracket \gamma \) with the function that maps each returned value \(r\) to \(\llbracket {N}\rrbracket \left( {\gamma \left[ {{a}\mapsto {r}}\right] }\right) \), where \(\gamma \left[ {{a}\mapsto {r}}\right] \) is obtained from \(\gamma \) by mapping \(a\) to \(r\); matching is treated similarly. Pure computations use the return function, e.g. \(\llbracket {v}\rrbracket = {\text {{return}}}v\).

Program effects can be modularly introduced in this approach, such as memory access, where \({\llbracket {{\ell } \mathbin {\boldsymbol{:=}}{v}}\rrbracket } \in \underline{\mathcal {T}}\left\{ \langle {}\rangle \right\} \) and \({\llbracket {\ell \boldsymbol{?}}\rrbracket }, {\llbracket {\textrm{FAA}\left( {\ell ,v}\right) }\rrbracket } \in \underline{\mathcal {T}}\textsf{Val}\); and parallel composition, a function \((\mathbin {{\vert \vert \vert }}^{\mathcal {T}}_{X, Y}) : \underline{\mathcal {T}}X\times \underline{\mathcal {T}}Y\rightarrow \underline{\mathcal {T}}\left( {X\times Y}\right) \) with which \(\llbracket {M\boldsymbol{\parallel }N}\rrbracket \gamma {:}{=}\llbracket {M}\rrbracket \gamma \mathbin {{\vert \vert \vert }}\llbracket {N}\rrbracket \gamma \). The definition remains the same: we obtain traces in \(P\mathbin {{\vert \vert \vert }}Q\) by interleaving transitions and pairing returned values of traces with matching views, one from \(P\) and one from \(Q\).

Adhering to left-to-right evaluation both operationally and denotationally, \(M\mathbin {\boldsymbol{:=}}N\) is equivalent to \(\mathop {\textbf{match}}\langle {M,N}\rangle \mathbin {\textbf{with}} \langle {a,b}\rangle \!.\, a\mathbin {\boldsymbol{:=}}b\). In traces of assignment, the added local message is free to dovetail with a previous message, unlike in RMW traces where it must. Therefore, we have \({\llbracket {\ell \mathbin {\boldsymbol{:=}}\left( {\ell \boldsymbol{?} + v}\right) }\rrbracket } \supseteq {\llbracket {\textrm{FAA}\left( {\ell ,v}\right) }\rrbracket }\).

Structural reasoning. Among the general results and proof techniques this approach supplies are structural equivalences. These are denotational equations that hold due to the properties of the core calculus, and are preserved by modular expansions with program effects. For instance, if \(K\) is effect-free, then \( \llbracket {\textbf{ if }\, K\,\textbf{then}\, M\mathbin {\boldsymbol{;}}N\,\textbf{else}\, M\mathbin {\boldsymbol{;}}N'\,}\rrbracket = \llbracket {M\mathbin {\boldsymbol{;}}\textbf{ if }\, K\,\textbf{then}\, N\,\textbf{else}\, N'\,}\rrbracket \). Equivalences such as this one may otherwise require challenging ad-hoc proofs [e.g. 24, 26].

More generally, structural reasoning composes to derive further equivalences. For example, from \(\llbracket {\langle {}\rangle }\rrbracket = \llbracket {\ell \boldsymbol{?} \mathbin {\boldsymbol{;}}\langle {}\rangle }\rrbracket \) and structural equivalences, namely “left neutrality” \(\llbracket {K}\rrbracket = \llbracket {\langle {}\rangle \mathbin {\boldsymbol{;}}K}\rrbracket \) and “associativity” \(\llbracket {\left( {M\mathbin {\boldsymbol{;}}N}\right) \mathbin {\boldsymbol{;}}K}\rrbracket = \llbracket {M\mathbin {\boldsymbol{;}}\left( {N\mathbin {\boldsymbol{;}}K}\right) }\rrbracket \), we can derive Irrelevant Load Introduction:

\(\llbracket {K}\rrbracket = \llbracket {\langle {}\rangle \mathbin {\boldsymbol{;}}K}\rrbracket = \llbracket {\left( {\ell \boldsymbol{?} \mathbin {\boldsymbol{;}}\langle {}\rangle }\right) \mathbin {\boldsymbol{;}}K}\rrbracket = \llbracket {\ell \boldsymbol{?} \mathbin {\boldsymbol{;}}\left( {\langle {}\rangle \mathbin {\boldsymbol{;}}K}\right) }\rrbracket = \llbracket {\ell \boldsymbol{?} \mathbin {\boldsymbol{;}}K}\rrbracket \)

Structural reasoning generalizes to program transformations. For example, sequential composition is monotonic, so the containment \({\llbracket {\ell \mathbin {\boldsymbol{:=}}\left( {\ell \boldsymbol{?} + v}\right) }\rrbracket } \supseteq {\llbracket {\textrm{FAA}\left( {\ell ,v}\right) }\rrbracket }\) from above lifts to any sequential context, e.g. \(\llbracket {M\mathbin {\boldsymbol{;}}\ell \mathbin {\boldsymbol{:=}}\left( {\ell \boldsymbol{?} + v}\right) \mathbin {\boldsymbol{;}}N}\rrbracket \supseteq \llbracket {M\mathbin {\boldsymbol{;}}\textrm{FAA}\left( {\ell ,v}\right) \mathbin {\boldsymbol{;}}N}\rrbracket \).

Since \((\mathbin {{\vert \vert \vert }})\) is also monotonic, we can use this to show that \({\llbracket {\text {(SB)}}\rrbracket } \supseteq {\llbracket {\text {(SB+F)}}\rrbracket }\).

Higher order. An important aspect of a programming language is its facilitation of abstraction. Higher-order programming is a flexible instance of this, in which programmable functions can take functions as input and return functions as output. Moggi’s approach supports this feature out-of-the-box, in a way that does not complicate the rest of the semantics: the first-order fragment of the semantics need not change to include it.

Every value returned by an execution has a semantic representation, which we use as the returned value in traces. Semantic and syntactic values coincide in the first-order fragment; but different syntactic functions may have the same semantics, so the identification does not extend to higher order.

We classify a term as a program if it is closed (every variable occurrence is bound) and of ground type (all functions are applied to arguments). This definition is in line with the expectation that a program should return a concrete result that the end-user can consume. Thus, we only consider observable behaviors of programs. Transformations only need to be valid when applied within programs. Programs degenerate to closed terms in the first-order fragment.

4 Main Results

We present the main results that we have proven about our denotational semantics. Moggi’s semantic toolkit features prominently throughout their proofs.

Compositionality. In its most basic form, this key feature of denotational semantics means that a program term’s denotation is defined using the denotations of its immediate subterms. We have used this in (\(\star \)). In our case, denotations are sets in which each element represents a possible behavior of the term, so we are interested in establishing a directional generalization of compositionality:

Lemma 1

If \(\llbracket {M}\rrbracket \subseteq \llbracket {N}\rrbracket \) then \(\llbracket {\Xi \left[ {M}\right] }\rrbracket \subseteq \llbracket {\Xi \left[ {N}\right] }\rrbracket \) for any program context \(\Xi \left[ {{-}}\right] \).

Compositionality is a consequence of the monadic design of the semantics, which uses monotonic operators, and is not substantially different from previous work [e.g. 20].

Observability correspondence. The abstract rewrite rules break the direct correspondence between traces and interrupted executions. For example, in our analysis of (WW-Elim), by using absorb, we ended up with a trace in which only one message is added even though the program term adds two messages.

Still, some connection must remain to obtain a proof of adequacy. In particular, we would like traces to correspond to observable behavior of programs. In one direction, an even stronger property holds, known as soundness:

Lemma 2

For every execution of a program \(M\) in the operational semantics of RA, there exists a trace in \(\llbracket {M}\rrbracket \) with a single transition from \(\langle {{\alpha }, {\mu }}\rangle \) to \(\langle {{\omega }, {\rho }}\rangle \) and returned value \(r\) that matches the execution: \(\langle {{\alpha }, {\mu }}\rangle \) is the initial state, \(\langle {{\omega }, {\rho }}\rangle \) is the final state, and \(r\) matches the value returned.

To prove soundness, we take a trace whose transitions correspond to the memory-accessing execution steps, and then use mumble repeatedly to fuse them into a single transition.
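Over the simplified traces above, stutter and mumble can be sketched as follows; the paper’s transitions carry more structure, so the snapshot-equality test here only approximates the actual side conditions. Mem and Transition are reused from the earlier sketch.

    -- stutter: insert an idle transition anywhere in the sequence.
    stutter :: Mem -> [Transition] -> [[Transition]]
    stutter m ts =
      [ before ++ (m, m) : after
      | i <- [0 .. length ts], let (before, after) = splitAt i ts ]

    -- mumble: fuse adjacent transitions (m1,m2)(m2,m3) into (m1,m3).
    mumble :: [Transition] -> [[Transition]]
    mumble ts =
      [ before ++ (a, d) : after
      | i <- [0 .. length ts - 2]
      , let (before, (a, b) : (c, d) : after) = splitAt i ts
      , b == c ]

    -- Repeated mumbling, as in the soundness proof: fuse a connected
    -- trace down to a single transition.
    fuse :: [Transition] -> [Transition]
    fuse [t] = [t]
    fuse ts  = case mumble ts of
                 (ts' : _) -> fuse ts'
                 []        -> ts  -- stuck: the trace has gaps in between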

Ignoring the final state, the correspondence holds in the other direction too:

Lemma 3

For every program \(M\) and trace in \(\llbracket {M}\rrbracket \) with a single transition starting at \(\langle {{\alpha }, {\mu }}\rangle \) and returned value \(r\), there is an observable behavior of \(M\) with initial state \(\langle {{\alpha }, {\mu }}\rangle \) and return value matching \(r\).

The lack of correspondence with the final state is an artifact of the concreteness-abstraction divergence between the operational and denotational semantics. Due to this divergence, it is significantly more challenging to establish this direction of the correspondence than in previous work.

Overcoming the concreteness-abstraction hurdle. The most technically challenging step in proving Lemma 3 is to prove that the application of the abstract rewrite rules can be deferred to the end. We define the basic denotation \({\underline{\llbracket {M}\rrbracket }}\) of a term \(M\) to be what the denotation would be were it defined using only the concrete rewrite rules. Denoting its closure under the abstract rewrite rules by \({{\underline{\llbracket {M}\rrbracket }}}^{\dagger }\), we claim:

Lemma 4

If \(M\) is a program, then \({{\underline{\llbracket {M}\rrbracket }}}^{\dagger } = {\llbracket {M}\rrbracket }\).

Thus, to obtain all of the traces that the regular denotational construction yields, in which all of the rewrite rules are applied throughout, it suffices to close under the concrete rewrite rules alone as the denotation of a program is built up from its subterms, and to apply the abstract rewrite rules only at the top level.
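Both the “least closed set” used to define the denotations and the dagger closure are instances of one generic fixpoint construction. A naive sketch, assuming the rewrite relation is given as a function producing finitely many results and that the closure is finite:

    import Data.List (nub)

    -- Close a set under a step function: keep adding rewrite results
    -- until nothing new appears. Terminates only for finite closures.
    close :: Eq a => (a -> [a]) -> [a] -> [a]
    close step xs0 = go xs0 xs0
      where
        go seen frontier =
          let new = nub [ y | x <- frontier, y <- step x, y `notElem` seen ]
          in if null new then seen else go (seen ++ new) new

In this vocabulary, Lemma 4 says that, for programs, closing under the concrete steps while composing denotations and then applying close with the abstract steps once at the top yields the same set as closing under both throughout.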

The intuition that guides the inductive proof of Lemma 4 is that the abstract rewrite rules can be percolated out. To get the main idea across while keeping the discussion self-contained, we focus on the \({{\underline{\llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket }}}^{\dagger } \supseteq \llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket \) case.

Let \(\pi \in \llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket \). By definition, \(\pi \) is obtained by first composing some \(\tau _1 \in \llbracket {M_1}\rrbracket \) in parallel with some \(\tau _2 \in \llbracket {M_2}\rrbracket \), i.e. interleaving transitions and pairing return values, and then rewriting the resulting trace \(\tau \) with concrete and abstract rules. By the inductive hypothesis, \({{\underline{\llbracket {M_i}\rrbracket }}}^{\dagger } \supseteq \llbracket {M_i}\rrbracket \). So \(\tau _i \in {{\underline{\llbracket {M_i}\rrbracket }}}^{\dagger }\), meaning that \(\tau _i\) is the result of rewriting some \(\tau _i' \in {\underline{\llbracket {M_i}\rrbracket }}\) with abstract rules.

To warm up, we first address the case where \(\tau _1\) is obtained from \(\tau _1'\) by a single application of absorb, and \(\tau _2' = \tau _2\). We would hope, naively, that we can compose \(\tau _1'\) with \(\tau _2'\) to obtain some \(\tau ' \in {\underline{\llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket }}\) that rewrites by absorb to \(\tau \), and thus \(\tau '\) rewrites to \(\pi \). However, they do not compose, because \(\tau _1'\) has two local messages, while \(\tau _2'\) has only the one environment message that matches the result of “absorbing” the two messages. Rather, \(\tau _1'\) can compose with a trace \(\bar{\tau }_2\) which is equal to \(\tau _2'\) except for having the required two environment messages instead of the combined one.

We formalize this by introducing a dual auxiliary rewrite rule \(\bar{\textsf{x}}\) for each abstract rule \(\textsf{x}\). For example, the dual of absorb is expel, which splits up an environment message dually to how absorb combines local messages. The auxiliary rewrite rules keep us within the basic denotations:

Lemma 5

If \(\tau \in {\underline{\llbracket {M}\rrbracket }}\) and \(\tau \) rewrites to \(\pi \) by some auxiliary rule \(\textsf{z}\), then \(\pi \in {\underline{\llbracket {M}\rrbracket }}\).

Then we apply expel to \(\tau _2'\), which by Lemma 5 yields \(\bar{\tau }_2 \in {\underline{\llbracket {M_2}\rrbracket }}\), and obtain the required \(\tau '\) by composing \(\tau _1'\) in parallel with \(\bar{\tau }_2\). This process of applying the dual rewrite in order to percolate an abstract rewrite out works for sequential composition too. We summarize:

Lemma 6

If \(\pi '\) rewrites to \(\pi \) by some abstract \(\textsf{x}\), and \(\pi \) composes in parallel with \(\varrho \) to obtain \(\tau \), then there exist \(\varrho '\), obtained from \(\varrho \) by the dual auxiliary rule \(\bar{\textsf{x}}\), and \(\tau '\), which rewrites to \(\tau \) by \(\textsf{x}\), such that \(\pi '\) composes in parallel with \(\varrho '\) to obtain \(\tau '\). Similarly for sequential composition.
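On a toy message representation, the absorb/expel duality behind Lemmas 5 and 6 can be pictured as follows. The Msg type and the way values are merged are hypothetical stand-ins: in the paper, messages carry locations, timestamp segments, and views, and absorb merges adjacent segments rather than just keeping one value.

    data Owner = Local | Environment deriving (Eq, Show)
    data Msg   = Msg { owner :: Owner, value :: Int } deriving (Eq, Show)

    -- absorb: merge two adjacent local messages into one combined message.
    absorb :: [Msg] -> [[Msg]]
    absorb ms =
      [ before ++ Msg Local v2 : after
      | i <- [0 .. length ms - 2]
      , let (before, Msg o1 _ : Msg o2 v2 : after) = splitAt i ms
      , o1 == Local, o2 == Local ]

    -- expel, the dual: split one environment message into two adjacent
    -- ones, matching what a parallel neighbor sees after an absorb.
    expel :: [Msg] -> [[Msg]]
    expel ms =
      [ before ++ Msg Environment v : Msg Environment v : after
      | i <- [0 .. length ms - 1]
      , let (before, Msg o v : after) = splitAt i ms
      , o == Environment ]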

In the case where more abstract rewrites are needed to obtain \(\tau _1\) from \(\tau _1'\), we can repeat the process. Yet two problems remain.

The first problem is that \(\pi \) is obtained from \(\tau ' \in {\underline{\llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket }}\) by both concrete and abstract rewrites, starting with the abstract rewrites that we have “peeled off” \(\tau _1\). To show that \(\pi \in {{\underline{\llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket }}}^{\dagger }\), we need the concrete rewrites to come before the abstract rewrites.

The second problem appears once we remove our simplifying assumption that \(\tau _2' = \tau _2\). In the general case, we obtain \(\bar{\tau }_2\) from \(\tau _2'\) using abstract rewrites followed by auxiliary rewrites. If we could replace this sequence of rewrites with one in which the abstract rewrites follow the auxiliary rewrites, then \(\tau _2'\) could first be rewritten with auxiliary rules to some \(\bar{\tau }_2' \in {\underline{\llbracket {M_2}\rrbracket }}\) using Lemma 5, which in turn could be rewritten with abstract rewrites to \(\bar{\tau }_2 \in {{\underline{\llbracket {M_2}\rrbracket }}}^{\dagger }\). This would allow the proof to continue by repeating the process on the other side.

Both problems are solved by commuting the abstract rewrites outwards:

Lemma 7

For any rewrite sequence starting with \(\tau \) and ending with \(\pi \), there exists one in which all of the abstract rewrites appear last.

Thus, we can proceed as planned and repeat the process on the other side, “peeling off” the abstract rewrites from \(\bar{\tau }_2\) to obtain \(\bar{\tau }_2' \in {\underline{\llbracket {M_2}\rrbracket }}\), while rewriting \(\tau _1'\) with the dual auxiliary rules in lockstep, resulting in some \(\bar{\tau }_1' \in {\underline{\llbracket {M_1}\rrbracket }}\) by Lemma 5. By Lemma 6, these compose in parallel to some \(\bar{\tau } \in {\underline{\llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket }}\) that rewrites with concrete and abstract rules to \(\tau \), and thus to \(\pi \). By Lemma 7, we can rewrite \(\bar{\tau }\) with concrete rules to some \(\bar{\tau }' \in {\underline{\llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket }}\) first, and with abstract rules afterwards, obtaining \(\pi \in {{\underline{\llbracket {M_1 \boldsymbol{\parallel }M_2}\rrbracket }}}^{\dagger }\).
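The commutation argument of Lemma 7 can be pictured as a normalization procedure on rewrite sequences: whenever an abstract step immediately precedes a concrete one, a local commutation swaps them, and iterating pushes all abstract steps to the end. A schematic sketch, in which the local commutation (commute) is an assumed oracle standing in for the case analysis of the actual proof:

    data Kind = Concrete | Abstract deriving (Eq, Show)
    type Step = (Kind, String)  -- a rewrite step: its kind and rule name

    -- Assumed: an adjacent Abstract-then-Concrete pair can be replaced by
    -- an equivalent Concrete-then-Abstract pair (possibly adjusting rules).
    commute :: Step -> Step -> (Step, Step)
    commute (_, a) (_, c) = ((Concrete, c), (Abstract, a))  -- placeholder

    -- Bubble every abstract step past all later concrete steps.
    normalize :: [Step] -> [Step]
    normalize ss = maybe ss normalize (bubble ss)
      where
        bubble ((Abstract, a) : (Concrete, c) : rest) =
          let (s1, s2) = commute (Abstract, a) (Concrete, c)
          in Just (s1 : s2 : rest)
        bubble (s : rest) = (s :) <$> bubble rest
        bubble []         = Nothing

Each successful swap moves a concrete step leftwards, so the procedure terminates with all abstract steps last.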

Having established Lemma 4, the rest is relatively straightforward. First, traces in basic denotations correspond to interrupted executions, and in particular, an analog of Lemma 3 holds for basic denotations:

Lemma 8

For every program \(M\) and trace in \({\underline{\llbracket {M}\rrbracket }}\) with a single transition starting at \(\langle {{\alpha }, {\mu }}\rangle \) and returned value \(r\), there is an observable behavior of \(M\) with initial state \(\langle {{\alpha }, {\mu }}\rangle \) and return value matching \(r\).

Next, it is clear from their definition that the abstract rules do not change the number of transitions. Thus, thanks to Lemma 4, the single-transition traces in \(\llbracket {M}\rrbracket \) are the result of rewriting single-transition traces in \({\underline{\llbracket {M}\rrbracket }}\) by abstract rules, which correspond to observable behaviors of \(M\) by Lemma 8.

Lemma 3 follows from the fact that the abstract rules preserve the correspondence between traces and observable behaviors of programs. For example, due to absorb, the denotation of a program that adds two messages contains a trace which only adds one; yet the initial view, the opening memory, and the returned value are maintained. The tighten rule similarly preserves these. In both cases, the execution exhibiting the behavior can remain unchanged. The dilute rule may replace an initial message’s timestamp with a smaller one, in which case the execution exhibiting the behavior needs to use the new timestamp accordingly, but otherwise remains the same.

Adequacy. The central result is (directional) adequacy, stating that denotational approximation corresponds to refinement of observable behaviors:

Theorem 9

If \(\llbracket {M}\rrbracket \subseteq \llbracket {N}\rrbracket \), then for all program contexts \(\Xi \left[ {{-}}\right] \), every observable behavior of \(\Xi \left[ {M}\right] \) is an observable behavior of \(\Xi \left[ {N}\right] \).

In particular, \(\llbracket {M}\rrbracket \subseteq \llbracket {N}\rrbracket \) implies that \(N\twoheadrightarrow M\) is valid under RA, because the effect of applying it is unobservable.

Adequacy follows immediately from the above results. Indeed, using soundness, an observable behavior of \(\Xi \left[ {M}\right] \) corresponds to a single-transition trace \(\tau \in \llbracket {\Xi \left[ {M}\right] }\rrbracket \); by the assumption and compositionality, \(\tau \in \llbracket {\Xi \left[ {N}\right] }\rrbracket \); and using the correspondence in the other direction, \(\tau \) corresponds to an observable behavior of \(\Xi \left[ {N}\right] \).

Higher-order subtleties. When applying the above results in the presence of higher order, one must pay attention to the program assumption. Indeed, suppose \(\llbracket {M}\rrbracket \supseteq \llbracket {M'}\rrbracket \). Compositionality does not entail that \(\llbracket {\lambda a_{}.\, M}\rrbracket \supseteq \llbracket {\lambda a_{}.\, M'}\rrbracket \). Indeed, a function \(\lambda a_{}.\, M\) is a value, i.e. it does not execute, and in particular it does not perform any effects, regardless of \(M\). Accordingly, \(\llbracket {\lambda a_{}.\, M}\rrbracket \) consists of closures of traces of the same form as return-traces, with returned value \(f\), where \(f\) is a function that returns sets of traces obtained from \(\llbracket {M}\rrbracket \). The fact that \(\llbracket {M}\rrbracket \supseteq \llbracket {M'}\rrbracket \) is not helpful, because traces in \(\llbracket {\lambda a_{}.\, M'}\rrbracket \) have returned values \(f'\) different from the values \(f\) returned by traces in \(\llbracket {\lambda a_{}.\, M}\rrbracket \).
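In the simplified sketch from Section 3.6, the obstacle is visible directly: a λ-abstraction denotes a pure computation whose returned value is a function, so two abstractions return different function values even when their bodies’ trace sets are comparable. The names here are hypothetical, reusing T and retT from above.

    type FunVal = Int -> T Int  -- a function value: argument to computation

    denoteLam :: (Int -> T Int) -> T FunVal
    denoteLam body = retT body  -- no transitions, regardless of the body's effects

Containment of the bodies’ denotations does not make retT body1 and retT body2 comparable as trace sets, since their returned values differ.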

Directional compositionality is still useful in the presence of abstractions. For example, if \(M\) is a program that returns a location, then from \(\llbracket {a\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}a\mathbin {\boldsymbol{:=}}w}\rrbracket \supseteq \llbracket {a\mathbin {\boldsymbol{:=}}w}\rrbracket \) it follows that \(\llbracket {\left( {\lambda a_{}.\, a\mathbin {\boldsymbol{:=}}v\mathbin {\boldsymbol{;}}a\mathbin {\boldsymbol{:=}}w}\right) M}\rrbracket \supseteq \llbracket {\left( {\lambda a_{}.\, a\mathbin {\boldsymbol{:=}}w}\right) M}\rrbracket \).

To deal with the pointwise proof obligations that abstractions bring about, such as containment of denotations in the proof of directional compositionality, we use logical relations. Moggi’s toolkit provides a standard way to define these, thereby lifting properties to their higher-order counterparts.

Transformations exhibiting abstraction. To the best of our knowledge, all transformations \(N\twoheadrightarrow M\) proven to be valid under RA in the existing literature are supported by our denotational semantics, i.e. \(\llbracket {N}\rrbracket \supseteq \llbracket {M}\rrbracket \). Structural transformations are supported by virtue of using Moggi’s standard semantics. Our semantics also validates “algebraic laws of parallel programming”, such as sequencing \(M\boldsymbol{\parallel }N\twoheadrightarrow \langle {M,N}\rangle \) and its generalization that Hoare and van Staden [22] recognized, \(\left( {M_1 \mathbin {\boldsymbol{;}}M_2}\right) \boldsymbol{\parallel }\left( {N_1 \mathbin {\boldsymbol{;}}N_2}\right) \twoheadrightarrow \left( {M_1 \boldsymbol{\parallel }N_1}\right) \mathbin {\boldsymbol{;}}\left( {M_2 \boldsymbol{\parallel }N_2}\right) \), which in the functional setting can take the more expressive form in which the values returned are passed on to the following computation. See Figure 10 for a partial list.

Fig. 10.

A selective list of supported non-structural transformations. Along with Symmetry, the denotational semantics supports all symmetric-monoidal laws with the binary operator \((\boldsymbol{\parallel })\) and the unit \(\langle {}\rangle \). Similar transformations, replacing \(\textrm{FAA}\) with other RMWs, are supported too. The abstract rewrite rule used to validate each transformation is mentioned, if there is one.

Hence we claim that our adequate denotational semantics is sufficiently abstract. This supports the case that Moggi’s semantic toolkit can successfully scale to handle the intricacies of RA concurrency by adapting Brookes’s traces.

5 Related Work and Concluding Remarks

Our work follows the approach of Brookes [13] and its extension to higher-order functions using monads by Benton et al. [6]. Brookes developed a denotational semantics for shared-memory concurrency under standard sequential consistency [33], and established full abstraction w.r.t. a language that has a global atomic \(\textbf{await}\) instruction that locks the entire memory. The concepts behind this approach have been used in multiple related developments, e.g. [12, 34, 35, 46]. We hope that our work targeting RA will pave the way for similar continuations.

Jagadeesan et al. [25] adapted Brookes’s semantics to the x86-TSO memory model [40]. They showed that for x86-TSO it suffices to include the final store buffer at the end of the trace and add two additional simple closure rules that emulate non-deterministic propagation of writes from store buffers to memory, and identify observably equivalent store buffers. The x86-TSO model, however, is much closer to sequential consistency than RA, which we study in this paper. In particular, unlike RA, x86-TSO is “multi-copy-atomic” (writes by one thread are made globally visible to all other threads at the same time) and successful RMW operations are immediately globally visible. Additionally, the parallel composition construct in Jagadeesan et al. [25] is rather strong: threads are forked and joined only when the store buffers are empty. Being non-multi-copy-atomic, RA requires a more delicate notion of traces and closure rules, but it has more natural meta-theoretic properties, which one would expect from a programming language concurrency model: sequencing, a.k.a. thread-inlining, is unsound under x86-TSO [see 25, 31] but sound under RA (see Figure 10).

Burckhardt et al. [14] developed a denotational semantics for hardware weak memory models (including x86-TSO) following an alternative approach. They represent sequential code blocks by sequences of operations that the code performs, and close them under certain rewrite rules (reorderings and eliminations) that characterize the memory model. This approach does not validate important optimizations, such as Read-Read Elimination. Moreover, unlike x86-TSO, RA cannot be characterized by rewrite operations on SC traces [31].

Dodds et al. [19] developed a fully abstract denotational semantics for RA, extended with fences and non-atomic accesses. Their semantics is based on RA’s declarative (a.k.a. axiomatic) formulation as acyclicity criteria on execution graphs. Roughly speaking, their denotation of code blocks (that they assume to be sequential) quantifies over all possible context execution graphs and calculates for each context the “happens-before” relation between context actions that is induced by the block. They further use a finite approximation of these histories to automatically validate refinement in a model checker. While we target RA as well, there are two crucial differences between our work and Dodds et al. [19]. First, we employ Brookes-style totally ordered traces and use an interleaving-based operational presentation of RA. Second, and more importantly, we strive for a compositional semantics where denotations of compound programs are defined as functions of denotations of their constituents, which is not the case for Dodds et al. [19]. Their model can nonetheless validate transformations by checking them locally, without access to the full program.

Others present non-compositional techniques and tools to check refinement under weak memory models between whole-thread sequential programs that apply for any concurrent context. Poetzl and Kroening [43] considered the SC-for-DRF model, using locks to avoid races. Their approach matches source to target by checking that they perform the same state transitions from lock to subsequent unlock operations and that the source does not allow more data-races. Morisset et al. [39] and Chakraborty and Vafeiadis [16] addressed this problem for the C/C++11 model, of which RA is a central fragment, by implementing matching algorithms between source and target that validate that all transformations between them have been independently proven to be safe under C/C++11.

Cho et al. [18] introduced a specialized semantics for sequential programs that can be used for justifying compiler optimizations under weak memory concurrency. They showed that behavior refinement under their sequential semantics implies refinement under any (sequential or parallel) context in the Promising Semantics 2.1 [17]. Their work focuses on optimizations of race-free accesses that are similar to C11’s “non-atomics” [4, 32]. It cannot be used to establish the soundness of program transformations that we study in this paper. Adding non-atomics to our model is an important future work.

Denotational approaches were developed for models much weaker than RA [15, 24, 26, 28, 41] that allow the infamous Read-Write Reorder and thus, for a high-level programming language, require addressing the challenge of detecting semantic dependencies between instructions [3]. These approaches are based on summarizing multiple partial orders between actions that may arise when a given program is executed under some context. In contrast, we use totally ordered traces by relating to RA’s interleaving operational semantics. In particular, Kavanagh and Brookes [28] use partial orders, Castellan, Paviotti et al. [15, 41] use event structures, and Jagadeesan et al., Jeffrey et al. [24, 26] employ “Pomsets with Preconditions” which trades compositionality for supporting non-multi-copy-atomicity, as in RA. These approaches do not validate certain access eliminations, nor Irrelevant Load Introduction, which our model validates.

An exciting aspect of our work is the connection between memory models and Moggi’s monadic approach. For SC, Abadi and Plotkin, Dvir et al. [1, 20] have made an even stronger connection via algebraic theories [42]. These make it possible to modularly combine shared-memory concurrency with other computational effects. Birkedal et al. [11] develop semantics for a type-and-effect system for SC memory, which they use to enhance compiler optimizations based on assumptions on the context that come from the type system. We hope that the current work can serve as a basis for extending such accounts to weaker models.