A Separation Logic for a Promising Semantics
Abstract
We present SLR, the first expressive program logic for reasoning about concurrent programs under a weak memory model addressing the out-of-thin-air problem. Our logic includes the standard features from existing logics, such as RSL and GPS, that were previously known to be sound only under stronger memory models: (1) separation, (2) per-location invariants, and (3) ownership transfer via release-acquire synchronisation—as well as novel features for reasoning about (4) the absence of out-of-thin-air behaviours and (5) coherence. The logic is proved sound over the recent “promising” memory model of Kang et al., using a substantially different argument from the soundness proofs of logics for simpler memory models.
1 Introduction
Recent years have seen the emergence of several program logics [2, 6, 8, 16, 23, 24, 26, 27, 28] for reasoning about programs under weak memory models. These program logics are valuable tools for structuring program correctness proofs and for enabling programmers to reason about the correctness of their programs without necessarily knowing the formal semantics of the programming language. So far, however, they have only been applied to relatively strong memory models (such as TSO [19] or release/acquire consistency [15]) that can be expressed as a constraint on individual candidate program executions, and they provide few or no reasoning principles for dealing with C/C++ “relaxed” accesses.
The main reason for this gap is that the behaviour of relaxed accesses is notoriously hard to specify [3, 5]. Until recently, memory models have either been too strong (e.g., [5, 14, 17]), forbidding some behaviours observed on modern hardware and compilers, or too weak (e.g., [4]), allowing so-called out-of-thin-air (OOTA) behaviour even though it does not occur in practice and is highly problematic.
Recently, there have been several proposals of programming language memory models that allow load buffering behaviour, but forbid obvious out-of-thin-air behaviours [10, 13, 20]. This development has enabled us to develop a program logic that provides expressive reasoning principles for relaxed accesses, without relying on overly strong models.
In this paper, we present SLR, a separation logic based on RSL [27], extended with strong reasoning principles for relaxed accesses, which we prove sound over the recent “promising” semantics of Kang et al. [13]. SLR features per-location invariants [27] and physical separation [22], as well as novel assertions that we use to show the absence of OOTA behaviours and to reason about various coherence examples. (Coherence is a property of memory models requiring the existence of a per-location total order on writes that reads respect.)
There are two main contributions of this work.
The second major contribution is the proof of soundness of SLR over the promising semantics [13]^{2}. The promising semantics is an operational model that represents memory as a collection of timestamped write messages. Besides the usual steps that execute the next command of a thread, the model has a non-standard step that allows a thread to promise to perform a write in the future, provided that it can guarantee to be able to fulfil its promise. After a write is promised, other threads may read from that write as if it had already happened. Promises allow the load-store reordering needed to exhibit the load buffering behaviour above, and yet seem, judging from a series of litmus tests, constrained enough not to introduce out-of-thin-air behaviour.
Since the promising model is rather different from all other (operational and axiomatic) memory models for which a program logic has been developed, none of the existing approaches for proving soundness of concurrent program logics are applicable to our setting. Two key difficulties in the soundness proof come from dealing with promise steps.
 1.
Promises are highly non-modular, as they can occur at any execution point and can affect locations that may only be accessed much later in the program.
 2.
Since promised writes can be immediately read by other threads, the soundness proof has to impose the same invariants on promised writes as the ones it imposes on ordinary writes (e.g., that only values satisfying the location’s protocol are written). In a logic supporting ownership transfer,^{3} however, establishing those invariants is challenging, because a thread may promise to write to x even without having permission to write to x.
To deal with the first challenge, our proof decouples promising steps from ordinary execution steps. We define two semantics of Hoare triples—one “promising”, with respect to the full promising semantics, and one “non-promising”, with respect to the promising semantics without promising steps—and prove that every Hoare triple that is correct with respect to its non-promising interpretation is also correct with respect to its promising interpretation. This way, we modularise the reasoning about promise steps. Even in the non-promising semantics, however, we do allow threads to have outstanding promises; the main difference is that threads are not allowed to issue new promises.
To resolve the second challenge, we observe that in programs verified by SLR, a thread may promise to write to x only if it is able to acquire the necessary write permission before performing the actual write. This follows from promise certification: the promising semantics requires all promises to be certifiable; that is, for every state of the promising machine, there must exist a non-promising execution of the machine that fulfils all outstanding promises.
2 Our Logic
The novelty of our program logic is that it allows non-trivial reasoning about relaxed accesses. Unlike release/acquire accesses, relaxed accesses do not induce synchronisation between threads, so the usual approach of program logics, which relies on ownership transfer, does not apply. Therefore, in addition to supporting ownership-transfer reasoning like a standard separation logic, our logic supports reasoning about relaxed accesses by collecting information about which values have been observed by reads, and in which order. Combined with information about which writes have been performed, this lets us deduce that certain executions are impossible.
For concreteness, we consider a minimal “WHILE” programming language with expressions, \(e \in \textit{Expr}\), and statements, \(s \in \textit{Stm}\), whose syntax is given in Fig. 1. Besides local register assignments, statements also include memory reads with relaxed or acquire mode, and memory writes with relaxed or release mode.
2.1 The Assertions of the Logic
\(\mathsf {Rel}(l,\phi )\) grants permission to perform a release write to location l and transfer away the invariant \(\phi (v)\), where v is the value written to that location. Conversely, \(\mathsf {Acq}(l,\phi )\) grants permission to perform an acquire read from location l and gain access to the invariant \(\phi (v)\), where v is the value returned by the read.
The first novel assertion form, \(\mathsf {O}({l},{v},{t})\), records the fact that location l was observed to have value v at timestamp t. The timestamp is used to order it with other reads from the same location. The information this assertion provides is very weak: it merely says that the owner of the assertion has observed that value, it does not imply that any other thread has ever observed it.
The other novel assertion form, \(\mathsf {W}^{\pi }({l},{X})\), asserts ownership of location l and records a set of writes X to that location. The fractional permission \(\pi \in \mathbb {Q}\) indicates whether ownership is shared or exclusive. Full permission, \(\pi = 1\), confers exclusive ownership of location l and ensures that X is the set of all writes to location l; any fraction, \(0< \pi < 1\), confers shared ownership and ensures that X is a lower bound on (i.e., a subset of) the set of writes to location l. The order of writes to l is tracked through timestamps; the set X is thus a set of pairs consisting of the value and the timestamp of each write.
In examples where we only need to refer to the order of writes and not the exact timestamps, we write \(\mathsf {W}^{\pi }({x},{\ell })\), where \(\ell = [v_1, ..., v_n]\) is a list of values, as shorthand for \(\mathsf {W}^{\pi }({x},{\{(v_1,t_1),\dots ,(v_n,t_n)\}})\) for some timestamps \(t_1> \dots > t_n\). The \(\mathsf {W}^{\pi }({x},{\ell })\) assertion thus expresses ownership of location x with permission \(\pi \), and that the writes to x are given by the list \(\ell \) in order, with the most recent write at the front of the list.
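To make the shorthand concrete, here is a small Python sketch (not part of the formal development; the function names `writes_of_list` and `values_of_writes` are ours) relating a value list to the underlying set of value-timestamp pairs:

```python
# Sketch: the W^pi(x, l) shorthand pairs each value in the list l with a
# strictly decreasing timestamp, most recent write first.

def writes_of_list(values, timestamps):
    """Turn a list of values [v1, ..., vn] (most recent first) and
    matching timestamps t1 > ... > tn into a write set {(vi, ti)}."""
    assert len(values) == len(timestamps)
    assert all(t1 > t2 for t1, t2 in zip(timestamps, timestamps[1:]))
    return set(zip(values, timestamps))

def values_of_writes(writes):
    """Recover the value list, most recent (largest timestamp) first."""
    return [v for v, _ in sorted(writes, key=lambda vt: vt[1], reverse=True)]

X = writes_of_list([3, 2, 1], [30, 20, 10])
assert values_of_writes(X) == [3, 2, 1]
```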
2.2 The Rules of the Logic for Relaxed Accesses
which allows one to use “view shifting” implications to strengthen the precondition and weaken the postcondition.
The rules for relaxed accesses are adapted from the rules of RSL [27] for release/acquire accesses, but use our novel resources to track the more subtle behaviour of relaxed accesses. Since relaxed accesses do not introduce synchronisation, they cannot be used to transfer ownership; they can, however, be used to transfer information. For this reason, as in RSL [27], we associate a predicate \(\phi \) on values to a location x using paired \(\mathsf {Rel}(x,\phi )\) and \(\mathsf {Acq}(x,\phi )\) resources, for writers and readers, respectively. To write v to x, a writer has to provide \(\phi (v)\), and in exchange, when reading v from x, a reader obtains \(\phi (v)\). However, here, relaxed writes can only send pure predicates (i.e., ones that do not assert ownership of any resources), and relaxed reads obtain the assertion from the predicate only under a modality \(\nabla {}\)^{4} through which only pure assertions filter: if P is pure, then \(\nabla {P} \Longrightarrow P\). All assertions expressible in first-order logic are pure.
As before, we can obtain \(\mathsf {O}({x},{v_0^x},{0})\), where \(v_0^x\) is the initial value of x, from the initial write permission for x, and distribute it to all the threads that will read from x. This expresses the fact that the initial value is available to all threads, and provides the \(\mathsf {O}({x},{\_},{t})\) required in the precondition of the read rule.
Separation. With these assertions, we can straightforwardly specify and verify the DisjointLists example. Ownership of an element of a list is simply expressed using a full write permission, \(\mathsf {W}^{1}({x},{X})\). This allows including DisjointLists as a snippet in a larger program in which the lists may be shared before or after, while still enforcing the separation property we want to establish. While this reasoning may sound underwhelming (and we elide the details), we remark that it is unsound in models that allow OOTA behaviours.
2.3 Reasoning About Coherence
We use the timestamps in the \(\mathsf {O}({x},{a},{t})\) assertions to record the order in which reads observe values, and then link the timestamps of the reads with those of the writes. Because we do not transfer anything, the predicate for x is again the trivial one, and we elide the associated clutter below.
2.4 Handling Release and Acquire Accesses
Release/acquire can be understood abstractly in terms of views [15]: a release write contains the view of the writing thread at the time of the writing, and an acquire read updates the view of the reading thread with that of the release write it is reading from. This allows one-way synchronisation of views between threads.
To handle release/acquire accesses in SLR, we can adapt the rules for relaxed accesses by enabling ownership transfer according to the predicate associated with the \(\mathsf {Rel}\) and \(\mathsf {Acq}\) permissions. The resulting rules are strictly more powerful than the corresponding RSL [27] rules, as they also allow us to reason about coherence.
2.5 Plain Accesses
Our formal development (in the technical appendix) also features the usual “partial ownership” \(x {\mathop {\mapsto }\limits ^{\pi }} v\) assertion for “plain” (nonatomic) locations, and the usual corresponding rules.
3 The Promising Semantics
In this section, we provide an overview of the promising semantics [13], the model for which we prove SLR sound. Formal details can be found in [1, 13].

As in the “strong release/acquire” model [15], the memory is a pool of timestamped messages, and each thread maintains a “view” thereof. A thread may read any value that is not older than the latest value it has observed for the given location; in particular, this may well not be the latest value written to that location. Timestamps and views model non-multi-copy-atomicity: writes performed by one thread do not become simultaneously visible to all other threads.

The operational semantics contains a non-standard step: at any point a thread can nondeterministically promise a write, provided that, at every point before the write is actually performed, the thread can certify the promise, that is, execute the write by running on its own from the current state. Promises are used to enable load-store reordering.
The behaviour of promising steps can be illustrated on the LB+data+fakedep litmus test from the Introduction. The second thread can, at the very start of the execution, promise a write of 1 to x, because it can, by running on its own from the current state, read from y (it will read 0), then write 1 to x (because \(0 + 1 - 0 = 1\)), thereby fulfilling its promise. On the other hand, the first thread cannot promise a write of 1 to y at the beginning of the execution, because, by running on its own, it can only read 0 from x, and therefore only write 0 to y.
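The informal certification argument above can be mimicked by a toy Python computation; the helper functions are ours and only mirror the reasoning about solo runs, not the formal machine:

```python
# Toy certification check for LB+data+fakedep: each thread runs on its own
# against the initial memory (all locations hold 0) and we ask which value
# it can write, i.e. which promise it could certify.

def thread2_solo(read_y):
    # second thread: a := y; x := a + 1 - a  (fake dependency)
    a = read_y
    return a + 1 - a          # value written to x

def thread1_solo(read_x):
    # first thread: a := x; y := a  (real data dependency)
    a = read_x
    return a                  # value written to y

# Running solo, both threads read the initial value 0.
assert thread2_solo(0) == 1   # can certify a promise of x := 1
assert thread1_solo(0) == 0   # can only write 0, so cannot promise y := 1
```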
3.1 Storage Subsystem
3.2 Thread Subsystem
A thread state is a pair \( TS ={\langle \sigma ,V \rangle }\), where \(\sigma \) is the internal state of the thread and \(V\) is a view. We denote by \( TS .\sigma \) and \( TS .V\) the components of \( TS \).
Thread Internal State. The internal state \(\sigma \) consists of a thread store (denoted \(\sigma .\mu \)) that assigns values to local registers and a statement to execute (denoted \(\sigma .s\)). The transitions of the thread internal state are labelled with memory actions and are given by an ordinary sequential semantics. As these are routine, we leave their description to the technical appendix.
Views. Thread views are used to enforce coherence, that is, the existence of a per-location total order on writes that reads respect. A view is a function \(V:{\textit{Loc}}\rightarrow {\textit{Time}}\), which records how far the thread has seen in the history of each location. To ensure that a thread does not read stale messages, its view restricts the messages the thread may read, and is increased whenever the thread observes a new message. Messages themselves also carry a view (the thread’s view when the message comes from a release write, and the bottom view otherwise), which is incorporated into the thread view when the message is read by an acquire read.
Additional Notations. The order on timestamps, \(\le \), is extended pointwise to views. \(\bot \) and \(\sqcup \) denote the natural bottom element and join operation for views. We write \([x \mapsto t]\) for the view assigning t to x and \({\small 0}\) to all other locations.
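As an illustration, views and the notations above admit a direct sketch in Python; the class and method names are ours:

```python
# Sketch of views as functions Loc -> Time, represented as dicts that
# default to timestamp 0, with the pointwise order and join from the text.

class View:
    def __init__(self, m=None):
        self.m = dict(m or {})

    def __getitem__(self, loc):
        return self.m.get(loc, 0)          # bottom view: all timestamps 0

    def le(self, other):
        # pointwise <= (locations absent from self.m hold 0, so they
        # are trivially below any timestamp)
        return all(self[l] <= other[l] for l in self.m)

    def join(self, other):                  # pointwise maximum
        locs = set(self.m) | set(other.m)
        return View({l: max(self[l], other[l]) for l in locs})

def singleton(loc, t):
    """The view assigning t to loc and 0 to every other location."""
    return View({loc: t})

bot = View()
v = singleton('x', 3).join(singleton('y', 1))
assert bot.le(v) and v['x'] == 3 and v['y'] == 1 and v['z'] == 0
```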
3.3 Interaction Between a Thread and the Storage Subsystem

Make an internal transition with no effect on the storage subsystem.

Read the value v from location x, when there is a matching message in memory that is not outdated according to the thread’s view. The thread then updates its view accordingly: it advances the timestamp for location x and, in addition, incorporates the message view if the read is an acquire read.

Write the value v to location x. Here, the thread picks a timestamp greater than that of its current view for the message it adds to memory (or removes from the promise set). If the write is a release write, the message carries the view of the writing thread. Moreover, a release write to x can only be performed once the thread has fulfilled all its promises to x.

Nondeterministically promise a relaxed write by adding a message to both M and P.
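A minimal Python sketch of the read and write interactions above, under our own simplifications (relaxed accesses only, views as finite maps defaulting to timestamp 0, messages as triples):

```python
# Messages are (loc, val, time). A thread may read a message for x whose
# timestamp is not below its view of x; a write picks a fresh timestamp
# strictly above the thread's view of x and advances the view.

def readable(memory, view, x):
    return [m for m in memory if m[0] == x and m[2] >= view.get(x, 0)]

def read(view, msg):
    x, _, t = msg
    view = dict(view)
    view[x] = max(view.get(x, 0), t)       # advance view to the message read
    return view

def write(memory, view, x, v):
    # pick a timestamp above every existing write to x and above the view
    t = max([m[2] for m in memory if m[0] == x] + [view.get(x, 0)]) + 1
    return memory + [(x, v, t)], {**view, x: t}

mem = [('x', 0, 0), ('x', 1, 5)]
view = {'x': 5}
assert [m[1] for m in readable(mem, view, 'x')] == [1]   # stale x=0 excluded
assert read(view, ('x', 1, 5))['x'] == 5
mem2, view2 = write(mem, view, 'x', 7)
assert view2['x'] == 6 and ('x', 7, 6) in mem2
```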
3.4 Constraining Promises
The thread configuration \(\Longrightarrow \)-transitions allow a thread to (1) take any number of non-promising steps, provided its thread configuration at the end of the sequence of steps (intuitively speaking, when it gives control back to the scheduler) is consistent, or (2) take a promising step, again provided that its thread configuration after the step is consistent.
3.5 Full Machine
Finally, the full machine transitions simply lift the thread configuration \(\Longrightarrow \)-transitions to the machine level. A machine state is a tuple \({\mathbf {MS}}= \langle \mathcal {T\!S}, \langle M, P \rangle \rangle \), where \(\mathcal {T\!S}\) is a function assigning a thread state \( TS \) to every thread, and \({\langle M,P \rangle }\) is a global configuration. The initial state \({\mathbf {MS}}^0\) (for a given program) consists of: the function \(\mathcal {T\!S}^0\) mapping each thread i to its initial state \({\langle \sigma _i^0,\bot \rangle }\), where \(\sigma _i^0\) is the thread’s initial local state and \(\bot \) is the zero view (all timestamps in views are \({\small 0}\)); the initial memory \(M^0\), consisting of one message \(\langle x :_{{\small 0}}^{\texttt {rlx}} 0, \bot \rangle \) for each location x; and the empty set of promises.
4 Semantics and Soundness
 1.
Reasoning about promises. This difficulty arises because promise steps can be nondeterministically performed by the promise machine at any time.
 2.
Reasoning about releaseacquire ownership transfer in the presence of promises. The problem is that writes may be promised before the thread has acquired enough resources to allow it to actually perform the write.
4.1 The Intuition
SLR assertions are interpreted by (sets of) resources, which represent permissions to write to a certain location and/or to obtain further resources by reading a certain message from memory. As is common in semantics of separation logics, the resources form a partial commutative monoid, and SLR’s separating conjunction is interpreted as the composition operation of the monoid.
When defining the meaning of a Hoare triple \(\{{P}\}\;{s}\;\{{Q}\}\), we think of the promise machine as if it were manipulating resources: each thread owns some resources and operates using them. The intuitive description of the Hoare triple semantics is that every run of the program \(s\) starting from a state containing the resources described by the precondition, P, will be “correct” and, if it terminates, will finish in a state containing the resources described by the postcondition, Q. The notion of a program running correctly can be described in terms of threads “respecting” the resources they own; for example, if a thread is executing a write or fulfilling a promise, it should own a resource representing the write permission.
4.2 A Closer Look at the Resources and the Assertion Semantics
We now take a closer look at the structure of resources and the semantics of assertions, whose formal definitions can be found in Figs. 2 and 3.
In addition, however, we have to deal with assertions that are parametrised by predicates (in our case, \(\mathsf {Rel}(x,\phi )\) and \(\mathsf {Acq}(x,\phi )\)). Doing so is not straightforward because naïve attempts at giving semantics to such assertions result in circular definitions. A common technique for avoiding this circularity is to treat predicates stored in assertions syntactically, and to interpret assertions relative to a world, which is used to interpret those syntactic predicates. In our case, worlds consist of two components: the WrPerm component associates a syntactic SLR predicate with every location (this component is used to interpret release permissions), while the AcqPerm component associates a syntactic predicate with a finite number of currently allocated predicate identifiers (this component is used to interpret acquire permissions). The reason for the more complex structure for acquire permissions is that they can be split (see (AcquireSplit)); we therefore allow multiple predicate identifiers to be associated with a single location. When acquire permissions are divided and split between threads, new predicate identifiers are allocated and associated with predicates in the world. The world ordering, \(\mathcal {W}_1 \le \mathcal {W}_2\), expresses that world \(\mathcal {W}_2\) is an extension of \(\mathcal {W}_1\) in which new predicate identifiers may have been allocated, but all existing predicate identifiers are associated with the same predicates.
Let us now focus our attention on the assertion semantics. The semantics of assertions, \(\llbracket P \rrbracket _{\mu ,\eta }\), is relative to a thread store \(\mu \) that assigns values to registers, and an environment \(\eta \) that assigns values to logical variables.

The observed assertion \(\mathsf {O}({x},{v},{t})\) says that the memory contains a message at location x with value v and timestamp t, and the current thread knows about it (i.e., the thread view contains it).

The write assertion \(\mathsf {W}^{\pi }({x},{X})\) asserts ownership of a (partial, with fraction \(\pi \)) write resource at location x, and requires that the largest timestamp recorded in X does not exceed the view of the current thread.

The acquire assertion, \(\mathsf {Acq}(x, \phi )\), asserts that location x has some predicate identifier \(\iota \) associated with the \(\phi \) predicate in the current world \(\mathcal {W}\).

The release assertion, \(\mathsf {Rel}(x, \phi )\), asserts that location x is associated with some predicate \(\phi '\) in the current world such that there exists a syntactic proof that \(\phi (v)\) entails \(\phi '(v)\) for every value v. The entailment allows us to strengthen the predicate in release assertions.

Finally, \(\nabla {P}\) states that P is satisfiable in the current world.
4.3 Relating Concrete State and Resources
Before giving a formal description of the relationship between abstract resources and concrete machine states, we return to the intuition of threads manipulating resources presented in Sect. 4.1.
Consider what happens when a thread executes a release write to a location x. At that point, the thread has to own a release resource represented by \(\mathsf {Rel}(x,\phi )\), and to store the value v, it has to own the resources represented by \(\phi (v)\). As the write is executed, the thread gives up the ownership of the resources corresponding to \(\phi (v)\). Conversely, when a thread that owns the resource represented by \(\mathsf {Acq}(x,\phi )\) performs an acquire read of a value v from location x, it will gain ownership of resources satisfying \(\phi (v)\). However, this picture does not account for what happens to the resources that are “in flight”, i.e., the resources that have been released, but not yet acquired.
The last condition in the message resource satisfaction relation has to do with relaxed accesses. Since relaxed accesses do not provide synchronisation, we disallow ownership transfer through them. Therefore, we require that the release predicates connected with the relaxed messages are satisfiable with the empty resource. This condition, together with the requirement that the released resources satisfy acquire predicates, forbids ownership transfer via relaxed accesses.
 1.
A thread performs a write. This is the straightforward case: we simply require the thread to own the write resource and to update the set of value-timestamp pairs recorded in the resource accordingly.
 2.
A thread promises a write. Here the situation is more subtle, because the thread might not own the write resource at the time it is issuing the promise, but will acquire the appropriate resource by the time it fulfils the promise. So, in order to assert that the promise step respects the resources owned by the thread, we also need to be able to talk about the resources that the thread can acquire in the future.
When dealing with the promises, the saving grace comes from the fact that all promises have to be certifiable, i.e., when issuing a promise a thread has to be able to fulfil it without help from other threads.
Intuitively, the existence of a certification run tells us that, even though at the moment a thread issues a promise it might not have the resources necessary to actually perform the corresponding write, the thread should, by running uninterrupted, still be able to obtain the needed resources before it fulfils the promise. This, in turn, tells us that the needed resources must already have been released by the other threads by the time the promise is made: only resources attached to messages in memory are available to be acquired, and only the thread that made the promise is allowed to run during the certification; therefore, all the available resources have already been released.
An important element that was omitted from the discussion so far is the definition of the composition in the resource monoid \( Res \). The resource composition, defined in Fig. 5, follows the expected notion of per-component composition. The most important feature is in the composition of write resources: a full permission write resource is only composable with the empty write resource.
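The composition of write resources can be sketched as follows; the representation of a write resource as a (permission, write set) pair and the name `compose` are ours, with `None` modelling undefinedness of the partial operation:

```python
from fractions import Fraction

# Sketch: a write resource is (perm, writes). Composition adds fractions
# and unions write sets; it is undefined (None) when permissions exceed 1,
# so a full permission composes only with the empty resource.

EMPTY = (Fraction(0), frozenset())

def compose(r1, r2):
    perm = r1[0] + r2[0]
    if perm > 1:
        return None                      # undefined: incompatible resources
    return (perm, r1[1] | r2[1])

full = (Fraction(1), frozenset({(1, 10)}))
half = (Fraction(1, 2), frozenset({(1, 10)}))
assert compose(full, EMPTY) == full      # full composes only with empty
assert compose(full, half) is None
assert compose(half, half)[0] == 1
```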
 1.
Memory M is consistent with respect to the total resource r and the message resource assignment u at world \(\mathcal {W}\).
 2.
The set of fulfilled writes to each location x in \({\langle M, P \rangle }\) must match the set of writes of all write permissions owned by any thread or associated with any messages, when combined.
 3.
For all unfulfilled promises to a location x by thread i, thread i must currently own or be able to acquire from u at least a shared write permission for x.
Our formal notion of erasure, defined in Fig. 6, has an additional parameter, a set of thread identifiers T. This set allows us to exclude the promises of the threads in T from the requirement of respecting the resources. As we will see in the following subsection, this additional parameter plays a subtle but key role in the soundness proof. (The notion of erasure described above corresponds to the case \(T=\emptyset \).)
Note also that the arguments of erasure very precisely account for who owns which part of the total resource. This diverges from the usual approach in separation logic, where we just give the total resource as the argument to the erasure. Our approach is motivated by Lemma 1, which states that a reader that owns the full write resource for location x knows which value it is going to read from x. This is the key lemma in the soundness proof of the (rrlx*) and (racq*) rules.
Lemma 1
If thread i’s resource \(r_F(i)\), together with its view \(V\), satisfies the full write assertion \(\mathsf {W}^{1}({x},{X})\), and \({\langle M,P \rangle } \in {\lfloor r_F, u, \mathcal {W} \rfloor _{\{ i \}}}\), then for all messages \(m \in {M}(x) \setminus {P}(i)\) such that \(V(x) \le m.\texttt {time}\), we have \(m.\texttt {val}= \textit{fst}(\max (X))\).
Lemma 1 takes the perspective of a thread i that owns the full write resource for location x; this is the lemma’s first assumption (recall that \(r_F(i)\) are the resources owned by thread i). Furthermore, the lemma assumes that the concrete state respects the abstract resources, expressed by \({\langle M,P \rangle } \in {\lfloor r_F, u, \mathcal {W} \rfloor _{\{ i \}}}\). Under these assumptions, the lemma intuitively tells us that the current thread knows which value it will read from x. Formally, it says that all the messages thread i is allowed to read (i.e., messages in the memory that are not outstanding promises of thread i and whose timestamp is greater than or equal to thread i’s view) have the value of the maximal (by timestamp) element of the set X.
To see why this lemma holds, consider a message \(m \in {M}(x) \setminus {P}(i)\). If m is an unfulfilled promise by a different thread j, then, by erasure, it follows that j currently owns or can acquire at least a shared write permission for x. However, this is a contradiction, since thread i currently owns the exclusive write permission, and, by erasure, \(r_F(i)\) is disjoint from the resources of all other threads and all resources currently associated with messages by u. Hence, m must be a fulfilled write. By erasure, it follows that the set of fulfilled writes to x is given by the combination of all write permissions. Since \(r_F(i)\) owns the exclusive write permission, this is just \(r_F(i).\texttt {wr}\). Hence, the set of fulfilled writes is X, and the value of the last fulfilled write is \(\textit{fst}(\max (X))\).
Note that in the reasoning above, it is crucial to know which thread and which message owns which resource. Without precisely tracking this information, we would be unable to prove Lemma 1.
4.4 Soundness
Now that we have our notion of erasure, we can proceed to formalise the meaning of triples, and present the key points of the soundness proof.
 1.
If no more steps can be taken, the current state and resources have to satisfy the postcondition B.
 2.
If we can take a step which takes us from the state \({\langle M,P \rangle }\) (which respects our current resources r, the assignment of resources to messages u, and world \(\mathcal {W}\)) to the state \({\langle M',P' \rangle }\), then
 (a)
there exist resources \(r'\), an assignment of resources to messages \(u'\), and a future world \(\mathcal {W}'\), such that \({\langle M',P' \rangle }\) respects \(r'\), \(u'\), and \(\mathcal {W}'\), and
 (b)
we are safe for n more steps starting in the state \({\langle M',P' \rangle }\) with resources given by \(r'\), \(u'\) and \(\mathcal {W}'\).

Upon termination, we are not required to satisfy exactly the postcondition B, but a view shift of it. A view shift is a standard notion in concurrent separation logics that allows updates of the abstract resources that do not affect the concrete state. In our case, this means that the resource r can be view-shifted into an \(r'\) satisfying B as long as the erasure is unchanged. The formal definition of view shifts is given in the appendix.

Again as is standard in separation logics, safety requires framed resources to be preserved. This is the role of \(r_F\) in the safety definition. Frame preservation allows us to compose safety of threads that own compatible resources. However, departing from the standard notion of frame preservation, we precisely track who owns which resource in the frame, because this is important for erasure.
To do so, certification runs for promises once again play a pivotal role. Recall that whenever a thread makes a step, it has to be able to fulfil its promises without help from other threads (Sect. 3.4). Since there will be no interference from other threads, performing promise steps during certification is of no use (because promises can only be used by other threads). Therefore, we can assume that certification runs are always promise-free.
Now that we have established that certifications are promise-free, the key idea behind encapsulating the reasoning about promises is as follows. If we know that all executions of our program are safe for arbitrarily many non-promising steps, we can conclude that they are safe for promising steps too. Here, we use the fact that certification runs are possible runs of the program, and the fact that certifications are promise-free.
Let us now formalise this key idea. First, we need a way to state that executions are safe for non-promising steps. This is expressed by the non-promising safety predicate defined in Fig. 8. What we want to conclude is that non-promising safety is enough to establish safety, as expressed by Theorem 1:
Theorem 1
We now discuss several important points in the definition of non-promising safety that enable us to prove this theorem.
Non-promising Safety is Indexed by Pairs of Natural Numbers. When proving Theorem 1, we use promise-free certification runs to establish the safety of the promise steps. A problem we face here is that the length of certification runs is unbounded: we have to know that, whenever the thread makes a step, it is \(\mathrm {npsafe}\) for arbitrarily many steps. Our solution is to index \(\mathrm {npsafe}\) transfinitely over pairs of natural numbers ordered lexicographically. That way, if we are \(\mathrm {npsafe}\) at index \((n+1,0)\) and we take a step, we know that we are \(\mathrm {npsafe}\) at index (n, m) for every m. We are then free to choose a sufficiently large m, depending on the length of the certification run we are considering.
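The role of the lexicographic index can be illustrated with a small Python check; the `lex_lt` helper is ours and stands in for the ordering only, not for the actual \(\mathrm {npsafe}\) definition:

```python
# Pairs of naturals ordered lexicographically: after a step from index
# (n+1, 0) we may continue at (n, m) for ANY m, which lets us pick m as
# large as the certification run we need to replay.

def lex_lt(a, b):
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])

n = 4
# (n, m) < (n+1, 0) for every m, however large:
assert all(lex_lt((n, m), (n + 1, 0)) for m in [0, 1, 10, 10**6])
# Within a fixed first component, the second behaves as usual:
assert lex_lt((n, 41), (n, 42))
```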
Non-promising Safety Considers Configurations that May Contain Promises. It is important to note that the definition of non-promising safety does not require the starting configuration to be free of promises; it only requires that no further promises are issued. This is crucial for Theorem 1: safety considers all possible starting configurations (including ones with existing promises), so for the theorem to hold, non-promising safety has to consider all possible starting configurations too.
Additional Constraints Imposed by Non-promising Safety. Non-promising safety also imposes additional constraints on the reducing thread i. First, any write permission owned or acquirable by i after the reduction step was already owned or acquirable by i before it. Intuitively, this holds because thread i can only transfer resources away and take ownership of resources it was already allowed to acquire before reducing. Second, non-promising safety requires that if the reduction of i performs any new writes or fulfils any old promises, it must own the write permission for the location of the given message. Together, these two conditions ensure that if a promise is fulfilled during a thread-local certification and the thread satisfies non-promising safety, then the thread already owned or could acquire the write permission for the location of the promise. This is expressed formally in Lemma 2.
Lemma 2
Assuming that \(({\langle M,P \rangle }, V, r) \in \mathrm {npsafe}_{(n+1, k)}(\sigma , B)(W)\) and \({\langle M,P \rangle } \in {\lfloor r_F[i \mapsto r \bullet f], u, W \rfloor _{\{ i \}}}\) and \(\langle \langle \sigma , V \rangle , {\langle M,P \rangle } \rangle {\mathop {\longrightarrow }\limits ^{\text {NP}}}^k_i \langle \langle \sigma ', V' \rangle , {\langle M',P' \rangle } \rangle \) and \(m \in (M' \setminus P') \setminus (M \setminus P)\), we have \((r \bullet \mathrm {canAcq}(r, u)).\texttt {wr}(m.\texttt {loc}).\texttt {perm}> 0\).
The intuition for why Lemma 2 holds is as follows. Since only thread i executes, the definition of non-promising safety guarantees that any write permission i owns or can acquire when the promise is fulfilled is one it already owned or could acquire in the initial state. Furthermore, whenever a promise is fulfilled, the non-promising safety definition explicitly requires ownership of the corresponding write permission. It follows that, already in the initial state, the thread owns or can acquire the write permission for the location of the given promise.
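The two conditions and how they chain can be pictured with a toy permission ledger (our own sketch; the names and set-based representation are not the paper's):

```python
# Per-step check mirroring the two conditions of non-promising safety:
#  (a) the owned-or-acquirable write permissions never grow across a step;
#  (b) fulfilling a promise for location l requires owning permission for l.

def check_step(owned_before, acquirable_before,
               owned_after, acquirable_after, fulfilled_locs):
    before = owned_before | acquirable_before
    after = owned_after | acquirable_after
    cond_a = after <= before               # no new permissions appear
    cond_b = fulfilled_locs <= owned_after  # fulfilment needs ownership
    return cond_a and cond_b

# Thread owns {x}, can acquire {y}; it fulfils a promise to x and
# transfers y away: both conditions hold.
assert check_step({"x"}, {"y"}, {"x"}, set(), {"x"})

# A step that conjures a fresh permission for z violates condition (a).
assert not check_step({"x"}, set(), {"x", "z"}, set(), set())
```

Chaining condition (a) backwards through a solo certification run is what gives Lemma 2: a permission required at the fulfilment step (by condition (b)) must already have been owned or acquirable in the initial state.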
Lemma 2 gives us exactly the property we need to re-establish erasure after the operational semantics introduces a new promise. This makes Lemma 2 the key step in the proof of Theorem 1, which allows us to disentangle reasoning about promising steps from reasoning about normal reduction steps. Theorem 1 tells us that, to prove a proof rule sound, it is enough to prove that non-promising safety holds for arbitrary indices. This frees us from cumbersome reasoning about promise steps and allows us to focus on non-promising reduction steps when proving the proof rules sound.
We can now state our top-level correctness theorem, Theorem 2. Since our language only has top-level parallel composition, we need a way to distribute initial resources to the various threads, and to collect all the resources once all the threads have finished. The correctness theorem gives us precisely that:
Theorem 2
 1.
 2. \(\vdash \circledast _{x \in A} \mathsf {Rel}(x, \phi _x) * \mathsf {Acq}(x, \phi _x) * \mathsf {W}^{1}({x},{\{ (0, 0) \}}) \;\Rrightarrow \; \circledast _{i \in {\textit{Tid}}}\, P_i\)
 3. \(\vdash \{{P_i}\}\;{s_i}\;\{{Q_i}\}\) for all i
 4. … for all i
 5. \(\vdash \circledast _{i \in {\textit{Tid}}}\, Q_i \Rrightarrow Q\)
 6. \( FRV (Q_i) \cap FRV (Q_j) = \emptyset \) for all distinct \(i, j \in {\textit{Tid}}\)
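The resource-distribution shape of the theorem (split the initial resource into disjoint per-thread pieces \(P_i\), run each thread to obtain \(Q_i\), recombine) can be sketched as follows; this is our own illustration with invented names, not the paper's model of resources:

```python
# Splitting a resource (here, a map from locations to values) into
# disjoint per-thread pieces, and recombining disjoint pieces -- a toy
# analogue of the iterated separating conjunction in Theorem 2.

def split(resource, tids):
    """Distribute the locations of `resource` among threads, disjointly."""
    pieces = {i: {} for i in tids}
    for k, (loc, val) in enumerate(sorted(resource.items())):
        pieces[tids[k % len(tids)]][loc] = val
    return pieces

def combine(pieces):
    """Separating conjunction over pieces: union, defined only if disjoint."""
    out = {}
    for p in pieces.values():
        assert not (out.keys() & p.keys()), "pieces must be disjoint"
        out.update(p)
    return out

initial = {"x": 0, "y": 0}
pieces = split(initial, [1, 2])
# Each thread i would run with precondition P_i = pieces[i], producing
# some Q_i; disjointness (condition 6) makes the recombination well-defined.
assert combine(pieces) == initial
```

The disjointness assertion in `combine` corresponds to condition 6 of the theorem: the free resource variables of the postconditions must not overlap.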
5 Related Work
There are a number of techniques for reasoning under relaxed memory models, but besides the DRF theorems and some simple invariant logics [10, 13], no other techniques have been proved sound for a model allowing the weak behaviour of LB+data+fakedep from the introduction. The "invariant-based program logics" are by design unable to reason about programs like the random number generator, where having a bound on the set of values written to a location is not enough, let alone to reason about functional correctness of a program.
Relaxed Separation Logic (RSL). Among program logics for relaxed memory, the most closely related is RSL [27]. There are two versions of RSL: a weak one that is sound with respect to the C/C++11 memory model, which features out-of-thin-air reads, and a stronger one that is sound with respect to a variant of the C/C++11 memory model that forbids load buffering.
The weak version of RSL forbids relaxed writes completely, and does not constrain the value returned by a relaxed read. The stronger version provides single-location invariants for relaxed accesses, but its soundness proof relies strongly on a strengthened version of C/C++11 without \( po \cup rf \) cycles (where \( po \) is program order and \( rf \) is the reads-from relation), which forbids load buffering.
When it comes to reasoning about coherence properties, even the strong version of RSL is surprisingly weak: it cannot be used to verify any of the coherence examples in this paper. In fact, RSL can be shown sound with respect to much weaker coherence axioms than what C/C++11 relaxed accesses provide.
One notable feature of RSL which we do not support is read-modify-write (RMW) instructions (such as compare-and-swap and fetch-and-add). However, the soundness proof of SLR makes no simplifying assumptions about the promising semantics which would affect the semantics of RMW instructions. Therefore, we are confident that enhancing SLR with rules for RMW instructions would not substantially affect the structure of the soundness proof, presented in Sect. 4.
Other Program Logics. FSL [8] extends (the strong version of) RSL with stronger rules for relaxed accesses in the presence of release/acquire fences. In FSL, a release fence can be used to package an assertion with a modality, which a relaxed write can then transfer. Conversely, the ownership obtained by a relaxed read is guarded by a symmetric modality that needs an acquire fence to be unpacked. The soundness proof of FSL also relies on \( po \cup rf \) acyclicity. Moreover, it is known to be unsound in models where load buffering is allowed [9, Sect. 5.2].
A number of other logics—GPS [26], iGPS [12], OGRA [16], iCAP-TSO [24], the rely-guarantee proof system for TSO of Ridge [23], and the program logic for TSO of Wehrman and Berdine [28]—have been developed for even stronger memory models (release/acquire or TSO), and also rely quite strongly on—and try to expose—the stronger consistency guarantees provided by those models.
The framework of Alglave and Cousot [2] for reasoning about relaxed concurrent programs is parametric with respect to an axiomatic "per-execution" memory model. By construction, as argued by Batty et al. [3], such models cannot be used to define a language-level model allowing the weak behaviour of LB+data+fakedep and similar litmus tests while forbidding out-of-thin-air behaviours. Moreover, their framework does not provide the usual abstraction facilities of program logics.
The lace logic of Bornat et al. [6] targets hardware memory models, in particular Power. It relies on annotating the program with "per-execution" constraints, and on syntactic features of the program. For example, it distinguishes LB+data+fakedep from LB+data+po, its variant where the write of the second thread is \([x]_{\texttt {rlx}} := 1\), and is thus unsuitable for addressing out-of-thin-air behaviours.
Other Approaches. Besides program logics, another way to reason about programs under weak memory models is to reduce the act of reasoning under a memory model M to reasoning under a stronger model \(M'\)—typically, but not necessarily, sequential consistency [7, 18]. One can often establish DRF theorems stating that a program without any races when executed under \(M'\) has the same behaviours when executed under M as when executed under \(M'\). For the promising semantics, Kang et al. [13, Sect. 5.4] have established such theorems for \(M'\) being release-acquire consistency, sequential consistency, and the promise-free promising semantics, for suitable notions of races. The last one, the "Promise-Free DRF" theorem, is applicable to the DisjointLists program from the introduction, but none of these theorems can be applied to any of the other examples of this paper, as they are racy. Moreover, these theorems are not compositional, as they do not say anything about the DisjointLists program when put inside a larger, racy program—for example, with just an extra read of a from another thread.
6 Conclusion
In this paper, we have presented the first expressive logic that is sound under the promising semantics, and have demonstrated its expressiveness with a number of examples. Our logic can be seen both as a general proof technique for reasoning about concurrent programs, and as a tool for proving the absence of out-of-thin-air behaviour for challenging examples and for reasoning about coherence. In the future, we would like to extend the logic to cover more of relaxed memory and more advanced reasoning principles, such as those available in GPS [26], and to mechanise its soundness proof.
Interesting aspects of relaxed memory that we would also like to cover are read-modify-writes and fences. These would allow us to consider concurrent algorithms like circular buffers and the atomic reference counter verified in FSL++ [9]. This could be done by adapting the corresponding rules of RSL and GPS; moreover, we could extend them with our new approach to reasoning about coherence.
To mechanise the soundness proof, we intend to use the Iris framework [11], which has already been used to prove the soundness of iGPS [12], a variant of the GPS program logic. To do this, however, we have to overcome one technical limitation of Iris. Namely, the current version of Iris is step-indexed over \(\mathbb {N}\), while our semantics uses transfinite step-indexing over \(\mathbb {N} \times \mathbb {N}\) to define non-promising safety and to allow us to reason about certifications of arbitrary length for each reduction step. Progress has been made towards transfinitely step-indexed logical relations that may be applicable to a transfinitely step-indexed version of Iris [25].
Footnotes
 1.
The litmus test is so called because some early attempts to solve the OOTA problem allowed this example to return arbitrary values for x and y.
 2.
As the promising semantics comes with formal proofs of correctness of all the expected local program transformations and of compilation schemes to the x86-TSO, Power, and ARMv8 POP architectures [21], SLR is sound for these architectures too.
 3.
Supporting ownership transfer is necessary to provide useful rules for C11 release and acquire accesses.
 4.
This \(\nabla {}\) modality is similar in spirit to, but weaker than, that of FSL [8].
Acknowledgments
We would like to thank the reviewers for their feedback. The research was supported in part by the Danish Council for Independent Research (project DFF–4181-00273), by a European Research Council Consolidator Grant for the project "RustBelt" (grant agreement no. 683289), and by Len Blavatnik and the Blavatnik Family Foundation.
References
 1. Supplementary material for this paper. http://plv.mpi-sws.org/slr/appendix.pdf
 2. Alglave, J., Cousot, P.: Ogre and Pythia: an invariance proof method for weak consistency models. In: POPL 2017, pp. 3–18. ACM, New York (2017). http://doi.acm.org/10.1145/2994593
 3. Batty, M., Memarian, K., Nienhuis, K., Pichon-Pharabod, J., Sewell, P.: The problem of programming language concurrency semantics. In: Vitek, J. (ed.) ESOP 2015. LNCS, vol. 9032, pp. 283–307. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46669-8_12
 4. Batty, M., Owens, S., Sarkar, S., Sewell, P., Weber, T.: Mathematizing C++ concurrency. In: POPL 2011, pp. 55–66. ACM, New York (2011). http://doi.acm.org/10.1145/1926385.1926394
 5. Boehm, H.-J., Demsky, B.: Outlawing ghosts: avoiding out-of-thin-air results. In: Proceedings of the Workshop on Memory Systems Performance and Correctness, MSPC 2014, pp. 7:1–7:6. ACM, New York (2014). http://doi.acm.org/10.1145/2618128.2618134
 6. Bornat, R., Alglave, J., Parkinson, M.: New lace and arsenic (2016). https://arxiv.org/abs/1512.01416
 7. Bouajjani, A., Derevenetc, E., Meyer, R.: Robustness against relaxed memory models. In: Software Engineering 2014, Fachtagung des GI-Fachbereichs Softwaretechnik, February 25–28, 2014, Kiel, Germany, pp. 85–86 (2014)
 8. Doko, M., Vafeiadis, V.: A program logic for C11 memory fences. In: Jobstmann, B., Leino, K.R.M. (eds.) VMCAI 2016. LNCS, vol. 9583, pp. 413–430. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49122-5_20
 9. Doko, M., Vafeiadis, V.: Tackling real-life relaxed concurrency with FSL++. In: Yang, H. (ed.) ESOP 2017. LNCS, vol. 10201, pp. 448–475. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54434-1_17
 10. Jeffrey, A., Riely, J.: On thin air reads: towards an event structures model of relaxed memory. In: LICS 2016, pp. 759–767. ACM, New York (2016)
 11. Jung, R., Krebbers, R., Jourdan, J.-H., Bizjak, A., Birkedal, L., Dreyer, D.: Iris from the ground up (2017)
 12. Kaiser, J.-O., Dang, H.-H., Dreyer, D., Lahav, O., Vafeiadis, V.: Strong logic for weak memory: reasoning about release-acquire consistency in Iris. In: ECOOP 2017 (2017)
 13. Kang, J., Hur, C.-K., Lahav, O., Vafeiadis, V., Dreyer, D.: A promising semantics for relaxed-memory concurrency. In: POPL 2017. ACM, New York (2017). http://doi.acm.org/10.1145/3009837.3009850
 14. Lahav, O., Vafeiadis, V., Kang, J., Hur, C.-K., Dreyer, D.: Repairing sequential consistency in C/C++11. In: PLDI (2017)
 15. Lahav, O., Giannarakis, N., Vafeiadis, V.: Taming release-acquire consistency. In: POPL 2016, pp. 649–662. ACM, New York (2016). http://doi.acm.org/10.1145/2837614.2837643
 16. Lahav, O., Vafeiadis, V.: Owicki-Gries reasoning for weak memory models. In: Halldórsson, M.M., Iwama, K., Kobayashi, N., Speckmann, B. (eds.) ICALP 2015. LNCS, vol. 9135, pp. 311–323. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-47666-6_25
 17. Manson, J., Pugh, W., Adve, S.V.: The Java memory model. In: POPL 2005, pp. 378–391. ACM, New York (2005)
 18. Owens, S.: Reasoning about the implementation of concurrency abstractions on x86-TSO. In: D'Hondt, T. (ed.) ECOOP 2010. LNCS, vol. 6183, pp. 478–503. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14107-2_23
 19. Owens, S., Sarkar, S., Sewell, P.: A better x86 memory model: x86-TSO. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 391–407. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_27
 20. Pichon-Pharabod, J., Sewell, P.: A concurrency semantics for relaxed atomics that permits optimisation and avoids thin-air executions. In: POPL 2016, pp. 622–633. ACM, New York (2016)
 21. Podkopaev, A., Lahav, O., Vafeiadis, V.: Promising compilation to ARMv8 POP. In: ECOOP 2017. LIPIcs, vol. 74, pp. 22:1–22:28. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017)
 22. Reynolds, J.C.: Separation logic: a logic for shared mutable data structures. In: Proceedings of the 17th IEEE Symposium on Logic in Computer Science (LICS 2002), July 22–25, 2002, Copenhagen, Denmark, pp. 55–74 (2002). https://doi.org/10.1109/LICS.2002.1029817
 23. Ridge, T.: A rely-guarantee proof system for x86-TSO. In: Leavens, G.T., O'Hearn, P., Rajamani, S.K. (eds.) VSTTE 2010. LNCS, vol. 6217, pp. 55–70. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15057-9_4
 24. Sieczkowski, F., Svendsen, K., Birkedal, L., Pichon-Pharabod, J.: A separation logic for fictional sequential consistency. In: Vitek, J. (ed.) ESOP 2015. LNCS, vol. 9032, pp. 736–761. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46669-8_30
 25. Svendsen, K., Sieczkowski, F., Birkedal, L.: Transfinite step-indexing: decoupling concrete and logical steps. In: Thiemann, P. (ed.) ESOP 2016. LNCS, vol. 9632, pp. 727–751. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49498-1_28
 26. Turon, A., Vafeiadis, V., Dreyer, D.: GPS: navigating weak memory with ghosts, protocols, and separation. In: OOPSLA 2014, pp. 691–707. ACM, New York (2014)
 27. Vafeiadis, V., Narayan, C.: Relaxed separation logic: a program logic for C11 concurrency. In: OOPSLA 2013, pp. 867–884. ACM, New York (2013)
 28. Wehrman, I., Berdine, J.: A proposal for weak-memory local reasoning. In: LOLA (2011)
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.