Rely-Guarantee Reasoning for Causally Consistent Shared Memory (Extended Version)

Rely-guarantee (RG) is a highly influential compositional proof technique for concurrent programs, which was originally developed assuming a sequentially consistent shared memory. In this paper, we first generalize RG to make it parametric with respect to the underlying memory model by introducing an RG framework that is applicable to any model axiomatically characterized by Hoare triples. Second, we instantiate this framework for reasoning about concurrent programs under causally consistent memory, which is formulated using a recently proposed potential-based operational semantics, thereby providing the first reasoning technique for such semantics. The proposed program logic, which we call Piccolo, employs a novel assertion language allowing one to specify ordered sequences of states that each thread may reach. We employ Piccolo for multiple litmus tests, as well as for an adaptation of Peterson's algorithm for mutual exclusion to causally consistent memory.


Introduction
Rely-guarantee (RG) is a fundamental compositional proof technique for concurrent programs [21,47]. Each program component P is specified using rely and guarantee conditions, which means that P can tolerate any environment interference that follows its rely condition, and generate only interference included in its guarantee condition. Two components can be composed in parallel provided that the rely of each component agrees with the guarantee of the other.
The original RG framework and its soundness proof have assumed a sequentially consistent (SC) memory [32], which is unrealistic in modern processor architectures and programming languages. Nevertheless, the main principles behind RG are not at all specific to SC. Accordingly, our first main contribution is to formally decouple the underlying memory model from the RG proof principles, by proposing a generic RG framework parametric in the input memory model.
To do so, we assume that the underlying memory model is axiomatized by Hoare triples specifying pre- and postconditions on memory states for each primitive operation (e.g., loads and stores). This enables the formal development of RG-based logics for different shared memory models as instances of one framework, which all build on a uniform soundness infrastructure of the RG rules (e.g., for sequential and parallel composition) but employ different specialized assertions to describe the possible memory states, so that specific soundness arguments are only needed for primitive memory operations.
The second contribution of this paper is an instance of the general RG framework for causally consistent shared memory. The latter stands for a family of widespread and well-studied memory models weaker than SC, which are sufficiently strong for implementing a variety of synchronization idioms [6,12,26]. Intuitively, unlike SC, causal consistency allows different threads to observe writes to memory in different orders, as long as they agree on the order of writes that are causally related. This concept can be formalized in multiple ways, and here we target a strong form of causal consistency, called strong release-acquire (SRA) [28,30] (and equivalent to "causal convergence" from [12]), which is a slight strengthening of the well-known release-acquire (RA) model (used by C/C++11). (The variants of causal consistency only differ for programs with write/write races [10,28], which are rather rare in practice.) Our starting point for axiomatizing SRA as Hoare triples is the potential-based operational semantics of SRA, which was recently introduced with the goal of establishing the decidability of control state reachability under this model [27,28] (in contrast to undecidability under RA [1]). Unlike more standard presentations of weak memory models whose states record information about the past (e.g., in the form of store buffers containing executed writes before they are globally visible [35], partially ordered execution graphs [8,20,30], or collections of timestamped messages and thread views [11,16,17,23,25,46]), the states of the potential-based model track possible futures, ascribing what sequences of observations each thread can perform. We find this approach to be a particularly appealing candidate for Hoare-style reasoning that naturally generalizes SC-based reasoning. Intuitively, while an assertion in SC specifies possible observations at a given program point, an assertion in a potential-based model should specify possible sequences of observations.
To pursue this direction, we introduce a novel assertion language, resembling temporal logics, which allows one to express properties of sequences of states. For instance, our assertions can express that a certain thread may currently read x = 0, but it will have to read x = 1 once it reads y = 1. Then, we provide Hoare triples for SRA in this assertion language, and incorporate them in the general RG framework. The resulting program logic, which we call Piccolo, provides a novel approach to reasoning about concurrent programs under causal consistency, which allows for simple and direct proofs, and, we believe, may constitute a basis for automation in the future. To make our discussion concrete, consider the message passing program (MP) in Figures 1 and 2, comprising shared variables x and y and local registers a and b. The proof outline in Fig. 1 assumes SC, whereas Fig. 2 assumes SRA. In both cases, at the end of the execution, we show that if a is 1, then b must also be 1. We use these examples to explain the two main concepts introduced in this paper: (i) a generic RG framework and (ii) its instantiation with a potential-focused assertion system that enables reasoning under SRA.
Rely-Guarantee. The proof outline in Fig. 1 can be read as an RG derivation:
1. Thread T1 locally establishes its postcondition when starting from any state that satisfies its precondition. This is trivial since its postcondition is True.
2. Thread T1 relies on the fact that the assertions it uses are stable w.r.t. interference from its environment. We formally capture this condition by a rely set R1 = {True, x = 1}.
3. Thread T1 guarantees to its concurrent environment that its only interferences are STORE(x, 1) and STORE(y, 1), and furthermore that STORE(y, 1) is only performed when x = 1 holds. We formally capture this condition by a guarantee set where each element is a command guarded by a precondition.
4. Thread T2 locally establishes its postcondition when starting from any state that satisfies its precondition. This is straightforward using standard Hoare rules for assignment and sequential composition.
5. Thread T2's rely set R2 is again obtained by collecting all the assertions used in its proof. Indeed, the local reasoning for T2 needs all these assertions to be stable under the environment interference.
6. Thread T2's guarantee set G2 collects its guarded load instructions.
7. For non-interference, for every rely assertion R of one thread and guarded command {P} τ → c in the guarantee set of the other, we require the Hoare triple {P ∩ R} τ → c {R} to hold. In this case, these proof obligations are straightforward to discharge using Hoare's assignment axiom (and they are trivial for the relies of T1 against the guarantees of T2, since load instructions leave the memory intact).
Remark 1. Classical treatments of RG involve two related ideas [21]: (1) specifying a component by rely and guarantee conditions (together with standard pre- and postconditions); and (2) taking the relies and guarantees to be binary relations over states. Our approach adopts (1) but not (2). Thus, it can be seen as an RG presentation of the Owicki-Gries method [36], as was previously done in [31]. We have not observed an advantage for using binary relations in our examples, but the framework can be straightforwardly modified to do so.

Now, observe that substantial aspects of the above reasoning are not directly tied to SC. This includes the Hoare rules for compound commands (such as sequential composition above), the idea of specifying a thread using collections of stable rely assertions and guaranteed guarded primitive commands, and the non-interference condition for parallel composition. To carry out this generalization, we assume that we are provided an assertion language whose assertions are interpreted as sets of memory states (which can be much more involved than simple mappings of variables to values), and a set of valid Hoare triples for the primitive instructions. The latter is used for checking validity of primitive triples (e.g., {P} T1 → STORE(x, 1) {Q}), as well as non-interference conditions (e.g., {P ∩ R} T1 → STORE(x, 1) {R}). In §4, we present this generalization, and establish the soundness of RG principles independently of the memory model.
Potential-based reasoning. The second contribution of our work is an application of the above to develop a logic for a potential-based operational semantics that captures SRA. In this semantics every memory state records sequences of store mappings (from shared variables to values) that each thread may observe. For example, assuming all variables are initialized to 0, if T1 executed its code until completion before T2 even started (so under SC the memory state is the store {x → 1, y → 1}), we may reach the SRA state in which T1's potential consists of one store {x → 1, y → 1}, and T2's potential is the sequence of stores {x → 0, y → 0} · {x → 1, y → 0} · {x → 1, y → 1}, which captures the stores that T2 may observe in the order it may observe them. Naturally, potentials are lossy, allowing threads to non-deterministically lose a subsequence of the current store sequence, so they can progress in their sequences. Thus, T2 can read 1 from y only after it loses the first two stores in its potential, and from this point on it can only read 1 from x. Now, one can see that all potentials of T2 at its initial program point are, in fact, subsequences of the above sequence (regardless of where T1 is), and conclude that a = 1 ⇒ b = 1 holds when T2 terminates.
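The lossy-potential intuition above can be sketched in a few lines of Python. This is a deliberate simplification of the model (plain value stores, no RMW flags or writer identifiers): we enumerate the subsequences T2 may retain and check that in each of them, once a store with y = 1 is reachable, every later store has x = 1.

```python
# Hedged sketch (not the paper's formal semantics): a potential is a list of
# stores, and a thread may non-deterministically lose stores before reading.
from itertools import combinations

def subsequences(lst):
    """All non-empty subsequences of a store list (the final store is kept)."""
    idxs = range(len(lst))
    return [[lst[i] for i in c]
            for n in range(1, len(lst) + 1)
            for c in combinations(idxs, n)
            if c[-1] == len(lst) - 1]  # potentials agree on the last store

# T2's potential after T1 ran to completion (all variables start at 0):
t2_potential = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}]

# In every retained subsequence, once y = 1 has been observed, every store
# from that point on has x = 1; hence a = 1 implies b = 1 at the end of MP.
ok = all(store["x"] == 1
         for sub in subsequences(t2_potential)
         for i, store in enumerate(sub)
         if any(s["y"] == 1 for s in sub[:i + 1]))
print(ok)  # True under this simplified model
```

The check mirrors the prose: losing the first two stores is the only way for T2 to read y = 1, and afterwards only x = 1 remains readable.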
To capture the above informal reasoning in a Hoare logic, we designed a new form of assertions capturing possible locally observable sequences of stores, rather than one global store, which can be seen as a restricted fragment of linear temporal logic. The proof outline using these assertions is given in Fig. 2. In particular, [x = 1] is satisfied by all store sequences in which every store maps x to 1, whereas [y ≠ 1] ; [x = 1] is satisfied by all store sequences that can be split into a (possibly empty) prefix whose value for y is not 1 followed by a (possibly empty) suffix whose value for x is 1. Assertions of the form τ ⋉ I state that the potential of thread τ includes only store sequences that satisfy I.
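The semantics of the ";" connective can be illustrated by a small checker. This is a sketch under our own simplified representation (stores as dicts, interval assertions as per-store predicates), not the formal Definition of §6:

```python
# Hedged sketch of the chop (";") semantics for two interval assertions:
# a store list satisfies I1 ; I2 iff it splits into a prefix satisfying I1
# pointwise and a suffix satisfying I2 pointwise (either part may be empty).
def chop(lst, p1, p2):
    return any(all(p1(s) for s in lst[:k]) and all(p2(s) for s in lst[k:])
               for k in range(len(lst) + 1))

mp_list = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}]
bad_list = [{"x": 0, "y": 1}]  # observes y = 1 while x is still 0

y_ne_1 = lambda s: s["y"] != 1
x_eq_1 = lambda s: s["x"] == 1
print(chop(mp_list, y_ne_1, x_eq_1))   # True: split after the first two stores
print(chop(bad_list, y_ne_1, x_eq_1))  # False: no split point works
```

The second list is exactly the observation that causal consistency forbids in MP: reading 1 from y while 0 from x remains readable.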
The first assertion of T2 is implied by the initial condition, T0 ⋉ [y ≠ 1], since the potential of the parent thread T0 is inherited by the forked child threads. It is preserved by (i) line 1, because writing 1 to x leaves [y ≠ 1] unchanged and establishes [x = 1]; and (ii) line 2, because the semantics of SRA ensures that after T2 reads 1 from y, the thread T2 is confined by T1's potential just before it wrote 1 to y, which has to satisfy the precondition (SRA allows updating the other threads' potentials only when the suffix of the potential after the update is observable by the writer thread). In §6 we formalize these arguments as Hoare rules for the primitive instructions, whose soundness is checked using the potential-based operational semantics and the interpretation of the assertion language. Finally, Piccolo is obtained by incorporating these Hoare rules in the general RG framework.
Remark 2. Our presentation of the potential-based semantics for SRA (fully presented in §5) deviates from the original one in [28], where it was called loSRA. The most crucial difference is that while loSRA's potentials consist of lists of per-location read options, our potentials consist of lists of stores assigning a value to every variable. (This is similar in spirit to the adaptation of load buffers for TSO [4,5] to snapshot buffers in [2].) Additionally, unlike loSRA, we disallow empty potential lists, require that the potentials of the different threads agree on the very last value of each location, and handle read-modify-write (RMW) instructions differently. We employed these modifications to loSRA as we observed that direct reasoning on loSRA states is rather unnatural and counterintuitive, since loSRA allows traces that block a thread from reading any value from certain locations (which cannot happen in the version we formulate). For example, a direct interpretation of our assertions over loSRA states would allow states in which τ ⋉ [x = v] and τ ⋉ [x ≠ v] both hold (when τ does not have any option to read from x), while these assertions are naturally contradictory when interpreted on top of our modified SRA semantics. To establish confidence in the new potential-based semantics, we have proved in Coq its equivalence to the standard execution-graph-based semantics of SRA (over 5K lines of Coq proofs) [29].

Preliminaries: Syntax and Semantics
In this section we describe the underlying programming language, leaving the shared-memory semantics parametric.
Syntax. The syntax of programs, given in Fig. 3, includes a parallel composition command C1 τ1‖τ2 C2 that forks two threads named τ1 and τ2 that execute the commands C1 and C2, respectively. Each Ci may itself comprise further parallel compositions. Since thread identifiers are explicit, we require commands to be well formed. Let Tid(C) be the set of all thread identifiers that appear in C. A command C is well formed, denoted wf(C), if parallel compositions inside employ disjoint sets of thread identifiers. This notion is formally defined by induction on the structure of commands, with the only interesting case being wf(C1 τ1‖τ2 C2), which additionally requires the thread identifiers of the two sides to be disjoint.

Program semantics. We provide small-step operational semantics to commands independently of the memory system. To connect this semantics to a given memory system, its steps are instrumented with labels, as defined next.

Definition 1. A label l takes one of the following forms: a read, write, or RMW label, or a fork/join label FORK(τ1, τ2) or JOIN(τ1, τ2), where τ1, τ2 ∈ Tid. We denote by Lab the set of all labels.
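The well-formedness condition on commands can be sketched as a recursive check. The nested-tuple representation of commands below is our own assumption, made only for illustration:

```python
# Hedged sketch of wf(C): commands are modeled as nested tuples, where
# ("par", t1, C1, t2, C2) forks threads t1 and t2; other commands are leaves.
def tids(c):
    if isinstance(c, tuple) and c[0] == "par":
        _, t1, c1, t2, c2 = c
        return {t1, t2} | tids(c1) | tids(c2)
    return set()

def wf(c):
    if isinstance(c, tuple) and c[0] == "par":
        _, t1, c1, t2, c2 = c
        left, right = {t1} | tids(c1), {t2} | tids(c2)
        return left.isdisjoint(right) and wf(c1) and wf(c2)
    return True  # primitive commands are always well formed

good = ("par", "T1", "skip", "T2", "skip")
bad = ("par", "T1", "skip", "T1", "skip")  # reuses T1 on both sides
print(wf(good), wf(bad))  # True False
```

The check also handles nested parallel compositions, since tids collects identifiers from sub-commands before testing disjointness.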
A register store is a mapping γ : Reg → Val. Register stores are extended to expressions as expected. We denote by Γ the set of all register stores.
The semantics of (instrumented) primitive commands is given in Fig. 4. Using this definition, the semantics of commands is given in Fig. 5. Its steps are of the form C, γ −lε→ C′, γ′, where C and C′ are commands, γ and γ′ are register stores, and lε ∈ Lab ∪ {ε} (ε denotes a thread-internal step). We lift this semantics to command pools as follows.
Definition 3. A command pool is a non-empty partial function C from thread identifiers to commands, such that the following hold: 1.
We write command pools as sets of the form {τ1 → C1, ..., τn → Cn}. Steps for command pools are given in Fig. 6. They take the form C, γ −τ,lε→ C′, γ′, where C and C′ are command pools, γ and γ′ are register stores, and τ : lε (with τ ∈ Tid and lε ∈ Lab ∪ {ε}) is a command transition label.
Memory semantics. To give semantics to programs under a memory model, we synchronize the transitions of a command C with a memory system. We leave the memory system parametric, and assume that it is represented by a labeled transition system (LTS) M with a set of states denoted by M.Q, and steps denoted by −→M. The transition labels of a general memory system M consist of non-silent program transition labels (elements of Tid × Lab) and a (disjoint) set M.Θ of internal memory actions, which is again left parametric (used, e.g., for memory-internal propagation of values).
Example 1. The simple memory system that guarantees sequential consistency is denoted here by SC. This memory system tracks the most recent value written to each variable and has no internal transitions (SC.Θ = ∅). Formally, it is defined by SC.Q ≜ Loc → Val, and −→SC is given by the expected read and write steps.

The composition of a program with a general memory system is defined next.

Definition 4. The concurrent system induced by a memory system M is the LTS whose transition labels are the elements of (Tid × (Lab ∪ {ε})) ⊎ M.Θ; states are triples of the form C, γ, m, where C is a command pool, γ is a register store, and m ∈ M.Q; and the transitions are "synchronized transitions" of the program and the memory system, using labels to decide what to synchronize on.
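The SC memory system of Example 1 can be sketched directly: a state is a map from locations to their latest values, writes update it, and a read of value v is enabled only when v is the latest value. Label names here are an assumption for the sketch:

```python
# Hedged sketch of the SC memory system: SC.Q maps locations to the most
# recently written values, and there are no internal transitions.
def sc_step(m, label):
    kind, x, v = label
    if kind == "W":                  # store: update the location
        m2 = dict(m)
        m2[x] = v
        return m2
    if kind == "R":                  # load: enabled only for the latest value
        return dict(m) if m[x] == v else None
    raise ValueError("unknown label")

m0 = {"x": 0, "y": 0}
m1 = sc_step(m0, ("W", "x", 1))
print(m1["x"])                       # 1
print(sc_step(m1, ("R", "x", 0)))    # None: stale reads are impossible under SC
```

Returning None for a disabled read mimics the absence of a transition in the LTS.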

Generic Rely-Guarantee Reasoning
In this section we present our generic RG framework. Rather than committing to a specific assertion language, our reasoning principles apply on the semantic level, using sets of states instead of syntactic assertions. The structure of proofs still follows the program structure, thereby retaining RG's compositionality. By doing so, we decouple the semantic insights of RG reasoning from a concrete syntax. Next, we present proof rules serving as blueprints for memory-model-specific proof systems. An instantiation of this blueprint requires lifting the semantic principles to syntactic ones. More specifically, it requires:
1. a language with (a) concrete assertions for specifying sets of states and (b) operators that match operations on sets of states (like ∧ matches ∩); and
2. sound Hoare triples for primitive commands.
Thus, each instance of the framework (for a specific memory system) is left with the task of identifying useful abstractions on states, as well as a suitable formalism, for making the generic semantic framework into a proof system.

RG judgments. We let M be an arbitrary memory system and ΣM ≜ Γ × M.Q. Properties of programs C are stated via RG judgments of the form C sat M (P, R, G, Q), where P, Q ⊆ ΣM, R ⊆ P(ΣM), and G is a set of guarded commands, each of which takes the form {G} τ → α, where G ⊆ ΣM and α is either an (instrumented) primitive command c or a fork/join label (of the form FORK(τ1, τ2) or JOIN(τ1, τ2)). The latter is needed for considering the effect of forks and joins on the memory state.
Interpretation of RG judgments. RG judgments C sat M (P, R, G, Q) state that a terminating run of C starting from a state in P, under any concurrent context whose transitions preserve each of the sets of states in R, will end in a state in Q and perform only transitions contained in G. To formally define this statement, following the standard model for RG, these judgments are interpreted on computations of programs. Computations arise from runs of the concurrent system (see Def. 4) by abstracting away from concrete transition labels and including arbitrary "environment transitions" representing steps of the concurrent context. Besides component and environment transitions, we have memory transitions, which correspond to internal memory steps (labeled with θ ∈ M.Θ). Note that memory transitions do not occur in the classical RG presentation (since SC does not have internal memory actions).
A computation ξ is a (potentially infinite) sequence of transitions annotated with ai ∈ {cmp, env, mem}. When ξ is finite, we let C last(ξ), γ last(ξ), m last(ξ) denote its last configuration. We say that ξ is a computation of a command pool C when C0 = C and, for every i ≥ 0, each cmp step arises from a transition of the command pool semantics, each mem step corresponds to an internal memory transition, and env steps leave the command pool unchanged. We denote by Comp(C) the set of all computations of a command pool C.
To define validity of RG judgments, we use the following definition. We denote by Assume(P, R) the set of all computations that admit P and R, and by Commit(G, Q) the set of all computations that admit G and Q.
Then, validity of a judgment is defined as follows: C sat M (P, R, G, Q) is valid iff Comp(C) ∩ Assume(P, R) ⊆ Commit(G, Q).

Memory triples. Our proof rules build on memory triples, which specify pre- and postconditions for primitive commands for a memory system M.

Definition 6. A memory triple for a memory system M is a tuple of the form {P} τ → α {Q}, where P, Q ⊆ ΣM, τ ∈ Tid, and α is either an instrumented primitive command, a fork label, or a join label. A memory triple for M is valid, denoted by M {P} τ → α {Q}, if the following holds for every γ, m ∈ P, γ′ ∈ Γ, and m′ ∈ M.Q: if α is an instrumented primitive command and {τ → α}, γ, m makes a synchronized step to γ′, m′, then γ′, m′ ∈ Q (and similarly for fork and join labels).

Example 2. For the memory system SC introduced in Ex. 1, we have, e.g., memory triples of the form SC {e(r := x)} τ → r := LOAD(x) {e} (where e(r := x) is the expression e with all occurrences of r replaced by x).
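The SC triple of Example 2 can be checked by brute-force enumeration over small states. This sketch fixes e to be "r = 1" (so e(r := x) is "x = 1"), an instantiation of our own choosing:

```python
# Hedged sketch: checking the SC memory triple {e(r := x)} r := LOAD(x) {e}
# by enumeration over small states, for the instance e = "r = 1".
def pre(gamma, m):   # e(r := x): substitute x for r, giving "x = 1"
    return m["x"] == 1

def post(gamma, m):  # e: "r = 1"
    return gamma["r"] == 1

def load_steps(gamma, m, reg, loc):
    # under SC a load reads exactly the latest value of loc
    g2 = dict(gamma)
    g2[reg] = m[loc]
    return [(g2, m)]

ok = all(post(g2, m2)
         for gx in [0, 1] for mx in [0, 1]
         for g, m in [({"r": gx}, {"x": mx})]
         if pre(g, m)
         for g2, m2 in load_steps(g, m, "r", "x"))
print(ok)  # True
```

Every state satisfying the precondition steps only to states satisfying the postcondition, which is exactly the validity condition of Definition 6 restricted to this finite state space.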
RG proof rules. We aim at proof rules deriving valid RG judgments. Figure 7 lists (semantic) proof rules based on externally provided memory triples. These rules basically follow RG reasoning for sequential consistency. For example, rule seq states that RG judgments of commands C1 and C2 can be combined when the postcondition of C1 and the precondition of C2 agree, thereby uniting their relies and guarantees. Rule com builds on memory triples. The rule par for parallel composition combines judgments for two components when their relies and guarantees are non-interfering. Intuitively speaking, this means that each of the assertions that each thread relied on for establishing its proof is preserved when applying any of the assignments collected in the guarantee set of the other thread. An example of non-interfering rely-guarantee pairs is given in step 7 in §2. Formally, non-interference requires that for every rely assertion R of one component and guarded command {P} τ → c in the guarantee set of the other, the triple {P ∩ R} τ → c {R} is valid. In turn, fork-join combines the proof of a parallel composition with proofs of fork and join steps (which may also affect the memory state). Note that the guarantees also involve guarded commands with FORK and JOIN labels.
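One concrete non-interference obligation from the MP proof of §2 can be discharged by enumeration under SC: the guarded command STORE(y, 1) of T1 (guarded by x = 1) must preserve the rely assertion x = 1 of T2. The sketch below checks {P ∩ R} STORE(y, 1) {R} over all small states:

```python
# Hedged sketch of one non-interference check under SC: the guarded command
# {x = 1} STORE(y, 1) of T1 must preserve the rely assertion x = 1.
def store(m, loc, v):
    m2 = dict(m)
    m2[loc] = v
    return m2

guard = lambda m: m["x"] == 1     # precondition P guarding the guarantee
rely = lambda m: m["x"] == 1      # rely assertion R to be preserved

states = [{"x": a, "y": b} for a in (0, 1) for b in (0, 1)]
ok = all(rely(store(m, "y", 1))
         for m in states if guard(m) and rely(m))  # {P ∩ R} STORE(y,1) {R}
print(ok)  # True
```

The store to y cannot invalidate a fact about x, which is why this obligation is immediate with Hoare's assignment axiom.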
Additional rules for consequence and introduction of auxiliary variables are elided here (they are similar to their SC counterparts), and provided in the appendix.
Soundness. To establish soundness of the above system, we need an additional requirement regarding the internal memory transitions (for SC this closure vacuously holds, as there are no such transitions). We require all relies in R to be stable under internal memory transitions, i.e., every R ∈ R must be closed under steps of −θ→M for all θ ∈ M.Θ. This condition is needed since the memory system can non-deterministically take its internal steps, and the component's proof has to be stable under such steps. With this requirement, we are able to establish soundness. The proof, which generally follows [47], is given in the appendix. We write ⊢ C sat M (P, R, G, Q) for provability of a judgment using the semantic rules presented above.

Potential-based Memory System for SRA
In this section we present the potential-based semantics for Strong Release-Acquire (SRA), for which we develop a novel RG logic. Our semantics is based on the one in [27,28], with certain adaptations to make it better suited for Hoare-style reasoning (see Remark 2).
In weak memory models, threads typically have different views of the shared memory. In SRA, we refer to a memory snapshot that a thread may observe as a potential store:

Definition 8. A potential store is a function δ : Loc → Val × {R, RMW} × Tid. We write val(δ(x)), rmw(δ(x)), and tid(δ(x)) to retrieve the different components of δ(x). We denote by ∆ the set of all potential stores.
Having δ(x) = v, R, τ allows reading the value v from x (and further ascribes that this read reads from a write performed by thread τ, which is technically needed to properly characterize the SRA model). In turn, having δ(x) = v, RMW, τ additionally allows performing an RMW instruction that atomically reads and modifies x.
Potential stores are collected in potential store lists describing the values which can (potentially) be read, and in what order.

Notation 9. Lists over an alphabet A are written as L = a1 · ... · an, where a1, ..., an ∈ A. We also use · to concatenate lists, and write L[i] for the i-th element of L and |L| for the length of L.
A (potential) store list is a finite sequence of potential stores ascribing a possible sequence of stores that a thread can observe, in the order it will observe them.The RMW-flags in these lists have to satisfy certain conditions: once the flag for a location is set, it remains set in the rest of the list; and the flag must be set at the end of the list.Formally, store lists are defined as follows.
Definition 10. A store list L ∈ L is a non-empty finite sequence of potential stores with monotone RMW-flags ending with an RMW-flag, that is, for all x ∈ Loc: 1. if rmw(L[i](x)) = RMW, then rmw(L[j](x)) = RMW for every i < j ≤ |L|; and 2. rmw(L[|L|](x)) = RMW.

Now, SRA states (SRA.Q) consist of potential mappings that assign potentials to threads, as defined next.

Definition 11. A potential D is a non-empty set of potential store lists. A potential mapping is a function D : Tid ⇀ P(L) \ {∅} that maps thread identifiers to potentials such that all lists agree on the very final potential store.

These potential mappings are "lossy", meaning that potential stores can be arbitrarily dropped. In particular, dropping the first store in a list enables reading from the second. This is formally done by transitioning from a state D to a "smaller" state D′, as defined next. We also relate L to L′ when L′ is obtained from L by duplication of some stores. This is lifted to potential mappings as expected.

Figure 8 defines the transitions of SRA. The lose and dup steps account for losing and duplicating stores in potentials. Note that these are both internal memory transitions (required to preserve relies as per (mem)). The fork and join steps distribute potentials on forked threads and join them at the end. The read step obtains its value from the first store in the lists of the potential of the reader, provided that all these lists agree on that value and on the writer thread identifier. rmw steps atomically perform a read and a write step, where the read is restricted to an RMW-marked entry.
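The flag conditions of Definition 10 translate directly into a validity check. In this sketch a potential store maps each location to a (value, flag) pair; writer thread identifiers are omitted, which is our simplification:

```python
# Hedged sketch of Definition 10: per location, RMW-flags must be monotone
# along the list and set in the final store; store lists are non-empty.
def is_store_list(L, locs):
    if not L:
        return False  # store lists are non-empty
    for x in locs:
        flags = [store[x][1] for store in L]
        if flags[-1] != "RMW":
            return False  # the flag must be set at the end of the list
        if any(a == "RMW" and b == "R" for a, b in zip(flags, flags[1:])):
            return False  # once set, the flag stays set
    return True

ok_list = [{"x": (0, "R")}, {"x": (1, "RMW")}]
bad_list = [{"x": (0, "RMW")}, {"x": (1, "R")}]  # flag dropped: not monotone
print(is_store_list(ok_list, ["x"]), is_store_list(bad_list, ["x"]))  # True False
```

Monotonicity reflects that losing the ability to RMW a location can never be regained later in the same list.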
Most of the complexity is left for the write step. For the writer thread τ, it updates all entries for the written location to the new value. For every other thread, it updates a suffix (L1) of the store list with the new value. For guaranteeing causal consistency, this updated suffix cannot be arbitrary: it has to be in the potential of the writer thread (L1 ∈ D(τ)). This is the key to achieving the "shared-memory causality principle" of [28], which ensures causal consistency.
Example 3. Consider again the MP program from Fig. 2. After the initial fork step, threads T1 and T2 may have the same store list, with x and y both 0, in their potentials. Then, STORE(x, 1) by T1 can generate, for T2, a store list consisting of the old store followed by a store with x updated to 1. Thus T2 keeps the possibility of reading the "old" value of x. For T1 this is different: the model allows the writing thread to only see its new value of x, and all entries for x in its store list are updated. Next, when T1 executes STORE(y, 1), again, the value for y has to be updated in all entries of T1's list. For T2, the write step may update y in a suffix of its list. Thus, thread T2 can still see the old values, or lose the prefix of its list and see the new values. Importantly, it cannot read 1 from y and then 0 from x. Note that STORE(y, 1) by T1 cannot produce for T2 a list in which y is already 1 while x is still 0, as this would require T1 to have that suffix in its own potential. This models the intended semantics of message passing under causal consistency.
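The evolution of T2's store list in Example 3 can be replayed mechanically: each write by T1 is modeled as a duplication (dup) followed by an update of the duplicated suffix. The side condition that the updated suffix lies in the writer's potential is elided in this sketch:

```python
# Hedged sketch of Example 3: dup followed by the write step's effect on
# another thread's store list (the suffix gets the new value; the side
# condition L1 ∈ D(τ) on the writer's potential is not modeled here).
def dup(L, i):
    return L[:i + 1] + L[i:]          # duplicate the i-th store

def write_other(L, loc, v, k):
    return L[:k] + [{**s, loc: v} for s in L[k:]]

L2 = [{"x": 0, "y": 0}]
L2 = write_other(dup(L2, 0), "x", 1, 1)   # STORE(x, 1) by T1
L2 = write_other(dup(L2, 1), "y", 1, 2)   # STORE(y, 1) by T1
print(L2)  # [{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 1, 'y': 1}]
```

In the resulting list, every store with y = 1 also has x = 1, so T2 can never read 1 from y and then 0 from x.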
The next theorem establishes the equivalence of SRA as defined above and opSRA from [28], which is an (operational version of) the standard strong release-acquire declarative semantics [26,30]. (As a corollary, we obtain the equivalence between the potential-based system from [28] and the variant we define in this paper.) The notion of equivalence employed in the theorem is trace equivalence. We let a trace of a memory system be a sequence of transition labels, ignoring ε transitions, and consider traces of SRA starting from an initial state λτ ∈ {T1, ..., TN}. {λx. 0, RMW, T0} and traces of opSRA starting from the initial execution graph that consists of a write event to every location writing 0 by a distinguished initialization thread T0. The proof of this theorem is by simulation arguments (forward simulation in one direction and backward for the converse). It is mechanized in Coq and available in [29]. The mechanized proof does not consider fork and join steps, but they can be straightforwardly added.

Program Logic
For the instantiation of our RG framework to SRA, we next (1) introduce the assertions of the logic Piccolo and (2) specify memory triples for Piccolo. Our logic is inspired by interval logics like Moszkowski's ITL [34] or the duration calculus [13].
Syntax and semantics. Figure 9 gives the grammar of Piccolo. We base it on extended expressions which, besides registers, can also involve locations as well as expressions of the form R(x) (to indicate RMW-flag R). Extended expressions E can hold on entire intervals of a store list (denoted [E]). Store lists can be split into intervals satisfying different interval expressions (I1 ; ... ; In) using the ";" operator (called "chop"). In turn, τ ⋉ I means that all store lists in τ's potential satisfy I. For an assertion ϕ, we let fv(ϕ) ⊆ Reg ∪ Loc ∪ Tid be the set of registers, locations, and thread identifiers occurring in ϕ, and write R(x) ∈ ϕ to indicate that the term R(x) occurs in ϕ.
As an example, consider again MP (Fig. 2). We would like to express that T2, upon seeing y to be 1, cannot see the old value 0 of x anymore. In Piccolo this is expressed as T2 ⋉ [y ≠ 1] ; [x = 1]: the store lists of T2 can be split into two intervals (each possibly empty), the first satisfying y ≠ 1 and the second x = 1.
Formally, an assertion ϕ describes register stores coupled with SRA states:

Definition 13. Let γ be a register store, δ a potential store, L a store list, and D a potential mapping. We let e γ,δ = γ(e), x γ,δ = δ(x), and R(x) γ,δ = true iff rmw(δ(x)) = R. The extension of this notation to any extended expression E is standard. The validity of assertions in γ, D, denoted by γ, D |= ϕ, is defined by structural induction.

Note that with ∧ and ∨ as well as negation on expressions, the logic provides the operators on sets of states necessary for an instantiation of our RG framework. Further, the requirements on SRA states guarantee certain properties:
- τ ⋉ [x = v] ∧ π ⋉ [x ≠ v] implies False (follows from the fact that all lists in potentials are non-empty and agree on the last store).
- If γ, D |= τ ⋉ [R(x)] ; [E], then every list L ∈ D(τ) contains a non-empty suffix satisfying E (since all lists have to end with the RMW-flag set).
All assertions are preserved by the steps lose and dup. This stability is required by our RG framework (condition (mem)). Stability is achieved here because negations occur on the level of (simple) expressions only: e.g., we cannot express ¬(τ ⋉ [x = v]), meaning that τ must have a store in its potential whose value for x is not v, which would not be stable under lose.
Memory triples. Assertions in Piccolo describe sets of states, and thus can be used to formulate memory triples. Figure 10 gives the base triples for the different primitive instructions. We see the standard SC rule of assignment (Subst-asgn) for registers, followed by a number of stability rules detailing when assertions are not affected by instructions. Axioms Fork and Join describe the transfer of properties from the forking thread to the forked threads and back.
The next four axioms in the table concern write instructions (either SWAP or STORE). They reflect the semantics of writing in SRA: (1) in the writer thread τ, all stores in all lists get updated (axiom Wr-own). Other threads π will have (2) their lists being split into "old" values for x with R flag and the new value for x (Wr-other-1), (3) properties (expressed as Iτ) of suffixes of lists being preserved when the writing thread satisfies the same properties (Wr-other-2), and (4) their lists consisting of R-accesses to x followed by properties of the writer (Wr-other-3). The last axiom concerns SWAP only: as it can only read from store entries marked as RMW, it discards intervals satisfying [R(x)].
Example 4. We employ the axioms to show one proof step for MP, namely one pair in the non-interference check of the rely R2 of T2 with respect to the guarantees G1 of T1: the required triple is an instance of Wr-other-2.
In addition to the axioms above, we use a shift rule for load instructions: a load instruction reads from the first store in the lists; however, if the interval satisfying [(e ∧ E)(r := x)] in [(e ∧ E)(r := x)] ; I is empty, it reads from a list satisfying I. The rule Ld-shift puts this shifting to next stores into a proof rule. Like the standard Hoare rule Subst-asgn, Ld-shift employs backward substitution.
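The intuition behind Ld-shift can be checked on the MP store list: a load reads from the first store of a possibly shortened list, so under [y ≠ 1] ; [x = 1] either the value read for y is not 1, or the remaining list satisfies [x = 1]. This is a sketch in our simplified store representation:

```python
# Hedged sketch of the Ld-shift intuition: possible reads of y after losing
# a prefix of the list, paired with the list that remains afterwards.
def loads(L):
    """Possible (value read for y, remaining list) pairs."""
    return [(L[i]["y"], L[i:]) for i in range(len(L))]

mp_list = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}]
ok = all(v != 1 or all(s["x"] == 1 for s in rest)
         for v, rest in loads(mp_list))
print(ok)  # True: reading y = 1 forces [x = 1] from then on
```

This is exactly the postcondition shape that backward substitution produces for the load of y in T2.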
Example 5. We exemplify rule Ld-shift on another proof step of example MP, one for the local correctness of T2, using the former triple as premise for Ld-shift.
In addition, we include the standard conjunction, disjunction and consequence rules of Hoare logic. For instrumented primitive commands we employ the following rule:

The program in Fig. 11 enforces an ordering on writes to the shared location x on thread T1. The postcondition guarantees that after reading the second write, thread T2 cannot read from the first. Fig. 12 is similar, but the writes to x occur on two different threads. The postcondition of the program guarantees that the two reading threads agree on the order of the writes: in particular, if one reading thread (here T3) sees the value 2 and then 1, it is impossible for the other reading thread (here T4) to see 1 and then 2.
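For intuition, the agreement guarantee of the Fig. 12-style test can be checked exhaustively for SC by brute-force interleaving; the sketch below uses our own thread and register naming, and of course only covers SC, while the paper's point is that SRA forbids the disagreeing outcome as well.

```python
from itertools import permutations

# Coherence-style litmus test: T1 and T2 each write once to x;
# T3 and T4 each read x twice (into a,b and c,d respectively).
progs = {
    "T1": [("w", 1)], "T2": [("w", 2)],
    "T3": [("r", "a"), ("r", "b")], "T4": [("r", "c"), ("r", "d")],
}

def run(schedule):
    # execute one SC interleaving given as a sequence of thread names
    x, regs = 0, {}
    pcs = {t: 0 for t in progs}
    for t in schedule:
        kind, arg = progs[t][pcs[t]]
        pcs[t] += 1
        if kind == "w":
            x = arg
        else:
            regs[arg] = x
    return regs

steps = ["T1", "T2", "T3", "T3", "T4", "T4"]
outcomes = [run(s) for s in set(permutations(steps))]

# do the two readers ever disagree on the order of the writes?
disagree = any(o["a"] == 2 and o["b"] == 1 and o["c"] == 1 and o["d"] == 2
               for o in outcomes)
```

Enumerating all interleavings shows `disagree` is false: one reader may well observe 2 then 1, but then the other can never observe 1 then 2.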
Potential assertions provide a compact and intuitive mechanism for reasoning: e.g., in Fig. 11, the precondition of line 3 precisely expresses the order of values available to thread T2. This is an improvement over view-based assertions [16], which required a separate set of assertions to encode write order.
Peterson's algorithm. Figure 13 shows Peterson's algorithm for implementing mutual exclusion for two threads [37], together with Piccolo assertions. We depict only the code of thread T1; thread T2 is symmetric. A third thread T3 is assumed to stop the other two threads at an arbitrary point in time. We use do C until e as a shorthand for C ; while e do C. For correctness under SRA, all accesses to the shared variable turn are performed via a SWAP, which ensures that turn behaves like an SC variable.
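For reference, here is a minimal sketch of the classic SC version of Peterson's algorithm in plain Python (our own rendering, not the paper's SRA variant, which accesses turn via SWAP; CPython's interpreter lock makes these operations effectively sequentially consistent).

```python
import sys
import threading

sys.setswitchinterval(1e-4)  # encourage frequent thread interleaving

flag = [False, False]   # flag[i]: thread i wants to enter its CS
turn = 0                # which thread yields when both want to enter
in_cs = [False, False]  # instrumentation: who is inside the CS
violations = []         # records any mutual-exclusion violation
N = 300

def worker(i):
    global turn
    other = 1 - i
    for _ in range(N):
        flag[i] = True
        turn = other                     # give priority to the other thread
        while flag[other] and turn == other:
            pass                         # busy-wait outside the CS
        in_cs[i] = True                  # --- critical section ---
        if in_cs[other]:
            violations.append(i)         # both threads inside: mutex broken
        in_cs[i] = False                 # --- end critical section ---
        flag[i] = False

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()
```

Under SC the busy-wait condition guarantees that `violations` stays empty; the paper's concern is precisely that this guarantee is lost under weaker models unless turn is accessed via an RMW.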
Correctness is encoded via registers mx1 and mx2, into which the content of shared variable cs is loaded. Mutual exclusion should guarantee that both registers are 0: neither thread should ever be able to read cs as ⊥ (as stored in line 7). The proof (like the associated SC proof in [9]) introduces auxiliary variables a1 and a2. Variable ai is initially false, is set to true when thread Ti has performed its swap, and is set back to false when Ti completes.
Once again, potentials provide a convenient mechanism for reasoning about the interactions between the two threads. For example, the assertion T1 ⋉ [R(turn)] ; [flag2] in the precondition of line 2 encapsulates the idea that an RMW on turn (via SWAP(turn, 2)) must read from a state in which flag2 holds, allowing us to establish T1 ⋉ [flag2] as a postcondition (using the axiom Swap-skip). We obtain the disjunct T1 ⋉ [flag2 ∧ turn = 1] after additionally applying Wr-own.

8 Discussion, Related and Future Work

Previous RG-like logics provided ad-hoc solutions for other concrete memory models such as x86-TSO and C/C++11 [11,16,17,31,38,39,46]. These approaches established soundness of the proposed logic with an ad-hoc proof that couples together memory and thread transitions. We believe that these logics can be formulated in our proposed general RG framework (which will require extensions to other memory operations, such as fences).
Moreover, Owicki-Gries logics for different fragments of the C11 memory model [16,17,46] used specialized assertions over the underlying view-based semantics. These include conditional-view assertions (enabling reasoning about MP) and value-order assertions (enabling reasoning about coherence). Both types of assertions are special cases of the potential-based assertions of Piccolo.
Ridge [39] presents an RG reasoning technique tailored to x86-TSO, treating the write buffers of TSO architectures as threads whose steps have to preserve relies. This is similar to our notion of stability of relies under internal memory transitions. Ridge moreover allows memory-model-specific assertions (e.g., on the contents of write buffers).
The OGRA logic [31] for Release-Acquire (a slightly weaker form of causal consistency than the SRA model studied in this paper) takes a different approach, which cannot be directly handled in our framework. It employs simple SC-like assertions at the price of a non-standard non-interference condition, which requires a stronger form of stability.
Coughlin et al. [14,15] provide an RG reasoning technique for weak memory models whose semantics is defined in terms of reordering relations (on instructions). They study both multicopy-atomic and non-multicopy-atomic architectures, but in all models the rely and guarantee assertions are interpreted over SC.
Schellhorn et al. [40] develop a framework that extends ITL with a compositional interleaving operator, enabling proof decomposition using RG rules. Each interval represents a sequence of states, strictly alternating between program and environment actions (which may be skip actions). This work differs radically from ours, since (1) their states are interpreted using a standard SC semantics, and (2) their intervals represent an entire execution of a command, as well as the interference from the environment while executing that command.
Under SC, rely-guarantee has been combined with separation logic [43,45], which allows the powerful synergy of reasoning using stable invariants (as in rely-guarantee) and ownership transfer (as in concurrent separation logic). It is interesting to study a combination of our RG framework with concurrent separation logics for weak memory models, such as [42,44].
Other works have studied the decidability of verification under causal consistency models. In work preceding the potential-based SRA model [28], Abdulla et al. [1] show that verification under RA is undecidable. In other work, Abdulla et al. [3] show that the reachability problem under TSO remains decidable for systems with dynamic thread creation. Investigating this question under SRA is an interesting topic for future work.
Finally, the spirit of our generic approach is similar to that of Iris [22], Views [18], Ogre and Pythia [7], the work of Ponce de León et al. [33], and recent axiomatic characterizations of weak memory reasoning [19], all of which aim to provide a generic framework that can be instantiated with an underlying semantics.
In the future we are interested in automating the reasoning in Piccolo, starting with automatically checking the validity of program derivations (using, e.g., SMT solvers for specialized theories of sequences or strings [24,41]) and, more ambitiously, synthesizing appropriate Piccolo invariants.

As P ⊆ P′, we have ξ ∈ Assume(P′, R ∪ {P, Q}).
Proof. First of all, ξi admits Pi by P ⊆ P1 ∩ P2. We now proceed by contradiction. Assume j to be the smallest index such that for the transition with a_{j+1} = cmp in some ξi there is no {p} τi → c ∈ Gi such that γj, mj ∈ p and γj, mj −c→ γj+1, mj+1. Without loss of generality, assume this to be thread i = 1. We
Then there exists a k such that Ck = {τ → if e then C1 else C2}, a_{k+1} = ε, and for all i ≤ k, step ai ≠ cmp; hence γk, mk ∈ P (as P is in the relies). There are then two cases: (1) γk ∈ e, or (2) it is not. The cases are dual and we consider only the first one. In that case, Ck+1 = {τ → C1}. Furthermore, γk+1, mk+1 ∈ P ∩ (e × M) (the ε-step of if changes neither registers nor memory). All further component steps of ξ now satisfy G1 (by {τ → C1} sat M