Abstraction for Crash-Resilient Objects (Extended Version)

We study abstraction for crash-resilient concurrent objects using non-volatile memory (NVM). We develop a library correctness criterion that is sound for ensuring contextual refinement in this setting, thus allowing clients to reason about library behaviors in terms of their abstract specifications, and library developers to verify their implementations against the specifications abstracting away from particular client programs. As a semantic foundation we employ a recent NVM model, called Persistent Sequential Consistency, and extend its language and operational semantics with useful specification constructs. The proposed correctness criterion accounts for NVM-related interactions between client and library code due to explicit persist instructions, and for calling policies enforced by libraries. We illustrate our approach on two implementations and specifications of simple persistent objects with different prototypical durability guarantees. Our results provide the first approach to formal compositional reasoning under NVM.


Introduction
Non-volatile memory, or NVM for short, is an emerging technology that enables byte-addressable and high-performance storage alongside data persistency across system crashes. This combination of features allows researchers and practitioners to develop a variety of efficient crash-resilient data structures (see, e.g., [14,32]). Recently, NVM has started to become available in commodity architectures of manufacturers such as Intel and ARM [4,23], and formal (operational and declarative) models of these systems have been proposed [10,25,30].
Unfortunately, like other new technologies, NVM puts more burden on programmers. Indeed, to get close to the performance of DRAM, writes to the NVM are first kept in volatile (i.e., losing contents upon crashes) caches, and only later persist (i.e., propagate to the NVM), possibly not in the order in which they were issued. This results in counterintuitive behaviors even for sequential programs and requires careful management using barriers of different kinds, a.k.a. explicit persist instructions, for guaranteeing that the system recovers to a consistent state upon a failure. Combined with standard concurrency issues, programming on such machines is highly challenging.
To tackle the complexity and make NVM widely applicable, one would naturally want to draw on libraries encapsulating highly optimized concurrent crash-resilient data structures (a.k.a. persistent objects). This approach goes both ways: programmers should be able to reason about their code using abstract library specifications that hide the implementation details, and in turn, library developers should be able to verify "once and for all" their implementations against their specifications, abstracting away from a particular client program. From a formal standpoint, this indispensable modularity requires us to have a so-called (library) abstraction theorem: a correctness condition that guarantees the soundness of client reasoning that assumes the specification instead of the implementation. Put differently, the abstraction theorem should allow one to establish contextual refinement, i.e., conclude that the specification reproduces the implementation's client-observable behaviors under any (valid) context. To the best of our knowledge, while several correctness criteria for persistent objects, akin to classical linearizability, have been proposed and have been established for multiple sophisticated implementations, none of them has been formally related to contextual refinement by an abstraction theorem of this kind.
In this paper we formulate and prove an abstraction theorem for concurrent programs utilizing non-volatile memory. We target the "Persistent Sequential Consistency" model of [25], or PSC, which enriches the standard sequentially consistent shared memory with non-volatile storage using per-location FIFO buffers to account for delayed and out-of-order persistence of writes. PSC constitutes a relatively simple model that is very close to developers' informal understanding of NVM. While existing hardware does not implement PSC as is, [25] presented compiler mappings from PSC to x86 (based on its persistency model from [30]), which can be used to ensure PSC semantics on Intel machines. Directly supporting relaxed memory models is left for future work.

Key Challenges and Ideas
We outline the main challenges and the key ideas in our solutions. We keep the discussion informal, leaving the formal development to later sections.

Library Specifications
A choice of a formalism for specifying library behaviors is integral in stating a library abstraction theorem. For libraries of concurrent data structures (a.k.a. concurrent objects), a popular approach is to give specifications in terms of sequential objects with the help of the classical notion of linearizability [21], which requires every sequence of method calls and returns that can be produced in a concurrent program to correspond to a sequence that can be generated by the sequential object. In this approach, a sequential object, represented by a set of sequences of pairs of method invocations and their associated responses, constitutes the library specification. Then, abstraction allows the client to reason about calls to a concurrent library as if they execute atomically on a single thread, or, equivalently, protected by a global lock [7,13].
For libraries of crash-resilient objects, there is more than one natural way of interpreting sequential specifications and adapting the linearizability definition, and no single notion of correctness w.r.t. sequential specifications captures all different options. A crash-resilient object may ensure that all methods completed by the moment of a crash survive it, or that some prefix of them does. It may also choose different possibilities for methods in progress at the moment of a crash (whether they are allowed to take their effect at some later point after the crash or not). Multiple adaptations of linearizability have been proposed, each relating crash-resilient objects to sequential specifications in a different way. These include strict linearizability [3], persistent atomicity [19], and durable linearizability and its buffered variant [24]. Among them, buffered durable linearizability, which allows for efficient implementations, turned out not to be compositional, which means that two (non-interacting) libraries may both be correct while their combination is not. In fact, since each of the different notions is useful for particular objects, one may naturally want to mix different correctness notions in a single client program. This would force the client to reason with several alternatives for interpreting sequential specifications, and to make sure that they compose well with one another.
To approach this variety, we believe it is necessary to follow a different approach, which is standard in concurrent program verification (see, e.g., [18,20,26]), and was applied before for deriving abstraction theorems in different contexts [8,16,17]. The idea is to take a library's specification to be just another library, where the latter is intended to have a simpler implementation. Then, we define a library correctness condition stating what it means for one library L to refine another library L # (equivalently, for L # to abstract L), and prove an abstraction theorem that ensures that when the library correctness condition is met, the behaviors of any client using L are contained in the behaviors of the client using L #. Such a theorem is only useful if the correctness condition avoids quantification over all possible clients, which would make the theorem trivial.
Using code for specifying libraries has several advantages over correctness notions based on sequential specifications. First, specifications and implementations are expressed and reasoned about in a unified framework, alleviating the need to interpret the use of sequential specifications by concurrent programs with system failures. Instead, the client of the theorem replaces complex library code with simpler specification code, and thus works with the semantics of a single language. Second, it enables a layered verification technique for library developers, allowing them to prove library correctness by introducing one or more intermediate implementations between L and L #. Finally, this formulation of the abstraction theorem is compositional (a.k.a. local) by construction, meaning that objects can be specified and verified in isolation.
Now, "code as a specification" is only useful if the programming language is sufficiently expressive for desirable specifications. For concurrent objects, "atomic blocks", often included in theoretical programming languages, provide a handy specification construct. For NVM, one needs a way to govern persistence similarly, offering intuitive specifications for libraries that simplify client reasoning. To this end, viewing the out-of-order persistence of writes to different cache lines as the major source of counterintuitive behaviors in NVM, we propose a new specification construct, which we call persistence blocks. Roughly speaking, such blocks may only persist in their entirety, so that persistence blocks ensure an "all-or-nothing" persistency behavior for the writes they protect.
Our blocks are closely related to persistent transactions of the PMDK library [22] (but we avoid the term transaction, since persistence blocks do not ensure isolation when executed concurrently).In our technical development, we extend the PSC model with instructions for persistence blocks, and carefully construct their semantics (see §4.2) to allow the abstraction result.We believe that persistence blocks are a useful specification construct for various data structures, where data consistency naturally involves multiple locations (often, pointers) being in-sync with one another.

Client-Library Interaction Using Explicit Persist Instructions
The key to establishing a library abstraction theorem is in decomposing a program into two interacting sub-parts, a client and a library, and understanding the interactions between them. These interactions are usually defined in terms of histories, taken to be sequences of method invocations and responses, along with the values being passed. The library correctness condition (the premise of the abstraction theorem) requires that histories produced by using a library L are also produced by its specification L # when both libraries are used by a certain "most general client" (MGC, for short) that concurrently invokes arbitrary methods of L an arbitrary number of times with every possible argument. The abstraction theorem ensures that if the library correctness condition holds, then L refines L # for any client.
Thus, for the abstraction theorem to hold, one has to make sure that the interactions between any client and the library are fully captured in the history produced by the library when used by the MGC. In crash-free sequentially consistent shared-memory semantics, this is ensured by the standard assumption that the client and the library manipulate disjoint sets of memory locations. Indeed, this restriction guarantees that clients can communicate with libraries only via values passed to and returned from method invocations.
However, we observe that under NVM, mutual interactions between the client and the library go beyond passed values, even when assuming disjointness of memory locations, which makes the standard notion of a library history insufficient. As a simple example, consider an interface with just one method f, specified by L # = [f → sfence; return]. The sfence instruction, called "store fence", is an explicit persist instruction meant to be used in conjunction with optimized barriers called "flush-optimal" (denoted by fo). Its role is to guarantee the persistence of previous write instructions that are guarded by flush-optimal instructions. Concretely, under PSC (following x86), after a thread executes ẋ := 1; fo(ẋ); sfence, we know that the write of 1 to ẋ has persisted (i.e., been propagated to the NVM), while without the sfence, it may still sit in the volatile part of the memory system.
In turn, consider an implementation L, given by L = [f → return], that implements f by doing nothing. Clearly, L does not implement L # correctly. Indeed, for the (sequential) client program ẋ := 1; fo(ẋ); call(f); ẏ := 1 that uses L #, we have ẏ = 1 ⟹ ẋ = 1 as a global invariant: if the system has crashed and we have ẏ = 1 in the NVM, then the sfence ensures that ẋ = 1 is in the NVM as well. Nevertheless, due to out-of-order persistence, if we use L in this program, we may get ẏ = 1 ∧ ẋ = 0 after a crash. Now, the client and the libraries above mention disjoint locations, and the histories that L may produce for the MGC are exactly the histories that L # produces (all well-formed sequences of "call" and "return"). Thus, when inspecting histories of L and of L #, we do not have sufficient information to observe the difference between them.
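The example above can be checked mechanically. The following Python sketch is an illustration only, not the paper's formal model: it assumes a single thread, two non-volatile variables x and y initialized to 0, and a simplified buffer discipline in which each variable has a FIFO buffer of write entries and flush-optimal markers. The function name `crash_states` and the tuple encoding of instructions are hypothetical. The sketch explores all interleavings of program steps and persist steps, and collects the NVM contents that a crash could leave behind.

```python
from collections import deque

VARS = ("x", "y")  # assumption: two non-volatile variables, both initially 0

def crash_states(program):
    """Collect every NVM content (x, y) that a crash could leave behind."""
    init = (0, (0, 0), ((), ()))  # (program counter, NVM, per-variable buffers)
    seen, todo, observed = {init}, deque([init]), set()
    while todo:
        pc, nvm, bufs = todo.popleft()
        observed.add(nvm)  # a crash may strike in any reachable state
        succs = []
        # Memory-internal persist step: dequeue the oldest entry of one buffer.
        for i, buf in enumerate(bufs):
            if buf:
                head, rest = buf[0], buf[1:]
                new_nvm = (nvm[:i] + (head[1],) + nvm[i + 1:]) if head[0] == "W" else nvm
                succs.append((pc, new_nvm, bufs[:i] + (rest,) + bufs[i + 1:]))
        # Program step: instructions are tuples such as ("write", "x", 1).
        if pc < len(program):
            op = program[pc]
            if op[0] == "sfence":
                # Blocked until none of this thread's markers remain buffered.
                if all(("FO",) not in buf for buf in bufs):
                    succs.append((pc + 1, nvm, bufs))
            else:
                i = VARS.index(op[1])
                entry = ("W", op[2]) if op[0] == "write" else ("FO",)
                succs.append((pc + 1, nvm, bufs[:i] + (bufs[i] + (entry,),) + bufs[i + 1:]))
        for s in succs:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return observed

def client(body_of_f):
    """The client x := 1; fo(x); call(f); y := 1 with the body of f inlined."""
    return [("write", "x", 1), ("fo", "x")] + body_of_f + [("write", "y", 1)]

with_spec = crash_states(client([("sfence",)]))  # f as in the specification
with_impl = crash_states(client([]))             # f as in the no-op implementation
print((0, 1) in with_spec, (0, 1) in with_impl)
```

Under these assumptions, the state with y = 1 and x = 0 is reachable only when f does nothing, which matches the invariant violation described in the text.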
Generally speaking, the challenge stems from the fact that certain explicit persist instructions (sfence and other instructions whose implementation in the hardware contains an implicit store fence, such as RMWs in x86), which can be executed by the library, impose conditions on the persistence of writes performed by the client that ran earlier on the same processor.
We address this challenge in two ways. First, we can sidestep the problem by weakening the semantics of store fences, making them relative to a set of locations (those used by the library or those used by the client). To do so, we extend the programming language with a specification construct similar to a store fence, but only affecting a given set of locations, and we restrict its use by each component to mention only the locations it owns. The use of these localized instructions instead of store fences is sufficient to ensure that the interaction between client and library is fully captured in histories, and allows us to establish the expected abstraction theorem. Libraries that do not intend to provide a store fence functionality to their clients can readily replace store fences with their localized counterparts. Doing so gives more freedom to alternative implementations of the same specification, which may, e.g., use alternative persist instructions without the store fence functionality (such as CLFLUSH in [23]).
On the other hand, it is possible that in performance-critical systems, clients would like to rely on a store fence that is executed anyway by the library for the library's own needs. For that, the library developer needs to use a standard store fence in the library's specification rather than the localized counterpart, and the abstraction theorem has to handle store fences with their standard, non-localized semantics. To do so, we expose in histories not only method invocations and responses, but also store fences. Roughly speaking, it means that in addition to the standard requirement on values passed by method invocations and responses, for L to refine L #, we would also require that L performs a store fence whenever L # does (which does not hold for the example above). Our notion of history in §5 is set to allow store fences (alongside their weaker localized versions), and the abstraction theorem in §6 shows that these extended histories are expressive enough for defining the library-correctness condition.
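To make the effect of exposing store fences concrete, here is a small Python sketch. It is not the definition of histories from §5, which is not reproduced here; it only assumes that a history is the projection of a trace onto call, return and (optionally) store-fence labels. The function `history` and the tuple encoding of labels are hypothetical.

```python
def history(trace, expose_sfence=True):
    """Project a trace onto the labels that are visible in a history."""
    kinds = {"CALL", "RET"} | ({"SF"} if expose_sfence else set())
    return tuple(label for label in trace if label[0] in kinds)

# One run of f under each library from the running example.
spec_trace = [("CALL", "f"), ("SF",), ("RET", "f")]  # specification: sfence; return
impl_trace = [("CALL", "f"), ("RET", "f")]           # implementation: return

# With classical histories the two runs are indistinguishable ...
print(history(spec_trace, expose_sfence=False) == history(impl_trace, expose_sfence=False))
# ... while histories that expose store fences tell them apart.
print(history(spec_trace) == history(impl_trace))
```

In this encoding the fence-free run of the implementation has no counterpart among the runs of the specification, which is the mismatch the extended histories are meant to detect.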

Handling Calling Policies
The third challenge we address concerns abstraction for libraries that enforce certain calling policies on their clients. For instance, a library implementing a lock may require that the calls of each thread for acquiring and releasing the lock perfectly interleave, and a library implementing a single-producer queue may require that only one thread calls the enqueue method. In the context of NVM, libraries often demand that a distinguished recovery method is called after every crash before invoking any other method of the library. When the client uses the library in a way that violates the calling policy, the library developer ensures nothing, and the blame is assigned to the client.
In the presence of calling policies, the contextual refinement guaranteed by the library abstraction theorem, stating that all behaviors of a program Pr [L] that uses L are also behaviors of the program Pr [L # ] that uses L # , is only applicable for a program Pr that respects the calling policy.An interesting compositionality question arises: Are we allowed to assume the library's specification when checking whether a program adheres to the calling policy (that is, require that Pr [L # ] adheres to the policy), or should this obligation be satisfied for the library's implementation (that is, require that Pr [L] adheres to the policy)?
The latter option would limit the applicability of the abstraction theorem for client reasoning. Indeed, it may be the case that establishing that Pr [L] adheres to the policy depends on the implementation L, whereas the abstraction theorem should allow reasoning without knowing the implementation at all. On the other hand, the former option seems circular, as it uses contextual refinement to establish its own precondition.
In this paper we show that requiring that Pr [L # ] adheres to the policy is actually sufficient for ensuring contextual refinement. Roughly speaking, our proof avoids circular reasoning by inspecting a minimal contextual refinement violation, for which we are able to establish policy adherence when using L, given policy adherence when using L #. To the best of our knowledge, this is a novel argument in the context of library abstraction. It is akin to DRF (data-race freedom) guarantees in weak memory concurrency, where often programs are guaranteed to have strong semantics (usually, sequential consistency) provided that certain race-freedom conditions hold in all runs under the strong semantics.
We note that many libraries' calling policies are "structural", namely they only enforce certain ordering constraints on the clients that do not depend on the values returned by the library (in particular, "execute recovery first" is a structural policy). In these cases, policy adherence holds even for an over-approximation L stub of L that returns arbitrary values. Certainly, however, this is not always the case. For example, a library L implementing standard list methods, cons and head, may require that head is only called on non-empty lists (like, e.g., pop_front in C++, which triggers undefined behavior if applied to an empty list [1]). Then, invoking head with the value returned from cons does adhere to the calling policy, but this is not the case for the over-approximated library L stub, which allows cons to return the empty list.
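A structural policy such as "execute recovery first" can be phrased as a check over the sequence of calls and crashes alone. The Python sketch below is illustrative: the event encoding, the method name `recover`, and the choice that no recovery is needed before the first crash are all assumptions, not part of the formal development.

```python
def respects_recovery_first(history, recovery="recover"):
    """Check that after every crash the recovery method is called before any other."""
    recovered = True  # assumption: no recovery is required before the first crash
    for event in history:
        if event[0] == "CRASH":
            recovered = False
        elif event[0] == "CALL":
            if event[1] == recovery:
                recovered = True
            elif not recovered:
                return False  # some other method was called before recovery
    return True

good = [("CALL", "enq"), ("RET", "enq"), ("CRASH",), ("CALL", "recover"),
        ("RET", "recover"), ("CALL", "deq"), ("RET", "deq")]
bad = [("CALL", "enq"), ("RET", "enq"), ("CRASH",), ("CALL", "deq")]
print(respects_recovery_first(good), respects_recovery_first(bad))
```

The checker never inspects returned values, which is what makes the policy structural: it would give the same verdict for a stub library that returns arbitrary values. A policy such as "head is only called on non-empty lists" cannot be written this way, since it depends on what cons returned.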

NVM Programs: Syntax and Semantics
In this section we begin to present the formal setting for our results. As standard in memory models, it is convenient to break the operational semantics into a program semantics (a.k.a. thread subsystem) and a memory semantics. We represent both components as labeled transition systems whose transition labels correspond to the operations they perform. We then consider the synchronized runs of the program and the memory, where program actions that interact with the memory are matched by actions executed by the memory system (see §4.1).
Next, we focus on the program part of the semantics, presenting both syntax (§3.1) and semantics (§3.2). We use the following standard notations.
Notation for finite sequences. For a finite alphabet Σ, we denote by Σ* (respectively, Σ+) the set of all (respectively, all non-empty) sequences over Σ. We use ε to denote the empty sequence. The length of a sequence s is denoted by |s|. We often identify sequences with their underlying functions (whose domain is {1, ..., |s|}), and write s(k) for the symbol at position 1 ≤ k ≤ |s| in s. We write σ ∈ s if σ appears in s, that is, if s(k) = σ for some 1 ≤ k ≤ |s|. We use "•" for concatenating sequences, and identify symbols with sequences of length 1.

Program Syntax
The domains and metavariables used to range over them are as follows: For concreteness, we present a simple programming-language syntax. Its expressions and instructions are given by the following grammar: Expressions are constructed with arithmetic and boolean operations over registers and values. Instructions consist of a local assignment r := e; a conditional if e goto n_1 ... n_m for non-deterministically jumping to a program counter from {n_1, ..., n_m} when e evaluates to non-zero or, otherwise, skipping (goto n_1 ... n_m can be encoded as if 1 goto n_1 ... n_m); havoc for arbitrarily modifying all registers; a write to memory x := e; and a read from memory r := x. There are also explicit persist instructions: a flush instruction fl(ẋ) and its optimized version fo(ẋ), called flush-optimal (referred to as CLFLUSH and CLFLUSHOPT in [23]), as well as the store fence instruction sfence (see §2.2).
This standard instruction set is extended to support calling and specifying library methods. There is a call instruction call(f) and a return instruction return. The novel specification constructs include the local store fence instruction lsfence(Ẋ) that relaxes the semantics of sfence by only enforcing the persistence ordering for the given set Ẋ of variables (thus, lsfence(NVVar) is equivalent to sfence); and instructions to begin and end a persistence block, beginPB(Ẋ) and endPB(Ẋ), respectively. The persistence block demarcates the writes that need to persist simultaneously after the block ends, either non-deterministically or triggered by a flush on some variable in Ẋ.
Next, we employ three syntactic categories:
• Instruction sequences represent the (sequential) implementation of each method (including main). Formally, an instruction sequence I is a function from a non-empty finite domain of the form {0, ..., n} (representing the possible program counters) to the set of instructions. We say that an instruction sequence is flat if it does not include an instruction of the form call(_).
• Sequential programs consist of a "main" method accompanied with implementations of every method f ∈ F. Formally, a sequential program S is a function assigning an instruction sequence to every f ∈ {main} ∪ F. To avoid modeling a call stack and simplify the presentation, we require that S(f) is a flat instruction sequence for every f ∈ F.
• Concurrent programs are top-level parallel compositions of sequential programs, all accompanied by the same method implementations. Formally, a (concurrent) program Pr is a mapping assigning a sequential program to every τ ∈ Tid, with Pr(τ)(f) = Pr(π)(f) for every τ, π ∈ Tid and f ∈ F. Below, we write Pr(f) for Pr(T_1)(f).
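The three syntactic categories translate directly into nested data structures. The Python sketch below is one possible encoding, given for illustration only: instructions are tuples whose first component names the instruction, and the helper names (`is_flat`, `is_sequential_program`, `is_concurrent_program`) are hypothetical.

```python
MAIN = "main"

def is_flat(instrs):
    """An instruction sequence is flat if it contains no call instruction."""
    return all(ins[0] != "call" for ins in instrs)

def is_sequential_program(S, methods):
    """S maps main and every method to a non-empty sequence; methods are flat."""
    return (set(S) == {MAIN} | set(methods)
            and all(len(S[f]) > 0 for f in S)
            and all(is_flat(S[f]) for f in methods))

def is_concurrent_program(Pr, methods):
    """Pr maps thread identifiers to sequential programs sharing the same methods."""
    seqs = list(Pr.values())
    return (all(is_sequential_program(S, methods) for S in seqs)
            and all(S[f] == seqs[0][f] for S in seqs for f in methods))

# A library with one method f, shared by two threads with different main methods.
lib = {"f": [("sfence",), ("return",)]}
t1 = {MAIN: [("write", "x", 1), ("fo", "x"), ("call", "f"), ("write", "y", 1)], **lib}
t2 = {MAIN: [("read", "r", "y")], **lib}
Pr = {"T1": t1, "T2": t2}
print(is_concurrent_program(Pr, ["f"]))
```

Only main may contain call instructions in this encoding, mirroring the flatness requirement that removes the need for a call stack.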

Program Semantics
We give semantics to the syntactic objects using labeled transition systems.
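We often write $q \xrightarrow{\sigma} q'$ to denote a transition $\langle q, \sigma, q' \rangle$. We denote by A.Σ, A.Q, A.q_Init, and A.T the components of an LTS A. We write $\xrightarrow{\sigma}_A$ for the relation $\{\langle q, q' \rangle \mid q \xrightarrow{\sigma} q' \in A.T\}$ and $\rightarrow_A$ for $\bigcup_{\sigma \in \Sigma} \xrightarrow{\sigma}_A$. For a sequence t ∈ A.Σ*, we write $\xrightarrow{t}_A$ for the composition $\xrightarrow{t(1)}_A ; \ldots ; \xrightarrow{t(|t|)}_A$. We denote by traces(A) the set of all traces of A.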
Next, we define the LTSs induced by instruction sequences, sequential programs, and concurrent programs. We will often identify the syntactic objects with the LTS they induce (e.g., when writing expressions like S.Q for a sequential program S). The transition labels of these LTSs feature action labels.
Definition 2. An action label takes one of the following forms: a read R(x, v), a write W(x, v), a flush FL(ẋ), a flush-opt FO(ẋ), an sfence SF, a local sfence LSF(Ẋ), a start beginPB(Ẋ) or an end endPB(Ẋ) of a persistence block, a call CALL(f, φ), or a return RET(f, φ), where x ∈ Var, v ∈ Val, ẋ ∈ NVVar, Ẋ ⊆ NVVar, f ∈ F, and φ : Reg → Val. We denote by Lab the set of all action labels. The functions typ and var retrieve (when applicable) the type (R/W/...) and variable (x or ẋ) of an action label. We write varset(l) for the set of variables mentioned in l (e.g., varset(R(x, v)) = {x}, varset(LSF(Ẋ)) = Ẋ, and varset(SF) = ∅).
Action labels represent the interactions that a program has with the memory.
Definition 3. The LTS induced by an instruction sequence I is given by:
• The transition labels are action labels, extended with ε for silent transitions.
• The states are pairs ⟨pc, φ⟩ where pc ∈ N, called program counter, stores the current instruction pointer inside the sequence, and φ : Reg → Val, called local store, records the values of the registers. We assume that local stores are extended to expressions in the obvious way.
• The initial state is ⟨0, φ_Init⟩, where φ_Init ≜ λr. 0.
• The transitions are as follows:
Recall that program semantics is separate from memory semantics, which is why the transitions above completely ignore the restrictions arising from the memory system. In particular, the write to memory x := e only announces itself in the label. The read from memory r := x loads an arbitrary value v into the destination register r, announcing that value in the read label. Other instructions act as no-ops, and simply announce themselves in the transition label, using the matching-label function that maps each instruction to its label (fl(ẋ) → FL(ẋ), fo(ẋ) → FO(ẋ), and so on).
Finally, call(f) and return instructions are not handled at this level, but receive special semantics at the level of sequential programs, as defined next.
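The prose description of the instruction-level transitions can be sketched as a successor function. The Python code below is an approximation written from that description, not a transcription of the formal rules: it assumes a two-element value domain so that reads and havoc stay finite, represents expressions as Python functions over the local store, and uses None for the silent label ε. The names `steps`, `VALUES` and `MATCHING_LABEL` are hypothetical.

```python
from itertools import product

VALUES = (0, 1)  # assumption: a tiny value domain keeps reads and havoc finite
MATCHING_LABEL = {"fl": "FL", "fo": "FO", "sfence": "SF",
                  "lsfence": "LSF", "beginPB": "beginPB", "endPB": "endPB"}

def steps(instrs, pc, phi):
    """Yield (label, (pc', phi')) for each transition; label None stands for ε."""
    if not 0 <= pc < len(instrs):
        return
    kind, *args = instrs[pc]
    if kind == "assign":                      # r := e
        r, e = args
        yield None, (pc + 1, {**phi, r: e(phi)})
    elif kind == "if_goto":                   # if e goto n_1 ... n_m
        e, targets = args
        if e(phi) != 0:
            for n in targets:                 # non-deterministic choice of target
                yield None, (n, phi)
        else:
            yield None, (pc + 1, phi)
    elif kind == "havoc":                     # arbitrary values for all registers
        for vals in product(VALUES, repeat=len(phi)):
            yield None, (pc + 1, dict(zip(phi, vals)))
    elif kind == "write":                     # x := e only announces itself
        x, e = args
        yield ("W", x, e(phi)), (pc + 1, phi)
    elif kind == "read":                      # r := x loads an arbitrary value
        r, x = args
        for v in VALUES:
            yield ("R", x, v), (pc + 1, {**phi, r: v})
    elif kind in MATCHING_LABEL:              # no-ops announcing their label
        yield (MATCHING_LABEL[kind], *args), (pc + 1, phi)
    # call and return have no transitions at this level

for label, state in steps([("read", "r", "x")], 0, {"r": 0}):
    print(label, state)
```

Note that a read produces one transition per candidate value; the memory system later decides which of them can actually be taken.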
Definition 4. The LTS induced by a sequential program S is given by:
• The transition labels are action labels, extended with ε for silent transitions.
• The states are tuples q = ⟨pc, φ, pc_s, f⟩, where:
  - ⟨pc, φ⟩ is a state of the instruction sequence (see Def. 3) storing the state of the sequence currently running.
  - pc_s ∈ N ∪ {⊥}, called the stored program counter, is used to remember the program position to jump to when the current instruction sequence returns, whereas pc_s = ⊥ means that the main method is currently running. (Recall that we assume that S(f) is flat for every f ∈ F, so we do not need to record the call stack.)
  - f ∈ F ∪ {main}, called the active method, tracks the method that is currently running.
  We denote by q.pc, q.φ, q.pc_s, and q.f the components of a state q ∈ S.Q.
• The transitions are given by:
The normal transition lifts the instruction-sequence transition to the level of sequential programs. Note that the transition applies for any method (main or other). The call transition passes control from the main method to some other method, jumping the program counter to the first instruction and storing the return point (pc + 1). The return transition passes control back using the stored return point. For simplicity, we do not have any argument passing mechanism and use the full register store for that matter. (If needed, each component may store the values it needs in the memory, and reload them later on.) Finally, non-det-sfence is a non-standard transition that we find technically convenient to have. It allows the program to non-deterministically execute an sfence at any point. Since, as will become apparent when presenting the memory system, sfences only restrict the possible behaviors, this transition is safe to include in the program semantics. It is particularly useful for simplifying the library correctness condition that only considers inclusion of sets of histories (see §5). For instance, switching the roles of L and L # from §2.2, the library implementing f using sfence should be considered a refinement of the one that simply returns. For that, we allow the no-op specification to perform non-deterministic sfences that match the ones executed by the concrete implementation.
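The four kinds of transitions described above (normal, call, return, non-det-sfence) can be sketched as follows. This Python code is a reading of the prose, not of the formal rules: it assumes that call and return labels carry the full register store, that a call starts the callee at program counter 0, and it replaces the instruction-level LTS by a stub in which every instruction other than call and return is written directly as the label it announces. All names are hypothetical.

```python
BOT = None  # stands for ⊥: the main method is currently running

def instr_steps(instrs, pc, phi):
    """Stub for the instruction-level LTS: each instruction announces itself."""
    if pc < len(instrs) and instrs[pc][0] not in ("call", "return"):
        yield instrs[pc], (pc + 1, phi)

def seq_steps(S, state):
    """Yield (label, next_state) for states of the form (pc, phi, pc_s, f)."""
    pc, phi, pc_s, f = state
    instrs = S[f]
    # normal: lift the transition of the running instruction sequence
    for label, (pc2, phi2) in instr_steps(instrs, pc, phi):
        yield label, (pc2, phi2, pc_s, f)
    if pc < len(instrs):
        ins = instrs[pc]
        if ins[0] == "call" and pc_s is BOT:
            # call: jump to the first instruction of g and store the return point
            g = ins[1]
            yield ("CALL", g, phi), (0, phi, pc + 1, g)
        elif ins[0] == "return" and pc_s is not BOT:
            # return: pass control back to main at the stored return point
            yield ("RET", f, phi), (pc_s, phi, BOT, "main")
    # non-det-sfence: an sfence may be announced at any point
    yield ("SF",), state

S = {"main": [("W", "x", 1), ("call", "f"), ("W", "y", 1)], "f": [("return",)]}
for label, nxt in seq_steps(S, (1, {}, BOT, "main")):
    print(label, nxt)
```

Because the stored program counter is a single value rather than a stack, the sketch relies on method bodies being flat, as required of sequential programs.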
Finally, the LTS induced by a concurrent program is defined as follows.
Definition 5. The LTS induced by a (concurrent) program Pr is given by:
• The set of transition labels is given by (Tid × (Lab ∪ {ε})) ∪ {↯}, where ↯ denotes a crash. The functions on action labels (e.g., typ, var) are lifted to these labels in the obvious way.
• The states, denoted by q, assign a state in Pr(τ).Q to every τ ∈ Tid.
• The initial state is composed from the initial state of each thread: q_Init ≜ ⟨Pr(T_1).q_Init, ..., Pr(T_N).q_Init⟩.
• The transitions are interleaved thread transitions or crash transitions reinitializing the program state:
We present PSC ("Persistent Sequential Consistency"), the persistency model used as the memory system. We first introduce the model as it is in [25] (extended with standard volatile memory alongside the non-volatile one), following its operational presentation as an LTS with non-deterministic memory-internal transitions that flush stores from the volatile part to the non-volatile part. In §4.1, we define the synchronization of programs with the PSC memory system. In §4.2, we present the extensions added in this paper that are useful for library abstraction. Finally, in §4.3, we establish certain separation properties of PSC that are essential in our proofs.
Roughly speaking, a state in PSC consists of a non-volatile memory (a mapping from non-volatile variables to values) and a volatile memory (a mapping from volatile variables to values). The volatile memory works just as a normal sequentially consistent memory, keeping track of the latest written value to every variable and returning that value for reads. Upon a crash, the contents of the volatile memory are reset to their initial state. The non-volatile memory behaves observationally the same between crashes, but its contents survive crashes. To model delayed and out-of-order persistence of writes, write steps to non-volatile variables do not alter the non-volatile memory immediately when issued. Instead, writes first go to volatile per-variable persistence FIFO buffers, which maintain the writes to each variable that are yet to persist. Then, PSC non-deterministically takes persist steps that apply the oldest update from a persistence buffer to the non-volatile memory. Reads from non-volatile variables retrieve the latest value in the relevant buffer, or the value from the non-volatile memory if that buffer is empty, thus providing standard sequentially consistent semantics in the absence of system crashes. Upon a crash, the buffers are reset to their initial (empty) state, but the contents of the non-volatile memory remain intact.
Explicit persist instructions can be used to control the persistence of writes.A "flush" barrier for a certain variable blocks the execution until the relevant persistence buffer is empty, thus forcing all previous writes to that variable to persist.Alternatively, a (cheaper) "flush-optimal" barrier for a certain variable enqueues a special marker in the persistence buffer of this variable accompanied by the thread identifier of the thread that issued the barrier.The effect of flushoptimal is delayed until the same thread performs an sfence, which blocks the execution until all flush-optimal markers of that thread are dequeued from all buffers.The fact that the persistence buffers are FIFO ensures that an sfence by some thread forces the persistence of all writes executed before a flush-optimal issued by the same thread.Definition 6. PSC is the LTS defined as follows: • The transition labels are given by (Tid × Lab) ∪ {per, }.That is, a transition label can be a pair of the thread identifier and the action label of the operation, per denoting the internal propagation action, or denoting a system crash.• The states are tuples M = ṁ, m, P , where: -ṁ : NVVar → Val is called the non-volatile memory.
-m : VVar → Val is called the volatile memory.
-P : NVVar → PLBuff is called the persistence buffer.Here, PLBuff denotes the set of all per-location persistence buffers, each of which is a finite sequence p of entries of the form W(v) for v ∈ Val (writes), or FO(τ ) for τ ∈ Tid (flush optimal markers).The persistence buffer P assigns a per-location persistence buffer to every non-volatile variable. 4e denote by M.
Fig. 1. Transitions of PSC

• The transitions of PSC are presented in Fig. 1, using an auxiliary function for looking up the most recent value of a variable: we let M(x) be M.m(x) for x ∈ VVar, and, for x ∈ NVVar, either the value v of the last (rightmost) write entry in M.P(x) or, when there is no such entry, M.ṁ(x).

The transitions follow the intuitive account above. Those corresponding to program transitions are labeled with pairs in Tid × Lab. For instance, a transition labeled with ⟨τ, R(x, v_R)⟩ means that thread τ reads the value v_R from the (volatile or non-volatile) shared variable x.
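The interplay of flush, flush-optimal, and sfence can be illustrated with a short sketch. It is a simplification under stated assumptions: blocking is modeled by "enabled" predicates rather than by actual waiting, and the names `BarrierMemory`, `flush_enabled`, and `sfence_enabled` are hypothetical helpers, not notation from Definition 6.

```python
from collections import deque


class BarrierMemory:
    """Toy persistence buffers holding W(v) entries and FO(tid) markers."""

    def __init__(self, nv_vars):
        self.nvm = {x: 0 for x in nv_vars}
        self.buf = {x: deque() for x in nv_vars}

    def write(self, x, v):
        self.buf[x].append(("W", v))

    def flush_opt(self, tid, x):
        # A flush-optimal only enqueues a marker tagged with the issuing thread.
        self.buf[x].append(("FO", tid))

    def flush_enabled(self, x):
        # A flush on x may proceed only when the buffer of x is empty.
        return not self.buf[x]

    def sfence_enabled(self, tid):
        # An sfence by tid may proceed only when no FO(tid) marker is buffered.
        return all(("FO", tid) not in b for b in self.buf.values())

    def persist(self, x):
        # Internal step: dequeue the oldest entry; only writes change the NVM.
        kind, payload = self.buf[x].popleft()
        if kind == "W":
            self.nvm[x] = payload


mem = BarrierMemory({"x", "y"})
mem.write("x", 1)
mem.flush_opt(1, "x")              # thread 1 issues a flush-optimal on x
print(mem.sfence_enabled(1))       # False: the marker is still buffered
mem.persist("x")                   # the write persists first (FIFO order)
mem.persist("x")                   # then the marker is dequeued
print(mem.sfence_enabled(1), mem.nvm["x"])  # True 1
```

Because the marker sits behind the write in the same FIFO buffer, the sfence can only become enabled after the write has reached the NVM.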

Linking Programs and Memories
To give semantics to programs running under PSC, the thread system is synchronized with the PSC memory system. Formally, the synchronization of a program Pr with PSC is another LTS, denoted by Pr ⋊⋉ PSC, defined as follows:
• The set of transition labels is Pr.Σ ∪ PSC.Σ, i.e., (Tid × (Lab ∪ {ε})) ∪ {per, ↯}.
• The states are pairs ⟨q, M⟩ ∈ Pr.Q × PSC.Q.
• The initial state is ⟨q_Init, M_Init⟩.
• The transitions are "synchronized transitions" of Pr and PSC, using the labels to decide what to synchronize on: both the program and the memory take the same step for transition labels that are common to both LTSs; only the program steps for transition labels that are only program labels; and only the memory steps for transition labels that are only memory labels.

Extending PSC for Library Abstraction
We present the modifications of PSC for supporting the new specification constructs: localized sfences and persistence blocks.When referring to PSC in the sequel we mean the following revised version.
Local store fences. Localized sfences are straightforwardly supported by the following additional memory transition:

Here, instead of blocking until all FO(τ) entries are removed from all buffers, we only require that such entries are not present in buffers associated with variables from a certain set (mentioned in the action label and corresponding to the argument of the lsfence(Ẋ) instruction).
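The difference between the two fences amounts to the set of buffers that is inspected. A minimal sketch, assuming buffers are represented as lists of ("W", v) and ("FO", tid) tuples (a hypothetical encoding chosen for illustration):

```python
def sfence_enabled(buffers, tid):
    """Global sfence: no FO(tid) marker may remain in any buffer."""
    return all(("FO", tid) not in entries for entries in buffers.values())


def lsfence_enabled(buffers, tid, variables):
    """Localized sfence: only the buffers of the given variables are inspected."""
    return all(("FO", tid) not in buffers[x] for x in variables)


# Thread 1 has a pending flush-optimal marker on y only.
buffers = {"x": [("W", 1)], "y": [("W", 2), ("FO", 1)]}
print(sfence_enabled(buffers, 1))            # False
print(lsfence_enabled(buffers, 1, {"x"}))    # True
print(lsfence_enabled(buffers, 1, {"y"}))    # False
```

A fence restricted to the variables of one component is thus unaffected by markers that the same thread left in buffers of the other component.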
Persistence blocks. We assume an infinite set BlockID of block identifiers that are non-deterministically allocated when blocks are opened. The state of the memory system keeps track of a mapping assigning the current open block identifier to every thread and non-volatile variable, or ⊥ if the variable is not a part of an open block of the thread. When writing to non-volatile variables, the associated block identifiers are attached to the write entry in the per-location persistence buffer. In turn, the propagation from the buffers to the NVM ensures that blocks are propagated only after they are closed and only in their entirety.

To do so, we generalize the persist step of PSC to allow simultaneous propagation of multiple entries from the buffers. To respect the per-variable FIFO order, the propagated entries should form a prefix of each buffer. Formally, this requires the following modifications:
1. Write entries in buffers take the form j:W(v) where j ∈ BlockID ∪ {⊥} and v ∈ Val (instead of W(v)). A write entry of the form ⊥:W(v) means that the corresponding write was not a part of a persistence block.
2. States are extended to be quintuples M = ⟨ṁ, m, P, B, Bid⟩, where:
- B : Tid → NVVar → (BlockID ∪ {⊥}) is called the active-block mapping. It assigns a block identifier (or ⊥ if there is no active block) to every thread identifier and non-volatile variable.
- Bid ⊆ BlockID × P(NVVar) is called the block identifiers set. It is used to store all persistence block identifiers occurring so far, each accompanied by the set of non-volatile variables that it protects.
We denote by M.B and M.Bid the additional components of a state M. We impose the following well-formedness conditions:
4. The nv-write transition records the current active block in the added entry:
5. The following two transitions for opening and closing blocks are added:
Thus, opening a block allocates a fresh identifier and sets the active-block mapping accordingly. In turn, closing a block resets the relevant variables in the active-block mapping.
6. The following transition is used instead of persist-write and persist-fo. It generalizes both persist-write and persist-fo by simultaneously persisting several entries together (each p_ẋ below stands for a sequence of entries). The resulting non-volatile memory maps each ẋ to v if the last write entry in p_ẋ has value v, and to M.ṁ(ẋ) if there are no write entries in p_ẋ.

This step imposes two restrictions. First, the persisted entries from each buffer (p_ẋ) should form a prefix of that buffer, so that FIFO semantics is maintained. Second, to respect the persistence blocks, if some entry of a given block is persisted (∃ẋ. j:W(_) ∈ p_ẋ), then that block should not be currently active in any thread (∀ẋ, τ. M.B(τ)(ẋ) ≠ j) and no entries of that block should remain in the volatile buffers (∀ẋ. j:W(_) ∉ P′(ẋ)).

We note that nested and interleaved blocks are allowed. The following program demonstrates such a case:

  beginPB(ẋ, ẏ);
  ẋ := 1;
  beginPB(ż, ẇ);
  ż := 1;
  ẇ := 1;
  endPB(ż, ẇ);
  ẏ := 1;
  endPB(ẋ, ẏ);

Here, ẋ = 1 and ẏ = 1 must persist together; ż = 1 and ẇ = 1 must persist together; but these two pairs can persist independently of each other in any order. Thus, provided that the client and the library use blocks of their own locations, the block instructions by each component are invisible to the other.
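The two restrictions on the generalized persist step can be checked mechanically. The sketch below is an illustration under simplifying assumptions: buffers contain only write entries, encoded as (block_id, value) pairs with `None` standing for ⊥, and the function names `can_persist` and `crash_states` are hypothetical. It enumerates the NVM contents reachable by one persist step followed by a crash for the nested-block program, after both blocks were closed.

```python
from itertools import product


def can_persist(buffers, active_blocks, take):
    """May the first take[x] entries of every buffer persist in one step?"""
    persisted = {j for x in buffers for (j, _) in buffers[x][:take.get(x, 0)]
                 if j is not None}
    remaining = {j for x in buffers for (j, _) in buffers[x][take.get(x, 0):]
                 if j is not None}
    # A persisted block must be closed and must leave no entry behind.
    return not (persisted & active_blocks) and not (persisted & remaining)


def crash_states(buffers, active_blocks, nvm):
    """NVM contents (ordered by variable name) after one persist step and a crash."""
    names = sorted(buffers)
    states = set()
    for lens in product(*(range(len(buffers[x]) + 1) for x in names)):
        take = dict(zip(names, lens))   # prefixes only, so FIFO order is kept
        if can_persist(buffers, active_blocks, take):
            new = dict(nvm)
            for x in names:
                for (_, v) in buffers[x][:take[x]]:
                    new[x] = v
            states.add(tuple(new[x] for x in names))
    return states


# Block 1 protects x, y; block 2 protects z, w; both blocks are closed.
buffers = {"x": [(1, 1)], "y": [(1, 1)], "z": [(2, 1)], "w": [(2, 1)]}
nvm = {"x": 0, "y": 0, "z": 0, "w": 0}
# Tuples list the values of (w, x, y, z).
print(sorted(crash_states(buffers, set(), nvm)))
# [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 1, 1)]
```

No reachable state has x and y (or z and w) disagreeing, while the two pairs persist independently of each other.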

Separation Properties
To enable our library abstraction proof, the required key property of PSC, which we preserved in its extensions, is the ability to separate PSC states into disjoint parts (the library's part and the client's part) and capture each memory transition in terms of its effect on the two parts. Next, we formulate this property, which we will later use to prove library abstraction. In fact, our arguments for library abstraction rely only on the properties below, and never "unfold" the PSC-related definitions. This allows one to refine and extend PSC, as long as the separation properties are preserved.

The separation of PSC states is stated in terms of the following restriction operator relative to a set of variables. For persistence blocks to behave correctly, we need an auxiliary condition on this set: we say that a set Ẋ ⊆ NVVar separates a state M ∈ PSC.Q if for every ⟨j, Ẏ⟩ ∈ M.Bid, we have Ẏ ⊆ Ẋ or Ẏ ⊆ NVVar \ Ẋ.
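The auxiliary condition is a simple check over the block identifiers set. A minimal sketch, assuming Bid is encoded as a set of (identifier, frozenset of variables) pairs; the function name `separates` mirrors the terminology above, but the encoding is an assumption made for illustration:

```python
def separates(X, bid, nvvar):
    """X separates the state if every recorded block lies entirely inside X
    or entirely outside it (within the non-volatile variables nvvar)."""
    outside = nvvar - X
    return all(Y <= X or Y <= outside for (_, Y) in bid)


nvvar = {"x", "y", "z", "w"}
bid = {(1, frozenset({"x", "y"})), (2, frozenset({"z", "w"}))}

print(separates({"x", "y"}, bid, nvvar))   # True: block 1 inside, block 2 outside
print(separates({"x", "z"}, bid, nvvar))   # False: both blocks straddle the set
```

Intuitively, a block that straddled the client's and the library's variables would tie the persistence of the two parts together, which is exactly what the condition rules out.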
Definition 7. The restriction of M ∈ PSC.Q onto a set X ⊆ Var such that X ∩ NVVar separates M, denoted by M|_X, is the state M′ ∈ PSC.Q given by:

The next lemma states the separation property of PSC, providing a precise characterization of each PSC transition in terms of transitions on the restrictions M|_X and M|_{Var\X}. A special case is needed for store fence transitions, since taking these transitions enforces conditions on both restrictions.

Lemma 1. Let X ⊆ Var such that X ∩ NVVar separates a state M_1.
1. For every τ ∈ Tid and l ∈ Lab \ {SF} with varset(l) ⊆ X, M

The proof of Lemma 1 proceeds by standard case analysis ranging over all possible transitions of PSC. Finally, the following operation is used below to compose a state from a client component and a library component (see Lemma 2).

Definition 8. Let M_1, M_2 be states of PSC, and

Libraries and Their Clients
We present the notions of libraries and clients, as well as the necessary definitions for stating the abstraction theorem: histories and most general clients.
Libraries. We take a library L to be a function that assigns to each method name in dom(L) ⊆ F a flat instruction sequence representing the method body. In the context of some library L, we refer to the implementations of the methods in {main} ∪ (F \ dom(L)) in a program Pr as the client of L.

Client-library composition. We consider the common case where libraries and their clients never access the same shared variables. To formally define this restriction, we use the following notations for sets of locations used by instruction sequences, libraries, and their clients:
• Var(I) denotes the set of shared variables mentioned in an instruction sequence I (possibly as a part of a set Ẋ of variables, e.g., in beginPB(Ẋ)).
• For a library L, Var(L)

Note that we always have Var(Pr
Histories. Histories record the interactions between libraries and clients. Formally, a history h of a library L is a sequence of transition labels representing a crash, a call to a method of L, a return from a method of L, or an sfence, i.e., labels from the set HTLab_dom(L), which is defined as follows:

Definition 10. Let t be a trace of Pr ⋊⋉ PSC for some program Pr. The history induced by t w.r.t. a set F of method names, denoted by H_F(t), is the subsequence of t over HTLab_F consisting of (in the same order they appear in t): call and return labels ⟨τ, CALL(f, φ)⟩ and ⟨τ, RET(f, φ)⟩ with f ∈ F; SF-labels ⟨τ, SF⟩; and crash labels.

The notation H_F(t) is extended to sets of traces in the obvious way. The set of histories of Pr w.r.t. F, denoted by H_F(Pr), is given by H_F(traces(Pr ⋊⋉ PSC)). When F is the set of all method names, we simply write H(t) and H(Pr).
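Extracting the history induced by a trace is a plain projection. The sketch below assumes a hypothetical encoding of labels: the strings "crash" and "per" for the two memory-only labels, and pairs (tid, action) for thread labels, where an action is a tuple such as ("CALL", f, phi), ("RET", f, phi), ("SF",), or ("W", x, v).

```python
def history(trace, methods):
    """Project a trace onto call/return labels of `methods`, SF labels, and crashes."""
    kept = []
    for label in trace:
        if label == "crash":
            kept.append(label)
        elif isinstance(label, tuple):
            _tid, action = label
            kind = action[0]
            if kind == "SF" or (kind in ("CALL", "RET") and action[1] in methods):
                kept.append(label)
    return kept


trace = [
    (1, ("CALL", "write", 0)),
    (1, ("W", "x", 1)),          # memory access: not a history label
    "per",                       # internal propagation: not a history label
    (2, ("CALL", "log", 0)),     # call to a method outside the chosen set
    (1, ("SF",)),
    (1, ("RET", "write", 0)),
    "crash",
]
print(history(trace, {"write"}))
# [(1, ('CALL', 'write', 0)), (1, ('SF',)), (1, ('RET', 'write', 0)), 'crash']
```

Note that sfence labels are kept regardless of the chosen method set, since they are visible to both the client and the library.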
Most general clients. We encompass library calling policies (see §2.3) using the notion of a "most general client": a non-deterministic client that invokes the library methods in the most general way allowed by the policy. Formally, a most general client MGC is given as a (concurrent) program. Adherence to the calling policy is defined as follows.

Definition 11. Let L be a library, and Pr and MGC be programs such that L is safe for both Pr and MGC. We say that Pr correctly calls L w.r.t. MGC if

The policy of a library with no restrictions on its clients (beyond the separation of shared resources) is expressed by an MGC, called MGC_free, that repeatedly invokes arbitrary library methods with arbitrary initial stores. Often persistent objects include a recovery method meant to be executed after a crash before any other method is invoked. We call such a policy MGC_rec. Formally, MGC_free (for dom(L) = {f1, ..., fn}) and MGC_rec (for dom(L) = {f1, ..., fn} ⊎ {recover}) assign the following main method to each thread τ:

MGC_free(τ)(main) =
  BEGIN: havoc; goto f1 ... fn END;
  f1: call(f1); goto BEGIN;
  ...
  fn: call(fn); goto BEGIN;
  END:

MGC_rec(τ)(main) =
  a := CAS(x, 0, 1); if a = 0 goto REC; goto WAIT;
  REC: call(recover); ỹ := 1; goto BEGIN;
  WAIT: a := ỹ; if a = 0 goto WAIT; goto BEGIN;
  BEGIN: ... rest of the code as in MGC_free ...

In MGC_rec, using a compare-and-swap, one thread performs the recovery. All other threads wait until recovery ends to start their method invocations.
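The election of a single recovering thread in MGC_rec can be mimicked with ordinary threads. This is a rough sketch under several assumptions: the compare-and-swap is emulated with a lock, each thread performs one method invocation instead of looping, and `Shared`, `mgc_rec_thread`, and `run` are hypothetical names introduced only for the illustration.

```python
import threading
import time


class Shared:
    """Volatile variables x (election flag) and y (recovery-done flag)."""

    def __init__(self):
        self.x = 0
        self.y = 0
        self._guard = threading.Lock()

    def cas_x(self, expected, new):
        # Emulated compare-and-swap on x; returns the value read.
        with self._guard:
            old = self.x
            if old == expected:
                self.x = new
            return old


def mgc_rec_thread(shared, log, tid):
    if shared.cas_x(0, 1) == 0:
        log.append(("recover", tid))   # REC: the winner calls recover
        shared.y = 1                   # and then signals completion
    else:
        while shared.y == 0:           # WAIT: spin until recovery has ended
            time.sleep(0)
    log.append(("method", tid))        # BEGIN: ordinary method invocations


def run(n_threads=4):
    shared, log = Shared(), []
    threads = [threading.Thread(target=mgc_rec_thread, args=(shared, log, t))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run()
print(log[0][0], sum(1 for kind, _ in log if kind == "recover"))  # recover 1
```

Whatever the scheduling, exactly one thread logs a recovery, and that entry precedes every ordinary method invocation.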

The Library Abstraction Theorem
In this section we state and prove the library abstraction theorem. The premise of this theorem, the library correctness condition, is formulated as follows.

Definition 12. Let L and L# be libraries, both safe for a program MGC. We say that L refines L# w.r.t. MGC, denoted by L ⊑_MGC L#, if both libraries implement the same methods and H(MGC[L]) ⊆ H(MGC[L#]).

Next, the abstraction theorem states that L ⊑_MGC L# ensures that any client adhering to the library's calling policy may safely use the implementation L while reasoning about possible behaviors in terms of the specification L#. Our notion of "a behavior" includes the generated histories, as well as the states reachable by the composition of the program and the memory system. Including reachable states is intended to assist safety verification. Clearly, we cannot require that the program states match for threads that are currently executing a method of L. In addition, since L and L# may update the memory differently (e.g., use different variables), we should only consider the variables of the client when inspecting the memory states. This leads us to the following statement.

Theorem 1 (Abstraction). Suppose that L ⊑_MGC L#. Let MGC and Pr be programs such that both L and L# are safe for MGC and Pr, and Pr correctly calls L# w.r.t. MGC. If ⟨q_Init, M_Init⟩ −t→_{Pr[L]⋊⋉PSC} ⟨q, M⟩, then there exist t# and ⟨q#, M#⟩ such that the following hold:
• For every τ ∈ Tid, if q(τ).f ∉ dom(L), then q#(τ) = q(τ).

Note that L ⊑_MGC L# is necessary for the conclusion to hold: otherwise, MGC itself is a client that can observe behaviors of L that are impossible for L#. Following §2.3, we also note that policy adherence is required w.r.t. L#.
To prove the abstraction theorem, the following key lemma is used multiple times (with different arguments).It allows us to compose the client's part from one trace with the library's part from another into one combined trace.

Lemma 2 (Composition).
Let L and L′ be libraries implementing the same set F of methods such that both are safe for a program Pr, and L is also safe for a program Pr′. Suppose that q_Init, M_Init M_lib, and H_F(t_cl) = H_F(t_lib). Then, there exists a trace t such that H(t) = H(t_cl) and ⟨q_Init, M_Init⟩ −t→_{Pr[L]⋊⋉PSC} ⟨q, M⟩, for: ⟨q_lib(τ).pc, q_lib(τ).φ, q_cl(τ).pc_s, q_cl(τ).f⟩

The proof of Lemma 2 is based on the inherent disjointness in client-library composition provided by a library safe for its client program, which we leverage in the following two ways.

Firstly, we extract client-local and library-local transition properties from all transitions of Pr[L′] ⋊⋉ PSC and Pr′[L] ⋊⋉ PSC. Thus, when we consider a transition by Pr[L′] ⋊⋉ PSC corresponding to an instruction outside of a method of L′, we show that an analogous transition is possible with the same program state, but with a memory state zeroing out the locations used by the library L′. Similarly, when we consider a transition by Pr′[L] ⋊⋉ PSC corresponding to an instruction in a method of L, we show that an analogous transition is possible with almost the same program state, except that we alter its stored program counter, and with a memory state zeroing out the locations used by the client Pr′. The justifications for these steps follow by the (⇒) directions of Lemma 1.

Secondly, we compose the client-local transition properties that Pr exhibits in t_cl and the library-local transition properties that L exhibits in t_lib while constructing transitions of Pr[L] ⋊⋉ PSC for a trace t. Knowing that L is safe for Pr, we consider client-local transition properties from t_cl corresponding to transitions we wish to recreate in t, and replace zeroed-out memory locations with locations of L. Dually, we consider library-local transition properties from t_lib corresponding to transitions we wish to recreate in t, and replace zeroed-out memory locations with locations of Pr. The (⇐) directions of Lemma 1 justify such transformations. For instance, non-SF-transitions can be composed, provided that the client program preserves the library memory state, and vice versa; while crashes and SF-transitions record an interaction between a client program and a library and therefore need to be performed in synchrony.

We use these two ideas in proving Lemma 2 by induction on the sum of the lengths of t_cl and t_lib, and use their local transition properties to justify composing them in synchrony. For the base case, we can simply take t = ε. For the induction step, we consider the last labels in t_cl and t_lib, as well as the cases when one of the traces is empty. When t_cl = _ • α_cl and t_lib = _ • α_lib, we use t′ from the induction hypothesis for t_cl and t_lib with the last action removed from either or both of them, and let t = t′ • α_cl or t = t′ • α_lib.
Then, the abstraction theorem is proved as follows.
Proof outline for Thm.
⊓⊔

The following corollary of Thm. 1 states that, like classical linearizability, our correctness condition is compositional (a.k.a. local), meaning that a library consisting of several (non-interacting) libraries can be abstracted by considering each sub-library separately. Formally, the composition of libraries L_1, ..., L_n with pairwise disjoint sets of declared methods, denoted by L_1 ⊎ ... ⊎ L_n, is defined to be the library obtained by taking the union of L_1, ..., L_n. Compositionality is formulated as follows.

To end this section, we provide a simple lemma that allows one to establish L ⊑_MGC L# by applying standard simulation arguments for crashless traces (with observable transitions being those that induce history labels). To this end, we require a simulation relation on non-volatile memories generated by MGC[L] ⋊⋉ PSC and MGC[L#] ⋊⋉ PSC that holds for the very initial memory and is preserved during crashless executions.

An Application: Persistent Pairs
We illustrate the use of the library abstraction theorem for a simple concurrent and persistent data structure: a pair of values that supports write and read operations. We present two specifications and an implementation for each specification. Both specifications ensure atomicity (i.e., linearizability if the system does not crash), and "data consistency" (reads return values written by a single write invocation), but they differ in their persistency guarantees. For the concurrency aspect, the implementations follow the sequence lock (seqlock, for short) mechanism, which uses a version counter along with the pair and allows readers to avoid blocking [6]. For durability, the implementations employ different techniques: one uses a "redo log" and the other is based on "checkpoints".

A durable pair. The first specification, a library we denote by L#_pair, consists of three methods: write for writing the two values of the pair, read for reading the pair, and recover for recovering from a crash. The specification is as follows:

A volatile lock (l) is used to ensure atomicity. For durability, writes use persistence blocks, which ensure that the two parts of the pair persist simultaneously. After the block is ended, fl(ẋ1) (equivalent here to fl(ẋ2) due to the persistence block) ensures that the block persists. If the system crashes after a write completed, the written values are guaranteed to survive the crash. Thus, there is nothing to be done at recovery. Nevertheless, aiming to allow implementations, the library policy requires that recovery is executed after every crash before other methods are invoked (MGC_rec in §5).
Next, we present an implementation of L#_pair, which we denote by L_pair. We write x := y instead of a read of y (to some fresh register) followed by a write to x. We also omit some necessary register bookkeeping: since histories record the whole register store in call/return labels, strictly speaking, implementations must unroll changes to registers not used to pass return values.

  goto END;
  ẋ1 := ẋnew_1; fo(ẋ1);
  ẋ2 := ẋnew_2; fo(ẋ2);
  sfence;
  END: ṡ := 0; return;

Ignoring crashes, atomicity is guaranteed here using a seqlock. As for persistency, observe first that writing directly to the NVM is wrong since we cannot control the non-deterministic propagation: if a crash occurs during the execution of write, it is possible that only one part of the pair has persisted, and the recovery method will not have sufficient information for reinitializing the pair correctly. Instead, write first records its "job" in ẋnew_1, ẋnew_2. Then, if a crash happens and the write was in the middle of updating ẋ1, ẋ2 (as identified via observing an odd version number), the recovery will complete the job of the writer. We note that the (rather extensive) use of flushes (or flush-optimals followed by an sfence) is necessary here in order to restrict the out-of-order persistence. The final write to ṡ in write does not have to be explicitly persisted. Indeed, if a crash happens between this write and its persistence, recovery will redo the (idempotent) job.
• If ṁ(ṡ) is odd, then ṁ(ẋnew_1) = ṁ#(ẋ1) and ṁ(ẋnew_2) = ṁ#(ẋ2).

Using the abstraction theorem, we obtain that for a program Pr that uses L_pair correctly (i.e., calls recovery first after every crash), for every state ⟨q, M⟩ that is reachable in Pr[L_pair] ⋊⋉ PSC, there exists a state ⟨q#, M#⟩ reachable in Pr[L#_pair] ⋊⋉ PSC and indistinguishable from ⟨q, M⟩ from the client perspective.

A buffered durable pair. A second specification, denoted by L#_bpair, allows for "buffered" behaviors, which enable faster implementations by weakening persistency guarantees [24]. Instead of requiring operations to persist before returning, it only requires that operations are "persistently ordered" before returning. Compared to L#_pair, the explicit flush instruction fl(ẋ1) from the write method is omitted, which means that a crash after a completed write may take the pair back to its state before the write. Thus, the state after a crash need not be fully up-to-date. An additional method, called sync, can be used to ensure that previous writes have persisted. Without sync, an implementation could simply ignore persistency and store the pair in the volatile memory, which corresponds to an execution of L#_bpair in which the persistence buffers are never flushed. An implementation can be obtained as follows:

This implementation exploits the freedom allowed by the specification. Writes and reads again employ a seqlock, but this time they only use volatile variables. In turn, sync sets a "checkpoint", and recovery rolls the state back to the latest complete checkpoint. To this end, a non-volatile flag ḟ is used to detect crashes while setting the checkpoint ẋnext. Upon recovery, given the value of the flag, we know whether we can restore the state from the currently stored checkpoint, or, if a crash happened during the store of this checkpoint (which means that sync did not return), whether we should set the pair to the previously stored one.

Related and Future Work
Library abstraction theorems. Previous work has developed library abstraction theorems for crashless shared memory concurrency. First, [13] formalized the intuition that standard linearizability as defined in [21] corresponds to contextual refinement (and also proved a completeness result: the converse also holds provided that threads have other means of interaction besides the library). Later, [7] refined and formulated this result using history inclusion instead of linearizability, which is closer to our formalization. Other abstraction results account for liveness [16], resource-transferring programs [17], and x86-TSO [8]. Our composition lemma (Lemma 2) is inspired by [8], which addresses a challenge that is close to the challenge posed by store fence instructions in NVM, where actions of the client and the library affect each other even if they access distinct locations. To do so, the notion of a history is extended to expose events that correspond to the flushing of certain entries from the x86-TSO store buffers, which is close to what we do to handle store fences. Our alternative approach to this problem, i.e., introducing a relaxed version of the store fence, is novel. While our framework is operational, library abstraction was also studied before for declarative shared memory concurrency semantics, particularly in the context of the C11 weak memory model [5,28].

Linearizability notions for persistent objects. Different approaches for adapting the standard linearizability criterion that is based on crash-free sequential specifications [21] were proposed before [3,19,24], but were not formally related to contextual refinement. Since methods like recover and sync (see §7) are meaningless in crash-free sequential specifications, they require an ad-hoc external treatment in these linearizability adaptations. The variety of approaches to interpreting crash-free sequential specifications for crash-resilient concurrent objects makes it hard, in particular, to combine libraries with different linearizability guarantees in a single program.
In turn, these existing notions are typically expressible in the refinement framework that we employ. For example, in the crashless setting, by wrapping each method of a sequential implementation S of some object inside a global lock, one obtains an abstract library L#_S for that object that corresponds to the conditions imposed by standard linearizability [7] (a library L is linearizable w.r.t. S iff every crashless history induced by a trace of MGC[L] is also induced by some trace of MGC[L#_S]). Now, when crashes are involved, by wrapping each method of S inside a global lock and a persistence block followed by an explicit flush instruction (like L#_pair in §7), one obtains an abstract library L#_S that corresponds to the conditions imposed by strict linearizability of [3] (L is strictly linearizable w.r.t. S iff L ⊑_MGC L#_S). Thus, our results can be used to derive contextual refinement (using L#_S as a specification) from strictly linearizable objects. We note that while the original definition of strict linearizability was for a model with per-processor failure, what we consider here is its application to full system crashes.

Durable linearizability [24] weakens strict linearizability by allowing methods that were active during a crash to take their effect at any later point in the execution (or never), instead of requiring that the effect of such methods is visible immediately after the crash (or never). This weakening aims to allow lazy recovery for large structures, where either the recovery procedure is executed in parallel to other methods after a crash, or the methods themselves participate in recovering the data structure when they are further executed. This notion is also expressible as an abstract implementation in our language. To this end, every update method in the specification would: first record its task in a work-set; remove the task from the work-set; flush the updated work-set; and perform the task like in L#_S described above. In turn, every query method may choose to complete any task it finds in the work-set, since the method performing such a task has crashed during its invocation. For persistent pairs (see §7), this is illustrated by the specification below. The non-volatile variable ẇ is the multiset holding the work-set with atomic add and remove operations, and lrw is an abstract multiple-readers-single-writer lock used to resolve races on the work-set.

A "buffered" version of strict linearizability, which only requires the existence of a prefix of the completed invocations to be observed after a crash, is also naturally derived by considering L#_Sb, which is obtained from a sequential implementation S by wrapping each method of S inside a global lock and a persistence block (without an explicit flush instruction) and ensuring that there is a single non-volatile variable that is written to by all library methods (introducing such a variable if needed) [footnote 6].

An alternative operational characterization of durable linearizability using Input/Output automata was developed in [12] and used to formally establish this property for the persistent queue of [14] by providing a full-blown simulation proof using the KIV proof assistant [footnote 7]. Nevertheless, this work does not relate the proved correctness criterion to contextual refinement.
Persistency models. The underlying model we assume is PSC by [25], a strengthening of Px86 [30] that formalizes the Intel-x86 persistency. The paper [25] provided compiler mappings that ensure PSC semantics on machines guaranteeing Px86 semantics. We extended the general semantic framework with libraries, and extended PSC with local store fences and persistence blocks.

Future work. Future work includes extending our proof method and results to weaker persistency models, such as persistent x86-TSO [30] and ARM [10]; handling random access shared memory with allocations and deallocations (instead of the simplified shared variables model we employ); and lifting the strict condition that libraries and clients live in disjoint address spaces by allowing them to transfer ownership of certain locations (as was done in [17] for standard volatile memory).

Footnote 6. Since the corresponding "buffered" correctness notion is not compositional, while the refinement-based notion is (see Corollary 1), one cannot expect to have a per-object translation of a sequential implementation S into a concurrent and persistent implementation L#_Sb. Indeed, the addition of a single non-volatile variable that is written to by all library methods is not a per-object translation (i.e., for two sequential library implementations implementing disjoint sets of methods and operating on disjoint variables, S1 and S2, we will not have ).

Footnote 7. See https://kiv.isse.de/projects/Durable-Queue.html.

In addition, extending and adapting methods for refinement verification under volatile memory is needed in order to provide library developers with means to validate our library-correctness conditions. Such methods may include automated checking by approximation [7], layered interactive verification in the style of [20,27], and formal logics such as the one in [26]. Similarly, developing formal methods and tools that allow using library specifications for client reasoning is left for future work, including decidable reachability analysis [2], program logics [29], and principled testing [15]. Finally, it is interesting to see how logical atomicity notions established by program logics, such as [11,31], which have been extended to cover crashes in disk-based storage systems [9], can be adapted for establishing our correctness condition and/or for client reasoning.

B Proofs
The following propositions are used in the proofs below. They all easily follow from our definitions.
The following properties all assume a library L that is safe for a program Pr.

Proposition 5. If q −⟨τ, l_ε⟩→_{Pr[L]} q′ and q(τ).f ∈ dom(L), then q −⟨τ, l_ε⟩→_{Pr} q′.

Proposition 6. For every state ⟨q, M⟩ reachable in Pr[L] ⋊⋉ PSC, we have that both Var(L) ∩ NVVar and Var(Pr \ dom(L)) ∩ NVVar separate M.

Proposition 7. The following hold whenever q τ,l

The following propositions easily follow from the definitions in §4. Under the conditions of Def. 8, we always have the following properties:

Lemma 2 (Composition). Let L and L′ be libraries implementing the same set F of methods such that both are safe for a program Pr, and L is also safe for a program Pr′. Suppose that q_Init, M_Init and H_F(t_cl) = H_F(t_lib). Then, there exists a trace t such that H(t) = H(t_cl) and ⟨q_Init, M_Init⟩ −t→_{Pr[L]⋊⋉PSC} ⟨q, M⟩, for:
• q = λτ. ⟨q_lib(τ).pc, q_lib(τ).φ, q_cl(τ).pc_s, q_cl(τ).f⟩ if q_cl(τ).f ∈ F, and q_cl(τ) otherwise

is non-empty and ends with a label α_cl that does not contribute to H_F(t_cl);
• III) both t_cl and t_lib are non-empty and end with labels α_cl and α_lib contributing to the histories H_F(t_cl) and H_F(t_lib), i.e., one of the following holds:

It is easy to see that these three cases exhaust all possibilities for t_lib and t_cl. For instance, suppose that t_lib is non-empty, but ends with a label corresponding to a history label. Let t_lib = _ • α_lib and H_F(t_lib) = _ • H_F(α_lib). By the lemma's premise, H_F(t_cl) = H_F(t_lib). Therefore, it must be that cl is non-empty, such a possibility is already covered by Case II, and when t′_cl is empty, such a possibility is already covered by Case III.

Case I. Suppose that t_lib is non-empty and ends with a label α_lib not corresponding to a history label. Let t_lib = t′_lib • α_lib, and consider any state ⟨q′_lib, M′_lib⟩ for which there are the following transitions:

In the following, we distinguish various cases for α_lib in order to construct t.

Since the transition is program-internal, ⟨q′, M′⟩ −⟨τ, RET(f, φ)⟩→_{Pr[L]⋊⋉PSC} ⟨q, M⟩. By the induction hypothesis Compose(t′_cl, t_lib), for ⟨q′_cl, M′_cl⟩, ⟨q_lib, M_lib⟩ there exists t′ such that H(t′) = H(t′_cl) and q_Init, M_Init ), and we have shown that:

To conclude the proof for this case, we let t = t′ • ⟨τ, RET(f, φ)⟩.

Case III. Suppose both t_cl and t_lib are non-empty and end with a label corresponding to a history label. Let t_cl = t′_cl • α_cl and t_lib = t′_lib • α_lib. By the premise of the induction step, H_F(t_cl) = H_F(t_lib) holds; hence, H_F(α_cl) = H_F(α_lib), and we refer to that history action label as α. Let ⟨q′_lib, M′_lib⟩ and ⟨q′_cl, M′_cl⟩ be any states for which there are the following transitions:

In the following, we consider different combinations of α_cl and α_lib in order to construct t.

For all i, L
Proof. We prove the claim by induction on $n$. For $n = 1$, the claim holds trivially.
For the induction step, let $L_1, \ldots, L_n, L^\#_1, \ldots, L^\#_n$ be libraries, and let $MGC$ be a program satisfying the required conditions. For $MGC' = MGC[L^\#_n]$, we have that $Var(L_1), \ldots, Var(L_{n-1}), Var(L^\#_1), \ldots, Var(L^\#_{n-1}), Var(MGC' \setminus dom(L_1 \uplus \ldots \uplus L_{n-1}))$

the step in which the implementation persists 0 for $\dot{f}$. (Note that this means that we may need to exclude the $fl(\dot{x}_1)$-step from the specification trace, and we can do so since the invocation did not complete.) This construction ensures that $R$ holds for the non-volatile memories at the end of the trace. To show this, one shows that $R$ is in fact an invariant of this construction that holds whenever the lock is not held ($M(\dot{f}) = 0$).

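• For a library $L$, $Var(L) \overset{\text{def}}{=} \bigcup_{f \in dom(L)} Var(L(f))$.
• For a program $Pr$ and a set $F \subseteq \mathsf{F}$, $Var(Pr \setminus F) \overset{\text{def}}{=} \bigcup_{\tau \in Tid} Var(Pr(\tau)(main)) \cup \bigcup_{f \in \mathsf{F} \setminus F} Var(Pr(f))$.

Then, client-library composition is defined as follows.

Definition 9. A library $L$ is safe for a program $Pr$ if $Var(L) \cap Var(Pr \setminus dom(L)) = \emptyset$. When $L$ is safe for $Pr$, we write $Pr[L]$ for the program obtained from $Pr$ by setting $Pr(\tau)(f) = L(f)$ for every $\tau \in Tid$ and $f \in dom(L)$.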
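To make the safety condition and the composition Pr[L] concrete, the following is a minimal Python sketch, not the paper's formalization. It assumes that a method body can be abstracted to the set of variable names it mentions, and that a program maps each thread to its own table of bodies (including "main"). The names `var_lib`, `var_prog_without`, `is_safe`, and `compose` are hypothetical helpers introduced only for this illustration.

```python
from typing import Dict, Set

# Assumption: a body is abstracted to the set of variables it mentions.
Body = Set[str]
Library = Dict[str, Body]             # method name -> body
Program = Dict[str, Dict[str, Body]]  # thread id -> ("main" or method name) -> body


def var_lib(lib: Library) -> Set[str]:
    """Var(L): all variables mentioned by any method of the library."""
    return set().union(*lib.values())


def var_prog_without(prog: Program, methods: Set[str]) -> Set[str]:
    """Var(Pr \\ F): variables of every main body and of every method not in F."""
    result: Set[str] = set()
    for bodies in prog.values():
        for name, body in bodies.items():
            if name == "main" or name not in methods:
                result |= body
    return result


def is_safe(lib: Library, prog: Program) -> bool:
    """L is safe for Pr if Var(L) and Var(Pr \\ dom(L)) are disjoint."""
    return var_lib(lib).isdisjoint(var_prog_without(prog, set(lib)))


def compose(prog: Program, lib: Library) -> Program:
    """Pr[L]: replace every library method in every thread by L's body."""
    if not is_safe(lib, prog):
        raise ValueError("library is not safe for this program")
    return {tid: {**bodies, **lib} for tid, bodies in prog.items()}


# Hypothetical client with two threads and a placeholder body for "enq".
client: Program = {
    "t1": {"main": {"a"}, "enq": {"x"}},
    "t2": {"main": {"b"}, "enq": {"x"}},
}
queue_lib: Library = {"enq": {"q", "lock"}}
leaky_lib: Library = {"enq": {"a"}}  # touches the client variable "a"

composed = compose(client, queue_lib)
```

In this sketch the placeholder body of `enq` in the client does not count towards the client's variables, because `enq` belongs to dom(L); this is why `queue_lib` is safe while `leaky_lib`, which shares the variable `a` with a main body, is not.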

1 .
It suffices to show $H(Pr[L]) \subseteq H(Pr[L^\#])$; then the claim follows using Lemma 2 by letting $L := L^\#$, $L' := L$, $Pr := Pr$, and $Pr' := Pr$. Suppose otherwise, and let $h$ be a shortest history in $H(Pr[L]) \setminus H(Pr[L^\#])$. Let $t$ be a shortest trace in $traces(Pr[L] \bowtie PSC)$ with $H(t) = h$. Consider the last transition label $\alpha$ in $t$. The minimality of $h$ and $t$ ensures that $\alpha$ must be a return transition label for some $f \in dom(L)$. Indeed, otherwise, we can show that $\alpha$ is enabled at the end of a corresponding trace of $Pr[L^\#] \bowtie PSC$, which contradicts the fact that $h \notin H(Pr[L^\#])$. (The full argument here requires applying Lemma 2 with $L := L^\#$, $L' := L$, $Pr := Pr$, and $Pr' := Pr$.) Now, using the fact that $Pr$ correctly calls $L^\#$ w.r.t. $MGC$, we again apply Lemma 2 with $L := L$, $L' := L^\#$, $Pr := MGC$, and $Pr' := Pr$, and derive that $\alpha$ is enabled at the end of a corresponding trace of $MGC[L] \bowtie PSC$. Then,