The Decidability of Verification under PS 2.0

We consider the reachability problem for finite-state multi-threaded programs under the promising semantics (PS 2.0) of Lee et al., which captures most common program transformations. Since reachability is already known to be undecidable in the fragment of PS 2.0 with only release-acquire accesses (PS 2.0-ra), we consider the fragment with only relaxed accesses and promises (PS 2.0-rlx). We show that reachability under PS 2.0-rlx is undecidable in general and that it becomes decidable, albeit non-primitive recursive, if we bound the number of promises. Given these results, we consider a bounded version of the reachability problem. To this end, we bound both the number of promises and of “view-switches”, i.e., the number of times the processes may switch their local views of the global memory. We provide a code-to-code translation from an input program under PS 2.0 (with relaxed and release-acquire memory accesses along with promises) to a program under SC, thereby reducing the bounded reachability problem under PS 2.0 to the bounded context-switching problem under SC. We have implemented a tool and tested it on a set of benchmarks, demonstrating that typical bugs in programs can be found with a small bound.


Introduction
An important long-standing open problem in PL research has been to define a weak memory model that captures the semantics of concurrent memory accesses in languages like Java and C/C++. A model is considered good if it can be implemented efficiently (i.e., if it supports all usual compiler optimizations and its accesses are compiled to plain x86/ARM/Power/RISCV accesses), and is easy to reason about. To address this problem, Kang et al. [16] introduced the promising semantics. This was the first model that supported basic invariant reasoning, the DRF guarantee, and even a non-trivial program logic [30].
In the promising semantics, the memory is modeled as a set of timestamped messages, each corresponding to a write made by the program. Each process/thread records its own view of the memory, i.e., the latest timestamp for each memory location that it is aware of. A message has the form (x, v, (f, t], V) where x is a location, v a value to be stored for x, (f, t] is the timestamp interval corresponding to the write, and V is the local view of the process that made the write to x. When reading from memory, a process can either return the value stored at the timestamp in its view or advance its view to some larger timestamp and read from that message. When a process p writes to memory location x, a new message with a timestamp larger than p's view of x is created, and p's view is advanced to include the new message. In addition, in order to allow load-store reorderings, a process is allowed to promise a certain write in the future. A promise is also added as a message in the memory, except that the local view of the process is not updated using the timestamp interval in the message; this happens only when the promise is eventually fulfilled. A consistency check is used to ensure that every promised message can be certified (i.e., made fulfillable) by executing that process on its own. Furthermore, this should hold from any future memory, i.e., from any extension of the memory with additional messages. This quantification prevents deadlocks (i.e., processes making promises they are not able to fulfil). However, the unbounded number of future memories that need to be checked makes the verification of even simple programs practically infeasible. Moreover, a number of transformations based on global value range analysis, as well as register promotion, were not supported in [16].
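To make the message/view machinery concrete, the following is a minimal Python sketch (our own illustration, not the authors' formalization) of a relaxed read: the memory is a list of messages with timestamp intervals, and a process may only read a message whose to timestamp is not below its view of that location, advancing its view accordingly.

```python
from dataclasses import dataclass

@dataclass
class Message:
    loc: str    # location x
    val: int    # value v
    frm: float  # left end f of the timestamp interval (f, t]
    to: float   # right end t of the interval
    view: dict  # view attached by the writing process (used by ra reads)

def readable(memory, view, x):
    """Messages on x that a process with view `view` may read:
    those whose `to` timestamp is not below the process view of x."""
    return [m for m in memory if m.loc == x and m.to >= view.get(x, 0.0)]

def do_read(view, m):
    """Relaxed read: advance the process view of m.loc to m.to."""
    new_view = dict(view)
    new_view[m.loc] = max(new_view.get(m.loc, 0.0), m.to)
    return m.val, new_view

# With view {x: 1.0}, the initial message (timestamp 0) is no longer
# readable, but the later message at timestamp 2.0 is.
memory = [Message("x", 0, 0.0, 0.0, {}), Message("x", 7, 1.5, 2.0, {})]
vals = [m.val for m in readable(memory, {"x": 1.0}, "x")]
```

The acquire-read case additionally merges the message's attached view into the reader's view; the sketch shows only the relaxed case.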
To address these concerns, Lee et al. developed a new version of the promising semantics, PS 2.0 [22]. PS 2.0 simplifies the consistency check: instead of checking promise fulfilment from all future memories, PS 2.0 checks for promise fulfilment only from a specially crafted extension of the current memory called the capped memory. PS 2.0 also introduces the notion of reservations, which allows a process to secure a timestamp interval in order to perform a future atomic read-modify-write instruction. The reservation blocks any other message from using that timestamp interval. Because of these changes, PS 2.0 supports register promotion and global value range analysis, while capturing all features (process-local optimizations, DRF guarantees, hardware mappings) of the original promising semantics. Although PS 2.0 can be considered a semantic breakthrough, it is a very complex model: it supports two memory access modes, relaxed (rlx) and release-acquire (ra), along with promises, reservations and certifications.
Let PS 2.0-rlx (resp. PS 2.0-ra) be the fragment of PS 2.0 allowing only relaxed (rlx) (resp. release-acquire (ra)) memory accesses. A natural and fundamental question to investigate is the verification of concurrent programs under PS 2.0. Consider the reachability problem, i.e., whether a given configuration of a concurrent finite-state program is reachable. Reachability with only ra accesses has already been shown to be undecidable [1], even without promises and reservations. That leaves us only the PS 2.0-rlx fragment, which captures the semantics of concurrent 'relaxed' memory accesses in programming languages such as Java and C/C++. We show that if an unbounded number of promises is allowed, the reachability problem under PS 2.0-rlx is undecidable. Undecidability is obtained with an execution with only 2 processes and 3 context switches, where a context is a computation segment in which only one process is active.
Then, we show that reachability under PS 2.0-rlx becomes decidable if we bound the number of promises at any time (however, the total number of promises made within a run can be unbounded). The proof introduces a new memory model based on higher order words, called LoHoW, which we show equivalent to PS 2.0-rlx in terms of reachable states. Under the bounded-promises assumption, we use the decidability of the coverability problem for well structured transition systems (WSTS) [7,13] to show that the reachability problem for LoHoW with a bounded number of promises is decidable. Further, PS 2.0-rlx without promises and reservations already has a non-primitive recursive lower bound. Our decidability result covers the relaxed fragment of the RC11 model [20,16] (which matches the PS 2.0-rlx fragment with no promises). Given the high complexity for PS 2.0-rlx and the undecidability for PS 2.0-ra, we next consider a bounded version of the reachability problem. To this end, we propose a parametric under-approximation in the spirit of context bounding [9,33,21,26,24,29,1,3]. The aim of context bounding is to restrict the otherwise unbounded interaction between processes; it has been shown experimentally, in the case of SC programs, to maintain enough behaviour coverage for bug detection [24,29]. The concept of context bounding has been extended to weak memory models. For instance, for RA, Abdulla et al. [1] proposed view bounding, using the notion of view-switching messages and a translation that keeps track of the causality between different variables. Since PS 2.0 subsumes RA, we propose a bounding notion that extends view bounding.
Using our new bounding notion, we propose a source-to-source translation from programs under PS 2.0 to context-bounded executions of the transformed program under SC. The challenges in our translation differ significantly from those in [1], as we have to provide a procedure that (i) handles the different memory accesses rlx and ra, (ii) guesses promises and reservations non-deterministically, and (iii) verifies that promises are fulfilled using the capped memory.
We have implemented this reduction in a tool, PS2SC. Our experimental results demonstrate the effectiveness of our approach. We exhibit cases where hard-to-find bugs are detectable using a small view-bound. Our tool displays resilience to trivial changes in the position of bugs and the order of processes. Further, in our code-to-code translation, the mechanism for making and certifying promises and reservations is isolated in one module, and can easily be changed to cover different variants of the promising semantics.
For lack of space, detailed proofs can be found in [5].

Preliminaries
In this section, we introduce the notation that will be used throughout.

Notations. Given two natural numbers i and j s.t. i ≤ j, we use [i, j] to denote the set {k | i ≤ k ≤ j}. For a function f : A → B, we use f[a ↦ b] to denote the function f′ s.t. f′(a) = b and f′(a′) = f(a′) for all a′ ≠ a. For a binary relation R, we use [R]* to denote its reflexive and transitive closure. Given an alphabet Σ, we use Σ* (resp. Σ+) to denote the set of possibly empty (resp. non-empty) finite words (also called simple words) over Σ. A higher order word over Σ is an element of (Σ*)* (i.e., a word of words). Let w = a_1 a_2 · · · a_n be a simple word over Σ; we use |w| to denote the length of w. Given an index i in [1, |w|], we use w[i] to denote the i-th letter of w. Given two indices i and j s.t. 1 ≤ i ≤ j ≤ |w|, we use w[i, j] to denote the word a_i a_{i+1} · · · a_j. Sometimes, we view a word as a function from [1, |w|] to Σ.

Program Syntax. The simple programming language we use is described in Figure 1. A program Prog consists of a set Loc of (global) variables or memory locations, and a set P of processes. Each process p declares a set Reg(p) of (local) registers followed by a sequence of labeled instructions. We assume that these sets of registers are disjoint and we use Reg := ∪_p Reg(p) to denote their union. We also assume a (potentially unbounded) data domain Val from which the registers and locations take values. All locations and registers are assumed to be initialized with the special value 0 ∈ Val (if not mentioned otherwise). An instruction i is of the form λ : s where λ is a unique label and s is a statement. We use L_p to denote the set of all labels of the process p, and L = ∪_{p∈P} L_p the set of all labels of all processes. We assume that the execution of the process p always starts with a unique initial instruction labeled by λ_p^init. A write instruction of the form x^o = $r assigns the value of register $r to the location x, where o denotes the access mode. If o = rlx, the write is a relaxed write, while if o = ra, it is a release write. A read instruction $r = x^o reads the value of the location x into the local register $r.
Again, if the access mode o = rlx, it is a relaxed read, and if o = ra, it is an acquire read. Atomic updates or RMW instructions are either compare-and-swap (CAS_{o_r,o_w}) or FADD_{o_r,o_w}. Both have a pair of access modes (o_r, o_w ∈ {rel, acq, rlx}) for accessing the same location: a read followed by a write. Following [22], FADD(x, v) stores the value of x into a register $r and adds v to x, while CAS(x, v_1, v_2) compares an expected value v_1 to the value in x and, if the two values are the same, sets the value of x to v_2. The old value of x is then stored in $r. A local assignment instruction $r = e assigns to the register $r the value of e, where e is an expression over a set of operators and constants as well as the contents of the registers of the current process, but not referring to the set of locations. The fence instruction SC-fence is used to enforce sequential consistency when placed between two memory access operations. For simplicity, we will write assume(x = e) instead of $r = x; assume($r = e). This notation extends in the straightforward manner to conditional statements.

The Promising Semantics
In this section, we recall the promising semantics [22]. We present here PS 2.0 with three memory access modes: relaxed (rlx), release writes (rel) and acquire reads (acq).
Read-modify-writes (RMW) instructions have two access modes -one for read and one for write. We keep aside the release and acquire fences (and subsequent access modes), since they do not affect the results of this paper.
Timestamps. PS 2.0 uses timestamps to maintain a total order over all the writes to the same variable. We assume an infinite set of timestamps Time, densely totally ordered by ≤, with 0 being the minimum element. A view is a timestamp function V : Loc → Time that records the largest known timestamp for each location. Let T be the set containing all the timestamp functions, along with the special symbol ⊥. Let V_init represent the initial view, where all locations are mapped to 0. Given two views V and V′, we write V ≤ V′ when V(x) ≤ V′(x) for every location x. The merge operation V ⊔ V′ between two views V and V′ returns their pointwise maximum, i.e., (V ⊔ V′)(y) is the maximum of V(y) and V′(y). Let I denote the set of all intervals over Time. The timestamp intervals in I have the form (f, t] where either f = t = 0 or f < t, with f, t ∈ Time. Given an interval I = (f, t] ∈ I, I.frm and I.to denote f and t, respectively.
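The order and merge on views can be sketched directly (a small illustration of our own, assuming views are dicts from locations to timestamps, with 0 as the default):

```python
def merge(v1, v2):
    """(V ⊔ V')(y) = max(V(y), V'(y)) for every location y."""
    return {x: max(v1.get(x, 0.0), v2.get(x, 0.0)) for x in set(v1) | set(v2)}

def leq(v1, v2):
    """V ≤ V' iff V(x) ≤ V'(x) for every location x."""
    return all(t <= v2.get(x, 0.0) for x, t in v1.items())

V = {"x": 1.0, "y": 3.0}
W = {"x": 2.0}
joined = merge(V, W)  # pointwise maximum of V and W
```

By construction, `joined` is an upper bound of both arguments under `leq`, which is exactly the role V ⊔ V′ plays in acquire reads.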

Memory.
In PS 2.0, the memory is modelled as a set of concrete messages (which we just call messages) and reservations. Each message represents the effect of a write or a RMW operation, and each reservation is a timestamp interval reserved for future use. In more detail, a message m is a tuple (x, v, (f, t], V) where x ∈ Loc, v ∈ Val, (f, t] ∈ I, and V is a view; a reservation r is a pair (x, (f, t]). Note that a reservation, unlike a message, does not commit to any particular value. We use m.loc (r.loc), m.val, m.to (r.to), m.frm (r.frm) and m.View to denote respectively x, v, t, f and V. Two elements (either messages or reservations) are said to be disjoint (m_1 # m_2) if they concern different variables (m_1.loc ≠ m_2.loc) or their intervals do not overlap (m_1.to ≤ m_2.frm ∨ m_2.to ≤ m_1.frm).

Transition System of a Process. Given a process p ∈ P, a state σ of p is defined by a pair (λ, R) where λ ∈ L is the label of the next instruction to be executed by p and R : Reg → Val maps each register of p to its current value. (Observe that we use the set of all labels L (resp. registers Reg) instead of L_p (resp. Reg(p)) in the definition of σ just for the sake of simplicity.) Transitions between the states of p are of the form (λ, R) −t→ (λ′, R′), where the label t ranges over: rd(o, x, v) for a read instruction that reads the value v from x, wt(o, x, v) for a write instruction that writes the value v to x, U(o_r, o_w, x, v_r, v_w) for a RMW that reads the value v_r from x and writes v_w to it, SC-fence for a SC-fence instruction, and ε for the execution of the other local instructions. Observe that o, o_r, o_w are the access modes, which can be rlx or ra. We use ra for both release and acquire.
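The disjointness predicate m_1 # m_2 can be written out as a one-liner (our own encoding of messages/reservations as dicts, for illustration only):

```python
def disjoint(m1, m2):
    """m1 # m2: different locations, or the intervals (f, t] do not overlap."""
    return (m1["loc"] != m2["loc"]
            or m1["to"] <= m2["frm"]
            or m2["to"] <= m1["frm"])

a = {"loc": "x", "frm": 0.0, "to": 1.0}
b = {"loc": "x", "frm": 1.0, "to": 2.0}  # adjacent to a: still disjoint
c = {"loc": "x", "frm": 0.5, "to": 1.5}  # overlaps a
```

Note that adjacency (a.to = b.frm) counts as disjoint; this is what makes the adjacency-based RMW and reservation mechanisms below possible.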

Machine States.
A machine state MS is a tuple ((J, R), VS, PS, M, G), where J : P → L maps each process p to the label of the next instruction to be executed, R : Reg → Val maps each register to its current value, VS : P → T is the process view map, which maps each process to a view, M is a memory, PS : P → 2^M maps each process to a set of messages (called its promise set), and G ∈ T is the global view (that will be used by SC-fences). We use C to denote the set of all machine states. Given a machine state MS = ((J, R), VS, PS, M, G) and a process p, let MS↓p denote (σ, VS(p), PS(p), M, G), with σ = (J(p), R(p)) (i.e., the projection of the machine state to the process p). We call MS↓p the process configuration. We use C_p to denote the set of all process configurations.
The initial machine state MS_init = ((J_init, R_init), VS_init, PS_init, M_init, G_init) is such that: (1) J_init(p) is the label of the initial instruction of each process p; (2) R_init($r) = 0 for every $r ∈ Reg; (3) for each p, VS_init(p) = V_init is the initial view (that maps each location to the timestamp 0); (4) for each p, the set of promises PS_init(p) is empty; (5) the initial memory M_init contains exactly one initial message (x, 0, (0, 0], V_init) per location x; and (6) the initial global view G_init maps each location to 0.
Transition Relation. We first describe the transition (σ, V, P, M, G) −→_p (σ′, V′, P′, M′, G′) between process configurations in C_p, from which we induce the transition relation between machine states.
Process Relation. The formal definition of −→_p is given in Figure 2. Below, we explain these inference rules; the full set of rules can be found in [5].

Read. A process p can read from M by observing a message m = (x, v, (f, t], K) if V(x) ≤ t (i.e., p must not be aware of a later message for x). In case of a relaxed read rd(rlx, x, v), the process view of x is updated to t, while for an acquire read rd(ra, x, v), the process view is updated to V[x ↦ t] ⊔ K. The global memory M, the set of promises P, and the global view G remain the same.

Write.
A process can add a fresh message to the memory (MEMORY : NEW) or fulfil an outstanding promise (MEMORY : FULFILL). The execution of a write wt(rlx, x, v) results in a message m with location x along with a timestamp interval (−, t]; the process view for x is then updated to t. In the case of a release write wt(ra, x, v), the updated process view is also attached to m, and the rule additionally requires that the process has no outstanding promise on x. (MEMORY : FULFILL) also allows splitting a promise interval or lowering its view before fulfilment.
Update. When a process performs a RMW, it first reads a message m = (x, v, (f, t], K) and then writes an update message whose frm timestamp equals t; that is, a message of the form m′ = (x, v′, (t, t′], K′). This forbids any other write from being placed between m and m′. The access modes of the read and the write of the update follow what has been described for reads and writes above.
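The adjacency constraint of updates amounts to a simple check on the candidate interval: the new message must start exactly at m.to, and its interval must not collide with anything else on the same location. A self-contained sketch under the same dict encoding (our own illustration):

```python
def can_update(memory, m, t_new):
    """An RMW that reads m = (x, v, (f, t], K) writes a message with
    interval (t, t'] on the same location; the slot is usable iff this
    interval is disjoint from every other element of the memory."""
    cand = {"loc": m["loc"], "frm": m["to"], "to": t_new}
    free = all(cand["loc"] != e["loc"]
               or cand["to"] <= e["frm"]
               or e["to"] <= cand["frm"]
               for e in memory if e is not m)
    return t_new > m["to"] and free

mem = [{"loc": "x", "frm": 0.0, "to": 1.0},   # message m
       {"loc": "x", "frm": 2.0, "to": 3.0}]   # a later message
```

Here an update reading `mem[0]` can claim (1.0, 1.5] but not (1.0, 2.5], since the latter overlaps the later message.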

Promise, Reservation and Cancellation.
A process can non-deterministically promise future writes, as long as they are not release writes. This is done by adding a message m to the memory M s.t. m#M, and to the set of promises P. Later, a relaxed write instruction can fulfil an existing promise. Recall that the execution of a release write requires the set of promises to be empty, and thus it cannot be used to fulfil a promise. In the reserve step, the process reserves a timestamp interval to be used for a later RMW instruction reading from a certain message, without fixing the value it will write. A reservation is added both to the memory and to the promise set. The process can drop the reservation from both sets using the cancel step, in a non-deterministic manner.

Consistency. According to Lee et al. [22], there is one final requirement on machine states, called consistency, which roughly states that, from every encountered machine state, all the messages promised by a process p can be certified (i.e., made fulfillable) by executing p on its own from a certain future memory (called the capped memory), i.e., an extension of the memory with additional reservations. Before defining consistency, we need to introduce the capped memory.
A run of Prog is a sequence MS_0 → MS_1 → · · · → MS_n, where MS_0 = MS_init is the initial machine state and MS_1, . . . , MS_n are consistent machine states. Then, MS_0, . . . , MS_n are said to be reachable from MS_init.
Given an instruction label function J : P → L that maps each process p ∈ P to an instruction label in L p , the reachability problem asks whether there exists a machine state of the form ((J, R), V, P, M, G) that is reachable from MS init . A positive answer to this problem means that J is reachable in Prog in PS 2.0.

Undecidability of Consistent Reachability in PS 2.0
The reachability problem is undecidable for PS 2.0 even for finite-state programs. The proof is by a reduction from Post's Correspondence Problem (PCP) [28]. A PCP instance consists of two sequences u_1, . . . , u_n and v_1, . . . , v_n of non-empty words over some alphabet Σ; checking whether there exists a sequence of indices j_1, . . . , j_k s.t. u_{j_1} · · · u_{j_k} = v_{j_1} · · · v_{j_k} is known to be undecidable. Our proof works with the fragment of PS 2.0 having only relaxed (rlx) memory accesses and crucially uses unboundedly many promises to ensure that a process cannot skip any writes made by another process. We construct a concurrent program with two processes p_1 and p_2 over a finite data domain. The code of p_1 is split into two modes, a generation mode and a validation mode, by an if statement and its else branch. The if branch is entered when the value of a boolean location validate is 0 (its initial value). We show that reaching the annotated instructions in p_1 and p_2 is possible iff the PCP instance has a solution. We give below an overview of the execution steps leading to the annotated instructions. Our undecidability result is also tight, in the sense that the reachability problem becomes decidable when we restrict ourselves to machine states where the number of promises is bounded. Further, our proof is robust: it goes through for PS 1.0 [16] as well. Let us call the fragment of PS 2.0 with only rlx memory accesses PS 2.0-rlx.

Theorem 1. The reachability problem for concurrent programs over a finite data domain is undecidable under PS 2.0-rlx.

Decidable Fragments of PS 2.0
Since keeping ra memory accesses renders the reachability problem undecidable [1], and so does allowing unboundedly many promises with rlx memory accesses (Theorem 1), we address in this section the decidability problem for PS 2.0-rlx with a bounded number of promises in any reachable configuration. Bounding the number of promises in any reachable machine state does not imply that the total number of promises made during a run is bounded. Let bdPS 2.0-rlx represent the restriction of PS 2.0-rlx to boundedly many promises, where the number of promises in each reachable machine state is at most a given constant. Notice that the fragment bdPS 2.0-rlx subsumes the relaxed fragment of the RC11 model [20,16]. We assume here a finite data domain.
To establish the decidability of reachability in bdPS 2.0-rlx, we introduce an alternative memory model for concurrent programs called LoHoW (for "lossy higher order words"). We present the operational semantics of LoHoW and show that (1) PS 2.0-rlx is reachability-equivalent to LoHoW, and (2) under the bounded-promises assumption, reachability is decidable in LoHoW (and hence in bdPS 2.0-rlx).
Introduction to LoHoW. Given a concurrent program Prog, a state of LoHoW maintains a collection of higher order words, one per location of Prog, along with the states of all processes. The higher order word HW_x corresponding to the location x is a word of simple words, representing the sub-memory M(x) in PS 2.0-rlx. Each simple word in HW_x is an ordered sequence of "memory types", that is, messages or promises in M(x), maintained in the order of their to timestamps in the memory. The word order between memory types in HW_x represents the order induced by timestamps between the corresponding elements of M(x).
The key information to encode in each memory type of HW_x is: (1) whether it is a message (msg) or a promise (prm) in M(x), (2) the process (p) which added it to M(x) and the value (val) it holds, (3) the set S (called the pointer set) of processes that have seen this memory type in M(x), and (4) whether the time interval adjacent to the right of this memory type in M(x) has been reserved by some process.
Memory Types. To keep track of (1-4) above, a memory type is an element of Σ ∪ Γ with Σ = {msg, prm} × Val × P × 2^P (for 1-3) and Γ = {msg, prm} × Val × P × 2^P × P (for 4). We write a memory type as (r, v, p, S, ?). Here r represents either a msg (message) or a prm (promise) in M(x), v is the value, p is the process that added the message/promise, and S is the pointer set of processes whose local view (on x) agrees with the to timestamp of the message/promise. If the type is in Γ, the fifth component is the id of the process that has reserved the time slot right-adjacent to the message/promise; ? is a wildcard that may (or may not) be matched.

Simple Words. A simple word is an element of Σ*#(Σ ∪ Γ), and each HW_x is a word in (Σ*#(Σ ∪ Γ))+. Here # is a special symbol not in Σ ∪ Γ, which separates the last symbol from the rest of the simple word. Consecutive symbols of Σ in a simple word of HW_x represent adjacent messages/promises in M(x) and are hence unavailable for a RMW. # does not correspond to any element of the memory and is only used to demarcate the last symbol of the simple word.

Higher order words. A higher order word is a sequence of simple words. Figure 3 depicts a higher order word with four simple words. We use a left-to-right order in both simple words and higher order words, and we extend in the straightforward manner the classical word indexation strategy to higher order words. For example, the symbol at the third position of the higher order word HW in Figure 3 is HW[3] = (msg, 2, p, {p, q}). A higher order word HW is well-formed iff for every p ∈ P, there is a unique position i in HW having p in its pointer set. The higher order word given in Figure 3 is well-formed. We will use ptr(p, HW) to denote the unique position i in HW having p in its pointer set. We assume that all the manipulated higher order words are well-formed. Each higher order word HW_x represents the entire space [0, ∞) of available timestamps in M(x).
Each simple word in HW x represents a timestamp interval (f, t], while consecutive simple words represent disjoint timestamp intervals (while preserving order). The memory types constituting each simple word take up adjacent timestamp intervals, spanning the timestamp interval of the simple word. The adjacency of timestamp intervals within simple words is used in RMW steps and reservations. The last symbol in a simple word denotes a message/promise which, (1) if in Σ, is available for a RMW, while (2) if in Γ , is unavailable for RMW since it is followed by a reservation. Symbols at positions other than the rightmost in a simple word, represent messages/promises which are not available for RMW. Figure 4 presents a mapping from a memory of PS 2.0-rlx to a collection of higher order words (one per location) in LoHoW.
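As an illustration (our own encoding, not the formal one), a higher order word can be modeled as a list of simple words, each a list of memory-type tuples (kind, value, process, pointer set); the last element of each simple word plays the role of the #-marked symbol. Well-formedness then says that each process occurs in exactly one pointer set:

```python
def flatten(hw):
    """Left-to-right sequence of all memory types in the higher order word."""
    return [mt for word in hw for mt in word]

def well_formed(hw, procs):
    """Each process p lies in the pointer set (index 3) of exactly one position."""
    return all(sum(p in mt[3] for mt in flatten(hw)) == 1 for p in procs)

def ptr(p, hw):
    """1-based position of the unique memory type whose pointer set contains p."""
    for i, mt in enumerate(flatten(hw), start=1):
        if p in mt[3]:
            return i
    raise ValueError(f"no pointer for {p}")

# Two simple words; q points at position 2, p at position 3.
hw = [[("msg", 0, "p1", set()), ("msg", 1, "q", {"q"})],
      [("msg", 2, "p", {"p"})]]
```

The `ptr` function mirrors ptr(p, HW) from the text; positions to its right are the ones a read or write by p may target.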
Initializing higher order words. For each location x ∈ Loc, the initial higher order word HW_x^init is defined as #(msg, 0, p_1, P), where P is the set of all processes and p_1 is some process in P. The set of all higher order words HW_x^init for all locations x represents the initial memory of PS 2.0-rlx, where all locations have value 0 and all processes are aware of the initial message.
Simulating PS 2.0 Memory Operations in LoHoW. In the following, we describe how PS 2.0-rlx instructions are handled in LoHoW. Since we only have the rlx mode, we denote reads, writes and RMWs as rd(x, v), wt(x, v) and U(x, v_r, v_w), dropping the modes.

Reads. To simulate a rd(x, v) by a process p in LoHoW, we need an index j ≥ ptr(p, HW_x) in HW_x such that HW_x[j] is a memory type with value v, of the form (−, v, −, S, ?) (? denotes that the type is either from Σ or Γ). The read is simulated by adding p to the set S and removing it from its previous pointer set.

Writes. A wt(x, v) by p (writing v to x) is simulated by adding a new msg type in HW_x with a timestamp higher than the view of p for x: either (1) add the simple word (msg, v, p, {p}) to the right of ptr(p, HW_x), or (2) if there is α ∈ Σ such that the word w#α is in HW_x to the right of ptr(p, HW_x), modify w#α to obtain wα#(msg, v, p, {p}). In both cases, remove p from its previous pointer set.

Reservations. In PS 2.0-rlx, a process p makes a reservation by adding the pair (x, (f, t]) to the memory, given that there is a message/promise in the memory with timestamp interval (−, f]. In LoHoW, this is captured by "tagging" the rightmost memory type (message/promise) in a simple word with the name of the process that makes the reservation. This requires the memory types from Γ = {msg, prm} × Val × P × 2^P × P, whose last component stores the process which made the reservation. Such a memory type always appears at the end of a simple word, and represents the fact that the next timestamp interval adjacent to it has been reserved. Observe that nothing can be added to the right of a memory type of the form (msg, v, p, S, q).

Certification. Memory is altered in PS 2.0-rlx during the certification phase to check for promise fulfilment, and at the end of the certification phase we resume from the memory which was there before. To capture this in LoHoW, we work on a duplicate of (HW_x)_{x∈Loc} in the certification phase.
Notice that the duplication allows losing, non-deterministically, empty memory types (memory types whose pointer set is empty) as well as redundant simple words (simple words consisting entirely of empty memory types). This copy of HW_x is then modified during certification and discarded once the certification phase finishes.
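The "lossy" aspect can be sketched as follows (our own illustration; here the loss is applied maximally, whereas the semantics loses non-deterministically and respects further constraints on promises and reservations):

```python
def lose_all(hw):
    """Drop every memory type whose pointer set (index 3) is empty, then
    drop the simple words left empty (the redundant simple words)."""
    trimmed = [[mt for mt in word if mt[3]] for word in hw]
    return [word for word in trimmed if word]

# The first simple word consists only of an empty memory type, so it is
# redundant; the second keeps only its pointed-at type.
hw = [[("msg", 0, "p1", set())],
      [("msg", 1, "q", set()), ("msg", 2, "p", {"p", "q"})]]
```

Losing unpointed types is what makes the model a well-structured (lossy) system, which is the key to the decidability argument below.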

Formal Model of LoHoW
In the following, we formally define LoHoW and state the equivalence of the reachability problem in PS 2.0-rlx and LoHoW. For a memory type m = (r, v, p, S) (or m = (r, v, p, S, q)), we use m.value to denote v. For a memory type m = (r, v, p, S, ?) and a process p′ ∈ P, we define add(m, p′) ≡ (r, v, p, S ∪ {p′}, ?) and del(m, p′) ≡ (r, v, p, S \ {p′}, ?). This corresponds to the addition/deletion of the process p′ to/from the set of pointers of m. Extending the above notation, given a higher order word HW, a position i ∈ {1, . . . , |HW|}, and p ∈ P, we define add(HW, p, i) ≡ HW[i ↦ add(HW[i], p)], del(HW, p) ≡ HW[ptr(p, HW) ↦ del(HW[ptr(p, HW)], p)], and mov(HW, p, i) ≡ add(del(HW, p), p, i). This corresponds to the addition/deletion/relocation of the pointer p to/from/within HW.

Insertion into higher order words. A higher order word HW can be extended at position 1 ≤ j ≤ |HW| with a memory type m = (r, v, p, {p}). For instance, insertion as a new simple word is defined only if HW[j − 1] = # (i.e., position j is the end of a simple word); letting HW′ = del(HW, p) (i.e., removing p from its previous set of pointers), the insertion of m results in the higher order word HW′[1, j] · #m · HW′[j + 1, |HW′|].

Making/Canceling a reservation. A higher order word HW can also be modified by p by making/cancelling a reservation at a position 1 ≤ j ≤ |HW|. We define the operations Make(HW, p, j) and Cancel(HW, p, j) that respectively reserve and cancel a time slot at j.

In the operational semantics of LoHoW, a read is handled by reading a value from a memory type which is on the right of the current pointer of p. A write operation, in the standard phase, can result in the insertion, on the right of the current pointer of p, of a new memory type at the end of a simple word or as a new simple word. The memory type resulting from a write in the certification phase is only allowed to be inserted at the end of the higher order word or at the reserved slots (using the rule for splitting a reservation). A write can also be used to fulfil a promise or to split a promise (i.e., partial fulfilment) during both phases.
Making/canceling a reservation results in tagging/untagging a memory type at the end of a simple word on the right of the current pointer of p. The case of a RMW is similar to a read followed by a write operation (whose resulting memory type must be inserted to the right of the read memory type). Finally, a promise can only be made during the standard phase, and the resulting memory type is inserted at the end of a simple word, or as a new simple word, on the right of the current pointer of p.

Two-phases LoHoW states. A two-phases state of LoHoW is S = (π, p, st_std, st_cert), where π ∈ {std, cert} is a flag describing whether the LoHoW is in the "standard" or the "certification" phase, p is the process which evolves in one of these phases, and st_std, st_cert are two LoHoW states (one for each phase). When the LoHoW is in the standard phase, st_std evolves, and when it is in the certification phase, st_cert evolves. A two-phases LoHoW state is said to be initial if it is of the form (std, p, st_init, st_init), where p ∈ P is any process. The transition relation → between two-phases LoHoW states relates S = (π, p, st_std, st_cert) and S′ = (π′, p′, st′_std, st′_cert); the cases defining S → S′ can be found in [5]. Given an instruction label function J : P → L, the reachability problem for LoHoW asks whether there exists a reachable two-phases LoHoW state in which each process p is at the label J(p). A positive answer to this problem means J is reachable in Prog in LoHoW.
The following theorem states the equivalence between LoHoW and PS 2.0-rlx in terms of reachable instruction label functions.

Theorem 2. An instruction label function J is reachable in a program Prog in
LoHoW iff J is reachable in Prog in PS 2.0-rlx.

Decidability of LoHoW with Bounded Promises
The equivalence of reachability in LoHoW and PS 2.0-rlx, coupled with Theorem 1, shows that reachability is undecidable in LoHoW. To recover decidability, we consider LoHoW with only a bounded number of promise memory types in any higher order word. Let K-LoHoW denote LoHoW with the number of promises bounded by K. (Observe that K-LoHoW corresponds to bdPS 2.0-rlx.)

Theorem 3. The reachability problem is decidable for K-LoHoW.
As a corollary of Theorem 3, the decidability of reachability follows for bdPS 2.0-rlx. The proof makes use of the framework of Well-Structured Transition Systems (WSTS) [7,13]. Next, we note that the reachability problem for K-LoHoW (even for K = 0) is highly non-trivial, i.e., non-primitive recursive. The proof is by reduction from the reachability problem for lossy channel systems, in a manner similar to the case of TSO [8], where we insert SC-fence instructions everywhere in the process that simulates the lossy channel process (in order to ensure that no promises can be made by that process).

Source to Source Translation
In this section, we propose an algorithmic approach for state reachability in concurrent programs under PS 2.0. We first recall the notion of view-altering reads [1] and that of bounded contexts in SC [29].

View-Altering Reads. A read from the memory is view altering if it changes the view of the process performing it; that is, the view in the message being read from was greater than the process's view on some variable. The message being read from is in turn called a view-altering message. A run in which the total number of view-altering reads (across all threads) is bounded by some parameter is called a view-bounded run. The underapproximate analysis for PS 2.0-ra without promises and reservations [1] considered view-bounded runs.

Essential Events. An essential event in a run ρ of a program under PS 2.0 is a promise, a reservation, or a view-altering read by some process in the run.

Bounded Context. A context is an uninterrupted sequence of actions by a single process. In a run having K contexts, the execution switches from one process to another K − 1 times. A K-bounded-context run is one where the number of context switches is bounded by K ∈ N. The K-bounded-context reachability problem in SC checks for the existence of a K-bounded-context run reaching some chosen instruction. We now define the notion of bounding for PS 2.0.

The Bounded Consistent Reachability Problem. A run ρ of a concurrent program under PS 2.0 is called K-bounded iff the number of essential events in ρ is ≤ K. The K-bounded reachability problem for PS 2.0 checks for the existence of a K-bounded run of Prog. Assuming Prog has n processes, we propose an algorithm that reduces the K-bounded reachability problem under PS 2.0 to the (K + n)-bounded-context reachability problem of a program Prog′ under SC.

Translation Overview. We now provide a brief overview of the data structures and procedures utilized in our translation; the full details and correctness proof are in [5].
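The K-bounded condition on a run is a simple count over its events. The sketch below is illustrative; the event-kind labels and the representation of a run as a list of (process, event) pairs are assumptions of ours, not the paper's notation.

```python
# A run is K-bounded iff it contains at most K essential events, where an
# essential event is a promise, a reservation, or a view-altering read.
ESSENTIAL = {"view_altering_read", "promise", "reservation"}

def is_k_bounded(run, k):
    """run: list of (process, event_kind) pairs; returns True iff the
    number of essential events in the run is at most k."""
    return sum(1 for _, kind in run if kind in ESSENTIAL) <= k

# Example run with exactly two essential events (one view-altering read,
# one promise); plain reads and writes are non-essential.
run = [("p1", "write"), ("p2", "view_altering_read"),
       ("p1", "promise"), ("p2", "read")]
```

So this run is 2-bounded but not 1-bounded.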
Let Prog be a concurrent program under PS 2.0 with set of processes P and locations Loc. Our algorithm relies on a source-to-source translation of Prog to a bounded-context SC program Prog′, as shown in Figure 8, and operates on the same data domain (which need not be finite). The translation (1) adds a new process (Main) that initializes the global variables of Prog′, and (2) for each process p ∈ P, adds local variables, which are initialized by the function InitProc.

Fig. 8: Source-to-source translation map

This is followed by the code block CSO_{p,λ0} (Context Switch Out) that optionally enables the process to switch out of context. For each λ-labeled instruction i in p, the map [[λ : i]]_p transforms it into a sequence of instructions as follows: the code block CSI (Context Switch In) checks if the process is active in the current context; then each statement s of instruction i is transformed into a sequence of instructions following the map [[s]]_p; finally, the code block CSO_{p,λ} is executed. CSO_{p,λ} facilitates two things when the process is at an instruction label λ: (1) it allows p to make promises/reservations after λ, such that control is back at λ after certification; (2) it ensures that the machine state is consistent when p switches out of context. The translations of assume, if, and while statements keep the statement unchanged. The translations of read and write statements are described later. The translation of RMW statements is omitted for ease of presentation.
The set of promises a process makes has to be constrained with respect to the set of promises that it can certify. To address this, in the translation, processes run in two modes: a 'normal' mode and a 'check' (consistency check) mode. In the normal mode, a process does not make any promises or reservations. In the check mode, the process may make promises and reservations and subsequently certify them before switching out of context. In any context, a process first enters the normal mode and then, before exiting the context, enters the check mode. The check mode is used by the process to (1) make new promises/reservations and (2) certify consistency of the machine state. We also add an optional parameter, the certification depth (certDepth), which constrains the number of steps a process may take in the check mode to certify its promises. Figure 9 shows the structure of a translated run under SC.

Fig. 9: Control flow: In each context, a process runs first in normal mode (n) and then in consistency-check mode (cc). The transitions between these modes are facilitated by the CSO code block of the respective process. We check assertion failures for (K + n)-context-bounded executions (j ≤ K + n).
To reduce a PS 2.0 run to a bounded-context SC run, we use the bound on the number of essential events. From a run ρ in PS 2.0, we construct a K-bounded run ρ′ in PS 2.0 in which the processes run in the order of generation of essential events: the process that generates the first essential event runs first, until that event happens; then the process that generates the second essential event runs, and so on. This continues for K + n contexts: K bounds the number of essential events, and n ensures that all processes run to completion. The bound on the number of essential events yields a bound on the number of timestamps that need to be maintained. As observed in [1], each view-altering read requires two timestamps; additionally, each promise/reservation requires one timestamp. Since we have K such essential events, 2K timestamps suffice, and we choose Time = {0, 1, 2, . . . , 2K} as the set of timestamps. We now give a high-level overview of the translation.

Data Structures. The message data structure represents a message generated as a write or a promise and has four fields: (i) var, the address of the memory location written to; (ii) the timestamp t in the view associated with the message; (iii) v, the value written; and (iv) flag, which keeps track of whether it is a message or a promise and, in the case of a promise, which process it belongs to. The View data structure stores, for each memory location x, (i) a timestamp t ∈ Time, (ii) a value v written to x, and (iii) a Boolean l ∈ {true, false} representing whether t is an exact timestamp (which can be used for essential events) or an abstract timestamp (which corresponds to non-essential events).
Global Variables. Memory is an array of size K holding elements of type message. This array is populated with the view-altering messages, promises, and reservations generated by the program. We maintain counters for (1) the number of elements in Memory, (2) the number of context switches that have occurred, and (3) the number of essential events that have occurred.
Local Variables. In addition to its local registers, each process has local variables including (1) a variable view, which stores a local instance of the view function (of type View), (2) a flag denoting whether the process is running in the current context, and (3) a flag checkMode denoting whether the process is in the certification phase. We implement the certification phase as a function call and hence store the process state and return address when entering it.

Translation Maps
In what follows we illustrate how the translation simulates a run under PS 2.0. At the outset, recall that each process alternates, in its execution, between two modes: a normal mode (n in Figure 9) at the beginning of each context and the check mode at the end of the current context (cc in Figure 9), where it may make new promises and certify them before switching out of context.
Context Switch Out (CSO_{p,λ}). We describe the CSO module; Algorithm 1 of Figure 10 provides its pseudocode. CSO_{p,λ} is placed after each instruction λ of the original program and serves as the entry and exit point for the consistency-check phase of the process. When in normal mode (n) after some instruction λ, CSO non-deterministically guesses whether the process should exit the context at this point; if so, it sets the checkMode flag to true and subsequently saves the local state and the return address (to mark where to resume execution from in the next context). The process then continues its execution in consistency-check mode (cc) from the current instruction label λ itself. Now the process may generate new promises (see Algorithm 1 of Figure 10) and certify these as well as earlier promises. To conclude the check-mode phase, the process enters the CSO block at some (possibly different) instruction label λ′. Since the checkMode flag is now true, the process enters the else branch and verifies that there are no outstanding promises of p to be certified. Since the promises are not yet fulfilled, when p switches out of context, it marks all its promises as uncertified. When control returns to p in a later context, this information is used to fulfil the promises or to certify them again before p switches out of context once more.
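The two passes through CSO can be sketched as follows. This is a hedged re-rendering of the pseudocode's control flow, not the actual Algorithm 1: the dict-based process state, the helper names, and the `leave_context` parameter (standing in for the non-deterministic guess) are ours.

```python
def cso(proc, leave_context=True):
    """One pass through the CSO block of process `proc` (a dict)."""
    if not proc["checkMode"]:
        # Normal mode: decide (non-deterministically in the translation;
        # via a parameter here) whether to switch out at this label.
        if leave_context:
            proc["checkMode"] = True
            proc["saved_regs"] = dict(proc["regs"])  # save local state
            proc["return_label"] = proc["label"]     # where to resume
        # else: stay in the context and keep running in normal mode
    else:
        # Check mode: every promise made must have been certified before
        # the process is allowed to switch out of context.
        assert not proc["uncertified_promises"], "outstanding promises"
        proc["checkMode"] = False
        proc["regs"] = proc["saved_regs"]            # restore local state
        proc["label"] = proc["return_label"]         # back to entry label
        proc["active"] = False                       # exit the context

# A process about to run CSO after instruction label "L3".
p = {"checkMode": False, "regs": {"r1": 0}, "label": "L3",
     "active": True, "uncertified_promises": []}
```

The first call enters check mode at λ; after the certification run reaches some label λ′, the second call restores the saved state and switches the process out.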
Then it exits the check-mode phase, setting checkMode to false. Finally, it loads the saved state, returns to the instruction label λ (where it entered check mode), and exits the context. Another process may now resume execution.

Write Statements. The translation of a write statement is given in Figure 10. This is the general pseudocode for both kinds of memory accesses, with the details specific to the particular access mode omitted. Let us first consider execution in the normal mode (i.e., checkMode is false). First, the process updates its local state with the value that it will write. Then, the process non-deterministically chooses one of three possibilities for the write: it either (i) does not assign a fresh timestamp (a non-essential event), (ii) assigns a fresh timestamp and adds the message to memory, or (iii) fulfils some outstanding promise.
Let us now consider a write executing when checkMode is true, and highlight the differences with the normal mode. In case (i), non-essential events exclude promises and reservations. While in the certification phase, since we use a capped memory, the process can make a write if either (1) the write interval can be generated through splitting insertion or (2) the write can be certified with the help of a reservation. In essence, the writes we make either split an existing interval (added to the left of a promise) or form part of a reservation; thus, the timestamp of a neighbour is used. In case (ii), when a fresh timestamp is used, the write is made as a promise and then certified before switching out of context. The analogue of case (iii) is the certification of promises for the current context; promise fulfilment happens only in the normal mode. To help a process decide the value of a promise, we use the fact that CBMC allows us to assign a non-deterministic value to a variable. On top of that, we have implemented an optimization that checks the set of possible values to be written in the future.
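The three normal-mode choices for a write can be sketched as below. This is an illustrative model, not the paper's Algorithm 2: the `choice` parameter stands in for the non-deterministic branch, and the fresh-timestamp scheme and dict layout are simplifying assumptions.

```python
def do_write(proc, memory, counters, x, v, choice, K):
    """Translate a write x := v in normal mode; `choice` selects one of
    the three non-deterministic options (i)-(iii) from the text."""
    if choice == "local":
        # (i) no fresh timestamp: a non-essential event; only the local
        # view's value for x changes, and its timestamp becomes abstract.
        proc["view"][x] = {"t": proc["view"][x]["t"], "v": v, "exact": False}
    elif choice == "fresh":
        # (ii) fresh timestamp: an essential event; a new message is
        # appended to the shared Memory and the view advances to it.
        assert counters["essential"] < K, "essential-event bound reached"
        counters["essential"] += 1
        t = counters["essential"]            # fresh timestamp (sketch)
        memory.append({"var": x, "t": t, "v": v, "flag": 0})
        proc["view"][x] = {"t": t, "v": v, "exact": True}
    elif choice == "fulfil":
        # (iii) fulfil an outstanding promise of this process for x: the
        # promise becomes an ordinary message carrying the written value.
        for m in memory:
            if m["var"] == x and m["flag"] == proc["id"]:
                m["flag"] = 0
                m["v"] = v
                proc["view"][x] = {"t": m["t"], "v": v, "exact": True}
                return
        raise AssertionError("no promise to fulfil")

proc = {"id": 1, "view": {"x": {"t": 0, "v": 0, "exact": True}}}
mem, ctrs = [], {"essential": 0}
```

In check mode, option (ii) would instead create the message as a promise (flag set to the process id) to be certified before switching out.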
Read Statements. The translation of a read instruction $r := x^o, o ∈ {rlx, ra}, of process p is given in Algorithm 3 of Figure 11. The process first guesses whether it will read from a view-altering message in the memory or from its local view. If it is the latter, the process must first verify whether it can read from the local view; for instance, reading from the local view may not be possible after execution of a fence instruction, when the timestamp of a variable x gets incremented from the local view's t to some t′ > t. In the case of a view-altering read, we first check that we have not reached the context-switching/essential-event bound. Then the new message is fetched from Memory, and we check that the view (timestamps) in the acquired message satisfies the conditions imposed by the access type o ∈ {ra, rlx}. Finally, the process updates its view with that of the new message and increments the counters for context switches and essential events. Theorem 5 proves the correctness of our translation.
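The two branches of the read translation can be sketched as follows. Again this is an illustrative model, not Algorithm 3 itself: the `view_altering` parameter replaces the process's guess, and the readability check on the local view and the ra/rlx access-type conditions are elided.

```python
def do_read(proc, memory, counters, x, K, view_altering=False):
    """Translate r := x; returns the value read."""
    if not view_altering:
        # Read from the local view of x (the check that the local
        # timestamp is still readable is elided in this sketch).
        return proc["view"][x]["v"]
    # View-altering read: an essential event, so check the bound first.
    assert counters["essential"] < K, "essential-event bound reached"
    newer = [m for m in memory
             if m["var"] == x and m["t"] > proc["view"][x]["t"]]
    msg = newer[0]  # fetch a message newer than the current view
    # (access-type conditions for o in {ra, rlx} would be checked here)
    proc["view"][x] = {"t": msg["t"], "v": msg["v"], "exact": True}
    counters["essential"] += 1
    counters["switches"] += 1
    return msg["v"]

proc = {"view": {"x": {"t": 0, "v": 0, "exact": True}}}
mem = [{"var": "x", "t": 1, "v": 9, "flag": 0}]
ctrs = {"essential": 0, "switches": 0}
```

A local read leaves all counters untouched; a view-altering read advances the view and consumes one unit of the essential-event budget.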

Implementation and Experimental Results
To evaluate the efficiency of the source-to-source translation, we implemented a prototype tool, PS2SC, which is the first tool to handle PS 2.0. PS2SC takes as input a C program and a bound K and translates it to a program Prog′ to be run under SC. We use CBMC v5.10 as the backend verifier for Prog′. CBMC takes as input L, the loop-unrolling parameter for bounded model checking of Prog′. If PS2SC returns unsafe, then the program has an unsafe execution. Conversely, if it returns safe, then none of the executions within the explored subset violates any assertion. K may be iteratively incremented to increase the number of executions explored. PS2SC also supports partial promises, allowing only a subset of the processes to make promises, which provides an effective under-approximation technique.
We now report on the experiments we have performed with PS2SC. We have two objectives: (1) studying the performance of PS2SC on thin-air litmus tests and benchmarks utilizing promises, and (2) comparing PS2SC with other model checkers when operating in promise-free mode. For the first objective, we show that PS2SC is able to uncover bugs in litmus tests and in examples with few reads and writes to the shared memory; when this interaction, and the resulting non-determinism of PS 2.0, increases, we additionally enable partial promises. For the second objective, we compare PS2SC with three model checkers, CDSChecker [25], GenMC [18], and Rcmc [17], that support the promise-free subset of PS 2.0. Our observations highlight the ability to detect hard-to-find bugs with small K for unsafe benchmarks. We do not consider compilation time for any tool when reporting the results; for PS2SC, the time reported is the time taken by the CBMC backend for analysis. The timeout is 1 hour for all benchmarks. All experiments were conducted on a machine with a 3.00 GHz Intel Core i5-3330 CPU and 8GB RAM running 64-bit Ubuntu 16. We denote timeout by 'TO' and memory limit exceeded by 'MLE'.

Litmus Tests. We first consider litmus tests from [16,22,11,23]. These examples are small programs that serve as barebones thin-air tests for the C11 memory model. Consistency tests based on the Java Memory Model are proposed in [23] and were experimented on by [27] with their MRDer tool. Like MRDer, PS2SC is able to verify most of these tests within 1 minute, which shows its ability to handle typical programming idioms of PS 2.0 (see Table 1).

Benchmarks with Promises. We next consider parametrized benchmarks from [10].
In these examples, a process is required to generate a promise (a speculative write) whose value is the i-th Fibonacci number. This promise is certified using process-local reads; thus, though the parameter i increases, the interaction of the promising process with the memory remains constant. The CAS variant requires the process to make use of reservations. We note that PS2SC uncovers the bugs effectively in these cases. In cases where the promise certificate requires reads from external processes, the amount of shared-memory interaction increases with i; here, we use partial promises. How do we recover tractable analysis? We note that though the above example consists of several processes interacting with the memory, the bug can be uncovered even if only a single process is allowed to make promising writes. We therefore run PS2SC in partial-promises mode, considering the case where only a single process generates promises; PS2SC was able to uncover the bug. The results are in Table 2, where PS2SC[1p] denotes that only one process is permitted to make promises. We repeated the experiments on other unsafe benchmarks, including ExponentialBug from Fig. 2 of [15], with similar observations. To summarize, the huge non-determinism of PS 2.0 can be tamed using the modular approach of partial promises.
Comparing with Other Tools. In this section, we compare the performance of PS2SC in promise-free mode with CDSChecker [25], GenMC [18], and Rcmc [17] (which do not support promises). The main objective is to provide evidence for the practicability of the essential-event-bounding technique. The results indicate that the source-to-source translation with K-essential-event bounding is effective at uncovering hard-to-find bugs in non-trivial programs. Additionally, in most examples considered we had K ≤ 10. We provide a subset of the experimental results here; the remainder appears in the full version of the paper [5]. In the tables that follow, we give the value of K (for PS2SC) and the value of L (the loop-unrolling bound) for all tools.

In Table 3, we experiment on two parametrized benchmarks: ExponentialBug (Fig. 2 of [15]) and Fibonacci (from SV-COMP 2019). In ExponentialBug(N), N is the number of writes made to a variable by a process. We note that in ExponentialBug(N) the number of executions grows as N!, while the processes have to follow a specific interleaving to uncover the hard-to-find bug. In Fibonacci(N), two processes compute the N-th Fibonacci number in a distributed fashion.

In Table 4, we consider benchmarks based on concurrent data structures. The first is a concurrent locking algorithm originating from [14]. The second, LinuxLocks(N), is adapted from the evaluations of CDSChecker [25]; if not completely fenced, it is unsafe, and we fence all but one lock access. Both results show the ability of our tool to uncover bugs with a small value of K.
Variations of Mutual Exclusion Protocols. We consider variants of mutual exclusion protocols from SV-COMP 2019. The fully fenced versions of the protocols are safe. We modify these protocols by introducing bugs and compare the bug-detection performance of PS2SC with the other tools. These benchmarks are parameterized by the number of processes. In Table 5, we unfence a single process of the Peterson and Szymanski protocols, making them unsafe; these are the benchmarks petersonU(i) and szymanskiU(i), where i is the number of processes. In petersonB(i), we keep all processes fenced but introduce a bug into the critical section of one process (writing a value to a shared variable and reading a different value from it). We note that the other tools do not scale, while PS2SC detects the bug within one minute, showing that essential-event bounding is an effective under-approximation technique for bug finding.
Remark. Throughout these experiments, we observe that SMC tools and our tool tackle the same problem using orthogonal approaches to finding bugs. Hence, the experiments above are not meant to pit one approach against the other, but rather to highlight the differences in their features. We have exhibited examples where our tool is able to uncover hard-to-find bugs faster than the others with relatively small values of K.

Related Work and Conclusion
Most of the existing verification work for C/C++ concurrency models concern the development of stateless model checking coupled with dynamic partial order reduction (e.g., [6,17,18,26,25]) and do not handle the promising semantics. Context-bounding has been proposed in [29] for programs running under SC. This work has been extended in different directions and has led to efficient and scalable techniques for the analysis of concurrent programs (see e.g., [24,21,33,32,12,34]). In the context of weak memory models, context-bounded analyses have been proposed for TSO/PSO [9,31] and POWER [3].
The decidability of verification problems for programs running under weak memory models has been addressed for TSO [8], RA [1], SRA [19], and POWER [2]. We believe that our proof techniques can be easily adapted to work with different variants of the promising semantics [16] (see [4]). For instance, in the code-to-code translation, the mechanism for making and certifying promises and reservations is isolated in one module, which can easily be changed to cover different variants of the promising semantics. Furthermore, the undecidability proof still goes through for [16]. Moreover, providing a tool for the verification of (among other things) litmus tests will provide a valuable environment for further improvements of the promising semantics. To the best of our knowledge, this is the first time that this problem has been investigated for PS 2.0-rlx, and PS2SC is the first tool for automated verification of programs under PS 2.0. Finally, studying the decidability problem for related models that solve the thin-air problem (e.g., Paviotti et al. [27]) is interesting and kept as future work.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4. 0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.