# Modular Termination Verification for Non-blocking Concurrency

## Abstract

We present Total-TaDA, a program logic for verifying the total correctness of concurrent programs: that such programs both terminate and produce the correct result. With Total-TaDA, we can specify constraints on a thread’s concurrent environment that are necessary to guarantee termination. This allows us to verify total correctness for non-blocking algorithms, e.g. a counter and a stack. Our specifications can express lock- and wait-freedom. More generally, they can express that one operation cannot impede the progress of another, a new non-blocking property we call *non-impedance*. Moreover, our approach is modular. We can verify the operations of a module independently, and build up modules on top of each other.

## Keywords

Program Logic, Loop Iteration, Concurrent Program, Read Operation, Total Correctness

## 1 Introduction

The problem of understanding and proving the correctness of programs has been considered at least since Turing [21]. When proving a program, it is not just important to know that it will give the right answer, but also that the program terminates. This is especially challenging for concurrent programs. When multiple threads are changing some shared resource, knowing if each thread terminates can often depend on the behaviour of the other threads and even on the scheduler that decides which thread should run at a particular moment.

If we prove that a concurrent program only produces the right answer, we establish *partial correctness*. Many recent developments have been made in program logics for partial correctness of concurrent programs [5, 11, 16, 17, 19, 22]. These logics emphasise a *modular* approach, which allows us to decouple the verification of a module’s clients and its implementation. Each operation of the module is proven in isolation, and the reasoning is local to the thread. To achieve this, these logics abstract the interference between a thread and its environment.

These logics have been applied to reason about fine-grained concurrency, which is characterised by the use of low-level synchronisation operations (such as compare-and-swap). A well-known class of fine-grained concurrent programs is that of *non-blocking* algorithms. With non-blocking algorithms, suspension of a thread cannot halt the progress of other threads: the progress of a single thread cannot require another thread to be scheduled. Thus if the interference from the environment is suitably restricted, the operations are guaranteed to terminate.

If we prove that a program produces the correct results and also always completes in a finite time, we establish *total correctness*. Turing [21] and Floyd [6] introduced the use of well-founded relations, combined with partial-correctness arguments, to prove the termination of sequential programs. The same technique is general enough to prove the termination of concurrent programs too. However, previous applications of this technique in the concurrent setting, which we discuss in Sect. 7, do not support straightforward reasoning about clients.

In this paper, we extend a particular concurrent program logic, TaDA [16], with well-founded termination reasoning. With the resulting logic, Total-TaDA, we can prove total correctness of fine-grained concurrent programs. The novelty of our approach is in using TaDA’s abstraction mechanisms to specify constraints on the environment necessary to ensure termination. It retains the modularity of TaDA and abstracts the internal termination arguments. We demonstrate our approach on counter and stack algorithms.

We observe that Total-TaDA can be used to verify standard non-blocking properties of algorithms. However, our specifications capture more: we propose the concept of *non-impedance* that our specifications suggest. We say that one operation *impedes* another if the second can be prevented from terminating by repeated concurrent invocations of the first. This concept seems important to the design and use of non-blocking algorithms where we have some expectation about how clients use the algorithm, and what progress guarantees they expect.

*TaDA.* TaDA introduced a new form of specification, given by *atomic triples*, which supports local, modular reasoning and can express constraints on the concurrent environment. Simple atomic triples have the following form:

\[ \vdash \forall\!\!\!\forall x \in X.\ \langle p(x) \rangle\ \mathbb {C}\ \langle q(x) \rangle \]

Intuitively, the specification states that the program \(\mathbb {C}\) atomically updates *p*(*x*) to *q*(*x*) for an arbitrary \(x \in X\). As we are in a concurrent setting, while \(\mathbb {C}\) is executing, there might be interference from the environment before the atomic update. The pseudo-quantifier \(\forall\!\!\!\forall\) restricts the interference: before the atomic update, the environment must maintain *p*(*x*), but it is allowed to change the parameter *x* as long as it stays within *X*; after the atomic update, the environment is not constrained. This specification thus provides a contract between the client of \(\mathbb {C}\) and the implementation: the client can assume that the precondition holds for some \(x \in X\) until it performs the update.

As an example, the increment operation of a counter can be specified as:

\[ \vdash \forall\!\!\!\forall n \in \mathbb {N}.\ \langle \mathsf {C}(s, x, n) \rangle\ \mathtt {incr}(x)\ \langle \mathsf {C}(s, x, n + 1) \wedge \mathsf {ret} = n \rangle \]

The internal structure of the counter is abstracted using the abstract predicate [14] \(\mathsf {C}(s, x, n)\), which states that there is a counter at address *x* with value *n*, and *s* abstracts implementation-specific information about the counter. The specification says that \(\mathtt {incr}\) atomically increments the counter by 1. The environment is allowed to update the counter to any value of *n* as long as it is a natural number. The specification enforces obligations on both the client and the implementation: the client must guarantee that the counter is not destroyed and that its value is a natural number until the atomic update occurs; and the implementation must guarantee that it does not change the value of the counter until it performs the specified atomic action.

Working at the abstraction of the counter means that each operation can be verified without knowing the rest of the operations of the module. Consequently, modules can be extended with new operations without having to re-verify the existing operations. Additionally, the implementation of \(\mathtt {incr}\) can be replaced by another implementation that satisfies the same specification, without needing to re-verify the clients that make use of the counter.

While atomic triples are expressive, they do not guarantee termination. In particular, an implementation could block, deadlock or live-lock and still be considered correct.

*Non-blocking Algorithms.* In general, guaranteeing the termination of concurrent programs is a difficult problem. In particular, termination could depend on the behaviour of the scheduler (whether or not it is *fair*) and of other threads that might be competing for resources. We focus on non-blocking programs. Non-blocking programs have the benefit that their termination is not dependent on the behaviour of the scheduler.

There are two common non-blocking properties: *wait-freedom* [8] and *lock-freedom* [13]. Wait-freedom requires that operations complete irrespective of the interference caused by other threads: termination cannot depend on the amount of interference caused by the environment. Lock-freedom is less restrictive. It requires that, when multiple threads are performing operations, then at least one of them must make progress. This means that a thread might never terminate if the amount of interference caused by the environment is unlimited.

TaDA is well suited to reasoning about interference between threads. In particular, we can write specifications that limit the amount of interference caused by the client, and so guarantee termination of lock-free algorithms. We will see how both wait-freedom and lock-freedom can be expressed in Total-TaDA.

*Termination.* Well-founded relations provide a general way to prove termination. In particular, Floyd [6] used well-founded relations to prove the termination of sequential programs. In fact, it is sufficient to use ordinal numbers [3] without losing expressivity. A ‘Hoare-style’ while rule, using ordinals and adapted from Floyd’s work, has the form:

\[ \frac{\forall \gamma \le \alpha.\ \vdash \{ p(\gamma ) \wedge B \}\ \mathbb {C}\ \{ \exists \beta .\ p(\beta ) \wedge \beta < \gamma \}}{\vdash \{ p(\alpha ) \}\ \mathtt {while}\ (B)\ \mathbb {C}\ \{ \exists \beta .\ p(\beta ) \wedge \lnot B \wedge \beta \le \alpha \}} \]

The loop invariant \(p(\gamma )\) is parametrised by an ordinal \(\gamma \) (the *variant*) which is decreased by every execution of the loop body \(\mathbb {C}\). Because ordinals cannot have infinite descending chains, the loop must terminate in a finite number of steps. This proof rule allows termination reasoning to be localised to the individual loops in the program. In this paper, we extend TaDA with termination based on ordinal numbers, using the while rule given above.

*Total-TaDA.* We obtain the program logic Total-TaDA by modifying TaDA to have a total-correctness semantics. The details are given in Sect. 3. With Total-TaDA, we can specify and verify non-blocking algorithms. Wait-free operations always terminate, independently of the operations performed by the environment. For lock-free operations however, we need to restrict the amount of interference the environment can cause in order to guarantee termination. Our key insight is that, as well as bounding the number of iterations of loops, ordinals can bound the interference on a module. This allows us to give total-correctness specifications for lock-free algorithms. In Sect. 2, we specify and verify lock-free implementations of a counter. The specification introduces ordinals to bound the number of times a client may update the counter. This makes it possible to guarantee that the lock-free increment operation will terminate, since either it will succeed or some other concurrent increment will succeed. As the number of increments is bounded, the operation must eventually succeed.

Total-TaDA retains the modularity of TaDA. In particular, we can verify the termination of clients of modules using the total-correctness specifications, without reference to the implementation. We show an example of this in Sect. 2.2. Since the client only depends on the specification, we can replace the implementation. In Sect. 2.3 we show that two different implementations of a counter satisfy the same total-correctness specification. With Total-TaDA we can verify the operations of a module independently, exploiting locality.

As a case study for Total-TaDA, we show how to specify and verify both functional correctness and termination of Treiber’s stack in Sect. 4. In Sect. 5, we discuss the implications of a total-correctness semantics for the soundness proof of Total-TaDA. In Sect. 6, we show how lock-freedom and wait-freedom can be expressed with Total-TaDA specifications. We also introduce the concept of non-impedance in Sect. 6.3 and argue for its value in specifying non-blocking algorithms. We discuss related work in Sect. 7 and future directions in Sect. 8.

## 2 Motivating Examples: Counters

Our underlying programming language is a concurrent while language with functions, allocation and the atomic assignment \(\mathtt {x} \mathtt {\ :=\ } E\), read \(\mathtt {x} \mathtt {\ :=\ } [E]\), write \([E] \mathtt {\ :=\ } E\) and compare-and-swap \(\mathtt {x} \mathtt {\ :=\ } \mathtt {CAS}(E,E,E)\), where expressions *E* have no side effects. Consider a counter module with a constructor makeCounter and two operations: incr, which increments the value of the counter by 1 and returns its previous value; and read, which returns the value of the counter. We give an implementation in Fig. 1a, and an alternative implementation of incr in Fig. 1b.
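The figures are not reproduced here, so the following is only a minimal Python sketch of what such a counter module might look like, based on the prose descriptions of the spin and backoff increments in Sect. 2.3. The class `Cell`, the function names and the `max_wait` cap are hypothetical; the internal lock merely models the atomicity of each individual read and compare-and-swap step and is never held across steps.

```python
import random
import threading


class Cell:
    """A single heap cell with atomic read and compare-and-swap.

    The lock only makes each primitive step atomic; it is released
    between steps, so callers never wait on each other's loops.
    """

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        """Store `new` if the cell holds `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def make_counter():
    return Cell(0)


def read(x):
    return x.read()


def incr_spin(x):
    """Spin increment: retry the CAS until it succeeds; return the old value."""
    while True:
        v = x.read()
        if x.cas(v, v + 1):
            return v


def incr_backoff(x, max_wait=8):
    """Backoff increment: after a failed CAS, idle for a random number of
    loop iterations before retrying (max_wait is an assumed cap)."""
    n = 0
    while True:
        if n == 0:
            v = x.read()
            if x.cas(v, v + 1):
                return v
            n = random.randrange(max_wait)
        else:
            n -= 1
```

Both increments can be mixed freely on the same `Cell`, which mirrors the later observation that a module could provide both operations.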

### 2.1 Abstract Specification

The ordinal parameter is exposed in the specification of the counter to allow the implementation to guarantee that its loops terminate. In a wait-free implementation it would not be necessary to expose the ordinal parameter. For this counter, the read operation is wait-free, while the increment operation is lock-free, since termination depends on bounding the number of interfering increments.
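The displayed specification is missing from this text. As a hedged sketch only, the surrounding prose (a predicate \(\mathsf {C}(s, x, n, \alpha )\), a client-chosen function \(\beta \) that determines how the ordinal is reduced, an axiom for decreasing the ordinal, and an incr that returns the previous value) suggests a specification of roughly the following shape. The exact quantifiers and side conditions are assumptions, and \(\forall\!\!\!\forall\) stands for the atomic-triple pseudo-quantifier:

\[
\begin{array}{l}
\forall \alpha .\ \vdash \{ \mathsf {True} \}\ \mathtt {makeCounter}()\ \{ \exists s.\ \mathsf {C}(s, \mathsf {ret}, 0, \alpha ) \} \\[2pt]
\vdash \forall\!\!\!\forall n \in \mathbb {N}, \alpha .\ \langle \mathsf {C}(s, x, n, \alpha ) \rangle\ \mathtt {read}(x)\ \langle \mathsf {C}(s, x, n, \alpha ) \wedge \mathsf {ret} = n \rangle \\[2pt]
\forall \beta .\ \vdash \forall\!\!\!\forall n \in \mathbb {N}, \alpha .\ \langle \mathsf {C}(s, x, n, \alpha ) \wedge \alpha > \beta (n, \alpha ) \rangle\ \mathtt {incr}(x)\ \langle \mathsf {C}(s, x, n + 1, \beta (n, \alpha )) \wedge \mathsf {ret} = n \rangle \\[2pt]
\forall \alpha > \beta .\ \mathsf {C}(s, x, n, \alpha ) \implies \mathsf {C}(s, x, n, \beta )
\end{array}
\]

Under this reading, read leaves the ordinal untouched, which is consistent with it being wait-free, while each incr must strictly decrease it.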

### 2.2 Clients

*Sequential Client.* Consider a program that creates a counter and contains two nested loops. As in the previous example, the outer loop runs a finite but randomly determined number of times. The inner loop also runs a randomly determined number of times, and increments the counter on each iteration. Figure 2 shows this client, together with its total-correctness proof.

The while rule is used for each of the loops: for the outer loop, the variant is \(\mathtt {n}\); for the inner loop, the variant is \(\mathtt {m}\). Since the number of iterations of each loop is determined before it is run, the variants need only be considered up to finite ordinals (*i.e.* natural numbers). (We could modify the code to use a single loop that conditionally decrements \(\mathtt {n}\) (and randomises \(\mathtt {m}\)) or decrements \(\mathtt {m}\). This variation would require a transfinite ordinal for the variant.)
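To illustrate the parenthetical remark, here is a small Python sketch of the single-loop variation. It is an assumption-laden illustration rather than the paper's code: the function name, the `bound` on random choices and the pair encoding are all hypothetical. The variant \(\omega \cdot \mathtt {n} + \mathtt {m}\) is encoded as the pair `(n, m)`, whose lexicographic order (Python's built-in tuple order) agrees with the ordinal order on such values.

```python
import random


def single_loop_client(seed=0, bound=5):
    """Flattened version of the two nested loops, with a variant check.

    The pair (n, m) stands for the ordinal omega * n + m. Because m is
    re-randomised inside the loop, no natural number fixed before the
    loop can serve as the variant, but the pair always decreases.
    """
    rng = random.Random(seed)
    n = rng.randrange(bound)   # outer iterations still to run
    m = 0                      # inner iterations still to run
    increments = 0
    variant = (n, m)
    while n > 0 or m > 0:
        if m == 0:
            n -= 1
            m = rng.randrange(bound)   # fresh, arbitrary inner count
        else:
            m -= 1
            increments += 1            # stands in for incr(x)
        # Lexicographic comparison: (n, m) must be strictly smaller.
        assert (n, m) < variant, "variant must strictly decrease"
        variant = (n, m)
    return increments
```

The assertion inside the loop plays the role of the premiss of the while rule: every iteration must end in a strictly smaller variant.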

As well as enforcing loop termination, ordinals play a role as a parameter to the \(\mathsf {C}\) predicate, which must be decreased on each increment. When we create the counter, we choose \(\omega ^2\) as the initial ordinal. We have seen that \(\omega \) allows us to increment the counter a non-deterministic (but finite) number of times. We want to repeat this a non-deterministic (but finite) number of times, so \(\omega \cdot \omega = \omega ^2\) is the appropriate ordinal. Once the number \(\mathtt {n}\) of iterations of the outer loop is determined, we decrease this to \(\omega \cdot \mathtt {n}\) by using the axiom provided by the counter module. Similarly, when \(\mathtt {m}\) is chosen, we decrease the ordinal from \(\omega \cdot \mathtt {n} = \omega \cdot (\mathtt {n} - 1) + \omega \) to \(\omega \cdot (\mathtt {n} - 1) + \mathtt {m}\).

*Concurrent Client.* Consider a program that creates two threads, each of which increments the counter a finite but unbounded number of times. We again prove this client using the abstract specification of the counter. The proof is given in Fig. 3. In this example, the counter is shared between the two threads, which may concurrently update it. To reason about sharing, we use a *shared region*.

As in TaDA, a shared region encapsulates some resource that is available to multiple threads. Threads can access the resource when performing (abstractly) atomic operations, such as incr. The region presents an abstract state, and defines a protocol that determines how the region may be updated. Ghost resources, called *guards*, are associated with transitions in the protocol. The guards for a region form a partial commutative monoid with the operation \(\bullet \), which is lifted by \(*\) in assertions. In order for a thread to make a particular update, it must have ownership of a guard associated with the corresponding transition. All guards are allocated along with the region they are associated with.

For the concurrent client, we introduce a region with type name \(\mathbf {CClient}\). This region encapsulates the shared counter. Accordingly, the region type is parametrised by the address of the counter. The abstract state of the region records the current value of the counter.

There are two types of guard resources associated with \(\mathbf {CClient}\) regions. The guard \(\textsc {Inc}(m, \beta , \pi )\) provides capability to increment the counter. Conceptually, multiple threads may have \(\textsc {Inc}\) guards, and a fractional permission \(\pi \in (0,1]\) (in the style of [2]) is used to keep track of these capabilities. The parameter *m* expresses the *local contribution* to the value of the counter — the actual value is the sum of the local contributions. The ordinal parameter \(\beta \) represents a local bound on the number of increments. Again, the actual bound is a sum of the local bounds. Standard ordinal addition is inconvenient since it is not commutative; we use the natural (or Hessenberg) sum [9], denoted \(\oplus \), which is associative, commutative, and monotone in its arguments.
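The difference between ordinary ordinal addition and the natural sum can be seen on small examples. The sketch below is an illustration under stated assumptions: it only handles ordinals below \(\omega ^\omega \), represented in Cantor normal form as a dictionary from natural-number exponents to positive coefficients, and the function names are hypothetical.

```python
def natural_sum(a, b):
    """Hessenberg sum: add Cantor-normal-form coefficients pointwise."""
    out = dict(a)
    for exp, coeff in b.items():
        out[exp] = out.get(exp, 0) + coeff
    return out


def ordinal_add(a, b):
    """Ordinary ordinal addition a + b: terms of a that are smaller than
    the leading term of b are absorbed."""
    if not b:
        return dict(a)
    lead = max(b)
    out = {exp: coeff for exp, coeff in a.items() if exp > lead}
    out.update(b)
    if lead in a:
        out[lead] = a[lead] + b[lead]
    return out


def key(a):
    """Sort key realising the ordinal order: highest exponent first."""
    return sorted(a.items(), reverse=True)


omega = {1: 1}   # omega^1 * 1
one = {0: 1}     # omega^0 * 1

# Ordinary addition is not commutative: 1 + omega = omega, omega + 1 > omega.
left = ordinal_add(one, omega)
right = ordinal_add(omega, one)

# The natural sum is commutative and keeps both contributions.
both = natural_sum(one, omega)
```

With local bounds combined by `natural_sum`, lowering any one thread's bound strictly lowers the total, which is the monotonicity the guards rely on.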

The second type of guard tracks the actual value of the counter *n* and the ordinal \(\alpha \). These values should match the totals for the \(\textsc {Inc}\) guards, which we enforce by requiring the following implication to hold:

*f*. The first premiss requires that this update is allowed by the transition system for the region, given the guard resources available (G). The second premiss requires that the program \(\mathbb {C}\) (abstractly) atomically performs the corresponding update on the concrete state of the region.

### 2.3 Implementations

We prove the total correctness of the two distinct increment implementations against the abstract specification given in Sect. 2.1.

*Spin Counter Increment.* Consider incr shown in Fig. 1a. Note that the read, write and compare-and-swap operations are atomic. We want to prove the total correctness of incr against the atomic specification. The first step is to give a concrete interpretation of the abstract predicate \(\mathsf {C}(s, x, n, \alpha )\). We introduce a new region type, \(\mathbf {Counter}\), with only one non-empty guard, \(\textsc {G}\). The abstract states of the region are pairs of the form \((n, \alpha )\), where *n* is the value of the counter and \(\alpha \) is a bound on the number of increments. All transitions are guarded by \(\textsc {G}\) with the transition:

*x* and value *n*. Note that \(\alpha \) is not represented in the concrete heap, as it is not part of the program. We use it solely to ensure that the number of operations is finite.

*r*, address *x*, and with abstract state \((n, \alpha )\). Furthermore, it encapsulates exclusive ownership of the guard G, and so embodies exclusive permission to update the counter. (Note that the type of the first parameter of \(\mathsf {C}\), which is abstract to the client, is instantiated as \(\mathsf {RId}\).)

The *atomicity context* \(a : x \in X \rightsquigarrow Q(x)\) records the update we require. The program is given the atomic tracking resource \(a \Rrightarrow \blacklozenge \) initially (in place of the guard G); this resource permits a single update to the region in accordance with the atomicity context, while at the same time guaranteeing that the region’s state will remain within *X*. When the single update occurs, the atomic tracking resource simultaneously changes to record the actual update performed: \(a \Rrightarrow (x, y)\).

The make atomic rule of Total-TaDA is just the same as that of TaDA. The only difference is that termination is enforced. Whereas in TaDA it would be possible for an abstract atomic operation to loop forever without performing its atomic update, in Total-TaDA it is guaranteed to eventually perform the update.

A proof of the increment implementation is shown in Fig. 4. The atomicity context allows the environment to modify the abstract state of the counter. However, it makes no restriction on the number of times. The \(\mathbf {Counter}\) transition system enforces that the ordinal \(\alpha \) must decrease every time the value of the counter is increased. This means that the number of times the region’s abstract state is updated is finite. Our loop invariant is parametrised with a variant \(\gamma \) that takes the value of \(\alpha \) at the beginning of each loop iteration. When we first read the value of the counter *n*, we can assert: \(n > \mathtt {v} \Rightarrow \gamma > \alpha \).

If the compare-and-swap operation fails, the value of the counter has changed. This can only happen in accordance with the region’s transition system, and so the ordinal parameter \(\alpha \) must have decreased. As such, the invariant still holds but for a lower ordinal, \(\alpha < \gamma \). We are localising the termination argument for the loop, by relating the local variant with the ordinal parametrising the region.

If the compare-and-swap succeeds, then we record our update from \((\mathtt {v}, \alpha )\) to \((\mathtt {v}\,+\,1, \beta (\mathtt {v},\alpha ))\), where \(\beta \) is the function chosen by the client that determines how the ordinal is reduced. The make atomic rule allows us to export this update in the postcondition of the whole operation.

*Backoff Increment.* Consider a different implementation of the increment operation, given in Fig. 1b. Like the previous implementation, it loops attempting to perform the operation. However, if the compare-and-swap fails due to contention, it waits for a random number of iterations before retrying.

Despite the differences from the previous increment, the specification is the same. In fact, we can give the same interpretation for the abstract predicate \(\mathsf {C}(s, x, n, \alpha )\), and the same guards and regions that were used for the previous implementation. (Since this is the case, a counter module could provide *both* of these operations: the proof system guarantees that they work correctly together.)

The main difference in the proof is that each iteration of the loop depends on not only the amount of interference on the counter, but also on the variable \(\mathtt {n}\) that is randomised when the compare-and-swap fails. Any random number will be smaller than \(\omega \), and the maximum number of times that the compare-and-swap can fail is \(\alpha \), the parameter of the \(\mathsf {C}\) predicate. This is because \(\alpha \) is a bound on the number of times the counter can be incremented. We therefore use \(\omega \cdot \alpha \,+\,\mathtt {n}\) as the upper bound on the number of loop iterations.

Let \(\gamma \) be equal to \(\omega \cdot \alpha \,+\,\mathtt {n}\) at the start of the loop iteration. At each loop iteration, we have two cases, when \(\mathtt {n} = 0\) or otherwise. In the first case we try to perform the increment by doing a compare-and-swap. If the compare-and-swap succeeds, then the increment occurs and the loop will exit. If it fails, then the environment must have decreased \(\alpha \). This means that \(\gamma \ge \omega \cdot \alpha \,+\,\omega \) for the new value of \(\alpha \). We then set \(\mathtt {n}\) to be a new random number, which is less than \(\omega \), and end up with \(\gamma > \omega \cdot \alpha \,+\,\mathtt {n}\). In the second case of the loop iteration, we simply decrement \(\mathtt {n}\) by 1 and we know that \(\gamma > \omega \cdot \alpha \,+\,\mathtt {n}\) for the new value of \(\mathtt {n}\). The proof of the backoff increment is shown in Fig. 5.
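The case analysis above can be exercised with a small simulation. This is a sketch under simplifying assumptions, not the proof in Fig. 5: the interference budget `alpha` is a natural number rather than an arbitrary ordinal, interference is only modelled between the read and the compare-and-swap, and the names `backoff_incr_trace`, `max_wait` and the interference probability are hypothetical. The variant \(\omega \cdot \alpha + \mathtt {n}\) is encoded as the pair `(alpha, n)` under lexicographic order.

```python
import random


def backoff_incr_trace(alpha, seed=0, max_wait=6):
    """Simulate one backoff increment under bounded interference.

    alpha is the finite number of increments the environment may still
    perform. Returns the number of loop iterations taken.
    """
    rng = random.Random(seed)
    counter, n, iterations = 0, 0, 0
    variant = (alpha, n)                      # encodes omega * alpha + n
    while True:
        iterations += 1
        if n == 0:
            v = counter                       # v := [x]
            if alpha > 0 and rng.random() < 0.7:
                counter += 1                  # environment increments ...
                alpha -= 1                    # ... and must spend its budget
            if counter == v:                  # CAS(x, v, v + 1) succeeds
                counter = v + 1
                return iterations
            n = rng.randrange(max_wait)       # CAS failed: back off
        else:
            n -= 1                            # waiting step
        # Either alpha dropped (failed CAS) or n dropped (waiting step).
        assert (alpha, n) < variant, "variant failed to decrease"
        variant = (alpha, n)
```

In this model each failed compare-and-swap costs at most `max_wait` iterations, so the loop runs at most `alpha * max_wait + 1` times; with no interference budget it succeeds on the first iteration.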

## 3 Logic

Total-TaDA is a Hoare logic which, for the first time, can be used to prove total correctness for fine-grained non-blocking concurrent programs. The logic is essentially the same as for TaDA, simply adapted to incorporate termination analysis using ordinals in a standard way.

The pre- and postconditions are split into a private part (the \(p_p\) and \(q_p (x,y)\)) and a public part (the *p*(*x*) and *q*(*x*, *y*)). The idea is that the command may make multiple, non-atomic updates to the private part, but must only make a single atomic update to the public part. Before the atomic update, the environment is allowed to change the public part of the state, but only by changing the parameter *x* of *p*, which must remain within *X*. After the atomic update, the specification makes no constraint on how the environment modifies the public state. All that is known is that, immediately after the atomic update, the public and private parts satisfy the postcondition for a common value of *y*. The private assertions in our judgements must be *stable*: that is, they must account for any updates other threads could have sufficient resources to perform.
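The displayed judgement form is missing from this text. Read together, the description above suggests a judgement of roughly the following shape. This is a hedged reconstruction: the set \(Y\), the symbols \(\forall\!\!\!\forall\) and \(\exists\!\!\!\exists\) for the two pseudo-quantifiers, and the omission of any contextual parameters are assumptions.

\[
\vdash \forall\!\!\!\forall x \in X.\ \langle p_p \mid p(x) \rangle\ \mathbb {C}\ \exists\!\!\!\exists y \in Y.\ \langle q_p(x, y) \mid q(x, y) \rangle
\]

On this reading, a simple atomic triple is the special case in which the private parts are trivial and the postcondition does not depend on *y*.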

We give an overview of the key Total-TaDA proof rules that deal with termination and atomicity in Fig. 6. The while rule enforces that the number of times that the loop body can run is finite. The rule allows us to perform a while loop if we can guarantee that each loop iteration decreases the ordinal parametrising the invariant *p*. By the finite-chain property of ordinals, there cannot be an infinite number of iterations.

The parallel rule and the frame rule are analogous to those for separation logic. The parallel rule allows us to split resources among two threads as long as the resources of one thread are not touched by the other thread. The frame rule allows us to add the frame resources to the pre- and postcondition, which are untouched by the command. Our frame rule separately adds to both the private and public parts. Note that the frame for the public part may be parametrised by the \(\forall\!\!\!\forall\)-bound variable *x*.

The next three rules allow us to access the contents of a shared region by using an atomic command. With all of the rules, the update to the shared region must be atomic, so its interpretation is in the public part of the premiss. (The region is in the public part in the conclusion also, but may be moved by weakening.)

The open region rule allows us to access the contents of a shared region without updating its abstract state. The command may change the concrete state of the region, so long as the abstract state is preserved.

The use atomic rule allows us to update the abstract state of a shared region. To do so, we need a guard that permits this update. This rule takes a \(\mathbb {C}\) which (abstractly) atomically updates the region *a* from some state \(x \in X\) to the state *f*(*x*). It requires the guard \(\textsc {G}\) for the region, which allows the update according to the transition system, as established by one of the premisses. Another premiss states that the command \(\mathbb {C}\) performs the update described by the transition system of region *a* in an atomic way. This allows us to conclude that the region *a* is updated atomically by the command \(\mathbb {C}\). Note that the command is not operating at the same level of abstraction as the region *a*. Instead it is working at a lower level of abstraction, which means that if it is atomic at that level it will also be atomic at the region *a* level.

The update region rule similarly allows us to update the abstract state of a shared region, but this time the authority comes from the atomicity context instead of a guard. In order to perform such an update, the atomic update to the region must not already have happened, indicated by \(a \Rrightarrow \blacklozenge \) in the precondition of the conclusion. In the postcondition, there are two cases: either the appropriate update happened, or no update happened. If it did happen, the new state of the region is some \(z \in Q(x)\), and both *x* and *z* are recorded in the atomicity tracking resource. If it did not, then both the region’s abstract state and the atomicity tracking resource are unchanged. The premiss requires the command to make a corresponding update to the concrete state of the region. The atomicity context and tracking resource are not in the premiss; they serve to record information about the atomic update that is performed for use further down the proof tree.

Finally, we revisit the make atomic rule, which elaborates on the version presented in Sect. 2.3. As before, a guard in the conclusion must permit the update in accordance with the transition system for the region. This is replaced in the premiss by the atomicity context and atomicity tracking resource, which tracks the occurrence of the update. One difference is the inclusion of the private state, which is effectively preserved between the premiss and the conclusion. A second difference is the \(\exists\!\!\!\exists\)-binding of the resulting state of the atomic update. This allows the private state to reflect the result of the update.

## 4 Case Study: Treiber’s Stack

We now consider a version of Treiber’s stack [20] to demonstrate how Total-TaDA can be applied to verify the total correctness of larger modules.

### 4.1 Specification

The stack is described by an abstract predicate parametrised by: the address of the stack *x*; its contents *vs*; an ordinal \(\alpha \) that decreases every time a push operation is performed; and two parameters, *s* and *t*, that range over abstract types \(\mathbb {T}_1\) and \(\mathbb {T}_2\) respectively. These last two parameters encapsulate implementation-specific information about the configuration of the stack (*s* is invariant, while *t* may vary) and hence their types are abstract to the client.

The constructor returns an empty stack, parametrised by an arbitrary ordinal chosen by the client. The push operation atomically adds an element to the head of the stack. The pop operation atomically removes one element from the head of the stack, if one is available (*i.e.* the stack is non-empty); otherwise it will simply return 0. (As this stack is non-blocking, it would not be possible for the pop operation to wait for the stack to become non-empty.)

Note that the ordinal parametrising the stack is not required to decrease when popping the stack. This means that the stack operations cannot be starved by an unbounded number of pop invocations. This need not be the case in general for a lock-free stack, but it is true for Treiber’s stack. We discuss the ramifications of this kind of specification further in Sect. 6.3.

### 4.2 Implementation

Figure 8 gives an implementation of the stack operations based on Treiber’s stack [20]. The stack is represented as a heap cell containing a pointer (the head pointer) to a singly-linked list of the values on the stack.

Values are pushed onto the stack by allocating a new node holding the value to be pushed and a pointer to the old head of the stack. A compare-and-swap operation updates the old head of the stack to point to the new node. If the operation fails, it will be because the head of the stack has changed, and so the operation is retried.

If the stack is empty (*i.e.* the head points to 0), then pop simply returns 0, without affecting the stack.
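Since Fig. 8 is not reproduced here, the following Python sketch illustrates the push and pop loops as described in the prose. It is an approximation under stated assumptions: `None` plays the role of the null pointer 0, the class and method names are hypothetical, and the internal lock only models the atomicity of a single read or compare-and-swap on the head pointer. Python's garbage collector also sidesteps node reuse, which the proof handles explicitly through the set of discarded nodes.

```python
import threading


class Node:
    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node


class TreiberStack:
    """Sketch of Treiber's stack with a simulated compare-and-swap."""

    def __init__(self):
        self._head = None
        self._lock = threading.Lock()   # atomicity of single steps only

    def _read_head(self):
        with self._lock:
            return self._head

    def _cas_head(self, expected, new):
        with self._lock:
            if self._head is expected:
                self._head = new
                return True
            return False

    def push(self, value):
        while True:
            old = self._read_head()
            node = Node(value, old)          # new node points at old head
            if self._cas_head(old, node):
                return                       # head now points at node

    def pop(self):
        while True:
            old = self._read_head()
            if old is None:
                return 0                     # empty stack: return 0
            if self._cas_head(old, old.next):
                return old.value             # head advanced past old
```

A failed compare-and-swap in either loop means that another push or pop has changed the head in the meantime, which is exactly the interference the ordinal in the specification is used to bound.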

### 4.3 Correctness

*x*, with contents *ns*, and a disjoint set of nodes *ds* (the discarded nodes):

We define a region type \(\mathbf {TStack}\) to hold the shared data-structure. The type is parametrised by the address of the stack, and its abstract state consists of a list of nodes in the stack *ns*, a set of popped nodes *ds*, and an ordinal \(\alpha \). The \(\mathbf {TStack}\) region type has the following interpretation:

Note that for every transition \((ns, ds, \alpha ) \rightsquigarrow (ns', ds', \alpha ')\), we have \( 2 \cdot \alpha \,+\,\left| {ns} \right| > 2 \cdot \alpha '\,+\,\left| {ns'} \right| \). Pushing decreases the ordinal, but extends the length of the stack by 1; popping maintains the ordinal, but decreases the length of the stack. This property allows us to use \(2 \cdot \alpha + \left| {ns} \right| \) as a variant in the compare-and-swap loops, since it is guaranteed to decrease under any interference.
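The inequality can be checked case by case. The following worked derivation is a sketch that assumes only what the paragraph above states about the two kinds of transition (a push takes \(\alpha \) to some \(\alpha ' < \alpha \) and lengthens the list by 1; a pop keeps \(\alpha \) and shortens the list by 1), together with standard facts about ordinal arithmetic.

\[
\begin{array}{ll}
\text{push:} & \alpha ' < \alpha \implies \alpha ' + 1 \le \alpha \implies 2 \cdot \alpha \ge 2 \cdot (\alpha ' + 1) = 2 \cdot \alpha ' + 2, \\[2pt]
 & \text{so } 2 \cdot \alpha + \left| {ns} \right| \ge 2 \cdot \alpha ' + 2 + \left| {ns} \right| > 2 \cdot \alpha ' + \left| {ns} \right| + 1 = 2 \cdot \alpha ' + \left| {ns'} \right| . \\[6pt]
\text{pop:} & \alpha ' = \alpha ,\ \left| {ns'} \right| = \left| {ns} \right| - 1, \text{ so } 2 \cdot \alpha + \left| {ns} \right| > 2 \cdot \alpha + \left| {ns} \right| - 1 = 2 \cdot \alpha ' + \left| {ns'} \right| .
\end{array}
\]

The factor 2 matters: with \(\alpha + \left| {ns} \right| \) alone, a push that lowers \(\alpha \) by exactly one step would leave the sum unchanged.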

*ns*. Consequently, *vs* is the list of values on the stack, rather than pairs of address and value.

## 5 Soundness

The proof of soundness of Total-TaDA is similar to that for TaDA [16] and based on the Views Framework [4]. We use the same model for assertions as that for TaDA. We also use a similar semantic judgement, \(\vDash \), which ensures that the concrete behaviours of programs simulate the abstract behaviours represented by the specifications. The key distinction is that, whereas in TaDA the judgement is defined coinductively (as a greatest fixed point), in Total-TaDA the judgement is defined inductively (as a least fixed point). This means that TaDA admits executions that never terminate, while Total-TaDA requires executions to always terminate: that is, reach a base-case of the inductive definition.

The soundness proof consists of lemmas that justify each of the proof rules for the semantic judgement. Most of the Total-TaDA rules have similar proofs to the corresponding TaDA rules, but proceed by induction instead of coinduction. Of course, the while rule is different, since termination does not follow trivially. We sketch the proof for while. All details are in the technical report [18].

### **Lemma 1**

**(While Rule).** Let \(\alpha \) be an ordinal. If, for all \(\gamma \le \alpha \),

### *Proof*

## 6 Non-blocking Properties

Non-blocking properties are used to characterise concurrent algorithms that guarantee progress. A *lock-free* algorithm guarantees global progress: an individual thread might fail to make progress, but only because some other thread does make progress. A *wait-free* algorithm guarantees local progress: every thread makes progress when it is scheduled. We consider how non-blocking properties can be formalised using Total-TaDA.

### 6.1 Lock-Freedom

We have described lock-freedom in terms of an informal notion of “progress”. In order to properly characterise modules as lock-free, we need a more formal definition. We can characterise global progress for a module as follows: at any time, eventually either a pending operation will be completed or another operation will be begun. If we assume that the number of threads is bounded, then as long as there are pending module operations, some operation will eventually complete. (If the number of threads is unbounded, then there is no guarantee that any operation will complete, even if it is scheduled arbitrarily often, since additional operations can always begin.)

Based on this observation, Gotsman *et al.* [7] reduced lock-freedom to the termination of a simple class of programs, the bounded most-general clients (BMGCs) of a module. Hoffmann *et al.* [10] generalised the result to apply to algorithms where the identity or number of threads is significant. An (*m*, *n*)-bounded general client consists of *m* threads which each invoke *n* module operations in sequence. If all such bounded general clients (for every *n* and *m*)^{3} terminate, then the module is lock-free.
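The shape of an (*m*, *n*)-bounded general client can be sketched in Python. This is an illustration under stated assumptions, not code from the paper: the module is a counter with `incr` and `read` operations, `AtomicCell` is a hypothetical helper whose internal lock only simulates a hardware compare-and-swap, and `run_bounded_client` is a hypothetical harness that plays the role of `init` followed by the parallel composition of the threads.

```python
import threading
from typing import Callable, List


class AtomicCell:
    """Integer cell with compare-and-swap; the lock only simulates atomicity."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def cas(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def incr(cell: AtomicCell) -> int:
    """Module operation: retry the CAS until it succeeds."""
    while True:
        v = cell.load()
        if cell.cas(v, v + 1):
            return v


def read(cell: AtomicCell) -> int:
    """Module operation: a single atomic load."""
    return cell.load()


Operation = Callable[[AtomicCell], int]


def run_bounded_client(clients: List[List[Operation]]) -> int:
    """Run init, then one thread per entry, each invoking its operations in sequence."""
    cell = AtomicCell(0)  # plays the role of init

    def thread_body(ops: List[Operation]) -> None:
        for op in ops:
            op(cell)

    threads = [threading.Thread(target=thread_body, args=(ops,)) for ops in clients]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # returns only if every thread terminates
    return cell.load()


# A (4, 3)-bounded general client: 4 threads, 3 operations each.
final_value = run_bounded_client([[incr, read, incr]] * 4)
print(final_value)  # 8
```

Running one such client shows only that this particular client terminates. Lock-freedom requires termination of every client, for every *m* and *n*, which is what the proof-based approach establishes.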

### **Definition 1**

*O*. Define the following sets of programs:

### **Theorem 1**

**(Hoffmann *et al.* [10]).** Given a module \(\mathcal {M}\), if, for all *m* and *n*, every program \(c \in C_{m,n}\) terminates, then \(\mathcal {M}\) is lock-free.

Using this theorem, we define a specification pattern for Total-TaDA that guarantees lock-freedom and follows easily from the typical specifications we establish for lock-free modules.

### **Theorem 2**

### *Proof*

By Theorem 1, it is sufficient to show that, for arbitrary *m*, *n* and \(c \in C_{m,n}\), the program *c* terminates. Fix the number of threads *m*.

*i*-th component of vector \(\bar{x}\). We denote by \(\sum \bar{x}\) the sum \(\sum _{i=1}^{i=m} x_i\).) Region states are interpreted as follows: \( I(\mathbf {M}_a(s, \bar{x})) \triangleq \exists u .\,\mathsf {M}\left( s, u, \textstyle \sum \bar{x}\right) \). The guard algebra for \(\mathbf {M}\) consists of *m* distinct guards \(\textsc {G}_1, \cdots , \textsc {G}_m\). The state transition system for \(\mathbf {M}\) allows a thread holding guard \(\textsc {G}_i\) to decrease the *i*-th component of the abstract state:

*n*, and \(\mathtt {op} \in O\), using the use atomic rule, we have

Applying this specification repeatedly (by induction), we have for arbitrary \(t \in T_n\)

Let \(c = \mathtt {init} ; (t_1 \Vert \cdots \Vert t_m) \in C_{m,n}\) be arbitrary. We derive a total-correctness specification for *c* easily by choosing \(n \cdot m\) as the initial ordinal and creating an \(\mathbf {M}\)-region with initial state \((n,\cdots ,n)\). Consequently, *c* terminates, as required. \(\square \)
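The bookkeeping in this proof can be illustrated with a small simulation. The sketch below assumes that each operation performed by thread *i* consumes exactly one unit of the *i*-th component of the region state; that is a simplification made for the example, and `simulate_budget` is a hypothetical name. It records the sum of the vector, which plays the role of the ordinal, after every step of an arbitrary interleaving.

```python
import random
from typing import List


def simulate_budget(m: int, n: int, seed: int = 0) -> List[int]:
    """Return the successive values of sum(x) while m threads each perform n operations."""
    rng = random.Random(seed)
    x = [n] * m          # initial region state (n, ..., n)
    history = [sum(x)]   # initial ordinal n * m
    while any(x):
        # The scheduler picks any thread that still has operations to perform.
        i = rng.choice([j for j in range(m) if x[j] > 0])
        x[i] -= 1        # thread i, holding guard G_i, decreases component i
        history.append(sum(x))
    return history


history = simulate_budget(m=3, n=2)
print(history)  # [6, 5, 4, 3, 2, 1, 0]
```

Whatever interleaving the scheduler chooses, the sum starts at \(n \cdot m\) and strictly decreases, so it bounds the total amount of interference that any one operation can observe.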

It is straightforward to apply Theorem 2 to the modules we have considered.

### 6.2 Wait-Freedom

Whereas lock-freedom only requires that *some* thread makes progress, wait-freedom requires that *every* thread makes progress (provided that it is not permanently descheduled). In terms of operations, this requires that each operation of a module should complete within a finite number of steps. Since Total-TaDA specifications guarantee that operations terminate, it is simple to describe a specification that implies that a module is wait-free.

### **Theorem 3**

### *Proof*

The specifications imply that \(\mathsf {M}\) is an invariant which is established by the initialiser and preserved at all times by the module operations. Furthermore, all of the module operations terminate, assuming the environment maintains \(\mathsf {M}\) invariant. Consequently, all of the module operations terminate in the context of an environment calling module operations: the module is wait-free. \(\square \)

Lock-freedom can only be applied to a module as a whole, since it relates to global progress. Wait-freedom, by contrast, relates to local progress — that the operations of *each* thread terminate — and so it is meaningful to consider an individual operation to be wait-free in a context where other operations may be lock-free or even blocking. By combining (partial-correctness) TaDA and Total-TaDA specifications (indicated by \(\vdash \) and \(\vdash _\tau \) respectively), we can give a specification pattern that guarantees wait-freedom for a specific module operation.

### **Theorem 4**

### *Proof*

As before, \(\mathsf {M}\) is a module invariant; \(\mathtt {op}\) is guaranteed to terminate with this invariant, therefore it is wait-free. \(\square \)

The specifications required by Theorem 4 do not follow from those given for our examples. However, where applicable, the proofs can easily be adapted. For instance, to show that the read operation of the counter is wait-free, we would remove the ordinals from the region definition, and abstract the value of the counter. This breaks the termination proof for the increment operations, but we can adapt it to a partial-correctness proof in TaDA. The termination proof for read does not depend on the ordinal parameter of the region, and so we can still establish total correctness, as required.
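The difference between the two counter operations can be seen in a small single-threaded simulation. This is a sketch under stated assumptions rather than the paper's code: `Counter`, the `interfere` hook and `bounded_environment` are hypothetical names, and interference is modelled by calling the hook between the read and the compare-and-swap of an increment.

```python
from typing import Callable, Optional


class Counter:
    """A counter cell; `cas` stands in for an atomic compare-and-swap."""

    def __init__(self) -> None:
        self.value = 0

    def cas(self, expected: int, new: int) -> bool:
        if self.value == expected:
            self.value = new
            return True
        return False


def read(c: Counter) -> int:
    """No loop: completes in one step whatever the environment does."""
    return c.value


def incr(c: Counter, interfere: Optional[Callable[[], None]] = None) -> int:
    """CAS loop: returns the number of attempts that were needed."""
    attempts = 0
    while True:
        attempts += 1
        v = c.value
        if interfere is not None:
            interfere()  # environment step between the read and the CAS
        if c.cas(v, v + 1):
            return attempts


def bounded_environment(c: Counter, budget: int) -> Callable[[], None]:
    """An environment that performs at most `budget` competing increments."""
    remaining = [budget]

    def step() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            incr(c)

    return step


c = Counter()
attempts = incr(c, bounded_environment(c, budget=3))
print(attempts, read(c))  # 4 4
```

With a budget of 3 the increment needs 4 attempts, and an unbounded budget would make the loop diverge. The `read` operation has no retry loop, so its termination does not depend on any bound on the environment.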

### 6.3 Non-impedance

*impede* each other, that is, which operations may prevent the termination of an operation if infinitely many of them are invoked during a (fair) execution of the operation. Our specification implies that read does not impede either read or incr. This is expressed by two edges in the following non-impedance graph:

The stack specification in Fig. 7, much like the counter specification, implies that pop does not impede either push or pop:

The pop operation, however, may be impeded by push.

The non-impedance relationships implied by the stack specification are important for clients. For instance, consider a producer-consumer scenario in which the stack is used to communicate data from producers to consumers. When no data is available, consumers may simply loop attempting to pop the stack. If the pop operation could impede push, then producers might be starved by consumers. In this situation, we could not guarantee that the system would make progress. This suggests that non-impedance, which is captured by Total-TaDA specifications, can be an important property of non-blocking algorithms.
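The producer-consumer scenario can be sketched as follows. This is an illustrative Python program under stated assumptions, not the verified implementation: `TreiberStack` is a hypothetical class in the style of Treiber's stack [20], its internal lock only simulates an atomic compare-and-swap on the head pointer, and `pop` is assumed to return `None` when the stack is empty.

```python
import threading
from typing import List, Optional


class Node:
    def __init__(self, value: int, next_node: Optional["Node"]) -> None:
        self.value = value
        self.next = next_node


class TreiberStack:
    def __init__(self) -> None:
        self._head: Optional[Node] = None
        self._lock = threading.Lock()  # simulates an atomic CAS on the head only

    def _cas_head(self, expected: Optional[Node], new: Optional[Node]) -> bool:
        with self._lock:
            if self._head is expected:
                self._head = new
                return True
            return False

    def push(self, value: int) -> None:
        while True:
            old = self._head
            if self._cas_head(old, Node(value, old)):
                return

    def pop(self) -> Optional[int]:
        while True:
            old = self._head
            if old is None:
                return None  # empty stack: the caller may retry
            if self._cas_head(old, old.next):
                return old.value


def run(producers: int, consumers: int, items_each: int) -> List[int]:
    stack = TreiberStack()
    total = producers * items_each
    received: List[int] = []
    received_lock = threading.Lock()

    def producer(pid: int) -> None:
        for k in range(items_each):
            stack.push(pid * items_each + k)

    def consumer() -> None:
        while True:
            with received_lock:
                if len(received) >= total:
                    return
            item = stack.pop()  # loops on an empty stack until data arrives
            if item is not None:
                with received_lock:
                    received.append(item)

    threads = [threading.Thread(target=producer, args=(p,)) for p in range(producers)]
    threads += [threading.Thread(target=consumer) for _ in range(consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


received = run(producers=2, consumers=2, items_each=50)
print(len(received))  # 100
```

A pop that finds the stack empty never writes to the head pointer, so it cannot cause a producer's compare-and-swap to fail. This is the behaviour the non-impedance specification guarantees to clients without exposing the implementation.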

## 7 Related Work

Hoffmann *et al.* [10] introduced a concurrent separation logic for verifying total correctness. By adapting the most-general-client approach of Gotsman *et al.* [7], they establish that modules are lock-free. (They do not, however, establish functional correctness.) This method involves a thread passing “tokens” to other threads whose lock-free operations are impeded by modifications to the shared state. Subsequent approaches [1, 12] also use some form of tokens that are used up in loops or function calls. These approaches require special proof rules for the tokens. When these approaches are restricted to finite numbers of tokens, support for unbounded non-determinism (as in the backoff increment example of Fig. 5) is limited. In Total-TaDA such token passing is not necessary. Instead, we require the client to provide a general (ordinal) limit on the amount of impeding interference. Consequently, we can guarantee the termination of loops with standard proof rules.

Liang *et al.* [12] have developed a proof theory for termination-preserving refinement, applying it to verify linearisability and lock-freedom. Their approach constrains impedance by requiring that impeding actions correspond to progress at the abstract level. In Total-TaDA, such constraints are made by requiring that impeding actions decrease an ordinal associated with a shared region. Their approach does not freely combine lock-free and wait-free specifications whereas, with Total-TaDA, we can reason about lock- and wait-freedom in combination, and more subtle conditions such as non-impedance. For example, we can show when a read operation of a lock-free data-structure is wait-free. Their specifications establish termination-preserving refinement (given a context, if the abstract program is guaranteed to terminate, then so is the concrete), whereas Total-TaDA specifications establish termination (in a context, the program will terminate).

Boström and Müller [1] have introduced an approach that can verify termination and progress properties of concurrent programs. The approach supports blocking concurrency and non-terminating programs, which Total-TaDA does not. However, the approach does not aim at racy concurrent programs and cannot deal with any of the examples shown in the paper. Furthermore, the relationship between termination and lock- and wait-freedom is not considered.

Of the above approaches, none covers total functional correctness for fine-grained concurrent programs. With Total-TaDA we can reason about clients that use modules, without their implementation details. Moreover, with Total-TaDA it is easy to verify module operations independently, with respect to a common abstraction, rather than considering a whole module at once. Finally, our approach to specification is unique in supporting lock- and wait-freedom simultaneously, as well as expressing more subtle conditions such as non-impedance.

## 8 Conclusions and Future Work

We have introduced Total-TaDA, a program logic that provides local, modular reasoning for proving the termination and functional correctness of non-blocking concurrent programs. With our abstract specifications, clients can reason about total correctness without needing to know about the underlying implementation. Different implementations, satisfying the same specification, can have different termination arguments, but these arguments are not exposed to the clients. By using ordinals to bound interference, our specifications can express traditional non-blocking properties. Moreover, they capture a new notion of *non-impedance*: that one operation does not set back the progress of another.

We have claimed that our approach supports modular reasoning, and substantiated this by reasoning about implementations and clients of modules. We provide further examples in the technical report [18]. In particular, we specify a non-blocking map and verify two implementations, based on lists and hash tables, with the second making use of the first through the abstract specification. We also implement a set specification on top of the map.

*Blocking.* Many concurrent modules make use of *blocking*, for example by using semaphores or monitors. Properties such as starvation-freedom can be expressed in terms of termination, but require the assumption of a fair scheduler. Some aspects of our approach are likely to apply here. However, it is also necessary to constrain future behaviours, for instance, to specify that a lock that has been acquired will be released in a finite time. This might be achieved with a program logic that can reason explicitly about continuations.

*Non-termination.* Some programs, such as operating systems, are designed not to terminate. Such programs should still continually perform useful work. It would be interesting to extend Total-TaDA to specify and verify progress properties of non-terminating systems. Progress can be seen as localised termination, so the same reasoning techniques should apply. However, a different approach to specification will be necessary to express and verify these properties.

## Footnotes

- 1. The parameter *s* of the abstract predicate was mistakenly abstracted in [16]. Technically, it is not possible to abstract it by existentially quantifying in the precondition of the atomic triple.
- 2. We have omitted region levels, analogous to those in TaDA, in our judgements to simplify our presentation. They prevent a region from being opened twice within a single branch of the proof tree, which would unsoundly duplicate resources.
- 3. The bounded *most-general* client may be seen as the program which non-deterministically chooses among all bounded general clients.

## Notes

### Acknowledgements

We thank Bart Jacobs, Hongjin Liang, Peter Müller and the anonymous referees for useful feedback. This research was supported by EPSRC Programme Grants EP/H008373/1 and EP/K008528/1, by the “ModuRes” Sapere Aude Advanced Grant from The Danish Council for Independent Research for the Natural Sciences (FNU) and the “Automated Verification for Concurrent Programs” Individual Postdoc Grant from The Danish Council for Independent Research for Technology and Production Sciences (FTP).

## References

- 1. Boström, P., Müller, P.: Modular verification of finite blocking in non-terminating programs. In: Boyland, J.T. (ed.) 29th European Conference on Object-Oriented Programming, vol. 37, pp. 639–663. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl (2015)
- 2. Boyland, J.: Checking interference with fractional permissions. In: Cousot, R. (ed.) Static Analysis. LNCS, vol. 2694, pp. 55–72. Springer, Heidelberg (2003)
- 3. Cantor, G.: Beiträge zur begründung der transfiniten mengenlehre. Mathematische Annalen **49**(2), 207–246 (1897). http://dx.doi.org/10.1007/BF01444205
- 4. Dinsdale-Young, T., Birkedal, L., Gardner, P., Parkinson, M., Yang, H.: Views: compositional reasoning for concurrent programs. In: POPL, pp. 287–300 (2013)
- 5. Dinsdale-Young, T., Dodds, M., Gardner, P., Parkinson, M.J., Vafeiadis, V.: Concurrent abstract predicates. In: D’Hondt, T. (ed.) ECOOP 2010. LNCS, vol. 6183, pp. 504–528. Springer, Heidelberg (2010)
- 6. Floyd, R.W.: Assigning meanings to programs. In: Proceedings of the American Mathematical Society Symposia on Applied Mathematics, vol. 19, pp. 19–31 (1967)
- 7. Gotsman, A., Cook, B., Parkinson, M., Vafeiadis, V.: Proving that non-blocking algorithms don’t block. In: POPL, pp. 16–28 (2009)
- 8. Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. **13**(1), 124–149 (1991)
- 9. Hessenberg, G.: Grundbegriffe der Mengenlehre. Abhandlungen der Fries’schen Schule / Neue Folge. Vandenhoeck & Ruprecht, Göttingen (1906)
- 10. Hoffmann, J., Marmar, M., Shao, Z.: Quantitative reasoning for proving lock-freedom. In: 2013 28th Annual IEEE/ACM Symposium on Logic in Computer Science (LICS), pp. 124–133. IEEE (2013)
- 11. Jung, R., Swasey, D., Sieczkowski, F., Svendsen, K., Turon, A., Birkedal, L., Dreyer, D.: Iris: monoids and invariants as an orthogonal basis for concurrent reasoning. In: POPL, pp. 637–650 (2015)
- 12. Liang, H., Feng, X., Shao, Z.: Compositional verification of termination-preserving refinement of concurrent programs. In: Proceedings of the Joint Meeting of the Twenty-Third EACSL Annual Conference on Computer Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), p. 65. ACM (2014)
- 13. Massalin, H., Pu, C.: A lock-free multiprocessor os kernel. SIGOPS Oper. Syst. Rev. **26**, 108 (1992)
- 14. Parkinson, M., Bierman, G.: Separation logic and abstraction. In: POPL, pp. 247–258 (2005)
- 15. Reynolds, J.C.: Separation logic: a logic for shared mutable data structures. In: 2002 Proceedings 17th Annual IEEE Symposium on Logic in Computer Science, pp. 55–74. IEEE (2002)
- 16. da Rocha Pinto, P., Dinsdale-Young, T., Gardner, P.: Tada: a logic for time and data abstraction. In: Jones, R. (ed.) ECOOP 2014. LNCS, vol. 8586, pp. 207–231. Springer, Heidelberg (2014)
- 17. da Rocha Pinto, P., Dinsdale-Young, T., Gardner, P.: Steps in modular specifications for concurrent modules (invited tutorial paper). Electron. Notes Theor. Comput. Sci. **319**, 3–18 (2015)
- 18. da Rocha Pinto, P., Dinsdale-Young, T., Gardner, P., Sutherland, J.: Modular termination verification for non-blocking concurrency. Technical report, Imperial College London (2016)
- 19. Svendsen, K., Birkedal, L.: Impredicative concurrent abstract predicates. In: Shao, Z. (ed.) ESOP 2014 (ETAPS). LNCS, vol. 8410, pp. 149–168. Springer, Heidelberg (2014)
- 20. Treiber, R.K.: Systems programming: coping with parallelism. Technical report RJ 5118, IBM Almaden Research Center, April 1986
- 21. Turing, A.M.: Checking a large routine. In: Report of a Conference on High Speed Automatic Calculating Machines, pp. 67–69 (1949). http://www.turingarchive.org/browse.php/B/8
- 22. Turon, A., Dreyer, D., Birkedal, L.: Unifying refinement and hoare-style reasoning in a logic for higher-order concurrency. In: ICFP, pp. 377–390 (2013)