1 Introduction

Development of formally verified software systems using incremental refinement has been successful in many case studies. Often the system developed is sequential, e.g., a compiler. The standard technique then is data refinement [8, 9, 14] or closely related definitions [2].

Our group has developed a verified file system for flash memory [12, 13, 22, 26] using a strategy based on data types specified as abstract state machines (ASMs, [4]), data refinement, and subcomponents. The resulting refinement tower is shown in Fig. 1. It starts with an abstract state machine that specifies the POSIX file system operations. This interface is then refined to an implementation VFS (denoted by VFS \(\sqsubseteq \) POSIX), which calls operations of a submachine AFS. This machine acts as an abstract interface to the next implementation. This continues until the MTD layer is reached, which is the generic interface for flash hardware used in Linux.

Scala code for simulations as well as C code integrated into the Linux kernel has been generated from the implementations (shown in grey). The file system so far is strictly sequential, i.e., all operations are called in sequential order. Adding concurrency is however relevant for practical usability and efficiency on at least three levels: top-level operations, garbage collection and wear leveling.

Since existing refinement strategies are typically designed to start with an atomic specification that is refined to a concurrent system, this raises the question of how to add concurrency a posteriori to intermediate levels of such a refinement tower without losing modularity and without having to start verification from scratch. This paper gives a positive answer to this question by “shifting” parts of the refinement tower, i.e., by modifying individual specifications and implementations to make them concurrent.

Fig. 1. Flashix refinement tower

We will use erase block management (the EBM interface) and the concurrent implementation of wear leveling (WL) based on the interface Blocks as an example to demonstrate how concurrency is added. A description of the sequential specifications and refinements involved has already been published in [23].

The next section gives a simplified version of the relevant sequential specifications and implementation, to demonstrate in Sect. 3 how concurrency using locks is added and how restrictions are encoded as ownership constraints. Section 4 informally introduces the well-known concept of linearizability as the relevant criterion for verifying correctness of concurrent implementations, and shows how the proof of linearizability can be split into one of data refinement (that reuses the original proof) and one of atomicity refinement. Section 5 gives a proof strategy based on rely-guarantee proofs and reduction. Both have been implemented in our KIV [11] theorem prover. The specifications and proofs for the case study are available online [18]. Section 6 gives related work, and Sect. 7 concludes.

2 The Refinement for Wear Leveling

Flash hardware is partitioned into erase blocks. Blocks can be written sequentially, and erased as a whole. Erasing wears out a block until it becomes unusable. Therefore, for efficient usage of a flash device, blocks must be worn out evenly. In particular, if a device is filled to a large part with static data, the blocks with these data must sometimes be swapped with other (currently empty) blocks that have been modified and erased frequently. This is called wear leveling. Wear leveling is hidden from the more abstract levels of the file system by the erase block manager (EBM) interface. The interface offers access to logical blocks. The task of the implementation (WL) is to map them to the physical blocks offered by the hardware, and to change the mapping when this is advisable, using an internal operation for wear leveling that has no effect (implements skip) with respect to the interface EBM.

An abstract specification of the erase block manager is given by the ASM EBM. Its state consists of a function that maps logical block numbers to their content and a set of currently used (“mapped”) block numbers.


For simplicity, we do not specify the type of block contents, except for a default (empty) value. The interface of EBM shown in Fig. 2 allows reading and writing the content of logical blocks. The operations use a semicolon to separate input and output parameters.
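To make the shape of this specification concrete, the following Scala sketch models the EBM state and its read and write operations as described above; all names (EbmModel, ebmWrite, ...) are our own and only approximate the ASM of Fig. 2.

object EbmModel {
  type Content = Vector[Byte]
  val emptyContent: Content = Vector.empty          // default value for unwritten blocks

  // abstract state: contents of logical blocks, and the set of mapped block numbers
  var contents: Map[Int, Content] = Map.empty[Int, Content].withDefaultValue(emptyContent)
  var mapped: Set[Int] = Set.empty

  // write(lnum, c): store content c in logical block lnum and mark it as mapped
  def ebmWrite(lnum: Int, c: Content): Unit = {
    contents = contents.updated(lnum, c)
    mapped = mapped + lnum
  }

  // read(lnum; c): return the current content of logical block lnum
  def ebmRead(lnum: Int): Content = contents(lnum)
}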

Fig. 2. Sequential specification of the erase block manager (EBM)

The implementation of EBM is given by the ASM WL together with the specification Blocks as a submachine. This refinement introduces the distinction between logical and physical blocks. Blocks allows reading and writing of physical blocks, while WL is responsible for the mapping of logical to physical blocks. Furthermore, the wear leveling algorithm is implemented in WL.

To enable wear leveling, each physical block in Blocks contains a header. This header stores which logical block is mapped to the physical block, or that the block is currently unmapped.


The state of Blocks is a function that maps physical block numbers to blocks. Initially all blocks are unmapped and empty.


The interface of Blocks shown in Fig. 3 provides additional functionality to write and read the header of a physical block. Accessing the content of a block requires it to be mapped, i.e., the header of the block must not be unmapped. For wear leveling the interface also offers an interface operation \(\mathbf {blocks\_get\_wl}\) that returns two physical blocks, a source and a target, that are suitable for wear leveling. The actual decision is based on erase counts (also stored in block headers), but we leave the concrete implementation open here. To signal that wear leveling is currently unnecessary, the operation returns a source block with an unmapped header.
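A minimal Scala sketch of the physical block layer state described here might look as follows; Header, PhysBlock, and blocksGetWl are our own names, and the selection of blocks for wear leveling (based on erase counts) is deliberately left open, as in the text.

sealed trait Header                                   // header of a physical block
case object Unmapped extends Header                   // block currently stores no logical block
final case class MappedTo(lnum: Int) extends Header   // block stores logical block lnum

final case class PhysBlock(header: Header, content: Vector[Byte])

object BlocksModel {
  // state: all physical blocks, initially unmapped and empty
  var blocks: Map[Int, PhysBlock] =
    Map.empty[Int, PhysBlock].withDefaultValue(PhysBlock(Unmapped, Vector.empty))

  // blocks_get_wl(; from, to): return a pair of physical blocks suitable for wear
  // leveling; an Unmapped header of the source signals that nothing needs to be done
  def blocksGetWl(): (Int, Int) = (0, 1)              // selection policy (erase counts) left open
}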

Fig. 3. Sequential specification of the physical block layer (Blocks)

The operations of WL are depicted in Fig. 4. To avoid scanning the headers of all blocks, the state of WL maintains an in-memory mapping from logical block numbers to headers, which contain the corresponding physical block numbers if the logical block is mapped.


Reading and writing of content delegates to the corresponding operations of Blocks by following this mapping. If a logical block is unmapped, the write operation first maps this block to an unused physical block by writing a header and updating the mapping. For this purpose Blocks provides an operation \(\mathbf {blocks\_map}\) that returns a fresh block that can be mapped.

The wear leveling operation \(\mathbf {wl\_wear\_leveling}\), which is not visible to clients, first requests a pair of blocks to be wear leveled by calling \(\mathbf {blocks\_get\_wl}\). If the source block is mapped, its header and content are copied to the target block and the mapping is updated. We omit many details here that ensure that a crash in the middle of wear leveling results in a consistent state, see [23].
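The delegation pattern of wl_write and the copy step of wear leveling, as described above, could be sketched in Scala roughly as follows (our reconstruction; the real operations in Fig. 4 additionally handle crash consistency):

object WlSketch {
  // WL state: in-memory mapping from logical to physical block numbers
  var mapping: Map[Int, Int] = Map.empty
  // simplified stand-in for the Blocks submachine
  var headers: Map[Int, Option[Int]]   = Map.empty[Int, Option[Int]].withDefaultValue(None)
  var contents: Map[Int, Vector[Byte]] = Map.empty[Int, Vector[Byte]].withDefaultValue(Vector.empty)
  private var freeList: List[Int] = (0 until 1024).toList
  def blocksMap(): Int = { val p = freeList.head; freeList = freeList.tail; p }

  // wl_write: map the logical block first if necessary, then delegate the content write
  def wlWrite(lnum: Int, c: Vector[Byte]): Unit = {
    val pnum = mapping.getOrElse(lnum, {
      val fresh = blocksMap()                          // fresh, unmapped physical block
      headers = headers.updated(fresh, Some(lnum))     // write its header ...
      mapping = mapping.updated(lnum, fresh)           // ... then update the mapping
      fresh
    })
    contents = contents.updated(pnum, c)
  }

  // wl_wear_leveling: copy header and content from a source to a target block
  def wlWearLeveling(from: Int, to: Int): Unit = headers(from) match {
    case Some(lnum) =>
      headers  = headers.updated(to, Some(lnum)).updated(from, None)
      contents = contents.updated(to, contents(from))
      mapping  = mapping.updated(lnum, to)             // redirect the logical block
    case None => ()                                    // source not mapped: nothing to do
  }
}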

To prove the refinement, three invariants are established over the state of WL.


The three predicates guarantee a valid mapping between logical and physical blocks. The first prohibits two logical blocks from being mapped to the same physical block, the second ensures that each mapped physical block points to the correct logical block, and the third ensures that each mapped physical block also has a matching entry in the in-memory mapping.
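Expressed over such a state, the three invariants could be phrased roughly as the following predicates (our formulation, not the exact KIV definitions):

object WlInvariants {
  // (i) injectivity: no two logical blocks are mapped to the same physical block
  def injective(mapping: Map[Int, Int]): Boolean =
    mapping.values.toSet.size == mapping.size

  // (ii) every mapping entry points to a physical block whose header stores
  //      exactly that logical block
  def headersMatch(mapping: Map[Int, Int], headers: Map[Int, Option[Int]]): Boolean =
    mapping.forall { case (l, p) => headers.getOrElse(p, None).contains(l) }

  // (iii) every mapped physical block has a matching entry in the in-memory mapping
  def mappingComplete(mapping: Map[Int, Int], headers: Map[Int, Option[Int]]): Boolean =
    headers.forall { case (p, h) => h.forall(l => mapping.get(l).contains(p)) }
}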

Fig. 4. Sequential implementation of the wear leveling layer (WL)

The abstraction relation between states of the specification EBM and states of the implementation WL ensures that the mapped blocks of WL conform with the mapped logical blocks of EBM, and that the contents stored in EBM conform to the contents of the mapped physical blocks in Blocks.


Together with the invariants this is sufficient to prove a data refinement using forward simulation.
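A rough sketch of this abstraction relation, over the simplified state of the previous sketches (again with our own names), is:

object WlAbstraction {
  // abs relates an abstract EBM state (mapped set + contents of logical blocks)
  // to the WL/Blocks state (mapping + contents of physical blocks)
  def abs(absMapped: Set[Int], absContents: Map[Int, Vector[Byte]],
          mapping: Map[Int, Int], physContents: Map[Int, Vector[Byte]]): Boolean =
    absMapped == mapping.keySet &&                                   // same logical blocks are mapped
    absMapped.forall { l =>                                          // contents agree via the mapping
      absContents.getOrElse(l, Vector.empty) ==
        physContents.getOrElse(mapping(l), Vector.empty)
    }
}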

3 Adding Concurrency and Ownership

The sequential code calls the wear leveling operation at the end of every other operation. This causes small pauses in between operations. A better solution is to call wear leveling concurrently in a separate thread. This exploits the fact that even the MTD hardware interface is capable of reading and writing different blocks concurrently. This is not possible for individual blocks, since these do not provide random access but can be written sequentially only.

Adding concurrency implies that interface operations are now called concurrently by several threads, and it is natural to assume that they now have an atomic semantics (which is the natural semantics of ASMs, but was not required in a sequential context). We emphasize this by writing \(\mathtt {at}\)(EBM) and \(\mathtt {at}\)(Blocks) for EBM and Blocks with atomic semantics, although the machines are the same. Assuming an atomic semantics for the implementation is however unrealistic.

A simple solution that enforces an atomic semantics for an implementation is to use a single global mutex that is set before each operation and released afterwards. Doing so for the operations of WL would, however, prevent wear leveling from running concurrently.

An implementation of Blocks that uses such a simple locking strategy would correctly enforce atomicity, but it would be too restrictive, as it would prevent concurrent access to different blocks. It would also not be sufficient for the correctness of WL. To understand this, consider the implementation of \(\mathbf {wl\_write}\) in Fig. 4 and a potential interleaving of two concurrent executions of this operation as depicted in Fig. 5. Here two threads write different contents to two different logical blocks. Both logical blocks are unmapped, so unmapped physical blocks are chosen to be mapped by calling \(\mathbf {blocks\_map}\). Although this operation is atomic, it is possible that the same physical block is returned to both threads, since the first thread has not written the new header yet. Both threads would then write to the same physical block, first different headers that point to their respective logical blocks, then different contents. After both writes finish, an inconsistent state is reached: the data written by one of the threads is lost and the injectivity of the block mapping is violated.

Fig. 5. Critical interleaving of two \(\mathbf {wl\_write}\) executions

A concept is needed that enforces, on the level of Blocks, that its implementation can assume that only one thread writes each block at a time, and that the headers are written by a single thread only.

The concept we use is that of threads owning data structures.


An owner can either own a data structure non-exclusively (typically for reading) or exclusively for writing. That a thread owns all headers or some block for reading or writing is specified by two ghost variables, one for the headers (Oheaders) and one for the individual blocks. To ensure that clients of the extended interface shown in Fig. 6 respect the ownership, we add preconditions to the operations that request read-ownership for reading and write-ownership for writing blocks and headers. A thread that wants to call an operation of Blocks must now acquire ownership before the call and can release ownership afterwards. For this purpose the interface is extended with two auxiliary acquire and release operations. These acquire and release full ownership, which is sufficient for the concurrent implementation of wear leveling given below. It would be possible to add operations that acquire and release read-ownership too. Acquiring full ownership has the precondition that there is no current owner. If two threads now try to write the same block, one of them will violate the precondition of acquire (if it tries to acquire) or it will violate the precondition of writing (if it does not). But this is impossible, since submachine calls in implementations are checked to satisfy their preconditions.

Calls to acquire and release in the augmented code of wear leveling will now ensure that ownership is properly acquired. They are used for verification, but are “ghost code” that is eliminated when generating executable code.
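The ownership ghost state and the acquire/release ghost operations could be sketched as follows; the preconditions are modelled with require, and all names (OwnershipGhost, acquireBlock, ...) are our own:

object OwnershipGhost {
  type Thread = Long

  // ghost state: exclusive owner of all headers, and exclusive owner per physical block
  var headerOwner: Option[Thread] = None
  var blockOwner: Map[Int, Thread] = Map.empty

  // acquire full ownership of a block; precondition: nobody owns it yet
  def acquireBlock(t: Thread, pnum: Int): Unit = {
    require(!blockOwner.contains(pnum), "block already owned")
    blockOwner = blockOwner.updated(pnum, t)
  }

  // release ownership again; precondition: the caller is the current owner
  def releaseBlock(t: Thread, pnum: Int): Unit = {
    require(blockOwner.get(pnum).contains(t), "caller does not own this block")
    blockOwner = blockOwner - pnum
  }

  // precondition of writing the content of a block: write-ownership of that block
  def mayWrite(t: Thread, pnum: Int): Boolean = blockOwner.get(pnum).contains(t)
}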

To make sure that calls to acquire never violate their precondition, we have to use locks in the extended wear leveling implementation given in Fig. 8. The simple implementation we give here just uses mutexes.

Fig. 6. Atomic specification of the physical block layer with ownership

The locking and unlocking operations \(\mathbf {mutex\_lock}\) and \(\mathbf {mutex\_unlock}\) are specified as the atomic program statements given in Fig. 7. The definition of \(\mathbf {mutex\_lock}\) uses the program construct \(\mathbf {atomic}\ \varphi \ \{\alpha \}\). The construct blocks the current thread until its guard \(\varphi \) is satisfied. Immediately afterwards, the program \(\alpha \) is executed in a single, indivisible step.
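The blocking behaviour of these two operations can be mimicked in executable Scala with an intrinsic monitor; this only illustrates the “block until the guard holds, then act atomically” semantics of Fig. 7, not the KIV encoding:

final class BooleanMutex {
  private var locked = false

  // mutex_lock: block until the guard (mutex is free) holds,
  // then take the mutex in the same indivisible step
  def mutexLock(): Unit = synchronized {
    while (locked) wait()
    locked = true
  }

  // mutex_unlock: free the mutex and wake up blocked threads
  def mutexUnlock(): Unit = synchronized {
    locked = false
    notifyAll()
  }
}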

Fig. 7. Mutex locking operations

Figure 8 shows the result of applying sufficient locking and ownership acquisition to WL. Additionally, each atomic step gets an individual label (W1–W18, R1–R8, and WL1–WL21) to allow stating assertions for this program point when reasoning about atomicity (see Sect. 5). We refer to the result as the concurrent implementation of the wear leveling layer. Its state is enhanced by a lock that protects the headers of all blocks, and one lock per logical block that protects its contents.

Fig. 8. Concurrent implementation of the wear leveling layer

We use mutexes for all locks, since they match our simplification of acquiring write-ownership only. The actual erase block manager in Flashix employs reader-writer locks whenever parallel reading is unproblematic. The general locking concept of the concurrent implementation is to acquire the header lock only if the mapping from logical to physical blocks needs to be updated. This is the case when writing to an unmapped block or when wear leveling is active. Otherwise, locking only the individual lock of a specific logical block is sufficient. This lock protects the corresponding entry of the block mapping as well as the content of the physical block it is mapped to. With this strategy multiple reads and writes to different, mapped logical blocks are possible, even in parallel to wear leveling.

Fig. 9. Concurrency refinement of the erase block manager

One exception is that the header lock has to be acquired in every \(\mathbf {wl\_write}\) execution (W2–W14 in Fig. 8), at least for a short amount of time. This is due to the locking hierarchy that is employed to avoid deadlocks. When running in parallel, a \(\mathbf {wl\_write}\) and a \(\mathbf {wl\_wear\_leveling}\) may both need to acquire the header lock and the same block lock, so it must be ensured that both operations request the locks in the same order. Because \(\mathbf {wl\_wear\_leveling}\) needs to own the headers to get suitable physical blocks at WL4 before a logical block can be locked, \(\mathbf {wl\_write}\) must request the header lock ahead of requesting the block lock.
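The resulting lock order of wl_write (header lock first, per-block lock second, header lock released before the content write) can be sketched as follows; this is our reconstruction of the pattern of Fig. 8, using standard Java locks instead of the modelled mutexes:

import java.util.concurrent.locks.ReentrantLock

final class WlLocking(headerLock: ReentrantLock, blockLock: Int => ReentrantLock) {
  // wl_write: respect the locking hierarchy to avoid deadlocks with wear leveling
  def wlWrite(lnum: Int, mapIfNeeded: () => Unit, writeContent: () => Unit): Unit = {
    headerLock.lock()               // header lock is always requested first, cf. W2-W14
    try {
      mapIfNeeded()                 // update the logical-to-physical mapping if unmapped
      blockLock(lnum).lock()        // per-block lock protects mapping entry and content
    } finally headerLock.unlock()   // header lock released before the (slow) content write
    try writeContent()              // the actual write runs under the block lock only
    finally blockLock(lnum).unlock()
  }
}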

Figure 9 shows the resulting refinement of EBM. Proving it correct using linearizability is discussed in detail in the next sections. It remains to integrate the new “shifted” refinement into the refinement tower. The layers above can remain untouched, since the atomic specification is identical to the sequential EBM, and sequential use of EBM is not problematic. Below Blocks an adjustment is necessary: a simple one is to use a global lock around the operations of its implementation. Since this level is already close to the MTD hardware interface, the real solution propagates ownership down to ownership at the hardware level (where blocks store a sequence of bytes instead of a header and content).

4 Linearizability and Atomicity Refinement

The standard correctness criterion we use to prove correctness of the refinement from Fig. 9 is linearizability. A formal definition can be found in [15]; we only give an informal description here.

A concurrent implementation CASM with nonatomic programs \(COP_i\) is linearizable to an atomic specification AASM with atomic operations \(AOP_i\), if the input/output behaviors of each concurrent run can be explained by mapping them to the sequential input/output behavior of some sequential run of AASM.

Fig. 10. Splitting the refinement

The mapping between a concurrent and a sequential run is as follows: for each concurrent call of an operation \(COP_i\) that is started at time \(t_i\) and returns at time \(t'_i\) find some point in time \(l_i\) with \(t_i \le l_i \le t'_i\), such that all \(l_i\) are different. The point is called the linearization point of the operation call. Then construct some sequential run of AASM that executes each corresponding abstract operation \(AOP_i\) atomically at time \(l_i\). Note that even for fixed linearization points this may give several sequential runs if the abstract operations are nondeterministic.

A refinement from AASM to CASM is then linearizable if, for every concurrent run, linearization points and an abstract sequential run can be found such that all operation calls have the same inputs and outputs.

The clients of the interface then cannot distinguish the concurrent run from one where each operation call is delayed until time \(l_i\), executes \(AOP_i\) atomically, and then is delayed again until time \(t'_i\).

Our proof technique uses an intermediate machine that is the same as the concurrent implementation, but executes the code of each operation as one atomic step. This splits the refinement problem into three parts as shown in Fig. 10. The first is the data refinement, which we have already proved (since the ASMs are the same as the sequential ones). The second is a trivial refinement that abstracts from the locking/unlocking (and acquire/release) instructions, since the overall effect of locking and unlocking within one atomic step is empty. The third is the atomicity refinement, where both machines have the same data and operations, but different atomicity. Splitting the refinement from an atomic AASM to a concurrent CASM by using an intermediate \(\mathtt {at}\)(CASM), which executes the operations of CASM atomically, has the advantage that data refinement is completely decoupled from atomicity refinement.

The next section describes a proof strategy for proving the atomicity refinement between the intermediate atomic machine and the concurrent implementation, which is the new problem we get from adding concurrency to the refinement tower.

5 Proof Strategy for Atomicity Refinement

The proof strategy we use to prove atomicity refinement consists of two steps. First we prove that the concurrent runs of the implementation satisfy assertions at all program points. These proofs use thread-local reasoning with the rely-guarantee calculus. They additionally ensure termination and deadlock-freedom, which are not implied by linearizability alone. Second we prove that, based on the assertions, atomic program steps can be reduced to larger and larger atomic steps until we arrive at the machine that executes each operation atomically. We sketch the basic strategy in the first subsection and give results for the case study in Sect. 5.2.

5.1 Rely-Guarantee Proofs and Reduction

The variant of the rely-guarantee calculus used here is similar to the one given in [30], Section 5. The basic correctness statement relates a program \(\alpha \), which is assumed to be the sequential program of some thread that executes atomic steps, to a precondition, a postcondition, a rely, a guarantee, and a global invariant. The atomic steps of \(\alpha \) alternate with environment steps, where one environment step is an arbitrary sequence of steps of other threads.

The program is assumed to use the state variables \(\underline{x}\). The precondition, the postcondition, a predicate characterizing states in which the program does not block, and the global invariant are predicates over this state. The rely and the guarantee restrict environment and program steps; they are predicates over the state before and after a step. We write arguments of predicates only if they differ from these standard ones.

The statement asserts that program \(\alpha \), when started in a state that satisfies the precondition and the global invariant, will execute steps that satisfy the guarantee and preserve the invariant, as long as all previous environment steps satisfy the rely and preserve the invariant too. No program step will block when the non-blocking predicate holds at that time. In addition, when all environment steps satisfy the rely and preserve the invariant, the program will either terminate in a final state that satisfies the postcondition, or it will stop in a blocked state where the non-blocking predicate is false.
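In generic notation (ours, not the exact KIV syntax), such a statement can be pictured as a Jones-style judgement

\(R,\; G,\; Inv \;\vdash\; \{Pre\}\ \alpha\ \{Post\}\)

read as: under environment steps satisfying the rely \(R\) and preserving \(Inv\), every program step of \(\alpha \) satisfies the guarantee \(G\) and preserves \(Inv\), and \(\alpha \), started from \(Pre\), terminates in a state satisfying \(Post\) unless it blocks in a permitted way.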

The calculus to prove such statements in KIV is based on symbolic execution. The basic rule executes one atomic step at a label L that is annotated with an assertion \(\varphi _L\); it reduces the conclusion to three premises. The first premise states that before executing \(\alpha \) the assertion at the initial label holds, and that the first step does not block (its guard \(\varphi \) holds) whenever the non-blocking predicate is true.

The second premise uses a Dynamic Logic (diamond) formula which asserts that the sequential program \(\alpha \) has a terminating run that yields a state \(\underline{x'}\). The premise ensures that the first atomic step of the program, which executes \(\alpha \), satisfies the guarantee and preserves the invariant.

The third premise continues symbolic execution with the rest of the program. Its precondition uses two sets \(\underline{x}_0\) and \(\underline{x}_1\) of fresh variables to represent the two old states before and after the first atomic program step. The subsequent environment step from \(\underline{x}_1\) to the current state \(\underline{x}\) is assumed to satisfy the rely. Since rely steps preserve the invariant, it can be assumed for the current state again.

One common instance of the rule is a parallel assignment \(\underline{y} := \underline{t}\), which can be viewed as an abbreviation for atomic true {\(\underline{y} := \underline{t}\)}. In this case the Dynamic Logic formula reduces to stating that \(\underline{y}\) now equals \(\underline{t}\), while the remaining variables \(\underline{z}\) from \(\underline{x}\) that are not assigned stay unchanged.

The rules for other constructs like conditionals resemble the usual rules for symbolic execution of programs, except that, similar to the rule above, they have rely steps in between program steps and side conditions for assertions and the guarantee. For loops, a loop invariant (that holds at the start of each iteration) and a variant that decreases in a wellfounded order are needed. Proofs for recursive routines need wellfounded induction.

Individual rely-guarantee proofs for single threads can be combined to a rely-guarantee property of a concurrent system. The crucial property that needs to hold for this to work is that the relies and guarantees must be compatible: the guarantee of each thread must imply the relies of all other threads. For our state machines, where all threads are known to execute the same operations, the guarantee can be chosen to be the rely itself, the weakest guarantee possible that is trivially compatible. The system is deadlock-free if the disjunction of the non-blocking predicates of all threads holds. When a mutex is used, the predicate of a thread is chosen to state that the mutex is either free or held by that thread, which implies this condition. This easily generalizes to the hierarchy of locks used in the case study.

In summary, to verify assertions for a specification of a concurrent state machine with operations \(OP_i\), the user has to provide an invariant, a rely, and an idle predicate. The latter describes states where a thread is not currently executing an operation. From these, predicate logic proof obligations (e.g. the rely must be reflexive, initial states satisfy the invariant, etc.) are generated, together with a rely-guarantee proof obligation for each operation.

Successful verification guarantees that each of the assertions \(\varphi _L\) holds every time a thread reaches label L, that the operations terminate and that the implementation is deadlock-free.

The verified assertions are then used to combine atomic statements into larger ones following Lipton’s [19] strategy of reduction. The idea is that a thread executing two atomic steps \(At_{L1}\) and \(At_{L2}\) (at labels L1 and L2) with an environment step in between is often equivalent to first executing the environment step and then \(At_{L1}\) and \(At_{L2}\) with no intermediate environment step. In this case the two steps can be merged into one atomic step.

Fig. 11. \(At_{L1}\) commutes to the right of environment step \(At_M; At_N\)

Reversing the order of first executing \(At_{L1}\) and then an environment step is possible if all steps of other threads that could be part of the environment step commute to the right with \(At_{L1}\), in the sense that executing them in either order gives the same final state. In this case \(At_{L1}\) is called a right mover. Analogously, a step that commutes to the left with all steps is called a left mover. Figure 11 shows an example where the environment step consists of two steps \(At_{M}\) and \(At_{N}\) of other threads. The original run is shown at the bottom, the alternative run which allows executing \(At_{L1}\) and \(At_{L2}\) as one atomic step at the top. The intermediate states of the runs are different, but they reach the same final state.

The atomic steps of the programs can all be written in the form \(\mathbf {atomic}\ \psi _{L}\ \{\alpha _{L}\}\), where L is the label, \(\psi _{L}\) the guard, and \(\varphi _L\) the assertion established at L. The guard is true for all statements except locking instructions, cf. Fig. 7. Program \(\alpha _{L}\) is either an assignment or the call of a submachine operation. For a conditional or a while loop with test \(\delta \), \(\alpha _{L}\) is defined to be \(b := \delta \) using a fresh variable b, while binding a local variable \(\mathbf{let}\ y = t\ \mathbf{in} \ldots \) gives \(\alpha _{L} \equiv \{ y := t\}\). The formal condition for a step \(At_{L}\) to commute to the right with a step \(At_{M}\) executed by another thread is

\(\varphi _{L} \wedge \varphi '_{M} \wedge \psi _{L} \wedge \langle \alpha _{L}\rangle (\psi '_{M} \wedge \langle \alpha '_{M}\rangle \, \underline{x} = \underline{x}_0) \rightarrow \psi '_{M} \wedge \langle \alpha '_{M}\rangle (\psi _{L} \wedge \langle \alpha _{L}\rangle \, \underline{x} = \underline{x}_0)\)   (1)

In the formula, \(At'_{M}\), \(\varphi '_{M}\), \(\psi '_{M}\), and \(\alpha '_{M}\) are variants that rename the thread-local variables used in \(At_{M}\) to new, primed variables disjoint from the shared state and the local variables of \(At_{L}\). The criterion critically uses the assertions at both labels, since they often show that the preconditions of the implication contradict each other, trivializing the proof. If, for example, the two steps are both in a region where a common lock is needed, they commute trivially: \(\varphi _{L}\) implies that the executing thread holds the lock, while \(\varphi '_{M}\) implies that the other thread holds it, so the proof obligation trivially holds. A general result is that locking is always a right mover, while unlocking is always a left mover.
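As a small, self-contained illustration of the commutativity check (our own example, not one of the generated proof obligations): two atomic updates that touch disjoint variables commute in both directions, so the corresponding steps are both left and right movers.

object MoverDemo {
  final case class State(a: Int, b: Int)

  val stepL: State => State = s => s.copy(a = s.a + 1)   // step of one thread
  val stepM: State => State = s => s.copy(b = s.b * 2)   // step of another thread

  // executing the steps in either order yields the same final state
  def commutes(s: State): Boolean = stepM(stepL(s)) == stepL(stepM(s))

  def main(args: Array[String]): Unit =
    println(commutes(State(a = 1, b = 2)))                // prints true
}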

Combining steps into larger steps can be translated into rules for making statements like sequential composition, conditionals and loops atomic when their parts are already atomic. We use rules similar to the reduction rules given in [10]. Iterated application gives larger and larger atomic blocks. Ideally, the final result is that the whole concurrent program of one operation has been combined into a single atomic step. If this is possible, then a linearizability proof becomes trivial, as the linearization point then simply is the single atomic step.

5.2 Proving the Case Study

The main task for proving the atomicity refinement of the case study is to find assertions, rely conditions and a global invariant that are strong enough to allow atomicity refinement.

The rely conditions are derived from the crucial ideas of which data structures are protected from being changed while a thread holds a certain lock or ownership. This results in several such clauses.


The only rely that is somewhat difficult to find is the last one: if a thread locks a logical block, then other threads are not allowed to change any block header to point to or to point away from that logical block.

The global invariant and the assertions are derived from several sources. First, ownership as used in the Blocks interface has to be compatible with the use of locks.

(2)
(3)
(4)

Invariant (2) states that the headers are owned only if the header lock has been taken. Invariant (3) states that a mapped physical block can be owned (and therefore changed) only if the corresponding logical block that is stored in its header is locked. For unmapped blocks, property (4) states that they can be owned only by a thread that has taken the header lock.

Second, the three global invariants of the sequential code are relevant. Dropping them completely would result in illegal states where, e.g., the block mapping is no longer injective. However, the invariants of the sequential verification are only guaranteed to hold in idle states, where no thread is running. So it is necessary to give weaker assertions for intermediate states that are still sufficient to avoid illegal ones.

For the given case study, it turns out that two of the invariants are preserved by all steps, but that the invariant requiring a matching mapping entry for every mapped physical block does not hold while the headers are locked. As a result the global invariant can include it only when the headers are currently not owned (\(Oheaders = readers(\emptyset )\)). To establish this assertion after the step that releases the header lock, assertions have to be given for all labels where the lock is taken. For writing, the predicate is violated between the line where the header of the block has been set and the line where the mapping is updated. For all lines in this range the assertion states: if the mapping entry were already updated, then the predicate would hold. The wear leveling algorithm requires similar assertions for the corresponding range of labels.

Finally, assertions are sometimes necessary for the code after a test or after assignments to a variable. In a purely sequential setting, the test at R2 ensures that the tested formula holds until the subsequent let binding at line R4, which in turn establishes a property of the bound variable that is used later on. However, in the concurrent setting the state may be changed by other threads, destroying each of these properties. In the given case, the rely conditions are strong enough to propagate the formulas, so we assert that the first formula holds at line R4, while the second holds for lines R5–R7. A number of similar assertions are needed for other local variables.

Proving the rely-guarantee proof obligations for the individual programs constitutes the main effort in verifying the concurrent setting. This is in line with case studies we have done for lock-free algorithms [25, 27, 28, 29], where proving rely-guarantee assertions caused the main effort too.

After establishing assertions for all program points, the program can then be reduced, combining atomic steps into larger ones. This requires finding out which steps are left or right movers (or both). The current strategy implemented in KIV uses simple syntactic checks to determine whether the resulting commutativity requirement (1) is trivial: either the accessed variables are disjoint, or the preconditions of the proof obligation trivially reduce to false. Otherwise it is possible to generate proof obligations by manually asserting that certain steps (identified by their label) are left or right movers (or both).

For the case study, manual specifications of mover types are currently necessary for the atomic calls \(\mathbf {blocks\_acquire}\) (right mover) and \(\mathbf {blocks\_release}\) (left mover) of Blocks. The reader may check that the ownership preconditions trivially imply that the other operations of Blocks are both left and right movers. After the mover types have been determined, the reduction rules are applied automatically to form maximally large atomic blocks.

This immediately results in a single atomic block for \(\mathbf {wl\_write}\) and \(\mathbf {wl\_read}\). Reducing \(\mathbf {wl\_wear\_leveling}\) creates three atomic blocks. The first ends at the conditional at line WL6 and is a right mover. The second is the let-block WL7–WL19. The third consists of the last two lines WL20–WL21 and is a left mover. The conditional cannot be reduced, since its then-branch requires the lock for the block to be free, while the empty else-branch does not have this guard. With the atomic blocks now being much larger than before, it becomes possible to prove much stronger invariants that hold just in between blocks, but did not hold for the original programs. In particular, since all locking and unlocking of blocks is now within atomic regions, the simple invariant that all block locks are always free can be established using another simple rely-guarantee proof. With the new invariant established, another reduction step finds that the conditional at line WL6 can now be reduced to an atomic block. Together with the initial and the final block being right resp. left movers already, the wear leveling code is combined by another reduction step into a single step. This implies that the concurrent implementation of wear leveling is indeed linearizable and a correct refinement.

6 Related Work

Related work on wear leveling and the flash file system we have developed has already been given in [23], where the full version of the sequential wear leveling algorithm has been specified.

This paper is based on the PhD thesis of Jörg Pfähler [21], where concurrency was added to the full wear leveling algorithm. The full version needs to add ownership annotations and locks to several refinements. This version is now used in our actual flash file system implementation. The thesis also contains extensions that allow verifying crash-safety, which we could not address in this paper.

The flash file system by Damchoom et al. [7] has concurrent wear leveling. The synchronization between threads is implicitly performed by the semantics of Event-B models, i.e., an event in an Event-B model is always executed atomically, and not explicitly via locks or other synchronization primitives. This makes the step to actual running code more difficult and less straightforward. The full erase block management used in our flash file system is also more general, because it does not use additional bits of out-of-band data of an erase block.

Verification of concurrent, lock-based systems is of course a very broad topic with lots of important contributions, and the proof techniques we use are from this field. We are not aware of other formal methods that specifically address the question of this paper: how to add concurrency a posteriori to an existing modular, sequential system, without having to prove the system from scratch. Adding concurrency to components of an existing software system to increase efficiency is however a recurring software engineering task that should be supported by formal methods.

Refinement and abstraction of atomicity is quite common for concurrent systems, and many refinement definitions for concurrent systems like [1] or [20] address refinements of atomicity. The refinement calculus of Back [3] uses the opposite direction. It starts out with an atomic program and splits it into smaller actions in refinement steps.

The calculus of atomic actions due to Elmas et al. [10] is an extension of Lipton’s [19] original approach for highly concurrent, linearizable programs. For highly concurrent systems it provides a more incremental verification methodology than the calculus given here, and its implementation is better automated. The assertions and invariants are incrementally validated in [10], whereas here a rely/guarantee proof is used to validate them before applying any reductions. The rules of the calculus in [10] address partial correctness, so termination would have to be proven differently. Nevertheless, many of the reduction rules given in [10] are directly used in our approach too.

Ownership annotations are used in the C verifier VCC [6] and Spec# [16] in order to ensure data-race freedom of the code. They are typically coupled to objects of the programming language, while we decouple the use of ownership from objects. Fractional permissions [5] in concurrent versions of separation logics [24] serve a similar purpose as ownership. These are for example supported by the C code verifier VeriFast [17].

7 Conclusion

We have presented an approach for adding concurrency to an existing refinement tower. The approach adds concurrency by enhancing some of the components of the refinement tower. Abstract interfaces are extended with acquire and release operations that specify the allowed concurrency. In our case study, concurrent writes on different blocks are possible, while concurrent writes on the same block are disallowed. Concurrent code using these interfaces then becomes possible, enhancing the existing sequential code with suitable locking strategies. We have evaluated this strategy of “shifting parts of the refinement tower” by making wear leveling concurrent in the Flashix file system. Specifications using the same concept have been defined for concurrent garbage collection, with executable code already running. Verification is work in progress. We also work on allowing concurrent calls of the POSIX file system operations.