1 Introduction

For distributed applications, keeping either a single copy of data at one location or multiple fully-synchronized copies (i.e. state-machine replication) at different locations makes the application susceptible to loss of availability due to network and machine failures. On the other hand, having multiple un-synchronized replicas of the data results in high availability, fault tolerance and uniform low latency, albeit at the expense of consistency. In the latter case, an update issued at one replica can be asynchronously transmitted to other replicas, allowing the system to operate continuously even in the presence of network or node failures [8]. However, mechanisms must now be provided to ensure replicas are kept consistent with each other in the face of concurrent updates and arbitrary re-ordering of such updates by the underlying network.

Over the last few years, Conflict-free Replicated Datatypes (CRDTs) [19, 20, 21] have emerged as a popular solution to this problem. In op-based CRDTs, when an operation on a CRDT instance is issued at a replica, an effector (basically an update function) is generated locally, which is then asynchronously transmitted (and applied) at all other replicas. Over the years, a number of CRDTs have been developed for common datatypes such as maps, sets, lists, graphs, etc.

The primary correctness criterion for a CRDT implementation is convergence (sometimes called strong eventual consistency [9, 20] (SEC)): two replicas which have received the same set of effectors must converge to the same CRDT state. Because of the weak default guarantees assumed to be provided by the underlying network, however, we must consider the possibility that effectors can be applied in arbitrary order on different replicas, complicating correctness arguments. This complexity is further exacerbated because CRDTs impose no limitations on how often they are invoked, and may assume additional properties on network behaviour [14] that must be taken into account when formulating correctness arguments.

Given these complexities, verifying convergence of operations in a replicated setting has proven to be challenging and error-prone [9]. In response, several recent efforts have used mechanized proof assistants to yield formal machine-checked proofs of correctness [9, 24]. While mechanization clearly offers stronger assurance guarantees than handwritten proofs, it still demands substantial manual proof engineering effort to be successful. In particular, correctness arguments are typically given in terms of constraints on CRDT states that must be satisfied by the underlying network model responsible for delivering updates performed by other replicas. Relating the state of a CRDT at one replica with the visibility properties allowed by the underlying network has typically involved constructing an intricate simulation argument or crafting a suitably precise invariant to establish convergence. This level of sophisticated reasoning is required for every CRDT and consistency model under consideration. There is a notable lack of techniques capable of reasoning about CRDT correctness under different weak consistency policies, even though such techniques exist for other correctness criteria such as preservation of state invariants [10, 11] or serializability [4, 16] under weak consistency.

To overcome these challenges, we propose a novel automated verification strategy that does not require complex proof-engineering of handcrafted simulation arguments or invariants. Instead, our methodology allows us to directly connect constraints on events imposed by the consistency model with constraints on states required to prove convergence. Consistency model constraints are extracted from an axiomatization of network behavior, while state constraints are generated using reasoning principles that determine the commutativity and non-interference of sequences of effectors, subject to these consistency constraints. Both sets of constraints can be solved using off-the-shelf theorem provers. Because our approach is parametric in the weak consistency scheme, we are able to analyze the problem of CRDT convergence under widely different consistency policies (e.g., eventual consistency, causal consistency, parallel snapshot isolation (PSI) [23], among others), and for the first time verify CRDT convergence under such stronger models (efficient implementations of which are supported by real-world data stores). A further pleasant by-product of our approach is a pathway to take advantage of such stronger models to simplify existing CRDT designs and allow composition of CRDTs to yield new instantiations for more complex datatypes.

The paper makes the following contributions:

  1. We present a proof methodology for verifying the correctness of CRDTs amenable to automated reasoning.

  2. We allow the proof strategy to be parameterized on a weak consistency specification that allows us to state correctness arguments for a CRDT based on constraints imposed by these specifications.

  3. We experimentally demonstrate the effectiveness of our proposed verification strategy on a number of challenging CRDT implementations across multiple consistency schemes.

Collectively, these contributions yield (to the best of our knowledge) the first automated and parameterized proof methodology for CRDT verification.

The remainder of the paper is organized as follows. In the next section, we provide further motivation and intuition for our approach. Section 3 formalizes the problem definition, providing an operational semantics and axiomatizations of well-known consistency specifications. Section 4 describes our proof strategy for determining CRDT convergence that is amenable to automated verification. Section 5 provides details about our implementation and experimental results justifying the effectiveness of our framework. Section 6 presents related work and conclusions.

Fig. 1. A simple Set CRDT definition.

2 Illustrative Example

We illustrate our approach using a Set CRDT specification as a running example. A CRDT \((\varSigma ,O,\sigma _{\mathsf {init}})\) is characterized by a set of states \(\varSigma \), a set of operations O and an initial state \(\sigma _{\mathsf {init}} \in \varSigma \), where each operation \(o \in O\) is a function with signature \(\varSigma \rightarrow (\varSigma \rightarrow \varSigma )\). The state of a CRDT is replicated, and when operation o is issued at a replica with state \(\sigma \), the effector \(o(\sigma )\) is generated, which is immediately applied at the local replica (which we also call the source replica) and transmitted to all other replicas, where it is subsequently applied upon receipt.

Additional constraints on the order in which effectors can be received and applied at different replicas are specified by a consistency policy, discussed below. In the absence of any such additional constraints, however, we assume the underlying network only offers eventually consistent guarantees - all replicas eventually receive all effectors generated by all other replicas, with no constraints on the order in which these effectors are received.

Consider the simple Set CRDT definition shown in Fig. 1. Let E be an arbitrary set of elements. The state space \(\varSigma \) is \(\mathbb {P}(E)\), the powerset of E. Add(a):S denotes the operation Add(a) applied on a replica with state S, which generates an effector that simply adds a to the state of every replica to which it is applied. Similarly, Remove(a):S generates an effector that removes a on all replicas to which it is applied. Lookup(a):S is a query operation which checks whether the queried element is present in the source replica S.

Fig. 2. A definition of an ORSet CRDT.

A CRDT is convergent if during any execution, any two replicas which have received the same set of effectors have the same state. Our strategy to prove convergence is to show that any two effectors of the CRDT pairwise commute with each other modulo a consistency policy, i.e. for two effectors \(e_1\) and \(e_2\), \(e_1 \circ e_2 = e_2 \circ e_1\). Our simple Set CRDT clearly does not converge when executed on an eventually consistent data store since the effectors \(e_1 =\) Add(a):S\(_1\) and \(e_2 =\) Remove(a):S\(_2\) do not commute, and the semantics of eventual consistency imposes no additional constraints on the visibility or ordering of these operations that could be used to guarantee convergence. For example, if \(e_1\) is applied to the state at some replica followed by the application of \(e_2\), the resulting state does not include the element a; conversely, applying \(e_2\) to a state at some replica followed by \(e_1\) leads to a state that does contain the element a.
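The failure of commutativity described above can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `add`, `remove` and `compose` are ours, states are modelled as frozensets of strings, and the generating state is passed as a second argument even though the simple Set ignores it.

```python
from typing import Callable, FrozenSet

State = FrozenSet[str]
Effector = Callable[[State], State]


def add(a: str, source: State) -> Effector:
    """Add(a):S. The generating state `source` is ignored; the effector inserts a."""
    return lambda s: s | {a}


def remove(a: str, source: State) -> Effector:
    """Remove(a):S. The generating state is ignored; the effector deletes a."""
    return lambda s: s - {a}


def compose(f: Effector, g: Effector) -> Effector:
    """(f o g)(s) = f(g(s)), so g is applied first."""
    return lambda s: f(g(s))


empty: State = frozenset()
e1 = add("a", empty)      # e_1 = Add(a):S_1
e2 = remove("a", empty)   # e_2 = Remove(a):S_2

e1_then_e2 = compose(e2, e1)(empty)   # apply e_1, then e_2
e2_then_e1 = compose(e1, e2)(empty)   # apply e_2, then e_1
print(e1_then_e2 == e2_then_e1)
```

The two application orders leave the element a absent in one case and present in the other, so two replicas that have received the same effectors can disagree.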

However, while commutativity is a sufficient property to show convergence, it is not always a necessary one. In particular, different consistency models impose different constraints on the visibility and ordering of effectors that can obviate the need to reason about their commutativity. For example, if the consistency model enforces Add(a) and Remove(a) effectors to be applied in the same order at all replicas, then the Set CRDT will converge. As we will demonstrate later, the PSI consistency model exactly matches this requirement. To further illustrate this, consider the definition of the ORSet CRDT shown in Fig. 2. Here, every element is tagged with a unique identifier (coming from the set I). Add(a,i):S simply adds the element a tagged with i, while Remove(a):S returns an effector that when applied to a replica state will remove all tagged versions of a that were present in S, the source replica.

Fig. 3. A variant of the ORSet using tombstones.

Suppose \(e_1=\) Add(a,i):S\(_1\) and \(e_2=\) Remove(a):S\(_2\). If it is the case that S\(_2\) does not contain (a,i), then these two effectors are guaranteed to commute because \(e_2\) is unaware of (a,i) and thus behaves as a no-op with respect to effector \(e_1\) when it is applied to any replica state. Suppose, however, that \(e_1\)’s effect was visible to \(e_2\); in other words, \(e_1\) is applied to S\(_2\) before \(e_2\) is generated. There are two possible scenarios that must be considered. (1) Another replica (call it S’) has \(e_2\) applied before \(e_1\). Its final state reflects the effect of the Add operation, while S\(_2\)’s final state reflects the effect of applying the Remove; clearly, convergence is violated in this case. (2) All replicas apply \(e_1\) and \(e_2\) in the same order; the interesting case here is when the effect of \(e_1\) is always applied before \(e_2\) on every replica. The constraint that induces an effector order between \(e_1\) and \(e_2\) on every replica as a consequence of \(e_1\)’s visibility to \(e_2\) on S\(_2\) is supported by a causally consistent distributed storage model. Under causal consistency, whenever \(e_2\) is applied to a replica state, we are guaranteed that \(e_1\)’s effect, which adds (a,i) to the state, would have occurred. Thus, even though \(e_1\) and \(e_2\) do not commute when applied to an arbitrary state, their execution under causal consistency nonetheless allows us to show that all replica states converge. The essence of our proof methodology is therefore to reason about commutativity modulo consistency - it is only for those CRDT operations unaffected by the constraints imposed by the consistency model that proving commutativity is required. Consistency properties that affect the visibility of effectors are instead used to guide and simplify our analysis. 
Applying this notion to pairs of effectors in arbitrarily long executions requires incorporating commutativity properties under a more general induction principle to allow us to generalize the commutativity of effectors in bounded executions to the unbounded case. This generalization forms the heart of our automated verification strategy.
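The two ORSet scenarios discussed above can be made concrete with a small Python sketch. It follows the informal description of Add(a,i):S and Remove(a):S given in the text; the function names and the helper `commute_on` are illustrative assumptions, and checking commutativity on a single state is of course only a test, not a proof.

```python
def add(a, i, source):
    """Add(a,i):S. The effector inserts the tagged pair (a, i)."""
    return lambda s: s | {(a, i)}


def remove(a, source):
    """Remove(a):S. The effector deletes the tagged copies of a seen in S."""
    observed = frozenset(p for p in source if p[0] == a)
    return lambda s: s - observed


def commute_on(f, g, s):
    """Check f o g = g o f on the single state s."""
    return f(g(s)) == g(f(s))


empty = frozenset()
e1 = add("a", 1, empty)

# Scenario A: the Remove is generated at a replica that has not seen (a, 1).
e2_unaware = remove("a", empty)
# Scenario B: e1 is visible to the Remove, i.e. applied before it is generated.
e2_aware = remove("a", e1(empty))

unaware_commutes = commute_on(e1, e2_unaware, empty)
aware_commutes = commute_on(e1, e2_aware, empty)

# Under causal consistency only the order "e1 before e2_aware" is allowed.
causal_result = e2_aware(e1(empty))
```

In scenario A the Remove effector is a no-op with respect to (a, 1) and the pair commutes; in scenario B it does not, and convergence relies on causal consistency ruling out the order in which the Remove is applied first.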

Figure 3 defines an ORSet with “tombstone” markers used to keep track of deleted elements in a separate set. Our proof methodology is sufficient to automatically show that this CRDT converges under EC.
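As a rough illustration of why tombstones help, the following Python sketch checks pairwise commutativity exhaustively over a tiny universe. Since the figure itself is not reproduced here, the encoding is an assumption on our part: a state is a pair (S, T) of live pairs and tombstones, Add is ignored once its pair is tombstoned, and Remove moves the pairs observed at the source into T. The universe `PAIRS` and all function names are hypothetical.

```python
from itertools import combinations, product

PAIRS = [("a", 1), ("a", 2)]  # hypothetical tiny universe of tagged elements


def subsets(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def add(a, i, source):
    """Assumed Add(a,i): insert (a, i) unless it is already tombstoned."""
    def effector(state):
        s, t = state
        return state if (a, i) in t else (s | {(a, i)}, t)
    return effector


def remove(a, source):
    """Assumed Remove(a): tombstone the copies of a observed at the source."""
    src_s, _ = source
    observed = frozenset(p for p in src_s if p[0] == a)
    return lambda state: (state[0] - observed, state[1] | observed)


STATES = list(product(subsets(PAIRS), repeat=2))          # 16 states (S, T)
EFFECTORS = ([add(a, i, STATES[0]) for (a, i) in PAIRS]
             + [remove("a", src) for src in STATES])

# Bounded check: every pair of effectors commutes on every state.
all_commute = all(f(g(st)) == g(f(st))
                  for f in EFFECTORS for g in EFFECTORS for st in STATES)
print(all_commute)
```

A bounded check of this kind gives some confidence for the chosen universe only; the methodology developed in the following sections is what lifts such pairwise reasoning to unbounded executions.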

3 Problem Definition

In this section, we formalize the problem of determining convergence in CRDTs parametric to a weak consistency policy. First, we define a general operational semantics to describe all valid executions of a CRDT under any given weak consistency policy. As stated earlier, a CRDT program \(\mathcal {P}\) is specified by the tuple \((\varSigma , O, \sigma _{\mathsf {init}})\). Here, we find it convenient to define an operation \(o \in O\) as a function \((\varSigma \times (\varSigma \rightarrow \varSigma )^*) \rightarrow (\varSigma \rightarrow \varSigma )\). Instead of directly taking as input a generating state, operations are now defined to take as input a start state and a sequence of effectors. The intended semantics is that the sequence of effectors would be applied to the start state to obtain the generating state. Using this syntax allows us to simplify the presentation of the proof methodology in the next section, since we can abstract a history of effectors into an equivalent start state.

Formally, if \(\hat{o}:\varSigma \rightarrow (\varSigma \rightarrow \varSigma )\) was the original op-based definition, then we define the operation \(o:(\varSigma \times (\varSigma \rightarrow \varSigma )^*) \rightarrow (\varSigma \rightarrow \varSigma )\) as follows:

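$$\begin{aligned} \forall \sigma .\ o(\sigma , \epsilon )&= \hat{o}(\sigma ) \\ \forall \sigma , \pi , f.\ o(\sigma , \pi f)&= o(f(\sigma ), \pi ) \end{aligned}$$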

Note that \(\epsilon \) indicates the empty sequence. Hence, for all states \(\sigma \) and sequence of functions \(\pi \), we have \(o(\sigma , \pi ) = \hat{o}(\pi (\sigma ))\).
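The identity \(o(\sigma , \pi ) = \hat{o}(\pi (\sigma ))\) can be sketched in Python as follows. We assume here that the sequence \(\pi \) is read like a composition of functions, so its rightmost effector is applied first; the helper `lift` and the example operation `remove_a_hat` are illustrative names and not part of the formal development.

```python
def lift(o_hat):
    """Turn o_hat : State -> Effector into o : (State, [Effector]) -> Effector."""
    def o(sigma, pi):
        # Apply the sequence right to left to obtain the generating state.
        for f in reversed(pi):
            sigma = f(sigma)
        return o_hat(sigma)
    return o


def remove_a_hat(source):
    """ORSet-style Remove(a): delete the copies of "a" observed at the source."""
    observed = frozenset(p for p in source if p[0] == "a")
    return lambda s: s - observed


put = lambda s: s | {("a", 1)}    # effector that inserts ("a", 1)
drop = lambda s: s - {("a", 1)}   # effector that deletes ("a", 1)

remove_a = lift(remove_a_hat)
target = frozenset({("a", 1)})

# [drop, put]: put is applied first, then drop, so the generating state is empty.
eff_sees_nothing = remove_a(frozenset(), [drop, put])
# [put, drop]: drop is applied first, then put, so the generating state holds ("a", 1).
eff_sees_pair = remove_a(frozenset(), [put, drop])
```

With an empty sequence, `lift(o_hat)(sigma, [])` simply returns `o_hat(sigma)`, matching the base case for \(\epsilon \).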

To define the operational semantics, we abstract away from the concept of replicas, and instead maintain a global pool of effectors. A new CRDT operation is executed against a CRDT state obtained by first selecting a subset of effectors from the global pool and then applying the elements in that set in some non-deterministically chosen permutation to the initial CRDT state. The choice of effectors and their permutation must obey the weak consistency policy specification. Given a CRDT \(\mathcal {P} = (\varSigma , O, \sigma _{\mathsf {init}})\) and a weak consistency policy \(\varPsi \), we define a labeled transition system \(\mathcal {S}_{\mathcal {P}, \varPsi } = (\mathcal {C},\rightarrow )\), where \(\mathcal {C}\) is a set of configurations and \(\rightarrow \) is the transition relation. A configuration \(c = (\varDelta ,\mathsf {vis},\mathsf {eo})\) consists of three components: \(\varDelta \) is a set of events, \(\mathsf {vis} \subseteq \varDelta \times \varDelta \) is a visibility relation, and \(\mathsf {eo}\subseteq \varDelta \times \varDelta \) is a global effector order relation (constrained to be anti-symmetric). An event \(\eta \in \varDelta \) is a tuple \((\mathsf {eid}, o, \sigma _s, \varDelta _r, \mathsf {eo})\) where \(\mathsf {eid}\) is a unique event id, \(o \in O\) is a CRDT operation, \(\sigma _s \in \varSigma \) is the start CRDT state, \(\varDelta _r\) is the set of events visible to \(\eta \) (also called the history of \(\eta \)), and \(\mathsf {eo}\) is a total order on the events in \(\varDelta _r\) (also called the local effector order relation). We assume projection functions for each component of an event (for example \(\sigma _s(\eta )\) projects the start state of the event \(\eta \)).

Given an event \(\eta = (\mathsf {eid}, o, \sigma _s, \varDelta _r, \mathsf {eo})\), we define \(\eta ^{e}\) to be the effector associated with the event. This effector is obtained by executing the CRDT operation o against the start CRDT state \(\sigma _s\) and the sequence of effectors obtained from the events in \(\varDelta _r\) arranged in the reverse order of \(\mathsf {eo}\). Formally,

$$\begin{aligned} \eta ^{e} = {\left\{ \begin{array}{ll} o( \sigma _{s}, \epsilon ) &{} \text{ if } \varDelta _r = \phi \\ o( \sigma _{s}, \prod \nolimits _{i=1}^{k} \eta _{P(i)}^{e}) &{} \text{ if } \varDelta _r = \{\eta _1, \ldots , \eta _k\} \text{ where } P:\{1,\ldots ,k\} \rightarrow \{1,\ldots ,k\}\\ &{} \forall i,j. i < j \Rightarrow (\eta _{P(j)}, \eta _{P(i)}) \in \mathsf {eo} \end{array}\right. } \end{aligned}$$

In the above definition, when \(\varDelta _r\) is non-empty, we define a permutation P of the events in \(\varDelta _r\) such that the permutation order is the inverse of the effector order \(\mathsf {eo}\). This ensures that if \((\eta _i,\eta _j) \in \mathsf {eo}\), then \(\eta _j^e\) occurs before \(\eta _i^e\) in the sequence passed to the CRDT operation o, effectively applying \(\eta _i^e\) before \(\eta _j^e\) to obtain the generating state for o.

The following rule describes the transitions allowed in \(\mathcal {S}_{\mathcal {P}, \varPsi }\):

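$$\begin{aligned} \frac{\begin{array}{c} \varDelta _r \subseteq \varDelta \quad o \in O \quad \sigma _s \in \varSigma \quad \mathsf {eo}_r \text{ is a total order on } \varDelta _r \quad \mathsf {eo} \cap (\varDelta _r \times \varDelta _r) \subseteq \mathsf {eo}_r \\ \eta = (\mathsf {eid}, o, \sigma _s, \varDelta _r, \mathsf {eo}_r) \quad \mathsf {eid} \text{ is fresh} \quad \varDelta ' = \varDelta \cup \{\eta \} \\ \mathsf {vis}' = \mathsf {vis} \cup \{(\eta ', \eta ) \mid \eta ' \in \varDelta _r\} \quad \varPsi (\varDelta ', \mathsf {vis}', \mathsf {eo}') \end{array}}{(\varDelta , \mathsf {vis}, \mathsf {eo}) \xrightarrow {\eta } (\varDelta ', \mathsf {vis}', \mathsf {eo}')} \end{aligned}$$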

The rule describes the effect of executing a new operation o, which begins by first selecting a subset of already completed events (\(\varDelta _r\)) and a total order \(\mathsf {eo}_r\) on these events which obeys the global effector order \(\mathsf {eo}\). This mimics applying the operation o on an arbitrary replica on which the events of \(\varDelta _r\) have been applied in the order \(\mathsf {eo}_r\). A new event (\(\eta \)) corresponding to the issued operation o is computed, which is used to label the transition and is also added to the current configuration. All the events in \(\varDelta _r\) are visible to the new event \(\eta \), which is reflected in the new visibility relation \(\mathsf {vis}'\). The system moves to the new configuration \((\varDelta ', \mathsf {vis}', \mathsf {eo}')\) which must satisfy the consistency policy \(\varPsi \). Note that even though the general transition rule allows the event to pick any arbitrary start state \(\sigma _s\), we restrict the start state of all events in a well-formed execution to be the initial CRDT state \(\sigma _{\mathsf {init}}\), i.e. the state in which all replicas begin their execution. A trace of \(\mathcal {S}_{\mathcal {P}, \varPsi }\) is a sequence of transitions. Let \(\llbracket \mathcal {S}_{\mathcal {P}, \varPsi } \rrbracket \) be the set of all finite traces. Given a trace \(\tau \), \(L(\tau )\) denotes all events (i.e. labels) in \(\tau \).
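The shape of a single transition can be sketched in Python under some simplifying assumptions: events are represented only by their integer ids (operations and start states are omitted), the candidate additions to the effector order are supplied by the caller, and the new order is taken to extend the old one. The function `step` and the compact `causal` predicate are hypothetical helpers written for this illustration.

```python
def causal(events, vis, eo):
    """Compact causal-consistency predicate: vis coincides with eo, vis transitive."""
    same = all(((a, b) in vis) == ((a, b) in eo) for a in events for b in events)
    transitive = all((a, c) in vis
                     for (a, b) in vis for (b2, c) in vis if b == b2)
    return same and transitive


def step(config, eid, visible_order, new_eo, policy):
    """Try to add event `eid` whose history is `visible_order` (application order).

    Returns the new configuration, or None if the transition is not allowed.
    """
    events, vis, eo = config
    visible = frozenset(visible_order)
    if eid in events or not visible.issubset(events):
        return None
    if len(visible) != len(visible_order):
        return None                      # the local order must list each event once
    pos = {e: k for k, e in enumerate(visible_order)}
    # The local order must obey the global effector order.
    if any(pos[a] > pos[b] for (a, b) in eo if a in pos and b in pos):
        return None
    events2 = events | {eid}
    vis2 = vis | {(h, eid) for h in visible}
    eo2 = eo | frozenset(new_eo)         # assumption: eo only grows
    return (events2, vis2, eo2) if policy(events2, vis2, eo2) else None


c0 = (frozenset(), frozenset(), frozenset())
c1 = step(c0, 1, (), (), causal)
c2 = step(c1, 2, (1,), {(1, 2)}, causal)
rejected = step(c1, 2, (1,), (), causal)   # vis(1, 2) without eo(1, 2)
```

The last call is rejected because, under the causal predicate used here, making event 1 visible to event 2 must be accompanied by the corresponding effector-order edge.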

Definition 1

(Well-formed Execution). A trace \(\tau \in \llbracket \mathcal {S}_{\mathcal {P}, \varPsi } \rrbracket \) is a well-formed execution if it begins from the empty configuration \(C_{\mathsf {init}} = (\{\},\{\},\{\})\) and \(\forall \eta \in L(\tau )\), \(\sigma _s(\eta ) = \sigma _{\mathsf {init}}\).

Let \(\mathcal {WF}(\mathcal {S}_{\mathcal {P}, \varPsi })\) denote all well-formed executions of \(\mathcal {S}_{\mathcal {P}, \varPsi }\). The consistency policy \(\varPsi (\varDelta , \mathsf {vis}, \mathsf {eo})\) is a formula constraining the events in \(\varDelta \) and relations \(\mathsf {vis}\) and \(\mathsf {eo}\) defined over these events. Below, we illustrate how to express certain well-known consistency policies in our framework:

$$\begin{array}{ll}
\hline
\text{Consistency scheme} &{} \varPsi (\varDelta , \mathsf {vis}, \mathsf {eo}) \\
\hline
\text{Eventual Consistency [3]} &{} \forall \eta ,\eta ' \in \varDelta . \lnot \mathsf {eo}(\eta ,\eta ') \\
\hline
\text{Causal Consistency [14]} &{} \forall \eta ,\eta ' \in \varDelta . \mathsf {vis}(\eta ,\eta ') \Leftrightarrow \mathsf {eo}(\eta ,\eta ') \\
 &{} \wedge \forall \eta ,\eta ', \eta '' \in \varDelta . \mathsf {vis}(\eta , \eta ') \wedge \mathsf {vis}(\eta ', \eta '') \Rightarrow \mathsf {vis}(\eta , \eta '') \\
\hline
\text{RedBlue Consistency } (O_r) \text{ [13]} &{} \forall \eta ,\eta ' \in \varDelta . o(\eta ) \in O_r \wedge o(\eta ') \in O_r \wedge \mathsf {vis}(\eta ,\eta ') \Leftrightarrow \mathsf {eo}(\eta ,\eta ') \\
 &{} \wedge \forall \eta ,\eta ' \in \varDelta . o(\eta ) \in O_r \wedge o(\eta ') \in O_r \Rightarrow \mathsf {vis}(\eta ,\eta ') \vee \mathsf {vis}(\eta ',\eta ) \\
\hline
\text{Parallel Snapshot Isolation [23]} &{} \forall \eta ,\eta ' \in \varDelta . (\mathsf {Wr}(\eta ^e) \cap \mathsf {Wr}(\eta ^{'e}) \ne \phi \wedge \mathsf {vis}(\eta ,\eta ')) \Leftrightarrow \mathsf {eo}(\eta ,\eta ') \\
 &{} \wedge \forall \eta ,\eta ' \in \varDelta . \mathsf {Wr}(\eta ^e) \cap \mathsf {Wr}(\eta ^{'e}) \ne \phi \Rightarrow \mathsf {vis}(\eta ,\eta ') \vee \mathsf {vis}(\eta ',\eta ) \\
\hline
\text{Strong Consistency} &{} \forall \eta ,\eta ' \in \varDelta . \mathsf {vis}(\eta ,\eta ') \Leftrightarrow \mathsf {eo}(\eta ,\eta ') \\
 &{} \wedge \forall \eta ,\eta ' \in \varDelta . \mathsf {vis}(\eta ,\eta ') \vee \mathsf {vis}(\eta ',\eta ) \\
\hline
\end{array}$$

For Eventual Consistency (EC) [3], we do not place any constraints on the visibility order and require the global effector order to be empty. This reflects the fact that in EC, any number of events can occur concurrently at different replicas, and hence a replica can witness any arbitrary subset of events which may be applied in any order. In Causal Consistency (CC) [14], an event is applied at a replica only if all causally dependent events have already been applied. An event \(\eta _1\) is causally dependent on \(\eta _2\) if \(\eta _1\) was generated at a replica where either \(\eta _2\) or any other event causally dependent on \(\eta _2\) had already been applied. The visibility relation \(\mathsf {vis}\) captures causal dependency, and by making \(\mathsf {vis}\) transitive, we ensure that all causal dependencies of events in \(\varDelta _r\) are also present in \(\varDelta _r\) (this is because in the transition rule, \(\varPsi \) is checked on the updated visibility relation which relates events in \(\varDelta _r\) with the newly generated event). Further, causally dependent events must be applied in the same order at all replicas, which we capture by asserting that \(\mathsf {vis}\) implies \(\mathsf {eo}\). In RedBlue Consistency (RB) [13], a subset of CRDT operations (\(O_r \subseteq O\)) are synchronized, so that they must occur in the same order at all replicas. We express RB in our framework by requiring the visibility relation to be total among events whose operations are in \(O_r\). In Parallel Snapshot Isolation (PSI) [23], two events which conflict with each other (because they write to a common variable) are not allowed to be executed concurrently, but are synchronized across all replicas to be executed in the same order. Similar to [10], we assume that when a CRDT is used under PSI, its state space \(\varSigma \) is a map from variables to values, and every operation generates an effector which simply writes to certain variables. 
We assume that \(\mathsf {Wr}(\eta ^e)\) returns the set of variables written by the effector \(\eta ^e\), and express PSI in our framework by requiring that events which write a common variable are applied in the same order (determined by their visibility relation) across all replicas; furthermore, the policy requires that the visibility relation among such events is total. Finally, in Strong Consistency, the visibility relation is total and all effectors are applied in the same order at all replicas.
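To make the declarative flavour of these policies concrete, three of them can be written as Python predicates over a finite configuration. This is a sketch under our own encoding: events are opaque ids, and \(\mathsf {vis}\) and \(\mathsf {eo}\) are sets of ordered pairs. One deliberate deviation, which is an assumption on our part, is that totality in `strong` is only required between distinct events.

```python
def eventual(events, vis, eo):
    """EC: the global effector order is empty; vis is unconstrained."""
    return all((a, b) not in eo for a in events for b in events)


def causal(events, vis, eo):
    """CC: vis coincides with eo, and vis is transitive."""
    same = all(((a, b) in vis) == ((a, b) in eo) for a in events for b in events)
    transitive = all((a, c) in vis
                     for (a, b) in vis for (b2, c) in vis if b == b2)
    return same and transitive


def strong(events, vis, eo):
    """SC: vis coincides with eo, and vis is total (on distinct events)."""
    same = all(((a, b) in vis) == ((a, b) in eo) for a in events for b in events)
    total = all(a == b or (a, b) in vis or (b, a) in vis
                for a in events for b in events)
    return same and total


events = {1, 2, 3}
chain = {(1, 2), (2, 3), (1, 3)}      # 1 visible to 2 and 3, 2 visible to 3
broken = {(1, 2), (2, 3)}             # misses the transitive edge (1, 3)
```

RedBlue consistency and PSI could be encoded in the same style by additionally passing the set \(O_r\) or a function playing the role of \(\mathsf {Wr}\).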

Given an execution \(\tau \in \llbracket \mathcal {S}_{\mathcal {P}, \varPsi } \rrbracket \) and a transition \(C \xrightarrow {\eta } C'\) in \(\tau \), we associate a set of replica states \(\varSigma _{\eta }\) that the event can potentially witness, by considering all permutations of the effectors visible to \(\eta \) which obey the global effector order, when applied to the start state \(\sigma _{s}(\eta )\). Formally, this is defined as follows, assuming \(\eta = (\mathsf {eid}, o, \sigma _{s}, \{\eta _1, \ldots , \eta _k\}, \mathsf {eo}_r)\) and \(C = (\varDelta , \mathsf {vis}, \mathsf {eo})\):

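$$\begin{aligned} \varSigma _{\eta } = \Big \{ \big (\prod \nolimits _{i=1}^{k} \eta _{P(i)}^{e}\big )(\sigma _s) \ \Big |\ &P:\{1,\ldots ,k\} \rightarrow \{1,\ldots ,k\},\ \mathsf {eo}_P \text{ is a total order on } \{\eta _1, \ldots , \eta _k\}, \\ &\mathsf {eo} \cap (\{\eta _1, \ldots , \eta _k\} \times \{\eta _1, \ldots , \eta _k\}) \subseteq \mathsf {eo}_P, \\ &\forall i,j.\ i < j \Rightarrow (\eta _{P(j)}, \eta _{P(i)}) \in \mathsf {eo}_P \Big \} \end{aligned}$$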

In the above definition, for all valid local effector orders \(\mathsf {eo}_{P}\), we compute the CRDT states obtained on applying those effectors on the start CRDT state, which constitute \(\varSigma _{\eta }\). The original event \(\eta \) presumably would have witnessed one of these states.

Definition 2

(Convergent Event). Given an execution \(\tau \in \llbracket \mathcal {S}_{\mathcal {P}, \varPsi } \rrbracket \) and an event \(\eta \in L(\tau )\), \(\eta \) is convergent if \(\varSigma _{\eta }\) is singleton.

Definition 3

(Strong Eventual Consistency). A CRDT \((\varSigma , O, \sigma _{\mathsf {init}})\) achieves strong eventual consistency (SEC) under a weak consistency specification \(\varPsi \) if for all well-formed executions \(\tau \in \mathcal {WF}(\mathcal {S}_{\mathcal {P}, \varPsi })\) and for all events \(\eta \in L(\tau )\), \(\eta \) is convergent.

An event is convergent if all valid permutations of visible events according to the specification \(\varPsi \) lead to the same state. This corresponds to the requirement that if two replicas have witnessed the same set of operations, they must be in the same state. A CRDT achieves SEC if all events in all executions are convergent.
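For a finite history, the set \(\varSigma _{\eta }\) and the convergence check of Definition 2 can be computed by brute force. The Python sketch below assumes that effectors are given as a dictionary from event ids to functions on hashable states, and that a pair (x, y) in `eo` means that x must be applied before y; the names `witnessed_states` and `is_convergent` are ours.

```python
from itertools import permutations


def witnessed_states(start, effectors, eo):
    """All states reachable by applying every effector once, in an order obeying eo."""
    states = set()
    for order in permutations(effectors):
        pos = {e: k for k, e in enumerate(order)}
        obeys = all(pos[b] > pos[a] for (a, b) in eo if a in pos and b in pos)
        if obeys:
            s = start
            for e in order:
                s = effectors[e](s)
            states.add(s)
    return states


def is_convergent(start, effectors, eo):
    """Definition 2 on a finite history: the set of witnessed states is a singleton."""
    return len(witnessed_states(start, effectors, eo)) == 1


# Simple Set example: event 1 adds "a", event 2 removes "a".
history = {
    1: lambda s: s | {"a"},
    2: lambda s: s - {"a"},
}
empty = frozenset()

unordered = witnessed_states(empty, history, eo=set())
ordered = witnessed_states(empty, history, eo={(1, 2)})
```

The enumeration is factorial in the size of the history and is meant only to illustrate the definition, which is why the next section replaces it with pairwise reasoning.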

4 Automated Verification

In order to show that a CRDT achieves SEC under a consistency specification, we need to show that all events in any execution are convergent, which in turn requires us to show that any valid permutation of valid subsets of events in an execution leads to the same state. This is a hard problem because we have to reason about executions of unbounded length, involving unbounded sets of effectors, and reconcile the declarative event-based specifications of weak consistency with states generated during execution. To make the problem tractable, we use a two-fold strategy. First, we show that if any pair of effectors generated during any execution either commute with each other or are forced to be applied in the same order by the consistency policy, then the CRDT achieves SEC. Second, we develop an inductive proof rule to show that all pairs of effectors generated during any (potentially unbounded) execution obey the above-mentioned property. To ensure soundness of the proof rule, we place some reasonable assumptions on the consistency policy that (intuitively) require behaviorally equivalent events to be treated the same by the policy, regardless of context (i.e., the length of the execution history at the time the event is applied). We then extract a simple sufficient condition, which we call non-interference to commutativity, that captures the heart of the inductive argument. Notably, this condition can be automatically checked for different CRDTs under different consistency policies using off-the-shelf theorem provers, thus providing a pathway to performing automated parametrized verification of CRDTs.

Given a transition \((\varDelta , \mathsf {vis}, \mathsf {eo}) \xrightarrow {\eta } C\), we denote the global effector order in the starting configuration of \(\eta \), i.e. \(\mathsf {eo}\) as \(\mathsf {eo}_{\eta }\). We first show that a sufficient condition to prove that a CRDT is convergent is to show that any two events in its history either commute or are related by the global effector order.

Lemma 1

Given an execution \(\tau \in \llbracket \mathcal {S}_{\mathcal {P}, \varPsi } \rrbracket \), and an event \(\eta =(\mathsf {id},o,\sigma _s,\varDelta _r,\mathsf {eo}_r) \in L(\tau )\), if for all \(\eta _1, \eta _2 \in \varDelta _r\) such that \(\eta _1 \ne \eta _2\), either \(\eta _1^e \circ \eta _2^e = \eta _2^e \circ \eta _1^e\) or \(\mathsf {eo}_{\eta }(\eta _1, \eta _2)\) or \(\mathsf {eo}_{\eta }(\eta _2, \eta _1)\), then \(\eta \) is convergent.

We now present a property that consistency policies must obey for our verification methodology to be soundly applied. First, we define the notion of behavioral equivalence of events:

Definition 4

(Behavioral Equivalence).

Two events \(\eta _1 = (\mathsf {id}_1, o_1, \sigma _1, \varDelta _1, \mathsf {eo}_1)\) and \(\eta _2 = (\mathsf {id}_2, o_2, \sigma _2, \varDelta _2, \mathsf {eo}_2)\) are behaviorally equivalent if \(\eta _1^{e} = \eta _2^{e}\) and \(o_1 = o_2\).

That is, behaviorally equivalent events produce the same effectors. We use the notation \(\eta _1 \equiv \eta _2\) to indicate that they are behaviorally equivalent.

Definition 5

(Behaviorally Stable Consistency Policy). A consistency policy \(\varPsi \) is behaviorally stable if \(\forall \varDelta , \mathsf {vis}, \mathsf {eo}, \varDelta ', \mathsf {vis}^{'}, \mathsf {eo}^{'}\), \( \eta _1, \eta _2 \in \varDelta \), \(\eta _1^{'}, \eta _2^{'} \in \varDelta ^{'}\) the following holds:

$$\begin{aligned} \begin{aligned} (\varPsi (\varDelta , \mathsf {vis}, \mathsf {eo}) \wedge \varPsi (\varDelta ^{'}, \mathsf {vis}^{'}, \mathsf {eo}^{'}) \wedge \eta _1 \equiv \eta _{1}^{'} \wedge \eta _2 \equiv \eta _{2}^{'} \wedge \mathsf {vis}(\eta _1, \eta _2) \Leftrightarrow \mathsf {vis}'(\eta _{1}^{'}, \eta _{2}^{'})) \\ \Rightarrow \mathsf {eo}(\eta _1, \eta _2) \Leftrightarrow \mathsf {eo}'(\eta _{1}^{'}, \eta _{2}^{'}) \end{aligned} \end{aligned}$$

Behaviorally stable consistency policies treat behaviorally equivalent events which have the same visibility relation among them in the same manner by enforcing the same effector order. All consistency policies that we discussed in the previous section (representing the most well-known in the literature) are behaviorally stable:

Lemma 2

EC, CC, PSI, RB and SC are behaviorally stable.

EC does not enforce any effector ordering and hence is trivially behaviorally stable. CC forces causally dependent events to be in the same order, and hence behaviorally equivalent events which have the same visibility order will be forced to be in the same effector order. RB forces events whose operations belong to a specific subset to be in the same order, but since behaviorally equivalent events perform the same operation, they would be placed in the same effector order. Similarly, PSI forces events writing to a common variable to be in the same order, but since behaviorally equivalent events generate the same effector, they would also write to the same variables and hence would be forced into the same effector order. SC forces all events to be in the same order, which is equal to the visibility order, and hence is trivially behaviorally stable. In general, behaviorally stable consistency policies do not consider the context in which events occur, but instead rely only on observable behavior of the events to constrain their ordering. A simple example of a consistency policy which is not behaviorally stable is a policy which maintains bounded concurrency [12] by limiting the number of concurrent operations across all replicas to a fixed bound. Such a policy would synchronize two events only if they occur in a context where keeping them concurrent would violate the bound, but behaviorally equivalent events in a different context may not be synchronized.

For executions under a behaviorally stable consistency policy, the global effector order between events only grows in an execution, so that if two events \(\eta _1\) and \(\eta _2\) in the history of some event \(\eta \) are related by \(\mathsf {eo}_{\eta }\), then if they later occur in the history of any other event, they would be related in the same effector order. Hence, we can now define a common global effector order for an execution. Given an execution \(\tau \in \llbracket \mathcal {S}_{\mathcal {P}, \varPsi }\rrbracket \), the effector order \(\mathsf {eo}_{\tau } \subseteq L(\tau ) \times L(\tau )\) is an anti-symmetric relation defined as follows:

$$ \mathsf {eo}_{\tau } = \{(\eta _1, \eta _2)\ |\ \exists \eta \in L(\tau ).\ (\eta _1, \eta _2) \in \mathsf {eo}_{\eta } \} $$

Similarly, we also define \(\mathsf {vis}_{\tau }\) to be the common visibility relation for an execution \(\tau \), which is nothing but the \(\mathsf {vis}\) relation in the final configuration of \(\tau \).

Definition 6

(Commutative modulo Consistency Policy). Given a CRDT \(\mathcal {P}\), a behaviorally stable weak consistency specification \(\varPsi \) and an execution \(\tau \in \llbracket \mathcal {S}_{\mathcal {P}, \varPsi } \rrbracket \), two events \(\eta _1,\eta _2 \in L(\tau )\) such that \(\eta _1 \ne \eta _2\) commute modulo the consistency policy \(\varPsi \) if either \(\eta _1^e \circ \eta _2^e = \eta _2^e \circ \eta _1^e\) or \(\mathsf {eo}_{\tau }(\eta _1, \eta _2)\) or \(\mathsf {eo}_{\tau }(\eta _2, \eta _1)\).
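Definition 6 suggests a simple executable check on finite data. The following Python sketch is an approximation: equality of effectors is tested only on a finite sample of states supplied by the caller, whereas the definition quantifies over all states. The function `commute_modulo` and the example effectors are hypothetical.

```python
def commute_modulo(ev1, ev2, effectors, eo, sample_states):
    """True if ev1 and ev2 are ordered by eo, or their effectors commute on the sample."""
    if (ev1, ev2) in eo or (ev2, ev1) in eo:
        return True
    f, g = effectors[ev1], effectors[ev2]
    return all(f(g(s)) == g(f(s)) for s in sample_states)


effectors = {
    1: lambda s: s | {"a"},   # add a
    2: lambda s: s - {"a"},   # remove a
    3: lambda s: s | {"b"},   # add b
}
sample = [frozenset(), frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]

concurrent_add_remove = commute_modulo(1, 2, effectors, set(), sample)
ordered_add_remove = commute_modulo(1, 2, effectors, {(1, 2)}, sample)
independent_adds = commute_modulo(1, 3, effectors, set(), sample)
```

In an automated setting the sampled check would be replaced by a query to a theorem prover over symbolic states, but the case split (ordered by the policy, or commuting) is the same.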

The following lemma is a direct consequence of Lemma 1:

Lemma 3

Given a CRDT \(\mathcal {P}\) and a behaviorally stable consistency specification \(\varPsi \), if for all \(\tau \in \mathcal {WF}(\mathcal {S}_{\mathcal {P}, \varPsi })\), for all \(\eta _1, \eta _2 \in L(\tau )\) such that \(\eta _1 \ne \eta _2\), \(\eta _1\) and \(\eta _2\) commute modulo the consistency policy \(\varPsi \), then \(\mathcal {P}\) achieves SEC under \(\varPsi \).

Our goal is to apply Lemma 3 by showing that all events in any execution commute modulo the consistency policy. However, executions can be arbitrarily long and have an unbounded number of events. Hence, for events occurring in such large executions, we will instead consider behaviorally equivalent events in a smaller execution and show that they commute modulo the consistency policy, which by stability of the consistency policy directly translates to their commutativity in the larger context. Recall that the effector generated by an operation depends on its start state and the sequence of other effectors applied to that state. To generate behaviorally equivalent events with arbitrarily long histories in short executions, we summarize these long histories into the start state of events, and use commutativity itself as an inductive property of these start states. That is, we ask whether, given two events with arbitrary start states and empty histories that commute modulo \(\varPsi \), the addition of another event to their histories would continue to allow them to commute modulo \(\varPsi \).

Definition 7

(Non-interference to Commutativity). (Non-Interf) A CRDT \(\mathcal {P} = (\varSigma , O, \sigma _{\texttt {init}})\) satisfies non-interference to commutativity under a consistency policy \(\varPsi \) if and only if the following conditions hold:

  1. For all executions \(C_{\texttt {init}} \xrightarrow {\eta _1} C_1 \xrightarrow {\eta _2} C_2\) in \(\mathcal {WF}(\mathcal {S}_{\mathcal {P}, \varPsi })\), \(\eta _1\) and \(\eta _2\) commute modulo \(\varPsi \).

  2. For all \(\sigma _1, \sigma _2, \sigma _3 \in \varSigma \), if for execution \(\tau \equiv C_{\texttt {init}} \xrightarrow {\eta _1} C_1 \xrightarrow {\eta _2} C_2\) in \(\llbracket \mathcal {S}_{\mathcal {P}, \varPsi } \rrbracket \) where \(\sigma _s(\eta _1) = \sigma _1\), \(\sigma _s(\eta _2) = \sigma _2\), \(\eta _1\) and \(\eta _2\) commute modulo \(\varPsi \), then for all executions \(\tau ' \equiv C_{\texttt {init}} \xrightarrow {\eta _3} C_{1}^{'} \xrightarrow {\eta _{1}^{'}} C_{2}^{'} \xrightarrow {\eta _{2}^{'}} C_{3}^{'}\) such that \(\sigma _s(\eta _{1}^{'}) = \sigma _1\), \(o(\eta _{1}^{'}) = o(\eta _1)\), \(\sigma _s(\eta _{2}^{'}) = \sigma _2\), \(o(\eta _{2}^{'}) = o(\eta _2)\), \(\sigma _s(\eta _3) = \sigma _3\), and \(\mathsf {vis}_{\tau }(\eta _1, \eta _2) \Leftrightarrow \mathsf {vis}_{\tau '}(\eta _{1}^{'}, \eta _{2}^{'})\), \(\eta _{1}^{'}\) and \(\eta _{2}^{'}\) commute modulo \(\varPsi \).

Condition (1) corresponds to the base case of our inductive argument and requires that in well-formed executions with 2 events, both the events commute modulo \(\varPsi \). For condition (2), our intention is to consider two events \(\eta _a\) and \(\eta _b\) with any arbitrary histories which can occur in any well-formed execution and, assuming that they commute modulo \(\varPsi \), show that even after the addition of another event to their histories, they continue to commute. We use CRDT states \(\sigma _1,\sigma _2\) to summarize the histories of the two events, and construct behaviorally equivalent events (\(\eta _1 \equiv \eta _a\) and \(\eta _2 \equiv \eta _b\)) which would take \(\sigma _1,\sigma _2\) as their start states. That is, if \(\eta _a\) produced the effector \(o(\sigma _{\texttt {init}}, \pi )\)Footnote 4, where o is the CRDT operation corresponding to \(\eta _a\) and \(\pi \) is the sequence of effectors in its history, we leverage the observation that \(o(\sigma _{\texttt {init}}, \pi ) = o(\pi (\sigma _{\texttt {init}}), \epsilon )\), and assuming \(\sigma _1 = \pi (\sigma _{\texttt {init}})\), we obtain the behaviorally equivalent event \(\eta _1\), i.e. \(\eta _1^e \equiv \eta _a^e\). Similar analysis establishes that \(\eta _2^e \equiv \eta _b^e\). However, since we have no way of characterizing states \(\sigma _1\) and \(\sigma _2\) which are obtained by applying arbitrary sequences of effectors, we use commutativity itself as an identifying characteristic, focusing on only those \(\sigma _1\) and \(\sigma _2\) for which the events \(\eta _1\) and \(\eta _2\) commute modulo \(\varPsi \).
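The summarization step \(o(\sigma _{\texttt {init}}, \pi ) = o(\pi (\sigma _{\texttt {init}}), \epsilon )\) can be illustrated with a small Python sketch. It assumes an ORSet-style Remove whose effector deletes the tagged pairs observed at the source replica; the function names (`apply_history`, `generate`) and the concrete history are hypothetical and chosen only for illustration.

```python
from functools import reduce

INIT = frozenset()


def apply_history(state, history):
    """pi(state): apply a sequence of effectors to a state, in order."""
    return reduce(lambda s, eff: eff(s), history, state)


def remove(a):
    """ORSet-style Remove(a), curried as source state -> effector (assumed shape)."""
    def at_source(source):
        observed = frozenset(p for p in source if p[0] == a)
        return lambda s: s - observed
    return at_source


def generate(op, start, history):
    """o(start, history): effector produced after `history` is applied to `start`."""
    return op(apply_history(start, history))


# A history pi of two Add effectors, and the state sigma_1 that summarizes it.
pi = [lambda s: s | {("a", 1)}, lambda s: s | {("b", 2)}]
sigma_1 = apply_history(INIT, pi)

eta_a = generate(remove("a"), INIT, pi)      # initial start state, history pi
eta_1 = generate(remove("a"), sigma_1, [])   # summarized start state, empty history

probe = frozenset({("a", 1), ("a", 7), ("b", 2)})
print(eta_a(probe) == eta_1(probe))          # True
```

Both effectors remove exactly the pair `("a", 1)` from the probe state, which is the sense in which \(\eta _1\) is behaviorally equivalent to \(\eta _a\).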

The interfering event is also summarized by another CRDT state \(\sigma _3\), and we require that after suffering interference from this new event, the original two events would continue to commute modulo \(\varPsi \). This would essentially establish that any two events with any history would commute modulo \(\varPsi \) in these small executions, which by the behavioral stability of \(\varPsi \) would translate to their commutativity in any execution.

Theorem 1

Given a CRDT \(\mathcal {P}\) and a behaviorally stable consistency policy \(\varPsi \), if \(\mathcal {P}\) satisfies non-interference to commutativity under \(\varPsi \), then \(\mathcal {P}\) achieves SEC under \(\varPsi \).

Example: Let us apply the proposed verification strategy to the ORSet CRDT shown in Fig. 2. Under EC, condition (1) of Non-Interf fails, because in the execution \(C_{\texttt {init}} \xrightarrow {\eta _1} C_1 \xrightarrow {\eta _2} C_2\) where \(o(\eta _1) = \) Add(a,i) and \(o(\eta _2)=\) Remove(a) and \(\mathsf {vis}(\eta _1, \eta _2)\), \(\eta _1\) and \(\eta _2\) do not commute modulo EC, since (a,i) would be present in the source replica of Remove(a). However, \(\eta _1\) and \(\eta _2\) would commute modulo CC, since they would be related by the effector order. Now, moving to condition (2) of Non-Interf, we limit ourselves to source replica states \(\sigma _1\) and \(\sigma _2\) where Add(a,i) and Remove(a) do commute modulo CC. If \(\mathsf {vis}_{\tau }(\eta _1, \eta _2)\), then after interference, in execution \(\tau '\), we also have \(\mathsf {vis}_{\tau '}(\eta _1^{'}, \eta _2^{'})\), in which case \(\eta _1^{'}\) and \(\eta _2^{'}\) trivially commute modulo CC (because they would be related by the effector order). On the other hand, if \(\lnot \mathsf {vis}_{\tau }(\eta _1, \eta _2)\), then for \(\eta _1\) and \(\eta _2\) to commute modulo CC, we must have that the effectors \(\eta _1^e\) and \(\eta _2^e\) themselves commute, which implies that \(\texttt {(a,i)} \notin \sigma _2\). Now, consider any execution \(\tau ^{'}\) with an interfering operation \(\eta _3\). If \(\eta _3\) is another Add(a,i’) operation, then \(\texttt {i'}\ne \texttt {i}\), so that even if it is visible to \(\eta _2^{'}\), \(\eta _2^{'e}\) will not remove (a,i), so that \(\eta _1^{'}\) and \(\eta _2^{'}\) would commute. Similarly, if \(\eta _3\) is another Remove(a) operation, it can only remove tagged versions of a from the source replica of \(\eta _2^{'}\), so that the effector \(\eta _2^{'e}\) would not remove (a,i).
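The three cases of this example can be replayed concretely. The sketch below is a minimal Python rendering that assumes the usual ORSet shape (Add(a,i) inserts the tagged pair, Remove(a) deletes the pairs for a that are present at its source replica); it is not a transcription of Fig. 2, and commutativity is checked only over a small, hand-picked universe of states.

```python
from itertools import combinations


def add(a, i):
    """Add(a,i): the effector inserts the tagged pair, whatever the source state."""
    def at_source(source):
        return lambda s: s | {(a, i)}
    return at_source


def remove(a):
    """Remove(a): the effector deletes the pairs for `a` seen at the source replica."""
    def at_source(source):
        observed = frozenset(p for p in source if p[0] == a)
        return lambda s: s - observed
    return at_source


# All states over a two-pair universe; enough to expose the non-commuting case.
PAIRS = [("a", 1), ("a", 2)]
STATES = [frozenset(c) for r in range(len(PAIRS) + 1)
          for c in combinations(PAIRS, r)]


def commute(e1, e2):
    return all(e1(e2(s)) == e2(e1(s)) for s in STATES)


EMPTY = frozenset()
add_eff = add("a", 1)(EMPTY)

# Case 1: Add(a,1) is visible to Remove(a), so (a,1) is in Remove's source state.
visible_remove = remove("a")(frozenset({("a", 1)}))
# Case 2: the two events are concurrent and (a,1) is not in Remove's source state.
concurrent_remove = remove("a")(EMPTY)
# Case 3: an interfering Add(a,2) is visible to Remove(a), but Add(a,1) is not.
interfered_remove = remove("a")(frozenset({("a", 2)}))

print(commute(add_eff, visible_remove))      # False
print(commute(add_eff, concurrent_remove))   # True
print(commute(add_eff, interfered_remove))   # True
```

Only the first pair fails to commute, and that is exactly the pair that CC relates by the effector order.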

5 Experimental Results

In this section, we present the results of applying our verification methodology to a number of CRDTs under different consistency models. We collected CRDT implementations from a number of sources [1, 19, 20] and since all of the existing implementations assume a very weak consistency model (primarily CC), we additionally implemented a few CRDTs on our own intended to only work under stronger consistency schemes but which are better in terms of time/space complexity and ease of development. Our implementations are not written in any specific language but instead are specified abstractly akin to the definitions given in Figs. 1 and 2. To specify CRDT states and operations, we fix an abstract language that contains uninterpreted datatypes (used for specifying elements of sets, lists, etc.), a set datatype with support for various set operations (add, delete, union, intersection, projection, lookup), a tuple datatype (along with operations to create tuples and project components) and a special uninterpreted datatype equipped with a total order for identifiers. Note that the set datatype used in our abstract language is different from the Set CRDT, as it is only intended to perform set operations locally at a replica. All existing CRDT definitions can be naturally expressed in this framework.

Here, we return to the op-based specification of CRDTs. For a given CRDT \(\mathcal {P}=(\varSigma ,O,\sigma _{\texttt {init}})\), we convert all its operations into FOL formulas relating the source, input and output replica states. That is, for a CRDT operation \(o : \varSigma \rightarrow \varSigma \rightarrow \varSigma \), we create a predicate \(\mathsf {o} : \varSigma \times \varSigma \times \varSigma \rightarrow \mathbb {B}\) such that \(\mathsf {o}(\sigma _s, \sigma _i, \sigma _o)\) is true if and only if \(o(\sigma _s)(\sigma _i) = \sigma _o\). Since CRDT states are typically expressed as sets, we axiomatize set operations to express their semantics in FOL.
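The relationship between a curried operation and its three-place predicate can be sketched in Python. This is only an executable analogy: in the encoding described above the predicate is a FOL formula over axiomatized sets, whereas here it is a Boolean function, and the helper `as_predicate` and the ORSet-style `remove_a` operation are names introduced for illustration.

```python
def as_predicate(op):
    """Turn o : source -> input -> output into a predicate o(source, input, output)."""
    return lambda source, inp, out: op(source)(inp) == out


def remove_a(source):
    """ORSet-style Remove(a) (assumed shape): delete the pairs for "a" seen at the source."""
    observed = frozenset(p for p in source if p[0] == "a")
    return lambda s: s - observed


remove_a_pred = as_predicate(remove_a)

source = frozenset({("a", 1)})
inp = frozenset({("a", 1), ("b", 2)})

print(remove_a_pred(source, inp, frozenset({("b", 2)})))    # True
print(remove_a_pred(frozenset(), inp, frozenset({("b", 2)})))  # False
```

The second call is false because a Remove generated at an empty source replica has observed nothing, so its effector leaves the input state unchanged.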

In order to specify a consistency model, we introduce a sort for events and binary predicates \(\mathsf {vis}\) and \(\mathsf {eo}\) over this sort. Here, we can take advantage of the declarative specification of consistency models and directly encode them in FOL. Given an encoding of CRDT operations and a consistency model, our verification strategy is to determine whether the Non-Interf property holds. Since both conditions of this property only involve executions of finite length (at most 3), we can directly encode them as \(\mathsf {UNSAT}\) queries by asking for executions which break the conditions. For condition (1), we query for the existence of two events \(\eta _1\) and \(\eta _2\) along with \(\mathsf {vis}\) and \(\mathsf {eo}\) predicates which satisfy the consistency specification \(\varPsi \) such that these events are not related by \(\mathsf {eo}\) and their effectors do not commute. For condition (2), we query for the existence of events \(\eta _1, \eta _2, \eta _3\) and their respective start states \(\sigma _1,\sigma _2, \sigma _3\), such that \(\eta _1\) and \(\eta _2\) commute modulo \(\varPsi \) but after interference from \(\eta _3\), they are not related by \(\mathsf {eo}\) and do not commute. Both these queries are encoded in EPR [18], a decidable fragment of FOL, so if the CRDT operations and the consistency policy can also be encoded in a decidable fragment of FOL (which is the case in all our experiments), then our verification strategy is also decidable. We write Non-Interf-1 and Non-Interf-2 for the two conditions of Non-Interf.
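To give a feel for the shape of the Non-Interf-1 query, the sketch below searches for counterexamples by exhaustive enumeration over a tiny universe. This is a deliberately swapped-in technique: it is not the EPR encoding discharged by an SMT solver, only a finite search that mirrors the same question. The ORSet effectors, the two-identifier universe, and the modelling of EC (no effector order) and CC (visibility implies effector order) for two-event executions are all simplifying assumptions.

```python
from itertools import combinations, product

PAIRS = [("a", 1), ("a", 2)]
STATES = [frozenset(c) for r in range(len(PAIRS) + 1)
          for c in combinations(PAIRS, r)]
OPS = [("add", "a", 1), ("add", "a", 2), ("remove", "a")]
INIT = frozenset()


def effector(op, source):
    """Generate the effector of `op` at a replica whose state is `source`."""
    if op[0] == "add":
        pair = (op[1], op[2])
        return lambda s: s | {pair}
    observed = frozenset(p for p in source if p[0] == op[1])
    return lambda s: s - observed


def commute(e1, e2):
    return all(e1(e2(s)) == e2(e1(s)) for s in STATES)


# Assumed two-event reading of the policies: does vis(eta1, eta2) force eo(eta1, eta2)?
POLICIES = {
    "EC": lambda vis12: False,
    "CC": lambda vis12: vis12,
}


def non_interf_1_counterexamples(policy):
    """Two-event executions whose events are unordered by eo and do not commute."""
    found = []
    for op1, op2, vis12 in product(OPS, OPS, (False, True)):
        if op1 == op2 and op1[0] == "add":
            continue  # identifiers of Add operations are assumed unique
        e1 = effector(op1, INIT)
        source2 = e1(INIT) if vis12 else INIT
        e2 = effector(op2, source2)
        ordered = POLICIES[policy](vis12)
        if not ordered and not commute(e1, e2):
            found.append((op1, op2, vis12))
    return found


print(non_interf_1_counterexamples("EC"))
print(non_interf_1_counterexamples("CC"))
```

Under the EC reading the search reports the executions in which an Add is visible to a Remove of the same element, and under the CC reading it reports none, matching the ORSet example of the previous section. An empty result here is evidence only for this small universe, not a proof.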

Fig. 4. Convergence of CRDTs under different consistency policies.

Figure 4 shows the results of applying the proposed methodology to different CRDTs. We used Z3 to discharge our satisfiability queries. For every combination of a CRDT and a consistency policy, the figure marks whether verification of Non-Interf failed or was satisfied. We also report the verification time taken by Z3 for every CRDT across all consistency policies executing on a standard desktop machine. We have picked the three collection datatypes for which CRDTs have been proposed, i.e. Set, List and Graph, and for each such datatype, we consider multiple variants that provide a tradeoff between consistency requirements and implementation complexity. Apart from EC, CC and PSI, we also use a combination of PSI and RB, which only enforces PSI between selected pairs of operations (in contrast to simple RB which would enforce SC between all selected pairs). Note that when verifying a CRDT under PSI, we assume that the set operations are implemented as Boolean assignments, and the write set \(\mathsf {Wr}\) consists of elements added/removed. We are unaware of any prior effort that has been successful in automatically verifying any CRDT, let alone those that exhibit the complexity of the ones considered here.

Set: The Simple-Set CRDT in Fig. 1 does not converge under EC or CC, but achieves convergence under PSI+RB which only synchronizes Add and Remove operations on the same elements, while all other operations continue to run under EC, since they do commute with each other. As explained earlier, ORSet does not converge under EC and violates Non-Interf-1. ORSet with tombstones converges under EC as well since it uses a different set (called a tombstone) to keep track of removed elements. USet is another implementation of the Set CRDT which converges under the assumptions that an element is only added once, and removes only work if the element is already present in the source replica. USet converges only under PSI, because under any weaker consistency model, Non-Interf-2 breaks, since Add(a) interferes and breaks the commutativity of Add(a) and Remove(a). Notice that as the consistency level weakens, implementations need to keep more and more information to maintain convergence: they compute unique ids, tag elements with them, or keep track of deleted elements. If the underlying replicated store supports stronger consistency levels such as PSI, simpler definitions are sufficient.
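The USet interference can be made concrete with a short Python sketch. The USet definition used here is an assumption reconstructed from the description above (Add inserts the element; Remove deletes it only if it was present at the source replica), and `psi_orders` is a hypothetical stand-in for the PSI rule that events with overlapping write sets are ordered.

```python
def add(a):
    """USet Add(a): the effector inserts `a`."""
    def at_source(source):
        return lambda s: s | {a}
    return at_source


def remove(a):
    """USet Remove(a): takes effect only if `a` is present at the source replica."""
    def at_source(source):
        if a in source:
            return lambda s: s - {a}
        return lambda s: s  # no-op when `a` was not observed
    return at_source


STATES = [frozenset(), frozenset({"a"})]


def commute(e1, e2):
    return all(e1(e2(s)) == e2(e1(s)) for s in STATES)


def psi_orders(writes1, writes2):
    """Assumed PSI rule: events writing a common element are ordered."""
    return bool(writes1 & writes2)


EMPTY = frozenset()

# Before interference: concurrent Add(a) and Remove(a), both starting from {}.
eta1 = add("a")(EMPTY)
eta2 = remove("a")(EMPTY)
print(commute(eta1, eta2))           # True: the Remove is a no-op

# After interference: another Add(a) is applied at Remove's replica first.
eta3 = add("a")(EMPTY)
eta2_prime = remove("a")(eta3(EMPTY))
print(commute(eta1, eta2_prime))     # False: the Remove now deletes "a"

# Under the assumed PSI rule both events write "a", so they would be ordered.
print(psi_orders({"a"}, {"a"}))      # True
```

The premise of Non-Interf-2 holds before interference and its conclusion fails afterwards unless the policy orders the two events, which is the pattern reported for USet under policies weaker than PSI.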

List: The List CRDT maintains a total ordering between its elements. It supports two operations: AddRight(e,a) adds new element a to the right of existing element e, while Remove(e) removes e from the list. We use the implementation in [1] (called RGA) which uses time-stamped insertion trees. To maintain integrity of the tree structure, the immediate predecessor of every list element must be present in the list, due to which operations AddRight(a,b) and AddRight(b,c) do not commute. Hence RGA does not converge under EC because Non-Interf-1 is violated, but converges under CC.

To make adds and removes involving the same list element commute, RGA maintains a tombstone set for all deleted list elements. This can be expensive as deleted elements may potentially need to be tracked forever, even with garbage collection. We consider a slight modification of RGA called RGA-No-Tomb which does not keep track of deleted elements. This CRDT now has a convergence violation under CC (because of Non-Interf-1), but achieves convergence under PSI+RB where we enforce PSI only for pairs of AddRight and Remove operations.

Graph: The Graph CRDT maintains sets of vertices and edges and supports operations to add and remove vertices and edges. The 2P2P-Graph specification uses separate 2P-Sets for both vertices and edges, where a 2P-Set itself maintains two sets for addition and removal of elements. While 2P sets themselves converge under EC, the 2P2P-Graph has convergence violations (to Non-Interf-1) involving AddVertex(v) and RemoveVertex(v) (similarly for edges) since it removes a vertex from a replica only if it is already present. We verify that it converges under CC. Graphs require an integrity constraint that edges in the edge-set must always be incident on vertices in the vertex-set. Since concurrent RemoveVertex(v) and AddEdge(v,v’) can violate this constraint, the 2P2P-Graph uses the internal structure of the 2P-Set which keeps track of deleted elements and considers an edge to be in the edge set only if its vertices are not in the vertex tombstone set (leading to a remove-wins strategy).
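The remove-wins treatment of edges can be sketched as follows. This Python fragment is a simplified model rather than the verified 2P2P-Graph specification: it assumes each 2P-Set is a pair of grow-only sets, shows only the effectors (that is, what is applied after any check at the source replica has passed), and uses hypothetical names throughout.

```python
from typing import FrozenSet, NamedTuple, Tuple

Vertex = str
Edge = Tuple[Vertex, Vertex]


class Graph(NamedTuple):
    """Two 2P-Sets: an added set and a tombstone set for vertices and for edges."""
    v_added: FrozenSet[Vertex] = frozenset()
    v_removed: FrozenSet[Vertex] = frozenset()
    e_added: FrozenSet[Edge] = frozenset()
    e_removed: FrozenSet[Edge] = frozenset()


def remove_vertex_effector(v):
    """Effector of RemoveVertex(v): record `v` in the vertex tombstone set."""
    return lambda g: g._replace(v_removed=g.v_removed | {v})


def add_edge_effector(u, v):
    """Effector of AddEdge(u,v): record the edge in the edge added set."""
    return lambda g: g._replace(e_added=g.e_added | {(u, v)})


def has_edge(g, u, v):
    """An edge counts only if neither endpoint is in the vertex tombstone set."""
    return ((u, v) in g.e_added and (u, v) not in g.e_removed
            and u not in g.v_removed and v not in g.v_removed)


start = Graph(v_added=frozenset({"v", "w"}))
rv = remove_vertex_effector("v")
ae = add_edge_effector("v", "w")

left = ae(rv(start))    # RemoveVertex(v) applied first
right = rv(ae(start))   # AddEdge(v,w) applied first

print(left == right)              # True
print(has_edge(left, "v", "w"))   # False
```

Because both effectors only grow sets, the two application orders yield the same state, and the lookup hides the dangling edge in either order, so the concurrent removal wins.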

Building a graph CRDT can be viewed as an exercise in composing CRDTs by using two ORSet CRDTs, keeping the internal implementation of the ORSet opaque, using only its interface. The Graph-with-ORSet implementation uses separate ORSets for vertices and edges and explicitly maintains the graph integrity constraint. We find convergence violations (to Non-Interf-1) between RemoveVertex(v) and AddEdge(v,v’), and RemoveVertex(v) and RemoveEdge(v,v’) under both EC and CC. Under PSI+RB (enforcing RB on the above two pairs of operations), we were able to show convergence.

When a CRDT passes Non-Interf under a consistency policy, we can guarantee that it achieves SEC under that policy. However, if it fails Non-Interf, it may or may not converge. In particular, if it fails Non-Interf-1 it will definitely not converge (because Non-Interf-1 constructs a well-formed execution), but if it passes Non-Interf-1 and fails Non-Interf-2, it may still converge because of the imprecision of Non-Interf-2. There are two sources of imprecision, both concerning the start states of the events picked in the condition: (1) we only use commutativity as a distinguishing property of the start states, but this may not be a sufficiently strong inductive invariant, and (2) we place no constraints on the start state of the interfering operation. In practice, we have found that for all cases except U-Set, convergence violations manifest via failure of Non-Interf-1. If Non-Interf-2 breaks, we can search for well-formed executions of greater length up to a bound. For U-Set, we were successful in adopting this approach, and were able to find a non-convergent well-formed execution of length 3.

6 Related Work and Conclusions

Reconciling concurrent updates in a replicated system is an important, well-studied problem in distributed applications, having been first studied in the context of collaborative editing systems [17]. Incorrect implementation of replicated sets in Amazon’s Dynamo system [7] motivated the design of CRDTs as a principled approach to implementing replicated data types. Devising correct implementations has proven to be challenging, however, as evidenced by the myriad pre-conditions specified in the various CRDT implementations [20].

Burckhardt et al. [6] present an abstract event-based framework to describe executions of CRDTs under different network conditions; they also propose a rigorous correctness criterion in the form of abstract specifications. Their proof strategy, which is neither automated nor parametric on consistency policies, verifies CRDT implementations against these specifications by providing a simulation invariant between CRDT states and event structures. Zeller et al. [24] also require simulation invariants to verify convergence, although they only target state-based CRDTs. Gomes et al. [9] provide mechanized proofs of convergence for ORSet and RGA CRDTs under causal consistency, but their approach is neither automated nor parametric.

A number of earlier efforts [2, 10,11,12, 22] have looked at the problem of verifying state-based invariants in distributed applications. These techniques typically target applications built using CRDTs, and assume their underlying correctness. Because they target correctness specifications in the form of state-based invariants, it is unclear if their approaches can be applied directly to the convergence problem we consider here. Other approaches [4, 5, 16] have also looked at the verification problem of transactional programs running on replicated systems under weak consistency, but these proposals typically use serializability as the correctness criterion, adopting a “last-writer wins” semantics, rather than convergence, to deal with concurrent updates.

This paper demonstrates the automated verification of CRDTs under different weak consistency policies. We rigorously define the relationship between commutativity and convergence, formulating the notion of commutativity modulo consistency policy as a sufficient condition for convergence. While we require a non-trivial inductive argument to show that non-interference to commutativity is sufficient for convergence, the condition itself is designed to be simple and amenable to automated verification using off-the-shelf theorem-provers. We have successfully applied the proposed verification strategy to all major CRDTs, additionally motivating the need for parameterization in consistency policies by showing variants of existing CRDTs which are simpler in terms of implementation complexity but converge under different weak consistency models.