Strong eventual consistency of the collaborative editing framework WOOT

Commutative Replicated Data Types (CRDTs) are a promising new class of data structures for large-scale shared mutable content in applications that only require eventual consistency. The WithOut Operational Transforms (WOOT) framework is the first CRDT for collaborative text editing introduced by Oster et al. (In: Conference on Computer Supported Cooperative Work (CSCW). ACM, New York, pp 259–268, 2006a). Its eventual consistency property was verified only for a bounded model to date. While the consistency of many other previously published CRDTs had been shown immediately with their publication, the property for WOOT remained open for 14 years. We use a novel approach identifying a previously unknown sort-key based protocol that simulates the WOOT framework to show its consistency. We formalize the proof using the Isabelle/HOL proof assistant to machine-check its correctness.


Introduction
A Replicated (Abstract) Data Type (RDT) consists of "multiple copies of a shared Abstract Data Type (ADT) replicated over distributed sites, [which] provides a set of primitive operation types corresponding to that of normal ADTs, concealing details for consistency maintenance" [25]. RDTs can be classified as state-based or operation-based depending on whether full states (e.g., a document's text) or only the operations performed on them (e.g., character insertions and deletions) are exchanged among replicas. Operation-based RDTs are commutative when the integration of any two concurrent operations on any reachable replica state commutes [27]. Commutative (Operation-Based) Replicated Data Types (CRDTs from now on) enable sharing mutable content with optimistic replication, ensuring high availability, responsive interaction, and eventual consistency in an asynchronous network without consensus-based concurrency control [14]. They are used in highly scalable robust distributed applications [4,31].
An RDT (and, in particular, a CRDT) is eventually consistent when, if after some point in time no further updates are made at any replica, all replicas eventually converge to equal states. It is strongly eventually consistent when it is both eventually consistent and strongly convergent, i.e., any pair of peers which have integrated the same set of updates (in possibly different order) are in the same state [27].
The first [3] proposed CRDT for collaborative text editing was the WithOut Operational Transforms (WOOT) Framework [21]. It has been implemented as part of several OSS projects [5,7,9,19]. However, its eventual consistency property was verified only for a bounded model to date [20,21]. The usual consistency proofs, which are based on the commutativity of operations, fail to apply to WOOT. Hence we use a novel approach, identifying a previously unknown sort-key based protocol that simulates the WOOT framework. Due to the length and complexity of the proof, we formalized it and machine-checked its correctness using the proof assistant Isabelle/HOL [10]. In summary, our novel contributions are:
• the introduction of a new class of sort-key based protocols for collaborative editing with logarithmic message size per edit operation,
• the observation that the WOOT framework can be simulated by an instance of the above, revealing previously unknown hidden structure of the framework.
After reviewing related work in the following section, we start in Sect. 3 with a well-known strongly consistent CRDT, making a sequence of refinements until we reach the WOOT framework. This allows us to discuss each idea in the proof individually, instead of presenting a large intricate proof in one go. Section 3 motivates the question of the existence of a certain sort-key space, which we confirm in Sect. 4. In Sect. 5 we present the rigorous results implied by the previous section and their formalized proof. We discuss the high-level proof strategy and key intermediate results. In Sect. 6 we explain how we modelled the framework and proof in Isabelle/HOL. In Sect. 7 we conclude with a summary and discuss open research directions.
Emin Karayel (me@eminkarayel.de), Edgar Gonzàlez (edgargip@google.com). 1 Google, Mountain View, USA. 2 Present Address: Karlsruhe, Germany.

Related work
Ellis and Gibbs [6] introduced the first collaborative text editing tools, which were based on operational transformations (OT). The basic idea behind OT-based frameworks is to adjust edit operations based on the effects of previously executed concurrent operations. For instance, in Fig. 1a, peer B can execute the message received from peer A without correction, but peer A needs to transform the one received from peer B to reach the same state. Proving the correctness of OT-based frameworks is error-prone and requires complicated case coverage [15,22]. Counter-examples have been found in most OT algorithms [25], [8, §8.2].
LSEQ [17], LOGOOT [31] and TreeDoc [23] are CRDTs that create and send sort keys for symbols (e.g., 1.5 and 3.5 in Fig. 1b). These keys are drawn from a dense totally ordered space and can be used directly to order the symbols, without requiring any transformations. In the figure rational numbers were chosen for simplicity, but more commonly lexicographically ordered sequences are used. The consistency property of these frameworks can be established easily. However, the space required per sort key potentially grows linearly with the count of edit operations. In LSEQ, a randomized allocation strategy for new identifiers is used to reduce the key growth, based on empirically determined edit patterns, but in the worst case the size of the keys will still grow linearly with the count of insert operations. Preguiça et al. [23] propose a solution for this problem using regular rebalancing operations. However, this can only be done using a consensus-based mechanism, which is only possible when the number of participating peers is small. A benefit of LSEQ, LOGOOT, and TreeDoc is that deleted symbols can be garbage-collected (though delete messages may have to be kept in a buffer if the corresponding insertion message has not arrived at a peer), in contrast to the WOOT Framework, where deleted symbols (tombstones) cannot be removed.
Replicated Growable Arrays (RGAs) are another data structure for collaborative editing, introduced by Roh et al. [25]. Contrary to the previous approaches, the identifiers associated with the symbols are not sort keys, but are instead ordered consistently with the happened-before relation. A peer sends the identifier of the symbol immediately preceding the new symbol at the time it was created and the actual identifier associated with the new symbol. The integration algorithm starts by finding the preceding symbol and skipping following symbols with a larger identifier before placing the new symbol. The authors provide a mathematical eventual consistency proof. Recently, Gomes et al. [8] also formalized the eventual consistency property of RGAs using Isabelle/HOL [18]. The message size of both the WOOT framework and of RGAs grows only logarithmically with the number of peers and edit operations.
In addition to the original design of WOOT by Oster et al. [21], a number of extensions have also been proposed. For instance, Weiss et al. [30] propose a line-based version called WOOTO. Ahmed-Nacer et al. [2] introduce a second extension called WOOTH, which improves performance by using hash tables. The latter compare their implementation in benchmarks against LOGOOT, RGA, and an OT algorithm. To the best of our knowledge there were no previous publications that further expand on the correctness of the WOOT Framework. The fact that the general convergence proof was missing had also been mentioned by Kumawat and Khunteta [12, §3.10].

Deriving WOOT
As we mentioned in the introduction, a rigorous correctness proof for the eventual consistency of the WOOT framework is long and complex, which is the reason we relied on a proof assistant to avoid any subtle flaws in the argument. In this section we want to highlight the key ideas in it: we derive the WOOT framework starting from the well-known consistent CRDT 2P-Set (Two-Phase-Set), making iterative refinements until we reach WOOT. On each refinement, we explain intuitively why it preserves consistency. This way, we can convey each idea from the exhaustive proof independently of the others.
[Figure: Peer A and Peer B, each with the symbol sequence p a n t]

2P-Set
The 2P-Set [27] is a simple CRDT that provides a shared replicated set to which elements can be added and from which they can be removed. Each peer keeps track of two sets S and R, both initially empty. An element e is added to (resp. removed from) the replicated data structure by broadcasting a message add e (resp. remove e). A peer that receives an add e message adds the element e to the first set S. A peer that receives a remove e message adds e to the second set R. A peer determines whether an element is in the 2P-Set by checking whether it is in the first set and not in the second set. Note that an element which was removed can never be added again. (In practice this can be circumvented by storing an additional unique identifier per set element. See also U-set [27] for an example of this design.) Messages are broadcast to all peers including the one that originated the message, and the integration algorithm for a message is identical, irrespective of whether the message originated from the same peer or from a different one. (Modelling a replicated data structure this way enables easier proofs, as there is no need to distinguish those two cases. However, a real-world implementation may have separate code paths for broadcasting a message to other peers and integrating the message locally. See for example [8, §5.2], where the same approach has been taken.) We summarize the CRDT in Protocol 1.
We think of the peers as single-threaded machines, communicating by broadcasting messages in an asynchronous network. There is no synchronization and there are no shared variables between peers. For simplicity, we do assume a single broadcast message will be received (with a possibly arbitrary delay) at most once by each peer and that messages are neither altered nor sent by peers implementing a different protocol.
It is easy to see that, once each peer has received all updates, they will all be in the same state, i.e., if we assume eventual delivery then we have eventual consistency. Similarly, two peers that have integrated the same set of messages will be in the same state. This essentially follows from the fact that set union is commutative, associative, and idempotent. For example, the operations add e and add f commute because: (S ∪ {e}) ∪ {f} = (S ∪ {f}) ∪ {e}.
A more naive implementation with only a single set, where remove e removes the element from that set, would not have the same consistency properties: a peer that receives an add operation after the remove operation of the same element will be in a different state than a peer that receives the remove operation after the add operation. Algebraically this happens because: (S ∪ {e}) \ {e} ≠ (S \ {e}) ∪ {e}.
In the CRDT community the elements of the second set are called tombstones. The information in the set needs to be preserved to enable order-independent integration of messages and would be avoidable in a single-process application.
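The two-set construction can be illustrated with a minimal Python sketch. It is not the paper's Protocol 1; the class name TwoPhaseSet, the message encoding as (kind, element) pairs and the method names are assumptions made for this example only.

```python
class TwoPhaseSet:
    """Minimal 2P-Set replica: `added` plays the role of S, `removed` of R."""

    def __init__(self):
        self.added = set()     # the set S
        self.removed = set()   # the set R (tombstones)

    def integrate(self, message):
        """Integrate one broadcast message; both kinds are plain set unions."""
        kind, element = message
        if kind == "add":
            self.added.add(element)
        elif kind == "remove":
            self.removed.add(element)
        else:
            raise ValueError(f"unknown message kind: {kind!r}")

    def contains(self, element):
        return element in self.added and element not in self.removed

    def elements(self):
        return self.added - self.removed


# Two replicas integrate the same messages in opposite orders.
messages = [("add", "e"), ("add", "f"), ("remove", "e")]
peer_a, peer_b = TwoPhaseSet(), TwoPhaseSet()
for m in messages:
    peer_a.integrate(m)
for m in reversed(messages):
    peer_b.integrate(m)
```

Both replicas end up with the same visible elements even though peer_b saw the remove before the matching add, which is exactly the case the single-set variant gets wrong.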

Sort keys
Symbols may occur multiple times and in arbitrary order within text, hence sets do not seem to be a useful CRDT for collaborative text editing, but we can easily support sequences using the 2P-Set as a building block.
To do that we use set elements e that are pairs of a sort key α(e) ∈ A and a symbol σ(e) ∈ Σ. The user only observes the symbols, but in the order induced by the sort keys. In the following we use the term character to refer to the compound tuple consisting of the symbol and additional associated information, like the sort key. In Table 1, we give an example of a state with pairs of sort keys and symbols for the sequence "pant".
Characters are inserted into the sequence by creating a sort key that is ordered between those of the preceding and succeeding sequence elements. In the example (Table 1), to …
In Protocol 2 we summarize the framework. We denote by rank_α(S, k) the function that returns the k-th smallest element of the set S according to the order induced by α. Similarly, sort_α(S) returns the sequence of the elements in S according to the order induced by α.
The algorithm build-sort-key computes a sort key between the pair of sort keys it was given. For the special case where a sort key for the beginning (resp. end) of the string needs to be generated, we allow passing in the special value ⊢ (resp. ⊣) as first (resp. second) argument, representing an element outside the set of sort keys with order strictly smaller (resp. larger) than all of them.
To clarify, the above approach works under the assumption that the available sort keys are elements of a dense totally ordered set, such that it is always possible to find a new sort key between each pair of previously generated sort keys. Additionally, a mechanism needs to be introduced that prevents the possibility of choosing the same sort key twice.
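A small Python sketch may help to make the sort-key idea concrete. It assumes rational sort keys, represents the two special values by 0 and 1, and uses the midpoint as a stand-in for build-sort-key; these choices and all function names are assumptions for illustration and not the paper's Protocol 2. In particular, the midpoint rule does nothing to prevent two peers from choosing the same key.

```python
from fractions import Fraction

BEGIN, END = Fraction(0), Fraction(1)   # stand-ins for the two special values


def build_sort_key(lower, upper):
    """Return a key strictly between lower and upper (here: the midpoint)."""
    if not lower < upper:
        raise ValueError("lower bound must be smaller than upper bound")
    return (lower + upper) / 2


def visible_text(added, removed):
    """Symbols of all non-removed (key, symbol) pairs, ordered by sort key."""
    return "".join(symbol for _, symbol in sorted(added - removed))


def key_for_insert(added, removed, position):
    """Sort key for inserting at `position` (0 = before the first symbol)."""
    keys = [BEGIN] + sorted(k for k, _ in added - removed) + [END]
    return build_sort_key(keys[position], keys[position + 1])


# Build "pnt" by appending, then insert 'a' after 'p'.
added, removed = set(), set()
for position, symbol in enumerate("pnt"):
    added.add((key_for_insert(added, removed, position), symbol))
added.add((key_for_insert(added, removed, 1), "a"))
```

The pair of sets is the 2P-Set state from the previous section; only the elements changed from plain values to (sort key, symbol) pairs.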

Avoiding collisions
As mentioned in the previous paragraph, the above solution may lead to collisions, where two distinct characters are inserted at the same position with the same sort key. That prevents them from being ordered in a sequence, as well as any possibility of inserting a character between them. To resolve this, we introduce a unique id i ∈ I for each inserted character. We reserve distinct dense subsets for the sort keys of each such unique id.

Protocol 3 New version of the Insert Algorithm
In Protocol 3, we give an updated version of the insertion algorithm. The function create-unique-id creates a new globally unique id. This can be achieved by assigning a unique id to each peer and keeping and incrementing a counter on each peer; the unique id of the character is then formed by the pair of peer id and counter. The function Ψ is used to generate a new sort key, with the unique id as an additional argument. As before, the first argument (the lower bound) may be the special value ⊢ to facilitate the creation of a sort key for the beginning of the string. Similarly, the last argument (the upper bound) may be the special value ⊣ to facilitate insertion at the end of the string. Note that we require Ψ to have at least the following properties:
Condition 1: l < Ψ(l, i, u) < u whenever l < u.
Condition 2: for l < u and l' < u', Ψ(l, i, u) = Ψ(l', i', u') implies i = i'.
The first property ensures that the newly computed sort key is actually between the sort keys of the adjacent characters. The second implies that for distinct unique ids the generated sort keys will be distinct, for any predecessor and successor.
We can give an example for a function fulfilling Conditions 1 and 2 in the case where the identifiers are natural numbers strictly between 0 and b, i.e., I = {1, …, b − 1}. Let Q_{b,i} be the set of rational numbers with a finite b-ary representation ending with i ∈ I, and let θ be an injective function from the rational numbers to the natural numbers. We can then define the following function on A := Q ∩ (0, 1), the set of rational numbers between 0 and 1: Ψ(l, i, u) is the element x of Q_{b,i} with l < x < u for which θ(x) is smallest, i.e., if we order the rational numbers according to the enumeration induced by θ, then Ψ(l, i, u) is the first rational number in the sequence that has a finite b-ary representation ending in i and whose value is strictly between l and u.
Note that we identified ⊢ with 0 (representing the beginning of the string) and ⊣ with 1 (representing the end of the string), while the sort keys are rational numbers strictly between 0 and 1. We don't give a proof that this indeed works, since we will see below that we need an additional property for Ψ, which narrows down the possible definitions further. But the curious reader may verify that the candidate sets for each i are disjoint but all are dense in Q.
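The enumeration-based example can be tried out with the following Python sketch. It fixes one particular enumeration order (shorter b-ary expansions first, then smaller values), which takes the role of θ here; that order, the default base 10 and the name psi_by_enumeration are assumptions for illustration only.

```python
from fractions import Fraction
from itertools import count


def psi_by_enumeration(lower, ident, upper, base=10):
    """First base-`base` fraction whose expansion ends in the digit `ident`
    and whose value lies strictly between `lower` and `upper`.

    Candidates are enumerated by expansion length and then by value.
    The bounds are expected to satisfy 0 <= lower < upper <= 1.
    """
    if not 0 < ident < base:
        raise ValueError("identifier must be a digit strictly between 0 and base")
    if not lower < upper:
        raise ValueError("lower bound must be smaller than upper bound")
    for length in count(1):
        denominator = base ** length
        # Numerators whose last base-`base` digit equals `ident`.
        for numerator in range(ident, denominator, base):
            key = Fraction(numerator, denominator)
            if lower < key < upper:
                return key


key_first = psi_by_enumeration(Fraction(0), 3, Fraction(1))
key_after = psi_by_enumeration(key_first, 3, Fraction(1))
key_before = psi_by_enumeration(Fraction(0), 7, key_first)
```

Because the last digit of the expansion is the identifier, keys generated for distinct identifiers can never coincide, which is the content of Condition 2; Condition 1 holds by construction of the search.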

Avoiding transfer of sort keys
The above scheme has the drawback that the bit size of the sort keys can grow linearly with the number of edit operations and, since they are part of the transferred operations, the same is true for the message size per edit operation. To fix that, we make a second change to the scheme. Instead of transferring the sort keys themselves, we send the unique id (from the previous section) of each character, as well as the unique ids of its immediate predecessor and successor at the time it was created. Additionally, we require that Ψ is a pure function, used by all peers. This allows every peer to compute the sort keys themselves.
Consider for example the characters in Table 2. We would assign them the sort keys: …
The identifier of a character, as well as the identifiers of its predecessor and successor, i.e., the identifiers of the characters that were preceding and succeeding it at the time it was created, are immutable, and using the function Ψ it is possible to compute the sort key of each character recursively. Note that the unique ids do not have to be order-preserving and can be constructed in a way that they only have logarithmic size with respect to the number of participating peers and edit operations.

Protocol 4 Deferred Computation of Sort Keys
In Protocol 4 we summarize the new scheme. In it, a character is represented by a triple in a peer's state; in particular:
• The identifier i(c) ∈ I
• A sort key α(c) ∈ A (as in the previous scheme)
• The symbol σ(c) ∈ Σ (as in the previous scheme)
The keen reader will notice that the scheme could fail if insert messages are delivered out of order. In particular, when the insert message m_1 for a sort key referenced by an insert message m_2 has not been delivered to a peer, the integration of m_2 will not succeed.
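The deferred computation can be sketched in Python as follows. The sketch assumes a toy identifier space (the digits 1 to 9), a toy pure sort-key function in the spirit of the enumeration example of Sect. 3.3, and hypothetical names (psi, sort_key, inserts); it is not the paper's Protocol 4 and it assumes that all referenced characters are already known to the peer.

```python
from fractions import Fraction
from itertools import count

BEGIN, END = "begin", "end"   # hypothetical markers for the two special values
BASE = 10                     # identifiers are the digits 1..9 in this toy setup


def psi(lower, ident, upper):
    """Toy pure sort-key function: first base-10 fraction ending in `ident`
    that lies strictly between `lower` and `upper`."""
    if not lower < upper:
        raise ValueError("lower bound must be smaller than upper bound")
    for length in count(1):
        for numerator in range(ident, BASE ** length, BASE):
            key = Fraction(numerator, BASE ** length)
            if lower < key < upper:
                return key


def sort_key(ident, inserts, cache):
    """Recursively derive the sort key of `ident` from its creation context."""
    if ident == BEGIN:
        return Fraction(0)
    if ident == END:
        return Fraction(1)
    if ident not in cache:
        pred, succ, _symbol = inserts[ident]
        cache[ident] = psi(sort_key(pred, inserts, cache), ident,
                           sort_key(succ, inserts, cache))
    return cache[ident]


# ident -> (predecessor id, successor id, symbol), as carried by insert messages
inserts = {
    1: (BEGIN, END, "p"),
    2: (1, END, "n"),
    3: (2, END, "t"),
    4: (1, 2, "a"),
}
cache = {}
ordered = sorted(inserts, key=lambda ident: sort_key(ident, inserts, cache))
text = "".join(inserts[ident][2] for ident in ordered)
```

Only identifiers travel in the messages; every peer that evaluates the same pure function on the same triples arrives at the same keys and therefore at the same order.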

Semantic causal delivery
One way to ensure that such a failure does not happen is to delay the integration of messages whose dependencies have not been received yet. The authors of WOOT have coined the term semantic causal delivery to refer to this delivery mechanism. Indeed, it is a weaker form of causal delivery, which would imply that a message is only delivered to a peer if all messages that were present during its creation were already delivered.
Since the latter set subsumes the dependencies, causal delivery automatically implies semantic causal delivery, but not vice versa.
Note that since there is never a dependency on a delete message, and the insert messages already have a unique id associated with them, we denote by deps(m) the set of insert messages a message m depends on (more precisely, the ids of the inserted characters). In Protocol 5 we depict a possible mechanism to ensure semantic causal delivery, where messages whose dependencies have not been received yet are buffered until their dependencies are integrated. For the discussion in the following sections, we will assume that some mechanism is employed to ensure semantic causal delivery, or potentially a stronger delivery notion.
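One possible buffering mechanism can be sketched in Python as follows. The message layout (dictionaries with the keys kind, id and deps), the class name BufferingPeer and the callback interface are assumptions for this illustration and are not taken from Protocol 5.

```python
class BufferingPeer:
    """Delays integration until all dependencies have been integrated."""

    def __init__(self, integrate):
        self.integrate = integrate    # callback applying one message to the state
        self.integrated_ids = set()   # ids of the inserts integrated so far
        self.pending = []             # received messages that are still waiting

    def receive(self, message):
        self.pending.append(message)
        progress = True
        while progress:               # one integration may unblock others
            progress = False
            for waiting in list(self.pending):
                if waiting["deps"] <= self.integrated_ids:
                    self.pending.remove(waiting)
                    self.integrate(waiting)
                    if waiting["kind"] == "insert":
                        self.integrated_ids.add(waiting["id"])
                    progress = True


log = []
peer = BufferingPeer(log.append)
peer.receive({"kind": "insert", "id": 2, "deps": {1}})     # buffered: 1 is missing
peer.receive({"kind": "delete", "id": 2, "deps": {2}})     # buffered as well
peer.receive({"kind": "insert", "id": 1, "deps": set()})   # unblocks both
```

Only ids of insert messages are recorded as satisfied dependencies, mirroring the remark that no message ever depends on a delete message.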

Acyclic dependency graph
The dependency relation between messages above is acyclic during a run of the framework. To see this, note that a message can only depend on messages already integrated by a peer. In fact, it is a consequence of the fact that the state graph of a distributed message-passing algorithm is acyclic: when we associate the messages with the states in which they were generated, we can see that the dependency relation is a subrelation of Lamport's acyclic happened-before relation. In Fig. 2 we illustrate the case of a message m_2 dependent on m_1, which also implies that the state creating m_1 must have happened before the state creating m_2. Note that the converse is not necessarily true, i.e., a message may not semantically depend on all the messages that were created or integrated before it.
Fig. 2 The causal relationship of messages is a subrelation of the happened-before relation (legend: receive event, broadcast event)

Interleaving anomalies
In Sect. 3.3 we described the minimum requirements on Ψ. The conditions ensure intention preservation, i.e., that the character appears at the place it was inserted. But for concurrent insertions at the same place, the order is unspecified. For example, if two peers concurrently insert characters with symbols x and y between a and b, the result after all messages are received may be "axyb" or "ayxb". A well-known anomaly [11] is the situation where entire words are inserted in the same spot concurrently. Consider for example the concurrent insertion of "pea" and "nut" at position 7 in "I like s". A good outcome would be "I like peanuts" or "I like nutpeas"; a careless definition of Ψ could assign sort keys resulting in something like "I like pneauts".
We solve the issue in two steps. First, similarly to Oster et al. [21], we require that the unique identifiers (not the sort keys) associated with each character form a total order themselves. We cannot require the order to be monotone with respect to the ordering of the characters (that's what the sort keys are there for), but we want to make sure that the unique identifiers generated by distinct peers do not interleave, i.e., identifiers generated by peer A should never be ordered between identifiers generated by peer B. The second step is to add an additional constraint to the function Ψ, requiring that the sort keys preserve the order on the identifiers, if that does not violate Condition 1:
Condition 3: if l < Ψ(l', i', u') < u, l' < Ψ(l, i, u) < u' and i < i', then Ψ(l, i, u) < Ψ(l', i', u').
Given two characters that are to be inserted within each other's boundaries, we require that the order of the identifiers is respected. The condition is non-trivial, but it can be derived from Oster et al. [21, Theorem 3], under the assumption that there is a sort-key mapping Ψ.
Fig. 3 Sort keys associated with the concurrent insertion of 'pea' and 'nut'
Let us see how this avoids the "pneauts" outcome: We call the sort key of the second space (resp. the symbol s) in "I like s" x (resp. y). The identifiers from the first peer generating "pea" are called i_1, i_2 and i_3. The identifiers from the second peer generating "nut" are called j_1, j_2 and j_3. We assume i_1 < i_2 < i_3 < j_1 < j_2 < j_3. The first peer types "pa" and inserts the "e" between those in the last step, so that i_2 is associated with "a" and i_3 is associated with "e". In Fig. 3 we present the sort keys associated with each of the symbols.
Using Condition 1 we can easily deduce that x < α_p < α_e < α_a < y and x < α_n < α_u < α_t < y. Since x < α_p < y and x < α_n < y we can conclude using Condition 3 that α_p and α_n are ordered with respect to the order chosen between i_1 and j_1, i.e., α_p < α_n. Using Condition 3 again for α_a and α_n, where we have α_p < α_a < y and α_p < α_n < y, we can determine that α_a < α_n, which implies α_p < α_e < α_a < α_n < α_u < α_t, i.e., the outcome "I like peanuts".

Avoiding the computation of sort keys
The keen reader may have noticed that, in the example above, there was no need to refer to a concrete instantiation of the function Ψ: we could order the entered symbols just by using Conditions 1 and 3 as rules. We will see that this is always possible, and that leads to the last change to the framework we are building. Instead of computing the sort key of a character, we keep the characters in a sequence according to the order induced by the sort keys. The state of a peer is a sequence of characters: w_1, …, w_|w|. For each character w_k, we remember the identifiers of the predecessor and successor characters l(w_k), u(w_k) (the characters that were adjacent to the character when it was created), as well as the character's identifier i(w_k) and symbol σ(w_k). This information is stored so that we can make sure the preconditions of Condition 3 are met. Note that we never remove characters from the sequence but just mark them as deleted, by replacing the character's symbol with ⊥.
The integration algorithm for an insert message becomes the following: given a newly received character w_new to be inserted, we can look up the positions l and u of the preceding and succeeding characters in the sequence, for which we know the identifiers, i.e., l = idx_w(l(w_new)) and u = idx_w(u(w_new)). Note that idx_w(i) denotes the position of the character with identifier i in a peer's state w. Note also that l(w_new) (resp. u(w_new)) may be ⊢ (resp. ⊣). Thus, we define idx_w(⊢) := 0 and idx_w(⊣) := |w| + 1 for completeness.
Due to Condition 1, we know that the sort key of the new character has to be between the sort keys of w_l and w_u. In the easiest case, these two would already be adjacent (i.e., u = l + 1) and we can just insert the new character after w_l. If not, we need to narrow down the position further using Condition 3. This is possible for a subset T_w(l, u) of the characters strictly between w_l and w_u whose dependencies are outside of the range w_l and w_u, i.e., the position of the predecessor (resp. successor) of w_t for t ∈ T_w(l, u) needs to be less (resp. greater) than or equal to l (resp. u). This implies that the sort key of w_new lies between the sort keys of the predecessor and the successor of w_t; similarly, the sort key of w_t lies between the sort keys of w_l and w_u. In those cases, we can use Condition 3 to decide the relative order of w_new and w_t, depending on whether i(w_new) < i(w_t) or i(w_new) > i(w_t), for each w_t ∈ T_w(l, u).
Note that T_w(l, u) cannot be empty: since the dependency graph is acyclic, we can choose a minimal element according to the dependency relation from the set of characters strictly between w_l and w_u, which by definition cannot have a dependency between w_l and w_u (otherwise it would not be a minimal element). Also, we can apply Condition 3 between pairs of elements in T_w(l, u), which implies that the identifiers in T_w(l, u) will be strictly increasing with the position of the character. This leads to an integration algorithm where a consecutive pair of elements in T_w(l, u) (enclosing the identifier i(w_new)) is chosen to narrow down l and u.
Fig. 4 Example of the insertion of the character with symbol 'n' after the characters 'p', 'e' and 'a' from the previous example in Sect. 3.7. In the first step the integration algorithm determines the possible position for the new character using its predecessor and successor. After that the position is further narrowed down, using identifier comparisons with characters whose dependencies are outside the target range. The characters that are in L in each iteration step are depicted in dark gray. The positions of the narrowed bounds for the next step are depicted using the arrows with the labels l and u.

Protocol 6 The WOOT Framework
In Protocol 6 we present the resulting framework, which is the WOOT framework as described by Oster et al. In Fig. 4 we depict the integration of the character 'n' from the example of Sect. 3.7 using the integration algorithm. To summarize, if the function Ψ fulfills Conditions 1, 2 and 3 then the frameworks in Protocol 4 and Protocol 6 behave identically, exchanging the same set of messages for the same modifications and providing the same view. While the internal state of Protocol 6 keeps the sequence of characters ordered according to the sort keys implicitly, Protocol 4 explicitly computes them.
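The integration of insert messages described in Sect. 3.8 can be sketched in Python as follows. The sketch follows the prose description only and has not been checked against Protocol 6; the dictionary layout of characters, the names BEGIN, END, idx, integrate_insert and replay, and the use of (peer id, counter) pairs as totally ordered identifiers are assumptions for this illustration. Deletions (replacing the symbol by a tombstone marker) are left out.

```python
BEGIN, END = "begin", "end"   # hypothetical names for the two special values


def idx(w, ident):
    """1-based position of the character with id `ident`; 0 and len(w) + 1
    for the special begin and end values."""
    if ident == BEGIN:
        return 0
    if ident == END:
        return len(w) + 1
    return next(k for k, c in enumerate(w, start=1) if c["id"] == ident)


def integrate_insert(w, new):
    """Insert the character `new` into the sequence `w` (a list) in place."""
    l, u = idx(w, new["pred"]), idx(w, new["succ"])
    while u != l + 1:
        # Characters strictly between l and u whose own predecessor and
        # successor lie outside that range (the set T_w(l, u) of the text).
        t = [k for k in range(l + 1, u)
             if idx(w, w[k - 1]["pred"]) <= l and idx(w, w[k - 1]["succ"]) >= u]
        # Narrow l and u to the consecutive pair of T enclosing the new id.
        for k in t:
            if w[k - 1]["id"] < new["id"]:
                l = k
            else:
                u = k
                break
    w.insert(l, new)


def char(ident, pred, succ, symbol):
    return {"id": ident, "pred": pred, "succ": succ, "symbol": symbol}


def replay(messages):
    """Integrate the insert messages in the given order and return the text."""
    w = []
    for message in messages:
        integrate_insert(w, dict(message))
    return "".join(c["symbol"] for c in w)


# The "pea"/"nut" example: identifiers are (peer id, counter) pairs.
X, Y = (0, 1), (0, 2)
base = [char(X, BEGIN, END, " "), char(Y, X, END, "s")]
pea = [char((1, 1), X, Y, "p"), char((1, 2), (1, 1), Y, "a"),
       char((1, 3), (1, 1), (1, 2), "e")]
nut = [char((2, 1), X, Y, "n"), char((2, 2), (2, 1), Y, "u"),
       char((2, 3), (2, 2), Y, "t")]
```

Replaying the two words in either order yields the same sequence without ever computing a sort key, which is the point of the last refinement.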

The function Ψ
As seen in the previous section, it is possible to simulate WOOT with the sort-key based Protocol 4, under the assumption that for any totally ordered identifier space I we can construct a sort key space and a function Ψ fulfilling Conditions 1, 2 and 3. In this section, we prove that this is indeed possible. We start by constructing such a sort key space under the assumption that I is finite. To extend to the infinite case, we then use the compactness theorem. We omit Condition 2 in the intermediate results, but we conclude at the end of this section that Condition 2 follows from Conditions 1 and 3. We would like to note that this is a novel result, not previously mentioned in publications to the best of our knowledge.
We denote by ⌊x⌋ (resp. ⌈x⌉) the largest integer (resp. smallest integer) smaller or equal (resp. larger or equal) to x.
… for all l, u ∈ Q_b. Additionally, we can conclude using the properties about d_b we established: …
We write μ_l^{-1}, ν_u^{-1} for the inverses of μ_l and ν_u, i.e., μ_l^{-1}(μ_l(x)) = x and ν_u^{-1}(ν_u(x)) = x.
Definition of Ψ: Finally, we can define Ψ using recursion on the length of the expansion of l and u, i.e., max(d_b(l), d_b(u)):
Ψ(l, i, u) := i if l < i < u,
Ψ(l, i, u) := μ_l^{-1}(Ψ(μ_l(l), i, μ_l(u))) if i ≤ l,
Ψ(l, i, u) := ν_u^{-1}(Ψ(ν_u(l), i, ν_u(u))) if u ≤ i.
To provide an intuition for the function Ψ, we refer to Fig. 5. We split the rational numbers between 0 and 1 into b equal-sized intervals. The identifiers correspond to the endpoints of the intervals that are strictly between 0 and 1. During each recursion step, an interval is scaled to the range between 0 and 1.
Note that we define Ψ on a larger domain {(l, i, u) | l < u ∈ Q_b, l < 1, u > 0, i ∈ I} than stated in the proposition. (This is necessary, since the recursion relies on the extended domain, for example when b = 4: … See also the second example in Fig. 5. On the other hand, if Ψ fulfills Conditions 1 and 3, this will remain true for any restriction of it.)
In the case that max(d_b(l), d_b(u)) = 0 we can conclude l ≤ 0 from the fact that l < 1 and l is an integer, and similarly that u ≥ 1, and thus l < i < u, i.e., Ψ(l, i, u) is directly defined, i.e., Ψ(l, i, u) = i. Otherwise, we can conclude from (3) and (4) that the value Ψ(l, i, u) is either also directly defined or defined in terms of arguments with smaller max(d_b(l), d_b(u)). That those are still in the domain follows from the monotonicity of μ_l, ν_u as well as the inequalities (1), (2). Note that since i ∈ Q_b and the application of both μ_l^{-1} and ν_u^{-1} preserves membership in Q_b, we can deduce that the range of Ψ is in Q_b.
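The recursion can be explored with the following Python sketch. The explicit rescaling maps used here, μ_l(x) = bx − ⌊bl⌋ and ν_u(x) = bx − ⌈bu⌉ + 1, are an assumption read off from the interval-splitting intuition and the caption of Fig. 5 (rescaling the range between 2/4 and 3/4 to 0 and 1); the function name psi and the default b = 4 are likewise chosen for this illustration only.

```python
from fractions import Fraction
from math import ceil, floor


def psi(l, i, u, b=4):
    """Recursive sort-key function; `i` is expected to be k/b with 0 < k < b.

    Assumed rescaling maps (see the lead-in): x -> b*x - shift, where
    shift = floor(b*l) when i <= l and shift = ceil(b*u) - 1 when u <= i.
    """
    if not (l < u and l < 1 and u > 0):
        raise ValueError("arguments outside the extended domain")
    if l < i < u:
        return i                    # directly defined case
    if i <= l:
        shift = floor(b * l)        # interval containing l, scaled to [0, 1]
    else:                           # here u <= i
        shift = ceil(b * u) - 1     # interval ending at or after u
    inner = psi(b * l - shift, i, b * u - shift, b)
    return (inner + shift) / b      # apply the inverse rescaling


quarter, half, three_quarters = Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)
no_recursion = psi(Fraction(0), half, Fraction(1))
one_recursion = psi(quarter, quarter, Fraction(1))
two_recursions = psi(Fraction(0), half, Fraction(1, 16))
```

Every call either returns the identifier itself or zooms into one of the b subintervals, so the result always lies strictly between the two bounds that were passed in.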
As before we identify 0 with ⊢, the sort key associated with the beginning of the string, and 1 with ⊣, the sort key associated with the end of the string. The set of sort keys is …
Range of Ψ: We first show that 0 < Ψ(l, i, u) < 1 (6) and l < Ψ(l, i, u) < u (7).
• Case l < i < u: Then Ψ(l, i, u) = i and both (6) and (7) follow by definition.
• Case i ≤ l: Then Ψ(l, i, u) = μ_l^{-1}(Ψ(μ_l(l), i, μ_l(u))), and using the induction hypothesis we can conclude μ_l(l) < Ψ(μ_l(l), i, μ_l(u)) < μ_l(u), which implies (7). Also using the induction hypothesis we have 0 < … from which (6) follows using 0 …
• Case u ≤ i: Then Ψ(l, i, u) = ν_u^{-1}(Ψ(ν_u(l), i, ν_u(u))), and we can again using the induction hypothesis conclude ν_u(l) < Ψ(ν_u(l), i, ν_u(u)) < ν_u(u), which implies (7). Also using the induction hypothesis we have 0 …
Fig. 5 Example evaluations of Ψ with no, one and two recursions for b = 4. Panels: example evaluation of Ψ when l < i < u, showing Ψ(l, 2/4, u); example evaluation of Ψ when i < l using 1 recursion, where μ_l (represented by the dashed lines) is applied to rescale the range between 2/4 and 3/4 to 0 and 1; example evaluation of Ψ when u < i using 2 recursions, where the dashed lines represent the rescaling by ν_u.
Monotonicity of Ψ: Next we show that
Ψ(l, i, u) < Ψ(l, i', u) for i < i', l < u, l < 1 and u > 0 (8)
… where the equalities are by definition and the inequality follows from the induction hypothesis.
• Case i ≤ l < i' < u: Note that l < i' implies bl < bi', which implies ⌊bl⌋ + 1 ≤ bi' since both sides of the inequality are integers. Using (6) we can now conclude: …
… and the result follows using the induction hypothesis analogous to the first case. Otherwise, ⌊bl⌋ < ⌈bu⌉ − 1 and thus ⌊bl⌋ + 1 ≤ ⌈bu⌉ − 1, since both sides of the inequality are integers. We can use (6), arriving at …
… bi < ⌈bu⌉, and since both sides of the inequality are integers, we have bi ≤ ⌈bu⌉ − 1; hence using (6) we can conclude …
• Case l < u ≤ i < i': Can be shown using the induction hypothesis analogous to the first case.
Stability of Ψ: Next we show that
Ψ(l, i, u) = Ψ(l', i, u') for l ≤ l' < Ψ(l, i, u) < u' ≤ u (9)
We consider the three cases from the definition of Ψ(l, i, u):
• Case l < i < u: Then we have Ψ(l, i, u) = i and hence l' < i < u' due to the assumption, which implies Ψ(l', i, u') = i.
• Case i ≤ l: … which implies ⌊bl'⌋ ≤ ⌊bl⌋ since the left and right hand sides are integers. On the other hand ⌊bl⌋ ≤ ⌊bl'⌋ follows directly from l ≤ l', i.e., ⌊bl⌋ = ⌊bl'⌋. Using that, we can rely on the induction hypothesis to conclude Ψ(l, i, u) = Ψ(l', i, u').
• Case u ≤ i: … ⌈bu⌉ − 1, which implies ⌈bu'⌉ ≥ ⌈bu⌉ since the left and right hand sides are integers. On the other hand ⌈bu'⌉ ≤ ⌈bu⌉ follows directly from u' ≤ u, i.e., ⌈bu'⌉ = ⌈bu⌉. Like in the previous case, we can rely on the induction hypothesis to conclude Ψ(l, i, u) = Ψ(l', i, u').
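We have already shown that Ψ fulfills Condition 1 in (7). To show Condition 3, let us assume l < Ψ(l', i', u') < u, l' < Ψ(l, i, u) < u' and i < i'; then:
Ψ(l, i, u) = Ψ(max(l, l'), i, min(u, u')) < Ψ(max(l, l'), i', min(u, u')) = Ψ(l', i', u')
where the inequality follows from (8) and the equalities from (9).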
It is easy to extend the previous result to arbitrary totally ordered finite identifier sets: … By Proposition 1, there is a totally ordered set A and a function Ψ' : X → A fulfilling Conditions 1 and 3, where … We can define Ψ(l, i, u) = Ψ'(l, φ(i), u) when l < u and arbitrarily¹³ set Ψ(l, i, u) = 1/b if l ≥ u.
Proposition 3 Let I be a totally ordered set. Then there exists a totally ordered set A and a function Ψ : (A ∪ {⊢}) × I × (A ∪ {⊣}) → A fulfilling Conditions 1 and 3.
Proof To extend the result from Proposition 2 to the infinite case, we use the compactness theorem [16, §2.1]:Let us assume that I is an infinite totally ordered set.We introduce a language L with two constant symbols and , one relation symbol < and an infinite set of 2-ary function symbols i for each element of I. Consider the following infinite set of first order sentences: The sentences (10-15) express that < is a strict total order and that the constants (resp.) are the smallest (resp.largest) elements in that order.Note that ( 16) and ( 17) constitute infinite sets of first order sentences.If the set of sentences has a model, it is easy to see that there is a totally ordered set A and a function fulfilling Condition 1 and 3. (Note that A would be identified with the elements of the model excluding the values associated to , and (l, i, u) would be the value associated with the value of i (l, u)).To show that (10-17) have a model, we rely on the compactness theorem, which is asking us to check whether any finite subset of those sentences have a model.For such a finite subset F, there will be a finite subset I of I such that the sentences (10-15) and the restriction of ( 16) and ( 17) to the sentences associated with the identifiers in I are a superset of F. For I we can rely on the previous result on finite identifier spaces to find a model, which implies using the compactness theorem, that this theorem is true even if the identifier space is infinite.
Theorem 1 Let I be a totally ordered set.Then there exists a totally ordered set A and a function : A∪{ }×I ×A∪{ } → A fulfilling conditions 1, 2 and 3. 13 Since the antecedents of both conditions 1 and 3 imply l < u.
Proof Because of Proposition 3, we only need to show that Condition 2 is true. Let l, l′ ∈ A ∪ {⊢} and u, u′ ∈ A ∪ {⊣} such that l < u, l′ < u′ and: … (18)

We show that i = i′ by contradiction. Let us hence assume that i ≠ i′, which implies either i < i′ or i > i′ since I is totally ordered. We can then infer using Condition 1 and (18) that l, l′, i, i′, u, u′ fulfill the premise of Condition 3 and hence: … Both conclusions are in conflict with (18). Hence, the assumption that i ≠ i′ must have been false.

Strong eventual consistency of WOOT
Section 3 gave an informal derivation of the WOOT Framework and its consistency as a sequence of simulation arguments. We identified that Theorem 1 should imply that WOOT is strongly eventually consistent. However, a rigorous proof of the implication requires a large number of lemmas and definitions.14 We decided to use Isabelle/HOL to carry out the rigorous proof to avoid subtle flaws in the arguments. The resulting machine-checked proof [10] was open-sourced to the Archive of Formal Proofs (AFP) [1].
In this section, we describe the exact distributed execution model and the results we have verified in Isabelle/HOL and give a brief overview of the proof.Definitions, assumptions and theorems are accompanied by footnotes that reference the corresponding entities in the formalized proof.

Distributed system
To rigorously express the consistency properties, we need to formally define the distributed execution model used in the proof. We followed the modelling laid out by Gomes et al. [8] and Raynal [24, Chapter 6] for distributed message-passing algorithms, but refined it for the case of the WOOT framework.

Let P be a finite set of peer identifiers.15 Similarly to Shapiro et al. [27, §2], we assume the participating peers are non-Byzantine. We model an execution of the framework as sequences of events for each peer,16 where an event can be either a broadcast event or the reception of a message, and we call this sequence the history h(p) of the peer.17 We write h(p)_i = broadcast(m) if the i-th event of peer p was broadcasting the message m, and h(p)_i = receive(m, q, j) if it was the reception of the message m broadcast at the j-th event of peer q. Figure 6 provides an example of such histories. Indices for histories start from 0 and we use the notation |h(p)| for the number of events. An event can be uniquely identified by a pair comprised of a peer identifier p ∈ P and its index in the history of that peer; we call such a pair an event id.18

In Raynal [24, Chapter 7], peers can also have internal events and, instead of broadcast events that disseminate a message to all peers, messages are directed to individual peers. We omit these cases for simplicity. Another difference is in the assumption that all messages are distinct. Since in the WOOT framework it is possible for two peers to send the same message,19 we avoid that requirement and instead use 3-tuples for receive events, capturing the event index and the peer the message originated from. In this way, the links between a broadcast event and its corresponding reception events are still represented.
Note that while we assume event indices count successive events for the same peer, there is no synchronicity assumption between indices from distinct peers. The only ordering of event ids between peers is induced by the causality implied by message transmission. To that end, we introduce a relation on the event ids,20 the happened-immediately-before relation →_hib.21 The relation (p, i) →_hib (q, j) holds if i < |h(p)|, j < |h(q)| and either:
• p = q and j = i + 1, i.e., they are successive events on the same peer, or
• there exists m such that h(q)_j = receive(m, p, i), i.e., the latter event is the reception of a message sent by the former event.

17 datatype ('p, 's) event
18 type_synonym 'p event_id
19 This happens when two peers delete the same character concurrently.
20 More commonly in other work, this relation is defined between events. However, because we do not assume that messages are distinct, and thus it is possible to have the same broadcast event multiple times, we instead define the relation on event ids.
The transitive closure of → hib is the happened-before relation → hb , which was introduced by Lamport [13] to order events of asynchronous distributed systems.
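The two relations can be made concrete with a small sketch. The tuple encoding of events and the example histories below are assumptions made for illustration only; they are not taken from the formalization. The sketch builds the happened-immediately-before edges from the two clauses of the definition and then computes the happened-before relation as their transitive closure.

```python
from itertools import product

# Hypothetical encoding: an event is ("broadcast", m) or ("receive", m, q, j).
history = {
    "p": [("broadcast", "A"), ("receive", "A", "p", 0), ("receive", "B", "q", 1)],
    "q": [("receive", "A", "p", 0), ("broadcast", "B"), ("receive", "B", "q", 1)],
}

def happened_immediately_before(h):
    """Return all pairs ((p, i), (q, j)) of event ids with (p, i) ->hib (q, j)."""
    edges = set()
    for p, events in h.items():
        for i, event in enumerate(events):
            if i + 1 < len(events):
                # Successive events on the same peer.
                edges.add(((p, i), (p, i + 1)))
            if event[0] == "receive":
                _, _, q, j = event
                if j < len(h[q]):
                    # The event (q, j) is the one whose message is received at (p, i).
                    edges.add(((q, j), (p, i)))
    return edges

def transitive_closure(edges):
    """Naive fixed-point computation of the transitive closure."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

hb = transitive_closure(happened_immediately_before(history))
# The relation is acyclic exactly when no event id happened before itself.
is_acyclic = all(a != b for a, b in hb)
```

In this example the broadcast of A at (p, 0) happened before every other event, while (q, 0) and (p, 1) are not ordered relative to each other.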
Histories which describe an actual execution of a distributed algorithm fulfill additional conditions.For example, a peer can only receive a message from another peer if the latter broadcast that message.We summarize the assumptions about histories of a distributed system in the following condition:

Condition 4 (Distributed Execution)
• If a message m was received from peer q event j, that event must be a broadcast event, i.e., if h(p)_i = receive(m, q, j) then h(q)_j = broadcast(m).22
• A broadcast event will deliver a message to each peer at most once, i.e., for all p and i, j < |h(p)|, if there exist m, q, k such that h(p)_i = h(p)_j = receive(m, q, k) then i = j. In practice, if the network communication mechanism does not guarantee at-most-once delivery, this condition can also be simulated by keeping track of all received messages in the implementation. Note that we do not yet require that a message will be received at all by all peers.23
• The happened-immediately-before relation →_hib is acyclic. This is equivalent to the fact that the happened-before relation →_hb is a strict partial order. This condition about the execution of distributed systems is a consequence of the fact that the distributed system runs on physical machines and that events cannot cause themselves. See for example Lamport [13, §The Partial Ordering].24

We will write R(p, i) 25 to denote the set of messages received by a peer before event i, i.e.,

R(p, i) = {m | there exist j < i, q and k such that h(p)_j = receive(m, q, k)}.
For results where we need to assume that all broadcast messages will be delivered to each peer, we introduce the eventual delivery condition:

Condition 5 (Eventual Delivery) A message will be delivered to all peers, i.e., for all p, q ∈ P and j < |h(q)|, if h(q)_j = broadcast(m) then there exists i < |h(p)| such that h(p)_i = receive(m, q, j).

The following condition expresses the semantic delivery condition we introduced in Sect. 3.5, i.e., a message is not received before its dependencies are. If the underlying communication protocol does not meet this condition, an implementation can buffer messages until their dependencies are received. See Protocol 5 for an example implementation of such an algorithm.

Condition 6 (Semantic Causal Delivery) A message will only be delivered to a peer if its dependencies have already been delivered to it, i.e., for all p ∈ P and j < |h(p)|, if h(p)_j = receive(m, q, k) then for each i ∈ deps(m) there exists an m′ ∈ R(p, j) such that i(m′) = i. Here we refer to the function deps 26 which is defined in Protocol 5.27

In addition to events, we also associate a sequence s(p) of states to each peer p ∈ P. Similarly to the notation for histories, we write |s(p)| for the number of states and index the states starting from 0. Whenever a peer receives a message, it will update its state using the corresponding integration algorithm for the type of message it received, as described in Protocol 6. Similarly, a broadcast message is created using one of the modify algorithms described in Protocol 6.
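The buffering strategy mentioned for the semantic causal delivery condition can be sketched as follows. This is a minimal illustration and not Protocol 5 itself: the representation of a message as a pair of an optional identifier and a list of dependency identifiers, and the name `deliver_in_causal_order`, are assumptions made for this example.

```python
def deliver_in_causal_order(arrivals):
    """Buffer arriving messages until all their dependencies were delivered.

    Each message is a hypothetical pair (msg_id, deps). msg_id is None for
    messages that do not introduce an identifier (e.g. deletions).
    Returns the messages in delivery order and the messages still buffered.
    """
    delivered_ids = set()
    delivered = []
    pending = []
    for message in arrivals:
        pending.append(message)
        progress = True
        while progress:  # a delivery may unblock other buffered messages
            progress = False
            for candidate in list(pending):
                msg_id, deps = candidate
                if set(deps) <= delivered_ids:
                    pending.remove(candidate)
                    delivered.append(candidate)
                    if msg_id is not None:
                        delivered_ids.add(msg_id)
                    progress = True
    return delivered, pending

# Messages arrive in the reverse of their dependency order.
arrivals = [(None, ["b"]), ("b", ["a"]), ("a", [])]
delivered, pending = deliver_in_causal_order(arrivals)
```

A message whose dependencies never arrive stays in the buffer, which is why eventual delivery is stated as a separate condition.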
To be able to express properties about the histories of states, we represent the integration and modification algorithms in Protocol 6 as mathematical functions.For the integration algorithms, if m is a message and s a preceding state then integrate(m, s) 28 returns the new state after integrating the message into state s.Depending on the type of message, this is the result of applying the algorithm integrate insert or integrate delete.In cases where the integration algorithm fails or does not terminate, the resulting state is ⊥ and we define integrate(m, ⊥) = ⊥.
Both modification algorithms modify insert and modify delete read but do not modify the state, and their last statement is a broadcast. We introduce functions that return the message that would be broadcast by them. Given a state s, we define the function create-insert(i, σ, k, s),29 which returns the message that would be broadcast by the algorithm modify insert(σ, k) if the state of the peer were s and the unique id returned by the create-unique-id function were i. Similarly, create-delete(k, s) 30 returns the message that would be broadcast by modify delete(k) if the state of the peer were s.

26 fun deps [10, §4.7]
27 assumes semantic_causal_delivery [10, §4.7]
28 fun integrate [10, §4.6]
29 fun create_insert [10, §4.5]
In the following, we summarize the conditions that express that each peer implements Protocol 6.

Condition 7 (Peers execute WOOT Protocol)
1. The number of states is exactly one larger than the number of events, i.e., |s(p)| = |h(p)| + 1. This is because we use s(p)_0 for the initial state of a peer, before any event has happened on the peer.
2. Each peer's initial state is the empty string, i.e., s(p)_0 = ε.
3. If a peer receives a message, the resulting state is the output of applying the integration algorithm for that message on the previous state, i.e., s(p)_{i+1} = integrate(m, s(p)_i) if there exist m, q, j such that h(p)_i = receive(m, q, j).
4. In the case of a broadcast event, the state of the peer remains the same, i.e., s(p)_{i+1} = s(p)_i if there exists m such that h(p)_i = broadcast(m). Deferring the update of the state allows us to simplify the correctness proof: we can model the broadcast event as transmitting the message to all peers including the source peer itself; its state will be updated when it receives its own message. An actual implementation would usually introduce a separate code path to update the peer's own state. See also Fig. 6 and Sect. 3.1.31
5. In the case of a broadcast event, the message was created by applying either the modify insert or the modify delete algorithm on the state s(p)_i, i.e., if h(p)_i = broadcast(m) then s(p)_i ≠ ⊥ and m is either the result of create-insert, with the event id (p, i) as the unique identifier, or the result of create-delete, applied to the state s(p)_i. Note that this means we are assuming a peer which reached the failure state will not broadcast any more messages.32

We would like to note a simplification made regarding the unique identifier. In the description of Protocol 6, we define the create-unique-id procedure to be an arbitrary algorithm that returns unique identifiers. In the above conditions, we use the combination of peer id and event index as the unique id for newly created characters (see Condition 7 Clause 5). This is a valid implementation for create-unique-id, but implementations could of course choose other methods to create unique identifiers.

Results
With the definitions in this section, we have verified the following two theorems using the Isabelle/HOL interactive theorem prover.

Theorem 2 (33) During the distributed execution of the WOOT framework with semantic causal delivery, the integration algorithms will never fail, i.e., if the conditions 4, 6 and 7 are met, then for all p ∈ P and i < |s(p)| we have s(p)_i ≠ ⊥.
We recall the definition of Strong Convergence from Sect. 1 and Shapiro et al. [27, §2.2]:
Definition 1 A CRDT is strongly convergent if any pair of peers who have received the same set of messages will be in equal states, i.e., for all p, q ∈ P, i < |h( p)| , j < |h(q)|, if R( p, i) = R(q, j) then s( p) i = s(q) j .
Theorem 3 (Strong Convergence 34) During the execution of the WOOT framework with semantic causal delivery, if two peers have received the same set of messages they will be in the same state, i.e., if the conditions 4, 6 and 7 are met then for all p, q ∈ P, i < |h(p)|, j < |h(q)|, if R(p, i) = R(q, j) then s(p)_i = s(q)_j.

Proof Verified in [10, §6].
We recall the definition of Eventual Consistency from Sect. 1 and Shapiro et al. [27, §2.2]:
Definition 2 A CRDT is eventually consistent when, if after some point no further updates are made at any peer, then the peers will eventually reach the same final state, i.e., there is a state s such that for all p ∈ P: s( p) |s( p)|−1 = s.
Except for trivial cases, an operation-based CRDT can only be eventually consistent if all messages are actually delivered. On the other hand, strong convergence can be proved even without that assumption. If we additionally assume eventual delivery, we can conclude that the WOOT framework is eventually consistent:

Corollary 1 If the WOOT framework is executed with semantic causal delivery and eventual delivery, then it is eventually consistent, i.e., if the conditions 4, 5, 6 and 7 are met, then there is a state s such that for all p ∈ P: s(p)_{|s(p)|−1} = s.

Proof Under the assumption of eventual delivery, we can prove that all peers will eventually have received the same set of messages, i.e., R(p, |h(p)|) = R(q, |h(q)|) for all p, q ∈ P.

33 theorem no_failure [10, §6]
34 theorem strong_convergence [10, §6]
We recall the definition of strong eventual consistency from Sect. 1 and Shapiro et al. [27, §2.2]:
Definition 3 A CRDT is strongly eventually consistent if it is eventually consistent and it is strongly convergent.

Theorem 4
If the WOOT framework is executed with semantic causal delivery and eventual delivery, then it is strongly eventually consistent.
Proof Follows from Theorem 3 and Corollary 1.

Details
In the following, we describe the high-level approach (and the reasoning behind it) to prove Theorems 2 and 3 in our formalization [10] using Isabelle/HOL.
Usually, eventual consistency of a CRDT can be proven by checking that any pair of operations commute, i.e., given two messages, the successive integration of them to a given starting state will lead to the same resulting state irrespective of the order of integration.In that scenario, a general theorem about CRDTs (see Shapiro et al. [27, Theorem 2.2]) implies strong eventual consistency for the CRDT at hand.
In some cases, such as in the WOOT framework or RGAs, integration operations are partial functions where the state needs to satisfy a precondition to enable integration of a received message.An integration operation will fail if the precondition is not satisfied when it is invoked.For such CRDTs, in addition to consistency, we also need to prove such failures will not occur during the execution of the framework.For example, Gomes et al. [8] establish that the insert operation for RGAs will not fail because the dependency of an element will already be in the array as a consequence of causal delivery.
Interestingly, a proof of non-failure for WOOT must necessarily establish consistent ordering of characters between the participating peers.This can be seen by considering examples where two peers that have a permutation of each other's state will not be able to create messages that can be successfully integrated into the other peers state.However, invariants that imply consistent ordering of the characters between peers can be found.This makes it more favorable in the case of WOOT to directly show consistency from these established invariants, instead of verifying commutativity of operations.
To describe the invariant we establish during the execution of the WOOT framework, we first define the notion of consistent sets of messages. Let A be a sort key space and let Ψ : (A ∪ {⊢}) × I × (A ∪ {⊣}) → A be a map fulfilling conditions 1, 2 and 3. We have seen in Sect. 4 … for all m ∈ M_insert.
Roughly, a set of messages is consistent if it would be possible to inductively associate sort keys to each insert message (i.e., created character) according to the scheme described in Protocol 4. See also Fig. 3.
In addition to that, we introduce a relation between sets of messages and states of peers. The state of a peer consists of characters whose symbols are replaced by ⊥ if they were deleted but are otherwise identical to the corresponding insert message. The state corresponding to a set of insert messages is a sequence of such characters, with the ordering induced by the sort key function. More precisely:

Definition 5 (Association 37) Let M be a consistent set of messages, where M = M_insert ∪ M_delete, and s be a sequence of characters. Let d be a function defined on M_insert by: … Then we say s and M are associated if:
• M_insert and s represent the same set of characters, up to possible substitutions of symbols with ⊥ due to delete messages, i.e., |s| = |M_insert| and …
• For all α fulfilling the conditions of Clause 3 of Definition 4, the sequence α(i(s_1)), …, α(i(s_|s|)) is strictly increasing.

35 definition consistent [10, §5.3]
36 fun depends_on [10, §5.3]
37 definition is_associated_string [10, §5.3]
Observe that, given a consistent set of messages, there can be at most one state associated to it.38 A key result we establish is that the integration algorithm commutes with set insertions under that relation (Proposition 4). The arguments of the proof rely on the insights we established in Sect. 3.8. Note that the above result in particular implies that the integration algorithm will not fail. Since the starting state (the empty sequence) is associated with the empty set,40 it is possible to prove using induction and Proposition 4 that, with the semantic causal delivery condition, a peer that receives a consistent set of messages will be in the state associated to the received set.
Having established that, we proceed [10, §5.7] by proving that all the messages broadcast during the execution of the WOOT framework are a consistent set, using induction according to some causal ordering of the events.41 It should be noted that during induction, we use Proposition 4 and thus use the fact that the messages broadcast so far, according to the chosen causal ordering, must be consistent. In Fig. 7 we depict an example induction step: Assuming the set of messages generated by the events left of the dashed line, i.e., the messages A, B and C, are consistent, we want to show that including the message D preserves consistency.

41 More precisely, since the events are partially ordered according to the →_hb relation, we choose an arbitrary total order that is an extension of it. This can be done using topological sorting. In general, there may be many possible such extensions.
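Extending the happened-before partial order to a total order by topological sorting can be illustrated with Python's standard `graphlib` module. The event ids and predecessor sets below are a made-up example; the formalization does not use this code.

```python
from graphlib import TopologicalSorter

# Hypothetical example: each event id maps to the set of event ids that
# happened immediately before it.
hib_predecessors = {
    ("p", 0): set(),
    ("q", 0): {("p", 0)},
    ("p", 1): {("p", 0)},
    ("q", 1): {("q", 0)},
    ("p", 2): {("p", 1), ("q", 1)},
}

# static_order() yields every node after all of its predecessors, i.e. one of
# the possibly many total orders extending the partial order.
total_order = list(TopologicalSorter(hib_predecessors).static_order())

position = {event_id: k for k, event_id in enumerate(total_order)}
respects_causality = all(
    position[before] < position[after]
    for after, predecessors in hib_predecessors.items()
    for before in predecessors
)
```

An induction along `total_order` visits each event only after all events that happened before it, which is the property the induction relies on. If the relation contained a cycle, `static_order()` would raise `graphlib.CycleError`.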
Because the induction proceeds according to the happened-before relation, the peer can only have received a subset of the messages that were already shown to be consistent. In the depicted example, these are A and C. Because of the semantic causal delivery condition, such a subset must itself be consistent. Hence the state of the second peer, when it creates message D, is the state associated to the messages A and C.
To complete the induction step, we rely on the following proposition:

Proposition 5 Let M be a consistent set of messages and N be a consistent subset of M and let s be the state associated to N and m be a message, such that either
• there exist n < |s| + 1 and σ ∈ Σ such that m = create-insert(i, n, σ, s), where i is an identifier distinct from all identifiers in M, or
• there exists n < |s| and m = create-delete(n, s).
Then M ∪ {m} is a consistent set of messages.42

Formalization in Isabelle/HOL
This section gives a brief overview of the machine-checked proof [10] we open-sourced in the Archive of Formal Proofs (AFP) [1]. The AFP is a refereed publication containing formal documents verified by Isabelle. Contrary to ordinary publications, it is updated with each release of Isabelle, and results are always checked with its most recent release. Authors can improve and add additional content to their entries provided the updates can be verified. All prior versions are accessible. Another distinctive feature of the AFP is that it allows entries to be used as a library, i.e., an entry can depend on and use results established by previously published entries.

42 lemma create_insert_consistent and lemma create_delete_consistent [10, §5.4]
Our entry uses the Certification Monads [28] library to express partial functions. These are used for example to handle illegal indices during array lookups and missing identifiers during find operations in sequences, or to capture non-termination cases.43 Partial functions return an error result in these cases, which can then be propagated. Hence, when we prove that an algorithm will not return an error state, it implies that such runtime errors will not happen.
We also use the Data Type Order Generator [29] library to automatically derive total orders for data types, for example during the construction of the sort key space.
We organized the AFP entry such that all necessary definitions are summarized in Sections 1 to 4 (18 pages) with thorough explanations. Section 5 (36 pages) contains the actual proof, but readers who are only interested in the results can skip it.44 The resulting Theorems 2 and 3 of this document are presented in Section 6 of the AFP entry.
As mentioned before, Sect. 5 of this document includes footnotes that link the definitions and assumptions to the corresponding definitions and assumptions in the AFP entry.We would also like to refer to the documentation of Isabelle/HOL [18] for an introduction to its semantics and syntax.
In the following subsections, we mention notable methods we used while formalizing the proof, including notions that are uncommon in standard formal mathematics such as type parameters or sum types.

The function
Proposition 3 is represented using 2 propositions in our AFP entry [10, §5.1]:
• proposition psi_elem
• proposition psi_preserve_order

43 This may happen during the while-loop in the integration algorithm for insert messages in Protocol 6, where we return an error if u − l does not decrease during an iteration.
44 Any intermediate definitions within Section 5 are not (neither directly nor indirectly) used in the statement of the resulting theorems.
Instead of using the compactness theorem, we prove the propositions constructively. It was however not easy to present that version of the proof in Sect. 4, as it uses case distinctions with more than 25 separate cases. In Isabelle, the case distinctions result in goals that are automatically resolved. We would like to note that the existence of a constructive version is interesting for possible implementations of Protocol 4.

Types
Since Isabelle's type system allows the construction of new types based on existing types, we use that mechanism to abstract over the set of symbols and identifiers. Type parameters are indicated using a prime prefix, and type constructors are suffix operators in Isabelle. For example the type of a WOOT character is:

('I, 'Σ) woot_character

where 'I denotes the type of identifiers and 'Σ denotes the type of symbols. Then the type of a list of characters, i.e., a state of a peer is:

('I, 'Σ) woot_character list

In cases where we use special elements such as ⊢, ⊣ or ⊥, the representation in Isabelle uses a sum type. For example, to include the special state ⊥ denoting the failure of the integration algorithm, the formalization uses the sum type: … We can read this as an element of 'I extended is either ⊢, ⊣ or [[i]] where i is an element of 'I. The terms in parentheses, such as (⊢) and (⊣), denote abbreviations; they make the formal document more concise and closer to the notation in this manuscript.

Locales
Locales are parametric theories, where a number of theorems can be shown for a common set of assumptions and definitions.A good use case for locales are algebraic structures such as groups, rings or fields.Locales can extend each other, for example the locale for a field would extend the locale for a ring.
We use locales to model the distributed system.In particular, we fix a finite set of peers and history of events, and establish their assumptions, such as semantic causal delivery or the absence of causal cycles.For technical reasons, we use two locales:

• dist_execution_preliminary
• dist_execution
The first one establishes the assumption that the set of peers is finite and introduces definitions such as the sequence of received messages for a given state: This corresponds to our definition of R for received message in Sect. 5 (only with the slight difference that messages are returned in the order they were received by the peer).
The second locale extends the first locale and introduces the remaining assumptions, as described in Conditions 4, 6 and 7. Interestingly, we could express Clauses 1 to 4 of Condition 7 using a single definition. We recall that we required that the starting state of a peer is the empty sequence and that the state is updated by application of the integrate algorithm whenever a message is received. In the Isabelle formalization, we could express these constraints using the foldM function:

fun state where state i = foldM integrate [] (received_messages i)

In Section 9 of the AFP entry, we define an example execution of the framework consisting of histories of 3 peers where each peer broadcasts a message, and show that it is an example instance for the dist_execution locale, i.e., that those histories fulfill its assumptions. Note that this implies that the assumptions are consistent.
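The idea of defining the state as a monadic fold over the received messages can be mimicked in Python with `functools.reduce`. The `integrate` function below is a toy stand-in chosen for this illustration (positional insertion into a list); it is not the WOOT integration algorithm. `BOTTOM` plays the role of the failure state ⊥.

```python
from functools import reduce

BOTTOM = None  # stands for the failure state ⊥

def integrate(state, message):
    """Toy integration step: insert `symbol` at index `position`.

    Returns BOTTOM if the precondition (a valid position) is violated, and
    propagates BOTTOM once it has occurred, like an error monad would.
    """
    if state is BOTTOM:
        return BOTTOM
    position, symbol = message
    if not 0 <= position <= len(state):
        return BOTTOM
    return state[:position] + [symbol] + state[position:]

def state_after(received_messages):
    """Fold the integration step over the messages, starting from the empty state."""
    return reduce(integrate, received_messages, [])

ok = state_after([(0, "a"), (1, "c"), (1, "b")])
failed = state_after([(3, "x"), (0, "a")])
```

Because a failure is absorbing in the fold, showing that the final state is not `BOTTOM` shows that no intermediate step failed either.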

Conclusion
We have shown that WOOT is strongly eventually consistent.This property was an open conjecture in the original presentation of the framework in 2006.The fact that the general convergence proof was missing had also been mentioned by Kumawat and Khunteta [12,  §3.10].To achieve this result we relied on an association of sort keys to the characters.The proof is verified using the interactive theorem prover Isabelle/HOL.
Having machine-checked our proof, we have strong confidence in its correctness.By open-sourcing our formalization and framework and having it accepted in the Archive of Formal Proofs [10], our work is accessible, reproducible and available in a long-term maintained way to the community.
A key insight we could derive about WOOT is that it can be simulated by a specific instance of a broad class of algorithms, parameterized by the function Ψ (see Protocol …). We think it is worthwhile to investigate how properties of Ψ are related to properties of the CRDT, for example with respect to interleaving anomalies. This also implies the existence of an implementation by explicitly computing the sort keys. That would allow an integration algorithm with a runtime of O(n log n) with the same behaviour, compared to the O(n²) worst-case performance of WOOT (where n is the number of previous insert operations on the document).

A second insight we contribute is that the communication cost of sort-key based protocols could be significantly improved by performing a program transformation that defers some of the computation to the integration site. We think that similar modifications could also improve performance properties of other distributed data structures. It would be interesting to develop a formal theory of how program transformations can be defined on CRDTs and which conditions are required to preserve convergence and consistency properties.

We found that the conventional commutativity argument, commonly used to show convergence of CRDTs, does not work with the WOOT framework. Instead we showed consistency using induction over the events of the distributed system. An interesting question for further work is whether there may be stronger fundamental theorems for CRDTs that could apply to WOOT and similar cases.

Proposition 1
Let b ≥ 2 and I := {1/b, …, (b−1)/b} be the set of rational numbers with denominator b strictly between 0 and 1. Then there exists a totally ordered set A and a function Ψ with domain {(l, i, u) | l < u ∈ A ∪ {⊢, ⊣}, i ∈ I} ⊆ (A ∪ {⊢}) × I × (A ∪ {⊣}) and range A fulfilling conditions 1 and 3.

Proof We denote by Q_b the set of rational numbers with finite b-ary expansion, i.e., … Q_b is closed with respect to addition, negation and multiplication. It is also closed under division by b. With d_b(x), we denote the length of that expansion, i.e., d_b : …

… ) on the domain of Ψ using induction on max(d_b(l), d_b(u)). If max(d_b(l), d_b(u)) = 0 we have Ψ(l, i, u) = i using (5), confirming both (6) and (7). For the induction step let us assume the statements are true if max(d_b(l), d_b(u)) = n, and let max(d_b(l), d_b(u)) = n + 1. We consider the three cases from the definition of Ψ separately: …

… For the induction step let us assume the statements are true when max(d_b(l), d_b(u)) = n, and let max(d_b(l), d_b(u)) = n + 1. We consider 6 separate cases: …

… (12) using induction on max(d_b(l), d_b(u)). In the case where max(d_b(l), d_b(u)) = 0, we can conclude Ψ(l, i, u) = i and thus l′ < i < u′, which implies Ψ(l′, i, u′) = i. For the induction step let us assume the statements are true if max(d_b(l), d_b(u)) = n, and let max(d_b(l), d_b(u)) = n + 1.

Proposition 2
Let I be a finite totally ordered set. Then there exists a totally ordered set A and a function Ψ : (A ∪ {⊢}) × I × (A ∪ {⊣}) → A fulfilling conditions 1 and 3.

Proof Let b = |I| + 1 and I′ := {1/b, …, (b−1)/b}, then we can define a strictly monotone function between I and I′:

φ(x) = (|{y ∈ I | y < x}| + 1) / b.
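As a worked illustration of the order-preserving map φ used in this proof, the following sketch assumes that φ sends the k-th smallest identifier to the fraction k/b with b = |I| + 1, which is our reading of the definition. The identifier names are made up.

```python
from fractions import Fraction

def make_phi(identifiers):
    """Build phi for a finite totally ordered identifier set.

    phi(x) = (number of identifiers smaller than x + 1) / b, with b = |I| + 1,
    so the images are exactly 1/b, ..., (b-1)/b.
    """
    ordered = sorted(identifiers)
    b = len(ordered) + 1

    def phi(x):
        rank = sum(1 for y in ordered if y < x)
        return Fraction(rank + 1, b)

    return phi, b

phi, b = make_phi({"alice", "bob", "carol"})
images = [phi(x) for x in ["alice", "bob", "carol"]]
# images == [1/4, 1/2, 3/4]: strictly increasing and strictly between 0 and 1
```

Since φ is strictly monotone, comparing φ(i) and φ(i′) gives the same result as comparing i and i′, which is what allows the finite case to be reduced to Proposition 1.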

Fig. 6 Example execution of the framework

Proposition 4
If both M and M ∪ {m} are consistent sets of messages, s is the state associated to M, and either:
• m ∉ M, or
• m is a delete message
then integrate(m, s) is the state associated to M ∪ {m}.39

Proof Verified in [10, §5.6].

Fig. 7 Induction step, where the creation of message D keeps the set of messages consistent

error + ('I, 'Σ) woot_character list

This means the result of the integrate function is either an error or a sequence of characters. To extend types with the special ⊢ and ⊣ elements, we use the type constructor extended, defined by:

datatype 'I extended = Begin (⊢) | InString 'I ((1[[-]])) | End (⊣)

fun received_messages where received_messages (i,j) = [m. (Receive _ m) ← (take j (events i))]

Each peer is initialized by calling the init function. Here we set up the global state variables S and R. Similarly, the modify functions are part of the interface of the CRDT and allow modifications to the data structure.
query lookup(e : element) : boolean
  return e ∈ S ∧ e ∉ R
modify add(e : element)
  broadcast (add e)
modify remove(e : element)
  broadcast (remove e)

Modification functions broadcast messages that are integrated into the state of each peer when the respective message is received, using the integrate functions:

integrate add(e : element)
  S ← S ∪ {e}
integrate remove(e : element)
  R ← R ∪ {e}
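The pseudocode above can be turned into a small runnable sketch. The class name `TwoPhaseSetPeer` and the tuple encoding of messages are assumptions for this illustration. The loop at the end checks, for one fixed set of messages, that every delivery order leads to the same state, which is the commutativity argument that works for this simple CRDT but, as discussed in this paper, not for WOOT.

```python
from itertools import permutations

class TwoPhaseSetPeer:
    """One replica of the set CRDT with an add set S and a remove set R."""

    def __init__(self):
        self.S = set()
        self.R = set()

    def lookup(self, e):
        return e in self.S and e not in self.R

    def integrate(self, message):
        kind, e = message
        if kind == "add":
            self.S.add(e)
        elif kind == "remove":
            self.R.add(e)
        else:
            raise ValueError(f"unknown message kind: {kind}")

    def state(self):
        return (frozenset(self.S), frozenset(self.R))

messages = [("add", "x"), ("add", "y"), ("remove", "x")]

# Integrate the same messages in every possible order on a fresh peer.
final_states = set()
for order in permutations(messages):
    peer = TwoPhaseSetPeer()
    for m in order:
        peer.integrate(m)
    final_states.add(peer.state())

converges = len(final_states) == 1
```

Here `peer` (the last replica constructed) reports `lookup("y")` as true and `lookup("x")` as false, regardless of the order in which the remove message arrived.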

Table 1
Sort keys and symbols

Table 2
Characters with predecessor and successor identifiers

that such a space and function exists. Then we can define consistent sets of messages:

Definition 4 (Consistent Sets of Messages 35) Let M be a set of messages, consisting of the insert messages M_insert and delete messages M_delete. We say such a set of messages is consistent, if:
1. Each message has a distinct identifier, i.e., i is injective on M_insert.
2. The dependencies of each message in the set are met, i.e., deps(m) ⊆ {i(m′) | m′ ∈ M_insert} for all m ∈ M.
3. Let →_dep be the relation 36 induced by the deps function on the insert messages, i.e., for m_1, m_2 ∈ M_insert, m_1 →_dep m_2 iff i(m_1) ∈ deps(m_2). This relation →_dep is well-founded.
4. There exists a function α from the identifiers in M_insert to the sort key space A, i.e., α : i(M_insert) → A, such that: