Exploring Type-Level Bisimilarity towards More Expressive Multiparty Session Types

A key open problem with multiparty session types (MPST) concerns their expressiveness: current MPST have inflexible choice, no existential quantification over participants, and limited parallel composition. This prevents many real protocols from being represented as MPST. To overcome these bottlenecks, we explore a new technique using weak bisimilarity between global types and endpoint types, which guarantees deadlock-freedom and absence of protocol violations. Based on a process-algebraic framework, we present well-formedness conditions for global types that guarantee weak bisimilarity between a global type and its endpoint types, and we prove that checking these conditions is decidable. Our main practical result, obtained through benchmarks, is that our well-formedness conditions can be checked orders of magnitude faster than directly checking weak bisimilarity using a state-of-the-art model checker.


Introduction
Background. To take advantage of modern parallel and distributed computing platforms, message-passing concurrency is becoming increasingly important. Modern programming languages, however, offer insufficient linguistic support to guide programmers towards safe usage of message-passing abstractions (e.g., to prevent deadlocks or protocol violations). Multiparty session types (MPST) [34] constitute a static, correct-by-construction approach to simplify concurrent programming, by offering a type-based framework to specify message-passing protocols and ensure deadlock-freedom and protocol conformance. The idea is to use behavioural types [1,37] to enforce protocols (i.e., patterns of admissible communications) between roles (e.g., threads, processes, services) to avoid concurrency bugs. The framework is illustrated in Fig. 1: first, a global type G (protocol specification; written by the programmer) is projected onto every role; then, every resulting endpoint type (local type) L_i (role specification) is type-checked with the corresponding process P_i (role implementation). If every process is well-typed against its local type, then their parallel composition is guaranteed to be free of deadlocks and protocol violations relative to the global type. Notably, common concurrency bugs such as sends without receives, receives without sends, and type mismatches (actual type sent vs. expected type received) are ruled out statically. The MPST framework is language-agnostic: in recent years, practical implementations of MPST have been developed for several programming languages, including Erlang, F#, Go, Java, and Scala [18,35,36,45,46,50].
Three open problems. Many practically relevant protocols cannot be specified as global types; this limits MPST's applicability to real-world concurrent programs. Specifically, while the original work [33] has been extended with several advanced features (e.g., time [7,44], security [11,12,13,17], and parametrisation [18,25,47]), core features still have significant restrictions: inflexible choice, no existential quantification over participants, and limited parallel composition.
1. Inflexible choice: In the original work [33], if there is a choice between multiple branches, the sender in the first communication of each branch must be the same, the receiver must be the same, and the message types must be different (i.e., no non-determinism). Moreover, each role not involved in the first communication of each branch must have the same behaviour in each continuation. For instance, the following global type specifies a protocol where Client c repeatedly requests an arithmetic Server s to compute the sum or product of two numbers:

µX. [c→s : Add · s→c : Sum · X + c→s : Mul · s→c : Prod · X]

Here, c→s : Add specifies a communication of an Add-message (with two numbers as payload) from the Client to the Server, while · and + specify sequencing and branching, and square brackets indicate operator precedence. This is a "good" global type that satisfies the conditions. In contrast, the following "bad" global type specifies a protocol where Client c repeatedly requests addition and multiplication Servers s_1 and s_2 via Router r (payload types omitted; r_1→r_2→r_3 : t abbreviates r_1→r_2 : t · r_2→r_3 : t):

µX. [c→r→s_1 : Add · s_1→c : Sum · X + c→r→s_2 : Mul · s_2→c : Prod · X]

Several improvements to the original work have been proposed. Honda et al. managed to allow each role r not involved in a choice to have different behaviour in different branches [15], so long as r is made aware of which branch is chosen in a timely and unambiguous fashion (the previous global type is still forbidden). Lange et al., Castagna et al., and Hu & Yoshida managed to allow choices between different receivers [16,23,36,40]. For instance, the following global type (the Client directly requests the specialised server) is allowed:

µX. [c→s_1 : Add · s_1→c : Sum · X + c→s_2 : Mul · s_2→c : Prod · X]

But the following global type (two Clients c_1 and c_2 use Server s) is forbidden:

µX. [c_1→s : Add · s→c_1 : Sum · X + c_1→s : Mul · s→c_1 : Prod · X + c_2→s : Add · s→c_2 : Sum · X + c_2→s : Mul · s→c_2 : Prod · X]

None of the existing works allow the above nondeterministic choices between different senders. We call this the +-problem: how to add a choice constructor, denoted by +, to specify choices between disjoint sender-receiver-label triples?
2. No existential quantification: Related to the +-problem is the ∃-problem: how to add an existential role quantifier, denoted by ∃, to specify the execution of ∃'s body for some role in ∃'s domain? For instance, instead of writing a separate global type for 2 Clients, 3 Clients, etc., existential role quantification allows us to write only one global type for any n>1 Clients:

µX. ∃r∈{c_i | 1≤i≤n}. [r→s : Add · s→r : Sum · X + r→s : Mul · s→r : Prod · X]

The ∃-problem was first formulated by Deniélou & Yoshida [22] as the dual of the ∀-problem (i.e., specify the execution of ∀'s body for each role in ∀'s domain): the ∀-problem was solved in the same paper, but the ∃-problem "raises many semantic issues" [22] and has remained open for almost a decade.

3. Limited parallel composition: The third open problem related to choice is the ∥-problem: how to add a constructor, denoted by ∥, that allows infinite branching (i.e., non-finite control) through unbounded parallel interleaving? While extensions of the original work with parallel composition exist (e.g., [16,22,23,43]), none of these works supports unbounded interleaving. For instance, the following global type allows an unbounded number of requests to be served by the Server in parallel (instead of sequentialising them):

µX. ∃r∈{c_i | 1≤i≤n}. [r→s : Add · [s→r : Sum ∥ X] + r→s : Mul · [s→r : Prod ∥ X]]

Contributions. We overcome these three bottlenecks of MPST with an approach based on three key novelties: first, a new definition of projection that keeps more information in the local types than existing definitions; second, well-formedness conditions that exploit this extra information; third, a proof method not previously used for MPST, namely proving the operational equivalence between a global type and its projections modulo weak bisimilarity. This makes the proofs cleaner and ultimately allows for more flexibility (e.g., our approach can be modularly combined with traditional session type checking, but potentially also with other verification methods, such as model checking or conformance testing). To summarise the highlights:
- For the first time, we provide solutions to the +-problem, the ∃-problem, and the ∥-problem, by presenting expressive syntax for global and local types (formulated as process algebraic terms), a refined notion of projection, and novel well-formedness conditions.
- Our main theoretical result is operational equivalence: a well-formed global type behaves the same as the parallel composition of its projections, modulo weak bisimulation. This implies freedom of deadlocks and freedom of protocol violations of the projections. Checking this equivalence is decidable. To our knowledge, we are the first to use (weak) bisimilarity to prove the correctness of a projection operator from global to local types. By doing so, we decouple (a) reasoning about projection and (b) establishing compliance between local types and process implementations; until our work, these two concerns have always been conflated.
- Our main practical results are: (1) a set of representative protocols typable in our approach; and (2) evidence that the well-formedness conditions of these protocols can be checked orders of magnitude faster than directly checking weak bisimilarity using mCRL2 [10,20,29], a state-of-the-art model checker.
In Sect. 2, we present an overview of our contribution through a representative example protocol that is not supported by previous work. In Sect. 3, we present the details of our theoretical contribution. In Sect. 4, we present the details of our practical contribution (implementation and evaluation). In Sect. 5, we discuss related work. We conclude and discuss future work in Sect. 6. Detailed formal definitions and proofs of all lemmas and theorems can be found in our supplement [38].

Overview of our Approach
Scenario. To highlight our solutions to the +-problem, ∃-problem, and ∥-problem, we consider a Key-Value Store protocol, similar to those used in modern NoSQL databases [21,27]. Specifically, our Key-Value Store protocol is inspired by the transaction mechanism of the popular Redis database [48,49]. This protocol is not supported by any existing MPST work.
The Key-Value Store protocol consists of n Clients that require access to the store, represented by role names c_1, ..., c_n, and one Server that provides access to the store, represented by role name s. The store has keys of type Str (strings) and values of type Nat (numbers). Fig. 2 shows valid and invalid example executions of the protocol (n=2) as message sequence charts; it works as follows.
First, a Lock-message is communicated from some Client c_i (1≤i≤n) to Server s (Fig. 2a, arrows 1, 5); this grants c_i exclusive access to the store. Then, a sequence of messages to write and/or read values is communicated:
- To write, a Set-message is communicated from c_i to s (arrows 2, 3, 11).
- To read, a Get-message is communicated from c_i to s (arrows 6, 7). Then, eventually, a Value-message is communicated from s to c_i (arrows 8, 10), but in the meantime, additional Get-messages can be communicated from c_i to s. In this way, the Client does not need to await the responses of the Server to perform multiple independent requests. To indicate that enough Get-messages have been sent, a Barrier-message is communicated from c_i to s (arrow 9), which serves as a communication fence: the protocol will only proceed once all Value-messages for pending Get-messages have been communicated.
The sequence ends with the communication of an Unlock-message from c_i to s (arrow 12). The protocol is then repeated for some Client c_j (1≤j≤n); possibly, but not necessarily, i=j. In this way, the Server atomically processes accesses to the store between Lock/Unlock-messages.
Global and local types. The corresponding global type and local types, inferred via projection (for some n), are as follows:

µZ. r→s : Get(Str) · s→r : Value(Str, Nat) ∥ Z + r→s : Barrier · Y + r→s : Set(Str, Nat) · Y + r→s : Unlock · X
µZ. rs?Get(Str) · sr!Value(Str, Nat) ∥ Z + rs?Barrier · Y + rs?Set(Str, Nat) · Y + rs?Unlock · X

Global type r_1→r_2 : ℓ(t) specifies the communication of a message labelled ℓ with a payload typed t from sender r_1 to receiver r_2; global type G_1 · G_2 specifies the sequential composition of global types G_1 and G_2; global type G_1 + G_2 specifies the alternative composition (choice) of G_1 and G_2; global type ∃r∈{r_1, ..., r_n}. G specifies existential role quantification over domain {r_1, ..., r_n} (i.e., the alternative composition of G[r_1/r], ..., G[r_n/r], where G[r_i/r] denotes the substitution of r_i for every r in G); global type G_1 ∥ G_2 specifies the interleaving composition of G_1 and G_2 (free merge [4]); global type µX. G specifies recursion (i.e., X is bound to µX. G in G).
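The reading of ∃ as an alternative composition of substitution instances can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the classes Comm, Seq, Alt and Exists and the functions subst and expand_exists are hypothetical names, payloads are omitted, and recursion and ∥ are left out for brevity; it is not the paper's tool.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Comm:
    """Global communication sender -> receiver : label (payload omitted)."""
    sender: str
    receiver: str
    label: str


@dataclass(frozen=True)
class Seq:
    """Sequential composition left · right."""
    left: "Glob"
    right: "Glob"


@dataclass(frozen=True)
class Alt:
    """Alternative composition left + right."""
    left: "Glob"
    right: "Glob"


@dataclass(frozen=True)
class Exists:
    """Existential role quantification ∃var∈domain. body."""
    var: str
    domain: Tuple[str, ...]
    body: "Glob"


Glob = Union[Comm, Seq, Alt, Exists]


def subst(g: Glob, role: str, var: str) -> Glob:
    """Compute g[role/var]: replace every free occurrence of var by role."""
    def pick(x: str) -> str:
        return role if x == var else x

    if isinstance(g, Comm):
        return Comm(pick(g.sender), pick(g.receiver), g.label)
    if isinstance(g, (Seq, Alt)):
        return type(g)(subst(g.left, role, var), subst(g.right, role, var))
    if isinstance(g, Exists):
        if g.var == var:  # an inner binder for the same name shadows var
            return g
        return Exists(g.var, g.domain, subst(g.body, role, var))
    raise TypeError(f"unexpected global type: {g!r}")


def expand_exists(g: Glob) -> Glob:
    """Rewrite every ∃r∈{r1,...,rn}. G into G[r1/r] + ... + G[rn/r]."""
    if isinstance(g, Exists):
        if not g.domain:
            raise ValueError("existential quantifier with an empty domain")
        branches = [expand_exists(subst(g.body, r, g.var)) for r in g.domain]
        result = branches[-1]
        for branch in reversed(branches[:-1]):  # build a right-nested choice
            result = Alt(branch, result)
        return result
    if isinstance(g, (Seq, Alt)):
        return type(g)(expand_exists(g.left), expand_exists(g.right))
    return g
```

For example, expanding ∃r∈{c1, c2}. r→s : Lock · r→s : Unlock yields the choice between the Lock/Unlock sequence of c1 and that of c2. Nesting the choice to the right is an arbitrary choice here; since + is associative, any bracketing denotes the same alternatives.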
Local type r_1r_2!ℓ(t) specifies the send of an ℓ(t)-message through the channel from r_1 to r_2; dually, local type r_1r_2?ℓ(t) specifies a receive. Because every Client participates in only one branch of the quantification, their local types do not contain ∃ under the recursion. In contrast, because the Server participates in all branches, L_S does contain ∃ under the recursion. By Thm. 3, G and the parallel composition of L_C1, ..., L_Cn, L_S are operationally equivalent (weakly bisimilar), which in turn implies deadlock-freedom and absence of protocol violations. Note also that our global type for the Key-Value Store protocol indeed relies on solutions to the +-problem (choice between multiple Clients that send a Lock-message), the ∃-problem (existential quantification over Clients), and the ∥-problem (unbounded interleaving to support asynchronous responses to a statically unknown number of requests).

Types as Process Algebraic Terms
We define our languages of global and local types as algebras over sets of (global) communications and (local) sends/receives. This subsection presents preliminaries on the generic algebraic framework we use, based on the existing algebras PA [3] and TCP+REC [2]; the next subsection presents our specific instantiations for global and local types.
Let A denote a set of actions, ranged over by α, and let {X_1, X_2, ..., Y, ...} denote a set of recursion variables. Then, let Term(A) denote the set of (algebraic) terms, ranged over by T, generated by the following grammar:

T ::= 1 | α | T_1 + T_2 | T_1 · T_2 | T_1 ∥ T_2 | X | ⟨X_k | {X_i ↦ T_i}_{i∈I}⟩

Term 1 specifies a skip; it should not be written explicitly by programmers, but is used only implicitly in the operational semantics. Term α specifies an atomic action from A. Terms T_1 + T_2, T_1 · T_2, and T_1 ∥ T_2 specify the alternative composition, the sequential composition, and the interleaving composition (free merge [4]; a form of parallel composition without interaction between the operands) of T_1 and T_2. Terms X and ⟨X_k | {X_i ↦ T_i}_{i∈I}⟩ specify recursion, where {X_i ↦ T_i}_{i∈I} is a recursive specification that maps recursion variables to terms, X_k is the initial call (for T_k), and every X_j that occurs in T_k is a subsequent recursive call (for T_j); we write µX. T instead of ⟨X | {X ↦ T}⟩.
Let X ⇀ Term(A) denote the set of all recursive specifications (i.e., every recursive specification is a partial function from recursion variables to terms), ranged over by E, F, and let sub(E, T) denote the simultaneous substitution of term E(X) for each recursion variable X in T. Fig. 3 defines the operational semantics of terms. It consists of two components: relation → defines reduction of terms, while relation ↓ defines successful termination of terms. In words, term T_1 + T_2 is reduced by reducing either T_1 or T_2; term T_1 · T_2 is reduced by reducing first T_1 and then T_2; term T_1 ∥ T_2 is reduced by reducing either T_1 or T_2 while the other operand stays unchanged (interleaving).
A term is 1-free if it has no occurrences of 1. A term is closed if it has no occurrences of free recursion variables. A term T is deterministic if (1) for every action α, there exists at most one term T′ such that T can reduce to T′ by performing α, and (2) every term to which T can reduce is deterministic as well. Henceforth, we consider only 1-free, closed, and deterministic terms.
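To make the reduction and termination relations concrete, the following Python sketch computes T↓ and all reductions T −α→ T′ for terms. It is a minimal sketch under stated assumptions: the class and function names are hypothetical, recursion ⟨X | E⟩ is unfolded by replacing each recursion variable Y of the body with ⟨Y | E⟩ (the usual TCP+REC reading, which we assume matches Fig. 3), and recursion is assumed to be guarded, otherwise the functions do not terminate.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple, Union


@dataclass(frozen=True)
class One:
    """1 (skip): terminated, no reductions."""

@dataclass(frozen=True)
class Act:
    """Atomic action α."""
    name: str

@dataclass(frozen=True)
class Alt:
    left: "Term"
    right: "Term"

@dataclass(frozen=True)
class Seq:
    left: "Term"
    right: "Term"

@dataclass(frozen=True)
class Par:
    """Interleaving composition (free merge)."""
    left: "Term"
    right: "Term"

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Rec:
    """⟨var | spec⟩, where spec is a tuple of (variable, body) pairs."""
    var: str
    spec: Tuple[Tuple[str, "Term"], ...]


Term = Union[One, Act, Alt, Seq, Par, Var, Rec]


def mu(x: str, body: Term) -> Rec:
    """µX. T as sugar for ⟨X | {X ↦ T}⟩."""
    return Rec(x, ((x, body),))


def sub(spec, t: Term, bound: FrozenSet[str] = frozenset()) -> Term:
    """Replace each free variable X of t defined in spec by ⟨X | spec⟩."""
    if isinstance(t, Var):
        defined = {x for x, _ in spec}
        return Rec(t.name, spec) if t.name in defined and t.name not in bound else t
    if isinstance(t, (Alt, Seq, Par)):
        return type(t)(sub(spec, t.left, bound), sub(spec, t.right, bound))
    if isinstance(t, Rec):
        inner = bound | {x for x, _ in t.spec}  # inner binders shadow spec
        return Rec(t.var, tuple((x, sub(spec, b, inner)) for x, b in t.spec))
    return t  # One and Act contain no variables


def unfold(t: Rec) -> Term:
    return sub(t.spec, dict(t.spec)[t.var])


def terminates(t: Term) -> bool:
    """Successful termination T↓."""
    if isinstance(t, One):
        return True
    if isinstance(t, Alt):
        return terminates(t.left) or terminates(t.right)
    if isinstance(t, (Seq, Par)):
        return terminates(t.left) and terminates(t.right)
    if isinstance(t, Rec):
        return terminates(unfold(t))
    return False  # Act, and free variables of non-closed terms


def steps(t: Term) -> Set[Tuple[str, Term]]:
    """All reductions T -α-> T' as (α, T') pairs."""
    if isinstance(t, Act):
        return {(t.name, One())}
    if isinstance(t, Alt):
        return steps(t.left) | steps(t.right)
    if isinstance(t, Seq):
        result = {(a, Seq(l2, t.right)) for a, l2 in steps(t.left)}
        if terminates(t.left):  # T1 may be skipped once it can terminate
            result |= steps(t.right)
        return result
    if isinstance(t, Par):
        return ({(a, Par(l2, t.right)) for a, l2 in steps(t.left)}
                | {(a, Par(t.left, r2)) for a, r2 in steps(t.right)})
    if isinstance(t, Rec):
        return steps(unfold(t))
    return set()  # One, and free variables
```

For instance, µX. (a · X + b) can perform a (returning to a term that behaves as itself) or b (reaching 1), and a ∥ b can perform either action first.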
We note that ⟨A, +, ·, ∥⟩ is the signature of PA [3], while ⟨1, A, +, ·, ∥, X, ⟨−|−⟩⟩ is a subsignature of TCP+REC [2]. As the operational semantics of terms in Term(A) coincides with the operational semantics of terms in (the corresponding subalgebra of) TCP+REC, our languages of global and local types inherit TCP+REC's sound and complete axiomatisation, used in our tool (Sect. 4.1).

Global Types and Local Types
Actions. We instantiate Term(A) to obtain languages of global and local types by defining action sets for (global) communications and for (local) sends/receives. Let R = {a, b, ...} denote the set of all role names, ranged over by r. Let Lab = {Lock, Get, ...} denote the set of all labels, ranged over by ℓ. Let T = {Nat, Bool, ...} denote the set of all payload types, ranged over by t. Let U = Lab × T denote the set of all message types, ranged over by U; we write ℓ(t) instead of ⟨ℓ, t⟩. Finally, let A_g and A_l denote the sets of all (global) communications and (local) sends/receives, ranged over by g and l, generated by:

g ::= r_1→r_2 : U        l ::= r_1r_2!U | r_1r_2?U | ε^r_{r_1r_2}

Global action r_1→r_2 : U specifies the communication of a U-message from sender r_1 to receiver r_2; we note that communications are synchronous, as actions in the underlying algebra are indivisible [2,3], but asynchrony can be encoded (Exmp. 1, below). Local action r_1r_2!U specifies the send of a U-message through channel r_1r_2 (from r_1 to r_2). Dually, local action r_1r_2?U specifies a receive. Local action ε^r_{r_1r_2} specifies the idling of role r during a communication between roles r_1 and r_2. The inclusion of such annotated idling actions in local types is novel; we shortly elaborate on its purpose.

Fig. 4: Macros
We can now define Glob = Term(A g ) and Loc = Term(A l ) as the sets of all global and local types, ranged over by G and L.
Macros. To demonstrate the expressive power of our language of global types, we extend it with a number of macros that can be expanded to "normal" global types in Glob. A macro M takes one of the following forms. Degenerate "macro" G is a normal global type; it is included so that global types can be nested inside macros. The asynchronous macro for sender r_1 and receiver r_2, followed by M, specifies an asynchronous communication from r_1 to r_2. Macro Σ{M_i}_{i∈I} specifies an n-ary choice among |I| alternatives. Macro µ(X, ℓ_c, ℓ_e, {⟨r_{1i}, r_{2i}, M_i⟩}_{i∈I}) specifies finite recursion: at the start of each unfolding of recursion variable X, for some i ∈ I, either an ℓ_c-message is communicated from sender r_{1i} to receiver r_{2i} (in which case they continue their participation in the recursion), or an ℓ_e-message is communicated (in which case they exit). Macro ∃r∈{r_i}_{i∈I}. M specifies existential role quantification. Macros can be nested. Slightly abusing notation, we allow macros to occur and be expanded freely in "normal" global types. Fig. 4 defines the macro expansion rules; the left-hand side of each rule is a macro, while the right-hand side is a normal global type. We demonstrated existential role quantification in Sect. 2; below, we give two more examples to illustrate our encoding of asynchronous communication and finite recursion.
Example 1 (Asynchrony). Although communications are synchronous, we can encode asynchrony by representing buffered channels (unordered, as in asynchronous π-calculus [32]) explicitly as roles that participate in a protocol. To this end, assume for all r_1, r_2 ∈ R, there exists a role r_1r_2 ∈ R as well (to represent the buffer from r_1 to r_2); alternatively, r_1r_2 could be any fresh name.
The following global types (message types omitted) specify paradigmatic cases for protocols with asynchronous communications. (For brevity, we omit 1 from the resulting global types; this can be incorporated in the macro expansion rules, at the expense of a more complex formulation.) Global type G_1 specifies an asynchronous communication from Alice to Bob. Global type G_2 specifies two asynchronous communications from Alice to Bob; Alice can do the second send before Bob has done the first receive. Global type G_3 specifies an asynchronous communication from Alice to Bob, followed by one from Bob to Alice; in contrast to G_2, Bob can send only after he has received (i.e., this encoding of asynchrony preserves causality of messages sent and received by the same role). Global type G_4 specifies an asynchronous communication from Alice to Bob, followed by a synchronous communication from Bob to Alice; it highlights that, unlike existing languages of global types, ours supports mixing synchrony and asynchrony in a single global type.
Example 2 (Finite recursion). The Key-Value Store protocol in Sect. 2 does not terminate: in its global type, the inner recursions (Y and Z) can be exited, but the outer recursion (X) cannot. A version of this protocol that terminates once each of the Clients has indicated it has finished using the store (e.g., by sending an Exit-message) can also be specified.
We illustrate the key idea in a simplified example. Global type G_1 specifies the communication of either a Con-message (to continue the recursion) or an Exit-message (to break it) from Alice to Carol. Global type G_2 is similar. Global type G specifies the communication of a Con-message from … In the latter case, Carol stops communicating with one role, while she proceeds communicating with the other role. Thus, the communications between Alice and Carol, and between Bob and Carol, are decoupled (i.e., decisions to continue or break recursions are made per role). Macro µ generalizes this pattern to arbitrary recursion bodies.
Groups. Finally, let R ⇀ Loc denote the set of all groups of local types (i.e., every group is a partial function from role names to local types), ranged over by L. The idea is that while a global type specifies a protocol among n roles from one global perspective, a group of local types specifies a protocol from the n local perspectives. Fig. 5 defines the operational semantics of groups, built on top of the operational semantics of local types; we use the f[x ↦ y] notation to update function f with entry x ↦ y. In words, group L is reduced either by synchronously reducing the local types of a sender r_1 and a receiver r_2 (yielding a communication from r_1 to r_2), or by reducing the local type of an idling role.
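The two group reduction rules can be sketched as follows. This is a minimal illustration, assuming a hypothetical encoding in which each local type has already been compiled into an explicit finite labelled transition system (a dictionary from local states to outgoing transitions); the names Action, LTS and group_steps are our own and not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of local actions:
#   ("send", r1, r2, U) for r1r2!U, ("recv", r1, r2, U) for r1r2?U,
#   ("idle", r, r1, r2) for the idling action of r during r1 -> r2.
Action = Tuple[str, ...]
LTS = Dict[str, List[Tuple[Action, str]]]  # local state -> [(action, next local state)]


def group_steps(group: Dict[str, LTS],
                state: Dict[str, str]) -> List[Tuple[Action, Dict[str, str]]]:
    """All reductions of a group state, following the two rules described above."""
    result = []
    for r, lts in group.items():
        for act, nxt in lts.get(state[r], []):
            if act[0] == "idle":
                # Rule 2: an idling role reduces on its own.
                result.append((act, {**state, r: nxt}))
            elif act[0] == "send" and act[2] in group:
                _, r1, r2, u = act
                # Rule 1: the send synchronises with a matching receive r1r2?U of r2.
                for act2, nxt2 in group[r2].get(state[r2], []):
                    if act2 == ("recv", r1, r2, u):
                        result.append((("comm", r1, r2, u), {**state, r: nxt, r2: nxt2}))
    return result
```

With Alice able to send ab!U, Bob able to receive ab?U, and Carol able to idle during the communication between Alice and Bob, the group can either perform the communication from Alice to Bob or let Carol idle. Receives never reduce on their own in this encoding; they only fire as the partner of a send.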

End-Point Projection: from Global Types to Local Types
A key part of MPST (Fig. 1) is a projection operator that consumes a global type G as input and produces a group of local types L as output; it is correct if, under certain well-formedness conditions, G and L are operationally equivalent.
Let r(G) denote the set of all role names that occur in G. Fig. 6 defines our projection operator, written G↾r for a role r and G↾R for a set of roles R. In words, the projection of a communication r_1→r_2 : U onto a role r is a send r_1r_2!U if the role is the sender in the communication, a receive r_1r_2?U if it is the receiver, or an idling action ε^r_{r_1r_2} if it is not involved; the projections of all other forms of global types onto r are homomorphic; the projection of a global type onto a set of roles R is the corresponding group of projections, where the side condition implies that the group is nonempty and contains a local type for at least every role name that occurs in G. Thus, a group of projections of G is a partial function relative to the set of all roles R, but it is total relative to the set of roles r(G) ⊆ R that occur in G. (We note that we also continue to assume global types are 1-free, closed, and deterministic.) Our projection operator is similar to existing projection operators in the MPST literature [34], but it differs in one fundamental respect: it produces local types with annotated idling actions. These idling actions will be instrumental in the definition of our well-formedness conditions. We note that no idling actions occur in the local types for the Key-Value Store protocol in Sect. 2. This is because after the idling actions have been used to establish well-formedness, they are of no further use and can be eliminated to simplify the local types.
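The projection rules described above can be sketched in Python over a hypothetical AST (Comm, Send, Recv, Idle, Bin, Mu and Var are our own names; Bin covers +, · and ∥ with an operator tag). A communication becomes a send, a receive, or an idling action depending on the role; every other construct is projected homomorphically; and the group projection checks the side condition that the role set is nonempty and covers r(G). This is an illustrative sketch, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class Comm:   # global action r1 -> r2 : U
    sender: str
    receiver: str
    msg: str

@dataclass(frozen=True)
class Send:   # local action r1r2!U
    sender: str
    receiver: str
    msg: str

@dataclass(frozen=True)
class Recv:   # local action r1r2?U
    sender: str
    receiver: str
    msg: str

@dataclass(frozen=True)
class Idle:   # local action ε^r_{r1r2}
    role: str
    sender: str
    receiver: str

@dataclass(frozen=True)
class Bin:    # binary operator "+", "·" or "∥"
    op: str
    left: object
    right: object

@dataclass(frozen=True)
class Mu:     # recursion µX. body
    var: str
    body: object

@dataclass(frozen=True)
class Var:    # recursion variable X
    name: str


def roles(g) -> FrozenSet[str]:
    """r(G): all role names occurring in g."""
    if isinstance(g, Comm):
        return frozenset({g.sender, g.receiver})
    if isinstance(g, Bin):
        return roles(g.left) | roles(g.right)
    if isinstance(g, Mu):
        return roles(g.body)
    return frozenset()


def project(g, r: str):
    """Projection of g onto role r."""
    if isinstance(g, Comm):
        if r == g.sender:
            return Send(g.sender, g.receiver, g.msg)
        if r == g.receiver:
            return Recv(g.sender, g.receiver, g.msg)
        return Idle(r, g.sender, g.receiver)  # r is not involved: annotated idling
    if isinstance(g, Bin):
        return Bin(g.op, project(g.left, r), project(g.right, r))
    if isinstance(g, Mu):
        return Mu(g.var, project(g.body, r))
    return g  # Var is unchanged


def project_group(g, rs: Iterable[str]) -> Dict[str, object]:
    """Projection onto a role set: defined only if it is nonempty and covers r(g)."""
    rs = frozenset(rs)
    if not rs or not roles(g) <= rs:
        raise ValueError("role set must be nonempty and cover every role of g")
    return {r: project(g, r) for r in sorted(rs)}
```

Projecting a→b · c→d onto Carol, for instance, gives ε^c_{ab} · cd!V: Carol first idles during the communication between Alice and Bob and then sends to Dave.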
The following lemmas state key properties about termination and reduction behaviour of global types and their projections: Lem. 1 states projection is sound and complete for termination; Lem. 2 states the same for reduction.
Proof. Both conjuncts are proven by induction on the structure of G, also using Lem. 1 (needed because termination plays a role in reduction of ·).

Weak Bisimilarity of Global Types, Local Types, and Groups
The idling actions introduced in local types by our projection operator are internal, because they never compose into communications that emerge between local types in groups. Therefore, the operational equivalence relation under which we prove the correctness of projection should be insensitive to idling actions.
First, let A_τ = {ε^r_{r_1r_2} | r_1 ≠ r_2 and r_1 ≠ r ≠ r_2} denote the set of all internal actions, ranged over by τ, σ. Second, Fig. 7 defines an extension of our operational semantics (Fig. 3) with relations that assert weak termination (⇓) and weak reduction (=α⇒), i.e., versions of termination and reduction that are insensitive to internal actions. Third, Fig. 8 defines weak bisimilarity (≈) in terms of weak similarity (≾), which in turn is defined in terms of weak termination and weak reduction: T_1 ≾ T_2 requires that T_1↓ implies T_2⇓, and that every reduction T_1 −α→ T_1′ is matched either by some T_2 =α⇒ T_2′ with T_1′ ≾ T_2′, or, if α ∈ A_τ, by T_1′ ≾ T_2. This coincides with the definition found in the literature (e.g., [2]), with the administrative exception that we need the fourth rule in Fig. 7b to account for the fact that we have multiple different internal actions. We use a double horizontal line in the formulation of rules to indicate they should be applied coinductively.
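For finite-state terms, weak bisimilarity can be decided by a naive greatest-fixpoint refinement: start from all pairs of states and repeatedly discard pairs that violate the termination or matching conditions. The sketch below assumes terms are already compiled into one explicit finite LTS, that idling actions are encoded as labels starting with "eps" (a hypothetical convention), and, like weak bisimilarity in the text, it treats all internal actions alike (a weak internal step is any sequence of internal steps, possibly empty). It is an illustration, not the algorithm used by mCRL2 or by the paper's tool.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

LTS = Dict[str, List[Tuple[str, str]]]  # state -> [(label, next state)]


def is_internal(label: str) -> bool:
    # Assumption: idling actions ε^r_{r1r2} are encoded as labels "eps:r:r1r2".
    return label.startswith("eps")


def tau_closure(lts: LTS, s: str) -> Set[str]:
    """States reachable from s by zero or more internal steps."""
    seen, todo = {s}, [s]
    while todo:
        x = todo.pop()
        for lab, y in lts.get(x, []):
            if is_internal(lab) and y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def weak_succ(lts: LTS, s: str, label: str) -> Set[str]:
    """Targets of s =label=>: internal*, label, internal* (only internal* if label is internal)."""
    pre = tau_closure(lts, s)
    if is_internal(label):
        return pre
    mid = {y for x in pre for lab, y in lts.get(x, []) if lab == label}
    return {z for y in mid for z in tau_closure(lts, y)}


def weakly_bisimilar(lts: LTS, final: Set[str], p: str, q: str) -> bool:
    states = set(lts) | {y for succ in lts.values() for _, y in succ} | {p, q}

    def weak_term(s: str) -> bool:  # s⇓: internal steps can reach a terminated state
        return any(x in final for x in tau_closure(lts, s))

    rel = set(product(states, states))

    def ok(s: str, t: str) -> bool:
        if (s in final and not weak_term(t)) or (t in final and not weak_term(s)):
            return False
        for lab, s2 in lts.get(s, []):  # t must weakly match every move of s
            if not any((s2, t2) in rel for t2 in weak_succ(lts, t, lab)):
                return False
        for lab, t2 in lts.get(t, []):  # and s must weakly match every move of t
            if not any((s2, t2) in rel for s2 in weak_succ(lts, s, lab)):
                return False
        return True

    changed = True
    while changed:  # discard violating pairs until a weak bisimulation remains
        changed = False
        for pair in list(rel):
            if not ok(*pair):
                rel.discard(pair)
                changed = True
    return (p, q) in rel
```

With p0 encoding r1r2!U and q0 encoding an idling step followed by r1r2!U, the two start states are weakly bisimilar, whereas r1r2!U and r1r2!U · r3r4!V are not, because only the former can terminate after the send.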
The notion of weak reduction allows us to generalize the soundness and completeness of projection from roles (Lem. 2) to groups of roles. Lem. 3 states: (1) if G can g-reduce to G′ and the projection of G is defined, then the group of projections of G can reduce to the group of projections of G′, either directly or with a trailing weak τ-reduction; (2) conversely, if the group of projections of G can g-reduce to L′, then G can g-reduce to some G′, and either L′ equals the group of projections of G′, or it can get there with a weak reduction.
Proof. Both conjuncts are proven by induction on R, also using Lem. 2.

Well-formedness of Global Types
In general, projection does not preserve weak operational semantics.
Example 3 (Bad protocols). The following global types (message types omitted) specify "bad" protocols that do not permit "good" concurrent implementations:

G_1 = a→b + a→c        G_2 = a→b · c→d

Global type G_1 specifies a communication from Alice to either Bob or Carol, chosen by Alice. This is a bad protocol, because if Alice chooses Bob, there is no way for Carol to know (and vice versa): Carol cannot locally distinguish between whether Alice has not made her choice yet, or whether Alice has chosen Bob. Formally, this is manifested in the fact that Carol's local type can at any time choose to perform idling action ε^c_{ab} (i.e., G_1↾c has two reductions, neither of which has priority), thereby assuming that Alice has chosen Bob. However, Bob can symmetrically assume that Alice has chosen Carol. As a result, the group of projections can reduce as follows: G_1↾{a, b, c} −ε^c_{ab}→ L_1 −ε^b_{ac}→ L_2. Now, L_2 cannot reduce further, but Alice has not terminated yet. This sequence of reductions cannot be (weakly) simulated by G_1.
Global type G_2 specifies a communication from Alice to Bob, followed by a communication from Carol to Dave. This is a bad protocol, because there is no way for Carol and Dave to know when the communication from Alice to Bob has occurred. Formally, this is manifested in the fact that Carol's and Dave's local types can at any time choose to perform idling actions, thereby assuming that the communication from Alice to Bob has occurred. As a result, the group of projections can let Carol and Dave idle and then communicate with each other, reaching some group L_4, before Alice and Bob have communicated. This sequence cannot be (weakly) simulated by G_2.
Next, we define two well-formedness conditions that rule out the previous examples; in Sect. 3.6, we prove that if these conditions are satisfied by a global type G, then G and G↾R are indeed operationally equivalent (i.e., weakly bisimilar). Instead of defining the conditions in terms of global types, we define them in terms of projections (i.e., local types). Informally:
- C: For every r ∈ R, for every choice that local type G↾r has between a weak reduction =l⇒ (where l is a send, a receive, or an idling action) and a completely unobservable weak reduction =τ⇒, choosing to perform the former does not disable the latter, and vice versa. This can be thought of as a form of commutativity between l and τ.
- EC: For every r ∈ R, one of the following is true:
  1. For every weak reduction =l⇒ that local type G↾r can perform (where l is a send or a receive, but not an idling action), it can also perform a reduction −l→. That is, if G↾r can perform l in the future after idling actions, it can already perform l eagerly in the present.
  2. Local type G↾r is the start of a causal chain: a sequence of τ-reductions, followed by a non-τ-reduction, that are "causally related" to each other. An ε^r_{r_1r_2}-reduction is causally related to an ε^r_{r_3r_4}-reduction iff {r_1, r_2} ∩ {r_3, r_4} ≠ ∅. Globally speaking, this means communication between r_3 and r_4 must be preceded by communication between r_1 and r_2.
These conditions must hold coinductively for all local types that G r can reduce to. Essentially, these conditions state that by performing idling actions, a local type can neither decrease its possible behaviour (C), nor increase it (EC-1), unless it is guaranteed the added behaviour cannot be exercised yet, because it is causally related to other communications that need to happen first (EC-2).
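The causal relation in condition EC-2 only compares the role pairs of idling actions. The following Python sketch checks a candidate causal chain given as a list of idling actions followed by one send or receive. It rests on two assumptions of ours, not on the formal definition in Fig. 9: the final send/receive on channel r_1r_2 is related to the preceding idling action by the same role-sharing test, and only consecutive steps are compared. The names Idle, Channel and is_causal_chain are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

Idle = Tuple[str, str, str]   # (r, r1, r2) encodes ε^r_{r1r2}
Channel = Tuple[str, str]     # (r1, r2) encodes a send r1r2!U or receive r1r2?U


def comm_roles(step) -> FrozenSet[str]:
    """Roles of the communication a step belongs to (the idling role is dropped)."""
    return frozenset(step[-2:])


def causally_related(a, b) -> bool:
    """Two steps are causally related iff their communications share a role."""
    return bool(comm_roles(a) & comm_roles(b))


def is_causal_chain(idles: Sequence[Idle], final: Channel) -> bool:
    """Idling steps followed by one send/receive, consecutive steps causally related."""
    steps = [*idles, final]
    return all(causally_related(x, y) for x, y in zip(steps, steps[1:]))
```

Under this reading, the projection of a→b · c→d onto Carol (ε^c_{ab} · cd!) is not a causal chain, matching Example 3, whereas the projection of a→b · b→c onto Carol (ε^c_{ab} · bc?) is: Bob can only send to Carol after he has received from Alice.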
Example 4 (Bad protocols, continued). Global type G_1 (Exmp. 3) is ill-formed: its projections onto b and c violate condition C. Global type G_2 (Exmp. 3) is also ill-formed: its projections onto c and d violate condition EC.
Fig. 9 defines C and EC formally. We define C not only for local types, but also for groups of local types, as this simplifies some notation later on. We prove key properties of C: Thm. 1 states that commutativity of local sends/receives/idling (l) in local types is lifted to commutativity of global communications/idling (α) in groups of local types; Lem. 4 states that weak bisimilarity preserves commutativity.

Theorem 1.
(1) If C_{l,τ}(L(r)) for all l, τ and all r ∈ dom L, then C_{α,τ}(L) for all α, τ.
(2) If C(L(r)) for all r ∈ dom L, then C(L).
Proof. The first conjunct is proven by induction on the rules of =⇒. The second is proven by coinduction on the rule of C, also using the first conjunct.
Lemma 4.
(1) If C_{α_1,α_2}(L_1) and L_1 ≈ L_2, then C_{α_1,α_2}(L_2).
(2) If C(L_1) and L_1 ≈ L_2, then C(L_2).
Proof. The first conjunct is proven by applying the definitions of C and ≈; the second is proven by coinduction on the rule of C, also using the first conjunct.
We also prove key properties of Chain and EC, both of which work specifically for groups of projections: Lem. 5 states that if the projections of r_1 and r_2 are both causal chains, they cannot weakly reduce to local types where they can perform reciprocal actions (r_1 the send; r_2 the receive); Thm. 2 states that eagerness of local sends/receives (not idling) in projections is lifted to eagerness of global communications in groups of projections (cf. Thm. 1).

Lemma 5. Chain(G↾R)(r_1) … implies false.
Proof. By induction on the rules of =⇒.
Theorem 2. … and EC(L(r)) for all r ∈ dom L implies EC(L).
Proof. The first conjunct is proven by using Lem. 5; the second is proven by coinduction on the rule of EC, also using the first conjunct.
We note that, in contrast to Lem. 4 for C, we do not have a lemma stating that weak bisimilarity preserves EC. Such a lemma would have been highly useful in our subsequent proofs, but it is unfortunately false, because weak bisimilarity does not preserve Chain. A simple counterexample, for local types, is this: L_1 = r_1r_2!U and L_2 = ε^{r_3}_{r_4r_5} · r_1r_2!U, where {r_1, r_2} ∩ {r_3, r_4, r_5} = ∅. While L_1 and L_2 are weakly bisimilar, L_1 is the start of a unary causal chain, but L_2 is not. The problem is that Chain depends on the role names associated with idling actions, whereas weak bisimilarity abstracts those role names away.
We call a global type well-formed if each of its projections satisfies C and EC.

Correctness of Projection under Well-Formedness
We now prove our main result: if a global type is well-formed, it is weakly bisimilar to the group of its projections. We start by defining a relation between global types and groups of local types (denoted by R in Fig. 8). Its definition uses an auxiliary abbreviation stating that $L_1$ has a silent reduction (only τ-s) to a term that is weakly bisimilar to $L_2$, or that $L_1$ is already weakly bisimilar to $L_2$ (without any reductions). Essentially, if C(G R) and EC(G R), then the relation relates G to a set S of groups that can roughly be characterised as follows:
- (base) G R is in S;
- (successors) any group to which G R can silently reduce is in S;
- (predecessors) any group that can silently reduce to G R is in S;
- (pseudo-predecessors) any group that can silently reduce to a group to which G R can silently reduce is in S;
- (closure) S is closed under weak bisimilarity.
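The first four clauses of this characterisation can be illustrated on a finite graph of silent reductions. In the following Python sketch (our own illustration), groups are opaque node names, `tau_edges` is a hypothetical encoding that maps each group to the groups it can reach in one silent step, and the (closure) clause is omitted because it would additionally require a weak bisimilarity check.

```python
def silent_reach(tau_edges, start):
    """Groups reachable from `start` by one or more silent reductions."""
    seen, stack = set(), [start]
    while stack:
        for nxt in tau_edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def related_groups(tau_edges, base):
    """Base, successors, predecessors and pseudo-predecessors of `base`
    (the (closure) clause is not modelled here)."""
    nodes = set(tau_edges) | {t for ts in tau_edges.values() for t in ts}
    succ = silent_reach(tau_edges, base)
    pred = {n for n in nodes if base in silent_reach(tau_edges, n)}
    pseudo = {n for n in nodes if silent_reach(tau_edges, n) & succ}
    return {base} | succ | pred | pseudo


# P -> B -> S and Q -> S; X -> Y is unrelated to the base group B
edges = {"P": ["B"], "B": ["S"], "Q": ["S"], "X": ["Y"]}
print(sorted(related_groups(edges, "B")))  # ['B', 'P', 'Q', 'S']
```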
The following technical lemma states that if a well-formed group of projections G R can weakly g-reduce to some group $L'$, then the original global type G can g-reduce to some $G'$, and $L'$ and the group of projections of $G'$ either are weakly bisimilar or can both weakly reduce to a weakly bisimilar group $L''$. The following two lemmas state key properties of the relation: Lem. 7 states that it preserves termination (as weak termination); Lem. 8 states that it coinductively preserves reduction (as weak reduction). Together, these lemmas imply that the relation and its inverse are weak simulations, which in turn implies that the relation is contained in ≈.
Proof. By coinduction on the rule of the relation (Fig. 8), also using Lemmas 7-8.
A group of local types L enjoys deadlock-freedom if it has either successfully terminated (L ↓; Fig. 5a) or can make another reduction. A group of local types L enjoys absence of protocol violations relative to global type G if, coinductively, every non-τ reduction of L can be simulated by G (i.e., every communication in the group is "permitted" by G). The following corollary relates Thm. 3 (operational equivalence) to these classical MPST properties:
Corollary 1. If global type G is well-formed, then the group of G's projections enjoys deadlock-freedom and absence of protocol violations relative to G.
The key insight behind this corollary is that global types are by definition free of deadlocks (they either reduce to 1 or never terminate; Fig. 3), while weak bisimilarity carries deadlock-freedom of global types over to their projections (notably, weak bisimilarity is sensitive to termination, and a group of local types terminates only if all individual local types terminate; Fig. 5a). Weak bisimilarity also directly implies absence of protocol violations.
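As an illustration of the deadlock-freedom property on a finite state space, the following Python sketch (not part of mpstpp; the dictionary encoding of reductions is an assumption) applies the condition above to every group reachable from an initial one: each must have either successfully terminated or be able to make another reduction.

```python
from collections import deque


def deadlock_free(initial, transitions, terminated):
    """True if every group reachable from `initial` has either terminated
    (is in `terminated`) or can make another reduction.

    transitions: dict group -> list of (label, successor)
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        group = queue.popleft()
        successors = transitions.get(group, [])
        if not successors and group not in terminated:
            return False  # stuck: neither terminated nor able to reduce
        for _label, nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


steps = {"s0": [("A B :U", "s1")], "s1": []}
print(deadlock_free("s0", steps, {"s1"}))  # True: s1 has terminated
print(deadlock_free("s0", steps, set()))   # False: s1 is stuck
```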

Decidability of Checking Well-Formedness
We note that our proof of Thm. 3 is non-constructive, in the sense that the relation is infinitely large (i.e., for each group of local types, there exist infinitely many weakly bisimilar groups). The following proposition states that this is not a problem in practice.

Proposition 1. Checking C(L) and EC(L) is decidable.
The rationale behind this proposition is as follows. First, to check C(L) and EC(L), by Thm. 1 and Thm. 2, it suffices to check C(L(r)) and EC(L(r)) for each r ∈ dom L. For each such local type L(r), there are two possibilities.
If local type L(r) has finite control, its state space can be exhaustively explored in finite time, so checking C(L(r)) and EC(L(r)) is obviously decidable.
In contrast, if L(r) has non-finite control, we make two observations. The first observation is that the only possible source of infinity is the occurrence of recursion variables under parallel composition. The second observation is that C and EC are true for $L_1 \parallel L_2$ if they are true for $L_1$ and $L_2$ separately; this is because C and EC essentially assert a "diamond structure" on the reductions of $L_1 \parallel L_2$, which is precisely the operational semantics of $\parallel$ (Fig. 3). Thus, we can check C($L_1 \parallel L_2$) and EC($L_1 \parallel L_2$) by checking C($L_1$), C($L_2$), EC($L_1$), and EC($L_2$), thereby avoiding the possible source of infinity.
We note that splitting the checks for parallel composition in this way not only ensures decidability; it also avoids exponential state explosion (in the number of nested $\parallel$-operators in a single local type) in local types with finite control.
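To see why splitting the checks avoids state explosion, the following Python sketch (hypothetical helpers `interleave` and `one_action`, not mpstpp code) counts the reachable states of $n$ independent single-action operands composed with $\parallel$, assuming pure interleaving and initial state 0 for every operand. The joint state space grows as $2^n$, whereas checking the operands separately visits only $2n$ states in total.

```python
def interleave(operands):
    """Explore the reachable states of L1 || ... || Ln under pure interleaving.

    Each operand is an LTS: dict state -> list of (label, successor), with
    initial state 0. Returns (number of states, number of steps explored).
    """
    start = tuple(0 for _ in operands)
    seen, stack, steps = {start}, [start], 0
    while stack:
        state = stack.pop()
        for i, lts in enumerate(operands):
            for _label, nxt in lts.get(state[i], []):
                steps += 1
                succ = state[:i] + (nxt,) + state[i + 1:]
                if succ not in seen:
                    seen.add(succ)
                    stack.append(succ)
    return len(seen), steps


def one_action(name):
    """A two-state operand that performs a single action and stops."""
    return {0: [(name, 1)], 1: []}


for n in (2, 4, 8, 12):
    joint, _ = interleave([one_action(f"a{i}") for i in range(n)])
    separate = 2 * n  # two states per operand when checked in isolation
    print(n, joint, separate)
# prints: 2 4 4 / 4 16 8 / 8 256 16 / 12 4096 24 (one line each)
```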

Discussion of Challenges
Our use of (weak) bisimilarity, plus the key insight to annotate silent actions with additional information to keep track of choices, made the problem of proving the correctness of projection (Thm. 3) feasible. The major technical challenges to achieve this were defining the right bisimulation relation (Sect. 3.5) and discovering corresponding well-formedness conditions (Sect. 3.6).
A naive weak bisimulation relation, $R_{\text{naive}}$, relates every global type only to its group of projections. $R_{\text{naive}}$ is sufficient to prove that every reduction of a global type can be weakly simulated by one non-silent reduction of the group (sender and receiver), followed by a number of silent reductions (idling processes). In contrast, $R_{\text{naive}}$ is insufficient to prove that every reduction of the group can be simulated by its global type, because of silent actions: if global type G is related to group of projections L by $R_{\text{naive}}$, and a silent action subsequently reduces L to $L'$, the simulation fails, as $R_{\text{naive}}$ does not relate G to $L'$.
Fig. 10: Overview of mpstpp
To alleviate this issue, we defined the bisimulation relation in such a way that it relates every global type G to groups of local types that are not necessarily equal to the projections of G: each local type may be behind the corresponding projection (i.e., the local type can reach the projection through silent actions) or ahead of it (i.e., the projection can reach the local type through silent actions).

Implementation
Tool. We implemented a tool, mpstpp, based on the core theoretical contributions of this paper. Fig. 10 shows a high-level overview of the tool, including the main components (boxes) and data flows (arrows).
First, mpstpp parses an input .glob file into a data structure for a global type G (programmer-friendly Scribble-style syntax [35] is also supported as input). Then, it projects G onto all roles that occur in G. Next, it checks each of the resulting local types for well-formedness, either sequentially or in parallel depending on the settings: a key advantage of the formulation of our well-formedness conditions is that they can be checked modularly for every role in isolation, enabling us to take advantage of modern multicore hardware. Finally, if the local types are well-formed, idling actions are eliminated, and typed communication APIs are generated from the local types to enable MPST++-based programming in Java.
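The modular, per-role check can be sketched in Python as follows. This is our illustration rather than mpstpp's code (mpstpp is a Java tool): `check` stands for a hypothetical per-role well-formedness predicate, and the thread pool is an assumption about how the work could be distributed.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(local_types, check, parallel=True, workers=None):
    """Apply a per-role well-formedness predicate `check` to every local type.

    local_types: dict role -> local type (any representation)
    Returns dict role -> bool; the group is well-formed iff all values are True.
    """
    roles = list(local_types)
    if not parallel:
        return {r: check(local_types[r]) for r in roles}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # each role is checked independently, so no coordination is needed
        verdicts = pool.map(check, (local_types[r] for r in roles))
        return dict(zip(roles, verdicts))


types = {"Client": "ok", "Server": "bad"}
print(check_all(types, lambda lt: lt == "ok"))  # {'Client': True, 'Server': False}
```

In CPython, threads do not execute CPU-bound Python code in parallel because of the global interpreter lock, so `concurrent.futures.ProcessPoolExecutor` would be the closer analogue of mpstpp-par for a pure-Python checker (it requires `check` to be picklable).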
Optimisations. Parsing, computing projections, and generating APIs are relatively inexpensive; the run times of our tool are instead dominated by the well-formedness checks. We therefore implemented several optimisations to make these checks more efficient. Before presenting these optimisations, we note that the complexity of checking well-formedness of a local type L is polynomial in the number of successors that can be reached from L (Fig. 9).
(1) Our first optimisation targets local types with parallel composition: local type $L_1 \parallel L_2$ is potentially a serious bottleneck, as its number of successors is exponential in the number of nested $\parallel$-operators. Therefore, even when state spaces are finite, we check the well-formedness of $L_1 \parallel L_2$ by checking the well-formedness of $L_1$ and $L_2$, without explicitly considering the exponentially many successors of $L_1 \parallel L_2$, exploiting the same observation as for decidability (Sect. 3.7).
(2) Our second optimisation concerns the computation of weak reductions. In particular, to check whether C and EC are true for a local type L, according to their definitions (Fig. 9), we need to iterate over each of L's weak reductions. Especially if L has many τ-reductions (Fig. 7), computing the set of weak reductions can be expensive. To avoid this, mpstpp computes sound (but incomplete) approximations of C and EC. We implemented two kinds of approximations: (a) checking versions of C and EC in which every occurrence of $\Longrightarrow$ in the definition is replaced with $\longrightarrow$, and (b) checking $L \approx L'$ for every τ-reduction from $L$ to $L'$. Approximation (a) is sound for both C and EC (rationale: if individual reductions commute, sequences of reductions consisting of those individual reductions commute as well), but approximation (b) is sound only for C (rationale: the auxiliary relation Chain of EC is not preserved by weak bisimilarity). To ensure soundness, mpstpp thus never uses approximation (b) for EC.
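Approximation (a) replaces weak reductions with single reductions. The following Python sketch illustrates that idea for a simplified commutation property, a "diamond" between two actions $a$ and $b$ at a single state; the function `commutes_one_step` and the LTS encoding are hypothetical, and the actual conditions C and EC (Fig. 9) involve more than this single check.

```python
def commutes_one_step(transitions, state, a, b):
    """Single-step diamond check: whenever `state` can do `a` to s1 and `b`
    to s2, there must be a common t with s1 -b-> t and s2 -a-> t.
    Only direct transitions are inspected, never weak reductions.

    transitions: dict state -> list of (label, successor)
    """
    def succ(s, label):
        return {t for (lab, t) in transitions.get(s, []) if lab == label}

    for s1 in succ(state, a):
        for s2 in succ(state, b):
            if not (succ(s1, b) & succ(s2, a)):
                return False  # the two orders do not reconverge
    return True


diamond = {"s": [("a", "s1"), ("b", "s2")], "s1": [("b", "t")], "s2": [("a", "t")]}
print(commutes_one_step(diamond, "s", "a", "b"))  # True
```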
(3) Our third optimisation targets the checks for weak bisimilarity that occur in several places in the definitions of C and EC (Fig. 9). Instead of computing the full reduction relations and running an algorithm to decide their weak bisimilarity (which would be computationally costly), we take advantage of the fact that our language of local types is based on existing algebras (Sect. 3.1) that have sound and complete axiomatisations. Specifically, to check whether two local types are weakly bisimilar, mpstpp applies the axioms as rewrite rules and compares the resulting normal forms for structural equality. To keep rewriting fast, we sacrificed completeness (i.e., we use rewriting only to eliminate as many silent actions as possible in a sound way; for instance, our rewrite procedure cannot prove that $(L_1 \cdot \tau) + L_2$ and $L_2 + L_1$ are weakly bisimilar); however, for the many examples we tried (including this paper's), this optimisation is highly effective.
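The rewriting approach can be sketched with a single sound axiom of weak (and branching) bisimilarity, $x \cdot \tau = x$. The Python sketch below (our illustration; the term classes are hypothetical) normalises terms bottom-up and compares normal forms structurally. Like mpstpp's procedure, it is incomplete: because it does not apply commutativity of $+$, it identifies $(L_1 \cdot \tau) + L_2$ with $L_1 + L_2$ but not with $L_2 + L_1$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Act:
    name: str  # a visible action, e.g. "r1 r2 !U"


@dataclass(frozen=True)
class Tau:
    pass  # a silent action


@dataclass(frozen=True)
class Seq:
    left: object  # left . right
    right: object


@dataclass(frozen=True)
class Alt:
    left: object  # left + right
    right: object


def normalise(term):
    """Apply the axiom x . tau = x bottom-up; all other structure is kept."""
    if isinstance(term, Seq):
        left, right = normalise(term.left), normalise(term.right)
        return left if isinstance(right, Tau) else Seq(left, right)
    if isinstance(term, Alt):
        return Alt(normalise(term.left), normalise(term.right))
    return term


def probably_weakly_bisimilar(t1, t2):
    """Sound but incomplete: True only if the normal forms are identical."""
    return normalise(t1) == normalise(t2)


a, b = Act("a"), Act("b")
print(probably_weakly_bisimilar(Alt(Seq(a, Tau()), b), Alt(a, b)))  # True
print(probably_weakly_bisimilar(Alt(Seq(a, Tau()), b), Alt(b, a)))  # False
```

The sketch deliberately does not rewrite $\tau \cdot x$ to $x$: that equation holds for weak bisimilarity, but it is not a congruence law for $+$, so applying it inside a choice would be unsound.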
Optimisations (2) and (3) are conservative: mpstpp may conclude C or EC is false, even though it is actually true. While this affects completeness, soundness is guaranteed: if mpstpp concludes a local type is well-formed, it really is.

Evaluation of the Approach
Setup. In the previous section, we formulated and proved the theoretical correctness of our well-formedness conditions (Thm. 3). In this section, we demonstrate their practical usefulness through an experimental evaluation with benchmarks. Specifically, we show that checking our well-formedness conditions is faster and more scalable than explicitly checking operational equivalence (which currently seems to be the only alternative for attaining the same level of expressiveness as our work).
In our benchmarks, we compare three approaches to check operational equivalence between a global type and its group of projected local types:
- mpstpp-seq (baseline): In this approach, the mpstpp tool is used to check our well-formedness conditions (which imply operational equivalence; Thm. 3), without using any form of parallel processing.
- mpstpp-par: Like mpstpp-seq, except that each projected local type is checked in a separate thread. The fact that our well-formedness conditions can be easily parallelised in this way is an important practical advantage.
- explicit: In this approach, mpstpp is used only for parsing and projecting; after that, we use the state-of-the-art verification tool set mCRL2 [10,20,29] to explicitly check operational equivalence (details below).
We identified six example protocols (details below) that can naturally be scaled in the number of roles N (e.g., the number of Clients in the Key-Value Store protocol). Using each of the three approaches, for each protocol, and for each value of N between the minimal number of roles $N_{\min}$ (e.g., $N_{\min} = 2$ in the Key-Value Store protocol: the Server and one Client) and 16, we checked operational equivalence; varying N in this way yields insights not only into per-case performance, but also into scalability. To get statistically reliable results [31], we repeated executions as many times as necessary until the 95% confidence interval was within 5% of our reported means (i.e., there is a 95% probability that the true mean is within 5% of our reported means).
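The stopping rule described above can be sketched in Python as follows. This is our illustration, not the authors' benchmark harness: `measure` is a hypothetical zero-argument function returning one execution time, the 95% interval uses a normal approximation (z ≈ 1.96) rather than a Student-t quantile, and `min_runs`/`max_runs` are assumed safeguards.

```python
import statistics


def run_until_precise(measure, rel_precision=0.05, min_runs=10, max_runs=10_000):
    """Repeat `measure()` until the half-width of the 95% confidence interval
    of the mean is at most `rel_precision` times the mean (or `max_runs` is hit).
    Returns (mean, number of runs)."""
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    samples = []
    while len(samples) < max_runs:
        samples.append(measure())
        if len(samples) < min_runs:
            continue  # too few samples for a meaningful interval
        mean = statistics.fmean(samples)
        half_width = z * statistics.stdev(samples) / len(samples) ** 0.5
        if half_width <= rel_precision * mean:
            break
    return statistics.fmean(samples), len(samples)


print(run_until_precise(lambda: 3.2))  # a noiseless measurement stops after min_runs
```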
We ran our benchmarks on a machine with an Intel Xeon 6130 processor (16 cores; no hyper-threading), using Debian 9, Java 13, and mCRL2 201908.0.
Translation to mCRL2. In the explicit approach, we use mCRL2 [10,20,29] to explicitly check whether global type G and its group of projections L are operationally equivalent. Our choice of mCRL2 is motivated by the fact that our languages of global and local types are based on the same process algebra as mCRL2's specification language, so their translation to mCRL2 specifications is direct and straightforward. Moreover, mCRL2 is mature (e.g., it is used in industry [5]), and it uses optimised, state-of-the-art algorithms to check behavioural equivalences (e.g., [28]), so we are comparing our tool with a serious competitor.
First, we translate global type G to an mCRL2 specification. Then, we use the mCRL2 tools mcrl22lps and lps2lts to normalise this specification to a linear process specification (LPS) and to generate a corresponding labelled transition system (LTS). Because the translation is direct, the transition labels in the resulting LTS are all global communication actions of the form $r_1 r_2 : U$.
Second, we translate the group of projections L, consisting of roles $r_1, \ldots, r_n$, to an mCRL2 specification $\llbracket L \rrbracket$. It looks as follows (in formal mCRL2 notation [29]):
$$\llbracket L \rrbracket = \nabla_{\{r_i r_j : U\}}\Bigl(\Gamma_{\{r_i r_j ! U \,\mid\, r_i r_j ? U \;\to\; r_i r_j : U\}}\bigl(\llbracket L(r_1) \rrbracket \parallel \cdots \parallel \llbracket L(r_n) \rrbracket\bigr)\Bigr)$$
where each $\llbracket L(r_i) \rrbracket$ is a direct translation of local type $L(r_i)$ to an mCRL2 specification; $\parallel$ is a form of parallel composition that prescribes both interleaving and synchronisation of operand actions; $\mid$ is synchronous composition of actions; $\Gamma$ is the communication operator that replaces synchronised local send/receive actions $r_i r_j ! U \mid r_i r_j ? U$ with the global communication action $r_i r_j : U$; and $\nabla$ is the allow operator that allows only global communication actions to be executed (i.e., unsynchronised, individual send/receive actions cannot be executed).
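The combined effect of $\nabla$ and $\Gamma$ on a group of local types can be mimicked directly. In the following Python sketch (our illustration; the string encoding `"p q !U"`/`"p q ?U"` of actions and the dictionary LTSs are assumptions), a send of role p synchronises with a matching receive of role q into the global action `"p q :U"`, silent steps interleave freely, and unmatched sends or receives never fire.

```python
def group_steps(state, local_types):
    """Enumerate the reductions of a group of local types.

    state:       dict role -> current local state
    local_types: dict role -> {local state: [(label, successor), ...]}
    Returns a list of (global label, next group state).
    """
    steps = []
    for p, lts in local_types.items():
        for label, nxt in lts.get(state[p], []):
            if label == "tau":
                steps.append(("tau", {**state, p: nxt}))  # silent steps interleave
                continue
            sender, receiver, payload = label.split()
            if sender != p or not payload.startswith("!"):
                continue  # receives are matched from the sender's side
            wanted = f"{sender} {receiver} ?{payload[1:]}"
            for label_q, nxt_q in local_types[receiver].get(state[receiver], []):
                if label_q == wanted:  # send and receive synchronise
                    steps.append((f"{sender} {receiver} :{payload[1:]}",
                                  {**state, p: nxt, receiver: nxt_q}))
    return steps


A = {0: [("A B !Int", 1)], 1: []}
B = {0: [("tau", 1)], 1: [("A B ?Int", 2)], 2: []}
print(group_steps({"A": 0, "B": 0}, {"A": A, "B": B}))  # only B's silent step
```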
When translating a local type $L(r_i)$ to an mCRL2 specification, we already eliminate as many idling actions $\varepsilon^{r}_{r_1 r_2}$ as possible (modulo branching bisimulation), to make mCRL2's subsequent verification easier; those that remain are represented as a general τ action, because mCRL2 does not need the additional information carried by $\varepsilon^{r}_{r_1 r_2}$. Then, we use mcrl22lps and lps2lts to generate an LPS and an LTS for the translated group.
Third, we use the mCRL2 tool ltscompare to check whether the LTS for G is weakly bisimilar to the LTS for L. We note that normalisation to an LPS using mcrl22lps is a prerequisite for using ltscompare.
Protocols. We used the following protocols in our benchmarks:
- Key-Value Store (KVS): This is the same protocol as the one presented in Sect. 2, except that each inner parallel composition ($\parallel$) is replaced with sequential composition (·), because mcrl22lps does not support normalisation of mCRL2 specifications in which $\parallel$ occurs under recursion.
each 2 ≤ n ≤ 7, we instantiated the Pub/Sub protocol with 1 Publisher and n Subscribers; we did not instantiate the Pub/Sub protocol with n > 7 Subscribers, as the resulting global types are too large (their size grows exponentially in n).
Benchmark results. Figures 11-12 show the results of our benchmarks. The x-axis indicates the number of roles; the y-axis indicates relative speed-ups. The baselines are at y = 1E+0 and y = 1: above them, a competing approach is faster than mpstpp-seq; below them, it is slower. We draw two conclusions.
(1) For each protocol and number of roles, mpstpp-seq outperforms explicit. In the cases of Key-Value Store and Load Balancer, explicit approaches mpstpp-seq, but this trend levels off as the number of roles increases, and even in the best case, explicit is still about two orders of magnitude slower than mpstpp-seq. In the cases of Work Stealing, Peer-to-Peer, and Pub/Sub, the LTSs generated from the translated mCRL2 specifications were too large to be compared (i.e., ltscompare produced an error) beyond 7, 5, and 5 roles, respectively; this was no issue for mpstpp-seq. In the case of Map/Reduce, the LTSs were small enough to compare using mCRL2's ltscompare, but after an initial upward slope for 2 ≤ N ≤ 7 roles, explicit starts to perform progressively worse.
(2) Especially for larger numbers of roles, parallelisation can yield substantial performance improvements. In the cases of Key-Value Store and Load Balancer, mpstpp-par outperforms mpstpp-seq only with 14-16 roles; for smaller numbers of roles, parallel execution is slower. In the worst case (Load Balancer, 2 roles), the slowdown is roughly 10.9 µs / 3.2 µs ≈ 3.4; we hypothesise that, because of the low absolute execution times, the cost of spawning and synchronising threads outweighs their benefit. However, the ascending gradient indicates that as the number of roles increases, relatively more of the total work can be parallelised, yielding progressively larger gains. In the cases of Work Stealing, Map/Reduce, Peer-to-Peer, and Pub/Sub, similar trends can be observed, except that y = 1 is crossed sooner. The absolute execution times for these protocols and for small numbers of roles are higher than for Key-Value Store and Load Balancer.

Related Work
Multiparty compatibility. Closest to this paper is existing literature on multiparty compatibility [6,24,40,42]. The key idea, initially developed by Deniélou and Yoshida for the original MPST [23,24], is to represent (groups of) local types operationally as (systems of) communicating finite state machines (CFSMs) [8]. A CFSM M is a state machine whose transitions are labelled with sends/receives; a system of CFSMs S is a parallel composition in which CFSMs communicate through asynchronous buffers. Multiparty compatibility, then, is a condition on the reachable states and transitions of a system $S = (M_1, \ldots, M_n)$: if it is satisfied by S, the system is guaranteed to be safe (no deadlocks; no unmatched sends/receives) and live (S terminates, assuming at least one $M_i$ can terminate). Multiparty compatibility is a sufficient condition to guarantee safety and liveness, but not a necessary one: there exist safe/live systems that are not multiparty compatible. Therefore, several generalisations have been proposed to cover timed behaviour [6], undirected choice [40], and non-synchronisability [42]. The main similarities between our method in this paper and the multiparty compatibility approach are: (1) we also use an operational interpretation of local types; (2) we guarantee similar liveness/safety properties; and (3) we also neatly factor out the act of checking conformance of processes to local types (resp. CFSMs). In contrast, we support a wider range of behaviours. Moreover, from a practical/computational perspective, multiparty compatibility is a global condition that needs to be checked on the whole state space of a system (i.e., the parallel composition of the CFSMs), which is prone to exponential blow-up; our well-formedness conditions, in contrast, are completely local and require only polynomial time to check. The reason we do not require CFSM-like machinery in this paper is that our operational correspondence (weak bisimilarity) is sensitive to termination: notably, in Fig. 5a, a group of local types terminates iff every individual local type terminates (for multiparty compatibility, proofs are done modulo trace equivalence [24], which cannot distinguish between successful and abnormal termination and is therefore in itself too weak to show deadlock-freedom).

Expressiveness of MPST.
In the original MPST theory [33], and in many of its descendants (e.g., [14,19,22,24,25,43]), the restrictions on choices are enforced through a combination of syntax and additional well-formedness conditions. Notably, in these works, communications in global types are specified as $r_1 r_2 : \{\ell_i \cdot G_i\}_{i \in I}$, so syntactically, it is impossible to specify choices among senders or receivers. There also exist papers in which a seemingly more general binary +-like operator is introduced, particularly those that support choices among receivers [16,23,36,40], but their well-formedness conditions still essentially restrict the use of + to $r_1 r_2 : \{\ell_i \cdot G_i\}_{i \in I}$ or $r\,\{r_i : \ell_i \cdot G_i\}_{i \in I}$. This is the first paper in which well-formedness conditions do not force the use of + into one of those two restricted forms. Moreover, our well-formedness conditions are compatible with unbounded interleaving (recursion under parallel composition), beyond similar operators in previous work [16,22,23,43]. An alternative approach is to omit statically checked well-formedness conditions (and projection) altogether, and to only dynamically verify communication actions against global types through monitoring, as recently proposed [30]. The language of global types in that paper is more expressive than ours, but all verification happens at run-time, whereas we provide correctness guarantees already at compile-time.
Session types and model checking. Recently, there has been growing interest in using model checking to verify properties of (multiparty) session types, similar to our use of mCRL2 as an alternative to checking well-formedness (Sect. 4.2). Lange et al. [39] infer behavioural types from Go programs and use mCRL2 to verify the inferred types, to establish safety properties (combined with another tool, KITTeL [26], to establish liveness).
Hu and Yoshida [36] use a custom model checker to verify safety and progress properties of local types (represented as CFSMs) as part of API generation in the Scribble toolchain for MPST [35].
Closest to our use of mCRL2 is the work of Scalas et al. [52,53], where mCRL2 is used to verify properties of local types (e.g., deadlock-freedom), while a form of dependent type-checking is used to verify conformance of processes against those types (i.e., actors in Scala); no global types and projection are used, though (programmers write local types manually). The idea is that properties model-checked on the types carry over to the processes. Similarly, Scalas and Yoshida [51] use mCRL2 to model-check session environments, as a more expressive alternative to the classical consistency condition needed to prove subject reduction. Note that [51, Theorem 5.15] shows that, if a set of processes is typable by a single multiparty session (i.e., a single global type), type-level properties including safety, deadlock-freedom, and liveness guarantee the same properties for multiparty session π-processes. Hence, our type-level analysis is directly usable to provide decidable procedures to verify session π-calculi with extended expressiveness [51, Theorem 7.2].

Conclusion
A key open problem with multiparty session types (MPST) concerns expressiveness: none of the previous languages of global and local types supports arbitrary choice (e.g., choices between different senders), existential quantification over roles, and unbounded interleaving of subprotocols (in the same session). In this paper, we presented the first theory that supports these features. Our main theoretical result is operational equivalence under weak bisimilarity: this guarantees classical MPST properties for groups of local types projected from a global type, namely freedom of deadlocks and absence of protocol violations. Our main practical result is that our well-formedness conditions, which guarantee operational equivalence, can be checked orders of magnitude faster than directly checking weak bisimilarity, as demonstrated by our benchmark results.
We identify several interesting avenues for future work. First, it would be useful to extend our theory with parametrisation along the lines of Castro et al. [18] (which currently works only for restrictive choices); their proof technique for correctness seems to offer substantial synergy with our bisimilarity-based approach. Second, we aim to investigate extensions of our theory with subtyping (e.g., in terms of weak similarity). Notably, while asynchronous communication can be encoded in our current theory, asynchronous subtyping is known to be undecidable [9,41], so the connection between the two is interesting to explore.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.