The Discourje project: run-time verification of communication protocols in Clojure

To simplify shared-memory concurrent programming, languages have started to offer core support for high-level communication primitives, in the form of message passing through channels, in addition to lower-level synchronisation primitives. Yet, a growing body of evidence suggests that channel-based programming abstractions also have their issues. The Discourje project aims to help programmers cope with channels and concurrency bugs in Clojure programs, based on dynamic analysis. The idea is that programmers write not only implementations of communication protocols in their Clojure programs, but also specifications. Discourje then offers a run-time verification library to ensure that channel actions in implementations are safe relative to specifications. The aim of this paper is to provide a comprehensive overview of the current state of Discourje, including case studies, theoretical foundations, and practical aspects.


Introduction
To take advantage of modern multi-core processors, shared-memory concurrent programming (a notoriously difficult enterprise) has become increasingly important. In the wake of this development, languages have started to offer core support for high-level communication primitives, in the form of message passing through channels (e.g. Go, Rust, Clojure), in addition to lower-level synchronisation primitives. The idea is that channels can also serve as a programming abstraction for shared memory beyond their usage in distributed systems. Supposedly, channels are less prone to concurrency bugs than locks, semaphores, and the like. For instance, the official Go documentation recommends that programmers "not communicate by sharing memory; instead, share memory by communicating" [1].
Yet, a growing body of evidence suggests that channel-based programming abstractions also have their issues. For instance, in the 2016-2018 editions of the annual Go survey [2][3][4], "[respondents] least agreed that they are able to effectively debug uses of Go's concurrency features", while in the 2019 edition [5], "debugging concurrency" has the lowest satisfaction rate of all eleven "very or critically important" topics. Moreover, after studying 171 concurrency bugs in popular open source Go programs [6], Tu et al. conclude that "message passing does not necessarily make multithreaded programs less error-prone than shared memory".
Several research projects have emerged that aim to help programmers cope with channels and concurrency bugs in Go programs (e.g. [7][8][9][10][11]), based on static analysis. The idea is to employ compile-time verification to complement Go's static type-checker in a way that fits established Go programming techniques, practices, and culture. However, while similar techniques are likely to suit other statically typed languages as well (e.g. Rust), it remains an open question whether they are equally appropriate for dynamically typed languages (e.g. Clojure); technically, practically, and culturally, run-time verification may fit such languages better. Discourje (pronounced "discourse") is a research project that aims to help programmers cope with channels and concurrency bugs in Clojure programs, based on dynamic analysis.

From the programmer's perspective
A major challenge to cope with channels and concurrency bugs is as follows: how to ensure that an implementation I is safe relative to a specification S, where S prescribes the roles (implemented as threads), the network (implemented as channels between threads), and the protocols (implemented as sessions of communications through channels) that I should fulfil. Safety means that "bad" channel actions never happen: if a channel action happens in I, then it is allowed to happen in S. For instance, typical specifications rule out common concurrency bugs [6], such as sends without receives, receives without sends, and type mismatches (i.e. actual type sent ≠ expected type received).
The Discourje project offers a run-time verification library in Clojure, called discourje, to ensure safety of I relative to S. The idea is to execute specification S, as if it were a state machine, alongside implementation I using two typical run-time verification components (e.g. [12]): a monitor (of S) and instrumentation (of I). Every time a channel action is about to happen in I, the instrumentation quickly intervenes and first asks the monitor if S can make a corresponding transition. If the monitor answers "yes", both the channel action in I and the corresponding transition in S happen; if "no", only an exception is thrown. Thus, a channel action in I happens if, and only if, a corresponding transition happens in S, in lockstep (i.e. "bad" channel actions never happen).
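The lockstep idea can be illustrated with a minimal sketch in plain Clojure. It is not the discourje implementation or API: the names make-monitor and checked-action!, the representation of S as a map from [state action] pairs to successor states, and the use of a non-blocking thunk in place of a real channel action are all assumptions made for illustration.

```clojure
;; Illustrative sketch only: `make-monitor` and `checked-action!` are
;; hypothetical names, not part of the discourje API.

(defn make-monitor
  "Creates a monitor for a specification given as a state machine:
   `transitions` maps [state action] pairs to successor states."
  [transitions initial-state]
  {:transitions transitions
   :state       (atom initial-state)})

(defn checked-action!
  "Instrumentation hook: asks the monitor whether `action` is enabled.
   If so, the specification takes the transition and `perform!` runs;
   otherwise an exception is thrown and `perform!` never runs.
   Assumption: `perform!` does not block while the lock is held."
  [monitor action perform!]
  (locking monitor
    (let [state      @(:state monitor)
          next-state (get (:transitions monitor) [state action])]
      (when (nil? next-state)
        (throw (ex-info "Action is not enabled in current state"
                        {:action action :state state})))
      (reset! (:state monitor) next-state)
      (perform!))))

;; Usage: a two-step specification; the second attempt repeats step one.
(def spec {[0 [:send :alice :bob]] 1
           [1 [:send :bob :alice]] 2})
(def m   (make-monitor spec 0))
(def log (atom []))

(checked-action! m [:send :alice :bob] #(swap! log conj :alice->bob))

(def rejected?
  (try
    (checked-action! m [:send :alice :bob] #(swap! log conj :again))
    false
    (catch clojure.lang.ExceptionInfo _ true)))
```

In this sketch, the second attempt is rejected because state 1 has no transition for it, so the log records only the first action and the monitor stays in state 1.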
The discourje library facilitates writing specifications, adding monitors, and adding instrumentation to implementations written in Clojure. To make discourje easy and non-invasive to start using, and inspired by recent editions of the annual Clojure survey [13,14] (respondents indicate that "ease of development" is one of Clojure's most important strengths; more so than "runtime performance"), we emphasise ergonomics in discourje's development:
• We leverage Clojure's macro system to offer the specification language for protocols as an embedded domain-specific language (DSL). As a result, the programmer can write both specifications and implementations in similar notation, using the same editor (no external tools needed), towards a seamless specification-implementation experience. Monitors can subsequently be added with simple function calls.
• To add instrumentation, the only things the programmer needs to change in an existing implementation are: (1) to load discourje.core.async instead of standard library clojure.core.async for channels; (2) to add a bit of configuration data when channels are created. This means, in particular, that the programmer does not need to write an implementation with discourje in mind: instrumentation can straightforwardly be added afterwards.
• The following main functions and macros from clojure.core.async are currently supported: thread (new thread), chan (new channel), close! (closing), >!! (send), <!! (receive), and alts!! (selection).
Fig. 1 Traditional MPST [16,17]
Already when clojure.core.async was introduced in 2013 [15], it was suggested that "certain kinds of automated correctness analysis" are possible, but at the time, "no work [had] been done on that front". To our knowledge, Discourje is the first project that addresses this open problem.

From a researcher's perspective
The Discourje project was originally conceived to explore a new direction in research on multiparty session types (MPST): since the early achievements [16,17], while substantial progress had been made both in MPST theory (e.g. extensions with time [18,19], security [20][21][22][23], parametrisation [7,24,25]) and in MPST practice (e.g. tools for F# [26], Go [7], Java [27,28], Scala [29]), nearly all efforts had targeted the domain of statically typed languages and distributed systems. By targeting the domain of dynamically typed languages and shared-memory concurrent programs instead, the Discourje project set out to enter uncharted waters. In particular, the main research question that has been driving the project from the start has been how to take advantage of the unique properties of the target domain to deliver "better" (by some definition) tools. As a result, the "Discourje approach" has diverged considerably from the "traditional MPST approach".
To explain the two fundamental differences in more detail, first, Fig. 1 visualises the traditional MPST approach. It works as follows:
1. Initially, the programmer manually writes a "global" specification S_glob; it prescribes the communication behaviour of all roles, collectively, from a shared perspective (e.g. "first, a number is communicated from Alice to Bob; next, a Boolean is communicated from Bob to Carol or Dave").
2. Subsequently, an MPST tool automatically decomposes S_glob into role-specific "local" specifications S_loc_1, S_loc_2, ..., S_loc_n; every S_loc_i prescribes the communication behaviour of one role, individually, from its own perspective (e.g. for Bob: "first, receive a number from Alice; next, send a Boolean to Carol or Dave").
3. Finally, an MPST tool automatically verifies every thread I_i in the implementation against S_loc_i by means of static type-checking (in the style of behavioural type systems [30,31]).
Now, MPST theory guarantees that well-typedness at compile time implies safety at run time.
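To make the decomposition step concrete, the following sketch projects a global specification onto one role. It is a deliberate simplification and not how any MPST tool works: the data representation (a vector of [type sender receiver] steps) and the function name project are hypothetical, and only straight-line sequences are handled. Choices such as "to Carol or Dave" are left out, and they are exactly where behaviour-preserving decomposition becomes difficult.

```clojure
;; Hypothetical representation, chosen for illustration only:
;; a global specification is a vector of [type sender receiver] steps.
(def global-spec
  [[:number  :alice :bob]
   [:boolean :bob   :carol]])

(defn project
  "Returns the local specification of `role`: the steps of `global`
   in which `role` participates, seen from its own perspective."
  [global role]
  (vec (keep (fn [[t sender receiver]]
               (cond
                 (= role sender)   [:send t receiver]
                 (= role receiver) [:receive t sender]))
             global)))

(def bob-local   (project global-spec :bob))
(def carol-local (project global-spec :carol))
(def dave-local  (project global-spec :dave))
```

Here bob-local is [[:receive :number :alice] [:send :boolean :carol]], mirroring the "for Bob" example in the text, while a role that does not occur in the global specification gets an empty local specification.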
In contrast, Fig. 2 visualises the Discourje approach. It fundamentally differs from the traditional MPST approach on two accounts:
• In the traditional MPST approach, to fit established programming techniques, practices, and culture of statically typed languages, compile-time verification has been a non-negotiable requirement. However, the Discourje approach targets dynamically typed languages, which are technically, practically, and culturally different. As a result, the Discourje approach uses run-time verification instead of compile-time verification.
• In the traditional MPST approach, to fit established programming practices for distributed systems, decentralised verification (i.e. type-checking against local specifications on a per-role basis) has been a non-negotiable requirement. However, the Discourje approach targets shared-memory concurrent programs, without any form of distribution (i.e. all threads are executed on the same machine). As a result, the Discourje approach uses centralised verification without decomposition instead of decentralised verification.
Due to these two fundamental differences, the Discourje approach substantially improves expressiveness by removing two limitations of the traditional MPST approach. The first limitation pertains to compile-time vs. run-time verification: the traditional MPST approach statically rejects ill-typed-but-safe implementations (i.e. it is sound but not complete), whereas the Discourje approach dynamically rejects only unsafe implementations (i.e. it is sound and complete). The second limitation pertains to decentralised vs. centralised verification: the traditional MPST approach relies on decomposition and rejects specifications that cannot be decomposed in a behaviour-preserving way (i.e. many grammatical specifications are unsupported; e.g. [32]), whereas the Discourje approach does not rely on decomposition (i.e. all grammatical specifications are supported).
Besides these two fundamental differences, the following strengths of the traditional MPST approach are retained:
• Fully automated verification of concrete programs (vs. abstract models);
• User-friendly programming language-based notation to write specifications (vs. dynamic logic or temporal logic).

This paper
The aim of this paper is to provide a comprehensive overview of the current state of the Discourje project. In Sect. 2, we present a few preliminaries on Clojure. In Sect. 3, we demonstrate the usage of the discourje library in a number of case studies. In Sect. 4, we present the theoretical foundations on which discourje is built. In Sect. 5, we discuss practical aspects, including details of discourje's internals and results of performance experiments. This paper substantially extends our TACAS 2020 paper [33] with material from our ISoLA 2020 paper [34] (notably: case studies and new features) and our ESEC/FSE 2021 paper [35] (notably: a built-in model checker for specifications). To improve the presentation, the new material is integrated throughout the paper instead of isolated in separate new sections.

Preliminaries on Clojure
Clojure [36][37][38] is a dynamically typed, impure functional language that compiles to Java bytecode and runs on the JVM. It is a dialect of Lisp and has a powerful macro system. In the 2019 edition of the Stack Overflow Developer Survey [39], Clojure was the 7th most loved language, outranking languages including Go, C#, Scala, Java, C++, and C.
Channel-based programming abstractions are offered in Clojure through standard library clojure.core.async [15]. It has both unbuffered and buffered channels. In the absence of a buffer, both sends and receives are blocking until a reciprocal channel action is performed on the other end of the channel. In the presence of a bounded, n-capacity, order-preserving buffer, sends are blocking until the buffer is non-full (next, a value is enqueued to the back of the buffer), while receives are blocking until the buffer is non-empty (next, a value is dequeued from the front of the buffer).
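The difference between the two kinds of channels can be observed with a short experiment, sketched below. It assumes that the org.clojure/core.async dependency is on the classpath. To avoid blocking the main thread, it uses the non-blocking variants offer! and poll! from clojure.core.async, which are not among the functions discussed in this paper; they try a send or receive and return immediately.

```clojure
(require '[clojure.core.async :as a])

;; Unbuffered channel: a send completes only when a receive is ready.
(def unbuffered (a/chan))

;; offer! is a non-blocking send; it fails here, as no receiver is waiting.
(def unbuffered-offer (a/offer! unbuffered :x))

;; A blocking send in one thread meets a blocking receive in another.
(a/thread (a/>!! unbuffered :hello))
(def received (a/<!! unbuffered))

;; Buffered channel of capacity 1: one send succeeds without a receiver,
;; but a second send has to wait until the buffer is non-full.
(def buffered (a/chan 1))
(def first-offer  (a/offer! buffered :a))
(def second-offer (a/offer! buffered :b))

;; poll! is a non-blocking receive; it dequeues from the front of the buffer.
(def dequeued (a/poll! buffered))
```

In this sketch, unbuffered-offer and second-offer are nil (the corresponding blocking sends would have had to wait), first-offer is true, received is :hello, and dequeued is :a.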
For reference, Fig. 3 summarises the main Clojure functions and macros relevant to this paper; we clarify their usage in the next sections, by example.

A tour of discourje
To demonstrate the usage of the discourje library, we take a 4-stop tour. The first stop (Sect. 3.1) presents the intended workflow of discourje. The remaining three stops (Sects. 3.2-3.4) present three Clojure programs that we can specify and verify using discourje, each of which simulates a game and requires unique features (i.e. Tic-Tac-Toe, Rock-Paper-Scissors, and Go Fish). In each of these case studies, the safety property that discourje ensures is that the players (i.e. threads) never violate the "interaction rules" of the game (e.g. proper turn-taking), as stated in the specifications. We note that discourje does not check full functional correctness (e.g. it ensures that players properly take turns to make moves, but it does not ensure that every move is valid in the current game state).
As a notational convention, in the rest of this paper, the main Clojure functions and macros are typeset in blue font, while the main discourje functions and macros are typeset in red font.

The workflow
Figure 4 summarises the intended workflow of discourje:
• First, the programmer writes a specification S using discourje and, possibly independently, an implementation I in Clojure.
• Next, the programmer runs I with S: during the run, a channel action in I happens if, and only if, a corresponding transition happens in S (Sect. 1.1.1).
• When an unsafe channel action is attempted, an exception is thrown.
• Next, the programmer diagnoses the problem: if it is "clearly" a bug in I, then they can fix I; else, they can analyse S using a built-in model checker for S. In the latter case, the aim is to rule out bugs in S, so the programmer can more confidently focus their attention on fixing I (even if the problem is not "clearly" a bug in I, it can still be one, especially with concurrency). The built-in model checker supports both generic sanity checks and protocol-specific temporal requirements.
To illustrate the workflow, we consider a classical example from the MPST literature, namely the Two-Buyer program: "Buyer1 and Buyer2 wish to buy an expensive book from Seller by combining their money. Buyer1 sends the title of the book to Seller, Seller sends to both Buyer1 and Buyer2 its quote, Buyer1 tells Buyer2 how much she can pay, and Buyer2 either accepts the quote or rejects the quote by notifying Seller" [40].
First, we write the specification in Fig. 5. Lines 1-3 specify the roles, identified by :buyer1, :buyer2, and :seller, while lines 4-12 specify the protocol, identified by :two-buyer. In general, (--> t p q) specifies a communication of a value of type t through the unbuffered channel from p to q; (close p q) specifies closing of the channel from p to q; (cat S_1 ... S_n) and (par S_1 ... S_n) specify concatenation (i.e. sequential composition) and interleaving (i.e. parallel composition). Additional features will be presented in the next subsections. Because discourje is built on top of Clojure/Java, we can also use a few Clojure/Java features to write specifications (e.g. "colon-prefixed" identifiers from Clojure and data types from Java).
Thus: lines 5-9 specify communications of a String (book) from :buyer1 to :seller, an Integer (quote) from :seller to :buyer1 and :buyer2, an Integer (contribution) from :buyer1 to :buyer2, and a Boolean (accept/reject) from :buyer2 to :seller; lines 10-12 specify closings of all channels, in no particular order.
Next, we write the implementation in Fig. 6. Lines 1-3 implement the channels, while lines 4-24 implement the threads; the Clojure functions and macros used are summarised in Fig. 3. In this implementation, the quote of :seller is 19 (variable x at :buyer1 and :buyer2), the contribution of :buyer1 is half of the quote (variable y at :buyer2), and the decision of :buyer2 is to reject (variable z).
Next, we run the implementation with the specification. To do this, we first need to add the lines in Fig. 7 between lines 3 and 4 in Fig. 6. That is, we create a monitor for the specification in Fig. 5 and link it to every channel, along with the intended sender and the intended receiver. Furthermore, we need to load discourje.core.async instead of clojure.core.async. Besides these small changes, no other changes are needed: notably, the code for :buyer1, :buyer2, and :seller in Fig. 6 stays exactly the same. This demonstrates that discourje is non-invasive to start using.
Next, we observe an exception:
? !(19/2,buyer1,buyer2)
is not enabled in current state(s): [3].
LTS in Aldebaran format:
des (0,4,5)
(0,"? !(String,buyer1,seller)",1)
(1,"? !(Integer,seller,buyer1)",2)
(2,"? !(Integer,seller,buyer2)",3)
(3,"? !(Integer,buyer1,buyer2)",4)
*** state 4 not yet expanded ***
The first two lines report that the implementation of :buyer1 attempts to send value 19/2 to :buyer2, but that this is not allowed in the specification's current state 3. The remaining lines show the relevant part of the state space of the specification, as a list of transitions. By matching the unsafe action reported on the first line, ? !(19/2,buyer1,buyer2), against the label of the transition out of current state 3, ? !(Integer,buyer1,buyer2), we can infer that a communication from :buyer1 to :buyer2 is actually allowed, but that the type of the value must be Integer, which 19/2 is not; it is a Ratio value that we forgot to round down. Thus, "clearly", the problem is a bug in the implementation.
Next, we fix the bug by replacing (/ x 2) on line 7 in Fig. 6 with (int (/ x 2)), to round the Ratio down to an Integer.
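The root cause of this first bug is standard Clojure arithmetic, which can be checked at a REPL independently of discourje. The snippet below reuses the variable name x from the text; the other names are hypothetical.

```clojure
;; The quote received by :buyer1, as in the running example.
(def x 19)

;; Dividing two integers that do not divide evenly yields an exact Ratio.
(def unrounded (/ x 2))

;; Coercing with int truncates the Ratio towards zero.
(def rounded (int (/ x 2)))

(println unrounded (ratio? unrounded))   ; prints: 19/2 true
(println rounded (integer? rounded))     ; prints: 9 true
```

The value 19/2 is therefore neither an Integer nor any other integer type, which is consistent with the monitor rejecting it against a transition labelled with Integer.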
Next, we re-run the implementation with the specification.
Next, we observe another exception:
C(buyer1,buyer2)
is not enabled in current state(s): [4].
LTS in Aldebaran format:
des (0,5,6)
(0,"? !(String,buyer1,seller)",1)
(1,"? !(Integer,seller,buyer1)",2)
(2,"? !(Integer,seller,buyer2)",3)
(3,"? !(Integer,buyer1,buyer2)",4)
(4,"? !(Boolean,buyer2,seller)",5)
*** state 5 not yet expanded ***
By matching the unsafe action reported on the first line, C(buyer1,buyer2), against the label of the transition out of current state 4, ? !(Boolean,buyer2,seller), we can infer that the implementation of :buyer1 attempts to close its channel to :buyer2, but that the specification allows only a communication from :buyer2 to :seller at this point. Thus, there seems to be a timing issue with :buyer1's closing. This is not "clearly" a bug in the implementation: the specification prescribes all closings to happen at the end (Fig. 5, lines 10-12), and indeed, every thread closes its channels at the end of its run (Fig. 6, lines 9-10, 16-17, 23-24), so what goes wrong?
Next, we check the specification using discourje's built-in model checker, by having it automatically perform seven generic sanity checks: three checks pertain to termination (the protocol must always terminate; it may always terminate; it can never terminate), three checks pertain to closings (if a channel is used, it must be closed; if a channel is closed, it must have been used; if a channel is closed, it cannot be used again), and one check pertains to causality (clarified below).
Next, the model checker reports three issues. The first issue is that the check "it can never terminate" is violated: the specification can terminate. This is intended, so we can immediately ignore it (and disable the check). The second issue is that, apparently, one of the channels can be closed before it is used.
(Figure caption: Causally unrelated actions are strictly ordered)
To help debugging, the model checker provides the witness in Fig. 8 (i.e. a violating sequence of actions). It clarifies that after five communications, the specification allows :buyer2 to close its channel to :buyer1, but actually, that channel is never used. While this is not a bug per se, it is "smelly" (cf. dead code and unused variables).
Next, we remove the closing of the unused channel from :buyer2 to :buyer1 from the specification, and we re-check the specification using the model checker.
Next, only the third issue remains reported: at some point, apparently, two causally unrelated actions are allowed to happen in one order, but not in the other order. This can be problematic, because in the absence of a causal relation between the actions, it is impossible to write an implementation that fulfils one order but not the other, unless "covert interaction" is used (i.e. synchronisation or communication outside the specification).
To help debugging, the model checker provides the witness in Fig. 9. It clarifies that after five communications, the specification allows :buyer1 to close its channel to :buyer2, but it forbids :buyer1 to do so before :buyer2 and :seller have communicated (penultimate action of the witness). However, as a non-participant in that communication, :buyer1 cannot know when :buyer2 and :seller are done (i.e. no causality), so the specification cannot be fulfilled; this is a specification bug.
Next, we fix the bug by observing that the specification is too restrictive: it requires all channels to be closed at the end, but since :buyer1's part in the protocol is already done at line 8 in Fig. 5, the specification should allow :buyer1 to close its channels from that point onwards. We therefore replace lines 9-12 with the following:
 9 (par (--> Boolean :buyer2 :seller)
10      (close :buyer1 :buyer2)
11      (close :buyer1 :seller))
12 (par (close :buyer2 :buyer1)
13      (close :buyer2 :seller)
14      (close :seller :buyer1)
15      (close :seller :buyer2))))
(The closing of the unused channel from :buyer2 to :buyer1 was removed in a previous step.) Thus, by judiciously introducing a new par-block, the specification now allows :buyer1 to close its channels in parallel to the communication from :buyer2 to :seller.
Next, we re-check the specification using the model checker.
Next, another causality issue is reported. The last two actions of the witness are C(buyer1,seller) and C(buyer2,seller). Thus, the specification allows :buyer1 and :buyer2 to close their channels to :seller in that order, but not in the reverse order; since :buyer2 cannot know when :buyer1 is done, this is a specification bug.
Next, no more issues are reported. Next, we re-run the implementation with the specification.
At last, no more exceptions are reported. Thus, in several iterations, we detected and fixed bugs in both the implementation and the specification. We note that lines 9-12 of the final specification reveal intricate timing constraints. On the one hand, as a result, the specification is not easy to write, which may discourage potential users. On the other hand, the intricate timing constraints exist regardless of whether a specification is written; this can make writing the specification, and subsequently enjoying the benefits of run-time verification, all the more valuable. The model checker is, however, important to assist in getting the specification right.
Having demonstrated the intended workflow of discourje, we proceed with three case studies to systematically present the features and expressiveness of the Discourje approach (Sect. 1.1.2, Fig. 2).

Preface
Our first case study is a program that simulates a game of Tic-Tac-Toe. It consists of two threads and two 1-capacity buffered channels through which they communicate. The threads take turns to make plays on thread-local copies of the grid; at the end of its turn, the active thread sends its play to the other thread and becomes passive, while the other thread receives the play, becomes active, updates its copy of the grid accordingly, and makes the next play. This case study demonstrates the following features:
• Specification: roles; asynchronous communication through buffered channels; closings; concatenation (sequential composition); choice; interleaving (parallel composition); role-based parametrisation.
• Implementation: channels; sends; receives; closings.
Specification :ttt-turn represents one turn of r1 (active player) against r2 (passive player). It specifies a concatenation (cat):
1. First, a value of type Long is communicated through a buffered channel from r1 to r2 (-->>; we recall that --> is used to specify unbuffered communications). The idea is that r1 sends its play this turn to r2.
2. Next, there is a choice (alt):
(a) Either, there is another instance of :ttt-turn, but now with r2 as active player and r1 as passive player. The idea is that r1 did not win or draw this turn, so the game continues.
(b) Or, channels are closed (close), in parallel (par). The idea is that r1 did win or draw this turn, so the game ends. The closings may happen in any order; this is important, as neither one of the closings is causally related to the other (i.e. in the implementation, covert interaction would be needed to order them).
Specification :ttt represents the whole game. It specifies a choice between either an initial instance of :ttt-turn with actual parameters :alice and :bob, or :bob and :alice, depending on who takes the first turn. Thus, at the specification level, it is undecided who goes first (implementation detail).
Since concatenation, choice, and recursion are supported in discourje, any regular expression (over communications and closes) can be written. However, for convenience, shorthands are available for the following patterns: 0-or-more repetitions (*), 1-or-more (+), and 0-or-1 (?). Thus, the programmer never needs to use explicit recursion to write regular expressions.

Implementation
An implementation of the Tic-Tac-Toe program is shown in Fig. 11. Lines 1-9 define constants (blank, cross, nought, initial-grid) and functions (get-blank, put, not-final?) to represent Tic-Tac-Toe concepts. Lines 11-12 define buffered channels of capacity 1 (a->b and b->a) that implement the network through which the threads communicate. Lines 14-24 and 25-35 define threads that implement roles :alice and :bob. Both threads execute a loop, starting with a blank initial grid. In each iteration, :alice first gets the index of a blank space on the grid, then plays a cross in that space, then sends a value to :bob to communicate the index, then awaits a value from :bob, and then updates the grid accordingly; :bob acts symmetrically. After every grid update, :alice or :bob checks if it has reached a final grid; if so, the loop is exited and channels are closed.
A monitor and instrumentation can be added to the implementation in the same way as shown in Fig. 7. Interestingly, in this case study, the implementation is actually unsafe relative to the specification: the specification states that channels are allowed to be closed only after (the receive of) the previous communication is done, but in the implementation, :alice or :bob can attempt to close before that. There are several ways to fix this bug. One solution is to use unbuffered channels instead of buffered ones. Another solution is to mix channels with a synchronisation barrier from Java's standard library java.util.concurrent (readily usable in Clojure), to let :alice and :bob first await each other and then close (i.e. covert interaction). The next case study further demonstrates the latter idea.
Footnote 4 (continued): ... operation on an old data structure leaves it unmodified and, instead, returns a new data structure. In concurrent programs, including Tic-Tac-Toe, persistent data structures can be used as thread-local copies of data, but modifications need to be explicitly communicated. Persistence also means that data races cannot happen: if threads communicate only persistent data structures, freedom from data races is guaranteed.
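The barrier-based fix mentioned above ("first await each other and then close") can be sketched with java.util.concurrent.CyclicBarrier. The sketch is not the code of Fig. 11: channel actions are replaced by entries in an event log, and the names events and start-player are hypothetical. It only shows the ordering guarantee that the barrier provides.

```clojure
(import 'java.util.concurrent.CyclicBarrier)

;; A barrier for two parties: await returns only after both have arrived.
(def barrier (CyclicBarrier. 2))

;; Event log standing in for real channel actions (illustration only).
(def events (atom []))

(defn start-player
  "Starts a thread that finishes its last action, awaits the other
   player at the barrier, and only then 'closes' its channel."
  [id]
  (doto (Thread.
          ^Runnable
          (fn []
            (swap! events conj [id :last-action-done])
            (.await ^CyclicBarrier barrier)
            (swap! events conj [id :closed])))
    (.start)))

(def threads (mapv start-player [:alice :bob]))

;; Wait for both threads to finish before inspecting the log.
(run! #(.join ^Thread %) threads)
```

Whichever thread is scheduled first, both :last-action-done events precede both :closed events in the log, because no thread passes the barrier before the other has arrived. This is the kind of ordering that the specification requires, obtained through interaction that is invisible at the level of channels.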

Rock-Paper-Scissors
Our second case study is a program that simulates a game of Rock-Paper-Scissors. The program consists of k threads and k² − k directed channels, one from every thread to every other thread. In every round, every thread chooses an item (rock, paper, or scissors) and sends it to every other thread; then, when all items have been received, every thread determines if it goes to the next round. This case study demonstrates the following features:
• Specification: indexed roles; synchronous communication through unbuffered channels; conditional choice; local bindings; existential and unordered-universal quantification; index-based parameters; set operations; implicit non-determinism.
• Implementation: selection; covert interaction (synchronisation barrier).

Specification
A specification of Rock-Paper-Scissors is shown in Fig. 12; auxiliary discourje functions are typeset in a distinct font. Line 1 specifies one role, identified by :player. Lines 3-16 specify two protocols, identified by :rps (one formal parameter for role indices) and :rps-round (two formal parameters). There are two key differences with Fig. 10 in Sect. 3.2:
• Whereas roles :alice and :bob in Tic-Tac-Toe are each enacted by a single thread, role :player in Rock-Paper-Scissors is enacted by multiple threads. To distinguish between different threads that enact the same role, roles can be indexed. For instance, with 0-based indexing, (:player 5) represents the thread that implements the sixth player.
• Whereas formal parameters of specification :ttt-turn in Tic-Tac-Toe range over roles, those of specifications :rps and :rps-round range over (sets of) role indices.
Specification :rps-round represents one round of the game; threads indexed by elements in set ids are still in, while threads indexed by elements in set co-ids are already out. When at least two threads are still in (if), :rps-round specifies a concatenation:
1. First, there is an unordered-universal quantification (par-every) of local variable i over domain ids, and simultaneously, local variable j over domain "ids without i" (disj). In general, an unordered-universal quantification gives rise to a "big parallel" of branches, each of which is formed by binding values in the domains to local variables (cf. parallel for-loops). In this particular example, every such branch specifies a communication of a value of type String through an unbuffered channel from (:player i) to (:player j) (-->). The idea is that every (:player i) sends its chosen item to every other in-game (:player j), in no particular order (implementation detail).
2. Next, there is an existential quantification (alt-every) of local variable winner-ids over domain "set of subsets of ids" (power-set). Similar to unordered-universal quantification, in general, existential quantification gives rise to a "big choice" of branches. In this particular example, every such branch specifies a binding (let) of local variable loser-ids to "ids without winner-ids" (difference), after which:
• There is another instance of :rps-round, but now with only winner-ids retained from ids, and with loser-ids added to co-ids (union). The idea is that only every (:player i) that is a winner this round goes to the next round.
• Concurrently, there is an unordered-universal quantification of i over loser-ids, and simultaneously, j over "all indices except i". Every branch of this "big parallel" specifies the closing of the channel from (:player i) to (:player j). The idea is that every (:player i) that is a loser this round closes its channel to every other (:player j).
Thus, the idea of the existential quantification is, for every possible subset of winners, that the winners stay in the game, while the losers go out. We note that the usage of existential quantification in this way makes the specification implicitly nondeterministic: different branches may start with the exact same (sequence of) channel action(s), until a "distinguishing" channel action happens. This requires nontrivial bookkeeping to support.
Specification :rps represents the whole game. It specifies an initial instance of :rps-round, when all threads are in, and no threads are out (empty-set).
In addition to existential quantification and unordered-universal quantification, there is also support for ordered-universal quantification (cat-every): similar to the former two, the latter one gives rise to a "big concatenation" of branches (cf. sequential for-loops). We also note that the syntax and semantics of the functions for operations on sets are the same as those in standard library clojure.set, to make discourje easy to learn.
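To make the size of the "big parallel" and the "big choice" concrete, the following minimal sketch enumerates their branches for a given set of indices, using only clojure.set. The helpers power-set, send-branches, and outcome-branches are hypothetical names introduced for illustration; they are not the discourje macros, and the power-set function of discourje may be defined differently.

```clojure
(require '[clojure.set :as set])

(defn power-set
  "All subsets of set s (hypothetical helper, for illustration only)."
  [s]
  (reduce (fn [acc x] (into acc (map #(conj % x) acc)))
          #{#{}}
          s))

(defn send-branches
  "Branches of the 'big parallel': one [i j] pair for every i in ids
  and every j in 'ids without i'."
  [ids]
  (set (for [i ids, j (disj ids i)] [i j])))

(defn outcome-branches
  "Branches of the 'big choice': one winners/losers split per subset of ids."
  [ids]
  (set (for [winner-ids (power-set ids)]
         {:winner-ids winner-ids
          :loser-ids  (set/difference ids winner-ids)})))

(def ids #{0 1 2})

(count (send-branches ids))    ; 6, i.e. 3 * 2 ordered pairs of distinct players
(count (outcome-branches ids)) ; 8, i.e. 2^3 possible sets of winners
```

With three in-game players, one round thus has six communications in the first half and eight candidate outcomes in the second half, which illustrates why the existential quantification makes the specification nondeterministic.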

Implementation
An implementation of the Rock-Paper-Scissors program is shown in Fig. 13; auxiliary discourje functions are typeset in a distinct font; shading indicates external Java calls for covert interaction.
Line 1 defines a constant for the number of threads k. Lines 3-7 define constants and functions to represent Rock-Paper-Scissors concepts. Line 9 defines a collection of $k^2 - k$ unbuffered channels that implement the network, intended to be used as a fully connected mesh; the threads are represented by indices in the range from 0 to k (exclusive). We note that mesh is an auxiliary discourje function to simplify defining collections of channels; just as the other auxiliary discourje functions used in Fig. 13, it works also without adding a monitor or instrumentation. Line 10 defines a reusable synchronisation barrier, imported from standard library java.util.concurrent, leveraging Clojure's interoperability with Java; shortly, we clarify the need for this.

Fig. 12 Specification of the Rock-Paper-Scissors program
Lines 12-30 define k copies of a thread that implements role :player. Every such thread executes two parametrised loops: an outer one, each of whose iterations comprises a round, and an inner one, each of whose iterations comprises a channel action. We clarify the following aspects:
• According to the specification (Fig. 12), in the first half of every round (lines 8-10), the items that are chosen by in-game threads are communicated among them. This is potentially problematic: as channels are unbuffered, sends and receives are blocking until reciprocal channel actions are performed, so unless threads collectively agree on a global order to perform reciprocal channel actions, deadlocks may occur. However, such global orders are hard to get right and brittle to maintain. An alternative solution is to use selections: in general, a selection consumes a list of channel actions as input, then blocks until one of those actions becomes enabled, then performs that action, then unblocks, and then produces that action's output as output. Thus, a selection performs one channel action from a list, depending on its enabledness at run time.
In this particular example, instead of performing globally ordered reciprocal sends and receives, every thread performs a series of selections (alts!!) in the inner loop (Fig. 13, lines 17-24). Initially, the list of channel actions consists of all sends (puts) and receives (takes) that a thread needs to perform in a round. When a selection finishes, the channel action that was performed is removed from the list, and the inner loop continues. Because every thread behaves in this way, reciprocal channel actions will always be enabled.
• According to the specification (Fig. 12), there is a strict order between the first half of every round (lines 8-10) and the second half (lines 11-16): all channel actions that belong to the first half need to have happened before proceeding to the second half. This is potentially problematic: additional synchronisation is needed to ensure that "fast threads" (those that perform their channel actions early) wait for "slow threads" to catch up. To solve this, in this case study, we mix channels with a synchronisation barrier from java.util.concurrent (shaded code in Fig. 13). This demonstrates that channel-based programming abstractions (verified using discourje) can be mixed seamlessly with other concurrency libraries (not verified), which is common practice [6,41].
A monitor and instrumentation can be added to the implementation in the same way as shown in Fig. 7. In this case study, the implementation is safe relative to the specification.
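The inner loop of selections described above can be sketched as follows, using plain clojure.core.async without a monitor. This is a simplified illustration under our own assumptions, not the code of Fig. 13: the function exchange! and the two-player setup are hypothetical, and the synchronisation barrier between the two halves of a round is left out.

```clojure
(require '[clojure.core.async :as a])

(defn exchange!
  "Performs all pending sends and receives, one alts!! selection at a time.
  `puts` maps every outgoing channel to the value to send through it;
  `takes` is the set of incoming channels.
  Returns a map from incoming channel to received value."
  [puts takes]
  (loop [puts puts, takes takes, received {}]
    (if (and (empty? puts) (empty? takes))
      received
      (let [actions (concat (map (fn [[ch v]] [ch v]) puts) takes)
            ;; alts!! blocks until one action is enabled, then performs it
            [v ch]  (a/alts!! (vec actions))]
        (if (contains? takes ch)
          ;; a receive was performed: remove it from the list, keep the value
          (recur puts (disj takes ch) (assoc received ch v))
          ;; a send was performed: remove it from the list
          (recur (dissoc puts ch) takes received))))))

(defn two-player-round
  "Two threads exchange their items over unbuffered channels in no fixed order."
  []
  (let [a->b (a/chan)
        b->a (a/chan)
        ra   (a/thread (exchange! {a->b "rock"}  #{b->a}))
        rb   (a/thread (exchange! {b->a "paper"} #{a->b}))]
    [(get (a/<!! ra) b->a) (get (a/<!! rb) a->b)]))

(two-player-round) ; evaluates to ["paper" "rock"]
```

Neither thread commits to sending first or receiving first, so no global order needs to be agreed upon; whichever reciprocal pair of actions becomes enabled first is performed first.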

Go fish
Our third case study is a program that simulates a game of Go Fish (footnote 6). The Go Fish program consists of k+1 threads (players, plus dealer), and $k^2 + k$ channels from every thread to every other thread; unlike the Rock-Paper-Scissors program, however, all interactions among threads happen through channels (no covert interaction). This example demonstrates the following features:
• Specification: user-defined data types; repetition; ordered-universal quantification; explicit nondeterminism.
• Implementation: data type-based control flow.

Footnote 6: Go Fish is a multiplayer game played with a standard 52-card deck. A dealer shuffles the deck and deals an initial hand to every player. Next, players take turns to collect groups of cards of the same rank. Every turn, the active player asks a passive player for a card. If the asked player has it, the asking player gets it and takes another turn; if not, the asked player tells the asking player ("go"), the asking player gets a card from the dealer ("fish"), and the turn is passed to the asked player. The first player to hold only complete groups wins. (This version of Go Fish is due to Parlett [42].)

Specification
A specification of Go Fish is shown in Fig. 14. Line 1 defines two roles, identified by :dealer (enacted by a single thread) and :player (multiple threads). Lines 3-29 define two protocols, identified by :gf and :gf-turn. Lines 30-35 define six user-defined data types.
Specification :gf-turn represents one turn of (:player i). It specifies a "big choice". In every branch, the idea is as follows. First, (:player i) asks (:player j) for some card. Next, there is a choice:
1. (:player j) replies with the card that it was asked for, which happens to be the last card that (:player i) needed (to complete its last group), so it informs :dealer, and the game ends.
2. Or, (:player j) replies with the card that it was asked for, which does not happen to be the last card that (:player i) needed, so (:player i) takes another turn, and the game continues. We note that the specification is explicitly nondeterministic: the first branch and the second branch both start with the same channel action.
3. Or, (:player j) does not reply with the card that it was asked for, so (:player i) tries to "fish" a card from :dealer, after which (:player i) passes the turn to (:player j), and the game continues.
Specification :gf represents the whole game. It specifies a concatenation:
1. First, there is a "big parallel". The idea is that :dealer deals every player an initial hand of five cards, in no particular order (implementation detail).
2. Next, there is a "big choice". The idea is that :dealer passes the first turn to one of the players (implementation detail). During the game, the players pass the turn among themselves without involving :dealer.
3. Next, there is a "big parallel". The idea is that the game has ended at this point, so :dealer closes its channel to every (:player i), in no particular order (implementation detail), after which every (:player i) sends its hand back to :dealer through the oppositely directed channel, closes that channel, and closes its channel to every other (:player j), in no particular order (implementation detail).

Implementation
An implementation of Go Fish is shown in Fig. 15 (excerpt; many details are left out to save space). To demonstrate that discourje supports data type-based control flow, Fig. 15 shows fragments of code where values are received (directly with <!! and indirectly with alts!!) by threads that enact role :player. Specifically:
• On line 3, alts!! is used to receive a value v from another :player or from :dealer. This value is either of type Turn/Ask (received from another :player), or nil ("received" from :dealer). We note that a "receive" of nil happens only, and automatically, when the channel from :dealer to (:player i) is closed. Such a degenerate "receive" is used by (:player i) to detect that the game has ended.
• On line 5, <!! is used to receive a value of type Card or Go from (:player j), to which a value of type Ask must have been sent previously (not shown).
A monitor and instrumentation can be added to the implementation in the same way as shown in Fig. 7. In this case study, the implementation is safe relative to the specification.
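The two receive idioms described above can be illustrated with a small sketch in plain clojure.core.async. The records Ask, Card, and Go below are hypothetical stand-ins for the user-defined data types of Fig. 14 (whose actual definitions are not reproduced here), and classify and demo are names introduced only for this illustration.

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical stand-ins for user-defined data types
(defrecord Ask [rank])
(defrecord Card [rank])
(defrecord Go [])

(defn classify
  "Data type-based control flow on a received value."
  [v]
  (cond
    (nil? v)           :game-over   ; degenerate receive: the channel was closed
    (instance? Ask v)  :asked
    (instance? Card v) :got-card
    (instance? Go v)   :go-fish
    :else              :unexpected))

(defn demo
  "Receives once from a player and once from a closed dealer channel."
  []
  (let [from-dealer (a/chan)
        from-player (a/chan 1)]
    (a/>!! from-player (->Ask :queen))
    (let [[v1 _] (a/alts!! [from-player from-dealer])]
      (a/close! from-dealer)
      (let [[v2 ch] (a/alts!! [from-dealer])]
        [(classify v1) (classify v2) (= ch from-dealer)]))))

(demo) ; evaluates to [:asked :game-over true]
```

The second selection returns nil because the dealer's channel has been closed, which is how a player can detect the end of the game without a dedicated message.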

Theory of discourje
The discourje library is built on a formal foundation, inspired by process algebra (e.g. [43]) and multiparty session types (e.g. [16,17]). This underlying theory consists of a calculus of specifications (Sect. 4.1), a calculus of implementations (Sect. 4.2), and a simulation relation (Sect. 4.3). The aim of this section is to explain the general idea without excessive notation; in the interest of clarity, we therefore focus on the basic fragments of discourje and Clojure. These fragments consist of channel actions (sends, receives, closings, and selects), choice, and concatenation.

Specification calculus
Let R denote the set of roles, ranged over by p, q, r. Let T = {Bool, Nat, ...} denote the set of types, ranged over by t. Let $\mathbb{S}$ denote the set of specifications, ranged over by $S$; it is induced by the following grammar:

$$S ::= p \to q{:}t \;\mid\; p \twoheadrightarrow q{:}t \;\mid\; pq\bullet \;\mid\; S_1 + S_2 \;\mid\; S_1 \cdot S_2 \;\mid\; S_1 \parallel S_2 \;\mid\; \boxed{1} \;\mid\; \boxed{pq?t}$$

Term $p \to q{:}t$ specifies a synchronous communication of a value of type t through an unbuffered channel from p to q. Term $p \twoheadrightarrow q{:}t$ specifies an asynchronous communication of a value of type t through a buffered channel from p to q; the capacity of the buffer is left unspecified (implementation detail). Term $pq\bullet$ specifies a closing of a channel from p to q. Terms $S_1 + S_2$, $S_1 \cdot S_2$, and $S_1 \parallel S_2$ specify a choice (i.e. alternative composition), a concatenation (i.e. sequential composition), and an interleaving (i.e. parallel composition) of $S_1$ and $S_2$. The "boxed" terms (i.e. 1 and pq?t; the boxes are not part of the grammar) are auxiliary in the sense that they are used only to define the operational semantics below; there are no corresponding discourje macros. Term 1 specifies a skip; it can only terminate. Term pq?t specifies the asynchronous receive of a value of type t through a buffered channel from p to q (when a send has already happened).
To formally define the operational semantics of specifications, let $\Sigma$ denote the set of type-level actions, ranged over by $\sigma$; it is induced by the following grammar:

$$\sigma ::= pq!?t \;\mid\; pq!t \;\mid\; pq?t \;\mid\; pq\bullet$$

Term pq!?t specifies a synchronous send and receive of a value of type t through an unbuffered channel from p to q. Terms pq!t and pq?t specify an asynchronous send and receive of a value of type t through a buffered channel from p to q. Term pq• specifies a closing of a channel from p to q.
The operational semantics of specifications is formally defined in terms of a termination predicate and a labelled reduction relation, denoted by ↓ and →; they are induced by the rules in Fig. 16. The rules are standard in process algebra (e.g. [44]). We note that rule [S→-Buf] induces two reductions.
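As an illustration of a termination predicate and a labelled reduction relation, the following sketch implements both for the fragment with skip, synchronous communication, closing, choice, and concatenation. It assumes the textbook process-algebra rules; the rules of Fig. 16 are not reproduced here and may differ in detail (in particular, buffered communication and interleaving are left out). The vector-based encoding of specifications and the names terminated? and steps are hypothetical.

```clojure
;; Hypothetical encoding of specifications as data:
;;   [:skip]          skip (can only terminate)
;;   [:sync p q t]    synchronous communication of a value of type t from p to q
;;   [:close p q]     closing of the channel from p to q
;;   [:alt s1 s2]     choice
;;   [:cat s1 s2]     concatenation

(defn terminated?
  "Termination predicate: can this specification terminate right now?"
  [[op s1 s2]]
  (case op
    :skip true
    :alt  (or (terminated? s1) (terminated? s2))
    :cat  (and (terminated? s1) (terminated? s2))
    false))

(defn steps
  "Labelled reductions: the set of [action successor] pairs of a specification."
  [[op a b c]]
  (case op
    :skip  #{}
    :sync  #{[[:sync a b c] [:skip]]}
    :close #{[[:close a b] [:skip]]}
    :alt   (into (steps a) (steps b))
    ;; the left operand reduces first; the right operand may start reducing
    ;; only when the left operand can terminate
    :cat   (into (set (for [[act a'] (steps a)] [act [:cat a' b]]))
                 (when (terminated? a) (steps b)))))

(def spec
  [:cat [:sync :alice :bob 'Long]
        [:alt [:sync :bob :alice 'Long] [:close :bob :alice]]])

(steps spec)
;; one reduction, labelled [:sync :alice :bob Long]; its successor then offers
;; the choice between the reply and the closing
```

Repeatedly applying steps from the initial specification enumerates the reachable states, which is the idea behind the state machines discussed next.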
The state machine $[\![S]\!]$ of specification $S$ is a triple $(Q, q_0, \Delta)$, where $Q$ is a set of states, $q_0 \in Q$ is the initial state, and $\Delta \subseteq Q \times Q$ is the transition relation. Formally, Q is induced by the following rules:

A path in state machine $M = (Q, q_0, \Delta)$ is a sequence of states $q_1 \cdots q_n$, such that $(q_i, q_{i+1}) \in \Delta$ for every $1 \le i < n$, and such that $(q_n, q) \notin \Delta$ for every $q \in Q$; let paths(M, q) denote the set of all paths in M that start in q. (It suffices to restrict ourselves to finite paths here, as our specification calculus does not feature loops/recursion.)

State machines can be used to model-check specifications for temporal requirements expressed in computation tree logic (CTL) [45]. Let $\Phi$ denote the set of formulas, ranged over by $\varphi$; it is induced by the following grammar:

$$\varphi ::= \sigma \;\mid\; \neg\varphi \;\mid\; \varphi_1 \lor \varphi_2 \;\mid\; \mathsf{AX}(\varphi) \;\mid\; \mathsf{EX}(\varphi) \;\mid\; \mathsf{AU}(\varphi_1, \varphi_2) \;\mid\; \mathsf{EU}(\varphi_1, \varphi_2)$$

Formula $\sigma$ means that $\sigma$ has just happened in the current state. Formulas $\neg\varphi$ and $\varphi_1 \lor \varphi_2$ mean that the negation of $\varphi$ and the disjunction of $\varphi_1$ and $\varphi_2$ are true in the current state. Formula $\mathsf{AX}(\varphi)$ (resp. $\mathsf{EX}(\varphi)$) means that $\varphi$ is true in every (resp. some) next state. Formula $\mathsf{AU}(\varphi_1, \varphi_2)$ (resp. $\mathsf{EU}(\varphi_1, \varphi_2)$) means that $\varphi_1$ is true until $\varphi_2$ is true on every (resp. some) path that starts in the current state. The semantics of formulas is formally defined in terms of an entailment relation, denoted by $\models$; it is induced by the rules in Fig. 17. The rules are standard (e.g. [46]).

To formally define the operational semantics of implementations, we introduce the following auxiliary definitions:

Implementation calculus
• Let I denote the set of value-level actions, ranged over by ι; it is induced by the following grammar: Term pq!?v implements a synchronous send and receive of v through an unbuffered channel from p to q. Terms pq!v and pq?v implement an asynchronous send and receive of v through a buffered channel from p to q. Term pq• implements a closing of a channel from p to q. Term τ implements any other action. We will use value-level actions as reduction labels. For instance, eval((+ 5 6)) = 11. We stipulate that "bogus" expressions are evaluated to err. For instance, eval((+ 5 true)) = err. We will use evaluation to ensure that only values are communicated through channels.
The operational semantics of implementations is formally defined in terms of a labelled reduction relation, denoted by $\to$, over configurations of the form $(I, N)$; it is induced by the rules in Fig. 18.

A run $\to_I$ of implementation $I$ is a subset of $\to$ such that:
• There exist $\iota, I', N'$ such that $(I, \emptyset) \xrightarrow{\iota}_I (I', N')$. That is, the run has a proper initial configuration.
• If $(I', N') \xrightarrow{\iota_1}_I (I_1, N_1)$ and $(I', N') \xrightarrow{\iota_2}_I (I_2, N_2)$, then $\iota_1 = \iota_2$ and $I_1 = I_2$ and $N_1 = N_2$. That is, every configuration in the run has a unique successor.
We note that we do not require runs to be complete, as we also want to verify the safety of partial runs that are not finished yet, but which are safe so far.

Verification
To formally define safety, we introduce the following auxiliary definitions:
• Let $C \to (R \times R) \cup \{\bot\}$ denote the set of instrumentations, ranged over by †. In words, every instrumentation is a partial function from channel identifiers to pairs of roles of the form pq, where p is the intended sender and q is the intended receiver. The idea is that every † establishes links between channel references in an implementation (characterised by their identifiers) and channel references in a specification (characterised by roles).
• Let $:_\dagger \subseteq I \times \Sigma$ denote the †-compliance relation between type-level actions and value-level actions; it is induced by rules with premise "v is of type t". In words, the rules state that an action implementation ι complies with an action specification σ if: (1) the channel identified by c in ι is linked by † to the intended sender and the intended receiver that occur in σ; (2) the value that occurs in ι is of the type that occurs in σ.

Fig. 18 Operational semantics of implementations
Safety ("bad channel actions never happen") is formally defined in terms of weak simulation (e.g. [47]). More precisely, given instrumentation †, a run → I of implementation I is †-safe relative to specification S, if there exists a binary simulation relation such that:

Practice of discourje
In this section, we present two practical aspects of the discourje library. First, we explain the main components and their internals in more detail (Sect. 5.1). Next, we present performance experiments using both microbenchmarks and whole-program benchmarks (Sect. 5.2).

The library
The discourje library consists of three main components, each of which corresponds with an activity in the intended workflow (Fig. 4):
• discourje.core.spec is a sublibrary to write specifications (Sect. 5.1.1).
• discourje.core.lint is a sublibrary to check specifications (Sect. 5.1.2).
• discourje.core.async is a sublibrary to write implementations (Sect. 5.1.3).

Writing specifications: discourje.core.spec
Sublibrary discourje.core.spec consists of macros to write specifications (cf. syntax of the specification calculus; Sect. 4.1); data structures to represent specifications as state machines (cf. operational semantics of the specification calculus); and functions to instantiate these data structures and construct monitors. The idea is visualised in Fig. 19: first, the programmer writes a specification $S$ using the macros; next, at run time, function spec is applied to $S$ to expand and evaluate the macros to a state machine $[\![S]\!]$; next, function monitor is applied to $[\![S]\!]$ to create a monitor.
The monitor provides two operations, depicted as "lollipops" in Fig. 19: verifying if a given channel action ι is allowed in current state q of $[\![S]\!] = (Q, q_0, \Delta)$ (formally: given instrumentation †, check if there exist $\sigma, S'$ such that $(q, (\sigma, S')) \in \Delta$ and $\iota :_\dagger \sigma$), and subsequently updating the current state of $[\![S]\!]$ to a successor. In this way, effectively, the monitor builds a simulation relation to ensure safety (Sect. 4.3), incrementally, as channel actions are performed. We note that operations verify and update happen atomically, using lock-free synchronisation (compare-and-set): an update happens only if both verification succeeded and there has been no update in the meantime. Besides this base functionality, discourje.core.spec also offers the following extensions:
• Non-determinism To support non-deterministic specifications, the monitor maintains a set of possible current states $\{q_1, \ldots, q_n\}$ instead of a single state. To verify if channel action ι is allowed, the monitor iterates over all states in the set to find at least one of them that has a corresponding transition (formally: given instrumentation †, check if there exist $i, \sigma, S'$ such that $(q_i, (\sigma, S')) \in \Delta$ and $\iota :_\dagger \sigma$). If so, to subsequently update the set of current states, the monitor collects all possible successors (formally: $\{(\sigma, S') \mid (q_i, (\sigma, S')) \in \Delta \text{ for some } i \text{, and } \iota :_\dagger \sigma\}$). In this way, essentially, the state machine is determinised using an on-the-fly power set construction.
• Incremental generation Instead of generating the whole state machine for S upfront, the monitor can also generate it incrementally, by need. This is advantageous if only a small portion of the state machine is actually needed.
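The combination of atomic verify-and-update with a set of possible current states can be sketched as follows, using a Clojure atom and compare-and-set!. This is a simplified illustration and not the internals of discourje.core.spec: the state machine is a hypothetical nested map from states to actions to sets of successors, and compliance between actions is approximated by plain equality instead of the †-compliance relation.

```clojure
(defn successors
  "All states reachable by `action` from some state in `states`."
  [machine states action]
  (set (mapcat #(get-in machine [% action]) states)))

(defn make-monitor
  "A monitor holds the machine and an atom with the set of possible current states."
  [machine initial-state]
  {:machine machine, :current (atom #{initial-state})})

(defn verify-and-update!
  "Returns true and advances the monitor if `action` is allowed in at least one
  possible current state; returns false otherwise. If another thread updated
  the monitor in the meantime, the compare-and-set! fails and we retry."
  [{:keys [machine current]} action]
  (loop []
    (let [before @current
          after  (successors machine before action)]
      (cond
        (empty? after)                          false
        (compare-and-set! current before after) true
        :else                                   (recur)))))

;; Two branches start with the same action :ask and are told apart later.
(def machine
  {:q0 {:ask  #{:q1 :q2}}
   :q1 {:card #{:q3}}
   :q2 {:card #{:q4}}
   :q3 {:done #{:end}}
   :q4 {:ask  #{:q1 :q2}}})

(def m (make-monitor machine :q0))
(verify-and-update! m :ask)   ; true, possible current states become #{:q1 :q2}
(verify-and-update! m :card)  ; true, possible current states become #{:q3 :q4}
(verify-and-update! m :card)  ; false, the set of current states is unchanged
(verify-and-update! m :done)  ; true, only the branch via :q3 survives
```

Tracking a set of states rather than a single state is what makes the "distinguishing" action (:done in this example) select the right branch after the fact.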

Checking specifications: discourje.core.lint
Sublibrary discourje.core.lint consists of functions to validate generic sanity checks (Sect. 3.1) and protocol-specific temporal requirements. The core of discourje.core.lint is a custom-built model checker for CTL. The idea is to: first, define intended requirements of a specification $S$ as CTL formulas (Fig. 17); next, compute state machine $[\![S]\!]$; next, invoke a classical CTL model checking algorithm [48]. Besides this base functionality, discourje.core.lint also offers the following extensions:
• Batch mode When asked to batch-check multiple formulas, the model checker reuses the state machine and bookkeeping information across formulas, to avoid double work. Notably, the generic sanity checks are performed in batch mode to improve performance.
• Past-time operators CTL allows the programmer to express requirements in terms of properties of the future. However, in our experience, many requirements are more naturally expressed in terms of properties of the past. For instance: "if a channel is closed, then it must have been used before" (i.e. one of the generic sanity checks). Therefore, discourje.core.lint also supports Past CTL (with branching past) [49].
• Witness generation To use discourje.core.lint effectively for debugging, proper diagnostics must be included when an issue is reported. Therefore, discourje.core.lint can generate witnesses that serve as counterexamples of a CTL formula. As usual for CTL (e.g. [50]), our witness generator works only for the universal fragment of CTL.
• API Using an extra API (in Clojure), custom atomic propositions and temporal patterns can be written to extend the core. We used this feature to write the generic causality check, as it cannot be easily expressed using only the standard atomic propositions.
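To give an impression of what a classical CTL model checking algorithm computes, the following sketch labels the states of a small explicit state machine with the formulas of Sect. 4.1, using fixpoint iteration for the until operators. It is an illustration under our own assumptions and not the model checker of discourje.core.lint: the map-based machine, the vector-based formulas, and the function sat are hypothetical, AX is taken to hold vacuously in states without successors, and AU is taken to require at least one successor in states that do not already satisfy the second argument.

```clojure
(require '[clojure.set :as set])

(defn fixpoint
  "Applies f repeatedly, starting from x, until the result no longer changes."
  [f x]
  (let [y (f x)]
    (if (= x y) x (recur f y))))

(defn sat
  "Set of states of machine m that satisfy the formula."
  [{:keys [states succs label] :as m} [op f g]]
  (let [succ #(get succs % #{})]
    (case op
      :act (set (filter #(= f (label %)) states))   ; f has just happened
      :not (set/difference states (sat m f))
      :or  (set/union (sat m f) (sat m g))
      :EX  (let [t (sat m f)] (set (filter #(some t (succ %)) states)))
      :AX  (let [t (sat m f)] (set (filter #(every? t (succ %)) states)))
      :EU  (let [a (sat m f), b (sat m g)]
             (fixpoint
              (fn [acc]
                (into acc (filter #(and (a %) (some acc (succ %))) states)))
              b))
      :AU  (let [a (sat m f), b (sat m g)]
             (fixpoint
              (fn [acc]
                (into acc (filter #(and (a %) (seq (succ %)) (every? acc (succ %)))
                                  states)))
              b)))))

;; q0 --ask--> q1, then either q1 --card--> q2 or q1 --go--> q3
(def machine
  {:states #{:q0 :q1 :q2 :q3}
   :succs  {:q0 #{:q1}, :q1 #{:q2 :q3}}
   :label  {:q1 :ask, :q2 :card, :q3 :go}})

(def tt [:or [:act :ask] [:not [:act :ask]]]) ; a formula that is always true

(sat machine [:EX [:act :card]])                        ; #{:q1}
(sat machine [:AU tt [:or [:act :card] [:act :go]]])    ; all four states
```

A requirement holds for the specification if the initial state is in the computed set; here, "every ask is eventually answered by a card or a go" holds in :q0.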

Running implementations: discourje.core.async
Sublibrary discourje.core.async consists of functions that serve as proxies for functions and macros thread (new thread), chan (new channel), close! (closing), >!! (send), <!! (receive), and alts!! (select) in clojure.core.async. The idea is visualised in Fig. 20: first, the programmer writes an implementation I; next, at run time, function link is applied to the channels in I to create instrumentation. More precisely, function link associates a channel with an intended sender, intended receiver, and monitor; it is the practical embodiment of function † (Sect. 4.3). We emphasise that no other changes to I are needed: as the signatures of the supported macros and functions in clojure.core.async (listed above) are identical to their proxies in discourje.core.async, adding instrumentation in this way is non-invasive and nearly effortless.
In more detail, the proxies of >!!, <!!, and close! in discourje.core.async work as follows. When one of these functions is invoked, first, it waits until the underlying channel c is ready for the operation: in case of a send or receive through an unbuffered channel, a reciprocal receive or send needs to be pending; in case of a send or receive through a buffered channel, the buffer needs to be non-full or non-empty. Next, at time $t_1$, the monitor linked to c is requested to verify if the attempted send, receive, or closing is allowed. If yes, at time $t_2$, the monitor is requested to update accordingly and the attempted send, receive, or closing actually takes effect (i.e. a value is synchronously exchanged or asynchronously enqueued/dequeued); if no, an exception is thrown. If, between $t_1$ and $t_2$, multiple threads request the monitor to update, only one will succeed; the others need to retry from the start. In this way, safety violations are detected in a way that is both sound (i.e. if an exception is thrown, the violating action really was not allowed) and complete (i.e. if no exception is thrown, all actions were really allowed).
Finally, we note that Java interoperability is supported. That is, to leverage the fact that Clojure compiles to Java bytecode and runs on the JVM, we also wrote a thin Java wrapper around discourje.core.async, so Java programmers can easily use channels and have them monitored from inside their Java programs, regardless of the threading mechanism (e.g. classical Java threads, thread pools, or parallel streams can be used).

Performance experiments
From the outset, we had two intended usage types of the discourje library:
• Usage type A As a testing/debugging tool for concurrent programs in development, to find/diagnose communication-related concurrency bugs.
• Usage type B As a fail-safe mechanism for concurrent programs in production, to prevent propagation of spurious results caused by concurrency bugs to end-users (i.e. it is often preferable to throw a runtime error).
A key factor that determines discourje's fitness for purpose is efficiency. We therefore conducted two kinds of performance experiments: microbenchmarks to study the scalability of discourje (Sect. 5.2.1) and whole-program benchmarks to study the overhead relative to unmonitored code (Sect. 5.2.2).
In all experiments, we used a machine with 32 physical cores and 64 GB of physical memory (far more than needed for our benchmarks), using CentOS Linux 8 (kernel: 4.18) and Java 16.0.1 (HotSpot JVM) with default settings.

Microbenchmarks
In the microbenchmarks, we studied discourje's scalability under "extreme" circumstances in which threads perform only sends and receives, without any real computations; this is the worst-case scenario for the lock-free algorithm to synchronise monitor access, as it gives rise to maximal thread contention.
We considered six basic protocols to investigate the core features of the specification language in isolation. The specifications are shown in Fig. 21; their relevant properties are summarised in Table 1 (discussed below).
Specifications :ring-unbuffered and :ring-buffered combine concatenation with synchronous and asynchronous communication. Specifications :star-unbuffered-outwards and :star-unbuffered-inwards combine choice with synchronous communication; their state machines have only a single state (with an outgoing transition for every :worker thread). Specifications :star-buffered-outwards and :star-buffered-inwards combine interleaving with asynchronous communication; their state machines have exponentially many states due to the combinatorial explosion of the orders in which the communications can be interleaved. Each of these specifications consists of a loop with an unspecified number of iterations (*). In every iteration of :ring-unbuffered and :ring-buffered, the roles need to communicate according to a ring pattern; in every iteration of :star-unbuffered-outwards and :star-buffered-outwards, the roles need to communicate according to a one-to-many pattern; in every iteration of :star-unbuffered-inwards and :star-buffered-inwards, the roles need to communicate according to a many-to-one pattern.

Table 1
Specification | Pattern | #states | #trans/state
:star-unbuffered-outwards | One-to-many | Constant | Linear
:star-unbuffered-inwards | Many-to-one | Constant | Linear
:star-buffered-outwards | One-to-many | Exponential | Linear
:star-buffered-inwards | Many-to-one | Exponential | Linear
We ran every implementation of these protocols with k ∈ {2, 4, 6, 8, 10, 12, 14, 16} :worker threads, for 4096 loop iterations, and measured the run times. For every implementation, for every protocol, and for every k, we repeated the run 30 times to smooth out variability, on separate "cold" instances of the JVM to rule out JIT impact across repetitions. We computed the mean $m$, standard deviation $s$, and coefficient of variation $s/m$. The means are shown in Fig. 22; the coefficients of variation were all less than 10%, so the general trends are informative.
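For reference, the three summary statistics can be computed as in the following sketch. The function names are hypothetical, the run times in the example are made up for illustration, and the sample form of the standard deviation (with n - 1 in the denominator) is an assumption; the text above does not state which form was used.

```clojure
(defn mean
  "Arithmetic mean m of a non-empty collection of run times."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn std-dev
  "Sample standard deviation s (assumption: n - 1 in the denominator)."
  [xs]
  (let [m (mean xs)
        n (count xs)]
    (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
                  (dec n)))))

(defn coefficient-of-variation
  "Coefficient of variation s/m, a dimensionless measure of relative spread."
  [xs]
  (/ (std-dev xs) (mean xs)))

;; Made-up run times in seconds: m = 10.0, s = 1.0, s/m = 0.1 (i.e. 10%)
(coefficient-of-variation [9.0 10.0 11.0])
```

Because the coefficient of variation is relative to the mean, it allows the spread of fast and slow configurations to be compared against the same 10% threshold.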
To explain the general trends, we model the total run time $t$ of an implementation in terms of its two dominant components, using the following equation: $t = t_{mach} + t_{act}$, where $t_{mach}$ is the time required to compute the state machine for the specification, and $t_{act}$ is the time required to perform all sends and receives. Using this model, we summarise the main findings as follows:
• We observe linear scalability for :ring-unbuffered, :ring-buffered, :star-unbuffered-outwards, and :star-unbuffered-inwards. To explain this, we first note that the number of states (column "#states" in Table 1) and the number of transitions per state (column "#trans/state") grow at most linearly in k for these specifications, so $t_{mach}$ grows linearly in k too. We also note that the number of sends and receives grows linearly in k, so $t_{act}$ grows linearly in k too. Thus, $t = t_{mach} + t_{act}$ grows linearly in k.
• We observe exponential scalability for :star-buffered-outwards and :star-buffered-inwards (i.e. the scale on the y-axis is logarithmic). To explain this, we note that the number of states (column "#states" in Table 1) grows exponentially in k, so $t_{mach}$ grows exponentially in k too. Thus, $t = t_{mach} + t_{act}$ grows exponentially in k.

Fig. 22 Microbenchmarks: run times in seconds (y-axis) as the number of worker threads increases (x-axis), as a measure of scalability
We note that we use equation $t = t_{mach} + t_{act}$ only as a model to explain the general trends; we have not measured $t_{mach}$ and $t_{act}$ separately. To conclude, :ring-unbuffered, :ring-buffered, :star-unbuffered-outwards, and :star-unbuffered-inwards enjoy fine scalability. However, scalability of :star-buffered-outwards and :star-buffered-inwards can be improved.

Whole-program benchmarks
In the whole-program benchmarks, we studied discourje's overhead in five real(istic), existing concurrent programs:
• Chess Simulates a game of chess between two player threads.
• Conjugate Gradient (CG-k) Computes an estimate of the largest eigenvalue of a symmetric positive definite sparse matrix with a random pattern of nonzeros, using the conjugate gradient algorithm, with k worker threads.
• Fourier Transform (FT-k) Computes the solution of a partial differential equation, using the forward and inverse Fast Fourier Transform algorithm, with 2·k worker threads.
• Integer Sort (IS-k) Computes a sorted list of uniformly distributed integer keys, using histogram-based integer sorting, with k worker threads.
• Multi-Grid (MG-k) Computes an approximate solution u to the discrete Poisson problem $\nabla^2 u = v$, using the V-cycle multigrid algorithm, with 4·k worker threads.
For Chess, we used Clojure code similar to the threads in Tic-Tac-Toe (Fig. 11), combined with invocations of the open source chess engine Stockfish (https://stockfishchess.org) to compute moves. For CG, FT, IS, and MG, we adapted existing Java implementations from the NAS parallel benchmarks (NPB) [51] suite, which consists of computational fluid dynamics kernels, by taking advantage of our Java interoperability wrapper (Sect. 5.1.3) to replace the monitor-based synchronisation used in the original versions. We also wrote specifications using discourje. For Chess, the specification is the same as the Tic-Tac-Toe specification (Fig. 10); for CG, FT, IS, and MG, the specifications consist of repetitions of buffered one-to-many and many-to-one patterns (Fig. 21), involving various subsets of worker threads and data types. From a communication perspective, the key difference between CG, FT, IS, and MG is the frequency with which repetitions of the one-to-many and many-to-one patterns happen (i.e. communication intensity).
We recorded execution times of each of the implementations without and with monitoring enabled, using standardised computational workloads. For Chess, the workload is controlled by the total amount of time each player has to compute its moves during the entire game; we used the four smallest such workloads supported by the open source chess server Lichess (https://lichess.org), namely {15, 30, 45, 60} seconds, and we limited games to a maximum of 40 turns per player (UltraBullet chess). Furthermore, we allow simultaneous "ponder" computations by a player during its opponent's turn, so there is ample parallelism as well. For CG, FT, IS, and MG, the workload is controlled by the input size; we used the standardised inputs that are predefined by NPB.
For every implementation (without and with monitoring enabled), and for every k, we repeated the run 30 times to smooth out variability, and we computed the mean $m$, standard deviation $s$, and coefficient of variation $s/m$. All coefficients of variation were smaller than 10%, except for CG-14 (15%) and CG-16 (18%), so the general trends are informative. As a measure of overhead, we computed normalised means $\mu_{w}/\mu_{wo}$, where $\mu_{w}$ and $\mu_{wo}$ are the mean run times with and without monitoring enabled, respectively; this metric is a dimensionless number that indicates the factor by which monitoring slows down the implementation. The normalised means for Chess are 0.979 (15 seconds), 0.999 (30 seconds), 0.996 (45 seconds), and 0.996 (60 seconds); the normalised means for NPB are shown in Fig. 23. We summarise the main findings as follows, relative to intended usage types A and B of the discourje library (page 29):
• For Chess, the normalised means are all very close to 1, which indicates that the overhead of monitoring is negligible. This suggests that intended usage types A and B are both possible for Chess.
• For FT and IS, the slowdowns are all less than 12%. This seems low enough not only for usage type A (testing/debugging in development), but also for usage type B (fail-safe mechanism in production). We also note that the specifications for FT and IS are (extended versions of) specifications :star-buffered-inwards and :star-buffered-outwards, which scaled poorest in the microbenchmarks. This shows that despite poor scalability under the "extreme" circumstances in the microbenchmarks (only sends and receives; no computations), discourje can still perform reasonably well in whole programs.
• For CG and MG, the slowdowns are higher: up to 5.3× and 3.7×, respectively. Although this is likely to be too much for usage type B, it seems low enough for usage type A (cf. the industrial-strength Valgrind tool for dynamic analysis of memory management [52], which can inflict similar slowdowns but is nevertheless effectively used in practice). The difference in performance between {FT, IS} and {CG, MG} may be explained by the fact that the latter are considerably more communication-intensive than the former, so the overhead of monitoring communications is more pronounced.
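The statistics used above can be computed as in the following minimal sketch. The function names and the run times are made up for illustration and are not measurements from the experiments; the use of the sample standard deviation (denominator n - 1) is also an assumption, since the text does not state which variant was used.

```clojure
(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  (/ (reduce + 0.0 xs) (count xs)))

(defn std-dev
  "Sample standard deviation (n - 1 denominator; an assumption, see above).
  Requires at least two values."
  [xs]
  (let [m  (mean xs)
        ss (reduce + 0.0 (map #(let [d (- % m)] (* d d)) xs))]
    (Math/sqrt (/ ss (dec (count xs))))))

(defn coefficient-of-variation
  "Standard deviation divided by the mean (s/m)."
  [xs]
  (/ (std-dev xs) (mean xs)))

(defn normalised-mean
  "Mean run time with monitoring divided by mean run time without it.
  Values above 1 indicate a slowdown caused by monitoring."
  [with-monitoring without-monitoring]
  (/ (mean with-monitoring) (mean without-monitoring)))

;; Hypothetical run times in seconds (NOT data from the paper).
(def runs-without [10.0 10.2 9.8 10.1 9.9])
(def runs-with    [11.0 11.3 10.9 11.1 10.7])
```

With these made-up numbers, (normalised-mean runs-with runs-without) is roughly 1.1, which would be read as a slowdown of about 10%, and (coefficient-of-variation runs-without) is well below the 10% threshold mentioned above.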

Conclusion
We presented Discourje: a research project that aims to help programmers cope with channels and concurrency bugs in Clojure, based on dynamic analysis. That is: Discourje offers a run-time verification library in Clojure, called discourje, to ensure safety of channel actions in implementations relative to specifications. The formal foundations of discourje are based on multiparty session types, but trade in static type checking for dynamic run-time monitoring; a key advantage is higher expressiveness. An important design principle of discourje has been ergonomics: we aim to make discourje's usage as comfortable as possible. In particular, programmers can decide to start using discourje at any stage of development (and doing so requires little effort); discourje is itself implemented in Clojure (so there is no need to use a different IDE, learn completely new syntax, or install special compilers); and discourje can be used seamlessly alongside other concurrency libraries. Furthermore, the results of our performance experiments indicate that run-time overhead can be less than 12% for real(istic), existing concurrent programs. This makes discourje suitable both as a testing/debugging tool in development and as a fail-safe mechanism in production.
We close this paper with an overview of related work (Sect. 6.1) and future work (Sect. 6.2).

Related work
As explained in Sect. 1.1.2, the Discourje project was originally conceived to explore a new direction in research on multiparty session types (MPST). In recent years, several practical tools were developed, mostly for statically typed languages (e.g. F# [26], Go [7], Java [27,28], Scala [29]), and to a lesser extent for dynamically typed languages (e.g. Python [53], Erlang [54]). To our knowledge, in the context of MPST, the Discourje project is the first to leverage run-time verification and decomposition-free verification together for a dynamically typed language (Fig. 1), although these characteristics have been considered in isolation:
• There are MPST approaches that combine static type checking with a form of distributed run-time verification and/or assertion checking [19,26,55–57]. In contrast to Discourje, however, these dynamic techniques still rely on decomposition, which negatively affects their expressiveness (e.g. none of the case studies in Sects. 3.2-3.4 are supported).
• Decomposition-free MPST has also been explored by López et al. [58,59]. The idea is to specify MPI communication protocols in an MPI-tailored DSL, inspired by MPST, and verify the implementation against the specification using deductive verification tools (VCC [60] and Why3 [61]). However, this approach requires considerable manual effort. In contrast, discourje can be used in a fully automated way.
Expressiveness of MPST has been an important research topic in recent years, but efforts have primarily been geared towards adding more advanced features (e.g. time [18,19], security [20][21][22][23], and parametrisation [7,24,25]); in contrast, restrictions on the usage of core features such as choice and interleaving have remained, even though they limit MPST's applicability in practice (e.g. none of the case studies in Sects. 3.2-3.4 are supported). Some work has been done to improve expressiveness in this regard using static techniques [62], but the specification language of discourje remains more expressive.
Verification of shared-memory concurrency with channels has received attention in the context of Go [8][9][10][11]. However, in addition to relying on static techniques, emphasis in these works is on checking deadlock-freedom, liveness, and generic safety properties, while we focus on program-specific protocol compliance. Castro et al. [7] also consider protocol compliance for Go, but their specification language is substantially less expressive than that of discourje (e.g. none of the case studies in Sects. 3.2-3.4 are supported).
We are aware of only two other works that use formal techniques to reason about Clojure programs: Bonnaire-Sergeant et al. [63] formalised the optional type system for Clojure and proved soundness, while Pinzaru et al. [64] developed a translation from Clojure to Boogie [65] to verify Clojure programs annotated with pre/post-conditions. To our knowledge, Discourje is the first research project to target concurrency in Clojure.

Future work
We aim to improve discourje along the following lines:
• Recovery We aim to explore the idea that whenever a monitor detects a safety violation, instead of throwing an exception, it should delay the violating action as a corrective measure, in an attempt to steer the implementation towards safety. When done naively, such delays can easily give rise to deadlocks, so our plan is to combine this approach with run-time model checking/reachability analysis to determine whether the violating action will eventually be allowed (if yes, delay; if no, throw).
• Scalability Our microbenchmarks show that we need better ways to deal with specifications with exponentially sized state machines. Our plan is to study new forms of flexible decomposition that allow us to compute local specifications as in traditional MPST (Fig. 1) to avoid exponential blow-up whenever possible, but without compromising expressiveness (by keeping a centralised component, like the current monitors, if needed).
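The "delay or throw" decision in the recovery idea can be sketched as follows. This is only an illustration of the planned direction under simplifying assumptions, not part of discourje: the specification is a hypothetical hand-written state machine, all names are made up, and plain graph reachability ("the action is enabled in some reachable state") is used as a stand-in for the stronger analysis that would be needed to establish that the action will eventually be allowed.

```clojure
(def spec
  "Hypothetical specification as a state machine:
  state -> {action -> next state}. An action is [type sender receiver]."
  {:s0 {[:send :alice :bob] :s1}
   :s1 {[:send :bob :alice] :s0
        [:send :bob :carol] :s2}
   :s2 {}})

(defn reachable-states
  "All states reachable from `start` (including `start`), found by
  breadth-first search over the transitions of `spec`."
  [spec start]
  (loop [seen #{start}
         frontier [start]]
    (if (empty? frontier)
      seen
      (let [succs (->> frontier
                       (mapcat #(vals (get spec %)))
                       (remove seen)
                       set)]
        (recur (into seen succs) (vec succs))))))

(defn decide
  "Returns :allow if `action` is enabled in `state`; :delay if it is not
  enabled now but is enabled in some state reachable from `state`;
  :throw otherwise."
  [spec state action]
  (cond
    (contains? (get spec state) action) :allow
    (some #(contains? (get spec %) action) (reachable-states spec state)) :delay
    :else :throw))
```

For example, in state :s0 the action [:send :bob :carol] is not yet enabled but becomes enabled in the reachable state :s1, so the sketch answers :delay; in state :s2 no further action is reachable, so any action yields :throw.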
Orthogonally, we would like to better understand the effectiveness of using discourje (e.g. in terms of reduced development costs).