1 Introduction

The relationship between universal algebra and monads has been studied at least since Linton [13] and Eilenberg and Moore [4], while the relationship between monads and the general theory of computational effects (exceptions, mutable state, nondeterminism, and such) has been observed by Moggi [14]. By transitivity, one can study computational effects using concepts from universal algebra, which is the main theme of Plotkin and Power’s prolific research programme (see [10, 20,21,22,23,24] among many others).

The simplest possible case of this approach is to describe an effect via a finitary equational theory: a finite set of operations (of finite arities), together with a finite set of equations. One such example is the theory of monoids:

$$\begin{aligned}&\gamma : 2 \qquad \varepsilon : 0 \\&\gamma (\gamma (x,y),z) = \gamma (x,\gamma (y,z)) \qquad \gamma (\varepsilon ,x) = x \qquad \gamma (x,\varepsilon ) = x \end{aligned}$$

The above reads that the signature of the theory consists of two operations: binary \(\gamma \) and nullary \(\varepsilon \). The equations state that \(\gamma \) is associative, with \(\varepsilon \) being its left and right unit. One can also read this theory as a specification of backtracking nondeterminism, in which the order of results matters, where \(\gamma \) is an operation that creates a new computation as a choice between two subcomputations, while \(\varepsilon \) denotes failure. The connection between the equational theory and the computational effect becomes apparent when we consider the monad of free monoids (that is, the list monad), which is in fact used to form backtracking computations in programming; see, for example, Bird’s pearl [1].
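
To make the backtracking reading concrete, here is a minimal Haskell sketch (ours, not from the paper): the list monad implements this effect, with \(\gamma \) as concatenation and \(\varepsilon \) as the empty list, and the order of answers is observable.

```haskell
import Control.Monad (guard)

-- Backtracking with the list monad: choice is concatenation (gamma),
-- failure is the empty list (epsilon). Results come out in the order
-- in which the choices are made.
pairsSummingTo :: Int -> [(Int, Int)]
pairsSummingTo n = do
  x <- [1 .. n]      -- nondeterministic choice of x
  y <- [1 .. n]      -- nondeterministic choice of y
  guard (x + y == n) -- failing branches contribute no results
  return (x, y)
```

For instance, `pairsSummingTo 4` enumerates its answers with the first component increasing, reflecting that this form of nondeterminism is ordered.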

This suggests a simple recipe for computational effects: it is enough to come up with an equational theory, and out of the hat comes the induced monad of free algebras that implements the corresponding effect. Such an approach is indeed possible in the category \(\mathbf {Set}\), where every finitary equational theory admits a free monad, constructed by quotienting terms over the signature by the congruence induced by the equations. However, if we want to implement this monad in a programming language, the situation is not so simple, since in most programming languages (in particular, those without higher inductive types) we cannot generally express this kind of quotient. For instance, to describe a variant of nondeterminism that does not admit duplicate results, we may extend the theory of monoids with an equation stating that \(\gamma \) is idempotent, that is, \(\gamma (x,x) = x\). But, unlike in the case of general monoids, the monad induced by the theory of idempotent monoids seems to be no longer directly expressible in, say, Haskell. This means that there is no implementation that satisfies all the equations of the theory “on the nose”—one informal argument is that the representations of \(\gamma (x,x)\) and x should be the same whatever the type of x, and this would require a decidable equality test on every type, which is not possible.
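
A Haskell sketch (ours) of the informal argument: with lists representing free monoids, idempotence fails on the nose, and any normalisation that merges duplicates requires decidable equality.

```haskell
import Data.List (nub)

-- In the list representation of free monoids, gamma is (++) and epsilon is [].
gamma :: [a] -> [a] -> [a]
gamma = (++)

-- gamma xs xs is never equal to a nonempty xs, so gamma is not idempotent
-- on lists. Collapsing duplicates, e.g. with nub, needs an Eq instance,
-- so it cannot be done uniformly at every type.
dedup :: Eq a => [a] -> [a]
dedup = nub
```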

Thus, both from the practical viewpoint of programming and as a question on the general nature of equational theories, it makes sense to ask which theories are “simple” enough to induce monads expressible using only the basic type formers, such as (co)products, function spaces, algebraic datatypes, universal and existential quantification. This question seems difficult in general, and to our knowledge there is little work that addresses it. In this paper, we focus on a small piece of this problem: we study a certain subset of such implementable equational theories, and conjecture some novel extensions.

The monads that we consider arise from Cayley representations. The overall idea is that if a theory has an expressible, well-behaved (in a sense that we make precise in the paper) Cayley representation, the induced monad also has an expressible implementation. The well-known Cayley theorem for monoids states that every monoid with carrier X embeds in the monoid of endofunctions \({X \rightarrow X}\). In this paper, we generalise this result: given a polynomial \(\mathbf {Set}\)-endofunctor \(P\) with natural coefficients, we provide an explicit formula for an equational theory such that every algebra of the theory with carrier X embeds in a certain algebra with the carrier given by \(PX \rightarrow X\). Then, we show that the monad of free algebras of such a theory can be implemented as a continuation-like monad with the endofunctor given at a set A as:

$$\begin{aligned} \forall X. (A \rightarrow PX \rightarrow X) \rightarrow PX \rightarrow X \end{aligned}$$
(1)

This type is certainly expressible in programming languages based on polymorphic \(\lambda \)-calculi, such as Haskell.
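
Concretely, type (1) can be written down with GHC’s RankNTypes extension; the sketch below (our names) also records that choosing \(P\) to be the identity functor yields the Church encoding of lists, as discussed later.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))

-- Type (1) as a Haskell datatype, parametrised by the polynomial functor p.
newtype Cayley p a = Cayley
  { runCayley :: forall x. (a -> p x -> x) -> p x -> x }

-- With p = Identity, this is the Church encoding of lists:
toList :: Cayley Identity a -> [a]
toList (Cayley c) = c (\a (Identity xs) -> a : xs) (Identity [])

singleton :: a -> Cayley Identity a
singleton a = Cayley (\f px -> f a px)
```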

However, before we can give the details of this construction, we need to address some technical issues. It is easy to notice that there may be more than one “Cayley representation” of a given theory: for example, a monoid X embeds not only in \(X \rightarrow X\), but also in a “smaller” monoid of left translations, by which we mean the monoid of functions of the type \(X \rightarrow X\) of the shape \(a \mapsto \gamma (b,a)\), where \(b \in X\). The same monoid X embeds also in a “bigger” monoid \(X^2 \rightarrow X\), in which we interpret the operations as \(\gamma (f,g) = (x,y) \mapsto f(g(x,y),y)\) and \(\varepsilon = (x,y) \mapsto x\). What makes \(X \rightarrow X\) special is that instantiating (1) with \(PX = X\) gives a monad that is isomorphic to the list monad (note that in this case, the type (1) is simply the Church representation of lists). At the same time, we cannot use the monoid of left translations instead of \(X \rightarrow X\), since (1) quantifies over sets, and thus there is no natural candidate for \(\gamma \). Moreover, even though we may use the instantiation \(PX = X^2\), this choice yields a different monad (which we describe in more detail in Sect. 5.4). To sort this out, in Sect. 2, we introduce the notion of tight Cayley representation. This notion gives rise to the monad of the following shape, which is a strict generalisation of (1), where R is a \(\mathbf {Set}\)-bifunctor of mixed variance:

$$\begin{aligned} \forall X. (A \rightarrow R(X,X)) \rightarrow R(X,X) \end{aligned}$$
(2)
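
Returning to the “bigger” monoid \(X^2 \rightarrow X\) mentioned above, a quick Haskell check (ours) confirms that the given interpretation of \(\gamma \) and \(\varepsilon \) satisfies the monoid laws pointwise.

```haskell
-- The monoid structure on functions X^2 -> X described in the text:
-- gamma2 f g = (x, y) |-> f (g x y) y,  epsilon2 = (x, y) |-> x.
type R2 x = x -> x -> x

gamma2 :: R2 x -> R2 x -> R2 x
gamma2 f g = \x y -> f (g x y) y

epsilon2 :: R2 x
epsilon2 = \x _ -> x
```

Unfolding the definitions shows that `gamma2 epsilon2 f` and `gamma2 f epsilon2` are both extensionally equal to `f`, and that `gamma2` is associative.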

Formally, all our constructions are set-theoretic—to focus the presentation, the connection with programming languages and type theory is left implicit. Thus, the second issue that we discuss in Sect. 2 is the meaning of the universal quantifier \(\forall \) in (1). It is known [27] that polymorphic functions of this shape enjoy a form of dinaturality proposed by Michael Barr (see Paré and Román [16]), called by Mulry strong dinaturality [15]. We model the universally quantified types above as collections of Barr-dinatural transformations, and prove that if R is a tight representation, the collection (2) is always a set.

In Sect. 4, we give the formula that defines an equational theory given a polynomial functor \(P\). In general, the theories we construct can be multi-sorted, which is useful for avoiding a combinatorial explosion of the induced theories, hence we include a brief discussion of such theories in Sect. 3. We show that \(PX \rightarrow X\) is indeed a tight representation of the generated theory. Then, in Sect. 5, we study a number of examples in order to discover what effects are denoted by the generated theories. It turns out that each theory can be seen as a (rather complex, for nontrivial polynomial functors) composition of backtracking nondeterminism and finite mutable state. Moreover, in Sect. 6, we show that the corresponding monads can be implemented not only as continuation-like monads (1), but also in “direct style”, using algebraic datatypes.

Since they are parametrised by a polynomial, both the equational theory and its representation consist of many indexed components, so it is not necessarily trivial to get much intuition simply by looking at the formulas. To facilitate this, we have implemented a tool, called CayMon, that generates the theory from a given polynomial, and produces two implementations in Haskell: as a composition of algebraic datatypes and as a continuation-like (“Cayley”) monad (1). The tool can be run in a web browser, and is available at http://pl-uwr.bitbucket.io/caymon.

2 Tight Cayley Representations

In this section, we take a more abstract view on the concept of “Cayley representation”. In the literature (for example, [2, 5, 17, 25]), authors usually define Cayley representations of different forms of algebraic structures in terms of embeddings. This means that given an object X, there is a homomorphism \(\sigma : X \rightarrow Y\) to a different object Y, and moreover \(\sigma \) has a retraction (not necessarily a homomorphism) \(\rho : Y \rightarrow X\) (meaning \(\rho \cdot \sigma = \mathsf {id}\)). One important fact, which is usually left implicit, is that the construction of Y from X is in some sense functorial. Since we are interested in coming up with representations for many different equational theories, we first identify sufficient properties of such a representation needed to carry out the construction of the monad (2) sketched in the introduction. In particular, we introduce the notion of tight Cayley representation, which characterises the functoriality and naturality conditions for the components of the representation.

As for notation, we use \(A \rightarrow B\) to denote both the type of a morphism in a category, and the set of all functions from A to B (the exponential object in \(\mathbf {Set}\)). Also, for brevity, we write the application of a bifunctor to two arguments, e.g., G(X,Y), without parentheses, as GXY. We begin with the following definition:

Definition 1 (see [16])

Let \(\mathscr {C}, \mathscr {D}\) be categories, and \(G,H : \mathscr {C}^\mathsf {op}\times \mathscr {C} \rightarrow \mathscr {D}\) be functors. Then, a collection of \(\mathscr {D}\)-morphisms \(\theta _X : GXX \rightarrow HXX\) indexed by \(\mathscr {C}\)-objects is called a Barr-dinatural transformation if it is the case that for all objects A in \(\mathscr {D}\), objects X, Y in \(\mathscr {C}\), morphisms \(f_1 : A \rightarrow GXX\), \(f_2 : A \rightarrow GYY\) in \(\mathscr {D}\), and a morphism \(g : X \rightarrow Y\) in \(\mathscr {C}\),

$$\begin{aligned} \text {if } \; GXg \cdot f_1 = GgY \cdot f_2, \; \text { then } \; HXg \cdot \theta _X \cdot f_1 = HgY \cdot \theta _Y \cdot f_2. \end{aligned}$$

An important property of Barr-dinaturality is that the component-wise composition gives a well-behaved notion of vertical composition of two such transformations. The connection between Barr-dinatural transformations and Cayley representations is suggested by the fact, shown by Paré and Román [16], that the collection of such transformations of type \(H \rightarrow H\) for the \(\mathbf {Set}\)-bifunctor \(H(X,Y) = X \rightarrow Y\) is isomorphic to the set of natural numbers. The latter, equipped with addition and zero (or the former with composition and the identity transformation, respectively), is simply the free monoid with a single generator, that is, an instance of (1) with \(PX = X\) and \(A = 1\).
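
The correspondence can be glimpsed in Haskell (our sketch): a parametric value of the type below can only iterate its functional argument a fixed number of times, i.e., it is a Church numeral, and composing the resulting functions is addition.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Polymorphic transformations for H(X,Y) = X -> Y, as a Haskell type:
-- each inhabitant iterates its argument some fixed number of times.
type Church = forall x. (x -> x) -> x -> x

toNat :: Church -> Int
toNat n = n (+ 1) 0

fromNat :: Int -> Church
fromNat 0 = \_ z -> z
fromNat k = \s z -> s (fromNat (k - 1) s z)

-- Composing the resulting functions corresponds to addition of the
-- represented numbers; the transformation returning the identity is zero.
addC :: Church -> Church -> Church
addC m n = \s z -> m s (n s z)
```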

For the remainder of this section, assume that \(\mathscr {T}\) is a category, while \(F : \mathbf {Set}\rightarrow \mathscr {T}\) is a functor with a right adjoint \(U : \mathscr {T} \rightarrow \mathbf {Set}\). Intuitively, \(\mathscr {T}\) is a category of algebras of some theory, and U is the forgetful functor. Then, the monad generated by the theory is given by the composition UF. For a function \(f : A \rightarrow UX\), we write \(\widehat{f} = U f' : UFA \rightarrow UX\), where \(f' : FA \rightarrow X\) is the transpose of f via the adjunction (intuitively, the unique homomorphism induced by the freeness of the algebra FA).

Definition 2

A tight Cayley representation of \(\mathscr {T}\) with respect to \(F \dashv U\) consists of the following components:

  (a)

    A bifunctor \(R: \mathbf {Set}^\mathsf {op}\times \mathbf {Set}\rightarrow \mathbf {Set}\),

  (b)

    For each set X, an object \({\mathbb RX}\) in \(\mathscr {T}\), such that \(U{\mathbb RX}= {RX X}\),

  (c)

    For all sets A, X, Y and functions \(f_1 : A \rightarrow {RX X}\), \(f_2 : A \rightarrow {RY Y}\), \(g : X \rightarrow Y\), it is the case that

    $$\begin{aligned} \text {if } \; RXg \cdot f_1 = RgY \cdot f_2, \; \text { then } \; RXg \cdot \widehat{f_1} = RgY \cdot \widehat{f_2}. \end{aligned}$$
  (d)

    For each object M in \(\mathscr {T}\), a morphism \(\sigma _M : M \rightarrow {\mathbb R(UM)}\) in \(\mathscr {T}\), such that \(U\sigma _M : UM \rightarrow {R(UM) (UM)}\) is Barr-dinatural in M,

  (e)

    A Barr-dinatural transformation \(\rho _M : {R(UM) (UM)} \rightarrow UM\), such that \(\rho _M \cdot U\sigma _M = \mathsf {id}\),

  (f)

    For each set X, a set of indices \(I_X\) and a family of functions \(\mathsf {run}_{X,i} : {RX X} \rightarrow X\), where \(i \in I_X\), such that the functions \(R({RX X})\, \mathsf {run}_{X,i}\), for \(i \in I_X\), form a jointly monic family, and the following diagram commutes for all X and \(i \in I_X\):

    $$\begin{aligned} R({RX X})\, \mathsf {run}_{X,i} \cdot U\sigma _{\mathbb {R}X} = R\, \mathsf {run}_{X,i}\, X \end{aligned}$$

Note that the condition (c) states that the objects \(\mathbb R\) are, in a sense, natural. Intuitively, understanding an object \({\mathbb RX}\) as an algebra, the condition states that the algebraic structure of \({\mathbb RX}\) does not really depend on the set X. The condition (f) may seem rather complicated: the intuition behind the technical formulation is that RXY behaves like a form of a function space (after all, we are interested in abstract Cayley representations), and \(\mathsf {run}_{X,i}\) is an application to an argument specified by i, as in the example below. In such a case, the joint monicity becomes the extensional equality of functions.

Example 3

Let us check how Cayley representation for monoids fits the definition above: (a) The bifunctor is \(RXY = X \rightarrow Y\). (b) The \(\mathscr {T}\)-object for a monoid M is the monoid \(M \rightarrow M\) with \(\gamma (f,g) = f \circ g\) and \(\varepsilon = \mathsf {id}\). (c) Given some elements \(a,b, \ldots , c \in A\), we need to see that \(g \circ f_1(a) \circ f_1(b) \circ \cdots \circ f_1(c) = f_2(a) \circ f_2(b) \circ \cdots \circ f_2(c) \circ g\). Fortunately, the assumption, which in this case becomes \(g\circ f_1(a) = f_2(a) \circ g\) for all \(a \in A\), allows us to “commute” g from one side of the chain of function compositions to the other. (d) \(\sigma _M(a) = b \mapsto \gamma (a,b)\). It is easy to verify that it is a homomorphism. The Barr-dinaturality condition: assuming \(f(m) = n\) for some \(m \in M\) and \(n \in N\), and a homomorphism \(f : M \rightarrow N\), it is the case that, omitting the U functor, \(RfN(\sigma _N(n)) = RfN(\sigma _N(f (m))) = b \mapsto \gamma (f(m),f(b)) = b \mapsto f(\gamma (m,b)) = RMf(\sigma _M(m))\), where the equalities can be explained respectively as: assumption in the definition of Barr-dinaturality, unfolding definitions, homomorphism, unfolding definitions. (e) \(\rho _M(f) = f(\varepsilon )\). It is easy to show that it is Barr-dinatural; note that we need to use the fact that \(\mathscr {T}\)-morphisms (that is, monoid homomorphisms) preserve \(\varepsilon \). (f) We define \(I_X = X\), while \(\mathsf {run}_{X,i}(f) = f(i)\).
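
In Haskell terms (our sketch, with our own names), the representation of Example 3 reads as follows for any Monoid instance; `rho . sigma` is the identity, as required by (e).

```haskell
-- sigma embeds a monoid element as "multiplication on the left";
-- rho recovers an element by applying the function to the unit.
sigma :: Monoid m => m -> (m -> m)
sigma a = \b -> a <> b

rho :: Monoid m => (m -> m) -> m
rho f = f mempty
```

For example, `rho (sigma "ab" . sigma "cd")` computes `"ab" <> ("cd" <> mempty)`, so composition in the representation corresponds to multiplication in the original monoid.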

The first main result of this paper states that given a tight representation of \(\mathscr {T}\) with respect to \(F \dashv U\), the monad given by the composition UF can be alternatively defined using a continuation-like monad constructed with sets of Barr-dinatural transformations:

Theorem 4

For a tight Cayley representation R with respect to \(F \dashv U\), elements of the set UFA are in 1-1 correspondence with Barr-dinatural transformations of the type \((A \rightarrow {RX X}) \rightarrow {RX X}\). In particular, this means that the latter form a set. Moreover, this correspondence gives a monad isomorphism between UF and the evident continuation-like structure on (2), given by the unit \((\eta _A(a))_X(f) = f(a)\) and the Kleisli extension \((f^*(k))_X(g) = k_X(a \mapsto (f (a))_X(g))\).
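
The unit and Kleisli extension of Theorem 4 transcribe directly to Haskell for type (1); the sketch below (our names, with type (1) re-declared for a polynomial functor p) mirrors the two formulas.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Type (1), as in the introduction.
newtype Cayley p a = Cayley
  { runCayley :: forall x. (a -> p x -> x) -> p x -> x }

-- Unit: (eta a) f = f a.
etaC :: a -> Cayley p a
etaC a = Cayley (\f -> f a)

-- Kleisli extension: (bind k f) g = k (\a -> f a run against g).
bindC :: Cayley p a -> (a -> Cayley p b) -> Cayley p b
bindC k f = Cayley (\g -> runCayley k (\a -> runCayley (f a) g))
```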

We denote the set of all Barr-dinatural transformations from the bifunctor \((X,Y) \mapsto A \rightarrow R XY\) to R as \(\forall X. (A \rightarrow {RX X}) \rightarrow {RX X}\). This gives us a monad similar in shape to the continuation monad, or, more generally, Kock’s codensity monad [12] embodied using the formula for right Kan extensions as ends. One important difference with the codensity monad (apart from, of course, the fact that we have bifunctors on the right-hand sides of arrows) is that we use Barr-dinatural transformations instead of the usual dinatural transformations [3]. Indeed, if we use ends instead of \(\forall \), the end \(\textstyle \int _X (A \rightarrow {RX X}) \rightarrow {RX X}\) is given as the collection of all dinatural transformations of the given shape. It is known, however, that even in the simple case when \(A = 1\) and \(RXY = X \rightarrow Y\), this collection is too big to be a set (see the discussion in [16]), hence such an end does not exist.

3 Multi-sorted Equational Theories with a Main Sort

The equational theories that we generate in Sect. 4 are multi-sorted, which is useful for trimming down the combinatorial complexity of the result. This turns out to be, in our view, essential in understanding what computational effects they actually represent. In this section, we give a quick overview of what kind of equational theories we work with, and discuss the construction of their free algebras.

We need to discuss the free algebras here, since we want the freeness to be with respect to a forgetful functor to \(\mathbf {Set}\), rather than to the usual category of sorted sets; compare [26]. This is because we want the equational theories to generate monads on \(\mathbf {Set}\), as described in the previous section. In particular, we are interested in theories in which one of the sorts is chosen as the main one, and work with the functor that forgets not only the structure, but also the carriers of all the other sorts, only preserving the main one. Luckily, this functor can be factored as a composition of two forgetful functors, each with an obvious left adjoint.

In detail, assume a finite set of sorts \(S = \{ \varOmega , K_1, \ldots , K_d \}\) for some \(d \in \mathbb N\), where \(\varOmega \) is the main sort. The category of sorted sets is simply the category \(\mathbf {Set}^{|S|}\), where |S| is the discrete category generated by the set S. More explicitly, the objects of \(\mathbf {Set}^{|S|}\) are tuples of sets (one for each sort), while morphisms are tuples of functions. Given an S-sorted finitary theory \(\mathfrak T\), we denote the category of its algebras as \(\mathfrak {T}\text {-Alg}\). To see that the forgetful functor from \(\mathfrak {T}\text {-Alg}\) to \(\mathbf {Set}\) has a left adjoint, consider the following composition of adjunctions:

$$\begin{aligned} \mathbf {Set}\; \overset{F_1}{\underset{U_1}{\rightleftarrows }} \; \mathbf {Set}^{|S|} \; \overset{F_2}{\underset{U_2}{\rightleftarrows }} \; \mathfrak {T}\text {-Alg}, \end{aligned}$$

where \(F_1X = (X, \emptyset , \ldots , \emptyset )\) places X at the main sort \(\varOmega \) (and the empty set at every other sort), \(U_1\) projects out the carrier of \(\varOmega \), and \(F_2 \dashv U_2\) is the usual free-forgetful adjunction for \(\mathfrak {T}\)-algebras over sorted sets.
This means that the free algebra for each sort has the carrier given by the set of terms of the given sort (with variables appearing only at positions intended for the main sort \(\varOmega \)) quotiented by the congruence induced by the equations. This kind of composition of adjunctions is similar to [18], but in our case the compound right adjoint is monadic for the theories given in the next section.

4 Theories from Polynomial Cayley Representations

In this section, we introduce algebraic theories that are tightly Cayley-represented by \(PX \rightarrow X\) for a polynomial functor \(P\). Notation-wise, whenever we write \(i \le k\) for a fixed \(k \in \mathbb N\), we mean that i is a natural number in the range \(1 , \ldots , k\), and use \([ x_i ]_{ i \le k }\) to denote a sequence \(x_1, \ldots , x_k\). The latter notation is used also in arguments of functions and operations, so \(f([ x_i ]_{ i \le k })\) means \(f(x_1, \ldots , x_k)\), while \(f(x,[ y_i ]_{ i \le k })\) means \(f(x, y_1, \ldots , y_k)\). We sometimes use double indexing; for example, by \(\textstyle \prod _{i=1}^{k}\prod _{j=1}^{t_i} X_{i,j} \rightarrow Y\) for some \([ t_i ]_{ i \le k }\), we mean the type \(X_{1,1} \times \cdots \times X_{1,t_1} \times \cdots \times X_{k,1} \times \cdots \times X_{k,t_k} \rightarrow Y\). This is matched by a double-nested notation in arguments, that is, \(f([ [ x_i^j ]_{ j \le t_i } ]_{ i \le k })\) means \(f(x_1^1 , \ldots , x_1^{t_1} , \ldots , x_k^1 , \ldots , x_k^{t_k})\). Also, whenever we want to repeat an argument k-times, we write \([ x ]_{ k }\); for example, \(f([ x ]_{ 3 })\) means f(xxx). Because we use a lot of sub- and superscripts as indices, we do not use the usual notation for exponentiation. This means that \(x^i\) always denotes some x at index i, while a k-fold product of some type X, ordinarily denoted \(X^k\), is written as \(\textstyle \prod ^k X\). We use the \(\llbracket \text {-} \rrbracket \) brackets to denote the interpretation of sorts and operations in an algebra (that is, a model of the theory). If the algebra is clear from the context, we skip the brackets in the interpretation of operations.

For the rest of the paper, let \(d \in \mathbb {N}\) (the number of monomials in the polynomial) and sequences of natural numbers \([ c_i ]_{ i \le d }\) and \([ e_i ]_{ i \le d }\) (the coefficients and exponents, respectively) define the following polynomial endofunctor on \(\mathbf {Set}\):

$$\begin{aligned} PX = \sum _{i = 1}^d c_i \times \textstyle \prod ^{e_i}X, \end{aligned}$$

where \(c_i\) is an overloaded notation for the set \(\{ 1, \ldots , c_i \}\). With this data, we define the following equational theory:

Definition 5

Assuming d, \([ c_i ]_{ i \le d }\), and \([ e_i ]_{ i \le d }\) as above, we define the following equational theory \(\mathfrak {T}\):

  • Sorts:

    \(\varOmega \) (the main sort) and \(K_i\) for each \(i \le d\)

  • Operations:

    $$\begin{aligned}&\mathsf {cons} : \textstyle \prod _{i=1}^{d}\prod ^{c_i} K_i \rightarrow \varOmega \\&\pi _i^j : \varOmega \rightarrow K_i \quad \text {for } i \le d \text { and } j \le c_i \\&\varepsilon _i^j : K_i \quad \text {for } i \le d \text { and } j \le e_i \\&\gamma _i^j : K_j \times \textstyle \prod ^{e_j} K_i \rightarrow K_i \quad \text {for } i \le d \text { and } j \le d \end{aligned}$$

  • Equations:

    $$\begin{aligned}&\pi _i^j(\mathsf {cons}([ [ x_k^t ]_{ t \le c_k } ]_{ k \le d })) = x_i^j&\text {(beta-}\pi \text {)} \\&\mathsf {cons}([ [ \pi _k^j(x) ]_{ j \le c_k } ]_{ k \le d }) = x&\text {(eta-}\pi \text {)} \\&\gamma _i^j(\varepsilon _j^k, [ x_t ]_{ t \le e_j }) = x_k&\text {(beta-}\varepsilon \text {)} \\&\gamma _i^i(x, [ \varepsilon _i^t ]_{ t \le e_i }) = x&\text {(eta-}\varepsilon \text {)} \\&\gamma _i^j(\gamma _j^k(x, [ y_t ]_{ t \le e_k }), [ z_s ]_{ s \le e_j }) = \gamma _i^k(x, [ \gamma _i^j(y_t, [ z_s ]_{ s \le e_j }) ]_{ t \le e_k })&\text {(assoc-}\gamma \text {)} \end{aligned}$$

Thus, in the theory \(\mathfrak T\), there is a main sort \(\varOmega \), which we think of as corresponding to the entire functor, and one sort \(K_i\) for each “monomial” \(\textstyle \prod ^{e_i}X\). Then, we can think of \(\varOmega \) as a tuple containing elements of each sort, where each sort \(K_i\) has exactly \(c_i\) occurrences. The fact that \(\varOmega \) is a tuple, which is witnessed by the \(\mathsf {cons}\) and \(\pi \) operations equipped with the standard equations for tupling and projections, is not too surprising—one should keep in mind that \(\mathfrak T\) is a theory represented by the type \(PX \rightarrow X\), which can be equivalently given as the product of function spaces \(c_i \times \textstyle \prod ^{e_i}X \rightarrow X\) for all \(i \le d\).

Each operation \(\gamma ^j_i\) can be used to compose an element of \(K_j\) and \(e_j\) elements of \(K_i\) to obtain an element of \(K_i\). The \(\varepsilon \) constants can be seen as selectors: in (beta-\(\varepsilon \)), \(\varepsilon _j^k\) in the first argument of \(\gamma ^j_i\) selects the k-th argument of the sort \(K_i\), while the (eta-\(\varepsilon \)) equation states that composing a value of \(K_i\) with the successive selectors of \(K_i\) gives back the original value. The equation (assoc-\(\gamma \)) states that the composition of values is associative in an appropriate sense. In Sect. 5, we provide a reading of the theory \(\mathfrak T\) as a specification of a computational effect for different choices of d, \(c_i\), and \(e_i\).

Remark 6

If it is the case that \(e_i = e_j\) for some \(i,j \le d\), then the sorts \(K_i\) and \(K_j\) are isomorphic. This means that in every algebra of such a theory, there is an isomorphism of sorts \(\varphi : \llbracket K_i \rrbracket \rightarrow \llbracket K_j \rrbracket \), given by \(\varphi (x) = \gamma _j^i(x,[ \varepsilon _j^k ]_{ k \le e_i })\). This suggests an alternative setting, in which instead of having a single \(c_i \times \prod ^{e_i}X\) component, we can have \(c_i\) components of the shape \(\prod ^{e_i}X\). In such a setting, the equational theory \(\mathfrak T\) in Definition 5 would be slightly simpler—specifically, there would be no need for double-indexing in the types of \(\mathsf {cons}\) and \(\pi \). On the downside, this would obfuscate the connection with computational effects described in Sect. 5 and some conjectured extensions in Sect. 7.

The theory \(\mathfrak T\) has a tight Cayley representation using functions from \(P\), as detailed in the following theorem. This gives us the second main result of this paper: by Theorem 4, the theory \(\mathfrak T\) is the equational theory of the monad (1). The notation \(\mathsf {in}_i\) means the i-th inclusion of the coproduct in the functor \(P\).

Theorem 7

The equational theory \(\mathfrak {T}\) from Definition 5 is tightly Cayley-represented by the following data:

  • The bifunctor \(RXY = PX \rightarrow Y\),

  • For a set X, the following algebra:

    • Carriers of sorts:

      $$\begin{aligned}&\llbracket \varOmega \rrbracket = RXX \\&\llbracket K_i \rrbracket = \textstyle \prod ^{e_i}X \rightarrow X \end{aligned}$$
    • Interpretation of operations:

      $$\begin{aligned}&\llbracket \mathsf {cons} \rrbracket ([ [ f_k^j ]_{ j \le c_k } ]_{ k \le d })(\mathsf {in}_i(c,[ x_t ]_{ t\le e_i })) = f_i^c([ x_t ]_{ t\le e_i }) \\&\llbracket \pi _i^j \rrbracket (f)([ x_t ]_{ t \le e_i }) = f(\mathsf {in}_i(j,[ x_t ]_{ t \le e_i })) \\&\llbracket \varepsilon _i^j \rrbracket ([ x_t ]_{ t \le e_i }) = x_j \\&\llbracket \gamma _i^j \rrbracket (f,[ g_k ]_{ k \le e_j })([ x_t ]_{ t\le e_i }) = f([ g_k([ x_t ]_{ t\le e_i }) ]_{ k \le e_j }) \end{aligned}$$
  • The homomorphism \(\sigma _M\) for the main sort and sorts \(K_i\):

    $$\begin{aligned}&\sigma ^\varOmega _M(m)(\mathsf {in}_i(c,[ x_t ]_{ t \le e_i })) = \mathsf {cons}([ [ \gamma _k^{i}(\pi _{i}^{c}(m),[ \pi _k^j(x_t) ]_{ t \le e_i }) ]_{ j \le c_k } ]_{ k \le d }) \\&\sigma ^i_M(s)([ x_t ]_{ t \le e_i }) = \mathsf {cons}([ [ \gamma _k^{i}(s,[ \pi _k^j(x_t) ]_{ t \le e_i }) ]_{ j \le c_k } ]_{ k \le d }) \end{aligned}$$
  • The transformation \(\rho _M\):

    $$\begin{aligned}&\rho _M(f) = \mathsf {cons}([ [ \pi _k^j(f (\mathsf {in}_k (j, [ \mathsf {cons}([ w^f_r ]_{ r< k }, [ \varepsilon _k^t ]_{ c_k }, [ w^f_r ]_{ k < r \le d }) ]_{ t \le e_k }))) ]_{ j \le c_k } ]_{ k \le d }) \\&\quad \text {where } w^f_r = [ \pi _r^c( f(\mathsf {in}_r(c, [ \varepsilon _r^j ]_{ j \le e_r }))) ]_{ c \le c_r } \end{aligned}$$
  • The set of indices \(I_X = PX\) and the functions \(\mathsf {run}_{X,i}(f) = f(i)\).

In the representing algebra, it is the case that each \(\llbracket K_i \rrbracket \) represents one monomial, as mentioned in the description of \(\mathfrak T\), while \(\llbracket \varOmega \rrbracket \) is the appropriate tuple of representations of monomials, which is encoded as a single function from a coproduct (in our opinion, this encoding turns out to be much more readable on paper), while \(\mathsf {cons}\) and \(\pi \) are indeed given by tupling and projections. For each \(i \le d\), the function \(\varepsilon _i^j\) simply returns its j-th argument, while \(\gamma \) is interpreted as the usual composition of multi-argument functions.

Homomorphisms between multi-sorted algebras are defined as operation-preserving functions for each sort, so \(\sigma \) is defined for the sort \(\varOmega \) and for each sort \(K_i\). In general, the point of Cayley representations is to encode an element m of an algebra M using its possible behaviours with other elements of the algebra. It is no different here: for each sort \(K_i\) at the c-th occurrence in the tuple, the function \(\sigma ^\varOmega \) packs (using \(\mathsf {cons}\)) all possible compositions (by means of \(\gamma \)) of values of \(K_i\) with the “components” of m (extracted using \(\pi \)). The same happens for each \(s \in \llbracket K_i \rrbracket \) in \(\sigma ^i_M(s)\), but there is no need to unpack s, as it is already a value of a single sort.

The transformation \(\rho _M\) is a bit more complicated. The argument f is, in general, a function from a coproduct to M, but we cannot simply apply f to one value \(\mathsf {in}_i(\ldots )\) for some sort \(K_i\), as we would obviously lose the information about the components in different sorts. This is why we need to apply f to all possible sorts with \(\varepsilon \) in the right place to ensure that we recover the original value. We extract the information about particular sorts from such values, and combine them using \(\mathsf {cons}\). Interestingly, the elements of \(w^f_r\) could actually be replaced by any expression of the appropriate sort that is preserved by homomorphisms, assuming that f is also preserved. This is needed to ensure that \(\rho \) is Barr-dinatural (the fact that f is preserved by homomorphisms is exactly the assumption in the definition of Barr-dinaturality). For example, if \(e_r > 0\) for some \(r \le d\), one can define \(w^f_r\) simply as \([ \varepsilon _r^j ]_{ c_r }\) for some \(j \le e_r\). The complicated expression in the definition of \(w^f_r\) is a way to produce values also for sorts \(K_r\) with \(e_r = 0\), which do not have any \(\varepsilon \) constants.

5 Effects Modeled by Polynomial Representations

Now we describe what kind of computational effects are captured by the theories introduced in the previous section. It turns out that they are all different compositions of finite mutable state and backtracking nondeterminism. These compositions include the two most basic ones: when the state is local for each nondeterministic branch, and when it is global to the entire computation.

In the following, if there is only one object of a given kind, we skip the indices. For example, if for some i, it is the case that \(e_i = 1\), we write \(\varepsilon _i\) instead of \(\varepsilon _i^1\). If \(d=1\), we skip the subscripts altogether.

5.1 Backtracking Nondeterminism via Monoids

We recover the original Cayley theorem for monoids by instantiating Theorem 7 with \(PX = X\), that is, \(d = 1\) and \(c_1 = e_1 = 1\). In this case, we obtain two sorts, \(\varOmega \) and \(K\), while the equations (beta-\(\pi \)) and (eta-\(\pi \)) instantiate respectively as follows:

$$\begin{aligned} \pi (\mathsf {cons}(x)) = x, \quad \mathsf {cons}(\pi (x)) = x \end{aligned}$$

This means that both sorts are isomorphic, so one can think of this theory as being single-sorted. Of course, this is always the case if \(d = 1\) and \(c_1 = 1\). Since \(e_1 = 1\), the operation \(\gamma \) is binary and there is a single \(\varepsilon \) constant. The equations (beta-\(\varepsilon \)) and (eta-\(\varepsilon \)) say, respectively, that \(\varepsilon \) is the left and right unit of \(\gamma \), that is:

$$\begin{aligned} \gamma (\varepsilon , x) = x, \quad \gamma (x, \varepsilon ) = x \end{aligned}$$

Interestingly, the two unit laws for monoids are symmetrical, but in general the (beta-\(\varepsilon \)) and (eta-\(\varepsilon \)) equations are not. One should note that the symmetry is already broken when one implements free monoids (that is, lists) in a programming language: in the usual right-nested implementation, the “beta” rule is part of the definition of the append function, while the “eta” rule is a theorem. The (assoc-\(\gamma \)) equation instantiates as the associativity of \(\gamma \):

$$\begin{aligned} \gamma (\gamma (x,y), z) = \gamma (x, \gamma (y, z)) \end{aligned}$$
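
The asymmetry noted above is visible in Haskell (our sketch of the standard append): the left-unit (“beta”) law is a defining clause, while the right-unit (“eta”) law holds only by an inductive argument.

```haskell
-- The usual right-nested append on lists.
append :: [a] -> [a] -> [a]
append [] ys = ys                     -- "beta": left unit, holds by definition
append (x : xs) ys = x : append xs ys
-- "eta": append xs [] = xs is a theorem, proved by induction on xs.
```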

5.2 Finite Mutable State

For \(n \in \mathbb N\), if we take \(PX = n\), that is, \(d=1\), \(c_1 = n\) and \(e_1 = 0\), we obtain the equational theory of a single mutable cell in which the set of possible states is \(\{ 1, \ldots , n \}\). There are two sorts in the theory: \(\varOmega \) and \(K\). The sort \(K\) does not have any interesting structure on its own, as there are no constants \(\varepsilon \), and the equation (eta-\(\varepsilon \)) instantiates to

$$\begin{aligned} \gamma (x) = x, \end{aligned}$$

which means that \(\gamma \) is necessarily an identity. The fact that this theory is indeed the theory of state becomes apparent when we identify \(\varOmega \) as a sort of computations that require some initial state to proceed, and \(K\) as computations that produce a final state. Then, the operations \(\pi ^j : \varOmega \rightarrow K\) (\(j \le n\)) are the “update” operations, where \(\pi ^j\) sets the current state to j, while \(\mathsf {cons}: \prod ^n K\rightarrow \varOmega \) is the “lookup” operation, in which the j-th argument is the computation to be executed if the current state is j. The equations (beta-\(\pi \)), for all \(j \le n\), and (eta-\(\pi \)) state respectively:

$$\begin{aligned} \pi ^j(\mathsf {cons}([ x_i ]_{ i \le n })) = x_j, \quad \mathsf {cons}([ \pi ^i(x) ]_{ i \le n }) = x \end{aligned}$$

These equations embody the natural behaviour rules for this limited form of state. The former reads that setting the current state to j and then proceeding with the computation \(x_i\) if the current state is i is the same thing as simply proceeding with \(x_j\) (note that \(x_j\) is of the sort \(K\), hence it does not use the information that the current state has just been updated to j, so there is no need to keep the \(\pi ^j\) operation on the right-hand side of the equation). The latter states that if the current state is i and we set the current state to i, it is the same thing as not changing the state at all (note that x does not depend on the current state, as it is the same in every argument of \(\mathsf {cons}\)).
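A toy Haskell model makes these rules concrete (a sketch with hypothetical names; states are modelled as 0-based indices, and the argument list of `cons` is assumed to have one entry per state):

```haskell
-- K-computations produce a final state; Omega-computations consume one.
type K     a = (a, Int)
type Omega a = Int -> (a, Int)

-- "update": set the current state to j before proceeding
update :: Int -> Omega a -> K a
update j omega = omega j

-- "lookup": branch on the current state; one argument per state
cons :: [K a] -> Omega a
cons ks j = ks !! j
```

In this model the (beta-\(\pi \)) law, `update j (cons ks) = ks !! j`, holds by computation, and (eta-\(\pi \)) holds pointwise.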

Interestingly, the presentations of equational theories for state in the literature (for example, [7, 23]) are all single-sorted. Such a setting can be recovered by defining the following macro-operations on the sort \(\varOmega \):

The trick here is that the \(\mathsf {get}\) operation does not change the state (by setting the new state to the current one), while \(\mathsf {put}\) does not depend on the current state (by having the same computation in every argument of \(\mathsf {cons}\)). The usual four equations for the interaction of \(\mathsf {put}\) and \(\mathsf {get}\) can be obtained by unfolding the definitions and using the (beta-\(\pi \)) and (eta-\(\pi \)) equations:
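The definitions are determined, up to notation, by this description; a plausible reconstruction (ours, not quoted from the original figure) is:

$$\begin{aligned} \mathsf {get}([ x_i ]_{ i \le n }) = \mathsf {cons}([ \pi ^i(x_i) ]_{ i \le n }), \qquad \mathsf {put}^j(x) = \mathsf {cons}([ \pi ^j(x) ]_{ i \le n }) \end{aligned}$$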

The connection with the implementation of state in programming becomes evident when we take a closer look at the endofunctor of the induced monad from Theorem 4. Consider the following informal calculation:

[figure omitted: informal calculation]

This means that not only do we prove that the equational theory corresponds to the usual state monad, but we can actually derive the implementation of state as the endofunctor \(A \mapsto (n \rightarrow A \times n)\).
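This calculation can be transcribed into Haskell directly (a sketch; the finite state set \(\{1,\ldots ,n\}\) is modelled loosely as `Int`, and all names are hypothetical):

```haskell
-- The derived endofunctor A ↦ (n → A × n) as a state monad.
newtype State a = State { runState :: Int -> (a, Int) }

get :: State Int
get = State (\s -> (s, s))       -- look up the state, leave it unchanged

put :: Int -> State ()
put s' = State (\_ -> ((), s'))  -- discard the old state, install s'

instance Functor State where
  fmap f (State m) = State (\s -> let (a, s1) = m s in (f a, s1))

instance Applicative State where
  pure a = State (\s -> (a, s))
  State mf <*> State ma =
    State (\s -> let (f, s1) = mf s; (a, s2) = ma s1 in (f a, s2))

instance Monad State where
  State m >>= k = State (\s -> let (a, s1) = m s in runState (k a) s1)
```

The four put/get interaction laws mentioned above hold in this model by straightforward equational reasoning.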

5.3 Backtracking with Local State

We obtain one way to combine nondeterminism with state using the functor \(PX = n \times X\), for \(n \in \mathbb N\), that is, \(d=1\), \(c_1 = n\) and \(e_1 = 1\). It has two sorts, \(\varOmega \) and \(K\), which play roles similar to those detailed in the previous section. However, this time \(K\) additionally has the structure of a monoid. This gives us the theory of backtracking with local state, which means that whenever we make a choice using the \(\gamma \) operation, the computations in each argument carry separate, non-interfering states. In particular, in a computation \(\gamma (x,y)\), both subcomputations x and y start with the same state, which is the initial state of the entire computation. This non-interference is guaranteed simply by the system of sorts: the arguments of \(\gamma \) are of the sort \(K\), which means that the stateful computations inside the arguments begin with \(\pi \), which sets a new state.

We can also obtain a single-sorted theory, similar to the case of the pure state. To the \(\mathsf {put}\) and \(\mathsf {get}\) macro-operations, we add choice and failure as follows:

Then, the locality of state can be summarised by the following equality, which is easy to show using the (beta-\(\pi \)) and (eta-\(\pi \)) equations:

$$\begin{aligned}&\mathsf {put}^k(\mathsf {choose}(x,y)) = \mathsf {choose}(\mathsf {put}^k(x),\mathsf {put}^k(y)) \end{aligned}$$
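The law holds, for instance, in the familiar Haskell composite `s -> [(a, s)]` (the `StateT s []` shape); the following sketch, with hypothetical names, is an illustration of the locality law rather than a construction taken from the text:

```haskell
-- Backtracking with local state: each branch threads its own copy of the state.
newtype Local s a = Local { runLocal :: s -> [(a, s)] }

choose :: Local s a -> Local s a -> Local s a
choose (Local f) (Local g) = Local (\s -> f s ++ g s)

failL :: Local s a
failL = Local (\_ -> [])

putL :: s -> Local s a -> Local s a
putL s' (Local f) = Local (\_ -> f s')
```

Unfolding either side of the locality law at any initial state yields `f s' ++ g s'`, so the equation holds definitionally in this model.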

5.4 Backtracking with Global State

Another way to compose nondeterminism and state is by using global state, which is obtained for \(n \in \mathbb N\) and \(PX = X^n\), that is, \(d=1\), \(c_1 = 1\), and \(e_1 = n\). As in the case of pure backtracking nondeterminism, the fact that \(c_1 = 1\) means that the sorts \(\varOmega \) and \(K\) are isomorphic. The intuitive understanding of the expression \(\gamma (x, [ y_i ]_{ i \le n })\) is: first perform the computation x, and then the computation \(y_i\), where i is the final state of the computation x. The operation \(\varepsilon ^j\) is: fail, but set the current state to j. In this case, the equations (beta-\(\varepsilon \)) instantiate to the following for all \(j \le n\):

$$\begin{aligned} \gamma (\varepsilon ^j, [ y_i ]_{ i \le n }) = y_j \end{aligned}$$

This equation states that if the first computation fails but sets the state to j, the next step is to try the computation \(y_j\). Note that there is no way to set a new state other than via failure, but this can be circumvented by using \(\gamma (x,[ \varepsilon ^k ]_{ i \le n })\) to set the state to k after performing x. The (eta-\(\varepsilon \)) equation instantiates to:

$$\begin{aligned} \gamma (x,[ \varepsilon ^j ]_{ j \le n }) = x \end{aligned}$$

This reads that if we execute x and then set the current state to the resulting state of x, it is the same as just executing x.
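One plausible Haskell rendering of this theory, in which a single state threads through the list of results, is sketched below. The representation `Int -> ([a], Int)` and all names are our assumptions for illustration, not constructions taken from the text:

```haskell
-- Backtracking with global state, sketched as s -> ([a], s); states are Int.
newtype Global a = Global { runGlobal :: Int -> ([a], Int) }

-- eps j: fail, but set the current state to j
eps :: Int -> Global a
eps j = Global (\_ -> ([], j))

-- gamma x ys: run x, then the continuation selected by x's final state
gamma :: Global a -> (Int -> Global a) -> Global a
gamma (Global x) ys = Global (\s ->
  let (as, s1) = x s
      (bs, s2) = runGlobal (ys s1) s1
  in (as ++ bs, s2))
```

In this sketch the (beta-\(\varepsilon \)) and (eta-\(\varepsilon \)) instances above can be checked by unfolding the definitions.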

6 Direct-Style Implementation

Free algebras of the theory \(\mathfrak T\) from Definition 5 can also be presented as terms of a certain shape. They are best described as terms built using the operations from \(\mathfrak T\) that are well-typed according to the following typing rules, where the types are called \(\varOmega \), \(K_i\), and \(P_i\) for \(i \le d\). The type of the entire term is \(\varOmega \), and \(\textsc {Var}(x)\) means that x is a variable.

[figure omitted: typing rules]

Note that even though variables appear as arguments to the operations \(\pi \), they are not of the type \(\varOmega \). This means that the entire term cannot be a variable, as it is always constructed with \(\mathsf {cons}\) as the outermost operation. Each argument of \(\mathsf {cons}\) is a term of the type \(K_i\) for an appropriate i, which is built out of the operations \(\varepsilon \) and \(\gamma \). Note that the first argument of \(\gamma \) is always a variable wrapped in \(\pi \), while all the other arguments are again terms of the type \(K_i\). Overall, such terms can be captured as the following endofunctors on \(\mathbf {Set}\), where \(W^i\) represents terms of the type \(K_i\), while \(W^\varOmega \) represents terms of the type \(\varOmega \). By \(\mu Y. G Y\) we mean the carrier of the initial algebra of an endofunctor G.

$$\begin{aligned}&W^i X = \mu Y. e_i + \textstyle \sum _{j=1}^d \left( \textstyle \sum ^{c_j} X \right) \times \textstyle \prod ^{e_j} Y \\&W^\varOmega X = \textstyle \prod _{i=1}^d \textstyle \prod ^{c_i} W^i X \end{aligned}$$

Clearly, \(e_i\) in the definition of \(W^i\) represents the \(\varepsilon _i\) constants, while the second component of the coproduct is a choice between the \(\gamma _i\) operations with appropriate arguments.
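As a sanity check, instantiating these definitions for the theory of monoids from Sect. 5.1 (\(d = 1\), \(c_1 = 1\), \(e_1 = 1\)) recovers the list monad:

$$\begin{aligned} W^1 X = \mu Y. 1 + X \times Y \cong \mathrm {List}\, X, \qquad W^\varOmega X = \textstyle \prod ^1 W^1 X \cong \mathrm {List}\, X \end{aligned}$$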

Every term of the sort \(\varOmega \) can be normalised to a term of the type \(\varOmega \) by a term-rewriting system obtained by orienting the “beta” and “assoc” equations left to right, and eta-expanding variables at the top level:

This term rewriting system gives rise to a natural implementation of the monadic structure, where the “beta” and “assoc” rules normalise the two-level term structure, thus implementing the monadic multiplication, while the eta-expansion rule implements the monadic unit.
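For the list instance of Sect. 5.1, this reading yields the familiar structure (a sketch with our own names):

```haskell
-- Monadic structure read off the rewriting system, for the list case:
-- eta-expansion of a variable is the unit, and normalisation of a
-- two-level term by the "beta"/"assoc" rules is the multiplication.
unitL :: a -> [a]
unitL x = [x]

multL :: [[a]] -> [a]
multL = foldr (++) []
```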

7 Discussion

The idea of employing Cayley representations to explore implementations of monads induced by equational theories is inspired by Hinze [8], who suggested a connection between codensity monads, the Church representation of lists, and the Cayley theorem for monoids. We note that Hinze’s discussion is informal; he suggests using ends, which, as we discuss in Sect. 2, is not sound.

Most related work follows one of two main paths: it either concentrates on algebraic explanations of monads already used in programming and semantics (for example, [11, 19, 23]), or on the general connection between different kinds of algebraic theories and computational effects, without much interest in whether this connection leads to structures implementable in a programming language. Some exceptions are the construction of the sum of a theory and a free theory [9] and the sum of ideal monads [6]. What we propose in Sect. 4 is a form of “functional combinatorics”: given a type, what kind of algebra describes the possible values?

As our approach veers off the main paths of the recent work on effects, there are many possible directions of future work. One interesting direction would be to generalise \(\mathbf {Set}\), the base category used throughout this paper, to more abstract categories. After all, we want to talk about structures definable only in terms of (co)products, exponentials, and quantifiers—which are all constructions whose universal properties are singled out and explored using (co)cartesian (or even monoidal) closed categories. However, the current development relies heavily on the particular properties of \(\mathbf {Set}\), such as extensional equality of functions, which appears in disguise in the condition (f) in Definition 2.

One can also try to extend the type used as a Cayley representation. For example, we could consider the polynomial \(P\) in (3) to range over the space of all sets, that is, allow the coefficients \(c_i\) to vary over sets rather than natural numbers. In the Cayley representation, it would be enough to consider functions from \(c_i\) in place of \(c_i\)-fold products. We would immediately gain expressiveness, as the obtained state monad would no longer need to be defined only for a finite set of possible states. On the flip side, this would make the resulting theory infinitary, which, of course, is not uncommon in the field of algebraic treatment of computational effects. However, we stick to the simplest possible setting in this paper, which greatly simplifies the presentation, yet still yields some novel observations, such as the fact that the theory of finite state is simply the theory of 2-sorted tuples in Sect. 5.2, or the novel theory of backtracking nondeterminism with global state in Sect. 5.4. Other future extensions that we believe are worth exploring include iterating the construction to obtain a form of distributive tensor (compare Rivas et al.’s [25] “double” representation of near-semirings) and quantifying over more variables, leading to less interaction between sorts.