1 Introduction

Genetic recombination describes the reshuffling of genetic information that occurs during the reproductive cycle of (sexual) organisms; it is one of the major mechanisms that maintain genetic diversity within populations. One of the standard models for its description is the deterministic recombination equation in continuous time, in the following simply referred to as recombination equation; for background, see [7]. This equation describes the evolution of the distribution of types in a (haploid) population under the assumption of random mating, while neglecting stochastic fluctuations.

In the case of finite sets of alleles at an arbitrary (but finite) number of sites, the recombination equation was reinterpreted in [12] as the law of mass action of a network of chemical reactions between gametes. This reaction network was shown to be strongly reversible, and the general theory of chemical reaction networks [14, 16] thus implies that it admits a representation as a generalised gradient system with respect to entropy [11]. This strengthens an earlier result by Akin [1, Thm. III.2.5] that entropy is a strong Lyapunov function for recombination; a somewhat weaker statement can be found in [13, Thm. 6.3.5].

A generalised version of the model, which allows for a more general reproduction mechanism (involving an arbitrary number of parents) as well as more general (not necessarily discrete) type spaces, was considered in [5]. There, the authors reduced the original measure-valued, infinite-dimensional equation via a suitable ansatz function to a finite-dimensional, albeit still nonlinear, system. This system was then analysed using lattice-theoretic techniques, leading to an explicit recursion formula for its solution. A somewhat different, but simpler approach can be found in [2], which relates the evolution of the type distribution, forward in time, to an ancestral partitioning process backward in time. More precisely, the solution of the recombination equation is expressed in terms of the solution of the (linear) differential equation for the law of the partitioning process.

One obvious question is now whether this more general model can also be represented as a strongly reversible chemical reaction network, and, consequently, as a generalised gradient system like the more classical version, which involves only two parents and finite type spaces.

We shall see that, in the case of finite type spaces, the answer is affirmative, generalising the results of Müller and Hofbauer [12] to the multi-parent case. Moreover, we will see that the dynamics of the law of the partitioning process [2, 4], which is independent of the type space, can be rewritten as a generalised (linear) gradient system.

The paper is organised as follows. First, we recall and explain the recombination equation for multiple parents. The connection to chemical reaction networks is established in Sect. 3, for finite type spaces. Section 4 contains the results on the gradient structure of this network, and Sect. 5 explains the gradient structure of a particular class of Markov chains, which covers the partitioning process.

2 The recombination equation

For our purposes, a genetic type is a sequence \( x = (x_1,\ldots ,x_n) \in X \mathrel {\mathop :}=X_1 \times \cdots \times X_n \) of fixed length n, where \(X_1,\ldots ,X_n\) are locally compact Hausdorff spaces. The evolution of the (gametic) type distribution of the population is modelled as a differentiable one-parameter family \(\omega = (\omega _t^{})_{t \geqslant 0}\) of (Borel) probability measures on X. This generality is useful in the context of quantitative genetics, when modelling the evolution of quantitative, polygenic traits such as body size, brain volume, growth rate or milk production; cf. [7, Ch. IV].

To understand the dynamics of \(\omega \), let us start on the level of individuals. When two or more parents jointly produce an offspring, a partition of \(S := \{1,\ldots ,n\}\), the set of genetic sites (or loci), describes how the type of that offspring is pieced together from the types of its parents. Recall that a partition of a set M is a collection of pairwise disjoint, non-empty subsets, called blocks, whose union is M. Whenever we enumerate the blocks of a partition of S, that is, whenever we write \( \mathcal {A}= \{A_1,\ldots ,A_{|\mathcal {A}|}^{}\}, \) where \(|\mathcal {A}|\) is the number of blocks in \(\mathcal {A}\), we order the blocks such that \(A_1\) is the block that contains 1 and, for all \(2 \leqslant k \leqslant |\mathcal {A}|\), \(A_k\) is the block that contains the smallest element not contained in \(\bigcup _{j = 1}^{k-1} A_j\).

Formally, given k parent individuals of types \(x^{(1)},\ldots ,x^{(k)}\), (where \(x^{(i)} = \big (x^{(i)}_1,\ldots ,x^{(i)}_n \big )\) for all \(1 \leqslant i \leqslant k\)), and a partition \(\mathcal {A}= \{A_1,\ldots ,A_k \}\) of S into k blocks, the type of the offspring is

$$\begin{aligned} y = \pi _{A_1}^{} \big (x^{(1)} \big ) \sqcup \cdots \sqcup \pi _{A_k}^{} \big (x^{(k)} \big ). \end{aligned}$$
(1)

Here, for each \(A \subseteq S\), \(\pi _{A}^{}\) is the projection \( (x_i)_{i \in S} \mapsto (x_i)_{i \in A} \) and the symbol \(\sqcup \) denotes the joining of gene fragments, respecting the order of the sites; given pairwise disjoint subsets \(U_1,\ldots ,U_k\) of S and sequences \(x^{(\ell )}\) in \(\pi ^{}_{U_\ell } (X)\) with \(1 \leqslant \ell \leqslant k\), we write \( x^{(1)} \sqcup \cdots \sqcup x^{(k)} \) for the sequence indexed by \(U_1 \cup \cdots \cup U_k\) that has at site i the letter \(x^{(j)}_i\), where \(U_j\) is the unique subset that contains i. In particular, in Eq. (1), \(y = (y_1,\ldots ,y_n)\) where \(y_i = x^{(j)}_i\) if \(i \in A_j\).

For any \(U \subseteq S\), we denote by

$$\begin{aligned} X_U^{} \mathrel {\mathop :}=\prod _{i \in U} X_i \end{aligned}$$

the marginal type space with respect to U.

In the following, we denote by \(\varvec{P}(S)\) the set of all partitions of S, not to be confused with the set \(\mathcal {P}(X)\) of all (Borel) probability measures on X.

To express the effect of recombination on the type distribution in a concise way, we define for any \(\mathcal {A}\in \varvec{P}(S)\) a (nonlinear) operator on \(\mathcal {P}(X)\) by

$$\begin{aligned} \mathcal {R}_\mathcal {A}(\nu ) \mathrel {\mathop :}=\bigotimes _{i=1}^{|\mathcal {A}|}\pi ^{}_{A_i} . \nu . \end{aligned}$$

Here, \(\bigotimes \) denotes measure product and the dot denotes the push-forward of probability measures; i.e., \( \pi _A^{} . \nu (E) \mathrel {\mathop :}=\nu \big (\pi ^{-1}_A (E) \big ), \) for any Borel measurable subset \(E \subseteq X_A^{}\) and \(\nu \in \mathcal {P}(X)\). The operators \(\mathcal {R}_\mathcal {A}^{}\) are called recombinators [3]. Clearly, \(\mathcal {R}_\mathcal {A}^{} (\nu )\) is the distribution of the type of the joint offspring of \(|\mathcal {A}|\) parents whose types are drawn independently from \(\nu \), where the letters at two different sites k and \(\ell \) come from the same parent if and only if k and \(\ell \) are in the same block of \(\mathcal {A}\); we call such an offspring \(\mathcal {A}\)-recombined. If X is finite, the recombinator can also be written as follows.

Lemma 1

Assume that X is finite. Then, for all \(\nu \in \mathcal {P}(X)\) and all \(\mathcal {A}= \{A_1,\ldots ,A_k\} \in \varvec{P}(S)\), we have

$$\begin{aligned} \mathcal {R}_{\mathcal {A}} (\nu ) = \sum _{x^{(1)}_{},\ldots ,x^{(|\mathcal {A}|)}_{} \in X} \nu \big (x^{(1)}_{} \big )\cdot \ldots \cdot \nu \big (x^{(|\mathcal {A}|)}_{} \big ) \delta _{\bigsqcup _{i = 1}^{|\mathcal {A}|} \pi _{A_i}^{} (x^{(i)} )}^{}. \end{aligned}$$

Proof

Let us write \({\tilde{\mathcal {R}}}\) for the map on \(\mathcal {P}(X)\) defined by the right-hand side. Then, for all \(y \in X\),

$$\begin{aligned} {\tilde{\mathcal {R}}}(\nu ) (y) = \sum _{\begin{array}{c} x^{(1)}_{},\ldots ,x^{(k)}_{} \in X \\ \pi ^{}_{A_i} (x^{(i)}) = \pi ^{}_{A_i}(y) \, \forall i \end{array}} \nu \big (x^{(1)}_{} \big )\cdot \ldots \cdot \nu \big (x^{(k)}_{} \big ) = \prod _{i = 1}^k \nu \Big (\pi _{A_i}^{-1}\big (\pi ^{}_{A_i}(y)\big ) \Big ), \end{aligned}$$

which implies the identity claimed. \(\square \)
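Since X is finite, Lemma 1 can also be verified directly on a computer. The following Python sketch is our own illustration (all function and variable names are ours, not from the sources cited); it evaluates \(\mathcal {R}_\mathcal {A}(\nu )\) for a measure \(\nu \) stored as a dictionary mapping type tuples to probabilities.

```python
from itertools import product

def recombinator(nu, partition, n):
    """R_A(nu) as in Lemma 1: one parent per block of A, drawn
    independently from nu; the offspring inherits the letters in
    block A_i from the i-th parent."""
    out = {}
    types = list(nu)
    for parents in product(types, repeat=len(partition)):
        weight = 1.0
        for x in parents:
            weight *= nu[x]
        child = [None] * n
        for block, x in zip(partition, parents):
            for i in block:          # sites are 0-based here
                child[i] = x[i]
        child = tuple(child)
        out[child] = out.get(child, 0.0) + weight
    return out

# two diallelic sites, A = {{1},{2}} (0-based blocks [0] and [1])
nu = {(0, 0): 0.5, (1, 1): 0.5}
print(recombinator(nu, [[0], [1]], 2))
# -> the product of the marginals: every type has mass 0.25
```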

Remark 2

As expressions of the form

$$\begin{aligned} \delta ^{}_{\bigsqcup _{i=1}^k \pi _{A_i}^{} ( x^{(i)})} \end{aligned}$$

are quite cumbersome, we simplify the notation by formally identifying each element m in a finite set M with the associated point (or Dirac) measure \(\delta _m\). Under this convention, the statement from Lemma 1 reads

$$\begin{aligned} \mathcal {R}_{\mathcal {A}} (\nu ) = \sum _{x^{(1)}_{},\ldots ,x^{(|\mathcal {A}|)}_{} \in X} \nu \big (x^{(1)}_{} \big )\cdot \ldots \cdot \nu \big (x^{(|\mathcal {A}|)}_{} \big ) \bigsqcup _{i = 1}^{|\mathcal {A}|} \pi _{A_i}^{} (x^{(i)} ). \end{aligned}$$

Put differently, we identify the vector space of finite signed measures on M with the vector space \(\mathbb {R}\,^M\) of formal sums of the elements of M. Unless stated otherwise, all vectors are interpreted as column vectors. This entails that the standard scalar product \(\langle v, w \rangle \) of any two vectors v and w can be written as \(v^{\mathsf {T}}w\) (where \({\mathsf {T}}\) denotes transposition) whereas \(v w^{\mathsf {T}}\) denotes the matrix that maps any other vector u to \(\langle w,u \rangle v\). \(\diamondsuit \)

Before we continue, we will need to recall from [2] a few additional notions around partitions. Given two partitions \(\mathcal {A}\) and \(\mathcal {B}\), we say that \(\mathcal {A}\) is finer than \(\mathcal {B}\) and write \(\mathcal {A}\preccurlyeq \mathcal {B}\) if and only if every block of \(\mathcal {A}\) is a subset of some block of \(\mathcal {B}\); this defines a partial order on \(\varvec{P}(S)\), and we denote the unique minimal (maximal) element by \(\underline{ 0}\mathrel {\mathop :}=\{\{i\} \mid i \in S\}\) (\(\underline{ 1}\mathrel {\mathop :}=\{S\}\)). The coarsest common refinement of \(\mathcal {A}\) and \(\mathcal {B}\) is defined as

$$\begin{aligned} \mathcal {A}\wedge \mathcal {B}:= \{A \cap B \mid A \in \mathcal {A}, B \in \mathcal {B}\} \setminus \{\varnothing \}; \end{aligned}$$

it is the largest (i.e. coarsest) element of \(\varvec{P}(S)\) that is finer than or equal to both \(\mathcal {A}\) and \(\mathcal {B}\). Furthermore, given a partition \(\mathcal {A}\) of S and some subset \(U \subseteq S\), we denote by

$$\begin{aligned} \mathcal {A}|_U := \{ A \cap U \mid A \in \mathcal {A}\} \setminus \{ \varnothing \} \end{aligned}$$

the partition of U induced by \(\mathcal {A}\).

We are now ready to state the recombination equation [5, Eq. (7)],

$$\begin{aligned} \dot{\omega _t} = \sum _{\mathcal {A}\in \varvec{P}(S)} \varrho (\mathcal {A}) \big (\mathcal {R}_\mathcal {A}(\omega _t) - \omega _t \big ), \end{aligned}$$
(2)

where the \(\varrho (\mathcal {A})\) are non-negative real numbers, called recombination rates. This equation expresses that in each infinitesimal time interval \([t,t + \, \mathrm {d}t]\), for each \(\mathcal {A}\in \varvec{P}(S)\), each individual is with probability \(\varrho (\mathcal {A}) \, \mathrm {d}t\) replaced by a new \(\mathcal {A}\)-recombined offspring, distributed as \(\mathcal {R}_\mathcal {A}^{}(\omega _t^{})\). In other words, the current type distribution \(\omega _t^{}\) is replaced by the convex combination

$$\begin{aligned} \Big (1 - \sum _{\mathcal {A}\in \varvec{P}(S)} \varrho (\mathcal {A}) \, \mathrm {d}t \Big ) \omega _t + \sum _{\mathcal {A}\in \varvec{P}(S)} \varrho (\mathcal {A}) \big (\mathcal {R}^{}_{\mathcal {A}} (\omega ^{}_t) \big ) \, \mathrm {d}t. \end{aligned}$$
(3)
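For finite X, Eq. (2) is a finite-dimensional ODE, and the Euler-step interpretation (3) yields a simple numerical scheme. A minimal sketch (ours, reusing the `recombinator` function from the sketch above; partitions are encoded as tuples of tuples of 0-based sites):

```python
def recombination_rhs(nu, rates, n):
    """Right-hand side of Eq. (2): the sum over partitions A of
    rho(A) * (R_A(nu) - nu)."""
    rhs = {x: 0.0 for x in nu}
    for partition, rho in rates.items():
        r = recombinator(nu, [list(b) for b in partition], n)
        for x in set(rhs) | set(r):
            rhs[x] = rhs.get(x, 0.0) + rho * (r.get(x, 0.0) - nu.get(x, 0.0))
    return rhs

def euler_step(nu, rates, n, dt):
    """One step of the convex combination (3)."""
    rhs = recombination_rhs(nu, rates, n)
    return {x: nu.get(x, 0.0) + dt * v for x, v in rhs.items()}

nu = {(0, 0): 0.5, (1, 1): 0.5}
rates = {((0,), (1,)): 1.0}          # crossover partition {{1},{2}}, rate 1
for _ in range(1000):
    nu = euler_step(nu, rates, 2, 0.01)
print(nu)  # approaches linkage equilibrium: all four types near 0.25
```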

Remark 3

For the reader familiar with stochastic models for finite population size, we remark that Eq. (2) may alternatively be obtained from the Moran model with recombination via a dynamic law of large numbers. This is because the Moran models with growing population size form a so-called density dependent family; see [9, Thm. 11.2.1]. \(\diamondsuit \)

It turns out that the nonlinear equation Eq. (2) can, in fact, be related to a linear system, via an ancestral partitioning process that runs backward in time. Here, we only give a brief sketch of the idea; for further background on genealogical methods in the context of recombination, the reader is referred to the excellent review [4].

The partitioning process is a Markov chain \(\Sigma = (\Sigma _t)_{t \geqslant 0}\) in continuous time with state space \(\varvec{P}(S)\) and is most easily understood when started in \(\Sigma _0 = \underline{ 1}\). Assume that we want to sample the type of a single individual (Alice, say) that is alive at time T. Now, for \(0 \leqslant t \leqslant T\), each block \(\sigma \) of \(\Sigma _t\) corresponds to an independent ancestor of Alice that lived at time \(T - t\) and from whom she inherited the letters at the sites in \(\sigma \). At time \(T = T - 0\), Alice herself is alive and corresponds to the unique block of \(\Sigma _0^{} =\underline{ 1}\).

To understand the time evolution of \(\Sigma \), keep in mind that every block in \(\Sigma _{t}\) corresponds to one of Alice’s ancestors. Recall that, by our interpretation of Eq. (2), every individual alive at time \(T - t\) was with probability \(\varrho (\mathcal {B}) \, \mathrm {d}t\) a \(\mathcal {B}\)-recombined offspring of parents alive at time \(T - t - \, \mathrm {d}t\). For the evolution of the partitioning process, this means that, in the infinitesimal time step from t to \(t + \, \mathrm {d}t\), a block \(A \in \Sigma _t\) is replaced, with probability \(\varrho (\mathcal {B}) \, \mathrm {d}t\), by the blocks of the induced partition \(\mathcal {B}|^{}_A\), which reflects the partitioning of the genome of the corresponding ancestor of Alice across its own parents. More formally, \(\Sigma \) is a Markov chain in continuous time with rate matrix \(\mathcal {Q}\), where

$$\begin{aligned} \mathcal {Q}(\mathcal {A},\mathcal {B}) \mathrel {\mathop :}={\left\{ \begin{array}{ll} \varrho ^{A}_{\mathcal {B}|^{}_A}, &{}\quad \text {if } \mathcal {B}\ne \mathcal {A}\text { and } \mathcal {B}= \big (\mathcal {A}\setminus \{A\} \big ) \cup \mathcal {B}|^{}_A \text { for some } A \in \mathcal {A}, \\ - \sum _{\mathcal {C}\ne \mathcal {A}} \mathcal {Q}(\mathcal {A},\mathcal {C}), &{}\quad \text {if } \mathcal {B}= \mathcal {A}, \\ 0, &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(4)

Here, the marginal recombination rates are given by

$$\begin{aligned} \varrho ^{A}_{\mathcal {B}|^{}_A} \mathrel {\mathop :}=\sum _{\begin{array}{c} \mathcal {C}\in \varvec{P}(S) \\ \mathcal {C}|^{}_{A} = \mathcal {B}|^{}_A \end{array}} \varrho (\mathcal {C}). \end{aligned}$$

Formalising our verbal discussion above gives the following stochastic representation

$$\begin{aligned} \omega _t^{} = \mathbb {E}\big [ \mathcal {R}_{\Sigma _t^{}}^{} (\omega _0) \mid \Sigma _0^{} = \underline{ 1}\big ] \end{aligned}$$
(5)

of the solution \(\omega \) of Eq. (2). More generally, for an arbitrary initial state of the partitioning process, one has the duality relation

$$\begin{aligned} \mathcal {R}_\mathcal {A}^{} (\omega _t) = \mathbb {E}\big [ \mathcal {R}_{\Sigma _t^{}}^{} (\omega _0) \mid \Sigma _0^{} = \mathcal {A}\big ]. \end{aligned}$$

Put differently, Eq. (5) implies that any solution of Eq. (2) is of the form

$$\begin{aligned} \omega _t^{} = \sum _{\mathcal {A}\in \varvec{P}(S)} b_t(\mathcal {A}) \mathcal {R}^{}_{\mathcal {A}} (\omega _0), \end{aligned}$$

where the vector \(b_t\) solves the initial value problem for the linear ODE

$$\begin{aligned} \dot{b}_t = \mathcal {Q}^{\mathsf {T}}b_t, \end{aligned}$$

with initial value \(b^{}_0 = \underline{ 1}\).
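For small S, this initial value problem can be solved numerically by matrix exponentiation. A brief sketch (ours), using numpy/scipy and, as a concrete rate matrix, the three-locus generator that reappears in Example 15 below, with the illustrative choice \(\varrho _1^{} = \varrho _2^{} = \varrho _3^{} = 1\) (all other recombination rates set to zero):

```python
import numpy as np
from scipy.linalg import expm

# partitions of {1,2,3}, ordered as in Example 15 below; only the
# two-block partitions carry positive rates here
r1 = r2 = r3 = 1.0
Q = np.array([
    [-(r1 + r2 + r3), r1,         r2,         r3,         0.0],
    [0.0,             -(r2 + r3), 0.0,        0.0,        r2 + r3],
    [0.0,             0.0,        -(r1 + r3), 0.0,        r1 + r3],
    [0.0,             0.0,        0.0,        -(r1 + r2), r1 + r2],
    [0.0,             0.0,        0.0,        0.0,        0.0],
])

b0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # started in 1 = {S}

def b(t):
    """Solves b' = Q^T b with the point measure on {S} as initial value."""
    return expm(t * Q.T) @ b0

print(b(2.0))   # mass flows towards the partition into singletons
```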

3 Chemical reaction networks

Let us recapitulate a few basic notions from chemical reaction network theory, tailored to our purposes. For an introduction, see [10].

Let \(\mathcal {S}\) (not to be confused with S, the set of sequence sites) be a finite set, the elements of which will be thought of as the reacting species in a chemical reaction network (CRN), that is, a finite collection of chemical reactions, which are represented by symbolic expressions of the form

$$\begin{aligned} r_1 + \cdots + r_{m_1} \xrightarrow {\, \kappa \,} s_1 + \cdots + s_{m_2}. \end{aligned}$$
(6)

Here, the \(r_i\) and \(s_i\) are reacting species (not necessarily distinct) and \(\kappa > 0\) is the reaction constant. The left and right-hand sides in Eq. (6) are called the complexes of substrates and products. In our setting, we will always have \(m_1 = m_2 = m\), as we will see later.

Remark 4

Recall from Remark 2 that we formally identified the elements of any finite set with the corresponding point (or Dirac) measures. In this sense, the addition in Eq. (6) can be understood as addition of vectors in the space of signed measures on \(\mathcal {S}\). \(\diamondsuit \)

Of particular interest are strongly reversible CRNs. They are usually defined as CRNs in which the forward reaction constant agrees with the backward reaction constant for every reaction. In the present setting, where we think of reactions as unidirectional, it is more convenient to phrase this slightly differently.

Definition 5

A CRN is called strongly reversible if it can be partitioned into pairs, each consisting of a reaction,

$$\begin{aligned} r_1 + \cdots + r_m \xrightarrow {\, \kappa \,} s_1 + \cdots + s_m, \end{aligned}$$

together with its backward reaction,

$$\begin{aligned} s_1 + \cdots + s_m \xrightarrow {\, \kappa \,} r_1 + \cdots + r_m. \end{aligned}$$

Note that the reaction constant is the same for both reactions.\(\diamondsuit \)

Given a CRN, it is natural to inquire about the dynamics of the probability vector

$$\begin{aligned} c_t = \sum _{s \in \mathcal {S}} c_t(s) \delta _s = \sum _{s \in \mathcal {S}} c_t(s) s \end{aligned}$$

of normalised concentrations of species. As the left and right-hand sides in Eq. (6) contain the same number of reacting species, the total mass is preserved and may therefore be normalised to one.

The law of mass action translates the collection of formal expressions (6) into a system of coupled differential equations for \(c = (c_t)_{t \geqslant 0}\). It assumes that each reaction occurs with a rate that is proportional to the concentration of each of the substrates, and hence to their product; the proportionality factor is the reaction constant \(\kappa \) in Eq. (6). As each reaction decreases the concentration of substrates and increases the concentration of products, we obtain the following system of ordinary differential equations,

$$\begin{aligned} \dot{c_t} = \sum \kappa c_t(r_1) \cdot \ldots \cdot c_t(r_m) \big ( s_1 + \cdots + s_m - r_1 - \cdots - r_m \big ), \end{aligned}$$

where, again, the reacting species \(s_1,\ldots ,s_m\) and \(r_1,\ldots ,r_m\) are identified with the corresponding point measures \(\delta ^{}_{s_1},\ldots ,\delta ^{}_{s_m}\) and \(\delta ^{}_{r_1},\ldots ,\delta ^{}_{r_m}\) and the sum is taken over all reactions that make up the CRN. We refer the interested reader to [9, Ex. 11.1.C] for a probabilistic variation on this theme.
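As a small illustration (ours, not taken from [10]), the following sketch assembles the mass-action right-hand side for a CRN given as a list of triples (substrates, products, \(\kappa \)). The example at the end encodes one reaction per ordered pair of parents for the two-locus crossover network discussed in the next paragraphs; its output coincides with \(\varrho \, \big (\mathcal {R}_\mathcal {A}(c) - c \big )\) for \(\mathcal {A}= \{\{1\},\{2\}\}\).

```python
def mass_action_rhs(c, reactions):
    """dc/dt: each reaction fires at rate kappa times the product of its
    substrate concentrations and moves mass from substrates to products."""
    rhs = {s: 0.0 for s in c}
    for substrates, products, kappa in reactions:
        rate = kappa
        for r in substrates:
            rate *= c[r]
        for s in products:
            rhs[s] = rhs.get(s, 0.0) + rate
        for r in substrates:
            rhs[r] = rhs.get(r, 0.0) - rate
    return rhs

types = [(0, 0), (0, 1), (1, 0), (1, 1)]
rho = 1.0
# one reaction per *ordered* pair of parents, constant rho/2 (cf. below)
reactions = [([x, y], [(x[0], y[1]), (y[0], x[1])], rho / 2)
             for x in types for y in types]
c = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
print(mass_action_rhs(c, reactions))
```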

We now return to recombination. In [12], genetic recombination is treated as a CRN with the types as reacting species, in the special case of two parents. For example, recombination according to \(\mathcal {A}= \{\{1,2\},\{3\}\}\) translates to the reaction

$$\begin{aligned} ({x_1,x_2,x_3}) + ({y_1,y_2,y_3}) \xrightarrow {\, \frac{\varrho (\mathcal {A})}{2} \,} ({x_1,x_2},{y_3}) + ({y_1,y_2},{x_3}). \end{aligned}$$

This describes the process of recombination at the molecular level; first, the parental sequences \(({x_1,x_2,x_3})\) and \(({y_1,y_2,y_3})\) are split in two, according to \(\mathcal {A}\). Then, two new sequences are obtained by joining the leading part of one sequence with the trailing part of the other, and vice versa. For each (ordered) pair of types and each partition \(\mathcal {A}\), the reaction constant is \(\frac{\varrho (\mathcal {A})}{2}\); this is a special case of Theorem 6, which is stated below. In the case when there are more than two parents, the basic idea remains the same; for any partition \(\mathcal {C}\), take \(|\mathcal {C}|\) types, split each of them according to \(\mathcal {C}\), rearrange the parts and join them back together. Note that this last step is somewhat ambiguous; already in the three-parent case, it can be done in at least two different ways: either

$$\begin{aligned} ({x_1,x_2,x_3}) + ({y_1,y_2,y_3}) + ({z_1,z_2,z_3}) \longrightarrow ({x_1},{y_2},{z_3}) + ({y_1},{z_2},{x_3}) + ({z_1},{x_2},{y_3}), \end{aligned}$$
(7)

or

$$\begin{aligned} ({x_1,x_2,x_3}) + ({y_1,y_2,y_3}) + ({z_1,z_2,z_3}) \longrightarrow ({x_1},{z_2},{y_3}) + ({z_1},{y_2},{x_3}) + ({y_1},{x_2},{z_3}), \end{aligned}$$
(8)

One way to resolve this ambiguity is to order the substrates and define the reaction accordingly. Thus, there may be many different reactions with a common complex of substrates. More precisely, for every \(\mathcal {C}\in \varvec{P}(S)\) and each ordered tuple \(\big (x^{(1)},\ldots ,x^{(|\mathcal {C}|)} \big ) \in X^{|\mathcal {C}|}\), we define a chemical reaction via the following graphical construction, illustrated in Fig. 1.

Fig. 1

An illustration of the reaction scheme for \(|\mathcal {C}| = 3\). The types \(x^{(1)}\), \(x^{(2)}\) and \(x^{(3)}\) are each split according to \(\mathcal {C}\) and then joined back together along the connecting lines. For the sake of clarity, the first two columns of the diagram are repeated after the vertical line

First, just as in the two-parent case, the \(|\mathcal {C}|\) types are broken up into their subsequences \(\pi _{C_j}^{} (x^{(i)})\) over the blocks of \(\mathcal {C}\). Then, they are arranged on a two-dimensional, \(|\mathcal {C}|\)-periodic grid (or discrete torus), where \(\pi ^{}_{C_j} \big ( x^{(i)} \big )\) is placed in the i-th column and j-th row. Finally, the products are formed by joining the fragments along each diagonal line, running from north-west to south-east through the grid. Alternatively, one may think about moving the i-th row \(i-1\) places to the left, and then joining the fragments in each column. More formally, every choice of \(\mathcal {C}\) and \(\big (x^{(1)},\ldots ,x^{(|\mathcal {C}|)} \big )\) defines a reaction,

$$\begin{aligned} \sum _{j = 1}^{|\mathcal {C}|} x^{(j)} \xrightarrow {\, \frac{\varrho (\mathcal {C})}{|\mathcal {C}|} \,} \sum _{j=1}^{|\mathcal {C}|} \bigsqcup _{i=1}^{|\mathcal {C}|} \pi _{C_i}^{} \big (x^{(i+j-1)} \big ), \end{aligned}$$
(9)

where the indices are to be read modulo \(|\mathcal {C}|\).
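The construction in Eq. (9) is easy to implement. A short sketch (ours; sites and parents are 0-based, so the donor of block \(C_i\) in the j-th product is parent \((i + j) \bmod |\mathcal {C}|\)):

```python
def reaction_products(parents, partition):
    """Products of reaction (9): the fragments are joined along the
    diagonals of the grid in Fig. 1."""
    k = len(partition)
    n = sum(len(block) for block in partition)
    products = []
    for j in range(k):
        child = [None] * n
        for i, block in enumerate(partition):
            donor = parents[(i + j) % k]
            for site in block:
                child[site] = donor[site]
        products.append(tuple(child))
    return products

x, y, z = (1, 1, 1), (2, 2, 2), (3, 3, 3)
# C = 0 (all singletons); reproduces the right-hand side of Eq. (7)
print(reaction_products([x, y, z], [[0], [1], [2]]))
# -> [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
```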

Notice that the right-hand side depends on the order of the substrates, while the left-hand side is independent of it. For instance, in our earlier example with three sites and three parents (that is, \(\mathcal {C}\) is \(\underline{ 0}\), the partition into singletons), the choice \(x^{(1)} = x, x^{(2)} = y, x^{(3)} = z\) leads to Eq. (7), while exchanging the roles of the second and third type leads to Eq. (8).

Theorem 6

For finite X, Eq. (2) is the law of mass action for the CRN comprising all reactions (9), one for every choice of \(\mathcal {C}\) and \(\big (x^{(1)},\ldots ,x^{(|\mathcal {C}|)} \big )\). More concisely, (2) is equivalent to

$$\begin{aligned} \dot{\omega }_t = \sum _{\mathcal {C}\in \varvec{P}(S)} \; \sum _{x^{(1)},\ldots ,x^{(|\mathcal {C}|)} \in X} \frac{\varrho (\mathcal {C})}{|\mathcal {C}|} \, \omega _t \big (x^{(1)} \big ) \cdot \ldots \cdot \omega _t \big (x^{(|\mathcal {C}|)} \big ) \Bigg ( \sum _{j = 1}^{|\mathcal {C}|} \bigg ( \bigsqcup _{i = 1}^{|\mathcal {C}|} \pi _{C_i}^{} \big (x^{(i + j -1)} \big ) - x^{(j)} \bigg ) \Bigg ). \end{aligned}$$

Proof

We will show that for all \(\mathcal {C}\in \varvec{P}(S)\) and all \(\nu \in \mathcal {P}(X)\), we have

$$\begin{aligned} (\mathcal {R}_\mathcal {C}(\nu ) - \nu ) = \frac{1}{|\mathcal {C}|} \sum _{x^{(1)},\ldots ,x^{(|\mathcal {C}|)}} \nu \big (x^{(1)} \big )\cdot \ldots \cdot \nu \big (x^{(|\mathcal {C}|)} \big ) \sum _{j=1}^{|\mathcal {C}|} \bigg ( \bigsqcup _{i = 1}^{|\mathcal {C}|} \pi _{C_i}^{}\big (x^{(i+j - 1)}_{} \big ) - x^{(j)} \bigg ). \end{aligned}$$

Recall that, by Lemma 1, we have

$$\begin{aligned} \begin{aligned} \mathcal {R}_{\mathcal {C}} (\nu )&= \sum _{x^{(1)},\ldots ,x^{(|\mathcal {C}|)}} \nu \big (x^{(1)} \big )\cdot \ldots \cdot \nu \big (x^{(|\mathcal {C}|)} \big ) \bigsqcup _{i = 1}^{|\mathcal {C}|} \pi _{C_i}^{} \big (x^{(i)} \big ) \\&= \frac{1}{|\mathcal {C}|} \sum _{j=1}^{|\mathcal {C}|} \sum _{x^{(1)},\ldots ,x^{(|\mathcal {C}|)}} \nu \big (x^{(1)} \big )\cdot \ldots \cdot \nu \big (x^{(|\mathcal {C}|)} \big ) \bigsqcup _{i = 1}^{|\mathcal {C}|} \pi _{C_i}^{} \big (x^{(i+j-1)} \big ) \\&= \frac{1}{|\mathcal {C}|} \sum _{x^{(1)},\ldots ,x^{(|\mathcal {C}|)}} \nu \big (x^{(1)} \big )\cdot \ldots \cdot \nu \big (x^{(|\mathcal {C}|)}\big ) \sum _{j=1}^{|\mathcal {C}|} \bigsqcup _{i = 1}^{|\mathcal {C}|} \pi _{C_i}^{}\big (x^{(i+j-1)} \big ). \end{aligned} \end{aligned}$$

Here, we obtain the second equality by replacing the product \( \nu \big (x^{(1)} \big )\cdot \ldots \cdot \nu \big (x^{(|\mathcal {C}|)}\big ) \) with its cyclic permutation \( \nu \big (x^{(1 - j +1)} \big ) \cdot \ldots \cdot \nu \big (x^{(|\mathcal {C}| - j +1)} \big ), \) and subsequently renaming the indices; recall that indices are to be read modulo \(|\mathcal {C}|\). Similarly, keeping in mind that \(\sum _{x \in X} \nu (x) = 1\) because \(\nu \) is a probability measure, we obtain,

$$\begin{aligned} \begin{aligned} \nu = \sum _{x \in X} \nu (x)\, x&= \frac{1}{|\mathcal {C}|} \sum _{j=1}^{|\mathcal {C}|} \sum _{x \in X} \bigg ( \sum _{y \in X} \nu (y) \bigg )^{j-1}\nu (x) \bigg ( \sum _{y \in X} \nu (y) \bigg )^{|\mathcal {C}| - j} x \\&= \frac{1}{|\mathcal {C}|}\sum _{x^{(1)},\ldots ,x^{(|\mathcal {C}|)}} \nu \big (x^{(1)} \big ) \cdot \ldots \cdot \nu \big (x^{(|\mathcal {C}|)} \big ) \big (x^{(1)} + \cdots + x^{(|\mathcal {C}|)} \big ), \end{aligned} \end{aligned}$$

which completes the argument. \(\square \)

We have thus seen that, in the case of finite type spaces, genetic recombination can be reinterpreted as a CRN, also in the case of an arbitrary number of parents. In fact, this network is strongly reversible.

Theorem 7

The CRN from Theorem 6 is strongly reversible in the sense of Definition 5.

Proof

Let \(\mathcal {C}\) be fixed. Define \(\varphi : X^{|\mathcal {C}|} \rightarrow X^{|\mathcal {C}|}\) via

$$\begin{aligned} \varphi \big (x^{(1)},\ldots ,x^{(|\mathcal {C}|)} \big ) \mathrel {\mathop :}=\bigg (\bigsqcup _{i=1}^{|\mathcal {C}|} \pi _{C_i}^{} \big (x^{(i+|\mathcal {C}| - 1)} \big ),\ldots ,\bigsqcup _{i=1}^{|\mathcal {C}|} \pi _{C_i}^{} \big (x^{(i)} \big ) \bigg ). \end{aligned}$$

Note that \(\varphi \big (x^{(1)},\ldots ,x^{(|\mathcal {C}|)} \big )\) contains the products of the reaction defined by \(\mathcal {C}\) together with \(\big (x^{(1)},\ldots ,x^{(|\mathcal {C}|)} \big )\), in reverse order (compare (9)). A short reflection on Fig. 1 reveals that \(\varphi \) is an involution and therefore partitions \(X^{|\mathcal {C}|}\) into orbits that contain either one or two elements. Consider first an orbit with two elements \(\big (x^{(1)},\ldots ,x^{(|\mathcal {C}|)} \big )\) and \(\big (y^{(1)},\ldots ,y^{(|\mathcal {C}|)}\big )\). Then, the associated reactions form a forward-backward reaction pair,

$$\begin{aligned} \sum _{j = 1}^{|\mathcal {C}|} x^{(j)} \xrightarrow {\, \frac{\varrho (\mathcal {C})}{|\mathcal {C}|} \,} \sum _{j = 1}^{|\mathcal {C}|} y^{(j)} \quad \text { and } \quad \sum _{j = 1}^{|\mathcal {C}|} y^{(j)} \xrightarrow {\, \frac{\varrho (\mathcal {C})}{|\mathcal {C}|} \,} \sum _{j = 1}^{|\mathcal {C}|} x^{(j)}. \end{aligned}$$

On the other hand, the reaction defined by a fixed point of \(\varphi \) is void, since its substrate and product complexes agree. \(\square \)
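For small instances, the involution property of \(\varphi \) can also be checked by brute force; a sketch of ours, reusing `reaction_products` from the sketch in Sect. 3:

```python
from itertools import product

def phi(parents, partition):
    """The map from the proof of Theorem 7: the products of (9),
    listed in reverse order."""
    return tuple(reversed(reaction_products(parents, partition)))

partition = [[0], [1], [2]]                   # C = 0, three parents
types = list(product([0, 1], repeat=3))       # two alleles, three sites
assert all(phi(phi(p, partition), partition) == p
           for p in product(types, repeat=3))
print("phi is an involution on X^3 for C = 0")
```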

Next, we consider the connection to gradient systems.

4 Gradient systems

For this section, we need a few basic notions from differential (particularly Riemannian) geometry, which we recall here for the convenience of the reader. For further background, we refer the reader to [15], in particular Chapter 5. For a real-valued differentiable function V, defined on (some subset of) \(\mathbb {R}\,^d\), and a function C with the same domain and values in the symmetric positive semi-definite matrices, we call the ordinary differential equation

$$\begin{aligned} \dot{x} = C(x) \nabla V(x) \end{aligned}$$
(10)

a generalised gradient system (with respect to the potential V). Here,

$$\begin{aligned} \nabla \mathrel {\mathop :}=\sum _{i = 1}^d \hat{e}_i \frac{\partial }{\partial x_i} \end{aligned}$$

is the nabla symbol and \(\{\hat{e}_1,\ldots ,\hat{e}_d\}\) denotes the standard basis of \(\mathbb {R}\,^d\).

Given \(x \in \mathbb {R}\,^d\), a vector v in \(T_x (\mathbb {R}\,^d)\), the tangent space of \(\mathbb {R}\,^d\) at x, and a continuously differentiable curve \(\gamma \) in \(\mathbb {R}\,^d\) with \(\gamma (0) = x\) and \(\gamma '(0) = v\), recall that the directional derivative of V in direction v is given by

$$\begin{aligned} \, \mathrm {d}V(x)(v) \mathrel {\mathop :}=\frac{\, \mathrm {d}}{\, \mathrm {d}t} V \big (\gamma (t) \big )|_{t = 0}. \end{aligned}$$

The one-form \(\, \mathrm {d}V\) is called the exterior derivative of V; note that it can be defined analogously for any real-valued function on a smooth manifold, and, in particular, does not depend on the Euclidean structure of \(\mathbb {R}\,^d\). One has, by an application of the chain rule,

$$\begin{aligned} \, \mathrm {d}V(x)(v) = \sum _{j = 1}^d \gamma '(0)_j \frac{\partial }{\partial x_j} V(x) = \langle \gamma '(0), \nabla V(x) \rangle , \end{aligned}$$
(11)

where \(\langle \cdot , \cdot \rangle \) denotes the standard scalar product on \(\mathbb {R}\,^d\). Replacing the standard scalar product by a general Riemannian metric \(\langle \langle \cdot , \cdot \rangle \rangle _x\) (that is, a positive definite, symmetric bilinear form on the tangent space that varies smoothly with the base point), Eq. (11) can be used to define the gradient of V with respect to this metric [15, Ex. 108], denoted by \(\text {grad}_{\langle \langle \cdot , \cdot \rangle \rangle } (V)\); it is the unique vector field that satisfies

$$\begin{aligned} \, \mathrm {d}V(x)(v) = \langle \langle v, \text {grad}_{\langle \langle \cdot , \cdot \rangle \rangle } (V)(x) \rangle \rangle _x \end{aligned}$$

for all x and v. Geometrically, this means that, unless x is an equilibrium, \(\text {grad}_{\langle \langle \cdot , \cdot \rangle \rangle } (V)(x)\) points in the direction of steepest ascent of V at point x, with respect to the chosen metric. In particular, if C(x) in Eq. (10) is invertible and we consider the metric,

$$\begin{aligned} \langle \langle u, w \rangle \rangle _x := \langle u, C(x)^{-1} w \rangle , \end{aligned}$$

we see that

$$\begin{aligned} \text {grad}_{\langle \langle \cdot , \cdot \rangle \rangle }(V)(x) = C(x) \text {grad}_{\langle \cdot , \cdot \rangle }(V)(x) = C(x) \nabla V(x). \end{aligned}$$

Thus, Eq. (10) can be thought of as a gradient system in the classical sense, if we replace the Euclidean metric on \(\mathbb {R}\,^d\) by a Riemannian one, at least in the case that C(x) is invertible.

The interpretation is somewhat more delicate when C(x) fails to be invertible. Intuitively, one might think of the kernel of C(x) as a set of forbidden directions, and try to restrict attention to submanifolds which partition the space and are at each point x tangent to the image of C. However, this interpretation is only valid when the image of C is integrable in the sense that whenever Y and Z are two vector fields such that \(Y(x) \in \text {Im }C(x)\) and \(Z(x) \in \text {Im }C(x)\) for all x, then also \([Y,Z](x) \in \text {Im }C(x)\) for all x, where [Y, Z] denotes the Lie bracket of Y and Z; this is the content of Frobenius’ theorem [15, Thm. 1.9.2]. The situation when \(\text {Im } C\) is not integrable can be understood via the theory of sub-Riemannian manifolds. Roughly speaking, this theory is concerned with Riemannian metrics which may take the value \(+\,\infty \); see [6] for an overview.

Remark 8

To demonstrate that the condition of integrability is not trivial, consider the following two vector fields on \(\mathbb {R}\,^3\):

$$\begin{aligned} X_1 \mathrel {\mathop :}=\frac{\partial }{\partial x_1} \quad \text { and } \quad X_2 \mathrel {\mathop :}=x_1 \frac{\partial }{\partial x_3} + \frac{\partial }{\partial x_2}. \end{aligned}$$

Note that

$$\begin{aligned} [X_1,X_2] = \frac{\partial }{\partial x_3} \end{aligned}$$

is nowhere in the span of \(X_1\) and \(X_2\); thus, proving integrability in our case (and for the gradient systems arising in chemical reaction network theory in general) might be an interesting question in its own right. \(\diamondsuit \)
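The bracket computation of Remark 8 can be reproduced symbolically, for instance with sympy; a short sketch (ours), representing each vector field by its coefficient vector:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
coords = sp.Matrix([x1, x2, x3])

X1 = sp.Matrix([1, 0, 0])        # d/dx1
X2 = sp.Matrix([0, 1, x1])       # d/dx2 + x1 * d/dx3

def bracket(X, Y):
    """Lie bracket of vector fields on R^3: [X, Y] = (DY) X - (DX) Y."""
    return Y.jacobian(coords) * X - X.jacobian(coords) * Y

print(bracket(X1, X2).T)         # Matrix([[0, 0, 1]]), i.e. d/dx3
```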

We remark that, under the assumption that (10) has a unique equilibrium, the potential V is always a strong (global) Lyapunov function, by which we mean that V is strictly increasing along non-constant solutions. This is because

$$\begin{aligned} \langle \nabla V(x), \dot{x} \rangle = \langle \nabla V(x), C(x) \nabla V(x) \rangle \ge 0, \end{aligned}$$

by the positive semi-definiteness of C(x). Equality holds if and only if \(\nabla V(x)\) is in the kernel of C(x) (implying that \(\dot{x} = 0\)), hence, if and only if the system is in equilibrium.

We have seen in the previous section that the general recombination equation, interpreted as a chemical reaction network, is strongly reversible. Thus, it is a gradient system in the sense of Eq. (10), by standard theory; compare [14, 16], where this is proved in much greater generality. For the sake of completeness, we include the simple proof of this fact, in the special case needed for our purposes.

Theorem 9

The law of mass action for any strongly reversible CRN can be written as a generalised gradient system,

$$\begin{aligned} \dot{c}_t = C(c_t) \nabla F(c_t), \end{aligned}$$

where

$$\begin{aligned} F(c) \mathrel {\mathop :}=- \sum _{s \in \mathcal {S}} \Big ( c(s) \log \big (c(s) \big ) - c(s) \Big ) \end{aligned}$$

is called the negative free energy and C is a continuous function on \(\mathcal {P}(\mathcal {S})\), which is smooth on its interior and takes values in the positive semi-definite matrices.

Proof

Due to strong reversibility (see Definition 5), the law of mass action takes the form

$$\begin{aligned} \dot{c}_t = \sum \kappa \Big (\prod _{i = 1}^m c_t(r_i) - \prod _{i = 1}^m c_t(s_i) \Big ) \sum _{i=1}^m (s_i - r_i), \end{aligned}$$

where the outer sum is taken over all forward-backward reaction pairs in the network. Define for \(x,y \geqslant 0\),

$$\begin{aligned} L(x,y) := \frac{x - y}{\log (x) - \log (y)}. \end{aligned}$$
(12)

It is a straightforward exercise to verify that L, extended by its limits (that is, \(L(x,x) \mathrel {\mathop :}=x\) and \(L(x,0) \mathrel {\mathop :}=L(0,y) \mathrel {\mathop :}=0\)), defines a continuous, non-negative function on \(\mathbb {R}\,_{\geqslant 0}^2\), which is smooth on \(\mathbb {R}\,_{> 0}^2\). Note that

$$\begin{aligned} \nabla F(c) = - \sum _{s \in \mathcal {S}} \log \big (c(s) \big ) s. \end{aligned}$$

Thus, setting (for each forward-backward reaction pair)

$$\begin{aligned} M(c) \mathrel {\mathop :}=\kappa L \Big (\prod _{i = 1}^m c(r_i), \prod _{i = 1}^m c(s_i) \Big ) \bigg (\sum _{i=1}^m (s_i - r_i) \bigg )\bigg (\sum _{i=1}^m (s_i - r_i) \bigg )^{\mathsf {T}}, \end{aligned}$$

we see, by the multiplication rule for the logarithm, that

$$\begin{aligned} \kappa \Big (\prod _{i = 1}^m c(r_i) - \prod _{i = 1}^m c(s_i) \Big ) \sum _{i=1}^m (s_i - r_i) = M(c) \nabla F(c). \end{aligned}$$

Here, we also used that \(s^{\mathsf {T}}\nabla F(c) = - \log \big ( c(s) \big )\) for all \(s \in \mathcal {S}\). Setting C(c) to be the sum of the matrices M(c) over all forward-backward reaction pairs, the claim follows, since a non-negative linear combination of positive semi-definite, symmetric matrices is again symmetric and positive semi-definite. \(\square \)
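The construction in this proof is completely explicit and lends itself to a direct implementation. The following sketch (ours; the helper names are not from [14, 16]) assembles \(C(c)\) and \(\nabla F(c)\) for a list of forward-backward reaction pairs; multiplying the two reproduces the mass-action right-hand side.

```python
import numpy as np

def logmean(x, y):
    """The function L of Eq. (12), extended continuously to the boundary."""
    if x <= 0.0 or y <= 0.0:
        return 0.0
    return x if x == y else (x - y) / (np.log(x) - np.log(y))

def gradient_parts(c, pairs, species):
    """C(c) and grad F(c) as in the proof of Theorem 9; each entry of
    `pairs` is (substrates, products, kappa) and stands for a
    forward-backward reaction pair."""
    idx = {s: i for i, s in enumerate(species)}
    C = np.zeros((len(species), len(species)))
    for substrates, products, kappa in pairs:
        v = np.zeros(len(species))       # s_1 + ... + s_m - r_1 - ... - r_m
        for s in products:
            v[idx[s]] += 1.0
        for r in substrates:
            v[idx[r]] -= 1.0
        a = np.prod([c[idx[r]] for r in substrates])
        b = np.prod([c[idx[s]] for s in products])
        C += kappa * logmean(a, b) * np.outer(v, v)
    return C, -np.log(c)                 # grad F(c) = -log c, entrywise

species = [(0, 0), (0, 1), (1, 0), (1, 1)]
pairs = [([(0, 0), (1, 1)], [(0, 1), (1, 0)], 1.0)]   # combined rate rho = 1
c = np.array([0.4, 0.1, 0.2, 0.3])
C, gradF = gradient_parts(c, pairs, species)
print(C @ gradF)   # the mass-action right-hand side, kappa * (a - b) * v
```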

Remark 10

Since the total mass, \(\sum _{s \in \mathcal {S}} c_t(s)\), is preserved in our case, we may replace the negative free energy F in Theorem 9 by the entropy

$$\begin{aligned} H(c) \mathrel {\mathop :}=- \sum _{s \in \mathcal {S}} c(s) \log \big (c(s) \big ). \end{aligned}$$

For the solution of the recombination equation (2), this has the following consequence. It is a well-known fact that, among all probability measures on a product space with the same one-dimensional marginals, the product measure of these marginals maximises the entropy. As the one-dimensional marginals are preserved under recombination (in the absence of mutation or selection), the fact that Eq. (2) can be written as a generalised gradient system with respect to H reflects the fact that the solution approaches linkage equilibrium; compare [8, Thm. 3.1]. \(\diamondsuit \)

4.1 Explicit examples

Combining Theorems 6, 7 and 9, we see that, for finite X, there exists a function C, defined on \(\mathcal {P}(X)\) with values in the symmetric positive semi-definite matrices, such that

$$\begin{aligned} \dot{\omega }_t^{} = C(\omega _t^{}) \nabla F(\omega _t^{}) \end{aligned}$$

is equivalent to the recombination equation (2). Our goal is now to write down the function \(\nu \mapsto C(\nu )\) for \(\nu \in \mathcal {P}(X)\) explicitly for concrete examples. The simplest one is the classical case with two parents and two diallelic loci (compare [12, Ex. 1]). Then, writing \(\varrho \mathrel {\mathop :}=\varrho (\{\{1\},\{2\}\})\) and denoting the alleles by 0 and 1, we have, for each ordered pair of types, a reaction of the form

$$\begin{aligned} (0,0) + (1,1) \xrightarrow {\, \frac{\varrho }{2} \,} (0,1) + (1,0), \end{aligned}$$

together with its backward reaction.

Identifying (0, 0) with the first, (0, 1) with the second, (1, 0) with the third and (1, 1) with the fourth basis vector in \(\mathbb {R}\,^4\), the matrix \(C(\nu )\), as constructed in the proof of Theorem 9, can be written as

$$\begin{aligned} \varrho L \big ( \nu (0,0) \nu (1,1), \nu (1,0) \nu (0,1) \big ) \begin{pmatrix} 1 &{}\quad -1 &{}\quad -1 &{}\quad 1 \\ -1 &{}\quad 1 &{}\quad 1 &{}\quad -1 \\ -1 &{}\quad 1 &{}\quad 1 &{}\quad -1 \\ 1 &{}\quad -1 &{}\quad -1 &{}\quad 1 \end{pmatrix}, \end{aligned}$$

with L as defined in Eq. (12).
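As a consistency check (our own, under the identifications above), one may verify numerically that this matrix, applied to \(\nabla F(\nu )\), reproduces \(\varrho \, \big (\mathcal {R}_\mathcal {A}(\nu ) - \nu \big )\):

```python
import numpy as np

def logmean(x, y):
    return x if x == y else (x - y) / (np.log(x) - np.log(y))

nu = np.array([0.4, 0.1, 0.2, 0.3])      # order: (0,0), (0,1), (1,0), (1,1)
v = np.array([1.0, -1.0, -1.0, 1.0])
C = logmean(nu[0] * nu[3], nu[1] * nu[2]) * np.outer(v, v)   # rho = 1

p0, p1 = nu[0] + nu[1], nu[2] + nu[3]    # marginal at the first site
q0, q1 = nu[0] + nu[2], nu[1] + nu[3]    # marginal at the second site
RA = np.array([p0 * q0, p0 * q1, p1 * q0, p1 * q1])

print(np.allclose(C @ (-np.log(nu)), RA - nu))   # True
```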

Next, we treat the slightly more complicated example of three diallelic loci (but still 2 parents); compare [12, Ex. 2]. Again, we denote the two alleles by 0 and 1. We denote the type \((i_1,i_2,i_3)\) by \(g^{}_{4 i_1 + 2 i_2 + i_3}\); in other words, the index of a type is just the type itself, read as a binary integer. For example, we refer to (0, 0, 0) by \(g_0\) and to (1, 0, 1) by \(g_5\), and identify \(g_i\) with the canonical \((i+1)\)-th basis vector of \(\mathbb {R}\,^8\).

Now, by the proof of Theorem 9, we associate to each reaction pair of the form

$$\begin{aligned} g^{}_{i_1} + g^{}_{i_2} \xrightarrow {\, \kappa \,} g^{}_{j_1} + g^{}_{j_2} \quad \text { and } \quad g^{}_{j_1} + g^{}_{j_2} \xrightarrow {\, \kappa \,} g^{}_{i_1} + g^{}_{i_2} \end{aligned}$$
(13)

an \(8 \times 8\) matrix \(M(\nu )\) with entries

$$\begin{aligned} M_{ij}(\nu ) \mathrel {\mathop :}={\left\{ \begin{array}{ll} \kappa L \big (\nu (g_{i_1}^{}) \nu (g_{i_2}^{}), \nu (g_{j_1}^{}) \nu (g_{j_2}^{}) \big ), &{}\text { if }\; g_{i-1} \text { and } g_{j-1} \text { are on the same side of } (13), \\ -\kappa L \big (\nu (g_{i_1}^{}) \nu (g_{i_2}^{}), \nu (g_{j_1}^{}) \nu (g_{j_2}^{}) \big ), &{}\text { if }\; g_{i-1} \text { and } g_{j-1} \text { are on different sides of } (13), \\ 0, &{}\; \text {otherwise} \end{array}\right. } \end{aligned}$$

and \(C(\nu )\) is then given by summing these matrices over all forward-backward reaction pairs in the network. To keep things tidy, instead of summing over all forward-backward reaction pairs at once, we write down the sum over each linkage class separately; this allows us to take advantage of the following symmetry implied by our choice of indices. Namely, as 1s are only exchanged between gametes while their relative positions in the sequence remain unchanged, the sum of indices is the same for all complexes in the same linkage class, of which there are seven; six consist of only one forward-backward reaction pair each, and one consists of six such pairs. Assume that M belongs to a reaction in a linkage class in which the indices sum to \(\ell \). Then, it is easy to see that we have \(M_{i,j} = M_{\ell - i + 2,j} = M_{i, \ell -j + 2} = M_{\ell - i + 2,\ell -j + 2}\). This means that, for \(\ell \) odd, M is (restricted to the relevant indices) of the form

$$\begin{aligned} \begin{pmatrix} A &{}\quad A^{\leftrightarrow } \\ A^{\updownarrow } &{}\quad A^{\updownarrow \leftrightarrow } \end{pmatrix}, \end{aligned}$$

where \({}^{\leftrightarrow }\) denotes the reversal of columns and \({}^{\updownarrow }\) denotes the reversal of rows within a matrix, and A is a \(\frac{\ell + 1}{2} \times \frac{\ell + 1}{2}\) matrix if \(\ell \leqslant 7\) and a \(\frac{14 - \ell + 1}{2} \times \frac{14 - \ell + 1}{2}\) matrix if \(\ell > 7\). For \(\ell \) even, M is of the form

$$\begin{aligned} \begin{pmatrix} A &{}\quad 0 &{}\quad A^{\leftrightarrow } \\ 0 &{}\quad 0 &{}\quad 0 \\ A^{\updownarrow } &{}\quad 0 &{}\quad A^{\updownarrow \leftrightarrow } \end{pmatrix}, \end{aligned}$$

where A is now an \(\frac{\ell }{2} \times \frac{\ell }{2}\) matrix if \(\ell \leqslant 7\) and a \(\frac{14 - \ell }{2} \times \frac{14 - \ell }{2}\) matrix if \(\ell > 7\); here, the extra 0 between the reflected copies of A comes from the fact that reactions of the form

$$\begin{aligned} g^{}_{\ell /2} + g^{}_{\ell /2} \xrightarrow {\, \kappa \,} g^{}_{\ell /2} + g^{}_{\ell /2} \end{aligned}$$

do not contribute to the system. Let us now write down these matrices A for the different linkage classes. We abbreviate the function \(\nu \mapsto L \big (\nu (g_{i_1}^{}) \nu (g_{i_2}^{}), \nu (g_{j_1}^{}) \nu (g_{j_2}^{}) \big )\) by \(L_{i_1 i_2,j_1 j_2}\). For all \(1 \leqslant i \leqslant 3\), \(\varrho _i^{}\) denotes the recombination rate for the partition \(\{\{i\}, \{1,2,3\} \setminus \{i\} \}\). For the first six linkage classes in [12, Ex. 2], each consisting of one forward-backward reaction pair, we have in place of A

$$\begin{aligned} \begin{aligned}&\left( {\begin{matrix} (\varrho _1^{} + \varrho _2^{} )L_{06,24} &{}\quad 0 &{}\quad -(\varrho _1^{} + \varrho _2^{} )L_{06,24} \\ 0 &{}\quad 0 &{}\quad 0 \\ -(\varrho _1^{} + \varrho _2^{} )L_{06,24} &{}\quad 0 &{}\quad (\varrho _1^{} + \varrho _2^{} )L_{06,24} \end{matrix}} \right) ,\quad \left( {\begin{matrix} (\varrho _1^{} + \varrho _2^{} )L_{17,35} &{}\quad 0 &{}\quad -(\varrho _1^{} + \varrho _2^{} )L_{17,35} \\ 0 &{}\quad 0 &{}\quad 0 \\ -(\varrho _1^{} + \varrho _2^{} )L_{17,35} &{}\quad 0 &{}\quad (\varrho _1^{} + \varrho _2^{} )L_{17,35} \end{matrix}} \right) , \\&\left( {\begin{matrix} (\varrho _1^{} + \varrho _3^{} )L_{05,14} &{}\quad -(\varrho _1^{} + \varrho _3^{} )L_{05,14} &{}\quad 0 \\ -(\varrho _1^{} + \varrho _3^{} )L_{05,14} &{}\quad (\varrho _1^{} + \varrho _3^{} )L_{05,14} &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{matrix}} \right) ,\quad \left( {\begin{matrix} 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad (\varrho _1^{} + \varrho _3^{} )L_{27,36} &{}\quad - (\varrho _1^{} + \varrho _3^{} )L_{27,36} \\ 0 &{}\quad - (\varrho _1^{} + \varrho _3^{} )L_{27,36} &{}\quad (\varrho _1^{} + \varrho _3^{} )L_{27,36} \end{matrix}} \right) , \end{aligned} \end{aligned}$$

representing the reactions \(g^{}_0 + g^{}_6 \rightleftharpoons g^{}_2 + g^{}_4\), \(g^{}_1 + g^{}_7 \rightleftharpoons g^{}_3 + g^{}_5\), \(g^{}_0 + g^{}_5 \rightleftharpoons g^{}_1 + g^{}_4\) and \(g^{}_2 + g^{}_7 \rightleftharpoons g^{}_3 + g^{}_6\), and

$$\begin{aligned} \left( {\begin{matrix} (\varrho _2^{} + \varrho _3^{} )L_{03,12} &{}\quad -(\varrho _2^{} + \varrho _3^{} )L_{03,12} \\ -(\varrho _2^{} + \varrho _3^{} )L_{03,12} &{}\quad (\varrho _2^{} + \varrho _3^{} )L_{03,12} \end{matrix}} \right) , \left( {\begin{matrix} (\varrho _2^{} + \varrho _3^{} )L_{47,56} &{}\quad -(\varrho _2^{} + \varrho _3^{} )L_{47,56} \\ - (\varrho _2^{} + \varrho _3^{} )L_{47,56} &{}\quad (\varrho _2^{} + \varrho _3^{} )L_{47,56} \end{matrix}} \right) , \end{aligned}$$

representing the reactions \(g^{}_0 + g^{}_3 \rightleftharpoons g^{}_1 + g^{}_2\) and \(g^{}_4 + g^{}_7 \rightleftharpoons g^{}_5 + g^{}_6\). Finally, the last linkage class, comprising the six reactions \(g^{}_0 + g^{}_7 \rightleftharpoons g^{}_3 + g^{}_4\), \(g^{}_0 + g^{}_7 \rightleftharpoons g^{}_2 + g^{}_5\), \(g^{}_0 + g^{}_7 \rightleftharpoons g^{}_1 + g^{}_6\), \(g^{}_1 + g^{}_6 \rightleftharpoons g^{}_2 + g^{}_5\), \(g^{}_1 + g^{}_6 \rightleftharpoons g^{}_3 + g^{}_4\) and \(g^{}_2 + g^{}_5 \rightleftharpoons g^{}_3 + g^{}_4\), is represented by

$$\begin{aligned} \left( {\begin{matrix} \varrho _1^{} L_{07,34} + \varrho _2^{} L_{07,25} + \varrho _3^{} L_{16,07} &{} - \varrho _3^{} L_{16,07} &{} -\varrho _2^{} L_{07,25} &{} -\varrho _1^{} L_{07,34} \\ -\varrho _3^{} L_{16,07} &{} \varrho _1^{} L_{16,25} + \varrho _2^{} L_{16,34} + \varrho _3^{} L_{16,07} &{} -\varrho _1^{} L_{16,25} &{} -\varrho _2^{} L_{16,34} \\ -\varrho _2^{} L_{25,07} &{} -\varrho _1^{} L_{25,16} &{} \varrho _1^{} L_{25,16} + \varrho _2^{} L_{25,07} + \varrho _3^{} L_{25,34} &{} - \varrho _3^{} L_{25,34} \\ -\varrho _1^{} L_{34,07} &{} -\varrho _2^{} L_{34,16} &{} -\varrho _3^{} L_{34,25} &{} \varrho _1^{} L_{34,07} + \varrho _2^{} L_{34,16} + \varrho _3^{} L_{34,25} \end{matrix}} \right) . \end{aligned}$$

5 Monotone Markov chains and the partitioning process

We have seen how the result of Müller and Hofbauer [12] generalises to the setting of an arbitrary number of parents, at least for finite type spaces. For more general, potentially uncountable type spaces, this approach fails because it is not clear how to even make sense of the notion of the concentration of individual types, unless \(\omega _t\) is pure point. Now, we show how the evolution of the law of the partitioning process, related to \(\omega \) via Eq. (5), can be written as a gradient system. Ultimately, this is due to the monotonicity of its sample paths; recall from (4) that the transition rate from \(\mathcal {A}\) to \(\mathcal {B}\) vanishes whenever \(\mathcal {B}\not \preccurlyeq \mathcal {A}\). In particular, the number of blocks increases strictly in each transition.

Definition 11

Let \(\mathcal {X}= \big (\mathcal {X}_t \big )_{t \geqslant 0}\) be a continuous-time Markov chain on a finite state space E with rate matrix \(\big ( Q(i,j) \big )_{i,j \in E}\); it is called a Markov chain with strictly monotone orbits (MCsmo) with respect to a real-valued function W on E if \(Q(i,j) > 0\) implies that \(W(j) > W(i)\). \(\diamondsuit \)

Recall that the distribution of a finite-state Markov chain \(\mathcal {X}= (\mathcal {X}_t)_{t \geqslant 0}\) can be interpreted as a probability vector,

$$\begin{aligned} p^\mathcal {X}_t \mathrel {\mathop :}=\sum _{i \in E} p^\mathcal {X}_t(i) i, \end{aligned}$$

which evolves in time according to the differential equation,

$$\begin{aligned} \dot{p}_t^\mathcal {X}= \sum _{i \in E} \sum _{j \in E} p_t^\mathcal {X}(i) Q(i,j) (j - i). \end{aligned}$$
(14)

If \(\mathcal {X}\) has strictly monotone orbits in the sense of Definition 11, Eq. (14) can be written as a generalised gradient system, as defined in Sect. 4.

Theorem 12

Let \(\mathcal {X}= \big (\mathcal {X}_t \big )_{t \geqslant 0}\) be an MCsmo with respect to W and define \(\Psi : \mathbb {R}\,^E \rightarrow \mathbb {R}\,\),

$$\begin{aligned} \Psi (p) \mathrel {\mathop :}=\sum _{i \in E} p(i) W(i). \end{aligned}$$

Then, Eq. (14), which describes the time evolution of \(p_t^\mathcal {X}\), is equivalent to

$$\begin{aligned} \dot{p}_t^\mathcal {X}= K(p_t^\mathcal {X}) \nabla \Psi (p_t^\mathcal {X}), \end{aligned}$$

where K takes values in the symmetric, positive semi-definite matrices, is continuous on \(\mathcal {P}(E)\) and smooth on its interior.

Proof

Define

$$\begin{aligned} K(p) \mathrel {\mathop :}=\sum _{\begin{array}{c} i,j \in E \\ Q(i,j) > 0 \end{array}} \frac{p(i) Q(i,j)}{W(j) - W(i)} (j-i)(j-i)^{\mathsf {T}}. \end{aligned}$$

Since \(\Psi \) is linear with (constant) gradient

$$\begin{aligned} \nabla \Psi = \sum _{i \in E} W(i) i, \end{aligned}$$

we have \((j - i)^{\mathsf {T}}\nabla \Psi = W(j) - W(i)\) and thus,

$$\begin{aligned} K(p) \nabla \Psi = \sum _{\begin{array}{c} i,j \in E \\ Q(i,j) > 0 \end{array}} \frac{p(i) Q(i,j)}{W(j) - W(i)} \big (W(j) - W(i) \big ) (j-i) = \sum _{i,j \in E} p(i) Q(i,j) (j - i). \end{aligned}$$

Inserting \(p_t^\mathcal {X}\) for p, this is exactly the right-hand side of Eq. (14). \(\square \)
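The proof translates directly into code; a sketch (ours) that builds K(p) and checks it against Eq. (14) for the four-state chain of Example 14 below:

```python
import numpy as np

def mcsmo_gradient_parts(p, Q, W):
    """K(p) and grad Psi from the proof of Theorem 12; Q is the rate
    matrix (rows summing to 0) and W the strictly monotone potential."""
    n = len(W)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if Q[i, j] > 0.0:
                v = np.zeros(n)
                v[j], v[i] = 1.0, -1.0            # the vector j - i
                K += p[i] * Q[i, j] / (W[j] - W[i]) * np.outer(v, v)
    return K, np.asarray(W, dtype=float)          # grad Psi = W, constant

Q = np.array([[-2.0,  1.0,  1.0, 0.0],
              [ 0.0, -2.0,  0.0, 2.0],
              [ 0.0,  0.0, -2.0, 2.0],
              [ 0.0,  0.0,  0.0, 0.0]])
W = [1.0, 2.0, 2.0, 3.0]
p = np.array([0.4, 0.3, 0.2, 0.1])

K, gradPsi = mcsmo_gradient_parts(p, Q, W)
print(np.allclose(K @ gradPsi, Q.T @ p))          # True: Eq. (14)
```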

The partitioning process mentioned in Sect. 2 is a process of successive refinement; in every non-silent transition, the number of blocks increases by at least one. Thus, it is an MCsmo with respect to the number of blocks.

Corollary 13

The law \(p^\Sigma _{}\) of the partitioning process with generator \(\mathcal {Q}\) given in Eq. (4) satisfies a generalised gradient system with respect to the potential N given by

$$\begin{aligned} N(p) = \sum _{\mathcal {A}\in \varvec{P}(S)} p(\mathcal {A}) |\mathcal {A}|. \end{aligned}$$

We conclude with an explicit example.

Example 14

Let us consider a Markov chain with four states A, B, C, D and jump rates \(q(A,B) = q(A,C) = 1\) and \(q(B,D) = q(C,D) = 2\). All other transition rates are 0. This is a Markov chain with strictly monotone orbits in the sense of Definition 11, with respect to W given by \(W(A) = 1, W(B) = W(C) = 2, W(D) = 3\). Upon identifying A, B, C, D with the standard basis of \(\mathbb {R}\,^4\), the linear differential equation describing the dynamics of its distribution reads

$$\begin{aligned} \dot{p}_t = \begin{pmatrix} -2 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 1 &{}\quad -2 &{}\quad 0 &{}\quad 0 \\ 1 &{}\quad 0 &{}\quad -2 &{}\quad 0 \\ 0 &{}\quad 2 &{}\quad 2 &{}\quad 0 \end{pmatrix} p_t, \end{aligned}$$
(15)

and can be rewritten as

$$\begin{aligned} \dot{p}_t = \begin{pmatrix} 2p_t(A) &{}\quad - p_t(A) &{}\quad -p_t(A) &{}\quad 0 \\ -p_t(A) &{}\quad p_t(A) + 2p_t(B) &{}\quad 0 &{}\quad -2p_t(B) \\ -p_t(A) &{}\quad 0 &{}\quad 2p_t(C) + p_t(A) &{}\quad -2p_t(C) \\ 0 &{}\quad -2p_t(B) &{}\quad -2p_t(C) &{}\quad 2p_t(C) + 2 p_t(B) \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 2 \\ 3 \end{pmatrix}.\nonumber \\ \end{aligned}$$
(16)

Here, the vector \((1,2,2,3)^{\mathsf {T}}\) is the gradient (with respect to the Euclidean metric) of

$$\begin{aligned} \Psi (p) = p(A) + 2p(B) + 2 p(C) + 3 p(D). \end{aligned}$$

Also, the matrix is symmetric, and it is positive semi-definite as long as \(p(A),p(B),p(C) \geqslant 0\), since it can then be written as a sum of positive semi-definite matrices,

$$\begin{aligned} p(A) \left( {\begin{matrix} -1 \\ 1 \\ 0 \\ 0 \end{matrix}} \right) \left( {\begin{matrix} -1&1&0&0 \end{matrix}} \right) + p(A) \left( {\begin{matrix} -1 \\ 0\\ 1\\ 0\end{matrix}} \right) \left( {\begin{matrix} -1&0&1&0 \end{matrix}} \right) + 2p(B) \left( {\begin{matrix} 0\\ -1\\ 0\\ 1 \end{matrix}} \right) \left( {\begin{matrix} 0&-1&0&1 \end{matrix}} \right) + 2p(C) \left( {\begin{matrix} 0\\ 0\\ -1\\ 1 \end{matrix}} \right) \left( {\begin{matrix} 0&0&-1&1 \end{matrix}} \right) , \end{aligned}$$

evaluated at \(p = p_t\). Thus, Eq. (16) is a generalised gradient system in the sense of Eq. (10).

Note that the coefficient matrix in Eq. (15) is not diagonalisable; this is because the eigenvalue \(-2\) has algebraic multiplicity 3, but the associated eigenspace is merely two-dimensional, spanned by \((0,1,-1,0)^{\mathsf {T}}\) and \((0,1,0,-1)^{\mathsf {T}}\). Its general solution will therefore contain terms of the form \(t \mathrm {e}^{-2 t}\). This seems to contradict the fact that a linear generalised gradient system cannot have resonant solutions of the form \(t^k \mathrm {e}^{\lambda t}\) for \(k \geqslant 1\). One has to keep in mind, however, that the gradient representation only holds on the non-negative cone (which is forward-invariant for the system). Note also that the problematic generalised eigenspace has only a trivial intersection with \(\mathbb {R}\,_{\geqslant 0}^4\). We conclude with one additional example.

Example 15

Let us now consider the actual partitioning process, for three loci. We have the five partitions \(\mathcal {A}_1 =\{\{1,2,3\}\}, \mathcal {A}_2 = \{\{1\},\{2,3\}\}, \mathcal {A}_3 = \{\{1,3\},\{2\}\}, \mathcal {A}_4 = \{\{1,2\},\{3\}\}\) and \(\mathcal {A}_5 = \{\{1\},\{2\},\{3\}\}\). Identifying \(\mathcal {A}_i\) with the i-th basis vector in \(\mathbb {R}\,^5\), the generator \(\mathcal {Q}\) of the partitioning process (cf. Eq. (4)) reads

$$\begin{aligned} \left( {\begin{matrix} -\varrho _1^{} - \varrho _2^{} - \varrho _3^{} &{} \varrho _1^{} &{} \varrho _2^{} &{} \varrho _3^{} &{} 0 \\ 0 &{} - \varrho _2^{} - \varrho _3^{} &{} 0 &{} 0 &{} \varrho _2^{} + \varrho _3^{} \\ 0 &{} 0 &{} - \varrho _1^{} - \varrho _3^{} &{} 0 &{} \varrho _1^{} + \varrho _3^{} \\ 0 &{} 0 &{} 0 &{} -\varrho _1^{} - \varrho _2^{} &{} \varrho _1^{} + \varrho _2^{} \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{matrix}}\right) , \end{aligned}$$

where \(\varrho _1^{}, \varrho _2^{}, \varrho _3^{}\) are as in Sect. 4.1, and the gradient system for the distribution \(p_t^{\Sigma }\) then reads

$$\begin{aligned} \dot{p}_t^\Sigma = \left( {\begin{matrix} D_1 &{} -p_t^{}(\mathcal {A}_1) \varrho _1^{} &{} -p_t^{} (\mathcal {A}_1) \varrho _2^{} &{} -p_t^{} (\mathcal {A}_1) \varrho _3^{} &{} 0 \\ -p_t^{}(\mathcal {A}_1) \varrho _1^{} &{} D_2 &{} 0 &{} 0 &{} -p_t^{} (\mathcal {A}_2) (\varrho _2^{} + \varrho _3^{}) \\ -p_t^{}(\mathcal {A}_1) \varrho _2^{} &{} 0 &{} D_3 &{} 0 &{} - p_t^{} (\mathcal {A}_3) (\varrho _1^{} + \varrho _3^{}) \\ -p_t^{} (\mathcal {A}_1) \varrho _3^{} &{} 0 &{} 0 &{} D_4 &{} -p_t^{}(\mathcal {A}_4)(\varrho _1^{} + \varrho _2^{}) \\ 0 &{} -p_t^{}(\mathcal {A}_2)(\varrho _2^{} + \varrho _3^{}) &{} - p_t^{}(\mathcal {A}_3) (\varrho _1^{} + \varrho _3^{}) &{} - p_t^{}(\mathcal {A}_4) (\varrho _1^{} + \varrho _2^{}) &{} D_5 \end{matrix}} \right) \left( {\begin{matrix} 1 \\ 2 \\ 2 \\ 2 \\ 3 \end{matrix}} \right) , \end{aligned}$$

where \(D_1,\ldots ,D_5\) are chosen such that the rows sum to 0, and \((1,2,2,2,3)^{\mathsf {T}}\) is the gradient \(\nabla N\) of the mean number of blocks N, defined in Corollary 13. Again, the maximiser of the potential, the point measure on the partition \(\{\{1\},\{2\},\{3\}\}\), characterises linkage equilibrium (‘all sites come from independent ancestors’).