In this section, we consider an abstract problem, in which organisms are represented as points in some metric space, adaptation as a motion in this space towards some target point (an optimal organism), and fitness as the negative distance to the target. Minimisation of the distance to the target is therefore equivalent to maximisation of fitness. The geometry of the metric space allows us to solve the optimisation problem precisely. These abstract results will be used in the following sections to develop the theory further, bringing it closer to biology.
Representation and assumptions
Let \(\varOmega \) be the set of all possible genotypes representing organisms. This set is usually equipped with a metric \(d:\varOmega \times \varOmega \rightarrow [0,\infty )\) related to the mutation operator, such that large mutations correspond to a large distance d(a, b) and vice versa. For example, the set of all DNA sequences of length \(l\in \mathbb {N}\) can be represented by vectors in the Hamming space \(\mathcal {H}_\alpha ^l:=\{1,\ldots ,\alpha \}^l\) equipped with the Hamming metric \(d_H(a,b)=\sum _{i=1}^l\delta _{a_i}(b_i)\), where \(\delta _{a_i}(b_i)=1\) if \(a_i\ne b_i\) and 0 otherwise, counting the number of different letters in two strings. This choice of metric is particularly suitable for simple point mutation, which will be the focus of this paper. A sphere S(a, r) and a closed ball B[a, r] of radius \(r\in [0,\infty )\) around \(a\in \varOmega \) are defined as usual:
$$\begin{aligned} S(a,r):=\{b\in \varOmega :d(a,b)=r\}\,,\quad B[a,r]:=\{b\in \varOmega :d(a,b)\le r\} \end{aligned}$$
We refer to \(r=d(a,b)\) as the mutation radius.
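These definitions are easy to make concrete. The following sketch computes the Hamming metric and enumerates a sphere by brute force in a small space; for convenience it uses the alphabet \(\{0,\ldots ,\alpha -1\}\) instead of \(\{1,\ldots ,\alpha \}\), and the names are our own.

```python
from itertools import product

def d_H(a, b):
    """Hamming distance: the number of positions where the strings differ."""
    return sum(x != y for x, y in zip(a, b))

def sphere(a, r, alpha):
    """All strings at Hamming distance exactly r from a (brute force)."""
    return [b for b in product(range(alpha), repeat=len(a)) if d_H(a, b) == r]

# Small example in the Hamming space H_3^4 (alphabet size 3, length 4).
a = (0, 0, 1, 2)
print(d_H(a, (0, 0, 0, 0)))        # -> 2, letters differ in two positions
print(len(sphere(a, 1, alpha=3)))  # -> 8, since |S(a,1)| = l*(alpha-1) = 4*2
```

Brute-force enumeration is only feasible for tiny spaces (\(\alpha ^l\) strings); the combinatorial formulas below replace it for realistic sizes.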
The environment defines a preference relation \(\lesssim \) (a total pre-order), so that \(a\lesssim b\) means genotype b represents an organism that is better adapted to, or has a higher replication rate in, a given environment than an organism represented by genotype a. We shall consider only countable or even finite \(\varOmega \), so that there always exists a real function \(f:\varOmega \rightarrow \mathbb {R}\) such that
$$\begin{aligned} a\lesssim b\qquad \iff \qquad f(a)\le f(b) \end{aligned}$$
In game theory, such a function is called utility, but in the biological context it is called fitness, and it is usually assumed to have non-negative values representing the replication rates of the organisms. The non-negativity assumption is not essential, however, because the preference relation \(\lesssim \) induced by f does not change under a strictly increasing transformation of f. Thus, our interpretation of fitness simply as a numerical representation of a preference relation is distinct from population genetic definitions of fitness (e.g. see Orr 2009). We shall also assume that there exists a top (optimal) genotype \(\top \in \varOmega \) such that \(f(\top )=\sup _{\omega \in \varOmega }f(\omega )\), which represents the most adapted or most quickly replicating organism. Note that a finite set \(\varOmega \) always contains at least one top element \(\top \) as well as at least one bottom element \(\bot \).
Generally, one should consider also the set of all environments (including other organisms), because different environments impose different preference relations on \(\varOmega \), which have to be represented by different fitness functions. In this paper, however, we shall assume that fitness in any particular environment has been fixed.
During replication, genotype a can mutate into b with transition probability \(P(b{\,\mid \,} a)\). Mutation can have different effects on fitness: It can be deleterious, if \(f(a)>f(b)\); neutral, if \(f(a)=f(b)\); or beneficial, if \(f(a)<f(b)\).
In this section, we consider a simple picture \(f(\omega )=-d(\top ,\omega )\), so that maximization of fitness \(f(\omega )\) is equivalent to minimization of distance \(d(\top ,\omega )\), and adaptation (beneficial mutation) corresponds to a transition from a sphere of radius \(n=d(\top ,a)\) into a sphere of a smaller radius \(m=d(\top ,b)\), which is depicted in Fig. 1. This geometric view of mutation and adaptation is based on Ronald Fisher’s idea (Fisher 1930), which was, perhaps, the earliest mathematical work on the role of mutation in adaptation. Fisher represented individual organisms by points of the Euclidean space \(\mathbb {R}^l\) of \(l\in \mathbb {N}\) traits, equipped with the Euclidean metric \(d_E(a,b)=(\sum _{i=1}^l |b_i-a_i|^2)^{1/2}\). The top element \(\top \) was identified with the origin in \(\mathbb {R}^l\), and fitness \(f(\omega )\) with the negative distance \(-d_E(\top ,\omega )\). Fisher then used the geometry of the Euclidean space to show that the probability of beneficial mutation decreases exponentially as the mutation radius increases, and therefore mutations of small radii are more likely to be beneficial. Despite subsequent development of the theory (Orr 2005), the use of Euclidean space for representation was not revised.
Euclidean space is unbounded (and therefore non-compact), and the interior of any ball always has a smaller volume than its exterior. Therefore, assuming mutation in random directions, a point on the surface of a ball around an optimum is always more likely to mutate into the exterior of this ball than into its interior. This simple property is key to Fisher’s conclusion that adaptation is more likely to occur by small mutations. We showed previously, however, that the geometry of a finite space, such as the Hamming space of strings, implies a different relation between the radius of mutation and adaptation (Belavkin et al. 2011; Belavkin 2011). In particular, the mutation radius maximising the probability of adaptation varies as a function of the distance to the optimum.
Probability of adaptation in a Hamming space
Consider mutation of genotype \(a\in S(\top ,n)\) in a Hamming space \(\mathcal {H}_\alpha ^l\) into \(b\in S(\top ,m)\) with mutation radius \(r=d(a,b)\), as shown on Fig. 1. Assuming equal probabilities for all points in the sphere S(a, r), the probability that the offspring is in the sphere \(S(\top ,m)\) is given by the proportion of points of S(a, r) that lie in the intersection of the spheres \(S(\top ,m)\) and S(a, r):
$$\begin{aligned} P(m\mid n,r)=\frac{|S(\top ,m)\cap S(a,r)|_{d(\top ,a)=n}}{|S(a,r)|} \end{aligned}$$
(1)
where \(|\cdot |\) denotes cardinality of a set (the number of its elements). The cardinality of the intersection \(S(\top ,m)\cap S(a,r)\) with condition \(d(\top ,a)=n\) is computed as follows:
$$\begin{aligned}&{\left| S(\top ,m)\cap S(a,r)\right| _{d(\top ,a)\,=\,n}}\nonumber \\&\quad =\sum _{\begin{array}{c} r_0+r_-+r_+=\min \{r,m\}\\ r_+-r_-=n-\max \{r,m\} \end{array}} (\alpha -2)^{r_0}{n-r_+\atopwithdelims ()r_0}(\alpha -1)^{r_-}{l-n\atopwithdelims ()r_-}{n\atopwithdelims ()r_+} \end{aligned}$$
(2)
where summation runs over indexes \(r_+\in [0,\lfloor (n-|r-m|)/2\rfloor ]\) and \(r_-\in [0,\lfloor (r+m-n)/2\rfloor ]\) (here \(\lfloor \cdot \rfloor \) denotes the floor operation) satisfying the conditions \(r_0+r_-+r_+=\min \{r,m\}\) and \(r_+-r_-=n-\max \{r,m\}\). See Appendix 1 for the derivation of this combinatorial result. We point out that for \(r\le m\), the indexes \(r_+\), \(r_-\) and \(r_0\) count respectively the numbers of beneficial, deleterious and neutral substitutions among the \(r\in [0,l]\) substituted letters.
The cardinality of sphere \(S(a,r)\subset \mathcal {H}_\alpha ^l\) is
$$\begin{aligned} |S(a,r)|=(\alpha -1)^r{l\atopwithdelims ()r} \end{aligned}$$
(3)
Equations (1)–(3) allow us to compute the probability of adaptation, which is the probability that the offspring is in the interior of ball \(B[\top ,n]\):
$$\begin{aligned} P(m<n\mid n,r)=\sum _{m=0}^{n-1}P(m\mid n,r) \end{aligned}$$
(4)
Figure 2 shows the probability of adaptation for Hamming space \(\mathcal {H}_4^{100}\) as a function of mutation radius r for different values of \(n=d(\top ,a)\). One can see that when \(n<75\) (more generally, when \(n<l(1-1/\alpha )\)), the probabilities of adaptation decrease with increasing radius \(r>0\), similar to Fisher’s conclusion for the Euclidean space. However, for \(n=75\) there is no such decrease, and when \(n>75\) (i.e. for \(n>l(1-1/\alpha )\)), the probability of adaptation actually increases with r. This is due to the fact that, unlike Euclidean space, Hamming space is finite, and the interior of ball \(B[\top ,n]\) can be larger than its exterior. The geometry of a Hamming space has a number of interesting properties (Ahlswede and Katona 1977). For example, every point \(\omega \) has \((\alpha -1)^l\) diametrically opposite points \(\lnot \omega \), such that \(d_H(\omega ,\lnot \omega )=l\), and the complement of a ball \(B[\omega ,r]\) in \(\mathcal {H}_\alpha ^l\) is the union of \((\alpha -1)^l\) balls \(B[\lnot \omega ,l-r-1]\).
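Equations (1)–(4) can be evaluated numerically. The sketch below counts the intersection \(|S(\top ,m)\cap S(a,r)|\) by enumerating the numbers of beneficial, deleterious and neutral substitutions directly, which is equivalent to Eq. (2), and then computes the probability of adaptation (4); function and variable names are our own, and exact rational arithmetic avoids rounding error.

```python
from fractions import Fraction
from math import comb

def intersection(l, alpha, n, m, r):
    """|S(top,m) ∩ S(a,r)| with d(top,a)=n, summing over kb beneficial,
    kd deleterious and kn neutral substitutions (equivalent to Eq. (2))."""
    total = 0
    for kb in range(min(n, r) + 1):   # mismatched letters corrected
        kd = kb + m - n               # matched letters spoiled, since m = n - kb + kd
        kn = r - kb - kd              # mismatched letters changed to other mismatches
        if kd < 0 or kd > l - n or kn < 0 or kn > n - kb:
            continue
        total += (comb(n, kb) * comb(n - kb, kn) * (alpha - 2) ** kn
                  * comb(l - n, kd) * (alpha - 1) ** kd)
    return total

def sphere_size(l, alpha, r):                    # Eq. (3)
    return comb(l, r) * (alpha - 1) ** r

def p_transition(l, alpha, n, m, r):             # Eq. (1)
    return Fraction(intersection(l, alpha, n, m, r), sphere_size(l, alpha, r))

def p_adapt(l, alpha, n, r):                     # Eq. (4)
    return sum(p_transition(l, alpha, n, m, r) for m in range(n))

# H_4^100: the threshold l(1 - 1/alpha) = 75 separates the two regimes of Fig. 2.
print(float(p_adapt(100, 4, 25, 1)), float(p_adapt(100, 4, 25, 2)))    # decreasing in r
print(float(p_adapt(100, 4, 80, 1)), float(p_adapt(100, 4, 80, 100)))  # increasing in r
```

Summing the intersection over all m recovers \(|S(a,r)|\), which gives a convenient consistency check of the implementation.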
Random mutation
By mutation we understand a random process transforming the parent string a into offspring b, so that the mutation radius is a random variable. The simplest form of mutation, called point mutation, is the random process of independently substituting each letter in the parent string \(a\in \{1,\ldots ,\alpha \}^l\) with one of the other \(\alpha -1\) letters with probability \(\mu \). At its simplest, with one parameter, there is an equal probability \(\mu /(\alpha -1)\) of mutating to each of the \(\alpha -1\) letters. The parameter \(\mu \) is called the mutation rate. For point mutation, the probability of mutating by radius \(r\in [0,l]\) is given by the binomial distribution:
$$\begin{aligned} P_\mu (r\mid n)={l\atopwithdelims ()r}\,\mu ^r(n)(1-\mu (n))^{l-r} \end{aligned}$$
(5)
We assume that the mutation rate \(\mu \) may depend on the distance \(n=d(\top ,a)\) from the top string \(\top \), and therefore the probability is also conditional on n.
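A minimal sketch of the point-mutation operator and of the radius distribution (5), with names of our choosing: exact rational arithmetic confirms that the distribution normalises and that the mean mutation radius is \(l\mu \).

```python
from fractions import Fraction
from math import comb
import random

def p_radius(r, l, mu):
    """Eq. (5): probability that point mutation with rate mu substitutes
    exactly r of the l letters (binomial distribution)."""
    return comb(l, r) * mu**r * (1 - mu)**(l - r)

def point_mutation(a, alpha, mu, rng=random):
    """Substitute each letter independently with probability mu,
    choosing uniformly among the other alpha-1 letters."""
    return tuple(rng.choice([x for x in range(alpha) if x != y])
                 if rng.random() < mu else y for y in a)

offspring = point_mutation((1, 1, 2, 3), 4, 0.3)   # a random offspring in H_4^4

l, mu = 100, Fraction(1, 10)
pmf = [p_radius(r, l, mu) for r in range(l + 1)]
print(sum(pmf) == 1)                                    # distribution normalises
print(sum(r * p for r, p in enumerate(pmf)) == l * mu)  # mean radius is l*mu
```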
Optimisation of the mutation rate requires knowledge of the probability \(P_\mu (m\mid n)\) that mutation of a into b leads to a transition from sphere \(S(\top ,n)\) into \(S(\top ,m)\). This transition probability can be expressed as follows:
$$\begin{aligned} P_\mu (m\mid n)=\sum _{r=0}^l P(m\mid n,r)\,P_\mu (r\mid n) \end{aligned}$$
Substituting (1) and (5) into the above equation, and taking into account (3), we obtain the following expression:
$$\begin{aligned} P_\mu (m\mid n)=\sum _{r=0}^l\frac{\left| S(\top ,m)\cap S(a,r)\right| _{d(\top ,a)=n}}{(\alpha -1)^r}\,\mu ^r(n)(1-\mu (n))^{l-r} \end{aligned}$$
(6)
where the number \(|S(\top ,m)\cap S(a,r)|_{d(\top ,a)=n}\) is given by Eq. (2). The case \(\alpha =2\) was investigated previously by several authors (e.g. Bäck 1993; Braga and Aleksander 1994). The expressions for arbitrary alphabets were first presented in (Belavkin et al. 2011) (see also Belavkin 2011).
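Equation (6) can equivalently be evaluated per letter: under point mutation, each of the n letters differing from \(\top \) is independently corrected with probability \(\mu /(\alpha -1)\), each of the \(l-n\) matching letters is spoiled with probability \(\mu \), and \(m=n-k_b+k_d\), where \(k_b\) and \(k_d\) are the numbers of beneficial and deleterious substitutions. A sketch of this computation (names are ours), using exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, size, p):
    return comb(size, k) * p**k * (1 - p)**(size - k)

def p_m_given_n(m, n, l, alpha, mu):
    """Transition probability P_mu(m|n) of Eq. (6), computed per letter:
    each of the n mismatched letters is corrected w.p. mu/(alpha-1),
    each of the l-n matched letters is spoiled w.p. mu; m = n - kb + kd."""
    p_fix = Fraction(mu) / (alpha - 1)
    total = Fraction(0)
    for kb in range(n + 1):            # beneficial substitutions
        kd = kb + m - n                # deleterious substitutions
        if 0 <= kd <= l - n:
            total += binom_pmf(kb, n, p_fix) * binom_pmf(kd, l - n, Fraction(mu))
    return total

l, alpha, n, mu = 10, 4, 6, Fraction(1, 5)
row = [p_m_given_n(m, n, l, alpha, mu) for m in range(l + 1)]
print(sum(row) == 1)                          # distribution over m normalises
print(sum(m * p for m, p in enumerate(row)) ==
      n + mu * (l - Fraction(n * alpha, alpha - 1)))
```

The second check uses the fact that the conditional mean works out to \(E[m\mid n]=n+\mu \,(l-n\alpha /(\alpha -1))\), which follows directly from the per-letter expectations above.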
We note that simple, one-parameter point mutation is optimal in a certain sense: it is the solution of a variational problem of minimisation of the expected distance between points a and b in a Hamming space subject to a constraint on mutual information between a and b (see Belavkin 2011, 2013). The constraint on mutual information between strings a and b represents the fact that perfect copying is not possible. The optimal solutions to this problem are conditional probabilities of the exponential form \(P_\beta (b\mid a)\propto \exp [-\beta \,d(a,b)]\), where the parameter \(\beta >0\), called the inverse temperature, is related to the mutation rate and is defined from the constraint on mutual information. This exponential solution in the Hamming space corresponds to independent substitutions with the same probability \(\mu /(\alpha -1)\) because the Hamming metric is computed as the sum \(d_H(a,b)=\sum _{i=1}^l\delta _{a_i}(b_i)\) of elementary distances \(\delta _{a_i}(b_i)\) between letters \(a_i\) and \(b_i\) in the ith position in the string, and the values \(\delta _{a_i}(b_i)\) are equal to zero or one independently of the specific letters of the alphabet or their position i. Other, more complex mutation operators, which incorporate multiple parameters or non-independent substitutions (a phenomenon known in biology as epistasis), can be considered as optimal solutions of the same variational problem, but applied to a different representation space \(\mathcal {H}\) with a different metric.
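The correspondence between the exponential form and point mutation can be checked directly: writing the per-letter probabilities as \(1-\mu \) (keep) and \(\mu /(\alpha -1)\) (specific substitution) gives \(P(b\mid a)=(1-\mu )^l\,e^{-\beta \,d_H(a,b)}\) with \(e^{-\beta }=\mu /((\alpha -1)(1-\mu ))\), so \(\beta >0\) whenever \(\mu <1-1/\alpha \). A numerical sketch of this identity (our naming, small space so the whole distribution can be enumerated):

```python
from itertools import product
from math import exp, log, isclose

alpha, l, mu = 3, 4, 0.2
beta = -log(mu / ((alpha - 1) * (1 - mu)))   # e^{-beta} = mu / ((alpha-1)(1-mu))

def d_H(a, b):
    return sum(x != y for x, y in zip(a, b))

def p_point(b, a):
    """Independent per-letter point mutation with rate mu."""
    p = 1.0
    for x, y in zip(a, b):
        p *= (1 - mu) if x == y else mu / (alpha - 1)
    return p

def p_exp(b, a):
    """Exponential form P_beta(b|a) = (1-mu)^l * exp(-beta * d_H(a,b))."""
    return (1 - mu) ** l * exp(-beta * d_H(a, b))

a = (0, 1, 2, 0)
space = list(product(range(alpha), repeat=l))
print(all(isclose(p_point(b, a), p_exp(b, a)) for b in space))  # same distribution
print(isclose(sum(p_point(b, a) for b in space), 1.0))          # normalised
```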
Optimal control of mutation rates
The fact that the transition probability \(P_\mu (m\mid n)\), defined by Eq. (6), depends on the mutation rate \(\mu \) introduces the possibility of organisms maximising the expected fitness of their offspring by controlling the mutation rate. We call the collection of pairs \((n,\mu )\) the mutation rate control function
\(\mu (n)\). Indeed, let \(P_t(a)\) be the distribution of parent genotypes in \(\mathcal {H}_\alpha ^l\) at time t, and let \(P_t(n)=\sum _{a:d(\top ,a)=n}P_t(a)\) be the distribution of their distances \(n=d_H(\top ,a)\) from the optimum. Transition probabilities \(P_\mu (m\mid n)\) define a linear transformation \(T_{\mu (n)}(\cdot ):=\sum _{n=0}^lP_\mu (m\mid n)(\cdot )\) of distribution \(P_t(n)\) into distribution \(P_{t+1}(m)\) of distances \(m=d_H(\top ,b)\) of their offspring at time \(t+1\):
$$\begin{aligned} P_{t+1}(m)=T_{\mu (n)}P_t(n)=\sum _{n=0}^lP_\mu (m\mid n)P_t(n) \end{aligned}$$
If this transformation does not change with time, then the distribution \(P_{t+s}(m)\) after s generations is defined by \(T_{\mu (n)}^s\), the sth power of \(T_{\mu (n)}\). The optimal mutation rates can be found (at least in principle) by minimising the expected distance subject to additional constraints, such as the time horizon \(\lambda \):
$$\begin{aligned} \min _{\mu (n)}\ \mathbb {E}_{P_{t+s}}\{m\}=\sum _{m=0}^lm\,\left( T_{\mu (n)}^sP_t(n)\right) \quad \text{ subject } \text{ to } s\le \lambda \end{aligned}$$
(7)
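As a sketch of this construction, the following builds one application of \(T_{\mu (n)}\) from Eq. (6) in the small space \(\mathcal {H}_4^{10}\), iterates it for s generations, and evaluates the expected distance appearing in (7). Names are our own, and a constant control \(\mu (n)=0.1\) and a uniform initial distribution are assumed purely for illustration.

```python
from math import comb

def p_m_given_n(m, n, l, alpha, mu):
    """P_mu(m|n) of Eq. (6), via beneficial/deleterious substitution counts."""
    p_fix = mu / (alpha - 1)
    total = 0.0
    for kb in range(n + 1):
        kd = kb + m - n
        if 0 <= kd <= l - n:
            total += (comb(n, kb) * p_fix**kb * (1 - p_fix)**(n - kb)
                      * comb(l - n, kd) * mu**kd * (1 - mu)**(l - n - kd))
    return total

def step(dist, l, alpha, mu_of_n):
    """One application of T_{mu(n)}: P_{t+1}(m) = sum_n P_mu(m|n) P_t(n)."""
    return [sum(p_m_given_n(m, n, l, alpha, mu_of_n(n)) * dist[n]
                for n in range(l + 1)) for m in range(l + 1)]

def expected_distance(dist):
    return sum(m * p for m, p in enumerate(dist))

l, alpha = 10, 4
dist = [1.0 / (l + 1)] * (l + 1)    # uniform over distances, for illustration
for s in range(3):                  # three generations, constant mu = 0.1
    dist = step(dist, l, alpha, lambda n: 0.1)
print(round(sum(dist), 6), expected_distance(dist))
```

Each column of the implied \((l+1)\times (l+1)\) matrix sums to one, so the operator maps probability distributions to probability distributions, and \(T_{\mu (n)}^s\) is obtained by iterating `step`.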
For example, mutation rates minimising the expected distance at \(\lambda =1\) generation should depend on n according to the following step function:
$$\begin{aligned} \mu (n)=\left\{ \begin{array}{ll} 0&{} \text{ if } n<l(1-1/\alpha )\\ 1-1/\alpha &{} \text{ if } n=l(1-1/\alpha )\\ 1&{} \text{ if } n>l(1-1/\alpha ) \end{array}\right. \end{aligned}$$
(8)
This function is shown on Fig. 3 for Hamming space \(\mathcal {H}_4^{10}\). The sudden change of the optimal mutation rate from \(\mu =0\) at \(n<l(1-1/\alpha )\) to \(\mu =1\) at \(n>l(1-1/\alpha )\) corresponds to the sudden change of the effect of the mutation radius on the probability of adaptation shown on Fig. 2. Note that this mutation control function is not optimal for minimisation of the expected distance up to \(\lambda >1\) generations, because strings that are closer to the optimum than \(l(1-1/\alpha )\) do not mutate, so that there is no chance of improvement.
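The one-generation optimality of the step function (8) can be verified numerically: the conditional mean \(E[m\mid n]=n+\mu \,(l-n\alpha /(\alpha -1))\) under point mutation is linear in \(\mu \), so its minimiser over \(\mu \in [0,1]\) sits at the endpoint determined by the sign of \(l-n\alpha /(\alpha -1)\), in agreement with (8). A sketch (our naming) comparing (8) against constant rates in \(\mathcal {H}_4^{10}\), where the threshold \(l(1-1/\alpha )=7.5\):

```python
from math import comb

def p_m_given_n(m, n, l, alpha, mu):
    """P_mu(m|n) of Eq. (6), via beneficial/deleterious substitution counts."""
    p_fix = mu / (alpha - 1)
    total = 0.0
    for kb in range(n + 1):
        kd = kb + m - n
        if 0 <= kd <= l - n:
            total += (comb(n, kb) * p_fix**kb * (1 - p_fix)**(n - kb)
                      * comb(l - n, kd) * mu**kd * (1 - mu)**(l - n - kd))
    return total

def mean_next(n, l, alpha, mu):
    """One-generation expected distance E[m|n] under mutation rate mu."""
    return sum(m * p_m_given_n(m, n, l, alpha, mu) for m in range(l + 1))

def mu_step(n, l, alpha):                 # the control function of Eq. (8)
    t = l * (1 - 1 / alpha)
    return 0.0 if n < t else (1 - 1 / alpha if n == t else 1.0)

l, alpha = 10, 4
for mu_const in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert all(mean_next(n, l, alpha, mu_step(n, l, alpha))
               <= mean_next(n, l, alpha, mu_const) + 1e-9
               for n in range(l + 1))
print("step control (8) minimises the one-generation expected distance")
```

As the text notes, this control is not optimal over longer horizons, since strings below the threshold never mutate and thus never improve.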
The variational problem for the optimal control of the mutation rate, such as problem (7), can be formulated in different ways, optimising different criteria (e.g. instantaneous or cumulative expected distance, probability of adaptation, probability of mutating directly into the optimum) or taking into account additional constraints (e.g. the time horizon, information constraints), and generally these formulations lead to different solutions. Previously, we investigated various types of such problems and obtained their solutions (Belavkin et al. 2011; Belavkin 2011, 2012), some of which are shown on Fig. 3. One can see that there is no single optimal mutation rate control function. However, it is also evident that all these control functions share the common property of a monotonically increasing mutation rate with increasing distance from the optimum. The main question we address in this paper is whether such monotonic control of the mutation rate is beneficial in a broader class of landscapes, when fitness is not equivalent to distance. In Sect. 4, we shall develop the theory further, from the simple case considered in this section to more general fitness landscapes, and formulate hypotheses which will be tested in biological landscapes in Sect. 5. To generate data for this testing, we develop an evolutionary technique in Sect. 3 to obtain approximations to the optimal control functions in a broad class of problems, where the derivation of exact solutions is impractical or impossible.