The geometry of recombination

With the tools of information geometry, we can express relations between marginals of a joint distribution in geometric terms. We develop this framework in the context of population genetics and use this to interpret the famous Ohta–Kimura formula (cf. Ohta and Kimura in Genet Res 13(01):47–55, 1969) and discuss its generalizations for linkage equilibria in Wright–Fisher models with recombination with several loci. The state space associated with the Ohta–Kimura model is simply a Riemannian manifold of constant positive curvature. Furthermore, the equilibria states for recombination can be interpreted geometrically as a product of spheres. In the case of only 2 loci, we also derive the behavior of the mutual information between these two loci.


Introduction
In mathematical terms, the most basic model of mathematical population genetics, the Wright-Fisher model [11,20], is about iterated sampling with replacement. We shall explain the biological meaning in a moment, but first describe the mathematical structure, beginning with a simplified version and adding more details subsequently.
In the discrete model, in each generation, we have M entities, called gametes (later, we shall let M → ∞). They need not all be different, but they may fall into K different classes or types. For the next generation, we sample from that pool of M gametes with replacement, to create a new pool of M such gametes. For biological reasons, there might be sampling biases, but we ignore those here, and sample uniformly. Mathematically, this leads to the multinomial distribution. The question is how the class composition will evolve asymptotically, that is, how the relative frequencies of the different types will change when the number of generations goes to ∞, but we shall return to that later.
We are interested in another fundamental biological effect, recombination. To set this up, we first need a slight modification. We consider a population of N individuals that are diploid, i.e., carry two gametes each. That is, we have M = 2N gametes. As before, the gametes are sampled with replacement in each generation from the previous generation, and new individuals are randomly formed as pairs of gametes. Now recombination becomes possible, that is, two gametes can be fused into a single one. Biologically, that means that for each member of the next generation, we choose two individuals as parents and combine their altogether 4 gametes into 2 new ones. Of course, we could simply take one randomly from each parent, and then the sampling process, although now formulated in a more complicated manner, would not be different from the one described earlier. But now our gametes possess some internal structure that can get mixed or recombined during the sampling process. The gametes are strings with positions A, B, . . ., the so-called loci and at each locus, different values are possible. These different values are called alleles. For instance, in the simplest nontrivial case of two loci, A and B, at A, the possible alleles might be A 0 , . . . , A m , and at B, B 0 , . . . , B n . In the simplest case, each of these two loci can carry only two possible alleles. Let us denote those at the first locus by A 0 , A 1 , and those at the second locus as B 0 , B 1 . Thus, the possible gamete types are (A 0 , B 0 ), (A 0 , B 1 ), (A 1 , B 0 ), and (A 1 , B 1 ). When, for instance, a gamete of type (A 0 , B 0 ) is recombined with one of type (A 1 , B 1 ), and recombination gives the first allele of the first and the second allele of the second gamete to the offspring, that offspring then is of type (A 0 , B 1 ), hence different from either parent. Of course, when the gametes have more loci, more complicated recombination schemes are possible, but the preceding example already captures the essence.
Things become simpler when one passes to a continuum limit where the population size and the number of generation go to ∞ in an appropriate ratio. The process is then mathematically described by partial differential equations of Fokker-Planck or Kolmogorov type. Again, one is interested in the asymptotic limit states. Without recombination, the result is simple. Eventually, one gamete type will sweep through the population, and all others disappear. With recombination, other questions emerge. We may ask whether the frequency of a particular allele combination (A i , B j ) is equal to the product of the individual frequencies of A i and B j in the population. If that is so, one speaks of linkage equilibrium. Now, the famous Ohta-Kimura 1 formula [16] (see (3.13) below) describes the evolution of the coefficient of linkage disequilibrium that, as the name indicates, measures the deviation from equilibrium, in the continuum limit, for the simplest case of two loci with two alleles each. The formula looks complicated, and it does not appear clear how to systematically extend it to arbitrarily many loci with arbitrarily many alleles at each of them. But this is what we achieve in this paper. And we succeed in that by transforming the problem into a geometric one. Akin [1] had already developed a geometric interpretation of population genetics, but here we make the crucial observation that the underlying metric, called the Shashahani metric in mathematical biology, is nothing but the Fisher metric 2 of information geometry. And this enables us to draw systematically on information geometry, as developed for instance in Amari's books [2][3][4] (see also [6]) to achieve a natural and clear geometric picture of linkage and recombination. This makes the Ohta-Kimura formula easy to derive and conceptually clear. As our approach is general, it naturally works for arbitrary numbers of loci and alleles, and it yields a natural geometric hierarchy within which allele distinctions can be refined. In the end, linkage equilibrium simply corresponds to a product of lower dimensional (positive sectors of) spheres sitting in a larger one.
Before developing that geometric structure, however, let us reformulate the preceding in the language of population genetics. As described, in the most basic model of mathematical population genetics, the Wright-Fisher model [11,20], one considers a population of N individuals that are diploid, i.e., carry two gametes each. The population is resampled with replacement in each generation from the previous generation. 3 More precisely, the 2N gametes are sampled to yield those of the new generation. (A mathematically equivalent model results from considering a population of 2N haploid individuals that carry one gamete each. The essential step of the model will consist in the recombination of gametes, and it is thus the population of gametes that counts.) Those gametes can carry different alleles, and because of the random sampling, the allele composition of the next generation in general will not be identical to that of the previous one. This is the phenomenon of random genetic drift. Additional effects, like mutation or selection, yield sampling biases. Here, we are concerned with recombination. That means that the gametes have several loci, at each of which they can carry different alleles. Before forming the next generation, gametes to be paired in a new individual are broken into two complementary pieces, and the first piece of the first gamete is combined with the second piece of the second one to form a new gamete. An alternative biological mechanisms consists of the formation of zygotes, that is, pairs of gametes, in an individual that are then recombined before being passed to the next generation. 4 It turns out, however, that for the diffusion approximation of the Wright-Fisher model, with which we shall work in the present paper, it does not make a difference whether we randomly pair gametes or zygotes, and so, we shall ignore that distinction here.
Since, as described, the sampling from a generation of gametes is a stochastic process, we need to work with probability distributions for the population of gametes. A basic question is whether the allele probabilities at the different loci are independent of each other. The term linkage between certain loci (or sets of loci) means that the allelic configuration at one locus or loci set affects the allelic configuration at the other loci. In the two-loci 2-allelic model, each allele at one locus can be linked with both other alleles at the other locus. The question is, whether in hopefully obvious notation, If this is the case, the alleles at the two loci are not linked, and one speaks of a linkage equilibrium. This may be violated because of selective effects, because for instance a gamete of type , as well as other inequalities violating (1.1). Here, however, we shall not be concerned with effects of selection. A more basic question is when the first generation with which the process starts is not in linkage equilibrium, that is, does not satisfy (1.1) (where we now interpret the probabilities as relative frequencies), whether we may still expect that (1.1) will asymptotically hold when the number of generations goes to infinity. In this paper, we shall develop a mathematical framework to study this question, taking the formula of Ohta and Kimura [16] as our reference point. As already mentioned, our geometric approach is based on the Fisher metric 5 of information geometry, that makes the Ohta-Kimura formula easy to derive and conceptually clear. As our approach is general, it naturally works for arbitrary numbers of loci and alleles, and it yields a natural geometric hierarchy within which allele distinctions can be refined.
The mathematical perspective that we shall develop is that of information geometry. To put it simply, the family of linkage equilibrium states is simply an exponential subfamily of the family of all probability distributions of the alleles. In fact, the projection of a given distribution onto that family yields a maximum entropy distribution with the same marginals, see [4,6].
In the sequel, we shall speak of genotypes rather than pairs of gametes. Each individual in the population is thus represented by its genotype ξ . p t (ξ ) is the probability that an individual in generation t carries the genotype ξ , and the aim of the model is to investigate the dynamics of the probability distribution p t in time t. (For the purposes of the present paper, we do not need to distinguish between probabilities in future and relative frequencies in past populations.) We assume that the genetic loci of the different members of the population are in one-to-one correspondence with each other. Thus, we have loci α = 1, . . . , k. As we are looking at pairs of gametes, at each locus, there are two alleles, which could be the same or different. As explained above, each individual has two parents, and its genotype is assembled from the genotypes of its parents through sexual recombination. We are interested in how the distribution of genotypes changes over time through the combined effects random drift and recombination. Here, for simplicity, we ignore other evolutionary forces, like selection and mutation. We refer to [12] for the general theory.
Recombination is a binary operation, that is, an operation that takes two parent genotypes η, ζ as arguments to produce one offspring genotype ξ . Here, we assume that, like a chromosome, a genotype consists of a linear string 6 of k sites occupied by particular alleles. An offspring genotype is then formed through recombination by choosing at each locus of each of the two gametes the allele that one of the parents carries there. The selection rule for deciding which of the two possibly different alleles to choose is given by a mask μ, a binary string of length k. An entry 1 at position α means that the allele is taken from the first parent, say η, and when there is a 0 the allele is taken from the second parent, say ζ . For instance, for k = 5, the mask 10010 produces from the parents η = η 1 . . . η 5 and ζ = ζ 1 . . . ζ 5 the offspring ξ = η 1 ζ 2 ζ 3 η 4 ζ 5 . The recombination schemes C ξηζ (μ) for the masks μ and their probabilities p r (μ) then yield the recombination operator When all the possible 2 k masks are equally probable, then, at each locus, the offspring acquires an allele from either parent with probability 1/2, independently of the choices at the other loci. In particular, the linear arrangement of the loci plays no role in this case, and the loci are independent of each other. In contrast, dependencies between sites arise in the cross-over models (see for example [7]). Such models permit only masks of the form μ c = 11 . . . 100 . . . 0. For such a mask, at the first a(μ c ) sites, the allele from the first parent is chosen, and at the remaining k − a(μ c ) sites, the one from the second parent. As a can range from 0 to k, we then have k + 1 possible such masks μ c , and we may wish to assume again that each of those is equally probable. In such cross-over models, obviously, the linear arrangement of the sites is important.
Recombination cannot prevent the basic effect of random genetic drift in the Wright-Fisher model that some alleles may disappear from the population, and in fact, with probability 1, in the long term, only one allele will survive at each site. Because of the random selection, in each generation, it may happen that no carrier of a particular allele is chosen or that none of the chosen recombination masks preserves that allele when the mating partner carries a different allele at the locus under consideration. That would then lead to the irreversible extinction of that allele.
In this paper, we want to study the dynamics of probability distributions. For that purpose, we shall first analyze the geometry of the space of probability distributions. That will provide us with a geometric description and interpretation of the dynamics. In fact, we shall study the continuum limit of the dynamics, as is the custom in mathematical investigations of the Wright-Fisher model. That means that rescale population size N and generation time δt in such a way that N → ∞, but N δt = 1. As is well-known since the work of Wright and Kimura, in the limit, the probability density f ( p, s, x, t) := ∂ n ∂ x 1 ...∂ x n P(X (t) ≤ x|X (s) = p) with s < t satisfies the Kolmogorov forward or Fokker-Planck equation and the Kolmogorov backward equation (1.4) The second order terms arise from random genetic drift, whereas the first order terms with their coefficients b i may incorporate the effects of recombination. We shall develop a geometric framework that will interpret the coefficients of the second order terms as the inverse of the Fisher metric of mathematical statistics. In particular, this will enable us to find a handy and insightful description of the state space and, importantly, the geometric effect of recombination and linkage on it. This will lead us to the following general result for the geometry of linkage equilibria: Theorem 3.5 (cf. p. 25) In linkage equilibrium, for all n + 1 ≥ 2 and k ≥ 2 the corresponding restriction of the state space (n+1) k −1 of the diffusion approximation of a k-loci (n + 1)-allelic recombinational Wright-Fisher model equipped with the Fisher metric of the multinomial distribution is a kn-dimensional manifold and carries the geometric structure of where S n + is the positive sector of the n-dimensional unit sphere. And this is the geometry underlying the Ohta-Kimura formula and its generalization to arbitrarily many loci and alleles.
For simplicity, we have formulated this result under the assumption that the number of alleles at each locus is the same. Of course, that assumption is not necessary. We refer to Corollary 3.6 for the precise statement in the general case. This generalizes the corresponding results in [12].
Since the literature on the Wright-Fisher model is immense, we do not provide a literature review, but rather refer to [10].
We conclude this introduction with some remarks on the notations. Random variables are denoted by capital Latin letters, like X or Y , their values by the corresponding minuscules, like x or y. Expectation of a random variable X is denoted by E(X ). Time, be integer time m ∈ N or real time t ∈ R + , is represented by a subscript. So, Z m or Z t is the value of the random process Z at time m or t.
The random variable Y denotes absolute and X relative frequencies. When we have alleles A 0 , . . . , A n at a single locus, their relative frequencies are given by X = (X 0 , . . . , X n ), and analogously for Y . Thus, n i=0 Y i = N and n i=0 X i = 1, and likewise their realizations satisfy n i=0 x i = 1. We usually write p(x 0 , . . . , x n ) = P(X 0 = x 0 , . . . , X n = x n ) for the probability that the components of the random variable X take the values x i . We shall employ the same notation for different random variables. Thus, p or P do not denote specific functions, but stand for the generic assignment of probabilities. Which random variable is meant will be clear from the variable names and the context.

The basic setting
As we are working with probability distributions, our underlying space is the probability simplex where the x i stand for relative allele frequencies or probabilities of the alleles i = 0, . . . , n. In the infinite population size limit, the x i can take any values between 0 and 1. The normalisation induces correlations between the x i that will be captured by the Fisher metric introduced below. On this space of relative frequencies or probabilities, we shall consider probability distributions p(x). As probability distributions, they need to satisfy the normalisation In the sequel, we shall equip the probability simplex n with various geometric structures. Of course, it already possesses a geometric structures as an affine linear subset of R n+1 . Another geometric structure is obtained by projecting it to the positive sector of the unit sphere in R n+1 . The resulting, surprisingly rich, geometry is the content of information geometry, see [2,4,6]. For information geometry, we need to recall some basic concepts from Riemannian geometry, using [13] as a reference.

The Fisher metric
The results in this section are well known and described here only for completeness and ease of reference. The Fisher information metric of a smooth family of probability distributions on some domain parametrised by s = (s 1 , . . . , s n ) ∈ S ⊂ R n with probability density functions p(ω|s) = p(s) (for short) is given by (cf. [3], p. 27) where E p(s) is the expectation with respect to p(s).
For the basic Wright-Fisher model, we have n + 1 alleles which are assigned probabilities p 0 , . . . , p n . The metric tensor g i j in the coordinates p 1 , . . . , p n on the n-dimensional simplex (2.7) (we have eliminated p 0 = 1 − n i=1 p i to make the coordinates independent) is the Fisher metric 7 tensor and the inverse metric tensor g i j is 7 in the mathematical biology literature, also called the Shahshahani metric.
that is, the covariance matrix of the probability distribution p.
For the multinomial distribution M(2N ; p 0 , . . . , p n ), the metric has to be scaled by the factor 2N , The Fisher metric is nothing but the standard metric on the sphere, written in simplex coordinates p 0 , . . . , p n . We can also compute the Christoffel symbols (see [13], p.22) (2.12) We can also rewrite the metric in spherical coordinates, by simply putting Applying the transformation rule for Riemannian metrics, and Since this has been obtained as the restriction of the Euclidean metric to the unit sphere, this must simply be the standard metric on the unit sphere S n (up to the factor 4 that results from the normalization of the Fisher metric (2.4)). Since the standard metric on the sphere has sectional curvature ≡ 1, and since the sectional curvature of a Riemannian metric scales with the inverse of a scaling factor, our factor 4 leads to  .2), where the Fisher metric has been multiplied by four implying that the sectional curvature will be divided by four.

The transformation rule for geometric differential operators
Below, we shall encounter linear second order differential operators of the form with coefficients a i j and b i . In fact, the coefficients a i j of the leading second order will arise as the inverse g i j of the metric tensor of the Fisher metric.
In order to understand this operator, we recall some constructions from geometric analysis (see [13]). For those, g = (g i j ) can be any Riemannian metric, not necessarily the Fisher one. The basic geometric second order operator is the Laplace-Beltrami operator with the Christoffel symbols of (2.11). Thus [5,Eq. (31)]. Remark 2. 3 We can put this into a more general context, although this will not be explored more deeply in the present article. The Christoffel symbols as defined in (2.11), that is, in the notation of this section i jk = 1 , are the Christoffel symbols for the Levi-Civita connection for the metric g. More generally, we can consider the Christoffel symbols i jk for any affine connection and define a corresponding operator as in (2.18). This operator, however, in general will no longer be in divergence form as in (2.17). Such an operator arises, for example, in [9] in a statistical application. For the Wright-Fisher model which we have formulated on the simplex n as a Riemannian manifold with the Fisher information metric g i j ( p) in (2.8), one considers (see [4,6]) the general affine connection (α) (with α ∈ [−1, 1]) with the symbols i j,k := g k i j given by (2.20) For α = 0, this is (2.11). In fact, in that case, the symbols in (2.20) yield the Levi-Civita connection. The Christoffel force for this case is readily computed from (2.11) to be ch k (2.19) then becomes which reduces to (2.19) when α = 0, because for that case, we have the Levi-Civita connection of the Fisher metric. In information geometry (see [4,6]), the cases α = ±1 are particularly important. In fact, for α = 1, the symbols (2.20) vanish (meaning that the connection is flat). For α = −1, the right hand side of (2.21) vanishes, that is, M 0 coincides with the corresponding Laplacian. In fact, the case α = −1 is dual to α = 1 and hence also flat (meaning that we can find appropriate coordinates in which the corresponding symbols vanish).
We return to (2.19). The Laplace-Beltrami operator does not change its shape under coordinate transformations. The Christoffel symbols, however, do not transform as tensors. Therefore, when changing coordinates, we obtain additional first order terms in M 0 from the transformation of the metric. The precise formula is Lemma 2.4 When changing coordinates (x i ) i → (x i ) i , the partial differential operator (2.16) transforms into (2.23) Proof Letx be a coordinate change and u(x) =ũ(x(x)). The chain rule yields This formula will be used in the following Theorem 3.1.

Exponential families
There is more structure in information geometry than the Fisher metric, see [2,6]. In fact, the probability simplex carries two affine structures that are dual to each other. Their deeper geometry was discovered and explored by Amari and Chentsov. On one hand, we have the mixture structure which is simply the affine geometry of the simplex. Dually, we have the exponential structure. To explain that structure, consider a probability distribution p(x 0 , . . . , x n ). For any 1 ≤ k ≤ n, we have the marginals that is, as the difference between the entropy of the product distributionp = p 0,...,k−1 p k,...,n with the same marginals as p, or as the mutual information between these two marginals. In fact, as can be seen from the fact that the Kullback-Leibler divergence is always nonnegative, the entropy of the product distributionp = p 0,...,k−1 p k,...,n is largest among all distributions with the same marginals. And p →p can be seen as the projection onto the exponential family defined by (2.27). Of course, the same also applies when we refine the decomposition and look at products of more than two marginals. The extreme case is realized by the products which have the highest entropy among all probability distributions with the same marginals. Importantly, the projections can be composed. For instance, a given distribution p can first be projected onto the product p 0,...,k−1 p k,...,n and then further onto the distribution (2.31) with the same marginals. The result is the same as when we directly project p onto such a distribution.

Recombination
Gametes are strings of loci that are occupied by an allele from a set of alleles that is specific for the locus in question. The different gametes in the population are assumed to be perfectly aligned. That means that the number of loci, as well as the set of possible alleles for each, are the same for all gametes within the population under consideration. Thus, we can identify a locus across all the gametes. The loci are assumed to be linearly ordered. Instead of a locus, we may also speak of a site, but this means the same.
When now two gametes are paired, an offspring gamete is formed. Since the new gamete has the same number of loci as each of its parents, for each locus an allele from one of the two parents needs to be chosen. This is recombination; the possibilities for the combinations may be restricted by a recombination scheme.
In order to illustrate the principle, we start with the special case of two loci with two alleles each. With the conventions of Sect. 1, at the first locus the possible alleles are A 0 , A 1 , and for the second locus B 0 , B 1 . When using G as symbol for 'gamete', the possible gametes then are with relative frequencies denoted by X := X i j (as random variables) and x := x i j (as realized values) when G = (A i , B j ). We let i be the allele index opposite to i, that is, i = 0(1) if i = 1(0) and put Linkage equilibrium corresponds to D i j = 0, i.e., the relative frequency of (A i , B j ) equals the product of the relative frequencies of A i and B j . For the mathematics of the model to be discussed shortly, recombination, instead of simply passing parental gametes on to the next generation, should take place only at a certain rate that depends on the population size (see (3.6) below). To formalize this, we introduce the recombination rate R. When the gamete (A i , B j ) is mated with the gamete (A , B h  2 R each. The factor 1 2 appears here because we assume that the mating and recombination of two gametes produces a single offspring in place of two.
The recombination rate R has to be interpreted as a probability. R = 0 means that no recombination occurs, whereas R = 1 2 means that the two loci behave independently. Since R and 1 − R lead to mathematically equivalent models, it suffices to consider 0 ≤ R ≤ 1 2 .

Let us look at some examples. When
is produced with probability 1 2 R, even though neither parent was of this type. Thus, the effect of recombination on gamete frequencies can be neutral, negative or positive. Allele combination frequencies can stay invariant, get reduced or enhanced. In particular, recombination can create allele combinations that were not present in the parental population.
With recombination combined with random sampling, the relative frequency of G changes between the generations m ∈ Z according to This may be calculated easily; as an example, for = 0, ; the factor 2 arises, because the order of the two gametes does not matter. If there is no recombination, we have the martingale property in (3.3), but the presence of recombination introduces a bias. (This is the same with other evolutionary effects like selection or mutation, which, however, we do not study in this paper, but for recombination it may look more surprising than, say, for selection.) In a population of N individuals, we have 2N gametes in each generation. We have two operations, sampling and recombination. Their order matters, but in the continuum limit case that we shall analyze below, this becomes irrelevant.

Diffusion approximation
We now turn to the diffusion approximation of the Wright-Fisher model with recombination. This is a very common technique, and its application is straightforward: We let the population size N → ∞ and rescale time with δt = 1 N . All expectation values then acquire a factor N . The coefficients of the drift term (i. e. coefficients b i in Eq.
with δ X t := X t+δt − X t . To keep them finite we have to assume i.e., the recombination rate has to go to 0 like 1 N . The Eq. (3.6) in turn leads to Thus, the diffusion coefficients a h are now independent of recombination. Since in this situation the third and higher moments behave like o( 1 N ), we can obtain the forward and backward Kolmogorov equations by passing to the diffusions limit (for a proof see [18], Proposition 3.5; cf. also the original derivation of Kolmogorov in [15] on R n instead of on the simplex as here). The result is:

Theorem 3.1 The diffusion approximation of a 2 loci 2-allelic Wright-Fisher model with recombination is described by the Kolmogorov equations for its transition probability density u
is the forward operator, and the Kolmogorov forward equation then is for u( · , t) ∈ C 2 ( 3 ) for each fixed t ∈ (0, ∞) and u(x, · ) ∈ C 1 ((0, ∞)) for each fixed x ∈ 3 . Moreover, when we consider the dependency on the initial data x 0 , s of u, i.e. , consider a conditional probability density u(x, t|x 0 , s), then for any fixed t ∈ (0, ∞), for any probability measure λ 3 on n , and for any Borel measurable subset B in n , the probability v(s, satisfies the Kolmogorov backward equation for v(·, s) ∈ C 2 ( 3 ) for each fixed s ∈ (0, ∞) and v(x 0 , ·) ∈ C 1 ((0, ∞)) for each fixed x 0 ∈ 3 , where When we change coordinates via for f ( · , t) ∈ C 2 ( ( p,q,D) ) for every t > 0 and f ( p, q, D, · ) ∈ C 1 ((0, ∞)) for ( p, q, D) ∈ ( p,q,D) and with (3.14) Figure 1 illustrates the domain of new variables p, q, D.

Compositionality
There is an obvious hierarchy in the single locus case with alleles A 0 , . . . , A k . We could simply merge the alleles A i , . . . , A i −1 into a single super-allele S , and then consider the dynamics of the S . In the multi-locus case, we also have a compositionality in the following sense. If, for instance in the two-locus case, one ignored the dynamics at the second locus and only considered the dynamics at the first locus, then the frequencies at the first locus would not be affected by those at the second locus. Recombination only affects the correlations between the probabilities at the two loci, but not the marginals.
Such hierarchies find a natural expression in our geometric model. For the frequencies of A i , which are given as the sums of the frequencies of the pairs (A i , B 0 ) and (A i , B 1 ), we have a standard Wright-Fisher dynamics. When X i now denotes the frequency of A i and X i j that of the pair (A i , B j ), we have X i = j X i j . For instance, when we sum over the alleles at the second locus, the coefficients of the Kolmogorov operators satisfy, the index for an allele pair corresponding to the pair of indices (i j) in (3.5), (3.7), 1 j,s=0 (3.16) and analogously for summing over the first locus. Thus, by taking marginals, the frequency dynamics governed by the Kolmogorov operators (3.8), (3.11) lead to two frequency dynamics governed by the Kolmogorov operators on a one-dimensional state space. The dynamical process on the positive hyperoctant of the three-dimensional sphere projects to a process on the product of two one-dimensional spheres. The projected processes no longer see the recombination, because it has been summed over. This compositionality simply expresses the fact described in Sect. 2.4 that the projections onto exponential families can be composed.

The geometry of recombination
In the following section, we will finally consider the geometrical properties of the state space of recombination. We shall find that the geometric perspective that we have developed in Sect. 2 can substantially clarify the underlying mathematical structure. Starting with the Ohta-Kimura formula (3.13), we have as the corresponding coefficient matrix of the 2nd order derivatives a i j ( p, q, D) We shall see that these coefficients coincide with those of the inverse of a Fisher metric on ( p,q,D) .
Instead of inverting those coefficients to get the Fisher metric directly, it is much simpler to switch back to the x-coordinates, i. e. , the simplex coordinates. In fact, we can work either with the simplex or the sphere coordinates.
We first change coordinates via The domain ( p,q,D) then is mapped onto 3 , and we have (3.20) Applying the transformation formula of Lemma 2.4 transforms a i j into We then apply (2.13) to transform this via from simplicial to spherical coordinates, which satisfies ∂ y i ∂ x j = 1 2 √ x j δ i j . As a result, the coefficient matrix (a i j ( p, q, D)) of the 2nd order derivatives in Eq. (3.13) then as in (2.15), (2.14) turns from the tensor (k i j ) in coordinates x into the tensor (l i j ) in coordinates y and the inverse, that is the metric tensor itself, now is Thus, ( i j ) and (k i j ) coincide (up to the prefactor 16) with (the inverse of) the standard metric of the 3-sphere S 3 + ⊂ R 4 in spherical or simplicial coordinates. Since scaling the Riemannian metric g with a factor λ > 0, the corresponding sectional curvatures are scaled by 1 λ , Proposition 2.1 therefore yields In particular, k i j (x) coincides (up to scaling and the missing prefactor N ) with the covariance matrix of the multinomial distribution M (N ; p 0 , p 1 , . . . , p 3 Therefore, it also coincides with the Fisher metric of the multinomial distribution on 3 . We state this result as:

The geometry of linkage equilibrium
In order to investigate linkage equilibrium, we first consider the case of two loci. Linkage equilibrium then means that the product of the marginal frequencies of each pair of alleles A i and B j equals the frequency of the gamete A i B j . Figure 2 illustrates the set of linkage equilibrium states W 2 in the two-locus two-allele case. This corresponds to D = 0. We shall now analyse the geometry of such linkage equilibrium states with the concepts of Sect. 2.
They will be represented by product metrics. Starting with the product metric on the product of two Riemannian manifolds, let (M 1 , g 1 ) and (M 2 , g 2 ) be Riemannian manifolds of dimensions n and m, and let g 1 × g 2 be the product metric on M 1 × M 2 , given in local coordinates by The set of linkage equilibrium states, the so-called Wright manifold, denoted by W 2 , in two-locus two-allele case with 0 n,m being the n × m null matrix. The inverse metric is (3.28) Similarly for product metrics with more than two factors. We can now evaluate the Ohta-Kimura formula (3.13) for D = 0. For ( p, q, 0) ∈ ( p,q,0) = ( p, q, 0) ∈ R 2 × {0} 0 < p, q < 1 , (3.17) then becomes Its inverse is (3.30) The (a i j ( p, q, 0)) induces a product metric on 1 × 1 , in which each factor 1 is equipped with the metric g(x) = 4 x(1−x) , that is, (up to the prefactor) the standard metric of the 1-dimensional sphere S 1 + ⊂ R 2 + . Thus, the state space ( p,q,0) resp. a correspondingrestriction of 3 of the diffusion approximation of the two-loci two-allelic recombinational Wright-Fisher model in linkage equilibrium carries the geometrical structure of when equipped with the Fisher metric of the multinomial distribution (cf. Lemma 3.3). This is the positive part of the Clifford torus. We point out that the Clifford torus, while being a minimal surface with vanishing curvature, is not a totally geodesic submanifold of the 3-sphere. That means that geodesics connecting two points on the Clifford torus will in general not stay on this torus, but go through the ambient sphere. Upon reflection, this may not be so surprising in the present context, because the relevant geodesic structure is that of the unit simplex, rather than that of the sphere. In fact, the natural interpretation is in terms of the exponential families of Sect. 2.4. The Clifford torus is nothing but the exponential family of product distributions, that is, at linkage equilibrium the distribution for the allele combinations equals the product of the marginals.

Linkage equilibria for two loci with several alleles
We now extend the notion of linkage equilibrium to models with more than two possible alleles at a locus, but for the moment still consider only two loci, say A and  ...,m, j=1,...,n consisting of m allele frequencies x i• := n j=0 x i j and n allele frequencies x • j := m i=0 x i j and the D i j . Using this notation, the formula for the D i j simplifies, i. e. (3.33) Recalling (1.6), i.e., i, j x i j = 1, this implies that and The coefficients a x i• ,x •l (x • , D) vanish at linkage equilibrium. Next, D). (3.39) Again, these entries vanish in linkage equilibrium. Finally, Again, the corresponding coordinate representation of the inverse metric in linkage equilibrium becomes a block matrix, i. e.
Thus, we have generalized the representation of the metric corresponding to the Ohta-Kimura formula in linkage equilibrium (cf. Eq. (3.29)) to an arbitrary number of alleles. Since as before the inverse of a block matrix is a block matrix, we obtain the product structure of (a (x • ,D) (x • , 0)), given as Note that here we have used a somewhat loose formulation. A more precise statement is given in Theorem 3.5 below.

General linkage equilibria
We now move on to an arbitrary number of loci and alleles. We shall again describe linkage equilibria as product metrics when equipped with their Fisher metric. However, when there are more than two loci, the situation gets significantly more complicated as now linkage, that is, deviation from independence of relative frequencies at different loci, can occur not only between pairs of loci, but also in higher order relations. That is, for instance, even in the absence of pairwise correlations, there could be triple correlations. In particular, we need to clarify the term 'linkage equilibrium' in this extended setting. This may be achieved by generalizing (3.32). A short reflection on the reasoning in Sect. 3.2 convinces us that we may assume for simplicity that the number of possible alleles at the different k ≥ 2 loci is the same, say n + 1 ≥ 2.
Extending the previous notation, we put Instead of 1-linkage disequilibrium coefficients, which would provide no information, allele frequencies are used as non-interaction coordinates.
We then speak of a linkage equilibrium if for all l the coefficients of generalized l-linkage disequilibrium vanish. At linkage equilibrium, we again have a product structure for the state space (n+1) k −1 when equipped with the Fisher metric of the multinomial distribution, (3.45) To see this, we start with the allele frequencies x i 1 •...• , i 1 = 1, . . . , n. They parametrize the multinomial distribution M. We consider the corresponding Fisher metric g ·•...• on n (cf. also Sect. 2.2). Since i 1 x i 1 •...• = 1, (x i 1 •...• ) i 1 may itself be interpreted as a discrete probability distribution in n . The Fisher metricg ·•...• on n is then given byg By (2.10), both metrics coincide, except for the factor N , which we shall simply ignore. The same works for We have the product The image of the product, i. e. the corresponding restriction of the Fisher metric, has a product structure itself, if the following independence relations are satisfied: (3.50) Such a result of course also holds when the number of alleles at the different loci varies. Let the j th locus contain n j + 1 alleles ( j = 1, . . . , k). Putting n = (n 1 , . . . , n k ), |n| = n 1 + · · · + n k and 1 = (1, . . . , 1), we obtain for this k-loci (n + 1)-allelic recombinational Wright-Fisher model: (3.51) The product manifolds appearing in these results are exponential families in the sense of Sect. 2.4. We note that the preceding considerations do not depend on a particular recombination scheme. When no recombination takes place at all, however, the preceding becomes trivial insofar as in that case a k-loci (n + 1)-allelic model is formally the same as a (1-locus) (|n| + k)-allelic model.

Linkage disequilibrium and mutual information for two loci
In this section, we shall answer the question raised in the introduction, about the asymptotics of a process that starts outside of linkage equilibrium, in particular whether it will asymptotically reach equilibrium, and if so, at what rate. We study the asymptotic behavior of two quantities: linkage disequilibrium (local quantity) and mutual information (global quantity). We consider both the evolution of the Fokker-Planck or Kolmogorov forward equation and that of the corresponding Langevin equation. The former is a deterministic partial differential equation for the evolution of a probability density, and for simplicity, we call this the deterministic setting. The latter is a stochastic ordinary differential equation that describes the interaction of deterministic effects (coming from recombination, that is, the dynamics of linkage in the present setting) and stochastic effects (from random sampling). We call this the stochastic setting.
It turns out that in the deterministic setting, the linkage disequilibrium and mutual information exponentially decrease to zero when the number of generations goes to infinity. However, in the stochastic setting, only the expected linkage disequilibrium exponentially decreases to zero, the expected mutual information can increase or decrease, but eventually also converges to zero when the number of generations goes to infinity.

Deterministic setting
In this setting, we work with an infinite population each member of which has two loci A and B. Similar to the case of Sect. 3.5 for finite population size, the relative frequency of genotype A i B j , denoted by x i j , evolves = x i j (t) + r δt D i j (x(t)), (4.1) where r is the deterministic recombination rate which can be considered as the limit of R(N )N when N goes to infinity in finite population. This implies the ODE d dt x i j (t) = b i j (x(t)), for all (i, j) ∈ J , where r = lim N →∞ R(N )N . D i j (x) is the local linkage disequilibrium between A i and B j from (3.32). We let I (A; B) be the mutual information between the loci A and B, i.e.
with the Kullback-Leibler divergence d K L of (2.28). We have the following result.
because of (i, j)∈J x i• x • j log x i• x • j x i, j = d K L ( p A p B p AB ) ≥ 0. This implies the proof.

Stochastic setting
For an individual trajectory, rather than the ODE (4.2), we have the SDE d X i j (t) = b i j (X (t))dt + (k,l)∈J σ i j,kl (X (t))dW kl (t), for all (i, j) ∈ J , t ≥ 0, (4.5) where σ i j,kl (X ) = X i j (δ i j,kl − X i j X kl ).
By taking the expectation, we obtain d E D i j (X (t)) = −(r + 1) E D i j (X (t))dt, which implies E D i j (X (t)) = e −(r +1)t E D i j (0). This completes the proof. We note that in this realization, X 01 (t) (blue) goes to zero first (at time t = 508). At this time D(508) and I (508) are still different from zero therefore we see that X 01 (t) becomes positive again after this time (e.g., at time t = 510). However, at time t = 534 when both X 01 (534) and X 11 (534) (cyan) are equal to zero (then D(534) = I (534) = 0) therefore they both remain zero and never become positive again. Now, only X 00 (t) (red) and X 10 (t) (green) are positive and varying. At time t = 2447, X 00 (2447) = 0 and X 10 (2447) = 1 the process stops with the fixation of gamete A 1 B 0 (see Fig. 3).

A final remark
As explained in Sect. 2.4, among all distributions with the same marginals, the product distributions, that is, the linkage equilibria, are those of highest entropy, see (2.29). As explained in the preceding sections, the random dynamics of population genetics lead towards states of linkage equilibrium. Thus, we here see a tendency to increase the entropy. On the other hand, random genetic drift tends to eliminate alleles until only one remains at each site, and this is a state of vanishing entropy. Hence random genetic drift decreases the entropy.