A rigorous model study of the adaptive dynamics of Mendelian diploids

Adaptive dynamics (AD) so far has been put on a rigorous footing only for clonal inheritance. We extend this to sexually reproducing diploids, although admittedly still under the restriction of an unstructured population with Lotka–Volterra-like dynamics and single locus genetics (as in Kimura’s in Proc Natl Acad Sci USA 54: 731–736, 1965 infinite allele model). We prove under the usual smoothness assumptions, starting from a stochastic birth and death process model, that, when advantageous mutations are rare and mutational steps are not too large, the population behaves on the mutational time scale (the ‘long’ time scale of the literature on the genetical foundations of ESS theory) as a jump process moving between homozygous states (the trait substitution sequence of the adaptive dynamics literature). Essential technical ingredients are a rigorous estimate for the probability of invasion in a dynamic diploid population, a rigorous, geometric singular perturbation theory based, invasion implies substitution theorem, and the use of the Skorohod M1 topology to arrive at a functional convergence result. In the small mutational steps limit this process in turn gives rise to a differential equation in allele or in phenotype space of a type referred to in the adaptive dynamics literature as ‘canonical equation’.


Introduction
Adaptive dynamics (AD) aims at providing an ecology-based framework for scaling up from the micro-evolutionary process of gene substitutions to meso-evolutionary time scales and phenomena (also called long term evolution in papers on the foundations of ESS theory, that is, meso-evolutionary statics (cf. Eshel 1983Eshel , 2012Eshel et al. 1998;Eshel and Feldman 2001). One of the more interesting phenomena that AD has brought to light is the possibility of an emergence of phenotypic diversification at so-called branching points, without the need for a geographical substrate (Metz et al. 1996;Geritz et al. 1998;Doebeli and Dieckmann 2000). This ecological tendency may in the sexual case induce sympatric speciation (Dieckmann and Doebeli 1999). However, a population subject to mutation limitation and initially without variation stays essentially uni-modal, closely centered around a type that evolves continuously, as long as it does not get in the neighborhood of a branching point. In this paper we focus on the latter aspect of evolutionary trajectories.
AD was first developed, in the wake of Hofbauer and Sigmund (1987), Marrow et al. (1992), Metz et al. (1992), as a systematic framework at a physicist level of rigor by Dieckmann and Law (1996) and by Metz andvarious coworkers (Metz et al. 1992, 1996;Geritz et al. 1998). The first two authors started from a Lotka-Volterra style birth and death process while the intent of the latter authors was more general, so far culminating in Durinx et al. (2008) which works out the details for general physiologically structured populations at a physicist level of rigor. The theory was first put on a mathematically rigorous footing by Champagnat and Méléard and coworkers (Champagnat et al. 2008;Champagnat 2006;Méléard and Tran 2009), and recently also from a different perspective by Peter Jagers and coworkers (Klebaner et al. 2011). All these papers deal only with clonal models. In the meantime a number of papers have appeared that deal on a heuristic basis with special models with Mendelian genetics (e.g. Kisdi and Geritz 1999;Van Dooren 1999, 2000Van Doorn and Dieckmann 2006;Proulx and Phillips 2006;Peischl and Bürger 2008), while the general biological underpinning for the ADs of Mendelian populations is described in Metz (2012). In the present paper we outline a mathematically rigorous approach along the path set out in Champagnat et al. (2008), Champagnat (2006), with proofs for those results that differ in some essential manner between the clonal and Mendelian cases. It should be mentioned though that just as in the special models in Kisdi and Geritz (1999), Van Dooren (1999, 2000, Proulx and Phillips (2006), Peischl and Bürger (2008) and in contrast with the treatment in Metz (2012) we deal still only with the single locus infinite allele case (cf. Kimura 1965), while deferring the infinite loci case to a future occasion.
Our reference framework is a diploid population in which each individual's ability to survive and reproduce depends only on a quantitative phenotypic trait determined by its genotype, represented by the types of two alleles on a single locus. Evolution of the trait distribution in the population results from three basic mechanisms: heredity, which transmits traits to new offsprings thus ensuring the extended existence of a trait distribution, mutation, generating novel variation in the trait values in the population, and selection acting on these trait values as a result of trait dependent differences in fertility and mortality. Selection is made frequency dependent by the competition of individuals for limited resources, in line with the general ecological spirit of AD. Our goal is to capture in a simple manner the interplay between these different mechanisms.

The model
We consider a Mendelian population and a hereditary trait that is determined by the two alleles on but a single locus with many possible alleles [the infinite alleles model of Kimura (1965)]. These alleles are characterized by an allelic trait u. Each individual i is thus characterized by its two allelic trait values (u i 1 , u i 2 ), hereafter referred to as its genotype, with corresponding phenotype φ(u i 1 , u i 2 ), with φ : R m → R n . In order to keep the technicalities to a minimum we shall below proceed on the assumption that n = m = 1. In the Discussion we give a heuristic description of how the extension to general n and m can be made. When we are dealing with a fully homozygous population we shall refer to its unique allele as A and when we consider but two co-circulating alleles we refer to these as A and a. We make the standard assumptions that φ and all other coefficient functions are smooth and that there are no parental effects, so that φ(u 1 , u 2 ) = φ(u 2 , u 1 ), which has as immediate consequence that if u a = u A + ζ , |ζ | 1, then φ(u A , u a ) = φ(u A , u A ) + ∂ 2 φ(u A , u A )ζ + O(ζ 2 ) and φ(u a , u a ) = φ(u A , u A ) + 2∂ 2 φ(u A , u A )ζ + O(ζ 2 ), i.e., the genotype to genotype map is locally additive, φ(u A , u a ) ≈ (φ(u A , u A ) + φ(u a , u a ))/2, and the same holds good for all quantities that smoothly depend on the phenotype.
Remark 2.1 The biological justification for the above assumptions is that the evolutionary changes that we consider are not so much changes in the coding regions of the gene under consideration as in its regulation. Protein coding regions are in general preceded by a large number of relatively short regions where all sorts of regulatory material can dock. Changes in these docking regions lead to changes in the production rate of the gene product. Genes are more or less active in different parts of the body, at different times during development and under different micro-environmental conditions. The allelic type u should be seen as a vector of such expression levels. The genotype to phenotype map φ maps these expression levels to the phenotypic traits under consideration. It is also from this perspective that we should judge the assumption of smallness of mutational steps ζ : the influence of any specific regulatory site among its many colleagues tends to be relatively minor.
The individual-based microscopic model from which we start is a stochastic birth and death process, with density-dependence through additional deaths from ecological competition, and Mendelian reproduction with mutation. We assume that the population's size scales with a parameter K tending to infinity while the effect of the interactions between individuals scales with 1 K . This allows taking limits in which we count individuals weighted with 1 K . As an interpretation think of individuals that live in an area of size K such that the individual effects get diluted with area, e.g. since individuals compete for living space, with each individual taking away only a small fraction of the total space, the probability of finding a usable bit of space being proportional to the relative frequency with which such bits are around.

Model setup
The allelic trait space U is assumed to be a closed and bounded interval of R. Hence the phenotypic trait space is compact. For any (u 1 , u 2 ) ∈ U 2 , we introduce the following demographic parameters, which are all assumed to be smooth functions of the allelic traits and thus bounded. Moreover, these parameters are assumed to depend in principle on the allelic traits through the intermediacy of the phenotypic trait. Since the latter dependency is symmetric, we assume that all coefficient functions defined below are symmetric in the allelic traits.
f (u 1 , u 2 ) ∈ R + : the per capita birth rate (fertility) of an individual with genotype (u 1 , u 2 ). D(u 1 , u 2 ) ∈ R + : the background death rate of an individual with genotype (u 1 , u 2 ). K ∈ N: a parameter scaling the per capita impact on resource density and through that the population size.
C((u 1 ,u 2 ),(v 1 ,v 2 )) K ∈ R + : the competitive effect felt by an individual with genotype (u 1 , u 2 ) from an individual with genotype (v 1 , v 2 ). The function C is customarily referred to as competition kernel. μ K ∈ R + : the mutation probability per birth event (assumed to be independent of the genotype). The idea is that μ K is made appropriately small when we let K increase. σ > 0: a parameter scaling the mutation amplitude. m σ (u, h) dh = 1 σ m(u, h σ ) dh: the mutation law of a mutant allelic trait u +h from an individual with allelic trait u, with m(u, h) dh a probability measure with support [−1, 1]∩{h | u+h ∈ U}. As a result the support of m σ is of size ≤ 2σ .
Notational convention When only two alleles A and a co-circulate, we will use the shorthand: To keep things simple we take our model organisms to be hermaphrodites which in their female role give birth at rate f and in their male role have probabilities proportional to f to act as father for such a birth. We consider, at any time t ≥ 0, a finite number N t of individuals, each of them with genotype in U 2 . Let us denote by (u 1 1 , u 1 2 ), . . . , (u N t 1 , u N t 2 ) the genotypes of these individuals. The state of the population at time t ≥ 0, rescaled by K , is described by the finite point measure on U 2 where δ (u 1 ,u 2 ) is the Dirac measure at (u 1 , u 2 ). Let ν, g denote the integral of the measurable function g with respect to the measure ν and Supp(ν) the support of the latter. Then ν σ,K t , 1 = N t K and for any (u 1 , u 2 ) ∈ U 2 , the positive number ν σ,K t , 1 {(u 1 ,u 2 )} is called the density at time t of genotype (u 1 , u 2 ). Let M F denote the set of finite nonnegative measures on U 2 , equipped with the weak topology, and define An individual with genotype (u 1 , u 2 ) in the population ν σ,K t reproduces with an indi- K ν σ,K , f . With probability 1 − μ K (u 1 , u 2 ) reproduction follows the Mendelian rules, with a newborn getting a genotype with coordinates that are sampled at random from each parent.
At reproduction mutations occur with probability μ K (u 1 , u 2 ) changing one of the two allelic traits of the newborn from u to u + h with h drawn from m σ (u, h) dh.
Each individual dies at rate The competitive effect of individual j on an individual i is described by an increase of the latter's death rate. The parameter K scales the strength of competition: the larger K , the less individuals interact. This decreased interaction goes hand in hand with a larger population size, in such a way that densities stay well-behaved. Appendix A summarizes the long tradition of and supposed rationale for the representation of competitive interactions by competition kernels.
For measurable functions F : R → R and g : U 2 → R, g symmetric, let us define the function F g on M K by F g (ν) = F( ν, g ).
For a genotype (u 1 , u 2 ) and a point measure ν, we define the Mendelian reproduction operator and for m(u, h) dh a measure on R parametrized by u, we define the Mendelian reproduction-cum-mutation operator (2. 3) The process (ν σ,K t , t ≥ 0) is a M K -valued Markov process with infinitesimal generator defined for any bounded measurable functions F g from M K to R and ν = 1 The first term describes the deaths, the second term describes the births without mutation and the third term describes the births with mutations. (We neglect the occurrence of multiple mutations in one zygote, as those unpleasantly looking terms will become negligible anyway when μ K goes to zero.) The density-dependent non-linearity of the death term models the competition between individuals and makes selection frequency dependent. Let us denote by (H) the following three assumptions (H1) The functions f , D, μ K and C are smooth functions and thus bounded since U is compact.Therefore there existf ,D,C < +∞ such that and there exists C > 0 such that C ≤ C(·, ·). (H3) For any σ > 0, there exists a functionm σ : For fixed K , under (H1) and (H3) and assuming that E( ν σ,K 0 , 1 ) < ∞, the existence and uniqueness in law of a process on D(R + , M K ) with infinitesimal generator L K can be adapted from the one in Fournier and Méléard (2004) or Champagnat et al. (2008). The process can be constructed as solution of a stochastic differential equation driven by point Poisson measures describing each jump event. Assumption (H2) prevents the population from exploding or going extinct too fast.

The short term large population and rare mutations limit: how selection changes allele frequencies
In this section we study the large population and rare mutations approximation of the process described above, when K tends to infinity and μ K tends to zero. The limit becomes deterministic and continuous and the mutation events disappear. The proof of the following theorem can be adapted from Fournier and Méléard (2004).
Theorem 3.1 When K tends to infinity and if ν K 0 converges in law to a deterministic measure ν 0 , then the process (ν σ,K ) converges in law to the deterministic continuous measure-valued function (ν t , t ≥ 0) solving Below we have a closer look at the specific cases of genetically mono-and dimorphic initial conditions.

Monomorphic populations
Let us first study the dynamics of a fully homozygote population with genotype (u A , u A ) corresponding to a unique allele A and genotype A A. Assume that the initial converging to a deterministic number n 0 > 0 when K goes to infinity. In that case the population process is N K t δ (u A ,u A ) where N K t is a logistic birth and death process with birth rate f A A = f (u A , u A ) and death rate D A A + C AA,AA K N K t . The process ( N K t K , t ≥ 0) converges in law when K tends to infinity to the solution (n(t), t ≥ 0) of the logistic equation with initial condition n(0) = n 0 . This equation has a unique stable equilibrium equal to the carrying capacity:n

Genetic dimorphisms
Let us now assume that there are two alleles A and a in the population (and no mutation). Then the initial population has the three genotypes A A, Aa and aa. We use (N K A A,t , N K Aa,t , N K aa,t ) to denote the respective numbers of individuals with genotype A A, Aa and aa at time t, and (N A A , N Aa , N aa ) to indicate the typical state of the population. Let be the relative frequency of A in the gametes. Then the population dynamics t → is a birth and death process with three types and birth rates b A A , b Aa , b aa and death rates d A A , d Aa , d aa defined as follows.
To see this, it suffices to consider the generator (2.4) with μ K = 0; for instance,

Proposition 3.2 Assume that the initial condition K
and similar expressions for the other terms.
Due to its special functional form, the vector field X has some particular properties. We summarize some of them in the following Propositions.

Proposition 3.3
The vector field (3.6) has two fixed points (n A A , 0, 0) and (0, 0,n aa ) (denoted below by A A and aa) wherē An analogous result holds for D X (0, 0,n aa ).
This result follows from a direct computation left to the reader. As we will see later on, the eigenvalue S Aa,A A will play a key role in the dynamics of trait substitutions. It describes the initial growth rate of the number of Aa individuals in a resident population of A A individuals and is called the invasion fitness of an Aa mutant in an A A resident population. It is a function of the allelic traits u A and u a .
Notation When we wish to emphasize the dependence on the two allelic traits (u A , u a ), we use the notation Note that the function S is not symmetric in u A and u a and that moreover In Appendices B and C the long term behavior of the flow generated by the vector field (3.6) is analyzed in more detail. The main conclusions are:

Proposition 3.4 First consider the case when the mutant and resident traits are precisely equal. Then the total population density goes to a unique equilibrium and the relative frequencies of the genotypes go to the Hardy-Weinberg proportions
, there exists a globally attracting one-dimensional manifold filled with neutrally stable equilibria parametrized by p, with as stable manifolds the populations with the same p. For the mutant and resident sufficiently close, this attracting manifold transforms into an invariant manifold connecting the pure resident and pure mutant equilibria. When S Aa,A A > 0 the pure resident equilibrium attracts only in the line without any mutant alleles and its local unstable manifold is contained in the aforementioned invariant manifold (Theorem C.1). When moreover the traits are sufficiently far from an evolutionarily singular point (defined by ∂ 1 S(u A ; u A ) = 0) the movement on the invariant manifold is from the pure resident to the pure mutant equilibrium, and any movement starting close enough to the invariant manifold will end up in the pure mutant equilibrium (Theorem C.2).

The long term large population and rare mutations limit: trait substitution sequences (TSS)
In this section we generalize the clonal theory of adaptive dynamics to the diploid case. We again make the combined large population and rare mutation assumptions, except that we now change the time scale to stay focused on the effect of the mutations. Recall that the mutation probability for an individual with genotype (u 1 , u 2 ) is μ K ∈ (0, 1]. Thus the time scale of the mutations in the population is 1 K μ K . We study the long time behavior of the population process in this time scale and prove that it converges to a pure jump process stepping from one homozygote type to another. This process will be a generalization of the simple TSS that for the haploid case were heuristically derived in Dieckmann and Law (1996), and Metz et al. (1996) where they were called 'Adaptive Dynamics', and rigorously underpinned in Champagnat (2006), Champagnat and Méléard (2011).
Let us define the set of measures with single homozygote support.
We will denote by J the subset of U where ∂ 1 S(u; u) vanishes. We make the following hypothesis.

Hypothesis 4.1 For any u
This hypothesis implies that the zeros of ∂ 1 S(u; u) are isolated (see Dieudonné 1969), and since U is compact, J is finite.
Note that because of (3.8), Let us now define the TSS process which will appear in our asymptotics.

Definition 4.3
For any σ > 0, we define the pure jump process (Z σ t , t ≥ 0) with values in U, as follows: its initial condition is u A 0 and the process jumps from u A to Remark 4.4 Under our assumptions, the jump process Z σ is well defined on R + . Note moreover that the jump from u A to u a only happens if the invasion fitness S(u a ; u A ) > 0.
We can now state our main theorem.

Theorem 4.5 Assume (H). Assume moreover that ν σ,K
(That is, the initial population is monomorphic for a type that is not an ess). Assume finally that For η > 0 introduce the stopping time where d is the distance on the allelic trait space. Extend M F with the cemetery point ∂.
with u a = u A + h and infinitesimal rate (4.1).
Remark 4.6 Close to singular strategies the convergence to the TSS slows down. To arrive at a convergence proof it is therefore necessary to excise those close neighborhoods. This is done by means of the stopping times T σ,K η and T σ η : we only consider the process for as long as it stays sufficiently far away from any singular strategies. Assumptions (H) imply that the thus stopped TSS (Z σ t ) t is well defined on R + . Since its jump measure is absolutely continuous with respect to the Lebesgue measure, it follows that T σ η converges almost surely to ∞ when η tends to 0 (for any fixed σ > 0).
We now roughly describe the successive steps of the mutation, invasion and substitution dynamics making up the jump events of the limit process, following the biological heuristics of Dieckmann and Law (1996), Metz et al. (1996), Metz (2012).
The details of the proof are described in Appendix D, based on the technical Appendices B and C. The time scale separation that underlies the limit in Theorem 4.5 both simplifies the processes of invasion and of the substitution of a new successful mutant on the population dynamical time scale and compresses it to a point event on the evolutionary time scale. The two main simplifications of the processes of mutant invasion and substitution are the stabilization of the resident population before the occurrence of a mutation, simplifying the invasion dynamics, and the restriction of the substitution dynamics to a competition between two alleles. In the jumps on the evolutionary time scale t/K μ K these steps occur in opposite order. First comes the attempt at invasion by a mutant, then, if successful, followed by its substitution, that is, the stabilization to a new monomorphic resident population. After this comes again a waiting time till the next jump.
To capture the stabilization of the resident population, we prove, on the assumption that the starting population is monomorphic with genotype A A, that for arbitrary fixed ε > 0 for large K the population density ν σ,K t , 1 {(u A ,u A )} with high probability stays in the ε-neighborhood ofn A A until the next allelic mutant a appears. To this aim, we use large deviation results for the exit problem from a domain (Freidlin and Wentzel 1984) already proved in Champagnat (2006) to deduce that with high probability the time needed for the population density to leave the ε-neighborhood ofn A A is bigger than exp(V K ) for some V > 0. Therefore, until this exit time, the rate of mutation from A A in the population is close to μ K p A A f A A Kn A A and thus, the first mutation appears before this exit time if one assumes that Hence, on the time scale t/K μ K the population level mutation rate from A A parents is close to To analyze the fate of these mutants a, we divide the population dynamics of the mutant alleles into the three phases shown in Fig. 1, in a similar way as was done in Champagnat (2006). In the first phase (between time 0 and t 1 in Fig. 1), the number of mutant individuals of genotype Aa or aa is small, and the resident population with genotype A A stays close to its equilibrium densityn A A . Therefore, the dynamics of the mutant individuals with genotypes Aa and aa is close to a bi-type birth and death process with birth rates f Aa y + 2 f aa z and 0 and death rates (D Aa + C Aa,A An A A ) y and (D aa + C aa,A An A A ) z for a state (y, z). If the fitness S Aa;A A is positive (i.e., the branching process is supercritical), the probability that the mutant population with genotype Aa or aa reaches K ε > 0 at some time t 1 is close to the probability that the branching process reaches K ε > 0, which is itself close to its survival probability [S Aa;AA ] + f Aa when K is large. Assuming the mutant population with genotype Aa or aa reaches K ε > 0, a second phase starts. When K → +∞, the population densities ( ν σ,K The study of this dynamical system (see Appendices B and C) implies that, if the mutation step u a − u A is sufficiently small, any solution to the dynamical system starting in some neighborhood of (n A A , 0, 0) converges to the new equilibrium (0, 0,n aa ) as time goes to infinity. Therefore, with high probability the population densities reach the ε-neighborhood of (0, 0,n aa ) at some time t 2 . Applying the results in Theorems C.1 and C.2 for the deterministic system to the approximated stochastic process, is justified by observing that the definition of the stopping times T σ,K η and T σ η implies that the allelic trait u A stays at all times away from the set J . Finally, in the last phase, we use the same idea as in the first phase: since (0, 0,n aa ) is a strongly locally stable equilibrium, we can approximate the densities of the traits A A and Aa by a bi-type sub-critical branching process. Therefore, they reach 0 in finite time and the process comes back to where we started our argument (a monomorphic population), until the next mutation. In Champagnat and Méléard (2011) it is proved that the duration of these three phases is of order log K σ . Therefore, under the assumption the next mutation occurs after these three phases with high probability. Then the time scale Assumption (4.2) allows us to conclude, taking the limits K tending to infinity and then ε to 0. Then we repeat the argument using the Markov property. Note that the convergence cannot hold for the usual Skorohod topology and the space M F equipped with the corresponding weak topology. Indeed, it can be checked that the total mass of the limit process is not continuous, which would be in contradiction with the C-tightness of the sequence (ν σ,K t/K μ K , t ≥ 0), which would hold in case of convergence in law for the Skorohod topology (since the jump amplitudes are equal to 1 K and thus tend to 0 as K tends to infinity). However, certain functionals of the process converge in a stronger sense. Let us for example consider the average over the population of the phenotypic trait φ. This can be easily extended to more general symmetric functions of the allele.
Theorem 4.7 Assume that u → φ(u, u) is strictly monotone. Define Under the assumptions of Theorem 4.5, the process converges in law in the sense of the Skorohod M 1 topology to the process The Skorohod M 1 topology is a weaker topology than the usual J 1 topology, allowing processes with jumps tending to 0 to converge to processes with jumps (see Skorohod 1956). For a càd-làg function x on [0, T ], the continuity modulus for the M 1 topology is given by Note that if the function x is monotone, then w δ (x) = 0.
Proof From the results of Theorem 4.5, it follows easily that finite dimensional distributions of (R σ,K t , t ≥ 0) converge to those of (φ(Z σ t , Z σ t ), t ≥ 0). By Skorohod (1956), Theorem 3.2.1, it remains to prove that for all η > 0, The rate of mutations of (R σ,K t , t ≤ T ) being bounded, the probability that two mutations occur within a time less that δ is o(δ). It is therefore enough to study the case where there is at most one mutation on the time interval [0, δ]. As in the proof of Proposition 3.2, with probability tending to 1 when K tends to infinity, the process (Recall that ϕ t is the flow defined by the vector field; see Proposition 3.2.) Away from invading mutations, the function F W φ is constant and the modulus of continuity tends to 0. Around an invading mutation, it follows from Corollary C.4 that the function F W φ is monotone. Therefore the same conclusion holds.

Small mutational steps: the time scale of the canonical equation
We are now interested to study the convergence of the TSS when the mutation amplitude σ tends to zero. Without rescaling time, the TSS trivially tends to a constant. In order to get a nontrivial limit, we have to rescale time adequately, namely with 1 σ 2 , since S(u A ; u A ) = 0.
Theorem 5.1 Assume that the initial values Z σ 0 are uniformly bounded in L 2 and that they converge to Z 0 0 as σ tends to 0. Then, the sequence of processes (u, u) C ((u, u), (u, u)) .
The proof of this theorem is similar to the proof of Theorem 4.1 in Champagnat and Méléard (2011). In this general form the canonical equation is still of little practical use, although already some qualitative conclusions can be drawn from it. The trait increases whenever the selection gradient ∂ 1 S(u; u) is positive and decreases when it is negative, i.e., movement is always uphill with respect to the current allelic fitness landscape S(·; u). The equilibria of (5.1) correspond to the allelic evolutionarily singular strategies, except that close to those strategies (5.1) is no longer applicable since in their neighborhood the convergence of the underlying individual-based process to the simple TSS becomes slower and slower. So all we can deduce from the canonical equation (5.1) is that for small mutational steps the trait substitution sequence will move to some close neighborhood of an allelic evolutionarily singular strategy.
Remark 5.2 If we had considered extended TSSes taking values in the powers of the trait space as is done in Metz et al. (1996), the convergence to the canonical equation would similarly have gone awry due to a slowing down of the convergence near evolutionarily singular strategies, and the occurrence of polymorphism close to some of them, with adaptive branching as a particularly salient example; branching can only be investigated with a time scaling different from the one for the canonical equation (Metz et al. 1996;Champagnat and Méléard 2011).
To get from the previous observation to some biological conclusion we need to decompose the genotypic fitness function S into its ecological and developmental components Hence, the allelic singular strategies are of two different types, ecological, characterized by ∂ 1S (φ(u; u); φ(u; u)) = 0, and developmental, characterized by ∂ 1 φ(u, u) = 0. On the phenotypic level the latter are perceived as developmental constraints (cf. Van Dooren 2000).
To arrive at quantitative conclusions we have to make additional assumptions about the within individual processes. One often used assumption is that the mutation distribution is symmetric. With that assumption (5.1) reduces to with V a the allelic mutational variance. (The factor 1 2 comes from the fact that the integration is only over a half-line.) This equation can easily be lifted to the phenotypic level as with U = φ(u, u) and V p the phenotypic mutational variance, an equation fully phrased in population level observables. The factor 1 2 is canceled by a factor 2 coming from the fact that the fitnessS refers to heterozygotes with only one mutant allele, while after a substitution the other allele is also a mutant one. For this equation only the ecological singular strategies remain while developmental constraints appear in the form of V p becoming zero (cf. Van Dooren 2000). [It is also possible to lift (5.1) to the phenotypic level. However, the truncated first and second moments that appear in the resulting expression are no longer well-established statistics that can be measured independent of any knowledge of the surrounding ecology.]

Discussion
This paper forms part of a series by a varied collection of authors that aim at putting the tools of adaptive dynamics on a rigorous footing (Metz et al. 1992(Metz et al. , 1996Dieckmann and Law 1996;Geritz et al. 1998;Champagnat et al. 2008;Champagnat 2006;Durinx et al. 2008;Méléard and Tran 2009;Champagnat and Méléard 2011;Metz 2012;Klebaner et al. 2011;Bovier and Champagnat 2012), (see also Diekmann et al. 2005;Barles and Perthame 2007;Carrillo et al. 2007;Desvillettes et al. 2008). It is the first in the series to treat the individual-based justification of the adaptive dynamics tools in a genetic setting. As such it forms the counterpart of the more heuristic, but also more general (Metz 2012). We only consider unstructured Lotka-Volterra type populations and single locus genetics, in line with applied papers such as Kisdi and Geritz (1999), Van Dooren (1999), Proulx and Phillips (2006), Peischl and Bürger (2008). For such models we proved the convergence (for large population sizes and suitably small mutation probabilities) of the individual-based stochastic process to the TSS of adaptive dynamics, and the subsequent convergence (for small mutational steps) of the TSS to the canonical equation. Not wholly unexpectedly, the results are in agreement with the assumed framework of the more applied work. Yet, to arrive at a rigorous proof new developments were needed, like the derivation of a rigorous estimate for the probability of invasion in a dynamic diploid population (Appendix D), a rigorous, geometric singular perturbation theory based, invasion implies substitution theorem (Appendix C), and the use of the Skorohod M 1 topology to arrive at a functional convergence result for the TSS (Sect. 4).
The main differences of Mendelian models compared to the clonal ones in Dieckmann and Law (1996), Metz et al. (1996) and successors, is a difference in the invasion probability of a mutant due to the additional noise inherent in the Mendelian mechanism, and the occurrence of an additional factor 2 in the canonical equation due to the fact that mutants invade as heterozygotes but establish as homozygotes. In addition there is the conceptual point that developmental constraints that appear phenotypically as restrictions on the mutational covariance matrix are on the allelic level seen as restrictions on the allelic selection gradient.
We finish with listing the remaining biological limitations of the present results and the corresponding required further developments.
The first limitation is the assumption of an unstructured population. For a fair number of real populations the assumption of random deaths appears to match the observations, but no organisms reproduce in a Poisson process starting at birth. Moreover, in nature a good amount of population regulation occurs through processes affecting the birth rate, as when a scarcity of resources translates in a delay of maturing to the reproductive condition. Durinx et al. (2008) heuristically treats very general life histories (although only for a finite number of birth states, a finite number of variables channeling the interaction between individuals, and a deterministic population dynamics converging to a unique equilibrium) based on the population dynamical modeling framework of Diekmann et al. (1998Diekmann et al. ( , 2001Diekmann et al. ( , 2003. However, it only considers the convergence to the canonical equation, starting from the TSS, conjectured to be derivable from the population dynamical model, with the goal of relating its coefficient functions to observationally accessible statistics of individual behavior. In fact, even the convergence to a deterministic population model, as in Theorem 3.1, does not easily fit in the scheme of Fournier and Méléard (2004) in the (biologically common) cases where the movement of individuals through their state spaces depends directly or indirectly on the population size and composition. (The special case where this movement decomposes in a product of a population-and a state-dependent term is covered in Tran (2006Tran ( , 2008, Ferrière and , an extension to the "Dapnia" models of ,  in Metz and Tran (2012).) A further limitation is that we assumed the trait to be governed by only a single locus [in keeping with a well-established tradition starting with Kimura (1965)]. The more locus case still has to be worked out. The superficially more easy case with infinitely many loci, so that no mutant ever occurs on the same locus, is considered from a heuristic perspective in Metz (2012), Metz and de Kovel (2012). However, the problem of rigorously setting up the underlying individual-based model as a limit for models with an ever increasing number of loci still needs to be tackled.
The final extension to be considered is to higher dimensional geno-and phenotypic trait spaces. We conclude with a heuristic discussion of the form such an extension will take. On the genotypic level the canonical equation will take essentially the same form as (5.1) and (5.4), with scalar u, h and ∂ 1 S replaced by vectors, and the mutational variance by a covariance matrix, just as this is written in Dieckmann and Law (1996), Champagnat et al. (2008), Durinx et al. (2008), Champagnat and Méléard (2011) for the clonal and Metz (2012), Metz and de Kovel (2012) for the Mendelian case. However, there is one remaining snag, which is the reason why we opted for treating only the one-dimensional case. In the directions orthogonal to the selection gradient the fitness landscape around the resident strategy has the same shape as at an evolutionarily singular strategy. In the one-dimensional case we opted for just removing the neighborhoods of the singular strategies. If we were to apply the same strategy for the higher dimensional case we would have to remove all residents. The way out is by observing that the directions where something awry may occur are but a very small minority among all possible directions in which mutations may occur. Heuristic calculations suggest that the trouble only occurs in a narrow double horn with a boundary that at the resident strategy is orthogonal to the selection gradient, so that when the mutational step size σ goes to zero, the probability of a mutant ending up in that horn decreases as some higher power of σ . Moreover, in the directions orthogonal to the selection gradient the fitness is a quadratic function, making the probability of invasion scale not linearly but quadratically with the size of any mutational steps in those directions. The main problem with such mutants is that some of them may on the population dynamical time scale keep coexisting with the resident. Further heuristic calculations then suggest that for such a resident pair the probability of invasion of a subsequent mutant more in the direction of the selection gradient is to the lowest order of approximation -in the distance between the two residents -equal to the probability of invasion in a monomorphic population of the average type, and that such a mutant ousts both residents. Therefore the general (i.e., more type) TSS is close to a simple TSS in which those untoward mutants are just removed from the consideration, the smaller the mutational step the closer. We put rigorously underpinning this scenario forward as the last of our list of challenges.
Acknowledgments This work benefitted from the support from the "Chaire Modélisation Mathématique et Biodiversité of Veolia Environnement-Ecole Polytechnique-Museum National d'Histoire Naturelle-Fondation X". Two anonymous reviewers helped us to improve the paper by suggesting small but useful additions and listing a good many small typos in the formulas.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix A: A few words about competition kernels
In the ecological literature the models described in Sect. 2 are known as Lotka-Volterra competition models (Lotka 1925;Volterra 1931). The early LV models were all deterministic, phrased as ODEs corresponding to large population limits such as considered in the Sect. 3, without mutations. The determinism together with the assumption of clonal reproduction obviated the need to separately model birth and deaths: competition was represented as its overall effect on the population growth rate. The later stochastic models, e.g. (Dieckmann and Law 1996;Metz et al. 1996), usually put the effect of competition only in the death rate, as otherwise the chosen linear form of the interaction might lead to negative birth rates.
The simplest case is when C = 0. This is the case customarily put forward in population genetics textbooks as starting point for the derivation of their deterministic models for gene frequency change by selection, but for the fact that population geneticists usually work in discrete time. The unnatural consequence that the population either will die out or will keep growing indefinitely is made invisible by transforming to relative frequencies. The more realistic case of non-selective competition, v 2 ), leads to the same population genetical equations. The selective pressures on the gene frequencies then do not change with the population size or composition as they are caused only by differences in the fixed mortality components and the fertilities.
Where in population genetics the early selection models assumed indefinitely growing populations, the early stochastic models, in continuous time the Moran-type models, assumed constant population sizes. Although later variable population sizes were introduced, it was just assumed that these sizes neither become zero nor grow too large too often (Karlin 1968;Seneta 1974;Heyde and Seneta 1975;Heyde 1977Heyde , 1983Donnelly and Weber 1985;Klebaner 1988). Stochastic models with the population regulation represented in accordance with ecological tradition are relative newcomers (e.g. Metz and Redig 2012).
The case where the additional death rate incurred by an individual from its competitive interaction depends only on the genotype of the focal individual and not on that of its competitors is known in the ecological literature as purely density dependent selection (Roughgarden 1971(Roughgarden , 1976(Roughgarden , 1979 , and in the mathematical literature as logistic population regulation. This logistic case can be generalized to C ((u 1 , u 2 ), (v 1 , v 2 )) = C(u 1 , u 2 ) C(v 1 , v 2 ), when it is not the total density but, e.g. the total biomass that determines the felt competitive effect and different phenotypes have different biomasses. A further generalization is that population growth is regulated by a finite number of variables, think for example of the combination of space and nitrogen depletion: The vector ( C 1 , . . . , C k ) T is known as the impact of the individuals on their environment, and the vector ( C 1 , . . . , C k ) as their sensitivity (Meszéna et al. 2006). The latter generalization is evolutionarily richer in that it can allow diversification, which is excluded by the earlier considered kernels. In Durinx et al. (2008) it is shown heuristically that close to an evolutionarily singular strategy any clonal model evolutionarily behaves like a Lotka-Volterra competition model of the above type with k smaller than or equal to one plus the dimension of the trait space.
The above considerations all come from either ecology or population genetics, and originally were phrased for a fixed finite number of types, clonal ones in the ecological and Mendelian ones in the population genetics literature. The first model characterizing these types in terms of traits was formulated by MacArthur and Levins (1964), (see also MacArthur 1970). This model was later used to great effect by a large number of authors (e.g. Levins 1968;MacArthur and Levins 1967;May 1973May , 1974Roughgarden 1976Roughgarden , 1979Christiansen and Fenchel 1977;Slatkin 1980), (but see also Roughgarden 1989), to study species packing population dynamically as well as evolutionarily. The first genetic model of this type was studied by Loeschcke (1980, 1987), Loeschcke and Christiansen (1984), who considered the possibilities for the coexistence of finite numbers of genotypes. Explicit traitbased LV-style birth and death process models with mutation only appeared on the scene with the birth of adaptive dynamics (Dieckmann and Law 1996;Metz et al. 1996).
The most common assumption in trait-based LV competition models (MacArthur and Levins 1964;MacArthur 1970MacArthur , 1972Roughgarden 1979) is that Here z ∈ R is customarily interpreted as a trait of a fine-grained self-renewing resource with a fast logistic dynamics that is supposed to be non-evolving. That is, it is assumed that a resource unit comprises close to infinitely many very small particles, so that the resource dynamics can be treated as deterministic and that the turnover of the resource is very fast so that it effectively tracks its deterministic equilibrium as set by the current consumer population. Functions of (u 1 , u 2 ) depend again on this argument through φ. Q is the average rate constant for the encounter and absorption of resource particles by our consumer individuals, expressed in resource units, while q tells how this use is spread over the resource axis.
The most commonly used parametric form is Deterministic models based on this kernel have all sorts of nice mathematical properties, but Adaptive Dynamically they are a bit degenerate in that when σ a < σ k the final stop for TSS that result from the long term large population and rare mutations limit, as treated in Sect. 4, is a Gaussian distribution over trait space (cf. Roughgarden 1979) whereas for almost any slightly different model the final stop has finite support (Gyllenberg and Meszéna 2005;Leimar et al. 2008). For this reason adaptive dynamics researchers started to use slightly modified expressions for k or C. [When K is still finite, the number of branches visible in simulations also stays finite, due to the early abortion of incipient ones, with the number of recognizable branches becoming larger with increasing K and σ k /σ a (Claessen et al. 2007(Claessen et al. , 2008.] Exploring the consequences of all sorts of different competition kernels by now has become a little growth industry; a good sample may be found in Doebeli (2011).
Remark A.1 The description of the mechanism underlying the competition kernel given above was a bit brash, in keeping with biological tradition. Starting from an underlying fast logistic resource dynamics actually gives with y the yield, i.e., y −1 is the resource mass needed to make one consumer, w the mass of a resource unit, v the rate constant of consumers encountering and eating resource units, d 1 the rate constant of consumer mass loss due to basal metabolism, and d 2 the consumer mortality rate, r R the low density reproductive rate of the resource, and k R its carrying capacity. V is some unknown proportionality constant. (In the above terms the time scale separation results from both r R and v being very large and y very small with the product of y and v being O(1).) Apparently the interpretation of Q and q is more complicated than the standardly attributed one based on the assumption of constant wk R /r R . Although time-honoured, the above described mechanistic underpinning is not without flaws, as explicitly laid out by Chesson (1990). In the derivation it is assumed that, but for the indirect coupling through the consumers, the dynamics of different resources are independent. Even very similar resource populations do not compete. However, this is only possible if their ecological properties depend everywhere discontinuously on the trait z, since the assumed logistic nature of the resource dynamics means that there is non-negligible competition between equal resource particles. The alternative assumption alluded to by MacArthur (1972) that the intrinsic resource dynamics is of a chemostat type (as can be approximately the case for seeds from perennial plants) also is problematical: Under the reasonable assumption that the resource mass removed by a consumer population equals the mass this population acquires, the detrimental effect from competition becomes non-linear in the competitor densities, instead of being simply representable by a competition kernel.
Appendix B: Properties of the vector field (3.6.)

B.1 Neutral case
We first consider the case of neutrality between the A and a alleles, namely f a 1 a 2 = f , D a 1 a 2 = D 0 and C a 1 a 2 ,b 1 b 2 = D 1 for a 1 a 2 , b 1 b 2 = A A, Aa, aa. We have in this case with n = x + y + z p = x + y/2 n which is the proportion of allele A. We get for the vector field The vector field X 0 has a line of fixed points given by That is, we have for any v, X 0 ( 0 (v)) = 0. The parametrization with v is chosen such that the differential of the vector field X 0 at each point of the curve 0 , DX 0 ( 0 (v)), has the three eigenvectors The corresponding eigenvectors of the transposed matrix D X 0 ( 0 (v)) t , to be denoted by β 1 (v), β 2 (v) and β 3 (v) can be normalized such that for any i, j, ∈ {1, 2, 3} and any v Proof This is easily seen by using the standard variables: total population density, n = x + y + z, relative frequency of the A allele, p = (x + y/2)/n, and excess heterozygosity realtive to the Hardy-Weinberg proportion, h = y/n − 2 p(1 − p). In these new coordinates, the vector field X 0 becomes the vector field Y 0 given by This vector field obviously vanishes on the line n = n 0 , h = 0. One gets immediately the results by taking v = n 0 (1 − 2 p). The spectral results follow by standard computations.

B.2 Small perturbations
We now assume that mutations are small. We denote by ζ the variation of the allelic trait ζ = u a − u A . The vector field depends on ζ and will be denoted by X (ζ, M). We assume regularity in ζ and M, and observe that X (0, M) = X 0 (M).
In practice we will apply our results to the vector field (3.6) which has a particular algebraic form. It is however convenient to derive the perturbation results in full generality. We will come back to the particular case of (3.6) in Sect. C.
Our goal in this section is to understand the time asymptotic of the flow associated to the vector field X (ζ, M).
Since the curve 0 is transversally hyperbolic (even transversally contracting, see Proposition B.1) for the vector field X 0 , we can apply Theorem 4.1 in Hirsh et al. (1977) to conclude that for ζ small enough, there is an attracting curve ζ invariant by X . Moreover, ζ is regular and converges to 0 when ζ tends to zero. In other words, there is a small enough tubular neighborhood V of 0 such that for any |ζ | small enough, ζ is contained in V and attracts all the orbits with initial conditions in V . [For earlier, weaker results in this direction for general differential and difference equation population dynamical models without genetics see (Geritz et al. 2002;Dercole and Rinaldi 2008, Appendix B).] Applying Theorem 4.1 in Hirsh et al. (1977) requires that the curve 0 is a compact manifold without boundary, but this is not the case here. However one can perform some standard surgery to put our problem in this form in a neighborhood of the part of 0 which lies in the positive quadrant which is the only part of phase space that matters for us.

B.2.1 Location of the zeros of the perturbed vector field
Since the curve ζ is invariant and (locally) attracting for the flow associated to the vector field X (ζ, M), where M stands for the vector (x, y, z) it is enough to study the flow on this curve. In particular, since ζ is a curve, if the vector field does not vanish on ζ except at the intersections with the lines x = y = 0 and y = z = 0 (the fixed points aa and A A respectively see Theorem 3.3), we know that the orbit of any initial condition on ζ (between A A and aa) will converge either to A A or to aa.
We now look for the fixed points on ζ of the flow associated to the vector field X (ζ, M) which are the points where the vector field vanishes. Since ζ is attracting, it is equivalent (and more convenient) to look for the fixed points in V .
It is convenient to use for this study local frames in the tubular neighborhood V of 0 . There are many possibilities for defining such frames, we found that a convenient one is to represent a point M by the parametrisation with v ∈ [−n 0 − δ, n 0 + δ], r ∈ [−δ, δ], s ∈ [−δ, δ] with δ > 0 to be chosen small enough later on. We observe that M(v, 0, 0) = 0 (v). The Jacobian of the transformation (v, r, s) → (x, y, z) = M(v, r, s) is equal to −(1 + r )/2 and therefore does not vanish if 0 < δ < 1. It is easy to verify that if δ > 0 is small enough, the map (v, r, s) → M(v, r, s) is a diffeomorphism of [−n 0 − δ, n 0 + δ] × [−δ, δ] 2 to a close neighborhood of V (provided this tubular neighborhood is small enough). In particular, once δ > 0 is chosen, for any ζ > 0 small enough, V contains the intersection of ζ with the first quadrant (by continuity of ζ in ζ ).
In order to find the zeros of the vector field X (ζ, M), we will use convenient linear combinations of its components which reflect the fact that the flow is transversally hyperbolic. We will first equate to zero two linear combinations of the components, and by the implicit function theorem this will lead to a curve containing all possible zeros. We will then look at the points on this curve where the third (independent) linear combination of the components vanishes.

Proposition B.2 For any
depending smoothly on ζ , and converging to 0 when ζ tends to zero such that for any v ∈ [−n 0 − δ, n 0 + δ] we have Moreover, if a point (v, r, s) with v ∈ [−n 0 − δ, n 0 + δ], r and s small enough is such that Proof Consider the map F from R 2 × R 2 to R 2 given by For any v 0 ∈ [−n 0 − δ, n 0 + δ], and |ζ | small enough, the differential of F in (r, s) at (0, v 0 , 0, 0) is invertible. This follows by continuity from the same result in ζ = 0 where the determinant of the differential is f ( f − D 0 ). Therefore, by the implicit function theorem (see for example Dieudonné 1969), for any v 0 ∈ [−n 0 − δ, n 0 + δ], there is an open neighborhood U v 0 of (v 0 , 0) in R 2 and two regular functions functions on U v 0 , r v 0 and s v 0 such that for any (ζ, v) ∈ U v 0 we have Since the set [−n 0 − δ, n 0 + δ] × {0} is compact in R 2 , we can find a finite sequence v 1 , . . . , v m such that the finite sequence of sets (U v j ) is a finite open cover of [−n 0 − δ, n 0 + δ] × {0}. We now define the functions r and s in the tubular neighborhood (ζ, v) by the uniqueness of the solution in the implicit function theorem. The last assertion of the proposition follows also from the uniqueness of the solution in the implicit function theorem.
It follows immediately from the above result that the vector field X (ζ, · ) vanishes in a small enough neighborhood of 0 if and only if which at a given ζ is an equation for v.
We analyze a neighborhood of the point ζ = 0. We first observe that Therefore by the Malgrange preparation Theorem (Golubitsky and Guillemin 1973) (the Weierstrass preparation Theorem in the analytic setting), we can write The function g in (B.2) is given by Proof We have The lemma follows at once from Proposition B.1.
The following result gives conditions for the perturbed vector field to have only two fixed points near 0 .
As we will see in the proof g(±n 0 ) = 0 and the condition dg/dv(±n 0 ) = 0 ensures that these zeros are isolated.
Proof We observe that On the other hand, by a direct computation one gets there is a number δ > 0 such that for |ζ | small enough non zero, the function v → β 2 (v), X (ζ, M(v, r ζ (v), s ζ (v))) has at most two zeros in the interval [−n 0 −δ , n 0 + δ ]. Such zeros must be simple and near ±n 0 . By Theorem 3.3 we conclude that these two zeros exist and are the two fixed points aa and A A respectively. Consider now the average phenotypic trait φ. This corresponds to the vector Corollary C.4 The function F W φ is strictly monotonous for |ζ | small enough.
Proof One gets and by Proposition C.3 we get the monotonicity in time of the average phenotypic trait.

Appendix D: Proof of Theorem 4.5
The proof of the theorem will essentially follow the same steps as the ones of the proof of Theorem 1 in Champagnat (2006) and of the Appendix A in Champagnat and Méléard (2011). We will not repeat the details and we will restrict ourselves to the steps that must be modified. The proof is based on intermediary results that we state now.
Proposition D.1 Assume that for K ≥ 1, Supp(ν K 0 ) = {A A, Aa, aa} and and similarly for Aa and aa, where ϕ t is the flow of the vector field (3.6).
The proof of this result can be obtained following a standard compactness-uniqueness result, (see Ethier and Kurtz 1986;Fournier and Méléard 2004) and using Theorem C.2.
Proposition D.2 Let Supp(ν K 0 ) = {A A} and let τ 1 denote the first mutation time. For any sufficiently small ε > 0, if ν K 0 , 1 A A belongs to the ε 2 -neighborhood of n A A = f AA −D AA C AA,AA , the time of exit of ν σ,K t , 1 A A from the ε-neighborhood ofn A A is bigger than e V K ∧ τ 1 with probability converging to 1. Moreover, there exists a constant c such that for any sufficiently small ε > 0,the previous result still holds if the death rate of an individual with genotype A A is perturbed by an additional random process that is uniformly bounded by c ε.
(In principle τ 1 also depends on K , but to avoid clutter we have suppressed this in the notation.) Such results are standard (cf. Champagnat 2006). The first part of this proposition is an exponential deviation estimate on the so-called "exit from an attracting domain" (Freidlin and Wentzel 1984). It is used to prove that when the first mutation occurs, the population density has never left the ε-neighborhood ofn A A . When a mutation a occurs, the additional term in (D.2) is C A A,Aa ν σ,K t , 1 Aa + C A A,aa ν σ,K t , 1 aa which is smaller thatC ε if ν σ,K t , 1 Aa + ν σ,K t , 1 aa ≤ ε. From these results, one can deduce the following proposition, already proved in Champagnat (2006).

Proposition D.3 Let Supp(ν K
0 ) = {A A} and let τ 1 denote the first mutation time. There exists ε 0 such that if ν K 0 , 1 belongs to the ε 0 -neighborhood ofn A A , then for any ε < ε 0 , lim K →∞ P K τ 1 > ln K , sup t∈[ln K ,τ 1 ] | ν σ,K t , 1 −n A A | < ε = 1, and K μ K τ 1 converges in law (when K tends to infinity) to a random variable with exponential law with parameter 2 f A A p A An A A , that is for any t > 0, Then, if ln K 1 K μ K , we deduce that lim K →∞ P K τ 1 < ln K = 0 and that for any ε > 0 lim K →∞ Let us define two stopping times which describe the first time where the process arrives in a ε-neighborhood of a stationary state of the dynamical system. Note that τ A is the extinction time of the population with alleles a and fixation of the allele A and that τ a is the extinction time of the population with allele A and fixation of the allele a. (