The canonical equation of adaptive dynamics for life histories: from fitness-returns to selection gradients and Pontryagin’s maximum principle

This paper should be read as addendum to Dieckmann et al. (J Theor Biol 241:370–389, 2006) and Parvinen et al. (J Math Biol 67: 509–533, 2013). Our goal is, using little more than high-school calculus, to (1) exhibit the form of the canonical equation of adaptive dynamics for classical life history problems, where the examples in Dieckmann et al. (J Theor Biol 241:370–389, 2006) and Parvinen et al. (J Math Biol 67: 509–533, 2013) are chosen such that they avoid a number of the problems that one gets in this most relevant of applications, (2) derive the fitness gradient occurring in the CE from simple fitness return arguments, (3) show explicitly that setting said fitness gradient equal to zero results in the classical marginal value principle from evolutionary ecology, (4) show that the latter in turn is equivalent to Pontryagin’s maximum principle, a well known equivalence that however in the literature is given either ex cathedra or is proven with more advanced tools, (5) connect the classical optimisation arguments of life history theory a little better to real biology (Mendelian populations with separate sexes subject to an environmental feedback loop), (6) make a minor improvement to the form of the CE for the examples in Dieckmann et al. and Parvinen et al.

fitness gradient equal to zero results in the classical marginal value principle from

Introduction
In their recent paper "Function-valued adaptive dynamics and optimal control theory", Parvinen et al. (2013) give (i) an abstract recipe for calculating the selection gradient for function valued traits affecting the i(ndividual)-dynamics of physiologically structured populations for use in the canonical equation of adaptive dynamics (in the terminology of Metz and Diekmann (1986); Parvinen et al. refer to these models as process-mediated) and (ii) a recipe for calculating the corresponding evolutionarily steady strategies (ESS-es) by using Pontryagin's maximum principle (c.q. evolutionarily singular strategies (ess-es) if we confine ourselves to the first order condition derived from this principle). They subsequently apply these recipes to derive concrete expressions for three sample models. However, they do not explicitly consider the relationship between (i) and (ii) but for numerically demonstrating that for their special models the adaptive trajectories approach the ess. In this note we (i) demonstrate how the selection gradient can be calculated from a concrete starting point by using the idea of fitness returns, which gives an interpretation to the components of the resulting formulas, and (ii) show that setting the selection gradient equal to zero leads to a classical marginal value argument which turns out to be equivalent to the local version of Pontryagin's maximum principle.
Terminology We employ the term fitness return here for a concept that is widely used in evolutionary ecology, often also under this name, but for which we failed to find an explicit definition. If some fitness proxy can be decomposed as the sum of a number of terms that supposedly stand for the contributions of different pathways by which fitness can accrue, we call the effect of a strategy change on the contribution of a pathway the fitness return through that pathway. For a global ESS the sum of all fitness returns is non-positive whatever the strategy change. For local ESS-es we consider only the fitness returns of infinitesimal strategy changes. To accord with common usage these should be called marginal returns. However, as these are the only returns that we consider we shall drop the epithet. When the attention is confined to an infinitesimal neighbourhood of a reference strategy far more fitness proxies allow a conceptually useful additive decomposition thanks to the rules of differential calculus. All that is needed is a biologically interpretable way in which the proxy can be decomposed as a differentiable function of a number of differentiable functions of the strategy. The (marginal) fitness return through one of these functions is then defined as the sensitivity of the proxy to the strategy change in a thought experiment in which we keep the argument of all other functions unchanged. The fitness returns from state dependent decisions are usually determined from first principles conditional on the state under consideration. The epithet conditional is customarily dropped in this case. The (marginal) fitness return from a compound decision involving more than one state is calculated by summing the fitness returns for the separate states weighted with their lifetime occurrence frequencies or duration.
To keep the arguments accessible for evolutionary ecologists, we restrict our calculations from the start to the most commonly encountered class of life history models and use the simplest possible mathematical arguments rather than a more advanced functional analytical approach. In the appendices we will sketch how the same arguments can be obtained more rigorously. Basically we assume our readers to be knowledgeable only about demography and the attendant elementary results from probability theory, but not about systems theory or dynamic optimisation.

On selection gradients, canonical equations, and evolutionarily singular strategies, a summary
Below we consider a life history model in which individuals are characterised by two dynamical variables, a physiological state, assumed to move deterministically, and a probability of still being alive, in addition to an inherited strategy u influencing their dynamics. The strategy u (or u res if we talk specifically about the resident's strategy, or u mut if we talk about a mutant strategy) is supposed to be a function of the state of the individual taking values in [0, 1]. To make our life simple we assume that on the population dynamical time scale the community dynamics converges to an equilibrium, which generates the non-fluctuating environment E res = E attr (u res ), with u res the strategy currently in residence. This assumption of a non-fluctuating environment allows us to make use of the fitness proxy R 0 (u mut ; E res ), the average lifetime offspring production of a mutant in the environment E res , calculated e.g. by integrating the average rate of producing offspring over the age of an individual. Consistency requires that R 0 (u res ; E res ) = 1. If R 0 (u mut ; E res ) > 1, a mutant has a positive probability to invade, else it cannot invade. The invasion fitness F (u mut ; E res ) of a mutant is by definition equal to the asymptotic exponential growth rate of a mutant population in the environment E res (Metz et al. 1992;Metz 2008). For R 0 close to 1 this invasion fitness is well approximated by where T r (u res ) is the average age at which the residents give birth in the environment E res (Metz and Diekmann 1986;Durinx et al. 2008).
Remark 1 Dieckmann et al. (2006) and Parvinen et al. (2013) consider seasonal differential-equation-based models where it is possible to calculate the invasion fitness directly by subtracting the time-averaged death rate from the time-averaged birth rate. For such models fitness takes the explicit form of an integral over the year cycle, and there is no need to fall back on an approximation. However, in the usual continuous time life history models only R 0 can be expressed explicitly as an integral. The availability of such an integral-based expression formed the basis for the developments in Parvinen et al. (2013), and will also be the cornerstone for our calculations.
The so-called selection gradient G tells how the invasion fitness of a u mut close to u res depends on the difference u mut − u res . Mathematically, the selection gradient is the derivative of the invasion fitness for u mut evaluated at u mut = u res = u. From the previous approximation formula for the invasion fitness it follows that we can calculate G as (Durinx et al. 2008;Metz 2008). In this formula d R 0 du mut is an abstract differential quotient, i.e. a linear map transforming functions of the physiological state into a number that linearly approximates the nonlinear dependence of R 0 on u mut .
In view of our stress on life history models, let us moreover assume that u is an allocation, so that u takes values in [0, 1]. The assumptions of a non-fluctuating resident environment and a deterministically moving physiological state moreover allow us to represent the strategy u as a function of the age a of an individual, i.e., u : R + → [0, 1]. In that case we can write for a function x : R + → R: (c.f., Parvinen et al. 2013). Hence the problem of calculating G reduces to that of calculating the function g. On the assumption that mutations are rare and mutational steps small the dynamics of u can on the evolutionary time scale be described by the so-called canonical equation (CE) of adaptive dynamics (Dieckmann and Law 1996;Champagnat 2003;Dieckmann et al. 2006;Parvinen et al. 2006Parvinen et al. , 2013Durinx et al. 2008;Méléard and Tran 2009;Champagnat and Méléard 2011;Gupta et al. 2014) with T s the average age at which the residents die, σ 2 the between individual variance of their offspring numbers (i.e., if m i is a lifetime offspring number of the i-th individual, σ 2 = Var (m i )), n their equilibrium population size, μ the (small) probability at a birth event of a mutation affecting u, and c the (small) covariance kernel of the mutational steps, i.e., if x denotes a mutational step in u, then The form of the CE given above is the one for clonally reproducing organisms (the customary assumption in most of life history theory which, however, usually is left implicit). In Appendix 1 we briefly consider its extension to Mendelian diploids. Our formula for the CE is slightly more complicated than the one in Dieckmann et al. (2006) and Parvinen et al. (2013). The reason is that these authors did not consider local constraints on the strategy, whereas in our case u(a) ∈ [0, 1], for each possible age a > 0. See Appendix 2 for further information. Another difference is that Dieckmann et al. (2006) and Parvinen et al. (2013) have set the factor σ 2 equal to 2, in keeping with the idea that for the i-models underlying the standard ordinary differential equation (ODE) models the distribution of the lifetime offspring number is geometric. Moreover, for the standard ODE models T r = T s and since the g in Dieckmann et al. (2006) and Parvinen et al. (2013) corresponds to our f def = g/T r , the T s in (4) cancels. Appendix 3 treats the corresponding considerations for the periodic ODE case considered by Dieckmann et al. (2006) and Parvinen et al. (2013), with as outcome that in this case their n should be interpreted as a harmonic death-rate weighted mean of the population sizes over a cycle.
The equilibria of the CE are the so-called ess-es. If these strategies are moreover (local) fitness maxima for the corresponding E res then they are also evolutionary equilibria, to which we refer as (local) ESS-es. (An alternative is that at an attracting ess the population starts to accumulate variation, so that it no longer stays quasimonomorphic as is supposed in the derivation of the CE. (The latter on good grounds: see Geritz et al. 2002;Geritz 2005;Dercole and Rinaldi 2008, Appendix 2).) Another way to calculate ESS-es is to maximise the invasion fitness, or alternatively R 0 , over u mut , leading to a function-valued map 1 u * mut (u res ), followed by solving the equation u * mut (u res ) = u res . It is here that Pontryagin's maximum principle is encountered (e.g., Pontryagin et al. 1964;Intrilligator 1971). This principle is derived by considering the differential equations for the i-states as constraints on their time development, and to extend the idea of Lagrange multipliers as encountered in finite dimensional optimisation problems to this case. The Lagrange multipliers then become functions of time, which can be shown to satisfy a set of differential equations, and for this reason are referred to as co-states (or adjoints). In Sect. 6 we give explicit expressions for the life history models described in the next section. Appendix 5 shows how Pontryagin's maximum principle can be derived directly from a weak variant of Bellman's principle of optimality, which is rather better known among ecologists.

Model ingredients
Before we get to the specifics we first introduce some notational conventions in order to keep our formulas from becoming too unwieldy.

Conventions
1. The argument E res will be usually hidden. 2. Similarly we shall often hide the argument u in expressions like P(a; u) for the probability that an individual survives till age a, or m(a; u) for its body size at that age. 3. When we use the argument u, then u stands either for u mut , u res , or u mut = u res , with the context making clear which is the case. 4. For a function of a single scalar variable we use a prime to indicate its derivative.
A superscript dot indicates a derivative for age, also when a function has other arguments as well.
The two dynamical variables characterising an individual are (i) one i-state variable, to wit its body mass m, increasing from a fixed birth mass m(0) = m 0 , and (ii) its probability P to be still alive, starting from P(0) = 1. The energy intake by an individual with body mass m will be denoted by e(m). The strategy of an individual will be denoted by u : where u(a) determines which fraction of the energy intake at age a is used for reproduction while the remains are used for growth. The body mass just increases with (1 − u)e(m), while the birth rate is assumed to depend monotonically on the available energy u e(m) as b : R + → R + : u(a) e (m(a)) → b (u(a) e (m(a))). Finally, the energy allocation is assumed also to affect the death rate d : [0, 1] → R + : u(a) → d (u(a)). All three functions e, b, and d also implicitly depend on E res . In this model the average lifetime offspring number of a mutant strategy u mut equals Note that if u mut = u res , due to the value of E res necessarily R 0 = 1. Moreover, we assume that the tail of P is bounded by a negative exponential and that b and e are bounded. These assumptions derive from the biology behind the example and imply that the improper integral in (6) exists. Fig. 1 Illustration of the fitness-returns argument where u is any control path. The idea is to do a thought experiment in which we increase u between a and a + δ with a block B of height ε, and observe its effect on fitness we proceed by means of a thought experiment. For a living individual aged a we increase u between a and a +δ by an amount ε, i.e., we construct a functionũ = u + B, B : R + → R, B(α) = ε for a ≤ α < a + δ and 0 elsewhere (see Fig. 1), Calculate the resulting expected change in the expected life-time offspring number, multiply this number with (εδ) −1 and let both ε and δ go to zero. Since the fitness return by definition is calculated conditional on an individual surviving to a, only the fraction P(a; u) surviving till age a contributes in this manner to R 0 . Hence g = P r.
To calculate those expected additional offspring numbers we proceed in the spirit of the marginal value theorem, that is, we calculate and then add the components of r contributed by different routes. These components include the immediate additional number of offspring coming from the temporary increase in energy allocation to reproduction and the decreases in future offspring numbers caused by the future smaller size and lesser survival caused by the temporary decrease in allocation to growth and to staving off death. We start with the calculation of the second and third components. Let Δ m and Δ P denote the differences m(ũ) − m(u) and P(ũ) − P(u), respectively. Then, by a first-order Taylor expansion of P(α) and m(α) with respect to u(α), we obtain for a ≤ α < a + δ The immediate offspring gain from this strategy change over the time interval [a, a+δ) (for an individual that survived till a) is From a + δ onwards Eq. (8) apply with ε set to zero and with initial condition (9). The future loss of offspring from this change in strategy for an individual that already has survived till a is The linearity of Eq. (8) with ε = 0 implies that Δ P (α) and Δ m (α) are linearly dependent on the initial conditions given by (9), and therefore the outcome (11) is proportional to εδ. To make the coming calculation more transparent we introduce new functionsP(α; a),Δ m (α; a), α ≥ a, defined by dP dα = −d(u)P,P(a; a) = 1, whereP(α; a) can be interpreted as the conditional survival of an individual that has already survived to age a, i.e.,P(α; a) = P(α)/P(a), given the strategy u, andΔ m (α; α + δ) = Δ m (α)/Δ m (a + δ) as the relative amount by which a small perturbation in m present at age a + δ will propagate into the future given u. Similarlŷ P(α; a + δ) allows an alternative interpretation as relative amount by which a small perturbation in m present at age a + δ will propagate into the future given u. For ε ↓ 0 and δ ↓ 0 we can then express the fitness return r (a; u) as follows: At an ess u * the return r (a) should be 0 when u * (a) ∈ (0, 1), non-positive when u * (a) = 0 and non-negative when u * (a) = 1.

The other ingredients of the canonical equation
To complete the canonical equation we need to find expressions for T r , T s , and σ 2 .
Since R 0 (u; E attr (u)) = 1, the expression P(α; u)b(u(α)e(m(α; u))) is a probability density. Furthermore, thus also −Ṗ(α; u) is a probability density. Therefore, T r and T s can be expressed directly: where the last equation comes from integrating by parts and using lim A→∞ (AP(A)) = 0.
To calculate σ 2 we have to be more specific about the microstructure of the reproduction process. The assumption that naturally leads to (5) is that for an individual that is still alive the births come in a Poisson process with rate b(ue(m)), or, slightly more generally, in clutches of average size C(u, e(m)) produced according to a Poisson process with rate b(ue(m)) C(u,e(m)) . We confine ourselves here to the former option. In such a case, for a given age at death a the total offspring number is Poisson distributed with mean In general, a is a realisation of a random variable a. Hence, the lifetime offspring number is a mixture of Poisson random variables. The mean of λ = λ(a; u) is nothing but the average lifetime offspring number 2 2 Using integration by parts, (integration by parts). Finally, from the general rules for mixtures of distributions 3

Locating fitness maxima by means of Pontryagin's maximum principle
The equilibria of the canonical equation are called ess-es. The reason for this from a differential equations viewpoint unusual terminology is that among the ess-es only the ESSes, characterised by the fact that they are also maxima of the current fitness landscapes, are immune to evolutionary change. One way of calculating ESSes for life history problems is to make use of Pontryagin's maximum principle to locate the fitness maxima in u mut that go with a given u res and then to set u mut = u res . In contrast to the canonical equation, Pontryagin's maximum principle, at least in its original form and with a number of standard assumptions satisfied, is textbook material. In this section we will just in the wake of Intrilligator (1971) state the conditions that an optimal u has to satisfy. For a discussion of different variants of Pontryagin's maximum principle and its connection to the Bellman's principle of optimality (Bellman 1957), see Appendix 5. In the notation of Intrilligator (1971), Eq. (6) can be rewritten in the following form: with x the state vector, J the quantity to be optimised, calculated as the lifetime integral of I , and f the right hand side of the differential equation for x. Pontryagin's maximum principle then says that if u * maximises J , then at each age a ∈ [0, ∞) it also maximises the so-called Hamiltonian, defined as with y = y 1 y 2 being the so-called co-state (or adjoint) vector, where its components  E(λ(a)). Since for a Poisson random variable with mean λ its variance also equals λ, σ 2 = Var(m) = E(m 2 ) − (E(m)) 2 = E(E(m 2 |a)) − 1 = E(Var(m|a) + (E(m|a)) 2 ) − 1 = E(λ(a)) + E(λ 2 (a)) − 1 = E(λ 2 (a)) = E(λ 2 ).
at each a ∈ [0, ∞). This implies that Obviously, to assure that u * (a) is a local maximum of J , resp. H , the derivative of g H (a) with respect to u(a) has to be negative whenever 0 < u(a) < 1. The co-states y 1 (a) and y 2 (a) in (26) can be expressed from (24) as follows: where y 1 (0) and y 2 (0) have to be chosen such that lim A→∞ y 1 (A) = lim A→∞ y 2 (A) = 0. In Appendix 4 we show that: which can be interpreted as the marginal loss or gain per unit weight change (sensitivity) of lifetime offspring due to lower subsequent weights, and (ii) which can be interpreted as the sensitivity of lifetime offspring due to lower subsequent survival. Moreover, in the same appendix we show that Formulas (12)-(13) for calculating the fitness returns (c.q. the selection gradient) and Formulas (26)-(27) for the derivative of the Hamiltonian with respect to u, are equivalent.
The detailed match between the results from the two approaches more generally follows from the correspondence between Bellman's principle of optimality and Pontryagin's maximum principle that we work out in some detail in Appendix 5.
On the practical side we point at the fact that even when one is only interested in calculating an ESS with the help of Pontryagin's maximum principle, and has no particular interest in the evolutionary trajectories by which this ESS may be reached, running some discretised variant of the canonical equation can still provide an effective computational implementation of that principle as used in ESS calculations.

Discussion
The main contribution of this note is that we carefully set up the CE for life history decisions. As it turned out, a few details had to be added to the exposition in Parvinen et al. (2013). In particular, it was necessary to extend the canonical equation so as to be able to handle inequality constraints. In addition, there was the small detail of the appearance of an additional multiplicative factor accounting for the difference in the initial branching process that mutants have to get through before getting established compared to the linear birth and death process that appears in this role for ODE population models (c.f. Durinx et al. 2008).
Given the venerable history of Pontryagin's maximum principle and its applications to life history theory it should raise no wonder that interpreting the co-states is not new. In particular, Jesus Alberto Leon already did so in the nineteen-seventies (Leon 1976); see also Perrin and Sibly (1993). However, in those days there was no canonical equation around and hence no need to make a connection. Moreover, these early authors put forward the interpretation seemingly ex cathedra, and only post hoc and summarily related it to a marginal value argument, without exhibiting the explicit connection made in our Sects. 4 and 6 and Appendices 4 and 5. In particular, they did not consider "co-state variables" for other u than the optimal one. Precisely these "generalised costate variables" occur as ingredients of the selection gradient. Although such variables are already used in numerical approaches to Pontryagin's maximum principle (e.g. Näslund et al. 1974), we believe that our explicit calculations add to the biological understanding of the mathematical structure of eco-evolutionary models.
As a final point we note that the argument that we provide in Sect. 5, although this was not spelled out there, is exemplary of a more general principle. When we delve a little more deeply into the stochastic models for individual behavior, as was necessary in order to calculate σ 2 , it generally becomes clear how embarrassingly oversimplified such models tend to be. In our case it turned out that it was implicitly assumed that microscopically the production of young is coupled far more loosely to the energy flow to reproduction than seemingly is assumed at the macroscopic level. Real organisms first have to accumulate the necessary energy that then is transformed into the birth of a young, instead of randomly producing young on the basis of the instantaneous availability of resources. Therefore in reality the production of young usually is far more regular than Poisson (so that σ 2 is close to σ 2 λ ), and at a given time depends also on past energy availabilities. Hence the idea that the average rate of offspring production at age a is just a function b of u(a) e(m(a)) is at best only a rough approximation. One possible justification is that most of the time u e(m) varies only slowly compared to the rate at which young are produced, and that if reproduction does occur spread out in time, no two individuals will be in the same phase of their reproduction cycle, so that at any one time the effective offspring production of the individuals that have a size close to the scalar m may well be on average close to b(ue(m)). However, the modelling community is still a long way from proving any rigorous approximation theorems of this ilk. (See Heijmans and Metz (1989) for another possible justification, which, however, is less often applicable in a general life history context.) Of course we also made other simplifying assumptions, like neglecting basal metabolism. However, these simplifications were only put in to ease the exposition, raise no deep mathematical issue, and hence can presumably be relaxed without great difficulty.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1: Mendelian organisms
Most life history models implicitly assume clonal reproduction. Yet, by far the majority of organisms that are supposedly targeted by these modeling efforts are Mendelian diploids (c.f. Stearns 1976Stearns , 1977. To help overcoming this awkward discrepancy we summarise here some results for the Mendelian case (c.f. Metz and de Kovel 2013).
The first difference between the clonal and Mendelian cases is that the homozygote phenotype present after a substitution differs from the heterozygote phenotype that invaded. Since for small mutational steps the genotype to phenotype map is approximately additive, this leads to the appearance of an additional factor two (on the assumption that there are no parental effects) in the right hand side of the canonical equation.
A more fundamental difference is that as a rule the gametes involved in sexual reproduction come in two types, macro-and micro-gametes. To keep the discussion simple we concentrate on the case where the sexes are separate, for otherwise we have to consider triple allocation targets, to growth, macro-gametes, and micro-gametes. In the case of separate sexes we simply have u = (u f , u m ), with u f the allocation rule of the females, and u m the one of the males. These allocation rules in general will be evolutionarily coupled through mutational co-variances, but, except for a common time scaling with T −1 r (u), the selection gradients can be treated separately, as if we were dealing with two coevolving species, with each of the sexes setting part of the environment, which now also includes fertilization opportunities, for the other sex. This independence derives from the additive relation R 0 = 1 2 (R f + R m ) , with R f the average lifetime number of kids of a female and R m the average lifetime number of kids of a male (e.g. Metz and Leimar 2011;Gyllenberg et al. 2011). Similarly, where the additional indices f and m mean that the so indexed quantity, in this case the average age of the parent at the birth of its kids, is calculated conditional on the sampled individual being a female or a male. Hence, for S ∈ {f, m}, The action of the derivative can again be expressed as an integral with the functions g S calculated in the same manner as for the clonal model, with the hidden argument E attr in the functions b S accounting for any differences in availability of fertilization opportunities at different u res . Finally, T s = q f T s,f + q m T s,m , with q f and q m the relative frequencies with which the sexes are born into the resident population, and σ 2 = 1 4 (q f σ 2 f + q m σ 2 m + q −1 f + q −1 m − 2) (the latter formula also takes into account the random sampling of alleles during the offspring production by the heterozygotes).
The upshot is that the males and females in any ESS-es satisfy separate Pontryagin maximum principles, with the coupling between the sexes appearing in the equations only through the influences the resident female and male strategies exert on E attr . Yet, the fact that the fertilization opportunities come as a component of E attr inextricably entwines life history evolution with sex ratio evolution.

Appendix 2: How to deal with local constraints?
This appendix emulates the treatment in Dieckmann et al. (2006) with the difference that we go one step further in working out the result, so that we end up with a simple formula that provides the right match for the Pontryagin maximum principle.
For a start we note that in principle the mutational covariance function is not constant over evolutionary time, but depends on the evolutionary history of the population. In particular, the distribution of mutational steps has to change near a constraint boundary so as to preclude overstepping it. There are various ways in which this change may happen. Most of these will make the distribution of the steps asymmetric, with close to the boundary steps towards the interior of the space of feasible strategies becoming more common relative to steps towards the boundary. The CE as given by Dieckmann et al. (2006) and Parvinen et al. (2013) is based on the assumption that the mutation distribution is symmetric, in line with most papers on the CE; (Formulas for the nonsymmetric case may be found in Law (1996), Champagnat (2003), and Champagnat and Méléard (2011).) In our formula we have kept the form of the CE unchanged in the interior of the constraint set and only set the right hand side equal to zero where that movement would lead to the passing of a constraint boundary. The rationale for this ploy is the following. The CE is derived as a limit in which one lets a factor that scales the mutational steps go to zero. This means that at any distance from the constraint boundary eventually the effect of the constraint will no longer be felt, and if the mutation distribution would otherwise be symmetric, this symmetry would eventually be recovered for all resident strategies that are not located on the boundary. At boundary strategies, in the CE limit the movement component in the outward direction has to drop to zero, since there the mutation distribution stays forever asymmetric, with its probability mass all located on the feasible side. In the limit the distribution of this mass contracts towards the boundary. On segments of the boundary where the nearby movement is towards that boundary the movement on the boundary becomes restricted to it by the covariance function abruptly becoming singular. On the natural assumption that the constraint does not affect movement parallel to the boundary, this corresponds to just setting the right hand side to zero at the indicated values of a. (In finite dimensional trait spaces the analogous condition is that on the boundary the movement component orthogonal to the boundary becomes zero whenever close by the movement is towards the boundary, while the movement component parallel to the boundary is a continuous extension of the movement component in that direction in the interior of the constraint set.)

Appendix 3: The canonical equation for periodic ODE population models
The right hand side of the CE equals On the assumption of small mutational steps and a symmetric mutation distribution the latter average gives 1 2 times the mutational covariance operator applied to the selection gradient, where the 1 2 comes from the fact that the linear approximation only applies in the half space where the invasion fitness is positive and is replaced by 0 where it is negative. The factor σ −2 in (4) comes from the lowest order term of the asymptotic expansion for the probability Q that a mutant with a slightly positive fitness (0<F 1) invades. When births occur singly the individual-based models underlying ODE population models can for the initial phases of mutant invasion be approximated by a linear birth and death process. For constant environments the corresponding generation process is of Galton-Watson type with a geometric offspring distribution with mean R 0 = b d with b and d the per capita birth and death rates of the mutant, respectively, while e a measure for the average variability of the offspring production of the residents (for which R 0 = 1), which in the case of a single birth state reduces to the variance of the offspring distribution σ 2 (c.f. Metz and de Kovel 2013;Durinx et al. 2008).
(For a geometric distribution with mean 1: σ 2 = 2.) The rate at which mutants are produced equals the population birth rate times the per birth probability of a mutation. The factor n in (4) appears by re-expressing the population birth rate B of the resident as n T s , based on the general consistency relation n = B T s . Below we consider the extension of these considerations to periodic environments; the further extension to general ergodic environments is treated in Ripa and Dieckmann (2013).
In the case of periodic environments we have to average both the number of births as well as the probability to invade over a cycle, where the first average is a time average and the latter average is over the distribution of births over the cycle.
To calculate the invasion probability in dependence of the phase θ of appearance of a mutant during the environmental cycle, q(θ ), we use the general formula for the invasion probability for linear birth and death processes with time variable parameters derived in Kendall (1948): With time rescaled so that the period equals T = 1, we then get with w(θ) = b 0 (θ )e r 0 (θ;0) the probability distribution of the phase of the environmental cycle at which a mutant may be expected to appear, with b 0 and d 0 the periodic per capita birth and death rates of the residents and r 0 defined as in (33). The stationarity of the resident population implies that r 0 (t + 1; t) = 0, i.e., t+1 t b 0 (τ ) dτ = t+1 t d 0 (τ ) dτ (no per capita population growth over a full environmental cycle) as well as t+1 t b 0 (τ ) e r 0 (τ ;t) dτ = t+1 t d 0 (τ ) e r 0 (τ ;t) dτ (the total births over a cycle matches the death toll over the cycle). More in general F = r (t + 1; t) = r (1; 0) and (Bacaer and Guernaoui 2006), where in the periodic case R 0 is defined as the dominant eigenvalue of the operator that gives the average number of newborns born at different phases of the cycle for mothers born at different phases. To calculate the derivative of Q we introduce a scalar variable x by which we parametrise a curve in the space of strategies passing transversally through the resident value at x = 0, and write all the coefficient functions as functions of x, written as an index in the case of b, d, and r . As later on we also need the invasion probability and invasion fitness as a function of any mutant strategy, we will denote the maps from x to these two quantities asQ andF. With we can write From lim x→0 q(θ ; x) = 0 it follows that lim andQ To calculate the term after the limit sign we observe that Hence Hence away from local constraints the CE becomes where s now denotes the strategy, which in the seasonal flowering model of Dieckmann et al. (2006) consists of a flowering intensity as a function of θ , and Hence the n in Parvinen et al. (2013) has to be interpreted as To see how (46) compares with (4) we first observe that for periodically fluctuating populations there is no immediate counterpart for the equality n = B T s , so we substitute the latter in (4), while observing that the counterpart of B in (46) is (46) we then end up with the pairing To calculate T r we use F = b − d and R 0 = b/d together with (1) to find Therefore,

Appendix 4: Relating the results of Sects. 4 and 6
In this appendix we show that formulas (12)-(13) for calculating the fitness returns (c.q. the selection gradient) and formulas (27) which is to be compared with where we now dropped the second argument ofP andΔ m on the understanding that P(a) =Δ m (a) = 1. The structure of the above expressions is • for fitness returns: Expanding the integrals giveŝ Hence, Therefore, indeedỹ 1 =ŷ 1 andỹ 2 =ŷ 2 . The optimal control problem is to find an admissible control u * which maximises the objective function (63) subject to Constraint (62). Such a control u * is called an optimal control, the corresponding state trajectory is denoted by x * and is called the optimal trajectory under u * , J (u * ) or J * then denotes the optimal value of J .

Appendix 5.2: Bellman's optimality principle
Bellman's optimality principle which has to be satisfied for an optimal control u * (Bellman 1957) reads: "an optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." Observing Fig. 2, we can state Bellman's optimality principle as a proposition: Proposition If a b e is the optimal path from a to e, then b e is the optimal path from b to e.
Proof Suppose it is not. Then there is another path (note that existence is assumed here) b c e which is optimal from b to e, i.e. J bce > J be . But then J abe = J ab + J be < J ab + J bce = J abce . This contradicts the hypothesis that a b e is the optimal path from a to e.

Appendix 5.3: The value function for any fixed u
We can formulate Bellman's optimality principle using the so-called value function V : R n × R → R for a given (but not necessarily optimal) control u ∈ Ω. This value function is defined as where for s ≥ t In the original work of Bellman (1957) the value function was defined for an optimal strategy u, as the main focus of his work was finding this optimal strategy. We, however, formulate the value function for any u and introduce the Bellman's optimality principle with u being optimal at the further step of this derivation.
In Fig. 3 we illustrate the value function in the (x, t)-space for a given u. Note that incremental changes in J from t to t + Δ t are given by the integral of Π(x, u) from t to t + Δ t . Considering that the change in the objective function consists of the incremental changes in J from t to t + Δ t plus the value function V (x + Δ x , t + Δ t ; u) at time t + Δ t (for fixed u) we can write Since Π is continuous in t, the integral in (66) can be approximated by Π(x t , u(t)) Δ t , so that we can rewrite (66) as If V is continuously differentiable, we can use the Taylor expansion of V with respect to Δ t to obtain Substituting x from (62) we obtain By canceling V (x t , t; u) on both sides and then dividing by Δ t we get By letting Δ t ↓ 0, we obtain with the boundary condition V (x(T ), T ; u) = S (x(T )). Here ∂ V (x t ,t;u) ∂ x can be interpreted as the marginal contribution of the state variable x to the objective function for a fixed u. We denote it by y(t) ∈ R n and call it the costate vector (also known as adjoint or auxiliary vector in optimisation and control theory and as a shadow price in economics), thus and introduce the so-called Hamiltonian 4 H (x(t), u(t), y(t)) def = Π(x(t), u(t)) + (y(t)) T f (x(t), u(t)).
Note that the previous definition for our example identifies y with the marginal fitness return per unit of change in the state variables at age a (here time t). Below we shall focus on optimal controls and trajectories, and in this way derive the equations for y customarily encountered in the literature on Pontryagins maximum principle.
(Note the implicit assumption that V is twice continuously differentiable.) By definition of the Hamiltonian, at u = u * and x = x * , By definition (72) of y(t) From Formulas (80) and (81) we have Substituting the costate from Formula (72), we obtaiṅ Substituting the Hamiltonian (73) into this expression, we obtaiṅ Terminal boundary (or transversality) conditions are defined as Equations (83) and (84) Combining Formulas (83), (84), (85), and (62) we get Equation (86) is called a canonical system of equations or canonical adjoints.