Lotka–Volterra approximations for evolutionary trait-substitution processes

A set of axioms is formulated characterizing ecologically plausible community dynamics. Using these axioms, it is proved that the transients following an invasion into a sufficiently stable equilibrium community by a mutant phenotype similar to one of the community's finitely many resident phenotypes can always be approximated by means of an appropriately chosen Lotka–Volterra model. To this end, the assumption is made that similar phenotypes in the community form clusters that are well-separated from each other, as is expected to be generally the case when evolution proceeds through small mutational steps. Each phenotypic cluster is represented by a single phenotype, which we call an approximate phenotype and assign the cluster’s total population density. We present our results in three steps. First, for a set of approximate phenotypes with arbitrary equilibrium population densities before the invasion, the Lotka–Volterra approximation is proved to apply if the changes of the population densities of these phenotypes are sufficiently small during the transient following the invasion. Second, quantitative conditions for such small changes of population densities are derived as a relationship between within-cluster differences and the leading eigenvalue of the community’s Jacobian matrix evaluated at the equilibrium population densities before the invasion. Third, to demonstrate the utility of our results, the ‘invasion implies substitution’ result for monomorphic populations is extended to arbitrarily polymorphic populations consisting of well-recognizable and -separated clusters.



Introduction
Ecological interactions create selection pressures that may change those very interactions. Such eco-evolutionary feedback can induce rich coevolutionary dynamics including cyclic coevolution (e.g., Dieckmann et al. 1995;Dieckmann and Law 1996), adaptive radiation (e.g., Ackermann and Doebeli 2004;Egas et al. 2005), adaptive speciation (e.g., Dieckmann and Doebeli 1999;Dieckmann et al. 2004;Rundle and Nosil 2004), taxon cycles (e.g., Kisdi et al. 2001;Ito and Dieckmann 2007), and community formation (e.g., Loeuille and Loreau 2005;Dieckmann et al. 2007;Ito et al. 2009;Takahashi et al. 2013). To arrive at tractable descriptions of such evolutionary dynamics, the assumption is often made that mutation rates are low relative to the timescale of population dynamics. This assumption reduces the evolutionary dynamics to a trait-substitution sequence resulting from repeated mutant invasions (Metz et al. 1992(Metz et al. , 1996Dieckmann and Law 1996). Such invasions potentially bring about various outcomes: most often, (1) extinction of only the resident that is parental to the mutant, and more rarely, (2) coexistence of the mutant with all residents, or (3) other combinations of extinctions of the parental resident, non-parental residents, and mutant.
It has been proved that when for all residents all potentially invading mutants are subject to directional selection and the resulting perturbations to the system are sufficiently weak, as measured by the product of fitness gradients and mutational step sizes relative to the return rate to their population-dynamical equilibrium before the invasion, invading mutants replace their parental residents-a statement referred to as the invasion-implies-substitution theorem (Geritz 2005;Dercole and Rinaldi 2008). The resulting trait-substitution sequences describe directional coevolution, characterized well by a set of ordinary differential equations called the canonical equations of adaptive dynamics theory (Dieckmann and Law 1996), which have a form similar to Lande's equations of quantitative genetics theory (Lande 1979).
Eventually, directional coevolution may take some residents to the neighborhood of peaks, troughs, or saddles of the community's fitness landscape, which means that those populations experience very weak directional selection. Here, an invading mutant may coexist with its parental resident, which may be followed by diversifying evolution of the two morphs, called evolutionary branching (Metz et al. 1996). If the community has a one-dimensional trait space and a single resident, necessary and sufficient conditions for its evolutionary branching into two distinct residents have been obtained (Metz et al. 1996;Geritz et al. 1998).
On the other hand, for higher-dimensional traits or more than one resident, obtaining formal conditions for the occurrence of evolutionary branching is difficult (but see Ito and Dieckmann (2014) for a special case). This is largely because in these more complex community dynamics it is not easy to analyze the outcomes of mutant invasions (Metz et al. 1996). This difficulty may be reduced when the population dynamics can be approximated by Lotka-Volterra (LV) models, which are analytically more tractable and have been studied well (e.g., Zeeman 1993;Hofbauer and Sigmund 1998). The LV-approximation is possible when all existing residents and the mutant are similar to each other, so that they form a single phenotypic cluster (Meszéna et al. 2005;Durinx et al. 2008), which yields an expression for the invasionfitness function in terms of resident and mutant phenotypes that is given by a rational function. By using this rational form, considerable progress in deriving conditions for multidimensional evolutionary branching has recently been made (Geritz et al. 2016;Sect. 9.3). Dercole and Rinaldi (2008) proved that the LV-approximation holds also when all of the existing residents are not similar to each other, i.e., when every cluster has only a single resident, and their initial equilibrium population densities are not small. (Although such limiting assumptions for residents are not made in their proof, these assumptions are required when we consider trait-substitution sequences, as explained in Sect. 4.4). Thus, the remaining cases to be analyzed are (a) only some residents are similar to each other and (b) the population densities of some residents are very small so that they may go extinct as a result of the invasion. Both cases are likely to occur in multispecies coevolution, including processes involving multiple evolutionary branching and taxon cycles, commonly observed in numerical simulations of trait-mediated community dynamics (e.g., Doebeli and Dieckmann 2000;Ito and Dieckmann 2007). Therefore, the goal of the present paper is to obtain formal conditions for ensuring the LV-approximation for an arbitrary set of residents, including the aforementioned two cases. Based on the obtained conditions, the invasion-implies-substitution theorem can be extended to a mutant with an arbitrary set of residents.
The next section, Sect. 2, formulates a set of axioms that are expected to hold for ecologically plausible differential equations describing trait-mediated community dynamics. Section 3 derives a condition for ensuring the LV-approximation. Sections 4 and 5 derive sufficient conditions for satisfying this condition, in terms of properties of the fitness-generating function and mutational step sizes. Section 6 explains how the thresholds for the obtained sufficient conditions can be improved further. Section 7 shows how to examine the obtained sufficient conditions for a specific ecological model. Section 8 extends the invasion-implies-substitution theorem.

Axioms for fitness-generating functions
We consider community dynamics written as dn i dt n i F(s i ; s; n) (2.1) with population densities n i for i 1, . . . , N .
We denote by S ⊂ R Z a compact Z -dimensional trait space, by s (s 1 , . . . , s N ) T ∈ S N an N -dimensional vector of trait values of the phenotypes present in the community, and by n (n 1 , . . . , n N ) T ∈ R N + the vector of their population densities. The fitness-generating function F : S × ∞ N 1 S N × R N → R : (s , s, n) → F(s ; s; n) ( 2 . 2 ) describes the instantaneous per capita growth rate of an arbitrary phenotype s with an infinitesimally small population density in the instantaneous environment produced by resident community composition (s, n) (Brown and Vincent 1987;Cohen et al. 1999). The fitness-generating function provides a fitness landscape for each community composition (s, n). We assume that it satisfies the following axioms: Below, we restrict the community's space of population densities to [0, η] N . The smoothness axiom (i) follows from the assumption that the populationdynamical behavior of individuals depends smoothly on their traits and that all ecological interactions are instantaneous. The latter assumption is implicit in the assumption that the per capita growth rate depends only on the arguments s and (s, n). Axioms (ii) to (iv) are consistency conditions that go with representing the behaviour of large collectives of individuals by differential equations for their densities. Axiom (ii) follows from the arbitrariness of the ordering of the trait N-tuples, and axiom (iv) from the fact that individuals with the same trait values are assumed to be indistinguishable. The consequent additivity for identical phenotypes mechanistically lies at the heart of the LV-approximability. The bounded-world axiom (v) is just what it says: there necessarily is a limit to the biomass that a patch of world can support. Models that do not acknowledge this may on occasion be good approximations for specific purposes, but when we run into results contradicting the bounded-world assumption, we have to start modifying the model.
To keep the exposition simple, we assume from now on a one-dimensional trait space S ⊂ R. The results are generalized to higher-dimensional trait spaces in Sect. 5.4.

Population dynamics triggered by mutant invasion
We assume that the community is at a locally stable equilibrium n ( n 1 , . . . , n N ) T , determined by F(s i ; s; n) 0 for all i 1, . . . , N . When an invasion by a mutant s s N +1 with |s N +1 − s N | ε μ has occurred, the combined population dynamics can be written as dn i dt n i F(s i ; s ; n ) (2.3) for i 1, . . . , N + 1, where s (s 1 , . . . , s N , s N +1 ) T and n (n 1 , . . . , n N , n N +1 ) T , starting from n ( n 1 , . . . , n N , n N +1 ) T with very small n N +1 , which means that n is almost identical to the equilibrium before the invasion, n ( n 1 , . . . , n N , 0) T . Please notice that here we have introduced the notational convention, to which we adhere throughout this paper, that vectors of dimension N + 1 directly corresponding to vectors of dimension N are denoted by an added prime, as in s , n , and n .
The remainder of this paper is devoted to making precise the, very general, conditions under which this proposition holds, and to calculating the corresponding error bounds. Important variables and parameters used in our analysis are shown in Table 1.

Basic idea
The root of the LV-approximability is the exchangeability axiom (iv) combined with the smoothness axiom (i). Under the exchangeability axiom (iv), the fitness-generating function does not distinguish individuals with identical phenotypes. Hence, the function responds only to the sum of their densities. Under the smoothness axiom (i), this property is approximately inherited by slightly different phenotypes; the fitnessgenerating function responds primarily to the sum of their densities. In the remainder of this paper, we will work out how to lowest order of approximation the fitnessgenerating function responds linearly to the separate contributions to this sum, leading to the LV-approximation.
To get a more specific picture, we first suppose that there exist only two phenotypes, a resident phenotype s 1 and a mutant phenotype s 2 , with population densities n 1 and n 2 , respectively, and with their phenotypic difference given by the mutational step size, |s 2 − s 1 | ε μ , with ε μ being small. Proposition 1 trivially holds when the deviations of n 1 and n 2 from their initial states n ( n 1 , n 2 ) T are both small during the transient following mutant invasion. In many cases, however, those changes are large, resulting in the exclusion of the resident (Dercole and Rinaldi 2008, Appendix B). In the latter case, it is not obvious whether a linear approximation of the fitness-generating function in n (n 1 , n 2 ) T is valid. On the other hand, as the mutant is similar to the resident, due to the smoothness and exchangeability property of the fitness-generating function, they act almost like a single phenotype in their effect on the environment. Thus, invasion by the mutant in many cases causes only a slight change in their total population density n 1 + n 2 , and only their fractions may change substantially, but will do so slowly (Dercole and Rinaldi 2008, Appendix B;Meszéna et al. 2005;Durinx et al. 2008). In other words, the fitness-generating function is not sensitive to even large changes of n 1 and n 2 , as long as n 1 + n 2 is kept almost constant. As shown later, this implies that the change of F(s i ; s ; n ) induced by a large change of n 2 , keeping n 1 + n 2 constant, is slight, so that F(s i ; s ; n ) can be expanded with respect to m (m 1 , m 2 ) T (n 1 + n 2 , ε μ n 2 ) T , even for ε μ → 0. The linear relationship between m and n then makes Proposition 1 hold: as the change of m 2 ε μ n 2 is always small because of the smallness of ε μ , this is the case whenever the change of the population density m 1 n 1 + n 2 is small. Below, we introduce the notion of approximate phenotypes, so we can abbreviate the preceding condition by stating that the change in the population density m 1 n 1 + n 2 of the approximate phenotype (s a s 1 or s a s 2 ) is small.
The strategy above is readily extended to multiple residents s 1 , . . . , s N and a mutant s N +1 emerged from the parental phenotype s N with |s N +1 − s N | ε μ , by choosing an approximate phenotype from each of the existing phenotypic clusters, so that density changes of those approximate phenotypes can be kept small during the transient following mutant invasion, and thus an LV-approximation can be warranted (Sect. 3). We can gauge the smallness of their density changes from the leading eigenvalue of the community's Jacobian matrix evaluated at the equilibrium population densities of the approximate phenotypes before the invasion (Sect. 4). However, this linear stability analysis does not work well when some approximate phenotypes have very small initial equilibrium densities, because those small densities inevitably cause the leading eigenvalue of the community's Jacobian matrix to be close to zero. To overcome this difficulty, we analyze not only the linear terms but also the quadratic terms of the transient dynamics around the initial equilibrium (Sect. 5). In the remainder of this section, we show how we can easily find approximate phenotypes for a set of phenotypes s (s 1 , . . . , s N +1 ) T , such that Proposition 1 holds when the changes of the population densities of these approximate phenotypes are sufficiently small.

Approximate phenotypes
We consider an arbitrary set of residents together with a mutant, s (s 1 , . . . , s N , s N +1 ) T . We choose phenotypic clusters so that within-cluster phenotypic differences do not exceed ε ρ μ ε μ (Fig. 1a), with an arbitrarily chosen constant ρ μ larger than 1 (but not too large, so that the clustering is meaningful, i.e., the error estimates to be derived below are small). We assume that those phenotypic clusters are well-recognizable and well-separated from each other, so that we can find an ε that is much smaller than the smallest distance among the approximate phenotypes. Generally, this assumption is warranted in evolutionary dynamics with small mutational step sizes (as explained in Sect. 9.2) by the principle of limiting similarity. Notice that in any case the mutant s N +1 and its parental phenotype s N form a cluster. Any resident not similar to any other phenotype forms a cluster by itself. Thus, the number of clusters, denoted by M, satisfies 1 ≤ M ≤ N . From each cluster, we arbitrarily pick one phenotype as its representative. Then, by the symmetry axiom (ii), we can permute s (s 1 , . . . , s N , s N +1 ) T so that those representatives come first as s 1 , . . . , s M , followed by the other phenotypes, i.e., s (s 1 , . . . , s M , s M+1 , . . . , s N +1 ) T (Fig. 1b). We refer to those representatives as approximate phenotypes s a (s 1 , . . . , s M ) T .
We introduce the cluster-identifying function cid, such that cid( j) i means that phenotype s j belongs to the ith cluster, with s cid( j) as the representative-i.e., approximate-phenotype of that cluster, and cid( j) j for j ≤ M. We also introduce the component-identifying function com, which returns the set of indices of the phenotypes comprising the i-th cluster, i.e., com(i) { j| cid( j) i}. Then, the population densities of these clusters are given by a vector m (m 1 , . . . , m M ) T , with the population densities for i 1, . . . , M treated as belonging to the approximate phenotypes s a (s 1 , . . . , s M ) T (Fig. 1c). While the approximate phenotype of the ith cluster is identical to the representative phenotype of that cluster, the population densities of the former and latter are different and given by m i and n i , respectively.
Notice that the number M of approximate phenotypes is less than the number N + 1 of phenotypes in the original community dynamics. Thus, for expanding the fitness- Clustering and permuting

Fig. 1
Construction of approximate phenotypes and their population densities. a The population densities n (n 1 , n 2 , n 3 , n 4 , n 5 ) T of existing phenotypes s (s 1 , s 2 , s 3 , s 4 , s 5 ) T -comprising four residents s 1 , s 2 , s 3 , s 4 , and a mutant s 5 -are indicated by colored histogram bars. The thick gray curve shows the fitness landscape, which passes through 0 at the resident phenotypes. The existing phenotypes are clustered so that within-cluster phenotypic differences do not exceed the threshold ε, which is chosen to be larger than the mutational step size ε μ , so that the mutant s 5 and its parental resident s 4 are guaranteed to be part of the same cluster. b The existing phenotypes are permuted so that the approximate phenotypes s 1 , s 2 , s 3 come first. c Within each cluster, an approximate phenotype is chosen to represent the cluster. The population densities m 1 , m 2 , m 3 of the approximate phenotypes are assigned so that each equals the total population density of the corresponding cluster (color figure online) generating function, we need to define the other (N − M + 1) variables in such a way that their changes stay small during the transient following mutant invasion. As long as the population densities of the approximate phenotypes are kept almost constant, the fitness-generating function is expected to be insensitive to n i for all i 1, . . . , N + 1. Thus, we describe the remaining degrees of freedom, m M+1 , . . . , m N +1 , by for i M + 1, . . . , N + 1. Combining Eqs. (3.1a) and (3.1b), we write m (m 1 , . . . , m M , m M+1 , . . . , m N +1 ) T , which has the same dimension as n . Then, by the smoothness axiom (i), the exchangeability axiom (iv), and the bounded-world axiom (v), we have Lemma 1 With F(s i ; s ; m ) : F(s i ; s ; n ), for sufficiently small ε there exists a constant C Fm such that for all i, j 1, . . . , N + 1, n ∈ [0, η] N +1 , and any s such that s j − s cid( j) ≤ ε.
See Appendix A for the proof. Although F(s i ; s ; m ) differs from F(s i ; s ; n ) as a mathematical object, their biological meaning is the same. Lemma 1 thus ensures the expandability of F(s i ; s ; n ) in terms of m . The estimate C Fm still depends on s , but is positive and uniformly bounded away from 0 and ∞.
As we did for C Fm , below we will introduce bounds for other important variables and functions in the form of expressions C · that are independent of population densities (but may be functions of other model parameters). Please notice that here we have introduced the notational convention, to which we adhere throughout this paper, that C F · denotes the upper bound for the absolute value (or norm) of · the derivative of the fitness function with respect to ·, while C · denotes the upper bound for the absolute value (or norm) of · or for the derivative of the first symbol in · with respect to the subsequent symbol(s). All C F · and C · are positive and uniformly bounded away from 0 and ∞. In the propositions below, we just indicate that such constants exist. Expressions for determining their values are derived in the associated appendices and are shown in Table 2.

Taylor expansion in the population densities of the approximate phenotypes
We now expand the fitness-generating function in m . We denote by m ( m 1 , . . . , m N +1 ) T the initial state n ( n 1 , . . . , n N +1 ) expressed in terms of m , with m i j∈com(i) n j for i 1, . . . , M and m i ε n i for i M + 1, . . . , N + 1.
Lemma 1 allows F(s i ; s ; n ) to be expanded in m around n as where F i : F(s i ; s ; n ), (3.3b)   Eq. (5.6c) Appendix F.5 Tighter estimates that apply under various restrictions are presented in the appendices b Here and Here F i 0 for i 1, . . . , N (from the equilibrium equation of the residents), and 2). Moreover, by the bounded-world axiom (v) and Taylor's theorem, we have and where for vectors | · | denotes the Euclidian norm.
See Appendix B.3 for the proof. Thus, if Eq. (3.4a) is satisfied for a sufficiently small ε, the fitness-generating function is approximated well by a linear function of m .

Taylor expansion in the population densities of the original phenotypes
Next, we transform the term linear in m in Eq. (3.3a) into one in n . As m is a linear function of n , m can be written as m Wn , where W is a (N + 1)-by-(N + 1) matrix with components given by Eq. (3.1a). Therefore, substituting this relationship into Eq. (3.3a) gives (3.5b) By combining the equations above with Lemma 2, and by using Eq. (2.3), we get

Approximability condition when the population densities of the approximate phenotypes are large
In this section, we consider the sufficient condition in Eq. (3.4a) for LVapproximability, m − m < εC m . We refer to this as the approximability condition. If the initial equilibrium population densities of approximate phenotypes are not small, so that m i ε is satisfied for all i 1, . . . , M, their dynamics can be analyzed by a linear stability analysis of the resident equilibrium. On the other hand, if some approximate phenotypes have very small equilibrium population densities, also the corresponding eigenvalues of the associated Jacobian come very close to zero and examining linear terms alone is not sufficient. In this section, we analyze the first, simpler, case to show that the approximability condition (3.4a) can generally be fulfilled. The second, more complicated, case is then analyzed in a similar manner in the next section.

Dynamics of approximate phenotypes
The dynamics of approximate phenotypes m (m 1 , . . . , m M ) T satisfies, by Eqs. (3.1a) and (2.3), is the average growth rate within the ith cluster weighted with the fractions p j : n j /m cid( j) of its component phenotypes. As for the remaining degrees of freedom in m , i.e., m i εn i for i M + 1, . . . , N + 1, their dynamics are given by When m is kept constant, these remaining degrees of freedom describe the relatively slow dynamics of the cluster compositions, corresponding to the dynamics of the fractions p j .

Transformation into perturbed community
For convenience, Eq. (4.1a) is rewritten in vector-matrix form as We decompose the right-hand side of Eq. (4.2) into a component determined by m alone and a residual of order ε, which is treated as a perturbation. The former component is further decomposed into linear and higher-order terms. Specifically, we have and r m (r m1 , . . . , r mM ) T is a function of m satisfying |r m | ≤ C rm , while h m (h m1 , . . . , h mM ) T is a function of m and ε satisfying |h m | ≤ C hm . See Appendix C for the proof. Notice that J and r m are both independent of ε.

Local Lyapunov function
If the perturbation term is neglected in Eq. (4.3a), i.e., ε 0, we can easily examine the local stability of the fixed point m by checking whether all eigenvalues of J have negative real parts. With the perturbation, however, we also have to compare the magnitudes of those eigenvalues with the perturbation. Moreover, as the perturbation causes a deviation of the community from m, the effect of the higher-order term r m m − m 2 has to be examined as well.
When the second and third terms in Eq. (4.4) are both neglected, the time derivative of |x| 2 is a monotonically decreasing function if λ max < 0, which gives is a local Lyapunov function, i.e., V 0 for x 0 and dV /dt < 0 for 0 < |x| < φ with a sufficiently small φ.
Proof By Eqs. (4.4) and (4.6a), the time derivative of V equals Stability against perturbation when the initial equilibrium population densities of approximate phenotypes are not small. As a simple example, a community composed of two approximate phenotypes s a (s 1 , s 2 ) T is considered. Their population densities m (m 1 , m 2 ) T are transformed into x (x 1 , x 2 ) T so that x 0 corresponds to the initial equilibrium before mutant invasion. A local Lyapunov function V |x| 2 of the community dynamics monotonically decreases with time within the light-gray and dark-gray regions marked by D. The dark-gray region marked by E is associated with a repeller that prevents the community dynamics from passing its inner boundary |x| φ h from its inner side |x| Thus, if ε 0 (i.e., φ h 0), then V 0 for x 0 and dV /dt < 0 for 0 < |x| < φ r . Therefore, V is a local Lyapunov function for x 0.

Stability condition under perturbation
For λ max < 0 and r on which dV /dt < 0 (Fig. 2). Hence, all solutions of Eq. (4.4) that start within this contour stay inside of it. As the initial state x satisfies x P( m − m) 0, we have during the transient following mutant invasion.
Finally, by translating Eq. (4.8b) back to m − m P −1 x, and by substituting it into Eq. (3.4a), we have Theorem 2 For the population densities m (m 1 , . . . , m M ) T of approximate phenotypes s a (s 1 , . . . , s M ) T formed by clustering resident and mutant phenotypes s (s 1 , . . . , s N +1 ) T according to a threshold phenotypic distance ε ρ μ ε μ , if λ max defined by Eq. (4.5) satisfies the approximability condition, Eq. (4.8a), with P defined by Eq. (E.24) of Appendix E and with · denoting the induced norm for the matrix·, i.e., the maximum absolute value among its eigenvalues.
In evolutionary dynamics determined by trait-substitution sequences, i.e., induced by repeated mutant invasions, when the fitness gradients for all residents are sufficiently strong, so that the coexistence of a mutant and its parental resident is impossible for any resident (as explained in Sect. 8), each of the resident phenotypes is not similar to any other. Then, ε may be chosen at ε ε μ (i.e., ρ μ 1), so that only the mutant and its parental resident are clustered. This parental resident can be chosen as the approximate phenotype of that cluster, in which case all approximate phenotypes are identical to the resident phenotypes before the mutant invasion. Thus, as long as the initial equilibrium before the invasion is linearly stable, λ max is negative, because of Eq. (4.1a) in conjunction with Eq. (2.3). Hence, for sufficiently small ε μ , Eq. (4.8a) is always satisfied, in accordance with the proof by Dercole and Rinaldi (2008).
On the other hand, when the fitness gradients for some residents become small as a consequence of their directional coevolution toward higher fitnesses, effects of the higher-order properties of the fitness function may induce evolutionary branching. During the early stage of evolutionary branching, phenotypic distances among residents branched from the ancestral resident have magnitudes that are comparable with ε μ . In this case, clustering only the mutant and its parental resident with ε ε μ may provide too small a value of |λ max | to satisfy Eq. (4.8a), while including similar residents in the cluster for an appropriate ε larger than ε μ may provide a sufficiently large |λ max | to satisfy Eq. (4.8a).
In Dercole and Rinaldi (2008), only the mutant and its parental resident are clustered together, and the other residents are not clustered. Thus, when the phenotypic distance among some residents is small, say, equal to ε resident , the leading eigenvalue of the community's Jacobian matrix inevitably is close to zero as well. This problem is avoided in their proof by assuming sufficiently small mutational step sizes compared to ε resident . However, when we consider a trait substitution sequence under a given magnitude of mutational step sizes, early stages of evolutionary branching inevitably lead to ε resident of the same order of magnitude as the mutational step sizes, no matter what is assumed for the latter. From this perspective, the proof by Dercole and Rinaldi (2008) requires all the residents to be dissimilar.
According to Eq. (4.3b), the leading eigenvalue also becomes close to zero when initial equilibrium densities of some residents are small. Analogously to the above case of similar residents, while this problem is seemingly avoided in Dercole and Rinaldi (2008) by assuming sufficiently small mutational step sizes, it inevitably occurs in traitsubstitution sequences in which residents gradually go extinct. Hence, their proof fails to cover all the cases that one may wish to consider (and are actually considered in their book).

Approximability condition when the population densities of some approximate phenotypes are small
If an approximate phenotype s 1 has a small population density m 1 O(ε) at the initial equilibrium, then the corresponding eigenvalue of J diag( m)B will be close to zero, making it difficult to satisfy the approximability condition in Eq. (4.8a). Even in this case, however, m − m O(ε) may hold during the transient following mutant invasion. To cover this situation by developing a refined approximability condition, we examine in this section not only linear terms, but also quadratic terms, of the Taylor expansions investigated in the preceding section.

Transformation into perturbed community
First, we decompose the function f in Eq. (4.2), into the terms that are linear in m, the terms that are of higher order in m, and the perturbation terms, in a manner similar to Lemma 3 in the previous section. Specifically, we have Around the initial equilibrium m, the dynamics of m y are much slower than those of m x . When ε goes to zero, the equilibrium population densities ( m L+1 , . . . , m M ) ≤ ρ m ε do so as well, i.e., and the whole dynamics get confined to a center manifold given by dm x /dt 0. The slow dynamics close to the center manifold are governed by dm y /dt. An approximation for this center manifold can be derived by setting dm x /dt which passes through the fixed pointm x (0) : m.
Although for ε > 0, m ( m x , m y ) T will deviate from m ( m x , 0) T , it is expected that for small ε the dynamics can still be effectively characterized by their projection onto the center manifold m x m x (m y ). Thus, we transform Eq. (5.2a) into dx dt A x x + εh x +r x |w| 2 , (5.4a) dy dt = diag(y) J y y + Ux + εh y +r y |w| 2 , (5.4b) with h x ≤C hx , h y ≤C hy , |r x | ≤C rx , and r y ≤C ry , where x : P(m x −m x (m y )) describes the convergence to, or deviation from, the center manifold m x m x (m y ), and y : m y describes the slow dynamics along the manifold. The other variables and parameters newly introduced in Eqs. (5.4a) and (5.4b) are given by w : x y , . Notice that y must be non-negative while x is indeterminate. Here, the effect of m − m is subsumed into the perturbation termsh x andh y . Neglecting

Local Lyapunov function
Following Mazenc (2001), we construct a local Lyapunov function to examine the magnitude of |w| during the transient following mutant invasion. We have with d being a positive constant, satisfỹ and |w| < φ, with φ being a sufficiently small constant, then Notice that the last line of Eq. (5.6a) has a form identical to the second line of Eq. (4.7a). Although in this case the initial state w : analogously be transformed further, In Eq. (5.6c), the transformation from the second to the third row allows the initial state w to satisfy w ερ m ≤φ h , which is used for the case of ε > 0 in Lemma 10.
In addition, the following lemma is proved in Appendix G.
Lemma 9 Equation (5.5b), i.e.,λ holds, if the real parts of the eigenvalues of J x diag( m x )B xx and the real eigenvalues of 1 2 J y + J T y with J y B yy − B yx B −1 xx B xy are all negative, and if d is sufficiently small.

Stability under perturbation
Next, we take the perturbation into account, i.e., we consider ε > 0. In the previous section, the contour curves of the local Lyapunov function have the same shapes as the boundaries of the region ensuring that dV /dt < 0, i.e., as the two circles |w| φ h and |w| φ r . In this section, although the contours have shapes different from the circles |w| φ h and |w| φ r , the manner of analysis is the same. First, as the initial state w satisfies w ερ m ≤φ h according to Eq. (5.6d), we trivially have

Lemma 10
We assume that ε is sufficiently small so thatφ h <φ r . For a region D w|φ h < |w| <φ r , within which dV /dt < 0 with V defined in Eq. (5.5c), consider a contour curve V V 0 such that its inscribed circle is given by |w| φ h and its circumscribed circle is given by |w| αφ h with α > 1 (Fig. 3). If

holds during the transient following mutant invasion.
This lemma is an extension of Lemma 5 relaxing the requirement that the contours be circular: Lemma 10 with α 1 corresponds to Lemma 5.
Then, by substituting Eqs. (5.6d) into (5.7a), we have Stability against perturbation when the initial equilibrium population densities of some approximate phenotypes are small. As a simple example, a community composed of two approximate phenotypes s a (s 1 , s 2 ) T is considered. Their population densities m (m 1 , m 2 ) T are transformed into w (x, y) T so that w 0 corresponds to the initial equilibrium before mutant invasion, where y corresponds to the population density of the phenotype with small population density. A local Lyapunov function V x 2 + dy (with d > 0) of the community dynamics monotonically decreases with time within the light-gray and darkgray regions marked by D. The dark-gray region marked by E is associated with a repeller that prevents the community dynamics from passing a boundary V V 0 from its inner side V < V 0 , thus keeping (5.8c) and K M − L is the number of approximate phenotypes with small equilibrium population densities.
See Appendix H for the derivation of the expressions for α. When K 0, this lemma is independent of d and becomes identical to Lemma 6 in the previous section, i.e.,λ max λ max ,C r C r ,C h C h , and α 1. Thus, this lemma includes Lemma 6 as a special case. When K > 0, on the other hand,λ max , α,C h , andC r in Eq. (5.8a) all depend on d. A choice for d may be one that maximizes the right-hand side of Eq. (5.8a).
By translating Lemma 11 into a corresponding statement for m, we get

9b)
where I y denotes the K × K identity matrix, B is defined in Eq. (4.3b), and P is defined in Eq. (E.24) in Appendix E.
Note that it is arbitrary which phenotypes we choose as having small initial population densities. Thus, whether each approximate phenotype's initial population density is small or not can be decided in such a manner that satisfying the approximability condition becomes easiest. If none of the approximate population densities is treated as small, i.e., K 0, Theorem 3 becomes identical to Theorem 2. Thus, Theorem 3 includes Theorem 2 as a special case. The threshold phenotypic distance ε and the way of clustering can also be chosen arbitrarily, so that satisfying the approximability condition becomes easiest, as long as ε > ε μ . Notice thatC h depends on ε, although it is bounded when ε → 0. In addition, all of the other mathematical objects in Eq. (5.8a) depend indirectly on ε, because ε affects how to cluster the existing phenotypes. A procedure for the evaluation of Eq. (5.8a) would be as follows: first choose ε, and then choose approximate phenotypes, choose K , choose d when K > 0 (so that the right-hand side of Eq. (5.8a) is maximized), and examine whether the inequality holds good.
Moreover, the initial state n need not be exactly at an equilibrium of the resident phenotypes s 1 , . . . , s N if the value of C Fz is adjusted such that F(s i ; s ; n ) ≤ ε C Fz is satisfied for all i 1, . . . , N +1. Therefore, this theorem can be applied also to the case of higher mutation rates, in which frequent mutant invasions prevent the community from reaching the next population-dynamical equilibrium. Note also that the smallness of changes of the population densities of the approximate phenotypes ensures not only the smallness of fitness changes of existing phenotypes F(s i ; s ; n ) for i 1, . . . , N + 1, i.e., LV-approximability, but also the smallness of fitness changes of any non-existing phenotype z, i.e., of the fitness landscape F(z; s ; n ). Specifically, from Theorem 3 we immediately see

Corollary 1
If the approximability condition, Eq. (5.8a), in Theorem 3 is satisfied, then the change of the fitness landscape F(z; s ; n ) is slight during the transient following mutant invasion, because by using Taylor's theorem we see that where m T : θ T (m − m) + m with an appropriately chosen θ T ∈ [0, 1], and by using C m from Eq. (5.9a) in C m C 2 m + (N + 1 − M)η 2 from Lemma 2, we see that

Generalization to higher-dimensional trait spaces
Theorems 1-3 and Corollary 1 apply to one-dimensional trait spaces. These results readily generalized to trait spaces of arbitrary dimensions by a slight modification of the analyses so that the derivative of the fitness function with respect to a phenotype in the one-dimensional trait space is replaced with the corresponding directional derivative in the higher-dimensional trait space, as shown in Appendix I.

Tighter estimates
For deriving the approximability conditions in Eqs. (4.7a) and (5.8a), we have used the maximum possible values of the perturbation terms (h andh) and nonlinear terms (r andr) attainable for n ∈ [0, η] N +1 . These provide the simplest, but rather conservative, approximability conditions. By approximating those terms as linear or higher-order functions of population densities (corresponding to x and w), we may improve the estimates underlying the approximability conditions. In Theorem 2, for example, the perturbation term h can be expanded in w around w 0 (i.e., m m) up to the first-order remainder terms, with some appropriately chosen m T ∈ [0, η] M . Then, applying Eqs. (6.1) to Lemma 6 gives the condition where C h ≥ h and C H ≥ H ; see Appendix J for the derivation. As the magnitude of the zeroth-order term for h is estimated more tightly by C h at m m, compared to C h used in the original approximability condition in Eq. (4.8a), this condition can work better than the original approximability condition, but is less simple.

Example: Approximability condition for a resource-competition model
In this section, we give a simple example of how to examine the approximability condition in a specific ecological model.

Model description
We consider a resource-competition model based on the Beddington-DeAngelis-type functional response (Beddington 1975;DeAngelis et al. 1975), known to describe both saturation of consumption and interference competition among consumers. Under N coexisting consumer phenotypes s (s 1 , . . . , s N ) T with their densities n (n 1 , . . . , n N ) T , we describe the ith phenotype's per-capita growth rate as In Eq. (7.1), g(s i ; s; n) is the resource gain of phenotype s i , β is a constant assimilation efficiency, and ψ is a constant natural death rate. In Eq. (7.2), θ (s i ) is the density of potential resources for s i , and α(s j , s i ) describes the niche overlap between phenotypes s i and s j . ζ 1 , ζ 2 , and ζ 3 are constant parameters related to the encounter rate of resources, handling time of resources, and intensity of interference competition, respectively. Notice that ζ 3 0 gives the Holling type-II functional response (Holling 1959). Equation (7.2) can be derived from a generalized Beddington-deAngelis functional response with explicit description of a resource distribution and phenotypes' niches expressed along a resource-quality axis (Appendix L.1).
We assume ψ 1 and ζ 3 1 without loss of generality. For simplicity, we assume that θ (s i ) depends only on s i (i.e., constant inflows of resources into the system) and that

Application: Extending the invasion-implies-substitution theorem
The derived stability conditions and resultant Lotka-Volterra approximation can be used to analyze the community dynamics triggered by a mutant invasion. In this section, we apply them to extend the invasion-implies-substitution theorem (Dercole and Rinaldi 2008, Appendix B) to an arbitrary set of resident phenotypes that form wellrecognizable and -separated clusters in a one-dimensional trait space; see Appendix K for details. We assume an arbitrary set of resident phenotypes s 1 , . . . , s N together with a mutant s s N +1 , with the resident and mutant phenotypes clustered into approximate phenotypes s a (s 1 , . . . , s M ) T that satisfy the approximability condition of Theorem 3. Then, from Lemma 2, m ≤ εC m ε C 2 m + (N + 1 − M)η 2 is conserved during the transient following mutant invasion. We denote the identity of the cluster containing the mutant by i, i.e., cid(N + 1) i. Using Eq. (4.1a), the dynamics of the mutant fraction p N +1 n N +1 /m i within this cluster can be expressed as For convenience, we assume that the representative phenotype s i of this cluster is chosen as the phenotype most similar to the mutant, i.e., |s N +1 − s i | min j∈com(i) s N +1 − s j . Then, by Taylor's theorem, we transform Eq. (8.1) into Here, F z (s i ) ∂ F(z; s ; n )/∂z z s i is the fitness gradient at s i , and the constants C Fzm and C Fzz bound the remainder terms through for z jT ∈ [s j , s cid( j) ] for all j 1, . . . , N + 1 during the transient following mutant invasion. Then, by substituting our results m ≤ εC m ε C 2 m + (N + 1 − M)η 2 and Eq. (5.9a) into Eq. (8.2a), a sufficient condition for d p N +1 /dt to be always positive is given by with C m in Eq. (5.10b).
If the fitness gradient F z (s i ) is sufficiently strong, so that it satisfies Eq. (8.3), then p N +1 monotonically increases until it reaches 1, i.e., until all other phenotypes within the cluster containing the mutant are excluded. Equation (8.3) means that the fitness advantage of s N +1 against s i due to the fitness gradient must exceed the effects of the curvature of the fitness landscape (ε 2 C Fzz ) and of the perturbation due to the population dynamics (ε 2 C Fzm C m ). As long as Eq. (8.3) holds for any resident phenotype and its mutants, repeated mutant invasions always result in monomorphic phenotype clusters, i.e., resident phenotypes are kept dissimilar, corresponding to the situation considered by Dercole and Rinaldi (2008). Notice that when λ max becomes close to zero, e.g., when the community is close to a bifurcation point of its population dynamics, C m becomes large and Eq. (8.3) thus becomes difficult to satisfy.

Discussion
As explained in the beginning of this paper, ecological interactions engender various evolutionary dynamics, including cyclic coevolution, adaptive radiation, adaptive speciation, taxon cycles, and community formation. To analyze how ecological interactions induce selection pressures that drive such dynamics, the following two assumptions are often made (Metz et al. 1992(Metz et al. , 1996Dieckmann and Law 1996). First, mutation rates are sufficiently small relative to the timescale of the population dynamics, so that the evolutionary dynamics are reduced to trait-substitution sequences resulting from repeated mutant invasions. Second, mutational step sizes are sufficiently small, so that a mutant invasion typically results in an equilibrium phenotype distribution similar to that before the invasion. The latter is called attractor inheritance (Geritz et al. 2002). In such cases, each mutant invasion modifies the fitness landscape only slightly. The fitness landscape can then be treated as a smooth function of resident phenotypes at equilibrium population densities, enabling effective analyses of directional coevolution (Dieckmann and Law 1996) and diversification through evolutionary branching (Metz et al. 1992(Metz et al. , 1996Geritz et al. 1997Geritz et al. , 1998. Using the concept of approximate phenotypes introduced in the present paper, attractor inheritance can be translated into the smallness of changes of the population densities of approximate phenotypes during the transient population dynamics following mutant invasion, toward the next population-dynamical equilibrium.

Conditions for attractor inheritance
Prior to our analyses in the present paper, qualitative conditions for attractor inheritance have been proved for sufficiently small mutational step sizes in the following two cases: (1) all residents and the mutant are similar to each other (Geritz et al. 2002;Meszéna et al. 2005;Durinx et al. 2008), or (2) no two residents are similar to each other and their initial equilibrium population densities are not small (Dercole and Rinaldi 2008, Appendix B). In this paper, we have derived quantitative conditions for attractor inheritance for a set of residents and a mutant, by clustering them according to a threshold phenotypic distance into approximate phenotypes. The condi-tions ensuring attractor inheritance, i.e., the approximability conditions in Theorems 2 and 3, establish relationships among the magnitudes of the mutational step size, the return rate to an equilibrium of the population dynamics of approximate phenotypes, the nonlinearity of the population dynamics, and the perturbation due to within-cluster population dynamics. These conditions are especially important when finite, rather than infinitesimally small, mutational step sizes are required for analyzing the considered evolutionary dynamics, such as when investigating evolutionary suicide (Gyllenberg and Parvinen 2001) and evolutionary branching of directionally evolving populations Dieckmann 2012, 2014). A next step would be to analyze whether it is really possible to satisfy the approximability condition, or rather, whether the condition can be satisfied with not too large error bounds in all but a set of theoretically possible but practically irrelevant cases. Although we here have considered only deterministic population dynamics, the impact of demographic stochasticity on trait-substitution sequences (Geritz et al. 2002) can be considered using the same framework we have introduced here, by subsuming its effect in the perturbation terms.

Assumption of well-recognizable and -separated phenotypic clusters
Our analysis assumes that the number N of existing phenotypes is finite, and that phenotypic clusters are well-recognizable and well-separated from each other so that the largest of within-cluster distances, ε, is much smaller than the smallest of betweencluster distances. We discuss the validity of our two assumptions below.
In principle, ODE population models should be seen as large-system-size limits of stochastic individual-based models. Generally, the larger the number of coexisting phenotypes, the slower is the convergence to the ODE limit. Thus, for all practical purposes, ODE models with very large numbers of phenotypes can be left out of the picture. If we do so, the finiteness of the number of existing phenotypes, N , ensures the existence of the smallest between-cluster distance and the largest within-cluster distance.
However, if a system has long chains of phenotypes in which the distances between any two consecutive members of the chain are small but the distance between the ends of the chain is large, we have no way to cluster them so that ε becomes much smaller than the smallest between-cluster distance. In this case, the error estimate for perturbation terms in Theorems 2 or 3 (C h orC h ), in comparison with the leading eigenvalue of the community Jacobian matrix (λ max orλ max ), can be too large for the approximability condition to be satisfied.
Fortunately, there is effectively no chance of such configurations occurring in ongoing evolutionary dynamics with sufficiently small mutational step sizes as in such dynamics closely similar phenotypes only occur in the early stages of evolutionary branching. (The local coexistence regions that can occur in higher-dimensional trait spaces around the zero fitness contour for particular residents are that narrow that the chance of a mutant landing in them is practically negligible. The more so since the far more common mutants landing outside these coexistence regions will oust all those inside the regions, so there is no chance of the number of coexisting similar phenotypes ever becoming large (Durinx et al. 2008).) Finally, of phenotypes that evolve towards each other, only one will survive due to competitive exclusion. Therefore, the assumption of well-recognizable and -separated phenotypic clusters is warranted except for a fraction of cases that will be encountered only very exceptionally, as well as transiently, in the scenarios that have our interest.

LV-approximation for analyzing evolutionary branching in multidimensional trait spaces
As shown in Sect. 3, attractor inheritance in approximate phenotypes directly enables LV-approximations of the population dynamics of the original phenotypes before clustering, similar to the previous studies (Meszéna et al. 2005;Dercole and Rinaldi 2008;Durinx et al. 2008). The derived LV-approximations may be especially useful for extending conditions for evolutionary branching from one-dimensional trait spaces to higher-dimensional trait spaces. In two-dimensional trait spaces, various numerical analyses have shown that phenotypes that are strongly convergence stable, but not evolutionarily stable, also known as strongly attracting invadable ESSes, induce evolutionary branching (e.g., Vukics et al. 2003;Ackermann and Doebeli 2004;Egas et al. 2005;Ravigné et al. 2009;Ito and Dieckmann 2012). Those phenotypes are fixed-point attractors that can be attained by directional evolution causing the convergence of a monomorphic population (Leimar 2009) to them, with sufficient proximity of a set in S 2 enabling the emergence of dimorphisms followed by directional evolution causing the divergence of the two morphs. However, whether an emergent polymorphism can evolutionarily diversify further into visually distinct morphs without collapse has not been proved until recently for higher-dimensional trait spaces. Based on the rational form of invasion-fitness functions in terms of existing phenotypes, which has been derived by LV-approximation (Durinx et al. 2008), Geritz et al. (2016) derived a set of conditions that ensure that such diversifying evolution does not collapse in trait spaces of arbitrary dimension, by describing the initial diversifying evolution with coupled Lande equations (Lande 1979). While those conditions are satisfied by strongly attracting invadable ESSes in two-dimensional trait spaces, the higher-dimensional cases remain to be analyzed further (Geritz et al. 2016).

Axioms for fitness functions
The analyses in this paper are based on a set of axioms for the fitness-generating functions characterizing ecologically plausible differential equations describing traitmediated community dynamics. Our set of axioms are similar to the set of properties assumed in Dercole (2016), which are used by him to derive a general procedure for formulating population-dynamical models resulting from individual pairwise interaction. Properties 1, 2, and 3 in Dercole (2016) are identical to our axiom (iii), (iv), and (ii), respectively, while property 4 in Dercole (2016) corresponds to our axiom (i).
Dercole's property 4, however, delimits a smaller class of models than ours. The symmetry axiom (ii) and the reducibility axiom (iii) are no more than consistency conditions, as is the exchangeability axiom (iv). The latter axiom, however, is together with the remaining smoothness axiom (i) and bounded-world axiom (v) the root cause of the Lotka-Volterra approximabiliy. Indeed, Lemma 1, which is central for deriving the condition for attractor inheritance and LV-approximation, is proved by applying those three axioms (Appendix A). While the bounded-world axiom (v) seems to be well grounded in reality, the smoothness axiom (i) may not hold in an exact sense, because it assumes that the population-dynamical behavior of individuals depends smoothly on their traits and that all ecological interactions are instantaneous. This instantaneousness can arise when the timescale of the life-history dynamics among individuals is much faster than that of their population dynamics. Durinx et al. (2008) have proved attractor inheritance and LV-approximation in physiologically structured models with multiple birth states, in which the timescales of life-history dynamics and population dynamics are not separated. This instills us with cautious optimism that the assumption of instantaneousness we have used in the present paper might be relaxed as well.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A: Proof of Lemma 1
Here we derive C Fm in Eq. (3.2) in the main text. We start from the observation that by the smoothness and exchangeability of F and the compactness of the set of community states, i.e., axioms (i), (iv), and (v), there exists a constant C Fn such that for all i, j 1, …, N + 1, with The derivative of F with respect to m j is calculated from To calculate the derivative ∂n k ∂m j , we substitute Eq.
We use Taylor's theorem and the exchangeability axiom (iv) to derive an estimate for the term in square brackets, for some appropriately chosen s jT ∈ [s j , s cid( j) ], where ρ j (s j − s cid( j) )/ε with ρ j ≤ 1 (because by assumption within-cluster phenotypic differences do not exceed ε). By the smoothness of F, there exists a constant C Fsn such that To get an estimate for the remainder R in Then by Taylor's theorem, with θ T some appropriately chosen number between 0 and 1. By the chain rule, Now we use that with · Q defined for an arbitrary K × K symmetric matrix B by B Q : max u T Bu |u| 1 , (B.5) i.e., the absolute value of the dominant eigenvalue. By using Eq. (B.5), we see When it is given that | x| < δ this translates into the uniform estimates Note that B Q satisfies

B.3 Proof of Lemma 2
Using Taylor's theorem, Eq. (3.3d) in the main text can be transformed to Thus, the result from Appendix B.1 translates as where, as proved in the next subsection,

B.4 Finding C Fmm
Here we prove Eq. (B.16). Analogously to Appendix A, we start from the observation that by the smoothness axiom (i) and bounded-world axiom (v) there exists a constant C Fnn such that For j, k M + 1, . . . , N + 1, we first transform Eq. (B.19) and then substitute Eq. (B.20) into the equation, ∂ 4 F(s i ; s ; n ) ∂s k ∂n cid(k) ∂s j ∂n j + ∂ 4 F(s i ; s ; n ) ∂s k ∂n cid(k) ∂s j ∂n cid( j) s k s kT ,s j s jT , and for j, k M + 1, …, N + 1 Finally, for we find its upper bound by using Eq. (B.8), as (B.31)

C.2 Expansion of dm/dt
Eq. (C.14) is further transformed into is the Jacobian matrix at m, and r m (r m1 , . . . , r mM ) T and h m (h m1 , . . . , h mM ) T are given by where from Eq. (C.16) we find There exist constants C r and C h such that |r| ≤ C r and |h| ≤ C h , given by, e.g.,

Appendix E: Proof of Lemma 4
We first prove three simple cases: all eigenvalues are (1) distinct, (2) the same real number, or (3) the same complex number, and then combine them for proving the general case.

E.2 All eigenvalues are the same real number
We assume that m (m 1 , m 2 , m 3 ) T for which all three eigenvalues of J are the same real number, i.e., λ 1 λ 2 λ 3 λ. Then there exists a regular matrix P J that transforms J into the Jordan normal form : Here we modify P J to P by multiplying it with , .

E.4 General case
By combining the aforementioned three cases, here we prove Lemma 4. For an arbitrary M × M matrix J, there exists a regular matrix P J that transforms J into P J JP −1 J in the following form where, here and below, all blank components are zero, R is a G × G diagonal matrix with its entries given by the distinct real eigenvalues λ R 1 , . . . , λ R G , and C is a 2H ×2H block diagonal matrix with its entries determined by the distinct complex eigenvalues (E.25) Since J is an M × M matrix, G + 2H + repr(1) + · · · + repr(K ) + 2[repc(1) + · · · + rep(L)] M is fulfilled. Here we define where I R and I C are, respectively, G ×G and 2H ×2H identity matrices, repR k is given by Eq. (E.9) with W repr(k), and repC l is given by Eq. (E.18) with W repc(l). Then we see where repR k is the same as I 2 /2 for l 1, . . . , L. Therefore, we get and (F.10) (F.11)

F.3 Variable transformation
If |m −m| is sufficiently small, so that the nonlinear term in Eq. (F.8) can be neglected, and ε 0, then dm with T : B −1 xx B xy . Even when ε > 0, the dynamics is expected to be effectively characterized by its projection on this manifold. Thus, we introduce x P(m x − m x (m y )) and y m y , to capture the fast convergence to the manifold and slow dynamics along it, respectively. The vector w x y is written as with Q : P PT 0 I y ,  where r * is worked out below after the derivation of the equation for dw/dt, and estimated in Appendix F.4. BQ −1 is transformed by using T B −1 xx B xy , as In addition, r * and h * given in Eq. (F.19) are further transformed into the following formulas forC rx ,C ry ,C hx , andC hy are found,

Appendix H: Derivation of Eq. (5.8c)
Here we derive Eq. (5.8c) in the main text. As explained in the main text, the contour curve V V 0 has an inscribed circle |w| φ h , and a circumscribed circle |w| αφ h . Specifically, the contour curve is defined by where V 0 is determined so that |w| φ h is an inscribed circle of V V 0 , i.e., For convenience, we write y for y (y 1 , . . . , y K ) T , which is identical to y (y L+1 , . . . , y M ) T in the other appendices and in the main text. On the other hand, the radius of the circumscribed circle, αφ h , satisfies Thus, α is given by To calculate the maximum and minimum of |w| 2 for w ∈ E V we proceed as follows. To make calculation easier, we decompose the expression of For the calculation of max y∈E Vy |y| 2 , the |y| 2 can be transformed by 2d K j 1 y j qV 0 into (H.10) Since y i ≥ 0 for i 1, . . . , K , max y∈E Vy |y| 2 is given by which is attained when y j 0 qV 0 2d and y j j 0 0 for some j 0 1, . . . , K . To calculate min y∈E Vy |y| 2 , Eq. (H.10) is further transformed into (y j − y k ) 2 − (K − 1)y 2 1 + · · · + y 2 K −1 − (y 2 2 + · · · + y 2 K ) + · · · + (y 2 K )  and s cid( j) ∈ {s 1 , . . . , s M }, where s 1 , . . . , s M can be chosen arbitrarily as long as u j (s j ) v j holds for j M + 1, . . . , N + 1 (because all the expansions of the fitness function in this paper are in non-representative phenotypes that correspond to s M+1 , . . . , s N +1 ). Notice that u j (s cid( j) ) v cid( j) . For notational convenience, we also introduce for j 1, . . . , M u j (s j ) : v j . (I.3) Then we define for j 1, . . . , N + 1 F j (s j ; s ; n ) : F(u j (s j ); U (s ); n ) ( I . 4 ) with s : (s 1 , . . . , s N +1 ) T , u j (s j ) v j , and U (s ) (u 1 (s 1 ), . . . , u N +1 (s N +1 )) V . Note that the F j (s j ; s ; n ) satisfy the smoothness axiom (i), the reducibility axiom (ii), and the bounded-world axiom (iii). The exchangeability axiom (iv) is also satisfied between s j and its representative phenotype s cid( j) , for j M + 1, . . . , N + 1 (i.e., F j (s j ; s ; n ) F cid( j) (s cid( j) ; s ; n ) for s j s cid( j) ). Thus, by Taylor's theorem, F j (s j ; s ; n ) for j M + 1, . . . , N + 1 can be expanded in s j around s cid( j) as F j (s j ; s ; n ) F j (s j ; s ; n ) s j s cid( j) + ∂ F j (s j ; s ; n ) ∂s j s j s jT (s j − s cid( j) ) F cid( j) (s j ; s ; n ) s j s cid( j) + ∂ F j (s j ; s ; n ) ∂s j s j s jT (s j − s cid( j) ) (I.5) with some appropriately chosen s jT ∈ [s j , s cid( j) ]. The derivatives of F j (s j ; s ; n ) with respect to population densities or phenotypes can be expanded in the same manner. Therefore, for all · i, j, k replacing F(s · ; s ; n ) with F · (s · ; s ; n ), F(z · ; s ; n ) with F · (z · ; s ; n ), F(s · ; s ; n ) with F · (s · ; s ; n ), F(s · ; s ; m ) with F · (s · ; s ; m ), F(s · ; s a ; m) with F · (s · ; s a ; m) and F(s cid(·) + ερ · ; s ; n ) with F · (s cid(·) + ερ · ; s ; n ) throughout this paper (except in this section, Sect. 7, and Appendix K) gives the complete proofs for Theorems 1-3 for the fitness function F(v j ; V ; n ).
This means that |h| does not exceed c h (φ x ) when x is within a circle of radius φ x . Thus, replacing C h with c h (φ x ) in Eqs. (4.8a) and (4.8b) in Lemma 5 in Sect. 4, we have ε < λ 2 max 4c h (φ x )C r λ 2 max 4( C h + C H φ x )C r (J.7) and |x| ≤ 2ε (J.8) Equation (J.8) is ensured by Eq. (J.7), as long as φ x is appropriately chosen so that the right-hand side of Eq. (J.8) does not exceed φ x , i.e.,

2ε
C Clearly, the smallest φ x that satisfies both Eq. (J.7) and Eq. (J.9) makes the approximability condition Eq. (J.7) the easiest to satisfy for a given λ max , and simultaneously makes the right-hand side of Eq. (J.8) the smallest. Such a φ x is given by solving Eq. (J.9) assuming equality, i.e., This approximability condition can be further improved by the higher-order approximation of the nonlinear term and/or perturbation term, although the resultant conditions will be less simple than Eq. (J.11).