1 Introduction

Ecological interactions create selection pressures that may change those very interactions. Such eco-evolutionary feedback can induce rich coevolutionary dynamics including cyclic coevolution (e.g., Dieckmann et al. 1995; Dieckmann and Law 1996), adaptive radiation (e.g., Ackermann and Doebeli 2004; Egas et al. 2005), adaptive speciation (e.g., Dieckmann and Doebeli 1999; Dieckmann et al. 2004; Rundle and Nosil 2004), taxon cycles (e.g., Kisdi et al. 2001; Ito and Dieckmann 2007), and community formation (e.g., Loeuille and Loreau 2005; Dieckmann et al. 2007; Ito et al. 2009; Takahashi et al. 2013). To arrive at tractable descriptions of such evolutionary dynamics, the assumption is often made that mutation rates are low relative to the timescale of population dynamics. This assumption reduces the evolutionary dynamics to a trait-substitution sequence resulting from repeated mutant invasions (Metz et al. 1992, 1996; Dieckmann and Law 1996). Such invasions potentially bring about various outcomes: most often, (1) extinction of only the resident that is parental to the mutant, and more rarely, (2) coexistence of the mutant with all residents, or (3) other combinations of extinctions of the parental resident, non-parental residents, and mutant.

It has been proved that when for all residents all potentially invading mutants are subject to directional selection and the resulting perturbations to the system are sufficiently weak, as measured by the product of fitness gradients and mutational step sizes relative to the return rate to their population-dynamical equilibrium before the invasion, invading mutants replace their parental residents—a statement referred to as the invasion–implies–substitution theorem (Geritz 2005; Dercole and Rinaldi 2008). The resulting trait-substitution sequences describe directional coevolution, characterized well by a set of ordinary differential equations called the canonical equations of adaptive dynamics theory (Dieckmann and Law 1996), which have a form similar to Lande’s equations of quantitative genetics theory (Lande 1979).

Eventually, directional coevolution may take some residents to the neighborhood of peaks, troughs, or saddles of the community’s fitness landscape, which means that those populations experience very weak directional selection. Here, an invading mutant may coexist with its parental resident, which may be followed by diversifying evolution of the two morphs, called evolutionary branching (Metz et al. 1996). If the community has a one-dimensional trait space and a single resident, necessary and sufficient conditions for its evolutionary branching into two distinct residents have been obtained (Metz et al. 1996; Geritz et al. 1998).

On the other hand, for higher-dimensional traits or more than one resident, obtaining formal conditions for the occurrence of evolutionary branching is difficult (but see Ito and Dieckmann (2014) for a special case). This is largely because in these more complex community dynamics it is not easy to analyze the outcomes of mutant invasions (Metz et al. 1996). This difficulty may be reduced when the population dynamics can be approximated by Lotka–Volterra (LV) models, which are analytically more tractable and have been studied well (e.g., Zeeman 1993; Hofbauer and Sigmund 1998). The LV-approximation is possible when all existing residents and the mutant are similar to each other, so that they form a single phenotypic cluster (Meszéna et al. 2005; Durinx et al. 2008), which yields an expression for the invasion-fitness function in terms of resident and mutant phenotypes that is given by a rational function. By using this rational form, considerable progress in deriving conditions for multidimensional evolutionary branching has recently been made (Geritz et al. 2016; Sect. 9.3).

Dercole and Rinaldi (2008) proved that the LV-approximation holds also when all of the existing residents are not similar to each other, i.e., when every cluster has only a single resident, and their initial equilibrium population densities are not small. (Although such limiting assumptions for residents are not made in their proof, these assumptions are required when we consider trait-substitution sequences, as explained in Sect. 4.4). Thus, the remaining cases to be analyzed are (a) only some residents are similar to each other and (b) the population densities of some residents are very small so that they may go extinct as a result of the invasion. Both cases are likely to occur in multispecies coevolution, including processes involving multiple evolutionary branching and taxon cycles, commonly observed in numerical simulations of trait-mediated community dynamics (e.g., Doebeli and Dieckmann 2000; Ito and Dieckmann 2007). Therefore, the goal of the present paper is to obtain formal conditions for ensuring the LV-approximation for an arbitrary set of residents, including the aforementioned two cases. Based on the obtained conditions, the invasion–implies–substitution theorem can be extended to a mutant with an arbitrary set of residents.

The next section, Sect. 2, formulates a set of axioms that are expected to hold for ecologically plausible differential equations describing trait-mediated community dynamics. Section 3 derives a condition for ensuring the LV-approximation. Sections 4 and 5 derive sufficient conditions for satisfying this condition, in terms of properties of the fitness-generating function and mutational step sizes. Section 6 explains how the thresholds for the obtained sufficient conditions can be improved further. Section 7 shows how to examine the obtained sufficient conditions for a specific ecological model. Section 8 extends the invasion–implies–substitution theorem.

2 Framework and assumptions

2.1 Axioms for fitness-generating functions

We consider community dynamics written as

$$ \frac{{{\text{d}}n_{i} }}{{{\text{d}}t}} = n_{i} F(s_{i} ;{\mathbf{s}};{\mathbf{n}}) $$
(2.1)

with population densities \( n_{i} \) for \( i = 1, \ldots ,N \).

We denote by \( \mathcal{\mathcal{S}} \subset {\mathbb{R}}^{Z} \) a compact \( Z \)-dimensional trait space, by \( {\mathbf{s}} = (s_{1} , \ldots ,s_{N} )^{\text{T}} \in \mathcal{\mathcal{S}}^{N} \) an \( N \)-dimensional vector of trait values of the phenotypes present in the community, and by \( {\mathbf{n}} = (n_{1} , \ldots ,n_{N} )^{\text{T}} \in {\mathbb{R}}_{ + }^{N} \) the vector of their population densities. The fitness-generating function

$$ F:\mathcal{\mathcal{S}} \times \bigcup\limits_{N = 1}^{\infty } {\left( {\mathcal{\mathcal{S}}^{N} \times {\mathbb{R}}^{N} } \right)} \to {\mathbb{R}}:(s^{{\prime }} ,{\mathbf{s}},{\mathbf{n}}) \mapsto F(s^{{\prime }} ;{\mathbf{s}};{\mathbf{n}}) $$
(2.2)

describes the instantaneous per capita growth rate of an arbitrary phenotype \( s^{{\prime }} \) with an infinitesimally small population density in the instantaneous environment produced by resident community composition \( ({\mathbf{s}},{\mathbf{n}}) \) (Brown and Vincent 1987; Cohen et al. 1999). The fitness-generating function provides a fitness landscape for each community composition \( ({\mathbf{s}},{\mathbf{n}}) \). We assume that it satisfies the following axioms:

  1. (i)

    Smoothness: \( F \) is smooth on each component of its domain \( \mathcal{\mathcal{S}} \times \mathcal{\mathcal{S}}^{N} \times {\mathbb{R}}^{N} . \)

  2. (ii)

    Symmetry: \( F(s^{{\prime }} ;\sigma {\mathbf{s}};\sigma {\mathbf{n}}) = F(s^{{\prime }} ;{\mathbf{s}};{\mathbf{n}}) \) for all permutations \( \sigma \) operating on the indices of \( ({\mathbf{s}};{\mathbf{n}}). \)

  3. (iii)

    Reducibility: \( F(s^{{\prime }} ;(s_{1} , \ldots ,s_{N} )^{\text{T}} ;(n_{1} , \ldots ,n_{N - 1} ,0)^{\text{T}} ) = F(s^{{\prime }} ;(s_{1} , \ldots ,s_{N - 1} )^{\text{T}} ;(n_{1} , \ldots ,n_{N - 1} )^{\text{T}} ). \)

  4. (iv)

    Exchangeability: If \( s_{N} = s_{N - 1} \), then \( F(s^{{\prime }} ;(s_{1} , \ldots ,s_{N} )^{\text{T}} ;(n_{1} , \ldots ,n_{N} )^{\text{T}} ) \)\( = F(s^{{\prime }} ;(s_{1} , \ldots ,s_{N - 1} )^{\text{T}} ;(n_{1} , \ldots ,n_{N - 1} + n_{N} )^{\text{T}} ). \)

  5. (v)

    Bounded world: There exists an upper bound \( \eta > 0 \) for the community's total population density, i.e., Eq. (2.1) eventually restricts the population densities to \( \left\{ {(n_{1} , \ldots ,n_{N} ) \in {\mathbb{R}}_{ + }^{N} \left| {\sum\nolimits_{i = 1}^{N} {n_{i} } \le \eta } \right.} \right\}. \)

Below, we restrict the community's space of population densities to \( [0,\eta ]^{N} . \)

The smoothness axiom (i) follows from the assumption that the population-dynamical behavior of individuals depends smoothly on their traits and that all ecological interactions are instantaneous. The latter assumption is implicit in the assumption that the per capita growth rate depends only on the arguments \( s^{{\prime }} \) and \( ({\mathbf{s}},{\mathbf{n}}) \). Axioms (ii) to (iv) are consistency conditions that go with representing the behaviour of large collectives of individuals by differential equations for their densities. Axiom (ii) follows from the arbitrariness of the ordering of the trait N-tuples, and axiom (iv) from the fact that individuals with the same trait values are assumed to be indistinguishable. The consequent additivity for identical phenotypes mechanistically lies at the heart of the LV-approximability. The bounded-world axiom (v) is just what it says: there necessarily is a limit to the biomass that a patch of world can support. Models that do not acknowledge this may on occasion be good approximations for specific purposes, but when we run into results contradicting the bounded-world assumption, we have to start modifying the model.

To keep the exposition simple, we assume from now on a one-dimensional trait space \( \mathcal{\mathcal{S}} \subset {\mathbb{R}} \). The results are generalized to higher-dimensional trait spaces in Sect. 5.4.

2.2 Population dynamics triggered by mutant invasion

We assume that the community is at a locally stable equilibrium \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{n}}= ( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} , \ldots , \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{N} )^{\text{T}} \), determined by \( F(s_{i} ;{\mathbf{s}};\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{n}}) = 0 \) for all \( i = 1,\ldots,N \). When an invasion by a mutant \( s^{{\prime }} = s_{N + 1} \) with \( \left| {s_{N + 1} - s_{N} } \right| = \varepsilon_{\mu } \) has occurred, the combined population dynamics can be written as

$$ \frac{{{\text{d}}n_{i} }}{{{\text{d}}t}} = n_{i} F(s_{i} ;{\mathbf{s^{\prime}}};{\mathbf{n^{\prime}}}) $$
(2.3)

for \( i = 1, \ldots ,N + 1 \), where \( {\mathbf{s}}^{{\prime }} = (s_{1} , \ldots ,s_{N} ,s_{N + 1} )^{\text{T}} \) and \( {\mathbf{n}}^{{\prime }} = (n_{1} , \ldots ,n_{N} ,n_{N + 1} )^{\text{T}} \), starting from \( {\mathbf{n}}^{{\prime }} = ( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} , \ldots , \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{N} ,n_{N + 1} )^{\text{T}} \) with very small \( n_{N + 1} \), which means that \( {\mathbf{n}}^{{\prime }} \) is almost identical to the equilibrium before the invasion, \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{n}}^{\prime } = ( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} , \ldots , \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{N} ,0)^{\text{T}} \).

Please notice that here we have introduced the notational convention, to which we adhere throughout this paper, that vectors of dimension N + 1 directly corresponding to vectors of dimension N are denoted by an added prime, as in \( {\mathbf{s}}^{{\prime }} \), \( {\mathbf{n}}^{{\prime }} \), and \(\overset{{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}}{{{\textbf n}}}^{\prime}\).

Proposition 1

For a sufficiently small mutational step size\( \varepsilon_{\mu } \), the fitness-generating function during the transient following mutant invasion can be approximated by a linear function of\( {\mathbf{n}}^{{\prime }} \),

$$ \begin{aligned} & F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} + \sum\limits_{j = 1}^{N + 1} {a_{ij} (n_{j} - \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{j} )} , \\ & \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} : = F(s_{i} ;{\mathbf{s}}^{{\prime }} ;\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{n}}^{\prime} ), \\ & a_{ij} : = \left. {\frac{{\partial F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} )}}{{\partial n_{j} }}} \right|_{{{\mathbf{n^{\prime}}} = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{n}}^{\prime}}} , \\ \end{aligned} $$
(2.4)

which upon substitution into Eq. (2.3) gives the approximating LotkaVolterra model,

$$ \frac{{{\text{d}}n_{i} }}{{{\text{d}}t}} = n_{i} \left[ {\gamma_{i} + \sum\limits_{j = 1}^{N + 1} {a_{ij} n_{j} } } \right] $$
(2.5)

with\( \gamma_{i} = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} - \sum\nolimits_{j = 1}^{N + 1} {a_{ij} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{j} } . \)

The remainder of this paper is devoted to making precise the, very general, conditions under which this proposition holds, and to calculating the corresponding error bounds. Important variables and parameters used in our analysis are shown in Table 1.

Table 1 List of notation for parameters, phenotypes, population densities, fitness functions, and other quantities

3 Linear approximation of the fitness-generating function

3.1 Basic idea

The root of the LV-approximability is the exchangeability axiom (iv) combined with the smoothness axiom (i). Under the exchangeability axiom (iv), the fitness-generating function does not distinguish individuals with identical phenotypes. Hence, the function responds only to the sum of their densities. Under the smoothness axiom (i), this property is approximately inherited by slightly different phenotypes; the fitness-generating function responds primarily to the sum of their densities. In the remainder of this paper, we will work out how to lowest order of approximation the fitness-generating function responds linearly to the separate contributions to this sum, leading to the LV-approximation.

To get a more specific picture, we first suppose that there exist only two phenotypes, a resident phenotype \( s_{1} \) and a mutant phenotype \( s_{2} \), with population densities \( n_{1} \) and \( n_{2} \), respectively, and with their phenotypic difference given by the mutational step size, \( |s_{2} - s_{1} | = \varepsilon_{\mu } \), with \( \varepsilon_{\mu } \) being small. Proposition 1 trivially holds when the deviations of \( n_{1} \) and \( n_{2} \) from their initial states \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}^{{\prime }}} }} = (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{2} )^{\text{T}} \) are both small during the transient following mutant invasion. In many cases, however, those changes are large, resulting in the exclusion of the resident (Dercole and Rinaldi 2008, Appendix B). In the latter case, it is not obvious whether a linear approximation of the fitness-generating function in \( {\mathbf{n}}^{{\prime }} = (n_{1} ,n_{2} )^{\text{T}} \) is valid.

On the other hand, as the mutant is similar to the resident, due to the smoothness and exchangeability property of the fitness-generating function, they act almost like a single phenotype in their effect on the environment. Thus, invasion by the mutant in many cases causes only a slight change in their total population density \( n_{1} + n_{2} \), and only their fractions may change substantially, but will do so slowly (Dercole and Rinaldi 2008, Appendix B; Meszéna et al. 2005; Durinx et al. 2008). In other words, the fitness-generating function is not sensitive to even large changes of \( n_{1} \) and \( n_{2} \), as long as \( n_{1} + n_{2} \) is kept almost constant. As shown later, this implies that the change of \( F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \) induced by a large change of \( n_{2} \), keeping \( n_{1} + n_{2} \) constant, is slight, so that \( F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \) can be expanded with respect to \( {\mathbf{m}}^{{\prime }} = (m_{1} ,m_{2} )^{\text{T}} = (n_{1} + n_{2} ,\varepsilon_{\mu } n_{2} )^{\text{T}} \), even for \( \varepsilon_{\mu } \to 0 \). The linear relationship between \( {\mathbf{m}}^{{\prime }} \) and \( {\mathbf{n^{\prime}}} \) then makes Proposition 1 hold: as the change of \( m_{2} = \varepsilon_{\mu } n_{2} \) is always small because of the smallness of \( \varepsilon_{\mu } \), this is the case whenever the change of the population density \( m_{1} = n_{1} + n_{2} \) is small. Below, we introduce the notion of approximate phenotypes, so we can abbreviate the preceding condition by stating that the change in the population density \( m_{1} = n_{1} + n_{2} \) of the approximate phenotype (\( s_{\text{a}} = s_{1} \) or \( s_{\text{a}} = s_{2} \)) is small.

The strategy above is readily extended to multiple residents \( s_{1} , \ldots ,s_{N} \) and a mutant \( s_{N + 1} \) emerged from the parental phenotype \( s_{N} \) with \( |s_{N + 1} - s_{N} | = \varepsilon_{\mu } \), by choosing an approximate phenotype from each of the existing phenotypic clusters, so that density changes of those approximate phenotypes can be kept small during the transient following mutant invasion, and thus an LV-approximation can be warranted (Sect. 3). We can gauge the smallness of their density changes from the leading eigenvalue of the community’s Jacobian matrix evaluated at the equilibrium population densities of the approximate phenotypes before the invasion (Sect. 4). However, this linear stability analysis does not work well when some approximate phenotypes have very small initial equilibrium densities, because those small densities inevitably cause the leading eigenvalue of the community’s Jacobian matrix to be close to zero. To overcome this difficulty, we analyze not only the linear terms but also the quadratic terms of the transient dynamics around the initial equilibrium (Sect. 5). In the remainder of this section, we show how we can easily find approximate phenotypes for a set of phenotypes \( {\mathbf{s}}^{{\prime }} = (s_{1} , \ldots ,s_{N + 1} )^{\text{T}} \), such that Proposition 1 holds when the changes of the population densities of these approximate phenotypes are sufficiently small.

3.2 Approximate phenotypes

We consider an arbitrary set of residents together with a mutant, \( {\mathbf{s}}^{{\prime }} = (s_{1} , \ldots ,s_{N} ,s_{N + 1} )^{\text{T}} \). We choose phenotypic clusters so that within-cluster phenotypic differences do not exceed \( \varepsilon = \rho_{\mu } \varepsilon_{\mu } \) (Fig. 1a), with an arbitrarily chosen constant \( \rho_{\mu } \) larger than 1 (but not too large, so that the clustering is meaningful, i.e., the error estimates to be derived below are small). We assume that those phenotypic clusters are well-recognizable and well-separated from each other, so that we can find an \( \varepsilon \) that is much smaller than the smallest distance among the approximate phenotypes. Generally, this assumption is warranted in evolutionary dynamics with small mutational step sizes (as explained in Sect. 9.2) by the principle of limiting similarity. Notice that in any case the mutant \( s_{N + 1} \) and its parental phenotype \( s_{N} \) form a cluster. Any resident not similar to any other phenotype forms a cluster by itself. Thus, the number of clusters, denoted by \( M \), satisfies \( 1 \le M \le N \). From each cluster, we arbitrarily pick one phenotype as its representative. Then, by the symmetry axiom (ii), we can permute \( {\mathbf{s}}^{{\prime }} = (s_{1} , \ldots ,s_{N} ,s_{N + 1} )^{\text{T}} \) so that those representatives come first as \( s_{1} , \ldots ,s_{M} \), followed by the other phenotypes, i.e., \( {\mathbf{s}}^{{\prime }} = (s_{1} , \ldots ,s_{M} ,s_{M + 1} , \ldots ,s_{N + 1} )^{\text{T}} \) (Fig. 1b). We refer to those representatives as approximate phenotypes \( {\mathbf{s}}_{\text{a}} = (s_{1} , \ldots ,s_{M} )^{\text{T}} . \)

Fig. 1
figure 1

Construction of approximate phenotypes and their population densities. a The population densities \( {\mathbf{n}}^{{\prime }} = (n_{1} ,n_{2} ,n_{3} ,n_{4} ,n_{5} )^{\text{T}} \) of existing phenotypes \( {\mathbf{s}}^{{\prime }} = (s_{1} ,s_{2} ,s_{3} ,s_{4} ,s_{5} )^{\text{T}} \)—comprising four residents \( s_{1} \), \( s_{2} \), \( s_{3} \), \( s_{4} \), and a mutant \( s_{5} \)—are indicated by colored histogram bars. The thick gray curve shows the fitness landscape, which passes through 0 at the resident phenotypes. The existing phenotypes are clustered so that within-cluster phenotypic differences do not exceed the threshold \( \varepsilon \), which is chosen to be larger than the mutational step size \( \varepsilon_{\mu } \), so that the mutant \( s_{5} \) and its parental resident \( s_{4} \) are guaranteed to be part of the same cluster. b The existing phenotypes are permuted so that the approximate phenotypes \( s_{1} \), \( s_{2} \), \( s_{3} \) come first. c Within each cluster, an approximate phenotype is chosen to represent the cluster. The population densities \( m_{1} \), \( m_{2} \), \( m_{3} \) of the approximate phenotypes are assigned so that each equals the total population density of the corresponding cluster (color figure online)

We introduce the cluster-identifying function \( {\text{cid}} \), such that \( {\text{cid}}(j) = i \) means that phenotype \( s_{j} \) belongs to the \( i \)th cluster, with \( s_{{{\text{cid}}(j)}} \) as the representative—i.e., approximate—phenotype of that cluster, and \( {\text{cid}}(j) = j \) for \( j \le M \). We also introduce the component-identifying function \( {\text{com}} \), which returns the set of indices of the phenotypes comprising the i-th cluster, i.e., \( {\text{com}}(i) = \left\{ {\left. j \right|\;{\text{cid}}(j) = i} \right\} \). Then, the population densities of these clusters are given by a vector \( {\mathbf{m}} = (m_{1} , \ldots ,m_{M} )^{\text{T}} \), with the population densities

$$ m_{i} : = \sum\limits_{{j \in {\text{com}}(i)}} {n_{j} } $$
(3.1a)

for \( i = 1, \ldots ,M \) treated as belonging to the approximate phenotypes \( {\mathbf{s}}_{\text{a}} = (s_{1} , \ldots ,s_{M} )^{\text{T}} \) (Fig. 1c). While the approximate phenotype of the \( i \)th cluster is identical to the representative phenotype of that cluster, the population densities of the former and latter are different and given by \( m_{i} \) and \( n_{i} \), respectively.

Notice that the number \( M \) of approximate phenotypes is less than the number \( N + 1 \) of phenotypes in the original community dynamics. Thus, for expanding the fitness-generating function, we need to define the other \( (N - M + 1) \) variables in such a way that their changes stay small during the transient following mutant invasion. As long as the population densities of the approximate phenotypes are kept almost constant, the fitness-generating function is expected to be insensitive to \( n_{i} \) for all \( i = 1,\ldots,N+1 \). Thus, we describe the remaining degrees of freedom, \( m_{M + 1} , \ldots ,m_{N + 1} \), by

$$ m_{i} : = \varepsilon n_{i} $$
(3.1b)

for \( i = M + 1, \ldots ,N + 1 \). Combining Eqs. (3.1a) and (3.1b), we write \( {\mathbf{m}}^{{\prime }} = (m_{1} , \ldots ,m_{M} ,m_{M + 1} , \ldots ,m_{N + 1} )^{\text{T}} \), which has the same dimension as \( {\mathbf{n}}^{{\prime }} \). Then, by the smoothness axiom (i), the exchangeability axiom (iv), and the bounded-world axiom (v), we have

Lemma 1

With\( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} ): = F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \), for sufficiently small\( \varepsilon \)there exists a constant\( C_{\text{Fm}}^{{\prime }} \)such that

$$ \left| {\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial m_{j} }}} \right| \le C_{\text{Fm}}^{{\prime }} $$
(3.2)

for all\( i,j = 1, \ldots ,N + 1 \), \( {\mathbf{n}}^{{\prime }} \in [0,\eta ]^{N + 1} \), and any\( {\mathbf{s}}^{{\prime }} \)such that\( \left| {s_{j} - s_{{{\text{cid}}(j)}} } \right| \le \varepsilon . \)

See Appendix A for the proof. Although \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} ) \) differs from \( F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \) as a mathematical object, their biological meaning is the same. Lemma 1 thus ensures the expandability of \( F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \) in terms of \( {\mathbf{m}}^{{\prime }} \). The estimate \( C_{\text{Fm}}^{{\prime }} \) still depends on \({\mathbf{s}}^{{\prime }}\), but is positive and uniformly bounded away from 0 and ∞.

As we did for \( C_{\text{Fm}}^{{\prime }} \), below we will introduce bounds for other important variables and functions in the form of expressions \( C_{ \cdot } \) that are independent of population densities (but may be functions of other model parameters). Please notice that here we have introduced the notational convention, to which we adhere throughout this paper, that \( C_F{ \cdot } \) denotes the upper bound for the absolute value (or norm) of \( \cdot \) the derivative of the fitness function with respect to \( { \cdot }, \) while \( C_{ \cdot } \) denotes the upper bound for the absolute value (or norm) of \( { \cdot } \) or for the derivative of the first symbol in \( { \cdot } \) with respect to the subsequent symbol(s). All \( C_F{ \cdot } \) and \( C_{ \cdot } \)are positive and uniformly bounded away from 0 and ∞. In the propositions below, we just indicate that such constants exist. Expressions for determining their values are derived in the associated appendices and are shown in Table 2.

Table 2 List of constants describing the upper bounds for variables dependent on population densities and (or) phenotypes. Notation: \( C_{{{\text{F}} \cdot }} \) denotes the upper bound for the absolute value (or norm) of the derivative of the fitness function with respect to \( \cdot \), while \( C_{ \cdot } \) denotes the upper bound for the absolute value (or norm) of \( \cdot \) or for the derivative of the first symbol in \( \cdot \) with respect to the subsequent symbol(s)

3.3 Taylor expansion in the population densities of the approximate phenotypes

We now expand the fitness-generating function in \( {\mathbf{m}}^{{\prime }} \). We denote by \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} = (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{N + 1} )^{\text{T}} \) the initial state \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}^{{\prime }}} }} = (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{N + 1} ) \) expressed in terms of \( {\mathbf{m}}^{{\prime }} \), with \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{i} = \sum\nolimits_{{j \in {\text{com}}(i)}} {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{j} } \) for \( i = 1, \ldots ,M \) and \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{i} = \varepsilon \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{i} \) for \( i = M + 1, \ldots ,N + 1 \). Lemma 1 allows \( F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \) to be expanded in \( {\mathbf{m}}^{{\prime }} \) around \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}}^{{\prime }}} } \) as

$$ F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} + {\mathbf{b}}_{i}^{{{\prime }{\text{T}}}} ({\mathbf{m}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }} } }}) + R_{i} , $$
(3.3a)

where

$$ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} : = F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}^{{\prime }}} }} ), $$
(3.3b)
$$ {\mathbf{b}}_{i}^{{{\prime }{\text{T}}}} = \left( {b_{i1}^{{\prime }} , \ldots ,b_{i\,N + 1}^{{\prime }} } \right): = \left. {\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial {\mathbf{m}}^{{\prime }} }}} \right|_{{{\mathbf{m}}^{{\prime }} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} }} = \left( {\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial m_{1}^{{\prime }} }}, \ldots ,\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial m_{N + 1}^{{\prime }} }}} \right)_{{{\mathbf{m}}^{{\prime }} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} }} , $$
(3.3c)

and

$$ R_{i} : = \frac{1}{2}({\mathbf{m}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}^{{\prime }}} }} )^{\text{T}} \left. {\frac{{\partial^{2} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial {\mathbf{m}}^{{\prime }} \partial {\mathbf{m}}^{{{\prime }{\text{T}}}} }}} \right|_{{{\mathbf{m}}^{{\prime }} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }} } }}}} ({\mathbf{m}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} ) + {\text{h}} . {\text{o}} . {\text{t}} . , $$
(3.3d)

with

$$ \left. {\frac{{\partial^{2} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial {\mathbf{m}}^{{\prime }} \partial {\mathbf{m}}^{{{\prime }{\text{T}}}} }}} \right|_{{{\mathbf{m}}^{{\prime }} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }} } }}}} = \left( {\begin{array}{*{20}c} {\frac{{\partial^{2} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial m_{1}^{{{\prime }2}} }}} & \cdots & {\frac{{\partial^{2} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial m_{1}^{{\prime }} \partial m_{N + 1}^{{\prime }} }}} \\ \vdots & \ddots & \vdots \\ {\frac{{\partial^{2} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial m_{1}^{{\prime }} \partial m_{N + 1}^{{\prime }} }}} & \cdots & {\frac{{\partial^{2} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial m_{N + 1}^{{{\prime }2}} }}} \\ \end{array} } \right)_{{{\mathbf{m}}^{{\prime }} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} }}. $$
(3.3e)

Here \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} = 0 \) for \( i = 1, \ldots ,N \) (from the equilibrium equation of the residents), and

$$ \left| {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{N + 1} } \right| \le \varepsilon \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{C_{\text{Fz}}^{{\prime }}} $$
(3.3f)

(Appendix B.2). Moreover, by the bounded-world axiom (v) and Taylor’s theorem, we have

Lemma 2

If there exists a constant \( C_{{\mathbf{m}}} \) such that

$$ \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}} \right| \le \varepsilon C_{{\mathbf{m}}} , $$
(3.4a)

then there exist constants \( C_{{\mathbf{m}}}^{{\prime }} = \sqrt {C_{{\mathbf{m}}}^{2} + (N - M + 1)\eta^{2} } \) and \( C_{{{\text{F}}{\mathbf{mm}}}}^{{\prime }} \) satisfying

$$ \left| {{\mathbf{m}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} } \right| \le \varepsilon C_{{\mathbf{m}}}^{{\prime }} $$
(3.4b)

and

$$ \left| {R_{i} } \right| \le \tfrac{1}{2}C_{{{\text{F}}{\mathbf{mm}}}}^{{\prime }} \left| {{\mathbf{m}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} } \right|^{2} \le \tfrac{1}{2}\varepsilon^{2} C_{{{\text{F}}{\mathbf{mm}}}}^{{\prime }} C_{{\mathbf{m}}}^{{{\prime }2}} , $$
(3.4c)

where for vectors \( \left| {\, \cdot \,} \right| \) denotes the Euclidian norm.

See Appendix B.3 for the proof. Thus, if Eq. (3.4a) is satisfied for a sufficiently small \( \varepsilon \), the fitness-generating function is approximated well by a linear function of \( {\mathbf{m}}^{{\prime }} \).

3.4 Taylor expansion in the population densities of the original phenotypes

Next, we transform the term linear in \( {\mathbf{m}}^{{\prime }} \) in Eq. (3.3a) into one in \( {\mathbf{n}}^{{\prime }} \). As \( {\mathbf{m}}^{{\prime }} \) is a linear function of \( {\mathbf{n}}^{{\prime }} \), \( {\mathbf{m}}^{{\prime }} \) can be written as \( {\mathbf{m}}^{{\prime }} = {\mathbf{Wn}}^{{\prime }} \), where \( {\mathbf{W}} \) is a \( (N + 1) \)-by-\( (N + 1) \) matrix with components given by Eq. (3.1a). Therefore, substituting this relationship into Eq. (3.3a) gives

$$ \begin{aligned} F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) & = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} + {\mathbf{b}}_{i}^{{{\prime }{\text{T}}}} {\mathbf{W}}({\mathbf{n}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}^{{\prime }}} }} ) + R_{i} \\ & = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} + {\mathbf{a}}_{i}^{\text{T}} ({\mathbf{n}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}^{{\prime }}} }} ) + R_{i} , \\ \end{aligned} $$
(3.5a)

where, since \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} ) = F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \),

$$ \begin{aligned} {\mathbf{a}}_{i}^{\text{T}} & = \left( {a_{i,1} , \ldots ,a_{i\,,N + 1} } \right) := {\mathbf{b}}_{i}^{{{\prime }{\text{T}}}} {\mathbf{W}} \\ & = \left. {\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial {\mathbf{m}}^{{\prime }} }}} \right|_{{{\mathbf{m}}^{{\prime }} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} }} {\mathbf{W}} \\ & = \left. {\frac{{\partial F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} )}}{{\partial {\mathbf{n}}^{{\prime }} }}} \right|_{{{\mathbf{n}}^{{\prime }} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}^{{\prime }}} }} }} \\ & = \left. {\left( {\frac{{\partial F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} )}}{{\partial n_{1}^{{\prime }} }}, \ldots ,\frac{{\partial F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} )}}{{\partial n_{N + 1}^{{\prime }} }}} \right)} \right|_{{{\mathbf{n}}^{{\prime }} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}^{{\prime }}} }} }} . \\ \end{aligned} $$
(3.5b)

By combining the equations above with Lemma 2, and by using Eq. (2.3), we get

Theorem 1

For the population densities\( {\mathbf{m}} = (m_{1} , \ldots ,m_{M} )^{\text{T}} \)of approximate phenotypes\( {\mathbf{s}}_{\text{a}} = (s_{1} , \ldots ,s_{M} )^{\text{T}} \)formed by clustering resident and mutant phenotypes\( {\mathbf{s}}^{{\prime }} = (s_{1} , \ldots ,s_{N + 1} )^{\text{T}} \)according to a threshold phenotypic distance\( \varepsilon = \varepsilon_{\mu } \rho_{\mu } \), if\( {\mathbf{m}} \)satisfies

$$ \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}} \right| \le \varepsilon C_{{\mathbf{m}}} , $$

during the transient following mutant invasion, then the fitness-generating function can be expanded as

$$ F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} + \sum\limits_{j = 1}^{N + 1} {a_{ij} (n_{j} - \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{j} )} + R_{i} $$
(3.6a)

with\( \left| {R_{i} } \right| \le C_{{{\text{F}}{\mathbf{mm}}}}^{{\prime }} \left| {{\mathbf{m}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} } \right|^{2} \le \varepsilon^{2} C_{{{\text{F}}{\mathbf{mm}}}}^{{\prime }} C_{{\mathbf{m}}}^{{\prime }} \), which gives the LV-approximation

$$ \frac{{{\text{d}}n_{i} }}{{{\text{d}}t}} = n_{i} \left[ {\gamma_{i} + \sum\limits_{j = 1}^{N + 1} {a_{ij} n_{j} } + R_{i} } \right] $$
(3.6b)

with\( \gamma_{i} = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{i} - \sum\nolimits_{j = 1}^{N + 1} {a_{ij} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{j} } \).

4 Approximability condition when the population densities of the approximate phenotypes are large

In this section, we consider the sufficient condition in Eq. (3.4a) for LV-approximability, \( \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}} \right| < \varepsilon C_{{\mathbf{m}}} \). We refer to this as the approximability condition. If the initial equilibrium population densities of approximate phenotypes are not small, so that \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{i} \gg \varepsilon \) is satisfied for all \( i = 1, \ldots ,M \), their dynamics can be analyzed by a linear stability analysis of the resident equilibrium. On the other hand, if some approximate phenotypes have very small equilibrium population densities, also the corresponding eigenvalues of the associated Jacobian come very close to zero and examining linear terms alone is not sufficient. In this section, we analyze the first, simpler, case to show that the approximability condition (3.4a) can generally be fulfilled. The second, more complicated, case is then analyzed in a similar manner in the next section.

4.1 Dynamics of approximate phenotypes

The dynamics of approximate phenotypes \( {\mathbf{m}} = (m_{1} , \ldots ,m_{M} )^{\text{T}} \) satisfies, by Eqs. (3.1a) and (2.3),

$$ \frac{{{\text{d}}m_{i} }}{{{\text{d}}t}} = m_{i} f(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) $$
(4.1a)

for \( i = 1, \ldots ,M \), where the growth rate of \( m_{i} \),

$$ f(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ): = \sum\limits_{{j \in {\text{com}}(i)}} {p_{j} F(s_{j} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} )} , $$
(4.1b)

is the average growth rate within the \( i \)th cluster weighted with the fractions \( p_{j} : = n_{j} /m_{{{\text{cid}}(j)}} \) of its component phenotypes. As for the remaining degrees of freedom in \( {\mathbf{m}}^{{\prime }} \), i.e., \( m_{i} = \varepsilon n_{i} \) for \( i = M + 1, \ldots ,N + 1 \), their dynamics are given by

$$ \frac{{{\text{d}}m_{i} }}{{{\text{d}}t}} = m_{i} F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ). $$
(4.1c)

When \( {\mathbf{m}} \) is kept constant, these remaining degrees of freedom describe the relatively slow dynamics of the cluster compositions, corresponding to the dynamics of the fractions \( p_{j} . \)

4.2 Transformation into perturbed community

For convenience, Eq. (4.1a) is rewritten in vector–matrix form as

$$ \frac{{{\text{d}}{\mathbf{m}}}}{{{\text{d}}t}} = {\text{diag}}({\mathbf{m}}){\mathbf{f}}({\mathbf{s}}_{\text{a}} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ), $$
(4.2)

where \( {\text{diag}}({\mathbf{m}}) \) is a diagonal matrix with diagonal entries \( m_{1} , \ldots ,m_{M} \), \( {\mathbf{s}}_{\text{a}} = (s_{1} , \ldots ,s_{M} )^{\text{T}} \), and \( {\mathbf{f}}({\mathbf{s}}_{\text{a}} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ): = \left( {f(s_{1} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ), \ldots ,f(s_{M} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} )} \right)^{\text{T}} \). We decompose the right-hand side of Eq. (4.2) into a component determined by \( {\mathbf{m}} \) alone and a residual of order \( \varepsilon \), which is treated as a perturbation. The former component is further decomposed into linear and higher-order terms. Specifically, we have

Lemma 3

The dynamics of\( {\mathbf{m}} \)in Eq. (4.2) can be transformed into

$$ \frac{{{\text{d}}{\mathbf{m}}}}{{{\text{d}}t}} = {\mathbf{J}}({\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}) + {\mathbf{r}}_{\text{m}} \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}} \right|^{2} + \varepsilon {\mathbf{h}}_{\text{m}} , $$
(4.3a)

where

$$ \begin{aligned} {\mathbf{J}} & : = {\text{diag}}({{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}){\mathbf{B}} = \left( {\begin{array}{*{20}c} {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{1} b_{11} } & \ldots & {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{1} b_{1M} } \\ \ldots & \ldots & \ldots \\ {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{M} b_{M1} } & \ldots & {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{M} b_{MM} } \\ \end{array} } \right), \\ b_{ij} & : = \left. {\frac{{\partial F(s_{i} ;{\mathbf{s}}_{\text{a}} ;{\mathbf{m}})}}{{\partial m_{j} }}} \right|_{{{\mathbf{m}} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}}} , \\ \end{aligned} $$
(4.3b)

and\( {\mathbf{r}}_{\text{m}} = (r_{\text{m1}} , \ldots ,r_{{{\text{m}}M}} )^{\text{T}} \)is a function of\( {\mathbf{m}} \)satisfying\( \left| {{\mathbf{r}}_{\text{m}} } \right| \le C_{{{\mathbf{rm}}}} \), while\( {\mathbf{h}}_{\text{m}} = (h_{\text{m1}} , \ldots ,h_{{{\text{m}}M}} )^{\text{T}} \)is a function of\( {\mathbf{m}}^{{\prime }} \)and\( \varepsilon \)satisfying\( \left| {{\mathbf{h}}_{\text{m}} } \right| \le C_{{{\mathbf{hm}}}} \).

See Appendix C for the proof. Notice that \( {\mathbf{J}} \) and \( {\mathbf{r}}_{\text{m}} \) are both independent of \( \varepsilon \).

4.3 Local Lyapunov function

If the perturbation term is neglected in Eq. (4.3a), i.e., \( \varepsilon = 0 \), we can easily examine the local stability of the fixed point \( {{\overset{\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }} \) by checking whether all eigenvalues of \( {\mathbf{J}} \) have negative real parts. With the perturbation, however, we also have to compare the magnitudes of those eigenvalues with the perturbation. Moreover, as the perturbation causes a deviation of the community from \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }} \), the effect of the higher-order term \( {\mathbf{r}}_{\text{m}} \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}} \right|^{2} \) has to be examined as well.

To simplify the analysis, we introduce a new vector \( {\mathbf{x}} = (x_{1} , \ldots ,x_{M} )^{\text{T}} = {\mathbf{P}}({\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}) \) with a real matrix \( {\mathbf{P}} \) and write Eq. (4.3a) as

$$ \frac{{{\text{d}}{\mathbf{x}}}}{{{\text{d}}t}} = {\mathbf{Ax}} + {\mathbf{r}}\left| {\mathbf{x}} \right|^{2} + \varepsilon {\mathbf{h}}, $$
(4.4)

with \( {\mathbf{A}}: = {\mathbf{PJP}}^{ - 1} \), \( {\mathbf{r}}: = {\mathbf{Pr}}_{\text{m}} \left| {{\mathbf{P}}^{ - 1} {\mathbf{x}}} \right|^{2} /\left| {\mathbf{x}} \right|^{2} \), \( {\mathbf{h}}: = {\mathbf{Ph}}_{\text{m}} \), \( \left| {\mathbf{r}} \right| \le C_{{\mathbf{r}}} \), and \( \left| {\mathbf{h}} \right| \le C_{{\mathbf{h}}} \); see Appendix D. As proved in Appendix E, we have

Lemma 4

A real matrix\( {\mathbf{P}} \)can be chosen so that\( {\mathbf{A}} = {\mathbf{PJP}}^{ - 1} \)satisfies

$$ {\mathbf{x}}^{\text{T}} {\mathbf{Ax}} \le \lambda_{\rm {max} } \left| {\mathbf{x}} \right|^{2} $$
(4.5a)

for\( \lambda_{\rm {max} } < 0 \)with

$$ \lambda_{\rm {max} } : = \hbox{max} \left\{ {\text{Re} (\lambda_{1} ), \ldots ,\text{Re} (\lambda_{M} )} \right\} $$
(4.5b)

(when the eigenvalues\( \lambda_{1} , \ldots ,\lambda_{M} \)of\( {\mathbf{J}} \)in Eq. (4.3b) are all distinct) or

$$ \lambda_{\rm {max} } : = \hbox{max} \left\{ {\text{Re} (\lambda_{1} ), \ldots ,\text{Re} (\lambda_{D} ),\tfrac{1}{2}\text{Re} (\lambda_{D + 1} ), \ldots ,\tfrac{1}{2}\text{Re} (\lambda_{M} )} \right\} $$
(4.5c)

(when some eigenvalues are repeated, with distinct eigenvalues\( \lambda_{1} , \ldots ,\lambda_{D} \)and repeated eigenvalues\( \lambda_{D + 1} , \ldots ,\lambda_{M} \)).

When the second and third terms in Eq. (4.4) are both neglected, the time derivative of \( \left| {\mathbf{x}} \right|^{2} \) is a monotonically decreasing function if \( \lambda_{\rm {max} } < 0 \), which gives

Lemma 5

For Eq. (4.4) with\( \varepsilon = 0 \), if

$$ \lambda_{\rm {max} } < 0, $$
(4.6a)

then

$$ V: = \sum\limits_{i = 1}^{M} {x_{i}^{2} } = \left| {\mathbf{x}} \right|^{2} = {\mathbf{x}}^{\text{T}} {\mathbf{x}} $$
(4.6b)

is a local Lyapunov function, i.e.,\( V = 0 \)for\( {\mathbf{x}} = {\mathbf{0}} \)and\( {\text{d}}V/{\text{d}}t < 0 \)for\( 0 < \left| {\mathbf{x}} \right| < \phi \)with a sufficiently small\( \phi . \)

Proof

By Eqs. (4.4) and (4.6a), the time derivative of \( V \) equals

$$ \begin{aligned} \frac{{{\text{d}}V}}{{{\text{d}}t}} & = 2{\mathbf{x}}^{\text{T}} \frac{{{\text{d}}{\mathbf{x}}}}{{{\text{d}}t}} \\ & = 2{\mathbf{x}}^{\text{T}} {\mathbf{Ax}} + 2{\mathbf{x}}^{\text{T}} {\mathbf{r}}\left| {\mathbf{x}} \right|^{2} + 2\varepsilon {\mathbf{x}}^{\text{T}} {\mathbf{h}} \\ & \le 2\lambda_{\rm {max} } \left| {\mathbf{x}} \right|^{2} + 2{\mathbf{x}}^{\text{T}} {\mathbf{r}}\left| {\mathbf{x}} \right|^{2} + 2\varepsilon {\mathbf{x}}^{\text{T}} {\mathbf{h}} \\ & \le 2\lambda_{\rm {max} } \left| {\mathbf{x}} \right|^{2} + 2C_{{\mathbf{r}}} \left| {\mathbf{x}} \right|^{3} + 2\varepsilon C_{{\mathbf{h}}} \left| {\mathbf{x}} \right| \\ & = 2C_{{\mathbf{r}}} \left| {\mathbf{x}} \right|^{2} \left( {\left| {\mathbf{x}} \right| - \phi_{\text{r}} } \right) + \lambda_{\rm {max} } \left| {\mathbf{x}} \right|\left( {\left| {\mathbf{x}} \right| - \phi_{\text{h}} } \right), \\ \end{aligned} $$
(4.7a)

where

$$ \begin{aligned} \phi_{\text{r}} & : = \frac{{\left| {\lambda_{\rm {max} } } \right|}}{{2C_{{\mathbf{r}}} }} > 0, \\ \phi_{\text{h}} & : = \frac{{2\varepsilon C_{{\mathbf{h}}} }}{{\left| {\lambda_{\rm {max} } } \right|}} > 0. \\ \end{aligned} $$
(4.7b)

Thus, if \( \varepsilon = 0 \) (i.e., \( \phi_{\text{h}} = 0 \)), then \( V = 0 \) for \( {\mathbf{x}} = {\mathbf{0}} \) and \( {\text{d}}V/{\text{d}}t < 0 \) for \( 0 < \left| {\mathbf{x}} \right| < \phi_{\text{r}} \). Therefore, \( V \) is a local Lyapunov function for \( {\mathbf{x}} = {\mathbf{0}} \). □

4.4 Stability condition under perturbation

For \( \lambda_{\rm {max} } < 0 \) and \( \varepsilon < \lambda_{\rm {max} }^{2} /(4C_{{\mathbf{h}}} C_{{\mathbf{r}}} ) \), there exists a contour \( V = V_{0} \) with \( \phi_{\text{h}}^{2} < V_{0} < \phi_{\text{r}}^{2} \) on which \( {\text{d}}V / {\text{d}}t < 0 \) (Fig. 2). Hence, all solutions of Eq. (4.4) that start within this contour stay inside of it. As the initial state \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{x}}} }} \) satisfies \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{x}}} }} = {\mathbf{P}}({{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}) = {\mathbf{0}} \), we have

Fig. 2
figure 2

Stability against perturbation when the initial equilibrium population densities of approximate phenotypes are not small. As a simple example, a community composed of two approximate phenotypes \( {\mathbf{s}}_{\text{a}} = (s_{1} ,s_{2} )^{\text{T}} \) is considered. Their population densities \( {\mathbf{m}} = (m_{1} ,m_{2} )^{\text{T}} \) are transformed into \( {\mathbf{x}} = (x_{1} ,x_{2} )^{\text{T}} \) so that \( {\mathbf{x}} = {\mathbf{0}} \) corresponds to the initial equilibrium before mutant invasion. A local Lyapunov function \( V = \left| {\mathbf{x}} \right|^{2} \) of the community dynamics monotonically decreases with time within the light-gray and dark-gray regions marked by D. The dark-gray region marked by E is associated with a repeller that prevents the community dynamics from passing its inner boundary \( \left| {\mathbf{x}} \right| = \phi_{\text{h}} \) from its inner side \( \left| {\mathbf{x}} \right| < \phi_{\text{h}} \)

Lemma 6

If

$$ \sqrt \varepsilon < \frac{{ - \lambda_{\rm {max} } }}{{2\sqrt {C_{{\mathbf{h}}} C_{{\mathbf{r}}} } }}, $$
(4.8a)

then

$$ \left| {\mathbf{x}} \right| < 2\varepsilon \frac{{C_{{\mathbf{h}}} }}{{\left| {\lambda_{\rm max} } \right|}} $$
(4.8b)

during the transient following mutant invasion.

Finally, by translating Eq. (4.8b) back to \( {\mathbf{m}} - {{\overset{\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }} = {\mathbf{P}}^{ - 1} {\mathbf{x}} \), and by substituting it into Eq. (3.4a), we have

Theorem 2

For the population densities\( {\mathbf{m}} = (m_{1} , \ldots ,m_{M} )^{\text{T}} \)of approximate phenotypes\( {\mathbf{s}}_{\text{a}} = (s_{1} , \ldots ,s_{M} )^{\text{T}} \)formed by clustering resident and mutant phenotypes\( {\mathbf{s}}^{{\prime }} = (s_{1} , \ldots ,s_{N + 1} )^{\text{T}} \)according to a threshold phenotypic distance\( \varepsilon = \rho_{\mu } \varepsilon_{\mu } \), if\( \lambda_{\rm max} \)defined by Eq. (4.5) satisfies the approximability condition, Eq. (4.8a),

$$ \sqrt \varepsilon < - \frac{{\lambda_{\rm max} }}{{2\sqrt {C_{\text{h}} C}_{\text{r}} }}, $$

then

$$ \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}} \right| \le C_{{\mathbf{m}}} \varepsilon $$
(4.9a)

during the transient following mutant invasion, where

$$ C_{{\mathbf{m}}} = \frac{{2\left\| {{\mathbf{P}}^{ - 1} } \right\|C_{{\mathbf{h}}} }}{{\left| {\lambda_{\rm max} } \right|}} $$
(4.9b)

with\( {\mathbf{P}} \)defined by Eq. (E.24) of Appendix Eand with\( \left\| {\, \cdot \,} \right\| \)denoting the induced norm for the matrix⋅, i.e., the maximum absolute value among its eigenvalues.

In evolutionary dynamics determined by trait-substitution sequences, i.e., induced by repeated mutant invasions, when the fitness gradients for all residents are sufficiently strong, so that the coexistence of a mutant and its parental resident is impossible for any resident (as explained in Sect. 8), each of the resident phenotypes is not similar to any other. Then, \( \varepsilon \) may be chosen at \( \varepsilon = \varepsilon_{\mu } \) (i.e., \( \rho_{\mu } = 1 \)), so that only the mutant and its parental resident are clustered. This parental resident can be chosen as the approximate phenotype of that cluster, in which case all approximate phenotypes are identical to the resident phenotypes before the mutant invasion. Thus, as long as the initial equilibrium before the invasion is linearly stable, \( \lambda_{\rm max} \) is negative, because of Eq. (4.1a) in conjunction with Eq. (2.3). Hence, for sufficiently small \( \varepsilon_{\mu } \), Eq. (4.8a) is always satisfied, in accordance with the proof by Dercole and Rinaldi (2008).

On the other hand, when the fitness gradients for some residents become small as a consequence of their directional coevolution toward higher fitnesses, effects of the higher-order properties of the fitness function may induce evolutionary branching. During the early stage of evolutionary branching, phenotypic distances among residents branched from the ancestral resident have magnitudes that are comparable with \( \varepsilon_{\mu } \). In this case, clustering only the mutant and its parental resident with \( \varepsilon = \varepsilon_{\mu } \) may provide too small a value of \( |\lambda_{\rm max} | \) to satisfy Eq. (4.8a), while including similar residents in the cluster for an appropriate \( \varepsilon \) larger than \( \varepsilon_{\mu } \) may provide a sufficiently large \( |\lambda_{\rm max} | \) to satisfy Eq. (4.8a).

In Dercole and Rinaldi (2008), only the mutant and its parental resident are clustered together, and the other residents are not clustered. Thus, when the phenotypic distance among some residents is small, say, equal to \( \varepsilon_{\text{resident}} \), the leading eigenvalue of the community’s Jacobian matrix inevitably is close to zero as well. This problem is avoided in their proof by assuming sufficiently small mutational step sizes compared to \( \varepsilon_{\text{resident}} \). However, when we consider a trait substitution sequence under a given magnitude of mutational step sizes, early stages of evolutionary branching inevitably lead to \( \varepsilon_{\text{resident}} \) of the same order of magnitude as the mutational step sizes, no matter what is assumed for the latter. From this perspective, the proof by Dercole and Rinaldi (2008) requires all the residents to be dissimilar.

According to Eq. (4.3b), the leading eigenvalue also becomes close to zero when initial equilibrium densities of some residents are small. Analogously to the above case of similar residents, while this problem is seemingly avoided in Dercole and Rinaldi (2008) by assuming sufficiently small mutational step sizes, it inevitably occurs in trait-substitution sequences in which residents gradually go extinct. Hence, their proof fails to cover all the cases that one may wish to consider (and are actually considered in their book).

5 Approximability condition when the population densities of some approximate phenotypes are small

If an approximate phenotype \( s_{1} \) has a small population density \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{1} = {\text{O}}(\varepsilon ) \) at the initial equilibrium, then the corresponding eigenvalue of \( {\mathbf{J}} = {\text{diag}}({{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}){\mathbf{B}} \) will be close to zero, making it difficult to satisfy the approximability condition in Eq. (4.8a). Even in this case, however, \( \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}} \right| = {\text{O}}(\varepsilon ) \) may hold during the transient following mutant invasion. To cover this situation by developing a refined approximability condition, we examine in this section not only linear terms, but also quadratic terms, of the Taylor expansions investigated in the preceding section.

5.1 Transformation into perturbed community

First, we decompose the function \( {\mathbf{f}} \) in Eq. (4.2),

$$ \frac{{{\text{d}}{\mathbf{m}}}}{{{\text{d}}t}} = {\text{diag(}}{\mathbf{m}} ){\mathbf{f}}({\mathbf{s}}_{\text{a}} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ), $$

into the terms that are linear in \( {\mathbf{m}} \), the terms that are of higher order in \( {\mathbf{m}} \), and the perturbation terms, in a manner similar to Lemma 3 in the previous section. Specifically, we have

Lemma 7

Equation (4.2) can be transformed into

$$ \frac{{{\text{d}}{\mathbf{m}}}}{{{\text{d}}t}} = {\text{diag(}}{\mathbf{m}} )\left[ {{\mathbf{B}}({\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}) + \varepsilon {\mathbf{h}}_{\text{f}} + {\mathbf{r}}_{\text{f}} \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}} \right|^{2} } \right] , $$
(5.1)

where\( {\mathbf{r}}_{\text{f}} = (r_{\text{f1}} , \ldots ,r_{{{\text{f}}M}} )^{\text{T}} \)is a function of\( {\mathbf{m}} \)satisfying\( \left| {{\mathbf{r}}_{\text{f}} } \right| \le C_{{{\mathbf{r}}{\text{f}}}} \)and\( {\mathbf{h}}_{\text{f}} = (h_{\text{f1}} , \ldots ,h_{{{\text{f}}M}} )^{\text{T}} \)is a function of\( {\mathbf{m}}^{{\prime }} \)and\( \varepsilon \)satisfying\( \left| {{\mathbf{h}}_{\text{f}} } \right| \le C_{{{\mathbf{h}}{\text{f}}}} . \)

See Appendix C.1 for the proof.

We now consider situations in which \( L \) population densities, i.e., \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{i} \) for \( i = 1, \ldots ,L \), are large, while the remaining \( K = M - L \) population densities, i.e., \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{i} \) for \( i = L + 1, \ldots ,M \), are small, such that \( \left| {(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{L + 1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{M} )} \right| = \rho_{\text{m}} \varepsilon \) for a positive constant \( \rho_{\text{m}} \). To treat the small population densities differently from the larger ones, we decompose \( {\mathbf{m}} \) into the larger population densities \( {\mathbf{m}}_{\text{x}} = (m_{1} , \ldots ,m_{L} )^{\text{T}} \) and the small population densities \( {\mathbf{m}}_{\text{y}} = (m_{L + 1} , \ldots ,m_{M} )^{\text{T}} \), \( {\mathbf{m}} = ({\mathbf{m}}_{\text{x}} ,{\mathbf{m}}_{\text{y}} )^{\text{T}} \). Then, Eq. (5.1) is split into

$$ \frac{{{\text{d}}{\mathbf{m}}_{\text{x}} }}{{{\text{d}}t}} = {\text{diag(}}{\mathbf{m}}_{\text{x}} )\left[ {{\mathbf{B}}_{\text{xx}} ({\mathbf{m}}_{\text{x}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}_{\text{x}} ) + {\mathbf{B}}_{\text{xy}} ({\mathbf{m}}_{\text{y}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{m}} }}_{\text{y}} ) + \varepsilon {\mathbf{h}}_{\text{fx}} + {\mathbf{r}}_{\text{fx}} \left| {{\mathbf{m}} - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }}} \right|^{2} } \right] $$
(5.2a)

and

$$ \frac{{{\text{d}}{\mathbf{m}}_{\text{y}} }}{{{\text{d}}t}} = {\text{diag(}}{\mathbf{m}}_{\text{y}} )\left[ {{\mathbf{B}}_{\text{yx}} ({\mathbf{m}}_{\text{x}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{x}} ) + {\mathbf{B}}_{\text{yy}} ({\mathbf{m}}_{\text{y}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{y}} ) + \varepsilon {\mathbf{h}}_{\text{fy}} + {\mathbf{r}}_{\text{fy}} \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}} \right|^{2} } \right] , $$
(5.2b)

with

$$ \left( {\begin{array}{*{20}c} {{\mathbf{B}}_{\text{xx}} } & {{\mathbf{B}}_{\text{xy}} } \\ {{\mathbf{B}}_{\text{yx}} } & {{\mathbf{B}}_{\text{yy}} } \\ \end{array} } \right) = {\mathbf{B}},\quad \left( {\begin{array}{*{20}c} {{\mathbf{h}}_{\text{fx}} } \\ {{\mathbf{h}}_{\text{fy}} } \\ \end{array} } \right) = {\mathbf{h}}_{\text{f}} ,\quad \left( {\begin{array}{*{20}c} {{\mathbf{r}}_{\text{fx}} } \\ {{\mathbf{r}}_{\text{fy}} } \\ \end{array} } \right) = {\mathbf{r}}_{\text{f}} . $$
(5.2c)

Around the initial equilibrium \( {\mathbf{\overset{\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }} \), the dynamics of \( {\mathbf{m}}_{\text{y}} \) are much slower than those of \( {\mathbf{m}}_{\text{x}} \). When \( \varepsilon \) goes to zero, the equilibrium population densities \( \left| {(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{L + 1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{M} )} \right| \le \rho_{\text{m}} \varepsilon \) do so as well, i.e.,

$$ {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }} \to (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{L} ,0, \ldots ,0)^{\text{T}} = \left( {\begin{array}{*{20}c} {{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{x}} } \\ {\mathbf{0}} \\ \end{array} } \right) = :{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\tilde{{\mathbf{m}}}} }}, $$
(5.3a)

and the whole dynamics get confined to a center manifold given by \( {\text{d}}{\mathbf{m}}_{\text{x}} / {\text{d}}t = 0 \). The slow dynamics close to the center manifold are governed by \( {\text{d}}{\mathbf{m}}_{\text{y}} / {\text{d}}t \). An approximation for this center manifold can be derived by setting \( {\text{d}}{\mathbf{m}}_{\text{x}} / {\text{d}}t = 0 \) with \( \varepsilon {\mathbf{h}}_{\text{fx}} + {\mathbf{r}}_{\text{fx}} \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}} \right|^{2} = 0 \), yielding

$$ {\mathbf{m}}_{\text{x}} = {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{x}} - {\mathbf{B}}_{\text{xx}}^{ - 1} {\mathbf{B}}_{\text{xy}} {\mathbf{m}}_{\text{y}} = :{\tilde{\mathbf{m}}}_{\text{x}} ({\mathbf{m}}_{\text{y}} ) , $$
(5.3b)

which passes through the fixed point \( {\tilde{\mathbf{m}}}_{\text{x}} ({\mathbf{0}}) = :{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\tilde{{\mathbf{m}}}} }}. \)

Although for \( \varepsilon > 0 \), \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }} = ({{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{x}} ,{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{y}} )^{\text{T}} \) will deviate from \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\tilde{{\mathbf{m}}}} }} = ({{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{x}} ,{\mathbf{0}})^{\text{T}} \), it is expected that for small \( \varepsilon \) the dynamics can still be effectively characterized by their projection onto the center manifold \( {\mathbf{m}}_{\text{x}} = {\tilde{\mathbf{m}}}_{\text{x}} ({\mathbf{m}}_{\text{y}} ) \). Thus, we transform Eq. (5.2a) into

$$ \frac{{{\text{d}}{\mathbf{x}}}}{{{\text{d}}t}} = {\mathbf{A}}_{\text{x}} {\mathbf{x}} + \varepsilon {\tilde{\mathbf{h}}}_{\text{x}} + {\tilde{\mathbf{r}}}_{\text{x}} \left| {\mathbf{w}} \right|^{2} , $$
(5.4a)
$$ \frac{{{\text{d}}{\mathbf{y}}}}{{{\text{d}}t}}{\text{ = diag(}}{\mathbf{y}} )\left[ {{\mathbf{J}}_{\text{y}} {\mathbf{y}} + {\mathbf{Ux}} + \varepsilon {\tilde{\mathbf{h}}}_{\text{y}} + {\tilde{\mathbf{r}}}_{\text{y}} \left| {\mathbf{w}} \right|^{2} } \right], $$
(5.4b)

with \( \left| {{\tilde{\mathbf{h}}}_{\text{x}} } \right| \le \tilde{C}_{{{\mathbf{h}}{\text{x}}}} \), \( \left| {{\tilde{\mathbf{h}}}_{\text{y}} } \right| \le \tilde{C}_{{{\mathbf{h}}{\text{y}}}} \), \( \left| {{\tilde{\mathbf{r}}}_{\text{x}} } \right| \le \tilde{C}_{{{\mathbf{r}}{\text{x}}}} \), and \( \left| {{\tilde{\mathbf{r}}}_{\text{y}} } \right| \le \tilde{C}_{{{\mathbf{r}}{\text{y}}}} \), where \( {\mathbf{x}}: = {\mathbf{P}}({\mathbf{m}}_{\text{x}} - {\tilde{\mathbf{m}}}_{\text{x}} ({\mathbf{m}}_{\text{y}} )) \) describes the convergence to, or deviation from, the center manifold \( {\mathbf{m}}_{\text{x}} = {\tilde{\mathbf{m}}}_{\text{x}} ({\mathbf{m}}_{\text{y}} ) \), and \( {\mathbf{y}}: = {\mathbf{m}}_{\text{y}} \) describes the slow dynamics along the manifold. The other variables and parameters newly introduced in Eqs. (5.4a) and (5.4b) are given by

$$ \begin{aligned} {\mathbf{w}} & : = \left( {\begin{array}{*{20}c} {\mathbf{x}} \\ {\mathbf{y}} \\ \end{array} } \right), \\ {\mathbf{A}}_{\text{x}} & : = {\mathbf{PJ}}_{\text{x}} {\mathbf{P}}^{ - 1} = {\mathbf{P}}{\text{diag(}}{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{x}} ){\mathbf{B}}_{\text{xx}} {\mathbf{P}}^{ - 1} , \\ {\mathbf{U}} & : = {\mathbf{B}}_{\text{yx}} {\mathbf{P}}^{ - 1} , \\ {\mathbf{J}}_{\text{y}} & : = {\mathbf{B}}_{\text{yy}} - {\mathbf{B}}_{\text{yx}} {\mathbf{B}}_{\text{xx}}^{ - 1} {\mathbf{B}}_{\text{xy}} , \\ \end{aligned} $$
(5.4c)

(Appendix F.1F.4), where \( {\mathbf{P}} \) is chosen so that Eq. (4.5a) is satisfied for \( {\mathbf{A}} = {\mathbf{A}}_{\text{x}} \) (Appendix E). Notice that \( {\mathbf{y}} \) must be non-negative while \( {\mathbf{x}} \) is indeterminate. Here, the effect of \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\tilde{{\mathbf{m}}}} }} \) is subsumed into the perturbation terms \( {\tilde{\mathbf{h}}}_{\text{x}} \) and \( {\tilde{\mathbf{h}}}_{\text{y}} \). Neglecting them gives a fixed point \( {\mathbf{w}} = \left( {\begin{array}{*{20}c} {\mathbf{x}} \\ {\mathbf{y}} \\ \end{array} } \right) = {\mathbf{0}} \) that corresponds to \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\tilde{{\mathbf{m}}}} }} = ({{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}_{\text{x}} ,{\mathbf{0}})^{\text{T}} \), which is slightly different from the initial equilibrium \( {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }} = ({\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }}_{\text{x}} ,{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }}_{\text{y}} )^{\text{T}} \) when \( \varepsilon > 0 \). In the next subsections, we analyze the magnitude of \( \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\tilde{\mathbf{m}}} }}} \right| \) in Eq. (5.4a) during the transient following mutant invasion, to obtain the magnitude of \( \left| {{\mathbf{m}} - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }}} \right| = \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\tilde{\mathbf{m}}} }}} \right| + O(\varepsilon ) \).

5.2 Local Lyapunov function

Following Mazenc (2001), we construct a local Lyapunov function to examine the magnitude of \( \left| {\mathbf{w}} \right| \) during the transient following mutant invasion. We have

Lemma 8

In Eq. (5.4a) with\( \varepsilon = 0 \), if the eigenvalues\( \tilde{\lambda }_{1} , \ldots ,\tilde{\lambda }_{M} \)of the symmetric part of

$$ \,{\tilde{\mathbf{A}}} = \left( {\begin{array}{ll} {{\mathbf{A}}_{\text{x}} } \hfill &\quad 0 \hfill \\ {d{\mathbf{U}}} \hfill &\quad {d{\mathbf{J}}_{\text{y}} } \hfill \\ \end{array} } \right), $$
(5.5a)

with\( d \)being a positive constant, satisfy

$$ \tilde{\lambda }_{\rm max} = \hbox{max} \left\{ {\tilde{\lambda }_{1} , \ldots ,\tilde{\lambda }_{M} } \right\} < 0 $$
(5.5b)

and\( \left| {\mathbf{w}} \right| < \phi \), with\( \phi \)being a sufficiently small constant, then

$$ \begin{aligned} V & = \sum\limits_{i = 1}^{L} {x_{i}^{2} } + 2d\sum\limits_{i = L + 1}^{M} {y_{i} } \\ & = {\mathbf{x}}^{\text{T}} {\mathbf{x}} + 2d{\mathbf{c}}^{\text{T}} {\mathbf{y}} \\ \end{aligned} $$
(5.5c)

with\( {\mathbf{c}} = (1, \ldots ,1)^{\text{T}} \)is a local Lyapunov function.

Proof

We assume that \( \tilde{\lambda }_{\rm max} < 0 \). The time derivative of \( V \) is

$$ \begin{aligned} \frac{{{\text{d}}V}}{{{\text{d}}t}} & = 2{\mathbf{x}}^{\text{T}} \frac{{{\text{d}}{\mathbf{x}}}}{{{\text{d}}t}} + 2d{\mathbf{c}}^{\text{T}} \frac{{{\text{d}}{\mathbf{y}}}}{{{\text{d}}t}} \\ & = 2{\mathbf{x}}^{\text{T}} \left[ {{\mathbf{A}}_{\text{x}} {\mathbf{x}} + \varepsilon {\tilde{\mathbf{h}}}_{\text{x}} + {\tilde{\mathbf{r}}}_{\text{x}} \left| {\mathbf{w}} \right|^{2} } \right] + 2d{\mathbf{c}}^{\text{T}} {\text{diag(}}{\mathbf{y}} )\left[ {{\mathbf{J}}_{\text{y}} {\mathbf{y}} + {\mathbf{Ux}} + \varepsilon {\tilde{\mathbf{h}}}_{\text{y}} + {\tilde{\mathbf{r}}}_{\text{y}} \left| {\mathbf{w}} \right|^{2} } \right] \\ & = 2{\mathbf{x}}^{\text{T}} \left[ {{\mathbf{A}}_{\text{x}} {\mathbf{x}} + \varepsilon {\tilde{\mathbf{h}}}_{\text{x}} + {\tilde{\mathbf{r}}}_{\text{x}} \left| {\mathbf{w}} \right|^{2} } \right] + 2d{\mathbf{y}}^{\text{T}} \left[ {{\mathbf{J}}_{\text{y}} {\mathbf{y}} + {\mathbf{Ux}} + \varepsilon {\tilde{\mathbf{h}}}_{\text{y}} + {\tilde{\mathbf{r}}}_{\text{y}} \left| {\mathbf{w}} \right|^{2} } \right] \\ & = 2\left( {\begin{array}{*{20}c} {\mathbf{x}} \\ {\mathbf{y}} \\ \end{array} } \right)^{\text{T}} \left( {\begin{array}{*{20}c} {{\mathbf{A}}_{\text{x}} } & 0 \\ {d{\mathbf{U}}} & {d{\mathbf{J}}_{\text{y}} } \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\mathbf{x}} \\ {\mathbf{y}} \\ \end{array} } \right) + 2\left( {\begin{array}{*{20}c} {\mathbf{x}} \\ {\mathbf{y}} \\ \end{array} } \right)^{\text{T}} \left( {\begin{array}{*{20}c} {{\tilde{\mathbf{r}}}_{\text{x}} } \\ {d{\tilde{\mathbf{r}}}_{\text{y}} } \\ \end{array} } \right)\left| {\mathbf{w}} \right|^{2} + 2\varepsilon \left( {\begin{array}{*{20}c} {\mathbf{x}} \\ {\mathbf{y}} \\ \end{array} } \right)^{\text{T}} \left( {\begin{array}{*{20}c} {{\tilde{\mathbf{h}}}_{\text{x}} } \\ {d{\tilde{\mathbf{h}}}_{\text{y}} } \\ \end{array} } \right) \\ & = 2{\mathbf{w}}^{\text{T}} {\tilde{\mathbf{A}}\mathbf{w}} + 2{\mathbf{w}}^{\text{T}} {\tilde{\mathbf{r}}}\left| {\mathbf{w}} \right|^{2} + 2\varepsilon {\mathbf{w}}^{\text{T}} {\tilde{\mathbf{h}}}, \\ \end{aligned} $$
(5.6a)

with

$$ {\tilde{\mathbf{r}}} = \left( {\begin{array}{*{20}c} {{\mathbf{r}}_{\text{x}} } \\ {d{\mathbf{r}}_{\text{y}} } \\ \end{array} } \right),\quad {\tilde{\mathbf{h}}} = \left( {\begin{array}{*{20}c} {{\mathbf{h}}_{\text{x}} } \\ {d{\mathbf{h}}_{\text{y}} } \\ \end{array} } \right). $$
(5.6b)

Notice that the last line of Eq. (5.6a) has a form identical to the second line of Eq. (4.7a). Although in this case the initial state \( {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{w} }}: = (0, \ldots ,0,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{L + 1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{M} )^{\text{T}} \) is not zero, \( \left| {{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{w} }}} \right| = \left| {(0, \ldots ,0,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{L + 1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{M} )} \right| = \varepsilon \rho_{\text{m}} \) holds. Therefore, Eq. (5.6a) can analogously be transformed further,

$$ \begin{aligned} \frac{{{\text{d}}V}}{{{\text{d}}t}} & \le 2\tilde{\lambda }_{\rm max} \left| {\mathbf{w}} \right|^{2} + 2\tilde{C}_{{\mathbf{r}}} \left| {\mathbf{w}} \right|^{3} + 2\varepsilon \tilde{C}_{{\mathbf{h}}} \left| {\mathbf{w}} \right| \\ & = 2\tilde{C}_{{\mathbf{r}}} \left| {\mathbf{w}} \right|^{2} \left( {\left| {\mathbf{w}} \right| - \frac{{\left| {\tilde{\lambda }_{\rm max} } \right|}}{{2\tilde{C}_{{\mathbf{r}}} }}} \right) + \tilde{\lambda }_{\rm max} \left| {\mathbf{w}} \right|\left( {\left| {\mathbf{w}} \right| - \frac{{2\varepsilon \tilde{C}_{{\mathbf{h}}} }}{{\left| {\tilde{\lambda }_{\rm max} } \right|}}} \right) \\ & \le 2\tilde{C}_{{\mathbf{r}}} \left| {\mathbf{w}} \right|^{2} \left( {\left| {\mathbf{w}} \right| - \frac{{\left| {\tilde{\lambda }_{\rm max} } \right|}}{{2\tilde{C}_{{\mathbf{r}}} }}} \right) + \tilde{\lambda }_{\rm max} \left| {\mathbf{w}} \right|\left( {\left| {\mathbf{w}} \right| - \hbox{max} \left\{ {\frac{{2\varepsilon \tilde{C}_{{\mathbf{h}}} }}{{\left| {\tilde{\lambda }_{\rm max} } \right|}},\varepsilon \rho_{\text{m}} } \right\}} \right) \\ & = 2\tilde{C}_{{\mathbf{r}}} \left| {\mathbf{w}} \right|^{2} \left( {\left| {\mathbf{w}} \right| - \tilde{\phi }_{\text{r}} } \right) + \tilde{\lambda }_{\rm max} \left| {\mathbf{w}} \right|\left( {\left| {\mathbf{w}} \right| - \tilde{\phi }_{\text{h}} } \right), \\ \end{aligned} $$
(5.6c)

where \( \tilde{C}_{{\mathbf{h}}} \ge \left| {{\tilde{\mathbf{h}}}} \right| \), \( \tilde{C}_{{\mathbf{r}}} \ge \left| {{\tilde{\mathbf{r}}}} \right| \) (Appendix F.5) and

$$ \begin{aligned} \tilde{\phi }_{\text{h}} & = \hbox{max} \left\{ {\frac{{2\varepsilon \tilde{C}_{{\mathbf{h}}} }}{{\left| {\tilde{\lambda }_{\rm max} } \right|}},\varepsilon \rho_{\text{m}} } \right\}, \\ \tilde{\phi }_{\text{r}} & = \frac{{\left| {\tilde{\lambda }_{\rm max} } \right|}}{{2\tilde{C}_{{\mathbf{r}}} }}. \\ \end{aligned} $$
(5.6d)

In Eq. (5.6c), the transformation from the second to the third row allows the initial state \( {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{w} }} \) to satisfy \( \left| {{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{w} }}} \right| = \varepsilon \rho_{\text{m}} \le \tilde{\phi }_{\text{h}} \), which is used for the case of \( \varepsilon > 0 \) in Lemma 10. Therefore, for \( \varepsilon = 0 \) (i.e., \( \tilde{\phi }_{\text{h}} = 0 \)), \( V \) satisfies \( V = 0 \) for \( {\mathbf{w}} = {\mathbf{0}} \) and \( {\text{d}}V/{\text{d}}t < 0 \) for \( \left| {\mathbf{w}} \right| < \tilde{\phi }_{\text{r}} \). □

In addition, the following lemma is proved in Appendix G.

Lemma 9

Equation (5.5b), i.e.,

$$ \tilde{\lambda }_{\rm max} < 0, $$

holds, if the real parts of the eigenvalues of\( {\mathbf{J}}_{\text{x}} = {\text{diag(}}{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }}_{\text{x}} ){\mathbf{B}}_{\text{xx}} \)and the real eigenvalues of\( \tfrac{1}{2}\left( {{\mathbf{J}}_{\text{y}} + {\mathbf{J}}_{\text{y}}^{\text{T}} } \right) \)with\( {\mathbf{J}}_{\text{y}} = {\mathbf{B}}_{\text{yy}} - {\mathbf{B}}_{\text{yx}} {\mathbf{B}}_{\text{xx}}^{ - 1} {\mathbf{B}}_{\text{xy}} \)are all negative, and if\( d \)is sufficiently small.

5.3 Stability under perturbation

Next, we take the perturbation into account, i.e., we consider \( \varepsilon > 0 \). In the previous section, the contour curves of the local Lyapunov function have the same shapes as the boundaries of the region ensuring that \( {\text{d}}V / {\text{d}}t < 0 \), i.e., as the two circles \( \left| {\mathbf{w}} \right| = \phi_{\text{h}} \) and \( \left| {\mathbf{w}} \right| = \phi_{\text{r}} \). In this section, although the contours have shapes different from the circles \( \left| {\mathbf{w}} \right| = \tilde{\phi }_{\text{h}} \) and \( \left| {\mathbf{w}} \right| = \tilde{\phi }_{\text{r}} \), the manner of analysis is the same. First, as the initial state \( {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{w}} }} \) satisfies \( \left| {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mathbf{w} }}} \right| = \varepsilon \rho_{\text{m}} \le \tilde{\phi }_{\text{h}} \) according to Eq. (5.6d), we trivially have

Lemma 10

We assume that\( \varepsilon \)is sufficiently small so that\( \tilde{\phi }_{\text{h}} < \tilde{\phi }_{\text{r}} \). For a region\( D = \left\{ {\left. {\mathbf{w}} \right|\,\,\,\tilde{\phi }_{\text{h}} < \left| {\mathbf{w}} \right| < \tilde{\phi }_{\text{r}} } \right\} \), within which\( {\text{d}}V/{\text{d}}t < 0 \)with\( V \)defined in Eq. (5.5c), consider a contour curve\( V = V_{0} \)such that its inscribed circle is given by\( \left| {\mathbf{w}} \right| = \tilde{\phi }_{\text{h}} \)and its circumscribed circle is given by\( \left| {\mathbf{w}} \right| = \alpha \tilde{\phi }_{\text{h}} \)with\( \alpha > 1 \) (Fig. 3). If

$$ \alpha \tilde{\phi }_{\text{h}} < \tilde{\phi }_{\text{r}} , $$
(5.7a)

then there exists a set\( E = \left\{ {\left. {\mathbf{w}} \right|\,\,\,V_{0} < V < V_{0} + \delta } \right\} \)with a sufficiently small\( \delta \)where\( {\text{d}}V/{\text{d}}t < 0 \)and which therefore ensures that

$$ \left| {\mathbf{w}} \right| \le \alpha \tilde{\phi }_{\text{h}} $$
(5.7b)

holds during the transient following mutant invasion.

Fig. 3
figure 3

Stability against perturbation when the initial equilibrium population densities of some approximate phenotypes are small. As a simple example, a community composed of two approximate phenotypes \( {\mathbf{s}}_{\text{a}} = (s_{1} ,s_{2} )^{\text{T}} \) is considered. Their population densities \( {\mathbf{m}} = (m_{1} ,m_{2} )^{\text{T}} \) are transformed into \( {\mathbf{w}} = (x,y)^{\text{T}} \) so that \( {\mathbf{w}} = {\mathbf{0}} \) corresponds to the initial equilibrium before mutant invasion, where \( y \) corresponds to the population density of the phenotype with small population density. A local Lyapunov function \( V = x^{2} + dy \) (with \( d > 0 \)) of the community dynamics monotonically decreases with time within the light-gray and dark-gray regions marked by D. The dark-gray region marked by E is associated with a repeller that prevents the community dynamics from passing a boundary \( V = V_{0} \) from its inner side \( V < V_{0} \), thus keeping \( \left| {\mathbf{w}} \right| \le \alpha \tilde{\phi }_{\text{h}} \)

This lemma is an extension of Lemma 5 relaxing the requirement that the contours be circular: Lemma 10 with \( \alpha = 1 \) corresponds to Lemma 5.

Then, by substituting Eqs. (5.6d) into (5.7a), we have

Lemma 11

For some \( d \) , if

$$ \sqrt \varepsilon < \frac{{ - \tilde{\lambda }_{\rm max} }}{{2\sqrt {\alpha \tilde{C}_{{\mathbf{r}}} \hbox{max} \left\{ {\tilde{C}_{{\mathbf{h}}} ,\tfrac{1}{2}\left| {\tilde{\lambda }_{\rm max} } \right|\rho_{\text{m}} } \right\}} }} , $$
(5.8a)

then

$$ \left| {\mathbf{w}} \right| \le - \frac{2\varepsilon \alpha }{{\left| {\tilde{\lambda }_{\rm max} } \right|}}\hbox{max} \left\{ {\tilde{C}_{{\mathbf{h}}} ,\,\tfrac{1}{2}\,\left| {\tilde{\lambda }_{\rm max} } \right|\rho_{\text{m}} } \right\} $$
(5.8b)

holds during the transient following mutant invasion, where

$$ \alpha = \left\{ {\begin{array}{ll} 1 \hfill &\quad {{\text{for}}\;\;K = 0} \hfill \\ {\hbox{max} \left\{ {\frac{{\sqrt {\tilde{\phi }_{\text{h}}^{ 2} + Kd^{2} } }}{{\tilde{\phi }_{\text{h}} }},\frac{{\tilde{\phi }_{\text{h}}^{ 2} + Kd^{2} }}{{2d\tilde{\phi }_{\text{h}} }}} \right\}} \hfill &\quad {{\text{for}}\;\;0 < K \le \frac{{\tilde{\phi }_{\text{h}}^{ 2} }}{{d^{2} }}} \hfill \\ {\hbox{max} \left\{ {\frac{{\sqrt {2dK^{1/2} \tilde{\phi }_{\text{h}} } }}{{\tilde{\phi }_{\text{h}} }},\sqrt K } \right\}} \hfill &\quad {{\text{for}}\;\;K > \frac{{\tilde{\phi }_{\text{h}}^{ 2} }}{{d^{2} }}} \hfill \\ \end{array} } \right. , $$
(5.8c)

and \( K = M - L \) is the number of approximate phenotypes with small equilibrium population densities.

See Appendix H for the derivation of the expressions for \( \alpha \). When \( K = 0 \), this lemma is independent of \( d \) and becomes identical to Lemma 6 in the previous section, i.e., \( \tilde{\lambda }_{\rm max} = \lambda_{\rm max} \), \( \tilde{C}_{{\mathbf{r}}} = C_{{\mathbf{r}}} \), \( \tilde{C}_{{\mathbf{h}}} = C_{{\mathbf{h}}} \), and \( \alpha = 1 \). Thus, this lemma includes Lemma 6 as a special case. When \( K > 0 \), on the other hand, \( \tilde{\lambda }_{\rm max} \), \( \alpha \), \( \tilde{C}_{\text{h}} \), and \( \tilde{C}_{\text{r}} \) in Eq. (5.8a) all depend on \( d \). A choice for \( d \) may be one that maximizes the right-hand side of Eq. (5.8a).

By translating Lemma 11 into a corresponding statement for \( {\mathbf{m}} \), we get

Theorem 3

For the population densities\( {\mathbf{m}} = (m_{1} , \ldots ,m_{M} )^{\text{T}} \)of approximate phenotypes\( {\mathbf{s}}_{\text{a}} = (s_{1} , \ldots ,s_{M} )^{\text{T}} \)formed by clustering resident phenotypes\( {\mathbf{s}}^{{\prime }} = (s_{1} , \ldots ,s_{N + 1} )^{\text{T}} \)according to a threshold phenotypic distance\( \varepsilon = \rho_{\mu } \varepsilon_{\mu } \), if the approximability condition, Eq. (5.8a),

$$ \sqrt \varepsilon < \frac{{ - \tilde{\lambda }_{\rm max} }}{{2\sqrt {\alpha \tilde{C}_{\text{r}} \hbox{max} \left\{ {\tilde{C}_{{\mathbf{h}}} ,\,\tfrac{1}{2}\,\left| {\tilde{\lambda }_{\rm max} } \right|\rho_{\text{m}} } \right\}} }} $$

is satisfied for some\( d \), then

$$ \begin{aligned} \left| {{\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}} \right| & = \left| {{\mathbf{Q}}^{ - 1} {\mathbf{w}} + {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\tilde{{\mathbf{m}}}} }}} \right| \le C_{{\mathbf{m}}} \varepsilon , \\ C_{{\mathbf{m}}} & = \frac{{2\alpha \left\| {{\mathbf{Q}}^{ - 1} } \right\|\hbox{max} \left\{ {\tilde{C}_{{\mathbf{h}}} ,\, - \tfrac{1}{2}\,\tilde{\lambda }_{\rm max} \rho_{\text{m}} } \right\}}}{{\left| {\tilde{\lambda }_{\rm max} } \right|}} + \rho_{\text{m}} , \\ \end{aligned} $$
(5.9a)

holds during the transient following mutant invasion, where\( {\mathbf{m}} \)is split into an\( L \)-dimensional vector\( {\mathbf{m}}_{\text{x}} = (m_{1} , \ldots ,m_{L} )^{\text{T}} \)of not-small initial population densities, and a\( K( = M - L) \)-dimensional vector\( {\mathbf{m}}_{\text{y}} = (m_{L + 1} , \ldots ,m_{M} )^{\text{T}} \)of small initial population densities\( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{L + 1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{M} \)such that\( \left| {\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{L + 1} , \ldots ,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m_{M}} } \right)} \right| = \varepsilon \rho_{\text{m}} \)(\( \rho_{\text{m}} = 0 \)when all initial population densities are not small, i.e.,\( K = 0 \)), and\( \tilde{\lambda }_{1} , \ldots ,\tilde{\lambda }_{M} \)are the eigenvalues of\( \tfrac{1}{2}\left( {{\tilde{\mathbf{A}}} + {\tilde{\mathbf{A}}}^{\text{T}} } \right) \)with

$$ \begin{aligned} & {\tilde{\mathbf{A}}} = \left( {\begin{array}{cc} {{\mathbf{A}}_{\text{x}} } &\quad 0 \\ {d{\mathbf{B}}_{\text{yx}} {\mathbf{P}}^{ - 1} } &\quad {d{\mathbf{J}}_{\text{y}} } \\ \end{array} } \right), \\ & {\mathbf{A}}_{\text{x}} = {\mathbf{P}}^{ - 1} {\text{diag(}}{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}_{\text{x}}} }} ){\mathbf{B}}_{\text{xx}} {\mathbf{P}}, \\ & {\mathbf{J}}_{\text{y}} = {\mathbf{B}}_{\text{yy}} - {\mathbf{B}}_{\text{yx}} {\mathbf{B}}_{\text{xx}}^{ - 1} {\mathbf{B}}_{\text{xy}} , \\ & {\mathbf{Q}} = \left( {\begin{array}{cc} {\mathbf{P}} & \quad{{\mathbf{PB}}_{\text{xx}}^{ - 1} {\mathbf{B}}_{\text{xy}} } \\ 0 &\quad {{\mathbf{I}}_{\text{y}} } \\ \end{array} } \right), \\ & \left( {\begin{array}{cc} {{\mathbf{B}}_{\text{xx}} } &\quad {{\mathbf{B}}_{\text{xy}} } \\ {{\mathbf{B}}_{\text{yx}} } & \quad{{\mathbf{B}}_{\text{yy}} } \\ \end{array} } \right) = {\mathbf{B}}, \\ \end{aligned} $$
(5.9b)

where\( {\mathbf{I}}_{\text{y}} \)denotes the\( K \times K \)identity matrix,\( {\mathbf{B}} \)is defined in Eq. (4.3b), and\( {\mathbf{P}} \)is defined in Eq. (E.24) in AppendixE.

Note that it is arbitrary which phenotypes we choose as having small initial population densities. Thus, whether each approximate phenotype’s initial population density is small or not can be decided in such a manner that satisfying the approximability condition becomes easiest. If none of the approximate population densities is treated as small, i.e., \( K = 0 \), Theorem 3 becomes identical to Theorem 2. Thus, Theorem 3 includes Theorem 2 as a special case. The threshold phenotypic distance \( \varepsilon \) and the way of clustering can also be chosen arbitrarily, so that satisfying the approximability condition becomes easiest, as long as \( \varepsilon > \varepsilon_{\mu } \). Notice that \( \tilde{C}_{{\mathbf{h}}} \) depends on \( \varepsilon \), although it is bounded when \( \varepsilon \to 0 \). In addition, all of the other mathematical objects in Eq. (5.8a) depend indirectly on \( \varepsilon \), because \( \varepsilon \) affects how to cluster the existing phenotypes. A procedure for the evaluation of Eq. (5.8a) would be as follows: first choose \( \varepsilon \), and then choose approximate phenotypes, choose \( K \), choose \( d \) when \( K > 0 \) (so that the right-hand side of Eq. (5.8a) is maximized), and examine whether the inequality holds good.

Moreover, the initial state \( {\mathbf{\overset{\hbox{$\smash{\scriptscriptstyle\frown}$}}{n^{{\prime }}} }} \) need not be exactly at an equilibrium of the resident phenotypes \( s_{1} , \ldots ,s_{N} \) if the value of \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{C_{\text{Fz}}^{{\prime }}} \) is adjusted such that \( F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{n}}^{{\prime }}} }} ) \le \varepsilon \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{C_{\text{Fz}}^{{\prime }} }\) is satisfied for all \( i = 1, \ldots ,N + 1 \). Therefore, this theorem can be applied also to the case of higher mutation rates, in which frequent mutant invasions prevent the community from reaching the next population-dynamical equilibrium.

Note also that the smallness of changes of the population densities of the approximate phenotypes ensures not only the smallness of fitness changes of existing phenotypes \( F(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \) for \( i = 1, \ldots ,N + 1 \), i.e., LV-approximability, but also the smallness of fitness changes of any non-existing phenotype \( z \), i.e., of the fitness landscape \( F(z;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \). Specifically, from Theorem 3 we immediately see

Corollary 1

If the approximability condition, Eq. (5.8a), in Theorem3is satisfied, then the change of the fitness landscape\( F(z;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \)is slight during the transient following mutant invasion, because by using Taylor’s theorem we see that

$$ \begin{aligned} & \left| {F(z;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) - F(z;{\mathbf{s}}^{{\prime }} ;{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n^{{\prime }}} }} )} \right| \\ & \quad = \left| {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (z;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} ) - \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (z;{\mathbf{s}}^{{\prime }} ;{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m^{{\prime }}} }} )} \right| \\ & \quad = \left| {\left[ {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (z;{\mathbf{s}}^{{\prime }} ;{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m^{{\prime }}} }} ) + \left. {\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (z;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial {\mathbf{m}}^{{\prime }} }}} \right|_{{{\mathbf{m}}^{{\prime }} = {\mathbf{m}}_{\text{T}}^{{\prime }} }} ({\mathbf{m}}^{{\prime }} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}^{{\prime }}} }} )} \right] - \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (z;{\mathbf{s}}^{{\prime }} ;{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m^{{\prime }}} }} )} \right|\, \\ & \quad = \left| {\left. {\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (z;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial {\mathbf{m}}^{{\prime }} }}} \right|_{{{\mathbf{m}}^{{\prime }} = {\mathbf{m}}_{\text{T}}^{{\prime }} }} ({\mathbf{m}}^{{\prime }} - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m^{{\prime }}} }} )} \right| \\ & \quad \le \mathop {\hbox{max} }\limits_{{\left| {{\mathbf{m}}^{{\prime }} - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m^{{\prime }}} }} } \right| < C_{{\mathbf{m}}}^{{\prime }} \varepsilon }} {\left\| {\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (z;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial {\mathbf{m}}^{{\prime }} }}} \right\|} C_{{\mathbf{m}}}^{{\prime }} \varepsilon , \\ \end{aligned} $$
(5.10a)

where\( {\mathbf{m}}_{\text{T}} : = \theta_{\text{T}} ({\mathbf{m}} - {{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{{\mathbf{m}}} }}) + {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }} \)with an appropriately chosen\( \theta_{\text{T}} \in [0,1] \), and by using\( C_{{\mathbf{m}}} \)from Eq. (5.9a) in\( C^{\prime}_{{\mathbf{m}}} = \sqrt {C_{\text{m}}^{ 2} + (N + 1 - M)\eta^{2} } \)from Lemma 2, we see that

$$ C_{{\mathbf{m}}}^{{\prime }} = \sqrt {\left( {\frac{{2\alpha \left\| {{\mathbf{Q}}^{ - 1} } \right\|\hbox{max} \left\{ {\tilde{C}_{{\mathbf{h}}} , - \tfrac{1}{2}\tilde{\lambda }_{\rm max} \rho_{\text{m}} } \right\}}}{{\left| {\tilde{\lambda }_{\rm max} } \right|}} + \rho_{\text{m}} } \right)^{2} + (N + 1 - M)\eta^{2} } . $$
(5.10b)

5.4 Generalization to higher-dimensional trait spaces

Theorems 13 and Corollary 1 apply to one-dimensional trait spaces. These results readily generalized to trait spaces of arbitrary dimensions by a slight modification of the analyses so that the derivative of the fitness function with respect to a phenotype in the one-dimensional trait space is replaced with the corresponding directional derivative in the higher-dimensional trait space, as shown in Appendix I.

6 Tighter estimates

For deriving the approximability conditions in Eqs. (4.7a) and (5.8a), we have used the maximum possible values of the perturbation terms (\( {\mathbf{h}} \) and \( {\tilde{\mathbf{h}}} \)) and nonlinear terms (\( {\mathbf{r}} \) and \( {\tilde{\mathbf{r}}} \)) attainable for \( {\mathbf{n}}^{{\prime }} \in [0,\eta ]^{N + 1} \). These provide the simplest, but rather conservative, approximability conditions. By approximating those terms as linear or higher-order functions of population densities (corresponding to \( {\mathbf{x}} \) and \( {\mathbf{w}} \)), we may improve the estimates underlying the approximability conditions. In Theorem 2, for example, the perturbation term \( {\mathbf{h}} \) can be expanded in \( {\mathbf{w}} \) around \( {\mathbf{w}} = {\mathbf{0}} \) (i.e., \( {\mathbf{m}} = {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }} \)) up to the first-order remainder terms,

$$ \begin{aligned} {\mathbf{h}} & = {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{h} }} + {\mathbf{Hx}}, \\ {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{h} }} & = \left. {\mathbf{h}} \right|_{{{\mathbf{m}} = {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m^{\prime } } }}}} , \\ {\mathbf{H}} & = \left. {\frac{{\partial {\mathbf{h}}}}{{\partial {\mathbf{x}}}}} \right|_{{{\mathbf{m}} = {\mathbf{m}}_{{\text{T}}}^{\prime } }} \\ \end{aligned}, $$
(6.1)

with some appropriately chosen \( {\mathbf{m}}_{\text{T}} \in [0,\eta ]^{M} \). Then, applying Eqs. (6.1) to Lemma 6 gives the condition

$$ \sqrt \varepsilon < \frac{{ - \lambda_{\rm max} }}{{\sqrt {4\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{C}_{{\mathbf{h}}} C_{{\mathbf{r}}} + 2C_{{\mathbf{H}}} \big| {\lambda_{\rm max} } \big|} }}, $$
(6.2)

where \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{C}_{{\mathbf{h}}} \ge \left| {{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{h} }}} \right| \) and \( C_{{\mathbf{H}}} \ge \left\| {\mathbf{H}} \right\| \); see Appendix J for the derivation. As the magnitude of the zeroth-order term for \( {\mathbf{h}} \) is estimated more tightly by \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{C}_{{\mathbf{h}}} \) at \( {\mathbf{m}} = {\mathbf{\overset{\hbox{$\smash{\scriptscriptstyle\frown}$}}{m} }} \), compared to \( C_{{\mathbf{h}}} \) used in the original approximability condition in Eq. (4.8a), this condition can work better than the original approximability condition, but is less simple.

7 Example: Approximability condition for a resource-competition model

In this section, we give a simple example of how to examine the approximability condition in a specific ecological model.

7.1 Model description

We consider a resource-competition model based on the Beddington–DeAngelis-type functional response (Beddington 1975; DeAngelis et al. 1975), known to describe both saturation of consumption and interference competition among consumers. Under \( N \) coexisting consumer phenotypes \( {\mathbf{s}} = (s_{1} , \ldots ,s_{N} )^{\text{T}} \) with their densities \( {\mathbf{n}} = (n_{1} , \ldots ,n_{N} )^{\text{T}} \), we describe the \( i \)th phenotype’s per-capita growth rate as

$$ \frac{1}{{n_{i} }}\frac{{{\text{d}}n_{i} }}{{{\text{d}}t}} = \beta g(s_{i} ;{\mathbf{s}};{\mathbf{n}}) - \psi , $$
(7.1)
$$ g(s_{i} ;{\mathbf{s}};{\mathbf{n}}) = \frac{{\theta (s_{i} )}}{{\zeta_{1} + \zeta_{2} \theta (s_{i} ) + \zeta_{3} \sum\nolimits_{j = 1}^{N} {n_{j} \alpha (s_{j} ,s_{i} )} }} , $$
(7.2)

In Eq. (7.1), \( g(s_{i} ;{\mathbf{s}};{\mathbf{n}}) \) is the resource gain of phenotype \( s_{i} \), \( \beta \) is a constant assimilation efficiency, and \( \psi \) is a constant natural death rate. In Eq. (7.2), \( \theta (s_{i} ) \) is the density of potential resources for \( s_{i} \), and \( \alpha (s_{j} ,s_{i} ) \) describes the niche overlap between phenotypes \( s_{i} \) and \( s_{j} \). \( \zeta_{1} \), \( \zeta_{2} \), and \( \zeta_{3} \) are constant parameters related to the encounter rate of resources, handling time of resources, and intensity of interference competition, respectively. Notice that \( \zeta_{3} = 0 \) gives the Holling type-II functional response (Holling 1959). Equation (7.2) can be derived from a generalized Beddington–deAngelis functional response with explicit description of a resource distribution and phenotypes’ niches expressed along a resource-quality axis (Appendix L.1).

We assume \( \psi = 1 \) and \( \zeta_{3} = 1 \) without loss of generality. For simplicity, we assume that \( \theta (s_{i} ) \) depends only on \( s_{i} \) (i.e., constant inflows of resources into the system) and that

$$ \alpha (s_{j} ,s_{i} ) = \exp \left( { - \tfrac{1}{2}(s_{j} - s_{i} )^{2} } \right).$$
(7.3)

7.2 Approximability condition

As a simplest example for the approximability condition in this model, we consider invasion by a mutant phenotype \( s_{2} \) into a single resident phenotype \( s_{1} \) at its population-dynamical equilibrium \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} = [\beta - \zeta_{2} ]\theta (s_{1} ) - \zeta_{1} \). From Eqs. (7.1) and (7.2), their population dynamics are given by

$$ \frac{1}{{n_{1} }}\frac{{{\text{d}}n_{1} }}{{{\text{d}}t}} = F(s_{1} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) = \frac{{\beta \theta (s_{1} )}}{{\zeta_{1} + \zeta_{2} \theta (s_{1} ) + n_{1} + n_{2} \alpha (s_{2} ,s_{1} )}} - 1 , $$
(7.4a)
$$ \frac{1}{{n_{2} }}\frac{{{\text{d}}n_{2} }}{{{\text{d}}t}} = F(s_{2} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) = \frac{{\beta \theta (s_{2} )}}{{\zeta_{1} + \zeta_{2} \theta (s_{2} ) + n_{1} \alpha (s_{2} ,s_{1} ) + n_{2} }} - 1 , $$
(7.4b)

with \( {\mathbf{s}}^{{\prime }} = (s_{1} ,s_{2} )^{\text{T}} \) and \( {\mathbf{n}}^{{\prime }} = (n_{1} ,n_{2} )^{\text{T}} . \)

We choose \( s_{1} \) as the approximate phenotype, i.e., \( s_{\text{a}} = s_{1} \) and \( {\mathbf{m}}^{{\prime }} = (m_{1} ,m_{2} )^{\text{T}} = (n_{1} + n_{2} ,\varepsilon n_{2} )^{\text{T}} \) with \( \varepsilon = s_{2} - s_{1} \), \( 0 < \varepsilon \ll 1 \). Then, from Eqs. (4.3b) and (4.5b), we see

$$ \begin{aligned} \lambda_{\rm max} & = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{m}_{1} b_{11} = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} \left[ {\frac{{{\text{d}}F(s_{2} ;s_{1} ;m_{1} )}}{{{\text{d}}m_{1} }}} \right]_{{m_{1} = \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} }} \\ & = - \frac{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} }}{{\beta \theta (s_{1} )}} \\ & = - \frac{{[\beta - \zeta_{2} ]\theta (s_{1} ) - \zeta_{1} }}{{\beta \theta (s_{1} )}}. \\ \end{aligned} $$
(7.5)

Notice that \( \lambda_{\rm max} \) is always negative because \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} \) must be positive. As for \( C_{{\mathbf{h}}} \) and \( C_{{\mathbf{r}}} \) in the approximability condition \( \sqrt \varepsilon < - \lambda_{\rm max} /[2\sqrt {C_{{\mathbf{h}}} C_{{\mathbf{r}}} } ] \) in Theorem 2, we find (as derived in Appendix L.2) that

$$ \begin{aligned} C_{{\mathbf{h}}} & \le \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{{\mathbf{h}}} , \\ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{{\mathbf{h}}} & = \frac{{\eta \beta [5C_{\partial \theta } + 3\varepsilon C_{\theta } ]}}{{\zeta_{1} + \zeta_{2} C_{\theta {\rm min} } }}, \\ C_{{\mathbf{r}}} & = \frac{1}{{\beta \theta (s_{1} )}} + \eta \frac{{2\beta \theta (s_{1} )}}{{[\zeta_{1} + \zeta_{2} \theta (s_{1} )]^{3} }}, \\ \end{aligned} $$
(7.6)

with

$$ \begin{aligned} \eta & = [\beta - \zeta_{2} ]C_{\theta } - \zeta_{1} , \\ C_{\theta } & = \hbox{max} \left\{ {\theta (s)\left| {s \in [s_{1} ,s_{2} ]} \right.} \right\}, \\ C_{\theta {\rm min} } & = \hbox{min} \left\{ {\theta (s)\left| {s \in [s_{1} ,s_{2} ]} \right.} \right\}, \\ C_{\partial \theta } & = \hbox{max} \left\{ {\frac{{{\text{d}}\theta (s)}}{{{\text{d}}s}}\big| {s \in [s_{1} ,s_{2} ]} } \right\}. \\ \end{aligned} $$
(7.7)

Therefore, a sufficient condition for the approximability condition is given by

$$ \begin{aligned} \sqrt \varepsilon & < - \frac{{\lambda_{\rm max} }}{{2\sqrt {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{{\mathbf{h}}} C_{{\mathbf{r}}} } }} \\ & = \frac{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} }}{{2\sqrt {\frac{{\eta \beta [5C_{\partial \theta } + 3\varepsilon C_{\theta } ]}}{{\zeta_{1} + \zeta_{2} C_{\theta {\rm min} } }}\left( {\beta \theta (s_{1} ) + \eta \frac{{2\beta^{3} \theta (s_{1} )^{3} }}{{[\zeta_{1} + \zeta_{2} \theta (s_{1} )]^{3} }}} \right)} }} \\ \end{aligned} $$
(7.8)

with \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n}_{1} = [\beta - \zeta_{2} ]\theta (s_{1} ) - \zeta_{1} \) and Eqs. (7.7). Notice that the right-hand side of Eq. (7.8) includes \( \varepsilon C_{\theta } \), which is negligible when \( \varepsilon C_{\theta } \ll C_{\partial \theta } \).

8 Application: Extending the invasion–implies–substitution theorem

The derived stability conditions and resultant Lotka–Volterra approximation can be used to analyze the community dynamics triggered by a mutant invasion. In this section, we apply them to extend the invasion–implies–substitution theorem (Dercole and Rinaldi 2008, Appendix B) to an arbitrary set of resident phenotypes that form well-recognizable and -separated clusters in a one-dimensional trait space; see Appendix K for details.

We assume an arbitrary set of resident phenotypes \( s_{1} , \ldots ,s_{N} \) together with a mutant \( s^{{\prime }} = s_{N + 1} \), with the resident and mutant phenotypes clustered into approximate phenotypes \( {\mathbf{s}}_{\text{a}} = (s_{1} , \ldots ,s_{M} )^{\text{T}} \) that satisfy the approximability condition of Theorem 3. Then, from Lemma 2, \( \left| {\Delta {\mathbf{m}}^{{\prime }} } \right| \le \varepsilon C_{\text{m}}^{{\prime }} = \varepsilon \sqrt {C_{\text{m}}^{ 2} + (N + 1 - M)\eta^{2} } \) is conserved during the transient following mutant invasion. We denote the identity of the cluster containing the mutant by \( i \), i.e., \( {\text{cid}}(N + 1) = i \). Using Eq. (4.1a), the dynamics of the mutant fraction \( p_{N + 1} = n_{N + 1} /m_{i} \) within this cluster can be expressed as

$$ \begin{aligned} \frac{{{\text{d}}p_{N + 1} }}{{{\text{d}}t}} & = \frac{\text{d}}{{{\text{d}}t}}\frac{{n_{N + 1} }}{{m_{i} }} \\ & = \frac{{n_{N + 1} }}{{m_{i} }}F(s_{N + 1} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) - \frac{{n_{N + 1} }}{{m_{i} }}f(s_{i} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) \\ & = p_{N + 1} F(s_{N + 1} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} ) - p_{N + 1} \sum\limits_{{j \in {\text{com}}(i)}} {p_{j} F(s_{j} ;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} )} . \\ \end{aligned} $$
(8.1)

For convenience, we assume that the representative phenotype \( s_{i} \) of this cluster is chosen as the phenotype most similar to the mutant, i.e., \( \left| {s_{N + 1} - s_{i} } \right| = \mathop {\hbox{min} }\nolimits_{{j \in {\text{com}}(i)}} \left| {s_{N + 1} - s_{j} } \right| \). Then, by Taylor’s theorem, we transform Eq. (8.1) into

$$ \frac{{{\text{d}}p_{N + 1} }}{{{\text{d}}t}} \ge p_{N + 1} (1 - p_{N + 1} )\left[ {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{\text{z}} (s_{i} )(s_{N + 1} - s_{i} ) - \varepsilon C_{{{\text{Fz}}{\mathbf{m}}}}^{{\prime }} \left| {\Delta {\mathbf{m}}^{{\prime }} } \right| - \varepsilon^{2} C_{\text{Fzz}} } \right] . $$
(8.2a)

Here, \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{\text{z}} (s_{i} ) = \left. {\partial F(z;{\mathbf{s}}^{{\prime }} ;{\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{n^{{\prime }}} }} ) /\partial z} \right|_{{z = s_{i} }} \) is the fitness gradient at \( s_{i} \), and the constants \( C_{{{\text{Fz}}{\mathbf{m}}}}^{{\prime }} \) and \( C_{\text{Fzz}}^{{\prime }} \) bound the remainder terms through

$$ \left| {\frac{{\partial \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{F} (z;{\mathbf{s}}^{{\prime }} ;{\mathbf{m}}^{{\prime }} )}}{{\partial {\mathbf{m}}^{{\prime }} \partial z}}} \right|_{{z = z_{{j{\text{T}}}} }} \le C_{{{\text{Fz}}{\mathbf{m}}}}^{{\prime }} ,\quad \left| {\frac{{\partial^{2} F(z;{\mathbf{s}}^{{\prime }} ;{\mathbf{n}}^{{\prime }} )}}{{\partial z^{2} }}} \right|_{{z = z_{{j{\text{T}}}} }} \le C_{\text{Fzz}}^{{\prime }} , $$
(8.2b)

for \( z_{{j{\text{T}}}} \in [s_{j} ,s_{{{\text{cid}}(j)}} ] \) for all \( j = 1, \ldots ,N + 1 \) during the transient following mutant invasion. Then, by substituting our results \( \left| {\Delta {\mathbf{m}}^{{\prime }} } \right| \le \varepsilon C_{{\mathbf{m}}}^{{\prime }} = \varepsilon \sqrt {C_{\text{m}}^{ 2} + (N + 1 - M)\eta^{2} } \) and Eq. (5.9a) into Eq. (8.2a), a sufficient condition for \( {\text{d}}p_{N + 1} / {\text{d}}t \) to be always positive is given by

$$ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{\text{z}} (s_{i} )(s_{N + 1} - s_{i} ) > \varepsilon^{2} [C_{\text{Fzz}} + C_{{{\text{Fz}}{\mathbf{m}}}}^{{\prime }} C_{{\mathbf{m}}}^{{\prime }} ] $$
(8.3)

with \( C_{{\mathbf{m}}}^{{\prime }} \) in Eq. (5.10b).

If the fitness gradient \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{\text{z}} (s_{i} ) \) is sufficiently strong, so that it satisfies Eq. (8.3), then \( p_{N + 1} \) monotonically increases until it reaches 1, i.e., until all other phenotypes within the cluster containing the mutant are excluded. Equation (8.3) means that the fitness advantage of \( s_{N + 1} \) against \( s_{i} \) due to the fitness gradient must exceed the effects of the curvature of the fitness landscape (\( \varepsilon^{2} C_{\text{Fzz}} \)) and of the perturbation due to the population dynamics (\( \varepsilon^{2} C_{{{\text{Fz}}{\mathbf{m}}}}^{{\prime }} C_{{\mathbf{m}}}^{{\prime }} \)). As long as Eq. (8.3) holds for any resident phenotype and its mutants, repeated mutant invasions always result in monomorphic phenotype clusters, i.e., resident phenotypes are kept dissimilar, corresponding to the situation considered by Dercole and Rinaldi (2008). Notice that when \( \left| {\tilde{\lambda }_{\rm max} } \right| \) becomes close to zero, e.g., when the community is close to a bifurcation point of its population dynamics, \( C_{{\mathbf{m}}}^{{\prime }} \) becomes large and Eq. (8.3) thus becomes difficult to satisfy.

9 Discussion

As explained in the beginning of this paper, ecological interactions engender various evolutionary dynamics, including cyclic coevolution, adaptive radiation, adaptive speciation, taxon cycles, and community formation. To analyze how ecological interactions induce selection pressures that drive such dynamics, the following two assumptions are often made (Metz et al. 1992, 1996; Dieckmann and Law 1996). First, mutation rates are sufficiently small relative to the timescale of the population dynamics, so that the evolutionary dynamics are reduced to trait-substitution sequences resulting from repeated mutant invasions. Second, mutational step sizes are sufficiently small, so that a mutant invasion typically results in an equilibrium phenotype distribution similar to that before the invasion. The latter is called attractor inheritance (Geritz et al. 2002). In such cases, each mutant invasion modifies the fitness landscape only slightly. The fitness landscape can then be treated as a smooth function of resident phenotypes at equilibrium population densities, enabling effective analyses of directional coevolution (Dieckmann and Law 1996) and diversification through evolutionary branching (Metz et al. 1992, 1996; Geritz et al. 1997, 1998). Using the concept of approximate phenotypes introduced in the present paper, attractor inheritance can be translated into the smallness of changes of the population densities of approximate phenotypes during the transient population dynamics following mutant invasion, toward the next population-dynamical equilibrium.

9.1 Conditions for attractor inheritance

Prior to our analyses in the present paper, qualitative conditions for attractor inheritance have been proved for sufficiently small mutational step sizes in the following two cases: (1) all residents and the mutant are similar to each other (Geritz et al. 2002; Meszéna et al. 2005; Durinx et al. 2008), or (2) no two residents are similar to each other and their initial equilibrium population densities are not small (Dercole and Rinaldi 2008, Appendix B). In this paper, we have derived quantitative conditions for attractor inheritance for a set of residents and a mutant, by clustering them according to a threshold phenotypic distance into approximate phenotypes. The conditions ensuring attractor inheritance, i.e., the approximability conditions in Theorems 2 and 3, establish relationships among the magnitudes of the mutational step size, the return rate to an equilibrium of the population dynamics of approximate phenotypes, the nonlinearity of the population dynamics, and the perturbation due to within-cluster population dynamics. These conditions are especially important when finite, rather than infinitesimally small, mutational step sizes are required for analyzing the considered evolutionary dynamics, such as when investigating evolutionary suicide (Gyllenberg and Parvinen 2001) and evolutionary branching of directionally evolving populations (Ito and Dieckmann 2012, 2014). A next step would be to analyze whether it is really possible to satisfy the approximability condition, or rather, whether the condition can be satisfied with not too large error bounds in all but a set of theoretically possible but practically irrelevant cases. Although we here have considered only deterministic population dynamics, the impact of demographic stochasticity on trait-substitution sequences (Geritz et al. 2002) can be considered using the same framework we have introduced here, by subsuming its effect in the perturbation terms.

9.2 Assumption of well-recognizable and -separated phenotypic clusters

Our analysis assumes that the number \( N \) of existing phenotypes is finite, and that phenotypic clusters are well-recognizable and well-separated from each other so that the largest of within-cluster distances, \( \varepsilon \), is much smaller than the smallest of between-cluster distances. We discuss the validity of our two assumptions below.

In principle, ODE population models should be seen as large-system-size limits of stochastic individual-based models. Generally, the larger the number of coexisting phenotypes, the slower is the convergence to the ODE limit. Thus, for all practical purposes, ODE models with very large numbers of phenotypes can be left out of the picture. If we do so, the finiteness of the number of existing phenotypes, \( N \), ensures the existence of the smallest between-cluster distance and the largest within-cluster distance.

However, if a system has long chains of phenotypes in which the distances between any two consecutive members of the chain are small but the distance between the ends of the chain is large, we have no way to cluster them so that \( \varepsilon \) becomes much smaller than the smallest between-cluster distance. In this case, the error estimate for perturbation terms in Theorems 2 or 3 (\( C_{{\mathbf{h}}} \) or \( \tilde{C}_{{\mathbf{h}}} \)), in comparison with the leading eigenvalue of the community Jacobian matrix (\( \lambda_{\rm max} \) or \( \tilde{\lambda }_{\rm max} \)), can be too large for the approximability condition to be satisfied.

Fortunately, there is effectively no chance of such configurations occurring in ongoing evolutionary dynamics with sufficiently small mutational step sizes as in such dynamics closely similar phenotypes only occur in the early stages of evolutionary branching. (The local coexistence regions that can occur in higher-dimensional trait spaces around the zero fitness contour for particular residents are that narrow that the chance of a mutant landing in them is practically negligible. The more so since the far more common mutants landing outside these coexistence regions will oust all those inside the regions, so there is no chance of the number of coexisting similar phenotypes ever becoming large (Durinx et al. 2008).) Finally, of phenotypes that evolve towards each other, only one will survive due to competitive exclusion. Therefore, the assumption of well-recognizable and -separated phenotypic clusters is warranted except for a fraction of cases that will be encountered only very exceptionally, as well as transiently, in the scenarios that have our interest.

9.3 LV-approximation for analyzing evolutionary branching in multidimensional trait spaces

As shown in Sect. 3, attractor inheritance in approximate phenotypes directly enables LV-approximations of the population dynamics of the original phenotypes before clustering, similar to the previous studies (Meszéna et al. 2005; Dercole and Rinaldi 2008; Durinx et al. 2008). The derived LV-approximations may be especially useful for extending conditions for evolutionary branching from one-dimensional trait spaces to higher-dimensional trait spaces. In two-dimensional trait spaces, various numerical analyses have shown that phenotypes that are strongly convergence stable, but not evolutionarily stable, also known as strongly attracting invadable ESSes, induce evolutionary branching (e.g., Vukics et al. 2003; Ackermann and Doebeli 2004; Egas et al. 2005; Ravigné et al. 2009; Ito and Dieckmann 2012). Those phenotypes are fixed-point attractors that can be attained by directional evolution causing the convergence of a monomorphic population (Leimar 2009) to them, with sufficient proximity of a set in \( \mathcal{\mathcal{S}}^{2} \) enabling the emergence of dimorphisms followed by directional evolution causing the divergence of the two morphs. However, whether an emergent polymorphism can evolutionarily diversify further into visually distinct morphs without collapse has not been proved until recently for higher-dimensional trait spaces. Based on the rational form of invasion-fitness functions in terms of existing phenotypes, which has been derived by LV-approximation (Durinx et al. 2008), Geritz et al. (2016) derived a set of conditions that ensure that such diversifying evolution does not collapse in trait spaces of arbitrary dimension, by describing the initial diversifying evolution with coupled Lande equations (Lande 1979). While those conditions are satisfied by strongly attracting invadable ESSes in two-dimensional trait spaces, the higher-dimensional cases remain to be analyzed further (Geritz et al. 2016).

9.4 Axioms for fitness functions

The analyses in this paper are based on a set of axioms for the fitness-generating functions characterizing ecologically plausible differential equations describing trait-mediated community dynamics. Our set of axioms are similar to the set of properties assumed in Dercole (2016), which are used by him to derive a general procedure for formulating population-dynamical models resulting from individual pairwise interaction. Properties 1, 2, and 3 in Dercole (2016) are identical to our axiom (iii), (iv), and (ii), respectively, while property 4 in Dercole (2016) corresponds to our axiom (i). Dercole’s property 4, however, delimits a smaller class of models than ours.

The symmetry axiom (ii) and the reducibility axiom (iii) are no more than consistency conditions, as is the exchangeability axiom (iv). The latter axiom, however, is together with the remaining smoothness axiom (i) and bounded-world axiom (v) the root cause of the Lotka–Volterra approximabiliy. Indeed, Lemma 1, which is central for deriving the condition for attractor inheritance and LV-approximation, is proved by applying those three axioms (Appendix A). While the bounded-world axiom (v) seems to be well grounded in reality, the smoothness axiom (i) may not hold in an exact sense, because it assumes that the population-dynamical behavior of individuals depends smoothly on their traits and that all ecological interactions are instantaneous. This instantaneousness can arise when the timescale of the life-history dynamics among individuals is much faster than that of their population dynamics. Durinx et al. (2008) have proved attractor inheritance and LV-approximation in physiologically structured models with multiple birth states, in which the timescales of life-history dynamics and population dynamics are not separated. This instills us with cautious optimism that the assumption of instantaneousness we have used in the present paper might be relaxed as well.