Heteroclinic cycling and extinction in May–Leonard models with demographic stochasticity

May and Leonard (SIAM J Appl Math 29:243–253, 1975) introduced a three-species Lotka–Volterra type population model that exhibits heteroclinic cycling. Rather than producing a periodic limit cycle, the trajectory takes longer and longer to complete each “cycle”, passing closer and closer to unstable fixed points in which one population dominates and the others approach zero. Aperiodic heteroclinic dynamics have subsequently been studied in ecological systems (side-blotched lizards; colicinogenic Escherichia coli), in the immune system, in neural information processing models (“winnerless competition”), and in models of neural central pattern generators. Yet as May and Leonard observed “Biologically, the behavior (produced by the model) is nonsense. Once it is conceded that the variables represent animals, and therefore cannot fall below unity, it is clear that the system will, after a few cycles, converge on some single population, extinguishing the other two.” Here, we explore different ways of introducing discrete stochastic dynamics based on May and Leonard’s ODE model, with application to ecological population dynamics, and to a neuromotor central pattern generator system. We study examples of several quantitatively distinct asymptotic behaviors, including total extinction of all species, extinction to a single species, and persistent cyclic dominance with finite mean cycle length.

1. Introduction.Following Lotka [25] and Volterra [47], May and Leonard [27] introduced a model generalizing Lotka-Volterra dynamics for a system of three species: In Equation (1.1) n i represents the population of species i, and the constants α ≥ 0 and β ≥ 0 represent the strengths of competitive interactions.The model exhibits different types of coexistence for different choices of α and β.When α + β = 2, the system converges to a periodic orbit contained in the plane n 1 + n 2 + n 3 = 1.This solution can be interpreted as the direct extension of Lotka-Volterra to three species, where each species' population oscillates with finite period.However, when α + β > 2 and either α < 1 or β < 1, the system undergoes heteroclinic cycling, with the duration of each cycle increasing as time progresses.In this regime, each species' population becomes closer to zero with each cycle, and spends a longer fraction of each cycle in this near-extinction state.
Heteroclinic cycling models such as Equation (1.1) and their variants have frequently served as models for "rock-paper-scissors"-type population dynamics in which populations take turns as the dominant species before being pushed out in favor of a more competitive population.Sinervo and Lively [41] found that a species of sideblotched lizards exhibits rock-paper-scissors competition: orange aggressive lizards beat out less-aggressive blue lizards for mates, yellow "sneaker" lizards invade the larger orange lizard territory to steal mates, and blue lizards beat out the sneakers for mates.Kerr et.al. [24] observed a similar behavior in colicinogenic E. coli : a toxin-producing strain kills a susceptible population, a toxin-resistant population grows faster than the toxin-producing population, and the susceptible population grows faster than the resistant population.In computational neuroscience, heteroclinic cycling has been proposed as an alternative to classic "winner-take-all" models for neural networks.Rabinovich et.al. [33,34] suggested that the activity of olfactory neurons when encoding stimuli can be projected onto a heteroclinic cycle and called the behavior "winnerless competition."Varona et.al. [45] theorized that high-dimensional heteroclinic systems leading to chaotic dynamics might underlie the apparently random search behavior during hunting in the mollusc Clione.Shaw et.al. [39] and Lyttle et.al. [26] constructed a model capable of transitioning between limitcycling and heteroclinic-cycling behaviors to represent a neuromotor central pattern generator (CPG) in Aplysia californica (see also [29]).While more detailed models for the Aplysia feeding system have since been developed [48], the simplicity of the three-component SLG (Shaw-Lyttle-Gill) model makes it an attractive target for analysis.
Despite their popularity, heteroclinic cycling models of biological populations, when formulated as systems of ordinary differential equations, suffer a fundamental flaw.Indeed, in their original paper, May and Leonard noted a significant drawback of their model's ability to describe population dynamics.They observed that, while heteroclinic cycling continues indefinitely, real biological populations "cannot fall below unity, [and] it is clear that the system will, after a few cycles, converge on some single population, extinguishing the other two" [27].This discrepancy arises from demographic stochasticity, or copy number noise, that is inherent in systems where populations take on discrete integer values.
In light of May and Leonard's observation, one might expect that a stochastic system undergoing heteroclinic cycling would necessarily exhibit population extinctions.However, as is well known, the mapping from a given ODE model to a stochastic model having matching mean-field dynamics is not unique.For example, Allen [1] noted that for a logistic birth-death process, there are an infinite number of per capita birth and death rates that yield the same mean-field logistic growth.Xue and Goldenfeld [52] found that modeling plankton ecosystems using stochastic versions of the "kill-the-winner" model resulted in extinction events, while the mean-field model had stable coexistence of all species.And Strang et.al. [42] explored the paradox that stochastic models with the Allee effect, which reduces per-capita growth rate for small population size, can have longer persistence than models without the effect.The ambiguity intrinsic to stochastic extensions of ODE systems is not confined to ecological models.A series of papers have debated the most appropriate way to extend the deterministic Hodgkin-Huxley equations to incorporate the effects of random gating of ion channels in neural dynamics [3,14,20,21,28,30,31] At the level of large-scale neural circuits, several distinct stochastic generalizations have been proposed that coincide with the classical deterministic Wilson-Cowan neural field equations in the mean-field limit [5,6,9,12,13].
As these examples suggest, there could be more than one stochastic model consistent with Equation (1.1) in the mean-field limit, but exhibiting distinct long-term behaviors for finite system size.In this paper, we investigate three different stochastic implementations of heteroclinic cycling, each resulting in distinct long-term behavior.First, we consider two alternative stochastic models, each based on a birth-death formalism consistent with Equation (1.1).By formulating the discrete master equation [15] and leveraging complex-balanced equilibrium results from chemical kinetics [4], we prove that each alternative results in a qualitatively different stationary distribution.We confirm these findings numerically.We then propose a modified May-Leonard system inspired by a neuromotor CPG model from Lyttle et.al. [26].Using the same birth-death formalism, we construct a stochastic implementation of this new model that not only avoids extinction events, but also maintains a finite mean cycle length.We numerically investigate how the mean cycle length depends on model parameters and examine its asymptotic behaviors in the both the large and small system size limits.Taken together, these results illustrate the rich variety of behaviors that may be obtained from different stochastic generalizations of May and Leonard's original deterministic heteroclinic cycling model.

2.
Mean-Field Formulations of Heteroclinic Cycling.For a general system of m species following deterministic Lotka-Volterra interactions, species i has the governing equation (2.1) In Equation (2.1), n i is the population size of species i ∈ {1, . . ., m}, r i is the intrinsic growth rate of species i, k ij represents the strength the competitive effect of species j on species i, and f i (t) is a nonhomogeneous forcing function that can represent immigration, harvesting, etc. of species i.We will use Equation (2.1) to construct three versions of May and Leonard's heteroclinic cycling model.For the duration of the paper we will restrict our attention to three interacting species (m = 3), assume that each species has the same intrinsic growth rate r 1 = r 2 = r 3 = r and forcing function f 1 = f 2 = f 3 = f , and enforce that competition rates have the same cyclic symmetry as the May-Leonard system, so that , where indicial addition is taken cyclically.Note that by setting m = 3, r = 1, k i,i+1 = α, k i,i+2 = β, k ii = 1, and f = 0, we recover Equation (1.1).
The first two models we consider will be direct analogues of Equation (1.1).As is the case in May and Leonard's original system, both models will obey mass-action kinetics, with implications that we discuss below.We begin with a "general variance" or "GV model."In this model, the intrinsic growth rate r reflects the combined effects of a per capita birth rate b > 0 and a per capita death rate d > 0, chosen so that r = b − d > 0. The terminology "general variance" reflects the fact that the variance of the population growth over short times ∆t scales as (b + d)∆t + o(∆t).Thus for a given value of r, we can obtain arbitrarily large variance in the population growth process by increasing both b and d.Following the language of van Kampen [44] and Gardiner [15], we introduce a system size parameter Ω (representing the single-species carrying capacity).We consider the n i of Equation (2.1) as intensive variables and define N i = Ωn i as extensive variables for the number of individuals in the i-th species.
The resulting mean-field equations for the GV model may be written as: (2.2) For notational clarity, we write the birth and death rates separately; in the stochastic model each will parametrize a separate stochastic reaction term (see subsection 3.1).Note that when N 2 = N 3 = 0, N 1 follows logistic growth with carrying capacity Ω and low-density growth rate (b − d).
The second model we consider may be seen as a special case of the GV model, given by setting the intrinsic growth rate r = b and the per capita death rate d = 0.While this restriction may seem nonphysical, it may be a good approximation of some biological systems.For example, some bacterial populations survive exposure to antibiotics by entering a "persistent state" for which the mortality rate is effectively zero (see [7,16] for details).As noted above, the variance of the population growth over short times is proportional to b + d.Therefore, for a fixed r, the assumption d = 0 gives the minimum variance model, which we call the "minimal model."Its mean-field equations are: (2.3) In Equation (2.3), we replaced the individual birth and death rates from the GV model with the net growth rate r.Again note that if b − d = r from Equation (2.2), the GV and minimal models are equivalent at the level of mean-field equations, and we recover Equation (1.1) by taking Ω = b − d = r ≡ 1.However, in the stochastic implementation of the minimal model, eliminating the death process qualitatively changes the long-time asymptotic behavior (see subsection 3.2).
As a third stochastic variation on the May-Leonard model, we explore the effect of the nonhomogeneous term f = 0.This variation is motivated by heteroclinic cycling models of neural CPGs.For example, Shaw et.al. [39] and Lyttle et.al. [26] proposed a model for a CPG driving feeding movements in the marine mollusk Aplysia californica that comprises three pools of motor neurons, coupled by reciprocal inhibition and driven by endogenous activation.Each neural pool has an activation variable, a i , i ∈ {0, 1, 2}, ranging from a i = 0 (inactive) to a i = 1 (fully active), and satisfying May-Leonard type competitive dynamics.To study the effects of demographic stochasticity, we interpret the a i as intensive variables representing the fraction of active neurons in i-th pool, analogous to the Wilson-Cowan equations [50,51].We introduce a system size Ω, corresponding to the number of cells in each pool, and write A i = Ωa i as extensive variables, representing the integer number of active cells.We thus obtain our third mean-field model, which we call the "three-pool model:" (2.4) Note that Equation (2.4) can be obtained from Equation (2.1) by taking r = 0, and f = µΩ τ .In Equation (2.4), τ is a time constant, γ is the strength of inhibition, and µ governs the rate of endogenous activation.This activation parameter µ represents intrinsic sources of excitation, whether from ongoing network activity, slow endogenous excitatory currents, or neuromodulatory effects, that cause cells to activate spontaneously.This endogenous activation provides an additional source of stochasticity in our model.In this model, the total number of cells in each neural pool is conserved, with transitions representing changes of activation state rather than "births" or "deaths".In contrast to the ecological models, Equation (2.2) and Equation (2.3), where population sizes are unbounded, in the neural pool model the population state-space finite.The endogenous activation term µ 1 was introduced by Shaw et.al. [39] as a means of regulating the sensitivity of the neural activity, by steering trajectories away from the saddle points of the heteroclinic system.Here we define the endogenous activation term somewhat differently from their original formulation, in order to enforce zero flux conditions on the boundaries of our space, which in turn allows us to construct a well-defined stochastic model (see section 5).As in the GV and minimal models, the three-pool model obeys mass-action kinetics; to see this, define I i = Ω − A i to be the number of inactive neurons in the i-th pool.We may then rewrite Equation (2.4) as: To better understand the dynamics three-pool model, compared to the more traditional translations of heteroclinic cycling, we simulated the deterministic Equation (2.3) and Equation (2.4) (see Figure 1).Figure 1A shows that the minimal model exhibits the same heteroclinic cycling as May and Leonard's original system.As expected, solutions converge to the plane n 1 + n 2 + n 3 = 1 in phase space (Figure 1C).In contrast, in the three-pool model, when µ = 0 we recover a rescaled version of the original May-Leonard system.However, when µ > 0, the three-pool model does not exhibit heteroclinic cycling; instead, as Figure 1B shows, it undergoes periodic oscillations.Trajectories no longer converge to the triangular unit plane, but instead converge to a hyperbolic manifold (see Figure 1D).
In the following sections we introduce stochastic models corresponding to each of the three mean-field models discussed above.In order to constrain our choice of stochastic model, in each case we restrict consideration to models that obey massaction kinetics.This choice allows us to leverage results from birth-death processes and complex-balanced equilibrium theory in order to study the long-time asymptotic behavior of each model.As a consequence of this modeling choice, the noise in our models will come from demographic stochasticity rather than, for example, scaled Gaussian noise typical of Langevin-type population models.By focusing on discretestate population models, we aim to hew closely to the spirit of May and Leonard's original work.

Stationary Distribution of General Variance and Minimal Models.
3.1.General Variance Model: Total Extinction.Following [22,49], we adopt the formalism of stochastic mass-action kinetics and construct the reaction net for Equation (2.2): In each of Equation (3.1)-(3.5)we take i ∈ {1, 2, 3} and interpret indicial addition cyclically.For each reaction, c j is the microscopic rate constant determining the propensity of the given reaction.Figure 2B,D shows a sample trajectory of this system generated via Gillespie's stochastic simulation algorithm [19,22].While short-term dynamics of the mean-field model Equation (2.2) evolve slowly from initial conditions (Figure 2A,B), the stochastic GV system quickly exhibits extinction of two of the three species (Figure 2C).This result is consistent with May and Leonard's prediction: while intensive variables can become infinitely close to zero, extensive variables taking discrete values will eventually drop to zero.However, rather than leading the third "winning" species to dominate in perpetuity, on a longer time scale (Figure 2D) the winning population also ultimately suffers a downward fluctuation leading to its own We provide a proof in Appendix A. Proposition 3.1 tells us that the GV model will exhibit total extinction in the longtime limit, independent of initial conditions.This result recalls that of Vellela and Qian [46], in which the authors demonstrated that for the single-population Keizerator reaction system, the mean-field system converges to a nontrivial equilibrium while the stochastic system converges to total extinction (albeit with mean extinction times that are exponentially long in the system size).We can explain this behavior by the fact that our reaction system includes individual birth (N i → 2N i ) and death (N i → ∅) reactions [1].As we will see in subsection 3.2, removing the death reaction fundamentally changes the stationary behavior of the model.thus the reaction system takes the form (for i ∈ {1, 2, 3}, as before):

Minimal Model
Figure 2E,F contrast Gillespie simulations of the GV model and the minimal model.In both case two population extinctions occur quickly, but in the minimal model the third population does not go extinct.In the minimal model, once the system reduces to a single species, the only death mechanism is homocidal competition, which requires at least two individuals.Therefore the total extinction state is not accessible from non-trivial initial conditions, which guarantees a distinct stationary distribution from the GV model.The framework of complex-balanced equilibria from Horn and Jackson [23] and Anderson and Kurtz [4] allows us to obtain this stationary distribution for the system, as given in Proposition 3.2: Proposition 3.2.Let N = (N 1 , N 2 , N 3 ) be the population vector of the minimal model Equation (3.6)-(3.9).The reaction system Equation (3.6)-(3.9)has four distinct stationary distributions.Three may be expressed as component-wise stationary distributions of the form for i ∈ {1, 2, 3}, n i ≥ 1, with δ(x) being the distribution with unit probability at x = 0, and with index addition taken cyclically on {1, 2, 3}.The fourth is π(n) = δ(n).
We provide a proof in Appendix B. Figure 3 shows a comparison of the analytic stationary distribution from Equation (3.10) with the empirical distribution from Gillespie simulations.We can see that the two results show excellent agreement, even when the system size Ω takes on non-integer values.Comparing Proposition 3.1 and Proposition 3.2, the stationary behaviors of the GV and minimal models are in fact distinct.While two of the populations will go extinct in both models, the GV model converges to total extinction while the minimal model converges to a truncated Poisson distribution representing stochastic logistic growth.Although the total extinction state is a stationary distribution for the minimal model, it is not accessible from nontrivial initial conditions.This comparison illustrates the well-known fact that two stochastic models both consistent with the same mean-field deterministic model can have fundamentally different long-term behavior.
4. Transient Behavior of the Minimal Model.While we have thus far restricted our investigation to long-time asymptotic behaviors, we may also study the dynamics of extinction over intermediate times.The order and timing of extinctions is important in conservation ecology, where it is crucial to determine if and when intervention is required to prevent population collapse [32,37].Gillespie simulations suggest that the cycle length of the stochastic May-Leonard system, conditioned on non-extinction, has finite mean (Figure 2).Taking this observation together with the stationary distribution results from section 3, we can reasonably expect to estimate both the ordering of extinction events and their times of the stochastic system.Because both the GV and minimal models exhibit similar transient behavior, we will restrict our investigations to the minimal model, as the results will be more clear due to its lower variance.

Distribution and Ordering of Extinction Events.
To study the ordering of extinction events in the minimal model, we found the distribution of hitting locations on the coordinate planes N i = 0 for i ∈ {1, 2, 3} from a fixed initial condition N(0) = Ω 3 (1, 1, 1).This distribution gives the relative probability of extinction of each species from this starting condition.In addition, it gives us the conditional density of, for example, species 2 and 3, conditioned on species 1 going extinct first.We formulated the first-passage location problem as (4.1) where L is the infinitesimal generator matrix associated with the discrete master equation, s is a fixed absorbing state, π is the probability of hitting s as a function of initial condition, and e s is the standard basis vector.We imposed absorbing boundary conditions along the coordinate planes N i = 0 and adjoint reflecting boundary conditions along the planes N i = 2Ω to ensure a well-posed numerical problem in which probability conservation is guaranteed.For more details about this construction, see Appendix C. Figure 4A shows the solution of this linear system; we can see that for an initial condition along the vector (1, 1, 1), the distribution of absorption locations exhibits a three-fold rotational symmetry about the initial condition.The majority of the distribution is located near the intersections of the plane N 1 + N 2 + N 3 = Ω with the three absorbing coordinate planes.These results suggest that, for a symmetricallydistributed initial condition, all three populations are equally likely to go extinct.To confirm these findings, we also found the first-hitting distribution empirically, shown in Figure 4C, using Gillespie simulations.We can see that the two approaches show good agreement over the entire domain.
Assuming WLOG that N 3 is the first population to go extinct, we formulated the first-hitting problem for the two-dimensional subsystem to find the absorption distribution of the remaining two species conditioned on extinction of N 3 .Using the same approach from the full three-dimensional system, with initial conditions weighted by the distribution in Figure 4A over the plane N 3 = 0, we obtained the distributions shown in Figure 4B.The upper (red) curve labeled "N 2 " shows the density of N 1 at the time N 2 goes extinct.Similarly, the lower (blue) curve labeled "N 1 " shows the density of N 2 at the time N 1 goes extinct.The area under each curve gives the conditional probability that the corresponding population goes extinct, given that N 3 goes extinct first; note that the summed area under the two curves equals unity.From these results, we can see that once N 3 goes extinct, N 2 is much more likely to go extinct than N 1 .We again confirmed our these results using Gillespie simulations (Figure 4D) and found good agreement (χ 2 goodness-of-fit test, p = 0.96 > 0.05).
Combining these two results, we can see that there is a distinct pattern to the extinctions in the minimal model, which is schematized in Figure 4E.In the full three-dimensional system, the likelihood of each extinction is determined by the initial conditions; any initial condition along the vector (1, 1, 1) results in an equal probability of first extinction.Once one population goes extinct, a second population quickly goes extinct because of the imbalance in competition rates α and β, leaving a sole surviving species.For example, if species 3 goes extinct first, then it is more likely that species 2 goes extinct next, leaving species 1 to dominate over long times.This pattern is reminiscent of the age-old saying "the enemy of my enemy is my friend," as species 1, which is out-competed by species 3, survives because species 2 out-competes species 3.

Extinction Times in the Minimal Model.
In order to find the exact mean time to first extinction, we construct the first-passage time problem where L is the same infinitesimal generator matrix from Equation (4.1), τ is the vector of mean absorption times as a function of initial condition, and −1 is a vector of -1's.We impose absorbing boundary conditions on the coordinate planes N i = 0, and adjoint reflecting boundary conditions on the planes N i = 2Ω, as in the first-passage location problem (subsection 4.1).Using this approach, we obtained the mean firstextinction time for all initial states in the domain.For ease of visualization, Figure 5A and B show τ restricted to the plane Π = {N 1 + N 2 + N 3 = Ω 3 }.(We note that trajectories with initial conditions away from Π quickly approach a small neighborhood of this plane, so mean extinction times on the plane are representative of mean extinction times from most starting locations in the interior of the domain.)Figure 5A shows a slice of the mean first-extinction time along the plane Π.As expected, the extinction times as a function of initial condition have three-fold rotational symmetry.Moreover, the time is largely determined by the distance between the initial condition and the deterministic fixed point Ω 3 (1, 1, 1).To confirm the results from the discrete first-passage time problem, we also used large-sample Gillespie simulations with initial condition taken over Π. Figure 5B shows the empirical mean first-extinction time as a function of starting location.Comparing the exact and approximate results, we found good agreement (t-test, averaged over initial conditions: p(N) Π ≈ 0.51 > 0.05).To illustrate this agreement further, in Figure 5C we plot each initial condition in a scatter plot: the abscissa is the exact mean extinction time found using the discrete backward equations (T D ) and the ordinate is the empirical mean extinction time found using Gillespie simulations (T G ).The two methods show excellent agreement for initial states near the coordinate planes (when T D is small) and have a slightly increased variance when the initial state is close to the deterministic fixed-point (when T D is large).Nevertheless T D and T G show excellent agreement over the entirety of Π.This result demonstrates that large-sample Gillespie simulations give a good approximation of the exact mean extinction times, and justifies the use of Gillespie simulations for large-Ω systems where the exact solution becomes intractable (e.g.Ω 60).
In order to study the effect of system size on mean extinction time for Ω > 30, we relied on Gillespie simulations.From our previous simulations for a fixed Ω = 30 system, we observed that time to first extinction has three-fold rotational symmetry and largely depends on the distance from the deterministic fixed point.Therefore we considered initial conditions along the segment connecting (Ω, 0, 0) and Ω 3 (1, 1, 1), where we parameterized the distance along this segment using the parameter s ∈ [0, 1].Using this parameterization, we varied Ω and s and estimated the time to first extinction, shown in Figure 5D.As s increases and the initial condition moves closer to the deterministic fixed point, the mean extinction time increases across all values of Ω; this trend is consistent with the behavior we observe in the small-Ω system.
5. Three-Pool Model: Stochastic Oscillations.In the previous sections, we showed that modifying a single reaction in the stochastic model (removing the individual death reaction) led to distinct asymptotic dynamics.However, both the GV and minimal models share the same mean-field behavior, and both produce transient dynamics that may be described as noisy heteroclinic cycling.In contrast, the threepool model for a neuromotor central pattern generator (CPG) in Equation ( 2.4) has a non-homogeneous term, µ, that steers trajectories away from the fixed points in the corners of the boundaries, preventing heteroclinic cycling.In the CPG model, the parameter µ represents endogenous activation of each pool of motor neurons.When µ > 0 the resulting deterministic system exhibits finite-period oscillations, converting heteroclinic cycling into finite-period limit cycle behavior (see Figure 1), with prolonged dwell times near the saddle points and a period that can be sensitively controlled by the endogenous activation parameter.
Both endogenous activation and noise intensity have been suggested as potential mechanisms for regulating the frequency of cycling in CPG models built on a dynamical architecture of heteroclinic cycling [26,38,39,40].The three-pool model specified below allows us to investigate the relative contributions of both activation (controlled by µ) and noise (controlled by the system size Ω) to regulating the mean oscillation period of the CPG model.
Using the same formalism as in section 3, we write the reaction net for A i for the three-pool system as: where i ∈ {0, 1, 2} and indicial addition is taken cyclically; recall that I i = Ω − A i is the inactive population.Note that the endogenous activation µ enters into the reaction I i → A i .Because the total population of cells in each pool remains fixed over time, this reaction ensures that even if one population becomes fully inactive, it will eventually become active again, after some delay.Thus, in the language of the previous two models, the neural populations in the three-pool model will never go permanently extinct.Consequently the neural activity oscillation persists indefinitely, albeit with a randomly varying cycle length.
Figure 6 illustrates how the population size Ω and activation strength µ influence the cycle length.In order to cover a wide range of system sizes, we utilized Gillespie simulations.Figure 6A shows the empirical mean cycle length for varied Ω and µ; we observe that larger parameter values cause faster oscillations, on average.Additionally, as Ω increases, the mean period approaches a value that depends solely on µ; this value is the deterministic period from the mean-field equations in Equation (2.4).We calculated the empirical variance of the cycle length, shown in Figure 6B, and found that the variance also decreases when either Ω or µ are increased.These results suggest that both µ and Ω could contribute to controlling the frequency of neural activity.For example, consider a relatively slow system, with small Ω and small µ.This system can be sped up by either increasing µ, which increases endogenous activation noise and drives activity further away from the saddle points, or by increasing Ω, which decreases demographic stochasticity.Additionally, both parameters have similar influence on the variance of the cycle length.Recent work has shown that the feeding CPG of the marine mollusk Aplysia californica recruits additional motor neurons when the organism encounters unexpected resistance in swallowing food [17,18], and that variability of motor neuronal activity is reduced for those components of feeding behavior that matter most for task fitness [11].Although the isolated three pool model considered here lacks important circuit components (such as sensory feedback [10]), the relative sensitivity of the cycle time variance to µ versus Ω could nevertheless suggest experimentally testable questions.For instance, one could probe experimentally whether the variability in the motor pattern decreases or increases when subjected to larger external loads.
Figure 6A,B exhibit a large region of parameter space in which the mean and variance both vary linearly on a log scale with both Ω and µ.To explain this observation, we developed an approximate expression for the average period as a function of Ω and µ.Consider the discrete system, which forms a cubic lattice, and suppose the population vector is currently (Ω, 0, 0).While there are three possible transitions away from this state, the only transition that pushes the system forward along a cycle is the transition (Ω, 0, 0) → (Ω, 1, 0).The time of this transition is exponentially distributed with rate parameter Ωµ τ .Once this transition occurs, the subsequent transitions are more rapid, and push the system to the corner (0, Ω, 0), where the process repeats itself.Because of the differing timescales of these transitions, we can approximate the cycle dynamics as a sequence of three "rate-limiting steps", each with transition times that are iid exponentially distributed with parameter Ωµ τ .The sum of these three times follows a Gamma distribution with parameters (α, β) = 3, τ µΩ .This distribution predicts the mean cycle length and coefficient of variation to be (5.4) To verify the Gamma distribution approximation and the predicted mean cycle length given by Equation (5.4), we plot the average period as a function of Ω for fixed µ, and as a function of µ for fixed Ω (solid thin lines) against the empirical average period (thick colored lines) in Figure 6C and Figure 6D, respectively.The Gamma distribution and Gillespie simulation results agree in the region of small Ω and small µ; however, the Gamma distribution approximation breaks down as Ω increases and the number of different transition paths between fixed points increases.To further validate the Gamma distribution approximation, in Figure 6E we plot the coefficient of variation (CV) of the empirical cycle length and found that the CV is approximately constant for a large region of parameter space.Comparing our empirical CV to the predicted value Equation (5.4) in Figure 6F, we found that this heuristic interpretation holds for a large range of parameters.

Conclusions.
In this work, we studied both transient and long-term behavior in several stochastic versions of May and Leonard's heteroclinic cycling model, introducing noise via demographic stochasticity under a variety of assumption.Although two of our models (the general-variance (GV) and minimal models) coincide with the classical May-Leonard system in Equation (1.1) in the mean-field limit, we found that these stochastic versions are guaranteed to undergo population extinctions in finite time.Moreover, by eliminating the individual death reactions in the GV model to obtain the minimal model, we proved that the stationary distribution changes from total extinction of all species (in the GV model) to extinction to a sole survivor that follows a truncated Poisson distribution (in the minimal model).We also studied a variant of the model representing a three-pool neural system.In this version, we added a single reaction to introduce endogenous excitation of each neural population; in an ecological context a similar modification can be thought of as representing immigration.This additional reaction yielded a system that not only avoids permanent extinctions, but has a finite mean cycle time that depends on the size of each neural pool (Ω) and the strength of endogenous excitation (µ).Using an intuitive rate-limiting step argument, we found an approximation to the mean cycle length that showed good agreement in both mean and variance with Gillespie simulations.As elements of a potential control scheme for a neural central pattern generator, it is worth noting that although both Ω and µ provide potential control parameters, their effects on the mean and variance of the cycle time are similar for a wide range of parameters, meaning that the mean and variance cannot be controlled independently of one another.
Throughout our investigation, we limited attention to stochastic models with mass-action kinetics.This focus allowed us to formulate our models as multi-dimensional birth-death processes and leverage results from complex balanced equilibrium theory to find the stationary distribution of the GV and minimal models.While others, such as Reichenbach et.al. [35] and Yahalom et.al. [53], have studied cyclic stochastic population models, their results required taking continuum limits of the state space and linearizing the resulting dynamics about fixed-points of the associated deterministic system.As a result, their models employed Gaussian white noise rather than discrete population noise, which can lead to inconsistent treatment of small population dynamics leading to extinction [42].Our approach avoided these potential difficulties and guaranteed that demographic stochasticity was the only source of noise in our models.
To verify our analytic results, we both ran large-scale Gillespie simulations and constructed first-passage problems as sparse linear systems.While these two numerical approaches showed good agreement, we could only leverage the exact results from the master equation for small system sizes.This limitation is a consequence of our choice to use a birth-death formalism for all the stochastic models: the infinitesimal generator matrix L that is required for solving first-passage problems has O Ω 3 scaling, and we quickly reached hardware limitations when trying to vary Ω over several orders of magnitude.Future work using the discrete system may require approximating the operator to make the first passage problem tractable.Safta et.al. [36] have developed a hybrid discrete-continuum approximation of the forward operator, the adjoint of the infinitesimal generator matrix, that involves partitioning the state space, taking a continuum limit within each partition, and simulating a continuous flow within partitions and discrete transitions between partitions.In future work, this approach could be extended to create a hybrid approximation to the backward operator L to solve first-passage time problems, which would allow us to obtain the extinction time statistics for a larger range of system sizes semi-analytically.Our proof follows ideas similar to Vellela and Qian [46].
Appendix B. Proof of Proposition 3.2.Following [4], we summarize the elements of a chemical reaction network.The network comprises a set of m species S (in our case, S = {N 1 , N 2 , N 3 }), a set of complexes C, which are nonnegative integer linear combinations of species (for example, N 1 + N 2 is the complex y = (1, 1, 0); 2N 1 is the complex y = (2, 0, 0), etc.), and a finite set R of reactions (e.g.reaction 1 might be N 1 + N 2 → 2N 2 ).A reaction network of this form has linkage classes, which are the number of connected components of the reaction network graph.We index the reactions 1, . . ., k, . . ., |R|.In a deterministic mass-action kinetics model, the reaction taking complex y to complex y has rate κc y , where c ∈ R m + is the vector of species concentrations, κ is a molecular rate constant, and c y = m i=1 c yi i .An equilibrium concentration c * for a deterministic network is "complex-balanced" if for every complex η ∈ C, the net production and consumption rates of η are equal, i.e.where the LHS sums over source complexes and the RHS sums over product complexes.Anderson and Kurtz [4] further define a chemical reaction network {S, C, R} to be weakly reversible if for any reaction y k → y k ∈ R, there is a finite sequence of reactions beginning with y k as a source complex and ending with y k as a product complex, e.g.y k → y 1 → y 2 → . . .→ y r → y k .
Proof.The distribution π(n) = 3 i=1 δ(n i ), which represents complete extinction, is clearly an invariant distribution for the system Equation (3.6)-(3.9).It remains to show that the only other stationary distributions have the form Equation (3.10).
We start by showing that neither the full three-dimensional system nor the twodimensional subsystem admit complex balanced equilibria, while each one-dimensional subsystem does admit a complex balanced equilibrium (CBE).Without loss of generality, we will set i = 1; the remaining stationary distribution follows by permutation of indices.First, we consider the full three-dimensional system.We write the reaction network from Equation (3.6)-(3.9) in the more compact form The system has nine complexes: C = {N 1 , N 2 , N 3 , 2N 1 , 2N 2 , 2N 3 , N 1 + N 2 , N 1 + N 3 , N 2 + N 3 } and six linkage classes (the distinct connected components of the reaction network, displayed in Equation (B.2)).The stoichiometric reaction vectors, representing the change in number of each species that results from each reaction, span the entire space of dimension s = 3.The network deficiency is δ := |C| − − s; for our system δ = 9−6−3 = 0.The complex balanced equilibrium theorem [2,4] establishes that a zero-deficiency network has a CBE if and only if it is weakly reversible.The minimal reaction network Equation (B.2) is not weakly reversible.For example, the complex y = N 1 + N 2 appears as the source complex in the reaction N 1 + N 2 → N 1 , but there is no reaction path leading from the product complex y = N 1 back to y.We conclude that the network does not admit a CBE.Consequently, there is no stationary distribution in which all three species have nonzero populations.(Failure of weak reversibility coincides with the intuition that once the population enters a two-dimensional subspace on a coordinate plane, there is no reaction to bring the system back into the full three-dimensional space.)Next suppose, again WLOG, that N 3 is the first population to go extinct.The two dimensional subsystem (N 1 , N 2 , 0) is an absorbing set, within which the system has the reduced reaction network This reaction network has deficiency δ = 5−3−2 = 0. Invoking the complex balanced equilibrium theorem again, since this two-dimensional network also fails to be weakly reversible, it again does not admit a stationary distribution.Within the (N 1 , N 2 , 0) subsystem, either species could go extinct.Suppose (again WLOG) that N 2 goes extinct next.Now the network reduces to the one-dimensional subsystem This reaction network also has zero deficiency, δ = 2 − 1 − 1 = 0, but unlike the previous cases, it is weakly reversible.In this case, the stationary distribution theorem [2] establishes that the subsystem Equation (B.3) has a unique stationary distribution which is a truncated Poisson distribution: (B.4) π(n 1 ) = Ω n1 n 1 !(e Ω − 1) , n 1 ∈ N.
puting at Case Western Reserve University.The second author acknowledges research support from Oberlin College.

Fig. 1 .
Fig. 1.Comparison of deterministic minimal and three-pool models.A: Relative population size n i as a function of t generated from Equation (2.3) with r = 1, α = 0.8, β = 1.3, and n(0) = (1, 0.8, 0.2).B: Active fraction of neural pool a i as a function of t from Equation (2.4) with τ = 1, γ = 2.4, and µ = 10 −5 with same initial condition as A. C: Solution from A plotted in phase space.D: Solution from B plotted in phase space.

Fig. 2 .
Fig. 2. GV and minimal models of stochastic heteroclinic cycling.A: Mean-field behavior of GV/minimal model from Equation (2.2) with α = 0.8, β = 1.3, r = 1, Ω = 30, and N (0) = Ω 1 3 , 1 3 , 13 30 .B: Same as A, plotted for 0 ≤ t ≤ 200.C: Stochastic realization of GV model from reaction system in Equation (3.1)-(3.5)with same parameters as A. Strikes (x) mark extinction events of each species.D: Same as B, plotted for 0 ≤ t ≤ 200.Note that all species have gone extinct by the end of the simulation.E: C: Stochastic realization of minimal model from reaction system in Equation (3.6)-(3.9)with same parameters as A. F: Same as E, plotted for 0 ≤ t ≤ 200.In the minimal model, the last species survives in perpetuity.

Fig. 3 .
Fig. 3. Comparison of analytical and empirical stationary distributions for the minimal variance model.A: Blue bars show histogram of empirical stationary distribution from Gillespie simulations with α = 0.8, β = 1.3, r = 1, and Ω = 10.Black curve shows stationary distribution calculated from Equation (3.10) with Ω = 10.B: Same as A, with Ω = 0.1.C: p-values from χ 2 goodness-of-fit test between empirical stationary distribution from Gillespie simulations and analytical stationary distribution from Equation (3.10) with varied Ω. Dashed line shows standard significance level α = 0.05; p-values above the dashed line indicate good agreement between the distributions.

Fig. 4 .
Fig. 4. Ordering of extinction events in the minimal-variance model.A: Exact distribution for first extinction event obtained by solving the discrete first-passage location problem (r = 1, α = 0.8, β = 1.3, Ω = 30).B: Exact distribution for second extinction event, conditioned on N 3 going extinct first, obtained by solving the discrete first-passage location problem.Red (upper) curve: Density of population N 1 when population N 2 goes extinct.Blue (lower) curve: Density of population N 2 when population N 1 goes extinct.See text for details.C: Empirical distributions for first extinction event, obtained from 10 6 Gillespie simulations with the same parameters as in A. D: Empirical distributions for second extinction event, conditioned on N 3 going extinct first, obtained from Gillespie simulations with 10 6 initial states sampled from the plane N 3 = 0 in C. E: Schematic showing competition interactions and ordering of extinction events.Thicker arrows indicate stronger competitive interactions.

Fig. 5 .
Fig. 5. Timing of first extinction in the minimal model.A: Exact mean time to first extinction event, obtained by solving the discrete first-passage time problem (r = 1, α = 0.8, β = 1.3, Ω = 30), from initial conditions in the plane Π = {N 1 + N 2 + N 3 = Ω/3}.B: Empirical expected time to first extinction event, calculated using 10 4 Gillespie simulations over Π with the same parameters as in A. C: Scatter plot of mean extinction times at every initial condition.Abscissa: Exact time T D , from discrete backward equation.Ordinate: Empirical mean time T G , from 10 4 samples.D: Empirical mean time to first extinction event, obtained from 10 4 Gillespie simulations, as a function of Ω.Several values of s along the segment Ω(1 − s) + Ωs 3 , Ωs 3 , Ωs 3 are superimposed.

Fig. 6 .
Fig. 6.Cycle length statistics for the three-pool model.A: Mean cycle length Tc as a function of system size Ω and excitation parameter µ.Mean calculated at each parameter set using 10 4 samples from Gillespie simulations.B: Variance in cycle length as a function of Ω and µ.C: Mean cycle length as a function of Ω, with several values of µ superimposed.Solid thin lines show the approximate mean cycle length given by Equation (5.4), and dashed lines show cycle length of deterministic system from Equation (2.4) for the given value of µ.D: Mean cycle length as a function of µ, with several values of Ω superimposed.Solid thin lines show approximate cycle length given by Equation (5.4), and dashed line shows deterministic cycle length as a function of µ.E: Coefficient of variation (CV) of the cycle length as a function of Ω and µ.F: Difference in CV from Gillespie samples (CVc) and CV from Gamma distribution approximation (CVa), calculated as ∆CV = CVc − CVa, as a function of Ω and µ.
Appendix A. Proof of Proposition 3.1.For the reader's convenience, we restate Proposition 3.1: Let N = (N 1 , N 2 , N 3 ) be the vector of individuals in each population of the GV model.If the per capita death rate d > 0, then the unique stationary distribution of the reaction system Equation (3.1)-(3.5) is π(n) = δ(n).