Abstract
The stochastic simulation algorithm commonly known as Gillespie’s algorithm (originally derived for modelling wellmixed systems of chemical reactions) is now used ubiquitously in the modelling of biological processes in which stochastic effects play an important role. In wellmixed scenarios at the subcellular level it is often reasonable to assume that times between successive reaction/interaction events are exponentially distributed and can be appropriately modelled as a Markov process and hence simulated by the Gillespie algorithm. However, Gillespie’s algorithm is routinely applied to model biological systems for which it was never intended. In particular, processes in which cell proliferation is important (e.g. embryonic development, cancer formation) should not be simulated naively using the Gillespie algorithm since the historydependent nature of the cell cycle breaks the Markov process. The variance in experimentally measured cell cycle times is far less than in an exponential cell cycle time distribution with the same mean.
Here we suggest a method of modelling the cell cycle that restores the memoryless property to the system and is therefore consistent with simulation via the Gillespie algorithm. By breaking the cell cycle into a number of independent exponentially distributed stages, we can restore the Markov property at the same time as more accurately approximating the appropriate cell cycle time distributions. The consequences of our revised mathematical model are explored analytically as far as possible. We demonstrate the importance of employing the correct cell cycle time distribution by recapitulating the results from two models incorporating cellular proliferation (one spatial and one nonspatial) and demonstrating that changing the cell cycle time distribution makes quantitative and qualitative differences to the outcome of the models. Our adaptation will allow modellers and experimentalists alike to appropriately represent cellular proliferation—vital to the accurate modelling of many biological processes—whilst still being able to take advantage of the power and efficiency of the popular Gillespie algorithm.
Introduction
In a wellmixed solution of chemicals undergoing reactions, nonreactive collisions occur far more often than reactive collisions allowing us to neglect the fast dynamics of motion. We can thus assume that the time between reactive collision events is exponentially distributed with rates which are a combinatorial function of the numbers of available reactants (Gillespie 1976, 1977). This premise is the basis of the Gillespie (1976) stochastic simulation algorithm.^{Footnote 1} The Gillespie algorithm has become a ubiquitous algorithm for the simulation of stochastic systems in the biological sciences, in particular in computational systems biology (Szekely and Burrage 2014).
However, the Gillespie algorithm is often used inappropriately to represent processes for which the interevent time is not exponentially distributed. One prevalent example of this is in the simulation of the cell cycle (Baar et al. 2016; CastellanosMoreno et al. 2014; Ryser et al. 2016; Figueredo et al. 2014; Turner et al. 2009; Mort et al. 2016; Zaider and Minerbo 2000) (see Fig. 1). The assumption of memorylessness, and consequently exponentially distributed cell cycle times, means that with high probability a daughter cell may divide immediately after the division event which created it. This is not biologically plausible since each cell is required to pass through the \(G_1, S, G_2\) and M phases of the cell cycle before division, and these phases (in particular Sphase) are ratelimiting.
As an illustrative example, Turner et al. (2009) present a wellmixed stochastic model of small populations of cancer stem cells, which they use to suggest treatment strategies. Both analytical and simulated results pertaining to the survival of the tumour are based on the assumption that interdivision times are exponentially distributed. We demonstrate later in this paper that using the correct cell cycle time distributions (CCTDs) could alter their results leading to different suggested treatment strategies.
Similarly, in a spatially extended context, Baker and Simpson (2010) investigate the role of spatial correlations on individualbased models of cell migration, proliferation and death, designed to represent experimental assays of cell behaviour in culture. The individualbased models they employ are onlattice, volumeexclusion processes in two and three dimensions in which cell migration, proliferation and death events are all considered to be exponentially distributed. Cell proliferation occurs when a cell is chosen to divide, placing a daughter cell at one of its neighbouring lattice sites. Baker and Simpson (2010) demonstrate the effects of spatial correlations in these models. In particular, they find that changing the motility of the cells can alter their net rate of growth in comparison with the logistic growth predicted by a simple mean field assumption which neglects the effects of correlations. This effect is due to the fact that cell motility serves to break up correlations allowing more proliferation events to occur in comparison with the lower motility case. By explicitly considering the correlations between the occupancies of pairs of lattice sites, Baker and Simpson (2010) derive a more accurate populationlevel model which better represents the growth in the number of cells over time for a diverse range of parameter values. We will investigate the effects of incorporating more realistic CCTDs on the outcomes of the model simulations.
NonMarkovian simulation methods exist for events which do not have exponentially distributed interevent times (Boguná et al. 2014). However, these algorithms are often difficult to understand and complex to encode since we are required to keep track of every cell individually. This presents a potential barrier to their use and consequently a barrier to the appropriate modelling of CCTDs.
Given the ubiquity of the Gillespie algorithm, it would be significantly more beneficial if we could decompose the cell cycle into a series of exponentially distributed events which could be naturally encoded in the framework of the Gillespie algorithm. One potential solution to this problem is the use of the hypoexponential family of distributions. It has been suggested that these distributions can be used to accurately represent phases of the cell cycle (and, by closedness of the sums of these distributions, the cell cycle itself) (Stewart 2009). Hypoexponential distributions are made up of a series of k independent exponential distributions, each with its own rate, \(\lambda _i\), in series. If k is large, then these models may face issues of parameter identifiability.
Recently, Weber et al. (2014) have suggested that a delayed hypoexponential distribution (consisting of three delayed exponential distributions in series) could be used to appropriately represent the cell cycle. These delayed exponential distributions represent the \(G_1, S\) and a combined \(G_2/M\) phases of the cell cycle. Their model is an extension of the seminal stochastic cell cycle model of Smith and Martin (1973) who use a single delayed exponential distribution to capture the variance in the cell cycle. Delayed hypoexponential distributions representing periods of the cell cycle have been justified by appealing to the work of Bel et al. (2009). Bel et al. (2009) showed that the completion time for a large class of complex theoretical biochemical systems, including DNA synthesis and repair, protein translation and molecular transport, can be well approximated by either deterministic or exponential distributions.
In this paper, we consider two special cases of the general hypoexponential distribution: the Erlang and exponentially modified Erlang distribution which, in turn, are special cases of the Gamma and exponentially modified Gamma distributions. For reference, their PDFs \(P_E\) and \(P_\mathrm{EME}\), respectively, are given below:
With suitable parameter choices, both distributions have been shown to provide good fits to large numbers of experimentally derived CCTDs (Golubev 2016) (see Fig. 2 for one such example). As special cases of the hypoexponential distributions, these distributions also have the significant advantage that they can be simulated using the ubiquitous Gillespie stochastic simulation algorithm. This will allow for the appropriate representation of CCTDs in stochastic models of cell populations, in contrast to the inappropriate exponentially distributed times which are commonly used (Baar et al. 2016; CastellanosMoreno et al. 2014; Ryser et al. 2016; Figueredo et al. 2014; Turner et al. 2009; Mort et al. 2016; Zaider and Minerbo 2000). Additionally, the two and three parameters (respectively) of the Erlang and exponentially modified Erlang distributions (respectively) simplify parameter identification in comparison with more highly parametrised distributions. These two choices (Erlang and exponentially modified Erlang distributions) are not the only nonmonotone distributions which could be used to appropriately represent the cell cycle. However, they are the general, nonmonotone, hypoexponential distributions with the fewest number of parameters (two for Erlang and three for exponentially modified Erlang). These features will aid parameter identifiability (few parameters) and crucially mean the distributions can be simulated using the Gillespie algorithm (hypoexponentiality), making these the most suitable distributions to consider.
Figure 2 demonstrates the improved agreement between the Erlang and exponentially modified Erlang distributions with the experimental data in comparison with the exponential distribution (c.f. Fig. 1). In each case, the parameters of the distributions are fitted by minimising the sum of squared residuals, \(\Sigma \), between the curve and the bars of the histogram.^{Footnote 2} Clearly, for the exponential distribution (see Fig. 1), the shape of the curve is incorrect. Consequently, the exponential distribution gives a poor representation of the true CCTD with a sum of squared residuals \(\Sigma =1.86\times 10^{6}\). Evidently the Erlang distribution with parameters \(\lambda =0.0083\) and \(k=12\) gives a much better agreement to the experimental data (see Fig. 2a), with a minimised sum of squared residuals, \(\Sigma =1.23\times 10^{7}\). Finally, the exponentially modified Erlang distribution with parameters \(\lambda _1=0.0251, \lambda _2=0.0019\) and \(k=26\) gives an even better agreement to the data^{Footnote 3} with a minimised sum of squared residuals, \(\Sigma =6.01\times 10^{8}\). The exponentially modified Erlang distribution achieves a minimised sum of squared residuals which is around half that of the Erlang distribution. Nevertheless, both Erlang and exponentially modified Erlang are good candidates for fitting cell cycle time data and can both be simulated within the existing Gillespie framework, so will be considered here.
In Sect. 2, we begin by outlining a general hypoexponential model of the cell cycle and noting that many previous models of the cell cycle are special cases. By simplifying the model further, we demonstrate that the Erlang and exponentially modified Erlang distributions are also special cases. In Sect. 3, we consider the special case of the Erlang distributed CCTD in more detail. Undertaking some simple analysis, we derive the expected behaviour of the mean cell number in the case of Erlang CCTDs and demonstrate analytically that significant differences can arise in comparison with models in which exponentially distributed CCTDs are used. In Sect. 4, we demonstrate the utility of our new CCTD representation in stochastic simulations of two biological models in which cellular proliferation is of critical importance. In each case, we show, through simulation, that there are important quantitative and qualitative differences between models which represent cell cycle times appropriately and those which do not. We conclude in Sect. 5 with a short discussion on the implications of our findings.
Multistage Model of the Cell Cycle
We divide the cell cycle (with mean length C) into k stages.^{Footnote 4} The time to progress through each of these stages is exponentially distributed with mean \(\mu _i\). We can represent the progression through these stages of the cell cycle as the following chain of ‘reactions’
where \(\lambda _i=1/\mu _i\).
The CCT under this model is hypoexponentially distributed. Although there is no simple closed form for the probability density function of the hypoexponential distribution, we can find simple expressions for its mean and variance. The mean is given by the sum of the means of the exponentially distributed stage times \(\sum ^k_{i=1}\mu _i=C\), and the variance is the sum of the variances of these stage times, \(\sum ^k_{i=1}\mu ^2_i\). By increasing the number of exponentially distributed stages, whilst decreasing their mean duration (in order to maintain the correct mean CCT), we can arbitrarily decrease the variance of the CCT. Many multistage models of the cell cycle are special cases of this general model (Golubev 2016; Hawkins et al. 2007; Hillen et al. 2010; Hoel and Crump 1974; Kendall 1948; Leander et al. 2014; León et al. 2004; Nakaoka and Inaba 2014; Powell 1955; Smith and Martin 1973; Weber et al. 2014; Zilman et al. 2010).
We can analyse the cell cycle reaction chain (2) further by considering the associated probability master equation (PME). Let \(P(x_1,x_2,\ldots ,x_k,t)\) be shorthand for the probability that there are \(x_1\) cells in stage one, \(x_2\) in stage two and so on. The PME is
By multiplying the PME by \(x_j\) and summing over the state space, we can find the evolution of the mean number of cells, \(M_j=\sum _{\varvec{x}}x_j P\), in each stage, where \(\sum _{\varvec{x}}\) is shorthand for \(\sum ^{\infty }_{x_1=1}\ldots \sum ^{\infty }_{x_k=1}\) and P is shorthand for \(P(x_1,x_2,\ldots ,x_k,t)\). Upon simplification we find the following evolution equations for the mean number of cells in each stage
Identical Rates of Progression
The hypoexponential model’s generality is also a significant drawback since it hampers parameter identifiability (Weber et al. 2014). As such, we seek to reduce the number of free parameters in the model whilst maintaining its ability to accurately represent CCTDs. Several authors have suggested using the Gamma distribution to model CCTDs (Hawkins et al. 2007; Kendall 1948; Nakaoka and Inaba 2014; Zilman et al. 2010). If we assume that all transition rates, \(\lambda _i\), are identically equal to \(\lambda _1\) (for \(i=1,\ldots , k\)) in our general hypoexponential model, then the time to progress through the whole cell cycle is distributed according to the sum of k identically exponentially distributed random variables. It is straightforward to show (using moment generating functions or convolutions) that the CCTD is Erlang distributed with scale parameter \(\mu =C/k\) and shape parameter k. In analogy with the general hypoexponential case, if we decrease \(\mu \) and simultaneously increase k so that \(\mu k = C\) remains constant, the Erlang distribution approaches the Dirac delta function centred on C, demonstrating that we can still arbitrarily reduce the variance to match the distribution we are trying to model.
Analysis of the CCDT with Equal Rates of Progression
We now analyse this CCTD model with identical rates of progression, noting that Kendall (1948) studied this case extensively and we draw on some of his analyses below. Although for this special case it is possible to derive a closed form firstorder partial differential equation for the evolution of the generating function corresponding the master equation (3), solving the associated characteristic equations is analytically intractable for all but the simplest case (\(k=1\)) (Kendall 1948). Instead we will focus on the mean behaviour of the cell population with which we can make some analytical progress.
In the equal rates case, we can sum the individual equations in system (4) to give
where \(M=\sum _i^k M_i\).
Consider the naive one stage (i.e. \(k=1\)) cell cycle mode with mean cell cycle time C:
The evolution of the mean number of cells is given by the special case of Eq. (5):
In the multistage model, under the assumption that all cells are evenly distributed between the stages (i.e. \(M_i=M/k\)), we can replace \(M_k\) with M / k in Eq. (5) to give a closed equation for the evolution of the total number of cells which matches equation (7):
However, the assumption on the even distributions of cells between stages is incorrect. This leads to differences not just, as might be expected, between the variation exhibited by the multistage and singlestage models, but also between their mean behaviour. In Fig. 3a, a clear difference between the \(k=1\) and \(k=4\) models is evident. The mean total cell number grows significantly more slowly in the \(k=4\) case than the \(k=1\) case. This is true for all models in which \(k>1\). Intuitively, exponentially distributed CCTs imply that the most probable time for a cell to divide is the current time. Once a cell has divided, it is immediately able to divide again with high probability allowing cells proliferating under the exponentially distributed CCT assumption to reinforce their numbers. This is in direct contrast to cells with Erlang distributed CCTs (with the same mean but \(k>1\)) which, with high probability, will wait for a period of time before dividing. In short, the larger variance of the exponentially distributed CCT population allows it to grow more rapidly.
Under the assumption of identical transition rates, equation system (4) can be reduced to a closed equation for the mean number of cells in the kth stage
and a set of \(k1\) ODEs which relate the number of cells in the other stages to \(M_k\)
Under the given initial conditions, a single cell in the first stage and no cells in any other stages, we can solve these equations to find
where z is the first nth root of unity (Kendall 1948). Although this expression looks complicated, in some cases it is possible to express \(M_j\) in a simple closed form. For example, when \(k=4\)
A comparison between these analytical solutions and their numerically solved counterparts demonstrates their veracity in Fig. 3b.
By summing Eq. (11) over all values of \(j=1,\ldots ,k\), we can also find an expression for the total number of cells in a population:
Although these formulae [Eqs. (12)–(16)] may be useful in specific cases where the closed form of the solution is readily accessible, their real utility is in shedding light on the asymptotic properties of the mean numbers of cells.
In the limit that t gets large for finite k the dominant term in the summation in Eq. (11) corresponds to the case \(r=0\). Indeed for \(k\le 28\) the real parts of the other elements in the summation are negative, and hence, these terms decay (Kendall 1948). Thus, we have
where
Summing Eq. (17) over all k stages leads to the asymptotic behaviour of the cell population as a whole:
Equation (19) can be rewritten as
For all \(k>1\), not only is the base of the exponent t / C less then e (since \(\alpha _k<1\), for \(k>1\)), but the coefficient is less than unity (Kendall 1948). This implies that, asymptotically, the expected total number of cells in a multistage model will always be less than the number expected in a singlestage cell cycle model [which can be determined upon substituting \(k=1\) in to (19)].
Note that in the limit as \(k\rightarrow \infty , \alpha _k\rightarrow \ln 2\). Thus, as might have been expected for the deterministic model resulting from the limit \(k\rightarrow \infty \), the asymptotic population grows with base 2, doubling at regular intervals as the cells divide synchronously:
Surprisingly though, the coefficient of \(2^{k/C t}\) does not tend to unity in Eq. (21) as might have been expected. Thus, the total population grows like
Reversing the order of limits and taking the limit as k tends to infinity of Eq. (16) for finite t gives the limit
for noninteger value of t / C, where \(\lfloor x\rfloor \) gives the integer part of x (Kendall 1948). For integer values of t / C, the limit is
corresponding to the algebraic mean of the limits of Eq. (23) as integers values are approached from the left and right hand sides. This “deterministic” doubling process is unsurprising since the waiting time distribution tends to a delta function in the \(k\rightarrow \infty \) limit, implying that cell division is synchronous.
Cells are Not Distributed Proportional to Stage Length
Returning to Eq. (4) under the assumption of identical rates of progression through the stages, we can derive corresponding equations for the mean proportion of cells in each stage, \(\hat{M}_j=M_j/M\) for \(j=1,\ldots , k\):
At steady state, we have the following recurrence relations for the mean proportion of cells in each stage
In particular, this implies that \(\hat{M}^{st}_j<\hat{M}^{st}_{j1}\) for \(j=2\ldots k\), so that, at steady state, as we progress through the stages, we will have successively fewer cells in each stage on average (independent of the initial distribution of cells amongst different stages). By solving these recurrence formulae, we can find exact expressions for the steadystate proportions:
In particular, note that
That is to say there are twice as many cells in the first stage as the last stage at steady state when the number of stages is large.
These differences are potentially important for determining average CCTs experimentally. One popular method for determining cell cycle times is to label Sphase cells using two sequentially administered distinct DNA specific labels (Wimber and Quastler 1963; Bokhari and Raza 1992). The administration of the labels is separated by a known time period. By counting cells labelled with one or both labels, and with reference to the known time period of separation, it is possible to calculate the mean duration of the Sphase. Once the proportion of cells in Sphase and the mean duration of Sphase have been determined, it is also possible to calculate the mean cell cycle time for the population (Nowakowski et al. 1989).
The method outlined above implicitly makes the assumption that the number of cells in a particular phase of the cell cycle is proportional to the length of that phase. For the multistage model, we have demonstrated that in the large time limit this is unequivocally not the case. Equation (11) can also be used to show that this phenomenon holds dynamically, although the mathematics is cumbersome. Instead we solve Eq. (4) numerically. Numerical solution also allows us to investigate the more general hypoexponential CCTD model (2) for which no analytical solutions are available. Figure 4a, b displays the evolution of the mean numbers and proportions (respectively) of cells in each stage for equal stage progression rates, \(\lambda _1\), and Fig. 4c, d displays the equivalent for unequal progression rates. The number/proportion of cells in each stage is not proportional to the mean duration of the stage, \(\mu _i\), either at steady state (compare actual steadystate proportions with the stars representing the normalised mean stage durations, \(\mu _i/\sum \mu _i\), in Fig. 4b, d) or dynamically.
The Exponentially Modified Erlang Distribution
Although the identicalstage model, which gives rise to the Erlang distribution for CCTDs, is convenient from a mathematical perspective, it has been shown to have been outperformed by a number of other distributions (Golubev 2010, 2016). In particular, by considering 77 independent CCT data sets, Golubev (2016) has recently shown that one of the most appropriate distributions for representing CCTDs is the exponentially modified Gamma (EMG) distribution. For our purposes, we will require that the shape parameter of the Gamma distribution is be integervalued so that the CCTD is actually an exponentially modified Erlang (EME). This will mean that we can continue to simulate CCTs using a series of exponentially distributed random variables (albeit one of them will have a different rate). Consequently, this will allow us to continue to appropriately simulate processes in which cell division is important using the popular Gillespie algorithm. In order to generate EME distributed CCTs, we modify our multistage cell cycle model as follows:
Note that, in system (29), the rate of progression is identical through each of the initial k stages of cell cycle and that we have added an additional exponentially distributed stage at the end whose rate, \(\lambda _2\), is distinct from the rate, \(\lambda _1\), of the previous k stages.
We can ascertain the probability density function for the EME distribution, \(P_\mathrm{EME}(t)\), by convolving the Erlang (\(P_\mathrm{ER}(t)\)) and exponential (\(P_E(t)\)) distributions as follows,
where \(\lambda _2\) is the rate of the exponential distribution with which we are convolving and, as before, \(\lambda _1\) is the rate of progression through each of the k identical exponentially distributed stages which comprise the Erlang distribution. We can simplify expression (30) to the following formulation
where \(L=\lambda _1\lambda _2\) and \(\Gamma (k,L t)=\int ^{\infty }_{L t}z^{k1}e^{z}\text{ d }z\) is the complementary incomplete gamma function.
We note that it is almost as simple to simulate this more general distribution in the Gillespie algorithm using a series of exponentially distributed stages as it is to simulate the distribution with constant rates of progression between stages. Indeed the simulation of any hypoexponential CCTD is straightforward in the Gillespie algorithm. However, the addition of extra parameters hampers their identifiability when fitting to experimental data and as such we only suggest using the Erlang or exponentially modified Erlang distributions in models of the CCTD.
In the following section, we illustrate the importance of incorporating nonexponentially distributed CCTDs into stochastic simulations of cellular proliferation. For ease of understanding, we concentrate purely on Erlang CCTDs, and note that the parameters are not based on fitted CCTDs but merely chosen for illustrative purposes.
Illustrative Examples
In this section, we recapitulate results from two different models which each assume exponentially distributed CCTs. The first is a wellmixed model of cancer stem cell proliferation and differentiation in a brain tumour. The second is a spatially extended model of cell migration and proliferation mimicking a growthtoconfluence experimental assay. In each case, we alter the CCTD in order to see what effect this has on the qualitative and quantitative results presented in the papers. For clarity, we will restrict ourselves to Erlang distributed CCTs, but note that results are qualitatively similar for exponentially modified Erlang distributed CCTs.
Cancer Stem Cell Maintenance
Turner et al. (2009) investigate the role of subpopulations of cells within a brain tumour possessing stem celllike properties and responsible for maintaining the tumour. In situations (e.g. posttreatment) in which there are small numbers of stem cells, they consider a stochastic model of cell proliferation and differentiation. Stem cells, S, can undergo symmetric division in which the daughter cells possess the same characteristics as the parent cells [see Eq. (32)] and the stem cell population increases. They can also undergo asymmetric selfrenewal in which one stem cell and one progenitor cell, P, are produced [see Eq. (33)] or symmetric proliferation in which two progenitor cells result from a stem cell division [see Eq. (34)]. Cell cycle times are exponentially distributed with rate \(\rho _s\) and fate choices (about which division type to undergo) are made at the point of division. With probability \(r_1\), symmetric division occurs, and with probability \(r_3\), symmetric proliferation occurs. Consequently, with probability \(r_2=1r_1r_3\) asymmetric selfrenewal occurs. These divisions with their effective rates are captured in the reaction system (32)–(34):
Under the assumption of exponential CCTDs, Turner et al. (2009) write down and solve a simple probability master equation for the stem cell population. In particular, they consider the case in which \(r_1>r_3\) which implies a positive net growth rate of the stem cell population. Under this assumption the mean number of cells in the stem cell population and variance can be shown to increase exponentially. Since the model is linear, by appealing to the central limit theorem, Turner et al. (2009) argue that, for large enough cell populations, the exact mean field equations given by
will approximate the stochastic dynamics well.
In order to ensure a more realistic representation of the CCTD, we introduce a multistage cell division process, as suggested above, so that the CCTDs are now Erlang distributed with the same mean, \(\rho _s\), but with shape parameter k (and thus scale parameter \(\mu _1=1/(\rho _s k)\)). As, before, at each division event, a choice about the type of division to occur is made with the same probabilities (\(r_1,r_2, r_3\)) as previously specified. Although we still get exponential increases in the mean and variance, the rate of increase is significantly decreased (see Fig. 5a, b). Crucially this means that the deterministic mean field model derived from the original process will significantly overestimate the number of cancer stem cells in the tumour which could have significant therapeutic implications.
Under this model with \(r_1>r_3\), if a tumour is not completely eradicated by treatment, it is possible that it can return. It may therefore be informative to know the probability that a tumour will reach a certain size by a particular time in order to plan appropriate followup therapeutic intervention. For example, we may be interested to know the evolution of the probability that the tumour has reached 1000 stem cells in size (which we will denote \(p_{1000}(t)\)) so that we might calculate the most appropriate time to initiate the followup intervention. In Fig. 5c, we plot the evolution of \(p_{1000}(t)\) over time. It is clear, by \(t=100\), that the probability of the tumour having grown to 1000 stem cells, \(p_{1000}\) varies significantly depending on the value of k used in the model despite the cells having the same mean CCT.^{Footnote 5} The effects of varying CCTD can clearly be seen to influence the model outcome even in this relatively straightforward linear model of cellular proliferation. In more complex models, in which other species depend in a nonlinear manner on the number of cells, the effects will no doubt be further exacerbated. The potential for therapeutic interventions to be based on stochastic mathematical models of cellular proliferation further emphasises the importance of modelling the CCTD correctly.
GrowthtoConfluence Assays
Next we investigate the effect of incorporating a more realistic model of cell proliferation on the behaviour of a spatially extended individuallevel model of cell migration and proliferation (Baker and Simpson 2010). As such, we alter the mechanism of cellular proliferation from the original, exponentially distributed division times to our more realistic multistage Erlang distributed division times and observe the effect this has on the growth of the cell population. In order to achieve this, we break the proliferation process into k stages, the passage through each of which has an exponentially distributed waiting time (as described above). As before, we chose the parameter of each stages’ waiting time to ensure we have the same mean proliferation attempt time as in the original model.
In more detail, we consider a volumeexclusion process on a regular, square lattice in two dimensions with periodic boundary conditions. Each lattice site, of length h, can hold at most one cell. Each repeat realisation begins by initialising, particles uniformly at random across the \(L_x\times L_y\) sites of the lattice. Agents can move between adjacent (in the von Neumann sense) lattice sites with rate \(P_m\). Movement is unbiased, meaning that once a cell has been chosen to move it does so into one of its four neighbouring lattice sites with equal probability. If the site into which a cell attempts to move is already occupied, then that movement event is aborted: the cell attempting movement remains at its current site.
Agents undergo a proliferation stage change with rate \(P_p k\) (giving average rate \(P_p\) for unhindered progression through the k stages required for division); this results in the cell’s current proliferation stage being incremented by one if the cell is currently in one of the first \(k1\) stages. If the cell is in the final stage (stage k) of proliferation and is selected to change stage, then the cell attempts to place a daughter in one of its four neighbouring lattice sites with equal probability. If the chosen site is empty, the cell places a daughter in the empty site and the proliferation stages of both the parent and the daughter are reset to unity. However, if the cell attempts to place a daughter in a site which is already occupied, then that proliferation event is aborted. In the multistage model of the cell cycle, we then have two choices:

(1)
the progression stage of the cell attempting proliferation is reset to unity;

(2)
the cell remains in the kth stage.
In the original model in which \(k=1\), these two choices are identical. Under implementation (1), cells would have the same average rate of division attempts as in the original model. However, it could be argued that implementation (2) is more realistic as real cells do not reverse through the cell cycle if division is not favourable, but remain held at checkpoints (Alberts et al. 2002). We will investigate both possibilities. In order to clearly distinguish the effects of the different CCTDs, we will not consider cell death in our simulations. For different values of \(P_p\), the population will naturally grow at different rates. As in Baker and Simpson (2010), we will rescale time, \(\bar{t}=P_p t\), in order to make population evolutions comparable.
Figure 6 shows example snapshots of the domain occupancy at rescaled time \(\bar{t}=10\) for three different values of \(k=1,10,100\). Panels (a)–(c) Represent implementation (1) for \(k=1,10,100\), respectively. Panels (d)–(f) represent implementation (2) for \(k=1,10,100\), respectively. Spatial correlations in the occupancies of lattice sites (clusters) are clearly visible in all cases.
In the multistage model, implementation (1) generally leads to less dense colonies than implementation (2) since cells do not attempt division as frequently. Under implementation (2) (see Fig. 6e, f), a clear proliferating rim of (grey) cells can be seen with the bulk of cells being kept at stage k (black). Under implementation (1), every cell can be found in any stage of the cell cycle so it is hard to distinguish the proliferating rim (see Fig. 6b, c). The difference between the two implementations, however, is not due to aborted proliferation events in the bulk (away from the rim) but to the ability of cells at the proliferating rim to rapidly undergo a further division attempt after an aborted attempt under implementation (2). This suggests that the difference between the two implementations will only be apparent at high densities for which correlations have built up and significant numbers of division attempts are being aborted.
For lowdensity systems, in which very few particles are adjacent, the mean cell division attempt times are almost the same for all values of k independent of the implementation ((1) or (2)). However, the variance in the CCTDs for lowdensity systems affects the rate of growth with larger values of k (less variance in the CCTD) generally leading to slower growth. This effect can be understood by considering Eqs. (16) and (19) for a nonexcluding population of cells, for which the finite time and asymptotic time behaviours, respectively, of cell populations with different values of k can be contrasted.
In Fig. 7, we contrast the evolution of the spatially averaged density for three values of \(k=1, 10, 100\) and three proliferation rates, \(P_p=0.05, 0.5, 1\), under implementation (1) [(a)–(c)] and implementation (2) [(d)–(f)].
Under implementation (1) (see Fig. 7a–c), even as cell–cell correlations build up, multistage cells still proliferate more slowly than singlestage cells, since the mean division attempt time remains the same for all values of k. The increased variance of cells with fewer stages results in faster population growth.
However, under the more realistic implementation (2) (see Fig. 7d–f), cells with multistage cell cycles are able to reattempt division after abortive division events more quickly than they otherwise could under the singlestage cell cycle model. Thus, effective average CCTs for cells with a multistage cell cycle at the proliferating rim of a cluster decrease in comparison with cells with a singlestage cell cycle. The faster the pairwise correlations build up, the more pronounced this effect becomes. With a very low proliferation rate (in comparison with fixed motility—see Fig. 7d), cell movement is effective at breaking up correlations meaning that large clusters do not form and that we only see the effect of decreasing mean CCT for larger values of k at late (scaled) times, when the density is higher. Contrastingly, when proliferation is high in comparison with motility (see Fig. 7f), clusters can form quickly preventing the bulk of cells from proliferating earlier and allowing the cells with multistage cell cycles to divide faster on average at the proliferating rim of these clusters than their singlestage counterparts.
It is also worth noting that the greater synchrony in cell division times for larger values of k [exemplified by Eqs. (23)–(24) for the limit of infinite k] is in evidence in the jagged nature of the yellow curves (corresponding to \(k=100\)) in all six subfigures.
Discussion
Currently, many stochastic models which incorporate cell proliferation employ the ubiquitous Gillespie stochastic simulation algorithm (Gillespie 1976, 1977). Unfortunately, in its basic form the Gillespie algorithm represents all events as exponentially distributed. Cell cycle times are not exponentially distributed and cannot therefore be represented by a single reaction event in the Gillespie algorithm. Modelling cell cycle times as a single exponentially distributed event can lead to significant alterations in model behaviour in comparison with more appropriate CCTDs. Consequently, we postulated a simple, general hypoexponentially distributed CCT which can be broken down into exponentially distributed stages allowing for straightforward simulation with the popular Gillespie algorithm. To account for ease of parameter identification, we suggested two special cases of this more general model which have been shown to do an excellent job of recapitulating CCTDs (Golubev 2016).
We postulate that the general hypoexponential distribution (Zhou and Zhuang 2007) or even the more specific Erlang (Gibson and Bruck 2000; Svoboda et al. 1994) or exponentially modified Erlang (Lucius et al. 2003) interevent distribution time models could be used to allow the simplified simulation of complex biochemical and biophysical processes [e.g. enzymatic reactions (Nelson and Cox 2005), allosteric transitions in ion channels (Qin and Li 2004), the movement of molecular motors (Schnitzer and Block 1995), DNA unwinding (Lucius et al. 2003)] using the Gillespie algorithm. More generally, nonMarkovian processes for which only the interevent distribution, rather than the mechanism which generates this distribution, is important might be simulated efficiently using our proposed mechanism (Gibson and Bruck 2000; Floyd et al. 2010; Lucius et al. 2003; Zhou and Zhuang 2007).
We employed our improved model of cell cycle proliferation times on two recent models of real biological processes (Turner et al. 2009; Baker and Simpson 2010). In each case, we found that the incorporation of multiple stages to the cell cycle led to significant differences in the population size in comparison with the original exponentially distributed CCT model. We suggest that these difference will hold more generally throughout stochastic models in which CCTs are currently modelled as exponential. In particular, we intend to investigate the effects of our modified CCTD on the speed of invasion of a population of migrating and proliferating cells.
The application here of hypoexponentially distributed CCTs built up from a number of intermediary exponential stages assumes that the CCT is not correlated between direct descendants or within a given generation. Whilst there are scenarios in which there is no evidence for a correlation in CCT between related cells (Schultze et al. 1979), there are other situations in which this assumption is clearly invalid (Duffy et al. 2012; Hawkins et al. 2009). It is possible that some of these correlation effects can be attributed to the environment in which the cells are proliferating. However, in NIH 3T3 cells a clear correlation has been observed between daughter cells of a given mitotic event compared to more distant relatives; implying a heritable predisposition (Mort et al. 2014). Therefore, one obvious extension to this work would be to incorporate the effects of correlations in cell and phase times to better reflect the biological heterogeneity of a given system.
Change history
08 August 2019
Equations (9) and (10) were transcribed incorrectly.
08 August 2019
Equations (9) and (10) were transcribed incorrectly.
Notes
 1.
 2.
Note that even simpler methods of fitting are possible. In particular, given the first two (three) moments for the Erlang (exponentially modified Erlang) distribution, parameters can be identified by matching moments. This implies the knowledge of the whole distribution of cell cycle times is not necessary. For the Erlang distribution, one would need only the mean and variance of cell cycle times. For the exponentially modified Erlang distribution, the skewness would also be required.
 3.
The threeparameter exponentially modified Erlang distribution may be poorly constrained for some data sets. That is to say there are several values of the parameters which give almost equally good fits to the data. Keeping the aim of carrying out stochastic simulation using the determined distribution in mind, we suggest a preference for acceptable parameter values with the smallest possible value of k. The larger the value of k, the more “reaction channels” the Gillespie algorithm must account for which, if ratelimiting, will significantly reduce the algorithm’s performance.
 4.
Note these stages do not necessarily correspond to the traditional (\(G_1, S, G_2\) and M) phases of the cell cycle. Indeed, there is good evidence that at least one of the phases is not exponentially distributed (Hahn et al. 2009). Rather they are arbitrary division of the cell cycle which will allow us to recreate the correct CCTD.
 5.
Note that by varying k with a fixed cell cycle time we are implicitly varying \(\lambda \), the rate of progression through each stage.
 6.
Note in step 1, that in the case \(j=1\) we assume \(\sum _{j'=1}^{0}=0\).
References
Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson JD (2002) Molecular biology of the cell, 4th edn. Garland Science, New York
Baar M, Coquille L, Mayer H, Hölzel M, Rogava M, Tüting T, Bovier A (2016) A stochastic model for immunotherapy of cancer. Sci Rep 6:24169
Baker RE, Simpson MJ (2010) Correcting meanfield approximations for birthdeathmovement processes. Phys Rev E 82(4):041905
Bel G, Munsky B, Nemenman I (2009) The simplicity of completion time distributions for common complex biochemical processes. Phys Biol 7(1):016003
Boguná Marian, Lafuerza Luis F, Toral Raúl, Serrano M Ángeles (2014) Simulating nonMarkovian stochastic processes. Phys Rev E 90(4):042108
Bokhari SAJ, Raza A (1992) Novel derivation of total cell cycle time in malignant cells using two dnaspecific labels. Cytom Part A 13(2):144–148
CastellanosMoreno A, CastellanosJaramillo A, CorellaMadueño A, GutiérrezLópez S, RosasBurgos R (2014) Stochastic model for computer simulation of the number of cancer cells and lymphocytes in homogeneous sections of cancer tumors. arXiv preprint arXiv:1410.3768
Doob JL (1945) Markoff chainsdenumerable case. Trans Am Math Soc 58(3):455–473
Duffy KR, Wellard CJ, Markham JF, Zhou JHS, Holmberg R, Hawkins ED, Hasbold J, Dowling MR, Hodgkin PD (2012) Activationinduced b cell fates are selected by intracellular stochastic competition. Science 335(6066):338–341
Feller W (1940) On the integrodifferential equations of purely discontinuous Markoff processes. Trans Am Math Soc 48(3):488–515
Figueredo GP, Siebers PO, Owen MR, Reps J, Aickelin U (2014) Comparing stochastic differential equations and agentbased modelling and simulation for earlystage cancer. PLoS ONE 9(4):e95150
Floyd DL, Harrison SC, Van Oijen AM (2010) Analysis of kinetic intermediates in singleparticle dwelltime distributions. Biophys J 99(2):360–366
Gibson MA, Bruck J (2000) Efficient exact stochastic simulation of chemical systems with many species and many channels. J Phys Chem A 104(9):1876–1889
Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22(4):403–434
Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361
Golubev A (2010) Exponentially modified Gaussian (EMG) relevance to distributions related to cell proliferation and differentiation. J Theor Biol 262(2):257–266
Golubev A (2016) Applications and implications of the exponentially modified gamma distribution as a model for time variabilities related to cell proliferation and gene expression. J Theor Biol 393:203–217
Hahn AT, Jones JT, Meyer T (2009) Quantitative analysis of cell cycle phase durations and pc12 differentiation using fluorescent biosensors. Cell Cycle 8(7):1044–1052
Hawkins ML, Turner ED, Dowling MR, Van Gend C, Hodgkin PD (2007) A model of immune regulation as a consequence of randomized lymphocyte division and death times. Proc Natl Acad Sci USA 104(12):5032–5037
Hawkins ED, Markham JF, McGuinness LP, Hodgkin PD (2009) A singlecell pedigree analysis of alternative stochastic lymphocyte fates. Proc Natl Acad Sci USA 106(32):13457–13462
Hillen T, De Vries G, Gong J, Finlay C (2010) From cell population models to tumor control probability: including cell cycle effects. Acta Oncol 49(8):1315–1323
Hoel DG, Crump KS (1974) Estimating the generationtime distribution of an agedependent branching process. Biometrics 30(1):125–135
Kendall DG (1948) On the role of variable generation time in the development of a stochastic birth process. Biometrika 35(3/4):316–330
Kurtz TG (1972) The relationship between stochastic and deterministic models for chemical reactions. J Chem Phys 57(7):2976–2978
Leander R, Allen EJ, Garbett SP, Tyson DR, Quaranta V (2014) Derivation and experimental comparison of celldivision probability densities. J Theor Biol 359:129–135
León K, Faro J, Carneiro J (2004) A general mathematical framework to model generation structure in a population of asynchronously dividing cells. J Theor Biol 229(4):455–476
Lucius AL, Maluf NK, Fischer CJ, Lohman TM (2003) General methods for analysis of sequential “nstep” kinetic mechanisms: application to single turnover kinetics of helicasecatalyzed dna unwinding. Biophys J 85(4):2224–2239
Mort RL, Ford MJ, SakaueSawano A, Lindstrom NO, Casadio A, Douglas AT, Keighren MA, Hohenstein P, Miyawaki A, Jackson IJ (2014) Fucci2a: a bicistronic cell cycle reporter that allows cre mediated tissue specific expression in mice. Cell Cycle 13(17):2681–2696
Mort RL, Ross RJH, Hainey KJ, Harrison OJ, Keighren MA, Landini G, Baker RE, Painter KJ, Jackson IJ, Yates CA (2016) Reconciling diverse mammalian pigmentation patterns with a fundamental mathematical model. Nat Commun 7:10288
Nakaoka S, Inaba H (2014) Demographic modeling of transient amplifying cell population growth. Math Biosci Eng 11(2):363–384
Nelson D, Cox M (2005) Lehninger principles of biochemistry. Wiley Online Library, New York
Nowakowski RS, Lewin SB, Miller MW (1989) Bromodeoxyuridine immunohistochemical determination of the lengths of the cell cycle and the DNAsynthetic phase for an anatomically defined population. J Neurocytol 18(3):311–318
Powell EO (1955) Some features of the generation times of individual bacteria. Biometrika 42(1/2):16–44
Qin F, Li L (2004) Modelbased fitting of singlechannel dwelltime distributions. Biophys J 87(3):1657–1671
Ryser M, Lee W, Ready N, Leder K, Foo J (2016) Quantifying the dynamics of field cancerization in HPVnegative head and neck cancer: a multiscale modeling approach. Cancer Res 76(24):7078–7088
Schindelin J, ArgandaCarreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, Tinevez JY, White DJ, Hartenstein V, Eliceiri K, Tomancak P, Cardona A (2012) Fiji: an opensource platform for biologicalimage analysis. Nat Methods 9(7):676–682
Schneider CA, Rasband WS, Eliceiri KW (2012) Nih image to ImageJ: 25 years of image analysis. Nat Methods 9(7):671
Schnitzer MJ, Block SM (1995) Statistical kinetics of processive enzymes. In: Cold spring harbor symposia on quantitative biology, vol 60. Cold Spring Harbor Laboratory Press, New York pp 793–802
Schultze B, Kellerer AM, Maurer W (1979) Transit times through the cycle phases of jejunal crypt cells of the mouse. Cell Tissue Kinet 12(4):347–359
Smith JA, Martin L (1973) Do cells cycle? Proc Natl Acad Sci USA 70(4):1263–1267
Stewart WJ (2009) Probability, Markov chains, queues, and simulation: the mathematical basis of performance modeling. Princeton University Press, Princeton
Svoboda K, Mitra PP, Block SM (1994) Fluctuation analysis of motor protein movement and single enzyme kinetics. Proc Natl Acad Sci USA 91(25):11782–11786
Szekely T, Burrage K (2014) Stochastic simulation in systems biology. Comput Struct Biotechnol J 12(20):14–25
Turner C, Stinchcombe AR, Kohandel M, Singh S, Sivaloganathan S (2009) Characterization of brain cancer stem cells: a mathematical approach. Cell Prolif 42(4):529–540
Weber TS, Jaehnert I, Schichor C, OrGuil M, Carneiro J (2014) Quantifying the length and variance of the eukaryotic cell cycle phases by a stochastic model and dual nucleoside pulse labelling. PLoS Comput Biol 10(7):e1003616
Wimber DE, Quastler H (1963) A \(^14\)c and \(^3\)hthymidine double labeling technique in the study of cell proliferation in Tradescantia root tips. Exp Cell Res 30(1):8–22
Zaider M, Minerbo GN (2000) Tumour control probability: a formulation applicable to any temporal protocol of dose delivery. Phys Med Biol 45(2):279
Zhou Y, Zhuang X (2007) Kinetic analysis of sequential multistep reactions. J Phys Chem B 111(48):13600–13610
Zilman A, Ganusov VV, Perelson AS (2010) Stochastic models of lymphocyte proliferation and death. PLoS ONE 5(9):e12775
Acknowledgements
Open access funding provided by University of Bath. This collaboration was supported by an LMS research in pairs grant (Grant #41461). Dr. Richard Mort is supported by funding from the NC3Rs and Medical Research Scotland (Grant #NC/K001612/1 and #NC/M001091/1). Dr. Christian Yates would like to thank the CMB/CNCB preprint club for constructive and helpful comments on a preprint of this paper.
Author information
Affiliations
Corresponding author
Additional information
The original version of this article was revised due to a retrospective Open Access order.
Appendices
Appendix A: Materials and Methods
In order to determine cell cycle times in cell culture, NIH 3T3 FlpIn cells (Mort et al. 2014) were seeded at various densities in phenolredfree Dulbecco’s modified eagle medium (DMEM) containing 10% fetal calf serum, 1% penicillin/streptomycin and \(100\,\upmu \hbox {g/ml}\) hygromycin B on glass bottomed 24well culture plates (Greiner bioone, UK). The next day, timelapse imaging was performed on subconfluent wells with a 20\(\times \) objective using a Nikon A1R inverted confocal microscope in a heated chamber supplied with 5% CO\(_2\) in air. The time elapsed between mitotic events was measured using the Fiji (Schindelin et al. 2012) distribution of ImageJ—an open source image analysis package based on NIH Image (Schneider et al. 2012).
Appendix B: Psuedocode for the Multistage Model of the Cell Cycle
One of the original (and most popular) implementations of the mathematically exact SSA is known as the direct method (Gillespie 1977). Here we present pseudocode for the direct method implementation of the simple multistage model of the cell cycle (corresponding to Erlang distributed CCTs) in a wellmixed context.
Let \(\varvec{X}(t)\) be a vector of length k which contains the number of cells in each stage at time t. A time interval \(\tau \), until the next stage advancement event, is generated. Along with it, an index j is chosen which determines from which stage a cell will advance from time \(t+\tau \). The changes in the numbers of cells caused by the stage advancement are implemented, the propensity functions (progression rate, \(\lambda \), multiplied by the number of cells in each stage) are altered accordingly, and the time is updated, ready for the next (\(\tau ,j\)) pair to be selected. A method for the implementation of this algorithm is given below:

1.
Initialize the time \(t=t_0\) and the number of cells in each stage, \(\varvec{X}(t_0)=\varvec{x}_0\).

2.
Evaluate the propensity functions, \(a_j(\varvec{X}(t))=\lambda X_j(t)\), (for \(j=1,\ldots ,k\)) associated with the advancement of cells from their respective stages, and their sum \(a_0(\varvec{X}(t))=\sum ^k_{j=1}a_j(\varvec{X}(t))\).

3.
Generate two random numbers \(rand_1\) and \(\hbox {rand}_2\) uniformly distributed in (0, 1).

4.
Use \(rand_1\) to generate a time increment, \(\tau \), an exponentially distributed random variable with mean \(1/a_0(\varvec{X}(t))\). i.e.
$$\begin{aligned} \tau =\frac{1}{a_0}\ln \left( \frac{1}{rand_1}\right) . \end{aligned}$$ 
5.
Use \(rand_2\) to generate index of the next stage advancement event, j, with probability \(a_j(\varvec{X}(t))/a_0(\varvec{X}(t))\), in proportion with its propensity function, i.e. find j such that^{Footnote 6}
$$\begin{aligned} \displaystyle \sum _{j'=1}^{j1}a_{j'}(\varvec{X}(t))<a_0(\varvec{X}(t))\cdot rand_2<\displaystyle \sum _{j'=1}^{j}a_{j'}(\varvec{X}(t)). \end{aligned}$$ 
6.
Update the time, \(t=t+\tau \), and the state vector to reflect the advancement of one cell from the chosen stage,
$$\begin{aligned} X_j=X_j1 \quad&\text {if}\quad j = 1,\ldots ,k \\&\text {and}\\ X_{j+1}=X_{j+1}+1 \quad&\text {if}\quad j\ne k, \\&\text {or}\\ X_1=X_1+2\quad&\text {if}\quad j = k. \end{aligned}$$ 
7.
If \(t<t_\mathrm{final}\), the desired stopping time, then go to step (1). Otherwise stop.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Yates, C.A., Ford, M.J. & Mort, R.L. A Multistage Representation of Cell Proliferation as a Markov Process. Bull Math Biol 79, 2905–2928 (2017). https://doi.org/10.1007/s1153801703564
Received:
Accepted:
Published:
Issue Date:
Keywords
 Cell cycle
 Markovian representation
 Stochastic simulation
 Gillespie algorithm
 Exponentially modified Erlang