1 Introduction

The choice of a stochastic or deterministic model for problems in mathematical biology is not always simple. Deterministic models are in general much cheaper to solve, and often the most appropriate if the scale of the system is large, but stochastic effects can still be important on the macroscale (Black et al. 2009; Butler and Goldenfeld 2011; Rogers et al. 2012; Black and McKane 2012). Examples that motivate this work are populations that initially start from small numbers (cells, virions, invading species, infected individuals, etc.). Due to the random nature of the events, these populations initially go through a period of noisy dynamics (and possible extinction), before entering an exponential growth phase (Black et al. 2014). The effect of this early time noise is not averaged out in large systems but instead persists on the macroscale (Baker et al. 2018). For populations that grow to a maximum before declining, as seen in a susceptible-infected-recovered (SIR) model, this noise manifests as variability in the time for the population to peak (Nitschke et al. 2022; Curran-Sebastian et al. 2024). A deterministic approximation to the full stochastic process can accurately capture the large scale dynamics (the early growth rate and shape of the curves at the peak), but fails to capture the variability in the time to peak.

Recent work has shown that the effect of this type of stochasticity on macroscopic population dynamics can be captured by a single univariate random variable representing a time-shift applied to the initial conditions of a deterministic approximation to the full stochastic system (Barbour et al. 2015; Baker et al. 2018; Bauman et al. 2023). Although Barbour et al. (2015) present the analytical theory for these time-shift distributions and when they are valid, they do not present a method to actually calculate the distributions for general models. In this paper we fill this gap by showing how to numerically compute the time-shift distributions for a broad class of continuous- and discrete-time Markov chain models. Our work therefore greatly expands the applicability of these random time-shift ideas for applied modelling.

The key to calculating the time-shift distribution comes from a branching process (BP) approximation to the full model that is valid at early times. The BP approximation can accurately quantify the variability in the population numbers due to the initial stochasticity before the exponential growth phase is entered. In the long time limit, a suitably re-scaled version of this process tends to a limiting distribution, which is the distribution of a random variable often denoted W in the literature (Athreya and Ney 1972, Chapters III.7 and V; Mode 1971, Chapter 1.8; Harris 1964, Chapter VI.19). Matching the solution of the BP with a linearised version of a deterministic approximation to the same system shows that the time-shift distribution and W are related via a simple transformation (Barbour et al. 2015). Although analytic calculation of W for a few simple models is possible (Harris 1948; 1951; Kimmel and Axelrod 2015, Chapter 3.1.4), in general this appears to be a hard problem, especially for multivariate models. Hence we instead develop an efficient numerical scheme to evaluate W. This is based on a Taylor series expansion of the Laplace-Stieltjes transform (LST) of the distribution, the terms of which can be calculated via differentiation of an implicit functional equation. Similar to the operation of automatic differentiation routines (Bartholomew-Biggs et al. 2000), we show that this procedure can be encoded as a series of rules for a given model structure. These insights lead to an efficient algorithm and we provide a numerical package to automate this computation for a given model. We also present another approximate, but even faster, method that employs moment matching, whereby the analytical moments of a surrogate distribution are matched to those of W.

Time-shift distributions are useful in themselves for quantifying the role of early stochasticity in population models, but also suggest an elegant and fast method for the generation of macroscopic solutions that also capture this effect. This simulation method requires a single solution of the deterministic approximation (typically found by solving a set of coupled ordinary differential equations) that is then replicated many times, and each replicate then shifted in time by a sample from the univariate time-shift distribution. This simulation approach can be considered as a type of hybrid simulation method (Rebuli et al. 2017; Kreger et al. 2021), but with a greatly reduced computational cost as only one deterministic trajectory needs to be simulated. The reduced computational cost is particularly useful for applications such as Bayesian inference (Wilkinson 2019, Chapter IV; Kreger et al. 2021), where the use of exact (Gillespie 1977), or even approximate (Gillespie 2001), stochastic simulation methods can become computationally expensive for large systems. Moreover, the time-shift distribution can also be used as an importance sampling distribution for even more efficient sequential inference algorithms (Kroese et al. 2011, Chapter 6; Black 2018).

In the next section we present an example to illustrate the basic idea of approximating the macroscopic stochastic dynamics of a model with a deterministic model subject to random initial conditions. We then present our general method for computing W, and hence the time-shift distribution, in Sect. 3.

2 Example: SIR time-shift distribution

We begin with a discussion of the time-shift distribution for an SIR model. The aim of this example is to provide an overview of what a time-shift distribution is and how this arises naturally from an early time analysis of the model. We choose this particular example as its simplicity allows a transparent presentation of the main ideas; in addition, analytic expressions for the main results can also be derived, but these are deferred to Sect. 4.1.

The SIR model, in a population of fixed size \(N\), is formulated as a continuous-time Markov chain (CTMC) where the state of the system at time \(t\) is given by \(\varvec{X}(t) = (S(t), I(t))\), and \(S(t)\) and \(I(t)\) are the number of susceptible and infectious individuals respectively (Allen 2017). An infected individual creates infectious contacts at a rate \(\beta \) and if the contact is with a susceptible individual, they become infected. Infected individuals each recover independently at a rate \(\gamma \), and hence the mean infectious period is \(1/\gamma \). The possible transitions and corresponding rates are summarised in Table 1. We consider a fixed initial number of infectious individuals \(I(0) = I_0\) and hence \(\varvec{X}(0) = (N - I_0, I_0)\). The regime we are interested in is where the population size N is large, but the initial number of infected, \(I_0\), is small.

Table 1 Change in state variables and rates for the CTMC SIR model assuming current state is \(X(t)=(m,n)\)

For the SIR model a deterministic approximation, valid in the limit \(N\rightarrow \infty \), can be derived for the densities \(s(t) = N^{-1} S(t)\) and \(i(t) = N^{-1} I(t)\) (Kurtz 1976), which are the solutions to the ordinary differential equations (ODEs) (McKendrick 1914)

$$\begin{aligned} \frac{ds(t)}{dt} = -\beta s(t) i(t), \qquad \frac{di(t)}{dt} = \beta s(t) i(t) - \gamma i(t). \end{aligned}$$
(1)
Fig. 1

Panel (a): Number of infected individuals (\(\log _{10}\)) simulated from the SIR model starting with a single infectious individual in a population of \(10^6\) with model parameters \((\beta , \gamma ) = (0.95, 0.5)\). Five realisations from the stochastic model, conditional on non-extinction, are shown in grey. The solution to the deterministic approximation (Ni(t) where i(t) is the solution of Eq. (1)) is shown in black. The two coloured regions roughly identify the period during which the microscopic stochastic dynamics dominate (green) and the exponential growth phase (red). Panel (b): A histogram of the timing of the peak from \(10^5\) stochastic simulations (color figure online)

Realisations of the stochastic dynamics along with the deterministic solution are shown in Fig. 1a, where \(I_0=1\) and \(N=10^6\). At early times (green shaded region), the process is strongly affected by stochasticity due to the small numbers of individuals. Once the number of infected becomes large enough, the growth becomes exponential (red region), but the random timing of events during the early period affects the transition time at which this occurs. Over longer time scales, susceptibles become significantly depleted and the non-linearity in the transmission rate becomes important; the exponential growth phase ends and the number of infected peaks and then declines. What is clear from the realisations is that the stochasticity from the early time dynamics is not averaged out, but persists and is reflected on the macroscale in the random time for the number of infected to peak (Fig. 1b). For example, a realisation that, by chance, takes a long time to enter the exponential growth phase will also peak much later. It can be seen that the deterministic approximation (black curve) captures the macroscopic dynamics well (the exponential growth rate and shape of the curves at the peak), but does not by itself capture the stochasticity in the time to peak.
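Realisations such as those in Fig. 1a can be produced with a standard Gillespie (exact stochastic simulation) algorithm applied to the rates in Table 1. The following is a minimal Julia sketch of this, for illustration only; the function name and default arguments are ours and not part of the numerical package described later.

```julia
# Minimal Gillespie simulation of the CTMC SIR model of Table 1 (a sketch).
using Random

function simulate_sir(N, I0, beta, gamma; tmax = 100.0, rng = Random.default_rng())
    S, I, t = N - I0, I0, 0.0
    ts, Is = [t], [I]
    while I > 0 && t < tmax
        rate_inf = beta * S * I / (N - 1)   # infection rate in state (S, I) = (m, n)
        rate_rec = gamma * I                # recovery rate
        total = rate_inf + rate_rec
        t += randexp(rng) / total           # exponentially distributed time to next event
        if rand(rng) * total < rate_inf
            S -= 1; I += 1                  # infection event
        else
            I -= 1                          # recovery event
        end
        push!(ts, t); push!(Is, I)
    end
    return ts, Is
end

# One realisation with the parameters used in Fig. 1
ts, Is = simulate_sir(10^6, 1, 0.95, 0.5)
```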

As we shall see in the next two sections, this stochastic behavior at the macroscale can be captured by randomly shifting in time the initial conditions of the deterministic solution. The distribution of this time-shift can be found by equating two early time approximations of the model: a branching process approximation and a linearised deterministic approximation for the mean. Our major contribution in the manuscript is to provide a method to compute this distribution for a general class of discrete- and continuous-time Markov chain models, described in Sect. 3.

2.1 Early time approximations to the SIR model

The analysis begins by considering the early-time approximation of the SIR model when N is large and \(S(0)\approx N\), meaning susceptible depletion can be ignored and the rate of infection is approximately linear, i.e. \(\beta m n / (N - 1) \approx \beta n\). Hence, as the rate of recovery is already linear, the number of infected individuals can be approximated with a continuous-time branching process (Dorman et al. 2004). Continuous-time branching processes are defined in more detail in Sect. 3.1. The number of individuals infected at time \(t\) is then given by the random variable \(I_b(t)\).

We next consider two approximations to the branching process dynamics. For the first, define

$$\begin{aligned} W(t) = e^{-\lambda t} I_b(t) , \end{aligned}$$
(2)

a rescaled branching process with initial condition \(W(0) = I_0\). The parameter \(\lambda \) is the so-called Malthusian parameter or early growth rate (Dorman et al. 2004), and for the SIR model is simply \(\lambda = \beta - \gamma \) (Allen 2017). The process \(W(t)\) is well studied in the literature: it is a martingale and converges to a random variable \(W\) almost surely (e.g. Barbour et al. 2015; Athreya and Ney 1972, Chapter III.7). Let us pause to interpret the limiting behaviour when the time is long enough such that \(W(t) \approx W\). In this limit, the number of infected \(I_b(t) \approx e^{\lambda t} W\) only depends on t in the exponent, and hence the dynamics have entered the exponential growth phase. Substituting this approximation into Eq. (2) and rearranging, we have,

$$\begin{aligned} I_b(t) \approx \exp {\left( \lambda \left( t + \lambda ^{-1}\log {W}\right) \right) }. \end{aligned}$$
(3)

In this equation, W has been placed in the exponent so that we can later interpret it as a shift to the time t.
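The convergence of \(W(t)\) can be checked directly by simulating the linear birth-death process that approximates the early-time SIR dynamics. The Julia sketch below is our own illustration (the function name and the choice \(T = 20\) are assumptions, not taken from the source); it simulates \(I_b(t)\) up to time \(T\) and returns \(W(T) = e^{-\lambda T} I_b(T)\), which for large \(T\) is approximately a draw of \(W\).

```julia
# Simulate the linear birth-death branching process approximating the early SIR
# dynamics and return the rescaled value W(T) = exp(-lambda*T) * I_b(T), Eq. (2).
using Random

function simulate_W(beta, gamma, I0, T; rng = Random.default_rng())
    lambda = beta - gamma
    I, t = I0, 0.0
    while I > 0
        dt = randexp(rng) / ((beta + gamma) * I)   # time to next birth or death
        t + dt > T && break                        # next event falls after time T
        t += dt
        I += (rand(rng) * (beta + gamma) < beta) ? 1 : -1
    end
    return exp(-lambda * T) * I                    # approximately a draw of W (0 if extinct)
end

# Samples of W, including the point mass at 0 from extinct realisations
samples = [simulate_W(0.95, 0.5, 1, 20.0) for _ in 1:10_000]
```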

A second approximation to the BP dynamics is to consider just the mean number of infected, which we denote \(I_d(t)\) (Athreya and Ney 1972, Chapter III). This can be expressed similarly to Eq. (3)

$$\begin{aligned} I_d(t) = \exp {\left( \lambda \left( t + \lambda ^{-1}\log {I_0}\right) \right) } , \end{aligned}$$
(4)

where \(I_d(t) = \mathbb {E}[I_b(t)] = I_0 e^{\lambda t}\) since \(W(t)\) is a martingale (Athreya and Ney 1972, Chapter III.7; Barbour et al. 2015). The two approximations for the dynamics look similar, and both capture the observed exponential growth in this model, but they differ in one important respect: the approximation for \(I_b(t)\) is stochastic, as W is a random variable, whereas \(I_d(t)\) is deterministic.

2.2 Random time-shifts

As alluded to above, the forms of Eqs. (3) and (4) are deliberate and suggest an elegant way to understand the effect of the early time stochasticity on the longer time mean dynamics. Equating the two solutions we see that the process \(I_b(t)\) can be approximated by \(I_d(t + \tau )\), where

$$\begin{aligned} \tau = \frac{1}{\lambda } \log {\left( \frac{W}{I_0} \right) }. \end{aligned}$$
(5)

The two solutions are identical up to a random time-shift of the initial conditions of the deterministic mean solution. The main panel of Fig. 2 illustrates this concept. The green lines show full stochastic realisations and the orange dashed lines indicate the corresponding shifted mean solutions (see Kendall 1966 for an early prototype of this figure). The above analysis shows that the time-shift is a simple transformation of the random variable W.

Figure 2 summarises the main ideas of this section on the relationships between stochastic branching processes, deterministic approximations and time-shifts. The histogram at the top shows the time-shift distribution calculated from stochastic simulations run until \(t=20\), which is enough time for W(t) to have converged to W. The histogram to the right shows the distribution of the state of the branching process (\(I_b(t)\)) at time \(t = 20\), which can be seen to resemble the shape of the time-shift distribution but with some scaling. All simulations used to construct the histograms are conditioned on the event of non-extinction.

Fig. 2

Three realisations of the stochastic SIR model are shown in green, plotted on a log scale, with \(I_0=1\). The orange dashed lines are the projections down from the point at which the stochastic trajectories approximately follow the deterministic trajectory but shifted relative to \(t=0\). The histogram on the right-hand side of the plot shows the distribution of the number of infected (\(\log _{10}\)) at time \(t = 20\) from \(5\times 10^4\) stochastic simulations. The histogram at the top is the empirical time-shift distribution obtained by transforming the same simulations (color figure online)

2.3 Macroscopic dynamics

The above discussion shows how the stochastic early time dynamics can be approximated by a deterministic solution for the mean subject to random initial conditions. As we saw in the discussion of the model dynamics, this noise persists on the macroscale as well. Theorem 1.1 of Barbour et al. (2015) shows how the above time-shift analysis can be extended to the full non-linear deterministic approximation for the density process given in Eq. (1). This states that, given a CTMC with state vector \(\varvec{X}(t)\) that is well approximated by a branching process near the initial condition and that has a deterministic approximation \(\varvec{\zeta }(t)\), the process \(\varvec{\zeta }(t + \tau )\), with \(\tau \) given by Eq. (5), also approximates \(\varvec{X}(t)\). In our example here, \(\varvec{\zeta }(t) = N \varvec{x}(t)\), where \(\varvec{x}(t) = (s(t),i(t))\).

Thus the time-shift distribution has predictive power on the macroscale as well. For example, Fig. 1b shows that we can predict the timing of peak infections as follows. Obtain a deterministic prediction for the timing of the peak number of infections, \(t_p\), by either solving Eq. (1) numerically or employing an analytic approximation (Turkyilmazoglu 2021). The distribution of the peak timing is then simply the distribution of \(t_p+\tau \).
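For the SIR example this prediction can be carried out in a few lines. The Julia sketch below is ours: it uses the analytic result for \(W\) given later in Sect. 4.1 (conditional on non-extinction, \(W\) is exponential with rate \(1-q\), where \(q = \gamma /\beta \)), and the value of \(t_p\) is a placeholder that would in practice come from a numerical solution of Eq. (1).

```julia
# Predicting the peak-time distribution as t_p + tau for the SIR model,
# using the analytic distribution of W from Sect. 4.1 (a sketch).
using Random

beta, gamma, I0 = 0.95, 0.5, 1
lambda, q = beta - gamma, gamma / beta
tp = 47.0                              # placeholder deterministic peak time from Eq. (1)

W = randexp(10^5) ./ (1 - q)           # samples of W | W > 0 (exponential, rate 1 - q)
tau = (log.(W) .- log(I0)) ./ lambda   # time-shifts, Eq. (5)
peak_times = tp .+ tau                 # predicted distribution of the peak timing
```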

3 Methods

This section details the theoretical foundations for estimating the distribution of \(W\) and subsequently the distribution of the random time-shifts. We assume a branching process model is being used directly or has been derived from a CTMC model as was done for the SIR example in Sect. 2.

If a CTMC is the starting point, then the requirements for this analysis to be applicable are as follows: The system has a natural scaling parameter K (e.g. the total population or carrying capacity), such that one may consider how the dynamics scale in the limit \(K\rightarrow \infty \). Typically this parameter is referred to as the system size and one talks about the dynamics in the deterministic limit (Black and McKane 2012). For this limit to actually exist, it is required that the rates of events can be written in a density dependent form (Kurtz 1970, 1976; Barbour 1980). This means that in the limit that \(K\rightarrow \infty \), the density follows a set of ODEs (for example see Eq. (1) for the SIR model, and Appendix D for the innate response model). Finally, we require that the rates of the CTMC are approximately linear near the initial condition so a BP approximation to the CTMC can be constructed (Barbour et al. 2015; Allen and Lahodny 2012; Allen 2017). Many population models are naturally formulated such that all these conditions are simultaneously satisfied and hence the methodology is suitable (Barbour 1980; Schuster 2016, Chapter 5.2).

The three conditions above guarantee that we can approximate the early-time dynamics by a branching process, which can be matched with a deterministic approximation in the same manner as in Sects. 2.1, 2.2. This matching can be done since the deterministic system, linearised about the unstable equilibrium, produces a system of differential equations equivalent to that governing the mean of the branching process. The resulting time-shift distribution is guaranteed by Theorem 1.1 of Barbour et al. (2015) to accurately characterise the difference between the true stochastic process and the limiting (large K) deterministic system.

The rest of this section is organised as follows: In Sect. 3.1 we detail the branching process theory required for defining \(W\) in the multivariate, continuous-time, case. Section 3.2 details our method for computing the LST of W using a moment expansion and a suitably formulated embedded process. Section 3.3 details the inversion of the LST to recover the distribution. Section 3.4 shows how the conditional moments required for the calculation of the LST can be found using a recursive approach. Section 3.5 shows how the method can be applied to discrete-time models and Sect. 3.6 details a simpler, but approximate, approach for estimating the distribution of \(W\) by fitting a generalised gamma distribution to its first five moments.

3.1 Branching process theory

A branching process models the evolution of the number of particles, or agents, that each live for a particular lifetime at which point they die and reproduce according to predefined rules (Athreya and Ney 1972, Chapter V; Dorman et al. 2004; Kimmel and Axelrod, 2015, Chapter 1). The particular property that makes a branching process amenable to analysis is that, once created, each individual is assumed to evolve independently of all others. In a multi-type model each individual is assigned a type and each type is governed by different lifetimes and reproduction rules. We consider only Markovian branching processes where the lifetime for each type is assumed to be exponentially distributed, which we herein refer to as a continuous-time multi-type branching process (CT-MBP) (Dorman et al. 2004).

Formulated mathematically, a CT-MBP \( \{ \varvec{Z}(t), t\ge 0 \}\) is defined on a state space \(\mathcal {S} \subseteq \mathbb {N}_0^m\) where \(Z_i(t)\) is the number of individuals of type \(i = 1, \dots , m\), at time \(t\). Each individual of type \(i\) lives for an exponentially distributed time with mean \(1/a_i\), and upon death it produces offspring, with \(p_{i}(\varvec{k})\) the probability that a type-\(i\) individual has \(\varvec{k} = (k_1, \dots , k_m)\) offspring of each type. Note that throughout this work, unless specified otherwise, all vectors correspond to row vectors. This information is conveniently summarised in the progeny generating functions

$$\begin{aligned} f_i(\varvec{s}) = \sum _{\varvec{k}} p_i(\varvec{k}) \prod _{j = 1}^{m} s_j^{k_j}, \quad \varvec{s} \in [0, 1]^m. \end{aligned}$$
(6)

A CT-MBP is therefore fully specified by the rates \(a_i\) and the probabilities \(p_i(\varvec{k})\). Throughout this work we assume that the CT-MBP is irreducible, that is, for every pair of types \((i, j)\),

$$\begin{aligned} \text {Pr}\, (Z_j(t) \ge 1 \,|\, \varvec{Z}(0) = \varvec{e}_i) > 0, \quad \text {for some } t\ge 0, \end{aligned}$$
(7)

where \(\varvec{e}_i\) is a standard basis vector in \(\mathbb {R}^m\). Intuitively this means that starting with an individual of type i, there is a non-zero probability of eventually producing an individual of type j.

The dynamics of this model can be largely analysed through a matrix \(\Omega \) with elements (Athreya and Ney 1972, Chapter V; Dorman et al. 2004)

$$\begin{aligned} \Omega _{ij} = a_i \left( \frac{\partial f_i(\varvec{s})}{\partial s_j} \bigg \vert _{\varvec{s} = \varvec{1}} - \delta _{ij} \right) , \end{aligned}$$
where \(\delta _{ij} \) is the Kronecker delta. In particular, defining \(\varvec{\zeta }(t):= \mathbb {E}[\varvec{Z}(t)]\), we can characterise the mean behaviour of the branching process, which can be considered a deterministic approximation since \(\varvec{\zeta }(t)\) satisfies the following system of ordinary differential equations (Dorman et al. 2004)

$$\begin{aligned} \frac{d \varvec{\zeta }(t)}{dt} = \varvec{\zeta }(t) \, \Omega . \end{aligned}$$
(8)

Assuming the initial condition \(\varvec{\zeta }(0) = \varvec{z}_0\), the solution of Eq. (8) is given by the matrix exponential

$$\begin{aligned} \varvec{\zeta }(t) = \varvec{z}_0 e^{\Omega t}. \end{aligned}$$
(9)

By Theorem 2.7 of Seneta (1981) and the Perron–Frobenius theorem (Athreya and Ney 1972, Chapter V.7.4; Barbour et al. 2015), in the limit as \(t\rightarrow \infty \) we have \(e^{\Omega t} \approx e^{\lambda t} \varvec{u}^T \varvec{v} \) where \(\varvec{u}^T\) (column vector) and \(\varvec{v}\) (row vector) are the right and left eigenvectors of \(\Omega \), corresponding to the dominant eigenvalue \(\lambda \), normalised such that \(\varvec{u} \cdot \varvec{1} = 1\) and \(\varvec{u} \cdot \varvec{v} = 1\) (Athreya and Ney 1972, Chapter V; Harris 1964, Chapter VI).
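In practice \(\lambda \), \(\varvec{u}\) and \(\varvec{v}\) are obtained from a standard eigen-decomposition of \(\Omega \). The Julia sketch below is our own illustration; the 2-type matrix used here follows from applying the definition of \(\Omega \) to the SEIR model of Sect. 4.2 with the parameter values used there, and the variable names are assumptions.

```julia
# Dominant eigenvalue and normalised right/left eigenvectors of Omega,
# normalised so that u . 1 = 1 and u . v = 1 (a sketch).
using LinearAlgebra

beta, sigma, gamma = 0.56, 0.5, 0.33
Omega = [-sigma  sigma;
          beta  -gamma]

eig = eigen(Omega)
k = argmax(real.(eig.values))          # index of the dominant eigenvalue
lambda = real(eig.values[k])
u = real.(eig.vectors[:, k])           # right eigenvector: Omega * u = lambda * u
v = real.(inv(eig.vectors)[k, :])      # left eigenvector:  v' * Omega = lambda * v'
u ./= sum(u)                           # u . 1 = 1
v ./= dot(u, v)                        # u . v = 1
```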

We only consider the super-critical regime where \(\lambda > 0\) as otherwise, with probability 1, the population will go extinct and hence not grow to be large on the macroscale. Hence, an approximate solution to Eq. (9) valid at long times—but not so long that the BP approximation has broken down—is given by

$$\begin{aligned} \varvec{\zeta }(t) \approx e^{\lambda t} \varvec{z}_0 \varvec{u}^{T} \varvec{v}. \end{aligned}$$
(10)

From these quantities a rescaled branching process can be defined,

$$\begin{aligned} \varvec{W}(t) = e^{-\lambda t} \varvec{Z}(t), \end{aligned}$$
(11)

which is a non-negative martingale with \(\lim _{t \rightarrow \infty } \varvec{W}(t) = W \varvec{v}\) almost surely (Kesten and Stigum 1966; Athreya and Ney 1972, Chapter V). This martingale is the main tool to understand how the mean dynamics differs from the stochastic realisations.

For a long enough time such that \(\varvec{W}(t)\) has converged, we can make the substitution \(W \varvec{v}\) for \(\varvec{W}(t)\) in Eq. (11) and rearrange to get \(\varvec{Z}(t) \approx e^{\lambda t} W \varvec{v}\) which can be expressed equivalently as (Barbour et al. 2015)

$$\begin{aligned} \varvec{Z}(t) \approx \exp {\left( \lambda \left( t +\lambda ^{-1} \log W \right) \right) } \varvec{v}. \end{aligned}$$

The deterministic approximation for the mean, Eq. (9), can be written as \(\varvec{\zeta }(t) \approx e^{\lambda t}\mathbb {E}[W] \varvec{v}\) where \(\mathbb {E}[W] = \varvec{z}_0 \varvec{u}^T\) and can be similarly expressed as

$$\begin{aligned} \varvec{\zeta }(t) \approx \exp {\left( \lambda \left( t + \lambda ^{-1} \log {\left( \varvec{z}_0 \varvec{u}^T \right) } \right) \right) } \varvec{v}. \end{aligned}$$
These two processes are identical up to the random time delay of (Barbour et al. 2015)

$$\begin{aligned} \tau = \frac{1}{\lambda } \log {\left( \frac{W}{\varvec{z}_0 \varvec{u}^T} \right) }. \end{aligned}$$
(12)

The time-shift is a simple transformation of W and thus the main part of our method is concerned with computing W itself.

The tractability of the later calculations relies on conditioning the process to start with a single individual of a particular type. When the process starts from multiple individuals, i.e. a general initial condition \(\varvec{z}_0\), this is not a problem as we can exploit the independence of the agents to write

$$\begin{aligned} \varvec{Z}(t) = \sum _{i=1}^m \sum _{j = 1}^{z_{0, i}} \varvec{Z}_i^{(j)}(t), \end{aligned}$$
(13)

where \(\varvec{Z}_i^{(j)}\) are independent sub-processes that are each started from a single individual of type \(i\) for each of the \(j = 1, \dots , z_{0, i}\) initial individuals. Defining \(\varvec{W}_i(t)\) as the random variable \(\varvec{W}(t)\) conditional on starting from a single individual of type i and once more using Theorem 2, Chapter V.7 of Athreya and Ney (1972) we have that

$$\begin{aligned} \lim _{t \rightarrow \infty } \varvec{W}_i(t) = W_i \varvec{v}, \end{aligned}$$

almost surely, and \(\mathbb {E}[W_i] = u_i\). Since the left eigenvector, \(\varvec{v}\), of the matrix \(\Omega \) is the same regardless of the initial condition, multiplying Eq. (13) by \(e^{-\lambda t}\) and taking the limit as \(t \rightarrow \infty \) results in

$$\begin{aligned} W = \sum _{i=1}^m \sum _{j = 1}^{z_{0, i}} W_i^{(j)} \end{aligned}$$
(14)

where the \(W_i^{(j)}\), \(j = 1, \dots , z_{0, i}\), are independent copies of \(W_i\).

In order to compute the distribution of \(W\) we work with the Laplace-Stieltjes transforms (LSTs) of the \(W_i\) defined as

$$\begin{aligned} \phi _i(\theta ) = \mathbb {E}[ e^{-\theta W_i}], \quad \theta \in \mathbb {C}, \end{aligned}$$
(15)

and we define the vector \(\varvec{\varphi }(\theta ) = (\phi _1(\theta ), \dots , \phi _m(\theta ))\). Since the random variables \(W_i^{(j)}\) are independent and identically distributed copies of the \(W_i\), the LST of \(W\) is simply

$$\begin{aligned} \phi (\theta ) = \prod _{i=1}^m \phi _i(\theta )^{z_{0, i}}. \end{aligned}$$
(16)

3.2 Computation of the LST of \(W_i\)

To compute the distribution of W we first derive approximations for \(\phi _i\); Eq. (16) can then be inverted using standard methods.

Since \(\lambda > 0\), a Taylor series expansion of the term \(e^{-\theta W_i}\) about \(0\) in Eq. (15) yields an approximation to the LST in terms of the first n moments of \(W_i\)

$$\begin{aligned} \hat{\phi }_i(\theta ) = \sum _{k = 0}^{n} \frac{(-\theta )^k}{k!} \mathbb {E}[W_i^k]. \end{aligned}$$
(17)

The calculation of the moments, \(\mathbb {E}[W_i^k]\), can be done by recursively solving sets of linear equations and is discussed in Sect. 3.4. Simply evaluating Eq. (17) at \(\theta \in \mathbb {C}\) will result in a poor approximation as \(\left|{\theta }\right|\) increases away from \(0\) due to the error term in the Taylor series (Hubbard and Hubbard 1999, Chapter 3.3).

The error in the approximation can be bounded using Lagrange's form of the remainder and the linearity property of expectation (see Appendix B for details). Let \(\mathcal {E}_i^{(n)}( \theta )\) denote the (absolute) error in Eq. (17); this error is bounded above by

$$\begin{aligned} \mathcal {E}_i^{(n)}(\theta ) \le \frac{\left|{\theta }\right|^{n+1}}{(n+1)!} \mathbb {E}[W_i^{n+1}]. \end{aligned}$$
(18)

Since this holds for all \(i = 1, \dots , m\), then we can simply ensure the largest LST error is below some (user specified) tolerance \(\epsilon \) and the others will have error less than this by default. Let

$$\begin{aligned} \gamma = \max \left\{ \mathbb {E}[W_i^{n+1}], i = 1, \dots , m\right\} , \end{aligned}$$

then

$$\begin{aligned} \mathcal {E}_i^{(n)}(\theta ) \le \frac{\left|{\theta }\right|^{n+1}}{(n+1)!} \mathbb {E}[W_i^{n+1}] \le \frac{\left|{\theta }\right|^{n+1}}{(n+1)!} \gamma \le \epsilon , \end{aligned}$$

and rearranging this error bound, we can define

$$\begin{aligned} L(n, \epsilon ) = \left( \frac{(n+1)!\epsilon }{\gamma }\right) ^{1 / (n+1)}. \end{aligned}$$
(19)

Provided \(\theta \) is in the region

$$\begin{aligned} \mathcal {A}_{n, \epsilon } = \left\{ \theta : \left|{\theta }\right| \le L(n, \epsilon )\right\} , \end{aligned}$$
(20)

we simultaneously satisfy the error tolerance for all the LSTs. This provides a bound on the size of the open neighbourhood about \(0\) where the approximation has its error controlled to arbitrary levels of precision.

In order to extend the region where the approximation Eq. (17) is accurate from \(\mathcal {A}_{n, \epsilon }\) to all of \(\mathbb {C}\), we consider the construction of an embedded discrete-time multi-type branching process (DT-MBP). Before giving the full details of the approach we provide some insight into why we take it. The construction of the embedded DT-MBP means we can formulate another rescaled process (similar to Eq. (11)) which has the same limit, \(W \varvec{v}\), as the original CT-MBP (Doob 1940; Athreya and Ney 1972, Chapters III.6 and V.7). With a discrete-time BP the LSTs of the \(W_i\) can be shown to satisfy a simple functional equation that relates the progeny generating functions (of the embedded process) and the LSTs. This functional equation provides a method for shrinking the value of \(\theta \) (made precise shortly) in cases where \(\theta \notin \mathcal {A}_{n, \epsilon }\). From the functional equation we can derive a simple recursive algorithm for evaluating the LSTs at any \(\theta \in \mathbb {C}\).

The detailed realisation of this approach begins by constructing the embedded DT-MBP of \(\varvec{Z}(t)\) which is defined, for some \(h > 0\), as \(\varvec{Z}^{(h)}(n) = \varvec{Z}(h n)\) with \(n \in \mathbb {N}_0\) (Mode 1971, Chapter 7.4). We provide results and some discussion for choosing the value of h in Sect. 4. The progeny generating functions for the embedded process can be calculated from the generating functions of the continuous-time process. Once more we condition on the process starting with a single individual of type i and define a rescaled version of the embedded chain

$$\begin{aligned} \varvec{W}_i^{(h)}(n) = e^{-\lambda h n} \varvec{Z}_i^{(h)}(n), \quad n \in \mathbb {N}_0, \quad i=1,\dots ,m. \end{aligned}$$

Crucially both the embedded process, \(\varvec{W}_i^{(h)}(n)\), and original continuous time process, \(\varvec{W}_i(t)\), converge to the same limit, \(W_i \varvec{v}\) for all \(i=1,\dots ,m\) (Doob 1940; Athreya and Ney 1972, Chapter III).

Let \(\tilde{f}_i(\varvec{s})\) be the progeny generating function of the embedded process, \(\varvec{Z}^{(h)}(n)\), conditional on starting with an individual of type i and let \(\varvec{\tilde{f}}(\varvec{s}) = (\tilde{f}_1(\varvec{s}), \dots , \tilde{f}_m(\varvec{s}))\). The progeny generating functions of the embedded process can be related to the progeny generating functions of the original process, \(F_i(\varvec{s},t)\), where

$$\begin{aligned} F_i(\varvec{s},t) = \mathbb {E}\left[ \prod _{j=1}^{m} s_j^{Z_j(t)} \,\big \vert \, \varvec{Z}(0) = \varvec{e}_i\right] , \quad i = 1, \dots , m. \end{aligned}$$

These generating functions satisfy the system of differential equations (Allen 2015, Chapter 1.3; Athreya and Ney 1972, Chapter V),

$$\begin{aligned} \frac{\partial F_i(\varvec{s}, t)}{\partial t} = a_i \left( f_i\left( F_1(\varvec{s}, t), \dots , F_m(\varvec{s}, t) \right) - F_i(\varvec{s}, t) \right) , \quad i = 1, \dots , m, \end{aligned}$$
(21)

with initial conditions \(F_i(\varvec{s}, 0) = s_i\). The progeny generating function of an individual of type i in the embedded process is then \(\tilde{f}_i(\varvec{s}) = F_i(\varvec{s}, h)\). Since the rescaled embedded process, \(\varvec{W}_i^{(h)}(n)\), and the continuous-time process, \(\varvec{W}_i(t)\), converge to the same limiting random variable, \(W_i\varvec{v}\), \(\varvec{\varphi }(\theta )\) satisfies the following functional equation (Athreya and Ney 1972, Chapter V; Harris 1951; Mode 1971, Chapter 1.8),

$$\begin{aligned} \varvec{\varphi }(\theta ) = \varvec{\tilde{f}}\left( \varvec{\varphi }\left( \theta e^{-\lambda h} \right) \right) . \end{aligned}$$
(22)

This provides a way of accurately evaluating the approximation \(\hat{\varvec{\varphi }}(\theta )\) for all \(\theta \in \mathbb {C}\). If \(\theta \in \mathcal {A}_{n, \epsilon }\) then we simply evaluate Eq. (17). If \(\theta \notin \mathcal {A}_{n, \epsilon }\) then we can recursively evaluate Eq. (22) \(\kappa \) times until \(\theta e^{-\lambda h \kappa } \in \mathcal {A}_{n, \epsilon } \). This recursive calculation is equivalent to calculating

$$\begin{aligned} \varvec{\varphi }(\theta ) = \underbrace{\varvec{\tilde{f}}\circ \varvec{\tilde{f}}\circ \dots \circ \varvec{\tilde{f}}}_{\kappa \text { times}}\left( \varvec{\varphi }\left( \theta e^{-\lambda \kappa h} \right) \right) . \end{aligned}$$

The value of \(\kappa \) can be chosen ahead of time for a particular value of \(L(n, \epsilon )\) (given by Eq. (19)) as

$$\begin{aligned} \kappa \ge \frac{1}{\lambda h}\log {\left( \frac{\left|{\theta }\right|}{L(n, \epsilon )} \right) }. \end{aligned}$$
(23)

The full procedure is listed below in Algorithm 1. Note that the progeny generating functions for the embedded process, \(\varvec{\tilde{f}}(\varvec{s})\), need to be calculated only once for a given set of model parameters. The two hyperparameters are the number of moments used in the LST expansions, \(n\), and the size of the discrete time step, \(h\); in Sect. 4 we will explore the choices of these parameters and their effect on the accuracy of the approximation and solve times.
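The calculation of \(\varvec{\tilde{f}}(\varvec{s}) = \varvec{F}(\varvec{s}, h)\) amounts to integrating Eq. (21) from the initial condition \(\varvec{F}(\varvec{s}, 0) = \varvec{s}\) over one step of length \(h\). Below is a minimal sketch of our own, using a fixed-step RK4 integrator for transparency; any ODE solver could be substituted, and the function and argument names are assumptions.

```julia
# Embedded-process progeny generating functions f~(s) = F(s, h), obtained by
# integrating Eq. (21) with initial condition F(s, 0) = s (a sketch).
# `f` maps a vector s to (f_1(s), ..., f_m(s)) and `a` holds the rates a_i.
function embedded_pgf(s, f, a, h; steps = 100)
    g(F) = a .* (f(F) .- F)        # right-hand side of Eq. (21)
    F = collect(float.(s))
    dt = h / steps
    for _ in 1:steps               # classical fourth-order Runge-Kutta
        k1 = g(F)
        k2 = g(F .+ dt / 2 .* k1)
        k3 = g(F .+ dt / 2 .* k2)
        k4 = g(F .+ dt .* k3)
        F = F .+ dt / 6 .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
    end
    return F
end

# Example usage with the SEIR progeny generating functions of Sect. 4.2:
# beta, sigma, gamma = 0.56, 0.5, 0.33
# f(s) = [s[2], gamma / (beta + gamma) + beta / (beta + gamma) * s[1] * s[2]]
# ftilde(s) = embedded_pgf(s, f, [sigma, beta + gamma], 0.1)
```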

Algorithm 1

Computation of the LST of \(W\).
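The sketch below gives our reading of this procedure in Julia; it is illustrative code of our own, not the listing from the paper or its accompanying package. It assumes the moments are stored in a matrix with `moments[i, k+1]` \(= \mathbb {E}[W_i^k]\) for \(k = 0, \dots , n+1\), and that `ftilde` returns the vector of embedded-process progeny generating functions (for example via the `embedded_pgf` sketch above).

```julia
# Evaluate the vector of LSTs phi(theta): shrink theta into the region A_{n,eps}
# using Eq. (22), apply the truncated moment expansion (17), then undo the
# shrinking by composing the embedded progeny generating functions (a sketch).
function lst_W(theta, moments, ftilde, lambda, h, tol)
    m = size(moments, 1)
    n = size(moments, 2) - 2                              # moments stored for k = 0, ..., n+1
    gam = maximum(moments[:, n + 2])                      # gamma = max_i E[W_i^{n+1}]
    L = (factorial(big(n + 1)) * tol / gam)^(1 / (n + 1)) # radius of A_{n,eps}, Eq. (19)

    kappa = abs(theta) <= L ? 0 :
            ceil(Int, log(abs(theta) / L) / (lambda * h)) # Eq. (23)
    theta_small = theta * exp(-lambda * h * kappa)        # now inside A_{n,eps}

    # truncated moment expansion, Eq. (17)
    phi = map(1:m) do i
        term, acc = one(theta_small), one(theta_small) * moments[i, 1]
        for k in 1:n
            term *= -theta_small / k                      # (-theta)^k / k!
            acc += term * moments[i, k + 1]
        end
        acc
    end

    for _ in 1:kappa                                      # Eq. (22) applied kappa times
        phi = ftilde(phi)
    end
    return phi
end
```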

3.3 Distribution of W using inversion

We can use numerical inversion techniques to obtain the distribution of \(W\) from the LSTs. We refer to this method as the probability-estimation (PE) method. Inversion of \(\phi (\theta ) / \theta \) recovers the CDF, i.e. \(G_W(w) = \mathcal {L}^{-1}\left\{ \phi (\theta )/\theta \right\} (w)\) where \(\mathcal {L}^{-1}\) is the inverse Laplace transform. This inversion can be carried out through a variety of methods, for example see Abate et al. (2000), Abate and Whitt (1995) for an overview. In this work we utilise the concentrated matrix exponential (CME) method (Horváth et al. 2020), with \(21\) terms. This method falls under the class of Abate-Whitt methods, outperforms similar methods in terms of accuracy, and was robust throughout our testing. The CME approach, like most inversion methods, is valid only for values of \(w > 0\). However, from analysis of the branching processes we know that there is a point mass at \(w = 0\) corresponding to the probability of ultimate extinction, \(q^{\star }\) (Harris 1964, Chapter II.7), which is calculated below. Hence we are able to add this in post-inversion and can express the CDF as

$$\begin{aligned} \begin{aligned} G_W(w)&= {\left\{ \begin{array}{ll} q^{\star }, &{} w = 0, \\ \mathcal {L}^{-1}\left\{ \hat{\phi }(\theta )/\theta \right\} (w), &{} w > 0. \end{array}\right. } \end{aligned} \end{aligned}$$

Typically we will only be interested in the non-extinction case. Defining the random variable \(W^\star := W \,\vert \, W > 0\), the probability density function (PDF) of this is

$$\begin{aligned} g_{W^\star }(w) = \frac{1}{1 - q^{\star }} \, \frac{d}{dw} G_W(w), \quad w > 0. \end{aligned}$$
(24)

The derivative of the CDF can be computed numerically or through automatic differentiation methods, which are more accurate (Baydin et al. 2018). In this work we utilise a specific version of automatic differentiation referred to as forward-mode automatic differentiation. This method is supported natively in Julia (Revels et al. 2016).

The probability \(q^{\star }\) can be calculated as follows. Let \(q_i\) be the probability of extinction conditioned on starting with a single individual of type \(i\) and define \(\varvec{q} = (q_1, \dots , q_m)\). The vector \(\varvec{q}\) can be calculated by solving for the minimal non-negative solution of the system of equations (Harris 1964, Chapter II.7; Mode 1971, Chapter 7)

$$\begin{aligned} f_i(\varvec{q}) = q_i, \quad i = 1, \dots , m. \end{aligned}$$
(25)

By the independence assumption of the individuals in the branching process the probability of ultimate extinction for a given initial condition is simply

$$\begin{aligned} q^\star = \prod _{i = 1}^{m} q_i^{z_{0,i}}, \end{aligned}$$

where \(z_{0, i}\) is the initial number of individuals of type \(i\).
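The minimal non-negative solution of Eq. (25) can be found by fixed-point iteration from the zero vector, which converges monotonically to the extinction probabilities. A sketch of ours, using the SEIR progeny generating functions of Sect. 4.2 (the function names are assumptions):

```julia
# Extinction probabilities q_i as the minimal non-negative solution of f(q) = q
# (Eq. (25)), by fixed-point iteration from the zero vector (a sketch).
function extinction_probs(f, m; iters = 10_000, tol = 1e-12)
    q = zeros(m)
    for _ in 1:iters
        q_new = f(q)
        maximum(abs.(q_new .- q)) < tol && return q_new
        q = q_new
    end
    return q
end

# SEIR example of Sect. 4.2: type 1 = E, type 2 = I
beta, sigma, gamma = 0.56, 0.5, 0.33
f(s) = [s[2], gamma / (beta + gamma) + beta / (beta + gamma) * s[1] * s[2]]

q = extinction_probs(f, 2)
q_star = q[2]        # starting from a single infectious individual, z_0 = e_2
```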

3.4 Calculating moments

In this section we outline the calculation of the moments that are required in the Taylor expansion of the LST (Eq. 17).

The moment generating function (MGF) of \(W_i\) is defined as

$$\begin{aligned} \xi _i(\theta ) = \mathbb {E}[ e^{\theta W_i}], \quad \theta \in \mathbb {R}, \end{aligned}$$
(26)

and \(\varvec{\Xi }(\theta ) = (\xi _1(\theta ), \dots , \xi _m(\theta )) \). The \(i\)th MGF then satisfies the following functional equation (Athreya and Ney 1972, Chapter V)

$$\begin{aligned} \xi _i(\theta ) = \int _0^{\infty } a_i e^{-a_i t} \, f_i\left( \varvec{\Xi }\left( \theta e^{-\lambda t} \right) \right) \, dt, \end{aligned}$$
(27)

where \(f_i(\varvec{s})\) and \(a_i\) are the progeny generating function and the rate parameter of the exponential lifetime distribution for individuals of type \(i\), respectively. We note that throughout this section, unless otherwise specified, we use the notation \(\xi _i^{(n)}(x)\) to denote the nth derivative of \(\xi _i(\theta )\) with respect to \(\theta \) evaluated at x, and let \(\varvec{\Xi }^{(n)}(x) = (\xi _1^{(n)}(x), \dots , \xi _m^{(n)}(x))\).

The \(n\)th derivative of the \(i\)th MGF yields the \(n\)th moment of \(W_i\), \(\xi _i^{(n)}(0) = \mathbb {E}[W_i^n]\) and hence all the moments can be obtained by differentiating Eq. (27) \(n\) times, for each i, and evaluating the result at \(\theta = 0\) (Bellman and Harris 1952),

$$\begin{aligned} \xi _i^{(n)}(0) = \frac{a_i}{a_i + n\lambda } \, \frac{d^n}{d\theta ^n} f_i\left( \varvec{\Xi }(\theta ) \right) \bigg \vert _{\theta = 0}. \end{aligned}$$
(28)

Upon first consideration it would appear simpler to differentiate Eq. (22) to obtain the moments. However, evaluating the progeny generating functions of the embedded process requires a system of ODEs (Eq. (21)) to be solved numerically, and repeatedly differentiating such numerical solutions is not easily handled. Repeated differentiation of the progeny generating functions is complicated in general; however, for frequently occurring reproduction rules (i.e. linear or quadratic progeny generating functions) the derivatives yield a system of linear equations that, when solved, gives the moments of each \(W_i\). This enables the calculations to be automated, and examples of this for common use cases will be provided here. This approach is also exact in the sense that it is not influenced by the accuracy of the numerical ODE solutions that would otherwise be needed for evaluating the progeny generating functions.

In this work we consider progeny generating functions of the form

$$\begin{aligned} f_i(\varvec{s}) = \frac{\nu _i}{a_i} + \sum _{\begin{array}{c} j = 1 \\ j \ne i \end{array}}^{m}\frac{\alpha _{ij}}{a_i} s_j + \sum _{k = 1}^{m} \sum _{l = k}^{m} \frac{\beta _{ikl}}{a_i} s_k s_l, \end{aligned}$$
(29)

where the rate parameter for the lifetime distribution is given by

$$\begin{aligned} a_i = \nu _i + \sum _{\begin{array}{c} j = 1 \\ j \ne i \end{array}}^{m} \alpha _{ij} + \sum _{k = 1}^{m} \sum _{l = k}^{m} \beta _{ikl}. \end{aligned}$$

The parameter \(\nu _i\) relates to the rate of type i dying without producing any offspring; the parameters \(\alpha _{ij}\) correspond to linear branching dynamics (i.e. type i dying and generating a type j), and the \(\beta _{ikl}\) correspond to quadratic branching (i.e. type i splitting into two other types k and l). The summation involving the \(\alpha _{ij}\) is over values \(\{1, \dots , m\} {\setminus } \{i\}\) as type i must become another type in this case. This is not a restriction in the quadratic branching case (i.e. the summation including the \(\beta _{ikl}\)); however, we do assume an ordering \(l \ge k\) on the indices, which ensures there is no double counting (i.e. no contributions from both \(s_k s_l\) and \(s_l s_k\), as these are treated the same and hence \(\beta _{ikl} = 0\) for \(k > l\)).

Substituting Eq. (29) into Eq. (27) and noting that as the sums are finite we can swap the order of summation and integration, the functional equation takes the form

$$\begin{aligned} \xi _i(\theta ) = I_1(\theta ) + I_2(\theta ) + I_3(\theta ) \end{aligned}$$
(30)

where

$$\begin{aligned} I_1(\theta )&= \int _0^{\infty } \nu _i e^{-a_i t} \, dt = \frac{\nu _i}{a_i}, \end{aligned}$$
(31)
$$\begin{aligned} I_2(\theta )&= \sum _{\begin{array}{c} j = 1 \\ j \ne i \end{array}}^{m} \alpha _{ij} \int _0^{\infty } e^{-a_i t} \, \xi _j\left( \theta e^{-\lambda t} \right) \, dt, \end{aligned}$$
(32)
$$\begin{aligned} I_3(\theta )&= \sum _{k = 1}^{m} \sum _{l = k}^{m} \beta _{ikl} \int _0^{\infty } e^{-a_i t} \, \xi _k\left( \theta e^{-\lambda t} \right) \xi _l\left( \theta e^{-\lambda t} \right) \, dt. \end{aligned}$$
(33)

We can therefore differentiate Eq. (30) n times and evaluate at \(\theta = 0\) by considering the terms in Eq. (31) to (33) individually. Carrying this out and substituting the results we arrive at the equation

$$\begin{aligned} \begin{aligned} \xi _i^{(n)}(0) =&\sum _{\begin{array}{c} j = 1 \\ j \ne i \end{array}}^{m} \frac{\alpha _{ij}}{a_i + n \lambda } \xi _j^{(n)}(0) \\ {}&+ \sum _{k = 1}^{m} \sum _{l = k}^{m} \frac{\beta _{ikl}}{n\lambda + a_i}\sum _{r = 0}^{n} \left( {\begin{array}{c}n\\ r\end{array}}\right) \xi _k^{(r)}(0) \xi _l^{(n - r)}(0). \end{aligned} \end{aligned}$$
(34)

With the initial condition \(\xi _i^{(1)}(0) = \mathbb {E}[W_i] = u_i\) (see Sect. 3.1), we can formulate a recursive system of linear equations (in the moments) by isolating all the terms involving the nth moment on the left-hand side of Eq. (34). The recursive equation is then given by

$$\begin{aligned} \xi _i^{(n)}(0) - \sum _{\begin{array}{c} j = 1 \\ j \ne i \end{array}}^{m} \frac{\alpha _{ij}}{a_i + n \lambda } \xi _j^{(n)}(0) - \sum _{k = 1}^{m} \sum _{l = k}^{m} \frac{\beta _{ikl}}{n\lambda + a_i}\left( \xi _k^{(n)}(0) + \xi _l^{(n)}(0) \right) \nonumber \\ = \sum _{k = 1}^{m} \sum _{l = k}^{m} \frac{\beta _{ikl}}{n\lambda + a_i} \sum _{r = 1}^{n-1} \left( {\begin{array}{c}n\\ r\end{array}}\right) \xi _k^{(r)}(0) \xi _l^{(n - r)}(0), \quad n \ge 2. \end{aligned}$$
(35)

Defining the constants

$$\begin{aligned} \begin{aligned} \tilde{\alpha }_{ij}^{(n)}&= \frac{\alpha _{ij}}{a_i + n \lambda }, \\ \tilde{\beta }_{ikl}^{(n)}&= \frac{\beta _{ikl}}{n\lambda + a_i}, \\ d_i^{(n)}&= \sum _{k = 1}^{m} \sum _{l = k}^{m} \tilde{\beta }_{ikl}^{(n)}\sum _{r = 1}^{n-1} \left( {\begin{array}{c}n\\ r\end{array}}\right) \xi _k^{(r)}(0) \xi _l^{(n - r)}(0), \end{aligned} \end{aligned}$$

and noting that \(\xi _i^{(n)}(0) = \varvec{e}_i \varvec{\Xi }^{(n)}(0)^{T}\) we can rewrite Eq. (35) in vector notation

$$\begin{aligned} \left( \varvec{e}_i - \sum _{\begin{array}{c} j =1 \\ j \ne i \end{array}}^{m} \tilde{\alpha }_{ij}^{(n)} \varvec{e}_j - \sum _{k =1}^{m}\sum _{l = k}^{m}\tilde{\beta }_{ikl}^{(n)}\left( \varvec{e}_k + \varvec{e}_l\right) \right) \varvec{\Xi }^{(n)}(0)^{T} = d_i^{(n)}, \quad n \ge 2, \end{aligned}$$
(36)

subject to the initial condition \(\varvec{\Xi }^{(1)}(0) = \varvec{u}\).

Equation (36) holds for agents of type \(i = 1, \dots , m\) and so we can formulate a matrix \(C^{(n)}\) and a vector \(\varvec{d}^{(n)}\), that constitute a linear system that can be solved recursively to get the \(n\)th moments,

$$\begin{aligned} C^{(n)} \varvec{\Xi }^{(n)}(0)^{T} = \varvec{d}^{(n)T}, \text { for } n \ge 2. \end{aligned}$$

Row i of \(C^{(n)}\) corresponds to a linear equation derived from the ith progeny generating function. An example of formulating this system of equations is given for the SEIR epidemic model in Sect. 4.2.

3.5 Application to discrete-time models

The method developed so far has been for continuous-time processes, but discrete-time processes can also be handled with some simplifications to the method that we briefly outline in this section. A discrete-time multi-type branching process (DT-MBP) \(\varvec{Z}(t)\) is specified similarly to the CT-MBP but with \(t \in \mathbb {N}_0\). The progeny generating functions take the same form as Eq. (6) but individuals have non-random unit lifetimes. The behaviour of the system is studied through the mean offspring matrix \(\mathcal {M}\) with elements

$$\begin{aligned} \mathcal {M}_{ij} = \frac{\partial f_i(\varvec{s})}{\partial s_j} \bigg \vert _{\varvec{s} = \varvec{1}}. \end{aligned}$$
The dominant eigenvalue of this matrix is denoted by \(\rho \) and with it we define the analogous form of Eq. (11), \(\varvec{W}(t) = \varvec{Z}(t) \rho ^{-t}\) (Athreya and Ney 1972, Chapter V). Note that \(\rho ^{-t} = e^{-t\log \rho }\) and letting \(\lambda = \log \rho \) we have \(\varvec{W}(t) = \varvec{Z}(t) e^{-\lambda t}\) as in the continuous-time case (Eq. 11). The rescaled process \(\varvec{W}(t)\) approaches the limit \(W \varvec{v}\) where \(\varvec{u}^T\) and \(\varvec{v}\) now correspond to the right and left eigenvectors of \(\mathcal {M}\), normalised such that \(\varvec{u} \cdot \varvec{1} = 1\) and \(\varvec{u} \cdot \varvec{v} = 1\).

All the remaining constructions from the previous sections apply, but with the simplification that the conditional LSTs of \(W\) directly satisfy (Athreya and Ney 1972, Chapter V; Mode 1971, Chapter 1.8)

$$\begin{aligned} \varvec{\varphi }(\theta ) = \varvec{f}\left( \varvec{\varphi }\left( \theta \rho ^{-1} \right) \right) . \end{aligned}$$
(37)

Algorithm 1 can be used to compute the LSTs by setting \({\varvec{\tilde{f}}}(\varvec{s}) = \varvec{f}(\varvec{s})\), \(\lambda = \log \rho \) and \(h = 1\). For a discrete-time model, the process of deriving the moments, given a model specification, is dramatically simplified: firstly, we can directly differentiate Eq. (37) (as opposed to Eq. (27) in the continuous-time case) to obtain the moments, after replacing \(\varvec{\varphi }(\theta )\) with the MGFs, \(\varvec{\Xi }(\theta )\); secondly, extending the neighbourhood in which the LST is evaluated no longer requires solving ODEs, as the progeny generating functions are given explicitly.
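As a simple illustration of the first point (this single-type example is our own and not one of the models considered elsewhere in the paper), take a Galton-Watson process with progeny generating function \(f(s) = p_0 + p_2 s^2\) and \(\rho = f'(1) = 2 p_2 > 1\). Replacing \(\varvec{\varphi }\) with \(\varvec{\Xi }\) in Eq. (37) gives \(\xi (\theta ) = f\left( \xi (\theta \rho ^{-1}) \right) \), and differentiating twice and evaluating at \(\theta = 0\) yields

$$\begin{aligned} \xi ''(0) = \frac{f''(1) \left( \xi '(0) \right) ^2 + f'(1) \, \xi ''(0)}{\rho ^2}, \end{aligned}$$

so that, using \(\xi '(0) = \mathbb {E}[W] = 1\) for a single-type process, \(\mathbb {E}[W^2] = \xi ''(0) = f''(1) / \left( \rho (\rho - 1) \right) = 1/(2 p_2 - 1)\). Higher moments follow in the same recursive manner.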

3.6 Moment matching

Here we outline a second approach to calculating the distribution of \(W\) that is based on moment matching with a parametric distribution. This approach is an approximation, but is quicker than the PE method as we do not require construction of the LST and its subsequent inversion. Furthermore, this method results in analytical distributions which can be more easily sampled.

First, recall from Eq. (14) that W can be written as the sum of independent random variables \(W^{(j)}_i\) and so the kth moment of W is given by

$$\begin{aligned} \mathbb {E}[W^k] = \mathbb {E}\left[ \left( \sum _{i=1}^m \sum _{j = 1}^{z_{0, i}} W_i^{(j)} \right) ^{k} \right] . \end{aligned}$$
For ease of notation, define \(N = \sum _{i = 1}^{m} z_{0, i}\) independent random variables \(U_1 = W_1^{(1)}, U_2 = W_1^{(2)}, \dots , U_{z_{0, 1}} = W_1^{(z_{0, 1})}, \dots , U_{N} = W_m^{(z_{0, m})}\), then the moments of W can be written as

$$\begin{aligned} \mathbb {E}[W^k] = \sum _{\varvec{l} \in \mathcal {B}_k} \left( {\begin{array}{c}k\\ l_1, l_2, \dots , l_N\end{array}}\right) \prod _{n = 1}^{N} \mathbb {E}\left[ U_n^{l_n} \right] , \end{aligned}$$
(38)

which follows from the multinomial theorem and the linearity of expectation over finite sums. The set \(\mathcal {B}_k\) is defined as

$$\begin{aligned} \mathcal {B}_k = \left\{ \varvec{l}: \sum _{n = 1}^{N} l_n = k, l_n \in \{0, 1, \dots , k\} \right\} , \end{aligned}$$

which is the set of all vectors of non-negative integers summing to k (i.e. the compositions of k into N parts). The expectations appearing in Eq. (38), \(\mathbb {E}[U_n^{l_n}]\), are simply the moments of the \(W_i^{(j)}\)’s that are calculated in Sect. 3.4.

Next, recall from Sect. 3.3 that the distribution of W can be expressed as a mixture of a point mass at \(w=0\) and a continuous part for \(w>0\), which is denoted \(W^\star \). We can therefore approximate W by fitting a parametric distribution to \(W^\star \), adding the point mass, and re-normalising appropriately. The family of distributions we fit to is chosen by considering the known properties of \(W^\star \). The distribution of \(W^\star \) is strictly non-negative and absolutely continuous (Athreya and Ney 1972, Chapter V). Additionally, we assume that the distribution of the sample paths of the branching processes at time \(t\) is unimodal and potentially heavy-tailed away from \(0\), which implies similar characteristics for \(\varvec{W}^{\star }(t)\) and hence \(W^{\star }\). This suggests that suitable distributions would likely be from the exponential family and as such we consider fitting a generalised gamma distribution using a moment matching (MM) method. This distribution has the Weibull, exponential and gamma distributions as special cases and appeared to fit well in testing.

Suppose \(W^{\star } \sim \text {GG}(\beta , \alpha _1, \alpha _2)\), then the PDF is given by

$$\begin{aligned} g_{W^\star }(w) = \frac{\alpha _2}{ \beta ^{\alpha _1}\Gamma (\alpha _1/\alpha _2)} w^{\alpha _1 - 1} \exp {\left( -\left( \frac{w}{\beta }\right) ^{\alpha _2}\right) }, \end{aligned}$$

with the \(k\)th moment given by

$$\begin{aligned} M_k(\beta , \alpha _1, \alpha _2) = \beta ^{ k} \frac{\Gamma ((\alpha _1 + k)/\alpha _2)}{\Gamma (\alpha _1 / \alpha _2)}. \end{aligned}$$

We determine the parameters of this distribution, \((\beta , \alpha _1, \alpha _2)\), by minimising the difference between its first five moments and those of \(W^{\star }\) as calculated by our method, where

$$\begin{aligned} \mathbb {E}\left[ (W^{\star })^k \right] = \frac{\mathbb {E}[W^k]}{1 - q^{\star }}, \quad k = 1, \dots , 5. \end{aligned}$$
Since the moments grow very large with increasing k, we standardise them to ensure that each is assigned approximately equal importance when fitting (Bishop 1996, Chapter 8). This ensures that the optimisation routine does not prioritise the higher order moments that would otherwise excessively contribute to a naive loss function. To determine the appropriate scaling we express the moments of \(W^{\star }\) as \(\mathbb {E}\left[ (W^{\star })^k \right] = c_k \times 10^{\eta _k}\) for some \(c_k \in [0, 10)\) and some \(\eta _k\). From this we can define the loss function as the sum of squares

$$\begin{aligned} \ell (\beta , \alpha _1, \alpha _2) = \sum _{k = 1}^{5} \left( \frac{M_k(\beta , \alpha _1, \alpha _2)}{10^{\eta _k}} - c_k \right) ^2 . \end{aligned}$$
(39)

This can be minimised numerically using standard methods to estimate the parameters.
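A minimal sketch of this fitting step is given below (our own illustration; `target_moments` is assumed to hold the first five moments of \(W^{\star }\) computed as above, and the resulting loss can be passed to any standard optimiser, e.g. Nelder-Mead).

```julia
# Loss function for matching the first five moments of a generalised gamma
# surrogate to those of W* (a sketch of Eq. (39)).
using SpecialFunctions   # provides gamma()

# kth raw moment of GG(b, a1, a2)
gg_moment(k, b, a1, a2) = b^k * gamma((a1 + k) / a2) / gamma(a1 / a2)

function gg_loss(params, target_moments)
    b, a1, a2 = params
    (b <= 0 || a1 <= 0 || a2 <= 0) && return Inf   # keep the optimiser in the valid region
    loss = 0.0
    for (k, mk) in enumerate(target_moments)
        eta = floor(log10(mk))                     # standardise so each moment is O(1)
        loss += (gg_moment(k, b, a1, a2) / 10^eta - mk / 10^eta)^2
    end
    return loss
end
```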

Samples can be drawn from the GG distribution using the inverse CDF method. Additionally, the approximation to the time-shift distribution conditional on non-extinction, denoted \(\tau ^\star \), can be derived through the CDF method and has PDF given by

$$\begin{aligned} g_{\tau ^\star }(\tau ) = \frac{\alpha _2\lambda \mu _W^{\alpha _1}}{\beta ^{\alpha _1} \Gamma \left( \frac{\alpha _1}{\alpha _2}\right) } \exp {\left( -\left( \frac{\mu _W e^{\lambda \tau }}{\beta } \right) ^{\alpha _2} + \alpha _1\lambda \tau \right) }. \end{aligned}$$
(40)

4 Results

In Sect. 3 we provided the details of two approaches for computing the distribution of W and hence the time-shift itself. The first method, which we refer to as the PE method, relies on approximating the LST of W and inverting it (detailed in Sect. 3.3) and the second, called the MM method, relies on matching the first five moments (detailed in Sect. 3.6). This section focuses on how these methods perform on three models of increasing complexity.

We first demonstrate how the methods perform on the SIR model described in Sect. 2, since this is a special case where the distribution of W is known analytically. In Sect. 4.2 we apply the methods to an SEIR model, which is a simple extension of the SIR example. Section 4.2.1 explores the effect of the hyper-parameters (the number of moments in the expansion and the step size for the embedded process \(h\)) on the accuracy of the resulting distributions as well as the computation time. Following this, in Sect. 4.2.2 we assess the effect of the total population size on the time-shift distributions, which provides some insight into when this method is suitable. Finally, in Sect. 4.2.3 we explore more complex initial conditions and how they influence the shape and location of the resulting time-shift distributions. In Sect. 4.3 we demonstrate how the method can be used to approximate the macroscopic dynamics of a more complex 6-state CTMC model.

4.1 SIR model

The first example is the SIR model (see Sect. 2 for formulation) which serves as validation of the method for a one-dimensional situation where the distribution of W, and hence the distribution of \(\tau \), is known analytically. This follows as the early time approximation of the SIR model is a one-dimensional, linear, birth-death process (Harris 1951). We give the main results here; a full derivation is given in Appendix A. For this model the LST of W is given by

$$\begin{aligned} \phi (\theta ) = \frac{\gamma }{\beta } + \frac{\beta - \gamma }{\beta } \left( 1 + \frac{\beta \theta }{\beta - \gamma } \right) ^{-1}. \end{aligned}$$
(41)

Inversion of \(\phi (\theta ) / \theta \) gives the CDF of W,

$$\begin{aligned} G_{W}(w) = q + (1 - q) \left( 1 - e^{-(1-q) w}\right) , \quad w \ge 0, \end{aligned}$$
(42)

where \(q = \gamma / \beta \). The point mass of size q at \(w=0\) corresponds to extinction of the process. Conditional on the event of non-extinction (\(w>0\)), the distribution is exponential with rate \(1 - q\).

For the results in this section we fix the initial condition at \(\varvec{X}(0) = (N-1, 1)\) where \(N = 10^6\) and the parameters at \((\beta , \gamma ) = (0.95, 0.5)\) (as they were in Sect. 2). We set the two hyper-parameters, the time step and the number of terms in the moment expansion, to \(h = 0.1\) and \(n = 30\) respectively. The choices of these parameters and their effects are assessed in the following section (Sect. 4.2).

Figure 3 shows the LST and CDF computed using our methods (green dots) against the true values (black solid line) as given by Eqs. (41) and (42), respectively. Note that we only show the PE method here as the MM method is visually indistinguishable. We see strong agreement between the exact quantities and their counterpart estimated with our methods.

Fig. 3

Panel (a): Comparison of the exact LST (Eq. 41) (black solid line) and our approximation from the two methods (green dots). Panel (b): Comparison of the CDF (Eq. 42) (black solid line) and our approximation from the two methods (green dots). Note that in both panels, the results for the two methods (PE and MM) are visually indistinguishable so we show only one (PE) for clarity. Numerical results for both methods are provided in Table 2 (color figure online)

To assess the accuracy of the different methods more quantitatively we consider the simple measure of the L1-norm (Pajankar and Joshi 2022, Chapter 13) between the CDF values computed under Eq. (42) and our methods over the interval [0, 10] in steps of 0.1. This can be considered as an average of the error over the interval. Alongside this we also report the maximal error over the interval.

Table 2 reports these two error measures. Both methods produce low average errors based on the L1-norm. The maximum error occurs for the PE method and is on the order of \(10^{-5}\); although this is 5 orders of magnitude larger than the corresponding error for the MM method, it amounts to a difference in the 4th decimal place. This is largely insignificant for practical use cases and, as seen in Fig. 3, both methods reliably approximate the LST and CDF.

Table 2 Error between the two estimation methods and the true CDF for the SIR example. Results are computed over the interval [0, 10] in steps of 0.1

4.2 SEIR model

The next example is the canonical extension to the SIR model that incorporates a latent (or exposed) state where individuals are infected but not yet infectious (Keeling and Rohani 2008, Chapter 2.5). Assuming a fixed population of size \(N\), we can formulate the SEIR model as a CTMC with state vector \(\varvec{X}(t) = (S(t), E(t), I(t))\). The model is governed by the parameters \((\beta , \sigma , \gamma )\) where \(\beta \) is the effective transmission parameter, \(\sigma \) is the rate of transitioning from \(E\) to \(I\), and \(\gamma \) is the rate of transitioning from \(I\) to \(R\). When \(\varvec{X}(0) \approx (N, 0, 0)\) we can approximate the CTMC by a 2-type CT-MBP, \(\varvec{Z}(t) = (E(t), I(t))\). Rates of the CTMC and the CT-MBP approximation are given in Table 3. Note that we overload notation here and use the same variables (S, E and I) for both the population numbers and states.

Table 3 Change in state and rates for the CTMC SEIR model and the BP approximation

We use the natural ordering of the CT-MBP state vector to define the mapping between the types so that \(E = \text {type }1\) and \(I = \text {type }2\) individuals. The lifetimes of individuals are exponentially distributed with rates \(a_1 = \sigma \) and \(a_2 = \beta + \gamma \). An individual of type 1 produces a single offspring of type 2 at the end of its lifetime with probability 1. Individuals of type 2 either produce no offspring, with probability \(\gamma / a_2\), or produce one offspring each of type 1 and type 2, with probability \(\beta / a_2\). Thus, the PGFs for the CT-MBP are

$$\begin{aligned} f_1(\varvec{s})&= \frac{\sigma }{a_1} s_2 , \end{aligned}$$
(43)
$$\begin{aligned} f_2(\varvec{s})&= \frac{\gamma }{a_2} + \frac{\beta }{a_2} s_1 s_2 . \end{aligned}$$
(44)

Next we formulate the recursive system of equations needed for computing the moments through the method outlined in Sect. 3.4. We can easily extract the rate constants from Eqs. (43) and (44) using the subscripts of the progeny generating function and the subscripts of the elements of \(\varvec{s}\) appearing in the right-hand side. In Eq. (43), \(i = 1\) (from the left-hand side) and \(j = 2\) (from the right-hand side) so \(\alpha _{12} = \sigma \). Similarly, in Eq. (44), \(i = 2\), \(k = 1\) and \(l = 2\), so \(\beta _{212} = \beta \) for the second equation. Hence the only non-zero parameters appearing in the rows of \(C^{(n)}\) arising from Eq. (36) are

$$\begin{aligned} \tilde{\alpha }_{12}^{(n)} = \frac{\sigma }{\sigma + n \lambda }, \quad \tilde{\beta }_{212}^{(n)} = \frac{\beta }{\beta + \gamma + n \lambda }, \end{aligned}$$

noting that the leading subscript denotes which row of \(C^{(n)}\) the parameters correspond to. The growth rate, \(\lambda \), for the SEIR model is given explicitly by (Ma 2020)

$$\begin{aligned} \lambda = \frac{-(\sigma + \gamma ) + \sqrt{(\sigma -\gamma )^2 + 4 \sigma \beta }}{2}. \end{aligned}$$

With the constants determined, the system of linear equations for the moments is given succinctly as

$$\begin{aligned} \begin{bmatrix} 1 &{} -\tilde{\alpha }_{12}^{(n)} \\ -\tilde{\beta }_{212}^{(n)} &{} 1 - \tilde{\beta }_{212}^{(n)} \end{bmatrix} \begin{bmatrix} \xi _1^{(n)}(0) \\ \xi _2^{(n)}(0) \end{bmatrix} = \begin{bmatrix} 0 \\ \tilde{\beta }_{212}^{(n)} \displaystyle {\sum _{r = 1}^{n-1}} \left( {\begin{array}{c}n\\ r\end{array}}\right) \xi _1^{(r)}(0)\,\xi _2^{(n - r)}(0) \end{bmatrix}, \quad n \ge 2, \end{aligned}$$

with \((\xi _1^{(1)}(0), \xi _2^{(1)}(0)) = (u_1, u_2)\), which is the normalised eigenvector corresponding to the eigenvalue \(\lambda \) as outlined in Sect. 3.1.
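For concreteness, the following Python sketch evaluates this recursion numerically for the parameter values used later in this section. The normalisation of the first moment (here the eigenvector scaled to sum to one) is an assumption made purely for illustration; the actual normalisation is that of Sect. 3.1.

```python
import numpy as np
from math import comb

# A minimal sketch of the moment recursion above for the SEIR branching process.
beta, sigma, gamma = 0.56, 0.5, 0.33
lam = (-(sigma + gamma) + np.sqrt((sigma - gamma) ** 2 + 4 * sigma * beta)) / 2

def seir_moments(n_max):
    # First moment: eigenvector of the mean matrix associated with lambda.
    u = np.array([sigma, sigma + lam])
    u = u / u.sum()                       # illustrative normalisation only
    xi = {1: u}
    for n in range(2, n_max + 1):
        a_t = sigma / (sigma + n * lam)               # alpha~_{12}^{(n)}
        b_t = beta / (beta + gamma + n * lam)         # beta~_{212}^{(n)}
        A = np.array([[1.0, -a_t],
                      [-b_t, 1.0 - b_t]])
        rhs = b_t * sum(comb(n, r) * xi[r][0] * xi[n - r][1] for r in range(1, n))
        xi[n] = np.linalg.solve(A, np.array([0.0, rhs]))
    return xi

moments = seir_moments(5)   # moments[n] holds (xi_1^{(n)}(0), xi_2^{(n)}(0))
```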

4.2.1 Hyper-parameter sensitivity

The two free hyper-parameters of the algorithm are the number of terms in the moment expansion, \(n\), and the time step, \(h\), used in the construction of the embedded process. For the simulations in this section and the following sections (unless specified otherwise) we fix the initial condition at \(\varvec{X}(0) = (N-1, 1, 0)\), where \(N = 10^6\), and the parameters at \((\beta , \sigma , \gamma ) = (0.56, 0.5, 0.33)\). Different parameter choices (satisfying \(\lambda > 0\)) were also explored and the results were consistent with those presented for this parameter combination.

Exact stochastic simulation (Gillespie 1977) is used throughout this section to estimate the empirical distribution of the time-shifts, by computing the difference between the times at which the number of infected reaches a threshold of \(0.05 N\) in the stochastic simulation and in the deterministic approximation. This threshold was chosen based on the time taken for an ensemble of stochastic simulations to have all reached their exponential growth phase, measured by comparing when the growth rate of a simulation was consistent with the deterministic model. All simulations are conditioned on the event of non-extinction and we take the error tolerance to be \(\epsilon = 1\times 10^{-6}\) (used in Eq. (20)).
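The sketch below illustrates this estimation procedure for the SEIR model, assuming the standard density-dependent CTMC rates (infection at rate \(\beta S I / N\), progression at rate \(\sigma E\) and recovery at rate \(\gamma I\), consistent with Table 3). The sign convention used for the shift, the deterministic hitting time minus the stochastic hitting time, is an assumption for illustration, and the small pure-Python ensemble is far slower than an optimised implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
N = 10**6
beta, sigma, gamma = 0.56, 0.5, 0.33
threshold = 0.05 * N

def gillespie_hit_time():
    """Time for I to reach the threshold in one CTMC realisation (None if extinct)."""
    S, E, I, t = N - 1, 1, 0, 0.0
    while I < threshold:
        r_inf, r_prog, r_rec = beta * S * I / N, sigma * E, gamma * I
        total = r_inf + r_prog + r_rec
        if total == 0.0:
            return None                        # epidemic went extinct
        t += rng.exponential(1.0 / total)
        u = rng.random() * total
        if u < r_inf:
            S, E = S - 1, E + 1
        elif u < r_inf + r_prog:
            E, I = E - 1, I + 1
        else:
            I -= 1
    return t

def deterministic_hit_time():
    """Time for I to reach the threshold under the deterministic SEIR model."""
    def rhs(t, y):
        S, E, I = y
        return [-beta * S * I / N, beta * S * I / N - sigma * E, sigma * E - gamma * I]
    def hit(t, y):
        return y[2] - threshold
    hit.terminal, hit.direction = True, 1
    sol = solve_ivp(rhs, (0.0, 300.0), [N - 1.0, 1.0, 0.0], events=hit, rtol=1e-8)
    return sol.t_events[0][0]

t_det = deterministic_hit_time()
shifts = []
while len(shifts) < 20:                        # small ensemble; pure Python is slow
    t_sim = gillespie_hit_time()
    if t_sim is not None:                      # condition on non-extinction
        shifts.append(t_det - t_sim)           # sign convention assumed for illustration
```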

Figure 4 shows the distribution of the time-shifts for different choices of the number of moments, \(n\). With few terms in the expansion (\(n = 3\)) the PDF is estimated poorly: oscillations appear in the PDF estimated by our method and these grow in the right tail of the distribution. The oscillations are clear in the \(n = 3\) case and are still present, though less obvious, in the \(n = 10\) and \(n = 15\) cases. There is minimal difference between choices of \(n\) once \(n > 15\), and at \(n = 30\) the expansion is highly accurate. In what follows, unless stated otherwise, we use \(n = 30\) moments in the expansion.

Fig. 4 Effect of the number of terms, n, in the moment expansion. Each panel shows the PDF for the random time-shifts with a varying number of terms, where the value of n is given in the panel title. The black histogram shows the empirical PDF and the coloured lines show the estimated density from the PE method (color figure online)

Figure 5 shows the time-shift distributions for four different choices of the time-step, \(h\), of the embedded process. The choices reflect typical values for a model of this size and temporal resolution, and the computed distributions are visually indistinguishable, so the method appears insensitive to \(h\). The estimated PDFs show no deviation from the empirical time-shift distribution, which suggests that the choice of \(n\) is more critical to the accurate estimation of the time-shift distributions.

Fig. 5 Effect of the discrete time-step, h, used in the embedded process. The black histogram shows the empirical density and the coloured lines show the density from the PE method. The computed PDFs are visually indistinguishable from one another (color figure online)

We also explored the computation time of the method under the different choices of \(h\). All tests were run on a 2021 MacBook Pro with an M1 chip and the runtimes are provided in Table 4. As \(h\) increases the runtimes reduce slightly, a result of the smaller number of times we need to recursively evaluate Eq. (22) (i.e. the value of \(\kappa \) in Eq. (23) is smaller). The main computational expense of the method is the repeated inversion of the LST required to evaluate the CDF and produce an approximation to it. This is needed for sampling realisations, as would be required within a simulation routine.

Table 4 Benchmark results of CPU time (\(10^{-3}\) s) for different choices of h

4.2.2 Agreement at smaller population sizes

Our theoretical results are valid in the limit as the population size \(N\rightarrow \infty \). In practical circumstances \(N\) will be finite and here we explore how the empirical distributions compare with the computed time-shift distributions in such cases.

Figure 6 shows the empirical time-shift distribution alongside the estimated distributions, using both the MM and PE methods, for different population sizes (provided in the titles of each subplot). The MM and PE methods produce PDFs that agree strongly with one another and are independent of the population size \(N\). Some clear deviation from the empirical time-shift distribution can be seen in the \(N = 10^3\) case, but as \(N\) increases the empirical distribution converges to the estimated time-shift distribution. In the \(N = 10^4\) case there are still some minor deviations between the empirical distribution and the distributions estimated by our approaches, but the methods still appear adequate in this situation.

Fig. 6 PDFs for the time-shifts from the SEIR example with different population sizes (the population size is given in the subtitle of each subplot). The PDFs estimated through simulation are shown by the solid black lines. PDFs for the PE and MM methods are shown by the coloured dots and dashed lines respectively. Note that the time-shift distributions estimated using our method are independent of N and thus the same in all four panels (color figure online)

4.2.3 Different initial conditions

In this section we briefly explore the impact of different initial conditions on the distribution of the time-shift. As the LSTs for all simple initial conditions (starting with a single individual of a particular type) are computed simultaneously in Algorithm 1, we can easily calculate the LST for an arbitrary initial condition, as detailed in Sect. 3.1.

Figure 7 shows the empirical time-shift distributions for varied initial conditions alongside the computed distributions from both the PE and MM methods. Both methods reliably estimate the shape of the PDFs and capture the tail behaviour. A further observation concerns the shape of the time-shift distribution as the initial condition becomes larger (i.e. both \(E\) and \(I\) increase). When \((E, I) = (1, 0)\) the distribution is left-skewed with a median close to \(10\) and a larger variance compared to the other initial conditions. The long left tail accounts for slow-growing epidemics, which occur with lower probability. This shape aligns with the intuition that stochasticity plays a larger role in an outbreak when there are few infections, which in turn influences the time taken to reach the exponential growth phase. As we approach the \((E, I) = (15, 10)\) case, the distributions become more symmetric and the median moves closer to a time-shift of \(0\) days; that is, the model typically reaches the growth phase much sooner when the initial number of individuals is larger.
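The combination step can be sketched as follows: by the branching property, the lines of descent of the initial individuals are independent, so the LST for an initial condition \((E_0, I_0)\) is a product of powers of the single-individual LSTs. The functions lst_E and lst_I below are hypothetical placeholders rather than the LSTs produced by Algorithm 1.

```python
import numpy as np

# Sketch of the combination step: lines of descent of the initial individuals
# are independent (branching property), so the LST for (E0, I0) initial
# individuals is a product of powers of the single-individual LSTs.
lst_E = lambda theta: 1.0 / (1.0 + theta)         # placeholder LST for one E individual
lst_I = lambda theta: 1.0 / (1.0 + 0.5 * theta)   # placeholder LST for one I individual

def lst_initial_condition(theta, E0, I0):
    return lst_E(theta) ** E0 * lst_I(theta) ** I0

print(lst_initial_condition(np.linspace(0.0, 2.0, 5), E0=15, I0=10))
```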

Fig. 7 PDFs for the time-shifts from the SEIR example with varying initial conditions, where the initial condition is given in the subtitle of each subplot. The PDFs estimated through simulation are shown by the solid black lines. PDFs for the PE and MM methods are shown by the coloured dots and dashed lines respectively (color figure online)

4.3 Innate response model

Our final example is a within-host viral kinetics model characterising a respiratory infection such as COVID-19 or influenza. This demonstrates our method on a more complex model and shows the simplicity of simulating macroscopic dynamics using the time-shift approach. The model is a (slightly adjusted) stochastic version of that used in Ke et al. (2021), which is itself an extension of the so-called target-cell-limited model to incorporate an innate immune response (Baccam et al. 2006). Our adjustment is that we also account for two mechanisms of infection, through virions and through cell-to-cell interactions, to further increase the complexity of the model (Odaka and Inoue 2021).

This model tracks six population sizes: target (or susceptible) cells (\(U\)), cells in an eclipse phase (\(E\)), infectious cells (\(I\)), viral particles (\(V\)), interferons (\(A\)) and cells refractory to infection (\(R\)). Target cells can become infected when bound by a virus particle, at rate \(\beta _1\), or infected by infectious cells (\(I\)) through the viral synapse structure, at rate \(\beta _2\) (Odaka and Inoue 2021). We assume that contacts in these transmission processes scale with the density of cells and virus, which we capture by assuming some maximal carrying number of cells and virus, \(K\). Infected cells pass through an eclipse phase (\(E\)) at rate \(\sigma \) before becoming infectious (\(I\)), and infectious cells are removed at rate \(\gamma \). During the eclipse phase we also account for the death of infected but not yet infectious cells, at rate \(\eta \). Over a cell's infectious period, new virus particles (\(V\)) are produced at rate \(p_V\) and are cleared at rate \(c_V\). Furthermore, we consider the effect of an innate immune response whereby an infectious cell produces interferons (\(A\)) at rate \(p_A\) (cleared at rate \(c_A\)), which bind to cells at rate \(\delta \) and cause them to become refractory to infection (\(R\)). Finally, refractory cells become target cells again at rate \(\varrho \). This model can be formulated as a six-state CTMC with state vector \(\varvec{X}(t) = (U(t), R(t), E(t), I(t), V(t), A(t))\).

Near the unstable equilibrium \(\varvec{X}(0) \approx (U_0, 0, 0, 0, 0, 0)\), where \(U_0\) is the maximum number of cells lining the upper respiratory tract (URT), the process can be approximated by a CT-MBP, \(\varvec{Z}(t) = (E(t), I(t), V(t))\). This process is much simpler than the original CTMC as we do not need to model the immune response directly; intuitively, the immune response is not critical in the early stages of an infection because of the large number of target cells. The rates of the CTMC and the CT-MBP approximation are given in Table 5.

Table 5 Change in state and rates for the CTMC innate response model and the BP approximation

Letting \(\bar{\beta }_i = \beta _i U_0 / K\) for \(i = 1, 2\), from Table 5 the vector of lifetime parameters for the CT-MBP is

$$\begin{aligned} \varvec{a} = \left( \sigma + \eta , \, \gamma + p_V + \bar{\beta }_2, \, \bar{\beta }_1 + c_V \right) \end{aligned}$$

and the PGFs are

$$\begin{aligned} \begin{aligned} f_1(\varvec{s})&= \frac{\eta + \sigma s_2}{a_1}, \\ f_2(\varvec{s})&= \frac{\gamma + p_V s_2 s_3 + \bar{\beta }_2 s_2 s_1}{a_2}, \\ f_3(\varvec{s})&= \frac{c_V + \bar{\beta }_1 s_1}{a_3}. \end{aligned} \end{aligned}$$
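To indicate how these quantities feed into the rest of the computation, the sketch below assembles the mean matrix of the CT-MBP, whose dominant eigenvalue gives the growth rate \(\lambda \). The parameter values are placeholders for illustration only; the values used in the simulations are those listed in Table 6.

```python
import numpy as np

# Sketch: the Malthusian growth rate of the CT-MBP is the dominant eigenvalue
# of the mean matrix Omega, where Omega[i, j] = a_i * (df_i/ds_j at s = 1)
# minus a_i on the diagonal. Parameter values below are placeholders only.
sigma, eta, gamma = 4.0, 1.0, 1.7
p_V, c_V = 45.0, 10.0
beta1_bar, beta2_bar = 0.5, 0.1

a = np.array([sigma + eta, gamma + p_V + beta2_bar, beta1_bar + c_V])
Omega = np.array([
    [-(sigma + eta), sigma,  0.0],
    [beta2_bar,      -gamma, p_V],
    [beta1_bar,      0.0,    -(beta1_bar + c_V)],
])
lam = max(np.linalg.eigvals(Omega).real)   # dominant (real) eigenvalue
print("growth rate lambda:", lam)
```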

For the simulations we take the initial condition \(\varvec{X}(0) = (U_0-1, 0, 1, 0, 0, 0)\), where \(U_0 = 8\times 10^7\). For convenience we set the maximum number of agents in the system to \(K = U_0\), as this is already very large (so our approximations hold). The parameter values used in the simulations, together with brief descriptions for interpretability, are given in Table 6. We use \(n = 30\) moments in the expansion and a step size of \(h = 0.1\) for the PE method, as these choices showed strong accuracy in the previous section. We also tested \(h = 1\), but this produced divergent behaviour in the inversion (Fig. 10), which suggests that the time-scale of the process also influences the choice of \(h\) and that this choice is hence model dependent.

Table 6 Parameter values and interpretations for the innate response model

Let \(\varvec{x}(t) = K^{-1} \varvec{X}(t)\) denote the density process. In the limit \(K\rightarrow \infty \), a system of differential equations for the evolution of the density can be derived (see Appendix D for details),

(45)

Solving Eq. (45) we obtain the deterministic approximation \(\varvec{X}(t) \approx K \varvec{x}(t)\).

Figure 8 demonstrates the close agreement between a stochastic realisation and a trajectory generated using the time-shift methodology. The time-shift is determined from the time at which the virus population, \(V(t)\), in the stochastic realisation exceeds 2000 and has thus reached the exponential growth phase. The time-shift distributions themselves are shown in Fig. 9, and the value of the time-shift used to produce Fig. 8 (\(\tau = -0.422\)) is indicated by the orange line. This example also shows that the stochastic dynamics of the full 6-state model can be captured using the simpler 3-state model \((E, I, V)\): the time-shift computed from the branching process is also reflected in the components of the system not tracked by the BP approximation. This in itself demonstrates great utility in being able to quickly generate sample paths when the macroscopic dynamics are the main focus of the analysis. The agreement is worse at very small population numbers, but in practice it is only once a population has grown to a large size that it becomes observable.

Fig. 8 Example simulation for the innate response model. The solid line indicates the stochastic simulation and the dotted line indicates the shifted deterministic solution (color figure online)

Fig. 9 PDFs for the time-shift in the innate response model. The PDF estimated through simulation is shown by the solid black lines. PDFs for the PE and MM methods are shown by the coloured dots and dashed lines respectively. The vertical orange line indicates the value of the time-shift used to produce Fig. 8 (color figure online)

Figure 9 shows that both the PE and MM methods are reliably able to approximate the PDF of the time-shift. The time-shift distribution has lower variance compared to those seen in Sect. 4.2, which is a result of the faster growth and larger reproduction numbers of the populations in a within-host process.

5 Discussion

We have presented a framework to approximate the distribution of the time-shift between sample paths of a broad class of Markov chain models and the corresponding deterministic solution. The recent work of Barbour et al. (2015) establishes a crucial theoretical result: the initial dynamics of the Markov process affect the long-time dynamics in a manner that is largely captured by a random variable \(W\), and the time-shift is a simple transformation of this variable. Our contribution is a numerical framework to approximate the distribution of \(W\) for the class of continuous-time Markov chains identified at the start of Sect. 3. We introduced the PE method (Sect. 3.2), an accurate approximation, dependent on several hyper-parameters, built on the Laplace-Stieltjes transform of \(W\), and the MM method (Sect. 3.6), a fast approximation that matches the first five moments of \(W\) to a generalised gamma distribution.

The PE and MM methods are both flexible and can be applied to multi-type branching process models in discrete-time and continuous-time. They allow one to generate solution curves that preserve the macroscopic behaviours of the sample paths from the stochastic model. Sample paths obtained in this way can be used for rapid and accurate simulation studies or inference on medium to long time scales.

The theory of branching processes is well established (see Harris 1948, 1964; Mode 1971; Athreya and Ney 1972). The random variable W was constructed and studied in the literature as early as 1948 (Harris 1948, 1951). However, these bodies of work focus on models with very particular structure that aids analytic tractability; for example the linear birth-death process we derive as an early time approximation to the SIR model in Sect. 2 is studied in Harris (1951). Another tractable example is the linear fractional branching process featured in Kimmel and Axelrod (2015, Chapter 1). To our knowledge, prior to this manuscript there were no tools developed specifically to compute W in the more general multivariate case and our methodology provides a solution for this.

We explored three models of varying complexity: the SIR model in Sects. 2 and 4.1, an SEIR model in Sect. 4.2 and a 6-dimensional innate response model in Sect. 4.3. The numerical results show that both the PE and MM methods reproduce the empirical distribution accurately across a broad range of model conditions, and demonstrate how the hyper-parameters of the PE method can easily be chosen. Furthermore, the innate response model demonstrates an effective macroscale reduction performed by our method (Givon et al. 2004; Roberts 2015). In this model the branching process is 3-dimensional, and the computation of \(W\) from this low-dimensional space allows us to accurately reproduce the macroscale dynamics of the full 6-dimensional system (shown in Fig. 8). This macroscale reduction is a known consequence of Theorem 1.1 of Barbour et al. (2015), and the contribution of our manuscript is to provide a practical route to compute it.

All of the models we considered shared relatively simple branching dynamics, with linear and quadratic progeny generating functions. This may seem restrictive, but most biological or physical processes have dynamics of this form, especially under the common assumption of mass-action mixing. The only requirement common to both the PE and MM methods is that we can compute the moments of \(W\), and it should be straightforward to extend the rules derived in Sect. 3.4 for the calculation of the moments to more complex dynamics. An example exhibiting such dynamics arises in modelling the accumulation of HIV-1 mutations, where the progeny generating functions are polynomials of large (positive) integer order (Shiri and Welte 2011). Other examples arise where the offspring distribution has a parametric form, such as the negative binomial often used in epidemic models that feature super-spreading (Garske and Rhodes 2008). We have provided flexible open source code to automate these computations, which can easily be extended to handle new cases as they are investigated.

Our work can in principle be extended to branching processes with non-Markovian lifetimes of individuals; these models are referred to as age-dependent branching processes (Athreya and Ney 1972, Chapter IV; Harris 1964, Chapter VI; Mode 1971, Chapter 3). Age-dependent processes have the same definition of \(\varvec{W}(t)\) (e.g. Eq. (11)) with equivalent limiting behaviours, and can be studied in much the same way as in Sect. 3.1. The main source of difficulty in working with age-dependent processes arises from differentiating the functional equations required for computing the conditional moments of \(W\). As opposed to the Markovian case, the integrals that need to be computed may be much more challenging and may even require numerical integration. This would slow the computation, as exact results like those presented in Sect. 3.4 may not arise. Current work is investigating which cases are amenable to analysis.

One of the primary advantages of our methodology is the fast generation of macroscopic sample paths that also capture the early-time stochasticity. Simulation of stochastic models in either an exact (Gillespie 1977) or approximate framework (Gillespie 2001) is often a computational bottleneck (Gillespie 2001; Black 2018; Kreger et al. 2021). This expense is mostly a result of the need to generate random numbers to determine which event occurs next, and the number of events scales linearly with the size of the system (Gillespie 1977). Approximate stochastic simulation methods alleviate this demand somewhat by approximating the number of events over a time-step, usually referred to as tau-leaping (Gillespie 2001). A second class of methods, under which our methodology can loosely be classified, are hybrid simulation methods, which combine deterministic and stochastic simulation to produce more efficient routines (Kreger et al. 2021; Rebuli et al. 2017). Often these methods switch regimes once the populations enter exponential growth, moving from a stochastic simulation to a deterministic solution. They therefore still incur the cost of random number generation during the stochastic regime, as well as the need to solve the deterministic model with a different initial condition for each path generated (Rebuli et al. 2017; Kreger et al. 2021). In contrast, our approach requires only a single solve of the deterministic model and a draw from the univariate time-shift distribution for each path: the solution is simply replicated and shifted by the random variate, as sketched below. This is much less computationally demanding and is therefore readily applicable to inference methods that rely on forward simulations of the model, such as particle filters. In situations where the population declines back to small numbers after reaching large values, one could return to stochastic simulation with the initial condition determined from the deterministic solution.
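A minimal sketch of this construction, using the SEIR model of Sect. 4.2 for concreteness, is given below. The function sample_time_shift is a hypothetical placeholder for drawing \(\tau \) from the PE or MM approximation, and the sign convention of the shift is an assumption.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch: solve the deterministic model once, then build each sample path by
# evaluating that same solution at t + tau for a fresh draw of tau.
beta, sigma, gamma, N = 0.56, 0.5, 0.33, 10**6
rng = np.random.default_rng(0)

def rhs(t, y):
    S, E, I = y
    return [-beta * S * I / N, beta * S * I / N - sigma * E, sigma * E - gamma * I]

t_grid = np.linspace(0.0, 150.0, 1501)
sol = solve_ivp(rhs, (t_grid[0], t_grid[-1]), [N - 1.0, 1.0, 0.0],
                t_eval=t_grid, rtol=1e-8, atol=1e-8)

def shifted_path(tau):
    # Evaluate the single deterministic solution at t + tau; np.interp holds
    # the endpoint values outside the solved range.
    return np.array([np.interp(t_grid + tau, t_grid, sol.y[k]) for k in range(3)])

sample_time_shift = lambda: rng.normal(0.0, 2.0)   # placeholder draw, not the real distribution
paths = [shifted_path(sample_time_shift()) for _ in range(100)]
```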

As detailed in Sect. 4.2.1, estimating PDF values using the PE method is fast (on the order of \(10^{-2}\) s), and with the method we can easily draw realisations of \(W\) through standard sampling approaches (e.g. rejection sampling). The MM method speeds up sampling further by approximating the distribution analytically: fitting an analytical distribution means realisations can be drawn directly using the inverse CDF method (taking around \(10^{-6}\) s), as illustrated below. When accuracy is paramount, the PE method offers a framework for computing the LST to arbitrary precision by increasing the number of moments and controlling the tolerance of the moment expansion in the neighbourhood of 0. The PE method relies on several numerical methods, each of which has an associated error; despite this, the main contributor to the overall error is the LST approximation in the neighbourhood of 0, as explained in Sect. 3.2. The errors associated with the numerical integration of the ODEs and the Laplace inversion are controlled through tolerances that can be automatically tuned in software. In situations where performance is the priority, the MM method provides a much faster route to a reliable approximation of the distribution of \(W\) (as seen for the examples in Sect. 4). However, as it is an approximation to the distribution of \(W\), some characteristics of the distribution may not be captured (e.g. very long tails). From our testing, matching the first five moments yields a highly accurate approximation to the distribution. By providing two methods for computing the distribution of \(W\), we give a framework in which the reliability of the MM method can be assessed for a given problem if needed.
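As an illustration of the inverse-CDF sampling step, the sketch below draws samples from a generalised gamma surrogate. The parameter values are placeholders rather than a fitted result, and mapping the fitted parameters onto the (a, c, loc, scale) parametrisation of scipy.stats.gengamma is an assumption.

```python
import numpy as np
from scipy.stats import gengamma

# Sketch of inverse-CDF sampling from the MM surrogate. The generalised gamma
# parameters below are placeholders, not a fitted result; in practice they come
# from the moment-matching step.
a, c, loc, scale = 2.0, 1.5, 0.0, 1.0
rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)
samples = gengamma.ppf(u, a, c, loc=loc, scale=scale)   # inverse CDF method
```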