1 Introduction

For models of infectious spread in which long-term quasi-stable endemicity is possible, a random variable of particular interest is the persistence time until infection dies out from the population. For a number of such models, it is known (Andersson and Djehiche 1998; Ball et al. 2016; van Herwaarden and Grasman 1995) that as the typical population size N tends to infinity, the expected persistence time \(\tau \) for an infection that has become endemic in the population satisfies

$$\begin{aligned} \tau\sim & {} {C \over \sqrt{N}} \exp ( NA ) \end{aligned}$$
(1)

where we use \(\sim \) to denote that the ratio of the two sides converges to 1 as \(N \rightarrow \infty \), and A and C are constants whose values depend upon parameters of the process, but not upon N. We assume here, and from now on, that the basic reproduction number \(R_0\) (the average number of new infections caused by a typical infected individual in an otherwise susceptible population) is greater than one, so that the process is super-critical, or ‘above threshold’.

A pioneering piece of work in this direction was van Herwaarden and Grasman (1995), where it was shown that the relationship (1) holds for a particular susceptible–infectious–removed (SIR) infection model. Evaluation of A required numerical solution of a system of ordinary differential equations, while no method for evaluating C was given. It should be noted that van Herwaarden and Grasman (1995) studied a diffusion approximation to the infection model, but it is now well known (see, e.g., Clancy and Tjia 2018; Doering et al. 2005) that such a diffusion approximation does not, in general, give correct leading-order asymptotics; that is, the value of the constant A computed from the diffusion approximation is not necessarily equal to the correct A value for the underlying discrete state-space model. Using rather different techniques, Andersson and Djehiche (1998) derived a result of the form (1), together with explicit expressions for the constants A and C, for the classic susceptible–infectious–susceptible (SIS) model of Weiss and Dishon (1971).

In recent years, a number of authors have applied techniques from statistical physics to study persistence times for a range of population models, including infection models. For models which are naturally one-dimensional, including the classic SIS model, a relationship of the form (1) together with explicit formulae for A and C can be obtained (Assaf and Meerson 2010, 2017). For multidimensional models (including most infection models), it is usually only possible to establish results of the cruder form \(\lim _{N \rightarrow \infty } (\ln \tau ) / N = A\), and to evaluate the leading-order constant A via numerical solution of a system of ordinary differential equations (Dykman et al. 1994; Elgart and Kamenev 2004; Hindes and Schwartz 2016; Kamenev and Meerson 2008; Lindley et al. 2014). One reason why the technique has not been more widely exploited is that for a k-dimensional model, it is necessary to solve a system of ordinary differential equations in 2k dimensions subject to boundary conditions at times \(t = - \infty \) and \(t = + \infty \). Progress has been made by a number of authors in the development of efficient numerical procedures (e.g. Forgoston et al. 2011; Lindley and Schwartz 2013), but implementation for models in dimensions \(k>2\) remains far from trivial.

Much of the above work on infection models assumes that individuals’ infectious periods are exponentially distributed. This is not biologically realistic for most infections and is done purely for reasons of mathematical tractability, so that it is of interest to understand the effect of this simplifying assumption upon the resulting persistence time estimates. Ball et al. (2016) extended the result of Andersson and Djehiche (1998) for the classic SIS model to allow for a quite general infectious period distribution, by applying a result on insensitivity in stochastic networks from Zachary (2007), finding that the leading-order constant A takes the same value regardless of the infectious period distribution (provided only that its mean is held constant), but that the prefactor constant C must be appropriately modified.

A different extension of the result of Andersson and Djehiche (1998) has recently been established in Clancy (2018). For an SIS model incorporating heterogeneity in susceptible individuals’ levels of susceptibility or infected individuals’ levels of infectiousness, an explicit formula was found for the leading-order constant A in the relationship \(\lim _{N\rightarrow \infty } (\ln \tau ) / N = A\). Infectious periods were allowed to follow an Erlang distribution, and the value of A shown to depend only upon the mean of the distribution, under this assumption. In the current paper, we build on the work of Clancy (2018) to establish results of the much more precise form (1), including simple explicit formulae for the prefactor constant C. At the same time, we extend the model of Clancy (2018) in two ways: firstly, we allow for heterogeneity in individuals’ infectious period distributions in addition to the heterogeneities in susceptibility and infectiousness of Clancy (2018); secondly, following the approach of Ball et al. (2016), we allow for infectious periods following quite general (not necessarily Erlangian) distributions. Clancy (2018) showed that for a sufficiently large population, greater heterogeneity (in the sense of majorization ordering, see Marshall et al. 2011), whether in susceptibility or infectiousness, leads to a reduction in mean persistence time of infection in the population. Using our more general model, we are also able to investigate the effect of heterogeneity in infectious period distributions; see Sect. 5.2.

We note that the only infection model for which such explicit formulae for A and C have previously been available is the SIS model in a homogeneously mixing population (with general infectious period distribution); in general, the leading-order constant A must be evaluated via numerical solution of a system of ordinary differential equations (a non-trivial exercise, as noted above), while no general method exists to evaluate the prefactor constant C for multidimensional models.

The remainder of the paper is structured as follows. In Sect. 2, we define our model and state our main result, Theorem 1. Section 3 recalls some standard approximations for infection models and general theory that we will require in the sequel. The proof of our results occupies Sect. 4. In Sect. 5, we demonstrate the accuracy of our approximations, both for mean persistence time and for the quasi-stationary distribution of the process; we apply our results to investigate the effects of different forms of heterogeneity; and we outline the application of our results to network models via the annealed network approximation (Hindes and Schwartz 2016, 2017). Finally, in Sect. 6, we present some concluding discussion and suggestions for further work.

2 The Model and Asymptotic Persistence Time Formulae

Consider a closed population of N individuals divided into k groups, with group i (\(i=1,2,\ldots ,k\)) consisting of \(N_i\) individuals. Denote by \(f_i = N_i / N\) the proportion of the population belonging to group i, so that \(\sum _i f_i = 1\). When a group i individual becomes infected, it remains so for a time distributed as a random variable \(T_i\) of mean \(\alpha _i = E \left[ T_i \right] \), after which it returns to the susceptible state. During this infectious period, the group i infective makes contacts with each individual in each group \(j = 1 , 2 , \ldots , k\) at the points of a Poisson process of rate \(\beta \lambda _i \mu _j / N\), where \(\beta \) is some overall measure of infectiousness, \(\lambda _i\) represents the relative infectiousness of group i individuals and \(\mu _j\) represents the relative susceptibility of group j individuals. (The assumption that the group i to group j infection rate factorises in this way is sometimes referred to as ‘separable mixing’.) Without loss of generality, we scale the \(\lambda _i\), \(\mu _j\) values so that \(\sum _i \lambda _i f_i = \sum _j \mu _j f_j = 1\). These Poisson processes and infectious periods are all mutually independent. If a contacted individual is susceptible, then it becomes infected (and infectious); if the contacted individual is already infected, then the contact has no effect. We denote by \(I_j (t)\) the number of infected individuals in group j at time \(t \ge 0\), the corresponding number of susceptible individuals being \(S_j (t) = N_j - I_j (t)\), and write \(\varvec{I}(t) = \left( I_1 ( t ) , I_2 (t) , \ldots , I_k (t) \right) \). We assume throughout that \(\beta > 0\), and that \(f_i , \alpha _i , \lambda _i , \mu _i > 0\) for all i. The basic reproduction number \(R_0\) for our model is given by

$$\begin{aligned} R_0= & {} \beta \sum _{i=1}^k \alpha _i \lambda _i \mu _i f_i , \end{aligned}$$
(2)

and we will assume throughout that parameter values are such that \(R_0 > 1\).

In order to state our results, we require the following definitions. Define \(D ( \varvec{\lambda }, \varvec{\mu })\) to be the unique positive solution of

$$\begin{aligned} \beta \sum _{j=1}^k {\alpha _j \lambda _j \mu _j f_j \over 1 + \alpha _j \mu _j D ( \varvec{\lambda }, \varvec{\mu })}= & {} 1 . \end{aligned}$$
(3)
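
As an illustration of how \(D ( \varvec{\lambda }, \varvec{\mu })\) might be computed in practice, the following minimal sketch solves Eq. (3) by bisection. It relies on the observation that the left-hand side of Eq. (3) is strictly decreasing in D and equals \(R_0\) at \(D = 0\). The function name `solve_D` and the two-group parameter values are assumptions made purely for illustration; they are not taken from the paper.

```python
def solve_D(beta, alpha, lam, mu, f, tol=1e-13):
    """Positive root D(lam, mu) of Eq. (3), located by bisection."""
    def g(D):
        # Left-hand side of Eq. (3) minus 1; strictly decreasing in D >= 0.
        return beta * sum(a * l * m * fi / (1.0 + a * m * D)
                          for a, l, m, fi in zip(alpha, lam, mu, f)) - 1.0

    if g(0.0) <= 0.0:  # g(0) = R_0 - 1 by Eq. (2)
        raise ValueError("R_0 <= 1, so Eq. (3) has no positive solution")
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:  # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical two-group example (values chosen only for illustration).
f = [0.5, 0.5]
alpha = [1.0, 2.0]
lam = [0.8, 1.2]   # scaled so that sum(lam_i * f_i) = 1
mu = [0.5, 1.5]    # scaled so that sum(mu_i * f_i) = 1
beta = 2.0
D = solve_D(beta, alpha, lam, mu, f)
R0 = beta * sum(a * l * m * fi for a, l, m, fi in zip(alpha, lam, mu, f))
print("R_0 =", R0)
print("D(lam, mu) =", D)
```

In the homogeneous case (\(k=1\), \(\alpha _1 = 1\)), Eq. (3) reduces to \(\beta / (1 + D) = 1\), so the routine should return \(D = \beta - 1\).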

Denote by \(\varphi _i ( \theta ) = E \left[ \mathrm{e}^{\theta T_i} \right] \) the moment-generating function of \(T_i\), and define the function \(\psi ( \theta )\) to be

$$\begin{aligned} \psi ( \theta )= & {} \sum _{i=1}^k \mu _i f_i \varphi _i \left( - \beta \lambda _i (1-\theta ) \right) . \end{aligned}$$
(4)

Finally, define \(\omega \) to be the unique solution in [0, 1) of

$$\begin{aligned} \omega= & {} \psi (\omega ) . \end{aligned}$$
(5)

We are now in a position to state our main result as follows.

Theorem 1

Consider the heterogeneous population SIS infection model defined above, and recall that \(\tau \) denotes the expected time from quasi-stationarity to disease extinction.

  1. (i)

    Suppose that heterogeneity is in susceptibility but not infectiousness (so \(\varvec{\lambda }= \mathbf{1}\)), and that \(R_0 > 1\). Then as \(N \rightarrow \infty \),

    $$\begin{aligned} \tau\sim & {} {1 \over \beta \left( 1 - \omega \right) D ( \mathbf{1} , \varvec{\mu })} \sqrt{{2 \pi \over N \sum _i f_i \left( {\alpha _i \mu _i \over 1+\alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu })} \right) ^2}} \nonumber \\&\times \exp \left( N \left( \sum _i f_i \ln \left( 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu }) \right) - {D ( \mathbf{1} , \varvec{\mu }) \over \beta } \right) \right) . \end{aligned}$$
    (6)
  2. (ii)

    Suppose that heterogeneity is in infectiousness but not susceptibility (so \(\varvec{\mu }= \mathbf{1}\)) and that \(R_0 > 1\). Provided that individuals’ infectious periods are exponentially distributed (so the process \(\varvec{I}(t)\) is Markovian with transition rates given in Table 1), then as \(N \rightarrow \infty \),

    $$\begin{aligned} \tau\sim & {} {1 \over D ( \varvec{\lambda }, \mathbf{1} ) D ( \mathbf{1} , \varvec{\lambda })} \sqrt{{2 \pi \over N \sum _i f_i \left( {\alpha _i \lambda _i \over 1+\alpha _i \lambda _i D ( \mathbf{1} , \varvec{\lambda })} \right) ^2}} \nonumber \\&\times \exp \left( N \left( \sum _i f_i \ln \left( 1 + \alpha _i \lambda _i D ( \mathbf{1} , \varvec{\lambda }) \right) - {D ( \mathbf{1} , \varvec{\lambda }) \over \beta } \right) \right) . \end{aligned}$$
    (7)
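
To show how formula (6) of Theorem 1(i) might be evaluated numerically, here is a minimal sketch. It solves Eq. (3) with \(\varvec{\lambda }= \mathbf{1}\) by bisection and obtains \(\omega \) from Eq. (5) by fixed-point iteration started at 0. The function names, the choice of exponential and constant infectious periods, and all parameter values are assumptions made for illustration only.

```python
import math


def bisect_decreasing(g, tol=1e-13):
    """Root of a strictly decreasing function g on (0, inf) with g(0) > 0."""
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)


def tau_theorem_1i(N, beta, alpha, mu, f, mgfs):
    """Right-hand side of Eq. (6); mgfs[i] is the moment-generating function of T_i."""
    # D(1, mu): Eq. (3) with lambda_i = 1 for all i.
    D = bisect_decreasing(lambda d: beta * sum(
        a * m * fi / (1.0 + a * m * d) for a, m, fi in zip(alpha, mu, f)) - 1.0)
    # omega: root of omega = psi(omega) in [0, 1), Eqs. (4)-(5). psi is increasing,
    # so iterates started from 0 increase towards the smallest fixed point.
    omega = 0.0
    for _ in range(100000):
        new = sum(m * fi * phi(-beta * (1.0 - omega))
                  for m, fi, phi in zip(mu, f, mgfs))
        converged = abs(new - omega) < 1e-15
        omega = new
        if converged:
            break
    prefactor = 1.0 / (beta * (1.0 - omega) * D)
    gauss = math.sqrt(2.0 * math.pi / (N * sum(
        fi * (a * m / (1.0 + a * m * D)) ** 2 for a, m, fi in zip(alpha, mu, f))))
    A = sum(fi * math.log1p(a * m * D) for a, m, fi in zip(alpha, mu, f)) - D / beta
    return prefactor * gauss * math.exp(N * A)


def exponential_mgf(mean):
    return lambda theta: 1.0 / (1.0 - mean * theta)  # valid for theta < 1 / mean


def constant_mgf(mean):
    return lambda theta: math.exp(mean * theta)      # T_i fixed at its mean


# Hypothetical example: two groups, N = 100 (illustrative values only).
f, alpha, mu, beta = [0.5, 0.5], [1.0, 1.0], [0.5, 1.5], 2.0
for name, make in [("exponential", exponential_mgf), ("constant", constant_mgf)]:
    print(name, tau_theorem_1i(100, beta, alpha, mu, f, [make(a) for a in alpha]))
```

In this sketch the infectious period distribution enters only through \(\omega \) in the prefactor; the exponent depends on the distributions only via the means \(\alpha _i\), as formula (6) indicates.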

In the next section, we summarise some standard theory that we will require in order to prove Theorem 1; the proof itself occupies Sect. 4.

3 General Theory

We will make use of two standard approximations to the process \(\varvec{I}(t)\), valid during different phases of an outbreak. First, consider the early stages of an outbreak initiated by a small number of infected individuals in a large susceptible population. So long as the number of infected individuals remains small relative to the total population size, the process \(\varvec{I}(t)\) may be approximated by a multitype branching process in which each group i individual lives for a time distributed as \(T_i\) and during this time gives birth to new group j individuals (\(j=1,2,\ldots ,k\)) according to a Poisson process of rate \(\beta \lambda _i \mu _j f_j\). Denoting by \(G_{ij}\) the number of type j offspring of a typical type i individual, the basic reproduction number \(R_0\) is given by the dominant eigenvalue of the mean offspring matrix M with entries \(m_{ij} = E \left[ G_{ij} \right] = \beta \alpha _i \lambda _i \mu _j f_j\), leading to formula (2). Denoting by \(\omega _i\) the probability that the branching process initiated by a single group i individual produces only a finite number of offspring (corresponding to a minor outbreak of infection), then for \(R_0 \le 1\) we have \(\omega _1 = \omega _2 = \cdots = \omega _k = 1\), while for \(R_0 > 1\), \(\varvec{\omega }= \left( \omega _1 , \omega _2 , \ldots , \omega _k \right) \) is the unique solution in \([0,1)^k\) of

$$\begin{aligned} \omega _i= & {} \varphi _i \left( - \beta \lambda _i \left( 1 - \sum _{j=1}^k \mu _j f_j \omega _j \right) \right) \quad \hbox { for } i=1,2,\ldots ,k , \end{aligned}$$

see, for instance, Section 3 of Clancy and Pearce (2013).

We can alternatively treat the above multitype branching process as a single-type branching process in which the number of offspring produced by a typical type i individual is distributed as \(G_i\) where \(G_i\) follows a mixture distribution: \(G_i = G_{ij}\) with probability \(\mu _j f_j\) for \(j=1,2,\ldots ,k\). For a process initiated by a single infected individual which belongs to group i with probability \(\mu _i f_i\), the minor outbreak probability \(\omega \) is given by \(\omega =\sum _{i=1}^k \mu _i f_i \omega _i\), and for \(R_0 > 1\), \(\omega \) is the unique solution in [0, 1) of \(\omega = \psi (\omega )\), where \(\psi ( \theta )\) is defined by Eq. (4).

Given that a major outbreak occurs, then following the initial (branching process) phase, the growth of the epidemic towards an endemic equilibrium and the long-term quasi-equilibrium behaviour may be approximated by a deterministic process. Denoting by \(\bar{F}_i (u) = \Pr ( T_i > u )\) the survival function of \(T_i\), then the scaled process \(\varvec{I}(t) / N\) is approximated by the deterministic process \(\varvec{y}(t)\) which evolves according to, for \(i=1,2,\ldots ,k\),

$$\begin{aligned} y_i (t)= & {} \int _{-\infty }^t \beta \left( \sum _{j=1}^k \lambda _j y_j (u) \right) \mu _i ( f_i - y_i (u) ) \bar{F}_i ( t-u ) \, \mathrm{d}u. \end{aligned}$$
(8)

System (8) has a disease-free equilibrium point at \(\varvec{y}= \mathbf{0}\), and for \(R_0 > 1\) a unique nonzero equilibrium point \(\varvec{y}^*\) with components (Nold 1980; Clancy 2015)

$$\begin{aligned} y_i^*= & {} {\alpha _i \mu _i f_i D ( \varvec{\lambda }, \varvec{\mu }) \over 1 + \alpha _i \mu _i D ( \varvec{\lambda }, \varvec{\mu })}\quad \hbox { for } i=1,2,\ldots ,k, \end{aligned}$$
(9)

where \(D ( \varvec{\lambda }, \varvec{\mu })\) is given by Eq. (3).

Table 1 Transition rates for the heterogeneous population SIS model with exponentially distributed infectious periods

For the remainder of this section, we focus upon the case that infectious periods \(T_i\) are exponentially distributed. The process \(\left\{ \varvec{I}(t) : t \ge 0 \right\} \) is now a continuous-time Markov chain on the finite state space \(S = \prod _{i=1}^k \left\{ 0,1,\ldots ,N_i \right\} \) with transition rates given in Table 1. The moment-generating function of \(T_i\) (\(i=1,2,\ldots ,k\)) is \(\varphi _i ( \theta ) = ( 1 - \alpha _i \theta )^{-1}\), and it follows that the minor outbreak probability \(\omega \) is

$$\begin{aligned} \omega= & {} 1 - \left( D ( \varvec{\mu }, \varvec{\lambda }) / \beta \right) , \end{aligned}$$
(10)

where \(D ( \varvec{\mu }, \varvec{\lambda })\) is given by Eq. (3) with the roles of \(\varvec{\lambda }, \varvec{\mu }\) interchanged.
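
Formula (10) can be checked numerically. The sketch below computes \(\omega \) in two ways for exponentially distributed infectious periods: by fixed-point iteration of \(\omega = \psi (\omega )\) started from 0, and from the closed form (10) with \(D ( \varvec{\mu }, \varvec{\lambda })\) found by bisection. Function names and parameter values are illustrative assumptions.

```python
def minor_outbreak_probability(beta, alpha, lam, mu, f, max_iter=100000):
    """Root in [0, 1) of omega = psi(omega), Eq. (5), for exponential infectious periods."""
    omega = 0.0
    for _ in range(max_iter):
        # psi(omega) from Eq. (4) with phi_i(theta) = 1 / (1 - alpha_i * theta)
        new = sum(m * fi / (1.0 + a * beta * l * (1.0 - omega))
                  for a, l, m, fi in zip(alpha, lam, mu, f))
        if abs(new - omega) < 1e-15:
            return new
        omega = new
    return omega


def D_swapped(beta, alpha, lam, mu, f, tol=1e-13):
    """D(mu, lam): Eq. (3) with the roles of lam and mu interchanged, by bisection."""
    def g(D):
        return beta * sum(a * l * m * fi / (1.0 + a * l * D)
                          for a, l, m, fi in zip(alpha, lam, mu, f)) - 1.0
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:   # g is strictly decreasing with g(0) = R_0 - 1 > 0
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical two-group parameters with R_0 = 4 (illustrative values only).
f, alpha, lam, mu, beta = [0.5, 0.5], [1.0, 2.0], [0.8, 1.2], [0.5, 1.5], 2.0
omega_iter = minor_outbreak_probability(beta, alpha, lam, mu, f)
omega_closed = 1.0 - D_swapped(beta, alpha, lam, mu, f) / beta
print("omega by iteration:", omega_iter)
print("omega from Eq. (10):", omega_closed)
```

The two printed values should agree to within the tolerances of the iteration and the bisection.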

The process \(\varvec{I}(t)\) is a density-dependent process in the sense of Chapter 11 of Ethier and Kurtz (2005); that is, the transition rates are of the form

$$\begin{aligned} P \left( \varvec{I}( t + \delta t ) = \varvec{x}+ \varvec{l}\mid \varvec{I}(t) = \varvec{x}\right)= & {} N W_{\varvec{l}} \left( {\varvec{x}\over N} \right) \delta t + o ( \delta t )\quad \hbox { for } \varvec{x}\in S,\ \varvec{l}\in L , \end{aligned}$$
(11)

for some functions \(W_{\varvec{l}} : {\mathbb R}^k \rightarrow {\mathbb R}^+\), where L is the set of possible jumps from each state \(\varvec{x}\in S\). The scaled process \(\varvec{I}(t) / N\) converges almost surely over finite time intervals (Ethier and Kurtz 2005, Theorem 11.2.1), as \(N \rightarrow \infty \), to the solution \(\varvec{y}(t)\) of the ordinary differential equation system

$$\begin{aligned} {\mathrm{d} \varvec{y}\over \mathrm{d}t}= & {} \sum _{\varvec{l}\in L} \varvec{l}W_{\varvec{l}} ( \varvec{y}) . \end{aligned}$$

That is, Eq. (8) may, in the case of exponentially distributed infectious periods, be written in the form

$$\begin{aligned} {\mathrm{d}y_i \over \mathrm{d}t}= & {} \beta \left( \sum _{j=1}^k \lambda _j y_j \right) \mu _i ( f_i - y_i ) - ( 1 / \alpha _i ) y_i \quad \hbox { for } i=1,2,\ldots ,k. \end{aligned}$$
(12)

For \(R_0 > 1\), the disease-free equilibrium point \(\varvec{y}= \mathbf{0}\) is unstable and the endemic equilibrium point \(\varvec{y}^*\) given by (9) is globally asymptotically stable (Lajmanovich and Yorke 1976).
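
The convergence of system (12) to the endemic equilibrium (9) can be illustrated numerically. The sketch below integrates (12) with a classical fourth-order Runge-Kutta scheme from a small initial infected fraction and compares the end state with \(\varvec{y}^*\). The integrator, step size, time horizon and parameter values are assumptions chosen for illustration.

```python
def rhs(y, beta, alpha, lam, mu, f):
    """Right-hand side of system (12)."""
    force = beta * sum(l * yj for l, yj in zip(lam, y))
    return [force * m * (fi - yi) - yi / a
            for yi, a, m, fi in zip(y, alpha, mu, f)]


def rk4_step(y, dt, *params):
    """One classical fourth-order Runge-Kutta step for system (12)."""
    k1 = rhs(y, *params)
    k2 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)], *params)
    k3 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)], *params)
    k4 = rhs([yi + dt * ki for yi, ki in zip(y, k3)], *params)
    return [yi + dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
            for yi, s1, s2, s3, s4 in zip(y, k1, k2, k3, k4)]


def endemic_equilibrium(beta, alpha, lam, mu, f, tol=1e-13):
    """y* from Eq. (9), with D(lam, mu) obtained from Eq. (3) by bisection."""
    def g(D):
        return beta * sum(a * l * m * fi / (1.0 + a * m * D)
                          for a, l, m, fi in zip(alpha, lam, mu, f)) - 1.0
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    D = 0.5 * (lo + hi)
    return [a * m * fi * D / (1.0 + a * m * D) for a, m, fi in zip(alpha, mu, f)]


# Hypothetical parameters in the order beta, alpha, lam, mu, f (R_0 = 4).
params = (2.0, [1.0, 2.0], [0.8, 1.2], [0.5, 1.5], [0.5, 0.5])
y = [0.01, 0.01]
for _ in range(20000):            # integrate to t = 200 with dt = 0.01
    y = rk4_step(y, 0.01, *params)
y_star = endemic_equilibrium(*params)
print("y(200) =", y)
print("y*     =", y_star)
```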

Denote by Q the transition rate matrix of the process, with entries given in Table 1. The state space S is made up of an absorbing state at \(\mathbf 0\) and a single transient communicating class C, and we denote by \(Q_C\) the restriction of Q to C. The infection will almost surely die out within finite time, and (Darroch and Seneta 1967) there exists a unique quasi-stationary distribution \(\varvec{q}\) with elements

$$\begin{aligned} q_{\varvec{x}}= & {} \lim _{t \rightarrow \infty } \Pr \left( \varvec{I}(t) = \varvec{x}\left| \varvec{I}(t) \in C \right. \right) \quad \hbox { for } \varvec{x}\in C . \end{aligned}$$

The distribution \(\varvec{q}\) may be found as the unique solution of

$$\begin{aligned} \varvec{q}Q_C= & {} - (1/\tau ) \varvec{q} \text{ with } \sum _{\varvec{x}\in C} q_{\varvec{x}} = 1 , \end{aligned}$$
(13)

where \(-(1/\tau )\) is the eigenvalue of \(Q_C\) with largest real part, and the time to extinction from quasi-stationarity is exponentially distributed with mean \(\tau \).

For a process with transition rates of the form (11), Eq. (13) may be written as

$$\begin{aligned} \sum _{\varvec{l}\in L} \left( q_{\varvec{x}-\varvec{l}} W_{\varvec{l}} \left( {\varvec{x}- \varvec{l}\over N} \right) - q_{\varvec{x}} W_{\varvec{l}} \left( {\varvec{x}\over N} \right) \right)= & {} - (\tau N)^{-1} q_{\varvec{x}}\quad \text{ for } \varvec{x}\in C , \end{aligned}$$
(14)

with

$$\begin{aligned} \tau= & {} \left( N \sum _{\varvec{l}\in L} q_{- \varvec{l}} W_{\varvec{l}} \left( - \, {\varvec{l}\over N} \right) \right) ^{-1} . \end{aligned}$$
(15)
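
For small populations, the quasi-stationary distribution and \(\tau \) can be computed directly from Eq. (13). The sketch below does so by power iteration on the uniformised matrix \(I + Q_C / \Lambda \), then evaluates \(\tau \) from the probability flow into the absorbing state, as in Eq. (15). The transition rates coded here, infection in group i at rate \((\beta / N) \mu _i (N_i - x_i) \sum _j \lambda _j x_j\) and recovery at rate \(x_i / \alpha _i\), are an assumption read off from the model description and Eq. (12); the function name, the uniformisation constant and the parameter values are likewise illustrative.

```python
from itertools import product


def quasi_stationary(Ns, beta, alpha, lam, mu, tol=1e-13, max_iter=200000):
    """Quasi-stationary distribution q and mean extinction time tau, cf. Eq. (13)."""
    N, k = sum(Ns), len(Ns)
    # Transient class C: every state except the absorbing state 0.
    states = [x for x in product(*(range(n + 1) for n in Ns)) if any(x)]
    index = {x: s for s, x in enumerate(states)}
    moves, out = [], [0.0] * len(states)  # (from, to, rate) within C; exit rates
    for s, x in enumerate(states):
        force = beta / N * sum(l * xj for l, xj in zip(lam, x))
        for i in range(k):
            infect = force * mu[i] * (Ns[i] - x[i])  # x -> x + e_i (assumed rate)
            recover = x[i] / alpha[i]                # x -> x - e_i (assumed rate)
            out[s] += infect + recover
            if infect > 0.0:
                moves.append((s, index[x[:i] + (x[i] + 1,) + x[i + 1:]], infect))
            down = x[:i] + (x[i] - 1,) + x[i + 1:]
            if recover > 0.0 and any(down):
                moves.append((s, index[down], recover))
    scale = 1.1 * max(out)                # uniformisation constant
    q = [1.0 / len(states)] * len(states)
    for _ in range(max_iter):
        new = [qs * (1.0 - out[s] / scale) for s, qs in enumerate(q)]
        for s, t, rate in moves:
            new[t] += q[s] * rate / scale
        total = sum(new)
        new = [v / total for v in new]    # renormalise: condition on non-extinction
        change = max(abs(a - b) for a, b in zip(new, q))
        q = new
        if change < tol:
            break
    # Extinction occurs only by recovery from a state with a single infective.
    unit = [tuple(1 if j == i else 0 for j in range(k)) for i in range(k)]
    tau = 1.0 / sum(q[index[unit[i]]] / alpha[i] for i in range(k))
    return states, q, tau


# Hypothetical small example: two groups of five individuals each.
states, q, tau = quasi_stationary((5, 5), 2.0, (1.0, 1.0), (1.0, 1.0), (0.5, 1.5))
print("number of transient states:", len(states))
print("tau =", tau)
```

Pure-Python power iteration is only practical for very small N, since the number of states grows as \(\prod _i (N_i + 1)\); this is a check on the definitions rather than a method for large populations.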

Writing \(\varvec{y}= \varvec{x}/ N\), then, following the methodology described in Assaf and Meerson (2017) and references therein, we adopt the WKB (Wentzel, Kramers, Brillouin) ansatz that

$$\begin{aligned} q_{\varvec{x}}= & {} K_N \exp \left( - N V ( \varvec{y}) - V_0 ( \varvec{y}) + o(1) \right) \end{aligned}$$
(16)

for some functions \(V(\varvec{y}) , V_0 (\varvec{y})\) that do not depend upon N, and some \(K_N\) that does not depend upon \(\varvec{y}\). Without loss of generality, we set \(V ( \varvec{y}^* ) = V_0 ( \varvec{y}^* ) = 0\). Then

$$\begin{aligned} q_{\varvec{x}- \varvec{l}}= & {} q_{\varvec{x}} \exp \left( \varvec{l}^T {\partial V \over \partial \varvec{y}} - {1 \over 2N} \, \varvec{l}^T {\partial ^2 V \over \partial \varvec{y}^2} \varvec{l}+ {1 \over N} \, \varvec{l}^T {\partial V_0 \over \partial \varvec{y}} + o(1/N) \right) , \end{aligned}$$

and similarly

$$\begin{aligned} W_{\varvec{l}} \left( \varvec{y}- {\varvec{l}\over N} \right)= & {} W_{\varvec{l}} ( \varvec{y}) - {\varvec{l}^T \over N} {\partial W_{\varvec{l}} \over \partial \varvec{y}} + o(1/N) . \end{aligned}$$

Substituting into Eq. (14), and assuming that \(\tau \) is sufficiently large for the right-hand side to be neglected, we obtain

$$\begin{aligned}&\sum _{\varvec{l}\in L} W_{\varvec{l}} ( \varvec{y})\left( \exp \left( \varvec{l}^T {\partial V \over \partial \varvec{y}} \right) - 1 \right) \nonumber \\&\quad + \,{1 \over N} \sum _{\varvec{l}\in L} \exp \left( \varvec{l}^T {\partial V \over \partial \varvec{y}} \right) \left( - \varvec{l}^T {\partial W_{\varvec{l}} \over \partial \varvec{y}} + \varvec{l}^T {\partial V_0 \over \partial \varvec{y}} W_{\varvec{l}} ( \varvec{y}) - {1 \over 2} \varvec{l}^T {\partial ^2 V \over \partial \varvec{y}^2} \varvec{l}W_{\varvec{l}} ( \varvec{y}) \right) \nonumber \\&\quad +\, o(1/N) = 0 . \end{aligned}$$
(17)

Collecting together leading-order terms from Eq. (17), we have

$$\begin{aligned} \sum _{\varvec{l}\in L} W_{\varvec{l}} ( \varvec{y})\left( \exp \left( \varvec{l}^T {\partial V \over \partial \varvec{y}} \right) - 1 \right)= & {} 0 . \end{aligned}$$
(18)

That is, \(V ( \varvec{y})\) satisfies the Hamilton–Jacobi equation \(H \left( \varvec{y}, {\partial V \over \partial \varvec{y}} \right) = 0\), where the Hamiltonian is defined to be \(H ( \varvec{y}, \varvec{\theta }) = \sum _{\varvec{l}\in L} W_{\varvec{l}} ( \varvec{y}) \left( \mathrm{e}^{\varvec{l}^T \varvec{\theta }} - 1 \right) \).

Collecting together second-order terms in Eq. (17) gives

$$\begin{aligned} \sum _{\varvec{l}\in L} \exp \left( \varvec{l}^T {\partial V \over \partial \varvec{y}} \right) \varvec{l}^T \left( \left( {\partial V_0 \over \partial \varvec{y}} - {1 \over 2} {\partial ^2 V \over \partial \varvec{y}^2} \varvec{l}\right) W_{\varvec{l}} ( \varvec{y}) - {\partial W_{\varvec{l}} \over \partial \varvec{y}} \right)= & {} 0 , \end{aligned}$$
(19)

a first-order linear partial differential equation to be solved for \(V_0 ( \varvec{y})\), once \(V ( \varvec{y})\) has been found from Eq. (18). Equation (18) has been previously studied in numerous specific applications (e.g. Assaf and Meerson 2010; Clancy 2018; Dykman et al. 1994; Elgart and Kamenev 2004; Hindes and Schwartz 2016; Kamenev and Meerson 2008; Lindley et al. 2014). Equation (19) has been analysed in some detail in the \(k=1\)-dimensional case (Assaf and Meerson 2017), but does not seem to have been much studied in the context of multidimensional problems.

With the above standard results and general theory in hand, we now proceed to the proof of Theorem 1.

4 Proof of Asymptotic Persistence Time Formulae

4.1 Heterogeneous Susceptibilities and Exponentially Distributed Infectious Periods: The Main Body of the Quasi-Stationary Distribution

We will start by proving Theorem 1(i), corresponding to the case of heterogeneity in susceptibility but not infectiousness, so \(\varvec{\lambda }= \mathbf{1}\), and focus initially on the case of exponentially distributed infectious periods. For the Markov chain model with transition rates given in Table 1, we aim to express the asymptotic form of the main body of the quasi-stationary distribution in the form (16). We thus need to find the constant \(K_N\) and the functions \(V ( {\varvec{y}} )\), \(V_0 ( \varvec{y})\). In this section and Sect. 4.2, we follow the approach described in Assaf and Meerson (2010) for \(k=1\)-dimensional processes, adapted to our multidimensional situation.

For the case \(\varvec{\lambda }= \mathbf{1}\), the solution to Eq. (18) with boundary condition \(V ( \varvec{y}^* ) = 0\) is known, by a slight generalisation of formula (20) of Clancy (2018), to be

$$\begin{aligned} V ( \varvec{y})= & {} \sum _{i=1}^k y_i \left( 1 + \ln y_i - \ln \left( \beta \alpha _i \mu _i \right) \right) - \left( \sum _{i=1}^k y_i \right) \ln \left( \sum _{i=1}^k y_i \right) \nonumber \\&+ \sum _{i=1}^k (f_i - y_i) \ln (f_i - y_i) - {D ( \mathbf{1} , \varvec{\mu }) \over \beta } - \sum _{i=1}^k f_i \ln f_i \nonumber \\&+ \sum _{i=1}^k f_i \ln \left( 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu }) \right) . \end{aligned}$$
(20)

Next, to evaluate \(K_N\), we consider the Taylor series expansion of formula (16) for \(\varvec{y}\) in the vicinity of \(\varvec{y}^*\). Differentiating Eq. (20), we obtain

$$\begin{aligned} {\partial V \over \partial y_i}= & {} \ln \left( {y_i \over \beta \alpha _i \mu _i \left( f_i - y_i \right) \left( \sum _{m=1}^k y_m \right) } \right) , \end{aligned}$$
(21)

so that in particular, \(\left. {\partial V \over \partial \varvec{y}} \right| _{\varvec{y}= \varvec{y}^*} = \mathbf{0}\). This is as one would expect, since for large N the quasi-stationary distribution has its mode at \(\varvec{x}= N \varvec{y}^*\). Recalling the boundary conditions \(V ( \varvec{y}^* ) = V_0 ( \varvec{y}^* ) = 0\), then for \(| \varvec{x}- N \varvec{y}^* | = O ( \sqrt{N} )\), Taylor series expansion of (16) yields

$$\begin{aligned} q_{\varvec{x}}= & {} K_N \exp \left( - \, {1 \over 2N} ( \varvec{x}- N \varvec{y}^* )^T \left. {\partial ^2 V \over \partial y_i \partial y_j} \right| _{\varvec{y}= \varvec{y}^*} ( \varvec{x}- N \varvec{y}^* ) + o(1) \right) . \end{aligned}$$
(22)
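
As a numerical sanity check on Eqs. (18) and (21), the following sketch evaluates the Hamiltonian \(H ( \varvec{y}, \varvec{\theta })\) at \(\varvec{\theta }= \partial V / \partial \varvec{y}\) for the case \(\varvec{\lambda }= \mathbf{1}\) and confirms that it vanishes. The jump rates \(W_{\varvec{l}}\) used here are read off from Eq. (12) (infection and recovery in each group); function names and the test point are illustrative assumptions.

```python
import math


def grad_V(y, beta, alpha, mu, f):
    """Gradient of V from Eq. (21), for the case lambda = 1."""
    total = sum(y)
    return [math.log(yi / (beta * a * m * (fi - yi) * total))
            for yi, a, m, fi in zip(y, alpha, mu, f)]


def hamiltonian(y, theta, beta, alpha, mu, f):
    """H(y, theta) = sum over jumps l of W_l(y) * (exp(l . theta) - 1)."""
    total = sum(y)
    H = 0.0
    for yi, th, a, m, fi in zip(y, theta, alpha, mu, f):
        infection = beta * m * (fi - yi) * total  # W_l for l = +e_i
        recovery = yi / a                         # W_l for l = -e_i
        H += infection * math.expm1(th) + recovery * math.expm1(-th)
    return H


# Hypothetical parameters and an arbitrary interior point y (0 < y_i < f_i).
beta, alpha, mu, f = 2.0, [1.0, 2.0], [0.5, 1.5], [0.5, 0.5]
y = [0.1, 0.3]
theta = grad_V(y, beta, alpha, mu, f)
print("H(y, dV/dy) =", hamiltonian(y, theta, beta, alpha, mu, f))
```

With \(\theta _i\) given by Eq. (21), the infection and recovery contributions for each group cancel exactly, so the printed value should be zero up to rounding error.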

Denoting by S the matrix with entries

$$\begin{aligned} s_{ij}= & {} \left. {\partial ^2 V \over \partial y_i \partial y_j} \right| _{\varvec{y}= \varvec{y}^*} , \end{aligned}$$

then (22) represents a multivariate Gaussian distribution with variance matrix \(N S^{-1}\), normalisation of which requires that

$$\begin{aligned} K_N= & {} \sqrt{{\text{ det } (S) \over ( 2 \pi N )^k}} . \end{aligned}$$
(23)

To evaluate the determinant of S, we differentiate Eq. (21) to obtain

$$\begin{aligned} {\partial ^2 V \over \partial y_i \partial y_j}= & {} \left( {f_i \over y_i (f_i - y_i)} \right) \delta _{ij} - {1 \over \sum _m y_m} , \end{aligned}$$
(24)

where \(\delta _{ij}\) is the Kronecker delta. In particular, recalling that the components of \(\varvec{y}^*\) are given by formula (9) with \(\varvec{\lambda }= \mathbf{1}\), then S has elements

$$\begin{aligned} s_{ij}= & {} \left. {\partial ^2 V \over \partial y_i \partial y_j} \right| _{\varvec{y}= \varvec{y}^*} \;\;=\;\; \left( {\left( 1 + \alpha _i \mu _i D ( \mathbf{1}, \varvec{\mu }) \right) ^2 \over \alpha _i \mu _i f_i D ( \mathbf{1} , \varvec{\mu }) } \right) \delta _{ij} - \, {\beta \over D ( \mathbf{1} , \varvec{\mu })} . \end{aligned}$$

It follows that

$$\begin{aligned} \text{ det } (S)= & {} {1 \over D ( \mathbf{1} , \varvec{\mu })^k} \left( 1 - \beta \sum _{i=1}^k {\alpha _i \mu _i f_i \over \left( 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu }) \right) ^2} \right) \prod _{i=1}^k {\left( 1+\alpha _i\mu _i D ( \mathbf{1} , \varvec{\mu }) \right) ^2 \over \alpha _i\mu _i f_i} \nonumber \\= & {} {\beta \over D ( \mathbf{1} , \varvec{\mu })^{k-1}} \left( \prod _{i=1}^k {\left( 1+\alpha _i\mu _i D ( \mathbf{1} , \varvec{\mu }) \right) ^2 \over \alpha _i\mu _i f_i} \right) \sum _{i=1}^k f_i \left( {\alpha _i \mu _i \over 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu })} \right) ^2 , \end{aligned}$$
(25)

the last line above following from Eq. (3) with \(\varvec{\lambda }= \mathbf{1}\). Substituting from (25) into (23), we obtain

$$\begin{aligned} K_N= & {} \sqrt{{\beta \over ( 2 \pi N )^k D ( \mathbf{1} , \varvec{\mu })^{k-1}} \left( \prod _{i=1}^k {\left( 1+\alpha _i\mu _i D ( \mathbf{1} , \varvec{\mu }) \right) ^2 \over \alpha _i\mu _i f_i} \right) \sum _{i=1}^k f_i \left( {\alpha _i \mu _i \over 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu })} \right) ^2} . \end{aligned}$$
(26)
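
The closed form (25) for \(\text{det}(S)\) can be compared with a direct numerical determinant. The sketch below builds S from its entries \(s_{ij}\), computes the determinant by Gaussian elimination, and evaluates \(K_N\) from Eq. (23). The three-group parameter values, the value of N and the helper names are assumptions made for illustration.

```python
import math


def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, det = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= factor * A[c][j]
    return det


def solve_D1(beta, alpha, mu, f, tol=1e-13):
    """D(1, mu): Eq. (3) with lambda = 1, by bisection."""
    def g(D):
        return beta * sum(a * m * fi / (1.0 + a * m * D)
                          for a, m, fi in zip(alpha, mu, f)) - 1.0
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical three-group example with sum(mu_i * f_i) = 1 and R_0 = 2.06.
beta, alpha, mu, f = 2.0, [1.0, 1.5, 0.8], [0.5, 1.0, 1.2], [0.2, 0.3, 0.5]
k = len(f)
D = solve_D1(beta, alpha, mu, f)
# Entries s_ij of S evaluated at y*.
S = [[((1.0 + alpha[i] * mu[i] * D) ** 2 / (alpha[i] * mu[i] * f[i] * D)
       if i == j else 0.0) - beta / D for j in range(k)] for i in range(k)]
direct = determinant(S)
# Second line of Eq. (25).
prod = 1.0
for a, m, fi in zip(alpha, mu, f):
    prod *= (1.0 + a * m * D) ** 2 / (a * m * fi)
closed = beta / D ** (k - 1) * prod * sum(
    fi * (a * m / (1.0 + a * m * D)) ** 2 for a, m, fi in zip(alpha, mu, f))
N = 1000
K_N = math.sqrt(direct / (2.0 * math.pi * N) ** k)  # Eq. (23)
print("det(S) direct:", direct, " closed form (25):", closed, " K_N:", K_N)
```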

To find \(V_0 ( \varvec{y})\), substitute the derivatives (21, 24) into Eq. (19) to obtain, after some simplification, the partial differential equation

$$\begin{aligned} \sum _{i=1}^k \left( \beta \mu _i (f_i-y_i) \left( \sum _{j=1}^k y_j \right) - {y_i \over \alpha _i} \right) \left( {f_i - 2 y_i \over 2 y_i ( f_i - y_i )} + {1 \over 2 \sum _j y_j} - {\partial V_0 \over \partial y_i} \right)= & {} 0 . \end{aligned}$$
(27)

Equation (27) will be satisfied if we can find \(V_0 ( \varvec{y})\) such that

$$\begin{aligned} {\partial V_0 \over \partial y_i}= & {} {f_i - 2 y_i \over 2 y_i ( f_i - y_i )} + {1 \over 2 \sum _j y_j}\quad \hbox { for } i=1,2,\ldots ,k , \end{aligned}$$

and hence we find that the solution of (27) subject to the boundary condition \(V_0 ( \varvec{y}^* ) = 0\) is

$$\begin{aligned} V_0 ( \varvec{y})= & {} (1/2) \ln \left( {\beta \over D ( \mathbf{1} , \varvec{\mu })^{k+1}} \left( \sum _{i=1}^k y_i \right) \prod _{i=1}^k {y_i ( f_i - y_i ) \left( 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu }) \right) ^2 \over \alpha _i \mu _i f_i^2} \right) . \end{aligned}$$
(28)

Our analysis of the WKB approximation (16) is now complete, in that we have found explicit formulae for \(V ( \varvec{y})\), \(K_N\) and \(V_0 ( \varvec{y})\) in Eqs. (20, 26, 28), respectively, giving an approximation to the main body of the quasi-stationary distribution. The WKB ansatz is thus justified, in that we have been able to exhibit a solution of the assumed form (16). However, \(V_0 ( \varvec{y}) \rightarrow - \infty \) as \(\varvec{y}\rightarrow \mathbf{0}\), and the asymptotic form (16) is not valid in the tail of the distribution, where \(| \varvec{x}| = O(1)\). We address this in Sect. 4.2.

4.2 Heterogeneous Susceptibilities and Exponentially Distributed Infectious Periods: The Asymptotic Persistence Time Formula

For the SIS model with heterogeneous susceptibilities and exponentially distributed infectious periods, the mean extinction time \(\tau \) from quasi-stationarity is given, from formula (15), by \(\tau = \left( \sum _{i=1}^k ( 1 / \alpha _i ) q_{\varvec{e}_i} \right) ^{-1}\), where \(\varvec{e}_i\) denotes the unit vector with ith element equal to 1. We are not yet in a position to evaluate the quasi-stationary probabilities \(q_{\varvec{e}_i}\), since expression (16) is not valid in the range \(| \varvec{x}| = O (1)\). We will therefore derive an alternative asymptotic formula for \(q_{\varvec{x}}\) that is valid in the range \(| \varvec{x}| = O (1)\), but is un-normalised. Normalisation may be achieved by matching our two approximations in the region where their domains of validity overlap. In order that we can carry out this matching, we first consider the Taylor series expansion of our existing approximation (16) about \(\varvec{y}= \mathbf{0}\). Since \(V_0 ( \varvec{y})\) diverges at \(\varvec{y}= \mathbf{0}\), this is quite technical, as we shall see. To deal with this, we define

$$\begin{aligned} \phi ( \varvec{y})= & {} V_0 ( \varvec{y}) - {1 \over 2} \ln \left( \sum _{i=1}^k y_i \right) - {1 \over 2} \sum _{i=1}^k \ln y_i \\= & {} (1/2) \ln \left( {\beta \over D ( \mathbf{1} , \varvec{\mu })^{k+1}} \prod _{i=1}^k {( f_i - y_i ) \left( 1 + \alpha _i\mu _i D ( \mathbf{1} , \varvec{\mu }) \right) ^2 \over \alpha _i \mu _i f_i^2} \right) , \end{aligned}$$

so that \(\phi ( \varvec{y})\) is well behaved at \(\varvec{y}= \mathbf{0}\), and (16) may be rewritten as

$$\begin{aligned} q_{\varvec{x}}= & {} {K_N \over \sqrt{\left( \sum _i y_i \right) \left( \prod _i y_i \right) }} \exp \left( - N V ( \varvec{y}) - \phi ( \varvec{y}) + o(1) \right) . \end{aligned}$$

Taylor series expansion in the range \(| \varvec{x}| = o ( \sqrt{N} )\), together with substitution for \(K_N\) from (26), gives

$$\begin{aligned} q_{\varvec{x}}= & {} {K_N \over \sqrt{\left( \sum _i y_i \right) \prod _i y_i}} \exp \left( - N V ( \mathbf{0} ) - N \varvec{y}^T \left. {\partial V \over \partial \varvec{y}} \right| _{\varvec{y}= \mathbf{0}} - \phi ( \mathbf{0} ) + o(1) \right) \\= & {} K_N \sqrt{{D ( \mathbf{1} , \varvec{\mu })^{k+1} \over \beta \left( \sum _i y_i \right) \prod _i y_i} \prod _{i=1}^k {\alpha _i \mu _i f_i \over \left( 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu }) \right) ^2}} \\&\times \exp \left( - N V ( \mathbf{0} ) - N \varvec{y}^T \left. {\partial V \over \partial \varvec{y}} \right| _{\varvec{y}= \mathbf{0}} + o(1) \right) \\= & {} D ( \mathbf{1} , \varvec{\mu }) \sqrt{{N \over ( 2 \pi )^k \left( \sum _i x_i \right) \prod _i x_i} \sum _{i=1}^k f_i \left( {\alpha _i \mu _i \over 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu })} \right) ^2} \\&\times \exp \left( - N V ( \mathbf{0} ) - N \varvec{y}^T \left. {\partial V \over \partial \varvec{y}} \right| _{\varvec{y}= \mathbf{0}} + o(1) \right) . \end{aligned}$$

A further difficulty arises here, since the derivatives \({\partial V / \partial y_i}\) given by (21) are not well defined at \(\varvec{y}= \mathbf{0}\). We therefore consider approach to \(\varvec{y}= \mathbf{0}\) along a specific trajectory \(y_i = \hat{y} \xi _i\) as \(\hat{y} \rightarrow 0\), where \(\xi _1 , \xi _2 , \ldots , \xi _k > 0\) are fixed with \(\sum _{i=1}^k \xi _i = 1\). Along this trajectory, \({\partial V \over \partial y_i} \rightarrow \ln ( \xi _i / \beta \alpha _i \mu _i f_i )\) as \(\hat{y} \rightarrow 0\), and so with \(\hat{x} = N \hat{y}\),

$$\begin{aligned} q_{\varvec{x}}= & {} \beta ^{\hat{x}} D ( \mathbf{1} , \varvec{\mu })\prod _{i=1}^k \left( {\alpha _i \mu _i f_i \over \xi _i} \right) ^{\hat{x} \xi _i} \sqrt{{N \over ( 2 \pi )^k \hat{x}^{k+1} \prod _i \xi _i} \sum _{i=1}^k f_i \left( {\alpha _i \mu _i \over 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu })} \right) ^2} \nonumber \\&\times \exp \left( - N V ( \mathbf{0} ) + o(1) \right) . \end{aligned}$$
(29)

We now seek an approximation for \(q_{\varvec{x}}\) valid for \(| \varvec{x}| = O(1)\). For the SIS model with heterogeneous susceptibilities, with the convention that \(q_{\varvec{x}} = 0\) for \(\varvec{x}\notin C\), the exact balance equation (13) may be written as

$$\begin{aligned}&{\beta \over N} \sum _{i=1}^k \mu _i \left( N_i - x_i + 1 \right) \left( \sum _{j=1}^k x_j - 1 \right) q_{\varvec{x}- \varvec{e}_i} + \sum _{i=1}^k ( 1 / \alpha _i ) ( x_i + 1 ) q_{\varvec{x}+ \varvec{e}_i} \nonumber \\&\qquad - \left( {\beta \over N} \sum _{i=1}^k \sum _{j=1}^k \mu _i ( N_i - x_i ) x_j + \sum _{i=1}^k ( 1 / \alpha _i ) x_i \right) q_{\varvec{x}} \nonumber \\&\quad =\;\; - ( 1 / \tau ) q_{\varvec{x}} \text{ for } \varvec{x}\in C . \end{aligned}$$
(30)

Assuming, as before, that \(\tau \) is sufficiently large for the right-hand side of Eq. (30) to be neglected, and taking the linear approximation to the left-hand side, which is valid in the required range \(| \varvec{x}| = o (\sqrt{N})\), we obtain the asymptotic balance equation

$$\begin{aligned}&\beta \sum _{i=1}^k \mu _i f_i \left( \sum _{j=1}^k x_j - 1 \right) q_{\varvec{x}- \varvec{e}_i} + \sum _{i=1}^k ( 1 / \alpha _i ) ( x_i + 1 ) q_{\varvec{x}+ \varvec{e}_i} \nonumber \\&\quad - \left( \sum _{j=1}^k \left( \beta + ( 1 / \alpha _j ) \right) x_j \right) q_{\varvec{x}} = 0 \end{aligned}$$
(31)

for \(\varvec{x}\in C\). Equation (31) corresponds to the (linear) branching process approximation discussed in Sect. 2, and we seek an un-normalised stationary solution. One component of the solution may be found by solving the detailed balance equations (Kelly 2011, chapter 1) corresponding to (31), and the other component by analogy with the solution for the case \(k=1\) given in Assaf and Meerson (2010). The solution thus obtained may be written in the form

$$\begin{aligned} q_{\varvec{x}}= & {} {1 \over \sum _i x_i} {\left( \sum _i x_i \right) ! \over \prod _i x_i !} \prod _{i=1}^k \left( \beta \mu _i f_i \right) ^{x_i} \left( \prod _{i=1}^k \alpha _i^{x_i} - \prod _{i=1}^k \left( {\alpha _i \over 1 + \alpha _i D ( \varvec{\mu }, \mathbf{1} )} \right) ^{x_i} \right) {1 \over D ( \varvec{\mu }, \mathbf{1} ) \tau } \end{aligned}$$

where the normalising factor \(\tau \) remains to be found. For \(| \varvec{x}|\) large, noting that \(D ( \varvec{\mu }, \mathbf{1} ) > 0\) and applying Stirling’s formula to the factorial terms, we obtain

$$\begin{aligned} q_{\varvec{x}}\sim & {} {1 \over \sum _i x_i} {\left( \sum _i x_i \right) ! \over \prod _i x_i !} \prod _{i=1}^k \left( \beta \alpha _i \mu _i f_i \right) ^{x_i} {1 \over D ( \varvec{\mu }, \mathbf{1} ) \tau } \nonumber \\\sim & {} \left( \sum _i x_i \right) ^{\sum _i x_i} \prod _{i=1}^k \left( {\beta \alpha _i \mu _i f_i \over x_i} \right) ^{x_i} \sqrt{{1 \over (2 \pi )^{k-1} \left( \sum _m x_m \right) \prod _m x_m}} \left( {1 \over D ( \varvec{\mu }, \mathbf{1} )\tau } \right) . \end{aligned}$$
(32)
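
As a numerical sanity check, the following sketch evaluates the un-normalised solution displayed above (omitting the constant factor \(1 / ( D ( \varvec{\mu }, \mathbf{1} ) \tau )\)) and confirms that it satisfies the asymptotic balance equation (31) on a small grid of states. It assumes that \(\sum _i \mu _i f_i = 1\) and that \(D ( \varvec{\mu }, \mathbf{1} )\) is the positive root of \(\beta \sum _i f_i \mu _i \alpha _i / ( 1 + \alpha _i D ) = 1\), which is the condition under which the second component of the solution balances. The function names and parameter values are illustrative and do not come from the paper.

```python
import math
from itertools import product


def solve_D(beta, f, alpha, mu, iters=200):
    """Bisection for the positive root D of
    beta * sum_i f_i*mu_i*alpha_i / (1 + alpha_i*D) = 1
    (ASSUMED defining equation for D(mu, 1))."""
    def g(D):
        return beta * sum(fi * mi * ai / (1.0 + ai * D)
                          for fi, mi, ai in zip(f, mu, alpha)) - 1.0
    if g(0.0) <= 0.0:
        raise ValueError("need beta * sum_i f_i*mu_i*alpha_i > 1")
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:        # g is decreasing: double hi until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def q_unnorm(x, beta, f, alpha, mu, D):
    """Un-normalised branching-process solution; zero outside the state space."""
    if min(x) < 0 or sum(x) == 0:
        return 0.0
    n = sum(x)
    multinomial = math.factorial(n) / math.prod(math.factorial(xi) for xi in x)
    infection = math.prod((beta * mi * fi) ** xi for mi, fi, xi in zip(mu, f, x))
    first = math.prod(ai ** xi for ai, xi in zip(alpha, x))
    second = math.prod((ai / (1.0 + ai * D)) ** xi for ai, xi in zip(alpha, x))
    return multinomial / n * infection * (first - second)


def balance_residual(x, beta, f, alpha, mu, D):
    """Left-hand side of the asymptotic balance equation (31) at state x."""
    k, n = len(x), sum(x)

    def q(z):
        return q_unnorm(z, beta, f, alpha, mu, D)

    def shift(i, s):
        return tuple(xj + (s if j == i else 0) for j, xj in enumerate(x))

    infections_in = beta * sum(mu[i] * f[i] * (n - 1) * q(shift(i, -1)) for i in range(k))
    recoveries_in = sum((x[i] + 1) / alpha[i] * q(shift(i, +1)) for i in range(k))
    outflow = sum((beta + 1.0 / alpha[j]) * x[j] for j in range(k)) * q(x)
    return infections_in + recoveries_in - outflow
```

For any parameter set with \(\sum _i \mu _i f_i = 1\), the residual should vanish up to rounding error at every state \(\varvec{x}\ne \mathbf{0}\).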

Along the previously considered trajectory with \(x_i = \hat{x} \xi _i\), expression (32) reduces to

$$\begin{aligned} q_{\varvec{x}}\sim & {} \beta ^{\hat{x}} \prod _{i=1}^k \left( {\alpha _i \mu _i f_i \over \xi _i} \right) ^{\hat{x} \xi _i} \sqrt{{1 \over (2\pi )^{k-1} \hat{x}^{k+1} \prod _i \xi _i}} \left( {1 \over D ( \varvec{\mu }, \mathbf{1} ) \tau } \right) , \end{aligned}$$
(33)

and we can now match expressions (29) and (33) to obtain

$$\begin{aligned} \tau\sim & {} {1 \over D ( \mathbf{1} , \varvec{\mu }) D ( \varvec{\mu }, \mathbf{1} )} \sqrt{{2 \pi \over N \sum _i f_i \left( {\alpha _i \mu _i \over 1+\alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu })} \right) ^2}} \, \exp \left( N V(\mathbf{0}) \right) . \end{aligned}$$
(34)

Noting from formula (20) that \(V ( \mathbf{0} ) = \sum _i f_i \ln \left( 1 + \alpha _i \mu _i D ( \mathbf{1} , \varvec{\mu }) \right) - ( D ( \mathbf{1} , \varvec{\mu }) / \beta )\), and recalling that the minor outbreak probability in the case of exponentially distributed infectious periods is given by Eq. (10), we have now established Theorem 1(i) for the case of exponentially distributed infectious periods. Note that our assumption that \(\tau \) is sufficiently large for the right-hand side of Eq. (14) to be neglected is thus justified in retrospect. Next, in Sect. 4.3, we extend the result to cover the case of heterogeneous susceptibilities with more general infectious period distributions.
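
Formula (34) is straightforward to evaluate numerically. The sketch below returns \(\ln \tau \) from (34), using the expression for \(V ( \mathbf{0} )\) quoted above. Since Eq. (3) is not reproduced in this section, the code assumes that \(D ( \varvec{\lambda }, \varvec{\mu })\) is the positive root of \(\beta \sum _i f_i \alpha _i \lambda _i \mu _i / ( 1 + \alpha _i \mu _i D ) = 1\); this form and all function names are assumptions made for illustration.

```python
import math


def solve_D(beta, f, alpha, lam, mu, iters=200):
    """Bisection for the positive root D of
    beta * sum_i f_i*alpha_i*lam_i*mu_i / (1 + alpha_i*mu_i*D) = 1
    (ASSUMED form of the defining equation for D(lambda, mu))."""
    def g(D):
        return beta * sum(fi * ai * li * mi / (1.0 + ai * mi * D)
                          for fi, ai, li, mi in zip(f, alpha, lam, mu)) - 1.0
    if g(0.0) <= 0.0:
        raise ValueError("the process must be super-critical")
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:        # g is decreasing: double hi until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_tau_34(beta, f, alpha, mu, N):
    """ln(tau) from formula (34): heterogeneous susceptibilities,
    exponentially distributed infectious periods."""
    ones = [1.0] * len(f)
    D_1mu = solve_D(beta, f, alpha, ones, mu)   # D(1, mu)
    D_mu1 = solve_D(beta, f, alpha, mu, ones)   # D(mu, 1)
    # V(0) as quoted in the text from formula (20)
    V0 = sum(fi * math.log(1.0 + ai * mi * D_1mu)
             for fi, ai, mi in zip(f, alpha, mu)) - D_1mu / beta
    # sum under the square root in (34)
    S = sum(fi * (ai * mi / (1.0 + ai * mi * D_1mu)) ** 2
            for fi, ai, mi in zip(f, alpha, mu))
    return N * V0 + 0.5 * math.log(2.0 * math.pi / (N * S)) - math.log(D_1mu * D_mu1)
```

For \(k=1\) with \(\alpha = \mu = 1\), both roots equal \(\beta - 1\) and the function reduces to the classical SIS expression \(\tau \sim \sqrt{2 \pi / N} \, \beta ( \beta - 1 )^{-2} \exp \left( N ( \ln \beta - 1 + 1/\beta ) \right) \), which provides a convenient check.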

4.3 Heterogeneous Susceptibilities and General Infectious Period Distributions

To allow for infectious period distributions more general than exponential, we will follow the approach developed in Ball et al. (2016) for the classic SIS model (the case \(k=1\)). That is, we consider a restarted version of our model and apply an insensitivity result for stochastic networks from Zachary (2007).

Consider our model of Sect. 2 in the case \(\varvec{\lambda }= \mathbf{1}\). Following Hernández-Suárez and Castillo-Chavez (1999), we introduce a regeneration step as follows. Whenever the process reaches the state \(\varvec{I}= \mathbf{0}\), it remains there for an exponentially distributed period of mean 1, after which a randomly chosen individual becomes infected, and the process then continues as before. The newly infected individual at the regeneration step is chosen to belong to group i with probability \(\rho _i\) for some distribution \(\varvec{\rho }= \left( \rho _1 , \rho _2 , \ldots , \rho _k \right) \).

In the terminology of Zachary (2007), each occasion when an individual in group i becomes infected corresponds to a class i arrival with associated workload distributed as \(T_i / \alpha _i\). When the process is in state \(\varvec{x}\in S\), individuals arrive in class i (\(i=1,2,\ldots ,k\)) at rate

$$\begin{aligned} \phi _{0i} ( \varvec{x})= & {} \left\{ \begin{array}{ll} \rho _i &{}\quad \text{ for } \varvec{x}= \mathbf{0} , \\ {\beta \over N} \mu _i \left( N_i - x_i \right) \sum _{j=1}^k x_j &{}\quad \text{ for } \varvec{x}\in C . \end{array} \right. \end{aligned}$$

While \(x_i > 0 \), the workload of each class i individual reduces at rate \(\phi _i ( \varvec{x}) / x_i\), where

$$\begin{aligned} \phi _i ( \varvec{x})= & {} x_i / \alpha _i . \end{aligned}$$

The framework of Zachary (2007) allows for movement of individuals between classes, which we do not require here, so that in the notation of Zachary (2007) we take, for \(i=1,2,\ldots ,k\),

$$\begin{aligned} \phi _{ij} ( \varvec{x})= & {} \left\{ \begin{array}{ll} \phi _i ( \varvec{x}) &{} \quad \text{ for } j=0 , \\ 0 &{} \quad \text{ for } j = 1,2,\ldots ,k . \end{array} \right. \end{aligned}$$

That is, on completion of its workload, an individual of class i leaves the system with probability \(\phi _{i0} ( \varvec{x}) / \phi _i ( \varvec{x}) = 1\) and moves to class \(j=1,2,\ldots ,k\) with probability \(\phi _{ij} ( \varvec{x}) / \phi _i ( \varvec{x}) = 0\).

In Theorem 2 of Zachary (2007), it is shown that if a distribution \(\pi ( \varvec{x})\) satisfies a certain partial balance condition (Equations (10) of Zachary 2007), together with an integrability condition (Equation (11) of Zachary 2007), then \(\pi ( \varvec{x})\) is the stationary distribution of the numbers of individuals present in each class, regardless of the distributions of \(T_1 , T_2 , \ldots , T_k\). In order to find the stationary distribution \(\pi ( \varvec{x})\), we return to the case of exponentially distributed infectious periods. If the restarted process is reversible, then it is straightforward to find \(\pi ( \varvec{x})\) from the detailed balance equations (Kelly 2011, chapter 1). In order that the restarted process be reversible, consider Kolmogorov’s criterion (Kelly 2011, chapter 1) applied to the sequence of states \(\mathbf{0} \rightarrow \varvec{e}_i \rightarrow \varvec{e}_i + \varvec{e}_j \rightarrow \varvec{e}_j \rightarrow \mathbf{0}\) for \(i \ne j\). That is, we require the product of the transition rates in one direction around the loop to equal the corresponding product of transition rates in the opposite direction. This may be achieved by taking \(\rho _i = \mu _i f_i\) for \(i=1,2,\ldots ,k\). We can then solve the detailed balance equations, thereby verifying that the restarted process with this choice of \(\rho _i\) values is indeed reversible, with stationary distribution \(\pi ( \varvec{x})\) satisfying

$$\begin{aligned} \pi ( \varvec{x})= & {} {1 \over \beta } {\left( \sum _{j=1}^k x_j \right) ! \over \sum _{j=1}^k x_j} \prod _{j=1}^k \left( {\beta \alpha _j \mu _j \over N} \right) ^{x_j} {N_j \atopwithdelims ()x_j} \, \pi ( \mathbf{0} )\quad \text{ for } \varvec{x}\in C , \end{aligned}$$
(35)

so that

$$\begin{aligned} \pi ( \mathbf{0} )= & {} \left( 1 + {1 \over \beta } \sum _{\varvec{x}\in C} {\left( \sum _{j=1}^k x_j \right) ! \over \sum _{j=1}^k x_j} \prod _{j=1}^k \left( {\beta \alpha _j \mu _j \over N} \right) ^{x_j} {N_j \atopwithdelims ()x_j} \right) ^{-1} . \end{aligned}$$

It is now straightforward to check that the balance equations (10) of Zachary (2007) are satisfied by \(\pi ( \varvec{x})\) given by (35). The integrability condition (11) of Zachary (2007) is trivially satisfied since our state space is finite. It therefore follows from Theorem 2 of Zachary (2007) that \(\pi (\varvec{x})\) given by (35) is stationary for the numbers of infected individuals in our restarted process, whatever the distributions of the infectious periods \(T_i\).
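
For small populations, the reversibility claim can be checked by direct enumeration. The sketch below builds \(\pi ( \varvec{x})\) from formula (35) and measures the largest relative violation of detailed balance for the restarted process with exponentially distributed infectious periods and \(\rho _i = \mu _i f_i\). The function names are hypothetical, group sizes \(N_j\) are passed in directly as integers, and the use of exact factorials restricts the code to small \(N\).

```python
import math
from itertools import product


def restart_stationary(beta, alpha, mu, N_groups):
    """Stationary distribution pi(x) of the restarted process, from formula (35).
    Intended for small populations only (exact factorials are used)."""
    N = sum(N_groups)
    weights = {}
    for x in product(*(range(Nj + 1) for Nj in N_groups)):
        n = sum(x)
        if n == 0:
            weights[x] = 1.0                      # pi(0) / pi(0)
            continue
        w = math.factorial(n - 1) / beta          # (sum x)! / (sum x) = (n - 1)!
        for xj, aj, mj, Nj in zip(x, alpha, mu, N_groups):
            w *= (beta * aj * mj / N) ** xj * math.comb(Nj, xj)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def detailed_balance_gap(pi, beta, alpha, mu, N_groups):
    """Largest relative mismatch between pi(x)*rate(x -> x+e_i)
    and pi(x+e_i)*rate(x+e_i -> x)."""
    N = sum(N_groups)
    worst = 0.0
    for x, px in pi.items():
        n = sum(x)
        for i, Ni in enumerate(N_groups):
            if x[i] == Ni:
                continue                           # group i already fully infected
            up = tuple(xj + (j == i) for j, xj in enumerate(x))
            if n == 0:
                rate_up = mu[i] * Ni / N           # regeneration step, rho_i = mu_i f_i
            else:
                rate_up = beta / N * mu[i] * (Ni - x[i]) * n
            rate_down = (x[i] + 1) / alpha[i]      # recovery of a group i individual
            flow_up, flow_down = px * rate_up, pi[up] * rate_down
            worst = max(worst, abs(flow_up - flow_down) / flow_up)
    return worst
```

The quantity `1 / pi[(0,) * k]` then gives the expected regeneration time of the restarted process for the chosen parameter values.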

Now \(\pi ( \mathbf{0} )\) is the expected proportion of time spent in state \(\varvec{I}= \mathbf 0\) in the long term, which is equal to the expected proportion of time spent in state \(\varvec{I}= \mathbf 0\) during one regenerative cycle. Denoting by \(\tau _0\) the expected regeneration time, being the time from one entry into state \(\varvec{I}= \mathbf{0}\) until the following entry into state \(\varvec{I}= \mathbf{0}\), then \(\tau _0 = 1 / \pi ( \mathbf{0} )\). For \(i=1,2,\ldots ,k\), denote by \(\tau _i\) the expected time for the process to hit state \(\varvec{I}= \mathbf{0}\) after having been initiated with a single newly infected individual in group i. Notice that the values of \(\tau _1 , \tau _2 , \ldots , \tau _k\) are the same for the restarted process as for the original process and that

$$\begin{aligned} \tau _0= & {} 1 + \sum _{i=1}^k \mu _i f_i \tau _i , \end{aligned}$$

so that

$$\begin{aligned} \sum _{i=1}^k \mu _i f_i \tau _i= & {} {1 \over \pi ( \mathbf{0} )} - 1 . \end{aligned}$$
(36)

Recall from Sect. 3 that in the large population limit, the initial stage of an outbreak initiated by a single newly infected individual may be approximated by a multitype branching process. Recall that \(\omega _i\) (\(i=1,2,\ldots ,k\)) denotes the probability that this branching process, initiated by a single group i individual, produces only a finite number of progeny. Denote by \(\zeta _i\) the expected time to extinction of the process conditional upon a finite number of progeny being produced, and by \(\sigma _i\) the expected time taken to attain quasi-stationarity, starting from a single group i individual, given that a major outbreak occurs. Then, in the large population limit, we have

$$\begin{aligned} \tau _i\sim & {} \omega _i \zeta _i + ( 1 - \omega _i ) \left( \sigma _i + \tau \right) . \end{aligned}$$

Now \(\zeta _i , \sigma _i\) are negligible in comparison with \(\tau \), and so

$$\begin{aligned} \tau _i\sim & {} ( 1 - \omega _i ) \tau . \end{aligned}$$
(37)

The above somewhat heuristic argument is made rigorous for the case \(k=1\) in Appendix B of Ball et al. (2016).

Recalling from Sect. 3 that \(\omega = \sum _i \mu _i f_i \omega _i\), it follows from (37) that

$$\begin{aligned} \sum _{i=1}^k \mu _i f_i \tau _i\sim & {} ( 1 - \omega ) \tau . \end{aligned}$$

Now from Eqs. (35) and (36),

$$\begin{aligned} \sum _i \mu _i f_i \tau _i= & {} {1 \over \beta } \sum _{\varvec{x}\in C} {\left( \sum _{j=1}^k x_j \right) ! \over \sum _{j=1}^k x_j} \prod _{j=1}^k \left( {\beta \alpha _j \mu _j \over N} \right) ^{x_j} {N_j \atopwithdelims ()x_j} , \end{aligned}$$
(38)

and hence

$$\begin{aligned} \tau\sim & {} {1 \over \beta \left( 1 - \omega \right) } \sum _{\varvec{x}\in C} {\left( \sum _{j=1}^k x_j \right) ! \over \sum _{j=1}^k x_j} \prod _{j=1}^k \left( {\beta \alpha _j \mu _j \over N} \right) ^{x_j} {N_j \atopwithdelims ()x_j} . \end{aligned}$$

In the case of exponentially distributed infectious periods, we know from Eq. (10) that \(\beta \left( 1 - \omega \right) = D ( \varvec{\mu }, \mathbf{1} )\), and so comparing with expression (34), we obtain our general result (6), and Theorem 1(i) is now proved.
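
The sum appearing in (38) can be evaluated by brute-force enumeration when \(k\) and \(N\) are moderate, which gives an independent numerical check on the asymptotic formula. The sketch below accumulates the terms in log space to avoid overflow and divides by \(1 - \omega \), which is supplied by the caller. The function names are hypothetical, and the cost grows as \(\prod _j ( N_j + 1 )\), so this is practical only for small \(k\).

```python
import math
from itertools import product


def log_rhs_38(beta, alpha, mu, N_groups):
    """Natural log of the right-hand side of (38), summed over all x in C."""
    N = sum(N_groups)
    log_terms = []
    for x in product(*(range(Nj + 1) for Nj in N_groups)):
        n = sum(x)
        if n == 0:
            continue                                # x = 0 is not in C
        # log of (1/beta) * (sum x)! / (sum x), using (n - 1)! = Gamma(n)
        lt = math.lgamma(n) - math.log(beta)
        for xj, aj, mj, Nj in zip(x, alpha, mu, N_groups):
            lt += xj * math.log(beta * aj * mj / N)
            # log of the binomial coefficient C(N_j, x_j)
            lt += math.lgamma(Nj + 1) - math.lgamma(xj + 1) - math.lgamma(Nj - xj + 1)
        log_terms.append(lt)
    m = max(log_terms)                              # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def log_tau_from_sum(beta, alpha, mu, N_groups, one_minus_omega):
    """ln(tau) from the restarted-process sum: ln[(38) / (1 - omega)]."""
    return log_rhs_38(beta, alpha, mu, N_groups) - math.log(one_minus_omega)
```

In the homogeneous case \(k=1\) with \(\alpha = 1\) and exponentially distributed infectious periods, taking \(1 - \omega = 1 - 1/\beta \) (the standard major outbreak probability, assumed here) gives values of \(\ln \tau \) that approach the classical SIS asymptotic expression as \(N\) grows.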

For the homogeneous population case (\(k=1\)), Ball et al. (2016) approximated the sum (38) directly, and from this derived an asymptotic approximation to the mean persistence time \(\tau \). For \(k>1\), it does not seem straightforward to approximate (38) directly, so that we have proceeded more indirectly via the methods of Sects. 4.1 and 4.2.

4.4 Heterogeneous Infectiousness and Exponentially Distributed Infectious Periods

As Clancy (2018) points out, see also Wilkinson and Sharkey (2013), it follows from the network duality results of Harris (1976) and Holley and Liggett (1975) that provided infectious periods are exponentially distributed, the value of \(\tau \) is unchanged if we interchange the roles of \(\varvec{\lambda }, \varvec{\mu }\). Theorem 1(ii) then follows immediately from Theorem 1(i).

5 Applications

5.1 Performance of the Mean Persistence Time Approximation

Figure 1 shows values of \(\ln \tau \) computed from the exact formula (13) compared with values computed from our asymptotic formulae (6, 7), for the case of exponentially distributed infectious periods with heterogeneous susceptibility or infectiousness. We see that the approximation is quite accurate for the range of parameter values considered, and (although the effect is a little hard to make out from Fig. 1) the accuracy of the approximation improves as population size N increases, as expected. It is also apparent that the approximation performs better as \(R_0\) increases. Since our methods are valid under the condition that \(R_0 > 1\), it is perhaps not surprising that the accuracy of the approximation decreases as \(R_0\) approaches 1. We note that the approximation appears to consistently err on the side of slightly underestimating the mean persistence time.

Fig. 1

Values of \(\ln \tau \) plotted against population size N for the case of exponentially distributed infectious periods, with three different values for the basic reproduction number \(R_0\). Fixed parameter values \(k=2\), \(\varvec{f} =(0.5,0.5)\), \(\varvec{\alpha }= (1,1)\), \(\varvec{\lambda }= (1,1)\), \(\varvec{\mu }= {1 \over 3} (5,1)\) (or equivalently \(\varvec{\lambda }= {1 \over 3} (5,1)\), \(\varvec{\mu }= (1,1)\)). Lines are computed from the asymptotic formula (6); dots are the true values of \(\ln \tau \) computed from Eq. (13)

The effects of different infectious period distributions are illustrated in Fig. 2, where we consider both exponentially distributed infectious periods and infectious periods which are constant (non-random). For the case of exponentially distributed infectious periods, exact values of \(\ln \tau \) are again computed from formula (13). For the case of constant infectious periods, we compare our asymptotic formula (6) with the results of Monte Carlo simulation. Specifically, for each N value we simulated 1000 realizations of the process started close to the deterministic endemic equilibrium, at the point \(\left( \lfloor N y_1^* \rfloor , \lfloor N y_2^* \rfloor , \ldots , \lfloor N y_k^* \rfloor \right) \), where \(\lfloor x \rfloor \) denotes the integer part of x. To indicate the dependence of the expected persistence time upon parameters, we write \(\tau = \tau _{\varvec{\alpha }} ( \varvec{\lambda }, \varvec{\mu })\) and denote by \(\tilde{\tau }_{\varvec{\alpha }} ( \mathbf{1} , \varvec{\mu })\) the approximation to \(\tau _{\varvec{\alpha }} ( \mathbf{1} , \varvec{\mu })\) given by the right-hand side of formula (6). For the case of heterogeneous susceptibility, we allowed a burn-in period of \(t_0 = 0.3 \tilde{\tau }_{\varvec{\alpha }} ( \mathbf{1} , \varvec{\mu })\) for the process to attain quasi-stationarity, after which the process was allowed to continue until either infection became extinct or time \(t_{\max } = 1.8 \tilde{\tau }_{\varvec{\alpha }} ( \mathbf{1} , \varvec{\mu })\) was reached. For the case of heterogeneous infectiousness, we took \(t_0 = 0.3 \tilde{\tau }_{\varvec{\alpha }} ( \mathbf{1} , \varvec{\lambda })\) and \(t_{\max } = 1.8 \tilde{\tau }_{\varvec{\alpha }} ( \mathbf{1} , \varvec{\lambda })\). We then computed the maximum likelihood estimate of \(\ln \tau \) as described in Section 6 of Clancy (2018). 
Histograms of the observed extinction times were compared visually with the probability density function of the exponential distribution with rate parameter estimated by maximum likelihood and seen to fit reasonably well, providing reassurance that the burn-in period \(t_0\) was sufficient. Although the network duality results of Harris (1976) and Holley and Liggett (1975) apply only to exponentially distributed infectious periods, we observe that, as noted in Clancy (2018), Monte Carlo estimates of \(\tau \) are essentially identical for the cases of heterogeneous infectiousness and (corresponding) heterogeneous susceptibility, even with constant infectious periods. We observe that the error in our approximating formula is much the same when infectious periods are constant as when infectious periods are exponentially distributed, with mean persistence time again being consistently somewhat underestimated by our asymptotic formula (6).
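
The estimation step can be illustrated with the usual maximum likelihood estimator for right-censored exponential data. This is a sketch under stated assumptions rather than a reproduction of the procedure in Section 6 of Clancy (2018), which may differ in detail: realizations that go extinct before the burn-in time \(t_0\) are discarded, extinction times are measured from \(t_0\), and realizations still alive at \(t_{\max }\) are treated as censored. The function name and the data layout are hypothetical.

```python
import math


def censored_exponential_mle(extinction_times, t0, t_max):
    """Estimate tau from simulated extinction times.

    extinction_times: one entry per realization, holding the extinction time,
    or None if infection was still present at t_max (right-censored).
    ASSUMPTION: post-burn-in extinction times are exponential with mean tau.
    """
    # keep only realizations that survived the burn-in period
    at_risk = [t for t in extinction_times if t is None or t > t0]
    observed = [t - t0 for t in at_risk if t is not None]
    n_censored = sum(1 for t in at_risk if t is None)
    if not observed:
        raise ValueError("no extinctions observed after burn-in")
    # total time at risk divided by number of observed extinctions
    total_exposure = sum(observed) + n_censored * (t_max - t0)
    tau_hat = total_exposure / len(observed)
    return tau_hat, math.log(tau_hat)
```

Under the exponential assumption, censoring at \(t_{\max }\) does not bias the estimate, since each censored realization still contributes its full time at risk \(t_{\max } - t_0\).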

Fig. 2

Values of \(\ln \tau \) plotted against population size N, showing the effect of the infectious period distribution upon the mean persistence time of infection \(\tau \). Fixed parameter values \(k=2\), \(\varvec{f} = (0.5,0.5)\), \(\varvec{\alpha }= (1,1)\), \(R_0 = 1.3\). Dots computed from the eigenvalue equation (13) with \(\varvec{\lambda }= (1,1)\), \(\varvec{\mu }= {2 \over 51} (50,1)\); crosses (‘Heterogeneous susceptibility’) computed via Monte Carlo simulation with \(\varvec{\lambda }= ( 1,1 )\), \(\varvec{\mu }= {2 \over 51} (50,1)\) and constant infectious periods; circles (‘Heterogeneous infectiousness’) computed via Monte Carlo simulation with \(\varvec{\lambda }= {2 \over 51} (50,1)\), \(\varvec{\mu }= (1,1)\) and constant infectious periods; dashed line computed from formula (6) with \(\varvec{\lambda }= (1,1)\), \(\varvec{\mu }= {2 \over 51} (50,1)\) and exponentially distributed infectious periods; solid line computed from formula (6) with \(\varvec{\lambda }= (1,1)\), \(\varvec{\mu }= {2 \over 51} (50,1)\) and constant infectious periods

5.2 Superspreaders and the Effects of Different Forms of Heterogeneity

For many outbreaks of infection, it is thought that a small number of infected individuals, often referred to as ‘superspreaders’, are responsible for a disproportionate amount of pathogen transmission (e.g. Lau et al. 2017; Plowright et al. 2017; Yates et al. 2006). This may arise through a variety of mechanisms. Two possibilities are that some individuals infect at a much higher rate than others (represented in our model by heterogeneity in \(\varvec{\lambda }\)); or that some individuals remain infectious for much longer than others (represented by heterogeneity in \(\varvec{\alpha }\)). Our results allow us to study the effect of either of these forms of superspreading, comparing alternative forms of heterogeneity to one another, and also comparing with a matched (having the same value for \(R_0\)) homogeneous population, as follows.

To make explicit the dependence upon parameters, we now write \(D ( \varvec{\lambda }, \varvec{\mu }) = D_{\varvec{\alpha }} ( \varvec{\lambda }, \varvec{\mu })\), and similarly \(\tau = \tau _{\varvec{\alpha }} ( \varvec{\lambda }, \varvec{\mu })\). We consider only the case of exponentially distributed infectious periods, in order that we can apply formula (7), and we impose the constraint \(\sum _i \alpha _i f_i = 1\) (that is, we re-scale time so that the unit of time is the mean infectious period across the whole population). Let \(\varvec{\eta }= \left( \eta _1 , \eta _2 , \ldots , \eta _k \right) \) be any vector with positive components satisfying \(\sum _i \eta _i f_i = 1\), representing the heterogeneity. Denoting by \(\tilde{\tau }_{\varvec{\alpha }} ( \varvec{\lambda }, \mathbf{1} )\) the approximation to \(\tau _{\varvec{\alpha }} ( \varvec{\lambda }, \mathbf{1} )\) given by the right-hand side of (7), then from Eq. (3), we have that \(D_\mathbf{1} ( \varvec{\eta }, \mathbf{1} ) = \beta - 1\) and \(D_\mathbf{1} ( \mathbf{1} , \varvec{\eta }) = D_{\varvec{\eta }} ( \mathbf{1} , \mathbf{1} )\), and hence formula (7) implies that

$$\begin{aligned} {\tilde{\tau }_\mathbf{1} ( \varvec{\eta }, \mathbf{1} ) \over \tilde{\tau }_{\varvec{\eta }} ( \mathbf{1}, \mathbf{1} )}= & {} {D_{\varvec{\eta }} ( \mathbf{1} , \mathbf{1} ) \over \beta - 1} . \end{aligned}$$

Now it follows from Jensen’s inequality applied to Eq. (3) that \(D_{\varvec{\eta }} ( \mathbf{1} , \mathbf{1} ) \le \beta - 1\), and so

$$\begin{aligned} \tilde{\tau }_\mathbf{1} ( \varvec{\eta }, \mathbf{1} )\le & {} \tilde{\tau }_{\varvec{\eta }} ( \mathbf{1}, \mathbf{1} ) . \end{aligned}$$

That is, for sufficiently large N, heterogeneity in levels of infectiousness leads to a shorter expected persistence time than corresponding heterogeneity in the lengths of infectious periods.

It was shown in Theorem 2(i) of Clancy (2018) that when \(\varvec{\alpha }= \varvec{\mu }= \mathbf{1}\), the leading-order constant A in formula (1) is maximised, for a given value of \(R_0\), when \(\varvec{\lambda }= \mathbf{1}\). This implies that for sufficiently large N, \(\tau _\mathbf{1} ( \mathbf{1}, \mathbf{1} ) \ge \tau _\mathbf{1} ( \varvec{\eta }, \mathbf{1} )\), and since \(D_\mathbf{1} ( \mathbf{1} , \varvec{\eta }) = D_{\varvec{\eta }} ( \mathbf{1} , \mathbf{1} )\) it also follows that for sufficiently large N, \(\tau _\mathbf{1} ( \mathbf{1}, \mathbf{1} ) \ge \tau _{\varvec{\eta }} ( \mathbf{1} , \mathbf{1} )\). That is, for a sufficiently large population, heterogeneity in either levels of infectiousness or infectious period durations reduces the expected persistence time of infection in the population, compared to a corresponding homogeneous population.

These effects are illustrated in Fig. 3, in which we take 10% of the population to generate up to 50 times more potentially infectious contacts per infectious period than the remaining 90% of the population. The leftmost point of each curve, at \(\varvec{\eta }= (1,1)\), corresponds to a homogeneous population. We see that as the degree of heterogeneity increases, the mean persistence time decreases, while the difference between the effects of the two types of heterogeneity increases. With maximal heterogeneity represented by \(\hat{\varvec{\eta }} = (1/f_1 , 0)\), the limiting ratio is given by \(\tilde{\tau }_{\hat{\varvec{\eta }}} ( \mathbf{1}, \mathbf{1} ) / \tilde{\tau }_\mathbf{1} ( \hat{\varvec{\eta }} , \mathbf{1} ) = (\beta -1) / D_{\hat{\varvec{\eta }}} ( \mathbf{1} , \mathbf{1} ) = 1/ f_1\).
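
The ratio \(\tilde{\tau }_\mathbf{1} ( \varvec{\eta }, \mathbf{1} ) / \tilde{\tau }_{\varvec{\eta }} ( \mathbf{1}, \mathbf{1} ) = D_{\varvec{\eta }} ( \mathbf{1} , \mathbf{1} ) / ( \beta - 1 )\) is easy to compute for any heterogeneity vector \(\varvec{\eta }\). The sketch below assumes that \(D_{\varvec{\eta }} ( \mathbf{1} , \mathbf{1} )\) is the positive root of \(\beta \sum _i f_i \eta _i / ( 1 + \eta _i D ) = 1\); this is an assumed form of Eq. (3), which is not reproduced in this section, chosen to be consistent with the identities \(D_\mathbf{1} ( \varvec{\eta }, \mathbf{1} ) = \beta - 1\) and \(D_\mathbf{1} ( \mathbf{1} , \varvec{\eta }) = D_{\varvec{\eta }} ( \mathbf{1} , \mathbf{1} )\) used above. Function names are hypothetical.

```python
def D_eta(beta, f, eta, iters=200):
    """Bisection for the positive root D of
    beta * sum_i f_i*eta_i / (1 + eta_i*D) = 1
    (ASSUMED defining equation for D_eta(1, 1); requires sum_i f_i*eta_i = 1, beta > 1)."""
    def g(D):
        return beta * sum(fi * ei / (1.0 + ei * D) for fi, ei in zip(f, eta)) - 1.0
    if g(0.0) <= 0.0:
        raise ValueError("need beta * sum_i f_i*eta_i > 1")
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:        # g is decreasing: double hi until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def persistence_ratio(beta, f, eta):
    """Approximate ratio tau(heterogeneous infectiousness) /
    tau(heterogeneous infectious periods), i.e. D_eta(1,1) / (beta - 1)."""
    return D_eta(beta, f, eta) / (beta - 1.0)
```

With \(\varvec{f} = (0.1, 0.9)\) and \(\beta = 1.5\), the ratio equals 1 at \(\varvec{\eta }= (1,1)\) and falls to \(f_1 = 0.1\) at the maximally heterogeneous \(\hat{\varvec{\eta }} = (10, 0)\), in agreement with the limiting ratio \(1/f_1\) for the reciprocal quantity.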

It is worth noting that when heterogeneity is in infectiousness, the fact that we have held \(R_0\) constant across Fig. 3 implies that the endemic equilibrium point \(\varvec{y}^*\) also remains fixed. In fact, from Eqs. (39), it is immediate that when \(\varvec{\alpha }= \varvec{\mu }= \mathbf{1}\), we have \(y_i^* = (1 - (1/R_0)) f_i\) for \(i=1,2,\ldots ,k\), for any \(\varvec{\lambda }\). Now Theorem 11 of Clancy and Pearce (2013) demonstrates, via a multivariate normal approximation, that greater heterogeneity in infectiousness corresponds to greater variability in the quasi-stationary distribution. Thus, in this case, the decrease in mean persistence time observed across Fig. 3 (solid curve) corresponds to an increase in variability of the quasi-stationary distribution leading to larger fluctuations (around the same equilibrium point) and hence faster extinction of infection. However, the solid line in Fig. 3 could equally be interpreted as corresponding to heterogeneous susceptibilities, with exponentially distributed infectious periods and \(\varvec{\alpha }= \varvec{\lambda }= \mathbf{1}\), \(\varvec{\mu }= \varvec{\eta }\) in formula (6). When heterogeneity is in susceptibilities, it has been shown (Clancy and Pearce 2013, Theorem 10) that the overall endemic prevalence level \(y^* = \sum _{i=1}^k y_i^*\) decreases with increasing heterogeneity. That is, when heterogeneity is in susceptibilities, the decrease in persistence time observed across Fig. 3 (solid curve) accompanies a corresponding decrease in overall endemic prevalence level. Furthermore, from formulae (39), it is apparent that the endemic equilibrium point \(\varvec{y}^*\) for the case \(\varvec{\alpha }= \varvec{\eta }, \varvec{\lambda }= \varvec{\mu }= \mathbf{1}\) is the same as for the case \(\varvec{\mu }= \varvec{\eta }\), \(\varvec{\alpha }= \varvec{\lambda }= \mathbf{1}\). 
Consequently, when heterogeneity is in infectious period durations, the decrease in persistence time observed across Fig. 3 (dashed curve) again accompanies a corresponding decrease in overall endemic prevalence level.

Fig. 3

Effect of different types of heterogeneity upon the mean persistence time \(\tau \). Plotted lines show values of \(\ln \tilde{\tau }\), where \(\tilde{\tau }\) denotes the approximation to mean persistence time given by formula (7). Fixed parameter values \(k=2\), \(\varvec{f} = (0.1,0.9)\), \(\varvec{\mu }= (1,1)\), \(R_0 = 1.5\), \(N=500\). Solid line computed from formula (7) with \(\varvec{\alpha }= (1,1)\), \(\varvec{\lambda }= \varvec{\eta }\); dashed line computed from formula (7) with \(\varvec{\alpha }= \varvec{\eta }\), \(\varvec{\lambda }= (1,1)\)

5.3 Approximating the Quasi-Stationary Distribution

En route to our analysis of mean time to extinction, obtained via approximation of the tail of the quasi-stationary distribution, we also obtained an approximation for the main body of the quasi-stationary distribution, at least in the case of heterogeneous susceptibilities (\(\varvec{\lambda }= \mathbf{1}\)) and exponentially distributed infectious periods, from the WKB formula (16) with \(K_N , V ( \varvec{y}) , V_0 ( \varvec{y})\) given by Eqs. (26, 20, 28), respectively. This new approximation may be regarded as a refinement of the multivariate Gaussian approximation (22), previously derived via an approximating diffusion (Ornstein–Uhlenbeck) process in Section 6 of Clancy and Pearce (2013). Figure 4 shows contour plots of the exact quasi-stationary distribution \(\varvec{q}\), obtained from Eq. (13), and the two approximations (16, 22) for a population of size \(N=200\) consisting of two equal-sized groups, with group 1 individuals being three times as susceptible to infection as group 2 individuals. Our new approximation is clearly a great improvement upon the Gaussian approximation, particularly away from the endemic equilibrium point \(N \varvec{y}^*\), the contours of the WKB approximation being indistinguishable from those of the exact solution.

Fig. 4

Contour plots of the quasi-stationary distribution \(\varvec{q}\) and approximations. Contour levels correspond to probabilities \(5 \times \left( 10^{-3} , 10^{-5} , 10^{-7} , 10^{-9} , 10^{-11} \right) \). Parameter values \(k=2\), \(\varvec{f} = (0.5,0.5)\), \(\varvec{\alpha }= (1,1)\), \(\varvec{\mu }= {1 \over 2} (3,1)\), \(\varvec{\lambda }= (1,1)\), \(R_0 = 2.5\), \(N=200\). Solid contours represent both the exact quasi-stationary distribution computed from Eq. (13) and the WKB approximation (16), which are indistinguishable. Dashed contours represent the Gaussian approximation (22)

Note that unlike the mean persistence time \(\tau \), the quasi-stationary distribution \(\varvec{q}\) for the case of heterogeneous infectiousness cannot be obtained from the heterogeneous susceptibilities solution simply by interchanging the roles of \(\varvec{\lambda }, \varvec{\mu }\). In fact, even the location of the endemic equilibrium point \(N \varvec{y}^*\), corresponding to the mode of the quasi-stationary distribution, is not maintained under this transformation (Clancy and Pearce 2013). On the other hand, for the case of heterogeneous susceptibilities, the WKB approximation (16) approximates the stationary distribution of the restarted process studied in Sect. 4.3 and hence may be used to approximate the quasi-stationary distribution of numbers of infected individuals even when infectious periods are not exponentially distributed.

5.4 Network Interpretation

The model of Sect. 2 may be interpreted as describing an infection spreading through a directed network under the so-called annealed network approximation, as outlined in Clancy (2018). Briefly, we suppose that individuals are assigned an in-degree \(d_\mathrm{in}\) and out-degree \(d_\mathrm{out}\) according to some joint probability mass function \(p \left( d_\mathrm{in},d_\mathrm{out} \right) \) satisfying \(E \left[ d_\mathrm{in} \right] = E \left[ d_\mathrm{out} \right] \). We suppose that there are a finite number k of \(\left( d_\mathrm{in} , d_\mathrm{out} \right) \) pairs having nonzero probability. We further assume that the network is uncorrelated, that is, with no correlations between degrees of neighbouring individuals. We define a bijective function \(j \left( d_\mathrm{in} , d_\mathrm{out} \right) : {\mathbb Z}_+^2 \rightarrow \left\{ 1,2,\ldots ,k \right\} \), so that any individual having degrees \(\left( d_\mathrm{in} , d_\mathrm{out} \right) \) belongs to group \(j \left( d_\mathrm{in} , d_\mathrm{out} \right) \). Denote by \(d_\mathrm{in} (j) , d_\mathrm{out} (j)\) the in and out degrees, respectively, of a group j individual, and by \(\kappa \) the rate at which infection transmits along each link from an infectious individual to a susceptible individual. This network model may be approximated by our multigroup model of Sect. 2 by setting, for \(j=1,2,\ldots ,k\),

$$\begin{aligned} \beta= & {} \kappa E \left[ d_\mathrm{out} \right] , \\ f_j= & {} p \left( d_\mathrm{in}(j) , d_\mathrm{out} (j) \right) , \\ \mu _j= & {} d_\mathrm{in} (j) / E \left[ d_\mathrm{in} \right] , \\ \lambda _j= & {} d_\mathrm{out} (j) / E \left[ d_\mathrm{out} \right] . \end{aligned}$$
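
The mapping above from a joint degree distribution to the multigroup parameters can be sketched as follows. The function name and the dictionary representation of \(p \left( d_\mathrm{in},d_\mathrm{out} \right) \) are illustrative choices, not part of the paper; groups are indexed by the sorted order of the degree pairs.

```python
def annealed_parameters(degree_pmf, kappa):
    """Map a joint degree distribution to (beta, f, lam, mu).

    degree_pmf: dict mapping (d_in, d_out) -> probability, with finitely many pairs.
    kappa: transmission rate along each link.
    """
    pairs = sorted(degree_pmf)                      # fixes the group labelling j = 1..k
    mean_in = sum(d_in * p for (d_in, _), p in degree_pmf.items())
    mean_out = sum(d_out * p for (_, d_out), p in degree_pmf.items())
    if abs(mean_in - mean_out) > 1e-12:
        raise ValueError("a directed network requires E[d_in] = E[d_out]")
    beta = kappa * mean_out
    f = [degree_pmf[pair] for pair in pairs]        # group proportions
    mu = [d_in / mean_in for d_in, _ in pairs]      # relative susceptibilities
    lam = [d_out / mean_out for _, d_out in pairs]  # relative infectiousness
    return beta, f, lam, mu
```

By construction \(\sum _j f_j \mu _j = \sum _j f_j \lambda _j = 1\). For example, a distribution in which every individual has the same out-degree yields \(\varvec{\lambda }= \mathbf{1}\), the heterogeneous susceptibilities case.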

The undirected version of the above annealed network approximation (with \(\varvec{\lambda }= \varvec{\mu }\)) has been studied in Hindes and Schwartz (2016, 2017) in terms of the leading-order constant A in expression (1), evaluated via numerical solution of the Hamilton–Jacobi equation (18). Our results (6, 7), as well as being much quicker and more straightforward to evaluate, are thus considerably more precise, although applicable only to rather restricted classes of directed networks. Specifically, the assumption that \(\varvec{\lambda }= \mathbf{1}\) corresponds to every individual having the same out-degree, whereas \(\varvec{\mu }= \mathbf{1}\) corresponds to every individual having the same in-degree. Nevertheless, it is remarkable to be able to obtain such simple and precise results as formulae (6, 7) even for such a restricted class of networks.

6 Discussion and Further Work

The main contribution of this paper has been to provide simple explicit formulae (6, 7) for the mean persistence time, in the large population limit, of the heterogeneous population SIS infection model described in Sect. 2. The only infection model for which such a result has previously been available is the homogeneous population version of this same model, corresponding to the case \(k=1\). Explicit formulae are particularly valuable here since numerical solution of the Hamilton–Jacobi equation (18) generally requires the solution of a high-dimensional system of ordinary differential equations subject to boundary conditions at times \(t = - \infty \) and \(t = + \infty \), so that even to obtain the leading-order constant A in formula (1) can be very challenging, in the absence of an explicit formula. We have shown, in Sect. 5.2, how such explicit formulae may be used to study qualitative features such as the effects of different types of heterogeneity upon the persistence time of infection. Additionally, in the course of our analysis, we have obtained a new and accurate approximation to the quasi-stationary distribution of the process, which determines long-term behaviour prior to eventual extinction of infection; see Sect. 5.3. Our model may also be interpreted as an approximate model for infection spreading on a directed network, as described in Sect. 5.4. While our results thus represent substantial progress, many open questions remain.

Firstly, while Theorem 1(i) for the case of heterogeneous susceptibilities allows for any infectious period distribution, Theorem 1(ii) for heterogeneous infectiousness applies only with exponentially distributed infectious periods. From numerical work, including that presented in Fig. 2, it seems likely that a result corresponding to formula (6), generalising formula (7) to allow for any infectious period distribution, does indeed apply when heterogeneity is in infectiousness, but we have not been able to prove this because the network duality results of Harris (1976) and Holley and Liggett (1975) apply only provided that infectious periods are exponentially distributed.

Secondly, although our model of Sect. 2 allows for heterogeneity in susceptibilities and infectiousness simultaneously, Theorem 1 requires that only one of these forms of heterogeneity be present. It would be of great interest to find a corresponding formula allowing for both forms of heterogeneity simultaneously. In particular, this would allow our result to be applied to infections spreading on a much more general class of networks, including undirected networks, as studied via the annealed network approximation in Hindes and Schwartz (2016), requiring \(\varvec{\lambda }= \varvec{\mu }\). More generally, one could allow some quite general matrix of contact rates \(\left\{ \beta _{ij} \right\} \), rather than restricting as we have to contact rates that factorise as \(\beta _{ij} = \beta \lambda _i \mu _j\), in order to study phenomena such as assortative/disassortative mixing (Clancy and Pearce 2013). Unfortunately, there is no reason to expect that explicit formulae such as (6, 7) exist at all in such cases, even under the assumption of exponentially distributed infectious periods. In particular, we have only been able to find an explicit solution \(V ( \varvec{y})\) to the Hamilton–Jacobi equation (18) in the case \(\varvec{\lambda }= \mathbf{1}\) (formula (20); see also Clancy 2018). Consequently, to evaluate the leading-order constant A in formula (1) generally requires numerical solution of Eq. (18), as implemented for the case \(\varvec{\lambda }= \varvec{\mu }\) in Hindes and Schwartz (2016). Further, without an explicit formula for \(V ( \varvec{y})\), Eqs. (1923) cannot be used to find explicit expressions for \(V_0 ( \varvec{y})\), \(K_N\). One could, in principle, evaluate \(K_N\) and \(V_0 ( \varvec{y})\) numerically, as was done in Black and McKane (2011) for a particular SIR infection model, and thereby obtain the WKB approximation for the main body of the quasi-stationary distribution, corresponding to our result illustrated in Fig. 4. 
However, our asymptotic formulae for mean persistence time depend upon approximating the tail of the quasi-stationary distribution, and it is not clear how the matching procedure of Sect. 4.2 could be carried through numerically, without explicit formulae to match.

Finally, it would be of great interest to obtain explicit formulae such as (6, 7) for infection models incorporating features such as disease-induced immunity (SIR models), latent periods and demographic processes of birth, death and migration. For such more sophisticated models, a common strategy has been to resort to approximating the quasi-stationary distribution by a multivariate Gaussian distribution (obtained as the stationary distribution of an approximating Ornstein–Uhlenbeck diffusion process) and then substitute this Gaussian approximation into the right-hand side of Eq. (15) to obtain an approximation to the mean persistence time \(\tau \). For instance, Nåsell (1999) made use of this approach in studying an infection model incorporating demographic processes and disease-induced immunity (with exponentially distributed infectious periods), following on from which Andersson and Britton (2000) extended the model to include latency, with latent periods and infectious periods each being allowed to follow Erlang distributions. Unfortunately, while this approach can give some rough qualitative indication of the effect of model parameters upon persistence times, the numerical approximation to \(\tau \) thus obtained is known to be extremely inaccurate (Clancy and Tjia 2018). Indeed, as pointed out in Nåsell (1999), this approximation does not yield correct N-dependence in the large population limit; specifically, the approximation which appears as equation (2.15) of Nåsell (1999) takes the form \(\tau \approx c \sqrt{N} \exp ( aN )\) for some constants a, c, in contrast to the asymptotic form (1) obtained in van Herwaarden and Grasman (1995). In fact, even the value of \(\lim _{N \rightarrow \infty } (\ln \tau ) / N\), given by the leading-order constant A in formula (1), is not correctly reproduced via this approach (Clancy and Tjia 2018; Doering et al. 2005). 
For these reasons, Andersson and Britton (2000) noted that their approximation ‘should only serve as a qualitative guidance and not be relied on in detail’. In view of this failure of the Ornstein–Uhlenbeck approximating diffusion approach, the approach that we have employed, via WKB approximation, can be seen to be of great potential value, yielding as it does the correct asymptotic behaviour. However, it is much more difficult to obtain explicit formulae via this approach, and indeed there seems no reason to expect that explicit formulae such as (6, 7) will exist in general. Consequently, much of the work to date employing this approach for models in dimensions \(k>1\), by many authors, has consisted essentially of numerical evaluation of the leading-order constant A in formula (1). One exception is the SEIS model in a homogeneous population—that is, the classic SIS model of Weiss and Dishon (1971), extended to allow for a latent (‘exposed’) period. It was shown in Clancy and Tjia (2018) that for this SEIS model, with latent periods and infectious periods each allowed to follow Erlang distributions, the value of \(\lim _{N \rightarrow \infty } ( \ln \tau )/N\) is given by \(A = (1/R_0) - 1 + \ln R_0\), exactly as for the classic SIS model (Andersson and Djehiche 1998). Thus, while the presence of a latent period may impact substantially upon the mean persistence time, this impact is restricted to the prefactor constant C in formula (1), at least for this particular model. In general, while explicit formulae for the constants A, C in (1) may be too much to hope for, in cases where the leading-order constant A can only be evaluated numerically a natural next step may be to seek general methods for evaluating the prefactor constant C numerically. Even here, as mentioned in the previous paragraph, the difficulties to be overcome remain substantial.
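
As a small numerical illustration of the leading-order constant \(A = (1/R_0) - 1 + \ln R_0\) quoted above for the classic SIS model, the following Python sketch compares it with \((\ln \tau )/N\) computed from the standard birth–death formula for the mean time to absorption. Several choices here are assumptions made for illustration only: the function names are hypothetical, the chain has infection rate \(\beta I (N-I)/N\) and recovery rate \(\gamma I\) with \(R_0 = \beta / \gamma \), and the process is started from a single infective rather than from quasi-stationarity, which changes the prefactor but should not change the limit of \((\ln \tau )/N\).

```python
import math


def sis_leading_order_constant(r0):
    """A = 1/R0 - 1 + ln(R0) for the classic SIS model (R0 > 1)."""
    return 1.0 / r0 - 1.0 + math.log(r0)


def sis_mean_extinction_time_from_one(n, r0, gamma=1.0):
    """Exact mean time to extinction starting from one infective.

    Uses the birth-death identity
        E_1[T_0] = sum_{j=1}^{n} (1/mu_j) * prod_{i=1}^{j-1} (lambda_i/mu_i),
    with lambda_i = gamma*r0*i*(n-i)/n and mu_i = gamma*i (assumed rates),
    so that lambda_i/mu_i = r0*(1 - i/n).
    """
    total = 0.0
    ratio_product = 1.0  # running product of lambda_i/mu_i over i < j
    for j in range(1, n + 1):
        total += ratio_product / (gamma * j)
        ratio_product *= r0 * (1.0 - j / n)
    return total


r0 = 2.0
a_const = sis_leading_order_constant(r0)
for n in (100, 400):
    tau = sis_mean_extinction_time_from_one(n, r0)
    # The gap between ln(tau)/N and A should shrink as N grows.
    print(n, math.log(tau) / n, a_const)
```

The gap between the two printed columns reflects the sub-exponential prefactor in formula (1), which contributes a term of order \((\ln N)/N\) to \((\ln \tau )/N\).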