Approximating Time to Extinction for Endemic Infection Models

Approximating the time to extinction of infection is an important problem in infection modelling. A variety of different approaches have been proposed in the literature. We study the performance of a number of such methods, and characterise their performance in terms of simplicity, accuracy, and generality. To this end, we consider first the classic stochastic susceptible-infected-susceptible (SIS) model, and then a multi-dimensional generalisation of this which allows for Erlang distributed infectious periods. We find that (i) for a below-threshold infection initiated by a small number of infected individuals, approximation via a linear branching process works well; (ii) for an above-threshold infection initiated at endemic equilibrium, methods from Hamiltonian statistical mechanics yield correct asymptotic behaviour as population size becomes large; (iii) the widely-used Ornstein-Uhlenbeck diffusion approximation gives a very poor approximation, but may retain some value for qualitative comparisons in certain cases; (iv) a more detailed diffusion approximation can give good numerical approximation in certain circumstances, but does not provide correct large population asymptotic behaviour, and cannot be relied upon without some form of external validation (eg simulation studies).


Introduction
In modelling endemic infections, a quantity of particular interest is the persistence time until infection dies out of the population. For the simplest models, it is possible to compute the distribution of this random variable exactly, based on general Markov chain theory. In particular, the expected persistence time may be expressed as the solution of a linear system of equations. However, for large populations and for more complicated models, numerical computation of this exact solution can become very time-consuming, and may also suffer from numerical instability. Further, it is not straightforward to use the exact solution to address qualitative issues such as, for instance, the effect of greater variability in individual infectious periods upon the expected persistence time in the population. Hence it remains of great interest to find simple and accurate approximations.
Our aim is to assess a number of approximation techniques in terms of simplicity, accuracy, and generality. In order to do this, we focus upon relatively simple models, for which it is possible to compute exact values for comparison. Thus in Section 2 we begin by studying the classic susceptible-infected-susceptible (SIS) model of Weiss and Dishon (1971). Whilst a useful simple model to start from, this classic SIS model is not an entirely typical infection model. In particular, it possesses a natural 1-dimensional structure (only the number of infected individuals must be kept track of), whereas most infection models are multi-dimensional (eg one may need to keep track of susceptible, infected, and immune individuals, giving a 3-dimensional model). In Section 3 we therefore extend the SIS model to allow for Erlang-distributed infectious periods, generalising the exponentially distributed infectious periods assumed within the classic SIS model, and resulting in a model which is naturally multi-dimensional, but nevertheless still relatively simple. The models we study have the further advantage that in each case, rigourous asymptotic results are known giving the expected persistence time as population size becomes large. Results for the classic SIS model appear in Andersson and Djehiche (1998), and the extension to the SIS model with general infections period distribution in Ball et al. (2016). These asymptotic results are derived by techniques which seem unlikely to generalise to broader classes of epidemic models, but provide a basis against which to compare the more general approximation methods that we are chiefly interested in.
We consider four general approximation techniques. First, we consider the below threshold case, when the basic reproduction number R 0 (the average number of new infections caused by a typical infected individual in an otherwise susceptible population) is less than one. Then epidemic threshold theory (Ball 2014), applicable to a wide class of models, tells us that the infectives process may be approximated by a linear branching process, and consequently the mean persistence time of the infection process may be approximated by the mean time to extinction of the branching process.
An alternative, applicable to both the below threshold (R 0 ≤ 1) and above threshold (R 0 > 1) cases, is the approach often referred to as 'the diffusion approximation'. Results such as those contained in section 11.3 of Ethier and Kurtz (1986) justify approximating the infection model (a Markov jump process) by a diffusion process over finite time intervals, in the limit as population size becomes large. However, in this large population limit, the mean persistence time generally tends to infinity, so that convergence over finite time intervals is not sufficient (see section 11.4 of Ethier and Kurtz (1986)). In spite of this lack of rigourous justification, a number of recent papers have nevertheless used the diffusion approximation to approximate mean persistence times for epidemic models, eg Doubova and Vadillo (2016) and Wang et al. (2014).
In the case R 0 > 1, a simpler diffusion approximation is possible, in which the diffusion approximation referred to above is linearised about the equilibrium point of the deterministic approximation, leading to an Ornstein-Uhlenbeck diffusion process. This approximation is considerably simpler than the diffusion approximation described previously, and hence has seen widespread use, eg Andersson and Britton (2000), Britton and Neal (2010), Clancy (2005), Clancy and Pearce (2013), Lindholm and Britton (2007), Nåsell (1999), and Nåsell (2002).
Finally, a rather different approach valid for R 0 > 1 uses techniques from Hamiltonian statistical mechanics, and has seen much recent attention in the theoretical physics literature (Asaf and Meerson 2010;Dykman et al. 1994;Elgart and Kamenev 2004;Kamenev and Meerson 2008).

Approximation Techniques
We consider first the classic stochastic SIS model introduced by Weiss and Dishon (1971). In this model, infection spreads between members of a closed population of size N consisting at time t of S(t) susceptible and I (t) infected individuals. It is sufficient to focus on is a continuous-time Markov chain, with rates of transition given in Table 1, where β, γ > 0 are the infection and recovery rate parameters, respectively.
The basic reproduction number for this model is given by R 0 = β/γ . The state space consists of an absorbing state at I = 0 together with the transient communicating class C = {1, 2, . . . , N}. Consequently, absorption at state 0 will occur within finite time with probability 1, and our interest is in the random variable to be the expected time to extinction starting from i infected individuals, and write τ = (τ 1 , τ 2 , . . . , τ N ). Then denoting by Q the transition rate matrix of the process, and by Q C the rate matrix restricted to the transient states, we have (Norris (1997), theorem 3.3.3) that Equation 1 may be solved numerically for τ . Alternatively, equation (5.28) of Norden (1982) gives the explicit solution When the infection process is above threshold (R 0 > 1) then rather than specifying a fixed initial state I (0), it is arguably of more interest to consider as initial condition a population in which infection has already settled to an endemic level. Since {I (t)} is a process on a finite state-space with all non-absorbing states communicating, then (Darroch and Seneta 1967) there exists a unique quasi-stationary distribution q = (q 1 , q 2 , . . . , q N ) such that, whatever the initial state of the process, The quasi-stationary distribution q may be found as the unique solution of where −λ is the eigenvalue of Q C with largest real part. The time to extinction from quasistationarity is then exponentially distributed with mean τ q = 1/λ = 1/γ q 1 . Note that the quasi-stationary distribution q and the mean extinction time τ q may both be evaluated for any R 0 > 0. However, for R 0 < 1 the process is likely to die out before settling to a quasistationary state, so that q and τ q are of little practical interest. In contrast, for R 0 > 1 the process is likely to settle around the quasi-stationary distribution for a considerable time prior to eventual extinction. Hence for R 0 > 1 it is natural to consider a population in which infection is endemic, and to study the time to extinction from quasi-stationarity. The simplest possible approximation to the above model is the deterministic SIS model, defined as follows. For sufficiently large population size N , the scaled number of infectives I (t)/N may be approximated by the deterministic process y(t) satisfying The deterministic system (4) has equilibria at y = 0 and y * = 1 − (1/R 0 ). For R 0 < 1, the former equilibrium (of extinction) is stable, and y(t) → 0 as t → ∞ for any initial state y(0) ∈ [0, 1]. For R 0 > 1, the endemic equilibrium y * is stable, with y(t) → y * as t → ∞ for any initial state y(0) ∈ (0, 1]. In fact, Eq. 4 may be solved explicitly: for β = γ , starting from y(0) = y 0 , we have Consequently, the process does not become extinct within finite time (for y 0 > 0), whatever the value of R 0 . For R 0 > 1, since y(t) → y * > 0 as t → ∞, the deterministic process cannot provide any useful approximation for mean extinction time. For R 0 < 1, on the other hand, although y(t) > 0 for all t, it is at least the case that y(t) → 0 as t → ∞. Noting that we are approximating the discrete, integer-valued process I (t) in terms of the continuous process y(t), it is natural to employ a continuity correction; that is, for any initial state with and then for i = 1, 2, . . . , N we may suggest τ Det (i) as a crude first approximation to τ i in the subcritical case R 0 < 1. Solving Eq. 5 for t with y(t) = 1/2N , we find for 0.5 ≤ y ≤ N .
The rather crude (first order) deterministic approximation may be improved by considering a (second order) diffusion approximation. That is, the process I (t) may be approximated by the diffusion process Y (t) satisfying where W (t) is a standard Brownian motion. Since we are again approximating a discrete process with a continuous process, we again apply a continuity correction, and approximate τ i by τ Diff (i), where for 0.5 ≤ y ≤ N , From Gardiner (2009), section 5.5.2, we have that τ Diff (y) satisfies the Kolmogorov backward equation for y ∈ (0.5, N), with boundary conditions τ Diff (0.5) = 0, ∂τ Diff ∂y y=N = 0 (where we have imposed a reflecting boundary at y = N ). The solution to Eq. 8 may be written explicitly (Gardiner (2009), equation 5.5.24) as The above integral formula for τ Diff (y) may be evaluated numerically using Matlab. In the case R 0 > 1, we will approximate the time to extinction from quasi-stationarity τ q using τ Diff ( Ny * ), where Ny * denotes the integer part of the re-scaled deterministic endemic equilibrium point. In the supercritical case (R 0 > 1), when studying fluctuations around endemic quasiequilibrium, it is standard (see, for instance, Nåsell (1999)) to further approximate the diffusion Y (t) in terms of an Ornstein-Uhlenbeck process centred at the deterministic equilibrium point Ny * . The drift and diffusion coefficients in Eq. 7 are approximated to leading order around Y = Ny * , yielding the Ornstein-Uhlenbeck processỸ (t) satisfying The stationary distribution of an Ornstein-Uhlenbeck process with drift coefficient J and local varianceB, centred at Ny * , is Gaussian with mean Ny * and variance V satisfying In this case we have J = −(β − γ ) andB = 2Nγ y * , so that V = N/R 0 . The quasistationary distribution q may thus be approximated by a Gaussian distribution with mean Ny * , variance N/R 0 . Recalling that the mean time to extinction is given by τ q = 1/γ q 1 , we have the approximation For large N , recalling that y * = 1 − (1/R 0 ) and making the slight further approximation Ny * − 1 ≈ Ny * gives For N large and R 0 > 1, the mean time to extinction from quasi-stationarity τ q may thus be approximated by The above argument can only be expected to give a reasonable approximation provided the Gaussian approximation to the quasi-stationary distribution assigns negligible probability to negative numbers of infected individuals. A rule of thumb suggested by Nåsell (1999) is to require the coefficient of variation of the Gaussian distribution to be at most 1/3, on the basis that a Gaussian distribution assigns negligible probability to values more than 3 standard deviations from the mean; that is, we require N > 9R 0 / (R 0 − 1) 2 .
Although a number of authors have employed diffusion and Ornstein-Uhlenbeck approximations in studying extinction time for infection models, such approximations are most appropriate in describing moderate deviations from the mean, whereas extinction arises as a result of a large deviation. Consequently, a more appropriate technique may be the methodology from Hamiltonian statistical mechanics described in, for example, Elgart and Kamenev (2004) and Kamenev and Meerson (2008). Defining M(θ; t) = E e θI (t) to be the moment generating function of I (t), then the Kolmogorov forward equations for the process may be written as In terms of the cumulant generating function K(θ; t) = ln M(θ; t), Eq. 11 is equivalent to Supposing (ansatz) that K(θ; t) = NS(θ; t) + o(N) for some function S(θ; t), then Eq. 12 becomes To leading order, this has the form of a Hamilton-Jacobi equation corresponding to the Hamiltonian In this formulation, y represents the scaled number of infected individuals, and θ represents (in the terminology of Hamiltonian mechanics) a conjugate momentum variable. Note that References (Elgart and Kamenev 2004;Kamenev and Meerson 2008) formulate the problem in terms of the probability generating function; we find the above formulation in terms of the cumulant generating function to be more natural. An alternative approach is to work directly with the Kolmogorov forward equations in terms of the quasi-stationary probabilities q i , see for example Dykman et al. (1994). The same Hamiltonian (13) is thus obtained.
The Hamiltonian describes a deterministic motion in (y, θ) space that follows the equations of motion For the specific Hamiltonian (13) the equations of motion are therefore Trajectories of the deterministic SIS model (4) may be obtained by taking θ = 0 in Eq. 14. The system (14) possesses classical equilibrium points at (y, θ) = (0, 0) and (y, θ) = (y * , 0), together with a non-classical disease-free equilibrium point at (y, θ) The expected time to extinction starting from the deterministic equilibrium point satisfies (see Elgart and Kamenev (2004)) ln τ Ny * /N → A as N → ∞, where A is known as the 'action'. The value of A is found by integrating along the trajectory of Eq. 14 that goes from initial state (y, θ) Now H (y, θ) is a constant of the motion (sometimes referred to as the 'energy' of the system), and H (y * , 0) = 0, so along this trajetory we have It follows that y = 0, or θ = 0, or θ = − ln (R 0 (1 − y)). Along the relevant trajectory we do not have y = 0 or θ = 0 except at the endpoints, and so For R 0 > 1 and N large, using τ Ny * to approximate τ q , we therefore have that ln τ q /N may be approximated by ln (τ H The methods outlined above are quite general, and can potentially be applied to any model for transmission of infection (or other population processes). More precise and rigourous results specific to the classic SIS model are presented in Kryscio and Lefèvre (1989), Andersson and Djehiche (1998), and Doering et al. (2005). The relevant results that we will use for comparison with the general approximation methods are as follows.
(i) For R 0 < 1 and I (0) = i fixed, then as N → ∞ the mean extinction time τ i converges to the extinction time of a linear birth-death process with birth and death rate parameters β, γ , respectively (Andersson and Djehiche 1998). That is, τ i → τ Lin i , where from equation (7.10), chapter 4, of Karlin and Taylor (1975), we find (Doering et al. 2005) (iii) For R 0 > 1, the expected time to extinction starting from quasi-stationarity satisfies τ q ∼ τ AD , where (Andersson and Djehiche 1998) Result (i) for the case R 0 < 1 is based on the epidemic threshold theorem, and can be generalised to a wide range of infection models. In contrast, the derivation of result (iii) for the case R 0 > 1 relies upon the 1-dimensional structure of the model, so that whilst similar results may apply to more general birth-death processes, it does not seem straightforward to apply such methods to more general, multi-dimensional, infection models. Doering et al. (2005), by different arguments, confirmed the formula (18) for the case R 0 > 1. In the case R 0 < 1 with I (0)/N → y 0 ∈ (0, 1] as N → ∞, however, formula (17) of Doering et al. (2005) is in disagreement with a corresponding result from Kryscio and  (2); KL asymptotic formula (19); linear birth-death process approximation (16); deterministic approximation (6); diffusion approximation (9) Lefèvre (1989), cited in Andersson and Djehiche (1998). For completeness, we include this alternative result, which states that where γ E denotes Euler's constant γ E ≈ 0.577216.

Below Threshold Case
For all numerical work we fix γ = 1, since varying γ for fixed R 0 simply amounts to a scaling of the time axis. Figure 1 illustrates how our various approximations compare with the exact mean persistence time τ i from any initial state i ∈ {1, 2, . . . , N} for relatively small population size N = 100, with R 0 = 0.8. We see that the deterministic approximation τ Det is, as one might expect, least accurate. The diffusion approximation τ Diff seems to perform best of all our approximations here. The approximation τ Lin performs very well for small initial infective numbers i, but much less well as i becomes large. Again this is to be expected, since this  (2); linear birth-death process approximation (16); deterministic approximation (6); diffusion approximation (9) approximation is based on an assumption that the number of available susceptible individuals remains approximately equal to N at all times. The approximation τ KL performs well in the range 5 ≤ i ≤ 20, but increasingly badly as i increases, and actually produces negative answers for i = 1, 2. It is also notable that τ KL , in contrast to the other approximations, is not monotonically increasing in i, whereas the exact solution (2) clearly is. Next, we consider the performance of our approximations across a range of population sizes N , up to the moderately large size N = 1000. There are essentially two initial conditions of interest -either a fixed number of initial infectives, or a fixed fraction of the population initially infected. Figure 2 shows the performance of relevant approximations for the case of a fixed number of initial infectives, specifically i = 1. We see that the approximation τ Lin , whose value is independent of N , performs well even for small population sizes N , and very well for larger N values. Neither τ Det nor τ Diff perform nearly as well, and they do not improve as N increases, which is not surprising, since both the deterministic and the diffusion process assume that the number of infective individuals is large enough to be treated as a continuous variable. For these parameter values, τ KL takes negative values, and so is not shown.
For the case of a fixed initially infected fraction y 0 , Fig. 3 shows expected persistence time plotted against ln(N ). The best approximation here is τ Diff , with τ Det , τ Lin , τ KL all  (2); KL asymptotic formula (19); linear birth-death process approximation (16); deterministic approximation (6); diffusion approximation (9) performing rather poorly. In terms of asymptotic (large N ) behaviour, formula (17) Fig. 3 it is clear that formula (19), τ KL , does not predict the correct asymptotic gradient. The approximations τ Det , τ Diff , τ Lin all appear to display the correct asymptotic gradient, but with different offsets, with τ Diff performing best. Note that the deterministic approximation (6) predicts the same asymptotic gradient as Doering et al. (2005). Overall, our numerical results for the below threshold case suggest that: (i) the approximation τ KL of Kryscio and Lefèvre (1989) gives the wrong asymptotic behaviour, and overall does not seem a reliable approximation; (ii) the formulae τ DSS of Doering et al. (2005) and τ Det give the correct first-order asymptotic behaviour, but τ DSS differs from the exact answer by an unspecified offset, whilst τ Det differs by a substantial constant offset, so that neither provides a useful numerical approximation; (iii) in the case of an initial small trace of infection, the best approximation is that derived from an approximating linear birth-death process, τ Lin ; (iv) in the case of a substantial fraction of the population being initially infected, the best approximation is that derived from an approximating diffusion process, τ Diff .

Above Threshold Case
In the above threshold case, we focus upon the mean persistence time starting from endemicity; that is, we are interested in approximating τ q . Figure 4 shows the values of τ q and relevant approximations for a system not far above threshold (R 0 = 1.1). The best approximation here is provided by the diffusion process, τ Diff . Note that for the diffusion approximation, we take initial condition Y (0) = Ny * , ie starting from the endemic equilibrium point of the deterministic system. The Ornstein-Uhlenbeck approximation is seen to perform very poorly, and to become even less accurate as N increases. The approximation τ AD performs reasonably well for N ≥ 100, although not as well as τ Diff .
In Fig. 5 we consider a system well above threshold (R 0 = 1.5). We see that τ AD gives the correct leading-order asymptotic behaviour; τ Diff underestimates the mean persistence time; and τ OU provides a very poor approximation. Figure 5 reproduces elements of figure 6 of Doering et al. (2005), although Doering et al. (2005) did not consider the Ornstein-Uhlenbeck approximation, and computed mean time to extinction from a fixed state close to the endemic equilibrium Ny * , rather than from quasi-stationarity. There are also two differences in how the diffusion approximation is computed: firstly, we compute τ Diff exactly using formula (9), whereas Doering et al. (2005) use an asymptotic approximation; secondly, our formula (9) employs a continuity correction, stopping the diffusion at Y = 0.5, whereas in Doering et al. (2005) the diffusion is simply absorbed at the boundary Y = 0. However, these refinements do not make any appreciable difference to the values plotted.
The asymptotic result derived by Doering et al. (2005) for the diffusion approximation (absorbed at Y = 0) is that, for R 0 > 1, the expected persistence time τ Diff (y) from any initial state y = O(N) satisfies, as N → ∞,  (3); AD asymptotic formula (18); diffusion approximation (9); Ornstein-Uhlenbeck approximation (10) As pointed out by Doering et al. (2005), comparison of formulae (20) and (18) shows that the diffusion approximation does not give the correct asymptotic behaviour of ln(τ )/N , confirming what we see in Fig. 5. To look more closely at the failure of the diffusion approximation, in Fig. 6 we plot the exponential constants from formulae (18,20). We see that as R 0 increases, leading order behaviour of τ Diff becomes an increasingly poor approximation to the true leading order behaviour of τ q .
It is similarly clear from formula (10) that τ OU does not give correct leading-order asymptotic behaviour. On the other hand, the Hamiltonian approach, formula (15), does yield correct leading-order asymptotic behaviour.
In summary, for R 0 slightly above 1 and moderate population size N , the best approximation appears to be provided by the diffusion approximation. However, the diffusion approximation has the wrong leading-order asymptotic behaviour as N → ∞, and gets worse as R 0 increases. Hence this approximation can only be used provided that N is not too large and R 0 is not too far above 1. This is in line with comments of Doering et al. (2005). Correct leading-order asymptotic behaviour may be found using the Hamiltonian approach. However, this approach does not provide a useful numerical approximation, since the result we obtain is that τ q ∼ C(N, R 0 )τ H with C(N, R 0 ) = o e N as N → ∞. Since the prefactor C(N, R 0 ) is unknown, we cannot use this asymptotic result to evaluate a numerical approximation to τ q . The Ornstein-Uhlenbeck approximation, though widely used, is seen to provide an extremely poor approximation to τ q .  (3); AD asymptotic formula (18); diffusion approximation (9); Ornstein-Uhlenbeck approximation (10)

Approximation Techniques
The classic SIS model makes the implicit assumption that each individual's infectious period follows an exponential distribution. This is purely a mathematical convenience, not motivated by biological realism. We can improve the biological realism of the model by allowing infectious periods to follow an Erlang distribution, using the 'method of stages'. That is, when an individual becomes infected, it passes through k infectious stages, remaining in each stage for an exponentially distributed time of mean 1/kγ , before returning to susceptibility. Thus the total infectious period follows an Erlang distribution, and the process can be modelled as a Markov chain {(I 1 (t), I 2 (t), . . . , I k (t)) : t ≥ 0}, where I m (t) is the number of individuals in infectious stage m at time t, and the number of susceptible individuals at time t is S(t) = N − k m=1 I m (t). Transition rates are given in Table 2. The state space is (i 1 , i 2 , . . . , i k ) ∈ (Z + ) k : i 1 + i 2 + · · · + i k ≤ N , and consists of a single absorbing state at I = (I 1 , I 2 , . . . , I k ) = 0 together with a transient communicating class C. The basic reproduction number is R 0 = β/γ as before. The mean infectious period is 1/γ as before, but the variance of the infectious period is now 1/kγ 2 and thus may be varied according to our choice of k.
We will consider only the case R 0 > 1, so that infection may become endemic in the population, and study the time to extinction from quasi-stationarity. Denoting by q k the quasi-stationary distribution, then the mean extinction time is given by τ q k = 1/λ k where −λ k is the eigenvalue with largest real part of Q C k , the transition rate matrix restricted to the non-absorbing states. The state-space is now of size N+k k , so that finding the mean time to extinction from Eq. 1 amounts to evaluating an eigenvalue of a sparse non-symmetric square matrix of dimension N+k k − 1. The process I (t)/N may be approximated in the large-population limit by the deterministic process y = (y 1 , y 2 , . . . , y k ) satisfying  This system has two equilibria: the disease-free equilibrium y = 0 and an endemic equilibrium y * = (1 − (1/R 0 )) (1/k), where 1 denotes the vector with all entries equal to 1. The (second-order) diffusion approximation is the process Y (t) satisfying where W 0 , W 1 , W 2 , . . . , W k are independent standard Brownian motions. Defining where the state-space is = y ∈ (R + ) k : max {y 1 , y 2 , . . . , y k } ≥ 0.5, y 1 + y 2 + · · · + y k ≤ N }, and the coefficients are The process has an absorbing boundary at S a = {y ∈ : max {y 1 , y 2 , . . . , y k } = 0.5}, the remainder of the boundary, denoted S r , being taken to be reflecting. Denoting by n = (n 1 , n 2 , . . . , n k ) a vector normal to the boundary S r , the boundary conditions (Gardiner 2009, section 6.6) are τ Diff (y) = 0 on S a , i,j n i B ij (y) ∂τ Diff (y) ∂y j = 0 on S r .
The above system is not uniformly elliptic on , since the matrix B(y) with entries B ij (y) has an eigenvalue of zero at points on the boundary. Consequently, we have no rigourous proof that a unique solution exists. Nevertheless, since our main interest is in approximations that work in practice, we will go ahead with investigating the performance of this diffusion approximation numerically. Specifically, we apply the finite element method (see, for example, Reddy 2006) using Freefem++ software (Freefem++ 2016).
In order to apply the finite element method, the partial differential equation must first be put into variational form, as follows. We start from the integral formulation ⎛ for appropriate test functions w(y). Integrating by parts in order to eliminate second order terms (treating mixed derivatives ∂ 2 ∂y i ∂y j symmetrically), applying boundary conditions to the resulting boundary integral terms, and noting that the condition τ Diff (y) = 0 on S a implies that relevant test functions can be taken to satisfy w(y) = 0 on S a , then all boundary integral terms vanish, and we obtain the variational form Freefem++ code to solve the above system in the case k = 2 is given in the Appendix. Approximating the drift and local variance matrices of the diffusion approximation to leading order around N y * yields a multivariate Ornstein-Uhlenbeck approximationỸ (t), centred at N y * , with drift matrix J and local variance matrixB given by The stationary distribution of this Ornstein-Uhlenbeck process is multivariate Gaussian with mean N y * , variance matrix V satisfying The solution to Eq. 23 is where I k is the k × k identity matrix and 1 k represents the k × k matrix with all entries equal to 1. Under the Ornstein-Uhlenbeck approximation, the total number of infectivesỸ 1 +Ỹ 2 + · · · +Ỹ k has a Gaussian stationary distribution with mean Ny * and variance N/R 0 . That is, the Ornstein-Uhlenbeck approximation suggests that the quasi-stationary distribution of the total number of infected individuals does not depend upon k, the number of infected stages. Further, under the Ornstein-Uhlenbeck approximationỸ 1 ,Ỹ 2 , . . . ,Ỹ k are exchangeable random variables, in stationarity. Hence we can approximate where e k denotes the unit vector with kth element equal to 1, q k e k is the quasi-stationary probability of being in state e k , and q 1 is the quasi-stationary probability for the classic (k = 1) SIS model to be in state I = 1. The mean time to extinction τ q k is then approximated as where τ OU is given by equation (10) describing a motion in R 2k that follows the equations of motion Setting θ 0 = θ k+1 = y 0 = 0, these equations of motion may be written as dy i dt =β k m=1 y m 1− k m=1 y m e θ 1 δ i1 +kγ y i−1 e −θ i−1 +θ i −y i e −θ i +θ i+1 for i = 1, 2, . . . , k, dθ i dt =−β 1 − 2 k m=1 y m e θ 1 −1 −kγ e −θ i +θ i+1 −1 for i = 1, 2, . . . , k, where δ ij is the Kronecker delta. This system has classical equilibrium points at (y, θ ) = (0, 0) and (y * , 0), together with a non-classical disease-free equilibrium point at 0, θ * , where θ * = (k, k − 1, . . . , 3, 2, 1)θ * k with θ * k satisfying The function f (z) = k m=1 z m is increasing, with f (0) = 0 and f (z) → ∞ as z → ∞, so there is a unique positive z * satisfying f (z * ) = kγ /β, and then the unique real solution of Eq. 25 is θ * k = ln z * . Also, for R 0 > 1 we have f (z * ) < k, and since f (1) = k it follows that z * < 1, so θ * k < 0. We may approximate ln τ q /N by ln τ H k /N , where with action A k given by the integration being along a trajectory going from (y * , 0) at t = −∞ to 0, θ * at t = +∞.
Solving the 2k-dimensional system of ordinary differential Eq. 24 numerically for the case k = 2, subject to boundary conditions at t = −∞ and t = +∞, and then using the solution (y(t), θ(t)) thus obtained to evaluate A k , it appears that where the function S k (θ ) satisfies the partial differential equation H k ∂S k ∂θ , θ = 0 with S k (0) = 0. From the form of the equilibrium point θ * , together with the conjecture that A k = A, it is possible to guess the solution to be It is now straightforward to check that S k (θ) given by formula (28) does indeed satisfy H k ∂S k ∂θ , θ = 0, and that S k (θ * ) = 1 − (1/R 0 ) − ln R 0 . Hence A k = (1/R 0 ) − 1 + ln R 0 for k = 1, 2, . . .. We thus arrive at the same conclusion as suggested by the Ornstein-Uhlenbeck approximation: that, to leading order, the mean persistence time does not depend upon k.
For the SIS model with Erlang distributed infectious periods, a precise result corresponding to formula (18) is available from the recent paper . Consider an SIS epidemic model in which individual infectious periods are distributed as any non-negative random variable Q of finite variance, and suppose that R 0 = βE[Q] > 1. Denote by p Q the asymptotic (large N ) probability, starting from a single infected individual in an otherwise susceptible population, that only a minor outbreak occurs. Denoting g(θ) = E e θQ then it is well-known that p Q is the unique solution in [0, 1) of Lemmas 3.2 and 3.3 of Ball et al. (2016) together imply that τ ∼ τ BBN , where In the case of the classic SIS model, then p Q = 1/R 0 and formula (18) is recovered. When Q follows an Erlang distribution with mean E[Q] = 1/γ and variance 1/kγ 2 , then g(θ) = (1 − (θ/kγ )) −k , so that p Q is the unique solution in [0, 1) of

Numerical Results
For numerical work, we take k = 2, and investigate how well our general approximation methods perform for a 2-dimensional problem. Consider first a system not far above threshold, with R 0 = 1.1. Comparing Fig. 7 with Fig. 4, the pictures seem rather similar, but there are significant differences in how they are constructed. In both cases, the exact values are computed by solving Eq. 3. For the classic SIS model this requires the evaluation of an eigenvalue of a square matrix Q C of dimension N ; for the case k = 2, the matrix in question is of dimension N(N + 3)/2.
The diffusion approximation for the classic SIS model is computed by evaluating the integral (9). For k = 2, no such explicit expression for τ Diff is available, and we proceed via the Finite Element Method. Freefem++ code (see Appendix) is used, which gives a numerical approximation to τ Diff (y) for a grid of y values within . The user specifies Parameter values β = 1.1, γ = 1 (so that R 0 = 1.1 and y * 1 = y * 2 ≈ 0.0455). Exact values computed from formula (3); diffusion approximation (21); Ornstein-Uhlenbeck approximation (10). Diffusion approximation is initiated at the finite element meshpoint nearest to Y (0) = N y * 1 , y * the set of grid points along the boundary S a ∪ S r , and then grid points in the interior of are selected automatically by the software. To approximate τ q k , we select the grid point at minimal Euclidean distance from N y * . Since the surface τ Diff (y) is quite flat around N y * , our approximation is not overly sensitive to the precise value of y chosen.
Turning to the asymptotic result of Ball et al. (2016), with k = 2 then p Q satisfying equation (30) may be found explicitly as Formula (31) in this case reduces to The performance of our approximations for a system well above threshold (R 0 = 1.5) is shown in Fig. 8. The picture here is very similar to that seen in Fig. 5 for the classic SIS model: τ BBN gives the correct leading-order asymptotic behaviour; τ Diff underestimates mean persistence time; and τ OU provides a very poor approximation. Correct leading-order asymptotic behaviour, in agreement with the result of Ball et al. (2016), may be found using the Hamiltonian approach.   (21); Ornstein-Uhlenbeck approximation (10). Diffusion approximation is initiated at the finite element meshpoint nearest to Y (0) = N y * 1 , y *

Discussion
For the below-threshold case R 0 < 1, we have seen that a linear birth-death process approximation works well provided the initial number of infected individuals is small. Although we only looked at the classic SIS model, a wide range of infection models (Ball 2014) are known to satisfy an epidemic threshold theorem. The threshold theorem tells us that for R 0 < 1 and population size N large, infection is very likely to die away quickly without ever infecting a significant fraction of the population; and that throughout a typical shortlived outbreak the number of susceptible individuals S(t) may be well approximated by the constant N , so that consequently the process of infected individuals I (t) may be wellapproximated by a (linear) branching process. Thus we have a quite general approximation technique, which is seen to perform well in practice provided the number of initially infected individuals is small (Figs. 1 and 2). A number of previous authors have also considered the case in which a significant fraction of the population is initially infected. This situation is of less practical interest, since for sufficiently large N the chance of a significant fraction of the population ever becoming infected is negligible. The only approximation that we have found to work well in this case is the 'diffusion approximation' (Fig. 3), and the value of this approximation seems doubtful, since it appears no more straightforward to compute than the exact answer (indeed, if anything it seems more difficult to work with).
We turn now to the more interesting case R 0 > 1, so that it is possible for infection to sustain itself in the population over the long term. In this case, we studied time to extinction from an endemic state. First of all, consider the 'diffusion approximation'. We saw that for moderate population sizes and R 0 not too far above 1, the approximation performs rather well (Figs. 4 and 7). On the other hand, it is known that this approximation gives the wrong asymptotic (large N ) behaviour for the classic SIS model (Doering et al. 2005). We have seen (Figs. 5 and 8) that for the SIS model with Erlang-distributed infectious periods, exactly as for the classic SIS model, the diffusion approximation gives the wrong asymptotic behaviour. Hence we conclude that, firstly, the diffusion approximation does not seem trustworthy in cases where it is not possible to compute the exact answer for comparison; and secondly, the circumstances in which the diffusion approximation performs well seem to be precisely those circumstances in which it is straightforward to numerically evaluate the exact solution (1). Thus the value of this approximation method seems at best doubtful. The recent paper (Wang et al. 2014) uses a diffusion approximation to study time to extinction of E. coli O157:H7 infection from a herd of cattle. The diffusion is approximating a 2-dimensional Markov jump process, with population size of the order of 100 cattle and 10 11 colony forming units (cfu) of E. coli. The state-space of the Markov jump process is therefore far too large to compute the mean extinction time exactly from Eq. 1, and some approximation technique is required. For different scenarios considered in Wang et al. (2014), mean extinction times computed from the diffusion approximation are from around 1 month up to around 40 years. This suggests that the infection process is not very far above threshold, and that the approximation may perform well. On the other hand, a population size of 10 11 cfu is very large, which may lead the approximation to break down. Although it is certainly possible in these circumstances that the diffusion may give a good approximation to mean extinction time, our results suggest that one cannot have any real confidence of this. Similarly, in Doubova and Vadillo (2016), diffusion processes are used to compute mean extinction times for a Lotka-Volterra predator-prey model, for two variants of the SIS epidemic model, and for a two-pathogen epidemic model; there is no attempt to validate the approximations with reference to the underlying Markov chains.
Secondly, consider the Ornstein-Uhlenbeck approximation. This approach has been widely used in the literature due to its great simplicity. A formula such as Eq. 10 is much more analytically tractable than the exact solution (2) or the diffusion approximation (9). However, we have seen (Figs. 4,5,7 and 8) that this approximation performs extremely poorly. Nevertheless, we would argue that this approach retains some value in limited circumstances: specifically, for qualitative comparison between two infection models which share the same deterministic endemic prevalence level. This is because, for N large and R 0 well above 1, this approach gives a good approximation to the body of the quasi-stationary distribution -see, for example, Clancy (2005) and Nåsell (1999). The approximation fails in the tails of the distribution, hence the poor performance of our numerical approximation, which relies upon approximating the quasi-stationary probability that exactly one individual is infected. For qualitative comparison of two models, provided that both models have the same mean number of infected individuals in quasi-stationarity (approximated by the deterministic endemic level), it seems reasonable to suppose that that the model whose quasi-stationary distribution has higher variance will assign greater probability to the state with one individual infected. The Ornstein-Uhlenbeck approach gives a good approximation for the variance of the quasi-stationary distribution, provided N is large and R 0 1. It is therefore reasonable to suppose that for two infection models sharing a common deterministic endemic prevalence level, the model under which the variance of the stationary distribution of the Ornstein-Uhlenbeck approximating process is greater will have the lower mean persistence time.
Finally, we turn to the Hamiltonian approach. Whereas diffusion approximations deal with moderate deviations from a deterministic mean, the Hamiltonian approach is suited to dealing with large deviations, and so would be expected to perform better when studying time to disease extinction. Our results (Figs. 5 and 8) confirm that this approach does indeed give the correct leading-order asymptotic behaviour for large N . This approach therefore seems the most promising overall, but a number of difficulties remain. Firstly, we have not obtained a useful numerical approximation for finite N , due to the unknown pre-factor C in the relation τ q ∼ C(N, R 0 ) exp(N A). For 1-dimensional systems, including the classic SIS model, it is possible to evaluate the pre-factor C by retaining higher order information when approximating K(θ, t), see Doering et al. (2005) and Asaf and Meerson (2010). It does not seem straightforward to generalise this to higher dimensional systems, although some progress has been made for one particular 2-dimensional infection model in van Herwaarden and Grasman (1995), where is is shown that the pre-factor is of the form C 0 / √ N for some (unknown) C 0 that does not depend upon N . Secondly, to evaluate the constant A in the dominant exponential term will, in general, require the solution of a boundary value problem for a 2k dimensional ordinary differential equation system, where k is the dimensionality of the original model. Recent work on numerical approaches to this problem includes (Schwartz et al. 2011;Lindley et al. 2014), where systems in k = 2, 3 dimensions are analysed. An alternative is to seek approximations valid within certain regions of parameter space, such as the 'adiabatic approximation' of Dykman et al. (2008), where (for their model of interest) a simple explicit expression for A valid for R 0 close to 1 is derived. Our results for the SIS model with Erlang-distributed infectious periods suggest another possibility: that for systems with sufficient symmetry, it may be possible to guess an explicit solution to the partial differential equation satisfied by S(θ ), as we did to obtain formula (28), which leads directly to an explicit formula for A. In fact, it is not necessary to solve for S(θ) for all θ, but only to evaluate S θ * − S (0). To illustrate this, consider an SEIS model, with a latent ('exposed') period between being infected and becoming infectious.
Suppose the latent period follows an Erlang distribution corresponding to j latent stages, each of mean 1/j ν, whilst the infectious period remains Erlang with k stages each of mean 1/kγ . The Hamiltonian for this system is y m e −θ m +θ m+1 − 1 + kγ y j +k e −θ j +k − 1 where y 1 , y 2 , . . . , y j correspond to latent stages and y j +1 , y j +2 , . . . , y j +k to infectious stages, and similarly for the components of θ . The corresponding equations of motion have a disease-free equilibrium point with θ * = (k, k, . . . , k, k − 1, k − 2, . . . , 3, 2, 1)θ * k , where θ * k is given by Eq. 25. On the hyperplane θ 1 = θ 2 = · · · = θ j = θ j +1 , the function S k (θ ) given by formula (28) (with components θ 1 , θ 2 , . . . , θ k re-labelled as θ j +1 , θ j +2 , . . . , θ j +k ) is readily seen to satisfy the relevant Hamilton-Jacobi equation, and it follows that the action is given by A j,k = (1/R 0 ) − 1 + ln R 0 as before. Thus we see that the existence of a latent period has no effect upon the leading-order term in the expected time to extinction of infection. The above argument can be straightforwardly extended to allow for a latent period distributed according to any phase-type distribution.